Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq
Ali Mortazavi, Brian Williams, Kenneth McCue, Lorian Schaeffer, Barbara Wold
This page hosts the data and code underlying the analysis in the paper above, which was published in Nature Methods in 2008. While the paper focuses on mouse tissues, we have since used the same code successfully on C. elegans and human cell lines.
If using Bowtie 0.10.X, please make sure to use the new '--strata' flag in order to handle multireads correctly. Note that ERANGE is not compatible with bowtie 0.9.9.X.
ERANGE Development Edition
ERANGE is now available through Git, and future releases, starting with ERANGE 4.0, will be distributed from the Git repository. Development snapshots will be uploaded periodically for interested parties. A development alpha of ERANGE 4.0 is available now and can be browsed through the Woldlab Gitweb portal (http://woldlab.caltech.edu/gitweb).
Developers wishing to create a clone of the repository can do so using:
git clone git://woldlab.caltech.edu/erange.git
Important: Discontinue use of ERANGE version 3.2.1
An error has been found in ERANGE 3.2.1 that causes findall.py to return too many peaks and to report an FDR that is too high. In addition, gene counts are returned as zero, although RPKM values are correct.
We recommend downloading ERANGE 4.0a from the repository above and using it instead. This version has been tested on several of our datasets, and the results agree with those from the prior (v3.2) release.
ERANGE 4.0a
New Features and Functions
Erange supports configuration files
- Most ERANGE scripts now read erange.config in the current working directory or ~/.erange.config. Use these files instead of environment variables to set the root directory and temp directories for Cistematic. Command-line options can also be set in the config files for most ERANGE scripts.
Cistematic integration
- Cistematic will now be developed and released in concert with erange.
Erange supports optparse
- Command-line processing is now handled by optparse. As a result, all long command-line arguments must use a double dash (--) instead of the previous single dash (-).
Improved package topology
- End-user software built on top of ERANGE can now import it more easily, either as a whole or as individual submodules.
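As a rough illustration of how a script might combine config-file defaults with optparse parsing, here is a minimal Python 3 sketch. The one-section-per-script layout, the option names --min-hits and --label, and the rule that ~/.erange.config is read first so that the working-directory file overrides it are all assumptions made for this example; none of them is taken from the ERANGE source.

```python
import configparser
import os
from optparse import OptionParser

# Assumed precedence: the home-directory file is read first, so values in
# ./erange.config override it. ERANGE's real precedence may differ.
CONFIG_PATHS = [os.path.expanduser("~/.erange.config"), "erange.config"]


def load_defaults(script_name, paths=CONFIG_PATHS):
    """Collect defaults for one script from whichever config files exist.

    The one-section-per-script layout is hypothetical.
    """
    config = configparser.ConfigParser()
    config.read(paths)  # missing files are skipped without error
    if config.has_section(script_name):
        return dict(config.items(script_name))
    return {}


def build_parser(defaults):
    """Build an optparse parser whose defaults come from the config file."""
    parser = OptionParser(usage="%prog [options] infile outfile")
    # Long options need a double dash: optparse treats "-label" as a
    # cluster of short options starting with -l, not as one long option.
    parser.add_option("--min-hits", type="int", dest="min_hits")
    parser.add_option("--label", dest="label")
    parser.set_defaults(
        min_hits=int(defaults.get("min_hits", 4)),
        label=defaults.get("label", "sample"),
    )
    return parser
```

With a config file whose [example.py] section sets min_hits = 7, the parser built from load_defaults("example.py") uses 7 unless --min-hits is given on the command line, in which case the explicit value wins.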
The following READMEs constitute the bulk of the documentation for ERANGE:
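- A guide to building RDS files: README.build-rds (http://woldlab.caltech.edu/erange/README.build-rds)
- A guide to using ERANGE for ChIP-seq: README.chip-seq (http://woldlab.caltech.edu/erange/README.chip-seq)
- A guide to using ERANGE for (expressed) SNPs: README.rna-esnp (http://woldlab.caltech.edu/erange/README.rna-esnp)
- A guide to using ERANGE for RNA-seq: README.rna-seq (http://woldlab.caltech.edu/erange/README.rna-seq)
- A list of steps describing the standard RNA-seq pipeline: RNA-seq.analysisSteps.txt (http://woldlab.caltech.edu/erange/RNA-seq.analysisSteps.txt)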
You are highly encouraged to use the following pipeline scripts rather than the individual commands for RNA-seq:
- unpaired reads: runStandardAnalysis.sh
- unpaired reads, stranded: runStrandedAnalysis.sh
- paired reads (assumes on network partition): runRNAPairedAnalysis.sh
- expressed SNP analysis: runSNPAnalysis.sh
Please note that ERANGE 3.X is a major departure from the BED-based formats used in ERANGE 2.0 and requires re-importing read mappings into SQLite-based read datasets (RDS). Even so, we recommend running v3.X rather than v2.X for production purposes.
Interim Erange3.3 release
An interim build of ERANGE is available that adds support for analysis using self-organizing maps. This support will be incorporated into ERANGE 4.0 shortly but is being made available early as an interim release: ERANGE3.3.tgz (http://woldlab.caltech.edu/erange/ERANGE3.3.tgz). To use it, you will also need an updated version of Cistematic's analyzego.py (http://woldlab.caltech.edu/erange/analyzego.py), which should replace the copy in your $CISTEMATIC_ROOT/cistematic/stat/ directory in order to support the Bonferroni map correction.
Dual-use E-RANGE
E-RANGE is our Python package for doing RNA-seq and ChIP-seq (hence the "dual-use"), and is a descendant of the ChIPSeq mini peak finder (Johnson, 2007).
To use it for RNA-seq, first go through the RNA-seq README (http://woldlab.caltech.edu/erange/rnaseqREADME.txt), then read analysisSteps.txt (http://woldlab.caltech.edu/erange/analysisSteps.txt) and take a look at the pipeline shell script runStandardAnalysis.sh.
Note that E-RANGE requires Python 2.5, Linux or Mac OS X (preferably with the Psyco compiler for Python), and Cistematic 2.0 (all scripts that take a genome specification on the command line rely on Cistematic!), which you can get here.
If you want to rerun our entire analysis starting with either the raw data (eland files) or the bed files, you will need the following files:
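- ERANGE2.tgz (the code): http://woldlab.caltech.edu/erange/ERANGE2.tgz
- mm9splices_spikes.tgz (the files for building the expanded genomes and remapping splices): http://woldlab.caltech.edu/erange/mm9splices_spikes.tgz
- RNAFAR.tgz (the consolidated RNAFAR analysis, includes repeat library from UCSC - large!): http://woldlab.caltech.edu/erange/RNAFAR.tgz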
The Mouse Reference data
Briefly, each tissue has two replicates, the second of which was done with spike-ins, as described in the paper. For each replicate we provide:
- Normalized wigglegrams of the unique reads to display them on UCSC (mm9)
- Bed files of all of the reads (uniques, splices, multireads, spikes) - note that only the splice bed files are small enough for loading onto UCSC
- RPKM counts for each of the major steps of E-RANGE
- ELAND results files run with the --multi option on the expanded genomes for those who want to look at the raw data (these files are *huge* - up to 1GB)
- bigBed / bigWig files for usage with the UCSC genome browser
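For readers unfamiliar with the unit used in the RPKM files, RPKM stands for reads per kilobase of exon model per million mapped reads. The sketch below shows only this basic definition; the function and argument names are illustrative, and the per-step counts that E-RANGE reports (which also deal with splices and multireads) are not reproduced here.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count         -- reads assigned to the gene's exons
    exon_length_bp     -- summed exon length of the gene model, in bases
    total_mapped_reads -- all mapped reads in the sample
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# 500 reads on a 2 kb gene model in a sample of 10 million mapped reads
example_value = rpkm(500, 2000, 10000000)
```

Normalizing by both gene length and sequencing depth is what makes values comparable across genes and across the replicates listed on this page.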
Tissue Table
| Spike-Ins? | Tissue | wig | beds.tgz | rpkms.tgz | comb.eland2.gz | bigbed.tgz |
| No Spike-In | Brain | mm9Brain | mm9Brain1 | mm9Brain1 | mm9Brain1 | mm9Brain1 |
| No Spike-In | Liver | mm9Liver | mm9Liver1 | mm9Liver1 | mm9Liver1 | mm9Liver1 |
| No Spike-In | Muscle | mm9Muscle | mm9Muscle1 | mm9Muscle1 | mm9Muscle1 | mm9Muscle1 |
| Spike-In | Brain | mm9Brain2 | mm9Brain2 | mm9Brain2 | mm9Brain2 | mm9Brain2 |
| Spike-In | Liver | mm9Liver2 | mm9Liver2 | mm9Liver2 | mm9Liver2 | mm9Liver2 |
| Spike-In | Muscle | mm9Muscle2 | mm9Muscle2 | mm9Muscle2 | mm9Muscle2 | mm9Muscle2 |
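The file names in the table follow a regular pattern under http://woldlab.caltech.edu/erange/, so a download list can be generated rather than typed by hand. The following Python sketch builds the URLs; the helper name reference_url is made up for this example, and whether each URL still resolves is not guaranteed. Note that the wig file of the first (no spike-in) replicate carries no replicate number.

```python
BASE_URL = "http://woldlab.caltech.edu/erange/"
TISSUES = ("Brain", "Liver", "Muscle")
SUFFIXES = ("wig", "beds.tgz", "rpkms.tgz", "comb.eland2.gz", "bigbed.tgz")


def reference_url(tissue, replicate, suffix):
    """Return the download URL for one cell of the tissue table.

    replicate 1 is the no-spike-in sample, replicate 2 the spike-in sample.
    """
    if tissue not in TISSUES or suffix not in SUFFIXES or replicate not in (1, 2):
        raise ValueError("unknown tissue, replicate or file type")
    stem = "mm9%s%d" % (tissue, replicate)
    if suffix == "wig" and replicate == 1:
        # The first-replicate wigglegrams are named without the digit.
        stem = "mm9%s" % tissue
    return BASE_URL + stem + "." + suffix


# Every file for both replicates of every tissue (30 URLs in total).
all_urls = [
    reference_url(tissue, replicate, suffix)
    for replicate in (1, 2)
    for tissue in TISSUES
    for suffix in SUFFIXES
]
```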
Help
For assistance with Erange please contact Sean Upchurch (sau AT caltech.edu)
Last Modified: 7 Jun 2011 by Sean Upchurch
 Wold Lab