This part maps the fastq/fasta RNA-Seq data to the genome and transcriptome, then produces a QC report of the sequencing data, SNP calling results, a wig file for the genome browser, and expression levels and read counts for genes, exons, and splice junctions.
Workflow Input Files
The RSeq workflow requires 5 inputs, which are as follows:
- FastQ File
- Bustard Summary File
- Precomputed PerM OR Bowtie index files
  Note: a. If you plan to use the PerM mapper, you only need to download the PerM index files.
  b. You need not download all index files. Download only the index files for the Gender and Read Length corresponding to your input sample.
- Reference File – Precomputed reference files for read length 76 are provided in the VM.
Obtaining Input File(s)
Sample FastQ files are provided here. Additionally, you may use your own FastQ files as input.
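If you supply your own FastQ file, a quick sanity check before copying it into the VM can save a failed run. The sketch below is not part of the workflow; `check_fastq` is a hypothetical helper name, and it assumes the common four-line FastQ layout (header, sequence, `+` line, quality) with no wrapped sequence lines.

```shell
# check_fastq FILE
# Quick sanity check of a four-line-per-record FastQ file:
#   - every record starts with '@' and its third line starts with '+'
#   - sequence and quality lines have the same length
#   - the total line count is a multiple of 4
# Prints the number of reads on success; prints a message and returns
# non-zero otherwise. (Hypothetical helper, not shipped with the VM.)
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = NR; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = NR; exit }
        NR % 4 == 0 && length($0) != seqlen    { bad = NR; exit }
        END {
            if (bad)         { print "malformed record at line " bad; exit 1 }
            if (NR % 4 != 0) { print "truncated file: " NR " lines"; exit 1 }
            print NR / 4 " reads"
        }
    ' "$1"
}

# Usage:
#   check_fastq my-own-fastq.txt
```

The check only looks at record structure; it does not validate quality encoding or read names.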
Steps to run the Rhesus RSeq Workflow
- Place your input files in the folder that you have shared with the Virtual Machine
- Inside the VM: Open a terminal

- Copy your files inside the VM.
cp /brain/shared/my-own-fastq.txt /brain/rhesus/data/
cp /brain/shared/my-summary.htm /brain/rhesus/data/
- Registering your FastQ/Bustard Summary File(s)
cd /brain/rhesus/data
To register your input FastQ file, execute the following command.
Usage: /brain/rhesus/data/scripts/add-rhesus-sample.sh <-f FLOW> \
<-s SAMPLE> <-l LANE> [-i FASTQ_FILE] [-r RCFILE]
-f, --flowcell – Flowcell ID
-s, --sample – Sample ID
-l, --lane – Lane number
-i, --file – Path to the FastQ file to be registered
-r, --rc-file – Location of the Pegasus resource-catalog file (optional).
Note: If the -i, --file switch is not provided, the command searches the current folder and registers the first file with the extension “fq”.
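When several lanes or samples have to be registered, it can help to build the command line first and review it before running it. The following is a minimal sketch that uses only the -f, -s, -l and -i switches described above; `register_cmd` is a hypothetical helper name, and it prints the command rather than executing it.

```shell
# register_cmd FLOWCELL SAMPLE LANE [FASTQ_FILE]
# Prints (does not run) an add-rhesus-sample.sh command line built from
# the switches documented above. Hypothetical helper for illustration;
# paths containing spaces are not handled.
register_cmd() {
    if [ $# -lt 3 ]; then
        echo "usage: register_cmd FLOWCELL SAMPLE LANE [FASTQ_FILE]" >&2
        return 1
    fi
    flow=$1 sample=$2 lane=$3 fastq=${4:-}
    # The lane must be a plain number.
    case $lane in
        ''|*[!0-9]*) echo "error: lane must be a number: $lane" >&2; return 1 ;;
    esac
    cmd="add-rhesus-sample.sh -f $flow -s $sample -l $lane"
    # Only add -i when a FastQ path was given explicitly.
    if [ -n "$fastq" ]; then
        cmd="$cmd -i $fastq"
    fi
    echo "$cmd"
}

# Usage: print the commands for lanes 1 to 3 of one sample.
#   for lane in 1 2 3; do register_cmd FLOW HSB_125 "$lane"; done
```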
Examples
The following command registers the FastQ file located in the current folder with the resource catalog located at /brain/rhesus/config/rc.data.
# From within the decompressed sample directory, execute the
# following command
add-rhesus-sample.sh -f FLOW -s HSB_125 -l 1
The following command registers the FastQ file located in the current folder with the resource catalog located at /brain/catalog/rc.data.
# From within the decompressed sample directory, execute the
# following command
add-rhesus-sample.sh -f FLOW -s HSB_125 -l 1 -r /brain/catalog/rc.data
The following command registers the FastQ file located at /brain/rhesus/data/sample_1/s_5_sequence.txt with the resource catalog.
add-rhesus-sample.sh -f FLOW -s HSB_125 -l 1 -i \
/brain/rhesus/data/sample_1/s_5_sequence.txt
- Downloading index files
The Rhesus RSeq workflow requires a number of other files such as precomputed index files. To simplify the process of downloading, decompressing, and registering these files we have provided a tool. This tool will download all required files based on the sample attributes specified.
PerM: If you want to use the PerM mapper with the workflow.
cd /brain/rhesus/data
rhesus-reference-downloader.sh --mapper perm --read-length 76
Bowtie: If you want to use the Bowtie mapper with the workflow.
cd /brain/rhesus/data
rhesus-reference-downloader.sh --mapper bowtie
- Execute the Workflow
cd /brain/rhesus
Usage: run-rhesus.sh <Flowcell> <Sample-ID> <Lane-Number> <Gender> \
<Read-Length> <Mismatch-Count> <Illumina-Summary-File-Extension> [Bowtie]
Flowcell – Flowcell ID
Sample-ID – Sample ID
Lane-Number – Lane number of the sample
Gender – Gender of the sample (M/F)
Read-Length – Read length with which to run the workflow
Mismatch-Count – Number of mismatches allowed
Illumina-Summary-File-Extension – Valid values: .htm or .xml
Bowtie – Valid values: Y (use the Bowtie mapper) or N (use the PerM mapper)
NOTE: Flowcell, Sample-ID, and Lane-Number should be the same as the ones provided while registering the sample input.
NOTE: Read-Length should be the read length of the sample on which the workflow is being run.
Examples
The following command will run the workflow on a male sample with read length 76 and an allowed mismatch count of 5 using the PerM mapper.
run-rhesus.sh RHESUS HSB_125 1 M 76 5 .htm
The following command will run the workflow on a male sample with read length 76 and an allowed mismatch count of 5 using the Bowtie mapper.
run-rhesus.sh RHESUS HSB_125 1 M 76 5 .htm Y
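Because run-rhesus.sh takes its arguments by position, a mistyped or misplaced value is easy to miss. The sketch below checks the arguments against the valid values listed above and prints the resulting command line. `check_run_args` is a hypothetical helper name. The rules are taken from the parameter descriptions rather than from the run-rhesus.sh source, and treating the Bowtie flag as an optional eighth argument is an assumption based on the first example omitting it.

```shell
# check_run_args FLOWCELL SAMPLE LANE GENDER READ_LENGTH MISMATCHES EXT [BOWTIE]
# Validates the positional arguments and prints the run-rhesus.sh
# command line. Hypothetical helper; it does not start the workflow.
check_run_args() {
    if [ $# -lt 7 ] || [ $# -gt 8 ]; then
        echo "error: expected 7 or 8 arguments, got $#" >&2
        return 1
    fi
    case $3 in ''|*[!0-9]*) echo "error: lane must be a number" >&2; return 1 ;; esac
    case $4 in M|F) ;; *) echo "error: gender must be M or F" >&2; return 1 ;; esac
    case $5 in ''|*[!0-9]*) echo "error: read length must be a number" >&2; return 1 ;; esac
    case $6 in ''|*[!0-9]*) echo "error: mismatch count must be a number" >&2; return 1 ;; esac
    case $7 in .htm|.xml) ;; *) echo "error: extension must be .htm or .xml" >&2; return 1 ;; esac
    if [ $# -eq 8 ]; then
        case $8 in Y|N) ;; *) echo "error: Bowtie flag must be Y or N" >&2; return 1 ;; esac
    fi
    echo "run-rhesus.sh $*"
}

# Usage:
#   check_run_args RHESUS HSB_125 1 M 76 5 .htm Y
```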
- Monitoring the work-flow
- To monitor the status of the running workflow, execute
pegasus-status -l \
/brain/rhesus/runs/tutorial/pegasus/<FLOWCELL_SAMPLE_LANE>/runXXXX/
# Sample output
05/03/11 11:54:23  Done  Pre  Queued  Post  Ready  Un-Ready  Failed
05/03/11 11:54:23  ===   ===  ===     ===   ===    ===       ===
05/03/11 11:54:23  0     0    1       0     0      115       0
WORKFLOW STATUS : RUNNING | 0/116 ( 0% ) | (condor processing workflow)
- Continue checking the status until it shows that the workflow has completed.
- To verify that all jobs were successful, execute
pegasus-analyzer -i \
/brain/rhesus/runs/tutorial/pegasus/<FLOWCELL_SAMPLE_LANE>/runXXXX/
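Repeated status checks can be scripted by extracting the state from the "WORKFLOW STATUS" line. The following is a minimal sketch; `workflow_state` and `workflow_percent` are hypothetical helper names, and the line format is assumed from the sample output shown in this section, so it may need adjusting for other Pegasus versions.

```shell
# workflow_state
# Reads pegasus-status output on stdin and prints the state word from
# the "WORKFLOW STATUS" line (RUNNING in the sample output).
workflow_state() {
    sed -n 's/^WORKFLOW STATUS *: *\([A-Za-z]*\).*/\1/p' | head -n 1
}

# workflow_percent
# Prints the completion percentage from the same line, without the % sign.
workflow_percent() {
    sed -n 's/^WORKFLOW STATUS.*( *\([0-9.]*\)% *).*/\1/p' | head -n 1
}

# Usage: poll once a minute while the workflow is still running.
# Set run_dir to the run directory passed to pegasus-status above.
#   while [ "$(pegasus-status -l "$run_dir" | workflow_state)" = "RUNNING" ]; do
#       sleep 60
#   done
```

The loop only tests for RUNNING, the one state shown in the sample output; once it ends, pegasus-analyzer remains the way to confirm that every job succeeded.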
- Location of Output
Output is located in the /brain/rhesus/storage/tutorial/pegasus/<FLOWCELL_SAMPLE_LANE>/runXXXX/ directory.
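If a sample has been run more than once, the most recent output directory can be picked by name. This is a sketch only: `latest_run_dir` is a hypothetical helper, and it assumes that run directories start with "run" and are numbered so that the newest one sorts last, which this page does not confirm.

```shell
# latest_run_dir BASE_DIR
# Prints the run* subdirectory of BASE_DIR that sorts last by name.
# Assumption: run directories are numbered so the newest sorts last.
latest_run_dir() {
    base=${1%/}
    latest=
    # Glob results are sorted, so the last match wins.
    for d in "$base"/run*/; do
        if [ -d "$d" ]; then
            latest=${d%/}
        fi
    done
    if [ -z "$latest" ]; then
        echo "error: no run directories under $base" >&2
        return 1
    fi
    echo "$latest"
}

# Usage: pass the sample's folder under /brain/rhesus/storage/tutorial/pegasus/
#   latest_run_dir "$sample_output_dir"
```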
Running the work-flow on your own cluster
You may contact us to set up the pipeline on your own cluster.