Running Rhesus RSeq Workflow

This part maps the FastQ/FASTA RNA-Seq data to the genome and transcriptome, then produces a QC report of the sequencing data, SNP calling results, a wig file for the genome browser, and expression levels and read counts for genes, exons, and splicing junctions.

Workflow Input Files
The RSeq workflow requires the following inputs:

  1. FastQ File
  2. Bustard Summary File
  3. Precomputed PerM or Bowtie index files
    Notes:
    a. If you plan to use the PerM mapper, you only need to download the PerM index files.
    b. You need not download all index files. Download only the index files for the gender and read length corresponding to your input sample.
  4. Reference File – Precomputed reference files for read length 76 are provided in the VM.

Obtaining Input File(s)
Sample FastQ files are provided here. Additionally, you may use your own FastQ files as input.

Steps to run the Rhesus RSeq Workflow

  1. Place your input files in the folder that you have shared with the Virtual Machine.
  2. Inside the VM, open a terminal.
  3. Copy your files into the VM.

    cp /brain/shared/my-own-fastq.txt /brain/rhesus/data/
    cp /brain/shared/my-summary.htm /brain/rhesus/data/

  4. Registering your FastQ/Bustard Summary File(s)

    cd /brain/rhesus/data

    To register your input FastQ file, execute the following command.
    Usage: /brain/rhesus/data/scripts/add-rhesus-sample.sh <-f FLOW> \
    <-s SAMPLE> <-l LANE> [-i FASTQ_FILE] [-r RCFILE]
    -f, --flowcell – Flowcell ID
    -s, --sample – Sample ID
    -l, --lane – Lane number
    -i, --file – Path to the FastQ file to be registered
    -r, --rc-file – Location of the Pegasus resource-catalog file (Optional).

    Note: If the -i, --file switch is not provided, the command searches the current folder and registers the first file with the extension “fq”.

    Examples

    The following command registers the FastQ file located in the current folder with the resource catalog located at /brain/rhesus/config/rc.data.

    # From within the decompressed sample directory, execute the
    # following command
    add-rhesus-sample.sh -f FLOW -s HSB_125 -l 1

    The following command registers the FastQ file located in the current folder with the resource catalog located at /brain/catalog/rc.data.

    # From within the decompressed sample directory, execute the
    # following command
    add-rhesus-sample.sh -f FLOW -s HSB_125 -l 1 -r /brain/catalog/rc.data

    The following command registers the FastQ file located at /brain/rhesus/data/sample_1/s_5_sequence.txt with the resource catalog.

    add-rhesus-sample.sh -f FLOW -s HSB_125 -l 1 \
    -i /brain/rhesus/data/sample_1/s_5_sequence.txt
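
    Before registering a file, it can help to confirm that it really is FastQ. The following is a minimal sketch of such a check; check_fastq is a hypothetical helper that is not part of the VM. It inspects only the first record, relying on the FastQ convention that a record's header line starts with "@" and its separator line starts with "+".

```shell
# check_fastq FILE
# Hypothetical pre-registration check, not part of the Rhesus RSeq tools.
# Succeeds if FILE is readable and its first record looks like FastQ:
# line 1 starts with '@', line 3 starts with '+', and at least 4 lines exist.
check_fastq() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    awk 'NR == 1 && substr($0, 1, 1) != "@" { bad = 1 }
         NR == 3 && substr($0, 1, 1) != "+" { bad = 1 }
         NR >= 4 { exit }
         END { exit (bad || NR < 4) }' "$file"
}
```

    With the function defined in the current shell, registration can be made conditional on the check:

    check_fastq my-own-fastq.txt && \
    add-rhesus-sample.sh -f FLOW -s HSB_125 -l 1 -i my-own-fastq.txt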

  5. Downloading index files
    The Rhesus RSeq workflow requires a number of other files such as precomputed index files. To simplify the process of downloading, decompressing, and registering these files we have provided a tool. This tool will download all required files based on the sample attributes specified.
    PerM: If you want to use the PerM mapper with the workflow.

    cd /brain/rhesus/data
    rhesus-reference-downloader.sh --mapper perm --read-length 76

    Bowtie: If you want to use the Bowtie mapper with the workflow.

    cd /brain/rhesus/data
    rhesus-reference-downloader.sh --mapper bowtie
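
    If you are unsure of your sample's read length, it can be read from the FastQ file itself. The sketch below assumes that all reads in the file have the same length; fastq_read_length is a hypothetical helper, not a tool shipped with the VM.

```shell
# fastq_read_length FILE
# Hypothetical helper: prints the length of the first read in FILE.
# Line 2 of a FastQ file holds the sequence of the first record; a trailing
# carriage return (Windows line endings) is stripped before counting.
fastq_read_length() {
    awk 'NR == 2 { sub(/\r$/, ""); print length($0); exit }' "$1"
}
```

    The result can then be passed to the downloader, for example:

    rhesus-reference-downloader.sh --mapper perm \
    --read-length "$(fastq_read_length my-own-fastq.txt)"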

  6. Execute the Workflow

    cd /brain/rhesus

    Usage: run-rhesus.sh <Flowcell> <Sample-ID> <Lane-Number> <Gender> \
    <Read-Length> <Mismatch-Count> <Illumina-Summary-File-Extension> [Bowtie]

    Flowcell – Flowcell ID
    Sample-ID – Sample ID
    Lane-Number – Lane number of the sample
    Gender – Gender of the sample (M/F)
    Read-Length – Read length with which to run the workflow
    Mismatch-Count – Number of mismatches allowed
    Illumina-Summary-File-Extension – Valid values: .htm or .xml
    Bowtie – Valid values: Y (use the Bowtie mapper) or N (use the PerM mapper)
    NOTE: Flowcell, Sample-ID, and Lane-Number must be the same as the ones provided when registering the sample input.
    NOTE: Read-Length should be the read length of the sample on which the workflow is being run.

    Examples

    The following command runs the workflow on a male sample with read length 76 and an allowed mismatch count of 5, using the PerM mapper.

    run-rhesus.sh RHESUS HSB_125 1 M 76 5 .htm

    The following command runs the workflow on a male sample with read length 76 and an allowed mismatch count of 5, using the Bowtie mapper.

    run-rhesus.sh RHESUS HSB_125 1 M 76 5 .htm Y
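
    Because the arguments are positional, a mistyped value is easy to miss. The following sketch checks the documented value sets before the workflow is launched. validate_run_args is a hypothetical helper; it assumes that the Bowtie argument is optional (the PerM example omits it) and that the lane number, read length, and mismatch count are plain non-negative integers.

```shell
# validate_run_args FLOWCELL SAMPLE_ID LANE GENDER READ_LENGTH MISMATCHES EXT [BOWTIE]
# Hypothetical pre-flight check; it does not call run-rhesus.sh itself.
# Assumptions: BOWTIE is optional; LANE, READ_LENGTH and MISMATCHES are integers.
validate_run_args() {
    if [ "$#" -lt 7 ] || [ "$#" -gt 8 ]; then
        echo "expected 7 or 8 arguments, got $#" >&2
        return 1
    fi
    case $3 in ''|*[!0-9]*) echo "Lane-Number must be a number: $3" >&2; return 1 ;; esac
    case $4 in M|F) ;; *) echo "Gender must be M or F: $4" >&2; return 1 ;; esac
    case $5 in ''|*[!0-9]*) echo "Read-Length must be a number: $5" >&2; return 1 ;; esac
    case $6 in ''|*[!0-9]*) echo "Mismatch-Count must be a number: $6" >&2; return 1 ;; esac
    case $7 in .htm|.xml) ;; *) echo "extension must be .htm or .xml: $7" >&2; return 1 ;; esac
    if [ "$#" -eq 8 ]; then
        case $8 in Y|N) ;; *) echo "Bowtie must be Y or N: $8" >&2; return 1 ;; esac
    fi
    return 0
}
```

    With the function defined in the current shell, the workflow is started only if the check passes:

    validate_run_args RHESUS HSB_125 1 M 76 5 .htm Y && \
    run-rhesus.sh RHESUS HSB_125 1 M 76 5 .htm Y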

  7. Monitoring the workflow

    1. To monitor the status of the running workflow, execute

      pegasus-status -l \
      /brain/rhesus/runs/tutorial/pegasus/<FLOWCELL_SAMPLE_LANE>/runXXXX/
      # Sample output
      05/03/11 11:54:23  Done  Pre  Queued  Post  Ready  Un-Ready  Failed
      05/03/11 11:54:23   ===  ===     ===   ===    ===       ===     ===
      05/03/11 11:54:23     0    0       1     0      0       115       0

      WORKFLOW STATUS : RUNNING | 0/116 ( 0% ) | (condor processing workflow)

    2. Continue checking the status until the workflow has completed.
    3. To verify that all jobs were successful, execute

      pegasus-analyzer -i \
      /brain/rhesus/runs/tutorial/pegasus/<FLOWCELL_SAMPLE_LANE>/runXXXX/
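
    The repeated status check can be scripted. The sketch below extracts the state word from the "WORKFLOW STATUS" line of the pegasus-status output; workflow_state is a hypothetical helper, and the only state it is known to see is RUNNING, as in the sample output for this step.

```shell
# workflow_state
# Hypothetical helper: reads pegasus-status output on stdin and prints the
# word that follows "WORKFLOW STATUS :" (for example RUNNING).
# Prints nothing if no such line is present.
workflow_state() {
    sed -n 's/^[[:space:]]*WORKFLOW STATUS : \([A-Za-z_]*\).*/\1/p'
}
```

    With the function defined in the current shell, a simple polling loop waits while the workflow reports RUNNING and then runs the analyzer (replace the placeholders in the path with your own run directory):

    run_dir="/brain/rhesus/runs/tutorial/pegasus/<FLOWCELL_SAMPLE_LANE>/runXXXX/"
    while [ "$(pegasus-status -l "$run_dir" | workflow_state)" = "RUNNING" ]; do
        sleep 60
    done
    pegasus-analyzer -i "$run_dir"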

  8. Location of Output
    Output is located in the /brain/rhesus/storage/tutorial/pegasus/<FLOWCELL_SAMPLE_LANE>/runXXXX/ directory.

Running the workflow on your own cluster
You may contact us to set up the pipeline on your own cluster.