Running RSeq Workflow

This part maps the FastQ/FASTA RNA-Seq data to the genome and transcriptome, then produces a QC report of the sequencing data, SNP calling results, a wig file for the genome browser, and expression levels and read counts for genes, exons, and splicing junctions.

Workflow Input Files
The RSeq workflow requires the following 5 inputs:

  1. FastQ File
  2. Bustard Summary File
  3. Precomputed PerM OR Bowtie index files
    Note: a. If you plan to use the PerM mapper, you only need to download the PerM index files.
    b. You need not download all index files. Download only the index files for the Gender and Read Length corresponding to your input sample.
  4. Genome FA file.
    Note: You need not download all FA files. Download only the FA file for the Gender corresponding to your input sample.
  5. Reference File – Precomputed reference files for read lengths 50, 75, and 100 are provided in the VM.

Obtaining Input File(s)
Sample FastQ/index files are provided here. Additionally, you may use your own FastQ files as input.

To use any input file with the RSeq workflow, you need to download it into the Virtual Machine.

Steps to run the RSeq Workflow
The RSeq workflow currently supports only samples with a read length of 50, 75, or 100. Ensure that the read length of your sample files is supported.
The explanation below assumes that you are using your own sample with the workflow. The sample FastQ file is named my-own-fastq.txt and the Bustard summary file is named my-summary.htm.
Additionally the sample has the following attributes:

  • Flowcell ID: FLOW8
  • Sample ID: SAMP_5
  • Lane Number: 4
  • Read Length: 100
  • Gender: Female
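
Before starting, it can help to confirm the read length directly from the FastQ file. The following is a minimal sketch and not part of the RSeq distribution: the function names fastq_read_lengths and is_supported_read_length are hypothetical, and the sketch assumes an uncompressed FastQ file with four-line records.

```shell
# Hypothetical helpers; not shipped with the RSeq workflow.

# Print each distinct read length found in a FastQ file, one per line.
# Assumes uncompressed input with four-line records, so the sequence
# is always the second line of each record.
fastq_read_lengths() {
    awk 'NR % 4 == 2 { print length($0) }' "$1" | sort -n -u
}

# Succeed only for the read lengths the workflow supports (50, 75, 100).
is_supported_read_length() {
    case "$1" in
        50|75|100) return 0 ;;
        *) return 1 ;;
    esac
}
```

If fastq_read_lengths my-own-fastq.txt prints more than one value, the reads are not of uniform length and the file probably needs attention before it is used with the workflow.
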
  1. Place your input files in the folder that you have shared with the Virtual Machine
  2. Inside the VM: Open a terminal
  3. Copy your files inside the VM.

    cp /brain/shared/my-own-fastq.txt /brain/atlas/data/
    cp /brain/shared/my-summary.htm /brain/atlas/data/

  4. Registering your FastQ/Bustard Summary File(s)
    To register the FastQ and summary files, execute the add-sample.sh script.
    Usage: /brain/atlas/data/scripts/add-sample.sh <-f FLOW> <-s SAMPLE> \
    <-l LANE> [-r RCFILE] [-b SUMMARY] [-i FASTQ]
    -f, --flowcell – Flowcell ID
    -s, --sample – Sample ID
    -l, --lane – Lane number
    -b, --summary-file – Location of summary file
    -i, --input-fastq – Location of FastQ file
    -r, --rc-file – Optional. Location of the Pegasus resource-catalog file.

    Examples

    The following command registers the my-own-fastq.txt and my-summary.htm files with the resource catalog.

    cd /brain/atlas/data
    add-sample.sh --flowcell FLOW8 --sample SAMP_5 --lane 4 \
    --summary-file ./my-summary.htm \
    --input-fastq ./my-own-fastq.txt

  5. Downloading index and FA files
    The RSeq workflow requires a number of other files, such as precomputed index files and FA files. To simplify downloading, decompressing, and registering these files, we have provided a tool. This tool downloads all required index and FA files based on the sample attributes specified.
    PerM: If you want to use the PerM mapper with the workflow.

    cd /brain/atlas/data
    reference-downloader.sh --mapper perm --gender female --read-length 100

    Bowtie: If you want to use the Bowtie mapper with the workflow.

    cd /brain/atlas/data
    reference-downloader.sh --mapper bowtie --gender female

  6. Execute the Workflow

    cd /brain/atlas

    To start the workflow, execute the run.sh script.
    run.sh <Flowcell> <Sample-ID> <Lane-Number> <Gender> <Read-Length> \
    <Mismatch-Count> <Parts> <Illumina-Summary-File-Extension(.xml/.htm)> \
    <Bowtie(Y/N)>

    Flowcell – Flowcell ID
    Sample-ID – Sample ID
    Lane-Number – Lane number of the sample
    Gender – Gender of the sample (M/F)
    Read-Length – Read length with which to run the workflow (50/75/100)
    Mismatch-Count – Number of mismatches allowed
    Parts – Number of parts into which to split the mapping job (for the VM we recommend setting this to 1)
    Illumina-Summary-File-Extension – Valid values are .htm or .xml
    Bowtie – Valid values are Y (use the Bowtie mapper) or N (use the PerM mapper)
    NOTE: Flowcell, Sample-ID, and Lane-Number must be the same as the values provided while registering the sample input
    NOTE: Read-Length must be the same as the value provided while registering the PerM index

    Examples

    Run the workflow on your sample using PerM mapper.

    run.sh FLOW8 SAMP_5 4 F 100 5 1 .htm N

    Run the workflow on your sample using Bowtie mapper.

    run.sh FLOW8 SAMP_5 4 F 100 5 1 .htm Y
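
    The positional arguments are easy to mix up, so one option is to derive them from the sample attributes. The sketch below only prints the command rather than running it. The helper names (gender_code, summary_extension, run_command) are hypothetical and not part of the workflow, and the fixed mismatch count of 5 and parts value of 1 simply mirror the examples above.

```shell
# Hypothetical helpers; not shipped with the RSeq workflow.

# Map a gender name to the single-letter code run.sh expects (M/F).
gender_code() {
    case "$1" in
        [Ff]*) echo F ;;
        [Mm]*) echo M ;;
        *) echo "unknown gender: $1" >&2; return 1 ;;
    esac
}

# Print the extension (including the dot) of a summary file name.
# Assumes the name contains at least one dot.
summary_extension() {
    echo ".${1##*.}"
}

# Print the run.sh command line for a sample.
# Arguments: flowcell sample lane gender read-length summary-file bowtie(Y/N)
# Mismatch count (5) and parts (1) are fixed to the values used in the examples.
run_command() {
    gender=$(gender_code "$4") || return 1
    echo "run.sh $1 $2 $3 $gender $5 5 1 $(summary_extension "$6") $7"
}
```

    For the example sample, run_command FLOW8 SAMP_5 4 Female 100 my-summary.htm N prints the PerM command shown above, which can be reviewed before it is executed from /brain/atlas.
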

  7. Monitoring the workflow
    1. To monitor the status of the running workflow, execute

      pegasus-status -l /brain/atlas/runs/tutorial/pegasus/FLOW8_SAMP_5_4/run0001/
      # Sample Output
      05/03/11 11:54:23 Done  Pre  Queued  Post  Ready  Un-Ready  Failed
      05/03/11 11:54:23  ===  ===     ===   ===    ===       ===     ===
      05/03/11 11:54:23    0    0       1     0      0       115       0

      WORKFLOW STATUS : RUNNING | 0/116 ( 0% ) | (condor processing workflow)

    2. Continue checking the status until the workflow has completed.
    3. To verify that all jobs were successful, execute

      pegasus-analyzer -i /brain/atlas/runs/tutorial/pegasus/FLOW8_SAMP_5_4/run0001/

  8. Location of Output
    Output is located in the following directory:
    /brain/atlas/storage/tutorial/pegasus/FLOW8_SAMP_5_4/run0001/
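
The run directory used for monitoring and the output directory both appear to follow the pattern <Flowcell>_<Sample-ID>_<Lane-Number>, so they can be derived from the sample attributes. The sketch below assumes that pattern, which is inferred from the example paths, and assumes the run is named run0001. All function names are hypothetical. workflow_state reads the WORKFLOW STATUS line in the format shown in the sample output, which may differ between Pegasus versions.

```shell
# Hypothetical helpers; not shipped with the RSeq workflow.

# Directory passed to pegasus-status and pegasus-analyzer.
# Arguments: flowcell sample lane (run0001 is assumed).
run_dir() {
    echo "/brain/atlas/runs/tutorial/pegasus/${1}_${2}_${3}/run0001/"
}

# Directory where the workflow output is written.
output_dir() {
    echo "/brain/atlas/storage/tutorial/pegasus/${1}_${2}_${3}/run0001/"
}

# Read pegasus-status output on stdin and print the workflow state
# taken from the last "WORKFLOW STATUS : <STATE> | ..." line.
workflow_state() {
    sed -n 's/^[[:space:]]*WORKFLOW STATUS : \([A-Za-z]*\).*/\1/p' | tail -n 1
}

# Poll once a minute for as long as the reported state is RUNNING.
wait_for_workflow() {
    while [ "$(pegasus-status -l "$1" | workflow_state)" = "RUNNING" ]; do
        sleep 60
    done
}
```

For the example sample, wait_for_workflow "$(run_dir FLOW8 SAMP_5 4)" returns once the state is no longer RUNNING. It does not distinguish success from failure, so pegasus-analyzer should still be run afterwards.
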

Running the workflow on your own cluster
Instructions for setting up the Bowtie version of the pipeline are here.