Quickstart – RSeq Workflow

This quick start guide shows how to set up the virtual machine and run the RSeq workflow on a sample dataset.

  1. VirtualBox: You need VirtualBox to run the virtual machine on your computer. If VirtualBox is already installed, you can use the existing installation. Otherwise, download the binary package for your platform from the VirtualBox website and install it.
  2. Download the Virtual Machine: The virtual machine image is around 1.0 GB in size. We recommend downloading it with a command-line tool such as “wget”, because downloading the image through a browser sometimes corrupts it. Download Brain VM
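Since a damaged image is the failure mode mentioned above, it can help to resume interrupted transfers and test the archive before configuring anything. The following is a minimal sketch. It assumes the archive is saved as Brain.tar.bz2 (the name used in the extraction step), and VM_URL is a placeholder, because this guide does not spell out the download address.

```shell
#!/bin/sh
# VM_URL is a placeholder: set it to the address behind the download link.
VM_URL="${VM_URL:-}"
ARCHIVE="Brain.tar.bz2"   # assumed file name, taken from the extraction step

# archive_ok FILE: succeeds only if FILE exists and is an intact bzip2 stream.
archive_ok() {
    [ -f "$1" ] && bzip2 -t "$1" 2>/dev/null
}

if [ -n "$VM_URL" ]; then
    # -c resumes a partial download instead of starting over.
    wget -c "$VM_URL"
fi

if [ -f "$ARCHIVE" ]; then
    if archive_ok "$ARCHIVE"; then
        echo "$ARCHIVE passed the bzip2 integrity test"
    else
        echo "$ARCHIVE is corrupt or incomplete; download it again" >&2
    fi
fi
```

The integrity test only confirms that the compressed stream is intact. It does not prove that the file is the image the authors published.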
  3. Configuring the VM
    1. Uncompress the image.
      On Linux, execute the following command:

      tar jxvf Brain.tar.bz2

      On Windows, use an archive tool such as WinRAR.

    2. Run VirtualBox
    3. Click on New
    4. Click Next
    5. Enter Details as shown
    6. Click Next
    7. Specify a Base Memory Size and Click Next. The RSeq workflow uses the PerM mapper, which requires a large amount of physical memory; the recommended amount is 16384 MB. The RSeq workflow also supports the Bowtie mapper, which requires 4096 MB of memory.
      Note: The computer on which the VM is deployed needs more memory (RAM) than the amount assigned to the VM: more than 16384 MB for PerM, or more than 4096 MB for Bowtie.
    8. Click Next
    9. Click Choose existing hard disk
    10. Select the Debian-6-x86.vmdk file from the decompressed folder
    11. Click Next
    12. Click Finish
    13. Select BrainVM and Click Settings
    14. Click on Storage -> SATA Controller
    15. Click Add Hard Disk
    16. Click Choose Existing Disk
    17. Select the Brain.vmdk file from the decompressed folder and Click Open
    18. Setup Shared Folder:
      1. Click on Settings
      2. Click on Shared Folder, Click Add
      3. Click on Folder Path, Click Other and select any folder that you want to access inside the VM
      4. Specify “Folder Name” as self-data
      5. Click OK
      6. Click OK
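The GUI steps above can also be approximated on the host command line with VBoxManage, the tool that ships with VirtualBox. This is a rough sketch and is not taken from the RSeq documentation. It assumes the VM name BrainVM (from step 13), the OS type Debian, and that both disks hang off one SATA controller (the wizard may attach the first disk differently). IMAGE_DIR and SHARE_DIR are hypothetical paths. By default it runs in dry-run mode and only prints the commands.

```shell
#!/bin/sh
# Sketch only: mirrors the GUI steps with VBoxManage. DRY_RUN=1 (the default
# here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
VM_NAME="BrainVM"
IMAGE_DIR="${IMAGE_DIR:-./Brain}"          # assumed location of the extracted .vmdk files
SHARE_DIR="${SHARE_DIR:-$HOME/vm-share}"   # hypothetical host folder to share

# mem_for_mapper MAPPER: base memory in MB, using the figures from step 7.
mem_for_mapper() {
    case "$1" in
        perm)   echo 16384 ;;
        bowtie) echo 4096 ;;
        *)      echo "unknown mapper: $1" >&2; return 1 ;;
    esac
}

# vbox ARGS...: run VBoxManage, or just print the command in dry-run mode.
# The printed form is for reading only (quoting is not preserved).
vbox() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "VBoxManage $*"
    else
        VBoxManage "$@"
    fi
}

# create_brain_vm MAPPER: create the VM, attach both disks, add the share.
create_brain_vm() {
    mem=$(mem_for_mapper "$1") || return 1
    vbox createvm --name "$VM_NAME" --ostype Debian --register
    vbox modifyvm "$VM_NAME" --memory "$mem"
    vbox storagectl "$VM_NAME" --name "SATA Controller" --add sata
    vbox storageattach "$VM_NAME" --storagectl "SATA Controller" \
        --port 0 --device 0 --type hdd --medium "$IMAGE_DIR/Debian-6-x86.vmdk"
    vbox storageattach "$VM_NAME" --storagectl "SATA Controller" \
        --port 1 --device 0 --type hdd --medium "$IMAGE_DIR/Brain.vmdk"
    vbox sharedfolder add "$VM_NAME" --name self-data --hostpath "$SHARE_DIR"
}

create_brain_vm bowtie
```

Review the printed commands first. Setting DRY_RUN=0 executes them, and passing perm instead of bowtie requests the larger 16384 MB allocation.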
  4. Click Start to start your VM
  5. Download the Sample dataset
    The sample dataset that we have provided has the following attributes.
    • Read length: 75
    • Gender: Male

    # Change into data directory
    cd /brain/atlas/data
    wget "http://genomics.isi.edu/downloads/sample_1.tar.bz2"
    --2011-05-05 12:08:56-- http://genomics.isi.edu/downloads/sample_1.tar.bz2
    Resolving genomics.isi.edu... 128.9.64.219
    Connecting to genomics.isi.edu|128.9.64.219|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 2900510638 (2.7G) [application/x-bzip2]
    Saving to: “sample_1.tar.bz2”

    100%[=================================>] 2,900,510,638 72.4M/s in 40s

    2011-05-05 12:09:37 (68.5 MB/s) - “sample_1.tar.bz2” saved [2900510638/2900510638]
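A quick way to catch a truncated download is to compare the file size with the "Length:" value in the wget transcript above. This is a small sketch. The byte count is copied from that transcript and would no longer apply if the hosted sample were ever replaced.

```shell
#!/bin/sh
# size_matches FILE BYTES: succeeds if FILE exists and is exactly BYTES long.
size_matches() {
    [ -f "$1" ] || return 1
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" = "$2" ]
}

EXPECTED_BYTES=2900510638   # "Length:" value from the wget transcript
if [ -f sample_1.tar.bz2 ]; then
    if size_matches sample_1.tar.bz2 "$EXPECTED_BYTES"; then
        echo "sample_1.tar.bz2: size OK"
    else
        echo "sample_1.tar.bz2: unexpected size; wget -c can resume the download" >&2
    fi
fi
```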

  6. Decompress the Sample Dataset

    tar jxvf sample_1.tar.bz2
    sample_1/
    sample_1/BrainFromIllumina75bp.fq
    sample_1/Summary.htm
    sample_1/README
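The read length listed for the sample (75) is passed to the workflow later, so it can be worth confirming it against the extracted file. The sketch below assumes a conventional FASTQ layout with four lines per read and the sequence on the second line. The helper names are illustrative only.

```shell
#!/bin/sh
# first_read_length FASTQ: print the length of the first read's sequence line
# (line 2 of a four-line FASTQ record).
first_read_length() {
    awk 'NR == 2 { print length($0); exit }' "$1"
}

# read_count FASTQ: number of records, assuming exactly four lines per read.
read_count() {
    awk 'END { print int(NR / 4) }' "$1"
}

FASTQ="sample_1/BrainFromIllumina75bp.fq"
if [ -f "$FASTQ" ]; then
    echo "read length: $(first_read_length "$FASTQ")"
    echo "reads:       $(read_count "$FASTQ")"
fi
```

Counting reads scans the whole file, which can take a while on a multi-gigabyte dataset. The read-length check stops after the second line.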

  7. Register the Sample dataset

    cd sample_1
    add-sample.sh --flowcell FLOW --sample SAMPLE --lane 1
    Registered file /brain/atlas/data/sample_1/BrainFromIllumina75bp.fq with the resource catalog.
    Registered file /brain/atlas/data/sample_1/Summary.htm with the resource catalog.

  8. Downloading index and FA files
    The RSeq workflow requires a number of other files, such as precomputed index files and FA files. To simplify downloading, decompressing, and registering these files, we have provided a tool that downloads all required index and FA files based on the sample attributes you specify.

    cd /brain/atlas/data
    reference-downloader.sh --mapper bowtie --gender male

    Some of the files are large in size and may take a long time to download.
    Mapper: Bowtie
    -----------------------------Downloading http://genomics.isi.edu/downloads/bowtie_gencode_male_index.tar.bz2
    --2011-05-05 12:19:59-- http://genomics.isi.edu/downloads/bowtie_gencode_male_index.tar.bz2
    Resolving genomics.isi.edu... 128.9.64.219
    Connecting to genomics.isi.edu|128.9.64.219|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 163818363 (156M) [application/x-bzip2]
    Saving to: “bowtie_gencode_male_index.tar.bz2”

    100%[=================================>] 163,818,363 75.5M/s in 2.1s

    2011-05-05 12:20:01 (75.5 MB/s) - “bowtie_gencode_male_index.tar.bz2” saved [163818363/163818363]

    -----------------------------Decompressing file bowtie_gencode_male_index.tar.bz2
    bowtie_gencode_male_index/
    bowtie_gencode_male_index/gencode_male_index.1.ebwt
    bowtie_gencode_male_index/add-index.sh
    bowtie_gencode_male_index/gencode_male_index.rev.1.ebwt
    bowtie_gencode_male_index/gencode_male_index.3.ebwt
    bowtie_gencode_male_index/gencode_male_index.4.ebwt
    bowtie_gencode_male_index/gencode_male_index.2.ebwt
    bowtie_gencode_male_index/gencode_male_index.rev.2.ebwt
    bowtie_gencode_male_index/README
    -----------------------------Registering Index
    Searching for .ebwt files in the ./ directory.
    Registered file /brain/atlas/data/bowtie_gencode_male_index/gencode_male_index.1.ebwt with the resource catalog.
    Registered file /brain/atlas/data/bowtie_gencode_male_index/gencode_male_index.2.ebwt with the resource catalog.
    Registered file /brain/atlas/data/bowtie_gencode_male_index/gencode_male_index.3.ebwt with the resource catalog.
    Registered file /brain/atlas/data/bowtie_gencode_male_index/gencode_male_index.4.ebwt with the resource catalog.
    Registered file /brain/atlas/data/bowtie_gencode_male_index/gencode_male_index.rev.1.ebwt with the resource catalog.
    Registered file /brain/atlas/data/bowtie_gencode_male_index/gencode_male_index.rev.2.ebwt with the resource catalog.
    .
    .
    .
    -----------------------------Finished
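After the downloader finishes, a short check can confirm that the six Bowtie index files listed in the transcript are present. This is a sketch. The function name is illustrative, and it covers only the .ebwt files, not whatever the elided part of the transcript registers.

```shell
#!/bin/sh
# missing_index_files DIR PREFIX: print each expected Bowtie index file that
# is absent from DIR; print nothing if the set is complete.
missing_index_files() {
    for suffix in 1 2 3 4 rev.1 rev.2; do
        f="$1/$2.$suffix.ebwt"
        [ -f "$f" ] || echo "$f"
    done
}

INDEX_DIR="/brain/atlas/data/bowtie_gencode_male_index"
if [ -d "$INDEX_DIR" ]; then
    missing=$(missing_index_files "$INDEX_DIR" gencode_male_index)
    if [ -z "$missing" ]; then
        echo "all Bowtie index files present"
    else
        echo "missing index files:" >&2
        echo "$missing" >&2
    fi
fi
```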

  9. Running the workflow
    Now we are ready to start the workflow.

    # Change into run directory
    cd /brain/atlas

    # FLOW, SAMPLE, and LANE should match the values passed to the
    # add-sample.sh script
    # M represents the gender of the sample.
    # 75 is the sample read length
    # 2 is the allowed mismatch count.
    # Y means use Bowtie mapper.
    run.sh FLOW SAMPLE 1 M 75 2 1 .htm Y

    2011.05.05 12:30:11.807 PDT:
    ---------------------
    File for submitting this DAG to Condor : FLOW_SAMPLE_1-0.dag.condor.sub
    Log of DAGMan debugging messages : FLOW_SAMPLE_1-0.dag.dagman.out
    Log of Condor library output : FLOW_SAMPLE_1-0.dag.lib.out
    Log of Condor library error messages : FLOW_SAMPLE_1-0.dag.lib.err
    Log of the life of condor_dagman itself : FLOW_SAMPLE_1-0.dag.dagman.log

    -no_submit given, not submitting DAG to Condor. You can do this with:
    "condor_submit FLOW_SAMPLE_1-0.dag.condor.sub"
    ---------------------
    Submitting job(s).
    1 job(s) submitted to cluster 1.

    Your Workflow has been started and runs in base directory given below

    cd /brain/atlas/runs/tutorial/pegasus/FLOW_SAMPLE_1/run0001

    *** To monitor the workflow you can run ***

    pegasus-status -l /brain/atlas/runs/tutorial/pegasus/FLOW_SAMPLE_1/run0001

    *** To remove your workflow run ***
    pegasus-remove -d 1.0
    or
    pegasus-remove /brain/atlas/runs/tutorial/pegasus/FLOW_SAMPLE_1/run0001

    Time taken to execute is 3.969 seconds
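Because the flowcell, sample, and lane must match the values given to add-sample.sh, keeping them in named variables reduces the chance of a mismatch. The sketch below only prints the command line. The variable names are illustrative, and the arguments this guide does not fully describe are carried over verbatim as one opaque chunk.

```shell
#!/bin/sh
# Values passed to add-sample.sh earlier; they must match here.
FLOWCELL="FLOW"
SAMPLE="SAMPLE"
LANE=1
GENDER="M"        # gender of the sample
READ_LENGTH=75    # sample read length
USE_BOWTIE="Y"    # Y selects the Bowtie mapper

# Sixth to eighth arguments, copied verbatim from the guide's command line;
# the guide does not describe all of them, so they are left untouched.
OTHER_ARGS="2 1 .htm"

# run_command: print the run.sh invocation built from the variables above.
run_command() {
    echo "run.sh $FLOWCELL $SAMPLE $LANE $GENDER $READ_LENGTH $OTHER_ARGS $USE_BOWTIE"
}

run_command
```

The printed line is meant to be run inside the VM from /brain/atlas, as in the step above.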

  10. Monitoring workflow progress
    Run the following command periodically to monitor the progress of the workflow. Continue monitoring until the workflow status shows “COMPLETED”.

    pegasus-status runs/tutorial/pegasus/FLOW_SAMPLE_1/run0001/
    05/03/11 11:54:23  Done  Pre  Queued  Post  Ready  Un-Ready  Failed
    05/03/11 11:54:23   ===  ===     ===   ===    ===       ===     ===
    05/03/11 11:54:23     0    0       1     0      0       211       0

    WORKFLOW STATUS : RUNNING | 0/212 ( 0% ) | (condor processing workflow)
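The periodic check can be scripted. The sketch below assumes the status line keeps the "WORKFLOW STATUS : <STATE> | ..." layout shown above. It stops polling as soon as the state is anything other than RUNNING, and it reports success only for COMPLETED. The function names are illustrative.

```shell
#!/bin/sh
# workflow_state: read pegasus-status output on stdin and print the word that
# follows "WORKFLOW STATUS :" (for example RUNNING or COMPLETED).
workflow_state() {
    sed -n 's/^[[:space:]]*WORKFLOW STATUS : \([A-Za-z_-]*\).*/\1/p' | head -n 1
}

# wait_for_workflow RUN_DIR [SECONDS]: poll until the state is no longer
# RUNNING; the exit status is 0 only if the final state is COMPLETED.
wait_for_workflow() {
    interval="${2:-60}"
    while :; do
        state=$(pegasus-status "$1" | workflow_state)
        echo "$(date '+%H:%M:%S') state: ${state:-unknown}"
        [ "$state" = "RUNNING" ] || break
        sleep "$interval"
    done
    [ "$state" = "COMPLETED" ]
}
```

For example, `wait_for_workflow /brain/atlas/runs/tutorial/pegasus/FLOW_SAMPLE_1/run0001 120` would poll every two minutes.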

  11. Output files
    The output files are generated in the storage folder:

    cd /brain/atlas/storage/tutorial/pegasus/FLOW_SAMPLE_1/run0001/
    # Execute ls command to list the output files created by the workflow.
    ls -l

You can use your own samples as input for the workflow. The instructions for using your own samples are documented here.