Quickstart – DE Workflow

In this quick start guide we will show you how to set up the virtual machine and run the DE workflow using a sample dataset.

  1. VirtualBox: You will need VirtualBox to run the virtual machine on your computer. If you already have VirtualBox installed, you can use that installation. Otherwise, download the binaries from the VirtualBox website and install them.
  2. Download the Virtual Machine: The virtual machine image is around 1.0 GB in size. We recommend downloading it with a command-line tool such as “wget”, because downloading the image through a browser sometimes corrupts it. Download Brain VM
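As a rough sketch of the command-line download recommended in step 2, the following shell functions resume an interrupted transfer and test the archive before it is unpacked. The function names are hypothetical, and the URL in the usage comment is only a placeholder for the real “Download Brain VM” link. The archive name Brain.tar.bz2 is taken from the decompression command in step 3.

```shell
# fetch_vm URL: download the VM archive, resuming a partial file if one exists.
# (Hypothetical helper; the real URL is the "Download Brain VM" link.)
fetch_vm() {
    wget -c "$1"
}

# verify_archive FILE: succeed only if FILE passes the bzip2 integrity test.
verify_archive() {
    bzip2 -t "$1"
}

# Usage (placeholder URL, replace it with the real download link):
#   fetch_vm "http://example.org/Brain.tar.bz2" && verify_archive Brain.tar.bz2
```

If verify_archive fails, the download is probably damaged and should be repeated before continuing.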
  3. Configuring the VM
    1. Uncompress the image
      On Linux, execute the following command:

      tar jxvf Brain.tar.bz2

      On Windows, use an archive tool such as WinRAR.

    2. Run VirtualBox
    3. Click on New
    4. Click Next
    5. Enter Details as shown
    6. Click Next
    7. Specify a Base Memory Size and Click Next. The RSEQ workflow uses the PerM mapper, which requires a large amount of physical memory; the recommended amount is 16384 MB. The RSEQ workflow also supports the Bowtie mapper, which requires 4096 MB of memory.
      Note: The computer on which the VM is deployed needs more memory (RAM) than the amount assigned to the VM: more than 16384 MB for PerM, or more than 4096 MB for Bowtie.
    8. Click Next
    9. Click Choose existing hard disk
    10. Select the Debian-6-x86.vmdk file from the decompressed folder
    11. Click Next
    12. Click Finish
    13. Select BrainVM and Click Settings
    14. Click on Storage -> SATA Controller
    15. Click Add Hard Disk
    16. Click Choose Existing Disk
    17. Select the Brain.vmdk file from the decompressed folder and Click Open
    18. Setup Shared Folder:
      1. Click on Settings
      2. Click on Shared Folder, Click Add
      3. Click on Folder Path, Click Other and select any folder that you want to access inside the VM
      4. Specify “Folder Name” as self-data
      5. Click OK
      6. Click OK
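The GUI steps in step 3 can probably also be scripted with VirtualBox's VBoxManage command-line tool. The sketch that follows only prints the commands so they can be reviewed first; it is not part of the original instructions. The function name vm_setup_commands is hypothetical, and the OS type "Debian", the use of an IDE controller for the boot disk, and the port numbers are assumptions. The VM name, disk file names, shared folder name and memory sizes come from the steps themselves.

```shell
# vm_setup_commands MAPPER DISK_DIR SHARED_DIR
# Prints (does not run) VBoxManage commands that roughly mirror the GUI steps.
#   MAPPER     perm or bowtie; selects the base memory size
#   DISK_DIR   folder holding the decompressed .vmdk files
#   SHARED_DIR host folder to expose inside the VM as "self-data"
vm_setup_commands() {
    mapper=$1
    disk_dir=$2
    shared_dir=$3
    case "$mapper" in
        perm)   memory_mb=16384 ;;   # recommended for the PerM mapper
        bowtie) memory_mb=4096 ;;    # required by the Bowtie mapper
        *) echo "unknown mapper: $mapper" >&2; return 1 ;;
    esac
    # Assumptions: OS type "Debian", boot disk on an IDE controller, port 0.
    cat <<EOF
VBoxManage createvm --name BrainVM --ostype Debian --register
VBoxManage modifyvm BrainVM --memory $memory_mb
VBoxManage storagectl BrainVM --name "IDE Controller" --add ide
VBoxManage storageattach BrainVM --storagectl "IDE Controller" --port 0 --device 0 --type hdd --medium "$disk_dir/Debian-6-x86.vmdk"
VBoxManage storagectl BrainVM --name "SATA Controller" --add sata
VBoxManage storageattach BrainVM --storagectl "SATA Controller" --port 0 --device 0 --type hdd --medium "$disk_dir/Brain.vmdk"
VBoxManage sharedfolder add BrainVM --name self-data --hostpath "$shared_dir"
EOF
}
```

For example, vm_setup_commands bowtie /path/to/Brain /path/to/shared prints the commands for a 4096 MB VM (both paths are placeholders); once they look right, the output can be piped to sh.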
  4. Click Start to start your VM
  5. Download the Sample dataset
    The sample DE datasets that we will use for this exercise are already present inside the virtual machine.

    • With Replicate – /brain/de/data/sample_1_gene
    • Without Replicate – /brain/de/data/sample_1_exon
  6. Running the workflow
    Now we are ready to start the workflow.

    # Change into run directory
    cd /brain/de

    # Run DE in without-replicate mode
    # The ExonReadCountNRR_s*_all.txt files are located in the sample_1_exon directory
    run-de.sh --c1 ExonReadCountNRR_s1_all.txt --c2 ExonReadCountNRR_s3_all.txt --uid s1_s3

    # Run DE in with-replicate mode
    # The GeneReadCountNRR_*_all.txt files are located in the sample_1_gene directory
    run-de.sh --c1 GeneReadCountNRR_s1_all.txt --c1 GeneReadCountNRR_s3_all.txt \
        --c2 GeneReadCountNRR_m1_all.txt --uid s1s3_m1 --with-replicate
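The two invocations in step 6 follow one pattern: one --c1 flag per file of the first condition, one --c2 flag per file of the second, then --uid. The hypothetical helper de_command sketches that pattern by assembling the command line from two file lists. It rests on two assumptions not stated in this guide: that --with-replicate is needed whenever a condition has more than one file, and that --c2 may be repeated in the same way as --c1.

```shell
# de_command UID C1_FILES C2_FILES
# C1_FILES and C2_FILES are space-separated lists of read-count file names
# (names must not contain spaces). Prints the run-de.sh command line.
de_command() {
    uid=$1
    cmd="run-de.sh"
    c1_count=0
    for f in $2; do
        cmd="$cmd --c1 $f"
        c1_count=$((c1_count + 1))
    done
    c2_count=0
    for f in $3; do
        cmd="$cmd --c2 $f"
        c2_count=$((c2_count + 1))
    done
    cmd="$cmd --uid $uid"
    # Assumption: replicate mode applies when any condition has several files.
    if [ "$c1_count" -gt 1 ] || [ "$c2_count" -gt 1 ]; then
        cmd="$cmd --with-replicate"
    fi
    echo "$cmd"
}
```

Calling de_command s1_s3 "ExonReadCountNRR_s1_all.txt" "ExonReadCountNRR_s3_all.txt" prints the without-replicate command from step 6.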

  7. Monitoring workflow progress
    Run the following command periodically to monitor the progress of the workflow. Continue monitoring until the workflow status shows “COMPLETED”.

    # Without Replicate
    pegasus-status -l /brain/de/runs/tutorial/pegasus/s1_s3/run0001
    # With Replicate
    pegasus-status -l /brain/de/runs/tutorial/pegasus/s1s3_m1/run0001
    # Sample Output
    05/03/11 11:54:23 Done Pre Queued Post Ready Un-Ready Failed
    05/03/11 11:54:23 ===  === === === === === ===
    05/03/11 11:54:23   0   0   1   0   0 211   0

    WORKFLOW STATUS : RUNNING | 0/212 ( 0% ) | (condor processing workflow)
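Rather than rerunning the status command by hand, the check in step 7 could be wrapped in a small polling loop. This is a minimal sketch: the function names workflow_state and wait_for_workflow, the 60-second interval and the poll limit are all assumptions. It relies only on the “WORKFLOW STATUS :” line having the form shown in the sample output.

```shell
# workflow_state: read pegasus-status output on stdin and print the state
# word that follows "WORKFLOW STATUS :" (for example RUNNING or COMPLETED).
workflow_state() {
    sed -n 's/^[[:space:]]*WORKFLOW STATUS[[:space:]]*:[[:space:]]*\([A-Za-z_]*\).*/\1/p' | tail -n 1
}

# wait_for_workflow RUN_DIR [INTERVAL_SECONDS] [MAX_POLLS]
# Polls pegasus-status until the state is COMPLETED (returns 0) or the
# poll limit is reached (returns 1).
wait_for_workflow() {
    run_dir=$1
    interval=${2:-60}
    max_polls=${3:-1440}
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        state=$(pegasus-status -l "$run_dir" | workflow_state)
        echo "$(date '+%H:%M:%S') workflow status: ${state:-unknown}"
        if [ "$state" = "COMPLETED" ]; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
    return 1
}

# Usage:
#   wait_for_workflow /brain/de/runs/tutorial/pegasus/s1_s3/run0001
```

The loop stops only on COMPLETED, so if a workflow fails it keeps polling until the limit is reached. Check the printed status lines if the run takes longer than expected.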

  8. Output files
    The output files are generated in the storage folder.

    # Without Replicate
    cd /brain/de/storage/tutorial/pegasus/s1_s3/run0001
    # With Replicate
    cd /brain/de/storage/tutorial/pegasus/s1s3_m1/run0001
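The monitoring and output paths in steps 7 and 8 differ only in the runs/storage component and the value passed to --uid. The hypothetical helpers de_run_dir and de_storage_dir sketch how both could be derived from a UID. They assume the directory name always matches the --uid value and that the first run is named run0001, as in both examples in this guide.

```shell
# Base directories taken from the paths shown in this guide.
DE_RUNS_BASE=/brain/de/runs/tutorial/pegasus
DE_STORAGE_BASE=/brain/de/storage/tutorial/pegasus

# de_run_dir UID [RUN]: directory to pass to pegasus-status -l.
de_run_dir() {
    echo "$DE_RUNS_BASE/$1/${2:-run0001}"
}

# de_storage_dir UID [RUN]: directory holding the output files.
de_storage_dir() {
    echo "$DE_STORAGE_BASE/$1/${2:-run0001}"
}

# Usage:
#   ls -l "$(de_storage_dir s1_s3)"
```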

You can use your own samples as input for the workflow. The instructions for using your own samples are documented here.