Running DE Workflow

This part performs the differentially expressed (DE) gene analysis. Different methods are used for conditions with replicates (more than one sample for at least one condition) and without replicates (only one sample per condition).

The input for the with-replicate case is the number of reads falling into each gene (this file can be produced by the RSeq part or by the user’s own method). The format is:

ZDHHC16 chr10   +       205
CTNNA3  chr10   -       6
CCDC147 chr10   +       19
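
A quick format check can catch malformed rows before a file is registered. The following is a minimal sketch, not part of the DE workflow: it assumes the four whitespace-separated columns are gene name, chromosome, strand (+ or -), and read count, which is inferred from the sample rows above. The function name check_gene_counts is hypothetical.

```shell
# check_gene_counts FILE
# Hypothetical helper: prints every line that does not look like
# "GENE CHROMOSOME STRAND COUNT" and returns non-zero if any were found.
check_gene_counts() {
    awk '
        NF == 0 { next }    # ignore blank lines
        NF != 4 || ($3 != "+" && $3 != "-") || $4 !~ /^[0-9]+$/ {
            printf "line %d: %s\n", NR, $0
            bad++
        }
        END { exit (bad > 0) }    # exit status 1 when at least one bad line
    ' "$1"
}
```

For example, `check_gene_counts GeneRead_s1.txt` prints nothing when every row matches. A strand written with a typographic dash instead of a plain hyphen would be reported.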

The input for the without-replicate case is the number of reads falling into each exon (this file can be produced by the RSeq part or by the user’s own method). The format is:

STOM:124118330:124118434        1
STOM:124116878:124116951        3
UBAP2:33922576:33922597  1
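
The exon file can be checked in the same way. This sketch assumes each row is a NAME:NUMBER:NUMBER key followed by a read count; reading the two numbers as exon start and end coordinates is an inference from the sample rows, not something the workflow documents. The function name check_exon_counts is hypothetical.

```shell
# check_exon_counts FILE
# Hypothetical helper: prints every line that does not look like
# "NAME:NUMBER:NUMBER COUNT" and returns non-zero if any were found.
check_exon_counts() {
    awk '
        NF == 0 { next }    # ignore blank lines
        NF != 2 || $1 !~ /^[^:]+:[0-9]+:[0-9]+$/ || $2 !~ /^[0-9]+$/ {
            printf "line %d: %s\n", NR, $0
            bad++
        }
        END { exit (bad > 0) }    # exit status 1 when at least one bad line
    ' "$1"
}
```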

Workflow Input File(s)

  • The DE workflow accepts Gene Read Counts and/or Exon Read Counts as input.
  • The input files can be obtained by running the RSeq workflow on two or more samples.
  • If you are using your own input files, you must import them into the Virtual Machine before using them with the workflow.

Obtaining Input File(s)
Sample input files are provided here. Additionally, you may use your own files as input.

To use any input file with the DE workflow, you need to download it into the Virtual Machine.

Steps to run the DE Workflow
The steps below assume that you are using your own input files, named GeneRead_s1.txt, GeneRead_s2.txt, and GeneRead_s3.txt.

  1. Place your input files in the folder that you have shared with the Virtual Machine.
  2. Inside the VM: open a Terminal.
  3. Copy your files into the VM.

    cp /brain/shared/GeneRead_s1.txt /brain/de/data/
    cp /brain/shared/GeneRead_s2.txt /brain/de/data/
    cp /brain/shared/GeneRead_s3.txt /brain/de/data/

  4. Registering Input File(s)
    To register the input file, execute the add-de-file.sh script.
    Usage: add-de-file.sh <-i FILE> [-r RC_FILE]
    -i, --input-file Location of DE input file
    -r, --rc-file Location of rc-file (Default: /brain/de/config/rc.data)

    To register the input files copied to /brain/de/data/, execute

    add-de-file.sh -i /brain/de/data/GeneRead_s1.txt
    add-de-file.sh -i /brain/de/data/GeneRead_s2.txt
    add-de-file.sh -i /brain/de/data/GeneRead_s3.txt

  5. Execute the Workflow

    cd /brain/de

    To start the workflow, execute the run-de.sh script.
    Usage: run-de.sh --c1 FILE --c2 FILE --uid ID [--with-replicate]
    --c1, -1 Filename for condition 1
    --c2, -2 Filename for condition 2
    --uid Unique ID
    --with-replicate, -r Run the workflow with replicate (Default: Without Replicate)
    Examples

    The following command will run the DE workflow in with-replicate mode. In with-replicate mode multiple input file(s) can be passed for each condition.

    run-de.sh --c1 GeneRead_s1.txt --c1 GeneRead_s2.txt --c2 GeneRead_s3.txt \
    --uid s1s2_s3 --with-replicate

  6. Monitoring the Workflow
    1. To monitor the status of the running workflow, execute

      pegasus-status -l /brain/de/runs/tutorial/pegasus/s1s2_s3/run0001
      // Sample Output
      05/03/11 11:54:23  Done   Pre  Queued  Post  Ready  Un-Ready  Failed
      05/03/11 11:54:23   ===   ===     ===   ===    ===       ===     ===
      05/03/11 11:54:23     0     0       1     0      0       115       0

      WORKFLOW STATUS : RUNNING | 0/116 ( 0% ) | (condor processing workflow)

    2. Continue checking the status until the workflow has completed.
    3. To verify that all jobs were successful, execute

      pegasus-analyzer -i /brain/de/runs/tutorial/pegasus/s1s2_s3/run0001
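
With many input files, steps 3 and 4 can be wrapped in a small loop. The following is a sketch only: stage_de_inputs, SHARED_DIR, DATA_DIR, and DRY_RUN are hypothetical names, the paths are the ones used in the steps above, and it assumes the registration script is on the PATH under the name add-de-file.sh given in its usage line.

```shell
# Paths taken from the steps above; override them if your VM differs.
SHARED_DIR=${SHARED_DIR:-/brain/shared}
DATA_DIR=${DATA_DIR:-/brain/de/data}

# stage_de_inputs FILE...
# Hypothetical helper: copies each named file from the shared folder into the
# DE data directory and registers it. With DRY_RUN=1 the commands are only
# printed, which is useful for checking paths before anything is touched.
stage_de_inputs() {
    for name in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "cp $SHARED_DIR/$name $DATA_DIR/"
            echo "add-de-file.sh -i $DATA_DIR/$name"
        else
            cp "$SHARED_DIR/$name" "$DATA_DIR/" || return 1
            add-de-file.sh -i "$DATA_DIR/$name" || return 1
        fi
    done
}
```

For the three example files, this would be called as `stage_de_inputs GeneRead_s1.txt GeneRead_s2.txt GeneRead_s3.txt`. The function stops at the first copy or registration that fails.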

Location of Output
Output is located in the /brain/de/storage/tutorial/pegasus/s1s2_s3/run0001 directory.
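
Waiting for the run to finish before looking at the output directory can also be scripted. This is a rough sketch under one assumption: that pegasus-status prints a "WORKFLOW STATUS : <STATE> | ..." line as in the sample output shown in the monitoring step. The exact text may differ between Pegasus versions, and workflow_state and wait_for_workflow are hypothetical helper names.

```shell
# workflow_state
# Reads pegasus-status output on stdin and prints the word that follows
# "WORKFLOW STATUS :" (for example RUNNING). Prints nothing if no such line.
workflow_state() {
    sed -n 's/^[[:space:]]*WORKFLOW STATUS[[:space:]]*:[[:space:]]*\([A-Za-z]*\).*/\1/p' | tail -n 1
}

# wait_for_workflow RUN_DIR [INTERVAL_SECONDS]
# Polls pegasus-status until the reported state is no longer RUNNING, then
# prints the last state seen (empty if the status line could not be read).
wait_for_workflow() {
    run_dir=$1
    interval=${2:-60}
    while :; do
        state=$(pegasus-status -l "$run_dir" | workflow_state)
        [ "$state" = "RUNNING" ] || break
        sleep "$interval"
    done
    echo "$state"
}
```

A possible use is `wait_for_workflow /brain/de/runs/tutorial/pegasus/s1s2_s3/run0001` followed by `ls /brain/de/storage/tutorial/pegasus/s1s2_s3/run0001`. Whatever state is printed, pegasus-analyzer remains the way to confirm that all jobs succeeded.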