News Flash!

Dec 18th, 2014

We also have a version running on the USC HPCC cluster that potential users can try for free. To gain access to that instance, send email to gtfar-devel@isi.edu.

Oct 15th, 2014

We will be at ASHG 2014 in San Diego. Come and see our poster “Genome and Transcriptome Free Analysis of RNA-Seq Data using cloud computing” during the poster sessions on Sunday, Oct 19th (5:00PM-6:00PM) in Room 1410S.

The Pegasus GT-FAR cloud solution will also be demoed at the iSeqTools workshop “iSeqTools to Demystify the Cloud and Genomics Analysis for Researchers Seeking Ways to Analyze High-Throughput DNA Sequencing Data” on Monday, Oct 20th, 2014, 12:30-2:00PM in Room 24, Upper Level.

This website highlights the various RNA-seq pipelines developed by USC and the Pegasus team at USC/ISI, with funding support from the NIH.

rSEQ Project (2012-current)

This is an NIH-funded project, “Robust and Portable Workflow-based Tools for mRNA re-sequencing” (NIH/NHGRI 1U01 HG006531-01, PIs: Ting Chen, Ewa Deelman, and James Knowles). It is a component of the NHGRI iSeqTools Network, which includes research groups from the Broad Institute of MIT, Harvard Medical School, Washington University, Scripps Institute, University of Michigan, University of Utah, and University of Southern California. We developed two RNA-seq data analysis pipelines: GT-FAR and RseqFlow.

  • GT-FAR is a Pegasus-powered, reference-free RNA-seq data analysis pipeline that includes functions for RNA-seq quality, read alignment, expression level quantification, gene differential expression, and variant calling. GT-FAR sequentially aligns reads to gene models, predicts and validates new splice junctions, and quantifies expression for each gene, exon, and known or novel splice junction. A critical feature of GT-FAR is its Genome and Transcriptome Free Analysis of RNA (GT-FAR) module, which quantifies reads following lightweight assembly. GT-FAR is built on top of Pegasus, with a custom web-based interface to an Amazon-based solution that allows investigators to start an EC2 instance via the GUI, upload inputs from a local machine, track running workflows, and get outputs from S3.
  • RseqFlow is a lightweight, Unix command-line RNA-seq data analysis pipeline that uses little memory and hard disk space, making it well suited to desktop computers. It provides functions for computing pre-alignment and post-alignment statistics for quality control, calculating expression levels for genes, exons, and splice junctions, identifying differentially expressed genes, calling coding region variants, and converting files to different formats.


BrainSpan Project (2009-2012)

The BrainSpan project was funded by the NIH from 2009 to 2012. It sought to determine when and where in the brain a gene is expressed. This information holds clues to potential causes of disease. A recent study found that forms of a gene associated with schizophrenia are over-expressed in the fetal brain. To make such discoveries about what is abnormal, scientists first need to know what the normal patterns of gene expression are during development. To this end, the National Institute of Mental Health (NIMH), part of the National Institutes of Health (NIH), funded the creation of TADHB. To map human brain “transcriptomes”, researchers identify the composition of intermediate products, called transcripts or messenger RNAs, which translate genes into proteins throughout development.

As part of this project, we enabled geneticists to analyze over 225 human brain RNA sequences using two different mapping algorithms, CASAVA ELAND and PerM. More information about this effort can be found here.