The modules in our pipeline were formalized into a workflow that expresses a multi-step RNA-Seq analysis. Given an input RNA-Seq dataset, the workflow executes each module in the appropriate order and outputs the analysis results automatically.
A workflow is an automated process in which data, information, or tasks are passed from one module to another for action according to a set of procedural rules. With a workflow, users can obtain analysis results directly, without running the pipeline step by step and entering commands and arguments for each module. As pipeline builders, we can use the workflow to model, design, execute, debug, reconfigure, and re-run the analysis and visualization pipelines. For users, the workflow not only executes a series of computational or data manipulation steps automatically according to preset rules, but also makes it possible to track the provenance of the execution results, such as the methods used, machine calibrations and parameters, services and databases accessed, and data sets used.
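The idea of dependency-ordered execution with provenance tracking can be illustrated with a minimal sketch. It is not the implementation used in our pipeline: the module names, the `run_workflow` function, and the `threads` parameter are hypothetical and chosen only for illustration, and each step records what would be run instead of launching an analysis tool. The ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical module names (assumptions, not the pipeline's actual modules).
# Each module maps to the set of modules that must finish before it starts.
DEPENDENCIES = {
    "quality_control": set(),
    "alignment": {"quality_control"},
    "quantification": {"alignment"},
    "differential_expression": {"quantification"},
    "visualization": {"differential_expression"},
}


def run_workflow(dependencies, params):
    """Visit each module after its prerequisites and record provenance."""
    provenance = []
    for module in TopologicalSorter(dependencies).static_order():
        # A real module would launch an analysis tool here;
        # this sketch only records the step and its parameters.
        provenance.append(
            {"module": module, "parameters": params.get(module, {})}
        )
    return provenance


# Hypothetical parameter set for one module.
records = run_workflow(DEPENDENCIES, {"alignment": {"threads": 4}})
for record in records:
    print(record["module"], record["parameters"])
```

Because the dependencies are declared once, the user supplies only the input parameters, and the returned records give a simple trace of which steps ran and with which settings.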
The Pegasus Workflow Management System was adopted to manage the workflow's operations. It allows workflows to execute in a range of environments, including desktops, campus clusters, grids, and clouds. Pegasus was deployed in a virtual machine (VM), which spares users a complex installation and configuration. Users who want to run on their own cluster rather than inside the VM can use the VM as a submit machine that submits jobs to the cluster.
The workflow thus allows users to run a multi-step RNA-Seq analysis by simply providing RNA-Seq datasets and typing a few commands in an easy-to-use environment.
Schematic of the workflow