Pegasus WMS

[Figure: Pegasus WMS Architecture]

Pegasus WMS is a scientific workflow management system that manages the execution of complex workflows on distributed resources. Pegasus is funded by the National Science Foundation and has been used in a number of scientific domains, including astronomy, bioinformatics, earthquake science, gravitational wave physics, ocean science, limnology, and others. Pegasus has been used to run workflows ranging from just a few computational tasks up to 1 million tasks.

When errors occur, Pegasus tries to recover where possible by retrying tasks, retrying the entire workflow, providing workflow-level checkpointing, re-mapping portions of the workflow, trying alternative data sources for staging data, and, when all else fails, providing a rescue workflow that describes only the work that remains to be done.
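The retry-then-rescue idea can be illustrated with a minimal Python sketch. This is not Pegasus code: the function `run_with_rescue`, its parameters, and the task names are hypothetical, and the sketch only assumes that each task is a callable that raises an exception on failure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_with_rescue(tasks, deps, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks: dict mapping task name -> zero-argument callable (raises on failure)
    deps:  dict mapping task name -> set of task names it depends on
    Returns (done, rescue): the tasks that completed, and the tasks that
    still remain to be run (a stand-in for a "rescue workflow").
    """
    done = []
    blocked = set()  # tasks that failed, or that depend on a failed task
    graph = {name: deps.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        # A task cannot run if any of its prerequisites did not complete.
        if any(d in blocked for d in deps.get(name, set())):
            blocked.add(name)
            continue
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                done.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    blocked.add(name)  # retries exhausted
    # Only the work that remains goes into the rescue list.
    rescue = [name for name in tasks if name not in done]
    return done, rescue
```

Calling `run_with_rescue` again with only the tasks named in `rescue` would resume the run without repeating completed work, which is the purpose of a rescue workflow in this simplified picture.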

Pegasus cleans up storage as the workflow executes so that data-intensive workflows have enough space to run on storage-constrained resources. It also keeps track of what has been done (provenance), including the locations of data used and produced, and which software was used with which parameters.
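One simple way to picture cleanup during execution is to delete each file right after the last job that touches it. The sketch below is an illustration under that assumption, not the algorithm Pegasus actually uses; `cleanup_schedule`, the job tuples, and the file names are all hypothetical.

```python
def cleanup_schedule(jobs, keep=()):
    """Decide after which job each file can be deleted.

    jobs: list of (job_name, inputs, outputs) in execution order
    keep: files that must survive the run (for example, final results)
    Returns a dict mapping job_name -> sorted list of files that are safe
    to delete once that job has finished.
    """
    last_use = {}
    for name, inputs, outputs in jobs:
        for f in list(inputs) + list(outputs):
            last_use[f] = name  # later jobs overwrite earlier entries

    schedule = {name: [] for name, _, _ in jobs}
    for f, name in last_use.items():
        if f not in keep:
            schedule[name].append(f)
    return {name: sorted(files) for name, files in schedule.items()}


# Hypothetical three-step pipeline; only the final figure is kept.
jobs = [
    ("preprocess", ["raw.dat"], ["clean.dat"]),
    ("analyze", ["clean.dat"], ["stats.dat"]),
    ("plot", ["stats.dat"], ["figure.png"]),
]
schedule = cleanup_schedule(jobs, keep={"figure.png"})
```

With this schedule, the scratch space never needs to hold more than the inputs and outputs of the job currently running, rather than every intermediate file of the whole workflow.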

Pegasus WMS bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources. It automatically locates the input data and computational resources needed for workflow execution. Pegasus enables scientists to construct workflows in abstract terms without worrying about the details of the underlying execution environment or the particulars of the low-level specifications required by the middleware (Condor, Globus, or Amazon EC2). Pegasus WMS also bridges the current cyberinfrastructure by effectively coordinating multiple distributed resources.
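The mapping from an abstract description to something executable can be sketched as a series of lookups. The following is a toy illustration only: the function `plan`, the dictionary layouts, the site names, and the paths are assumptions made for this example and do not reflect the Pegasus API or its planner.

```python
def plan(abstract_jobs, replicas, transformations, sites):
    """Map abstract jobs onto concrete sites, executables, and input copies.

    abstract_jobs:   list of dicts with "name", "transformation", "inputs"
    replicas:        dict logical file name -> list of physical locations
    transformations: dict (transformation, site) -> executable path
    sites:           candidate sites, in order of preference
    """
    concrete = []
    for job in abstract_jobs:
        # Pick the first preferred site where the executable is installed.
        site = next(
            (s for s in sites if (job["transformation"], s) in transformations),
            None,
        )
        if site is None:
            raise LookupError(f"no site can run {job['transformation']!r}")
        concrete.append({
            "name": job["name"],
            "site": site,
            "executable": transformations[(job["transformation"], site)],
            # Inputs without a known replica are assumed to be produced
            # by an earlier job in the same workflow.
            "stage_in": [replicas[f][0] for f in job["inputs"] if f in replicas],
        })
    return concrete


# Hypothetical inputs: the scientist names only logical files and steps.
abstract_jobs = [
    {"name": "analyze", "transformation": "analyze", "inputs": ["raw.dat"]},
]
replicas = {"raw.dat": ["file:///data/raw.dat"]}
transformations = {("analyze", "cluster_b"): "/opt/bin/analyze"}
concrete_jobs = plan(abstract_jobs, replicas, transformations,
                     sites=["cluster_a", "cluster_b"])
```

The point of the separation is that `abstract_jobs` never mentions a site or a physical path, so the same description can be planned against a different set of resources by changing only the lookup tables.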

Features

  1. Pegasus workflows are portable and can run on heterogeneous infrastructures.

  2. Pegasus workflows are reproducible over time.

  3. Pegasus workflows are resilient, and react to failures and performance problems.

  4. Pegasus workflows are shareable with colleagues and the community.

  5. Pegasus workflows are scalable and can handle terabytes of data and millions of tasks.

  6. Pegasus is sophisticated: it can optimize the workflow for performance, handle data management across local and wide area networks, and leverage parallel file systems and object stores.

For more information, visit http://pegasus.isi.edu/wms