Pegasus GT-FAR on AWS

Instructions

These are instructions for starting the Pegasus GT-FAR Amazon EC2 virtual image:

  1. Get an AWS Account
  2. Start Pegasus GT-FAR EC2 Instance
  3. Run GT-FAR

1. Get an Amazon Web Services (AWS) Account

You can register for an AWS account from AWS Sign Up.

You will need a credit card. AWS bills monthly, and charges by the hour.

2. Log in to the AWS Management Console

The AWS Management Console is a web application that lets you manage cloud resources.

We will refer to this web application as the “console”, and we will refer to the links on the left side of the console as “areas”.

Go to: http://console.aws.amazon.com

Click “Sign in to the AWS console”

Change the region on the upper-left side of the console to “US West (Oregon)”.

region_selection

VERY IMPORTANT: For this pipeline, please use region “US West (Oregon)”. Otherwise you won’t be able to find our public pegasus-gtfar VM image.

3. Go to the EC2 dashboard from the AWS console and generate access keys

Note: This step needs to be performed only once.

Click on your username at top right.

Click on Security Credentials.

Click Users.

Click Create New Users.

access_key_1

Specify any username, such as ‘pegasus-gtfar’.

Ensure that Generate Access Key for each user is checked.

access_key_2

Click Create.

Click Download Credentials to save these credentials locally.

access_key_3
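If you want to use the downloaded credentials from a script, a minimal sketch for reading them is shown here. It assumes the file is a CSV with a header row that contains columns named "Access Key Id" and "Secret Access Key" (matched case-insensitively); check the header of your own file, since the exact layout is an assumption. The function name `read_access_keys` and the sample values are illustrative placeholders, not real keys.

```python
import csv
import io


def read_access_keys(csv_text):
    """Return (access_key_id, secret_access_key) from the first data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Normalise header names so different capitalisations still match.
        fields = {
            key.strip().lower(): value.strip()
            for key, value in row.items()
            if key is not None and value is not None
        }
        return fields["access key id"], fields["secret access key"]
    raise ValueError("credentials file contains no data rows")


# Placeholder content in the assumed layout (not real credentials).
sample = (
    "User Name,Access Key Id,Secret Access Key\n"
    "pegasus-gtfar,AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n"
)
access_key, secret_key = read_access_keys(sample)
```

To use it with the real file, pass `open("credentials.csv").read()` (the file name is whatever you saved the download as). Keep this file private, since anyone holding these keys can create resources that are billed to your account.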

The next steps can be done using a dashboard (recommended), or manually through the Amazon EC2 dashboard.

4. Using Dashboard:

Go to https://pegasus-gtfar.isi.edu/

Provide the AWS Access Key and AWS Secret Key you generated in Step 3.

If you have not previously created a key pair, click “1. Create Key Pair”.

If you have not previously created a security group, click “2. Create Security Group”.

Enter the name of the key pair you created in the AWS Key Pair field.

Provide a password in the GT-FAR Password field so that the EC2 instance is accessible only by you. If no password is provided, it defaults to pegasus123.

Click “3. Start EC2 Instance”.

Wait a few minutes for the Amazon EC2 instance to initialize, then click on the URL shown in the “EC2 Instance Details” section.

When prompted to log in, enter the username and password shown in the “EC2 Instance Details” section.

Go to Step 7.

ec2-dash

OR


4. Manually: Go to the EC2 dashboard from the AWS console home and create a key-pair

Note: This step needs to be performed only once.

These are the credentials you use to log into Pegasus GT-FAR EC2 nodes.

Go to the “Key Pairs” area in the console

Click “Create Key Pair”

Call it “pegasus-gtfar-ec2-keypair” and click OK.

A download box should pop up. Save the file. We will refer to this file as KEYPAIR.

key_pair_creation

5. From the console, create a security group under Network and Security

Note: This step needs to be performed only once.

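A Security Group allows you to authorize outside machines to access your AWS EC2 nodes.

We will assume your submit host is “host.example.com”, and that it has an IP of “192.168.1.1”. The security group we create here will give “host.example.com” unrestricted access to your nodes. Note that the inbound rules listed in this step use a source of 0.0.0.0/0, which opens the ports to every address, not only to “host.example.com”.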

Click on “Security Groups” area in the console.

Click “Create Security Group”.

sg_1

Call your new group “pegasus-gtfar-sg” and add a description.

Click on the Inbound tab and add entries as follows.

Type      Protocol   Port Range   Source
All TCP   TCP        0 – 65535    0.0.0.0/0
All UDP   UDP        0 – 65535    0.0.0.0/0

sg_2
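For readers who prefer scripting this step, the sketch below expresses the same two inbound rules in the dictionary shape that the boto3 EC2 client accepts for `authorize_security_group_ingress`. The helper names `inbound_rules` and `create_group`, the description text, and the use of boto3 at all are assumptions for illustration; the console steps above remain the documented path.

```python
def inbound_rules(source_cidr="0.0.0.0/0"):
    """Build the two inbound rules (all TCP, all UDP) as boto3-style dicts."""
    return [
        {
            "IpProtocol": protocol,
            "FromPort": 0,
            "ToPort": 65535,
            "IpRanges": [{"CidrIp": source_cidr}],
        }
        for protocol in ("tcp", "udp")
    ]


def create_group(name="pegasus-gtfar-sg", region="us-west-2"):
    """Create the group and attach the rules. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so inbound_rules() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    group = ec2.create_security_group(
        GroupName=name, Description="Pegasus GT-FAR access"
    )
    ec2.authorize_security_group_ingress(
        GroupId=group["GroupId"], IpPermissions=inbound_rules()
    )
    return group["GroupId"]
```

Passing a narrower source, for example `inbound_rules("192.168.1.1/32")` for the submit host assumed in this step, would limit access to that single address instead of the whole internet.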

  1. Launch the Pegasus GT-FAR EC2 public image

We are going to use a pre-configured image. It contains the GT-FAR web dashboard, Pegasus, and HTCondor.

Click on AMIs in the Images area of the console.

Click on Filter, and select “Public images”

In the search box enter “Pegasus GTFAR v1.0” and press Enter.

Select the Pegasus GT-FAR image and click Launch.

launch_1

Select an instance type that has at least 16 GB of memory. We recommend using m3.2xlarge.
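As a small illustration of the memory requirement, the sketch below filters the m3 instance family by memory size. The memory figures are the values AWS published for these types; the helper name `types_with_enough_memory` is hypothetical, and other instance families that meet the threshold are not listed here.

```python
# Memory in GiB for the m3 instance family, as published by AWS.
M3_MEMORY_GIB = {
    "m3.medium": 3.75,
    "m3.large": 7.5,
    "m3.xlarge": 15.0,
    "m3.2xlarge": 30.0,
}


def types_with_enough_memory(minimum_gib=16, catalog=M3_MEMORY_GIB):
    """Return the instance types whose memory meets the minimum, sorted by name."""
    return sorted(name for name, memory in catalog.items() if memory >= minimum_gib)


suitable = types_with_enough_memory()
```

Within this family only m3.2xlarge clears the 16 GB threshold, since m3.xlarge offers 15 GiB.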

Click “Next: Configure Instance Details.”

launch_2

Click Advanced Details and enter text as follows. S3_ACCESS_KEY and S3_SECRET_KEY should be the access key and secret key generated in Step 3. GTFAR_PASSWD is the password you want to use to access the GT-FAR dashboard. If no password is specified, it defaults to pegasus123.

S3_ACCESS_KEY=<YOUR_AWS_ACCESS_KEY>

S3_SECRET_KEY=<YOUR_AWS_SECRET_KEY>

GTFAR_PASSWD=<PASSWORD>

launch
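If you launch instances repeatedly, the three lines can be generated rather than typed. The sketch below is one way to do that; the function name `build_user_data` is hypothetical, and rejecting whitespace in the values is a cautious assumption, since how the image parses these lines is not described here. The default password mirrors the documented default of pegasus123.

```python
def build_user_data(access_key, secret_key, password="pegasus123"):
    """Return the KEY=VALUE text to paste into the Advanced Details box."""
    values = {
        "S3_ACCESS_KEY": access_key,
        "S3_SECRET_KEY": secret_key,
        "GTFAR_PASSWD": password,
    }
    for name, value in values.items():
        # Assumption: values with spaces or newlines could break the KEY=VALUE lines.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must be non-empty and contain no whitespace")
    return "".join(f"{name}={value}\n" for name, value in values.items())
```

For example, `build_user_data("<YOUR_AWS_ACCESS_KEY>", "<YOUR_AWS_SECRET_KEY>", "<PASSWORD>")` reproduces the three lines shown above. Choosing your own password rather than relying on the default is advisable, because the default is public.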

Click “Next: Add storage” and configure as shown in the screenshot.

launch_4

Click “Next: Tag Instance”, then click “Next: Configure Security Group”.

Click “Select an existing security group”.

Search for and select the “pegasus-gtfar-sg” group that we created earlier in Step 5.

launch_5

Click “Review and Launch”.

Click “Launch”.

Next, select the key-pair we created earlier in Step 4, and click “Launch Instance”.

VERY IMPORTANT: Select the security group and key-pair you created earlier, or else it won’t work.

6. Log on to the GT-FAR web dashboard

Click on Instances in the “Instances” area of the console.

You should see the instance you just launched go from “pending” to “running”. You may need to hit “Refresh” a couple times.

Once the status changes to “running”, select the instance and copy the Public DNS value shown in the details section. It should look something like this: ec2-54-214-166-116.us-west-2.compute.amazonaws.com.

In the address bar of your browser, enter https://<PUBLIC_DNS>

Note: The instance uses a self-signed certificate, so you will see an untrusted certificate error. Simply proceed.

This should bring up the GT-FAR dashboard running on your instance on Amazon. If the page doesn’t load, wait a few minutes for the EC2 instance to be configured before retrying.
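The wait-and-retry advice can also be scripted. The sketch below builds the dashboard URL from the Public DNS name and polls it until the server answers. The function names, the retry counts, and the choice to treat any HTTP response (including an error status such as a password prompt) as "up" are assumptions for illustration. Certificate verification is switched off only because the instance uses a self-signed certificate.

```python
import ssl
import time
import urllib.error
import urllib.request


def dashboard_url(public_dns):
    """Return the HTTPS URL of the dashboard for a given Public DNS name."""
    return f"https://{public_dns.strip()}"


def wait_for_dashboard(public_dns, attempts=10, delay_seconds=30):
    """Poll the dashboard until it responds; return True if it came up."""
    # The instance presents a self-signed certificate, so skip verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = dashboard_url(public_dns)
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10, context=context):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(delay_seconds)  # not reachable yet; try again later
    return False
```

A call such as `wait_for_dashboard("ec2-54-214-166-116.us-west-2.compute.amazonaws.com")` would retry for roughly five minutes with the default settings before giving up.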


7. Start a GT-FAR pipeline analysis

On the page opened in your browser (https://<PUBLIC_DNS>), click on “Start GT-FAR Run”.

main-page-start-button

Upload your input reads file, specify other options as required, and click Submit.
Important: The input file must be a gzip-compressed FASTQ file with an extension of ‘.fastq.gz’ or ‘.fq.gz’.

start-run-page-filled
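Before uploading a large reads file, it can save time to check it locally against the requirement above. The sketch below tests the file extension, confirms the file decompresses as gzip text, and checks that the first line starts with "@", as FASTQ record headers do. The function name `check_reads_file` is hypothetical, and the dashboard may apply different or stricter checks.

```python
import gzip

ALLOWED_SUFFIXES = (".fastq.gz", ".fq.gz")


def check_reads_file(path):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    if not path.endswith(ALLOWED_SUFFIXES):
        problems.append("extension is not .fastq.gz or .fq.gz")
    try:
        # Reading one line forces gzip to validate the header and first block.
        with gzip.open(path, "rt") as handle:
            first_line = handle.readline()
    except (OSError, EOFError, UnicodeDecodeError) as error:
        problems.append(f"not readable as gzip text: {error}")
    else:
        if not first_line.startswith("@"):
            problems.append("first line does not start with '@'")
    return problems
```

Only the first line is read, so the check stays fast even for multi-gigabyte inputs; it does not validate the whole file.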

You can either track the progress of the run in the browser or wait for an email notification (if an email address was specified before submitting the run). Output files are available for download as soon as they are generated by the run.

Note: The email notification may end up in your Junk folder.

details-page-running

8. Download Outputs

Your output files are also saved on S3 and can be accessed directly from the Amazon S3 Console.

OR

You can view the outputs from the GT-FAR dashboard.

details-page-outputs

9. IMPORTANT: Shut down your instance

In the “Instances” area of the console, click on the running instance and select “Instance Actions” -> “Terminate”.

VERY IMPORTANT: Amazon keeps charging until the status is terminated.
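To double-check that nothing is left running, the sketch below lists instances whose state is not yet "terminated", given a response in the shape returned by the boto3 EC2 client's `describe_instances` call. The helper names `not_terminated` and `terminate`, and the use of boto3, are assumptions for illustration; the console remains the documented way to terminate.

```python
def not_terminated(response):
    """Return (instance_id, state) pairs for instances not yet terminated."""
    remaining = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance["State"]["Name"]
            if state != "terminated":
                remaining.append((instance["InstanceId"], state))
    return remaining


def terminate(instance_id, region="us-west-2"):
    """Request termination and report what is still pending. Needs boto3."""
    import boto3  # imported lazily so not_terminated() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.terminate_instances(InstanceIds=[instance_id])
    return not_terminated(ec2.describe_instances(InstanceIds=[instance_id]))
```

Right after a termination request the instance typically reports "shutting-down", so an empty result may only appear after checking again a little later.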

10. IMPORTANT: Amazon S3

Pegasus GT-FAR pre-computes some files for faster processing on future runs, and also stores output and intermediate files on Amazon S3, for which you will be charged. To ensure you are not charged for S3 storage, download any outputs you want to keep and then delete the S3 bucket whose name starts with pegasus-gtfar.
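The cleanup can be sketched in code as well. The example below picks out bucket names that start with pegasus-gtfar and, only when `dry_run` is set to False, empties and deletes them through the boto3 S3 resource API. The function names and the dry-run default are assumptions for illustration, and a bucket with versioning enabled would need its object versions removed as well, which this sketch does not handle.

```python
def gtfar_buckets(bucket_names, prefix="pegasus-gtfar"):
    """Return the bucket names that start with the GT-FAR prefix."""
    return [name for name in bucket_names if name.startswith(prefix)]


def delete_gtfar_buckets(dry_run=True):
    """List matching buckets and delete them unless dry_run is True. Needs boto3."""
    import boto3  # imported lazily so gtfar_buckets() works without boto3

    s3 = boto3.resource("s3")
    names = gtfar_buckets(bucket.name for bucket in s3.buckets.all())
    for name in names:
        if not dry_run:
            bucket = s3.Bucket(name)
            bucket.objects.all().delete()  # a bucket must be empty before deletion
            bucket.delete()
    return names
```

Running it first with the default `dry_run=True` shows which buckets would be removed, which is a useful check because deletion cannot be undone.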