
Running Scientific Workflow Applications on the Amazon EC2 Cloud


Page 1: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Running Scientific Workflow Applications on the Amazon EC2 Cloud

Bruce Berriman, NASA Exoplanet Science Institute, IPAC
Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Information Sciences Institute, USC
Benjamin Berman, USC Epigenome Center
Phil Maechling, Southern California Earthquake Center

Page 2: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Clouds (Utility Computing)

Pay for what you use, rather than purchasing compute and storage resources that end up underutilized; analogous to household utilities.

Originated in the business domain to provide services for small companies that did not want to maintain an IT department.

Provided by data centers that are built on compute and storage virtualization technologies.

Clouds are built with commodity hardware; they are a “new purchasing paradigm” rather than a new technology.

Page 3: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Benefits and Concerns

Benefits

Pay only for what you need

Elasticity - increase or decrease capacity within minutes

Ease strain on local physical plant

Control local system administration costs

Concerns

What if they become oversubscribed and users cannot increase capacity on demand?

How will the cost structure change with time?

If we become dependent on them, will we be at the cloud providers’ mercy?

Are clouds secure?

Are they up to the demands of science applications?

Page 4: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Cloud Providers

Pricing structures vary widely:

Amazon EC2 charges for hourly usage

Skytap charges per month

IBM requires an annual subscription

Savvis offers servers for purchase

Uses:

Running business applications

Web hosting

Provide additional capacity for heavy loads

Application testing

Providers:

Amazon.com EC2
AT&T Synaptic Hosting
GNi Dedicated Hosting
IBM Computing on Demand
Rackspace Cloud Servers
Savvis Open Cloud
ServePath GoGrid
Skytap Virtual Lab
3Tera
Unisys Secure
Verizon Computing
Zimory Gateway

Source: Information Week, 9/4/09

Page 5: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Purposes of Our Study

How useful is cloud computing for scientific workflow applications?

An experimental study of the performance of three workflows with different I/O, memory and CPU requirements on a commercial cloud

A comparison of the performance of cloud resources and typical HPC resources, and

An analysis of the various costs associated with running workflows on a commercial cloud.

Clouds are well suited to processing workflows. Workflows are loosely-coupled applications composed of tasks connected by data dependencies; resources can be allocated as needed to process tasks and reduce scheduling overheads (see the sketch below).

Chose the Amazon EC2 cloud and the NCSA Abe cluster.

http://aws.amazon.com/ec2/
http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/Intel64Cluster/
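As a minimal sketch of what “tasks connected by data” means in practice, the hypothetical Python fragment below models a workflow as a DAG whose edges are data dependencies; all task names are illustrative, not taken from the applications studied here.

from collections import defaultdict

# A workflow is a set of tasks plus data dependencies between them:
# an edge A -> B means B consumes a file that A produces.
class Workflow:
    def __init__(self):
        self.children = defaultdict(set)   # task -> downstream tasks
        self.parents = defaultdict(set)    # task -> upstream tasks

    def add_dependency(self, producer, consumer):
        self.children[producer].add(consumer)
        self.parents[consumer].add(producer)

    def ready_tasks(self, done):
        """Tasks whose inputs are all available - these are the ones
        that can be scheduled onto nodes as resources are allocated."""
        all_tasks = set(self.children) | set(self.parents)
        return {t for t in all_tasks
                if t not in done and self.parents[t] <= done}

# Illustrative three-stage pipeline (names are hypothetical).
w = Workflow()
w.add_dependency("stage_in", "compute")
w.add_dependency("compute", "stage_out")
print(w.ready_tasks(done=set()))           # {'stage_in'}
print(w.ready_tasks(done={"stage_in"}))    # {'compute'}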

Page 6: Running Scientific Workflow Applications on the Amazon EC2 Cloud

The Applications: Montage

[Diagram: Montage processing flow - three input images (Image1-3) are reprojected (Project), overlapping projections are differenced (Diff) and fit with planes (Fitplane), a global background model (BgModel) drives per-image background rectification (Background), and the rectified images are co-added (Add).]

Three stages: reprojection, background rectification, co-addition.

Science Grade - preserves spatial and calibration fidelity of input images.

Portable – all common *nix platforms

Open source code

General – supports all common coordinate systems and image projections

Speed – processes 40 million pixels in 32 min on 128 nodes of a 1.2 GHz Linux cluster

Utilities for managing and manipulating image files

Stand-alone modules

http://montage.ipac.caltech.edu

Toolkit for assembling FITS images into science-grade mosaics.
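The fan-out/fan-in shape of that flow is easy to see if the diagram is written out as data-dependency edges. A minimal sketch, simplified in that real Montage workflows create Diff/Fitplane jobs per overlapping image pair rather than per image:

# (producer, consumer) edges for the flow in the diagram above.
images = ["Image1", "Image2", "Image3"]
edges = []
for img in images:
    edges += [
        (f"Project_{img}",    f"Diff_{img}"),
        (f"Diff_{img}",       f"Fitplane_{img}"),
        (f"Fitplane_{img}",   "BgModel"),            # fan-in to the model
        ("BgModel",           f"Background_{img}"),  # fan-out to rectify
        (f"Background_{img}", "Add"),                # co-add into mosaic
    ]
print(f"{len(edges)} dependencies for {len(images)} input images")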

Page 7: Running Scientific Workflow Applications on the Amazon EC2 Cloud

The Applications: Broadband and Epigenome

Broadband simulates and compares seismograms from earthquake simulation codes.

Generates high- and low-frequency seismograms for several earthquake sources

Computes intensities of seismograms at measuring stations.

Epigenome maps short DNA segments collected using high-throughput gene sequencing machines to a reference genome.

Maps chunks to a reference genome

Produces an output map of gene density compared with the reference genome

Page 8: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Comparison of Resource Usage

Ran an 8-square-degree mosaic of M17 in the 2MASS J-band

Workflow contains 10,429 tasks

Reads 4.2 GB of input data

Produces 7.9 GB of output data.

Montage is I/O-bound because it spends more than 95% of its time in I/O operations.

Application  I/O     Memory  CPU
Montage      High    Low     Low
Broadband    Medium  High    Medium
Epigenome    Low     Medium  High

Page 9: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Comparison of Resource Usage

Broadband

4 sources and 5 stations

Workflow contains 320 tasks

6 GB of input data and 160 MB of output data.

Memory-limited because more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory

Epigenome

Workflow contains 81 tasks

1.8 GB of input data

300 MB of output data.

CPU-bound because it spends 99% of its runtime in the CPU and only 1% on I/O and other activities.

Application  I/O     Memory  CPU
Montage      High    Low     Low
Broadband    Medium  High    Medium
Epigenome    Low     Medium  High

Page 10: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Processing Resources

Networks and File Systems

HPC systems use high-performance networks and parallel file systems, BUT

Amazon EC2 uses commodity hardware.

Ran all processes on single, multi-core nodes. Used both the local and the parallel (Lustre) file systems on Abe.

Processors and OS

Red Hat Enterprise Linux with VMware

Amazon EC2 offers different instances – look at cost vs. performance

c1.xlarge and abe.local are nearly equivalent – used to estimate the overhead due to virtualization

abe.lustre and abe.local differ only in file system

Type        Arch    CPU                  Cores  Memory  Network             Storage  Price
m1.small    32-bit  2.0-2.6 GHz Opteron  1/2    1.7 GB  1-Gbps Ethernet     Local    $0.10/hr
m1.large    64-bit  2.0-2.6 GHz Opteron  2      7.5 GB  1-Gbps Ethernet     Local    $0.40/hr
m1.xlarge   64-bit  2.0-2.6 GHz Opteron  4      15 GB   1-Gbps Ethernet     Local    $0.80/hr
c1.medium   32-bit  2.33-2.66 GHz Xeon   2      1.7 GB  1-Gbps Ethernet     Local    $0.20/hr
c1.xlarge   64-bit  2.0-2.66 GHz Xeon    8      7.5 GB  1-Gbps Ethernet     Local    $0.80/hr
abe.local   64-bit  2.33 GHz Xeon        8      8 GB    10-Gbps InfiniBand  Local    …
abe.lustre  64-bit  2.33 GHz Xeon        8      8 GB    10-Gbps InfiniBand  Lustre   …

(First five rows: Amazon EC2 instances; last two: Abe.)

Page 11: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Execution Environment

Establish equivalent software environments on the two platforms. A “submit” host sends jobs to EC2 or Abe. All workflows used the Pegasus Workflow Management System with DAGMan and Condor:

Pegasus – transforms abstract workflow descriptions into concrete plans
DAGMan – manages dependencies
Condor – manages task execution

[Diagram: submit host dispatching workflow jobs to Amazon EC2 and to Abe]
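To make the division of labor concrete, here is a sketch of the kind of DAGMan input file that sits beneath a planned workflow; the job and submit-file names are hypothetical, and real Pegasus-planned DAGs also contain data-staging and cleanup jobs.

# Hypothetical, minimal DAGMan input file of the kind produced by
# planning; DAGMan releases each job to Condor once its parents finish.
dag = """\
JOB project   project.sub
JOB fitplane  fitplane.sub
JOB add       add.sub
PARENT project  CHILD fitplane
PARENT fitplane CHILD add
"""
with open("example.dag", "w") as f:
    f.write(dag)
# Submit with: condor_submit_dag example.dag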

Page 12: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Montage Performance (I/O-Bound)

Slowest on m1.small; fastest on the machines with the most cores: m1.xlarge, c1.xlarge, abe.lustre, and abe.local.

The parallel file system on abe.lustre offers a big performance advantage for I/O-bound applications – cloud providers would need to offer parallel file systems and high-speed networks to match it.

Virtualization overhead <10%

Page 13: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Broadband Performance (Memory bound)

Lower I/O requirements – not much difference between abe.lustre and abe.local; both have 8 GB of memory. Only slightly worse performance on c1.xlarge, which has 7.5 GB.

Poor performance on c1.medium – only 1.7 GB of memory. Cores may sit idle to keep the system from running out of memory.

Virtualization overhead small

Page 14: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Epigenome Performance (CPU-Bound)

c1.xlarge, abe.lustre, and abe.local give the best performance – they are the three most powerful machines (64-bit, 2.3-2.6 GHz)

The parallel file system on abe.lustre offers little benefit.

Virtualization overhead is roughly 10%, the largest of the three applications – the application competes with the OS for the CPU.

Page 15: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Resource Cost

Analysis: you get what you pay for! The cheapest instances are the least powerful.

Instance   Cost ($/hr)
m1.small   0.10
m1.large   0.40
m1.xlarge  0.80
c1.medium  0.20
c1.xlarge  0.80

c1.medium is a good choice for Montage, but the more powerful processors are better for the other two applications.

Page 16: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Data Transfer Costs

Operation     Cost ($/GB)
Transfer In   0.10
Transfer Out  0.17

For Broadband and Epigenome it is economical to transfer data out of the cloud. For Montage, the output is larger than the input, so the cost of transferring data out equals or exceeds the processing cost on all but one instance type.

Is it more economical to store data on the cloud?

Data volumes:

Application  Input (GB)  Output (GB)  Logs (MB)
Montage      4.2         7.9          40
Broadband    4.1         0.16         5.5
Epigenome    1.8         0.3          3.3

Transfer costs:

Application  Input  Output  Logs    Total
Montage      $0.42  $1.32   <$0.01  $1.75
Broadband    $0.40  $0.03   <$0.01  $0.43
Epigenome    $0.18  $0.05   <$0.01  $0.23
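The cost table is just volume times rate; a short sketch reproducing it from the posted prices. The small mismatches (e.g., 7.9 GB x $0.17/GB ≈ $1.34 vs. the slide's $1.32) are presumably rounding in the underlying figures.

RATE_IN, RATE_OUT = 0.10, 0.17  # $/GB, from the pricing table above

volumes = {  # (input GB, output GB), from the data-volume table
    "Montage":   (4.2, 7.9),
    "Broadband": (4.1, 0.16),
    "Epigenome": (1.8, 0.3),
}
for app, (gb_in, gb_out) in volumes.items():
    cost_in, cost_out = gb_in * RATE_IN, gb_out * RATE_OUT
    print(f"{app}: in ${cost_in:.2f}, out ${cost_out:.2f}, "
          f"total ${cost_in + cost_out:.2f}")
# e.g. Montage: in $0.42, out $1.34, total $1.76 (slide: $1.75)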

Page 17: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Storage Costs

Storage charges:

Item                         Charge ($)
Storage of VMs in S3 disk    0.15/GB-month
Storage of data in EBS disk  0.10/GB-month

Storage costs of output, per job:

Application  Data ($)  VM ($)  Monthly Cost ($)
Montage      0.95      0.12    1.07
Broadband    0.02      0.10    0.12
Epigenome    0.20      0.10    0.32

… And the bottom line (Montage):

Item              Low Cost ($)  High Cost ($)
Transfer Data In  0.42          0.42
Processing        0.55          2.45
Storage           1.07          1.07
Transfer Out      …             1.32
Totals            2.04          5.22
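The bottom line is the sum of the component costs; a sketch for Montage, where the low case keeps the output in the cloud (no transfer out) and the high case uses the most expensive processing instance. Note the high case sums to $5.26 against the slide's $5.22, presumably rounding in the underlying figures.

# Montage bottom line from the component costs above.
transfer_in, storage = 0.42, 1.07            # $ per job, $ per month
processing   = {"low": 0.55, "high": 2.45}   # cheapest vs. priciest instance
transfer_out = {"low": 0.00, "high": 1.32}   # low case keeps output in cloud

for case in ("low", "high"):
    total = transfer_in + processing[case] + storage + transfer_out[case]
    print(f"{case}: ${total:.2f}")  # low: $2.04; high: $5.26 (slide: $5.22)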

Page 18: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Most cost-effective model?

Local IPAC service hardware:

15 Xeon 3.2-GHz dual-processor, dual-core Dell PowerEdge 2650 servers

Aberdeen Technologies 6-TB staging disk farm

Dell PowerVault MD1200 storage disks

                   Transfer In ($)  Store 2MASS ($)  IPAC Service ($)
Transfer data in   7,560            3,780            …
Store input data   17,100           61,500           13,200
Processing         9,000            9,000            66,000
Transfer data out  25,560           25,560           …
Cost $/job         1.65             2.75             2.20

Assumes 1,000 2MASS mosaics of 4 sq deg, centered on M17, per month for 3 years, and the c1.medium instance on Amazon EC2.
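The cost-per-job row follows from dividing each scenario's three-year total by the assumed 36,000 jobs (1,000 per month for 36 months); a sketch:

JOBS = 1_000 * 36  # 1,000 mosaics/month for 3 years

totals = {  # 3-year component costs from the table above, in $
    "Transfer In":  7_560 + 17_100 + 9_000 + 25_560,
    "Store 2MASS":  3_780 + 61_500 + 9_000 + 25_560,
    "IPAC Service": 13_200 + 66_000,
}
for scenario, total in totals.items():
    print(f"{scenario}: ${total / JOBS:.2f}/job")
# Roughly $1.65, $2.77 (slide: $2.75), and $2.20 per job.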

Page 19: Running Scientific Workflow Applications on the Amazon EC2 Cloud

Conclusions

Clouds can be used effectively and fairly efficiently for scientific applications. The virtualization overhead is low.

The high-speed networks and parallel file systems give HPC clusters a significant performance advantage over cloud computing for I/O-bound applications.

On Amazon EC2, the primary cost for Montage is data transfer; processing is the primary cost for Broadband and Epigenome.

Amazon EC2 offers no dramatic cost benefits over a locally mounted image-mosaic service.

Reference: G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. P. Berman, and P. Maechling, "Scientific Workflow Applications on Amazon EC2," Cloud Computing Workshop in Conjunction with e-Science, Oxford, UK: IEEE, 2009.