Running Scientific Workflow Applications on the Amazon EC2 Cloud

Bruce Berriman, NASA Exoplanet Science Institute, IPAC
Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Information Sciences Institute, USC
Benjamin Berman, USC Epigenome Center
Phil Maechling, Southern California Earthquake Center
Clouds (Utility Computing)
Pay for what you use, rather than purchasing compute and storage resources that end up underutilized; analogous to household utilities.
Originated in the business domain, to provide services for small companies that did not want to maintain an IT department.
Provided by data centers that are built on compute and storage virtualization technologies.
Clouds are built with commodity hardware; they are a "new purchasing paradigm" rather than a new technology.
Benefits and Concerns
Benefits
Pay only for what you need
Elasticity - increase or decrease capacity within minutes
Ease strain on local physical plant
Control local system administration costs
Concerns
What if they become oversubscribed and user cannot increase capacity on demand?
How will the cost structure change with time?
If we become dependent on them, will we be at the cloud providers’ mercy?
Are clouds secure?
Are they up to the demands of science applications?
Cloud Providers
Pricing structures vary widely:
Amazon EC2 charges for hourly usage
Skytap charges per month
IBM requires an annual subscription
Savvis offers servers for purchase
Uses:
Running business applications
Web hosting
Provide additional capacity for heavy loads
Application testing
Providers:
Amazon.com EC2
AT&T Synaptic Hosting
GNi Dedicated Hosting
IBM Computing on Demand
Rackspace Cloud Servers
Savvis Open Cloud
ServePath GoGrid
Skytap Virtual Lab
3Tera
Unisys Secure
Verizon Computing
Zimory Gateway
Source: InformationWeek, 9/4/09
Purposes of Our Study
How useful is cloud computing for scientific workflow applications?
An experimental study of the performance of three workflows with different I/O, memory and CPU requirements on a commercial cloud
A comparison of the performance of cloud resources and typical HPC resources, and
An analysis of the various costs associated with running workflows on a commercial cloud.
Clouds are well suited to processing workflows. Workflows are loosely-coupled applications composed of tasks connected by data; resources can be allocated as needed for processing tasks, decreasing scheduling overheads.
Chose the Amazon EC2 cloud and the NCSA Abe cluster:
http://aws.amazon.com/ec2/
http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/Intel64Cluster/
The Applications: Montage
[Figure: Montage processing flow. Input images (Image1, Image2, Image3) are reprojected (Project), overlaps are differenced and fitted (Diff, Fitplane), a background model is computed and applied (BgModel, Background), and the results are co-added (Add).]
Montage processing stages: reprojection, background rectification, co-addition.
Science Grade - preserves spatial and calibration fidelity of input images.
Portable – all common *nix platforms
Open source code
General – all common coords and image projections
Speed – Processes 40 million pixels in 32 min on 128 nodes of a 1.2 GHz Linux cluster
Utilities for managing and manipulating image files
Stand-alone modules
http://montage.ipac.caltech.edu
Toolkit for assembling FITS images into science-grade mosaics.
The Applications: Broadband and Epigenome
Broadband simulates and compares seismograms from earthquake simulation codes.
Generates high- and low-frequency earthquakes for several sources
Computes intensities of seismograms at measuring stations.
Epigenome maps short DNA segments collected using high-throughput gene sequencing machines to a reference genome.
Maps chunks to a reference genome
Produces an output map of gene density compared with the reference genome
Comparison of Resource Usage
Ran an 8 deg sq mosaic of M17 in the 2MASS J-band
Workflow contains 10,429 tasks
Reads 4.2 GB of input data
Produces 7.9 GB of output data.
Montage is I/O-bound because it spends more than 95% of its time in I/O operations.
Application  I/O     Memory  CPU
Montage      High    Low     Low
Broadband    Medium  High    Medium
Epigenome    Low     Medium  High
Comparison of Resource Usage
Broadband
4 sources and 5 stations
Workflow contains 320 tasks
6 GB of input data and 160 MB of output data.
Memory-limited because more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory
Epigenome
Workflow contains 81 tasks,
1.8 GB of input data
300 MB of output data.
CPU-bound because it spends 99% of its runtime in the CPU and only 1% on I/O and other activities.
Processing Resources
Networks and File Systems
HPC systems use high-performance networks and parallel file systems, BUT
Amazon EC2 uses commodity hardware.
Ran all processes on single, multi-core nodes. Used both the local and the parallel file systems on Abe.
Processors and OS
Red Hat Enterprise Linux with VMWare
Amazon EC2 offers different instances – look at cost vs. performance
c1.xlarge and abe.local are equivalent – used to estimate the overhead due to virtualization
abe.lustre and abe.local differ only in file system
Type        Arch    CPU                  Cores  Memory  Network             Storage  Price
Amazon:
m1.small    32-bit  2.0-2.6 GHz Opteron  1/2    1.7 GB  1-Gbps Ethernet     Local    $0.10/hr
m1.large    64-bit  2.0-2.6 GHz Opteron  2      7.5 GB  1-Gbps Ethernet     Local    $0.40/hr
m1.xlarge   64-bit  2.0-2.6 GHz Opteron  4      15 GB   1-Gbps Ethernet     Local    $0.80/hr
c1.medium   32-bit  2.33-2.66 GHz Xeon   2      1.7 GB  1-Gbps Ethernet     Local    $0.20/hr
c1.xlarge   64-bit  2.0-2.66 GHz Xeon    8      7.5 GB  1-Gbps Ethernet     Local    $0.80/hr
Abe:
abe.local   64-bit  2.33 GHz Xeon        8      8 GB    10-Gbps InfiniBand  Local    …
abe.lustre  64-bit  2.33 GHz Xeon        8      8 GB    10-Gbps InfiniBand  Lustre   …
Execution Environment
Established equivalent software environments on the two platforms. A "submit" host was used to send jobs to EC2 or Abe. All workflows used the Pegasus Workflow Management System with DAGMan and Condor:
Pegasus – transforms abstract workflow descriptions into concrete plans
DAGMan – manages dependencies
Condor – manages task execution
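As a concrete illustration of the DAGMan layer, a minimal DAG input file for a tiny Montage-like run might look as follows. This is a hypothetical sketch (the job names and .sub submit files are invented for illustration), not output generated by Pegasus:

```
# mosaic.dag -- hypothetical DAGMan file for a tiny Montage-like workflow
# (job names and submit files are invented for illustration)
JOB project1 project1.sub
JOB project2 project2.sub
JOB diff1 diff1.sub
JOB add add.sub
# diff1 starts only after both reprojections finish; add runs last
PARENT project1 project2 CHILD diff1
PARENT diff1 CHILD add
```

DAGMan releases each job to Condor only when all of its parents have completed, which is how the data dependencies between workflow tasks are enforced.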
Montage Performance (I/O Bound)
Slowest on m1.small, but fastest on those machines with the most cores: m1.xlarge, c1.xlarge and abe.lustre, abe.local.
The parallel file system on abe.lustre offers a big performance advantage for I/O-bound applications – cloud providers would need to offer parallel file systems and high-speed networks to compete.
Virtualization overhead <10%
Broadband Performance (Memory bound)
Lower I/O requirements – not much difference between abe.lustre and abe.local; both have 8 GB memory. Only slightly worse performance on c1.xlarge, 7.5 GB memory.
Poor performance on c1.medium – only 1.7 GB of memory. Cores may sit idle to prevent the system from running out of memory.
Virtualization overhead small
Epigenome Performance (CPU Bound)
c1.xlarge, abe.lustre and abe.local give best performance – they are the three most powerful machines (64-bit, 2.3-2.6 GHz)
The parallel file system on abe.lustre offers little benefit.
Virtualization overhead is roughly 10%, the largest of the three apps – tasks compete with the OS for the CPU.
Resource Cost Analysis
You get what you pay for!
The cheapest instances are the least powerful.
Instance   Cost ($/hr)
m1.small   0.10
m1.large   0.40
m1.xlarge  0.80
c1.medium  0.20
c1.xlarge  0.80
c1.medium is a good choice for Montage, but more powerful processors are better for the other two applications.
Data Transfer Costs
Operation     Cost ($/GB)
Transfer In   0.10
Transfer Out  0.17
For Broadband and Epigenome, it is economical to transfer data out of the cloud. For Montage, the output is larger than the input, so the cost to transfer data out is equal to or higher than the processing cost for all but one instance type.
Is it more economical to store data on the cloud?
Data volumes:
Application  Input (GB)  Output (GB)  Logs (MB)
Montage      4.2         7.9          40
Broadband    4.1         0.16         5.5
Epigenome    1.8         0.3          3.3

Transfer costs:
Application  Input  Output  Logs    Total
Montage      $0.42  $1.32   <$0.01  $1.75
Broadband    $0.40  $0.03   <$0.01  $0.43
Epigenome    $0.18  $0.05   <$0.01  $0.23
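The transfer charges follow directly from the per-GB rates. A short Python sketch of the arithmetic, using the volumes above (the printed figures differ from the raw products by a cent or two, presumably due to rounding on the slide):

```python
# Data-transfer cost arithmetic for the three workflows.
# Rates are the published charges from the slide: $0.10/GB in, $0.17/GB out.
RATE_IN, RATE_OUT = 0.10, 0.17  # $/GB

volumes = {  # application: (input GB, output GB), from the table above
    "Montage":   (4.2, 7.9),
    "Broadband": (4.1, 0.16),
    "Epigenome": (1.8, 0.3),
}

for app, (gb_in, gb_out) in volumes.items():
    cost_in, cost_out = gb_in * RATE_IN, gb_out * RATE_OUT
    print(f"{app}: in ${cost_in:.2f}, out ${cost_out:.2f}, "
          f"total ${cost_in + cost_out:.2f}")
```

For Montage this makes the asymmetry concrete: the 7.9 GB of output costs more to ship out than the 4.2 GB of input costs to ship in.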
Storage Costs

Storage charges:
Item                     Charge ($/GB-month)
Storage of VMs in S3     0.15
Storage of data in EBS   0.10

End-to-end costs (figures are for Montage):
Item          Low Cost ($)  High Cost ($)
Transfer In   0.42          0.42
Processing    0.55          2.45
Storage       1.07          1.07
Transfer Out  …             1.32
Totals        2.04          5.22

Storage costs of output per job:
Application  Data ($)  VM ($)  Monthly Cost ($)
Montage      0.95      0.12    1.07
Broadband    0.02      0.10    0.12
Epigenome    0.20      0.10    0.32
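The monthly figures can be reproduced from the two storage rates. In this sketch the gigabyte volumes (a 0.8 GB VM image, 9.5 GB of data) are back-calculated assumptions that happen to reproduce Montage's row, not values stated on the slide:

```python
# Monthly storage cost: VM image kept in S3 plus workflow data kept in EBS.
S3_RATE = 0.15   # $/GB-month for VM images in S3
EBS_RATE = 0.10  # $/GB-month for data in EBS

def monthly_storage_cost(vm_image_gb, data_gb):
    return vm_image_gb * S3_RATE + data_gb * EBS_RATE

# Assumed volumes: 0.8 GB image -> $0.12 VM charge, 9.5 GB data -> $0.95,
# matching Montage's $1.07/month total.
print(f"${monthly_storage_cost(0.8, 9.5):.2f}")
```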
… And the bottom line
Most cost-effective model?
15 Xeon 3.2- GHz dual processor dual-core Dell 2650 Power Edge servers
Aberdeen Technologies 6-TB staging disk farm
Dell PowerVault MD1200 storage disks
(Columns are three scenarios: transfer the input data in for each run; store the 2MASS data on the cloud; run the local IPAC service.)

Item               Transfer In ($)  Store 2MASS ($)  IPAC Service ($)
Transfer In        7,560            3,780            …
Store input data   17,100           61,500           13,200
Processing         9,000            9,000            66,000
Transfer Data Out  25,560           25,560           …
Cost $/job         1.65             2.75             2.20
Assumes 1,000 2MASS mosaics of 4 deg sq centered on M17 per month for 3 years, and the c1.medium instance on Amazon EC2.
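Under these assumptions the per-job figures are simply the three-year totals divided by 36,000 jobs, as a Python sketch shows (the middle scenario comes out a few cents above the slide's $2.75, likely rounding on the slide):

```python
# Cost per mosaic under the three scenarios: 1,000 jobs/month for 3 years.
JOBS = 1000 * 12 * 3  # 36,000 mosaics

scenarios = {  # three-year totals in $, summed from the table above
    "Transfer In":  7560 + 17100 + 9000 + 25560,   # ship 2MASS data in per run
    "Store 2MASS":  3780 + 61500 + 9000 + 25560,   # keep 2MASS data on the cloud
    "IPAC Service": 13200 + 66000,                 # local hardware, no transfers
}

for name, total in scenarios.items():
    print(f"{name}: ${total / JOBS:.2f}/job")
```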
Conclusions

Clouds can be used effectively and fairly efficiently for scientific applications. The virtualization overhead is low.
The high speed network and parallel file systems give HPC clusters a significant performance advantage over cloud computing for I/O bound applications.
On Amazon EC2, the primary cost for Montage is data transfer; processing is the primary cost for Broadband and Epigenome.
Amazon EC2 offers no dramatic cost benefits over a locally mounted image-mosaic service.
Reference: G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. P. Berman, and P. Maechling, "Scientific Workflow Applications on Amazon EC2," Cloud Computing Workshop in Conjunction with e-Science, Oxford, UK: IEEE, 2009.