Evaluating Cloud Computing for HPC Applications
Lavanya Ramakrishnan, CRD & NERSC
• What are the unique needs and features of a science cloud?
  – NERSC Magellan user survey
• What applications can efficiently run on a cloud?
  – Benchmarking cloud technologies (Hadoop, Eucalyptus) and platforms (Amazon EC2, Azure)
• Are cloud computing programming models such as Hadoop effective for scientific applications?
  – Experimentation with early applications
• Can scientific applications use a data-as-a-service or software-as-a-service model?
• What are the security implications of user-controlled cloud images?
• Is it practical to deploy a single logical cloud across multiple DOE sites?
• What is the cost and energy efficiency of clouds?
Magellan Research Agenda
Magellan User Survey
Respondents by program office: Advanced Scientific Computing Research 17%, Biological and Environmental Research 9%, Basic Energy Sciences - Chemical Sciences 10%, Fusion Energy Sciences 10%, High Energy Physics 20%, Nuclear Physics 13%, Advanced Networking Initiative (ANI) Project 3%, Other 14%.

Surveyed features (original slide shows a bar chart, axis 0-90%):
• Access to additional resources
• Access to on-demand (commercial) paid resources closer to deadlines
• Ability to control software environments specific to my application
• Ability to share setup of software or experiments with collaborators
• Ability to control groups/users
• Exclusive access to the computing resources / ability to schedule independently of other groups/users
• Easier to acquire/operate than a local cluster
• Cost associativity (i.e., I can get 10 CPUs for 1 hr now or 2 CPUs for 5 hrs at the same cost)
• MapReduce programming model / Hadoop
• Hadoop File System
• User interfaces / science gateways: use of clouds to host science gateways and/or access to cloud resources through science gateways
• Infrastructure as a Service (IaaS)
  – Provides seemingly unlimited access to data storage and compute cycles
  – e.g., Amazon EC2, Eucalyptus
• Platform as a Service (PaaS)
  – Delivery of a computing platform/software stack
  – Containers/images for specific user groups
  – e.g., Hadoop, Azure
• Software as a Service (SaaS)
  – A specific function provided for use across multiple user groups (e.g., science gateways)
Cloud Computing Services
Magellan Software
[Diagram: Magellan software stack on the Magellan Cluster - Batch Queues, Virtual Machines (Eucalyptus), Hadoop, Private Clusters, Public or Remote Cloud, Science and Storage Gateways, with ANI connectivity]
• Web-service API to an IaaS offering (usage sketch follows this slide)
• Uses Xen paravirtualization
  – the Cluster Compute instance type uses hardware-assisted virtualization
• Non-persistent local disk in the VM
• Simple Storage Service (S3)
  – scalable, persistent object store
• Elastic Block Store (EBS)
  – persistent, block-level storage
Amazon Web Services
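A minimal sketch (not from the original slides) of driving this web-service API with the modern boto3 Python library; the AMI ID, instance type, key name, and availability zone are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch one instance; ImageId, InstanceType, and KeyName are placeholders.
    run = ec2.run_instances(
        ImageId="ami-00000000",
        InstanceType="c1.xlarge",
        KeyName="my-keypair",
        MinCount=1,
        MaxCount=1,
    )
    instance_id = run["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # The instance's local disk is non-persistent; create and attach a
    # persistent EBS volume for data that must outlive the instance.
    vol = ec2.create_volume(Size=100, AvailabilityZone="us-east-1a")
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id,
                      Device="/dev/sdf")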
• Open source IaaS implementation
  – API-compatible with Amazon AWS
  – manages virtual machines
• Walrus & Block Storage
  – interfaces compatible with S3 & EBS
• Available to users on the Magellan testbed
• Private virtual clusters
  – scripts to manage dynamic virtual clusters (NFS, Torque, etc.); see the sketch below
Eucalyptus
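Because the Eucalyptus API is EC2-compatible, the dynamic virtual cluster scripts can be sketched (illustratively, not the actual Magellan scripts) with boto3 pointed at the private cloud front end; the endpoint URL, EMI ID, instance type, key name, and node count are all assumptions.

    import boto3

    # EC2-compatible client aimed at the Eucalyptus front end (URL is an assumption).
    euca = boto3.client(
        "ec2",
        endpoint_url="https://magellan-euca.example.gov:8773/services/Eucalyptus",
        region_name="eucalyptus",
    )

    # Start a small virtual cluster from a Eucalyptus machine image (emi-...).
    nodes = euca.run_instances(ImageId="emi-12345678", InstanceType="c1.medium",
                               KeyName="cluster-key", MinCount=5, MaxCount=5)
    ids = [i["InstanceId"] for i in nodes["Instances"]]
    euca.get_waiter("instance_running").wait(InstanceIds=ids)

    # Collect private IPs so a post-boot script can export NFS from the head
    # node and register the remaining nodes with Torque.
    desc = euca.describe_instances(InstanceIds=ids)
    ips = [i["PrivateIpAddress"] for r in desc["Reservations"] for i in r["Instances"]]
    print("\n".join(ips))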
• Platforms
  – Amazon, Azure, Lawrencium (LBNL IT cluster), Magellan
• Interconnect/virtualization configurations: IB, TCP over IB, TCP over Ethernet, VM
• Workloads
  – HPCC
  – NERSC-6 benchmarks
  – application pipelines (JGI, Supernova Factory)
• Metrics
  – performance, cost, reliability, programmability
Virtualization Impact
NERSC-6 Benchmark Performance [1/2]
[Chart: runtime relative to Carver for GAMESS, GTC, IMPACT, fvCAM, and MAESTRO256 on Amazon EC2, Lawrencium, EC2-Beta, EC2-Beta-Opt, Franklin, and Carver]
NERSC-6 Benchmark Performance [2/2]
[Chart: runtime relative to Carver for MILC and PARATEC on Amazon EC2, Lawrencium, EC2-Beta, EC2-Beta-Opt, Franklin, and Carver]
Magellan: NERSC6 Application Benchmarks
[Chart: percentage performance relative to native for GTC, PARATEC, and CAM under TCP over IB, TCP over Ethernet, and VM]
Magellan: HPCC
[Charts: percentage performance relative to native. Latency panel: Ping Pong and RandomRing latency under TCP over IB, TCP over Ethernet, and IB/VM. Bandwidth/compute panel: DGEMM, Ping Pong bandwidth, HPL, and PTRANS under TCP over IB, TCP over Ethernet, and VM]
• James Hamilton's cost model
  – http://perspectives.mvdirona.com/2010/09/18/OverallDataCenterCosts.aspx
  – expanded for HPC environments
• Quantify the difference in cost between IB and Ethernet (worked example below)
Performance-Cost Tradeoffs
Application class                                       Performance increase / cost increase
Tightly coupled with I/O (e.g., CAM)                    4.36
Tightly coupled with minimal I/O (e.g., PARATEC, GTC)   1.2
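One plausible reading of this metric is the ratio of the runtime improvement from InfiniBand to the added cost of the InfiniBand fabric. A small illustrative calculation follows; the runtimes and the 10% cost premium are invented for illustration, and only the resulting ratios come from the slide.

    # Performance increase / cost increase for choosing InfiniBand over Ethernet.
    # Assume (for illustration only) the IB fabric adds 10% to system cost.
    COST_INCREASE = 1.10

    def perf_per_cost(runtime_eth_hours, runtime_ib_hours, cost_increase=COST_INCREASE):
        """Ratio of performance gain to cost increase (values > 1 favor IB)."""
        performance_increase = runtime_eth_hours / runtime_ib_hours
        return performance_increase / cost_increase

    # Tightly coupled, I/O-heavy code (CAM-like): large runtime gain on IB.
    print(perf_per_cost(runtime_eth_hours=9.6, runtime_ib_hours=2.0))   # ~4.36
    # Tightly coupled code with minimal I/O (PARATEC/GTC-like): modest gain.
    print(perf_per_cost(runtime_eth_hours=2.64, runtime_ib_hours=2.0))  # ~1.2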
• Tools to measure the expansion history of the Universe and explore the nature of Dark Energy
  – largest data-volume supernova search
• Data pipeline
  – custom data analysis codes coordinated by Python scripts
  – runs on a standard Linux batch-queue cluster
• Cloud provides
  – control over OS versions
  – root access and a shared "group" account
  – immunity to externally enforced OS or architecture changes
Nearby Supernova Factory
Experiments on Amazon EC2
Experiment matrix:

Input data                               Output data
EBS via NFS                              Local storage to EBS via NFS
Staged to local storage from EBS         Local storage to EBS via NFS
EBS via NFS                              EBS via NFS
Staged to local storage from EBS         EBS via NFS
EBS via NFS                              Local storage to S3
Staged to local storage from S3          Local storage to S3
[Chart: total cost of an experiment (instance cost + storage cost/month) for the experiment matrix configurations EBS-A1, EBS-A2, EBS-B1, EBS-B2, S3-A, and S3-B, broken down into data cost per month (80), instance cost (80), data cost per month (40), and instance cost (40)]
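The total cost in the chart combines instance-hours with monthly storage charges; a hedged sketch of that bookkeeping follows. The function and every rate and usage figure below are placeholders for illustration, not the prices used in the study.

    def experiment_cost(instance_hours, instance_rate,
                        stored_gb, storage_rate_per_gb_month):
        """Cost of one run: compute time plus one month of data storage."""
        compute = instance_hours * instance_rate
        storage = stored_gb * storage_rate_per_gb_month
        return compute, storage, compute + storage

    # Example: 80 instances for 3 hours, 500 GB kept on EBS for a month
    # (all numbers, including the hourly and per-GB rates, are assumptions).
    print(experiment_cost(instance_hours=80 * 3, instance_rate=0.68,
                          stored_gb=500, storage_rate_per_gb_month=0.10))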
Magellan Software
[Architecture diagram repeated from the earlier Magellan Software slide: Batch Queues, Virtual Machines (Eucalyptus), Hadoop, Private Clusters, Public or Remote Cloud, Science and Storage Gateways, and ANI on the Magellan Cluster]
Hadoop Stack
• Open source, reliable, scalable distributed computing
  – implementation of MapReduce
  – Hadoop Distributed File System (HDFS)
[Diagram: Hadoop ecosystem stack - Core and Avro; MapReduce and HDFS; Pig, Chukwa, Hive, HBase; ZooKeeper. Source: Hadoop: The Definitive Guide]
Hadoop for Science
• Advantages of Hadoop
  – transparent data replication and data-locality-aware scheduling
  – fault-tolerance capabilities
• Mode of operation
  – use streaming to launch a script that calls the executable (see the sketch below)
  – HDFS for input; a shared file system is needed for the binary and database
  – input formats must handle multi-line inputs (BLAST sequences) and binary data (High Energy Physics)
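A minimal sketch of this streaming mode of operation (not the actual Magellan scripts; the executable path and one-record-per-line input format are assumptions): a small Python wrapper ships as the mapper and shells out to the scientific binary for each input record read from HDFS via stdin.

    #!/usr/bin/env python
    # wrapper.py: Hadoop streaming mapper that calls a scientific executable
    # once per input line delivered on stdin.
    import subprocess
    import sys

    for line in sys.stdin:
        record = line.rstrip("\n")
        # The executable (and any database it needs) lives on a shared file
        # system rather than in HDFS; the path is an assumption.
        output = subprocess.check_output(["/global/shared/bin/science_code", record])
        sys.stdout.write(output.decode())

A job like this is typically submitted with the streaming jar, e.g. hadoop jar .../hadoop-streaming-*.jar -input /data/in -output /data/out -mapper wrapper.py -file wrapper.py -reducer NONE (the jar location varies by Hadoop release).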
• Compare traditional parallel file systems to HDFS
  – TeraGen and TeraSort used to compare file system performance
  – 32 maps for TeraGen and 64 reduces for TeraSort over a terabyte of data (invocations sketched after the chart)
Hadoop Benchmarking: Early Results [1/2]
[Chart: time in seconds for TeraGen and TeraSort (64 reduces) on GPFS, HDFS, and Lustre]
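For reference, an illustrative way to drive the same TeraGen/TeraSort runs from Python; the examples-jar name, output paths, and 0.20-era property names are assumptions that vary by Hadoop release.

    import subprocess

    # TeraGen: 10^10 100-byte rows (~1 TB) written with 32 map tasks.
    subprocess.check_call([
        "hadoop", "jar", "hadoop-examples.jar", "teragen",
        "-Dmapred.map.tasks=32", "10000000000", "/benchmarks/teragen",
    ])

    # TeraSort: sort the generated data with 64 reduce tasks.
    subprocess.check_call([
        "hadoop", "jar", "hadoop-examples.jar", "terasort",
        "-Dmapred.reduce.tasks=64", "/benchmarks/teragen", "/benchmarks/terasort",
    ])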
Hadoop Benchmarking: Early Results [2/2]
[Chart: write throughput (MB/s) versus number of concurrent writers for 10 MB and 10 GB file sizes]
TestDFSIO was used to understand concurrency behavior at the default block size (invocation sketched below).
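An illustrative TestDFSIO invocation for such a sweep; the test-jar name varies by Hadoop release, and the file count and size shown are assumptions.

    import subprocess

    # Write 60 files of 10 GB each (TestDFSIO takes the size in MB) to measure
    # aggregate write throughput at the default HDFS block size.
    subprocess.check_call([
        "hadoop", "jar", "hadoop-test.jar", "TestDFSIO",
        "-write", "-nrFiles", "60", "-fileSize", "10240",
    ])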
[Diagram: IMG Systems genome & metagenome data flow. Recoverable figures: 5,115 genomes / 6.5 million genes; ~350-500 genomes (~0.5-1 million genes) added every 4 months; +330 genomes including 158 GEBA, 8.2 million genes, on demand; 65 samples from 21 studies (IMG + 2.6 million genes, 9.1 million total) updated monthly; +287 samples from ~105 studies (+12.5 million genes, 19 million total) on demand]
BLAST on Hadoop
• NCBI BLAST (2.2.22)
  – reference: IMG genomes, 6.5 million genes (~3 GB database)
  – full input set: 12.5 million metagenome genes searched against the reference
• BLAST on Hadoop
  – uses streaming to manage the input data sequences (see the mapper sketch below)
  – binary and databases kept on a shared file system
• BLAST task-farming implementation
  – server reads inputs and manages the tasks
  – client runs BLAST, copies the database to local disk or ramdisk once on startup, and pushes back results
  – advantages: fault-resilient, and allows incremental expansion as resources become available
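A hedged sketch of the streaming BLAST mapper described above (not the production NERSC code): the database path and the one-sequence-per-line, tab-separated input format are assumptions; the blastall flags are the legacy NCBI BLAST 2.2.22 interface.

    #!/usr/bin/env python
    # blast_mapper.py: Hadoop streaming mapper for NCBI BLAST 2.2.22.
    import os
    import subprocess
    import sys
    import tempfile

    DB = "/global/shared/img/reference_genes"  # ~3 GB database on shared storage (assumed path)

    for line in sys.stdin:
        seq_id, seq = line.rstrip("\n").split("\t", 1)  # one gene per line (assumed format)
        with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as query:
            query.write(">%s\n%s\n" % (seq_id, seq))
        result = subprocess.check_output(
            ["blastall", "-p", "blastp", "-d", DB, "-i", query.name, "-m", "8"])
        sys.stdout.write(result.decode())
        os.remove(query.name)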
BLAST Performance
[Chart: time in minutes versus number of processors (16, 32, 64, 128) for EC2-taskFarmer, Franklin-taskFarmer, EC2-Hadoop, and Azure]
• Initial configuration
  – Hadoop memory ulimit issues; Hadoop memory limits increased to accommodate high-memory tasks (illustrative settings follow this slide)
  – 1 map per node for high-memory tasks to reduce contention
  – thrashing when the database does not fit in memory
• NFS shared file system for the common database
  – moved the database to the local nodes (copy to local /tmp)
  – the initial copy takes 2 hours, but the BLAST job then completes in under 10 minutes
  – performance is equivalent to other cloud environments
  – future: experiment with DistributedCache
• Time to solution varies: no guarantee of simultaneous availability of resources
BLAST on Yahoo! M45 Hadoop
Strong user-group and sysadmin support was key to working through these issues.
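The memory-related fixes can be expressed as Hadoop 0.20-era configuration properties. The property names below are real, but the values are assumptions for illustration; the slides do not list the exact settings used on M45.

    # Illustrative settings for high-memory streaming tasks (values are assumptions).
    hadoop_memory_tuning = {
        # Raise the per-task JVM heap so the BLAST database fits without thrashing.
        "mapred.child.java.opts": "-Xmx4096m",
        # Raise the per-task virtual-memory ulimit (value in KB; assumed).
        "mapred.child.ulimit": "8388608",
        # One map slot per node to reduce memory contention between tasks.
        "mapred.tasktracker.map.tasks.maximum": "1",
    }
    # The mapred.child.* properties can be set per job (e.g., via -D options);
    # the tasktracker slot limit is a site-level setting. DistributedCache
    # (noted as future work) would push the database to each node instead of
    # the manual copy to local /tmp.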
• Porting applications still requires significant work
• Public clouds
  – virtualization has a performance impact
  – failures when creating large instances
  – data costs tend to be overwhelming
• Eucalyptus
  – learning curve and stability at scale
• Alternate stacks and trying version 2.0
  – SSE instructions were not exposed in the VM
• Additional benchmarking
Summary: Virtualization for Science
• Deployment challenges
  – all jobs run as user "hadoop", affecting file permissions
  – less control over how many nodes are used, which affects allocation policies
  – file system performance for large file sizes
• Programming challenges: no turn-key solution
  – using existing code bases, managing input formats and data
• Performance
  – BLAST over Hadoop: performance is comparable to existing systems
  – existing parallel file systems can be used through Hadoop On Demand
• Additional benchmarking and tuning needed; plug-ins for science
Summary: Hadoop for Science
Acknowledgements
This work was funded in part by the Advanced Scientific Computing Research (ASCR) program in the DOE Office of Science under contract number DE-AC02-05CH11231.

Thanks to CITRIS/UC, Yahoo! M45, Amazon EC2 Education and Research Grants, Microsoft Research, Wei Lu, Dennis Gannon, Masoud Nikravesh, and Greg Bell.

Magellan: Shane Canon, Iwona Sakrejda
Magellan benchmarking: Shane Canon, Nick Wright
EC2 benchmarking: Keith Jackson, Krishna Muriki, John Shalf, Shane Canon, Harvey Wasserman, Shreyas Cholia, Nick Wright
BLAST: Shreyas Cholia, Keith Jackson, Shane Canon, John Shalf
Supernova Factory: Keith Jackson, Rollin Thomas, Karl Runge
Questions? [email protected]