
Scientific Computing With Amazon Web Services


DESCRIPTION

Researchers from around the world are increasingly using AWS for a wide-array of use cases. This presentation describes how AWS facilitates scientific collaboration and powers some of the world's largest scientific efforts, including real-world examples from NASA JPL, the European Space Agency (ESA) and CERN's CMS particle detector.


Page 1: Scientific Computing With Amazon Web Services

Scientific Computing on AWS: NASA/JPL, ESA and CERN

Jamie Kinney
Principal Solutions Architect
World Wide Public Sector
[email protected]
@jamiekinney

1

Page 2: Scientific Computing With Amazon Web Services

How do researchers use AWS today?

Can you run HPC on AWS?

Should everything run on the cloud?

How does AWS facilitate scientific collaboration?

2

Page 3: Scientific Computing With Amazon Web Services

Amazon Web Services

AWS Global Infrastructure

Application Services

Networking

Deployment & Administration

Compute
Storage
Database

3

Page 4: Scientific Computing With Amazon Web Services

Amazon EC2

4

Page 5: Scientific Computing With Amazon Web Services

ec2-run-instances

5

Page 6: Scientific Computing With Amazon Web Services

6

Page 7: Scientific Computing With Amazon Web Services

Programmable

7

Page 8: Scientific Computing With Amazon Web Services

8

Page 9: Scientific Computing With Amazon Web Services

9

Page 10: Scientific Computing With Amazon Web Services

Elastic

10

Page 11: Scientific Computing With Amazon Web Services

Self Hosting

[Diagram: rigid capacity purchased against predicted demand. When actual demand falls below capacity, the gap is waste; when it rises above, the gap is customer dissatisfaction. Elastic capacity tracks actual demand.]

11
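The waste/dissatisfaction trade-off sketched on the slide is easy to make concrete with a toy demand curve; all of the numbers below are invented for illustration:

```python
# Rigid vs. elastic provisioning against a fluctuating demand curve (numbers invented).
demand = [3, 5, 9, 14, 9, 5, 3]    # instance-hours needed per period
rigid_capacity = 10                 # fixed fleet sized to predicted demand

waste = sum(max(rigid_capacity - d, 0) for d in demand)       # idle, paid-for capacity
shortfall = sum(max(d - rigid_capacity, 0) for d in demand)   # unmet demand -> dissatisfaction
elastic_hours = sum(demand)         # an elastic fleet tracks demand exactly
print(waste, shortfall, elastic_hours)   # 26 4 48
```

The rigid fleet both overpays (26 idle instance-hours) and underserves (4 instance-hours of unmet demand); the elastic fleet pays only for the 48 instance-hours actually needed.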

Page 12: Scientific Computing With Amazon Web Services

Go from one instance...

12

Page 13: Scientific Computing With Amazon Web Services

To Thousands

13

Page 14: Scientific Computing With Amazon Web Services

Instance Types

14

Page 15: Scientific Computing With Amazon Web Services

Standard (m1)
High Memory (m2, m3)
High CPU (c1)

15

Page 16: Scientific Computing With Amazon Web Services

Intel Nehalem (cc1.4xlarge)
NVIDIA GPUs (cg1.4xlarge)
Intel Sandy Bridge E5-2670 (cc2.8xlarge)
2 TB of SSD, 120,000 IOPS (hi1.4xlarge)
Sandy Bridge, NUMA, 240 GB RAM (cr1.8xlarge)
48 TB of ephemeral storage (hs1.8xlarge)

Cluster Compute

16

Page 17: Scientific Computing With Amazon Web Services

17

Page 18: Scientific Computing With Amazon Web Services

Placement Groups

18

Page 19: Scientific Computing With Amazon Web Services

[Diagram: a placement group of EC2 instances on a 10 GigE network with full-bisection bandwidth.]

19

Page 20: Scientific Computing With Amazon Web Services

What is Scientific Computing?

20

Page 21: Scientific Computing With Amazon Web Services

Use Cases

• Science-as-a-Service
• Large-scale HTC (100,000+ core clusters)
• Large-scale MapReduce (Hadoop/Spark/Shark) using EMR or EC2
• Small to medium-scale MPI clusters (hundreds of nodes)
• Many small MPI clusters working in parallel to explore parameter space
• GPGPU workloads
• Dev/test of MPI workloads prior to submitting to supercomputing centers
• Collaborative research environments
• On-demand academic training/lab environments

21

Page 22: Scientific Computing With Amazon Web Services

Large Input Data Sets

22

Page 23: Scientific Computing With Amazon Web Services

ESA Gaia Mission Overview

ESA’s Gaia is an ambitious mission to chart a three-dimensional map of the Milky Way Galaxy in order to reveal the composition, formation and evolution of our Galaxy.

Gaia will repeatedly analyze and record the positions and magnitude of approximately one billion stars over the course of several years.

1 billion stars × 80 observations × 10 readouts ≈ 1 × 10^12 samples.

At 1 ms of processing time per sample, that is more than 30 years of serial processing.

23
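The estimate above can be sanity-checked in a few lines of Python. Note that the exact product is 8 × 10^11; the slide rounds it up to ~10^12 before converting to years:

```python
# Sanity check of the Gaia numbers above (all figures from the slide).
samples = 1_000_000_000 * 80 * 10          # 8e11, rounded up to ~1e12 on the slide
seconds = samples / 1000                   # at 1 ms of processing per sample
years = seconds / (3600 * 24 * 365)
rounded_years = 1e12 / 1000 / (3600 * 24 * 365)   # using the slide's rounded figure
print(f"{samples:.0e} samples, {years:.1f} years serial ({rounded_years:.1f} with rounding)")
```

Either way the conclusion stands: at 1 ms per sample, a single processor would take decades, which is why the pipeline must scale out.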

Page 24: Scientific Computing With Amazon Web Services

Gaia Solution Overview

• Traditional: Purchase at the beginning of the mission for the anticipated high-water mark.
• AWS: Pay as you go. Launch what you need, as you need it, and turn instances off when you're done.

• Traditional: Purchase additional systems for redundancy.
• AWS: If an instance fails, turn it off and launch a replacement at no additional charge.

• Traditional: Large-scale data reprocessing is constrained to the available infrastructure, with no way to accelerate jobs without additional CapEx.
• AWS: Need to reprocess the data within a few hours? Simply launch more instances: 100 machines running for 1 hour cost the same as 1 machine running for 100 hours.

• Traditional: Performance is constrained to the processors, disks and memory available at the time of procurement, for a multi-year mission.
• AWS: AWS frequently launches new instance types running the latest hardware. Simply restart your instances on a newer instance type and stop paying for less-capable infrastructure.

• Traditional: Data transfer and security policies make it difficult to collaborate with researchers located elsewhere.
• AWS: Easily and securely collaborate with researchers around the world.

24
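The "100 machines for 1 hour" point is simple machine-hour arithmetic; a minimal sketch (the hourly rate is a made-up illustration, not an actual EC2 price):

```python
def cost(machines: int, hours: float, hourly_rate: float) -> float:
    """On-demand cost is billed per machine-hour, so machines and hours trade off."""
    return machines * hours * hourly_rate

RATE = 0.10   # illustrative hourly rate in USD; not an actual EC2 price

slow = cost(machines=1, hours=100, hourly_rate=RATE)
fast = cost(machines=100, hours=1, hourly_rate=RATE)
print(f"${slow:.2f} serial vs ${fast:.2f} parallel")   # same spend, 100x faster turnaround
```

The total spend is identical; what changes is wall-clock time, which is exactly what matters when a reprocessing deadline is a few hours away.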

Page 25: Scientific Computing With Amazon Web Services

Many Iterations With Varying Parameters

25

Page 26: Scientific Computing With Amazon Web Services

Linear Algebra Calculations

26

Page 27: Scientific Computing With Amazon Web Services

27

Page 28: Scientific Computing With Amazon Web Services

MSL Distributed Operations

JPL: Pasadena, CA
GDSCC: Goldstone Deep Space Communication Complex
CDSCC: Canberra Deep Space Communication Complex
MDSCC: Madrid Deep Space Communication Complex
ARC (CheMin): Moffett Field, CA
MSSS (MARDI, MAHLI, MastCam): San Diego, CA
KSC
IKI (DAN): Moscow, Russia
INTA (REMS): Madrid, Spain
LANL (ChemCam): Los Alamos, NM
U. of Guelph (APXS): Guelph, Ontario
SwRI (RAD): Boulder, CO
GSFC (SAM): Greenbelt, MD

Plus hundreds of other sites around the world for Co-Is and colleagues.

28

Page 29: Scientific Computing With Amazon Web Services

Data Locality Challenges

Scientist 1 retrieves data from L.A.

Scientist 1 returns data to L.A.

Scientist 2 retrieves data from L.A.

Scientist 2 returns data to L.A.

29

Page 30: Scientific Computing With Amazon Web Services

AWS Global Infrastructure

9 regions

25 availability zones

38 edge locations

30

Page 31: Scientific Computing With Amazon Web Services

AWS Public Data Sets

AWS.amazon.com/datasets

31

Page 32: Scientific Computing With Amazon Web Services

Data Locality Challenges

Researcher in L.A. uploads data to the cloud

Scientist 1 uses cloud resources to process data

Scientist 2 retrieves data products from edge network

Scientist 2 uses cloud resources to process data

Global collaboration

32

Page 33: Scientific Computing With Amazon Web Services

33

Page 34: Scientific Computing With Amazon Web Services

On-Demand Pricing

34

Page 35: Scientific Computing With Amazon Web Services

Reserved Instances

35

Page 36: Scientific Computing With Amazon Web Services

Spot Instances

• Bid $X per hour

• If current price <= bid, instance starts

• If current price > bid, instance terminates

• Customers pay market rate, not bid

36
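The spot lifecycle described by the bullets above can be sketched as a toy simulation. The prices here are invented, and real spot prices come from the EC2 API; this only models the bid/market-rate rules on the slide:

```python
from dataclasses import dataclass, field

@dataclass
class SpotInstance:
    bid: float                 # maximum price the customer is willing to pay per hour
    running: bool = False
    charges: list = field(default_factory=list)

    def tick(self, market_price: float) -> None:
        """One pricing period: run and pay the market rate while it is at or below the bid."""
        if market_price <= self.bid:
            self.running = True
            self.charges.append(market_price)   # charged the market rate, not the bid
        else:
            self.running = False                # out-bid: the instance is terminated

inst = SpotInstance(bid=0.50)
for price in (0.20, 0.45, 0.60, 0.30):
    inst.tick(price)
print(inst.charges, inst.running)   # [0.2, 0.45, 0.3] True
```

Note that the customer never pays the bid itself, only the prevailing market rate, and the instance comes back once the market price drops below the bid again.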

Page 37: Scientific Computing With Amazon Web Services

U. Wisc.: CMS Particle Detector

http://www.hep.wisc.edu/~dan/talks/EC2SpotForCMS.pdf

37

Page 38: Scientific Computing With Amazon Web Services

Integrated Architectures

38

Page 39: Scientific Computing With Amazon Web Services

Amazon VPC

AWS Direct Connect

[Diagram: EC2 instances inside an Amazon VPC, reached over AWS Direct Connect at locations including Los Angeles, Singapore, Japan, London, São Paulo, New York and Sydney.]

39

Page 40: Scientific Computing With Amazon Web Services

40

Page 41: Scientific Computing With Amazon Web Services

Secured Uplink Planning

41

Page 42: Scientific Computing With Amazon Web Services

[Architecture: Polyphony. A decider in the JPL data center coordinates with Amazon SWF, which holds the decision tasks, file-transfer tasks and data-processing tasks. File-transfer workers upload and download file chunks through S3, and data-processing workers run on EC2 instances created on demand.]

42
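Polyphony's split between a decider and pools of workers can be sketched with in-memory queues standing in for the SWF task lists; the function and queue names below are illustrative, not Polyphony's actual code:

```python
import queue

# In-memory queues standing in for Amazon SWF task lists (names are illustrative).
decision_tasks = queue.Queue()
transfer_tasks = queue.Queue()
processing_tasks = queue.Queue()

def decider() -> None:
    """For each decision task, schedule a file transfer and the data processing for it."""
    while not decision_tasks.empty():
        chunk = decision_tasks.get()
        transfer_tasks.put(("transfer", chunk))    # picked up by file-transfer workers (via S3)
        processing_tasks.put(("process", chunk))   # picked up by data-processing workers on EC2

for chunk_id in ("chunk-0", "chunk-1"):
    decision_tasks.put(chunk_id)
decider()

transfers = [transfer_tasks.get() for _ in range(transfer_tasks.qsize())]
print(transfers)   # [('transfer', 'chunk-0'), ('transfer', 'chunk-1')]
```

The point of the pattern is that the decider holds all the workflow logic while workers stay stateless, so worker fleets on EC2 can be grown or shrunk freely.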

Page 43: Scientific Computing With Amazon Web Services

SWF, EC2, S3, SimpleDB, CloudWatch, IAM, ELB

5 gigapixels in 5 minutes!

43

Page 44: Scientific Computing With Amazon Web Services

[Diagram: a NASA researcher routes workloads by scale: large, tightly-coupled MPI jobs go to the supercomputers at Ames; large EP jobs, smaller-scale tightly-coupled MPI, dev/test and burst capacity go to a large pool of EC2 instances; small-scale MPI and EP are handled separately.]

44
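The tiering on the slide amounts to a simple dispatch policy. A hedged sketch, with thresholds invented purely for illustration (they are not NASA policy):

```python
def route(cores: int, tightly_coupled: bool) -> str:
    """Pick an execution tier for a job; the thresholds are invented for illustration."""
    if tightly_coupled and cores > 10_000:
        return "Ames"        # large, tightly-coupled MPI
    if cores > 100:
        return "EC2"         # large EP, smaller-scale MPI, dev/test, burst capacity
    return "local"           # small-scale MPI and EP

print(route(50_000, True), route(5_000, False), route(16, True))   # Ames EC2 local
```

The design point is that the cloud is not a replacement for the supercomputer: each workload class goes to the tier where it runs best.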

Page 45: Scientific Computing With Amazon Web Services

45

Page 46: Scientific Computing With Amazon Web Services

46

Page 47: Scientific Computing With Amazon Web Services

Zero to Internet-Scale in One Week!

47

Page 48: Scientific Computing With Amazon Web Services

ELBs on Steroids

48

Page 49: Scientific Computing With Amazon Web Services

Route53

49

Page 50: Scientific Computing With Amazon Web Services

CloudFormation

50

Page 51: Scientific Computing With Amazon Web Services

CloudFront

51

Page 52: Scientific Computing With Amazon Web Services

Regions and AZs

52

Page 53: Scientific Computing With Amazon Web Services

Mars Science Laboratory Live Video Streaming: Architecture

[Architecture: a Telestream Wirecast encoder feeds Adobe Flash Media Servers in two Availability Zones (us-east-1a and us-west-1b). In each zone a CloudFormation stack stands up an Elastic Load Balancer in front of Tier 1 and Tier 2 Nginx caches; CloudFront streaming serves the museum partners.]

53

Page 54: Scientific Computing With Amazon Web Services

Battle Testing JPL's Deployment: Benchmarking

54

Page 55: Scientific Computing With Amazon Web Services

Dynamic Traffic Scaling: US-East Cache Node Performance

11.4 Gbps

55

Page 56: Scientific Computing With Amazon Web Services

Dynamic Traffic Scaling: US-East Cache Node Performance

25.3 Gbps

56

Page 57: Scientific Computing With Amazon Web Services

Dynamic Traffic Scaling: US-East Cache Node Performance

10.1 Gbps

57

Page 58: Scientific Computing With Amazon Web Services

Dynamic Traffic Scaling: US-East Cache Node Performance

40.3 Gbps

58

Page 59: Scientific Computing With Amazon Web Services

Dynamic Traffic Scaling: US-East Cache Node Performance

26.6 Gbps

59

Page 60: Scientific Computing With Amazon Web Services

Dynamic Traffic Scaling: Impact on US-East FMS Origin Servers

Only ~42 Mbps

60

Page 61: Scientific Computing With Amazon Web Services

Dynamic Traffic Scaling: Impact on US-East FMS Origin Servers

Only ~42 Mbps

61

Page 62: Scientific Computing With Amazon Web Services

CloudFront Behaviors: Using ELBs for Dynamic Content

62

Page 63: Scientific Computing With Amazon Web Services

AWS Academic Grants

AWS.amazon.com/grants

63

Page 64: Scientific Computing With Amazon Web Services

Thank You

64