AWS Government, Education, and Nonprofit Symposium | Washington, DC | June 25-26, 2015
Accelerating Time to Science: Transforming Research in the Cloud
Jamie Kinney (@jamiekinney), Director of Scientific Computing, a.k.a. “SciCo”, Amazon Web Services
Dr. Michael Ernst (@brookhavenlab), Director, RHIC and ATLAS Computing Facility, Brookhaven National Laboratory
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
• An introduction to scientific computing on AWS
• How are researchers using AWS today?
• Case study: How the ATLAS experiment is using AWS
• Q & A
What do we mean by scientific computing?
Scientific computing refers to the application of simulation, mathematical modeling, and quantitative analysis to understand and solve scientific problems.
How is AWS used for scientific computing?
• High Performance Computing (HPC) for engineering and simulation
• High-throughput computing (HTC) for data-intensive analytics
• Hybrid supercomputing centers
• Collaborative research environments
• Citizen science
• Science-as-a-Service
Why do researchers love using AWS?
• Time to science: access research infrastructure in minutes
• Low cost: pay-as-you-go pricing
• Globally accessible: easily collaborate with researchers around the world
• Secure: a collection of tools to protect data and privacy
• Scalable: access to effectively limitless capacity
• Elastic: easily add or remove capacity
Why does AWS care about scientific computing?
• We want to improve our world by accelerating the pace of scientific discovery
• It is a great application of AWS with a broad customer base
• The scientific community helps us innovate on behalf of all customers
  – Streaming data processing and analytics
  – Exabyte-scale data management solutions and exaflop-scale compute
  – Collaborative research tools and techniques
  – New AWS regions
  – Significant advances in low-power compute, storage, and data centers
  – Efficiencies that will lower our costs and therefore pricing for all customers
Research grants
AWS provides free usage credits to help researchers:
• Teach advanced courses
• Explore new projects
• Create resources for the scientific community
aws.amazon.com/grants
Peering with all global research networks
Image courtesy John Hover - Brookhaven National Lab
Restricted-access genomics on AWS
aws.amazon.com/genomics
High-throughput computing at scale
The Large Hadron Collider experiments at CERN involve thousands of researchers from over 40 countries and produce tens of petabytes (PB) of data each year.
The ATLAS and CMS experiments are using AWS for Monte Carlo simulations and analysis of LHC data.
Data-intensive computing
The Square Kilometer Array (SKA) will link 250,000 radio telescopes together, creating the world’s most sensitive telescope. The SKA will generate zettabytes of raw data, publishing exabytes annually over 30-40 years.
Researchers are using AWS to develop and test:
• Data processing pipelines
• Image visualization tools
• Exabyte-scale research data management
• Collaborative research environments
aws.amazon.com/solutions/case-studies/icrar/
High Performance Computing
Simulations in the automotive sector
• Crash and materials simulations
• Fluid and thermal dynamics simulations
• Car body aerodynamics
• Electronics and electromagnetic simulations
Honda materials science simulations on AWS:
• Deploying scalable HPC clusters on AWS Spot Instances, up to 1,000 C3 instances
• Running more simulations than before, for more accurate results
“Cloud offers us an opportunity, as we can innovate faster than before.” - Ayumi Tada, IT System Administrator, Honda R&D
Schrödinger and Cycle Computing: Computational chemistry for better solar power
A simulation by Mark Thompson of the University of Southern California to determine which of 205,000 organic compounds could be used in photovoltaic cells as solar panel material.
Estimated computation time: 264 years; completed in 18 hours.
• 156,314 core cluster, 8 regions
• 1.21 petaFLOPS (Rpeak)
• $33,000 or 16¢ per molecule
loosely coupled
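As a quick consistency check on the figures above, the arithmetic below recomputes the cost per molecule, the wall-clock speedup, and the core-hours consumed. It is a back-of-envelope sketch that assumes 365-day years and treats the 264-year estimate as wall-clock time comparable to the 18-hour run; the slide states neither assumption.

```python
# Back-of-envelope check of the Schrödinger / Cycle Computing figures quoted above.
# Assumptions: a "year" is 365 days; the 264-year estimate is treated as wall-clock time.
HOURS_PER_YEAR = 365 * 24

compounds = 205_000        # organic compounds screened
cost_usd = 33_000          # total reported cost
estimated_years = 264      # estimated computation time
wall_clock_hours = 18      # actual completion time
cores = 156_314            # cores in the cluster

cost_per_molecule = cost_usd / compounds                       # dollars per compound
speedup = estimated_years * HOURS_PER_YEAR / wall_clock_hours  # reduction in time to result
core_hours = cores * wall_clock_hours                          # upper bound: every core busy for 18 h

print(f"cost per molecule: ${cost_per_molecule:.3f}")
print(f"speedup: {speedup:,.0f}x")
print(f"core-hours (upper bound): {core_hours:,}")
```

The 16¢ figure checks out ($33,000 / 205,000 ≈ $0.161), and compressing 264 years into 18 hours is roughly a 128,000-fold reduction in time to result.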
Science-as-a-Service
Globus Genomics, DNAnexus, and Seven Bridges Genomics offer inexpensive, easy-to-use, and secure platforms for processing and analyzing genomic data.
The Weather Company pushes four gigabytes of data to AWS each second in order to deliver 15 billion forecasts each day to their customers around the world.
aws.amazon.com/solutions/case-studies/the-weather-company/
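To put the Weather Company figures in perspective, the sketch below converts them into daily volume and per-second rates. It assumes decimal units (1 TB = 1,000 GB) and a steady 24-hour rate, which the case study does not state.

```python
# Back-of-envelope rates implied by the Weather Company figures quoted above.
# Assumptions: decimal units (1 TB = 1,000 GB) and a constant rate over a 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60

ingest_gb_per_second = 4             # "four gigabytes of data to AWS each second"
forecasts_per_day = 15_000_000_000   # "15 billion forecasts each day"

daily_ingest_tb = ingest_gb_per_second * SECONDS_PER_DAY / 1_000
forecasts_per_second = forecasts_per_day / SECONDS_PER_DAY

print(f"data pushed per day: {daily_ingest_tb:,.1f} TB")
print(f"forecasts per second: {forecasts_per_second:,.0f}")
```

Sustained over a day, 4 GB/s amounts to about 345.6 TB, and 15 billion forecasts a day averages roughly 174,000 forecasts per second.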
Accelerating Scientific Discovery in the Cloud
Michael Ernst
Brookhaven National Laboratory
AWS Government, Education, and Nonprofit Symposium, Washington, D.C., June 25, 2015
The Large Hadron Collider at CERN
[Figure: the 27 km LHC ring with the ALICE, ATLAS, CMS, and LHCb experiments]
New Physics Frontiers in LHC Run 2
Big data: Not a buzzword when it comes to ATLAS
LEP dataset: a few TB
ATLAS dataset: 160 PB
NDN
ATLAS workload: Managed by PanDA
Leveraging the AWS Spot market for compute-hungry HEP
• Cloud resources are very valuable to experimental HEP computing, and HEP is generally a big user of computing resources
• In the past, experimental HEP has made little use of commercial cloud resources; we want to change that
• We are compute-limited in our science; cloud resources can enrich the science
• Clouds have cost-efficient room for us if our workload is fine-grained and flexible, even when resource occupancy is high
• Just as there’s room for sand in a full jar of rocks, there’s room for us
• Joint project with the AWS Scientific Computing team and ESnet
  – Scoped out a pilot centered on representative HEP/ATLAS workflows
  – AWS contributes valuable technical expertise and credits for trial runs
  – ESnet contributes expertise and network gear at the AWS/ESnet peering points
  – ESnet participation is central to AWS waiving the egress fee (conditions apply)
• Which brings us to our new fine-grained data processing system
• We’ve leveraged new developments in our Workload Management System (PanDA), our parallel software framework, powerful networking, and efficient I/O and storage to implement a new approach to event processing: a fine-grained event service
• An extension to PanDA that allows it to manage event-level workloads (instead of file-level workloads, in which hundreds of events are clustered together)
• Object stores (e.g., S3) provide highly scalable storage for many small event-scale outputs
The approach is applicable to any workflow (not just HEP) that can support fine-grained partitioning of the processing and its output.
Data-intensive, network-centric, platform-agnostic computing
• An increasingly important paradigm in the scientific computing community
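The core idea of the event service, dispatching work at event granularity and writing each small output to an object store, can be illustrated with a short sketch. This is not PanDA code: `EventRange`, `split_into_event_ranges`, the key naming scheme, and the in-memory dictionary standing in for an S3 bucket are all hypothetical, chosen only to show why fine-grained outputs survive when a worker disappears partway through a job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRange:
    """A contiguous slice of events from one input file (hypothetical structure)."""
    input_file: str
    first_event: int  # inclusive
    last_event: int   # inclusive


def split_into_event_ranges(input_file: str, n_events: int, range_size: int) -> list[EventRange]:
    """Partition one file-level job into fine-grained event ranges."""
    if n_events < 0 or range_size <= 0:
        raise ValueError("need n_events >= 0 and range_size > 0")
    return [
        EventRange(input_file, start, min(start + range_size, n_events) - 1)
        for start in range(0, n_events, range_size)
    ]


def output_key(job_id: str, r: EventRange) -> str:
    """Object-store key for one event-range output (hypothetical naming scheme)."""
    return f"{job_id}/{r.input_file}/events_{r.first_event:06d}-{r.last_event:06d}"


# A plain dict stands in for an S3 bucket: key -> object bytes.
object_store: dict[str, bytes] = {}


def process_range(job_id: str, r: EventRange) -> None:
    """Placeholder for simulating the events in r; stores one small output object."""
    n = r.last_event - r.first_event + 1
    object_store[output_key(job_id, r)] = f"{n} simulated events".encode()


# Usage: a 1,000-event job split into 10 ranges; the worker is "reclaimed" after 3.
ranges = split_into_event_ranges("input_0001.root", n_events=1_000, range_size=100)
for r in ranges[:3]:
    process_range("job42", r)

# Finished ranges are already persisted; only the unfinished ones need redispatching.
remaining = [r for r in ranges if output_key("job42", r) not in object_store]
print(len(ranges), len(object_store), len(remaining))  # 10 3 7
```

Because each range's output lands in the store as soon as it is done, losing the worker costs only the range in flight, not the whole file-level job.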
ATLAS simulated event production is currently running on EC2 using the event service
• PanDA “Site” at BNL sends jobs to EC2 Spot Market VMs
• Exercising scaling to >50k concurrent jobs; entering production soon
• Event Service maximizes return on short-lived job slots (~1h)
• Leverages the BNL Tier 1’s capability to elastically and transparently expand workloads into cloud resources: after dedicated resources are fully utilized, jobs overflow into the cloud to accommodate peak demands
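Why event-level granularity pays off on short-lived Spot slots can be seen with a toy model. The sketch assumes a fixed per-event processing time, that a file-level job persists its output only once all of its events are done, and that the event service persists each event as soon as it finishes; the 30-second event time and 500-event job size are illustrative numbers, not ATLAS measurements.

```python
def events_saved(slot_seconds: int, seconds_per_event: int,
                 events_per_job: int, fine_grained: bool) -> int:
    """Events whose output survives if the slot is reclaimed after slot_seconds.

    Toy model: every event takes the same time; file-level jobs save output only
    on completion, while fine-grained (event-level) processing saves each event.
    """
    if fine_grained:
        return slot_seconds // seconds_per_event
    job_seconds = events_per_job * seconds_per_event
    completed_jobs = slot_seconds // job_seconds
    return completed_jobs * events_per_job


# A ~1 h slot, 30 s per event (hypothetical), 500 events per file-level job.
file_level = events_saved(3_600, 30, 500, fine_grained=False)
event_level = events_saved(3_600, 30, 500, fine_grained=True)
print(file_level, event_level)  # 0 120
```

Under these assumptions the file-level job needs over four hours, so it never finishes inside a one-hour slot and all of its work is lost, while the event-level worker keeps every event it completed.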
Using cloud resources effectively: A policy-based cloud scheduler
Policy
• Fully transparent to the Workload Management System (e.g., PanDA)
• Elastically expands the pool of compute resources according to user-defined policy
• Demand-driven, policy-based, programmatic instantiation and contraction of cloud resources
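The demand-driven, policy-based expansion and contraction described above can be sketched as a single sizing function. The `Policy` fields, the overflow rule (fill free dedicated slots first, then the cloud), and the launch/terminate actions are assumptions made for illustration; they are not the interface of the BNL scheduler or of any AWS or HTCondor API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """User-defined scaling limits (hypothetical fields)."""
    slots_per_instance: int  # job slots provided by one cloud VM
    max_instances: int       # hard cap, e.g. a budget limit
    min_instances: int = 0   # floor kept running even with no demand


def target_instances(idle_jobs: int, free_dedicated_slots: int, policy: Policy) -> int:
    """Cloud VMs needed so that jobs overflow to the cloud only after dedicated slots fill."""
    overflow = max(0, idle_jobs - free_dedicated_slots)
    wanted = math.ceil(overflow / policy.slots_per_instance)
    return max(policy.min_instances, min(wanted, policy.max_instances))


def scaling_action(current: int, target: int) -> tuple[str, int]:
    """Translate the target pool size into an expand/contract action."""
    if target > current:
        return ("launch", target - current)
    if target < current:
        return ("terminate", current - target)
    return ("none", 0)


policy = Policy(slots_per_instance=8, max_instances=100)
print(target_instances(idle_jobs=1_000, free_dedicated_slots=200, policy=policy))  # 100
print(target_instances(idle_jobs=150, free_dedicated_slots=200, policy=policy))    # 0
print(scaling_action(current=40, target=100))  # ('launch', 60)
```

A production scheduler would also avoid terminating VMs that are still running jobs, typically retiring only idle instances; that detail is omitted here.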
Elastic cluster: “Flexible and nimble” provisioning
• Programmatically instantiates compute resources in the cloud
• Designed to serve:
  – Peak demands
  – Users without dedicated resources
  – Dynamic creation of specific resource types (e.g., DB, storage, DTNs)
Goal: setup time <5% of total compute time
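The 5% setup-time goal translates directly into a minimum useful lifetime for each instance. The sketch below assumes that "total compute time" means setup plus payload time, so s / (s + c) < 0.05 requires c > 19s; the 5-minute setup in the usage line is a hypothetical figure.

```python
def setup_fraction(setup_minutes: float, payload_minutes: float) -> float:
    """Share of an instance's lifetime spent on setup (assumes total = setup + payload)."""
    return setup_minutes / (setup_minutes + payload_minutes)


def min_payload_minutes(setup_minutes: float, max_fraction: float = 0.05) -> float:
    """Payload time at which setup is exactly max_fraction of the total.

    From s / (s + c) = f, solving for c gives c = s * (1 - f) / f;
    any longer payload keeps setup below the goal.
    """
    return setup_minutes * (1 - max_fraction) / max_fraction


# Hypothetical: a VM that takes 5 minutes to boot and configure.
print(round(min_payload_minutes(5), 2))  # 95.0
```

Equivalently, for an instance that lives about one hour in total, setup would need to stay under 3 minutes.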
HTCondor
Architectural overview from the facility perspective
Connecting AWS Facilities to the Research Community
[Figure: network map. Labels: 100G R&E Exchange; Direct Connect ESnet pilot, 2x10G; AWS planned 100G to PNWG; Seattle; Direct Connect ESnet pilot, 1x10G]
Image Authoring and Runtime Configuration
Design goals:
• Useful for ATLAS, but usable by other VOs.
• Eliminate runtime RPM installation, which is fatal with O(1000) startups.
• Images deterministically reproducible. No snapshotting.
• Provide the ability for other users to do it themselves (make the toolset public).
• Flexibility between build-time and runtime customization. Both options OK.
• Open source only. Only use functions/services for which open source equivalents exist (EC2, S3).
• Off-the-shelf, non-cloud tools (Puppet, Hiera, Condor, Yum) wherever possible. Off-the-shelf cloud tools (cloud-init, Imagefactory/Oz) only where needed.
• Keep custom parts small, simple, and/or optional.
• 10,000 ft summary:
  – Imagefactory 1.1.7 generates VM images from merged hierarchical templates.
  – Masterless Puppet consumes a single Hiera file (injected via cloud-init write_files) at boot.
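The runtime half of this design, a single Hiera file delivered through cloud-init and consumed by masterless Puppet, can be sketched by generating the instance user-data. The file path, manifest location, and Hiera keys below are hypothetical placeholders, and the Hiera path must match whatever the image's hiera.yaml configures; `write_files` and `runcmd` are standard cloud-init modules, and `puppet apply` is Puppet's masterless mode.

```python
import json

# Hypothetical Hiera data; real keys depend on the Puppet modules baked into the image.
HIERA_YAML = """\
---
condor::central_manager: "cm.example.org"
condor::num_slots: 8
"""


def build_user_data(hiera_text: str,
                    hiera_path: str = "/etc/puppet/hieradata/common.yaml",
                    manifest: str = "/etc/puppet/manifests/site.pp") -> str:
    """Return cloud-config user-data that injects one Hiera file and runs masterless Puppet.

    Paths are hypothetical defaults; they must agree with the image's hiera.yaml.
    """
    config = {
        # write_files runs early in boot, before the runcmd commands execute.
        "write_files": [
            {"path": hiera_path, "content": hiera_text, "permissions": "0644"},
        ],
        # Masterless Puppet: apply the local manifest; no Puppet master is contacted.
        "runcmd": [["puppet", "apply", manifest]],
    }
    # This JSON is also valid YAML, which is what cloud-config expects after the header.
    return "#cloud-config\n" + json.dumps(config, indent=2)


user_data = build_user_data(HIERA_YAML)
print(user_data)
```

Because the same image can boot with different Hiera data, build-time and runtime customization stay separable, in line with the design goals above.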
Build Framework
Final remarks
• ATLAS has met the challenge of data-intensive computing at a scale not seen before
• Resource virtualization: seamless integration of storage, compute, and network, including cloud and local resources
• A fairly complete and still-growing set of AWS services to dynamically instantiate VMs and allocate storage and network resources
• Innovations such as the Event Service allow ATLAS to efficiently harvest EC2 Spot market resources to meet its growing computing needs
• The joint project with the AWS Scientific Computing team and ESnet has been crucial to the successful implementation
Additional resources
• aws.amazon.com/hpc
• aws.amazon.com/big-data
• aws.amazon.com/grants
• aws.amazon.com/genomics
• aws.amazon.com/compliance
• aws.amazon.com/security
Thank you.
This presentation will be uploaded to SlideShare the week following the Symposium.
http://www.slideshare.net/AmazonWebServices