© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kevin Jorissen, Seattle
Kevin has 10 years of experience in computational science, and holds a Ph.D. in Physics. He developed codes solving the quantum physics equations for light absorption by materials, taught workshops to scientists worldwide, and wrote about high performance computing in the cloud before it was fashionable. He worked as a postdoctoral researcher in Antwerp, Lausanne, Seattle, and Zurich. He contributed to the WIEN2k code (Density Functional Theory calculations of material properties, www.wien2k.at) and the FEFF code (X-ray and Electron absorption spectra, www.feffproject.org).
Kevin joined Amazon in 2015 to help accelerate the adoption of cloud computing in the scientific community globally.
BIO
What research & HPC has been successful in the cloud
and why
1. The long-term trends in Scientific Computing
How can we:
• Democratize research computing so everybody can use it? (no longer just HPC experts)
• Meet the need for a variety of hardware platforms? (no longer just CPU based)
• Support diverse applications and frameworks? (no longer just Fortran+MPI physics codes)
Additional challenges:
• Data gravity: massive volumes of data
• Cross-disciplinary research
• Research Data Management
• Data compliance and security
• Reproducibility and reusability
• New methods, e.g. serverless computing; ML; domain platforms (e.g. Cromwell, Pangeo, …)
Scientific computing will have to evolve to solve these challenges.
The public cloud (e.g. AWS) has the right characteristics, because it evolved under similar constraints.
2. Key Strengths of AWS for Scientific Discovery
Improve time to discovery
• Resources are available when needed
• Experiment fast (‘agility’)
• Avoid undifferentiated work by using advanced managed services
Collaboration
• Store massive data sets
• Share them with your collaborators
• With compute/analytics/ML tools available
• In a highly secure and compliant way
https://aws.amazon.com/blogs/aws/saving-koalas-using-genomics-research-and-cloud-computing/
Availability of resources: (We’re off to a cute start …)
Moving quickly with managed services: Jupiter Intelligence
Collaborating on scientific data in the cloud
Athena, Glue
Collaborating on scientific data in the cloud
NOAA NEXRAD on AWS S3: usage increased 2.3x
greater scientific impact
NIH - Strides
http://www.cancergenomicscloud.org
Funded projects to create collaborative environments on cloud
Tens of PB of cancer data coming to the cloud
3. Institutional goals
Can cloud make the university’s research data more reusable?
Can cloud make students more employable after graduation?
Can cloud shorten average time-to-discovery and boost impact?
Can cloud raise the university’s profile for research (inter)nationally?
Can cloud help make competitive faculty hires? (Extra resources allow the competitive new hire to stay on top of the field.)
Can cloud help new faculty build impact faster? (Put cloud $ in every startup package and see citations build up faster.)
Can cloud democratize compute/analytics/ML/AI across all departments?
Can cloud help grad students finish up faster?
Can cloud boost the approval rate of grant applications?
A University Cloud Journey Quadrant
3 areas of work: Education, Research, Operations
Toolset:
• 150-odd services
• Learning paths
• Programs (Educate, Open Data, Egress Waiver, Academy, Catalyst, …)
• 1000’s of 3rd party solutions
4 Horsemen:
• Capability (Can I?)
• Compliance (May I?)
• Cost (How much?)
• Complexity (How the …?)
Timeline: Champions to Institution to Ecosystem
A guided tour of cool people and things in the cloud
you be the judge
1. A student: meet Parice (U Sydney)
The University of Sydney
De novo Transcriptome Assembly
Current HPC Limitations
• Data since 30th July
• Compute requirements highly variable across samples
• Temp storage issues: unable to run on node with other jobs
• Strict versioning of dependencies
• Often many failures/errors despite using same code with new samples
• Parallelization achieved on NCI thanks to SIH
RONIN Summary
• Setup time: <1 hr
• Multiple species and tissues run in parallel using auto-scaling cluster
• 5 assemblies complete in <1 week + QC in <24 hr
• Cost: <$500 per assembly
2. A group: meet JGI
• Plant genomes: large data & compute needs
• Need 4TB servers that aren’t available in-house
• Use Cromwell on AWS (https://docs.opendata.aws/genomics-workflows/)
3. A group: meet Clemson U Analytics & Visualization Institute
HPC in the cloud: 550,000 cores for Natural Language Processing (Machine Learning)
“spot market”: cheap AWS computing, a good fit for research
“I am absolutely thrilled with the outcome of this experiment. The graduate students on the project […] used resources from AWS and Omnibond and developed a new software infrastructure to perform research at a scale and time-to-completion not possible with only campus resources.” – Prof. Amy Apon, Co-Director of the Complex Systems, Analytics and Visualization Institute
https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
4. A collaboration: meet Joint Center for Satellite Data Assimilation (JCSDA)
• Joint Effort for Data Integration (JEDI) is a next-generation data assimilation (DA) system for numerical weather prediction (NWP) that is capable and flexible enough to use for both research and operations. Run the FV3GFS global model on Amazon Web Services, at full resolution and with the pre-operational configuration.
• 48-node (1,728-core) compute clusters on AWS.
5. A collaboration: meet RISELab @ UC Berkeley
Collaborative 5-year effort (2017-2021) between UC Berkeley, National Science Foundation, and industry partners. AWS is a founding partner. https://riselab.cs.berkeley.edu
• Students and researchers at RISELab use AWS to rapidly prototype and develop new systems at a scale and speed not possible before.
• Previously built Apache Spark, developed on AWS, and integrated with AWS core services.
GOAL:
6. A collaboration: meet Pangeo
• Science gateways are the future: http://pangeo.io/architecture.html
• Reproducible/reusable research: https://medium.com/pangeo/cesm-lens-on-aws-4e2a996397a1
8. An institution: meet Emory University
https://edscoop.com/emory-university-research-aws-cloud-rich-mendola
9. An institution: institutional support for researchers @ PNNL
18 hours
205,000 materials analyzed
156,314 AWS Spot cores at peak
2.3M core-hours
Total spending: $33K (under 1.5 cents per core-hour)
https://www.youtube.com/watch?v=hcnhdwnSY94
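The PNNL figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check: the inputs are the rounded numbers quoted on the slide, the variable names are illustrative, and the derived values are therefore approximate.

```python
# Rounded figures as quoted on the slide (not exact billing data).
total_spend_usd = 33_000      # "Total spending: $33K"
core_hours = 2_300_000        # "2.3M core-hours"
wall_clock_hours = 18         # "18 hours"
peak_cores = 156_314          # "156,314 AWS Spot cores at peak"
materials = 205_000           # "205,000 materials analyzed"

# Cost per core-hour, in cents.
cents_per_core_hour = 100 * total_spend_usd / core_hours

# Average number of cores in use over the whole run.
average_cores = core_hours / wall_clock_hours

# Cost per material analyzed, in cents.
cents_per_material = 100 * total_spend_usd / materials

print(f"Cost per core-hour: {cents_per_core_hour:.2f} cents")
print(f"Average cores over the run: {average_cores:,.0f}")
print(f"Average as a fraction of peak: {average_cores / peak_cores:.0%}")
print(f"Cost per material: {cents_per_material:.1f} cents")
```

With these inputs the cost comes to roughly 1.43 cents per core-hour, consistent with the "under 1.5 cents" claim, and the average core count sits below the quoted peak, as it should.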
The Cloud Champion team:
- Research software engineers (RSE)
- Cloud architects (SA)
- Consulting/ProServ role: trusted advisor for researchers, maybe build or help build pipelines
- Can be embedded in central HPC/IT
- Training workshops for end users (e.g. we can come do a SageMaker workshop for you)
10. An institution: NIH - Strides
http://www.cancergenomicscloud.org
Funded projects to create collaborative environments on cloud
Tens of PB of cancer data coming to the cloud
11. A country: meet Chile, leader in astronomy
Don’t miss
• Workshop this Friday: Machine Learning on AWS (with SageMaker)
• Workshop this Friday: HPC on AWS (with Ronin)
• Late November: Supercomputing ’19 in Denver
• Early December: re:Invent ’19 in Vegas (lots of new AWS services)
The Amazon AI/ML Stack
APPLICATION SERVICES: Rekognition, Transcribe, Translate, Polly, Comprehend, Lex
PLATFORM SERVICES: Amazon SageMaker, AWS DeepLens
FRAMEWORKS & INTERFACES: AWS Deep Learning AMIs (Caffe2, CNTK, Apache MXNet, PyTorch, TensorFlow, Torch, Keras, Gluon)
INFRASTRUCTURE: CPU, GPU (P3), IoT & Edge, Mobile
HPC in the cloud is serious: Seismic modeling at PFLOP scale
Created a big CLUSTER in the AWS cloud.
HPC in the cloud is serious: a surprise TOP500 run
#136
https://medium.com/descarteslabs-team/thunder-from-the-cloud-40-000-cores-running-in-concert-on-aws-bf1610679978
1. Register for the Researchers Handbook to AWS: aws.amazon.com/rcp
2. Go play with an Open Dataset: registry.opendata.aws
3. Go play with ML on AWS: https://github.com/wleepang/sagemaker4research-workshop
Thank You
jorissen@amazon.com