Utility HPC: Right Systems, Right Scale, Right Science


Jason Stowe, CEO (@jasonastowe, @cyclecomputing)

I’m here to recruit you, for a cause

We believe utility access to compute power

makes impossible science, possible.

Dynamic, utility access to compute power

is as important as uptime

(that’s why coded infrastructure is critical)

Skeptical? (Photo: Flickr, Tourist on Earth)

In prior years (today?), researchers and engineers waited for computing:

for the horsepower,

for the place to put it,

for it to be configured…

(Photo: Flickr, vaxomatic)

Yesterday, high-performance engineering and science clusters were…

too small when you needed them most,

too large every other time.

The Innovation Bottleneck: researchers, scientists, and engineers are forced to size their questions to the infrastructure they have.

 

Multi-tenant systems create float capacity that is critical to innovation.

 

From centralized to decentralized, collaborative to independent, and right back again!

The 60s: Mainframes (sharing ~100%, network ~0 Mbit)
The 70s: VAX (sharing ~60%, network ~1 Mbit)
The 80s: The PC (sharing ~0%, network ~10 Mbit)
The 90s: Beowulf clusters (sharing ~40%, network ~1,000 Mbit)
The 00s-10s: Central clouds (sharing ???%, network ~10,000 Mbit)

Bigger and better, but further and further away from the scientist's lab.

The Scientific Method: Ask a Question → Hypothesize → Predict → Experiment/Test → Analyze → Final Results

The Test and Analyze stages require the most time, compute, and data.

The Scientific Method, again: any improvements to this cycle yield multiplicative benefits.

A Challenge Across Industries:
• 3 of Top 5 Insurance
• 6 of Top 8 Pharmaceutical
• 2 of Top 3 Banks
• 2 of Top 3 Genomics Sequencing
• 1 of Top 2 FPGA

Utility HPC in the News: WSJ, NYTimes, Wired, Bio-IT World, BusinessWeek

To accelerate science, we need automation

[Architecture diagram] Utility HPC Cluster:
• Scales to 50,000+ cores; massive scale based upon the workload
• Data scheduling, with data- and application-aware movement
• Workload portability via a traditional scheduler
• Built from CC1/CCG instances, EBS, S3, and a shared filesystem
• A secure HPC cluster, with HPC reporting and audit for the user
• Orchestrated by management software
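The "massive scale based upon the workload" point is the heart of utility HPC: cluster size follows queue depth, not a fixed hardware footprint. A minimal sketch of that idea, assuming boto3 and made-up AMI, instance-type, and jobs-per-instance values (CycleCloud's real logic is far richer):

# Scale-to-workload sketch: size the cluster from the queue, not the hardware.
import boto3

def scale_for_queue(queued_jobs, jobs_per_instance=8, max_instances=1000):
    ec2 = boto3.client("ec2", region_name="us-east-1")
    needed = min(max_instances, -(-queued_jobs // jobs_per_instance))  # ceiling division
    if needed > 0:
        ec2.run_instances(
            ImageId="ami-0123456789abcdef0",  # hypothetical cluster node image
            InstanceType="c5.4xlarge",        # hypothetical instance type
            MinCount=needed,
            MaxCount=needed,
        )

scale_for_queue(queued_jobs=4000)  # requests 500 instances to drain 4,000 jobs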

ChefConf 2012: a 50,000-core CycleCloud cluster, using Chef and AWS.
ChefConf 2013: a 10,600-instance cluster against a cancer target.
• Created in 2 hours
• Configured with Chef Search and with Data Bags
• Driven by one Chef 11 server (a sketch of the pattern follows)
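Roughly what "configured with Search and Data Bags" means: shared settings live in a data bag, and nodes discover one another by querying the Chef server, so hand-edited host lists never exist. A minimal sketch using the community PyChef client; the bag, item, and role names are hypothetical:

# Node-discovery sketch with PyChef (hypothetical bag/item/role names).
from chef import autoconfigure, Search, DataBagItem

api = autoconfigure()  # reads knife.rb / client.rb for server URL and key

# Cluster-wide settings come from a data bag, e.g. the scheduler address.
settings = DataBagItem("cluster-config", "defaults", api=api)
scheduler = settings["scheduler_host"]

# Chef search lists every registered compute node, so freshly launched
# instances self-assemble into the cluster as they converge.
for row in Search("node", "role:compute", api=api):
    print(row.object.name, "->", scheduler)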

We make software tools to easily orchestrate complex workloads and data access across Utility HPC

Today is a survey of use cases…
• Life Science: 10,600-instance molecular modeling
• Manufacturing: 600-core nuclear power plant safety simulation
• Genomics: RNA analysis for stem cells

Dynamic, utility access to compute power

is as important as uptime

Why?

#1: “Better” science = answer the question we actually want to ask, not one constrained to what fits on local compute power.

#2: “Faster” science = run this “better” science, which would have taken months or years, in hours or days.

Survey of Use Cases: ☑ Drug Design ☑ CAD/CAM ☑ Genomics …

[Chart: Life Sciences & Compute, plotting compute needs against data/bandwidth needs for genomics, molecular modeling, CAD/CAM, all-sample analysis, proteomics, biomarker/image analysis, and sensor data import.]

(Creating fake charts, with fake data.)

Why is this important?
~2 million Type 2 diabetics, ~200k Type 1 (W.H.O./Globocan 2008).
Every day is crucial and costly.

The process for drug design: initial coarse screen → higher-quality analysis → best quality.

Before: a trade-off of compute time vs. accuracy.
Now: accurate analysis, fewer false negatives, faster.

Big 10 Pharma: built a 10,600-instance cluster ($44M in equivalent hardware) in 2 hours, and ran 40 years of science in 11 hours for $4,372.
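A quick back-of-envelope check on those numbers (my arithmetic, assuming a 365-day year and that "40 years of science" means 40 serial compute-years):

# Sanity check: 40 years of science in 11 hours on 10,600 instances.
HOURS_PER_YEAR = 24 * 365                # 8,760
serial_hours = 40 * HOURS_PER_YEAR       # 350,400 compute-hours of work
speedup = serial_hours / 11              # ~31,855x effective parallelism
cores_per_instance = speedup / 10_600    # ~3.0 cores per instance on average
cost_per_hour = 4_372 / serial_hours     # ~$0.0125 per compute-hour
print(round(speedup), round(cores_per_instance, 1), round(cost_per_hour, 4))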

Most recent utility supercomputer server count:
[Screenshots: the AWS Console view and Cycle's view of the same cluster, all driven by one Chef 11 server.]

Earlier drug design: Novartis, discussed at BioIT 2012
• Needed: a push-button utility supercomputer for molecular modeling
• Created: a 30,000-core run across US/EU cloud (AWS)
• 10 years of compute in 8 hours, for $10,000
• Found 3 compounds, now in the wet lab as a result

Lessons learned
• Capacity is no longer an issue
• Hardware = software
• Testing (error handling, unit testing, etc.)
• e.g., Cycle spent ~$1M on AWS over 5 years
• The only way to do this is to automate

Servers are not house plants.

Servers are wheat.

 

Survey of Use Cases: ☑ Drug Design ☑ CAD/CAM ☑ Genomics …

Nuclear Power Plant simulation

We don't know what they're running, but it has "Safety".

600-core CAD/CAM: a wait of 3 quarters of a year became 3 weeks.

[Diagram: an engineer and site data behind the corporate firewall; scheduled data movement out to a secure ~600-CPU HPC cluster with a TB-scale shared filesystem in an external cloud. 3 weeks instead of 3 quarters.]

Survey of Use Cases: ☑ Drug Design ☑ CAD/CAM ☑ Genomics …

Gene Expression Analysis: Morgridge Institute for Research

Run a holistic comparison across all 78 terabytes of stem cell RNA samples to build a unique gene expression database,
making it easier to replicate disease in petri dishes with induced stem cells.

1 million compute hours: 115 years of computing in 1 week, for $19,555.
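Those figures are self-consistent; a quick check (my arithmetic, assuming a 365-day year):

# Consistency check: 115 years of computing in 1 week for $19,555.
HOURS_PER_YEAR = 24 * 365
total_hours = 115 * HOURS_PER_YEAR         # ~1,007,400: the "1 million compute hours"
concurrent_cores = total_hours / (7 * 24)  # ~5,996: fits the 5,000-10,000 core cluster
cost_per_hour = 19_555 / total_hours       # ~$0.019 per core-hour
print(round(concurrent_cores), round(cost_per_hour, 3))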

Gene Expression Analysis: Morgridge Institute for Research

Cluster details:
• 5,000 to 10,000 cores for a week
• Very long individual analyses were checkpointed, which made Spot instance usage possible (a sketch of the pattern follows)
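Checkpointing is what makes Spot economics safe: if an instance is reclaimed, only the work since the last checkpoint is lost. A minimal sketch of the pattern, with a stand-in workload and a hypothetical checkpoint path (the deck does not describe Morgridge's actual implementation):

# Checkpoint/resume sketch for Spot-friendly long-running analyses.
import os
import pickle

CHECKPOINT = "analysis.ckpt"  # hypothetical path; ideally on S3 or a shared FS

def load_state():
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "partial_results": []}

def save_state(state):
    # Write to a temp file, then atomically swap, so a mid-write kill is safe.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_state()
for step in range(state["step"], 1_000):
    state["partial_results"].append(step * step)  # stand-in for real work
    state["step"] = step + 1
    if state["step"] % 100 == 0:  # checkpoint every 100 steps
        save_state(state)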

Survey of Use Cases: ☑ Drug Design ☑ CAD/CAM ☑ Genomics …

Code can accelerate Science

The Scientific Method on Utility HPC: Ask a Question → Hypothesize → Predict → Experiment/Test → Analyze → Final Results

Yields “better”, “faster” research for less $.

Dynamic, utility access to compute power

is as important as uptime

I’m here to recruit you, for a cause

Contribute to Chef. Make the community better.

And you will help Cycle make impossible science,

possible.

2013 BigScience Challenge

$10,000 of free computing to science benefitting humanity

2012 winner: the 115-year genomic analysis

Enter at: http://cyclecomputing.com/big-science-challenge/enter

Thank You! Questions?
