
Page 1: 2015 09 emc lsug

Collaboration @ Scale
September 2015: Life Sciences User Group, Cambridge MA

Chris Dwan ([email protected])
Director, Research Computing and Data Services
Acting Director, IT

Page 2: 2015 09 emc lsug

Conclusions

• Good news: The fundamentals still apply.

• Understand your data.
– Get intense about what you need, why you need it, who is responsible for it, and how / when you plan to compute against it.
– This will require organizational courage.

• Stop thinking about "moving" data.
– Archive first. After that, all copies are transient.

• Object storage is different from files
– at many weird levels.

• Elasticity in compute is not like elasticity in data
– Availability of CPUs vs. proximity to elastic compute.
– Also, "trash storage?"

Page 3: 2015 09 emc lsug

• The Broad Institute is a non-profit biomedical research institute founded in 2004

• Fifty core faculty members and hundreds of associate members from MIT and Harvard

• ~1000 research and administrative personnel, plus ~2,400+ associated researchers

• ~1.4 × 10⁶ genotyped samples

Programs and Initiatives, focused on specific disease or biology areas:
Cancer • Genome Biology • Cell Circuits • Psychiatric Disease • Metabolism • Medical and Population Genetics • Infectious Disease • Epigenomics

Platforms, focused on technological innovation and application:
Genomics • Therapeutics • Imaging • Metabolite Profiling • Proteomics • Genetic Perturbation

The Broad Institute

Page 4: 2015 09 emc lsug


“This generation has a historic opportunity and responsibility to transform medicine by using systematic approaches in the biological sciences to dramatically accelerate the understanding and cure of disease”

Page 5: 2015 09 emc lsug

If a man’s at odds to know his own mind it’s because he hasn’t got aught but his mind to know it with.

Cormac McCarthy, Blood Meridian or The Evening Redness in the West

Page 6: 2015 09 emc lsug

Broad Genomics Data Production

338 trillion base pairs (PF) in August.
At ~1.25 bytes per base: 422 TByte / month ≈ 170 MByte / sec
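As a sanity check, a few lines of Python (assuming a 30-day month) reproduce that arithmetic:

```python
# Back-of-envelope check of the slide's throughput numbers.
bases_per_month = 338e12      # 338 trillion passing-filter (PF) base pairs
bytes_per_base = 1.25         # rough storage cost per base, including overhead

tb_per_month = bases_per_month * bytes_per_base / 1e12
mb_per_sec = bases_per_month * bytes_per_base / (30 * 24 * 3600) / 1e6

print(f"{tb_per_month:.0f} TB/month")   # -> 422 TB/month
print(f"{mb_per_sec:.0f} MB/sec")       # -> ~163 MB/sec, call it ~170 sustained
```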

Page 7: 2015 09 emc lsug

Broad Genomics Data Production: Context

Page 8: 2015 09 emc lsug

Broad Genomics Data Production: Context

We were all talking about “data tsunamis” here.

Page 9: 2015 09 emc lsug

Broad Genomics Data Production: Context

I joined the Broad here

We were all talking about “data tsunamis” here.

Page 10: 2015 09 emc lsug

Under the hood: ~1TB of MongoDB

Page 11: 2015 09 emc lsug

Organizations which design systems … are constrained to produce designs which are copies of the communication structures of those organizations

Melvin Conway, 1968

Page 12: 2015 09 emc lsug

If you have four groups working on a compiler, you'll get a four-pass compiler.

Eric S. Raymond, The New Hacker's Dictionary, 1996

Page 13: 2015 09 emc lsug

Never send a human to do a machine’s job.

Agent Smith, The Matrix

Page 14: 2015 09 emc lsug

Broad IT Services

Traditional IT:
• Globally shared services
• NFS, AD / LDAP, DNS, …
• Many services provided using public clouds
Responsibility: CIO

Page 15: 2015 09 emc lsug

Broad IT Services

Billing Support (Cancer Genome Analysis, Connectivity Map):
• IT provides coordination between internal cost objects and cloud vendor "projects" or "roles"
• No shared services
Responsibility: User

Page 16: 2015 09 emc lsug

Broad IT Services

Cloud / Hybrid Model:
• Granular shared services
• VPN used to expose selected services to particular projects
Responsibility: Project / Service Lead

BITS DevOps • DSDE Dev • Cloud Pilot (each behind its own VPN)

Page 17: 2015 09 emc lsug

The future is already here – it’s just not very well distributed

William Gibson

Page 18: 2015 09 emc lsug

CycleCloud provides straightforward, recognizable cluster functionality with autoscaling and a clean management UI.

Do not be fooled by the 85-page "quick start guide": it's just a cluster.

Page 19: 2015 09 emc lsug

Instances are provisioned based on queued jobs

3,000 tasks completed in two hours (differential dependency on gene sets in R)

5 instances @ 32 cores: $8.54 / hr. This was a $20 analysis.
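The scheduling logic behind this is simple enough to sketch. Below is a minimal, hypothetical version of the queue-driven decision an autoscaler makes; the function names, instance cap, and per-instance price are illustrative, not CycleCloud's actual API:

```python
import math

def instances_needed(queued_tasks, cores_per_instance=32, max_instances=5):
    """Request just enough instances to drain the queue, up to a cap."""
    return min(max_instances, math.ceil(queued_tasks / cores_per_instance))

def run_cost(instances, hours, price_per_instance_hour):
    """Total spend for a burst of a given size and duration."""
    return instances * hours * price_per_instance_hour

n = instances_needed(queued_tasks=3000)                        # -> 5 (capped)
print(run_cost(n, hours=2, price_per_instance_hour=8.54 / 5))  # ~$17: "a $20 analysis"
```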

Searching for the right use case …

Page 20: 2015 09 emc lsug

CycleCloud on Google Preemptible Instances

50,000+ cores used for ~2 hours

Page 21: 2015 09 emc lsug

If you want to recruit the best people, you have to hit home runs from time to time.

Page 22: 2015 09 emc lsug

My Metal

[Stack diagram: the same layered stack appears three times, over bare metal, a private cloud, and public clouds]

Bare Metal:
• Hardware Provisioning (UCS, xCAT)
• Boot Image Provisioning (PXE / Cobbler, Kickstart)
• Network topology (VLANs, et al.)

Private Cloud:
• Hypervisor OS
• Instance Provisioning (Openstack)

Public Cloud:
• Public Cloud Infrastructure
• Instance Provisioning (CycleCloud)

Layers common to all three:
• End User visible OS and vendor patches (Red Hat, plus Satellite)
• Broad configuration (Puppet)
• User or execution environment (Dotkit, Docker, JVM, Tomcat)

Containerized Wonderland:
• … Docker / Mesos / Kubernetes / Cloud Foundry / Workflow Description Language / …

The basics still apply

Page 23: 2015 09 emc lsug

[Diagram: sequencing data flow across on-premise filers (bragg, iodine, argon)]

Sequencer → Flowcell Directories → Lane BAMs → Aggregated BAMs → gVCF / VCF
• Flowcell Directories: base calling, paired reads, /seq/illumina (deleted after six weeks)
• Lane BAMs: aligned, not aggregated, /seq/picard
• Aggregated BAMs: aligned to a reference, /seq/picard_aggregation ("keep forever")

A nightmare* of files

Page 24: 2015 09 emc lsug

[Same diagram, adding the knox filer]

• Six months on high performance storage, then migrated to cost effective filers.
• Over time, these directories become a highly curated forest of symbolic links, spanning several filesystems.

A nightmare of files

Page 25: 2015 09 emc lsug

[Same diagram, now spanning filers bragg, knox, kiwi, flynn, argon, and mint]

A nightmare of files

Page 26: 2015 09 emc lsug

[Same diagram]

Setting aside the operational issues, meaningful access management is frankly impossible in this architecture.

A nightmare* of files

Page 27: 2015 09 emc lsug

Caching edge filers for shared references

[Diagram: Openstack and the Production Farm on an 80+ Gb/sec network, the Shared Research Farm on a 10 Gb/sec network, with a physical Avere Edge Filer in front of the on-premise data stores]

Coherence on small volumes of files is provided by a combination of clever network routing and Avere's caching algorithms.

Page 28: 2015 09 emc lsug

Cloud-backed, file-based storage

[Same diagram, with cloud-backed data stores on multiple public clouds added behind the physical Avere Edge Filer]

We decided to call this fargo. It's cold, sort of far away, and not really where we were planning to go.

Page 29: 2015 09 emc lsug

Caching edge filers for unlimited expansion space

[Same diagram, adding a virtual Avere Edge Filer in the public cloud, in front of the cloud-backed data stores]

Eventually we can stand up "cloud pods" that make direct reference to fargo.

Page 30: 2015 09 emc lsug

[Same nightmare-of-files diagram, with the curated symlink forest now backed by Fargo (Avere-backed file storage)]

This is cool, but it's not the answer.


Page 32: 2015 09 emc lsug

Data push to “Fargo”

September 1, 2015:
• Sustained 250 MB/sec for several weeks
• 646 TB of files occupying 579 TB of usable space (compression, even at 10% savings, is totally worth it)
• Client side encryption in-line: skip the conversation, just click the button.

Page 33: 2015 09 emc lsug

The edges are still a little rough

The billing API is the best way to get usage information out of cloud providers.

Page 34: 2015 09 emc lsug

The edges are still a little rough

The billing API is the best way to get usage information out of google’s cloud offerings.

“df” can be off by hundreds of TB.

Page 35: 2015 09 emc lsug

The edges are still a little rough

Seriously? “df” is off by hundreds of TB.

Eight exabytes is cool though.

Page 36: 2015 09 emc lsug

The edges are still a little rough

I guess it’s better than waiting all day for ‘du’ to finish…

Page 37: 2015 09 emc lsug

The edges are still a little rough

We write ~250 objects, 1MB each, every second of every day.

“ls” is not a meaningful tool at this scale.
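The arithmetic behind that claim is worth spelling out; a quick sketch, assuming the write rate holds around the clock:

```python
# Why "ls" is hopeless here: the object count outruns any listing.
objects_per_sec = 250                    # ~250 objects/sec, ~1 MB each
per_day = objects_per_sec * 86400
per_month = per_day * 30

print(f"{per_day / 1e6:.1f} M objects/day")      # ~21.6 M objects/day
print(f"{per_month / 1e6:.0f} M objects/month")  # ~648 M objects/month
print(f"{per_day * 1 / 1e6:.1f} TB/day")         # ~21.6 TB/day written
```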

Page 38: 2015 09 emc lsug

The edges are still a little rough

Old style dashboards simply won’t cut it.

Page 39: 2015 09 emc lsug

File based storage: The Information Limits

• Single namespace filers hit real-world limits at:
– ~5 PB (restriping times, operational hotspots, MTBF headaches)
– ~10⁹ files: directories must either be wider or deeper than human brains can handle.
• Filesystem paths are presumed to persist forever
– Leads inevitably to forests of symbolic links
• Access semantics are inadequate for the federated world.
– We need complex, dynamic, context sensitive semantics, including consent for research use.


Page 41: 2015 09 emc lsug

Object storage

• It’s still made out of disks and servers.

• You get the option of striping across on-premise and cloud in dynamic and sensible ways.

Page 42: 2015 09 emc lsug

My object storage opinions

• The S3 standard defines object storage
– Any application that uses any special / proprietary features is a nonstarter, including clever metadata stuff.
• All object storage must be durable to the loss of an entire data center
– Conversations about sizing / usage need to be incredibly simple
• Must be cost effective at scale
– Throughput and latency are considerations, not requirements
– This breaks the data question into stewardship and usage
• Must not merely re-iterate the failure modes of filesystems

Page 43: 2015 09 emc lsug

Do not call the tortoise unworthy because she is not something else.

Walt Whitman, Song of Myself

Page 44: 2015 09 emc lsug

Object Storage is different

• Filesystems
– I/O errors or stalls are rare, and are usually evidence of serious problems
– Optimize for throughput by using long streaming reads and writes.

• Object Storage
– I/O errors are common, with an expectation of several retries
– Optimize for throughput by parallelizing and reducing the cost of a retry
– Multipart upload and download are essential

Page 45: 2015 09 emc lsug

Broad Data Production, 2015: ~100 TB / week of unique information

“Data is heavy: It goes to the cheapest, closest place, and it stays there”

Jeff Hammerbacher

This means that you should put data in its final resting place as soon as it is generated. Anything else leads to madness.

Page 46: 2015 09 emc lsug

[Same pipeline diagram: Sequencer → Flowcell Directories → Lane BAMs → Aggregated BAMs → gVCF / VCF, now feeding a long term archive]

Archive first
• Our long term archive must be "object native".
• Crammed, encrypted BAMs (not aligned, not aggregated) go straight into the archive.
• Must re-tool all pipelines to support object storage stage-in and stage-out (sketched below).
• Once you have your archive right, all other data is transient.
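The stage-in / stage-out retooling amounts to wrapping every pipeline step so the tools still see ordinary files while the bucket copy stays canonical. A hypothetical sketch, with the tool and bucket names invented for illustration:

```python
import os
import subprocess
import tempfile

import boto3

s3 = boto3.client("s3")

def run_staged(bucket, in_key, out_key, tool_argv):
    """Stage in from the archive, run a file-based tool, stage out."""
    with tempfile.TemporaryDirectory() as scratch:
        local_in = os.path.join(scratch, "input")
        local_out = os.path.join(scratch, "output")
        s3.download_file(bucket, in_key, local_in)              # stage in
        subprocess.run([*tool_argv, local_in, local_out], check=True)
        s3.upload_file(local_out, bucket, out_key)              # stage out
    # scratch is deleted on exit: every local copy is transient by construction

# e.g., for some step that takes "input output" positionally:
# run_staged("broad-archive", "cram/sample.cram", "bam/sample.bam", ["my_align_step"])
```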

Page 47: 2015 09 emc lsug

[Same diagram]

Once the long term archive is object-native, we can move the main-line production to the cloud.

Page 48: 2015 09 emc lsug

The dashboard should look opaque, because metadata lives elsewhere.

Page 49: 2015 09 emc lsug

The dashboard should look opaque

• Object "names" should be a bag of UUIDs.
• Object storage should be basically unusable without the metadata index.
• Anything else recapitulates the failure mode of file based storage.
• This should scare you.
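Concretely, that looks something like the sketch below: objects live under bare UUIDs, and every human-meaningful attribute lives in the index. Here a dict stands in for something like the MongoDB instance mentioned earlier; the names and fields are illustrative:

```python
import uuid

import boto3

s3 = boto3.client("s3")
metadata_index = {}   # stand-in for a real index (e.g., MongoDB)

def archive(local_path, bucket, attrs):
    """Store a file under a bare UUID; all meaning goes into the index."""
    object_id = str(uuid.uuid4())              # the object's only "name"
    s3.upload_file(local_path, bucket, object_id)
    metadata_index[object_id] = attrs          # sample, consent, pipeline, ...
    return object_id

oid = archive("sample.g.vcf", "some-archive-bucket",
              {"sample": "NA12878", "type": "gVCF", "consent": "GRU"})

# Listing the bucket shows only UUIDs; queries must go through the index:
gvcfs = [k for k, v in metadata_index.items() if v["type"] == "gVCF"]
```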

Page 50: 2015 09 emc lsug

Data Deletion @ Scale

Me: “Blah Blah … I think we’re cool to delete about 600TB of data from a cloud bucket. What do you think?”

Page 51: 2015 09 emc lsug

Data Deletion @ Scale

Ray: "BOOM!"

Page 52: 2015 09 emc lsug

Data Deletion @ Scale

• This was my first deliberate data deletion at this scale.
• It scared me how fast / easy it was.
• Considering a "pull request" model for large scale deletions (see the sketch below).
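A minimal sketch of what that "pull request" model might look like. This is entirely hypothetical; the point is a reviewable manifest and a mandatory second approver standing between intent and execution:

```python
import hashlib
import json

def propose_deletion(bucket, keys, requester):
    """Write up the deletion as a reviewable manifest."""
    manifest = {"bucket": bucket, "keys": sorted(keys), "requester": requester}
    manifest["digest"] = hashlib.sha256(
        json.dumps(manifest, sort_keys=True).encode()).hexdigest()
    return manifest

def execute_deletion(s3, manifest, approver):
    """Only a second person may pull the trigger."""
    if approver == manifest["requester"]:
        raise PermissionError("large deletions need a second set of eyes")
    keys = manifest["keys"]
    for i in range(0, len(keys), 1000):      # delete_objects caps at 1000 keys/call
        s3.delete_objects(
            Bucket=manifest["bucket"],
            Delete={"Objects": [{"Key": k} for k in keys[i:i + 1000]]},
        )
```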

Page 53: 2015 09 emc lsug

Standards are needed for genomic data

“The mission of the Global Alliance for Genomics and Health is to accelerate progress in human health by helping to establish a common framework of harmonized approaches to enable effective and responsible sharing of genomic and clinical data, and by catalyzing data sharing projects that drive and demonstrate the value of data sharing.”

Regulatory Issues • Ethical Issues • Technical Issues

Page 54: 2015 09 emc lsug

This stuff is important

We have an opportunity to change lives and health outcomes, and to realize the gains of genomic medicine, this year.

We also have an opportunity to waste vast amounts of money and still not really help the world.

I would like to work together with you to build a better future, sooner.

[email protected]

Page 55: 2015 09 emc lsug

Conclusions

• Good news: The fundamentals still apply.

• Understand your data.
– Get intense about what you need, why you need it, who is responsible for it, and how / when you plan to compute against it.

• Stop thinking about "moving" data.
– Archive first. After that, all copies are transient.

• Object storage is different from files
– at many weird levels.

• Elasticity in compute is not like elasticity in data
– Availability of CPUs vs. proximity to elastic compute.
– Also, "trash storage?"

Page 56: 2015 09 emc lsug

The opposite of play is not work, it’s depression

Jane McGonigal, Reality is Broken

Thank You