I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
The “C” Word.
When I say “cloud” I’m talking IaaS.
Amazon AWS is the IaaS cloud.
Most others are fooling themselves. (Has-beens, also-rans & delusional marketing zombies)
A message for the pretenders…
No APIs? Not a cloud.
No self-service? Not a cloud.
I have to email a human?
Not a cloud.
~50% failure rate when provisioning new servers?
Stupid cloud.
Block storage and virtual servers only? (Barely) a cloud.
Private Clouds: My $.02
Private Clouds in 2012:
• Hype vs. Reality ratio still wacky
• Sensible only for certain shops
  • Have you seen what you have to do to your networks & gear?
• There are easier ways
Private Clouds: My Advice for ’12
• Remain cynical (test vendor claims)
• Due diligence still essential
• I personally would not deploy/buy anything that does not explicitly provide Amazon API compatibility
Private Clouds: My Advice for ’12, continued
• Most people are better off:
  • Adding VM platforms to existing HPC clusters & environments
  • Extending enterprise VM platforms to allow user self-service & server catalogs
Enough Bloviating. Advice time.
Tip #1
HPC & Clouds: Whole New World
• We have spent decades learning to tune research HPC systems for shared access & many users.
• The cloud upends this model
• Far more common to see …
  • Dedicated cloud resources spun up for each app or use case
  • Each system gets individually tuned & optimized
Tip #2
Hybrid Clouds & Cloud Bursting
• Lots of aggressive marketing
• Lots of carefully constructed “case studies” and prototypes
• The truth?
  • Less usable than you’ve been told
  • Possible? Heck yeah.
  • Practical? Only sometimes.
• Advice
  • Be cynical
  • Demand proof
  • Test carefully
• Still want to do it?
  • Buy it, don’t build it
  • Cycle Computing
  • Univa
  • Bright Computing
  • …
• Follow the crowd
  • In the real world we see:
    • Separation between local and cloud HPC resources
    • Send your work to the system most suitable
Tip #3
You can’t rewrite EVERYTHING.
• Salesfolk will just glibly tell you to rewrite your apps so you can use whatever big data analysis framework they happen to be selling today
• They have no clue.
• In life science informatics we have hundreds of codes that will never be rewritten.
• We’ll be needing them for years to come.
• Advice:
  • MapReduceish methods are the future for big-data informatics
  • It will take years to get there
  • We still have to deal with legacy algorithms and codes
• You will need:
  • A process for figuring out when it’s worthwhile to rewrite/re-architect
  • Tested cloud strategies for handling three use cases
You need 3 cloud architectures:
1. Legacy HPC
2. “Cloudy” HPC
3. Big Data HPC (Hadoop)
Legacy HPC on the cloud
• MIT StarCluster
  • http://web.mit.edu/star/cluster/
• This is your baseline
• Extend as needed
“Cloudy” HPC
• Use this method when …
  • It makes sense to rewrite or rearchitect an HPC workflow to better leverage modern cloud capabilities
“Cloudy” HPC, continued
• Ditch the legacy compute farm model
• Leverage elastic scale-out tools (***)
• Spot Instances for elastic & cheap compute
• SimpleDB for job statekeeping
• SQS for job queues & workflow “glue”
• SNS for message passing & monitoring
• S3 for input & output data
• Etc.
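A rough sketch of the queue-plus-state pattern behind those bullets. Nothing here talks to AWS: an in-memory `queue.Queue` stands in for SQS and a plain dict stands in for SimpleDB, and the job names and the `run_worker` helper are made up for illustration.

```python
import queue

def run_worker(jobs, state, process):
    """Drain the job queue, recording each job's status in the state store."""
    while True:
        try:
            job_id, payload = jobs.get_nowait()
        except queue.Empty:
            return  # queue is empty; an elastic worker could shut itself down here
        state[job_id] = "running"
        try:
            process(payload)
            state[job_id] = "done"
        except Exception:
            state[job_id] = "failed"  # left for a retry policy to pick up

# Stand-ins (assumptions): Queue plays the role of SQS, dict plays SimpleDB.
jobs = queue.Queue()
state = {}
for i, name in enumerate(["sample_a.fastq", "sample_b.fastq"]):
    jobs.put((f"job-{i}", name))
    state[f"job-{i}"] = "queued"

run_worker(jobs, state, process=lambda payload: len(payload))
```

Because workers only share the queue and the state store, you can start or kill any number of them (spot instances, say) without touching the submit side.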
Big Data HPC
• It’s gonna be a MapReduce world
• Little need to roll your own
• Ecosystem already healthy
• Multiple providers today
• Often a slam-dunk cloud use case
Tip #4
The Cloud was not designed for “us”
• HPC is an edge case for the hyperscale IaaS clouds
• We need to deal with this and engineer around it.
• Many examples
  • Eventual consistency
  • Networking & subnets
  • Latency
  • Node placement
• Advice
  • Manage expectations
  • Benchmark & test
  • Evangelize
    • (pester the cloud sales reps …)
Tip #5
Data Movement Is Still Hard
• Consistently getting easier
  • Amazon is not a bottleneck
  • AWS Import/Export
  • AWS Direct Connect
  • Aspera has some amazing stuff out right now
• Advice
  • AWS Import/Export works well
  • Size of pipe is not everything
  • Sweat the small stuff
    • Tracking, checksums, disk speed
    • Dedicated workstations
    • Secure media storage
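One way to handle the “tracking, checksums” part is a checksum manifest built before the data leaves and verified after it lands. This is a minimal sketch using only the Python standard library; the helper names (`sha256_of`, `build_manifest`, `verify_manifest`) and the demo file are assumptions for illustration, not a tool from the talk.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(directory):
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def verify_manifest(directory, manifest):
    """Return the files that are missing or whose digest no longer matches."""
    current = build_manifest(directory)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)

# Demo with a throwaway directory standing in for a staging disk.
with tempfile.TemporaryDirectory() as staging:
    Path(staging, "reads.txt").write_text("ACGT")
    manifest = build_manifest(staging)
    clean = verify_manifest(staging, manifest)     # nothing changed yet
    Path(staging, "reads.txt").write_text("ACGA")  # simulate corruption
    damaged = verify_manifest(staging, manifest)
```

Ship the manifest alongside the data and rerun the verify step on the receiving side before anyone deletes the source copy.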
Dedicated data movement station
‘naked’ Terabyte-scale data movement
Don’t overlook media storage …
• Advice for 2012
  • BioTeam is dialing down our advocacy of physical data ingestion into the cloud
• Why?
  • Operationally hard, expensive and no longer strictly needed
Real world cross-country internet-based data movement
March 2012
700Mb/sec into Amazon, stress-free & zero tuning
March 2012
• People trying to move data via physical media quickly realize the operational difficulties
• Bandwidth is cheaper than hiring another body to manage physical data ingestion & movement
• In 2012 we strongly recommend network-based data movement when at all possible
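For a feel of what a rate like the 700 Mb/sec figure above means in practice, here is some back-of-envelope arithmetic. It assumes “Mb” means megabits, a decimal terabyte (10^12 bytes), and a rate that is sustained for the whole transfer; real transfers will vary.

```python
def transfer_hours(terabytes, megabits_per_sec):
    """Hours needed to move the given volume at a sustained network rate."""
    bits = terabytes * 1e12 * 8                # decimal TB to bits
    seconds = bits / (megabits_per_sec * 1e6)  # Mb/sec to bits/sec
    return seconds / 3600

hours_per_tb = transfer_hours(1, 700)    # roughly 3.2 hours per terabyte
hours_for_10tb = transfer_hours(10, 700)
```

At that rate a terabyte moves in an afternoon, which is hard for a shipped disk to beat once you count packing, courier time and ingest.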
u r doing it wrong
cool data movement, bro!
Tips #6 & 7
Cloud storage. Still slow.
Big shared storage. Still hard.
• Not much we can do except engineer around it
• AWS Cluster Compute instances are a huge step forward
• AWS competitors take note
• We are not database nerds
• We care about more than just random IO performance
• We need it all
  • Random I/O
  • Long sequential read/write
• Faster Storage Options
  • Software RAID on EBS
  • Various GlusterFS options
  • Even if you optimize everything, the virtual NICs are still a bottleneck
• Big Shared Storage
  • 10GbE nodes and NFS
  • Software RAID sets
  • GlusterFS or similar
  • 2012: pNFS finally?
Tip #8
Things fail differently in the cloud.
• Stuff breaks
  • It breaks in weird ways
  • Transient/temporary issues more common than what we see “at home”
• Advice
  • Pessimism is good
  • Design for failure
  • Think hard about
    • How will you detect?
    • How will you respond?
• Advice
  • Remove humans from loop
  • Automate recovery
  • Automate your backups
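A common way to take humans out of the loop for transient failures is retry with exponential backoff. The sketch below is one possible shape, not a prescription from the talk; `with_retries` and `flaky_provision` are hypothetical names, and the injected `sleep` exists only so the demo does not actually wait.

```python
import time

def with_retries(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on failure with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to monitoring
            sleep(base_delay * 2 ** attempt)

# Hypothetical stand-in for a cloud call that fails twice, then succeeds.
calls = {"count": 0}

def flaky_provision():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "server-ready"

delays = []  # record the requested delays instead of sleeping
result = with_retries(flaky_provision, sleep=delays.append)
```

In real use you would catch only the exception types you know to be transient and let everything else fail fast.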
Tip #9
Serial/batch computing at-scale
• Loosely coupled workflows are ideal
• Break the pipeline into discrete components
• Components should be able to scale up|down independently
• Component = Opportunity to:
  • … Make a scaling decision
    • (# nodes in use)
  • … Make a sizing decision
    • (instance type in use)
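The two decisions can be sketched as small functions that each pipeline component calls on its own. Everything here is an assumption for illustration: the queue-depth rule, the node limits, and the instance catalog with its made-up names and memory sizes.

```python
import math

def nodes_needed(queued_jobs, jobs_per_node, min_nodes=0, max_nodes=20):
    """Scaling decision: how many nodes this component should run right now."""
    wanted = math.ceil(queued_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, wanted))

def pick_instance_type(memory_gb_per_job, catalog):
    """Sizing decision: smallest instance type with enough memory, else None."""
    for name, memory_gb in sorted(catalog.items(), key=lambda item: item[1]):
        if memory_gb >= memory_gb_per_job:
            return name
    return None

# Hypothetical catalog: instance name -> memory in GB.
catalog = {"small": 4, "medium": 16, "large": 64}

align_nodes = nodes_needed(queued_jobs=45, jobs_per_node=10)
align_type = pick_instance_type(memory_gb_per_job=10, catalog=catalog)
```

Each stage of the pipeline gets its own answers to both questions, so an alignment stage can scale wide on medium nodes while an assembly stage runs on a few large ones.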
Nirvana is …
… independent loosely connected components that can self-scale and communicate asynchronously
Advice:
• Many people already doing this
• Best practices are well known
• Steal from the best:
  • RightScale, Opscode & Cycle Computing
Phew. Think I’m done now.
End;
Backup Slides
Private Clouds: Pick Your Poison
• OpenStack - http://openstack.org
  • Pro: Super smart developers; significant mindshare; True Open Source
  • Con: Commitment to AWS API compatibility (?) & stability
Private Clouds: Pick Your Poison
• CloudStack - http://cloudstack.org
  • Pro: Explicit AWS API support; very recent move away from “open-core” model; usability
  • Con: Developer mindshare? Sudden switch to Apache
Private Clouds: Pick Your Poison
• Eucalyptus - http://eucalyptus.com
  • Pro: Direct AWS API compatibility; lots of hypervisor support
  • Con: Open-core model; mindshare; recent resurrection