Talk by Cory von Wallenstein from Dyn at Astricon in Atlanta, GA on the realities of global infrastructure in the cloud.
@cvonwallenstein from @DynInc
Global Infrastructure in the Cloud
Cory von Wallenstein, Chief Technology Officer, Dyn Inc.
@cvonwallenstein
http://www.flickr.com/photos/notaperfectpilot/8119088205/
“Wired people should know something about wires.” - Neal Stephenson, quoted in Andrew Blum’s TED Talk “What is the Internet, Really?”
http://www.ted.com/talks/andrew_blum_what_is_the_internet_really.html
Going Global in the Cloud
• Never been easier
• Never been more affordable
• Why should or shouldn’t you?
• If so, how?
A Word on Costs and Value
• Unlikely to save you raw dollars
• Likely to spend the same or more
• But here’s what you gain:
  – Flexibility (can’t really screw this up)
  – Performance (many caveats here)
  – Reliability (if you do it right)
  – Efficiency (if your team embraces it)
• Are those worthwhile to you?
Why go from 1 to N?
Reason 1: Disaster Recovery
http://maps.google.com
Reason 1: Disaster Recovery
http://www.cogentco.com/files/images/network/network_map/networkmap_global_large.png
Speed of light: 299,792.458 km/second
Theoretical RTT: ~40 ms
Real RTT: ~90 ms
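The theoretical figure is distance divided by the speed of light, doubled for the round trip. A minimal Python sketch of that arithmetic follows. The 6,000 km distance is a hypothetical value chosen because it gives roughly the slide’s ~40 ms (the talk does not say which two points were measured), and the refractive index of about 1.47 for optical fiber is a common approximation, not a figure from the talk.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in a vacuum, as on the slide


def min_rtt_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """Lower bound on round-trip time over a straight path, in milliseconds.

    refractive_index=1.0 models a vacuum; ~1.47 approximates silica
    fiber (an assumption, not a figure from the talk).
    """
    one_way_s = distance_km * refractive_index / SPEED_OF_LIGHT_KM_PER_S
    return 2 * one_way_s * 1000.0


if __name__ == "__main__":
    distance_km = 6_000  # hypothetical straight-line distance
    print(f"vacuum: {min_rtt_ms(distance_km):.1f} ms")       # vacuum: 40.0 ms
    print(f"fiber:  {min_rtt_ms(distance_km, 1.47):.1f} ms")  # fiber:  58.8 ms
```

Even the fiber bound sits well below the ~90 ms observed; the remainder plausibly comes from cable routes that are not straight lines, extra hops, and queuing along the way.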
Reason 1: Disaster Recovery
• Things don’t work as well at 90ms RTT latency as they do at 9ms RTT latency
• Where can you go to get out of the way of a disaster but not create latency headaches?
http://www.globaldatavault.com/natural-disaster-threat-maps.htm
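Two back-of-the-envelope effects show why a tenfold RTT increase hurts: a TCP connection can have at most one window of unacknowledged data in flight per round trip, and any work that needs sequential round trips pays the full RTT each time. The Python sketch below assumes a 64 KiB window and a request needing 20 sequential round trips; both numbers, and the helper names, are hypothetical and chosen only for illustration.

```python
def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Ceiling on one TCP connection's throughput: one window in flight per RTT."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1_000_000


def serial_wait_ms(rtt_ms: float, round_trips: int) -> float:
    """Network wait for round trips that must happen one after another."""
    return rtt_ms * round_trips


if __name__ == "__main__":
    window = 64 * 1024       # hypothetical 64 KiB window, no window scaling
    chatty_round_trips = 20  # hypothetical request needing 20 sequential round trips
    for rtt in (9, 90):
        print(
            f"{rtt} ms RTT: <= {window_limited_mbps(window, rtt):.1f} Mbit/s per connection, "
            f"{serial_wait_ms(rtt, chatty_round_trips):.0f} ms waiting on the network"
        )
```

Under these assumptions, going from 9 ms to 90 ms cuts the per-connection ceiling from about 58 to about 6 Mbit/s and stretches 180 ms of network waiting to 1.8 s.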
Reason 1: Disaster Recovery
http://www.datacenterknowledge.com/archives/2012/07/09/outages-surviving-electric-squirrels-ups-failures/
“A frying squirrel took out half of our Santa Clara data center two years back.” - Mike Christian, Yahoo
Reason 1: Disaster Recovery
http://blog.level3.com/level-3-network/the-10-most-bizarre-and-annoying-causes-of-fiber-cuts/
“Squirrel chews account for a whopping 17% of our damages so far this year! But let me add that it is down from 28% just last year and it continues to decrease since we added cable guards to our plant.” - Fred Lawler, Level(3)
Reason 2: Get closer to users
http://www.akamai.com/html/technology/dataviz1.html
Reason 3: “Sorry, we’re full”
http://www.theregister.co.uk/2010/10/12/capgemini_merlin_data_center/
How: Figure out who and where
• Figure out what your motivations are
  – Disaster recovery
  – Get closer to users
  – Future scaling
• Take a latency inventory of your apps
  – To end users
  – To other dependencies
• Get out the maps! Fire up traceroute!
  – EC2: US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), South America (Sao Paulo), and GovCloud.
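Traceroute shows the path; a quick way to start the latency inventory itself is to time TCP handshakes to each dependency. This Python sketch is one possible approach, not a tool from the talk: `tcp_connect_ms` and `latency_inventory` are hypothetical helpers. A TCP connect costs roughly one round trip, plus DNS resolution when you pass a hostname rather than an IP address.

```python
import socket
import time
from typing import Dict, Optional, Tuple


def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Time one TCP connect to host:port in milliseconds, or None on failure.

    The handshake costs about one RTT; if host is a name rather than an
    IP literal, the DNS lookup is included in the measurement.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # covers refused connections, timeouts, and DNS errors
        return None
    return (time.perf_counter() - start) * 1000.0


def latency_inventory(
    targets: Dict[str, Tuple[str, int]], samples: int = 3
) -> Dict[str, Optional[float]]:
    """Best-of-N connect time per named target; None if every attempt failed."""
    results: Dict[str, Optional[float]] = {}
    for name, (host, port) in targets.items():
        times = [t for t in (tcp_connect_ms(host, port) for _ in range(samples)) if t is not None]
        results[name] = min(times) if times else None
    return results
```

Running something like `latency_inventory({"users": ("www.example.com", 443), "db": ("10.0.0.5", 5432)})` (hypothetical targets) from each candidate region gives a first-cut comparison; taking the minimum of several samples filters out one-off delays.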
How: Deploy and manage w/ sanity
• Software defined datacenters
  – Fancy term for “I defined the architecture in code instead of in Microsoft Visio”
• Configuration management
  – Orchestrate the cloud APIs and the configuration of systems
  – Chef
  – Puppet
  – CFEngine, and more
• Huge loss if you don’t take advantage of this
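The core idea behind defining the architecture in code and using configuration management is convergence: declare the desired state, compare it with what is actually running, and compute the changes. A toy Python sketch of that diff follows. The region names are real EC2 region codes, but the instance counts, the `plan` function, and its output format are hypothetical and are not taken from Chef, Puppet, or CFEngine.

```python
from typing import Dict, List

# Desired architecture as data rather than a diagram (hypothetical counts).
DESIRED: Dict[str, int] = {
    "us-east-1": 3,       # US East (Northern Virginia)
    "eu-west-1": 2,       # EU (Ireland)
    "ap-southeast-1": 2,  # Asia Pacific (Singapore)
}


def plan(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """List the actions that would move actual server counts to the desired ones."""
    actions: List[str] = []
    for region in sorted(set(desired) | set(actual)):
        want, have = desired.get(region, 0), actual.get(region, 0)
        if want > have:
            actions.append(f"launch {want - have} in {region}")
        elif have > want:
            actions.append(f"terminate {have - want} in {region}")
    return actions


if __name__ == "__main__":
    running = {"us-east-1": 1, "us-west-1": 1}  # hypothetical current state
    for action in plan(DESIRED, running):
        print(action)
```

Once the plan has been applied, computing it again yields an empty list; that idempotence is what makes repeated runs safe.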
How: Coordinating global traffic
• What’s the app?
  – Application agnostic, like DNS Global Server Load Balancing (GSLB)
    • Fancy term for “DNS servers monitor your servers and change DNS answers when events are detected”
  – Application specific, like DUNDi
    • Decentralized coordination and fault tolerance
• Avoid single points of failure (SPOFs) like the plague
  – Keep it simple, keep it scalable
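The application-agnostic approach comes down to a small decision function: given health-check results, which addresses should a DNS answer contain for a given client? A hypothetical Python sketch follows. The pool, the client-region preference table, and `answer` are illustrative assumptions (the IPs are RFC 5737 documentation addresses), not how any particular GSLB product works.

```python
from typing import Dict, List

# Hypothetical pool: server region -> server IPs.
POOL: Dict[str, List[str]] = {
    "us-east": ["192.0.2.10", "192.0.2.11"],
    "eu-west": ["198.51.100.20"],
    "ap-southeast": ["203.0.113.30"],
}

# Hypothetical client region -> server regions in order of preference.
PREFERENCE: Dict[str, List[str]] = {
    "na": ["us-east", "eu-west", "ap-southeast"],
    "eu": ["eu-west", "us-east", "ap-southeast"],
    "asia": ["ap-southeast", "us-east", "eu-west"],
}


def answer(client_region: str, healthy: Dict[str, bool]) -> List[str]:
    """Healthy IPs from the most-preferred region that has any; empty if none."""
    for region in PREFERENCE[client_region]:
        ips = [ip for ip in POOL[region] if healthy.get(ip, False)]
        if ips:
            return ips
    return []
```

A monitoring loop would keep `healthy` up to date, and short record TTLs let resolvers pick up a changed answer quickly. Because the function holds no state, several DNS servers can run it independently, which fits the advice to avoid SPOFs.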
What can you expect?
• Flexibility
  – Deploy new servers in new locations in hours instead of weeks
• Performance
  – If your app scales horizontally on commodity hardware, you win. Else, be careful.
  – If you’re closer to users and either site-to-site latency is not an issue or your data is distributed/eventually consistent, you win. Else, be careful.
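“Eventually consistent” means each site accepts writes locally and replicas converge later. One common, simple convergence rule is last-writer-wins; the Python sketch below applies it to a single value. It illustrates the general technique rather than anything the talk prescribes, and it assumes reasonably synchronized clocks, using the site name only to break ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """A value tagged with when and where it was written (last-writer-wins)."""

    value: str
    timestamp: float  # write time; assumes roughly synchronized clocks
    site: str         # deterministic tie-breaker when timestamps are equal

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        """Keep the newer write; merge order does not matter, so replicas converge."""
        return max(self, other, key=lambda r: (r.timestamp, r.site))
```

Because `merge` is commutative, associative, and idempotent, sites can exchange state in any order over a slow link and still agree. The trade-off is that a concurrent write with an older timestamp is silently discarded.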
What can you expect?
• Reliability
  – If you understand “regions” and “availability zones”, you win. Else, be careful.
http://joyent.com/blog/if-i-was-your-cloud-provider-i-d-never-let-you-down
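In practice, understanding availability zones starts with not putting every instance in one zone and checking that you can lose a zone and still serve. A minimal Python sketch under hypothetical assumptions: three zones named in EC2’s `us-east-1a` style, a round-robin placement, and a minimum capacity you would choose yourself.

```python
from typing import Dict, List


def spread(instances: int, zones: List[str]) -> Dict[str, int]:
    """Place instances round-robin across availability zones."""
    counts = {zone: 0 for zone in zones}
    for i in range(instances):
        counts[zones[i % len(zones)]] += 1
    return counts


def survives_zone_loss(placement: Dict[str, int], min_capacity: int) -> bool:
    """True if losing any single zone still leaves at least min_capacity instances."""
    total = sum(placement.values())
    return all(total - n >= min_capacity for n in placement.values())


if __name__ == "__main__":
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]  # hypothetical zone list
    placement = spread(5, zones)
    print(placement, survives_zone_loss(placement, min_capacity=3))
```

The same check applies one level up: zones within a single region can share failure modes, so surviving a regional outage calls for a second region.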
What can you expect?
• Efficiency
  – Automation
  – More instrumentation -> reduced mean time to detect (MTTD)
  – More scalable
  – Most important: more focus on what delivers your business’s core competitive advantage
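MTTD is the average delay between when a problem starts and when someone or something notices it; better instrumentation shrinks that delay. A small Python sketch of the calculation, assuming you record a start and a detection timestamp for each incident (the sample incidents are hypothetical):

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, Tuple


def mttd(incidents: Iterable[Tuple[datetime, datetime]]) -> timedelta:
    """Mean of (detected_at - started_at) across incidents."""
    delays = [(detected - started).total_seconds() for started, detected in incidents]
    if not delays:
        raise ValueError("need at least one incident")
    return timedelta(seconds=mean(delays))


if __name__ == "__main__":
    incidents = [
        (datetime(2012, 10, 1, 12, 0), datetime(2012, 10, 1, 12, 10)),  # hypothetical
        (datetime(2012, 10, 2, 13, 0), datetime(2012, 10, 2, 13, 2)),   # hypothetical
    ]
    print(mttd(incidents))  # 0:06:00
```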
Thank you (and we’re hiring!)
VP Technical Operations, Director of Engineering, Director of Security, Network Engineers, Software Engineers, System Engineers, System Administrators (and more!)
Reach out to me: dyn.com, [email protected], @cvonwallenstein