8/14/2019 Cloud Storage FUD
Cloud Storage FUD
Alyssa Henry
General Manager
Amazon S3
Amazon S3:
Storage for the Internet
[Chart: Billions of Objects Stored, 2006 Q4 through 2008 Q4, vertical axis from 0 to 40]
Design Goals
In life, as in football, you won't go far unless you know where the goalposts are.
Arnold H. Glasgow
Durable
Won't lose or corrupt objects
http://www.flickr.com/photos/14466267@N07/3237986628/
Available
Always on
No planned downtime
Engineer for 99.99%
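To put the 99.99% target in concrete terms, the sketch below converts an availability percentage into a downtime budget. The function name and the 365-day year are illustrative assumptions, not figures from the talk.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the minutes of downtime allowed over the period at the given availability."""
    # Fraction of the period during which the service may be unavailable.
    unavailable_fraction = 1.0 - availability_pct / 100.0
    # Convert the period from days to minutes and take that fraction of it.
    return unavailable_fraction * period_days * 24 * 60
```

Under these assumptions, `downtime_budget_minutes(99.99)` comes to roughly 52.6 minutes per year, which is why "no planned downtime" sits next to the 99.99% goal: a single maintenance window could consume the whole budget.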
Scalable
Virtually infinite
Support an unlimited number of web-scale apps
Use scale as an advantage
http://www.flickr.com/photos/milesdeestrellas/210424609/
Secure
Secure protocols
Authentication mechanisms
Access controllable, log-able
Fast
Support high performance apps
S3 latency insignificant relative to Internet latency
Reduce Internet latency by adding new locations
http://www.flickr.com/photos/lazyousuf/3112028635/
Simple
Self-service
Straightforward API
Few concepts to learn
http://www.flickr.com/photos/jose_zaragoza/1520808946/
Cost Effective
Pay as you go
Pay only for what is used
No long-term contracts or commitments
Use software and scale to reduce costs
http://theunquietlibrary.files.wordpress.com/2007/12/pennies.jpg
Uncertainty
Everything is vague to a degree you do not realize till you have tried to make it precise.
Bertrand Russell
What Don't We Know?
Customer usage consistent or changing over time
Predominant workload type
Object access frequency
Object access volume
Object access locality
Object lifetime
Object size
http://www.kk.org/thetechnium/question-mark.jpg
Uncertainty Is Certain
Inherent in general purpose systems
Use cases varied
May change over time
May change suddenly
Have to make assumptions
Failure
Try again. Fail again. Fail better.
Samuel Beckett
What Are The Odds?
Many failures happen frequently
Even low probability events happen at high scale
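The point about scale can be checked with a little arithmetic. The sketch below assumes independent trials, which is a simplification, and the function names and any numbers passed in are illustrative rather than taken from the talk.

```python
import math


def expected_events(p_event: float, trials: int) -> float:
    """Expected number of occurrences of an event across independent trials."""
    return p_event * trials


def prob_at_least_one(p_event: float, trials: int) -> float:
    """Probability the event occurs at least once: 1 - (1 - p)^n."""
    # log1p/expm1 keep the result accurate when p_event is very small.
    return -math.expm1(trials * math.log1p(-p_event))
```

With a one-in-a-billion event and 40 billion trials, `expected_events(1e-9, 40_000_000_000)` is about 40 and `prob_at_least_one` is indistinguishable from 1: at that scale the "rare" event is a routine one.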
Failure Happens
Natural disasters destroy data centers
Load balancers corrupt packets
Technicians pull live fiber
Routers black hole traffic
Power and cooling fail
NICs corrupt packets
Disk drives fail
Bits rot
http://www.flickr.com/photos/buglugs/18846545/
Failure Types
[Diagram: failures plotted by scope (none to all) against duration (temporary to permanent); impact ranges from harmless to catastrophic]
Techniques
Do not let what you cannot do interfere with what you can do.
John Wooden
Redundancy
Broadly applicable technique
Increases durability, availability, cost, complexity
Seat belt & air bag vs. belt & suspenders
Plan for catastrophic loss of entire data center
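The redundancy trade-off can be put in rough numbers. This is a minimal sketch that assumes copies fail independently and that cost grows linearly with the number of full copies; both are simplifying assumptions, and the function names are hypothetical.

```python
def loss_probability(p_copy_loss: float, copies: int) -> float:
    """Probability that every copy is lost, assuming copies fail independently."""
    return p_copy_loss ** copies


def storage_cost_multiplier(copies: int) -> float:
    """Raw storage cost relative to a single copy, assuming full replicas."""
    return float(copies)
```

For example, if a single copy has a 1% chance of being lost over some period, three independent copies bring the chance of losing all of them to about one in a million, at three times the raw storage cost. The independence assumption is the weak point: copies in the same data center can fail together, which is why the plan has to cover the loss of an entire data center.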
Retry
Resolves temporary failures
Real-time or at a later date
Leverage redundancy
Idempotency
LATHER, RINSE, REPEAT
http://www.flickr.com/photos/tfrancis/652520255/
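A minimal sketch of how retry, redundancy, and idempotency fit together. The `Replica` class, the `TransientError` type, and the request-ID scheme are all hypothetical and exist only to illustrate the idea; they do not describe how S3 is built.

```python
class TransientError(Exception):
    """Hypothetical error type for a failure that may succeed on retry."""


class Replica:
    """Toy storage replica that fails a fixed number of times before working."""

    def __init__(self, failures_before_success: int = 0):
        self.failures_left = failures_before_success
        self.objects = {}
        self.applied_requests = set()
        self.write_count = 0

    def put(self, request_id: str, key: str, value: bytes) -> None:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TransientError("temporary failure")
        # Idempotency: a request ID that was already applied is a no-op,
        # so a retried request cannot be applied twice.
        if request_id in self.applied_requests:
            return
        self.applied_requests.add(request_id)
        self.objects[key] = value
        self.write_count += 1


def put_with_retry(replicas, request_id, key, value, max_attempts=5):
    """Try replicas in round-robin order until one accepts the write."""
    last_error = None
    for attempt in range(max_attempts):
        # Leverage redundancy: each retry goes to the next replica in turn.
        replica = replicas[attempt % len(replicas)]
        try:
            replica.put(request_id, key, value)
            return replica
        except TransientError as err:
            last_error = err
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Because the write carries a request ID, the caller can retry without knowing whether an earlier attempt actually landed; a duplicate is simply ignored.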
Surge Protection
Rate limiting
Exponential back off
Cache TTL extension
http://www.flickr.com/photos/13926709@N06/2646938237/
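Two of the surge protection techniques, rate limiting and exponential back off, can be sketched briefly. The token bucket is one common way to rate limit and is an assumption here, as are all parameter names and defaults; the optional jitter is an addition that the slide does not mention. Cache TTL extension is not shown.

```python
import random


def backoff_delays(base: float = 0.1, factor: float = 2.0, cap: float = 10.0,
                   attempts: int = 6, jitter: bool = False, rng=random):
    """Yield the wait before each retry: base * factor**n, capped at `cap`."""
    for n in range(attempts):
        delay = min(cap, base * factor ** n)
        # Optional jitter (not from the slide) spreads retries out so that
        # clients do not all come back at the same instant.
        yield rng.uniform(0, delay) if jitter else delay


class TokenBucket:
    """Rate limiter: allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The clock is passed in as `now` so the limiter is easy to test; in practice a caller would pass something like `time.monotonic()`. The server side sheds excess load with the bucket, while well-behaved clients space out their retries with the backoff schedule.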
Eventual Consistency
Spectrum of choices
Time lapse is typically the result of node failure
Sacrifice some consistency for availability
Sacrifice some availability for durability
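The time lapse can be illustrated with a toy model: a write reaches one replica while another is unreachable, so reads from the second replica are stale until a background sync catches it up. The versioned, highest-version-wins scheme and every name below are assumptions chosen for illustration, not a description of S3 internals.

```python
class VersionedReplica:
    """Toy replica that stores (version, value) per key."""

    def __init__(self):
        self.data = {}

    def write(self, key, version, value):
        current = self.data.get(key)
        # Keep the highest version; an older or duplicate write is ignored.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Bring every replica up to the newest version of each key."""
    newest = {}
    for replica in replicas:
        for key, (version, value) in replica.data.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for replica in replicas:
        for key, (version, value) in newest.items():
            replica.write(key, version, value)
```

Accepting the write while one replica is down keeps the system available; the cost is that a read served by the lagging replica returns the old value until `anti_entropy` has run.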
Routine Failure
Failure of components is normal
Routinely fail disks, servers, data centers
http://www.flickr.com/photos/82712482@N00/2174534180/
Diversity
Software
Hardware
Workloads
http://www.flickr.com/photos/deestea/130262190/
Integrity Checking
Identifies corruption inbound, outbound, at rest
Increases cost, complexity for the customer
Increases durability, availability
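A minimal sketch of checking integrity at the three points named above. The `CheckedStore` class, the `CorruptionError` type, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib


class CorruptionError(Exception):
    """Hypothetical error raised when bytes fail their checksum."""


def digest(data: bytes) -> str:
    """Hex checksum of the data (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


class CheckedStore:
    """Toy object store that verifies checksums inbound, outbound, and at rest."""

    def __init__(self):
        self._objects = {}  # key -> (data, checksum)

    def put(self, key: str, data: bytes, client_checksum: str) -> None:
        # Inbound: reject data that was corrupted on the way in.
        if digest(data) != client_checksum:
            raise CorruptionError(f"inbound checksum mismatch for {key!r}")
        self._objects[key] = (data, client_checksum)

    def get(self, key: str):
        data, checksum = self._objects[key]
        # Outbound: verify before returning, and hand the checksum back so
        # the client can verify again after the network hop.
        if digest(data) != checksum:
            raise CorruptionError(f"stored object {key!r} is corrupt")
        return data, checksum

    def scrub(self):
        """At rest: return the keys whose stored bytes no longer match."""
        return [k for k, (d, c) in self._objects.items() if digest(d) != c]
```

The client has to compute a checksum before `put` and compare one after `get`, which is the extra cost and complexity for the customer. In return, corruption introduced by a NIC, a load balancer, or bit rot is detected instead of silently stored or served.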
Telemetry
Internal, external
Real time, historical
Per host, aggregate
http://www.flickr.com/photos/bart-nega/3064385470/
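One reason to keep both per-host and aggregate views is that an aggregate can hide a single bad host. The sketch below computes both from the same samples; the `(host, ok)` sample format and the function name are hypothetical.

```python
from collections import defaultdict


def error_rates(samples):
    """Given (host, ok) pairs, return (per-host error rates, aggregate error rate)."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for host, ok in samples:
        totals[host] += 1
        if not ok:
            errors[host] += 1
    # Per host: each host's failures divided by its own request count.
    per_host = {host: errors[host] / totals[host] for host in totals}
    # Aggregate: all failures divided by all requests.
    total = sum(totals.values())
    aggregate = sum(errors.values()) / total if total else 0.0
    return per_host, aggregate
```

If two hosts serve 100 requests each without error and a third fails 5 of its 10 requests, the aggregate error rate is under 3% while the third host is failing half the time.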
Autopilot
Human processes fail
Human reaction time is slow
http://www.flickr.com/photos/sugu/974272658/
Summary
Design Goals
Durable
Available
Scalable
Secure
Fast
Simple
Cost Effective
http://www.flickr.com/photos/jennyblush/313714321/
Techniques
Redundancy
Retry
Surge Protection
Eventual Consistency
Routine Failure
Diversity
Integrity Checking
Telemetry
Autopilot
http://www.flickr.com/photos/amanky/3205960362/
Final Thoughts
Storage is a lasting relationship
Requires trust
Reliability at low cost achieved through engineering, experience, and scale
More Information
Amazon S3
http://aws.amazon.com/s3
Amazon Web Services blog
http://aws.typepad.com
Werner Vogels' blog
http://www.allthingsdistributed.com
Email me [email protected]
Thank You!