1
An Overview of Cloud Computing
Raghu Ramakrishnan
Chief Scientist, Audience and Cloud Computing
Research Fellow, Yahoo! Research
Reflects many discussions with: Eric Baldeschwieler, Jay Kistler, Chuck Neerdaels, Shelton Shugar, and Raymie Stata
and joint work with the Sherpa team, in particular: Brian Cooper, Utkarsh Srivastava, Adam Silberstein and Nick Puz in Y! Research; Chuck Neerdaels, P.P. Suryanarayanan and many others in CCDI
2
CCDI—Research Collaboration
Yahoo! Research
• Raghu Ramakrishnan
• Brian Cooper
• Utkarsh Srivastava
• Adam Silberstein
• Nick Puz
• Rodrigo Fonseca
CCDI
• Chuck Neerdaels
• P.P.S. Narayan
• Kevin Athey
• Toby Negrin
• Plus Dev/QA teams
3
SCENARIOS
Pie-in-the-sky
4
Living in the Clouds
• We want to start a new website, FredsList.com
• Our site will provide listings of items for sale, jobs, etc.
• As time goes on, we’ll add more features
– And illustrate how more cloud capabilities (and corresponding infrastructure components) are used as needed
• List of capabilities/components is illustrative, not exhaustive
• Our cloud provides a “dataset” abstraction
– FredsList doesn’t worry about the underlying components
5
Step 1: Listings
FredsList wants to store listings as (key, category, description):
1234323, transportation, For sale: one bicycle, barely used
5523442, childcare, Nanny available in San Jose
215534, wanted, Looking for issue 1 of Superman comic book
DECLARE DATASET Listings AS ( ID String PRIMARY KEY, Category String, Description Text )
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa)
6
Step 2: Search
FredsList’s customers quickly ask for keyword search, e.g., “bicycle”, “dvd’s”, “nanny”
ALTER Listings SET Description SEARCHABLE
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa), Search (Vespa), Messaging (YMB)
7
Step 3: Photos
FredsList decides to add photos to listings
ALTER Listings ADD Photo BLOB
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor)
Foreign key: photo → listing
8
Step 4: Data Analysis
FredsList wants to analyze its listings to get statistics about category, do geocoding, etc.
ALTER Listings MAKE ANALYZABLE
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor), Compute (Grid)
Foreign key: photo → listing
Batch export to the Grid, which runs:
• Pig query to analyze categories
• Hadoop program to geocode data
• Hadoop program to generate fancy pages for listings
9
Step 5: Performance
FredsList wants to reduce its data access latency
ALTER Listings MAKE CACHEABLE
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor), Compute (Grid), Caching (memcached)
Foreign key: photo → listing
Batch export
10
EYES TO THE SKIES
Motherhood-and-Apple-Pie
11
Why Clouds?
• On-demand infrastructure to create a fundamental shift in the OE curve. Lets us:
– Do things we can’t do
– Reduce time to market
– Build more robustly, more efficiently, more globally, more completely, for a given budget
• Cloud services should do the heavy lifting of scaling & high availability
– Today, this is done at the app level, which is not productive
12
Requirements for Cloud Services
• Multitenant. A cloud service must support multiple, organizationally distant customers.
• Elasticity. Tenants should be able to negotiate and receive resources/QoS on-demand.
• Resource Sharing. Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient, e.g., due to spikes.
• Horizontal scaling. It should be possible to add cloud capacity in small increments; this should be transparent to the tenants of the service.
• Metering. A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service.
• Security. A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud.
• Availability. A cloud service should be highly available.
• Operability. A cloud service should be easy to operate, with few operators. Operating costs should scale linearly or better with the capacity of the service.
13
Types of Cloud Services
• Two kinds of cloud services:
– Horizontal Cloud Services
• Functionality enabling tenants to build applications or new services on top of the cloud
– Functional Cloud Services
• Functionality that is useful in and of itself to tenants. E.g., various SaaS instances, such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., flickr, Groups, Mail, News, Shopping
• Could be built on top of horizontal cloud services or from scratch
• Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)
14
Horizontal Cloud Services
• Horizontal cloud services are foundations on which tenants build applications or new services. They should be:
– Semantics-free. Must be “generic infrastructure,” and not tied to specific app-logic.
• May provide the ability to inject application logic through well-defined APIs
– Broadly applicable. Must be broadly applicable (i.e., it can't be intended for just one or two properties).
– Fault-tolerant over commodity hardware. Must be built using inexpensive commodity hardware, and should mask component failures.
• While each cloud service provides value, the power of the cloud paradigm will depend on a collection of well-chosen, loosely coupled services that collectively make it easy to quickly develop and operate innovative web applications.
15
What’s in the Horizontal Cloud?
Horizontal Cloud Services
• Edge Content Services, e.g., YCS, YCPI
• Provisioning & Virtualization, e.g., EC2
• Batch Storage & Processing, e.g., Hadoop & Pig
• Operational Storage, e.g., S3, MObStor, Sherpa
• Other Services: Messaging, Workflow, virtual DBs & Webserving
Simple Web Service APIs
Shared Infrastructure
ID & Account Management
Monitoring & QoS
Metering, Billing, Accounting
Security
Common Approaches to QA, Production Engineering, Performance Engineering, Datacenter Management, and Optimization
16
Yahoo! CCDI Thrust Areas
• Fast Provisioning and Machine Virtualization: On demand, deliver a set of hosts imaged with desired software and configured against standard services
– Multiple hosts may be multiplexed onto the same physical machine.
• Batch Storage and Processing: Scalable data storage optimized for batch processing, together with computational capabilities
• Operational Storage: Persistent storage that supports low-latency updates and flexible retrieval
• Edge Content Services: Support for dealing with network topology, communication protocols, caching, and BCP
Rest of today’s talk
17
Hadoop: Batch Storage/Analysis
Why is batch processing important?
• Whether it’s
– response-prediction for advertising,
– machine-learned relevance for Search, or
– content optimization for audience,
data-intensive computing is increasingly central to everything Yahoo! does
– Hadoop is central to addressing this need
• Hadoop is a case-study in our cloud vision
– Processes enormous amounts of data
– Provides horizontal scaling and fault-tolerance for our users
– Allows those users to focus on their app logic
Stack (top to bottom): [Workflow], High-level query layer (Pig), Map-Reduce, HDFS
18
SHERPA
To Help You Scale Your Mountains of Data
19
The Yahoo! Storage Problem
– Small records – 100KB or less
– Structured records - tens, hundreds or thousands of fields
– Extreme data scale - Tens of TB
– Extreme request scale - Tens of thousands of requests/sec
– Low latency globally - 20+ datacenters worldwide
– High Availability - outages cost $millions
– Variable usage patterns - as applications and users change
20
The Sherpa Solution
The next generation global-scale record store
– Record-orientation: Routing, data storage optimized for low-latency record access
– Scale out: Add machines to scale throughput (while keeping latency low)
– Asynchrony: Pub-sub replication to far-flung datacenters to mask propagation delay
– Consistency model: Reduce complexity of asynchrony for the application programmer
– Cloud deployment model: Hosted, managed service to reduce app time-to-market and enable on demand scale and elasticity
21
What is Sherpa?
• Parallel database
• Geographic replication
• Structured, flexible schema
• Hosted, managed infrastructure
CREATE TABLE Parts (ID VARCHAR, StockNumber INT, Status VARCHAR, …)
Example table, replicated across regions:
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
22
What Will Sherpa Become?
• Parallel database
• Geographic replication
• Indexes and views
• Structured, flexible schema
• Hosted, managed infrastructure
CREATE TABLE Parts (ID VARCHAR, StockNumber INT, Status VARCHAR, …)
Example table, replicated across regions:
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
23
Sherpa Design Goals
Scalability
• Thousands of machines
• Easy to add capacity
• Restrict query language to avoid costly queries
Geographic replication
• Asynchronous replication around the globe
• Low-latency local access
High availability and fault tolerance
• Automatically recover from failures
• Serve reads and writes despite failures
Consistency
• Per-record guarantees
• Timeline model
• Option to relax if needed
Multiple access paths
• Hash table, ordered table
• Primary, secondary access
Hosted service
• Applications plug and play
• Share operational cost
24
Technology Elements
Applications
PNUTS API, Tabular API
PNUTS
• Query planning and execution
• Index maintenance
Distributed infrastructure for tabular data
• Data partitioning
• Update consistency
• Replication
YDOT FS
• Ordered tables
YDHT FS
• Hash tables
YMB
• Pub/sub messaging
Zookeeper
• Consistency service
YCA: Authorization
25
Data Manipulation
• Per-record operations
– Get
– Set
– Delete
• Multi-record operations
– Multiget
– Scan
– Getrange
• Web service (RESTful) API
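The operations listed above can be sketched with a toy in-memory store. This is only an illustration of the call shapes: the class name `RecordStore`, the method signatures, and the half-open range convention for `getrange` are assumptions, not the Sherpa/PNUTS web service API.

```python
from bisect import bisect_left


class RecordStore:
    """Toy in-memory stand-in for a record store (hypothetical API)."""

    def __init__(self):
        self._records = {}

    # Per-record operations
    def get(self, key):
        return self._records.get(key)

    def set(self, key, record):
        self._records[key] = dict(record)

    def delete(self, key):
        # Returns True if the key existed.
        return self._records.pop(key, None) is not None

    # Multi-record operations
    def multiget(self, keys):
        return {k: self._records[k] for k in keys if k in self._records}

    def scan(self):
        # Yields every record in key order.
        for k in sorted(self._records):
            yield k, self._records[k]

    def getrange(self, low, high):
        # Assumed convention: keys in the half-open interval [low, high).
        keys = sorted(self._records)
        lo, hi = bisect_left(keys, low), bisect_left(keys, high)
        return [(k, self._records[k]) for k in keys[lo:hi]]


store = RecordStore()
for name, price in [("Apple", 1), ("Banana", 2), ("Grape", 12), ("Lime", 9)]:
    store.set(name, {"price": price})
range_keys = [k for k, _ in store.getrange("B", "L")]
```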
26
Tablets—Hash Table
Name | Description | Price
Grape | Grapes are good to eat | $12
Lime | Limes are green | $9
Apple | Apple is wisdom | $1
Strawberry | Strawberry shortcake | $900
Orange | Arrgh! Don’t get scurvy! | $2
Avocado | But at what price? | $3
Lemon | How much did you pay for this lemon? | $1
Tomato | Is this a vegetable? | $14
Banana | The perfect fruit | $2
Kiwi | New Zealand | $8
Tablet boundaries (hash of key): 0x0000, 0x2AF3, 0x911F, 0xFFFF
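A minimal sketch of how a key could be mapped to a hash-organized tablet, using the boundary values shown on the slide. The hash function (first two bytes of MD5) and the function names are assumptions made for illustration; the slide does not say which hash Sherpa uses.

```python
import hashlib
from bisect import bisect_right

# Tablet i covers hashes from TABLET_STARTS[i] up to the next start;
# the last tablet runs to 0xFFFF. Boundaries are taken from the slide.
TABLET_STARTS = [0x0000, 0x2AF3, 0x911F]


def key_hash(key: str) -> int:
    """Map a key to a 16-bit hash (assumed hash function, not Sherpa's)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for_hash(h: int) -> int:
    """Index of the tablet whose hash interval contains h."""
    return bisect_right(TABLET_STARTS, h) - 1


def tablet_for_key(key: str) -> int:
    return tablet_for_hash(key_hash(key))
```

Because placement depends only on the hash, neighboring names such as Lemon and Lime can land in different tablets, which is why the rows on the slide appear in no particular name order.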
27
Tablets—Ordered Table
Name | Description | Price
Apple | Apple is wisdom | $1
Avocado | But at what price? | $3
Banana | The perfect fruit | $2
Grape | Grapes are good to eat | $12
Kiwi | New Zealand | $8
Lemon | How much did you pay for this lemon? | $1
Lime | Limes are green | $9
Orange | Arrgh! Don’t get scurvy! | $2
Strawberry | Strawberry shortcake | $900
Tomato | Is this a vegetable? | $14
Tablet boundaries (key ranges): A, H, Q, Z
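For the ordered table, tablet lookup becomes a search over key boundaries rather than hash values. The sketch below uses the boundaries A, H, Q shown on the slide as tablet lower bounds; the function name and the handling of keys below the first bound are assumptions.

```python
from bisect import bisect_right

# Lower bound of each tablet's key range, from the slide (A, H, Q; Z closes the last range).
TABLET_LOWER_BOUNDS = ["A", "H", "Q"]


def tablet_for_key(key: str) -> int:
    """Index of the tablet whose key range contains `key`."""
    idx = bisect_right(TABLET_LOWER_BOUNDS, key) - 1
    return max(idx, 0)  # assumption: keys below "A" fall into the first tablet


fruits = ["Apple", "Avocado", "Banana", "Grape", "Kiwi",
          "Lemon", "Lime", "Orange", "Strawberry", "Tomato"]
placement = {}
for fruit in fruits:
    placement.setdefault(tablet_for_key(fruit), []).append(fruit)
```

Since each tablet holds a contiguous key range, a scan over nearby names touches few tablets, which is the property the range queries later in the deck rely on.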
28
Flexible Schema
Posted date Listing id Item Price
6/1/07 424252 Couch $570
6/1/07 763245 Bike $86
6/3/07 211242 Car $1123
6/5/07 421133 Lamp $15
Optional columns, present on only some records:
Color: Red
Condition: Good, Fair
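One way to picture a flexible schema is a set of records where optional fields are simply absent rather than stored as NULL columns. In the sketch below the four listings come from the slide, but which listing carries which Color or Condition value is an assumption made for illustration, as are the field and function names.

```python
listings = [
    {"posted": "6/1/07", "listing_id": 424252, "item": "Couch", "price": 570},
    {"posted": "6/1/07", "listing_id": 763245, "item": "Bike", "price": 86},
    {"posted": "6/3/07", "listing_id": 211242, "item": "Car", "price": 1123},
    {"posted": "6/5/07", "listing_id": 421133, "item": "Lamp", "price": 15},
]

# Assumed placement of the optional values (the slide does not pin them to rows).
listings[0]["condition"] = "Good"
listings[1]["color"] = "Red"
listings[2]["condition"] = "Fair"


def column(records, name, default=None):
    """Read one field across all records, tolerating records that lack it."""
    return [r.get(name, default) for r in records]


def declared_fields(records):
    """Union of field names in first-seen order; new fields need no schema change."""
    seen = []
    for record in records:
        for name in record:
            if name not in seen:
                seen.append(name)
    return seen
```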
29
Detailed Architecture
Diagram components: Clients, REST API, Routers, Tablet controller, Storage units, YMB
Regions: Local region, Remote regions
30
Tablet Splitting and Balancing
• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time
• Overfull tablets split
• Storage unit may become a hotspot
• Shed load by moving tablets to other servers
Diagram labels: Storage unit, Tablet
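The two mechanisms on this slide, splitting overfull tablets and moving tablets off hot storage units, can be sketched as follows. The size threshold, the midpoint split, and the use of tablet count as the load measure are simplifying assumptions; the slide does not describe the actual policy.

```python
from dataclasses import dataclass, field

MAX_TABLET_RECORDS = 4  # hypothetical split threshold


@dataclass
class Tablet:
    keys: list = field(default_factory=list)  # kept in sorted order


def split_if_overfull(tablet):
    """Return the tablet unchanged, or two halves if it exceeds the threshold."""
    if len(tablet.keys) <= MAX_TABLET_RECORDS:
        return [tablet]
    mid = len(tablet.keys) // 2
    return [Tablet(tablet.keys[:mid]), Tablet(tablet.keys[mid:])]


def rebalance(storage_units):
    """Move tablets from the busiest to the idlest unit until counts differ by at most 1.

    `storage_units` maps a unit name to its list of tablets; returns the number of moves.
    """
    moves = 0
    while True:
        busiest = max(storage_units, key=lambda name: len(storage_units[name]))
        idlest = min(storage_units, key=lambda name: len(storage_units[name]))
        if len(storage_units[busiest]) - len(storage_units[idlest]) <= 1:
            return moves
        storage_units[idlest].append(storage_units[busiest].pop())
        moves += 1


halves = split_if_overfull(Tablet(list("abcdef")))
units = {"su1": [Tablet() for _ in range(5)], "su2": [Tablet()], "su3": []}
moves = rebalance(units)
```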
31
QUERY PROCESSING
32
Accessing Data
Diagram: three storage units (SU)
1. Get key k
2. Get key k
3. Record for key k
4. Record for key k
33
Bulk Read
Diagram: scatter/gather server and three storage units (SU)
1. {k1, k2, … kn}
2. Get k1, Get k2, Get k3
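The scatter/gather pattern in this diagram can be sketched as: group the requested keys by the storage unit that holds them, fetch each group in parallel, and merge the partial results. The storage unit contents, the `locate` lookup, and the function names below are hypothetical stand-ins for the router and storage layer.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contents of three storage units.
STORAGE_UNITS = {
    "su1": {"k1": "v1", "k4": "v4"},
    "su2": {"k2": "v2"},
    "su3": {"k3": "v3"},
}


def locate(key):
    """Stand-in for the routing lookup: which unit holds this key?"""
    for name, data in STORAGE_UNITS.items():
        if key in data:
            return name
    return None


def fetch(unit, keys):
    """One request to one storage unit for all of its keys."""
    return {k: STORAGE_UNITS[unit][k] for k in keys}


def multiget(keys):
    # Scatter: one batch of keys per storage unit.
    by_unit = defaultdict(list)
    for key in keys:
        unit = locate(key)
        if unit is not None:
            by_unit[unit].append(key)
    # Gather: run the batches concurrently and merge the partial results.
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(by_unit))) as pool:
        for partial in pool.map(lambda item: fetch(*item), by_unit.items()):
            results.update(partial)
    return results


found = multiget(["k1", "k2", "k3", "missing"])
```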
34
Range Queries in YDOT
• Clustered, ordered retrieval of records
Router interval map:
(min) to Cantaloupe: Storage unit 1
Cantaloupe to Lime: Storage unit 3
Lime to Strawberry: Storage unit 2
Strawberry to (max): Storage unit 1
Tablets:
Apple, Avocado, Banana, Blueberry
Cantaloupe, Grape, Kiwi, Lemon
Lime, Mango, Orange
Strawberry, Tomato, Watermelon
Example: the query Grapefruit…Pear? is split into Grapefruit…Lime? and Lime…Pear?
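The range-splitting step in this example can be sketched with a sorted list of tablet boundaries. The boundary keys and storage unit assignments follow the slide; the function name, the empty string standing in for the minimum key, and the half-open range convention are assumptions.

```python
from bisect import bisect_right

# Tablet i starts at BOUNDARIES[i] and is served by UNITS[i].
# "" stands in for the minimum possible key.
BOUNDARIES = ["", "Cantaloupe", "Lime", "Strawberry"]
UNITS = ["Storage unit 1", "Storage unit 3", "Storage unit 2", "Storage unit 1"]


def split_range(low, high):
    """Split the query range [low, high) into one sub-range per tablet it touches."""
    pieces = []
    i = bisect_right(BOUNDARIES, low) - 1  # tablet containing `low`
    while i < len(BOUNDARIES) and low < high:
        upper = BOUNDARIES[i + 1] if i + 1 < len(BOUNDARIES) else None
        piece_high = high if upper is None or high <= upper else upper
        pieces.append((UNITS[i], low, piece_high))
        low = piece_high
        i += 1
    return pieces


plan = split_range("Grapefruit", "Pear")
```

With these boundaries, `plan` contains one piece for Grapefruit to Lime on storage unit 3 and one for Lime to Pear on storage unit 2, matching the two sub-queries on the slide.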
35
Updates
Diagram components: Routers, Message brokers, three storage units (SU)
1. Write key k
2. Write key k
3. Write key k
4. (unlabeled)
5. SUCCESS
6. Write key k
7. Sequence # for key k
8. Sequence # for key k
36
ASYNCHRONOUS REPLICATION AND CONSISTENCY
37
Asynchronous Replication
38
Consistency Model
• Goal: make it easier for applications to reason about updates and cope with asynchrony
• What happens to a record with primary key “Brian”?
Timeline diagram (Generation 1): the record is inserted, updated repeatedly and eventually deleted; versions are labeled v. 1 through v. 8 along the time axis
39
Consistency Model
Timeline (Generation 1): v. 1, v. 2, v. 3, v. 4, v. 5, v. 6, v. 7, v. 8 (v. 8 is the current version; earlier versions are stale)
Read: may return a stale version or the current version
40
Consistency Model
Read up-to-date: returns the current version
41
Consistency Model
Read ≥ v.6: returns v. 6 or any later version
42
Consistency Model
Write: applied to the current version
43
Consistency Model
Write if = v.7: ERROR, because the current version is no longer v. 7
44
Consistency Model
Write if = v.7: ERROR, because the current version is no longer v. 7
Mechanism: per record mastership
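The read and write variants on the preceding slides can be sketched as operations on a single record's version timeline. This is a toy model under stated assumptions: the class and method names are hypothetical, a "replica" is represented only by the version number it has caught up to, and the real system's replication and mastership protocol is not modeled.

```python
class VersionConflict(Exception):
    """Raised when a conditional write names a version that is no longer current."""


class RecordTimeline:
    """All versions of one record, in the order its master applied them (toy model)."""

    def __init__(self, value):
        self.versions = [value]  # versions[i] holds version i + 1

    @property
    def current_version(self):
        return len(self.versions)

    def read_any(self, replica_version):
        # A replica may lag, so the caller can see a stale version.
        return replica_version, self.versions[replica_version - 1]

    def read_up_to_date(self):
        return self.current_version, self.versions[-1]

    def read_at_least(self, required, replica_version):
        # Serve from the replica if it is fresh enough, otherwise use the current version.
        v = replica_version if replica_version >= required else self.current_version
        return v, self.versions[v - 1]

    def write(self, value):
        self.versions.append(value)
        return self.current_version

    def write_if_version(self, expected, value):
        if expected != self.current_version:
            raise VersionConflict(f"expected v.{expected}, current is v.{self.current_version}")
        return self.write(value)


record = RecordTimeline("value 1")
for n in range(2, 9):
    record.write(f"value {n}")  # timeline now runs v.1 through v.8

try:
    record.write_if_version(7, "value 9")
    conflict = False
except VersionConflict:
    conflict = True  # mirrors the ERROR on the slide: the current version is v.8
```

The conditional write is what lets an application implement read-modify-write on one record without locks: it retries with a fresh read when the version has moved on.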
46
Mastering
Each of the three regions holds a replica of the table:
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
Diagram label: Tablet master
47
Bulk Insert/Update/Replace
Diagram components: Client, Source Data, Bulk manager
1. Client feeds records to bulk manager
2. Bulk loader transfers records to SUs in batches
• Bypass routers and message brokers
• Efficient import into storage unit
48
Bulk Load in YDOT
• YDOT bulk inserts can cause performance hotspots
• Solution: preallocate tablets
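One way to preallocate tablets is to pick split points from a sample of the incoming keys, so that each new tablet receives a roughly equal share of the load instead of every insert hitting the same key range. The quantile-based choice below is an assumption made for illustration; the slide does not say how split points are chosen, and the function name is hypothetical.

```python
def plan_split_points(sample_keys, num_tablets):
    """Choose up to num_tablets - 1 boundary keys at evenly spaced quantiles of the sample."""
    ordered = sorted(sample_keys)
    if num_tablets < 2 or len(ordered) < num_tablets:
        return []
    step = len(ordered) / num_tablets
    points = [ordered[int(i * step)] for i in range(1, num_tablets)]
    return sorted(set(points))  # drop duplicates caused by repeated keys


incoming = ["Apple", "Avocado", "Banana", "Blueberry", "Cantaloupe", "Grape",
            "Kiwi", "Lemon", "Lime", "Mango", "Orange", "Strawberry"]
split_points = plan_split_points(incoming, 3)
```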
49
Index Maintenance
• How to have lots of interesting indexes, without killing performance?
• Solution: Asynchrony!
– Indexes updated asynchronously when base table updated
Planned functionality
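A minimal sketch of asynchronous index maintenance: writes update the base table immediately and only enqueue the index change, which a later step applies. The class, its single secondary index, and the explicit `apply_pending` call are assumptions for illustration; the slide marks this as planned functionality and does not describe a design.

```python
from collections import deque


class Table:
    """Base table with one secondary index that is updated lazily (toy model)."""

    def __init__(self, indexed_field):
        self.rows = {}
        self.index = {}         # field value -> set of primary keys
        self.indexed_field = indexed_field
        self.pending = deque()  # index changes not yet applied

    def set(self, key, row):
        old = self.rows.get(key)
        self.rows[key] = dict(row)                   # base table is updated right away
        self.pending.append((key, old, dict(row)))   # index update is deferred

    def apply_pending(self):
        """Apply queued index changes in order; returns how many were applied."""
        applied = 0
        while self.pending:
            key, old, new = self.pending.popleft()
            if old is not None:
                self.index.get(old.get(self.indexed_field), set()).discard(key)
            self.index.setdefault(new.get(self.indexed_field), set()).add(key)
            applied += 1
        return applied

    def lookup(self, value):
        return sorted(self.index.get(value, set()))


table = Table("category")
table.set("1234323", {"category": "transportation"})
stale = table.lookup("transportation")   # index has not caught up yet
applied = table.apply_pending()
fresh = table.lookup("transportation")
```

The trade-off is visible in `stale`: writes stay fast, but an index lookup can briefly miss a record that is already in the base table.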
50
SHERPA IN CONTEXT
51
MObStor
• Yahoo!’s next-generation globally replicated, virtualized media object storage service
• Better provisioning, easy migration, replication, better BCP, and performance
• New features (Evergreen URLs, CDN integration, REST API, …)
• The object metadata problem addressed using Sherpa, though MObStor is focused on blob storage.
52
Storage & Delivery Stack
53
The World Has Changed
• Web applications need:
– Scalability!
• Preferably elastic
– Geographic distribution
– High availability
– Reliable storage
• Web applications can do without:
– Complicated queries
– Strong transactions
54
Web Data Management
Large data analysis (Hadoop)
• Scan oriented workloads
• Focus on sequential disk I/O
• $ per cpu cycle
Structured record storage (PNUTS)
• CRUD
• Point lookups and short scans
• Index organized table and random I/Os
• $ per latency
Blob storage (SAN/NAS)
• Object retrieval and streaming
• Scalable file storage
• $ per GB
55
Types of Record Stores
• Query expressiveness (from simple to feature rich)
– Object retrieval: S3
– Retrieval from single table of objects/records: PNUTS
– SQL: Oracle
56
Types of Record Stores
• Consistency model (from best effort to strong guarantees)
– Eventual consistency: S3
– Timeline consistency: PNUTS
– ACID: Oracle
Diagram labels: Object-centric consistency, Program centric consistency
57
Types of Record Stores
• Elasticity (ability to add resources on demand), from not scalable to elastic
– Limited (via data distribution): Oracle
– VLSD (Very Large Scale Distribution/Replication): PNUTS, S3
58
Data Stores Comparison
• User-partitioned SQL stores (Microsoft Azure SDS, Amazon SimpleDB). Versus PNUTS:
– More expressive queries
– Users must control partitioning
– Limited elasticity
• Multi-tenant application databases (Salesforce.com, Oracle on Demand). Versus PNUTS:
– Highly optimized for complex workloads
– Limited flexibility to evolving applications
– Inherit limitations of underlying data management system
• Mutable object stores (Amazon S3). Versus PNUTS:
– Object storage versus record management
59
Application Design Space
Axes: Records vs. Files; Get a few things vs. Scan everything
Systems plotted: Sherpa, MObStor, Everest, Hadoop, YMDB, MySQL, Filer, Oracle, BigTable
60
Alternatives Matrix
Columns: Elastic, Operability, Global low latency, Availability, Structured access, Updates, Consistency model, SQL/ACID
Rows: Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo, Cassandra
61
Further Reading
Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008)
Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan
PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008)
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni
62
Opening Up Yahoo! Search
Phase 1: Giving site owners and developers control over the appearance of Yahoo! Search results.
Phase 2: BOSS takes Yahoo!’s open strategy to the next level by providing Yahoo! Search infrastructure and technology to developers and companies to help them build their own search experiences.
63
Search Results of the Future
babycenter
epicurious
yelp.com
answers.com
webmd
Gawker
New York Times
64
BOSS Offerings
BOSS offers two options for companies and developers and has partnered with top technology universities to drive search experimentation, innovation and research into next generation search.
API
A self-service, web services model for developers and start-ups to quickly build and deploy new search experiences.
CUSTOM
Working with 3rd parties to build a more relevant, brand/site specific web search experience.
This option is jointly built by Yahoo! and select partners.
ACADEMIC
Working with the following universities to allow for wide-scale research in the search field:
• University of Illinois Urbana Champaign
• Carnegie Mellon University
• Stanford University
• Purdue University
• MIT
• Indian Institute of Technology Bombay
• University of Massachusetts
(Slide courtesy Prabhakar Raghavan)
65
Partner Examples
66
QUESTIONS?