Managing Data in the Cloud



Scaling in the Cloud

[Figure: client sites connect through a load balancer (proxy) to a tier of replicated app servers, which all share one MySQL master DB with a MySQL slave DB kept current via replication.]

The database becomes the scalability bottleneck: the app-server tier scales out, but the single master database cannot leverage the cloud's elasticity.



Key Value Stores

[Figure: the same scaling diagram with Apache + app servers behind the load balancer; the MySQL master/slave tier is dropped, to be replaced by a key-value store.]


CAP Theorem (Eric Brewer)

• “Towards Robust Distributed Systems.” PODC 2000.

• “CAP Twelve Years Later: How the ‘Rules’ Have Changed.” IEEE Computer, 2012.


Key Value Stores

• Key-value data model
– Key is the unique identifier
– Key is the granularity for consistent access
– Value can be structured or unstructured

• Gained widespread popularity
– In house: Bigtable (Google), PNUTS (Yahoo!), Dynamo (Amazon)
– Open source: HBase, Hypertable, Cassandra, Voldemort

• Popular choice for the modern breed of web applications


Bigtable (Google)

• Data model: a sparse, persistent, multi-dimensional sorted map.
• Data is partitioned across multiple servers.
• The map is indexed by a row key, a column key, and a timestamp.
• Each value is an uninterpreted array of bytes:
– (row: byte[ ], column: byte[ ], time: int64) → byte[ ]
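To make the model concrete, here is a minimal Python sketch (a toy, not Google's implementation) of a map keyed by (row, column, timestamp), where a read returns the newest version at or before a requested timestamp:

import bisect

class TinyBigtable:
    """Toy (row, column, timestamp) -> bytes map with per-cell version lists."""

    def __init__(self):
        self.cells = {}  # (row, column) -> parallel lists: sorted timestamps, values

    def put(self, row: bytes, column: bytes, ts: int, value: bytes) -> None:
        tss, vals = self.cells.setdefault((row, column), ([], []))
        i = bisect.bisect_left(tss, ts)
        tss.insert(i, ts)
        vals.insert(i, value)

    def get(self, row: bytes, column: bytes, ts: int):
        """Return the newest value with timestamp <= ts, or None."""
        tss, vals = self.cells.get((row, column), ([], []))
        i = bisect.bisect_right(tss, ts) - 1
        return vals[i] if i >= 0 else None

t = TinyBigtable()
t.put(b"com.cnn.www", b"contents:", 5, b"<html>v5</html>")
t.put(b"com.cnn.www", b"contents:", 9, b"<html>v9</html>")
assert t.get(b"com.cnn.www", b"contents:", 7) == b"<html>v5</html>"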


Architecture Overview

• Shared-nothing architecture consisting of thousands of nodes (commodity PCs).

[Figure: Google's Bigtable data model layered on top of the Google File System, spread across many nodes.]


Atomicity Guarantees in Bigtable

• Every read or write of data under a single row key is atomic.

• Objective: make read operations single-sited!


Bigtable's Building Blocks

• Google File System (GFS)
– Highly available distributed file system that stores log and data files.

• Chubby
– Highly available, persistent distributed lock manager.

• Tablet servers
– Each handles reads and writes to its tablets, and splits tablets that grow too large.
– Each tablet is typically 100–200 MB in size.

• Master server
– Assigns tablets to tablet servers,
– Detects the addition and deletion of tablet servers,
– Balances tablet-server load.


Overview of Bigtable Architecture

[Figure: a master and Chubby carry out control operations and lease management over a set of tablet servers; each tablet server hosts tablets T1, T2, …, Tn and runs master/Chubby proxies, a log manager, and a cache manager, all on top of the Google File System.]


GFS Architectural Design

• A GFS cluster
– A single master
– Multiple chunkservers per master, running on commodity Linux machines
– Accessed by multiple clients

• A file
– Represented as fixed-size chunks
• Labeled with 64-bit globally unique IDs
• Stored at chunkservers
• 3-way replication across chunkservers
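A hypothetical sketch of the arithmetic a GFS client does to turn a byte offset into a chunk request, assuming the fixed 64 MB chunk size described above (function names are illustrative, not the real GFS client API):

CHUNK_SIZE = 64 * 1024 * 1024  # fixed chunk size

def chunk_request(filename: str, offset: int):
    """Translate a byte offset into the (filename, chunk index) pair sent to the master."""
    chunk_index = offset // CHUNK_SIZE   # which chunk holds this byte
    chunk_offset = offset % CHUNK_SIZE   # position of the byte inside that chunk
    return filename, chunk_index, chunk_offset

# The master maps (filename, chunk index) to a 64-bit chunk handle plus the
# locations of its (typically 3) replicas; the client then fetches the data
# directly from a chunkserver.
print(chunk_request("/logs/crawl-00017", 200 * 1024 * 1024))
# -> ('/logs/crawl-00017', 3, 8388608)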


Architectural Design

[Figure: an application calls into the GFS client library; the client asks the GFS master "chunk location?" and then requests chunk data directly from GFS chunkservers, each of which stores chunks in a local Linux file system.]


Single-Master Design

• Simple.
• The master answers only chunk-location requests.
• A client typically asks for multiple chunk locations in a single request.
• The master also predictively provides chunk locations immediately following those requested.


Metadata

• The master stores three major types:
– File and chunk namespaces, persistent in the operation log
– File-to-chunk mappings, persistent in the operation log
– Locations of a chunk's replicas, not persistent

• All kept in memory: fast!
– Quick global scans, for garbage collection and reorganization
– Only 64 bytes of metadata per 64 MB of data
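The 64-bytes-per-64-MB figure is what makes the all-in-memory design workable; a quick back-of-the-envelope check of the master's footprint for a petabyte of data:

CHUNK_SIZE = 64 * 1024 ** 2    # bytes of file data per chunk
META_PER_CHUNK = 64            # approximate bytes of master metadata per chunk

data_bytes = 1024 ** 5                     # 1 PiB of file data
chunks = data_bytes // CHUNK_SIZE          # 16,777,216 chunks
meta_bytes = chunks * META_PER_CHUNK       # 1 GiB of metadata
print(f"{chunks:,} chunks -> {meta_bytes / 1024**3:.0f} GiB of master metadata")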


Mutation Operation in GFS

• Mutation: any write or append operation.

• The data needs to be written to all replicas.

• When multiple users issue concurrent mutations, all replicas must apply them in the same order.
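One way to realize that guarantee, sketched below: a single primary replica assigns each mutation a serial number, and every replica applies mutations strictly in serial-number order (an illustration of the idea, not the actual GFS lease protocol):

class Replica:
    def __init__(self):
        self.next_seq = 0   # next serial number this replica expects
        self.pending = {}   # mutations that arrived out of order
        self.log = []       # mutations applied, in the primary's order

    def deliver(self, seq: int, mutation: bytes) -> None:
        self.pending[seq] = mutation
        while self.next_seq in self.pending:          # apply strictly in order
            self.log.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

class Primary:
    """The primary imposes one global order on concurrent mutations."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.seq = 0

    def mutate(self, mutation: bytes) -> None:
        seq, self.seq = self.seq, self.seq + 1
        for r in self.replicas:                       # forward to every replica
            r.deliver(seq, mutation)

replicas = [Replica() for _ in range(3)]
primary = Primary(replicas)
primary.mutate(b"append A")
primary.mutate(b"append B")
assert all(r.log == [b"append A", b"append B"] for r in replicas)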


GFS Revisited

• “GFS: Evolution on Fast-Forward,” an interview with the GFS designers, CACM, March 2011.

• The single master was critical for early deployment.

• “the choice to establish 64MB … was much larger than the typical file-system block size, but only because the files generated by Google's crawling and indexing system were unusually large.”

• As the application mix changed over time, GFS had to deal efficiently with large numbers of files requiring far less than 64 MB (think in terms of Gmail, for example). The problem was not so much the number of files itself, but rather the memory demands all of those files made on the centralized master, exposing one of the bottleneck risks inherent in the original GFS design.


GFS Revisited (Cont’d)

• “the initial emphasis in designing GFS was on batch efficiency as opposed to low latency.”

• “The original single-master design: A single point of failure may not have been a disaster for batch-oriented applications, but it was certainly unacceptable for latency-sensitive applications, such as video serving.”

• Future directions: a distributed master, etc.
• An interesting and entertaining read.


PNUTS Overview

• Data model:
– Simple relational model; really a key-value store.
– Single-table scans with predicates.

• Fault tolerance:
– Redundancy at multiple levels: data, metadata, etc.
– Leverages relaxed consistency for high availability: reads and writes proceed despite failures.

• Pub/sub message system:
– Yahoo! Message Broker for asynchronous updates.


Asynchronous replication


Consistency Model

• Hide the complexity of data replication.
• Sits between the two extremes:
– One-copy serializability, and
– Eventual consistency.

• Key assumption:
– Applications manipulate one record at a time.

• Per-record timeline consistency:
– All replicas of a record preserve the update order.


Implementation

• A read returns a consistent version.
• One replica is designated as the master, per record.
• All updates are forwarded to that master.
• Master designation is adaptive: the replica receiving most of the writes becomes the master.
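A toy sketch of per-record mastering (class and method names are mine; real PNUTS propagates updates asynchronously through the Yahoo! Message Broker rather than synchronously as here):

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.store = {}   # key -> (version, value)

class PnutsLikeCluster:
    def __init__(self, replicas):
        self.replicas = replicas
        self.master_of = {}   # key -> its master replica (adaptive in real PNUTS)

    def write(self, key, value):
        master = self.master_of.setdefault(key, self.replicas[0])
        version = master.store.get(key, (0, None))[0] + 1   # next point on the timeline
        for r in self.replicas:   # applied in version order (async via YMB in PNUTS)
            r.store[key] = (version, value)
        return version

    def read_any(self, key, replica):
        """May be stale, but never off the record's timeline."""
        return replica.store.get(key)

west, east = Replica("west"), Replica("east")
cluster = PnutsLikeCluster([west, east])
cluster.write("brian", "v1 data")
print(cluster.read_any("brian", east))   # (1, 'v1 data')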


Consistency model

• Goal: make it easier for applications to reason about updates and cope with asynchrony.

• What happens to a record with primary key “Brian”?

[Figure: the record's timeline: an insert, then a sequence of updates and a delete, producing versions v.1 through v.8 of generation 1.]


Consistency model

[Figure: the same timeline (v.1 … v.8, generation 1); a plain Read may return a stale version or the current one.]


Consistency model

[Figure: the same timeline; a Read up-to-date always returns the current version (v.8).]


Consistency model

[Figure: the same timeline; a Read ≥ v.6 may return any version at least as new as v.6, stale or current.]


Consistency model

[Figure: the same timeline; a Write goes to the record's master and installs a new current version.]


Consistency model

[Figure: the same timeline; a test-and-set Write if = v.7 fails with ERROR because the current version is already v.8.]
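The last five slides correspond to the per-record calls in the PNUTS paper: read-any (possibly stale), read-critical(required_version), read-latest, write, and test-and-set-write(required_version). A single-replica Python toy showing just the version checks:

class Record:
    def __init__(self):
        self.version = 0
        self.value = None

    def read_any(self):                 # any replica; may return a stale version
        return self.version, self.value

    def read_critical(self, required_version: int):
        if self.version < required_version:   # "Read >= v.6" on a too-stale replica
            raise RuntimeError("replica too stale; retry elsewhere")
        return self.version, self.value

    def write(self, value):
        self.version += 1
        self.value = value
        return self.version

    def test_and_set_write(self, required_version: int, value):
        if self.version != required_version:  # slide: "Write if = v.7" -> ERROR
            raise RuntimeError(f"ERROR: current version is v.{self.version}")
        return self.write(value)

r = Record()
for v in ("a", "b", "c"):
    r.write(v)                  # record is now at v.3
try:
    r.test_and_set_write(2, "d")
except RuntimeError as e:
    print(e)                    # ERROR: current version is v.3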


PNUTS Architecture

[Figure: data-path components: clients call a REST API; routers locate records, consulting the tablet controller for the record-to-tablet mapping; storage units hold the tablets; the message broker carries asynchronous updates.]


PNUTS architecture

[Figure: the local region (clients, REST API, routers, tablet controller, storage units) connected through YMB, the Yahoo! Message Broker, to remote regions with the same structure.]


System Architecture: Key Features

• Pub/sub mechanism: Yahoo! Message Broker
• Physical storage: storage units
• Mapping of records to tablets: tablet controller
• Record location: routers


Highlights of PNUTS Approach

• Shared-nothing architecture.
• Multiple datacenters for geographic distribution.
• Timeline consistency, with access to stale data.
• A publish/subscribe system for reliable, fault-tolerant communication.
• Replication with per-record masters.


AMAZON’S KEY-VALUE STORE: DYNAMO

Adapted from Amazon’s Dynamo Presentation


Highlights of Dynamo

• High write availability.
• Optimistic replication: vector clocks for conflict resolution.
• Consistent hashing (as in Chord), in a controlled environment.
• Quorums for relaxed consistency.
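A minimal consistent-hashing sketch of the kind Dynamo builds on: each node owns many points ("virtual nodes") on a hash ring, and a key belongs to the first node clockwise from its hash, so adding or removing a node remaps only the adjacent arcs:

import bisect, hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes: int = 64):
        # each physical node gets `vnodes` points on the ring
        self.points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

    def owner(self, key: str) -> str:
        """First virtual node clockwise from the key's hash (wrapping around)."""
        i = bisect.bisect(self.points, (h(key), ""))
        return self.points[i % len(self.points)][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))

Dynamo then replicates each key on the next N distinct nodes along the ring (its preference list) and runs relaxed R/W quorums over them.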


TOO MANY CHOICES – WHICH SYSTEM SHOULD I USE?

Cooper et al., SOCC 2010


Benchmarking Serving Systems

• A standard benchmarking tool for evaluating key-value stores: the Yahoo! Cloud Serving Benchmark (YCSB).

• Evaluate different systems on common workloads.

• Focus on performance and scale-out.


Benchmark tiers

• Tier 1 – Performance
– Latency versus throughput, as offered throughput increases

• Tier 2 – Scalability
– Latency as database and system size increase (“scale-out”)
– Latency as servers are added elastically (“elastic speedup”)
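Tier 1 boils down to driving the store at increasing offered rates and recording latency at each level until throughput saturates. A schematic YCSB-style measurement loop (the db_read callable stands in for whichever system's client is under test):

import random, time

def measure_read_latency(db_read, target_ops_per_sec: float, duration_sec: float = 10.0):
    """Offer reads at a fixed rate; return the average read latency in ms."""
    interval = 1.0 / target_ops_per_sec
    latencies = []
    deadline = time.time() + duration_sec
    while time.time() < deadline:
        key = f"user{random.randint(0, 999_999)}"   # YCSB-like key space
        start = time.time()
        db_read(key)
        elapsed = time.time() - start
        latencies.append(elapsed * 1000.0)
        time.sleep(max(0.0, interval - elapsed))    # hold the offered rate
    return sum(latencies) / len(latencies)

# e.g. sweep target_ops_per_sec over 1000, 2000, ... and plot latency vs. throughput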


Workload A – Update heavy: 50/50 read/update

[Figure: Workload A read latency: average read latency (ms) versus throughput (ops/sec) for Cassandra, HBase, PNUTS, and MySQL.]

Cassandra (based on Dynamo) is optimized for heavy updates. Cassandra uses hash partitioning.


Workload B – Read heavy: 95/5 read/update

[Figure: Workload B read latency: average read latency (ms) versus throughput (operations/sec) for Cassandra, HBase, PNUTS, and MySQL.]

PNUTS uses MySQL, and MySQL is optimized for read operations.


Workload E – Short scans: scans of 1–100 records of size 1 KB

[Figure: Workload E scan latency: average scan latency (ms) versus throughput (operations/sec) for HBase, PNUTS, and Cassandra.]

HBase uses an append-only log, so it is optimized for scans; the same holds for MySQL and hence PNUTS. Cassandra uses hash partitioning, so its scan performance is poor.


Summary

• Different databases are suitable for different workloads.

• The systems are evolving; the landscape is changing dramatically.

• Active development communities have formed around the open-source systems.


Two Approaches to Scalability

• Scale-up
– Classical enterprise setting (RDBMS)
– Flexible ACID transactions
– Transactions in a single node

• Scale-out
– Cloud friendly (key-value stores)
– Execution at a single server
• Limited functionality and guarantees
– No multi-row or multi-step transactions


Key-Value Store Lessons

What are the design principles learned?

Design Principles [DNIS 2010]

• Separate system and application state
– System metadata is critical but small
– Application data has varying needs
– Separation allows use of different classes of protocols

Design Principles

• Decouple ownership from data storage
– Ownership is exclusive read/write access to data
– Decoupling allows lightweight ownership migration

[Figure: a classical DBMS bundles the transaction manager, cache manager, recovery, and storage in one stack; with decoupled ownership, the ownership component (multi-step transactions, read/write access) is separated from the storage layer.]

Design Principles

• Limit most interactions to a single node
– Allows horizontal scaling
– Graceful degradation during failures
– No distributed synchronization

Thanks: Curino et al., VLDB 2010

Design Principles

• Limited distributed synchronization is practical
– Maintenance of metadata
– Provide strong guarantees only for data that needs them

Fault-Tolerance in the Cloud

• Need to tolerate catastrophic failures
– Geographic replication

• How can we support ACID transactions over data replicated at multiple datacenters?
– One-copy serializability: clients can access data in any datacenter; it appears as a single copy with atomic access

Megastore: Entity Groups (Google, CIDR 2011)

• Entity groups are sub-databases
– Static partitioning
– Cheap transactions within an entity group (the common case)
– Expensive cross-entity-group transactions (rare)
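A sketch of the idea (hypothetical names, not Megastore's API): keys live inside a named entity group, and the cheap transactional path covers exactly one group per transaction:

class EntityGroupStore:
    """Toy store where a transaction is cheap only within one entity group."""

    def __init__(self):
        self.data = {}   # (group, key) -> value

    def transact(self, group: str, updates: dict) -> None:
        """Atomically apply updates whose keys all live in `group`.
        In Megastore a per-group replicated (Paxos) log makes this cheap;
        crossing groups needs two-phase commit or queues (the rare, costly path)."""
        for key, value in updates.items():
            self.data[(group, key)] = value

store = EntityGroupStore()
# Each email account is a natural entity group: ops inside it are transactional.
store.transact("alice@example.com", {"inbox/msg-1": "hello", "unread_count": "1"})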

Megastore Entity Groups

Semantically predefined:

• Email
– Each email account forms a natural entity group.
– Operations within an account are transactional: a user's send-message operation is guaranteed to observe the change despite fail-over to another replica.

• Blogs
– A user's profile is an entity group.
– Operations such as creating a new blog rely on asynchronous messaging with two-phase commit.

• Maps
– Divide the globe into non-overlapping patches.
– Each patch can be an entity group.


Megastore

Slides adapted from authors’ presentation


Google’s Spanner: Database Tech That Can Span the Planet (OSDI 2012)


The Big Picture (OSDI 2012)

[Figure: Spanner's stack: tablets hold logs and SSTables on the Colossus File System; 2PC provides atomicity, Paxos provides consistency, and 2PL + wound-wait provides isolation; movedir performs load balancing; TrueTime, powered by GPS and atomic clocks, underlies everything.]


TrueTime

• TrueTime: APIs that provide real time with bounds on error.
– Powered by GPS and atomic clocks.

• Enforces external consistency:
– If the start of T2 occurs after the commit of T1, then the commit timestamp of T2 must be greater than the commit timestamp of T1.

• Concurrency control:
– Update transactions: 2PL.
– Read-only transactions: use real time to return a consistent snapshot.
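A sketch of the commit-wait rule TrueTime enables: TT.now() returns an interval [earliest, latest] guaranteed to contain real time; a transaction takes its commit timestamp from now().latest and waits until now().earliest has passed it before making its writes visible. The epsilon below is a made-up constant; real TrueTime derives the uncertainty from its GPS and atomic-clock references:

import time
from collections import namedtuple

TTInterval = namedtuple("TTInterval", "earliest latest")
EPSILON = 0.004   # assumed clock uncertainty in seconds (illustrative only)

def tt_now() -> TTInterval:
    t = time.time()
    return TTInterval(t - EPSILON, t + EPSILON)   # real time lies in this interval

def commit(apply_writes) -> float:
    s = tt_now().latest             # commit timestamp: upper bound of "now"
    apply_writes()
    while tt_now().earliest <= s:   # commit wait: until s is certainly in the past
        time.sleep(EPSILON / 4)
    return s   # any transaction starting after this point gets a larger timestamp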


Primary References

• Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, Gruber: Bigtable: A Distributed Storage System for Structured Data. OSDI 2006.

• Ghemawat, Gobioff, Leung: The Google File System. SOSP 2003.

• McKusick, Quinlan: GFS: Evolution on Fast-Forward. Communications of the ACM, 2010.

• Cooper, Ramakrishnan, Srivastava, Silberstein, Bohannon, Jacobsen, Puz, Weaver, Yerneni: PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.

• DeCandia, Hastorun, Jampani, Kakulapati, Lakshman, Pilchin, Sivasubramanian, Vosshall, Vogels: Dynamo: Amazon's Highly Available Key-Value Store. SOSP 2007.

• Cooper, Silberstein, Tam, Ramakrishnan, Sears: Benchmarking Cloud Serving Systems with YCSB. SoCC 2010.