Predicting Replicated Database Scalability

Sameh Elnikety, Microsoft Research
Steven Dropsho, Google Inc.
Emmanuel Cecchet, Univ. of Mass.
Willy Zwaenepoel, EPFL


Motivation

• Environment
– E-commerce website
– DB throughput is 500 tps

• Is 5000 tps achievable?
– Yes: use 10 replicas
– Yes: use 16 replicas
– No: faster machines needed

• How does a transaction workload scale on a replicated DB?

[Figure: single DBMS]

[Figure: Multi-Master replication (Replica 1, Replica 2, Replica 3) vs. Single-Master replication (Master, Slave 1, Slave 2)]

Background: Multi-Master

[Figure: a standalone DBMS vs. a load balancer in front of Replica 1, Replica 2, and Replica 3]

Read Tx

[Figure: the load balancer sends a read-only transaction T to one of the replicas]

Read tx does not change DB state.

Update Tx

[Figure: the load balancer sends an update transaction T to one replica; its writeset (ws) goes to the certifier (Cert) and is propagated to the other replicas]

Update tx changes DB state.

Additional Replica

[Figure: Replica 4 added behind the load balancer; the certifier (Cert) also sends it the writesets (ws)]

Coming Up …

• Standalone DBMS
– Service demands

• Multi-master system
– Service demands
– Queuing model

• Experimental validation

Standalone DBMS

• Required
– readonly tx: R
– update tx: W

• Transaction load
– readonly tx: R
– update tx: W / (1 - A1)

[Figure: single DBMS]

Abort probability is A1, so submit W / (1 - A1) update tx:
– Committed tx: W
– Aborted tx: W ∙ A1 / (1 - A1)
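The abort accounting above can be checked with a minimal Python sketch. It assumes each update attempt aborts independently with probability A1 and that aborted attempts are resubmitted. The function name is illustrative and does not come from the authors' tooling.

```python
def standalone_update_counts(W: float, A1: float) -> dict:
    """Update-transaction flow through a standalone DBMS (illustrative sketch).

    Assumes each attempt aborts independently with probability A1 and is
    resubmitted, so committing W transactions takes W / (1 - A1) attempts.
    """
    if not 0.0 <= A1 < 1.0:
        raise ValueError("A1 must be in [0, 1)")
    submitted = W / (1.0 - A1)
    return {
        "submitted": submitted,
        "committed": W,
        "aborted": W * A1 / (1.0 - A1),  # equals submitted - committed
    }


if __name__ == "__main__":
    # Hypothetical numbers: 100 committed updates per second at a 50% abort rate.
    print(standalone_update_counts(W=100.0, A1=0.5))
```

The aborted count is exactly the difference between submitted and committed attempts, which is why the update load is inflated by the factor 1 / (1 - A1).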

$$\mathrm{Load}(1) = R \cdot rc + \frac{W}{1 - A_1} \cdot wc$$

where rc and wc are the service demands of one read-only and one update transaction.

Service Demand

$$D(1) = P_r \cdot rc + \frac{P_w}{1 - A_1} \cdot wc$$

where Pr and Pw are the read-only and update fractions of the transaction mix (Pr : Pw).
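As a quick numeric check of D(1), here is a small Python sketch. The profile values in the usage block are hypothetical, and Pr and Pw are assumed to be mix fractions that sum to 1.

```python
def standalone_demand(Pr: float, Pw: float, rc: float, wc: float, A1: float) -> float:
    """Service demand per transaction on a standalone DBMS (sketch):

        D(1) = Pr*rc + Pw / (1 - A1) * wc

    Pr, Pw -- read-only and update fractions of the mix (assumed to sum to 1)
    rc, wc -- service demand of one read-only / one update transaction
    A1     -- abort probability on the standalone system
    """
    if not 0.0 <= A1 < 1.0:
        raise ValueError("A1 must be in [0, 1)")
    # Aborted update attempts are re-executed, inflating the update term.
    return Pr * rc + Pw / (1.0 - A1) * wc


if __name__ == "__main__":
    # Hypothetical profile in ms: 80% reads at 10 ms, 20% updates at 20 ms.
    print(standalone_demand(Pr=0.8, Pw=0.2, rc=10.0, wc=20.0, A1=0.0))
    print(standalone_demand(Pr=0.8, Pw=0.2, rc=10.0, wc=20.0, A1=0.5))
```

Raising A1 from 0 to 0.5 doubles only the update term, so the read-only term is unaffected by aborts.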

Multi-Master with N Replicas

• Required (whole system of N replicas)
– Readonly tx: N ∙ R
– Update tx: N ∙ W

• Transaction load per replica
– Readonly tx: R
– Update tx: W / (1 - AN)
– Writesets: W ∙ (N - 1)

$$\mathrm{Load}^{MM}(N) = R \cdot rc + \frac{W}{1 - A_N} \cdot wc + W \cdot (N - 1) \cdot ws$$
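The per-replica breakdown can be sketched in Python as follows. The sketch assumes the load balancer spreads clients evenly over the N replicas. The function name, the workload numbers, and the ms units are illustrative.

```python
def mm_replica_load(R: float, W: float, N: int, AN: float,
                    rc: float, wc: float, ws: float) -> dict:
    """Per-replica work in an N-replica multi-master system (sketch).

    The whole system commits N*R read-only and N*W update transactions per
    unit time. Assuming an even load balancer, each replica:
      * executes R read-only transactions,
      * executes W / (1 - AN) update attempts (aborted attempts are resubmitted),
      * applies W * (N - 1) writesets committed at the other replicas.
    The weighted load is Load_MM(N) = R*rc + W/(1 - AN)*wc + W*(N - 1)*ws.
    """
    if N < 1:
        raise ValueError("N must be at least 1")
    if not 0.0 <= AN < 1.0:
        raise ValueError("AN must be in [0, 1)")
    reads = R
    update_attempts = W / (1.0 - AN)
    writesets = W * (N - 1)
    return {
        "reads": reads,
        "update_attempts": update_attempts,
        "writesets": writesets,
        "load": reads * rc + update_attempts * wc + writesets * ws,
    }


if __name__ == "__main__":
    # Hypothetical workload: 80 reads/s and 20 updates/s per replica,
    # demands in ms, aborts ignored (AN = 0).
    for N in (1, 4, 16):
        print(N, mm_replica_load(R=80, W=20, N=N, AN=0.0, rc=10, wc=20, ws=5))
```

The local read and update work stays fixed as N grows, while the writeset count W ∙ (N - 1) grows with every added replica.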

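MM Service Demand

$$D^{MM}(N) = P_r \cdot rc + \frac{P_w}{1 - A_N} \cdot wc + P_w \cdot (N - 1) \cdot ws$$

Explosive cost! The last term grows with N because each replica also applies the writesets of update transactions committed at the other N - 1 replicas.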
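A small Python sketch of D^MM(N) shows where the explosive cost comes from. With N = 1 and A_N = A_1 it reduces to D(1). The profile values are hypothetical, and A_N is held at 0 to isolate the writeset term, which is a simplification. The ratio N / D^MM(N) is used only as a rough capacity proxy: it assumes each replica saturates at 1 / D^MM(N). Under that assumption the proxy levels off at 1 / (Pw ∙ ws) no matter how many replicas are added.

```python
def mm_demand(N: int, Pr: float, Pw: float, rc: float, wc: float,
              ws: float, AN: float) -> float:
    """D_MM(N) = Pr*rc + Pw/(1 - AN)*wc + Pw*(N - 1)*ws (per transaction)."""
    if not 0.0 <= AN < 1.0:
        raise ValueError("AN must be in [0, 1)")
    return Pr * rc + Pw / (1.0 - AN) * wc + Pw * (N - 1) * ws


if __name__ == "__main__":
    # Hypothetical profile in ms; AN fixed at 0 to isolate the writeset term.
    Pr, Pw, rc, wc, ws = 0.8, 0.2, 10.0, 20.0, 5.0
    for N in (1, 2, 4, 8, 16):
        D = mm_demand(N, Pr, Pw, rc, wc, ws, AN=0.0)
        # Rough capacity proxy: N replicas, each saturating at 1 / D_MM(N).
        print(f"N={N:2d}  D_MM={D:5.1f} ms  N/D_MM={N / D:.3f} tx/ms")
    print("asymptote 1/(Pw*ws) =", 1.0 / (Pw * ws), "tx/ms")
```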

Compare: Standalone vs MM

• Standalone:
$$D(1) = P_r \cdot rc + \frac{P_w}{1 - A_1} \cdot wc$$

• Multi-Master:
$$D^{MM}(N) = P_r \cdot rc + \frac{P_w}{1 - A_N} \cdot wc + P_w \cdot (N - 1) \cdot ws$$

Explosive cost!

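Readonly Workload

• Standalone:
$$D(1) = P_r \cdot rc + \frac{P_w}{1 - A_1} \cdot wc$$

• Multi-Master:
$$D^{MM}(N) = P_r \cdot rc + \frac{P_w}{1 - A_N} \cdot wc + P_w \cdot (N - 1) \cdot ws$$

With no updates (Pw = 0), both expressions reduce to Pr ∙ rc, so each replica does the same work per transaction as the standalone DBMS.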

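Update Workload

• Standalone:
$$D(1) = P_r \cdot rc + \frac{P_w}{1 - A_1} \cdot wc$$

• Multi-Master:
$$D^{MM}(N) = P_r \cdot rc + \frac{P_w}{1 - A_N} \cdot wc + P_w \cdot (N - 1) \cdot ws$$

Explosive cost! With Pw > 0, the writeset term Pw ∙ (N - 1) ∙ ws grows linearly with N, and the abort factor 1 / (1 - AN) can grow as well.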

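Closed-Loop Queuing Model

[Figure: N replicas, each modeled as CPU and disk service centers (Replica i). Clients cycle through a think-time delay center (TT), a load balancer & network delay center (LB), one replica, and, with probability Pw, a certifier delay center (Cert).]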

Mean Value Analysis (MVA)

• Standard algorithm

• Iterates over the number of clients

• Inputs:
– Number of clients
– Service demand at service centers
– Delay time at delay centers

• Outputs:
– Response time
– Throughput
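The algorithm described above can be sketched as textbook exact single-class MVA in Python. This is not the authors' code. The mapping of the replicated system onto centers is an assumption: each replica's CPU and disk are treated as queueing centers carrying D^MM(N) / N per transaction under even load balancing, and the load balancer and certifier are treated as delay centers. All numbers in the usage block are hypothetical, including the split of demand between CPU and disk and the fsync time.

```python
from typing import Sequence, Tuple


def mva(n_clients: int,
        demands: Sequence[float],
        delays: Sequence[float] = (),
        think_time: float = 0.0) -> Tuple[float, float]:
    """Exact Mean Value Analysis for a single-class closed queuing network.

    demands    -- service demand per transaction at each queueing center
    delays     -- residence time at each delay center (no queueing)
    think_time -- client think time; a delay center excluded from the
                  reported response time
    Returns (throughput, response_time) with n_clients clients.
    """
    if n_clients < 1:
        raise ValueError("need at least one client")
    queue = [0.0] * len(demands)   # mean queue length at each center
    fixed_delay = sum(delays)
    throughput = response = 0.0
    for n in range(1, n_clients + 1):
        # Arrival theorem: an arriving client sees the queues left by n - 1 clients.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence) + fixed_delay
        throughput = n / (think_time + response)
        # Little's law at each queueing center.
        queue = [throughput * r for r in residence]
    return throughput, response


if __name__ == "__main__":
    # Hypothetical inputs in seconds: 4 replicas, each with a CPU and a disk.
    N = 4
    d_cpu, d_disk = 0.012, 0.006          # assumed split of D_MM(N)
    demands = [d_cpu / N, d_disk / N] * N
    Pw, fsync = 0.2, 0.004                # assumed update fraction and fsync time
    delays = [0.001, Pw * 1.5 * fsync]    # LB & network; certifier (updates only)
    x, r = mva(400, demands, delays, think_time=1.0)
    print(f"throughput={x:.1f} tx/s  response={r * 1000:.1f} ms")
```

Iterating from one client upward is what makes the algorithm exact. Each step needs only the queue lengths from the previous population.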

Using the Model

[Figure: closed-loop queuing model (repeated)]

Standalone Profiling (Offline)

• Copy of database

• Log all txs (Pr : Pw)

• Python script replays txs
– Readonly (rc)
– Updates (wc)

• Writesets
– Instrument db with triggers
– Play txs to log writesets
– Play writesets (ws)
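Here is a minimal Python sketch of turning a replay log into the model's standalone inputs. The record format (kind, seconds) and the function names are hypothetical, not the authors' script. The sketch assumes each item is replayed alone on an otherwise idle copy of the database, so that its elapsed time approximates its service demand.

```python
from collections import defaultdict


def _avg(xs):
    """Mean of a list, or 0.0 when it is empty (e.g. no writesets logged)."""
    return sum(xs) / len(xs) if xs else 0.0


def profile(records):
    """Summarize a replay log into Pr, Pw, rc, wc, ws (sketch).

    records -- iterable of (kind, seconds) pairs, with kind in
               {"read", "update", "writeset"}; a hypothetical log format.
    """
    times = defaultdict(list)
    for kind, seconds in records:
        if kind not in ("read", "update", "writeset"):
            raise ValueError(f"unknown record kind: {kind!r}")
        times[kind].append(seconds)
    # The mix counts transactions only; writeset replays are not transactions.
    n_tx = len(times["read"]) + len(times["update"])
    if n_tx == 0:
        raise ValueError("log contains no transactions")
    return {
        "Pr": len(times["read"]) / n_tx,
        "Pw": len(times["update"]) / n_tx,
        "rc": _avg(times["read"]),
        "wc": _avg(times["update"]),
        "ws": _avg(times["writeset"]),
    }


if __name__ == "__main__":
    log = [("read", 0.010), ("read", 0.012), ("update", 0.030), ("writeset", 0.004)]
    print(profile(log))
```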

MM Service Demand

$$D^{MM}(N) = P_r \cdot rc + \frac{P_w}{1 - A_N} \cdot wc + P_w \cdot (N - 1) \cdot ws$$

Explosive cost!

Abort Probability

$$A_N = 1 - (1 - A_1)^{N \cdot \frac{CW(N)}{L(1)}}$$

Using the Model

[Figure: closed-loop queuing model annotated with its inputs]

• Think time: # clients, think time

• Load balancer & network delay: 1 ms

• Certifier delay: 1.5 ∙ fsync()

Experimental Validation

• Compare
– Measured performance vs. model predictions

• Environment
– Linux cluster running PostgreSQL

• TPC-W workload
– Browsing (5% update txs)
– Shopping (20% update txs)
– Ordering (50% update txs)

• RUBiS workload
– Browsing (0% update txs)
– Bidding (20% update txs)

Multi-Master TPC-W Performance

[Figures: throughput and response time]

• Browsing (5% update txs): 15.7X

• Ordering (50% update txs): 6.7X (15%)

Multi-Master RUBiS Performance

[Figures: throughput and response time]

• Browsing (0% update txs): 16X

• Bidding (20% update txs): 3.4X

Model Assumptions

• Database system
– Snapshot isolation
– No hotspots
– Low abort rates

• Server system
– Scalable server (no thrashing)

• Queuing model & MVA
– Exponential distribution for service demands

Check Out the Paper

• Models
– Single-Master
– Multi-Master

• Experimental results
– TPC-W
– RUBiS

• Sensitivity analysis
– Abort rates
– Certifier delay

Related Work

Urgaonkar, Pacifici, Shenoy, Spreitzer, Tantawi. “An analytical model for multi-tier internet services and its applications.” SIGMETRICS 2005.

Conclusions

• Derived an analytical model
– Predicts workload scalability

• Implemented replicated systems
– Multi-master
– Single-master

• Experimental validation
– TPC-W
– RUBiS
– Throughput predictions match within 15%

• Questions?

Thank you!
