
Optimistic Intra-Transaction Parallelism on Chip-Multiprocessors

Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry 1,3

1 Carnegie Mellon University
2 University of Toronto
3 Intel Research Pittsburgh

Chip Multiprocessors are Here!

• 2 cores now; soon we will have 4, 8, 16, or 32
• Multiple threads per core
• How do we best use them?

Examples: IBM Power 5, AMD Opteron, Intel Yonah

Multi-Core Enhances Throughput

[Diagram: users submit transactions to the database server (DBMS + database)]

Cores can run concurrent transactions and improve throughput.
Can multiple cores improve transaction latency?

Parallelizing transactions

    SELECT cust_info FROM customer;
    UPDATE district WITH order_id;
    INSERT order_id INTO new_order;
    foreach(item) {
      GET quantity FROM stock;
      quantity--;
      UPDATE stock WITH quantity;
      INSERT item INTO order_line;
    }

Intra-query parallelism
• Used for long-running queries (decision support)
• Does not work for short queries
• Short queries dominate in commercial workloads

Intra-transaction parallelism
• Each thread spans multiple queries: the transaction is broken into threads
• Hard to add to existing systems! Need to change the interface, add latches and locks, and worry about the correctness of parallel execution…

Thread Level Speculation (TLS) makes parallelization easier.

Thread Level Speculation (TLS)

[Diagram: time flows downward. A sequential sequence of accesses (*p=, *q=, =*p, =*q) is split into Epoch 1 and Epoch 2, which run in parallel. If Epoch 2 reads *p before Epoch 1 writes it, the hardware detects a violation and restarts Epoch 2.]

• Use epochs
• Detect violations
• Restart to recover
• Buffer state
• Worst case: Sequential
• Best case: Fully parallel

Data dependences limit performance.
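To make the violation mechanism concrete, here is a toy software model of the idea (not the hardware described in the talk): each epoch records the addresses it reads speculatively, and a write by an earlier epoch to one of those addresses marks the later epoch as violated so it restarts. All names here are illustrative.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_READS 16

    typedef struct {
        const void *reads[MAX_READS];   /* addresses read speculatively */
        int         nreads;
        bool        violated;
    } epoch_t;

    static void epoch_record_read(epoch_t *e, const void *addr)
    {
        if (e->nreads < MAX_READS)
            e->reads[e->nreads++] = addr;
    }

    /* An earlier epoch writes addr: any later epoch that already read addr
     * consumed a stale value, so mark it violated and it will restart. */
    static void earlier_epoch_writes(epoch_t *later, const void *addr)
    {
        for (int i = 0; i < later->nreads; i++)
            if (later->reads[i] == addr)
                later->violated = true;
    }

    int main(void)
    {
        int p = 0, q = 0;
        epoch_t epoch2 = { .nreads = 0, .violated = false };

        epoch_record_read(&epoch2, &p);    /* epoch 2: ... = *p (speculative read) */
        earlier_epoch_writes(&epoch2, &p); /* epoch 1: *p = ...  -> RAW violation  */
        earlier_epoch_writes(&epoch2, &q); /* epoch 1: *q = ...  -> no conflict    */

        printf("epoch 2 %s\n", epoch2.violated ? "violated -> restart" : "ok");
        return 0;
    }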

A Coordinated Effort

• Transactions (TPC-C): the transaction programmer chooses epoch boundaries
• DBMS (BerkeleyDB): the DBMS programmer removes performance bottlenecks
• Hardware (simulated machine): the hardware developer adds TLS support to the architecture

So what’s new?

Intra-transaction parallelism:
• Without changing the transactions
• With minor changes to the DBMS
• Without having to worry about locking
• Without introducing concurrency bugs
• With good performance

Halve transaction latency on four cores

Related Work

Optimistic Concurrency Control (Kung82)

Sagas (Molina&Salem87)

Transaction chopping (Shasha95)


Outline

• Introduction
• Related work
• Dividing transactions into epochs
• Removing bottlenecks in the DBMS
• Results
• Conclusions

Case Study: New Order (TPC-C)

• Only dependence is the quantity field
• Very unlikely to occur (1/100,000)

    GET cust_info FROM customer;
    UPDATE district WITH order_id;
    INSERT order_id INTO new_order;
    foreach(item) {
      GET quantity FROM stock WHERE i_id=item;
      UPDATE stock WITH quantity-1 WHERE i_id=item;
      INSERT item INTO order_line;
    }

The foreach loop accounts for 78% of transaction execution time.

Case Study: New Order (TPC-C)

    GET cust_info FROM customer;
    UPDATE district WITH order_id;
    INSERT order_id INTO new_order;
    TLS_foreach(item) {
      GET quantity FROM stock WHERE i_id=item;
      UPDATE stock WITH quantity-1 WHERE i_id=item;
      INSERT item INTO order_line;
    }

The only source change is replacing foreach with TLS_foreach: each loop iteration now runs as a speculative epoch.
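As a rough sketch of the intended semantics: TLS_foreach behaves like the original loop, except that on TLS hardware each iteration becomes a speculative epoch. The macro and helper below are illustrative stand-ins, not the paper's actual interface; the fallback shown simply runs the iterations sequentially.

    #include <stdio.h>

    /* Illustrative sketch only: on TLS hardware each iteration of TLS_foreach
     * would run as a speculative epoch; without TLS support it degenerates to
     * the original sequential loop, which is the fallback shown here. */
    #define TLS_foreach(i, n) for (int i = 0; i < (n); i++)   /* sequential fallback */

    typedef struct { int item_id; int quantity; } stock_row;

    /* One order line: read and decrement one stock row.  Two iterations only
     * conflict if they touch the same item, which is why the loop is almost
     * always safe to run speculatively in parallel. */
    static void process_order_line(stock_row *stock, int item)
    {
        stock[item].quantity -= 1;
    }

    int main(void)
    {
        stock_row stock[3] = { {0, 10}, {1, 10}, {2, 10} };

        TLS_foreach(item, 3) {
            process_order_line(stock, item);
        }

        for (int i = 0; i < 3; i++)
            printf("item %d: quantity %d\n", i, stock[i].quantity);
        return 0;
    }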


Dependences in DBMS

[Diagram: data dependences between epochs, plotted against time, serialize execution]

Dependences serialize execution!

Performance tuning:
• Profile execution
• Remove the bottleneck dependence
• Repeat

Buffer Pool Management

[Diagram: the CPU calls get_page(5) on the buffer pool, raising page 5's reference count from 0 to 1; put_page(5) drops it back to 0.]

When two epochs each call get_page(5) and put_page(5), their updates to the reference count create a dependence between the epochs. TLS ensures the first epoch gets the page first. Who cares?

We do not care about the order, so get_page escapes speculation:
• Escape speculation
• Invoke the operation
• Store an undo function
• Resume speculation

But put_page is not undoable, so it cannot be escaped the same way.

Instead, delay put_page until the end of the epoch and avoid the dependence entirely.
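A minimal sketch of how a get_page/put_page wrapper could combine the two ideas above; tls_escape_speculation, tls_on_violation and tls_at_commit are hypothetical stand-ins for the hardware/runtime interface, not BerkeleyDB functions, and are stubbed out so the sketch compiles.

    #include <stdio.h>

    /* Hypothetical TLS runtime hooks -- stand-ins for the hardware interface
     * described on the slides.  Here they are no-op stubs so the sketch
     * compiles; real implementations would suspend speculation and register
     * undo/commit actions with the current epoch. */
    static void tls_escape_speculation(void) {}
    static void tls_resume_speculation(void) {}
    static void tls_on_violation(void (*undo)(int), int arg) { (void)undo; (void)arg; }
    static void tls_at_commit(void (*action)(int), int arg)  { action(arg); /* stub: run now */ }

    static int ref_count[16];                 /* toy buffer pool: one refcount per page */

    static void real_get_page(int page) { ref_count[page]++; }
    static void real_put_page(int page) { ref_count[page]--; }

    /* get_page escapes speculation: perform the operation for real, and
     * register real_put_page as the undo action in case this epoch restarts. */
    static void speculative_get_page(int page)
    {
        tls_escape_speculation();
        real_get_page(page);
        tls_on_violation(real_put_page, page);   /* undo if the epoch is violated */
        tls_resume_speculation();
    }

    /* put_page is not undoable, so it is deferred until the epoch commits. */
    static void speculative_put_page(int page)
    {
        tls_at_commit(real_put_page, page);
    }

    int main(void)
    {
        speculative_get_page(5);
        /* ... use the page speculatively ... */
        speculative_put_page(5);
        printf("ref_count[5] = %d\n", ref_count[5]);   /* 0 after commit */
        return 0;
    }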

Removing Bottleneck Dependences

We introduce three techniques:
• Delay operations until non-speculative (see the sketch after this list)
  – Mutex and lock acquire and release
  – Buffer pool, memory, and cursor release
  – Log sequence number assignment
• Escape speculation
  – Buffer pool, memory, and cursor allocation
• Traditional parallelization
  – Memory allocation, cursor pool, error checks, false sharing
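One way to picture "delay operations until non-speculative": each epoch keeps a queue of deferred side effects (lock releases, buffer pool releases, log sequence number assignment) that is only flushed when the epoch commits. This is a sketch of that idea with made-up names, not the paper's implementation.

    #include <stdio.h>

    /* Sketch of "delay operations until non-speculative": an epoch queues
     * side-effecting actions and only performs them when it commits. */
    #define MAX_DEFERRED 32

    typedef void (*deferred_fn)(int);

    typedef struct {
        deferred_fn fn[MAX_DEFERRED];
        int         arg[MAX_DEFERRED];
        int         n;
    } epoch_actions;

    static void defer(epoch_actions *e, deferred_fn fn, int arg)
    {
        e->fn[e->n]  = fn;
        e->arg[e->n] = arg;
        e->n++;
    }

    /* Run at commit time, when the epoch is no longer speculative. */
    static void epoch_commit(epoch_actions *e)
    {
        for (int i = 0; i < e->n; i++)
            e->fn[i](e->arg[i]);
        e->n = 0;
    }

    static void release_lock(int id) { printf("release lock %d\n", id); }
    static void put_page(int page)   { printf("put_page(%d)\n", page); }

    int main(void)
    {
        epoch_actions epoch = { .n = 0 };

        /* During speculation: record, don't perform. */
        defer(&epoch, release_lock, 7);
        defer(&epoch, put_page, 5);

        /* Epoch becomes non-speculative: perform the queued operations. */
        epoch_commit(&epoch);
        return 0;
    }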


Experimental Setup

• Detailed simulation: superscalar, out-of-order, 128-entry reorder buffer; memory hierarchy modeled in detail
• TPC-C transactions on BerkeleyDB: in-core database, single user, single warehouse
• Measure an interval of 100 transactions
• Measuring latency, not throughput

[Simulated machine: four CPUs, each with a 32KB 4-way L1 cache, sharing a 2MB 4-way L2 cache and the rest of the memory system]

Optimizing the DBMS: New Order

[Bar chart: execution time (normalized) for Sequential, No Optimizations, and then with successive bottlenecks removed: Latches, Locks, Malloc/Free, Buffer Pool, Cursor Queue, Error Checks, False Sharing, B-Tree, Logging. Bars are broken down into Idle CPU, Violated, Cache Miss, and Busy time. Callouts: cache misses increase; other CPUs not helping; can't optimize much more; 26% improvement.]

This process took me 30 days and <1200 lines of code.

Other TPC-C Transactions

[Bar chart: execution time (normalized) on four CPUs for New Order, Delivery, Stock Level, Payment, and Order Status; bars broken down into Idle CPU, Failed, Cache Miss, and Busy time.]

3 of the 5 transactions speed up by 46-66%.

Conclusions

• A new form of parallelism for databases: a tool for attacking transaction latency
• Intra-transaction parallelism without major changes to the DBMS
• TLS can be applied to more than transactions
• Halve transaction latency by using 4 CPUs

Any questions?

For more information, see: www.colohan.com

Backup Slides Follow…


TPC-C Transactions on 2 CPUs

[Bar chart: execution time (normalized) on two CPUs for New Order, Delivery, Stock Level, Payment, and Order Status; bars broken down into Idle CPU, Failed, Cache Miss, and Busy time.]

3 of the 5 transactions speed up by 30-44%.

LATCHES


Latches

• Latches provide mutual exclusion between transactions
• They cause violations between epochs: the read-test-write cycle is a RAW dependence
• Not needed between epochs: TLS already provides mutual exclusion!

Latches: Aggressive Acquire

• Epoch 1: Acquire; latch_cnt++ … work … latch_cnt--
• Epoch 2: latch_cnt++ … work … (enqueue release); when homefree: commit work; latch_cnt--
• Epoch 3: latch_cnt++ … work … (enqueue release); when homefree: commit work; latch_cnt--; Release
• Result: one large critical section spanning the epochs

Latches: Lazy Acquire

• Epoch 1: Acquire … work … Release
• Epoch 2: (enqueue acquire) … work … (enqueue release); when homefree: Acquire; commit work; Release
• Epoch 3: (enqueue acquire) … work … (enqueue release); when homefree: Acquire; commit work; Release
• Result: small critical sections
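A sketch of lazy acquire in the same illustrative style: a speculative epoch only records which latch it would have taken, and the real acquire/release happen around its commit, so the latch is held only for the short commit window. The latch and epoch types below are stand-ins, not BerkeleyDB's latch API.

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative sketch of lazy latch acquire: while speculating, an epoch
     * only notes which latch it would have taken; the real acquire/release
     * happen around commit, so the latch is held only briefly. */
    typedef struct { bool held; } latch_t;

    static void latch_acquire(latch_t *l) { l->held = true;  printf("acquire\n"); }
    static void latch_release(latch_t *l) { l->held = false; printf("release\n"); }

    typedef struct {
        latch_t *needed;    /* latch recorded during speculation */
    } epoch_t;

    static void speculative_latch_acquire(epoch_t *e, latch_t *l)
    {
        e->needed = l;      /* enqueue acquire: TLS already isolates the epoch */
    }

    static void epoch_commit(epoch_t *e)
    {
        if (e->needed) {
            latch_acquire(e->needed);          /* small critical section: */
            printf("commit speculative work\n");
            latch_release(e->needed);
            e->needed = NULL;
        }
    }

    int main(void)
    {
        latch_t latch = { false };
        epoch_t epoch = { NULL };

        speculative_latch_acquire(&epoch, &latch);
        /* ... speculative work protected by TLS conflict detection ... */
        epoch_commit(&epoch);
        return 0;
    }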

HARDWARE


TLS in Database Systems

[Diagram: epochs in non-database TLS vs. TLS in database systems, where concurrent transactions are split into much larger epochs]

Large epochs:
• More dependences → must tolerate them
• More state → bigger buffers

Feedback Loop

[Cartoon: the programmer ("Must… make… faster") thinks "I know this is parallel!" and turns for() { do_work(); } into par_for() { do_work(); }, guided by feedback from the system.]

Violations == Feedback

[Diagram: the sequential vs. parallel execution from before; the violation report carries the addresses involved (0x0FD8, 0xFD20, 0x0FC0, 0xFC18), giving the programmer feedback about which dependence to remove.]

Eliminating Violations

[Diagram: after eliminating the *p dependence, a violation on *q still occurs, later in the epoch.]

Optimization may make it slower?

Tolerating Violations: Sub-epochs

[Diagram: with sub-epochs, the violation on *q rolls the epoch back only to the most recent sub-epoch, not to the start of the epoch.]

Sub-epochs

• Started periodically by hardware: how many? when to start?
• Hardware implementation: just like epochs, using more epoch contexts
• No need to check for violations between sub-epochs within an epoch

[Diagram: the violated epoch restarts from the last sub-epoch boundary.]
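A toy illustration of why sub-epochs help (structure invented for the example): if an epoch checkpoints its progress periodically, a violation only rolls work back to the most recent checkpoint rather than to the beginning of the epoch.

    #include <stdio.h>

    /* Toy model of sub-epochs: checkpoint every SUB_EPOCH items, so a
     * violation rolls back to the last checkpoint instead of to the
     * beginning of the epoch. */
    #define ITEMS      10
    #define SUB_EPOCH  4        /* how often the hardware would checkpoint */

    int main(void)
    {
        int checkpoint = 0;     /* progress saved at the last sub-epoch start */
        int violate_at = 6;     /* pretend a violation hits while doing item 6 */

        for (int i = 0; i < ITEMS; i++) {
            if (i % SUB_EPOCH == 0)
                checkpoint = i;             /* start a new sub-epoch */

            if (i == violate_at) {
                printf("violation at item %d: restart from item %d, not 0\n",
                       i, checkpoint);
                violate_at = -1;            /* only violate once */
                i = checkpoint - 1;         /* redo work since the checkpoint */
                continue;
            }
            printf("did item %d\n", i);
        }
        return 0;
    }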