
Transactional Memory

Companion slides for The Art of Multiprocessor Programming

by Maurice Herlihy & Nir Shavit

Our Vision for the Future

In this course, we covered …

Best practices …

New and clever ideas …

And common-sense observations.

Nevertheless …

Concurrent programming is still too hard …

Here we explore why this is …

And what we can do about it.

Locking

Coarse-Grained Locking

Easily made correct … but not scalable.

Fine-Grained Locking

Can be tricky …

Locks are not Robust

If a thread holding a lock is delayed … no one else can make progress.

Locking Relies on Conventions

• Relation between
– Lock bit and object bits
– Exists only in programmer’s mind

/*
 * When a locked buffer is visible to the I/O layer
 * BH_Launder is set. This means before unlocking
 * we must clear BH_Launder, mb() on alpha and then
 * clear BH_Lock, so no reader can see BH_Launder set
 * on an unlocked buffer and then risk to deadlock.
 */

Actual comment from Linux Kernel

(hat tip: Bradley Kuszmaul)

Simple Problems are hard

[Figure: a double-ended queue with enq(x) at one end and enq(y) at the other]

No interference if ends “far apart”

Interference OK if queue is small

Clean solution is publishable result: [Michael & Scott PODC 97]

Locks Not Composable

Transfer item from one queue to another

Must be atomic: no duplicate or missing items

Lock source

Lock target

Unlock source & target

Methods cannot provide internal synchronization

Objects must expose locking protocols to clients

Clients must devise and follow protocols

Abstraction broken!
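The transfer protocol can be sketched in Java as follows. This is only an illustration under stated assumptions: the names LockedQueue, Transfers, and the id-based lock ordering are hypothetical and do not come from the slides. The point is that the queue has to expose its lock, and every client has to follow the same ordering rule.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical queue. Its lock is exposed so that clients can build
// multi-queue operations, which is the abstraction leak described above.
class LockedQueue<T> {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.getAndIncrement();        // used only to order lock acquisition
    final ReentrantLock lock = new ReentrantLock();  // exposed to clients
    private final ArrayDeque<T> items = new ArrayDeque<>();

    void enq(T x) {
        lock.lock();
        try { items.addLast(x); } finally { lock.unlock(); }
    }

    T deq() {                                        // returns null if the queue is empty
        lock.lock();
        try { return items.pollFirst(); } finally { lock.unlock(); }
    }
}

class Transfers {
    // Client-side protocol: lock both queues in a fixed (id) order to avoid deadlock.
    static <T> boolean transfer(LockedQueue<T> source, LockedQueue<T> target) {
        LockedQueue<T> first = source.id <= target.id ? source : target;
        LockedQueue<T> second = (first == source) ? target : source;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                T x = source.deq();                  // the lock is reentrant, so this nests
                if (x == null) return false;         // nothing to move
                target.enq(x);
                return true;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Any client that locks the two queues in a different order can deadlock against this one, so the ordering convention lives outside the queue class.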

Monitor Wait and Signal

[Figure: a thread sleeps (zzz) on an empty buffer]

If buffer is empty, wait for item to show up

Wait and Signal do not Compose

[Figure: a sleeping thread and two empty buffers]

Wait for either?

The Transactional Manifesto

• Current practice inadequate
– to meet the multicore challenge
• Research Agenda
– Replace locking with a transactional API
– Design languages or libraries
– Implement efficient run-times

Transactions

Block of code …

Atomic: appears to happen instantaneously

Serializable: all appear to happen in one-at-a-time order

Commit: takes effect (atomically)

Abort: has no effect (typically restarted)

Atomic Blocks

atomic {
  x.remove(3);
  y.add(3);
}

atomic {
  y = null;
}

No data race

A Double-Ended Queue

Write sequential code:

public void LeftEnq(item x) {
  Qnode q = new Qnode(x);
  q.left = this.left;
  this.left.right = q;
  this.left = q;
}

Enclose in atomic block:

public void LeftEnq(item x) {
  atomic {
    Qnode q = new Qnode(x);
    q.left = this.left;
    this.left.right = q;
    this.left = q;
  }
}

Warning

• Not always this simple
– Conditional waits
– Enhanced concurrency
– Complex patterns
• But often it is…

Composition?

public void Transfer(Queue<T> q1, Queue<T> q2) {
  atomic {
    T x = q1.deq();
    q2.enq(x);
  }
}

Trivial or what?
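Java has no atomic keyword, so the composition argument can only be approximated. The sketch below assumes a stand-in: a hypothetical Atomic.run helper backed by one global reentrant lock. It gives the semantics of an atomic block but none of the concurrency a real transactional memory would offer. TxQueue and Composed are also made-up names.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for an `atomic` block: a single global reentrant lock.
// Assumption for illustration only; a real TM would run blocks speculatively.
final class Atomic {
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    static <R> R run(Supplier<R> body) {
        GLOBAL.lock();
        try { return body.get(); } finally { GLOBAL.unlock(); }
    }
}

// The queue synchronizes internally and exposes no locking protocol.
class TxQueue<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();

    void enq(T x) {
        Atomic.run(() -> { items.addLast(x); return null; });
    }

    T deq() {                                   // returns null if the queue is empty
        return Atomic.run(items::pollFirst);
    }
}

class Composed {
    // Two atomic operations composed into one larger atomic operation.
    static <T> void transfer(TxQueue<T> q1, TxQueue<T> q2) {
        Atomic.run(() -> {
            T x = q1.deq();                     // nested atomic block
            if (x != null) q2.enq(x);           // nested atomic block
            return null;
        });
    }
}
```

Unlike the lock-based version, transfer needs no knowledge of how TxQueue synchronizes; it only wraps the two calls in an outer block.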

Conditional Waiting

public T LeftDeq() {
  atomic {
    if (this.left == null)
      retry;
    …
  }
}

Roll back transaction and restart when something changes

Composable Conditional Waiting

atomic {
  x = q1.deq();
} orElse {
  x = q2.deq();
}

Run 1st method. If it retries …

Run 2nd method. If it retries …

Entire statement retries

Hardware Transactional Memory

• Exploit cache coherence
• Already almost does it
– Invalidation
– Consistency checking
• Speculative execution
– Branch prediction = optimistic synch!

HW Transactional Memory

[Figure sequence: caches and memory joined by an interconnect. Panels: a read tags a cache line T while the transaction is active; two active transactions hold T-tagged lines; one transaction is committed; a write tags a line T and D (dirty); "Rewind": a write causes an active transaction to be aborted.]

Transaction Commit

• At commit point
– If no cache conflicts, we win.
• Mark transactional entries
– Read-only: valid
– Modified: dirty (eventually written back)
• That’s all, folks!
– Except for a few details …

Not all Skittles and Beer

• Limits to
– Transactional cache size
– Scheduling quantum
• Transaction cannot commit if it is
– Too big
– Too slow
– Actual limits platform-dependent

HTM Strengths & Weaknesses

• Ideal for lock-free data structures
• Practical proposals have limits on
– Transaction size and length
– Bounded HW resources
– Guarantees vs best-effort
• On fail
– Diagnostics essential
– Retry in software?

Composition

Locks don’t compose, transactions do.

Composition necessary for Software Engineering.

But practical HTM doesn’t really support composition!

Why we need STM

Simple Lock-Based STM

• STMs come in different forms
– Lock-based
– Lock-free
• Here: a simple lock-based STM

Synchronization

• Transaction keeps
– Read set: locations & values read
– Write set: locations & values to be written
• Deferred update
– Changes installed at commit
• Lazy conflict detection
– Conflicts detected at commit

STM: Transactional Locking

[Figure: a map associates application memory with an array of version numbers (V#) and locks]

Reading an Object

Add version numbers & values to read set

To Write an Object

Add version numbers & new values to write set

To Commit

Acquire write locks

Check version numbers unchanged

Install new values

Increment version numbers

Unlock.
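The read, write, and commit steps above can be sketched in Java. This is a minimal illustration, not the slides' implementation: Loc and SimpleTx are hypothetical names, each location carries its own lock and version number instead of using a separate lock array, and a commit that cannot get a lock simply gives up instead of waiting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical memory location paired with its own lock and version number.
class Loc {
    final ReentrantLock lock = new ReentrantLock();
    volatile long version = 0;
    volatile int value;
    Loc(int value) { this.value = value; }
}

class SimpleTx {
    private final Map<Loc, Long> readSet = new LinkedHashMap<>();      // location -> version seen
    private final Map<Loc, Integer> writeSet = new LinkedHashMap<>();  // location -> pending value

    int read(Loc l) {
        Integer pending = writeSet.get(l);
        if (pending != null) return pending;     // see our own deferred write
        readSet.putIfAbsent(l, l.version);       // remember the version we read
        return l.value;                          // not validated until commit
    }

    void write(Loc l, int v) {
        writeSet.put(l, v);                      // deferred update: memory untouched
    }

    boolean commit() {
        List<Loc> held = new ArrayList<>();
        try {
            for (Loc l : writeSet.keySet()) {    // acquire write locks
                if (!l.lock.tryLock()) return false;
                held.add(l);
            }
            for (Map.Entry<Loc, Long> e : readSet.entrySet()) {  // check versions unchanged
                Loc l = e.getKey();
                boolean lockedByOther = l.lock.isLocked() && !l.lock.isHeldByCurrentThread();
                if (lockedByOther || l.version != e.getValue()) return false;
            }
            for (Map.Entry<Loc, Integer> e : writeSet.entrySet()) {
                e.getKey().value = e.getValue(); // install new values
                e.getKey().version++;            // increment version numbers
            }
            return true;
        } finally {
            for (Loc l : held) l.lock.unlock();  // unlock
        }
    }
}
```

Because reads are checked only at commit, a running transaction in this sketch can observe a mix of old and new values before it aborts.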

Problem: Internal Inconsistency

• A Zombie is an active transaction destined to abort.

• If Zombies see inconsistent states bad things can happen

Internal Consistency

Invariant: x = 2y

[Figure: x changes from 4 to 8, y changes from 2 to 4]

Transaction A: reads x = 4

Transaction B: writes 8 to x, 4 to y, aborts A

Transaction A: (zombie) reads y = 4, computes 1/(x-y)

Divide by zero FAIL!

Solution: The Global Clock

• Have one shared global clock

• Incremented by (small subset of) writing transactions

• Read by all transactions

• Used to validate that state worked on is always consistent

Read-Only Transactions

[Figure: memory locations with lock entries holding version numbers 12, 32, 56, 19, 17; shared version clock = 100; private read version (RV) = 100]

Copy version clock to local read version clock

Read lock, version #, and memory

On Commit: check unlocked & version # unchanged

Check that version #s less than local read clock

We have taken a snapshot without keeping an explicit read set!
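A read-only transaction under the global clock can be sketched as below. The names VersionClock, VLoc, ReadOnlyTx, and AbortedException are hypothetical. The sketch validates each read as it happens, so no read set is kept. The slides say "less than" here and "≤ RV" on the commit slides; this sketch assumes "≤ RV".

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shared version clock; 100 mirrors the figure and is otherwise arbitrary.
class VersionClock {
    static final AtomicLong GLOBAL = new AtomicLong(100);
}

// Hypothetical location with a lock bit and a version number.
class VLoc {
    volatile boolean locked = false;
    volatile long version;
    volatile int value;
    VLoc(long version, int value) { this.version = version; this.value = value; }
}

class AbortedException extends RuntimeException { }

class ReadOnlyTx {
    // Copy the shared version clock to a private read version (RV).
    private final long rv = VersionClock.GLOBAL.get();

    int read(VLoc l) {
        long before = l.version;                 // read version #
        int v = l.value;                         // read memory
        // Valid only if unlocked, unchanged while we read, and no newer than RV.
        if (l.locked || l.version != before || before > rv) {
            throw new AbortedException();
        }
        return v;                                // consistent with the snapshot at RV
    }
}
```

Every value returned by read belongs to the state as of RV, so a zombie never sees a mix of old and new values.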


Regular Transactions

[Figure: memory locations with lock entries holding version numbers 12, 32, 56, 19, 17; shared version clock = 100; private read version (RV) = 100]

Copy version clock to local read version clock

On read/write, check: unlocked & version # < RV

Add to R/W set

On Commit

[Figure: memory locations with lock entries holding version numbers 12, 32, 56, 19, 17; shared version clock advances from 100 to 101; private read version (RV) = 100; locations x and y are written]

Acquire write locks

Increment Version Clock

Check version numbers ≤ RV

Update memory

Update write version #s
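The five commit steps can be sketched in Java as follows. GClock, Cell, and ClockTx are hypothetical names, and two simplifications are assumed: only reads are validated against RV (writes are just buffered), and a commit that cannot get a write lock gives up instead of waiting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared version clock; 100 mirrors the figure.
class GClock {
    static final AtomicLong GLOBAL = new AtomicLong(100);
}

// Hypothetical location with its own lock and version number.
class Cell {
    final ReentrantLock lock = new ReentrantLock();
    volatile long version;
    volatile int value;
    Cell(long version, int value) { this.version = version; this.value = value; }
}

class ClockTx {
    private final long rv = GClock.GLOBAL.get();            // private read version
    private final Set<Cell> readSet = new LinkedHashSet<>();
    private final Map<Cell, Integer> writeSet = new LinkedHashMap<>();

    int read(Cell c) {
        Integer pending = writeSet.get(c);
        if (pending != null) return pending;                // our own deferred write
        long before = c.version;
        int v = c.value;
        if (c.lock.isLocked() || c.version != before || before > rv) {
            throw new IllegalStateException("abort: location changed since RV");
        }
        readSet.add(c);
        return v;
    }

    void write(Cell c, int v) { writeSet.put(c, v); }       // deferred update

    boolean commit() {
        List<Cell> held = new ArrayList<>();
        try {
            for (Cell c : writeSet.keySet()) {              // 1. acquire write locks
                if (!c.lock.tryLock()) return false;
                held.add(c);
            }
            long wv = GClock.GLOBAL.incrementAndGet();      // 2. increment version clock
            for (Cell c : readSet) {                        // 3. check version numbers <= RV
                boolean lockedByOther = c.lock.isLocked() && !c.lock.isHeldByCurrentThread();
                if (lockedByOther || c.version > rv) return false;
            }
            for (Map.Entry<Cell, Integer> e : writeSet.entrySet()) {
                e.getKey().value = e.getValue();            // 4. update memory
                e.getKey().version = wv;                    // 5. update write version #s
            }
            return true;
        } finally {
            for (Cell c : held) c.lock.unlock();
        }
    }
}
```

A transaction that started before another one committed fails on its next read of a freshly written location, so it cannot reach the divide-by-zero state from the zombie example.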


TM Design Issues

• Implementation choices

• Language design issues

• Semantic issues


Granularity

• Object
– Managed languages, Java, C#, …
– Easy to control interactions between transactional & non-trans threads
• Word
– C, C++, …
– Hard to control interactions between transactional & non-trans threads

Direct/Deferred Update

• Deferred
– Modify private copies & install on commit
– Commit requires work
– Consistency easier
• Direct
– Modify in place, roll back on abort
– Makes commit efficient
– Consistency harder
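Direct update is usually paired with an undo log. The sketch below is single-threaded and leaves out all locking and conflict detection; Box and DirectUpdateTx are hypothetical names used only to show why commit is cheap and abort does the work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical shared variable.
class Box {
    int value;
    Box(int value) { this.value = value; }
}

// Direct update: write in place and log how to undo each write.
class DirectUpdateTx {
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    void write(Box b, int v) {
        int old = b.value;
        undoLog.push(() -> b.value = old);   // remember how to roll back
        b.value = v;                         // modify in place
    }

    void commit() {
        undoLog.clear();                     // nothing to install: commit is cheap
    }

    void abort() {
        while (!undoLog.isEmpty()) {
            undoLog.pop().run();             // undo newest write first
        }
    }
}
```

Other threads can see the in-place writes before commit, which is why the slide lists consistency as harder for direct update.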

Conflict Detection

• Eager
– Detect before conflict arises
– “Contention manager” module resolves
• Lazy
– Detect on commit/abort
• Mixed
– Eager write/write, lazy read/write …
• Eager detection may abort transactions that could have committed.
• Lazy detection discards more computation.

Contention Management & Scheduling

• How to resolve conflicts?

• Who moves forward and who rolls back?

• Lots of empirical work but formal work in infancy

Contention Manager Strategies

• Exponential backoff
• Priority to
– Oldest?
– Most work?
– Non-waiting?
• None dominates
• But needed anyway (Judgment of Solomon)
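Exponential backoff, the first strategy listed, can be sketched as a small helper that a losing transaction calls before it retries. BackoffManager and its method names are hypothetical, and the millisecond delays are arbitrary illustration values.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical contention manager: the losing transaction waits a random time
// drawn from a window that doubles after every consecutive abort.
class BackoffManager {
    private final int minDelayMs;
    private final int maxDelayMs;
    private int limitMs;

    BackoffManager(int minDelayMs, int maxDelayMs) {
        if (minDelayMs < 1 || maxDelayMs < minDelayMs) {
            throw new IllegalArgumentException("need 1 <= min <= max");
        }
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.limitMs = minDelayMs;
    }

    // Pick a delay uniformly from 1..limitMs, then double the window up to the cap.
    int nextDelayMs() {
        int delay = ThreadLocalRandom.current().nextInt(limitMs) + 1;
        limitMs = (int) Math.min((long) maxDelayMs, 2L * limitMs);
        return delay;
    }

    // Call after losing a conflict, before restarting the transaction.
    void onAbort() throws InterruptedException {
        Thread.sleep(nextDelayMs());
    }

    // Call after a successful commit to shrink the window again.
    void onCommit() {
        limitMs = minDelayMs;
    }
}
```

The randomness keeps two conflicting transactions from retrying in lockstep and colliding again.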

I/O & System Calls?

• Some I/O revocable
– Provide transaction-safe libraries
– Undoable file system/DB calls
• Some not
– Opening cash drawer
– Firing missile

I/O & System Calls

• One solution: make transaction irrevocable
– If transaction tries I/O, switch to irrevocable mode.
• There can be only one …
– Requires serial execution
• No explicit aborts
– In irrevocable transactions

Exceptions

int i = 0;
try {
  atomic {
    i++;
    node = new Node();
  }
} catch (Exception e) {
  print(i);
}

Throws OutOfMemoryException!

What is printed?

Unhandled Exceptions

• Aborts transaction
– Preserves invariants
– Safer
• Commits transaction
– Like locking semantics
– What if exception object refers to values modified in transaction?

Nested Transactions

atomic void foo() {
  bar();
}

atomic void bar() {
  …
}

• Needed for modularity
– Who knew that cosine() contained a transaction?
• Flat nesting
– If child aborts, so does parent
• First-class nesting
– If child aborts, partial rollback of child only

Remember 1993?

Citation Count

TM Today

93,300

2,210,000

Second Opinion

Hatin’ on TM

STM is too inefficient

Requires radical change in programming style

Erlang-style shared nothing only true path to salvation

There is nothing wrong with what we do today.

Gartner Hype Cycle

Hat tip: Jeremy Kemp

I, for one, Welcome our new Multicore Overlords …

• Multicore forces us to rethink almost everything
• Standard approaches too complex
• Standard approaches won’t scale
• Transactions might make life simpler…
• Multicore programming
– Plenty more to do…
– Maybe it will be you…

Thanks! תודה