68
Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

  • View
    220

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Shared Counters and Parallelism

Companion slides forThe Art of Multiprocessor

Programmingby Maurice Herlihy & Nir Shavit

Page 2: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

2

A Shared Pool

• Put– Inserts object– blocks if full

• Remove– Removes &

returns an object– blocks if empty

public interface Pool { public void put(Object x); public Object remove();}

Unordered set of objects

Page 3: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

3

put

Simple Locking Implementation

put

Page 4: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

4

put

Simple Locking Implementation

put

Problem: hot-spot

contention

Page 5: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

5

put

Simple Locking Implementation

Problem: hot-spot

contention

Problem: Counter is sequential bottleneck

put

Page 6: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

6

put

Simple Locking Implementation

Problem: hot-spot

contention

Problem: sequential bottleneck

putSolution: Queue Lock

Page 7: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

7

put

Simple Locking Implementation

Problem: hot-spot

contention

Problem: sequential bottleneck

putSolution: Queue Lock

Solution???

Page 8: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

8

Counting Implementation

19

20

21

remove

put

19

20

21

Page 9: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

9

Counting Implementation

19

20

21Only the

counters are sequential

remove

put

19

20

21

Page 10: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

10

Shared Counter

3

2

1

012

3

Page 11: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

11

Shared Counter

3

2

1

012

3

No duplication

Page 12: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

12

Shared Counter

3

2

1

012

3

No duplication

No Omission

Page 13: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

13

Shared Counter

3

2

1

012

3

Not necessarily linearizable

No duplication

No Omission

Page 14: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

14

Shared Counters

• Can we build a shared counter with– Low memory contention, and– Real parallelism?

• Locking– Can use queue locks to reduce

contention– No help with parallelism issue …

Page 15: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

15

Software Combining Tree4

Contention:All spinning local

Parallelism:Potential n/log n

speedup

Page 16: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

16

Combining Trees

0

Page 17: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

17

Combining Trees

0

+3

Page 18: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

18

Combining Trees

0

+3 +2

Page 19: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

19

Combining Trees

0

+3 +2Two threads

meet, combine sums

Page 20: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

20

Combining Trees

0

+3 +2Two threads

meet, combine sums

+5

Page 21: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

21

Combining Trees

5

+3 +2

+5

Combined sum added to root

Page 22: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

22

Combining Trees

5

+3 +2

0

Result returned to

children

Page 23: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

23

Combining Trees

5

00

3

0 Results returned to threads

Page 24: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

24

Devil in the Details

• What if– threads don’t arrive at the same time?

• Wait for a partner to show up?– How long to wait?– Waiting times add up …

• Instead– Use multi-phase algorithm– Try to wait in parallel …

Page 25: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

25

Combining Status

enum CStatus{ IDLE, FIRST, SECOND, DONE, ROOT};

Page 26: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

26

Combining Status

enum CStatus{ IDLE, FIRST, SECOND, DONE, ROOT};

Nothing going on

Page 27: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

27

Combining Status

enum CStatus{ IDLE, FIRST, SECOND, DONE, ROOT};

1st thread in search of partner for combining, will return soon to check for 2nd

thread

Page 28: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

28

Combining Status

enum CStatus{ IDLE, FIRST, SECOND, DONE, ROOT};

2nd thread arrived with value for

combining

Page 29: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

29

Combining Status

enum CStatus{ IDLE, FIRST, SECOND, DONE, ROOT};

1st thread has completed operation & deposited

result for 2nd thread

Page 30: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

30

Combining Status

enum CStatus{ IDLE, FIRST, SECOND, DONE, ROOT};

Special case: root node

Page 31: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

31

Node Synchronization

• Short-term– Synchronized methods– Consistency during method call

• Long-term– Boolean locked field– Consistency across calls

Page 32: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

32

Phases

• Precombining– Set up combining rendez-vous

• Combining– Collect and combine operations

• Operation– Hand off to higher thread

• Distribution– Distribute results to waiting threads

Page 33: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

33

Precombining Phase

0

Examine status IDLE

Page 34: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

34

Precombining Phase

0

0If IDLE, promise to return to look

for partner

FIRST

Page 35: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

35

Precombining Phase

0

At ROOT, turn back

FIRST

Page 36: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

36

Precombining Phase

0

FIRST

Page 37: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

37

Precombining Phase

0

0SECOND

If FIRST, I’m willing to

combine, but lock for now

Page 38: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

38

Code

• Tree class– In charge of navigation

• Node class– Combining state– Synchronization state– Bookkeeping

Page 39: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

39

Combining Phase

0

0SECOND

1st thread locked out until 2nd

provides value+3

Page 40: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

40

Combining Phase

0

0SECOND

2nd thread deposits value to be

combined, unlocks node, & waits …

2

+3

zzz

Page 41: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

41

Combining Phase

+3 +2

+5

SECOND2

0

1st thread moves up the tree with

combined value …

zzz

Page 42: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

42

Combining (reloaded)

0

02nd thread has not yet deposited

value …

FIRST

Page 43: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

43

Combining (reloaded)

0

+3

FIRST

1st thread is alone, locks out

late partner

Page 44: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

44

Combining (reloaded)

0

+3

+3

FIRST

Stop at root

Page 45: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

45

Combining (reloaded)

0

+3

+3

FIRST

2nd thread’s phase 1 visit locked out

Page 46: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

46

Operation Phase

5

+3 +2

+5Add combined value to root, start back down

(phase 4)zzz

Page 47: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

47

Operation Phase (reloaded)

5

Leave value to be

combined …SECOND

2

Page 48: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

48

Operation Phase (reloaded)

5

+2

Unlock, and wait …

SECOND2

zzz

Page 49: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

49

Distribution Phase

5

0

zzz

Move down with resultSECOND

Page 50: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

50

Distribution Phase

5

zzz

Leave result for 2nd thread & lock node

SECOND2

Page 51: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

51

Distribution Phase

5

0

zzz

Move result back down

treeSECOND

2

Page 52: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

52

Distribution Phase

5

2nd thread awakens,

unlocks, takes value

IDLE

3

Page 53: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

53

Bad News: High Latency

+2 +3

+5

Log n

Page 54: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

54

Good News: Real Parallelism

+2 +3

+5

2 threads

1 thread

Page 55: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

55

Throughput Puzzles

• Ideal circumstances– All n threads move together, combine– n increments in O(log n) time

• Worst circumstances– All n threads slightly skewed, locked

out– n increments in O(n · log n) time

Page 56: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

56

Index Distribution Benchmark

void indexBench(int iters, int work) { while (int i < iters) { i = r.getAndIncrement(); Thread.sleep(random() % work); }}

Page 57: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

57

Index Distribution Benchmark

void indexBench(int iters, int work) { while (int i < iters) { i = r.getAndIncrement(); Thread.sleep(random() % work); }}

How many iterations

Page 58: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

58

Index Distribution Benchmark

void indexBench(int iters, int work) { while (int i < iters) { i = r.getAndIncrement(); Thread.sleep(random() % work); }}

Expected time between

incrementing counter

Page 59: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

59

Index Distribution Benchmark

void indexBench(int iters, int work) { while (int i < iters) { i = r.getAndIncrement(); Thread.sleep(random() % work); }}

Take a number

Page 60: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

60

Index Distribution Benchmark

void indexBench(int iters, int work) { while (int i < iters) { i = r.getAndIncrement(); Thread.sleep(random() % work); }}

Pretend to work(more work, less

concurrency)

Page 61: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

61

Performance Benchmarks

• Alewife– NUMA architecture– Simulated

• Throughput:– average number of

inc operations in 1 million cycle period.

• Latency:– average number of

simulator cycles per inc operation.

Page 62: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

62

Performance

Latency: Throughput:90000

80000

60000

50000

30000

20000

00 50 100 150 200 250 300

Splock

Ctree[n]

num of processors

cycles per operation

100 150 200 250 300

90000

80000

70000

60000

50000

40000

30000

20000

Splock

Ctree[n]

operationsper millioncycles

50

10000

00

num of processors

work = 0

Page 63: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

63

The Combining Paradigm

• Implements any RMW operation• When tree is loaded

– Takes 2 log n steps– for n requests

• Very sensitive to load fluctuations:– if the arrival rates drop– the combining rates drop– overall performance deteriorates!

Page 64: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

64

Combining Load Sensitivity

Notice Load Fluctuations

Th

roug

hp

ut

processors

Page 65: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

65

Combining Rate vs Work

0

10

20

30

40

50

60

70

1 2 4 8 16 31 48 64

W=100

W=1000

W=5000

Page 66: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

66

Better to Wait Longer

Short wait

Indefinite wait

Medium wait

Th

roug

hp

ut

processors

Page 67: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

67

Conclusions

• Combining Trees– Work well under high contention– Sensitive to load fluctuations– Can be used for getAndMumble() ops

• Next– Counting networks– A different approach …

Page 68: Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming

68

         This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

• You are free:– to Share — to copy, distribute and transmit the work – to Remix — to adapt the work

• Under the following conditions:– Attribution. You must attribute the work to “The Art of

Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).

– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to– http://creativecommons.org/licenses/by-sa/3.0/.

• Any of the above conditions can be waived if you get permission from the copyright holder.

• Nothing in this license impairs or restricts the author's moral rights.