1 © R. Guerraoui Concurrent Algorithms (Overview) Prof R. Guerraoui Distributed Programming Laboratory


In short

This course is about the principles of robust concurrent computing

Principles

Certain things are incorrect, and it is important to understand why (or at least what correctness means)

Certain things are impossible, and it is important to understand why (at least so as not to try)

WARNING

This course is different from the master course Distributed Algorithms

This course is about shared memory whereas the other one is about message passing systems

It does make a lot of sense to take both


Major chip manufacturers have recently announced what is perceived as a major paradigm shift in computing:

Multiprocessors vs faster processors

Maybe Moore was wrong…

Major chip manufacturers have recently announced a major paradigm shift:

New York Times, 8 May 2004: Intel … [has] decided to focus its development efforts on «dual core» processors … with two engines instead of one, allowing for greater efficiency because the processor workload is essentially shared.


The clock speed of a processor cannot be increased without overheating

But

More and more processors can fit in the same space

Multicores are almost everywhere

Dual-core commonplace in laptops
Quad-core in desktops
Dual quad-core in servers
All major chip manufacturers produce multicore CPUs

SUN Niagara (8 cores, 32 threads)
Intel Xeon (4 cores)
AMD Opteron (4 cores)

AMD Opteron (4 cores)

[Figure: L1 cache, L2 cache, L3 cache (shared)]

SUN’s Niagara CPU2 (8 cores)

Multiprocessors

Multiple hardware processors, each executes a series of processes (software constructs) modeling sequential programs

Multicore architecture: multiple processors are placed on the same chip

Principles of an architecture

Two fundamental components that are kept apart: processors and memory

The interconnect links the processors with the memory:
- SMP (symmetric): bus (a tiny Ethernet)
- NUMA (network): point-to-point network

Cycles

The basic unit of time is the cycle: time to execute an instruction

This changes with technology but the relative cost of instructions (local vs memory) does not

Simple view

[Figure: Memory, Bus, Processor + Cache]

Hardware synchronization objects

The basic unit of communication is the read and write to the memory (through the cache)

More sophisticated objects are sometimes provided and, as we will see, necessary: C&S, T&S, LL/SC

The free ride is over

Cannot rely on CPUs getting faster in every generation

Utilizing more than one CPU core requires concurrency

The free ride is over

One of the biggest future software challenges: exploiting concurrency

Every programmer will have to deal with it
Concurrent programming is hard to get right


Speed will be achieved by having several processors work on independent parts of a task

But

the processors would occasionally need to pause and synchronize

Why synchronize?

But if the task is indeed common, then pure parallelism is usually impossible and, at best, inefficient

[Figure: shared object, concurrent processes]

Counter

public class Counter {

    private long value;

    public Counter(int i) { value = i; }

    // Not synchronized: value++ is a read, an add and a write
    public long getAndIncrement() {
        return value++;
    }

}
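To see why the unsynchronized counter is a problem, the following sketch replays one bad interleaving by hand, in a single thread, so the outcome is deterministic. The class and variable names (`LostUpdateDemo`, `readByP1`, `readByP2`) are hypothetical and only serve the illustration.

```java
// Hypothetical demo: replays, step by step, an interleaving of two
// getAndIncrement() calls in which both processes read before either writes.
class LostUpdateDemo {
    static long value = 0;

    static long replayBadInterleaving() {
        value = 0;
        long readByP1 = value;   // p1 reads 0
        long readByP2 = value;   // p2 reads 0 before p1 has written
        value = readByP1 + 1;    // p1 writes 1
        value = readByP2 + 1;    // p2 writes 1: p1's increment is lost
        return value;
    }

    public static void main(String[] args) {
        System.out.println(replayBadInterleaving()); // prints 1
    }
}
```

Two increments were performed, yet the counter holds 1 and both callers would have received 0.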

Locking (mutual exclusion)

[Figure: locked object]

Locking with compare&swap()

A Compare&Swap object maintains a value x, init to , and y;

It provides one operation: c&s(v,w);

Sequential spec: c&s(old,new)

{y := x; if x = old then x := new; return(y)}
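The sequential specification above can be sketched in Java as follows. `SeqCas` is a hypothetical name, and the initial value 0 is an assumption made for the example only. For comparison, `AtomicInteger.compareAndExchange` (Java 9 and later) also returns the value seen before the operation, which matches this specification more closely than the boolean-returning `compareAndSet`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-threaded transcription of the sequential spec.
class SeqCas {
    private int x;

    SeqCas(int initial) { x = initial; }   // initial value chosen by the caller

    int cas(int oldValue, int newValue) {
        int y = x;                         // y := x
        if (x == oldValue) x = newValue;   // if x = old then x := new
        return y;                          // return(y)
    }
}

class CasSpecDemo {
    public static void main(String[] args) {
        SeqCas s = new SeqCas(0);
        System.out.println(s.cas(0, 5));   // prints 0, and x becomes 5
        System.out.println(s.cas(0, 7));   // prints 5, x is unchanged

        AtomicInteger a = new AtomicInteger(0);
        System.out.println(a.compareAndExchange(0, 5)); // prints 0
        System.out.println(a.compareAndExchange(0, 7)); // prints 5
    }
}
```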

Locking with compare&swap()

lock() {
    repeat until
        unlocked = this.c&s(unlocked,locked)
}

unlock() {
    this.c&s(locked,unlocked)
}
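A minimal Java sketch of a compare&swap spin lock, assuming `AtomicBoolean` as the underlying object (false for unlocked, true for locked). `CasSpinLock` and `CasSpinLockDemo` are hypothetical names. `Thread.onSpinWait()` requires Java 9 or later and is only a hint to the runtime.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical spin lock: false = unlocked, true = locked.
class CasSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Retry until this caller is the one that flips unlocked -> locked.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        locked.compareAndSet(true, false);
    }
}

class CasSpinLockDemo {
    static int shared = 0; // protected by the lock below

    static int run(int threads, int perThread) {
        shared = 0;
        CasSpinLock l = new CasSpinLock();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    l.lock();
                    try { shared++; } finally { l.unlock(); }
                }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return shared;
    }
}
```

With the lock in place, `CasSpinLockDemo.run(4, 10000)` returns 40000, because no increment is lost.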

Locking with test&set()

A test&set object maintains binary values x, init to 0, and y;

It provides one operation: t&s()

Sequential spec: t&s() {y := x; x := 1; return(y);}

Locking with test&set()

lock() {
    repeat until (0 = this.t&s());
}

unlock() {
    this.setState(0);
}

Locking with test&set()

lock() {
    while (true)
    {
        repeat until (0 = this.getState());
        if 0 = (this.t&s()) return(true);
    }
}

unlock() {
    this.setState(0);
}
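Both test&set locks can be sketched in Java by treating `AtomicBoolean.getAndSet(true)` as t&s(). This mapping is an assumption of the example, and the class names and the `isHeld()` helper are hypothetical. The second variant spins on plain reads and only attempts t&s() once the lock looks free, which is the point of the `getState()` loop on the slide.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical test&set lock: getAndSet(true) plays the role of t&s().
class TasLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (state.getAndSet(true)) {   // previous value true: someone holds it
            Thread.onSpinWait();
        }
    }

    void unlock() { state.set(false); }

    boolean isHeld() { return state.get(); }   // helper for the example only
}

// Hypothetical test-and-test&set lock: read first, t&s() only when free.
class TtasLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            while (state.get()) {          // spin on reads only
                Thread.onSpinWait();
            }
            if (!state.getAndSet(true)) {  // lock looked free: try to grab it
                return;
            }
        }
    }

    void unlock() { state.set(false); }

    boolean isHeld() { return state.get(); }   // helper for the example only
}
```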

Explicit use of a lock

Lock l = ...;

l.lock();

try {
    // access the resource protected by this lock
} finally {
    l.unlock();
}
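As one concrete instance of this pattern, here is a small sketch that protects a counter with `java.util.concurrent.locks.ReentrantLock`. The choice of `ReentrantLock` and the class name `LockedCounter` are assumptions for the example; the slide leaves the lock implementation open.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counter guarded by an explicit lock.
class LockedCounter {
    private final Lock l = new ReentrantLock();
    private long value = 0;

    long getAndIncrement() {
        l.lock();
        try {
            return value++;   // the protected resource
        } finally {
            l.unlock();       // released even if the body throws
        }
    }
}
```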

Implicit use of a lock

public class SynchronizedCounter {

    private int c = 0;

    public synchronized void increment() {
        c++;
    }

    // returns the value held before the increment
    public synchronized int getAndIncrement() {
        return c++;
    }

    public synchronized int value() {
        return c;
    }

}


Locking (mutual exclusion)

Difficult: 50% of the bugs reported in Java come from the use of « synchronized »

Fragile: a process holding a lock prevents all others from progressing

[Figure: locked object, one process at a time]

Processes are asynchronous

Page faults
Pre-emptions
Failures
Cache misses, …

Processes are asynchronous

A cache miss can delay a process by ten instructions
A page fault by a few million
An OS preemption by hundreds of millions…


Coarse grained locks => slow

Fine grained locks => errors

Double-ended queue

[Figure: Enqueue, Dequeue]


Processes are asynchronous

Page faults, pre-emptions, failures, cache misses, …

A process can be delayed by millions of instructions …


Alternative to locking?


Wait-free atomic objects

Wait-freedom: every process that invokes an operation eventually returns from the invocation (robust … unlike locking)

Atomicity: every operation appears to execute instantaneously (as if the object was locked…)
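For contrast with the lock-based counters, a counter can also be built directly on a hardware-backed atomic object. The sketch below uses `java.util.concurrent.atomic.AtomicLong`; the names `AtomicCounter` and `AtomicCounterDemo` are hypothetical. No thread ever holds a lock here, so a delayed or crashed thread cannot block the others. Whether each call completes in a bounded number of its own steps depends on how the runtime maps `getAndIncrement` to the hardware (a fetch-and-add instruction versus a compare&swap retry loop), so treat this as an illustration of the goal rather than a proof of wait-freedom.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free counter built on AtomicLong.
class AtomicCounter {
    private final AtomicLong value;

    AtomicCounter(long initial) { value = new AtomicLong(initial); }

    long getAndIncrement() { return value.getAndIncrement(); }

    long get() { return value.get(); }
}

class AtomicCounterDemo {
    static long run(int threads, int perThread) {
        AtomicCounter c = new AtomicCounter(0);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) c.getAndIncrement();
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return c.get();
    }
}
```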

In short

This course shows how to implement high-level atomic objects, in a wait-free manner, out of more primitive base objects

[Figure: shared object, concurrent processes]

This course

Theoretical, but no specific theoretical background is required

Written exam at the end of the semester

Roadmap

Model
- Processes and objects
- Atomicity and wait-freedom

Examples

Content

Processes

We assume a finite set of processes

Processes are denoted by p1, ..., pN or p, q, r

Processes have unique identities and know each other (unless explicitly stated otherwise)

Processes

Processes are sequential units of computation

Unless explicitly stated otherwise, we make no assumption on process (relative) speed

Processes

[Figure: timelines of processes p1, p2, p3]

Processes

A process either executes the algorithm assigned to it or crashes

A process that crashes does not recover (in the context of the considered computation)

A process that does not crash in a given execution (computation or run) is called correct (in that execution)

Processes

[Figure: timelines of processes p1, p2, p3, with a crash]


On objects and processes

Processes execute local computation or access shared objects through their operations

Every operation is expected to return a reply

Processes

[Figure: timelines of p1, p2, p3, each invoking an operation]

On objects and processes

Sequentiality means here that, after invoking an operation op1 on some object O1, a process does not invoke a new operation (on the same or on some other object) until it receives the reply for op1

Remark. Sometimes we talk about operations when we should be talking about operation invocations

Processes

[Figure: timelines of p1, p2, p3, each invoking an operation]

Atomicity

We mainly focus in this course on how to implement atomic objects

Atomicity means that every operation appears to execute at some indivisible point in time (called the linearization point) between its invocation and reply events

Atomicity

[Figures: timelines of p1, p2, p3, each executing an operation]

Atomicity (the crash case)

[Figures: timelines of p1, p2, p3 executing operations; p2 crashes]

Wait-freedom

We mainly focus in this course on wait-free implementations

An implementation is wait-free if any correct process that invokes an operation eventually gets a reply, no matter what happens to the other processes (crash or very slow)

Wait-freedom

[Figure: timelines of p1, p2, p3, with one operation]

Wait-freedom

Wait-freedom conveys the robustness of the implementation

With a wait-free implementation, a process gets replies despite the crash of the N-1 other processes

Note that this precludes implementations based on locks (mutual exclusion)

Wait-freedom

[Figure: timelines of p1, p2, p3, with one operation and two crashes]

Roadmap

Model
- Processes and objects
- Atomicity and wait-freedom

Examples

Content

Motivation

Most synchronization primitives (problems) can be precisely expressed as atomic objects (implementations)

Studying how to ensure robust synchronization boils down to studying wait-free atomic object implementations

Example 1

The reader/writer synchronization problem corresponds to the register object

Basically, the processes need to read or write a shared data structure such that the value read by a process at time t is the last value written before t


Register

A register has two operations: read() and write()

We assume that a register contains an integer for presentation simplicity, i.e., the value stored in the register is an integer, denoted by x (initially 0)

Sequential specification

read()
    return(x)

write(v)
    x <- v;
    return(ok)

Atomicity?

[Figures: timelines of p1, p2, p3 accessing a register, with the following operations]

write(1) - ok, read() - 2, write(2) - ok

write(1) - ok, read() - 1, write(2) - ok

write(1) - ok, read() - 1, read() - 1

write(1) - ok, read() - 1, read() - 0

write(1) - ok, read() - 0, read() - 0
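The question asked on these slides can be made concrete with a brute-force check. Given each operation's invocation and reply times, the sketch below searches for an order of linearization points that respects real time and the sequential specification of the register (initially 0). The class `RegisterHistoryChecker`, its helpers, and every timestamp used with it are hypothetical; the timings in the original figures are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical brute-force atomicity check for small register histories.
class RegisterHistoryChecker {
    static final class Op {
        final boolean isWrite;
        final int value;   // value written, or value returned by the read
        final int inv;     // invocation time
        final int rep;     // reply time
        Op(boolean isWrite, int value, int inv, int rep) {
            this.isWrite = isWrite; this.value = value; this.inv = inv; this.rep = rep;
        }
    }

    static Op write(int v, int inv, int rep) { return new Op(true, v, inv, rep); }
    static Op read(int v, int inv, int rep) { return new Op(false, v, inv, rep); }

    static boolean isAtomic(List<Op> history) {
        return search(new ArrayList<>(history), 0); // register initially 0
    }

    // Try every operation that may be linearized next, then recurse.
    private static boolean search(List<Op> remaining, int x) {
        if (remaining.isEmpty()) return true;
        for (int i = 0; i < remaining.size(); i++) {
            Op c = remaining.get(i);
            if (!mayGoNext(c, remaining)) continue;       // real-time order
            if (!c.isWrite && c.value != x) continue;     // read must return x
            List<Op> rest = new ArrayList<>(remaining);
            rest.remove(i);
            if (search(rest, c.isWrite ? c.value : x)) return true;
        }
        return false;
    }

    // c may go next only if no pending operation replied before c was invoked.
    private static boolean mayGoNext(Op c, List<Op> remaining) {
        for (Op o : remaining) {
            if (o != c && o.rep < c.inv) return false;
        }
        return true;
    }
}
```

For instance, with a write(1) spanning times 0 to 10, a read returning 1 during 2 to 4 followed by a read returning 0 during 5 to 7 is rejected, while the same two reads returning 0 and then 1 are accepted.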


Example 2

The producer/consumer synchronization problem corresponds to the queue object

Producer processes create items that need to be used by consumer processes

An item cannot be consumed by two processes and the first item produced is the first consumed

Queue

A queue has two operations: enqueue() and dequeue()

We assume that a queue internally maintains a list x which exports the operations append(), to put an item at the end of the list, and remove(), to remove an element from the head of the list

Sequential specification

dequeue()
    if (x is empty) then return(nil);
    else return(x.remove())

enqueue(v)
    x.append(v);
    return(ok)
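A direct, single-threaded Java transcription of this sequential specification might look as follows. `SeqQueue` is a hypothetical name, `java.util.ArrayDeque` stands in for the list x, `null` stands in for nil, and the string "ok" stands in for the ok reply.

```java
import java.util.ArrayDeque;

// Hypothetical sequential queue; not safe for concurrent use.
class SeqQueue<T> {
    private final ArrayDeque<T> x = new ArrayDeque<>();

    T dequeue() {
        if (x.isEmpty()) return null;   // nil when the list is empty
        return x.removeFirst();         // remove from the head
    }

    String enqueue(T v) {
        x.addLast(v);                   // append at the end
        return "ok";
    }
}
```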

Atomicity?

[Figures: timelines of p1, p2, p3 accessing a queue, with the following operations]

enq(x) - ok, deq() - y, deq() - x, enq(y) - ok

enq(x) - ok, deq() - y, enq(y) - ok

Roadmap

Model
- Processes and objects
- Atomicity and wait-freedom

Examples

Content


Content

(1) Implementing registers

(2) The power & limitation of registers

(3) Universal objects & synchronization number

(4) The power of time & failure detection

(5) Tolerating failure prone objects

(6) Anonymous implementations

(7) Transactional memory

In short

This course shows how to implement high-level atomic objects, in a wait-free manner, out of basic objects

Remark. Unless explicitly stated otherwise, objects mean atomic objects and implementations are wait-free