Contention in shared memory multiprocessors
Multiprocessor synchronization algorithms (20225241)
Lecturer: Danny Hendler

• Definitions
• Lower bound for consensus
• Lower bounds for counters, stacks and queues



Contention in shared-memory systems

Contention: the extent to which processes access the same memory locations simultaneously

When multiple processes simultaneously write to the same memory location, they are stalled

High contention hurts performance!


Memory Stalls & Write-Contention

[Figure: processes p0, p1, p2, …, pj access the same variable simultaneously and incur 0, 1, 2, …, j stalls, respectively.]

Write-contention is the maximum number of processes that can be enabled to perform a write or read-modify-write operation to the same memory location simultaneously.
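The two notions on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `write_contention` and `stalls` are hypothetical, and the first function inspects a single configuration, whereas the definition above takes the maximum over everything the algorithm can reach.

```python
from collections import Counter

def write_contention(enabled_writes):
    """Largest number of processes enabled to write (or apply a
    read-modify-write operation) to one location in this configuration.

    `enabled_writes` maps a process name to the location it is about to modify.
    """
    per_location = Counter(enabled_writes.values())
    return max(per_location.values(), default=0)

def stalls(order):
    """Processes in `order` hit the same location simultaneously and are
    served one at a time: the i-th process served incurs i stalls."""
    return {process: i for i, process in enumerate(order)}

# Hypothetical configuration: three processes are poised on x, one on y.
config = {"p0": "x", "p1": "x", "p2": "x", "p3": "y"}
contention_here = write_contention(config)
stalls_on_x = stalls(["p0", "p1", "p2"])
```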


Recall the consensus implementation we saw…

Decide(v) ; code for pi, i=0,1
1. CAS(C, null, v)
2. return C

Initially C=null

We use a single object, C, that supports the compare&swap and read operations.

What is the write-contention of this algorithm?

Answer: n. It can be shown that this is the write-contention of any consensus algorithm.
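A minimal Python sketch of the Decide(v) code above, for experimentation only. It assumes a hypothetical `CASObject` class in which a lock stands in for the atomicity that hardware compare&swap provides; the thread count and the proposed values are likewise illustrative.

```python
import threading

class CASObject:
    """Simulated object supporting compare&swap and read.
    The lock only stands in for hardware atomicity (an assumption)."""

    def __init__(self):
        self._value = None              # initially C = null
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

def decide(c, v):
    c.compare_and_swap(None, v)         # 1. CAS(C, null, v)
    return c.read()                     # 2. return C

def run(n):
    """Let n threads call decide concurrently and collect their decisions."""
    c = CASObject()
    decisions = [None] * n

    def worker(i):
        decisions[i] = decide(c, f"v{i}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions

decisions = run(8)
```

All n threads may be poised on the CAS to the single object C at the same moment, which is why the write-contention of this algorithm is n.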


What can we say about the worst-case time complexity of objects such as counters, stacks and queues?


Naïve Counter Implementation

[Figure: six processes apply FAI to a single FAI object simultaneously and obtain the values 1 through 6.]

Last processes to succeed incur Θ(n) time complexity!

Can we do much better?
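The cost of the naïve counter can be illustrated with a small Python sketch. It assumes the stall model described earlier (simultaneous accesses to one word are served one at a time); the function name `naive_counter_round` and the starting value 1 are hypothetical choices made to match the figure.

```python
def naive_counter_round(n, start=1):
    """n processes apply FAI to one shared word at the same time.

    Returns a list of (value obtained, stalls incurred) pairs in service
    order: the i-th process served waits for the i processes before it.
    """
    value = start
    results = []
    for stalls in range(n):
        results.append((value, stalls))   # FAI returns the current value...
        value += 1                        # ...and increments the word
    return results

round_of_6 = naive_counter_round(6)
worst_stalls = max(stalls for _, stalls in round_of_6)
```

With six processes the last one served incurs five stalls, which is the Θ(n) behaviour the slide points at.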


We will see a time lower bound of Ω(√n) on non-blocking implementations of counters, stacks, queues…

Any algorithm either (a) suffers high contention or (b) suffers high latency


Capture Influence between processes

[Figure: six processes, numbered 1 through 6.]

Time complexity is determined by the extent to which operations by different processes influence each other.


Influence-level

[Figure: a shared counter holding 17. Process p: "Hmmm… I will soon request a value." The other processes, each about to apply FAI: "Each of us may precede you and modify the value you will get!" They are labeled "Influence level (w.r.t. p)".]

Modifying Steps

[Figure (animation): a shared counter holding 17. Process p: "Hmmm… I will soon request a value." The other processes: "Each of us may precede you!" Process q applies FAI: q obtains 17 and the counter becomes 18.]

There’s an atomic step in which q modifies p’s return value.

We bring all the ‘Influencers’ to be on the verge of performing a modifying step.
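The notion of a modifying step can be made concrete with a short Python sketch. It assumes a sequential model of the shared counter from the figure (initial value 17); the class and function names are hypothetical.

```python
class SharedCounter:
    """Sequential model of a counter supporting fetch-and-increment."""

    def __init__(self, value):
        self.value = value

    def fetch_and_increment(self):
        old = self.value
        self.value += 1
        return old

def p_return_value(q_steps_first):
    """Value p obtains, depending on whether q's FAI is scheduled first."""
    counter = SharedCounter(17)
    if q_steps_first:
        counter.fetch_and_increment()     # q's modifying step
    return counter.fetch_and_increment()  # p's operation

alone = p_return_value(q_steps_first=False)
after_q = p_return_value(q_steps_first=True)
```

The single atomic FAI by q is what changes p's return value from 17 to 18, so that step is a modifying step with respect to p.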


Space/Write-contention tradeoff

• We bring all Influencers to be on the verge of a modifying step

• Each modifying step is necessarily a write/RMW operation

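S ≥ I / C

where S is the space complexity, I is the influence-level, and C is the write-contention. At most C of the I influencers can be on the verge of a modifying step on the same base object, so their modifying steps are spread over at least I / C base objects.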
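The counting argument behind the tradeoff is a pigeonhole bound, sketched below in Python. The function name `min_base_objects` and the sample numbers are illustrative assumptions, not values from the lecture.

```python
def min_base_objects(influence_level, write_contention):
    """Fewest base objects that can hold `influence_level` pending modifying
    steps when at most `write_contention` of them may target one object."""
    if write_contention < 1:
        raise ValueError("write-contention must be at least 1")
    # Ceiling division using integers only.
    return -(-influence_level // write_contention)

# Hypothetical numbers: 15 influencers.
few_writers_per_object = min_base_objects(15, 4)
single_hot_spot = min_base_objects(15, 15)
```

Lowering the write-contention forces more base objects, and allowing a single hot spot brings the space down to one object.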


Latency/Contention tradeoff

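[Figure: a shared counter holding 17, with FAI steps outstanding on several base objects. Process p: "Hmmm… I will soon request a value."]

Base-objects on which there are outstanding modifying steps: process p can be made to read all these variables in the course of its operation!

L_R ≥ I / C

where L_R is the number of base objects read by p, I is the influence-level, and C is the write-contention.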


Time lower bound

L_R · C ≥ I

Time complexity is at least √I
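One way to read the step from the latency/contention tradeoff to the time bound is the short derivation below. It assumes, as an interpretation rather than a statement from the slides, that the time complexity of an operation is at least the larger of L_R (base objects read) and C (stalls caused by write-contention).

```latex
\begin{align*}
L_R \ge \frac{I}{C}
  &\implies L_R \cdot C \ge I \\
  &\implies \max(L_R, C)^2 \ge L_R \cdot C \ge I \\
  &\implies \max(L_R, C) \ge \sqrt{I}.
\end{align*}
```

For an object whose influence-level is n-1, this gives a bound of √(n-1), matching the √n bound announced earlier.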


Influence(n) Objects Class

Definition: The Influence-function, I_O(n), of a generic object O, is defined as follows: I_O(n) = k, if the influence-level of any n-process nonblocking implementation of O is at least k.

Definition: Influence(n) is the class of generic objects whose Influence-function is in Ω(n).

Influence(n) includes: stacks, queues, hash-tables, pools, linearizable counters, consensus, approximate-agreement…


Concurrent Counter is in Influence(n)

[Figure: a shared counter holding 17. Process p: "Hmmm… I will soon request a value." The other processes, each about to apply FAI: "Each of us may precede you!"]

Influence-level is (n-1): every q≠p can influence p.


Stack is in Influence(n)

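[Figure: a stack holding the values 1, 2, 3, …, n, with the top of the stack marked. Process p: "Hmmm… I will soon attempt to pop a value." The other processes: "Each of us may precede you!"]

Influence-level is (n-1), e.g. if every q≠p has a pending pop operation.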


Approximate Agreement is in Influence(n)

In approximate agreement, each process proposes its value.

• Validity: Each process must decide on a value that is legal (in the range of proposed values).

• Approximate agreement: The values decided by any two processes must be no more than ε apart.

[Figure: p1 proposes 0; each of p2, p3, p4, p5, …, pn proposes 2ε.]

Influence-level is (n-1): if p1 runs first, it must return 0. If it is preceded by an execution where some q≠p1 terminates, p1 must return a value no less than ε.
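The two requirements can be checked mechanically. The Python sketch below is an assumption-laden illustration: the function name `is_legal_outcome` is hypothetical, and ε is set to 1.0 so that the proposals 0 and 2ε from the figure become 0.0 and 2.0.

```python
def is_legal_outcome(proposals, decisions, eps):
    """Check validity and approximate agreement for one finished execution."""
    lo, hi = min(proposals), max(proposals)
    validity = all(lo <= d <= hi for d in decisions)
    agreement = max(decisions) - min(decisions) <= eps
    return validity and agreement

eps = 1.0
proposals = [0.0, 2.0, 2.0, 2.0]          # p1 proposes 0, the rest 2*eps

# p1 ran first and returned 0, so everyone else must stay within eps of 0.
p1_first = is_legal_outcome(proposals, [0.0, 1.0, 1.0, 1.0], eps)

# Some q ran alone first and returned 2*eps; p1 returning 0 is then illegal.
q_first_p1_returns_0 = is_legal_outcome(proposals, [0.0, 2.0, 2.0, 2.0], eps)

# p1 returning eps (or more) restores agreement.
q_first_p1_returns_eps = is_legal_outcome(proposals, [1.0, 2.0, 2.0, 2.0], eps)
```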


The bound for Influence(n) is tight

The First-Generation Problem

• Every process calls a First operation once.

• We say an operation is in the first generation of execution E if it is not preceded in E by any other operation.

• All operations not in the first generation of the execution must return false.

• In quiescence, at least one operation from the first generation must have returned true.

Lemma: The First-Generation object is in Influence(n), and for this problem our bound is tight.
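The specification of the First-Generation problem can be written as a small checker. This is a sketch of the specification only, not of an implementation; it assumes each operation is given as a hypothetical `(start, end, result)` triple with distinct time stamps, and that operation A precedes operation B when A ends before B starts.

```python
def first_generation(ops):
    """For each (start, end, result) operation, tell whether it is in the
    first generation, i.e. no other operation ended before it started."""
    earliest_end = min(end for _, end, _ in ops)
    return [start < earliest_end for start, _, _ in ops]

def satisfies_spec(ops):
    """Check a complete (quiescent) execution against the two requirements."""
    flags = first_generation(ops)
    later_ops_return_false = all(
        not result for (_, _, result), first in zip(ops, flags) if not first
    )
    some_first_returns_true = any(
        result for (_, _, result), first in zip(ops, flags) if first
    )
    return later_ops_return_false and some_first_returns_true

# Two overlapping first-generation operations, then a late one.
good = [(0, 3, True), (1, 4, False), (5, 6, False)]
# The late operation returns true and no first-generation operation does.
bad = [(0, 3, False), (1, 4, False), (5, 6, True)]
good_ok = satisfies_spec(good)
bad_ok = satisfies_spec(bad)
```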


An Optimal Implementation for the First Generation Problem

[Figure: the processes are partitioned into groups of √n processes; the mark array consists of √n multi-reader multi-writer atomic variables.]


A linear lower bound on the number of stalls for long-lived objects

The following material is not required for the exam/assignments.


“Naïve” Counter Implementation

[Figure: six processes simultaneously apply FAI (Fetch-and-Increment) to a shared word supporting fetch&inc and obtain the values 1 through 6.]

Last process incurs Θ(n) time complexity!

Can we do better?


Theorem: In any n-process obstruction-free implementation of a counter, the worst-case number of stalls incurred by a process as it performs a fetch&increment operation is at least n-1.

Worst-case stalls number ≥ n-1

Start from an initial state. Fix a process p about to perform a fetch&increment operation.

Consider the path it takes if it runs uninterrupted when only first-accesses to shared words are considered.

[Figure (animation): p's path visits shared words 1, 2, 3, 4 in turn.]

Let O1 be the first word along p's path that is written by some other process in any p-free execution. There must be such a word.

Let E1 be an execution that maximizes the number of processes that are about to write to O1 over all p-free executions. Let G1 be the set of these processes, |G1| = K1.

If K1 = n-1 then we are done. Otherwise, we show that p must access yet another word that may be written by other processes.

What happens if p incurs the stalls on O1?

[Figure (animation): p proceeds along its path to O1 and incurs the stalls caused by the processes of G1; the words it accesses after O1 may differ from those on its original path.]

But now the rest of the path may change.

Assume p gets value v.

v: the value returned by p if we let it run and incur the stalls
c: the number of fetch&increment operations completed before p starts its operation

We have: v ∈ {c, …, c+K1}

[Figure: timeline. c fetch&inc operations complete before p's operation starts; the K1 fetch&inc operations of the processes in G1 overlap it, and p returns v.]

We select some process q ∉ G1 ∪ {p}.

We let q perform K1+1 fetch&increment operations.

[Figure: timeline. c+K1+1 fetch&inc operations now complete before p's operation, so p must return a value v' > v.]

q must write to a word read by p after O1.
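The arithmetic behind this step can be checked with a short Python sketch. It assumes a linearizable counter whose fetch&increment returns the number of operations linearized before it; the helper names and the sample values of c and K1 are hypothetical.

```python
def values_with_stalls(c, k1):
    """Values p may return when c operations completed before it started
    and up to k1 operations (those of G1) are concurrent with it."""
    return set(range(c, c + k1 + 1))

def least_value_after_q(c, k1):
    """Least value p may return once q has completed k1 + 1 further
    operations before p's operation starts."""
    return c + k1 + 1

c, k1 = 10, 3
before = values_with_stalls(c, k1)
after_least = least_value_after_q(c, k1)
```

Since every value p could return before q's operations is smaller than every value it may return afterwards, p has to observe q somewhere, which is why q must write to a word that p reads after O1.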

Let O2 be the first word accessed by p, after it incurs the K1 stalls, that is written by some process not in G1 ∪ {p}.

Let E2 be an execution that maximizes the number of processes that are about to write to O2 over all (G1 ∪ {p})-free executions.

Continuing with this construction we get words O1, O2, …, Om and groups G1, G2, …, Gm with |G1| = K1, |G2| = K2, …, |Gm| = Km. Process p incurs K1 + K2 + … + Km stalls in total, and the construction continues until this sum reaches n-1.

Conclusion: “Naïve” implementation is best possible!

[Figure: six processes apply FAI to a single FAI object and obtain the values 1 through 6.]