Contention in shared memory multiprocessors
Multiprocessor synchronization algorithms (20225241)
Lecturer: Danny Hendler
• Definitions
• Lower bound for consensus
• Lower bounds for counters, stacks and queues
Contention in shared-memory systems
Contention: the extent to which processes access the same memory locations simultaneously
When multiple processes simultaneously write to the same memory location, they are stalled.
High contention hurts performance!
Memory Stalls & Write-Contention
[Figure: processes p0, p1, p2, …, pj write to the same variable simultaneously; the stall counts shown are 0, 1, 2, …, j.]
Write-contention is the maximum number of processes that can be enabled to perform a write or read-modify-write operation to the same memory location simultaneously.
Recall the consensus implementation we saw…
Decide(v) ; code for pi, i=0,1
  CAS(C, null, v)
  return C
Initially C=null
We use a single object, C, that supports the compare&swap and read operations.
What is the write-contention of this algorithm?
Answer: n. It can be shown that this is the write-contention of any consensus algorithm.
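The Decide procedure above can be tried out with a minimal Python sketch. Python has no hardware compare&swap, so the hypothetical CASRegister class below emulates one with a lock; the names CASRegister, decide and run_consensus are illustrative assumptions, not part of the lecture.

```python
import threading

class CASRegister:
    """Emulated object supporting compare&swap and read.

    Assumption: a lock stands in for the atomicity that hardware CAS provides.
    """
    def __init__(self):
        self._value = None              # initially C = null
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

def decide(C, v):
    C.compare_and_swap(None, v)         # only the first CAS changes C
    return C.read()                     # everyone returns the winner's value

def run_consensus(proposals):
    """Run one thread per proposal (proposals must not be None)."""
    C = CASRegister()
    decisions = [None] * len(proposals)

    def worker(i):
        decisions[i] = decide(C, proposals[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Every thread issues its CAS on the same object C, which is why all n processes can be enabled to write to one location at once.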
What can we say about the worst-case time complexity of objects such as counters,
stacks and queues?
Naïve Counter Implementation
[Figure: processes 1 to 6 each apply FAI to a single FAI object.]
Last processes to succeed incur Θ(n) time complexity!
Can we do much better?
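The Θ(n) cost of the naïve counter can be illustrated with a small sequential simulation. This is a sketch under a simplified model that is an assumption here: the single FAI object serves one pending operation at a time, and an operation incurs one stall for each operation served before it. The function name naive_counter_stalls is hypothetical.

```python
def naive_counter_stalls(n):
    """Simulate n processes applying FAI to one object at the same time.

    Returns (value_returned, stalls) pairs in the order the operations
    succeed. Assumed model: one stall per operation served earlier.
    """
    counter = 0
    results = []
    for served_before in range(n):
        value = counter      # FAI returns the old value...
        counter += 1         # ...and increments the object
        results.append((value, served_before))
    return results
```

In this model the last of n operations is served after n-1 others, so it incurs n-1 stalls.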
We will see a time lower bound of √n on non-blocking implementations of:
counters, stacks, queues…
Any algorithm either (a) suffers high contention or (b) suffers high latency
Capture Influence between processes
[Figure: processes 1 to 6.]
Time complexity is determined by the extent to which operations by different processes influence each other.
Influence-level
[Figure: a shared counter holding 17; process p is about to apply FAI.]
p: "Hmmm… I will soon request a value."
The other processes, which form the influence level (w.r.t. p): "Each of us may precede you and modify the value you will get!"
Modifying Steps
[Figure: a shared counter holding 17; p and q are each about to apply FAI. q takes its step first: the counter becomes 18 and q receives 17.]
p: "Hmmm… I will soon request a value."
The other processes: "Each of us may precede you!"
There’s an atomic step in which q modifies p’s return value.
We bring all the ‘Influencers’ to be on the verge of performing a modifying step
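A modifying step can be made concrete with the counter value 17 used in the figures. The sketch below is sequential and purely illustrative; the class Counter and the function p_return_value are hypothetical names.

```python
class Counter:
    """A sequential stand-in for a shared fetch&increment object."""
    def __init__(self, value):
        self.value = value

    def fetch_and_increment(self):
        old = self.value
        self.value += 1
        return old

def p_return_value(q_steps_first):
    """Return what p's FAI yields, with or without q's step before it."""
    counter = Counter(17)
    if q_steps_first:
        counter.fetch_and_increment()     # q's single atomic modifying step
    return counter.fetch_and_increment()  # p's operation
```

Here p_return_value(False) gives 17 and p_return_value(True) gives 18: one atomic step by q changes p's return value.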
Space/Write-contention tradeoff
• We bring all Influencers to be on the verge of a modifying step
• Each modifying step is necessarily a write/RMW operation
S ≥ I / C
where S is the space complexity, I is the influence-level and C is the write-contention.
Latency/Contention tradeoff
[Figure: a shared counter holding 17; p is about to apply FAI. Marked: the base-objects on which there are outstanding modifying steps.]
p: "Hmmm… I will soon request a value."
Process p can be made to read all these variables in the course of its operation!
L_R ≥ I / C
where L_R is the number of base objects read, I is the influence-level and C is the write-contention.
Time lower bound
L_R · C ≥ I
Time complexity is at least √I
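The step from the latency/contention tradeoff to the time bound can be spelled out as follows. This sketch assumes that the time complexity of an operation is at least max(L_R, C), that is, an operation pays both for the base objects it reads and for the stalls that write-contention causes; that accounting is an assumption here and is not stated on the slide.

```latex
L_R \ge \frac{I}{C}
\;\Longrightarrow\;
L_R \cdot C \ge I
\;\Longrightarrow\;
\max(L_R, C)^2 \ge L_R \cdot C \ge I
\;\Longrightarrow\;
\max(L_R, C) \ge \sqrt{I}
```

For an object whose influence-level is n-1, this gives a bound of √(n-1).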
Influence(n) Objects Class
Definition: The Influence-function, I_O(n), of a generic object O, is defined as follows: I_O(n) = k if the influence-level of any n-process nonblocking implementation of O is at least k.
Definition: Influence(n) is the class of generic objects whose Influence-function is in Ω(n).
Influence(n) includes: stacks, queues, hash-tables, pools, linearizable counters, consensus, approximate-agreement…
Concurrent Counter is in Influence(n)
[Figure: a shared counter holding 17; p is about to apply FAI.]
p: "Hmmm… I will soon request a value."
The other processes: "Each of us may precede you!"
Influence-level is (n-1): every q≠p can influence p
Stack is in Influence(n)
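[Figure: a stack with entries labelled 1, 2, 3, …, n and its top marked; p is about to pop.]
p: "Hmmm… I will soon attempt to pop a value."
The other processes: "Each of us may precede you!"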
Influence-level is (n-1), e.g. if every q≠p has a pending pop operation.
Approximate Agreement is in Influence(n)
[Figure: P1 proposes 0; P2, P3, P4, P5, …, Pn each propose 2ε.]
Influence-level is (n-1)
If p1 runs first, it must return 0. If it is preceded by an execution where some q≠p1 terminates, p1 must return a value no less than ε.
In approximate agreement, each process proposes its value.
• Validity: Each process must decide on a value that is legal (in the range of proposed values).
• Approximate agreement: The values decided by any two processes must be no more than ε apart.
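The two requirements can be written as a small checker. This is a minimal sketch; the function name check_approximate_agreement and the list-based inputs are illustrative assumptions.

```python
def check_approximate_agreement(proposed, decided, eps):
    """Check validity and approximate agreement for one finished run.

    proposed: values proposed by the processes
    decided:  values decided by the processes
    eps:      the allowed distance between any two decisions
    """
    lo, hi = min(proposed), max(proposed)
    validity = all(lo <= d <= hi for d in decided)
    agreement = max(decided) - min(decided) <= eps
    return validity and agreement
```

With ε = 1, one process proposing 0 and the rest proposing 2, a run in which some process decides 2 and another decides 0 fails the check, which matches the argument that p1 must return at least ε once some other process has terminated.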
The First-Generation Problem
• Every process calls a First operation once.
• We say an operation is in the first generation of execution E if it is not preceded in E by any other operation.
• All operations not in the first generation of the execution must return false.
• In quiescence, at least one operation from the first generation must have returned true.
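The specification can be checked mechanically on a finished (quiescent) execution. The sketch below assumes each operation is recorded as a hypothetical (start, end, returned_true) triple and that an operation precedes another if it ends before the other starts; this encoding is an illustration, not the lecture's notation.

```python
def first_generation(ops):
    """Indices of operations not preceded by any other operation.

    ops: list of (start, end, returned_true) triples.
    """
    return {
        i for i, (start, _, _) in enumerate(ops)
        if not any(end < start
                   for j, (_, end, _) in enumerate(ops) if j != i)
    }

def check_first_generation(ops):
    """Check both requirements of the First-Generation problem."""
    gen = first_generation(ops)
    later_return_false = all(not ops[i][2]
                             for i in range(len(ops)) if i not in gen)
    some_first_returns_true = any(ops[i][2] for i in gen)
    return later_return_false and some_first_returns_true
```

For example, with operations (0, 5, True), (1, 3, False) and (4, 6, False), the first two overlap and form the first generation, the third starts after the second ended, and the execution satisfies the specification.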
Lemma
The First-Generation object is in Influence(n), and for this problem our bound is tight.
The bound for Influence(n) is tight
An Optimal Implementation for the First Generation Problem
[Figure: the mark array of n multi-reader multi-writer atomic variables; groups of n processes.]
A linear lower bound on the number of stalls for long-lived objects
The following material is not required
for the exam/assignments.
“Naïve” Counter Implementation
[Figure: processes 1 to 6 each apply FAI to a shared word supporting fetch&inc.]
FAI: Fetch-and-Increment
Last process incurs Θ(n) time complexity!
Can we do better?
Theorem: In any n-process obstruction-free implementation of a counter, the worst-case number of stalls incurred by a process as it performs a fetch&increment operation is at least n-1.
Worst-case stalls number ≥ n-1
Start from an initial state. Fix a process p about to perform a fetch&increment operation.
Consider the path it takes if it runs uninterrupted when only first-accesses to shared words are considered.
[Figure: p's path through shared words 1, 2, 3, 4.]
Let O1 be the first word along p's path that is written by some other process in any p-free execution.
There must be such a word.
[Figure: p's path through shared words 1, 2, 3, 4, with O1 marked.]
Let E1 be an execution that maximizes the number of processes that are about to write to O1 over all p-free executions. Let G1 be this set of processes, with |G1| = K1.
If K1 = n-1 then we are done.
Otherwise, we show that p must access yet another word that may be written by other processes.
What happens if p incurs the stalls on O1?
But now the rest of the path may change....
Assume p gets value v.
[Figure: p reaches O1 and incurs the K1 stalls; the words on the rest of its path may then differ.]
v: the value returned by p if we let it run and incur the stalls
c: the number of fetch&increment operations completed before p starts its operation
We have: v ∈ {c, …, c+K1}
[Figure: timeline; c fetch&inc operations complete before p starts, K1 further fetch&inc operations are pending, and p returns v.]
We select some process q ∉ G1 ∪ {p}.
We let q perform K1+1 fetch&increment operations.
q must write to a word read by p after O1.
[Figure: timeline; q completes K1+1 fetch&inc operations, bringing the count to c+K1+1, before p runs; p must then return v' > v.]
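The counting behind this step can be checked with a few lines of Python. The sketch assumes the usual behaviour of a linearizable counter, namely that p returns c plus the number of operations ordered before it; the function names are hypothetical.

```python
def values_p_may_return(c, k1):
    """Values p's FAI may return when c operations completed before it
    started and k1 processes are poised to write (assumed model)."""
    return {c + ordered_before_p for ordered_before_p in range(k1 + 1)}

def least_value_after_q(c, k1):
    """Least value p may return once q has completed k1 + 1 further
    fetch&increment operations before p starts."""
    return c + k1 + 1
```

Because least_value_after_q(c, k1) lies outside values_p_may_return(c, k1), p has to learn about q's operations, so q must write to some word that p reads after O1.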
Let O2 be the first word accessed by p, after it incurs the K1 stalls, that is written by some process not in G1 ∪ {p}.
Let E2 be an execution that maximizes the number of processes that are about to write to O2 over all (G1 ∪ {p})-free executions.
Continuing with this construction we get:
[Figure: p's path meets the words O1, O2, …, Om, with groups of pending writers |G1| = K1, |G2| = K2, …, |Gm| = Km.]
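The way the groups add up to the bound of the theorem can be sketched as follows. Two assumptions are made here that the slides only suggest: the groups are pairwise disjoint, since each new execution is free of p and of all earlier groups, and the construction stops only once every process other than p belongs to some group.

```latex
\text{stalls}(p) \;\ge\; \sum_{i=1}^{m} K_i
\;=\; \Bigl|\, \bigcup_{i=1}^{m} G_i \,\Bigr|
\;=\; n-1
```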
Conclusion: "Naïve" implementation is best possible!
[Figure: processes 1 to 6 each apply FAI to a single FAI object.]