Fence Complexity in Concurrent Algorithms

Petr Kuznetsov
TU Berlin/DT-Labs


STM is about ease-of-programming and efficiency

What is “efficient” in a concurrent system?


Cost metrics

Space: used memory
  • Cheap
  • Advanced garbage collection

Time:
  • the number of reads and writes (per operation)
  • the number of stalls


Relaxed memory models

  • Memory is much slower than the CPU
  • Read: check the cache -> read the memory
  • Write: invalidate the caches -> update the memory
  • To overcome “stalled writes”: reorder operations

Reordering may result in inconsistency


What is inconsistency?

Process P:
  Write(X,1)
  Read(Y)

Process Q:
  Write(Y,1)
  Read(X)

[Figure: timelines for P (W(X,1), R(Y)) and Q (W(Y,1), R(X)), with W(X,1) drawn a second time]


Possible outcomes

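For P:
  • P reads before Q writes
  • P reads after Q writes

For Q:
  • Q reads after P writes
  • Q reads before P writes

Out-of-order: P reads before Q writes and Q reads before P writes, so both reads return the initial value. No in-order interleaving of the two processes produces this.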
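The outcome list above can be checked by brute force. The following is a minimal sketch, not part of the original slides: it enumerates every interleaving of P and Q, first with each process in program order and then with P's read moved ahead of its own write, which is how the reordering is modelled here. The names `interleavings` and `outcomes` are hypothetical helpers, and X and Y are assumed to start at 0.

```python
# Each process is a list of operations in program order.
P = [("write", "X"), ("read", "Y")]
Q = [("write", "Y"), ("read", "X")]


def interleavings(p, q):
    """Yield every merge of p and q that keeps each list's internal order."""
    if not p or not q:
        yield list(p) + list(q)
        return
    for rest in interleavings(p[1:], q):
        yield [p[0]] + rest
    for rest in interleavings(p, q[1:]):
        yield [q[0]] + rest


def outcomes(p, q):
    """Return the set of (value P reads from Y, value Q reads from X)."""
    results = set()
    tagged_p = [("P",) + op for op in p]
    tagged_q = [("Q",) + op for op in q]
    for schedule in interleavings(tagged_p, tagged_q):
        memory = {"X": 0, "Y": 0}   # assumed initial values
        seen = {}
        for proc, kind, var in schedule:
            if kind == "write":
                memory[var] = 1
            else:
                seen[proc] = memory[var]
        results.add((seen["P"], seen["Q"]))
    return results


in_order = outcomes(P, Q)
# Model the hardware moving P's read ahead of P's own write.
reordered = outcomes(list(reversed(P)), Q)

print(sorted(in_order))    # [(0, 1), (1, 0), (1, 1)]
print(sorted(reordered))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The pair (0, 0) shows up only once the read is allowed to overtake the write, which is the inconsistent outcome.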


Fixing out-of-order

Memory fences: read-after-write (RAW)

  write(X,1)
  fence() // enforce the order
  read(Y)

[Figure: timelines for P (W(X,1), R(Y)) and Q (W(Y,1), R(X))]
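To see why the fence helps, here is a small exhaustive simulation. It is a sketch under stated assumptions, not the talk's own model. Each process gets a FIFO store buffer, so a write becomes visible to the other process only when the buffer drains, and `fence` is modelled as "cannot complete until this process's buffer is empty". The function name `explore` and the tuple encoding of operations are hypothetical. All writes store 1 and memory starts at 0.

```python
def explore(programs):
    """Return all (P's read, Q's read) results on a two-process store-buffer machine."""
    # State: program counters, store buffers, variables already in memory, read results.
    start = ((0, 0), ((), ()), frozenset(), (None, None))
    stack, visited, results = [start], {start}, set()
    while stack:
        pcs, buffers, written, reads = stack.pop()
        successors = []
        for i in (0, 1):
            if buffers[i]:
                # Hardware step: the oldest buffered write of process i reaches memory.
                drained = tuple(b[1:] if j == i else b for j, b in enumerate(buffers))
                successors.append((pcs, drained, written | {buffers[i][0]}, reads))
            if pcs[i] == len(programs[i]):
                continue  # process i has finished its program
            op = programs[i][pcs[i]]
            if op[0] == "fence" and buffers[i]:
                continue  # a fence cannot complete until the buffer is empty
            new_pcs = tuple(pc + (j == i) for j, pc in enumerate(pcs))
            new_buffers, new_reads = buffers, reads
            if op[0] == "write":
                new_buffers = tuple(b + (op[1],) if j == i else b
                                    for j, b in enumerate(buffers))
            elif op[0] == "read":
                # A read sees the process's own buffered write, else shared memory.
                value = int(op[1] in buffers[i] or op[1] in written)
                new_reads = tuple(value if j == i else r for j, r in enumerate(reads))
            successors.append((new_pcs, new_buffers, written, new_reads))
        if not successors:
            results.add(reads)  # both programs done and both buffers drained
        for state in successors:
            if state not in visited:
                visited.add(state)
                stack.append(state)
    return results


no_fence = ((("write", "X"), ("read", "Y")),
            (("write", "Y"), ("read", "X")))
fenced = ((("write", "X"), ("fence",), ("read", "Y")),
          (("write", "Y"), ("fence",), ("read", "X")))

without_fence = explore(no_fence)
with_fence = explore(fenced)
print(sorted(without_fence))  # includes (0, 0)
print(sorted(with_fence))     # [(0, 1), (1, 0), (1, 1)]
```

In this model the fence forces each write into memory before the following read, so the (0, 0) outcome disappears.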


Fixing out-of-order

Atomic operations: atomic-write-after-read (AWAR)

  atomic {
    read(Y)
    write(X,1)
  }

E.g., CAS, TAS, Fetch&Add, …

RAW/AWAR fences take ~60 RMRs
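The primitives named above all share the AWAR shape: a read and a dependent write that happen as one indivisible step. Python exposes no hardware CAS, so the sketch below uses a `threading.Lock` as a stand-in for that atomicity. It illustrates the semantics only and says nothing about cost. `AtomicCell` and its method names are hypothetical.

```python
import threading


class AtomicCell:
    """Toy shared register; a lock stands in for hardware atomicity (assumption)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Read the value, then write only if it still equals `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def test_and_set(self):
        # Read the old value, then write 1.
        with self._lock:
            old = self._value
            self._value = 1
            return old

    def fetch_and_add(self, delta):
        # Read the old value, then write old + delta.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def increment_with_cas(cell, times):
    """Classic CAS retry loop: reread and retry whenever another thread got in first."""
    for _ in range(times):
        while True:
            seen = cell.load()
            if cell.compare_and_swap(seen, seen + 1):
                break


counter = AtomicCell()
threads = [threading.Thread(target=increment_with_cas, args=(counter, 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.load()
print(total)  # 4000: no increment is lost
```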


Our result


Any concurrent program in a certain class must use RAW/AWARs


What programs?

Concurrent data types: queues, counters, hash tables, trees, …
  • Non-commutative operations
  • Linearizable solo-terminating implementations

Mutual exclusion


Non-commutative operations

Operation A is non-commutative if there exists an operation B such that, applied to some state:

  • A influences B, and
  • B influences A

Here A influences B if applying A first changes the response of B.


Example: Queue

  • enq(v) – add v to the end of the queue
  • deq() – dequeue the item at the head of the queue

Q=1;2

Q.deq():1; Q.deq():2 vs. Q.deq():2; Q.deq():1
deq() operations influence each other

Q.enq(3):ok; Q.deq():1 vs. Q.deq():1; Q.enq(3):ok
enq() is commutative
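The queue example can be replayed mechanically. The sketch below is illustrative only. It treats "A influences B" as "running A first changes what B returns", and checks that on a sequential queue built from `collections.deque`. The helper `influences` and the wrapper `enq3` are hypothetical names.

```python
from collections import deque


def enq(q, v):
    """Add v to the end of the queue."""
    q.append(v)
    return "ok"


def deq(q):
    """Remove and return the item at the head, or None if the queue is empty."""
    return q.popleft() if q else None


def influences(first, second, state):
    """True if running `first` beforehand changes what `second` returns."""
    alone = deque(state)
    result_alone = second(alone)
    after = deque(state)
    first(after)
    return second(after) != result_alone


def enq3(q):
    return enq(q, 3)


deq_pair = influences(deq, deq, [1, 2])      # deq():1 alone vs. deq():2 after a deq
enq_on_deq = influences(enq3, deq, [1, 2])   # deq() returns 1 either way
deq_on_enq = influences(deq, enq3, [1, 2])   # enq(3) returns ok either way

print(deq_pair, enq_on_deq, deq_on_enq)  # True False False
```

Only the deq()/deq() pair influences each other in both directions on Q=1;2, which is what makes deq() non-commutative in the sense used here.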


Proof sketch

A non-commutative operation must write. Suppose not:

[Figure: starting from queue 1;2, two deq() operations both return 1; w marks the write]

there must be a write!


Proof sketch

Let w be the first write. Suppose there are no AWARs.

[Figure: deq():1 applied to queue 1;2, with the write w marked]

A(w) – the longest atomic construct containing w

w must be the first base-object event in A(w)!


Proof sketch

Suppose there are no RAWs.

[Figure: starting from queue 1;2, two deq() operations both return 1, with A(w) marked]

No RAW – no difference for deq()!


Mutual exclusion

  • Lock() – acquire the lock
  • Unlock() – release the lock
  • (Mutex) No two processes hold the lock at the same time
  • (Deadlock-freedom) If at least one process executes Lock() and no active process fails, at least one process acquires the lock

Two Lock() operations influence each other!
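As a concrete instance, the sketch below checks the Mutex property for Peterson's two-process lock. That algorithm is a choice made for this example, not one taken from the slides. The check explores every interleaving exhaustively under sequentially consistent memory. In the model, Lock() first writes `flag[i]` and `turn` and only then reads `flag[j]`, which is the read-after-write pattern a RAW fence would have to protect on relaxed hardware. The function name and the program-counter encoding are hypothetical.

```python
def peterson_reachable():
    """Explore every interleaving of two processes running Peterson's lock once."""
    start = ((0, 0), (0, 0), 0)  # program counters, flags, turn
    stack, visited = [start], {start}
    while stack:
        pcs, flags, turn = stack.pop()
        for i in (0, 1):
            j = 1 - i
            pc, new_flags, new_turn = pcs[i], list(flags), turn
            if pc == 0:            # Lock(): write flag[i] = 1
                new_flags[i], pc = 1, 1
            elif pc == 1:          # Lock(): write turn = j
                new_turn, pc = j, 2
            elif pc == 2:          # Lock(): read flag[j]; enter if it is 0
                pc = 3 if flags[j] else 4
            elif pc == 3:          # Lock(): read turn; retry while it equals j
                pc = 2 if turn == j else 4
            elif pc == 4:          # holding the lock; Unlock(): write flag[i] = 0
                new_flags[i], pc = 0, 5
            else:
                continue           # process i has finished
            new_pcs = tuple(pc if k == i else pcs[k] for k in (0, 1))
            state = (new_pcs, tuple(new_flags), new_turn)
            if state not in visited:
                visited.add(state)
                stack.append(state)
    return visited


states = peterson_reachable()
# pc == 4 means "holds the lock"; Mutex forbids both processes being there at once.
mutex_holds = all(pcs != (4, 4) for pcs, _, _ in states)
both_finish = any(pcs == (5, 5) for pcs, _, _ in states)
print(mutex_holds, both_finish)  # True True
```

The model assumes every write is immediately visible. It makes no claim about what happens once the write to `flag[i]` can be delayed past the read of `flag[j]`.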


Our result


In any implementation of mutual exclusion or a concurrent data type with a non-commutative operation op, a complete execution of op or Lock() contains a RAW or AWAR

Every successful lock acquire incurs a RAW/AWAR fence


Why do we care?

Hardware design: what primitives must be optimized?

API design: returned values matter
  • Set with add returning fail vs. returning ok

Verification: early catch of obviously incorrect algorithms


What’s next?

Weaker primitives?
  • Idempotent Work Stealing [Michael et al., PPoPP’09]

Tight lower bounds?
  • How many RAW/AWAR fences are incurred?

Other patterns
  • Read-after-read
  • Write-after-write
  • Multi-RAW:
      write(Xi,1)
      collect(X1,..,Xn)


References

H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. Michael, M. Vechev. Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated. In POPL 2011.

Srivatsan’s talk on STM fence complexity, TR on the way


QUESTIONS?