
Page 1:

©2009, 2013 Simon Kahan

Competition => Collaboration
a runtime-centric view of parallel computation

Simon Kahan, [email protected]

Page 2:

Competition

Multiple entities in contention for limited, indivisible resources or opportunities

Page 3:

Direct Mitigation Techniques

• Take turns

• Share

• Find more dolls

Page 4:

Direct Mitigation Techniques

• Take turns
  – Mutual exclusion

• Share
  – Transactions

• Find more dolls
  – Replication (e.g., of data structures)

Page 5:

Direct Mitigation Techniques

• Take turns
  – Mutual exclusion
  – Delay is linear in concurrency: does not scale

• Share
  – Transactions
  – Aborted work is up to quadratic in concurrency: does not scale

• Find more dolls
  – Replication (e.g., of data structures)
  – Cost ~ maximum concurrency sustained + coherency overheads: does not scale

(Rough arithmetic behind these claims is sketched below.)
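A back-of-the-envelope reading of these scaling claims, assuming P concurrent threads that each need one critical operation of duration t (the numbers are illustrative, not from the slides):

  Take turns:        the last thread in line waits (P-1)*t, so delay grows linearly with P.
  Share (transact):  attempt i can be aborted and redone after each of the i-1 earlier commits,
                     so total aborted work can reach about t*P*(P-1)/2, quadratic in P.
  Find more dolls:   P replicas, each provisioned for peak use, cost roughly P times the space
                     of one, before counting the work of keeping them coherent.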

Page 6:

Collaboration

Entities align to reduce contention and increase throughput.

Page 7:

Transform competition to collaboration?

Why won’t these people collaborate?!

Page 8:

Are computers better collaborators?

Page 9:

MTA-2 Processor

[Figure: 128 hardware streams (Stream1 … Stream128) feed instructions into the arithmetic pipeline (M A C). Every clock cycle, a ready instruction may begin execution. Instructions include load, store, int_fetch_add, and arithmetic (+ - * /, etc.).]
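int_fetch_add is the MTA's atomic fetch-and-add primitive; it reappears later as the Tree-Heap semaphore. A minimal C sketch of its behavior, using C11 atomic_fetch_add as a stand-in (mta_style_fetch_add is an illustrative name, not an MTA API):

  #include <stdatomic.h>
  #include <stdio.h>

  /* Stand-in for int_fetch_add: atomically return the old value of *target
   * and add incr to it.  Many streams can issue this against the same word
   * without locking; the memory system serializes the updates. */
  static int mta_style_fetch_add(atomic_int *target, int incr)
  {
      return atomic_fetch_add(target, incr);
  }

  int main(void)
  {
      atomic_int counter = 0;
      int a = mta_style_fetch_add(&counter, 1);            /* a == 0 */
      int b = mta_style_fetch_add(&counter, 1);            /* b == 1 */
      printf("%d %d %d\n", a, b, atomic_load(&counter));   /* 0 1 2  */
      return 0;
  }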

Page 10:

Simplified Cray/Tera MTA-2 System Architecture

Be Parallel or Die.

[Figure: multiple CPUs sharing one Memory, with 4,000 active threads in flight.]

Page 11:

Memory Allocation

[Figure: the Application calls malloc() and free() on the Heap; the Heap grows by asking the OS for memory via sbreak().]

Page 12:

Parallel Memory Allocation

[Figure: many Application threads call malloc() and free() concurrently on a single Heap, which still grows through one sbreak() channel to the OS.]
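A minimal sketch of the single-heap picture above: a toy bump allocator behind one lock, with POSIX sbrk() standing in for the slides' sbreak(). toy_malloc, CHUNK, heap_top, and heap_end are illustrative names, not any real allocator's API:

  #define _DEFAULT_SOURCE          /* for sbrk() on glibc */
  #include <pthread.h>
  #include <stddef.h>
  #include <unistd.h>

  #define CHUNK (1 << 20)          /* grow the heap 1 MiB at a time */

  static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
  static char *heap_top, *heap_end;

  void *toy_malloc(size_t size)
  {
      pthread_mutex_lock(&heap_lock);              /* every thread serializes here */
      if (heap_top == NULL || heap_top + size > heap_end) {
          size_t grow = size > CHUNK ? size : CHUNK;
          char *chunk = sbrk(grow);                /* the slides' sbreak() */
          if (chunk == (char *)-1) {
              pthread_mutex_unlock(&heap_lock);
              return NULL;
          }
          heap_top = chunk;                        /* leftover in the old chunk is
                                                      simply abandoned in this toy */
          heap_end = chunk + grow;
      }
      void *p = heap_top;
      heap_top += size;
      pthread_mutex_unlock(&heap_lock);
      return p;                                    /* toy_free() omitted */
  }

Every thread serializes on heap_lock, which is exactly the competition the following slides try to remove.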

Page 13:

Replication for Concurrency

[Figure: the Application keeps several Heaps, each growing through its own sbreak() to the OS, so concurrent malloc()/free() calls spread across them.]

Page 14:

Increase heap size to lower sbreak rate

[Figure: each replicated Heap is grown in larger chunks so it calls sbreak() less often.]

Q: What’s wrong with this picture?  A: O(P²) wasted space!

Page 15:

Can collaboration help?

• Idea: apply the ticket-line trick! (a sketch follows below)
  – tasks need to “find” each other
  – aggregate their requests into one
  – one “master” task continues; the others wait
  – until the master finds the heap uncontended, repeat the process
  – the master locks the heap, fulfills the aggregate request, unlocks the heap
  – the master recursively splits the result and awakens the waiters

• Simon Kahan and Petr Konecny. 2006. "MAMA!": a memory allocator for multithreaded architectures. PPoPP '06.
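A hedged C sketch of this idea, not the MAMA! implementation: a flat batch array stands in for the funnel, and a single malloc() under the lock stands in for the heap operation; the "repeat until the heap is uncontended" refinement is omitted. All names (combining_malloc, Request, MAX_COMBINE) are illustrative.

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdlib.h>

  #define MAX_COMBINE 64

  typedef struct {
      size_t size;            /* bytes this thread asked for           */
      void  *block;           /* filled in by the master               */
      atomic_int done;        /* master sets to 1 once block is valid  */
  } Request;

  static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
  static _Atomic(Request *) batch[MAX_COMBINE];
  static atomic_int batch_count;            /* MAX_COMBINE means "closed" */

  void *combining_malloc(size_t size)
  {
      Request req = { .size = size, .block = NULL, .done = 0 };

      int slot = atomic_fetch_add(&batch_count, 1);
      if (slot >= MAX_COMBINE) {            /* batch closed or full: go solo */
          pthread_mutex_lock(&heap_lock);
          void *p = malloc(size);
          pthread_mutex_unlock(&heap_lock);
          return p;
      }
      atomic_store(&batch[slot], &req);     /* publish the request */

      if (slot != 0) {                      /* waiter: the master will serve us */
          while (!atomic_load(&req.done))
              ;                             /* real code would yield or block   */
          return req.block;
      }

      /* Slot 0 is the master: close the batch and serve it in one shot. */
      int n = atomic_exchange(&batch_count, MAX_COMBINE);
      if (n > MAX_COMBINE)
          n = MAX_COMBINE;

      Request *reqs[MAX_COMBINE];
      size_t total = 0;
      for (int i = 0; i < n; i++) {
          Request *r;
          while ((r = atomic_load(&batch[i])) == NULL)
              ;                             /* wait for slot i to publish     */
          atomic_store(&batch[i], NULL);    /* clear slot for the next round  */
          reqs[i] = r;
          total += r->size;
      }

      pthread_mutex_lock(&heap_lock);
      char *base = malloc(total);           /* ONE heap operation for n requests
                                               (error handling omitted)         */
      pthread_mutex_unlock(&heap_lock);

      for (int i = 0; i < n; i++) {         /* de-aggregate: hand back pieces   */
          reqs[i]->block = base;            /* (a real allocator must track the */
          base += reqs[i]->size;            /*  sub-blocks so they can be freed)*/
          atomic_store(&reqs[i]->done, 1);
      }

      atomic_store(&batch_count, 0);        /* reopen the batch */
      return req.block;
  }

The real scheme replaces the flat array with a combining funnel and the raw malloc() with the tree-heap described on the next slides.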


Page 16:

Combining Funnels

[Figure: concurrent, asynchronous individual malloc and free requests flow down a “funnel” (a combining data structure) of concurrency F; time through the funnel is about lg F. Aggregate requests of size at most F are then served serially, so the output rate is at most a constant.]

See: “Combining Funnels: a Dynamic Approach to Software Combining”, Nir Shavit and Asaph Zemach, 1999.

Page 17:

Aggregates: Pennants for Speed

[Figure: single requests (pennants of order 0) are combined pairwise, and the results combined again, building pennants of increasing order.]

• Merge is 2 ops:
  T2.left = T1.right
  T1.right = T2

• Balanced: unlike linked lists, supports parallel traversal

• Unique representation
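A minimal C sketch of the two-pointer merge above. struct pennant and its payload field are illustrative, and pennant_split is my reconstruction of the “master recursively splits” step from the earlier slide (the inverse of merge), not the MAMA! code:

  #include <stddef.h>

  /* A pennant of order k holds 2^k nodes: a root whose only child (right)
   * is the root of a complete binary tree of 2^k - 1 nodes. */
  typedef struct pennant {
      struct pennant *left, *right;
      size_t request_size;         /* illustrative payload */
  } pennant_t;

  /* Merge two pennants of the same order k into one of order k+1.
   * Exactly the slide's two operations: T2.left = T1.right; T1.right = T2. */
  pennant_t *pennant_merge(pennant_t *t1, pennant_t *t2)
  {
      t2->left  = t1->right;       /* t2 adopts t1's complete subtree       */
      t1->right = t2;              /* t2 becomes t1's single child          */
      return t1;                   /* t1 now roots a pennant of order k+1   */
  }

  /* Inverse of merge: split a pennant of order k+1 into two of order k. */
  pennant_t *pennant_split(pennant_t *t1)
  {
      pennant_t *t2 = t1->right;
      t1->right = t2->left;
      t2->left  = NULL;
      return t2;                   /* caller keeps t1; t2 is the other half */
  }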

Page 18:

Tree-Heap

[Figure: an array of slots (NULL, NULL, …, NULL), one per pennant order, each either empty or holding a pennant.]

  while (int_fetch_add(&sem, 1))
      try_combine();
  heap_op();
  sem = 0;

Allocate tries the corresponding slot; if it is empty, it marches to the right.
Free tries the corresponding slot; if it is full, it combines and carries.
It’s just binary arithmetic! Worst case O(log N); average O(1).
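A slightly expanded, hedged sketch of the loop above, with C11 atomic_fetch_add standing in for int_fetch_add; try_combine() and heap_op() are stubs here, placeholders for re-entering the funnel and applying the aggregate to the tree-heap:

  #include <stdatomic.h>
  #include <sched.h>

  static atomic_int sem;                            /* 0 = tree-heap free, >0 = held */

  static void try_combine(void) { sched_yield(); }  /* placeholder: go combine more  */
  static void heap_op(void)     { }                 /* placeholder: apply aggregate  */

  void serve_aggregate(void)
  {
      /* Losers of the fetch-and-add do not just spin: they go back and combine
       * with other pending requests, so contention turns into larger aggregates. */
      while (atomic_fetch_add(&sem, 1) != 0)
          try_combine();

      heap_op();                                    /* winner applies the aggregate  */
      atomic_store(&sem, 0);                        /* release: semaphore to zero    */
  }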

Page 19:

Instructions vs. Delay

[Chart comparing MAMA! and the original MTA malloc; series labels from the plot: “MAMA! 1 cpu is highest”, “MTA Malloc 10 cpu highest (instructions)”.]

Page 20:

Original MTA malloc vs. MAMA

220 MHz MTA-40, 100 streams per processor

[Chart: performance of the original MTA malloc vs. MAMA (axis ticks 5 and 40 are plot residue).]

Page 21:

General Combining Scheme

Asynchronous (Competitive)
• Arbitrary # of computations
• Any number of threads
• Timing of interaction arbitrary
• Chaos!

Synchronous (Collaborative)
• Single computation
• Number of threads is explicit
• Synchronized, exclusive access to data
• Order!

[Figure: flow around the data structure. Asynchronous threads make requests… combine in the funnel (matching requests may annihilate, or void fn)… the aggregate tries the lock… if that fails, circulate: re-enter the funnel and try again… got the lock! Satisfy the aggregate in parallel, synchronously… release the lock… de-aggregate, returning results in parallel to the requesting threads.]

Page 22:

Conclusion

• Concurrency often creates competition.
• Competition indicates duplication in need.
• Serializing, transacting, and replicating may only mitigate competition.
• Consider transforming competition to collaboration, aligning common need to get there faster.
