50
Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan

Eliminating Read Barriers through Procrastination and Cleanliness

  • Upload
    siran

  • View
    33

  • Download
    0

Embed Size (px)

DESCRIPTION

Eliminating Read Barriers through Procrastination and Cleanliness. KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan. Big Picture. Lightweight user-level threads. Lots of concurrency!. Scheduler 1. t1. t2. tn. Core 1. Core 2. Core n. Heap. Big Picture. Big Picture. - PowerPoint PPT Presentation

Citation preview

Page 1: Eliminating Read Barriers through Procrastination and Cleanliness

Eliminating Read Barriers through Procrastination and Cleanliness

KC SivaramakrishnanLukasz Ziarek

Suresh Jagannathan

Page 2: Eliminating Read Barriers through Procrastination and Cleanliness

2

Big PictureLightweight user-level threads

Scheduler 1t1 t2 tn Lots of

concurrency!

Core 1 Core nCore 2

Heap

Page 3: Eliminating Read Barriers through Procrastination and Cleanliness

3

Big Picture

Expendable resource?

Big Picture

3

Scheduler 1t1 t2 tn Lots of

concurrency!

Heap

Page 4: Eliminating Read Barriers through Procrastination and Cleanliness

4

Big Picture

Expendable resource?

Big Picture

4

Scheduler 1

t1

t2 tn Lots of concurrency!

Heap

Exploit program concurrency to

eliminate read barriers from thread-local collectors

GC Operation

Alleviate MM cost?

Page 5: Eliminating Read Barriers through Procrastination and Cleanliness

5

MultiMLton• Goals– Safety, Scalability, ready for future manycore processors

• Parallel extension of MLton SML compiler and runtime

• Parallel extension of Concurrent ML– Lots of Concurrency!– Interact by sending messages over first-class channels

C

send (c, v)

v recv (c)

Page 6: Eliminating Read Barriers through Procrastination and Cleanliness

6

MultiMLton GC: Considerations• Standard ML – functional PL with side-effects– Most objects are small and ephemeral

• Independent generational GC– # Mutations << # Reads

• Keep cost of reads to be low

• Minimize NUMA effects• Run on non-cache coherent HW

Page 7: Eliminating Read Barriers through Procrastination and Cleanliness

7

MultiMLton GC: Design

Core

Local Heap

Core

Local Heap

Core

Local Heap

Core

Local Heap

Shared Heap

Thread-local GC

• NUMA Awareness• Circumvent cache-coherence issues

Page 8: Eliminating Read Barriers through Procrastination and Cleanliness

8

Invariant Preservation• Read and write barriers for preserving

invariants

Shared Heap

r

Local Heap

x

Target

Source

Exporting writes

r := x

Shared Heap

r

Local Heap

x

FWD

Transitive closure of x

Mutator needs read

barriers!

Page 9: Eliminating Read Barriers through Procrastination and Cleanliness

9

Challenge• Object reads are pervasive– RB overhead cost (RB) * frequency (RB)∝

• Read barrier optimization– Stacks and Registers never point to forwarded objects

20.1 %15.3 %21.3 %

Mean Overhead----------------------

Read

bar

rier o

verh

ead

(%)

Page 10: Eliminating Read Barriers through Procrastination and Cleanliness

10

Mutator and Forwarded Objects

# RB invocations

# Encountered forwarded objects

< 0.00001

Eliminate read barriers altogether

Page 11: Eliminating Read Barriers through Procrastination and Cleanliness

11

RB Elimination• Visibility Invariant– Mutator does not encounter forwarded objects

• Observation– No forwarded objects created visibility invariant ⇒

No read barriers⇒• Exploit concurrency Procrastination!

Page 12: Eliminating Read Barriers through Procrastination and Cleanliness

12

ProcrastinationShared Heap

r1

Local Heap

x1

T1 T2

r2

x2

r1 := x1 r2 := x2

T T is running

T T is suspended

T T is blocked

Page 13: Eliminating Read Barriers through Procrastination and Cleanliness

13

ProcrastinationShared Heap

r1

Local Heap

x1

r1 := x1

T1 T2

r2 := x2 r2

x2

Delayed write list

Control switches

to T2

T T is running

T T is suspended

T T is blocked

Page 14: Eliminating Read Barriers through Procrastination and Cleanliness

14

ProcrastinationShared Heap

r1

Local Heap

x1

T1 T2

r2

x2

Delayed write list

r1 := x1 r2 := x2

T T is running

T T is suspended

T T is blocked

Page 15: Eliminating Read Barriers through Procrastination and Cleanliness

15

ProcrastinationShared Heap

r1

Local Heap

T1 T2

r2 x2

Delayed write list

r1 := x1 r2 := x2x1

T T is running

T T is suspended

T T is blocked

FWD FWD

Page 16: Eliminating Read Barriers through Procrastination and Cleanliness

16

ProcrastinationShared Heap

r1

Local Heap

T1 T2

r2 x2

Delayed write list

x1

Force local GC

T T is running

T T is suspended

T T is blocked

r1 := x1 r2 := x2

Page 17: Eliminating Read Barriers through Procrastination and Cleanliness

17

Correctness• Does Procrastination introduce deadlocks?– Threads can be procrastinated while holding a lock!

T1 T2T2T T is running

T T is suspended

T T is blocked

Page 18: Eliminating Read Barriers through Procrastination and Cleanliness

18

Correctness

T1

• Is Procrastination safe?– Yes. Forcing a local GC unblocks the threads.– No deadlocks or livelocks!

T2T T is running

T T is suspended

T T is blocked

• Does Procrastination introduce deadlocks?– Threads can be procrastinated while holding a lock!

Page 19: Eliminating Read Barriers through Procrastination and Cleanliness

19

Correctness

T1 T2

• Does Procrastination introduce deadlocks?– Threads can be procrastinated while holding a lock!

T T is running

T T is suspended

T T is blocked• Is Procrastination safe?– Yes. Forcing a local GC unblocks the threads.– No deadlocks or livelocks!

Page 20: Eliminating Read Barriers through Procrastination and Cleanliness

20

• Efficacy (Procrastination) ∝ # Available runnable threads

Is Procrastination alone enough?

M

W1W1 W1

F

J

Serial (low thread availability)

Concurrent (high thread availability)

• With Procrastination, half of local major GCs were forced

Eager exporting writes while preserving visibility invariant

Page 21: Eliminating Read Barriers through Procrastination and Cleanliness

21

Cleanliness• A clean object closure can be lifted to the

shared heap without breaking the visibility invariant

r := xinSharedHeap (r)

inLocalHeap (x)&&

isClean (x)

Eager write (no Procrastination)

Page 22: Eliminating Read Barriers through Procrastination and Cleanliness

22

Cleanliness: Intuition

Shared Heap

Local Heap

x

lift (x) to shared heap

Page 23: Eliminating Read Barriers through Procrastination and Cleanliness

23

Shared Heap

Local Heap

Cleanliness: Intuition

x

FWD

find all references to FWD

Page 24: Eliminating Read Barriers through Procrastination and Cleanliness

24

Shared Heap

Local Heap

Cleanliness: Intuition

x Need to scan the entire local heap

Page 25: Eliminating Read Barriers through Procrastination and Cleanliness

25

Local Heap

h

Shared Heap

Cleanliness: Simpler question

x

FWD

Do all references originate from heap region h?

sizeof (h) << sizeof (local heap)

Page 26: Eliminating Read Barriers through Procrastination and Cleanliness

26

Local Heap

h

Shared Heap

Cleanliness: Simpler question

x Only scan theheap region h.

Heap session!

sizeof (h) << sizeof (local heap)

Page 27: Eliminating Read Barriers through Procrastination and Cleanliness

27

• Current session closed & new session opened– After an exporting write, a user-level context switch, a local

GC

Heap Sessions• Source of an exporting write is often– Young– rarely referenced from outside the closure

Previous Session Current Session FreeLocal Heap

SessionStart Frontier

Young Objects

Old Objects Start

Page 28: Eliminating Read Barriers through Procrastination and Cleanliness

28

• Current session closed & new session opened– After an exporting write, a user-level context switch, a local

GC– SessionStart is moved to Frontier

Heap Sessions• Source of an exporting write is often– Young– rarely referenced from outside the closure

• Average session size < 4KB

Previous Session FreeLocal Heap

Frontier & SessionStartStart

Page 29: Eliminating Read Barriers through Procrastination and Cleanliness

29

Cleanliness: Eager exporting writes• A clean object closure

– is fully contained within the current session– has no references from previous session

Previous SessionCurrent Session

FreeLocal Heap

X

Y Z

r := x

rShared Heap

Page 30: Eliminating Read Barriers through Procrastination and Cleanliness

30

Cleanliness: Eager exporting writes• A clean object closure

– is fully contained within the current session– has no references from previous session

Previous SessionCurrent Session

FreeLocal Heap

X

Y Z

r := x

rShared Heap

Walk and fix

FWD

Page 31: Eliminating Read Barriers through Procrastination and Cleanliness

31

Avoid tracing current session?• Many SML objects are tree-structured (List, Tree, etc,.)

– Specialize for no pointers from outside the closure• ∀x’ transitive closure (x), ∊

ref_count (x) = 0 && ref_count (x’) = 1

Local Heap

x(0)

y(1)

z(1)

• Eager exporting write– No current session tracing needed!

No refs from

outside

– ref_count does not consider pointers from stack or registers

Page 32: Eliminating Read Barriers through Procrastination and Cleanliness

32

Cleanliness: Reference Count

Current Session

X(0)

Current Session

X(1)

Current Session

X(LM)

Current Session

X(G)

PrevSess

Zero One LocalMany Global

• Purpose– Track pointers from previous session to current session– Identify tree-structured object

• Does not track pointers from stack and registers– Reference count only triggered during object initialization and

mutation

Page 33: Eliminating Read Barriers through Procrastination and Cleanliness

33

Bringing it all together• ∀x’ transitive closure (x), ∊

if max (ref_count (x’))– One & ref_count (x) = 0 Clean tree-structured ⇒ ⇒

Session tracing not needed– LocalMany Clean ⇒ ⇒ Trace current session– Global 1+ pointer from previous session ⇒ ⇒

Procrastinate

Page 34: Eliminating Read Barriers through Procrastination and Cleanliness

34

Cleanliness: Tree-structured Closure

Previous Session Current Session

x(0)

y(1)

z(1)T1Local Heap

Shared heap

r := x

r

current stack

Page 35: Eliminating Read Barriers through Procrastination and Cleanliness

35

Shared heap

Cleanliness: Tree-structured Closure

Previous Session Current Session

x

y

z

T1current stackLocal Heap

r := x

FWD

r

Walk current

stack

Page 36: Eliminating Read Barriers through Procrastination and Cleanliness

36

Shared heap

Cleanliness: Tree-structured Closure

Previous Session Current Session

x

y

z

T1current stackLocal Heap

r := x

r

No need to walk current

session!

Page 37: Eliminating Read Barriers through Procrastination and Cleanliness

37

Shared heap

Cleanliness: Tree-structured Closure

Previous Session Current Session

x

y

z

T1Local Heap

r := x

r

T2 Nextstack

FWDcurrent stack

Page 38: Eliminating Read Barriers through Procrastination and Cleanliness

38

Shared heap

Cleanliness: Tree-structured Closure

Previous Session Current Session

x

y

z

T1previousstackLocal Heap

r := x

r

T2 currentstack

Context Switch

Walk target stack

Page 39: Eliminating Read Barriers through Procrastination and Cleanliness

39

Cleanliness: Object graph

Previous Session Current Session

x(0)

y(LM)

z(1)current

stackLocal Heap

Shared heap

r := x

r

a

Page 40: Eliminating Read Barriers through Procrastination and Cleanliness

40

Shared heap

Cleanliness: Object graph

Previous Session Current Session

x

y

z

current stackLocal Heap

r := x

r

a

FWD

FWD

Walk current

stack

Walk current session

Page 41: Eliminating Read Barriers through Procrastination and Cleanliness

41

Shared heap

Cleanliness: Object graph

Previous Session Current Session

x

y

z

current stackLocal Heap

r := x

r

a

Walk current

stack

Walk current session

Page 42: Eliminating Read Barriers through Procrastination and Cleanliness

42

Cleanliness: Global Reference

Previous Session Current Session

x(0)

y(1)

z(G)T1 current

stackLocal Heap

Shared heap

r := x

r

a

Page 43: Eliminating Read Barriers through Procrastination and Cleanliness

43

Cleanliness: Global Reference

Previous Session Current Session

x(0)

y(1)

z(G)T1 current

stackLocal Heap

Shared heap

r := x

r

a

Procrastinate

Page 44: Eliminating Read Barriers through Procrastination and Cleanliness

44

Immutable Objects• Specialize exporting writes• If immutable object in previous session– Copy to shared heap• Immutable objects in SML do not have identity

– Original object unmodified• Avoid space leaks– Treat large immutable objects as mutable

Page 45: Eliminating Read Barriers through Procrastination and Cleanliness

45

Cleanliness: Summary• Cleanliness allows eager exporting writes

while preserving visibility invariant• With Procrastination + Cleanliness, <1% of

local GCs were forced• Additional niceties– Completely dynamic Portable– Does not impose any restriction on the GC

strategy

Page 46: Eliminating Read Barriers through Procrastination and Cleanliness

46

Evaluation• Variants– RB- : TLC with Procrastination and Cleanliness – RB+ : TLC with read barriers

• Sansom’s dual-mode GC– Cheney’s 2-space copying collection Jonker’s sliding

mark-compacting– Generational, 2 generations, No aging

• Target Architectures: – 16-core AMD Opteron server (NUMA)– 48-core Intel SCC (non-cache coherent)– 864-core Azul Vega3

Page 47: Eliminating Read Barriers through Procrastination and Cleanliness

47

Results• Speedup: At 3X min heap size, RB- faster than

RB+– AMD 32% (2X faster than STW collector)– SCC 20%– AZUL 30%

• Concurrency– During exporting write, 8 runnable user-level

threads/core!

Page 48: Eliminating Read Barriers through Procrastination and Cleanliness

Cleanliness Impact• RB- MU- : RB- GC ignoring mutability for Cleanliness• RB- CL- : RB- GC ignoring Cleanliness (Only Procrastination)

48

Avg. slowdown--------------------

11.4%28.2%31.7%

Page 49: Eliminating Read Barriers through Procrastination and Cleanliness

49

Conclusion• Eliminate the need for read barriers by

preserving the visibility invariant– Procrastination: Exploit concurrency for delaying

exporting writes– Cleanliness: Exploit generational property for

eagerly perform exporting writes

Page 50: Eliminating Read Barriers through Procrastination and Cleanliness

50

Questions?

http://multimlton.cs.purdue.edu