Portable, mostly-concurrent, mostly-copying GC for multi-processors Tony Hosking Secure Software Systems Lab Purdue University


Page 1: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Portable, mostly-concurrent, mostly-copying GC for multi-processors

Tony Hosking
Secure Software Systems Lab
Purdue University

Page 2: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Platform assumptions

• Symmetric multi-processor (SMP/CMP)

• Multiple mutator threads

• (Large heaps)

Page 3: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Desirable properties

• Maximize throughput

• Minimize collector pauses

• Scalability

Page 4: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Exploiting parallelism

• Avoid contention

• (Mostly-)Concurrent allocation

• (Mostly-)Concurrent collection

Page 5: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Concurrent allocation

• Use thread-private allocation “pages”

• Threads contend for free pages

• Each thread allocates from its own page
  – multiple small objects per page, or
  – multiple pages per large object
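The allocation scheme above can be sketched as follows. This is only an illustration under stated assumptions: the names `Heap`, `ThreadAllocator`, and `PAGE_SIZE`, and the page size itself, are hypothetical and not taken from the talk. The point is that the shared lock is taken once per page (or once per large object), while ordinary small allocations touch only thread-private state.

```python
import threading

PAGE_SIZE = 4096  # hypothetical page size in bytes


class Heap:
    """Shared pool of free pages; the only point of contention."""

    def __init__(self, num_pages):
        self._lock = threading.Lock()
        self._free = list(range(num_pages))

    def take_pages(self, count=1):
        # Threads contend here, but only when they need fresh pages.
        with self._lock:
            if len(self._free) < count:
                raise MemoryError("out of free pages")
            taken = self._free[:count]
            del self._free[:count]
            return taken


class ThreadAllocator:
    """Per-thread bump allocator over a private page (one per thread)."""

    def __init__(self, heap):
        self._heap = heap
        self._page = None
        self._offset = PAGE_SIZE  # forces a page fetch on first use

    def allocate(self, size):
        """Return a (page, offset) pair naming the start of the new object."""
        if size > PAGE_SIZE:
            # Large object: take enough whole pages to hold it.
            count = -(-size // PAGE_SIZE)  # ceiling division
            pages = self._heap.take_pages(count)
            return (pages[0], 0)
        if self._offset + size > PAGE_SIZE:
            # Private page exhausted: fetch a new one from the shared pool.
            self._page = self._heap.take_pages(1)[0]
            self._offset = 0
        address = (self._page, self._offset)
        self._offset += size  # no lock needed: the page is thread-private
        return address
```

Each mutator thread would own one `ThreadAllocator`; two small allocations by the same thread land on the same page at increasing offsets.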

Page 6: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Concurrent collection: The tricolour abstraction

• Black
  – “live”
  – scanned
  – cannot refer to white

• Grey
  – “live” wavefront
  – still to be scanned
  – may refer to any colour

• White
  – hypothetical garbage

Page 7: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Garbage collection

• White = whole heap

• Shade root targets grey

• While grey nonempty
  – Shade one grey object black
  – Shade its white children grey

• At end, white objects are garbage
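The loop above can be written out as a small stop-the-world sketch. The heap representation (a dictionary from object name to a list of child names) and the function name `collect` are assumptions made for illustration only.

```python
def collect(heap, roots):
    """Tricolour marking over a toy heap.

    heap  : dict mapping object name -> list of child object names
    roots : iterable of object names referenced by the roots
    Returns the set of objects still white at the end (the garbage).
    """
    white = set(heap)  # White = whole heap
    grey = []          # the wavefront, still to be scanned
    black = set()      # live and fully scanned

    # Shade root targets grey.
    for root in roots:
        if root in white:
            white.discard(root)
            grey.append(root)

    # While grey is nonempty: blacken one grey object, grey its white children.
    while grey:
        obj = grey.pop()
        black.add(obj)
        for child in heap[obj]:
            if child in white:
                white.discard(child)
                grey.append(child)

    return white  # At end, white objects are garbage
```

For example, with `heap = {"a": ["b"], "b": ["a", "c"], "c": [], "d": ["a"], "e": ["d"]}` and roots `["a"]`, the objects `d` and `e` refer into the live graph but are never reached from a root, so they remain white.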

Page 8: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Copying collection

• Partition white from black by copying

• Reclaim white partition wholesale

• At next GC, “flip” black to white
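A minimal sketch of the copying idea, assuming a toy heap in which addresses are dictionary keys. The function name `copy_collect` and the data layout are hypothetical; a real collector copies bytes between memory regions rather than dictionary entries. Survivors are copied into a fresh to-space, and whatever stays behind in from-space is reclaimed wholesale.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from roots into a new to-space.

    from_space : dict mapping address -> list of child addresses
    roots      : list of addresses in from_space
    Returns (to_space, new_roots), both expressed in to-space addresses.
    """
    to_space = {}
    forwarding = {}  # from-space address -> to-space address
    scan_queue = []  # copied but not yet scanned (the grey objects)

    def forward(addr):
        # Copy on first visit; afterwards return the recorded new address.
        if addr not in forwarding:
            new_addr = len(to_space)
            forwarding[addr] = new_addr
            to_space[new_addr] = list(from_space[addr])  # still old addresses
            scan_queue.append(new_addr)
        return forwarding[addr]

    new_roots = [forward(r) for r in roots]

    # Scan copied objects in order, rewriting their fields to new addresses.
    next_to_scan = 0
    while next_to_scan < len(scan_queue):
        obj = scan_queue[next_to_scan]
        next_to_scan += 1
        to_space[obj] = [forward(child) for child in to_space[obj]]

    return to_space, new_roots
```

After the call, `from_space` can be discarded as a whole; at the next collection the returned `to_space` plays the role of from-space, which is the "flip".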

Page 9: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Mutator threads

Incremental collection

Page 10: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Mutator threads

Concurrent collection

Background GC thread

Page 11: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Concurrent mutators

• Mutation changes reachability during GC

• Loss of black/grey reference is safe
  – Non-white object losing its last reference will be garbage at next GC

• New reference from black to white
  – New reference may make target live
  – Collector may never see new reference

• Mutations may require compensation

Page 12: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Compensation options

• Prevent mutator from creating black-to-white references
  – write barrier on black
  – read barrier on grey to prevent mutator obtaining white refs

• Prevent destruction of any path from a grey object to a white object without telling GC
  – write barrier on grey
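To make the first option concrete, here is a hedged sketch of the lost-object problem and a write barrier on black that repairs it. The classes `Obj` and `Collector` and the function `demo` are hypothetical names for illustration; the barrier shown (shade the target when a black object is written) is one common choice and is not claimed to be the one used in the talk.

```python
from collections import deque

WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self):
        self.colour = WHITE
        self.fields = {}


class Collector:
    def __init__(self, use_barrier):
        self.use_barrier = use_barrier
        self.grey = deque()

    def shade(self, obj):
        # Turn a white object grey and queue it for scanning.
        if obj is not None and obj.colour == WHITE:
            obj.colour = GREY
            self.grey.append(obj)

    def step(self):
        """One collector increment; returns False when no grey objects remain."""
        if not self.grey:
            return False
        obj = self.grey.popleft()
        for child in obj.fields.values():
            self.shade(child)
        obj.colour = BLACK
        return True

    def write(self, src, field, target):
        """Mutator store src.field = target, with a write barrier on black."""
        if self.use_barrier and src.colour == BLACK:
            self.shade(target)  # never leave a black-to-white reference
        src.fields[field] = target


def demo(use_barrier):
    gc = Collector(use_barrier)
    a, b, c = Obj(), Obj(), Obj()
    b.fields["y"] = c        # c is reachable only through b
    gc.shade(a)
    gc.shade(b)
    gc.step()                # a is now black, b still grey, c still white
    gc.write(a, "x", c)      # new black-to-white reference
    gc.write(b, "y", None)   # the only grey-to-white path is destroyed
    while gc.step():
        pass
    return c.colour          # c is live: it is reachable from a
```

Without the barrier the collector finishes with `c` still white even though `a` refers to it; with the barrier `c` is shaded at the store and ends up black.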

Page 13: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Mostly-copying GC [Bartlett]

• Copying collection with ambiguous roots
  – Uncooperative compilers
  – Untidy references
  – Explicit pinning

• Pin ambiguously-referenced objects
  – Shade their page grey without copying

• Assume heap accuracy
  – Copy remaining heap-referenced objects
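A rough sketch of the mostly-copying idea under simplifying assumptions: objects are named by strings, a page is a list of object names, and an ambiguous root is any value that might or might not name an object. The function name and data layout are hypothetical. Pages hit by an ambiguous root are kept in place; everything else that is reachable through accurate heap references is copied.

```python
def mostly_copying_collect(pages, refs, ambiguous_roots):
    """Sketch of mostly-copying collection with page-level pinning.

    pages           : dict page id -> list of object names on that page
    refs            : dict object name -> list of referenced object names
                      (heap references are assumed accurate)
    ambiguous_roots : values that may or may not refer to objects
    Returns (pinned_pages, copied_objects).
    """
    page_of = {obj: page for page, objs in pages.items() for obj in objs}

    # Pin: an ambiguous value that looks like an object keeps its page in place.
    pinned_pages = []
    for value in ambiguous_roots:
        page = page_of.get(value)
        if page is not None and page not in pinned_pages:
            pinned_pages.append(page)

    # Pinned pages are grey: their objects survive uncopied and must be scanned.
    to_scan = [obj for page in pinned_pages for obj in pages[page]]
    survivors = set(to_scan)

    # Heap references are accurate, so their targets can safely be copied.
    copied = []
    while to_scan:
        obj = to_scan.pop()
        for target in refs[obj]:
            if target not in survivors:
                survivors.add(target)
                copied.append(target)  # stands in for moving it to a new page
                to_scan.append(target)

    return pinned_pages, copied
```

Note the conservative cost in this sketch: every object on a pinned page is retained, even one that nothing refers to, because the page as a whole is promoted.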

Page 14: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Incremental MCGC [DeTreville]

• Enforce grey mutator invariant
  – STW greys ambiguously-referenced pages
  – Read barrier on grey using VM page protection

• Read barrier
  – Stop mutator threads
  – Unprotect page
  – Copy white targets to grey
  – Shade page black
  – Restart threads

• Atomic system call wrappers unprotect parameter targets (otherwise traps in OS return error)

Page 15: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Concurrent MCGC?

• Stopping all threads at each increment is prohibitive on SMP & impedes concurrency

• BUT barriers difficult to place on ambiguous references with uncooperative compilers

• ALSO Preemptive scheduling may break wrapper atomicity

Page 16: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Mostly-concurrent MCGC

• Enforce black mutator invariant
  – STW blackens ambiguously-referenced pages

• Read barrier on load of accurate (tidy) grey reference

• Read barrier:
  – Blacken grey references as they are loaded

• No system call wrappers: arguments are always black

Page 17: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Read barrier on load of grey

• Object header bit marks grey objects

• Inline fast path checks grey bit in target header, calls out to slow path if set

• Out-of-line slow path:
  – Lock heap meta-data
  – For each (grey) source object in target page
    – Copy white targets to grey
    – Clear grey header bit
  – Shade target page black
  – Unlock heap meta-data
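The fast path and slow path above might look roughly like this. It is a sketch under assumptions, not the talk's implementation: `Page`, `Obj`, `Heap`, the `forward` field, and the single `copy_page` destination are all hypothetical, and Python objects stand in for headers and memory pages.

```python
import threading

WHITE, GREY, BLACK = "white", "grey", "black"


class Page:
    def __init__(self, colour):
        self.colour = colour
        self.objects = []


class Obj:
    def __init__(self, page, fields=()):
        self.page = page
        self.grey = (page.colour == GREY)  # header bit, set for new grey copies
        self.fields = list(fields)
        self.forward = None                # forwarding pointer once copied
        page.objects.append(self)


class Heap:
    def __init__(self):
        self.meta_lock = threading.Lock()  # guards heap meta-data
        self.copy_page = Page(GREY)        # destination for copied objects

    def _copy(self, obj):
        # Copy a white object into the current grey page, at most once.
        if obj.forward is None:
            obj.forward = Obj(self.copy_page, obj.fields)
        return obj.forward

    def read_barrier(self, ref):
        """Applied to each accurate reference the mutator loads."""
        if ref.grey:                       # inline fast path: one header-bit test
            self._slow_path(ref.page)
        return ref                         # ref is black by the time it is returned

    def _slow_path(self, page):
        with self.meta_lock:               # lock heap meta-data
            if page.colour == BLACK:       # another thread already cleaned it
                return
            if page is self.copy_page:
                # Copies made while cleaning must land on a different grey page.
                self.copy_page = Page(GREY)
            for src in page.objects:       # each (grey) source object in the page
                src.fields = [self._copy(t) if t.page.colour == WHITE else t
                              for t in src.fields]
                src.grey = False           # clear grey header bit
            page.colour = BLACK            # shade target page black
```

Loading a grey reference cleans its whole page: white targets are replaced by grey copies, and a second load of the same reference takes only the fast path.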

Page 18: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Coherence for fast path

• STW phase synchronizes mutators’ views of heap state

• Grey bits are set only in newly-copied objects (i.e., newly-allocated grey pages) since the most recent STW

• Mutators can never see a cleared grey header unless the page is also black

• Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize

Page 19: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Implementation

• Modula-3:
  – gcc-based compiler back-end
  – No tricky target-specific stack-maps
  – Compiler front-end emits barriers
  – M3 threads map to preemptively-scheduled POSIX pthreads

• Stop/start threads: signals + semaphores, or OS primitives if available

• Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF

Page 20: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Experiments

• Parallelized GCOld benchmark to permit throughput measurements for multiple mutators

• Measures steady-state GC throughput

• 2 platforms:
  – 2 x 2.3GHz PowerPC Macintosh Xserve running OS X 10.4.4
  – 8 x 700MHz Intel Pentium 3 SMP running Linux 2.6

Page 21: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Read Barriers: STW

1 user-level mutator thread, work=1

[Figure: elapsed time (s) versus GC ratio (0.1, 0.5, 1, 2, 4, 8); series: Hardware, Software]

Page 22: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Elapsed time (s)

1 system-level mutator thread, work=1

[Figure: elapsed time (s) versus GC ratio (0.1, 0.5, 1, 2, 4, 8); series: STW, INC]

Page 23: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Heap size

1 system-level mutator thread

[Figure: maximum heap (MB) versus GC ratio (0.1, 0.5, 1, 2, 4, 8); series: STW, INC]

Page 24: Portable, mostly-concurrent, mostly-copying GC for multi-processors

BMU

1 system-level mutator thread, work=1000, ratio=1

Page 25: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Scalability

work=1000, ratio=1, 8xP3

[Figure: elapsed time (s) versus number of mutator threads (1 to 16); series: STW, INC]

Page 26: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Java HotSpot server

work=1000, 8xP3

[Figure: elapsed time (s) versus number of mutator threads (1 to 8); series: Serial, Concurrent MS]

Page 27: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Conclusions

• Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof-of-existence)

• Performance is good (scalable)

• Portable: changes only to compiler front-end to introduce barriers, and to GC run-time system

• Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads

Page 28: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Future work

• Convert read barrier to “clean” only target object instead of whole page

Page 29: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Scalability

work=10, ratio=1, 8xP3

[Figure: elapsed time (s) versus number of mutator threads (1 to 16); series: STW, INC]

Page 30: Portable, mostly-concurrent, mostly-copying GC for multi-processors

Java HotSpot server

work=10, 8xP3

[Figure: elapsed time (s) versus number of mutator threads (1 to 8); series: Serial, Concurrent MS]