Reducing Pause Time of Conservative Collectors


Toshio Endo (National Institute of Informatics)

Kenjiro Taura (Univ. of Tokyo)

Incremental GC for soft-realtime applications [Steele 75] [Yuasa 90] [Doligez 93]

Target: multimedia, games, etc.
– Pauses should be <10ms

Collection tasks are divided into small pieces

Success: pauses of <5ms [Cheng 01]
– They assume compiler cooperation

Reduction of pause time for ‘conservative’ GCs is insufficient

Conservative GC [Boehm et al. 88]

Mark-sweep GC for C/C++ programs

No compiler cooperation (e.g., write barriers)

Mostly parallel GC [Boehm et al. 91]
– Incremental, conservative
– Pauses >100ms fairly common

Write barriers in conservative GCs

No fine-grain write barrier by compiler

VM’s write protection
– Coarse grain: page level
– Detects only the first update after protection

This restricts the collector design
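The page-level behaviour above can be sketched with a small simulation. This is a toy model, not the collector's code: the class name, the 4096-byte page size, and the counters are all assumptions made for illustration. It shows that only the first store to a protected page is observed.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


class PageWriteBarrier:
    """Toy model of a VM-based write barrier (all names are hypothetical)."""

    def __init__(self):
        self.protected = set()  # page numbers currently write-protected
        self.dirty = set()      # page numbers known to have been updated
        self.faults = 0         # number of stores the collector observed

    def protect(self, pages):
        """Write-protect the given pages, as done at the start of a GC cycle."""
        self.protected |= set(pages)

    def write(self, addr):
        """Simulate a mutator store to byte address `addr`."""
        page = addr // PAGE_SIZE
        if page in self.protected:
            # First store after protection: the trap handler runs,
            # records the page as dirty, and removes the protection.
            self.faults += 1
            self.dirty.add(page)
            self.protected.discard(page)
        # Later stores to the same page are not trapped at all.


barrier = PageWriteBarrier()
barrier.protect(range(4))
for addr in (100, 200, 300, 5000):
    barrier.write(addr)
print(barrier.faults, sorted(barrier.dirty))
```

Three of the four stores land on page 0, yet the collector sees a single fault for that page, so it cannot tell which words changed.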

Incremental mark sweep algorithms

Snapshot at beginning & DLG [Yuasa 90] [Doligez 93]
– Make (conceptual) heap snapshot before marking
– Promise short pause
– Large space overhead with VM write barrier

Incremental update [Steele 75] [Dijkstra 78]
– Maintain consistency after marking
– Need final marking before finish: can be unboundedly long!
– Only choice with VM

Contributions

Analyze why previous algorithms fail

Propose techniques to bound pauses & guarantee progress

Show a ‘stress-test’ benchmark: iukiller

Demonstrate experimental results
– <5ms in applications
– <12ms in the stress-test benchmark (constant across all heap sizes)

(This talk omits parallel issues)

Overview of presentation

Mostly parallel GC

Techniques to reduce pause time

Experimental results

Related work

Summary

Mostly parallel garbage collector (1)

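1. Start GC

2. Write-protect heap

3. Incremental mark, interleaved with the user program
– On a write fault, the trap handler remembers the address of the dirty (= updated) page and unprotects it

4. Final marking

5. Incremental sweep, interleaved with the user program

6. End GC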

Mostly parallel garbage collector (2)

Second update is un-trapped
– Mark r in final phase
– Need final marking

[Figure: three snapshots of objects p, q and r as the writer updates them]

Final marking

1. Scan all dirty pages + root

2. Mark all unmarked objects from scanned region

The amount of work is unbounded
– # of dirty pages
– Objects reachable from a dirty page

Makes pauses >100ms

[Figure: heap and root]
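A minimal sketch of the two steps, assuming a toy heap where objects are integers, `heap` maps each object to the objects it references, and `page_of` maps each object to a page number. These names, and the choice to rescan only already-marked objects on dirty pages, are assumptions of the sketch. The returned count stands in for pause length.

```python
def final_marking(heap, page_of, roots, dirty_pages, marked):
    """Stop-the-world final marking for a toy heap (hypothetical model).

    heap:        dict mapping object id -> list of referenced object ids
    page_of:     dict mapping object id -> page number
    roots:       iterable of object ids referenced from the root set
    dirty_pages: set of page numbers updated during incremental marking
    marked:      set of already-marked object ids (updated in place)
    Returns the number of objects newly marked, a rough proxy for the pause.
    """
    # Step 1: scan the roots and every marked object on a dirty page.
    stack = list(roots)
    for obj in marked:
        if page_of[obj] in dirty_pages:
            stack.extend(heap[obj])
    work = 0
    # Step 2: mark everything still unmarked that is reachable from there.
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        work += 1
        stack.extend(heap[obj])
    return work


def chain_heap(n):
    """Object 0 points to a chain 1 -> 2 -> ... -> n, one object per page."""
    heap = {i: [i + 1] for i in range(n)}
    heap[n] = []
    page_of = {i: i for i in range(n + 1)}
    return heap, page_of


for n in (10, 1000):
    heap, page_of = chain_heap(n)
    work = final_marking(heap, page_of, roots=[0], dirty_pages={0}, marked={0})
    print(n, work)
```

In both runs a single page is dirty, but the work grows with the length of the chain hanging off it, which is the second source of unboundedness listed above.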



Goal of our collector

Bound pause time (< constant)
– Mutator utilization is important, but we focus on pause time

Guarantee progress of collection

Combine two techniques:
– Bound dirty pages (BD)
– Retry incremental marking (RI)

Bounding dirty pages (1)

Basic collector produces many dirty pages

Keep # of dirty pages < a given limit
– If it exceeds the limit, choose a dirty page
– Re-protect, scan, clean it
– Good: Reduces work in final marking
– Bad: More protection cost
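The BD rule can be sketched as a small extension of a page-level trap handler. The slide does not say how the dirty page is chosen, so the oldest-first (FIFO) policy here is an assumption, as are the class and callback names.

```python
from collections import deque


class BoundedDirtyPages:
    """Toy model of the BD technique; names and FIFO policy are assumptions."""

    def __init__(self, limit, scan_page):
        self.limit = limit          # the number of dirty pages is kept below this
        self.scan_page = scan_page  # callback that scans (and so cleans) one page
        self.protected = set()
        self.dirty = deque()        # oldest dirty page first

    def protect(self, pages):
        self.protected |= set(pages)

    def write(self, page):
        """Simulate a mutator store that lands on `page`."""
        if page not in self.protected:
            return  # untrapped write, nothing to do
        # Trap handler: remember the dirty page and unprotect it.
        self.protected.discard(page)
        self.dirty.append(page)
        # BD: once the limit is reached, clean dirty pages until below it.
        while len(self.dirty) >= self.limit:
            victim = self.dirty.popleft()
            self.protected.add(victim)  # re-protect first, so later writes trap
            self.scan_page(victim)      # then scan; the page is clean again


scanned = []
bd = BoundedDirtyPages(limit=4, scan_page=scanned.append)
bd.protect(range(10))
for page in range(10):
    bd.write(page)
    assert len(bd.dirty) < bd.limit  # the bound holds after every write
print(list(bd.dirty), scanned)
```

Re-protecting before scanning means a store that races with the scan faults again instead of being missed. The extra faults on re-protected pages are the "more protection cost" noted above.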

Bounding dirty pages (2)

Is pause now bounded?

… No! Unmarked objects reachable from a dirty page are not bounded

[Figure: heap and root]

Retrying incremental marking (1)

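1. Start GC

2. Write-protect heap

3. Incremental mark, interleaved with the user program and trap handler

4. Final marking: finished before limit?
– Yes: continue with step 5
– No: retry! Go back to step 3

5. Incremental sweep, interleaved with the user program

6. End GC

Keep the work of each final marking < a given limit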

Retrying incremental marking (2)

Good: Bounds length of a single final marking

Bad: Risk of starvation (no progress)
– Final marking may abort before it finishes scanning the (unbounded) dirty pages
– Unmarked objects may ‘escape’ from the collector
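The retry loop can be sketched as follows, assuming a toy heap of integer objects and a mark stack that is handed back and forth between the two phases. The function names and the work limits are hypothetical. The mutator is not modelled, so this shows the bounded pause but not the starvation case, which needs a mutator that keeps dirtying pages between attempts.

```python
def mark_some(heap, marked, stack, budget):
    """Mark at most `budget` objects from `stack`; return True if it drained."""
    done = 0
    while stack and done < budget:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        done += 1
        stack.extend(heap[obj])
    return not stack


def collect_with_retry(heap, marked, pending, final_limit, incr_step=8):
    """Run final marking under a work limit, retrying incremental marking.

    pending: mark stack for final marking (e.g. from dirty pages and roots).
    Returns the amount of work done in each stop-the-world attempt.
    """
    pauses = []
    while True:
        before = len(marked)
        finished = mark_some(heap, marked, pending, final_limit)  # world stopped
        pauses.append(len(marked) - before)
        if finished:
            return pauses  # go on to incremental sweep
        # Retry: resume the mutator and keep marking in small increments.
        while not mark_some(heap, marked, pending, incr_step):
            pass  # the mutator would run between these steps


# A chain of 1000 unmarked objects reachable from the marked object 0.
heap = {i: [i + 1] for i in range(1000)}
heap[1000] = []
marked = {0}
pauses = collect_with_retry(heap, marked, pending=[1], final_limit=50)
print(pauses, len(marked))
```

The first final marking is cut off after 50 objects, the rest of the chain is marked incrementally, and the second final marking finds nothing left to do.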

The worst case

Abort a final marking with no progress

[Figure: timeline repeating the sequence: incr. marking finishes, write, final marking aborts]

Ensuring bounded pause and progress

Either alone is insufficient… Need both techniques:
– Bounding dirty pages (BD)
– Retrying incremental marking (RI)

BD → Every final marking can scan all dirty pages → It finds some unmarked objects, if any



Experimental Environments

400MHz UltraSPARC, Solaris 8

Four GCs
– Stop: Stop-the-world GC
– Basic: Basic incremental GC
– BD: Use bounding dirty pages
– BD+R: Use bounding dirty pages + retrying incremental marking

Basic/BD/BD+R: GC starts when heap usage > 75%

BD/BD+R: # of dirty pages < 16

The iukiller synthetic benchmark

‘Stress-test’ benchmark for mostly parallel GC
– Trees tend to escape from collector
– Final marking tends to be long

[Figure: large binary trees attached to roots; the pattern repeats]

Results of iukiller benchmark: the maximum pause time

Previous collectors fail
– >1.8 seconds
– The larger the heap, the longer

BD+R achieves <12ms pause
– independent of heap size

heap size (MB)  live data (MB)  GC kind  max. pause time (ms)
100             64              Stop      4122
                                Basic     2085
                                BD        1802
                                BD+R      11.7
200             128             Stop      8607
                                Basic     4071
                                BD        3753
                                BD+R      11.7
400             256             Stop     17039
                                Basic     8205
                                BD        7166
                                BD+R      11.2
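As a quick check of the "independent of heap size" claim, the short script below computes how much each collector's maximum pause grows from the 100MB heap to the 400MB heap, using only the figures in the table. The variable names are illustrative.

```python
# Maximum pause times (ms) from the table, keyed by heap size (MB).
max_pause_ms = {
    100: {"Stop": 4122, "Basic": 2085, "BD": 1802, "BD+R": 11.7},
    200: {"Stop": 8607, "Basic": 4071, "BD": 3753, "BD+R": 11.7},
    400: {"Stop": 17039, "Basic": 8205, "BD": 7166, "BD+R": 11.2},
}

# Growth factor of the maximum pause when the heap grows 4x (100MB -> 400MB).
growth = {
    gc: max_pause_ms[400][gc] / max_pause_ms[100][gc]
    for gc in max_pause_ms[100]
}

for gc, factor in growth.items():
    print(f"{gc:5s} pause grows {factor:.2f}x for a 4x larger heap")
```

Stop, Basic and BD grow roughly in proportion to the heap, while the BD+R pause does not grow at all.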

Application benchmarks

Programs written in C/C++
– deltablue: an incremental constraint solver (25MB)
– espresso: a logic optimizer for PLA (10MB)
– N-Body: an N-Body solver with Barnes-Hut (15MB)
– CKY: a context free grammar parser (40MB)
– Cube: a Rubik’s cube puzzle solver (8MB)

Results of application benchmarks: the maximum pause time

BD+R achieves <5ms pause in all five applications

BD is also OK (<16ms)

[Figure: bar charts of max. pause (msec) per GC kind, one panel each for CKY, deltablue, espresso, N-Body and Cube; two off-scale bars are labeled 215ms and 283ms]

Results of application benchmarks: overhead

BD/BD+R is <9% slower than Basic
– More protection

All incr. GCs are 1% to 53% slower than Stop
– VM write barrier
– Floating garbage
– More GC cycles

[Figure: bar charts of total execution times (‘Stop’=1) per GC kind, one panel each for deltablue, espresso, N-Body, CKY and Cube]

Related work

[Appel et al. 88]
– Copy GC with VM read barrier. Slower than write barrier

[Furuso et al. 91]
– Snapshot-at-beginning on VM. Large space overhead

Recent version of [Boehm et al. 91]
– Time limit on final marking. Risks of starvation

[Printezis et al. 00] [Ossia et al. 02]
– Keep # of dirty cards small. Final marking is still unbounded

Summary

An incremental conservative GC
– Short pause (<5ms in 5 applications)
– GC progress

Use both techniques:
– Bounding dirty pages
– Retrying incremental marking

Future direction

Reducing overhead of BD
– Strategy for proper limit for dirty pages

Bounding roots to be scanned
– Protect stacks partially
