Reducing Pause Time of Conservative Collectors
Toshio Endo (National Institute of Informatics)
Kenjiro Taura (Univ. of Tokyo)
Incremental GC for soft-realtime applications [Steele 75] [Yuasa 90] [Doligez 93]
Target: multimedia, games, etc.
– Pauses should be <10ms
Collection tasks are divided into small pieces
Success: pauses of <5ms [Cheng 01]
– They assume compiler cooperation
Pause reduction for ‘conservative’ GCs is still insufficient
Conservative GC [Boehm et al. 88]
– Mark-sweep GC for C/C++ programs
– No compiler cooperation (e.g., write barriers)
Mostly parallel GC [Boehm et al. 91]
– Incremental, conservative
– Pauses >100ms fairly common
Write barriers in conservative GCs
No fine-grain write barrier by compiler
VM’s write protection
Coarse grain
– Page level
– Detects only the first update after protection
These properties restrict the collector design
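The coarse-grain behavior above can be illustrated with a toy model. This is only a sketch: the class name `PageProtectedHeap`, the 4096-byte page size, and the dictionary-based memory are assumptions made for illustration, not part of the collector described in the talk. A real implementation would rely on the OS's page protection and a fault handler.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class PageProtectedHeap:
    """Toy model of a heap guarded by page-level write protection."""

    def __init__(self, num_pages):
        self.protected = set(range(num_pages))  # pages currently write-protected
        self.dirty = set()                      # pages known to have been updated
        self.traps = 0                          # number of simulated write faults
        self.memory = {}                        # address -> value

    def write(self, addr, value):
        page = addr // PAGE_SIZE
        if page in self.protected:
            # Simulated write fault: remember the page, then unprotect it
            # so the user program can continue.
            self.traps += 1
            self.dirty.add(page)
            self.protected.discard(page)
        # Later writes to an unprotected page reach memory without a trap.
        self.memory[addr] = value


heap = PageProtectedHeap(num_pages=4)
heap.write(0, "a")      # first write to page 0: trapped
heap.write(8, "b")      # second write to page 0: not trapped
heap.write(4096, "c")   # first write to page 1: trapped
print(heap.traps, sorted(heap.dirty))
```

Only two of the three writes are observed, and the collector learns which pages changed, not which objects or fields.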
Incremental mark sweep algorithms
Snapshot at beginning & DLG [Yuasa 90] [Doligez 93]
– Make (conceptual) heap snapshot before marking
– Promise short pause
– Large space overhead with VM write barrier
Incremental update [Steele 75] [Dijkstra 78]
– Maintain consistency after marking
– The only choice with a VM write barrier
– Needs final marking before finish, which can be unboundedly long!
Contributions
Analyze why previous algorithms fail
Propose techniques to bound pauses & guarantee progress
Show a ‘stress-test’ benchmark: iukiller
Demonstrate experimental results
– < 5ms in applications
– < 12ms in the stress-test benchmark (constant across all heap sizes)
(This talk omits parallel issues)
Overview of presentation
Mostly parallel GC
Techniques to reduce pause time
Experimental results
Related work
Summary
Mostly parallel garbage collector (1)
Start GC
Write-protect heap
Incremental mark (interleaved with User)
– On a write fault, the trap handler remembers the address of the dirty (= updated) page and unprotects it
Final marking
Incremental sweep (interleaved with User)
End GC
Mostly parallel garbage collector (2)
Second update is un-trapped
– Mark r in final phase
Need final marking
[Figure: successive writes involving objects p, q and r]
Final marking
1. Scan all dirty pages + root
2. Mark all unmarked objects from scanned region
The amount of work is unbounded
– # of dirty pages
– Objects reachable from a dirty page
Makes pauses >100ms
[Figure: heap and root]
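A minimal sketch of the two steps, assuming a simplified object graph held in dictionaries. The names `final_marking`, `page_objects`, and `children` are hypothetical and chosen only for this illustration. The example shows that even one dirty page can lead to a long chain of marking work.

```python
def final_marking(roots, dirty_pages, page_objects, children, marked):
    """Return the number of objects newly marked (a rough proxy for pause length)."""
    # Step 1: scan all dirty pages + root to collect the objects to re-scan.
    stack = list(roots)
    for page in dirty_pages:
        stack.extend(page_objects[page])
    # Step 2: mark all unmarked objects reachable from the scanned region.
    work = 0
    while stack:
        obj = stack.pop()
        for child in children.get(obj, []):
            if child not in marked:
                marked.add(child)
                work += 1
                stack.append(child)
    return work


# One already-marked object "p" on a single dirty page now points to an
# unmarked chain n0 -> n1 -> ... -> n999.
children = {"root": ["p"], "p": ["n0"]}
for i in range(999):
    children[f"n{i}"] = [f"n{i + 1}"]
marked = {"p"}
page_objects = {0: ["p"]}

work = final_marking(["root"], {0}, page_objects, children, marked)
print(work)
```

Here the number of dirty pages is 1, yet the work grows with the length of the chain, which is the second source of unboundedness listed above.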
Goal of our collector
Bound pause time (< constant)
– Mutator utilization is important, but focus on pause
Guarantee progress of collection
Combine two techniques:
– Bound dirty pages (BD)
– Retry incremental marking (RI)
Bounding dirty pages (1)
Basic collector produces many dirty pages
Keep # of dirty pages < a given limit
– If it exceeds the limit, choose a dirty page
– Re-protect, scan, clean it
– Good: Reduces the task in final marking
– Bad: More protection cost
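The bookkeeping for BD might look like the sketch below. The talk does not say which dirty page is chosen, so evicting the oldest one (FIFO) is an assumption here, as are the names `BoundedDirtyPages` and `scan_page`.

```python
from collections import deque


class BoundedDirtyPages:
    """Toy bookkeeping that keeps the number of dirty pages below `limit`."""

    def __init__(self, limit, scan_page):
        self.limit = limit
        self.scan_page = scan_page   # hypothetical callback that scans one page
        self.dirty = deque()         # dirty pages, oldest first (assumed policy)
        self.reprotections = 0       # extra protection operations paid for BD

    def on_write_fault(self, page):
        self.dirty.append(page)
        while len(self.dirty) >= self.limit:
            victim = self.dirty.popleft()   # choose a dirty page
            self.reprotections += 1         # re-protect it
            self.scan_page(victim)          # scan it; the page is now clean


scanned = []
bd = BoundedDirtyPages(limit=4, scan_page=scanned.append)
for page in range(10):
    bd.on_write_fault(page)
print(list(bd.dirty), scanned, bd.reprotections)
```

The dirty set never reaches the limit, at the price of one extra protection and scan per evicted page.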
Bounding dirty pages (2)
Is pause now bounded?
… No! Unmarked objects reachable from a dirty page are not bounded
[Figure: heap and root]
Retrying incremental marking (1)
Start GC
Write-protect heap
Incremental mark (interleaved with User and Trap handler)
Final marking
– Keep the work of final marking < a given limit
– Finished before limit? No: retry incremental marking. Yes: go on to sweep.
Incremental sweep (interleaved with User)
End GC
Retrying incremental marking (2)
Good: Bound length of single final marking
Bad: Risk of starvation (no progress)
– Final marking may abort before finishing scanning (unbounded) dirty pages
– Unmarked objects may ‘escape’ from collector
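The retry loop can be sketched as follows, under simplifying assumptions: the object graph is a dictionary, the limit counts scanned objects, and the retried incremental marking is modeled as an unbudgeted call with no user writes in between. The function names are hypothetical.

```python
def mark_from(pending, children, marked, budget=None):
    """Mark objects reachable from `pending`.

    Returns True if `pending` was drained, False if `budget` steps ran out first.
    """
    steps = 0
    while pending:
        if budget is not None and steps >= budget:
            return False  # abort; unfinished work stays in `pending`
        obj = pending.pop()
        steps += 1
        for child in children.get(obj, ()):
            if child not in marked:
                marked.add(child)
                pending.append(child)
    return True


def final_phase_with_retry(pending, children, marked, limit):
    """Run bounded final markings, retrying incremental marking after each abort."""
    retries = 0
    while not mark_from(pending, children, marked, budget=limit):
        retries += 1
        # In a real collector this step is interleaved with the user program.
        mark_from(pending, children, marked)
    return retries


children = {f"n{i}": [f"n{i + 1}"] for i in range(99)}  # chain n0 -> ... -> n99
marked = {"n0"}
pending = ["n0"]
retries = final_phase_with_retry(pending, children, marked, limit=10)
print(retries, len(marked))
```

Because the model has no concurrent writes, one retry suffices here. With a running user program, new dirty pages can appear during each retry, which is where the starvation risk comes from.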
The worst case
Abort a final marking with no progress
[Figure: timeline repeating the pattern "incremental marking finishes, write, final marking aborts"]
Ensuring bounded pause and progress
Either alone is insufficient… Need two techniques:
– Bounding dirty pages (BD)
– Retrying incremental marking (RI)
With BD, every final marking can scan all dirty pages, so it finds some unmarked objects, if any
Experimental Environments
400MHz UltraSPARC, Solaris 8
Four GCs
– Stop: Stop-the-world GC
– Basic: Basic incremental GC
– BD: Use bounding dirty pages
– BD+R: Use bounding dirty pages + retrying incremental marking
Basic/BD/BD+R: GC starts when heap usage > 75%
BD/BD+R: # of dirty pages < 16
The iukiller synthetic benchmark
‘Stress-test’ benchmark for mostly parallel GC
Trees tend to escape from collector
Final marking tends to be long
[Figure: two roots and large binary trees; the operation is repeated]
Results of iukiller benchmark: the maximum pause time
Previous collectors fail
– > 1.8 seconds
– The larger the heap, the longer
BD+R achieves <12ms pause
– independent of heap size

heap size (MB)   live data (MB)   GC kind   max. pause time (ms)
100              64               Stop      4122
100              64               Basic     2085
100              64               BD        1802
100              64               BD+R      11.7
200              128              Stop      8607
200              128              Basic     4071
200              128              BD        3753
200              128              BD+R      11.7
400              256              Stop      17039
400              256              Basic     8205
400              256              BD        7166
400              256              BD+R      11.2
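As a quick check of the "independent of heap size" claim, the snippet below computes how much each collector's maximum pause grows when the heap goes from 100MB to 400MB. The numbers are copied from the table above; the variable names are just for illustration.

```python
# Maximum pause times (ms) from the iukiller table: heap size (MB) -> GC kind -> pause.
max_pause_ms = {
    100: {"Stop": 4122, "Basic": 2085, "BD": 1802, "BD+R": 11.7},
    200: {"Stop": 8607, "Basic": 4071, "BD": 3753, "BD+R": 11.7},
    400: {"Stop": 17039, "Basic": 8205, "BD": 7166, "BD+R": 11.2},
}

# Growth factor of the maximum pause for a 4x larger heap (100MB -> 400MB).
growth = {
    kind: max_pause_ms[400][kind] / max_pause_ms[100][kind]
    for kind in max_pause_ms[100]
}
for kind, ratio in growth.items():
    print(f"{kind}: x{ratio:.2f}")
```

Stop, Basic and BD all grow by roughly a factor of four, tracking the heap size, while the BD+R ratio stays just below one.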
Application benchmarks
Programs written in C/C++
– deltablue: an incremental constraint solver (25MB)
– espresso: a logic optimizer for PLA (10MB)
– N-Body: an N-Body solver with Barnes-Hut (15MB)
– CKY: a context free grammar parser (40MB)
– Cube: a Rubik’s cube puzzle solver (8MB)
Results of application benchmarks: the maximum pause time
BD+R achieves <5ms pause in five applications
BD is also OK (< 16ms)
[Figure: max. pause (msec) for CKY, deltablue, espresso, N-Body and Cube; two off-scale bars are labeled 215ms and 283ms]
Results of application benchmarks: overhead
BD/BD+R is <9% slower than Basic
– More protection
All incr. GCs are 1-53% slower than Stop
– VM write barrier
– Floating garbage
– More GC cycles
[Figure: total execution times (‘Stop’=1) for deltablue, espresso, N-Body, CKY and Cube]
Related work
[Appel et al. 88]
– Copy GC with VM read barrier. Slower than write barrier
[Furuso et al. 91]
– Snapshot-at-beginning on VM. Large space overhead
Recent version of [Boehm et al. 91]
– Time limit on final marking. Risks of starvation
[Printezis et al. 00] [Ossia et al. 02]
– Keep # of dirty cards small. Final marking is still unbounded
Summary
An incremental conservative GC
– Short pause (<5ms in 5 applications)
– GC progress
Use both techniques:
– Bounding dirty pages
– Retrying incremental marking
Future direction
Reducing overhead of BD
– Strategy for proper limit for dirty pages
Bounding roots to be scanned
– Protect stacks partially