Taking Off The Gloves With Reference Counting Immix
Rifat Shahriyar, Xi Yang, Stephen M. Blackburn
Australian National University
Kathryn S. McKinley
Microsoft Research
Why Reference Counting?
Advantages
✔ Reclaim as-you-go
✔ Object-local
✔ Basic RC is easy
Disadvantages
✘ Cycles
✘ Performance
[Figure: total time versus production. Recovered labels: 40%, 9%; Backup tracing; <2013, 2013; Our Goal.]
Looking a Little Deeper…
[Figure: bar chart of mutator time, instructions retired and L1 data cache misses for RC, MS, SS and Immix, each versus production. Data labels as extracted: 9%, 9%, 32%, 6%, 4%, 28%, -2%, -3%, -2%, -3%, -3%, 1%.]
[Figure repeated from the previous slide with allocator annotations added: Free List, Bump Pointer.]
How RC works: Fundamental optimizations
• Backup tracing [Weizenbaum 1969]
– Reclaim cyclic garbage
• Deferral [Deutsch and Bobrow 1976]
– Note changes to stacks & registers occasionally
• Coalescing [Levanoni and Petrank 2001]
– Note only initial and final state of references
Deferral [Deutsch and Bobrow 1976, Bacon et al. 2001]
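[Animated figure: stacks and registers hold references into a heap of objects A to F, each labelled with its reference count. Buffered count changes shown: A++, F++, B--, D++, A--, F--.]
Steps shown:
• mutator activity
• GC: scan roots
• GC: apply increments
• GC: apply decrements
• GC: collect
• GC: move deferred decs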
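The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the MMTk implementation: the `Obj` and `DeferredRC` names, the buffer layout and the `write` barrier are all assumptions made for the example. Objects that never receive any increment are not handled here.

```python
class Obj:
    def __init__(self, name, nfields=0):
        self.name = name
        self.rc = 0                      # updated only at collection time
        self.fields = [None] * nfields   # outgoing heap references


class DeferredRC:
    """Hypothetical deferred reference counter (illustrative only)."""

    def __init__(self):
        self.incs = []        # increments logged by the mutator
        self.decs = []        # decrements logged by the mutator
        self.root_decs = []   # root decrements deferred from the previous GC
        self.freed = []       # names of reclaimed objects, in order

    def write(self, src, index, new):
        # Mutator activity: log count changes instead of applying them.
        old = src.fields[index]
        src.fields[index] = new
        if new is not None:
            self.incs.append(new)
        if old is not None:
            self.decs.append(old)

    def collect(self, roots):
        # GC: scan roots. Each root counts as an increment now and as a
        # decrement that is deferred until the next collection.
        self.incs.extend(roots)
        pending = self.decs + self.root_decs
        self.root_decs = list(roots)     # GC: move deferred decs
        # GC: apply increments first so no live object reaches zero.
        for obj in self.incs:
            obj.rc += 1
        self.incs, self.decs = [], []
        # GC: apply decrements and collect objects that reach zero.
        while pending:
            obj = pending.pop()
            obj.rc -= 1
            if obj.rc == 0:
                self.freed.append(obj.name)
                pending.extend(f for f in obj.fields if f is not None)
```

An object reachable only from the stack survives because its root increment is applied before the matching deferred decrement, which arrives one collection later.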
Coalescing [Levanoni and Petrank 2001]
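[Animated figure: objects A to F. A reference in A that initially points to B is overwritten to point to C, D, E and finally F.]
• Remember A
• Ignore intermediate mutations (C++ C--, D++ D--, E++ E--)
• Compare A, Aold
• Apply only B--, F++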
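The coalescing example above can be sketched as follows. It is an illustrative sketch under assumptions: the `Obj` and `CoalescingRC` names, the `logged` flag and the whole-object snapshot are choices made for this example rather than details taken from the slides.

```python
class Obj:
    def __init__(self, name, nfields=0):
        self.name = name
        self.rc = 0
        self.fields = [None] * nfields
        self.logged = False   # True once remembered in this GC epoch


class CoalescingRC:
    """Hypothetical coalescing write barrier (illustrative only)."""

    def __init__(self):
        # Pairs of (object, copy of its fields before the first write).
        self.mod_buffer = []

    def write(self, src, index, new):
        # Remember the object once; later writes to it cost nothing extra.
        if not src.logged:
            src.logged = True
            self.mod_buffer.append((src, list(src.fields)))
        src.fields[index] = new

    def collect(self):
        # Compare each remembered object with its old snapshot:
        # increment the final referents, decrement the initial ones.
        for obj, old_fields in self.mod_buffer:
            for new in obj.fields:
                if new is not None:
                    new.rc += 1
            for old in old_fields:
                if old is not None:
                    old.rc -= 1
            obj.logged = False
        self.mod_buffer = []
```

Fields that did not change receive one increment and one decrement, so their referents' counts are unaffected.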
How RC works: Recent optimizations
• Limited bit count [Shahriyar et al. 2012]
– Use just a few bits; fix overflow with backup tracing
• Elision of new object counts [Shahriyar et al. 2012]
– Only do RC work if object survives to first GC
• Allocate as dead [Shahriyar et al. 2012]
– Avoid free-list work for short-lived objects
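As a rough illustration of the limited bit count idea, the sketch below uses a saturating ("sticky") counter and a recount by tracing. The 3-bit width, the function names and the dictionary-based heap are assumptions for the example; the slides only say "few bits".

```python
RC_BITS = 3                     # hypothetical width, chosen for illustration
STUCK = (1 << RC_BITS) - 1      # saturated value (7 with 3 bits)


def inc(count):
    # Once the count saturates it stays stuck.
    return count if count == STUCK else count + 1


def dec(count):
    # A stuck count can no longer be trusted, so it is never decremented.
    return count if count == STUCK else count - 1


def backup_recount(roots, edges):
    """Recompute counts by tracing from the roots.

    `edges` maps each object to the list of objects it references.
    Unreachable objects get no entry, and stuck counts are replaced by
    fresh (still saturating) counts.
    """
    counts = {}
    work = list(roots)
    while work:
        obj = work.pop()
        first_visit = obj not in counts
        counts[obj] = inc(counts.get(obj, 0))
        if first_visit:
            work.extend(edges.get(obj, []))
    return counts
```

In this sketch an object whose count reached `STUCK` can only be reclaimed, or have its count corrected, by the backup trace.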
How Immix works
• Contiguous allocation into regions
– 256B lines and 32KB blocks
– Objects span lines but not blocks
• Simple mark phase
– Mark objects and containing regions
• Free unmarked regions
• Recycled allocation and defragmentation
[Figure: a block divided into lines, showing object marks, line marks and recyclable lines.]
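The line and block geometry above can be made concrete with a small sketch. The 256B and 32KB sizes come from the slide; the function names and the list-of-booleans line mark table are assumptions. The sketch also ignores refinements of the real allocator, such as treating lines next to marked lines conservatively.

```python
LINE_SIZE = 256
BLOCK_SIZE = 32 * 1024
LINES_PER_BLOCK = BLOCK_SIZE // LINE_SIZE   # 128 lines per block


def lines_covered(offset, size):
    """Line indices touched by an object at `offset` within a block."""
    first = offset // LINE_SIZE
    last = (offset + size - 1) // LINE_SIZE
    return range(first, last + 1)


def free_runs(line_marks):
    """Yield (start, end) byte offsets of each run of unmarked lines."""
    start = None
    for i, marked in enumerate(line_marks):
        if not marked and start is None:
            start = i
        elif marked and start is not None:
            yield start * LINE_SIZE, i * LINE_SIZE
            start = None
    if start is not None:
        yield start * LINE_SIZE, len(line_marks) * LINE_SIZE


def allocate(line_marks, size):
    """Bump-allocate into the first recyclable run that is large enough."""
    for start, end in free_runs(line_marks):
        if end - start >= size:
            return start
    return None   # the caller would move on to another block
```

For example, with lines 0 and 2 marked, a 300-byte request skips the single free line 1 and lands at the start of line 3.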
Goal & Challenges
• Goal
– Object-local pay-as-you-go collection
– Excellent mutator locality
– Copying to eliminate fragmentation
• Immix provides opportunistic copying
✔ Same mutator locality as contiguous allocator
• However, RC is inherently local
– References to an object generally unknown…
– …but copying must redirect all references
Contributions
✔ Identify heap layout as bottleneck for RC
✔ Introduce copying RC (RC Immix)
✔ Exploit Immix’s opportunistic copy
✔ Observe new objects can be copied by first GC
✔ Observe old objects can be copied by backup GC
✔ Line/block reclamation, header bits
✔ Deliver great performance
Reference Counting in RC Immix
• Reference count for object
• Live object count for line
– Lines ‘born dead’ (zero live object count)
– Inc when any object gets first RC increment
– Dec when any object is dead
• Collect lines with zero live object count
[Figure: lines labelled with their live object counts.]
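A minimal sketch of the per-line live object count follows. The `Block` class and its method names are hypothetical, and counting a spanning object against every line it covers is an assumption made here for safety, not something the slide states.

```python
LINE_SIZE = 256


class Block:
    """Hypothetical block with one live-object counter per line."""

    def __init__(self, n_lines=128):
        # Lines are 'born dead': every counter starts at zero.
        self.live_objects = [0] * n_lines

    @staticmethod
    def _lines(offset, size):
        return range(offset // LINE_SIZE, (offset + size - 1) // LINE_SIZE + 1)

    def first_increment(self, offset, size):
        # Called when an object receives its first RC increment.
        for line in self._lines(offset, size):
            self.live_objects[line] += 1

    def object_died(self, offset, size):
        # Called when an object's reference count drops to zero.
        for line in self._lines(offset, size):
            self.live_objects[line] -= 1

    def reclaimable_lines(self):
        # Lines with no live objects can be reused by the allocator.
        return [i for i, n in enumerate(self.live_objects) if n == 0]
```

Objects that die before their first increment never touch the counters, which is how the 'born dead' default avoids work for short-lived objects in this sketch.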
Cycle Collection in RC Immix
• Live object counts zeroed
• Trace marks live objects and lines
– Corrects incorrect counts (due to cycles)
• Sweep
– Collects unmarked lines
– Sweeps dead lines, not dead objects
[Figure: lines labelled with their live object counts after the trace.]
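The cycle collection steps above can be sketched as a trace that rebuilds the per-line counts from scratch. The function name, the dictionary-based heap and the one-line-per-object `line_of` mapping are simplifying assumptions for illustration.

```python
def backup_trace(roots, fields, line_of, n_lines):
    """Rebuild per-line live counts by tracing from the roots.

    `fields` maps each object to the objects it references and
    `line_of` maps each object to the index of the line holding it.
    """
    live_per_line = [0] * n_lines          # live object counts zeroed
    marked = set()
    work = list(roots)
    while work:                            # trace marks live objects and lines
        obj = work.pop()
        if obj in marked:
            continue
        marked.add(obj)
        live_per_line[line_of[obj]] += 1
        work.extend(fields[obj])
    # Sweep at line granularity: any line with no live object is free.
    free_lines = [i for i, n in enumerate(live_per_line) if n == 0]
    return marked, live_per_line, free_lines
```

In the usage below, `a` and `b` reference each other, so plain reference counting would never see their counts reach zero, but the trace leaves their line unmarked and it is reclaimed.

```python
fields = {"r": ["x"], "x": [], "a": ["b"], "b": ["a"]}
line_of = {"r": 0, "x": 0, "a": 1, "b": 1}
marked, live, free = backup_trace(["r"], fields, line_of, n_lines=3)
```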
Defragmentation in RC Immix
• RC is object-local, inhibiting copying
• But, RC Immix seizes two opportunities
1. All references to new objects known at first GC
2. Backup tracing performs a global trace
• Use opportunistic copying in both cases
– Mix copying with in-place RC and marking
– Stop copying when available space exhausted
Proactive Defragmentation
• Copy surviving new objects (with bounded reserve)
• Optimization, not for correctness
– Reserve sized for performance unlike semi-space
• Use past survival rate to predict the future
[Figure: lines labelled with their live object counts.]
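One way to read "use past survival rate to predict the future" is sketched below. The averaging rule, the function names and the byte figures are assumptions for illustration; the slides do not specify the predictor actually used.

```python
def predict_copy_reserve(new_bytes, past_survival_rates, max_reserve):
    """Size the copy reserve from earlier survival rates (assumed heuristic)."""
    if not past_survival_rates:
        return max_reserve                 # no history yet: use the bound
    predicted = sum(past_survival_rates) / len(past_survival_rates)
    return min(max_reserve, int(new_bytes * predicted))


def copy_survivors(survivor_sizes, reserve):
    """Copy survivors while the reserve lasts; the rest stay in place.

    Because copying is an optimization rather than a correctness
    requirement, running out of reserve is harmless.
    """
    copied, in_place = [], []
    for size in survivor_sizes:
        if size <= reserve:
            reserve -= size
            copied.append(size)
        else:
            in_place.append(size)
    return copied, in_place
```

Unlike a semi-space collector, which must reserve enough room for every possible survivor, this reserve is bounded and can be too small without affecting correctness.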
Reactive Defragmentation
• Backup tracing performs a global trace
• Piggyback on this, copy live objects
• Use available memory threshold
– If below threshold, do defrag at next cycle GC
Hardware, Software & Benchmarks
• 21 benchmarks
– DaCapo, SPECjvm98 and pjbb2005
• 20 invocations for each benchmark
• Jikes RVM and MMTk
– All garbage collectors are parallel
• Intel Core i7 2600K, 4GB
• Ubuntu 10.04.1 LTS
Bottom Line
Geomean of all benchmarks, versus production
heap size = 2x the minimum heap size
3% improvement over production on geomean
[Figure: bar chart of total time, mutator time and GC time for RC and RC Immix; y-axis from -35% to 15%.]
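For readers unfamiliar with the summary statistic, the geometric mean of per-benchmark time ratios can be computed as below. The ratios are made-up numbers for illustration only and are not results from the talk.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical time ratios (collector time / production time), one per
# benchmark. Values below 1.0 mean the collector is faster than production.
normalized_times = [0.90, 1.05, 0.97]
change = geomean(normalized_times) - 1.0   # negative means an improvement
```

The geometric mean is the usual choice for averaging normalized ratios because a 2x slowdown on one benchmark and a 2x speedup on another cancel out exactly.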
Total Time By Benchmark
heap size = 2x the minimum heap size
+5% worst case, -25% best case
[Figure: per-benchmark bar chart of total time for RC and RC Immix; y-axis from -30% to 40%, faster ← Time → slower. Benchmarks: jess, db, javac, mtrt, jack, avrora, bloat, chart, eclipse, fop, hsqldb, jython, luindex, lusearchfix, pmd, sunflow, xalan, pjbb2005, compress.]
Mutator Time By Benchmark
heap size = 2x the minimum heap size
+4% worst case, -10% best case
[Figure: per-benchmark bar chart of mutator time for RC and RC Immix; y-axis from -15% to 30%, faster ← Mutator → slower. Benchmarks: jess, db, javac, mtrt, jack, avrora, bloat, chart, eclipse, fop, hsqldb, jython, luindex, lusearchfix, pmd, sunflow, xalan, pjbb2005, compress.]
GC Time By Benchmark
heap size = 2x the minimum heap size
+5% worst case, -25% best case
[Figure: per-benchmark bar chart of GC time for RC and RC Immix; y-axis from -30% to 40%, faster ← GC → slower. Benchmarks: jess, db, javac, mtrt, jack, avrora, bloat, chart, eclipse, fop, hsqldb, jython, luindex, lusearchfix, pmd, sunflow, xalan, pjbb2005, compress.]
Total Time v Heap Size
RC Immix matches GenImmix at 1.3x and outperforms from 1.4x
[Figure: line chart of Time / Best (y-axis, 1 to 1.5) against Heap Size / Minimum Heap (x-axis, 1 to 6) for GenImmix, RC, RC Immix and RC Immix (No PC).]