1
The Garbage Collection Advantage:
Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT),
J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)
Presented by Na Meng
Many thanks to the authors and to the anonymous speaker from the last MM course
2
Motivation
• Memory gap problem
• OO programs exacerbate the memory gap problem
  – Automatic memory management
• Pointer data structures
Goal: improve OO program locality
3
Opportunity
• Copying garbage collector reorders objects at runtime
4
Copying of Linked Objects
[Figure, built up over slides 4 to 6: a graph of seven linked objects (1 to 7) and the order in which a copying collector lays them out in memory under three policies: BreadthFirst, DepthFirst, and OnlineObjectReordering.]
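The difference between the first two copy orders can be sketched in a few lines of Python. This is only an illustration: the object graph `HEAP` and both function names are assumptions made for the example, not the graph or code from the slides.

```python
from collections import deque

# Hypothetical object graph: each object id maps to the ids it points to.
# The shape is an assumption chosen for illustration.
HEAP = {
    1: [2, 3, 4],
    2: [5],
    3: [],
    4: [6, 7],
    5: [],
    6: [],
    7: [],
}

def breadth_first_copy(root, heap):
    """Copy order of a Cheney-style collector: a FIFO queue of pending objects."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for child in heap[obj]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

def depth_first_copy(root, heap):
    """Copy order when each object's referents are copied right after it."""
    order, seen, stack = [], set(), [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        # Push children in reverse so the first field is followed first.
        stack.extend(reversed(heap[obj]))
    return order

print(breadth_first_copy(1, HEAP))  # [1, 2, 3, 4, 5, 6, 7]
print(depth_first_copy(1, HEAP))    # [1, 2, 5, 3, 4, 6, 7]
```

In this toy graph, breadth-first copying separates object 2 from its child 5, while depth-first copying keeps them adjacent but pushes object 4 away from its parent 1. Neither fixed order knows which links the program actually follows.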
7
Outline
• Motivation
• Online Object Reordering (OOR)
• Methodology
• Experimental Results
• Conclusion
8
Online Object Reordering
• Where are the cache misses?
• How to identify hot field accesses at runtime?
• How to reorder the objects?
9
Where Are The Cache Misses?
• Heap structure:
[Figure, not to scale: VM Objects, Stack, Older Generation, Nursery]
10
Where Are The Cache Misses?
[Figure: _209_db, total accesses (in millions), broken down into L2 hits and L2 misses]
11
Where Are The Cache Misses?
• Two opportunities to reorder objects in the older generation
  – Promote nursery objects
  – Full heap collection
12
How to Find Hot Fields?
• Runtime info (intercept every read)?
• Compiler analysis?
• Runtime information + compiler analysis
Key: Low overhead estimation
13
Which Classes Need Reordering?
Step 1: Compiler analysis
  – Excludes cold basic blocks
  – Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
  – Marks the field accesses in hot methods as hot
14
Example: Compiler Analysis
Compiler
  – Hot BB: collect access info
  – Cold BB: ignore

Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e) {
    …a.c
  }
}

Access List:
1. A.b
2. ….
….
15
Example: Adaptive Sampling
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e) {
    …a.c
  }
}

Adaptive Sampling: Foo is hot
Foo Accesses:
1. A.b
2. ….
….
A.b is hot
[Figure: object A with fields b, …, c, alongside A's type information]
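The two steps can be sketched together as follows. This is a minimal illustration under stated assumptions: the `BasicBlock` and `Method` classes, the sample counts, and the `threshold` parameter are all hypothetical and do not reflect the Jikes RVM data structures.

```python
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    field_accesses: list   # e.g. ["A.b"]
    is_cold: bool = False  # e.g. an exception handler

@dataclass
class Method:
    name: str
    blocks: list
    access_list: list = field(default_factory=list)

def analyze(method):
    """Step 1 (compile time): record field accesses found in non-cold blocks."""
    method.access_list = [
        access
        for block in method.blocks if not block.is_cold
        for access in block.field_accesses
    ]

def hot_fields(methods, samples, threshold):
    """Step 2 (run time): fields accessed by methods that were sampled often."""
    hot = set()
    for m in methods:
        if samples.get(m.name, 0) >= threshold:
            hot.update(m.access_list)
    return hot

# Foo reads a.b in its try block and a.c only in the (cold) catch block.
foo = Method("Foo", [BasicBlock(["A.b"]), BasicBlock(["A.c"], is_cold=True)])
analyze(foo)
print(hot_fields([foo], {"Foo": 12}, threshold=10))  # {'A.b'}
```

The point of the split is cost: the access list is built once per compilation, and the run-time side only needs the method samples that the adaptive system already collects.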
16
Copying of Linked Objects
[Figure: the seven-object graph copied with OnlineObjectReordering, which consults Type Information while copying; the destination is shown as a hot space and a cold space.]
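One way to picture how a copying collector might use hot-field advice is to keep two work queues and always drain the hot one first. The sketch below is an assumption-laden illustration, not the collector from the talk: the graph `FIELD_HEAP`, its field names, and the two-queue policy are all made up for the example.

```python
from collections import deque

# Hypothetical heap: object id -> {field name: referent id}.
FIELD_HEAP = {
    1: {"a": 2, "b": 3, "c": 4},
    2: {"f": 5},
    3: {},
    4: {"d": 6, "e": 7},
    5: {},
    6: {},
    7: {},
}

def oor_copy_order(root, heap, hot_field_names):
    """Placement order when referents of hot fields are processed first."""
    order, seen = [], {root}
    hot, cold = deque([root]), deque()
    while hot or cold:
        # Objects reached through hot fields always go ahead of the rest.
        obj = hot.popleft() if hot else cold.popleft()
        order.append(obj)
        for name, child in heap[obj].items():
            if child in seen:
                continue
            seen.add(child)
            (hot if name in hot_field_names else cold).append(child)
    return order

print(oor_copy_order(1, FIELD_HEAP, {"c"}))  # [1, 4, 2, 3, 6, 7, 5]
print(oor_copy_order(1, FIELD_HEAP, set()))  # [1, 2, 3, 4, 5, 6, 7]
```

With field `c` marked hot, object 4 lands next to object 1; with no hot fields, the sketch degenerates to plain breadth-first order.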
17
OOR System Overview
[Figure: OOR system overview. Boxes and arrows: Source Code, Baseline Compiler, Executing Code, Adaptive Sampling, Hot Methods, Optimizing Compiler, Access Info Database (Adds Entries, Look Up), Register Hot Field Accesses, Advice, GC: Copies Objects, Affects Locality / Improves Locality. Legend: OOR addition, Jikes RVM component, Input/Output.]
18
Outline
• Motivation
• Online Object Reordering
• Methodology
• Experimental Results
• Conclusion
19
Virtual Machine
• Jikes RVM
  – VM written in Java
  – High performance
  – Timer-based adaptive sampling
  – Dynamic optimization
• Experiment setup
  – Pseudo-adaptive
  – 2nd iteration [Eeckhout et al.]
20
Memory Management
• Memory Management Toolkit (MMTk)
  – Allocators and garbage collectors
  – Multi-space heap
    • Boot image
    • Large object space (LOS)
    • Immortal space
• Experiment setup
  – Generational copying GC with 4 MB bounded nursery
21
Overhead: OOR Analysis Only
Benchmark    Base Execution Time (sec)    w/ only OOR Analysis (sec)    Overhead
jess           4.39                          4.43                        0.84%
jack           5.79                          5.82                        0.57%
raytrace       4.63                          4.61                       -0.59%
mtrt           4.95                          4.99                        0.70%
javac         12.83                         12.70                       -1.05%
compress       8.56                          8.54                        0.20%
pseudojbb     13.39                         13.43                        0.36%
db            18.88                         18.88                       -0.03%
antlr          0.94                          0.91                       -2.90%
hsqldb       160.56                        158.46                       -1.30%
ipsixql       41.62                         42.43                        1.93%
jython        37.71                         37.16                       -1.44%
ps-fun       129.24                        128.04                       -1.03%
Mean                                                                    -0.19%
22
Detailed Experiments
• Separate application and GC time
• Vary thresholds for method heat
• Vary thresholds for cold basic blocks
• Three architectures
  – x86, AMD, PowerPC
• x86 performance counters:
  – DL1, trace cache, L2, DTLB, ITLB
23
Performance javac
24
Performance db
25
Performance jython
Is the improvement significant?
26
Phase Changes
Algorithm: Decay Field Heat
27
DECAY-HEAT(method)
  for each fieldAccess in method do
    if PotentiallyHot(fieldAccess) then
      hotField ← fieldAccess.field
      class ← hotField.instantiatingClass
      class.hasHotField ← true
      for each field in class do
        period ← Now() - class.lastUpdate
        decay ← HI / (HI + period)
        field.heat ← field.heat * decay
        if field.heat < LO then
          field.heat ← 0
      hotField.heat ← HI
      class.lastUpdate ← Now()
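A runnable Python rendering of the same idea may make the decay arithmetic easier to follow. It is a sketch under assumptions: the values of `HI` and `LO`, the time units, the `ClassInfo` container, and the decision to pass in accesses that were already judged potentially hot are all choices made for this example.

```python
from dataclasses import dataclass, field

HI = 100.0  # heat given to a freshly observed hot field (assumed value)
LO = 1.0    # heat below this is treated as cold (assumed value)

@dataclass
class ClassInfo:
    heat: dict = field(default_factory=dict)  # field name -> heat
    last_update: float = 0.0
    has_hot_field: bool = False

def decay_heat(hot_accesses, classes, now):
    """hot_accesses: (class name, field name) pairs judged potentially hot."""
    for class_name, field_name in hot_accesses:
        info = classes[class_name]
        info.has_hot_field = True
        # The longer since the last update, the more old heat is discounted.
        period = now - info.last_update
        decay = HI / (HI + period)
        for name in info.heat:
            info.heat[name] *= decay
            if info.heat[name] < LO:
                info.heat[name] = 0.0
        info.heat[field_name] = HI
        info.last_update = now

# Field b was hot at time 0; field c becomes hot 100 time units later.
classes = {"A": ClassInfo(heat={"b": HI, "c": 0.0})}
decay_heat([("A", "c")], classes, now=100.0)
print(classes["A"].heat)  # {'b': 50.0, 'c': 100.0}
```

In this example the older hot field is discounted rather than erased at once: `b` keeps half its heat after a period equal to `HI`, and drops to zero only once the decayed value falls below `LO`.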
Will the latest access pattern erase the earlier access pattern(s)?
m1() {
  for (… …) {
    … …
    a.b = …
  }
}

m2() {
  for (… …) {
    … …
    … = a.c;
  }
}

for (… …) {
  m1(); // GC works
  m2(); // GC works
}
OOR without vs. with phase change
28
• Almost all hot fields within an object are visited around the same time
The standard benchmarks have few, if any, traversal order phases.
Copying Advantage (javac)
29
GenCopy vs. MS
Mutator time? GC time? Total time?
A Possible Comparison
30
GenCopy vs. GenOOR ?
Discussion
• Are there other ways to improve locality during copying collection?
31
32
Questions?
Thank you!