Department of Computer Sciences
Cork: Dynamic Memory Leak Detection with Garbage Collection
Maria Jump, Kathryn S. McKinley
{mjump,mckinley}@cs.utexas.edu
11-Dec-2006 UMCP 2
Best case: increases GC workload
Worst case: systematic heap growth causes crash after days of execution
A memory leak in a garbage-collected language occurs when a program inadvertently maintains references to objects that it no longer needs, preventing the collector from reclaiming space.
Cork accurately pinpoints systematic heap growth completely online
Cork’s Solution
1. Summarize heap growth by calculating type points-from graph
   Piggybacks on full-heap object scan
   Summarizes the heap by type
2. Interpret the summarization using differencing
3. Generate debugging reports
   Candidate Report
   Slice Report
   Allocation Site Report
Type Points-From Graph
[Figure: a heap of object instances next to its type points-from graph (TPFG), with numeric annotations. Legend entries: instance, type; HashTable, Queue, PQueue, Company, People.]
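The summarization step can be illustrated with a minimal sketch. It assumes that each TPFG node carries the total volume (bytes) of live objects of a type and that each points-from edge carries the volume of referenced objects, counted once per reference. The `Obj` class and `build_tpfg` function are hypothetical names for illustration, not part of Cork or Jikes RVM.

```python
from collections import defaultdict


class Obj:
    """Hypothetical heap object: a type name, a size in bytes, outgoing references."""

    def __init__(self, type_name, size, refs=()):
        self.type_name = type_name
        self.size = size
        self.refs = list(refs)


def build_tpfg(live_objects):
    """Summarize live objects by type in a single pass, as a heap scan would.

    Returns (volume, edges):
      volume[t]         total bytes of live objects of type t
      edges[(dst, src)] bytes of dst-typed objects pointed to from src-typed
                        objects (assumption: counted once per reference)
    """
    volume = defaultdict(int)
    edges = defaultdict(int)
    for obj in live_objects:
        volume[obj.type_name] += obj.size
        for target in obj.refs:
            edges[(target.type_name, obj.type_name)] += target.size
    return dict(volume), dict(edges)
```

Because the summary is keyed by type rather than by object, its size depends on the number of types and type-to-type relationships, not on the number of live objects.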
Differencing TPFGs
[Figure: annotated TPFGs from three successive collections: TPFGi, TPFGi+1, TPFGi+2.]
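Differencing successive summaries can be sketched as follows, assuming each TPFG is reduced to a mapping from type name to volume. `diff_tpfg` and `growing_types` are hypothetical helper names, and requiring strict growth across every pair of snapshots is a simplification chosen for this example.

```python
def diff_tpfg(prev_volume, curr_volume):
    """Per-type volume change between two consecutive summaries."""
    types = set(prev_volume) | set(curr_volume)
    return {t: curr_volume.get(t, 0) - prev_volume.get(t, 0) for t in types}


def growing_types(snapshots):
    """Types whose volume strictly increased between every consecutive pair."""
    if len(snapshots) < 2:
        return set()
    result = None
    for prev, curr in zip(snapshots, snapshots[1:]):
        grew = {t for t, delta in diff_tpfg(prev, curr).items() if delta > 0}
        result = grew if result is None else result & grew
    return result
```

A type that appears or disappears between collections is handled by treating a missing entry as volume 0.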
Finding Growth (SRT)
Find nodes that are growing: $s_{t_i} > 0$
Rank all growing nodes: $r_{t_i} = r_{t_{i-1}} + p_{t_i} \times s_{t_i}$
Designate node as a candidate if $r_{t_i} > R_{thres}$
[Figure: example TPFG with numeric annotations.]
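A minimal sketch of this ranking follows, reading the rank update as r(i) = r(i-1) + p(i) * s(i), where p counts consecutive growing collections and s is a slope. Two points are assumptions of this example rather than statements from the slides: the slope is taken to be the raw volume change between collections, and a non-growing collection resets p while leaving the rank unchanged.

```python
def srt_update(rank, phases, slope):
    """One ranking step for a single type; returns (new_rank, new_phases)."""
    if slope > 0:
        phases += 1
        rank += phases * slope
    else:
        phases = 0  # assumption: a non-growing collection ends the growth run
    return rank, phases


def srt_candidates(volumes_by_type, r_thres):
    """Types whose final rank exceeds the threshold r_thres.

    volumes_by_type maps a type name to its volume at each collection.
    """
    candidates = set()
    for type_name, volumes in volumes_by_type.items():
        rank, phases = 0.0, 0
        for prev, curr in zip(volumes, volumes[1:]):
            rank, phases = srt_update(rank, phases, curr - prev)
        if rank > r_thres:
            candidates.add(type_name)
    return candidates
```

Multiplying by the phase count rewards sustained growth: a type that grows by the same amount at every collection accumulates rank faster than one that grows intermittently.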
Reported Candidates
[Figure: bar chart of # of candidates (0 to 4) reported by SRT for each benchmark: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, pseudojbb, SPECjbb, antlr, bloat, fop, jython, pmd, ps, xalan. Callouts: jess, SPECjbb, fop.]
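Finding Growth (RRT)
Find nodes that are growing: $v_{t_i} > (1 - f) \times v_{t_{i-1}}$
Rank all growing nodes: $r_{t_i} = r_{t_{i-1}} + p_{t_i} \times (Q - 1)$, where $Q = v_{t_i} / v_{t_{i-1}}$
Designate node as a candidate if $r_{t_i} > R_{thres}$
[Figure: example TPFG with numeric annotations.]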
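The ratio-based variant can be sketched in the same style, assuming the growth test is v(i) > (1 - f) * v(i-1) with decay factor f, and the rank update is r(i) = r(i-1) + p(i) * (Q - 1) with Q the ratio of consecutive volumes. Resetting the phase count when the test fails is an assumption of this example, and `rrt_update` is a hypothetical name.

```python
def rrt_update(rank, phases, v_prev, v_curr, f):
    """One ratio-based ranking step for a single type.

    f is the decay factor (0 < f < 1). Returns (new_rank, new_phases).
    """
    if v_prev > 0 and v_curr > (1 - f) * v_prev:
        phases += 1
        q = v_curr / v_prev
        rank += phases * (q - 1)
    else:
        phases = 0  # assumption: a large drop ends the growth run
    return rank, phases
```

The decay factor lets a type survive a small dip: with f = 0.25, a drop from 150 to 120 still passes the growth test, although Q < 1 lowers the rank for that step.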
Reported Candidates
[Figure: bar chart of # of candidates (0 to 5) reported by SRT and RRT for each benchmark: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, pseudojbb, SPECjbb, antlr, bloat, fop, jython, pmd, ps, xalan. Callouts: jess, SPECjbb, fop.]
Finding Data Structure
Type is not enough
Growing edges identify the data structure
Rank edges
Calculate a slice from each candidate
   Set of all paths (n0…nn) such that $r_{n_k \to n_{k+1}} > 0$
   “Sees” beyond non-candidate nodes
[Figure: example TPFG with numeric annotations on nodes and edges.]
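A slice can be sketched as a graph traversal, assuming edge ranks are already available as a mapping from (source type, destination type) to rank. The set of paths is represented compactly by the set of edges they traverse. `slice_from` and the single-letter type names in the usage note are hypothetical.

```python
def slice_from(candidate, edge_rank):
    """Edges on any path from `candidate` that uses only positively ranked edges.

    edge_rank maps (src_type, dst_type) to the rank of that edge.
    """
    adjacency = {}
    for (src, dst), rank in edge_rank.items():
        if rank > 0:
            adjacency.setdefault(src, []).append(dst)

    seen = {candidate}
    slice_edges = set()
    stack = [candidate]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, []):
            slice_edges.add((node, nxt))
            if nxt not in seen:  # guard against cycles in the type graph
                seen.add(nxt)
                stack.append(nxt)
    return slice_edges
```

For example, with edges A to B (rank 3), B to C (rank 2), and A to D (rank 0), the slice from A contains A to B and B to C but not A to D. The traversal follows growing edges through any node, candidate or not, which is one way to read "sees beyond non-candidate nodes".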
Implementation and Methodology
Jikes RVM with MMTk
Benchmarks: SPECjvm98, DaCapo, SPECjbb2000, Eclipse 3.1.2
Garbage collector: generational with 4MB bounded nursery
For performance, report application only
   Replay compilation
   2nd run methodology
Efficiency and Scalability
Node/type data stored in type information block (TIB), adding 5 words
   1 word for type volume and edge list pointer for each of the previous 4 collections
   1 word for # of phases (p)
Edge data stored in lists
Prune parts of TPFG that are non-growing
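The per-type bookkeeping can be sketched as a small record holding a four-collection history plus a phase counter. This is an illustration only: `TypeRecord` and `prune` are hypothetical names, the real data lives in the TIB rather than in Python objects, and the pruning rule used here (drop types with no current growth run) is an assumption.

```python
from collections import deque

HISTORY = 4  # data is kept for the previous 4 collections


class TypeRecord:
    """Per-type data: recent (volume, edge_list) pairs and the phase count p."""

    def __init__(self):
        self.history = deque(maxlen=HISTORY)  # oldest entry is evicted automatically
        self.phases = 0

    def record(self, volume, edge_list):
        """Store this collection's data and update the growth-run length."""
        if self.history and volume > self.history[-1][0]:
            self.phases += 1
        else:
            self.phases = 0
        self.history.append((volume, edge_list))


def prune(records):
    """Keep only types that are currently in a growth run (assumed rule)."""
    return {t: r for t, r in records.items() if r.phases > 0}
```

Bounding the history keeps the per-type cost constant no matter how long the program runs.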
Space Overhead
                      jess      Eclipse   Geomean
# of types
   bm+VM              1744      3365      1747
   TPFG avg           318       667       334
   TPFG max           319       775       346
# of edges
   TPFG avg           844       4090      904
   TPFG max           861       7585      1142
   % pruned           66%       42%       60%
Increased Alloc %     0.094%    0.167%    0.233%
Callouts: 19%, 2.7X, 0.233%
Time Overhead
[Figure: Normalized Total Time vs. Heap Size Relative to Minimum.]
Benchmarks on Cork
Cork identified:
   Systematic heap growth
   Growing types
   Growing data structure
Analysis:
   fop – application design
   jess – memory leak
   SPECjbb2000 – memory leak
[Figure: bar chart (0 to 5) per benchmark: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, pseudojbb, SPECjbb, antlr, bloat, fop, jython, pmd, ps, xalan. Callouts: SPECjbb, fop, jess.]
SPECjbb2000
[Figure: Heap Occupancy (MB) vs. Time (MB of allocation).]
Slice Diagram: SPECjbb2000
[Figure: slice diagram over the types Order, Orderline, Date, NewOrder, Object[], longBTreeNode, longBTree, longStaticBTree. Legend: Candidate, Non-candidate.]
Types: 1663 (71), Nodes: 318, Edges: 904
Eclipse 3.1.2 on Cork
IDE
   Big, complex, and open-source
Bug repository details known memory leaks and how to reproduce them
#115789: Memory Leak
   Comparing 2 source trees or jar files
   Manually repeat while running Cork
Eclipse 115789
[Figure: Heap Occupancy (MB) vs. Time (MB of allocation).]
Slice Diagram: Eclipse 115789
[Figure: slice diagram over the types Path, Folder, File, ResourceCompareInput$FilteredBufferedResourceNode, ArrayList, Object[], ListenerList, RuleBasedCollator, ResourceCompareInput$MyDiffNode, HashMap$HashEntry, HashMap$HashEntry[], HashMap, HashMap$HashIterator, ResourceCompareInput, ElementTree$ChildIDsCache, ElementTree. Legend: Candidate, Non-candidate.]
Types: 3365 (1773), Nodes: 667, Edges: 4090
Cork’s Contributions
Very low-overhead technique
   <0.5% space overhead
   ~2% time overhead
Accurately identifies
   Systematic heap growth
   Data structure containing the growth
First mechanism for detecting memory leaks in production systems
Thank You!
mjump@cs.utexas.edu
http://www.cs.utexas.edu/~mjump
Second Run Methodology
Replay compilation
   Profiling run chooses hot methods
   Deterministically applies optimizing compiler
   Mixture of optimized & unoptimized code
Measure 2nd run
   First run applies replay compilation
   Turn off compilation
   Flush compiler objects from heap
   Measure second run
Gartner Report predicts that by 2010, 80% of all new software will be in Java or C#

C++                                  Java
Execution efficiency                 Developer productivity
Trusts the programmer                Protects the programmer
Arbitrary memory access possible     Memory access only through objects
Can arbitrarily override types       Type safety
Procedural or object-oriented        Object-oriented
Operator overloading                 Meaning of operators immutable
Powerful capabilities of language    Feature-rich, easy-to-use standard library
Explicit memory control              Automatic memory management
[Wikipedia: Comparison of Java and C++, Dec 2006]
Panacea for Bugs?
PMD, FindBugs, JLint, …
ESC/Java, Bandera, …
HPROF, JProbe, HAT, Leakbot, …
Provide a good start
Programs still ship with memory and semantic errors
Microsoft reports that, even in C#, 75% of development time is spent in debugging
My Research Focus
PROBLEM: Dynamically detect statistical and anomalous per-object behavior in production systems
   Low overhead and high accuracy
SOLUTION: Exploit GC and underlying runtime system
   Focus only on interesting objects
   Find ways to summarize object properties
Outline
Motivation: Programs have bugs
Cork: Dynamic Memory Leak Detection for Garbage-Collected Languages
   Summarize using a type points-from graph
   Interpret the summarization
   Find memory leaks with Cork
How to focus only on interesting objects
Heap summarization with focus
Conclusions and future work
Memory-Related Bugs with GC
Lost Pointer: lose pointer to memory before freeing
   With GC: reclaims automatically
Dangling Pointer: dereferencing pointer to memory previously freed
   With GC: object is live
Unnecessary Reference: keeping pointer to memory no longer needed
   With GC: objects are live, cannot reclaim
Heap Occupancy Graph
[Figure: Heap Occupancy (MB) vs. Time (MB of allocation).]
Related Work
Offline Techniques:
   Static analysis [Heine et al. 03]
   Heap differencing [JProbe, DePauw et al. 98, 99, 00]
   Allocation and/or usage tracking [OptimizeIt, Rationale, Purify, HAT, HPROF, Shaham et al. 00]
Online Techniques:
   Leakbot (partially online) [Mitchell et al. 03]
   Adaptive usage tracking [Chilimbi et al. 04, Bond et al. 06]
Cork accurately pinpoints systematic heap growth completely online
What do we know?
Objects have special properties
   Lifetime, allocation site, last-use site, calling context, thread usage, etc.
Tracking individual object properties is useful for debugging
Can use dynamic object sampling to gather fine-grained object statistics at very low overhead [Jump et al. 04]
Dynamic Object Sampling
Tag objects with special properties
   One bit in the header indicates a tag
   Sample tag encodes object properties
Examples:
   Allocation site
   Last-use site
   Lifetime
   Which data structure
Dynamic Object Sampling
For example, modify a bump-pointer allocator
[Figure: bump-pointer allocation region. Label: Sample Tag.]
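The allocator change can be modeled with a short simulation. It is a sketch under stated assumptions: the tag costs one extra word, every Nth allocation is sampled, and the tag records only the allocation site. `SamplingBumpAllocator` and its fields are hypothetical and do not reflect the MMTk API.

```python
class SamplingBumpAllocator:
    """Toy bump-pointer allocator that attaches a sample tag to some objects."""

    TAG_WORDS = 1  # assumption: the sample tag occupies one extra word

    def __init__(self, capacity_words, sample_every):
        self.cursor = 0
        self.capacity = capacity_words
        self.sample_every = sample_every
        self.count = 0
        self.objects = []  # (address, size_words, tag or None)

    def alloc(self, size_words, site):
        """Allocate size_words; every sample_every-th object also gets a tag."""
        self.count += 1
        sampled = self.count % self.sample_every == 0
        total = size_words + (self.TAG_WORDS if sampled else 0)
        if self.cursor + total > self.capacity:
            raise MemoryError("region exhausted; a real allocator would trigger GC")
        address = self.cursor
        self.cursor += total  # the bump: allocation is a single pointer increment
        tag = {"site": site} if sampled else None
        self.objects.append((address, size_words, tag))
        return address
```

Unsampled objects pay no space cost and take the same fast path as before apart from the sampling check, which is why the sampling rate largely determines the overhead.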
During Garbage Collection
Gather object statistics
Piggyback on object scanning
SAMPLE TAG FOUND!
   1. Examine tag
   2. Collect statistics
[Figure: object scanning during collection. Label: survivors.]
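The collection-time step can be sketched as a loop over surviving objects that checks the header bit and, only for tagged objects, reads the tag and updates statistics. The tuple representation and the `scan_survivors` name are hypothetical, and counting survivors per allocation site is just one possible statistic.

```python
from collections import Counter


def scan_survivors(survivors):
    """Count tagged survivors per allocation site.

    survivors is an iterable of (tagged, tag) pairs, where `tagged` models
    the header bit and `tag` is a dict with a "site" entry when tagged.
    """
    stats = Counter()
    for tagged, tag in survivors:
        if not tagged:
            continue  # header bit clear: ordinary object, no extra work
        stats[tag["site"]] += 1  # examine the tag, then collect statistics
    return stats
```

Since the collector already visits every survivor, the added cost for untagged objects is a single bit test.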
Focus DOS Overhead
Sampling every object
   12% space overhead
   6-7% time overhead
What is interesting depends on the application
   Memory leak detection … candidate types
   Malformed data structures … nodes
   Dynamic pretenuring … random sampling
Focus only on 6% of objects
   0.8% space overhead
   2-3% time overhead
DOS in Cork
Encode allocation site and lifetime for candidates
   <1.3% space overhead, ~4% time overhead
   Find specific allocation sites causing growth
Future work
   Encode last-use site in sample tag
   Requires read/write barrier for candidates
   Will overhead still be low enough for use in production systems?
Conclusions
Developed two synergistic techniques
   Dynamic object sampling
   Points-from graphs
See detailed object characteristics in high-level summarizations
Unique ways to debug software in production systems