Modular Heap Analysis of Higher-Order Programs
Ravichandhran Madhavan (+, *), Ganesan Ramalingam (*), Kapil Vaswani (*)
(*) Microsoft Research India, (+) EPFL, Switzerland
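Goal 1: Analyze Modularly
• Compute succinct summaries for procedures
• Summaries: total functions approximating the relational semantics
[Figure: the summary's concretization $\gamma(\mathit{Summary}_P)$ shown against the procedure's semantics $[P]$, both relating an input state to output states.]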
Goal 2: Track Heap Information
• The summary of a procedure should capture the transformation of the input mutable heap
Goal 3: Analyze HO programs
• Should be able to summarize higher-order procedures
• Input state includes data as well as code
Challenge
• Indirect procedure calls, especially call backs
• Virtual method calls, function pointer calls, lambda expressions
Foo(PTR* p, FP* fp) {
    *p = (**fp)(0);
}
Count() {
    iter = this.iterator();
    i = 0;
    while (iter.HasNext()) {
        iter.next();
        i++;
    }
}
Challenge
• All widely used languages support higher-order constructs
But how do existing modular analyses handle them?
A Common Hack
• Estimate the targets of the indirect calls through an inexpensive analysis, e.g.
  • CHA or RTA for OO programs
  • Lightweight pointer analysis …
• Construct a conservative call graph
• Analyze bottom up
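The three steps of the hack can be sketched as follows. This is a minimal illustration on a made-up class hierarchy, not the tooling used in the talk: the names `SUBCLASSES`, `METHODS`, `CALL_SITES`, `cha_targets`, `build_call_graph` and `bottom_up_order` are all hypothetical. Targets are estimated with class hierarchy analysis, and Tarjan's algorithm yields the strongly connected components callees first.

```python
# Hypothetical toy program; every name below is made up for illustration.
SUBCLASSES = {"Iterator": ["ListIter", "TreeIter"], "ListIter": [], "TreeIter": []}
METHODS = {"Iterator": {"next"}, "ListIter": {"next"}, "TreeIter": {"next"}}
# caller -> virtual call sites, each given as (static receiver type, method)
CALL_SITES = {
    "Count": [("Iterator", "next")],
    "Iterator.next": [],
    "ListIter.next": [],
    "TreeIter.next": [("Iterator", "next")],
}


def cha_targets(static_type, method):
    """CHA: every implementation of `method` in `static_type` or a subclass."""
    targets, work = set(), [static_type]
    while work:
        cls = work.pop()
        if method in METHODS.get(cls, set()):
            targets.add(f"{cls}.{method}")
        work.extend(SUBCLASSES.get(cls, []))
    return targets


def build_call_graph(call_sites):
    """Conservative call graph: each call site gets all of its CHA targets."""
    return {
        caller: set().union(*(cha_targets(t, m) for t, m in sites))
        for caller, sites in call_sites.items()
    }


def bottom_up_order(graph):
    """Tarjan's SCC algorithm; components come out callees first."""
    index, low, stack, on_stack, order = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            order.append(frozenset(scc))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return order


call_graph = build_call_graph(CALL_SITES)
order = bottom_up_order(call_graph)
```

In this toy input `Count` is linked to all three `next` implementations, whichever iterator a client actually passes, so the graph over-approximates the targets.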
Limitations of the Hack
• Over-approximated targets
• A call-graph is necessarily context insensitive for HO programs
[Figure: call graph over nodes A, B, C, D and E, annotated with "A's context" and "B's context".]
Limitations of the Hack
• Inability to construct client independent summaries
Foo(FP* fp) { (*fp)(…); }
m1() { … }
C1() { Foo(m1); }
m2() { … }
C2() { Foo(m2); }
Resolved to m1
Summary:
Limitations of the Hack
• Reuse of summaries is possible only within an analysis
• Need to analyze libraries together with clients
• Need to reanalyze libraries for each new client
Doesn't allow library compositional analysis
Our approach
• Use existing techniques for summarizing first-order code segments:
  • [Whaley, Salcianu, Rinard, OOPSLA '99, VMCAI '04]
  • [Madhavan et al., SAS '11]
• Retain the call backs in the summaries
Our approach
• Perform as much simplification as possible without the knowledge of the calling context
• Eliminate fully resolved calls from the summaries
Enables efficient library compositional analysis
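Illustration
[Figure: control-flow graph over nodes 1 to 7 with edge transformers $\tau_{12}$, $\tau_{13}$, $\tau_{24}$, $\tau_{47}$, $\tau_{56}$, $\tau_{67}$ and an indirect call *fp(a,b).]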
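Illustration
[Figure: the graph after a simplification step: nodes 2 and 4 are gone and a single edge $\tau_{17}$ stands in for $\tau_{12}$, $\tau_{24}$, $\tau_{47}$; the edges $\tau_{13}$, $\tau_{56}$, $\tau_{67}$ and the indirect call *fp(a,b) remain.]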
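Illustration
[Figure: node 6 is gone as well and $\tau_{57}$ stands in for $\tau_{56}$, $\tau_{67}$; the graph now has nodes 1, 3, 5, 7, edges $\tau_{13}$, $\tau_{17}$, $\tau_{57}$ and the indirect call *fp(a,b).]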
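Exploiting Local Context
[Figure: two panels over nodes 1, 3, 5, 7 with the indirect call *fp(a,b). First panel: edges $\tau_{13}$, $\tau_{17}$ and $\tau_{57}$, where $\tau_{57}$ is written as a pair whose second component is $\tau'$. Second panel: edges $\tau_{13}$ and $\tau_{17}$, with $\tau'$ after the call and the other component combined with $\tau_{13}$.]
Frame Rule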
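Exploiting Local Context
[Figure: two panels over nodes 1, 3, 5, 7 with the indirect call *fp(a,b). First panel: edges $\tau_{13}$, $\tau_{17}$ and $\tau_{57}$, where $\tau_{57}$ is written as a pair whose second component is $\tau'$. Second panel: edges $\tau_{13}$ and $\tau'$.]
Frame Rule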
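Flow Insensitive Abstraction
[Figure: two panels over nodes 1, 3, 5, 7 with the indirect call *fp(a,b). One panel labels the edges $\tau_{13}$, $\tau_{17}$, $\tau_{57}$; the other labels every edge with the same transformer $\tau$.]
$\tau = \tau_{13} \sqcup \tau_{17} \sqcup \tau_{57}$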
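One reading of the flow-insensitive abstraction is that the transformers on the individual edges are merged into a single transformer by a pointwise join. The sketch below assumes a deliberately simple domain that is not taken from the talk: an abstract state is a set of heap facts, and the join of two states is their union. The helper `join` and the three example transformers are hypothetical.

```python
def join(*transformers):
    """Pointwise join: run every transformer and take the union of the results."""
    def joined(state):
        return frozenset().union(*(t(state) for t in transformers))
    return joined


# Hypothetical edge transformers; each one adds a single points-to fact.
def tau13(state):
    return frozenset(state) | {"x.f -> o1"}


def tau17(state):
    return frozenset(state) | {"y.g -> o2"}


def tau57(state):
    return frozenset(state) | {"z.h -> o3"}


tau = join(tau13, tau17, tau57)
result = tau(frozenset({"p -> o0"}))
```

The joined transformer no longer records on which edge a fact was produced, which is the precision given up in exchange for a single summary per procedure.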
Flow Insensitive Abstraction
HO summary = First order summary + set of call backs
$(\tau, \{c_1, \ldots, c_n\})$
[Figure: graph over nodes 1 to 4 with every edge labeled $\tau$ and call backs $c_1 \ldots c_n$.]
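As a data structure, a higher-order summary of this shape might look as follows. This is a sketch under assumptions: the type names `State`, `CallBack` and `HOSummary`, the field names, and the example transformer are invented for illustration, and an abstract state is again modeled as a plain set of heap facts.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

State = FrozenSet[str]  # assumption: an abstract state is a set of heap facts


@dataclass(frozen=True)
class CallBack:
    target: str            # expression denoting the callee, e.g. "*fp"
    args: Tuple[str, ...]  # actual arguments, e.g. ("a", "b")


@dataclass(frozen=True)
class HOSummary:
    tau: Callable[[State], State]    # first-order summary
    call_backs: FrozenSet[CallBack]  # indirect calls kept unresolved


def first_order_part(state: State) -> State:
    """Hypothetical effect of the first-order code: one new points-to fact."""
    return frozenset(state) | {"p.f -> o1"}


summary = HOSummary(first_order_part, frozenset({CallBack("*fp", ("a", "b"))}))
```

Keeping the call backs as data, instead of guessing their targets, is what lets the same summary be reused by every client.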
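Composition Operation
[Figure: summary $(\tau_1, S_1)$ of the sequence $s_1; s_2; \ldots; s_k$, drawn over nodes 1 to 4 with every edge labeled $\tau_1$ and its call backs; summary $(\tau_2, S_2)$ of the sequence $s_{k+1}; \ldots; s_n$, drawn over nodes 5 to 8 with every edge labeled $\tau_2$ and its call backs; the two graphs are linked by an ID edge.]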
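Composition Operation
[Figure: $(\tau_1, S_1)$ summarizes the sequence $s_1; s_2; \ldots; s_k$ (nodes 1 to 4) and $(\tau_2, S_2)$ summarizes $s_{k+1}; \ldots; s_n$ (nodes 5 to 8); the two graphs are linked by an ID edge, and the composed transformer is $\tau_2 \circ \tau_1$.]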
Composition Operation
• where , is the composed abstract state
• When the first order summaries (and hence composition) are isotonic:
$(\tau_2, S_2) \circ (\tau_1, S_1) = (\tau_2 \circ \tau_1, S_1 \cup S_2)$
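The isotonic case of the composition rule on this slide can be written down directly: compose the first-order parts and take the union of the call-back sets. The sketch is self-contained and rests on assumptions: `HOSummary` and `compose` are hypothetical names, states are sets of facts, and call backs are plain strings.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

State = FrozenSet[str]  # assumption: an abstract state is a set of heap facts


@dataclass(frozen=True)
class HOSummary:
    tau: Callable[[State], State]  # first-order summary
    call_backs: FrozenSet[str]     # unresolved call backs


def compose(second: HOSummary, first: HOSummary) -> HOSummary:
    """Summary of running `first` and then `second` (isotonic case only)."""
    return HOSummary(
        tau=lambda state: second.tau(first.tau(state)),
        call_backs=first.call_backs | second.call_backs,
    )


# Hypothetical summaries: each adds one fact and keeps one call back.
s1 = HOSummary(lambda s: frozenset(s) | {"x.f -> o1"}, frozenset({"c1"}))
s2 = HOSummary(lambda s: frozenset(s) | {"y.g -> o2"}, frozenset({"c2"}))
composed = compose(s2, s1)
```

Direct calls can be handled with the same operation: composing the caller's summary with the callee's carries the callee's call backs over into the caller.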
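Handling Direct Calls
• Handle direct calls via summary composition
[Figure: the caller's summary and the callee's summary, each a pair of a first-order summary and a set of call backs, are composed.]
Call backs in the callee are inlined in the caller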
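Indirect Call Resolution
[Figure: procedure A has summary $(\tau, \{c_1\})$ over nodes 1, 3, 5, 7; call back $c_1$ resolves to B, whose summary is $(\tau_B, \{c_2\})$; combining $\tau$ with $\tau_B$ gives A the new summary $(\tau_2, \{c_1, c_2\})$.]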
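Indirect Call Resolution
[Figure: A starts with summary $(\tau, \{c_1\})$. A calls B, whose summary is $(\tau_B, \{c_2\})$, giving $(\tau_2, \{c_1, c_2\})$. B calls C, whose summary is $(\tau_C, \{c_3\})$, giving $(\tau_3, \{c_1, c_2, c_3\})$. C calls B and A, giving $(\tau_4, \{c_1, c_2, c_3\})$.]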
Indirect Call Resolution
[Figure: A's summary is refined repeatedly with the summaries $(\tau_B, \{c_2\})$ and $(\tau_C, \{c_3\})$ of B and C: …, $(\tau_{i-1}, S_{i-1})$, $(\tau_i, S_i)$, …]
Fixed point
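The iteration to a fixed point can be sketched on a simplified model. Two assumptions are made here that are not from the talk: a first-order summary is reduced to a set of effect labels, so that combining summaries is a set union, and the targets of each call back are supplied as a table instead of being read off the abstract heap. The function `resolve_to_fixed_point` and all example names are hypothetical.

```python
def resolve_to_fixed_point(effects, call_backs, targets, summaries):
    """Fold in the summaries of resolved targets until nothing changes."""
    effects, call_backs = set(effects), set(call_backs)
    changed = True
    while changed:
        changed = False
        for cb in sorted(call_backs):  # snapshot of the current call backs
            for proc in targets.get(cb, ()):
                callee_effects, callee_call_backs = summaries[proc]
                new_effects = effects | callee_effects
                new_call_backs = call_backs | callee_call_backs
                if new_effects != effects or new_call_backs != call_backs:
                    effects, call_backs = new_effects, new_call_backs
                    changed = True
    return frozenset(effects), frozenset(call_backs)


# Hypothetical input mirroring the slide: A calls B, B calls C, C calls B and A.
summaries = {
    "A": ({"effect_A"}, {"c1"}),
    "B": ({"effect_B"}, {"c2"}),
    "C": ({"effect_C"}, {"c3"}),
}
targets = {"c1": ["B"], "c2": ["C"], "c3": ["B", "A"]}
final_effects, final_call_backs = resolve_to_fixed_point(
    {"effect_A"}, {"c1"}, targets, summaries
)
```

Termination is immediate in this model because both sets only grow and are bounded by the finite contents of `summaries`.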
Eliminating resolved calls
(a) is non-escaping: unreachable from indirect calls and the prestate
(b) and are unreachable from the prestate and other call backs
[Figure: heap graph for Foo and Bar with indirect calls *fp1 and *fp2 and the resolved calls marked.]
Experimental Evaluation
• Applied to Purity/Side-effects Analysis for C# libraries
• Every method is classified as:
  • Pure: no side-effects
  • Conditionally Pure: purity depends on the calling context
  • Impure: has side-effects
  • Impure and Incomplete: has side-effects and can have more depending on the calling context
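The four classes can be read off a summary with two yes/no questions. The sketch below assumes, as a simplification not stated in the talk, that "depends on the calling context" means the summary still contains unresolved call backs; `Purity` and `classify` are hypothetical names.

```python
from enum import Enum


class Purity(Enum):
    PURE = "Pure"
    CONDITIONALLY_PURE = "Conditionally Pure"
    IMPURE = "Impure"
    IMPURE_AND_INCOMPLETE = "Impure and Incomplete"


def classify(side_effects, unresolved_call_backs):
    """Classify a method from its known side-effects and leftover call backs."""
    has_effects = bool(side_effects)
    incomplete = bool(unresolved_call_backs)  # assumption: context dependence
    if has_effects:
        return Purity.IMPURE_AND_INCOMPLETE if incomplete else Purity.IMPURE
    return Purity.CONDITIONALLY_PURE if incomplete else Purity.PURE
```

For example, `classify(set(), {"*fp"})` yields `Purity.CONDITIONALLY_PURE`: nothing is written by the method itself, but the call back may write once a client supplies its target.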
Experimental Results
Benchmark LOC Pure C-Pure Impure I-Impure Time
DocX 10K ~ 1 min
FB APIs 2.2% 32%
Data Disp. 57%
Test APIs
Json Libs
Quickgraph
Refactory libs 30% 8%
Utility Libs 32% 8%
PDF libs 28.4%
GPS libs 250K ~ 2 hrs
10 – 20%
15 – 30%
20 – 30%
2 – 27 min
Analysis Statistics
Benchmark | Unresolved Calls | Non Escaping Abs. Objects
DocX
FB APIs 9%
Data Disp.
Test APIs
Json Libs 7.3
Quickgraph
Refactory libs
Utility Libs
PDF libs 37%
GPS libs 5.9
2 – 4
10 – 33%
Comparison with CHA CG based Bottom up Analysis
Benchmark Time # of SCCs Avg. SCC size
DocX 12x 0 NA
FB APIs 11x 3x 1.5x
Data Disp. 6x 6x
Test APIs 6x 2x 1.25x
Json Libs 2x 6x
Quickgraph 11x 33x
Refactory libs 1.4x 5.6x
Utility Libs 30x 4x 12x
PDF libs 2x 3.5x 1.5x
Conclusion
• A principled approach
• Formalized as an Abstract Interpretation
• A generic theory agnostic to the underlying compositional heap analysis
• Go to www.rise4fun.com/seal for a hands-on experience