Upload
dinhhuong
View
215
Download
1
Embed Size (px)
Citation preview
cs5
10
So
ftware
En
gin
eerin
g
bddbddb System Overview
Joeqfrontend
Java bytecode
Datalogprogram
Input relations
Output relations
cs5
10
So
ftware
En
gin
eerin
g
Compiler Frontend
Convert IR into tuplesTuples format:# V0:16 F0:11 V1:160 0 10 1 21470 0 1464
header line
one tuple per line
cs5
10
So
ftware
En
gin
eerin
g
Compiler Frontend
Robust frontends:Joeq compilerSoot compilerSUIF compiler (for C code)
Still experimental:Eclipse frontendgcc frontend…
cs5
10
So
ftware
En
gin
eerin
g
Extracting Relations
Idea: Iterate thru compiler IR, numbering and dumping relations of interest.
TypesMethodsFieldsVariables…
cs5
10
So
ftware
En
gin
eerin
g
joeq.Main.GenRelations
Generate initial relations for points-to analysis.Does initial pass to discover call graph.
Options:-fly: dump on-the-fly call graph info-cs : dump context-sensitive info-ssa : dump SSA representation-partial : no call graph discovery-Dpa.dumppath= : where to save files-Dpa.icallgraph= : location of initial call graph-Dpa.dumpdotgraph : dump call graph in dot
cs5
10
So
ftware
En
gin
eerin
g
Format of a Datalog file
DomainsName Size ( map file )V 65536 var.mapH 32768
RelationsName ( <attribute list> ) flagsStore (v1 : V, f : F, v2 : V) inputPointsTo (v : V, h : H) input, output
RulesHead :- Body .PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h).
cs5
10
So
ftware
En
gin
eerin
g
Cloning-Based Solution
Simple brute force technique.Clone every path through the call graph.Run context-insensitive algorithm on expanded call graph.
The catch: exponential blowup
cs5
10
So
ftware
En
gin
eerin
g
Recursion
Actually, cloning is unbounded in the presence of recursive cycles.Technique: We treat all methods within a strongly-connected component as a single node.
cs5
10
So
ftware
En
gin
eerin
g
Top 20 Sourceforge Java Apps
Number of Clones
1.E+001.E+021.E+041.E+061.E+081.E+101.E+121.E+141.E+16
1000 10000 100000 1000000Size of program (variable nodes)
Num
ber o
f clo
nes
1016
1012
108
104
100
cs5
10
So
ftware
En
gin
eerin
g
Cloning is infeasible (?)
Typical large program has ~1014 paths.If you need 1 byte to represent a clone:
Would require 256 terabytes of storage>12 times size of Library of CongressRegistered ECC 1GB DIMMs: $41.7 million
Power: 96.4 kilowatts = Power for 128 homes500 GB hard disks: 564 x $195 = $109,980
Time to read sequential: 70.8 days
cs5
10
So
ftware
En
gin
eerin
g
Key Insight
There are many similarities across contexts.Many copies of nearly-identical results.
BDDs can represent large sets of redundant data efficiently.
Need a BDD encoding that exploits the similarities.
cs5
10
So
ftware
En
gin
eerin
g
Context-sensitive Pointer Analysis Algorithm
1. First, do context-insensitive pointer analysis to get call graph.
2. Number clones.3. Do context-insensitive algorithm on the
cloned graph.
Results explicitly generated for every clone.Individual results retrievable with Datalog query.
cs5
10
So
ftware
En
gin
eerin
g
Expanded Call Graph
A
DB C
E
F G
H
A0
D0B0 C0
E1
F2 G0
H0
E0 E2
F0 F1 G2G1
H1 H2 H3 H4 H5
cs5
10
So
ftware
En
gin
eerin
g
Numbering Clones
A
DB C
E
F G
H
0 0 0
0 1 2
0-2 0-2
0-2 3-5
0A0
D0B0 C0
E1
F2 G0
H0
E0 E2
F0 F1 G2G1
H1 H2 H3 H4 H5