22
Program Analysis in Datalog Using the tool CS 510/08

Program Analysis in Datalog - Purdue · PDF fileDoes initial pass to discover call graph. Options:-fly: dump on-the-fly call graph info ... Individual results retrievable with Datalog

Embed Size (px)

Citation preview

Program Analysis in Datalog

Using the tool

CS 510/08

cs5

10

So

ftware

En

gin

eerin

g

bddbddb System Overview

Joeqfrontend

Java bytecode

Datalogprogram

Input relations

Output relations

cs5

10

So

ftware

En

gin

eerin

g

Compiler Frontend

Convert IR into tuplesTuples format:# V0:16 F0:11 V1:160 0 10 1 21470 0 1464

header line

one tuple per line

cs5

10

So

ftware

En

gin

eerin

g

Compiler Frontend

Robust frontends:Joeq compilerSoot compilerSUIF compiler (for C code)

Still experimental:Eclipse frontendgcc frontend…

cs5

10

So

ftware

En

gin

eerin

g

Extracting Relations

Idea: Iterate thru compiler IR, numbering and dumping relations of interest.

TypesMethodsFieldsVariables…

cs5

10

So

ftware

En

gin

eerin

g

joeq.Main.GenRelations

Generate initial relations for points-to analysis.Does initial pass to discover call graph.

Options:-fly: dump on-the-fly call graph info-cs : dump context-sensitive info-ssa : dump SSA representation-partial : no call graph discovery-Dpa.dumppath= : where to save files-Dpa.icallgraph= : location of initial call graph-Dpa.dumpdotgraph : dump call graph in dot

cs5

10

So

ftware

En

gin

eerin

g

Format of a Datalog file

DomainsName Size ( map file )V 65536 var.mapH 32768

RelationsName ( <attribute list> ) flagsStore (v1 : V, f : F, v2 : V) inputPointsTo (v : V, h : H) input, output

RulesHead :- Body .PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h).

cs5

10

So

ftware

En

gin

eerin

g

Demo

cs5

10

So

ftware

En

gin

eerin

g

Program Analysis in Datalog

Context Sensitivity

CS 510/08

cs5

10

So

ftware

En

gin

eerin

g

Existing Solution

Call strings based

cs5

10

So

ftware

En

gin

eerin

g

Cloning-Based Solution

Simple brute force technique.Clone every path through the call graph.Run context-insensitive algorithm on expanded call graph.

The catch: exponential blowup

cs5

10

So

ftware

En

gin

eerin

g

Cloning is exponential!

cs5

10

So

ftware

En

gin

eerin

g

Recursion

Actually, cloning is unbounded in the presence of recursive cycles.Technique: We treat all methods within a strongly-connected component as a single node.

cs5

10

So

ftware

En

gin

eerin

g

Recursion

A

G

B C D

E F

A

G

B C D

E F E F E F

G G

cs5

10

So

ftware

En

gin

eerin

g

Top 20 Sourceforge Java Apps

Number of Clones

1.E+001.E+021.E+041.E+061.E+081.E+101.E+121.E+141.E+16

1000 10000 100000 1000000Size of program (variable nodes)

Num

ber o

f clo

nes

1016

1012

108

104

100

cs5

10

So

ftware

En

gin

eerin

g

Cloning is infeasible (?)

Typical large program has ~1014 paths.If you need 1 byte to represent a clone:

Would require 256 terabytes of storage>12 times size of Library of CongressRegistered ECC 1GB DIMMs: $41.7 million

Power: 96.4 kilowatts = Power for 128 homes500 GB hard disks: 564 x $195 = $109,980

Time to read sequential: 70.8 days

cs5

10

So

ftware

En

gin

eerin

g

Key Insight

There are many similarities across contexts.Many copies of nearly-identical results.

BDDs can represent large sets of redundant data efficiently.

Need a BDD encoding that exploits the similarities.

cs5

10

So

ftware

En

gin

eerin

g

Context-sensitive Pointer Analysis Algorithm

1. First, do context-insensitive pointer analysis to get call graph.

2. Number clones.3. Do context-insensitive algorithm on the

cloned graph.

Results explicitly generated for every clone.Individual results retrievable with Datalog query.

cs5

10

So

ftware

En

gin

eerin

g

Counting rule

0<=i<=k-1

cs5

10

So

ftware

En

gin

eerin

g

Expanded Call Graph

A

DB C

E

F G

H

A0

D0B0 C0

E1

F2 G0

H0

E0 E2

F0 F1 G2G1

H1 H2 H3 H4 H5

cs5

10

So

ftware

En

gin

eerin

g

Numbering Clones

A

DB C

E

F G

H

0 0 0

0 1 2

0-2 0-2

0-2 3-5

0A0

D0B0 C0

E1

F2 G0

H0

E0 E2

F0 F1 G2G1

H1 H2 H3 H4 H5

cs5

10

So

ftware

En

gin

eerin

g

Context Sensitive PointsTo

vP0(v,h) means there is an invocation site h that assigns a newly allocated object to variable v