On Abstraction Refinement for Program Analyses in Datalog

Xin Zhang, Ravi Mangal, Mayur Naik (Georgia Tech)
Radu Grigore, Hongseok Yang (Oxford University)

Page 1: On Abstraction Refinement for Program Analyses in  Datalog

On Abstraction Refinement for Program Analyses in Datalog

Xin Zhang, Ravi Mangal, Mayur Naik (Georgia Tech)

Radu Grigore, Hongseok Yang (Oxford University)

Page 2: On Abstraction Refinement for Program Analyses in  Datalog

2 6/10/2014

Datalog for program analysis

Datalog

Programming Language Design and Implementation, 2014

Page 3: On Abstraction Refinement for Program Analyses in  Datalog

What is Datalog?

Datalog

Page 4: On Abstraction Refinement for Program Analyses in  Datalog

What is Datalog?

Input relations: edge(i, j).
Output relations: path(i, j).
Rules:
(1) path(i, i).
(2) path(i, k) :- path(i, j), edge(j, k).

Least fixpoint computation:
Input: edge(0, 1), edge(1, 2).
path(0, 0).
path(1, 1).
path(2, 2).
path(0, 1) :- path(0, 0), edge(0, 1).
path(0, 2) :- path(0, 1), edge(1, 2).
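The least fixpoint computation above can be sketched in a few lines of Python. This is only an illustrative naive evaluator for the two path rules, not how a Datalog engine is implemented; the function name least_fixpoint and the explicit node set are assumptions made for the example.

```python
def least_fixpoint(nodes, edges):
    """Naively evaluate:
         (1) path(i, i).
         (2) path(i, k) :- path(i, j), edge(j, k).
    """
    path = {(i, i) for i in nodes}  # rule (1)
    while True:
        # rule (2): extend every known path by one edge
        derived = {(i, k) for (i, j) in path for (j2, k) in edges if j == j2}
        if derived <= path:  # nothing new: the least fixpoint is reached
            return path
        path |= derived


paths = least_fixpoint({0, 1, 2}, {(0, 1), (1, 2)})
print(sorted(paths))
```

The loop stops as soon as one more application of rule (2) adds no tuple, which is what "least fixpoint" means here.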

Page 5: On Abstraction Refinement for Program Analyses in  Datalog

Why Datalog?

Datalog

If there exists a path from a to b, and there is an edge from b to c, then there exists a path from a to c:

path(a, c) :- path(a, b), edge(b, c).

Page 6: On Abstraction Refinement for Program Analyses in  Datalog

Why Datalog?

k-object-sensitivity, k = 2, ~100 KLOC

Page 7: On Abstraction Refinement for Program Analyses in  Datalog

Limitation

k-object-sensitivity, k = 2, ~100 KLOC

k-object-sensitivity, k = 10, ~500 KLOC

Page 8: On Abstraction Refinement for Program Analyses in  Datalog

Program abstraction

[Figure labels: Precision, Scalability, Abstraction]

Page 9: On Abstraction Refinement for Program Analyses in  Datalog

Parametric program abstraction

[Figure labels: Precision, Scalability, Abstraction; parameter vector 1 1 1 1 1]

Page 10: On Abstraction Refinement for Program Analyses in  Datalog

Parametric program abstraction

[Figure labels: Precision, Scalability, Abstraction; parameter vector 1 0 1 1 0]

Page 11: On Abstraction Refinement for Program Analyses in  Datalog

Parametric program abstraction: Example 1

Pointer analysis: cloning depth K for each call site and allocation site.

Parameter vector: 1 0 1 1 0

Page 12: On Abstraction Refinement for Program Analyses in  Datalog

Parametric program abstraction: Example 2

Shape analysis: predicates to use as abstraction predicates.

Parameter vector: 1 0 1 1 0

Page 13: On Abstraction Refinement for Program Analyses in  Datalog

Program abstraction

[Figure: parameter vectors 1 0 1 1 0 and 0 1 1 0 0, each shown with a Datalog program; queries alias(p, q)? and alias(m, n)?]

Page 14: On Abstraction Refinement for Program Analyses in  Datalog

Program abstraction

[Figure: parameter vectors 1 0 1 1 0 and 0 1 1 0 0, each shown with a Datalog program; queries alias(p, q)? and alias(m, n)?]

Counterexample guided refinement (CEGAR) via MAXSAT

Page 15: On Abstraction Refinement for Program Analyses in  Datalog

Pointer analysis example

f() {
  v1 = new ...;
  v2 = id1(v1);
  v3 = id2(v2);
  q2: assert(v3 != v1);
}

id1(v) { return v; }

g() {
  v4 = new ...;
  v5 = id1(v4);
  v6 = id2(v5);
  q1: assert(v6 != v1);
}

id2(v) { return v; }

Page 16: On Abstraction Refinement for Program Analyses in  Datalog

Pointer analysis as graph reachability

[Figure: graph over nodes 0-7 and clones 6’, 6’’, 7’, 7’’, with edges labeled a0, a1, b0, b1, c0, c1, d0, d1.]

Page 17: On Abstraction Refinement for Program Analyses in  Datalog

Graph reachability in Datalog

Input relations: edge(i, j, n), abs(n)
Output relations: path(i, j)
Rules:
(1) path(i, i).
(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).

[Figure: graph over nodes 0-7 and clones 6’, 6’’, 7’, 7’’, with edges labeled a0, a1, b0, b1, c0, c1, d0, d1.]

Input tuples:
edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0), ...
abs(a0) ⊕ abs(a1), abs(b0) ⊕ abs(b1), abs(c0) ⊕ abs(c1), abs(d0) ⊕ abs(d1).

Query tuple       Original query
q1: path(0, 5)    assert(v6 != v1)
q2: path(0, 2)    assert(v3 != v1)

16 possible abstractions in total.
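The count of 16 follows if each of the four sites a, b, c, d contributes exactly one of its two abs tuples (2^4 choices). A small Python sketch under that assumption; the names SITES and abs_tuples are made up for illustration.

```python
from itertools import product

SITES = ["a", "b", "c", "d"]  # assumed: one binary choice per site


def abs_tuples(bits):
    """Map a parameter vector such as (1, 0, 1, 0) to {a1, b0, c1, d0}."""
    return {f"{site}{bit}" for site, bit in zip(SITES, bits)}


# Every parameter vector over four sites yields one abstraction.
abstractions = [abs_tuples(bits) for bits in product((0, 1), repeat=len(SITES))]
print(len(abstractions))
```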

Page 18: On Abstraction Refinement for Program Analyses in  Datalog

Desired result

[Figure: graph over nodes 0-7 and clones 6’, 6’’, 7’, 7’’, with edges labeled a0, a1, b0, b1, c0, c1, d0, d1.]

Input relations: edge(i, j, n), abs(n)
Output relations: path(i, j)
Rules:
(1) path(i, i).
(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).

Input tuples:
edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0), ...
abs(a0) ⊕ abs(a1), abs(b0) ⊕ abs(b1), abs(c0) ⊕ abs(c1), abs(d0) ⊕ abs(d1).

Query             Answer
q1: path(0, 5)    a1b0c1d0
q2: path(0, 2)    Impossibility

Page 19: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 1

[Figure: graph over nodes 0-7 and clones 6’, 6’’, 7’, 7’’, with edges labeled a0, a1, b0, b1, c0, c1, d0, d1.]

abs(a0) ⊕ abs(a1), abs(b0) ⊕ abs(b1), abs(c0) ⊕ abs(c1), abs(d0) ⊕ abs(d1).

Query             Eliminated abstractions
q1: path(0, 5)
q2: path(0, 2)

Derivation:
path(0, 0).
path(0, 6) :- path(0, 0), edge(0, 6, a0), abs(a0).
path(0, 1) :- path(0, 6), edge(6, 1, a0), abs(a0).
path(0, 7) :- path(0, 1), edge(1, 7, c0), abs(c0).
path(0, 2) :- path(0, 7), edge(7, 2, c0), abs(c0).
path(0, 4) :- path(0, 6), edge(6, 4, b0), abs(b0).
path(0, 7) :- path(0, 4), edge(4, 7, d0), abs(d0).
path(0, 5) :- path(0, 7), edge(7, 5, d0), abs(d0).
…
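As a rough illustration of this derivation, the sketch below evaluates the parametric rule path(i, j) :- path(i, k), edge(k, j, n), abs(n) from node 0. It uses only the edges that appear in the derivation steps above, so it is a fragment of the example graph rather than the full one; the function name reachable_from is hypothetical.

```python
# Edges (k, j, n) taken from the derivation steps of iteration 1 only.
EDGES = {
    (0, 6, "a0"), (6, 1, "a0"), (1, 7, "c0"), (7, 2, "c0"),
    (6, 4, "b0"), (4, 7, "d0"), (7, 5, "d0"),
}


def reachable_from(source, edges, abs_set):
    """Nodes j with path(source, j), where an edge is usable only if abs(n) holds."""
    reached = {source}  # path(source, source)
    changed = True
    while changed:
        changed = False
        for (k, j, n) in edges:
            if k in reached and n in abs_set and j not in reached:
                reached.add(j)
                changed = True
    return reached


reached = reachable_from(0, EDGES, {"a0", "b0", "c0", "d0"})
print(5 in reached, 2 in reached)
```

Under the abstraction a0 b0 c0 d0 both query tuples path(0, 5) and path(0, 2) are derived, so neither query is proven in this iteration.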

Page 20: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 1 - derivation graph

[Figure: graph over nodes 0-7 and clones 6’, 6’’, 7’, 7’’, with edges labeled a0, a1, b0, b1, c0, c1, d0, d1.]

abs(a0) ⊕ abs(a1), abs(b0) ⊕ abs(b1), abs(c0) ⊕ abs(c1), abs(d0) ⊕ abs(d1).

Query             Eliminated abstractions
q1: path(0, 5)
q2: path(0, 2)

Page 21: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 1 - derivation graph

[Figure: derivation graph with the following steps]
path(0, 0), edge(0, 6, a0), abs(a0) => path(0, 6)
path(0, 6), edge(6, 1, a0), abs(a0) => path(0, 1)
path(0, 6), edge(6, 4, b0), abs(b0) => path(0, 4)
path(0, 1), edge(1, 7, c0), abs(c0) => path(0, 7)
path(0, 4), edge(4, 7, d0), abs(d0) => path(0, 7)
path(0, 7), edge(7, 2, c0), abs(c0) => path(0, 2)
path(0, 7), edge(7, 5, d0), abs(d0) => path(0, 5)

Page 22: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 1 - derivation graph

[Figure: the same derivation graph as on the preceding slide, annotated with a0c0.]

Page 23: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 1 - derivation graph

[Figure: the same derivation graph as on the preceding slide, annotated with a0c0.]

Page 24: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 1 - derivation graph

[Figure: the same derivation graph as on the preceding slide, annotated with a0b0d0.]

Page 25: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 1 - derivation graph

[Figure: graph over nodes 0-7 and clones 6’, 6’’, 7’, 7’’, with edges labeled a0, a1, b0, b1, c0, c1, d0, d1.]

abs(a0) ⊕ abs(a1), abs(b0) ⊕ abs(b1), abs(c0) ⊕ abs(c1), abs(d0) ⊕ abs(d1).

Query             Eliminated abstractions
q1: path(0, 5)    a0c0d0, a0b0d0 (4/16)
q2: path(0, 2)    a0c0 (4/16)
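One way to see where a0c0d0, a0b0d0 and a0c0 come from is to track, for every derived tuple, the minimal sets of abs tuples that suffice to derive it. The Python sketch below does this over the edges that occur in the iteration 1 derivation; it is an illustration of the idea, not the procedure used by the authors, and the helper names minimize and minimal_abs_sets are assumptions.

```python
# Edges (k, j, n) that occur in the iteration 1 derivation.
EDGES = {
    (0, 6, "a0"), (6, 1, "a0"), (1, 7, "c0"), (7, 2, "c0"),
    (6, 4, "b0"), (4, 7, "d0"), (7, 5, "d0"),
}


def minimize(sets):
    """Keep only the abs-sets that do not strictly contain another one."""
    return {s for s in sets if not any(t < s for t in sets)}


def minimal_abs_sets(source, edges):
    """For each node j, the minimal abs-sets that derive path(source, j)."""
    support = {source: {frozenset()}}  # path(source, source) needs no abs tuple
    changed = True
    while changed:
        changed = False
        for (k, j, n) in edges:
            if k not in support:
                continue
            # Extending a derivation of path(source, k) by this edge requires abs(n).
            extended = {s | {n} for s in support[k]}
            merged = minimize(support.get(j, set()) | extended)
            if merged != support.get(j):
                support[j] = merged
                changed = True
    return support


support = minimal_abs_sets(0, EDGES)
print(sorted("".join(sorted(s)) for s in support[5]))
print(sorted("".join(sorted(s)) for s in support[2]))
```

Any abstraction that contains one of these sets still derives the query tuple, so it cannot prove the query and can be eliminated.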

Page 26: On Abstraction Refinement for Program Analyses in  Datalog

Encoded as MAXSAT

MAXSAT: find an assignment that maximizes the total weight of the satisfied soft constraints, subject to satisfying all hard constraints.

Page 27: On Abstraction Refinement for Program Analyses in  Datalog

Encoded as MAXSAT

abs(a0) ⊕ abs(a1), abs(b0) ⊕ abs(b1), abs(c0) ⊕ abs(c1), abs(d0) ⊕ abs(d1).

Hard constraints: avoid all the counterexamples.

Soft constraints: minimize the abstraction cost.

Page 28: On Abstraction Refinement for Program Analyses in  Datalog

Encoded as MAXSAT

Hard constraints:

Soft constraints:

Query             Eliminated abstractions
q1: path(0, 5)    a0c0d0, a0b0d0 (4/16)
q2: path(0, 2)    a0c0 (4/16)

Solution: a1c0d0
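The choice of a1c0d0 can be reproduced with a brute-force stand-in for a MAXSAT solver. The sketch below makes several assumptions that are not stated on the slide: the counterexamples of both queries are combined, exactly one abs tuple is chosen per site, and each site has one soft clause abs(x0) of weight 1 (preferring the cheaper setting). The function name solve is hypothetical.

```python
from itertools import product

SITES = ["a", "b", "c", "d"]
# Eliminated abstractions from iteration 1 (q1 and q2 combined, an assumption).
COUNTEREXAMPLES = [{"a0", "c0", "d0"}, {"a0", "b0", "d0"}, {"a0", "c0"}]


def solve(counterexamples):
    """Return (soft weight satisfied, abstraction) for the best assignment."""
    best = None
    for bits in product((0, 1), repeat=len(SITES)):
        chosen = {f"{s}{b}" for s, b in zip(SITES, bits)}
        # Hard constraints: the abstraction must avoid every counterexample.
        if any(cex <= chosen for cex in counterexamples):
            continue
        # Soft constraints: one unit-weight clause abs(x0) per site.
        satisfied = sum(1 for s in SITES if f"{s}0" in chosen)
        if best is None or satisfied > best[0]:
            best = (satisfied, chosen)
    return best


weight, abstraction = solve(COUNTEREXAMPLES)
print(weight, sorted(abstraction))
```

Under these assumptions the only abstraction that satisfies three of the four soft clauses is a1 b0 c0 d0, which agrees with the solution a1c0d0 shown above (b is left at b0).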

Page 29: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 2 and beyond

[Figure: loop between the MAXSAT solver and the Datalog solver. Labels: abstraction a1c0d0; Iteration 1 derivation; constraints C1, C1 ∧ ...]

Query             Answer    Eliminated abstractions
q1: path(0, 5)              a0c0d0, a0b0d0, a1d0 (4/16)
q2: path(0, 2)              a0c0, a1c0 (4/16)

Page 30: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 2 and beyond

[Figure: loop between the MAXSAT solver and the Datalog solver. Labels: abstraction a1c0d0; Iteration 2 derivation; constraints C1, C1 ∧ ...]

Query             Answer    Eliminated abstractions
q1: path(0, 5)              a0c0d0, a0b0d0, a1d0 (4/16)
q2: path(0, 2)              a0c0, a1c0 (4/16)

Page 31: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 2 and beyond

[Figure: loop between the MAXSAT solver and the Datalog solver. Labels: abstraction a1c0d0; Iteration 2 derivation; constraints C1]

Query             Answer    Eliminated abstractions
q1: path(0, 5)              a0c0d0, a0b0d0, a1d0 (4/16)
q2: path(0, 2)              a0c0 (4/16)

Page 32: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 2 and beyond

[Figure: loop between the MAXSAT solver and the Datalog solver. Labels: abstraction a1c1d0; Iteration 2 derivation; constraints C1]

Query             Answer    Eliminated abstractions
q1: path(0, 5)              a0c0d0, a0b0d0, a1c0d0 (6/16)
q2: path(0, 2)              a0c0, a1c0 (8/16)

Page 33: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 2 and beyond

[Figure: loop between the MAXSAT solver and the Datalog solver. Labels: abstraction a1c1d0; Iteration 3 derivation; constraints C1]

Query             Answer    Eliminated abstractions
q1: path(0, 5)    a1c1d0    a0c0d0, a0b0d0, a1c0d0 (6/16)
q2: path(0, 2)              a0c0, a1c0 (8/16)

q1 is proven.

Page 34: On Abstraction Refinement for Program Analyses in  Datalog

Iteration 2 and beyond

[Figure: loop between the MAXSAT solver and the Datalog solver. Labels: abstraction a1c1d0; Iteration 3 derivation; constraints C1]

Query             Answer           Eliminated abstractions
q1: path(0, 5)    a1c1d0           a0c0d0, a0b0d0, a1c0d0 (6/16)
q2: path(0, 2)    Impossibility    a0c0, a1c0, a1c1, a0c1 (16/16)

q1 is proven.

q2 is impossible to prove.

Page 35: On Abstraction Refinement for Program Analyses in  Datalog

Mixing counterexamples

Eliminated abstractions:
  Iteration 1: a0c0
  Iteration 3: a1c1

Page 36: On Abstraction Refinement for Program Analyses in  Datalog

Mixing counterexamples

Eliminated abstractions:
  Iteration 1: a0c0
  Iteration 3: a1c1
  Mixed: a0c1

Page 37: On Abstraction Refinement for Program Analyses in  Datalog

Experimental setup

Implemented in JChord using off-the-shelf solvers:
  Datalog: bddbddb
  MAXSAT: MiFuMaX

Applied to two analyses that are challenging to scale:
  k-object-sensitivity pointer analysis: flow-insensitive, weak updates, cloning-based
  typestate analysis: flow-sensitive, strong updates, summary-based

Evaluated on 8 Java programs from DaCapo and Ashes.

Page 38: On Abstraction Refinement for Program Analyses in  Datalog

Benchmark characteristics

              classes   methods   bytecode (KB)   KLOC
toba-s        1K        6K        423             258
javasrc-p     1K        6.5K      434             265
weblech       1.2K      8K        504             326
hedc          1K        7K        442             283
antlr         1.1K      7.7K      532             303
luindex       1.3K      7.9K      508             295
lusearch      1.2K      8K        511             314
schroeder-m   1.9K      12K       708             460

Page 39: On Abstraction Refinement for Program Analyses in  Datalog

Results: pointer analysis

              queries   resolved    resolved     abstraction size   abstraction size   iterations
              (total)   (current)   (baseline)   (final)            (max)
toba-s        7         7           0            170                18K                10
javasrc-p     46        46          0            470                18K                13
weblech       5         5           2            140                31K                10
hedc          47        47          6            730                29K                18
antlr         143       143         5            970                29K                15
luindex       138       138         67           1K                 40K                26
lusearch      322       322         29           1K                 39K                17
schroeder-m   51        51          25           450                58K                15

Slide annotations: 4-object-sensitivity; < 50%; < 3% of max.

Page 40: On Abstraction Refinement for Program Analyses in  Datalog

Performance of Datalog: pointer analysis

[Figure: Datalog running time on lusearch. Baseline labels: k = 1, 153s; k = 2, 214s; k = 3, 590s; k = 4, 3h28m.]

Page 41: On Abstraction Refinement for Program Analyses in  Datalog

Performance of MAXSAT: pointer analysis

[Figure: MAXSAT performance on lusearch.]

Page 42: On Abstraction Refinement for Program Analyses in  Datalog

Statistics of MAXSAT formulae (pointer analysis)

              variables   clauses
toba-s        0.7M        1.5M
javasrc-p     0.5M        0.9M
weblech       1.6M        3.3M
hedc          1.2M        2.7M
antlr         3.6M        6.9M
luindex       2.4M        5.6M
lusearch      2.1M        5M
schroeder-m   6.7M        23.7M

Page 43: On Abstraction Refinement for Program Analyses in  Datalog

Conclusion

[Figure labels: Abstraction, Datalog, MAXSAT]

Page 44: On Abstraction Refinement for Program Analyses in  Datalog

Conclusion

Datalog                            MAXSAT
Soundness                          Hard constraints
  A(x, y) :- B(x, z), C(z, y)        A(x, y) ∨ ¬B(x, z) ∨ ¬C(z, y)
Tradeoffs                          Soft constraints
  Scalability vs. Precision          abs(a0), weight 1
  Sound vs. Complete
  1 0 1 1 0
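The correspondence between a rule A(x, y) :- B(x, z), C(z, y) and the clause A(x, y) ∨ ¬B(x, z) ∨ ¬C(z, y) can be sketched for ground rule instances as follows. The clause representation and the helper names rule_to_clause and show are illustrative assumptions, not the encoding used in the implementation.

```python
def rule_to_clause(head, body):
    """Encode head :- body as the clause head OR (NOT b) for every body atom b.

    A clause is a list of (atom, polarity) pairs; polarity False means negated.
    """
    return [(head, True)] + [(atom, False) for atom in body]


def show(clause):
    """Render a clause in the notation used on the slide."""
    return " ∨ ".join(atom if positive else f"¬{atom}" for atom, positive in clause)


# One ground instance of rule (2) from the iteration 1 derivation.
clause = rule_to_clause("path(0,6)", ["path(0,0)", "edge(0,6,a0)", "abs(a0)"])
print(show(clause))
```

The clause is falsified only when every body atom holds and the head does not, which is the same condition under which the rule is violated.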