Control Flow Analysis

Mooly Sagiv

http://www.math.tau.ac.il/~sagiv/courses/pa.html

Tel Aviv University

640-6706

Sunday 18-21 Schreiber 8

Monday 10-12 Schreiber 317

Textbook: Chapter 3 (Simplified + OO)

Goals

Understand the problem of Control Flow Analysis
– In Functional Languages
– In Object Oriented Languages
– Function Pointers

Learn Constraint Based Program Analysis Technique
– General
– Usage for Control Flow Analysis
– Algorithms
– Systems

Similarities between Problems & Techniques

Outline
– A Motivating Example (OO)
– The Control Flow Analysis Problem
– A Formal Specification
– Set Constraints
– Solving Constraints
– Adding Dataflow Information
– Adding Context Information
– Back to the Motivating Example
– Conclusions

A Motivating Example

    class Vehicle extends Object {
        int position = 10;
        void move(x1 : int) {
            position = position + x1;
        }
    }

    class Car extends Vehicle {
        int passengers;
        void await(v : Vehicle) {
            if (v.position < position)
                then v.move(position - v.position);
                else self.move(10);
        }
    }

    class Truck extends Vehicle {
        void move(x2 : int) {
            if (x2 < 55)
                position = position + x2;
        }
    }

    void main {
        Car c; Truck t; Vehicle v1;
        new c; new t;
        v1 := c;
        c.passengers := 2;
        c.move(60);
        v1.move(70);
        c.await(t);
    }

The Control Flow Analysis (CFA) Problem

Given a program in a functional programming language with higher-order functions (functions can serve as parameters and return values)

Find out for each function invocation which functions may be applied

– Obvious in C without function pointers
– Difficult in C++, Java and ML
– The Dynamic Dispatch Problem

An ML Example

    let val f = fn x => x 1
        val g = fn y => y + 2
        val h = fn z => z + 3
    in (f g) + (f h) end

An ML Example (annotated with the functions that may be applied)

    let val f = fn x => (* {g, h} *) x 1
        val g = fn y => y + 2
        val h = fn z => z + 3
    in (f g) + (f h) end
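The annotation {g, h} can be observed dynamically. The following Python sketch transliterates the ML program and records which functions reach the call x 1. The `observed` set and the choice of Python are illustrative assumptions and are not part of the slides; a static analysis must predict this set without running the program.

```python
# Python transliteration of the ML example (illustrative only).
observed = set()  # names of the functions that reach the call site `x 1`

def g(y):
    return y + 2

def h(z):
    return z + 3

def f(x):
    observed.add(x.__name__)  # record the callee before applying it
    return x(1)

result = f(g) + f(h)
```

Both g and h are applied at the single call site inside f, which is why the annotation must contain both.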

The Language FUN

Notations
– e ∈ Exp // expressions (or labeled terms)
– t ∈ Term // terms (or unlabeled terms)
– f, x ∈ Var // variables
– c ∈ Const // constants
– op ∈ Op // binary operators
– l ∈ Lab // labels

Abstract Syntax
– e ::= t^l
– t ::= c
      | x
      | fn x => e                 // function definition
      | fun f x => e              // recursive function definition
      | e1 e2                     // function application
      | if e0 then e1 else e2
      | let x = e1 in e2
      | e1 op e2

A Simple Example

((fn x => x^1)^2 (fn y => y^3)^4)^5
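A labeled expression can be represented directly as a data structure. The sketch below is a minimal Python encoding of a subset of FUN (constants, variables, fn and application only); the class names `Const`, `Var`, `Fn`, `App`, `Exp` and the helper `labels` are hypothetical and chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Fn:
    param: str
    body: "Exp"

@dataclass(frozen=True)
class App:
    fun: "Exp"
    arg: "Exp"

Term = Union[Const, Var, Fn, App]

@dataclass(frozen=True)
class Exp:
    """A labeled term t^l."""
    term: Term
    label: int

def labels(e):
    """Return the set of labels occurring in a labeled expression."""
    t = e.term
    if isinstance(t, Fn):
        return {e.label} | labels(t.body)
    if isinstance(t, App):
        return {e.label} | labels(t.fun) | labels(t.arg)
    return {e.label}

# The simple example: application of the identity on x to the identity on y.
fn_x = Exp(Fn("x", Exp(Var("x"), 1)), 2)
fn_y = Exp(Fn("y", Exp(Var("y"), 3)), 4)
example = Exp(App(fn_x, fn_y), 5)
```

Every subexpression carries exactly one label, so `labels(example)` enumerates the program points that the analysis assigns values to.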

An Example which Loops

(let g = (fun f x => (f^1 (fn y => y^2)^3)^4)^5
 in (g^6 (fn z => z^7)^8)^9)^10

The 0-CFA Problem

Compute for every program a pair (C, ρ) where:
– C is the abstract cache associating abstract values with labeled program points
– ρ is the abstract environment associating abstract values with variables

Formally
– v ∈ Val = P(Term) // abstract values
– ρ ∈ Env = Var → Val // abstract environment
– C ∈ Cache = Lab → Val // abstract cache
– For a function application (t1^l1 t2^l2)^l, C(l1) determines the functions that can be applied

These maps are finite for a given program

No context is considered for parameters

Possible Solutions for ((fn x => x^1)^2 (fn y => y^3)^4)^5

          Candidate 1       Candidate 2
C(1)      {fn y => y^3}     {fn y => y^3}
C(2)      {fn x => x^1}     {fn x => x^1}
C(3)      {}                {}
C(4)      {fn y => y^3}     {fn y => y^3}
C(5)      {fn y => y^3}     {fn y => y^3}
ρ(x)      {fn y => y^3}     {}
ρ(y)      {}                {}

(let g = (fun f x => (f^1 (fn y => y^2)^3)^4)^5
 in (g^6 (fn z => z^7)^8)^9)^10

Shorthand
sf  = fun f x => (f^1 (fn y => y^2)^3)^4
idy = fn y => y^2
idz = fn z => z^7

C(1) = {sf}    C(2) = {}      C(3) = {idy}
C(4) = {}      C(5) = {sf}    C(6) = {sf}
C(7) = {}      C(8) = {idz}   C(9) = {}
C(10) = {}

ρ(f) = {sf}    ρ(g) = {sf}
ρ(x) = {idy, idz}    ρ(y) = {}    ρ(z) = {}

Relationship to Dataflow Analysis

– Expressions are side effect free: no entry/exit
– A single environment
– Represents information at different points via maps
– A single value for all occurrences of a variable
– Function applications act similarly to assignments
  – “Definition”: a function abstraction is created
  – “Use”: a function is applied

A Formal Specification of 0-CFA

A Boolean function defines when a solution is acceptable

(C, ρ) ⊨ e means that (C, ρ) is acceptable for the expression e

– Defined by structural induction on e
– Every function is analyzed once
– Every acceptable solution is sound (conservative)
– Many acceptable solutions
– Generate a set of constraints
– Obtain the least acceptable solution by solving the constraints

Syntax Directed 0-CFA (Simple Expressions)

[const] (C, ρ) ⊨ c^l always
[var]   (C, ρ) ⊨ x^l if ρ(x) ⊆ C(l)

Syntax Directed 0-CFA: Function Abstraction

[fn]  (C, ρ) ⊨ (fn x => e)^l if:
        (C, ρ) ⊨ e
        fn x => e ∈ C(l)

[fun] (C, ρ) ⊨ (fun f x => e)^l if:
        (C, ρ) ⊨ e
        fun f x => e ∈ C(l)
        fun f x => e ∈ ρ(f)

Syntax Directed 0-CFA: Function Application

[app] (C, ρ) ⊨ (t1^l1 t2^l2)^l if:
        (C, ρ) ⊨ t1^l1
        (C, ρ) ⊨ t2^l2
        for all fn x => t0^l0 ∈ C(l1):    C(l2) ⊆ ρ(x) and C(l0) ⊆ C(l)
        for all fun f x => t0^l0 ∈ C(l1): C(l2) ⊆ ρ(x) and C(l0) ⊆ C(l)

Syntax Directed 0-CFA: Other Constructs

[if]  (C, ρ) ⊨ (if t0^l0 then t1^l1 else t2^l2)^l if:
        (C, ρ) ⊨ t0^l0
        (C, ρ) ⊨ t1^l1
        (C, ρ) ⊨ t2^l2
        C(l1) ⊆ C(l)
        C(l2) ⊆ C(l)

[let] (C, ρ) ⊨ (let x = t1^l1 in t2^l2)^l if:
        (C, ρ) ⊨ t1^l1
        (C, ρ) ⊨ t2^l2
        C(l1) ⊆ ρ(x)
        C(l2) ⊆ C(l)

[op]  (C, ρ) ⊨ (t1^l1 op t2^l2)^l if:
        (C, ρ) ⊨ t1^l1
        (C, ρ) ⊨ t2^l2

Possible Solutions for ((fn x => x^1)^2 (fn y => y^3)^4)^5

          Candidate 1       Candidate 2
C(1)      {fn y => y^3}     {fn y => y^3}
C(2)      {fn x => x^1}     {fn x => x^1}
C(3)      {}                {}
C(4)      {fn y => y^3}     {fn y => y^3}
C(5)      {fn y => y^3}     {fn y => y^3}
ρ(x)      {fn y => y^3}     {}
ρ(y)      {}                {}
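The two candidate columns can be checked against the [fn], [var] and [app] clauses. The Python sketch below hard-codes those clauses for this one expression only; the names `acceptable`, `INFO`, `cand1` and `cand2` are hypothetical, and terms are represented as plain strings for brevity.

```python
FN_X, FN_Y = "fn x => x^1", "fn y => y^3"
# For each abstraction: (parameter name, label of its body).
INFO = {FN_X: ("x", 1), FN_Y: ("y", 3)}

def acceptable(C, rho):
    """Check (C, rho) against the clauses for ((fn x => x^1)^2 (fn y => y^3)^4)^5."""
    ok = FN_X in C[2] and rho["x"] <= C[1]          # [fn] at 2, [var] at 1
    ok = ok and FN_Y in C[4] and rho["y"] <= C[3]   # [fn] at 4, [var] at 3
    for t in C[2]:                                  # [app] at 5: operator label is 2
        param, body = INFO[t]
        ok = ok and C[4] <= rho[param] and C[body] <= C[5]
    return ok

cache = {1: {FN_Y}, 2: {FN_X}, 3: set(), 4: {FN_Y}, 5: {FN_Y}}
cand1 = (cache, {"x": {FN_Y}, "y": set()})
cand2 = (cache, {"x": set(), "y": set()})
```

Under these clauses the first candidate is acceptable, while the second fails the [app] requirement C(4) ⊆ ρ(x) because ρ(x) is empty.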

Set Constraints

A set of rules of the form:
– lhs ⊆ rhs
– {t} ⊆ rhs’ ⇒ lhs ⊆ rhs (conditional constraint)
– lhs, rhs, rhs’ are
  » terms
  » C(l)
  » ρ(x)

The least solution (C, ρ) can be found iteratively
– start with empty sets
– add terms when needed

Efficient cubic graph based solution

Syntax Directed Constraint Generation (Part I)

C*⟦c^l⟧ = {}
C*⟦x^l⟧ = {ρ(x) ⊆ C(l)}

C*⟦(fn x => e)^l⟧ = C*⟦e⟧ ∪ {{fn x => e} ⊆ C(l)}
C*⟦(fun f x => e)^l⟧ = C*⟦e⟧ ∪ {{fun f x => e} ⊆ C(l)} ∪ {{fun f x => e} ⊆ ρ(f)}

C*⟦(t1^l1 t2^l2)^l⟧ = C*⟦t1^l1⟧ ∪ C*⟦t2^l2⟧
    ∪ {{t} ⊆ C(l1) ⇒ C(l2) ⊆ ρ(x) | t = fn x => t0^l0 ∈ Term*}
    ∪ {{t} ⊆ C(l1) ⇒ C(l0) ⊆ C(l) | t = fn x => t0^l0 ∈ Term*}
    ∪ {{t} ⊆ C(l1) ⇒ C(l2) ⊆ ρ(x) | t = fun f x => t0^l0 ∈ Term*}
    ∪ {{t} ⊆ C(l1) ⇒ C(l0) ⊆ C(l) | t = fun f x => t0^l0 ∈ Term*}

Syntax Directed Constraint Generation (Part II)

C*⟦(if t0^l0 then t1^l1 else t2^l2)^l⟧ = C*⟦t0^l0⟧ ∪ C*⟦t1^l1⟧ ∪ C*⟦t2^l2⟧
    ∪ {C(l1) ⊆ C(l)} ∪ {C(l2) ⊆ C(l)}

C*⟦(let x = t1^l1 in t2^l2)^l⟧ = C*⟦t1^l1⟧ ∪ C*⟦t2^l2⟧
    ∪ {C(l1) ⊆ ρ(x)} ∪ {C(l2) ⊆ C(l)}

C*⟦(t1^l1 op t2^l2)^l⟧ = C*⟦t1^l1⟧ ∪ C*⟦t2^l2⟧

Set Constraints for ((fn x => x^1)^2 (fn y => y^3)^4)^5

Iterative Solution to the Set Constraints for ((fn x => x^1)^2 (fn y => y^3)^4)^5

(Table columns: step, constraint, 1, 2, 3, 4, x, y. The rows are not preserved.)
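Constraint generation and the iterative solution for this expression can be sketched in Python as follows. This is a minimal illustration under stated assumptions: expressions are encoded as nested tuples, only constants, variables, fn and application are handled, and the solver is a naive round-robin iteration rather than the efficient cubic graph-based algorithm. The names `gen`, `solve`, `C`, `R`, `label` and `abstractions` are hypothetical.

```python
def C(l):
    return ("C", l)      # cache entry for label l

def R(x):
    return ("r", x)      # environment entry for variable x

def label(e):
    return e[-1]         # the label is the last tuple component

def abstractions(e):
    """All fn-terms occurring in e (the function part of Term*)."""
    if e[0] == "fn":
        return {e} | abstractions(e[2])
    if e[0] == "app":
        return abstractions(e[1]) | abstractions(e[2])
    return set()

def gen(e, fns):
    """Return constraints (cond, lhs, rhs): if cond holds then lhs ⊆ rhs."""
    if e[0] == "const":
        return []
    if e[0] == "var":
        return [(None, R(e[1]), C(label(e)))]
    if e[0] == "fn":
        return gen(e[2], fns) + [(None, ("term", e), C(label(e)))]
    _, e1, e2, l = e                              # application (t1^l1 t2^l2)^l
    cs = gen(e1, fns) + gen(e2, fns)
    for t in fns:
        cond = (t, C(label(e1)))                  # {t} ⊆ C(l1)
        cs.append((cond, C(label(e2)), R(t[1])))  # C(l2) ⊆ ρ(x)
        cs.append((cond, C(label(t[2])), C(l)))   # C(l0) ⊆ C(l)
    return cs

def solve(constraints):
    """Start with empty sets and add terms until nothing changes."""
    sol = {}
    def get(node):
        return sol.setdefault(node, set())
    changed = True
    while changed:
        changed = False
        for cond, lhs, rhs in constraints:
            if cond is not None and cond[0] not in get(cond[1]):
                continue                          # condition not (yet) satisfied
            src = {lhs[1]} if lhs[0] == "term" else get(lhs)
            if not src <= get(rhs):
                get(rhs).update(src)
                changed = True
    return sol

fn_x = ("fn", "x", ("var", "x", 1), 2)
fn_y = ("fn", "y", ("var", "y", 3), 4)
program = ("app", fn_x, fn_y, 5)
solution = solve(gen(program, abstractions(program)))
```

For this program the sketch generates eight constraints, and the least solution it computes coincides with the first candidate in the table of possible solutions: `solution[C(5)]` contains only the abstraction labeled 4.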

Adding Data Flow Information

Dataflow values can affect control flow analysis

Example

(let f = (fn x => (if (x^1 > 0^2)^3 then (fn y => y^4)^5
                                     else (fn z => 5^6)^7)^8)^9
 in ((f^10 3^11)^12 0^13)^14)^15

Adding Data Flow Information

– Add a finite set of “abstract” values per program: Data
– Update Val = P(Term ∪ Data)
– ρ ∈ Env = Var → Val // abstract environment
– C ∈ Cache = Lab → Val // abstract cache
– Generate extra constraints for data
– Obtain a more precise solution
– A special case of product domain (4.4)
– The combination of two analyses may be more precise than both
– For some programs may even be more efficient

Adding Dataflow Information (Sign Analysis)

– Sign analysis
– Add a finite set of “abstract” values per program: Data = {P, N, TT, FF}
– Update Val = P(Term ∪ Data)
– d_c is the abstract value that represents a constant c
  – d_3 = {P}
  – d_-7 = {N}
  – d_true = {TT}
  – d_false = {FF}
– Every operator is conservatively interpreted
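A conservative interpretation of operators over Data = {P, N, TT, FF} can be sketched in Python as below. The tables `GT` and `PLUS` and the helper `abstract_op` are hypothetical. Because the set Data has no separate element for zero, the sketch assumes that zero is counted as P; the slides do not state how zero is treated.

```python
from itertools import product

P, N, TT, FF = "P", "N", "TT", "FF"

# Abstract results on single signs. Assumption: zero is counted as P,
# so P means "non-negative" and N means "negative".
GT = {(P, P): {TT, FF}, (P, N): {TT}, (N, P): {FF}, (N, N): {TT, FF}}
PLUS = {(P, P): {P}, (P, N): {P, N}, (N, P): {P, N}, (N, N): {N}}

def abstract_op(table, left, right):
    """Lift a table on single abstract values to sets of abstract values."""
    result = set()
    for a, b in product(left, right):
        # Pairs with no table entry (e.g. function terms) contribute nothing.
        result |= table.get((a, b), set())
    return result
```

Each table entry lists every outcome that some pair of concrete operands with those signs can produce, which is what makes the interpretation conservative: `abstract_op(GT, {P}, {N})` yields only TT, while `abstract_op(GT, {P}, {P})` must yield both TT and FF.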

Syntax Directed Constraint Generation (Part I)

C*⟦c^l⟧ = {d_c ⊆ C(l)}
C*⟦x^l⟧ = {ρ(x) ⊆ C(l)}

C*⟦(fn x => e)^l⟧ = C*⟦e⟧ ∪ {{fn x => e} ⊆ C(l)}
C*⟦(fun f x => e)^l⟧ = C*⟦e⟧ ∪ {{fun f x => e} ⊆ C(l)} ∪ {{fun f x => e} ⊆ ρ(f)}

C*⟦(t1^l1 t2^l2)^l⟧ = C*⟦t1^l1⟧ ∪ C*⟦t2^l2⟧
    ∪ {{t} ⊆ C(l1) ⇒ C(l2) ⊆ ρ(x) | t = fn x => t0^l0 ∈ Term*}
    ∪ {{t} ⊆ C(l1) ⇒ C(l0) ⊆ C(l) | t = fn x => t0^l0 ∈ Term*}
    ∪ {{t} ⊆ C(l1) ⇒ C(l2) ⊆ ρ(x) | t = fun f x => t0^l0 ∈ Term*}
    ∪ {{t} ⊆ C(l1) ⇒ C(l0) ⊆ C(l) | t = fun f x => t0^l0 ∈ Term*}

Syntax Directed Constraint Generation (Part II)

C*⟦(if t0^l0 then t1^l1 else t2^l2)^l⟧ = C*⟦t0^l0⟧ ∪ C*⟦t1^l1⟧ ∪ C*⟦t2^l2⟧
    ∪ {d_true ⊆ C(l0) ⇒ C(l1) ⊆ C(l)}
    ∪ {d_false ⊆ C(l0) ⇒ C(l2) ⊆ C(l)}

C*⟦(let x = t1^l1 in t2^l2)^l⟧ = C*⟦t1^l1⟧ ∪ C*⟦t2^l2⟧
    ∪ {C(l1) ⊆ ρ(x)} ∪ {C(l2) ⊆ C(l)}

C*⟦(t1^l1 op t2^l2)^l⟧ = C*⟦t1^l1⟧ ∪ C*⟦t2^l2⟧
    ∪ {C(l1) op C(l2) ⊆ C(l)}

Adding Context Information

The analysis does not distinguish between different occurrences of a variable (monovariant analysis)

Example

(let f = (fn x => x^1)^2
 in ((f^3 f^4)^5 (fn y => y^6)^7)^8)^9

Source to source can help (but may lead to code explosion)

Example rewritten

let f1 = fn x1 => x1
in let f2 = fn x2 => x2
   in (f1 f2) (fn y => y)
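The imprecision of the monovariant analysis on the original example (let f = (fn x => x^1)^2 in ((f^3 f^4)^5 (fn y => y^6)^7)^8)^9 can be reproduced with a small fixpoint computation. The Python sketch below writes out the 0-CFA constraints for this one program by hand; the names `analyse`, `INFO`, `FNX` and `FNY` are hypothetical and terms are represented as strings.

```python
FNX, FNY = "fn x => x^1", "fn y => y^6"
# For each abstraction: (parameter name, label of its body).
INFO = {FNX: ("x", 1), FNY: ("y", 6)}

def size(C, rho):
    return sum(len(s) for s in C.values()) + sum(len(s) for s in rho.values())

def analyse():
    """Least 0-CFA solution for the example, by naive iteration."""
    C = {l: set() for l in range(1, 10)}
    rho = {v: set() for v in ("f", "x", "y")}
    while True:
        before = size(C, rho)
        C[2].add(FNX)               # [fn] at label 2
        C[7].add(FNY)               # [fn] at label 7
        C[1] |= rho["x"]            # [var] x^1
        C[6] |= rho["y"]            # [var] y^6
        C[3] |= rho["f"]            # [var] f^3
        C[4] |= rho["f"]            # [var] f^4
        rho["f"] |= C[2]            # [let]: C(2) ⊆ ρ(f)
        C[9] |= C[8]                # [let]: C(8) ⊆ C(9)
        # [app] at labels 5 and 8: (operator label, operand label, result label)
        for fun, arg, res in ((3, 4, 5), (5, 7, 8)):
            for t in list(C[fun]):
                param, body = INFO[t]
                rho[param] |= C[arg]
                C[res] |= C[body]
        if size(C, rho) == before:  # sets only grow, so equal size means stable
            return C, rho

C, rho = analyse()
```

The program evaluates to fn y => y, yet the computed C(9) contains both abstractions, because the single entry ρ(x) merges the argument of the call at label 5 with the argument of the call at label 8.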

Simplified K-CFA

– Records the last k dynamic calls (for some fixed k)
– Similar to the call string approach
– Remember the context in which an expression is evaluated
– Val is now P(Term) × Contexts
– ρ ∈ Env = Var × Contexts → Val
– C ∈ Cache = Lab × Contexts → Val

1-CFA

(let f = (fn x => x^1)^2 in ((f^3 f^4)^5 (fn y => y^6)^7)^8)^9

Contexts
– [] - The empty context
– [5] - The application at label 5
– [8] - The application at label 8

Polyvariant Control Flow

C(1, [5]) = ρ(x, [5]) = C(2, []) = C(3, []) = ρ(f, []) = ({(fn x => x^1)}, [])
C(1, [8]) = ρ(x, [8]) = C(7, []) = C(8, []) = C(9, []) = ({(fn y => y^6)}, [])

The Motivating Example

    class Vehicle extends Object {
        int position = 10;
        void move(x1 : int) {
            position = position + x1;
        }
    }

    class Car extends Vehicle {
        int passengers;
        void await(v : Vehicle) {
            if (v.position < position)
                then v.move(position - v.position);
                else self.move(10);
        }
    }

    class Truck extends Vehicle {
        void move(x2 : int) {
            if (x2 < 55)
                position = position + x2;
        }
    }

    void main {
        Car c; Truck t; Vehicle v1;
        new c; new t;
        v1 := c;
        c.passengers := 2;
        c.move(60);
        v1.move(70);
        c.await(t);
    }
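The same constraint-based idea applies to this program: compute for each variable the set of classes its value may have, then resolve each move call through the class hierarchy. The Python sketch below is an illustration under stated assumptions; the subset constraints in `FLOWS` are derived by hand from main, and the names `PARENT`, `METHODS`, `lookup`, `classes` and `targets` are hypothetical.

```python
PARENT = {"Vehicle": "Object", "Car": "Vehicle", "Truck": "Vehicle"}
METHODS = {"Vehicle": {"move"}, "Car": {"await"}, "Truck": {"move"}}

def lookup(cls, method):
    """Class whose definition of `method` runs for a receiver of class cls."""
    while cls is not None:
        if method in METHODS.get(cls, set()):
            return cls
        cls = PARENT.get(cls)
    return None

# Classes created by `new c` and `new t`.
classes = {"c": {"Car"}, "t": {"Truck"}, "v1": set(), "v": set(), "self": set()}
# Subset constraints (source, target), hand-derived from main:
#   v1 := c       gives c ⊆ v1
#   c.await(t)    gives t ⊆ v (parameter) and c ⊆ self (receiver)
FLOWS = [("c", "v1"), ("t", "v"), ("c", "self")]

changed = True
while changed:                      # iterate until the least solution is reached
    changed = False
    for src, dst in FLOWS:
        if not classes[src] <= classes[dst]:
            classes[dst] |= classes[src]
            changed = True

def targets(var, method):
    """Methods that a call var.method(...) may invoke."""
    return {lookup(k, method) + "." + method for k in classes[var]}
```

With these constraints, `targets("v1", "move")` contains only Vehicle.move and `targets("v", "move")` contains only Truck.move, so every move call in the example resolves to a single method even though v1 and v are declared with type Vehicle.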

Missing Material

– Efficient cubic solution to set constraints: www.cs.berkeley.edu/Research/Aiken/bane.html
– Experimental results for OO: www.cs.washington.edu/research/projects/cecil
– Operational semantics for FUN (3.2.1)
– Defining acceptability without structural induction
  – More precise treatment of termination (3.2.2)
  – Needs co-induction (greatest fixed point)
– Using general lattices as dataflow values instead of powersets (3.5.2)
– Lower bounds
  – Decidability of JOP
  – Polynomiality

Conclusions

Set constraints are quite useful
– A uniform syntax
– Can even deal with pointers

But semantic foundation is still based on abstract interpretation

Techniques used in functional and imperative (OO) programming are similar

Control and data flow analysis are related
