Foundations of Data-Flow Analysis

Basic Questions

Under what circumstances is the iterative algorithm used in the data-flow analysis correct?

How precise is the solution obtained by the iterative algorithm?

Will the iterative algorithm converge?

What is the meaning of the solution to the equations?

Data-Flow Analysis Framework

A direction of the data flow D, which is either forwards or backwards

A semilattice, which includes a domain of values V and a meet operator $\wedge$

A family F of transfer functions from V to V. This family must include functions suitable for the boundary conditions, which are constant transfer functions for the special nodes ENTRY and EXIT in any control flow graph

Example: Reaching Definitions

The direction: forwards

The domain of values: the set of subsets of the set of all definitions in the program

The meet operator: set union

The family of transfer functions: the set of transfer functions for various statements

Semilattices

A semilattice is a set V and a binary meet operator $\wedge$ such that for all x, y, and z in V:

$x \wedge x = x$ (meet is idempotent)

$x \wedge y = y \wedge x$ (meet is commutative)

$x \wedge (y \wedge z) = (x \wedge y) \wedge z$ (meet is associative)

A semilattice has a top element, denoted $\top$, such that for all x in V, $\top \wedge x = x$

Optionally, a semilattice may have a bottom element, denoted $\bot$, such that for all x in V, $\bot \wedge x = \bot$

Example: Reaching Definitions

The domain of values is the set of all subsets of the universal set U, or the power set of U, denoted $2^U$

The meet operator is set union

Set union is idempotent, commutative, and associative

The top element is the empty set

The bottom element is the universal set U
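The semilattice laws for this example can be checked by brute force on a small universe. The following is a minimal sketch, assuming a hypothetical three-definition universe {d1, d2, d3}; the names `powerset`, `meet`, `TOP`, and `BOTTOM` are illustrative and do not come from the slides.

```python
from itertools import combinations

# Hypothetical universe of definitions (an assumption for illustration).
U = frozenset({"d1", "d2", "d3"})

def powerset(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def meet(x, y):
    """Meet operator for reaching definitions: set union."""
    return x | y

V = powerset(U)
TOP = frozenset()   # the empty set
BOTTOM = U          # the universal set

# Check the three semilattice laws exhaustively over V.
idempotent = all(meet(x, x) == x for x in V)
commutative = all(meet(x, y) == meet(y, x) for x in V for y in V)
associative = all(meet(x, meet(y, z)) == meet(meet(x, y), z)
                  for x in V for y in V for z in V)

# Check the defining properties of top and bottom.
top_ok = all(meet(TOP, x) == x for x in V)
bottom_ok = all(meet(BOTTOM, x) == BOTTOM for x in V)
```

An exhaustive check like this only works for tiny universes, but it shows why the empty set plays the role of the top element when the meet is union.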

Partial Orders

A relation $\leq$ is a partial order on a set V if for all x, y, and z in V:

$x \leq x$ (the partial order is reflexive)

If $x \leq y$ and $y \leq x$, then $x = y$ (the partial order is antisymmetric)

If $x \leq y$ and $y \leq z$, then $x \leq z$ (the partial order is transitive)

The pair $(V, \leq)$ is called a poset, or partially ordered set

We define $x < y$ if and only if $x \leq y$ and $x \neq y$

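The Partial Order for a Semilattice

It is useful to define a partial order $\leq$ for a semilattice $(V, \wedge)$. For all x and y in V, we define $x \leq y$ if and only if $x \wedge y = x$

$\leq$ is reflexive: $x \wedge x = x \Rightarrow x \leq x$

$\leq$ is antisymmetric: $x \leq y \Rightarrow x \wedge y = x$ and $y \leq x \Rightarrow y \wedge x = y$, so $x = (x \wedge y) = (y \wedge x) = y$

$\leq$ is transitive: $x \leq y \Rightarrow x \wedge y = x$ and $y \leq z \Rightarrow y \wedge z = y$, so $(x \wedge z) = ((x \wedge y) \wedge z) = (x \wedge (y \wedge z)) = (x \wedge y) = x \Rightarrow x \leq z$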

Example: Reaching Definitions

The relation $\leq$ is the set inclusion $\supseteq$

$x \cup y = x \Leftrightarrow x \supseteq y$

This says that sets that are larger in size are smaller in the partial order

Set inclusion is reflexive, antisymmetric, and transitive
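The derived order can be compared against set inclusion directly. This is a small sketch under the same assumption of a hypothetical universe {d1, d2, d3}; `leq` is an illustrative name for the relation defined by $x \wedge y = x$.

```python
from itertools import combinations

# Hypothetical universe of definitions (an assumption for illustration).
U = frozenset({"d1", "d2", "d3"})
V = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]

def meet(x, y):
    """Meet operator for reaching definitions: set union."""
    return x | y

def leq(x, y):
    """x <= y in the semilattice order, i.e. meet(x, y) == x."""
    return meet(x, y) == x

# The derived order coincides with "x is a superset of y".
matches_superset = all(leq(x, y) == (x >= y) for x in V for y in V)

# The three partial-order properties, checked exhaustively.
reflexive = all(leq(x, x) for x in V)
antisymmetric = all(x == y for x in V for y in V
                    if leq(x, y) and leq(y, x))
transitive = all(leq(x, z) for x in V for y in V for z in V
                 if leq(x, y) and leq(y, z))
```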

Greatest Lower Bounds

A greatest lower bound (or glb) of domain elements x and y is an element g such that

$g \leq x$,

$g \leq y$, and

if z is any element such that $z \leq x$ and $z \leq y$, then $z \leq g$

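Meet and Greatest Lower Bound

The meet of x and y is the greatest lower bound of x and y. Let $g = x \wedge y$

$g \leq x$: $g \wedge x = (x \wedge y) \wedge x = x \wedge (y \wedge x) = x \wedge (x \wedge y) = (x \wedge x) \wedge y = x \wedge y = g$

$g \leq y$: by the symmetric argument

$z \leq x$ and $z \leq y \Rightarrow z \leq g$: $z \wedge g = z \wedge (x \wedge y) = (z \wedge x) \wedge y = z \wedge y = z$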

Lattice Diagrams

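Lattice of the subsets of {d1, d2, d3} under set union, listed level by level from top to bottom; each set is connected to the sets one level below that contain it:

{} ($\top$)

{d1}   {d2}   {d3}

{d1, d2}   {d1, d3}   {d2, d3}

{d1, d2, d3} ($\bot$)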

Product Lattices

The product lattice for lattices $(A, \wedge_A)$ and $(B, \wedge_B)$ is defined as follows:

The domain of the product lattice is $A \times B$

The meet for the product lattice: $(a, b) \wedge (a', b') = (a \wedge_A a', b \wedge_B b')$

The partial order for the product lattice: $(a, b) \leq (a', b')$ iff $a \leq_A a'$ and $b \leq_B b'$

This definition can be extended to the product of any number of lattices

Example

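Product of three two-element lattices, one per definition (the i-th component is either {} or {di}), listed level by level from top to bottom:

({}, {}, {}) ($\top$)

({d1}, {}, {})   ({}, {d2}, {})   ({}, {}, {d3})

({d1}, {d2}, {})   ({d1}, {}, {d3})   ({}, {d2}, {d3})

({d1}, {d2}, {d3}) ($\bot$)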
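The componentwise definitions can be sketched in a few lines. The code below assumes three hypothetical two-element component lattices (each either the empty set or a single definition) with set union as the component meet; `product_meet` and `product_leq` are illustrative names.

```python
from itertools import product

# Three hypothetical component lattices, one per definition (assumption).
components = [
    [frozenset(), frozenset({"d1"})],
    [frozenset(), frozenset({"d2"})],
    [frozenset(), frozenset({"d3"})],
]

def meet(x, y):
    """Component meet: set union."""
    return x | y

def leq(x, y):
    """Component order derived from the meet."""
    return meet(x, y) == x

def product_meet(a, b):
    """Meet in the product lattice: apply the component meet pairwise."""
    return tuple(meet(x, y) for x, y in zip(a, b))

def product_leq(a, b):
    """Order in the product lattice: every component must be ordered."""
    return all(leq(x, y) for x, y in zip(a, b))

domain = list(product(*components))
top = (frozenset(), frozenset(), frozenset())
bottom = (frozenset({"d1"}), frozenset({"d2"}), frozenset({"d3"}))

# The order derived from the product meet agrees with the componentwise order.
order_agrees = all((product_meet(a, b) == a) == product_leq(a, b)
                   for a in domain for b in domain)
```

The agreement check confirms that defining the order componentwise and deriving it from the product meet give the same relation on this small domain.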

Height of a Semilattice

An ascending chain in a poset $(V, \leq)$ is a sequence $x_1 < x_2 < \dots < x_n$

The height of a semilattice is the largest number of < relations in any ascending chain

An iterative data-flow analysis algorithm is guaranteed to converge if the framework is monotone and the corresponding semilattice has finite height

A lattice consisting of a finite set of values will have a finite height

It is also possible for a lattice with an infinite number of values to have a finite height
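For a small finite semilattice the height can be computed by searching for the longest ascending chain. The sketch below assumes the reaching-definitions semilattice over a hypothetical universe {d1, d2, d3}; `lt` and `longest_from` are illustrative helper names.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical universe of definitions (an assumption for illustration).
U = frozenset({"d1", "d2", "d3"})
V = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]

def lt(x, y):
    """x < y: x <= y (x | y == x, so x is a superset of y) and x != y."""
    return x != y and (x | y) == x

@lru_cache(maxsize=None)
def longest_from(x):
    """Largest number of < relations in an ascending chain starting at x."""
    return max((1 + longest_from(y) for y in V if lt(x, y)), default=0)

# The height is the longest ascending chain over all starting points.
height = max(longest_from(x) for x in V)
```

Here the longest chain climbs from the universal set to the empty set, removing one definition at each step, so the height equals the number of definitions in this example.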

Transfer Functions

The family of transfer functions $F: V \to V$ in a data-flow framework has the following properties:

F has an identity function I, such that I(x) = x for all x in V

F is closed under composition; that is, for any two functions f and g in F, the function h defined by h(x) = g(f(x)) is in F

Example: Reaching Definitions

The identity function: gen[B] = kill[B] = $\emptyset$

Closure under composition:

$f_1(x) = G_1 \cup (x - K_1)$, $f_2(x) = G_2 \cup (x - K_2)$

$f_2(f_1(x)) = G_2 \cup ((G_1 \cup (x - K_1)) - K_2) = (G_2 \cup (G_1 - K_2)) \cup (x - (K_1 \cup K_2))$

Let $G = G_2 \cup (G_1 - K_2)$ and $K = K_1 \cup K_2$. Then $f(x) = f_2(f_1(x)) = G \cup (x - K)$
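The closure argument can be spot-checked numerically. This is a minimal sketch with hypothetical gen and kill sets chosen only for illustration; `gen_kill` is an assumed helper that builds a transfer function of the form $G \cup (x - K)$.

```python
from itertools import combinations

# Hypothetical universe of definitions (an assumption for illustration).
U = frozenset({"d1", "d2", "d3"})
V = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]

def gen_kill(gen, kill):
    """Build the transfer function f(x) = gen | (x - kill)."""
    def f(x):
        return gen | (x - kill)
    return f

# Hypothetical gen/kill sets for two blocks.
G1, K1 = frozenset({"d1"}), frozenset({"d2"})
G2, K2 = frozenset({"d3"}), frozenset({"d1"})
f1, f2 = gen_kill(G1, K1), gen_kill(G2, K2)

# The composed function is again of gen-kill form.
G = G2 | (G1 - K2)
K = K1 | K2
f = gen_kill(G, K)
composition_matches = all(f2(f1(x)) == f(x) for x in V)

# gen = kill = empty set gives the identity function.
identity = gen_kill(frozenset(), frozenset())
identity_ok = all(identity(x) == x for x in V)
```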

Monotone Frameworks

A framework $(D, F, V, \wedge)$ is monotone if $x \leq y$ implies $f(x) \leq f(y)$, for all x and y in V, and f in F

Equivalently, a framework $(D, F, V, \wedge)$ is monotone if $f(x \wedge y) \leq f(x) \wedge f(y)$, for all x and y in V, and f in F

Proof of Equivalence

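($\Rightarrow$) $x \wedge y \leq x$ and $x \wedge y \leq y$, so $f(x \wedge y) \leq f(x)$ and $f(x \wedge y) \leq f(y)$. Since $f(x) \wedge f(y)$ is the glb of $f(x)$ and $f(y)$, it follows that $f(x \wedge y) \leq f(x) \wedge f(y)$

($\Leftarrow$) $x \leq y \Rightarrow x \wedge y = x \Rightarrow f(x) = f(x \wedge y) \leq f(x) \wedge f(y) \leq f(y) \Rightarrow f(x) \leq f(y)$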

Distributive Frameworks

A framework $(D, F, V, \wedge)$ is distributive if $f(x \wedge y) = f(x) \wedge f(y)$

for all x and y in V, and f in F

Distributivity implies monotonicity

Example: Reaching Definitions

Let y and z be sets of definitions, and

$f(x) = G \cup (x - K)$

Then

$G \cup ((y \cup z) - K) = (G \cup (y - K)) \cup (G \cup (z - K))$
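Both distributivity and the monotonicity it implies can be verified exhaustively for one gen-kill function. The sketch assumes a hypothetical universe {d1, d2, d3} and hypothetical sets G and K; it checks a single function, not the whole family F.

```python
from itertools import combinations

# Hypothetical universe of definitions (an assumption for illustration).
U = frozenset({"d1", "d2", "d3"})
V = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]

def meet(x, y):
    """Meet operator for reaching definitions: set union."""
    return x | y

def leq(x, y):
    """Semilattice order derived from the meet."""
    return meet(x, y) == x

# Hypothetical gen and kill sets for one block.
G = frozenset({"d1"})
K = frozenset({"d2"})

def f(x):
    """Transfer function f(x) = G | (x - K)."""
    return G | (x - K)

# Distributivity: f(y meet z) == f(y) meet f(z).
distributive = all(f(meet(y, z)) == meet(f(y), f(z)) for y in V for z in V)

# Monotonicity: y <= z implies f(y) <= f(z).
monotone = all(leq(f(y), f(z)) for y in V for z in V if leq(y, z))
```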

The Iterative Algorithm for General Frameworks: Input

A control flow graph, with specially labeled ENTRY and EXIT nodes,

A direction of the data flow D,

A set of values V,

A meet operator $\wedge$,

A set of functions F, where $f_B$ in F is the transfer function for basic block B, and

A constant value $v_{ENTRY}$ or $v_{EXIT}$ in V, representing the boundary condition for forward and backward frameworks, respectively

The Iterative Algorithm for General Frameworks: Output

Values in V for IN[B] and OUT[B] for each basic block B in the control flow graph

The Iterative Algorithm for General Frameworks: Forward

OUT[ENTRY] := vENTRY;
for (each basic block B other than ENTRY)
    OUT[B] := ⊤;
while (changes to any OUT occur)
    for (each basic block B other than ENTRY) {
        IN[B] := ∧_{p ∈ pred(B)} OUT[p];
        OUT[B] := fB(IN[B]);
    }
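The forward algorithm translates almost line for line into executable code. The following is a sketch, not a reference implementation: the function name `solve_forward`, the dictionary-based graph encoding, and the small looping flow graph with its gen/kill sets are all assumptions made for illustration, instantiated for reaching definitions.

```python
def solve_forward(blocks, preds, transfer, meet, top, v_entry, entry="ENTRY"):
    """Iterate the forward data-flow equations until no OUT value changes."""
    out = {b: top for b in blocks}
    out[entry] = v_entry
    in_ = {}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            # Meet over all predecessors; top is the identity of the meet.
            value = top
            for p in preds[b]:
                value = meet(value, out[p])
            in_[b] = value
            new_out = transfer[b](value)
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out

def gen_kill(gen, kill):
    """Transfer function f(x) = gen | (x - kill)."""
    return lambda x: frozenset(gen) | (x - frozenset(kill))

# Hypothetical flow graph: ENTRY -> B1 -> B2 -> B3 -> EXIT, with B3 -> B2.
# d1 (in B1) and d3 (in B3) define the same variable; d2 (in B2) another.
blocks = ["ENTRY", "B1", "B2", "B3", "EXIT"]
preds = {"B1": ["ENTRY"], "B2": ["B1", "B3"], "B3": ["B2"], "EXIT": ["B3"]}
transfer = {
    "B1": gen_kill({"d1"}, {"d3"}),
    "B2": gen_kill({"d2"}, set()),
    "B3": gen_kill({"d3"}, {"d1"}),
    "EXIT": gen_kill(set(), set()),
}

IN, OUT = solve_forward(blocks, preds, transfer,
                        meet=lambda x, y: x | y,
                        top=frozenset(), v_entry=frozenset())
```

On this graph the loop stabilizes after three passes: all three definitions reach the top of B2, while d1 is killed in B3 and does not reach EXIT.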

The Iterative Algorithm for General Frameworks: Backward

IN[EXIT] := vEXIT;
for (each basic block B other than EXIT)
    IN[B] := ⊤;
while (changes to any IN occur)
    for (each basic block B other than EXIT) {
        OUT[B] := ∧_{s ∈ succ(B)} IN[s];
        IN[B] := fB(OUT[B]);
    }

Properties of the Iterative Algorithm

If the algorithm converges, the result is a solution to the data-flow equations

If the framework is monotone, then the solution found is the maximum fixedpoint (MFP) of the data-flow equations. The maximum fixedpoint is a solution with the property that in any other solution, the values of IN[B] and OUT[B] are $\leq$ the corresponding values of the MFP

If the framework is monotone and its semilattice has finite height, then the algorithm is guaranteed to converge

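The Ideal Solution

Consider any path $P = ENTRY \to B_1 \to \dots \to B_{k-1} \to B_k$

The transfer function for P is $f_P = f_{B_{k-1}} \circ f_{B_{k-2}} \circ \dots \circ f_{B_1}$ ($f_{B_k}$ is not included because the path is taken to reach the beginning of $B_k$)

The ideal solution is $IDEAL[B] = \bigwedge_{P \in \text{possible paths from ENTRY to } B} f_P(v_{ENTRY})$

Any answer that is greater than IDEAL is incorrect

Any value smaller than or equal to IDEAL is conservative, i.e., safe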

The Meet-Over-Paths Solution

Finding all possible execution paths is undecidable

The meet-over-paths solution is $MOP[B] = \bigwedge_{P \in \text{paths from ENTRY to } B} f_P(v_{ENTRY})$

The paths considered in the MOP solution are a superset of all the paths that are possibly executed

$MOP[B] \leq IDEAL[B]$

MFP Solution versus MOP Solution

The iterative algorithm visits basic blocks, not necessarily in the order of execution

At each confluence point, the algorithm applies the meet operator to the data-flow values obtained so far. Some of these values were introduced artificially in the initialization process and do not represent the result of any execution from the beginning of the program

Early Meet over Paths

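Flow graph: ENTRY branches to B1 and B2, both of which flow into B3, which flows into B4

$MOP[B_4] = ((f_{B_3} \circ f_{B_1}) \wedge (f_{B_3} \circ f_{B_2}))(v_{ENTRY})$

$IN[B_4] = f_{B_3}(f_{B_1}(v_{ENTRY}) \wedge f_{B_2}(v_{ENTRY}))$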
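The two expressions can be evaluated side by side on the diamond-shaped graph. This sketch assumes reaching definitions with hypothetical gen/kill sets for B1, B2, and B3; `paths` and `apply_path` are illustrative helpers. Because gen-kill functions are distributive, the two results coincide here; for a framework that is only monotone, the early-meet value could be strictly smaller.

```python
def gen_kill(gen, kill):
    """Transfer function f(x) = gen | (x - kill)."""
    return lambda x: frozenset(gen) | (x - frozenset(kill))

def meet(x, y):
    """Meet operator for reaching definitions: set union."""
    return x | y

TOP = frozenset()
v_entry = frozenset()

# Diamond-shaped flow graph: ENTRY -> {B1, B2} -> B3 -> B4.
succs = {"ENTRY": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"],
         "B3": ["B4"], "B4": []}

# Hypothetical gen/kill sets (assumptions for illustration).
transfer = {
    "B1": gen_kill({"d1"}, set()),
    "B2": gen_kill({"d2"}, set()),
    "B3": gen_kill({"d3"}, {"d1"}),
}

def paths(src, dst):
    """Enumerate all paths from src to dst (the graph is acyclic)."""
    if src == dst:
        return [[src]]
    return [[src] + rest for s in succs[src] for rest in paths(s, dst)]

def apply_path(path, value):
    """Apply the transfer functions of the blocks strictly between the ends."""
    for block in path[1:-1]:
        value = transfer[block](value)
    return value

# MOP: push v_entry along each path separately, then take the meet.
mop_b4 = TOP
for p in paths("ENTRY", "B4"):
    mop_b4 = meet(mop_b4, apply_path(p, v_entry))

# Early meet, as done by the iterative algorithm: meet before applying f_B3.
early_b4 = transfer["B3"](meet(transfer["B1"](v_entry),
                               transfer["B2"](v_entry)))
```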

Comparison of Solutions

Using the iterative algorithm, we have

$IN[B] \leq MOP[B]$

for monotone frameworks and

$IN[B] = MOP[B]$

for distributive frameworks

$MFP \leq MOP \leq IDEAL$
