Course Outline

• Traditional Static Program Analysis

– Classic analyses and applications

• Software Testing, Refactoring

• Dynamic Program Analysis

Announcements

• I am setting up a new page for the class at

www.rpi.edu/~milana2/csci6961

• Email: milana2@rpi.edu, milanova@cs.rpi.edu

Outline

• Data-flow frameworks
– The “Maximal Fixed Point” (MFP) solution
– The “Meet Over all Paths” (MOP) solution

• Analysis of object references
– Class Hierarchy Analysis (CHA)
– Rapid Type Analysis (RTA)

The MOP Solution

1. x:=a*b

2. if y<=a*b

3. a:=a+1

4. x:=x*b

5. goto 2

The MOP at entry of n is $\bigvee_{p \in \mathrm{paths}(\rho, n)} f_p(\mathrm{init}(\rho))$, where p ranges over the paths from ρ to n.

The MOP over-approximates run-time dataflow facts. The MOP is the best summary of dataflow facts.

MOP at entry of 3: f2(f1(Ø)) ∪ f2(f5(f4(f3(f2(f1(Ø)))))) ∪ f2(f5(f4(f3(f2(f5(f4(f3(f2(f1(Ø)))))))))) ∪ … = {(x,1),(x,4),(a,3)}

This MOP over-approximates the reaching definitions at entry of 3. E.g., suppose that at the beginning y=1, a=1 and b=2. The actual reaching definitions at entry of 3: {(x,1)}!


The MFP Solution

The MFP at entry of 3 is the in(3) obtained as a solution of the following equations through fixed-point iteration:

in(1) = Ø
out(1) = f1(in(1))

in(2) = out(1) ∪ out(5)
out(2) = f2(in(2))

in(3) = out(2)
out(3) = f3(in(3))

in(4) = out(3)
out(4) = f4(in(4))

in(5) = out(4)
out(5) = f5(in(5)) = in(5)
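The fixed-point iteration over these equations can be sketched in Java as below. This is a minimal illustration, not the course's implementation: the class name ReachingDefsMfp, the encoding of a definition (x,1) as the string "x1", and the round-robin visiting order are all assumptions made for the example. It also assumes single-letter variable names, which holds for this program.

```java
import java.util.*;

// Sketch: round-robin fixed-point iteration for reaching definitions
// on the five-statement example. (x,1) is encoded as "x1" (an assumption).
class ReachingDefsMfp {
    static final int N = 5;
    // PREDS[n] lists the predecessors of node n; index 0 is unused.
    static final int[][] PREDS = { {}, {}, {1, 5}, {2}, {3}, {4} };
    // Variable defined at each node, or null if the node defines nothing.
    static final String[] DEF_VAR = { null, "x", null, "a", "x", null };

    // Transfer function f_n: kill other definitions of the variable, add (var, n).
    static Set<String> transfer(int n, Set<String> in) {
        Set<String> out = new TreeSet<>(in);
        String v = DEF_VAR[n];
        if (v != null) {
            out.removeIf(d -> d.startsWith(v)); // single-letter variable names assumed
            out.add(v + n);
        }
        return out;
    }

    // Returns a list whose element n is in(n) at the fixed point.
    static List<Set<String>> solve() {
        List<Set<String>> in = new ArrayList<>();
        List<Set<String>> out = new ArrayList<>();
        for (int i = 0; i <= N; i++) {
            in.add(new TreeSet<>());
            out.add(new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int n = 1; n <= N; n++) {
                Set<String> newIn = new TreeSet<>();
                for (int p : PREDS[n]) newIn.addAll(out.get(p)); // in(n) = union of out(p)
                Set<String> newOut = transfer(n, newIn);         // out(n) = f_n(in(n))
                if (!newIn.equals(in.get(n)) || !newOut.equals(out.get(n))) {
                    in.set(n, newIn);
                    out.set(n, newOut);
                    changed = true;
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        List<Set<String>> in = solve();
        for (int n = 1; n <= N; n++) System.out.println("in(" + n + ") = " + in.get(n));
    }
}
```

Under these assumptions the iteration stabilizes after the loop edge 5 → 2 has been propagated once, and in(3) contains the encodings of (x,1), (x,4), and (a,3), the same set as the MOP at entry of 3.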

The MOP and MFP Solutions

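[Figure: the lattice of subsets of {(x,1),(x,4),(a,3)}, ordered by inclusion, with elements {}; {(x,1)}, {(x,4)}, {(a,3)}; {(x,1),(x,4)}, {(x,4),(a,3)}, {(x,1),(a,3)}; {(x,1),(x,4),(a,3)}. The labels 0 and 1 and the markers in(1) through in(5) annotate lattice elements for the five-statement example program.]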

MOP vs. MFP

• For distributive functions the dataflow analysis can merge paths (p1, p2) without loss of precision!
– E.g., fp1(0) need not be calculated explicitly
– MFP=MOP

• Due to Kam and Ullman, 1976, 1977: this is not true for monotone functions
– MFP≥MOP. In general, MOP is undecidable

• A solution S with S≥MOP is a safe solution; a solution S that is not ≥MOP is an unsafe solution
– Other terms for unsafe: incorrect, unsound solution
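To see why MFP can be strictly less precise than MOP for a monotone but non-distributive function, consider constant propagation on a hypothetical fragment (not from the slides): one branch sets x=1, y=2, the other sets x=2, y=1, and both reach z := x + y. The sketch below uses a simplified abstraction in which a variable is either a known constant or null ("not a constant"); the class and method names are assumptions for illustration.

```java
import java.util.*;

// Sketch: MOP vs. MFP for constant propagation on a two-branch fragment.
// An abstract value is an Integer constant, or null meaning "not a constant".
class MopVsMfpDemo {
    // Join of two environments over the same variables:
    // a variable stays constant only if both sides agree on its value.
    static Map<String, Integer> join(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> r = new TreeMap<>();
        for (String v : a.keySet()) {
            Integer x = a.get(v), y = b.get(v);
            r.put(v, (x != null && x.equals(y)) ? x : null);
        }
        return r;
    }

    // Transfer function for z := x + y (monotone, but not distributive).
    static Map<String, Integer> assignZ(Map<String, Integer> env) {
        Map<String, Integer> r = new TreeMap<>(env);
        Integer x = env.get("x"), y = env.get("y");
        r.put("z", (x != null && y != null) ? Integer.valueOf(x + y) : null);
        return r;
    }

    // Environment at the end of a branch; z is not yet a known constant.
    static Map<String, Integer> env(Integer x, Integer y) {
        Map<String, Integer> r = new TreeMap<>();
        r.put("x", x);
        r.put("y", y);
        r.put("z", null);
        return r;
    }

    // MOP: apply the transfer function along each path, then join the results.
    static Map<String, Integer> mop() {
        return join(assignZ(env(1, 2)), assignZ(env(2, 1)));
    }

    // MFP: join at the merge point first, then apply the transfer function.
    static Map<String, Integer> mfp() {
        return assignZ(join(env(1, 2), env(2, 1)));
    }

    public static void main(String[] args) {
        System.out.println("MOP: " + mop());
        System.out.println("MFP: " + mfp());
    }
}
```

In this sketch the MOP keeps z as the constant 3 because each path is evaluated separately, while the MFP merges x and y to "not a constant" before the addition and so loses z. Both answers are safe; the MFP is simply higher in the lattice.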

Many Applications!

• White-box testing: compute coverage

• Regression testing

• Reverse engineering

• Restructuring: automated refactoring

• Static debugging
– Memory errors
– Concurrency bugs

Analysis of object references

• Analysis of object-oriented programs
– Java

• Class Analysis problem: Given a reference variable x, what are the classes of the objects that x refers to at runtime?

• Points-to Analysis problem: Given a reference variable x, what are the objects that x refers to at runtime?

Example: BoolExp hierarchy

class BoolExp {
public:
    BoolExp();
    virtual bool Evaluate(Context&) = 0;
};

class Constant : public BoolExp {
public:
    Constant(bool);
    virtual bool Evaluate(Context&);
private:
    bool _constant;
};
Constant::Constant(bool c) { _constant = c; }
bool Constant::Evaluate(Context& aContext) { return _constant; }

class VarExp : public BoolExp {
public:
    VarExp(char *);
    virtual bool Evaluate(Context&);
private:
    char* _name;
};
VarExp::VarExp(char * n) { _name = n; }
bool VarExp::Evaluate(Context& aContext) { return aContext.Lookup(_name); }

class AndExp : public BoolExp {
public:
    AndExp(BoolExp*, BoolExp*);
    virtual bool Evaluate(Context&);
    // NOTE: NEED DESTRUCTORS!!!
private:
    BoolExp* _operand1;
    BoolExp* _operand2;
};
AndExp::AndExp(BoolExp* op1, BoolExp* op2) { _operand1 = op1; _operand2 = op2; }
bool AndExp::Evaluate(Context& aContext) {
    return _operand1->Evaluate(aContext) && _operand2->Evaluate(aContext);
}

class OrExp : public BoolExp {
public:
    OrExp(BoolExp*, BoolExp*);
    virtual bool Evaluate(Context&);
private:
    BoolExp* _operand1;
    BoolExp* _operand2;
};
OrExp::OrExp(BoolExp* op1, BoolExp* op2) { _operand1 = op1; _operand2 = op2; }
bool OrExp::Evaluate(Context& aContext) {
    return _operand1->Evaluate(aContext) || _operand2->Evaluate(aContext);
}

A client of the BoolExp hierarchy

int main() {
    Context theContext;
    VarExp* x = new VarExp("X");
    VarExp* y = new VarExp("Y");
    BoolExp* exp = new AndExp(new Constant(true), new OrExp(x, y));
    theContext.Assign(x, true);
    theContext.Assign(y, false);
    bool result = exp->Evaluate(theContext);
}

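Java Example: BoolExp hierarchy

public abstract class BoolExp {
    public abstract boolean Evaluate(Context c);
}

public class Constant extends BoolExp {
    private boolean _constant;
    // Constructor added so that the client's new Constant(true) compiles.
    public Constant(boolean c) { _constant = c; }
    public boolean Evaluate(Context c) { return _constant; }
}

public class VarExp extends BoolExp {
    private String _name;
    // Constructor added so that the client's new VarExp("X") compiles.
    public VarExp(String n) { _name = n; }
    public boolean Evaluate(Context c) { return c.Lookup(_name); }
}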

Java Example: BoolExp hierarchy

public class AndExp extends BoolExp {
    private BoolExp _operand1;
    private BoolExp _operand2;

    public AndExp(BoolExp op1, BoolExp op2) { _operand1 = op1; _operand2 = op2; }
    public boolean Evaluate(Context c) {
        return _operand1.Evaluate(c) && _operand2.Evaluate(c);
    }
}

public class OrExp extends BoolExp {
    private BoolExp _operand1;
    private BoolExp _operand2;

    public OrExp(BoolExp op1, BoolExp op2) { _operand1 = op1; _operand2 = op2; }
    public boolean Evaluate(Context c) {
        return _operand1.Evaluate(c) || _operand2.Evaluate(c);
    }
}

A client of the BoolExp hierarchy in Java

public static void main(String[] args) {
    Context theContext = new Context();
    VarExp x = new VarExp("X");
    VarExp y = new VarExp("Y");
    BoolExp exp = new AndExp(new Constant(true), new OrExp(x, y));
    theContext.Assign(x, true);
    theContext.Assign(y, false);
    boolean result = exp.Evaluate(theContext);
}

exp: {AndExp}

That is: At runtime exp may refer to (i.e., may point to) an object of class AndExp, but may not refer to an object of class OrExp!

Java Example: BoolExp hierarchy

Class sets for the fields of the AndExp and OrExp classes shown above, given the client:

AndExp: _operand1: {Constant} _operand2: {OrExp}

OrExp: _operand1: {VarExp} _operand2: {VarExp}

Class information: applications

• Compilers: can we devirtualize a virtual function call x.m()/x->m()?

• Software engineering
– The calling relations in the program: call graph
– Testing
– Most interesting analyses require this information

Some terminology

• Intraprocedural analysis
– So far, we assumed there are no procedure calls!
– Analysis that works within a procedure and approximates (or does not need) flow into and from procedures

• Interprocedural analysis
– Takes into account procedure calls and tracks flow into and from procedures
– Many issues:
  • Parameter passing mechanisms
  • Context
  • Call graph!
  • Functions as parameters!
– We will get back to this in a few classes…

Scalability

• For most analyses (including class analysis) we need interprocedural analysis on very large programs

• Can the analysis handle large programs?
– 100K LOC, up to 45M LOC?

• Approximations of standard fixed point iteration
– Reduce Lattice
– Reduce CFG
– Make transfer functions converge faster
– Other…

Today’s class

• Some simple interprocedural class analyses

• Class analysis: Given a reference variable x, what are the classes of the objects that x refers to at runtime?

• Class Hierarchy Analysis (CHA)

• Rapid Type Analysis (RTA)

Class Hierarchy Analysis (CHA)

• The simplest method of inferring information about reference variables
– Look at the class hierarchy

• In Java, if a reference variable r has a type A, the possible classes of run-time objects are included in the subtree of A, denoted by cone(A).
– At virtual call site r.m find the methods that may be called based on the hierarchy information

J. Dean, D. Grove, and C. Chambers, Optimization of OO Programs Using Static Class Hierarchy Analysis, ECOOP’95

Example

public class A {
    public static void main() {
        A a;
        D d = new D();
        E e = new E();
        if (…) a = d; else a = e;
        a.f();
    }
    …
}
public class B extends A {
    public void foo() {
        G g = new G();
        …
    }
    …
}
// there are no other creation sites
// or calls in the program

Class hierarchy (f() marks a class that declares f):

A f()
    B f()
        G f()
    C f()
        D
        E

Example

For the program and class hierarchy above, the solution for reference variables by CHA is: a may refer to objects of classes {A,B,C,D,E,G}, d may refer to objects of class {D}, e may refer to objects of class {E}, and g to {G}.

Cone(C) = {C,D,E}

Example

Call graph by CHA for the same program: the call site a.f() in main has the possible targets A.f, B.f, C.f, and G.f.
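The cone computation and the CHA target set for a.f() can be sketched as follows. The hierarchy encoded here (B and C under A, G under B, D and E under C, with f() declared in A, B, C, and G) is a reading of the slide's figure, and the names ChaDemo, cone, dispatch, and chaTargets are hypothetical helpers rather than part of any real analysis framework.

```java
import java.util.*;

// Sketch: Class Hierarchy Analysis on a hard-coded hierarchy (an assumption).
class ChaDemo {
    // child -> parent; A is the root and has no entry.
    static final Map<String, String> PARENT = new HashMap<>();
    // Classes that declare their own f().
    static final Set<String> DECLARES_F = new HashSet<>(Arrays.asList("A", "B", "C", "G"));
    static final List<String> CLASSES = Arrays.asList("A", "B", "C", "D", "E", "G");
    static {
        PARENT.put("B", "A");
        PARENT.put("C", "A");
        PARENT.put("G", "B");
        PARENT.put("D", "C");
        PARENT.put("E", "C");
    }

    // True if c is the given ancestor or one of its (transitive) subclasses.
    static boolean isSubclassOrSelf(String c, String ancestor) {
        for (String k = c; k != null; k = PARENT.get(k)) {
            if (k.equals(ancestor)) return true;
        }
        return false;
    }

    // cone(type): the declared type together with all of its subclasses.
    static Set<String> cone(String type) {
        Set<String> r = new TreeSet<>();
        for (String c : CLASSES) {
            if (isSubclassOrSelf(c, type)) r.add(c);
        }
        return r;
    }

    // The f() that an object of run-time class c executes:
    // the nearest class on the path to the root that declares f.
    static String dispatch(String c) {
        for (String k = c; k != null; k = PARENT.get(k)) {
            if (DECLARES_F.contains(k)) return k + ".f";
        }
        return null;
    }

    // CHA targets of r.f() when r has the given declared type.
    static Set<String> chaTargets(String declaredType) {
        Set<String> r = new TreeSet<>();
        for (String c : cone(declaredType)) {
            String target = dispatch(c);
            if (target != null) r.add(target);
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println("cone(A) = " + cone("A"));
        System.out.println("targets of a.f() = " + chaTargets("A"));
    }
}
```

With a declared as A, the sketch yields the whole hierarchy as cone(A) and four possible targets. A receiver declared as C would have only C.f as a target, since D and E inherit it.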

Example: Applies-to Sets

For the same program, the call site a.f() in main has the possible CHA targets A.f, B.f, C.f, and G.f.

Applies-to sets: A.f = {A}; B.f = {B}; G.f = {G}; C.f = {C,D,E}

Observations on CHA

• Do we need to resolve the class of the receiver uniquely in order to devirtualize a call?

• Applies-to set for each method
– At a call site r.f(), take the set of possible classes for the receiver r; intersect this set with each possible method’s applies-to set.
– If only one method’s set has a non-empty intersection, then invoke the method directly.
– Otherwise, the call cannot be resolved.
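The intersection test described above can be sketched in a few lines. The applies-to sets are the ones given for the running example; the class name AppliesToDemo and the helpers set and resolve are assumptions made for this illustration.

```java
import java.util.*;

// Sketch: devirtualization by intersecting receiver classes with applies-to sets.
class AppliesToDemo {
    static final Map<String, Set<String>> APPLIES_TO = new LinkedHashMap<>();
    static {
        APPLIES_TO.put("A.f", set("A"));
        APPLIES_TO.put("B.f", set("B"));
        APPLIES_TO.put("C.f", set("C", "D", "E"));
        APPLIES_TO.put("G.f", set("G"));
    }

    static Set<String> set(String... xs) {
        return new TreeSet<>(Arrays.asList(xs));
    }

    // Returns the single method whose applies-to set intersects the receiver's
    // possible classes, or null if zero or several methods qualify.
    static String resolve(Set<String> receiverClasses) {
        String unique = null;
        for (Map.Entry<String, Set<String>> e : APPLIES_TO.entrySet()) {
            if (!Collections.disjoint(e.getValue(), receiverClasses)) {
                if (unique != null) return null; // second candidate: unresolved
                unique = e.getKey();
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(resolve(set("C", "D", "E")));
        System.out.println(resolve(set("A", "B", "C", "D", "E", "G")));
    }
}
```

Note that a receiver with the class set {C,D,E} is resolved to C.f even though its class is not unique, which answers the question posed in the first bullet: a unique receiver class is not required, only a unique target method.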

Rapid Type Analysis

• Improves on Class Hierarchy Analysis

• Interleaves construction of the call graph with the analysis (known as on-the-fly call graph construction)

• Only expands calls if it has seen an instantiated object of appropriate type

• Makes assumption that the whole program is available!

David Bacon and Peter Sweeney, “Fast Static Analysis of C++ Virtual Function Calls”, OOPSLA ‘96

Example

For the same program as in the CHA example: RTA starts in main; sees that D and E are instantiated; expands a.f() into C.f() only. It never reaches B.foo() and never sees G instantiated.

[Figure: call graph over main, A.f, B.f, C.f, and G.f.]

RTA

• Keeps two sets, I (the set of instantiated classes), and R (the set of reachable methods)

• Starts from main, I = {}, R = {main}

• Analyze calls in reachable methods: r.f()
– Finds potential targets according to CHA: X.f, Y.f, etc.
– If Applies-to(X.f) intersects with I, make X.f a real target, and add X.f to R

• Analyze instantiation sites in reachable methods: r = new A()
– Add A to I
– Find all analyzed calls r.f() with potential targets X.f triggered by A (i.e., A in Applies-to(X.f) at r.f()). Make X.f a real target, and add X.f to R.
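The two rules above can be combined into a small worklist loop. The sketch below is a simplification under stated assumptions: the program model (the maps NEWS, CALLS, and APPLIES_TO) is a hand-written encoding of the running example, the bodies of the f() methods are assumed to contain no calls or instantiations, and pending targets are kept in one global set instead of per call site.

```java
import java.util.*;

// Sketch: Rapid Type Analysis as a worklist over reachable methods.
class RtaDemo {
    // method -> classes instantiated in its body (hypothetical encoding)
    static final Map<String, List<String>> NEWS = new HashMap<>();
    // method -> CHA targets of the virtual calls in its body
    static final Map<String, List<String>> CALLS = new HashMap<>();
    // target method -> applies-to set
    static final Map<String, Set<String>> APPLIES_TO = new HashMap<>();
    static {
        NEWS.put("main", Arrays.asList("D", "E"));
        CALLS.put("main", Arrays.asList("A.f", "B.f", "C.f", "G.f"));
        NEWS.put("B.foo", Arrays.asList("G"));
        APPLIES_TO.put("A.f", new TreeSet<>(Arrays.asList("A")));
        APPLIES_TO.put("B.f", new TreeSet<>(Arrays.asList("B")));
        APPLIES_TO.put("C.f", new TreeSet<>(Arrays.asList("C", "D", "E")));
        APPLIES_TO.put("G.f", new TreeSet<>(Arrays.asList("G")));
    }

    // Returns R (reachable methods) and adds the instantiated classes to the
    // caller-supplied set, which plays the role of I.
    static Set<String> analyze(Set<String> instantiated) {
        Set<String> reachable = new TreeSet<>();
        Set<String> pending = new TreeSet<>(); // potential targets not yet made real
        Deque<String> worklist = new ArrayDeque<>();
        reachable.add("main");
        worklist.add("main");
        while (!worklist.isEmpty()) {
            String m = worklist.remove();
            // Instantiation sites in m: add the classes to I.
            instantiated.addAll(NEWS.getOrDefault(m, Collections.emptyList()));
            // Calls in m: record their CHA targets as potential targets.
            pending.addAll(CALLS.getOrDefault(m, Collections.emptyList()));
            // Promote each potential target whose applies-to set intersects I.
            for (Iterator<String> it = pending.iterator(); it.hasNext(); ) {
                String t = it.next();
                if (!Collections.disjoint(APPLIES_TO.get(t), instantiated)) {
                    it.remove();
                    if (reachable.add(t)) worklist.add(t);
                }
            }
        }
        return reachable;
    }

    public static void main(String[] args) {
        Set<String> i = new TreeSet<>();
        System.out.println("R = " + analyze(i) + ", I = " + i);
    }
}
```

On this model the loop visits main and C.f only: B.foo is never called, so G never enters I and G.f stays a potential target. Re-checking all pending targets after every method is a shortcut for the "triggered by A" rule; a fuller implementation would index pending targets by class.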

Example (continued)

For the same program, the potential targets of a.f() in main and their applies-to sets are:

A.f: {A}; B.f: {B}; C.f: {C,D,E}; G.f: {G}

With I = {D,E}, only Applies-to(C.f) intersects I, so C.f is the only real target.

Comparisons

class A {
public:
    virtual int foo() { return 1; };
};
class B : public A {
public:
    virtual int foo() { return 2; };
    virtual int foo(int i) { return i+1; };
};
int main() {
    B* p = new B;
    int result1 = p->foo(1);
    int result2 = p->foo();
    A* q = p;
    int result3 = q->foo();
}

Bacon-Sweeney, OOPSLA’96

CHA resolves result2 call uniquely to B.foo(); however, it does not resolve result3.

RTA resolves result3 uniquely because only B has been instantiated.

Type Safety Limitations

• CHA and RTA assume type safety of the code they examine!

//#1
void* x = (void *) new B;
B* q = (B*) x; // a safe downcast
int case1 = q->foo();
//#2
void* x = (void *) new A;
B* q = (B*) x; // an unsafe downcast
int case2 = q->foo(); // probably no error
//#3
void* x = (void *) new A;
B* q = (B *) x; // an unsafe downcast
int case3 = q->foo(66); // run-time error

Class hierarchy: A declares foo(); B, a subclass of A, declares foo() and foo(int).
