Formal Methods
Program Slicing & Dataflow Analysis
February 2015
Program Analysis
• Automatic analysis of a program
• Three main objectives:
  – Correctness: program verification
  – Efficiency: code optimization (compilers)
  – Security: understanding code vulnerabilities
• Two types of analysis:
  – Static analysis: do not execute the program; reason over all inputs
  – Dynamic analysis: execute the program; reason over specific inputs
Static Analysis
• Based upon source code analysis
• Useful for:
  – Semantic analysis of programs, e.g. type inference
  – Optimizations and transformations, e.g. dataflow/control-flow analysis
  – Program verification, e.g. Dijkstra's weakest-precondition method
Dynamic Analysis
• Based upon one or more runs of the program on given inputs
• Useful for:
  – Performance analysis
  – Dynamic slicing
  – Program debugging
Static Analysis Techniques
• Type inference
  – Check or infer types for program expressions
• Data flow analysis
  – Analyze variable and other dependencies
• Program slicing
  – Construct a reduced program with respect to variables of interest
• Model checking
  – Check temporal properties of programs
• Theorem proving
  – Use logical deduction to prove facts
References
Hiralal Agrawal and Joseph R. Horgan, "Dynamic Program Slicing", Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation; also in SIGPLAN Notices, 25(6): 246-256, 1990.
Hiralal Agrawal, Richard A. DeMillo, and Eugene H. Spafford, "Dynamic Slicing in the Presence of Unconstrained Pointers", Proc. Symposium on Testing, Analysis, and Verification, pp. 60-73, 1991.
Frank Tip, "A Survey of Program Slicing Techniques", Journal of Programming Languages, 3(3): 121-189, 1995.
Mark Weiser, "Program Slicing", IEEE Transactions on Software Engineering, 10(4): 352-357, 1984.
Static and Dynamic Program Slicing
Static Program Slicing
• Computing a reduced program with respect to a criterion: <stmt, vars>
• Helps understand dependencies in programs and aids program debugging
• Other applications:
  – software testing
  – software maintenance
  – parallelization
#define YES 1
#define NO 0

main() {
    int c, nl, nw, nc, inword;
    inword = NO;
    nl = 0; nw = 0; nc = 0;
    c = getchar();
    while (c != EOF) {
        nc = nc + 1;
        if (c == '\n')
            nl = nl + 1;
        if (c == ' ' || c == '\n' || c == '\t')
            inword = NO;
        else if (inword == NO) {
            inword = YES;
            nw = nw + 1;
        }
        c = getchar();
    }
    printf("%d \n", nl);
    printf("%d \n", nw);
    printf("%d \n", nc);
}
Example: Char, Line, and Word Counter
[Flowchart of the counter program: inword = NO; nl = 0; nw = 0; nc = 0; c = getchar() precedes the loop test while (c != EOF). The TRUE branch performs nc = nc + 1, tests if (c == '\n') to do nl = nl + 1, tests if (c == ' ' || c == '\n' || c == '\t') to set inword = NO, otherwise tests if (inword == NO) to set inword = YES and nw = nw + 1, then executes c = getchar() and loops back. The FALSE branch reaches the three printf statements for nl, nw, and nc.]
#define YES 1
#define NO 0

main() {
    int c, nw, inword;
    inword = NO;
    nw = 0;
    c = getchar();
    while (c != EOF) {
        if (c == ' ' || c == '\n' || c == '\t')
            inword = NO;
        else if (inword == NO) {
            inword = YES;
            nw = nw + 1;
        }
        c = getchar();
    }
    printf("%d \n", nw);
}
Program Slice: Word Counter
#define YES 1
#define NO 0

main() {
    int c, nl;
    nl = 0;
    c = getchar();
    while (c != EOF) {
        if (c == '\n')
            nl = nl + 1;
        c = getchar();
    }
    printf("%d \n", nl);
}
Program Slice: Line Counter
#define YES 1
#define NO 0

main() {
    int c, nc;
    nc = 0;
    c = getchar();
    while (c != EOF) {
        nc = nc + 1;
        c = getchar();
    }
    printf("%d \n", nc);
}
Program Slice: Character Counter
Slicing OO Programs: Example
Ohm's Law:

class component {
    attributes Real V, I, R;
    constraints V = I * R;
    constructor component(V1, I1, R1) { V = V1; I = I1; R = R1; }
}

class parallel extends component {
    attributes component[] C;
    constraints
        forall X in C: (X.V = V);
        (sum X in C: X.I) = I;
        (sum X in C: 1/X.R) = 1/R;
    constructor parallel(P) { C = P; }
}
Slice WRT Resistance
class parallel extends component {
    attributes component[] C;
    constraints
        forall X in C: (X.V = V);
        (sum X in C: X.I) = I;
        (sum X in C: 1/X.R) = 1/R;
    constructor parallel(P) { C = P; }
}

class parallel extends component {
    attributes component[] C;
    constraints (sum X in C: 1/X.R) = 1/R;
    constructor parallel(P) { C = P; }
}
Slice WRT Resistance
class component {
    attributes Real V, I, R;
    constraints V = I * R;
    constructor component(V1, I1, R1) { V = V1; I = I1; R = R1; }
}

class component {
    attributes Real R;
    constraints
    constructor component(R1) { R = R1; }
}
Static Slicing Classification
• Forward vs. backward
• Intra- vs. inter-procedural
• Procedural vs. OO languages
OO slicing is a good topic for presentation.

Slicing is based upon dataflow analysis, so we will examine that topic first.
Data Flow Analysis
• Compilers perform data flow analysis for various reasons: detecting common subexpressions, loop-invariant operations, uninitialized variables, etc.
• Two forms of data flow analysis:
  – Forward flow
  – Backward flow
• Characterized by data flow constraints
Examples of Data Flow Analysis
• Forward flow
  – Reaching definitions (∪)
  – Available expressions (∩)
• Backward flow
  – Live variables (∪)
  – Very busy expressions (∩)
Summary of DF Analyses

            Forward                   Backward
∪ (LFP)     Reaching definitions      Live variables
∩ (GFP)     Available expressions     Very busy expressions
KILL(B) and GEN(B) sets

Block B:
    d1: x := y + 1;
    d2: z := w + x;
    d3: v := z + u;
    d4: x := x + v;

IN = { d5: v := 10; d6: y := 20; d7: w := 30; d8: u := 40 }

KILL(B) = {d5}    GEN(B) = {d2, d3, d4}

KILL(B) eliminates each definition whose variable is re-assigned within B.
GEN(B) adds the last definition for each variable that is assigned in B.
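The two definitions above can be sketched in Python (an illustrative sketch, not from the slides; the function name and argument layout are assumptions, with the d1-d8 labels reused from the example):

```python
def gen_kill(block_defs, outside_defs):
    """Compute GEN(B) and KILL(B) for a basic block.

    block_defs: ordered list of (label, variable) pairs for the
        definitions inside B, e.g. [("d1", "x"), ...].
    outside_defs: dict label -> variable for definitions outside B.
    """
    assigned = {var for _, var in block_defs}
    # KILL(B): each outside definition whose variable is re-assigned in B
    kill = {d for d, var in outside_defs.items() if var in assigned}
    # GEN(B): the last definition in B for each variable assigned in B
    last = {}
    for d, var in block_defs:
        last[var] = d           # a later definition of var overwrites an earlier one
    gen = set(last.values())
    return gen, kill

# The block and IN set from the slide:
B = [("d1", "x"), ("d2", "z"), ("d3", "v"), ("d4", "x")]
outside = {"d5": "v", "d6": "y", "d7": "w", "d8": "u"}
gen, kill = gen_kill(B, outside)   # gen = {d2, d3, d4}, kill = {d5}
```

Note how d1 drops out of GEN(B): d4 re-assigns x later in the block, so only d4's definition of x survives to the end of B.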
Reaching Definitions

OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)

IN(B) = ∪ {OUT(P) | P → B in the graph}
Illustrating the equations
Block B:
    d1: x := y + 1;
    d2: z := w + x;
    d3: v := z + u;
    d4: x := x + v;

IN(B) = {d5, d6, d7, d8}, where d5: v := 10; d6: y := 20; d7: w := 30; d8: u := 40

KILL(B) = {d5}    GEN(B) = {d2, d3, d4}

OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B) = {d2, d3, d4, d6, d7, d8}
Least Fixed Point
Theorem: Every Monotonic Function on a Finite Lattice has a Least Fixed Point.
For Reaching Definitions, the lattice of interest is P(S), the powerset of S (the set of all definition points), ordered by the ⊆ (subset or equal) relation.
Note that S is finite.
The least upper bound and greatest lower bound are set union and set intersection respectively.
More on Monotonicity

• OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)
• ∪ is monotonic in both arguments
• X – Y is monotonic in X but not in Y
• Since KILL(B) is a constant for each B, its use does not violate monotonicity

Fixed-point iteration will converge only if the functions are monotonic. Note: the composition of monotonic functions is monotonic.
Example: Non-Monotonic Function
x = not(y)
y = not(x)

Boolean lattice: <{T, F}, F ≤ T, and, or>

There are two fixed points:
1. x = T, y = F
2. x = F, y = T

No unique solution!
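The trouble can be seen directly by iterating the two equations from different starting points (a small Python sketch; the `iterate` helper is hypothetical):

```python
def iterate(x, y, steps):
    """Apply x := not(y), y := not(x) simultaneously `steps` times."""
    for _ in range(steps):
        x, y = (not y), (not x)
    return x, y

# Both fixed points are stable under iteration:
print(iterate(True, False, 10))   # (True, False)
print(iterate(False, True, 10))   # (False, True)

# But starting from the bottom (F, F), the iteration oscillates
# between (T, T) and (F, F) and never converges:
print(iterate(False, False, 9))   # (True, True)
print(iterate(False, False, 10))  # (False, False)
```

Since the functions are not monotonic, iteration neither converges from the bottom nor singles out one of the two fixed points.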
Sketch of Algorithm: Least Fixed Point Iteration

Forall basic blocks B Do {
    IN(B) := {}; OUT(B) := GEN(B)
}
Repeat until no more changes {
    Forall B Do {
        IN(B) := ∪ { OUT(P) | P → B in the graph };
    }
    Forall B Do {
        OUT(B) := (IN(B) – KILL(B)) ∪ GEN(B);
    }
}
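The sketch above translates almost line for line into code. Below is a minimal Python version (the function name and the small CFG used to exercise it are illustrative, not the example from the slides):

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Least-fixed-point iteration for reaching definitions.

    blocks: list of block names; preds: block -> list of predecessor blocks;
    gen, kill: block -> set of definition labels.
    """
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}          # OUT(B) := GEN(B)
    changed = True
    while changed:                                  # repeat until no more changes
        changed = False
        for b in blocks:
            # IN(B) := union of OUT(P) over predecessors P of B
            IN[b] = set().union(*(OUT[p] for p in preds[b])) if preds[b] else set()
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b], changed = new_out, True
    return IN, OUT

# A small hypothetical CFG: B1 -> B2 -> B3, with a back edge B3 -> B2.
blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": {"d4"}}
kill = {"B1": {"d3", "d4"}, "B2": {"d1"}, "B3": {"d2"}}
IN, OUT = reaching_definitions(blocks, preds, gen, kill)
# IN["B2"] = {d1, d2, d3, d4}; OUT["B2"] = {d2, d3, d4}
```

Because the loop body kills d1 (a definition of x) but not d2, all four definitions reach the top of B2 while only d2, d3, d4 flow out of it.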
Control Flow Graph
From Aho & Ullman, Principles of Compiler Design, 1977
KILL(B) and GEN(B)

(block labels B1-B5 follow the order in the figure)

Block    GEN          KILL
B1       {d1, d2}     {d3, d4, d5}
B2       {d3}         {d1}
B3       {d4}         {d2, d5}
B4       {d5}         {d2, d4}
B5       {}           {}
Initialize: IN(B) = {} and OUT(B) = GEN(B)

Block    IN      OUT
B1       {}      {d1, d2}
B2       {}      {d3}
B3       {}      {d4}
B4       {}      {d5}
B5       {}      {}
Iteration 1: IN(B) = ∪ {OUT(P) | P → B}, then OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)

Block    IN              OUT
B1       {d3}            {d1, d2}
B2       {d1, d2}        {d2, d3}
B3       {d3}            {d3, d4}
B4       {d4}            {d5}
B5       {d4, d5}        {d4, d5}
Iteration 2: IN(B) = ∪ {OUT(P) | P → B}, then OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)

Block    IN                      OUT
B1       {d2, d3}                {d1, d2}
B2       {d1, d2, d4, d5}        {d2, d3, d4, d5}
B3       {d2, d3}                {d3, d4}
B4       {d3, d4}                {d3, d5}
B5       {d3, d4, d5}            {d3, d4, d5}
Iteration 3: IN(B) = ∪ {OUT(P) | P → B}, then OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)

Block    IN                          OUT
B1       {d2, d3, d4, d5}            {d1, d2}
B2       {d1, d2, d3, d4, d5}        {d2, d3, d4, d5}
B3       {d2, d3, d4, d5}            {d3, d4}
B4       {d3, d4}                    {d3, d5}
B5       {d3, d4, d5}                {d3, d4, d5}
Iteration 4: IN(B) = ∪ {OUT(P) | P → B}, then OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)

Block    IN                          OUT
B1       {d2, d3, d4, d5}            {d1, d2}
B2       {d1, d2, d3, d4, d5}        {d2, d3, d4, d5}
B3       {d2, d3, d4, d5}            {d3, d4}
B4       {d3, d4}                    {d3, d5}
B5       {d3, d4, d5}                {d3, d4, d5}

No change from Iteration 3: the least fixed point has been reached.
Uses of Reaching Definitions
• Uninitialized variables: add a dummy assignment for every variable at the start of the program and check where the dummies "reach".
• Loop-invariant operations: an expression 'X op Y' in a loop is invariant if all definitions of X and Y are outside the loop.
• Static program slicing: we will examine this technique in more detail in the next class.
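The uninitialized-variable check can be sketched as a small post-pass over the reaching-definitions result (a hypothetical helper; the `d0_x`-style dummy labels and all data below are assumptions for illustration):

```python
def maybe_uninitialized(uses, reach_in, dummy):
    """uses: point -> variables used there;
    reach_in: point -> definition labels reaching that point;
    dummy: variable -> its dummy definition label at program start.
    Returns, per point, the variables that may be used uninitialized."""
    return {p: {v for v in uses[p] if dummy[v] in reach_in[p]}
            for p in uses}

# x's dummy definition reaches point p1, so x may be used uninitialized
# there; y's dummy definition was killed by a real definition d1.
uses = {"p1": {"x", "y"}}
reach_in = {"p1": {"d0_x", "d1"}}          # d1 is a real definition of y
dummy = {"x": "d0_x", "y": "d0_y"}
print(maybe_uninitialized(uses, reach_in, dummy))   # {'p1': {'x'}}
```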
Analysis is Approximate
From Aho & Ullman, Principles of Compiler Design, 1977
[Figure: one of the assignments lies on a branch that will never be executed, yet the analysis reports that both A := 2 and A := 3 reach point p.]
Algorithm Efficiencies
• Sets can be represented by bit vectors, so that ∪ and ∩ become logical ∨ and ∧.
• The number of iterations is bounded by the number of nodes in the graph.
• By visiting the nodes B1, …, Bk in "depth-first order", the number of iterations can be minimized; in practice it is ≤ 5.
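The bit-vector encoding can be illustrated as follows (a sketch; the `to_mask` helper and the assignment of d1-d5 to bit positions are assumptions):

```python
def to_mask(defs, index):
    """Encode a set of definition labels as an integer bit vector."""
    m = 0
    for d in defs:
        m |= 1 << index[d]
    return m

index = {f"d{i}": i - 1 for i in range(1, 6)}   # d1 -> bit 0, ..., d5 -> bit 4
a = to_mask({"d1", "d3"}, index)    # 0b00101
b = to_mask({"d3", "d5"}, index)    # 0b10100

union = a | b       # set union as bitwise OR          -> 0b10101
inter = a & b       # set intersection as bitwise AND  -> 0b00100
diff  = a & ~b      # set difference as AND-NOT        -> 0b00001
```

One machine word thus holds a whole IN or OUT set, and each dataflow equation becomes a handful of word-level logical operations.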
(Reverse) Depth-first Traversal Order

Traversal sequence: IN(B1), OUT(B1), IN(B2), OUT(B2), IN(B3), OUT(B3), …, IN(B10), OUT(B10)

The path of "back edges" 10 → 7 → 4 → 3 determines the number of iterations.
Global Common Subexpressions

[Figure: three predecessor blocks each evaluate := X * Y; at point p a later block computes := X * Y again, and X and Y do not change in between.]
Global Common Subexpressions

[Figure, after the transformation: each predecessor block stores T := X * Y; the later block at p simply uses := T, since X and Y do not change in between.]
Available Expression
X op Y is said to be 'available' at a point p if every path from the start of the program to p evaluates X op Y, and after the last such evaluation prior to p there are no subsequent assignments to X or Y.

OUT(B) = (IN(B) – KILLe(B)) ∪ GENe(B)

IN(B) = ∩ {OUT(P) | P → B in the graph}
Algorithmic Sketch: Greatest Fixed Point Computation

Forall basic blocks B except the initial block Do {
    IN(B) := E (the set of all exprs in the program);
    OUT(B) := E – KILLe(B)
}
Repeat until no more changes {
    Forall B Do {
        IN(B) := ∩ { OUT(P) | P → B in the graph };
    }
    Forall B Do {
        OUT(B) := (IN(B) – KILLe(B)) ∪ GENe(B);
    }
}
Greatest Fixed Point Iteration
Theorem: Every Monotonic Function on a Finite Lattice has a Greatest Fixed Point.
For Available Expressions, the lattice of interest is P(S), the powerset of S (the set of all expressions appearing in the program), ordered by the ⊆ (subset or equal) relation.
Note that S is finite.
The least upper bound and greatest lower bound are set union and intersection respectively.
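The GFP computation can be sketched in the same style as the reaching-definitions code (illustrative names; the tiny hypothetical CFG below computes x*y on only one of the two paths into B3, so the expression is not available at the join):

```python
def available_expressions(blocks, entry, preds, gen_e, kill_e, all_exprs):
    """Greatest-fixed-point iteration for available expressions."""
    # Initialize: nothing is available on entry; every other block
    # starts at the top of the lattice (the full expression set E).
    IN = {b: set() if b == entry else set(all_exprs) for b in blocks}
    OUT = {b: (IN[b] - kill_e[b]) | gen_e[b] for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue                  # IN(entry) stays empty
            # IN(B) := intersection of OUT(P) over predecessors P of B
            IN[b] = set.intersection(*(OUT[p] for p in preds[b]))
            new_out = (IN[b] - kill_e[b]) | gen_e[b]
            if new_out != OUT[b]:
                OUT[b], changed = new_out, True
    return IN, OUT

# Edges: B1 -> B2, B1 -> B3, B2 -> B3.  B1 computes x*y; B2 assigns x.
blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}
gen_e = {"B1": {"x*y"}, "B2": set(), "B3": set()}
kill_e = {"B1": set(), "B2": {"x*y"}, "B3": set()}
IN, OUT = available_expressions(blocks, "B1", preds, gen_e, kill_e, {"x*y"})
# IN["B3"] == set(): x*y is killed on the B2 path, so it is not available at B3
```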
Live Variables

A variable X is live at p if X will be referenced on some path from p to the end of the program.

IN(B) = (OUT(B) – DEF(B)) ∪ USE(B)

OUT(B) = ∪ {IN(S) | B → S in the graph}

DEF(B) = variables that are assigned in B before they are used
USE(B) = variables that are used in B before any assignment to them in B
Live Variable Analysis
• Example of a Backward Flow Analysis.• Useful in register
allocation/deallocation• The role of IN and OUT are reversed
compared with reaching definitions and available expressions
• This is a least fixed point iteration due to the use of the U in defining OUT(B).
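The reversed roles of IN and OUT show up directly in code: the analysis walks successors instead of predecessors (a minimal sketch; the function name and the three-statement example are hypothetical):

```python
def live_variables(blocks, succs, use, defn):
    """Least-fixed-point iteration for live variables (backward flow)."""
    IN = {b: set() for b in blocks}
    OUT = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # OUT(B) := union of IN(S) over successors S of B
            OUT[b] = set().union(*(IN[s] for s in succs[b])) if succs[b] else set()
            new_in = (OUT[b] - defn[b]) | use[b]
            if new_in != IN[b]:
                IN[b], changed = new_in, True
    return IN, OUT

# B1: x := 1;  B2: y := x;  B3: print(y)   (straight-line CFG B1 -> B2 -> B3)
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B3"], "B3": []}
use = {"B1": set(), "B2": {"x"}, "B3": {"y"}}
defn = {"B1": {"x"}, "B2": {"y"}, "B3": set()}
IN, OUT = live_variables(blocks, succs, use, defn)
# x is live across the B1 -> B2 edge; y across B2 -> B3
```

A register allocator would keep x in a register only between B1 and B2, and could reuse that register for y afterwards.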
Very Busy Expressions

X op Y is said to be 'very busy' at a point p if every path from p encounters X op Y before any assignment to X or Y.

DEFvb(B) = expressions X op Y in B in which X or Y is defined before computing X op Y
USEvb(B) = expressions X op Y in B in which neither X nor Y is defined before computing X op Y

IN(B) = (OUT(B) – DEFvb(B)) ∪ USEvb(B)

OUT(B) = ∩ {IN(S) | B → S in the graph}
Code Hoisting

• Very busy expressions are useful in "code hoisting"
• An example of backward flow analysis

[Figure: at point p the flow splits; one branch computes A := B op C and later uses A, the other computes D := B op C and later uses D. B and C do not change below p.]

After Code Hoisting

[Figure: the computation is hoisted to T := B op C at point p; both branches now simply use := T.]
Assumes that B op C does not
Programming with Partial Orders and Lattices
Terms and Exprs
LUB and GLB are basic operations
Pattern Matching with Sets
Program Flow Analysis: Reaching Definitions
Program Flow Analysis: Very Busy Expressions
Note: E is the set of all expressions in the program being analyzed.
Conditional Clauses: Shortest Distance
function short/total