Example
1. if (x>y) {
2.   x = x + y;
3.   y = x - y;
4.   x = x - y;
5.   if (x-y>0) {
6.     assert(false);
7.   }
8. }
Will assertion failure occur?
[Figure: control-flow graph of the program above with nodes 1 to 8; edges labeled x > y, x <= y, x = x+y, y = x-y, x = x-y, x-y>0 and x-y<=0.]
Example: Path Condition
Assertion failure occurs if and only if:
x1 > y1 &&
x2 = x1 && y2 = y1 &&
x3 = x2+y2 && y3 = y2 &&
x4 = x3 && y4 = x3-y3 &&
x5 = x4-y4 && y5 = y4 &&
x5-y5 > 0 &&
!(false)
is satisfiable.
Symbolic Execution
• Rather than executing a program with concrete input values, execute it with symbolic variables representing the inputs.
• Proposed in 1976*.
• Popularized only in recent years due to advances in constraint solving techniques.
*L. A. Clarke, “A System to Generate Test Data and Symbolically Execute Programs”, IEEE Transactions on Software Engineering
x1 > y1 &&
x2 = x1 && y2 = y1 &&
x3 = x2+y2 && y3 = y2 &&
x4 = x3 && y4 = x3-y3 &&
x5 = x4-y4 && y5 = y4 &&
x5-y5 > 0 &&
!(false)
How do we know systematically whether a constraint like this is satisfiable or not?
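Before turning to solvers, the constraint can be probed by hand. The sketch below is a bounded exhaustive search in Python, used only as a stand-in for a real solver: the function name and the bound are assumptions made for illustration, and an empty result over a finite range is evidence, not a proof. Python integers do not overflow, so machine-integer wrap-around is not modelled.

```python
from itertools import product

def path_condition_holds(x1: int, y1: int) -> bool:
    """Evaluate the path condition for concrete initial values x1, y1."""
    if not x1 > y1:                 # branch taken at line 1
        return False
    x2, y2 = x1, y1
    x3, y3 = x2 + y2, y2            # line 2: x = x + y
    x4, y4 = x3, x3 - y3            # line 3: y = x - y
    x5, y5 = x4 - y4, y4            # line 4: x = x - y
    return x5 - y5 > 0              # branch taken at line 5

# Assumed bound, chosen only to keep the search small.
BOUND = 50
witnesses = [(x, y)
             for x, y in product(range(-BOUND, BOUND + 1), repeat=2)
             if path_condition_holds(x, y)]
print(witnesses)  # no pair satisfies the path condition
```

The search finds nothing because the three assignments swap x and y: x5 equals y1 and y5 equals x1, so x5-y5 > 0 contradicts x1 > y1.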
Boolean Satisfiability Problem
• Boolean Satisfiability (often abbreviated SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula.
• Consider the formula (a ∨ b) ∧ (¬a ∨ ¬c)
  – The assignment b = True and c = False satisfies the formula!
Arguably one of the most important problems in computer science.
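For a formula this small, satisfiability can be checked by trying all assignments. The sketch below does exactly that; it is a naive truth-table enumeration, not how a real SAT solver works, and the names are chosen for illustration.

```python
from itertools import product

def formula(a: bool, b: bool, c: bool) -> bool:
    """(a OR b) AND (NOT a OR NOT c)"""
    return (a or b) and ((not a) or (not c))

# Try all 2^3 assignments and keep the satisfying ones (the models).
models = [m for m in product([False, True], repeat=3) if formula(*m)]
satisfiable = len(models) > 0

for a, b, c in models:
    print(f"a={a}, b={b}, c={c}")
```

The enumeration takes 2^n steps for n variables, which is why it stops being practical almost immediately.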
Exercise 1
• Consider the following constraints:
  – John can only meet on Monday, Wednesday or Thursday;
  – Catherine cannot meet on Wednesday;
  – Anne cannot meet on Friday;
  – Peter can meet neither on Tuesday nor on Thursday.
• Question: When can the meeting take place?
• Answer the question using SAT solving.
SAT: Example
Use 3 Boolean variables to represent the 6 colors.
Use 3 variables to represent each little square.
Define a function T(X, Y), which changes the values of the Boolean variables from X to Y, to represent the turns.
Question: the game can be solved by answering the satisfiability of the following formula.
Init(X0) && T(X0, X1) && T(X1, X2) && … && T(X17, X18) && Goal(X18)
History
• SAT was shown to be NP-complete in 1971 (Stephen Cook).
• The DPLL algorithm was developed in the early 1960s (Davis and Putnam, 1960; Davis, Logemann and Loveland, 1962).
• A breakthrough occurred in the 1990s.
• Advanced SAT solvers handle problem instances with millions of Boolean variables.
• Annual competition: http://www.satcompetition.org/
Exponential Complexity Growth: The Challenge of Complex Domains
[Figure: case complexity (exponential scale) plotted against the number of variables and rules (constraints), from about 100 up to millions. Example domains in increasing size: car repair diagnosis, deep space mission control, chess (20 steps deep), VLSI verification, war gaming, military logistics. Reference marks: seconds until heat death of the sun, protein folding calculation (petaflop-year), number of atoms on the earth (10^47).]
Note: rough estimates, for propositional reasoning
[Credit: Kumar, DARPA; Cited in Computer World magazine]
SAT Solver Progress
Instance          Posit '94   Grasp '96   Sato '98   Chaff '01
ssa2670-136       40.66s      1.20s       0.95s      0.02s
bf1355-638        1805.21s    0.11s       0.04s      0.01s
pret150_25        >3000s      0.21s       0.09s      0.01s
dubois100         >3000s      11.85s      0.08s      0.01s
aim200-2_0-no-1   >3000s      0.01s       <0.01s     <0.01s
2dlx_..._bug005   >3000s      >3000s      >3000s     2.90s
c6288             >3000s      >3000s      >3000s     >3000s
Source: Marques-Silva 2002
Solvers have continually improved over time
SAT Extension: QBF
• SAT: is there an assignment to b1, b2, b3 that satisfies a formula with no quantifiers?
• QBF: is a formula built from Boolean variables and both "for all" (∀) and "there exists" (∃) quantifiers satisfiable or not?
  – ∀x ∀y ∃z (x ∨ y ∨ z) ∧ (¬x ∨ ¬y ∨ ¬z)
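The quantified formula above can be evaluated directly for three variables: "for all" becomes all(...) and "there exists" becomes any(...). This is a brute-force sketch for illustration only; the second check, with the quantifier order swapped, is an addition that is not on the slide.

```python
BOOLS = (False, True)

def matrix(x: bool, y: bool, z: bool) -> bool:
    """(x OR y OR z) AND (NOT x OR NOT y OR NOT z)"""
    return (x or y or z) and ((not x) or (not y) or (not z))

# forall x, forall y, exists z: z may be chosen after seeing x and y.
forall_then_exists = all(
    all(any(matrix(x, y, z) for z in BOOLS) for y in BOOLS)
    for x in BOOLS
)

# exists z, forall x, forall y: a single z must work for every x and y.
exists_then_forall = any(
    all(all(matrix(x, y, z) for y in BOOLS) for x in BOOLS)
    for z in BOOLS
)

print(forall_then_exists, exists_then_forall)
```

The first formula is true and the second is false, so the order of the quantifiers matters, which is what separates QBF from plain SAT.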
SAT Extension: SMT
• Satisfiability Modulo Theories (SMT) enriches Boolean formulas with linear constraints, arrays, all-different constraints, uninterpreted functions, etc.
• Very efficient SMT solvers are now available that can handle many such kinds of constraints.
• Annual competition: http://www.smtcomp.org/
SMT Example
• (Difference logic) Is there a solution {x,y} satisfying x-y < 20 and x-y > 4?
• (Linear arithmetic) Is there a solution {x,y,z} satisfying 3x+2y >= 5z and 5z = 2x?
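Both questions ask for a model, meaning values of the variables that make the constraints true. The sketch below finds one by bounded search over integers. The helper find_model, the integer domain and the bound are assumptions made for illustration; a real SMT solver decides such constraints without enumerating values.

```python
from itertools import product

def find_model(constraint, num_vars, bound=10):
    """Return the first integer tuple in [-bound, bound]^num_vars that
    satisfies `constraint`, or None if the bounded search finds nothing."""
    for values in product(range(-bound, bound + 1), repeat=num_vars):
        if constraint(*values):
            return values
    return None

# Difference logic: x - y < 20 and x - y > 4
dl_model = find_model(lambda x, y: x - y < 20 and x - y > 4, 2)

# Linear arithmetic: 3x + 2y >= 5z and 5z = 2x
la_model = find_model(lambda x, y, z: 3 * x + 2 * y >= 5 * z and 5 * z == 2 * x, 3)

print(dl_model, la_model)
```

A returned tuple shows that the constraint is satisfiable. A result of None would only mean that no solution exists within the chosen bound.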
Black Box View
Logic formula → SMT solver → "not satisfiable", or an assignment of the variables
For the first example program, the SMT solver answers "not satisfiable" for the path condition, which proves that the assertion failure does not occur.
Symbolic Execution: Algo1
Find all paths P which lead to an assertion;
For each path p in P {
    Construct a path condition Con for p;
    Check whether Con is satisfiable using an SMT solver;
    if (satisfiable) {
        Construct a test case based on the SMT output;
        Report error and stop;
    }
}
Report assertion verified;
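The loop of Algo1 can be sketched on a tiny program. The program below is hypothetical (it is not one of the slide examples), and so are all function names. Its inputs are Boolean and every path fixes each input, so the "solver" call reduces to looking at the value of x at the end of the path; a real implementation would hand the path condition to an SMT solver.

```python
from itertools import product

# Hypothetical program under analysis:
#   a = input(); b = input(); x = 0;
#   if (a) { x = 1; }
#   if (b) { x = x + 2; }
#   assert(x != 3);

def enumerate_paths():
    """Yield (path_condition, final_x) for every path to the assertion.
    A path condition maps each Boolean input to the value the path requires."""
    for take_a, take_b in product([True, False], repeat=2):
        x = 0
        if take_a:
            x = 1
        if take_b:
            x = x + 2
        yield {"a": take_a, "b": take_b}, x

def solve(path_condition, final_x):
    """Stand-in for the SMT call on `path condition && !(x != 3)`.
    Returns a model (the test case) if satisfiable, otherwise None."""
    return dict(path_condition) if final_x == 3 else None

def algo1():
    for path_condition, final_x in enumerate_paths():
        model = solve(path_condition, final_x)
        if model is not None:
            return ("error", model)       # test case leading to the violation
    return ("verified", None)

result = algo1()
print(result)
```

The returned model doubles as a concrete test input, which is the practical payoff of solving the path condition rather than only deciding it.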
Exercise 2
Boolean a = input();
Boolean b = input();
Boolean c = input();
int x = 0, y = 0, z = 0;
if (a) {
    x = -2;
}
if (b) {
    if (!a && c) { y = 1; }
    z = 2;
}
assert(x+y+z != 3);
Analyze the above program using Algo1 to check assertion violation.
Limitation: Path Explosion
How many paths are there?
• 2^3
• Exponential in branching structure.
if (input()==true) { x = x+1; }
if (input()==true) { x = x+2; }
if (input()==true) { x = x+4; }
assert(x <= 7);
Limitation: Path Explosion
How do we handle loops?
• Check all paths which reach the assertion in one iteration.
• … in two iterations.
• … in three iterations.
• …
int x = input();
while (x > 0) {
    x++;
    assert(…);
}
The loop invariant problem is still there.
Limitation: Incompleteness
An SMT solver is no magic.
• Existing SMT solvers support theories such as linear integer arithmetic, bit vectors and strings.
• Existing SMT solvers are not particularly scalable.
int x = input();
int y = input();
int z = input();
if (5x^63 + 7x^12 == 78y^2 + z) {
    assert(false);
}
Example
1. if (input()==true) { x = x+1; }
2. if (input()==true) { x = x+2; }
3. if (input()==true) { x = x+4; }
4. assert(x <= 7);
Is it possible to have an assertion failure?
How many path conditions do we have to solve?
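Both questions can be answered by listing the paths. In the sketch below each path is a choice of then/else at locations 1 to 3, and the symbolic value of x at location 4 is the initial value plus a constant offset. The initial value x = 0 is taken from the path condition used in the walk-through (x1 = 0); the names are chosen for illustration.

```python
from itertools import product

INCREMENTS = (1, 2, 4)   # effect of the then-branch at locations 1, 2, 3
INITIAL_X = 0            # assumption: x starts at 0

def final_offsets():
    """Map each tuple of branch choices to the amount added to x."""
    offsets = {}
    for choices in product([False, True], repeat=len(INCREMENTS)):
        offsets[choices] = sum(inc for inc, taken in zip(INCREMENTS, choices) if taken)
    return offsets

offsets = final_offsets()
num_paths = len(offsets)

# A path violates assert(x <= 7) iff INITIAL_X + offset > 7.
violating = [choices for choices, off in offsets.items() if INITIAL_X + off > 7]

print(num_paths, max(offsets.values()), violating)
```

Without any learning, every one of these paths needs its own solver call, and each additional if-statement doubles the count.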
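Unfolding Tree
[Figure: unfolding tree of the program, rooted at location 1. Each of the locations 1, 2 and 3 has a then-branch edge (x = x+1, x = x+2 and x = x+4 respectively) and an else-branch edge (*), giving eight leaves at location 4.]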
Step 1: Symbolic Execution
[Figure: unfolding tree]
Path Condition: x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3 && x4 > 7
Step 1: Interpolant
A: states reached by the path: x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3
B: bad states: x4 > 7
Interpolant: a generalization of A which is still disjoint from B.
Craig Interpolation
Given a pair of predicates (A, B), if A && B is not satisfiable, an interpolant for (A, B) is a formula P with the following properties:
• A implies P,
• P && B is unsatisfiable, and
• P refers only to the common variables of A and B.
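The first two properties can be checked mechanically for a candidate P. The sketch below does so for the A and B of Step 1 by enumerating a small integer domain; the domain and the helper name are assumptions, and a bounded check only illustrates the definition. The third property holds by construction here, because every candidate mentions only x4, the one variable shared by A and B.

```python
from itertools import product

DOMAIN = range(-3, 12)   # assumed small domain, large enough to include x4 > 7

def A(x1, x2, x3, x4):
    return x1 == 0 and x2 == x1 and x3 == x2 and x4 == x3

def B(x4):
    return x4 > 7

def is_interpolant(P):
    """Check 'A implies P' and 'P && B is unsatisfiable' over DOMAIN."""
    a_implies_p = all(P(x4)
                      for x1, x2, x3, x4 in product(DOMAIN, repeat=4)
                      if A(x1, x2, x3, x4))
    p_and_b_unsat = not any(P(x4) and B(x4) for x4 in DOMAIN)
    return a_implies_p and p_and_b_unsat

p_le_7 = is_interpolant(lambda x4: x4 <= 7)   # the complement of B
p_eq_0 = is_interpolant(lambda x4: x4 == 0)   # exactly the x4 values A allows
p_le_9 = is_interpolant(lambda x4: x4 <= 9)   # overlaps B at x4 = 8 and 9

print(p_le_7, p_eq_0, p_le_9)
```

Two of the three candidates pass, which shows that a pair (A, B) can have several interpolants.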
Exercise 3: Interpolant
A is: (x <= 3 && y <= 1) || (x <= 2 && y <= 2) || (x <= 1 && y <= 3)
B is: (x >= 3 && y >= 2) || (x >= 2 && y >= 3)
Is there any interpolant other than A or !B? Find one if you believe there is. Otherwise, argue why there isn’t any.
Finding interpolants in general is a hard problem.
Interpolation Computation
• Many algorithms have been proposed to compute interpolants efficiently for various logics.
• Given a pair of A and B, there might be many different interpolants.
• The weakest precondition is the weakest (most general) interpolant, which is expensive to compute.
• Existing tools usually propose interpolants in the form of a conjunctive formula.
Let A be x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3. Let B be x4 > 7.
(Weakest) interpolant: x4 <= 7.
We learned:
At location 4, x <= 7 implies safety.
Let A be x1 = 0 && x2 = x1 && x3 = x2. Let B be x4 = x3 && x4 > 7.
(Weakest) interpolant: x3 <= 7.
We learned:
At location 4, x <= 7 implies safety;
At location 3, x <= 7 implies safety if we take the else-branch.
At location 4, x <= 7 implies safety;
At location 3, x <= 7 implies safety if we take the else-branch;
At location 2, x <= 7 implies safety if we take two else-branches;
At location 1, x <= 7 implies safety if we take three else-branches.
x1=0 && x2=x1 && x3=x2 && x4=x3+4 implies x4<=7, and therefore it is safe.
Since x1=0 && x2=x1 && x3=x2 && x4=x3+4 && x4>7 is unsatisfiable, we learn using interpolants again.
Let A be x1=0 && x2=x1 && x3=x2 and B be x4=x3+4 && x4>7. We found an interpolant x3 <= 3 at location 3.
At location 4, x <= 7 implies safety;
At location 3, x <= 7 implies safety if we take the else-branch;
At location 3, x <= 3 implies safety if we take the then-branch;
At location 2, x <= 7 implies safety if we take two else-branches;
At location 1, x <= 7 implies safety if we take three else-branches.
At location 4, x <= 7 implies safety;
At location 3, x <= 3 implies safety;
At location 2, x <= 3 implies safety if we take the else-branch first;
At location 1, x <= 3 implies safety if we take two else-branches.
At location 3, x <= 3 implies safety;
At location 2, x <= 3 implies safety if we take the else-branch first;
At location 1, x <= 3 implies safety if we take two else-branches.
x1=0 && x2=x1 && x3=x2+2 implies x3<=3, and therefore it is safe.
We learned:
At location 2, x <= 1 implies safety.
At location 2, x <= 1 implies safety;
x1=0 && x2=x1+1 implies x2<=1, and therefore it is safe.
Reduction
[Figure: unfolding tree]
Algorithm
Input: a finite tree T with root v representing a program, assuming that each leaf represents an assertion: assert(Q).
Output: a test case leading to an assertion violation, or “no assertion violation”.

while (there are unvisited nodes) {
    visit the next node N in DFS order;
    let PathCond be the path condition of the current path;
    if (there is an unconditional learned result “if P is satisfied at N, then safe” and PathCond implies P) {
        update the learned results based on interpolants from PathCond && !P;
        skip the node;
    } else if (N is a leaf) {
        if (PathCond && !Q is satisfiable) {
            report a test case for the assertion violation;
        } else {
            update the learned results based on interpolants from PathCond && !Q;
        }
    }
}
report “no assertion violation”;
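A minimal sketch of the pruning idea, specialised to the three-branch program from the walk-through. It makes strong simplifying assumptions: the only state is the concrete value of x along the current path (starting at 0), learned results always have the form "x <= b at this location is safe", and the bound is obtained by propagating the assertion backwards instead of asking an interpolating solver. All names are hypothetical.

```python
INCREMENTS = {1: 1, 2: 2, 3: 4}   # effect of the then-branch at each location
LEAF, LIMIT = 4, 7                # location 4 holds assert(x <= 7)

learned = {}    # location -> b, meaning "x <= b at this location implies safety"
expanded = []   # nodes that were actually explored
skipped = []    # nodes pruned because a learned result already covers them

def visit(loc, x):
    """DFS over the unfolding tree; returns the safe bound learned for loc."""
    if loc in learned and x <= learned[loc]:
        skipped.append((loc, x))          # path condition implies learned result
        return learned[loc]
    expanded.append((loc, x))
    if loc == LEAF:
        if x > LIMIT:
            raise AssertionError(f"assertion violated with x = {x}")
        bound = LIMIT
    else:
        inc = INCREMENTS[loc]
        else_bound = visit(loc + 1, x)              # else-branch first
        then_bound = visit(loc + 1, x + inc) - inc  # shift bound back over x = x+inc
        bound = min(else_bound, then_bound)
    learned[loc] = min(bound, learned.get(loc, bound))
    return learned[loc]

visit(1, 0)
print(learned, len(expanded), len(skipped))
```

The bounds at locations 4, 3 and 2 come out as 7, 3 and 1, the same values as in the walk-through. Only 4 of the 15 tree nodes are expanded, and 3 subtrees are skipped.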
Exercise 4: Show How it Works
int y = input();
1. if (input()==true) { x = x+1; }
2. if (y>=1) { x = x+2; }
3. if (y<1) { x = x+4; }
4. assert(x <= 5);
Loops
• A program which contains one or more loops would lead to an unbounded tree.
• Symbolic execution can be used to help discover loop invariants.
1. if (input()==true) { x = x+1; }
2. if (input()==true) { x = x+2; }
3. if (input()==true) { x = x+4; }
4. assert(x <= 7);
How about we verify the program using simply Hoare logic?
Example
function foo(int x, int n) {
    int y = x;
    int i = 0;
    while (i < n) {
        x = x+1;
        i = i+1;
    }
    if (x < y) {
        error();
    }
}
Is error possible?
How do we systematically verify that?
Example
function foo(int x, int n) {
1.  int y = x;
2.  int i = 0;
3.  while (i < n) {
4.    x = x+1;
5.    i = i+1;
    }
6.  if (x < y) {
7.    error();
    }
}
[Figure: control-flow graph with nodes 1 to 7 and edges labeled y=x, i=0, i<n, x=x+1, i=i+1, i>=n and x<y; node 7 is marked "not safe".]
Example
Step 1:
Path condition: y=x && i=0 && i>=n && x<y
Unsatisfiable
Interpolant at 6: x >= y
[Figure annotation: x>=y implies safety at 6]
Example
Step 1:
Path condition: y=x && i=0 && i>=n && x<y
Unsatisfiable
Interpolant at 3->6: (x >= y)
[Figure annotations: x>=y implies safety at 6; x>=y at 3]
In theory, it should be: !(x<y && i>=n), why?
Example
Step 2:
Path condition: y=x && i=0 && i<n && x1=x+1 && i1=i+1 && i1>=n && x1<y
Unsatisfiable
Interpolant at 3->6: (x >= y)
[Figure annotations: x>=y implies safety at 6; x>=y implies safety at 3]
Guessing Loop Invariants
• Through symbolic execution with interpolants, we obtain conditions which must be satisfied in order to verify safety.
• These interpolants are likely to be related to the loop invariants.
• The idea: take (part of) the conditions as candidates for a loop invariant and check them.
Candidate: x>=y
To check whether x>=y is a sufficiently strong loop invariant, we need to establish:
{true} y=x; i=0 {x>=y}
{i<n && x>=y} x=x+1; i=i+1 {x>=y}
and x>=y implies x>=y at 6, which implies safety.
Do the above Hoare triples hold?
The above Hoare triples can be discharged using symbolic execution by checking that each of the following is unsatisfiable:
y=x && i=0 && x<y
i<n && x>=y && x1=x+1 && i1=i+1 && x1<y
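Each check asks whether a counterexample to a Hoare triple exists: a state that satisfies the precondition and, after the statements, violates x>=y. The sketch below searches a small integer range for such counterexamples. The range is an assumption made for illustration, so an empty result is a sanity check and not a proof; a solver would settle the question for all integers.

```python
from itertools import product

R = range(-5, 6)   # assumed small search range

# {true} y = x; i = 0 {x >= y}
# Counterexample: y == x and i == 0 (i plays no role) and x < y.
init_cex = [(x, y) for x, y in product(R, repeat=2) if y == x and x < y]

# {i < n && x >= y} x = x+1; i = i+1 {x >= y}
# Counterexample: the precondition holds, yet the new x (x1 = x + 1) is below y.
step_cex = [(x, y, i, n)
            for x, y, i, n in product(R, repeat=4)
            if i < n and x >= y and (x + 1) < y]

print(init_cex, step_cex)
```

Both lists are empty: y == x rules out x < y, and x >= y rules out x + 1 < y.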
Empirical Study
Reported in “Lazy Abstraction with Interpolants” (CAV 2006)
SGP = simple goto programs
These are all Windows device drivers
Conclusion
• Symbolic execution allows us to check many test cases (which share the same path) at once.
• Symbolic execution needs the support of advanced constraint solving like SMT solving – which is not yet very scalable.
• Symbolic execution with interpolants eases the path explosion problem by “learning” from failures (in reaching the error state).
Exercise 5
• Verify that the following program is free from exception using symbolic execution with interpolants.
int x;
int[] array = new int[]{1,2,3,4, …};

rec();
array[x] = 2;

public void rec() {
    if (input() == true) {
        rec();
        rec();
    } else {
        x = x+1;
        return;
    }
}
Motivation
• Random testing can cover many paths but is hardly ever complete.
• Symbolic execution can completely check all paths if there aren’t many.
if (x == 19973) {
    assert(false);
}
What is the probability of finding the assertion failure?
How about we randomly test first and use symbolic execution to increase coverage?
Example
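1.  int h(int x, int y) {
2.    if (x != y) {
3.      if (2*x == x + 10) {
4.        abort(); /*error*/
5.      }
6.      else {
7.        return 2*x+y;
8.      }
9.    }
10.   else {
11.     return 2*x;
12.   }
13. }
[Figure: control-flow graph. Node 1 branches on x == y (to node 11) and x != y (to node 3); node 3 branches on 2*x == x+10 (to node 4) and else (to node 7). The branches are annotated with "random testing" and "symbolic executing".]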
DART: Approach
Objective:
• Input: a function written in C.
• Output: a set of test cases which provides 100% code coverage.
Method:
• Generate a test driver that performs random testing to simulate the most general environment the program can operate in.
• Dynamically analyze how the program behaves under random testing and generate new test inputs systematically using symbolic execution.
Input
A function with parameters
• The function is assumed to be always terminating.
• It contains the following statements:
  – abort()
  – if (e) {goto l} else {goto l’} (where e is an expression and l and l’ are statements)
  – assignment: m := e (where m is a variable name and e is an expression)
• Expression e can be
  – a constant c, e1 * e2, e1 <= e2, !e1, *e1
• Expressions are side-effect-free.
Test Driver
• Identify all external inputs needed by the program
  – Function parameters and user inputs
  – External functions
Is this justified?
Dart: the Algorithm
complete = true;
do {
    path = <>;
    inits = [];
    directed = true;
    while (directed) {
        run_instructed();
    }
} while (!complete);

complete is true iff the applied SMT solver is complete in solving the constraints.
path is a sequence of statements and variable valuations;
inits assigns values to some variables (generated by the SMT solver);
Dart: the Algorithm
run_instructed() {
    for each variable x {
        x = random() if it is not in inits; otherwise x = inits(x);
    }
    Let s be the initial statement;
    while (s is not abort or halt) {
        execute s;
        add s and current variable valuations into path;
        s = next statement;
    }
    if (s == abort) {
        report bug;
        exit();
    } else {
        return solvePathCondition();
    }
}
Dart: the Algorithm
solvePathCondition() {
    from the last of path, find a statement if (B) {} else {} such that only its then-branch or else-branch has been executed;
    if (no such branching condition exists) {
        directed = false;
        return;
    } else {
        Remove from path all statements after that branch statement;
        Let C be B if the else-branch was taken, or else !B;
        if (SMT-solve(path, C)) {
            set inits to be the variable valuations returned by the SMT solver;
            return;
        } else {
            solvePathCondition();
        }
    }
}
Dart: the Algorithm
SMT-solve(path, C) {
    Let SM be a symbolic memory such that SM(x) = x for all variables x;
    evaluate each statement in path one by one on SM by calling evaluate(e, CM, SM), where CM is the concrete variable valuation before the execution of the statement;
    return true iff SM && C is satisfiable by an SMT solver;
}

evaluate(e, CM, SM) {
    if (e is variable name m) {
        return SM(m) if m is of a type supported by the SMT solver; or else CM(m);
    }
    if (e is e1 * e2) {
        let e1’ = evaluate(e1, CM, SM); e2’ = evaluate(e2, CM, SM);
        if (neither of e1’ or e2’ is a constant) {
            complete = false;
            return the evaluation result of e with CM;
        } else {
            return e1’*e2’;
        }
    }
    …
}
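The directed search can be sketched end to end on the example function h. This is a simplified illustration and not the DART implementation: branch conditions are recorded as Python predicates over the inputs instead of being built in a symbolic memory, a bounded search stands in for the SMT solver, and the complete flag is left out. The names dart, solve and the trace format are assumptions.

```python
import random
from itertools import product

def h(x, y, trace):
    """The example function h, recording (predicate, outcome) per branch."""
    c1 = lambda x, y: x != y
    trace.append((c1, c1(x, y)))
    if x != y:
        c2 = lambda x, y: 2 * x == x + 10
        trace.append((c2, c2(x, y)))
        if 2 * x == x + 10:
            raise RuntimeError("abort")
        return 2 * x + y
    return 2 * x

def solve(constraints, bound=20):
    """Stand-in for the SMT solver: bounded search over the two inputs."""
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if all(pred(x, y) == want for pred, want in constraints):
            return (x, y)
    return None

def dart(seed=0):
    rng = random.Random(seed)
    inputs = (rng.randint(-1000, 1000), rng.randint(-1000, 1000))  # random start
    stack = []   # entries [predicate, outcome, done]; done = both sides tried
    while True:
        trace = []
        try:
            h(inputs[0], inputs[1], trace)
        except RuntimeError:
            return ("bug", inputs)
        # Branches beyond the forced prefix are new: push them as not done.
        for pred, outcome in trace[len(stack):]:
            stack.append([pred, outcome, False])
        # Flip the deepest branch that still has an unexplored side.
        inputs = None
        while stack and inputs is None:
            pred, outcome, done = stack.pop()
            if not done:
                stack.append([pred, not outcome, True])
                inputs = solve([(p, o) for p, o, _ in stack])
                if inputs is None:
                    stack.pop()   # flipped side is infeasible here; backtrack
        if inputs is None:
            return ("no bug", None)

status, failing_inputs = dart()
print(status, failing_inputs)
```

Random inputs almost never satisfy 2*x == x+10, so the first run usually takes the x != y branch and returns normally. Negating the last branch condition and solving gives x = 10 with y different from x, and the second run reaches abort().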
Dart: Theorem
(A) If Dart reports a bug, then there is some input that leads to an abort;
(B) If Dart terminates without reporting a bug, there is no input that leads to an abort and all paths in the program have been exercised;
(C) Otherwise, Dart runs forever.
Question
foo (int x, int y) {
    if (x*x*x > 0) {
        if (x > 0 && y == 10) {
            abort();
        }
    } else {
        if (x > 0 && y == 20) {
            abort();
        }
    }
}
Will Dart find the bug? Assume that the SMT solver can’t deal with non-linear expressions.
Case Study
• oSIP library: 30K lines of C code, 600+ externally visible functions http://www.gnu.org/software/osip/osip.html
• Apply DART to test every function
• There are no assertions
  – DART is used to look for segmentation faults and non-termination.
• DART found ways to crash 65% of the functions
  – Most of which are caused by null pointers in function parameters.
• Pex: a tool based on the same idea as DART will be part of Visual Studio 2015.