50.530: Software Engineering Sun Jun SUTD. Week 10: Symbolic Execution


Example

int x, y;

if (x>0) { assert(x>=0); array[x] = 5;}

Will an assertion failure occur?

Example

1. if (x > y) {
2.   x = x + y;
3.   y = x - y;
4.   x = x - y;
5.   if (x - y > 0) {
6.     assert(false);
7.   }
8. }

Will an assertion failure occur?

Example

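[Figure: control-flow graph of the program. If x <= y, nothing happens; otherwise x = x+y, y = x-y and x = x-y execute in sequence, followed by the branch x-y>0 (leading to the assertion) or x-y<=0.]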

Example: Path Condition

An assertion failure occurs if and only if the following formula is satisfiable:
x1 > y1 &&
x2 = x1 && y2 = y1 &&
x3 = x2 + y2 && y3 = y2 &&
x4 = x3 && y4 = x3 - y3 &&
x5 = x4 - y4 && y5 = y4 &&
x5 - y5 > 0 &&
!(false)
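Substituting the equalities one at a time shows why this formula is unsatisfiable. The derivation assumes mathematical (unbounded) integers; with fixed-width machine integers, overflow in x + y or x - y would need separate treatment.

```latex
\begin{aligned}
x_3 &= x_2 + y_2 = x_1 + y_1, \qquad y_3 = y_2 = y_1\\
x_4 &= x_3 = x_1 + y_1, \qquad y_4 = x_3 - y_3 = x_1\\
x_5 &= x_4 - y_4 = y_1, \qquad y_5 = y_4 = x_1\\
x_5 - y_5 > 0 &\iff y_1 - x_1 > 0 \iff y_1 > x_1
\end{aligned}
```

The last line contradicts x1 > y1, so no input reaches assert(false).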


Symbolic Execution

• Rather than executing a program with concrete input values, execute it with symbolic variables representing the inputs.

• Proposed in 1976*.
• Popularized only in recent years, thanks to advances in constraint solving techniques.

*L. A. Clarke, “A System to Generate Test Data and Symbolically Execute Programs”, IEEE Transactions on Software Engineering
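A minimal sketch of the idea, assuming programs that only use + and - on integers: each variable holds a linear expression over the symbolic inputs X and Y instead of a number. `lin_add`, `run_swap` and `contradicts` are hypothetical helpers written for this illustration, not part of any tool. Running the swap example symbolically shows that the values are exchanged and that the two branch conditions on the path to assert(false) contradict each other.

```python
def lin_add(e1, e2, sign=1):
    """Return e1 + sign * e2 for linear expressions stored as {symbol: coefficient}."""
    out = dict(e1)
    for sym, coeff in e2.items():
        out[sym] = out.get(sym, 0) + sign * coeff
    return {s: c for s, c in out.items() if c != 0}  # drop cancelled symbols

def contradicts(e1, e2):
    """e1 > 0 and e2 > 0 imply e1 + e2 > 0, impossible if e1 + e2 is identically 0."""
    return lin_add(e1, e2) == {}

def run_swap():
    x, y = {"X": 1}, {"Y": 1}          # symbolic inputs X and Y
    guard1 = lin_add(x, y, sign=-1)     # line 1: x > y      i.e.  X - Y > 0
    x = lin_add(x, y)                   # line 2: x = x + y
    y = lin_add(x, y, sign=-1)          # line 3: y = x - y
    x = lin_add(x, y, sign=-1)          # line 4: x = x - y
    guard5 = lin_add(x, y, sign=-1)     # line 5: x - y > 0  i.e.  Y - X > 0
    return x, y, guard1, guard5

x, y, g1, g5 = run_swap()
print(x, y)                  # {'Y': 1} {'X': 1}
print(contradicts(g1, g5))   # True: the path to assert(false) is infeasible
```

The check in `contradicts` is deliberately tiny: it only spots two strict inequalities whose sum is identically 0. A real symbolic executor hands the whole path condition to a constraint solver instead.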

x1 > y1 && x2 = x1 && y2 = y1 && x3 = x2 + y2 && y3 = y2 && x4 = x3 && y4 = x3 - y3 && x5 = x4 - y4 && y5 = y4 && x5 - y5 > 0 && !(false)

How do we systematically determine whether a constraint like this is satisfiable?

Boolean Satisfiability Problem

• Boolean Satisfiability (often abbreviated SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula.

• Consider the formula (a ∨ b) ∧ (¬a ∨ ¬c).
  – The assignment b = True and c = False satisfies the formula!

Arguably one of the most important problems in computer science.
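The definition can be made concrete with a brute-force check that tries all 2^n assignments. This is only a sketch for the formula above (`formula` and `brute_force_sat` are hypothetical names); practical solvers, such as those based on DPLL, avoid enumerating every assignment.

```python
from itertools import product

def formula(a, b, c):
    # (a OR b) AND (NOT a OR NOT c)
    return (a or b) and (not a or not c)

def brute_force_sat(f, n_vars):
    """Return the first satisfying assignment as a tuple, or None if f is unsatisfiable."""
    for values in product([False, True], repeat=n_vars):
        if f(*values):
            return values
    return None

print(brute_force_sat(formula, 3))  # (False, True, False): a = False, b = True, c = False
```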

Exercise 1

• Consider the following constraints:
  – John can only meet on Monday, Wednesday or Thursday;
  – Catherine cannot meet on Wednesday;
  – Anne cannot meet on Friday;
  – Peter can meet neither on Tuesday nor on Thursday.
• Question: When can the meeting take place?
• Answer the question using SAT solving.
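One possible encoding, sketched under two assumptions that the exercise does not state: the candidate days are Monday to Friday, and the meeting happens on exactly one day. There is one Boolean variable per day, and each person contributes one constraint. The sketch enumerates all 2^5 assignments instead of calling a real SAT solver, and writes "exactly one day" directly rather than as CNF clauses (one "at least one day" clause plus pairwise "not both days" clauses).

```python
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

def constraints(m):
    """m maps each day to a Boolean: 'the meeting takes place on that day'."""
    exactly_one = sum(m.values()) == 1
    john = m["Mon"] or m["Wed"] or m["Thu"]
    catherine = not m["Wed"]
    anne = not m["Fri"]
    peter = not m["Tue"] and not m["Thu"]
    return exactly_one and john and catherine and anne and peter

solutions = []
for values in product([False, True], repeat=len(DAYS)):
    m = dict(zip(DAYS, values))
    if constraints(m):
        solutions.append([d for d in DAYS if m[d]])
print(solutions)  # [['Mon']]
```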

SAT: Example
Use 3 Boolean variables to represent the 6 colors.

Use 3 Boolean variables to represent (the color of) each little square.

Define a function T(X, Y) that holds when one turn changes the values of the Boolean variables X into Y; it represents the turns.

The game can then be solved by deciding the satisfiability of the following formula:
Init(X0) && T(X0, X1) && T(X1, X2) && … && T(X17, X18) && Goal(X18)
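The same unrolling pattern can be tried on a hypothetical toy puzzle small enough to enumerate: three lights, one Boolean each, where a turn flips exactly one light, starting with all lights off and aiming for all lights on. `solvable_in(k)` checks the satisfiability of Init(X0) && T(X0, X1) && … && T(Xk-1, Xk) && Goal(Xk) by brute force; the puzzle and all names are illustrative assumptions.

```python
from itertools import product

N = 3  # hypothetical toy puzzle: 3 lights, one Boolean variable each

def init(s):
    return s == (False,) * N       # all lights off

def goal(s):
    return s == (True,) * N        # all lights on

def T(s, t):
    """One turn flips exactly one light."""
    return sum(a != b for a, b in zip(s, t)) == 1

def solvable_in(k):
    """Is Init(X0) && T(X0,X1) && ... && T(Xk-1,Xk) && Goal(Xk) satisfiable?"""
    states = list(product([False, True], repeat=N))
    for seq in product(states, repeat=k + 1):  # candidate values for X0, ..., Xk
        if init(seq[0]) and goal(seq[-1]) and all(T(seq[i], seq[i + 1]) for i in range(k)):
            return True
    return False

print([k for k in range(6) if solvable_in(k)])  # [3, 5]
```

Each turn changes the parity of the number of lit lights, so only odd k >= 3 work; a SAT solver would reach the same answer without enumerating state sequences.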

History

• SAT was shown to be NP-complete in 1971 (Stephen Cook).

• The DPLL algorithm was developed in the early 1960s.
• Breakthroughs occurred in the 1990s.
• Advanced SAT solvers handle problem instances with millions of Boolean variables.
• Annual competition: http://www.satcompetition.org/

[Figure: Exponential Complexity Growth: The Challenge of Complex Domains. The worst-case search space (up to about 10^301,020) is plotted against the number of variables and of rules (constraints), each ranging up to about a million, for domains including car repair diagnosis, deep space mission control, chess (20 steps deep), VLSI verification, war gaming and military logistics. Reference points: seconds until heat death of the sun, protein folding calculation (petaflop-year), number of atoms on the earth. Note: rough estimates, for propositional reasoning. Credit: Kumar, DARPA; cited in Computer World magazine.]

SAT Solver Progress

Instance           Posit '94   Grasp '96   Sato '98   Chaff '01
ssa2670-136        40.66s      1.20s       0.95s      0.02s
bf1355-638         1805.21s    0.11s       0.04s      0.01s
pret150_25         >3000s      0.21s       0.09s      0.01s
dubois100          >3000s      11.85s      0.08s      0.01s
aim200-2_0-no-1    >3000s      0.01s       <0.01s     <0.01s
2dlx_..._bug005    >3000s      >3000s      >3000s     2.90s
c6288              >3000s      >3000s      >3000s     >3000s

Source: Marques-Silva 2002

Solvers have continually improved over time

SAT Extension: QBF

• SAT: are there values for b1, b2, b3, … that make a formula with no quantifiers true?

• QBF: is a formula over Boolean variables that uses both "for all" (∀) and "there exists" (∃) quantifiers true?
  – ∀x ∀y ∃z (x ∨ y ∨ z) ∧ (¬x ∨ ¬y ∨ ¬z)
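A QBF in prenex form can be evaluated by plain recursion over its quantifier prefix: ∀ requires both values of the variable to work, ∃ requires at least one. The sketch below (`qbf` and `phi` are hypothetical names) checks the example above; it takes exponential time and is only meant to pin down the semantics, not to show how QBF solvers work.

```python
def qbf(quantifiers, matrix, env=()):
    """Evaluate a prenex QBF by recursion over its quantifier prefix.

    quantifiers: e.g. ["forall", "forall", "exists"], one entry per variable, in order
    matrix: function from a tuple of Booleans to bool (the quantifier-free part)
    """
    if not quantifiers:
        return matrix(env)
    results = (qbf(quantifiers[1:], matrix, env + (v,)) for v in (False, True))
    # forall: every value must work; exists: one value suffices
    return all(results) if quantifiers[0] == "forall" else any(results)

def phi(v):
    x, y, z = v
    return (x or y or z) and (not x or not y or not z)

print(qbf(["forall", "forall", "exists"], phi))  # True
# Moving z to the front (exists z forall x forall y) makes the formula false:
print(qbf(["exists", "forall", "forall"], lambda v: phi((v[1], v[2], v[0]))))  # False
```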

QBF Example

Query: Does there exist a strategy such that, for all of the opponent's moves, I would win?

SAT Extension: SMT

• Satisfiability Modulo Theories (SMT) enriches Boolean (SAT) formulas with linear constraints, arrays, all-different constraints, uninterpreted functions, etc.

• Very efficient SMT solvers are now available that can handle many such kinds of constraints.

• Annual competition: http://www.smtcomp.org/

SMT Example

• (Difference logic) Is there a solution {x, y} satisfying x - y < 20 and x - y > 4?

• (Linear arithmetic) Is there a solution {x, y, z} satisfying 3x + 2y >= 5z and 5z = 2x?
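To make the two questions concrete, the sketch below looks for integer solutions by exhaustive search over a small box. `find_model` is a hypothetical stand-in, not an SMT solver: it assumes x, y, z are integers in [-10, 10], and unlike a real solver it cannot prove unsatisfiability.

```python
from itertools import product

def find_model(constraint, names, bound=10):
    """Try every integer tuple in [-bound, bound]; return the first model found."""
    for values in product(range(-bound, bound + 1), repeat=len(names)):
        model = dict(zip(names, values))
        if constraint(**model):
            return model
    return None  # no model in the box; this does NOT prove unsatisfiability

# Difference logic: x - y < 20 and x - y > 4
difference = find_model(lambda x, y: 4 < x - y < 20, ["x", "y"])
# Linear arithmetic: 3x + 2y >= 5z and 5z = 2x
linear = find_model(lambda x, y, z: 3*x + 2*y >= 5*z and 5*z == 2*x, ["x", "y", "z"])
print(difference, linear)
```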

Black Box View

[Figure: black-box view. A logic formula goes into the SMT solver, which returns either "not satisfiable" or an assignment of the variables.]

Symbolic Execution: Algo1

Find all paths P that lead to an assertion;
for each path p in P {
  construct the path condition Con for p;
  check whether Con is satisfiable using an SMT solver;
  if (satisfiable) {
    construct a test case based on the SMT output;
    report error;
  }
}
report assertion verified;
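Algo1 can be sketched in a few lines if paths are written out by hand and the SMT call is replaced by a brute-force search over a small input domain. Everything here (`violates`, `algo1`, the step encoding) is a hypothetical illustration; note that "verified" only covers the searched domain, whereas an SMT solver reasons about all inputs.

```python
from itertools import product

# A path is a list of steps over a state dictionary:
#   ("assume", pred)     branch condition taken on this path
#   ("assign", var, f)   var := f(state)
#   ("assert", pred)     the assertion reached at the end of the path
def violates(path, inputs):
    """True iff these inputs drive execution along `path` and falsify its assertion."""
    state = dict(inputs)
    for step in path:
        if step[0] == "assume" and not step[1](state):
            return False
        if step[0] == "assign":
            state[step[1]] = step[2](state)
        if step[0] == "assert":
            return not step[1](state)
    return False

def algo1(paths, input_names, domain):
    """Algo1 with a bounded brute-force search in place of the SMT solver."""
    for path in paths:
        for values in product(domain, repeat=len(input_names)):
            test = dict(zip(input_names, values))
            if violates(path, test):       # the path condition is satisfiable
                return "error", test       # report a test case
    return "verified", None                # only within the searched domain!

# The single path of the swap example that reaches assert(false).
swap_path = [
    ("assume", lambda s: s["x"] > s["y"]),
    ("assign", "x", lambda s: s["x"] + s["y"]),
    ("assign", "y", lambda s: s["x"] - s["y"]),
    ("assign", "x", lambda s: s["x"] - s["y"]),
    ("assume", lambda s: s["x"] - s["y"] > 0),
    ("assert", lambda s: False),
]
print(algo1([swap_path], ["x", "y"], range(-5, 6)))  # ('verified', None)
```

Dropping the three swap assignments from `swap_path` makes the path condition satisfiable, and `algo1` then returns an input with x > y as the failing test case.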

Exercise 2

Boolean a = input();
Boolean b = input();
Boolean c = input();
int x = 0, y = 0, z = 0;
if (a) {
  x = -2;
}
if (b) {
  if (!a && c) { y = 1; }
  z = 2;
}
assert(x + y + z != 3);

Analyze the above program using Algo1 to check for assertion violations.

Limitation: Path Explosion

How many paths are there?
• 2^3 = 8.
• Exponential in the branching structure.

int x = 0;
if (input()==true) { x = x+1; }
if (input()==true) { x = x+2; }
if (input()==true) { x = x+4; }
assert(x <= 7);
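Enumerating the branch choices makes the count explicit. The sketch assumes x starts at 0 (as the path conditions later in these notes do) and models each input() as one Boolean choice; `run` is a hypothetical helper.

```python
from itertools import product

def run(choices, x=0):
    """Follow one path: choices[i] is the value returned by the i-th input()."""
    for taken, inc in zip(choices, (1, 2, 4)):
        if taken:
            x = x + inc
    return x

paths = list(product([False, True], repeat=3))  # one path per combination of inputs
finals = [run(p) for p in paths]
print(len(paths), max(finals))  # 8 7
```

With n such branches the same enumeration visits 2^n paths, which is the explosion the slide refers to.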

Limitation: Path Explosion

How do we handle loops?
• Check all paths that reach the assertion in one iteration.
• … in two iterations.
• … in three iterations.
• …

int x = input();
while (x > 0) {
  x++;
  assert(…);
}

The loop invariant problem is still there: no finite number of unrollings covers every iteration.
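Bounded unrolling can be sketched by simulating each path for at most k iterations. The assertion is elided on the slide, so the two assertions below (x > 1 and x != 5) are hypothetical choices, and the input range stands in for a solver. The buggy assertion is caught at a small bound, but for x > 1 every bound reports nothing and the unrolling never ends; proving it for all iterations needs a loop invariant such as x > 0.

```python
def violations_within(k, assertion, domain):
    """All (initial x, iteration) pairs that violate `assertion` within k iterations.

    Simulates: x = input(); while (x > 0) { x++; assert(...); }
    """
    found = []
    for x0 in domain:
        x = x0
        for i in range(1, k + 1):
            if not x > 0:        # loop guard false: this path leaves the loop
                break
            x += 1
            if not assertion(x):
                found.append((x0, i))
    return found

# Hypothetical assertion x > 1: no violation at any bound we try ...
print([violations_within(k, lambda x: x > 1, range(-10, 11)) for k in (1, 2, 3)])  # [[], [], []]
# ... while a hypothetical buggy assertion x != 5 is caught within 3 iterations.
print(violations_within(3, lambda x: x != 5, range(-10, 11)))  # [(2, 3), (3, 2), (4, 1)]
```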

Limitation: Incompleteness

• SMT solvers are no magic.
• Existing SMT solvers support theories such as linear integer arithmetic, bit vectors, strings, etc.

• Existing SMT solvers are not particularly scalable.

int x = input();
int y = input();
int z = input();
if (5x^63 + 7x^12 == 78y^2 + z) {  // ^ denotes exponentiation here
  assert(false);
}

AN INTERPOLATION METHOD FOR CLP TRAVERSAL

Jaffar et al. CP 2009

Symbolic Execution

Path Explosion

How do we solve the problem of path explosion?

Example

(x is initially 0.)
1. if (input()==true) { x = x+1; }
2. if (input()==true) { x = x+2; }
3. if (input()==true) { x = x+4; }
4. assert(x <= 7);

Is it possible to have an assertion failure?

How many path conditions do we have to solve?

Unfolding Tree

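[Figure: the unfolding tree of the program. Locations 1, 2 and 3 each split into a then-branch (x = x+1, x = x+2 and x = x+4 respectively) and an else-branch, so the tree has 8 leaves, each reaching assert(x <= 7) at location 4.]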

Step 1: Symbolic Execution


Path condition (all three else-branches taken): x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3 && x4 > 7

Step 1: Interpolant

States reached by the path (A): x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3

Bad states (B): x4 > 7

Interpolant: a generalization of A that is still disjoint from B.

Craig Interpolation

Given a pair of predicates (A, B) such that A && B is not satisfiable, an interpolant for (A, B) is a formula P with the following properties:
• A implies P;
• P && B is unsatisfiable; and
• P refers only to the common variables of A and B.

Example

A: x = 0
B: x > 7

Sample interpolants:
• x = 0
• x <= 3
• x < 7
• x <= 7
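The two semantic conditions can be checked mechanically for this example. The sketch assumes x ranges over integers and checks only a finite range, which stands in for a validity check by a solver; `is_interpolant` is a hypothetical helper. The shared-variable condition holds trivially here because A, B and every candidate mention only x.

```python
def is_interpolant(A, B, P, domain):
    """Check 'A implies P' and 'P && B is unsatisfiable' over a finite domain of x."""
    a_implies_p = all(P(x) for x in domain if A(x))
    p_and_b_unsat = not any(P(x) and B(x) for x in domain)
    return a_implies_p and p_and_b_unsat

A = lambda x: x == 0
B = lambda x: x > 7
candidates = {
    "x = 0": lambda x: x == 0,
    "x <= 3": lambda x: x <= 3,
    "x < 7": lambda x: x < 7,
    "x <= 7": lambda x: x <= 7,
    "x <= 8": lambda x: x <= 8,  # not an interpolant: x = 8 satisfies P && B
}
domain = range(-100, 101)
for name, P in candidates.items():
    print(name, is_interpolant(A, B, P, domain))
```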

Exercise 3: Interpolant

A is: (x <= 3 && y <= 1) || (x <= 2 && y <= 2) || (x <= 1 && y <= 3)
B is: (x >= 3 && y >= 2) || (x >= 2 && y >= 3)

Is there any interpolant other than A or !B? Find one if you believe there is. Otherwise, argue why there isn’t any.

Finding interpolants in general is a hard problem.

Interpolation Computation

• Many algorithms have been proposed to compute interpolants efficiently for various logics.

• Given a pair of A and B, there might be many different interpolants.

• The weakest precondition is the most general (logically weakest) interpolant, which is expensive to compute.

• Existing tools usually propose interpolants in the form of a conjunctive formula.


Let A be x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3. Let B be x4 > 7. Most general (weakest) interpolant: x4 <= 7.

We learned: at location 4, x <= 7 implies safety.


Let A be x1 = 0 && x2 = x1 && x3 = x2. Let B be x4 = x3 && x4 > 7. Most general (weakest) interpolant: x3 <= 7.

We learned:
• At location 4, x <= 7 implies safety;
• At location 3, x <= 7 implies safety if we take the else-branch.


• At location 4, x <= 7 implies safety;
• At location 3, x <= 7 implies safety if we take the else-branch;
• At location 2, x <= 7 implies safety if we take two else-branches;
• At location 1, x <= 7 implies safety if we take three else-branches.

For the path that takes the then-branch only at location 3: x1=0 && x2=x1 && x3=x2 && x4=x3+4 implies x4<=7, and therefore it is safe.


Since x1=0 && x2=x1 && x3=x2 && x4=x3+4 && x4>7 is unsatisfiable, we learn using interpolants again.


Let A be x1=0 && x2=x1 && x3=x2 and B be x4=x3+4 && x4>7. We found an interpolant x3 <= 3 at location 3.


• At location 4, x <= 7 implies safety;
• At location 3, x <= 7 implies safety if we take the else-branch;
• At location 3, x <= 3 implies safety if we take the then-branch;
• At location 2, x <= 7 implies safety if we take two else-branches;
• At location 1, x <= 7 implies safety if we take three else-branches.

Let A be x1=0 && x2=x1 && x3=x2 and B be x4=x3+4 && x4>7. We found an interpolant x3 <= 3 at location 3.


• At location 4, x <= 7 implies safety;
• At location 3, x <= 3 implies safety;
• At location 2, x <= 3 implies safety if we take the else-branch first;
• At location 1, x <= 3 implies safety if we take two else-branches.


• At location 3, x <= 3 implies safety;
• At location 2, x <= 3 implies safety if we take the else-branch first;
• At location 1, x <= 3 implies safety if we take two else-branches.

x1=0 && x2=x1 && x3=x2+2 implies x3<=3, and therefore it is safe.


At location 2, x <= 1 implies safety;




x1=0 && x2=x1+1 implies x2<=1, and therefore it is safe.
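The facts learned in this walk-through can be cross-checked by computing, backwards from the assertion, the weakest condition on x at each location under which both branches stay safe (x is an integer; the then-branch at locations 1, 2 and 3 adds 1, 2 and 4 respectively):

```latex
\begin{aligned}
\text{loc } 4:&\quad x \le 7\\
\text{loc } 3:&\quad \underbrace{x \le 7}_{\text{else}} \;\wedge\; \underbrace{x + 4 \le 7}_{\text{then}} \;\equiv\; x \le 3\\
\text{loc } 2:&\quad x \le 3 \;\wedge\; x + 2 \le 3 \;\equiv\; x \le 1\\
\text{loc } 1:&\quad x \le 1 \;\wedge\; x + 1 \le 1 \;\equiv\; x \le 0
\end{aligned}
```

Since x1 = 0 satisfies x <= 0, every path of the tree is safe, in agreement with the walk-through.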

Reduction


Algorithm
Input: a finite tree T with root v representing a program, where each leaf represents an assertion assert(Q).
Output: a test case leading to an assertion violation, or "no assertion violation".

while (there are unvisited nodes) {
  visit the next node N in DFS order;
  let PathCond be the path condition of the current path;
  if (there is an unconditional learned result "if P is satisfied at N, then safe"
      && PathCond implies P) {
    update the learned results based on interpolants from PathCond && !P;
    skip the subtree rooted at N;
  } else if (N is a leaf) {
    if (PathCond && !Q is satisfiable) {
      report a test case for the assertion violation;
    } else {
      update the learned results based on interpolants from PathCond && !Q;
    }
  }
}
report "no assertion violation";
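A much simplified sketch of this traversal for the running example, under stated assumptions: every then-branch only adds a constant to x, the leaf asserts x <= limit, and x1 = 0, so each path condition fixes x and "PathCond implies P" reduces to comparing numbers. Interpolants are computed in closed form as bounds x <= b instead of by an interpolating solver, and only unconditional learned results (one per location) are stored, which is sound here because all nodes at the same location root identical subtrees. `explore`, `Violation` and the `program` dictionary are hypothetical names.

```python
class Violation(Exception):
    """Raised with a concrete x value that falsifies the assertion."""

def explore(loc, x, program, learned, stats):
    """DFS over the unfolding tree, reusing learned facts 'x <= b implies safety'.

    program: {"incs": {location: increment on the then-branch}, "last": l, "limit": c}
    Returns the weakest bound b such that x <= b at `loc` guarantees safety.
    """
    stats["visited"] += 1
    if loc in learned and x <= learned[loc]:      # PathCond implies P: skip subtree
        stats["skipped"] += 1
        return learned[loc]
    if loc == program["last"]:                    # leaf: assert(x <= limit)
        if x > program["limit"]:
            raise Violation(x)
        learned[loc] = program["limit"]           # interpolant: x <= limit
        return learned[loc]
    inc = program["incs"][loc]
    b_else = explore(loc + 1, x, program, learned, stats)        # else-branch first
    b_then = explore(loc + 1, x + inc, program, learned, stats)  # then: x = x + inc
    learned[loc] = min(b_else, b_then - inc)      # both branches must stay safe
    return learned[loc]

prog = {"incs": {1: 1, 2: 2, 3: 4}, "last": 4, "limit": 7}
learned, stats = {}, {"visited": 0, "skipped": 0}
explore(1, 0, prog, learned, stats)               # x1 = 0
print(learned)  # {4: 7, 3: 3, 2: 1, 1: 0}
print(stats)    # {'visited': 7, 'skipped': 3}
```

The learned bounds match the walk-through (x <= 3 at location 3, x <= 1 at location 2), and only 7 of the 15 tree nodes are visited, 3 of them skipped without expanding their subtrees. Lowering the limit to 6 makes the path through all three then-branches (x = 7) raise `Violation`.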

Exercise 4: Show How it Works

int y = input();
1. if (input()==true) { x = x+1; }
2. if (y>=1) { x = x+2; }
3. if (y<1) { x = x+4; }
4. assert(x <= 5);

• A program that contains one or more loops leads to an unbounded tree.

• Symbolic execution can be used to help discover loop invariants.

1. if (input()==true) { x = x+1; }
2. if (input()==true) { x = x+2; }
3. if (input()==true) { x = x+4; }
4. assert(x <= 7);

What if we verify the program simply using Hoare logic?

Example

function foo(int x, int n) {
  int y = x;
  int i = 0;
  while (i < n) {
    x = x+1;
    i = i+1;
  }
  if (x < y) {
    error();
  }
}

Is error possible?

How do we systematically verify that?

Example

function foo(int x, int n) {
1.  int y = x;
2.  int i = 0;
3.  while (i < n) {
4.    x = x+1;
5.    i = i+1;
    }
6.  if (x < y) {
7.    error();
    }
}

[Figure: control-flow graph of foo; location 7, reached from 6 when x < y, is not safe.]

Example

Step 1: path condition: y = x && i = 0 && i >= n && x < y

Unsatisfiable

Interpolant at 6: x >= y

[Figure: control-flow graph of foo; location 7, reached when x < y, is not safe, and x >= y implies safety at location 6.]

Example

Step 1: path condition: y = x && i = 0 && i >= n && x < y

Unsatisfiable

Interpolant at 3->6: (x >= y)


In theory, it should be !(x < y && i >= n). Why?

Example

Step 2: path condition: y = x && i = 0 && i < n && x1 = x+1 && i1 = i+1 && i1 >= n && x1 < y

Unsatisfiable

Interpolant at 3->6: (x >= y)

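[Figure: control-flow graph of foo; location 7, reached when x < y, is not safe; the annotation "x >= y implies safety" now appears at two points of the graph.]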

Guessing Loop Invariants

• Through symbolic execution with interpolants, we obtain conditions which must be satisfied in order to verify safety.

• These interpolants may be related to the loop invariants.

• The Idea: take (part of) the condition as candidates for loop invariant and check.

Candidate: x >= y
To check whether x >= y is a sufficiently strong loop invariant, we need to establish:
{true} y = x; i = 0 {x >= y}
{i < n && x >= y} x = x+1; i = i+1; {x >= y}
and that x >= y implies x >= y at 6, which implies safety.


Do the above Hoare triples hold?


The above Hoare triples can be discharged using symbolic execution by checking that the following formulas are unsatisfiable:

y = x && i = 0 && x < y
i < n && x >= y && x1 = x+1 && i1 = i+1 && x1 < y
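The three checks (the two triples plus the loop-exit condition) can be smoke-tested by searching for a counterexample in a small box of integers. This is a hypothetical illustration, not a proof: an SMT solver shows the formulas unsatisfiable over all integers, while this search only covers [-6, 6] for each of x, y, i, n. The function names are made up for the sketch.

```python
from itertools import product

def init_cex(x, y, i, n):
    """Negation of {true} y = x; i = 0 {x >= y}."""
    return y == x and i == 0 and x < y

def step_cex(x, y, i, n):
    """Negation of {i < n && x >= y} x = x+1; i = i+1; {x >= y}."""
    x1 = x + 1          # i1 = i + 1 plays no role in x1 < y
    return i < n and x >= y and x1 < y

def exit_cex(x, y, i, n):
    """Negation of: invariant x >= y plus loop exit i >= n implies x >= y at 6."""
    return x >= y and i >= n and x < y

BOX = range(-6, 7)
for name, cex in [("init", init_cex), ("step", step_cex), ("exit", exit_cex)]:
    found = any(cex(*v) for v in product(BOX, repeat=4))
    print(name, "counterexample found" if found else "no counterexample in the box")
```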


Empirical Study

[Table: results reported in "Lazy Abstraction with Interpolants" (CAV 2006).]
SGP = simple goto programs. These are all Windows device drivers.

Conclusion

• Symbolic execution allows us to check many test cases (which share the same path) at once.

• Symbolic execution needs the support of advanced constraint solving like SMT solving – which is not yet very scalable.

• Symbolic execution with interpolants eases the path explosion problem by “learning” from failures (in reaching the error state).

Exercise 5

• Verify that the following program is free from exceptions using symbolic execution with interpolants.

int x;
int[] array = new int[]{1, 2, 3, 4, …};

rec();
array[x] = 2;

public void rec() {
  if (input() == true) {
    rec();
    rec();
  } else {
    x = x + 1;
    return;
  }
}

Question

Does the traversal order matter in terms of reduction?

DART: DIRECTED AUTOMATED RANDOM TESTING

Godefroid et al. PLDI 2005

Motivation

• Random testing can cover many paths but is hardly ever complete

• Symbolic execution can completely check all paths if there aren’t many.

if (x == 19973) { assert(false);}

What is the probability of finding the assertion failure?
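A back-of-the-envelope answer, assuming x is a 32-bit int drawn uniformly at random (the slide fixes neither the width nor the distribution). The number of tests until the first hit is then geometrically distributed:

```latex
P(x = 19973) = 2^{-32} \approx 2.3 \times 10^{-10},
\qquad
\mathbb{E}[\text{tests until the first hit}] = 2^{32} \approx 4.3 \times 10^{9}
```

Symbolic execution, by contrast, obtains x = 19973 by solving the branch condition once.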

What if we test randomly first and then use symbolic execution to increase coverage?

Example

int h(int x, int y) {
  if (x != y) {
    if (2*x == x + 10) {
      abort(); /* error */
    } else {
      return 2*x + y;
    }
  } else {
    return 2*x;
  }
}

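[Figure: execution tree of h, branching on x == y vs. x != y and then on 2*x == x+10 vs. else; some paths are reached by random testing, the remaining ones by symbolic execution.]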

DART: Approach

Objective:
• Input: a function written in C.
• Output: a set of test cases that provides 100% code coverage.

Method:
• Generate a test driver that performs random testing to simulate the most general environment the program can operate in.
• Dynamically analyze how the program behaves under random testing and generate new test inputs systematically using symbolic execution.

A function with parameters:
• The function is assumed to always terminate.
• It contains the following statements:
  – abort();
  – if (e) {goto l} else {goto l'}, where e is an expression and l and l' are statement labels;
  – assignment: m := e, where m is a variable name and e is an expression.
• An expression e can be:
  – a constant c, e1 * e2, e1 <= e2, !e1, or *e1.
• Expressions are side-effect-free.

Test Driver

• Identify all external inputs needed by the program:
  – function parameters and user inputs.

Test Driver

• Identify all external inputs needed by the program:
  – external functions (the test driver simulates them by returning random values).

Is this justified?

Dart: the Algorithm

complete = true;
do {
  path = <>;
  inits = [];
  directed = true;
  while (directed) {
    run_instructed();
  }
} while (!complete);

complete is true iff the applied SMT solver is complete in solving the constraints.

path is a sequence of statements and variable valuations;

inits assigns values to some variables (generated by the SMT solver);

Dart: the Algorithm
run_instructed() {
  for each variable x {
    x = random() if x is not in inits; otherwise x = inits(x);
  }
  let s be the initial statement;
  while (s is not abort or halt) {
    execute s;
    add s and the current variable valuations to path;
    s = next statement;
  }
  if (s == abort) {
    report bug; exit();
  } else {
    return solvePathCondition();
  }
}

Dart: the Algorithm
solvePathCondition() {
  from the end of path, find a statement if (B) {} else {} such that only its then-branch or only its else-branch has been executed;
  if (no such branching condition exists) {
    directed = false;
    return;
  } else {
    remove from path all statements after that branch statement;
    let C be B if the else-branch was taken, or !B otherwise;
    if (SMT-solve(path, C)) {
      set inits to the variable valuations returned by the SMT solver;
      return;
    } else {
      mark that branch statement as fully explored;
      solvePathCondition();
    }
  }
}

Dart: the Algorithm
SMT-solve(path, C) {
  let SM be a symbolic memory such that SM(x) = x for every variable x;
  evaluate each statement in path one by one on SM by calling evaluate(e, CM, SM),
    where CM is the concrete variable valuation before the execution of the statement;
  return true iff SM && C is satisfiable according to the SMT solver;
}

evaluate(e, CM, SM) {
  if (e is a variable name m) {
    return SM(m) if m is of a type supported by the SMT solver; otherwise CM(m);
  }
  if (e is e1 * e2) {
    let e1' = evaluate(e1, CM, SM);
    let e2' = evaluate(e2, CM, SM);
    if (neither e1' nor e2' is a constant) {
      complete = false;
      return the evaluation result of e under CM;
    } else {
      return e1' * e2';
    }
  }
  …
}
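A heavily simplified sketch of this loop for the function h from the example, with three deliberate simplifications: branch conditions are recorded as Python predicates over the inputs rather than as symbolic expressions built by evaluate (enough for h, whose branches test the inputs directly); the SMT solver is replaced by a bounded brute-force search; and the set `tried` plays the role of marking branches as explored. All names are hypothetical.

```python
import random
from itertools import product

class Abort(Exception):
    pass

def h(x, y, trace):
    """Instrumented h: appends (branch predicate over the inputs, outcome) to trace."""
    c1 = lambda v: v["x"] != v["y"]
    taken = c1({"x": x, "y": y})
    trace.append((c1, taken))
    if taken:
        c2 = lambda v: 2 * v["x"] == v["x"] + 10
        taken = c2({"x": x, "y": y})
        trace.append((c2, taken))
        if taken:
            raise Abort()                # abort(); /* error */
        return 2 * x + y
    return 2 * x

def solve(constraints, domain):
    """Hypothetical stand-in for SMT-solve: bounded brute-force search."""
    for x, y in product(domain, repeat=2):
        v = {"x": x, "y": y}
        if all(c(v) == want for c, want in constraints):
            return v
    return None

def dart(seed=0, domain=range(-20, 21)):
    rng = random.Random(seed)
    inputs = {"x": rng.randint(-1000, 1000), "y": rng.randint(-1000, 1000)}
    tried = set()                        # branch-outcome prefixes already covered
    runs = 0
    while inputs is not None:
        runs += 1
        trace = []
        try:
            h(inputs["x"], inputs["y"], trace)
        except Abort:
            return inputs, runs          # report bug with a concrete test case
        outcomes = tuple(t for _, t in trace)
        tried.update(outcomes[:i + 1] for i in range(len(outcomes)))
        inputs = None
        for i in reversed(range(len(trace))):          # deepest branch first
            flipped = outcomes[:i] + (not outcomes[i],)
            if flipped not in tried:
                tried.add(flipped)
                inputs = solve(trace[:i] + [(trace[i][0], not outcomes[i])], domain)
                if inputs is not None:
                    break
    return None, runs                    # every path tried, no abort found

print(dart())
```

Random inputs almost surely take the x != y, else path first; flipping the innermost branch then asks for x != y && 2*x == x+10, whose solution x = 10 is an input random testing is very unlikely to hit.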

Dart: Theorem

(A) If Dart reports a bug, then there is some input that leads to an abort.
(B) If Dart terminates without reporting a bug, then there is no input that leads to an abort, and all paths in the program have been exercised.
(C) Otherwise, Dart runs forever.


Question

foo(int x, int y) {
  if (x*x*x > 0) {
    if (x > 0 && y == 10) {
      abort();
    }
  } else {
    if (x > 0 && y == 20) {
      abort();
    }
  }
}

Will Dart find the bug? Assume that the SMT solver can’t deal with non-linear expressions.

Case Study

• oSIP library: 30K lines of C code, 600+ externally visible functions http://www.gnu.org/software/osip/osip.html

• Apply DART to test every function.
• There are no assertions:
  – DART is used to look for segmentation faults and non-termination.

• DART found ways to crash 65% of the functions:
  – most of them caused by null pointers passed as function parameters.
• Pex, a tool based on the same idea as DART, will be part of Visual Studio 2015.
