Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs513x/6.SymbolicExecution.pdf · Intuitive Understanding of Symbolic Execution I ’Execute’ programs with symbols:

Symbolic Execution

Wei Le

2016 April

Agenda

I What is symbolic execution?

I Applications

I History

I Interal Design: The three challengesI Path explosionI Modeling statements and environmentsI Constraint solving

I Implementation and symbolic execution tools

Concrete Execution Versus Symbolic Execution



Intuitive Understanding of Symbolic Execution

I ’Execute’ programs with symbols: we track symbolic state ratherthan concrete input

I ’Execute’ many program paths simultaneously: when execution pathdiverges, fork and add constraints on symbolic values

I When ’execute’ one path, we actually simulate many test runs, sincewe are considering all the inputs that can exercise the same path

I Other variantsI Symbolic Analysis: Model library and system calls rather than

’executing’ it [2003:ICSE:Le]I Concolic Testing: Mixture of symbolic and concrete inputs

[2005:FSE:Sen]

Symbolic Execution Tree

Applications of Symbolic Execution

General goal: identifying semantics of programs

Basic applications:

I Detecting infeasible paths

I Generating test inputs

I Finding bugs and vulnerabilities

I Proving two code segments are equivalent (Code Hunt)

Advanced applications:

I Generating program invariants

I Debugging

I Repair programs

Detecting Infeasible Paths

Suppose we require α = β

Test Input Generation

Path 1: α = 1, β = 1Path 2: α = 1, β = 6Path 3 ...

Bug Finding

Bug Finding

Test Input Generation: Code Hunt

Code Hunt: Behind the Scene

History of Symbolic Execution

Resurgence of Symbolic Execution

The block issues in the past:

I Not scalable: program state has many bits, there are many programpaths

I Not able to go through loops and library calls

I Constraint solver is slow and not capable to handle advancedconstraints

The two key projects that enable the advance:

I DART Godefroid and Sen, PLDI 2005 (introduce dynamicinformation to symbolic execution)

I EXE Cadar, Ganesh, Pawlowski, Dill, and Engler, CCS 2006 (STP:a powerful constraint solver that handles array)

Moving forward:

I More powerful computers and clusters

I Techniques of mixture concrete and symbolic executions

I Powerful constraint solvers

Today: Two Important Tools

KLEE [2008:OSDI:Cadar]

I Open source symbolic executor

I Runs on top of LLVM

I Has found lots of problems in open-source software

SAGE [PLDI:Godefroid:2008]

I Microsoft internal tool

I Symbolic execution to find bugs in file parsers - E.g., JPEG, DOCX,PPT, etc

I Cluster of n machines continually running SAGE

Other Symbolic Executors

I Cloud9: parallel symbolic execution, also supports threads

I Pex, Code Hunt: Microsoft tools, symbolic execution for .NET

I Cute: concolic testing

I jCUTE: symbolic execution for Java

I Java PathFinder: NASA tools, a model checker that also supportssymbolic execution

I SymDroid: symbolic execution on Dalvik Bytecode

I Kleenet: testing interaction protocols for sensor network

Internal of Symbolic Executors: KLEE

Three Challenges

I Path explosion

I Modeling program statements and environment

I Constraint solving

Challenge 1: Path Explosion

Search Strategies: Naive Approach

DFS (depth first search), BFS (breadth first search)

The two approaches purely are based on the structure of the code

I You cannot enumerate all the paths

I DFS: search can stuck at somewhere in a loop

I BFS: very slow to determine properties for a path if there are manybranches



















Search Strategies: Random Search

How to perform a random search?

I Idea 1: pick next path to explore uniformly at random

I Idea 2: randomly restart search if haven’t hit anything interesting ina while

I Idea 3: when have equal priority paths to explore, choose next oneat random

I ...

Drawback: reproducibility, probably good to use psuedo-randomnessbased on seed, and then record which seed is picked

Search Strategies: Coverage Guided Search

Goal: Try to visit statements we haven’t seen before

Approach:

I Select paths likely to hit the new statements

I Favor paths on recently covering new statements

I Score of statement = # times its been seen and how often; Picknext statement to explore that has lowest score

Pros and cons:

I Good: Errors are often in hard-to-reach parts of the program, thisstrategy tries to reach everywhere.

I Bad: Maybe never be able to get to a statement

Search Strategies: Generational Search

I Hybrid of BFS and coverage-guided search

I Generation 0: pick one path at random, run to completion

I Generation 1: take paths from gen 0, negate one branch conditionon a path to yield a new path prefix, find a solution for that pathprefix, and then take the resulting path

I ...

I Generation n: similar, but branching off gen n-1 (also uses acoverage heuristic to pick priority)

Search Strategies: Combined Search

I Run multiple searches at the same time and alternate between them

I Depends on conditions needed to exhibit bug; so will be as good asbest solution, with a constant factor for wasting time with otheralgorithms

I Could potentially use different algorithms to reach different parts ofthe program

Challenge 2: Complex Code and EnvironmentDependencies

I System calls: open(file)

I Library calls: sin(x), glibc

I Pointers and heap: linklist, tree

I Loops and recursive calls: how many times it should iterate andunfold?

I ...

Solutions

I Simulate system calls

I Build simple versions of library calls

I Assign random values after library calls

I Run library an system calls with a concrete value

I Summarize the loops

I ...

An Example

Program was initiated with a symbolic file system with up to N files.Open all N files + one open() failure.

Solutions: Concretization[2005:PLDI:Godefroid],[2005:FSE:Sen]

I Concolic (concrete/symbolic) testing: run on concrete randominputs. In parallel, execute symbolically and solve constraints.Generate inputs to other paths than the concrete one along the way.

I Replace symbolic variables with concrete values that satisfy the pathcondition

I So, could actually do system calls

I And can handle cases when conditions too complex for SMT solver

Solutions: Concretization[2005:PLDI:Godefroid],[2005:FSE:Sen]

See example of DART

Challenge 3: Constraint Solving - SAT

SAT: find an assignment to a set of Boolean variables that makes theBoolean formula true

Complexity: NP-Complete

Constraint Solving - SMT [2011:ACM:DeMoura]

SMT (Satisfiability Modulo Theories) = SAT++

I An SMT formula is a Boolean combination of formulas overfirst-order theories

I Example of SMT theories include bit-vectors, arrays, integer and realarithmetic, strings, ...

I The satisfiability problem for these theories is typically hard ingeneral (NP-complete, PSPACE-complete, ...)

I Program semantics are easily expressed over these theories

I Many software engineering problems can be easily reduced to theSAT problem over first-order theories

Constraint Solving - SMT

The State of the Art: Handle linear integer constraints

Challenges:

I Constraints that contain non-linear operands, e.g., sin(), cos()

I Float-point constraints: no theory support yet, convert to bit-vectorcomputation

I String constraints: a = b.replace(’x’, ’y’)

I Quantifies: ∃, ∀I Disjunction

Tool Design KLEE - Path Explosion

I Random, coverage-optimize search

I Compute state weight using:I Minimum distance to an uncovered instructionI Call stack of the stateI Whether the state recently covered new code

I Timeout: one hour per utility when experimenting with coreutils

Tool Design KLEE - Tracking Symbolic States

Trees of symbolic expressions:

I Instruction pointer

I Path condition

I Registers, heap and stack objects

I Expressions are of C language: arithmetic, shift, dereference,assignment

I Checks inserted at dangerous operations: division, dereferencing

Modeling environment:

I 2500 lines of modeling code to customize system calls (e.g. open,read, write, stat, lseek, ftruncate, ioctl)

I How to generate tests after using symbolic env: supply andescription of symbolic env for each test path; a special drivercreates real OS objects from the description

Tool Design KLEE - Constraint Solving

I STP: a decision procedure for Bit-Vectors and Arrays

I Decision procedures are programs which determine the satisfiabilityof logical formulas that can express constraints relevant to softwareand hardware

I STP uses new efficient SAT solvers

I Treat everything as bit vectors: arithmetic, bitwise operations,relational operations.

Tool Usage KLEE

I Using LLVM to compile to bytecode

I Run KLEE with bytecode

Coverage Results: KLEE

Bug Detection Results: KLEE

Mismatch of CoreUtils and BusyBox

Discussions

I Symbolic environment interaction - how reliable can the customizedmodeling really be? think about concurrent programs, inter-processprograms.

I What is more commonly needed - functional testing orsecurity/completeness/crash testing?

Documents

Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs513x/6.SymbolicExecution.pdf · Intuitive Understanding of Symbolic Execution I ’Execute’ programs with symbols: