6
Tackling the Path Explosion Problem in Symbolic Execution-driven Test Generation for Programs Saparya Krishnamoorthy *† , Michael S. Hsiao *‡ and Loganathan Lingappan § * Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia 24061, USA § Intel Corporation, Folsom, California 95630, USA [email protected], [email protected] § [email protected] Abstract—Symbolic techniques have been shown to be very effective in path-based test generation; however, they fail to scale to large programs due to the exponential number of paths to be explored. In this paper, we focus on tackling this path explosion problem and propose search strategies to achieve quick branch coverage under symbolic execution, while exploring only a fraction of paths in the program. We present a reachability-guided strategy that makes use of the reachability graph of the program to explore unvisited portions of the program and a conflict- driven backtracking strategy that utilizes conflict analysis to perform nonchronological backtracking. We present experimental evidence that these strategies can significantly reduce the search space and improve the speed of test generation for programs. Index Terms—Symbolic execution, software testing, conflict analysis. I. I NTRODUCTION Testing is widely recognized as a crucial step in software develop- ment and usually accounts for a significant percentage of the devel- opment cost. Manual testing is labor-intensive and cannot guarantee that all possible behaviors of the program have been verified [1]. In order to improve the test coverage observed, several techniques have been proposed to automatically generate values for the inputs. One such technique is to randomly choose the values over the domain of potential inputs. The problem with this technique is that many sets of values may result in the same observable output and are thus redundant, and the probability of choosing inputs that cause faulty behavior may be extremely small. Constraint-based testing [2] is an increasingly popular technique to automatically generate test input values and tackle some of the previously mentioned challenges. This method focuses on translating portions of the program into logical formulas whose solutions are rel- evant test data. Constraint-based testing is powered by recent progress in constraint solvers and symbolic execution engines. However, such symbolic methods fail to scale to large problems [3]. This is because the possible number of execution paths to be considered symbolically is so large that only a small part of the program path space is actually explored. This phenomenon in path-based testing is called path explosion [4]. Covering all paths of a program, though, is not the primary objective of our experiments, as is the case with most modern testing practices. As an example, we cite the tool Bullseye which is widely used in the software industry to measure the quality of test data; this tool reports only function and branch coverage [5]. In this paper we focus on tackling this path explosion problem and aim to reduce the time for test generation. We propose symbolic search strategies that help achieve complete branch coverage quickly, while searching only a small fraction of paths in the program. We present two strategies as enhancements of the basic depth-first search (DFS) procedure: a reachability-guided strategy that utilizes the program reachability graph to explore unvisited parts of the program and a conflict-driven backtracking strategy that makes use of conflict analysis to enable backtracking over multiple levels of the graph in order to quickly discard as many redundant paths as possible. Experiments show that our search strategies are effective in discarding infeasible paths and reducing the number of paths explored, without a loss of branch coverage compared to the base search procedure. This paper is structured as follows. Section II discusses some preliminaries and Section III gives an overview of related work in this area. In Sections IV and V we present our search strategies along with simple examples. We discuss the implementation details and report the experiments that have been conducted in Sections VI and VII. Finally, Section VIII provides a conclusion to the paper. II. OVERVIEW We first describe constraint-based symbolic execution, followed by the motivation behind our proposed search strategies. A. Constraint-based Test Generation With the advent of advanced program analysis and constraint solving techniques, several tools like DART [2], CUTE [1] etc., utilize symbolic reasoning for dynamic test generation. In symbolic execution, a program is executed on symbolic inputs: the execution of an assignment statement updates the program state with symbolic expressions and the execution of a conditional expression generates a symbolic constraint in terms of the symbolic inputs. To test a program using symbolic execution, the test generator first instruments the program under test. Code instrumentation provides a way to monitor these symbolic values and observe program execution. Symbolic execution-based tools gather knowledge about the exe- cution of a program using a directed search. Values are generated that satisfy the symbolic constraints generated along each execution path. Constraint solvers like Satisfiability-modulo-theories (SMT) solvers are used to solve these path constraints. In other words, path-oriented approaches to testing are built upon the idea of a path predicate. In [6], a path predicate is defined as the following: Definition 1 (Path predicate). Given a program P of input domain D and π a path of P, a path predicate of π is a formula φπ on D such that if V | = φπ then execution of P on V follows the path π. A path predicate for φπ for a path π can be computed by keeping track of logical relations among variables along the execution. A solution to a path predicate φπ for a given program P is actually a test case and forces the program to exercise path π. The process of symbolic execution with the help of path predicates can be best explained with an example. We demonstrate how constraint-based execution is performed along a path of a program corresponding to the C code illustrated in Figure 1. Note that the branch predicates are represented in single static assignment (SSA) form. The branch predicates are computed in terms of the symbolic variables x0 and y0 defined in place of the actual program variables. The path predicate of the path 0 1 2 (3, true) (4, f alse) 6 7 8 is computed with the help of the branch constraints of the path as: (y1 = y0 + 1) (z0 =3.x0)(x0 = y1) (z0 = x0 + 6). This, in turn, can be represented in terms of the input variables as: (x0 = y0 + 1) (3x0 = x0 + 6). This path constraint represents all the input vectors that drive the program through the previously shown path. To force the program through a different branch on this path, the tool calculates a solution to the path constraint obtained by negating a branch predicate of the current path constraint. 2010 19th IEEE Asian Test Symposium 1081-7735/10 $26.00 © 2010 IEEE DOI 10.1109/ATS.2010.19 59

[IEEE 2010 19th Asian Test Symposium (ATS) - Shanghai, China (2010.12.1-2010.12.4)] 2010 19th IEEE Asian Test Symposium - Tackling the Path Explosion Problem in Symbolic Execution-Driven

Embed Size (px)

Citation preview

Page 1: [IEEE 2010 19th Asian Test Symposium (ATS) - Shanghai, China (2010.12.1-2010.12.4)] 2010 19th IEEE Asian Test Symposium - Tackling the Path Explosion Problem in Symbolic Execution-Driven

Tackling the Path Explosion Problem in SymbolicExecution-driven Test Generation for Programs

Saparya Krishnamoorthy∗†, Michael S. Hsiao∗‡ and Loganathan Lingappan§∗Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia 24061, USA

§Intel Corporation, Folsom, California 95630, USA†[email protected], ‡[email protected] §[email protected]

Abstract—Symbolic techniques have been shown to be very effective inpath-based test generation; however, they fail to scale to large programsdue to the exponential number of paths to be explored. In this paper,we focus on tackling this path explosion problem and propose searchstrategies to achieve quick branch coverage under symbolic execution,while exploring only a fraction of paths in the program. We present areachability-guided strategy that makes use of the reachability graph ofthe program to explore unvisited portions of the program and a conflict-driven backtracking strategy that utilizes conflict analysis to performnonchronological backtracking. We present experimental evidence thatthese strategies can significantly reduce the search space and improvethe speed of test generation for programs.

Index Terms—Symbolic execution, software testing, conflict analysis.

I. INTRODUCTION

Testing is widely recognized as a crucial step in software develop-ment and usually accounts for a significant percentage of the devel-opment cost. Manual testing is labor-intensive and cannot guaranteethat all possible behaviors of the program have been verified [1]. Inorder to improve the test coverage observed, several techniques havebeen proposed to automatically generate values for the inputs. Onesuch technique is to randomly choose the values over the domainof potential inputs. The problem with this technique is that manysets of values may result in the same observable output and are thusredundant, and the probability of choosing inputs that cause faultybehavior may be extremely small.

Constraint-based testing [2] is an increasingly popular techniqueto automatically generate test input values and tackle some of thepreviously mentioned challenges. This method focuses on translatingportions of the program into logical formulas whose solutions are rel-evant test data. Constraint-based testing is powered by recent progressin constraint solvers and symbolic execution engines. However, suchsymbolic methods fail to scale to large problems [3]. This is becausethe possible number of execution paths to be considered symbolicallyis so large that only a small part of the program path space isactually explored. This phenomenon in path-based testing is calledpath explosion [4]. Covering all paths of a program, though, is notthe primary objective of our experiments, as is the case with mostmodern testing practices. As an example, we cite the tool Bullseyewhich is widely used in the software industry to measure the qualityof test data; this tool reports only function and branch coverage [5].

In this paper we focus on tackling this path explosion problemand aim to reduce the time for test generation. We propose symbolicsearch strategies that help achieve complete branch coverage quickly,while searching only a small fraction of paths in the program.We present two strategies as enhancements of the basic depth-firstsearch (DFS) procedure: a reachability-guided strategy that utilizesthe program reachability graph to explore unvisited parts of theprogram and a conflict-driven backtracking strategy that makes useof conflict analysis to enable backtracking over multiple levels of thegraph in order to quickly discard as many redundant paths as possible.Experiments show that our search strategies are effective in discardinginfeasible paths and reducing the number of paths explored, withouta loss of branch coverage compared to the base search procedure.

This paper is structured as follows. Section II discusses somepreliminaries and Section III gives an overview of related work in this

area. In Sections IV and V we present our search strategies along withsimple examples. We discuss the implementation details and reportthe experiments that have been conducted in Sections VI and VII.Finally, Section VIII provides a conclusion to the paper.

II. OVERVIEW

We first describe constraint-based symbolic execution, followed bythe motivation behind our proposed search strategies.

A. Constraint-based Test Generation

With the advent of advanced program analysis and constraintsolving techniques, several tools like DART [2], CUTE [1] etc.,utilize symbolic reasoning for dynamic test generation. In symbolicexecution, a program is executed on symbolic inputs: the executionof an assignment statement updates the program state with symbolicexpressions and the execution of a conditional expression generates asymbolic constraint in terms of the symbolic inputs. To test a programusing symbolic execution, the test generator first instruments theprogram under test. Code instrumentation provides a way to monitorthese symbolic values and observe program execution.

Symbolic execution-based tools gather knowledge about the exe-cution of a program using a directed search. Values are generated thatsatisfy the symbolic constraints generated along each execution path.Constraint solvers like Satisfiability-modulo-theories (SMT) solversare used to solve these path constraints. In other words, path-orientedapproaches to testing are built upon the idea of a path predicate.In [6], a path predicate is defined as the following:

Definition 1 (Path predicate). Given a program P of input domainD and π a path of P, a path predicate of π is a formula φπ on Dsuch that if V |= φπ then execution of P on V follows the path π.

A path predicate for φπ for a path π can be computed by keepingtrack of logical relations among variables along the execution. Asolution to a path predicate φπ for a given program P is actuallya test case and forces the program to exercise path π. The processof symbolic execution with the help of path predicates can be bestexplained with an example. We demonstrate how constraint-basedexecution is performed along a path of a program corresponding tothe C code illustrated in Figure 1. Note that the branch predicatesare represented in single static assignment (SSA) form. The branchpredicates are computed in terms of the symbolic variables x0 and y0defined in place of the actual program variables. The path predicateof the path 0 → 1 → 2 → (3, true) → (4, false) → 6 → 7 → 8 iscomputed with the help of the branch constraints of the pathas: (y1 = y0 + 1) ∧ (z0 = 3.x0)∧ (x0 6= y1) ∧ (z0 6= x0 + 6). This,in turn, can be represented in terms of the input variables as:(x0 6= y0 + 1) ∧ (3x0 6= x0 + 6). This path constraint represents allthe input vectors that drive the program through the previously shownpath. To force the program through a different branch on this path, thetool calculates a solution to the path constraint obtained by negatinga branch predicate of the current path constraint.

2010 19th IEEE Asian Test Symposium

1081-7735/10 $26.00 © 2010 IEEE

DOI 10.1109/ATS.2010.19

59

Page 2: [IEEE 2010 19th Asian Test Symposium (ATS) - Shanghai, China (2010.12.1-2010.12.4)] 2010 19th IEEE Asian Test Symposium - Tackling the Path Explosion Problem in Symbolic Execution-Driven

0 . i n t g ( i n t x , i n t y ) {1 . y ++;2 . z = 3 ∗ x ;3 . i f ( x != y ) {4 . i f ( z == x + 6)5 . a b o r t ( ) ; / / Error6 . }7 . re turn 0 ;}

Fig. 1: An example C program

B. Basic DFS-based Path ExplorationThe goal of most dynamic test generation engines is to explore all

paths in the execution tree of the program. The program executiontree represents the unfolded control-flow graph of the program. Eachnode in this tree corresponds to a conditional or a choice pointin the program (if, switch, etc.), and each of these nodes has twopossible edges corresponding to its two branches. The most commonmethod to explore the program execution tree is the depth-first search(DFS) strategy. The path predicate of each path is built incrementally,reusing the path prefix up to the last choice point in the program.New input values are generated to attempt to force the next run toexplore the last unexplored branch of the conditional on the stack. Ifthe constraint solver is able to find a feasible solution for the instance,it provides the test generation tool with this solution to be used asthe test input in the next run.

We describe the basic depth-first search strategy in the context ofthe C program shown in Figure 1. Let’s assume that a symbolic testgeneration engine initially generates the value 0 for both x and y.As a result, ‘g’ executes the false branch of the first if-statement.The path predicate (x0 6= y0) is formed on the fly, based on theevaluation of the path conditions. To force the program through adifferent path, the solver calculates a solution to the path constraint(x0 = y0), obtained by negating the current path constraint. Apossible solution to this path constraint is 〈x0 = 75, y0 = 0〉 and con-sequently the true branch of the first if-statement and the false branchof the second if-statement are executed. The predicate sequence(x0 6= y0 + 1) ∧ (3x0 6= x0 + 6) is the path constraint. Followingthe DFS strategy, the last predicate of the current path constraintis negated, which results in (x0 6= y0 + 1) ∧ (3x0 = x0 + 6). Apossible solution to this new path constraint is 〈x0 = 3, y0 = 86〉and when the instrumented ‘g’ runs again, it uses these values forthe input variables. This execution reveals the error in the program bydriving it into the abort() statement. In this way, bugs are uncoveredas the paths of the program are explored in a depth-first manner.

C. Path Explosion PhenomenonThe most significant scalability challenge in path-based testing is

how to handle the exponential number of paths in the program. Thepath explosion is mainly due to nested calls, loops and conditions.However, covering all paths of a program is not the primary objectiveof current testing practices. Even in critical systems, it is onlyrequired to fully cover a class of structural entities of the programsource code such as instructions, branches or conditions. As anexample, for testing a 32-character pattern-matching function, wemight only be interested in a test case that matches a given patternand another that does not. We may not be interested in the rest ofthe exponential number of possible combinations.

III. RELATED WORK

Burnim and Sen describe a technique that builds on the basic depth-first search strategy to bound the number of branches explored [3].They also describe other strategies like the uniform random searchand control-flow graph (CFG) directed search. In the CFG-directedsearch, the algorithm uses the static graph of the program to find shortpaths from branches along the execution to branches that have not yetbeen explored. In [7], a technique called hybrid testing was proposed,

which deals with breaking the regularity of a DFS with randomtest generation. During the DFS process, a test datum is sometimesgenerated at random and the DFS continues from the correspondingpath. Best-first search is a search technique proposed in [8], which isbasically a DFS enhanced with breadth-first aspects. At intervals, allactive choice points are ranked according to some internal heuristicand the best branch is expanded. Similarly, generational searchintroduced in [9] computes all potential new paths from a givenexecution and explores the best one, based on a ranking of allpotential new paths from all active executions. A heuristic describedin [4] deals with pruning a path exploration when the current programstate is considered similar to a previously encountered state. In theseways, the problem of path explosion in dynamic test generation weremoderately addressed, although different from ours.

IV. A REACHABILITY-GUIDED SEARCH STRATEGY

Our first search strategy, ‘Reachability-guided search’ performs areachability analysis of the control-flow graph (CFG) of the programto decide whether the current branch must be explored or not.The reachability-guided strategy utilizes the reachability graph ofthe program-under-test to explore uncovered important parts of theprogram. Such a reachability graph can be extracted using staticanalysis tools or from a specification or a high level model of thesoftware program. This simple strategy also guides the search towardsimportant parts of the code, where the important parts are user-specified in terms of program conditionals (or nodes in the CFG).

Algorithm 1 Reachability-guided strategyCurrNode ← initial node, Path ← CurrNodewhile Path is not empty do

if not covered either branch of CurrNode thentransition ← false branch of CurrNodeif transition can reach any unvisited, important branches then

if transition does not lead to terminal node thenappend transition to Path

elserecord Tests if path constraint of Path has solution

end ifend if

else if covered only one branch of CurrNode then/* symmetric case for true branch */

elseCurrNode← previous node in Path /* backtrack */

end ifend while

This technique builds on the depth-first strategy, such that theadditional reachability check is performed during the search. When-ever the test generator encounters a program conditional whilebacktracking, it first looks up the counter-edge of the previouslyvisited edge. This counter-edge is explored only if it can lead to anyimportant conditional that has not yet been visited, or if the currentconditional itself is an important conditional. If no new importantitems can be reached from the unvisited branch of the current node,then exploration from the current node stops. The search procedurethen backtracks to the antecedent node in the CFG and this processcontinues until all the branches are covered. This method helpsavoid re-visiting branches that do not lead to any new, importantbranches. The steps of the reachability-guided strategy are describedin Algorithm 1.

For programs with several interdependent function calls, thereachability-guided strategy can quickly achieve complete branchcoverage. Let us consider a densely-connected control flow graphwith a number of feasible paths exponential to the number of nodesin the graph. For such a program, the path-exhaustive search wouldexplore all the paths in the program. Our reachability-guided searchwould only try to explore every edge in the graph once and stillachieve complete branch coverage. When we are not interested in

60

Page 3: [IEEE 2010 19th Asian Test Symposium (ATS) - Shanghai, China (2010.12.1-2010.12.4)] 2010 19th IEEE Asian Test Symposium - Tackling the Path Explosion Problem in Symbolic Execution-Driven

exercising every feasible path and are only interested in covering alledges in the graph, the reachability-guided strategy helps us achieveexactly that. In addition, when the goal is to generate a test caseto target a particular branch, just that conditional can be ‘marked’important in the user-supplied list of important conditionals. As aresult, the reachability-guided search strategy provides significantimprovement in the speed of test case generation by achievingcomplete branch coverage quickly, while exploring fewer paths.

V. A CONFLICT-DRIVEN BACKTRACKING STRATEGY

As described earlier, test data to exercise a particular path in aprogram are generated by solving the path constraints of that path,with the help of a constraint solver. Such a solver is able to check ifa constraint equation is ‘satisfiable’ or ‘unsatisfiable’. In the case ofsatisfiability, it returns a solution to the constraint equation. Similarly,when unsatisfiable, we can derive useful information from the solverto further improve our search process. We describe our proposednonchronological backtracking strategy, similar to the search strategyfound in most of the SAT solvers today [10] and is also known asbackjumping. We have extended this concept towards the problem ofsoftware test generation using SMT solvers. In addition to conflict-driven nonchronological backtracking, we also use ‘conflict-drivenlearning’ to avoid revisiting paths that share the same set of conflict-ing assignments. This procedure is detailed in Section V-B.

In our approach, an infeasible path is viewed as a ‘conflict’, and thetwo terms will be used inter-changeably (depending on the context)henceforth. Let us assume that the direction-assignment at decisionlevel β in the execution tree leads to a conflict with a previousassignment and results in an infeasible path. It is worth noting that allvalue assignments that are made after the decision level β will forcethe just-identified ‘conflict clause’ to be unsatisfied. In the basic DFSbased search process, the procedure backtracks chronologically to theimmediately preceding decision level and may waste a significantamount of time exploring a useless region of the search space only todiscover that the region does not contain any satisfying assignments.However, by an analysis of the encountered conflicts, it may bepossible to backtrack nonchronogically by jumping back multiplelevels in the search tree at once.

Fig. 2: Control-flow graph showing the benefit of the conflict-drivenbacktracking strategy

An example of the benefit of this strategy can be seen in Figure2. The nodes in the graph depict conditionals and the solid anddotted lines represent their true and false branches, respectively. Letus consider the case when an infeasible path is encountered, say,(0, F ) → (1, T ) → (3, T ) → (5, F ) → (6, T ) → (9, F ) → (10, T ) .

As soon as we reach the end of the path (node 13), we derive thecause of the infeasibility; in this case, the two incompatible ‘clauses’(y < 4)T and (y < 7)F . Since this combination of constraints cannever be satisfied, all paths containing this pair of clauses will beinfeasible. In the basic DFS approach, the following set of infeasiblepaths will be explored next:

(0, F ) → (1, T ) → (3, T ) → (5, F ) → (6, T ) → (9, F )...(0, F ) → (1, T ) → (3, T ) → (5, F ) → (6, T ) → (9, T )...(0, F ) → (1, T ) → (3, T ) → (5, F ) → (6, F ) → (8)

Note that four infeasible paths are explored before webacktrack to node 5 and explore the next feasible path,say (0, F ) → (1, T ) → (3, T ) → (5, T ) → (7, F ) → (9, T ). In ourconflict-driven backtracking strategy, we avoid visiting these infea-sible paths by ‘backjumping’ from terminal node 13 directly to thecause of the conflict, i.e. conditional (y < 7). Here, for the currentpath (0, F ) → (1, T ) → (3, T ) → (5, F ) → (6, T ) → (9, F ) →(10, T ), the current decision level (at node 10) is 6 and the levelthat we must backtrack to, β is 3. We then negate this conditional 5,in order to visit its true branch and the search then continues from thispoint on. Thus, for this small example, by using the conflict-drivenstrategy we visit only one infeasible path instead of four!

A. Using the Unsatisfiable core for Conflict AnalysisIn order to implement nonchronological backtracking, it is neces-

sary to determine the cause of a conflict. This information can beobtained with the help of the unsatisfiable core of the SMT instance.The SMT-LIB standard [11] describes the unsat core as a subset of theset of all assertions that the solver has determined to be unsatisfiable.

Definition 2 (Unsatisfiable Core). Given an unsatisfiable SMTformula ϕ, we say that an unsatisfiable SMT formula ψ is anunsatisfiable core of ϕ iff ϕ = ψ ∧ ψ’ for some (possibly empty)SMT formula ψ’. Intuitively, ψ is a subset of the constraints in ϕcausing the unsatisfiability of ϕ.

Some of the current state-of-the-art SMT solvers provide waysto extract the unsat core with techniques adapted from SAT. Thesolver that we use, Yices, uses a technique involving selector variablesintroduced for each original clause, to extract the unsat core [12]. Forthis purpose, each clause has an ID. The unsatisfiable core is then asubset of the assertions that is inconsistent by itself.

When an infeasible path is encountered in our execution tree, theunsatisfiable core is extracted to find the cause of the conflict. The IDsof the clauses in the unsat core are then mapped onto the conditionalsin the program. Next, instead of backtracking chronologically tothe immediately preceding conditional, we backtrack to a previousconditional that led to the conflict and negate it. In the simple casewhere the conflict is caused by two conflicting assignments, as shownin Figure 2, the unsat core that we obtain from Yices contains thispair of conflicting clauses [(y < 4)T, (y < 7)F ]. We then backjumpto the previous conditional that is a part of the conflict clause pair,i.e. conditional (y < 7). In the case where the cause of the conflict ismore complex and the unsat core consists of more than two conflictclauses, we backtrack to the most recently visited conditional whosedirection assignment led to the conflict and whose counter-edgeremains unexplored. The complete algorithm of the conflict-drivenbacktracking strategy is shown in Algorithm 2.

B. Conflict-driven learningAnother pruning technique used in most state-of-the-art SAT

solvers is called learning. Learning extracts and records the infor-mation from the previously searched space to prune the search inthe future. Here, we focus on learning that occurs as a consequenceof conflicts created during the search process. This is referred to asconflict-driven learning and from this point on, we will use the termlearning only in this context.

61

Page 4: [IEEE 2010 19th Asian Test Symposium (ATS) - Shanghai, China (2010.12.1-2010.12.4)] 2010 19th IEEE Asian Test Symposium - Tackling the Path Explosion Problem in Symbolic Execution-Driven

Algorithm 2 Conflict-driven backtracking strategyCurrNode ← initial node, Path ← CurrNodewhile Path is not empty do

if not covered either branch of CurrNode thentransition← branch that does not violate learned clausesif transition can reach any unvisited, important branches then

if transition does not lead to terminal node thenappend transition to Path

elseif path constraint of Path has solution then

record Testselse

blevel← compute backtrack level from unsat coreupdate conflict clause database

end ifend if

end ifelse if covered only one branch of CurrNode then

/* symmetric case for counter-branch */else

if Path was infeasible thenCurrNode← node in blevel /* backjump */

elseCurrNode← previous node in Path /* backtrack */

end ifend if

end while

Each time a conflict is encountered, the conflict analysis engineadds some clauses to the database. These clauses record the reasonsdeduced from the conflict to avoid making the same mistake in thefuture search. We use a similar idea to avoid visiting paths that includeassignments that we already know will lead to a conflict. Every timean infeasible path is encountered, the unsatisfiable core is extracted.This information is translated into a small set of conditionals andtheir branch assignments that led to the conflict.

In our example depicted in Figure 2, the conflict clause learnedis [(1, T ), (5, F )]. This ‘clause’ is then stored in a conflict clausedatabase, like those maintained in SAT solvers. In the followingiterations of our search, when we select a new branch to explore,we first check if the chosen branch assignment in the path containsany of the learned conflict clauses. If such a violation is found, we flipthe faulty assignments until we obtain a path, free of any conflictingassignments, or backtrack further up our search tree.

C. Completeness of the algorithm

Further, we prove that our nonchronological strategy is completeand does not result in any loss of branch coverage. We base thisinference on the proof of completeness of the GRASP algorithm [10].For reference, we reproduce a part of the discussion here, with respectto our strategy.

Lemma 1. Let β be the backtracking decision level computed bythe conflict analysis. Let Aβ denote the partial path assignmentcontaining all the conditionals and their directions with decisionlevels no greater than β. In this scenario, a solution cannot be foundfor any path assignment A such that A ⊃ Aβ .In other words, a solution cannot be found for any path, with pathpredicate P such that P ⊃ Pβ , where Pβ is the path predicatecorresponding to Aβ .

Corollary 1. Let α be the current decision level and β be thecomputed backtracking decision level. In this scenario, a feasiblepath to the current branch of the conditional at level α cannot befound until the search process backtracks to decision level β.

Definition 3 (Completeness). In our context of path exploration, asearch strategy is said to be complete if, for every branch, a feasiblepath will be found if at least one such feasible path exists.

Theorem. The nonchronological backtracking search algorithm iscomplete.Proof. It is well known that the depth-first strategy, with chrono-logical backtracking is a complete search procedure. Backtrackingsearch extends partial assignments until the depth of the search treehas been reached. In the event of an inconsistent path assignment,i.e. an infeasible path being encountered, the most recent decisionassignment yet untried is considered and the search proceeds. Hence,if a solution exists, it will eventually be found. Given these facts, weonly need to prove that nonchronological backtracks don’t jump overpartial assignments that can be extended to feasible path assignments.Let us suppose that the current decision level is α and the computedbacktracking decision level is β. Then, by Lemma 1 and Corollary 1,we can conclude that a feasible path for the current branch cannot befound by extending any partial path assignment defined by decisionlevels greater than β on the current decision path, i.e. all pathassignments {Aγ | β < γ ≤ α} are infeasible. It then follows thatif a feasible path to the current branch exists, it will eventually beenumerated and, consequently, the algorithm is complete.

VI. SOME IMPLEMENTATION DETAILS

For our initial experiments, we implemented our algorithm on atest generation tool, driven by symbolic execution. This tool is builton the lines of the DART [2] and CUTE [1] frameworks. It is writtenin the functional language, OCaml, and is composed of three maincomponents - an instrumentation tool, a symbolic execution library,a path exploration framework. CIL [13], an OCaml framework forparsing, transforming and analyzing C code, is used to instrumentthe program under test for symbolic execution. The library deals withsymbolic execution of the program under test. The path explorationframework deals with the exploration of the program. The pathconstraints are solved with the Yices SMT solver [12].

We implemented our search strategies on a stand-alone path explo-ration framework without the overhead of program instrumentationand symbolic execution. The proposed technique is written in C++and makes use of the parsing utilities Lex and Yacc. The SMT solverYices is used as the constraint solver. The proposed test generatortakes as input a control flow graph of a program, represented inthe form of a finite state machine. It selects a path to explore byaccumulating the branch constraints of each state (or node in thegraph) along the path and provides this conjunction of constraints tothe SMT solver. If the solver is able to find a solution, the obtainedsolution serves as the test vector for the path and if no solution isfound, the path is recorded as an infeasible path. It then selects thenext path to explore, based on the search strategy selected.

Our testing framework can be used for testing any model thatcaptures the behavior of a system; thus, it is programming languageindependent, as long as the intent of the program can be expressed inthe form of a control-flow graph and the conditionals in the programcan be expressed using the available SMT theories. Programs writtenin any C-like language and even in register-transfer-level (RTL) formcan be modeled in the form of a finite state machine.

VII. EXPERIMENTS AND RESULTS

We performed our experiments on a dual-core 2.8GHz Intel Core 2Duo machine with 2.9 GB of RAM. We evaluated our implementationon examples derived from Test Programs that are used in thehigh volume manufacturing of Intel’s semiconductor devices. TestPrograms are software entities written on top of a tester operatingsystem and used to control the tester hardware. They determinewhether a Silicon device is defect-free and also its product category.A bug in the Test Program software can result in defective units beingshipped to customers or working units being discarded or binning ofthe units into incorrect product categories. These Test Programs arewritten in C and are loop-free, but control-intensive. The number ofpaths in Test Programs is often in the order of millions.

62

Page 5: [IEEE 2010 19th Asian Test Symposium (ATS) - Shanghai, China (2010.12.1-2010.12.4)] 2010 19th IEEE Asian Test Symposium - Tackling the Path Explosion Problem in Symbolic Execution-Driven

TABLE I: Experimental results for the path-exhaustive, reachability-guided and conflict-driven backtracking strategiesPath-exhaustive (PE) Reachability-guided (RG) Conflict-driven backtrack (RG+CB)

Test-case

#conds

Paths exp. Inf. paths Time (s) Pathsexp.

Inf. paths Time (s) Pathsexp.

Inf.paths

Time (s) Speedup

tc1 17 6,816 576 5.58 32 12 0.14 19 1 0.07 2.0tc2 20 12,944 33,136 23.48 1640 5574 5.68 18 9 0.06 94.67tc3 28 3,238,240 73,760 2,596.01 28 137 0.53 28 3 0.16 3.31tc5 30 4,239,360 0 2,701.85 31 0 0.12 31 0 0.12 1.0tc18 34 – – – 407 192 4.14 34 1 0.28 14.79tc6 39 – – – 9,590 9,568 73.67 41 1 0.28 263.11tc19 39 – – – 35 843,264 525.41 35 2 0.18 2,918.94tc7 42 – – – – – – 42 4 0.25 N/Atc20 42 1,716 0 0.93 43 0 0.06 43 0 0.06 1.0tc8 47 – – – 3,868 3,840 55.91 53 4 0.53 105.49tc9 61 – – – 127 50 2.59 62 3 0.71 3.65tc22 84 1,159,488 0 1,026.91 85 0 0.21 85 0 0.21 1.0tc23 143 – – – – – – 194 8 5.84 N/Atc10 176 – – – – – – 180 2 8.02 N/Atc11 180 – – – 268 126 26.9 191 32 10.95 2.46tc24 210 – – – 211 0 0.82 211 0 0.81 1.00tc13 445 – – – 622 182 193.34 467 28 114.98 1.68‘-’ indicates time-out in 2 hours.

We compare our two proposed strategies, the reachability-guidedstrategy and the conflict-driven nonchronological backtracking strat-egy with the base DFS strategy, which we call the path-exhaustivestrategy. An important point to note is that, since both of ourstrategies are complementary, we implemented our nonchronologicalbacktracking strategy, as an enhancement of the reachability strategy.We also evaluate our proposed search strategies against other efficientsearch strategies proposed in recent work. We compare our conflict-driven backtracking-based search strategy with two search strategiesin CREST, an open-source test generation tool for C [3].

The results obtained for the three strategies: path-exhaustivesearch (PE), reachability-guided search (RG) and conflict-drivenbacktracking-based search (RG+CB), are shown in Table I. For eachtest case derived from a Test Program, # conds denotes the number ofdistinct conditionals in the program model, where each conditionalleads to a true and false transition. Based on the outcome of theconditional, each transition leads to either another such state or aterminal state in the model. In Table I, Paths exp. denotes the numberof feasible paths explored and Inf. paths indicates the number ofinfeasible paths explored. Time (s) is the CPU run time in secondsand Speedup indicates the speedup of the RG+CB method overthe RG method. The entries (–) represent instances for which thecorresponding experimental run timed-out (the time-out was set to2 hours). In each of the test cases with statistics reported, completebranch coverage was achieved.

We can see that the RG strategy greatly outperforms the PEstrategy for all cases. For most test cases with more than 30 distinctconditionals (except test cases without any infeasible paths), the PEstrategy results in a time-out. We also observe that, unsurprisingly,the RG and RG+CB strategies give the same results for test caseswith no infeasible paths in the program model. From this we caninfer that as long as there are no infeasible paths in the model,the RG strategy performs well enough. However, as the model sizeand the number of infeasible paths increase, the RG strategy takessignificantly longer to complete and in some cases, even results in atime-out. In addition, we see that for some test cases, the number offeasible paths explored is also lower using the RG+CB strategy thanthe RG strategy. This is attributed to the fact that the reachabilityanalysis is only performed while backtracking through the currentpath and not whenever a new path is selected. Also, every time anew path is selected, the default branch that is selected for everysubsequent conditional is the false branch. This leads to additionalfeasible paths being explored using the RG strategy. For example, fortc18 with 34 conditionals, the path-exhaustive method timed out in 2

hours. The reachability-guided method needed to explore 407 pathsto complete the search. The total time taken is less than 5 seconds!Next, in the conflict-driven backtrack method, only 34 paths neededto be explored. Among the 192 infeasible paths, only 1 needed tobe considered, as the rest could be readily blocked from the addedconflict clauses. The total time taken was less than a second! This ismore than 14× speedup over the RG strategy!

Table II shows the results for varying the number of conflicts intro-duced into the model. For each test case derived from a Test Program,# conflicts denotes the number of inconsistencies introduced in themodel for the purpose of evaluation. These inconsistencies in turn,lead to infeasible paths in the model. # conds, Paths exp., Inf. pathsand Time (s) retain the same meanings as in Table I.

We can see from Table II that, when there are no inconsistencies inthe model, the two strategies perform identically, as expected. As thenumber of conflicts increases, the number of paths explored under theRG strategy markedly increases and consequently, so does the run-time. We can also see that the number of feasible paths explored usingthe RG strategy is much larger in many cases, as previously explainedwith reference to Table I. In the case of the RG+CB strategy, with thehelp of the intelligence of the conflict analysis engine, infeasible pathsonce explored, are not visited again. For example, for tc2 with 20conditionals, both the RG and the RG+CB method achieved completebranch coverage in less than 0.1 seconds, by exploring just 19 feasiblepaths, when there were no conflicts in the model. However, when8 conflicts were introduced in the same model, the RG methodtook 5.68 seconds to complete, while exploring 1640 feasible pathsand 5574 infeasible paths. Whereas, the RG+CB method needed toexplore only 18 paths and 9 infeasible paths for complete coverage.The time taken by the RG+CB method was only 0.06 seconds, morethan 90× speedup over the RG method.

In Table III, we compare our conflict-driven backtracking-based(RG+CB) search strategy with two search strategies in CREST, anopen-source test generation tool for C [3]. The table contains resultsfor comparison of our RG+CB strategy with CREST’s uniform-random and CFG-directed search strategies. These two strategies arerepresented as CREST (UR) and CREST (CFG-D) respectively. TheCREST (UR) strategy samples the path space of a program ratherthan the input space, by generating paths uniformly at random, andthe CREST (CFG-D) search attempts to increase branch coverageof a program by driving the execution down short static paths tocurrently uncovered branches. In Table III, for each test case, # condsrepresents the number of program conditionals and # iters representsthe number of iterations or paths explored (feasible and infeasible)

63

Page 6: [IEEE 2010 19th Asian Test Symposium (ATS) - Shanghai, China (2010.12.1-2010.12.4)] 2010 19th IEEE Asian Test Symposium - Tackling the Path Explosion Problem in Symbolic Execution-Driven

TABLE II: Statistics of experiments with reachability-guided and conflict-driven strategies, for varying number of conflictsReachability-guided (RG) Conflict-driven backtracking (RG+CB)

Test-case # conds # conflicts Paths exp. Inf. paths Time (s) Paths exp. Inf. paths Time (s)

tc2 20

0 19 0 0.06 19 0 0.052 30 9 0.09 21 2 0.054 62 1,467 1.16 20 4 0.068 1,640 5,574 5.68 18 9 0.06

tc15 36

0 38 0 0.14 38 0 0.234 56 32 0.53 39 4 0.246 134 225 1.03 39 6 0.277 2,119 2,654 5.49 40 7 0.27

tc16 470 48 0 0.34 48 0 0.343 65 27 0.54 51 3 0.346 2,008 9,115 26.92 48 5 0.37

tc9 61 0 62 0 0.72 62 0 0.71 127 50 2.59 62 3 0.71

‘-’ indicates that no back-jumping was performed.

TABLE III: Results for comparison of conflict-driven backtracking search strategy with two CREST search strategiesCREST (UR) CREST (CFG-D) RG+CB Speedup of RG+CB

Test-case # conds # iters Time (s) # iters Time (s) # iters Time (s) v/s (UR) v/s (CFG-D)tc2 20 74 1.77 19 0.39 20 0.06 25.33 5.53tc17 27 89 2.25 33 0.73 28 0.16 14.04 4.57tc18 34 196 4.96 176 3.45 35 0.28 14.18 12.32tc19 34 118 2.62 39 0.76 37 0.18 14.57 4.23tc14 37 124 3.21 43 0.82 43 0.24 13.37 3.4tc20 42 430 9.51 62 0.69 43 0.06 158.42 11.53tc9 61 382 9.29 67 1.36 65 0.71 13.08 1.91tc21 73 432 9.98 106 2.08 80 1.51 6.61 1.37tc22 84 4,779 32.64 116 2.24 85 0.21 153.95 10.56tc23 143 1,058 29.29 159 3.77 194 5.52 5.31 0.68tc24 210 53,524 393.76 293 5.87 211 0.83 475.33 7.08

to obtain complete branch coverage. The last two columns denotethe speedup obtained by the RG+CB method over UR and CFG-D,respectively. Both these methods used in CREST use heuristics toselect new branches to explore, due to which, each CREST run mayresult in different paths being explored and consequently, differentstatistics. The statistics shown here were obtained for each test caseby averaging the results obtained from 10 separate runs.

From the statistics in Table III, we can see that our RG+CBstrategy greatly outperforms CREST’s uniform-random search pro-cedure and in most cases, performs better than the CFG-directedsearch procedure. For example, in the case of tc18, a program with34 conditionals, the CREST (UR) method required 196 iterations toachieve complete branch coverage and the CREST (CFG-D) methodrequired 176, whereas our method required just 35. The RG+CBmethod resulted in a speedup of 14.18 over the CREST (UR) methodand a speedup of 12.32 over the CREST (CFG-D) method. The searchstrategies used in CREST to quickly obtain branch coverage are basedon heuristics to select new branches and discard previously visitedpaths. These heuristics are found to be insufficient when the programbeing tested has several infeasible paths or several long paths withina sub-tree of its CFG. Our proposed strategy, on the other hand, dealswith these problems efficiently.

VIII. CONCLUSION

Constraint-based test generation is a promising technique to auto-matically generate tests with high coverage. However, it suffers fromthe major bottleneck of path explosion in large applications. Thispaper introduces intelligent search strategies to help solve this scala-bility challenge . Our experiments showed that the reachability-guidedstrategy can majorly reduce the number of tests required to achievebranch coverage quickly. The nonchronological backtracking strategyshows significant savings in the number of infeasible paths exploredby an order of magnitude. The two strategies are complementary toeach other and improve the overall speed of test generation.

ACKNOWLEDGMENT

This work was supported in part by a grant from Intel Corporation.

REFERENCES

[1] K. Sen, D. Marinov, and G. Agha, “CUTE: A Concolic Unit TestingEngine for C,” in Proc. European software engineering conf., (New York,NY, USA), pp. 263–272, ACM, 2005.

[2] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed AutomatedRandom Testing,” in Proc. ACM Conf. Programming language designand implementation, (New York, NY, USA), pp. 213–223, ACM, 2005.

[3] J. Burnim and K. Sen, “Heuristics for Scalable Dynamic Test Genera-tion,” in Automated Software Engineering, 2008. IEEE/ACM Int’l Conf.,pp. 443–446, September 2008.

[4] P. Boonstoppel, C. Cadar, and D. R. Engler, “RWset: Attacking PathExplosion in Constraint-Based Test Generation,” in TACAS, pp. 351–366, 2008.

[5] B. T. Technology, “BullseyeCoverage - Measurement Technique,” http://www.bullseye.com/measurementTechnique.html.

[6] S. Bardin and P. Herrmann, “Pruning the Search Space in Path-BasedTest Generation,” in Pro. 2009 Int’l Conf. Software Testing Verificationand Validation, pp. 240–249, 2009.

[7] R. Majumdar and K. Sen, “Hybrid concolic testing,” in Proc. Int’l Conf.Software Engineering, pp. 416–426, IEEE Computer Society, 2007.

[8] C. Cadar, V. Ganesh, P. Pawlowski, D. Dill, and D. Engler, “EXE:Automatically generating inputs of death,” ACM Trans. Information andSystem Security, vol. 12, no. 2, p. 10, 2008.

[9] P. Godefroid, M. Levin, D. Molnar, et al., “Automated whitebox fuzztesting,” in Proc. Network and Distributed System Security Symp.,Citeseer, 2008.

[10] J. Marques-Silva and K. Sakallah, “GRASP: A search algorithm forpropositional satisfiability,” IEEE Trans. Computers, vol. 48, no. 5,pp. 506–521, 1999.

[11] S. Ranise and C. Tinelli, “The SMT-LIB Standard: Version 1.2,” Depart-ment of Computer Science, The University of Iowa, Tech. Rep, 2006.

[12] B. Dutertre and L. De Moura, “The Yices SMT solver,” Tool paper athttp://yices.csl.sri.com/ tool-paper.pdf , 2006.

[13] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer, “CIL: Inter-mediate Language and Tools for Analysis and Transformation of CPrograms,” in Proc. Int’l Conf. Compiler Construction, (London, UK),pp. 213–228, Springer-Verlag, 2002.

64