
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING

Faultprog: Testing the Accuracy of Binary-Level Software Fault Injection

Domenico Cotroneo, Anna Lanzaro, and Roberto Natella

Abstract—Off-The-Shelf (OTS) software components are the cornerstone of modern systems, including safety-critical ones. However, the dependability of OTS components is uncertain due to the lack of source code, design artifacts and test cases, since only their binary code is supplied. Fault injection in components' binary code is a solution to understand the risks posed by buggy OTS components.
In this paper, we consider the problem of the accurate mutation of binary code for fault injection purposes. Fault injection emulates bugs in high-level programming constructs (assignments, expressions, function calls, ...) by mutating their translation in binary code. However, the semantic gap between the source code and its binary translation often leads to inaccurate mutations.
We propose Faultprog, a systematic approach for testing the accuracy of binary mutation tools. Faultprog automatically generates synthetic programs using a stochastic grammar, and mutates both their binary code with the tool under test, and their source code as a reference for comparisons. Moreover, we present a case study on a commercial binary mutation tool, where Faultprog was adopted to identify code patterns and compiler optimizations that affect its mutation accuracy.

Index Terms—Off-The-Shelf software; Dependability Benchmarking; Fault Injection; Software Mutation; Random Testing.


1 INTRODUCTION

The need for lowering development costs and the time-to-market leads companies to develop their systems by integrating Off-The-Shelf (OTS) software components from third parties. Nowadays, this approach is also adopted for safety-critical systems such as avionic, automotive and medical ones, where the dependability of OTS software becomes a major concern [1], [2], [3]. When an OTS component is reused in a new context, the system may use parts of the component that were not previously used and/or that were only lightly tested. Furthermore, the OTS component may interact with the new environment in unforeseen ways, thus exposing residual software faults ("bugs") that had not been discovered in previous use. The issue of OTS software is exacerbated by the lack of source code, design artifacts and test cases, since only its executable binary code is usually available.

Fault injection is a widely-used approach for mitigating the risks posed by faulty OTS components in a critical system [4], [5], [6]. It consists in the deliberate injection of faults into a component, by means of binary mutation (BM). Mutations mimic software faults by inserting small "faulty" changes into the binary code of the OTS component; then, the system is executed, in order to analyze its ability to tolerate faults in the OTS component, as demonstrated in a variety of work [7], [8], [9], [10]. Another emerging application of BM is to increase the efficiency of mutation testing (which uses mutants to select test cases), by avoiding the cost of recompiling a high number of source-code mutants [11], [12].

• Authors are with the Department of Electrical Engineering and Information Technologies (DIETI), Federico II University of Naples, Italy. E-mail: {cotroneo, anna.lanzaro, roberto.natella}@unina.it


However, making accurate changes to binary code is a technically-challenging problem [13], [14], [15], [16], [17]. A BM tool works in two steps: (i) it first looks for code patterns in the binary code that represent high-level programming constructs, according to a fault model to inject [18], [19], [14]; (ii) the code pattern is mutated by replacing its machine instructions. Fig. 1 shows an example: to emulate a "missing variable assignment" fault (a frequent type of software fault, as discussed later), a BM tool should identify, and remove, the machine instructions that result from the translation of an assignment in the source code. The removal of these instructions from the binary code emulates a programmer's omission in the source code.

    Source code                 Binary code
    int a, b, c;                mov  %rsi,-0x20(%rbp)
    a = 3;                      movl $0x3,-0xc(%rbp)
    b = 5;                      movl $0x5,-0x8(%rbp)
    c = 7;                      movl $0x7,-0x4(%rbp)

    Mutated source              Mutated binary
    int a, b, c;                mov  %rsi,-0x20(%rbp)
    a = 3;                      movl $0x3,-0xc(%rbp)
    c = 7;                      movl $0x7,-0x4(%rbp)
                                nopl 0x0(%rax)

Fig. 1. Example of fault injection at binary level. The source code (top left) is compiled into binary code (top right). Source mutation removes the assignment "b = 5;" (bottom left); binary mutation replaces its translation with a nop (bottom right). Faultprog asks: is binary mutation an accurate alternative to source mutation?
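The mutation step can be illustrated with a minimal sketch (our illustration, not the code of any particular BM tool): once the instructions translating "b = 5;" have been located, their bytes are overwritten with NOPs. On x86-64, "movl $0x5,-0x8(%rbp)" encodes as the 7 bytes C7 45 F8 05 00 00 00, and 0x90 is the single-byte NOP (a real tool may instead emit a multi-byte NOP, such as the "nopl 0x0(%rax)" shown in Fig. 1):

    #include <stddef.h>
    #include <string.h>

    /* Overwrite `len` bytes of machine code at offset `off` with x86 NOPs,
     * emulating the omission of the statement they translate. */
    static void nop_out(unsigned char *code, size_t off, size_t len)
    {
        memset(&code[off], 0x90, len);   /* 0x90 = single-byte x86 NOP */
    }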

The main challenge for a BM tool is to assure the accuracy of binary mutations, that is: binary mutations should emulate the software faults as they would be injected into source code and then compiled into binary code [13], [14], [17].


Accuracy is a necessary condition for the rigorous and unbiased evaluation of fault tolerance properties [8], [7], [9], [10], since injecting inaccurate mutants would lead to misleading conclusions (which can be very dangerous in the context of critical systems). However, the accuracy of binary mutation is hampered by the semantic gap between source code (consisting of high-level programming constructs written by developers) and its translation into binary code (consisting of low-level machine instructions). The gap is widened by the complexity of programming languages, compilers, and CPUs, and makes it difficult to identify high-level programming constructs by only looking at their translation in binary code. This gap negatively affects the accuracy of BM tools, as shown in previous work [15] (see § 2 for a more detailed discussion), and leads to an incorrect assessment of fault tolerance.

In this work, we propose an approach, namely Faultprog, for testing the accuracy of BM tools. Faultprog is based on the automatic generation of synthetic programs, which are submitted as inputs to a BM tool, in order to evaluate its accuracy at performing binary mutations. First, several synthetic programs are generated, encompassing different programming constructs (e.g., expressions, function calls, assignments) in different contexts (e.g., nested loops, control flow constructs and function calls). Then, the BM tool is applied on the binary code of these synthetic programs. The binary mutants produced by the BM tool are compared against the mutants produced by source-code mutation and compilation. In this process, the analysis of source code serves as a reference, as it does not suffer the limitations of binary-level mutation. Faultprog assesses the accuracy of the BM tool at correctly recognizing and mutating programming constructs at the binary level, revealing its issues and limitations. In other words, the synthetic programs act as a "test suite" for assessing the quality of binary mutations. Given the increasing use of fault injection techniques in industry, which are nowadays recommended by several safety standards [20], we believe that giving practitioners an approach to evaluate the accuracy of fault injection tools is very important.

This paper evaluates the feasibility and the usefulness of putting the Faultprog approach into practice in the context of a BM tool from a commercial fault injection suite. We use Faultprog to test a tool currently under development by CRITICAL Software S.A., a leading company in the field of independent V&V of safety-critical systems. We focus on the in-depth analysis of this BM tool, and present the lessons learned, along with the generality and limitations of the findings. Overall, the use of random program generation in Faultprog proved to be useful at exposing the BM tool to complex program expressions, and at uncovering corner cases that cause inaccurate mutations. While the BM tool produced correct mutations in many cases, we found that some code patterns in the fault model ("constraints") are not correctly handled, and that additional code patterns are needed to identify all injectable faults. Finally, we found that only a few compiler optimizations have a significant impact on the accuracy of the BM tool, and that developers' efforts should be focused on improving the tool with respect to these optimizations.

The paper is structured as follows: § 2 discusses the problem of BM accuracy with a motivating example; § 3 describes Faultprog; § 4 presents the experimental results obtained in the context of a BM tool; § 5 discusses related work; § 6 closes the paper.

2 THE PROBLEM OF ACCURACY

As a motivating example for the rest of the paper, we summarize the findings of our preliminary work on the accuracy of BM [15]. In that work, we analyzed a BM tool that adopts the Generic Software Fault Injection Technique (G-SWFIT) [14], a well-known technique for injecting realistic software faults in binary code. The peculiarity of G-SWFIT is that its fault model reflects the types of software faults that most frequently cause failures, according to field data about failures experienced by users [18], [19], [14]. The definition of these fault types is based on the well-known Orthogonal Defect Classification [21], which classifies software faults as Assignment, Algorithm, Checking, and Interface faults.

For each fault type (listed in TABLE 1), G-SWFIT defines: a code pattern where the fault type should be injected (for instance, assignment faults should be injected in move instructions, and control flow faults should be injected in branch instructions), and a code change to be injected for emulating that fault type (e.g., replacing move or branch instructions with nops). Code patterns reflect the typical translation of the high-level programming constructs of the fault model into sequences of machine instructions. The definition of code patterns is the trickiest aspect of G-SWFIT, since the binary translation of programming constructs is influenced by several factors, including the programming language, the compiler and its optimizations, and the hardware architecture.

In addition to the code patterns, G-SWFIT's fault types also include a set of rules (constraints) that define the "context" in which faults can be injected. The constraints serve to avoid code locations where the change would not emulate a realistic fault [14]. These constraints, listed in TABLE 1 and in TABLE 2, are based on the analysis of field data (see [22] for details). Constraints may apply to more than one fault type.

For example, the MFC fault type has a constraint (C01) imposing that a function call should be removed only if it does not return any value or the return value is discarded by the caller. In fact, field data (and also intuition) point out that if a programmer uses the return value of a function call, then they are unlikely to forget that function call in the program, and removing the function call would not be representative of real software faults made by programmers.


TABLE 1
Fault types of G-SWFIT [14], [22].

Class       Acronym  Description                                              Constraints
Assignment  MVIV     Missing variable initialization using a value            C02, C03, C04, C05, C06
Assignment  MVAV     Missing variable assignment using a value                C02, C03, C06, C07
Assignment  MVAE     Missing variable assignment with an expression           C02, C03, C06, C07
Assignment  WVAV     Wrong value assigned to variable                         C03, C06, C07
Algorithm   MFC      Missing function call                                    C01, C02
Algorithm   MIFS     Missing IF construct + statements                        C02, C08, C09
Algorithm   MIEB     Missing IF construct + statements + ELSE construct       C08, C09
Algorithm   MLPA     Missing small and localized part of the algorithm        C02, C10
Checking    MIA      Missing IF construct around statements                   C08, C09
Checking    MLAC     Missing AND in expression used as branch condition       (none)
Checking    MLOC     Missing OR in expression used as branch condition        (none)
Interface   WPFV     Wrong variable used in parameter of function call        C03, C11
Interface   WAEP     Wrong arithmetic expression in function call parameter   (none)

Another example is the MLPA fault type, which has a constraint (C10) imposing the removal of at least two, and at most five, consecutive statements that are neither control nor loop statements. All these constraints are important for emulating software faults in a representative way, which in turn is a requirement for a correct and unbiased evaluation of fault tolerance [23].
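To make the constraints concrete, the following fragment (a hypothetical example of ours, not taken from the tool) marks where an MFC fault may and may not be injected under C01 and C02:

    #include <stdio.h>

    static int compute(int x) { return x + 1; }

    void example(int x, int cond)
    {
        int r;
        printf("start\n");       /* return value discarded: MFC injectable (C01 holds) */
        r = compute(x);          /* return value used by the caller: C01 violated      */
        if (cond) {
            printf("branch\n");  /* the only statement in its block: C02 violated      */
        }
        (void)r;
    }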

In our previous experiment [15], we evaluated the accuracy of a G-SWFIT fault injection tool on a complex software system for the space domain. In that experiment, we injected faults in both OS and application binary code, and compared binary mutations with mutations performed on the source code, following the rules of G-SWFIT in both cases. We found that: (i) for each fault injected by the BM tool, there were two more injectable faults that were omitted by the tool, and (ii) among the faults injected by the BM tool, about half were invalid. These incorrect injections were due to two causes: (i) intrinsic limitations of G-SWFIT, and of BM in general, that are very difficult to avoid (e.g., inline C functions, which mislead a BM tool to inject one separate fault for each call site of the function); (ii) incorrect injections due to the incomplete or simplified implementation of the BM tool.

TABLE 2
Constraints of fault types in G-SWFIT [14], [22].

Id   Description
C01  Return value of the function must not be used
C02  The construct must not be the only statement in the block
C03  Variable must be local
C04  Must be the first assignment for that variable in the module
C05  Assignment must not be inside a loop
C06  Assignment must not be part of a FOR construct
C07  Must not be the first assignment for that variable in the module
C08  The IF construct must not be associated with an ELSE construct
C09  The block must not include more than five statements, and must not include loops
C10  Statements are in the same block, and do not include more than five statements nor loops
C11  There must be at least two variables in this module

Many incorrect injections were due to the second cause, and in particular were related to the implementation of fault constraints and to the identification of code blocks and control structures. For instance, spurious faults were in some cases incorrectly injected when the target instruction was the only statement within a block, and some faults were omitted when an if construct included a return statement. These incorrect injections were not due to intrinsic problems of BM, and could be avoided by conducting a rigorous testing of the BM tool.

These results motivated us to develop a systematic approach for testing BM tools. The experimental methodology of our previous study [15] cannot be easily adopted by developers of BM tools, since it was based on the injection of a large number of faults (tens of thousands in our experiment), and analyzing even a sample of these faults required considerable effort. Moreover, that analysis was focused on a specific software under study, and incurred a significant number of incorrect injections not due to issues in the BM tool (which are of interest for developers of BM tools), but to intrinsic or known limitations of BM. Therefore, in this paper we propose a novel approach that evaluates injection accuracy on synthetic programs, which are generated in a controlled and automated way in order to make the analysis more efficient.

3 THE FAULTPROG APPROACH

Faultprog automates the evaluation of BM tools at accurately injecting faults in binary code. The approach uses synthetic programs, i.e., programs (in a high-level language, such as C) that are automatically and randomly generated with the sole purpose of evaluating the ability of the BM tool to inject faults into them.


These synthetic programs are compiled into binary code, and fed to the BM tool. Then, we analyze the mutations obtained from the BM tool.

The idea of this approach is to generate several synthetic programs, in such a way to expose the BM tool to code patterns that could point out its inaccuracies. Synthetic programs (Fig. 2) contain a target fault location in their code, where the BM tool is expected to inject a fault. The target fault location is generated to comply with the fault types and constraints for which the BM tool is designed. For instance, to evaluate the "missing assignment using value" (MVAV) fault type of G-SWFIT (TABLE 1), we generate a target fault location containing an assignment statement. To comply with the constraints of fault types of G-SWFIT (TABLE 2), the target fault location consists of an assignment made to a local variable (constraint C03), and this assignment is not the only instruction of its block (constraint C02). If the tool is not able to inject a fault in the target fault location, then the synthetic program exposes an issue of the fault injector, i.e., it exhibits an omitted injection.
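For instance, a valid and an invalid MVAV target may look as follows (a hypothetical sketch of ours, mirroring the generated programs):

    void mvav_valid(void)
    {
        int a, b;     /* local variables: constraint C03 satisfied */
        a = 1;        /* preamble statement */
        b = 1;        /* first assignment of b */
        b = 5;        /* TARGET: not the first assignment of b (C07), not the
                         only statement in its block (C02), not in a FOR (C06) */
        a = a + b;    /* postamble statement */
    }

    void mvav_invalid(void)
    {
        int c;
        c = 1;
        if (c > 0) {
            c = 5;    /* the only statement in its block: C02 violated, so
                         no MVAV fault should be injected here */
        }
    }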

Moreover, we also generate synthetic programs in which the fault constraints are deliberately not satisfied, and in which the fault injector should avoid injecting faults. If the fault injector fails to recognize that the target fault location is not compliant with the fault model, it will erroneously inject a fault in the target fault location. In such a case, the synthetic program exposes an issue of the BM tool. This situation represents a spurious injection.

Finally, it is possible that the BM tool correctly identifies the target fault location, but it does not inject the intended fault. For instance, the BM tool may mutate both the target fault location, and other statements that surround the target but that should not be mutated. In such cases, executing the mutated program will produce different results than the same program mutated at the source code level. This situation represents an incorrect injection.

To complete the synthetic programs, and to evaluate the accuracy of the BM tool in the presence of complex code patterns, the target fault location is surrounded, preceded and followed by additional randomly-generated programming constructs, that represent respectively the context, the preamble and the postamble of the target fault location (Fig. 2).

Fig. 3 summarizes the workflow and the tools involved in the Faultprog approach. A program generator produces synthetic programs based on the structure of Fig. 2, by using a grammar of the source language to generate syntactically-valid statements, and a fault model to generate different flavors of the target fault location. The program is fed both to the BM tool that we are testing (Tool Under Test) and to a source mutation tool (Oracle Tool) that injects faults in source code, rather than binary code. The Oracle Tool serves as a reference for evaluating the accuracy of the Tool Under Test: it follows the same fault model (e.g., G-SWFIT), but works on source code, thus avoiding the accuracy issues encountered by the BM tool.

    void func() {
        stmt_1;          /* Random statements before the target (Preamble) */
        ...
        stmt_n;
        Target stmt;     /* Target fault location, depending on the fault
                            model (assignment, function call, ...) */
        stmt_1;          /* Random statements after the target (Postamble) */
        ...
        stmt_m;
    }
    ...
    void main() {
        entry_func();
        checksum();
    }

The target, its preamble and its postamble are wrapped in a context statement (None, Selection, Iteration).

Fig. 2. General structure of a synthetic program.

Fig. 3. The Faultprog approach. A program generator produces a source test case (tc_src), which a compiler turns into a binary test case (tc_bin). The Binary Mutation Tool (tool under test) mutates tc_bin into mtc_bin; the Source Mutation Tool (oracle tool) mutates tc_src into mtc_src, which is then compiled. The mutated binary and the compiled mutated source are executed, producing the outputs o_bin and o_src; the mutants and their outputs are compared.

The mutated programs are statically and dynamically compared. Program generation, mutation and comparison are automatically repeated, by leveraging a compiler and a build system.

Test-suite generation (§ 3.1): We automatically generate a set of synthetic programs, which are test-cases for the BM tool, and form a test-suite. Programs are generated according to the fault types and constraints that we are testing, where the target statement is surrounded by a context, i.e., iteration (a loop construct) or selection (a conditional construct) statements. The test-suite is obtained by generating different combinations of these parameters, while also controlling the nesting level and type of expressions (e.g., constants, variables, arithmetic operations). The test-suite (TS_src) contains both valid test-cases (i.e., test-cases that satisfy all the constraints for a fault type, and where the tool is expected to inject), and invalid test-cases (i.e., test-cases that do not satisfy one of the constraints, and that the tool should not mutate). Test-cases are compiled into binary code (TS_bin).

Test-suite mutation (§ 3.2): The Tool Under Test takes binary test cases from TS_bin as input, and produces faulty versions of these programs (mutated test suite, MTS_bin) by changing their binary code.


At the same time, the test-suite TS_src in source-code form is fed to the Oracle Tool. For each test-case tc_bin in TS_bin, we collect the mutants mtc_bin generated by the Tool Under Test, and the outputs o_bin resulting from their execution. The same process is performed on the source-code test-suite TS_src using the Oracle Tool, producing the mutants mtc_src and the outputs o_src.

Comparison of mutants (§ 3.3): The mutants produced by the tools (mtc_src and mtc_bin) are compared, in terms of mutated instructions, and of the outputs from their execution (o_src and o_bin). The comparisons determine whether the BM tool made spurious injections (i.e., mutations injected in the binary code, but not in the source code), omitted injections (i.e., mutations injected in the source code, but not in the binary code), or incorrect injections (i.e., mutations that lead to different outputs).

3.1 Test-suite generation

The proposed approach is based on the random generation of synthetic programs. We extend the use of random programs, which were adopted in past studies for testing compilers, interpreters, and program analyzers [24], [25], [26], [27], to test the accuracy of binary mutation tools. Random program generators produce programs as a sequence of statements, including global and local variable declarations, functions, assignments, expressions, selection and iteration statements. The inputs of these programs are constants produced during the random generation process. The output of these programs is a checksum of the global variables of the program, which is computed just before the termination of the program.

A synthetic program is a sequence of randomly generated statements. A statement can be an expression, an assignment, a function call, a selection or an iteration statement. The random generator bases the program generation on a stochastic grammar of the language [28]. A stochastic grammar associates probabilities to each grammar rule of the language. Each rule consists of a left side and a right side, where the left side is a non-terminal symbol, and the right side contains one or more sequences of symbols (either terminal or non-terminal). A statement is generated by concatenating terminal (e.g., operators like "+" and "->", or keywords like "for") and non-terminal symbols in the sequence. Beginning from a "start" rule, the program generator replaces each non-terminal symbol, by recursively applying the rules in the grammar, until there are no more non-terminal symbols. When the right side contains more than one sequence, the stochastic grammar associates a probability to each sequence, and the program generator randomly selects a sequence on the basis of its probability.
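As an illustration, the following sketch (our simplification, assuming a toy rule "expr -> constant (0.4) | variable (0.4) | expr + expr (0.2)") shows how a stochastic grammar rule can be expanded:

    #include <stdio.h>
    #include <stdlib.h>

    /* Recursively expand the non-terminal `expr`, picking one of its
     * right-side sequences according to the rule probabilities. */
    static void gen_expr(int depth)
    {
        double p = rand() / (double)RAND_MAX;
        if (depth > 3 || p < 0.4)
            printf("%d", rand() % 100);     /* terminal: constant */
        else if (p < 0.8)
            printf("v%d", rand() % 4);      /* terminal: variable */
        else {
            gen_expr(depth + 1);            /* non-terminal: expr "+" expr */
            printf(" + ");
            gen_expr(depth + 1);
        }
    }

    int main(void)
    {
        srand(42);
        gen_expr(0);        /* prints a random expression, e.g. "v2 + 57" */
        printf(";\n");
        return 0;
    }

The recursion bound (depth > 3) plays the role that, in a real generator, probabilities play in keeping the expansion from growing without limit.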

In the Faultprog approach, we modify this random program generation process to follow the structure shown in Fig. 2. The random program should contain a statement, named the target, that is the location for injecting faults according to the fault model.

The target statement is generated randomly, according to the following parameters (TABLE 3):

• Fault type that has to be tested. For instance, when testing the MVAE fault type of G-SWFIT, the target fault location consists of a local variable assignment with an expression, such as an arithmetic expression.

• Fault constraint to be violated (if any). For instance, when testing the MFC fault type in G-SWFIT, we can generate valid programs that comply with both constraints C01 and C02, and invalid programs that violate one of these two constraints (e.g., the target statement is a function call whose return value is used in the rest of the program).

• Context in which the target statement has to be inserted. It can be a selection (e.g., if-then-else construct) or an iteration statement (e.g., while- or for-loop construct).

• Nesting depth of the context, such as the number of nested loops in which the target statement should be contained.

• Type of basic operand (BO) to use in expressions of the target statement. They can be constants, global or local variables, or function calls.

• Structure of expressions in the target statement. According to this parameter, the target statement contains expressions that are obtained by different combinations of one or more BOs, random sub-expressions and random operators.

To obtain test-suites, we generate several random programs using the Faultprog automated program generator. We generate test cases to cover all the fault types in the fault model. Moreover, we consider all combinations of the parameters of TABLE 3 that apply to each fault type. For instance, given the MFC fault type, we generate three sets of test cases: (i) both constraints C01 and C02 are satisfied; (ii) C01 is not satisfied; (iii) C02 is not satisfied.
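The enumeration of combinations can be sketched as follows (our illustration, with a reduced parameter space; the actual generator covers all the options of TABLE 3):

    #include <stdio.h>

    enum context { CTX_NONE, CTX_WHILE, CTX_FOR, CTX_IF };
    enum operand { BO_CONST, BO_LOCAL, BO_GLOBAL, BO_CALL };

    int main(void)
    {
        for (int ctx = CTX_NONE; ctx <= CTX_IF; ctx++)
            for (int nest = 0; nest <= 2; nest++)
                for (int bo = BO_CONST; bo <= BO_CALL; bo++) {
                    if (ctx == CTX_NONE && nest > 0)
                        continue;   /* skip conflicting combinations */
                    printf("generate programs: ctx=%d nest=%d bo=%d\n",
                           ctx, nest, bo);
                }
        return 0;
    }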

Fig. 4 shows an example of a random (valid) program, in which (i) the MFC fault type is selected, (ii) all constraints are satisfied, (iii) a function call is nested in two loops, and (iv) the expression in the target statement (in this case, the parameter of the function call) is the sum of two local variables.

Faultprog parameters:
    Fault type:   missing function call
    Constraint:   all satisfied
    Context:      while loop
    Nesting:      2
    Operand type: local variable
    Expression:   two basic operands

Synthetic program:
    void target_func(int a, int b) {
        ...
        while(...) {
            while(...) {
                statement;
                statement;
                another_func(a+b);   /* target fault location */
                statement;
            }
        }
        ...
    }

Fig. 4. An example of synthetic program for testing the MFC ("Missing Function Call") fault type.



TABLE 3
Parameters of the Faultprog program generator.

Parameter                Description                                         Options
Fault type               Type of target statement, according to the         MFC; MIFS; MIEB; ...
                         fault model.
Fault constraint         Constraint to be violated (if any).                 all satisfied; C01 not satisfied; ...
Context                  The statements surrounding the target statement.   none; while loop; for loop; if-target;
                                                                             if-target-then-else; if-then-else-target
Nesting                  The nesting depth of the context statements.        0; 1; 2
Basic operand (BO) type  The type of operands involved in the target         constant; local variable; global variable;
                         statement.                                          function call
Expression structure     The expression involved in the target statement.1   single basic operand (BO); expression with
                                                                             two BOs; expression with three BOs;
                                                                             expression with a BO and a random
                                                                             sub-expression; expression with a BO and
                                                                             two random sub-expressions

1 Expressions are obtained by a random combination of one or more basic operands, sub-expressions and operators.

In this study, we tailored the Faultprog random program generator for the C language, since this language is predominant in safety-critical control systems and systems software. However, the general approach behind Faultprog can easily be ported to other programming languages, such as C++, by including additional programming constructs in the program generation (e.g., by including pointers as basic operands in the target statement). In fact, this approach has been adopted for many languages, such as SQL [29], Java [30], and x86 machine code [31]. Sirer and Bershad [30] showed how grammars can be customized using a domain-specific language. We implemented Faultprog by enhancing the open-source Randprog tool (originally aimed at testing C compilers [32]) to support the evaluation of BM tools.

3.2 Test-suite mutation

After generating test-suites (both in source-code form, TS_src, and in binary-code form, TS_bin), we apply the Oracle Tool and the Tool Under Test on them. For each synthetic program, we store the mutants produced by the tools, i.e., the sets MTS_src and MTS_bin.

For each binary test case tc_bin, we focus the analysis on mutations injected in the binary instructions that correspond to the target fault location of the synthetic program tc_src.

Mutations in other parts of the program that do not belong to the target fault location, such as the preamble, the postamble and the context (Fig. 2), are not considered in the analysis: they are only meant to introduce complex code patterns around the target fault location, and to evaluate the ability of the BM tool to recognize the target fault location among other binary instructions.

The location of the target statement in the source code is recorded by the program generator (Fig. 3) when a synthetic program tc_src is created. The synthetic program is then compiled into a binary program tc_bin. To identify the binary instructions in tc_bin that correspond to the target fault location, we leverage debugging information in the binary program, that is, auxiliary information inserted by the compiler to track the correspondence between program elements (such as statements) and their binary translation [33]. Even if this information is not available in OTS software components, we still insert it when compiling the synthetic program, in order to provide a "ground truth" for evaluating the BM tool (as discussed in § 3.3). We remark that debugging information does not interfere with the binary code of synthetic programs: compilers insert debugging information in a distinct section of an executable file, which is separated from the binary code. Therefore, the binary instructions are the same that would be generated when debugging information is disabled.
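For instance, on a POSIX system with a GNU toolchain, line-level debugging information can be queried with the standard addr2line utility. The sketch below (ours, not the actual Faultprog implementation) maps an instruction address back to its source line, so that a mutated instruction can be matched against the recorded line of the target statement:

    #include <stdio.h>

    /* Return the source line that `addr` in `binary` translates,
     * or -1 on failure; addr2line prints lines like "file.c:42". */
    static int source_line_of(const char *binary, unsigned long addr)
    {
        char cmd[256], file[128];
        int line = -1;
        snprintf(cmd, sizeof cmd, "addr2line -e %s 0x%lx", binary, addr);
        FILE *p = popen(cmd, "r");
        if (p) {
            if (fscanf(p, "%127[^:]:%d", file, &line) != 2)
                line = -1;
            pclose(p);
        }
        return line;
    }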

After applying the Oracle Tool and the Tool Under Test, we obtain for each synthetic program one or more mutants of the source code, and one or more mutants of its binary code. More than one mutant can be generated when the target fault location is a complex statement, and more than one fault could potentially be injected in that location (e.g., for the "wrong arithmetic expression in function call parameter" fault type, when the function call contains several parameters). These synthetic programs are handled as a set of several distinct test cases.

3.3 Comparison of mutants

The mutants from both the Oracle Tool and the Tool Under Test are compared to identify omitted, spurious and incorrect injections by the Tool Under Test. Ideally, these tools should inject the same mutation at the binary and at the source level. First, we check whether the binary instructions mutated by the Tool Under Test are the ones corresponding to the target fault location (denoted by debugging information, § 3.2). Then, we run the mutants generated by both tools, and compare their behavior. The Tool Under Test injects correctly in two cases:

1) In the case of valid synthetic programs (i.e., the target statement satisfies all the constraints imposed by the fault type being tested): the BM tool identifies and mutates all the binary instructions of the target fault location, and the resulting mutant produces the same outputs as the corresponding source mutant.

2) In the case of invalid synthetic programs (i.e., the target statement does not satisfy one of the constraints imposed by the fault type being tested): both tools identify the target statement as an invalid location where a fault should not be injected, and thus do not produce mutants.

On the contrary, the Tool Under Test fails in the following cases:

1) In the case of valid synthetic programs: the Oracle Tool mutates the target fault location, while the Tool Under Test does not mutate the corresponding binary instructions (denoted by the debugging information). The test case detects an omission of the Tool Under Test.

2) In the case of invalid synthetic programs: the Oracle Tool does not mutate the target fault location, but the Tool Under Test mutates the binary instructions of the target location (denoted by the debugging information). The test case detects a spurious injection of the Tool Under Test.

3) In the case of valid synthetic programs: the BM tool injects a fault in the target binary instructions, but the execution results differ from those of the mutant produced by the Oracle Tool. The test case detects an incorrect injection of the Tool Under Test.

In order to achieve sufficient confidence that a synthetic program is accurately mutated, we execute it under several random inputs. The inputs exercise the mutated statement, and lead it to "infect" the state of the program (in the sense of the PIE model by Voas [34]). This approach (as any other approach) cannot prove that two mutants behave in the same way for all possible inputs, since this problem is undecidable: it is the same problem as detecting equivalent mutants in mutation testing [35], [36], [37]. However, as discussed in recent work on mutation testing [38], random inputs can estimate the amount of equivalent mutants with a high degree of statistical confidence.
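A minimal driver for this comparison may look as follows (our sketch for a POSIX system; mtc_src and mtc_bin are hypothetical names for the compiled source mutant and the binary mutant, both assumed to read a seed from the command line and print their checksum):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Run one mutant executable under a given random seed and
     * capture the checksum it prints. */
    static void run(const char *exe, unsigned seed, char *out, size_t n)
    {
        char cmd[256];
        FILE *p;
        out[0] = '\0';
        snprintf(cmd, sizeof cmd, "./%s %u", exe, seed);
        p = popen(cmd, "r");
        if (p) {
            if (!fgets(out, (int)n, p))
                out[0] = '\0';
            pclose(p);
        }
    }

    int main(void)
    {
        char o_src[64], o_bin[64];
        for (unsigned seed = 0; seed < 100; seed++) {   /* several random inputs */
            run("mtc_src", seed, o_src, sizeof o_src);
            run("mtc_bin", seed, o_bin, sizeof o_bin);
            if (strcmp(o_src, o_bin) != 0) {
                printf("behavioral difference exposed by seed %u\n", seed);
                return 1;
            }
        }
        puts("no behavioral difference observed");
        return 0;
    }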

Given that the two tools should inject the same faults, we expect that binary-level and source-level injections have the same effects on the target program, in terms of impact on the control flow of the execution and on the state of the program. We detect differences in the two executions by computing a checksum of the global variables in each execution, and by comparing the two checksums. We deliberately generate programs that make frequent use of global variables and of the variables in the target statement, in order to let the effects of the fault surface quickly. For example, assuming that the target statement is an if construct, and that the fault removes its else branch or changes the logical condition of the if, the program state will be immediately influenced by the lack of the accesses to global variables made within the else branch. It is important to note that our objectives do not require us to fully cover the code of the randomly-generated programs, since they are not intended to make any meaningful computation, and thus we are not interested in testing their correctness.

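The checksum of globals can be as simple as the following sketch (assumed; the hypothetical names g_4 and g_139 echo the globals of Fig. 7):

    #include <stdio.h>
    #include <stdint.h>

    int32_t g_4, g_139;     /* globals of the synthetic program */

    /* Fold every global variable into a single output value,
     * so that state corruptions surface in the program output. */
    static uint32_t checksum(void)
    {
        uint32_t crc = 0;
        crc = crc * 31 + (uint32_t)g_4;
        crc = crc * 31 + (uint32_t)g_139;
        return crc;
    }

    int main(void)
    {
        /* ...randomly generated statements execute here... */
        printf("checksum = %u\n", checksum());   /* just before termination */
        return 0;
    }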

When we run the program with random inputs, we check whether the target statement is covered. If the target statement is not covered because of an infinite loop, or because of a control-flow construct that contains the target statement, we make a small change in the preamble and context of the synthetic program, in order to steer the execution towards the target statement. The change retains the features of the Faultprog test: we keep unchanged the target statement, and the program structure around the target statement; we only make minimal changes to expressions in the control condition of loop and control-flow constructs, to guide the execution on the desired path.

First, we enforce a maximum number of iterations for loops that do not terminate in a reasonable amount of time. This guarantees that the program does not get stuck in infinite loops, and that function calls return in a limited amount of time. Second, we change the conditional expression of control-flow constructs that are in the path of the target statement, but that hinder its execution. When the target statement is not covered, we identify the covered control-flow construct closest to the target statement. Then, its control condition is negated both in the source mutant and in the binary mutant, and we execute them again. These steps are repeated for each conditional expression in the path before the target statement, until the target statement is covered. This approach is guaranteed to converge in a limited number of iterations: in the worst case, we need to iterate over all the conditional expressions within the function that contains the target statement.
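Both changes are purely local, as in the following sketch (a hypothetical fragment of ours): the original generated code contained "while (g_1 != 0) { ... }" and "if (l_2 > 7) { target; }", and is steered as:

    int g_1 = 3;

    void steered(int l_2)
    {
        int iter = 0;
        while (g_1 != 0 && iter++ < 1000) {   /* enforce a maximum iteration count */
            g_1--;
        }
        if (!(l_2 > 7)) {   /* condition negated to steer execution to the target */
            /* target fault location */
        }
    }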

After performing the comparisons of the mutants, we analyze the results to identify the causes of inaccuracies in the BM tool. To give feedback to developers, we analyze the distributions of failed test-cases with respect to Faultprog's parameters, to identify which parameters lead to the highest number of failures. When pinpointing inaccuracies, it is useful to know which fault types, constraints, or contexts caused a significant number of omitted, spurious or incorrect injections. This information enables developers to diagnose problems, by looking at specific areas of the BM tool. Moreover, after fixing the tool, developers can apply the test-cases again to validate the fix (i.e., to check whether it reduces inaccurate injections).

4 CASE STUDY

We applied the Faultprog approach to a BM tool based on G-SWFIT (Tool Under Test) from the Xception suite. As a reference, we use the SAFE tool for source code mutation (Oracle Tool).

4.1 The Xception fault injection tool

Xception is a fault injection tool suite, developed by CRITICAL Software S.A. (under the brand csXception), for supporting V&V of safety- and mission-critical systems.


Xception was originally developed as a Software-Implemented Fault Injection (SWIFI) tool, jointly with the University of Coimbra in Portugal. The original Xception, and SWIFI in general, injected hardware faults (such as electromagnetic interferences, and circuit wear-out) by emulating the effects of such faults in software, through random bit-flipping of memory and CPU registers. In particular, Xception was designed to exploit the debugging and performance monitoring features of modern CPUs, in order to inject faults with minimal intrusiveness [39].

Xception is today a professional environment for performing fault injection tests. It adopts a modular architecture to support several forms of fault injection including, but not limited to, SWIFI. In particular, Xception is currently being extended with a plug-in to support the G-SWFIT fault injection technique (previously discussed in § 2), for the injection of software faults through binary mutation.

The Experiment Management Environment (EME) is the core of Xception, and is responsible for controlling, monitoring, and storing the results of the experiments. The EME interacts with a plug-in that actually injects faults into the target system. In turn, the G-SWFIT plug-in (Fig. 5a) performs binary mutations on behalf of Xception. The G-SWFIT plug-in disassembles the binary code of the target program, and looks for code patterns (i.e., sequences of machine instructions) that match the target programming constructs (function calls, assignments, ...), according to the fault types and constraints presented in Section 2. Once code patterns are recognized, they are mutated by replacing machine instructions with nop instructions (to emulate a statement omission), or by modifying opcodes and operands (to emulate an incorrect statement). This plug-in currently supports the PowerPC hardware architecture and the GCC compiler toolchain, on which we focus our analysis.
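Step (i), pattern matching, can be illustrated for this target as follows (our sketch, not the plug-in's code): PowerPC instructions are fixed-size 32-bit words, and a function call ("bl") is an I-form branch with primary opcode 18, AA=0 and LK=1, so candidate "missing function call" locations can be found with a mask; step (ii) would then overwrite the matched word with the PowerPC nop, 0x60000000 (ori 0,0,0):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Scan machine code (already loaded in host byte order) for "bl"
     * instructions, i.e. candidate MFC injection sites. */
    static void find_calls(const uint32_t *code, size_t nwords)
    {
        for (size_t i = 0; i < nwords; i++)
            if ((code[i] & 0xFC000003u) == 0x48000001u)
                printf("call at word %zu\n", i);
    }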

4.2 The SAFE fault injection tool

To point out inaccuracies of binary mutation, we also inject source-code mutations in the synthetic programs, which serve as a reference for binary mutations. We adopt the SAFE fault injection tool [23], which implements the same fault types of the original G-SWFIT, but injects these faults in the source code instead of binary code (Fig. 5b). A source code file is first processed by a C/C++ front-end, which builds an Abstract Syntax Tree representation. It then identifies locations where a fault type can be introduced in a syntactically-correct manner, and that comply with the fault constraints (§ 2). The source code is then mutated, compiled, and compared to binary mutations as previously discussed. The SAFE tool has been extensively used in several fault injection studies and industrial projects [23], [20], [40], [41], [42], including our preliminary study on BM accuracy in [15].

Fig. 5. Fault injection tools analyzed in this study. (a) csXception (Tool Under Test): the binary code of the target application is analyzed for code patterns matching the G-SWFIT fault types and constraints, and binary mutation produces the binary-level mutated versions. (b) SAFE (Oracle Tool): the source code of the target application is analyzed against the same fault types and constraints, and source mutation produces the source-level mutated versions.

SAFE has been applied on several software systems, including the Linux kernel, the RTEMS real-time OS, and several open-source projects. Based on this experience, the quality of the source code mutations produced by SAFE has steadily improved over the years, and it has matured into a reliable implementation of G-SWFIT.

4.3 Test suite for Xception G-SWFIT

We use Faultprog to generate a test-suite for evaluating the Xception G-SWFIT plug-in. For each fault type, Faultprog generates a set of valid programs (i.e., programs containing a valid target fault location), and a set of invalid programs (i.e., programs containing a target fault location that violates one of the constraints of the fault type, according to TABLE 1 and TABLE 2).

Several programs are generated for each fault type. Test cases cover all possible combinations in the Faultprog parameter space (see § 3.1 and TABLE 3). These parameters determine the content of a random program, by influencing which type of expression should be generated in the target fault location, and which statements should surround it. The number of generated random programs varies across ODC classes, since it depends on the number and type of constraints, and on the number of meaningful combinations of parameters for each fault type. We only exclude combinations with conflicts between parameters. For example, the MVAV fault type ("missing variable assignment using a value") imposes that the parameter "expression structure" must be a single operand, and that the "basic operand type" can only be a constant. The number of resulting combinations is shown in TABLE 4. We generated 5 random programs for each combination, which allowed us to achieve a reasonable code coverage of the BM tool, and to point out important issues.


TABLE 4
Test-suites generated by Faultprog.

ODC Class    # of valid programs   # of invalid programs   Statement coverage (%)
Assignment   416                   365                     35%
Algorithm    960                   1,745                   34%
Checking     624                   480                     41%
Interface    224                   42                      29%
Total        2,224                 2,632                   72%

4.4 Experimental measures

In our experiments, we evaluate the accuracy of BM by measuring the percentage of test cases that lead to spurious and omitted injections, i.e., test cases in which the BM tool injects an incorrect number of faults (§ 3.3). More precisely, the percentage of omitted injections of the BM tool is defined as:

\[
\mathit{omitted\%} = \frac{\#\ \text{source-level injections not matching binary-level injections}}{\#\ \text{source-level injections}} \tag{1}
\]

which is the ratio between (on the denominator) the number of all source-injected test cases (i.e., the injections that the BM tool is ideally expected to perform at the binary level), and (on the numerator) the number of test cases that have been mutated as source code by the Oracle Tool, but were not mutated as binary code by the Tool Under Test (i.e., omitted injections). In a similar way, the percentage of spurious injections of the BM tool is:

\[
\mathit{spurious\%} = \frac{\#\ \text{binary-level injections not matching source-level injections}}{\#\ \text{binary-level injections}} \tag{2}
\]

The percentage is computed from the ratio between (on the denominator) the number of all binary-injected test cases (including both correct and spurious injections), and (on the numerator) the number of test cases which have been mutated as binary code by the Tool Under Test, but were not mutated as source code by the Oracle Tool (i.e., spurious injections).

Finally, the percentage of incorrect injections is given by the ratio between (on the denominator) the number of test cases that were mutated both at binary- and at source-level, and (on the numerator) the number of such test cases whose binary and source mutants produced different outputs:

\[
\mathit{incorrect\%} = \frac{\#\ \text{non-omitted, non-spurious injections where source and binary mutations give different outputs}}{\#\ \text{non-omitted, non-spurious injections}} \tag{3}
\]

4.5 Experimental results

We here analyze omitted, spurious, and incorrect injections (§ 4.4) for the Xception BM tool.

First, we consider the most typical scenario, where the Faultprog test-suite is compiled under the default compiler configuration. We then evaluate whether compiler optimizations impact the accuracy of the BM tool, by compiling the test-suite with optimizations enabled, and applying it again to the BM tool. The experimental setup included an Intel x86-64 PC running Ubuntu Linux 14.04, the Faultprog program generator, the fault injection tools, and GCC version 4.1.1, which we configured to cross-compile for the powerpc-eabi target. We executed the mutated programs on an Apple iBook G4 running Debian Linux 6.0 for PowerPC.

4.5.1 Spurious injections

According to eq. 2, we evaluated the percentage of test cases, grouped by ODC class, in which the BM tool produced spurious faults, shown in the bar plots of Fig. 6. The bar plots show the same percentages, but the bars are split with respect to different perspectives (TABLE 3): the percentages are divided respectively by (i) constraint violated by the test case (Fig. 6a), (ii) context of the target fault location (Fig. 6b), (iii) level of nesting (Fig. 6c), (iv) type of operand (Fig. 6d), and (v) structure of the expression in the target fault location (Fig. 6e).

The highest percentage (49.58%) occurred for the Algorithm class, and in particular for the MIEB, MFC, and MLPA fault types. In Fig. 6a, constraints C02 and C09 account for most of the spurious injections (respectively, 19.37% and 10.05% of all test cases). This means that there was a high number of incorrect injections even when the target statement was the only statement in its block (where a fault should not be injected, since the test case violates constraint C02) or the if construct included more than five statements (where a fault should not be injected, since constraint C09 is violated).

An example of spurious injection for the MFC fault type is shown in Fig. 7. The function call func_7() is the only instruction in a while block, and it is not a valid location according to constraint C02: thus, faults should not be injected there. However, a spurious injection was made there by the BM tool: the parameters of this function call were erroneously interpreted as distinct statements instead of an individual statement, due to the complexity of the expressions in input to func_7(). The BM tool handles the binary instructions for computing the function inputs (e.g., the calls to func_5() and func_13(), lines 1-16) separately from the instructions that invoke the function (i.e., the register moves and the branch to func_7() in lines 17-19). A spurious fault is injected only in instructions 17-19, leaving the "useless" computation of function inputs in the loop, which is far from a realistic software fault that programmers would make.

To avoid such spurious injections, the fault injector should take into account data dependencies between the instructions of the function call and the instructions that compute its inputs, and recognize all of them as inter-dependent expressions.


Fig. 6. Spurious injections. Each bar plot reports the percentage of spurious injections per fault class (Assignment, Algorithm, Checking, Interface, All), with the bars split by: (a) constraint (all satisfied; C01 to C11 violated); (b) context (no context; WHILE loop; FOR loop; IF { target }; IF { target } ELSE {}; IF {} ELSE { target }); (c) nesting (1-, 2-, or 3-level nesting); (d) type of operand (constants, variables, function calls); (e) structure of expressions (single basic operand; expression with two or three basic operands; expression with a basic operand and one or two random sub-expressions).

Source code:

    void func(void) {
        ...
        while (l_6 & func_7(g_139)) {
            func_7( func_13((g_4 % func_5(0x7790L)),
                            (g_4 % func_5(1L))) );
        }
        ...
    }

Binary code (target fault location): lines 1-16 compute the inputs of func_7() through data-dependent calls to func_5() and func_13(); lines 17-19 contain the register move and the branch that invoke func_7().

     1  ...
     2  li  r3,30608
     3  bl  100009c0 <func_5>
     4  mr  r9,r3
     5  ...
     6  ...
     7  li  r3,1
     8  bl  100009c0 <func_5>
     9  mr  r9,r3
    10  ...
    11  ...
    12  mr  r3,r28
    13  mr  r4,r0
    14  bl  10000f74 <func_13>
    15  mr  r0,r3
    16  ...
    17  ...
    18  mr  r3,r0
    19  bl  10000ab0 <func_7>

Fig. 7. Example of synthetic program causing a spurious injection (MFC fault type).

It must be noted that, even if data-flow analysis is extensively used in compiler optimizations [43], to the best of our knowledge these techniques have not yet been applied to accurate binary fault injection; we are currently investigating novel approaches based on the analysis of data dependencies [44]. We remark that a block containing only a function call is a corner-case issue, which can only be found with a systematic testing approach such as Faultprog. We found similar problems for the MIEB and MLPA fault types, since the tool does not correctly compute the number of statements within an if block, leading to spurious injections when an if block contains a loop or more than five statements.
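As a rough illustration of the data-dependency idea (not the tool's actual algorithm), the following C sketch groups a call instruction with the instructions that compute its inputs, by walking backwards over a toy def/use encoding of the last instructions of Fig. 7. The register numbers and the one-def-per-instruction model are simplifications made for this sketch.

    #include <stdbool.h>
    #include <stdio.h>

    /* Each toy instruction defines one register (def) and uses up
     * to two (use0/use1); -1 means "none". */
    typedef struct { const char *txt; int def, use0, use1; } Insn;

    /* Toy encoding of the tail of Fig. 7: moves feeding func_13,
     * whose result feeds the final call to func_7. */
    static Insn code[] = {
        { "mr r3,r28",  3, 28, -1 },
        { "mr r4,r0",   4,  0, -1 },
        { "bl func_13", 3,  3,  4 },   /* result returned in r3 */
        { "mr r0,r3",   0,  3, -1 },
        { "mr r3,r0",   3,  0, -1 },
        { "bl func_7",  3,  3, -1 },   /* target call */
    };
    enum { N = sizeof code / sizeof code[0] };

    int main(void) {
        bool in_group[N] = { false };
        bool needed[64]  = { false };  /* registers the group still needs */
        in_group[N - 1] = true;        /* start from the call itself */
        if (code[N - 1].use0 >= 0) needed[code[N - 1].use0] = true;
        if (code[N - 1].use1 >= 0) needed[code[N - 1].use1] = true;

        /* Walk backwards: any instruction defining a needed register
         * joins the group, and its own uses become needed in turn. */
        for (int i = N - 2; i >= 0; i--) {
            if (code[i].def >= 0 && needed[code[i].def]) {
                in_group[i] = true;
                needed[code[i].def] = false;
                if (code[i].use0 >= 0) needed[code[i].use0] = true;
                if (code[i].use1 >= 0) needed[code[i].use1] = true;
            }
        }
        for (int i = 0; i < N; i++)
            printf("%-12s %s\n", code[i].txt,
                   in_group[i] ? "<- same statement" : "");
        return 0;
    }

On this toy input, every instruction joins the group of the final call, which is exactly the evidence a dependency-aware injector would need to treat the whole while body as a single statement.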

4.5.2 Omitted injections

Fig. 8 shows the distribution of omitted injections, again splitting the bars with respect to the Faultprog parameters (except for constraint violations, which do not apply to "valid" test cases). The percentage of omitted injections is very high for all ODC classes. However, these omitted injections cannot be attributed to a problem of a specific fault type or Faultprog parameter, since omitted injections occurred for every fault type, context, nesting, operand type, and expression structure (the bands in the bars of Fig. 8a to 8d have about the same height in all cases).

From a closer analysis of omitted injections, we found that the root cause was the inaccurate identification of statement boundaries in the target fault location. As in the example of Fig. 7, the problem manifested for test cases with statements containing complex expressions, which were handled by csXception as a sequence of several small statements (instead of a unique, large statement). For this reason, csXception overestimated the number of statements in the blocks of the program, and mistakenly concluded that constraints C02 or C09 were violated, leading to omitted injections. This problem also affected the Interface and Checking faults, even if these fault types do not require the C02 and C09 constraints: for these classes of faults, csXception did not recognize large statements that included several sub-expressions (e.g., for WPFV and WAEP, function calls with several sub-expressions in their parameters; for MLAC and MLOC, branch conditions with several boolean sub-expressions), since it did not parse these statements in their entirety, and thus omitted to inject into them.
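The following C sketch (with hypothetical identifiers, in the style of the generated programs) illustrates the two kinds of statements that were not parsed in their entirety: a single call statement whose parameters contain several sub-expressions, and a branch condition with several boolean sub-expressions.

    #include <stdio.h>

    static int f(int a, int b) { return a + b; }

    int main(void) {
        int g_1 = 2, g_2 = 3, g_3 = 5;

        /* One call statement with nested sub-expressions in its
         * parameters: split by the tool into several small
         * statements, causing omitted WPFV/WAEP injections. */
        int r = f(f(g_1 % g_2, g_3), f(g_1 * g_3, g_2));

        /* One branch condition with several boolean
         * sub-expressions: similarly not parsed as a whole,
         * causing omitted MLAC/MLOC injections. */
        if ((g_1 < g_2) && (g_2 < g_3) && (r > 0))
            printf("%d\n", r);
        return 0;
    }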

4.5.3 Incorrect injections

After identifying the test cases that did not cause omitted or spurious injections, we executed both their binary-mutated and source-mutated versions, and compared their executions. We then identified incorrect injections, i.e., test cases where a binary mutant produced different results than the corresponding source mutant.



Fig. 8. Omitted injections (percentage of omitted injections per fault class: Assignment, Algorithm, Checking, Interface, All). (a) Split by context. (b) Split by nesting. (c) Split by type of operand. (d) Split by structure of expressions.

The percentage of incorrect injections was high across all fault classes, accounting for 60% of non-spurious, non-omitted injections. These incorrect injections were caused by another form of the statement boundary problem: the tool mutated only a subset of the binary instructions of the target statement.

For example, Fig. 9 shows a test case for the MIA fault type. The BM tool was supposed to inject an MIA fault by replacing the binary instructions of the if construct with nops (i.e., the instructions for evaluating the boolean expression of the if and the branch to the block after the if). To identify the target fault location, the BM tool first identifies the branch instruction related to the if (beq-, line 20 in the figure). Then, to identify the beginning of the if construct, it inspects backwards the binary instructions that precede the branch, looking for either an assignment or a function call that denotes the boundary with another statement. However, this technique did not work correctly for this program, as the binary instructions identified by the BM tool are only a subset of the target fault location. The BM tool stops the backward inspection in the middle of the if construct (line 10, at the end of the func_12() function call) rather than at the boundary of the statement (at line 4), thus not injecting the intended mutation in its entirety.

Fig. 9. Example of synthetic program causing an incorrect injection (MIA fault type). Source code:

    uint32_t l_128 = 0x39CDCE66L;
    if ((func_12(l_24, l_126) * func_13(l_13, l_5)))
    {
      ...
    }

Binary translation of the source code (PowerPC):

     1  lis    r0,14797
     2  ori    r0,r0,52838
     3  stw    r0,8(r31)
     4  lbz    r0,16(r31)
     5  extsb  r0,r0
     6  lwz    r3,20(r31)
     7  mr     r4,r0
     8  bl     10000e04 <func_12>
     9  mr     r29,r3
    10  lbz    r0,24(r31)
    11  clrlwi r9,r0,24
    12  lhz    r0,26(r31)
    13  clrlwi r0,r0,16
    14  mr     r3,r9
    15  mr     r4,r0
    16  bl     10000e30 <func_13>
    17  mr     r0,r3
    18  mullw  r0,r29,r0
    19  cmpwi  cr7,r0,0
    20  beq-   cr7,10000d34 <func+0x57c>

The if construct spans lines 4-20; the BM tool actually mutated only the instructions from line 10 to line 20.

This incomplete mutation exhibits a different behavior than the corresponding source mutant: while the branch is removed from the program, the function call to func_12() is not removed, and affects the execution (e.g., through side effects). As discussed before, the BM tool should take into account the dependencies between binary instructions (in this case, the dependency at the mullw instruction at line 18, which uses the results of both function calls) to accurately identify the boundaries of a fault location.
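At the source level, the behavior of this incomplete mutant can be sketched as follows (an illustration with hypothetical function bodies; the actual mutation happens on the binary code): instead of removing the whole if construct, the mutant behaves as if only the branch were removed, so the function calls and their side effects still execute.

    #include <stdio.h>

    static int calls;                        /* observable side effect */
    static int func_12(int a, int b) { calls++; return a - b; }
    static int func_13(int a, int b) { calls++; return a * b; }

    int main(void) {
        int l_24 = 4, l_126 = 7, l_13 = 2, l_5 = 9;

        /* Intended MIA mutant: the whole if construct disappears,
         * so neither function is called. The incomplete binary
         * mutant instead behaves like the line below: the branch is
         * gone, but the calls (and the multiplication) still run. */
        (void)(func_12(l_24, l_126) * func_13(l_13, l_5));

        printf("calls made: %d\n", calls);   /* 2, not 0 as intended */
        return 0;
    }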

4.5.4 Impact of optimizations

Finally, we consider the impact of compiler optimizations on the accuracy of csXception G-SWFIT. GCC provides three sets of optimizations ("levels"), which developers can selectively enable to reduce code size or to minimize compilation or execution time:

• Basic optimizations, which generate binary code with reduced size and improved performance, but without increasing the compilation time. This option is useful for many large server/database applications where memory paging due to larger code size is an issue.

• Speed optimizations, which include the optimizations of the previous level, and add global transformations that create faster, smaller code, but significantly increase the compilation time.

• Speed-over-size optimizations, which include the optimizations of the previous levels, and further improve performance but increase the size of binary code, such as loop unrolling, function inlining, and vectorization. They are used for applications with many floating point calculations or that process large data sets.

We apply the Faultprog test suite again, compiling the test cases with these optimizations. However, we do not enable all optimizations at the same time (according to levels), since this approach would not easily allow us to pinpoint the specific optimizations that interfere with injection accuracy. We hypothesize that most of the optimizations do not affect the code patterns involved in the fault model: therefore, we compile each test case several times and, at each


compilation, we enable only one optimization option and keep the remaining ones disabled. This approach allows us to easily identify the specific optimization options that cause inaccuracies. In total, we consider 48 optimization options provided by the GCC compiler (documented in [45]), with the exception of loop unrolling and function inlining, which we already showed to be problematic in our previous work [15].
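A minimal sketch of this per-option compilation loop is shown below, assuming a C driver that shells out to the compiler; the flag names are taken from the GCC manual, but the exact set of 48 options (and whether a given flag takes effect without -O) depends on the compiler version, so this is only an illustration, not the actual tooling of the case study.

    #include <stdio.h>
    #include <stdlib.h>

    /* One entry per optimization option under test (48 in the
     * case study); only a few examples are listed here. */
    static const char *flags[] = {
        "-fomit-frame-pointer",
        "-fschedule-insns",
        "-fschedule-insns2",
        "-ftree-ter",
    };

    int main(void) {
        char cmd[512];
        for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
            /* Rebuild the synthetic program with exactly one
             * optimization flag enabled. */
            snprintf(cmd, sizeof cmd,
                     "gcc -g %s -o testcase_%zu testcase.c",
                     flags[i], i);
            if (system(cmd) != 0)
                fprintf(stderr, "build failed for %s\n", flags[i]);
            /* Each resulting binary is then given to the BM tool
             * and compared against the non-optimized baseline. */
        }
        return 0;
    }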

To analyze the impact of optimizations, we focus on test cases that were previously injected in a correct way. Test cases are compiled with optimizations enabled, and are provided in input to the BM tool. The outcome is compared to the non-optimized test case, to check whether the tool still injects correctly even when the binary code has been optimized. Compiler optimizations often merge the binary instructions of a group of source code statements: thus, debugging information now provides the binary-source mapping for the whole group of source code statements, instead of the mapping for individual statements. Therefore, we apply the BM tool on the whole group (both in the optimized and non-optimized versions), to check whether the number of injected faults is the same and, if so, whether all the execution results match.

Fig. 10 shows, for each optimization option, the percentage of test cases that are still correctly injected under that option, where 100% means that the optimization had no impact on accuracy. We found that almost all optimizations have a negligible impact on accuracy, as the percentage of correct injections is about 97% for most of them (for readability, we do not show all of these high-accuracy optimizations in Fig. 10). However, three optimizations decrease accuracy, namely omit-frame-pointer, schedule-insns, and schedule-insns2: they respectively change the convention for calling and returning from functions (by not using the frame pointer register), and reorder instructions (when they have no dependencies) to avoid stalls in pipelined CPUs. The first optimization hampers the identification of functions within the binary code (by changing the beginning and the ending of the binary translation of each function), and thus totally misleads the BM tool. With the other two optimizations, the compiler varies the order of binary instructions, leading to unexpected code patterns that were not considered during the development of the BM tool, and are thus not covered by the implementation of the fault operators. Overall, the results of this impact analysis are encouraging for the adoption of binary mutation: (i) most of the optimizations do not interfere with the code patterns that are targeted by fault injection; and (ii) only a few optimizations have to be dealt with, by extending the implementation of fault operators to cover their specific code patterns.

5 RELATED WORK

The injection of software faults has been extensively used for the evaluation of dependable systems. It is worth mentioning the use of fault injection to assess the likelihood of high-severity failure modes, such as: (i) data losses in a crash-tolerant file cache [8]; (ii) fail-stop violations in a transactional DBMS [7]; (iii) error propagation across components in a microkernel OS [9]. Another example is dependability benchmarking (e.g., of OTS OSes, web servers, DBMSs) [10], which requires accurate injections to allow a fair comparison.

Several binary-level fault injection techniques for OTS software have been proposed. Early SWIFI tools, such as FERRARI [46], NFTAPE [47], and Xception [39], adopted debugging support to introduce faults at run-time, by (i) inserting a breakpoint that triggers a handler when a target instruction is executed, and (ii) corrupting memory and CPU registers in the handler, following a simple bit-flip fault model.

To better emulate software faults, researchers investigated more complex fault models. The FINE tool [48] proposed simple binary code mutations, by replacing machine instructions with nops and changing their destination operands. Subsequent studies, including Ng and Chen's [8], G-SWFIT [14], and LFI [49], refined the code patterns of binary mutations to reproduce more complex faults, as discussed in § 2.

More recent studies leveraged virtualization and compiler technology to mutate binary code in a portable and easy-to-use way. XEMU [50] introduces binary mutations in the QEMU emulator, by extending its dynamic binary code translator. The program running in QEMU is first translated into an intermediate representation (IR), i.e., platform-independent instructions, and then translated again into native host instructions, using Just-In-Time compilation to keep the translation overhead low. In the middle of this process, XEMU injects mutations into selected IR code patterns (e.g., loads/stores, comparisons, assignments). In a similar way, ESI [51], LLFI [52], and EDFI [53] insert fault injection code at compile-time, by leveraging the LLVM compiler framework [54] to instrument the IR. If the source code of the target is available, these approaches can preserve the mapping between source code and IR code, and can thus achieve a high degree of accuracy. However, when only the binary code is available (as for OTS software), this solution cannot preserve the mapping with the source code, and the IR code has to be reversed from binary code [55], thus suffering from inaccuracies.

Several studies experimentally investigated the problem of the accuracy of binary-level fault injection, both for software faults [13], [14], [15] and hardware faults [16], [17]. Madeira et al. [13] analyzed how software faults translate into binary code, and how to emulate them through SWIFI with bit-flipping. They found that code patterns are in most cases too complex for SWIFI techniques due to the semantic gap, and that inaccurate injections significantly impact fault injection results.

Fig. 10. Impact of compiler optimizations (percentage of correct injections per optimization option; the other optimizations with high accuracy are omitted for readability). (a) Basic (-O1). (b) Speed (-O2). (c) Speed-over-size (-O3).

Other empirical studies analyzed the impact of the gap between the IR and machine code on SWIFI [17] and on fault activation [56]. However, these studies did not focus on the systematic testing of BM tools as in the present paper.

6 CONCLUSION

This paper presented the Faultprog approach for testing the accuracy of binary mutation tools, and a case study on a commercial tool. In this section, we discuss the limitations of the study and the lessons learned.

6.1 Limitations

To implement the Faultprog approach, we focused on the fault model of G-SWFIT, and made some design trade-offs for the Faultprog parameters (TABLE 3). The fault model and parameters restrict the synthetic programs that can be generated by our current implementation of Faultprog. Here, we discuss these restrictions and suggest opportunities for extensions.

We focused the selection of basic operand types on local and global variables, constants, and function calls. Basic operand types could be extended to include pointer variables, in order to test pointer-related fault types. In this paper, we did not consider pointer variables, since this kind of operand is a known limitation of the BM tool that was identified in our previous study [15], and since G-SWFIT does not include specific fault types for pointers. Function calls could be extended to interact with the environment and to have side effects that cannot be reverted, such as I/O interactions with the user. In this case, when comparing the executions of the binary-mutated and source-mutated programs, we should also compare the interactions with the environment (e.g., by tracing messages shown to the user) and use deterministic execution techniques [40] to make the traces comparable.

We limited the "context" parameter by not mixing control-flow and loop constructs in the same test, and by only inserting the target statement in the body of control-flow constructs, but not in the controlling expression. These choices helped to avoid a significant increase in the number of test cases. Moreover, using the target statement as the controlling expression would often lead to syntactically-invalid programs (e.g., an if cannot be within a controlling expression; injecting an MVIV could lead to an empty controlling condition). However, for specific fault types, Faultprog generates, through the "fault constraints" parameter, target statements within a controlling expression. For example, in order to test the fault constraint C06 for "missing assignment" faults, we negate the constraint and generate programs where the target statement (an assignment) is the controlling expression, as in the sketch below.
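A generated program that negates C06 could look like the following sketch (with hypothetical variable names, not taken from the actual generator output):

    #include <stdio.h>

    int main(void) {
        int a = 0, n = 0;

        /* Negating constraint C06: the target assignment is the
         * controlling expression, so a "missing assignment"
         * injection here would empty the condition and must be
         * rejected by an accurate BM tool. */
        while ((a = n < 3)) {
            n++;
        }
        printf("%d %d\n", a, n);
        return 0;
    }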

6.2 Lessons learned and generalization

The case study pointed out some challenging aspects of binary mutation. Beyond the specific technical problems of a real BM tool, this experience points out the importance of systematically testing BM tools. In fact, BM tools implement sophisticated fault models, such as G-SWFIT, and face an elaborate "input domain", i.e., binary programs generated by modern compilers. Thus, it is no longer sufficient to test BM tools using a few small programs [14], [57]. Instead, we need large test suites driven by coverage criteria and oracles, in order to identify specific areas of the input domain that need to be improved. In our case study, many inaccuracies were related to specific "fault constraints", that is, conditions that a program location has to satisfy to be eligible for fault injection. To test fault constraints, we generated synthetic programs that violated them, and uncovered problematic corner cases such as blocks with only one statement or with more statements than allowed.

Moreover, the most relevant issues are a consequence of the more general problem of the "semantic gap" between binary and source code: for example, the heuristics for identifying statement boundaries in binary code were based on simplified code patterns. The influence of the semantic gap depends on factors such as the source-level programming language, the reference hardware architecture,


and the fault model; however, similar issues will indeed affect other BM tools. This experience shows that the semantic gap is a challenge for injection accuracy, and that it requires robust techniques for binary analysis (e.g., using more complete code patterns to scan binary code, and looking at data dependencies).

Finally, we found that many compiler optimizations have a negligible influence on BM accuracy. For instance, changing the layout of data structures to increase the cache hit ratio will likely not affect the BM tool, since the code patterns that are typically targeted by fault models (such as assignments, control-flow constructs, and loops) still hold in the presence of optimizations. This result implies that BM is relatively insensitive to optimizations, which encourages its adoption for OTS software. However, our experiments also show that a few optimizations can impact BM accuracy, such as those influencing function calls and instruction scheduling. Therefore, we suggest that BM tool developers test each individual optimization, and that the BM tool check for the presence of problematic optimizations in a binary program (e.g., using reverse engineering techniques [58]) and explicitly address these optimizations (e.g., by supporting alternative code patterns).

The case study aimed to show the feasibility and the usefulness of the Faultprog approach in a specific context, focusing on a specific architecture (PowerPC) and compiler toolchain (GCC). Thus, care must be taken in generalizing the findings to other scenarios. However, the problems due to the semantic gap are not specific to this case study. Moreover, the case study targeted a hardware architecture that is widespread in embedded systems and representative of RISC architectures (such as ARM), and a popular, industrial-strength compiler toolchain, based on compiler techniques also adopted by other modern compilers [43]. Thus, similar issues and findings are likely to apply to other configurations. The possible differences with respect to other compiler versions or families are: (i) the use of different binary code patterns to translate the programming constructs (such as loops, assignments, etc.) in the fault model, and (ii) a different set of optimizations. In these cases, developers should design the BM tool to address the code patterns and optimizations of the target compiler that intersect with the fault model, and mutate them accordingly.

ACKNOWLEDGMENTS

This work has been supported by the CECRIS EU FP7project (grant agreement no. 324334).

REFERENCES

[1] E. J. Weyuker, "Testing component-based software: A cautionary tale," IEEE Software, vol. 15, no. 5, 1998.
[2] J. Voas, "COTS software: the economical choice?" IEEE Software, vol. 15, no. 2, 1998.
[3] P. Koopman and J. DeVale, "The exception handling effectiveness of POSIX operating systems," IEEE Trans. on Soft. Eng., vol. 26, no. 9, 2000.
[4] J. Arlat, M. Aguera, L. Amat, Y. Crouzet, J. Fabre, J. Laprie, E. Martins, and D. Powell, "Fault Injection for Dependability Validation: A Methodology and Some Applications," IEEE Trans. on Soft. Eng., vol. 16, no. 2, 1990.
[5] J. Voas, "Certifying off-the-shelf software components," IEEE Computer, vol. 31, no. 6, 1998.
[6] R. Natella, D. Cotroneo, and H. Madeira, "Assessing Dependability with Software Fault Injection: A Survey," ACM Computing Surveys, vol. 48, no. 3, 2016.
[7] S. Chandra and P. M. Chen, "How fail-stop are faulty programs?" in Proc. FTCS, 1998.
[8] W. T. Ng and P. M. Chen, "The Design and Verification of the Rio File Cache," IEEE Trans. on Comp., vol. 50, no. 4, 2001.
[9] J. Arlat, J. Fabre, M. Rodríguez, and F. Salles, "Dependability of COTS Microkernel-Based Systems," IEEE Trans. on Comp., vol. 51, no. 2, 2002.
[10] K. Kanoun and L. Spainhower, Dependability Benchmarking for Computer Systems. Wiley-IEEE Computer Society, 2008.
[11] Y.-S. Ma, J. Offutt, and Y. R. Kwon, "MuJava: an automated class mutation system," Software Testing, Verification and Reliability, vol. 15, no. 2, 2005.
[12] D. Schuler and A. Zeller, "Javalanche: Efficient mutation testing for Java," in Proc. ESEC/FSE, 2009.
[13] H. Madeira, D. Costa, and M. Vieira, "On the Emulation of Software Faults by Software Fault Injection," in Proc. DSN, 2000.
[14] J. Durães and H. Madeira, "Emulation of Software Faults: A Field Data Study and a Practical Approach," IEEE Trans. on Soft. Eng., vol. 32, no. 11, 2006.
[15] D. Cotroneo, A. Lanzaro, R. Natella, and R. Barbosa, "Experimental analysis of binary-level software fault injection in complex software," in Proc. EDCC, 2012.
[16] H. Cho, S. Mirkhani, C.-Y. Cher, J. A. Abraham, and S. Mitra, "Quantitative evaluation of soft error injection techniques for robust system design," in Proc. DAC, 2013.
[17] J. Wei, A. Thomas, G. Li, and K. Pattabiraman, "Quantifying the accuracy of high-level fault injection techniques for hardware faults," in Proc. DSN, 2014.
[18] M. Sullivan and R. Chillarege, "Software Defects and their Impact on System Availability: A Study of Field Failures in Operating Systems," in Proc. FTCS, 1991.
[19] J. Christmansson and R. Chillarege, "Generation of an Error Set that Emulates Software Faults based on Field Data," in Proc. FTCS, 1996.
[20] D. Cotroneo and R. Natella, "Fault Injection for Software Certification," IEEE Security & Privacy, vol. 11, no. 4, 2013.
[21] R. Chillarege, I. Bhandari, J. Chaar, M. Halliday, D. Moebus, B. Ray, and M. Wong, "Orthogonal Defect Classification–A Concept for In-Process Measurements," IEEE Trans. on Soft. Eng., vol. 18, no. 11, 1992.
[22] J. Durães and H. Madeira, "G-SWFIT fault emulation operators," http://wpage.unina.it/roberto.natella/misc/fault-emulation-annex-e0849s.pdf.
[23] R. Natella, D. Cotroneo, J. A. Duraes, and H. S. Madeira, "On fault representativeness of software fault injection," IEEE Trans. on Soft. Eng., vol. 39, no. 1, 2013.
[24] T. Yoshikawa, K. Shimura, and T. Ozawa, "Random program generator for Java JIT compiler test system," in Proc. QSIC, 2003.
[25] X. Yang, Y. Chen, E. Eide, and J. Regehr, "Finding and understanding bugs in C compilers," in ACM SIGPLAN Notices, vol. 46, no. 6, 2011.
[26] C. Holler, K. Herzig, and A. Zeller, "Fuzzing with Code Fragments," in USENIX Security Symp., 2012.
[27] I. Hussain, C. Csallner, M. Grechanik, Q. Xie, S. Park, K. Taneja, and M. Hossain, "RUGRAT: Evaluating program analysis and testing tools and compilers with large generated random benchmark applications," Software: Pract. and Exp., 2014.
[28] W. M. McKeeman, "Differential testing for software," Digital Technical Journal, vol. 10, no. 1, 1998.
[29] D. Slutz, "Massive stochastic testing of SQL," in Proc. VLDB, vol. 98, 1998, pp. 618–622.


[30] E. G. Sirer and B. N. Bershad, "Using production grammars in software testing," in ACM SIGPLAN Notices, vol. 35, no. 1, 1999, pp. 1–13.
[31] L. Martignoni, R. Paleari, G. Fresi Roglia, and D. Bruschi, "Testing system virtual machines," in Proc. ISSTA, 2010, pp. 171–182.
[32] Bryan Turner, "The Random C Program Generator," https://sites.google.com/site/brturn2/randomcprogramgenerator.
[33] GCC Online Documentation, "Options for Debugging Your Program or GCC," http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html.
[34] J. Voas, "PIE: A dynamic failure-based technique," IEEE Trans. on Soft. Eng., vol. 18, no. 8, 1992.
[35] Y. Jia and M. Harman, "An Analysis and Survey of the Development of Mutation Testing," IEEE Trans. on Soft. Eng., vol. 37, no. 5, 2011.
[36] L. Madeyski, W. Orzeszyna, R. Torkar, and M. Józala, "Overcoming the equivalent mutant problem: A systematic literature review and a comparative experiment of second order mutation," IEEE Trans. on Soft. Eng., vol. 40, no. 1, pp. 23–42, 2014.
[37] M. Papadakis, Y. Jia, M. Harman, and Y. Le Traon, "Trivial Compiler Equivalence: A Large Scale Empirical Study of a Simple, Fast and Effective Equivalent Mutant Detection Technique," in Proc. ICSE, 2015, pp. 936–946.
[38] R. Gopinath, A. Alipour, I. Ahmed, C. Jensen, and A. Groce, "How hard does mutation analysis have to be anyway?" in Proc. ISSRE, 2015.
[39] J. Carreira, H. Madeira, and J. Silva, "Xception: A technique for the experimental evaluation of dependability in modern computers," IEEE Trans. on Soft. Eng., vol. 24, no. 2, 1998.
[40] A. Lanzaro, R. Natella, S. Winter, D. Cotroneo, and N. Suri, "An empirical study of injected versus actual interface errors," in Proc. ISSTA, 2014.
[41] M. Cinque, D. Cotroneo, R. Della Corte, and A. Pecchia, "Assessing direct monitoring techniques to analyze failures of critical industrial systems," in Proc. ISSRE, 2014.
[42] S. Winter, O. Schwahn, R. Natella, N. Suri, and D. Cotroneo, "No PAIN, No Gain? The utility of PArallel fault INjections," in Proc. ICSE, 2015.
[43] A. Aho, M. Lam, R. Sethi, and J. Ullman, Compilers: Principles, Techniques, and Tools. Addison-Wesley, 2006.
[44] A. K. Iannillo, D. Cotroneo, and R. Natella, "A Fault Injection Tool For Java Software Applications," Master's thesis, Federico II University of Naples, Italy, 2013.
[45] GCC Online Documentation, "Options That Control Optimization," https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.
[46] G. Kanawati, N. Kanawati, and J. Abraham, "FERRARI: A Flexible Software-Based Fault and Error Injection System," IEEE Trans. on Comp., vol. 44, no. 2, 1995.
[47] D. Stott, B. Floering, Z. Kalbarczyk, and R. Iyer, "A Framework for Assessing Dependability in Distributed Systems with Lightweight Fault Injectors," in Proc. ICPDS, 2000.
[48] W.-I. Kao, R. Iyer, and D. Tang, "FINE: A Fault Injection and Monitoring Environment for Tracing the UNIX System Behavior under Faults," IEEE Trans. on Soft. Eng., vol. 19, no. 11, 1993.
[49] P. Marinescu and G. Candea, "Efficient testing of recovery code using fault injection," ACM Trans. on Comp. Sys., vol. 29, no. 4, 2011.
[50] M. Becker, D. Baldin, C. Kuznik, M. M. Joy, T. Xie, and W. Mueller, "XEMU: an efficient QEMU based binary mutation testing framework for embedded software," in Proc. EMSOFT, 2012.
[51] U. Schiffel, A. Schmitt, M. Susskraut, and C. Fetzer, "Slice your bug: Debugging error detection mechanisms using error injection slicing," in Proc. EDCC, 2010.
[52] A. Thomas and K. Pattabiraman, "LLFI: An intermediate code level fault injector for soft computing applications," in Proc. SELSE, 2013.
[53] C. Giuffrida, A. Kuijsten, and A. S. Tanenbaum, "EDFI: A Dependable Fault Injection Tool for Dependability Benchmarking Experiments," in Proc. PRDC, 2013.
[54] C. Lattner and V. Adve, "LLVM: A compilation framework for lifelong program analysis & transformation," in Proc. CGO, 2004.
[55] C. Lattner, "Intro to the LLVM MC Project," http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html.
[56] E. van der Kouwe, C. Giuffrida, and A. S. Tanenbaum, "Evaluating Distortion in Fault Injection Experiments," in Proc. HASE, 2014.
[57] A. Jin and J. Jiang, "Fault Injection Scheme for Embedded Systems at Machine Code Level and Verification," in Proc. PRDC, 2009.
[58] A. Slowinska, T. Stancescu, and H. Bos, "Howard: A Dynamic Excavator for Reverse Engineering Data Structures," in Proc. NDSS, 2011.

Domenico Cotroneo (Ph.D.) is associate professor at the Federico II University of Naples. His main interests include software fault injection, dependability assessment, and field-based measurement techniques. He has been member of the steering committee and general chair of the IEEE Intl. Symp. on Software Reliability Engineering (ISSRE), PC co-chair of the 46th Annual IEEE/IFIP Intl. Conf. on Dependable Systems and Networks (DSN), and PC member for several other scientific conferences on dependable computing, including SRDS, EDCC, PRDC, LADC, and SafeComp.

Anna Lanzaro (Ph.D.) is a postdoctoral researcher at CINI and at the Federico II University of Naples, Italy. Her research interests are in the dependability assessment of safety-critical systems, hardware and software fault injection, and the dependability of multicore and virtualization technologies. She was recipient of the Best Presentation award at the 9th European Dependable Computing Conference (EDCC 2012) for a previous version of this paper.

Roberto Natella (Ph.D.) is a postdoctoral researcher at the Federico II University of Naples, Italy, and co-founder of the Critiware s.r.l. spin-off company. His research interests include dependability benchmarking, software fault injection, and software aging and rejuvenation, and their application in operating systems and virtualization technologies. He has been involved in projects with Finmeccanica, CRITICAL Software, and Huawei Technologies. He contributed, as author and reviewer, to several journals and conferences on dependable computing and software engineering, and he has been in the steering committee of the workshop on software certification (WoSoCer) held with recent editions of the ISSRE conference.