
Research

Evaluating regression test suites based on their fault exposure capability

Sebastian G. Elbaum1,* and John C. Munson2

1 Computer Science and Engineering Department, University of Nebraska–Lincoln, Lincoln NE 68588, U.S.A.
2 Computer Science Department, University of Idaho, Moscow ID 83843, U.S.A.

SUMMARY

The test process for evolving software systems takes on a different measurement aspect than that of new systems. Existing systems are generally being modified on a continuing basis as a normal part of the software maintenance activity. This process of product modification is fault prone because faults are introduced in the code as it is being modified. From a statistical perspective, regression testing should be focused on those areas that are most likely to contain the introduced faults. Based on that premise, we have developed an evolutionary fault index that works as a fault surrogate, varying in the same manner as faults. However, knowledge as to the location of probable faults is not enough to assess the capabilities of a regression test suite. It is necessary to understand what the software is doing under each test. For that purpose, test execution profiles are gathered. Test execution profiles and the evolutionary fault indexes are combined in one methodology to provide an assessment of the overall regression testing activity and the suitability of each individual test. The methodology is illustrated with data from a 300 KLOC embedded system and its corresponding regression test suite. Copyright © 2000 John Wiley & Sons, Ltd.

J. Softw. Maint: Res. Pract., 2000; 12(3):000–000

No. of Figures: 2. No. of Tables: 5. No. of references: 20.

KEY WORDS: software measurement; software evolution; software builds; software profiles; fault index; fault exposure

*Correspondence to: Dr. Sebastian G. Elbaum, Department of Computer Science and Engineering, University of Nebraska–Lincoln, Lincoln NE 68588–0115, U.S.A. Email: [email protected]

Contract/grant sponsor: Partial support provided by Storage Technology Corporation, Louisville CO, U.S.A.

1. INTRODUCTION

Exhaustively testing a large software system is virtually impossible. Even trivially small systems, on the order of 20 or 30 modules, often have far too many possible execution paths for complete deterministic testing. This being the case, we must revisit what we hope to accomplish by testing the system. Given unlimited time and resources, identification and removal of all faults might be a noble goal, but real-world constraints make this realistically unattainable. We are left with the need to provide an adequate level of reliability, given that we cannot find and remove all of the faults. A test organization develops regression test suites to provide confidence that changed portions of code (where faults might have been introduced) have not diminished product reliability, since a regression test verifies that modifications to the code have not caused unintended effects [1]. The process of test selection and generation is extremely important at this stage because it can save resources by choosing the tests that might exercise the changed sections of code. Different regression test selection and generation techniques have been presented (e.g., [2–4]) and evaluated (e.g., [5–7]) in the literature.

Our approach is neither a test selection technique nor a test generator tool. We are presenting a test assessment methodology based on the ultimate testing criterion: software faults. The evaluation of regression test suites is based on the level of fault exposure they provide instead of traditional coverage measures of control flow, commonly occurring errors, or functional properties. By using static software measurement, we seek to identify the modules that are most likely to contain faults. Then, through the use of dynamic software measurement, we determine the sections of code that are being exercised by each test. Joining the static and the dynamic measures allows the assessment of the regression testing activity in terms of the exposure to potential faults provided by each test. The target regression testing scenario includes large systems, with over 100K lines of code and over 1000 modules, where builds are generated daily and where the scalability limits of other assessment techniques have been surpassed.

2. STATIC MEASUREMENT

2.1. Developing a fault index

A fault index is an indicator of fault likelihood that serves to identify the regions of code that are most likely to contain faults. The main objective of a fault index is to facilitate understanding of how complexity is distributed across large systems. By understanding how the complexity is distributed, fault distribution can also be predicted. The prediction of fault locations helps to prioritize resources and to identify where spending them could yield the best return.

There are many ways to construct a fault index. Using lines of code (LOC) might constitute an initial attempt. In this view, a module with more LOC, i.e., a larger module, is more likely to have faults. In practice, however, a smaller module with a complex control flow structure might be more likely to have faults. It is clear that there are many contributing factors to the introduction of software faults into a program. Measuring just LOC would be naive. The creation of a fault index is a multivariate problem. We need to include multiple metrics to account for the different factors that relate to the introduction of faults.

Figure 1 about here, please
Figure 1. Principal components analysis (PCA) can reveal the commonalities among metrics.

For very large systems with a large number of metrics, the fault index construction becomes difficult. To simplify the structure of the metrics, we use a statistical technique called principal components analysis (PCA) to reduce the dimensionality of the metrics problem, as suggested in Figure 1. By using this technique, a large number of metrics may be mapped onto a very small number of orthogonal metric domains [8]. This permits a significant reduction in the number of metrics that we will have to manage. The new domains produced by the principal components analysis are fewer than the original complexity metrics and they also have the property that they are uncorrelated. The principal components analysis produces a transformation matrix that condenses the original r metrics onto a new, smaller number, s, of principal component metrics that are the extracted uncorrelated domains, D. The transformation matrix, T, is an r×s matrix where r > s.

In addition to this transformation matrix, we can obtain from the principal components analysis a set of eigenvalues, one for each of the s principal component metric domains. Each eigenvalue represents the relative contribution of its domain to the overall variance in the complete metric set. Table I presents this in a generic form, with the contribution of the software modules also shown, where the principal component domains are ordered from left to right by their eigenvalues. It is an essential characteristic of principal components analysis that the new metric domains are extracted from the original metrics from the largest to the smallest eigenvalue. Thus, the first extracted metric domain will account for more of the variation in the original metric set than will subsequent extracted domains.

Table I about here, please
Table I. Contribution of software modules to eigenvalues.
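To make the domain extraction step concrete, the following is a minimal sketch of how the eigenvalues and factor scores of Table I could be obtained with an off-the-shelf PCA implementation. The input file name, the 90% variance cutoff, and the use of scikit-learn are illustrative assumptions, not part of the original study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# One row per module, one column per raw complexity metric (hypothetical file).
metrics = np.loadtxt("module_metrics.csv", delimiter=",")

z = StandardScaler().fit_transform(metrics)   # standardize each raw metric
pca = PCA(n_components=0.90).fit(z)           # keep domains explaining ~90% of the variance

eigenvalues = pca.explained_variance_         # lambda_1 .. lambda_s, one per domain
factor_scores = pca.transform(z)              # d_gq: score of module g on domain q
s = pca.n_components_                         # s orthogonal domains, with s << r
```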

In order to construct the fault index, a linear function, F, is needed that is related in some manner, either directly or inversely, to the existence of software faults. This can be expressed generically as F(x) = ax + b, where x is some unitary measure of program complexity, a is a coefficient, and b is the offset. The more closely related x is to the existence of software faults, the more valuable the function, F, will be in the anticipation of software faults.

Previous research has established that it is possible to construct a fault index that has properties that are useful in this regard [9,10]. That fault index, ρ, may be represented for module g as

\rho_g = \sum_{q=1}^{s} \lambda_q \, d_{gq} \qquad (1)

where λq is the eigenvalue associated with the qth principal component and dgq is the qth factor score of the gth program module (see Table I).

Each of the eigenvalues represents the relative contribution of its associated principal component to the total variance explained by all of the domains. In essence, then, the fault index is a weighted sum of the orthogonal factor score metrics. In this context, the fault index represents each complexity metric in proportion to the amount of unique variation contributed by that complexity metric.

The fault index already has a good deal of empirical support as a successful surrogate measure of software faults [9,11–13]. Since the fault index is a composite measure based on original measurements, it incorporates information represented by statements, cycles, operands, function calls, and other software attributes reported in the literature to be related to software faults.

It is possible and meaningful to compute a fault index, R, for the total software system. It is simply the sum of the fault indices of its modules,

R = \sum_{g=1}^{N} \rho_g \qquad (2)

where g is a system module and N is the number of modules in the system.
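A minimal sketch of equations (1) and (2), assuming the eigenvalues and factor_scores arrays from the PCA sketch above; the scaling conventions of the original fault index are not reproduced here.

```python
import numpy as np

def module_fault_index(eigenvalues, factor_scores):
    # rho_g = sum_q lambda_q * d_gq, one value per module (equation 1)
    return factor_scores @ eigenvalues

rho = module_fault_index(eigenvalues, factor_scores)   # fault index per module
R = rho.sum()                                          # system fault index (equation 2)
```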


The principle behind the fault index is that it serves as a fault surrogate. That is, it will vary in precisely the same manner as software faults. The fault potential fg of a particular module g is directly proportional to its value of the fault surrogate. Thus,

f_g \propto \rho_g . \qquad (3)

Our ability to locate the remaining faults in a system will relate directly to our exposure to these faults. If, for example, at the vth build of a system there are remaining faults in module g, we cannot expect to identify any of these faults unless some test activity is allocated to exercising module g.

2.2. An evolutionary fault index

As the software progresses through a number of sequential builds, faults will be identified and code will be changed in an attempt to eliminate those faults. However, the introduction of new code is a process at least as fault prone as the initial code generation. Faults will be introduced during this evolutionary process. Although code does not always change in response to faults, other types of changes (adaptations, new features, enhancements, etc.) suffer from the same affliction: the introduction of faults.

The typical software system is in a constant state of evolution. As the system grows and modifications are made, the code is recompiled and a new version, or build, is created. Each build is constructed from a set of software modules. The new version may contain some of the same modules as the previous version, some entirely new modules, and it may omit some modules that were present in an earlier version. Of the modules that are common to both the old and new versions, some may have undergone modification since the last build. When evaluating the change that occurs in the system between any two builds (software evolution), we are interested in three sets of modules. The first set, M-common, is the set of modules present in both builds of the system; these modules may have changed since the earlier version but were not removed. The second set, M-old, is the set of modules that were in the previous build and were removed prior to the later build. The final set, M-new, is the set of modules that have been added to the system since the previous build.
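As a small illustration of the three sets, consider two hypothetical builds whose module lists overlap only partially; the module names are invented for the example.

```python
build_v  = {"mod21", "mod22", "mod24", "mod57"}   # modules in build v
build_v1 = {"mod21", "mod22", "mod57", "mod78"}   # modules in build v+1

m_common = build_v & build_v1    # in both builds (possibly modified): {"mod21", "mod22", "mod57"}
m_old    = build_v - build_v1    # removed before build v+1:           {"mod24"}
m_new    = build_v1 - build_v    # added in build v+1:                 {"mod78"}
```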

The fault index of the whole system, Rv, at build v relative to a later build v+1 is given by

R^{v} = \sum_{c \in M_{common}} \rho_c^{v} + \sum_{i \in M_{old}} \rho_i^{v} \qquad (4)

where ρc is the fault index for module c at build v and i is a module that was present in the previous build but removed before build v+1. Similarly, the fault index of the system, Rv+1, at build v+1 relative to the previous build v is given by

R^{v+1} = \sum_{c \in M_{common}} \rho_c^{v+1} + \sum_{w \in M_{new}} \rho_w^{v+1} \qquad (5)

where w is a module added in build v+1. If Rv+1 > Rv, then the relative fault burden of the system will have increased from build v to build v+1.

The change in the fault index for a single module between two builds may be measured in one of two distinct ways. First, we may simply compute the difference in the module fault index between build v and build v+1. We call this value the fault delta, δ, for the module g, or

\delta_g^{v,v+1} = \rho_g^{v+1} - \rho_g^{v} .

However, from the standpoint of fault introduction, removing code is probably as catastrophic as adding code [14]. The absolute value of the fault delta is therefore a more appropriate measure for an evolving system. The new measure, the Evolutionary Fault Index (EFI) for module g, is simply

\mathrm{EFI}_g^{v,v+1} = \left| \rho_g^{v+1} - \rho_g^{v} \right| . \qquad (6)

The EFI of the whole system over the same builds is

\mathrm{EFI}^{v,v+1} = \sum_{g} \mathrm{EFI}_g^{v,v+1} . \qquad (7)
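The following sketch computes the fault delta and the EFI of equations (6) and (7) from two per-build fault index maps. Treating a module that is absent from one build as having a fault index of zero in that build is our assumption for handling added and removed modules, not a rule stated in the paper; the example values are hypothetical.

```python
def efi_per_module(rho_v, rho_v1):
    # EFI_g = |rho_g^{v+1} - rho_g^{v}| (equation 6); modules absent from a
    # build are assumed to have a zero fault index in that build.
    modules = rho_v.keys() | rho_v1.keys()
    return {g: abs(rho_v1.get(g, 0.0) - rho_v.get(g, 0.0)) for g in modules}

def system_efi(rho_v, rho_v1):
    # EFI of the whole system over the same pair of builds (equation 7).
    return sum(efi_per_module(rho_v, rho_v1).values())

# Hypothetical fault indexes for two builds.
rho_b2 = {"mod21": 44.47, "mod57": 46.74}
rho_b3 = {"mod21": 46.48, "mod57": 46.74}
print(efi_per_module(rho_b2, rho_b3))   # roughly {'mod21': 2.01, 'mod57': 0.0}
```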

2.3. Static measures of QTB

For purposes of demonstration, an embedded real-time system, QTB, has been evaluated. QTB is a system developed in an industrial environment to perform the tasks of booting up, monitoring, and failure recovery for a hardware system. The system has approximately 300K LOC in more than 3700 modules (functions) programmed in C. Table II shows the fault indexes and EFI for a small subset of the QTB modules during five evolution cycles (six builds), identified as B–2 through B–6. The selected subset of modules is meant to show different aspects of an evolving system.

In Table II, column one contains the module identifier. Column two contains the fault index values for each of the modules as of build one (B–1). Module 24 was dropped starting with build B–4. Modules 36 and 78 do not have a fault index for B–1: these modules belong to the set M-new in builds B–6 and B–4, respectively; they were not part of the system in the original build but were added subsequently. Columns three to seven contain the EFI between two builds. For example, module 21 has an EFI of 2.01 at build B–3. This EFI value was computed by comparing build B–2 with build B–3. Observe that most of the EFI values are zero for the chosen subset of modules. This was typical for this system: on average, only 7% of the system modules were modified on a build.

Table II about here, please
Table II. FI and EFI for QTB.

3. DYNAMIC MEASUREMENT

3.1. Test execution profile

Knowing the location of probable faults in a system is not by itself sufficient to assess the fault exposure capability of a test. To complete the fault exposure capability concept, it is necessary to quantify the test activity. In particular, it is necessary to recognize the distribution of the testing activity across the system. With that objective in mind, a software system may be viewed as a set of program modules that are executing a set of mutually exclusive functions. If the system executes a functionality expressed by a subset of modules that are fault free, it will never fail from faults within those modules. If, on the other hand, the system is executing a functionality expressed in a subset of fault-laden modules, there is a very high probability that the system will fail. Thus, failure probability is dependent upon the input data sets [15], which drive the system into regions of code (i.e., functionalities) of differing complexities (i.e., fault proneness).

As each test is run to completion, it generates a test execution profile, which represents the results of the execution of one or more modules [16]. For a given system, S, there is a call tree that shows the transition of program control from one program module to another. At any particular transition, the software is found to be executing exactly one of its N modules. In other words, there is an execution profile represented by the probabilities p1, p2, p3, ..., pN [17]. Our best estimates for the individual profile values are derived from the individual module frequency values for each test k, as follows:

p_g^k = \frac{b_g^k}{B_k} \qquad (8)

where b_g^k is the frequency of execution of module g during test k, and B_k = \sum_{g=1}^{N} b_g^k is the total frequency count of all module executions during test k [18]. The execution profiles generated from each test may be characterized by the profile p^k = \{\, p_g^k \mid 1 \le g \le N \,\} for the kth test, where N is the cardinality of the complete set of program modules.
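A sketch of equation (8): converting the raw module execution counts recorded for one test into an execution profile. The module names and counts shown are invented for illustration.

```python
def execution_profile(counts):
    # p_g^k = b_g^k / B_k, where B_k is the total number of module executions
    # observed while running test k (equation 8).
    B_k = sum(counts.values())
    return {g: b / B_k for g, b in counts.items()}

profile_j = execution_profile({"mod817": 473, "mod819": 929, "mod823": 923})
```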

3.2. Dynamic measurement of QTB

In Table III, the execution profiles obtained from five tests of QTB are shown. The selected modules constitute a complete system functionality. The remaining columns of Table III show the profiles for these modules under each test. The probability of executing a particular module changes from test to test. Some modules, like 817, are executed under all tests. On the contrary, 830 is only executed under test J, and with a very low probability. Some tests execute a high percentage of the modules; others, like test DB, execute only 30% of them. We will show how these probabilities can be used to estimate the fault exposure capability of a given test.

Table III about here, please
Table III. Test execution profiles for build B–2 of the QTB.

4. FAULT EXPOSURE CAPABILITY

4.1. Test assessment

The objective of the fault exposure capability, FEC, is to assess the test activity in terms of the exposure received by the fault-laden sections of code. The most obvious application of the information provided by the fault exposure capability is the assessment of the testing activity. Test assessment can be used to direct further testing activity such as test augmentation or test selection. In addition, the fault exposure capability can assist risk analysis activities by providing levels of confidence in the regression testing performance.


In order to generate an FEC value, we hope to identify which modules are most likely to contain the most faults and, based on execution profiles of the system, how these potential faults can impact software reliability. The idea is that a fault that never executes does not cause a failure. However, a fault that lies along the path of normal execution will cause frequent failures. If a given module has a large fault potential but limited exposure (a small profile value), then the fault exposure of that module is also small. Our objective during the test phase is to maximize our exposure to the faults in the system.

4.2. FEC definition

Each program module has an evolutionary fault index value. This EFI is a fault surrogate; that is, the larger the value of the index, the greater the fault potential in that module [11]. In addition, as a test runs, a particular distinct execution profile emerges. For each test, some modules have a high probability of being executed, while others have a low probability. If a given module has a large fault potential but limited exposure (a small profile value), then its faults are less likely to be exposed [19].

The FEC of a regression test k, then, is given by the following formula:

\mathrm{FEC}_k = \sum_{g=1}^{N^*} p_g^k \, \mathrm{EFI}_g^{v,v+1} \qquad (9)

where N* represents the cardinality of the set { M-common ∪ M-new }, the union of the two sets described earlier. In this case, FECk is simply the expected value of the EFI under the profile pk.

It is worth noting that a regression test that has a high FEC on one build may have a low fault exposure on a subsequent build because of the evolving nature of the software system. The comparison of the FEC of one test across different builds can be accomplished by normalizing the previous formula by the EFI for the whole build. Under this new definition, FECk is the expected value of the EFI (for the N* modules in build v+1) under the profile generated by test k, normalized by the total EFI in the build. This can be expressed as

\mathrm{FEC}_k = \frac{\sum_{g=1}^{N^*} p_g^k \, \mathrm{EFI}_g^{v,v+1}}{\sum_{g=1}^{N^*} \mathrm{EFI}_g^{v,v+1}} . \qquad (10)
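A sketch of equations (9) and (10), assuming `efi` maps the modules of build v+1 (M-common plus M-new) to their EFI values and `profile` maps modules to a test's execution probabilities; modules missing from a profile are taken to have probability zero. The data structures are our assumptions for the example.

```python
def fec(efi, profile):
    # Expected value of the EFI under the test's execution profile (equation 9).
    return sum(profile.get(g, 0.0) * e for g, e in efi.items())

def normalized_fec(efi, profile):
    # Equation (10): normalizing by the total EFI of the build makes the FEC of
    # one test comparable across builds.
    total_efi = sum(efi.values())
    return fec(efi, profile) / total_efi if total_efi else 0.0
```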

Figure 2 presents the generation process of the fault exposure capability methodology. Through the static measures, we determine the most likely location of faults between builds v and v+1. Through the dynamic measures, the activity of each test k is quantified. Executing modules that are likely to contain faults causes the FEC to increase; executing modules that are unlikely to contain faults causes it to decrease.

Figure 2 about here, please
Figure 2. Computation of the fault exposure capability (FEC).

The concept of fault exposure capability permits the numerical evaluation of a test against the actual changes that have been made to the software system. It is simply the normalized expected value of the EFI from one release to another under a particular test. If the value of FECk is large for a given test, then the test will have exercised the changed modules. If the FEC values for a given release are low, then it is reasonable to suppose that the changed modules have not been tested in proportion to the number of probable faults that were introduced during the maintenance changes.

4.3. Assumptions

The next two paragraphs describe the two basic assumptions underlying the work reported here. The first assumption is associated with the construction of the fault surrogate, and the second focuses on the testing process.

FEC is the expected value of a fault surrogate. As such, the strength of the FEC is limited by the precision of the fault likelihood indicator: a poorly constructed fault surrogate could completely mislead the test assessment. Therefore, the key to producing a valuable FEC is the choice of fault surrogate. For this study, we have used the EFI, validated in previous studies on the same target system and environment [12–14]. The profiles presented in this work were obtained from the same system on successive builds. Since there were no changes in the maintenance process that could affect the system, we have assumed that the EFI's effectiveness as a fault surrogate remained stable.

We have also assumed that the testing process follows a basic set of rules regarding the regression test scheme that should be in place in order to make the most of this technique. First, each test should be run only once; running a high fault-exposure test repeatedly would only inflate our perception of the test suite's capability. Second, the tests should be intrinsically different from each other. This is harder to enforce than the first rule, but it is just as important. One feasible approach is through the analysis of the generated profiles: for example, a guideline might be that all tests should generate different profiles and that a test's profile should not be a linear combination of those of other tests (one such check is sketched below). The third rule states that common sense should prevail. There are other reasons for testing than finding faults; high on this list are non-functional requirements, such as performance, that must be met by the software. Statistical testing techniques are not meant to supplant these facets of the test process.
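One way to operationalize the second rule, sketched here as an assumption rather than a procedure from the paper, is to flag a candidate test whose profile is numerically a linear combination of the profiles already in the suite, using a matrix-rank check.

```python
import numpy as np

def adds_new_direction(existing_profiles, candidate, tol=1e-8):
    # existing_profiles: k x N array, one row per accepted test profile;
    # candidate: length-N profile of the proposed test. The candidate is kept
    # only if it raises the rank, i.e., it is not a linear combination of the
    # existing profiles.
    base = np.linalg.matrix_rank(existing_profiles, tol=tol)
    stacked = np.vstack([existing_profiles, candidate])
    return np.linalg.matrix_rank(stacked, tol=tol) > base
```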

4.4. Fault exposure capability of QTB

In this section, we present the complete set of data and report on the testing evaluation performed as described in Section 4.2. The only adjustment made to present the data in the following tables is the conversion of the probabilities to a logarithmic scale. Table IV shows the FEC of five tests for a system task in build B–2. The upper bound in fault exposure would be reached if a test could exclusively execute module 831. This upper bound is generally constrained by the design of the system, but the idea is still to maximize exposure as much as possible. Test J is the best based on the fault exposure criterion, mainly because it is the only one that covers module 831, the module that experienced the major change activity. The tests AU, BH and BM have a similar capability, but test DB is less than adequate because it does not execute the modules that have the highest EFI in this task. Observe that only 47% of the modules were executed during testing. This would not necessarily be dangerous except for the fact that module 827 experienced some modifications but received no exposure to testing, which risks a system failure if a fault is present in that module and the module is executed.

Table IV about here, please
Table IV. Fault exposure capability for build B–2 of the QTB.

Table V summarizes the FEC of the same five tests for all of the QTB modules involved in three different builds of the product. Since each build presented a different set of changed, new or removed modules, the fault exposure capability of a test varied across the different releases. Hence, a correct regression test suite evaluation must be done in relation to each particular build. A test that was considered effective for one build might not be effective at all for another build if the modules that changed were completely different.

Table V about here, please
Table V. Fault exposure capability under different builds of the QTB.

It is clear from the data in Table V that although test J presents a high FEC under build B–2, it does not do nearly as well in the other two builds. On the other hand, test AU works better for B–3 but not so well for B–2. Test BM proves to be the most variable in its fault exposure capability, being the worst for B–3 and the best for B–4. Overall, the test suite's FEC data vary greatly among builds. This reflected a testing scheme in which the tests were run routinely on each build, without enough focus on, or opportunity for, developing additional tests that would exercise the modules into which faults might have been introduced.

5. FUTURE WORK

Recent studies [20] point to the need to investigate the application of measures of fault proneness as a possible path for improvement in test selection and prioritization techniques. Although our work has focused on test assessment, the transition to test selection and prioritization seems natural. The minimum requirements for these techniques are an objective function and the computation and testing procedures to maximize that objective function. Our criterion measure, FEC, could serve as, or be a component of, the objective function. However, we still need to investigate how to organize the test activity based on FEC.

Further work will also be necessary on the definition of FEC. Note that modules from M-old do not appear in the FEC definition: the modules from M-old do not exist in the newer build and consequently do not have a profile value. However, modules that were closely associated with a removed module could have a fault (e.g., a call to the removed module that was not changed appropriately) that bypasses the EFI. Although we have not quantified the magnitude of this issue, a possible solution is to compute FEC for higher-level abstractions such as functionalities. Although functional profiles have already been defined and used [20], a redefinition of the EFI would be necessary to include the complexity of the removed modules. Another possible improvement could come from the addition of cohesion (strength) and coupling metrics to the EFI to reflect the allocation of functions and the linkage among modules.

6. SUMMARY

Software testing of large systems is not an intuitive process. On the contrary, it requires quantification mechanisms to validate it. The measurement strategy that we have implemented can assist the testing staff in the hard task of determining the quality of each test suite in the face of evolving software. There are two distinct aspects of this assessment strategy. First, there are the static measures of fault exposure: in this regard, we have developed a measure of the overall fault burden of a program module, the fault index, and extended this concept to evolving systems through the EFI. Second, we have developed measures of test profiles to show the participation of individual program modules in tests.

Intuitively, and empirically, a program that spends a high proportion of its time executing a module set with a high EFI will be more failure prone than one executing a module set with a low EFI. The fault exposure capability, FEC, reflects exactly that: FEC provides a measure of the regression testing activity based on the exposure received by the fault-prone sections of the system.

ACKNOWLEDGMENTS

This work was supported in part by the Storage Technology Corporation of Louisville, Colorado. The authors acknowledge the helpful comments offered by three of the Journal's anonymous referees.

REFERENCES

1. IEEE. ANSI/IEEE Standard 610.12–1990, Glossary of Software Engineering Terminology 1990. IEEE: New York NY; XXX.

2. Korel B, Al-Yami A. Automated regression test generation. In Proceedings of the International Symposium on Software Testing and Analysis 1998. ACM Press: New York NY; 143–151.

3. Ferguson R, Korel B. The chaining approach for software test data generation. ACM Transactions on Software Engineering and Methodology 1996; 5(1):63–86.

4. Rothermel G, Harrold MJ. A safe, efficient regression test selection technique. ACM Transactions on Software Engineering and Methodology 1997; 6(2):173–210.

5. Graves TL, Harrold MJ, Kim JM, Porter A, Rothermel G. An empirical study of regression test selection techniques. In Proceedings of the International Conference on Software Engineering 1998. ACM Press: New York NY; 188–197.

6. Rothermel G, Harrold MJ. Empirical studies of a safe regression test selection technique. IEEE Transactions on Software Engineering 1998; 24(6):401–419.

7. Rosenblum DS, Weyuker EJ. Lessons learned from a regression testing case study. Empirical Software Engineering Journal 1997; 2(2):188–191.

8. Munson JC, Khoshgoftaar TM. The relative software complexity metric: a validation study. In Proceedings of the Software Engineering Conference 1990. Cambridge University Press: Cambridge, UK; 89–102.

9. Munson JC, Khoshgoftaar TM. The detection of fault-prone programs. IEEE Transactions on Software Engineering 1992; 18(5):423–433.

10. Nikora AP, Munson JC. Determining fault insertion rates for evolving software systems. In Proceedings International Symposium on Software Reliability Engineering 1998. IEEE Computer Society Press: Los Alamitos CA; 306–315.

11. Nikora AP. Software System Defect Content Prediction from Development Process and Product Characteristics, Doctoral Dissertation 1998. Department of Computer Science, University of Southern California: Los Angeles CA.

12. Elbaum SG, Munson JC. Code churn: a measure for estimating the impact of code change. In Proceedings International Conference on Software Maintenance 1998. IEEE Computer Society Press: Los Alamitos CA; 24–31.

13. Elbaum SG, Munson JC. Software evolution and the code fault introduction process. Empirical Software Engineering 1999; 4(3):241–262.

14. Elbaum SG, Munson JC. Getting a handle on the fault injection process: validation of measurement tools. In Proceedings International Software Metrics Symposium 1998. IEEE Computer Society Press: Los Alamitos CA; 133–141.

15. Li N, Malaiya YK. On input profile selection for software testing. In Proceedings International Symposium on Software Reliability Engineering 1994. IEEE Computer Society Press: Los Alamitos CA; 196–205.

16. Munson JC, Elbaum SG, Karcich RM, Wilcox JP. Software risk assessment through software measurement and modeling. In Proceedings Aerospace Conference 1998. IEEE Computer Society Press: Los Alamitos CA; 137–147.

17. Munson JC. A functional approach to software reliability modeling. In Proceedings Conference on Mathematical and Scientific Computing: Quality of Numerical Software 1997. Chapman & Hall: London, UK; 61–76.

18. Munson JC, Elbaum SG. Software reliability as a function of user execution patterns. In Proceedings 32nd Annual Hawaii International Conference on System Sciences 1999. IEEE Computer Society Press: Los Alamitos CA; 281.

19. Hamlet D, Voas J. Faults on its sleeve: amplifying software reliability testing. In Proceedings of the International Symposium on Software Testing and Analysis 1992. ACM Press: New York NY; 89–98.

20. Rothermel G, Untch RH, Chu C, Harrold MJ. Test case prioritization: an empirical study. In Proceedings International Conference on Software Maintenance 1999. IEEE Computer Society Press: Los Alamitos CA; 179–188.

AUTHORS' BIOGRAPHIES

Sebastian G. Elbaum is an Assistant Professor at the University of Nebraska–Lincoln, where he holds a J. D. Edwards Professorship in Software Engineering. His research interests are software measurement, software testing, software reliability, and intrusion detection. He received his Ph.D. in Computer Science from the University of Idaho. He obtained his MS in Science, with a software engineering orientation, at the University of Idaho in 1997. He has a BS in Systems Engineering from Universidad Catolica de Cordoba, Argentina. Email: [email protected]

John C. Munson is one of the founders of Cylant Technologies and a Professor of Computer Science at the University of Idaho in Moscow, Idaho. He has been closely associated with the IEEE software reliability, metrics and maintenance communities. He currently is funded for research efforts at Storage Technology Corporation in Colorado, the U.S. Department of Defense (DOD), and the Jet Propulsion Laboratory. He received his Ph.D. in Statistics from the University of New Mexico. Email: [email protected]


Table I. Contribution of software modules to eigenvalues.

Modules       Domains
              D1     D2     …     Dq     …     Ds

Mod1          d11    d12    …     d1q    …     d1s
Mod2          d21    d22    …     d2q    …     d2s
…             …      …      …     …      …     …
Modg          dg1    dg2    …     dgq    …     dgs
…             …      …      …     …      …     …
ModN          dN1    dN2    …     dNq    …     dNs

Eigenvalues   λ1     λ2     …     λq     …     λs


Table II. FI and EFI for QTB.

Modules   Fault Index (B–1)   Evolutionary Fault Index (EFI)
                              B–2      B–3      B–4      B–5      B–6

21 44.47 0 2.01 0 0 0

22 49.12 0 0 0 0 0

23 56.82 0 0 0 0 0

24 57.30 0 0 - - -

25 53.98 0 0 0 0 3.13

26 36.52 0 0 0 0 0

31 51.90 0 0 0 0 0

36 - - - - - 57.89

57 46.74 1.20 0 2.66 0 0

60 58.28 0 0 0 2.07 0

63 50.43 0 4.14 0 0 0

77 52.55 0 0 0.62 0 0

78 - - - 51.24 0.13 0

114 49.94 0 0 0 0.02 0.02

115 46.00 0 0 1.75 0.04 0.76


Table III. Test execution profiles for build B–2 of the QTB.

Modules   Tests
          J          AU         BH         BM         DB

815 0.0088 0.0064 0.0113 0 0

816 0 0.0008 0 0 0

817 0.0473 0.0149 0.0205 0.0658 0.0074

818 0.0008 0 0 0.0219 0

819 0.0929 0.0363 0.0635 0.0439 0.0074

820 0 0 0 0 0

821 0 0.0004 0.0007 0 0

822 0.0111 0.0185 0.0176 0 0

823 0.0923 0.0278 0.0649 0.0439 0

824 0 0.0020 0 0 0

825 0 0 0 0 0

826 0.0473 0.0149 0.0205 0.0658 0.0074

827 0 0 0 0 0

828 0.0115 0.0181 0.0166 0.0439 0.0149

829 0.0104 0.0120 0.0124 0.0219 0.0019

830 0.0003 0 0 0 0

831 0.0003 0 0 0 0


Table IV. Fault exposure capability for build B–2 of the QTB.

Modules   EFI      Fault exposure capability under tests             Exposure
                   J         AU        BH        BM        DB

815 0.34 2.38 2.34 2.42 0 0 YES

816 0 0 0 0 0 0

817 1.71 13.12 12.27 12.50 13.37 11.74 YES

818 0.41 2.41 0 0 2.99 0 YES

819 0 0 0 0 0 0

820 0 0 0 0 0 0

821 0 0 0 0 0 0

822 0 0 0 0 0 0

823 0.04 0.35 0.33 0.35 0.34 0 YES

824 0 0 0 0 0 0

825 0 0 0 0 0 0

826 1.43 10.94 10.23 10.42 11.15 9.78 YES

827 3.91 0 0 0 0 0 YES

828 2.40 19.31 19.78 19.69 18.30 17.18 YES

829 0 0 0 0 0 0

830 0 0 0 0 0 0

831 58.10 316.14 0 0 0 0 YES

TOTAL 68.33 364.65 44.94 45.38 46.14 38.72 47%


Table V. Fault exposure capability under different builds of the QTB.

Builds   Fault exposure capability under tests                 TOTAL
         J         AU        BH        BM        DB

B–2 364.65 44.94 45.38 46.14 38.72 539.83

B–3 52.11 66.87 4.22 2.55 18.78 144.53

B–4 65.43 102.56 95.08 153.15 47.90 464.12