
Automated Object-Oriented Software Testing using Genetic Algorithms and Static-Analysis

Lucas Serpa Silva
Software Engineering
Swiss Federal Institute of Technology

A thesis submitted for the degree of
MSc Computer Science

Supervised by
Yi Wei
Prof. Bertrand Meyer

March 2010


Abstract

It is estimated that 80% of software development cost is spent on detecting and fixing defects. To tackle this issue, a number of tools and testing techniques have been developed to improve the testing infrastructure. Although techniques such as static analysis, random testing and evolutionary testing have been used to automate the testing process, it is not clear which approach is best. Previous research on evolutionary testing has mainly focused on procedural programming languages with simple test data inputs such as numbers. In this work, we present an evolutionary object-oriented testing approach that combines a genetic algorithm with static analysis to increase the number of faults found within a given time frame. A total of 4,379 hours of experiments were executed to compare the effectiveness of the implemented evolutionary testing approach against a random testing strategy and a precondition-satisfaction strategy. The results show that a genetic algorithm combined with static analysis can increase the number of faults found by 34% compared to the second-best performing strategy. They also show that, compared to the other two strategies, the chance of a test case generated by our evolutionary testing approach finding a fault is 320% higher.


Acknowledgements

I would like to thank Yi Wei for his constructive comments throughout this work. I would also like to thank Anna-Franziska Rudschies for the time she spent proofreading this thesis. Finally, I want to thank my girlfriend Sonja Meier for her unconditional love and support.


Contents

List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Past research
1.3 Project goals

2 Background
2.1 Testing
2.1.1 Black box
2.1.2 White box
2.1.3 Automated testing
2.1.4 Test evaluation
2.1.4.1 State Coverage
2.2 Eiffel & Design by Contract
2.3 AutoTest
2.3.1 Original random strategy (OR)
2.3.2 Precondition satisfaction strategy (PS)
2.3.3 Faults
2.4 Genetic Algorithm
2.4.1 Chromosome
2.4.2 Mutation
2.4.3 Crossover
2.4.4 Objective and fitness value
2.4.5 Selecting individuals for reproduction
2.4.6 GA Variations

3 Evolutionary testing
3.1 Implementation
3.1.1 Evolution of a strategy
3.1.1.1 Objective Function
3.1.2 Genetic Algorithm
3.1.2.1 Strategy specification
3.1.2.2 Initialization
3.1.2.3 Evaluation
3.1.2.4 Mutation and crossover
3.2 Evolutionary AutoTest
3.2.1 Instantiation of objects
3.2.2 Methods and parameter selection
3.2.3 Object pool creation

4 Experiments
4.1 Introduction
4.2 Evaluation
4.3 Executions
4.4 Faults found in class group
4.5 Faults per class analysis
4.6 Strategy analysis
4.6.1 Primitive values

5 Discussion
5.1 Conclusion
5.2 Considerations
5.3 Further improvement
5.4 Project information

A Primitive Values
B Chromosome files
C Test groups
D List of classes tested

Bibliography


List of Figures

2.1 Program flow
2.2 Example of Design by Contract
2.3 AutoTest algorithm 1
2.4 AutoTest algorithm 2
2.5 Genetic Algorithm flow diagram
2.6 Examples of mutation algorithms
2.7 One and two points crossover
2.8 Order crossover examples
3.1 Four basic components of the system
3.2 Parallel population evaluation
3.3 Corrupted chromosome caused by crossover
3.4 Valid chromosome crossover
3.5 Evolutionary AutoTest 1 - Pseudocode describing the selection of creation procedure
3.6 Evolutionary AutoTest 2 - Pseudocode describing the creation of extended objects
3.7 Evolutionary AutoTest 3 - Pseudocode describing how the features to be tested are selected
3.8 Evolutionary AutoTest 4 - Pseudocode describing the creation of the object pool
4.1 Percentage of test cases that find a fault
4.2 Number of faults found for all class groups by each strategy
4.3 Comparison of the number of faults found for each class tested
4.4 Comparison of the number of faults found only by one method
4.5 Number of faults that were only found by a single strategy
4.6 SQRT of the number of times each fault was found
4.7 Charts showing the hard-to-find faults for each strategy
4.8 Number of faults found for all class groups by each strategy
4.9 Number of faults found for all class groups by each strategy
4.10 Number of faults found per number of times the feature was tested
4.11 Distribution of the values for Real 32 and Character (16, 32)
4.12 Number of faults found for all class groups by each strategy


List of Tables

1.1 Previous work
2.1 Testing tools
3.1 Chromosome specification
3.2 Strategy files
4.1 Genetic operators and parameters used by the evolutionary algorithm
4.2 Statistics from the test executions of the three strategies
4.3 Average of the SQRT of the frequency the faults were found
A.1 AutoTest primitive values
B.1 Chromosome files
C.1 List of all test groups part 1
C.2 List of all test groups part 2
C.3 List of all test groups part 3
D.1 List of all classes tested part 1 of 3
D.2 List of all classes tested part 2 of 3
D.3 List of all classes tested part 3 of 3

1 Introduction

1.1 Motivation

In the past 50 years the growing influence of software in all areas of industry has led to an ever-increasing demand for complex and reliable software. According to a study(3) conducted by the National Institute of Standards and Technology, approximately 80% of the development cost is spent on identifying and correcting defects. The same study found that software bugs cost the United States economy around $59.5 billion a year, with one third of this value attributed to the poor software testing infrastructure. In an effort to improve the existing testing infrastructure, a number of tools have been developed to automate test execution, such as JUnit(1) and GoboTest(4). However, the automation of test data generation is still an open research topic. Recently, a number of methods such as metaheuristic search, random test generation and static analysis have been used to completely automate the testing process, but the application of these tools to real software is still limited. Random test case generation has been used by a number of tools (Jartege(34), AutoTest(33), DART(32)) that automate the generation of test cases, but several studies found genetic algorithms (evolutionary testing) to be more efficient and to outperform random testing for structural testing(9; 13; 16; 18; 26).

1.2 Past research

The study of genetic algorithms as a technique for automating the process of test case generation is often referred to in the literature as evolutionary testing. Since the early 1990s, a number of studies have been conducted on evolutionary testing, varying in complexity and applicability. In order to classify the relevance of past research for this project, a number of studies have been classified according to the complexity of the test cases being generated and the optimization parameter used by the genetic algorithm. The complexity of the test cases being generated is important because generating test cases for structured programs that only take simple input, such as numbers, is simpler than generating test cases for object-oriented programs, which is one of the goals of this project.

Reference | Year | Language type | Optimization parameter
(5) Xanthakis, S. | 1992 | Procedural (C) | Branch coverage
(6) Schultz, A. | 1993 | Procedural (Vehicle Simulator) | Functional
(7) Hunt, J. | 1995 | Procedural (POP11) | Functional (Seeded errors)
(8) Roper, M. | 1995 | Procedural (C) | Branch coverage
(9) Watkins, A. | 1995 | Procedural (TRITYP simulator) | Path coverage
(10) Alander, J. | 1996 | Procedural (Strings) | Time
(18) Harman, M. | 1996 | Procedural (Integers) | Branch coverage
(14) Jones, B. | 1998 | Procedural (Integers) | Branch coverage
(11) Tracey, N. | 1998 | Complex (Ada) | Functional (specification)
(12) Borgelt, K. | 1998 | Procedural (TRITYP simulator) | Path coverage
(13) Pargas, R. | 1999 | Procedural (TRITYP simulator) | Branch coverage
(15) Lin | 2001 | Procedural (TRITYP simulator) | Path coverage
(16) Michael, C. | 2001 | Procedural (GADGET) | Branch coverage
(17) Wegener, J. | 2001 | Procedural | Branch coverage
(19) Díaz, E. | 2003 | Procedural | Branch coverage
(20) Berndt, D. | 2003 | Procedural (TRITYP simulator) | Functional
(9) Watkins, A. | 2004 | Procedural | Functional (Seeded error)
(24) Tonella, P. | 2004 | Object-oriented (Java) | Branch coverage
(21) Berndt, D. J. | 2005 | Procedural (Robot simulator) | Functional (Seeded error)
(22) Alba, E. | 2005 | Procedural (C) | Condition coverage
(23) McMinn, P. | 2005 | Procedural (C) | Branch coverage
(27) Wappler, S. | 2005 | Object-oriented (Java) | Branch, condition coverage
(28) Wappler, S. | 2006 | Object-oriented (Java) | Exceptions / Branch coverage
(26) Harman, M. | 2007 | Procedural | Branch coverage
(25) Mairhofer, S. | 2008 | Object-oriented (Ruby) | Branch coverage

Table 1.1: Previous work.

As shown in Table 1.1, there have been only a few projects that generate test cases for object-oriented programs, and to the best of our knowledge there was only one project(11) that generated test cases for object-oriented programs and used the number of faults found as the optimization parameter for the genetic algorithm. In that study, test cases were generated for Ada programs, but a formal specification had to be written manually in a SPARK-Ada proof context. Thus, the testing process was not completely automated.

Table 1.1 also shows that branch coverage was the most common optimization parameter used to drive the evolution of test cases. However, there is little evidence of a correlation between branch coverage and the number of uncovered faults. Although code coverage is a useful test suite measurement, the number of faults a test suite unveils is a more important one. Past research has shown that evolutionary testing is a good approach to automate the generation of test cases for structured programs (9; 13; 16; 18; 26). To make this approach attractive to industry, however, the system must be able to automatically generate test cases for object-oriented programs and to use the number of faults found as the main optimization parameter. To the best of our knowledge, there is currently no existing project that fulfills these requirements.

One of the problems when evolving test cases for object-oriented programs is the initialization of an object into a specific state. The object may recursively require other objects as parameters, and the types must match. Tonella(24) solved this problem by defining a grammar for the chromosome and basing the mutation and crossover operations on it. Another problem when generating test cases for object-oriented programs is the lack of a software specification against which to check whether a test has passed. Wappler(28) used software exceptions as an indication of a fault, and Alander(10) used the execution time.

1.3 Project goals

The base hypothesis of this work is that an automated testing strategy which can adapt the test case generation to the classes under test will perform better than strategies which cannot. This hypothesis is based on three assumptions:

1. Each class has a different testing priority. That is, each class has a set of methods that should be tested more often than the others.

2. Each class has a different object type distribution. The testing strategy ought to generate more strings for classes that work with strings than objects of other types.

3. Each class has an optimal set of primitive values, and this set is not necessarily the same for other classes.

In this project a representation of a testing strategy is encoded in a chromosome, and a genetic algorithm is then used to evolve a testing strategy that is adapted to the classes under test. The genetic algorithm evolves a testing strategy according to the number of unique faults it finds, the number of untested features and the number of unique states. This project innovates by using a combination of functional and structural metrics as the optimization parameter for the genetic algorithm and by applying static analysis to improve the performance of the genetic algorithm. This project is based on the AutoTest(2) tool and the Design by Contract methodology implemented by the Eiffel programming language(29).

2 Background

2.1 Testing

Testing is one of the most widely used software quality assessment methods. There are two important processes when testing object-oriented software. First, the software has to be initialized with a set of values. These values are used to set a number of variables that are relevant for the test case; the values of these variables define a single state from the set of possible states the software can be in. The values can either be primitive values such as integers or complex values such as objects. With the software initialized, its methods under test can then be tested by calling them. If a method takes one or more objects as parameters, these objects also have to be initialized. To determine whether the test case passed or failed, a software specification has to be used. The software specification defines the output of the software and what constitutes a valid input. Since the number of possible states a piece of software may have grows exponentially, it is impossible to test all of them. In practice, interesting states are normally identified by the developers according to a software specification, the program structure or their own experience. There are many types of testing; however, they can all be classified as either black box or white box testing.

2.1.1 Black box

Black box testing, also called functional testing(30), considers the unit under test as a black box into which data is fed and whose output is verified according to a software specification. Functional testing has the advantage that it is decoupled from the source code: given the software specification, test data can be generated even before the function has been implemented. Functional testing is also closely related to the user requirements, since it tests a function of the program. Its main disadvantages are that it requires a software specification and that it may not explore the unit under test well, since it does not know the code structure.

2.1.2 White box

White box testing, also called structural testing, takes the internal structure of the code into account. By analyzing the structure of the code, different test data can be generated to explore specific areas. Structural testing may also be used to measure how much of the code has been covered according to some structural criterion. By analyzing the program flow and the path of an execution, code coverage can be computed for criteria such as statement coverage, which counts the number of unique statements executed.

2.1.3 Automated testing

To automate the testing process, both the generation of test data and the execution of test cases have to be automated. There are already a number of tools, such as JUnit(1) and GoboTest(4), that automate test case execution, but the main problem lies in the automation of test data generation. Apart from evolutionary testing, a number of tools automate the testing process using static analysis, software models or random testing; Table 2.1 lists the method used by each tool. Random testing is the most widely adopted method: tools such as AutoTest(33), DART(32) and Jartege(34) implement a random algorithm, even though many algorithms perform better than random search for optimization problems.

2.1.4 Test evaluation

The main goal of software testing is to uncover as many faults as possible and to show the robustness of the software. But to be able to rely on a test suite for quality assurance, this test suite needs to be good. Besides the number of faults found, one method used to evaluate a test suite is code coverage. Code coverage measures how much of the source code was executed by the test suite. Intuitively, a good test suite ought to have good code coverage.

Name | Method
(46) Agitator | Dynamic Analysis
(33) AutoTest | Random testing
(32) DART | Random testing
(47) DSD-Crasher | Static / Dynamic Analysis
(48) Eclat | Models
(44) FindBugs | Static Analysis
(34) Jartege | Random testing (JML)
(45) Java PathFinder | Model Checker
(50) JTest | Static Analysis
(49) Symstra | State Space Model

Table 2.1: Testing tools.

However, this depends on how the code coverage is measured; some of the main code coverage criteria are described below.

• Statement coverage measures the number of unique statements executed. Its main advantage is that it can be measured directly from object code, but it is too simplistic and usually not a good test evaluation criterion(51).

• Branch coverage measures the unique evaluations of the boolean expressions in conditional statements. It is simple to compute and stronger than statement coverage: full branch coverage implies full statement coverage. However, it is insensitive to complex boolean expressions and does not take the sequence of statements into account.

• Condition coverage measures the unique evaluations of each atomic boolean expression independently of the others. It provides a more sensitive analysis than branch coverage, as the sketch after this list illustrates. Full condition coverage does not imply full branch coverage, since branches might exist that are unreachable.

• Path coverage measures the unique paths a program execution can take. Achieving good path coverage requires thorough testing; it is very expensive and in many situations infeasible, since the number of paths is exponential in the number of branches and some paths are impossible to execute. For instance, in Figure 2.1 it is not possible to execute a program that has both S2 and S6 on its path.
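To make the difference between branch and condition coverage concrete, here is a small illustrative C++ example (the function and inputs are invented, not taken from the thesis): two inputs already cover both branches, but a third input is needed before every atomic condition has been observed as both true and false.

```cpp
#include <iostream>

// Hypothetical unit under test: one branch guarded by two atomic conditions.
bool accepts(int age, bool hasLicense) {
    if (age >= 18 && hasLicense)   // one branch, two atomic boolean expressions
        return true;
    return false;
}

int main() {
    // These two calls already give full *branch* coverage:
    // the if-branch is taken once and skipped once.
    std::cout << accepts(20, true)  << "\n";  // branch taken
    std::cout << accepts(16, true)  << "\n";  // branch skipped (age term false)

    // Full *condition* coverage additionally requires the second atomic term
    // (hasLicense) to be observed as false:
    std::cout << accepts(20, false) << "\n";  // branch skipped (license term false)
    return 0;
}
```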


Figure 2.1: Program flow - Example of program flow analysis

2.1.4.1 State Coverage

Another interesting approach is to use state coverage, where the state is defined as the values of the numerical and boolean attributes of a class at each execution point. At the end of the execution, the sum of the numbers of unique values observed at each execution point is taken as the state coverage. This approach was used in this project because it provides a more fine-grained measurement of how much of the software was exercised, and it may be easier to implement than other code coverage criteria.
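A minimal sketch of the bookkeeping this definition implies, assuming a simple map from execution points to sets of observed attribute values (the names and structure are illustrative, not AutoTest's actual instrumentation):

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical tracker: for every (execution point, attribute) pair it stores
// the distinct values observed during a test run.
class StateCoverage {
    std::map<std::pair<int, std::string>, std::set<int64_t>> observed_;
public:
    void record(int executionPoint, const std::string& attribute, int64_t value) {
        observed_[{executionPoint, attribute}].insert(value);
    }
    // State coverage = sum over all execution points of the number of unique values seen there.
    std::size_t coverage() const {
        std::size_t total = 0;
        for (const auto& entry : observed_) total += entry.second.size();
        return total;
    }
};

int main() {
    StateCoverage sc;
    sc.record(1, "count", 0);
    sc.record(1, "count", 5);     // new value at the same point: counted
    sc.record(1, "count", 5);     // duplicate value: ignored
    sc.record(2, "is_empty", 1);  // boolean attribute encoded as 0/1
    std::cout << "state coverage = " << sc.coverage() << "\n";  // prints 3
    return 0;
}
```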

2.2 Eiffel & Design by Contract

The lack of a software specification is one of the main problems when automating the testing process. Without a specification, it is impossible to be sure that a feature has failed. (Feature means either a procedure or a function; in this report, feature and method are used interchangeably.) Even when the test case leads the program to crash or throw an exception, it is not clear whether the software has a fault, since the program may not have been defined for the given input. Normally, developers will write a header comment for each method describing its behaviour. Although there are guidelines on how to write these headers, they are not formal enough to allow the derivation of the method's specification.

This problem has been addressed by the Eiffel programming language(29), which, among other methodologies, implements the Design by Contract(31) concept. The idea behind Design by Contract is that each method call is a contract between the caller (client) and the method (supplier). This contract is specified in terms of what the client must provide and what the supplier guarantees in return, and it is normally written in the form of precondition and postcondition boolean expressions for each method. In the example illustrated in Figure 2.2, the precondition is composed of four boolean expressions and the postcondition of two. These expressions are evaluated sequentially upon method invocation and termination, and the system throws an exception as soon as one of them evaluates to false. Therefore, the method caller must ensure that the precondition is true before calling the method, and the method must ensure that the postcondition is true before returning. For example, the borrow book method shown in Figure 2.2 takes the id of a borrower and the id of the book being borrowed.

Figure 2.2: Example of Design by Contract

The method caller must ensure that the book id is valid, that the book with that id has at least one copy available, that the borrower id is valid and that the borrower is allowed to borrow books. If these conditions are fulfilled, the method guarantees that it will add the book to the borrower's list of borrowed books and decrease the number of available copies by one. Apart from the pre- and postconditions, every class has an invariant that has to remain true after the execution of the constructor, and loops may have variants and invariants. With Design by Contract, a method has a fault if it:


1. violates another method’s precondition.

2. does not fulfil its own postcondition.

3. violates the class invariant.

4. violates loop variant or invariant.
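Figure 2.2 itself is an image, so the following C++ sketch only mimics the kind of contract it describes, emulating the require/ensure clauses with assertions. The class, the exact clauses and the borrowing limit are illustrative reconstructions, not the thesis's Eiffel code.

```cpp
#include <cassert>
#include <map>
#include <set>

// Illustrative analogue of the borrow_book contract sketched in Figure 2.2.
// In Eiffel the contract is part of the language; here it is emulated with assertions.
class Library {
    std::map<int, int> copies_;              // book id -> copies available
    std::map<int, std::set<int>> borrowed_;  // borrower id -> borrowed book ids
    std::set<int> borrowers_;                // registered borrower ids
public:
    void add_book(int bookId, int n) { copies_[bookId] = n; }
    void add_borrower(int borrowerId) { borrowers_.insert(borrowerId); }

    void borrow_book(int borrowerId, int bookId) {
        // Precondition (client's obligation): four clauses, as in the figure.
        assert(copies_.count(bookId) == 1);          // valid book id
        assert(copies_[bookId] >= 1);                // at least one copy available
        assert(borrowers_.count(borrowerId) == 1);   // valid borrower id
        assert(borrowed_[borrowerId].size() < 5);    // borrower may still borrow (limit assumed)

        const int old_copies = copies_[bookId];
        borrowed_[borrowerId].insert(bookId);
        copies_[bookId] -= 1;

        // Postcondition (supplier's guarantee): two clauses.
        assert(borrowed_[borrowerId].count(bookId) == 1);
        assert(copies_[bookId] == old_copies - 1);
    }
};

int main() {
    Library lib;
    lib.add_book(42, 3);
    lib.add_borrower(7);
    lib.borrow_book(7, 42);   // precondition holds, so no assertion fires
    return 0;
}
```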

For the automation of test case generation, Design by Contract can be used to determine whether the generated test data is defined for a given method, by checking it against the precondition. It can also be used to check whether a method has failed, by comparing the result against the postcondition. In the next section we discuss how this idea is implemented in the AutoTest tool(2).

2.3 AutoTest

AutoTest exploits the Design by Contract methodology implemented in Eiffel to automatically generate random test data for Eiffel classes. AutoTest works with a given timeout and a set of classes to be tested. It starts by loading the classes to be tested and creating a table containing all the methods of those classes, including the inherited ones. AutoTest can then apply an original random strategy (OR) or a precondition satisfaction strategy (PS); these strategies are described below.

2.3.1 Original random strategy (OR)

Figure 2.3: AutoTest algorithm 1 - Method invocation

As described in the pseudocode of Figure 2.3, OR randomly selects methods to test while the timeout has not expired. OR chooses the method to be tested (line 4) and the creation method (line 23) randomly. It uses a probability to determine whether a new object should be created or selected from the object pool (line 11). The object pool is the set of all objects created by AutoTest; the idea behind it is that reusing objects that might have been modified during a previous method call increases the chance of finding more faults. When creating an object, AutoTest uses different algorithms for extended and non-extended types. Extended types are the primitive types such as Integer, Boolean, Character and so on. For these types, AutoTest must provide an initial value, as shown in Figure 2.4. The initial values for the extended types are randomly selected from a fixed set of values chosen by the developers; these values are listed in Appendix A.1.

Figure 2.4: AutoTest algorithm 2 - Object creation

When instantiating objects that are not of an extended type, OR randomly selects one of their creation procedures and invokes it. After the timeout expires, AutoTest generates a report containing the number of test cases generated, the number of failures, the number of unique failures and the number of invalid test cases, and it reproduces the code that triggers the faults it found.
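A compact sketch of the control flow that Figures 2.3 and 2.4 express in pseudocode, with invented helper names and a made-up reuse probability (the real AutoTest is written in Eiffel):

```cpp
#include <chrono>
#include <cstdlib>
#include <vector>

// Hypothetical stand-ins for AutoTest internals.
struct Object {};
struct Method { int id; };

static std::vector<Object*> objectPool;                      // all objects created during the run
static std::vector<Method>  methodsUnderTest = {{0}, {1}, {2}};

static Object* createRandomObject() { return new Object(); } // random creation procedure / primitive value
static void invoke(const Method&, Object*) { /* call the method; contract violations raise exceptions */ }

// Simplified original random (OR) strategy loop.
void originalRandomStrategy(std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        // Randomly pick the method to test.
        const Method& m = methodsUnderTest[std::rand() % methodsUnderTest.size()];

        // With some probability reuse an object from the pool, otherwise create a new one.
        Object* target;
        if (!objectPool.empty() && (std::rand() % 100) < 75) {
            target = objectPool[std::rand() % objectPool.size()];
        } else {
            target = createRandomObject();
            objectPool.push_back(target);
        }
        invoke(m, target);
    }
}

int main() { originalRandomStrategy(std::chrono::milliseconds(10)); }
```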

2.3.2 Precondition satisfaction strategy (PS)

The idea behind the PS(40) strategy is to increase the likelihood of selecting a precondition-satisfying object from the object pool when testing a feature. It is an extension of the OR strategy that, apart from the object pool, keeps a predicate valuation pool (v-pool). The v-pool keeps track of which objects satisfy which precondition clauses. The PS strategy differs from the OR strategy by using a function to decide whether or not to turn the precondition satisfaction mechanism on. When it is on, PS selects precondition-satisfying objects from the object pool; after the execution, it computes which precondition predicates hold for the objects used in the execution and updates the v-pool.

2.3.3 Faults

Eiffel throws an exception whenever an expression in a contract is violated (precondition, postcondition, class invariant, loop invariant, loop variant). AutoTest examines this exception to find out whether it was triggered by an invalid test case or by an actual fault in the code. Invalid test cases are those that violate the precondition of the feature being tested or a class invariant. If the test case is valid, AutoTest checks whether the fault is unique by looking at the line of code where the exception happened and comparing it to all unique faults it has already found. Besides the faults triggered by the Design by Contract conditions, other exceptions, such as those triggered by calling methods on a void object or by a lack of memory, are also considered to come from valid test cases.

2.4 Genetic Algorithm

Genetic Algorithms (GAs) are search algorithms based on natural selection as described by Charles Darwin. They are used to find solutions to optimization and search problems. Genetic algorithms became popular when John Holland published “Adaptation in Natural and Artificial Systems”(36) in 1975 and De Jong completed an analysis of the behaviour of a class of genetic adaptive systems(35) in the same year. The basic idea of a GA is to encode the values of the parameters of an optimization problem in a chromosome, which is evaluated by an objective function. As shown in Figure 2.5, the algorithm starts by initializing or randomly generating a set of chromosomes (the population). At the end of each generation, each chromosome is evaluated and modified by a number of genetic operations in order to produce a new population. This process repeats until a predefined number of generations has been computed or until the objective value of the population has converged.
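As a concrete illustration of the loop in Figure 2.5, the following self-contained C++ toy example evolves a single-gene chromosome toward a fixed target. It is a generic sketch, not the GAlib-based implementation described in Chapter 3; all names, operators and parameters are invented.

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

// Toy chromosome: one integer gene. The thesis encodes an entire testing
// strategy instead and evaluates it by running AutoTest.
struct Chromosome { int gene = 0; double objective = 0.0; };

static double evaluate(const Chromosome& c) { return -std::abs(c.gene - 42); }  // best at 42
static Chromosome randomChromosome() { return {std::rand() % 100, 0.0}; }
static void mutate(Chromosome& c) { if (std::rand() % 100 < 20) c.gene += std::rand() % 11 - 5; }
static Chromosome crossover(const Chromosome& a, const Chromosome& b) {
    return {(a.gene + b.gene) / 2, 0.0};                       // simple averaging crossover
}
static const Chromosome& select(const std::vector<Chromosome>& pop) {
    const Chromosome& x = pop[std::rand() % pop.size()];       // tournament of two
    const Chromosome& y = pop[std::rand() % pop.size()];
    return x.objective > y.objective ? x : y;
}

int main() {
    std::vector<Chromosome> population(20);
    std::generate(population.begin(), population.end(), randomChromosome);

    for (int generation = 0; generation < 50; ++generation) {
        for (auto& c : population) c.objective = evaluate(c);  // evaluation
        std::vector<Chromosome> next;
        while (next.size() < population.size()) {              // reproduction
            Chromosome child = crossover(select(population), select(population));
            mutate(child);
            next.push_back(child);
        }
        population = std::move(next);                          // full replacement (simple GA)
    }
    for (auto& c : population) c.objective = evaluate(c);
    auto best = std::max_element(population.begin(), population.end(),
        [](const Chromosome& a, const Chromosome& b) { return a.objective < b.objective; });
    std::cout << "best gene = " << best->gene << "\n";
}
```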


Figure 2.5: Genetic Algorithm flow diagram

2.4.1 Chromosome

Each individual in the population is represented by a chromosome that stores the values of the optimization problem. The chromosome is normally encoded as a list of bits, but its encoding and structure can vary. Each gene of the chromosome can have a specific allele; an allele specifies the range or the set of possible values the gene can take. To evaluate each chromosome, an objective function must be defined. The objective function uses the values encoded in the chromosome to check how well it performs on the optimization problem. At the end of each generation, a number of genetic operations such as mutation and crossover are applied to each chromosome to produce the population for the next generation.

2.4.2 Mutation

When a chromosome is passed on, there is a probability that some of its genes will not be copied correctly and will undergo a small mutation. Mutation ensures that the solutions of the new generation are not identical to those of the previous one. The mutation probability controls how much of the chromosome mutates: a small probability leads to slower convergence, while a large probability leads to instability. The mutation operator can be defined in different ways; three basic mutation operators are described below and sketched in code after the list.


1. The flip mutator changes a single gene of the chromosome to a random value within the range specified by its allele.

2. The swap mutator randomly swaps a number of genes of the chromosome.

3. The Gaussian mutator picks a new value around the current value using a Gaussian distribution.
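A minimal sketch of these three operators on a chromosome encoded as a vector of doubles (the allele range, seed and step sizes are example values):

```cpp
#include <random>
#include <utility>
#include <vector>

using Chromosome = std::vector<double>;
static std::mt19937 rng{12345};

// Flip mutator: replace one gene with a random value inside its allele range.
void flipMutate(Chromosome& c, double lo, double hi) {
    std::uniform_int_distribution<std::size_t> pos(0, c.size() - 1);
    std::uniform_real_distribution<double> value(lo, hi);
    c[pos(rng)] = value(rng);
}

// Swap mutator: exchange the values of two randomly chosen genes.
void swapMutate(Chromosome& c) {
    std::uniform_int_distribution<std::size_t> pos(0, c.size() - 1);
    std::swap(c[pos(rng)], c[pos(rng)]);
}

// Gaussian mutator: perturb one gene around its current value.
void gaussianMutate(Chromosome& c, double stddev) {
    std::uniform_int_distribution<std::size_t> pos(0, c.size() - 1);
    std::normal_distribution<double> noise(0.0, stddev);
    c[pos(rng)] += noise(rng);
}

int main() {
    Chromosome c{1.0, 2.0, 3.0, 4.0};
    flipMutate(c, 0.0, 600.0);   // e.g. the CHARACTER 32 allele range used later in Table 3.1
    swapMutate(c);
    gaussianMutate(c, 0.5);
}
```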

The mutation operation is defined according to the structure of the chromosome. When the chromosome is stored in a tree, one possible mutation is to swap subtrees, as shown in Figure 2.6.

Figure 2.6: Examples of mutation algorithms

2.4.3 Crossover

Crossover is the process whereby two or more chromosomes are combined to form one or more new chromosomes. The idea behind crossover is that the offspring may be better than both parents. Crossover is normally performed between two individuals, but more can be used. There are many crossover algorithms; some of them are described below, and a code sketch follows the list.

1. Uniform crossover will randomly select the parent where each gene should come from.


2. Even-odd crossover selects the genes with even index from parent A and the genes with odd index from parent B.

3. One-point crossover randomly selects a position on the chromosome; all the genes to the left of it come from parent A and the genes to the right come from parent B.

4. Two-point crossover randomly selects two positions and takes from parent A the genes whose index lies between the two positions; the remaining genes come from parent B.

Figure 2.7: One and two points crossover

5. Partial match crossover produces two children, C1 and C2. It initializes C1 by copying the chromosome of parent A and C2 by copying the chromosome of parent B. It then randomly selects a number of positions and swaps the genes between C1 and C2 at those positions.

6. Order crossover produces two children, C1 and C2. It initializes them by copying the genes of the parents to the children and deleting n randomly selected genes from each offspring. It then selects an interval of size n and slides the genes so that the interval becomes empty, and finally fills that interval with the original genes taken from the opposite offspring. The algorithm is illustrated in Figure 2.8.

7. Cycle crossover produces two children, C1 and C2. It initializes C1 and C2 by copying the chromosomes of parents A and B respectively. It then selects n random positions and replaces the genes of C1 with the genes of parent B at those positions. The process is repeated for C2 with parent A.
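A sketch of the two simplest operators above, uniform and one-point crossover, for vector-encoded chromosomes of equal length (illustrative helper code, not GAlib's API):

```cpp
#include <random>
#include <vector>

using Chromosome = std::vector<double>;
static std::mt19937 rng{42};

// Uniform crossover: each gene is copied from a randomly chosen parent.
Chromosome uniformCrossover(const Chromosome& a, const Chromosome& b) {
    std::bernoulli_distribution fromA(0.5);
    Chromosome child(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) child[i] = fromA(rng) ? a[i] : b[i];
    return child;
}

// One-point crossover: genes left of a random cut come from A, the rest from B.
Chromosome onePointCrossover(const Chromosome& a, const Chromosome& b) {
    std::uniform_int_distribution<std::size_t> cutDist(0, a.size());
    const std::size_t cut = cutDist(rng);
    Chromosome child(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) child[i] = (i < cut) ? a[i] : b[i];
    return child;
}

int main() {
    Chromosome a{1, 1, 1, 1}, b{2, 2, 2, 2};
    Chromosome u = uniformCrossover(a, b);   // mixed genes from both parents
    Chromosome p = onePointCrossover(a, b);  // prefix from a, suffix from b
    (void)u; (void)p;
}
```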


Figure 2.8: Order crossover examples

2.4.4 Objective and fitness value

The objective value is the performance measurement of each chromosome. This value is used to guide the selection of chromosomes for crossover. It can be used directly to select good chromosomes, but it is normally scaled to produce a fitness value. The scaling function is one way to mitigate the elitism problem described in Section 2.4.5, where only a limited number of chromosomes is involved in producing the next generation. The fitness value is then used to compute the compatibility of each chromosome for crossover; compatibility is used to ensure that good individuals are not combined with bad ones. Many methods exist to compute the fitness value; the most common scaling methods are described below.

1. Linear scaling:

   fitness = α · objectiveValue + β   (2.1)

2. Power law scaling:

   fitness = objectiveValue^α   (2.2)

3. Sharing scaling computes the number of genes that two chromosomes have in common. Two individuals are considered unfit for mating when their difference is very low, meaning that they are too similar. The difference can be computed using bitwise operations(37), or another user-specified method if the chromosome is not encoded as a bit string.


2.4.5 Selecting individuals for reproduction

Elitism and diversity are two important factors when selecting individuals for reproduction. With elitism, selection is biased towards the individuals with the best objective values. Elitism is important since it removes bad solutions from the population and reproduces the good ones. However, by continuously reproducing from a small set of individuals, the population becomes very similar, which may lead to a sub-optimal solution. The diversity of the population therefore ought to be controlled to ensure that the search space is explored well. Many selection schemas have been developed to properly select the individuals for reproduction and to minimize the elitism problem. Some of the selection schemas are listed below; a roulette-wheel selection sketch in code follows the list.

1. Rank schema selects the best individuals of the population every time.

2. Roulette wheel selects individuals according to their fitness values relative to the rest of the population. The probability of individual i being picked is:

   p_i = fitness_i / Σ_j fitness_j   (2.3)

   where the sum runs over all individuals j in the population.

3. Tournament sampling uses the roulette wheel method to select two individuals and then picks the one with the higher fitness value.

4. Uniform sampling selects an individual randomly from the population.

5. Stochastic remainder sampling first computes the probability p_i of each individual being selected and its expected representation ε = p_i · len(population). The expected representation is used to create a new population of the same size. For example, if an individual has ε equal to 1.7, it fills one position in the new population and has a probability of 0.7 of filling another position. After the new population is created, the uniform method is used to select the individuals for mating.

6. Deterministic sampling computes ε for each individual as in stochastic remainder sampling. A new population is created and filled with all individuals with ε > 1, and the remaining positions are filled by sorting the original population by the fractional parts of ε and selecting the highest individuals on the list.
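A sketch of roulette wheel selection as defined by Equation 2.3; generic code, not the GAlib selector used by the implementation in Chapter 3:

```cpp
#include <numeric>
#include <random>
#include <vector>

static std::mt19937 rng{7};

// Roulette-wheel selection: an individual is picked with probability
// proportional to its fitness (Equation 2.3). Assumes non-negative fitness values.
std::size_t rouletteWheelSelect(const std::vector<double>& fitness) {
    const double total = std::accumulate(fitness.begin(), fitness.end(), 0.0);
    std::uniform_real_distribution<double> spin(0.0, total);
    const double threshold = spin(rng);
    double running = 0.0;
    for (std::size_t i = 0; i < fitness.size(); ++i) {
        running += fitness[i];
        if (running >= threshold) return i;
    }
    return fitness.size() - 1;   // numerical safety fallback
}

int main() {
    std::vector<double> fitness{1.0, 3.0, 6.0};   // individual 2 is picked about 60% of the time
    (void)rouletteWheelSelect(fitness);
}
```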


2.4.6 GA Variations

There are three common variations of the genetic algorithm. They differ in how the new population is computed at the end of each generation.

1. The simple genetic algorithm uses non-overlapping populations: at each generation the population is completely replaced.

2. The steady-state genetic algorithm uses overlapping populations, where a percentage of the population is replaced by new individuals.

3. The incremental genetic algorithm has only one or two children replacing members of the current population at the end of each generation.

Compared to other optimization algorithms, genetic algorithms are relatively simple and robust(37). In the past, they have been successfully used to automatically generate test data that optimizes code coverage, as described in Section 1.2. In this work, genetic algorithms are used to automatically generate a set of test cases and to optimize the number of faults found. One of the main reasons we believe a genetic algorithm is a good approach for automatically generating test data is that it can adapt to the code being tested. It is plausible to assume that developers acquire bad habits over time, which leads to a pattern of mistakes. Our assumption is that genetic algorithms may be able to detect some of these mistakes and tune the test data generation mechanism to exploit them.

3 Evolutionary testing

3.1 Implementation

The system is composed of two programs: a modified version of AutoTest and a genetic algorithm implemented in C++ using the GAlib(38) library. The extended version of AutoTest can use a testing strategy to guide the generation and execution of test cases. A genetic algorithm is used to evolve a testing strategy for a given set of classes, and the communication between AutoTest and the genetic algorithm is done through two files, as shown in Figure 3.1.

Figure 3.1: Four basic components of the system

The testing automation is divided into two processes: first a strategy is evolved for the classes under test, and then the classes are tested with this strategy. Although a single testing strategy can be used for multiple test runs, in this research all the strategies are evolved from scratch.


3.1.1 Evolution of a strategy

In this project a genetic algorithm is used to evolve a good testing strategy for a given set of classes. A strategy is composed of three sets of parameters that guide the generation and execution of test cases:

1. Primitive values: a set of n values for each of the five primitive types (Integer, Real, Character, Boolean and Natural), used when instantiating objects. In this project n is set to 20, but previous research(41) indicates that this value can be tuned according to the execution time: longer evolution and execution of a strategy performs better with larger values of n.

2. Method calls: which methods should be called and which parameters should be used for each method call.

3. Object pool: how many objects of each type the object pool should contain.

As described in Section 2.3, AutoTest has a fixed set of primitive values. Although AutoTest performs better when the probability of selecting a primitive value from this fixed set is higher than the probability of selecting a random value(2), there is no study showing that these primitive values are optimal for every group of classes. On the contrary, it is likely that different groups of classes require different primitive values. The same argument applies to the distribution of object types in the object pool: when testing a set of classes that performs numerical analysis, it would be better to have an object pool with more numerical objects than, for example, Characters. It is also unlikely that all methods of a class require the same amount of testing. With random testing, all methods have the same likelihood of being tested, but in most cases some methods of a class are more complicated or more fault prone than others, and the testing strategy should test these methods more often. By adapting these three parameters to a specific group of classes, the genetic algorithm can evolve a good testing strategy. To guide the evolution of a testing strategy, the genetic algorithm requires an objective function that quantifies how good each strategy is.

3.1.1.1 Objective Function

To determine how good each strategy is, the genetic algorithm executes the strategy for a short amount of time and extracts from the testing report the attributes needed to evaluate it. The attributes used are:


1. Number of unique faults: the number of unique faults found during the test run.

2. Number of unique states: the sum of the unique values that each numerical (Integer, Natural, Real) and boolean attribute of the classes under test had during the test execution.

3. Number of untested features: the number of features that were not tested, i.e. had no valid test cases, during the test execution.

4. Precondition score: normally a feature is untested because no generated test case could satisfy its precondition. For each invalid test case generated for an untested feature, the failing position, that is, the index of the precondition expression that was violated, is used to compute the precondition score. For example, a test case that satisfied the first expression but failed on the second expression of a feature's precondition has failing position 2. The final precondition score is the sum of the failing positions of all invalid test cases generated for all untested features.

The number of unique faults found is the primary measure of how good a testing strategy is, but when two strategies find the same number of faults, the strategy that has tested more features and has tested the software in a more diverse set of states is considered better. The number of unique states measures the number of unique states the software was tested in, and the number of untested features provides a more direct measurement of the number of features that were not tested. When two strategies have the same number of untested features, the strategy that was able to satisfy more precondition clauses of the untested features is considered better. The score of a strategy is computed as follows:

objective value = (nUFaults × 10000) + (1000 − (nUFeatures × 10) + pScore) + nUStates   (3.1)

where nUFaults is the number of unique faults, nUFeatures is the number of untested features, pScore is the precondition score and nUStates is the number of unique states. To evaluate a strategy, the genetic algorithm writes the strategy to a set of files and executes AutoTest. AutoTest loads the strategy from these files and tests the classes for a given amount of time. At the end, it produces a report containing information about the test execution, which is used by the genetic algorithm to compute the objective value.
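Equation 3.1 transcribed directly into code, with a hypothetical struct standing in for the attributes parsed from the AutoTest report:

```cpp
#include <iostream>

// Attributes extracted from the AutoTest report (field names are illustrative).
struct TestReport {
    int uniqueFaults;       // nUFaults
    int untestedFeatures;   // nUFeatures
    int preconditionScore;  // pScore
    int uniqueStates;       // nUStates
};

// Equation 3.1: faults dominate, then feature/precondition coverage, then states.
long objectiveValue(const TestReport& r) {
    return static_cast<long>(r.uniqueFaults) * 10000
         + (1000 - r.untestedFeatures * 10 + r.preconditionScore)
         + r.uniqueStates;
}

int main() {
    TestReport r{3, 12, 45, 210};
    std::cout << objectiveValue(r) << "\n";   // 30000 + (1000 - 120 + 45) + 210 = 31135
    return 0;
}
```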


3.1.2 Genetic Algorithm

The implementation of the genetic algorithm is then divided into four stages.

1. Strategy specification: Specification of a valid strategy.

2. Initialization: Create the initial set of strategies.

3. Evaluation: Evaluate the strategies.

4. Mutation and Crossover: Apply evolutionary operators.

3.1.2.1 Strategy specification

The chromosome (strategy) is encoded as a list of floating-point numbers. As described in Section 2.4.1, alleles can be used to specify the range or the list of possible values allowed for each gene, and specifying the alleles simplifies the chromosome encoding and interpretation. For example, the range of valid values for the Character data type is between 0 and 600, but by randomly selecting an unconstrained floating-point number it is likely that a value outside this range would be chosen. This would force the value to be clamped to 600 or to 0, leading to a set of characters with similar values. By specifying the allele (0, 600), all the characters have the same probability of being picked. The chromosome encoding and its alleles are described in Table 3.1.

3.1.2.2 Initialization

As described in Section 2.4, a genetic algorithm starts by creating a population of individuals, where each individual specifies a strategy. The values for the method call and object pool parameters are randomly generated within the ranges specified by the alleles. The primitive values, however, are initialized with values obtained using static analysis combined with randomly generated values. Static analysis is used to extract primitive values from the classes under test: the system scans the classes for natural, integer, real and character constants and stores these values. When initializing a strategy, a probability of 0.8 is used to decide whether a value should come from the set of values obtained by static analysis, or (with probability 0.2) from a random value generator. This probability is used to avoid initializing a population that is too similar; by introducing some random values, a level of diversity is guaranteed.
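A sketch of this seeded initialization; the 0.8/0.2 split and the value n = 20 come from the text above, while the container types and sampling code are illustrative:

```cpp
#include <random>
#include <vector>

static std::mt19937 rng{2010};

// Initialize the 20 primitive values of one type, mixing constants mined by
// static analysis from the classes under test with purely random values.
std::vector<double> initPrimitiveGenes(const std::vector<double>& minedConstants,
                                       double alleleLo, double alleleHi,
                                       std::size_t n = 20) {
    std::bernoulli_distribution useMined(0.8);               // 0.8 static analysis, 0.2 random
    std::uniform_real_distribution<double> randomValue(alleleLo, alleleHi);
    std::uniform_int_distribution<std::size_t> pick(
        0, minedConstants.empty() ? 0 : minedConstants.size() - 1);

    std::vector<double> genes;
    genes.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        if (!minedConstants.empty() && useMined(rng))
            genes.push_back(minedConstants[pick(rng)]);       // constant seen in the source code
        else
            genes.push_back(randomValue(rng));                // random value within the allele range
    }
    return genes;
}

int main() {
    std::vector<double> mined{0, 1, 255, 1024};               // e.g. constants found in the classes
    auto integer32 = initPrimitiveGenes(mined, -2147483648.0, 2147483647.0);
    (void)integer32;
}
```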


Group | Parameter | Starting index | Length | Allele
Primitive Values | BOOLEAN | 0 | 20 | -1, 1
Primitive Values | CHARACTER 32 | 20 | 20 | 0, 600
Primitive Values | CHARACTER 8 | 40 | 20 | 0, 255
Primitive Values | INTEGER 16 | 60 | 20 | -32768, 32767
Primitive Values | INTEGER 32 | 80 | 20 | -2147483648, 2147483647
Primitive Values | INTEGER 64 | 100 | 20 | -9223372036854775808, 9223372036854775807
Primitive Values | INTEGER 8 | 120 | 20 | -128, 127
Primitive Values | NATURAL 16 | 140 | 20 | 0, 65535
Primitive Values | NATURAL 32 | 160 | 20 | 0, 4294967295
Primitive Values | NATURAL 64 | 180 | 20 | 0, 1.84E+019
Primitive Values | NATURAL 8 | 200 | 20 | 0, 255
Primitive Values | REAL 32 | 220 | 20 | -1.0e30, 1.0e30
Primitive Values | REAL 64 | 240 | 20 | -1.0e30, 1.0e30
Method Calls | METHOD CALL | 260 | 5000 | 0, 500
Object Pool | OBJECT POOL | 5000 | 500 | 0, 100

Table 3.1: Chromosome specification

3.1.2.3 Evaluation

When evaluating a strategy, the genetic algorithm generates a set of files (shown in Table 3.2) that specifies the strategy evolved for a specific set of classes. AutoTest is then executed for a short amount of time. When the execution is completed, a report about the test run is used by the genetic algorithm to compute the objective value of this strategy.

Since each chromosome can be executed independently of the others, the evaluation of the population is executed in multiple threads. The evaluation works by creating eight instances of the code under test and calling Evolutionary AutoTest for each one of them. As illustrated in Figure 3.2, eight strategies are evaluated in parallel.
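A sketch of such a batched parallel evaluation using standard C++ threads; the thesis does not state which threading mechanism is used, and the helper names are invented:

```cpp
#include <cstdio>
#include <thread>
#include <vector>

struct Strategy { int id; double objective = 0.0; };

// Hypothetical stand-in: write the strategy files, run Evolutionary AutoTest on
// one copy of the code under test, parse the report and return the objective value.
static double runEvolutionaryAutoTest(const Strategy& s, int instance) {
    std::printf("evaluating strategy %d on instance %d\n", s.id, instance);
    return 0.0;
}

// Evaluate the population eight strategies at a time, as in Figure 3.2.
void evaluatePopulation(std::vector<Strategy>& population) {
    const int parallelism = 8;
    for (std::size_t start = 0; start < population.size(); start += parallelism) {
        std::vector<std::thread> workers;
        for (int i = 0; i < parallelism && start + i < population.size(); ++i) {
            Strategy& s = population[start + i];
            workers.emplace_back([&s, i] { s.objective = runEvolutionaryAutoTest(s, i); });
        }
        for (auto& w : workers) w.join();   // wait for this batch before starting the next
    }
}

int main() {
    std::vector<Strategy> pop{{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}};
    evaluatePopulation(pop);
}
```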

3.1.2.4 Mutation and crossover

To test a piece of software thoroughly, it is important to test it in many different states. A state is reached by a particular sequence of method calls, and AutoTest hopes to reach different states by randomly invoking methods. To map this behaviour onto the chromosome, the possibility of adding and removing a method call has to be considered, because some states can be reached in two method calls while others may require seven.


Figure 3.2: Parallel population evaluation

Parameter Strategy File

BOOLEAN boolean.txt

CHARACTER 32 character 32.txt

CHARACTER 8 character 8.txt

INTEGER 16 integer 16.txt

INTEGER 32 integer 32.txt

INTEGER 64 integer 64.txt

INTEGER 8 integer 8.txt

NATURAL 16 natural 16.txt

NATURAL 32 natural 32.txt

NATURAL 64 natural 64.txt

NATURAL 8 natural 8.txt

REAL 32 real 32.txt

REAL 64 real 64.txt

METHOD CALL method call sequence.txt

OBJECT POOL creation probability.txt

Table 3.2: Strategy files


can be reached in two method calls, while others may require seven. Another problem is that

each method call has a certain number of type-specific parameters. With these requirements,

the crossover operation may produce a corrupted chromosome, since the number of method

calls and parameters for each method call may differ in each chromosome.

Figure 3.3: Corrupted chromosome caused by crossover

Figure 3.3 shows an example where the chromosome stores the method to be called with

the parameters in the same section of the chromosome. Chromosome X will call method a,

method b and method a again. The problem is that method a takes two String parameters.

The combined chromosome, however, will produce a call to method a that takes one String

and one Integer. One possible solution to this problem was described by Tonella (24), who used a grammar to specify a valid chromosome. This grammar was then used to drive the

mutation and crossover operations.

In this project, a simpler approach was used to solve the same problem. First, the section

of the chromosome that specifies which methods should be invoked is separated from the

section that specifies which parameters should be used. When a method needs three param-

eters, it reads three slots from the parameter section of the chromosome. If the next method

requires two parameters it will read the next two slots. To ensure that the parameters are of


the right type, the chromosome does not specify the object to be used but instead specifies

an index of the object as shown in Figure 3.4. Since AutoTest knows which types are needed

to execute each method, the chromosome just needs to specify which object from the list of

possible objects has to be used. Because the number of methods and the number of available types are not known in advance, the chromosome assumes a maximum number and the real index is computed as real index = chromosome index MOD list size, where list size is the size of the list of methods to call or of the list of available objects of a given type.

Figure 3.4: Valid chromosome crossover

With this approach, adding or removing a method call is very simple. Whenever a mu-

tation makes the real index = 0, the method call is removed and when the real index is

modified from 0 to a different number, a method call is added. This way, different mutation and crossover methods can be used without having to worry about the chromosome

getting corrupted.
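A small sketch of this decoding, following the description above (the method names and gene values are made up):

    def decode_call_sequence(method_genes, methods):
        # real index = chromosome index MOD list size; a decoded 0 means the
        # call is removed, so mutating a gene to or from 0 removes or adds a
        # method call without ever producing a structurally invalid chromosome.
        calls = []
        for gene in method_genes:
            real_index = gene % len(methods)
            if real_index != 0:
                calls.append(methods[real_index])
        return calls

    methods = ["make", "extend", "remove", "wipe_out", "item"]
    print(decode_call_sequence([7, 0, 12, 3], methods))   # the 0 gene drops a call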

3.2 Evolutionary AutoTest

Evolutionary AutoTest extends AutoTest by adding an evolutionary testing strategy. When

selected, the evolutionary testing strategy will first load the strategy generated by the genetic

algorithm. Then it will proceed to test the classes according to the strategy loaded. The


strategy is composed of 15 files (shown in Table 3.2) stored in the folder evolve in the source root directory. Compared to the random strategy implemented in AutoTest

described in section 2.3, Evolutionary AutoTest differs in three major processes:

1. Instantiation of objects.

2. Selection of methods and parameters to execute a test case.

3. Object pool creation.

3.2.1 Instantiation of objects

First, the constructor used to instantiate the object is selected by reading the next index from

the method call parameter. If the selected method takes parameters, then compatible objects

must be selected from the object pool. A list of all compatible objects for each parameter is

retrieved and the object is selected by reading the next index from the method call parameter

and selecting the object with that index. This process is shown in Figure 3.5.

Figure 3.5: Evolutionary AutoTest 1 - Pseudocode describing the selection of the creation procedure

If one of the parameters is of a primitive type (Boolean, Character, Integer, Natural, Real), the next value from the primitive type parameter is used. Figure 3.6 shows the

algorithm for the creation of primitive types.


Figure 3.6: Evolutionary AutoTest 2 - Pseudocode describing the creation of extended objects
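The following sketch mirrors the selection logic of Figures 3.5 and 3.6 under assumed data structures (a list of constructors, iterator-like gene streams and a list-based object pool); it is not AutoTest's internal representation:

    import itertools

    def instantiate(constructors, call_genes, primitive_genes, object_pool):
        # Pick the creation procedure with the next method-call gene.
        name, param_types = constructors[next(call_genes) % len(constructors)]
        args = []
        for p in param_types:
            if p in primitive_genes:                     # primitive argument
                args.append(next(primitive_genes[p]))
            else:                                        # reference argument
                candidates = [o for o in object_pool if o["type"] == p]
                args.append(candidates[next(call_genes) % len(candidates)])
        return {"created_with": name, "args": args}

    # Tiny usage example with hypothetical types.
    ctors = [("LINKED_LIST.make", []), ("ARRAY.make", ["INTEGER_32", "INTEGER_32"])]
    call_genes = itertools.cycle([1, 3, 7])
    primitive_genes = {"INTEGER_32": itertools.cycle([0, 10, -1])}
    print(instantiate(ctors, call_genes, primitive_genes, []))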

3.2.2 Methods and parameter selection

After the classes to be tested are loaded, Evolutionary AutoTest creates a constant table of

all methods to be tested. From this table Evolutionary AutoTest selects the next method

to be tested. The process is very similar to the object instantiation. Figure 3.7 shows the

algorithm to select the next method to be tested.

Figure 3.7: Evolutionary AutoTest 3 - Pseudocode describing how the features to be tested are selected
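As a sketch, the next feature to be tested is picked the same way as the creation procedure above; the constant table of features is assumed here to be a simple list, and the names are illustrative:

    import itertools

    def next_feature(feature_table, call_genes):
        # The next method-call gene indexes into the constant table of features
        # built from the classes under test.
        return feature_table[next(call_genes) % len(feature_table)]

    features = ["extend", "remove", "has", "count"]      # illustrative feature table
    call_genes = itertools.cycle([9, 2, 5])
    print([next_feature(features, call_genes) for _ in range(3)])   # three selections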


3.2.3 Object pool creation

Whenever an object is requested from the object pool, the system will check how many objects of that type are in the object pool and whether more objects should be created by comparing the number of objects to the number specified by the object pool parameter. The algorithm

in Figure 3.8 describes this process.

Figure 3.8: Evolutionary AutoTest 4 - Pseudocode describing the creation of the object pool
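One possible reading of this check, as a sketch only: the OBJECT POOL gene is interpreted here as the minimum number of instances of the requested type that should exist, and all helper names are assumptions rather than AutoTest's API:

    import random

    def request_object(pool, requested_type, pool_gene, create):
        # Compare the current number of objects of the requested type with the
        # number specified by the object pool gene and create more if needed.
        existing = [o for o in pool if o["type"] == requested_type]
        while len(existing) < pool_gene:
            obj = create(requested_type)        # instantiate as in section 3.2.1
            pool.append(obj)
            existing.append(obj)
        if not existing:
            existing = [create(requested_type)]
            pool.extend(existing)
        return random.choice(existing)

    # Usage example with a trivial factory.
    print(request_object([], "ARRAY", 3, lambda t: {"type": t}))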


4 Experiments

4.1 Introduction

To evaluate the effectiveness of the evolutionary testing strategy (EV) compared to the ran-

dom testing strategy implemented in AutoTest (OR) and the precondition satisfaction strat-

egy (PS), each strategy is used to test 92 classes and the number of faults found is used

for comparison. The classes were selected from EiffelBase (43) and grouped into 57 testing

groups of strongly related classes such as Two way list and Two way list cursor. Each group

was then tested 30 times for 60 minutes each. The tests were executed in a set of dedicated

machines with an Intel Pentium 4 CPU at 3 GHz and 1 GB of RAM running Red Hat Enter-

prise Linux 5.3 with kernel version 2.6.18-92.1.1.el5. To facilitate the comparison between

EV, OR and PS, the same experiment setting used in previous studies (52; 53) is used in

this project. The list of selected classes is provided in appendix D and the class groups in

appendix C. To make the comparison fair, EV uses the same amount of time used by OR and

PS (60 minutes) to both evolve a testing strategy and test the classes. Although in practice a testing strategy can be evolved once for multiple test executions, in this study for each

test execution a strategy is evolved. The time allocation for the evolution of a strategy is

27 minutes and 33 minutes for the test run. The genetic operators and parameters used by

the genetic algorithm are shown in Table 4.1. The selection of the genetic operators and the parameters for the evolutionary algorithm is based on previous studies (41; 42) I performed on the effectiveness of different genetic algorithms for automated object-oriented testing.


Evolutionary algorithm setting

Population size 8

Number of generations 4

Mutation probability 0.4

Crossover probability 0.4

Crossover algorithm Partial Matching

Selection algorithm Stochastic Remainder

Mutation algorithm Flip Mutator

Table 4.1: Genetic operators and parameters used by the evolutionary algorithm

4.2 Evaluation

The main parameter for evaluation is the number of faults found. AutoTest, as described in section 2.3.3, also reports the faults found in parameters passed to the methods under test. A better representation of the number of faults found in a class should only count the faults found in the class under test and its ancestors. In addition, due to a bug in AutoTest that was only detected after the experiments, by which some invalid invariant violation faults are wrongly reported as valid faults, the number of faults found without invariant faults is

also reported. The number of faults reported here is always the sum of all unique faults found

in all 30 execution runs. The notation used to specify which faults are counted is described

below:

1. (OR, PS, EV): The total number of unique faults is counted

2. (OR Anc, PS Anc, EV Anc): Only the faults found in the class tested or its ancestors are counted

3. (OR Anc Inv, PS Anc Inv, EV Anc Inv): Only the faults found in the class tested

or its ancestors are counted and the invariant violation faults are excluded

4.3 Executions

A total of 4,379 hours of test runs is reported here; the testing results obtained using

the OR and PS strategies are reused from previous studies (52; 53).


OR PS EV

Number of classes tested 115 119 149

Total features tested 8,242 8,268 9,425

Tested time (minutes) 104,310 104,311 55,167

Average execution time (minutes) per test run 61 61 32

Number of untested features (not unique) 24,193 17,659 32,954

Number of untested features (unique) 152 96 311

Number of Faults found (not unique) 32,589 34,357 45,172

Number of faults found per min of test 534.25 563.22 1,400.19

Percentage of features that were untested 1.8% 1.2% 3.3%

Faults found per test case generated (min) 0.0005 0.0005 0.0016

Number of Valid TC 31,545,994 31,279,452 12,691,422

Total pass test case 29,818,004 29,568,257 11,397,927

Total fail test case 1,714,948 1,698,456 1,273,533

Total bad test cases 13,042 12,739 19,962

Total invalid test cases 8,987,338 8,508,082 3,545,692

Total test cases 40,533,332 39,787,534 16,237,114

Total test cases per minute of test 664,481 652,248 503,298

Table 4.2: Statistics from the test executions of the three strategies

After running the experiments a set of basic statistics is computed for the three testing

strategies, as shown in Table 4.2. It shows that although EV was executed for less time, it tested more classes and more features, and the number of faults it found was higher. The percentage of test cases generated by EV that find a fault is 320% higher than for the other two strategies, as depicted in Figure 4.1. Of all the features EV tried to test, 3.3% remained untested, which is higher than for the other two strategies. The EV strategy was also less efficient in the generation of test cases: the number of test cases generated per minute by EV is around 25% lower than for the other strategies.


Figure 4.1: Percentage of test cases that find a fault

4.4 Faults found in class group

The sum of faults found in all 30 execution runs for the three strategies applying the three

different methods to count the number of faults is shown in Figure 4.2. When counting all

the faults, EV increased the number of faults found by 98% compared to PS, but a total of 45% of these faults were found in classes that are neither the class under test nor one of its ancestors. When considering only the class under test and its ancestors, EV Anc found 412 more faults than PS Anc, which is a 43% improvement.

The number of faults is the sum of the number of unique faults found for each test group,

but many of these test groups have classes in common and many classes are also likely to

have ancestors in common, so a fault that is found in an ancestor that has descendants in

common or in a class that is used by more than one group is counted multiple times. For

this reason an analysis of faults found per class is performed. In this analysis only the faults

found in each class are reported.


Figure 4.2: Number of faults found for all class groups by each strategy

4.5 Faults per class analysis

A total of 149 classes were tested by EV, the OR strategy tested 115 different classes and

PS tested 119 classes. These numbers differ from the original number of selected classes (92)

because AutoTest also reports faults found in ancestors and parameters. When eliminating

duplicated faults, we have a better representation of the improvement of a strategy over the

others. As we can see in Figure 4.3, EV is still the best performing method with around 32%

improvement over PS, and PS an 8% improvement over OR.

One hypothesis to explain why EV finds more faults than the other methods is that it

tested more classes, but this is not the case as shown in Figure 4.4. Figure 4.4 shows an area

chart with the number of faults that were only found by one method for each class.

Figure 4.4 shows that for most of the classes, EV found faults that were not found by the

other methods. But EV has also missed some faults that were only found by OR or PS. The

exact number of faults that were only found by one method is reported in Figure 4.5. EV

missed a total of 133 (counting invariant faults) and 74 (without invariant faults) faults that

were only found by OR or PS. Interestingly, the PS strategy found a total of 28 faults just


Figure 4.3: Comparison of the number of faults found for each class tested

Figure 4.4: Comparison of the number of faults found only by one method


in the Lex builder class that were not found by OR or EV. This may indicate that the PS strategy may only work well for certain sets of classes.

Figure 4.5: Number of faults that were only found by one strategy

EV seems to find a whole new set of faults that are not detected by either OR or PS. The frequency with which it finds faults is also very different from that of PS and OR, as shown in Figure 4.6. Figure

4.6 shows the frequency that each fault was found in all execution runs. On the y-axis is

the square root of the number of times a fault was found and on the x-axis the faults found

sorted according to the frequency of EV. The frequency of OR and PS looks very similar,

whereas the frequency of EV has no visible pattern when compared to OR and PS. In the

right lower corner of Figure 4.6 we can see the faults that were only found by OR and PS.

Each strategy has a set of faults that it found fewer than five times. These faults are

considered hard to find faults and are also important because in a normal execution, where


Figure 4.6: SQRT of the number of times each fault was found

a class is tested only one time, these faults are unlikely to be found. Figure 4.7 shows the hard to find faults for all three methods. To facilitate the comparison, the square root of the frequency with which a fault was found is used on the y-axis and the sorted faults, according to frequency, on the x-axis. The first chart, in the upper left corner, shows the hard to find faults for the OR strategy; most of these faults were found with a high frequency by both the PS and EV strategies. The chart on the right shows the hard to find faults for the PS strategy. These faults seem to be hard for the other two strategies as well, but some of them were found by the EV strategy with high frequency. The lower chart shows the hard to find faults for EV. Most of these faults were not found by the other strategies at all, and only a few of them were found by OR and PS. In general there seems to be no correlation between the hard to find faults of one strategy and those of the others. Table 4.3 shows the average frequency for each strategy over the hard to find faults for OR, PS and EV. The first row shows the average frequency each strategy achieved for the hard to find faults for OR, the second row for the hard to find faults for PS, and the third for EV. The EV strategy performed better in all cases.


Hard to find faults OR PS EV

OR 1.38 1.55 1.66

PS 0.71 1.33 1.44

EV 0.68 0.77 1.36

Average 0.92 1.22 1.49

Table 4.3: Average of the SQRT of the frequency the faults were found

Figure 4.7: Charts showing the hard to find faults for each strategy


4.6 Strategy analysis

One of the assumptions of this work is that by adapting the frequency with which each method is tested, the number of faults found would increase. The PS strategy has shown this, but it focused on methods without any valid test cases, whereas in EV even methods that are not considered

hard to test may have a higher priority. Figure 4.8 shows the normalized frequency of valid

test cases for each method tested. The value was normalized because EV was executed only

for 33 minutes whereas the other two methods ran for one hour. The normalization is:

value = √(number of valid test cases) / tested minutes     (4.1)
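As a small worked example of equation (4.1), with made-up counts rather than values from Table 4.2:

    import math

    def normalised_valid_test_cases(valid_test_cases, tested_minutes):
        # Equation (4.1): square root of the number of valid test cases divided
        # by the number of minutes the feature was tested.
        return math.sqrt(valid_test_cases) / tested_minutes

    print(normalised_valid_test_cases(900, 33))   # an EV-style 33 minute run
    print(normalised_valid_test_cases(900, 60))   # an OR/PS-style 60 minute run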

Figure 4.8: Normalized frequency of valid test cases for each feature tested

As shown in Table 4.2, out of 1710 execution runs there were a total of 31,545,994 valid

test cases in the OR strategy, 31,279,452 in the PS strategy and 12,691,422 in the EV strategy.

The distribution of valid test cases for the three strategies shows that OR and PS have a very

similar distribution but EV is completely different. It is also interesting that the hard to test

features, that is features with fewer than 10 valid test cases, seem to be tested more often by


EV. However, by looking at Figure 4.9 we can see in the chart in the top left corner that the hard to test features for the OR strategy were better tested by the PS strategy. The hard to test features for the PS strategy were also hard to test features for the OR strategy, but many of these features were tested with high frequency by EV. The hard to test features for EV, however, show a very interesting pattern. There were many more features that were tested fewer than 10 times, and for most of these features there were no valid test cases from either OR or PS, but for some of them both the OR and PS strategies had a high number

of valid test cases.

Figure 4.9: Charts showing the hard to test features for each strategy

Figure 4.10 shows the number of faults found and the number of times the feature was tested. This chart reveals something very strange: for many methods the number of faults found


was greater than zero but the number of times the method was tested was zero. Further analysis showed that whenever a class inherits or redefines a feature, and this feature has a fault where it is redefined, AutoTest reports the fault as belonging to the ancestor where the original feature was defined. This problem invalidates the analysis in Figure 4.10, but since most of the analysis in this thesis is based on the number of faults found, it does not have a negative effect.

Figure 4.10: Number of faults found per number of times the feature was tested

4.6.1 Primitive values

Another important assumption is that the primitive values used by a testing strategy should

be adapted to the classes under test. The adaptation of this parameter was partially per-

formed using static analysis as described in section 3.1.2.2. Each of the 1710 testing strategies

evolved has a set of primitive values for all primitive types. By plotting these values we have

a distribution of the values for a type over the evolved strategies and we can see that not

all distributions are random. Figure 4.11 shows the distribution for Real 32 and Character.

The Real 32 distribution looks like a random distribution, which was unexpected, but the distribution for Character is more interesting. In the Character distribution the values for both Character 16 and Character 32 are plotted, which is visible on the chart.


Figure 4.11: Distribution of the values for Real 32 and Character (16, 32)

In the Character distribution we can see a line around the values 34, 36, 40, 42 and 48, which are the characters “, $, (, * and 0. It would be interesting to find out why such characters were common values, but such an analysis is out of the scope of this project. The distribution for

Integer 8 also looks very interesting. Figure 4.12 shows on the left hand side the distribution

for Integer 8 and on the right hand side the same chart with a different scale on the y-axis.

Figure 4.12: Distribution of the values for Integer 8, shown with two different y-axis scales

In the Integer distribution it seems that values close to zero have a high importance.

It also shows that the values 2000 and -2000 seem to be very important, which is interesting. One possible hypothesis is that 2000 appeared in many classes and was picked up by the static analysis. Since the static analysis technique used was very basic, the 2000 value could be referring to a date in the class header.


5 Discussion

5.1 Conclusion

In this work a total of 4,379 hours of experiments were executed and 96,557,980 test cases were generated. Excluding invariant violation faults, all three methods combined found 893 faults in 148 Eiffel classes that are widely used in production software. Of all the faults found, 88% were found by the evolutionary testing strategy, 66% by the precondition-satisfaction strategy and 60% by the original random strategy. EV found 249 more faults than the OR strategy, which is an improvement of 46%. When compared to the PS strategy, EV found 198 more faults, resulting in a 34% improvement. There were a total of 74 faults that EV could not find, and 38% of these faults were found by PS in a single class. EV, on the other hand, found a total of 276 faults spread over 86 classes that neither OR nor PS could find, and all this improvement was achieved using only 54% of the time used by OR and PS to test the classes. In practice a testing strategy could be evolved only once for multiple testing executions. The experiments also showed that despite being less efficient in the generation of test cases, the chance that a test case generated by EV finds a fault is 320% higher than for the other two strategies, as shown in Figure 4.1. The only disadvantage of EV compared to the other strategies is that it needs to evolve a strategy before it can start testing a class. This can be solved by providing a set of default strategies until a strategy is evolved in the background. Analysis of the evolved testing strategies showed that they had adapted the primitive values and the frequency with which the methods were tested. The results achieved in this project give a strong indication that adapting the testing strategy to the class under test considerably improves the number of faults found.


5.2 Considerations

1. In this project two problems were found in Autotest. In the first problem, Autotest was reporting invariant violations caused by parameters passed to a method as valid faults; to deal with this problem we provided both the number of faults found with and without invariant violations, and the results of this project are based on the number of faults found without invariant violations. This means that the number of faults, and possibly the improvement, is an underestimation. In the second problem, Autotest reports a fault that belongs to a descendant as belonging to an ancestor, but since most of the analysis in this project is based on the number of faults found, this defect does not modify the improvement achieved by EV reported here.

2. Time was the main resource used for comparison in this study: a total of 60 minutes was used for each test execution for OR and PS, and the processing power was not taken into account. The evolutionary algorithm used eight threads to evolve a strategy. After the strategy was evolved, Evolutionary AutoTest was executed in a single thread.

5.3 Further improvement

Despite the big improvement achieved by evolutionary testing, there are many changes that could increase this improvement even further. Some possible improvements and research directions are presented below.

• Efficiency - the efficiency of the algorithm can be improved by combining the genetic

algorithm and Autotest into a single system. At the moment, every time Autotest is

invoked from the genetic algorithm, it has to load and parse the class under test.

• Reusing strategies - the strategies evolved by the genetic algorithm may be reused

when evolving a new strategy for a new class.

• Static analysis - this system uses a very naive static analysis technique; a more advanced technique may be able to improve the adaptation of the evolutionary strategy.

• Code metrics - at the moment the information about the complexity of the methods

being tested is not taken into account. When initializing the testing strategies, code

metrics could be used to have a smarter initialization.


• Evolution of strategies - evolving a testing strategy for a longer time may increase the number of faults found, especially for classes with a low number of faults.

5.4 Project information

The source code and the results reported in this project can be downloaded at www.lucas.za.org/evotec; most of the results were compiled using a Python script which is also available for download.


Appendix A

Primitive Values

Primitive type Values

BOOLEAN True, False

CHARACTER 8 1 to 255

CHARACTER 32 1 to 600

REAL 32 -100.0, -2.0, -1.0, 1.0, 2.0, 100.0, 3.40282e+38, 1.17549e-38, 1.19209e-07

REAL 64 -1.0, 1.0, -2.0, 2.0, 0, 3.14159265358979323846, -2.7182818284590452354, 2.2250738585072014e-308, 2.2204460492503131e-16, 1.7976931348623157e+308

INTEGER 8 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max

INTEGER 16 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max

INTEGER 32 -100, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max

INTEGER 64 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max

NATURAL 8 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max

NATURAL 16 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max

NATURAL 32 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max

NATURAL 64 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max

Table A.1: Autotest primitive values.


Appendix B

Chromosome files

# Parameter File name

1 Boolean boolean.txt

2 Char 32 character 32.txt

3 Char 8 character 6.txt

4 Integer 16 integer 16.txt

5 Integer 32 integer 32.txt

6 Integer 64 integer 64.txt

7 Integer 8 integer 8.txt

8 Natural 16 natural 16.txt

9 Natural 32 natural 32.txt

10 Natural 64 natural 64.txt

11 Natural 8 natural 8.txt

12 Seed seed.txt

13 Real 32 real 32.txt

14 Real 64 real 64.txt

15 Method call method call sequence.txt

16 creation probability creation probability.txt

Table B.1: Chromosome files.


Appendix C

Test groups

# Class Group
1 ACTIVE LIST, ARRAYED LIST CURSOR
2 ARRAY
3 ARRAYED CIRCULAR, CIRCULAR CURSOR
4 ARRAYED LIST, ARRAYED LIST CURSOR
5 ARRAYED QUEUE
6 ARRAYED SET
7 ARRAYED TREE, CURSOR, BINARY TREE
8 BINARY TREE, ARRAYED LIST CURSOR
9 BOUNDED QUEUE
10 COMPACT CURSOR TREE, COMPACT TREE CURSOR
11 DS ARRAYED LIST, DS ARRAYED LIST CURSOR, KL EQUALITY TESTER, ARRAY
12 DS AVL TREE, DS BINARY SEARCH TREE CURSOR, KL EQUALITY TESTER
13 DS BILINKED LIST, DS BILINKED LIST CURSOR, KL EQUALITY TESTER, ARRAY
14 DS BINARY SEARCH TREE, DS BINARY SEARCH TREE CURSOR, KL EQUALITY TESTER
15 DS BINARY SEARCH TREE SET, DS BINARY SEARCH TREE SET CURSOR
16 DS HASH SET, DS HASH SET CURSOR, KL EQUALITY TESTER
17 DS LEFT LEANING RED BLACK TREE, DS BINARY SEARCH TREE CURSOR, KL EQUALITY TESTER
18 DS LINKED LIST, DS LINKED LIST CURSOR, KL EQUALITY TESTER, ARRAY
19 DS LINKED QUEUE, KL EQUALITY TESTER
20 DS LINKED STACK, KL EQUALITY TESTER
21 DS MULTIARRAYED HASH SET, DS MULTIARRAYED HASH SET CURSOR, KL EQUALITY TESTER
22 DS MULTIARRAYED HASH TABLE, DS MULTIARRAYED HASH TABLE CURSOR, KL EQUALITY TESTER, KL EQUALITY TESTER
23 DS RED BLACK TREE, DS BINARY SEARCH TREE CURSOR, KL EQUALITY TESTER
24 FIXED DFA, STATE OF DFA, LINKED LIST, ARRAY
25 FIXED TREE, CURSOR
26 HIGH BUILDER, LINKED LIST, ARRAY, PDFA
27 KL STRING
28 LEXICAL
29 LEX BUILDER, LINKED LIST, ARRAY, PDFA
30 LINKED AUTOMATON, STATE, CURSOR
31 LINKED CIRCULAR, CIRCULAR CURSOR
32 LINKED CURSOR TREE, LINKED CURSOR TREE CURSOR
33 LINKED DFA, STATE OF DFA, LINKED LIST, LINKED LIST CURSOR
34 LINKED LIST, LINKED LIST CURSOR
35 LINKED PRIORITY QUEUE
36 LINKED SET, LINKED LIST CURSOR
37 LINKED TREE, LINKED TREE CURSOR, BINARY TREE
38 LX DFA, LX START CONDITIONS
39 LX DFA REGULAR EXPRESSION, ARRAY
40 LX FULL DFA, LX DESCRIPTION
41 LX LEX SCANNER, UT ERROR HANDLER, YY BUFFER, LX DESCRIPTION
42 LX NFA, DS ARRAYED LIST, LX RULE, LX NFA STATE, LX SYMBOL CLASS
43 LX PROTO QUEUE, LX TRANSITION TABLE, LX DFA STATE, KL EQUALITY TESTER, DS BILINKED LIST CURSOR, LX PROTO
44 LX REGEXP PARSER, UT ERROR HANDLER, YY BUFFER, LX ACTION FACTORY, LX DESCRIPTION
45 LX REGEXP SCANNER, UT ERROR HANDLER, YY BUFFER, LX DESCRIPTION
46 LX SYMBOL CLASS, DS ARRAYED LIST CURSOR, KL EQUALITY TESTER, LX EQUIVALENCE CLASSES
47 LX TEMPLATE LIST, LX DFA STATE, KL EQUALITY TESTER, DS LINKED LIST CURSOR, LX TRANSITION TABLE, LX EQUIVALENCE CLASSES
48 MULTI ARRAY LIST, MULTAR LIST CURSOR
49 PART SORTED SET
50 PART SORTED TWO WAY LIST, TWO WAY LIST CURSOR
51 SORTED TWO WAY LIST, TWO WAY LIST CURSOR
52 SUBSET STRATEGY TREE, BINARY SEARCH TREE SET
53 TWO WAY CIRCULAR, CIRCULAR CURSOR
54 TWO WAY CURSOR TREE, TWO WAY CURSOR TREE CURSOR
55 TWO WAY LIST, TWO WAY LIST CURSOR
56 TWO WAY SORTED SET, TWO WAY LIST CURSOR
57 TWO WAY TREE, TWO WAY TREE CURSOR, BINARY TREE

Table C.1: List of all test groups


Appendix D

List of classes tested

# Tested Classes
1 ACTIVE LIST
2 ARRAY
3 ARRAYED CIRCULAR
4 ARRAYED LIST
5 ARRAYED LIST CURSOR
6 ARRAYED QUEUE
7 ARRAYED SET
8 ARRAYED TREE
9 BINARY SEARCH TREE SET
10 BINARY TREE
11 BOUNDED QUEUE
12 CIRCULAR CURSOR
13 COMPACT CURSOR TREE
14 COMPACT TREE CURSOR
15 CURSOR
16 DS ARRAYED LIST
17 DS ARRAYED LIST CURSOR
18 DS AVL TREE
19 DS BILINKED LIST
20 DS BILINKED LIST CURSOR
21 DS BINARY SEARCH TREE
22 DS BINARY SEARCH TREE CURSOR
23 DS BINARY SEARCH TREE SET
24 DS BINARY SEARCH TREE SET CURSOR
25 DS HASH SET
26 DS HASH SET CURSOR
27 DS LEFT LEANING RED BLACK TREE
28 DS LINKED LIST
29 DS LINKED LIST CURSOR
30 DS LINKED QUEUE
31 DS LINKED STACK
32 DS MULTIARRAYED HASH SET
33 DS MULTIARRAYED HASH SET CURSOR
34 DS MULTIARRAYED HASH TABLE
35 DS MULTIARRAYED HASH TABLE CURSOR
36 DS RED BLACK TREE
37 FIXED DFA
38 FIXED TREE
39 HIGH BUILDER
40 KL EQUALITY TESTER
41 KL STRING
42 LEX BUILDER
43 LEXICAL
44 LINKED AUTOMATON
45 LINKED CIRCULAR
46 LINKED CURSOR TREE
47 LINKED CURSOR TREE CURSOR
48 LINKED DFA
49 LINKED LIST
50 LINKED LIST CURSOR
51 LINKED PRIORITY QUEUE
52 LINKED SET
53 LINKED TREE
54 LINKED TREE CURSOR
55 LX ACTION FACTORY
56 LX DESCRIPTION
57 LX DFA
58 LX DFA REGULAR EXPRESSION
59 LX DFA STATE
60 LX EQUIVALENCE CLASSES
61 LX FULL DFA
62 LX LEX SCANNER
63 LX NFA
64 LX NFA STATE
65 LX PROTO
66 LX PROTO QUEUE
67 LX REGEXP PARSER
68 LX REGEXP SCANNER
69 LX RULE
70 LX START CONDITIONS
71 LX SYMBOL CLASS
72 LX TEMPLATE LIST
73 LX TRANSITION TABLE
74 MULTAR LIST CURSOR
75 MULTI ARRAY LIST
76 PART SORTED SET
77 PART SORTED TWO WAY LIST
78 PDFA
79 SORTED TWO WAY LIST
80 STATE
81 STATE OF DFA
82 SUBSET STRATEGY TREE
83 TWO WAY CIRCULAR
84 TWO WAY CURSOR TREE
85 TWO WAY CURSOR TREE CURSOR
86 TWO WAY LIST
87 TWO WAY LIST CURSOR
88 TWO WAY SORTED SET
89 TWO WAY TREE
90 TWO WAY TREE CURSOR
91 UT ERROR HANDLER
92 YY BUFFER

Table D.1: List of all classes tested


Bibliography

[1] Y. Cheon and G. T. Leavens. A simple and practical approach to unit testing: The JML and JUnit way. Technical Report 01-12, Department of Computer Science, Iowa State University, Nov. 2001.

[2] I. Ciupa, A. Leitner, M. Oriol, and B. Meyer. Experimental assessment of random testing for object-oriented software. In Proceedings of the International Symposium on Software Testing and Analysis 2007 (ISSTA'07), pages 84-94, 2007.

[3] NIST (National Institute of Standards and Technology): The Economic Impacts of Inadequate Infrastructure for Software Testing, Report 7007.011, available at www.nist.gov/director/prog-ofc/report02-3.pdf.

[4] Eric Bezault et al.: Gobo library and tools, at www.gobosoft.com.

[5] Xanthakis, S., Ellis, C., Skourlas, C., Le Gall, A., Katsikas, S., Application of Genetic Algorithms to Software Testing, Proceedings of the 5th International Conference of Software Engineering, pages 625-636, France, December 1992.

[6] Shultz, A., Grefenstette, J., De Jong, K., Test & Evaluation by Genetic Algorithms, Navy Center for Applied Research in Artificial Intelligence, IEEE, 1993.

[7] Hunt, J., Testing Control Software using a Genetic Algorithm, Working Paper, University of Wales, UK, 1995.

[8] Roper, M., Maclean, I., Brooks, A., Miller, J., Wood, M., Genetic Algorithms and the Automatic Generation of Test Data, Working Paper, Department of Computer Science, University of Strathclyde, UK, 1991.

[9] Watkins, A., The Automatic Generation of Software Test Data using Genetic Algorithms, Proceedings of the Fourth Software Quality Conference, 2: 300-309, Dundee, Scotland, July 1995.

[10] Alander, J., Mantere, T. and Turunen, P., Genetic Algorithm Based Software Testing, in G. Smith, N. Steele and R. Albrecht, editors, Artificial Neural Nets and Genetic Algorithms, Springer-Verlag, Wien, Austria, pages 325-328, 1998.

[11] Tracey, N., Clark, J., Mander, K., Automated Program Flaw Finding Using Simulated Annealing, ISSTA-98, Clearwater Beach, Florida, USA, 1998.

[12] Borgelt, K., Software Test Data Generation From A Genetic Algorithm, Industrial Applications Of Genetic Algorithms, CRC Press, 1998.

[13] Pargas, R., Harold, M., Peck, R., Test Data Generation Using Genetic Algorithms, Software Testing, Verification And Reliability, 9: 263-282, 1999.

[14] Jones, B., Sthamer, H. and D. Eyres. Automatic structural testing using genetic algorithms. Software Engineering Journal, 11(5):299-306, 1996.


[15] Lin, J-C. and Yeh, P-U. Automatic Test Data Generation for Path Testing using GAs, Information Sciences, 131: 47-64, 2001.

[16] Michael, C., McGraw, G., Schatz, M., Generating Software Test Data by Evolution, IEEE Transactions On Software Engineering, 27(12), December 2001.

[17] Wegener, J., Baresel, A., Sthamer, H., Evolutionary Test Environment for Automatic Structural Testing, Information & Software Technology, 2001.

[18] Harman, M. The automatic generation of software test data using genetic algorithms. Ph.D. thesis, University of Glamorgan, Pontypridd, Wales, Great Britain, 1996.

[19] Diaz, E., Tuya, J., and Blanco, R. Automated Software Testing Using a Metaheuristic Technique Based on Tabu Search. In 18th IEEE International Conference on Automated Software Engineering, pp. 310-313, 2003.

[20] Berndt, D., Fisher, J., Johnson, L., Pinglikar, J., and Watkins, A. (2003). Breeding Software Test Cases with Genetic Algorithms. In 36th Annual Hawaii Int. Conference on System Sciences (HICSS 2003).

[21] D. J. Berndt, A. Watkins, High Volume Software Testing using Genetic Algorithms, Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 9, 2005.

[22] Alba, E., and Chicano, J. F. Software Testing with Evolutionary Strategies, Proceedings of the Rapid Integration of Software Engineering Techniques (RISE-2005), Heraklion, Greece, 2005.

[23] McMinn, P., and Holcombe, M. Evolutionary testing of state-based programs. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'05), pages 1013-1020, Washington DC, USA, June 2005.

[24] Tonella, P. Evolutionary Testing of Classes. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '04), ACM Press, New York, NY (2004), 119-128.

[25] Stefan Mairhofer, Search-based software testing and complex test data generation in a dynamic programming language, Master thesis, 2008.

[26] Harman, M., and McMinn, P. A theoretical & empirical analysis of evolutionary testing and hill climbing for structural test data generation. In ISSTA '07: Proceedings of the 2007 International Symposium on Software Testing and Analysis (New York, NY, USA, 2007), ACM, pp. 73-83.

[27] Wappler, S., and Lammermann, F. Using evolutionary algorithms for the unit testing of object-oriented software. In GECCO '05: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation (New York, NY, USA, 2005), ACM, pp. 1053-1060.

[28] Wappler, S., and Wegener, J. Evolutionary unit testing of object-oriented software using a hybrid evolutionary algorithm. In CEC '06: Proceedings of the 2006 IEEE Congress on Evolutionary Computation (2006), IEEE, pp. 851-858.

[29] ECMA-367 Eiffel: Analysis, Design and Programming Language, 2nd Edition. http://www.ecma-international.org/publications/standards/Ecma-367.htm.

[30] Beizer, B.: 'Software Testing Techniques', Second Edition, New York: van Nostrand Rheinhold, ISBN 0442206720, 1990.

[31] Meyer, B. Object-Oriented Software Construction, 2nd edition. Prentice Hall, 1997.

[32] Godefroid, P., Klarlund, N., and Sen, K. DART: directed automated random testing. In PLDI '05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA, 2005), ACM Press, pp. 213-223.

[33] Meyer, B., Ciupa, I., Leitner, A., and Liu, L. L. Automatic testing of object-oriented software. In Proceedings of SOFSEM 2007 (Current Trends in Theory and Practice of Computer Science) (2007), J. van Leeuwen, Ed., Lecture Notes in Computer Science, Springer-Verlag.

[34] Oriat, C. Jartege: a tool for random generation of unit tests for Java classes. Tech. Rep. RR-1069-I, Centre National de la Recherche Scientifique, Institut National Polytechnique de Grenoble, Université Joseph Fourier Grenoble I, June 2004.

[35] De Jong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive systems (Doctoral dissertation, University of Michigan). Dissertation Abstracts International, 36(10), 5140B. (University Microfilms No. 76-9381)

[36] Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.

[37] Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.

[38] M. Wall. GAlib: A C++ Library of Genetic Algorithm Components. MIT, http://lancet.mit.edu/ga/, 1996.

[39] I. Ciupa, A. Pretschner, A. Leitner, M. Oriol, and B. Meyer. On the predictability of random tests for object-oriented software. In Proceedings of the First International Conference on Software Testing, Verification and Validation (ICST'08), April 2008.

[40] Yi Wei, Serge Gebhardt, Manuel Oriol, Bertrand Meyer. Satisfying Test Preconditions through Guided Object Selection. Third International Conference on Software Testing, Verification and Validation (ICST'10).

[41] L. S. Silva, Evolutionary Object-Oriented Testing. MSc thesis, University of Amsterdam, Amsterdam, The Netherlands, 2009.

[42] L. S. Silva, Evolutionary Testing of Object-Oriented Software. To appear in: Proceedings of the ACM Symposium on Applied Computing 2010, ACM SIGAPP.

[43] EiffelBase. Eiffel Software. http://www.eiffel.com/libraries/base.html.

[44] Hovemeyer, D., and Pugh, W. Finding bugs is easy. SIGPLAN Not. 39, 12 (2004), 92-106.

[45] Visser, W., Pasareanu, C. S., and Khurshid, S. Test input generation with Java PathFinder. In ISSTA '04: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (New York, NY, USA, 2004), ACM Press, pp. 97-107.

[46] Boshernitsan, M., Doong, R., and Savoia, A. From Daikon to Agitator: lessons and challenges in building a commercial tool for developer testing. In ISSTA '06: Proceedings of the 2006 International Symposium on Software Testing and Analysis (New York, NY, USA, 2006), ACM Press, pp. 169-180.

[47] Csallner, C., and Smaragdakis, Y. DSD-Crasher: A hybrid analysis tool for bug finding. In International Symposium on Software Testing and Analysis (ISSTA) (July 2006), pp. 245-254.

[48] Pacheco, C., and Ernst, M. D. Eclat: Automatic generation and classification of test inputs. In ECOOP 2005 - Object-Oriented Programming, 19th European Conference (Glasgow, Scotland, July 25-29, 2005).

[49] Xie, T., Marinov, D., Schulte, W., and Notkin, D. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS '05) (April 2005), pp. 365-381.

[50] Jtest. Parasoft Corporation. http://www.parasoft.com/.

[51] Victor R. Basili, Comparing the Effectiveness of Software Testing Strategies.

[52] Yi Wei, Serge Gebhardt, Manuel Oriol, Bertrand Meyer, "Satisfying Test Preconditions through Guided Object Selection", Third International Conference on Software Testing, Verification and Validation (ICST'10), to appear.

[53] Yi Wei, Manuel Oriol, and Bertrand Meyer. Is Coverage a Good Measure of Testing Effectiveness? Technical report, ETH Zurich.
