Search-Based Robustness Testing of Data Processing Systems


Daniel Di Nardo, Fabrizio Pastore, Andrea Arcuri, Lionel Briand

University of Luxembourg, Interdisciplinary Centre for Security, Reliability and Trust, Software Verification and Validation Lab

[Figure: data processing systems ingest real-world data, e.g. a satellite transmission feeding a data processing system.]

•  Essential component of systems that aggregate and analyse real-world data

•  Robustness is “the degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions”

[Figure: the input data has multiple fields, a nested structure with different types, constraints among fields, and comes in huge amounts; valid data is accepted and processed, invalid data is discarded.]


Contributions

• An evolutionary algorithm to automate robustness testing of data processing systems

• Use of four fitness functions (model-based and code-based) that enable the effective generation of robustness test cases by means of evolutionary algorithms

• An extensive study of the effect of fitness functions and configuration parameters on the effectiveness of the approach using an industrial data processing system as case study.


Testing Automation Problems

• How to automatically generate test inputs?
  • Data mutation methodology [ICST’15]
• How to automatically verify test execution results?
  • Modelling methodology [ASE’13]
• How to identify the most effective inputs?
  • Best size of inputs? Which data types to consider? How many data faults should be present? Which constraints should be broken?
  • Meta-heuristic search approach [ASE’15]


Meta-heuristic search approach = model-based mutation to generate inputs + model-based validation as oracle + coverage objectives to evaluate inputs

Data Modelling Methodology

[Figure: input and output data are modelled using class diagrams plus OCL constraints.]

context Vcdu inv:
  let frameCount : Integer = self.vcFrameCount,
      previousFrameCount : Integer = self.vcFrameCount in
    frameCount <> previousFrameCount + 1 implies
      VcduEvents.allInstances()->exists(e | e.eventType = COUNTER_JUMP)
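Such an invariant acts as an automated oracle: after the system processes the input, the checker evaluates the constraint on the model instance built from the output. A minimal Python sketch of the idea (the `Vcdu` counter fields and the `COUNTER_JUMP` event name mirror the slide; the class layout and function are hypothetical simplifications):

```python
COUNTER_JUMP = "COUNTER_JUMP"

class Vcdu:
    def __init__(self, frame_count, previous_frame_count):
        self.frame_count = frame_count
        self.previous_frame_count = previous_frame_count

def check_counter_jump_invariant(vcdu, events):
    """Oracle: a non-consecutive frame counter must be reported as a
    COUNTER_JUMP event; otherwise the invariant is violated."""
    jump = vcdu.frame_count != vcdu.previous_frame_count + 1
    if not jump:
        return True  # consecutive counters: nothing needs reporting
    return any(e == COUNTER_JUMP for e in events)

# A counter jump that the system failed to report violates the invariant.
assert check_counter_jump_invariant(Vcdu(7, 3), []) is False
assert check_counter_jump_invariant(Vcdu(7, 3), [COUNTER_JUMP]) is True
assert check_counter_jump_invariant(Vcdu(4, 3), []) is True
```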

[Figure: the data model drives Field Data Mutation Based Generation of test inputs; each test input is executed on the Software Under Test, and its output, mapped to a model instance, is checked against the constraints, reporting any violated constraints.]

Model-Based Mutation Testing

Generic mutation operators (reusable across projects):
• Class Instance Duplication
• Class Instance Removal
• Class Instances Swapping
• Attribute Replacement with Random
• Attribute Bit Flipping
• Attribute Replacement (Boundary Cond.)

Configurations for the operators fit the fault model.

[Figure: class diagram: Transmission with 1..* Vcdu; each Vcdu has a Header with attributes <Identifier> versionNumber : Integer, spaceCraftId : Integer, checksum : Integer.]

Configuration for the mutation operators is provided by UML stereotypes used to select mutation targets.
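The six generic operators could be sketched roughly as follows. This is a simplified Python illustration over plain lists and integers; the actual operators work on model instances and are configured via the stereotypes above, so all names and signatures here are hypothetical:

```python
import random

def bit_flip(value, bit):
    """Attribute Bit Flipping: flip one bit of an integer attribute."""
    return value ^ (1 << bit)

def replace_with_random(lo, hi, rng=random):
    """Attribute Replacement with Random: draw a fresh value from the field's domain."""
    return rng.randint(lo, hi)

def replace_with_boundary(lo, hi, rng=random):
    """Attribute Replacement (Boundary Cond.): a value at or just outside the valid range."""
    return rng.choice([lo, hi, lo - 1, hi + 1])

def duplicate_instance(instances, i):
    """Class Instance Duplication."""
    return instances[:i + 1] + [instances[i]] + instances[i + 1:]

def remove_instance(instances, i):
    """Class Instance Removal."""
    return instances[:i] + instances[i + 1:]

def swap_instances(instances, i, j):
    """Class Instances Swapping."""
    out = list(instances)
    out[i], out[j] = out[j], out[i]
    return out

packets = ["p0", "p1", "p2"]
assert duplicate_instance(packets, 1) == ["p0", "p1", "p1", "p2"]
assert remove_instance(packets, 0) == ["p1", "p2"]
assert swap_instances(packets, 0, 2) == ["p2", "p1", "p0"]
assert bit_flip(0b0101, 1) == 0b0111
```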

Field Data Mutation Based Generation produces many candidate test inputs; the goal is a test suite that is both effective and small.

• How to generate an effective and small test suite? By means of a meta-heuristic search algorithm.
• How to evaluate the effectiveness of a test suite? By measuring specific objectives.

Generic mutation operators (reusable across projects); configurations for the operators fit the fault model.

[Figure: class diagram: Transmission with 1..* Vcdu; each Vcdu has a Header with attributes <Identifier> versionNumber : Integer, spaceCraftId : Integer, <Derived> checksum : Integer.]

• UML stereotypes to select mutation targets
• UML stereotype to identify the fields to update
• OCL queries to express complex target selection criteria

How to determine if the generated test suite is effective?


Test Effectiveness Objectives

• O1: Include input data that covers all the classes of the data model
  • Data has a complex structure
• O2: Cover all the data faults of a fault model
  • A variety of faults might be present in a system
• O3: Cover all the clauses of the input/output constraints
  • Input/output constraints can have multiple conditions under which a given output is expected
• O4: Maximise code coverage
  • Implemented features should be fully executed

O1: Cover all the classes of the data model

• Coverage of each class of a data model is tracked

• Test input covers a class if it contains at least one instance of the class
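Tracking O1 can be sketched in a few lines of Python; the class names come from the slides, while representing a test input as a flat list of objects (rather than a nested model instance) is a simplification:

```python
def class_coverage(test_input, all_classes):
    """O1: a test input covers a class if it contains at least one instance of it."""
    present = {type(obj).__name__ for obj in test_input}
    return {cls: cls in present for cls in all_classes}

class Vcdu: pass
class Header: pass
class Packet: pass

cov = class_coverage([Vcdu(), Header(), Packet(), Packet()],
                     ["Vcdu", "Header", "Packet", "IdlePacketZone"])
assert cov["Packet"] is True
assert cov["IdlePacketZone"] is False
```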

O1: Cover all the classes of the data model

[Figure: class diagram with Transmission, Vcdu, Header (versionNumber, vcFrameCount, checksum : Integer), PacketZone, ActivePacketZone, IdlePacketZone, and Packet.]

O1: Cover all the classes of the data model

Objective Targets     Inp1  Inp2  Inp3
Vcdu                   X     X     X
Header                 X     X     X
IdlePacketZone         X     X
ActivePacketZone       X     X
Packet                 X     X     X

O2: Cover the fault model

• Attributes and class instances of the input data model can be mutated in different ways by different mutation operators

• Keep track of which mutation operator(s) have been applied to a specific class/attribute instance when generating test data

[Figure: class diagram: Vcdu with one Header (versionNumber : Integer, vcFrameCount : Integer) and 1..* Packet.]

Attribute instance mutation operators (examples):
• Header.versionNumber::ReplaceWithRandom
• Header.vcFrameCount::ReplaceWithRandom

Class instance mutation operators (examples):
• Packet::InstanceDuplication
• Packet::InstanceRemoval
• Packet::InstanceSwapping

O2: Cover the fault model

Objective Targets                          Inp1  Inp2  Inp3
Header.versionNumber::ReplaceWithRandom     X     X
Header.vcFrameCount::ReplaceWithRandom      X
Packet::InstanceRemoval
Packet::InstanceDuplication
Packet::InstanceSwapping                                X
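Bookkeeping for O2 amounts to recording which (target, operator) pairs each test input exercises while test data is generated. A hypothetical Python sketch (the target strings mirror the table above; the class and its methods are illustrative):

```python
class FaultModelCoverage:
    """O2: track which (target, operator) pairs each test input exercises."""
    def __init__(self):
        self.covered = {}  # "Target::Operator" -> set of input names

    def record(self, input_name, target, operator):
        self.covered.setdefault(f"{target}::{operator}", set()).add(input_name)

    def uncovered(self, all_targets):
        """Fault-model targets no generated input has exercised yet."""
        return [t for t in all_targets if t not in self.covered]

cov = FaultModelCoverage()
cov.record("Inp1", "Header.versionNumber", "ReplaceWithRandom")
cov.record("Inp3", "Packet", "InstanceSwapping")
targets = ["Header.versionNumber::ReplaceWithRandom",
           "Header.vcFrameCount::ReplaceWithRandom",
           "Packet::InstanceSwapping"]
assert cov.uncovered(targets) == ["Header.vcFrameCount::ReplaceWithRandom"]
```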

O3: Cover clauses of constraints

• An input/output constraint shows the output expected under a given input condition

• The test suite should stress all the conditions under which a given output is expected

O3: Cover clauses of constraints

context Vcdu inv:
  if previousFrameCount < 16777215
  then frameCount <> previousFrameCount + 1
  else previousFrameCount = 16777215 and frameCount <> 0
  endif
  implies VcduEvent.allInstances()->exists(e | e.eventType = COUNTER_JUMP)

For each clause, keep track of whether a test input makes the clause true and/or false.

O3: Cover clauses of constraints

Objective Targets                              Inp1  Inp2  Inp3
True  : previousFrameCount < 16777215           X     X     X
True  : frameCount <> previousFrameCount + 1    X
True  : previousFrameCount = 16777215
True  : frameCount <> 0                         X     X     X
False : previousFrameCount < 16777215
False : frameCount <> previousFrameCount + 1    X     X     X
False : previousFrameCount = 16777215           X     X     X
False : frameCount <> 0
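Clause coverage can be sketched by evaluating every clause of the invariant on each input and collecting the (clause, truth value) pairs the suite reaches. A Python illustration under the assumption that inputs reduce to (previousFrameCount, frameCount) pairs; since a real test input holds many Vcdus, one input can drive the same clause both true and false:

```python
def clause_outcomes(previous_frame_count, frame_count, max_count=16777215):
    """O3: truth value of each clause of the invariant for one Vcdu."""
    return {
        "previousFrameCount < max": previous_frame_count < max_count,
        "frameCount <> previousFrameCount + 1": frame_count != previous_frame_count + 1,
        "previousFrameCount = max": previous_frame_count == max_count,
        "frameCount <> 0": frame_count != 0,
    }

def suite_clause_coverage(vcdus):
    """Which (clause, truth value) targets does the suite cover?"""
    covered = set()
    for prev, cur in vcdus:
        for clause, value in clause_outcomes(prev, cur).items():
            covered.add((clause, value))
    return covered

cov = suite_clause_coverage([(5, 6), (5, 9), (16777215, 0)])
assert ("frameCount <> previousFrameCount + 1", True) in cov
assert ("previousFrameCount = max", True) in cov
```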

O4: Maximise code coverage

• Execute JaCoCo to measure the instructions covered by each test case

Objective Targets              Inp1  Inp2  Inp3
SesDaq.java : Instruction 10    X     X     X
SesDaq.java : Instruction 11    X
…

• Limitation: requires the execution of the system under test

Evolutionary Algorithm with Archive

How to generate an effective and small test suite?
• A huge number of test inputs can be generated
• Exhaustive test generation is not feasible

Sampling new chunks from the field data (satellite transmission):

• No seeding: chunks are sampled at random, so frequent packet types are selected more often.
• With seeding: all packet types are given the same selection probability.
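The difference can be sketched as follows; representing chunks as dicts with a "type" key and the two-step draw are assumptions for illustration, not the tool's actual data structures:

```python
import random

def sample_chunk(field_chunks, smart_seeding, rng):
    """Without seeding, chunks are drawn uniformly from the raw field data,
    so frequent packet types dominate; with smart seeding, a packet type is
    chosen first with equal probability, then a chunk of that type is drawn."""
    if not smart_seeding:
        return rng.choice(field_chunks)
    by_type = {}
    for chunk in field_chunks:
        by_type.setdefault(chunk["type"], []).append(chunk)
    chosen_type = rng.choice(sorted(by_type))
    return rng.choice(by_type[chosen_type])

rng = random.Random(0)
chunks = [{"type": "telemetry"}] * 98 + [{"type": "idle"}, {"type": "event"}]
seeded = [sample_chunk(chunks, True, rng)["type"] for _ in range(300)]
# With seeding, rare packet types are sampled as often as frequent ones.
assert {"telemetry", "idle", "event"} <= set(seeded)
```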

[Figure: search loop: sample a new chunk from field data (with filtering) or copy a test input from the archive, optionally apply a mutation, assess the result against the objectives, and put it in the archive (with pruning).]
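One iteration of the archive-based search could be sketched as below. This is an interpretation of the loop in the figure under simplifying assumptions (the archive is a plain list, pruning of subsumed inputs is omitted, and `targets_of`, `mutate`, and `sample` are stand-ins for the objective evaluation and the operators from the previous slides):

```python
import random

def search_step(archive, field_chunks, targets_of, p_sampling, p_mutation,
                max_mutations, mutate, sample, rng):
    """One iteration of the archive-based evolutionary algorithm (sketch)."""
    if not archive or rng.random() < p_sampling:
        candidate = sample(field_chunks, rng)      # explore: new chunk
    else:
        candidate = rng.choice(archive)            # exploit: reuse archived input
    if rng.random() < p_mutation:
        for _ in range(rng.randint(1, max_mutations)):
            candidate = mutate(candidate, rng)
    covered = targets_of(candidate)
    already = set().union(*(targets_of(t) for t in archive)) if archive else set()
    if covered - already:         # assessment: keep inputs hitting new targets
        archive.append(candidate)  # (pruning of subsumed inputs omitted)
    return archive

# Toy run: inputs are ints, the "objective targets" are their values mod 5.
rng = random.Random(1)
archive = []
for _ in range(200):
    search_step(archive, [0, 1, 2], lambda n: {n % 5}, 0.5, 0.5, 3,
                lambda n, r: n + 1, lambda c, r: r.choice(c), rng)
assert 1 <= len(archive) <= 5  # at most one archived input per target
```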


Objective Targets                              Inp1  Inp2  Inp3

Objective 1: Data model coverage
  Vcdu                                          X     X     X
  Header                                        X     X     X
  IdlePacketZone                                X     X
  ActivePacketZone                              X     X     X
  Packet                                        X     X     X

Objective 2: Fault model coverage
  Header.versionNumber::ReplaceWithRandom       X     X
  Header.vcFrameCount::ReplaceWithRandom        X
  Packet::InstanceRemoval                       X
  Packet::InstanceDuplication
  Packet::InstanceSwapping

Objective 3: Constraint clause coverage
  True  : previousFrameCount < 16777215         X     X     X
  True  : frameCount <> previousFrameCount + 1  X
  True  : previousFrameCount = 16777215
  True  : frameCount <> 0                       X     X     X
  False : previousFrameCount < 16777215
  False : frameCount <> previousFrameCount + 1  X     X     X
  False : previousFrameCount = 16777215         X     X     X
  False : frameCount <> 0

Objective 4: Code coverage
  SesDaq.java : Line 10                         X     X     X
  SesDaq.java : Line 11                         X


[Figure: the generated test inputs are executed on the system and the results are validated, reporting constraint violations.]


Research questions

• RQ1: How does the search algorithm compare with random and state-of-the-art approaches?
• RQ2: How does fitness based on code coverage affect performance?
• RQ3: How does seeding affect performance?
• RQ4: What are the configuration parameters that affect performance?
• RQ5: What configuration should be used in practice?

Case study: a satellite DAQ developed by SES.


Experimental parameters:

• p seeding = 0, 0.5
• p mutation = 0, 0.5, 1
• p sampling = 0.3, 0.5, 0.8
• Max mutations = 1, 10, 100
• Coverage-fitness: on, off
• Stop after: 50k, 100k, 150k, 200k, 250k

This leads to 3 × 3 × 3 × 2 × 2 = 108 configurations; combined with the five search budgets, 108 × 5 = 540 different configurations of the search algorithm. Each experiment was repeated 5 times to account for randomness: 540 × 5 = 2700 runs.
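The experiment sizes above can be confirmed by enumerating the parameter grid directly:

```python
from itertools import product

p_seeding = [0, 0.5]
p_mutation = [0, 0.5, 1]
p_sampling = [0.3, 0.5, 0.8]
max_mutations = [1, 10, 100]
coverage_fitness = [True, False]
budgets = [50_000, 100_000, 150_000, 200_000, 250_000]

configs = list(product(p_seeding, p_mutation, p_sampling,
                       max_mutations, coverage_fitness))
assert len(configs) == 108                      # 2 x 3 x 3 x 3 x 2
assert len(configs) * len(budgets) == 540       # per-budget configurations
assert len(configs) * len(budgets) * 5 == 2700  # 5 repetitions per configuration
```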

RQ1: How does the search algorithm compare with random and state-of-the-art approaches?

Budget (in Cadus)  Configuration              Coverage   # of Tests
50k                Best: r=0.5, m=1, n=100    23424.4    28.4
                   BO:   r=0.5, m=1, n=100    23424.4    28.4
                   Rand: r=1,   m=1, n=1      23386.8    43.2
100k               Best: r=0.5, m=1, n=100    23487.8    31.6
                   BO:   r=0.5, m=1, n=100    23487.8    31.6
                   Rand: r=1,   m=1, n=1      23436.8    52.0
150k               Best: r=0.5, m=1, n=100    23502.0    34.0
                   BO:   r=0.5, m=1, n=100    23502.0    34.0
                   Rand: r=1,   m=1, n=1      23453.4    57.8
200k               Best: r=0.5, m=0.5, n=100  23519.6    34.6
                   BO:   r=0.5, m=1, n=100    23513.4    36.0
                   Rand: r=1,   m=1, n=1      23465.8    60.2
250k               Best: r=0.5, m=1, n=10     23538.6    38.4
                   BO:   r=0.5, m=1, n=100    23515.2    36.4
                   Rand: r=1,   m=1, n=1      23482.6    62.4

r: probability of random sampling; m: probability of applying mutation when sampling; n: maximum number of allowed mutations in a test (seeding not used).
Best: best configuration for the given search budget; BO: best configuration, on average, over all the search budgets; Rand: random approach.

RQ1: How does the search algorithm compare with random and state-of-the-art approaches?

• Random approach

• Always sample and mutate; do not reuse archived items

• Previous approach (ICST’15)

• Stops test input generation when all attributes have been mutated at least once by each applicable mutation operator

• Search-based algorithm

• Best overall configuration

• Best configuration for a given budget

Baseline from ICST’15: average coverage 23283.0 with 43.0 tests.

The search algorithm achieves better coverage than both the random and the ICST’15 approaches, and also generates significantly smaller test suites.

With higher search budgets, search can achieve greater coverage, at the cost of a larger test suite.

APT, the ICST’15 approach, achieved lower average coverage than both search and random, while search also generates significantly smaller test suites.

RQ2: How does fitness based on code coverage affect performance?

Budget  Code  Seeding  Configuration              Coverage  # of Tests  # of Mut.
50k     F     0.0      Best: r=0.5, m=1, n=100    23361.4   17.0        4.8
        T     0.0      Best: r=0.5, m=1, n=100    23424.4   28.4        3.6
        F     0.5      Best: r=0.5, m=1, n=10     23417.2   21.0        4.0
        T     0.5      Best: r=0.5, m=1, n=10     23428.4   34.2        3.2
        T     0.5      BO:   r=0.3, m=0, n=10     23401.8   27.0        4.3
100k    F     0.0      Best: r=0.3, m=1, n=10     23404.4   16.8        8.2
        T     0.0      Best: r=0.5, m=1, n=100    23487.8   31.6        4.9
        F     0.5      Best: r=0.5, m=1, n=10     23442.2   21.0        6.4
        T     0.5      Best: r=0.3, m=0, n=10     23487.0   33.2        5.6
        T     0.5      BO:   r=0.3, m=0, n=10     23487.0   33.2        5.6
150k    F     0.0      Best: r=0.8, m=1, n=100    23418.4   28.2        4.0
        T     0.0      Best: r=0.5, m=1, n=100    23502.0   34.0        6.0
        F     0.5      Best: r=0.5, m=1, n=100    23447.4   23.4        7.5
        T     0.5      Best: r=0.3, m=0, n=10     23528.2   35.6        6.5
        T     0.5      BO:   r=0.3, m=0, n=10     23528.2   35.6        6.5
200k    F     0.0      Best: r=0.8, m=1, n=100    23426.0   28.0        4.7
        T     0.0      Best: r=0.5, m=0.5, n=100  23519.6   34.6        6.7
        F     0.5      Best: r=0.5, m=1, n=100    23456.0   23.2        9.2
        T     0.5      Best: r=0.3, m=0, n=10     23551.0   37.2        7.0
        T     0.5      BO:   r=0.3, m=0, n=10     23551.0   37.2        7.0
250k    F     0.0      Best: r=0.8, m=1, n=100    23433.2   28.6        5.4
        T     0.0      Best: r=0.5, m=1, n=10     23538.6   38.4        7.1
        F     0.5      Best: r=0.5, m=1, n=100    23461.8   23.6        10.3
        T     0.5      Best: r=0.3, m=0, n=10     23554.4   37.2        7.4
        T     0.5      BO:   r=0.3, m=0, n=10     23554.4   37.2        7.4

Code: code coverage fitness enabled (T/F); Seeding: p seeding.
r: probability of random sampling; m: probability of applying mutation when sampling; n: maximum number of allowed mutations in a test.
Best: best configuration for the given search budget; BO: best configuration, on average, over all the search budgets.


RQ2: How does fitness based on code coverage affect performance?

• For each search budget:

•  Identified the best configuration with/without the code coverage objective enabled

Code coverage objective results in test suites with higher code coverage.

At the expense of a larger test suite (50% more test cases).

RQ3: How does seeding affect performance?

• For each search budget:

•  Identified the best configuration with/without seeding

Seeding is always part of the configurations that achieve the highest code coverage or the lowest number of test cases (for search budgets above 150k).

RQ3: How does smart seeding affect performance?

For search budgets greater than 150k, smart seeding achieves the highest coverage or the lowest number of test cases.

RQ4: What are the configuration parameters that affect performance?

[Figure: search loop annotated with the tuned parameters: p sampling = 0.3, 0.5, 0.8; max mutations = 1, 10, 100; p seeding = 0, 0.5; p mutation = 0, 0.5, 1; coverage-fitness: on, off; stop after: 50k, 100k, 150k, 200k, 250k.]

• Coverage fitness is applied in the top configurations, never by the worst ones.
• For small search budgets, search achieves better results when more focused on exploitation (using archived inputs).
• For larger search budgets, with no seeding or coverage, putting more emphasis on exploration (new samples) pays off.
• If either seeding or coverage fitness is used, the need to explore the search landscape decreases.
• If both seeding and coverage fitness are used, the need to explore the search landscape decreases further.
• The average number of mutations per test input remains low (~10).

RQ5: What configuration should be used in practice?

Recommended (higher coverage, smaller test suites):
• Small probability of sampling new test data at random (p = 0.3)
• Do not mutate new inputs immediately when sampled
• Limit the maximum number of mutations (max mutations = 10)
• Use seeding and code coverage fitness