
Data Dependence Based Testability Transformation in Automated Test Generation

Presented by: Qi Zhang

Outline

Introduction to test data generation

Test data generation methods

Data dependence oriented test generation

Testability transformation in test data generation

Conclusions

Test Data Generation Problem

Given: a target

Goal: find a program input on which the target is executed

Example

    #include <stdbool.h>
    #include <stdio.h>

    void F(int a[10], int b[10], int target) {
        int i;
        bool fa, fb;
        i = 1;
        fa = false;
        fb = false;
        while (i < 10) {
            if (a[i] == target) fa = true;
            i = i + 1;
        }
        if (fa == true) {
            i = 1;
            fb = true;
            while (i < 10) {
                if (b[i] != target) fb = false;
                i = i + 1;
            }
        }
        if (fb == true) printf("message1");
        else printf("message2");
    }

target statement

Target

A statement

A branch

A path

A data flow

A multiple condition

An assertion

A specific output value

…

Application of Test Data Generation

Code-based (white-box) testing

Identification of program properties

Specification-based testing

Testing specification conformance

…

Test Data Generation Methods

Random test generation

Path-oriented test generation

Symbolic execution oriented test generation

Execution-oriented test generation

Goal-oriented test generation

Chaining approach of test generation

Simulated annealing

Evolutionary algorithms

…

Path-Oriented Test Generation

[Flowchart: given a target statement S, select a path P to S and try to find an input that executes P. If no input is found, select another path. If an input is found, it executes path P and target statement S.]

Example for path-oriented test generation

    1      input(a, n);
    2      max = a[1];
    3      min = a[1];
    4      i = 2;
    5      while (i <= n) {
    6,7        if (max < a[i]) max = a[i];
    8,9        if (min > a[i]) min = a[i];
    10         i = i + 1;
           }
    11     output(min, max);

[Figure: control flow graph of the example, with nodes en, 1 to 11, and ex]

Path-oriented test generation

Finding an input to execute the selected path:

Symbolic execution oriented test generation

Execution-oriented test generation

Path-Oriented Test Generation

Problems:

Selected paths are frequently non-executable

A lot of search effort is “wasted” on non-executable paths

The approach is considered restrictive in the presence of loops

Goal-Oriented Test Generation

Paths are not selected

Based on actual program execution

A control graph of the program is used

It solves problems (sub-goals) as they occur to reach the target statement

Fitness functions are used to guide the search

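Goal-Oriented Test Generation

[Figure: the program is executed on any input. At a problem node, one branch is marked "this execution does not lead to the target" and the other "this execution may lead to the target"; the target statement lies beyond the problem node.]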





[Figure: control flow graph of the min/max example with the target statement marked]





Initial input: a = {2, 7}, n = -5





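On the initial input a = {2, 7}, n = -5, node 5 is reached with i = 2, so the fitness function there is F = i - n = 2 - (-5) = 7 and the loop body is not entered.

Find new values of a and n such that F <= 0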


Many search algorithms can be used to find a new program input based on the fitness function:

Hill-climbing algorithm

Simulated annealing

Evolutionary algorithm

…
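As a rough illustration of how a fitness function can guide the search, the sketch below applies a very simple hill climber to the min/max example. It assumes the fitness at node 5 is F = i - n with i = 2, as in the example. The names fitness_node5 and hill_climb_n are hypothetical and do not come from the slides, and only n is varied because F does not depend on a.

```c
#include <assert.h>

/* Fitness at node 5, "while (i <= n)": the loop body is entered when
   F = i - n <= 0. Node 4 always sets i = 2 before node 5 is first reached. */
int fitness_node5(int n) {
    const int i = 2;
    return i - n;
}

/* Hypothetical hill climber over n alone: move to a neighbor (n + 1 or
   n - 1) whenever it lowers F, and stop once F <= 0, no neighbor
   improves F, or max_steps moves have been made. Returns the final n. */
int hill_climb_n(int n, int max_steps) {
    assert(max_steps >= 0);
    for (int step = 0; step < max_steps && fitness_node5(n) > 0; step++) {
        int current = fitness_node5(n);
        if (fitness_node5(n + 1) < current) {
            n = n + 1;
        } else if (fitness_node5(n - 1) < current) {
            n = n - 1;
        } else {
            break;  /* local minimum: no neighbor improves F */
        }
    }
    return n;
}
```

Starting from n = -5 (F = 7), hill_climb_n(-5, 100) stops at n = 2, where F = 0 and the loop body is entered. A realistic search engine would execute the instrumented program to obtain F rather than use a closed-form expression.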

Chaining Approach

The chaining approach is an extension of the goal-oriented approach

The chaining approach uses:

Control flow graph

Data flow (data dependence) information

    1      void F(int A[], int C[]) {
               int i, j, top, f_exit;
    2          i = 1;
    3          j = 1;
    4          top = 0;
    5          f_exit = 0;
    6          while (C[j] < 5) {
    7              j = j + 1;
    8              if (C[j] == 1) {
    9                  i = i + 1;
    10                 if (A[i] > 0) {
    11                     top = top + 1;
    12                     AR[top] = A[i];
                       }
                   }
    13             if (C[j] == 2) {
    14                 if (top > 0) {
    15                     write(AR[top]);
    16                     top = top - 1;
                       }
                   }
    17             if (C[j] == 3) {
    18                 if (top > 100) {
    19                     write(1);    // target statement
                       }
    20                 else write(0);
                   }
               }  // endwhile
           }

Data dependence concepts

There exists a data dependence between statements S1 and S2 if:

S1 is a definition of variable v (assigns a value to v)

S2 is a use of variable v (references v)

There exists a path in the program from S1 to S2 along which v is not modified
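The three conditions above can be checked mechanically on a control flow graph. The following is a minimal sketch under stated assumptions: the Cfg structure, its field names, and the function data_dependent are hypothetical, the graph is an adjacency matrix for a single variable v, and "not modified" is taken to mean that no node strictly between S1 and S2 on the path defines v.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical control flow graph, annotated for one variable v. */
typedef struct {
    int n;                            /* number of nodes in use */
    bool edge[MAX_NODES][MAX_NODES];  /* edge[u][w]: control flows u -> w */
    bool defines_v[MAX_NODES];        /* node assigns a value to v */
    bool uses_v[MAX_NODES];           /* node references v */
} Cfg;

/* Depth-first search for a path from `from` to `to` whose intermediate
   nodes do not define v. The target is tested before the definition
   check, so `to` may itself redefine v after using it (top = top + 1). */
static bool reach_clear(const Cfg *g, int from, int to, bool visited[]) {
    for (int next = 0; next < g->n; next++) {
        if (!g->edge[from][next]) continue;
        if (next == to) return true;
        if (visited[next] || g->defines_v[next]) continue;
        visited[next] = true;
        if (reach_clear(g, next, to, visited)) return true;
    }
    return false;
}

/* True if s2 is data dependent on s1 with respect to v. */
bool data_dependent(const Cfg *g, int s1, int s2) {
    assert(s1 >= 0 && s1 < g->n && s2 >= 0 && s2 < g->n);
    if (!g->defines_v[s1] || !g->uses_v[s2]) return false;
    bool visited[MAX_NODES] = { false };
    return reach_clear(g, s1, s2, visited);
}
```

For instance, with edges 0 -> 1, 1 -> 2, 1 -> 3 and 2 -> 3, definitions of v at nodes 0 and 2, and a use at node 3, node 3 depends on both definitions. Removing the edge 1 -> 3 leaves only the path through node 2, which redefines v, so the dependence on node 0 disappears.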


Chaining Approach

It may significantly increase chances of finding inputs over the goal-oriented approach

It relies on direct data dependences related to problem statements

The chaining approach does not have a “global view” of dependences in the program

Data Dependence Based Test Generation

We present data dependence based test generation

This approach uses a data dependence graph rather than individual data dependences during the search




Data dependence based test generation is used when the existing methods fail to find the solution

Suppose the existing methods fail at some conditional statement (predicate) which is referred to as a problem node

The data dependence based test generation constructs a data dependence graph which contains the statements that influence the problem node


The data dependence graph is used by the search engine to guide the search

The data dependence based test generation identifies different sequences for exploration in the data-dependence graph leading to the problem statement

The identified sequences are used in the program to guide the search


[Figure: data-dependence graph with nodes 4, 11, 16 and 18, showing the data dependences with respect to variable top]






Example sequence through the data-dependence graph: en, 4, 11, 16, 11, 18


Sample sequences generated from the data dependence graph:

P1: en, 4, 18

P2: en, 4, 11, 18

P3: en, 4, 16, 18

P4: en, 4, 11, 16, 18

P5: en, 4, 16, 11, 18

…
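One way to picture how such sequences arise is to enumerate paths through the data-dependence graph from the definition at node 4 to the problem node 18. The sketch below is illustrative only. The edge set DEP is an assumption read off the program listing (each of nodes 4, 11 and 16 defines top, and each of nodes 11, 16 and 18 uses it). The names extend and count_sequences are hypothetical, and each node is visited at most once, so repetitions such as 11, 16, 11 are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>

#define N_NODES 4

/* Node labels from the stack example; index 0 is the definition at
   node 4 and index 3 is the problem node 18. */
static const int LABEL[N_NODES] = { 4, 11, 16, 18 };

/* DEP[u][w] != 0: node w uses the value of top defined at node u.
   Assumed edge set, read off the program listing. */
static const int DEP[N_NODES][N_NODES] = {
    /* to:  4  11  16  18 */
    {       0,  1,  1,  1 },  /* from 4  */
    {       0,  1,  1,  1 },  /* from 11 */
    {       0,  1,  1,  1 },  /* from 16 */
    {       0,  0,  0,  0 },  /* from 18 */
};

/* Extend the partial sequence seq[0..len-1]; return how many complete
   sequences (ending at node 18) were found, printing them on request. */
static int extend(int seq[], int len, int used[], int print) {
    assert(len >= 1 && len <= N_NODES);
    int last = seq[len - 1];
    if (last == N_NODES - 1) {
        if (print) {
            printf("en");
            for (int k = 0; k < len; k++) printf(", %d", LABEL[seq[k]]);
            printf("\n");
        }
        return 1;
    }
    int count = 0;
    for (int next = 0; next < N_NODES; next++) {
        if (!DEP[last][next] || used[next]) continue;
        used[next] = 1;
        seq[len] = next;
        count += extend(seq, len + 1, used, print);
        used[next] = 0;
    }
    return count;
}

/* Enumerate all repetition-free sequences from node 4 to node 18. */
int count_sequences(int print) {
    int seq[N_NODES] = { 0 };           /* every sequence starts at node 4 */
    int used[N_NODES] = { 1, 0, 0, 0 };
    return extend(seq, 1, used, print);
}
```

Under these assumptions count_sequences(0) returns 5, which corresponds to the sequences P1 to P5 listed above (possibly in a different order when printed).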




The identified sequences are used by the search engine to “execute” (explore) them in the program


For some programs, a large number of different sequences can be generated from the data dependence graph for exploration before the solution is found

Many sequences may not lead to the solution

It may be expensive to explore sequences in the original program

The search engine may require a lot of effort to move from one node to another as specified by the sequences

Testability transformation

The idea is to explore these sequences not in the original program but in a transformed program, in which it should be much easier (faster) to determine whether the fitness function associated with the problem node may evaluate to the target value for a given sequence


[Figure: the original program takes input x; the transformed program takes input x and a sequence S and computes the fitness function F]


The transformed program is used to identify promising sequences

A promising sequence is a sequence for which it is possible to find a program input on which the fitness function at the problem node evaluates to the target value


[Figure: the transformed program takes input x and sequence S and computes the fitness function F]

Find an input x on which the fitness function F evaluates to the target value during execution of sequence S


It is inexpensive to identify promising/unpromising sequences in the transformed program

Identified promising sequences are then explored in the original program to find the solution


A data dependence graph is used to construct a “corresponding (transformed) program”



    float TransFunc(int A[], int C[], int PathSize, int S[], int R[]) {
        int i, j, top;
        i = 1;
        while (i <= PathSize) {
            switch (S[i]) {
                case 4: {
                    top = 0;                                    // node 4
                    break;
                }
                case 11: {
                    top = top + 1;                              // node 11
                    for (j = 1; j < R[i]; j++) top = top + 1;   // R[i] executions in total
                    break;
                }
                case 16: {
                    top = top - 1;                              // node 16
                    for (j = 1; j < R[i]; j++) top = top - 1;   // R[i] executions in total
                    break;
                }
            }
            i++;
        }
        return 100 - top;  // computation of the fitness function at node 18
    }



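How many times should each node in the sequence be executed? In the transformed program, R[i] gives the number of times node S[i] is executed.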


[Figure: the transformed program takes PathSize, the sequence S, and the inputs R[], A[] and C[], and computes F]

Find inputs A[], C[], and R[] on which F < 0 during execution of sequence S


Sequence to explore: en, 4, 11*, 18


Given: PathSize = 2, S = <4, 11>

Find: R = <?, ?>, A = <?, ?>, C = <?, ?>

Such that F < 0


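Solution:

R = <1, 101>

A = <-, -> (any values)

C = <-, -> (any values)

With these values top = 101 at node 18, so F = 100 - 101 = -1 < 0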
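The solution R = <1, 101> can be reproduced with a small restatement of the transformed program. This is a sketch, not the authors' code: trans_func mirrors TransFunc but uses 0-based arrays, returns an int, and drops A and C because the fitness at node 18 does not depend on them. The helper min_repetitions_node11 is a hypothetical linear search standing in for a real search engine.

```c
#include <assert.h>

/* Restatement of TransFunc: replays sequence S, executing node S[i]
   R[i] times (at least once), and returns the fitness F = 100 - top
   at node 18. */
int trans_func(int path_size, const int S[], const int R[]) {
    int top = 0;
    assert(path_size >= 0);
    for (int i = 0; i < path_size; i++) {
        switch (S[i]) {
        case 4:                       /* node 4: top = 0 */
            top = 0;
            break;
        case 11:                      /* node 11: top = top + 1 */
            top = top + 1;
            for (int j = 1; j < R[i]; j++) top = top + 1;
            break;
        case 16:                      /* node 16: top = top - 1 */
            top = top - 1;
            for (int j = 1; j < R[i]; j++) top = top - 1;
            break;
        default:
            break;
        }
    }
    return 100 - top;
}

/* Hypothetical search: smallest repetition count r for node 11 in the
   sequence <4, 11> that makes F < 0, or -1 if none exists up to max_r. */
int min_repetitions_node11(int max_r) {
    const int S[2] = { 4, 11 };
    for (int r = 1; r <= max_r; r++) {
        const int R[2] = { 1, r };
        if (trans_func(2, S, R) < 0) return r;
    }
    return -1;
}
```

With S = <4, 11> and R = <1, 101> the function returns -1, so F < 0, and min_repetitions_node11(200) returns 101. Each evaluation only replays the assignments to top, which is why exploring a sequence here is much cheaper than steering the original program through the same nodes.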


Promising sequence: en, 4, 11^101, 18 (node 11 is executed 101 times, that is, repeated 100 times after its first execution)


Given a promising sequence:

S = <4, 11^101, 18>

How much saving?

With transformed program:

At most 5 sequence explorations in the transformed program

Only one sequence identified as a promising one

Without transformed program:

In the best case, over 100 sequence explorations


Multiple variables?


[Figure: data-dependence graph with nodes 1, 5, 6, 22, 24, 29, 30, 31, 41, 43 and 55 over the variables linepos, wordlen and maxpos]


For data dependence graphs with multiple variables, we first identify data dependence execution graphs (rather than sequences)

In the next step, sequences for exploration are generated from these data dependence execution graphs


Data dependence execution graph

Each execution graph represents a different way the fitness function associated with a problem node may be computed

The execution graph contains all dependences that may occur during program execution

The execution graph is derived from the data dependence graph by traversing backwards from the problem node

Data Dependence Based Test Generation

[Figure: a sample data dependence execution graph over nodes 1, 29, 30, 43 and 55 (nodes 43 and 55 appear twice), with edges labeled wordlen, maxpos and linepos]


For data dependence graphs with multiple variables, we first identify data dependence execution graphs (rather than sequences)

In the next step, valid sequences for exploration are generated from these data dependence execution graphs


Valid sequence

Represents a possible sequence of executions of nodes in the execution graph

All data dependences in the execution graph are preserved

A valid sequence for the sample execution graph: 1, 43, 55, 29, 43, 55, 30


Generated sequences are explored in the transformed program to identify promising/unpromising sequences

Identified promising sequences are then explored in the original program to find the solution

Conclusions

Data dependence analysis is used to guide transformations to improve testability

The transformations can improve the test data generation

The transformations employed do not preserve the meaning of the program, yet this is unimportant in the context of test data generation

Conclusions

By using testability transformation:

The chances of finding a solution are increased

It is much easier to explore different data dependence sequences

The search may find a solution more efficiently

Questions?