Data Dependence Based Testability Transformation in Automated Test Generation
Presented by: Qi Zhang
Outline
Introduction to test data generation
Test data generation methods
Data dependence oriented test generation
Testability transformation in test data generation
Conclusions
Test Data Generation Problem
Given: a target
Goal: find a program input on which the target is executed
Example
void F(int a[10], int b[10], int target) {
    int i;
    bool fa, fb;
    i = 1;
    fa = false;
    fb = false;
    while (i < 10) {
        if (a[i] == target) fa = true;
        i = i + 1;
    }
    if (fa == true) {
        i = 1;
        fb = true;
        while (i < 10) {
            if (b[i] != target) fb = false;
            i = i + 1;
        }
    }
    if (fb == true) printf("message1");
    else printf("message2");
}
target statement
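A minimal runnable sketch of the example, assuming the target statement is printf("message1") (the slide marks the target only graphically, so this choice is an assumption). The helper name reaches_message1 is introduced here for illustration and is not part of the original program.

```c
#include <stdbool.h>

/* Returns true when the input (a, b, target) would execute
   printf("message1") in the example F. Index 0 is skipped,
   exactly as in the slide's loops (i runs from 1 to 9). */
bool reaches_message1(const int a[10], const int b[10], int target) {
    bool fa = false, fb = false;
    for (int i = 1; i < 10; i++)
        if (a[i] == target) fa = true;      /* target occurs in a[1..9] */
    if (fa) {
        fb = true;
        for (int i = 1; i < 10; i++)
            if (b[i] != target) fb = false; /* every b[1..9] must match */
    }
    return fb;
}
```

Under this assumption, an input reaches the target only if a[1..9] contains target and every element of b[1..9] equals target.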
Target
A statement
A branch
A path
A data flow
A multiple condition
An assertion
A specific output value
…
Application of Test Data Generation
Code-based (white-box) testing
Identification of program properties
Specification-based testing
Testing specification conformance
…
Test Data Generation Methods
Random test generation
Path-oriented test generation
Symbolic execution oriented test generation
Execution-oriented test generation
Goal-oriented test generation
Chaining approach of test generation
Simulated annealing
Evolutionary algorithms
…
Path-Oriented Test Generation
Given: target statement S
Select path P to target statement S
Find input to execute path P
Input found?
no: select another path P
yes: an input to execute path P and target statement S
Example for path oriented test generation
1      input(a, n);
2      max = a[1];
3      min = a[1];
4      i = 2;
5      while (i <= n) {
6,7        if (max < a[i]) max = a[i];
8,9        if (min > a[i]) min = a[i];
10         i = i + 1;
       }
11     output(min, max);
[Figure: control flow graph of the program with nodes en, 1 to 11, and ex]
Path-oriented test generation
Finding input to execute the selected path
Symbolic execution oriented test generation
Execution-oriented test generation
Path-Oriented Test Generation
Problems:
Selected paths are frequently non-executable
A lot of search effort is "wasted" on non-executable paths
It is considered restrictive in the presence of loops
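As an illustration of a non-executable path, consider a path through the min/max example that executes both statement 7 (max=a[i]) and statement 9 (min=a[i]) in the same loop iteration. The sketch below checks this by brute force over a small, hypothetical input domain (arrays a[1..3] with values 0 to 4); it is not symbolic execution, and the function names are introduced here for illustration only.

```c
#include <stdbool.h>

/* Runs the min/max example on a[1..n] and reports whether some loop
   iteration executes both statement 7 and statement 9. */
bool takes_both_branches(const int a[], int n) {
    int max = a[1], min = a[1];
    bool both = false;
    for (int i = 2; i <= n; i++) {
        bool took7 = false, took9 = false;
        if (max < a[i]) { max = a[i]; took7 = true; }
        if (min > a[i]) { min = a[i]; took9 = true; }
        if (took7 && took9) both = true;
    }
    return both;
}

/* Tries every array a[1..3] with values 0..4 and counts the inputs
   that execute statements 7 and 9 in a single iteration. */
int count_inputs_on_path(void) {
    int count = 0;
    for (int x = 0; x < 5; x++)
        for (int y = 0; y < 5; y++)
            for (int z = 0; z < 5; z++) {
                int a[4] = {0, x, y, z};   /* a[0] is unused */
                if (takes_both_branches(a, 3)) count++;
            }
    return count;
}
```

No input in this domain executes such a path: because min <= max always holds, a[i] cannot be greater than max and smaller than min at once. A path-oriented generator that selects this path spends its search effort without any chance of success.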
Goal-Oriented Test Generation
Paths are not selected
Based on actual program execution
A control flow graph of the program is used
It solves problems (sub-goals) as they occur to reach the target statement
Fitness functions are used to guide the search
Goal-Oriented Test Generation
Execute the program on any input
[Figure: control flow graph with the labels "problem node", "this execution does not lead to the target" (marked x), "this execution may lead to the target", and "target statement"]
Goal-Oriented Test Generation
1      input(a, n);
2      max = a[1];
3      min = a[1];
4      i = 2;
5      while (i <= n) {
6,7        if (max < a[i]) max = a[i];
8,9        if (min > a[i]) min = a[i];
10         i = i + 1;
       }
11     output(min, max);
[Figure: control flow graph of the program with nodes en, 1 to 11, and ex; the target statement is marked]
Initial input: a={2, 7}, n=-5
Initial input: a={2, 7}, n=-5
F = i - n = 7
Find new values of a and n such that F <= 0
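The fitness value above can be reproduced with a small instrumented version of the example. This is a sketch under two assumptions: the array is stored 1-based (a[1]=2, a[2]=7), and the fitness is taken at the first evaluation of node 5. The function name fitness_at_node5 is hypothetical.

```c
/* Executes statements 2-4 of the min/max example and returns the
   fitness F = i - n at the first evaluation of node 5,
   `while (i <= n)`. The loop body is entered when F <= 0. */
int fitness_at_node5(const int a[], int n) {
    int max = a[1];   /* statement 2 */
    int min = a[1];   /* statement 3 */
    int i = 2;        /* statement 4 */
    (void)max;        /* max and min do not influence F */
    (void)min;
    return i - n;
}
```

For the initial input, with n = -5, the function returns 7. Any input with n >= 2 gives F <= 0, so only n matters for this sub-goal and the contents of a are irrelevant.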
Goal-Oriented Test Generation
Many search algorithms can be used to find a new program input based on the fitness function:
Hill-climbing algorithm
Simulated annealing
Evolutionary algorithm
…
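A minimal hill-climbing sketch for the fitness F = i - n of the min/max example, where i = 2 at the first evaluation of the loop condition. The neighborhood (n - 1 and n + 1) and the step limit are assumptions made for illustration; a real search engine would also vary the array a.

```c
/* Fitness at node 5 of the min/max example: F = i - n with i = 2. */
static int fitness(int n) { return 2 - n; }

/* Simple hill climbing on n: look at n - 1 and n + 1, move to the
   neighbor with the lower fitness, and stop when F <= 0, when no
   neighbor improves, or when max_steps is exhausted. */
int hill_climb(int n, int max_steps) {
    for (int step = 0; step < max_steps && fitness(n) > 0; step++) {
        int best = n;
        if (fitness(n - 1) < fitness(best)) best = n - 1;
        if (fitness(n + 1) < fitness(best)) best = n + 1;
        if (best == n) break;   /* local minimum: no better neighbor */
        n = best;
    }
    return n;
}
```

Starting from n = -5 (F = 7), each step increases n by one, and the search stops at n = 2, where F = 0.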
Chaining Approach
The chaining approach is an extension of the goal-oriented approach
The chaining approach uses:
Control flow graph
Data flow (data dependence) information
1      void F(int A[], int C[]) {
           int i, j, top, f_exit;
2          i = 1;
3          j = 1;
4          top = 0;
5          f_exit = 0;
6          while (C[j] < 5) {
7              j = j + 1;
8              if (C[j] == 1) {
9                  i = i + 1;
10                 if (A[i] > 0) {
11,12                  top = top + 1; AR[top] = A[i];
                   };
               };
13             if (C[j] == 2) {
14                 if (top > 0) {
15,16                  write(AR[top]); top = top - 1;
                   };
               };
17             if (C[j] == 3) {
18,19              if (top > 100) { write(1); }   // target statement
20                 else write(0);
               };
           }; // endwhile
       }
Data dependence concepts
There exists a data dependence between statements S1 and S2 if:
S1 is a definition of variable v (assigns a value to v)
S2 is a use of variable v (references v)
There exists a path in the program from S1 to S2 along which v is not modified
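The definition can be checked mechanically on a control flow graph. The sketch below does so for variable top in the example F. The edge list is a simplified reading of F made for this illustration (nodes 11/12, 15/16, and 18/19/20 are merged, and node 0 stands for the exit), so treat it as an assumption rather than the authors' graph.

```c
#include <stdbool.h>

#define MAX_NODE 21

/* Simplified control flow edges of F (assumed; see the lead-in). */
static const int edges[][2] = {
    {4, 5}, {5, 6}, {6, 7}, {6, 0}, {7, 8}, {8, 9}, {8, 13}, {9, 10},
    {10, 11}, {10, 13}, {11, 13}, {13, 14}, {13, 17}, {14, 16},
    {14, 17}, {16, 17}, {17, 18}, {17, 6}, {18, 6}
};
static const int n_edges = sizeof edges / sizeof edges[0];

static bool defines_top(int node) {
    return node == 4 || node == 11 || node == 16;
}

static bool uses_top(int node) {
    return node == 11 || node == 14 || node == 16 || node == 18;
}

/* Depth-first search for a path from `from` to `target` that does not
   pass through another definition of top. */
static bool reach(int from, int target, bool visited[]) {
    for (int e = 0; e < n_edges; e++) {
        if (edges[e][0] != from) continue;
        int next = edges[e][1];
        if (next == target) return true;   /* the use is reached first */
        if (visited[next] || defines_top(next)) continue; /* top modified */
        visited[next] = true;
        if (reach(next, target, visited)) return true;
    }
    return false;
}

/* True if s1 defines top, s2 uses top, and some path from s1 to s2
   leaves top unmodified. */
bool data_dependent(int s1, int s2) {
    bool visited[MAX_NODE] = {false};
    if (!defines_top(s1) || !uses_top(s2)) return false;
    return reach(s1, s2, visited);
}
```

With this graph, data_dependent(4, 18), data_dependent(11, 16), and data_dependent(16, 11) all hold, while data_dependent(18, 11) does not because node 18 never assigns top.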
Chaining Approach
It may significantly increase chances of finding inputs over the goal-oriented approach
It relies on direct data dependences related to problem statements
The chaining approach does not have a “global view” of dependences in the program
Data Dependence Based Test Generation
We present data dependence based test generation
This approach uses a data dependence graph rather than individual data dependences during the search
Data Dependence Based Test Generation
Data dependence based test generation is used when the existing methods fail to find the solution
Suppose the existing methods fail at some conditional statement (predicate) which is referred to as a problem node
The data dependence based test generation constructs a data dependence graph which contains the statements that influence the problem node
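One way to collect the statements that influence a problem node is a backward traversal over data dependences. The sketch below assumes a hand-written, partial list of dependences read off the example F (this list is the sketch's own analysis, not taken from the slides) and marks every node that reaches the problem node by following dependences backwards.

```c
#include <stdbool.h>

#define MAX_NODE 21

/* A subset of data dependences (definition -> use) in the example F. */
static const int deps[][2] = {
    {4, 11}, {4, 14}, {4, 16}, {4, 18},       /* top */
    {11, 11}, {11, 14}, {11, 16}, {11, 18},   /* top */
    {16, 11}, {16, 14}, {16, 16}, {16, 18},   /* top */
    {2, 9}, {9, 9}, {9, 10}, {9, 12},         /* i   */
    {3, 6}, {3, 7}, {7, 6}, {7, 7}, {7, 8}, {7, 13}, {7, 17}, /* j */
    {12, 15}                                  /* AR  */
};
static const int n_deps = sizeof deps / sizeof deps[0];

/* Sets in_graph[k] to true for the problem node and for every node
   that influences it through a chain of data dependences. */
void influencing_nodes(int problem, bool in_graph[MAX_NODE]) {
    int stack[MAX_NODE], sp = 0;
    for (int k = 0; k < MAX_NODE; k++) in_graph[k] = false;
    in_graph[problem] = true;
    stack[sp++] = problem;
    while (sp > 0) {
        int use = stack[--sp];
        for (int d = 0; d < n_deps; d++) {
            int def = deps[d][0];
            if (deps[d][1] == use && !in_graph[def]) {
                in_graph[def] = true;   /* each node is pushed once */
                stack[sp++] = def;
            }
        }
    }
}
```

For problem node 18 this marks nodes 4, 11, 16, and 18. Statements that only define i, j, or AR are left out because they do not feed into top.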
Data Dependence Based Test Generation
The data dependence graph is used by the search engine to guide the search
The data dependence based test generation identifies different sequences for exploration in the data-dependence graph leading to the problem statement
The identified sequences are used in the program to guide the search
Data Dependence Based Test Generation
[Figure: data-dependence graph with nodes 4, 11, 16, and 18, showing data dependences with respect to variable top]
Example sequence traced through the graph: en, 4, 11, 16, 11, 18
Data Dependence Based Test Generation
Sample sequences generated from the data dependence graph:
P1: en, 4, 18
P2: en, 4, 11, 18
P3: en, 4, 16, 18
P4: en, 4, 11, 16, 18
P5: en, 4, 16, 11, 18
…
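The sample sequences can be enumerated mechanically. The sketch below assumes the edge set implied by P1 to P5 and restricts itself to sequences that visit each node at most once, so sequences with repeated nodes (such as en, 4, 11, 16, 11, 18) are deliberately excluded. All names are introduced here for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

#define EN 0          /* entry node "en" */
#define PROBLEM 18    /* problem node */

/* Edges of the data-dependence graph, as implied by P1 to P5. */
static const int edges[][2] = {
    {EN, 4}, {4, 18}, {4, 11}, {11, 18},
    {4, 16}, {16, 18}, {11, 16}, {16, 11}
};
static const int n_edges = sizeof edges / sizeof edges[0];

static void print_sequence(const int seq[], int len) {
    printf("en");
    for (int k = 1; k < len; k++) printf(", %d", seq[k]);
    printf("\n");
}

/* Extends the partial sequence seq[0..len-1] in every possible way
   and returns how many complete sequences end at the problem node. */
static int extend(int seq[], int len, bool used[], bool print) {
    int last = seq[len - 1];
    if (last == PROBLEM) {
        if (print) print_sequence(seq, len);
        return 1;
    }
    int count = 0;
    for (int e = 0; e < n_edges; e++) {
        int next = edges[e][1];
        if (edges[e][0] != last || used[next]) continue;
        used[next] = true;
        seq[len] = next;
        count += extend(seq, len + 1, used, print);
        used[next] = false;   /* backtrack */
    }
    return count;
}

/* Counts (and optionally prints) all sequences without repeated nodes. */
int count_simple_sequences(bool print) {
    int seq[8] = {EN};
    bool used[PROBLEM + 1] = {false};
    used[EN] = true;
    return extend(seq, 1, used, print);
}
```

Under this restriction the enumeration yields five sequences, the same set as P1 to P5.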
The search engine then "executes" (explores) the identified sequences in the program
Data Dependence Based Test Generation
For some programs, a large number of different sequences can be generated from the data dependence graph for exploration before the solution is found
Many sequences may not lead to the solution
It may be expensive to explore sequences in the original program
The search engine may require a lot of effort to move from one node to another as specified by the sequences
Testability transformation
The idea is to explore these sequences not in the original program but in a transformed program, in which it should be much easier (faster) to determine whether the fitness function associated with the problem node may evaluate to the target value for a given sequence
Testability transformation
[Figure: the original program takes input x; the transformed program takes input x and sequence S and computes fitness function F]
Testability transformation
The transformed program is used to identify promising sequences
A promising sequence is a sequence for which it is possible to find a program input on which the fitness function at the problem node evaluates to the target value
Testability transformation
[Figure: the transformed program takes input x and sequence S and computes fitness function F]
Find input x on which fitness function F evaluates to the target value during execution of sequence S
Testability transformation
It is inexpensive to identify promising/unpromising sequences in the transformed program
Identified promising sequences are then explored in the original program to find the solution
Testability transformation
A data dependence graph is used to construct a “corresponding (transformed) program”
Testability transformation
[Figure: data-dependence graph with nodes 4, 11, 16, and 18]
float TransFunc(int A[], int C[], int PathSize, int S[], int R[]) {
    int i, j, top;
    i = 1;
    while (i <= PathSize) {
        switch (S[i]) {
        case 4:  { top = 0;                                   // node 4
                   break; }
        case 11: { top = top + 1;                             // node 11
                   for (j = 1; j < R[i]; j++) top = top + 1;
                   break; }
        case 16: { top = top - 1;                             // node 16
                   for (j = 1; j < R[i]; j++) top = top - 1;
                   break; }
        }
        i++;
    };
    return 100 - top;   // computation of the fitness function at node 18
}
How many times is each node repeated? (array R[] in the transformed program)
Testability transformation
[Figure: the transformed program takes PathSize, sequence S, R[], A[], and C[] as inputs and computes F]
Find input A[], C[], and R[] on which F < 0 during execution of sequence S
Testability transformation
[Figure: data-dependence graph with nodes 4, 11, 16, and 18]
en, 4, 11*, 18
Testability transformation
Given: PathSize = 2
S = 4 11
Find:
R = ? ?
A = ? ?
C = ? ?
Such that F < 0
Testability transformation
Solution:
R = 1 101
A = - -
C = - -
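The solution above can be reproduced with a runnable variant of the transformed program. This sketch uses 0-based arrays and omits A[] and C[], which do not influence the fitness; the linear scan in find_repetitions is only a stand-in for the search engine, and both function names are hypothetical.

```c
/* Transformed program for the data-dependence graph on top.
   S[k] is the node executed at step k, R[k] its repetition count. */
int trans_func(int path_size, const int S[], const int R[]) {
    int top = 0;
    for (int k = 0; k < path_size; k++) {
        switch (S[k]) {
        case 4:                                   /* node 4 */
            top = 0;
            break;
        case 11:                                  /* node 11 */
            top = top + 1;
            for (int j = 1; j < R[k]; j++) top = top + 1;
            break;
        case 16:                                  /* node 16 */
            top = top - 1;
            for (int j = 1; j < R[k]; j++) top = top - 1;
            break;
        }
    }
    return 100 - top;   /* F < 0 exactly when top > 100 at node 18 */
}

/* Tries repetition counts 1..max_reps for the node at position pos
   and returns the first count that gives F < 0, or -1 if none does.
   On success, R[pos] holds the returned count. */
int find_repetitions(int path_size, const int S[], int R[],
                     int pos, int max_reps) {
    for (int r = 1; r <= max_reps; r++) {
        R[pos] = r;
        if (trans_func(path_size, S, R) < 0) return r;
    }
    return -1;
}
```

For S = 4 11, the scan stops at 101 repetitions of node 11, where top = 101 and F = -1. For S = 4 16 no repetition count works, since decrementing top only moves F further from the target.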
Testability transformation
[Figure: data-dependence graph with nodes 4, 11, 16, and 18, annotated "100 times"]
en, 4, 11^101, 18
Given a promising sequence:
S = <4, 11^101, 18>
How much saving?
With transformed program
At most 5 sequence explorations in the transformed program
Only one sequence identified as a promising one
Without transformed program
In the best case, over 100 sequence explorations
Data Dependence Based Test Generation
Multiple variables?
Data Dependence Based Test Generation
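[Figure: data-dependence graph with nodes 1, 5, 6, 22, 24, 29, 30, 31, 41, 43, and 55; edges are labeled with the variables linepos, wordlen, and maxpos]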
Data Dependence Based Test Generation
For data dependence graphs with multiple variables, we first identify data dependence execution graphs (rather than sequences)
In the next step, sequences for exploration are generated from these data dependence execution graphs
Data Dependence Based Test Generation
Data dependence execution graph
Each execution graph represents a different way the fitness function associated with a problem node may be computed
The execution graph contains all dependences that may occur during program execution
The execution graph is derived from the data dependence graph by traversing backwards from the problem node
Data Dependence Based Test Generation
[Figure: a sample data dependence execution graph with nodes 1, 29, 30, two instances of 43, and two instances of 55; edges are labeled with the variables wordlen, maxpos, and linepos]
In the next step, valid sequences for exploration are generated from these data dependence execution graphs
Data Dependence Based Test Generation
Valid sequence
Represents a possible sequence of executions of nodes in the execution graph
All data dependences in the execution graph are preserved
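One possible reading of "preserved" is that, for every dependence in the execution graph, the defining node comes before the using node and no node in between redefines the same variable. The sketch below checks that condition on a small hypothetical execution graph with four node instances; it is not the graph from the slides, and the names are illustrative.

```c
#include <stdbool.h>

#define N_NODES 4
#define NO_VAR '\0'

/* One data dependence: node `def` defines `var`, node `use` reads it. */
struct dependence { int def; int use; char var; };

/* Hypothetical execution graph: the variable each node defines
   (NO_VAR for the problem node) and the dependences between nodes. */
static const char defined_var[N_NODES] = { 'x', 'y', 'x', NO_VAR };
static const struct dependence deps[] = {
    {0, 1, 'x'}, {1, 2, 'y'}, {2, 3, 'x'}
};
static const int n_deps = sizeof deps / sizeof deps[0];

/* seq must list each of the N_NODES node instances exactly once. */
bool is_valid_sequence(const int seq[N_NODES]) {
    int pos[N_NODES];
    for (int k = 0; k < N_NODES; k++) pos[seq[k]] = k;
    for (int d = 0; d < n_deps; d++) {
        int from = pos[deps[d].def], to = pos[deps[d].use];
        if (from >= to) return false;   /* use precedes its definition */
        for (int k = from + 1; k < to; k++)
            if (defined_var[seq[k]] == deps[d].var)
                return false;           /* variable redefined in between */
    }
    return true;
}
```

In this toy graph the order 0, 1, 2, 3 is valid, while 0, 2, 1, 3 is not: node 2 redefines x between node 0 and node 1, which breaks the dependence from 0 to 1.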
Data Dependence Based Test Generation
[Figure: the sample data dependence execution graph, with edges labeled wordlen, maxpos, and linepos]
A valid sequence: 1 43 55 29 43 55 30
Testability transformation
Generated sequences are explored in the transformed program to identify promising/unpromising sequences
Identified promising sequences are then explored in the original program to find the solution
Conclusions
Data dependence analysis is used to guide transformations to improve testability
The transformations can improve the test data generation
The transformations employed do not preserve the meaning of the program, yet this is unimportant in the context of test data generation
Conclusions
By using testability transformation:
The chances of finding a solution are increased
It is much easier to explore different data dependence sequences
The search may find a solution more efficiently
Questions?