Improving Cluster Selection Techniques of Regression Testing
by Slice Filtering
Yongwei Duan, Zhenyu Chen, Zhihong Zhao, Ju Qian and Zhongjun Yang
Software Institute, Nanjing University, Nanjing, China
http://software.nju.edu.cn/zychen
Outline
• Introduction
• Our Approach
• Experiment and Evaluation
• Future Work
Introduction
• Test selection techniques
• Cluster selection techniques
• Problems
Test selection techniques
• Rerunning all of the existing test cases is costly in regression testing
• Test selection techniques: choose a subset of test cases to rerun
Cluster Selection
Cluster selection overview: Run test cases → Collect execution profiles (basic block level) → Clustering → Clusters of test cases → Sampling → A reduced test suite
Problems
• Too much data to cluster
– Huge amount of execution traces
– Always a high dimension
Just focus on the code fragments that are actually relevant to the program modification!
Our approach
• Overview
• Slice filtering
• Clustering analysis
• Sampling
Our approach
• Overview
Running test cases → execution traces → trace filtering → filtered traces → cluster analysis → clusters → sampling → reduced test suite
Slice filtering
• The execution traces are too detailed to be used in clustering analysis
• We use a program slice to filter out fragments that are irrelevant to the program modification.
Slice filtering cont’d
• Statement 2 is changed from ‘if(m<n)’ to ‘if(m<=n)’
• We compute a program slice with respect to statement 2 and intersect it with each execution trace.
• Given 3 test cases, we compare their execution traces and filtered execution traces.
[Code listing of the example program; statement 2 after the change reads: if(m<=n){]
Slice filtering cont’d
Test case   Input (m, n)   Execution trace (statement no.)    Filtered trace (statement no.)
t1          1, 0           1,2,4,5,6,7,8,9,10,11,12,13,14     2,4,5,6,7,8
t2          -1, 0          1,2,3,5,6,7,8,9,10,11,12,13,14     2,3,5,6,7,8
t3          -1, 1          1,2,3,5,6,7,8,9                    2,3,5,6,7,8
• Execution traces are much smaller after program slice filtering.
• After filtering, the traces of t2 and t3 are identical, while the difference between t1 and t2 is magnified.
• To condense the traces further, adjacent statements within a basic block are combined into one statement.
• Patterns are easier to reveal with simpler execution traces.
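The filtering step in this example can be sketched as a simple intersection of each trace with the slice. This is only an illustration, not the authors' tool: the slice {2, ..., 8} is an assumption inferred from the filtered traces in the table (the slides do not list the slice itself), and the name `filter_trace` is hypothetical.

```python
# Assumed slice with respect to statement 2, inferred from the filtered
# traces in the example; the slides do not state it explicitly.
SLICE = {2, 3, 4, 5, 6, 7, 8}

# Execution traces of the three test cases, copied from the example table.
TRACES = {
    "t1": [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "t2": [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "t3": [1, 2, 3, 5, 6, 7, 8, 9],
}


def filter_trace(trace, program_slice):
    """Keep only the executed statements that also belong to the slice."""
    return [stmt for stmt in trace if stmt in program_slice]


filtered = {name: filter_trace(trace, SLICE) for name, trace in TRACES.items()}
```

With this assumed slice, `filtered["t2"]` and `filtered["t3"]` come out equal, while `filtered["t1"]` differs from both in the branch taken after statement 2.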
Slice filtering cont’d
• But the number of test cases is still large.
• If a trace is too small (below a threshold) after intersection with the program slice, it is unlikely to be a fault-revealing test case, so we remove it from the test suite.
Slice filtering cont’d
• Filtering rate
– We define the filtering rate FR as follows: if the threshold is M and the size of the program slice is N, then FR = M / N * 100%.
– When FR gets lower, the effect of filtering diminishes, i.e., fewer features can be eliminated.
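A minimal sketch of how the threshold could be applied, assuming FR is given as a fraction (as in FR = 0.3) and that "below the threshold" means a filtered trace with fewer than M = FR * N entries. The function name and the sample data are hypothetical.

```python
def drop_small_traces(filtered_traces, slice_size, filtering_rate):
    """Remove test cases whose filtered trace is shorter than M = FR * N."""
    threshold = filtering_rate * slice_size  # M in the definition of FR
    return {
        name: trace
        for name, trace in filtered_traces.items()
        if len(trace) >= threshold
    }


# Hypothetical data: a slice of 10 statements and three filtered traces.
example = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [1, 2, 3, 4],
    "c": [1],
}
kept_low_fr = drop_small_traces(example, slice_size=10, filtering_rate=0.3)
kept_high_fr = drop_small_traces(example, slice_size=10, filtering_rate=0.5)
```

In this toy data the lower rate keeps `a` and `b`, and the higher rate keeps only `a`, which matches the statement that a lower FR filters less.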
Slice filtering cont’d
• Why not just use dynamic slicing?
– Computing dynamic slices is complex and time-consuming
– Effective dynamic slicing tools are hard to come by
Clustering analysis
• Distance measure
– For a filtered trace f_i = <a_i1, a_i2, …, a_in>, where a_ij is the execution count of basic block j, the distance between two filtered traces f_i and f_j is:
D(f_i, f_j) = \sqrt{\sum_{k=1}^{n} (a_{ik} - a_{jk})^2}
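As a rough illustration of the distance measure, the sketch below computes the distance between two filtered traces represented as vectors of basic-block execution counts. It assumes the measure is the Euclidean distance; the function name `trace_distance` is hypothetical.

```python
import math


def trace_distance(f_i, f_j):
    """Euclidean distance between two filtered traces given as count vectors."""
    if len(f_i) != len(f_j):
        raise ValueError("traces must cover the same number of basic blocks")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_j)))


# Two hypothetical traces over three basic blocks: sqrt(0 + 16 + 9).
d = trace_distance([1, 0, 3], [1, 4, 0])
```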
Sampling
• We use adaptive sampling in our approach
– We first sample a certain number of test cases. If a sampled test case is fault-revealing, the entire cluster from which it was sampled is selected. This strategy favors small clusters and has a high probability of selecting fault-revealing test cases.
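The adaptive sampling strategy might look like the following sketch. It makes several assumptions the slides do not spell out: a fixed number of test cases is sampled from every cluster, and an oracle `is_fault_revealing` is available for the sampled test cases. All names and data are hypothetical.

```python
import random


def adaptive_sample(clusters, is_fault_revealing, sample_size=1, rng=None):
    """Sample each cluster; select the whole cluster if any sample fails."""
    rng = rng or random.Random()
    selected = set()
    for cluster in clusters:
        members = list(cluster)
        # Initial sample (assumed: the same sample size for every cluster).
        sample = rng.sample(members, min(sample_size, len(members)))
        selected.update(sample)
        if any(is_fault_revealing(test) for test in sample):
            selected.update(members)  # pull in the entire cluster
    return selected


# Hypothetical data: t7 is the only fault-revealing test, isolated in its own cluster.
clusters = [["t1", "t2", "t3", "t4"], ["t7"], ["t5", "t6"]]
picked = adaptive_sample(clusters, lambda t: t == "t7", rng=random.Random(0))
```

Because a small cluster is covered by even a small sample, an isolated fault-revealing test case such as `t7` is always picked here, which is one way to read the remark that the strategy favors small clusters.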
Experiment & Evaluation
• Subject program
– space, from SIR (Software-artifact Infrastructure Repository)
– 5902 LOC
– 1533 basic blocks
– 38 modified versions (each version contains a real fault)
– 13585 test cases
Experiment & Evaluation
• Subject program
• Measurements
• Experimental results
• Observations
Experiment & Evaluation
• 3 measurements
– Precision
– Reduction
– Recall
Experiment & Evaluation
• Precision
– If in a certain run the technique selects a subset of N test cases, of which M are fault-revealing, the precision of the technique is M / N * 100%.
– Precision measures the extent to which a selection method omits non-fault-revealing test cases in a run.
Experiment & Evaluation
• Reduction
– If a selection technique selects M test cases out of all N existing test cases in a certain run, the reduction of the technique is M / N * 100%.
– Reduction measures the extent to which a technique can reduce the size of the original test suite.
– A low reduction means that a selection technique greatly reduces the original test suite.
Experiment & Evaluation
• Recall
– If a selection technique selects M fault-revealing test cases out of N existing fault-revealing test cases in a certain run, the recall of the technique is M / N * 100%.
– Recall measures the extent to which a selection technique can include fault-revealing test cases.
– Recall indicates the fault-detecting capability of a technique. A safe selection technique achieves 100% recall.
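The three measurements can be computed together from sets of test case identifiers, as in this minimal sketch. The function name, the sample data, and the values returned for empty sets are assumptions made for illustration.

```python
def selection_metrics(selected, all_tests, fault_revealing):
    """Return (precision, reduction, recall) in percent for one run."""
    selected = set(selected)
    all_tests = set(all_tests)
    fault_revealing = set(fault_revealing)
    hits = selected & fault_revealing  # selected tests that reveal the fault

    # Conventions for empty sets are assumptions, not taken from the slides.
    precision = 100.0 * len(hits) / len(selected) if selected else 0.0
    reduction = 100.0 * len(selected) / len(all_tests)
    recall = 100.0 * len(hits) / len(fault_revealing) if fault_revealing else 100.0
    return precision, reduction, recall


# Hypothetical run: 10 tests, 2 fault-revealing, 4 selected, 1 of them fault-revealing.
all_tests = [f"t{i}" for i in range(10)]
precision, reduction, recall = selection_metrics(
    selected={"t0", "t2", "t3", "t4"},
    all_tests=all_tests,
    fault_revealing={"t0", "t1"},
)
```

In this run, precision is 1 of 4 selected, reduction is 4 of 10 tests, and recall is 1 of 2 fault-revealing tests.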
Experiment & Evaluation
• Experimental results
– A comparison between our approach and Dejavu. Dejavu is known as an effective test selection algorithm with high precision.
– A comparison between two different filtering rates: FR = 0.3 and FR = 0.5
Experiment & Evaluation
[Figure: Comparison of precision between our approach (FR = 0.3) and Dejavu]
Experiment & Evaluation
[Figure: Comparison of reduction between our approach (FR = 0.3) and Dejavu]
Experiment & Evaluation
[Figure: Comparison of recall between our approach (FR = 0.3) and Dejavu]
We achieve some improvement on every version except versions 13, 25, 26, 35, 37 and 38.
Experiment & Evaluation
• Analysis
– The key to our approach is to isolate the fault-revealing test cases into small clusters
– Failures detected on versions 13, 25, 26, 35, 37 and 38 are mostly memory access violation failures. Those failures cause premature termination of the execution flows.
– Program slicing cannot predict runtime execution flow changes and therefore cannot provide enough information to differentiate these test cases and separate them into different clusters.
Experiment & Evaluation
[Figure: Comparison of precision between FR = 0.3 and FR = 0.5]
Experiment & Evaluation
[Figure: Comparison of reduction between FR = 0.3 and FR = 0.5]
Experiment & Evaluation
[Figure: Comparison of recall between FR = 0.3 and FR = 0.5]
If we raise FR to 0.5, some improvement in precision, reduction and recall can be achieved.
Experiment & Evaluation
• Observations
– For most versions, our approach has higher precision and lower reduction (lower is better) than Dejavu. This means that we can select the fault-revealing test cases from the original test suite while selecting relatively few non-fault-revealing test cases.
Experiment & Evaluation
• Observations
– The effectiveness of our approach depends largely on the level of isolation of fault-revealing test cases. By choosing appropriate parameters such as filtering rate, sampling rate and initial cluster number, we can enhance the level of isolation.
Future work
• We will try to answer the following questions in our future work
– How do distance metrics and clustering algorithms affect the result of cluster selection techniques?
– Given a program, how can we find the best filtering rate and other parameters?
Q & A