04/10/23 Data Mining: Principles and Algorithms
Data Mining: Concepts and Techniques
— Chapter 11 —
— Software Bug Mining —
Jiawei Han and Micheline Kamber
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber. All rights reserved.
Acknowledgement: Chao Liu
Outline
Automated Debugging and Failure Triage
SOBER: Statistical Model-Based Fault Localization
Fault Localization-Based Failure Triage
Copy and Paste Bug Mining
Conclusions & Future Research
Software Bugs Are Costly
Software is “full of bugs”: Windows 2000, with 35 million lines of code, had 63,000 known bugs at the time of release, about 2 per 1,000 lines.

Software failures are costly:
- The Ariane 5 explosion was due to “errors in the software of the inertial reference system” (Ariane-5 flight 501 inquiry board report, http://ravel.esrin.esa.it/docs/esa-x-1819eng.pdf)
- A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf)

Testing and debugging are laborious and expensive: “50% of my company employees are testers, and the rest spends 50% of their time testing!” (Bill Gates, 1995)
Automated Failure Reporting
End-users act as beta testers, providing valuable information about failure occurrences in reality: 24.5 million reports/day in Redmond if all users send them (John Dvorak, PC Magazine).

Widely adopted because of its usefulness: Microsoft Windows, Linux Gentoo, Mozilla applications, ...

Any application can implement this functionality.
After Failures Are Collected: Failure Triage

Failure triage:
- Failure prioritization: which are the most severe bugs?
- Failure assignment: which developers should debug a given set of failures?

Automated debugging:
- Where is the likely bug location?
A Glimpse on Software Bugs
Crashing bugs:
- Symptoms: segmentation faults
- Reasons: memory access violations
- Tools: Valgrind, CCured

Noncrashing bugs:
- Symptoms: unexpected outputs
- Reasons: logic or semantic errors, e.g. if ((m >= 0)) vs. if ((m >= 0) && (m != lastm)); < vs. <=, > vs. >=; j = i vs. j = i + 1
- Tools: no sound tools exist
Semantic Bugs Dominate

Bug distribution [Li et al., ICSE’07] (264 bugs in Mozilla and 98 bugs in Apache manually checked; 29,000 bugs in Bugzilla automatically checked):

- Semantic bugs: 78%. Application specific; only a few are detectable; mostly require annotations or specifications
- Memory-related bugs: 16%. Many are detectable
- Concurrency bugs: 3%
- Others: 3%

Courtesy of Zhenmin Li
Hacking Semantic Bugs is HARD
Major challenge: no crashes, hence no failure signatures and no debugging hints.

Major methods:
- Statistical debugging of semantic bugs [Liu et al., FSE’05, TSE’06]
- Triaging noncrashing failures through statistical debugging [Liu et al., FSE’06]
Outline
Automated Debugging and Failure Triage
SOBER: Statistical Model-Based Fault Localization
Fault Localization-Based Failure Triage
Copy and Paste Bug Mining
Conclusions & Future Research
A Running Example
Buggy version (the subclause (lastm != m) is missing from the first condition):

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if (m >= 0) {
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            i = i + 1;
        } else
            i = m;
    }
}

Correct version:

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0) && (lastm != m)) {
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            i = i + 1;
        } else
            i = m;
    }
}
130 of 5542 test cases fail, no crashes
Predicate                     # of true   # of false
(lin[i] != ENDSTR) == true        5           1
Ret_amatch < 0                    5           1
Ret_amatch == 0                   1           5
Ret_amatch > 0                    1           5
(m >= 0) == true                  4           2
(m == i) == true                  2           4
(m >= -1) == true                 1           5
Predicate evaluation as tossing a coin
Profile Executions as Vectors
Each execution is profiled as a vector of (true, false) counts, one pair per predicate:

(5, 1) (1, 5) (1, 5) (4, 2) (1, 5) (2, 4) (1, 5)          passing execution
(19, 1) (1, 19) (1, 19) (18, 2) (1, 19) (2, 18) (1, 19)   passing execution
(9, 1) (1, 9) (1, 9) (8, 2) (8, 2) (2, 8) (1, 9)          failing execution

Extreme case: a predicate is always false in passing runs and always true in failing runs.

Generalized case: different true probabilities in passing and failing executions.
Estimated Head Probability

Evaluation bias: the head probability estimated from a single execution. Specifically,

    X = n_t / (n_t + n_f)

where n_t and n_f are the numbers of true and false evaluations of the predicate in that execution. The evaluation bias is defined for each predicate and each execution.
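The estimate above can be written as a small C helper. This is a minimal sketch: the function name is illustrative, and the sentinel for never-evaluated predicates is an assumption, since the slides do not say how that case is handled.

```c
#include <assert.h>

/* Evaluation bias of one predicate in one execution:
 * X = n_t / (n_t + n_f), the fraction of evaluations that were true.
 * Returns -1.0 if the predicate was never evaluated in the run
 * (an assumed convention; callers should skip such predicates). */
double evaluation_bias(int n_true, int n_false)
{
    int n = n_true + n_false;
    if (n == 0)
        return -1.0;
    return (double)n_true / n;
}
```

For example, counts of (4, 2) give a bias of 2/3.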
Divergence in Head Probability

- Multiple evaluation biases come from multiple executions
- Model the evaluation biases as generated from per-class distributions: the failing-run biases F = (f_1, f_2, ..., f_m) are samples from a model f(X | theta_f), and the passing-run biases P = (p_1, p_2, ..., p_n) are samples from f(X | theta_p)

(Figure: histograms of head probability on [0, 1] for passing runs and failing runs)
Major Challenges

- No closed form of either model
- No sufficient number of failing executions to estimate f(X | theta_f)

(Figure: histograms of head probability on [0, 1] for the two classes)
SOBER in Summary

SOBER takes the instrumented source code and a test suite, and outputs a ranking of the instrumented predicates by their estimated relevance to the bug (e.g., Pred2 > Pred6 > Pred1 > Pred3).
Previous State of the Art [Liblit et al., 2005]

Correlation analysis:
  Context(P) = Prob(fail | P ever evaluated)
  Failure(P) = Prob(fail | P ever evaluated as true)
  Increase(P) = Failure(P) - Context(P): how much more likely the program fails when P is ever evaluated true
Liblit05 in Illustration

(Figure: failing (+) and passing (O) executions; P is evaluated in 10 runs, 4 of which fail, and evaluates true in 7 runs, 3 of which fail.)

Context(P) = Prob(fail | P ever evaluated) = 4/10 = 2/5
Failure(P) = Prob(fail | P ever evaluated as true) = 3/7
Increase(P) = Failure(P) - Context(P) = 3/7 - 2/5 = 1/35
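The three scores can be computed directly from per-run observations. Below is a minimal C sketch with an illustrative struct layout; it is not Liblit's actual implementation.

```c
#include <assert.h>

/* Per-run observation of one predicate P: was P evaluated at all,
 * was it ever true, and did the run fail? */
struct run { int evaluated, ever_true, failed; };

/* Increase(P) = Failure(P) - Context(P), as defined above. */
double increase(const struct run *runs, int n)
{
    int eval = 0, eval_fail = 0, ever = 0, ever_fail = 0;
    for (int i = 0; i < n; i++) {
        if (runs[i].evaluated) { eval++; if (runs[i].failed) eval_fail++; }
        if (runs[i].ever_true) { ever++; if (runs[i].failed) ever_fail++; }
    }
    double context = eval ? (double)eval_fail / eval : 0.0;
    double failure = ever ? (double)ever_fail / ever : 0.0;
    return failure - context;
}
```

With the numbers above (P evaluated in 10 runs, 4 failing; ever true in 7 runs, 3 failing), the score is 3/7 - 2/5 = 1/35.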
SOBER in Illustration

(Figure: the same failing (+) and passing (O) executions, now summarized by two histograms of evaluation bias on [0, 1], one for failing runs and one for passing runs.)
Difference between SOBER and Liblit05

Methodology:
- Liblit05: correlation analysis
- SOBER: model-based approach

Utilized information:
- Liblit05: is the predicate ever true in a run?
- SOBER: what percentage of its evaluations is true?

void subline(char *lin, char *pat, char *sub)
{
1    int i, lastm, m;
2    lastm = -1;
3    i = 0;
4    while ((lin[i] != ENDSTR)) {
5        m = amatch(lin, i, pat, 0);
6        if (m >= 0) {
7            putsub(lin, i, m, sub);
8            lastm = m;
9        }
10   }
11 }

On this example:
- Liblit05: Line 6 is ever true in most passing and failing executions
- SOBER: Line 6 is prone to be true in failing executions and prone to be false in passing executions
(Figure: control-flow graph of subline, with nodes lastm = -1; i = 0; while (lin[i] != ENDSTR); m = amatch(lin, i, pat, 0); if (m >= 0); lastm = m; if (m == -1 || m == i); i = i + 1; i = m.)
T-Score: Metric of Debugging Quality
void subline(char *lin, char *pat, char *sub)
{
int i, lastm, m;
lastm = -1;
i = 0;
while((lin[i] != ENDSTR)) {
m = amatch(lin, i, pat, 0);
if ((m >= 0)){
lastm = m;
}
if ((m == -1) || (m == i)) {
i = i + 1;
} else
i = m;
}
}
T-score = 70%: the developer must examine about 70% of the code, starting from the blamed location, before reaching the real bug. T-score thus measures how close the blamed location is to the real bug location.
(Figure: the same control-flow graph of subline, with a different node blamed.)
A Better Debugging Result
void subline(char *lin, char *pat, char *sub)
{
int i, lastm, m;
lastm = -1;
i = 0;
while((lin[i] != ENDSTR)) {
m = amatch(lin, i, pat, 0);
if ((m >= 0)){
lastm = m;
}
if ((m == -1) || (m == i)) {
i = i + 1;
} else
i = m;
}
}
T-score = 40%
Evaluation 1: Siemens Program Suite

- Siemens program suite: 130 buggy versions of 7 small (<700 LOC) programs
- Metric: what percentage of bugs can be located when no more than a given percentage of the code is examined
- A T-score <= 20% is considered meaningful
Evaluation 2: Reasonably Large Programs

Program (LOC)        Bug    Type                                           Failing runs  T-Score
Flex 2.4.7 (8,834)   Bug 1  Misuse of >= for >                             163/525       0.5%
                     Bug 2  Misuse of = for ==                             356/525       1.6%
                     Bug 3  Mis-assigned value: true for false             69/525        7.6%
                     Bug 4  Mis-parenthesized ((a||b)&&c) as (a||(b&&c))   22/525        15.4%
                     Bug 5  Off-by-one                                     92/525        45.6%
Grep 2.2 (11,826)    Bug 1  Off-by-one                                     48/470        0.6%
                     Bug 2  Subclause missing                              88/470        0.2%
Gzip 1.2 (6,184)     Bug 1  Subclause missing                              65/217        0.5%
                     Bug 2  Subclause missing                              17/217        2.9%

Software-artifact Infrastructure Repository (SIR): http://sir.unl.edu
A Glimpse of Bugs in Flex-2.4.7

Bug 1: Misuse of >= for >
#ifndef BUG_1
  if ( performance_report > 0 )
#else
  if ( performance_report >= 0 )
#endif

Bug 2: Mis-parenthesization
#ifndef BUG_2
  if ( (fulltbl || fullspd) && reject )
#else
  if ( fulltbl || (fullspd && reject) )
#endif

Bug 3: Misuse of != for ==
#ifndef BUG_3
  if (s - nextchar == my_strlen (p->name))
#else
  if (s - nextchar != my_strlen (p->name))
#endif

Bug 4: Misuse of = for ==
#ifndef BUG_4
  if ( yymore_really_used == REALLY_USED )
#else
  if ( yymore_really_used = REALLY_NOT_USED )
#endif

Bug 5: Off-by-one
#ifndef BUG_5
  chk[offset] = EOB_POSITION;
#else
  chk[offset - 1] = EOB_POSITION;
#endif
A Close Look: Grep-2.2: Bug 1
static int grep(int fd) { ...541 for( ; ; )542 { ...548 lastnl = bufbeg;549 if (lastout) 550 lastout = bufbeg;
553 beg = bufbeg + save - residue + 1; /* fault 1 */
...574 if (beg != lastout)575 lastout = 0; ... 580 } ... 587 return nlines;588 }
P1470
P1484
• 11,826 lines of C code• 3,136 predicates instrumented• 48 out of 470 cases fail
Grep-2.2: Bug 2

static char **comsubs(char *left, char *right)
{ ...
2264  for (lcp = left; *lcp != '\0'; ++lcp)
2265  {
      ...
2268    while (rcp != NULL)
2269    {
2270      for (i = 1; lcp[i] != '\0' /* && lcp[i] == rcp[i] */; ++i)  /* fault 2 */
2271        continue;
        ...
2275    }
      ...
2280  }
2281  return cpp;
2282 }

- 11,826 lines of C code
- 3,136 predicates instrumented
- 88 out of 470 cases fail
- Highlighted predicate: P1952
No Silver Bullet: Flex Bug 5

132 void genctbl()
{ ...
177   for ( i = 0; i <= lastdfa; ++i )
178   {
179     int anum = dfaacc[i].dfaacc_state;
180     int offset = base[i];
#ifndef BUG_5
        chk[offset] = EOB_POSITION;
#else
        chk[offset - 1] = EOB_POSITION;
#endif
183     chk[offset - 1] = ACTION_POSITION;
184     nxt[offset - 1] = anum;  /* action number */
185   }
      ...
224 }

- No wrong value is left in chk[offset - 1]: line 183 overwrites it
- chk[offset] is not used here but later
- 8,834 lines of C code
- 2,699 predicates instrumented
Experiment Result in Summary

SOBER is effective for bugs demonstrating abnormal control flows (see the results table above).
SOBER Handles Memory Bugs As Well
void more_variables ()
{ ...
127   old_count = v_count;
      ...
137   for (indx = 3; indx < old_count; indx++) ...
141   for (; indx < v_count; indx++) ...
}

void more_arrays ()
{ ...
167   arrays = (bc_var_array **) bc_malloc (a_count * sizeof(bc_var_array *));
      ...
      /* Copy the old arrays. */
      for (indx = 1; indx < old_count; indx++)
        arrays[indx] = old_ary[indx];
176   for (; indx < v_count; indx++)
        arrays[indx] = NULL;
      ...
}

bc 1.06:
- Two memory bugs found with SOBER
- One of them was previously unreported
- The blamed location is NOT the crash site
Outline
Automated Debugging and Failure Triage
SOBER: Statistical Model-Based Fault Localization
Fault Localization-Based Failure Triage
Copy and Paste Bug Mining
Conclusions & Future Research
Major Problems in Failure Triage

Failure prioritization:
- Which failures are likely due to the same bug?
- Which bugs are the most severe? (The worst 1% of bugs account for 50% of failures)

Failure assignment:
- Which developer should debug which set of failures?

Courtesy of Microsoft Corporation
A Solution: Failure Clustering

Failure indexing: identify failures likely due to the same bug.

(Figure: failure reports plotted as points and grouped into clusters, ordered from most severe to least severe; each cluster suggests a likely fault, e.g. “Fault in core.io?” or “Fault in function initialize()?”)
The Central Question: A Distance Measure between Failures

Different measures render different clusterings.

(Figure: the same failure points, clustered one way under a distance defined on the X-axis and another way under a distance defined on the Y-axis.)
How to Define a Distance

- Previous work [Podgurski et al., 2003]: T-Proximity, a distance defined on literal trace similarity of the failing runs fail_1, fail_2, ..., fail_m
- Our approach [Liu et al., 2006]: R-Proximity, a distance defined on the likely bug locations reported by SOBER
Why Our Approach Is Reasonable

- Optimal proximity: defined on the root causes RC_1, RC_2, ..., RC_m of the failing runs fail_1, fail_2, ..., fail_m
- Our approach: defined on the likely causes LC_1, LC_2, ..., LC_m produced by automated fault localization from the failing runs and the passing runs pass_1, pass_2, ..., pass_n
R-Proximity: An Instantiation with SOBER

Likely causes (LCs) are predicate rankings: SOBER takes the failing runs fail_1, ..., fail_m together with the passing runs pass_1, ..., pass_n and produces, for each failing run, a ranking of the instrumented predicates (e.g., Pred2 > Pred6 > Pred1 > Pred3). A distance between rankings is then needed.
Distance between Rankings

- Traditional Kendall’s tau distance: the number of preference disagreements between two rankings; e.g., Dist((P1, P3, P2), (P2, P3, P1)) = 2, since the two rankings disagree on the P1/P2 and P1/P3 orders
- Not all predicates need to be considered equally: predicates are uniformly instrumented, but only fault-relevant predicates count
Predicate Weighting in a Nutshell

- Fault-relevant predicates receive higher weights
- Fault-relevance is implied by the rankings themselves: predicates favored (ranked near the top) by most rankings, e.g. Pred2 in the rankings (Pred2, Pred6, Pred1, Pred3) and (Pred2, Pred1, Pred3, Pred6), receive higher weights
Automated Failure Assignment

- Most-favored predicates indicate the agreed bug location for a group of failures
- Predicate spectrum graph: for each predicate index, plot how often that predicate is ranked near the top across the group’s rankings; e.g., for the rankings (Pred2, Pred6, Pred1, Pred3), repeated twice, and (Pred2, Pred1, Pred3, Pred6), repeated twice, Pred2 dominates the spectrum
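The spectrum can be sketched as a simple counting pass over the group's rankings; the flat row-major layout and the function name are illustrative assumptions.

```c
#include <assert.h>

/* Count how often each predicate id appears among the top-k entries of
 * nruns rankings (each ranking has m entries, stored row-major). */
void spectrum(const int *rankings, int nruns, int m, int k,
              int *count, int npreds)
{
    for (int p = 0; p < npreds; p++) count[p] = 0;
    for (int r = 0; r < nruns; r++)
        for (int i = 0; i < k && i < m; i++)
            count[rankings[r * m + i]]++;
}
```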
Case Study 1: Grep-2.2
470 test cases in total; 136 fail, with no crashes: 48 due to Fault 1 and 88 due to Fault 2.

Fault 1: an off-by-one error in grep.c:

static int grep(int fd)
{ ...
541   for ( ; ; )
542   { ...
548     lastnl = bufbeg;
549     if (lastout)
550       lastout = bufbeg;
551     if (buflim - bufbeg == save)
552       break;
553     beg = bufbeg + save - residue + 1;  /* fault 1 */
554     for (lim = buflim; lim > beg && lim[-1] != '\n'; --lim)
555       ;
        ...
574     if (beg != lastout)
575       lastout = 0;
576     save = residue + lim - beg;
        ...
580   }
      ...
587   return nlines;
588 }

Fault 2: a subclause-missing error in dfa.c:

static char **comsubs(char *left, char *right)
{ ...
2264  for (lcp = left; *lcp != '\0'; ++lcp)
2265  {
2266    len = 0;
2267    rcp = index(right, *lcp);
2268    while (rcp != NULL)
2269    {  /* fault 2 */
2270      for (i = 1; lcp[i] != '\0' /* && lcp[i] == rcp[i] */; ++i)
2271        continue;
2272      if (i > len)
2273        len = i;
2274      rcp = index(rcp + 1, *lcp);
2275    }
2276    if (len == 0)
2277      continue;
2278    if ((cpp = enlist(cpp, lcp, len)) == NULL)
2279      break;
2280  }
2281  return cpp;
2282 }

Highlighted predicates: P1470, P1484, P1952
Failure Proximity Graphs

- Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2
- Behaviors diverge even under the same fault
- Clustering is better under R-Proximity than under T-Proximity

(Figure: failure proximity graphs under T-Proximity and R-Proximity)
Guided Failure Assignment

What predicates are favored in each group?
Assign Failures to Appropriate Developers

- The 21 failing cases in Cluster 1 are assigned to the developers responsible for the function grep (Fault 1)
- The 112 failing cases in Cluster 2 are assigned to the developers responsible for the function comsubs (Fault 2)
Case Study 2: Gzip-1.2.3
217 test cases in total; 82 fail, with no crashes: 65 due to Fault 1 and 17 due to Fault 2.

Fault 1, in deflate() (the subclause prev_length < max_lazy_match is missing):

661 ulg deflate()
{ ...
      /* Fault 1 */
686   if (hash_head != NIL /* && prev_length < max_lazy_match */
687       && strstart - hash_head <= MAX_DIST) { ...
692     match_length = longest_match (hash_head);
      } ...
707   if (prev_length >= MIN_MATCH && match_length <= prev_length) { ...
711     flush = ct_tally(strstart-1-prev_match, prev_length - MIN_MATCH); ...
719     strstart++; ...
732   } else if (match_available) { ...
738     if (ct_tally (0, window[strstart-1])) { ...
741       strstart++; ...
743   } else { ...
750   } ...
}

Fault 2, in deflate_fast() (the subclause strstart - hash_head <= MAX_DIST is missing):

580 local ulg deflate_fast()
{ ...
      /* Fault 2 */
596   if (hash_head != NIL /* && strstart - hash_head <= MAX_DIST */) { ...
601     match_length = longest_match (hash_head);
      } ...
605   if (match_length >= MIN_MATCH) { ...
608     flush = ct_tally(strstart-match_start, match_length - MIN_MATCH); ...
610     lookahead -= match_length; ...
615     if (match_length <= max_insert_length) { ...
626       strstart++; ...
635     }
636   } else { ...
639     flush = ct_tally (0, window[strstart]); ...
642   } ...
653   return FLUSH_BLOCK(1); /* eof */
654 }
Failure Proximity Graphs

- Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2
- Nearly perfect clustering under R-Proximity, enabling accurate failure assignment

(Figure: failure proximity graphs under T-Proximity and R-Proximity)
Outline
Automated Debugging and Failure Triage
SOBER: Statistical Model-Based Fault Localization
Fault Localization-Based Failure Triage
Copy and Paste Bug Mining
Conclusions & Future Research
Mining Copy-Paste Bugs

Copy-pasting is common:
- 12% in the Linux file system [Kasper2003]
- 19% in the X Window system [Baker1995]

Copy-pasted code is error prone:
- Among 35 errors in Linux drivers/i2o, 34 are caused by copy-paste [Chou2001]

Simplified example from linux-2.6.6/arch/sparc/prom/memory.c:

void __init prom_meminit(void)
{
  ......
  for (i = 0; i < n; i++) {
    total[i].adr   = list[i].addr;
    total[i].bytes = list[i].size;
    total[i].more  = &total[i+1];
  }
  ......
  for (i = 0; i < n; i++) {
    taken[i].adr   = list[i].addr;
    taken[i].bytes = list[i].size;
    taken[i].more  = &total[i+1];   /* forgot to change! */
  }
}
An Overview of Copy-Paste Bug Detection
1. Parse source code and build a sequence database
2. Mine for basic copy-pasted segments
3. Compose larger copy-pasted segments
4. Prune false positives
Parsing Source Code

- Purpose: build a sequence database
- Idea: map each statement to a number
- Tokenize each component: different operators, constants, and keywords map to different tokens
- Handle identifier renaming: identifiers of the same type map to the same token

Example: "old = 3;" and "new = 3;" both tokenize to (5, 61, 20) and hash to the same value, 16.
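The token-and-hash step can be sketched as follows. The token values mirror the slide's example but are otherwise arbitrary, and the djb2-style hash is an illustrative stand-in for CP-Miner's actual hash function.

```c
#include <assert.h>

/* Illustrative token classes: all identifiers of the same kind share
 * one token, so "old = 3;" and "new = 3;" tokenize identically. */
enum { T_IDENT = 5, T_ASSIGN = 61, T_CONST = 20 };

/* Hash a statement's token sequence down to one number. */
unsigned hash_tokens(const int *toks, int n)
{
    unsigned h = 5381;
    for (int i = 0; i < n; i++)
        h = h * 33 + (unsigned)toks[i];
    return h;
}
```

Since both statements tokenize to (5, 61, 20), they receive the same hash value and become the same element of the sequence database.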
Building the Sequence Database

- The whole program is one long sequence, but the miner needs a sequence database
- Cut the long sequence: the naïve method uses fixed lengths; our method cuts at basic blocks

Example: the two loops above hash to the statement sequences 65 (16, 16, 71) each, giving the final sequence DB: (65)(16, 16, 71) ... (65)(16, 16, 71)
Mining for Basic Copy-Pasted Segments

total[i].adr = list[i].addr; total[i].bytes = list[i].size; total[i].more = &total[i+1];
taken[i].adr = list[i].addr; taken[i].bytes = list[i].size; taken[i].more = &total[i+1];

- Apply a frequent-sequence mining algorithm to the sequence database
- Modification: constrain the maximum gap, so that (16, 16, 71) still matches (16, 16, 10, 71), a frequent subsequence with one inserted statement (gap = 1)
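The gap constraint can be illustrated with a backtracking subsequence test. This is a sketch only: CP-Miner builds the constraint into the frequent-sequence mining itself rather than checking it afterwards, and the names here are illustrative.

```c
#include <assert.h>

/* Does pat[j..m-1] match in seq, with each matched element at most
 * maxgap positions after the previous match (prev)? */
static int match_at(const int *seq, int n, const int *pat, int m,
                    int j, int prev, int maxgap)
{
    if (j == m) return 1;                       /* whole pattern matched */
    int lo = (j == 0) ? 0 : prev + 1;
    int hi = (j == 0) ? n : prev + maxgap + 2;  /* gap = i - prev - 1 */
    if (hi > n) hi = n;
    for (int i = lo; i < hi; i++)
        if (seq[i] == pat[j] && match_at(seq, n, pat, m, j + 1, i, maxgap))
            return 1;
    return 0;
}

/* Does the pattern occur in seq with all gaps <= maxgap? */
int occurs_with_gap(const int *seq, int n, const int *pat, int m, int maxgap)
{
    return match_at(seq, n, pat, m, 0, -1, maxgap);
}
```

With maxgap = 1, the pattern (16, 16, 71) is found in (16, 16, 10, 71); with maxgap = 0 it is not.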
Composing Larger Copy-Pasted Segments

Combine neighboring copy-pasted segments repeatedly: the basic segments (16, 16, 71) (the loop bodies) and 65 (the loop headers) are combined into the larger copy-pasted segments (65)(16, 16, 71):

for (i = 0; i < n; i++) {
  total[i].adr   = list[i].addr;
  total[i].bytes = list[i].size;
  total[i].more  = &total[i+1];
}

for (i = 0; i < n; i++) {
  taken[i].adr   = list[i].addr;
  taken[i].bytes = list[i].size;
  taken[i].more  = &total[i+1];
}
Pruning False Positives

- Unmappable segments: identifier names cannot be mapped to corresponding ones, e.g.
    f (a1);  f (a2);  f (a3);
  vs.
    f1 (b1); f1 (b2); f2 (b3);
  where f maps to both f1 and f2, a conflict
- Tiny segments

For more detail, see Zhenmin Li, Shan Lu, Suvda Myagmar, Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code, in Proc. 6th Symp. Operating Systems Design and Implementation, 2004.
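The unmappable-segment check can be sketched as a consistency test on the identifier mapping between the original segment and its copy; the integer token encoding and the bound of 64 are illustrative assumptions.

```c
#include <assert.h>

/* Every identifier token in the original must map to exactly one
 * identifier token in the copy, and vice versa; a conflict such as
 * f -> f1 and f -> f2 marks the pair as unmappable (a false positive). */
int consistent_mapping(const int *orig, const int *copy, int n)
{
    int fwd[64], bwd[64];
    for (int i = 0; i < 64; i++) fwd[i] = bwd[i] = -1;
    for (int i = 0; i < n; i++) {
        if (fwd[orig[i]] == -1) fwd[orig[i]] = copy[i];
        else if (fwd[orig[i]] != copy[i]) return 0;
        if (bwd[copy[i]] == -1) bwd[copy[i]] = orig[i];
        else if (bwd[copy[i]] != orig[i]) return 0;
    }
    return 1;
}
```

Encoding the example above with f = 1, f1 = 2, f2 = 3, the call sites (1, 1, 1) vs. (2, 2, 3) are reported as unmappable.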
Some Test Results of C-P Bug Detection

Software     LOC     Time      Space    Verified Bugs  Potential Bugs (careless programming)
Linux        4.4 M   20 mins   527 MB   28             21
FreeBSD      3.3 M   20 mins   459 MB   23             8
Apache       224 K   15 secs   30 MB    5              0
PostgreSQL   458 K   38 secs   57 MB    2              0
Outline
Automated Debugging and Failure Triage
SOBER: Statistical Model-Based Fault Localization
Fault Localization-Based Failure Triage
Copy and Paste Bug Mining
Conclusions & Future Research
Conclusions
- Data mining applies to software and computer systems
- Incorrect executions can be identified from program runtime behaviors
- Classification dynamics can give away a “backtrace” for noncrashing bugs without any semantic inputs
- A hypothesis-testing-like approach localizes logic bugs in software; no prior knowledge about the program semantics is assumed
- Lots of other software bug mining methods remain to be explored
Future Research: Mining into Computer Systems
- Huge volume of data from computer systems: persistent state interactions, event logs, network logs, CPU usage, ...
- Mining system data for reliability, performance, manageability, ...
- Challenges in data mining: statistical modeling of computer systems; online operation, scalability, interpretability, ...
References

[DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended static checking, 1998.
[EGH+94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint: A tool for using specifications to check code. In Proc. ACM SIGSOFT '94 Symp. Foundations of Software Engineering, pages 87-96, 1994.
[DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-sensitive program verification in polynomial time. In Conf. Programming Language Design and Implementation, 2002.
[ECC00] D. R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Proc. 4th Symp. Operating Systems Design and Implementation, October 2000.
[M93] Ken McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[H97] Gerard J. Holzmann. The model checker SPIN. Software Engineering, 23(5):279-295, 1997.
[DDH+92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol verification as a hardware design aid. In IEEE Int. Conf. Computer Design: VLSI in Computers and Processors, pages 522-525, 1992.
[MPC+02] M. Musuvathi, D. Y. W. Park, A. Chou, D. R. Engler, and D. L. Dill. CMC: A pragmatic approach to model checking real code. In Proc. 5th Symp. Operating Systems Design and Implementation, 2002.
References (cont’d)

[G97] P. Godefroid. Model checking for programming languages using VeriSoft. In Proc. 24th ACM Symp. Principles of Programming Languages, 1997.
[BHP+00] G. Brat, K. Havelund, S. Park, and W. Visser. Model checking programs. In IEEE Int'l Conf. Automated Software Engineering (ASE), 2000.
[HJ92] R. Hastings and B. Joyce. Purify: Fast detection of memory leaks and access errors. In Proc. Winter 1992 USENIX Conference, pages 125-138, San Francisco, California.
Chao Liu, Xifeng Yan, and Jiawei Han. Mining control flow abnormality for logic error isolation. In Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.
C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical model-based bug localization. In Proc. 2005 ACM SIGSOFT Symp. Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, Sept. 2005.
C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining behavior graphs for backtrace of noncrashing bugs. In Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.
[SN00] Julian Seward and Nick Nethercote. Valgrind, an open-source memory debugger for x86-GNU/Linux. http://valgrind.org/
[LLM+04] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A tool for finding copy-paste and related bugs in operating system code. In Proc. 6th Symp. Operating Systems Design and Implementation, 2004.
[LCS+04] Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan, and Yuanyuan Zhou. C-Miner: Mining block correlations in storage systems. In Proc. 3rd USENIX Conf. File and Storage Technologies, 2004.
Surplus Slides
The remaining slides are leftovers.
Representative Publications
Chao Liu, Long Fei, Xifeng Yan, Jiawei Han and Samuel Midkiff, “Statistical Debugging: A Hypothesis Testing-Based Approach,” IEEE Transactions on Software Engineering, Vol. 32, No. 10, pp. 831-848, Oct. 2006.
Chao Liu and Jiawei Han, “R-Proximity: Failure Proximity Defined via Statistical Debugging,” IEEE Transactions on Software Engineering, Sept. 2006. (under review)
Chao Liu, Zeng Lian and Jiawei Han, "How Bayesians Debug", the 6th IEEE International Conference on Data Mining, pp. 382-393, Hong Kong, China, Dec. 2006.
Chao Liu and Jiawei Han, "Failure Proximity: A Fault Localization-Based Approach", the 14th ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 286-295, Portland, USA, Nov. 2006.
Chao Liu, "Fault-aware Fingerprinting: Towards Mutualism between Failure Investigation and Statistical Debugging", the 14th ACM SIGSOFT Symposium on the Foundations of Software Engineering, Portland, USA, Nov. 2006.
Chao Liu, Chen Chen, Jiawei Han and Philip S. Yu, "GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis", the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 872-881, Philadelphia, USA, Aug. 2006.
Qiaozhu Mei, Chao Liu, Hang Su and Chengxiang Zhai, "A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs", the 15th International Conference on World Wide Web, pp. 533-542, Edinburgh, Scotland, May, 2006.
Chao Liu, Xifeng Yan and Jiawei Han, "Mining Control Flow Abnormality for Logic Error Isolation", 2006 SIAM International Conference on Data Mining, pp. 106-117, Bethesda, US, April, 2006.
Chao Liu, Xifeng Yan, Long Fei, Jiawei Han and Samuel Midkiff, "SOBER: Statistical Model-Based Bug Localization", the 5th joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 286-295, Lisbon, Portugal, Sept. 2005.
William Yurcik and Chao Liu. "A First Step Toward Detecting SSH Identity Theft on HPC Clusters: Discriminating Cluster Masqueraders Based on Command Behavior" the 5th International Symposium on Cluster Computing and the Grid, pp. 111-120, Cardiff, UK, May 2005.
Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han and Philip S. Yu, "Mining Behavior Graphs for "Backtrace" of Noncrashing Bugs", In Proc. 2005 SIAM Int. Conf. on Data Mining, pp. 286-297, Newport Beach, US, April, 2005.
void subline(char *lin, char *pat, char *sub)
{
int i, lastm, m;
lastm = -1;
i = 0;
while((lin[i] != ENDSTR)) {
m = amatch(lin, i, pat, 0);
if (m > 0){
putsub(lin, i, m, sub);
lastm = m;
}
if ((m == -1) || (m == i)){
fputc(lin[i], stdout);
i = i + 1;
} else
i = m;
}
}
Example of Noncrashing Bugs
void subline(char *lin, char *pat, char *sub)
{
int i, lastm, m;
lastm = -1;
i = 0;
while((lin[i] != ENDSTR)) {
m = amatch(lin, i, pat, 0);
if (m >= 0){
putsub(lin, i, m, sub);
lastm = m;
}
if ((m == -1) || (m == i)){
fputc(lin[i], stdout);
i = i + 1;
} else
i = m;
}
}
void subline(char *lin, char *pat, char *sub)
{
int i, lastm, m;
lastm = -1;
i = 0;
while((lin[i] != ENDSTR)) {
m = amatch(lin, i, pat, 0);
if ((m >= 0) && (lastm != m) ){
putsub(lin, i, m, sub);
lastm = m;
}
if ((m == -1) || (m == i)){
fputc(lin[i], stdout);
i = i + 1;
} else
i = m;
}
}
Debugging Crashes
Crashing Bugs
Bug Localization via Backtrace
Can we circle out the backtrace for noncrashing bugs?
Major challenge
    We do not know where the abnormality happens
Observation
    Classification depends on discriminative features, which can be regarded as a kind of abnormality
    Can we extract a backtrace from classification results?
Outline
Motivation
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Mining Control Flow Abnormality for Logic Error Isolation
CP-Miner: Mining Copy-Paste Bugs
Conclusions
Related Work
Crashing bugs
    Memory access monitoring: Purify [HJ92], Valgrind [SN00], …
Noncrashing bugs
    Static program analysis
    Traditional model checking
    Model checking source code
Static Program Analysis
Methodology
    Examine source code directly
    Enumerate all possible execution paths without running the program
    Check user-specified properties, e.g.,
        free(p) …… (*p)
        lock(res) …… unlock(res)
        receive_ack() …… send_data()
Strengths
    Checks all possible execution paths
Problems
    Shallow semantics
    Properties must be directly mappable to source code structure
Tools
    ESC [DRL+98], LCLint [EGH+94], ESP [DLS02], MC Checker [ECC00], …
Traditional Model Checking
Methodology
    Formally model the system under check in a particular description language
    Exhaustively explore the reachable states to check desired or undesired properties
Strengths
    Models deep semantics
    Naturally fits checking event-driven systems, like protocols
Problems
    Significant manual effort in modeling
    State-space explosion
Tools
    SMV [M93], SPIN [H97], Murphi [DDH+92], …
Model Checking Source Code
Methodology
    Run the real program in a sandbox
    Manipulate event happenings, e.g.,
        message arrivals
        the outcomes of memory allocation
Strengths
    Requires less manual specification
Problems
    Application restrictions, e.g.,
        (still) event-driven programs
        requires a clear mapping between source code and logical events
Tools
    CMC [MPC+02], VeriSoft [G97], Java PathFinder [BHP+00], …
Summary of Related Work
In common, semantic inputs are necessary:
    a program model
    properties to check
Application scenarios:
    shallow semantics
    event-driven systems
Outline
Motivation
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Mining Control Flow Abnormality for Logic Error Isolation
CP-Miner: Mining Copy-Paste Bugs
Conclusions
Example Revisited

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;

    lastm = -1;
    i = 0;
    while (lin[i] != ENDSTR) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0) && (lastm != m)) {   /* condition varies across versions; see below */
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}

The same guard-condition variants as before:
    if (m > 0)                          buggy variant
    if (m >= 0)                         buggy variant: misses the lastm check
    if ((m >= 0) && (lastm != m))       correct

• No memory violations
• Not an event-driven program
• No explicit error properties
Identification of Incorrect Executions
A two-class classification problem
How to abstract program executions?
    Program behavior graphs
    Feature selection: edges + closed frequent subgraphs
Program behavior graph: a function-level abstraction of program behavior

    int main() { ... A(); ... B(); }
    int A() { ... }
    int B() { ... C(); ... }
    int C() { ... }
Values of Classification
A graph classification problem
    Every execution gives one behavior graph
    Two sets of instances: correct and incorrect
Values of classification
    Classification itself does not readily work for bug localization
        The classifier only labels each run as correct or incorrect as a whole
        It does not tell when the abnormality happens
    Successful classification relies on discriminative features
        Can discriminative features be treated as a kind of abnormality?
        When does the abnormality happen? Incremental classification?
Outline
Motivation
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Mining Control Flow Abnormality for Logic Error Isolation
CP-Miner: Mining Copy-Paste Bugs
Conclusions
Incremental Classification
Classification works only when instances of the two classes differ, so classification accuracy can serve as a measure of their difference.
Relate classification dynamics to bug-relevant functions.
Illustration: Precision Boost
[Figure: behavior graphs of one correct execution and one incorrect execution over functions main, A, B, C, D, E, F, G; the incorrect execution additionally reaches a node H]
Bug Relevance
Precision boost
    For each function F: precision boost = exit precision - entrance precision
Intuition
    Differences take place within the execution of F
    The abnormality happens while F is on the stack
The larger the precision boost, the more likely F is part of the backtrace, i.e., a bug-relevant function.
Outline
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Case Study
Conclusions
Case Study
Subject program
    replace: performs regular expression matching and substitution
    563 lines of C code; 17 functions involved
Execution behaviors
    130 out of 5542 test cases fail to give correct outputs
    No incorrect execution incurs a segmentation fault: this is a logic bug
Can we circle out the backtrace for this bug?

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;

    lastm = -1;
    i = 0;
    while (lin[i] != ENDSTR) {
        m = amatch(lin, i, pat, 0);
        if (m >= 0) {   /* buggy: should be (m >= 0) && (lastm != m) */
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}
Precision Pairs
Precision Boost Analysis
Objective judgment of bug-relevant functions
    The main function is always bug-relevant
    Stepwise precision boost
    Line-up property
Backtrace for Noncrashing Bugs
Method Summary
Identify incorrect executions from program runtime behaviors.
Classification dynamics can give away a "backtrace" for noncrashing bugs without any semantic inputs.
Data mining can contribute to software engineering and systems research in general.
Outline
Motivation
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Mining Control Flow Abnormality for Logic Error Isolation
CP-Miner: Mining Copy-Paste Bugs
Conclusions
An Example
Replace program: 563 lines of C code, 20 functions
Symptom: 30 out of 5542 test cases fail to give correct outputs; no crashes
Goal: localize the bug and prioritize manual examination

void dodash(char delim, char *src, int *i, char *dest, int *j, int maxset)
{
    while (…) {
        …
        if (isalnum(isalnum(src[*i+1])) && src[*i-1] <= src[*i+1]) {
            for (k = src[*i-1]+1; k <= src[*i+1]; k++)
                junk = addstr(k, dest, j, maxset);
            *i = *i + 1;
        }
        *i = *i + 1;
    }
}
Difficulty & Expectation
Difficulty
    Statically, even small programs are complex due to dependencies
    Dynamically, execution paths can vary significantly across inputs
    Logic errors have no apparent symptoms
Expectations
    Unrealistic to fully unload developers
    Localize the buggy region
    Prioritize manual examination
Execution Profiling
Full execution trace
    Control flow + value tags
    Too expensive to record at runtime; unwieldy to process
Summarized control flow for conditionals (if, while, for)
    Branch evaluation counts
    Lightweight to take at runtime; easy to process, and effective
Analysis of the Example
Let
    A = isalnum(isalnum(src[*i+1]))
    B = src[*i-1] <= src[*i+1]
An execution is logically correct until (A ∧ ¬B) evaluates true when the evaluation reaches this condition.
If we monitor program conditionals like A, their evaluations shed light on the hidden error and can be exploited for error isolation.

    if (isalnum(isalnum(src[*i+1])) && src[*i-1] <= src[*i+1]) {
        for (k = src[*i-1]+1; k <= src[*i+1]; k++)
            junk = addstr(k, dest, j, maxset);
        *i = *i + 1;
    }
Analysis of Branching Actions
Correct vs. incorrect runs of program P
Across the 5542 test cases, the true evaluation probability of (A ∧ ¬B) is 0.727 in a correct execution and 0.896 in an incorrect execution, on average.
The error location does exhibit detectable abnormal behavior in incorrect executions.

Correct runs:                      Incorrect runs:
           A           ¬A                     A           ¬A
    B      n_AB        n_¬AB           B      n_AB        n_¬AB
    ¬B     n_A¬B = 0   n_¬A¬B          ¬B     n_A¬B ≥ 1   n_¬A¬B
Conditional Test Works for Nonbranching Errors
An off-by-one error can still be detected using conditional tests:

int makepat(char *arg, int start, char delim, char *pat)
{
    …
    if (!junk)
        result = 0;
    else
        result = i + 1;  /* off-by-one error */
                         /* should be: result = i */
    return result;
}
Ranking Based on Boolean Bias
Let input d_i have a desired output o_i; executing P on d_i yields o_i'. P passes test t_i iff o_i' is identical to o_i:
    T_p = {t_i | o_i' = P(d_i) matches o_i}
    T_f = {t_i | o_i' = P(d_i) does not match o_i}
Boolean bias
    Let n_t be the number of times a boolean feature B evaluates true, and n_f the number of times it evaluates false.
    π(B) = (n_t - n_f) / (n_t + n_f)
    π(B) encodes the distribution of B's value: 1 if B always evaluates true, -1 if always false, and in between for all other mixtures.
Evaluation Abnormality
Boolean bias of a branch predicate P: the probability that P is evaluated true within one execution.
Suppose we have n correct and m incorrect executions. For any predicate P we obtain
    an observation sequence for correct runs: S_p = (X'_1, X'_2, …, X'_n)
    an observation sequence for incorrect runs: S_f = (X_1, X_2, …, X_m)
Can we infer whether P is suspicious based on S_p and S_f?
Underlying Populations
Imagine the underlying distributions of boolean bias for correct and incorrect executions are f(X|θ_p) and f(X|θ_f).
S_p and S_f can be viewed as random samples from these underlying populations.
Major heuristic: the larger the divergence between f(X|θ_p) and f(X|θ_f), the more relevant the branch P is to the bug.

[Figure: two probability curves of evaluation bias over [0, 1], one for correct and one for incorrect runs]
Major Challenges
No knowledge of the closed forms of either distribution.
Usually, we do not have enough incorrect executions to estimate f(X|θ_f) reliably.
Our Approach: Hypothesis Testing
Faulty Functions
Motivation
    Bugs are not necessarily on branches
    Function rankings carry higher confidence than branch rankings
Abnormality score for functions
    Calculate the abnormality score for each branch within the function, then aggregate them
Two Evaluation Measures
CombineRank
    Combine the branch scores by summation
    Intuition: a function containing many abnormal branches is likely bug-relevant
UpperRank
    Choose the largest branch score as the representative
    Intuition: a function with one extremely abnormal branch is likely bug-relevant
dodash vs. omatch: which function is likely buggy? And which measure is more effective?
Bug Benchmark
Bug benchmark: the Siemens program suite
    89 faulty variants of 6 subject programs, each of 200-600 LOC
    89 known bugs in total, mainly logic (or semantic) bugs
    Widely used in software engineering research
Results on Program “replace”
Comparison between CombineRank and UpperRank
Buggy function ranked within top-k
Results on Other Programs
More Questions to Be Answered
How to handle multiple errors in one program?
How to detect bugs when only very few failing test cases are available?
Is it really more effective to have more execution traces?
How to integrate program semantics analysis into this statistics-based analysis?