
Page 1:

Data Mining: Concepts and Techniques

— Chapter 11 —

—Software Bug Mining—

Jiawei Han and Micheline Kamber

Department of Computer Science

University of Illinois at Urbana-Champaign

www.cs.uiuc.edu/~hanj

©2006 Jiawei Han and Micheline Kamber. All rights reserved. Acknowledgement: Chao Liu

Page 2:

Page 3:

Outline

Automated Debugging and Failure Triage

SOBER: Statistical Model-Based Fault Localization

Fault Localization-Based Failure Triage

Copy and Paste Bug Mining

Conclusions & Future Research

Page 4:

Software Bugs Are Costly

Software is "full of bugs": Windows 2000, with 35 million lines of code, shipped with 63,000 known bugs, about 2 per 1,000 lines.

Software failures are costly:

  Ariane 5 explosion, due to "errors in the software of the inertial reference system" (Ariane-5 Flight 501 Inquiry Board report, http://ravel.esrin.esa.it/docs/esa-x-1819eng.pdf)

  A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf)

Testing and debugging are laborious and expensive: "50% of my company employees are testers, and the rest spend 50% of their time testing!" (Bill Gates, 1995)

Page 5:

Automated Failure Reporting

End-users as beta testers: valuable information about failure occurrences in the field; 24.5 million reports per day in Redmond, if all users send them (John Dvorak, PC Magazine).

Widely adopted because of its usefulness: Microsoft Windows, Linux Gentoo, Mozilla applications, ... Any application can implement this functionality.

Page 6:

After Failures Are Collected: Failure Triage

Failure triage
  Failure prioritization: which bugs are the most severe?
  Failure assignment: which developers should debug a given set of failures?

Automated debugging
  Where is the likely bug location?

Page 7:

A Glimpse of Software Bugs

Crashing bugs
  Symptoms: segmentation faults
  Reasons: memory access violations
  Tools: Valgrind, CCured

Noncrashing bugs
  Symptoms: unexpected outputs
  Reasons: logic or semantic errors, e.g.
    if ((m >= 0)) vs. if ((m >= 0) && (m != lastm))
    < vs. <=, > vs. >=, etc.
    j = i vs. j = i + 1
  Tools: no sound tools

Page 8:

Semantic Bugs Dominate

Bug distribution [Li et al., ICSE'07]: semantic bugs 78%, memory-related bugs 16%, concurrency bugs and others about 3% each.

  Semantic bugs: application specific; only a few are detectable; mostly require annotations or specifications.
  Memory-related bugs: many are detectable.

264 bugs in Mozilla and 98 bugs in Apache manually checked; 29,000 bugs in Bugzilla automatically checked.

Courtesy of Zhenmin Li

Page 9:

Hacking Semantic Bugs is HARD

Major challenge: no crashes, hence no failure signatures and no debugging hints.

Major methods:
  Statistical debugging of semantic bugs [Liu et al., FSE'05, TSE'06]
  Triaging noncrashing failures through statistical debugging [Liu et al., FSE'06]

Page 10:

Outline

Automated Debugging and Failure Triage

SOBER: Statistical Model-Based Fault Localization

Fault Localization-Based Failure Triage

Copy and Paste Bug Mining

Conclusions & Future Research

Page 11:

A Running Example

Buggy version (the subclause "&& (lastm != m)" is missing):

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while (lin[i] != ENDSTR) {
        m = amatch(lin, i, pat, 0);
        if (m >= 0) {
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            i = i + 1;
        } else
            i = m;
    }
}

Correct version:

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while (lin[i] != ENDSTR) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0) && (lastm != m)) {
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            i = i + 1;
        } else
            i = m;
    }
}

130 of 5542 test cases fail; no crashes.

Counts of true/false evaluations of each predicate in one execution:

Predicate                      # of true   # of false
(lin[i] != ENDSTR) == true         5           1
Ret_amatch < 0                     5           1
Ret_amatch == 0                    1           5
Ret_amatch > 0                     1           5
(m >= 0) == true                   4           2
(m == i) == true                   2           4
(m >= -1) == true                  1           5

Predicate evaluation as tossing a coin.
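As a concrete, purely illustrative sketch of this profiling step (the macro name and counter layout are assumptions, not SOBER's actual instrumentation), each instrumented predicate can carry a pair of counters recording its true/false evaluations in the current run:

#include <stdio.h>

#define NUM_PREDS 7
static unsigned n_true[NUM_PREDS], n_false[NUM_PREDS];

/* Record one evaluation of predicate `id` and pass the truth value through. */
#define OBSERVE(id, cond) ((cond) ? (n_true[id]++, 1) : (n_false[id]++, 0))

/* Inside subline(), e.g.:  if (OBSERVE(4, m >= 0)) { lastm = m; }
 * After one run, (n_true[4], n_false[4]) is one row of the table above. */
void dump_profile(void)
{
    int i;
    for (i = 0; i < NUM_PREDS; i++)
        printf("pred %d: %u true, %u false\n", i, n_true[i], n_false[i]);
}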

Page 12:

Profile Executions as Vectors

Each execution is summarized by its per-predicate (true, false) counts:

Passing execution 1: (5,1) (1,5) (1,5) (4,2) (1,5) (2,4) (1,5)
Passing execution 2: (19,1) (1,19) (1,19) (18,2) (1,19) (2,18) (1,19)
Failing execution:   (9,1) (1,9) (1,9) (8,2) (8,2) (2,8) (1,9)

Extreme case: a predicate is always false in passing executions and always true in failing executions.

Generalized case: different true probabilities in passing and failing executions.

Page 13:

Estimated Head Probability

Evaluation bias: the estimated head probability from each execution. Specifically,

    X = nt / (nt + nf)

where nt and nf are the numbers of true and false evaluations of the predicate in one execution. X is defined for each predicate and each execution.
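Transcribed directly into C (a trivial helper; the -1 sentinel for never-evaluated predicates is our own convention):

/* Evaluation bias X = nt / (nt + nf); returns -1 if the predicate
 * was never reached in this execution. */
double evaluation_bias(unsigned n_t, unsigned n_f)
{
    return (n_t + n_f) ? (double)n_t / (n_t + n_f) : -1.0;
}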

Page 14:

Divergence in Head Probability

Multiple executions yield multiple evaluation biases; each evaluation bias is treated as generated from a model:

    S_F = (X1, X2, ..., Xm), each Xi ~ f(X | θf)    (failing executions)
    S_P = (X'1, X'2, ..., X'n), each X'i ~ f(X | θp)    (passing executions)

(Figure: the head-probability distributions over [0, 1] differ between passing and failing executions.)

Page 15:

Major Challenges

No closed form of either model, and no sufficient number of failing executions to estimate f(X | θf).

(Figure: the failing-run distribution of head probability must be estimated from few samples.)

Page 16:

Page 17:

SOBER in Summary

(Diagram: SOBER takes the source code and a test suite as input, profiles the instrumented predicates Pred1, Pred2, Pred3, Pred6, ..., and outputs the predicates ranked by suspiciousness.)

Page 18:

Previous State of the Art [Liblit et al., 2005]

Correlation analysis:
  Context(P) = Prob(fail | P ever evaluated)
  Failure(P) = Prob(fail | P ever evaluated as true)
  Increase(P) = Failure(P) - Context(P): how much more likely the program is to fail when the predicate is ever evaluated to true.

Page 19:

Liblit05 in Illustration

(Figure: failing runs (+) and passing runs (O); P is evaluated in 10 of them, of which 4 fail, and P is ever true in 7 of them, of which 3 fail.)

Context(P) = Prob(fail | P ever evaluated) = 4/10 = 2/5
Failure(P) = Prob(fail | P ever evaluated as true) = 3/7
Increase(P) = Failure(P) - Context(P) = 3/7 - 2/5 = 1/35
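The arithmetic restates easily in code. A sketch, under the assumption that for each predicate P we have already counted the passing/failing runs that evaluated P and those in which P was ever true (the struct and field names are illustrative):

typedef struct {
    unsigned fail_reached, pass_reached;  /* runs that ever evaluated P    */
    unsigned fail_true,    pass_true;     /* runs in which P was ever true */
} PredCounts;

double context_score(PredCounts c)        /* Prob(fail | P ever evaluated) */
{
    unsigned reached = c.fail_reached + c.pass_reached;
    return reached ? (double)c.fail_reached / reached : 0.0;
}

double failure_score(PredCounts c)        /* Prob(fail | P ever true)      */
{
    unsigned ever_true = c.fail_true + c.pass_true;
    return ever_true ? (double)c.fail_true / ever_true : 0.0;
}

double increase_score(PredCounts c)
{
    return failure_score(c) - context_score(c);
}

/* For the numbers above: 10 runs reach P (4 failing) and 7 runs see P
 * true (3 failing), so Increase(P) = 3/7 - 2/5 = 1/35. */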

Page 20:

SOBER in Illustration

(Figure: the same failing (+) and passing (O) runs, now profiled by evaluation bias; the bias distributions over [0, 1] for failing and passing runs are compared directly.)

Page 21:

Difference between SOBER and Liblit05

Methodology:
  Liblit05: correlation analysis
  SOBER: model-based approach

void subline(char *lin, char *pat, char *sub)
{
1     int i, lastm, m;
2     lastm = -1;
3     i = 0;
4     while (lin[i] != ENDSTR) {
5         m = amatch(lin, i, pat, 0);
6         if (m >= 0) {
7             putsub(lin, i, m, sub);
8             lastm = m;
9         }
10    }
11 }

Utilized information:
  Liblit05: is the predicate ever true?
  SOBER: what percentage of its evaluations are true?

On line 6:
  Liblit05: line 6 is ever true in most passing and failing executions, so it cannot discriminate.
  SOBER: the condition is prone to be true in failing executions and prone to be false in passing executions.
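As a rough sketch of the model-based scoring (a simple standardized mean difference of evaluation-bias samples, standing in for SOBER's actual hypothesis-testing statistic [Liu et al., FSE'05]), predicates whose bias distributions diverge most between failing and passing runs rank highest:

#include <math.h>

static double mean(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s / n;
}

static double variance(const double *x, int n, double m)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += (x[i] - m) * (x[i] - m);
    return s / n;
}

/* Abnormality score of one predicate: how far apart are the mean
 * evaluation biases in failing vs. passing runs, relative to noise. */
double divergence_score(const double *fail_bias, int m,
                        const double *pass_bias, int n)
{
    double mf = mean(fail_bias, m), mp = mean(pass_bias, n);
    double v = variance(fail_bias, m, mf) / m + variance(pass_bias, n, mp) / n;
    return fabs(mf - mp) / sqrt(v + 1e-12);
}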

Page 22:

T-Score: Metric of Debugging Quality

How close is the blamed location to the real bug location?

(Figure: the control-flow graph of the buggy subline() from Page 11, with nodes lastm = -1; i = 0; while (lin[i] != ENDSTR); m = amatch(lin, i, pat, 0); if (m >= 0) lastm = m; if (m == -1 || m == i) i = i + 1; else i = m; the blamed node is far from the faulty condition. T-score = 70%.)
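To unpack the number (paraphrasing the metric as used in the SOBER papers): starting from the blamed statement, a developer expands a breadth-first search over the program dependence graph until the actual faulty statement is covered, and the T-score is the fraction of the program examined by then. T-score = 70% thus means roughly 70% of the code would be inspected before this report leads to the bug, so lower is better.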

Page 23:

A Better Debugging Result

(Figure: the same control-flow graph of subline(), with the blamed node now closer to the faulty condition "if (m >= 0)". T-score = 40%.)

Page 24:

Evaluation 1: Siemens Program Suite

T-score <= 20% is meaningful.

Siemens program suite: 130 buggy versions of 7 small (<700 LOC) programs.

Measured: what percentage of bugs can be located with no more than a given percentage of code examination.

Page 25:

Evaluation 2: Reasonably Large Programs

Program (LOC)        | Bug   | Type                                         | Failures | T-Score
Flex 2.4.7 (8,834)   | Bug 1 | Misuse of >= for >                           | 163/525  | 0.5%
Flex 2.4.7           | Bug 2 | Misuse of = for ==                           | 356/525  | 1.6%
Flex 2.4.7           | Bug 3 | Mis-assign true for false                    | 69/525   | 7.6%
Flex 2.4.7           | Bug 4 | Mis-parenthesize ((a||b)&&c) as (a||(b&&c))  | 22/525   | 15.4%
Flex 2.4.7           | Bug 5 | Off-by-one                                   | 92/525   | 45.6%
Grep 2.2 (11,826)    | Bug 1 | Off-by-one                                   | 48/470   | 0.6%
Grep 2.2             | Bug 2 | Subclause missing                            | 88/470   | 0.2%
Gzip 1.2 (6,184)     | Bug 1 | Subclause missing                            | 65/217   | 0.5%
Gzip 1.2             | Bug 2 | Subclause missing                            | 17/217   | 2.9%

Software-artifact Infrastructure Repository (SIR): http://sir.unl.edu

Page 26:

A Glimpse of Bugs in Flex-2.4.7

Bug 1: Misuse of >= for >
#ifndef BUG_1
    if ( performance_report > 0 )
#else
    if ( performance_report >= 0 )
#endif

Bug 2: Mis-parenthesize
#ifndef BUG_2
    if ( (fulltbl || fullspd) && reject )
#else
    if ( fulltbl || (fullspd && reject) )
#endif

Bug 3: Misuse of != for ==
#ifndef BUG_3
    if (s - nextchar == my_strlen (p->name))
#else
    if (s - nextchar != my_strlen (p->name))
#endif

Bug 4: Misuse of = for ==
#ifndef BUG_4
    if ( yymore_really_used == REALLY_USED )
#else
    if ( yymore_really_used = REALLY_NOT_USED )
#endif

Bug 5: Off-by-one
#ifndef BUG_5
    chk[offset] = EOB_POSITION;
#else
    chk[offset - 1] = EOB_POSITION;
#endif

Page 27:

Evaluation 2: Reasonably Large Programs

(Table repeated from Page 25.)

Page 28:

A Close Look: Grep-2.2, Bug 1

static int grep(int fd) {
    ...
541     for ( ; ; )
542     {
        ...
548         lastnl = bufbeg;
549         if (lastout)
550             lastout = bufbeg;
        ...
553         beg = bufbeg + save - residue + 1;   /* fault 1 */
        ...
574         if (beg != lastout)
575             lastout = 0;
        ...
580     }
    ...
587     return nlines;
588 }

Flagged predicates: P1470, P1484.

11,826 lines of C code; 3,136 predicates instrumented; 48 out of 470 cases fail.

Page 29:

Grep-2.2: Bug 2

static char ** comsubs(char* left, char* right) {
    ...
2264    for (lcp = left; *lcp != '\0'; ++lcp)
2265    {
        ...
2268        while (rcp != NULL)
2269        {
2270            for (i = 1; lcp[i] != '\0' /* && lcp[i] == rcp[i] */; ++i)   /* fault 2 */
2271                continue;
            ...
2275        }
        ...
2280    }
2281    return cpp;
2282 }

Flagged predicate: P1952.

11,826 lines of C code; 3,136 predicates instrumented; 88 out of 470 cases fail.

Page 30:

No Silver Bullet: Flex Bug 5

132 void genctbl() {
    ...
177     for ( i = 0; i <= lastdfa; ++i )
178     {
179         int anum = dfaacc[i].dfaacc_state;
180         int offset = base[i];
#ifndef BUG_5
            chk[offset] = EOB_POSITION;
#else
            chk[offset - 1] = EOB_POSITION;
#endif
183         chk[offset - 1] = ACTION_POSITION;
184         nxt[offset - 1] = anum;   /* action number */
185     }
    ...
224 }

Why it is missed: there is no wrong value in chk[offset - 1], and chk[offset] is not used here but later.

8,834 lines of C code; 2,699 predicates instrumented.

Page 31:

Experiment Result in Summary

Effective for bugs that demonstrate abnormal control flow. (Table repeated from Page 25.)

Page 32:

SOBER Handles Memory Bugs As Well

void more_variables () {
    ...
127     old_count = v_count;
    ...
137     for (indx = 3; indx < old_count; indx++)
    ...
141     for (; indx < v_count; indx++)
    ...
}

void more_arrays () {
    ...
167     arrays = (bc_var_array **) bc_malloc (a_count * sizeof(bc_var_array*));
    ...
        /* Copy the old arrays. */
        for (indx = 1; indx < old_count; indx++)
            arrays[indx] = old_ary[indx];
176     for (; indx < v_count; indx++)
            arrays[indx] = NULL;
    ...
}

bc 1.06: two memory bugs found with SOBER; one of them was previously unreported; the blamed location is NOT the crash site.

Page 33:

Outline

Automated Debugging and Failure Triage

SOBER: Statistical Model-Based Fault Localization

Fault Localization-Based Failure Triage

Copy and Paste Bug Mining

Conclusions & Future Research

Page 34:

Major Problems in Failure Triage

Failure prioritization:
  Which failures are likely due to the same bug?
  Which bugs are the most severe? The worst 1% of bugs account for 50% of failures.

Failure assignment:
  Which developer should debug which set of failures?

Courtesy of Microsoft Corporation

Page 35:

A Solution: Failure Clustering

Failure indexing: identify failures likely due to the same bug.

(Figure: failure reports plotted in a feature space form clusters; each cluster suggests a fault hypothesis, e.g. "fault in core.io?" or "fault in function initialize()?", and the clusters are prioritized from most severe to least severe.)

Page 36:

The Central Question: A Distance Measure between Failures

Different measures render different clusterings.

(Figure: the same points cluster one way under a distance defined on the X-axis and a different way under a distance defined on the Y-axis.)

Page 37:

How to Define a Distance

Previous work [Podgurski et al., 2003]: T-Proximity, a distance defined on literal trace similarity of the failing runs fail_1, ..., fail_m.

Our approach [Liu et al., 2006]: R-Proximity, a distance defined on likely bug locations, obtained by running SOBER on each failing run.

Page 38:

Why Our Approach Is Reasonable

The optimal proximity would be defined on root causes (RC), which are unknown; our approach is defined on likely causes (LC) produced by automated fault localization.

(Figure: each failing run fail_1, ..., fail_m maps to a root cause RC_1, ..., RC_m; automated fault localization approximates each RC_i with a likely cause LC_i.)

Page 39:

R-Proximity: An Instantiation with SOBER

Likely causes (LCs) are predicate rankings: SOBER maps each failing run fail_i (contrasted against the passing runs pass_1, ..., pass_n) to a ranking of the instrumented predicates, e.g. (Pred2, Pred6, Pred1, Pred3).

A distance between rankings is needed.

Page 40:

Distance between Rankings

Traditional Kendall's tau distance: the number of pairwise preference disagreements between two rankings. E.g., if σ1 = (P1, P3, P2) and σ2 = (P2, P1, P3), then Dist(σ1, σ2) = 2.

But not all predicates should be considered equally: predicates are uniformly instrumented, while only the fault-relevant predicates count.
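A direct O(k²) implementation of the unweighted distance (the positions array is an assumed encoding: pos[i] is predicate i's rank):

/* Kendall's tau distance: count predicate pairs that the two rankings
 * order differently. pos1[i] and pos2[i] give predicate i's position. */
int kendall_tau(const int *pos1, const int *pos2, int k)
{
    int i, j, d = 0;
    for (i = 0; i < k; i++)
        for (j = i + 1; j < k; j++)
            if ((pos1[i] < pos1[j]) != (pos2[i] < pos2[j]))
                d++;
    return d;
}

/* For s1 = (P1, P3, P2) and s2 = (P2, P1, P3):
 * pos1 = {0, 2, 1}, pos2 = {1, 0, 2}, and kendall_tau(pos1, pos2, 3) == 2. */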

Page 41:

Predicate Weighting in a Nutshell

Fault-relevant predicates receive higher weights, and fault-relevance is implied by the rankings themselves: predicates that are mostly favored (ranked near the top) across the failing runs' rankings receive higher weights.

(Figure: the rankings produced for failures 1, 2, 3, ..., m mostly place Pred2 first.)

Page 42:

Automated Failure Assignment

The most-favored predicates indicate the agreed bug location for a group of failures.

(Figure: a predicate spectrum graph plots, for each predicate index, how strongly that predicate is favored across the four failure rankings; here Pred2 tops every ranking.)

Page 43:

Case Study 1: Grep-2.2

470 test cases in total; 136 cases fail due to the two faults, with no crashes: 48 fail due to Fault 1 and 88 fail due to Fault 2.

Fault 1: an off-by-one error in grep.c (flagged predicates P1470, P1484):

static int grep(int fd) {
    ...
541     for ( ; ; )
542     {
        ...
548         lastnl = bufbeg;
549         if (lastout)
550             lastout = bufbeg;
551         if (buflim - bufbeg == save)
552             break;
553         beg = bufbeg + save - residue + 1;   /* fault 1 */
554         for (lim = buflim; lim > beg && lim[-1] != '\n'; --lim)
555             ;
        ...
574         if (beg != lastout)
575             lastout = 0;
576         save = residue + lim - beg;
        ...
580     }
    ...
587     return nlines;
588 }

Fault 2: a subclause-missing error in dfa.c (flagged predicate P1952):

static char ** comsubs(char* left, char* right) {
    ...
2264    for (lcp = left; *lcp != '\0'; ++lcp)
2265    {
2266        len = 0;
2267        rcp = index(right, *lcp);
2268        while (rcp != NULL)
2269        {
2270            for (i = 1; lcp[i] != '\0' /* && lcp[i] == rcp[i] */; ++i)   /* fault 2 */
2271                continue;
2272            if (i > len)
2273                len = i;
2274            rcp = index(rcp + 1, *lcp);
2275        }
2276        if (len == 0)
2277            continue;
2278        if ((cpp = enlist(cpp, lcp, len)) == NULL)
2279            break;
2280    }
2281    return cpp;
2282 }

Page 44:

Failure Proximity Graphs

(Figure: failures plotted under T-Proximity and under R-Proximity. Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2.)

Under T-Proximity, the same fault yields divergent behaviors; R-Proximity gives the better clustering result.

Page 45:

Guided Failure Assignment

What predicates are favored in each group?

Page 46:

Assign Failures to Appropriate Developers

The 21 failing cases in Cluster 1 are assigned to the developers responsible for the function grep (Fault 1, the off-by-one error in grep.c).

The 112 failing cases in Cluster 2 are assigned to the developers responsible for the function comsubs (Fault 2, the subclause-missing error in dfa.c).

(Code repeated from Page 43.)

Page 47:

Case Study 2: Gzip-1.2.3

217 test cases in total; 82 cases fail due to the two faults, with no crashes: 65 fail due to Fault 1 and 17 fail due to Fault 2.

661 ulg deflate() {
    ...
         /* Fault 1 */
686      if (hash_head != NIL /* && prev_length < max_lazy_match */
687          && strstart - hash_head <= MAX_DIST ) {
    ...
692          match_length = longest_match (hash_head);
         }
707      if (prev_length >= MIN_MATCH && match_length <= prev_length) {
    ...
711          flush = ct_tally(strstart-1-prev_match, prev_length - MIN_MATCH);
    ...
719          strstart++;
    ...
732      } else if (match_available) {
    ...
738          if (ct_tally (0, window[strstart-1])) {
    ...
741          strstart++;
743      } else {
    ...
750      }
}

580 local ulg deflate_fast() {
    ...
         /* Fault 2 */
596      if (hash_head != NIL /* && strstart - hash_head <= MAX_DIST */) {
601          match_length = longest_match (hash_head);
         }
605      if (match_length >= MIN_MATCH) {
    ...
608          flush = ct_tally(strstart-match_start, match_length - MIN_MATCH);
610          lookahead -= match_length;
615          if (match_length <= max_insert_length) {
    ...
626              strstart++;
635          }
636      } else {
    ...
639          flush = ct_tally (0, window[strstart]);
    ...
642      }
653      return FLUSH_BLOCK(1);   /* eof */
654 }

Page 48:

Failure Proximity Graphs

(Figure: failures plotted under T-Proximity and under R-Proximity. Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2.)

Clustering is nearly perfect under R-Proximity, enabling accurate failure assignment.

Page 49:

Outline

Automated Debugging and Failure Triage

SOBER: Statistical Model-Based Fault Localization

Fault Localization-Based Failure Triage

Copy and Paste Bug Mining

Conclusions & Future Research

Page 50:

Mining Copy-Paste Bugs

Copy-pasting is common: 12% of the Linux file system [Kasper2003] and 19% of the X Window system [Baker1995] are copy-pasted code.

Copy-pasted code is error-prone: among 35 errors in Linux drivers/i2o, 34 are caused by copy-paste [Chou2001].

Simplified example from linux-2.6.6/arch/sparc/prom/memory.c:

void __init prom_meminit(void)
{
    ...
    for (i = 0; i < n; i++) {
        total[i].adr   = list[i].addr;
        total[i].bytes = list[i].size;
        total[i].more  = &total[i+1];
    }
    ...
    for (i = 0; i < n; i++) {
        taken[i].adr   = list[i].addr;
        taken[i].bytes = list[i].size;
        taken[i].more  = &total[i+1];   /* forgot to change: should be &taken[i+1] */
    }
    ...
}

Page 51:

An Overview of Copy-Paste Bug Detection

1. Parse source code and build a sequence database
2. Mine for basic copy-pasted segments
3. Compose larger copy-pasted segments
4. Prune false positives

Page 52:

Parsing Source Code

Purpose: build a sequence database. Idea: map each statement to a number.

Tokenize each component: different operators, constants, and keywords map to different tokens. Handle identifier renaming: identifiers of the same type map to the same token.

old = 3;  --tokenize-->  (5, 61, 20)  --hash-->  16
new = 3;  --tokenize-->  (5, 61, 20)  --hash-->  16
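A toy version of this step (the token codes and the hash are made up here; CP-Miner's real tokenizer is much richer): keywords, operators, and constants get fixed codes, and every identifier of the same kind gets one shared code so that renamed copies still hash identically.

/* Toy statement hashing: "old = 3;" and "new = 3;" produce the same
 * token sequence, hence the same hash, neutralizing identifier renaming. */
enum { TOK_IDENT = 5, TOK_ASSIGN = 61, TOK_CONST = 20 };   /* illustrative */

unsigned stmt_hash(const int *toks, int n)
{
    unsigned h = 2166136261u;              /* FNV-style, chosen arbitrarily */
    for (int i = 0; i < n; i++) {
        h ^= (unsigned)toks[i];
        h *= 16777619u;
    }
    return h % 100;                        /* small range, like 16/65/71 above */
}

/* int stmt[] = { TOK_IDENT, TOK_ASSIGN, TOK_CONST };
 * stmt_hash(stmt, 3) is identical for "old = 3;" and "new = 3;". */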

Page 53:

Building Sequence Database

The whole program becomes one long sequence, but the miner needs a sequence database, so the long sequence is cut. Naïve method: fixed length. Our method: basic blocks.

for (i=0; i<n; i++) {              -> 65
    total[i].adr = list[i].addr;   -> 16
    total[i].bytes = list[i].size; -> 16
    total[i].more = &total[i+1];   -> 71
}
...
for (i=0; i<n; i++) {              -> 65
    taken[i].adr = list[i].addr;   -> 16
    taken[i].bytes = list[i].size; -> 16
    taken[i].more = &total[i+1];   -> 71
}

Final sequence DB: (65)(16, 16, 71) ... (65)(16, 16, 71)

Page 54:

Mining for Basic Copy-Pasted Segments

(Running example: the total[] and taken[] statement triples above both hash to (16, 16, 71).)

Apply a frequent sequence mining algorithm to the sequence database, with one modification: constrain the maximum gap.

(16, 16, 71) ... (16, 16, 10, 71): still a frequent subsequence, with 1 inserted statement (gap = 1).
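The gap constraint can be illustrated with a naive checker (a stand-in sketch, not CP-Miner's actual miner): does pattern `pat` occur in `seq` as a subsequence with at most `maxgap` skipped statements between consecutive matches?

/* Greedy subsequence match with a maximum gap between matched elements. */
int matches_with_gap(const int *seq, int n, const int *pat, int m, int maxgap)
{
    for (int start = 0; start < n; start++) {
        int i = start, j = 0, gap = 0;
        while (i < n && j < m) {
            if (seq[i] == pat[j]) { j++; gap = 0; }
            else if (j > 0 && ++gap > maxgap) break;  /* gaps count only mid-match */
            i++;
        }
        if (j == m) return 1;
    }
    return 0;
}

/* int seq[] = {16, 16, 10, 71}, pat[] = {16, 16, 71};
 * matches_with_gap(seq, 4, pat, 3, 1) == 1, but with maxgap 0 it is 0. */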

Page 55:

Composing Larger Copy-Pasted Segments

Combine the neighboring copy-pasted segments repeatedly: the loop header "for (i=0; i<n; i++) {" (hash 65) and the loop body (16, 16, 71) are each copy-pasted, and they are adjacent in both copies, so they combine into one larger copy-pasted segment (65, 16, 16, 71).

Page 56:

Pruning False Positives

Unmappable segments: identifier names cannot be mapped to corresponding ones, e.g.

f(a1);    f1(b1);
f(a2);    f1(b2);
f(a3);    f2(b3);    <- conflict: f maps to both f1 and f2

Tiny segments are also pruned.

For more detail, see [LLM+04]: Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code. In Proc. 6th Symp. Operating Systems Design and Implementation, 2004.

Page 57:

Some Test Results of C-P Bug Detection

Software   | LOC   | Verified Bugs | Potential Bugs (careless programming) | Time    | Space (MB)
Linux      | 4.4 M | 28            | 21                                    | 20 mins | 527
FreeBSD    | 3.3 M | 23            | 8                                     | 20 mins | 459
Apache     | 224 K | 5             | 0                                     | 15 secs | 30
PostgreSQL | 458 K | 2             | 0                                     | 38 secs | 57

Page 58:

Outline

Automated Debugging and Failure Triage

SOBER: Statistical Model-Based Fault Localization

Fault Localization-Based Failure Triage

Copy and Paste Bug Mining

Conclusions & Future Research

Page 59:

Conclusions

Data mining into software and computer systems:
  Incorrect executions can be identified from program runtime behaviors.
  Classification dynamics can give away a "backtrace" for noncrashing bugs without any semantic input.
  A hypothesis testing-like approach localizes logic bugs in software; no prior knowledge of program semantics is assumed.
  Lots of other software bug mining methods remain to be explored.

Page 60:

Future Research: Mining into Computer Systems

Huge volumes of data come from computer systems: persistent state interactions, event logs, network logs, CPU usage, ...

Mining system data for reliability, performance, manageability, ...

Challenges in data mining: statistical modeling of computer systems; online operation, scalability, interpretability, ...

Page 61:

References

[DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended static checking, 1998.

[EGH+94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint: A tool for using specifications to check code. In Proc. ACM SIGSOFT '94 Symp. Foundations of Software Engineering, pages 87-96, 1994.

[DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-sensitive program verification in polynomial time. In Conf. Programming Language Design and Implementation, 2002.

[ECC00] D. R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Proc. 4th Symp. Operating Systems Design and Implementation, October 2000.

[M93] Ken McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.

[H97] Gerard J. Holzmann. The model checker SPIN. Software Engineering, 23(5):279-295, 1997.

[DDH+92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol verification as a hardware design aid. In IEEE Int. Conf. Computer Design: VLSI in Computers and Processors, pages 522-525, 1992.

[MPC+02] M. Musuvathi, D. Y. W. Park, A. Chou, D. R. Engler, and D. L. Dill. CMC: A pragmatic approach to model checking real code. In Proc. 5th Symp. Operating Systems Design and Implementation, 2002.

Page 62:

References (cont’d)

[G97] P. Godefroid. Model checking for programming languages using VeriSoft. In Proc. 24th ACM Symp. Principles of Programming Languages, 1997.

[BHP+00] G. Brat, K. Havelund, S. Park, and W. Visser. Model checking programs. In IEEE Int'l Conf. Automated Software Engineering (ASE), 2000.

[HJ92] R. Hastings and B. Joyce. Purify: Fast detection of memory leaks and access errors. In Proc. Winter 1992 USENIX Conference, pages 125-138, San Francisco, California.

Chao Liu, Xifeng Yan, and Jiawei Han. Mining control flow abnormality for logic error isolation. In Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.

C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical model-based bug localization. In Proc. 2005 ACM SIGSOFT Symp. Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, Sept. 2005.

C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining behavior graphs for backtrace of noncrashing bugs. In Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.

[SN00] Julian Seward and Nick Nethercote. Valgrind, an open-source memory debugger for x86-GNU/Linux. http://valgrind.org/

[LLM+04] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A tool for finding copy-paste and related bugs in operating system code. In Proc. 6th Symp. Operating Systems Design and Implementation, 2004.

[LCS+04] Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan, and Yuanyuan Zhou. C-Miner: Mining block correlations in storage systems. In Proc. 3rd USENIX Conf. on File and Storage Technologies, 2004.

Page 63:

Page 64:

Surplus Slides

The remaining slides are leftovers.

Page 65:

Representative Publications

Chao Liu, Long Fei, Xifeng Yan, Jiawei Han and Samuel Midkiff, “Statistical Debugging: A Hypothesis Testing-Based Approach,” IEEE Transaction on Software Engineering, Vol. 32, No. 10, pp. 831-848, Oct., 2006.

Chao Liu and Jiawei Han, “R-Proximity: Failure Proximity Defined via Statistical Debugging,” IEEE Transaction on Software Engineering, Sept. 2006. (under review)

Chao Liu, Zeng Lian and Jiawei Han, "How Bayesians Debug", the 6th IEEE International Conference on Data Mining, pp. 382-393, Hong Kong, China, Dec. 2006.

Chao Liu and Jiawei Han, "Failure Proximity: A Fault Localization-Based Approach", the 14th ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 286-295, Portland, USA, Nov. 2006.

Chao Liu, "Fault-aware Fingerprinting: Towards Mutualism between Failure Investigation and Statistical Debugging", the 14th ACM SIGSOFT Symposium on the Foundations of Software Engineering, Portland, USA, Nov. 2006.

Chao Liu, Chen Chen, Jiawei Han and Philip S. Yu, "GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis", the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 872-881, Philadelphia, USA, Aug. 2006.

Qiaozhu Mei, Chao Liu, Hang Su and Chengxiang Zhai, "A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs", the 15th International Conference on World Wide Web, pp. 533-542, Edinburgh, Scotland, May, 2006. 

Chao Liu, Xifeng Yan and Jiawei Han, "Mining Control Flow Abnormality for Logic Error Isolation", 2006 SIAM International Conference on Data Mining, pp. 106-117, Bethesda, US, April, 2006.

Chao Liu, Xifeng Yan, Long Fei, Jiawei Han and Samuel Midkiff, "SOBER: Statistical Model-Based Bug Localization", the 5th joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 286-295, Lisbon, Portugal, Sept. 2005.

William Yurcik and Chao Liu. "A First Step Toward Detecting SSH Identity Theft on HPC Clusters: Discriminating Cluster Masqueraders Based on Command Behavior" the 5th International Symposium on Cluster Computing and the Grid, pp. 111-120, Cardiff, UK, May 2005.

Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han and Philip S. Yu, "Mining Behavior Graphs for "Backtrace" of Noncrashing Bugs", In Proc. 2005 SIAM Int. Conf. on Data Mining, pp. 286-297, Newport Beach, US, April, 2005.

Page 66:

Example of Noncrashing Bugs

Correct version:

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while (lin[i] != ENDSTR) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0) && (lastm != m)) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}

Buggy variant 1 (misuse of > for >=): the condition reads "if (m > 0)".

Buggy variant 2 (missing subclause): the condition reads "if (m >= 0)", dropping "&& (lastm != m)".

Page 67:

Debugging Crashes

Crashing Bugs

Page 68:

Bug Localization via Backtrace

Can we circle out the backtrace for noncrashing bugs?

Major challenge: we do not know where the abnormality happens.

Observation: classification depends on discriminative features, which can be regarded as a kind of abnormality.

Can we extract a backtrace from classification results?

Page 69:

Outline

Motivation
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Mining Control Flow Abnormality for Logic Error Isolation
CP-Miner: Mining Copy-Paste Bugs
Conclusions

Page 70:

Related Work

Crashing bugs: memory access monitoring, e.g. Purify [HJ92], Valgrind [SN00], ...

Noncrashing bugs: static program analysis; traditional model checking; model checking source code.

Page 71:

Static Program Analysis

Methodology: examine source code directly; enumerate all possible execution paths without running the program; check user-specified properties, e.g.

    free(p) ... (*p)
    lock(res) ... unlock(res)
    receive_ack() ... send_data()

Strengths: checks all possible execution paths.

Problems: shallow semantics; only properties that can be directly mapped to source code structure.

Tools: ESC [DRL+98], LCLint [EGH+94], ESP [DLS02], MC Checker [ECC00], ...

Page 72:

Traditional Model Checking

Methodology: formally model the system under check in a particular description language; exhaustively explore the reachable states to check desired or undesired properties.

Strengths: models deep semantics; naturally fits event-driven systems such as protocols.

Problems: significant manual modeling effort; state space explosion.

Tools: SMV [M93], SPIN [H97], Murphi [DDH+92], ...

Page 73:

Model Checking Source Code

Methodology: run the real program in a sandbox; manipulate event happenings, e.g. message arrivals and the outcomes of memory allocation.

Strengths: less manual specification.

Problems: application restrictions remain, e.g. (still) event-driven programs, and a clear mapping between source code and logic events is required.

Tools: CMC [MPC+02], VeriSoft [G97], Java PathFinder [BHP+00], ...

Page 74:

Summary of Related Work

In common, semantic inputs are necessary: a program model and properties to check.

Application scenarios: shallow semantics or event-driven systems.

Page 75:

Outline

Motivation
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Mining Control Flow Abnormality for Logic Error Isolation
CP-Miner: Mining Copy-Paste Bugs
Conclusions

Page 76:

Example Revisited

(The buggy and correct versions of subline() from Page 66.)

• No memory violations
• Not an event-driven program
• No explicit error properties

Page 77:

Identification of Incorrect Executions

A two-class classification problem.

How to abstract program executions: program behavior graphs, a function-level abstraction of program behavior.

Feature selection: edges + closed frequent subgraphs.

int main() { ... A(); ... B(); }
int A() { ... }
int B() { ... C(); ... }
int C() { ... }

Page 78:

Values of Classification

A graph classification problem: every execution gives one behavior graph, and there are two sets of instances, correct and incorrect.

Classification itself does not readily work for bug localization: the classifier only labels each run as correct or incorrect as a whole; it does not tell when the abnormality happens.

Successful classification relies on discriminative features. Can discriminative features be treated as a kind of abnormality? When does the abnormality happen? Incremental classification?

Page 79:

Outline

Motivation
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Mining Control Flow Abnormality for Logic Error Isolation
CP-Miner: Mining Copy-Paste Bugs
Conclusions

Page 80:

Incremental Classification

Classification works only when instances of the two classes differ, so classification accuracy can serve as a measure of difference.

Idea: relate classification dynamics to bug-relevant functions.

Page 81:

Illustration: Precision Boost

(Figure: the behavior graphs of one correct and one incorrect execution, over functions main, A, B, C, D, E, F, G; the incorrect execution additionally contains a call to H.)

Page 82:

Bug Relevance

Precision boost for each function F: precision boost = exit precision - entrance precision.

Intuition: the differences take place within the execution of F; the abnormality happens while F is on the stack.

The larger the precision boost, the more likely F is part of the backtrace, i.e., a bug-relevant function.
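The ranking step is then simple bookkeeping (a sketch; the struct and field names are illustrative): record the classifier's precision at each function's entrance and exit, and sort functions by the boost.

typedef struct {
    const char *name;
    double entrance_precision;  /* classification precision before F runs   */
    double exit_precision;      /* classification precision after F returns */
} FuncPrecision;

double precision_boost(const FuncPrecision *f)
{
    return f->exit_precision - f->entrance_precision;
}

/* Sorting all functions by precision_boost() in decreasing order yields
 * the candidate "backtrace" for the noncrashing bug. */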

Page 83:

Outline

Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Case Study
Conclusions

Page 84:

Case Study

Subject program: replace, which performs regular expression matching and substitution; 563 lines of C code; 17 functions involved.

Execution behaviors: 130 out of 5542 test cases fail to give correct outputs, and no incorrect execution incurs a segmentation fault, i.e., a logic bug.

Can we circle out the backtrace for this bug?

(The buggy and correct versions of subline() from Page 66.)

Page 85:

Precision Pairs

Page 86:

Precision Boost Analysis

Objective judgment of bug-relevant functions:
  The main function is always bug-relevant
  Stepwise precision boost
  Line-up property

Page 87:

Backtrace for Noncrashing Bugs

Page 88:

Method Summary

Incorrect executions can be identified from program runtime behaviors.

Classification dynamics can give away a "backtrace" for noncrashing bugs without any semantic inputs.

Data mining can contribute to software engineering and systems research in general.

Page 89:

Outline

Motivation
Related Work
Classification of Program Executions
Extract "Backtrace" from Classification Dynamics
Mining Control Flow Abnormality for Logic Error Isolation
CP-Miner: Mining Copy-Paste Bugs
Conclusions

Page 90:

An Example

The replace program: 563 lines of C code, 20 functions.
Symptom: 30 out of 5542 test cases fail to give correct outputs, and there are no crashes.
Goal: localize the bug and prioritize manual examination.

void dodash(char delim, char *src, int *i, char *dest, int *j, int maxset)
{
    while (...) {
        ...
        if (isalnum(isalnum(src[*i+1])) && src[*i-1] <= src[*i+1]) {
            for (k = src[*i-1]+1; k <= src[*i+1]; k++)
                junk = addst(k, dest, j, maxset);
            *i = *i + 1;
        }
        *i = *i + 1;
    }
}

Page 91:

Difficulty & Expectation

Difficulty: statically, even small programs are complex because of dependencies; dynamically, execution paths can vary significantly across inputs; logic errors have no apparent symptoms.

Expectations: it is unrealistic to fully unburden developers; instead, localize the buggy region and prioritize manual examination.

Page 92:

Execution Profiling

Full execution trace: control flow + value tags; too expensive to record at runtime, and unwieldy to process.

Summarized control flow for conditionals (if, while, for): branch evaluation counts; lightweight to take at runtime, easy to process, and effective.

Page 93:

Analysis of the Example

Let A = isalnum(isalnum(src[*i+1])) and B = (src[*i-1] <= src[*i+1]).

An execution is logically correct until (A ∧ ¬B) evaluates true when the evaluation reaches this condition:

if (isalnum(isalnum(src[*i+1])) && src[*i-1] <= src[*i+1]) {
    for (k = src[*i-1]+1; k <= src[*i+1]; k++)
        junk = addst(k, dest, j, maxset);
    *i = *i + 1;
}

If we monitor program conditionals like A, their evaluations shed light on the hidden error and can be exploited for error isolation.

Page 94:

Analysis of Branching Actions

Correct vs. incorrect runs of program P: tested over the 5542 test cases, the true evaluation probability of (A ∧ ¬B) is 0.727 in a correct execution and 0.896 in an incorrect execution, on average.

Correct runs:
            A            ¬A
  B       n_AB         n_¬AB
  ¬B      n_A¬B = 0    n_¬A¬B

Incorrect runs:
            A            ¬A
  B       n_AB         n_¬AB
  ¬B      n_A¬B >= 1   n_¬A¬B

The error location does exhibit detectable abnormal behaviors in incorrect executions.

Page 95:

Conditional Test Works for Nonbranching Errors

An off-by-one error can still be detected using the conditional tests:

void makepat (char *arg, int start, char delim, char *pat)
{
    ...
    if (!junk)
        result = 0;
    else
        result = i + 1;   /* off-by-one error  */
                          /* should be: result = i */
    return result;
}

Page 96:

Ranking Based on Boolean Bias

Let input di have desired output oi. We execute P; P passes the test iff oi' = P(di) is identical to oi:

  Tp = {ti | oi' = P(di) matches oi}
  Tf = {ti | oi' = P(di) does not match oi}

Boolean bias: π(B) = (nt - nf) / (nt + nf), where nt is the number of times a boolean feature B evaluates true and nf the number of times it evaluates false.

It encodes the distribution of B's value: 1 if B always evaluates true, -1 if always false, and in between for all other mixtures.
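As a one-line transcription (an illustrative helper; it assumes B was evaluated at least once, i.e. nt + nf > 0):

/* Boolean bias pi(B) = (nt - nf) / (nt + nf): +1 if B is always true,
 * -1 if always false, in between otherwise. */
double boolean_bias(unsigned n_t, unsigned n_f)
{
    return ((double)n_t - (double)n_f) / ((double)n_t + (double)n_f);
}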

Page 97:

Evaluation Abnormality

Boolean bias of a branch P: the probability of being evaluated as true within one execution.

Suppose we have n correct and m incorrect executions. For any predicate P we end up with an observation sequence for correct runs, S_p = (X'_1, X'_2, ..., X'_n), and one for incorrect runs, S_f = (X_1, X_2, ..., X_m).

Can we infer whether P is suspicious based on S_p and S_f?

Page 98:

Underlying Populations

Imagine that the underlying distributions of boolean bias for correct and incorrect executions are f(X|θp) and f(X|θf); S_p and S_f can be viewed as random samples from these underlying populations.

Major heuristic: the larger the divergence between f(X|θp) and f(X|θf), the more relevant the branch P is to the bug.

(Figure: boolean-bias distributions over [0, 1] for correct and incorrect executions.)

Page 99:

Major Challenges

No knowledge of the closed forms of either distribution.

Usually, we do not have enough incorrect executions to estimate f(X|θf) reliably.

(Figure: the incorrect-run distribution must be estimated from few samples.)

Page 100:

Our Approach: Hypothesis Testing

Page 101:

Faulty Functions

Motivation: bugs are not necessarily on branches, and there is higher confidence in function rankings than in branch rankings.

Abnormality score for functions: calculate the abnormality score for each branch within each function, then aggregate them.

Page 102:

Two Evaluation Measures

CombineRank: combine the branch scores by summation. Intuition: a function that contains many abnormal branches is likely bug-relevant.

UpperRank: choose the largest branch score as the representative. Intuition: a function with one extremely abnormal branch is likely bug-relevant.

Page 103:

Dodash vs. Omatch: Which Function Is Likely Buggy? And Which Measure Is More Effective?

Page 104:

Bug Benchmark

Siemens program suite: 89 variants of 6 subject programs, each 200-600 LOC, with 89 known bugs in total, mainly logic (or semantic) bugs.

Widely used in software engineering research.

Page 105:

Results on Program “replace”

Page 106:

Comparison between CombineRank and UpperRank

Buggy function ranked within top-k

Page 107:

Results on Other Programs

Page 108:

More Questions to Be Answered

What will happen (and how do we handle it) if multiple errors exist in one program?

How can bugs be detected if only very few failing test cases are available?

Is it really more effective if we have more execution traces?

How can program semantics be integrated into this statistics-based testing algorithm?

How can program semantic analysis be integrated with statistics-based analysis?