
Page 1: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Micro Interaction Metrics for Defect Prediction

Taek Lee, Jaechang Nam, Dongyun Han, Sunghun Kim, Hoh Peter In

FSE 2011, Hungary, Sep. 5-9

Page 2: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Outline

• Research motivation

• The existing metrics

• The proposed metrics

• Experiment results

• Threats to validity

• Conclusion

Page 3: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Defect Prediction?

Why is it necessary?

Page 4: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Software quality assurance is inherently a resource-constrained activity!

Page 5: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Predicting defect-prone software entities* lets us focus the best labor effort on those entities

* functions or code files

Page 6: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Indicators of defects

• Complexity of source code (Chidamber and Kemerer 1994)

• Frequent code changes (Moser et al. 2008)

• Previous defect information (Kim et al. 2007)

• Code dependencies (Zimmermann 2007)

Page 7: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Indeed, where do defects come from?

Page 8: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Humans err!

Programmers make mistakes; consequently, defects are injected and software fails

Human Errors → Bugs Injected → Software Fails

Page 9: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Programmer Interaction and Software Quality

Page 10: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Programmer Interaction and Software Quality

“Errors are from cognitive breakdown while understanding and implementing requirements”
- Ko et al. 2005

Page 11: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Programmer Interaction and Software Quality

“Work interruptions or task switching may affect programmer productivity”
- DeLine et al. 2006

“Errors are from cognitive breakdown while understanding and implementing requirements”
- Ko et al. 2005

Page 12: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Don’t we need to also consider developers’ interactions as defect indicators?

Page 13: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

…, but the existing indicators can NOT directly capture developers’ interactions

Page 14: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Using Mylyn data, we propose novel “Micro Interaction Metrics (MIMs)” capturing developers’ interactions

Page 15: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

The Mylyn* data is stored as an attachment to the corresponding bug report in XML format

* an Eclipse plug-in that stores and recovers task contexts

Page 16: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)
Page 17: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

<InteractionEvent … Kind=“ ” … StartDate=“ ” EndDate=“ ” … StructureHandle=“ ” … Interest=“ ” … >

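As a minimal sketch, the attributes above can be read with a few lines of Python; the file name mylyn-context.xml and the flat iteration are assumptions, while the attribute names are the ones shown on the slide:

    # Sketch: read InteractionEvent attributes from a Mylyn task-context XML
    # attachment. The file name is hypothetical; the attribute names (Kind,
    # StartDate, EndDate, StructureHandle, Interest) follow the slide.
    import xml.etree.ElementTree as ET

    tree = ET.parse("mylyn-context.xml")           # hypothetical file name
    for ev in tree.getroot().iter("InteractionEvent"):
        print(ev.get("Kind"),                      # e.g., selection or edit
              ev.get("StartDate"), ev.get("EndDate"),
              ev.get("StructureHandle"),           # the program element touched
              ev.get("Interest"))                  # degree-of-interest (DOI) value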

Pages 20-22: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Two levels of MIMs Design

File-level MIMs: specific interactions for a file in a task (e.g., AvgTimeIntervalEditEdit)

Task-level MIMs: property values shared over the whole task (e.g., TimeSpent)

Pages 23-24: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Two levels of MIMs Design

Mylyn Task Logs (example):

  10:30  Selection  file A
  11:00  Edit       file B
  12:30  Edit       file B

File-level MIMs capture the interactions specific to a file in a task (e.g., AvgTimeIntervalEditEdit); task-level MIMs take property values shared over the whole task (e.g., TimeSpent)
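A minimal sketch of the two levels in Python, computed from the example log above; the (minute, kind, file) tuple format and the function names are illustrative assumptions:

    # Sketch: one file-level MIM (AvgTimeIntervalEditEdit) and one task-level
    # MIM (TimeSpent) computed from an ordered task log.
    events = [(630, "Selection", "file A"),   # 10:30
              (660, "Edit",      "file B"),   # 11:00
              (750, "Edit",      "file B")]   # 12:30

    def avg_time_interval_edit_edit(events, target):
        """File-level: average minutes between consecutive Edits of one file."""
        edits = [t for t, kind, f in events if kind == "Edit" and f == target]
        gaps = [b - a for a, b in zip(edits, edits[1:])]
        return sum(gaps) / len(gaps) if gaps else 0.0

    def time_spent(events):
        """Task-level: minutes between the first and last event of the task."""
        return events[-1][0] - events[0][0]

    print(avg_time_interval_edit_edit(events, "file B"))  # 90.0
    print(time_spent(events))                             # 120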

Page 25: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

The Proposed Micro Interaction Metrics


Page 29: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

For example, NumPatternSXEY captures this interaction:

“How many times did a programmer Select a file of group X and then Edit a file of group Y in a task activity?”

Page 30: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Group X or Y: X if a file shows defect locality* properties, Y otherwise

Group H or L: H if a file has a high** DOI value, L otherwise

* hinted by the paper [Kim et al. 2007]
** threshold: median of the degree-of-interest (DOI) values in a task
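A minimal sketch of counting this pattern over consecutive event pairs; whether only consecutive pairs are counted is an assumption here, as are the group assignments:

    # Sketch: count Selection(group X) followed by Edit(group Y) in one task.
    group = {"A.java": "X", "B.java": "Y"}     # hypothetical group assignment

    def num_pattern_sxey(events):
        """events: ordered (kind, file) pairs of one task activity."""
        count = 0
        for (k1, f1), (k2, f2) in zip(events, events[1:]):
            if ((k1, k2) == ("Selection", "Edit")
                    and group.get(f1) == "X" and group.get(f2) == "Y"):
                count += 1
        return count

    print(num_pattern_sxey([("Selection", "A.java"), ("Edit", "B.java"),
                            ("Selection", "B.java"), ("Edit", "B.java")]))  # 1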

Page 31: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Bug Prediction Process

Pages 32-40: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

STEP1: Counting & Labeling Instances

[Timeline figure: Dec 2005 to Sep 2010, split at time P. Tasks 1, 2, 3, …, i fall before P and touch file instances such as f1.java, f2.java, and f3.java; Tasks i+1, i+2, i+3, … fall after P, in the post-defect counting period. All the Mylyn task data collectable from Eclipse subprojects (Dec 05 ~ Sep 10) was used.]

The number of counted post-defects (edited files only within bug-fixing tasks):

f1.java = 1, f2.java = 1, f3.java = 2

Labeling rule for a file instance:
“buggy” if # of post-defects > 0
“clean” if # of post-defects = 0
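A minimal sketch of this labeling rule, using the post-defect counts from the slide (the dictionary format is an assumption):

    # Sketch of the STEP1 labeling rule: "buggy" if a file collected at least
    # one post-defect after time P, "clean" otherwise.
    post_defects = {"f1.java": 1, "f2.java": 1, "f3.java": 2}

    labels = {f: ("buggy" if n > 0 else "clean") for f, n in post_defects.items()}
    print(labels)   # here every file ends up labeled "buggy"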

Pages 41-50: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

STEP2: Extraction of MIMs

[Timeline figure: the metrics extraction period runs from Dec 2005 to time P and contains Task 1 (edits to f3.java), Task 2 (edits to f1.java), Task 3 (edits to f2.java), and Task 4 (edits to f1.java and f2.java).]

Metrics Computation: a MIM value is computed for each file in each task; when a file is touched in several tasks, its per-task values are averaged:

MIM_f1.java = (value_Task2 + value_Task4) / 2
MIM_f2.java = (value_Task3 + value_Task4) / 2
MIM_f3.java = value_Task1
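A minimal sketch of this aggregation; the per-task MIM values are hypothetical, and only the averaging scheme follows the slide:

    # Sketch of the STEP2 aggregation: one MIM value per (task, file) pair,
    # averaged per file across all tasks that touched it.
    from collections import defaultdict

    task_values = {("Task1", "f3.java"): 4.0,   # hypothetical MIM values
                   ("Task2", "f1.java"): 2.0,
                   ("Task3", "f2.java"): 6.0,
                   ("Task4", "f1.java"): 8.0,
                   ("Task4", "f2.java"): 1.0}

    per_file = defaultdict(list)
    for (_task, f), v in task_values.items():
        per_file[f].append(v)

    mim = {f: sum(vs) / len(vs) for f, vs in per_file.items()}
    print(mim)   # f1.java: 5.0, f2.java: 3.5, f3.java: 4.0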

Pages 51-52: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

The Understand for Java tool was used to extract 32 source code metrics (CMs)*

* Chidamber-and-Kemerer and OO metrics

[Timeline figure: CMs are taken from the last CVS revision at time P; a table lists the selected source code metrics.]

Pages 53-54: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Fifteen history metrics (HMs)* were collected from the corresponding CVS repository

* Moser et al.

[Timeline figure: HMs are computed from the CVS revisions up to time P; a table lists the history metrics (HMs).]

Page 55: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

STEP3: Creating a training corpus

For classification: [Instance Name | Extracted MIMs | Label] → training a classifier
For regression: [Instance Name | Extracted MIMs | # of post-defects] → training a regression model

Page 56: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

STEP4: Building prediction models

Classification and regression models were built with different machine learning algorithms using the WEKA* tool

* an open-source data mining tool
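The models themselves were built in WEKA; as a rough stand-in, here is a sketch of the same workflow in Python with scikit-learn (all feature values and labels below are hypothetical):

    # Sketch only: scikit-learn substitute for the WEKA workflow. X holds
    # metric values (MIMs/CMs/HMs) per file instance, y the buggy/clean labels.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    X = [[0.1, 3, 90.0], [0.7, 1, 5.0], [0.4, 2, 40.0], [0.9, 5, 2.0]]
    y = ["clean", "buggy", "clean", "buggy"]

    clf = DecisionTreeClassifier(random_state=0)
    scores = cross_val_score(clf, X, y, cv=2, scoring="f1_macro")  # k-fold CV
    print(scores.mean())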

Pages 57-59: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

STEP5: Prediction Evaluation

Classification Measures

Precision: how many instances are really buggy among the buggy-predicted outcomes?

Recall: how many instances are correctly predicted as “buggy” among the real buggy ones?
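For reference, these two questions correspond to the standard definitions, with TP, FP, and FN counted for the buggy class B:

    \mathrm{Precision}(B) = \frac{TP}{TP + FP}, \quad
    \mathrm{Recall}(B) = \frac{TP}{TP + FN}, \quad
    \mathrm{F}(B) = \frac{2 \cdot \mathrm{Precision}(B) \cdot \mathrm{Recall}(B)}{\mathrm{Precision}(B) + \mathrm{Recall}(B)}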

Pages 60-61: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

STEP5: Prediction Evaluation

Regression Measures (between the # of real buggy instances and the # of instances predicted as buggy):

• correlation coefficient (-1 ~ 1)
• mean absolute error (0 ~ 1)
• root squared error (0 ~ 1)
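For reference, the standard forms of the first two measures, with y_i the real and \hat{y}_i the predicted post-defect count over n instances (the 0 ~ 1 ranges on the slide suggest normalized error variants, which is an assumption about the reported output):

    r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}
             {\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2} \sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}},
    \qquad
    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert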

Page 62: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

T-test with 100 runs of 10-fold cross-validation:

Reject H0* and accept H1* if p-value < 0.05 (at the 95% confidence level)

* H0: no difference in average performance; H1: different (better!)
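A minimal sketch of this test in Python with scipy; the score lists stand in for the 100 cross-validation runs and are hypothetical:

    # Sketch: compare per-run F-measures of two metric sets with a t-test.
    from scipy import stats

    f1_mims = [0.62, 0.60, 0.64, 0.61, 0.63]   # hypothetical MIM-model scores
    f1_cms  = [0.41, 0.45, 0.40, 0.44, 0.42]   # hypothetical CM-model scores

    t, p = stats.ttest_ind(f1_mims, f1_cms)
    print(p < 0.05)   # True -> reject H0 at the 95% confidence level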

Page 63: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Result Summary

MIMs improve prediction accuracy for:

1. different Eclipse project subjects
2. different machine learning algorithms
3. different model training periods

Page 64: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction for different project subjects

[Table: file instances and % of defects per project subject]

Page 65: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction for different project subjects

MIM: the proposed metrics CM: source code metrics HM: history metrics

Page 66: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction for different project subjects

MIM: the proposed metrics CM: source code metrics HM: history metrics

BASELINE: a dummy classifier that predicts in a purely random manner

e.g., for 12.5% buggy instances: Precision(B) = 12.5%, Recall(B) = 50%, F-measure(B) = 20%
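The example numbers follow from the definitions in STEP5: a purely random classifier flags half of the truly buggy instances, so Recall(B) = 50%, and its precision equals the buggy base rate of 12.5%; hence

    \mathrm{F}(B) = \frac{2 \cdot 0.125 \cdot 0.5}{0.125 + 0.5} = \frac{0.125}{0.625} = 0.20 = 20\%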


Page 68: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction for different project subjects

T-test results (significant figures are in bold, p-value < 0.05)

Page 69: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction with different algorithms

Page 70: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction with different algorithms

T-test results (significant figures are in bold, p-value < 0.05)

Pages 71-72: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction in different training periods

[Timeline figure: Dec 2005 to Sep 2010; the model training period precedes the model testing period]

Training : testing splits of 50% : 50%, 70% : 30%, and 80% : 20%

Page 73: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction in different training periods

Page 74: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction in different training periods

T-test results (significant figures are in bold, p-value < 0.05)

Page 75: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)
Page 76: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

The top 42 (37%) of the total 113 metrics (MIMs + CMs + HMs) came from MIMs


Page 78: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

TOP 1: NumLowDOIEdit TOP 2: NumPatternEXSX TOP 3: TimeSpentOnEdit

Possible Insight

Chances are that more defects are generated when a programmer (TOP 2) repeatedly edits and browses a file, especially one related to previous defects, (TOP 3) spends more of that time on editing, and (TOP 1) edits files that have been accessed less frequently or less recently (i.e., with low DOI values)

Page 79: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Performance comparison with regression modeling for predicting # of post-defects

Page 80: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Predicting Post-Defect Numbers

Page 81: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Predicting Post-Defect Numbers

T-test results (significant figures are in bold, p-value < 0.05)

Page 82: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Threats to Validity

• Systems examined might not be representative

• Systems are all open source projects

• Defect information might be biased

Page 83: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Conclusion

Our findings exemplify that developers’ interactions can affect software quality

Our proposed micro interaction metrics significantly improve defect prediction accuracy

Page 84: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

We believe future defect prediction models will use more of developers’ direct, micro-level interaction information

MIMs are a first step toward that

Page 85: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Thank you! Any Questions?

• Problem – Can developers’ interaction information affect software quality (defects)?

• Approach – We proposed novel micro interaction metrics (MIMs), overcoming the limits of the popular static metrics

• Result – MIMs significantly improve prediction accuracy compared to source code metrics (CMs) and history metrics (HMs)

Page 86: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Backup Slides

Page 87: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)
Page 88: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

One possible ARGUMENT: some developers may not have used Mylyn to fix bugs

Page 89: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

This creates a chance of error in counting post-defects and, as a result, biased labels (i.e., an incorrect % of buggy instances)

Page 90: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

We repeated the experiment using the same instances but with a different defect-counting heuristic: a CVS-log-based approach*

* with keywords “fix”, “bug”, and bug report IDs in change logs
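A minimal sketch of such a keyword heuristic; the regex and the sample log messages are illustrative assumptions:

    # Sketch: flag a CVS change as a bug fix when its log message mentions
    # "fix", "bug", or a bug-report ID.
    import re

    FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug)\b|#\d+", re.IGNORECASE)

    logs = ["Fixed NPE in parser, bug #12345",   # hypothetical change logs
            "refactor: rename fields",
            "bug 204911: apply patch"]

    print(sum(1 for msg in logs if FIX_PATTERN.search(msg)))   # 2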

Page 91: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction with the CVS-log-based approach

[Chart: prediction results under CVS-log-based defect counting]

Page 92: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Prediction with the CVS-log-based approach

T-test results (significant figures are in bold, p-value < 0.05)


Page 94: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

The CVS-log-based approach reported more additional post-defects (a higher % of buggy-labeled instances)

MIMs failed to capture them due to the lack of corresponding Mylyn data

Page 95: Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

Note that the quality of CVS change logs cannot be 100% guaranteed (e.g., no explicit bug ID, missing logs)