Heterogeneous Defect Prediction (ESEC/FSE 2015), September 3, 2015. Jaechang Nam and Sunghun Kim, Department of Computer Science and Engineering, HKUST.


Page 1: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Heterogeneous Defect Prediction
ESEC/FSE 2015

September 3, 2015
Jaechang Nam and Sunghun Kim

Department of Computer Science and Engineering

HKUST

Page 2: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Software Defect Prediction: Within-Project Defect Prediction (WPDP)

[Diagram] A model is trained on Project A's labeled instances (buggy or clean, each described by metric values) and then predicts the same project's unlabeled instances (?).

Related Work: Munson@TSE`92, Basili@TSE`95, Menzies@TSE`07, Hassan@ICSE`09, Bird@FSE`11, D'Ambros@EMSE`12, Lee@FSE`11, ...

Page 3: Heterogeneous Defect Prediction (ESEC/FSE 2015)

What if labeled instances do not exist?

[Diagram] Project X has only an unlabeled dataset: instances (?) with metric values but no buggy/clean labels, so no model can be trained. This is the case for new projects and for projects lacking historical data.

Page 4: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Existing Solutions?

[Diagram] (New) Project X again has only an unlabeled dataset: instances (?) with metric values but no labels.

Page 5: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Cross-Project Defect Prediction (CPDP)

[Diagram] A model is trained on the labeled instances of Project A (source) and predicts the unlabeled dataset of Project X (target).

Related Work: Watanabe@PROMISE`08, Turhan@EMSE`09, Zimmermann@FSE`09, Ma@IST`12, Zhang@MSR`14, Panichella@WCRE`14, Canfora@STVR`15

Challenge: CPDP requires the same metric set (the same feature space) in source and target, but metrics are often heterogeneous between source and target.

Page 6: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Motivation

[Diagram] Training on Project A (source) and testing on Project C (target) fails when the two projects have heterogeneous metric sets (different feature spaces or different domains).

If heterogeneous metric sets could be handled, all existing defect datasets could be reused for CPDP!

Heterogeneous Defect Prediction (HDP)

Page 7: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Key Idea

• Metrics show a consistent defect-proneness tendency: defect prediction metrics measure the complexity of software and its development process, e.g.:
  – the number of developers touching a source code file (Bird@FSE`11)
  – the number of methods in a class (D'Ambros@EMSE`12)
  – the number of operands (Menzies@TSE`08)

  More complexity implies more defect-proneness (Rahman@ICSE`13).

• Distributions of source and target metrics should be similar to build a strong prediction model.

Approach: match source and target metrics that have similar distributions.

Page 8: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Heterogeneous Defect Prediction (HDP): Overview

Source: Project A (labeled)          Target: Project B (unlabeled)

X1 X2 X3 X4 Label                    Y1 Y2 Y3 Y4 Y5 Y6 Y7 Label
 1  1  3 10 Buggy                     3  1  1  0  2  1  9  ?
 8  0  1  0 Clean                     1  1  9  0  2  3  8  ?
 ⋮  ⋮  ⋮  ⋮  ⋮                        ⋮  ⋮  ⋮  ⋮  ⋮  ⋮  ⋮  ⋮
 9  0  1  1 Clean                     0  1  1  1  2  1  1  ?

1. Metric Selection keeps informative source metrics (here X1, X3, X4).
2. Metric Matching pairs each selected source metric with a target metric of similar distribution.
3. Cross-prediction Model: build (training) on the matched source data, then predict (test) on the matched target data; a minimal sketch of this step follows.
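The build/predict step can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' released code: the function name `hdp_predict` and the array layout are assumptions; the choice of logistic regression matches the experimental settings described later in the deck.

```python
# Minimal sketch of the HDP build/predict step (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def hdp_predict(X_src, y_src, X_tgt, matched_pairs):
    """Train on matched source metrics, score matched target instances.

    matched_pairs: (source_col, target_col) index pairs produced by
    metric selection and metric matching (see the following pages).
    """
    src_cols = [s for s, _ in matched_pairs]
    tgt_cols = [t for _, t in matched_pairs]
    model = LogisticRegression(max_iter=1000)    # classifier used in the evaluation
    model.fit(X_src[:, src_cols], y_src)         # build (training) on source
    return model.predict_proba(X_tgt[:, tgt_cols])[:, 1]  # predict (test) on target
```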

Page 9: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Metric Selection

• Why? (Guyon@JMLR`03)
  – Select informative metrics; remove redundant and irrelevant metrics.
  – Decrease the complexity of metric matching combinations.
• Feature selection approaches (Gao@SPE`11, Shivaji@TSE`13); a sketch follows this list:
  – Gain Ratio
  – Chi-square
  – Relief-F
  – Significance attribute evaluation
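As one concrete instance of this step, here is a minimal sketch using chi-square scoring from scikit-learn; the `keep_ratio` fraction is an illustrative assumption, not a value from the slides.

```python
# Sketch of metric selection with chi-square scoring (one of the listed
# approaches); assumes metric values are non-negative, as counts are.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

def select_metrics(X, y, keep_ratio=0.15):
    """Return column indices of the most informative metrics."""
    k = max(1, int(X.shape[1] * keep_ratio))      # keep_ratio is assumed here
    selector = SelectKBest(chi2, k=k).fit(X, y)
    return np.flatnonzero(selector.get_support())
```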

Page 10: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Metric Matching

[Diagram] Candidate matches between selected source metrics (X1, X2) and target metrics (Y1, Y2), with matching scores on the edges: X1-Y1 = 0.8, X1-Y2 = 0.4, X2-Y1 = 0.5, X2-Y2 = 0.3.

* We can apply different cutoff values to the matching scores.
* It is possible that there is no matching at all.

A sketch of the matching step follows.
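The slides do not name the assignment algorithm, so the sketch below uses scipy's `linear_sum_assignment` (maximum-weight bipartite assignment) as one reasonable choice, followed by the cutoff filter; treat it as an assumption, not the authors' implementation.

```python
# Sketch: pick source-target metric pairs by maximum-weight assignment,
# then drop pairs whose matching score does not pass the cutoff.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_metrics(scores, cutoff=0.05):
    """scores[i, j]: matching score of source metric i vs. target metric j.

    Returns (source, target) index pairs above the cutoff; the result
    may be empty (no matching at all).
    """
    rows, cols = linear_sum_assignment(scores, maximize=True)
    return [(i, j) for i, j in zip(rows, cols) if scores[i, j] > cutoff]
```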

Page 11: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Compute Matching Score: KSAnalyzer

• Uses the p-value of the Kolmogorov-Smirnov test (Massey@JASA`51).
• Matching score M_ij of the i-th source metric and the j-th target metric:

  M_ij = p_ij

where p_ij is the KS-test p-value between the two metrics' distributions. A sketch follows.
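A minimal KSAnalyzer sketch using scipy's two-sample KS test; the function name and nested-loop structure are illustrative assumptions.

```python
# Matching score matrix: M[i, j] is the KS-test p-value between source
# metric i and target metric j (higher p = more similar distributions).
import numpy as np
from scipy.stats import ks_2samp

def matching_scores(X_src, X_tgt):
    n_src, n_tgt = X_src.shape[1], X_tgt.shape[1]
    M = np.empty((n_src, n_tgt))
    for i in range(n_src):
        for j in range(n_tgt):
            M[i, j] = ks_2samp(X_src[:, i], X_tgt[:, j]).pvalue
    return M
```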

Page 12: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Heterogeneous Defect Prediction: Overview (recap of Page 8)

Source: Project A (labeled, metrics X1..X4) and Target: Project B (unlabeled, metrics Y1..Y7). Metric Selection keeps the informative source metrics, Metric Matching pairs them with distributionally similar target metrics, and the cross-prediction model is built (training) on the matched source data to predict (test) the matched target data.

Page 13: Heterogeneous Defect Prediction (ESEC/FSE 2015)

EVALUATION


Page 14: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Baselines

• WPDP
• CPDP-CM (Turhan@EMSE`09, Ma@IST`12, He@IST`14)
  – Cross-project defect prediction using only the common metrics between source and target datasets.
• CPDP-IFS (He@CoRR`14)
  – Cross-project defect prediction on Imbalanced Feature Sets (i.e., heterogeneous metric sets).
  – Uses 16 distributional characteristics of an instance's values as features (e.g., mean, std, maximum, ...).

Page 15: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Research Questions (RQs)

• RQ1: Is heterogeneous defect prediction comparable to WPDP?
• RQ2: Is heterogeneous defect prediction comparable to CPDP-CM?
• RQ3: Is heterogeneous defect prediction comparable to CPDP-IFS?

Page 16: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Benchmark Datasets

Group     Dataset        # of instances: All / Buggy (%)   # of metrics  Granularity
AEEEM     EQ              325 / 129 (39.7%)                61            Class
          JDT             997 / 206 (20.7%)
          LC              399 /  64 (9.36%)
          ML             1862 / 245 (13.2%)
          PDE            1492 / 209 (14.0%)
MORPH     ant-1.3         125 /  20 (16.0%)                20            Class
          arc             234 /  27 (11.5%)
          camel-1.0       339 /  13 (3.8%)
          poi-1.5         237 / 141 (75.0%)
          redaktor        176 /  27 (15.3%)
          skarbonka        45 /   9 (20.0%)
          tomcat          858 /  77 (9.0%)
          velocity-1.4    196 / 147 (75.0%)
          xalan-2.4       723 / 110 (15.2%)
          xerces-1.2      440 /  71 (16.1%)
ReLink    Apache          194 /  98 (50.5%)                26            File
          Safe             56 /  22 (39.3%)
          ZXing           399 / 118 (29.6%)
NASA      cm1             327 /  42 (12.8%)                37            Function
          mw1             253 /  27 (10.7%)
          pc1             705 /  61 (8.7%)
          pc3            1077 / 134 (12.4%)
          pc4            1458 / 178 (12.2%)
SOFTLAB   ar1             121 /   9 (7.4%)                 29            Function
          ar3              63 /   8 (12.7%)
          ar4             107 /  20 (18.7%)
          ar5              36 /   8 (22.2%)
          ar6             101 /  15 (14.9%)

600 prediction combinations in total!

Page 17: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Experimental Settings

• Classifier: Logistic Regression
• Comparison: HDP vs. WPDP, CPDP-CM, and CPDP-IFS

[Diagram] Each target project is split into a 50% training set and a 50% test set, and the random split is repeated 1000 times (X 1000). WPDP trains on the target's training set; HDP, CPDP-CM, and CPDP-IFS train on each other project (Project 1 ... Project n). All approaches are evaluated on the same test set. A sketch of the WPDP arm follows.
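The repeated-split protocol can be sketched as below for the WPDP baseline; this is an illustrative reconstruction of the diagram, with stratified splitting added as an assumption so that both classes appear in each half.

```python
# Sketch of the repeated 50/50 split evaluation for the WPDP baseline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def evaluate_wpdp(X_tgt, y_tgt, n_repeats=1000, seed=0):
    """Median AUC over repeated 50% training / 50% test splits."""
    rng = np.random.RandomState(seed)
    aucs = []
    for _ in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_tgt, y_tgt, test_size=0.5, stratify=y_tgt, random_state=rng)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
    return np.median(aucs)
```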

Page 18: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Evaluation Measures

• False Positive Rate = FP / (TN + FP)
• True Positive Rate = Recall = TP / (TP + FN)
• AUC: Area Under the receiver operating characteristic Curve

[Plot: ROC curve plotting true positive rate against false positive rate, both ranging from 0 to 1]

A short computed example follows.
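For concreteness, the measures above can be computed as follows; the tiny arrays are made-up illustration data, not results from the paper.

```python
# Computing FPR, TPR, and AUC for one set of predictions (toy data).
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 0, 1])               # clean = 0, buggy = 1
y_score = np.array([0.2, 0.6, 0.7, 0.4, 0.1, 0.9])  # predicted buggy probability
fpr, tpr, _ = roc_curve(y_true, y_score)            # FP/(TN+FP) and TP/(TP+FN)
print(auc(fpr, tpr))                                # area under the ROC curve
```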

Page 19: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Evaluation Measures (cont.)

• Win/Tie/Loss (Valentini@ICML`03, Li@JASE`12, Kocaguneli@TSE`13)
  – Wilcoxon signed-rank test (p < 0.05) over the 1000 prediction results per combination; a sketch of the decision follows this list.
  – Win: # of prediction combinations where HDP outperforms the baseline with statistical significance (p < 0.05).
  – Tie: # of prediction combinations with no statistically significant difference (p ≥ 0.05).
  – Loss: # of prediction combinations where the baseline outperforms HDP with statistical significance (p < 0.05).
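The decision for a single prediction combination can be sketched as below; scipy's Wilcoxon signed-rank test stands in for the authors' tooling, and using medians to decide the direction of the difference is an assumption.

```python
# Sketch of the Win/Tie/Loss decision for one prediction combination.
import numpy as np
from scipy.stats import wilcoxon

def win_tie_loss(hdp_aucs, baseline_aucs, alpha=0.05):
    """Compare paired AUC results (e.g., 1000 per combination)."""
    _, p = wilcoxon(hdp_aucs, baseline_aucs)   # paired signed-rank test
    if p >= alpha:
        return "tie"
    return "win" if np.median(hdp_aucs) > np.median(baseline_aucs) else "loss"
```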

Page 20: Heterogeneous Defect Prediction (ESEC/FSE 2015)

RESULT


Page 21: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Prediction Results in Median AUC

Target        WPDP   CPDP-CM  CPDP-IFS  HDPKS (cutoff=0.05)
EQ            0.583  0.776    0.461     0.783
JDT           0.795  0.781    0.543     0.767
LC            0.575  0.636    0.584     0.655
ML            0.734  0.651    0.557     0.692*
PDE           0.684  0.682    0.566     0.717
ant-1.3       0.670  0.611    0.500     0.701
arc           0.670  0.611    0.523     0.701
camel-1.0     0.550  0.590    0.500     0.639
poi-1.5       0.707  0.676    0.606     0.537
redaktor      0.744  0.500    0.500     0.537
skarbonka     0.569  0.736    0.528     0.694*
tomcat        0.778  0.746    0.640     0.818
velocity-1.4  0.725  0.609    0.500     0.391
xalan-2.4     0.755  0.658    0.499     0.751
xerces-1.2    0.624  0.453    0.500     0.489
Apache        0.714  0.689    0.635     0.717*
Safe          0.706  0.749    0.616     0.818*
ZXing         0.605  0.619    0.530     0.650*
cm1           0.653  0.622    0.551     0.717*
mw1           0.612  0.584    0.614     0.727
pc1           0.787  0.675    0.564     0.752*
pc3           0.794  0.665    0.500     0.738*
pc4           0.900  0.773    0.589     0.682*
ar1           0.582  0.464    0.500     0.734*
ar3           0.574  0.862    0.682     0.823*
ar4           0.657  0.588    0.575     0.816*
ar5           0.804  0.875    0.585     0.911*
ar6           0.654  0.611    0.527     0.640
All           0.657  0.636    0.555     0.724*

HDPKS: heterogeneous defect prediction using KSAnalyzer

Page 22: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Win/Tie/Loss Results

              Against WPDP    Against CPDP-CM   Against CPDP-IFS
Target        W    T    L     W    T    L       W    T    L
EQ            4    0    0     2    2    0       4    0    0
JDT           0    0    5     3    0    2       5    0    0
LC            6    0    1     3    3    1       3    1    3
ML            0    0    6     4    2    0       6    0    0
PDE           3    0    2     2    0    3       5    0    0
ant-1.3       6    0    1     6    0    1       5    0    2
arc           3    1    0     3    0    1       4    0    0
camel-1.0     3    0    2     3    0    2       4    0    1
poi-1.5       2    0    2     3    0    1       2    0    2
redaktor      0    0    4     2    0    2       3    0    1
skarbonka     11   0    0     4    0    7       9    0    2
tomcat        2    0    0     1    1    0       2    0    0
velocity-1.4  0    0    3     0    0    3       0    0    3
xalan-2.4     0    0    1     1    0    0       1    0    0
xerces-1.2    0    0    3     3    0    0       1    0    2
Apache        6    0    5     8    1    2       9    0    2
Safe          14   0    3     12   0    5       15   0    2
ZXing         8    0    0     6    0    2       7    0    1
cm1           7    1    2     8    0    2       9    0    1
mw1           5    0    1     4    0    2       4    0    2
pc1           1    0    5     5    0    1       6    0    0
pc3           0    0    7     7    0    0       7    0    0
pc4           0    0    7     2    0    5       7    0    0
ar1           14   0    1     14   0    1       11   0    4
ar3           15   0    0     5    0    10      10   2    3
ar4           16   0    0     14   1    1       15   0    1
ar5           14   0    4     14   0    4       16   0    2
ar6           7    1    7     8    4    3       12   0    3
Total         147  3    72    147  14   61      182  3    35
%             66.2 1.4  32.4  66.2 6.3  27.5    82.0 1.3  16.7

Page 23: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Matched Metrics (Win)

[Distribution plot of metric values]
Source metric: RFC (the number of methods invoked by a class); target metric: the number of operands.
Matching Score = 0.91
AUC = 0.946 (ant-1.3 → ar5)

Page 24: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Matched Metrics (Loss)

[Distribution plot of metric values]
Source metric: LOC; target metric: average number of LOC in a method.
Matching Score = 0.13
AUC = 0.391 (Safe → velocity-1.4)

Page 25: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Different Feature Selections (median AUCs, Win%)

              Against WPDP   Against CPDP-CM  Against CPDP-IFS  HDP
Approach      AUC    Win%    AUC    Win%      AUC    Win%       AUC
Gain Ratio    0.657  63.7%   0.645  63.2%     0.536  80.2%      0.720
Chi-Square    0.657  64.7%   0.651  66.4%     0.556  82.3%      0.727
Significance  0.657  66.2%   0.636  66.2%     0.553  82.0%      0.724
Relief-F      0.670  57.0%   0.657  63.1%     0.543  80.5%      0.709
None          0.657  47.3%   0.624  50.3%     0.536  66.3%      0.663

Page 26: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Results in Different Cutoffs

        Against WPDP   Against CPDP-CM  Against CPDP-IFS  HDP     Target
Cutoff  AUC    Win%    AUC    Win%      AUC    Win%       AUC     Coverage
0.05    0.657  66.2%   0.636  66.2%     0.553  82.4%      0.724*  100%
0.90    0.657  100%    0.761  71.4%     0.624  100%       0.852*  21%

Page 27: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Conclusion

• HDP shows potential for CPDP across datasets with different metric sets.
• Future work:
  – Filtering out noisy metric matchings.
  – Determining the best probability threshold.

Page 28: Heterogeneous Defect Prediction (ESEC/FSE 2015)

Q&A

THANK YOU!