Survey on Software Defect Prediction - PhD Qualifying Examination - July 3, 2014 Jaechang Nam Department of Computer Science and Engineering HKUST

Survey on Software Defect Prediction


Page 1: Survey on Software Defect Prediction

Survey on Software Defect Prediction

- PhD Qualifying Examination -

July 3, 2014
Jaechang Nam

Department of Computer Science and Engineering

HKUST

Page 2: Survey on Software Defect Prediction

2

Outline

• Background
• Software Defect Prediction Approaches
  – Simple metric and defect estimation models
  – Complexity metrics and Fitting models
  – Prediction models
  – Just-In-Time Prediction Models
  – Practical Prediction Models and Applications
  – History Metrics from Software Repositories
  – Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues

Page 3: Survey on Software Defect Prediction

3

Motivation
• General question of software defect prediction
  – Can we identify defect-prone entities (source code file, binary, module, change, ...) in advance?
    • # of defects
    • buggy or clean
• Why?
  – Quality assurance for large software (Akiyama@IFIP'71)
  – Effective resource allocation
    • Testing (Menzies@TSE`07)
    • Code review (Rahman@FSE'11)

Page 4: Survey on Software Defect Prediction

4

Ground Assumption

• The more complex, the more defect-prone

Page 5: Survey on Software Defect Prediction

5

Two Focuses on Defect Prediction

• How complex are the software and its development process?
  – Metrics
• How can we predict whether the software has defects?
  – Models based on the metrics

Page 6: Survey on Software Defect Prediction

6

Prediction Performance Goal

• Recall vs. Precision

• Strong predictor criteria
  – 70% recall and 25% false positive rate (Menzies@TSE`07)
  – Precision, recall, accuracy ≥ 75% (Zimmermann@FSE`09)

Page 7: Survey on Software Defect Prediction

7

Outline

• Background
• Software Defect Prediction Approaches
  – Simple metric and defect estimation models
  – Complexity metrics and Fitting models
  – Prediction models
  – Just-In-Time Prediction Models
  – Practical Prediction Models and Applications
  – History Metrics from Software Repositories
  – Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues

Page 8: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC; Models: Simple Model]

Page 9: Survey on Software Defect Prediction

9

Identifying Defect-prone Entities

• Akiyama’s equation (Akiyama@IFIP`71)
  – # of defects = 4.86 + 0.018 * LOC (LOC = Lines Of Code)
    • About 23 defects per 1 KLOC
    • Derived from actual systems
• Limitation
  – LOC alone is not enough to capture software complexity
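As a quick illustration of Akiyama's fitted equation (a minimal sketch, not from the original slides), the snippet below evaluates it for a 1 KLOC module and reproduces the roughly 23-defects-per-KLOC figure.

```python
def akiyama_defects(loc):
    """Akiyama's fitted equation: estimated # of defects from lines of code."""
    return 4.86 + 0.018 * loc

print(akiyama_defects(1000))  # ~22.9, i.e. roughly 23 defects per KLOC
```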

Page 10: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics; Models: Simple Model, Fitting Model]

Page 11: Survey on Software Defect Prediction

11

Complexity Metrics and Fitting Models

• Cyclomatic complexity metric (McCabe`76)
  – “Logical complexity” of a program, represented by its control flow graph
  – V(G) = #edges – #nodes + 2
• Halstead complexity metrics (Halstead`77)
  – Metrics based on the # of operators and operands
  – Volume V = N * log2(n), where N = total # of operators and operands and n = # of distinct operators and operands
  – # of defects = Volume / 3000
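To make the two metric definitions concrete, here is a minimal Python sketch (illustrative only; in practice the edge, node, operator, and operand counts come from a parser) that computes McCabe's V(G) and the Halstead volume and defect estimate from the formulas above.

```python
import math

def cyclomatic_complexity(num_edges, num_nodes):
    """McCabe's V(G) = #edges - #nodes + 2 for a single connected control flow graph."""
    return num_edges - num_nodes + 2

def halstead_volume(total_operators, total_operands,
                    distinct_operators, distinct_operands):
    """Halstead volume V = N * log2(n): N = total, n = distinct operators + operands."""
    N = total_operators + total_operands
    n = distinct_operators + distinct_operands
    return N * math.log2(n)

def halstead_defect_estimate(volume):
    """Defect estimate used on the slide: # of defects = Volume / 3000."""
    return volume / 3000

# Illustrative counts, not taken from a real program.
print(cyclomatic_complexity(num_edges=9, num_nodes=8))       # V(G) = 3
v = halstead_volume(total_operators=120, total_operands=90,
                    distinct_operators=15, distinct_operands=25)
print(v, halstead_defect_estimate(v))
```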

Page 12: Survey on Software Defect Prediction

12

Complexity Metrics and Fitting Models

• Limitations
  – Do not capture the complexity (amount) of change.
  – Mostly fitting models rather than prediction models in the studies conducted in the 1970s and early 1980s
    • Correlation analysis between metrics and # of defects, using linear regression models
    • Models were not validated on new entities (modules).

Page 13: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, Process metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification)]

Page 14: Survey on Software Defect Prediction

14

Regression Model
• Shen et al.’s empirical study (Shen@TSE`85)
  – Linear regression model
  – Validated on actual new modules
  – Metrics
    • Halstead metrics, # of conditional statements
    • Process metrics
      – Delta of complexity metrics between two successive system versions
  – Measures
    • Between the actual and predicted # of defects on new modules
      – MRE (mean magnitude of relative error): average of |D – D’| / D over all modules
        • D: actual # of defects
        • D’: predicted # of defects
      – MRE = 0.48
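A minimal sketch of the MRE computation described above (the per-module defect counts here are made up for illustration):

```python
def mean_relative_error(actual, predicted):
    """MRE: average of |D - D'| / D over all modules with at least one actual defect."""
    ratios = [abs(d - d_hat) / d for d, d_hat in zip(actual, predicted) if d > 0]
    return sum(ratios) / len(ratios)

actual_defects    = [4, 10, 2, 7]   # D  (illustrative)
predicted_defects = [6,  7, 3, 4]   # D' (illustrative)
print(mean_relative_error(actual_defects, predicted_defects))  # ~0.43
```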

Page 15: Survey on Software Defect Prediction

15

Classification Model
• Discriminant analysis by Munson et al. (Munson@TSE`92)
• Logistic regression
• High-risk vs. low-risk modules
• Metrics
  – Halstead and cyclomatic complexity metrics
• Measures
  – Type I error: false positive rate
  – Type II error: false negative rate
• Result
  – Accuracy: 92% (6 misclassifications out of 78 modules)
  – Precision: 85%
  – Recall: 73%
  – F-measure: 88%

Page 16: Survey on Software Defect Prediction

16

Defect Prediction Process (Based on Machine Learning)
[Figure: software archives → generate instances with metrics (features) and labels → (preprocessing) → training instances → build a classification/regression model → the model labels new (unlabeled) instances]
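The pipeline in the figure can be sketched in a few lines. The snippet below is a minimal, hypothetical example using scikit-learn and random data (not the toolchain used in the surveyed papers): it trains a classifier on labeled instances and predicts labels for new instances.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training instances: rows = entities (files/changes), columns = metrics (features).
X_train = rng.random((200, 5))                  # metric values (illustrative)
y_train = (X_train[:, 0] > 0.7).astype(int)     # labels: 1 = buggy, 0 = clean (illustrative)

# (Preprocessing would go here: normalization, feature selection, noise removal, ...)
model = LogisticRegression().fit(X_train, y_train)

# New, unlabeled instances extracted from the software archives.
X_new = rng.random((5, 5))
print(model.predict(X_new))        # predicted labels (buggy / clean)
print(model.predict_proba(X_new))  # defect-proneness scores
```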

Page 17: Survey on Software Defect Prediction

17

Defect Prediction(Based on Machine Learning)

• Limitations
  – Limited resources for process metrics
    • Error fixes in the unit testing phase were conducted informally by individual developers, so no error information was available for this phase. (Shen@TSE`85)
  – Existing metrics were not enough to capture the complexity of object-oriented (OO) programs.
  – Helpful for quality assurance teams but not for individual developers

Page 18: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications]

Page 19: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications]

Page 20: Survey on Software Defect Prediction

20

Risk Prediction of Software Changes (Mockus@BLTJ`00)
• Logistic regression
• Change metrics
  – LOC added / deleted / modified
  – Diffusion of the change
  – Developer experience
• Result
  – Both false positive and false negative rates: 20% in the best case

Page 21: Survey on Software Defect Prediction

21

Risk Prediction of Software Changes (Mockus@BLTJ`00)
• Advantage
  – Showed a model that is feasible in practice
• Limitations
  – Predictions were generated 3 times per week
    • Not fully just-in-time
  – Validated on only one commercial system (5ESS switching system software)

Page 22: Survey on Software Defect Prediction

22

BugCache (Kim@ICSE`07)
• Maintain defect-prone entities in a cache
• Approach
• Result
  – The top 10% of files account for 73–95% of defects on 7 systems

Page 23: Survey on Software Defect Prediction

23

BugCache (Kim@ICSE`07)

• Advantages
  – The cache can be updated quickly and at low cost (c.f. static models based on machine learning).
  – Just-in-time: always available whenever QA teams want the list of defect-prone entities
• Limitations
  – The cache is not reusable for other software projects.
  – Designed for QA teams
    • Applicable only at certain points in time, after a batch of changes (e.g., the end of a sprint)
    • Still of limited use for individual developers during the development phase

Page 24: Survey on Software Defect Prediction

24

Change Classification (Kim@TSE`08)

• Classification model based on SVM
• About 11,500 features
  – Change metadata such as changed LOC and change count
  – Complexity metrics
  – Text features from change log messages, source code, and file names
• Results
  – 78% accuracy and 60% recall on average over 12 open-source projects

Page 25: Survey on Software Defect Prediction

25

Change Classification (Kim@TSE`08)

• Limitations
  – Heavy model (11,500 features)
  – Not validated on commercial software products

Page 26: Survey on Software Defect Prediction

26

Follow-up Studies
• Studies addressing these limitations
  – “Reducing Features to Improve Code Change-Based Bug Prediction” (Shivaji@TSE`13)
    • With less than 10% of all features, the buggy F-measure improves by 21%.
  – “Software Change Classification using Hunk Metrics” (Ferzund@ICSM`09)
    • 27 hunk-level metrics for change classification
    • 81% accuracy, 77% buggy-hunk precision, and 67% buggy-hunk recall
  – “A large-scale empirical study of just-in-time quality assurance” (Kamei@TSE`13)
    • 14 process metrics (mostly from Mockus`00)
    • 68% accuracy, 64% recall on 11 open-source and commercial projects
  – “An Empirical Study of Just-In-Time Defect Prediction Using Cross-Project Models” (Fukushima@MSR`14)
    • Median AUC: 0.72

Page 27: Survey on Software Defect Prediction

27

Challenges of JIT model

• Practical validation is difficult
  – Only 10-fold cross-validation in the current literature
  – No validation in real scenarios
    • e.g., online machine learning
• Reviewing a huge change is still difficult
  – Fine-grained prediction within a change
    • e.g., line-level prediction

Page 28: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure, 1980s–2020s — existing elements (Process metrics; Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model) plus proposed next steps: Online Learning JIT Model, Fine-grained Prediction]

Page 29: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications]

Page 30: Survey on Software Defect Prediction

30

Defect Prediction in Industry
• “Predicting the location and number of faults in large software systems” (Ostrand@TSE`05)
  – Two industrial systems
  – Recall: 86%
  – The 20% most fault-prone modules account for 62% of faults

Page 31: Survey on Software Defect Prediction

31

Case Study for Practical Model

• “Does Bug Prediction Support Human Developers? Findings From a Google Case Study” (Lewis@ICSE`13)
  – No identifiable change in developer behavior after using the defect prediction model
• Required characteristics, but very challenging to achieve
  – Actionable messages / obvious reasoning

Page 32: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure, 1980s–2020s — existing elements (Process metrics; Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications) plus the proposed next step: Actionable Defect Prediction]

Page 33: Survey on Software Defect Prediction

33

Evaluation Measure for Practical Model

• Measure prediction performance based on code review effort

• AUCEC (Area Under Cost Effectiveness Curve)

[Chart: cost-effectiveness curves of two models, M1 and M2 — x-axis: percent of LOC inspected (thresholds at 10%, 50%, 100% marked), y-axis: percent of bugs found]

Rahman@FSE`11, Bugcache for inspections: Hit or miss?
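AUCEC can be computed by sorting entities by predicted risk, accumulating LOC on the x-axis and bugs found on the y-axis, and taking the area under that curve. The sketch below is illustrative code with made-up data (not the implementation from Rahman@FSE`11), optionally cut off at a review-effort threshold.

```python
import numpy as np

def aucec(loc, bugs, risk_scores, effort_cap=1.0):
    """Area under the cost-effectiveness curve.

    Entities are inspected in decreasing order of predicted risk; x = cumulative
    fraction of LOC inspected (up to effort_cap), y = cumulative fraction of bugs found."""
    order = np.argsort(-np.asarray(risk_scores))
    loc, bugs = np.asarray(loc)[order], np.asarray(bugs)[order]
    x = np.concatenate(([0.0], np.cumsum(loc) / loc.sum()))
    y = np.concatenate(([0.0], np.cumsum(bugs) / bugs.sum()))
    keep = x <= effort_cap            # simple cut-off; points beyond the cap are dropped
    return np.trapz(y[keep], x[keep])

# Illustrative data: LOC, actual bug counts, and model risk scores per file.
loc   = [100, 400, 50, 250, 200]
bugs  = [  3,   1,  2,   0,   1]
score = [0.9, 0.2, 0.8, 0.1, 0.5]
print(aucec(loc, bugs, score))                  # full-curve AUCEC
print(aucec(loc, bugs, score, effort_cap=0.1))  # area up to 10% of LOC reviewed
```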

Page 34: Survey on Software Defect Prediction

34

Practical Application

• What else can we do with defect prediction models?
  – Test case selection in regression testing (Engstrom@ICST`10)
  – Prioritizing warnings from FindBugs (Rahman@ICSE`14)

Page 35: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications]

Page 36: Survey on Software Defect Prediction

36

Representative OO Metrics

Metric | Description
WMC    | Weighted Methods per Class (# of methods)
DIT    | Depth of Inheritance Tree (# of ancestor classes)
NOC    | Number of Children
CBO    | Coupling Between Objects (# of coupled classes)
RFC    | Response For a Class (WMC + # of methods called by the class)
LCOM   | Lack of Cohesion in Methods (# of “connected components”)

• CK metrics (Chidamber&Kemerer@TSE`94)
• Prediction performance of CK metrics vs. code metrics (Basili@TSE`96)
  – F-measure: 70% vs. 60%

Page 37: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications]

Page 38: Survey on Software Defect Prediction

38

Representative History Metrics

Name                             | # of metrics | Metric source | Citation
Relative code change churn       | 8            | SW Repo.*     | Nagappan@ICSE`05
Change                           | 17           | SW Repo.      | Moser@ICSE`08
Change Entropy                   | 1            | SW Repo.      | Hassan@ICSE`09
Code metric churn / Code Entropy | 2            | SW Repo.      | D’Ambros@MSR`10
Popularity                       | 5            | Email archive | Bacchelli@FASE`10
Ownership                        | 4            | SW Repo.      | Bird@FSE`11
Micro Interaction Metrics (MIM)  | 56           | Mylyn         | Lee@FSE`11

* SW Repo. = version control system + issue tracking system

Page 39: Survey on Software Defect Prediction

Representative History Metrics

• Advantage
  – Better prediction performance than code metrics
[Bar chart: performance improvement (%) of all metrics vs. code complexity metrics, y-axis 0–60% — Moser`08 (F-measure), Hassan`09 (F-measure), D’Ambros`10 (absolute prediction error), Bacchelli`10, Bird`11, and Lee`11 (Spearman correlation)]
(*Bird`11’s results compare two metrics vs. code metrics; no comparison data in Nagappan`05)

Page 40: Survey on Software Defect Prediction

40

History Metrics

• Limitations
  – History metrics do not capture particular program characteristics such as developer social networks, component networks, and anti-patterns.
  – Noisy data
    • Bias in bug-fix datasets (Bird@FSE`09)
  – Not applicable to new projects or projects lacking historical data

Page 41: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Noise Reduction, Semi-supervised/active]

Page 42: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Noise Reduction, Semi-supervised/active]

Page 43: Survey on Software Defect Prediction

43

Other Metrics

Name                     | # of metrics | Metric source                  | Citation
Component network        | 28           | Binaries (Windows Server 2003) | Zimmermann@ICSE`08
Developer-Module network | 9            | SW Repo. + Binaries            | Pinzger@FSE`08
Developer social network | 4            | SW Repo.                       | Meneely@FSE`08
Anti-pattern             | 4            | SW Repo. + Design patterns     | Taba@ICSM`13

* SW Repo. = version control system + issue tracking system

Page 44: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Noise Reduction, Semi-supervised/active]

Page 45: Survey on Software Defect Prediction

45

Noise Reduction

• Noise detection and elimination algorithm (Kim@ICSE`11)
  – Closest List Noise Identification (CLNI)
    • Based on the Euclidean distance between instances
  – Average F-measure improvement
    • 0.504 → 0.621
• ReLink (Wu@FSE`11)
  – Recovers missing links between bugs and changes
  – 60% → 78% recall for missing links
  – F-measure improvement
    • e.g., 0.698 (traditional) → 0.731 (ReLink)
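The CLNI idea can be sketched roughly as follows. This is a simplified, illustrative version (the actual algorithm in Kim@ICSE`11 iterates and thresholds a ranked closest list, which is omitted here): an instance is flagged as likely noise when most of its nearest neighbors by Euclidean distance carry the opposite label.

```python
import numpy as np

def flag_noisy_instances(X, y, k=5, disagreement_threshold=0.8):
    """Flag instance i as noisy if >= threshold of its k nearest neighbors
    (Euclidean distance) have a different label. Simplified CLNI-style check."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    noisy = []
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                      # exclude the instance itself
        neighbors = np.argsort(dist)[:k]
        disagreement = np.mean(y[neighbors] != y[i])
        if disagreement >= disagreement_threshold:
            noisy.append(i)
    return noisy

# Illustrative usage with random metric data and labels.
rng = np.random.default_rng(1)
X = rng.random((50, 4))
y = (X[:, 0] > 0.5).astype(int)
print(flag_noisy_instances(X, y))
```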

Page 46: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Semi-supervised/active]

Page 47: Survey on Software Defect Prediction

47

Defect Prediction for New Software Projects

• Universal Defect Prediction Model

• Semi-supervised / active learning

• Cross-Project Defect Prediction

Page 48: Survey on Software Defect Prediction

48

Universal Defect Prediction Model (Zhang@MSR`14)
• Context-aware rank transformation
  – Transforms metric values into ranks from 1 to 10 so that they become comparable across all projects
• Model built from 1,398 projects collected from SourceForge and Google Code
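A rough sketch of the rank transformation step (simplified: the actual model in Zhang@MSR`14 derives the cut points from projects sharing the same context, which is omitted here) — each metric value is replaced by the decile bin (1–10) it falls into, making values comparable across projects.

```python
import numpy as np

def rank_transform(values, num_levels=10):
    """Map raw metric values to ranks 1..num_levels using quantile cut points.

    Simplified sketch: in the universal model, the cut points would come from the
    pooled distribution of projects in the same context cluster."""
    values = np.asarray(values, dtype=float)
    cut_points = np.quantile(values, np.linspace(0, 1, num_levels + 1)[1:-1])
    return np.searchsorted(cut_points, values, side="right") + 1

loc_values = [10, 25, 40, 80, 120, 300, 450, 900, 1500, 5000]  # illustrative metric values
print(rank_transform(loc_values))   # ranks in 1..10
```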

Page 49: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Semi-supervised/active]

Page 50: Survey on Software Defect Prediction

50

Other Approaches for CPDP
• Semi-supervised learning with dimension reduction for defect prediction (Lu@ASE`12)
  – Trains a model with a small set of labeled instances together with many unlabeled instances
  – AUC improvement
    • 0.83 → 0.88 with 2% labeled instances
• Sample-based semi-supervised/active learning for defect prediction (Li@AESEJ`12)
  – Average F-measure
    • 0.628 → 0.685 with 10% sampled instances

Page 51: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Semi-supervised/active]

Page 52: Survey on Software Defect Prediction

52

Cross-Project Defect Prediction

(CPDP)
• For a new project or a project lacking historical data
[Figure: a model is trained on Project A (training set) and applied to Project B (test set)]
• Only 2% of 622 cross-project prediction combinations worked. (Zimmermann@FSE`09)

Page 53: Survey on Software Defect Prediction

Transfer Learning (TL)

[Figure: in traditional machine learning, each learning system is trained separately on its own data; in transfer learning, knowledge from one learning system is transferred to help another]

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis

Page 54: Survey on Software Defect Prediction

54

CPDP

• Adopting transfer learning

Transfer learning | Metric Compensation     | NN Filter                     | TNB                      | TCA+
Preprocessing     | N/A                     | Feature selection, Log-filter | Log-filter               | Normalization
Machine learner   | C4.5                    | Naive Bayes                   | TNB                      | Logistic Regression
# of subjects     | 2                       | 10                            | 10                       | 8
# of predictions  | 2                       | 10                            | 10                       | 26
Avg. F-measure    | 0.67 (W: 0.79, C: 0.58) | 0.35 (W: 0.37, C: 0.26)       | 0.39 (NN: 0.35, C: 0.33) | 0.46 (W: 0.46, C: 0.36)
Citation          | Watanabe@PROMISE`08     | Turhan@ESEJ`09                | Ma@IST`12                | Nam@ICSE`13

* NN = Nearest neighbor, W = Within, C = Cross

Page 55: Survey on Software Defect Prediction

55

Metric Compensation (Watanabe@PROMISE`08)

• Key idea

  – New target metric value = target metric value * (average source metric value / average target metric value)
[Figure: the target metrics are rescaled to resemble the source distribution (“Let me transform like source!”)]
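A minimal sketch of the compensation step (illustrative NumPy code, not Watanabe et al.'s implementation): each target metric column is rescaled by the ratio of the source mean to the target mean.

```python
import numpy as np

def compensate_target_metrics(source_X, target_X):
    """Metric compensation: new_target = target_value * mean(source_metric) /
    mean(target_metric), applied per metric (column)."""
    source_X, target_X = np.asarray(source_X, float), np.asarray(target_X, float)
    ratio = source_X.mean(axis=0) / target_X.mean(axis=0)
    return target_X * ratio   # broadcasts the per-metric ratio over all target instances

# Illustrative usage: 3 metrics, a few instances per project.
source = [[10, 200, 1.0], [20, 400, 3.0], [30, 600, 2.0]]
target = [[ 1,  50, 0.5], [ 3, 150, 1.5]]
print(compensate_target_metrics(source, target))
```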

Page 56: Survey on Software Defect Prediction

56

Metric Compensation (cont.) (Watanabe@PROMISE`08)

Transfer learning | Metric Compensation     | NN Filter                     | TNB                      | TCA+
Preprocessing     | N/A                     | Feature selection, Log-filter | Log-filter               | Normalization
Machine learner   | C4.5                    | Naive Bayes                   | TNB                      | Logistic Regression
# of subjects     | 2                       | 10                            | 10                       | 8
# of predictions  | 2                       | 10                            | 10                       | 26
Avg. F-measure    | 0.67 (W: 0.79, C: 0.58) | 0.35 (W: 0.37, C: 0.26)       | 0.39 (NN: 0.35, C: 0.33) | 0.46 (W: 0.46, C: 0.36)
Citation          | Watanabe@PROMISE`08     | Turhan@ESEJ`09                | Ma@IST`12                | Nam@ICSE`13

* NN = Nearest neighbor, W = Within, C = Cross

Page 57: Survey on Software Defect Prediction

57

NN Filter (Turhan@ESEJ`09)
• Key idea
  – Nearest neighbor filter: select the 10 nearest source instances of each target instance
[Figure: source instances that resemble the target (“Hey, you look like me! Could you be my model?”) are kept as the new source training set]
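A small sketch of the NN filter (illustrative; Turhan et al. apply it to log-filtered metrics): collect, for every target instance, its 10 nearest source instances and use the union of the selected instances as the training set.

```python
import numpy as np

def nn_filter(source_X, source_y, target_X, k=10):
    """For each target instance, pick its k nearest source instances (Euclidean
    distance); the union of the selected instances becomes the training set."""
    source_X, target_X = np.asarray(source_X, float), np.asarray(target_X, float)
    selected = set()
    for t in target_X:
        dist = np.linalg.norm(source_X - t, axis=1)
        selected.update(np.argsort(dist)[:k].tolist())
    idx = sorted(selected)
    return source_X[idx], np.asarray(source_y)[idx]

# Illustrative usage with random data.
rng = np.random.default_rng(2)
src_X, src_y = rng.random((100, 4)), rng.integers(0, 2, 100)
tgt_X = rng.random((20, 4))
train_X, train_y = nn_filter(src_X, src_y, tgt_X, k=10)
print(train_X.shape, train_y.shape)
```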

Page 58: Survey on Software Defect Prediction

58

NN Filter (cont.) (Turhan@ESEJ`09)

Transfer learning | Metric Compensation     | NN Filter                     | TNB                      | TCA+
Preprocessing     | N/A                     | Feature selection, Log-filter | Log-filter               | Normalization
Machine learner   | C4.5                    | Naive Bayes                   | TNB                      | Logistic Regression
# of subjects     | 2                       | 10                            | 10                       | 8
# of predictions  | 2                       | 10                            | 10                       | 26
Avg. F-measure    | 0.67 (W: 0.79, C: 0.58) | 0.35 (W: 0.37, C: 0.26)       | 0.39 (NN: 0.35, C: 0.33) | 0.46 (W: 0.46, C: 0.36)
Citation          | Watanabe@PROMISE`08     | Turhan@ESEJ`09                | Ma@IST`12                | Nam@ICSE`13

* NN = Nearest neighbor, W = Within, C = Cross

Page 59: Survey on Software Defect Prediction

Transfer Naive Bayes (Ma@IST`12)
• Key idea
  – Give more weight to source instances that are similar to the target instances when building the Naive Bayes model
[Figure: similar source instances (“Hey, you look like me!”, “Please, consider me more important than other instances”) receive higher weight; dissimilar ones (“I’m not that important!”) receive lower weight]

Page 60: Survey on Software Defect Prediction

60

Transfer Naive Bayes (cont.)(Ma@IST`12)

• Transfer Naive Bayes

– New prior probability

– New conditional probability

Page 61: Survey on Software Defect Prediction

61

Transfer Naive Bayes (cont.) (Ma@IST`12)
• How to find source instances similar to the target
  – A similarity score
  – A weight value

              | F1 | F2 | F3 | F4 | Score (si)
Max of target | 7  | 3  | 2  | 5  | -
src. inst 1   | 5  | 4  | 2  | 2  | 3
src. inst 2   | 0  | 2  | 5  | 9  | 1
Min of target | 1  | 2  | 0  | 1  | -

k = # of features, si = score of instance i (the number of features whose value lies within the target’s [min, max] range)
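The similarity score in the table can be reproduced as follows (a sketch of the scoring step only; the formula that turns si into a weight is in Ma@IST`12 and is not repeated here): si counts how many of a source instance's feature values fall within the target data's [min, max] range for that feature.

```python
import numpy as np

def similarity_scores(source_X, target_X):
    """s_i = # of features of source instance i whose value lies within the
    target's [min, max] range for that feature."""
    source_X, target_X = np.asarray(source_X, float), np.asarray(target_X, float)
    lo, hi = target_X.min(axis=0), target_X.max(axis=0)
    return ((source_X >= lo) & (source_X <= hi)).sum(axis=1)

# The example from the slide: target max = [7, 3, 2, 5], target min = [1, 2, 0, 1].
target = [[7, 3, 2, 5], [1, 2, 0, 1]]          # any target data with these min/max values
source = [[5, 4, 2, 2], [0, 2, 5, 9]]
print(similarity_scores(source, target))        # -> [3 1], matching the table
```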

Page 62: Survey on Software Defect Prediction

62

Transfer Naive Bayes (cont.) (Ma@IST`12)

Transfer learning | Metric Compensation     | NN Filter                     | TNB                      | TCA+
Preprocessing     | N/A                     | Feature selection, Log-filter | Log-filter               | Normalization
Machine learner   | C4.5                    | Naive Bayes                   | TNB                      | Logistic Regression
# of subjects     | 2                       | 10                            | 10                       | 8
# of predictions  | 2                       | 10                            | 10                       | 26
Avg. F-measure    | 0.67 (W: 0.79, C: 0.58) | 0.35 (W: 0.37, C: 0.26)       | 0.39 (NN: 0.35, C: 0.33) | 0.46 (W: 0.46, C: 0.36)
Citation          | Watanabe@PROMISE`08     | Turhan@ESEJ`09                | Ma@IST`12                | Nam@ICSE`13

* NN = Nearest neighbor, W = Within, C = Cross

Page 63: Survey on Software Defect Prediction

63

TCA+ (Nam@ICSE`13)
• Key idea
  – TCA (Transfer Component Analysis)
[Figure: source and target data (“Oops, we are different! Let’s meet in another world!”) are mapped into a new latent feature space, yielding a new source and a new target that are more alike]

Page 64: Survey on Software Defect Prediction

64

Transfer Component Analysis (cont.)

• Feature extraction approach
  – Dimensionality reduction
  – Projection: map the original data into a lower-dimensional feature space
[Figure: example of mapping data from a 2-dimensional feature space onto a 1-dimensional feature space]

Page 65: Survey on Software Defect Prediction

65

TCA (cont.)

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis

[Figure: source domain data and target domain data have different distributions in the original feature space]

Page 66: Survey on Software Defect Prediction

66

TCA (cont.)

[Figure: after applying TCA, the source and target data distributions become similar in the learned latent space]

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis

Page 67: Survey on Software Defect Prediction

TCA+ (Nam@ICSE`13)
[Figure, TCA vs. TCA+: with TCA alone, source and target meet in a new latent space but remain somewhat different (“But, we are still a bit different!”); TCA+ first normalizes the data (“Normalize us together!”) before applying TCA]

Page 68: Survey on Software Defect Prediction

Normalization Options

• NoN: No normalization applied

• N1: Min-max normalization (max=1, min=0)

• N2: Z-score normalization (mean=0, std=1)

• N3: Z-score normalization using only the source mean and standard deviation

• N4: Z-score normalization using only the target mean and standard deviation

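The five options can be written compactly as below. This is a minimal sketch applied column-wise to the source and target metric matrices; one assumption is that N1 and N2 normalize each dataset with its own statistics (the slide does not spell this out), while N3 and N4 use only the source's or only the target's statistics for both.

```python
import numpy as np

def normalize(source_X, target_X, option="N1"):
    """TCA+ normalization options (sketch, see assumptions in the lead-in)."""
    S, T = np.asarray(source_X, float), np.asarray(target_X, float)

    minmax = lambda X: (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    zscore = lambda X, m, s: (X - m) / s

    if option == "NoN":   # no normalization
        return S, T
    if option == "N1":    # min-max to [0, 1], per dataset
        return minmax(S), minmax(T)
    if option == "N2":    # z-score, per dataset (mean = 0, std = 1)
        return zscore(S, S.mean(0), S.std(0)), zscore(T, T.mean(0), T.std(0))
    if option == "N3":    # z-score using only the source mean/std
        return zscore(S, S.mean(0), S.std(0)), zscore(T, S.mean(0), S.std(0))
    if option == "N4":    # z-score using only the target mean/std
        return zscore(S, T.mean(0), T.std(0)), zscore(T, T.mean(0), T.std(0))
    raise ValueError(option)
```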

Page 69: Survey on Software Defect Prediction

69

Preliminary Results using TCA

[Chart: F-measure (0 to 0.8) of cross-project prediction for Project A → Project B and Project B → Project A under the options Baseline, NoN, N1, N2, N3, N4]
*Baseline: cross-project defect prediction without TCA and without normalization
Prediction performance of TCA varies according to the normalization option!

Page 70: Survey on Software Defect Prediction

70

TCA+: Decision Rules

• Find a suitable normalization option for TCA
• Steps
  – #1: Characterize a dataset
  – #2: Measure the similarity between the source and target datasets
  – #3: Apply decision rules

Page 71: Survey on Software Defect Prediction

71

TCA+: #1. Characterize a Dataset

[Figure: for each dataset (A and B), compute the pairwise Euclidean distances d(i,j) between all of its instances]
DIST_A = { d(i,j) : 1 ≤ i < j ≤ n }, the set of all pairwise distances among the n instances of dataset A

Page 72: Survey on Software Defect Prediction

72

TCA+: #2. Measure Similarity between Source and Target

• Minimum (min) and maximum (max) values of DIST

• Mean and standard deviation (std) of DIST
• The number of instances
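A sketch of steps #1 and #2 (illustrative code with random data): compute DIST for a dataset and the summary values that are later compared between source and target.

```python
import numpy as np
from itertools import combinations

def dist_characteristics(X):
    """Compute DIST = {d(i,j) : 1 <= i < j <= n} (pairwise Euclidean distances)
    and the summary values used by TCA+: min, max, mean, std, # of instances."""
    X = np.asarray(X, float)
    dists = np.array([np.linalg.norm(X[i] - X[j])
                      for i, j in combinations(range(len(X)), 2)])
    return {"min": dists.min(), "max": dists.max(),
            "mean": dists.mean(), "std": dists.std(), "n": len(X)}

rng = np.random.default_rng(3)
source_stats = dist_characteristics(rng.random((30, 4)))
target_stats = dist_characteristics(rng.random((40, 4)))
print(source_stats)
print(target_stats)
```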

Page 73: Survey on Software Defect Prediction

73

TCA+: #3. Decision Rules

• Rule #1
  – Mean and std are the same → NoN
• Rule #2
  – Max and min are different → N1 (max = 1, min = 0)
• Rules #3 and #4
  – Std and # of instances are different → N3 or N4 (source/target mean = 0, std = 1)
• Rule #5
  – Default → N2 (mean = 0, std = 1)
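The rules can be sketched as a small decision function. This is only a rough approximation: the published rules rely on nominal degrees of difference derived from statistical comparison, which are replaced here by simple relative-difference thresholds, and the choice between N3 and N4 is an assumption made for illustration.

```python
def choose_normalization(src, tgt, tol=0.1):
    """Pick a normalization option from the DIST characteristics of source (src)
    and target (tgt). Simplified stand-in for the TCA+ decision rules."""
    def similar(a, b):
        return abs(a - b) <= tol * max(abs(a), abs(b), 1e-12)

    if similar(src["mean"], tgt["mean"]) and similar(src["std"], tgt["std"]):
        return "NoN"                                   # Rule #1
    if not similar(src["max"], tgt["max"]) and not similar(src["min"], tgt["min"]):
        return "N1"                                    # Rule #2
    if not similar(src["std"], tgt["std"]) and src["n"] != tgt["n"]:
        return "N3" if src["n"] > tgt["n"] else "N4"   # Rules #3/#4 (side chosen for illustration)
    return "N2"                                        # Rule #5 (default)
```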

Page 74: Survey on Software Defect Prediction

74

TCA+ (cont.) (Nam@ICSE`13)

Transfer learning | Metric Compensation     | NN Filter                     | TNB                      | TCA+
Preprocessing     | N/A                     | Feature selection, Log-filter | Log-filter               | Normalization
Machine learner   | C4.5                    | Naive Bayes                   | TNB                      | Logistic Regression
# of subjects     | 2                       | 10                            | 10                       | 8
# of predictions  | 2                       | 10                            | 10                       | 26
Avg. F-measure    | 0.67 (W: 0.79, C: 0.58) | 0.35 (W: 0.37, C: 0.26)       | 0.39 (NN: 0.35, C: 0.33) | 0.46 (W: 0.46, C: 0.36)
Citation          | Watanabe@PROMISE`08     | Turhan@ESEJ`09                | Ma@IST`12                | Nam@ICSE`13

* NN = Nearest neighbor, W = Within, C = Cross

Page 75: Survey on Software Defect Prediction

75

Current CPDP using TL
• Advantages
  – Prediction performance comparable to within-project prediction models
  – Benefits from state-of-the-art TL approaches
• Limitation
  – The performance of some cross-prediction pairs is still poor (negative transfer).
[Figure: a source/target pair with mismatched distributions, illustrating negative transfer]

Page 76: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Semi-supervised/active]

Page 77: Survey on Software Defect Prediction

77

Feasibility Evaluation for CPDP

• Solution for negative transfer
  – Decision tree using project characteristic metrics (Zimmermann@FSE`09)
    • e.g., programming language, # of developers, etc.

Page 78: Survey on Software Defect Prediction

78

Follow-up Studies
• “An investigation on the feasibility of cross-project defect prediction” (He@ASEJ`12)
  – Decision tree using distributional characteristics of a dataset, e.g., mean, skewness, peakedness, etc.

Page 79: Survey on Software Defect Prediction

79

Feasibility for CPDP

• Challenges in current studies
  – Decision trees were not evaluated properly.
    • Just fitting models
  – Low target prediction coverage
    • Only 5 out of 34 target projects were feasible for cross-project prediction (He@ASEJ`12)

Page 80: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure, 1980s–2020s — existing elements (CK metrics, Process metrics, History metrics, Other metrics; Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Cross-Project Feasibility, Semi-supervised/active) plus the proposed next step: Cross-Prediction Feasibility Model]

Page 81: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Semi-supervised/active, Personalized Model]

Page 82: Survey on Software Defect Prediction

82

Cross-prediction Model
• Common challenge
  – Current cross-prediction models are limited to datasets with the same set of metrics.
  – Not applicable to projects with different feature spaces (different domains)
    • NASA dataset: Halstead, LOC
    • Apache dataset: LOC, Cyclomatic, CK metrics
[Figure: a source and a target project with different metric sets]

Page 83: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure, 1980s–2020s — existing elements (CK metrics, Process metrics, History metrics, Other metrics; Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Cross-Project Feasibility, Noise Reduction, Semi-supervised/active, Personalized Model) plus the proposed next step: Cross-Domain Prediction]

Page 84: Survey on Software Defect Prediction

84

Other Topics

Page 85: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Data Privacy, Noise Reduction, Semi-supervised/active, Personalized Model]

Page 86: Survey on Software Defect Prediction

86

Other Topics
• Privacy issues with defect datasets
  – MORPH (Peters@ICSE`12)
    • Mutates defect datasets while preserving prediction accuracy
    • Can accelerate cross-project defect prediction with industrial datasets
• Personalized defect prediction model (Jiang@ASE`13)
  – “Different developers have different coding styles, commit frequencies, and experience levels, all of which cause different defect patterns.”
  – Results
    • Average F-measure: 0.62 (personalized models) vs. 0.59 (non-personalized models)

Page 87: Survey on Software Defect Prediction

87

Outline

• Background
• Software Defect Prediction Approaches
  – Simple metric and defect estimation models
  – Complexity metrics and Fitting models
  – Prediction models
  – Just-In-Time Prediction Models
  – Practical Prediction Models and Applications
  – History Metrics from Software Repositories
  – Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues

Page 88: Survey on Software Defect Prediction

Defect Prediction Approaches
[Timeline figure, 1970s–2010s — Metrics: LOC, Cyclomatic metric, Halstead metrics, CK metrics, Process metrics, History metrics, Other metrics; Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Others: Cross-Project Feasibility, Data Privacy, Noise Reduction, Semi-supervised/active, Personalized Model]

Page 89: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure, 1980s–2020s — summary of proposed next steps: Online Learning JIT Model, Actionable Defect Prediction, Cross-Prediction Feasibility Model, Cross-Domain Prediction, Fine-grained Prediction — alongside the existing elements (CK metrics, Process metrics, History metrics, Other metrics; Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model; Cross-Project Feasibility, Data Privacy, Noise Reduction, Semi-supervised/active, Personalized Model)]

Page 90: Survey on Software Defect Prediction

90

Thank you!

Page 91: Survey on Software Defect Prediction

91

Page 92: Survey on Software Defect Prediction

92

Evaluation Measures (classification)

• Measures for binary classification
  – Confusion matrix

              | Predicted: Buggy    | Predicted: Clean
Actual: Buggy | True Positive (TP)  | False Negative (FN)
Actual: Clean | False Positive (FP) | True Negative (TN)

Page 93: Survey on Software Defect Prediction

93

Evaluation Measures (classification)

• False positive rate (FPR, PF) = FP / (TN + FP)
• Accuracy = (TP + TN) / (TP + FP + TN + FN)
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F-measure = 2 * Precision * Recall / (Precision + Recall)
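The measures above translate directly into code; a minimal sketch with illustrative confusion-matrix counts:

```python
def classification_measures(tp, fp, tn, fn):
    """Binary-classification measures computed from the confusion matrix counts."""
    fpr       = fp / (tn + fp)
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"FPR": fpr, "accuracy": accuracy, "precision": precision,
            "recall": recall, "F-measure": f_measure}

print(classification_measures(tp=30, fp=10, tn=50, fn=10))  # illustrative counts
```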

Page 94: Survey on Software Defect Prediction

94

Evaluation Measures (classification)

• AUC (Area Under the receiver operating characteristic Curve)
[Chart: ROC curve — x-axis: false positive rate (0 to 1), y-axis: true positive rate (0 to 1)]

Page 95: Survey on Software Defect Prediction

95

Evaluation Measures (classification)

• AUCEC (Area Under the Cost-Effectiveness Curve)
[Chart: cost-effectiveness curves of two models, M1 and M2 — x-axis: percent of LOC inspected (thresholds at 10%, 50%, 100% marked), y-axis: percent of bugs found]
Rahman@FSE`11, Bugcache for inspections: Hit or miss?

Page 96: Survey on Software Defect Prediction

96

Evaluation Measures (Regression)

• Target
  – Metric values vs. the number of bugs
  – Actual vs. predicted number of bugs
• Correlation coefficients
  – Spearman / Pearson / R²
• Mean squared error
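A minimal sketch of the regression-oriented measures (illustrative bug counts; SciPy is used here for the correlation coefficients):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

actual    = np.array([0, 2, 5, 1, 8, 3])   # actual # of bugs (illustrative)
predicted = np.array([1, 2, 4, 0, 6, 3])   # predicted # of bugs (illustrative)

print("Spearman:", spearmanr(actual, predicted).correlation)
print("Pearson: ", pearsonr(actual, predicted)[0])
print("MSE:     ", np.mean((actual - predicted) ** 2))
```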

Page 97: Survey on Software Defect Prediction

97

CK metrics

Metric | Description
WMC    | Weighted Methods per Class (# of methods)
DIT    | Depth of Inheritance Tree (# of ancestor classes)
NOC    | Number of Children
CBO    | Coupling Between Objects (# of coupled classes)
RFC    | Response For a Class (WMC + # of methods called by the class)
LCOM   | Lack of Cohesion in Methods (# of “connected components”)