Survey on Software Defect Prediction
- PhD Qualifying Examination -
July 3, 2014
Jaechang Nam
Department of Computer Science and Engineering
HKUST
2
Outline
• Background
• Software Defect Prediction Approaches
– Simple metric and defect estimation models
– Complexity metrics and Fitting models
– Prediction models
– Just-In-Time Prediction Models
– Practical Prediction Models and Applications
– History Metrics from Software Repositories
– Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues
3
Motivation
• General question of software defect prediction
– Can we identify defect-prone entities (source code file, binary, module, change, ...) in advance?
• # of defects
• buggy or clean
• Why?
– Quality assurance for large software (Akiyama@IFIP`71)
– Effective resource allocation
• Testing (Menzies@TSE`07)
• Code review (Rahman@FSE`11)
4
Ground Assumption
• The more complex, the more defect-prone
5
Two Focuses on Defect Prediction
• How complex are the software and its development process?
– Metrics
• How can we predict whether software has defects?
– Models based on the metrics
6
Prediction Performance Goal
• Recall vs. Precision
• Strong predictor criteria
– 70% recall and 25% false positive rate (Menzies@TSE`07)
– Precision, recall, accuracy ≥ 75% (Zimmermann@FSE`09)
7
Outline
• Background
• Software Defect Prediction Approaches
– Simple metric and defect estimation models
– Complexity metrics and Fitting models
– Prediction models
– Just-In-Time Prediction Models
– Practical Prediction Models and Applications
– History Metrics from Software Repositories
– Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues
Defect Prediction Approaches
[Timeline figure, 1970s–2010s — metrics: LOC; models: Simple Model]
9
Identifying Defect-prone Entities
• Akiyama’s equation (Akiyama@IFIP`71)
– # of defects = 4.86 + 0.018 * LOC (LOC = Lines Of Code)
• roughly 23 defects in 1 KLOC
• Derived from actual systems
• Limitation
– LOC alone is not enough to capture software complexity
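A minimal sketch (in Python, with a hypothetical LOC value) of how Akiyama's size-based estimate can be applied:

    # Akiyama's size-based defect estimate, using the constants from the slide above.
    def estimated_defects(loc):
        return 4.86 + 0.018 * loc

    print(estimated_defects(1000))  # ~22.9, i.e. roughly 23 defects per KLOC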
Defect Prediction Approaches
[Timeline figure, 1970s–2010s — metrics: LOC, Cyclomatic, Halstead; models: Simple Model, Fitting Model]
11
Complexity Metrics and Fitting Models
• Cyclomatic complexity metrics (McCabe`76)
– “Logical complexity” of a program represented in control flow graph
– V(G) = #edge – #node + 2
• Halstead complexity metrics (Halstead`77)
– Metrics based on # of operators and operands
– Volume = N * log2(n) (N: total occurrences of operators and operands; n: # of distinct operators and operands)
– # of defects = Volume / 3000
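A small sketch of both metrics as defined above (the example counts are made up):

    import math

    # Cyclomatic complexity V(G) = #edges - #nodes + 2 of a control flow graph.
    def cyclomatic_complexity(num_edges, num_nodes):
        return num_edges - num_nodes + 2

    # Halstead volume = N * log2(n) and the defect estimate Volume / 3000 from the slide.
    def halstead_volume(total_operators_operands, distinct_operators_operands):
        return total_operators_operands * math.log2(distinct_operators_operands)

    print(cyclomatic_complexity(num_edges=12, num_nodes=10))  # V(G) = 4
    volume = halstead_volume(1500, 60)
    print(volume, volume / 3000)                              # volume and estimated # of defects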
12
Complexity Metrics and Fitting Models
• Limitations
– Do not capture the complexity (amount) of change.
– Most studies in the 1970s and early 1980s built fitting models rather than prediction models.
• Correlation analysis between metrics and # of defects
– By linear regression models
• Models were not validated on new entities (modules).
Defect Prediction Approaches
[Timeline figure, 1970s–2010s — metrics: LOC, Cyclomatic, Halstead, Process; models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification)]
14
Regression Model
• Shen et al.’s empirical study (Shen@TSE`85)
– Linear regression model
– Validated on actual new modules
– Metrics
• Halstead metrics, # of conditional statements
• Process metrics
– Delta of complexity metrics between two successive system versions
– Measures
• Between actual and predicted # of defects on new modules
– MRE (Mean magnitude of Relative Error)
» average of |D − D′| / D over all modules (D: actual # of defects, D′: predicted # of defects)
» MRE = 0.48
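A short sketch of the MRE computation described above (the defect counts are hypothetical):

    # Mean magnitude of relative error between actual (D) and predicted (D') defect counts.
    def mean_relative_error(actual, predicted):
        pairs = [(d, p) for d, p in zip(actual, predicted) if d > 0]
        return sum(abs(d - p) / d for d, p in pairs) / len(pairs)

    print(mean_relative_error([4, 2, 10], [3, 3, 8]))  # ~0.32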
15
Classification Model
• Discriminant analysis by Munson et al. (Munson@TSE`92)
• Logistic regression
• High-risk vs. low-risk modules
• Metrics
– Halstead and Cyclomatic complexity metrics
• Measures
– Type I error: false positive rate
– Type II error: false negative rate
• Result
– Accuracy: 92% (6 misclassifications out of 78 modules)
– Precision: 85%
– Recall: 73%
– F-measure: 88%
16
Defect Prediction Process (Based on Machine Learning)
[Figure: instances with metrics (features) and labels are generated from software archives; after (preprocessing), training instances are used to build a classification/regression model, which then predicts labels for new instances]
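A minimal sketch of this process with scikit-learn; the metric values, labels, and model choice below are illustrative only:

    from sklearn.ensemble import RandomForestClassifier

    # Training instances: metric values (features) mined from software archives, plus labels.
    X_train = [[250, 12, 3], [30, 2, 0], [480, 25, 7], [15, 1, 0]]   # e.g. LOC, complexity, churn
    y_train = [1, 0, 1, 0]                                           # 1 = buggy, 0 = clean

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)                                      # build a model

    # New instances whose labels are unknown.
    X_new = [[320, 18, 5], [22, 1, 0]]
    print(model.predict(X_new))        # predicted labels
    print(model.predict_proba(X_new))  # defect-proneness scores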
17
Defect Prediction (Based on Machine Learning)
• Limitations
– Limited resources for process metrics
• Error fix in unit testing phase was conducted informally by an individual developer (no error information available in this phase). (Shen@TSE`85)
– Existing metrics were not enough to capture complexity of object-oriented (OO) programs.
– Helpful for quality assurance team but not for individual developers
Defect Prediction Approaches
[Timeline figure, 1970s–2010s — metrics: LOC, Cyclomatic, Halstead, CK, Process, History; models: Simple Model, Fitting Model, Prediction Model (Regression/Classification), Just-In-Time Prediction Model, Practical Model and Applications; shown twice as a section divider]
20
Risk Prediction of Software Changes
(Mockus@BLTJ`00)
• Logistic regression
• Change metrics
– LOC added/deleted/modified
– Diffusion of change
– Developer experience
• Result
– Both false positive and false negative rates: 20% in the best case
21
Risk Prediction of Software Changes
(Mockus@BLTJ`00)
• Advantage
– Showed a feasible model in practice
• Limitations
– Prediction was conducted only 3 times per week
• Not fully Just-In-Time
– Validated on one commercial system (5ESS switching system software)
22
BugCache (Kim@ICSE`07)
• Maintain defect-prone entities in a cache
• Approach: [figure omitted]
• Result
– The top 10% of files account for 73–95% of defects on 7 systems
23
BugCache (Kim@ICSE`07)
• Advantages
– Cache can be updated quickly at low cost (cf. static models based on machine learning).
– Just-In-Time: always available whenever QA teams want the list of defect-prone entities
• Limitations
– Cache is not reusable for other software projects.
– Designed for QA teams
• Applicable only at certain points in time, after a batch of changes (e.g., end of a sprint)
• Still limited for individual developers during the development phase
24
Change Classification (Kim@TSE`08)
• Classification model based on SVM
• About 11,500 features
– Change metadata such as changed LOC, change count
– Complexity metrics
– Text features from change log messages, source code, and file names
• Results
– 78% accuracy and 60% recall on average across 12 open-source projects
25
Change Classification (Kim@TSE`08)
• Limitations
– Heavy model (11,500 features)
– Not validated on commercial software products
26
Follow-up Studies
• Studies addressing limitations
– “Reducing Features to Improve Code Change-Based Bug Prediction” (Shivaji@TSE`13)
• With less than 10% of all features, buggy F-measure improves by 21%.
– “Software Change Classification using Hunk Metrics” (Ferzund@ICSM`09)
• 27 hunk-level metrics for change classification
• 81% accuracy, 77% buggy hunk precision, and 67% buggy hunk recall
– “A large-scale empirical study of just-in-time quality assurance” (Kamei@TSE`13)
• 14 process metrics (mostly from Mockus`00)
• 68% accuracy, 64% recall on 11 open-source and commercial projects
– “An Empirical Study of Just-In-Time Defect Prediction Using Cross-Project Models” (Fukushima@MSR`14)
• Median AUC: 0.72
27
Challenges of JIT Models
• Practical validation is difficult
– Only 10-fold cross validation in the current literature
– No validation in real scenarios
• e.g., online machine learning
• Reviewing a huge change is still difficult
– Fine-grained prediction within a change
• e.g., line-level prediction
Next Steps of Defect Prediction
[Timeline figure, 1980s–2020s — adds an Online Learning JIT Model and Fine-grained Prediction to the existing approaches]
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — same content as the previous timeline]
30
Defect Prediction in Industry
• “Predicting the location and number of faults in large software systems” (Ostrand@TSE`05)
– Two industrial systems
– Recall: 86%
– The 20% most fault-prone modules account for 62% of faults
31
Case Study for Practical Model
• “Does Bug Prediction Support Human Developers? Findings From a Google Case Study” (Lewis@ICSE`13)
– No identifiable change in developer behavior after using the defect prediction model
• Required characteristics, but very challenging
– Actionable messages / obvious reasoning
Next Steps of Defect Prediction
[Timeline figure, 1980s–2020s — adds Actionable Defect Prediction to the existing approaches]
33
Evaluation Measure for Practical Model
• Measure prediction performance based on code review effort
• AUCEC (Area Under Cost Effectiveness Curve)
[Cost-effectiveness curve: percent of bugs found (y-axis) vs. percent of LOC inspected (x-axis); two models M1 and M2 compared at a given inspection threshold]
Rahman@FSE`11, BugCache for inspections: Hit or miss?
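One way to compute AUCEC, sketched under the assumption that modules are inspected in decreasing order of predicted risk (the data below are hypothetical):

    # Area under the cost-effectiveness curve: % of bugs found vs. % of LOC inspected.
    # modules: list of (predicted_risk, loc, bugs); the riskiest modules are inspected first.
    def aucec(modules):
        ordered = sorted(modules, key=lambda m: m[0], reverse=True)
        total_loc = sum(m[1] for m in ordered)
        total_bugs = sum(m[2] for m in ordered)
        area = loc_seen = bugs_seen = prev_x = prev_y = 0.0
        for _, loc, bugs in ordered:
            loc_seen += loc
            bugs_seen += bugs
            x, y = loc_seen / total_loc, bugs_seen / total_bugs
            area += (x - prev_x) * (prev_y + y) / 2   # trapezoidal rule
            prev_x, prev_y = x, y
        return area

    print(aucec([(0.9, 100, 5), (0.7, 300, 3), (0.2, 600, 1)]))  # ~0.81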
34
Practical Application
• What else can we do with defect prediction models?
– Test case selection in regression testing (Engstrom@ICST`10)
– Prioritizing warnings from FindBugs (Rahman@ICSE`14)
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — same content as the previous timeline]
36
Representative OO Metrics
Metric Description
WMC Weighted Methods per Class (# of methods)
DIT Depth of Inheritance Tree ( # of ancestor classes)
NOC Number of Children
CBO Coupling between Objects (# of coupled classes)
RFC Response For a Class (WMC + # of methods called by the class)
LCOM Lack of Cohesion in Methods (# of “connected components”)
• CK metrics (Chidamber&Kemerer@TSE`94)
• Prediction Performance of CK vs. code (Basili@TSE`96)
– F-measure: 70% vs. 60%
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — same content as the previous timeline]
38
Representative History Metrics
Name | # of metrics | Metric source | Citation
Relative code change churn | 8 | SW Repo.* | Nagappan@ICSE`05
Change | 17 | SW Repo. | Moser@ICSE`08
Change Entropy | 1 | SW Repo. | Hassan@ICSE`09
Code metric churn, Code Entropy | 2 | SW Repo. | D’Ambros@MSR`10
Popularity | 5 | Email archive | Bacchelli@FASE`10
Ownership | 4 | SW Repo. | Bird@FSE`11
Micro Interaction Metrics (MIM) | 56 | Mylyn | Lee@FSE`11
* SW Repo. = version control system + issue tracking system
Representative History Metrics
• Advantage
– Better prediction performance than code metrics
39
[Bar chart: performance improvement (%) of all metrics vs. code complexity metrics, roughly 0–60%, for Moser`08 and Hassan`09 (F-measure), D'Ambros`10 (absolute prediction error), Bacchelli`10, Bird`11, and Lee`11 (Spearman correlation)]
(*Bird`10’s results are from two metrics vs. code metrics; no comparison data in Nagappan`05)
40
History Metrics
• Limitations
– History metrics do not capture particular program characteristics such as developer social networks, component networks, and anti-patterns.
– Noisy data
• Bias in Bug-Fix Datasets (Bird@FSE`09)
– Not applicable to new projects or projects lacking historical data
Defect Prediction Approaches
[Timeline figure, 1970s–2010s — adds Other Metrics, Cross-Project Prediction, Universal Model, Cross-Project Feasibility, Noise Reduction, and Semi-supervised/Active learning to the previous timeline; shown twice as a section divider]
43
Other Metrics
Name | # of metrics | Metric source | Citation
Component network | 28 | Binaries (Windows Server 2003) | Zimmermann@ICSE`08
Developer-Module network | 9 | SW Repo. + Binaries | Pinzger@FSE`08
Developer social network | 4 | SW Repo. | Meneely@FSE`08
Anti-pattern | 4 | SW Repo. + Design pattern | Taba@ICSM`13
* SW Repo. = version control system + issue tracking system
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — same content as the previous timeline]
45
Noise Reduction
• Noise detection and elimination algorithm (Kim@ICSE`11)
– Closest List Noise Identification (CLNI)
• Based on Euclidean distance between instances
– Average F-measure improvement
• 0.504 → 0.621
• ReLink (Wu@FSE`11)
– Recover missing links between bugs and changes
– 60% → 78% recall for missing links
– F-measure improvement
• e.g., 0.698 (traditional) → 0.731 (ReLink)
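A simplified, CLNI-style sketch (not the exact algorithm from Kim@ICSE`11): an instance is flagged as potentially mislabeled when most of its nearest neighbors, by Euclidean distance, carry the opposite label:

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Flag instances whose k nearest neighbors mostly disagree with their label.
    def suspicious_instances(features, labels, k=5, threshold=0.8):
        noisy = []
        for i, fi in enumerate(features):
            dists = sorted((euclidean(fi, fj), j) for j, fj in enumerate(features) if j != i)
            neighbors = [j for _, j in dists[:k]]
            disagreement = sum(labels[j] != labels[i] for j in neighbors) / k
            if disagreement >= threshold:
                noisy.append(i)
        return noisy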
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — same content as the previous timeline]
47
Defect Prediction for New Software Projects
• Universal Defect Prediction Model
• Semi-supervised / active learning
• Cross-Project Defect Prediction
48
Universal Defect Prediction Model
(Zhang@MSR`14)
• Context-aware rank transformation
– Transform metric values into ranks from 1 to 10 across all projects.
• Model built from 1,398 projects collected from SourceForge and Google Code
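A sketch of one plausible reading of the rank transformation (decile cut points from pooled projects mapping raw metric values to ranks 1–10); the data and cut-point scheme are assumptions, not taken from Zhang@MSR`14:

    import numpy as np

    # Map raw metric values to ranks 1..10 using decile boundaries from pooled projects.
    def rank_transform(values, pooled_values):
        cut_points = np.percentile(pooled_values, np.arange(10, 100, 10))  # 9 decile boundaries
        return np.searchsorted(cut_points, values, side="right") + 1       # ranks in 1..10

    pooled = np.concatenate([np.random.exponential(100, 500) for _ in range(5)])
    print(rank_transform(np.array([3.0, 80.0, 900.0]), pooled))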
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — same content as the previous timeline]
50
Other Approaches for CPDP
• Semi-supervised learning with dimension reduction for defect prediction (Lu@ASE`12)
– Train a model with a small set of labeled instances together with many unlabeled instances
– AUC improvement
• 0.83 → 0.88 with 2% labeled instances
• Sample-based semi-supervised/active learning for defect prediction (Li@AESEJ`12)
– Average F-measure
• 0.628 → 0.685 with 10% sampled instances
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — same content as the previous timeline]
52
Cross-Project Defect Prediction
(CPDP)
• For a new project or a project lacking historical data
[Figure: a model is trained on Project A (labeled) and tested on Project B, whose labels are unknown]
Only 2% out of 622 prediction combinations worked. (Zimmermann@FSE`09)
Transfer Learning (TL)
[Figure: in traditional machine learning, a separate learning system is trained per domain; in transfer learning, knowledge is transferred from a source learning system to a target learning system]
Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
54
CPDP
• Adopting transfer learning
Transfer learning | Metric Compensation | NN Filter | TNB | TCA+
Preprocessing | N/A | Feature selection, Log-filter | Log-filter | Normalization
Machine learner | C4.5 | Naive Bayes | TNB | Logistic Regression
# of subjects | 2 | 10 | 10 | 8
# of predictions | 2 | 10 | 10 | 26
Avg. f-measure | 0.67 (W:0.79, C:0.58) | 0.35 (W:0.37, C:0.26) | 0.39 (NN:0.35, C:0.33) | 0.46 (W:0.46, C:0.36)
Citation | Watanabe@PROMISE`08 | Turhan@ESEJ`09 | Ma@IST`12 | Nam@ICSE`13
* NN = Nearest neighbor, W = Within, C = Cross
55
Metric Compensation (Watanabe@PROMISE`08)
• Key idea
– New target metric value = target metric value × (average source metric value / average target metric value)
[Figure: Source, Target, New Target — target data are rescaled to resemble the source (“Let me transform like source!”)]
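A small sketch of this rescaling (the metric values below are made up):

    import numpy as np

    # Rescale each target metric so that its average matches the source average.
    def compensate(target_metrics, source_metrics):
        # Both arguments are 2-D NumPy arrays of shape (instances, metrics).
        scale = source_metrics.mean(axis=0) / target_metrics.mean(axis=0)
        return target_metrics * scale

    src = np.array([[10.0, 200.0], [30.0, 400.0]])
    tgt = np.array([[2.0, 1000.0], [4.0, 3000.0]])
    print(compensate(tgt, src))  # target values rescaled toward the source averages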
56
Metric Compensation (cont.)(Watanabe@PROMISE`08)
(See the transfer-learning comparison table above; Metric Compensation: avg. f-measure 0.67, W:0.79, C:0.58.)
57
NN filter(Turhan@ESEJ`09)
• Key idea
• Nearest neighbor filter
– Select the 10 nearest source instances of each target instance
[Figure: Source, Target, New Source — only source instances similar to the target are kept (“Hey, you look like me! Could you be my model?”)]
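A sketch of the filter (k and the data layout are assumptions; source_X and target_X are 2-D NumPy arrays):

    import numpy as np

    # For each target instance, keep its k nearest source instances (Euclidean distance);
    # the union of the kept instances becomes the training set.
    def nn_filter(source_X, target_X, k=10):
        selected = set()
        for t in target_X:
            dists = np.linalg.norm(source_X - t, axis=1)
            selected.update(np.argsort(dists)[:k].tolist())
        return sorted(selected)

    # Usage: idx = nn_filter(source_X, target_X); train only on source_X[idx], source_y[idx].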
58
NN filter (cont.)(Turhan@ESEJ`09)
(See the transfer-learning comparison table above; NN Filter: avg. f-measure 0.35, W:0.37, C:0.26.)
Transfer Naive Bayes(Ma@IST`12)
• Key idea
– Provide more weight to similar source instances when building the Naive Bayes model
[Figure: Source and Target instances; source instances similar to the target are weighted more (“Hey, you look like me! You will get more chance to be my best model!”, “Please, consider me more important than other instances”, “I’m not that important!”)]
60
Transfer Naive Bayes (cont.)(Ma@IST`12)
• Transfer Naive Bayes
– New prior probability
– New conditional probability
61
Transfer Naive Bayes (cont.)(Ma@IST`12)
• How to find similar source instances for the target
– A similarity score
– A weight value

              F1  F2  F3  F4  Score (si)
Max of target  7   3   2   5   -
src. inst 1    5   4   2   2   3
src. inst 2    0   2   5   9   1
Min of target  1   2   0   1   -

k = # of features, si = score of instance i
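A sketch of the similarity score from the table above; the weight formula shown is a data-gravitation-style assumption, not quoted from the slide:

    # Score: how many feature values of a source instance fall inside the target's [min, max].
    def similarity_score(src_instance, target_min, target_max):
        return sum(1 for x, lo, hi in zip(src_instance, target_min, target_max) if lo <= x <= hi)

    # Assumed weight: grows with the score (w = s / (k - s + 1)**2).
    def weight(score, k):
        return score / (k - score + 1) ** 2

    t_max, t_min = [7, 3, 2, 5], [1, 2, 0, 1]
    print(similarity_score([5, 4, 2, 2], t_min, t_max))  # 3, as in the table
    print(similarity_score([0, 2, 5, 9], t_min, t_max))  # 1, as in the table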
62
Transfer Naive Bayes (cont.)(Ma@IST`12)
(See the transfer-learning comparison table above; TNB: avg. f-measure 0.39, NN:0.35, C:0.33.)
63
TCA+ (Nam@ICSE`13)
• Key idea
– TCA (Transfer Component Analysis)
[Figure: Source and Target distributions differ; both are mapped into a new space where they resemble each other (“Oops, we are different! Let’s meet in another world!”) — New Source, New Target]
64
Transfer Component Analysis (cont.)
• Feature extraction approach
– Dimensionality reduction
– Projection
• Map original data onto a lower-dimensional feature space
[Figure: data in a 2-dimensional feature space projected onto a 1-dimensional feature space]
65
TCA (cont.)
[Figures (Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis): source and target domain data plotted before and after TCA; after TCA, the two distributions overlap]
TCA+ (Nam@ICSE`13)
[Figure comparing TCA and TCA+: with TCA alone, the mapped source and target can still differ (“But, we are still a bit different!”); TCA+ normalizes source and target together before applying TCA (“Normalize us together!”)]
Normalization Options
• NoN: No normalization applied
• N1: Min-max normalization (max=1, min=0)
• N2: Z-score normalization (mean=0, std=1)
• N3: Z-score normalization only using source mean and standard deviation
• N4: Z-score normalization only using target mean and standard deviation
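A minimal sketch of how these options might be realized column-wise on source S and target T (whether N1/N2 normalize each dataset separately is an assumption here):

    import numpy as np

    def minmax(X):
        return (X - X.min(0)) / (X.max(0) - X.min(0))

    def zscore(X, mean, std):
        return (X - mean) / std

    def n1(S, T):   # min-max on each dataset (max=1, min=0)
        return minmax(S), minmax(T)

    def n2(S, T):   # z-score with each dataset's own statistics
        return zscore(S, S.mean(0), S.std(0)), zscore(T, T.mean(0), T.std(0))

    def n3(S, T):   # z-score using only the source statistics
        return zscore(S, S.mean(0), S.std(0)), zscore(T, S.mean(0), S.std(0))

    def n4(S, T):   # z-score using only the target statistics
        return zscore(S, T.mean(0), T.std(0)), zscore(T, T.mean(0), T.std(0))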
69
Preliminary Results using TCA
[Bar chart: F-measure (0 to 0.8) of cross-project prediction for Project A → Project B and Project B → Project A under Baseline, NoN, N1, N2, N3, N4]
*Baseline: cross-project defect prediction without TCA and normalization
Prediction performance of TCA varies according to different normalization options!
70
TCA+: Decision Rules
• Find a suitable normalization for TCA
• Steps
– #1: Characterize a dataset
– #2: Measure similarity between source and target datasets
– #3: Apply decision rules
71
TCA+: #1. Characterize a Dataset
[Figure: each dataset (A, B) is a set of instances; pairwise Euclidean distances d_ij between instances are computed]
DIST_A = {d_ij : 1 ≤ i < j ≤ n}
72
TCA+: #2. Measure Similarity between Source and Target
• Minimum (min) and maximum (max) values of DIST
• Mean and standard deviation (std) of DIST
• The number of instances
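A sketch of these dataset characteristics (using SciPy's pdist for the pairwise Euclidean distances):

    from scipy.spatial.distance import pdist

    # Characteristics of a dataset X (2-D array): statistics of DIST plus the number of instances.
    def dataset_characteristics(X):
        dist = pdist(X, metric="euclidean")   # DIST = {d_ij : 1 <= i < j <= n}
        return {"min": dist.min(), "max": dist.max(),
                "mean": dist.mean(), "std": dist.std(), "n": len(X)}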
73
TCA+: #3. Decision Rules
• Rule #1
– Mean and std are the same → NoN
• Rule #2
– Max and min are different → N1 (max=1, min=0)
• Rule #3, #4
– Std and # of instances are different → N3 or N4 (mean=0, std=1 using only source or target statistics)
• Rule #5
– Default → N2 (mean=0, std=1)
74
TCA+ (cont.)(Nam@ICSE`13)
(See the transfer-learning comparison table above; TCA+: avg. f-measure 0.46, W:0.46, C:0.36.)
75
Current CPDP using TL
• Advantages
– Prediction performance comparable to within-project prediction models
– Benefits from state-of-the-art TL approaches
• Limitation
– Performance of some cross-prediction pairs is still poor (negative transfer).
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — same content as the previous timeline]
77
Feasibility Evaluation for CPDP
• Solution for negative transfer– Decision tree using project characteristic metrics
(Zimmermann@FSE`09)
• E.g. programming language, # developers, etc.
78
Follow-up Studies
• “An investigation on the feasibility of cross-project defect prediction” (He@ASEJ`12)
– Decision tree using distributional characteristics of a dataset, e.g., mean, skewness, peakedness, etc.
79
Feasibility for CPDP
• Challenges in current studies
– Decision trees were not evaluated properly.
• Just a fitting model
– Low target prediction coverage
• Only 5 out of 34 target projects were feasible for cross-prediction (He@ASEJ`12)
Next Steps of Defect Prediction
[Timeline figure, 1980s–2020s — adds a Cross-Prediction Feasibility Model to the existing approaches]
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — adds Personalized Model to the previous timeline]
82
Cross-prediction Model
• Common challenge
– Current cross-prediction models are limited to datasets with the same set of metrics
– Not applicable to projects with different feature spaces (different domains)
• NASA dataset: Halstead, LOC
• Apache dataset: LOC, Cyclomatic, CK metrics
Next Steps of Defect Prediction
[Timeline figure, 1980s–2020s — adds Cross-Domain Prediction to the existing approaches]
84
Other Topics
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — adds Data Privacy to the previous timeline]
86
Other Topics
• Privacy issues on defect datasets
– MORPH (Peters@ICSE`12)
• Mutate defect datasets while keeping prediction accuracy
• Can accelerate cross-project defect prediction with industrial datasets
• Personalized defect prediction model (Jiang@ASE`13)
– “Different developers have different coding styles, commit frequencies, and experience levels, all of which cause different defect patterns.”
– Results• Average F-measure: 0.62 (personalized models) vs. 0.59
(non-personalized models)
87
Outline
• Background
• Software Defect Prediction Approaches
– Simple metric and defect estimation models
– Complexity metrics and Fitting models
– Prediction models
– Just-In-Time Prediction Models
– Practical Prediction Models and Applications
– History Metrics from Software Repositories
– Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues
Defect Prediction Approaches
[Timeline figure recap, 1970s–2010s — full timeline: all metrics, models, and other topics covered in this survey]
Next Steps of Defect Prediction
[Timeline figure, 1980s–2020s — challenging issues: Online Learning JIT Model, Fine-grained Prediction, Actionable Defect Prediction, Cross-Prediction Feasibility Model, and Cross-Domain Prediction]
90
Thank you!
91
92
Evaluation Measures (classification)
• Measures for binary classification
– Confusion matrix

                 Predicted: Buggy      Predicted: Clean
Actual: Buggy    True Positive (TP)    False Negative (FN)
Actual: Clean    False Positive (FP)   True Negative (TN)
93
Evaluation Measures (classification)
• False positive rate (FPR,PF) = FP/(TN+FP)
• Accuracy = (TP+TN)/(TP+FP+TN+FN)
• Precision = TP/(TP+FP)
• Recall = TP/(TP+FN)
• F-measure = (2 × Precision × Recall) / (Precision + Recall)
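These measures can be computed directly from the confusion-matrix counts; a small sketch with made-up counts:

    def classification_measures(tp, fp, tn, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return {
            "false_positive_rate": fp / (tn + fp),
            "accuracy": (tp + tn) / (tp + fp + tn + fn),
            "precision": precision,
            "recall": recall,
            "f_measure": 2 * precision * recall / (precision + recall),
        }

    print(classification_measures(tp=30, fp=10, tn=50, fn=10))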
94
Evaluation Measures (classification)
• AUC (Area Under receiver operating characteristic Curve)
[ROC curve: true positive rate (y-axis) vs. false positive rate (x-axis), both from 0 to 1; AUC is the area under this curve]
95
Evaluation Measures (classification)
• AUCEC (Area Under Cost Effectiveness Curve)
[Cost-effectiveness curve: percent of bugs found vs. percent of LOC inspected; two models M1 and M2 compared at a given inspection threshold]
Rahman@FSE`11, BugCache for inspections: Hit or miss?
96
Evaluation Measures (Regression)
• Target
– Metric values vs. the number of bugs
– Actual vs. predicted number of bugs
• Correlation coefficient
– Spearman / Pearson / R²
• Mean squared error
97
CK metrics
Metric Description
WMC Weighted Methods per Class (# of methods)
DIT Depth of Inheritance Tree ( # of ancestor classes)
NOC Number of Children
CBO Coupling between Objects (# of coupled classes)
RFC Response For a Class (WMC + # of methods called by the class)
LCOM Lack of Cohesion in Methods (# of “connected components”)