
Multi-Objective Cross-Project Defect Prediction


Page 1: Multi-Objective Cross-Project Defect Prediction

Gerardo Canfora

Andrea De Lucia

Massimiliano Di Penta

Rocco Oliveto

Annibale Panichella

Sebastiano Panichella

Multi-Objective Cross-Project Defect Prediction

Page 2: Multi-Objective Cross-Project Defect Prediction

Bugs are everywhere…

Page 3: Multi-Objective Cross-Project Defect Prediction

Software Testing

Page 4: Multi-Objective Cross-Project Defect Prediction

Practical Constraints

Software Quality

Money

Time

Page 5: Multi-Objective Cross-Project Defect Prediction

Defect Prediction

Spend more resources on the components most likely to fail

Page 6: Multi-Objective Cross-Project Defect Prediction

Indicators of defects

Cached history information (Kim et al., ICSE 2007)

Change metrics (Moser et al., ICSE 2008)

A metrics suite for object-oriented design (Chidamber and Kemerer, TSE 1994)

Page 7: Multi-Objective Cross-Project Defect Prediction

Defect Prediction Methodology

[Diagram: a project is split into a training set and a test set; a predicting model built on the training set labels each class as defect-prone or not (Class1: YES, Class2: YES, Class3: NO, ..., ClassN: ...).]

Page 8: Multi-Objective Cross-Project Defect Prediction

Defect Prediction Methodology

[Diagram: same as the previous slide, with the training and test sets drawn from the same project.]

Within Project

Page 9: Multi-Objective Cross-Project Defect Prediction

Defect Prediction Methodology

[Diagram: within-project prediction, as on the previous slide.]

Within Project
Issue: size of the training set

Page 10: Multi-Objective Cross-Project Defect Prediction

Defect Prediction Methodology

[Diagram: the within-project setup plus a second setup in which the predicting model is trained on past projects and tested on a new project.]

Within Project
Issue: size of the training set

Page 11: Multi-Objective Cross-Project Defect Prediction

Defect Prediction Methodology

[Diagram: within-project prediction (training and test sets from the same project) versus cross-project prediction (a model trained on Project A and applied to Project B).]

Within Project
Issue: size of the training set

Cross-Project

Page 12: Multi-Objective Cross-Project Defect Prediction

Defect Prediction Methodology

[Diagram: within-project versus cross-project prediction, as on the previous slide.]

Within Project
Issue: size of the training set

Cross-Project
Issue: the prediction accuracy can be lower
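To make the two setups concrete, here is a minimal sketch, assuming a hypothetical pandas DataFrame with one row per class, a "project" column, metric columns, and a defect label; it only shows how the training and test sets are built in each case.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def within_project_split(df: pd.DataFrame, project: str, test_size: float = 0.1):
    """Within-project: training and test classes come from the same project."""
    single = df[df["project"] == project]
    return train_test_split(single, test_size=test_size, random_state=42)

def cross_project_split(df: pd.DataFrame, new_project: str):
    """Cross-project: train on the past projects, test on the new one."""
    train = df[df["project"] != new_project]
    test = df[df["project"] == new_project]
    return train, test
```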

Page 13: Multi-Objective Cross-Project Defect Prediction

Cost Effectiveness

1)  Cross-project prediction does not necessarily work worse than within-project prediction

2)  Better precision (accuracy) does not mirror lower inspection cost

3)  Traditional prediction model: logistic regression

Recalling the "Imprecision" of Cross-Project Defect Prediction, Rahman et al., FSE 2012

Page 14: Multi-Objective Cross-Project Defect Prediction

Cost Effectiveness: example

[Figure: four classes, A, B, C, and D.]

Page 15: Multi-Objective Cross-Project Defect Prediction

Cost Effectiveness: example

Predicting model 1: flags Class A (100 LOC) and Class B (10,000 LOC).

Predicting model 2: flags Class A (100 LOC), Class C (100 LOC), and Class D (100 LOC).

Page 16: Multi-Objective Cross-Project Defect Prediction

Cost Effectiveness: example

Predicting model 1: flags Class A (100 LOC) and Class B (10,000 LOC); BUG markers show which of the flagged classes actually contain a defect.

Predicting model 2: flags Class A (100 LOC), Class C (100 LOC), and Class D (100 LOC).

Page 17: Multi-Objective Cross-Project Defect Prediction

Cost Effectiveness: example

Predicting model 1: flags Class A (100 LOC) and Class B (10,000 LOC).
Precision = 50 %
Cost = 10,100 LOC

Predicting model 2: flags Class A (100 LOC), Class C (100 LOC), and Class D (100 LOC).

Page 18: Multi-Objective Cross-Project Defect Prediction

Cost Effectiveness: an example

Predicting model 1: flags Class A (100 LOC) and Class B (10,000 LOC).
Precision = 50 %
Cost = 10,100 LOC

Predicting model 2: flags Class A (100 LOC), Class C (100 LOC), and Class D (100 LOC).
Precision = 33 %
Cost = 300 LOC

Page 19: Multi-Objective Cross-Project Defect Prediction

Cost Effectiveness: an example

[Figure: the same two predicting models, recapped.]

Precision does not mirror the inspection cost.

All the existing prediction models optimize precision, not cost.

We need COST-oriented models.
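As a sanity check on the numbers in this example, a small sketch; the bug location is an assumption (only Class A is taken as buggy), chosen so that both precision values match the slide.

```python
# Hypothetical data mirroring the slide's example; the assumed bug location
# (only Class A) reproduces the 50% and 33% precision values.
loc   = {"A": 100, "B": 10_000, "C": 100, "D": 100}
buggy = {"A": True, "B": False, "C": False, "D": False}

def precision_and_cost(flagged):
    """Precision = buggy flagged classes / all flagged classes;
    inspection cost = total LOC of the flagged classes."""
    true_positives = sum(buggy[c] for c in flagged)
    return true_positives / len(flagged), sum(loc[c] for c in flagged)

print(precision_and_cost(["A", "B"]))       # model 1 -> (0.5, 10100)
print(precision_and_cost(["A", "C", "D"]))  # model 2 -> (0.333..., 300)
```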

Page 20: Multi-Objective Cross-Project Defect Prediction

Multi-objective Logistic Regression

Page 21: Multi-Objective Cross-Project Defect Prediction

Building the Predicting Model on the Training Set

Training Set (class-level metrics):

          P1    P2    ...
Class1    m11   m12   ...
Class2    m21   m22   ...
Class3    m31   m32   ...
Class4    ...   ...   ...
...       ...   ...   ...

Logistic Regression predictions:

      Pred.
C1    1
C2    1
C3    0
C4    1
...   0

Page 22: Multi-Objective Cross-Project Defect Prediction

Building the Predicting Model on the Training Set

Training Set (class-level metrics):

          P1    P2    ...
Class1    m11   m12   ...
Class2    m21   m22   ...
Class3    m31   m32   ...
Class4    ...   ...   ...
...       ...   ...   ...

Logistic Regression:

      Pred.   Actual Val.
C1    1       1
C2    1       0
C3    0       1
C4    1       1
...   0       0

Page 23: Multi-Objective Cross-Project Defect Prediction

Building the Predicting Model on the Training Set

[Same training set as on the previous slide.]

Comparison of predicted vs. actual values:

      Pred.   Actual Val.
C1    1       1
C2    1       0
C3    0       1
C4    1       1
...   0       0

Page 24: Multi-Objective Cross-Project Defect Prediction

Building the Predicting Model on the Training Set

[Same training set, predictions, and comparison with the actual values as on the previous slides.]

GOAL: minimizing the prediction error (PRECISION)
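A minimal sketch of this single-objective step, assuming scikit-learn and a hypothetical metrics matrix: a standard logistic regression is fit on the training set so that its predictions match the actual defect labels as closely as possible; cost plays no role here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training set: one row per class (m_i1, m_i2, ...) and actual labels.
X_train = np.array([[10.0, 2.5], [300.0, 7.1], [45.0, 1.0], [120.0, 4.2]])
y_train = np.array([1, 0, 1, 1])

# Single-objective fit: only the prediction error is minimized.
model = LogisticRegression().fit(X_train, y_train)

pred = model.predict(X_train)
print("predicted:", pred)
print("actual:   ", y_train)
print("errors:   ", int((pred != y_train).sum()))
```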


Page 26: Multi-Objective Cross-Project Defect Prediction

Multi-objective Logistic Regression

Pred.     LOC        Cost
  1    *  100    =   100
  0    *   95    =     0
  1    *  110    =   110
  0    *   10    =     0

Inspection Cost = 210 LOC

Page 27: Multi-Objective Cross-Project Defect Prediction

Multi-objective Logistic Regression

Pred.     LOC        Cost
  1    *  100    =   100
  0    *   95    =     0
  1    *  110    =   110
  0    *   10    =     0

Inspection Cost = 210 LOC

Pred.     Actual Values     #Bug
  1    *  1             =   1
  0    *  1             =   0
  1    *  1             =   1
  0    *  0             =   0

Effectiveness = 2 defects
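Both quantities on this slide are simply dot products of the prediction vector with the LOC and actual-defect vectors; a minimal sketch reproducing the numbers shown:

```python
import numpy as np

pred   = np.array([1, 0, 1, 0])        # classes predicted as defect-prone
loc    = np.array([100, 95, 110, 10])  # size of each class in LOC
actual = np.array([1, 1, 1, 0])        # classes that actually contain defects

inspection_cost = int(pred @ loc)      # 1*100 + 0*95 + 1*110 + 0*10 = 210 LOC
effectiveness   = int(pred @ actual)   # 1 + 0 + 1 + 0 = 2 defects found

print(inspection_cost, effectiveness)  # 210 2
```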

Page 28: Multi-Objective Cross-Project Defect Prediction

Multi-objective Logistic Regression

\[
\begin{aligned}
&\min \; \text{InspectionCost} = \sum_i \text{Pred}_i \cdot \text{Cost}_i \\
&\max \; \text{Effectiveness} = \sum_i \text{Pred}_i \cdot \text{Actual}_i
\end{aligned}
\]

Pred.     LOC        Cost
  1    *  100    =   100
  0    *   95    =     0
  1    *  110    =   110
  0    *   10    =     0

Inspection Cost = 210 LOC

Pred.     Actual Values     #Bug
  1    *  1             =   1
  0    *  1             =   0
  1    *  1             =   1
  0    *  0             =   0

Effectiveness = 2 defects


Page 30: Multi-Objective Cross-Project Defect Prediction

Multi-objective Genetic Algorithm

Fitness function:

\[
\begin{aligned}
&\min \; \text{InspectionCost} = \sum_i \text{Pred}_i \cdot \text{Cost}_i \\
&\max \; \text{Effectiveness} = \sum_i \text{Pred}_i \cdot \text{Actual}_i
\end{aligned}
\]

where

\[
\text{Pred}_i = \frac{1}{1 + e^{-(a + b\,m_{i1} + c\,m_{i2} + \dots)}}
\]

Chromosome: (a, b, c, ...)

Multiple objectives are optimized using Pareto-efficient approaches.
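A sketch of how one chromosome (a, b, c, ...) could be scored against the two objectives; the data and names below are hypothetical, and the Pareto-based search itself (selection, crossover, mutation) is left out.

```python
import numpy as np

def evaluate_chromosome(coeffs, metrics, loc, actual):
    """Return the two fitness values for a chromosome (a, b, c, ...):
    inspection cost (to be minimized) and effectiveness (to be maximized)."""
    intercept, weights = coeffs[0], np.asarray(coeffs[1:])
    # Logistic model: Pred_i = 1 / (1 + exp(-(a + b*m_i1 + c*m_i2 + ...)))
    prob = 1.0 / (1.0 + np.exp(-(intercept + metrics @ weights)))
    pred = (prob > 0.5).astype(int)
    return pred @ loc, pred @ actual

# Hypothetical data: class metrics, class sizes, and actual defect labels.
metrics = np.array([[10.0, 2.5], [300.0, 7.1], [45.0, 1.0], [120.0, 4.2]])
loc     = np.array([100, 95, 110, 10])
actual  = np.array([1, 1, 1, 0])
print(evaluate_chromosome([0.1, 0.02, -0.3], metrics, loc, actual))
```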

Page 31: Multi-Objective Cross-Project Defect Prediction

Multi-objective Genetic Algorithm

Pareto Optimality: all solutions that are not dominated by any other solution form the Pareto optimal set.

Multiple optimal solutions (models) can be found.

[Plot: Effectiveness vs. Cost, showing the Pareto frontier of the candidate models.]

The frontier allows making a well-informed decision that balances the trade-offs between the two objectives.
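A minimal sketch of the dominance filtering behind this frontier, assuming each candidate model is summarized by an (inspection cost, effectiveness) pair; this is generic Pareto filtering, not the paper's exact implementation.

```python
def pareto_front(solutions):
    """Keep the non-dominated solutions: a dominates b if it costs no more,
    is at least as effective, and is strictly better in one objective."""
    def dominates(a, b):
        return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# (inspection cost in LOC, defects found) for some hypothetical models
models = [(300, 1), (10_100, 2), (500, 2), (250, 1), (800, 2)]
print(pareto_front(models))   # -> [(500, 2), (250, 1)]
```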

Page 32: Multi-Objective Cross-Project Defect Prediction

Empirical Evaluation

Page 33: Multi-Objective Cross-Project Defect Prediction

Research Questions

RQ1: How does the multi-objective (MO) prediction perform, compared to single-objective (SO) prediction?

Page 34: Multi-Objective Cross-Project Defect Prediction

Research Questions

RQ1: How does the multi-objective (MO) prediction perform, compared to single-objective (SO) prediction?

Cross-project MO vs. cross-project SO vs. within-project SO

Page 35: Multi-Objective Cross-Project Defect Prediction

Research Questions

RQ1: How does the multi-objective (MO) prediction perform, compared to single-objective (SO) prediction?

Cross-project MO vs. cross-project SO vs. within-project SO

RQ2: How does the proposed approach perform, compared to the local prediction approach by Menzies et al.?

Page 36: Multi-Objective Cross-Project Defect Prediction

Research Questions

RQ1: How does the multi-objective (MO) prediction perform, compared to single-objective (SO) prediction?

Cross-project MO vs. cross-project SO vs. within-project SO

RQ2: How does the proposed approach perform, compared to the local prediction approach by Menzies et al.?

Cross-project MO vs. Local Prediction

Page 37: Multi-Objective Cross-Project Defect Prediction

Experiment outline

•  10 Java projects from the PROMISE dataset
   ✓ different sizes
   ✓ different application contexts

Page 38: Multi-Objective Cross-Project Defect Prediction

Experiment outline

•  10 Java projects from the PROMISE dataset
   ✓ different sizes
   ✓ different application contexts

•  Cross-project defect prediction (RQ1):
   ✓ train the model on nine projects and test on the remaining one (10 times)

Page 39: Multi-Objective Cross-Project Defect Prediction

Experiment outline

•  10 Java projects from the PROMISE dataset
   ✓ different sizes
   ✓ different application contexts

•  Cross-project defect prediction (RQ1):
   ✓ train the model on nine projects and test on the remaining one (10 times)

•  Within-project defect prediction (RQ1):
   ✓ 10-fold cross-validation

Page 40: Multi-Objective Cross-Project Defect Prediction

Experiment outline

•  10 Java projects from the PROMISE dataset
   ✓ different sizes
   ✓ different application contexts

•  Cross-project defect prediction (RQ1):
   ✓ train the model on nine projects and test on the remaining one (10 times)

•  Within-project defect prediction (RQ1):
   ✓ 10-fold cross-validation

•  Local prediction (RQ2):
   ✓ K-means clustering algorithm
   ✓ Silhouette coefficient (see the sketch after this list)
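For the local-prediction baseline, a minimal sketch (scikit-learn assumed, hypothetical data) of choosing the number of K-means clusters with the silhouette coefficient; a separate prediction model would then be trained within each cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # hypothetical class-level metrics

best_k, best_score = None, -1.0
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # higher = better-separated clusters
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
# One local prediction model would then be built per cluster.
```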

Page 41: Multi-Objective Cross-Project Defect Prediction

Results

Page 42: Multi-Objective Cross-Project Defect Prediction

Results

[Plots: example results for Log4j and jEdit.]

Page 43: Multi-Objective Cross-Project Defect Prediction

Cross-project MO vs. Cross-project SO

[Bar chart: KLOC (0 to 300), comparing Cross-project SO and Cross-project MO.]

Page 44: Multi-Objective Cross-Project Defect Prediction

Cross-project MO vs. Cross-project SO

[Bar chart: KLOC (0 to 300), comparing Cross-project SO and Cross-project MO.]

The proposed multi-objective model outperforms the single-objective one.

Page 45: Multi-Objective Cross-Project Defect Prediction

Cross-project MO vs. Within-project SO

[Bar chart: KLOC (0 to 350), comparing Within-project SO and Cross-project MO.]

Page 46: Multi-Objective Cross-Project Defect Prediction

Cross-project MO vs. Within-project SO

[Bar chart: Precision (0 to 100), comparing Within-project SO and Cross-project MO.]

Page 47: Multi-Objective Cross-Project Defect Prediction

Cross-project MO vs. Within-project SO

[Bar chart: Precision (0 to 100), comparing Within-project SO and Cross-project MO.]

Cross-project prediction is worse than within-project prediction in terms of PRECISION.

Page 48: Multi-Objective Cross-Project Defect Prediction

Cross-project MO vs. Within-project SO

[Bar chart: Precision (0 to 100), comparing Within-project SO and Cross-project MO.]

Cross-project prediction is worse than within-project prediction in terms of PRECISION, but it is better than within-project predictors in terms of COST-EFFECTIVENESS.

Page 49: Multi-Objective Cross-Project Defect Prediction

Cross-project MO vs. Local Prediction

[Bar chart: KLOC (0 to 300), comparing Local Prediction and Cross-project MO.]

Page 50: Multi-Objective Cross-Project Defect Prediction

Cross-project MO vs. Local Prediction

[Bar chart: KLOC (0 to 300), comparing Local Prediction and Cross-project MO.]

The multi-objective predictor outperforms the local predictor.

Page 51: Multi-Objective Cross-Project Defect Prediction

Conclusions
