Introduction to Defect Prediction Cmpe 589 Spring 2008



Page 1: Introduction to Defect Prediction

Introduction to Defect Prediction

Cmpe 589, Spring 2008

Page 2: Introduction to Defect Prediction

Problem 1

How to tell if the project is on schedule and within budget? Earned-value charts.

Page 3: Introduction to Defect Prediction

Problem 2

How hard will it be for another organization to maintain this software? McCabe complexity.

Page 4: Introduction to Defect Prediction

Problem 3

How to tell when the subsystems are ready to be integrated? Defect density metrics.

Page 5: Introduction to Defect Prediction

Problem Definition

Software development lifecycle:
  • Requirements
  • Design
  • Development
  • Test (takes ~50% of overall time)

Detect and correct defects before delivering software.

Test strategies:
  • Expert judgment
  • Manual code reviews
  • Oracles/predictors as secondary tools

Page 6: Introduction to Defect Prediction

Problem Definition

Page 7: Introduction to Defect Prediction

Testing

Page 8: Introduction to Defect Prediction

Defect Prediction

2-class classification problem:
  • Non-defective if error = 0
  • Defective if error > 0

Two things needed:
  • Raw data: source code
  • Software metrics -> static code attributes

Page 9: Introduction to Defect Prediction
Page 10: Introduction to Defect Prediction

Static Code Attributes

    void main()
    {
        //This is a sample code
        //Declare variables
        int a, b, c;

        // Initialize variables
        a=2;
        b=5;

        //Find the sum and display c if greater than zero
        c=sum(a,b);
        if (c < 0)
            printf("%d\n", a);
        return;
    }

    int sum(int a, int b)
    {
        // Returns the sum of two numbers
        return a+b;
    }

Callouts on the slide mark the two defects in main(): the condition should be c > 0, and the value printed should be c.

Module    LOC   LOCC   V   CC   Error
main()    16    4      5   2    2
sum()     5     1      3   1    0

LOC: Lines of code
LOCC: Lines of commented code
V: Number of unique operands and operators
CC: Cyclomatic complexity
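The table above can be partly reproduced with a few lines of Python. This is only a rough sketch under stated assumptions: LOC is taken as every physical line (blank lines included), LOCC as every line carrying a `//` comment, and CC as the number of decision keywords plus one. The line layout of the sample, the function name `static_attributes` and the keyword list are illustrative choices, not something the slides specify, and V is not computed.

```python
import re

# The sample from the slide, laid out one statement per line (layout is an assumption).
MAIN_SRC = r'''void main()
{
    //This is a sample code
    //Declare variables
    int a, b, c;

    // Initialize variables
    a=2;
    b=5;

    //Find the sum and display c if greater than zero
    c=sum(a,b);
    if (c < 0)
        printf("%d\n", a);
    return;
}'''

SUM_SRC = r'''int sum(int a, int b)
{
    // Returns the sum of two numbers
    return a+b;
}'''

# Decision points counted towards cyclomatic complexity (simplified list).
DECISION = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\|")


def static_attributes(source):
    """Return LOC, LOCC and an approximate CC for one C function."""
    lines = source.splitlines()
    loc = len(lines)                                  # physical lines, blanks included
    locc = sum(1 for line in lines if "//" in line)   # lines with a // comment
    # Drop comments first so words such as "if" inside them are not counted.
    code_only = "\n".join(line.split("//", 1)[0] for line in lines)
    cc = len(DECISION.findall(code_only)) + 1
    return {"LOC": loc, "LOCC": locc, "CC": cc}


print("main()", static_attributes(MAIN_SRC))
print("sum() ", static_attributes(SUM_SRC))
```

With this layout the LOC, LOCC and CC values agree with the table. A real metrics tool would use a proper C parser; the string splitting here would miscount a `//` that appears inside a string literal.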

Page 11: Introduction to Defect Prediction
Page 12: Introduction to Defect Prediction
Page 13: Introduction to Defect Prediction


Page 14: Introduction to Defect Prediction

Defect Prediction

  • Machine learning based models
  • Defect density estimation
  • Regression models: error proneness
  • First classification, then regression
  • Defect prediction between versions
  • Defect prediction for embedded systems

Page 15: Introduction to Defect Prediction

Constructing Predictors

  • Baseline: Naive Bayes.
    Why? Best reported results so far (Menzies et al., 2007).
  • Remove assumptions and construct different models:
      • Independent attributes -> multivariate distribution
      • Attributes of equal importance

Page 16: Introduction to Defect Prediction

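Weighted Naive Bayes

Naive Bayes:

$$g_i(x) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log(P(C_i))$$

Weighted Naive Bayes:

$$g_i(x) = -\frac{1}{2}\sum_{j=1}^{d} w_j\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log(P(C_i))$$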
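A minimal sketch of the two discriminants on this slide, assuming the usual Gaussian Naive Bayes reading: for class C_i, sum over the d attributes the squared, standardised distance between the attribute value and the class mean m_ij (scaled by the standard deviation s_j), halve and negate it, then add the log prior. The weighted variant multiplies each term by w_j. The function names and all numbers below are hypothetical and chosen only for illustration.

```python
import math


def nb_discriminant(x, means, stds, prior, weights=None):
    """g_i(x) = -1/2 * sum_j w_j * ((x_j - m_ij) / s_j)^2 + log P(C_i).

    With weights=None every w_j is 1, which gives plain Naive Bayes.
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(
        w * ((xj - m) / s) ** 2
        for xj, m, s, w in zip(x, means, stds, weights)
    )
    return -0.5 * total + math.log(prior)


def classify(x, class_means, stds, priors, weights=None):
    """Pick the class with the largest discriminant value."""
    scores = {
        label: nb_discriminant(x, class_means[label], stds, priors[label], weights)
        for label in class_means
    }
    return max(scores, key=scores.get)


# Hypothetical two-attribute example (for instance LOC and CC of a module).
class_means = {"non-defective": [10.0, 1.0], "defective": [40.0, 5.0]}
stds = [10.0, 2.0]
priors = {"non-defective": 0.9, "defective": 0.1}
module = [35.0, 4.0]

print(classify(module, class_means, stds, priors))                      # plain Naive Bayes
print(classify(module, class_means, stds, priors, weights=[0.8, 0.2]))  # weighted variant
```

Setting every weight to zero removes the attribute evidence entirely, so the decision falls back to the class priors; that is a quick sanity check on any weighting scheme.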

Page 17: Introduction to Defect Prediction

Datasets

Name   # Features   # Modules   Defect Rate (%)
CM1    38           505         9
PC1    38           1107        6
PC2    38           5589        0.6
PC3    38           1563        10
PC4    38           1458        12
KC3    38           458         9
KC4    38           125         40
MW1    38           403         9

Page 18: Introduction to Defect Prediction

Performance Measures

                   Actual defects
                   no     yes
Predicted   no     A      B
            yes    C      D

Accuracy: (A+D)/(A+B+C+D)

Pd (Hit Rate): D / (B+D)

Pf (False Alarm Rate): C / (A+C)
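The three measures above translate directly into code. The sketch below also computes balance ("bal"), which the result tables report but the slides do not define; it assumes the definition used by Menzies et al. (2007), namely one minus the normalised distance from the ideal point pd = 1, pf = 0. The confusion-matrix counts in the example are made up.

```python
import math


def balance(pd, pf):
    """Assumed definition: 1 - distance to (pf=0, pd=1), normalised by sqrt(2)."""
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)


def performance(a, b, c, d):
    """a, b, c, d follow the confusion matrix: rows are predictions, columns are actuals."""
    pd = d / (b + d)   # hit rate: detected defective / all defective
    pf = c / (a + c)   # false alarm rate: false alarms / all non-defective
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "pd": pd,
        "pf": pf,
        "bal": balance(pd, pf),
    }


# Hypothetical counts: 100 non-defective and 25 defective modules.
print(performance(a=75, b=5, c=25, d=20))
```

As a cross-check against the deck, pd = 82% and pf = 39% give a balance of about 70% under this definition, which matches the CM1 row for WNB+IG.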

Page 19: Introduction to Defect Prediction
Page 20: Introduction to Defect Prediction

Results: InfoGain & GainRatio

Data   WNB+IG (%)       WNB+GR (%)       IG+NB (%)
       pd   pf   bal    pd   pf   bal    pd   pf   bal
CM1    82   39   70     82   39   70     83   32   74
PC1    69   35   67     69   35   67     40   12   57
PC2    72   15   77     66   20   72     72   15   77
PC3    80   35   71     81   35   72     60   15   70
PC4    88   27   79     87   24   81     92   29   78
KC3    80   27   76     83   30   76     48   15   62
KC4    77   35   70     78   35   71     79   33   72
MW1    70   38   66     68   34   67     44   07   60
Avg:   77   31   72     77   32   72     65   20   61
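The slides name InfoGain (IG) and GainRatio (GR) as the sources of the attribute weights but do not show how they are computed. The following is a sketch of the textbook definitions for a discretised attribute, plus one plausible way to turn the scores into weights (dividing by their sum). That normalisation, the function names and the toy data are assumptions, not the deck's stated procedure.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((k / n) * math.log2(k / n) for k in Counter(items).values())


def info_gain(values, labels):
    """Reduction in label entropy after splitting on the attribute values."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


def normalised_weights(scores):
    """Assumed weighting: scale the scores so that they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else [0.0] * len(scores)


# Toy data: 1 = defective, 0 = non-defective.
labels = [0, 0, 1, 1]
size_bin = ["lo", "lo", "hi", "hi"]      # separates the classes perfectly
comment_bin = ["lo", "hi", "lo", "hi"]   # carries no information

gains = [info_gain(size_bin, labels), info_gain(comment_bin, labels)]
print(gains, normalised_weights(gains))
```

Weights produced this way could be passed as the w_j of the weighted discriminant; an attribute with zero gain then contributes nothing to the decision.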

Page 21: Introduction to Defect Prediction

Results: Weight Assignments

Page 22: Introduction to Defect Prediction

Benefiting from defect data in practice

Within-company vs. cross-company data
  • Investigated in cost estimation literature
  • No studies in defect prediction!
  • No conclusions in cost estimation…
  • Straightforward interpretation of results in defect prediction.
  • Possible reason: well-defined features.

Page 23: Introduction to Defect Prediction

How much data do we need?

Consider:
  • Dataset size: 1000
  • Defect rate: 8%
  • Training instances: 90%

1000 × 8% × 90% = 72 defective training instances

(900 − 72) = 828 non-defective training instances
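The arithmetic on this slide can be checked with a small helper. This sketch assumes the split is stratified, so the training portion keeps the overall defect rate, and it counts the non-defective instances within the 900-instance training split. The function name is hypothetical.

```python
def training_split(dataset_size, defect_rate, train_fraction):
    """Expected class counts in the training portion of a stratified split."""
    train_size = round(dataset_size * train_fraction)
    defective = round(dataset_size * defect_rate * train_fraction)
    return {
        "train": train_size,
        "defective": defective,
        "non_defective": train_size - defective,
    }


print(training_split(dataset_size=1000, defect_rate=0.08, train_fraction=0.90))
```

The point of the slide survives the check: even a dataset of 1000 modules leaves only a few dozen defective examples to learn from.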

Page 24: Introduction to Defect Prediction

Intelligent data sampling

With a random sample of 100 instances we can learn as well as with thousands.

Can we increase the performance with wiser sampling strategies? Which data?

Practical aspects: Industrial case study.

Page 25: Introduction to Defect Prediction

WC vs CC Data?

• When to use WC or CC?

• How much data do we need to construct a model?

Page 26: Introduction to Defect Prediction
Page 27: Introduction to Defect Prediction
Page 28: Introduction to Defect Prediction

Module Structure vs Defect Rate

  • Fan-in, fan-out
  • PageRank algorithm
  • Call graph information on the code
  • "Small is beautiful"
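To make the call-graph measures concrete, here is a rough sketch that derives fan-in, fan-out and a PageRank score from a call graph stored as a dictionary. The slides do not say how these values feed into the predictor, so this only illustrates the measures themselves. The graph is the two-function sample from the static code attributes slide (library calls such as printf left out), and the damping factor of 0.85 is the conventional default, assumed here.

```python
def fan_in_out(call_graph):
    """call_graph maps each module to the list of modules it calls."""
    fan_out = {m: len(set(callees)) for m, callees in call_graph.items()}
    fan_in = {m: 0 for m in call_graph}
    for callees in call_graph.values():
        for callee in set(callees):
            fan_in[callee] += 1
    return fan_in, fan_out


def page_rank(call_graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; modules that call nothing spread their rank evenly."""
    nodes = list(call_graph)
    n = len(nodes)
    rank = {m: 1.0 / n for m in nodes}
    for _ in range(iterations):
        dangling = sum(rank[m] for m in nodes if not call_graph[m])
        new_rank = {m: (1 - damping) / n + damping * dangling / n for m in nodes}
        for caller in nodes:
            callees = call_graph[caller]
            for callee in callees:
                new_rank[callee] += damping * rank[caller] / len(callees)
        rank = new_rank
    return rank


# main() calls sum(); sum() calls nothing.
call_graph = {"main": ["sum"], "sum": []}

fan_in, fan_out = fan_in_out(call_graph)
ranks = page_rank(call_graph)
print(fan_in, fan_out, ranks)
```

Every callee must also appear as a key of the dictionary. In this tiny graph sum() ends up with the higher rank because it is the only module that is called.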

Page 29: Introduction to Defect Prediction

Performance vs. Granularity

[Figure: Performance (y-axis, 0 to 120) plotted against Granularity (x-axis: Statement, Method, Class, File, Component, Project)]