Software Metrics and Defect Prediction

Ayşe Başar Bener

Problem 1

How to tell if the project is on schedule and within budget? Earned-value charts.

Problem 2

How hard will it be for another organization to maintain this software? McCabe complexity.

Problem 3

How to tell when the subsystems are ready to be integrated? Defect density metrics.

Problem Definition

• Software development lifecycle: Requirements, Design, Development, Test (takes ~50% of overall time)
• Detect and correct defects before delivering software.
• Test strategies:
  - Expert judgment
  - Manual code reviews
  - Oracles/predictors as secondary tools

Testing

Defect Prediction

2-class classification problem:
• Non-defective if error = 0
• Defective if error > 0

Two things needed:
• Raw data: source code
• Software metrics -> static code attributes
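The two-class framing can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function name `label_module` and the row layout are assumptions, and the numbers are the `main()` and `sum()` rows of the sample table in this deck.

```python
# Hypothetical helper: turn a module's recorded error count into a class label.
def label_module(error_count):
    return "defective" if error_count > 0 else "non-defective"

# Static code attributes (LOC, LOCC, V, CC) and error counts for the two
# sample modules shown in this deck.
modules = {
    "main()": {"attributes": (16, 4, 5, 2), "errors": 2},
    "sum()": {"attributes": (5, 1, 3, 1), "errors": 0},
}

# Each training row pairs the attribute vector with the derived label.
dataset = [
    (name, info["attributes"], label_module(info["errors"]))
    for name, info in modules.items()
]

for row in dataset:
    print(row)
```

A classifier is then trained on the attribute vectors and asked to predict the label for unseen modules.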

Static Code Attributes

    void main()
    {
        // This is a sample code
        // Declare variables
        int a, b, c;

        // Initialize variables
        a = 2;
        b = 5;

        // Find the sum and display c if greater than zero
        c = sum(a, b);
        if (c < 0)
            printf("%d\n", a);
        return;
    }

    int sum(int a, int b)
    {
        // Returns the sum of two numbers
        return a + b;
    }

Callouts on main(): the condition should be c > 0, and the value printed should be c.

Module   LOC   LOCC   V   CC   Error
main()   16    4      5   2    2
sum()    5     1      3   1    0

LOC: lines of code
LOCC: lines of commented code
V: number of unique operands and operators
CC: cyclomatic complexity
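Three of the attributes in the table can be approximated with simple text processing. The sketch below is an illustration under stated assumptions, not the extraction tool used in the study: the blank-line layout of `MAIN_SRC` is chosen so that `main()` spans 16 lines, and cyclomatic complexity is approximated as one plus the number of decision tokens, which is a common shortcut rather than a full control-flow analysis. V is not computed here.

```python
import re

# Sample functions from the slide; the exact line layout is an assumption.
MAIN_SRC = """void main()
{
    // This is a sample code
    // Declare variables
    int a, b, c;

    // Initialize variables
    a = 2;
    b = 5;

    // Find the sum and display c if greater than zero
    c = sum(a, b);
    if (c < 0)
        printf("%d\\n", a);
    return;
}
"""

SUM_SRC = """int sum(int a, int b)
{
    // Returns the sum of two numbers
    return a + b;
}
"""

# Tokens that open an extra path through the code.
DECISION = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")


def loc(src):
    """Count all lines, including blank and comment lines."""
    return len(src.splitlines())


def locc(src):
    """Count lines that consist of a // comment."""
    return sum(1 for line in src.splitlines() if line.strip().startswith("//"))


def cyclomatic_complexity(src):
    """Approximate CC as 1 + number of decision tokens outside comments."""
    code = re.sub(r"//.*", "", src)
    return 1 + len(DECISION.findall(code))


for name, src in (("main()", MAIN_SRC), ("sum()", SUM_SRC)):
    print(name, loc(src), locc(src), cyclomatic_complexity(src))
```

With this layout the LOC, LOCC, and CC values agree with the table rows for both modules.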

Research on Defect Prediction

• Defect prediction using machine learning techniques
• How effectively can we estimate defect density?
  - Regression models
  - First classification, then regression
• Defect prediction in multi-version software
• Defect prediction in embedded software

B. Turhan and A. Bener, "A Multivariate Analysis of Static Code Attributes for Defect Prediction", QSIC 2007, Portland, USA, October 11-12, 2007.

A.D. Oral and A. Bener, "Defect Prediction for Embedded Software", ISCIS 2007, Ankara, Turkey, November 9-11, 2007.

E. Ceylan, O. Kutlubay and A. Bener, "Software Defect Identification Using Machine Learning Techniques", EUROMICRO SEAA, Dubrovnik, Croatia, August 28 - September 1, 2006.

B. Turhan and O. Kutlubay, "Mining Software Data", Data Mining and Business Intelligence Workshop in ICDE'07, İstanbul, April 2007.

O. Kutlubay, B. Turhan and A. Bener, "A Two-Step Model for Defect Density Estimation", EUROMICRO SEAA, Lübeck, Germany, August 2007.

Y. Kastro and A. Bener, "A Defect Prediction Method for Software Versioning", Software Quality Journal (in print).

O. Kutlubay, B. Turhan and A. Bener, "Software Defect Density Estimation Using Static Code Attributes: A Two Step Model", Eng. App. of AI (under review).

Constructing Predictors

• Baseline: Naive Bayes. Why? Best reported results so far (Menzies et al., 2007).
• Remove assumptions and construct different models:
  - Independent attributes -> multivariate distribution
  - Attributes of equal importance

B. Turhan and A. Bener, "Software Defect Prediction: Heuristics for Weighted Naïve Bayes", ICSOFT 2007, Barcelona, Spain, July 2007.

B. Turhan, "Software Defect Prediction Modeling", IDOESE 2007, Madrid, Spain, September 2007.

"Yazılım Hata Kestirimi için Kaynak Kod Ölçütlerine Dayalı Bayes Sınıflandırması" [Bayesian Classification Based on Source Code Metrics for Software Defect Prediction], UYMS 2007, Ankara, September 2007.

B. Turhan and A. Bener, "A Multivariate Analysis of Static Code Attributes for Defect Prediction", QSIC 2007, Portland, USA, October 2007.

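Weighted Naive Bayes

Naive Bayes:

$$g_i(x) = -\frac{1}{2} \sum_{j=1}^{d} \left( \frac{x_j^t - m_{ij}}{s_j} \right)^2 + \log\left(P(C_i)\right)$$

Weighted Naive Bayes:

$$g_i(x) = -\frac{1}{2} \sum_{j=1}^{d} w_j \left( \frac{x_j^t - m_{ij}}{s_j} \right)^2 + \log\left(P(C_i)\right)$$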
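The discriminant on this slide scores class i by a weighted sum of squared, standardized distances between a module's attributes and the class means, plus the log prior. The sketch below is one way to compute it, under assumptions that are mine rather than the slide's: m_ij is read as the mean of attribute j in class i, s_j as a standard deviation of attribute j shared across classes (estimated here over all samples), and the function names and toy data are hypothetical. Setting every w_j to 1 gives the unweighted Naive Bayes form.

```python
import math


def fit(X, y):
    """Estimate class means, shared per-attribute std, and class priors."""
    d = len(X[0])
    means, priors = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        priors[c] = len(rows) / len(X)
    stds = []
    for j in range(d):
        col = [x[j] for x in X]
        mu = sum(col) / len(col)
        var = sum((v - mu) ** 2 for v in col) / len(col)
        stds.append(math.sqrt(var) if var > 0 else 1.0)  # guard constant columns
    return means, stds, priors


def discriminant(x, c, model, w=None):
    """g_c(x) = -1/2 * sum_j w_j * ((x_j - m_cj) / s_j)^2 + log P(C_c)."""
    means, stds, priors = model
    if w is None:
        w = [1.0] * len(x)  # plain Naive Bayes: all attributes weigh the same
    dist = sum(
        w[j] * ((x[j] - means[c][j]) / stds[j]) ** 2 for j in range(len(x))
    )
    return -0.5 * dist + math.log(priors[c])


def predict(x, model, w=None):
    """Pick the class with the largest discriminant value."""
    return max(model[0], key=lambda c: discriminant(x, c, model, w))


# Toy data (made up): two attributes per module, label 1 = defective.
X = [[10, 1], [12, 1], [40, 5], [50, 6]]
y = [0, 0, 1, 1]
model = fit(X, y)

print(predict([12, 6], model))                 # equal weights
print(predict([12, 6], model, w=[1.0, 0.0]))   # second attribute ignored
```

On this toy data the two calls disagree, which shows how the weights can change a prediction: with equal weights the second attribute pulls the module toward class 1, and with its weight set to zero the first attribute alone decides.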

Datasets

Name   # Features   # Modules   Defect Rate (%)
CM1    38           505         9
PC1    38           1107        6
PC2    38           5589        0.6
PC3    38           1563        10
PC4    38           1458        12
KC3    38           458         9
KC4    38           125         40
MW1    38           403         9

Performance Measures

Confusion matrix (rows: predicted, columns: actual):

                     Actual defects
                     no     yes
Predicted   no       A      B
            yes      C      D

Accuracy: (A+D)/(A+B+C+D)

Pd (Hit Rate): D / (B+D)

Pf (False Alarm Rate): C / (A+C)
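The three measures follow directly from the four confusion-matrix cells. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def performance(a, b, c, d):
    """Compute accuracy, pd, and pf from confusion-matrix cells.

    a: predicted no,  actual no
    b: predicted no,  actual yes (missed defects)
    c: predicted yes, actual no  (false alarms)
    d: predicted yes, actual yes (detected defects)
    """
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "pd": d / (b + d),
        "pf": c / (a + c),
    }


# Made-up counts for 100 modules, 10 of them defective.
measures = performance(a=80, b=5, c=10, d=5)
print(measures)
```

In this example accuracy is 0.85 even though only half of the defective modules are detected (pd = 0.5), which is why accuracy alone is a weak measure on datasets with low defect rates.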

Results: InfoGain & GainRatio

Data   WNB+IG (%)       WNB+GR (%)       IG+NB (%)
       pd   pf   bal    pd   pf   bal    pd   pf   bal
CM1    82   39   70     82   39   70     83   32   74
PC1    69   35   67     69   35   67     40   12   57
PC2    72   15   77     66   20   72     72   15   77
PC3    80   35   71     81   35   72     60   15   70
PC4    88   27   79     87   24   81     92   29   78
KC3    80   27   76     83   30   76     48   15   62
KC4    77   35   70     78   35   71     79   33   72
MW1    70   38   66     68   34   67     44    7   60
Avg    77   31   72     77   32   72     65   20   61
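The deck does not define the bal column. A plausible reading, assumed here, is the balance measure of Menzies et al. (2007): one minus the normalized Euclidean distance from the ideal point (pf = 0, pd = 1). The sketch below computes it under that assumption.

```python
import math


def balance(pd, pf):
    """Assumed definition: 1 - distance to (pf=0, pd=1), scaled by sqrt(2)."""
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)


# WNB+IG rows for CM1 and PC4, with pd and pf expressed as fractions.
print(round(100 * balance(0.82, 0.39)))
print(round(100 * balance(0.88, 0.27)))
```

Under this definition the CM1 and PC4 values for WNB+IG come out as 70 and 79, matching the table. Other rows can differ by a point because the table's pd and pf are already rounded.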

Results: Weight Assignments

ICSOFT’07
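The WNB+IG label suggests that attribute weights are derived from information gain. The deck does not say how the attributes are discretized or how the gains are turned into weights, so the sketch below makes both choices itself: a single binary split at a given threshold, and weights scaled by the largest gain. All names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def info_gain(values, labels, threshold):
    """Entropy reduction from splitting one attribute at a threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (left, right) if part
    )
    return entropy(labels) - remainder


def weights_from_gains(gains):
    """Assumed heuristic: scale gains so the best attribute has weight 1."""
    top = max(gains)
    return [g / top if top > 0 else 1.0 for g in gains]


# Made-up data: the first attribute separates the classes, the second does not.
labels = [0, 0, 1, 1]
attr_a = [10, 12, 40, 50]
attr_b = [1, 5, 1, 6]

gains = [info_gain(attr_a, labels, 26), info_gain(attr_b, labels, 3)]
weights = weights_from_gains(gains)
print(gains, weights)
```

Here the first attribute receives the full weight and the second a weight of zero, so a weighted discriminant would ignore the uninformative attribute.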

WC (within-company) vs CC (cross-company) Data?

• When to use WC or CC data?

• How much data do we need to construct a model?

ICSOFT’07

Thank You

http://softlab.boun.edu.tr
