Analysing Defect Inflow Distribution of Automotive & Large Software Projects

Rakesh Rana

University of Gothenburg, Sweden

[email protected]

Defect inflow distribution

Tracking and predicting software quality is a challenge.

Software defects are an observable and useful indicator for tracking and forecasting software reliability.

Software reliability measures are primarily used for [1]:

• Planning and controlling the allocation of testing resources, and
• Evaluating maturity or release readiness.

[1] C.-Y. Huang, M. R. Lyu, and S.-Y. Kuo, “A unified scheme of some nonhomogenous Poisson process models for software reliability estimation,” IEEE Trans. Softw. Eng., vol. 29, no. 3, pp. 261–269, 2003.

SRGMs: Software Reliability Growth Models

Defect inflow distribution

According to Okamura, Dohi and Osaki [1], understanding the underlying family of the defect distribution is important:

“When the number of total software faults is given by a Poisson random variable, the mean value function of NHPP-based SRGMs is dominated by only failure time distribution. That is, the essential problem can be reduced to what kind of probability distribution is suitable for representing the failure time distribution.”

[1] H. Okamura, T. Dohi, and S. Osaki, “Software reliability growth model with normal distribution and its parameter estimation,” in Quality, Reliability, Risk, Maintenance, and Safety Engineering (ICQR2MSE), 2011 International Conference on, 2011, pp. 411–416.
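The quoted statement can be made concrete with the usual NHPP formulation. The symbols $\omega$, $F$ and $f$ below are notation assumed here for illustration, not taken from the slides: if the total number of faults is Poisson with mean $\omega$ and each fault is detected at a time with distribution function $F(t)$ and density $f(t)$, then the number of faults $N(t)$ detected by time $t$ has mean value function and intensity

```latex
m(t) = \mathbb{E}[N(t)] = \omega \, F(t),
\qquad
\lambda(t) = \frac{d\,m(t)}{dt} = \omega \, f(t)
```

For instance, an exponential $F(t) = 1 - e^{-bt}$ gives $m(t) = \omega\,(1 - e^{-bt})$, the Goel-Okumoto model, so choosing the SRGM amounts to choosing $F$.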

Why?

Finding the distribution that best fits the observed defect inflow data helps with:

1. Understanding the underlying process of defect discovery
2. Choosing the correct statistical analysis
3. Visualization and simulations
4. Selecting an appropriate model for modelling/forecasting reliability growth
5. Bayesian analysis, to model the prior probability

Research objectives

Explore which statistical distribution best fits the defect inflow from large software projects, and

Explore how different information criteria differ in their selection of the best-fitting distribution.

Research methodology

Context: Large Software Projects

Case: Defect Inflow Distribution (best fit)

Unit 1: VCC

Four large automotive software projects

Unit 2: Ericsson

Five consecutive releases of a large telecom product

Unit 3: OSS

Five large open source software projects

Unit of analysis    Application domain      Software development process for studied projects
Volvo Cars Group    Automotive              V-shaped software development, mostly using sub-suppliers for implementation
Ericsson            Telecom                 Agile development, mostly in-house
OSS                 Open-Source Projects    Open source software development, projects from Apache and Mozilla

Projects used in study

Case Unit    Project/Release        Time Period            Total number of Defects*/Issues
VCG          Project-A1             NA                     6.7X
VCG          Project-A2             NA                     14.4X
VCG          Project-A3             NA                     2.0X
VCG          Project-A4             NA                     X
Ericsson     Release-B1             NA                     2.2Y
Ericsson     Release-B2             NA                     Y
Ericsson     Release-B3             NA                     1.3Y
Ericsson     Release-B4             NA                     1.2Y
Ericsson     Release-B5             NA                     1.6Y
OSS          Project-HTTPClient     Nov-2001 – Apr-2012    305
OSS          Project-Jackrabbit     Sep-2004 – Apr-2012    938
OSS          Project-Lucene-Java    Mar-2004 – Mar-2012    697
OSS          Project-Rhino          Nov-1999 – Feb-2012    302
OSS          Project-Tomcat5        May-2002 – Dec-2011    670

Overview of distributions

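No  Distribution  Notation  Parameters  Probability Density Function

1  Exponential  $\mathrm{Exp}(\lambda)$  $\lambda > 0$  $f(x) = \lambda e^{-\lambda x}$

2  Weibull  $\mathrm{Weibull}(\lambda, k)$  $\lambda > 0;\ k > 0$  $f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}$ for $x \ge 0$, and $f(x) = 0$ for $x < 0$

3  Beta  $\mathrm{Beta}(\alpha, \beta)$  $\alpha > 0;\ \beta > 0$  $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$, where $B(\alpha, \beta) = \int_{0}^{1} u^{\alpha-1}(1-u)^{\beta-1}\,du$

4  Gamma  $\mathrm{Gamma}(k, \theta)$  $k > 0;\ \theta > 0$  $f(x) = \frac{1}{\Gamma(k)\,\theta^{k}}\, x^{k-1} e^{-x/\theta}$, where $\Gamma(k) = \int_{0}^{\infty} t^{k-1} e^{-t}\,dt$

5  Logistic  $\mathrm{Logistic}(\mu, s)$  $\mu$ (real); $s > 0$  $f(x) = \frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^{2}}$

6  Normal  $\mathcal{N}(\mu, \sigma^{2})$  $\mu$ (real); $\sigma^{2} > 0$  $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$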
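As a minimal sketch of how a distribution from this overview can be fitted to inflow data, the snippet below uses the closed-form maximum-likelihood estimates for the exponential and normal cases and evaluates the resulting log-likelihood. The data and the names `fit_exponential`, `fit_normal` and `inflow` are hypothetical and not taken from the study. The other four distributions generally need numerical optimisation, which is omitted here.

```python
import math


def fit_exponential(xs):
    """Closed-form MLE for Exp(rate): rate = 1 / sample mean."""
    rate = len(xs) / sum(xs)
    loglik = sum(math.log(rate) - rate * x for x in xs)
    return {"rate": rate}, loglik


def fit_normal(xs):
    """Closed-form MLE for N(mu, sigma^2), using the 1/n variance estimator."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    loglik = sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var) for x in xs
    )
    return {"mu": mu, "sigma2": var}, loglik


# Hypothetical data: share of all defects reported in each period (sums to 1).
inflow = [0.02, 0.05, 0.09, 0.14, 0.18, 0.17, 0.13, 0.10, 0.07, 0.05]

exp_params, exp_ll = fit_exponential(inflow)
norm_params, norm_ll = fit_normal(inflow)
print("Exponential:", exp_params, round(exp_ll, 2))
print("Normal:     ", norm_params, round(norm_ll, 2))
```

Log-likelihoods can be positive when the data are scaled to a small range such as [0, 1], because the fitted densities can then exceed 1.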

Overview of information criteria

No  Short  Long Name  Definition

1  LogLik  Log likelihood  Logarithm of the probability of the observed outcomes given a set of parameter values

2  ML  Maximum Likelihood  $ML = -2 \cdot loglik$

3  AIC  Akaike Information Criterion  $AIC = -2 \cdot loglik + 2k$

4  AICc  Akaike Information Criterion (correction)  $AICc = -2 \cdot loglik + \frac{2kn}{n - k - 1}$

5  BIC  Bayesian Information Criterion  $BIC = -2 \cdot loglik + k \cdot \log(n)$

6  HQC  Hannan–Quinn Information Criterion  $HQC = -2 \cdot loglik + 2k \cdot \log(\log(n))$

Where $k$ = number of parameters and $n$ = number of observations.
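The definitions in this section translate directly into code. The sketch below is illustrative only: the function name `information_criteria` is hypothetical, the log-likelihood 36.25 and k = 2 are the Weibull fit for Project-Jack reported in these slides, and n = 92 is an assumption (the number of months from Sep-2004 to Apr-2012), since the slides do not state the number of observations.

```python
import math


def information_criteria(loglik, k, n):
    """Compute the criteria defined in this section for one fitted model.

    loglik: maximised log-likelihood
    k:      number of estimated parameters
    n:      number of observations
    """
    ml = -2.0 * loglik
    return {
        "LogLik": loglik,
        "ML": ml,
        "AIC": ml + 2 * k,
        "AICc": ml + (2 * k * n) / (n - k - 1),
        "BIC": ml + k * math.log(n),
        "HQC": ml + 2 * k * math.log(math.log(n)),
    }


# Weibull fit for Project-Jack; n = 92 is an assumed value, not from the slides.
crit = information_criteria(loglik=36.25, k=2, n=92)
for name, value in crit.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption the results come out close to the values reported for the Weibull fit of Project-Jack; small differences are expected because the input log-likelihood is itself rounded. For every criterion except LogLik, a smaller value indicates a preferred model.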

Defect Inflow Profiles

Probability density plots

Quantile–quantile plots (QQ-plots)
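A QQ-plot pairs each sorted observation with the quantile that a fitted distribution predicts at the same probability; points near the diagonal suggest a good fit. The sketch below computes such pairs for a fitted normal distribution using only the Python standard library. The data, the function name `normal_qq_points` and the plotting positions (i - 0.5)/n are illustrative choices, not taken from the study.

```python
from statistics import NormalDist, mean, pstdev


def normal_qq_points(xs):
    """Return (theoretical quantile, observed value) pairs for a fitted normal."""
    n = len(xs)
    fitted = NormalDist(mu=mean(xs), sigma=pstdev(xs))
    ordered = sorted(xs)
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(fitted.inv_cdf(p), x) for p, x in zip(probs, ordered)]


# Hypothetical data: share of all defects reported in each period.
inflow = [0.02, 0.05, 0.09, 0.14, 0.18, 0.17, 0.13, 0.10, 0.07, 0.05]

points = normal_qq_points(inflow)
for theoretical, observed in points:
    print(f"{theoretical:7.3f}  {observed:5.3f}")
```

The same construction works for any of the six distributions, provided its inverse cumulative distribution function is available.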

Different information criteria: Project-Jack

Project  Distribution           LogLik  ML      AIC     AICc    BIC     HQC
Jack     Exponential            7.29    -14.57  -12.57  -12.53  -10.05  -11.56
Jack     Weibull                36.25   -72.50  -68.50  -68.36  -63.45  -66.46
Jack     Beta                   36.72   -73.44  -69.44  -69.31  -64.40  -67.41
Jack     Gamma                  36.05   -72.10  -68.10  -67.96  -63.06  -66.06
Jack     Logistic               31.43   -62.86  -58.86  -58.72  -53.81  -56.82
Jack     Normal                 30.79   -61.58  -57.58  -57.44  -52.53  -55.54
         Selected Criteria      36.72   -73.44  -69.44  -69.31  -64.40  -67.41
         Selected Distribution  Beta    Beta    Beta    Beta    Beta    Beta

Log-Likelihood values for selected distribution

Project Exponential Weibull Beta Gamma Logistic Normal

A1 59.0 63.6 105.9 66.0 8.4 5.2

A2 5.5 7.0 19.8 5.5 -6.8 -3.1

A3 56.9 82.9 104.6 98.6 11.6 10.9

A4 119.8 188.3 491.2 199.5 50.3 35.3

Release Exponential Weibull Beta Gamma Logistic Normal

B1 88.1 92.6 167.1 97.9 48.8 39.6

B2 -4.3 4.6 24.8 6.0 2.7 2.0

B3 -7.8 2.8 5.0 2.9 -0.7 0.3

B4 38.8 38.9 86.7 40.4 30.2 24.7

B5 -8.4 5.4 11.9 5.3 1.7 2.2

Project Exponential Weibull Beta Gamma Logistic Normal

Http 195.4 202.1 406.2 211.9 174.6 169.0

Jack 7.3 36.2 36.7 36.0 31.4 30.8

Lucene 62.0 67.6 70.7 71.9 59.4 58.8

Rhino 52.9 63.7 59.4 77.3 31.4 28.1

TomCat 58.5 73.0 76.9 71.9 12.3 10.7
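The tables above can be summarised programmatically. The sketch below picks, for each project or release, the distribution with the highest log-likelihood and, for comparison, the one with the lowest AIC. The log-likelihoods are copied from the tables; the parameter counts (1 for the exponential, 2 for the others) follow the overview of distributions. The names `LOGLIK`, `best_by_loglik` and `best_by_aic` are hypothetical.

```python
from collections import Counter

DISTRIBUTIONS = ["Exponential", "Weibull", "Beta", "Gamma", "Logistic", "Normal"]
N_PARAMS = {"Exponential": 1, "Weibull": 2, "Beta": 2, "Gamma": 2,
            "Logistic": 2, "Normal": 2}

# Log-likelihood per project/release, in the column order of DISTRIBUTIONS.
LOGLIK = {
    "A1": [59.0, 63.6, 105.9, 66.0, 8.4, 5.2],
    "A2": [5.5, 7.0, 19.8, 5.5, -6.8, -3.1],
    "A3": [56.9, 82.9, 104.6, 98.6, 11.6, 10.9],
    "A4": [119.8, 188.3, 491.2, 199.5, 50.3, 35.3],
    "B1": [88.1, 92.6, 167.1, 97.9, 48.8, 39.6],
    "B2": [-4.3, 4.6, 24.8, 6.0, 2.7, 2.0],
    "B3": [-7.8, 2.8, 5.0, 2.9, -0.7, 0.3],
    "B4": [38.8, 38.9, 86.7, 40.4, 30.2, 24.7],
    "B5": [-8.4, 5.4, 11.9, 5.3, 1.7, 2.2],
    "Http": [195.4, 202.1, 406.2, 211.9, 174.6, 169.0],
    "Jack": [7.3, 36.2, 36.7, 36.0, 31.4, 30.8],
    "Lucene": [62.0, 67.6, 70.7, 71.9, 59.4, 58.8],
    "Rhino": [52.9, 63.7, 59.4, 77.3, 31.4, 28.1],
    "TomCat": [58.5, 73.0, 76.9, 71.9, 12.3, 10.7],
}


def best_by_loglik(values):
    """Distribution with the largest log-likelihood."""
    by_name = dict(zip(DISTRIBUTIONS, values))
    return max(by_name, key=by_name.get)


def best_by_aic(values):
    """Distribution with the smallest AIC = -2 * loglik + 2 * k."""
    aic = {d: -2 * ll + 2 * N_PARAMS[d] for d, ll in zip(DISTRIBUTIONS, values)}
    return min(aic, key=aic.get)


winners_loglik = {name: best_by_loglik(v) for name, v in LOGLIK.items()}
winners_aic = {name: best_by_aic(v) for name, v in LOGLIK.items()}

print(Counter(winners_loglik.values()))
print("LogLik and AIC agree:", winners_loglik == winners_aic)
```

On these numbers Beta is selected for 12 of the 14 data sets and Gamma for Lucene and Rhino, and the AIC penalty does not change any selection.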

Quantile–Quantile plots (QQ-plots)

Conclusions

Research objectives:

• Explore which statistical distribution best fits the defect inflow from large software projects, and
• Explore how different information criteria differ in their selection of the best-fitting distribution.

Knowing the best-fitting distribution is useful for:

• Understanding the underlying process of defect discovery
• Choosing the correct statistical analysis
• Visualization and simulations
• Selecting an appropriate model for modelling/forecasting reliability growth
• Bayesian analysis, to model the prior probability