Rakesh Rana
University of Gothenburg, Sweden
Analyzing defect inflow distribution of large software projects
Defect inflow distribution
Tracking and predicting quality is a challenge.
Software defects are an observable and useful indicator for tracking and forecasting software reliability.
Software reliability measures are primarily used for [1]:
• Planning and controlling the allocation of testing resources, and
• Evaluating maturity or release readiness.
[1] C.-Y. Huang, M. R. Lyu, and S.-Y. Kuo, "A unified scheme of some nonhomogenous Poisson process models for software reliability estimation," IEEE Trans. Softw. Eng., vol. 29, no. 3, pp. 261–269, 2003.
SRGMs: Software Reliability Growth Models
Defect inflow distribution
According to Okamura, Dohi and Osaki [1], understanding the underlying family of the defect distribution is important:
"When the number of total software faults is given by a Poisson random variable, the mean value function of NHPP-based SRGMs is dominated by only failure time distribution. That is, the essential problem can be reduced to what kind of probability distribution is suitable for representing the failure time distribution."
[1] H. Okamura, T. Dohi, and S. Osaki, "Software reliability growth model with normal distribution and its parameter estimation," in Quality, Reliability, Risk, Maintenance, and Safety Engineering (ICQR2MSE), 2011 International Conference on, 2011, pp. 411–416.
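The quoted statement can be written out as a short sketch. The symbols $\omega$ (expected total number of faults), $F(t)$ (cumulative distribution function of the failure time) and $f(t)$ (its density) are notation assumed here for illustration; they do not appear on the slides.

```latex
% N(t): number of faults detected by time t.
% Assumption: the total fault count is Poisson with mean \omega, and each
% fault is detected independently at a time with CDF F(t).
m(t) = \mathbb{E}[N(t)] = \omega \, F(t),
\qquad
\lambda(t) = \frac{\mathrm{d}m(t)}{\mathrm{d}t} = \omega \, f(t)
```

Under these assumptions the shape of the growth curve is fixed entirely by the choice of $F$. For example, an exponential $F(t) = 1 - e^{-bt}$ gives $m(t) = \omega\,(1 - e^{-bt})$, the form of the Goel-Okumoto model.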
Why?
Finding the distribution that best fits the observed defect inflow data is helpful for:
1. Understanding the underlying process of defect discovery
2. Choosing the correct statistical analysis
3. Visualization and simulations
4. Selecting an appropriate model for modelling/forecasting reliability growth
5. Bayesian analysis to model prior probability
Research objectives
Explore which statistical distribution best fits the defect inflow from large software projects, and
Explore how different information criteria differ in their selection of the best-fitting distribution.
Research methodology
Context: Large Software Projects
Case: Defect Inflow Distribution (best fit)
Unit 1: VCC
Four large automotive software projects
Unit 2: Ericsson
Five consecutive releases of a large telecom product
Unit 3: OSS
Five large open-source software projects
Unit of analysis | Application domain | Software development process for studied projects
Volvo Cars Group | Automotive | V-shaped software development, mostly using sub-suppliers for implementation
Ericsson | Telecom | Agile development, mostly in-house
OSS | Open-source projects | Open-source software development; projects from Apache and Mozilla
Projects used in study
Case Unit | Project/Release | Time Period | Total number of Defects*/Issues
VCG | Project-A1 | NA | 6.7X
VCG | Project-A2 | NA | 14.4X
VCG | Project-A3 | NA | 2.0X
VCG | Project-A4 | NA | X
Ericsson | Release-B1 | NA | 2.2Y
Ericsson | Release-B2 | NA | Y
Ericsson | Release-B3 | NA | 1.3Y
Ericsson | Release-B4 | NA | 1.2Y
Ericsson | Release-B5 | NA | 1.6Y
OSS | Project-HTTPClient | Nov-2001 – Apr-2012 | 305
OSS | Project-Jackrabbit | Sep-2004 – Apr-2012 | 938
OSS | Project-Lucene-Java | Mar-2004 – Mar-2012 | 697
OSS | Project-Rhino | Nov-1999 – Feb-2012 | 302
OSS | Project-Tomcat5 | May-2002 – Dec-2011 | 670
Overview of distributions
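No | Distribution | Notation | Parameters | Probability density function
1 | Exponential | $\mathrm{Exp}(\lambda)$ | $\lambda > 0$ | $\lambda e^{-\lambda x}$
2 | Weibull | $\mathrm{Weibull}(\lambda, k)$ | $\lambda > 0$; $k > 0$ | $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}$ for $x \ge 0$; $0$ for $x < 0$
3 | Beta | $\mathrm{Beta}(\alpha, \beta)$ | $\alpha > 0$; $\beta > 0$ | $\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$, where $B(\alpha, \beta) = \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du$
4 | Gamma | $\mathrm{Gamma}(k, \theta)$ | $k > 0$; $\theta > 0$ | $\frac{1}{\Gamma(k)\,\theta^{k}}\, x^{k-1} e^{-x/\theta}$, where $\Gamma(k) = \int_0^\infty t^{k-1} e^{-t}\,dt$
5 | Logistic | $\mathrm{Logistic}(\mu, s)$ | $\mu$ (real); $s > 0$ | $\frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^{2}}$
6 | Normal | $\mathcal{N}(\mu, \sigma^2)$ | $\mu$ (real); $\sigma^2 > 0$ | $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$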
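As an illustration, the six densities in the overview can be written as plain Python functions, together with the log-likelihood of a sample under a chosen density. This is a minimal sketch using only the standard library; the function names, the parameter names and the sample values are assumptions made for this example, not taken from the study.

```python
import math

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def weibull_pdf(x, lam, k):
    # lam is the scale and k the shape; x = 0 with k < 1 is not handled here.
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def beta_pdf(x, a, b):
    # Defined on the open interval (0, 1); B(a, b) is computed via log-gamma.
    if not 0.0 < x < 1.0:
        return 0.0
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_b)

def gamma_pdf(x, k, theta):
    # k is the shape and theta the scale.
    if x <= 0:
        return 0.0
    log_pdf = (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
    return math.exp(log_pdf)

def logistic_pdf(x, mu, s):
    z = math.exp(-(x - mu) / s)
    return z / (s * (1 + z) ** 2)

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(pdf, data, *params):
    """Sum of log densities; every observation must have non-zero density."""
    return sum(math.log(pdf(x, *params)) for x in data)

# Hypothetical sample scaled to (0, 1), so that the Beta density applies too.
sample = [0.05, 0.10, 0.12, 0.30, 0.45, 0.80]
rate = len(sample) / sum(sample)  # closed-form MLE of the exponential rate
print(log_likelihood(exponential_pdf, sample, rate))
print(log_likelihood(beta_pdf, sample, 1.0, 2.0))  # parameters picked by hand
```

Only the exponential rate is estimated here, because its maximum-likelihood estimate has a closed form; the other families generally need a numerical optimizer.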
Overview of information criteria
No | Short | Long Name | Definition
1 | LogLik | Log likelihood | Logarithm of the probability of observed outcomes given a set of parameter values
2 | ML | Maximum Likelihood | $\mathrm{ML} = -2 \cdot \mathrm{LogLik}$
3 | AIC | Akaike Information Criterion | $\mathrm{AIC} = -2 \cdot \mathrm{LogLik} + 2 \cdot k$
4 | AICc | Akaike Information Criterion (correction) | $\mathrm{AICc} = -2 \cdot \mathrm{LogLik} + \frac{2kn}{n - k - 1}$
5 | BIC | Bayesian Information Criterion | $\mathrm{BIC} = -2 \cdot \mathrm{LogLik} + k \cdot \log(n)$
6 | HQC | Hannan–Quinn Information Criterion | $\mathrm{HQC} = -2 \cdot \mathrm{LogLik} + 2 \cdot k \cdot \log(\log(n))$
Where $k$ = number of parameters and $n$ = number of observations.
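The definitions above translate directly into code. The following is a minimal sketch; the function name `information_criteria` and the input values in the usage lines are hypothetical. For every criterion, a smaller value indicates a better trade-off between fit and number of parameters.

```python
import math

def information_criteria(loglik, k, n):
    """Return ML, AIC, AICc, BIC and HQC for a fitted distribution.

    loglik: maximized log-likelihood, k: number of parameters,
    n: number of observations (AICc requires n > k + 1).
    """
    if n <= k + 1:
        raise ValueError("AICc is undefined unless n > k + 1")
    ml = -2.0 * loglik
    return {
        "ML": ml,
        "AIC": ml + 2 * k,
        "AICc": ml + 2 * k * n / (n - k - 1),
        "BIC": ml + k * math.log(n),
        "HQC": ml + 2 * k * math.log(math.log(n)),
    }

# Hypothetical fit: log-likelihood -100 with 2 parameters on 50 observations.
for name, value in information_criteria(-100.0, 2, 50).items():
    print(f"{name}: {value:.3f}")
```

The criteria differ only in the penalty added to $-2 \cdot \mathrm{LogLik}$, so they can rank two distributions differently only when the distributions have different numbers of parameters.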
Defect Inflow Profiles
Probability density plots
Quantile–quantile plots (QQ-plots)
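For readers unfamiliar with QQ-plots, the points of such a plot can be computed as follows. This is a minimal sketch for the exponential distribution only; the function name, the plotting-position convention $(i - 0.5)/n$ and the sample values are assumptions made for this example, and the slides do not state how their plots were produced.

```python
import math

def exponential_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for an exponential fit."""
    xs = sorted(sample)
    n = len(xs)
    rate = n / sum(xs)  # maximum-likelihood estimate of the rate
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n  # plotting position (one common convention)
        theoretical = -math.log(1.0 - p) / rate  # inverse exponential CDF
        points.append((theoretical, x))
    return points

# Hypothetical sample; points close to the line y = x suggest a good fit.
for theoretical, observed in exponential_qq_points([0.05, 0.10, 0.12, 0.30, 0.45, 0.80]):
    print(f"{theoretical:.3f}  {observed:.3f}")
```

The same construction applies to any candidate distribution once its quantile function (inverse CDF) is available.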
Different information criteria: Project-Jack
Project | Distribution | LogLik | ML | AIC | AICc | BIC | HQC
Jack | Exponential | 7.29 | -14.57 | -12.57 | -12.53 | -10.05 | -11.56
Jack | Weibull | 36.25 | -72.50 | -68.50 | -68.36 | -63.45 | -66.46
Jack | Beta | 36.72 | -73.44 | -69.44 | -69.31 | -64.40 | -67.41
Jack | Gamma | 36.05 | -72.10 | -68.10 | -67.96 | -63.06 | -66.06
Jack | Logistic | 31.43 | -62.86 | -58.86 | -58.72 | -53.81 | -56.82
Jack | Normal | 30.79 | -61.58 | -57.58 | -57.44 | -52.53 | -55.54
Jack | Selected Criteria | 36.72 | -73.44 | -69.44 | -69.31 | -64.40 | -67.41
Jack | Selected Distribution | Beta | Beta | Beta | Beta | Beta | Beta
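The selection in this table can be retraced from the LogLik column alone. The sketch below assumes one parameter for the exponential distribution and two for the others, and assumes n = 92 observations (one per month between Sep-2004 and Apr-2012); the slides do not state n, but under this assumption the computed values agree with the table to within rounding. All names in the code are hypothetical.

```python
import math

N_OBS = 92  # assumption: monthly observations, Sep-2004 to Apr-2012

# (log-likelihood from the table, assumed number of parameters)
LOGLIK = {
    "Exponential": (7.29, 1),
    "Weibull": (36.25, 2),
    "Beta": (36.72, 2),
    "Gamma": (36.05, 2),
    "Logistic": (31.43, 2),
    "Normal": (30.79, 2),
}

# Penalty term that each criterion adds to -2 * LogLik.
PENALTY = {
    "ML": lambda k, n: 0.0,
    "AIC": lambda k, n: 2 * k,
    "AICc": lambda k, n: 2 * k * n / (n - k - 1),
    "BIC": lambda k, n: k * math.log(n),
    "HQC": lambda k, n: 2 * k * math.log(math.log(n)),
}

def score(criterion, loglik, k, n=N_OBS):
    return -2.0 * loglik + PENALTY[criterion](k, n)

def select(criterion):
    """Distribution with the smallest (best) value of the criterion."""
    return min(LOGLIK, key=lambda dist: score(criterion, *LOGLIK[dist]))

for criterion in PENALTY:
    chosen = select(criterion)
    print(criterion, chosen, round(score(criterion, *LOGLIK[chosen]), 2))
```

All five criteria select Beta here, as in the table. The penalties cannot change the ranking among the two-parameter candidates, and the exponential fit is too poor for its smaller penalty to matter.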
Log-Likelihood values for selected distribution
Project | Exponential | Weibull | Beta | Gamma | Logistic | Normal
A1 | 59.0 | 63.6 | 105.9 | 66.0 | 8.4 | 5.2
A2 | 5.5 | 7.0 | 19.8 | 5.5 | -6.8 | -3.1
A3 | 56.9 | 82.9 | 104.6 | 98.6 | 11.6 | 10.9
A4 | 119.8 | 188.3 | 491.2 | 199.5 | 50.3 | 35.3

Release | Exponential | Weibull | Beta | Gamma | Logistic | Normal
B1 | 88.1 | 92.6 | 167.1 | 97.9 | 48.8 | 39.6
B2 | -4.3 | 4.6 | 24.8 | 6.0 | 2.7 | 2.0
B3 | -7.8 | 2.8 | 5.0 | 2.9 | -0.7 | 0.3
B4 | 38.8 | 38.9 | 86.7 | 40.4 | 30.2 | 24.7
B5 | -8.4 | 5.4 | 11.9 | 5.3 | 1.7 | 2.2

Project | Exponential | Weibull | Beta | Gamma | Logistic | Normal
Http | 195.4 | 202.1 | 406.2 | 211.9 | 174.6 | 169.0
Jack | 7.3 | 36.2 | 36.7 | 36.0 | 31.4 | 30.8
Lucene | 62.0 | 67.6 | 70.7 | 71.9 | 59.4 | 58.8
Rhino | 52.9 | 63.7 | 59.4 | 77.3 | 31.4 | 28.1
TomCat | 58.5 | 73.0 | 76.9 | 71.9 | 12.3 | 10.7
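To read the best-fitting distribution off these tables, one can take the largest log-likelihood in each row. The sketch below uses the values exactly as tabulated; the variable and function names are hypothetical. Comparing raw log-likelihoods ignores the number of parameters, which is what the information criteria correct for.

```python
from collections import Counter

DISTRIBUTIONS = ["Exponential", "Weibull", "Beta", "Gamma", "Logistic", "Normal"]

# Log-likelihood values copied from the tables, one row per project/release.
LOGLIK = {
    "A1": [59.0, 63.6, 105.9, 66.0, 8.4, 5.2],
    "A2": [5.5, 7.0, 19.8, 5.5, -6.8, -3.1],
    "A3": [56.9, 82.9, 104.6, 98.6, 11.6, 10.9],
    "A4": [119.8, 188.3, 491.2, 199.5, 50.3, 35.3],
    "B1": [88.1, 92.6, 167.1, 97.9, 48.8, 39.6],
    "B2": [-4.3, 4.6, 24.8, 6.0, 2.7, 2.0],
    "B3": [-7.8, 2.8, 5.0, 2.9, -0.7, 0.3],
    "B4": [38.8, 38.9, 86.7, 40.4, 30.2, 24.7],
    "B5": [-8.4, 5.4, 11.9, 5.3, 1.7, 2.2],
    "Http": [195.4, 202.1, 406.2, 211.9, 174.6, 169.0],
    "Jack": [7.3, 36.2, 36.7, 36.0, 31.4, 30.8],
    "Lucene": [62.0, 67.6, 70.7, 71.9, 59.4, 58.8],
    "Rhino": [52.9, 63.7, 59.4, 77.3, 31.4, 28.1],
    "TomCat": [58.5, 73.0, 76.9, 71.9, 12.3, 10.7],
}

def best_distribution(values):
    """Name of the distribution with the highest log-likelihood in a row."""
    best_index = max(range(len(values)), key=values.__getitem__)
    return DISTRIBUTIONS[best_index]

best = {name: best_distribution(values) for name, values in LOGLIK.items()}
wins = Counter(best.values())

for name, dist in best.items():
    print(f"{name}: {dist}")
print(dict(wins))
```

On these values Beta has the highest log-likelihood for 12 of the 14 data sets, and Gamma for the remaining two (Lucene and Rhino).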
Quantile–quantile plots (QQ-plots)
Conclusions
Research objectives
• Explore which statistical distribution best fits the defect inflow from large software projects, and
• Explore how different information criteria differ in their selection of the best-fitting distribution.
It's useful for:
• Understanding the underlying process of defect discovery
• Choosing the correct statistical analysis
• Visualization and simulations
• Selecting an appropriate model for modelling/forecasting reliability growth
• Bayesian analysis to model prior probability