
Defect Inflow Prediction in Large Software Projects



e-Informatica Software Engineering Journal, Volume 4, Issue 1, 2010


Miroslaw Staron*, Wilhelm Meding**

*Department of Applied IT, Chalmers | University of Gothenburg
**Ericsson SW Research, Ericsson AB

[email protected], [email protected]

Abstract

Performance of software projects can be improved by providing predictions of various project characteristics. The predictions warn managers about potential problems and provide them with the possibility to prevent or avoid these problems. Large software projects are characterized by a large number of factors that impact the project performance, which makes predicting project characteristics difficult. This paper presents methods for constructing prediction models of trends in defect inflow in large software projects based on a small number of variables. We refer to these models as short-term prediction models and long-term prediction models. The short-term prediction models are used to predict the number of defects discovered in the code up to three weeks in advance, while the long-term prediction models provide the possibility of predicting the defect inflow for the whole project. The initial evaluation of these methods in a large software project at Ericsson shows that the models are sufficiently accurate and easy to deploy.

1. Introduction

Large software projects have very different dynamics than small projects; the number of factors that affect the project is much larger than for small and medium software projects. Therefore, when constructing predictions for large software projects, there is a trade-off between the prediction accuracy and the effort required to collect the data necessary for predicting (determined by the number and complexity of variables). The need to collect larger amounts of data is particularly important as the data may be distributed over time and across the globe. Ericsson is no exception in that respect. The current practices for constructing predictions in large software projects at Ericsson rely heavily on expert estimations, which are rather time consuming; in particular, the experts use analogy-based classification techniques when constructing the predictions for defect inflow – by identifying similarities and differences between projects, the experts construct the predictions.

In this paper we present a case study conducted at Ericsson which results in developing new methods for short-term and long-term defect inflow prediction. In our case (and at Ericsson), defect inflow is defined as the number of defects reported as a result of executing test cases during the development project in a specific timeframe, often a week, a month or a year. The term inflow is used in the company to denote that the defects which are discovered have to be removed before the project concludes (i.e., the product is released) and that these defects therefore constitute additional work inflow in the development project.

In this paper we present two methods for predicting defect inflow in large software projects in industry:



– short-term defect inflow prediction, used to predict defect inflow for periods of up to 3 weeks on a weekly basis, and

– long-term defect inflow prediction for the entire project lifecycle on a monthly basis.

For each project, the long-term prediction model provides means of planning projects and allocating resources by taking into consideration the expected number of defects detected which have to be removed before the release. The short-term prediction model provides the means of immediate monitoring of the defect inflow status in the project. The methods complement each other, as the predictions from the short-term model need to be interpreted in the context of the predictions from the long-term model.

In this paper we also present an evaluation of these two methods in a large software project at Ericsson. The results of the evaluation show that both the short-term and long-term prediction models developed using our methods increase the prediction accuracy in comparison to the existing practices.

The structure of the paper is as follows. Section 2 presents the work most relevant for this paper. Section 3 introduces the context of the case study, in particular the organization of the large projects for which the methods are constructed. Section 4 describes the short-term prediction method, while Section 5 presents the long-term prediction method. Section 6 presents the process of evaluating the prediction models, Section 7 presents the results from the evaluation, and Section 8 presents the validity evaluation of our study. Finally, Section 9 contains the conclusions.

2. Related Work

The work most closely related to our long-term prediction method is that of Amasaki [2], who also aims at creating defect inflow profiles and trends. Their work is focused on using trends from the development project to predict post-release defect inflow, which is different from our work. Our methods are intended to predict the defect inflow trends in the development project (i.e., when the software product has not been released yet) with the aim to support project management rather than maintenance of the product.

Our long-term defect inflow prediction model is similar to the HyDEEP method [9] by Klas et al. The HyDEEP method aims to support product quality assurance, similarly to our work. The HyDEEP method requires establishing a baseline for the effectiveness of the defect removal process, which requires additional effort from the quality managers (compared to our method). The mean relative error of the predictions created by HyDEEP can be up to 29.6%, which makes it a viable alternative to our long-term defect inflow prediction. In our future work we intend to compare the HyDEEP method to our long-term prediction method in our industrial context, to compare the complexity and ease of use of these two methods.

When developing the long-term defect inflow prediction model we used a similar approach to Goel [7]. Goel's process advocates using one of the predefined defect inflow prediction models – e.g. the Goel–Okumoto Nonhomogeneous Poisson Model – whereas we advocate using linear regression methods. The models considered by Goel usually assume non-linear dependencies between defect occurrences and between testing and defect discovery. Based on our discussions with experts at the company, the dependencies in our case are linear cause-effect relationships.

Our research on long-term defect inflow prediction models is similar to the research on reliability theory in software engineering with respect to the fact that we are interested in the defect inflow profile and not the defect density during development. One of the most well-known models used in reliability theory is the Rayleigh model [10], which describes the defect arrival rates for software projects after the release. It was the first model we attempted to adjust and apply, without much success, before we created the method presented in this paper. We use this model in the evaluation to show why it was not appropriate for our case. Our work is substantially different from the models which predict the defect inflow after release for the following reasons:



– we can assume that when the development project is concluded, the defects reported are taken care of during the maintenance project (which can use reliability theory to predict the defect inflow),

– changes in the trends of the defect inflow are caused by the project milestones and do not depend on the time after the release.

These underlying differences underpin our research and motivate an approach different from that of reliability theory.

Li et al. [11] empirically evaluated ways of predicting field defect rates using Theil forecasting statistics. The results of their work influenced the design of our study on long-term defect inflow prediction, although we see their assumption of defects being reported after the release as the main difference in contexts from our research. The Software Reliability Growth Model [13], which is used in their paper, seems not to be applicable to development projects done in an iterative way, since it assumes that no new functionality is developed while the defects are reported; this is not the case for the development projects.

In our future work we consider extending our prediction methods by using the same principles as Constructive Quality Modeling for Defect Density Prediction (COQUALMO) [4] and its recent extension – Dynamic COQUALMO [8]. In particular we intend to consider defect introduction and removal as two separate processes and use the efficiency of defect removal (from [9]) as one of the variables in our short-term prediction models.

In our short-term models we consider the defect inflow to be a function of the characteristics of work packages (e.g. the accumulated number of components reaching a particular milestone) and not directly of the characteristics of the affected components (e.g. size or complexity). Using the characteristics of components as the sole predictors would provide us with the possibility to predict the defect density of the component and present this data on a monthly/weekly basis (based on when the component will be put under testing). Such an approach would be an extension of the current work on defect density, e.g., [3, 16, 12, 1]. In our case, nevertheless, this approach seems not feasible, because the information about how the components are to be affected by the project is not available at the time of developing predictions; in particular, the change of size and complexity is not available. For short-term predictions, the data on size and complexity of components was not available on a weekly basis simply because measuring the size and complexity change is not meaningful for particular weeks; the measurements of component characteristics are done according to project plans – e.g. builds – and not on a weekly basis (i.e., not according to calendar time). In our further work we intend to evaluate whether it is feasible to re-configure this data and use it as an auxiliary prediction method.

Our research on short-term defect inflow can be extended by using the research of Ostrand et al. [17] on predicting the location of defects in software components in large software systems. Predicting the location of defects can be applied as the next step after the long-term defect inflow predictions are in place, to guide the project managers into channelling testing efforts into components (or work packages which affect these components) which are historically responsible for the largest number of defects in the system.

When developing the short-term defect inflow prediction models we considered using capture-recapture techniques [18] for estimating the number of defects in the product, to assess the viability of our predictions. However, we decided to prioritize the simplicity of data collection, since using capture-recapture data would require additional effort from the testers when reporting discovered defects and a more thorough statistical analysis of the data from the defects database.

3. Context

The context of the case study is Ericsson and one of its large projects, which is developing one of the releases of a network product. In the course of developing the methods we used the previous releases of the product, and we applied the methods to the new release of the product. The product has already had several releases and can be considered a mature one. Choosing late project releases decreases the risk of using data biased by immaturity in the organization, as has already been shown in [21, 22] in a similar context at the same company.

As a new practice at Ericsson, the large software projects are structured into a set of work packages which are defined during the project planning phase. In the projects which we studied, the temporal aspects of defect discovery were more important than the total number of defects for the products. This was dictated by the fact that the stakeholders for this particular work were project managers, and at the project management level the defect inflow is a measure of extra effort in the project (as the discovered defects have to be removed from the product before the release). The number of defects discovered at a particular point in time seems to be a function of the number of work packages reaching the testing phase. Complexity and size characteristics of the product do not have a direct impact on the number of defects discovered in a particular time frame, but on the total number of defects discovered in the product.

In the existing prediction work on defect density [3, 16, 12, 14] it is usually the case that a component is developed by a single work package (or even a single project, depending on the size of the component and project). In the case of Ericsson, work packages are related to the new features being developed and seldom result in creating completely new subsystems or single system components. The division of the project into work packages is based on customer requirements, while the division of the system into sub-systems and components is based on such elements as architectural design and the architecture of the underlying hardware (hardware/environment constraints). Each work package develops (or makes changes to) components for each new large project, which makes it hard to develop a unified defect inflow prediction model using measurements at the component level. During the whole product life cycle (which spans more than one large project – also referred to as a release from the perspective of the product) the division of projects into work packages changes to a large extent (as the requirements are different for every release). Therefore, using work package characteristics (which are based on distributing functionality) makes the method for long-term predictions generalizable to other projects at Ericsson.

Furthermore, from the perspective of project management, the defect inflow is also a function of the status of the project, i.e., where in the lifecycle the project is. This information is conveyed by the work package completion status. This is the assumption that we use in constructing the defect inflow predictions – that the defect inflow rate is dependent on where in the lifecycle the project is. During the project lifecycle there are 3 major milestones which are important for the long-term prediction method:

– Md – design ready milestone, which defines the point when all work packages have finished designing. The Md milestone is used as a reference point when comparing the baseline and predicted projects,

– Mt – test ready milestone, which defines the point when all work packages have finished their tests,

– Mf – product finished, which defines the point when the development project concludes and the product is released (the maintenance organization takes over the responsibility for the released version of the product – for that period of time reliability theory can be used to predict the defect inflow).

Predicting the defect inflow in the large software projects at Ericsson is an important task for project planning (long-term predictions) and for project monitoring and early warning (short-term predictions). Ericsson's quality managers created the predictions manually using expert opinions and analogy-based techniques. The process of creating the predictions was time consuming and usually resulted in creating predictions for 2 months and interpolating the remaining months using straight lines.



The short-term predictions were not widely used in the company because they were effort-intensive. The new organization of software projects provides a unique opportunity to create more accurate and less time-consuming prediction models using statistical regression techniques.

4. Short-Term Defect Inflow Prediction

In this section we present the process of developing the short-term prediction model, introduce the methods used for developing the model, present the development of the model, and finally provide a short example based on a case from Ericsson.

4.1. Process

The process of creating predictions in our case started by using baseline project data to develop the prediction models, which resulted in a set of candidate models. The applicability of these candidate models for predictions was assessed by examining the R2 model-fit coefficient [23]. The candidate model which had the highest R2 was chosen for further development. This model was used on a new set of data from one of the current projects to check whether it was applicable for predictions. The check was done by calculating the Mean Magnitude of Relative Error, or MMRE [5]. If the MMRE was sufficiently low (below 30%, which was arbitrarily chosen by Ericsson), then the model was used for predictions. If the MMRE was higher than 30%, then the "second-best" candidate model was used with the new set of data and a new MMRE was calculated.
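The selection loop above can be sketched as follows (a minimal illustration with made-up model names and numbers, not the paper's actual candidates): candidates are tried from highest to lowest R2, and the first one whose MMRE on the new project data falls below the 30% threshold is accepted.

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error over paired observations."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def select_model(candidates, new_data_actual):
    """candidates: list of (name, r_squared, predict_fn);
    predict_fn() returns predictions aligned with new_data_actual."""
    # Try candidates from highest to lowest R2 (model fit on the baseline).
    for name, r2, predict_fn in sorted(candidates, key=lambda c: -c[1]):
        error = mmre(new_data_actual, predict_fn())
        if error < 0.30:          # threshold used by Ericsson
            return name, error
    return None, None             # no candidate was accurate enough

# Toy usage with invented numbers:
actual = [10.0, 20.0, 30.0]
candidates = [
    ("model_a", 0.86, lambda: [18.0, 35.0, 50.0]),   # best fit, poor MMRE
    ("model_b", 0.82, lambda: [11.0, 19.0, 31.0]),   # "second-best" candidate
]
chosen, err = select_model(candidates, actual)
```

Here the highest-R2 candidate is rejected by the MMRE check, so the second-best model is the one deployed, mirroring the fallback rule described above.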

4.2. Methods

When developing the models we used a number of statistical methods. We decided that the prediction model would be in the form of a linear equation – an outcome of the multivariate linear regression method [23]. The choice of linear regression was dictated by the dependencies between measures (predictor variables) at Ericsson. Based on discussions with experts in the company we could not identify polynomial or exponential dependencies in short-term predictions, which dictated the use of a linear model. To construct the model we used the previous release of the product, which we found to be similar in size, complexity, and maturity of project teams to the new projects in the company.

In order to avoid problems with collinearity within our data set we used Principal Component Analysis, or PCA [6]. Principal component analysis was performed prior to the multivariate linear regression and was used to identify the variables in the data set which can explain most of the variability in the data set. We did not use the principal components themselves, since our goal was to use as little data as possible and still be able to have accurate predictions.
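As an illustration of this step, the sketch below runs a PCA on synthetic data (the variable names and numbers are invented, not Ericsson's): two collinear "milestone" series load on the same dominant component, while an unrelated variable does not, which is the kind of signal used to pick the strongest predictors and drop redundant ones.

```python
import numpy as np

# Synthetic weekly data: two nearly collinear series plus an unrelated one.
rng = np.random.default_rng(0)
weeks = 40
amdp = np.cumsum(rng.poisson(2, weeks)).astype(float)   # accumulated planned Md
amda = amdp - rng.integers(0, 3, weeks)                 # actual lags plan slightly
noise = rng.normal(0, 1, weeks)                         # unrelated variable
X = np.column_stack([amdp, amda, noise])

# PCA via eigen-decomposition of the covariance of standardized variables.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                  # ascending order
order = np.argsort(eigvals)[::-1]
explained = eigvals[order] / eigvals.sum()              # variance explained per PC
loadings = eigvecs[:, order]                            # columns = components

# The two collinear variables dominate the first component; the unrelated
# variable loads mainly on its own component.
```

The first component absorbing most of the variance indicates that one of the two collinear variables suffices as a predictor, which matches the goal of using as little data as possible.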

To create the multivariate linear regression equation we used the method of Least Squares [23] for fitting the coefficients in the equation, and we used the model-fit coefficient R2 [23] when evaluating whether the model can be used for predictions. When we evaluated the model on a new data set from the current project we used MMRE [5] to check how well the predictions fit the actual values of defect inflow. The evaluation methods are described in more detail in Section 7.
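A minimal sketch of this fitting step, on synthetic data (the coefficients and noise level are invented): a least-squares fit with an intercept term, and R2 computed from the residual and total sums of squares.

```python
import numpy as np

# Synthetic data with a known linear relationship plus small noise.
rng = np.random.default_rng(1)
n = 30
X = rng.normal(size=(n, 2))                       # two predictor variables
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, n)

A = np.column_stack([np.ones(n), X])              # add intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)      # least-squares fit

y_hat = A @ coef
ss_res = np.sum((y - y_hat) ** 2)                 # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)              # total sum of squares
r_squared = 1.0 - ss_res / ss_tot                 # model-fit coefficient R2
```

With low noise the recovered coefficients land close to the true ones and R2 is close to 1; on real project data a lower R2 would signal a weaker candidate model.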

4.3. Model Development

4.3.1. Assumptions and Application

The short-term defect inflow prediction model was based on the following assumption: we used data from past weeks to describe the defect inflow for the current week. In other words, the dependent variable in the model was the defect inflow for the current week (which is already known), while the independent variables were the defect inflow from previous weeks and the planned/actual milestone completion status for the current week and the previous weeks.

This assumption meant that once we had the regression model describing the defect inflow for a particular week using data from the past weeks, we could use this model to predict the defect inflow for the coming weeks by substituting the data for past weeks with the data from the current week. As the predictor variables we used the defect inflow for the current project (for the time up to the current week) and the milestone completion status.

The short-term prediction model allowed predicting the defect inflow for 1–5 weeks in advance. For example: if we found that our predictor variables were the defect inflow from 5 weeks before and the accumulated planned Md completion, then, using the data for the current week, we could predict what the defect inflow will be in 5 weeks. However, through simulations we found that the predictions for future weeks 4 and 5 had low prediction accuracy and therefore they are not discussed in this paper.
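The substitution described above can be sketched as follows (the weekly series and the lag-2 "equation" are invented for illustration; the real coefficients come from the regression): a model that explains this week's inflow from data k weeks back predicts k weeks ahead when fed the current week's data instead.

```python
def lagged_pairs(series, k):
    """(x, y) rows where x is the value k steps before y, used for fitting."""
    return [(series[t - k], series[t]) for t in range(k, len(series))]

inflow = [3, 5, 8, 12, 15, 14, 10]          # made-up weekly defect inflow
k = 2
pairs = lagged_pairs(inflow, k)             # training rows for a lag-2 model

# Suppose a fitted lag-2 equation is D = a + b * Di2 (coefficients are
# illustrative, not from the paper):
a, b = 1.0, 0.9
predict = lambda di_k_weeks_back: a + b * di_k_weeks_back

# In-sample use: explain the last observed week from two weeks before it.
this_week_pred = predict(inflow[-1 - k])
# Forecast: substitute the *current* week's inflow to predict k weeks ahead.
ahead_pred = predict(inflow[-1])
```

The same equation serves both purposes; only the data fed into it shifts forward, which is exactly why predictions further ahead (weeks 4–5) degrade as the substituted data gets staler.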

4.3.2. Choice of Predictor Variables

The choice of predictor variables was dictated by the availability of data and by our goal – to make predictions based on data that already existed in the organization and was easy to obtain. As discussed in Section 3, we could use milestone completion and test progress to predict defect inflow (since these were relevant variables and they influenced the defect inflow). The source of the defect inflow was testing, which was performed both before the design ready milestone (Md) and before the test ready milestone (Mt). We discovered that we needed to distinguish between the Md and Mt phases, as the Md completion status is only used before the Md date for the project and the Mt completion status is used after the Md date.

In our case study we took into consideration the following predictor variables:

– Number of planned Md completions (prefixed with Mdp) and accumulated number of planned Md completions (prefixed with AMdp) – the accumulated number of completions is the number of completions from the beginning of the project until the current week – for:
  – the predicted week (Mdp0, AMdp0),
  – 1 week before (Mdp1, AMdp1), 2 weeks before (Mdp2, AMdp2), . . . , 5 weeks before (Mdp5, AMdp5) the predicted week;

– Number of actual Md completions (Mda) and accumulated number of actual Md completions (AMda) for:
  – 1 week before (Mda1, AMda1), 2 weeks before (Mda2, AMda2), . . . , 5 weeks before (Mda5, AMda5) the predicted week;

– Number of planned Mt completions (Mtp) and accumulated number of planned Mt completions (AMtp) for:
  – the predicted week (Mtp0, AMtp0),
  – 1 week before (Mtp1, AMtp1), 2 weeks before (Mtp2, AMtp2), . . . , 5 weeks before (Mtp5, AMtp5) the predicted week;

– Number of actual Mt completions (Mta) and accumulated number of actual Mt completions (AMta) for:
  – 1 week before (Mta1, AMta1), 2 weeks before (Mta2, AMta2), . . . , 5 weeks before (Mta5, AMta5) the predicted week;

– Number of reported defects (defect inflow, Di) for:
  – 1 week before (Di1), 2 weeks before (Di2), . . . , 5 weeks before (Di5) the predicted week.

To avoid problems with multi-collinearity we used variables that were not correlated, and we chose the data from the project plan, which can always be obtained early in the project. A representative scatter plot for the relationship between two of these variables is presented in Figure 1.

While constructing the defect inflow prediction models we deliberately did not include the data on product size/complexity, as this data was not related to project planning, for the following reasons:

– the software components were not assigned on a one-to-one basis to work packages, and the milestones characterize work packages, not the components,

– the data on source code size was collected for milestones in the project (as it does not make sense to collect it on a weekly basis) – which means that for the whole project we could use only a few data points for size,

– the organization was concerned with project planning and monitoring, and not with source code characteristics (e.g., size is only an input to the planning, but is not monitored in the projects).

Figure 1. Relationship between accumulated planned test cases to be executed and accumulated Md closures; Spearman's correlation coefficient: 0.96

Nor did we decide to use the data on planned and executed test cases, as the accumulated numbers of test cases planned and executed were highly correlated with the Md/Mt completion status (Spearman's correlation coefficients [23] between 0.96 and 0.99, significant at the 0.01 significance level).
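This kind of redundancy check can be sketched as follows (synthetic series with invented numbers): Spearman's coefficient computed as the Pearson correlation of ranks, where a value near 1 indicates that one of the two candidate predictors can be dropped without losing information.

```python
import numpy as np

def spearman(x, y):
    """Spearman's rank correlation (assumes no ties): Pearson on ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(2)
# Strictly increasing accumulated Md closures (invented data).
acc_md = np.cumsum(rng.poisson(3, 20) + 1).astype(float)
# Planned test cases track the Md closures closely, plus a little noise.
acc_tests = 12.0 * acc_md + rng.normal(0, 5, 20)
unrelated = rng.normal(size=20)

rho_redundant = spearman(acc_md, acc_tests)    # expected near 1: redundant pair
rho_unrelated = spearman(acc_md, unrelated)    # expected near 0
```

A near-1 coefficient between the two accumulated series is the situation described in the text: keeping only the milestone-completion variable preserves the predictive information while reducing data collection effort.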

In our search for the predictor variables we used a large number of approaches and experimented with more variables than the ones above. However, adding more variables did not increase the accuracy of the model significantly. For instance, using all relevant variables (ca. 30) in one of the equations increased the accuracy by only 1%, but the additional effort for data collection increased significantly.

4.3.3. Reducing Number of Predictor Variables

Before constructing the regression model over the candidate variables, Principal Component Analysis was used to reduce the number of variables and thus identify the strongest predictors. We experimented with the initial set of variables (the input to PCA) in order to achieve the best possible percentage of explained variability with the minimal set of measurements. The PCA identified 4–7 principal components (depending on the prediction period: 1, 2, 3, 4, or 5 weeks in advance). The scatter plot for the main principal components and the defect inflow is presented in Figure 2. Due to the confidentiality agreements with our industrial partner, the values on the Y-axis are not provided.

We used PCA to identify the key components and we used the variables which constituted these components for the prediction models. We re-calculated the loadings in the components so that we could present the equations using the original variables and not the components (as this was one of our requirements while deploying the model in the company – to use the original names of the measurements, not the names of the components).

4.4. Result: Short-Term Defect Inflow Prediction Model

The principal components before Md seemed to be linearly correlated to the defect inflow, which made linear regression a viable technique for building the prediction model in this case. For the principal components after Md, the components did not show a strong correlation with the defect inflow, which affects the fitness of the regression models and the prediction accuracy. We checked whether the principal components exhibit a logarithmic or polynomial relationship to the defect inflow, but the results showed an almost complete lack of relationship for the logarithmic curve and large scatter for the polynomial curve. This supported the claims from the Ericsson experts that the relationships are linear and not polynomial or exponential. The variability of the data set explained by the principal components is presented in the last column in Table 1.

Figure 2. Scatter plot for the main principal components and defect inflow for the models before Md and after Md. The X-axes show the values of the components.

The equations used to predict the short-term defect inflow are presented in Table 1 together with the R2 coefficient for the regression model and the variability explained by the component which contained the variables used in the equation. The variables used in the equations are subsets of the variables presented earlier in this section.

4.5. Example

As an example, let us predict the defect inflow for a particular week in an example project. Let us assume that we are in week 10 of the project, and it is before the design ready milestone (Md). The data for that particular week is presented in Table 2. We predict week 11, so the values for 1 week before the predicted week are the values for week 10 (the current week), the values for 2 weeks before the predicted week are the values for week 9, etc. We substitute the appropriate numbers into the equations presented in Table 1, thus obtaining the predictions for week 11. By using the data from week 10 as the data for 2 weeks before, and the data from week 9 as the data for 3 weeks before, we can create predictions for week 12 – for example, AMdp2 in the equation is AMdp1 from Table 2 (because the data shows the values relative to week 10, not week 12). The value of the defect inflow for week 13 is obtained in the same way.
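To make the substitution concrete, the week-11 arithmetic can be sketched as below, plugging the week-10 values from Table 2 into the Before-Md, 1-week equation from Table 1 (a sketch of the calculation only):

```python
# Week-11 forecast: substitute the week-10 data from Table 2 into the
# Before-Md, 1-week equation from Table 1:
#   D = 1.499 + 0.584*AMdp0 + 0.650*Di1 - 1.285*AMdp1 + 1.102*AMda1
def predict_1wk_before_md(amdp0, di1, amdp1, amda1):
    return 1.499 + 0.584 * amdp0 + 0.650 * di1 - 1.285 * amdp1 + 1.102 * amda1

# Table 2 values: AMdp0 (week 11) = 5, Di1 = 4, AMdp1 = 13, AMda1 = 12
week11 = predict_1wk_before_md(amdp0=5, di1=4, amdp1=13, amda1=12)
# week11 = 1.499 + 2.920 + 2.600 - 16.705 + 13.224 = 3.538, i.e. roughly
# 3.5 defects predicted for week 11
```

The week-12 and week-13 forecasts follow the same pattern with the 2-week and 3-week equations, shifting which Table 2 column is fed into each lagged term as described above.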

The results for the short-term predictions are presented in Figure 3.

The figure shows that, given the current circumstances of the project (i.e., the number of defects reported in the current week and the status of the planned and accumulated numbers of work packages reaching the Md milestone), week 13 seems to be the week to which the project manager should pay more attention, as the number of defects discovered is going to be rather high. A potential action of the project manager is to prepare more development resources for that week to repair the defects discovered.


Table 1. Defect inflow prediction models (short-term)

(Md denotes the design ready milestone; the period is the number of weeks ahead of the predicted week.)

Before Md, 1 week:
D = 1.499 + 0.584*AMdp0 + 0.650*Di1 − 1.285*AMdp1 + 1.102*AMda1
(R2 = 0.86; variability explained by component: 93.84%)

Before Md, 2 weeks:
D = 2.639 + 1.173*AMdp0 − 2.029*AMdp2 + 1.724*AMda2 + 0.461*Di2 − 0.366*Di3
(R2 = 0.82; variability explained by component: 91.43%)

Before Md, 3 weeks:
D = 4.187 − 0.357*Di3 + 1.928*AMdp5 − 1.192*AMdp0
(R2 = 0.62; variability explained by component: 90.92%)

After Md, 1 week:
D = 39.155 + 0.470*Di1 + 1.290*AMtp0 − 1.185*AMtp1 − 1.214*AMta1 + 0.039*AMtp2 − 0.044*AMta2 + 1.297*AMtp3 − 0.517*AMta3 + 0.003*AMtp4 + 0.107*AMta4 + 0.148*Di2 − 0.237*Di3 − 0.194*Di4 + 0.180*Di5
(R2 = 0.67; variability explained by component: 65.74%)

After Md, 2 weeks:
D = 53.669 + 1.419*AMtp0 − 0.682*AMtp1 + 0.099*AMtp2 − 1.844*AMta2 + 0.429*AMtp3 − 0.768*AMta3 + 0.646*AMtp4 + 0.446*AMta4 + 0.306*Di2 − 0.265*Di3
(R2 = 0.55; variability explained by component: 78.69%)

After Md, 3 weeks:
D = 24.82 + 0.47*Di3 − 0.351*AMtp2 + 0.308*Di5
(R2 = 0.62; variability explained by component: 59.09%)

Table 2. Data for week 10 of the project

AMdp0 (week 11) = 5, AMdp0 (week 12) = 5, AMdp0 (week 13) = 5, AMdp1 = 13, AMdp2 = 14, AMdp5 = 20, AMda1 = 12, AMda2 = 12, Di1 = 4, Di2 = 5, Di3 = 5

[Figure 3: number of defects per week for weeks 8–13, showing the actual defect inflow, the current week (week 10), and the predicted defect inflow.]

Figure 3. Short-term defect inflow prediction for week 10


5. Long-Term Defect Inflow Prediction

While the rationale behind the short-term prediction was to predict the status of the project, the rationale for the long-term prediction was to create a monthly trend of defect inflow in the project for its whole duration. The process for creating the model was the same as for the short-term prediction, but the methods and results were different.

In the case of the short-term models, the equations were the resulting model. In the case of the long-term prediction, the method itself was the result. This method consists of the following steps:
1. Identify a baseline project,
2. Partition the defect inflow curve of the baseline project,
3. Create regression models for each part of the curve,
4. Calculate the scaling factor for the new project,
5. Plot the defect inflow curve for the new project.
In contrast to the short-term defect inflow prediction, the long-term prediction used a single predictor variable – project progress – unlike the several variables in the short-term prediction models.

5.1. Long-Term Defect Inflow Prediction Method Development

5.1.1. Methods

For creating the regression models we used the polynomial regression method with only one variable – the project progress – as the predictor variable. The regression uses least-squares estimators, similarly to the multivariate linear regression used for the short-term predictions.

For creating the scaling factors we used an average ratio between the number of defects reported in the new project and in the baseline project. This factor Sc was calculated using the following formula:

Sc = (1/n) * Σ_{i=1..n} (p_i − a_i) / p_i,

where: n – the number of months for which we have the actual data, p_i – the value obtained from the equation before scaling, a_i – the value of the actual defect inflow data for this month. The rationale behind this formula was that it is an average relative difference between the predicted defect inflow and the actual data from the predicted project.
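Taken literally, the formula can be computed as below (a sketch; `predicted` and `actual` are hypothetical monthly series, not data from the study):

```python
def scaling_factor(predicted, actual):
    """Sc = (1/n) * sum((p_i - a_i) / p_i): the average relative difference
    between the unscaled predictions and the actual defect inflow."""
    n = len(actual)
    return sum((p - a) / p for p, a in zip(predicted[:n], actual)) / n
```

For example, unscaled predictions of 10 and 20 defects against actuals of 8 and 18 give Sc = 0.15.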

The scaling factor Sc was calculated for the curve for which some initial defect inflow data was available. If such data was not available, the scaling factor could be based on expert estimates (e.g., before the project start).

The method for calculating the scaling constant Sc1 for the curves which did not start at the beginning of the project was:

Sc1 = pred_ActualProject / pred_BaselineProject,

where: pred_ActualProject – the predicted defect inflow for the current project for the last month for which the 1st equation (curve) can be used, and pred_BaselineProject – the predicted defect inflow for the equation describing the second curve.

5.1.2. Assumptions and Application

The assumption for the long-term defect inflow prediction in the software project was that there existed a number of projects which could be used as a baseline. In that respect, the long-term defect inflow prediction was similar to analogy-based estimation.

The long-term prediction method should be used at the beginning of new projects in order to estimate how many defects can be discovered, and when, during the project. This information is particularly important for project planning and monitoring, and the stakeholders for these predictions are project managers in large software projects. The long-term defect inflow prediction is usually not applicable for small projects, since these usually differ significantly from each other, even if they are executed by the same organization and on the same product.

Our long-term defect inflow prediction model was designed to work best for projects already in progress, as it used the actual reported defect inflow from the projects to adjust the prediction models and thus increase their accuracy. Expert estimates made at the beginning of the project should be replaced as soon as real data is available in the new project.

5.2. Result: Method for Constructing Long-Term Defect Inflow Prediction Models

It should be noted that the resulting long-term prediction model should be adjusted by the owner of the prediction in order to increase its accuracy. In particular, the defect inflow curve needs to be maintained once a month so that the predicted defect inflow is as accurate as possible.

5.2.1. Identify Similar Projects

The identification of the most similar project needs to be done by experts, e.g., the experts involved in creating the predictions. The factors that should be taken into account while identifying the most similar projects are:
1. Estimated size of the project, measured as the number of person-hours in the project.
2. Number of "heavy" features in the project – i.e., features which are of high complexity.
3. Complexity of the complete project – i.e., including integration complexity.
4. Time span of the project.

The most important factor is the estimated complexity. A rule of thumb used by experts is that for a project which is twice as big as a reference project, the defect inflow in each month would be around 75% higher than in the reference project.

The time span is important only in the case when the projects differ significantly in length; otherwise the method compensates for it by using the time relative to the Md milestone. Hence, similarly to the short-term defect inflow prediction, the Md milestone date is an important reference point. In case the time span of the project is significantly longer, the scaling factor (described later in this section) needs to be obtained through expert estimates and not using the method described in this paper.

5.2.2. Partition the Defect Inflow Curve of the Baseline Project

In order to identify the curves, identify the points where the curves change shape. These changes are important to identify, since otherwise the fitted equations will under- or over-predict the values of the peaks in the defect inflow. The peaks, however, are the most crucial elements to predict, since project management is interested in how many defects might come in the peak time.

It should be noted that projects are usually independent of the calendar time, which means that the prediction model should be built according to the project milestones – e.g., Md. However, changes in the time scale should be done after the equations are built and when the models are to be applied to the current project.

5.2.3. Create Regression Models for Each Part of the Curve

The next step is to create equations describing the shape of the curves in this chart using regression methods. The equations for the curves can be identified using curve estimation methods in statistical packages. These curve estimations use standard regression techniques – e.g., least-squares estimators.

In order to create the equations describing each curve (identified in the previous section) we need to:
– Use a separate equation for each curve,
– Start with a straight line, and
– Change the curve type to polynomial with the order set to 2, 3 or 4. The order depends on the R2 – as a rule of thumb, the order should be as small as possible. The higher the degree of the polynomial, the higher the risk of errors in predictions when the time scale of the predicted project is different from the time plan of the baseline project.
These equations "re-create" the defect inflow for the baseline project using mathematical equations, which allows us to adapt the defect inflow trends to different time scales.
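A sketch of this per-segment fitting, with invented inflow numbers and the segment boundaries from the example in Section 5.3 (months 1–5, 5–9, 9–11); `numpy.polyfit` stands in for the curve-estimation function of a statistical package, and we cap the order at 3 here for the tiny synthetic sample:

```python
import numpy as np

# Hypothetical monthly defect inflow for a baseline project (invented numbers).
months = np.arange(1, 12)
inflow = np.array([10, 14, 20, 27, 36, 33, 25, 18, 15, 13, 12], dtype=float)
segments = [(1, 5), (5, 9), (9, 11)]      # where the curve changes shape

models = []
for lo, hi in segments:
    mask = (months >= lo) & (months <= hi)
    x, y = months[mask], inflow[mask]
    best = None
    # Start with a straight line; raise the polynomial order only while it
    # clearly improves R2 - keep the order as small as possible.
    for order in (1, 2, 3):
        if order >= len(x):               # too few points for this order
            break
        coeffs = np.polyfit(x, y, order)
        resid = y - np.polyval(coeffs, x)
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        if best is None or r2 > best[1] + 0.05:
            best = (coeffs, r2, order)
    models.append(best)                   # (coefficients, R2, chosen order)
```

Each fitted segment can then be evaluated at arbitrary (rescaled) months, which is what makes the trend transferable to a project with a different time plan.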

5.2.4. Calculate Scaling Factor for the New Project

Up to this point, the method constructs a set of equations which describe the trend of defect inflow in the baseline project. In order to predict the defect inflow in future projects, the equations need to be scaled (moved up or down the y-axis) to reflect the actual values so far of the predicted project. The goal of this step is to have a prediction model in the form of a set of equations which are scaled according to certain criteria. For the different parts of the curves identified so far we use different scaling factors. For the first curve, in our case, there is a single criterion: the predicted defect inflow should fit the actual defect inflow from the predicted project. The outcome of the scaling is the scaling factor Sc, which is used in the following way:

Dm(month) = Sc ∗ y(month),

where: Dm(month) – defect inflow for a specific month, y(month) – the predicted defect inflow for the month calculated from the equations describing the baseline project.

As mentioned in the methods section, there are two different ways of creating the scaling factor, which are used (i) at the beginning of the project (when no defects have been reported), and (ii) during the project, when some defects have been reported. In the first case (i), the scaling needs to be done using expert opinion – i.e., the expert has to provide the ratio of complexity between the predicted and the baseline project, and this ratio becomes the scaling factor Sc.

In the latter case (ii), the scaling can be done by "fitting" the predictions to the existing trend in defect inflow for the predicted project. Using the actual data means that we do not rely on subjective estimates, but instead try to answer the question: "What will the defect inflow look like in the predicted project if we continue with the current trend of defect inflow?" Furthermore, the predictions of defect inflow become more important as the project progresses, since the defect inflow is not expected to be large at the beginning of the project. This fitting can be done in the following ways: (i) calculating the average relative difference, or (ii) using non-linear regression. The first (i) works well for projects which have similar life spans (i.e., differences in life spans should not be more than 2 months). The second (ii) is more robust and does not have this limitation.
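One simple way to implement such a fit (an assumption on our part, not necessarily the exact regression used in the study) is the closed-form least-squares scale that minimizes sum((a_i − s·p_i)^2):

```python
def fit_scale(predicted, actual):
    """Least-squares scaling factor s = sum(a_i * p_i) / sum(p_i^2),
    which minimizes sum((a_i - s * p_i)^2) over the months observed so far."""
    num = sum(a * p for a, p in zip(actual, predicted))
    den = sum(p * p for p in predicted)
    return num / den
```

For instance, if the actuals so far are exactly double the baseline curve, the fitted scale is 2.0.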

5.3. Example: Long-Term Defect Inflow Prediction Model

In this section we present how we constructed a prediction model for one of the projects at Ericsson. In our study, the trends in the defect inflow for some of the releases of one of the products are presented in Figure 4. The trends are similar, but not the same, which requires more advanced methods for predicting the defect inflow than analogy-based estimation alone. Due to the confidentiality agreement with our industrial partner, we present the data scaled to the largest defect inflow in each project, and we present only a subset of the months of the project.

The figure shows that the trends in the defect inflow are rather stable over releases (although they differ in the values, which cannot be shown in the figure due to confidentiality agreements). They are presented in a relative time scale with the common point of the Md milestone; the milestone when all work packages have finished designing.

An important observation is that all three projects have a similar percentage of defect inflow at the Md milestone month. Since this is the case, we use the Md milestone month as a reference point in constructing the long-term predictions.

[Figure 4: percentage of defects (scaled to the peak) per month, from month −6 through the design-ready milestone to month 4, for the releases baseline-2, baseline-1, and baseline.]

Figure 4. Defect inflow trends for previous projects scaled towards the design ready milestone. The dots represent removed months, since we cannot show the complete time frame of the projects.

In order to identify the baseline project we asked experts who work with these baseline projects. They identified the previous release of the same product as the best baseline ("Release: baseline" in Fig. 4).

In the case of this baseline project we identified the following curves:
1. months 1–5,
2. months 5–9,
3. months 9–11.
The developed trend line for months 1–5 is presented in Figure 5. The values of the defect inflow are normalized, due to the confidentiality agreement with our industrial partner. The curve equation is displayed in the chart, where x denotes the month. The scaling factor was calculated to be 1.23 in month number 3, which meant that the new project produced ca. 23% more defects than the previous project. We validated this with the quality managers for that project, who confirmed that this number reflected their expert opinion.
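As an illustration of how the fitted curve and the scaling factor combine (using the normalized coefficients as printed in Figure 5, not the real defect counts):

```python
def baseline_curve(month):
    """Fitted curve for months 1-5 of the baseline project (Figure 5)."""
    return 0.75714 * month**2 + 1.4571 * month + 8.43

SC = 1.23  # scaling factor calculated in month 3 of the new project

# Scaled long-term prediction for, e.g., month 4 of the new project
prediction_month4 = SC * baseline_curve(4)
```

The other two curves (months 5–9 and 9–11) are handled the same way, each with its own equation and, where needed, the Sc1 constant described in Section 5.1.1.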

6. Evaluation of Defect Inflow Models in the Context of Ericsson

6.1. Design of Evaluation

When developing the prediction models we used the model fit coefficient (R2) to observe whether the models are accurate w.r.t. the past projects used to build them. In this section we describe how we used the data from new (current) projects to check whether the models accurately predict defect inflow in new projects. We evaluated two aspects: the ability to predict the correct value, and whether predictions over a period of time (e.g., predictions 3, 2, and 1 weeks in advance) predict the same value (stability). Unstable models change rapidly over time, which makes them less trustworthy – how can we trust a prediction model that will change a lot in the coming weeks or months?

We used the Magnitude of Relative Error (MRE) metric to measure how accurate the predictions are. MRE was defined in [5] as:

MRE = |a_i − p_i| / a_i,


[Figure 5: defect inflow for months 1–5 of the baseline project with the fitted trend line y = 0.75714x^2 + 1.4571x + 8.43, R2 = 0.9598.]

Figure 5. Equation for the curve for months 1–5

where a_i is the actual value of the defect inflow for the i-th week (short-term predictions) or month (long-term predictions), and p_i denotes the predicted value of the defect inflow for the i-th week or month. In the evaluation we used both the distribution of MRE and the mean MRE (MMRE). The best models were expected to have the lowest value of MMRE – i.e., their mis-predictions are small.
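MMRE is then simply the mean of the per-period MREs; a minimal sketch:

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean(|a_i - p_i| / a_i)."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)
```

For example, actuals (10, 20) against predictions (8, 25) give MMRE = 0.225, i.e., an average mis-prediction of 22.5%.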

For evaluating the stability we used our own measure, mean in-stability (MiST), as a metric for comparison, which we defined as:

MiST = (1/n) * Σ_{i=1..n} |m_0i − m_ji| / m_0i,

where: n – the number of months used in the prediction, m_0i – the predicted defect inflow for the i-th month created before the project (0th month), m_ji – the predicted defect inflow for the i-th month created in the j-th month.
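The measure can be sketched as follows, where `m0` is the series predicted before the project started and `mj` the series re-predicted in month j:

```python
def mist(m0, mj):
    """Mean in-stability: mean(|m0_i - mj_i| / m0_i) over the n predicted months."""
    terms = [abs(a - b) / a for a, b in zip(m0, mj)]
    return sum(terms) / len(terms)
```

A MiST of 0 means the revised predictions did not move at all; e.g., initial predictions (10, 20) revised to (5, 30) give MiST = 0.5.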

In the evaluation of the prediction accuracy we compared the models developed in our research (presented in Table 1) with "average" models, i.e., predicting using a simple average defect inflow in a baseline project (or the average number of defects in the current project – up to the week for which the prediction was made), and with moving averages. The rationale behind the average models was that if we did not know how to predict the defect inflow in a particular week, we could take the average number of defects over all weeks as an estimator; alternatively we could also use the mode (i.e., the most common value of the defect inflow). Thus, in the evaluation (Figure 6), we used the following models:
– Average defect inflow from the baseline project,
– Average defect inflow from the actual project until the week of prediction,
– Moving average (2 weeks) of the defect inflow from the current project (i.e., the predicted value of the defect inflow is the average of the defect inflow from the previous 2 weeks),
– Moving average (3 weeks) of the defect inflow from the current project (i.e., the predicted value of the defect inflow is the average of the defect inflow from the previous 3 weeks),
– Value of the mode of the defect inflow from the baseline project (to some extent this is the use of analogy-based estimation),
– Value of the mode of the defect inflow from the current project,
– Expert estimations for 1, 2, and 3 weeks.
In order to evaluate the long-term prediction models developed using this method, we compared the prediction model developed using our method to the following prediction models:
– Linear, quadratic, cubic, 4th degree, 5th degree, and exponential curves depicting the trend in defect inflow,
– Rayleigh model,
– Expert estimations, which are based on the predictions done for previous projects, prior to our research.
The above models were chosen since they require a similar amount of effort to create compared to our models. We have deliberately excluded methods like Bayesian Belief Networks [15], as their construction requires significantly more effort than the construction of the simple prediction models using our method.
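The moving-average reference models amount to one-liners; a sketch of the AVG-2w/AVG-3w predictors (the history values here are invented):

```python
def moving_average_forecast(inflow_history, window):
    """Predict the next week's defect inflow as the average of the
    previous `window` weeks of the current project."""
    tail = inflow_history[-window:]
    return sum(tail) / len(tail)
```

With a history of (4, 5, 5) defects per week, the 2-week model predicts 5.0 for the next week; such models track stable trends well but, as discussed below, cannot anticipate peaks.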

6.2. Results of Evaluation

6.2.1. Short-Term Predictions

In this section we show how the models worked in a new project at the company. The new project which we chose for the evaluation is the next release of the same product, while the baseline project was the previous release of the same product.

The values of the MMRE for the reference prediction models and the short-term prediction models are presented in Figure 6. Figure 6 indicates that the best model is the moving average over 2 weeks, which is one of the simplest models to construct. Our prediction models have larger mis-predictions, which are caused by the fact that the predictions after Md are based on data which was weakly correlated with the defect inflow. Despite the low value of MMRE for predictions using moving averages, these two models have a disadvantage: moving averages show trends in the defect inflow over more than one week, which are rather stable (moving averages are partially used by experts in estimating the defect inflow), but they do not allow predicting peaks (such as the peak shown in Figure 3 for week 13).

The worst prediction models are the models using modes from the current and baseline projects; these two models mis-predict the defect inflow in most cases by more than 100%. The predictions made by the expert most commonly had mis-predictions of 75%–90%. The predictions created using moving averages can result in less accurate models (since they have a larger percentage of mis-predictions above 90% than 'our' models). From the experiments with the historical data we found that the prediction models developed in this paper had a tendency to over-predict (i.e., to predict values larger than the actual values), in particular indicating the potential "red alerts" for the projects – i.e., showing that there will be a high rise in the defect inflow in the project. Although this might be a problem from the statistical perspective (low accuracy), it provides a means for project managers to get early warnings of potential problems so that they have some time for reacting. This, however, cannot be verified on the historical data, and we are currently in the process of evaluating the models in other large projects at Ericsson. The interview with the expert provided the predicted values for 10 different weeks. The expert was asked to make the predictions for 10 different weeks in the same way as he does when the predictions are needed (using the information only up to the predicted week), although making short-term predictions is not done very often; the experts are focused on long-term predictions for the whole project. The results show that the methods presented in this paper are not worse than the predictions made by the expert and hence can be used as a surrogate for expert predictions. A disadvantage of the expert predictions is that they require very deep, in-depth knowledge of the project; the expert making the estimations is very experienced. The time required to create the predictions was negligibly longer than the time required when using the prediction models.

6.2.2. Long-Term Predictions

The values of the Mean Magnitude of Relative Error (MMRE) are presented in Figure 7. The MMRE is calculated using the 7 months after the project start (not the whole project, since it has not yet concluded).

This shows that the best model uses a single, exponential equation. However, examining only the first 7 months can be misleading,


[Figure 6: bar chart of MMRE values for the compared short-term models: 1 week (our), 2 weeks (our), 3 weeks (our), AVG-baseline, AVG-current, AVG-2w, AVG-3w, Mode-baseline, Mode-current, Expert-1w, Expert-2w, Expert-3w.]

Figure 6. MMRE for short-term predictions

[Figure 7: bar chart of MMRE values for the long-term models: Custom (our) – 49%, Analogy – 52%, Linear – 87%, Quadratic – 155%, Cubic – 100%, 4th degree – 105%, 5th degree – 93%, Exponential – 25%, Rayleigh – 46%, Expert – 29%.]

Figure 7. MMRE for the new project

as the predictions for the whole project using exponential equations do not produce trustworthy results by the end of the project. Figure 8 presents the predictions for the whole project.

The results of the stability evaluation are presented in Figure 9. A good model does not change very much from month to month, meaning that the initial predictions are actually trustworthy.

The stability needs to be assessed together with the actual defect inflow trend in the project, which is presented in Figure 8. The trend in defect inflow is different than what was expected, and it is different than in the baseline project. These differences caused the instability of the model. However, as the project progressed, the peak in month (a+3) leveled out and the trend in month (b) came back to normal. Nevertheless, given the history and the way in which the scaling factor is calculated, the current predictions are that the defect inflow in the project will be smaller than initially expected.

[Figure 8: actual values and long-term predictions (Custom (our), Analogy, Exponential, Rayleigh) for the new project, plotted over months −12 to 6 relative to the Md milestone.]

Figure 8. Long-term predictions for the new project

[Figure 9: bar chart of the mean in-stability (MiST) per month: M1 – 207%, M2 – 126%, M3 – 74%, M4 – 37%.]

Figure 9. Stability evaluation

By observing the MiST chart in Figure 9 we can conclude that the predictions are stable only when there are no peaks in the defect inflow. The peaks are exceptional situations in the projects; they render the predictions unusable and thus call for adjusting them. The presence of such a peak means that the prediction model should be constructed differently – e.g., by choosing a different baseline project, or by splitting the baseline project into more curves (cf. Section 5). Essentially, the instability is caused by the fact that we used inappropriate curves, because this peak was not present in the baseline project.

7. Validity Evaluation

In this section we evaluate the validity of our studies for constructing and evaluating the methods. We use the framework of Wohlin et al. [24].

The main threat to the external validity of our results is the fact that we developed and used the prediction methods in a single organization within Ericsson. Even though the organization is a large one (cf. [20]) and we tested the methods on more than one product, it could still be seen as a threat. In order to minimize the threat, we used the predictions in more than one project and product. The results were sufficiently accurate for both products, and they also led project managers to take immediate actions in order to prevent potential predicted problems.


The main threat to the construct validity is the use of data from manual reporting to construct the prediction models. Although this might be seen as a problem, from our discussions with experts it was clear that the simplicity of creating the predictions was one of the top priorities, hence our decision. The use of test-progress data for predicting defect inflow in the same organization can be found in [19]. In order to minimize the threat of using "random" variables without empirical causal relationships, we performed a short workshop with Ericsson experts. The outcome of that workshop was that there is empirical causality between the predictor and predicted variables.

Another threat to the construct validity is the use of regression methods in our research. Using regression algorithms can be burdened with the problem of overfitting, i.e., fitting the regression equation in short-term predictions (or the curve for long-term predictions) too closely to the baseline project. Overfitting can cause the models to be inapplicable to other projects. We minimized this threat by evaluating our results in new projects and checking the applicability of the models.

The main threat to the internal validity of the study is the completeness of the data and the mortality of data points. It is a rather common situation in industry that data might be missing due to external factors (e.g., vacations, sick leaves). In our case the missing data was handled by removing data points which were incomplete and removing the data points which could potentially be affected by low-quality data (mainly during vacation periods). The number of data points removed was small compared to the data sets (less than 5% of the data points).

Finally, we have not discovered any threats to the conclusion validity, as we used established statistical methods to develop the models and confirmed our findings with expert knowledge in the company.

8. Conclusions

This paper presented two complementary methods for predicting defect inflow in large software projects: short-term and long-term defect inflow prediction. The methods are used for the purpose of project planning and monitoring at Ericsson. The goal of introducing the new methods in this paper is to provide the experts with support for creating the prediction models using statistical methods, based on the data which is already collected in the organization (or which can be collected at reasonable cost). Using linear regression methods resulted in simple and highly cost-efficient methods, which can be seen as a trade-off between prediction accuracy and the cost of predicting. In this paper we tried to address this trade-off by minimizing the number of measurements to be collected and focusing on measurements established and existing in the organization, at the same time minimizing the cost of data reconfiguration. However, in the course of our research new ways of data collection were introduced, which improved the practice and allowed for more accurate predictions.

Based on our experience and the evaluation of these methods at Ericsson, we can recommend using these methods and adjusting them to the local needs of particular organizations. The methods are intended to support projects which are structured around work packages and not sub-projects. Our further work is focused on monitoring the performance of these methods in a larger number of projects and further evaluating their robustness. We also intend to deploy these methods in other departments and organizations to check their external validity.

Acknowledgements. We would like to thank Ericsson AB for support in the study, in particular the experts we had the possibility to work with. We would like to thank the Software Architecture Quality Center for support in this study.


References

[1] W. W. Agresti and W. M. Evanco. Projecting software defects from analyzing Ada designs. IEEE Transactions on Software Engineering, 18(11):988–997, 1992.
[2] S. Amasaki, T. Yoshitomi, O. Mizuno, Y. Takagi, and T. Kikuno. A new challenge for applying time series metrics data to software quality estimation. Software Quality Journal, 13:177–193, 2005.
[3] T. Ball and N. Nagappan. Static analysis tools as early indicators of pre-release defect density. In 27th International Conference on Software Engineering, pages 580–586, St. Louis, MO, USA, IEEE, 2005.
[4] S. Chulani. Constructive quality for defect density prediction: COQUALMO. In International Symposium on Software Reliability Engineering, pages 3–5, 1999.
[5] S. D. Conte, H. E. Dunsmore, and V. Y. Shen. Software Engineering Metrics and Models. Benjamin-Cummings, Menlo Park, CA, 1986.
[6] G. H. Dunteman. Principal Component Analysis. SAGE Publications, 1989.
[7] A. L. Goel. Software reliability models: Assumptions, limitations, and applicability.
[8] D. Houston, D. Buettner, and M. Hecht. Dynamic COQUALMO: Defect profiling over development cycles. In ICSP, pages 161–172, 2009.
[9] M. Klaes, H. Nakao, F. Elberzhager, and J. Munch. Support planning and controlling of early quality assurance by combining expert judgement and defect data – a case study. Empirical Software Engineering, 2009.
[10] L. M. Laird and M. C. Brennan. Software Measurement and Estimation: A Practical Approach. John Wiley and Sons, Hoboken, NJ, 2006.
[11] P. L. Li, J. Herbsleb, and M. Shaw. Forecasting field defect rates using a combined time-based and metrics-based approach: a case study of OpenBSD. In The 16th IEEE International Symposium on Software Reliability Engineering, page 10. IEEE, 2005.
[12] Y. K. Malaiya and J. Denton. Module size distribution and defect density. In 11th International Symposium on Software Reliability Engineering, pages 62–71, San Jose, CA, USA, 2000.
[13] K. Matsumoto, K. Inoue, T. Kikuno, and K. Torii. Experimental evaluation of software reliability growth models. In The 18th International Symposium on Fault-Tolerant Computing, pages 148–153, Tokyo, Japan, IEEE, 1988.
[14] P. Mohagheghi, R. Conradi, O. M. Killi, and H. Schwarz. An empirical study of software reuse vs. defect-density and stability. In 26th International Conference on Software Engineering, pages 282–291. IEEE, 2004.
[15] M. Neil and N. Fenton. Predicting software quality using Bayesian belief networks. In 21st Annual Software Engineering Workshop, pages 217–230, NASA Goddard Space Flight Centre, 1996.
[16] A. M. Neufelder. How to predict software defect density during proposal phase. In National Aerospace and Electronics Conference, pages 71–76, Dayton, OH, USA, 2000.
[17] T. J. Ostrand, E. J. Weyuker, and R. M. Bell. Predicting the location and number of faults in large software systems. IEEE Transactions on Software Engineering, 31(4):340–355, 2005.
[18] H. Petersson, T. Thelin, P. Runeson, and C. Wohlin. Capture-recapture in software inspections after 10 years research: Theory, evaluation and application. Journal of Systems and Software, 72:249–264.
[19] M. Staron and W. Meding. Predicting weekly defect inflow in large software projects based on project planning and test status. Information and Software Technology, 50(7–8):782–796, 2009.
[20] M. Staron, W. Meding, and C. Nilsson. A framework for developing measurement systems and its industrial evaluation. Information and Software Technology, 51(4):721–737, 2009.
[21] P. Tomaszewski and L. Lundberg. Software development productivity on a new platform: an industrial case study. Information and Software Technology, 47(4):257–269, 2005.
[22] P. Tomaszewski and L. Lundberg. The increase of productivity over time: an industrial case study. Information and Software Technology, 48(9):915–927, 2006.
[23] R. E. Walpole. Probability and Statistics for Engineers and Scientists. Prentice Hall, Upper Saddle River, NJ, 7th edition, 2002.
[24] C. Wohlin, P. Runeson, M. Host, M. C. Ohlsson, B. Regnell, and A. Wesslen. Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishing, 2000.