

IEEE TRANSACTIONS ON RELIABILITY, VOL. R-33, NO. 2, JUNE 1984
0018-9529/84/0600-0157$01.00 © 1984 IEEE

Rationale for a Modified Duane Model

Bev Littlewood
Centre for Software Reliability, London

Key Words-Reliability growth, Duane model, Modified Duane model, Software reliability, Design fault, Design debugging.

Reader Aids-
Purpose: Tutorial
Special math needed for explanations: Probability
Special math needed to use results: None
Results useful to: Reliability theoreticians

Abstract-The Duane model for reliability growth involves a rate function which is an inverse power law and has an "infinite" value at t = 0. The model is usually motivated entirely empirically. Here a probabilistic rationale is proposed via a reliability growth model involving the removal of design faults. This rationale results in a modified power law rate, finite at the origin. A wider class of rate functions should be investigated for NHPP models of reliability growth.

1. INTRODUCTION

A non-homogeneous Poisson process (NHPP) whose rate function follows an inverse power law has come to be called the Duane model. This follows the claim of Duane [1] that plots of the empirical failure rate against t on log-log paper were close to linear for real reliability growth data. Crow [2] added the NHPP assumption to the Duane postulate and obtained properties of the maximum likelihood estimates of the rate parameters.

This model is sometimes referred to improperly as the Weibull process, presumably because the rate of occurrence of failure (ROCOF) has the same mathematical form as the hazard rate function of the Weibull distribution. Ascher argues convincingly [3] that this terminology has caused great confusion between the stochastic process, which is the topic of this paper, and the (Weibull) probability distribution. I agree with Ascher that this terminology should be discouraged.

Finkelstein [4], among others, has pointed out that the ROCOF of the Duane model has two disadvantages: It is infinite at $t = 0$ and zero at $t = \infty$. Real systems have neither of these properties.

The second of these disadvantages is easily corrected. The ROCOF can be modified by adding a constant term to represent the rate of occurrence of ineradicable sources of failure; essentially superposing a simple Poisson process on the NHPP. This problem will not be considered here.

The first problem is more serious. It relates to the early part of the observed history, and so will always be seen. I propose a modification to the Duane ROCOF to overcome this difficulty. My main intention, however, is to give a rationale for the use of models of this type via an analysis of the fault-removal process. I hope that the understanding which comes from this more detailed modeling will suggest more-plausible models.

Notation

$K(t)$        number of failures in (0, t)
$M(t)$        mean value function, $E\{K(t)\}$
$\lambda(t)$  ROCOF, $M'(t)$
$X_i$         time to detection of fault i
$\Phi_i$      random variable rate of occurrence of fault i
$\phi_i$      realisation of $\Phi_i$
$\phi$        common value of the $\phi_i$ in the Jelinski-Moranda model
$X_{(i)}$     order statistics of random variables $X_i$
$T_i$         inter-failure times: $T_i = X_{(i)} - X_{(i-1)}$
NHPP          non-homogeneous Poisson process
ROCOF         rate of occurrence of failures

Other, standard notation is given in "Information for Readers & Authors" at rear of each issue.

2. RATIONALE FOR A MODIFIED DUANE MODEL

Reliability growth takes place as a result of the removal of sources of failure, which will be called faults. Stochastic modeling of the fault-removal process has been much more common in software reliability [5, 6] than in the hardware literature. In fact the observations here concern a model [6] which was originally proposed for software reliability growth. However, the removal of design faults from hardware has many similarities, and it is likely that the software reliability growth literature will prove useful in this area.

These two assumptions represent the idealisation of the fault-removal process:

1. The system begins life containing N faults.
2. When a fault reveals itself by causing a system failure, the fault is instantaneously removed and the system immediately returned to its operating environment.

For an arbitrary labeling of the faults, let the random variable $X_i$ represent the total time on test (or in use) until fault i reveals itself by causing a failure (and hence being removed). The order statistics $X_{(1)}, X_{(2)}, X_{(3)}, \ldots$ are the times of occurrence of the 1st, 2nd, 3rd, ... failures. The inter-event time random variables $T_1, T_2, T_3, \ldots$ are the spacings:

$$T_1 = X_{(1)}, \qquad T_2 = X_{(2)} - X_{(1)}, \qquad T_3 = X_{(3)} - X_{(2)}, \ \text{etc.} \tag{1}$$
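The relationship in (1) between the detection times, their order statistics and the spacings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `spacings` and the sample values are hypothetical and do not come from the paper, and no particular distribution for the $X_i$ is assumed at this point.

```python
def spacings(detection_times):
    """Return inter-failure times T_1, T_2, ... from unordered detection times X_i."""
    ordered = sorted(detection_times)   # order statistics X_(1) <= X_(2) <= ...
    previous = 0.0                      # observation starts at time 0, so X_(0) = 0
    result = []
    for x in ordered:
        result.append(x - previous)     # T_i = X_(i) - X_(i-1)
        previous = x
    return result


# Hypothetical detection times for three arbitrarily labelled faults.
example_times = [5.0, 2.0, 9.0]
example_spacings = spacings(example_times)
print(example_spacings)
```

The labelling of the faults is irrelevant here: only the sorted times matter, and the spacings always sum to the largest detection time.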


The joint distribution of the random variables $X_i$ (i = 1, 2, ..., N) completely determines the structure of the observed stochastic process. This structure is particularly simple where the $X_i$'s are s-independent and identically distributed. The first model considered here, due to Jelinski & Moranda [5], assumes:

3. The random variables $X_i$ are iid with

$$\mathrm{Sf}\{x\} = \exp(-\phi x), \qquad (x > 0,\ \phi > 0). \tag{2}$$

The random variables $T_i$ (i = 1, 2, ..., N) are s-independent with

$$\mathrm{Sf}\{t_i\} = \exp[-(N - i + 1)\phi t_i] \tag{3}$$

The $T_i$'s are stochastically ordered with:

$$T_i \geq_{st} T_{i-1} \tag{4}$$

which is one way that this model represents reliability growth.

Let the random variable $K(t)$ denote the number of failures in (0, t). The mean value function and ROCOF of the point process are

$$M(t) = N(1 - e^{-\phi t}) \tag{5}$$

$$\lambda(t) = N\phi e^{-\phi t} \tag{6}$$

This exponential decay of the rate of the process contrasts with the inverse power law of the Duane postulate. The process is not a NHPP. In fact, (6) is an unconditional rate in the sense that $\lambda(t)\,dt$ is the unconditional probability of a failure in (t, t + dt). The conditional rate of the process is very different: If we know that i failures have occurred then the conditional rate is:

$$(N - i)\phi. \tag{7}$$

The difference between (6) and (7) can be explained by observing that (5) is obtained by averaging over all sample paths of the process which could have occurred in (0, t). Ascher & Feingold [7] carefully discuss some of the important issues concerning interpretation of rates.

The main disadvantage of this model lies in its assumption of a common rate $\phi$ for each fault. This implies that all faults are of the same size, in the sense that they all have the same rate of occurrence. On the contrary, it seems likely that for both hardware and software faults the rates are different. Indeed, some recent empirical studies by Nagel & Skrivan [8] suggest that, at least for software, the rates can differ by several orders of magnitude.

A recent model by Littlewood overcomes these difficulties [6]. The practical motivation of the new model is to overcome the tendency of the Jelinski-Moranda model to give reliability predictions which are too optimistic. It has been suggested that this optimism could be a consequence of assuming the fault rates to be equal when in fact they differ markedly [11].

The new model assumes, like the Jelinski-Moranda model [5], that the system begins life with N faults. The random variables $X_i$, representing the times to detection of the different faults, are assumed to be s-independent, as before, with

$$\mathrm{Sf}\{x_i \mid \Phi_i = \phi_i\} = \exp(-\phi_i x_i) \tag{8}$$

Here the $\phi_i$ represent the (different) rates of the different faults. The novelty is that the values $\phi_1, \phi_2, \ldots, \phi_N$ are assumed to be realisations of iid random variables, $\Phi_i$. In what follows these are assumed to have a gamma distribution:

$$\mathrm{pdf}(\phi) = \frac{\beta^{\alpha}\phi^{\alpha-1}e^{-\beta\phi}}{\Gamma(\alpha)}, \qquad (\alpha, \beta > 0) \tag{9}$$

The $X_i$ are iid with the common Pareto distribution:

$$\mathrm{pdf}(x) = \frac{\alpha\beta^{\alpha}}{(\beta + x)^{\alpha+1}} \tag{10}$$

As before, the order statistics $X_{(1)}, X_{(2)}, \ldots$ are the times of occurrence of the failures, and the spacings are the inter-failure time random variables $T_1, T_2, \ldots$. These latter are conditionally Pareto distributed:

$$\mathrm{pdf}(t_i \mid (i-1)\text{th failure occurred at time } \tau) = \mathrm{pdf}(t_i \mid X_{(i-1)} = \tau) = \frac{(N - i + 1)\alpha(\beta + \tau)^{(N-i+1)\alpha}}{(\beta + \tau + t_i)^{(N-i+1)\alpha+1}} \tag{11}$$

Again the $T_i$'s are stochastically ordered, giving reliability growth. However, they are not s-independent.

The mean value function and, by differentiation, the unconditional ROCOF of this stochastic process are:

$$M(t) = N\left[1 - \left(\frac{\beta}{\beta + t}\right)^{\alpha}\right] \tag{12}$$

$$\lambda(t) = N\alpha\beta^{\alpha}(\beta + t)^{-\alpha-1} \tag{13}$$

This is the required inverse power law rate with finite value ($N\alpha/\beta$) at t = 0. Also, as $\alpha, \beta \to \infty$ with $\alpha/\beta$ held fixed, this model becomes the Jelinski-Moranda model.

3. COMMENTARY

A rationale has been presented for the modified Duane postulate by arguing that times to discovery of faults are
s-independently exponentially distributed with rates (fault sizes) which form a random sample from a gamma distribution.

The model described here (and in more detail in [6]), and the Jelinski-Moranda model, are examples of order-statistic stochastic models for reliability growth. They bear some relationship to the common problem in reliability in which N similar devices are placed on test simultaneously. Each device is observed until it fails, in which case its life X is recorded, or the test is stopped (by one of a variety of stopping rules) [9]. N is known and problems centre around the difficulties of inference, based on the first few observed order statistics, for different stopping rules and life-time distributions. In the situations described in this paper interest centres upon the stochastic process of observed failures in real time and, of course, N is unknown.

This last fact causes the difficulties in estimating the parameters of the models [5, 6]. For this reason a NHPP with rate (6) or (13) might be an attractive approximation to the exact stochastic process. At least inference for the NHPP is straightforward.

The following is one way in which a NHPP could be justified. Let N be a Poisson random variable with mean M, and $X_i$ be iid with pdf $f(x)$. The unconditional stochastic point process (ie, mixed over N) will be a NHPP (see, for example, [10]) with rate:

$$\lambda(t) = M f(t). \tag{14}$$

In particular, for the stochastic process described by (8)-(13) the NHPP has rate

$$\lambda(t) = M\alpha\beta^{\alpha}(\beta + t)^{-\alpha-1} \tag{15}$$

ie, the modified inverse power law. Presumably such a mixture over a Poisson distribution for N would only have an interpretation for a Bayes subjectivist.

If these arguments are accepted, the particular inverse power law form of the rate function arises solely because of the gamma mixing function (9). Different mixtures produce different rate functions. This suggests that if we are to continue to use NHPP models for reliability growth, we should consider a wider class of rate functions than hitherto. The particular choice of model from this wider class might then be made according to its ability to predict future behaviour of the system under examination. Little attention has been paid to this important problem of choosing among stochastic process models according to their predictive ability [11].

It is known that the 'true' rate changes by a jump when a failure occurs (and a fault is removed); these interventions are the source of reliability growth. Consequently, these processes do not have s-independent increments, and there is a sense in which the adoption of a NHPP is 'wrong'. It seems to be an open question how closely a process with event-altered rate can be approximated using a process with s-independent increments and time-dependent rate.

These comments on the NHPP do not detract from the rationale for the Duane postulate, which originally concerned only the rate function. Crow [2] seems to have been the first to have coupled this with the extra assumptions of the NHPP. These extra assumptions (principally the assumption of s-independent increments) do not appear to have been justified in the literature.

ACKNOWLEDGMENT

This work was partially supported by NASA Langley Research Center under grant NAG-1-179, and partially by US Army European Research Office under grant DAERO-79-0038.

REFERENCES

[1] J. T. Duane, "Learning curve approach to reliability monitoring", IEEE Trans. Aerospace, vol 2, 1964, pp 563-566.
[2] L. H. Crow, "Confidence interval procedures for reliability growth analysis", Tech. Report 197, US Army Material Systems Analysis Activity, Aberdeen, Maryland, 1977.
[3] H. Ascher, "Comments on: Models for reliability of repaired equipment", IEEE Trans. Reliability, vol R-28, 1979 Jun, p 119.
[4] J. M. Finkelstein, "Starting and limiting values for reliability growth", IEEE Trans. Reliability, vol R-28, 1979 Jun, pp 111-114.
[5] Z. Jelinski, P. B. Moranda, "Software reliability research", Statistical Computer Performance Evaluation, Ed. W. Freiberger. Academic Press, 1972, pp 465-484.
[6] B. Littlewood, "Stochastic reliability growth: a model for fault removal in computer programs and hardware designs", IEEE Trans. Reliability, vol R-30, 1981 Oct, pp 313-320.
[7] H. Ascher, H. Feingold, Repairable Systems Reliability: Modelling, Inference, Misconceptions and their Causes. Marcel Dekker (to appear).
[8] P. M. Nagel, J. A. Skrivan, "Software reliability: repetitive run experimentation and modelling", BCS-40399, Boeing Computer Services Company, Seattle, Washington, 1981 Dec.
[9] N. R. Mann, R. E. Schafer, N. D. Singpurwalla, Methods for Statistical Analysis of Reliability and Life Data, John Wiley & Sons, 1974.
[10] N. Langberg, N. D. Singpurwalla, "A unification of some software reliability models via the Bayesian approach", Tech. Mem TM-66571, School of Eng. and Applied Science, George Washington University, Washington, DC, 1981.
[11] P. A. Keiller, B. Littlewood, D. R. Miller, A. Sofer, "On the quality of software reliability predictions", Proc. NATO ASI on Electronic Systems Effectiveness and Life Cycle Costing (Norwich, UK, 1982), Springer, 1983, pp 441-460.

AUTHOR

Dr. B. Littlewood, Director; Centre for Software Reliability; The City University; Northampton Square; London, EC1V 0HB, ENGLAND.

Bev Littlewood has BSc and MSc degrees from the University of London in Mathematics and Statistics, respectively, and a PhD from The City University, London in Statistics and Computer Science. His teaching and research interests are in Applied Probability, particularly Software Reliability. He is Director of the recently formed Centre for Software Reliability and is a Fellow of the Royal Statistical Society.

Manuscript TR83-009 received 1983 January 26; revised 1984 January 25.
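As a numerical companion to equations (6), (12) and (13), the following Python sketch checks three properties claimed in Section 2: the modified rate is finite at the origin with value $N\alpha/\beta$, the rate (13) is the derivative of the mean value function (12), and the model approaches the Jelinski-Moranda rate (6) when $\alpha$ and $\beta$ grow with $\alpha/\beta$ fixed. The function names and all parameter values are hypothetical choices made for illustration; they are not taken from the paper. Equation (13) is coded in the algebraically equivalent form $N(\alpha/\beta)\,(\beta/(\beta+t))^{\alpha+1}$, an assumption made here only to avoid floating-point overflow for large $\alpha$.

```python
import math


def jm_rocof(t, n, phi):
    # Equation (6): lambda(t) = N * phi * exp(-phi * t)
    return n * phi * math.exp(-phi * t)


def modified_duane_mean(t, n, alpha, beta):
    # Equation (12): M(t) = N * [1 - (beta / (beta + t))**alpha]
    return n * (1.0 - (beta / (beta + t)) ** alpha)


def modified_duane_rocof(t, n, alpha, beta):
    # Equation (13), rewritten as N * (alpha/beta) * (beta/(beta+t))**(alpha+1)
    # so that beta**alpha is never formed explicitly.
    return n * (alpha / beta) * (beta / (beta + t)) ** (alpha + 1.0)


# Hypothetical parameter values, chosen only for illustration.
N_FAULTS, ALPHA, BETA = 50, 2.0, 100.0

# Finite rate at the origin; should equal N * alpha / beta.
rate_at_origin = modified_duane_rocof(0.0, N_FAULTS, ALPHA, BETA)

# Central-difference slope of M(t) at t0, to compare with lambda(t0).
t0, h = 30.0, 1e-4
numeric_slope = (
    modified_duane_mean(t0 + h, N_FAULTS, ALPHA, BETA)
    - modified_duane_mean(t0 - h, N_FAULTS, ALPHA, BETA)
) / (2.0 * h)

# Large alpha and beta with alpha / beta = PHI held fixed.
PHI, K = 0.02, 1.0e4
limit_rate = modified_duane_rocof(t0, N_FAULTS, K, K / PHI)

print(rate_at_origin, N_FAULTS * ALPHA / BETA)
print(numeric_slope, modified_duane_rocof(t0, N_FAULTS, ALPHA, BETA))
print(limit_rate, jm_rocof(t0, N_FAULTS, PHI))
```

Each printed pair should agree closely. The third pair agrees only approximately, since K is large but finite.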