
IEEE TRANSACTIONS ON RELIABILITY, VOL. R-28, NO. 3, AUGUST 1979

Software Reliability Model for Modular Program Structure

Bev Littlewood
The City University, London

Key Words - Software reliability measurement, Software reliability costs, Modular software reliability, Semi-Markov process, Asymptotic Poisson process, Software failure costs, Software life-cycle costs, Availability

Reader Aids -
Purpose: Widen state of the art
Special math needed for explanations: Semi-Markov processes
Special math needed for results: None
Results useful to: Software and reliability engineers

Abstract - The paper treats a modular program in which transfers of control between modules follow a semi-Markov process. Each module is failure-prone, and the different failure processes are assumed to be Poisson. The transfers of control between modules (interfaces) are themselves subject to failure. The overall failure process of the program is described, and an asymptotic Poisson process approximation is given for the case when the individual modules and interfaces are very reliable. A simple formula gives the failure rate of the overall program (and hence mean time between failures) under this limiting condition. The remainder of the paper treats the consequences of failures. Each failure results in a cost, represented by a random variable with a distribution typical of the type of failure. The quantity of interest is the total cost of running the program for a time t, and a simple approximating distribution is given for large t. The parameters of this limiting distribution are functions only of the means and variances of the underlying distributions, and are thus readily estimable. A calculation of program availability is given as an example of the cost process. There follows a brief discussion of methods of estimating the parameters of the model, with suggestions of areas in which it might be used.

1. INTRODUCTION

To what extent can software engineers avail themselves of the large body of reliability theory developed for hardware systems during the past 20 years? The debate on this question continues [1], with the different points of view finding their expression in competing models for software reliability measurement [2 - 5]. My own work leads me to think that the differences between software and hardware have deep and fundamental implications - so fundamental, in fact, that hardware concepts should always be positively justified rather than adopted unthinkingly [6]. Other workers, notably Musa and Shooman, emphasise the need for compatibility between hardware and software measures in the search for unified models of system reliability [1].

On one thing, however, there is no disagreement: the need for a theory of software reliability as powerful and comprehensive as that which exists for hardware. There are two main reasons for this new awareness.

1) Computer systems are being increasingly depended upon to perform in situations where failures can have catastrophic costs - in both human and economic terms. Air and road vehicle traffic control, space and defence systems, manufacturing process control are just a few examples of such application areas.

2) At the same time, there has occurred a revolution in hardware technology. The cost per bit of storage and the execution cost per instruction have decreased dramatically, and continue to do so. Software, however, being highly labour-intensive, can probably never achieve such economies, in spite of the great effort which continues to be made in improving management and programming techniques [7].

The result of these two trends is that a higher and higher percentage of the cost of computing systems goes to software development. One can envisage a time when "you buy the software, and we will throw in the hardware for free".

When we look in detail at the history of hardware reliability theory there appear to be two recognisable, but overlapping, strands. On the one hand, a great deal of effort has gone into the study of 'black-box' models. These range from simple Poisson process models, via more versatile failure time distributions such as the Weibull, to general theories involving increasing (decreasing) failure rate which are designed to represent our understanding of the ageing process [8]. The other area of research has centred upon the structures which can be built from failure-prone components. One of the great achievements here has been the recognition that in certain circumstances it is possible to build a system with any required reliability using components of any given unreliability.

The majority of work in software reliability so far has concentrated on the former, black-box approach. Even here, it seems that software has quite unique properties (e.g. lack of natural degradation) which modeling has attempted to emulate [2 - 5]. Possibly motivated by a desire to learn to walk before we run, comparatively little attention has so far been given to the problem of software structure. An exception is an interesting paper by Buzen, et al. [9], which proves the superiority of a type of virtual machine organisation under quite plausible assumptions. Shooman considered the reliability measurement of structured or modular programs, using the frequencies with which paths are run [10]. My own previous work on this problem [11, 12] adopted a modular approach to the software and attempted to describe the structure via its dynamic behaviour - using a Markov assumption. The purpose of the current paper is to present results which extend and generalise this work.

In section 2 the software structure and failure model are described. The program comprises a finite number of modules and exchanges of control between these follow a semi-Markov law. The Markov property ensures that the probability of calling a given module from another module is a function only of the calling and called modules. This quite stringent assumption can be relaxed, and a discussion of its implications for real programs will be given later.



The time spent in each module may be a random variable with any distribution (hence semi-Markov), characteristic of that module and the destination module. This assumption is more realistic than that of my earlier work [11, 12], where it was necessary to assume that exits from modules took place with constant transition rates.

The failure process itself is designed to be sufficiently simple to be mathematically tractable and at the same time to reflect the kind of problems which arise in real modular programs. Individual modules, when they are executing, fail with constant failure (hazard) rates which differ from module to module. Since the modules are analogous to the components of hardware engineering, it is envisaged that their failure rates would be estimated before they are integrated into the overall program by, for example, simulation and the use of the black-box software reliability models [2 - 5]. It is an unfortunate fact of life that the integration phase usually reveals more failure modes than had been suspected during the time the individual modules were under test. These interfacing failures are included in the model by assuming that the transfers of control between modules are themselves failure-prone: each call of one module by another may fail with a probability which depends on both modules.

Section 3 presents the limiting failure process. Since the exact result is very complex it is unlikely to be practical. The limiting operation used to obtain the asymptotic result of section 3 is extremely plausible: namely that the individual module failure rates are much smaller than the switching rates between modules. The asymptotic result itself - a Poisson process for the program failures - is easy to use, since the only parameter is a simple function of the model parameters.

In section 4 the idea of failure cost is introduced. The early black-box models of software reliability suffered from a naive assumption which has received little notice. Without exception these models studied only the failure point process, using MTTF, failure rates, etc. as measures of program quality. This approach assumes that all failures are of equal importance: a point of view which is even more ludicrous for software than it is for hardware. One important ultimate goal of this kind of research is the provision of means whereby software life-cycle costs can be estimated and incorporated into a model of costs and benefits for the life cycle. It is only via such a rational quantitative approach that we can hope to arrive at sensible planning decisions at an early enough stage to be effective. Equally clearly, a model which merely counts failures will not be adequate to this task.

A simple cost structure assumes that failures in modules and at interfaces incur a random variable cost with a distribution characteristic of the module or interface. The overall cost process in time is a very complicated sequence of these random variables. Interest will centre upon the total cost in a given period of execution, i.e. the sum of these random variables.

Because of their complexity, it is again unlikely that the exact results are practical. Section 5 gives an approximate result. For long working periods, the distribution of the total cost incurred from failures is s-normal (Gaussian), with parameters which are functions merely of the means and variances of the individual failure cost distributions. This last result is of great practical importance, since, although it is unlikely that the exact distributions of costs for the different failure modes will be known, it might be expected that their first two moments will be known, or estimable, in spite of the paucity of information available about software failure costs. Although programming managers are aware that not all failures have similarly severe consequences, there is some reluctance to quantify these differences. This is not the place to discuss the matter in detail, beyond registering a plea for more detailed data on failure consequences.

Section 6 shows how the model can be used to estimate a program's availability. The simple expedient of allowing costs to represent the down-times caused by the different types of failure results in a very simple formula for program availability.

Section 7 discusses application of the model and suggests techniques by which model parameters might be estimated. Some attention is given to the severity of the underlying assumptions, and the degree to which they might be relaxed whilst retaining the powerful simplicity of the results.

My aim throughout the paper has been to give an overview of the model whilst omitting detailed mathematics. A Supplement [13] is the complete paper, and contains proofs of the results reported here, together with procedures for computing the model parameters.

2. POINT PROCESS OF FAILURES

Consider a program which comprises R modules, denoted 1, 2, ..., R. Exchanges of control ('transitions' in the terminology of stochastic processes) take place among these, in such a way that at each epoch one and only one module is occupied, according to a semi-Markov scheme characterised by the functions:

F_ik(t) = Pr{program terminates occupation of module i by calling (entering) module k before τ + t | program entered i at τ}

for i, k = 1, 2, ..., R; this is independent of τ.

Although this set of functions describes the stochastic behaviour of the program, it is not always the easiest way of doing so. Since there are two elements of randomness in a semi-Markov process, it is intuitively appealing to describe them separately.

1) There is uncertainty about the module which will be called next by a given module (ignoring the time taken for this call to occur): this part of the process is called the embedded Markov chain and is characterised by the stochastic matrix P = {p_ij}, where

p_ij = Pr{program transits from module i to module j}.


2) The other uncertainty concerns the sojourn times in modules. Unlike a Markov process, where exits from modules have a constant transition rate, in the semi-Markov process the sojourn time in module i has a general distribution which depends upon i and j (the module entered from i). This generalisation is of particular importance in our application to software, since programs exist where module sojourn times are not exponentially distributed: for example, the time spent in a module may be the same constant for each visit. The main results of the paper, i.e. the simple approximating distributions of sections 3 and 5, depend on these sojourn time distributions only via their first two moments. We shall denote these by μ_ij and μ_ij^(2). Then a sufficient description of the program's dynamic behaviour for our purposes will be given by these, together with the p_ij's.

We now impose failure processes upon this underlying semi-Markov program behaviour. When in module i, we assume failures occur according to a Poisson process, parameter ν_i. These ν_i are the module failure rates which would in practice be estimated, before integration, by module testing. When the system is integrated, the interfaces between modules introduce another potential source of failure. We assume that when module i calls module j there will be a probability λ_ij of a failure's occurring (i ≠ j).

Our interest initially centres upon the total number of failures of the integrated program in time interval (0, t), denoted by N(t). This random variable is the sum of the failures in the different modules during their sojourn times in (0, t), together with the interfacing failures. The behaviour of N(t) as t increases is very complex. It is possible to obtain a complete formal description of this failure point process, but this will be relegated to the Supplement [13]. In any case, such a complete description requires knowledge of the distributions of sojourn times; these are unlikely to be available in practice.
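As a concrete illustration of this structure, the short Python sketch below specifies a hypothetical three-module program by its embedded matrix P, mean sojourn times μ_ij, module failure rates ν_i and interface failure probabilities λ_ij, and simulates the resulting failure point process N(t). The module identities and all numerical values are illustrative assumptions, not data from the paper, and exponential sojourn times are used purely for convenience (any distribution with the stated mean is allowed by the model).

```python
import random

# Hypothetical three-module program; all numbers are illustrative only.
P = {1: {2: 0.7, 3: 0.3},          # p_ij: embedded Markov chain
     2: {1: 0.4, 3: 0.6},
     3: {1: 1.0}}
MU = {(1, 2): 0.02, (1, 3): 0.05,  # mu_ij: mean sojourn time in i before an i->j transfer (seconds)
      (2, 1): 0.01, (2, 3): 0.04,
      (3, 1): 0.03}
NU = {1: 1e-4, 2: 5e-5, 3: 2e-4}   # nu_i: Poisson failure rate while executing module i
LAM = {(i, j): 1e-6 for i in P for j in P[i]}  # lambda_ij: interface failure probability per i->j call

def simulate_failures(t_end, start=1, seed=1):
    """Simulate the program up to time t_end; return module and interface failure records."""
    random.seed(seed)
    t, module, failures = 0.0, start, []
    while t < t_end:
        nxt = random.choices(list(P[module]), weights=list(P[module].values()))[0]
        sojourn = random.expovariate(1.0 / MU[(module, nxt)])  # exponential sojourn, for the sketch only
        # intra-module failures: Poisson process with rate nu_i during the sojourn
        u = t + random.expovariate(NU[module])
        while u < min(t + sojourn, t_end):
            failures.append(('module', module, u))
            u += random.expovariate(NU[module])
        t += sojourn
        if t < t_end and random.random() < LAM[(module, nxt)]:
            failures.append(('interface', (module, nxt), t))   # failure on the i->j transfer
        module = nxt
    return failures

print(len(simulate_failures(t_end=1e5)), "failures observed, i.e. N(t) for t = 1e5")
```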

3. ASYMPTOTIC FAILURE PROCESS

A very plausible assumption for a modular program is that the individual failure rates (inter- and intra-module) are much smaller than the switching rates between modules. Alternatively, the times between failures are going to tend to be much larger than the times between exchanges of control; we would expect many exchanges of control to take place between successive program failures. A program for which this assumption could not be made would be enormously unreliable, and it is unlikely that it would ever reach system integration.

Theorem 1: The failure point process of the integrated program is asymptotically a Poisson process with rate parameter

\[
\frac{\sum_{i,j} \pi_i\, p_{ij}\, (\nu_i \mu_{ij} + \lambda_{ij})}{\sum_{i,j} \pi_i\, p_{ij}\, \mu_{ij}} \qquad (3.1)
\]

as all the λ's and ν's become vanishingly small, where π = {π_i} is the equilibrium vector of P (i.e. it is the solution of π·P = π together with Σ_i π_i = 1).

A proof of this result can be obtained via general theories of thinning of point processes [14]. The rate of the Poisson process can be obtained by a direct probabilistic argument, writing (3.1) as:

\[
\sum_{i,j} \left[ \left( \frac{\pi_i\, p_{ij}\, \mu_{ij}}{\sum_{k,l} \pi_k\, p_{kl}\, \mu_{kl}} \right) \nu_i + \left( \frac{\pi_i\, p_{ij}}{\sum_{k,l} \pi_k\, p_{kl}\, \mu_{kl}} \right) \lambda_{ij} \right] \qquad (3.2)
\]

The first term in parentheses is the (limiting) proportion of time the program spends in module i when the sojourn is terminated by entering module j. Summing over j, therefore, gives the total (limiting) proportion of time spent in i: thus the first term in (3.2) is the intra-module failure rate of the program. The second term in parentheses is the (limiting) number of i to j transfers of control per unit time. Thus the second term in (3.2) represents the inter-module, or interfacing, failure rate.

The sojourn time distributions enter only via their means, which might be expected to be readily estimable. In many applications even greater simplification can be achieved. If we write (3.2) as:

\[
\sum_i a_i \nu_i + \sum_{i,j} b_{ij} \lambda_{ij} \qquad (3.3)
\]

where a_i represents the proportion of time spent in module i, and b_ij is the frequency of i → j transfers of control, it will often be possible to obtain the a's and b's directly. Parameter estimation is discussed in section 7.

The usefulness of Theorem 1 to the software engineer lies in its extreme simplicity. It provides a rationale for the use of a great deal of conventional reliability theory: exponentially distributed inter-failure times, mean time to failure as a measure of program quality, etc.
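To make Theorem 1 concrete, the sketch below evaluates (3.1), equivalently (3.3), for the same hypothetical three-module program used earlier (all values are illustrative assumptions): the equilibrium vector π is found from π·P = π by power iteration, and the rate is split into its intra-module and interfacing parts.

```python
# Hypothetical parameters (illustrative only); same structure as the earlier sketch.
P = {1: {2: 0.7, 3: 0.3}, 2: {1: 0.4, 3: 0.6}, 3: {1: 1.0}}
MU = {(1, 2): 0.02, (1, 3): 0.05, (2, 1): 0.01, (2, 3): 0.04, (3, 1): 0.03}
NU = {1: 1e-4, 2: 5e-5, 3: 2e-4}
LAM = {k: 1e-6 for k in MU}

def equilibrium(P, iters=500):
    """pi solving pi P = pi, sum(pi) = 1, by power iteration (P assumed irreducible, aperiodic)."""
    pi = {i: 1.0 / len(P) for i in P}
    for _ in range(iters):
        pi = {j: sum(pi[i] * P[i].get(j, 0.0) for i in P) for j in P}
    return pi

pi = equilibrium(P)
denom = sum(pi[i] * P[i][j] * MU[(i, j)] for (i, j) in MU)                    # mean time per transition
a = {i: sum(pi[i] * P[i][j] * MU[(i, j)] for j in P[i]) / denom for i in P}  # a_i: proportion of time in i
b = {(i, j): pi[i] * P[i][j] / denom for (i, j) in MU}                        # b_ij: i->j transfers per unit time

rate = sum(a[i] * NU[i] for i in P) + sum(b[ij] * LAM[ij] for ij in LAM)      # eq. (3.3)
print("asymptotic program failure rate:", rate, " approximate MTTF:", 1.0 / rate)
```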


4. FAILURE COSTS

The program behaves as before, and we shall retain the same notation as in previous sections for the semi-Markov process. In addition, we assume that each failure results in a random variable cost. These random variables will be assumed to have distributions characteristic of the module or interface in which the failure occurs. Let Y_i represent the cost of a failure in module i, with Cdf G_i(y), and Y_ik represent the cost of a failure when module i calls module k (an i → k transition), with Cdf H_ik(y). Denote by Y_i(t) the total cost of failures of module i during (0, t), and Y_ik(t) the total cost of failures of the i → k interface. We are interested in the total program cost:

\[
Y(t) = \sum_{i=1}^{R} Y_i(t) + \sum_{i,k=1}^{R} Y_{ik}(t). \qquad (4.1)
\]

An exact description of the overall failure cost process, Y(t), is extremely complex (see [13] for details). It is unlikely that even the mean total cost of failures during (0, t) will be simple enough to be of practical use. More importantly, it is unlikely that the functions F, G, H, which are required for these exact results, will be known. The following approximating result is important because it is much simpler than the exact theory, and it depends only upon the means and variances of the defining distributions.
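A minimal sketch of this cost structure, again with purely hypothetical figures and with exponential cost distributions chosen only for illustration (the theory uses only their moments), attaches a sampled cost to each failure and sums them as in (4.1):

```python
import random

# Hypothetical cost model (illustrative only): G_i and H_ij taken exponential for the sketch.
COST_MODULE = {1: 50.0, 2: 10.0, 3: 200.0}                                      # mean of Y_i (cost units)
COST_INTERFACE = {(1, 2): 30.0, (1, 3): 30.0, (2, 1): 30.0, (2, 3): 30.0, (3, 1): 30.0}  # mean of Y_ij

def total_cost(failures, seed=2):
    """Y(t): sum of one sampled cost per failure, eq. (4.1).
    `failures` is a list of records like the output of the earlier simulate_failures sketch."""
    random.seed(seed)
    y = 0.0
    for kind, where, _epoch in failures:
        mean = COST_MODULE[where] if kind == 'module' else COST_INTERFACE[where]
        y += random.expovariate(1.0 / mean)
    return y

# usage with a tiny hand-made failure list
print(total_cost([('module', 1, 3.2), ('interface', (2, 3), 7.9), ('module', 3, 11.0)]))
```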

5. AN APPROXIMATE RESULT

Theorem 2: The cost process Y(t) is asymptotically s-normally distributed as t → ∞, i.e.

\[
\frac{Y(t) - \rho t}{\sigma\, t^{1/2}} \xrightarrow{d} N(0, 1), \qquad (5.1)
\]

where ρ is the (asymptotic) mean cost incurred per unit time for the integrated program and is:

\[
\rho = \frac{\sum_{i,j} \pi_i\, p_{ij}\, (\nu_i \mu_{ij}\, m_i + \lambda_{ij}\, m_{ij})}{\sum_{i,j} \pi_i\, p_{ij}\, \mu_{ij}} \qquad (5.2)
\]

where m_i and m_ij are the means of Y_i, Y_ij respectively. Similarly σ² can be thought of as the asymptotic variance of the cost per unit time. It can be shown that σ² depends upon the G and H cost distributions only via their first and second moments. Details of a computational procedure for σ² are contained in the Supplement [13].
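The following sketch evaluates ρ from (5.2) for the hypothetical program used in the earlier sketches and, taking a value of σ as given (its computation requires the procedure in the Supplement [13]), places an approximate upper tolerance bound on the total cost Y(t); all figures are illustrative assumptions.

```python
import math

# Hypothetical parameters (illustrative only), as in the earlier sketches.
P = {1: {2: 0.7, 3: 0.3}, 2: {1: 0.4, 3: 0.6}, 3: {1: 1.0}}
MU = {(1, 2): 0.02, (1, 3): 0.05, (2, 1): 0.01, (2, 3): 0.04, (3, 1): 0.03}
NU = {1: 1e-4, 2: 5e-5, 3: 2e-4}
LAM = {k: 1e-6 for k in MU}
M_MOD = {1: 50.0, 2: 10.0, 3: 200.0}   # m_i: mean cost of a module-i failure
M_INT = {k: 30.0 for k in MU}          # m_ij: mean cost of an i->j interface failure

def equilibrium(P, iters=500):
    pi = {i: 1.0 / len(P) for i in P}
    for _ in range(iters):
        pi = {j: sum(pi[i] * P[i].get(j, 0.0) for i in P) for j in P}
    return pi

pi = equilibrium(P)
denom = sum(pi[i] * P[i][j] * MU[(i, j)] for (i, j) in MU)
# eq. (5.2): asymptotic mean cost per unit time
rho = sum(pi[i] * P[i][j] * (NU[i] * MU[(i, j)] * M_MOD[i] + LAM[(i, j)] * M_INT[(i, j)])
          for (i, j) in MU) / denom

t = 1e6        # planned operating time
sigma = 0.5    # assumed asymptotic std deviation of cost per unit time (Supplement procedure not reproduced)
z95 = 1.645    # upper 95% point of N(0, 1)
print("s-expected total cost:", rho * t)
print("approximate 95% upper bound on Y(t):", rho * t + z95 * sigma * math.sqrt(t))
```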

6. AVAILABILITY

There are two definitions of availability in common use. Pointwise availability is the probability that the system will be able to operate satisfactorily at a given epoch t. Interval availability is the s-expected fraction of a given interval (a, b) that the system will be able to operate satisfactorily. These definitions come from hardware reliability and there are grounds for believing they should be applied with care to software [6]. For the present model we can overcome any difficulties by assuming (as has been done) that the means of the underlying model distributions all exist: viz, they are not infinite.

Exact results are, as before, too difficult to obtain; so we consider the asymptotic theory again. The pointwise availability for large t, and the interval availability for a large interval, will converge to the same numerical value. We shall therefore refer to them merely as availability.

We assume that the 'cost' associated with a failure represents the program down-time consequent upon that failure. In hardware terminology this is called repair time, but software failures do not have very simple consequences. Often software failures do not incur any down-time, and when they do it may be merely the time required for reloading. Examples of the latter are common in real-time systems which fail when a subset of the input space is (randomly) encountered: start-up will usually take place in a different region of the input space and the program will not immediately fail again.

The program alternates between up-times and down-times; but it does not behave as an alternating renewal process [15], since the down-times will have random variable durations typical of the failures which cause them. The overall program availability converges to:

\[
A = \left[\, 1 + \frac{\sum_{i,j} \pi_i\, p_{ij}\, (\nu_i \mu_{ij}\, m'_i + \lambda_{ij}\, m'_{ij})}{\sum_{i,j} \pi_i\, p_{ij}\, \mu_{ij}} \,\right]^{-1} \qquad (6.1)
\]

where m'_i is the mean down-time for a module i failure, and m'_ij is the mean down-time for a failure on interface i → j. Again, note the simplicity of (6.1): only mean down-times appear.
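Treating mean down-times as the failure 'costs', the availability calculation can be sketched as follows for the same hypothetical program; the down-time figures are illustrative assumptions.

```python
# Hypothetical parameters (illustrative only); mean down-times play the role of costs.
P = {1: {2: 0.7, 3: 0.3}, 2: {1: 0.4, 3: 0.6}, 3: {1: 1.0}}
MU = {(1, 2): 0.02, (1, 3): 0.05, (2, 1): 0.01, (2, 3): 0.04, (3, 1): 0.03}
NU = {1: 1e-4, 2: 5e-5, 3: 2e-4}
LAM = {k: 1e-6 for k in MU}
DOWN_MOD = {1: 60.0, 2: 5.0, 3: 300.0}   # m'_i: mean down-time after a module-i failure (seconds)
DOWN_INT = {k: 30.0 for k in MU}         # m'_ij: mean down-time after an i->j interface failure

def equilibrium(P, iters=500):
    pi = {i: 1.0 / len(P) for i in P}
    for _ in range(iters):
        pi = {j: sum(pi[i] * P[i].get(j, 0.0) for i in P) for j in P}
    return pi

pi = equilibrium(P)
up = sum(pi[i] * P[i][j] * MU[(i, j)] for (i, j) in MU)                       # mean up-time per transition
down = sum(pi[i] * P[i][j] * (NU[i] * MU[(i, j)] * DOWN_MOD[i] + LAM[(i, j)] * DOWN_INT[(i, j)])
           for (i, j) in MU)                                                   # mean down-time per transition
print("asymptotic availability:", up / (up + down))                           # cf. eq. (6.1)
```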

7. ESTIMATION, USE, VALIDATION

Unlike the earlier black-box models of software reliability, structural models like this one are primarily intended to provide the software engineer with a general design tool. Thus, prior to system integration, but when sufficient data are available from module testing, one of the earlier models [2 - 5] can be used to estimate the module failure rates, ν_i. The result of section 3 can then be used to predict the program reliability as a failure rate, or in terms of some appropriate measure from the time-to-failure distribution: mean time to failure, 90% tolerance interval for time of next failure, etc. More important than this kind of predictive calculation, however, is the possibility of using the model as an aid to project management. During module testing it is likely (one hopes, indeed, certain) that module reliability will be increasing. The models will be able to predict module reliability at some future time. Use of the results of section 3 will enable the project manager to predict system reliability from these predictions of module reliability. Since (3.1) allows insight into the contribution made by each module to the unreliability of the whole program, it can be used to allocate programmer effort to those modules which contribute most to system unreliability. It should be possible to decide how much effort needs to be invested in each module to keep the reliability growth of the system on schedule. All such projections and decisions would, of course, be constantly up-dated as new testing data became available.

In the foregoing it has not been possible to say anything about the interfacing failures; it is implicitly assumed that the λ's are all zero. No amount of effort to debug modules will help to eliminate potential sources of failure on the interfaces when system integration is attempted. Unfortunately, there seems to be little information available about interfacing failures, apart from general agreement that they exist separately from the module failures which are revealed in testing. A crude approximation would be to assume that all the λ's are equal and use an estimate which has been obtained by observation of other, similar projects. It seems reasonable to assume the λ's are fairly independent of program structure: at least more so than the ν's, since ν_i clearly depends on the size and complexity of module i. If we are interested solely in a measure of the current program reliability during integration, we can probably estimate the interfacing failure probabilities by examining the failure log and observing the proportion of i → j module-calls which fail. This involves counting the number of times each of the R² possible transitions occurs; but this may be needed for estimating P anyway. Such counting can be done very simply and automatically, and it will usually be possible to establish a priori that some transitions are impossible.
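A sketch of such automatic counting is given below; the trace and failure-log formats are hypothetical, and the estimators are simple empirical proportions and averages.

```python
from collections import defaultdict

def estimate_from_trace(trace, failed_calls):
    """trace: list of (i, j, sojourn) records, one per observed i->j transfer.
    failed_calls: list of (i, j) records, one per observed interfacing failure.
    Returns empirical estimates of p_ij, mu_ij and lambda_ij (hypothetical log format)."""
    n = defaultdict(int)          # counts of i->j transfers
    out = defaultdict(int)        # total transfers out of i
    soj = defaultdict(float)      # summed sojourn time preceding i->j transfers
    for i, j, s in trace:
        n[(i, j)] += 1
        out[i] += 1
        soj[(i, j)] += s
    p_hat = {(i, j): n[(i, j)] / out[i] for (i, j) in n}
    mu_hat = {(i, j): soj[(i, j)] / n[(i, j)] for (i, j) in n}
    fails = defaultdict(int)
    for i, j in failed_calls:
        fails[(i, j)] += 1
    lam_hat = {(i, j): fails[(i, j)] / n[(i, j)] for (i, j) in n}
    return p_hat, mu_hat, lam_hat

# usage with a tiny made-up trace
trace = [(1, 2, 0.02), (2, 3, 0.05), (3, 1, 0.03), (1, 2, 0.03), (2, 1, 0.01)]
print(estimate_from_trace(trace, failed_calls=[(2, 3)]))
```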


To use the model in the way suggested, it will be necessary to estimate the a's and b's of (3.3). Since these are functions of the π's, p's and μ's, it will be necessary to estimate all these separately. The μ_ij's, mean times spent in modules per visit, will probably be easiest; good estimates of these will be available from testing. It may be possible to get exact values for these parameters by a sufficiently detailed analysis of module structure. The p's will usually present more difficulties. In most cases the P matrix will be sparse, i.e. have many zero elements, since many module calls will be impossible. It will usually be best to find these first (generally a simple task of examining each module to find which modules it cannot call). The remaining, non-zero p's may sometimes be available exactly by analysing program structure. More likely it will be necessary to estimate them. If this cannot be done by simulation of the modules, it will have to wait until the system integration phase. Fortunately, however, good estimates will be obtained very quickly in view of the high speed of control exchanges compared with failure rates. Thus at an early stage of system integration the laboriously gathered failure data can be incorporated with the quickly available system data to provide an estimate of system reliability.

For the purposes of using the results of section 3, it is easiest to estimate the a's and b's directly if the p's cannot be obtained by inspecting program structure. For section 5, however, P itself is needed to calculate σ².

The value of results like those in sections 4 and 5 lies in their ability to provide estimates of the total costs of failures during the planned life of the program. Thus (5.2) will enable us to calculate the s-expected cost accruing from failures over a long time.

As a design tool, (5.2) can be used to make a rational decision about when to end debugging. When the cost of an improvement in reliability (in programmer time, say) will not be recouped by a greater reduction in s-expected life-time failure costs, then debugging should be terminated. This extends the usual reliability analysis; we are able not merely to predict when a given reliability will be achieved, but to decide whether such an improvement will be cost effective.
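A crude numerical illustration of this stopping rule, with all figures hypothetical, is:

```python
# Hypothetical figures, for illustration only.
life_remaining = 2e6        # planned remaining operating time
rho_now = 3.0e-3            # current s-expected failure cost per unit time, from eq. (5.2)
rho_after = 2.2e-3          # predicted value after further debugging of the worst module
improvement_cost = 900.0    # programmer effort needed for that improvement, in the same cost units

saving = (rho_now - rho_after) * life_remaining   # reduction in s-expected life-time failure costs
print("s-expected life-time saving:", saving)
print("continue debugging" if saving > improvement_cost else "stop debugging")
```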

The approximating distribution of section 5 enables tolerance bounds to be placed on the total cost of failures during a given length of time. Since the variance, σ², is required for this, the calculation will be more tedious than that for the s-expected total cost.

It is possible to use this model to calculate new and potentially useful reliability measures. For example, in some applications there may be a maximum permissible cost (per year, say). The result of section 5 could be used to calculate the probability that this maximum is not exceeded. In a sense this generalises the concept of reliability [8]; the two will coincide when the maximum permissible cost is zero (no failures).

Testing the validity of the model falls into two areas: the assumptions which underlie the model, and those which are necessary for the approximate results of Theorems 1 and 2. The limiting processes which have been used in the latter are very plausible for software in general, and it is unlikely that these will need much verification [13].

Of the assumptions underlying the model itself, it is the nature of the switching process among modules which is most likely to be in question, since the limiting operation makes the exact Poisson form of the intra-module failure processes unnecessarily restrictive. The assumption of a semi-Markov process for this switching already allows great flexibility for the sojourn time distributions; it remains to justify the embedded Markov chain assumption. It would always be best to have hard evidence of this from knowledge of program structure, and this may occur in some cases. It is possible, however, that such evidence will have to be obtained from observation of program behaviour (actual or simulated). This is a formidable problem: to check that the proportion of times module j is called by module i depends only upon i and j and not upon earlier module visits. It will usually be sufficient to verify that i → j transitions do not depend upon the immediately preceding states of the system (say, the previous two or three modules occupied before i), but even this will require considerable data. This problem is common to all situations where a Markov assumption is made and is not peculiar to the software application. Indeed, we are in a better position than in most applications in the natural sciences, since we typically have very long realisations available (i.e. large numbers of module switches).
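One simple data-driven check along these lines (a sketch only, not a formal statistical test) compares the observed first-order transition proportions with those conditioned additionally on the previous module, from a recorded sequence of module occupancies:

```python
from collections import defaultdict

def markov_check(visits):
    """Compare Pr{next=j | current=i} with Pr{next=j | previous=h, current=i} from an
    observed module sequence; a large discrepancy casts doubt on the embedded
    Markov chain assumption. (Sketch only; no significance test is performed.)"""
    first = defaultdict(lambda: defaultdict(int))
    second = defaultdict(lambda: defaultdict(int))
    for h, i, j in zip(visits, visits[1:], visits[2:]):
        first[i][j] += 1
        second[(h, i)][j] += 1
    worst = 0.0
    for (h, i), nxt in second.items():
        tot2, tot1 = sum(nxt.values()), sum(first[i].values())
        for j, c in nxt.items():
            worst = max(worst, abs(c / tot2 - first[i][j] / tot1))
    return worst   # largest gap between first- and second-order transition proportions

# usage with a made-up occupancy sequence
print(markov_check([1, 2, 3, 1, 2, 1, 3, 1, 2, 3, 1, 2, 1, 2, 3, 1]))
```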

Again, there is evidence that the limiting results would tolerate some degree of relaxation of the strict Markov assumption. In order to obtain an asymptotic Poisson process and Central Limit Theorem we require asymptotic s-independence of increments in the process of module occupancies: i.e. we need to be able to assert that two non-overlapping intervals of time will have associated blocks of behaviour which are asymptotically s-independent. This will be true if the process is 'approximately Markov' in the sense that dependence on far past history is asymptotically zero. In many software applications, it seems to me, this could be asserted with confidence. There will remain, however, programs for which this assumption is not valid; the results of this paper cannot be used in such cases.

8. CONCLUSIONS

The most important and useful results of this paper are those contained in sections 3, 5, 6. Although exact descriptions of the failure and cost processes are far too complicated to be of general value, they result in very simple processes when the (plausible) limiting operations are carried out.

Indeed, the result contained in section 3 can be thought of as a quite general rationale for using the Poisson failure law for certain software. This law has been widely used for software before, but its justification has generally depended upon the nature of the input data stream in real-time systems [5], rather than the structure of the program, as here. In this sense the result is analogous to the arguments which have been used in hardware reliability theory to justify exponential inter-failure times for certain complex devices [8; pp 18-22].


REFERENCES

[1] Computer Systems Command, U.S. Army, Proc. 1st Software Life Cycle Management Workshop, Airlie, VA; 1977 Aug 22-23.
[2] Z. Jelinski, P.B. Moranda, "Software reliability research", in Statistical Computer Performance Evaluation, Ed.: W. Freiberger. New York: Academic, 1972, pp 465-484.
[3] J.D. Musa, "A theory of software reliability and its application", IEEE Trans. Software Engineering, vol SE-1, 1975, pp 312-327.
[4] M. Shooman, "Operational testing and software reliability estimation during program development", in Record, 1973 IEEE Symp. Computer Software Reliability, New York, NY; 1973 Apr 30 - May 2, pp 51-57.
[5] B. Littlewood, J.L. Verrall, "A Bayesian reliability growth model for computer software", same source as [4], pp 70-76.
[6] B. Littlewood, "How to measure software reliability, and how not to...", in Proc. 3rd Intern. Conf. Software Engineering, Atlanta, Georgia; 1978 May, pp 37-45. Also see IEEE Trans. Reliability, vol R-28, 1979 Jun, pp 103-110.
[7] C.E. Walston, C.P. Felix, "A method of programming measurement and estimation", IBM Systems J., vol 16, 1977, pp 54-73.
[8] R.E. Barlow, F. Proschan, Mathematical Theory of Reliability. New York: Wiley, 1965.
[9] J.P. Buzen, P.P. Chen, R.P. Goldberg, "Virtual machine techniques for improving system reliability", same source as [4], pp 12-17.
[10] M. Shooman, "Structural models for software reliability prediction", in Proc. 2nd Intern. Conf. Software Engineering, San Francisco, CA; 1976 Oct, pp 268-280.
[11] B. Littlewood, "A reliability model for Markov structured software", in Proc. 1975 Intern. Conf. Reliable Software, Los Angeles, CA; 1975 Apr 21-23, pp 204-207.
[12] B. Littlewood, "A reliability model for systems with Markov structure", Applied Statistics (J. Royal Statist. Soc., Series C), vol 24, 1975, pp 172-177.
[13] Supplement: NAPS document No. 03307, 46 pages (contains the detailed mathematical analysis omitted from the present paper, including formulae for computing the parameters of the approximating models). For current ordering information, see "Information for Readers & Authors" in a current issue. Order NAPS document No. 03307 from ASIS-NAPS, Microfiche Publications, PO Box 3513, Grand Central Station, New York, NY 10017 USA.
[14] P. Jagers, T. Lindvall, "Thinning and rare events in point processes", Z. Wahrscheinlichkeitstheorie, vol 28, 1974, pp 89-99.
[15] D.R. Cox, Renewal Theory. London: Methuen, 1962.

AUTHOR

B. Littlewood; Mathematics Department; The City University; Northampton Square; London EC1V 0HB ENGLAND.

B. Littlewood: For biography, see vol R-28, 1979 Jun, p 110.

Manuscript S179-10 received 1978 December 1; revised 1979 February 1.
