

Method for the diagnosis of a single intermittent fault in combinatorial logic circuits

P.K. Lala and J.I. Missen

Indexing terms: Combinatorial circuits, Fault location, Intermittent faults

Abstract: The paper presents a technique, based on probability theory, which detects a well behaved intermittent fault in combinatorial logic circuits. The procedure employs repeated applications of tests that detect solid faults in the circuit. The time period during which a test is repeatedly applied depends on the probability of detection desired and is derived from the Poisson distribution. The percentage of faults located using repeated tests in software simulation agrees very well with the statistical prediction.

1 Introduction

Intermittent faults are the most frequently occurring faults in data-processing systems once the installation period is over. To detect a permanent fault, any particular test need only be applied once. However, a fault which is intermittent in nature may escape detection when the test is applied. There are three possible ways to deal with intermittent faults in digital systems [1]:

(a) Development of better test techniques for the detection and location of intermittent faults.

(b) Development of techniques for inducing the intermittent faults to appear as solid (i.e. permanent).

(c) Design of networks which will mask the effects of intermittent faults.

Method (a) remains an area in need of further research and investigation. Recent efforts in this direction can be found in the works of Breuer [2], Kamal and Page [3] and Koren and Kohavi [4]. A model of fault intermittency, based on classical probability theory, has been presented by Parker and McCluskey [5]. This paper presents a detection procedure, based on the model of Parker and McCluskey, to trap intermittent faults. Only 'well behaved' intermittent faults will be considered; i.e. for the duration of a test the circuit under test either behaves as if it is fault free or seems to possess a solid fault [2]. The test procedure involves repeated application of tests that detect permanent faults in the circuit.

The time period during which a test is repeated is selected on the basis of the 'probability of detection' which is acceptable to the user. It may be helpful to draw an analogy with the 'acceptable quality level' which occurs in the theory of quality control [6]. Here one may select a confidence limit (say 90%) that the product shall lie within defined limits of quality. In this paper the probability of detection $P_D$ of an intermittent fault corresponds to the confidence limit.

2 Intermittent fault model

T432C, first received 29th March and in revised form 19th July 1979
Dr. Missen is, and Dr. Lala was formerly, with the Department of Physics, The City University, St. Johns Street, London EC1V 4PB, England. Dr. Lala is now with Redifon Computers, Crawley, Sussex, England

If $p_f$ is the probability of a fault occurring in the circuit under test, then $p_f = 0$ means that the fault does not exist, whereas $p_f = 1$ indicates that it is present permanently in the circuit (a solid fault). When $p_f$ has a value between the two extremes ($0 < p_f < 1$), the fault can be assumed to be intermittent because it appears randomly in time. This model is based on the assumption that an a priori probability value may be assigned to the occurrence of a fault. It has been indicated in Reference 3 that these values can be estimated empirically from familiarity with the circuit. An example is quoted from Reference 3 for illustration.

Among the gates produced by a certain manufacturer, it is estimated that for about 0.01 percent of them, the gap between ON and OFF voltages is smaller than some critical value. If the gap is below the critical value, the gate will malfunction 5 percent of the time . . .

The probability of failure in this case is assumed to be 0.0005%; i.e. (0.01 × 0.05)%.

By definition, intermittency is a time-dependent phenomenon and can be represented by a random variable, the distribution of which depends on the components of the system and the environment. Although an intermittent fault may, at least theoretically, be detected by repeating a test which would detect the fault if it were solid [3, 7], there are two major problems in the application of the fault model of Reference 3. First, it is very difficult to obtain realistic estimates of the model parameters using the currently available data, and secondly there is the problem of the huge number of tests that have to be applied [8]. One significant point to be noted is that the same number of tests may be cycled in a shorter or longer period of time by varying the rate at which they are applied. The main problem in intermittent-fault detection is to make sure that a test is applied to the circuit when the fault occurs. Given the probability of failure, the problem becomes that of determining the length of time in which the fault has a high probability of occurrence. Therefore, if the test is repeated for that length of time (say $T$), the probability of detecting the fault will also be high. If $\Delta T$ is the duration of a test, then, statistically, it has to be cycled $N$ ($= T/\Delta T$) times to detect the fault; it is assumed that a fault, when it appears, also exists for time $\Delta T$.

3 Poisson distribution

Let us consider a random variable $x$ as the number of successes in $n$ independent trials of a 2-outcome experiment; this variable assumes the values 0, 1, 2, . . . $n$. If the probability of success of each trial is $p$, then the probability of $x$ successes in $n$ independent trials is given by

$$P(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

COMPUTERS AND DIGITAL TECHNIQUES, OCTOBER 1979, Vol. 2, No. 5 187


The above expression for $P(x)$ enables one to assign a probability to each value of $x$ from 0 to $n$. The resulting distribution of probabilities is known as the 'binomial distribution'.

If in the binomial distribution $p \ll 1$ but $n$ is sufficiently large for the mean $np$ to be significant, then the probability function $P(x)$ may be shown to take the approximate form

$$P(x) = \frac{u^{x} e^{-u}}{x!} \qquad (x = 0, 1, 2, \ldots n)$$

where $u = np$ is the mean of the distribution.

A distribution given by the above equation is called the 'Poisson distribution'. The Poisson distribution, strictly speaking, is not a continuous one, but gives probabilities for each particular whole number 0, 1, 2, . . . of successes when the mean is $u$ and the total number of occurrences is not known. The factors $u^{x}/x!$ in the successive probabilities are the successive terms in the expansion of $e^{+u}$, so that the sum of all the probabilities is $e^{+u} e^{-u} = 1$, as expected.

The Poisson distribution has two main applications, first as a useful approximation to the binomial distribution when the binomial parameter $p$ is small, and secondly for describing the number of events which occur randomly in certain time intervals.
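The first of these applications can be checked numerically. The following is a minimal sketch, not part of the original paper, that evaluates the binomial and Poisson expressions side by side; the function names and the values `n = 1000`, `p = 0.002` are illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of x successes in n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, u: float) -> float:
    """Poisson probability of x successes when the mean is u."""
    return u**x * exp(-u) / factorial(x)


# Illustrative values (assumed, not taken from the paper): small p, large n.
n, p = 1000, 0.002
u = n * p  # mean of the distribution

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, u), 5))
```

For small $p$ and large $n$ the two columns should differ only in the later decimal places, which is the sense in which the Poisson form approximates the binomial one.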

4 The Poisson distribution in intermittent fault detection

Let a fault occur randomly in a circuit and the probability of its occurrence be $p_f$. If a test for the fault is repeated for a sufficiently long time $T$, the fault will appear on average $p_f T$ times ($= u$, the Poisson parameter). The problem is to find the time $t$ ($< T$) in which the fault will appear at least once, assuming one occurrence of the fault is enough for its detection.

The Poisson distribution for the situation is

$$P(x) = \frac{(p_f T)^{x} e^{-p_f T}}{x!} \qquad (x = 0, 1, 2, \ldots)$$

The probability of getting at least one fault is

$$P(x \ge 1) = 1 - P(x = 0) = 1 - e^{-p_f T} \qquad (1)$$

(i.e. total probability minus the probability of detecting zero faults). Eqn. 1 has been plotted in Fig. 1 to show the expected probability of detection against testing time for a range of values of $p_f$ which is likely to be experienced in practice. $P(x \ge 1)$ will be termed the probability of detecting an intermittent fault, $P_D$. In eqn. 1, if for example $p_f = 0.04$ and $T = 150$, then

$$P_D = 1 - 0.00248 = 0.99752 \ (\approx 1)$$
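The numerical example above can be reproduced directly from eqn. 1. This is a small sketch for illustration; the function name `detection_probability` is an assumption and does not appear in the paper.

```python
from math import exp


def detection_probability(p_f: float, T: float) -> float:
    """Eqn. 1: probability that the fault appears at least once during testing time T."""
    return 1.0 - exp(-p_f * T)


# Values quoted in the text: p_f = 0.04, T = 150.
print(round(detection_probability(0.04, 150), 5))  # 0.99752
```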

This problem may be stated in a different way: if the probability of failure is known, how long does a test for the fault have to be repeated to get a desired probability of detection? The solution to this is of main interest in this paper. The longer the test is applied, the higher is the probability of detecting an intermittent fault. Conversely, if the length of the test time is reduced, there will be a corresponding decrease in the probability of detection.

For example, using Fig. 1, if an intermittent fault having $p_f = 0.001$ is to be detected, then to obtain a probability of detection of approximately 80%, the appropriate test has to be cycled for 1600 units of time.

If, for example, the fault duration is assumed to be 1 μs and one test also occupies 1 μs (well behaved fault), then a total of 1600 tests will be required, taking 1600 μs.
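The figure of about 1600 time units can be cross-checked by inverting eqn. 1, which gives $T = (1/p_f)\ln\{1/(1 - P_D)\}$. The sketch below is illustrative only; the function names are assumptions, and the exact value it computes is slightly above the rounded value read from the graph.

```python
from math import ceil, log


def required_test_time(p_f: float, P_D: float) -> float:
    """Testing time T for which eqn. 1 yields the desired detection probability P_D."""
    if not 0.0 < p_f <= 1.0:
        raise ValueError("p_f must lie in (0, 1]")
    if not 0.0 <= P_D < 1.0:
        raise ValueError("P_D must lie in [0, 1)")
    return log(1.0 / (1.0 - P_D)) / p_f


def number_of_tests(T: float, delta_t: float) -> int:
    """Number of test applications N = T / delta_t, rounded up to a whole test."""
    return ceil(T / delta_t)


T = required_test_time(p_f=0.001, P_D=0.80)
print(round(T))                 # 1609 time units
print(number_of_tests(T, 1.0))  # 1610 tests when one test occupies one time unit
```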

Fig. 1 Poisson probability function. Probability of failure $p_f$ against testing time: $T = (1/p_f)\ln\{1/(1 - P_D)\}$

5 Detection of an intermittent fault in combinatorial circuits

A procedure to derive a minimal test set for a combinational circuit under solid fault conditions has been described in a previous paper [9]. The test set is applied to the circuit under test and a fault, if present, is detected and located within an indistinguishable fault class.

During the process of cycling, if a test does not give any false output value, then it may be assumed that the intermittent fault does not belong to the fault set associated with the test. The preset time for cycling a test, in fact, gives the limit to the length of time a circuit has to be tested before one may conclude that none of the faults which the test can detect appears intermittently; hence, as soon as a fault is detected, the testing may be discontinued.

The following steps are to be carried out sequentially in order to detect an intermittent fault:

(i) Apply a test from the test set derived under the solid fault assumption.

(ii) If the output of the circuit is different from that of the fault-free circuit, then the intermittent fault has been detected; in that case the testing may be discontinued; otherwise go to step (iii).

(iii) If the preset test time limit has been exceeded, go to step (iv); otherwise apply the test again and go back to step (ii).

(iv) Make one of the following decisions:
(a) Stop testing; the circuit is free of intermittent faults.
(b) Select another test from the test set, i.e. go back to step (i).
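Steps (i) to (iv) can be sketched as a simple loop. This is one possible reading of the procedure, not the authors' implementation: the function names are hypothetical, the preset time limit is expressed as a maximum number of applications per test, and at step (iv) the sketch always takes decision (b) until the test set is exhausted. The toy circuit is a stand-in and is not the circuit of Fig. 2.

```python
from typing import Callable, Iterable, Optional, Tuple


def detect_intermittent_fault(
    test_set: Iterable[str],
    apply_test: Callable[[str], int],
    fault_free_output: Callable[[str], int],
    max_applications: int,
) -> Optional[Tuple[str, int]]:
    """Return (test, application count) at first detection, or None if nothing is detected."""
    for test in test_set:                        # step (i), re-entered via decision (iv)(b)
        expected = fault_free_output(test)
        for application in range(1, max_applications + 1):
            if apply_test(test) != expected:     # step (ii): false output, fault detected
                return test, application
            # step (iii): limit not yet exceeded, so the test is applied again
    return None                                  # decision (iv)(a): stop testing


# Toy stand-in circuit (hypothetical): a 4-input AND gate.
def good(test: str) -> int:
    return int(all(bit == "1" for bit in test))


calls = {"n": 0}


def faulty(test: str) -> int:
    """Output stuck at 1, but only on every 7th application (a deterministic stand-in)."""
    calls["n"] += 1
    return 1 if calls["n"] % 7 == 0 else good(test)


print(detect_intermittent_fault(["0000", "1101"], faulty, good, 5))  # ('1101', 2)
```

Returning the detecting test as well as the count reflects the idea that the fault is located within the fault class associated with that test.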

6 Circuit example

As an example of the intermittent fault detection process, the circuit [10] of Fig. 2 is considered. The minimal test set for the circuit which detects all single solid faults in the circuit is {0000, 1101, 1110, 0111, 1000}.


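Table 1: Results of fault-locating experiments

| Experiment | Number of tests | Number of times tests were applied | Fault probability $p_f$ | Number of times fault was located | Percentage success | Predicted percentage success |
|---|---|---|---|---|---|---|
| 1 | 300 | 20 | 0.0005 | 3 | 15 | 14 |
| 2 | 500 | 20 | 0.0005 | 3 | 15 | 22 |
| 3 | 300 | 20 | 0.001 | 7 | 35 | 26 |
| 4 | 500 | 20 | 0.001 | 7 | 35 | 37 |
| 5 | 100 | 20 | 0.005 | 10 | 50 | 40 |
| 6 | 200 | 20 | 0.01 | 16 | 80 | 86 |
| 7 | 70 | 20 | 0.01 | 9 | 45 | 50 |
| 8 | 50 | 20 | 0.01 | 9 | 45 | 38 |
| 9 | 40 | 22 | 0.04 | 17 | 77 | 80 |
| 10 | 100 | 20 | 0.04 | 18 | 90 | 98 |
| 11 | 80 | 20 | 0.04 | 20 | 100 | 96 |
| 12 | 20 | 21 | 0.05 | 15 | 71 | 63 |
| 13 | 5 | 50 | 0.05 | 13 | 26 | 22 |
| 14 | 10 | 50 | 0.01 | 37 | 74 | 63 |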

Fig. 2 Illustrative circuit

A random number generator [11] is used to simulate the intermittent appearance of a fault. It generates a number $x$ which lies between 0 and 1, by initially defining a seed number $Y$ of value $0 < Y < 1$. Different initial seeds generate different sequences of random numbers and, conversely, the same initial seed generates the same sequence of pseudorandom numbers. If $x \le p_f$ (the probability of intermittent failure) when a test is applied, then the desired intermittent fault is set up as a solid fault for the test concerned and is removed at the end of the test. A new value of $x$ is generated before the test is applied again.

The method utilised for random-number generation has been subjected to several tests for randomness with positive results; indeed the successful outcome of the method here described provides endorsement.

It is assumed that the circuit of Fig. 2 has an intermittent fault in which gate element 7 is stuck at 0 intermittently with a range of probabilities. By varying the random seed, described earlier, the intermittent fault can be arranged to occur randomly with a specified probability.
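The simulation scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' program: Python's built-in generator stands in for the generator of Reference 11, a single seeded stream is used for all runs instead of a new seed per run, and the applied test is assumed to detect the fault whenever it is active, so the circuit itself is not modelled. All names are hypothetical.

```python
import random
from math import exp


def fault_located(p_f: float, n_tests: int, rng: random.Random) -> bool:
    """One run: cycle a test n_tests times and report whether the fault was caught."""
    for _ in range(n_tests):
        x = rng.random()   # a new value of x is generated before each application
        if x <= p_f:       # fault set up as solid for this application of the test
            return True    # assumption: the test detects the fault whenever it is active
    return False


def success_fraction(p_f: float, n_tests: int, n_runs: int, seed: int) -> float:
    """Fraction of runs in which the fault was located."""
    rng = random.Random(seed)  # the same seed reproduces the same pseudorandom sequence
    hits = sum(fault_located(p_f, n_tests, rng) for _ in range(n_runs))
    return hits / n_runs


# Illustrative parameters: p_f and n_tests as in experiment 6; n_runs and seed are assumed.
estimate = success_fraction(p_f=0.01, n_tests=200, n_runs=2000, seed=1)
predicted = 1.0 - exp(-0.01 * 200)  # eqn. 1 with one test per unit time
print(f"simulated {estimate:.2f}, predicted {predicted:.2f}")
```

Using many more runs than the 20 to 50 repetitions reported in Table 1 reduces the sampling scatter, which makes the comparison with eqn. 1 easier to see.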

Table 1 gives the results of fourteen experiments in which the fault-finding technique was simulated with a wide range of fault probabilities and testing times. Thus, for example, in experiment 6, 200 tests were applied with a fault probability of 0.01, and the series of tests was repeated 20 times. The fault was, in fact, located in 16 out of these 20 attempts, giving a percentage success of 80. Reference to Fig. 1 shows that the predicted success is between 0.8 and 0.9.

Fig. 3 Plot of predicted and experimental results (axis: actual percentage of faults located, 10 to 100)

The percentage successes obtained by simulation have been plotted against the predicted values in Fig. 3. It will be seen that good agreement is achieved over a wide range of parameters.
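As a cross-check on the predicted column, eqn. 1 can be evaluated for a few rows of Table 1. The sketch below assumes that one test occupies one unit of time, so that $T$ equals the number of tests; only experiments 6, 9 and 12 are used, and the function name is an assumption.

```python
from math import exp

# (experiment, number of tests, p_f, times applied, times located), taken from Table 1
rows = [
    (6, 200, 0.01, 20, 16),
    (9, 40, 0.04, 22, 17),
    (12, 20, 0.05, 21, 15),
]


def predicted_percent(n_tests: int, p_f: float) -> float:
    """Eqn. 1 as a percentage, assuming one test per unit time (T = n_tests)."""
    return 100.0 * (1.0 - exp(-p_f * n_tests))


for exp_no, n_tests, p_f, applied, located in rows:
    actual = round(100 * located / applied)
    print(exp_no, actual, round(predicted_percent(n_tests, p_f)))
# prints:
# 6 80 86
# 9 77 80
# 12 71 63
```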

7 Conclusion

An algorithmic procedure to detect and locate a single intermittent fault in combinatorial circuits has been described. This procedure is based on probability theory, and involves repeated application, within a preset time period, of tests which are designed for solid fault detection. The time periods are calculated from the Poisson distribution when the probability of failure is known. It has been proposed that, if a fault appears with a known probability, then it has a certain probability of being detected within the selected time period; it was assumed that an intermittent fault has to occur only once during the test time period for it to be detected.

The intermittent fault detection procedure is based on the assumption that the probability of malfunction is known. Although this seems a restrictive assumption, the probability could perhaps be estimated from the reliability requirement of the circuit during operation. A circuit, in this context, may be considered to be highly reliable if during its operation the probability of intermittent fault occurrence is very low. If the network reliability is made proportional to a 'quality factor' $Q$, where

$$Q = \frac{1}{\text{probability of intermittent fault occurrence}}$$

then a high $Q$ ($1 < Q < \infty$) will mean high reliability; $Q \to \infty$ refers to an ultrareliable circuit and $Q = 1$ means that a solid fault is present. Thus, depending on the reliability desired, the circuit has to be tested to screen out a fault with a probability of occurrence equal to $1/Q$.
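As a worked illustration, which is not stated in this form in the paper, substituting $p_f = 1/Q$ into the testing-time relation quoted with Fig. 1 would give the testing time directly in terms of the quality factor:

$$T = \frac{1}{p_f}\ln\left\{\frac{1}{1 - P_D}\right\} = Q \ln\left\{\frac{1}{1 - P_D}\right\}$$

For instance, under the assumed values $Q = 1000$ and $P_D = 0.8$, this gives $T = 1000 \ln 5 \approx 1609$ units of time, so the required testing time grows in direct proportion to the reliability demanded.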

At present, hardly any information is available on the mechanism of intermittent failures; it seems that semiconductor manufacturers could give a lead in this direction by providing failure statistics with their products, especially solid and intermittent failure rates.

However, from the results obtained, it seems that a statistical prediction can permit an acceptable percentage success to be obtained in locating intermittent faults without unduly lengthy testing periods.

8 References

1 BALL, M., and HARDIE, F.: 'Effects and detection of intermittent failures in digital systems'. Proceedings of the fall joint computer conference, 1969, pp. 329-335

2 BREUER, M.A.: 'Testing for intermittent faults in digital circuits', IEEE Trans., 1973, C-22, pp. 241-246

3 KAMAL, S., and PAGE, C.V.: 'Intermittent faults: a model and detection procedure', ibid., 1974, C-23, pp. 713-719

4 KOREN, I., and KOHAVI, Z.: 'Diagnosis of intermittent faults in combinational networks', ibid., 1977, C-26, pp. 1154-1157

5 PARKER, K.P., and McCLUSKEY, E.: 'Analysis of logic circuits with faults using input signal probabilities', ibid., 1975, C-24, pp. 573-578

6 KREYSZIG, E.: 'Advanced engineering mathematics' (Wiley, 1972)

7 KAMAL, S.: 'An approach to the diagnosis of intermittent faults', IEEE Trans., 1975, C-24, pp. 461-467

8 TASAR, O., and TASAR, V.: 'A study of intermittent faults in digital computers'. Proceedings of the national computer conference, 1977, pp. 807-811

9 LALA, P.K., and MISSEN, J.I.: 'Application of fault-folding in the test generation for logic circuits', Digital Processes, 1978, 4, pp. 109-120

10 SCHNEIDER, P.R.: 'On the necessity to examine D-chains in diagnostic test generation - an example', IBM J. Res. & Dev., 1967, 11, p. 114

11 ROTENBERG, A.: 'A new pseudo-random number generator', J. ACM, 1960, 7, pp. 75-77

P.K. Lala arrived in the UK after obtaining a degree in Physics in India. He received an M.Sc. in Electronics from King's College, London and a Ph.D. from the City University. At present he is a senior research engineer with Redifon Computers Ltd. His research interests are in digital-system design, fault-tolerant computing and computer architecture.

James Missen obtained a Physics degree at Imperial College in 1949 and was on the senior scientific staff at the GEC Research Laboratories, Wembley for a number of years, working on semiconductor development and applications. He gained an M.Sc. at Birkbeck College in 1955 after research on the design of a thermionic-valve computer, and is now senior lecturer in the Physics Department of City University. The subject of his Ph.D. thesis was the thermal properties of integrated circuits, and he is now working on some aspects of fault diagnosis in logic circuits.
