Control Engineering Practice 9 (2001) 425–438

Probability estimation algorithms for self-validating sensors

A.W. Moran, P.G. O'Reilly, G.W. Irwin*

Intelligent Systems and Control Group, School of Electrical and Electronic Engineering, The Queen's University of Belfast, Ashby Building, Stranmillis Road, Belfast BT9 5AH, UK

Received 5 June 2000; accepted 21 August 2000

Expanded version of paper presented at the IFAC Algorithms and Architectures for Real-Time Control conference, Majorca, May 2000.
* Corresponding author. Tel.: +44-1232-335-439; fax: +44-1232-664-265. E-mail address: g.irwin@ee.qub.ac.uk (G.W. Irwin).
0967-0661/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0967-0661(00)00118-0

Abstract

Three alternative approaches are investigated for probability estimation for use in a self-validating sensor. The three methods are Stochastic Approximation (SA), a Reduced Bias Estimate (RBE) of this same approach and a method based on the Bayesian Self-Organising Map using Gaussian Kernels (GK). Simulation studies show that the GK-based method gives superior results when compared to the RBE algorithm. It has also been demonstrated that the GK method is more computationally efficient and requires storage space for fewer variables. The techniques are demonstrated using data from a thermocouple sensor experiencing a change in time constant. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Probability; Sensors; Stochastic approximation; Gaussian distributions

1. Introduction

The use of smart instruments in industry has increased over recent years as engineers have taken advantage of the added features that they offer over more conventional "dumb" alternatives. These devices are able to apply corrections to the raw measurement in order to provide such features as outputs in engineering units and compensation for thermal drift (de Sá, 1988).

This has generated research interest in instruments that can not only compensate for some undesired physical property, such as non-linearity, but that can also detect and, more importantly, counteract internal faults. Such types of instruments have been termed "self-validating" (Henry & Clarke, 1993), in that they are able to provide an indication as to the validity, or confidence, in the measured value and also allow an indication of the health of the instrument to be generated. Such an instrument must therefore be able to extract more than just the process measurement from the sensor output. The SEVA standard defines certain metrics that will show the state of the sensor. However, these metrics will only provide meaningful information if the underlying fault detection methods are accurate.

Various self-validating instruments have been developed including a thermocouple (Yang & Clarke, 1997), a DOx sensor (Clarke & Fraher, 1996), and a coriolis mass-flow meter (Henry et al., 1996; Henry et al., 2000). One self-validating approach was described by Yung and Clarke (1989) and this is also implemented on a thermocouple system (Moran, O'Reilly, & Irwin, 2000). This involves building a parametric model of the sensor output during fault-free operation. This model is then used as an inverse filter to generate an innovations sequence which should be white noise if the model constitutes an accurate description of the sensor output. Any change in the statistics of the innovations sequence can be related to the occurrence of sensor faults (Upadhyaya, 1985).

Previous work by the present authors has demonstrated how a change in the innovations variance can be detected, both in simulation (O'Reilly, 1998) and using data from a practical temperature system (Moran et al., 2000). The detection method involved a likelihood ratio test for two hypotheses, the null hypothesis, H0, where there has been no change and the test hypothesis, H1, where a change has occurred. The probabilities of H0 and H1 were found using on-line stochastic approximation (SA). However, from operational data in Naim

and Kam (1994) it was shown that such probability estimates will only be unbiased if the decisions on the alternative hypotheses are error free, i.e. the probabilities of false alarm and missed detection are zero. This situation is unlikely to occur in practice and the fact that errors occur in the decision process may then lead to biased probability estimates.

The trend towards instruments with higher levels of sophistication means that sensors are constructed with an increasing amount of local processing power. However, the computing power of any one instrument will still be limited and so the best use must be made of the available resources. It is therefore always desirable that the on-board algorithms required for self-validation should be efficient both computationally and in their storage requirements.

The aim of this work was to devise an improved probability estimation algorithm suitable for on-line implementation in the self-validating sensor test-bed described in Moran et al. (2000). The method, described in Naim and Kam (1994), for generating reduced bias estimates relies on producing an estimate of the bias and then using this to correct the probability estimates. This paper will confirm that the technique described does indeed lead to estimates with reduced bias, but that similar performance can be achieved by means of a very simple and computationally efficient method based on the Bayesian Self-Organizing Map (BSOM) (Yin & Allinson, 1997). This is a simple type of neural network that is described as self-organising as the training is unsupervised. In the implementation used here, the network consists of two Gaussian kernels.

The decision theory background is outlined in Section 2, which describes the requirement for accurate probability estimates. Section 3 presents three different methods of probability estimation together with a comparison of their performance: Stochastic Approximation (SA), the Reduced Bias Estimate (RBE) of Naim and Kam (1994) and a method based on a Bayesian Self-Organizing Map using Gaussian Kernels (GK).

Section 4 describes a practical thermocouple system to which faults can be applied and the application of the algorithms to sensor fault detection, in particular to a change in the time constant of a thermocouple. It describes how the RBE and GK methods may be used to detect just such a change. The significance of the fact that the innovations sequence used has a zero mean, especially to the RBE algorithm, is highlighted. A comparison of the two methods is given with data taken from a practical experiment. The relative complexity of the two algorithms is discussed. Finally, Section 5 gives conclusions with comments on the relative performance and complexity of the different algorithms, with particular emphasis on on-line implementation.

2. Decision theory

In order to indicate that a fault has occurred, it is first necessary to detect the incidence of a fault. In other words, it is necessary to distinguish between fault and no-fault conditions. This can be expressed as a hypothesis test, where the null hypothesis, H0, assumes that there is no fault, and the test hypothesis, H1, that there is a fault.

The probabilities that a given data set, d, asserts, or refutes, each of these two hypotheses are calculated and the decision, as to which hypothesis is more likely, can then be expressed as the likelihood ratio, L:

L = p(d|H1) / p(d|H0).   (1)

The decision is now one of determining whether this likelihood ratio is above or below a threshold value, η:

if L > η then H1 is accepted,
if L ≤ η then H0 is accepted.   (2)

Taking η as 1, this test will simply determine which hypothesis is more likely for the given data set, d, and therefore whether a fault has occurred or not.

If the threshold, η, in (2) is calculated with reference to P(H0) and P(H1) (the probabilities of H0 and H1), then the Bayes risk can be minimized:

η = (C10 − C00) P(H0) / [(C01 − C11) P(H1)].   (3)

Eq. (3) can be simplified if C00 and C11, the two costs of making a correct decision, are equal to zero and if C10 and C01, the costs of making an incorrect decision, are equal. The likelihood ratio test can then be expressed as:

if L > P(H0)/P(H1), then H1 is accepted,
if L ≤ P(H0)/P(H1), then H0 is accepted.   (4)

3. Probability estimation

3.1. Stochastic approximation

To use (4) it is necessary to know the probabilities of the two hypotheses. Although the exact values may not be known in practice, they can be estimated on-line by a recursive stochastic approximation (SA) method as described in Naim and Kam (1994). Thus:

P̂(H1)^(k) = P̂(H1)^(k−1) + (1/k)(u^(k) − P̂(H1)^(k−1)),
P̂(H0)^(k) = 1 − P̂(H1)^(k).   (5)


Fig. 1. Illustrative example, P(H1) = 0.7, σ0² = σ1² = 2.5, μ0 = 0.0, μ1 = 4.0.

Here P̂(H1)^(k) is the estimate of P(H1) at time-step k and u^(k) is the binary value of the decision at time-step k (i.e. u^(k) = 1 if H1 is accepted and u^(k) = 0 if H0 is accepted). As can be appreciated, there is a need for accurate estimation of P(H0) and P(H1), and (5) will only give unbiased estimates if the decision process is error free such that the probabilities of False Alarm and Missed Detection are zero (Naim & Kam, 1994).

The basic SA algorithm is as follows:

Step 1: Set initial conditions, i.e. randomly select a value for P̂(H1) within the range 0 to 1.
Step 2: Calculate the decision threshold,

η = (1 − P̂(H1)^(k)) / P̂(H1)^(k).   (6)

Step 3: Calculate the likelihood ratio of the two pdfs for the next data input value d:

L = p(d|H1) / p(d|H0).   (7)

Step 4: Determine the binary decision

u^(k) = 1 if L ≥ η,  0 if L < η.   (8)

Step 5: Update the probability estimate according to (5).
Step 6: k = k + 1, go to Step 2.

A simple example will suffice to demonstrate the performance of this algorithm and also to highlight the problem of biased estimates. Consider the case of the mixed output from two Gaussian noise sources that have equal variances, but with different means and unknown probabilities, i.e. it is not known in what proportion the two noise sources are mixed.

The Gaussian distributions of the two noise sources are depicted in Fig. 1. As can be seen, the two distributions have a fairly large overlap, the importance of which will be discussed shortly.

One source of data can be imagined as representing the null hypothesis, H0, and the other as the test hypothesis, H1. SA can be used to determine the proportions, or probabilities, of each of the two Gaussian noise sources. For means of 0.0 and +4.0, identical variances of 2.5 and probabilities of 0.3 and 0.7, the results shown in Fig. 2 were produced.

The final value of P̂(H1) was 0.7242, giving a bias in the result of 0.0242. From the figure it can be seen that the early estimates reach over 0.8, from the initial estimate of 0.585, before becoming more settled. The results for a Monte Carlo simulation of 100 runs are shown in Table 1.

The reason for the biased probability estimate is that the decisions are not error free. The probabilities of false alarm, P_F, and missed detection, P_M, are not zero due to the fact that the two distributions overlap, and so in this simple example it is inevitable that errors will be made. By taking into account P_F and P_M, an estimate of the bias can be produced which can then be used to correct the probability estimates. This can be accomplished by


Fig. 2. Variation in P̂(H1) for illustrative example using SA.

Table 1
Monte Carlo simulation for SA

True value P(H1)   Mean E[P̂(H1)]   Bias E[P(H1) − P̂(H1)]
0.7000             0.7191           −0.0191

determining what proportion of a distribution is 'obscured' by the other. The crucial thing to note from Fig. 1 is the point of coincidence of the two distributions. From this point to +∞ the distribution curve for P(H0) is obscured by that for P(H1). Similarly, from −∞ to the coincidence point the distribution curve for P(H1) is obscured by that for P(H0). The two areas, bounded by these limits, will be equivalent to P_F and P_M respectively. It is therefore necessary to determine the value of this point of coincidence.

In the example it is known what the true values of P(H1) and P(H0) are, and therefore, what the decision threshold, η, is. What is also known are the means and variances of the two noise source distributions, μ0, μ1, σ0² and σ1². The likelihood ratio, L, is not fully described, however, as the data value, call it x, is not known:

L = p(x|H1) / p(x|H0)
  = [ (1/(√(2π) σ1)) exp(−(x − μ1)²/(2σ1²)) ] / [ (1/(√(2π) σ0)) exp(−(x − μ0)²/(2σ0²)) ].   (9)

By equating L with η, as defined by Eq. (4), the value of x can be found. The expression can be simplified as it is known that σ0² = σ1² and also that μ0 = 0. Taking logs gives the expression:

x = (2σ² log η + μ1²) / (2μ1).   (10)

This value of x marks the point of coincidence of the two probability density function curves. The probabilities P_F and P_M are now found from the corresponding areas under the pdf curves, as

P_F = ∫ from x to +∞ of p(d|H0) dd,   (11)

P_M = ∫ from −∞ to x of p(d|H1) dd.   (12)

The integrals in (11) and (12) can be expressed in terms of the error function:

P̂_F = (1/2)[1 − erf((x − μ0)/(σ0 √2))],   (13)

P̂_M = (1/2)[1 + erf((x − μ1)/(σ1 √2))].   (14)

The bias can then be estimated using (Naim & Kam, 1994):

Bias, B̂(H1) = (1 − P̂(H1)) P̂_F − P̂(H1) P̂_M.   (15)


Fig. 3. Variation in P̂(H1) for illustrative example using RBE.

Table 2
Monte Carlo simulation for RBE

True value P(H1)   Mean E[P̂(H1)]   Bias E[P(H1) − P̂(H1)]
0.7000             0.6994           0.0006

Therefore using Eqs. (10), (13)–(15) it is possible to find the bias in the probability estimates. For this example

x = 1.4712,  P_F = 0.1759,  P_M = 0.0547,  Bias = 0.0144.

This estimated value of the bias compares very well with that found from the Monte Carlo simulation in Table 1.

3.2. Reduced bias estimation

The SA algorithm has been modified to provide reduced bias estimates of probabilities (Naim & Kam, 1994). This modified algorithm is defined as follows:

Step 1: Set initial conditions, by randomly selecting a value for P̂(H1) within the range 0 to 1.
Steps 2 to 4: Calculate the decision threshold (6), likelihood ratio (7) and binary decision (8) as for the SA algorithm.
Step 5: Update the probability estimate according to

P̂(H1)^(k) = P̂(H1)^(k−1) + (1/k)(u^(k) − B̂(H1)^(k−1) − P̂(H1)^(k−1)).   (16)

Step 6: Determine the coincidence point x for the two pdfs using (10), which assumes equal variance.
Step 7: Approximate the probabilities of false alarm and missed detection, P̂_F and P̂_M, respectively, using Eqs. (13) and (14).
Step 8: Approximate the bias in P(H1) using (15).
Step 9: k = k + 1, go to Step 2.

Using the same example problem as previously, the variation in P̂(H1) for a typical run is shown in Fig. 3. Here the final value of P̂(H1) is 0.7016, whereas the true value is 0.7. While this is better than for SA, there is still a small bias in the result of 0.0016. What is evident is the rather 'energetic' behaviour of the initial estimates. From the figure it can be seen that the estimates reach almost 0.8, from the initial estimate of 0.585, before settling nearer to the true value. The results for a Monte Carlo simulation, of 100 runs, using the reduced bias estimation (RBE) algorithm are shown in Table 2.

These results show that the RBE does indeed give probability estimates with a reduced bias. However, this has to be offset against the rather marked increase in complexity for the purpose of online implementation.

3.3. Gaussian kernels

The two approaches of probability estimation discussed so far have made use of the data in an indirect manner as the probability estimates are not directly


Fig. 4. Variation in P̂(H1) for illustrative example using GK.

derived from the data. Instead the data is used to produce a likelihood ratio, which is a measure of the likelihood of the data having originated from one of two pdfs. This ratio is then compared to an adaptive decision threshold to provide a binary decision. Finally, this decision is used in a mechanism to adjust the probability estimates.

A method is now described that makes use of the data in a much more direct way to provide accurate probability estimates. The data is used directly to determine the relative weighted outputs from two pdfs. The magnitude of the outputs from the pdfs then determines the adjustment to the weights for each, the weights, in this case, being equivalent to the probabilities.

This use of two Gaussian kernels was inspired by Yin and Allinson (1997), who described how an arbitrary probability density function can be approximated using a mixture of Gaussian kernels. Each kernel has three parameters that can be adjusted during training: the mean, variance and priors (or weights). These are all adjusted at each time-step by way of an adaptive gain, α, which is set to decrease with time.

In the example only two kernels are involved, with known means and variances. It is then only necessary to adjust the priors of each. The BSOM-based method (GK) is as follows:

Step 1: Set the initial conditions. Since the probabilities are unknown, randomly select a value for P̂(H1) (i.e. w1) within the range 0 to 1.

Step 2: Calculate the weighted output for each of the two kernels:

P_i(x) = w_i · p_i(x | μ_i, σ_i),  i = 1, 2.   (17)

Step 3: Calculate the sum of the weighted kernel outputs:

P(x) = Σ from i=1 to 2 of w_i · p_i(x | μ_i, σ_i).   (18)

Step 4: Update the weights:

w_i^(k) = w_i^(k−1) + α^(k) [ P_i(x)^(k−1) / P(x)^(k−1) − w_i^(k−1) ].   (19)

Step 5: Adjust the adaptive gain, α, and go to Step 2:

α^(k) = 10 / (1000 + k).   (20)

For the example problem, the variation in P̂(H1) from the GK algorithm is shown in Fig. 4. Here the final value of P̂(H1) is 0.6988, which gives, therefore, a bias of 0.0012, which is comparable to RBE. The overall response is not that much different from that of the RBE algorithm. The most obvious difference is that the initial estimates are not as 'energetic' as those given by RBE, as they rise in a more gradual fashion from the initial estimate of 0.585. The results of a Monte Carlo simulation of 100 runs are given in Table 3.


Table 3
Monte Carlo simulation for GK

True value P(H1)   Mean E[P̂(H1)]   Bias E[P(H1) − P̂(H1)]
0.7000             0.6983           0.0017

Table 4
Algorithm performance comparison

                      SA     RBE     GK
Execution time (s)    20.4   106.8   38.7
Relative speed        1.00   5.24    1.90
Number of variables   11     34      12

These results show that the performance of GK is very similar to that of RBE. This is achieved with an algorithm that is much simpler and therefore easier to implement.

3.4. Implementations

Each of the three algorithms discussed earlier was implemented as a 'C' code program running on a PC. A few words on their relative complexity are appropriate.

SA is very simple and easily implemented, but the results show that biased estimates can be produced.

RBE gave more accurate results by the introduction of an estimate of the bias into the probability calculation. However, in order to provide an estimate of the bias, a not inconsiderable amount of work has to be carried out. This involves determining the point of coincidence of the two pdf curves. This point is then used as a limit of integration in order to estimate the areas under the curves corresponding to the wrong decision probabilities. Because finding the area under a Gaussian pdf curve is a nonanalytic process some other method must be used. In this instance use is made of 'C' code routines from Press, Teukolsky, Vetterling, and Flannery (1992) in order to provide an implementation of the error function.

In contrast GK is very simple indeed. The simplicity stems from the fact that the algorithm is merely evaluating the outputs from the two Gaussian kernels at each time-step. The weights, or probabilities, of the two kernels are then adjusted in relation to their relative outputs.

A simple test was carried out to indicate computational complexity. This comprised timing the execution of each algorithm for a Monte Carlo simulation of 200 runs, with each run comprising 10,000 iterations. The relative speeds of the three algorithms are shown in Table 4 together with the number of variables used by each.

The increased complexity of RBE is clearly evident, as it takes over five times as long to execute as SA and also requires three times as many variables. In comparison, GK takes a little less than twice as long to execute but with only one more variable than SA. The extra execution time is mostly taken up in evaluating the exponential function twice at each iteration.

4. Application to fault detection

The previous section has shown that the estimation of probabilities required for on-line fault detection can be carried out in a number of ways. The accuracy of the different methods and the complexity of the algorithms have been compared with a simple example. With the aim of providing reliable and easily implemented sensor fault detection, this section now looks at application to the problem of detecting changes to a thermocouple sensor.

4.1. System description

The sensing system comprises a thermocouple, signal conditioning amplifier, anti-aliasing filter, 16-bit analogue-to-digital converter and a Texas Instruments digital signal processor (DSP) on a card fitted into a PC. The DSP runs a 'C' code program for downsampling, filtering, identification of the thermocouple output model, innovation generation and calculating the variance of the innovation sequence. The 'C' code also allows the DSP to communicate with the signal conditioning amplifier in order to be able to modify the amplifier characteristics. The host PC is merely used as a means of downloading the C code to the DSP, data logging and data visual display. A block diagram of the complete sensing system is shown in Fig. 5.

The system uses an industry standard, type 'K' thermocouple together with a cold-junction compensating amplifier to provide a process measurement of 10 mV/°C. The compensating amplifier is pre-calibrated for the range 0°C to +50°C. The amplifier has been designed in such a way that it is possible to alter the gain and the bias under software control. This arrangement allows the overall gain to be altered by ±10% and the bias to be altered by the equivalent of ±6°C of the nominal temperature. There is also a coarse adjustment to the bias through which it is possible to simulate the condition of saturation below full scale. Fig. 6 shows the amplifier output during a test where the bias is first ramped to the upper limit and back, at a rate of 2°C/min, followed by the amplifier gain being ramped similarly, at a rate of 2%/min.

Another important type of thermocouple fault concerning loss of contact with the process medium can also be induced. At present this is not done under software


Fig. 5. System block diagram.

Fig. 6. Amplifier output with bias change followed by gain change.

control. It has been shown that a change to the time constant of a thermocouple sensor can be detected by the change to the variance of an innovations sequence (Yung & Clarke, 1989; Moran et al., 2000). Such a change is indicative of a classic fault where the contact between the thermocouple and the process being measured becomes compromised.

4.2. Modelling the sensor

In order to be able to detect faults it is first necessary to determine a model of the thermocouple sensor. This is done by on-line estimation of the parameters of a suitable ARIMA model of the thermocouple system. It should be noted that the thermocouple system refers to the thermocouple and compensating amplifier and not just to the thermocouple sensor itself. An ARIMA model is used since it is the high frequency behaviour of the system that is of importance. Also, by identifying an ARIMA rather than an ARMA model, as in Yung and Clarke (1989), there is no need for an additional high-pass filter. The change to the thermocouple time constant can be thought of as an additional first-order lag between the process medium and the thermocouple. This would have the effect of band-limiting the high-frequency content of the innovations sequence.

The ARIMA model of the thermocouple output was identified using recursive maximum likelihood as

y(k) = [Ĉ(q⁻¹) / (Â(q⁻¹) Δ)] e(k)   (21)


where Â(q⁻¹) and Ĉ(q⁻¹) are the model AR and MA polynomial estimates, in the backward shift operator q⁻¹, respectively, and Δ is the difference operator. The parameter estimation routine was halted once the model parameters had settled down. The innovation sequence was then generated by inverting the ARIMA model and using it as a filter:

ε(k) = [Â(q⁻¹) Δ / Ĉ(q⁻¹)] y(k).   (22)

Of course, if the model were a perfect description of the system then ε(k) would be equal to e(k). The subsequent statistical tests were performed upon this innovations sequence. Under normal fault-free operating conditions a number of statistics, such as mean and variance, can be determined for the innovation sequence. Any subsequent changes to the sensor will be reflected in this sequence and hence in the derived statistics. Previous work (Yung & Clarke, 1989) has shown that a change to the dynamics of a thermocouple system can be detected from the variance of the innovation sequence. Hence, by finding the fault-free value of the variance, σ0², this can then be used to check against an on-line estimate to determine whether a change has occurred.

4.3. Estimation of variance

Since it is the variance of the innovation sequence that is of importance, it is necessary to be able to provide an accurate estimate of this quantity. With σ² unknown and a data vector d of dimension N:

p(d|H, σ) = (1/(√(2π) σ))^N exp[ −Σ from i=1 to N of (d_i − μ)² / (2σ²) ].   (23)

Differentiating with respect to σ leads to an expression which will become zero when

−[Σ from i=1 to N of (d_i − μ)²] / σ³ + N/σ = 0   (24)

or

σ² = [Σ from i=1 to N of (d_i − μ)²] / N.   (25)

The maximum likelihood estimate of the variance is, therefore, simply the sample variance. This can be found from the difference between the mean-square value and the square of the mean:

σ̂² = mean[d(t)²] − (mean[d(t)])².   (26)

See Yung and Clarke (1989) and Gertler and Chang (1986).

4.4. Likelihood ratio

Because of the particular circumstances of the sensor application, the likelihood ratio is formulated in a manner that avoids the need to evaluate the exponential function. The definition of the likelihood ratio, for multiple data, is then given by:

L = [ (1/(√(2π) σ̂1))^N exp(−Σ d_i²/(2σ̂1²)) ] / [ (1/(√(2π) σ0))^N exp(−Σ d_i²/(2σ0²)) ].   (27)

From (25)

N σ² = Σ from i=1 to N of (d_i − μ)².   (28)

Here μ0 = μ1 = 0 and σ0 is also known, thus (27) reduces to

L = (σ0/σ̂1)^N exp[ (N/2)(σ̂1² − σ0²)/σ0² ].   (29)

Taking logs gives the log likelihood ratio:

log(L) = (N/2) [ log R + (1/R − 1) ]   (30)

where R = σ0²/σ̂1² and N is the window length over which the test variance, σ̂1², is estimated.

To generate the binary decision, for both SA and RBE, it is now simply a matter of comparing the log likelihood ratio with the log of the decision threshold, η, i.e.

u^(k) = 1 if log(L) ≥ log η,
u^(k) = 0 if log(L) < log η.   (31)

4.5. Reduced bias estimate

The fault to be detected, in the practical application, produces a change to the variance of the innovations sequence. Since an ARIMA model is being used to generate this sequence, the data will have a mean of zero. This simple fact is of great significance to the complexity of the RBE algorithm, as will be explained in due course. The task here is therefore to differentiate between two distributions of equal mean but different variances. This is a subtle, but important, difference from the example case studied earlier.

In Fig. 1 two distributions had equal variances and different means, producing a single point of coincidence between the two distribution curves. The probabilities $P_0$ and $P_1$ could then be found using the error function as described in Section 3.1.

Now, however, the two distributions are centred on zero and whether they coincide or not depends not only


Fig. 7. Case 1: Real values for the coincidence points ±x.

on the variances but also on the probabilities of the two distributions.

As explained previously, the RBE algorithm involved finding the point of coincidence of the two pdf curves. For this new case, however, Eq. (10) no longer holds since there may now be two points of coincidence. As before, $L$ is given by Eq. (9). The means $\mu_0$ and $\mu_1$ are both zero, $\sigma_0^2$ is known and $\hat{\sigma}_1^2$ is the estimated test variance. Equating $L$ to the adaptive decision threshold, $\lambda$, and taking logs produces the following expression for the required points of coincidence:

$$x = \pm\sqrt{2\log\left(\frac{\lambda\hat{\sigma}_1}{\sigma_0}\right)\frac{\sigma_0^2\,\hat{\sigma}_1^2}{\hat{\sigma}_1^2 - \sigma_0^2}}. \quad (32)$$

Whether a real value is generated or not from Eq. (32) depends not only on the two variances, $\sigma_0^2$ and $\hat{\sigma}_1^2$, but also on the decision threshold, $\lambda$.

Overall there are five distinct cases to be considered, only two of which give real values for $x$.

- Case 1 (Fig. 7): Provided the following conditions are met, then real values for $x$ will be found:

$$\frac{P(H_1)}{P(H_0)} \ge \frac{\sigma_0}{\hat{\sigma}_1} \quad \text{for } \hat{\sigma}_1 > \sigma_0.$$

As can be seen from Fig. 7, under these conditions, there are two points of coincidence of the pdfs, at $+x$ and $-x$. The expressions for the wrong decision probabilities, $\hat{P}_0$ and $\hat{P}_1$, are given by

$$\hat{P}_0 = \int_{-\infty}^{-x} p(d \mid H_0)\,dx + \int_{x}^{\infty} p(d \mid H_0)\,dx, \quad (33)$$

$$\hat{P}_0 = 1 - \operatorname{erf}\left(\frac{x}{\sigma_0\sqrt{2}}\right), \quad (34)$$

$$\hat{P}_1 = \int_{-x}^{x} p(d \mid H_1)\,dx, \quad (35)$$

$$\hat{P}_1 = \operatorname{erf}\left(\frac{x}{\hat{\sigma}_1\sqrt{2}}\right). \quad (36)$$

- Case 2: This also gives rise to a real value for $x$, but only under the conditions:

$$\frac{P(H_1)}{P(H_0)} \le \frac{\sigma_0}{\hat{\sigma}_1} \quad \text{for } \hat{\sigma}_1 < \sigma_0.$$

$\hat{P}_0$ and $\hat{P}_1$ are given by

$$\hat{P}_0 = \operatorname{erf}\left(\frac{x}{\sigma_0\sqrt{2}}\right), \quad (37)$$

$$\hat{P}_1 = 1 - \operatorname{erf}\left(\frac{x}{\hat{\sigma}_1\sqrt{2}}\right). \quad (38)$$

- Case 3 (Fig. 8): This case does not give rise to a real value for $x$ because the two distribution curves do not


Fig. 8. Case 3: No coincidence of pdf curves (imaginary values for x).

coincide. This will occur under the following conditions:

$$\frac{P(H_1)}{P(H_0)} > \frac{\sigma_0}{\hat{\sigma}_1} \quad \text{for } \hat{\sigma}_1 \le \sigma_0.$$

From Fig. 8 it can be seen that one distribution curve completely encloses the other, i.e. there is no coincidence of the two curves. In this instance $P(H_0) < P(H_1)$; therefore, $\hat{P}_1$ must be considered to be zero and $\hat{P}_0$ to be unity.

- Case 4: Again this case does not give rise to a real value for $x$ because the two distribution curves do not coincide. This will occur under the following conditions:

$$\frac{P(H_1)}{P(H_0)} < \frac{\sigma_0}{\hat{\sigma}_1} \quad \text{for } \hat{\sigma}_1 \ge \sigma_0.$$

In this instance, the pdf curve for $P(H_0)$ will completely enclose that for $P(H_1)$, i.e. again there will be no coincidence of the two curves, and $\hat{P}_1$ must be considered to be unity and $\hat{P}_0$ to be zero.

- Case 5: There is one final, though unlikely, case to be defined, where $P(H_0) = P(H_1)$ and $\hat{\sigma}_1 = \sigma_0$. In this instance, the two distribution curves are identical. This then means that the probabilities of making a wrong decision are also identical. The bias, as given by (15), will therefore be zero.
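The five cases above can be collected into a single routine. This is a hedged sketch using Eqs. (32)-(38): the function name and argument order are illustrative, and the zero/unity assignments for Cases 3 and 4 are a reconstruction of the case discussion rather than the authors' code. Case 5 ($\hat{\sigma}_1 = \sigma_0$) is excluded, since Eq. (32) is then undefined and the bias is simply zero.

```python
import math

def wrong_decision_probs(s0, s1, lam):
    """Sketch of the RBE case logic, Eqs. (32)-(38).
    s0 = sigma_0, s1 = estimated sigma_1_hat (s1 != s0 assumed),
    lam = decision threshold lambda.
    Returns the wrong-decision probability estimates (P0_hat, P1_hat)."""
    # Squared coincidence point from Eq. (32); negative means no real x
    x_sq = 2.0 * math.log(lam * s1 / s0) * (s0**2 * s1**2) / (s1**2 - s0**2)
    if x_sq < 0.0:
        # Cases 3 and 4: the curves never coincide, one pdf encloses the
        # other, so one probability is taken as unity and the other as zero
        # (assignment reconstructed from the case discussion above)
        return (1.0, 0.0) if s1 < s0 else (0.0, 1.0)
    x = math.sqrt(x_sq)
    if s1 > s0:
        # Case 1: two coincidence points at +/- x, Eqs. (34) and (36)
        return (1.0 - math.erf(x / (s0 * math.sqrt(2.0))),
                math.erf(x / (s1 * math.sqrt(2.0))))
    # Case 2: Eqs. (37) and (38)
    return (math.erf(x / (s0 * math.sqrt(2.0))),
            1.0 - math.erf(x / (s1 * math.sqrt(2.0))))
```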

This fault detection application shows the RBE algorithm to be rather more complicated than the 'simple' introduction of an estimate of the bias, as it is now also necessary to discover whether the two pdf curves coincide. Only if there is coincidence of the two distribution curves can estimates of the wrong decision probabilities be found. The implementation of this algorithm is now quite involved, as the complexity has increased, mainly due to having to implement the error function, and the storage requirements have grown as well.

The performance of the RBE algorithm is shown in Fig. 9, together with that for SA for comparison purposes. The data for this off-line test was collected from an experiment where the time constant of a thermocouple probe, placed in a moving warm air stream, was altered by removing a paper cover from the tip of the probe. The alteration of the time constant manifests itself as a change to the variance of the innovations sequence, shown in the figure as the change to $\log(L)$ at approximately $t = 400$. The two other curves are the responses of the decision threshold, $\log(\lambda)$, for RBE and SA.

As can be seen from this figure, there is a significant difference between the responses of SA and RBE. Take the response of SA first. Since $\log(L)$ is below $\log(\lambda)$ in the period up to $t = 400$, $u$ will be zero, Eq. (8), which will cause $\log(\lambda)$ to increase, Eqs. (5) and (6). This situation will continue until $\log(L)$ increases above $\log(\lambda)$, indicating an increase in variance, which will cause $u$ to become unity and hence $\log(\lambda)$ will decrease.

The performance of RBE is quite different to that of SA. Initially the curve for the decision threshold, $\log(\lambda)$, is perfectly steady until it crosses $\log(L)$ at about $t = 400$.


Fig. 9. Response of RBE algorithm to thermocouple test data.

The reason that the decision threshold does not increase with time, as it does for SA, is the correction in the algorithm to counter the possibility of generating biased estimates.

The performance of both of the algorithms is very similar once a fault has been detected, i.e. when $\log(L)$ has crossed $\log(\lambda)$. The reason is that the update to the probability estimate, $\hat{P}(H_0)$, is dominated by the fact that $\log(L)$ is now greater than $\log(\lambda)$, which makes the binary decision, $u$, unity, hence the decrease in $\hat{P}(H_0)$.

4.6. Gaussian kernels

In contrast to RBE, the GK algorithm is hardly any more complex for the practical application than for the example problem. The reason for this stems from the fundamentally different way in which the algorithm works. The probability estimates are simply the weights of the two Gaussian kernels, which are updated at each time-step in proportion to each kernel's output for the current data. There is therefore no need to determine the coincidence points, the wrong decision probabilities or the bias. Hence the extra computational load is avoided.

In fact, only very minor changes are required to the algorithm, which consist of estimating the test variance, the likelihood ratio and the decision threshold. Each of these variables has the same form, and is calculated in the same manner, as for RBE. However, in this case $L$ and $\lambda$ are used to reset the adaptive gain once a fault has been detected, in order to give it more impetus.
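Although the paper gives no code, the weight update described here resembles a Bayesian self-organising map step (Yin & Allinson, 1997): each weight moves toward its kernel's normalised responsibility for the current datum. A minimal sketch, with a hypothetical gain parameter alpha:

```python
import math

def gk_update(weights, variances, d, alpha):
    """One GK step: each weight (= probability estimate) moves toward the
    corresponding kernel's normalised responsibility for the datum d.
    weights, variances: lists for the H0 and H1 kernels; alpha: adaptive gain.
    A sketch of the update described in the text, not the authors' code."""
    # Weighted output of each zero-mean Gaussian kernel for the datum d
    outputs = [w * math.exp(-d * d / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
               for w, v in zip(weights, variances)]
    total = sum(outputs)
    # Move each weight toward its normalised responsibility
    return [w + alpha * (o / total - w) for w, o in zip(weights, outputs)]
```

Because the responsibilities are normalised, the weights remain a valid probability distribution (they continue to sum to one, provided they start that way), which is why no separate bias correction is needed.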

The GK performance on the thermocouple test data is shown in Fig. 10. As can be seen from the figure, the GK response is very similar to that for RBE, shown in Fig. 9.

A further indication of the performance of the different algorithms can be obtained from their execution times. As for the example problem previously, Monte Carlo simulations of 200 runs of 10,000 data points were carried out on each algorithm. The results are shown in Table 5.

When compared to the results for the example problem shown in Table 4, the relative execution times of all three algorithms have changed. Those for SA and RBE have decreased by approximately 16%. This is because the estimation of the variance and the calculation of the log likelihood ratio are actually computationally more efficient than the calculation of the 'basic' likelihood ratio used in the example problem. Since both routines use the same techniques, they are affected identically. It may be surprising that the execution time for RBE is not longer than it is, given that there could be two coincidence points for the pdfs. Since the coincidence points, if any, are symmetrical, the calculations of $\hat{P}_0$ and $\hat{P}_1$ are no more complicated than for the example problem. The increase in the relative execution time for GK is more significant. This is due to the extra computational effort required to estimate the variance and calculate the log likelihood ratio, neither of which was required for the example problem.

From the table it can be seen that the execution time for RBE is significantly greater than for GK, at a little


Fig. 10. Response of GK algorithm to thermocouple test data.

Table 5
Execution time comparison for unknown variance case

                      SA     RBE    GK
Execution time (s)    17.2   90.1   48.0
Relative speed        1.00   5.24   2.79
Number of variables   15     40     22

less than double the time taken. For the case where both the variance and the mean of the data are unknown, the complexity of the RBE algorithm increases still further. A timing test for this case has shown that the execution time for RBE increases by 84%, whereas that for GK does not change.

5. Conclusions

It has been shown, with the use of an example problem, that accurate probability estimates cannot be guaranteed by the SA algorithm. The RBE algorithm, as described in Naim and Kam (1994), does indeed give more accurate estimates by including an estimate of the bias. However, this increase in accuracy is accompanied by an attendant increase in computational complexity and storage requirements. A technique that uses Gaussian kernels, in a type of Bayesian self-organising map, has been shown to provide comparable results but with much reduced complexity.

The two techniques, RBE and GK, have been applied to the practical problem of detecting the change in time constant of a thermocouple. For this problem, it has been shown that a quite extensive amount of work has to be carried out in order to provide estimates of the bias for the RBE algorithm. This is due to the nature of the innovations sequence that is used as the test statistic, in particular the fact that the sequence has zero mean. This is of importance if the technique were to be part of the implementation of a smart instrument, as the increased complexity and storage requirements may be difficult to accommodate within the limited processing power of such a device. The GK algorithm, however, is shown to provide very similar performance to that of RBE but with a much lower requirement for processing power and storage.

References

de Sá, D. (1988). The evolution of the intelligent measurement. Measurement and Control, 21, 142-144.

Clarke, D. W., & Fraher, P. M. A. (1996). Model-based validation of a DOx sensor. Control Engineering Practice, 4(9), 1313-1320.

Gertler, J., & Chang, H.-S. (1986). An instability indicator for expert control. IEEE Control Systems Magazine, 6(4), 14-17.

Henry, M. P., Archer, N., Atia, M. R. A., Bowles, J., Clarke, D. W., Fraher, P. M. A., Page, I., Randall, G., & Yang, J. C.-Y. (1996). Programmable hardware architectures for sensor validation. Control Engineering Practice, 4(10), 1339-1354.

Henry, M. P., & Clarke, D. W. (1993). The self-validating sensor: Rationale, definitions and examples. Control Engineering Practice, 1(4), 585-610.


Henry, M. P., Clarke, D. W., Archer, N., Bowles, J., Leahy, M. J., Liu, R. P., Vignos, J., & Zhou, F. B. (2000). A self-validating digital Coriolis mass-flow meter (1): Overview. Control Engineering Practice, 8(5), 487-506.

Moran, A. W., O'Reilly, P. G., & Irwin, G. W. (2000). A case study in on-line intelligent sensing. In Proceedings of the American control conference, Chicago, Illinois (pp. 3949-3953).

Naim, A., & Kam, M. (1994). On-line estimation of probabilities for distributed Bayesian detection. Automatica, 30(4), 633-642.

O'Reilly, P. G. (1998). Local sensor fault detection using Bayesian decision theory. In UKACC international conference on control, Swansea, UK (pp. 247-251).

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Upadhyaya, B. R. (1985). Sensor failure detection and estimation. Nuclear Safety, 26(1), 32-43.

Yang, J. C.-Y., & Clarke, D. W. (1997). A self-validating thermocouple. IEEE Transactions on Control Systems Technology, 5(2), 239-253.

Yin, H., & Allinson, N. M. (1997). Bayesian learning for self-organising maps. Electronics Letters, 33(4), 304-305.

Yung, S. K., & Clarke, D. W. (1989). Local sensor validation. Measurement and Control, 22, 132-141.
