
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 41, NO. 1, JANUARY 1995, p. 171

A Generalized Change Detection Problem Igor V. Nikiforov

Abstract: The purpose of this paper is to give a new statistical approach to the change diagnosis (detection/isolation) problem. The change detection problem has received extensive research attention; however, the change isolation problem has, for the most part, been ignored. We consider a stochastic dynamical system with abrupt changes and investigate the multiple hypotheses extension of Lorden's results. We introduce a joint criterion of optimality for the detection/isolation problem and then design a change detection/isolation algorithm. We also investigate the statistical properties of this algorithm. We prove a lower bound for the criterion in a class of sequential change detection/isolation algorithms. It is shown that the proposed algorithm is asymptotically optimal in this class. The theoretical results are applied to the case of additive changes in linear stochastic models.

Index Terms: Sequential change detection and isolation, generalized change detection, linear stochastic models.

I. INTRODUCTION

Statistical decision tools for detecting and isolating abrupt changes in the properties of stochastic signals and dynamical systems have numerous applications, from on-line fault diagnosis in complex technical systems to edge detection in images and detection of signals with unknown arrival time in geophysics, radar, and sonar signal processing. For example, early on-line fault diagnosis (change detection and isolation) in industrial processes helps prevent more catastrophic failures. As another example, consider the problem of target detection and identification. Assume that there are several types of targets, each of which can appear at an unknown moment of time. The problem is to detect that a target has arrived and to classify the type of target as soon as possible.

The solution of the change diagnosis problem that is now almost traditional consists of subdividing this problem into two stages: change (fault) detection and change (fault) isolation, which are executed sequentially. As mentioned in the pioneering paper by A. Willsky [24], the fault detection (or alarm) task consists of "making a binary decision: either that something has gone wrong or that everything is fine." The fault isolation task "is that of determining the source of the fault."

The change detection problem has received extensive re- search attention. Mathematically the change detection problem

Manuscript received July 14, 1993; revised April 16, 1994. An early version of the paper was presented in part at the IEE Conference CONTROL'94, Coventry, UK, March 21-24, 1994, and at the American Control Conference 1994, Baltimore, MD, June 29-July 1, 1994.

The author is on leave from the Institute of Control Sciences, Moscow, Russia. He was with IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France. He is now with L.A.I.L. U.R.A. CNRS D 1440, Universite des Sciences et Technologies de Lille, Batiment P2, 59566 Villeneuve d'Ascq Cedex, France.

IEEE Log Number 9406687.

can be formulated as that of the quickest detection of abrupt changes in stochastic systems. Recent results and references can be found in [3]. The two main classes of quickest detection problems are the Bayesian and the non-Bayesian approaches. The first optimality results for the Bayesian approach were obtained in [18], [19]. More recent results can be found in [16]. The first algorithm for the non-Bayesian approach was suggested by E. S. Page in [15]: the cumulative sum (CUSUM) algorithm. The asymptotic minimax ("worst case") optimality of the CUSUM was proved in [10], where G. Lorden gave a lower bound for the worst mean delay for detection and proved that the CUSUM algorithm reaches this lower bound. Recently, nonasymptotic aspects of optimality for non-Bayesian algorithms were investigated in [12], [17]. The asymptotic minimax optimality of the CUSUM algorithm in the case of dependent random processes was obtained in [2].
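Page's CUSUM rule admits a very compact implementation. The sketch below is illustrative only (the Gaussian mean-shift scenario, the threshold value, and all function names are assumptions of this sketch, not taken from the paper); it tracks the statistic $g_n = \max_{1\le k\le n} S_k^n$ via the standard recursion and stops when $g_n$ reaches a threshold $h$.

```python
import random

def cusum(xs, log_lr, h):
    """Page's CUSUM sketch: the recursion g_n = max(0, g_{n-1}) + s_n
    tracks max over change times k of the log LR S_k^n; stop when g_n >= h."""
    g = 0.0
    for n, x in enumerate(xs, start=1):
        g = max(0.0, g) + log_lr(x)
        if g >= h:
            return n  # alarm time
    return None  # no alarm within the sample

# Hypothetical example: detect a mean shift 0 -> 1 in unit-variance noise.
# The log LR of N(1,1) against N(0,1) at observation x is x - 0.5.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(50)] + [random.gauss(1, 1) for _ in range(50)]
alarm = cusum(xs, lambda x: x - 0.5, h=5.0)
print(alarm)
```

With a positive post-change drift of 0.5 per observation, the statistic crosses h = 5 shortly after the change at n = 50 in this simulation.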

The change isolation problem has been investigated much less. To our knowledge, no proof of optimality in a mathematically precise sense exists. Moreover, because the quickest detection criterion of optimality does not take the isolation problem into account, the way to combine the detection and isolation algorithms is not obvious. Therefore, the following problems remain unsolved:

- What is a convenient criterion of optimality for the fault isolation problem? How may we avoid contradictions between the criteria of the detection and isolation stages? Typically a short mean detection delay is desirable, but a longer decision delay can improve the result of the isolation stage. We shall prove that this trade-off is not always necessary.
- What is a lower bound for the performance index in a class of detection/isolation algorithms?
- What is an optimal (or asymptotically optimal) algorithm which reaches this lower bound?

The goal of this paper is to present a new statistical method for jointly detecting and isolating changes in the properties of stochastic systems, and to prove the asymptotic optimality of this method. The criterion of optimality of this "generalized change detection problem" consists in minimizing the worst mean detection/isolation delay for a given mean time before a false alarm or a false isolation.

The paper is the first attempt to solve this new problem. For this reason we assume that the statistical models before and after changes are known exactly. This means that we assume the case of simple hypotheses. The case of composite hypotheses will be discussed elsewhere. The paper is organized as follows.

First, we give the generalized change detection problem statement, the basic model which is a finite parametric family

0018-9448/95$04.00 © 1995 IEEE


of distributions, and the criteria of optimality in Section II. We also give an intuitive definition of the criteria and discuss some features of the proposed criteria.

Next, we design the change detection/isolation algorithm for the basic model in Section III. We also investigate the statistical properties of this algorithm. The main results are stated in Theorem 1.

In Section IV we investigate a lower bound for the worst mean detection/isolation delay in a class of sequential change detection/isolation algorithms. The main result is established in Theorem 2.

Finally, in Section V we introduce two types of linear stochastic models with additive abrupt changes. The first type is a regression model with redundancy. The second type is a stochastic dynamical model. In these two cases we show how the new change detection/isolation problems can be reduced to the basic problem, and we discuss some new features which play a key role in these models. We also investigate the statistical properties of the change detection/isolation algorithm. The results are given in Theorems 3 and 4.


II. PROBLEM STATEMENT

In this section we give an intuitive formulation of the change detection/isolation criteria and discuss some features of the proposed criteria. Next, we define the basic model, which is a finite parametric family of distributions, and give a formal definition of the criteria.

A. Intuitive Formulation of the Change Detection/Isolation Problem

Let us assume that there exists a discrete-time stochastic dynamical model $\mathcal{F}_n(\theta)$, $n = 1, 2, \ldots$. The vector $\theta \in \mathbb{R}^r$ is the parameter of interest. Let

$$\mathcal{F} = \{\mathcal{F}(\theta) : \theta \in \Theta\}, \qquad \Theta = \bigcup_{i=0}^{K-1} \{\theta_i\}$$

be a finite family ($K$ members) of this model. Until the unknown time $k$ the vector is $\theta = \theta_0$, and from $k+1$ on it becomes $\theta = \theta_l$ for some $l$, $l = 1, \ldots, K-1$. Therefore, $\mathcal{F}_n(\theta)$ is the model with abrupt changes

$$\mathcal{F}_n(\theta) = \begin{cases} \mathcal{F}(\theta_0) & \text{if } n \le k \\ \mathcal{F}(\theta_l) & \text{if } n \ge k+1 \end{cases} \quad (1)$$

for some $l = 1, \ldots, K-1$ and $k = 0, 1, 2, \ldots$, where $\mathcal{F}(\theta_0)$ is the normal operation mode of the model $\mathcal{F}$ and $\mathcal{F}(\theta_l)$ is the mode with fault number $l \ge 1$. We assume that the values of $\theta_l$ are known a priori. The change time $k+1$ and the number $l$ are unknown.

Let $(Y_n)_{n \ge 1}$ be a sequence of observations coming from system (1). The problem is to detect and isolate the change in $\theta$; in other words, we have to determine the type of fault (number $l$) as soon as possible. The change detection/isolation algorithm has to compute a pair $(N, \nu)$ based on the observations $Y_1, Y_2, \ldots$, where $N$ is the alarm time at which a $\nu$-type change is detected/isolated and $\nu$, $\nu = 1, \ldots, K-1$, is the final decision. In other words, at time $N$ the hypothesis $\mathcal{H}_\nu : \{\theta = \theta_\nu\}$ is accepted.

The following situations can occur. If the change is detected/isolated after time $k$ ($N > k$ is true), then the delay for detection/isolation of an $l$-type change is

$$\tau_l = N - k.$$

On the contrary, if the changes in $\theta$ are detected before time $k$, or if the final decision is incorrect ($\nu \ne l$), then we have false alarms or false isolations, which we characterize in the following manner:

- False alarms. Let the observations $(Y_n)_{n \ge 1}$ come from the normal mode system $\mathcal{F}(\theta_0)$. Consider the following sequence of alarm times

$$N_0 = 0 < N_1 < N_2 < \cdots < N_r < \cdots$$

where $N_r$ is the alarm time of the detection/isolation algorithm applied to $Y_{N_{r-1}+1}, Y_{N_{r-1}+2}, \ldots$. Define the first false alarm time $N^{\nu=j}$ of a $j$-type in this sequence by

$$N^{\nu=j} = \inf_{r \ge 1} \{N_r : \nu_r = j\}, \qquad 1 \le j \le K-1$$

where $\inf\{\emptyset\} = \infty$ as usual.

- False isolations. In order to avoid uncertainties in the initial conditions, let us assume that $k = 0$; in other words, the observations $(Y_n)_{n \ge 1}$ come from the mode $\mathcal{F}(\theta_l)$ with fault number $l \ge 1$. Define the first false isolation time $N^{\nu=j}$ of a $j$-type in this sequence:

$$N^{\nu=j} = \inf_{r \ge 1} \{N_r : \nu_r = j\}, \qquad 1 \le j \ne l \le K-1.$$

It is intuitively obvious that the optimality criterion must favor fast detection/isolation with few false alarms and few false isolations. In other words, the delay $\tau_l = N - k$ given that $N > k$ should be stochastically small for each $l = 1, \ldots, K-1$, and

$$N^{\nu=j} = \inf_{r \ge 1} \{N_r : \nu_r = j\}$$

should be stochastically large for each combination of numbers $j \ne l$.

B. The Basic Model

We consider a finite family of distributions $\mathcal{P} = \{P_i, i = 0, \ldots, K-1\}$ with densities $\{p_i, i = 0, \ldots, K-1\}$ with respect to some measure $\mu$. In the parametric case we assume that $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$, where $\theta \in \mathbb{R}^r$,

$$\Theta = \bigcup_{i=0}^{K-1} \{\theta_i\}$$

and we denote the density function of this family by $p_\theta(X)$.


Let $(X_n)_{n \ge 1}$ be an independent random sequence observed sequentially, where $X_1, \ldots, X_k$ have distribution $P_0$ while $X_{k+1}, X_{k+2}, \ldots$ have distribution $P_l$, $l = 1, \ldots, K-1$:

$$\mathcal{L}(X_n) = \begin{cases} P_0 & \text{if } n \le k \\ P_l & \text{if } n \ge k+1 \end{cases} \quad (2)$$

where $\mathcal{L}(\cdot)$ is the probability law. The change time $k+1$ and the number $l$ are unknown. We assume that for this family of distributions the following inequality is true:

$$0 < \rho_{ij} < \infty \quad \text{for all } 0 \le i \ne j \le K-1 \quad (3)$$

where $\rho_{ij} = E_i[\ln (p_i(X)/p_j(X))]$ is the Kullback-Leibler information.
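For the unit-covariance Gaussian family used in the example of Section III, the Kullback-Leibler information has the closed form $\rho_{ij} = \|\theta_i - \theta_j\|^2/2$, so inequality (3) holds whenever the parameter values are pairwise distinct. The short sketch below checks this numerically (the parameter values are those of the paper's example; the function name is an assumption of this sketch).

```python
# Kullback-Leibler information for N(theta_i, I) vs N(theta_j, I):
# rho_ij = ||theta_i - theta_j||^2 / 2, so condition (3) holds whenever
# the parameter values theta_i are pairwise distinct.
def rho(theta_i, theta_j):
    return 0.5 * sum((a - b) ** 2 for a, b in zip(theta_i, theta_j))

thetas = [(0.0, 0.0), (-1.0, 1.0), (1.0, 2.0)]  # theta_0, theta_1, theta_2
K = len(thetas)
rhos = {(i, j): rho(thetas[i], thetas[j])
        for i in range(K) for j in range(K) if i != j}
assert all(0 < r < float("inf") for r in rhos.values())  # inequality (3)
print(rhos[(1, 0)], rhos[(1, 2)], rhos[(2, 0)])  # -> 1.0 2.5 2.5
```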

C. Formal Definition of Criteria

Let us pursue our discussion of the criteria of optimality. The change detection/isolation algorithm consists in computing a pair $(N, \nu)$ based on the observations $X_1, X_2, \ldots$. Let $P_{k+1}^l$ be the distribution of the observations $X_1, X_2, \ldots$ when the change time is $k+1$, $k = 0, 1, 2, \ldots$, and let $E_{k+1}^l$ be the expectation under $P_{k+1}^l$. Therefore, the mean delay for the detection/isolation of an $l$-type change is

$$\bar{\tau}_l = E_{k+1}^l(N - k \mid N > k, X_1, \ldots, X_k) \quad (4)$$

where $E(\cdot \mid \cdot)$ is the conditional expectation. It is obvious that, without knowing a priori the distribution of the change time $k+1$, the mean decision delay defined in (4) is a function of $k$ and of the past "trajectory" $X_1, \ldots, X_k$ of the random sequence. In many practical cases it is useful to have an algorithm whose performance index is independent of the distribution of the change time $k+1$ and of the sample path of the observations $X_1, \ldots, X_k$. For this reason we use another, minimax performance index, which was introduced in [10]. Hence, the worst mean detection/isolation delay is^1

$$\bar{\tau}_l^* = \sup_{k \ge 0} \operatorname{ess\,sup} E_{k+1}^l(N - k \mid N > k, X_1, \ldots, X_k). \quad (5)$$

On the other hand, the mean time before the first false alarm $N^{\nu=j}$ of a $j$-type is defined by the following formula:

$$E_0(N^{\nu=j}) = E_0\Bigl(\inf_{r \ge 1} \{N_r : \nu_r = j\}\Bigr)$$

where $E_0(\cdot)$ denotes expectation when all the observations come from $P_0$ (the value of $k$ is irrelevant in this case). Analogously, the mean time before the first false isolation $N^{\nu=j}$ of a $j$-type is $E_l(N^{\nu=j})$, where $E_l(\cdot) = E_1^l(\cdot)$.

Let us consider the following minimax criterion. We require that the worst mean detection/isolation delay

$$\bar{\tau}^* = \sup_{k \ge 0,\; 1 \le l \le K-1} \operatorname{ess\,sup} E_{k+1}^l(N - k \mid N > k, X_1, \ldots, X_k) \quad (6)$$

be as small as possible for a given minimum $\gamma$ of the mean times before a false alarm or a false isolation:

$$\min_{0 \le l \le K-1} \; \min_{1 \le j \ne l \le K-1} E_l(N^{\nu=j}) \ge \gamma. \quad (7)$$

^1 Let us assume that $x$ and $y$ are random variables. We say that $y = \operatorname{ess\,sup} x$ if 1) $P(x \le y) = 1$; and 2) if $P(x \le z) = 1$ then $P(y \le z) = 1$, where $P(\mathcal{A})$ is the probability of the event $\mathcal{A}$.

Remark 1: Usually, in the classical change detection problem, the mean time before the first false alarm is equal to the mean time between false alarms. This follows from the fact that the system inspection and repair times are of no interest to us and thus are assumed to be zero; in other words, the process of observation restarts immediately, as at the beginning. In the problem discussed here we can assume, under the same assumptions, that $\gamma$ is equal to the minimum mean time between false alarms or false isolations.

Discussion: Let us explain criterion (6)-(7). Originally, this type of criterion was introduced for the quickest change detection problem in [10]. For a given change time $k+1$, the conditional expectation

$$\bar{\tau}_l = E_{k+1}^l(N - k \mid N > k, X_1, \ldots, X_k)$$

is a random variable. For this reason we take the essential supremum $\operatorname{ess\,sup} \bar{\tau}_l(X_1, \ldots, X_k)$ in order to reach the smallest $A_l(k)$ such that

$$\bar{\tau}_l(X_1, \ldots, X_k) \le A_l(k)$$

almost surely under the probability measure of the observations $X_1, \ldots, X_k$. After this we take the supremum

$$\sup_{k \ge 0,\; 1 \le l \le K-1} A_l(k)$$

in order to guarantee the worst mean detection/isolation delay with respect to the unknown change time and the unknown number $l$ of the hypothesis $\mathcal{H}_l$. On the other hand, we constrain the minimum value $\gamma$ of the mean time before a false alarm or a false isolation in order to guarantee an acceptable level of false decisions.

III. THE BASIC ALGORITHM AND ITS PROPERTIES

This section is organized in the following manner. First, we design the joint detection/isolation algorithm. Then we investigate the asymptotic statistical properties of this algorithm using criterion (6)-(7).

A. The Change Detection/Isolation Algorithm

1) Sequential Hypotheses Testing Problem: Let us start with the Armitage sequential probability ratio test (SPRT) for the statistical multiple hypotheses testing problem (see [1] or the handbook [9, p. 237]). Again, we suppose that $\mathcal{P} = \{P_i : i = 0, \ldots, K-1\}$ is a finite family of distributions. Hence, we consider $K$ simple hypotheses

$$\mathcal{H}_i : \{\mathcal{L}(X_n)_{n \ge 1} = P_i\}, \qquad i = 0, \ldots, K-1.$$

We observe sequentially an independent random sequence $(X_n)_{n \ge 1}$ with density $p_i$, where $P_i$ is an unknown member of the family $\mathcal{P}$. The multiple hypotheses SPRT is the pair $(M, d)$, where $M$ is the stopping time and $d$ is the final decision, defined in the following manner:

$$M = \min_{i=0,\ldots,K-1} M_i, \qquad d = \arg\min_{i=0,\ldots,K-1} M_i \quad (8)$$

$$M_i = \inf\Bigl\{n \ge 1 : \min_{0 \le j \ne i \le K-1} \bigl[S_1^n(i,j) - h_{ij}\bigr] \ge 0\Bigr\} \quad (9)$$

where

$$S_k^n(i,j) = \sum_{t=k}^{n} \ln \frac{p_i(X_t)}{p_j(X_t)}$$

is the log likelihood ratio (LR) between the hypotheses $\mathcal{H}_i$ and $\mathcal{H}_j$, and the $h_{ij}$ are chosen thresholds.

The interpretation of SPRT (8)-(9) is very simple: it stops at the first time $n$ at which there exists some hypothesis $\mathcal{H}_i$ for which each LR $S_1^n(i,j)$ between $\mathcal{H}_i$ and $\mathcal{H}_j$ is greater than or equal to a chosen threshold $h_{ij}$, $0 \le j \ne i \le K-1$.
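The stopping rule just described can be sketched in a few lines. The example below is illustrative only (three hypothetical unit-variance Gaussian hypotheses, a single common threshold, and all names and numeric values are assumptions of this sketch, not from the paper): it accumulates the pairwise log LRs and accepts the first hypothesis whose log LR against every alternative reaches the threshold.

```python
import math
import random

def armitage_sprt(xs, densities, h):
    """Sketch of the multiple hypotheses SPRT (8)-(9) with a common
    threshold h: accept the first hypothesis i whose log LR S(i, j)
    against every alternative j reaches h."""
    K = len(densities)
    S = [[0.0] * K for _ in range(K)]  # S[i][j] = running log LR of H_i vs H_j
    for n, x in enumerate(xs, start=1):
        logp = [math.log(f(x)) for f in densities]
        for i in range(K):
            for j in range(K):
                if i != j:
                    S[i][j] += logp[i] - logp[j]
        for i in range(K):
            if all(S[i][j] >= h for j in range(K) if j != i):
                return n, i  # stopping time M and final decision d
    return None, None

# Hypothetical example: Gaussian means 0, 1, 2 with unit variance.
def gauss(mu):
    return lambda x: math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

random.seed(1)
xs = [random.gauss(1.0, 1.0) for _ in range(200)]  # data generated under H_1
M, d = armitage_sprt(xs, [gauss(0.0), gauss(1.0), gauss(2.0)], h=5.0)
print(M, d)
```

Under the true hypothesis, both pairwise log LRs have positive drift (here 0.5 per observation), so the test terminates quickly with high probability.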

2) Design of the Detection/Isolation Algorithm: The idea of the proposed detection/isolation algorithm is based on a class of "extended stopping variables" which was originally introduced in [10]. Because we are not interested in the detection of hypothesis $\mathcal{H}_0$, we assume that $l = 1, \ldots, K-1$. Let us introduce the stopping time

$$N = \min\{N^1, \ldots, N^{K-1}\} \quad (10)$$

and final decision

$$\nu = \arg\min_{1 \le l \le K-1} N^l \quad (11)$$

of the detection/isolation algorithm, where the stopping time $N^l$ is responsible for the detection of hypothesis $\mathcal{H}_l$ and is defined by the following formulas:

$$N^l = \inf_{k \ge 1} N^l(k) \quad (12)$$

$$N^l(k) = \inf\Bigl\{n \ge k : \min_{0 \le j \ne l \le K-1} S_k^n(l,j) \ge h\Bigr\}. \quad (13)$$

3) Discussion: Let us discuss the design of algorithm (10)-(12). It follows from (12) that the algorithm is based on a concept that is very important in detection theory, namely the LR $S_k^n(l,j)$ between the hypotheses $\mathcal{H}_l$ and $\mathcal{H}_j$. The key statistical properties of this ratio are as follows:

$$E_j\bigl(S_k^n(l,j)\bigr) < 0 \quad \text{and} \quad E_l\bigl(S_k^n(l,j)\bigr) > 0.$$

In other words, a change in statistical model (2) is reflected as a change in the sign of the mean of the log LR.

If we have to detect a change in a distribution, then the classical optimal solution of this problem is the cumulative sum (CUSUM) algorithm [15]. The CUSUM algorithm is based on the comparison of the difference between the value of the log LR and its current minimum value

$$g_n = S_1^n - \min_{0 \le k \le n} S_1^k = \max_{1 \le k \le n} S_k^n, \qquad S_1^0 = 0$$

with a given threshold. In other words, the CUSUM stops at time

$$N_c = \inf\Bigl\{n \ge 1 : \max_{1 \le k \le n} S_k^n \ge h\Bigr\}$$

if, for some $k < N_c$, the observations $X_k, \ldots, X_{N_c}$ are significant for accepting the hypothesis about change.

If we have to detect and isolate an $l$-type change in the model then, generally speaking, we may exploit Page's idea with some modifications. Now we have the set $\mathcal{H}_0, \mathcal{H}_1, \ldots, \mathcal{H}_{K-1}$ of alternatives. For this reason, $N^l$ stops if, for some $k < N^l$, the observations $X_k, \ldots, X_n$ are significant for accepting the hypothesis $\mathcal{H}_l$ with respect to this set of alternatives:

$$N^l = \inf\Bigl\{n \ge 1 : \max_{1 \le k \le n} \; \min_{0 \le j \ne l \le K-1} S_k^n(l,j) \ge h\Bigr\}. \quad (14)$$

Let us discuss the following example. Let $X_n \in \mathbb{R}^2$ be a Gaussian vector: $\mathcal{L}(X_n) = \mathcal{N}(\theta, I)$, where $I$ is a unit covariance matrix. We consider the following hypotheses:

$$\mathcal{H}_0 : \{\theta = (0, 0)^T\}, \qquad \mathcal{H}_1 : \{\theta = (-1, 1)^T\}, \qquad \mathcal{H}_2 : \{\theta = (1, 2)^T\}.$$

The typical behavior of the log LR is depicted in Figs. 1 and 2, where the log LRs $S_1^n(l,j)$ are denoted by $S(l,j)$. For this example, the change time $k$ is equal to 20. Figs. 1 and 2 show the changes from $\mathcal{H}_0$ to $\mathcal{H}_1$ and from $\mathcal{H}_0$ to $\mathcal{H}_2$, respectively. The decision-making process is quite simple: when both differences between the values of the log LRs $S_1^n(1,0)$ and $S_1^n(1,2)$ and their current minima are greater than or equal to the threshold (here $h = 10$), as shown in Fig. 1, then we stop the process of observation ($N = 27$) and accept the hypothesis $\mathcal{H}_1$ ($\nu = 1$). Analogously, when both differences between the values of the log LRs $S_1^n(2,0)$ and $S_1^n(2,1) = -S_1^n(1,2)$ and their current minima are greater than or equal to the threshold, then we stop the process of observation ($N = 24$) and accept the hypothesis $\mathcal{H}_2$ ($\nu = 2$). This situation is shown in Fig. 2.

Stopping time (14) can also be interpreted as a generalized likelihood ratio (GLR) algorithm [10], [11], [24]. The GLR algorithm for testing between two composite hypotheses $H_0 = \{\theta \in \Theta_0\}$ and $H_1 = \{\theta \in \Theta_1\}$ is based on the following statistic:

$$\frac{\sup_{\theta \in \Theta_1} p_\theta(X_1^n)}{\sup_{\theta \in \Theta_0} p_\theta(X_1^n)}.$$

In our case, the hypothesis $H_1 = \mathcal{H}_l$ is simple and the hypothesis

$$H_0 = \bigcup_{0 \le j \ne l \le K-1} \mathcal{H}_j$$


is composite, but finite. Hence

$$\min_{0 \le j \ne l \le K-1} S_k^n(l,j) = \ln \frac{p_l(X_k^n)}{\max_{0 \le j \ne l \le K-1} p_j(X_k^n)}.$$

Fig. 1. Typical behavior of the log likelihood ratios $S_1^n(l,j)$. The change from $\mathcal{H}_0$ to $\mathcal{H}_1$.

Fig. 2. Typical behavior of the log likelihood ratios $S_1^n(l,j)$. The change from $\mathcal{H}_0$ to $\mathcal{H}_2$.

Let us add a comment on the threshold issue. In algorithm (10)-(12) we use the single threshold $h$ instead of the thresholds $h_{ij}$ of SPRT (8)-(9). The main idea of this choice is the fact that the level of false alarms (or false isolations) is a function of the thresholds (see Lemma 1 for details). Therefore, to have the same level of false decisions for the separate stopping times $N^l$, we choose the same level of the thresholds: $h_{ij} = h$.
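The decision rule of the example, comparing each log LR $S(l,j)$ minus its running minimum with the common threshold $h$, can be sketched as follows. The scenario (random seed, sample sizes, threshold) is illustrative, not from the paper, and the alarm time varies with the noise realization; the parameter values are those of the example above.

```python
import random

def detect_isolate(xs, thetas, h):
    """Sketch of the example's decision rule: for each l >= 1 and each
    alternative j != l, track S(l,j) minus its running minimum via the
    CUSUM-type recursion g = max(0, g) + s; accept H_l when all of its
    pairwise statistics reach h."""
    K = len(thetas)
    g = [[0.0] * K for _ in range(K)]
    for n, x in enumerate(xs, start=1):
        for l in range(1, K):
            for j in range(K):
                if j != l:
                    # log LR increment of N(theta_l, I) against N(theta_j, I)
                    s = 0.5 * sum((x[d] - thetas[j][d]) ** 2
                                  - (x[d] - thetas[l][d]) ** 2
                                  for d in range(len(x)))
                    g[l][j] = max(0.0, g[l][j]) + s
        for l in range(1, K):
            if min(g[l][j] for j in range(K) if j != l) >= h:
                return n, l  # alarm time N and final decision nu
    return None, None

# Change from H_0 to H_1 at k = 20, as in the example (illustrative data).
random.seed(2)
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]
xs += [(random.gauss(-1, 1), random.gauss(1, 1)) for _ in range(40)]
N, nu = detect_isolate(xs, [(0.0, 0.0), (-1.0, 1.0), (1.0, 2.0)], h=10.0)
print(N, nu)
```

Since $\rho_{10} = 1$ is the slowest pairwise drift, the statistic for $\mathcal{H}_1$ needs roughly $h/\rho_{10} \approx 10$ post-change observations to cross the threshold.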

B. Statistical Properties

Now we investigate the statistical properties of the algorithm, namely the relation between the worst mean detection/isolation delay $\bar{\tau}^*$ given in (6) and the minimum mean time $\gamma$ before a false alarm or a false isolation given in (7).

The main result is stated in the following theorem.

Theorem 1: Let $(N, \nu)$ be the detection/isolation algorithm (10)-(12). Then

$$\min_{0 \le l \le K-1} \; \min_{1 \le j \ne l \le K-1} E_l(N^{\nu=j}) \ge e^h \quad \text{and} \quad \bar{\tau}^* \le \frac{h}{\rho^*}\,(1 + o(1)) \quad \text{as } h \to \infty \quad (15)$$

where

$$\rho^* = \min_{1 \le l \le K-1} \; \min_{0 \le j \ne l \le K-1} \rho_{lj}.$$
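A quick numeric reading of Theorem 1 (illustrative values; the target $\gamma$ is an assumption of this sketch): choosing $h = \ln \gamma$ makes the false alarm/false isolation bound $e^h$ meet a prescribed level $\gamma$, and the worst mean delay then grows like $h/\rho^*$.

```python
import math

# Illustrative reading of Theorem 1: pick a target gamma for the mean time
# before a false alarm/false isolation, set h = ln(gamma) so that e^h = gamma,
# and read off the first-order worst mean delay h / rho*.
gamma = 10_000.0
h = math.log(gamma)        # threshold matching the e^h bound to gamma
rho_star = 1.0             # rho* for the Gaussian example of Section III-A
delay_bound = h / rho_star
print(round(h, 2), round(delay_bound, 2))  # -> 9.21 9.21
```

Doubling the required $\gamma$ only adds $\ln 2 \approx 0.69$ to the delay bound, which is the usual logarithmic trade-off between delay and false alarm rate.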

The proof uses the following three results.

Lemma 1: Let $N^j$ be the stopping variables (12) applied to observations $X_1, X_2, \ldots$ with distribution $P_l$. Then

$$E_l(N^j) \ge e^h \quad \text{for } l = 0, \ldots, K-1, \; 1 \le j \ne l \le K-1. \quad (16)$$

Proof of Lemma 1: It is known from Lorden's Theorem 2 [10] that the expectation of the stopping variable $N = \inf_{k \ge 1}\{n(k)\}$, where $n(k)$ is the stopping time of the open-ended SPRT activated at time $k$, satisfies the following inequality:

$$E_l(N) \ge \frac{1}{\alpha} \quad \text{when } P_l(n(k) < \infty) \le \alpha, \; k = 1, 2, \ldots.$$

Hence, it will suffice to show that $P_l(N^j(1) < \infty) \le e^{-h}$. Let us consider the following stopping variables:

$$N^{ji}(1) = \inf\{n \ge 1 : S_1^n(j,i) \ge h\}$$

and the events $A_i = \{N^{ji}(1) < \infty\}$ and $B = \{N^j(1) < \infty\}$. It is easy to show that

$$P_l(B) \le P_l(A_l).$$

Hence, from the above formula and Wald's inequality $P_l(A_l) \le e^{-h}$ (see [22, pp. 40-44]),

$$P_l(N^j(1) < \infty) \le e^{-h}. \quad (17)$$

The proof of Lemma 1 is complete.

Lemma 2: Let $N^j$ be the stopping variables (12) applied to observations $X_1, \ldots, X_k, X_{k+1}, \ldots$ with distribution $P_{k+1}^j$. Then

$$\bar{\tau}_j^* \le E_j[N^j(1)] \sim \frac{h}{\min_{0 \le i \ne j \le K-1} \rho_{ji}} \quad \text{as } h \to \infty. \quad (18)$$

Proof of Lemma 2: The first part of the proof follows from Lorden's Theorem 2 [10]. Namely, note here that the event $\{N^j \le k\}$, where

$$N^j = \inf_{k \ge 1} N^j(k),$$

is the union of the events

$$\{N^j(1) \le k\}, \{N^j(2) \le k\}, \ldots, \{N^j(k) \le k\}.$$

This results in the worst mean detection/isolation delay satisfying the following inequality:

$$\bar{\tau}_j^* \le E_j\bigl(N^j(1)\bigr).$$

Let us define the following stopping variable $M(h)$:

$$M = \inf\Bigl\{n \ge 1 : \min\Bigl(\sum_{i=1}^{n} y_i^1, \ldots, \sum_{i=1}^{n} y_i^{K-1}\Bigr) \ge h\Bigr\}$$

where $Y_i = (y_i^1, \ldots, y_i^{K-1})^T$ is a sequence of independent identically distributed (i.i.d.) random vectors. Moreover, we assume that

$$0 < \min(m_1, \ldots, m_{K-1}), \quad \text{where } m_j = E(y^j).$$

From Farrell's Theorem 3 [7] we know the following properties of $M$:

$$P(M < \infty) = 1; \qquad \lim_{h \to \infty} h^{-1} E(M(h)) = \frac{1}{\min(m_1, m_2, \ldots, m_{K-1})}.$$

In the case of the stopping variable $N^j$ we have $m_i = \rho_{ji} > 0$. The proof of Lemma 2 is complete.

Corollary 1: Let $(N, \nu)$ be the detection/isolation algorithm (10)-(12). Then

$$\bar{\tau}^* \le \max_{1 \le j \le K-1} E_j[N^j(1)] \sim \frac{h}{\rho^*} \quad \text{as } h \to \infty. \quad (19)$$

Proof of Corollary 1: Formula (19) follows at once from the definition of $N$ (10) and from (18) (Lemma 2).

Proof of Theorem 1: First, let us show that $E_0(N^{\nu=j}) \ge e^h$, $j = 1, \ldots, K-1$. Define the following two events: $\{N^{\nu=j} \le n\}$, where

$$N^{\nu=j} = \inf_{r \ge 1}\{N_r : \nu_r = j\},$$

and $\{N^j \le n\}$. Denote by $r_0^j$ the argument of the above infimum. It is obvious that $\{N^{\nu=j} \le n \mid r_0^j\} = \{N^j \le n\}$ when $r_0^j = 1$ and $\{N^{\nu=j} \le n \mid r_0^j\} \subset \{N^j \le n\}$ when $r_0^j > 1$. Therefore

$$P_0(N^{\nu=j} \le n \mid r_0^j) \le P_0(N^j \le n) \quad \text{and} \quad P_0(N^{\nu=j} > n \mid r_0^j) \ge P_0(N^j > n) \quad \text{when } r_0^j \ge 1.$$

Now, since

$$E_0(N^{\nu=j} \mid r_0^j) = \sum_{n=0}^{\infty} P_0(N^{\nu=j} > n \mid r_0^j) \ge \sum_{n=0}^{\infty} P_0(N^j > n) = E_0(N^j)$$

we have from Lemma 1 that

$$E_0(N^j) \ge e^h.$$

Finally, we get

$$E_0(N^{\nu=j}) = E_0\bigl[E_0(N^{\nu=j} \mid r_0^j)\bigr] \ge e^h. \quad (20)$$

Second, let us show that

$$E_l(N^{\nu=j}) \ge e^h, \qquad l = 1, \ldots, K-1, \; 1 \le j \ne l \le K-1. \quad (21)$$

The proof of this step of Theorem 1 is similar to the previous step; it will suffice to use $P_l(\cdot)$ ($E_l(\cdot)$) instead of $P_0(\cdot)$ ($E_0(\cdot)$). Relation (15) follows at once from (20), (21), and Corollary 1. The proof of Theorem 1 is complete.


IV. ASYMPTOTIC THEORY

In this section we prove a lower bound for the worst mean detection/isolation delay over the class $\mathcal{K}_\gamma$ of sequential change detection/isolation algorithms. First, we give some technical results on sequential multiple hypotheses tests, and then we prove this asymptotic lower bound for $\bar{\tau}^*$. Finally, we compare this new lower bound with the lower bound for the worst mean delay of the classical change detection (without isolation!) algorithms mentioned before.

Lemma 3: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables. Let $\mathcal{H}_0, \ldots, \mathcal{H}_{K-1}$ be $K \ge 2$ hypotheses, where $\mathcal{H}_i$ is the hypothesis that $X$ has density $p_i$ with respect to some probability measure $\mu$, for $i = 0, \ldots, K-1$, and assume inequality (3) to be true. Let $E_i(N)$ be the average sample number (ASN) of a sequential test which chooses one of the $K$ hypotheses subject to a $K \times K$ error matrix $A = \|\alpha_{ij}\|$, where $\alpha_{ij} = P_i(\text{accepting } \mathcal{H}_j)$, $i, j = 0, \ldots, K-1$. Let us define the matrix $A$ whose $i$th row, $i = 1, \ldots, K-1$, is

$$\Bigl(\gamma_i,\; \beta_{i1}, \ldots, \beta_{i,i-1},\; 1 - \gamma_i - \sum_{l=1,\, l \ne i}^{K-1} \beta_{il},\; \beta_{i,i+1}, \ldots, \beta_{i,K-1}\Bigr)$$

where $\gamma_i = \alpha_{i0}$ and $\beta_{ij} = \alpha_{ij}$ for $j \ge 1$. Then a lower bound for $E_i(N)$ is given by the following formula:

$$E_i(N) \ge \max_{1 \le j \ne i \le K-1} \frac{(1 - r_i)\,|\ln \beta_{ji}| - \ln 2}{\rho_{ij}} \quad (22)$$

for $i = 1, \ldots, K-1$, where

$$r_i = \gamma_i + \sum_{l=1,\, l \ne i}^{K-1} \beta_{il}.$$

Proof of Lemma 3: See Appendix I.

Definition 1: Let $\mathcal{K}_\gamma$ be the class of all sequential detection/isolation algorithms $(N, \nu)$, where $N$ is the extended stopping variable and $\nu$ is the final decision, that satisfy the following inequality:

$$\min_{0 \le l \le K-1} \; \min_{1 \le j \ne l \le K-1} E_l\Bigl(\inf_{r \ge 1}\{N_r : \nu_r = j\}\Bigr) \ge \gamma. \quad (24)$$

Theorem 2: Suppose the class (24) is nonempty. Let us define the lower bound $n(\gamma)$ as the infimum of the worst mean detection/isolation delay in the class $\mathcal{K}_\gamma$:

$$n(\gamma) = \inf_{(N, \nu) \in \mathcal{K}_\gamma} (\bar{\tau}^*).$$

Let inequality (3) be true. Then

$$n(\gamma) \ge \frac{\ln \gamma}{\rho^*}\,(1 + o(1)) \quad \text{as } \gamma \to \infty \quad (25)$$

where

$$\rho^* = \min_{1 \le l \le K-1} \; \min_{0 \le j \ne l \le K-1} \rho_{lj}.$$

Proof of Theorem 2: See Appendix II.

Corollary 2: Detection/isolation algorithm (10)-(12) is asymptotically optimal in the class $\mathcal{K}_\gamma$.

It is of interest to compare $n(\gamma)$ (25) with the infimum $n_c(\gamma_c)$ of the worst mean detection delay for a change detection algorithm. Let $N^c$ be the stopping variable of a change detection algorithm. We suppose that the worst mean detection delay is

$$\bar{\tau}_c^* = \sup_{k \ge 0,\; 1 \le l \le K-1} \operatorname{ess\,sup} E_{k+1}^l(N^c - k \mid N^c > k, X_1, \ldots, X_k). \quad (23)$$

Denote by $\mathcal{K}_{\gamma_c}^c$ the class of all stopping variables satisfying $E_0(N^c) \ge \gamma_c$.

Corollary 3: Let the following equalities be true:

$$\min_{1 \le l \le K-1} \rho_{l0} = \rho^* \quad \text{and} \quad (K-1)\gamma = \gamma_c;$$

then

$$n(\gamma) \sim n_c(\gamma_c) \quad \text{as } \gamma \to \infty. \quad (26)$$


Proof of Corollary 3: It follows immediately from Lorden's Theorem 3 [10] that

$$n_c(\gamma_c)=\inf_{N^c\in K^c_\gamma}(\bar\tau_c^*)\sim\frac{\ln\gamma_c}{\min_{1\le l\le K-1}\rho_{l0}}\quad\text{as }\gamma_c\to\infty.$$

Hence, it is easy to see that (26) is true when min_{1≤l≤K-1} ρ_l0 = ρ* and (K-1)γ = γ_c. □

Discussion: Let us discuss the following practical interpretations of the above results.

From Theorem 2 it follows that the Kullback-Leibler numbers ρ_ij play a key role in the statistical properties of the detection/isolation algorithms. The minimum ρ* of the Kullback-Leibler "distance"² between the two closest hypotheses H_i and H_j, 0 ≤ i ≠ j ≤ K-1, defines the worst mean detection/isolation delay. Let us consider the two problems which have been mentioned in Section I:

1) The first problem is a change detection task without any isolation of the source of the change (the alarm task of A. Willsky).

2) The second problem is the joint change detection/isolation task.

If we suppose that the delay for detection is the price to be paid, then the following basic question arises: Is it necessary to pay more in the case of the more complicated second problem? If ρ*, which is the Kullback-Leibler "distance" between the closest alternatives H_l and H_j, is greater than or equal to the minimum "distance"

$$\min_{1\le l\le K-1}\rho_{l0}$$

between the alternatives H_l and the null hypothesis H_0, or, equivalently, if

$$\rho^*=\min_{1\le l\le K-1}\rho_{l0},$$

then the answer will be "No."

V. ADDITIVE CHANGES IN LINEAR STOCHASTIC MODELS

In this section we introduce some linear (regression and dynamical) stochastic models with additive changes. We also introduce, in brief, the key concepts that are to be used for the corresponding detection/isolation problem, namely redundancy and innovation (see [3, pp. 249-252] for details). After this we show how the new detection/isolation problem can be reduced to the basic problem of Section II, and we discuss some new features which play a key role in linear stochastic models with additive changes.

²Strictly speaking, ρ_ij is not a distance in the precise sense. However, in some cases, for instance the change in the mean of a Gaussian vector sequence, this interpretation is precise and useful.

A. Models

1) Basic Model: We consider the following Gaussian family of distributions:

$$\mathcal P=\Big\{\mathcal P_\theta,\ \theta\in\Theta=\bigcup_{i=0}^{K-1}\{\theta_i\}\Big\}$$

where P_{θ_i} = N(θ_i, Σ), and Σ > 0 is a known covariance matrix. Let (Y_n)_{n≥1} be an independent Gaussian random sequence observed sequentially:

$$\mathcal L(Y_n)=\begin{cases}N(\theta_0,\Sigma)&\text{if }n\le k\\ N(\theta_l,\Sigma)&\text{if }n\ge k+1\end{cases}\qquad(27)$$

where θ_0 = 0 and the θ_l are known constants. The change time k+1 and the change number l are unknown. We assume that the following inequality is true:

$$0<\rho_{ij}=\tfrac12(\theta_i-\theta_j)^T\Sigma^{-1}(\theta_i-\theta_j)<\infty,\qquad 0\le i\ne j\le K-1.\qquad(28)$$

2) Regression Models: We consider the following regression model with additive changes:

$$Y_n=HX_n+V_n+\Upsilon_l(n,k+1)\qquad(29)$$

where X_n is the unknown state, V_n is a Gaussian white noise with covariance matrix R = σ²I, σ² > 0, H is a full rank matrix of size r × s with r > s, and Υ_l(n, k+1) is the l-type change occurring at time k+1, namely

$$\Upsilon_l(n,k+1)=\begin{cases}0&\text{if }n\le k\\ \Upsilon_l&\text{if }n\ge k+1.\end{cases}$$

The characteristic feature of model (29) is the existence of redundancy (r − s > 0) in the information contained in the observations.
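As a concrete illustration (with hypothetical numbers, not taken from the paper), the following sketch computes the Kullback-Leibler numbers ρ_ij of (28) for a small Gaussian family, the minimum ρ* that governs the worst mean detection/isolation delay by Theorem 2, and the quantity min_l ρ_l0 relevant for pure detection.

```python
import numpy as np

# Hypothetical Gaussian family: K = 3 hypotheses theta_0, theta_1, theta_2
# with a common known covariance Sigma, as in model (27)-(28).
theta = [np.array([0.0, 0.0]),   # theta_0 = 0 (no change)
         np.array([2.0, 0.0]),   # theta_1: change of type 1
         np.array([0.0, 1.0])]   # theta_2: change of type 2
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def rho(i, j):
    """Kullback-Leibler number (28): 0.5 (theta_i-theta_j)' Sigma^{-1} (theta_i-theta_j)."""
    d = theta[i] - theta[j]
    return 0.5 * d @ Sigma_inv @ d

K = len(theta)
pairs = [(i, j) for i in range(1, K) for j in range(K) if j != i]
rho_star = min(rho(i, j) for i, j in pairs)       # rho*: min over l >= 1, j != l
rho_det = min(rho(i, 0) for i in range(1, K))     # min_l rho_{l0}: detection only

print(f"rho* = {rho_star:.4f}, min_l rho_l0 = {rho_det:.4f}")
# When the two minima coincide, isolation asymptotically costs no extra delay.
```

Here the minima coincide, so for this particular family the answer to the "pay more?" question of the Discussion is "No."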

3) Stochastic Dynamical Models: We consider the following linear stochastic dynamical model with additive changes:

$$A(z^{-1})Y_n=B(z^{-1})U_n+C(z^{-1})\big[V_n+\Upsilon_l(n,k+1)\big]\qquad(30)$$

where U_n is the known input vector, z^{-1} is the backward shift operator, A(z^{-1}), B(z^{-1}), C(z^{-1}) are polynomial matrices in the operator z^{-1}, and V_n is a Gaussian white noise with covariance matrix Σ > 0. Assume, as usual, that the characteristic equations det A(z) = 0 and det C(z) = 0 have zeroes outside the unit circle.

B. Algorithms

In this subsection we design the change detection/isolation algorithms for the above linear stochastic models. We design the algorithm for basic model (27), which is a Gaussian case of model (2). Then we show that the algorithm for regression model (29) is based on the residuals of the least squares algorithm. The statistical background of this problem is multiple hypotheses testing with nuisance parameters, and the minimax solution of this problem is the GLR algorithm. On the other


hand, this algorithm can be reduced to the basic algorithm. Finally, we show that the detection/isolation algorithm for dynamical model (30) is based on the innovations of the whitening filter. The statistical background of this algorithm is the Transformation theorem [23, pp. 53-59]. Again this algorithm is a particular case of the basic algorithm.

1) Basic Model: Model (27) is a particular case of basic model (2), which is defined in Section II. For this reason, detection/isolation algorithm (10)-(12) is valid for model (27):

$$\hat N=\min\{\hat N^1,\dots,\hat N^{K-1}\},\qquad \hat\nu=\arg\min\{\hat N^1,\dots,\hat N^{K-1}\}$$
$$\hat N^l=\inf\Big\{n\ge 1:\min_{0\le j\ne l\le K-1}\ \max_{1\le k\le n}S_k^n(l,j)\ge h\Big\}\qquad(31)$$

where

$$S_k^n(l,j)=\sum_{i=k}^{n}\Big[(\theta_l-\theta_j)^T\Sigma^{-1}(Y_i-\theta_j)-\tfrac12(\theta_l-\theta_j)^T\Sigma^{-1}(\theta_l-\theta_j)\Big].$$

In the following subsections, we show how the other linear stochastic models with additive changes can be reduced to model (27).

2) Regression Models: In this case we consider regression model (29). The characteristic feature of this detection/isolation problem with respect to the above basic problem is the fact that the vector X is unknown. This type of statistical problem is usually called a hypotheses testing problem with nuisance parameters. A tutorial introduction to these problems can be found in [3, pp. 141-145, 270-273]. Because r > s and the matrix H has rank s, we can use the redundancy to solve the detection/isolation problem.

3) Minimax Algorithm: Let us define the following hypotheses testing problem:

$$\mathcal H_l=\{\mathcal P_\theta(Y);\ \theta=HX^l+\Upsilon_l,\ X^l\}$$

and

$$\mathcal H_j=\{\mathcal P_\theta(Y);\ \theta=HX^j+\Upsilon_j,\ X^j\}\qquad(32)$$

where Υ_l, Υ_j are the informative parameters, and X^l, X^j are the nuisance parameters. We are interested in detecting a change from Υ_j to Υ_l while considering X as an unknown parameter of model (29); but since the expectation of the distribution P_θ is a function of this unknown parameter, the design of the test is a nontrivial problem.

From Theorem 2 it results that lower bound (25) in the class K_γ is a monotone decreasing function of the minimum value of the Kullback-Leibler information ρ*. Therefore, the design of the minimax algorithm consists of finding a pair of least favorable values X^l and X^j for which the Kullback-Leibler information ρ_lj is minimum, and in computing the LR of the optimal algorithm for these values.

The expectation θ of the output Y of model (29) is

$$\theta_j=E(Y)=HX^j+\Upsilon_j,\qquad j=0,\dots,K-1$$

where Υ_0 = 0. Then the Kullback-Leibler information ρ_lj is

$$\rho_{lj}=\frac{1}{2\sigma^2}(\theta_l-\theta_j)^T(\theta_l-\theta_j).$$

Note here that ρ_lj is a function of the difference x = X^l − X^j. Therefore, we minimize ρ_lj(x) with respect to x. The minimum is obtained for

$$x^*=(H^TH)^{-1}H^T(\Upsilon_j-\Upsilon_l)$$

and is given by

$$\rho_{lj}(x^*)=\frac{1}{2\sigma^2}(\Upsilon_j-\Upsilon_l)^TP\,(\Upsilon_j-\Upsilon_l)\qquad(33)$$

where P = I − H(H^T H)^{-1}H^T is the projection matrix, rank P = r − s. Finally, we have the following formula of LR (13) for hypotheses (32) under the least favorable value x* of the nuisance parameter:

$$S_k^n(l,j)=\frac{1}{\sigma^2}\sum_{i=k}^{n}\Big[(\Upsilon_l-\Upsilon_j)^TP\,(Y_i-\Upsilon_j)-\tfrac12(\Upsilon_l-\Upsilon_j)^TP\,(\Upsilon_l-\Upsilon_j)\Big].\qquad(34)$$

Note that this LR is independent of the unknown values X^l and X^j. Therefore, let us define the following minimax algorithm:

$$\hat N=\min\{\hat N^1,\dots,\hat N^{K-1}\},\qquad \hat N^l=\inf\Big\{n\ge 1:\min_{0\le j\ne l\le K-1}\ \max_{1\le k\le n}S_k^n(l,j)\ge h\Big\}.\qquad(35)$$

4) Discussion: Let us add two comments about (34). First, it is easy to show that the minimax approach is equivalent to the GLR, which is based on the maximization of the likelihood function with respect to the unknown nuisance parameters (see also [3, p. 144]). In other words,

$$\sup_{X^l}\sup_{X^j}\ln\frac{p_{\theta_l}(Y)}{p_{\theta_j}(Y)}=\frac{1}{\sigma^2}(\Upsilon_l-\Upsilon_j)^TP\,(Y-\Upsilon_j)-\frac{1}{2\sigma^2}(\Upsilon_l-\Upsilon_j)^TP\,(\Upsilon_l-\Upsilon_j).$$

Second, it is worth noting that (34) can be rewritten as

$$S_k^n(l,j)=\frac{1}{\sigma^2}\sum_{i=k}^{n}\Big[(\tilde\Upsilon_l-\tilde\Upsilon_j)^T(e_i-\tilde\Upsilon_j)-\tfrac12(\tilde\Upsilon_l-\tilde\Upsilon_j)^T(\tilde\Upsilon_l-\tilde\Upsilon_j)\Big]$$

where Υ̃_l = T^T Υ_l, Υ̃_j = T^T Υ_j, e_i = T^T Y_i, T = (t_1, ..., t_{r-s}) is a matrix of size r × (r − s), and t_1, ..., t_{r-s} are the eigenvectors of the projection matrix P. Therefore, LR (34) is a function of the parity vector e_i of the analytical redundancy approach [8]. This parity vector e_i is the transformation of the measurements Y_i into a set of r − s linearly independent variables by projection onto the left null space of the matrix H.

The parity vector sequence (e_n)_{n≥1} can be modeled as

$$e_n=T^TV_n+T^T\Upsilon_l(n,k+1),$$

i.e., a Gaussian sequence of type (27) with covariance matrix σ²I_{r-s}. Consequently, to solve the detection/isolation problem in the case of regression model (29), we have to transform the observations Y_n into the parity vector e_n and then solve the corresponding basic problem (27).
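A minimal numeric sketch of the parity-vector construction just described (the matrix H and the fault vector below are hypothetical): the columns of T span the left null space of H, i.e., the eigenvectors of P with eigenvalue one, so e = T^T Y is insensitive to the unknown state X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full-rank observation matrix H of size r x s with r > s (model (29)).
r, s = 5, 3
H = rng.standard_normal((r, s))

# Projection onto the left null space of H: P = I - H (H'H)^{-1} H', rank P = r - s.
P = np.eye(r) - H @ np.linalg.inv(H.T @ H) @ H.T

# Eigenvectors of P with eigenvalue 1 give T = (t_1, ..., t_{r-s}) of size r x (r - s).
eigvals, eigvecs = np.linalg.eigh(P)
T = eigvecs[:, np.isclose(eigvals, 1.0)]

# Parity vector: e = T' Y removes the nuisance state X entirely, since T' H = 0;
# only the additive change Upsilon (and, in general, the noise) remains.
X = rng.standard_normal(s)                      # unknown state (nuisance)
Upsilon = np.array([0.0, 0.0, 1.5, 0.0, 0.0])   # hypothetical fault vector
V = np.zeros(r)                                 # noise switched off to expose the algebra
Y = H @ X + V + Upsilon
e = T.T @ Y

print("||T' H|| =", np.linalg.norm(T.T @ H))    # ~0: X cannot leak into e
print("e =", e, " vs  T' Upsilon =", T.T @ Upsilon)
```

With the noise switched off, e coincides with T^T Υ regardless of the state X, which is exactly why the regression problem reduces to basic problem (27).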

5) Example 1 (Radionavigation System Integrity Monitoring): Navigation systems are standard equipment for planes, boats, rockets, and other moving vehicles. On-line integrity monitoring (fault detection/isolation) is one of the main problems in the design of modern navigation systems (see examples and references in [6], [21], [13], [14], [3, pp. 454-463]).

For instance, let us consider integrity monitoring of a global positioning satellite set. Simplified measurement models of this type of radionavigation system can be described by (29). The problem is to detect and isolate a satellite clock fault, which can be represented as the additional bias Υ_l in model (29). Conventional global navigation sets require measurements from four satellites to estimate three spatial orthogonal coordinates and a clock bias, or three orthogonal velocities and a clock bias rate (X ∈ R⁴). Because for 18-satellite global navigation sets five or more satellites (r ≥ 5) are visible 99.3% of the time, it is possible to provide integrity monitoring by using these redundant measurements [21].

Let (Y_n)_{n≥1} be the output of model (29). Let us assume that Υ_l = (0, ..., 0, δ_l, 0, ..., 0)^T; the satellite number l clock fault is represented by the bias δ_l.
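To make Example 1 concrete, here is a small simulation (the geometry matrix and all numbers are hypothetical, and a single one-sided CUSUM-type recursion stands in for the full K-hypothesis algorithm): a clock bias δ appears on one redundant "satellite" channel at a given epoch, and a statistic built from the LS residuals ε_n = P Y_n reacts to it while ignoring the arbitrary state X_n.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical geometry: 6 visible "satellites", 4 unknowns (3 coordinates + clock).
# H is chosen schematically so that the last r - s channels are purely redundant.
r, s = 6, 4
H = np.vstack([np.eye(s), np.zeros((r - s, s))])
P = np.eye(r) - H @ np.linalg.inv(H.T @ H) @ H.T   # LS residual projection
sigma2 = 1.0

k_change, fault_sat, delta = 50, 4, 4.0   # clock bias on channel 4 from epoch 50 on
n_obs, h = 100, 10.0

g, alarm = 0.0, None
for n in range(n_obs):
    X = rng.standard_normal(s)                     # nuisance state, arbitrary each epoch
    Y = H @ X + np.sqrt(sigma2) * rng.standard_normal(r)
    if n >= k_change:
        Y[fault_sat] += delta                      # satellite clock fault (bias)
    eps = P @ Y                                    # LS residuals, insensitive to X
    # LR increment for "bias delta on channel fault_sat" vs. "no fault",
    # built from the residual component and the diagonal element of P:
    inc = (delta * eps[fault_sat] - 0.5 * delta**2 * P[fault_sat, fault_sat]) / sigma2
    g = max(0.0, g + inc)                          # one-sided CUSUM-type recursion
    if alarm is None and g > h:
        alarm = n

print("change at epoch", k_change, "-> alarm at epoch", alarm)
```

The statistic drifts downward before the change and upward with slope ½δ²p_ll/σ² after it, so the alarm follows the change after a short delay.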

In this case the statistics S_k^n(l,j) are defined by the following formulas:

$$S_k^n(l,j)=\frac{1}{\sigma^2}\sum_{i=k}^{n}\Big[\delta_l\,\varepsilon_{i,l}-\delta_j\,\varepsilon_{i,j}-\tfrac12\,\delta_l^2\,p_{ll}+\tfrac12\,\delta_j^2\,p_{jj}\Big]\qquad(\delta_0=0)$$

where ε_{i,l} is the l-th component of the LS residual

$$\varepsilon_i=P\,Y_i$$

and p_{lj} is the (l, j) element of the matrix P.

6) Stochastic Dynamical Models: In this case we consider dynamical model (30). It is obvious that the output Y_n of this model is expressible as the sum of the output of the deterministic part B(z^{-1})U_n of model (30) and the noise process with abrupt changes C(z^{-1})[V_n + Υ_l(n, k+1)]. Hence

$$\tilde Y_n=A(z^{-1})Y_n-B(z^{-1})U_n=C(z^{-1})\big[V_n+\Upsilon_l(n,k+1)\big].\qquad(37)$$

Let us consider the two Gaussian vectors Ỹ_n and X_n:

$$\tilde Y_n=C(z^{-1})X_n,\qquad X_n=V_n+\Upsilon_l(n,k+1),\qquad n=1,2,\dots,\quad X_{n\le 0}=0.$$

It is easy to show that the transformation from the X space to the Ỹ space is a diffeomorphism (a one-to-one transformation). Denote by J its Jacobian matrix. From the Transformation theorem (see [23, pp. 53-59]) it results that

$$p(\tilde Y_1,\dots,\tilde Y_n)=|\det J|\,p(X_1,\dots,X_n).$$

Therefore, the factor |det J| cancels in every likelihood ratio, and

$$\frac{p_{\theta_l}(\tilde Y_1,\dots,\tilde Y_n)}{p_{\theta_j}(\tilde Y_1,\dots,\tilde Y_n)}=\frac{p_{\theta_l}(X_1,\dots,X_n)}{p_{\theta_j}(X_1,\dots,X_n)}.\qquad(38)$$

The result from (38) is that this detection/isolation problem is reduced to the above basic problem.
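A toy scalar special case (hypothetical AR(1) dynamics, not the paper's general polynomial matrices) of the reduction just described: inverting the dynamics turns the observations Y_n back into the innovation sequence X_n = V_n + Υ_l(n, k+1), which is exactly of the form of basic model (27).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scalar special case of model (30): (1 - a z^{-1}) Y_n = X_n,
# X_n = V_n + upsilon * 1{n >= k+1}, with |a| < 1 (zero of A(z) outside the unit circle).
a, upsilon, k = 0.7, 1.0, 100
n_obs = 200

V = rng.standard_normal(n_obs)
X = V + upsilon * (np.arange(n_obs) >= k)   # innovations with an additive change
Y = np.empty(n_obs)
prev = 0.0
for n in range(n_obs):                      # simulate Y_n = a Y_{n-1} + X_n
    prev = a * prev + X[n]
    Y[n] = prev

# Whitening filter: X_hat_n = Y_n - a Y_{n-1} recovers the innovations exactly,
# so the detection/isolation problem for Y reduces to basic problem (27) for X_hat.
X_hat = Y - a * np.concatenate(([0.0], Y[:-1]))

print("max reconstruction error:", np.max(np.abs(X_hat - X)))
print("pre-change mean %.2f, post-change mean %.2f" % (X_hat[:k].mean(), X_hat[k:].mean()))
```

The recovered innovations are white with a mean shift at the change time, so the basic algorithm can be applied to them directly.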

C. Statistical Properties of the Algorithms

In this subsection, we investigate the statistical properties of the change detection/isolation algorithms. The goal of this subsection is to give some interpretations of the general results of Sections III and IV.

1) Basic Model: Since model (27) is a special case of basic model (2), Theorems 1 and 2 are valid in the case of algorithm (31). Note that ρ* is given by

$$\rho^*=\min_{1\le l\le K-1}\ \min_{0\le j\ne l\le K-1}\ \tfrac12(\theta_l-\theta_j)^T\Sigma^{-1}(\theta_l-\theta_j).$$

2) Linear Stochastic Models: From the above paragraphs it follows that the change detection/isolation problems in the case of linear stochastic models (29)-(30) are reduced to the basic detection/isolation problem. It is obvious that Theorem 1 is valid for these models. Moreover, algorithm (31) is asymptotically optimal in the sense of Theorem 2. In the case of stochastic dynamical model (30), the proof of this fact is trivial, for it is sufficient to remember that the transformation from the X space to the Ỹ space is a diffeomorphism. In the case of regression model (29), we have to define the meaning of optimality.

We consider the following family of distributions: P = {P_{θ_i, X^i}, i = 0, ..., K-1}, where the θ_i are the informative parameters and the X^i are the nuisance parameters. Suppose there exists a class K̄_γ of all sequential detection/isolation algorithms (N, ν) over this family of distributions.

Definition 2: Let us define the minimax lower bound n̄(γ) as follows:

$$\bar n(\gamma)=\inf_{(N,\nu)\in\bar K_\gamma}(\bar\tau^*),$$

where

$$0<\bar\rho^*=\min_{1\le l\le K-1}\ \min_{0\le j\ne l\le K-1}\ \inf_{X^l,X^j}\rho_{lj}<\infty.$$

We say that the detection/isolation algorithm (N, ν) is asymptotically minimax if the following condition holds:

$$\bar\tau^*(N,\nu)\sim\bar n(\gamma)\quad\text{as }\gamma\to\infty.$$


Theorem 3: Let us consider regression model (29). We assume that the following inequality is true:

$$0<\rho_{lj}^*=\frac{1}{2\sigma^2}(\Upsilon_l-\Upsilon_j)^TP\,(\Upsilon_l-\Upsilon_j)<\infty,\qquad 0\le j\ne l\le K-1.\qquad(39)$$

Let (N, ν) be detection/isolation algorithm (35). Then

$$\bar\tau^*\sim\frac{\ln\gamma}{\bar\rho^*}\quad\text{as }\gamma\to\infty\qquad(40)$$

where

$$\bar\rho^*=\min_{1\le l\le K-1}\ \min_{0\le j\ne l\le K-1}\rho_{lj}^*.$$

Proof of Theorem 3: As we have mentioned above, LR (34) is a function of the parity vector e, which is a Gaussian (r − s)-dimensional random variable. From (33) it follows that

$$\rho_{lj}^*=\frac{1}{2\sigma^2}(\tilde\Upsilon_l-\tilde\Upsilon_j)^T(\tilde\Upsilon_l-\tilde\Upsilon_j).$$

Finally, (40) follows at once from Theorem 1. □

Corollary 4: Detection/isolation algorithm (35) is asymptotically minimax.

3) Discussion: Let us add a remark about the problem of detectability and isolability of changes in regression model (29). We emphasize that this problem is nontrivial in the case of the regression model.

Suppose that the vectors Υ_j ∈ R^r, j = 0, ..., K-1, are chosen arbitrarily such that ||Υ_i − Υ_j||₂ ≥ ε > 0, 0 ≤ i ≠ j ≤ K-1, where ||·||₂ is the Euclidean norm. From matrix theory it results immediately that

$$\inf\big\{\Upsilon^TP\,\Upsilon:\ \|\Upsilon\|_2\ge\varepsilon\big\}=0$$

where Υ = Υ_i − Υ_j, because the projection matrix P has the nontrivial null space R(H). For this reason inequality (39) is not valid for arbitrary vectors Υ_i and Υ_j. Roughly speaking, it is impossible to detect and isolate all arbitrary changes in model (29); the fact that the norm is strictly positive is not a sufficient condition in this case. Some of these changes will be indistinguishable from the statistical point of view. In order to simplify the problem, let us assume a priori the following: i) all the vectors Υ_i − Υ_j have l̄ ≤ rank P = r − s nonzero components only; ii) all the principal minors of order 1 to l̄ of the matrix P are strictly positive. Then the problem is much simpler; namely, under these constraints inequality (39) holds true for arbitrary vectors Υ_i and Υ_j. If these constraints do not apply, then we should check inequality (39) a priori.

4) Example 2 (Radionavigation System Integrity Monitoring, Continued): Let us pursue our discussion of Example 1. Assume that only one satellite clock can fail at a time, and discuss the following problem: How many visible satellites are necessary to detect and isolate this fault? Because Υ_0 = 0, it is easy to see that the minimal number r of visible satellites for detecting this fault is equal to five (redundancy = r − s = r − 4 = 1). If we wish to detect and isolate this fault, then the maximal number l̄ of nonzero components of the vectors Υ_i − Υ_j is equal to two and, consequently, it is necessary to have six or more visible satellites (l̄ = 2 ≤ r − s = r − 4)!
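The indistinguishability discussed above is easy to exhibit numerically (hypothetical H): a change lying in the column space of H has strictly positive norm yet zero "parity energy" Υ^T P Υ, so inequality (39) fails for it, while a sparse change in the sense of assumption i) is detectable.

```python
import numpy as np

rng = np.random.default_rng(5)

r, s = 5, 3
H = rng.standard_normal((r, s))
P = np.eye(r) - H @ np.linalg.inv(H.T @ H) @ H.T

# A change hidden in the column space of H: ||Upsilon|| > 0 but P Upsilon = 0,
# so the quantity in (39) vanishes and the change is invisible in the residuals.
hidden = H @ np.array([1.0, -2.0, 0.5])

# A one-component ("sparse") change, as in assumption i) with l_bar = 1:
sparse = np.zeros(r)
sparse[0] = 1.0

def parity_energy(u):
    """Quadratic form u' P u appearing in (39) (up to the 1/(2 sigma^2) factor)."""
    return float(u @ P @ u)

print("||hidden|| =", np.linalg.norm(hidden), " energy =", parity_energy(hidden))
print("||sparse|| =", np.linalg.norm(sparse), " energy =", parity_energy(sparse))
```

A strictly positive norm therefore guarantees nothing by itself; what matters is the component of the change outside the column space of H.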

D. Stability Against Changes in Design Parameters

The goal of this subsection is to investigate the stability of the above change detection/isolation algorithms with respect to design parameters. From a practical point of view, it is important to have a detection algorithm which holds its performance stably against changes in design parameters. Let us consider the two following aspects of this problem: the unknown dynamic profile θ_l(n − k) of the change magnitude, and the unknown covariance matrix Σ̃ of the observations after the change time k + 1.

Let us consider change detection/isolation algorithm (31). From the above paragraphs it follows that this algorithm is asymptotically optimal in the case of Gaussian basic model (27). Suppose now that the observations Y_{k+1}, Y_{k+2}, ... are generated by another Gaussian distribution. In other words, the model is

$$\mathcal L(Y_n)=\begin{cases}N(\theta_0,\Sigma)&\text{if }n\le k\\ N\big(\theta_l(n-k),\tilde\Sigma\big)&\text{if }n\ge k+1\end{cases}\qquad k=0,1,2,\dots\qquad(41)$$

where θ_0 = 0. The profile θ_l(n − k) and the covariance matrix 0 < Σ̃ < ∞ are unknown a priori. We assume that the following condition holds:

$$\lim_{n\to\infty}\frac1n\sum_{i=1}^{n}\theta_l(i)=\tilde\theta_l,\qquad l=1,\dots,K-1.\qquad(42)$$

Let Ẽ^l_{k+1}(·) denote the expectation under model (41) with change time k + 1, and let Ẽ^l(·) = Ẽ^l_1(·).

Let us define the following worst mean detection/isolation delay:

$$\tilde\tau^*=\sup_{k\ge 0,\,1\le l\le K-1}\operatorname{ess\,sup}\,\tilde E_{k+1}^l\big(N-k\mid N>k,\,Y_1,\dots,Y_k\big).$$

The goal of this subsection is to show that τ̃* is expressible by the same asymptotic formula as in the case of true model (27) (see (19) in Corollary 1).

Theorem 4: Let us consider model (41)-(42). We assume that condition (42) holds uniformly in k ≥ 0 and 1 ≤ l ≤ K−1. Let (N̂, ν̂) be detection/isolation algorithm (31). Then asymptotic relation (43) holds.

Proof of Theorem 4: See Appendix III.

Corollary: Let us assume that θ̃_l = θ_l, l = 1, ..., K-1. Then

$$\tilde\tau^*(h)\sim\bar\tau^*(h)\quad\text{as }h\to\infty.$$
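The Cesàro condition (42) and the qualitative effect of a dynamic profile can be illustrated with a hypothetical scalar sketch (a single one-sided CUSUM stands in for algorithm (31)): the ramp profile below satisfies (42), and its detection delay is compared with the delay under a step change of the same limit magnitude.

```python
import numpy as np

rng = np.random.default_rng(3)

theta_inf, tau = 1.0, 5.0
n = 2000
profile = theta_inf * (1.0 - np.exp(-np.arange(1, n + 1) / tau))  # dynamic magnitude

# Condition (42): the Cesaro mean of the profile converges to theta_inf.
cesaro = np.cumsum(profile) / np.arange(1, n + 1)
print("Cesaro mean after n steps:", cesaro[-1])

def cusum_delay(post_change, h=20.0, theta=1.0):
    """One-sided CUSUM tuned to a unit shift; returns the detection delay."""
    g = 0.0
    for i, y in enumerate(post_change):
        g = max(0.0, g + theta * y - 0.5 * theta**2)  # LR increment N(0,1) -> N(theta,1)
        if g > h:
            return i + 1
    return None

noise = rng.standard_normal(500)
d_step = cusum_delay(noise + theta_inf)          # instantaneous step change
d_ramp = cusum_delay(noise + profile[:500])      # ramp converging to theta_inf
print("detection delay: step change", d_step, "| ramp change", d_ramp)
```

Since the ramp lies below its limit value, the ramp delay can never be smaller than the step delay on the same noise realization, which is the kind of perturbation Theorem 4 shows to be asymptotically harmless.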


VI. DISCUSSION

A new statistical approach to the change diagnosis problem is proposed. This approach consists of jointly detecting and isolating abrupt changes in a stochastic system.

Our main results are the following:

1) We introduced a minimax criterion of optimality (6)-(7) for this detection/isolation problem.

2) A new statistical change detection/isolation algorithm has been designed. This algorithm is expressible by (10)-(12).

3) We investigated the statistical properties of this algorithm. The result is stated in Theorem 1. We proved a lower bound for the worst mean detection/isolation delay in a certain class of sequential change detection/isolation algorithms. This result is given by Theorem 2. From Theorems 1 and 2 it follows that the proposed algorithm is asymptotically optimal in this class.

4) As we demonstrated in Section V, the general results can be applied to some classical linear stochastic models with additive abrupt changes. The nontrivial problem of detectability and isolability, which arises in the case of the regression model with redundancy (29), has been addressed.

5) It has been proven that the detection/isolation algorithm is stable against changes in design parameters. In particular, the algorithm is stable with respect to the unknown dynamic profile of the change magnitude if this profile converges to a known constant.

Let us add a concluding remark. It is obvious that the proposed scheme (10)-(12) is not a recursive algorithm. Hence, the problem of interest is to find another appropriate recursive computational scheme in order to reduce the amount of numerical operations which should be performed for every new observation, without losing optimality.

APPENDIX I
PROOF OF LEMMA 3

Let us prove the first part of inequality (23). The following inequality appears in [20, Theorem 3.1] as a generalized Wald lower bound for the ASN:

$$E_i(N)\,\rho_{ij}\ \ge\ \sum_{l=0}^{K-1}\alpha_{il}\ln\frac{\alpha_{il}}{\alpha_{jl}}\qquad\text{for }i=1,\dots,K-1$$

where the index j is arbitrary except that j ≠ i. Let us assume that j = 0 and 1 ≤ i ≤ K−1. In accordance with the notations (22), (23), and Lemma 6 [20],³ we can write

$$E_i(N)\,\rho_{i0}\ \ge\ (1-r_i)\ln\frac{1}{\gamma_i}+\gamma_i\ln\gamma_i+(1-\gamma_i)\ln(1-\gamma_i).\qquad(44)$$

Finally, the following inequality follows from (44) and the fact that the minimum value of γ_i ln γ_i + (1 − γ_i) ln(1 − γ_i) is equal to −ln 2:

$$E_i(N)\ \ge\ \frac{(1-r_i)\ln\gamma_i^{-1}-\ln 2}{\rho_{i0}}\qquad\text{for }i=1,\dots,K-1.\qquad(45)$$

Let us prove the second part of inequality (23). Let 1 ≤ i ≤ K−1 and 1 ≤ j ≠ i ≤ K−1. Again, by the same arguments, we can write

$$E_i(N)\,\rho_{ij}\ \ge\ (1-r_i)\ln\frac{1}{\beta_{ji}}+\gamma_i\ln\gamma_i+(1-\gamma_i)\ln(1-\gamma_i)\qquad(46)$$

where

$$r_i=\gamma_i+\sum_{l=1,\,l\ne i}^{K-1}\beta_{il}.$$

Therefore, we have the following formula:

$$E_i(N)\ \ge\ \max_{1\le j\ne i\le K-1}\frac{(1-r_i)\ln\beta_{ji}^{-1}-\ln 2}{\rho_{ij}}\qquad\text{for }i=1,\dots,K-1.\qquad(47)$$

³Lemma 6 [Simons]: Let a_1, ..., a_n and b_1, ..., b_n be two sequences of positive real numbers. Let a = Σ_{l=1}^{n} a_l and b = Σ_{l=1}^{n} b_l. Then

$$\sum_{l=1}^{n}a_l\ln\frac{a_l}{b_l}\ \ge\ a\ln\frac{a}{b}.$$
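Lemma 6 is the log-sum inequality; a quick numeric check of the statement used above, with arbitrary positive sequences:

```python
import math

# Lemma 6 [Simons] (log-sum inequality): for positive a_1..a_n, b_1..b_n,
#   sum_l a_l * ln(a_l / b_l) >= a * ln(a / b),  a = sum a_l, b = sum b_l.
def log_sum_lhs(a, b):
    return sum(al * math.log(al / bl) for al, bl in zip(a, b))

def log_sum_rhs(a, b):
    A, B = sum(a), sum(b)
    return A * math.log(A / B)

a = [0.2, 0.5, 1.3, 0.07]
b = [0.9, 0.1, 0.4, 0.66]
print(log_sum_lhs(a, b), ">=", log_sum_rhs(a, b))
```

Equality holds exactly when the two sequences are proportional, e.g. a = (1, 2), b = (2, 4).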


APPENDIX II
PROOF OF THEOREM 2

The proof consists of two parts. The first part includes the derivation of the asymptotic relation between the worst mean detection/isolation delay and the mean time before false alarms. The second part includes the derivation of an analogous result for false isolations.

Note that the scheme of our proof is as in Lorden's Theorem 3 (see details in [10]). The novelty is in the extension of Lorden's results to the case of K > 2 hypotheses.

A. The First Part

It is sufficient to show that for every ε₁ ∈ (0, 1) there exists C₁(ε₁, K), |C₁(ε₁, K)| < ∞, such that for all (N, ν) ∈ K_γ the following inequality is true:

$$\bar\tau_l^*\ \ge\ \frac{(1-\varepsilon_1)\ln\gamma+C_1(\varepsilon_1,K)}{\rho_{l0}},\qquad l=1,\dots,K-1.\qquad(48)$$

As in [10], let us introduce the following "additional" stopping variables:

$$T_0=0<T_1<T_2<\cdots.$$

It results from the definition of the worst mean detection/isolation delay that

$$P_{k+1}^l(T_i<\infty\mid T_{i-1}=k<N)\ \le\ \varepsilon_1$$

provided that

$$P_0(T_{i-1}=k<N)>0.$$

We denote by D_ik the subsets D_ik = {T_{i−1} = k < N} for which P_0(D_ik) > 0 and, hence, also P^l_{k+1}(D_ik) > 0. Let us define the following sequential test (N*, d*) on the subset D_ik by using the stopping variables T_i, N, and the final decision ν:

$$N^*=\min\{N,T_i\},\qquad d^*=\begin{cases}\nu&\text{if }N\le T_i\\ 0&\text{if }N>T_i.\end{cases}\qquad(49)$$

In other words, at time N* one of the hypotheses H_0, ..., H_{K-1} is accepted.

Let us consider the statistical properties of this sequential test. The conditional expectation of the sample number of observations taken for the test is E^l_{k+1}(N* − k | D_ik). So, we have the following lower bound for the ASN of test (49) on the subset D_ik (see the inequality at the bottom of the original page).


Hence, we have

$$\bar\tau_l^*\ \ge\ \frac{(1-\varepsilon_1)\,\big|\ln P_0(N\le T_i\cap\nu=l\mid D_{ik})\big|-\ln 2}{\rho_{l0}}.\qquad(50)$$

It can be shown [10] that P_0(N ≤ T_i ∩ ν = l | T_{i−1} < N) is an average over k of the probabilities P_0(N ≤ T_i ∩ ν = l | T_{i−1} = k < N) satisfying (50):

$$P_0(N\le T_i\cap\nu=l\mid T_{i-1}<N)=E_0\big[P_0(N\le T_i\cap\nu=l\mid T_{i-1}=k<N)\mid T_{i-1}<N\big]$$
$$=\sum_{k=0}^{\infty}P_0(T_{i-1}=k)\,P_0(N\le T_i\cap\nu=l\mid T_{i-1}=k<N).$$

Moreover, from the Jensen inequality for convex functions it follows that, for all i with P_0(T_{i−1} < N) > 0, the analogous bound holds with the averaged probability (see the inequality at the bottom of the original page). It follows from the definition of the sequential test (N*, d*) that

$$P_0(N\le T_i\mid T_{i-1}<N)=\sum_{l=1}^{K-1}P_0(N\le T_i\cap\nu=l\mid T_{i-1}<N).$$

Hence, we can utilize here Lorden's Theorem 3 [10], which provides us with the following lower bound:

$$\bar\tau_l^*\ \ge\ \frac{(1-\varepsilon_1)\ln E_0(N)-(1-\varepsilon_1)\ln E_0(T_1)-\ln 2}{\rho_{l0}},\qquad l=1,\dots,K-1.\qquad(51)$$

Let us assume that X_1, X_2, ... ~ P_0. We consider the sequence T_1, T_2, ... of the stopping variables before the first N ≤ T_i and denote by Z = inf{i ≥ 1 : N ≤ T_i} its number. The final decision here is ν ≥ 1. The probability of the false alarm of an l-type is

$$P_0(\nu=l)=\sum_{i=1}^{\infty}P_0(Z=i\cap\nu=l),\qquad l=1,\dots,K-1.\qquad(52)$$

If P_0(Z ≥ i) > 0, then P_0(Z < i+1 ∩ ν = l | Z ≥ i) is well defined and we get the formula

$$P_0(Z=i\cap\nu=l)=P_0(Z<i+1\cap\nu=l\mid Z\ge i)\,P_0(Z\ge i).\qquad(53)$$

Let us consider (52) and (53). First,

$$P_0(Z<i+1\cap\nu=l\mid Z\ge i)=P_0(N\le T_i\cap\nu=l\mid T_{i-1}<N).$$

Second, (51) has been obtained by Lorden [10, see eqs. (19)-(21)] under the constraint on the following minimum:

$$\varrho=\inf_{i\ge 1}\{P_0(Z<i+1\mid Z\ge i)\}.$$

Let us assume that the elements of the sum

$$\sum_{l=1}^{K-1}P_0(N\le T_i\cap\nu=l\mid T_{i-1}<N)$$

are equally chosen. Combining (52) and (53), we get the probability of the false alarm of an l-type:

$$P_0(\nu=l)=\sum_{i=1}^{\infty}P_0(N\le T_i\cap\nu=l\mid T_{i-1}<N)\,P_0(Z\ge i).\qquad(54)$$

Let us consider now the following sequence of stopping variables (here false alarms):

$$N_0=0<N_1<N_2<\cdots<N_r<\cdots$$

where N_r denotes the stopping variable obtained by applying N to X_{N_{r−1}+1}, X_{N_{r−1}+2}, ..., with final decisions ν_1, ν_2, ..., ν_r, .... Since ν_1, ν_2, ... are i.i.d. random variables, we have immediately that

$$E_0\big(\inf\{r\ge 1:\nu_r=l\}\big)=\frac{1}{P_0(\nu=l)}.\qquad(55)$$

Moreover, N_1, N_2 − N_1, N_3 − N_2, ... are i.i.d.⁴ as well, and E_0(inf{r ≥ 1 : ν_r = l}) < ∞. Hence, Wald's identity [22, pp. 52-54; App. A.3]⁵

$$E_0\big(\min\{N_r:\nu_r=l\}\big)=E_0\big(\inf\{r\ge 1:\nu_r=l\}\big)\,E_0(N)\qquad(56)$$

holds, and combining (54)-(56) we get

$$E_0\Big(\inf_{r\ge 1}\{N_r:\nu_r=l\}\Big)=(K-1)\,E_0(N).\qquad(57)$$

Finally, from (51) and (57) we get the lower bound

$$\bar\tau_l^*\ \ge\ \frac{(1-\varepsilon_1)\ln E_0\big(\inf_{r\ge 1}\{N_r:\nu_r=l\}\big)+C_1(\varepsilon_1,K)}{\rho_{l0}},\qquad l=1,\dots,K-1\qquad(58)$$

where

$$C_1(\varepsilon_1,K)=-(1-\varepsilon_1)\big[\ln E_0(T_1)+\ln(K-1)\big]-\ln 2.$$

The proof of the first part of the theorem is complete.

⁴Here the "increments" N_r − N_{r−1} are distributed as N.

⁵Theorem [Wald]: Assume that ξ_1, ξ_2, ... are i.i.d. random variables with E(|ξ_1|) < ∞. For any integrable stopping variable r we have E(S_r) = E(r) E(ξ_1), where S_r = Σ_{i=1}^{r} ξ_i.
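Wald's identity invoked in (56) is easy to check by simulation; the distribution and the stopping rule below are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)

# Wald's identity: for i.i.d. xi_1, xi_2, ... with E|xi| < infinity and an
# integrable stopping time r,  E(S_r) = E(r) * E(xi),  S_r = xi_1 + ... + xi_r.
p = 0.3           # xi ~ Bernoulli(p); stop at the first success (r is geometric)
trials = 100_000
S_sum, r_sum = 0.0, 0.0
for _ in range(trials):
    s, r = 0.0, 0
    while True:
        xi = rng.random() < p
        s += xi
        r += 1
        if xi:    # stopping rule: first success
            break
    S_sum += s
    r_sum += r

E_S, E_r = S_sum / trials, r_sum / trials
print(f"E(S_r) = {E_S:.4f},  E(r)*E(xi) = {E_r * p:.4f}")
```

With this stopping rule S_r is always exactly one, while E(r) = 1/p, so both sides of the identity equal one up to Monte Carlo error.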


B. The Second Part

It will suffice to show that for all (N, ν) ∈ K_γ the following inequality is true:

$$\bar\tau^*(1+o(1))\ \ge\ \frac{\ln\gamma}{\bar\rho}+C_2(K)\quad\text{as }\gamma\to\infty$$

where C₂(K) = −(K − 2)e^{−1} − ln 2 and

$$\bar\rho=\min_{1\le l\le K-1}\ \min_{1\le j\ne l\le K-1}\rho_{lj}.$$

Let us assume that X_1, X_2, ... ~ P_l, l = 1, ..., K-1. Now we do not need to introduce the "artificial" additional stopping variable T_i, and we consider the following sequence of stopping variables:

$$N_0=0<N_1<N_2<\cdots<N_r<\cdots$$

where N_r denotes the stopping variable obtained by applying N to X_{N_{r−1}+1}, X_{N_{r−1}+2}, .... Let us define the sequential test (N_T, ν_T) which chooses one of the K − 1 hypotheses H_1, ..., H_{K-1}.

Let us consider the statistical properties of this sequential test. From the definition of the worst mean detection/isolation delay it results immediately that

$$\bar\tau^*=\sup_{k\ge 0}\operatorname{ess\,sup}\,E_{k+1}^l\big(N-k\mid N>k,\,X_1,X_2,\dots\big)\ \ge\ E_1^l(N-0\mid D_{l0})=E_l(N).$$

In order to apply lower bound (23) for the ASN in this case, we have to assume that in Lemma 3 γ_l = 0, l = 1, ..., K-1. The convention which interprets 0 ln 0 as zero [20] leads to the following lower bound for the ASN:

$$E_l(N)\ \ge\ \max_{1\le j\ne l\le K-1}\frac{(1-r_l)\ln\beta_{jl}^{-1}-\ln 2}{\rho_{lj}}\qquad(61)$$

for l = 1, ..., K-1, where

$$r_l=\sum_{j=1,\,j\ne l}^{K-1}\beta_{lj}$$

and

$$\beta_{jl}=P_j(\text{accepting }H_l,\ l\ne j)=P_j(\nu_T=l).$$

Since ν_1, ν_2, ... are i.i.d. random variables, we have immediately that E_l(inf{r ≥ 1 : ν_r = j}) = 1/β_lj. Moreover, N_1, N_2 − N_1, N_3 − N_2, ... are i.i.d. as well, and E_l(inf{r ≥ 1 : ν_r = j}) < ∞. Hence, Wald's identity

$$E_l\big(\min\{N_r:\nu_r=j\}\big)=E_l(N)\,E_l\big(\inf\{r\ge 1:\nu_r=j\}\big)=\frac{E_l(N)}{\beta_{lj}}\qquad(62)$$

holds. Inserting (62) in (61), we get the following inequality:

$$E_l(N)\ \ge\ \max_{1\le j\ne l\le K-1}\frac{(1-r_l)\ln\beta_{jl}^{-1}-\ln 2}{\rho_{lj}},\qquad \beta_{jl}=\frac{E_j(N)}{E_j\big(\min\{N_r:\nu_r=l\}\big)}.\qquad(63)$$

Note here that min_{x>0}(x ln x) = −e^{−1}. Since inequality (63) holds for all values l = 1, ..., K-1, we get finally

$$\bar\tau^*\ \ge\ \frac{\ln\gamma-\ln\bar\tau^*-(K-2)e^{-1}-\ln 2}{\bar\rho}$$

or

$$\bar\tau^*(1+o(1))\ \ge\ \frac{\ln\gamma}{\bar\rho}+C_2(K)\quad\text{as }\gamma\to\infty\qquad(64)$$

and the proof of the second part of the theorem is complete.

The "first-order" lower bound (25) follows from (58), (64), and the fact that

$$\rho^*=\min\Big\{\min_{1\le l\le K-1}\rho_{l0},\ \bar\rho\Big\}.$$

APPENDIX III
PROOF OF THEOREM 4

The first part of (43) follows from Lemma 2 and Corollary 1. To prove the second part of formula (43), it is sufficient to show that relation (65) holds as h → ∞.

It results from Berk's Theorem 3.1 (see [5]) that the mean delay for detection satisfies (66), provided that

$$f_n=\frac1n\,\min_{0\le j\ne l\le K-1}S_1^n(l,j)\ \to\ \varrho\quad\text{almost surely as }n\to\infty,\qquad l=1,\dots,K-1\qquad(67)$$

and also, for some δ ∈ (0, ϱ), the "large deviation" probability

$$p_n=\tilde P_l\Big\{\min_{0\le j\ne l\le K-1}\frac1n S_1^n(l,j)<\varrho-\delta\Big\}$$

satisfies the two following conditions:

$$\lim_{n\to\infty}n\,p_n=0\qquad(68)$$

and

$$\sum_{n=1}^{\infty}p_n<\infty.\qquad(69)$$


Let us show that (67) is true when ϱ is defined by

$$\varrho=\min_{0\le j\ne l\le K-1}d(l,j).$$

The left-hand side of (67) can be rewritten as

$$f_n=\min_{0\le j\ne l\le K-1}\Big\{\frac1n S_1^n(l,j)\Big\}=\min_{0\le j\ne l\le K-1}\{\bar z_n(l,j)+d_n(l,j)\}$$

where z̄_n(l,j) is a zero-mean Gaussian random variable with variance of order 1/n and

$$d(l,j)=\lim_{n\to\infty}d_n(l,j)=(\theta_l-\theta_j)^T\Sigma^{-1}(\tilde\theta_l-\theta_j)-\tfrac12(\theta_l-\theta_j)^T\Sigma^{-1}(\theta_l-\theta_j).$$

It results from the strong law of large numbers that z̄_n(l,j) → 0 almost surely. Hence, the continuity theorem [4, ch. 1, para. 5] and the above limit lead to the following formula:

$$f_n\ \to\ \min_{0\le j\ne l\le K-1}d(l,j)=\varrho\quad\text{almost surely}.$$

Let us estimate the "large deviation" probability p_n and prove (68)-(69). First, find the following upper bound for p_n:

$$p_n=\tilde P_l\Big\{\min_{0\le j\ne l\le K-1}[\bar z_n(l,j)+d_n(l,j)]<\varrho-\delta\Big\}\ \le\ \tilde P_l\Big\{\min_{0\le j\ne l\le K-1}[\bar z_n(l,j)]<\varrho-\delta-\min_{0\le j\ne l\le K-1}d_n(l,j)\Big\}.$$

It is obvious that for all c ∈ (0, 1) there exists N(c) such that for all n > N(c) the following inequality holds:

$$\varrho-\delta-\min_{0\le j\ne l\le K-1}d_n(l,j)\ <\ C(c),\qquad C(c)=(c-1)\min_{0\le j\ne l\le K-1}d(l,j).$$

Hence, we have for all n > N(c)

$$p_n\ \le\ 1-\tilde P_l\Big\{\min_{0\le j\ne l\le K-1}[\bar z_n(l,j)]\ge C(c)\Big\}.$$

Let us consider the Gaussian random vector Z_n ∈ R^{K-1} collecting the variables z̄_n(l,j), with covariance matrix Σ_z/n. It is known that the family X ~ N(0, I) remains invariant⁶ under the transformation gX = RX, where Σ_z = RR^T. Therefore

$$\Phi_{0,I}(A)=\Phi_{g(0,I)}(gA)$$

where

$$g(0,I)=(R\,0;\ \Sigma_z),\qquad A=\{X:X^TX\le\lambda^2\},\qquad gA=\{Z:Z^T\Sigma_z^{-1}Z\le\lambda^2\},\qquad Z=RX$$

and

$$\varphi_{\theta,\Sigma}(X)=(2\pi)^{-(K-1)/2}(\det\Sigma)^{-1/2}\exp\big\{-\tfrac12(X-\theta)^T\Sigma^{-1}(X-\theta)\big\}.$$

Define the following ellipsoid:

$$Z^T\Sigma_z^{-1}Z=\lambda^2$$

where λ² = C²(c) min_i σ̃_{ii}^{-1} and the σ̃_{ii}^{-1} are the diagonal elements of the matrix Σ_z^{-1}. It is easy to see that

$$p_n\ \le\ 1-\Phi_{0,n^{-1}\Sigma_z}(gA)=1-\Phi_{0,n^{-1}I}(A)$$

and, finally,

$$p_n\ \le\ 1-\Phi_{0,n^{-1}I}(A)\ <\ \bar p_n=1-\big(1-2\Phi(-\lambda\sqrt n)\big)^{K-1}$$

where λ > 0. From this and the asymptotic formula

$$1-\Phi(x)=\frac{\varphi(x)}{x}\Big(1-\frac1{x^2}+\frac{3}{x^4}-\cdots\Big),\qquad \varphi(x)=\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2},$$

we deduce that

$$\lim_{n\to\infty}n\,\bar p_n=0.$$

Moreover, straightforward computations show that

$$\sum_{n=1}^{\infty}\bar p_n<\infty$$

and the proof of (68)-(69) is complete.

Thus we have proved that (67)-(69) hold true. From this we then have (66) and, finally, we get (65). The proof of Theorem 4 is complete.

⁶A parametric family of distributions P = {P_θ} remains invariant under a group of transformations G if for all g ∈ G and all θ there exists θ_g such that P_θ(Y ∈ A) = P_{θ_g}(Y ∈ gA), where θ_g = gθ.


ACKNOWLEDGMENT

The author would like to thank the reviewers for valuable comments on the paper. The author gratefully acknowledges reviewer C for his very helpful comments and suggestions on an early version of Theorem 2. The author is also grateful to A. Benveniste and M. Basseville for their many constructive comments on an early version of the paper.

REFERENCES

[1] P. Armitage, "Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis," J. Roy. Statist. Soc. B, vol. 12, pp. 137-144, 1950.

[2] R. K. Bansal and P. Papantoni-Kazakos, "An algorithm for detecting a change in a stochastic process," IEEE Trans. Inform. Theory, vol. IT-32, pp. 227-235, Mar. 1986.

[3] M. Basseville and I. Nikiforov, Detection of Abrupt Changes: Theory and Applications (Information and System Sciences Series). Englewood Cliffs, NJ: Prentice-Hall, 1993.

[4] A. A. Borovkov, Theory of Mathematical Statistics: Estimation and Hypotheses Testing. Moscow, USSR: Nauka, 1984 (in Russian).

[5] R. H. Berk, "Some asymptotic aspects of sequential analysis," Ann. Stat., vol. 1, pp. 1126-1138, 1973.

[6] T.-T. Chien and M. B. Adams, "A sequential failure detection technique and its application," IEEE Trans. Automat. Contr., vol. AC-21, pp. 750-757, Oct. 1976.

[7] R. H. Farrell, "Limit theorems for stopped random walks," Ann. Math. Stat., vol. 35, pp. 1332-1343, 1964.

[8] P. M. Frank, "Fault diagnosis in dynamic systems using analytical and knowledge based redundancy: A survey and some new results," Automatica, vol. 26, no. 3, pp. 459-474, 1990.

[9] B. K. Ghosh and P. K. Sen, Eds., Handbook of Sequential Analysis. New York: Marcel Dekker, 1991.

[10] G. Lorden, "Procedures for reacting to a change in distribution," Ann. Math. Stat., vol. 42, pp. 1897-1908, 1971.

[11] G. Lorden, "Open-ended tests for Koopman-Darmois families," Ann. Stat., vol. 1, pp. 633-643, 1973.

[12] G. Moustakides, "Optimal stopping times for detecting changes in distributions," Ann. Stat., vol. 14, pp. 1379-1387, 1986.

[13] __, "... monitoring based on statistical change detection algorithms," in Proc. TOOLDIAG '93, Toulouse, France, Apr. 1993, vol. 2, pp. 477-48....

[14] __, "Application of statistical fault detection algorithms to navigation system monitoring," Automatica, vol. 29, no. 5, pp. 1275-1290, 1993.

[15] E. S. Page, "Continuous inspection schemes," Biometrika, vol. 41, pp. 100-115, 1954.

[16] M. Pollak, "Optimal detection of a change in distribution," Ann. Stat., vol. 13, pp. 206-227, 1985.

[17] Y. Ritov, "Decision theoretic optimality of the CUSUM procedure," Ann. Stat., vol. 18, pp. 1464-1469, 1990.

[18] A. N. Shiryaev, "The problem of the most rapid detection of a disturbance in a stationary process," Sov. Math. Dokl., no. 2, pp. 795-799, 1961.

[ 191 -, “On optimum methods in quickest detection problems,” Theory Prob. Appl., vol. 8, pp. 2 2 4 6 , 1963.

[20] G. Simons, “Lower bounds for average sample number of sequential multihypothesis tests,” Ann. Math. Stat., vol. 38, pp. 1343-1364, 1967.

[21] M. A. Sturza, “Navigation system integrity monitoring using redundant measurements,” Navigation, vol. 35, pp. 483-501, Winter 1988-1989.

[22] A. Wald, Sequential Analysis. [23] S. S. Wilks, Mathematical Statistics. New York Wiley, 1963. [24] A. S. Willsky, “A survey of design methods for failure detection in

[I31 I. Nikiforov, V. Varavva, and V. Kireichikov, “GNSS integrity

New York Wiley, 1947.

dynamic systems,” Automatica, vol. 12, pp. 601-611, 1976.