IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 41, NO. 1, JANUARY 1995, 171
A Generalized Change Detection Problem

Igor V. Nikiforov
Abstract-The purpose of this paper is to give a new statistical approach to the change diagnosis (detection/isolation) problem. The change detection problem has received extensive research attention; however, the change isolation problem has, for the most part, been ignored. We consider a stochastic dynamical system with abrupt changes and investigate the multiple hypotheses extension of Lorden's results. We introduce a joint criterion of optimality for the detection/isolation problem and then design a change detection/isolation algorithm. We also investigate the statistical properties of this algorithm. We prove a lower bound for the criterion in a class of sequential change detection/isolation algorithms. It is shown that the proposed algorithm is asymptotically optimal in this class. The theoretical results are applied to the case of additive changes in linear stochastic models.
Index Terms-Sequential change detection and isolation, generalized change detection, linear stochastic models.
I. INTRODUCTION

STATISTICAL decision tools for detecting and isolating abrupt changes in the properties of stochastic signals and
dynamical systems have numerous applications, from on-line fault diagnosis in complex technical systems to edge detection in images and detection of signals with unknown arrival time in geophysics, radar, and sonar signal processing. For example, early on-line fault diagnosis (change detection and isolation) in industrial processes helps prevent these processes from suffering more catastrophic failures. As another example, let us consider the problem of target detection and identification. Assume that there are several types of targets, each of which can appear at an unknown moment of time. The problem is to detect that a target has arrived and to classify the type of target as soon as possible.
The now almost traditional solution of the change diagnosis problem consists of subdividing it into two stages: change (fault) detection and change (fault) isolation, which are executed sequentially. As mentioned in the pioneering paper by A. Willsky [24], the fault detection (or alarm) task consists of "making a binary decision: either that something has gone wrong or that everything is fine." The fault isolation task "is that of determining the source of the fault."
The change detection problem has received extensive research attention. Mathematically, the change detection problem
Manuscript received July 14, 1993; revised April 16, 1994. An early version of the paper was presented in part at the IEE Conference CONTROL'94, Coventry, UK, March 21-24, 1994, and at the American Control Conference 1994, Baltimore, MD, June 29-July 1, 1994.
The author is on leave from the Institute of Control Sciences, Moscow, Russia. He was with IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France. He is now with L.A.I.L. U.R.A. CNRS D 1440, Université des Sciences et Technologies de Lille, Bâtiment P2, 59566 Villeneuve d'Ascq Cedex, France.
IEEE Log Number 9406687.
can be formulated as that of the quickest detection of abrupt changes in stochastic systems. Recent results and references can be found in [3]. The two main classes of quickest detection problems are the Bayesian and the non-Bayesian approaches. The first optimality results for the Bayesian approach were obtained in [18], [19]. More recent results can be found in [16]. The first algorithm for the non-Bayesian approach was suggested by E. S. Page in [15]: the cumulative sum algorithm (CUSUM). The asymptotic minimax ("worst case") optimality of the CUSUM was proved in [10], where G. Lorden gave a lower bound for the worst mean delay for detection and proved that the CUSUM algorithm reaches this lower bound. Recently, nonasymptotic aspects of optimality for non-Bayesian algorithms were investigated in [12], [17]. The asymptotic minimax optimality of the CUSUM algorithm in the case of dependent random processes was obtained in [2].
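Page's CUSUM recursion can be stated in a few lines. The following sketch is an illustration, not code from the paper; the Gaussian mean-shift setting and all numeric values are assumptions chosen so that the alarm time is exactly predictable (a deterministic sequence stands in for noisy data):

```python
def cusum(xs, mu0, mu1, sigma, h):
    """One-sided CUSUM (Page) for a shift in the mean of i.i.d. Gaussian data:
    g_n = max(0, g_{n-1} + s_n), where s_n is the log likelihood ratio
    increment log p1(x)/p0(x); alarm when g_n >= h.
    Returns the 1-based alarm time, or None if no alarm fires."""
    g = 0.0
    for n, x in enumerate(xs, start=1):
        s = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)  # LLR increment
        g = max(0.0, g + s)
        if g >= h:
            return n
    return None

# Deterministic stand-in for noisy data: the mean jumps from 0 to 1 at n = 51,
# so each post-change LLR increment is exactly 0.5 and the alarm fires once
# ten increments have accumulated past h = 5.
xs = [0.0] * 50 + [1.0] * 50
alarm = cusum(xs, mu0=0.0, mu1=1.0, sigma=1.0, h=5.0)
print(alarm)  # -> 60
```

With noisy observations the alarm time is of course random; its worst mean value is exactly the quantity that Lorden's lower bound controls.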
The change isolation problem has been investigated much less. To our knowledge, no proof of optimality in a mathematically precise sense exists. Moreover, because the quickest detection criterion of optimality does not take the isolation problem into account, the way to combine the detection and isolation algorithms is not obvious. Therefore, the following problems remain unsolved:
- What is a convenient criterion of optimality for the fault isolation problem?
- How may we avoid contradictions between the criteria of the detection and isolation stages? Typically a short mean detection delay is desirable, but a longer decision delay can improve the result of the isolation stage. We shall prove that this is not always necessary.
- What is a lower bound for the performance index in a class of detection/isolation algorithms?
- What is an optimal (or asymptotically optimal) algorithm which reaches this lower bound?
The goal of this paper is to present a new statistical method for jointly detecting and isolating changes in the properties of stochastic systems, and to prove the asymptotic optimality of this method. The criterion of optimality of this "generalized change detection problem" consists in minimizing the worst mean detection/isolation delay for a given mean time before a false alarm or a false isolation.
The paper is the first attempt to solve this new problem. For this reason we assume that the statistical models before and after changes are known exactly. This means that we assume the case of simple hypotheses. The case of composite hypotheses will be discussed elsewhere. The paper is organized as follows.
First, we give the generalized change detection problem statement, the basic model, which is a finite parametric family of distributions, and the criteria of optimality in Section II. We also give an intuitive definition of the criteria and discuss some features of the proposed criteria.

0018-9448/95$04.00 © 1995 IEEE
Next, we design the change detection/isolation algorithm for the basic model in Section III. We also investigate the statistical properties of this algorithm. The main results are stated in Theorem 1.
In Section IV we investigate a lower bound for the worst mean detection/isolation delay in a class of sequential change detection/isolation algorithms. The main result is established in Theorem 2.
Finally, in Section V we introduce two types of linear stochastic models with additive abrupt changes. The first type is a regression model with redundancy. The second type is a stochastic dynamical model. In these two cases we show how the new change detection/isolation problems can be reduced to the basic problem, and we discuss some new features which play a key role in these models. We also investigate the statistical properties of the change detection/isolation algorithm. The results are given in Theorems 3 and 4.
II. PROBLEM STATEMENT

In this section we give an intuitive formulation of the change detection/isolation criteria and discuss some features of the proposed criteria. Next, we define the basic model, which is a finite parametric family of distributions, and give a formal definition of the criteria.
A. Intuitive Formulation of the Change Detection/Isolation Problem
Let us assume that there exists a discrete time stochastic dynamical model F_n(θ), n = 1, 2, …. The vector θ ∈ ℝ^r is the parameter of interest. Let

$$\mathcal{F} = \Big\{ F(\theta) : \theta \in \Omega,\ \Omega = \bigcup_{i=0}^{K-1} \{\theta_i\} \Big\}$$

be a finite family (K members) of this model. Until the unknown time k the vector is θ = θ₀, and from k+1 on it becomes θ = θ_l for some l, l = 1, …, K−1. Therefore, F_n(θ) is the model with abrupt changes

$$F_n(\theta) = \begin{cases} F(\theta_0) & \text{if } n \le k \\ F(\theta_l) & \text{if } n \ge k+1 \end{cases} \qquad (1)$$

for some l = 1, …, K−1 and k = 0, 1, 2, …, where F(θ₀) is the normal operation mode of the model and F(θ_l) is the mode with fault number l ≥ 1. We assume that the values of θ_l are known a priori. The change time k+1 and the fault number l are unknown.
Let (Y_n)_{n≥1} be a sequence of observations coming from system (1). The problem is to detect and isolate the change in θ. In other words, we have to determine the type of fault (number l) as soon as possible. The change detection/isolation algorithm has to compute a pair (N, ν) based on the observations Y₁, Y₂, …, where N is the alarm time at which a ν-type change is detected/isolated and ν, ν = 1, …, K−1, is the final decision. In other words, at time N the hypothesis ℋ_ν : {θ = θ_ν} is accepted.
The following three situations can occur. If the change is detected/isolated after time k (N > k is true), then the delay for detection/isolation of an l-type change is

$$\tau_l = N - k.$$
On the contrary, if the changes in θ are detected before time k, or if the final decision is incorrect (ν ≠ l), then these are false alarms or false isolations, which we characterize in the following manner:
- False alarms. Let the observations (Y_n)_{n≥1} come from the normal mode system F(θ₀). Consider the following sequence of alarm times

$$N_0 = 0 < N_1 < N_2 < \cdots < N_r < \cdots$$

where N_r is the alarm time of the detection/isolation algorithm applied to Y_{N_{r-1}+1}, Y_{N_{r-1}+2}, …. Define the first false alarm time N^{ν=j} of a j-type in this sequence:

$$N^{\nu=j} = \inf_{r \ge 1} \{ N_r : \nu_r = j \}, \qquad 1 \le j \le K-1$$
where inf{∅} = ∞ as usual.
- False isolations. In order to avoid uncertainties in initial conditions, let us assume that k = 0. In other words, we assume that the observations (Y_n)_{n≥1} come from the mode F(θ_l) with fault number l ≥ 1. Define the first false isolation time N^{ν=j} of a j-type in this sequence:

$$N^{\nu=j} = \inf_{r \ge 1} \{ N_r : \nu_r = j \}, \qquad 1 \le j \ne l \le K-1.$$
It is intuitively obvious that the optimality criterion must favor fast detection/isolation with few false alarms and few false isolations. In other words, the delay τ_l = N − k given that N > k should be stochastically small for each l = 1, …, K−1, and

$$N^{\nu=j} = \inf_{r \ge 1} \{ N_r : \nu_r = j \}$$

should be stochastically large for each combination of numbers j ≠ l.
B. The Basic Model
We consider a finite family of distributions P = {P_i, i = 0, …, K−1} with densities {p_i, i = 0, …, K−1} with respect to some measure μ. In the parametric case, we assume that P = {P_θ, θ ∈ Ω}, where θ ∈ ℝ^r,

$$\Omega = \bigcup_{i=0}^{K-1} \{\theta_i\}$$

and we denote the density function of this family by p_θ(X).
Let (X_n)_{n≥1} be an independent random sequence observed sequentially, where X₁, …, X_k have distribution P₀ while X_{k+1}, X_{k+2}, … have distribution P_l, l = 1, …, K−1:

$$\mathcal{L}(X_n) = \begin{cases} P_0 & \text{if } n \le k \\ P_l & \text{if } n \ge k+1 \end{cases} \qquad (2)$$

where 𝓛(·) is the probability law. The change time k+1 and the number l are unknown. We assume that for this family of distributions the following inequality is true:

$$0 < \rho_{ij} = \int \log \frac{p_i(x)}{p_j(x)}\, p_i(x)\, d\mu(x) < \infty, \qquad 0 \le i \ne j \le K-1 \qquad (3)$$

where ρ_ij is the Kullback-Leibler information.
C. Formal Definition of the Criteria
Let us pursue our discussion of the criteria of optimality. The change detection/isolation algorithm consists in computing a pair (N, ν) based on the observations X₁, X₂, …. Let P^l_{k+1} be the distribution of the observations X₁, X₂, … when the change time is k+1, k = 0, 1, 2, …, and let E^l_{k+1} be the expectation under P^l_{k+1}. Therefore, the mean delay for the detection/isolation of an l-type change is

$$\bar\tau_l = E^l_{k+1}(N - k \mid N > k,\ X_1, \ldots, X_k) \qquad (4)$$

where E(· | ·) is the conditional expectation. It is obvious that, without knowing a priori the distribution of the change time k+1, the mean decision delay defined in (4) is a function of k and the past "trajectory" of the random sequence X₁, …, X_k. In many practical cases it is useful to have an algorithm which is independent of the distribution of the change time k+1 and of the sample path of the observations X₁, …, X_k. For this reason we use another, minimax performance index, which was introduced in [10]. Hence, the worst mean detection/isolation delay is¹
$$\bar\tau_l^* = \sup_{k \ge 0}\ \operatorname{ess\,sup} E^l_{k+1}(N - k \mid N > k,\ X_1, \ldots, X_k). \qquad (5)$$
On the other hand, the mean time before the first false alarm N^{ν=j} of a j-type is defined by the following formula:

$$E_0(N^{\nu=j}) = E_0\Big( \inf_{r \ge 1} \{ N_r : \nu_r = j \} \Big)$$

where E₀(·) = E^l₀(·), the superscript l being in this case irrelevant. Analogously, the mean time before the first false isolation N^{ν=j} of a j-type is

$$E_l(N^{\nu=j}) = E_l\Big( \inf_{r \ge 1} \{ N_r : \nu_r = j \} \Big), \qquad 1 \le j \ne l \le K-1$$

where E_l(·) = E^l₁(·).
Let us consider the following minimax criterion. We require that the worst mean detection/isolation delay

$$\bar\tau^* = \sup_{k \ge 0,\ 1 \le l \le K-1} \operatorname{ess\,sup} E^l_{k+1}(N - k \mid N > k,\ X_1, \ldots, X_k) \qquad (6)$$

should be as small as possible for a given minimum T of the mean times before a false alarm or a false isolation

$$T = \min_{0 \le l \le K-1}\ \min_{1 \le j \ne l \le K-1} E_l(N^{\nu=j}). \qquad (7)$$

¹Let x, y, z be random variables. We say that y = ess sup x if 1) P(x ≤ y) = 1; 2) if P(x ≤ z) = 1, then P(y ≤ z) = 1, where P(A) is the probability of the event A.
Remark 1: Usually, in the classical change detection problem, the mean time before the first false alarm is equal to the mean time between false alarms. This follows from the fact that the system inspection and repair times are not of interest to us and thus are assumed to be zero; in other words, we assume that the process of observation is restarted immediately, as at the beginning. In the problem discussed here we can assume, under the same assumptions, that T is equal to the minimum mean time between false alarms or false isolations.
Discussion: Let us explain criterion (6)-(7). Originally, this type of criterion was introduced for the quickest change detection problem in [10]. For a given change time k+1, the conditional expectation

$$\bar\tau_l = E^l_{k+1}(N - k \mid N > k,\ X_1, \ldots, X_k)$$

is a random variable. For this reason we have to use the essential supremum ess sup τ̄_l(X₁, …, X_k) in order to reach the smallest A_l(k) such that

$$\bar\tau_l(X_1, \ldots, X_k) \le A_l(k)$$

almost surely under the probability measure μ of the observations X₁, …, X_k. After this we have to use the supremum

$$\sup_{k \ge 0,\ 1 \le l \le K-1} A_l(k)$$

in order to guarantee the worst mean detection/isolation delay with respect to the unknown change time and the unknown number l of the hypothesis ℋ_l. On the other hand, we constrain the minimum value T of the mean time before a false alarm or a false isolation in order to guarantee some acceptable level of false decisions.
III. THE BASIC ALGORITHM AND ITS PROPERTIES
This section is organized in the following manner. First, we design the joint detection/isolation algorithm. Then we investigate the asymptotic statistical properties of this algorithm using criterion (6)-(7).
A. The Change Detection/Isolation Algorithm

1) Sequential Hypotheses Testing Problem: Let us start with the Armitage sequential probability ratio test (SPRT) for the statistical multiple hypotheses testing problem (see [1] or handbook [9, p. 237]). Again, we suppose that P = {P_i : i = 0, …, K−1} is a finite family of distributions. Hence, we consider K simple hypotheses

$$\mathcal{H}_i : \{ \mathcal{L}((X_n)_{n \ge 1}) = P_i \}, \qquad i = 0, \ldots, K-1.$$

We observe sequentially an independent random sequence (X_n)_{n≥1} with density p_i, where p_i is the density of an unknown member P_i of the family P. The multiple hypotheses SPRT is nothing but the following pair (M, d), where M is
the stopping time and d is the final decision, which are defined in the following manner:

$$M = \min_{l=0,\ldots,K-1} M_l \qquad (8)$$

$$M_l = \inf \Big\{ n \ge 1 : \min_{0 \le j \ne l \le K-1} \big[ S_1^n(l,j) - h_{lj} \big] \ge 0 \Big\} \qquad (9)$$

$$d = \arg \min_{l=0,\ldots,K-1} M_l$$

where

$$S_k^n(l,j) = \sum_{i=k}^{n} \log \frac{p_l(X_i)}{p_j(X_i)}$$

is the log likelihood ratio (LR) between hypotheses ℋ_l and ℋ_j and the h_lj are chosen thresholds.

The interpretation of SPRT (8) is very simple: it stops the first time n at which there exists some hypothesis ℋ_l for which each LR S_1^n(l,j) between ℋ_l and ℋ_j is greater than or equal to a chosen threshold h_lj, 0 ≤ j ≠ l ≤ K−1.

2) Design of the Detection/Isolation Algorithm: The idea of the proposed detection/isolation algorithm is based on a class of "extended stopping variables" which was originally introduced in [10]. Because we are not interested in the detection of hypothesis ℋ₀, we will assume that l = 1, …, K−1.

Let us introduce the following stopping time

$$\bar N = \min \{ \bar N^1, \ldots, \bar N^{K-1} \} \qquad (10)$$

and final decision

$$\bar\nu = \arg\min \{ \bar N^1, \ldots, \bar N^{K-1} \} \qquad (11)$$

of the detection/isolation algorithm, where the stopping time N̄^l is responsible for the detection of hypothesis ℋ_l and is defined by the following formula:

$$\bar N^l = \inf_{k \ge 1} \bar N^l(k) \qquad (12)$$

$$\bar N^l(k) = \inf \Big\{ n \ge k : \min_{0 \le j \ne l \le K-1} S_k^n(l,j) \ge h \Big\}.$$

3) Discussion: Let us discuss the design of algorithm (10)-(12). It follows from (12) that the algorithm is based on a concept that is very important in detection theory, namely the LR

$$S_k^n(l,j) = \sum_{i=k}^{n} \log \frac{p_l(X_i)}{p_j(X_i)} \qquad (13)$$

between hypotheses ℋ_l and ℋ_j. The key statistical properties of this ratio are as follows:

$$E_j(S_k^n(l,j)) < 0 \quad \text{and} \quad E_l(S_k^n(l,j)) > 0.$$

In other words, a change in statistical model (2) is reflected as a change in the sign of the log LR mean.

If we have to detect a change in a distribution, then the classical optimal solution of this problem is the cumulative sum (CUSUM) algorithm [15]. The CUSUM algorithm is based on the comparison of the difference between the value of the log LR and its current minimum value

$$g_n = S_1^n - \min_{0 \le k \le n} S_1^k = \max_{1 \le k \le n} S_k^n, \qquad S_1^0 = 0$$

with a given threshold. In other words, the CUSUM stops at time N if, for some k < N, the observations X_k, …, X_N are significant for accepting the hypothesis about change:

$$N = \inf \Big\{ n \ge 1 : \max_{1 \le k \le n} S_k^n \ge h \Big\}.$$

If we have to detect and isolate an l-type change in the model, then, generally speaking, we may exploit Page's idea with some modifications. Now we have the set ℋ₀, ℋ₁, …, ℋ_{K−1} of alternatives. For this reason N̄^l stops if, for some k < N̄^l, the observations X_k, …, X_n are significant for accepting the hypothesis ℋ_l with respect to this set of alternatives:

$$\bar N^l = \inf \Big\{ n \ge 1 : \max_{1 \le k \le n}\ \min_{0 \le j \ne l \le K-1} S_k^n(l,j) \ge h \Big\}. \qquad (14)$$

Let us discuss the following example. Let X_n ∈ ℝ² be a Gaussian vector: 𝓛(X_n) = N(θ, I), where I is the unit covariance matrix. We consider the following hypotheses:

$$\mathcal{H}_0 : \{ \theta = (0, 0)^T \}, \quad \mathcal{H}_1 : \{ \theta = (-1, 1)^T \}, \quad \mathcal{H}_2 : \{ \theta = (1, 2)^T \}.$$

The typical behavior of the log LR is depicted in Figs. 1 and 2, where the log LR S_1^n(l,j) are denoted by S(l,j). For this example, the change time k is equal to 20. Figs. 1 and 2 show the changes from ℋ₀ to ℋ₁ and from ℋ₀ to ℋ₂, respectively. The decision-making process is quite simple: when both differences between the values of the log LR S_1^n(1,0) and S_1^n(1,2) and their current minima are greater than or equal to the threshold (here h = 10), as shown in Fig. 1, then we stop the process of observation (N = 27) and accept the hypothesis ℋ₁ (ν = 1). Analogously, when both differences between the values of the log LR S_1^n(2,0) and S_1^n(2,1) = −S_1^n(1,2) and their current minima are greater than or equal to the threshold, then we stop the process of observation (N = 24) and accept the hypothesis ℋ₂ (ν = 2). This situation is shown in Fig. 2.

Stopping time (14) can also be interpreted as the generalized likelihood ratio (GLR) algorithm [10], [11], [24]. The GLR algorithm for testing between two composite hypotheses H₀ = {θ ∈ Θ₀} and H₁ = {θ ∈ Θ₁} is based on the following statistic:

$$\frac{\sup_{\theta \in \Theta_1} p_\theta(X_k^n)}{\sup_{\theta \in \Theta_0} p_\theta(X_k^n)}.$$

In our case, the hypothesis H₁ = ℋ_l is simple and the hypothesis H₀ = ∪_{0≤j≠l≤K−1} ℋ_j
Fig. 1. Typical behavior of the log likelihood ratios S_1^n(l,j). The change from ℋ₀ to ℋ₁.

Fig. 2. Typical behavior of the log likelihood ratios S_1^n(l,j). The change from ℋ₀ to ℋ₂.
is composite, but finite. Hence the supremum over H₀ reduces to a maximum over the finite set of alternatives, and the log GLR statistic becomes min_{0≤j≠l≤K−1} S_k^n(l,j), which is exactly the statistic used in (14).
Let us add a comment on the threshold issue. In algorithm (10)-(12) we use the single threshold h instead of the thresholds h_lj of SPRT (8). The main idea of this choice is the fact that the level of false alarms (or false isolations) is a function of the thresholds (see Lemma 1 for details). Therefore, to have the same level of false decisions for the separate stopping times N̄^l, we choose the same level of the thresholds: h_lj = h.
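The stopping rule above can be exercised end to end. The sketch below is an illustration, not the author's code: it runs rule (14) on the two-dimensional Gaussian example with identity covariance, but on the deterministic noiseless means (change from ℋ₀ to ℋ₁ at k = 20) instead of the noisy sample path of Figs. 1 and 2, so the alarm time and decision are exact; the O(n²) scan over candidate change times k is chosen for clarity, not efficiency.

```python
# Hypothesised means theta_0, theta_1, theta_2 from the paper's example;
# h and the change time are the example's values.
thetas = [(0.0, 0.0), (-1.0, 1.0), (1.0, 2.0)]
K, h, change_time = len(thetas), 10.0, 20

def llr_inc(x, l, j):
    """Log likelihood ratio increment log p_l(x)/p_j(x) for N(theta, I)."""
    tl, tj = thetas[l], thetas[j]
    d = (tl[0] - tj[0], tl[1] - tj[1])
    m = ((tl[0] + tj[0]) / 2.0, (tl[1] + tj[1]) / 2.0)
    return d[0] * (x[0] - m[0]) + d[1] * (x[1] - m[1])

def detect_isolate(xs):
    """Return (N, nu): stop when max_k min_{j != l} S_k^n(l, j) >= h."""
    # cum[(l, j)][n] holds the cumulative LLR C_n, so S_k^n = C_n - C_{k-1}.
    cum = {(l, j): [0.0] for l in range(1, K) for j in range(K) if j != l}
    for n, x in enumerate(xs, start=1):
        for key in cum:
            cum[key].append(cum[key][-1] + llr_inc(x, *key))
        for l in range(1, K):  # stopping time responsible for hypothesis H_l
            g = max(min(cum[(l, j)][n] - cum[(l, j)][k - 1]
                        for j in range(K) if j != l)
                    for k in range(1, n + 1))
            if g >= h:
                return n, l
    return None, None

# Deterministic "data": theta_0 for 20 samples, then theta_1.
xs = [thetas[0]] * change_time + [thetas[1]] * 30
N, nu = detect_isolate(xs)
print(N, nu)  # -> 30 1: ten noiseless post-change samples reach h = 10
```

On this noiseless path each post-change sample contributes min(ρ₁₀, ρ₁₂)·2 = 1 to the statistic for ℋ₁, so exactly ten samples are needed; with noisy data the delay fluctuates around that value, as in Fig. 1.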
B. Statistical Properties
Now we investigate the statistical properties, namely the relation between the worst mean detection/isolation delay τ̄* given in (6) and the mean time T before a false alarm or a false isolation given in (7).

The main result is stated in the following theorem.

Theorem 1: Let (N̄, ν̄) be the detection/isolation algorithm (10)-(12) with threshold h = ln T. Then

$$\bar\tau^* \le \frac{\ln T}{\rho^*} \big(1 + o(1)\big) \quad \text{as } T \to \infty \qquad (15)$$

where

$$\rho^* = \min_{1 \le l \le K-1}\ \min_{0 \le j \ne l \le K-1} \rho_{lj}.$$
The proof uses the following three results.

Lemma 1: Let N̄^j be the stopping variables (12) computed with respect to X₁, X₂, … ∼ P_l. Then

$$E_l(\bar N^j) \ge e^h \quad \text{for } l = 0, \ldots, K-1,\quad 1 \le j \ne l \le K-1. \qquad (16)$$

Proof of Lemma 1: It is known from Lorden's Theorem 2 [10] that the expectation of the stopping variable N = inf_{k≥1}{n(k)}, where n(k) is the stopping time of the open-ended SPRT which is activated at time k, satisfies the following inequality:

$$E_l(N) \ge \frac{1}{\alpha} \quad \text{when } P_l(n(k) < \infty) \le \alpha, \quad k = 1, 2, \ldots.$$

Hence, it will suffice to show that P_l(N̄^j(1) < ∞) ≤ e^{−h}. Let us consider the following stopping variable:

$$\bar N^{jl}(1) = \inf \{ n \ge 1 : S_1^n(j,l) \ge h \}$$

and the events B = {N̄^j(1) < ∞} and A_l = {N̄^{jl}(1) < ∞}. It is easy to show that

$$P_l(B) \le P_l(A_l).$$

Hence, from the above formula and Wald's inequality P_l(A_l) ≤ e^{−h} (see [22, pp. 40-44]),

$$P_l(\bar N^j(1) < \infty) \le e^{-h}. \qquad (17)$$

The proof of Lemma 1 is complete.

Lemma 2: Let N̄^j be the stopping variables (12) computed with respect to X₁, …, X_k, X_{k+1}, … ∼ P^j_{k+1}. Then

$$\bar\tau_j^* \le E_j[\bar N^j(1)] \sim \frac{h}{\min_{0 \le i \ne j \le K-1} \rho_{ji}} \quad \text{as } h \to \infty. \qquad (18)$$

Proof of Lemma 2: The first part of the proof follows from Lorden's Theorem 2 [10]. Namely, note here that the event {N̄^j ≤ k}, where

$$\bar N^j = \inf_{k \ge 1} \bar N^j(k)$$

is the union of

$$\{\bar N^j(1) \le k\},\ \{\bar N^j(2) \le k\},\ \ldots,\ \{\bar N^j(k) \le k\}.$$

This results in the worst mean detection/isolation delay satisfying the following inequality:

$$\bar\tau_j^* \le E_j(\bar N^j(1)).$$

Let us define the following stopping variable M(h):

$$M = \inf \Big\{ n \ge 1 : \min \Big( \sum_{i=1}^n y_i^1, \ldots, \sum_{i=1}^n y_i^{K-1} \Big) \ge h \Big\}$$

where Y_i = (y_i^1, …, y_i^{K−1})^T is a sequence of independent identically distributed (i.i.d.) random vectors. Moreover, we assume that 0 < min(m₁, m₂, …, m_{K−1}), where m_j = E(y^j). From Farrell's Theorem 3 [7] we know the following properties of M:

$$P(M < \infty) = 1; \qquad \lim_{h \to \infty} h^{-1} E(M(h)) = \frac{1}{\min(m_1, m_2, \ldots, m_{K-1})}.$$

In the case of the stopping variable N̄^j we have m_i = ρ_{ji} > 0. The proof of Lemma 2 is complete.

Corollary 1: Let (N̄, ν̄) be the detection/isolation algorithm (10)-(12). Then

$$\bar\tau^* \le \max_{1 \le j \le K-1} E_j[\bar N^j(1)] \sim \frac{h}{\rho^*} \quad \text{as } h \to \infty. \qquad (19)$$

Proof of Corollary 1: Formula (19) follows at once from the definition of N̄ (10) and from (18) (Lemma 2).

Proof of Theorem 1: First, let us show that

$$E_0(\bar N^{\nu=j}) \ge e^h, \qquad j = 1, \ldots, K-1.$$

Define the following two events: {N̄^{ν=j} ≤ n}, where

$$\bar N^{\nu=j} = \inf_{r \ge 1} \{ \bar N_r : \nu_r = j \}$$

and {N̄^j ≤ n}. Denote by r'_n the argument of the above infimum. It is obvious that {N̄^{ν=j} ≤ n | r'_n} = {N̄^j ≤ n} when r'_n = 1 and {N̄^{ν=j} ≤ n | r'_n} ⊂ {N̄^j ≤ n} when r'_n > 1. Therefore

$$P_0(\bar N^{\nu=j} \le n \mid r'_n) \le P_0(\bar N^j \le n) \quad \text{when } r'_n \ge 1$$

and

$$P_0(\bar N^{\nu=j} > n \mid r'_n) \ge P_0(\bar N^j > n) \quad \text{when } r'_n \ge 1.$$

Now, since

$$E_0(\bar N^{\nu=j} \mid r'_n) = \sum_{n=0}^{\infty} P_0(\bar N^{\nu=j} > n \mid r'_n) \ge \sum_{n=0}^{\infty} P_0(\bar N^j > n) = E_0(\bar N^j)$$

we have from Lemma 1 that

$$E_0(\bar N^j) \ge e^h.$$

Finally, we get

$$E_0(\bar N^{\nu=j}) = E_0[E_0(\bar N^{\nu=j} \mid r'_n)] \ge e^h. \qquad (20)$$

Second, let us show that

$$E_l(\bar N^{\nu=j}) \ge e^h, \qquad l = 1, \ldots, K-1, \quad 1 \le j \ne l \le K-1. \qquad (21)$$

The proof of this step is similar to the previous one; it will suffice to use P_l(·) (E_l(·)) instead of P₀(·) (E₀(·)). Relation (15) follows at once from (20), (21), and Corollary 1. The proof of Theorem 1 is complete.
IV. ASYMPTOTIC THEORY
In this section we prove a lower bound for the worst mean detection/isolation delay over the class K_γ of sequential change detection/isolation algorithms. First, we give some technical results on sequential multiple hypotheses tests, and then we prove this asymptotic lower bound for τ̄*. Finally, we compare this new lower bound with the lower bound for the worst mean delay of the classical change detection (without isolation!) algorithms which have been mentioned before.
Lemma 3: Let X₁, X₂, … be a sequence of i.i.d. random variables. Let ℋ₀, …, ℋ_{K−1} be K ≥ 2 hypotheses, where ℋ_i is the hypothesis that X has density p_i with respect to some probability measure μ, for i = 0, …, K−1, and assume inequality (3) to be true. Let E_i(N) be the average sample number (ASN) of a sequential test which chooses one of the K hypotheses, subject to a K × K error matrix A = ||α_ij||, where α_ij = P_i(accepting ℋ_j), i, j = 0, …, K−1. Let us define the matrix Ã = ||ã_il||, i = 1, …, K−1, l = 0, …, K−1, with entries

$$\tilde a_{i0} = \gamma_i, \qquad \tilde a_{ii} = 1 - r_i, \qquad \tilde a_{il} = \beta_{il} \quad (1 \le l \ne i \le K-1)$$

where γ_i is the probability of accepting ℋ₀ under ℋ_i, β_il is the probability of accepting ℋ_l under ℋ_i, and

$$r_i = \gamma_i + \sum_{l=1,\, l \ne i}^{K-1} \beta_{il}.$$

Then a lower bound for E_i(N) is given by the following formula:

$$E_i(N) \ge \max_{1 \le j \ne i \le K-1} \frac{(1 - r_i) \ln \beta_{ji}^{-1} - \ln 2}{\rho_{ij}} \qquad (22)$$

for i = 1, …, K−1.

Proof of Lemma 3: See Appendix I.

Definition 1: Let K_γ be the class of all sequential detection/isolation algorithms (N, ν), where N is the extended stopping variable and ν is the final decision, that satisfy the following inequalities:

$$\min_{0 \le l \le K-1}\ \min_{1 \le j \ne l \le K-1} E_l\Big( \inf_{r \ge 1} \{ N_r : \nu_r = j \} \Big) \ge \gamma. \qquad (24)$$

Theorem 2: Suppose class (24) is nonempty. Let us define the lower bound n̄(γ) as the infimum of the worst mean detection/isolation delay in the class K_γ:

$$\bar n(\gamma) = \inf_{(N, \nu) \in K_\gamma} (\bar\tau^*).$$

Let inequality (3) be true. Then

$$\bar n(\gamma) \ge \frac{\ln \gamma}{\rho^*} \big(1 + o(1)\big) \quad \text{as } \gamma \to \infty \qquad (25)$$

where

$$\rho^* = \min_{1 \le l \le K-1}\ \min_{0 \le j \ne l \le K-1} \rho_{lj}.$$

Proof of Theorem 2: See Appendix II.

Corollary 2: Detection/isolation algorithm (10)-(12) is asymptotically optimal in the class K_γ.

It is of interest to compare n̄(γ) (25) with the infimum n̄_c(γ_c) of the worst mean detection delay for a change detection algorithm. Let N^c be the stopping variable of a change detection algorithm. We suppose that the worst mean detection delay is

$$\bar\tau_c^* = \sup_{k \ge 0,\ 1 \le l \le K-1} \operatorname{ess\,sup} E^l_{k+1}(N^c - k \mid N^c > k,\ X_1, \ldots, X_k). \qquad (23)$$

Denote by K^c_γ the class of all stopping variables satisfying

$$E_0(N^c) \ge \gamma_c.$$

Corollary 3: Let the following equalities be true:

$$\min_{1 \le l \le K-1} \rho_{l0} = \rho^* \quad \text{and} \quad (K-1)\gamma = \gamma_c;$$

then

$$\bar n(\gamma) \sim \bar n_c(\gamma_c) \quad \text{as } \gamma \to \infty. \qquad (26)$$
Proof of Corollary 3: It follows immediately from Lorden's Theorem 3 [10] that

$$\bar n_c(\gamma_c) = \inf_{N^c \in K^c_\gamma} (\bar\tau_c^*) \sim \frac{\ln \gamma_c}{\min_{1 \le l \le K-1} \rho_{l0}} \quad \text{as } \gamma_c \to \infty.$$

Hence, it is easy to see that (26) is true when min_{1≤l≤K−1} ρ_l0 = ρ* and (K−1)γ = γ_c.
Discussion: Let us discuss the following practical interpretations of the above results. From Theorem 2 it follows that the Kullback-Leibler numbers ρ_ij play a key role in the statistical properties of the detection/isolation algorithms. The minimum ρ* of the Kullback-Leibler "distance"² between the two closest hypotheses ℋ_i and ℋ_j, 0 ≤ i ≠ j ≤ K−1, defines the worst mean detection/isolation delay. Let us consider the two problems which have been mentioned in Section I:

1) The first problem is a change detection task without any isolation of the source of the change (the alarm task of A. Willsky).
2) The second problem is the joint change detection/isolation task.

If we suppose that the delay for detection is the price to be paid, then the following basic question arises: is it necessary to pay more in the case of the more complicated second problem? If ρ*, which is the Kullback-Leibler "distance" between the closest alternatives ℋ_l and ℋ_j, is greater than or equal to the minimum "distance"

$$\min_{1 \le l \le K-1} \rho_{l0}$$

between the alternatives ℋ_l and the null hypothesis ℋ₀, or, equivalently,

$$\rho^* \ge \min_{1 \le l \le K-1} \rho_{l0}$$

then the answer will be "No."
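For the two-dimensional Gaussian example of Section III (identity covariance, so ρ_ij = ½|θ_i − θ_j|²), this condition can be checked numerically; a small sketch, not from the paper:

```python
# "Does isolation cost extra delay?" check for the example hypotheses:
# rho_ij = 0.5 * |theta_i - theta_j|^2 under identity covariance.
thetas = [(0.0, 0.0), (-1.0, 1.0), (1.0, 2.0)]
K = len(thetas)

def rho(i, j):
    return 0.5 * sum((a - b) ** 2 for a, b in zip(thetas[i], thetas[j]))

# rho*: minimum over l = 1..K-1 and all j != l (isolation-aware distance)
rho_star = min(rho(l, j) for l in range(1, K) for j in range(K) if j != l)
# detection-only distance: minimum over the alternatives vs. the null
rho_det = min(rho(l, 0) for l in range(1, K))

print(rho_star, rho_det)  # -> 1.0 1.0
```

Here ρ* = min ρ_l0 = 1, so the condition holds and, for these hypotheses, isolation is asymptotically free: the joint detection/isolation delay matches the detection-only delay to first order.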
V. ADDITIVE CHANGES IN LINEAR STOCHASTIC MODELS
In this section we introduce some linear (regression and dynamical) stochastic models with additive changes. We also introduce, in brief, the key concepts that are to be used for the corresponding detection/isolation problem, namely redundancy and innovation (see [3, pp. 249-252] for details). After this we show how the new detection/isolation problem can be reduced to the basic problem of Section II, and we discuss some new features which play a key role in linear stochastic models with additive changes.
²Strictly speaking, ρ_ij is not a distance in the precise sense. However, in some cases, for instance the change in the mean of a Gaussian vector sequence, this interpretation is precise and useful.
A. Models

1) Basic Model: We consider the following Gaussian family of distributions:

$$P = \Big\{ P_{\theta_i},\ \theta \in \Omega = \bigcup_{i=0}^{K-1} \{\theta_i\} \Big\}$$

where P_{θ_i} = N(θ_i, Σ) and Σ > 0 is a known covariance matrix. Let (Y_n)_{n≥1} be an independent Gaussian random sequence observed sequentially:

$$\mathcal{L}(Y_n) = \begin{cases} N(\theta_0, \Sigma) & \text{if } n \le k \\ N(\theta_l, \Sigma) & \text{if } n \ge k+1 \end{cases} \qquad (27)$$

where θ₀ = 0 and the θ_l are known constants. The change time k+1 and the number l are unknown. We assume that the following inequality is true:

$$0 < \rho_{ij} = \frac{1}{2} (\theta_i - \theta_j)^T \Sigma^{-1} (\theta_i - \theta_j) < \infty, \qquad 0 \le i \ne j \le K-1. \qquad (28)$$
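Formula (28) is easy to evaluate directly. A minimal sketch, with a made-up covariance matrix and pair of means (none of these values come from the paper):

```python
# Kullback-Leibler information (28) for two Gaussian means sharing a known
# covariance: rho_ij = 0.5 * (theta_i - theta_j)^T Sigma^{-1} (theta_i - theta_j).
sigma = [[2.0, 0.5], [0.5, 1.0]]      # assumed covariance, Sigma > 0
ti, tj = (1.0, 0.0), (0.0, 1.0)       # assumed theta_i, theta_j

det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
inv = [[ sigma[1][1] / det, -sigma[0][1] / det],
       [-sigma[1][0] / det,  sigma[0][0] / det]]  # 2x2 inverse by hand
d = (ti[0] - tj[0], ti[1] - tj[1])
rho_ij = 0.5 * (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1]) +
                d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
print(round(rho_ij, 6))  # -> 1.142857 (exactly 8/7 for these values)
```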
2) Regression Models: We consider the following regression model with additive changes:

$$Y_n = H X_n + V_n + \Upsilon_l(n, k+1) \qquad (29)$$

where X_n is the unknown state, V_n is a Gaussian white noise with covariance matrix R = σ²I, σ² > 0, H is a full rank matrix of size r × s with r > s, and Υ_l(n, k+1) is the l-type change occurring at time k+1, namely

$$\Upsilon_l(n, k+1) = \begin{cases} 0 & \text{if } n \le k \\ \Upsilon_l & \text{if } n \ge k+1. \end{cases}$$

The characteristic feature of model (29) is the existence of redundancy (r − s > 0) in the information contained in the observations.
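The role of this redundancy can be made concrete: the projection onto the orthogonal complement of the column space of H removes the unknown state entirely while keeping the visible part of the change. A small sketch with assumed values (r = 3, s = 1; H, X, and the change vector are made up for the illustration):

```python
import numpy as np

H = np.array([[1.0], [1.0], [1.0]])                 # full column rank, r x s
P = np.eye(3) - H @ np.linalg.inv(H.T @ H) @ H.T    # projection, rank r - s = 2

X = np.array([7.3])                      # unknown state; any value works
upsilon = np.array([1.0, -1.0, 0.0])     # additive change Upsilon_l
Y = H @ X + upsilon                      # noise-free observation, post-change

print(np.allclose(P @ (H @ X), 0))       # -> True: the state is annihilated
print(np.allclose(P @ Y, P @ upsilon))   # -> True: residual sees the change only
```

This is why the detection/isolation statistics for model (29), derived below, depend on the observations only through the projected residuals.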
3) Stochastic Dynamical Models: We consider the following linear stochastic dynamical model with additive changes:

$$A(z^{-1}) Y_n = B(z^{-1}) U_n + C(z^{-1}) V_n + \Upsilon_l(n, k+1) \qquad (30)$$

where U_n is the known input vector, z^{−1} is the backward shift operator, A(z^{−1}), B(z^{−1}), C(z^{−1}) are polynomial matrices in the operator z^{−1}, and V_n is a Gaussian white noise with covariance matrix Σ > 0. Assume, as usual, that the characteristic equations

$$\det A(z) = 0 \quad \text{and} \quad \det C(z) = 0$$

have their zeroes outside the unit circle.
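How whitening reduces a model of this kind to the basic one can be seen in the simplest scalar case. The sketch below uses an assumed AR coefficient, change size, and change time, with the noise V_n suppressed so the effect on the innovation sequence is exact:

```python
# Scalar instance: A(z^{-1}) = 1 - a z^{-1}, B = 0, C = 1, with an additive
# step change of size upsilon from time k+1 on. All values are assumptions
# for this illustration.
a, upsilon, k = 0.8, 1.0, 5
Y = [0.0]                                # Y_0 = 0 (initial condition)
for n in range(1, 11):
    v = 0.0                              # V_n suppressed for the illustration
    change = upsilon if n >= k + 1 else 0.0
    Y.append(a * Y[-1] + v + change)     # Y_n = a Y_{n-1} + V_n + change

# Whitening filter: the innovation e_n = Y_n - a Y_{n-1} has mean 0 before
# the change and mean upsilon after it, i.e., exactly the basic Gaussian
# mean-change model (27).
innov = [Y[n] - a * Y[n - 1] for n in range(1, 11)]
print([round(e, 6) for e in innov])  # -> five 0.0's followed by five 1.0's
```

The raw output Y_n drifts geometrically after the change, but the innovations jump cleanly from mean 0 to mean υ, which is what lets the basic detection/isolation algorithm be applied unchanged.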
B. Algorithms
In this subsection we design the change detection/isolation algorithms for the above linear stochastic models. We design the algorithm for basic model (27), which is a Gaussian case of model (2). Then we show that the algorithm for regression model (29) is based on the residuals of the least squares algorithm. The statistical background of this problem is multiple hypotheses testing with nuisance parameters, and the minimax solution of this problem is the GLR algorithm. On the other hand, this algorithm can be reduced to the basic algorithm. Finally, we show that the detection/isolation algorithm for dynamical model (30) is based on the innovations of the whitening filter. The statistical background of this algorithm is the Transformation theorem [23, pp. 53-59]. Again, this algorithm is a particular case of the basic algorithm.
1) Basic Model: Model (27) is a particular case of basic model (2), which is defined in Section 11. For this reason, detectiodisolation algorithm (lo)-( 12) is valid for model (27):
jV = min {N1,. . . . f iK-1)
i. = argmin {fi', . . . , N K - l 1 N 1 = inf fi'(/c)
k > l
Note here that plJ is a function of the difference z = x' - X J . Therefore, we minimize p lJ (x ) with respect to z. The minimum is obtained for
z* = ( H T H ) - l H T ( Y J - Yl)
P&*) = - ( y J - K ) ~ P ( T , - rz), and is given by
1 2 0 2
(33)
where P = I - H ( H T H ) - l H T is the projection matrix, rank P = T - s. Finally, we have the following formula of LR (13) for hypotheses (32) under the least favorable value z* of the nuisance parameter
(31) 1 2
- -(& - QJTC-l(el - QJ) .
In the following subsections, we show how the other linear stochastic models with additive changes can be reduced to model (27).
2) Regression Models: In this case we consider regression model (29). The characteristic feature of this detection/isolation problem with respect to the above basic problem is the fact that the vector X is unknown. This type of statistical problem is usually called a hypotheses testing problem with nuisance parameters. A tutorial introduction to these problems can be found in [3, pp. 141-145, 270-273]. Because r > s and the matrix H has rank s, we can use the redundancy to solve the detection/isolation problem.
3) Minimax Algorithm: Let us define the following hypotheses testing problem:
$$\mathcal{H}_l = \{P_\theta(Y):\ \theta = HX^l + \Upsilon_l,\ X^l \in \mathbb{R}^s\}$$
and
$$\mathcal{H}_j = \{P_\theta(Y):\ \theta = HX^j + \Upsilon_j,\ X^j \in \mathbb{R}^s\} \qquad (32)$$
where $\Upsilon_l$, $\Upsilon_j$ are the informative parameters, and $X^l$, $X^j$ are the nuisance parameters. We are interested in detecting a change from $\Upsilon_j$ to $\Upsilon_l$ while considering $X$ as an unknown parameter of model (29); but since the expectation of the distribution $P_\theta$ is a function of this unknown parameter, the design of the test is a nontrivial problem.
From Theorem 2 it results that lower bound (25) in the class $\mathcal{K}_\gamma$ is a monotone decreasing function of the minimum value of the Kullback-Leibler information $\rho^*$. Therefore, the design of the minimax algorithm consists of finding a pair of least favorable values $X^l$ and $X^j$ for which the Kullback-Leibler information $\rho_{lj}$ is minimum, and of computing the LR of the optimal algorithm for these values.
The expectation $\theta$ of the output $Y$ of model (29) is
$$\theta_j = E(Y) = HX^j + \Upsilon_j,\qquad j = 0, \ldots, K-1$$
where $\Upsilon_0 = 0$. Then the Kullback-Leibler information $\rho_{lj}$ is
$$\rho_{lj} = \frac{1}{2\sigma^2}(\theta_l - \theta_j)^T(\theta_l - \theta_j).$$
Note that this LR is independent of the unknown values $X^l$ and $X^j$. Therefore, let us define the following minimax algorithm:
$$\bar N^l(k) = \inf\Big\{n \ge k : \min_{0\le j\ne l\le K-1} S_k^n(l,j) \ge h\Big\} \qquad (35)$$
4) Discussion: Let us add two comments about (34). First, it is easy to show that the minimax approach is equivalent to the GLR, which is based on the maximization of the likelihood function with respect to the unknown nuisance parameters (see also [3, p. 144]). In other words
$$\rho_{lj}(z^*) = \frac{1}{2\sigma^2}(\Upsilon_l - \Upsilon_j)^T P\,(\Upsilon_l - \Upsilon_j).$$
Second, it is worth noting that (34) can be rewritten as
$$S_k^n(l,j) = \frac{1}{\sigma^2}\sum_{i=k}^{n}(\tilde\Upsilon_l - \tilde\Upsilon_j)^T(e_i - \tilde\Upsilon_j) - \frac{n-k+1}{2\sigma^2}(\tilde\Upsilon_l - \tilde\Upsilon_j)^T(\tilde\Upsilon_l - \tilde\Upsilon_j)$$
where $\tilde\Upsilon_l = T^T\Upsilon_l$, $\tilde\Upsilon_j = T^T\Upsilon_j$, $e_i = T^T Y_i$, $T = (t_1, \ldots, t_{r-s})$ is a matrix of size $r \times (r-s)$, and $t_1, \ldots, t_{r-s}$ are the eigenvectors of the projection matrix $P$. Therefore, LR (34) is a function of the parity vector $e_i$ of the analytical redundancy approach [8]. This parity vector $e_i$ is the transformation of the measurements $Y_i$ into a set of $r - s$ linearly independent variables by projection onto the left null space of the matrix $H$.
The parity vector sequence $(e_n)_{n\ge 1}$ can be modeled as
Consequently, to solve the detection/isolation problem in the case of regression model (29), we have to transform the observations $Y_n$ into the parity vector $e_n$ and then solve the corresponding basic problem (27).
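The parity-space reduction above can be sketched numerically (the dimensions and the random $H$ below are toy assumptions; `T` collects the eigenvalue-one eigenvectors of $P$):

```python
import numpy as np

# Parity-vector transformation for a regression model Y = H X + noise:
# the columns of T span the left null space of H, so e = T^T Y
# is insensitive to the unknown nuisance vector X.
rng = np.random.default_rng(0)
r, s = 5, 3
H = rng.standard_normal((r, s))                 # toy full-column-rank H
P = np.eye(r) - H @ np.linalg.pinv(H)           # projection matrix, rank r - s
w, V = np.linalg.eigh(P)                        # eigenvalues: 0 (s times), 1 (r - s times)
T = V[:, np.isclose(w, 1.0)]                    # r x (r - s) parity matrix
X = rng.standard_normal(s)
Y = H @ X + 0.1 * rng.standard_normal(r)
e = T.T @ Y                                     # parity vector, free of X
```

Since $T^T H = 0$, shifting $X$ leaves $e$ unchanged, which is exactly the redundancy exploited in the text.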
5) Example 1 (Radionavigation System Integrity Monitoring): Navigation systems are standard equipment for planes, boats, rockets, and other moving vehicles. On-line integrity monitoring (fault detection/isolation) is one of the main problems in the design of modern navigation systems (see examples and references in [6], [21], [13], [14], [3, pp. 454-463]).
For instance, let us consider integrity monitoring of the global positioning satellite set. Simplified measurement models of this type of radionavigation system can be described by (29). The problem is to detect and isolate a satellite clock fault, which can be represented as the additional bias $\Upsilon_l$ in model (29). Conventional global navigation sets require measurements from four satellites to estimate three spatial orthogonal coordinates and a clock bias, or three orthogonal velocities and a clock bias rate ($X \in \mathbb{R}^4$). Because for 18-satellite global navigation sets five or more satellites ($r \ge 5$) are visible 99.3% of the time, it is possible to provide integrity monitoring by using these redundant measurements [21].
Let $(Y_n)_{n\ge 1}$ be the output of model (29). Let us assume that $\Upsilon_l = (0, \ldots, 0, \delta_l, 0, \ldots, 0)^T$: the satellite number $l$ clock fault is represented by the bias $\delta_l$.
In this case the statistics $S_k^n(l,j)$ are defined by the following formulas:
where $\varepsilon_{i,l}$ is a component of the LS residuals
$$\varepsilon_i = P\,Y_i$$
and $p_{lj}$ is an element of the matrix $P$.
6) Stochastic Dynamical Models: In this case we consider dynamical model (30). It is obvious that the output $Y_n$ of this model is expressible as the sum of the output of the deterministic part $B(z^{-1})U_n$ of model (30) and the noise process with abrupt changes $C(z^{-1})[V_n + \Upsilon_l(n, k+1)]$. Hence
$$\bar Y_n = A(z^{-1})Y_n - B(z^{-1})U_n = C(z^{-1})[V_n + \Upsilon_l(n, k+1)]. \qquad (37)$$
Let us consider two Gaussian vectors $\bar Y_n$ and $\bar X_n$:
$$\bar Y_n = C(z^{-1})\bar X_n,\qquad \bar X_n = V_n + \Upsilon_l(n, k+1),\qquad n = 1, 2, \ldots,\quad \bar X_{n\le 0} = 0.$$
It is easy to show that the transformation from the $\bar X$ space to the $\bar Y$ space is a diffeomorphism (one-to-one transformation). Denote by $J$ its Jacobian matrix. From the transformation theorem (see [23, pp. 53-59]) it results that
$$p(\bar Y_1, \ldots, \bar Y_n) = |\det J|\,p(\bar X_1, \ldots, \bar X_n).$$
Therefore
The result from (38) is that this detection/isolation problem is reduced to the above basic problem.
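The whitening step can be illustrated on a toy scalar ARMA model (the coefficients below are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def innovations(y, a, b, c, u):
    """Invert the scalar ARMA model A(z^-1) y_n = B(z^-1) u_n + C(z^-1) v_n
    to recover the innovations v_n; a, b, c hold the coefficients of the
    polynomials A, B, C in z^-1, with a[0] = c[0] = 1 (a sketch of the
    whitening filter, scalar case only)."""
    v = np.zeros(len(y))
    for t in range(len(y)):
        acc = sum(a[i] * y[t - i] for i in range(len(a)) if t - i >= 0)
        acc -= sum(b[i] * u[t - i] for i in range(len(b)) if t - i >= 0)
        # subtract the tail of C acting on past innovations
        acc -= sum(c[i] * v[t - i] for i in range(1, len(c)) if t - i >= 0)
        v[t] = acc / c[0]
    return v
```

Running the model forward and then applying this filter recovers the driving noise exactly, so any additive change $\Upsilon_l$ reappears on the innovations, reducing the problem to the basic one.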
C. Statistical Properties of the Algorithms
In this subsection, we investigate the statistical properties of the change detection/isolation algorithms. The goal of this subsection is to give some interpretations of the general results of Sections III and IV.
1) Basic Model: Since model (27) is a special case of basic model (2), Theorems 1 and 2 are valid in the case of algorithm (31). Note that $\rho^*$ is given by
2) Linear Stochastic Models: From the above paragraphs it follows that the change detection/isolation problems in the case of linear stochastic models (29)-(30) are reduced to the basic detection/isolation problem. It is obvious that Theorem 1 is valid for these models. Moreover, algorithm (31) is asymptotically optimal in the sense of Theorem 2. In the case of stochastic dynamical model (30), the proof of this fact is trivial, for it is sufficient to remember that the transformation from the $\bar X$ space to the $\bar Y$ space is a diffeomorphism. In the case of regression model (29), we have to define the meaning of optimality.
We consider the following family of distributions: $\mathcal{P} = \{P_{\theta_i, X^i},\ i = 0, \ldots, K-1\}$, where $\theta_i$ are the informative parameters and $X^i$ are the nuisance parameters. Suppose there exists a class $\mathcal{K}_\gamma$ of all sequential detection/isolation algorithms $(N, \nu)$ over this family of distributions.
Definition 2: Let us define the minimax lower bound $\tilde\tau(\gamma)$ as follows:
where
$$0 < \rho^* = \min_{1\le l\le K-1}\ \min_{0\le j\ne l\le K-1}\ \inf_{X^l, X^j}\ \rho_{lj} < \infty.$$
We say that the detection/isolation algorithm $(N, \nu)$ is asymptotically minimax if the following condition holds:
Theorem 3: Let us consider regression model (29). We assume that the following inequality is true:
Let $(N, \nu)$ be detection/isolation algorithm (35). Then
where
$$\rho^* = \min_{1\le l\le K-1}\ \min_{0\le j\ne l\le K-1}\ \rho_{lj}.$$
Proof of Theorem 3: As we have mentioned above, LR (34) is a function of the parity vector $e$, which is a Gaussian $(r-s)$-dimensional random variable. From (33) it follows that
Finally, (40) follows at once from Theorem 1.
Corollary 4: Detection/isolation algorithm (35) is asymptotically minimax.
3) Discussion: Let us add a remark about the problem of
detectability and isolability of changes in regression model (29). We emphasize that this problem is nontrivial in the case of the regression model.
Suppose that the vectors $\Upsilon_j \in \mathbb{R}^r$, $j = 0, \ldots, K-1$, are chosen arbitrarily such that $\|\Upsilon_i - \Upsilon_j\|_2 \ge \epsilon > 0$, $0 \le i \ne j \le K-1$, where
$$\|\Upsilon\|_2 = (\Upsilon^T\Upsilon)^{1/2}.$$
From matrix theory it results immediately that
where $\Upsilon = \Upsilon_i - \Upsilon_j$. For this reason inequality (39) is not valid for arbitrary vectors $\Upsilon_i$ and $\Upsilon_j$. Roughly speaking, it is impossible to detect and isolate arbitrary changes in model (29): the fact that the norm is strictly positive is not a sufficient condition in this case, and some of these changes will be indistinguishable from the statistical point of view. In order to simplify the problem, let us assume a priori the following: i) all the vectors $\Upsilon_i - \Upsilon_j$ have $\ell \le \operatorname{rank} P = r - s$ nonzero components only; ii) all the principal minors with order from 1 to $\ell$ of the matrix $P$ are strictly positive. Then the problem is much simpler. Namely, under these constraints inequality (39) holds true for arbitrary vectors $\Upsilon_i$ and $\Upsilon_j$. If these constraints do not apply, then we should check inequality (39) a priori.
4) Example 2 (Radionavigation System Integrity Monitoring, Continued): Let us pursue our discussion of Example 1. Assume that only one satellite clock can fail at a time. Consider the following problem: how many visible satellites are necessary to detect and isolate this fault? Because $\Upsilon_0 = 0$, it is easy to see that the minimal number $r$ of visible satellites for detecting this fault is equal to five (redundancy $= r - s = r - 4 = 1$). If we wish to detect
and isolate this fault, then the maximal number $\ell$ of nonzero components of the vectors $\Upsilon_i - \Upsilon_j$ is equal to two and, consequently, it is necessary to have six or more visible satellites ($\ell = 2 \le r - s = r - 4$)!
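The detectability caveat can be checked numerically: a change direction $d = \Upsilon_i - \Upsilon_j$ is statistically invisible exactly when $Pd = 0$, i.e. when $d$ lies in the column space of $H$ (the matrix below is a toy example, not the navigation model):

```python
import numpy as np

H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])                       # r = 3, s = 2, rank P = 1
P = np.eye(3) - H @ np.linalg.pinv(H)

def detectable(d, tol=1e-10):
    # d^T P d > 0 iff the change direction has a parity-space component
    return float(d @ P @ d) > tol

print(detectable(np.array([1., 1., 2.])))      # lies in col(H) -> False
print(detectable(np.array([1., 0., 0.])))      # not in col(H) -> True
```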
D. Stability Against Changes in Design Parameters
The goal of this paragraph is to investigate the stability of the above change detection/isolation algorithms with respect to design parameters. From a practical point of view, it is important to have a detection algorithm which holds its performance stably against changes in design parameters. Let us consider the two following aspects of this problem: the unknown dynamic profile $\theta_l(n-k)$ of the change magnitude, and the unknown covariance matrix $\Sigma$ of the observations after the change time $k+1$.
Let us consider change detection/isolation algorithm (31). From the above paragraphs it follows that this algorithm is asymptotically optimal in the case of Gaussian basic model (27). Suppose now that observations $Y_{k+1}, Y_{k+2}, \ldots$ are generated by another Gaussian distribution. In other words, the model is
$$\mathcal{L}(Y_n) = \begin{cases} N(\theta_0, \Sigma), & \text{if } n \le k\\ N(\theta_l(n-k), \tilde\Sigma), & \text{if } n \ge k+1 \end{cases}\qquad k = 0, 1, 2, \ldots \qquad (41)$$
where $\theta_0 = 0$. The profile $\theta_l(n-k)$ and the covariance matrix $0 < \tilde\Sigma < \infty$ are unknown a priori. We assume that the following condition holds:
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\theta_l(i) = \theta_l^*,\qquad l = 1, \ldots, K-1. \qquad (42)$$
Let $\tilde E_{k+1}^l(\cdot)$ denote the expectation under $N(\theta_l(n-k), \tilde\Sigma)$ and $\tilde E_l(\cdot) = \tilde E_1^l(\cdot)$.
Let us define the following worst mean detection/isolation delay:
$$\tilde\tau^* = \sup_{k\ge 0,\ 1\le l\le K-1}\ \operatorname{ess\,sup}\ \tilde E_{k+1}^l(N - k \mid N > k, Y_1, \ldots, Y_k).$$
The goal of this subsection is to show that $\tilde\tau^*$ is expressible by the same asymptotic formula as in the case of true model (27) (see (19) in Corollary 1).
Theorem 4: Let us consider model (41)-(42). We assume that the following condition holds:
$$\sup_{k\ge 0,\ 1\le l\le K-1}$$
Let $(\tilde N, \tilde\nu)$ be detection/isolation algorithm (31). Then
Proof of Theorem 4: See Appendix III.
Corollary: Let us assume that $\theta_l^* = \theta_l$, $l = 1, \ldots, K-1$. Then
$$\tilde\tau^*(h) \approx \tau^*(h)\qquad \text{as } h \to \infty.$$
VI. DISCUSSION
A new statistical approach to the change diagnosis problem is proposed. This approach consists of jointly detecting and isolating abrupt changes in a stochastic system.
Our main results are the following:
1) We introduced a minimax criterion of optimality (6)-(7) for this detection/isolation problem.
2) A new statistical change detection/isolation algorithm has been designed. This algorithm is expressible by (10)-(12).
3) We investigated the statistical properties of this algorithm. The result is stated in Theorem 1. We proved a lower bound for the worst mean detection/isolation delay in a certain class of sequential change detection/isolation algorithms. This result is given by Theorem 2. From Theorems 1 and 2 it follows that the proposed algorithm is asymptotically optimal in this class.
4) As we demonstrated in Section V, the general results can be applied to some classical linear stochastic models with additive abrupt changes. The nontrivial problem of detectability and isolability, which arises in the case of the regression model with redundancy (29), has been addressed.
5) It has been proven that the detection/isolation algorithm is stable against changes in design parameters. In particular, the algorithm is stable with respect to the unknown dynamic profile of the change magnitude if this profile converges to a known constant.
Let us add a concluding remark. It is obvious that the proposed scheme (10)-(12) is not a recursive algorithm. Hence, the problem of interest is to find another appropriate recursive computational scheme in order to reduce the amount of numerical operations which should be performed for every new observation, without losing optimality.
APPENDIX I PROOF OF LEMMA 3
Let us prove the first part of inequality (23). The following inequality appears in [20, Theorem 3.1] as a generalized Wald lower bound for the ASN:
$$E_i(N)\,\rho_{ij} \ge \sum_{l=0}^{K-1}\alpha_{il}\ln\frac{\alpha_{il}}{\alpha_{jl}}\qquad\text{for } i = 1, \ldots, K-1$$
where the index $j$ is arbitrary except that $j \ne i$. Let us assume that $j = 0$ and $1 \le i \le K-1$. In accordance with notations (22), (23), and Lemma 6 [20],³ we can write

³Lemma 6 [Simons]: Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be two sequences of positive real numbers. Let $a = \sum_{l=1}^{n} a_l$ and $b = \sum_{l=1}^{n} b_l$. Then
$$\sum_{l=1}^{n} a_l \ln\frac{a_l}{b_l} \ge a \ln\frac{a}{b}.$$
$$\sum_{l=1,\,l\ne i}^{K-1}(\cdots) \qquad (44)$$
Finally, the following inequality follows from the fact that
is nonnegative and the minimum value of $\gamma_i\ln\gamma_i + (1-\gamma_i)\ln(1-\gamma_i)$ is equal to $-\ln 2$
for $i = 1, \ldots, K-1$. \qquad (45)
Let us prove the second part of inequality (23). Let $1 \le i \le K-1$ and $1 \le j \ne i \le K-1$. Again, by the same arguments, we can write
where
$$r_i = \gamma_i + \sum_{l=1,\,l\ne i}^{K-1}\beta_{il}$$
Therefore, we have the following formula :
for i = 1, . . . , K - 1. (47)
APPENDIX II
PROOF OF THEOREM 2
The proof consists of two parts. The first part includes the derivation of the asymptotic relation between the worst mean detection/isolation delay and the mean time before false alarms. The second part includes the derivation of an analogous result for false isolations.
Note that the scheme of our proof is as in Lorden's Theorem 3 (see details in [10]). The novelty is in the extension of Lorden's results to the case of $K > 2$ hypotheses.
A. The First Part:
For this reason the following inequality holds:
$$P_{k+1}^l(T_i < \infty \mid T_{i-1} = k < N) \le \epsilon_1.$$
We denote by $D_{ik} = \{T_{i-1} = k < N\}$ all subsets for which $P_0(D_{ik}) > 0$ and, hence, also $P_{k+1}^l(D_{ik}) > 0$. Let us define the following sequential test $(N^*, d^*)$ on the subset $D_{ik}$ by using the stopping variables $T_i$, $N$, and the final decision $\nu$:
$$N^* = \min\{N, T_i\},\qquad d^* = \begin{cases}\nu, & \text{if } N \le T_i\\ 0, & \text{if } N > T_i.\end{cases} \qquad (49)$$
It is sufficient to show that for every $\epsilon_1 \in (0,1)$ there exists $|C_1(\epsilon_1, K)| < \infty$, $l = 1, \ldots, K-1$, such that for all $(N, \nu) \in \mathcal{K}_\gamma$ the following inequality is true:
In other words, at time $N^*$ one of the hypotheses $\mathcal{H}_{d^*}$, $0 \le d^* \le K-1$, is accepted (see the bottom of the page):
Let us consider the statistical properties of this sequential test. The conditional expectation of the number of observations taken for the test is $E_{k+1}^l(N^* - 1 \mid D_{ik})$. It is sufficient to show that
$$\tau_l^* \ge \frac{(1-\epsilon_1)\ln\gamma + C_1(\epsilon_1, K)}{\rho_{l0}},\qquad l = 1, \ldots, K-1. \qquad (48)$$
As in [10] let us introduce the following "additional" stopping variables:
$$T_0 = 0 < T_1 < T_2 < \cdots.$$
It results from the definition of the worst mean detection/isolation delay that
$$P_{k+1}^l(T_i < \infty \mid T_{i-1} = k < N) \le \epsilon_1$$
provided that
$$P_0(T_{i-1} = k < N) > 0.$$
So, we have the following lower bound for the ASN of test (49) on the subset $D_{ik}$ (see the inequality at the bottom of the page).
Moreover, it is obvious that
$$\{T_i < \infty \mid T_{i-1} = k < N\}$$
Hence, we have
$$\tau_l^* \ge \frac{(1-\epsilon_1)\Big|\ln\sum_{l=1}^{K-1} P_0(N \le T_i \cap \nu = l \mid D_{ik})\Big| - \ln 2}{\rho_{l0}}. \qquad (50)$$
Second, (51) was obtained by Lorden [10, see eqs. (19)-(21)] under a constraint on the following minimum:
$$q = \inf_{i\ge 1}\{P_0(Z < i+1 \mid Z \ge i)\}.$$
It can be shown [10] that $P_0(N \le T_i \cap \nu = l \mid T_{i-1} < N)$ is an average over $k$ of the probabilities $P_0(N \le T_i \cap \nu = l \mid T_{i-1} = k < N)$ satisfying (50).
$$P_0(N \le T_i \cap \nu = l \mid T_{i-1} < N) = E_0\big[P_0(N \le T_i \cap \nu = l \mid T_{i-1} = k < N) \mid T_{i-1} < N\big]$$
$$= \sum_{k=0}^{\infty} P_0(T_{i-1} = k)\,P_0(N \le T_i \cap \nu = l \mid T_{i-1} = k < N).$$
Moreover, from the Jensen inequality for convex functions it follows that for all values $i$ with $P_0(T_{i-1} < N) > 0$ we get (see the bottom of the page)
It follows from the definition of the sequential test ( N * . d * ) that
$$P_0(N \le T_i \mid T_{i-1} < N) = \sum_{l=1}^{K-1} P_0(N \le T_i \cap \nu = l \mid T_{i-1} < N).$$
Hence, we can utilize here Lorden's Theorem 3 [10], which provides us with the following lower bound:
$$\tau_l^* \ge \frac{(1-\epsilon_1)\ln E_0(N) - (1-\epsilon_1)\ln E_0(T_1) - \ln 2}{\rho_{l0}},\qquad l = 1, \ldots, K-1. \qquad (51)$$
Let us assume that the elements of the sum
$$\sum_{l=1}^{K-1} P_0(N \le T_i \cap \nu = l \mid T_{i-1} < N)$$
are equally chosen. Combining (52) and (53), we get the probability of a false alarm of an $l$-type
(54)
Let us consider now the following sequence of stopping variables (here false alarms):
$$N_0 = 0 < N_1 < N_2 < \cdots < N_r < \cdots$$
where $N_r$ denotes the stopping variable obtained by applying $N$ to $X_{N_{r-1}+1}, X_{N_{r-1}+2}, \ldots$, and final decisions $\nu_1, \nu_2, \ldots, \nu_r, \ldots$. Since $\nu_1, \nu_2, \ldots$ are i.i.d. random variables we have immediately that
Moreover, $N_1, N_2 - N_1, N_3 - N_2, \ldots$ are i.i.d.⁴ as well, and $E_0(\inf\{r \ge 1 : \nu_r = l\}) < \infty$. Hence, Wald's identity [22, pp. 52-54; App. A.3]⁵
$$E_0\big(\inf\{N_r : \nu_r = l\}\big) = E_0\big(\inf\{r \ge 1 : \nu_r = l\}\big)\,E_0(N) \qquad (56)$$
To finish the first part of the theorem we have to compute the mean time before the first false alarm of an $l$-type
$$E_0\big(\min\{N_r : \nu_r = l\}\big),\qquad l = 1, \ldots, K-1$$
in the sequence $N_1 < N_2 < \cdots < N_r < \cdots$ of false alarms. Let us assume that $X_1, X_2, \ldots \sim P_0$. We consider the sequence $T_1, T_2, \ldots$ of the stopping variables before the first $N \le T_i$ and denote by $Z = \inf\{i \ge 1 : N \le T_i\}$ its number. The final decision here is $\nu \ge 1$. The probability of a false alarm of an $l$-type is
$$P_0^{(l)} = \sum_{i=1}^{\infty} P_0(Z = i \cap \nu = l),\qquad l = 1, \ldots, K-1. \qquad (52)$$
If $P_0(Z \ge i) > 0$ then $P_0(Z < i+1 \cap \nu = l \mid Z \ge i)$ is well defined and we get the formula
$$P_0(Z = i \cap \nu = l) = P_0(Z < i+1 \cap \nu = l \mid Z \ge i)\,P_0(Z \ge i). \qquad (53)$$
Let us consider (52) and (53). First,
$$P_0(Z < i+1 \cap \nu = l \mid Z \ge i) = P_0(N \le T_i \cap \nu = l \mid T_{i-1} < N).$$
holds and combining (54)-(56) we get
$$E_0\big(\inf_{r\ge 1}\{N_r : \nu_r = l\}\big) = (K-1)\,E_0(N) \qquad (57)$$
Finally, from (51) and (57) we get the lower bound
$$\tau_l^* \ge \frac{(1-\epsilon_1)\ln E_0\big(\inf_{r\ge 1}\{N_r : \nu_r = l\}\big) + C_1(\epsilon_1, K)}{\rho_{l0}},\qquad l = 1, \ldots, K-1 \qquad (58)$$
where
$$C_1(\epsilon_1, K) = -(1-\epsilon_1)\big[\ln E_0(T_1) + \ln(K-1)\big] - \ln 2.$$
The proof of the first part of the theorem is complete.
⁴Here the "increments" $N_r - N_{r-1}$ are distributed as $N$.
⁵Theorem [Wald]: Assume that $\xi_1, \xi_2, \ldots$ are i.i.d. random variables and $E\big(\sum_{i=1}^{\tau}|\xi_i|\big) < \infty$. For any integrable stopping variable $\tau$ we have
$$E(S_\tau) = E(\tau)E(\xi),\qquad S_\tau = \sum_{i=1}^{\tau}\xi_i.$$
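Wald's identity quoted in the footnote can be sanity-checked by a toy Monte Carlo simulation (an illustration only, not part of the proof; the Bernoulli stopping rule below is a hypothetical choice of integrable stopping time):

```python
import random

def wald_check(p=0.3, mu=2.0, n_rep=20000, seed=1):
    """Estimate E(S_tau) and E(tau)E(xi) for xi_i i.i.d. N(mu, 1) and
    tau = first trial whose independent Bernoulli(p) coin shows heads,
    an integrable stopping time with E(tau) = 1/p."""
    rng = random.Random(seed)
    s_total, tau_total = 0.0, 0
    for _ in range(n_rep):
        s, tau = 0.0, 0
        while True:
            tau += 1
            s += rng.gauss(mu, 1.0)     # xi_tau
            if rng.random() < p:        # stop, independently of future xi
                break
        s_total += s
        tau_total += tau
    # Wald: the two estimates should agree (both near mu / p)
    return s_total / n_rep, (tau_total / n_rep) * mu
```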
B. The Second Part
It will suffice to show that for all $(N, \nu) \in \mathcal{K}_\gamma$ the following inequality is true:
where $C_2(K) = -(K-2)e^{-1} - \ln 2$ and
Let us assume that $X_1, X_2, \ldots \sim P_l$, $l = 1, \ldots, K-1$. Now we do not need to introduce the "artificial" additional stopping variable $T_i$, and we consider the following sequence of stopping variables:
$$N_0 = 0 < N_1 < N_2 < \cdots < N_r < \cdots$$
where $N_r$ denotes the stopping variable obtained by applying $N$ to $X_{N_{r-1}+1}, X_{N_{r-1}+2}, \ldots$. Let us define the sequential test $(N_1, \nu_1)$ which chooses one of the $K-1$ hypotheses $\mathcal{H}_1, \ldots, \mathcal{H}_{K-1}$.
Let us consider the statistical properties of this sequential test. From the definition of the worst mean detection/isolation delay it results immediately that
$$\tau^* = \sup_{k\ge 0}\ \operatorname{ess\,sup}\ E_{k+1}^l(N - k \mid N > k, X_1, X_2, \ldots) \ge E_1^l(N - 0 \mid D_{l0}) = E_l(N).$$
In order to apply lower bound (23) for the ASN in this case, we have to assume that in Lemma 3 $\gamma_l = 0$, $l = 1, \ldots, K-1$. The convention which interprets $0\ln 0$ as zero [20] leads to the following lower bound for the ASN:
for $l = 1, \ldots, K-1$, where
$$r_l = \sum_{j=1,\,j\ne l}^{K-1}\beta_{lj}$$
and
$$\beta_{jl} = P_j(\text{accepting } \mathcal{H}_l,\ l \ne j) = P_j(\nu_1 = l),$$
holds and we have
Inserting (62) in (61), we get the following inequality:
(63)
Note here that $\min_{x>0}(x\ln x) = -e^{-1}$. Since inequality (63) holds for all values $l = 1, \ldots, K-1$, we finally get
$$\tau_l^* \ge \max_{1\le j\ne l\le K-1}\ \frac{\big(1-(K-2)\beta\big)\ln\frac{1}{\beta} - \ln 2}{\rho_{lj}}$$
$$\tau^* \ge \frac{\ln\gamma^* - \ln\tau^* - (K-2)e^{-1} - \ln 2}{\rho}$$
or
$$\tau^*(1 + o(1)) \ge \frac{\ln\gamma^* + C_2(K)}{\rho}\qquad \text{as } \gamma^* \to \infty \qquad (64)$$
and the proof of the second part of the theorem is complete.
The "first-order" lower bound (25) follows from (58), (64), and the fact that
$$\rho^* = \min\Big\{\min_{1\le l\le K-1}(\rho_{l0}),\ \rho\Big\}.$$
APPENDIX III
PROOF OF THEOREM 4
The first part of (43) follows from Lemma 2 and Corollary 1. To prove the second part of formula (43), it is sufficient to show that (see the bottom of the page)
as $h \to \infty$. \qquad (65)
It results from Berk's Theorem 3.1 (see [5]) that the mean delay for detection satisfies
provided that
$j, l = 1, \ldots, K-1$, and also, for some $\delta \in (0, \rho)$, the "large deviation" probability
$$p_n = P_l\Big\{\min_{0\le j\ne l\le K-1} S_1^n(l,j) < \epsilon\Big\}$$
satisfies
$$\lim_{n\to\infty} n\,p_n = 0 \qquad (68)$$
Since $\nu_1, \nu_2, \ldots$ are i.i.d. random variables we have immediately that
$$E_l\big(\inf\{r \ge 1 : \nu_r = j\}\big) = \frac{1}{\beta_{lj}}.$$
Moreover, $N_1, N_2 - N_1, N_3 - N_2, \ldots$ are i.i.d. as well, and $E_l(\inf\{r \ge 1 : \nu_r = j\}) < \infty$. Hence, Wald's identity gives
$$E_l\big(\min\{N_r : \nu_r = j\}\big) = E_l(N)\,E_l\big(\inf\{r \ge 1 : \nu_r = j\}\big).$$
Let us show that (67) is true. Let us consider the Gaussian random vector $Z_n \in \mathbb{R}^{K-1}$ such
The left-hand side of (67) can be rewritten as follows. It is known that the family $X \sim N(0, I)$ remains invariant⁶ under the transformation $gX = RX$, where $\Sigma_z = RR^T$. Therefore
$$f_n = \min_{0\le j\ne l\le K-1}\{\bar z_n(l,j) + \bar d_n(l,j)\},\qquad \Phi_{0,I}(A) = \Phi_{g(0,I)}(gA)$$
where the law of $\bar z_n(l,j)$ is zero-mean Gaussian and
$$-\frac{1}{2}(\theta_l - \theta_j)^T\Sigma^{-1}(\theta_l - \theta_j).$$
It results from the strong law of large numbers that $\bar z_n(l,j) \to 0$ a.s. Hence, the continuity theorem [4, ch. 1, para. 5] and the fact that
$$d(l,j) = \lim_{n\to\infty}\bar d_n(l,j) = \frac{1}{2}(\theta_l - \theta_j)^T\Sigma^{-1}(\theta_l - \theta_j)$$
lead to the following formula:
$$f_n \to \min_{0\le j\ne l\le K-1}\Big\{\frac{1}{2}(\theta_l - \theta_j)^T\Sigma^{-1}(\theta_l - \theta_j)\Big\}$$
where
$$g(0, I) = (R0,\ \Sigma_z),\qquad A = \{X : X^T X \le \lambda^2\},\qquad gA = \{Z : Z^T\Sigma_z^{-1}Z \le \lambda^2\},\quad Z = RX$$
$$\varphi_{\theta,\Sigma}(X) = (2\pi)^{-(K-1)/2}(\det\Sigma)^{-1/2}\exp\Big\{-\frac{1}{2}(X-\theta)^T\Sigma^{-1}(X-\theta)\Big\}.$$
Define the following ellipsoid:
$$Z^T\Sigma_z^{-1}Z = \lambda^2$$
where $\lambda^2 = C^2(N)\min_i \tilde\sigma_{ii}^{-1}$ and $\tilde\sigma_{ii}^{-1}$ are the diagonal elements of the matrix $\Sigma_z^{-1}$. It is easy to see that
Let us estimate the "large deviation" probability $p_n$ and prove (68)-(69). First, find the following upper bound for $p_n$:
$$p_n \le 1 - P_l\Big\{\min_{0\le j\ne l\le K-1}[\bar z_n(l,j)] \ge C(N)\Big\} \le 1 - \Phi_{0,\,n^{-1}\Sigma_z}(gA) \le 1 - \Phi_{0,\,n^{-1}I}(A)$$
and, finally
$$p_n = P_l\Big\{\min_{0\le j\ne l\le K-1} S_1^n(l,j) < \epsilon\Big\} = P_l\Big\{\min_{0\le j\ne l\le K-1}[\bar z_n(l,j) + \bar d_n(l,j)] < \epsilon\Big\}$$
$$\le P_l\Big\{\min_{0\le j\ne l\le K-1}[\bar z_n(l,j)] + \min_{0\le j\ne l\le K-1}[\bar d_n(l,j)] < \epsilon\Big\}$$
$$\bar{\bar p}_n = 1 - \big\{1 - 2\Phi(-\lambda\sqrt n)\big\}^{K-1}$$
where $\lambda > 0$. From this and the asymptotic formula
$$\Phi(-x) = \frac{1}{x\sqrt{2\pi}}\exp\Big(-\frac{x^2}{2}\Big)\Big(1 - \frac{1}{x^2} + \frac{3}{x^4} - \cdots\Big)$$
we deduce that
$$\lim_{n\to\infty} n\,\bar{\bar p}_n = 0.$$
It is obvious that for all $c \in (0,1)$ there exists $N(c)$ such that for all $n > N(c)$ the following inequality holds:
$$\Big|\min_{0\le j\ne l\le K-1}[\bar d_n(l,j)] - \min_{0\le j\ne l\le K-1}[d(l,j)]\Big| < c\min_{0\le j\ne l\le K-1}[d(l,j)].$$
Moreover, straightforward computations show that
$$C(N) = (c-1)\min_{0\le j\ne l\le K-1}[d(l,j)],\qquad \sum_{n=1}^{\infty}\bar{\bar p}_n < \infty.$$
Hence, we have the required inequality for all $n > N(c)$, and the proof of (68)-(69) is complete.
Thus we have proved that (67)-(69) hold true. From this we then have (66) and, finally, we get (65). The proof of Theorem 4 is complete.
⁶A parametric family of distributions $\mathcal{P} = \{P_\theta\}$ remains invariant under a group of transformations $\mathcal{G}$ if $\forall g \in \mathcal{G}$ and $\forall\theta$, $\exists\theta_g$ such that $P_\theta(Y \in A) = P_{\theta_g}(Y \in gA)$, where $\theta_g = g\theta$.
ACKNOWLEDGMENT

The author would like to thank the reviewers for their valuable comments on the paper. The author gratefully acknowledges reviewer C for his very helpful comments and suggestions on an earlier version of Theorem 2. The author is also grateful to A. Benveniste and M. Basseville for their many constructive comments on an early version of the paper.

REFERENCES

[1] P. Armitage, "Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis," J. Roy. Statist. Soc. B, vol. 12, pp. 137-144, 1950.
[2] R. K. Bansal and P. Papantoni-Kazakos, "An algorithm for detecting a change in a stochastic process," IEEE Trans. Inform. Theory, vol. IT-32, pp. 227-235, Mar. 1986.
[3] M. Basseville and I. Nikiforov, Detection of Abrupt Changes: Theory and Applications (Information and System Sciences Series). Englewood Cliffs, NJ: Prentice-Hall, 1993.
[4] A. A. Borovkov, Theory of Mathematical Statistics: Estimation and Hypotheses Testing. Moscow, USSR: Nauka, 1984 (in Russian).
[5] R. H. Berk, "Some asymptotic aspects of sequential analysis," Ann. Stat., vol. 1, pp. 1126-1138, 1973.
[6] T.-T. Chien and M. B. Adams, "A sequential failure detection technique and its application," IEEE Trans. Automat. Contr., vol. AC-21, pp. 750-757, Oct. 1976.
[7] R. H. Farrell, "Limit theorems for stopped random walks," Ann. Math. Stat., vol. 35, pp. 1332-1343, 1964.
[8] P. M. Frank, "Fault diagnosis in dynamic systems using analytical and knowledge based redundancy: A survey and new results," Automatica, vol. 26, no. 3, pp. 459-474, 1990.
[9] B. K. Ghosh and P. K. Sen, Eds., Handbook of Sequential Analysis. New York: Marcel Dekker, 1991.
[10] G. Lorden, "Procedures for reacting to a change in distribution," Ann. Math. Stat., vol. 42, pp. 1897-1908, 1971.
[11] G. Lorden, "Open-ended tests for Koopman-Darmois families," Ann. Stat., vol. 1, pp. 633-643, 1973.
[12] G. Moustakides, "Optimal procedures for detecting changes in distributions," Ann. Stat., vol. 14, pp. 1379-1387, 1986.
[13] I. Nikiforov, V. Varavva, and V. Kireichikov, "GNSS integrity monitoring based on statistical change detection algorithms," in Proc. TOOLDIAG '93 (Toulouse, France, Apr. 1993), vol. 2, pp. 477-48.
[14] __, "Application of statistical fault detection algorithms to navigation system monitoring," Automatica, vol. 29, no. 5, pp. 1275-1290, 1993.
[15] E. S. Page, "Continuous inspection schemes," Biometrika, vol. 41, pp. 100-115, 1954.
[16] M. Pollak, "Optimal detection of a change in distribution," Ann. Stat., vol. 13, pp. 206-227, 1985.
[17] Y. Ritov, "Decision theoretic optimality of the CUSUM procedure," Ann. Stat., vol. 18, pp. 1464-1469, 1990.
[18] A. N. Shiryaev, "The problem of the most rapid detection of a disturbance in a stationary process," Sov. Math. Dokl., no. 2, pp. 795-799, 1961.
[19] __, "On optimum methods in quickest detection problems," Theory Prob. Appl., vol. 8, pp. 22-46, 1963.
[20] G. Simons, "Lower bounds for average sample number of sequential multihypothesis tests," Ann. Math. Stat., vol. 38, pp. 1343-1364, 1967.
[21] M. A. Sturza, "Navigation system integrity monitoring using redundant measurements," Navigation, vol. 35, pp. 483-501, Winter 1988-1989.
[22] A. Wald, Sequential Analysis. New York: Wiley, 1947.
[23] S. S. Wilks, Mathematical Statistics. New York: Wiley, 1963.
[24] A. S. Willsky, "A survey of design methods for failure detection in dynamic systems," Automatica, vol. 12, pp. 601-611, 1976.