Systems & Control Letters 60 (2011) 486–491

Input–output properties of the Page–Hinkley detector

László Gerencsér a,∗, Cecilia Prosdocimi b
a MTA SZTAKI, Hungarian Academy of Sciences, Budapest, Hungary
b Department of Economics and Business, LUISS Guido Carli, Rome, Italy

Article info

Article history:
Received 28 April 2010
Received in revised form 9 January 2011
Accepted 4 April 2011
Available online 8 May 2011

Keywords:
Page–Hinkley detector
L-mixing
Exponential inequalities
False alarm rate

Abstract

We consider the stochastic input–output properties of a simple non-linear dynamical system, the so-called Page–Hinkley detector, playing a key role in change detection, and also in queuing theory. We show that for L-mixing inputs with negative expectation the output process of this system is L-mixing. The result is applied to get an upper bound for the false alarm rate. The proof is then adapted to get a similar result for the case of random i.i.d. inputs. Possible extensions and open problems are given in the discussion.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Detection of changes of statistical patterns is a fundamental problem in many applications; for a survey see [1,2]. A basic method for detecting temporal changes is the Cumulative Sum (CUSUM) test or Page–Hinkley detector, introduced by Page [3] and analyzed later, among others, by Hinkley [4] and Lorden [5]. The CUSUM test or Page–Hinkley detector is defined via a sequence of random variables (r.v.-s) (Xn), often called residuals in the engineering literature, such as likelihood ratios, such that

E(Xn) < 0 for n ≤ τ∗ − 1, and E(Xn) > 0 for n ≥ τ∗,

with τ∗ denoting the change point. To give an example, in the case of i.i.d. samples with densities f(x, θ0) and f(x, θ1) before and after the change point, we would set

Xn = − log f(xn, θ0) + log f(xn, θ1)

where xn is the nth sample. Letting S0 := 0 and Sn := ∑_{k=1}^{n} Xk, the CUSUM statistic or Page–Hinkley detector is defined for n ≥ 0 as

gn := Sn − min_{0≤k≤n} Sk = max_{0≤k≤n} (Sn − Sk). (1)

This work was supported by the CNR–MTA Cooperation Agreement, and by the University of Padova.
∗ Corresponding author. Tel.: +36 1 279 6138; fax: +36 1 4667 503.

E-mail addresses: [email protected], [email protected] (L. Gerencsér).

0167-6911/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.sysconle.2011.04.004

An alarm is given if gn exceeds a pre-fixed threshold δ > 0. The moment of alarm is defined by

τ = τ(δ) = inf{ n | Sn − min_{0≤k≤n} Sk > δ }. (2)

The Page–Hinkley detector was first used for independent observations, but its range of applicability has been extended for dependent sequences. The applicability of the Page–Hinkley detector to ARMA systems with unknown dynamics before and after the change has been demonstrated in [6], using heuristic arguments and simulations, and later adapted to Hidden Markov Models (HMM-s) in [7]. The Page–Hinkley detector for HMM-s, with known dynamics before and after the change, was also considered in [8], but no detailed analysis of the proposed algorithm was given. For a special class of dependent sequences the Page–Hinkley detector was used in [9], again without a theoretical analysis. Change detection for general dependent sequences was first rigorously studied in [10] under the very weak condition that

lim_{N→∞} (1/N) ∑_{n=τ}^{τ+N} Xn = I > 0,

where the convergence is meant in probability, and where Xn is the conditional loglikelihood ratio (as in Eq. (9) in Section 3 below). A deep theoretical analysis of the expected delay with given Average Run Length for HMM-s is provided in [11,12].

The Page–Hinkley detector (gn) can be equivalently defined via a non-linear dynamical system, with a+ = max{0, a}, as follows:

gn = (gn−1 + Xn)+ with g0 = 0. (3)
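The recursion (3) combined with the alarm rule (2) is a one-line update per sample; the following minimal Python sketch (our own illustration; the function and variable names are not from the paper) makes this concrete:

```python
def page_hinkley(residuals, delta):
    """Run the Page-Hinkley detector g_n = (g_{n-1} + X_n)^+ over a
    sequence of residuals X_1, X_2, ...; return the first (1-based)
    time n at which g_n > delta, or None if no alarm is raised."""
    g = 0.0
    for n, x in enumerate(residuals, start=1):
        g = max(0.0, g + x)   # eq. (3): one step of the one-sided random walk
        if g > delta:
            return n          # eq. (2): the moment of alarm
    return None
```

For residuals with negative mean the statistic keeps resetting to 0; once the mean of the residuals turns positive, gn grows linearly and eventually crosses any fixed threshold δ.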


From a system-theoretic point of view this system is not stable in any sense. E.g., for a constant positive input, (gn) becomes unbounded, and the effect of initial perturbations may not vanish. On the other hand, for an i.i.d. input sequence (Xn), with E(Xn) < 0, some stability of the output process (gn) can be expected. The resulting non-linear stochastic system is a standard object in queuing theory (see [13] Chapter 1 and [14] Chapters 1.5 and 3.6), and in the theory of risk processes (see [15]). In this case the process (gn) is clearly a homogeneous Markov chain, also called a one-sided random walk or Lindley process. A number of stability properties of (gn) have been established in [14,16,17], as we will recall in Section 2.

The purpose of this paper is to extend these results, motivated by change detection for HMM-s, as described in Section 3. After giving a brief overview of the results for the i.i.d. case, we show that for L-mixing inputs with negative expectation and further technical conditions, such as boundedness, the output process (gn) of this system is L-mixing. (For the definition of L-mixing see the Appendix, and [18] for further details.) The result is applied to get an upper bound for the false alarm rate. The proof is adapted to get a similar result for the more standard case of random i.i.d. inputs with negative expectation, and finite exponential moments of some positive order, reproducing some known tight bounds for the false alarm rate. Further possible extensions and open problems are formulated in the Discussion section.

The assumption that (Xn) is an i.i.d. sequence reflects the tacit assumption that actually there is no change at all, i.e. τ∗ = +∞. The Page–Hinkley detector can still be used to monitor the process, and we may occasionally get an alarm. Our results can be applied to give an upper bound for the almost sure false alarm rate as a function of the threshold δ, defined as

lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} I_{gn>δ}. (4)

A key quantity in change detection is the Average Run Length (ARL) defined as E_0[τ(δ)] (see Chapter 6.2 in [19] or [20]). For the i.i.d. case, it is shown in [20] that E[τ(δ)], defined in terms of the stationary distribution of the Markov chain (gn), is approximately reciprocal to the false alarm rate, for large δ. In this case, the false alarm rate could be defined using the Law of Large Numbers for homogeneous Markov chains. However, for models with dependent and inhomogeneous input data, the false alarm rate seems not to be directly quantifiable as a pathwise characteristic.

2. The case of i.i.d. inputs

If the input process (Xn) is an i.i.d. sequence, then (gn) is a homogeneous Markov chain. A number of results for this Markov chain are established in [16]. The existence of a unique invariant measure is proven under the hypothesis E(X1) < 0 (see Proposition 8.5.1 and Theorem 10.0.1). Moreover, it is proven that (gn) is V-geometrically mixing under the assumption E(exp c′X1) < ∞ for some c′ > 0 (see Chapter 16.1 for details). It follows that the strong law of large numbers holds for (gn), see Theorem 17.0.1.

An alternative approach to the analysis of (gn) is given in [17]. It is noted there that the process (gn) can be generated in a convenient way by repeated applications of random functions: letting fX(g) := (g + X)+ we have

gn = fXn ∘ fXn−1 ∘ fXn−2 ∘ · · · ∘ fX1(g0). (5)

Using this representation, the existence of an invariant measure is proven, for the case when E(X1) < 0, via a backward iteration.
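As a quick numerical sanity check (our own illustration, not part of the paper), one can verify on a random input that the recursion (3) and the max-of-partial-sums form (1) generate the same trajectory:

```python
import random

def g_recursive(xs):
    """Trajectory of g_n via the recursion (3)."""
    g, out = 0.0, []
    for x in xs:
        g = max(0.0, g + x)
        out.append(g)
    return out

def g_cusum(xs):
    """Trajectory of g_n via (1): S_n - min_{0<=k<=n} S_k."""
    s, smin, out = 0.0, 0.0, []
    for x in xs:
        s += x
        smin = min(smin, s)
        out.append(s - smin)
    return out

random.seed(0)
xs = [random.gauss(-0.3, 1.0) for _ in range(1000)]
assert all(abs(a - b) < 1e-9 for a, b in zip(g_recursive(xs), g_cusum(xs)))
```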

To formulate a useful addition we need the following notations:

Fn := σ(Xi | i ≤ n) and F+_n := σ(Xi | i ≥ n + 1).

Thus Fn is the past, and F+_n is the future of (Xn) up to time n.

Assume:

µ := µ(c′) = E(exp c′X1) < 1 for some c′ > 0. (6)

Assuming that E((X1)+) > 0, let c∗ be defined by

µ(c∗) = E(exp c∗X1) = 1. (7)
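As an illustration of (6) and (7) (our own example, with hypothetical parameters): if X1 ~ N(−m, σ²) then µ(c) = exp(−cm + c²σ²/2), and the unique positive root of µ(c∗) = 1 is c∗ = 2m/σ². A sketch that recovers c∗ numerically by bisection:

```python
import math

def mgf(c, m, sigma):
    """mu(c) = E exp(c X) for X ~ N(-m, sigma^2)."""
    return math.exp(-c * m + 0.5 * (c * sigma) ** 2)

def critical_exponent(m, sigma, tol=1e-12):
    """Solve mu(c*) = 1 for c* > 0 by bisection; mu < 1 below c*, mu > 1 above."""
    lo, hi = tol, 10.0 * m / sigma ** 2 + 1.0   # hi is safely above c* = 2m/sigma^2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mgf(mid, m, sigma) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# agrees with the closed form c* = 2 m / sigma^2
assert abs(critical_exponent(0.5, 1.0) - 1.0) < 1e-6
```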

Theorem 1. Let (Xn) be a sequence of i.i.d. r.v.-s such that (6) holds. Then (gn), defined by Eq. (3), is L-mixing with respect to (w.r.t.) (Fn, F+_n). In addition, for any c′′ such that 0 < c′′ < c′ < c∗, we have with µ = µ(c′)

E exp(c′′ gn) ≤ 1 + (c′′/(c′ − c′′)) · µ/(1 − µ) =: C_{c′′,c′}. (8)

An outline of the proof will be given in Section 5. The theorem holds true if the (Xn) are independent, not necessarily identically distributed, and

µ(c′) := sup_n E(exp c′Xn) < 1 for some c′ > 0.

3. The case of L-mixing input

Consider now the case when the input (Xn) is L-mixing w.r.t. (Fn, F+_n). This condition is motivated by change detection problems for HMM-s. In the case of a HMM with finite state space and continuous read-out, parametrized by θ0 and θ1 before and after the change, the residuals would be defined as

Xn = − log p(Yn|Yn−1, . . . , Y0, θ0) + log p(Yn|Yn−1, . . . , Y0, θ1). (9)

(Xn) is L-mixing under certain technical conditions, see [21,22]. We need two additional technical assumptions, using the notations of the Appendix. The first one is fairly mild, requiring that

∑_{τ=0}^{+∞} τ γq(τ, X) < +∞ for all 1 ≤ q < +∞. (10)

The second assumption is much more restrictive, saying that

M∞(X) < +∞ and Γ∞(X) < +∞. (11)

This condition will be discussed in the Discussion section. We define a critical exponent in terms of M∞(X) and Γ∞(X) as follows:

β∗ := ε/(4M∞(X)Γ∞(X)). (12)

Then, for any β′ ≤ β∗ define

λ = λ(β′) := exp(4M∞(X)Γ∞(X)(β′)² − β′ε). (13)

Note that for the critical value β∗ we have λ(β∗) = 1, and for β′ < β∗ we have λ(β′) < 1. The main result of this section is then the following:

Theorem 2. Let (Xn) be an L-mixing process w.r.t. (Fn, F+_n) such that (10) and (11) are satisfied, and

E(Xn) ≤ −ε < 0 for all n ≥ 0. (14)

Let (gn) be defined as in (3). Then (gn) is L-mixing w.r.t. (Fn, F+_n). In addition, for any β′′, β′ such that 0 < β′′ < β′ < β∗, we have with λ = λ(β′)

E exp(β′′ gn) ≤ 1 + (β′′/(β′ − β′′)) · λ/(1 − λ) =: K_{β′′,β′}. (15)
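For orientation, substituting the critical value (12) into (13) verifies the claim λ(β∗) = 1 preceding Theorem 2:

```latex
\lambda(\beta^*)
  = \exp\bigl(4M_\infty(X)\Gamma_\infty(X)(\beta^*)^2 - \beta^*\varepsilon\bigr)
  = \exp\Bigl(\frac{\varepsilon^2}{4M_\infty(X)\Gamma_\infty(X)}
            - \frac{\varepsilon^2}{4M_\infty(X)\Gamma_\infty(X)}\Bigr)
  = e^{0} = 1 ,
```

and since the exponent is a strictly convex quadratic in β′ vanishing at 0 and at β∗, λ(β′) < 1 for 0 < β′ < β∗.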


Proof of Theorem 2. Use the following equivalent formulation for (gn):

gn = max_{1≤i≤n} (Xi + · · · + Xn)+, (16)

and define the auxiliary process

g_{n,n−τ}(X) := max_{1≤i≤n−τ} (Xi + · · · + Xn)+. (17)

Lemma 1. Let (Xn) and β′′, β′ and λ be as in Theorem 2. Then

E exp(β′′ g_{n,n−τ}(X)) ≤ 1 + (β′′/(β′ − β′′)) · λ^{τ+1}/(1 − λ). (18)

For the proof of Lemma 1 we need the following result:

Lemma 2. Let (Xn) and β′ and λ be as in Theorem 2. Then for any x ≥ 0

P(g_{n,n−τ}(X) > x) ≤ (λ^{τ+1}/(1 − λ)) exp(−β′x). (19)

Proof of Lemma 2. We follow the arguments of the proof of Theorem 3.1 in [23]. First we estimate E exp(β′(Xi + · · · + Xn)), 1 ≤ i ≤ n − τ. Define

Dk := Xk − E(Xk),

for all k ≥ 1. Obviously E(Dk) = 0 for all k, M∞(D) ≤ 2M∞(X), and Γ∞(D) = Γ∞(X). By the exponential inequality, given as Theorem 5.1 in [23], applied to the process (Dk)_{i≤k≤n} with weights fk = β′ we obtain

E exp( β′ ∑_{k=i}^{n} Dk − 2M∞(D)Γ∞(D)(β′)²(n − i + 1) ) ≤ 1.

After rearrangement and multiplication by exp(β′ ∑_{k=i}^{n} E(Xk)), we get

E exp( β′ ∑_{k=i}^{n} (Dk + E(Xk)) ) ≤ exp( α(β′)²(n − i + 1) + β′ ∑_{k=i}^{n} E(Xk) ),

with α := 2M∞(D)Γ∞(D). Noting that Dk + E(Xk) = Xk, E(Xk) ≤ −ε, and α ≤ 4M∞(X)Γ∞(X), we conclude that

E exp( β′ ∑_{k=i}^{n} Xk ) ≤ exp( 4M∞(X)Γ∞(X)(β′)²(n − i + 1) − β′ε(n − i + 1) ).

Take β′ < β∗. Recalling the definition of λ(β′) we get

E exp( β′ ∑_{k=i}^{n} Xk ) ≤ exp( (4M∞(X)Γ∞(X)(β′)² − β′ε)(n − i + 1) ) = λ(β′)^{n−i+1}.

Now, for β′ < β∗, we have λ = λ(β′) < 1, and thus we obtain for x ≥ 0

P(g_{n,n−τ}(X) > x) ≤ ∑_{i=1}^{n−τ} P( (Xi + · · · + Xn)+ > x )
≤ ∑_{i=1}^{n−τ} E exp(β′(Xi + · · · + Xn)) / exp(β′x)
≤ ∑_{i=1}^{n−τ} λ^{n−i+1} / exp(β′x)
= ∑_{l=τ+1}^{n} λ^{l} / exp(β′x)
≤ ∑_{l=τ+1}^{+∞} λ^{l} / exp(β′x) = (λ^{τ+1}/(1 − λ)) exp(−β′x). □

Proof of Lemma 1. We have

E exp(β′′ g_{n,n−τ}(X)) = ∫_{0}^{+∞} P( exp(β′′ g_{n,n−τ}(X)) > x ) dx. (20)

For x ≥ 1 we get by Lemma 2

P( exp(β′′ g_{n,n−τ}(X)) > x ) ≤ (λ^{τ+1}/(1 − λ)) exp( −β′ (log x)/β′′ ) = (λ^{τ+1}/(1 − λ)) x^{−β′/β′′}. (21)

For x < 1 we have P( exp(β′′ g_{n,n−τ}(X)) > x ) = 1. Combining (20) and (21) we get

E exp(β′′ g_{n,n−τ}(X)) = 1 + ∫_{1}^{+∞} P( exp(β′′ g_{n,n−τ}(X)) > x ) dx
≤ 1 + (λ^{τ+1}/(1 − λ)) ∫_{1}^{+∞} x^{−β′/β′′} dx = 1 + (β′′/(β′ − β′′)) · λ^{τ+1}/(1 − λ). □

Corollary 1. Under the conditions and notations of Theorem 2 we have

‖ g_{n,n−τ}(X) ‖p ≤ Kp λ^{(τ+1)/p} (22)

for any integer p ≥ 1, where Kp := (1/β′′) · ( (β′′/(β′ − β′′)) · p!/(1 − λ) )^{1/p}.

The claim follows directly from Lemma 1 and the inequality

exp(β′′ g_{n,n−τ}) ≥ 1 + (β′′)^p (g_{n,n−τ})^p / p!. (23)

We continue the proof of Theorem 2. The starting point is (16) and (17), with i replaced by k:

gn = max_{1≤k≤n} (Xk + · · · + Xn)+, (24)

and

g_{n,n−τ}(X) := max_{1≤k≤n−τ} (Xk + · · · + Xn)+. (25)

Since Xn is Fn-adapted for any n ∈ N, it follows that (gn) is Fn-adapted. To show that (gn) is M-bounded note that gn = g_{n,n}(X). For any fixed q, let p := ⌈q⌉ be the first integer greater or equal to q. Then, by Corollary 1, we have ‖ gn ‖q ≤ ‖ gn ‖p ≤ Kp λ^{1/p}. To show that (gn) is L-mixing we make use of Lemma 4 in the Appendix. Let

X+_{k,n−τ} := E( Xk | F+_{n−τ} ).

Since (Xn) is L-mixing, for k ≥ n − ⌈τ/2⌉ + 1, or k − (n − τ) ≥ τ − ⌈τ/2⌉ + 1, X+_{k,n−τ} is a good approximation of Xk. A key step is to approximate gn by

g++_{n,n−τ} := max_{n−⌈τ/2⌉+1 ≤ k ≤ n} (X+_{k,n−τ} + · · · + X+_{n,n−τ})+. (26)


Note that g++_{n,n−τ} is F+_{n−τ}-measurable, as required. For each τ, define

γ++_q(τ) := sup_{n≥τ} ‖ gn − g++_{n,n−τ} ‖q and Γ++_q(g) := ∑_{τ=0}^{+∞} γ++_q(τ). (27)

By Lemma 4 in the Appendix we have

Γq(g) ≤ 2Γ++_q(g). (28)

To estimate gn − g++_{n,n−τ} we use an intermediate approximation of gn:

ḡ_{n,n−τ} := max_{n−⌈τ/2⌉+1 ≤ k ≤ n} (Xk + · · · + Xn)+. (29)

Note that ḡ_{n,n−τ} is not necessarily F+_{n−τ}-measurable. Write

‖ gn − g++_{n,n−τ} ‖q ≤ ‖ gn − ḡ_{n,n−τ} ‖q + ‖ ḡ_{n,n−τ} − g++_{n,n−τ} ‖q, (30)

and define

γ̄q(τ) := sup_{n≥τ} ‖ gn − ḡ_{n,n−τ} ‖q, Γ̄q(g) := ∑_{τ=0}^{+∞} γ̄q(τ),

γ̄++_q(τ) := sup_{n≥τ} ‖ ḡ_{n,n−τ} − g++_{n,n−τ} ‖q, Γ̄++_q(g) := ∑_{τ=0}^{+∞} γ̄++_q(τ).

Taking sup_{n≥τ} in Eq. (30) and summing over τ we get

Γ++_q(g) ≤ Γ̄q(g) + Γ̄++_q(g). (31)

To estimate ‖ gn − ḡ_{n,n−τ} ‖q we use the following inequality: let K be a finite set, and K = K1 ∪ K2, with K1 ∩ K2 = ∅. Then for any Ak ∈ R+, k ∈ K,

max_{k∈K} Ak ≤ max_{k∈K1} Ak + max_{k∈K2} Ak. (32)

With K = {1, . . . , n}, K1 = {1, . . . , n − ⌈τ/2⌉}, K2 = {n − ⌈τ/2⌉ + 1, . . . , n}:

gn − ḡ_{n,n−τ} ≤ max_{1≤k≤n−⌈τ/2⌉} (Xk + · · · + Xn)+ = g_{n,n−⌈τ/2⌉}(X). (33)

Now for any real q ≥ 1 let p := ⌈q⌉. Using Corollary 1 we finally get

γ̄q(τ) ≤ sup_{n≥τ} ‖ gn − ḡ_{n,n−τ} ‖p ≤ sup_{n≥τ} ‖ g_{n,n−⌈τ/2⌉}(X) ‖p ≤ Kp λ^{(⌈τ/2⌉+1)/p}. (34)

To estimate ‖ ḡ_{n,n−τ} − g++_{n,n−τ} ‖q we use the following simple inequality: let (an), (bn), n ≥ 1, be sequences of real numbers, and for 1 ≤ m ≤ n set

a := max_{m≤k≤n} (ak + · · · + an)+ and b := max_{m≤k≤n} (bk + · · · + bn)+.

Then |a − b| ≤ ∑_{k=m}^{n} |ak − bk|. Applying this we get

‖ ḡ_{n,n−τ} − g++_{n,n−τ} ‖q ≤ ∑_{k=n−⌈τ/2⌉+1}^{n} ‖ Xk − X+_{k,n−τ} ‖q ≤ ∑_{j=⌊τ/2⌋+1}^{τ} γq(j, X).

It follows that, using condition (10),

Γ̄++_q(g) = ∑_{τ=0}^{+∞} γ̄++_q(τ) ≤ ∑_{τ=0}^{+∞} ∑_{j=⌊τ/2⌋+1}^{τ} γq(j, X) ≤ ∑_{τ=0}^{+∞} τ γq(τ, X) < +∞. (35)

Combining (35), (34), (31) and (28), we conclude that Γq(g) < +∞, as stated. To conclude the proof of Theorem 2, we note once more that (15) follows from Lemma 1, recalling that gn = g_{n,n}. □

4. False alarm rate

As a corollary to Theorems 1 and 2 we can get an upper bound for the a.s. false alarm rate defined as

lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} I_{gn>δ}, (36)

with the tacit assumption that τ∗ = +∞. This is in fact the most important implication of the results of the previous sections. We prove the result for L-mixing inputs; the i.i.d. case will be briefly covered below.

Theorem 3. Let (Xn) and β∗ be as in Theorem 2, and let (gn) be defined as in (1). Then for any δ > 0, and any 0 < β′′ < β′ < β∗ we have

lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} I_{gn>δ} ≤ K_{β′′,β′} exp(−β′′δ), (37)

where K_{β′′,β′} is defined in Theorem 2.

Proof. By Theorem 2 we have

P(gn > δ) = P( exp(β′′ gn) > exp(β′′δ) ) ≤ K_{β′′,β′} / exp(β′′δ) (38)

for all n ≥ 1. Let δ′ < δ and let f be a smooth Lipschitz-continuous function such that I_{g>δ} ≤ f(g) ≤ I_{g>δ′}. Transformations of L-mixing processes via real Lipschitz-continuous bounded functions are L-mixing, and by Theorem 2 (gn) is L-mixing, thus (f(gn)) is also L-mixing.

Using the strong law of large numbers for L-mixing processes, we get, after centering,

lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} I_{gn>δ} ≤ lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} f(gn)
= lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} E f(gn) ≤ lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} E I_{gn>δ′}. (39)

Taking into account (38), and that δ′ is arbitrary, we get the claim. □

5. The i.i.d. case revisited

First we outline the proof of Theorem 1. Standard results of the theory of risk processes imply that for any c′ such that 0 < c′ < c∗, with c∗ as in Eq. (7), we have E(exp c′gn) < ∞, see [24,15]. The argument below partially follows the line of proof of this known result. Trivially, for any n ∈ N, Fn and F+_n are independent, and (gn) is Fn-adapted. Following the proof of Theorem 2 we define the auxiliary process

g_{n,n−τ}(X) := max_{1≤k≤n−τ} (Xk + · · · + Xn)+. (40)

The exponential moments of g_{n,n−τ}(X) can be bounded as in Lemma 1:

Lemma 3. Let (Xn) and c′ > c′′ and µ be as in Theorem 1. Then

E exp(c′′ g_{n,n−τ}(X)) ≤ 1 + (c′′/(c′ − c′′)) · µ^{τ+1}/(1 − µ). (41)


For the proof we use the following inequality (see Lemma 2):

P(g_{n,n−τ}(X) > x) ≤ (µ^{τ+1}/(1 − µ)) exp(−c′x) (42)

for any x ≥ 0. In the proof the required exponential inequality reduces to

E exp( c′ ∑_{k=i}^{n} Xk ) = ∏_{k=i}^{n} E exp(c′Xk) ≤ µ^{n−i+1}.

The proof of Lemma 3 is obtained by mimicking the proof of Lemma 1.

To show that (gn) is L-mixing we use the F+_{n−τ}-adapted approximation

g++_{n,n−τ} = ḡ_{n,n−τ} := max_{n−τ+1≤k≤n} (Xk + · · · + Xn)+, (43)

in analogy with (26) and (29). We get as in (33)

gn − ḡ_{n,n−τ} ≤ max_{1≤k≤n−τ} (Xk + · · · + Xn)+ = g_{n,n−τ}, (44)

and the proof is completed as in the L-mixing case. For the false alarm rate we get, as in the case of Theorem 3:

Theorem 4. Let (Xn) and c∗ be as in Theorem 1, and let (gn) be defined as in (1). Then for any δ > 0, and any 0 < c′′ < c′ < c∗ we have

lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} I_{gn>δ} ≤ C_{c′′,c′} exp(−c′′δ), (45)

where C_{c′′,c′} is defined in Theorem 1.

As an example, let Xn = − log f(Yn, θ0) + log f(Yn, θ1). Then we have E(exp(Xn)) = 1, i.e. c∗ = 1. Thus the a.s. false alarm rate is less than C_{c′′,c′} exp(−c′′δ) for any c′′ < 1, essentially reproducing the bound K exp(−δ) given in [20], Section 5.3.

6. Discussion

The problem formulation of this paper was motivated by the problem of change detection of Hidden Markov Processes. It should be admitted, though, that the results of the paper are not directly applicable. Namely, the Page–Hinkley scores Xn defined under (9) as

Xn = − log p(Yn|Yn−1, . . . , Y0, θ0) + log p(Yn|Yn−1, . . . , Y0, θ1)

may not be bounded, i.e. the condition M∞(X) < +∞ may not be satisfied. An even more serious restriction is the condition Γ∞(X) < +∞. These two conditions were crucial for the validity of an exponential inequality for partial sums of L-mixing processes, used in the proof of Lemma 1. A careful study of the proof of Lemma 1 shows that what we really need is the validity of

E exp( β′ ∑_{k=1}^{n} (Xk − E(Xk)) ) ≤ C exp( c(β′)²n ), (46)

with some β′ > 0 and C, c > 0. A necessary condition for this is that

J(β′) = lim sup_n (1/n) log E exp( β′ ∑_{k=1}^{n} Xk ) < +∞. (47)

J(β′) in Eq. (47) is a well known quantity in risk sensitive control. It is known that J(β′) < +∞ for some β′ > 0 if |Xk| ≤ X∗_k, where X∗_k = Z_k^T Z_k and Zk is the output of a finite-dimensional, time-invariant, stable linear Gaussian system, see Appendix F in [25].

An important and relevant example is when the data Yn are generated by a finite dimensional, time-invariant, stable, linear Gaussian system. Then, writing it in innovation form, via Kalman-filtering, the conditional density log p(Yn|Yn−1, . . . , Y0, θ), and hence the Page–Hinkley score Xn, does indeed satisfy the above majorization condition. We conclude that (47) is satisfied with some β′ > 0. A closer look at the functional J(β′), which can be computed explicitly, reveals that actually the stronger condition (46) also holds. We conclude that, adapting the arguments of Theorem 2, an almost sure upper bound for the false alarm rate can be given when trying to detect changes in the dynamics of a finite dimensional, time-invariant, stable, linear Gaussian system.

It may be of interest to establish the stability properties of the Page–Hinkley detector for deterministic inputs, mimicking the i.i.d. case. We conclude this section by formulating the following problem: assume that (Xn) is a bounded deterministic sequence satisfying

lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} Xn < 0. (48)

Let (gn) be the response of the Page–Hinkley detector driven by (Xn). Does it follow that

lim sup_{N→+∞} (1/N) ∑_{n=1}^{N} gn < +∞? (49)

Appendix. L-mixing processes

We summarize a few definitions given in [18]. Let (Ω, F, P) be a probability space, and let (Xn) be a stochastic process on (Ω, F, P).

Definition 1. We say that (Xn) is M-bounded if for all 1 ≤ q < +∞

Mq(X) := sup_{n≥0} ‖ Xn ‖q < +∞.

We can also define Mq(X) for q = +∞ as

M∞(X) := sup_{n≥1} ess sup |Xn|.

Let (Fn)_{n≥1} be an increasing family of σ-fields and let (F+_n)_{n≥1} be a decreasing family of σ-fields, Fn ⊆ F and F+_n ⊆ F for any n. Assume that Fn and F+_n are independent for all n. Let τ ≥ 0 be an integer, and let for 1 ≤ q < +∞

γq(τ, X) = γq(τ) := sup_{n≥τ} ‖ Xn − E(Xn|F+_{n−τ}) ‖q, Γq(X) := ∑_{τ=0}^{+∞} γq(τ).

We can also define

γ∞(τ, X) := sup_{n≥τ} ess sup |Xn − E(Xn|F+_{n−τ})|, Γ∞(X) := ∑_{τ=0}^{+∞} γ∞(τ, X).

Definition 2. A process (Xn) is L-mixing w.r.t. (Fn, F+_n) if Xn is Fn-measurable for all n ≥ 1, (Xn) is M-bounded, and Γq(X) < +∞ for all 1 ≤ q < +∞.

A prime example of an L-mixing process is the output process of a stable linear stochastic system driven by an M-bounded i.i.d. sequence. To estimate γq(τ, X) the following lemma is useful:


Lemma 4. Let F′ ⊂ F be two σ-algebras, and let ξ be an F-measurable r.v. Then for any 1 ≤ q < +∞ and any F′-measurable r.v. η we have

‖ ξ − E(ξ |F′) ‖q ≤ 2 ‖ ξ − η ‖q. (A.1)

Centered L-mixing processes satisfy the strong law of large numbers, see Corollary 1.3 in [18].
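The prime example above (a stable linear system driven by i.i.d. noise) can be made concrete with a stationary AR(1) process, Xn = aXn−1 + en, |a| < 1, with i.i.d. zero-mean noise of standard deviation σ (our own worked example, not from [18]). Conditioning on the future σ-field of the driving noise leaves only the tail, Xn − E(Xn|F+_{n−τ}) = a^τ X_{n−τ}, so γ2(τ, X) = |a|^τ ‖Xn‖2 decays geometrically and Γ2(X) is finite. A numerical check of this geometric-sum arithmetic:

```python
import math

a, sigma = 0.7, 1.0                      # AR(1) coefficient and noise std (hypothetical)
norm_X = sigma / math.sqrt(1 - a * a)    # ||X_n||_2 for stationary X_n = sum_j a^j e_{n-j}

# gamma_2(tau, X) = |a|^tau * ||X_n||_2, since the approximation error is a^tau X_{n-tau}
gamma2 = [abs(a) ** tau * norm_X for tau in range(200)]

Gamma2_partial = sum(gamma2)             # partial sum of the series Gamma_2(X)
Gamma2_closed = norm_X / (1 - abs(a))    # closed form of the geometric series

assert all(g2 < g1 for g1, g2 in zip(gamma2, gamma2[1:]))   # geometric decay
assert abs(Gamma2_partial - Gamma2_closed) < 1e-6
```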

References

[1] M. Basseville, I. Nikiforov, Detection of Abrupt Changes: Theory and Application, Prentice-Hall, 1993.
[2] B. Brodsky, B. Darkhovsky, Nonparametric Methods in Change-Point Problems, Kluwer Academic Publishers, 1993.
[3] E. Page, Continuous inspection schemes, Biometrika 41 (1/2) (1954) 100–115.
[4] D. Hinkley, Inference about the change-point from cumulative sum tests, Biometrika 58 (3) (1971) 509–523.
[5] G. Lorden, Procedures for reacting to a change in distribution, The Annals of Mathematical Statistics 42 (6) (1971) 1897–1908.
[6] J. Baikovicius, L. Gerencsér, Change-point detection as model selection, Informatica 3 (1) (1992) 3–20.
[7] L. Gerencsér, G. Molnár-Sáska, Change detection of hidden Markov models, in: Proceedings of the 43rd IEEE Conference on Decision and Control, 2004, pp. 1754–1758.
[8] B. Chen, P. Willett, Quickest detection of hidden Markov models, in: Proc. 36th IEEE CDC, San Diego, CA, 1997, pp. 3984–3989.
[9] R. Jana, S. Dey, Change detection in teletraffic models, IEEE Transactions on Signal Processing 48 (3) (2000) 846–853.
[10] T.L. Lai, Information bounds and quick detection of parameter changes in stochastic systems, IEEE Transactions on Information Theory 44 (1998) 2917–2929.
[11] C. Fuh, SPRT and CUSUM in hidden Markov models, Annals of Statistics 31 (3) (2003) 942–977.
[12] C. Fuh, Asymptotic operating characteristics of an optimal change point detection in hidden Markov models, Annals of Statistics 32 (5) (2004) 2305–2339.
[13] L. Takacs, Introduction to the Theory of Queues, Oxford University Press, New York, 1962.
[14] S. Asmussen, Applied Probability and Queues, Springer-Verlag, New York, 2003.
[15] H. Panjer, G. Willmot, Insurance Risk Models, Society of Actuaries, 1992.
[16] S. Meyn, R. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993.
[17] P. Diaconis, D. Freedman, Iterated random functions, SIAM Review 41 (1) (1999) 45–76.
[18] L. Gerencsér, On a class of mixing processes, Stochastics 26 (1989) 165–191.
[19] H. Poor, O. Hadjiliadis, Quickest Detection, Cambridge University Press, 2009.
[20] M. Pollak, A.G. Tartakovsky, Asymptotic exponentiality of the distribution of first exit times for a class of Markov processes with applications to quickest change detection, Theory of Probability and its Applications 53 (3) (2009) 430–442.
[21] L. Gerencsér, G. Michaletzky, G. Molnár-Sáska, An improved bound for the exponential stability of predictive filters of hidden Markov models, Communications in Information and Systems 7 (2) (2007) 133–152 (special volume on Stochastic Control and Filtering, in Honor of Tyrone Duncan on the Occasion of his 65th Birthday; guest eds.: A. Bensoussan, S. Mitter and B. Pasik-Duncan).
[22] L. Gerencsér, G. Molnár-Sáska, Identification of hidden Markov models - uniform LLN-s, in: Modeling, Estimation and Control. Festschrift in Honor of Giorgio Picci on the Occasion of his Sixty-Fifth Birthday, Lecture Notes in Control and Information Sciences, vol. 364, Springer, 2007, pp. 135–149.
[23] L. Gerencsér, Almost sure exponential stability of random linear differential equations, Stochastics and Stochastics Reports 36 (1991) 91–107.
[24] E. Sparre Andersen, On the collective theory of risk in the case of contagion between the claims, in: Transactions of the XV-th International Congress of Actuaries, 1957, pp. 219–229.
[25] A. Stoorvogel, J. van Schuppen, System identification with information theoretic criteria, in: S. Bittanti, G. Picci (Eds.), Identification, Adaptation, Learning, Springer-Verlag, Berlin, 1996, pp. 289–338.