
Systems & Control Letters 11 (1988) 107-115 North-Holland


A stopped stochastic approximation algorithm *

G. YIN, Department of Mathematics, Wayne State University, Detroit, MI 48202, U.S.A.

Received 11 January 1988 Revised 27 April and 26 May 1988

Abstract: A stopping time problem for multidimensional stochastic approximation algorithms is studied in this paper. The stopping rule is determined so that the recursive procedure is terminated when the unknown parameter $\theta$ lies inside a desired ellipsoidal confidence region with high probability. The stopped process is shown to be asymptotically normal by means of weak convergence methods.

Keywords: Stopping times, Stochastic approximation, Confidence ellipsoid, Asymptotic normality, Weak convergence.

1. Introduction

Since the pioneering work of Robbins and Monro [1], an extensive literature has developed on stochastic approximation algorithms. These algorithms have been successfully employed in various stochastic systems arising in adaptive control, signal estimation and detection, Monte Carlo optimization and many other related fields.

The well-known Robbins-Monro (RM) procedure is concerned with locating the root of a function $f(x)$ from observations corrupted by random noise. To find the root $\theta$, one generates a sequence of estimates $\{X_n\}$ according to

$$X_{n+1} = X_n - a_n Y_n, \qquad (1)$$

$$Y_n = f(X_n) + \xi_n, \qquad (2)$$

where $a_n > 0$ is a sequence of step sizes (e.g., $a_n = 1/n$), $Y_n$ is the observation taken at time $n$, and $\xi_n$ is the unobservable noise.
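As a concrete illustration, the sketch below runs the recursion (1)-(2) with $a_n = 1/n$. The linear field, the Gaussian noise model and all numerical values are hypothetical choices made for the example only, not part of the paper.

```python
import numpy as np

def robbins_monro(f, x0, n_iter=10_000, noise_std=1.0, rng=None):
    """Sketch of the RM recursion (1)-(2) with a_n = 1/n:
    X_{n+1} = X_n - (1/n) Y_n,  Y_n = f(X_n) + xi_n.
    The Gaussian noise is an illustrative assumption."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for n in range(1, n_iter + 1):
        y = f(x) + noise_std * rng.standard_normal(x.shape)  # observation Y_n
        x = x - y / n                                        # step size a_n = 1/n
    return x

# Example: locate the root theta of a hypothetical linear field f(x) = H (x - theta).
H = np.array([[2.0, 0.3], [0.0, 1.5]])
theta = np.array([1.0, -2.0])
x_hat = robbins_monro(lambda x: H @ (x - theta), x0=np.zeros(2))
```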

In a wide range of applications, one may wish to terminate the successive approximation procedure once the iterate $X_n$ is sufficiently close to $\theta$ with high probability. Moreover, when a procedure is implemented on a digital computer, infinite loops are rarely desirable. There is thus a real need for good stopping rules.

Relatively little attention has been paid to stopping time problems for stochastic approximation. To our knowledge, this problem was investigated only in [2] and [3]; there, only one-dimensional algorithms were studied, and only i.i.d. random variables were dealt with. From an application point of view, this is not quite satisfactory: the problems we face are normally multidimensional, and the noise processes are rarely independent. More realistic formulations are therefore needed.

In this paper, we present some preliminary results on a multidimensional stopping rule for algorithm (1), (2). We shall determine a stopping rule $N$ by constructing an ellipsoidal confidence region for $\theta$, with volume less than or equal to $\varepsilon^r$, so that for each $0 < \alpha < 1$,

$$\lim_{\varepsilon \to 0} P\{\theta \in \text{the confidence region}\} = 1 - \alpha. \qquad (3)$$

This stopping rule indicates that the recursive computations will be terminated if the estimation error is small enough with confidence coefficient close to 1.

* This research was supported in part by Wayne State University under the Wayne State University Research Award.



We organize the rest of our paper as follows: the problem formulation and the main theorem are given in Section 2; the asymptotic normality for the stopped stochastic approximation algorithm is derived in Section 3; and consistent estimates of various unknown quantities are constructed in Section 4. Finally, some concluding remarks are made in Section 5.

2. Formulation and main theorem

To emphasize the main idea of determining the stopping rule, and to simplify the notation, we shall consider the following simplest form of stochastic approximation model.

Let $X_n, \xi_n \in \mathbb{R}^r$, $f(\cdot): \mathbb{R}^r \to \mathbb{R}^r$, and

$$X_{n+1} = X_n - \frac{1}{n}\bigl(f(X_n) + \xi_n\bigr). \qquad (4)$$

The following conditions will be needed.

(A1) $f(\cdot)$ is a continuous function such that:
(1) $f(x) = 0$ has a unique root $\theta$.
(2) $f(x) = H(x - \theta) + O(|x - \theta|^{1+\gamma})$ for some $\gamma > 0$, and $A = \frac{1}{2}I - H$ is stable in the sense that all eigenvalues of $A$ have negative real parts.
(3) $|f(x)| \le K(1 + |x|)$ for some $K > 0$.

(A2) Let $\{\mathcal{F}_n\}$ be a sequence of increasing $\sigma$-algebras such that:
(1) $(\xi_n, \mathcal{F}_n)$ is a martingale difference sequence.
(2) For some $a > 2$, $\sup_n E(|\xi_n|^a \mid \mathcal{F}_{n-1}) < \infty$.
(3) There is a symmetric positive definite matrix $S \in \mathbb{R}^{r \times r}$ such that $E(\xi_n \xi_n' \mid \mathcal{F}_{n-1}) \to S$ as $n \to \infty$.

(A3) There exists a twice continuously differentiable Liapunov function $v(x) \ge 0$ such that $v(x) > 0$ for all $x \ne \theta$ and $v(x) \to \infty$ as $|x| \to \infty$. Moreover, $v_x'(x) f(x) > 0$ for all $x \ne \theta$.

Remark 1. Under (A1)-(A3), algorithm (4) is strongly consistent, i.e., $X_n \to \theta$ w.p. 1. Moreover, $\sqrt{n}(X_n - \theta) \xrightarrow{D} \mathcal{N}(0, \Sigma)$, where $\Sigma$ is the asymptotic covariance matrix given by

$$\Sigma = \int_0^\infty e^{At} S e^{A't}\, dt \qquad (5)$$

with $A = \frac{1}{2}I - H$ (cf. [4]).
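Numerically, the integral (5) need not be evaluated directly: since $A$ is stable, $\Sigma$ is the unique solution of the Lyapunov equation $A\Sigma + \Sigma A' + S = 0$. A minimal sketch with hypothetical $H$ and $S$ (in practice both are unknown and must be estimated as in Section 4):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical H and S for illustration; A = (1/2) I - H is stable here.
H = np.array([[2.0, 0.3], [0.0, 1.5]])
S = np.eye(2)
A = 0.5 * np.eye(2) - H

# Sigma = int_0^inf e^{At} S e^{A't} dt solves A Sigma + Sigma A' + S = 0.
Sigma = solve_continuous_lyapunov(A, -S)
```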

Since $\sqrt{n}(X_n - \theta)$ is asymptotically normal, we can construct an ellipsoidal confidence region for $\theta$ in the following way. Let $\Sigma_n^{-1}$ be nonsingular with $\Sigma_n^{-1} \to \Sigma^{-1}$ as $n \to \infty$, and let

$$U_n^2 = n(X_n - \theta)' \Sigma_n^{-1} (X_n - \theta). \qquad (6)$$

Then, for $c > 0$,

$$E_n \equiv \{\theta;\; U_n^2 \le c\} \qquad (7)$$

is an ellipsoidal confidence region for $\theta$. Let $V(E_n)$ denote the volume of $E_n$. The following formula was derived in [5] (see also [6]):

$$V(E_n) = \frac{\pi^{r/2} (c/n)^{r/2} (\det \Sigma_n)^{1/2}}{\Gamma(\tfrac{1}{2}r + 1)}. \qquad (8)$$
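For readers who wish to evaluate (8) numerically, a direct transcription (the function name is ours):

```python
import math
import numpy as np

def ellipsoid_volume(c, n, Sigma_n):
    """Volume formula (8) for the ellipsoid E_n in R^r."""
    r = Sigma_n.shape[0]
    return (math.pi ** (r / 2) * (c / n) ** (r / 2)
            * math.sqrt(np.linalg.det(Sigma_n)) / math.gamma(r / 2 + 1))
```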


It is clear that making the estimates close to $\theta$ with high probability is equivalent to making $\theta$ lie inside a confidence ellipsoid of sufficiently small volume. Henceforth, instead of forcing $|X_n - \theta| \le \varepsilon$, we shall make $V(E_n) \le \varepsilon^r$.

The assumption $\Sigma_n^{-1} \to \Sigma^{-1}$ implies that

$$U_n^2 - n(X_n - \theta)' \Sigma^{-1} (X_n - \theta) \xrightarrow{P} 0 \qquad (9)$$

and hence $U_n^2 \xrightarrow{D} \chi_r^2$, the chi-square random variable with $r$ degrees of freedom. Therefore, if $c = c_\alpha$, the $(1-\alpha)$-quantile of $\chi_r^2$, we have

$$P(\theta \in E_n) = P(U_n^2 \le c_\alpha) \to P(\chi_r^2 \le c_\alpha) = 1 - \alpha. \qquad (10)$$

For any $\varepsilon > 0$, we can choose $n$ large enough that $V(E_n) \le \varepsilon^r$. Consequently,

$$\rho_{\varepsilon,\alpha} \equiv \frac{\pi c_\alpha (\det \Sigma)^{1/r}}{\varepsilon^2 \bigl(\Gamma(\tfrac{1}{2}r + 1)\bigr)^{2/r}} \le n. \qquad (11)$$
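For completeness, (11) is simply (8) solved for $n$, with $c = c_\alpha$ and $\Sigma_n$ replaced by its limit $\Sigma$:

$$\frac{\pi^{r/2}(c_\alpha/n)^{r/2}(\det \Sigma)^{1/2}}{\Gamma(\tfrac{1}{2}r+1)} \le \varepsilon^r
\iff \frac{c_\alpha}{n} \le \frac{\varepsilon^2\,\Gamma(\tfrac{1}{2}r+1)^{2/r}}{\pi\,(\det\Sigma)^{1/r}}
\iff n \ge \frac{\pi c_\alpha (\det \Sigma)^{1/r}}{\varepsilon^2\,\Gamma(\tfrac{1}{2}r+1)^{2/r}} = \rho_{\varepsilon,\alpha}.$$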

Thus, for any $\varepsilon > 0$, if a confidence ellipsoid with volume not larger than $\varepsilon^r$ is desired, we impose the following stopping rule $N_{\varepsilon,\alpha}$:

$$N_{\varepsilon,\alpha} = \inf\{n;\; Z_n^{\varepsilon,\alpha} \le n\}, \quad \text{where } Z_n^{\varepsilon,\alpha} = \frac{\pi c_\alpha (\det \Sigma_n)^{1/r}}{\varepsilon^2 \bigl(\Gamma(\tfrac{1}{2}r + 1)\bigr)^{2/r}}. \qquad (12)$$
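Operationally, (12) amounts to a check performed after each iteration; the quantile $c_\alpha$ can be obtained from any chi-square routine (here a SciPy call, as an illustrative choice):

```python
import math
import numpy as np
from scipy.stats import chi2

def should_stop(n, Sigma_n, eps, alpha):
    """Stopping rule (12): stop at the first n with Z_n <= n, where
    Z_n = pi * c_alpha * det(Sigma_n)^{1/r} / (eps^2 * Gamma(r/2 + 1)^{2/r})."""
    r = Sigma_n.shape[0]
    c_alpha = chi2.ppf(1.0 - alpha, df=r)          # P(chi2_r <= c_alpha) = 1 - alpha
    z_n = (math.pi * c_alpha * np.linalg.det(Sigma_n) ** (1.0 / r)
           / (eps ** 2 * math.gamma(r / 2 + 1) ** (2.0 / r)))
    return z_n <= n
```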

Our main result is:

Theorem 1. If there exists a sequence $\{\Sigma_n\}$ such that $\Sigma_n \to \Sigma$ w.p. 1, with $\Sigma$ given in (5), and if conditions (A1)-(A3) hold, then

$$\lim_{\varepsilon \to 0} \frac{N_{\varepsilon,\alpha}}{\rho_{\varepsilon,\alpha}} = 1 \quad \text{w.p. 1} \qquad (13)$$

and

$$\lim_{\varepsilon \to 0} P\bigl\{\theta \in E_{N_{\varepsilon,\alpha}} \text{ and } V(E_{N_{\varepsilon,\alpha}}) \le \varepsilon^r\bigr\} = 1 - \alpha. \qquad (14)$$

Remark 2. The construction of the sequence $\{\Sigma_n\}$ and its consistency will be demonstrated in Section 4.

Equation (13) is readily verified; we need only prove (14). To do so, it suffices to establish the asymptotic normality of the stopped process, i.e., to show that

$$\sqrt{N_{\varepsilon,\alpha}}\,\bigl(X_{N_{\varepsilon,\alpha}} - \theta\bigr) \xrightarrow{D} \mathcal{N}(0, \Sigma). \qquad (15)$$

3. Asymptotic normality for the stopped process

It is well known (cf. [4]) that the iterates $X_n$ defined by (4) satisfy, for some $m \ge 1$,

$$\sqrt{n+1}\,(X_{n+1} - \theta) = -\sqrt{1 + \tfrac{1}{n}}\; \frac{1}{\sqrt{n}} \sum_{k=m}^{n} A_{nk}\, \xi_k + o(1) \qquad (16)$$


where $o(1) \to 0$ in probability, and

$$A_{jk} = \begin{cases} \prod_{i=k+1}^{j} \bigl(I + \tfrac{A}{i}\bigr) & \text{if } j > k, \\ I & \text{if } j = k. \end{cases} \qquad (17)$$

Moreover, since $\sqrt{1 + 1/n} = 1 + O(1/n)$, (16) can be further reduced to

$$\sqrt{n+1}\,(X_{n+1} - \theta) = -\frac{1}{\sqrt{n}} \sum_{k=m}^{n} A_{nk}\, \xi_k + o(1) \qquad (18)$$

with $o(1) \to 0$ in probability. By virtue of equation (13), $N_{\varepsilon,\alpha} \to \infty$ and $\rho_{\varepsilon,\alpha} \to \infty$ w.p. 1 as $\varepsilon \to 0$. With some modification of the argument in [4], the following lemma holds.

Lemma 1. Under (A1)-(A3),

$$\sqrt{N_{\varepsilon,\alpha}+1}\,\bigl(X_{N_{\varepsilon,\alpha}+1} - \theta\bigr) = -\frac{1}{\sqrt{N_{\varepsilon,\alpha}}} \sum_{k=m}^{N_{\varepsilon,\alpha}} A_{N_{\varepsilon,\alpha},k}\, \xi_k + o(1). \qquad (19)$$

Due to space limitations, we shall omit the proof. In view of the above lemma, to derive the asymptotic normality we need only work with the expression

$$\frac{1}{\sqrt{N_{\varepsilon,\alpha}}} \sum_{k=m}^{N_{\varepsilon,\alpha}} A_{N_{\varepsilon,\alpha},k}\, \xi_k. \qquad (20)$$

To proceed, we use the martingale central limit theorem in [7] and the idea of random change of time in [8] to establish the desired results.

For any $t \in [0, 1]$, define

$$W_n(t) = \frac{1}{\sqrt{n}} \sum_{k=m}^{[nt]} A_{nk}\, \xi_k, \qquad (21)$$

$$\hat{\Sigma}_n(t) = \frac{1}{n} \sum_{k=m}^{[nt]} A_{nk}\, E(\xi_k \xi_k' \mid \mathcal{F}_{k-1})\, A_{nk}'. \qquad (22)$$

Now $W_n(\cdot) \in D^r[0,1]$ and $\hat{\Sigma}_n(\cdot) \in D^{r \times r}[0,1]$, where these $D$ spaces are the spaces of functions that are right continuous and have left-hand limits, endowed with the Skorohod topology (cf. [7-9]).

Lemma 2. Under (A1)-(A3), $W_n(\cdot) \Rightarrow W(\cdot)$, with $W(\cdot)$ a process with independent Gaussian increments and sample paths in $C^r[0,1]$.

In the above, $\Rightarrow$ denotes weak convergence, and $C^r[0,1]$ denotes the space of continuous $\mathbb{R}^r$-valued functions on $[0,1]$.

Proof. In view of the martingale inequality, for any $\eta > 0$,

$$E\Bigl[\sup_{t \in [0,1]} |W_n(t) - W_n(t-)|\Bigr] \le \eta + \frac{1}{\eta} \sum_{k=m}^{n} E\Bigl[\Bigl|\tfrac{1}{\sqrt{n}} A_{nk}\xi_k\Bigr|^2 \mathbf{1}\Bigl\{\Bigl|\tfrac{1}{\sqrt{n}} A_{nk}\xi_k\Bigr| > \eta\Bigr\}\Bigr].$$

Since $|A_{nk}|^2 \le K_1 (k/n)^{2\lambda_1}$ for some $\lambda_1 > 0$, $K_1 > 0$, and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \Bigl(\frac{k}{n}\Bigr)^{2\lambda_1} = \int_0^1 u^{2\lambda_1}\, du = \frac{1}{2\lambda_1 + 1}, \qquad (23)$$

we have that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=m}^{n} |A_{nk}|^2 \le \lim_{n \to \infty} \frac{K_1}{n} \sum_{k=1}^{n} \Bigl(\frac{k}{n}\Bigr)^{2\lambda_1} = \frac{K_1}{2\lambda_1 + 1} < \infty. \qquad (24)$$

(A2) and (24), together with the arbitrariness of $\eta$, imply that

$$E\Bigl[\sup_{t \in [0,1]} |W_n(t) - W_n(t-)|\Bigr] \to 0 \quad \text{as } n \to \infty.$$

Next, we compute the quadratic variation of $W_n(\cdot)$. Let $W_n^i(\cdot)$ denote the $i$-th component of $W_n(\cdot)$, and let $0 = t_0^m < t_1^m < \cdots < t_{M_m}^m = t$ be a partition of $[0, t]$. If $\max_l |t_{l+1}^m - t_l^m| \to 0$ as $m \to \infty$, then

$$\sum_{l=0}^{M_m - 1} \bigl(W_n^i(t_{l+1}^m) - W_n^i(t_l^m)\bigr)\bigl(W_n^j(t_{l+1}^m) - W_n^j(t_l^m)\bigr) \to \frac{1}{n} \sum_{k=m}^{[nt]} (A_{nk}\xi_k)^i (A_{nk}\xi_k)^j = \tilde{\Sigma}_n^{ij}(t).$$

Hence, the quadratic variation of $W_n(\cdot)$ is given by

$$\tilde{\Sigma}_n^{ij}(t) = \bigl[W_n^i(t), W_n^j(t)\bigr] \qquad (25a)$$

or, in matrix form,

$$\tilde{\Sigma}_n(t) = \frac{1}{n} \sum_{k=m}^{[nt]} A_{nk}\, \xi_k \xi_k'\, A_{nk}'. \qquad (25b)$$

Clearly, $\tilde{\Sigma}_n(t)$ and $\hat{\Sigma}_n(t)$ have the same limit in probability; thus, to compute the limit of $\tilde{\Sigma}_n(t)$, we need only examine the limit of $\hat{\Sigma}_n(t)$.

We claim that $\hat{\Sigma}_n(t) \xrightarrow{P} \Sigma(t)$, with $\Sigma(t)$ given by

$$\Sigma(t) = \begin{cases} \int_{-\ln t}^{\infty} e^{Au} S e^{A'u}\, du & \text{if } t > 0, \\ 0 & \text{if } t = 0. \end{cases}$$


This is an extension of the argument in [4]. For fixed $t > 0$, we have

$$\hat{\Sigma}_n(t) = \sum_{k=1}^{[nt]} \frac{1}{k}\, e^{-A \ln(k/n)} S e^{-A' \ln(k/n)} + \sum_{k=m}^{[nt]} \Bigl(\frac{1}{n} A_{nk} S A_{nk}' - \frac{1}{k}\, e^{-A \ln(k/n)} S e^{-A' \ln(k/n)}\Bigr) + \sum_{k=m}^{[nt]} \frac{1}{n} A_{nk} \bigl(E(\xi_k \xi_k' \mid \mathcal{F}_{k-1}) - S\bigr) A_{nk}'. \qquad (26)$$

The last two terms on the right-hand side of (26) tend to 0 as $n \to \infty$, so we need only evaluate the first term. As $n \to \infty$,

$$\sum_{k=1}^{[nt]} \frac{1}{k}\, e^{-A \ln(k/n)} S e^{-A' \ln(k/n)} = \frac{1}{n} \sum_{k=1}^{[nt]} \frac{n}{k}\, e^{-A \ln(k/n)} S e^{-A' \ln(k/n)} \to \int_0^1 \frac{1}{u}\, e^{-A \ln(ut)} S e^{-A' \ln(ut)}\, du = \int_{-\ln t}^{\infty} e^{Au} S e^{A'u}\, du. \qquad (27)$$
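The last equality in (27) is a change of variables: first $w = ut$, then $w = e^{-v}$ (so that $dw/w = -dv$ and $e^{-A \ln w} = e^{Av}$),

$$\int_0^1 \frac{1}{u}\, e^{-A \ln(ut)} S e^{-A' \ln(ut)}\, du = \int_0^t \frac{1}{w}\, e^{-A \ln w} S e^{-A' \ln w}\, dw = \int_{-\ln t}^{\infty} e^{Av} S e^{A'v}\, dv.$$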

Also, $\hat{\Sigma}_n(0) = 0$ for any $n$. As a consequence, $\hat{\Sigma}_n(t) \xrightarrow{P} \Sigma(t)$ as $n \to \infty$. It is fairly easy to check the following:
(i) $\Sigma(t)$ is continuous on $[0, 1]$;
(ii) $Q'(\Sigma(t) - \Sigma(s))Q > 0$ for all $Q \in \mathbb{R}^r$, $Q \ne 0$, $t > s \ge 0$.
Moreover, it is readily seen that for any $s < t$,

$$E\bigl(W_n^i(t) W_n^j(t) - \tilde{\Sigma}_n^{ij}(t) \mid \mathcal{F}_{[ns]}\bigr) = W_n^i(s) W_n^j(s) - \tilde{\Sigma}_n^{ij}(s).$$

Hence, the $W_n^i(t)W_n^j(t) - \tilde{\Sigma}_n^{ij}(t)$ are martingales. All conditions of Theorem 1.4 (Chapter 7) in [7] are satisfied. By virtue of this theorem, $W_n(\cdot) \Rightarrow W(\cdot)$, and $W(\cdot)$ is a process with independent Gaussian increments and with sample paths in $C^r[0,1]$. Thus, the proof of Lemma 2 is concluded.

Note that (27) yields $\Sigma(1) = \int_0^\infty e^{Au} S e^{A'u}\, du = \Sigma$, and consequently Lemma 2 implies $W_n(1) \xrightarrow{D} \mathcal{N}(0, \Sigma)$.

Without loss of generality, we may assume that $\rho_{\varepsilon,\alpha}$ is an integer. Clearly, as $\varepsilon \to 0$, $\rho_{\varepsilon,\alpha} \to \infty$. Define

$$\Phi_{\varepsilon,\alpha}(t) = \begin{cases} t N_{\varepsilon,\alpha}/\rho_{\varepsilon,\alpha} & \text{if } N_{\varepsilon,\alpha}/\rho_{\varepsilon,\alpha} < 1, \\ t & \text{otherwise.} \end{cases}$$

It is immediate that

$$\sup_{t} |\Phi_{\varepsilon,\alpha}(t) - t| \le \Bigl|\frac{N_{\varepsilon,\alpha}}{\rho_{\varepsilon,\alpha}} - 1\Bigr| \xrightarrow{P} 0.$$

Certainly, $\Phi_{\varepsilon,\alpha}$ converges in probability, in the sense of the Skorohod topology, to $\Phi(t) = t$. Now $(W_{\rho_{\varepsilon,\alpha}}, \Phi_{\varepsilon,\alpha}) \Rightarrow (W, \Phi)$, and both $W$ and $\Phi$ have continuous paths. It follows that $W_{\rho_{\varepsilon,\alpha}} \circ \Phi_{\varepsilon,\alpha} \Rightarrow W \circ \Phi$. In view of the fact that $W \circ \Phi(t) = W(t)$ for any $t \in [0, 1]$, we conclude that $W_{\rho_{\varepsilon,\alpha}} \circ \Phi_{\varepsilon,\alpha} \Rightarrow W$.

Let

$$R_{\varepsilon,\alpha}(t) = \frac{1}{\sqrt{\rho_{\varepsilon,\alpha}}} \sum_{k=m}^{[N_{\varepsilon,\alpha} t]} A_{\rho_{\varepsilon,\alpha},k}\, \xi_k.$$

The definition of $\Phi_{\varepsilon,\alpha}$ then yields

$$W_{\rho_{\varepsilon,\alpha}} \circ \Phi_{\varepsilon,\alpha} = R_{\varepsilon,\alpha} \quad \text{if } \frac{N_{\varepsilon,\alpha}}{\rho_{\varepsilon,\alpha}} \le 1 \qquad (28)$$


and (13) yields that

$$P\Bigl\{\frac{N_{\varepsilon,\alpha}}{\rho_{\varepsilon,\alpha}} > 1\Bigr\} \to 0.$$

Therefore, $R_{\varepsilon,\alpha} \Rightarrow W \circ \Phi = W$. Recall that

$$W_{N_{\varepsilon,\alpha}}(t) = \frac{1}{\sqrt{N_{\varepsilon,\alpha}}} \sum_{k=m}^{[N_{\varepsilon,\alpha} t]} A_{N_{\varepsilon,\alpha},k}\, \xi_k.$$

If $N_{\varepsilon,\alpha}/\rho_{\varepsilon,\alpha} \le 1$,

$$\sup_t |R_{\varepsilon,\alpha}(t) - W_{N_{\varepsilon,\alpha}}(t)| \le \bigl|A_{\rho_{\varepsilon,\alpha},N_{\varepsilon,\alpha}} - I\bigr|\, \sup_t |W_{N_{\varepsilon,\alpha}}(t)| \xrightarrow{P} 0,$$

and hence

$$R_{\varepsilon,\alpha}(\cdot) = W_{N_{\varepsilon,\alpha}}(\cdot) + o(1), \quad \text{where } o(1) \xrightarrow{P} 0. \qquad (29)$$

By virtue of (29), we conclude $W_{N_{\varepsilon,\alpha}}(\cdot) \Rightarrow W(\cdot)$. Consequently, $W_{N_{\varepsilon,\alpha}}(1) \xrightarrow{D} \mathcal{N}(0, \Sigma)$, and hence

$$\sqrt{N_{\varepsilon,\alpha}}\,\bigl(X_{N_{\varepsilon,\alpha}} - \theta\bigr) \xrightarrow{D} \mathcal{N}(0, \Sigma) \quad \text{as } \varepsilon \to 0. \qquad (30)$$

Now it is plain that $U_{N_{\varepsilon,\alpha}}^2 \xrightarrow{D} \chi_r^2$ as $\varepsilon \to 0$. Thus, we have completed the proof of Theorem 1.

4. Construction of $\{\Sigma_n\}$

In equation (5), several terms are unknown. Recall that $A = \frac{1}{2}I - H$. In general, $H$ is an unknown matrix; if $f(\cdot)$ is differentiable (or differentiable locally), then $H = f_x(\theta)$. In addition, the error covariance matrix $S$ is also unknown, since $\{\xi_n\}$ is not observable. To proceed, we construct two consistent sequences $\{H_n\}$ and $\{S_n\}$ such that $H_n \to H$ and $S_n \to S$ as $n \to \infty$. With $H_n$ and $S_n$, we can then define

$$\Sigma_n = \int_0^\infty e^{(I/2 - H_n)t}\, S_n\, e^{(I/2 - H_n)'t}\, dt. \qquad (31)$$

The consistency of $H_n$ and $S_n$ in turn yields $\Sigma_n \to \Sigma$.
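As with (5), the integral (31) need not be computed directly: provided $\frac{1}{2}I - H_n$ is stable, $\Sigma_n$ solves $(\frac{1}{2}I - H_n)\Sigma_n + \Sigma_n(\frac{1}{2}I - H_n)' + S_n = 0$. A sketch, reusing the SciPy solver mentioned in Section 2:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def sigma_n(H_n, S_n):
    """Evaluate (31) by solving (I/2 - H_n) X + X (I/2 - H_n)' + S_n = 0."""
    A_n = 0.5 * np.eye(H_n.shape[0]) - H_n
    return solve_continuous_lyapunov(A_n, -S_n)
```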

4.1. Construction of $\{S_n\}$

A good candidate for $\{S_n\}$ is the 'sample covariance'. Let

$$S_n = \frac{1}{n} \sum_{k=1}^{n} Y_k Y_k' = \frac{1}{n} \sum_{k=1}^{n} \bigl(f(X_k) + \xi_k\bigr)\bigl(f(X_k) + \xi_k\bigr)' = \Bigl(1 - \frac{1}{n}\Bigr) S_{n-1} + \frac{1}{n}\bigl(f(X_n) + \xi_n\bigr)\bigl(f(X_n) + \xi_n\bigr)'. \qquad (32)$$

Lenuna 3. I f (A1)-(A3) are satisfied, then S, --, S w.p. 1.


Proof. Define $e_n = \xi_n \xi_n' - E(\xi_n \xi_n' \mid \mathcal{F}_{n-1})$; then

$$S_n - S = \Bigl(1 - \frac{1}{n}\Bigr)(S_{n-1} - S) + \frac{1}{n} f(X_n) f'(X_n) + \frac{1}{n}\bigl(f(X_n)\xi_n' + \xi_n f'(X_n)\bigr) + \frac{1}{n} e_n + \frac{1}{n}\bigl(E(\xi_n \xi_n' \mid \mathcal{F}_{n-1}) - S\bigr). \qquad (33)$$

To complete the proof, we shall utilize the following lemma.

Lemma 4. Let a sequence $\{x_n\}$ be given by

$$x_{n+1} = x_n - \frac{1}{n} x_n + \frac{1}{n} b_n + \frac{1}{n} c_n, \qquad (34)$$

where $\sum (1/n) b_n$ converges and $c_n \to 0$ as $n \to \infty$. Then $x_n \to 0$ as $n \to \infty$ (cf. [10]).

Define

$$x_n = S_n - S, \quad b_n = e_n + f(X_n)\xi_n' + \xi_n f'(X_n), \quad c_n = f(X_n) f'(X_n) + E(\xi_n \xi_n' \mid \mathcal{F}_{n-1}) - S.$$

By the local martingale convergence theorem in [11], $\sum (1/n) b_n$ converges w.p. 1. In view of the strong consistency of $X_n$, the first term in $c_n$ tends to 0 as $n \to \infty$; by virtue of (A2)(3), the second term also tends to 0. Thus, Lemma 4 yields $S_n - S \to 0$ w.p. 1 as $n \to \infty$. The proof of Lemma 3 is completed.

4.2. Construction of $\{H_n\}$

Here, we shall use the idea developed in [12-14]. In accordance with their approach, we fit a linear regression model $Y = \beta_0 + \beta_1 X + \xi$ to $\{Y_k, X_k\}$ and use $\hat{\beta}_1$ as an estimate of $H$.

To this end, let $\mathbf{X}$ be the $n \times (r+1)$ design matrix with $j$-th row equal to $(1, X_j')$, and let $\mathbf{Y}$ be the $n \times r$ response matrix with $j$-th row equal to $Y_j'$. For sufficiently large $n$, define the $(r+1) \times r$ matrix

$$\hat{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} =: (\hat{\beta}_0, \hat{\beta}_1)'. \qquad (35)$$
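A batch version of (35) is a one-line least squares solve; a sketch with illustrative names (a practical implementation could instead update the estimate recursively, as noted in Remark 3 below):

```python
import numpy as np

def estimate_H(X_hist, Y_hist):
    """Least squares fit of Y ~ beta_0 + beta_1 X as in (35); returns the
    slope block as the estimate of H.  X_hist, Y_hist are (n, r) arrays of
    iterates X_j' and observations Y_j' (array names are ours)."""
    n = X_hist.shape[0]
    design = np.hstack([np.ones((n, 1)), X_hist])   # n x (r+1) design matrix
    beta_hat, *_ = np.linalg.lstsq(design, Y_hist, rcond=None)
    return beta_hat[1:, :].T                        # r x r estimate of H
```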

Remark 3. It is well known that the least squares estimate can be computed recursively; thus the required computation is not burdensome. The idea is to collect the data generated by the stochastic approximation algorithm and use them, via the least squares procedure, to obtain the estimate of $H$.

Lemma 5. Under (A1) and (A2), $\hat{\beta}_1 \to H$ w.p. 1 (cf. [12-14]).

With Lemma 3 and Lemma 5, we conclude the following theorem.

Theorem 2. Under (A1)-(A3), the sequence defined in (31) is strongly consistent; in other words, $\Sigma_n \to \Sigma$ w.p. 1.

5. Concluding remarks

1. A stopping rule for multidimensional stochastic approximation algorithms is developed in this paper. The essence of our approach is to construct an ellipsoidal confidence region with small volume.

2. Equation (13) indicates that $N_{\varepsilon,\alpha}$ is an asymptotically efficient stopping time.

3. In the present paper, we have considered only the simplest model (4). More complex models can be incorporated into our framework, for example, the interesting adaptive stochastic approximation algorithms with $a_n = D_n/n$ replacing $1/n$ in (4).


Acknowledgement

I wish to thank an anonymous referee for careful reading of an early version of this paper and for comments which led to the much relaxed condition on $H$.

References

[1] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist. 22 (1951) 400-407.
[2] R. Sielken, Stopping times for stochastic approximation procedures, Z. Wahrsch. Verw. Geb. 26 (1973) 67-75.
[3] D.F. Stroup and H.I. Braun, On a new stopping rule for stochastic approximation, Z. Wahrsch. Verw. Geb. 60 (1982) 535-554.
[4] M.B. Nevelson and R.Z. Khasminskii, Stochastic Approximation and Recursive Estimation, Translations of Mathematical Monographs, Vol. 47 (AMS, Providence, RI, 1973).
[5] H. Cramér, Mathematical Methods of Statistics (Princeton Univ. Press, Princeton, NJ, 1946).
[6] R.J. Serfling, Approximation Theorems of Mathematical Statistics (J. Wiley & Sons, New York, 1980).
[7] S. Ethier and T.G. Kurtz, Markov Processes (J. Wiley & Sons, New York, 1986).
[8] P. Billingsley, Convergence of Probability Measures (J. Wiley & Sons, New York, 1968).
[9] H.J. Kushner, Approximation and Weak Convergence Methods for Random Processes with Applications to Stochastic Systems Theory (MIT Press, Cambridge, MA, 1984).
[10] Y.M. Zhu, Extensions of the relations of series to infinite products and the convergence of a class of recursive algorithms, Math. Numer. Sinica 7 (1985) 369-376.
[11] Y.S. Chow, Local convergence of martingales and the law of large numbers, Ann. Math. Statist. 36 (1965) 552-558.
[12] T.L. Lai and H. Robbins, Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes, Z. Wahrsch. Verw. Geb. 56 (1981) 329-360.
[13] E.W. Frees and D. Ruppert, Estimation following a Robbins-Monro designed experiment, Technical Report No. 811, Dept. of Statistics, Univ. of Wisconsin, Madison, WI (May 1987).
[14] C.Z. Wei, Asymptotic properties of least squares estimates in stochastic regression models, Ann. Statist. 13 (1985) 1498-1508.