
[IEEE 1995 34th IEEE Conference on Decision and Control - New Orleans, LA, USA (13-15 Dec. 1995)] Proceedings of 1995 34th IEEE Conference on Decision and Control - Necessary and sufficient



Proceedings of the 34th Conference on Decision & Control New Orleans, LA - December 1995 FP03 350

Necessary and Sufficient Conditions for Convergence of Stochastic Approximation Algorithms Under Arbitrary Disturbances*

Sanjeev R. Kulkarni and Charlie S. Horn Department of Electrical Engineering

Princeton University, Princeton, NJ 08544 {kulkarni,horn}@ee.princeton.edu

Abstract A parametrized extension of the Kushner-Clark condition is introduced for the study of convergence of stochastic approximation algorithms. Our results provide necessary and sufficient conditions for convergence that hold in a Hilbert space setting and apply to general gain sequences. These results exhibit the interplay among the noise sequence, the gain sequence, and key properties of the underlying function. The proof is direct, completely deterministic, and is elementary, involving only basic notions of convergence. Some corollaries to our main result are also presented.

1 Introduction

In this paper, we examine the Robbins-Monro algorithm for finding the zero of a function on a Hilbert space, H, based on the equation

x_{n+1} = x_n − a_n (f(x_n) + e_n),   (RM)

where x_n ∈ H is the estimate for the location of the zero, x*, of the function f : H → H, a_n is a sequence of positive constants tending to zero, and e_n ∈ H represents measurement noise. We let ⟨·,·⟩ denote the inner product on H and |·| the corresponding induced norm. A classic result based on the ODE method was provided by Kushner and Clark [6]. Under suitable assumptions on f, the usual assumptions on the gain sequence that a_n → 0 and Σ_n a_n = ∞, and a boundedness assumption on the x_n sequence, they gave a sufficient condition on the noise sequence, known as the Kushner-Clark (or simply KC) condition, for convergence of stochastic approximation algorithms. See, for example, [1, 6, 7] for many other results and extensive bibliographies in this area.
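As a purely illustrative numerical sketch (not part of the paper's analysis), the recursion (RM) can be simulated in one dimension. The function f(x) = tanh(x − x*) and the alternating noise below are hypothetical choices of ours, picked so that f is bounded with a restoring force toward x*, while the weighted noise sums stay small:

```python
import math

def robbins_monro(f, x1, noise, n_steps):
    """Iterate x_{n+1} = x_n - a_n (f(x_n) + e_n) with gains a_n = 1/n."""
    x = x1
    for n in range(1, n_steps + 1):
        x -= (1.0 / n) * (f(x) + noise(n))
    return x

x_star = 1.0
# f(x) = tanh(x - x*): bounded, with (x - x*) f(x) >= tanh(d) |x - x*| for |x - x*| >= d
f = lambda x: math.tanh(x - x_star)
# alternating noise of height 0.5: its weighted partial sums remain bounded
noise = lambda n: 0.5 * (-1) ** n

x_final = robbins_monro(f, x1=4.0, noise=noise, n_steps=100_000)
print(abs(x_final - x_star))  # small: the iterates approach x*
```

With the gains a_n = 1/n, this alternating noise averages out in the sense of the KC-type conditions discussed below, so convergence is expected.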

In this paper, we introduce a parametrized version of a Kushner-Clark condition and show that it characterizes convergence for correspondingly parametrized function classes. In this way, our results exhibit precise tradeoffs among the class of noise sequences under which the algorithms converge, the gain sequence,

*This work was supported in part by the National Science Foundation under grants IRI-9209577, IRI-9457645 and ECS-9216450, by EPRI under grant RPS030-18, and by the Army Research Office under grant DAAL03-92-G-0320.

and key properties of the underlying function. Our main result is a deterministic statement that gives a necessary and sufficient condition for convergence of the algorithm. The result holds for general gain sequences and in a general Hilbert space setting. A special case of the result together with an equivalence result in [10, 11] gives the result that the standard Kushner-Clark condition is necessary as well as sufficient, as shown in [10, 11].

Another interesting aspect of the work presented here is the proof technique, which is based on the approach in [5, 4]. The proof is completely deterministic, remains in a discrete-time setting, and is very elementary, involving only basic notions of convergence. The present paper, as well as other work such as [2], further shows the strengths of an elementary, completely deterministic approach.

Section 2 contains the main ideas and a statement of our main result, which is proved in Section 3. An alternate form for our condition and some corollaries to the main result are discussed in Section 4.

2 Main Result

Before defining our parametrized version of a Kushner-Clark condition and stating our main result, we first provide some insight into our condition and assumptions. Intuitively, if the magnitude of the function is always strictly greater than some positive constant, we would expect the algorithm to be able to tolerate more noise than in the continuous case where f goes to zero. In fact, the algorithm should even be able to tolerate noise which is always greater in magnitude than some small positive constant (which can lead to lack of convergence in the continuous case). We shall examine the noise sequences for which convergence or lack of convergence is provided for various families of functions. Our result exhibits the tradeoff between the set of noise sequences and the size of the family of functions.

We shall examine the following two parametrized assumptions on the function f, which are denoted (B1)(r) and (C1)(r), where r is a parameter that enters into the assumption. The assumptions are as follows:

0-7803-2685-7/95 $4.00 © 1995 IEEE


(B1)(r): ∀δ > 0, ∃h_δ > 0 s.t. |x − x*| ≥ δ ⟹ ⟨f(x), x − x*⟩ ≥ (r + h_δ)|x − x*|

(C1)(r): lim_{δ→0} sup_{|u|≤δ} |f(x* + u)| ≤ r

Roughly speaking, assumption (B1)(r) constrains the restoring force of f away from x* (i.e., the movement towards x* due to the a_n f(x_n) term), and assumption (C1)(r) constrains the magnitude of the function in the neighborhood of x*. We shall see that both of these quantities are important. In particular, the restoring force of f away from x* is a natural quantity to consider when making positive statements of convergence. That is, as long as the "height" of the noise is less than the restoring force of f, the algorithm should converge for all x_1. On the other hand, the magnitude of the function in the neighborhood of x* is a natural quantity to consider when making negative statements of convergence. Namely, if the "height" of the noise is greater than the magnitude of f in the neighborhood of x*, then convergence will not take place for any x_1.
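This contrast can be illustrated numerically. The following sketch (with hypothetical functions and constants of ours, not from the paper) runs the algorithm under a constant-height noise, first against a function whose restoring force stays at 1 away from x*, then against one whose magnitude vanishes at x*:

```python
import math

def run(f, e, x1, n_steps=200_000):
    # x_{n+1} = x_n - a_n (f(x_n) + e_n), with a_n = 1/n and constant noise e
    x = x1
    for n in range(1, n_steps + 1):
        x -= (1.0 / n) * (f(x) + e)
    return x

x_star = 0.0
e = 0.5  # constant noise of height 0.5

# sign-type f: restoring force of magnitude 1 everywhere away from x*
f_sign = lambda x: math.copysign(1.0, x) if x != 0 else 0.0
# tanh-type f: the restoring force vanishes near x*
f_tanh = math.tanh

x_sign = run(f_sign, e, x1=2.0)   # converges to x*
x_tanh = run(f_tanh, e, x1=2.0)   # settles where tanh(u) = -0.5, away from x*
print(abs(x_sign - x_star), abs(x_tanh - x_star))
```

The first run converges because the noise height (0.5) is below the restoring force (1); the second drifts to the point where the function exactly cancels the noise (near −atanh(0.5)), not to x*.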

Assumptions (B1)(r) and (C1)(r) parametrize the family of functions about which we will make convergence or lack of convergence statements. We now introduce a parametrized Kushner-Clark type condition to characterize the set of noise sequences that provide such convergence or lack of convergence.

Definition 1 The noise sequence e_n is said to satisfy the KC′(r) condition if ∀α > r, β > 0 and infinite sets of nonoverlapping intervals {I_k},

|Σ_{n∈I_k} a_n e_n| < α Σ_{n∈I_k} a_n + β

for all but a finite number of k's.

We shall also find it useful to consider the negation of the above statement, namely e_n does not satisfy KC′(r) if ∃ constants α > r, β > 0 and an infinite set of nonoverlapping intervals {I_k} such that for all k, |Σ_{n∈I_k} a_n e_n| ≥ α Σ_{n∈I_k} a_n + β. We also refer to a noise sequence that does not satisfy KC′(r) as a persistently disturbing noise sequence of height > r, since such a noise sequence persistently disturbs the natural convergence properties of the algorithm.
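The defining inequality of a persistently disturbing noise sequence can be checked mechanically on a given family of intervals. This is an illustrative sketch of ours (the function name, interval family, and constants are hypothetical, not from the paper):

```python
def violates_kc(a, e, intervals, alpha, beta):
    """Check the negation of KC'(r): on every interval I_k = [lo, hi),
    |sum_{n in I_k} a_n e_n| >= alpha * sum_{n in I_k} a_n + beta."""
    for (lo, hi) in intervals:
        weighted = abs(sum(a[n] * e[n] for n in range(lo, hi)))
        if weighted < alpha * sum(a[n] for n in range(lo, hi)) + beta:
            return False
    return True

N = 1 << 14
a = [0.0] + [1.0 / n for n in range(1, N)]   # a_n = 1/n (a[0] unused)
e = [1.0] * N                                # constant noise of height 1

# dyadic intervals [2^k, 2^{k+1}): each has sum a_n >= ln 2, so constant
# noise of height 1 beats alpha = 0.9 with margin beta = 0.05
intervals = [(1 << k, 1 << (k + 1)) for k in range(1, 13)]
print(violates_kc(a, e, intervals, alpha=0.9, beta=0.05))  # True
```

A sign-alternating noise of the same height, by contrast, has small weighted sums on these intervals and does not trigger the violation.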

It was shown in [10] that for r = 0, this condition is equivalent to the standard Kushner-Clark condition (hence the KC′ notation). In fact, our parametrized KC′(r) condition suggests that an equivalent parametrized form of the standard Kushner-Clark condition can also be introduced. This is in fact the case, and the paper [11] gives parametrized extensions and addresses the equivalence between a number of conditions used in the literature.


In our main result below, part (a) is a positive statement which gives a necessary and sufficient condition for convergence for one parametrized class of functions, while part (b) is a negative statement which gives a necessary and sufficient condition for lack of convergence for a second parametrized class of functions. Finally, part (c) is a combination of part (a) and part (b). The theorem involves the assumptions described above as well as the following assumptions.

(A1) |f(x)| ≤ K_f ∀x ∈ H
(A2) a_n > 0, a_n → 0, and Σ_{n=1}^∞ a_n = ∞

Theorem 1 Consider the families of functions

F1(r) = {f : f satisfies (A1) and (B1)(r)}

F2(r) = {f : f satisfies (A1) and (C1)(r)}.

Let a_n satisfy (A2), and let x_n be generated according to the Robbins-Monro algorithm (RM). Then

(a) x_n → x* for every f ∈ F1(r) and every x_1 ∈ H iff the noise sequence e_n satisfies KC′(r).

(b) x_n ↛ x* for every f ∈ F2(r) and every x_1 ∈ H iff the noise sequence e_n does not satisfy KC′(r).

(c) for any f ∈ F1(r) ∩ F2(r) and any x_1 ∈ H, x_n → x* iff the noise sequence e_n satisfies KC′(r).

For the special case of r = 0, the results reduce to standard sufficiency results. Also, for r = 0, the proof technique for a one-dimensional version of the algorithm first appeared in [5], and for the general Hilbert space setting in [4], using a direct and elementary approach in a general setting. The result was also the first to prove necessity for general gain sequences and in a general Hilbert space setting. Subsequent work such as [2] provided a similar direct approach for a constrained Robbins-Monro algorithm.

Note that (A1), the boundedness assumption on the function class, is not required for necessity (i.e., x_n → x* implies e_n satisfies KC′(r)). Thus the theorem can be relaxed appropriately.

The above theorem shows the tradeoff between the noise sequences which provide convergence or lack of convergence of the algorithm and the size of the family of functions for which the convergence statement is made. It is straightforward in 1-D to construct examples showing the tightness of the result. Namely, one can show that if the height of f is equal to the "height" of the noise, then the algorithm may or may not converge depending on the details of the noise. Also, it can be shown that a strong result in the form of Theorem 1(c) that works for any function would need to take into account specifics of the function and the initial condition. In contrast, the KC′(r) condition involves only the interplay of the a_n and e_n sequences and is independent of the function and the initial condition.
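The tightness claim can be made concrete in one dimension. The construction below is ours, not from the paper: with f of height 1 near x*, an adversarial noise of height exactly 1 freezes the iterates, so no convergence occurs:

```python
import math

# With f(x) = sign(x - x*) (height 1 near x*), the adversarial noise
# e_n = -sign(x_n - x*) also has height exactly 1 and cancels f exactly,
# so the iterates never move for any x_1 != x*.
x_star = 0.0
x = 2.0
for n in range(1, 10_000):
    f_x = math.copysign(1.0, x - x_star)
    e_n = -f_x                    # noise height exactly equal to the function height
    x -= (1.0 / n) * (f_x + e_n)  # increment is exactly zero
print(x)  # still 2.0: no convergence
```

Other noise sequences of the same height (e.g., sign-alternating ones) do permit convergence, which is the sense in which "the algorithm may or may not converge depending on the details of the noise."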



3 Proof of Main Result

As in [4], we argue that it is enough to consider two basic categories, namely when there is a gross imbalance between the force due to the noise and the force due to the function, and when a detailed balance between the forces must be studied in a situation where the movement of the algorithm is directed in magnitude away from x*. The following three lemmas help us in this regard. The first two are taken directly from [4], while the third is an extension of a result in [4]. The proof of Theorem 1 follows the three lemmas.

Lemma 1 (Gross Imbalance) Assume (A1) and suppose ∃c > 0 and an interval I_k = [L_k, M_k − 1] such that K_f Σ_{n∈I_k} a_n ≤ c/4 and |x_{M_k} − x_{L_k}| ≥ c. Then

|Σ_{n∈I_k} a_n e_n| ≥ 2K_f Σ_{n∈I_k} a_n + c/4.

Lemma 2 ([4]) (a) Let a, y, z ∈ H with |z| ≤ M and let h > 0. Then ∀η ∈ (0, 1) and c ∈ (0, ηh/M] we have that if ⟨z, y⟩ ≥ h and |y − a| ≤ c, then ⟨z, a⟩ ≥ (1 − η)h.
(b) Let δ > 0, ε ≥ 0, and c > 0. Let a, y ∈ H with |a| ≥ δ, |y| ≥ δ, |y| ≥ |a| − 2ε, and |y − a| ≤ c. Then

⟨y − a, a/|a|⟩ ≥ −c²/(2δ) − 2ε.
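The paper takes Lemma 1 from [4] without proof; a short sketch (ours, using only the triangle inequality and the recursion (RM)) runs as follows, applying the hypothesis K_f Σ_{n∈I_k} a_n ≤ c/4 twice:

```latex
\Bigl|\sum_{n\in I_k} a_n e_n\Bigr|
\;\ge\; \Bigl|\sum_{n\in I_k} a_n\bigl(f(x_n)+e_n\bigr)\Bigr|
        - \Bigl|\sum_{n\in I_k} a_n f(x_n)\Bigr|
\;=\; |x_{M_k}-x_{L_k}| - \Bigl|\sum_{n\in I_k} a_n f(x_n)\Bigr|
\;\ge\; c - K_f\sum_{n\in I_k} a_n
\;\ge\; \tfrac{3c}{4}
\;\ge\; 2K_f\sum_{n\in I_k} a_n + \tfrac{c}{4}.
```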

Lemma 3 (Detailed Imbalance) Assume (A1) and (B1)(r) and let c > 0 and ε ≥ 0. Suppose ∃δ > 0 and an interval I_k = [L_k, M_k − 1] such that c/8 ≤ K_f Σ_{n∈I_k} a_n and ∀n ∈ I_k ∪ {M_k}, the inequalities |x_n − x_{L_k}| < c, |x_n − x*| ≥ δ, and |x_n − x*| ≥ |x_{L_k} − x*| − 2ε all hold. Then ∃c_0 > 0 and ε_0(c) > 0 such that if c ∈ (0, c_0] and ε ∈ [0, ε_0(c)], then ∃α > r, β > 0 such that |Σ_{n∈I_k} a_n e_n| ≥ α Σ_{n∈I_k} a_n + β.

Proof Let c_0 = min{ηh_δ/K_f, ((1 − η)δh_δ)/(16K_f)}, where η ∈ (0, 1) is chosen small enough so that (1 − η)(r + h_δ) > r. Let c ∈ (0, c_0]. By assumption, c ∈ (0, ηh_δ/K_f], so that Lemma 2a applies (with M = K_f, h = h_δ, and η chosen above). Then by assumption (B1)(r) and Lemma 2a (with a = x_{L_k} − x*, y = x_n − x*, and z = f(x_n)),

⟨f(x_n), x_{L_k} − x*⟩ ≥ (1 − η)⟨f(x_n), x_n − x*⟩ ≥ (1 − η)(r + h_δ)|x_n − x*| ≥ (1 − η)(r + h_δ)(|x_{L_k} − x*| − 2ε).

Therefore, Σ_{n∈I_k} a_n⟨f(x_n), v_k⟩ ≥ (1 − η)(1 − 2ε/δ)(r + h_δ) Σ_{n∈I_k} a_n, where v_k = (x_{L_k} − x*)/|x_{L_k} − x*|. Also, by Lemma 2b (with a = x_{L_k} − x* and y = x_{M_k} − x*),

⟨x_{M_k} − x_{L_k}, v_k⟩ ≥ −c²/(2δ) − 2ε.

Therefore, by recalling that x_{M_k} = x_{L_k} − Σ_{n∈I_k} a_n(f(x_n) + e_n), we have that

Σ_{n∈I_k} a_n⟨e_n, v_k⟩ ≤ −Σ_{n∈I_k} a_n⟨f(x_n), v_k⟩ + c²/(2δ) + 2ε.

The right-hand side of this inequality consists of a restoring force term due to the function and a term with an opposite sign. If the restoring force term dominates, then the right-hand side is strictly negative. This would imply that

|Σ_{n∈I_k} a_n e_n| ≥ (1 − η)(1 − 2ε/δ)(r + h_δ) Σ_{n∈I_k} a_n − c²/(2δ) − 2ε.

If, in addition, ε is small enough so that (1 − η)(1 − 2ε/δ)(r + h_δ) > r (let ε′ denote the largest such ε), then (1 − η)(1 − 2ε/δ)(r + h_δ) can be written as α + γ, where α > r and γ > 0. Thus,

|Σ_{n∈I_k} a_n e_n| ≥ α Σ_{n∈I_k} a_n + β,

where β = γc/(8K_f) − c²/(2δ) − 2ε. The important point is that β is quadratic in c and is in fact positive if c is fixed small enough (in particular, straightforward computations show that c ∈ (0, c_0] is small enough) and then ε is taken sufficiently small (and again, straightforward computations show that ε ∈ [0, ε_0(c)] is small enough, where ε_0(c) = min{ε′, [(1 − η)h_δc/(16K_f) − c²/(2δ)]/[4(1 + (1 − η)h_δc/(16δK_f))]} > 0). Therefore, for c ∈ (0, c_0] and ε ∈ [0, ε_0(c)], |Σ_{n∈I_k} a_n e_n| ≥ α Σ_{n∈I_k} a_n + β, where α > r and β > 0 are as defined above. □

Proof of Theorem 1 (Part (a)) (⇐) We prove the result by proving the contrapositive. Assume that x_n ↛ x* for some f ∈ F1(r) and some x_1 ∈ H. We shall show that e_n does not satisfy KC′(r). First, note that if |a_n e_n| ↛ 0, then certainly e_n does not satisfy KC′(r). This is true since if e_n satisfies KC′(r), then by choosing the intervals I_k = {k}, k = 1, 2, ..., we would obtain ∀α > r, β > 0, |a_k e_k| < α a_k + β for all but a finite number of k's. Since a_n → 0 by (A2), this implies |a_n e_n| → 0. Therefore, we need only consider the case |a_n e_n| → 0. Then, by (A1) and (A2), we have that the movement per iteration goes to zero, i.e.,

|a_n(f(x_n) + e_n)| ≤ a_n K_f + |a_n e_n| → 0.   (M1)

We consider the following three cases, which characterize lack of convergence. In each of the three cases, we show that there exist natural choices of intervals in which either Lemma 1 or Lemma 3 applies. Since 2K_f > r, we see that the lemmas provide appropriate α > r and β > 0 terms.



Case 1: (0 < liminf |x_n − x*| = limsup |x_n − x*| < ∞)
In this case |x_n − x*| converges to a strictly positive number t. Choose c = c_0 and ε = ε_0(c_0) (where for the definition of c_0 and ε_0(c_0) coming from the proof of Lemma 3, use δ = t/2). Since the limit exists, ∃N_1 < ∞ s.t. ||x_n − x*| − t| ≤ ε ∀n > N_1, i.e., wait long enough so that |x_n − x*| is at most some fixed ε away from its limiting value t. Now pick an infinite sequence of non-overlapping intervals {J_k}, with J_k = [L_k, M_k − 1], by letting L_1 > N_1, L_k > M_{k−1}, and M_k such that c/8 ≤ K_f Σ_{n∈J_k} a_n ≤ c/4, i.e., the intervals should be both long enough to apply Lemma 3 and small enough to apply Lemma 1. Note that assumption (A2) allows us to select the intervals J_k so that they satisfy the desired inequality.

Convergence of the quantity |x_n − x*| implies that x_n will 'slip' at most 2ε back from x_{L_k} towards x*. It can then be argued that either Lemma 1 can be applied on a subinterval of J_k or Lemma 3 can be applied on J_k. This is so because either |x_{s_k} − x_{L_k}| ≥ c for some s_k ∈ J_k ∪ {M_k} or not. If |x_{s_k} − x_{L_k}| ≥ c for some s_k ∈ J_k ∪ {M_k} (i.e., if some point lies in the 2ε annulus but outside the ball of radius c around x_{L_k}), then let I_k = [L_k, s_k − 1]. Since Σ_{n∈I_k} a_n ≤ Σ_{n∈J_k} a_n, Lemma 1 implies that |Σ_{n∈I_k} a_n e_n| ≥ 2K_f Σ_{n∈I_k} a_n + c/4 (as the force due to the function would only allow movement within a c/4 ball). If |x_{s_k} − x_{L_k}| < c for all s_k ∈ J_k ∪ {M_k}, then let I_k = J_k. Then it is easy to verify that ∀n ∈ I_k ∪ {M_k}, the inequalities |x_n − x*| ≥ t − ε, |x_n − x_{L_k}| < c, and |x_n − x*| ≥ |x_{L_k} − x*| − 2ε all hold. Therefore, by Lemma 3, ∃α > r, β > 0 such that |Σ_{n∈I_k} a_n e_n| ≥ α Σ_{n∈I_k} a_n + β.

Case 2: (liminf |x_n − x*| ≠ limsup |x_n − x*|)
In this case, |x_n − x*| must infinitely often alternate between moving near the liminf value and the limsup value. Therefore, there exist constants δ_3 > δ_2 > δ_1 > 0 such that |x_n − x*| ≥ δ_3 infinitely often and δ_1 < |x_n − x*| < δ_2 infinitely often. Let Δ_12 = δ_2 − δ_1 and Δ_23 = δ_3 − δ_2. Now choose c ∈ (0, min{Δ_12, Δ_23/4, c_0}) (where for the definition of c_0 coming from the proof of Lemma 3, use δ = δ_1). From (M1) and (A2), there exists a time after which movements from iteration to iteration are small. In particular, ∃N_2 < ∞ such that |a_n(f(x_n) + e_n)| < c/4 and K_f a_n < c/8 ∀n > N_2.

Now choose an infinite sequence of non-overlapping intervals J_k, with J_k = [L_k, M_k − 1], to denote the occurrences of movement from the liminf value to the limsup value. In other words, let J_k be the time interval representing the k-th occurrence (starting after time N_2) of exiting δ_1 < |x − x*| < δ_2 and then exiting |x − x*| < δ_3 without ever reentering |x − x*| < δ_2. We start J_k at L_k, which we use to represent the last time that {x_n} is in the region δ_1 < |x − x*| < δ_2 before the k-th exit of |x − x*| < δ_3, and we end J_k at M_k − 1, which we use to represent the last time that {x_n} is in |x − x*| < δ_3 before the k-th exit. By our definitions, |x_{M_k} − x*| ≥ δ_3.

Note that the movement in the J_k intervals is away (in magnitude) from x*. It can then be argued that either Lemma 1 can be applied on J_k or a subinterval of J_k, or Lemma 3 can be applied on a subinterval of J_k.

We first ask whether K_f Σ_{n∈J_k} a_n ≤ c or not. If K_f Σ_{n∈J_k} a_n ≤ c, then let I_k = J_k. Since c < Δ_23/4, Lemma 1 implies that |Σ_{n∈I_k} a_n e_n| ≥ 2K_f Σ_{n∈I_k} a_n + c/4 (as |x_{M_k} − x_{L_k}| is large in relation to c). If K_f Σ_{n∈J_k} a_n > c, then let J′_k = [L_k, R_k − 1], where R_k is chosen so that c/8 ≤ K_f Σ_{n∈J′_k} a_n ≤ c/4, i.e., as in Case 1, the intervals should be both long enough to apply Lemma 3 and small enough to apply Lemma 1. We can then proceed similarly to Case 1. Either |x_{s_k} − x_{L_k}| ≥ c for some s_k ∈ J′_k ∪ {R_k} or not. If |x_{s_k} − x_{L_k}| ≥ c for some s_k ∈ J′_k ∪ {R_k} (i.e., if some point lies more than a δ_2 distance away from x* but outside the ball of radius c around x_{L_k}), then let I_k = [L_k, s_k − 1]. Since Σ_{n∈I_k} a_n ≤ Σ_{n∈J′_k} a_n, Lemma 1 implies that |Σ_{n∈I_k} a_n e_n| ≥ 2K_f Σ_{n∈I_k} a_n + c/4 (as the force due to the function would only allow movement within a c/4 ball). Finally, if |x_{s_k} − x_{L_k}| < c for all s_k ∈ J′_k ∪ {R_k}, then let I_k = J′_k. By our definitions of L_k and M_k we have |x_n − x*| ≥ δ_2 for n ∈ I_k ∪ {R_k} − {L_k}, δ_2 > |x_{L_k} − x*| > δ_1, and |x_n − x_{L_k}| < c for all n ∈ I_k ∪ {R_k}. Therefore, by Lemma 3 (with ε = 0), ∃α > r, β > 0 such that |Σ_{n∈I_k} a_n e_n| ≥ α Σ_{n∈I_k} a_n + β.

Case 3: (liminf |x_n − x*| = ∞)
In this case, there exist constants Δ_12, Δ_23 > 0 and an infinite sequence δ_{3,k} > δ_{2,k} > δ_{1,k} > 0, with δ_{2,k} − δ_{1,k} = Δ_12 and δ_{3,k} − δ_{2,k} = Δ_23 for all k, such that x_n moves from δ_{1,k} < |x_n − x*| < δ_{2,k} to |x_n − x*| ≥ δ_{3,k}. With this choice of Δ_12 and Δ_23, proceed exactly as in Case 2 above.

Thus, in each of the three cases, we have shown that e_n violates KC′(r) on an infinite set of nonoverlapping intervals. The constants α, β in each of the three cases depend only on whether Lemma 1 or Lemma 3 was applied and are otherwise independent of the particular intervals. Therefore, e_n does not satisfy KC′(r).

(⇒) Since the set F1(r) ∩ F2(r) is not empty, this direction follows from the (⇐) direction of part (b) presented below.



(Part (b)) (⇐) Assume e_n does not satisfy KC′(r), so that there exist constants α > r, β > 0 and an infinite set of nonoverlapping intervals {I_k = [L_k, M_k − 1]} such that |Σ_{n∈I_k} a_n e_n| ≥ α Σ_{n∈I_k} a_n + β holds. Let v_k be the unit vector representing the direction of Σ_{n∈I_k} a_n e_n.

By assumption (C1)(r), for any f ∈ F2(r) there exists γ > 0 such that |f(x)| < α for |x − x*| < γ. Thus, if |x_n − x*| < γ for all n ∈ I_k, then

|x_{M_k} − x_{L_k}| = |Σ_{n∈I_k} a_n(f(x_n) + e_n)|
  ≥ |⟨Σ_{n∈I_k} a_n(f(x_n) + e_n), v_k⟩|
  ≥ |Σ_{n∈I_k} a_n e_n| − |⟨Σ_{n∈I_k} a_n f(x_n), v_k⟩|
  ≥ α Σ_{n∈I_k} a_n + β − α Σ_{n∈I_k} a_n
  > β.

Thus the iterates would move by more than β within each such interval, which is incompatible with convergence to x*. Therefore x_n ↛ x* for all f ∈ F2(r) and all x_1 ∈ H.

(⇒) Again, since the set F1(r) ∩ F2(r) is not empty, this direction follows from the (⇐) direction of part (a).

(Part (c)) Follows directly from the application of parts (a) and (b) to F1(r) ∩ F2(r). □

4 Equivalent Conditions and Corollaries

The following definition and theorem yield the simple but interesting conclusion that, in finite dimensions, e_n satisfies KC′(r) iff it satisfies KC′(r) in a one-dimensional manner.

Definition 2 A noise sequence e_n is said to satisfy the directional KC′(r) condition if ∀α > r, β > 0, infinite sets of nonoverlapping intervals {I_k}, and corresponding unit directional vectors v_k,

|Σ_{n∈I_k} a_n⟨e_n, v_k⟩| < α Σ_{n∈I_k} a_n + β

for all but a finite number of k's. e_n is said to satisfy the unidirectional KC′(r) condition if there is some fixed v such that e_n satisfies the directional KC′(r) condition with v_k = v.

Note that, for any Hilbert space, directional KC′(r) is equivalent to KC′(r). In finite dimensions, all of the definitions are equivalent.

Theorem 2 For H = R^d, the following are equivalent:
(i) e_n satisfies the KC′(r) condition
(ii) e_n satisfies the directional KC′(r) condition
(iii) e_n satisfies the unidirectional KC′(r) condition

Proof In each case, we prove the contrapositive.
(i) ⇒ (ii): |Σ_{n∈I_k} a_n e_n| = |⟨Σ_{n∈I_k} a_n e_n, v_k⟩|, where v_k is the unit vector in the direction of Σ_{n∈I_k} a_n e_n.
(ii) ⇒ (iii): Let S be the unit sphere, which represents the set of unit directional vectors. Note that |Σ_{n∈I_k} a_n⟨e_n, v⟩| ≥ (1 − ε)(α Σ_{n∈I_k} a_n + β) for all v ∈ S in any open ε-neighborhood, C_ε(v_k), of v_k. This is so since, for v ∈ C_ε(v_k),

|⟨Σ_{n∈I_k} a_n e_n, v⟩| ≥ |⟨Σ_{n∈I_k} a_n e_n, v_k⟩| − |Σ_{n∈I_k} a_n e_n| |v − v_k| ≥ (1 − ε)|Σ_{n∈I_k} a_n e_n| ≥ (1 − ε)(α Σ_{n∈I_k} a_n + β).

Now fix ε small enough so that (1 − ε)α > r. Since S is compact, at least one of the neighborhoods, say C_ε(v*), contains infinitely many v_k. Therefore,

|Σ_{n∈I_k} a_n⟨e_n, v*⟩| ≥ (1 − ε)(α Σ_{n∈I_k} a_n + β)

infinitely often.

We now show how to recover a result for stochastic noise in terms of our form of the Kushner-Clark condition. It is interesting to see how the result that almost surely e_n satisfies KC′(r) arises quite naturally using only Markov's inequality and the Borel-Cantelli lemma. This contrasts with standard martingale techniques (e.g., [8]) for recovering stochastic results.

Corollary 1 Let f be measurable, let assumptions (A1), (A2), and (B1)(r) be satisfied, and assume that the a_n satisfy Σ_{n=1}^∞ a_n² < ∞. Suppose the noise sequence e_n is uncorrelated with lim |E e_n| < r and covariance C_n with Σ_{n=1}^∞ a_n² tr(C_n) < ∞. Then for any x_1 ∈ R^d, x_n → x* almost surely.

Proof By Theorem 1(a), it is enough to show that almost surely the noise sequence e_n satisfies KC′(r). Let α > 0, β > 0 and let {I_k} be any set of non-overlapping intervals. Let s_k = Σ_{n∈I_k} a_n e_n, so that s_k has mean Σ_{n∈I_k} a_n E e_n and covariance Σ_{n∈I_k} a_n² C_n. Then we need to show that almost surely |s_k| > α Σ_{n∈I_k} a_n + β only finitely many times. However, using Markov's inequality, we get

Pr(|s_k − E s_k| > α Σ_{n∈I_k} a_n + β) ≤ [Σ_{n∈I_k} a_n² tr(C_n)] / (α Σ_{n∈I_k} a_n + β)² ≤ [Σ_{n∈I_k} a_n² tr(C_n)] / β²,



where the second inequality follows from the fact that α > 0 and a_n > 0. Therefore, we have

Σ_{k=1}^∞ Pr(|s_k − E s_k| > α Σ_{n∈I_k} a_n + β) ≤ (1/β²) Σ_{n=1}^∞ a_n² tr(C_n) < ∞.

Thus, by the Borel-Cantelli lemma, |s_k − E s_k| > α Σ_{n∈I_k} a_n + β only finitely many times almost surely. Since |s_k − E s_k| ≥ |s_k| − |E s_k| ≥ |s_k| − Σ_{n∈I_k} a_n|E e_n|, it follows that |s_k| − Σ_{n∈I_k} a_n|E e_n| > α Σ_{n∈I_k} a_n + β only finitely many times almost surely. Since lim |E e_n| ≤ r, wait long enough so that |E e_n| is always less than r + α. Therefore |s_k| > (r + 2α) Σ_{n∈I_k} a_n + β only finitely many times almost surely. This is true for all α, β > 0, so in fact, for all α′ > r, β > 0, |s_k| > α′ Σ_{n∈I_k} a_n + β only finitely many times almost surely. □
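A Monte Carlo sketch of this corollary's setting (illustrative choices of ours: d = 1, f(x) = tanh(x − x*), i.i.d. standard Gaussian noise, and a_n = 1/n so that Σ a_n² < ∞):

```python
import math
import random

random.seed(0)
x_star, x = 1.0, 5.0
for n in range(1, 200_000):
    e_n = random.gauss(0.0, 1.0)   # zero mean: lim |E e_n| = 0
    x -= (1.0 / n) * (math.tanh(x - x_star) + e_n)
print(abs(x - x_star))
```

With this seed the final error is tiny; the corollary says such convergence holds almost surely, not just for one sample path.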

For the specific case a_n = 1/n, a necessary and sufficient condition for convergence was given in [3]. The following corollary of Theorem 2 slightly extends the previous result, with simpler proof techniques.

Corollary 2 Let f be measurable and satisfy assumptions (A1), (B1)(r), and (C1)(r), and take any x_1 ∈ H. Then for the specific case of a_n = 1/n, the Robbins-Monro algorithm converges almost surely to x* iff limsup_{N→∞} (1/N)|Σ_{n=1}^N e_n| < r almost surely.

Proof (⇐) Let γ_N = (1/N) Σ_{n=1}^N e_n. By expanding and rearranging terms, one can show that e_N = N γ_N − (N − 1) γ_{N−1}. Let I_k = [L_k, M_k − 1] be any interval. Then

Σ_{n=L_k}^{M_k−1} (1/n) e_n = γ_{M_k−1} − γ_{L_k−1} + Σ_{n=L_k}^{M_k−1} (1/n) γ_{n−1}.

Since limsup |γ_N| < r, we have that ∀c > 0, ∃N_c s.t. |γ_n| < r + c ∀n ≥ N_c. Therefore, ∀α > r, β > 0, and sets of intervals {I_k}, we have that for all intervals I_k starting after N_c, where c = min{α − r, β/2},

|Σ_{n∈I_k} (1/n) e_n| < α Σ_{n∈I_k} (1/n) + β,

which says that e_n satisfies KC′(r) and therefore the algorithm converges.

(⇒) It is easy to examine the original Robbins-Monro equation (RM) directly, but the details are omitted for space reasons. □
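The expand-and-rearrange step in this proof can be checked numerically. The sketch below is ours; it assumes the identity e_N = N γ_N − (N − 1) γ_{N−1} and the interval decomposition it yields (our reconstruction of the garbled displays):

```python
import random

random.seed(1)
e = [0.0] + [random.uniform(-1, 1) for _ in range(500)]  # e[1..500]; e[0] unused

# Cesàro means gamma_N = (1/N) sum_{n=1}^N e_n
gamma = [0.0] * 501
s = 0.0
for n in range(1, 501):
    s += e[n]
    gamma[n] = s / n

# identity: e_n = n*gamma_n - (n-1)*gamma_{n-1}
ok_identity = all(abs(e[n] - (n * gamma[n] - (n - 1) * gamma[n - 1])) < 1e-9
                  for n in range(1, 501))

# decomposition over an interval I_k = [L, M - 1]
L, M = 100, 400
lhs = sum(e[n] / n for n in range(L, M))
rhs = gamma[M - 1] - gamma[L - 1] + sum(gamma[n - 1] / n for n in range(L, M))
print(ok_identity, abs(lhs - rhs) < 1e-9)
```

Both checks pass up to floating-point rounding, which supports the algebra behind bounding |Σ_{n∈I_k} (1/n) e_n| by the Cesàro means.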

References

[1] A. Benveniste, M. Métivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer-Verlag, 1990.

[2] H.F. Chen, "Stochastic Approximation and Its New Applications," Proc. 1994 Hong Kong International Workshop on New Directions of Control and Manufacturing, pp. 2-12, 1994.

[3] D.S. Clark, "Necessary and Sufficient Conditions for the Robbins-Monro Method," Stochastic Processes and their Applications, Vol. 17, pp. 359-367, 1984.

[4] S.R. Kulkarni and C. Horn, "An Alternative Proof for Convergence of Stochastic Approximation Algorithms," submitted to IEEE Transactions on Automatic Control.

[5] S.R. Kulkarni and C. Horn, "Convergence of the Robbins-Monro Algorithm Under Arbitrary Disturbances," Proc. 32nd IEEE Conf. on Decision and Control, pp. 537-538, December 1993.

[6] H.J. Kushner and D.S. Clark, Stochastic Approximation for Constrained and Unconstrained Systems, Springer-Verlag, New York, 1978.

[7] L. Ljung, G. Pflug, and H. Walk, Stochastic Approximation and Optimization of Random Systems, Birkhäuser, 1992.

[8] M. Métivier and P. Priouret, "Applications of a Kushner and Clark Lemma to General Classes of Stochastic Algorithms," IEEE Transactions on Information Theory, Vol. IT-30, pp. 140-151, 1984.

[9] H. Robbins and S. Monro, "A Stochastic Approximation Method," Annals of Mathematical Statistics, Vol. 22, pp. 400-407, 1951.

[10] I-J. Wang, E.K.P. Chong, and S.R. Kulkarni, "Necessity of Kushner-Clark Condition for Convergence of Stochastic Approximation Algorithms," Proceedings of the Allerton Conference, 1994.

[11] I-J. Wang, E.K.P. Chong, and S.R. Kulkarni, "On Equivalence of Some Noise Conditions for Stochastic Approximation Algorithms," Proc. 34th IEEE Conf. on Decision and Control, 1995.
