
Vol. 41 No. 4 SCIENCE IN CHINA (Series E) August 1998

Sharp convergence rates of stochastic approximation for degenerate roots*

FANG Haitao and CHEN Hanfu

(Laboratory of Systems and Control, Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China)

Received January 13, 1998

Abstract  Sharp convergence rates of stochastic approximation algorithms are given for the case where the derivative of the unknown regression function at the sought-for root is zero. The convergence rates obtained are sharp for the general step size used in the algorithms, in contrast to the previous work where they are not sharp for slowly decreasing step sizes; all possible limit points are found for the case where the first matrix coefficient in the expansion of the regression function is normal; and the estimation upper bound is shown to be achieved in the multi-dimensional case, in contrast to the previous work where only the one-dimensional result is proved.

Keywords: stochastic approximation, convergence rate.

Stochastic approximation (SA) has been widely applied in various fields such as optimization, system identification, adaptive control, neural networks, pattern recognition and others. The problem of SA is to seek the root x^0 of an unknown function f(·): R^d → R^d on the basis of noisy observations {y_k} of f:

    y_{k+1} = f(x_k) + e_{k+1},    (1)

where x_k denotes the estimate for x^0 at time k and e_{k+1} is the observation noise.

Reference [1] proposed the following algorithm, which is now called the Robbins-Monro (RM) algorithm:

    x_{k+1} = x_k + a_k y_{k+1},    (2)

where a_k (> 0) is the step size of the algorithm.

The convergence rate of an algorithm is not only of theoretical interest, but also of great importance for applications. It is understandable that the convergence rate of an SA algorithm depends upon whether or not f'(x^0) = 0, and the rate in the nondegenerate case (f'(x^0) ≠ 0) (cf. refs. [2-6]) should be faster than that in the degenerate case (f'(x^0) = 0). The first result for the degenerate case probably belongs to Ljung and Pflug [7]. Then it has been extended to the multidimensional case under fairly general conditions 1). It is shown there that the rate is of the order (log a_n^{-1})^{-1/δ}, δ > 0. By using the rate scale t_n = Σ_{i=0}^{n-1} a_i to replace log a_n^{-1} applied in footnote 1), the present paper has succeeded in obtaining the sharp convergence rates.

* Project supported by the National Climbing Project of China and the National Natural Science Foundation of China (Grant No. 69674006), and also by the Postdoctoral Grant of the Chinese Academy of Sciences to the first author.

1) Chen, H. F., Convergence rate of stochastic approximation algorithms in the degenerate case, to appear in SIAM J. Control Optim.
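The gap between the two rate scales is easy to see numerically. The sketch below is ours, not from the paper, and all names are illustrative; it compares t_n = Σ_{i=0}^{n-1} a_i with log a_n^{-1} for two standard step-size choices: for a_k = 1/(k+1) the two scales are comparable, while for the slowly decreasing a_k = (k+1)^{-1/2} the scale t_n is far larger, which is exactly the regime where a log-based rate is not sharp.

```python
# Illustrative sketch (not from the paper): compare the rate scale
# t_n = sum_{i<n} a_i with log(1/a_n) for two step-size sequences.

import math

def rate_scales(a, n):
    """Return (t_n, log(1/a_n)) for the step-size sequence a(k)."""
    t_n = sum(a(k) for k in range(n))
    return t_n, math.log(1.0 / a(n))

# a_k = 1/(k+1): t_n ~ log n, so the two scales are comparable.
t1, l1 = rate_scales(lambda k: 1.0 / (k + 1), 10_000)

# a_k = 1/sqrt(k+1): t_n ~ 2*sqrt(n), while log(1/a_n) ~ (log n)/2,
# so t_n grows much faster -- the slowly-decreasing-step regime.
t2, l2 = rate_scales(lambda k: (k + 1) ** -0.5, 10_000)

print(t1, l1)   # both of order log n
print(t2, l2)   # t_n of order 200 vs log(1/a_n) of order 5
```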


1  Preliminary results

We first describe the SA algorithm considered in the paper and present conditions for its convergence. Let {M_k} be a sequence of real numbers, M_i > 0, M_i → ∞, and let x* be a fixed point in R^d. The estimate x_k is recursively given by

    x'_{k+1} = x_k + a_k y_{k+1},  x_0 arbitrary,    (3)

    x_{k+1} = x'_{k+1} I{||x'_{k+1}|| ≤ M_{σ_k}} + x* I{||x'_{k+1}|| > M_{σ_k}},    (4)

    σ_k = Σ_{i=0}^{k-1} I{||x'_{i+1}|| > M_{σ_i}},    (5)

    y_{k+1} = f(x_k) + e_{k+1}.    (6)

This is the RM algorithm truncated at randomly varying bounds (cf. refs. [4, 5, 8]). Since M_i diverges to infinity, eqs. (3)-(6) coincide with the RM algorithm starting from a certain time, if {x_k} defined by (3)-(6) can be proved to be bounded.
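A minimal sketch of the truncated algorithm (3)-(6) may help; it is ours, not the paper's code, and the concrete f, noise level, step sizes and truncation bounds are illustrative assumptions.

```python
# Sketch of the randomly-truncated RM algorithm (3)-(6); names and the
# concrete choices of f, a_k, M_k, x_star below are ours, for illustration.

import numpy as np

def truncated_rm(f, noise, a, M, x0, x_star, n_steps, rng):
    """Run (3)-(6): expanding truncations keep the iterates bounded."""
    x = np.asarray(x0, dtype=float)
    sigma = 0                           # number of truncations so far, eq. (5)
    for k in range(n_steps):
        y = f(x) + noise(rng)           # noisy observation, eq. (6)
        x_cand = x + a(k) * y           # plain RM step, eq. (3)
        if np.linalg.norm(x_cand) <= M(sigma):
            x = x_cand                  # accept, eq. (4)
        else:
            x = np.asarray(x_star, dtype=float)  # reset, enlarge the bound
            sigma += 1
    return x, sigma

# Degenerate scalar example f(x) = -x|x| (so f'(0) = 0), small noise.
rng = np.random.default_rng(0)
x_n, num_trunc = truncated_rm(
    f=lambda x: -x * np.abs(x),
    noise=lambda rng: 0.001 * rng.standard_normal(1),
    a=lambda k: 1.0 / (k + 1) ** 0.5,
    M=lambda s: 2.0 ** s,               # M_s -> infinity
    x0=[5.0], x_star=[0.5],
    n_steps=20_000, rng=rng,
)
print(x_n, num_trunc)
```

Starting far outside the first bound forces one early truncation; afterwards the iterates stay bounded and drift slowly toward the degenerate root, as the boundedness discussion above describes.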

Let us list the conditions which will be used later on.

(A1) f(·) is an R^d → R^d measurable and locally bounded function, and f(x) = 0, ∀ x ∈ J, i.e. J is the root set of f(·).

(A2) a_k > 0, a_k → 0 as k → ∞, Σ_{i=1}^∞ a_i = ∞.

(A3) There is a continuously differentiable function v(·): R^d → R such that

    sup_{δ ≤ d(x, J) ≤ Δ} f^T(x) v_x(x) < 0  for all 0 < δ < Δ,

and v(J) is not dense in any interval, where v_x(x) denotes the gradient of v(x), d(x, J) = inf{||x − y||: y ∈ J} and v(J) = {v(x): x ∈ J}.

(A3') The same as (A3) but with J defined by

    J ≜ {x ∈ R^d: v_x(x) = 0 or f(x) = 0}.

(A4) x^0 is the unique root of f(·), and as x → x^0 the function f(x) is expressed as

    f(x) = H(x − x^0) ||x − x^0||^γ + r(x),  γ > 0,    (7)

where H is a stable matrix (i.e. all its eigenvalues have negative real parts) and

    r(x) ∈ R^d,  r(x)/||x − x^0||^{1+γ} → 0  as x → x^0.    (8)

Theorem 1. Assume conditions (A1)-(A3) ((A3')) hold. If there is a constant c_0 such that ||x*|| < c_0 and v(x*) < inf_{||x|| = c_0} v(x), then {x_k} defined by (3)-(6) converges to J = {x ∈ R^d: f(x) = 0} (J = {x ∈ R^d: v_x(x) = 0 or f(x) = 0}), i.e.

    lim_{k→∞} d(x_k, J) = 0,

whenever {e_i} is such that

    lim_{T→0} limsup_{k→∞} (1/T) || Σ_{i=n_k}^{m(n_k, t)} a_i e_{i+1} || = 0,  ∀ t ∈ [0, T],    (9)

for any {n_k} such that {x_{n_k}} converges, where

    m(k, t) = max{n: Σ_{i=k}^{n} a_i ≤ t}.
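For concreteness, the index m(k, t) appearing in (9) can be computed as follows; this is our sketch, and the function name is an assumption.

```python
# Sketch: the window index m(k, t) = max{ n : a_k + ... + a_n <= t }
# used in condition (9); implementation and names are ours.

def window_end(a, k, t):
    """Largest n with a_k + ... + a_n <= t (returns k-1 if a_k > t)."""
    total, n = 0.0, k - 1
    while total + a(n + 1) <= t:
        n += 1
        total += a(n)
    return n

a = lambda k: 1.0 / (k + 1)
# Starting at k = 10 with budget t = 0.2: a_10 + a_11 = 1/11 + 1/12 ~ 0.174,
# and adding a_12 = 1/13 would exceed 0.2, so m(10, 0.2) = 11.
print(window_end(a, 10, 0.2))   # prints 11
```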

For the proof please refer to refs. [4, 5]. When (A3) is replaced by (A3'), the proof is essentially the same.

The following lemma gives sufficient conditions for (9).

Lemma 1. (i) If e_k = e'_k + ν_k with Σ_{k=0}^∞ a_k e'_{k+1} < ∞ and ν_k → 0, then (9) holds.

(ii) If (A2) holds and

    Σ_{k=0}^∞ a_k (t_{k+1})^{1/γ} e_{k+1} < ∞  with  t_k = Σ_{i=0}^{k-1} a_i,

then (9) holds.

Proof. Part (i) is obvious. We now prove part (ii). Setting

    s_n = Σ_{k=0}^{n} a_k (t_{k+1})^{1/γ} e_{k+1},  s_{-1} = 0,    (10)

we have

    Σ_{k=m}^{n} a_k e_{k+1} = Σ_{k=m}^{n} (s_k − s_{k−1}) (t_{k+1})^{−1/γ}
    = s_n (t_{n+1})^{−1/γ} − s_{m−1} (t_{m+1})^{−1/γ} + Σ_{k=m}^{n−1} s_k ((t_{k+1})^{−1/γ} − (t_{k+2})^{−1/γ}).    (11)

Since s_n converges, by (A2) the first two terms on the right-hand side of (11) tend to zero as n → ∞ and m → ∞. The last term in (11) is dominated by

    max_{m≤k≤n} |s_k| Σ_{k=m}^{n−1} |(t_{k+1})^{−1/γ} − (t_{k+2})^{−1/γ}| = (1/γ) max_{m≤k≤n} |s_k| Σ_{k=m}^{n−1} a_{k+1} (t_{k+2})^{−(1+1/γ)} (1 + o(1)),

which tends to zero as n → ∞ and m → ∞, since Σ_{k=0}^∞ a_{k+1} (t_{k+2})^{−(1+1/γ)} < ∞. Hence Σ_{k=0}^∞ a_k e_{k+1} < ∞, and by part (i), (9) holds.

Lemma 2. Under the conditions (A1)-(A3) ((A3')), x_k defined by (3)-(6) converges to J defined in (A3) ((A3')) as k → ∞.

Proof. The conclusion directly follows from Theorem 1 and Lemma 1.

2  Main results

Since the matrix H in (A4) is stable, by the Lyapunov theorem there is a positive definite matrix P > 0 such that

    P H + H^T P = −I.    (12)

Denote by λ_max and λ_min the maximum and minimum eigenvalues of P respectively, and by K the condition number λ_max/λ_min.

Theorem 2. (i) If conditions (A1)-(A4) are satisfied, then for {x_k} defined by (3)-(6)

    limsup_{n→∞} (t_n)^{1/γ} ||x_n − x^0|| ≤ √K (2λ_max/γ)^{1/γ}    (13)

for any {e_i} satisfying the following condition:

    Σ_{k=0}^∞ a_k (t_{k+1})^{1/γ} e_{k+1} < ∞,    (14)

where γ is given by (7), and {t_k} is defined in Lemma 1.

(ii) If, in addition, H is normal, i.e. H^T H = H H^T, then (t_n)^{1/γ} ||x_n − x^0|| converges either to 0 or to one of (−1/(γ Re(λ_j)))^{1/γ}, j = 1, ..., d, where λ_j, j = 1, ..., d, are the eigenvalues of H and γ is given by (7). In the latter case, (t_n)^{1/γ} (x_n − x^0) converges to one of the eigenspaces φ_j spanned by the eigenvectors q_j(1) and q_j(2) of H̄ ≜ (H + H^T)/2 that are respectively the real and imaginary parts of the eigenvector of H corresponding to λ_j.

(iii) Further, if {e_n, F_n} is a martingale difference sequence, where {F_n} is a family of nondecreasing σ-algebras, and for any F_n-measurable unit vector θ,

    E(|θ^T e_{n+1}| | F_n) ≥ b (t_{n+1})^{−1/γ},  b > 0,    (15)

then

    lim_{n→∞} (t_n)^{1/γ} ||x_n − x^0|| = (−1/(γ Re(λ_l)))^{1/γ},    (16)

where λ_l is the eigenvalue of H with the smallest real part in magnitude.
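The limit value in (16) can be checked numerically in the simplest setting. The sketch below is our illustration, not an experiment from the paper: it runs the noise-free one-dimensional recursion with f(x) = −x|x|^γ, for which H = −1, γ = 1 and x^0 = 0, so the predicted limit of (t_n)^{1/γ} |x_n| is (−1/(γ Re λ))^{1/γ} = 1.

```python
# Noise-free sketch of the limit (16) in one dimension: f(x) = -x|x|^gamma
# with H = -1, gamma = 1, degenerate root x^0 = 0, and slowly decreasing
# steps a_k = (k+1)^{-1/2}.  Setup is ours, for illustration only.

gamma = 1.0
x, t = 0.5, 0.0
for k in range(100_000):
    a = (k + 1) ** -0.5
    x = x + a * (-x * abs(x) ** gamma)   # RM step, no noise
    t += a                               # rate scale t_n = sum a_i

scaled = (t ** (1.0 / gamma)) * abs(x)
print(t, x, scaled)    # scaled should be close to the predicted limit 1
```

With these steps t_n ≈ 2√n is large, so the scaled iterate is already close to its limit after 10^5 steps; with a_k = 1/k the same limit would be approached only at the speed of log n.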

Before proving the theorem, we start with some lemmas. Define

    z_n = (t_n)^{1/γ} (x_n − x^0).    (17)

Then we have

    z_{n+1} = (t_{n+1})^{1/γ} (x_{n+1} − x^0)
    = (1 + a_n/t_n)^{1/γ} ((t_n)^{1/γ} (x_{n+1} − x_n + x_n − x^0))
    = (1 + a_n/t_n)^{1/γ} (a_n (t_n)^{1/γ} (H (x_n − x^0) ||x_n − x^0||^γ + r(x_n) + e_{n+1}) + z_n)
    = z_n + (a_n/t_n) h_n(z_n) + a_n (t_{n+1})^{1/γ} e_{n+1},    (18)

where

    h_n(z) = (z/γ)(1 + o(1)) + H ||z||^γ z + (r(x_n)/||x_n − x^0||^{1+γ}) ||z||^{1+γ}.    (19)

Define

    h(z) = z/γ + H ||z||^γ z.    (20)

Lemma 3. Under the conditions (A1)-(A4) and (14), {z_n} is bounded.

Proof. By Lemma 2, x_n → x^0, so there exists an N_0 such that ||x_n − x^0|| < 1 and

    ||r(x_n)|| / ||x_n − x^0||^{1+γ} < 1/(8λ_max) ∧ 1


for any n > N_0. By the definition of z_n, we also have ||z_n|| ≤ (t_n)^{1/γ} for any n > N_0. Noting that

    Σ_{k=0}^∞ a_k (t_{k+1})^{1/γ} e_{k+1} < ∞,  a_n → 0  and  t_n → ∞,

we can find N_1 > N_0 such that for any m > n ≥ N_1,

    || Σ_{k=n}^{m} a_k (t_{k+1})^{1/γ} e_{k+1} || ≤ 1/(2λ_max (1/γ + 1 + ||H||_0)) ∧ 1,    (21)

    a_n ≤ 1/(24 λ_max (1/γ + 1 + ||H||_0)²),    (22)

t_n > 1 and 1 + o(1) ≤ 2 for n > N_1, where ||H||_0 = sup_{||z||=1} ||Hz||.

Define

    M_0 = (t_{N_1})^{1/γ} ∨ max_{n≤N_1} ||z_n|| ∨ 1  and  M = 4K ((λ_max + 1/γ + 1) M_0² + 1).

Let n_0 + 1 = inf{n > N_1: ||z_n|| ≥ M_0}. If n_0 = ∞, then ||z_n|| ≤ M_0 ∨ max_{m≤N_1} ||z_m|| for all n, i.e. {z_n} is bounded. Otherwise, let l_0 + 1 = inf{l > n_0 + 1: ||z_l||² > M}. We need only consider the case l_0 < ∞, since if it is not true, then {z_n} is clearly bounded. Then by (18) we have

    z_{l_0+1}^T P z_{l_0+1}
    = ((a_{l_0}/t_{l_0}) h_{l_0}(z_{l_0}) + a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1} + z_{l_0})^T P ((a_{l_0}/t_{l_0}) h_{l_0}(z_{l_0}) + a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1} + z_{l_0})
    = Φ_1(l_0) + Φ_2(l_0) + Φ_3(l_0, a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1})
      + (a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1} + z_{l_0})^T P (a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1} + z_{l_0}),    (23)

where

    Φ_1(l_0) = (a_{l_0}/t_{l_0}) (h_{l_0}^T(z_{l_0}) P z_{l_0} + z_{l_0}^T P h_{l_0}(z_{l_0})),    (24)

    Φ_2(l_0) = (a_{l_0}/t_{l_0})² h_{l_0}^T(z_{l_0}) P h_{l_0}(z_{l_0}) ≤ (a_{l_0}/t_{l_0})² λ_max ||h_{l_0}(z_{l_0})||²,    (25)

    Φ_3(l_0, a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1}) = (a_{l_0}²/t_{l_0}) (t_{l_0+1})^{1/γ} (e_{l_0+1}^T P h_{l_0}(z_{l_0}) + h_{l_0}^T(z_{l_0}) P e_{l_0+1}).    (26)

In the following we shall prove

    Φ_1(l_0) + Φ_2(l_0) + Φ_3(l_0, a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1}) ≤ 0.    (27)

For this, we note that by (12) and (19),

    Φ_1(l_0) ≤ (a_{l_0}/t_{l_0}) (1 + o(1)) ((2/γ) z_{l_0}^T P z_{l_0} − ||z_{l_0}||^{2+γ} + 2λ_max (||r(x_{l_0})||/||x_{l_0} − x^0||^{1+γ}) ||z_{l_0}||^{2+γ})
    ≤ −(a_{l_0}/(2t_{l_0})) ||z_{l_0}||^{2+γ},    (28)

since ||z_{l_0}||² > M. Further, by ||h_{l_0}(z_{l_0})|| ≤ (2/γ) ||z_{l_0}|| + 2(1 + ||H||_0) ||z_{l_0}||^{1+γ} ≤ 2(1/γ + 1 + ||H||_0) ||z_{l_0}||^{1+γ}, and since t_{l_0} > 1 and ||z_{l_0}||^γ ≤ t_{l_0},

    Φ_2(l_0) ≤ (a_{l_0}/t_{l_0}) · 4λ_max a_{l_0} (1/γ + 1 + ||H||_0)² ||z_{l_0}||^{2+γ} ≤ (a_{l_0}/(4t_{l_0})) ||z_{l_0}||^{2+γ},    (29)

where the last inequality follows from (22). Finally, by (21),

    Φ_3(l_0, a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1}) ≤ 2 (a_{l_0}/t_{l_0}) λ_max ||a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1}|| · ||h_{l_0}(z_{l_0})||
    ≤ 2 (a_{l_0}/t_{l_0}) ||z_{l_0}||^{1+γ} ≤ (a_{l_0}/(4t_{l_0})) ||z_{l_0}||^{2+γ}.    (30)

Taking (28)-(30) into account we have (27). Thus, from (23) and (27) it follows that

    z_{l_0+1}^T P z_{l_0+1} ≤ (a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1} + z_{l_0})^T P (a_{l_0} (t_{l_0+1})^{1/γ} e_{l_0+1} + z_{l_0}).

By the same argument as that used above, we inductively derive

    z_{l_0+1}^T P z_{l_0+1} ≤ (Σ_{i=n_0}^{l_0} a_i (t_{i+1})^{1/γ} e_{i+1} + z_{n_0})^T P (Σ_{i=n_0}^{l_0} a_i (t_{i+1})^{1/γ} e_{i+1} + z_{n_0})    (31)

    ≤ 2 z_{n_0}^T P z_{n_0} + 2λ_max || Σ_{i=n_0}^{l_0} a_i (t_{i+1})^{1/γ} e_{i+1} ||² ≤ 2λ_max M_0² + 2λ_max.

Hence ||z_{l_0+1}||² ≤ 2K(M_0² + 1) ≤ M. This contradicts the definition of l_0. Therefore l_0 must be ∞, and {z_n} is bounded.

Lemma 4 (Theorem A of ref. [9]). If ξ_n ≥ 0 and for any ω ∈ Γ,

    ξ_{n+1} ≥ (1 + b_n A) ξ_n + b_n (e_{n+1} + ν_{n+1}),

where A > 0, b_n → 0, Σ_{n=1}^∞ b_n = ∞, ν_n → 0, {e_n} is a martingale difference sequence and

    liminf_n E(|e_{n+1}| | F_n) > 0,

then P{ω ∈ Γ, ξ_n(ω) → 0} = 0.

We are now in a position to prove our main results.

Proof of Theorem 2. (i) Take

    v(z) = C(z^T P z) z^T P z,    (32)

where C(x) ∈ C¹(R, R) is monotonically nondecreasing,

    C(x) = 1, if x > λ_max (2λ_max/γ)^{2/γ} + 1;  C(x) = 0, if x ≤ λ_max (2λ_max/γ)^{2/γ}.

It is clear that

    v_z(z) = 2 (C'(z^T P z) z^T P z + C(z^T P z)) P z

and, by (12),

    z^T P h(z) = (1/γ) z^T P z + ||z||^γ z^T P H z = (1/γ) z^T P z − (1/2) ||z||^{γ+2} < 0

for any z with z^T P z > λ_max (2λ_max/γ)^{2/γ}. By Lemma 3, {z_n} is bounded, and hence from Theorem 1 it follows that

    limsup_{n→∞} z_n^T P z_n ≤ λ_max (2λ_max/γ)^{2/γ}.

This implies (13).

(ii) Let H be normal. By Theorem 8.1.3 of ref. [10], H is normal if and only if there exists an orthogonal matrix Q such that

    H = Q diag(K_1, ..., K_r) Q^T,    (33)

where each block K_j, j = 1, 2, ..., r, is either a 1×1 block or a 2×2 block

    K_j = [ α_j  β_j ; −β_j  α_j ]

corresponding to a pair of complex conjugate eigenvalues λ_j ≜ α_j + β_j i and λ̄_j. Hence, we have

    H̄ = Q diag(Re(λ_1), ..., Re(λ_d)) Q^T    (34)

and

    H H̄^{−1} + H̄^{−1} H^T = H̄^{−1} H + H̄^{−1} H^T = H̄^{−1} (H + H^T) = 2I,

since H̄ commutes with H for normal H. Taking

    v(z) = z^T H̄^{−1} z + (2γ/(γ + 2)) ||z||^{γ+2},

we find that

    v_z^T(z) h(z) = (2/γ) z^T H̄^{−1} z + 4 ||z||^{γ+2} + 2γ ||z||^{2γ} z^T H̄ z
    = −2γ || (1/γ) (−H̄)^{−1/2} z − (−H̄)^{1/2} ||z||^γ z ||² ≤ 0.

By (34) it is clear that

    {z: v_z^T(z) h(z) = 0} = {z: z/γ + H̄ ||z||^γ z = 0} = {0} ∪ {(−1/(γ Re(λ_j)))^{1/γ} g_j, j = 1, ..., d},    (35)

where g_j is any unit vector belonging to φ_j defined in the formulation of the theorem. Therefore, the assertion (ii) directly follows from Lemma 2 and Theorem 1.

(iii) Let us go back to (34). Corresponding to K_j, j = 1, ..., r, we partition the columns of Q into r groups, Q = (Q_1 ... Q_r), such that the number of columns of Q_j equals the dimension of K_j. Assume λ_j = α_j + β_j i, K_j = [α_j β_j; −β_j α_j] and Q_j = (q_j(1) q_j(2)). In the case where λ_j is real we identify q_j(2) with 0, and λ_j with α_j and K_j. Then

    H^T q_j(1) = Q Q^T H^T Q Q^T q_j(1) = α_j q_j(1) + β_j q_j(2)    (36)

and

    H^T q_j(2) = Q Q^T H^T Q Q^T q_j(2) = −β_j q_j(1) + α_j q_j(2).    (37)

Set

    ε_n = h_n(z_n) − h(z_n),    (38)

    p_n = ((z_n^T q_l(1)) q_l(1) + (z_n^T q_l(2)) q_l(2)) / ((z_n^T q_l(1))² + (z_n^T q_l(2))²)^{1/2},  if (z_n^T q_l(1))² + (z_n^T q_l(2))² ≠ 0;  p_n = q_l(1), otherwise,    (39)

and

    ξ_{n+1} = | p_n^T ((z_{n+1}^T q_l(1)) q_l(1) + (z_{n+1}^T q_l(2)) q_l(2)) |.

By (36)-(38), we derive

    p_n^T ((z_n^T H^T q_l(1)) q_l(1) + (z_n^T H^T q_l(2)) q_l(2))
    = p_n^T (α_l ((z_n^T q_l(1)) q_l(1) + (z_n^T q_l(2)) q_l(2)) + β_l ((z_n^T q_l(2)) q_l(1) − (z_n^T q_l(1)) q_l(2)))
    = α_l ((z_n^T q_l(1))² + (z_n^T q_l(2))²)^{1/2}    (40)

and

    p_n^T (((ε_n + (t_{n+1})^{1/γ} t_n e_{n+1})^T q_l(1)) q_l(1) + ((ε_n + (t_{n+1})^{1/γ} t_n e_{n+1})^T q_l(2)) q_l(2))
    = p_n^T (ε_n + (t_{n+1})^{1/γ} t_n e_{n+1}).    (41)

From (40) and (41) it follows that

    ξ_{n+1} ≥ (1 + (a_n/t_n)(1/γ + α_l ||z_n||^γ)) ξ_n + (a_n/t_n) p_n^T (ε_n + (t_{n+1})^{1/γ} t_n e_{n+1}).    (42)

Define Γ_m = {ω: lim_n ||z_n|| < ((1 − 1/m)(−1/(γ α_l)))^{1/γ}} with m being an integer, and set

    ξ̄_n = ξ_n I{||z_n|| < ((1 − 1/m)(−1/(γ α_l)))^{1/γ}}.

By (42), we have

    ξ̄_{n+1} ≥ (1 + (a_n/t_n)(1/(γm))) ξ̄_n + (a_n/t_n)(ē_{n+1} + ν̄_{n+1}),

since 1/γ + α_l ||z_n||^γ ≥ 1/γ − (1 − 1/m)/γ = 1/(γm) on the indicated set, where

    ē_{n+1} = (t_{n+1})^{1/γ} t_n p_n^T e_{n+1} I{||z_n|| < ((1 − 1/m)(−1/(γ α_l)))^{1/γ}}

and ν̄_{n+1} collects the remaining terms, which contain p_n^T ε_n and the differences of the indicator functions at consecutive times. Since ε_n → 0 and ||z_n|| converges, we see that ν̄_{n+1} → 0. By (15), for any ω ∈ Γ_m,

    E(|ē_{n+1}| | F_n) ≥ (t_{n+1})^{1/γ} t_n · b (t_{n+1})^{−1/γ} = b t_n ≥ b > 0

for sufficiently large n. By Lemma 4 it follows that

    P{Γ_m ∩ {ξ̄_n → 0}} = 0    (43)

for any integer m. For notational convenience let us write q_l as q_l(1) and λ_l as Re(λ_l) if λ_l is real. By (ii) of the theorem we either have (16), or z_n converges to 0 or to some φ_j corresponding to λ_j with Re(λ_j) < Re(λ_l). In the latter case we have z_n^T q_l(1) → 0, z_n^T q_l(2) → 0, and hence ξ̄_n → 0. Therefore, (43) implies P{Γ_m} = 0, and lim_n ||z_n|| ≥ ((1 − 1/m)(−1/(γ α_l)))^{1/γ} a.s. for all sufficiently large m. Tending m to ∞ verifies that (16) also holds in this case.

3  Concluding remarks

In this paper we have given sharp convergence rates of the SA algorithm for the case where the derivative of f(·) at its root x^0 is zero. The precise limit points of the algorithm are given when H is normal, and the limit is unique if, in addition, the noise satisfies certain conditions. For further study we would like to mention the following problems. (i) It would be nice to consider a more general structure than that given by (7) for f(·) in the neighborhood of x^0. (ii) It is interesting to consider more general noise than (14). (iii) It is also interesting to present the limit points of the algorithm for the case where H is not necessarily normal.

References

1 Robbins, H., Monro, S., A stochastic approximation method, Ann. Math. Statist., 1951, 22: 400.
2 Nevelson, M. B., Hasminskii, R. Z., Stochastic Approximation and Recursive Estimation, AMS Translations of Math. Monographs, Vol. 47, Providence: Amer. Math. Soc., 1976.
3 Kushner, H. J., Clark, D. S., Stochastic Approximation for Constrained and Unconstrained Systems, Berlin: Springer, 1978.
4 Chen, H. F., Stochastic approximation and its new applications, Proc. 1994 Hong Kong International Workshop on New Directions of Control and Manufacturing, Hong Kong: Hong Kong UST, 1994, 2-12.
5 Chen, H. F., Zhu, Y. M., Stochastic Approximation (in Chinese), Shanghai: Shanghai Scientific & Technical Publishers, 1996.
6 Pelletier, M., Loi forte quadratique des grands nombres pour les algorithmes stochastiques, C. R. Acad. Sci. Paris, I, 1996, 323: 665.
7 Ljung, L., Pflug, G., Walk, H., Stochastic Approximation of Random Systems, Basel: Birkhäuser, 1992, 71-76.
8 Chen, H. F., Duncan, T., Pasik-Duncan, B., On Ljung's approach to system parameter identification, 10th IFAC Symposium on System Identification (eds. Blanke, M., Söderström, T.), Vol. 2, Preprints, Copenhagen: Danish Information Society, 1995, 667-671.
9 Brandière, O., Duflo, M., Les algorithmes stochastiques contournent-ils les pièges? Ann. Inst. H. Poincaré, 1996, 32: 395.
10 Lewis, D. W., Matrix Theory, Singapore: World Scientific, 1991.