




Systems & Control Letters 35 (1998) 309–315

Convergence analysis of dynamic stochastic approximation

Han-Fu Chen a,∗, Katsuji Uosaki b

a Laboratory of Systems and Control, Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, People’s Republic of China

b Department of Information and Knowledge Engineering, Tottori University, Tottori 680, Japan

Received 19 January 1998; received in revised form 5 May 1998

Abstract

The stochastic approximation method concerns the sequential estimation of the root or the extreme of a function observed with noise. The idea was extended to the moving-root case by Dupač, and the resulting procedure is called the dynamic stochastic approximation method. Its convergence properties are derived here by using a randomly varying truncation technique under weaker conditions in comparison with previous work: the growth rate restriction on the function has been removed, and the condition on the observation noises has been weakened to possibly the weakest one. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Dynamic stochastic approximation; w.p.1 convergence; Randomly varying truncation

1. Introduction

The stochastic approximation (SA) method, originated by Robbins and Monro [9] and Kiefer and Wolfowitz [8], concerns the sequential estimation of the root or the extreme of a function observed with noise, called a regression function. The idea was then extended to estimating the dynamic root of a regression function by Dupač [6], i.e., to the case where the root moves in a specified manner during the approximation process. However, in [7] only the case where the movement of the root is expressed by a linear function of its present location was considered. Further progress in this direction was made by Uosaki [10], who generalized the idea to more general movements with nonlinear or random trends.

∗ Corresponding author. E-mail: [email protected]. This research was supported in part by JSPS, the National Key Project of China and the National Natural Science Foundation of China.

However, in the past work [1], the growth rate restriction on the regression functions (usually required to be pseudo-linear) and the observation noise conditions are rather strong. In the case of conventional (static) stochastic approximation, the growth rate restriction imposed on the regression function has been removed by using a randomly varying truncation technique [2, 3]. In dynamic stochastic approximation, the roots are moving with time and may be unbounded; therefore, the estimates, in general, should not be required to be bounded. It will be shown in this paper how to apply the randomly varying truncation technique to dynamic stochastic approximation in order to remove the growth rate restrictions on the sequence of regression functions.

Concerning the observation noise, we can weaken its condition to possibly the weakest one, similar to those given by Chen [3]. Under standard conditions on the regression function for convergence of the stochastic approximation, the necessary and sufficient conditions that should be satisfied by the observation noise are given by Wang et al. [11].




Under more general conditions on the regression functions, similar necessary and sufficient conditions are also given by Chen [5]. It is worth noting that these noise conditions are equivalent only under additional conditions and that, without additional conditions, the one given by Chen [5] is the weakest. In this paper, the necessary and sufficient noise conditions will also be given for convergence of the dynamic stochastic approximation.

2. Dynamic stochastic approximation with randomly varying truncations

Let $\{M_k(\cdot)\}$ be a sequence of unknown functions $M_k(\cdot):\mathbb{R}^n\to\mathbb{R}^n$ with $M_k(\theta_k)=0$, $k=1,2,\dots$. The problem is to estimate $\theta_k$. Let $x_k$ be the estimate for $\theta_k$ at time $k$ based on the observations $\{y_j,\ j\le k\}$. The evolution of the root $\theta_k$ satisfies the following equation:

$$\theta_{k+1}=g_k(\theta_k)+\varepsilon_k, \tag{1}$$

where $g_k(\cdot):\mathbb{R}^n\to\mathbb{R}^n$ are known functions, while $\{\varepsilon_k\}$ is a sequence of dynamic noises. The observations on $\{M_k(\cdot)\}$ are given by

$$y_{k+1}=M_{k+1}(g_k(x_k))+w_{k+1}, \tag{2}$$

where $\{w_k\}$ is a sequence of observation noises and $w_{k+1}$ is allowed to depend on $(x_k-\theta_k)$.

In what follows the discussion is for a fixed sample, and the analysis is purely deterministic. Let us arbitrarily take $x_1$ as the estimate for $\theta_1$ and define

$$h_1(x)=g_1(x), \qquad h_k(x)=g_k(h_{k-1}(x)) \quad\text{for } k=2,3,\dots. \tag{3}$$

From Eq. (1), we see that $h_k(x_1)$ may serve as a rough estimate for $\theta_{k+1}$. In the sequel, we will impose conditions on $\{g_k(\cdot)\}$ and $\{\varepsilon_k\}$ so that

$$\|h_k(x_1)-\theta_{k+1}\|<\beta<\infty \quad \forall k=1,2,\dots, \tag{4}$$

where $\beta$ is an unknown constant. Therefore, $x_{k+1}-h_k(x_1)$ should not diverge to infinity. But $\beta$ is unknown, so we have to use expanding truncation bounds. This is the idea of the following algorithm. Take a sequence of increasing real numbers $\{K_i\}$ satisfying

$$K_i>0, \qquad K_{i+1}>K_i, \qquad \lim_{i\to\infty}K_i=\infty. \tag{5}$$

Let $\{x_k\}$ be recursively defined by the following algorithm:

$$x_{k+1}=\big(g_k(x_k)+a_ky_{k+1}\big)\,I_{[\|g_k(x_k)+a_ky_{k+1}-h_k(x_1)\|\le K_{\sigma_k}]}+h_k(x_1)\,I_{[\|g_k(x_k)+a_ky_{k+1}-h_k(x_1)\|>K_{\sigma_k}]}, \tag{6}$$

$$\sigma_k=\sum_{i=1}^{k-1}I_{[\|g_i(x_i)+a_iy_{i+1}-h_i(x_1)\|>K_{\sigma_i}]}. \tag{7}$$

Clearly, $\sigma_k$ is the number of truncations that have occurred up to time $k$. The algorithm means that at time $k+1$, the estimate $g_k(x_k)+a_ky_{k+1}$ given by the dynamic stochastic approximation for the unknown parameter $\theta_{k+1}$ is compared with $h_k(x_1)$. If the difference is less than the truncation bound $K_{\sigma_k}$, then the estimate is accepted. Otherwise, we set $x_{k+1}=h_k(x_1)$.

We now list the conditions that will be used later on.

A1. $a_k>0$, $\lim_{k\to\infty}a_k=0$, $\sum_{k=1}^{\infty}a_k=\infty$.

A2. $M_k(\cdot):\mathbb{R}^n\to\mathbb{R}^n$, $M_k(\theta_k)=0$; $M_k(\cdot)$ is measurable, and for any $c>0$ there is a constant $\phi(c)$, possibly depending on $c$, such that $\|M_k(\theta_k+x)\|<\phi(c)$ for all $x$ with $\|x\|\le c$, $\forall k=1,2,\dots$.

A3. $g_k(\cdot):\mathbb{R}^n\to\mathbb{R}^n$ are known and such that for $k=1,2,\dots$,

$$\|d_k(x)\|\le\gamma_k\|x-\theta_k\| \quad \forall x, \qquad\text{where } d_k(x):=g_k(x)-g_k(\theta_k)-(x-\theta_k),$$

with $\gamma_k=o(a_k)$ and $\sum_{k=1}^{\infty}\gamma_k<\infty$.

A4. $\|\varepsilon_k\|=o(a_k)$, $\sum_{k=1}^{\infty}\|\varepsilon_k\|<\infty$.

A5. There is a continuously differentiable function $v(\cdot):\mathbb{R}^n\to\mathbb{R}$ such that $v(x)\ne0$ for all $x\ne0$, $v(0)=0$, and for any $0<r_1<r_2<\infty$,

$$\sup_k\ \sup_{r_1\le\|x-\theta_k\|\le r_2} M_k^{\mathrm T}(x)\,v_x(x-\theta_k) < -a,$$

where $a$ is a positive constant possibly depending on $r_1$ and $r_2$, and $v_x(\cdot)$ denotes the gradient of $v(\cdot)$. It is also required that for $\beta$ in Eq. (4), there exists a constant $r>\beta$ such that

$$\sup_{\|y\|\le\beta}v(y) < \inf_{\|x\|=r}v(x). \tag{8}$$

A6. For any convergent subsequence $\{x_{k_i}-\theta_{k_i}\}$, the observation noise satisfies

$$\lim_{T\to0}\limsup_{i\to\infty}\frac{1}{T}\left\|\sum_{j=k_i}^{m(k_i,t)}a_jw_{j+1}\right\|=0 \quad \forall t\in[0,T],$$

where

$$m(k,T)=\max\left\{m\ \Big|\ \sum_{i=k}^{m}a_i\le T\right\}.$$
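To read the stopping index concretely, here is a minimal Python sketch (ours, not part of the paper) of $m(k,T)$; the gain sequence $a_k=1/k$ is an illustrative choice satisfying Condition A1.

```python
def m_index(a, k, T):
    """m(k, T) = max{m : a_k + a_{k+1} + ... + a_m <= T}, with a 1-indexed: a[1], a[2], ..."""
    total, m = 0.0, k - 1
    while m + 1 < len(a) and total + a[m + 1] <= T:
        m += 1
        total += a[m]
    return m  # if even a_k > T, this returns k - 1 (empty sum)

a = [0.0] + [1.0 / i for i in range(1, 1001)]  # a_k = 1/k; a[0] is an unused placeholder
print(m_index(a, k=10, T=0.5))  # -> 15, since sum_{i=10}^{15} 1/i <= 0.5 < sum_{i=10}^{16} 1/i
```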



We now explain the conditions. Condition A1 is standard. Condition A2 implies local boundedness, but the upper bound should be uniform with respect to $k$. In Condition A3, $d_k(x)$ measures the difference between the estimation error $(x-\theta_k)$ and the prediction error $g_k(x)-g_k(\theta_k)$; in general, $\|g_k(x)-g_k(\theta_k)\|$ should not be essentially greater than $\|x-\theta_k\|$. For example, if $g_k(x)=c+x$ and $\varepsilon_k\equiv0$, then Condition A3 holds with $\gamma_k\equiv0$ and $\theta_{k+1}=c+\theta_k$. Condition A4 means that in the root dynamics the noise should be vanishing.

Condition A5 is about the existence of a Lyapunov function. This kind of condition is unavoidable in convergence analysis of the stochastic approximation method. Inequality (8) is an easy condition. For example, if $v(x)=x^{\mathrm T}x$ or, more generally, if $v(x)\to\infty$ as $\|x\|\to\infty$, then inequality (8) is satisfied. Concerning Condition A6, as will be shown, this condition is also necessary for $x_k-\theta_k\to0$ as $k\to\infty$ if Conditions A1–A5 hold. As a matter of fact, under Conditions A1–A5 it is equivalent to

$$\lim_{T\to0}\limsup_{k\to\infty}\frac{1}{T}\left\|\sum_{j=k}^{m(k,t)}a_jw_{j+1}\right\|=0 \quad \forall t\in[0,T]. \tag{9}$$

However, in nature Condition A6 is weaker than Eq. (9): in some cases, Condition A6 can be established before proving boundedness of $\{x_k-\theta_k\}$, but it is often difficult to verify Eq. (9) without knowing boundedness of $\{x_k-\theta_k\}$.

We now give an example showing a possible application of the algorithm.

Example. Assume that a chemical product is produced in batch mode, and the product quality or quantity of the $k$th batch depends on the temperature in the batch. When the temperature equals the ideal one, the product is optimized. Let $M_k(x)$ denote the deviation of the temperature from its optimal value for the $k$th batch, where $x$ denotes the control parameter, which may be, for example, the pressure in the batch, the quantity of catalytic promoter, the raw material proportion, and others. The deviation reduces to zero if the control $x$ equals its optimal value $\theta_k$, i.e., $M_k(\theta_k)=0$. Because of environmental changes the optimal parameter $\theta_k$ may change from batch to batch. Assume

$$\theta_{k+1}=g_k(\theta_k)+\varepsilon_k, \quad\text{where } g_k(\cdot) \text{ is known and } \varepsilon_k \text{ is the noise.}$$

Let $x_k$ be the estimate for $\theta_k$. Then $g_k(x_k)$ may serve as a prediction for $\theta_{k+1}$. Apply $g_k(x_k)$ as the control parameter for the $(k+1)$th batch. Assume that the temperature deviation $M_{k+1}(g_k(x_k))$ for the $(k+1)$th batch can be observed, but the observation $y_{k+1}$ may be corrupted by noise, i.e.,

$$y_{k+1}=M_{k+1}(g_k(x_k))+w_{k+1},$$

where $w_{k+1}$ is the observation noise. Then we can apply algorithm (6), (7) to estimate $\theta_k$, and under Conditions A1–A6, by Theorem 1 the estimate $x_k$ is consistent, i.e., $\|x_k-\theta_k\|\to0$ as $k\to\infty$.
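To make the procedure concrete, the following Python sketch (our illustration, not part of the original paper) simulates algorithm (6)–(7) on a scalar instance of this example: $g_k(x)=c+x$ (so $\gamma_k\equiv0$ in Condition A3), $M_k(x)=\theta_k-x$, gains $a_k=1/k$, summable dynamic noise $\varepsilon_k$, and i.i.d. observation noise $w_k$. All of these particular choices are assumptions made for the demonstration; they satisfy Conditions A1–A6 (for A5 one may take $v(x)=x^2$).

```python
import numpy as np

# 0-based index k here corresponds to time k+1 in the paper.
rng = np.random.default_rng(0)
N = 20000
c = 0.5                                   # root drift: theta_{k+1} = c + theta_k + eps_k
a = 1.0 / np.arange(1, N + 1)             # gains a_k = 1/k (Condition A1)
eps = rng.normal(size=N) / np.arange(1, N + 1) ** 2  # ||eps_k|| = o(a_k), summable (A4)
K = 10.0 * np.arange(1, N + 1)            # expanding truncation bounds K_i (Eq. (5))

g = lambda x: c + x                       # known root dynamics g_k; gamma_k = 0 in A3
theta = np.empty(N); theta[0] = 3.0       # moving root, Eq. (1)
for k in range(N - 1):
    theta[k + 1] = g(theta[k]) + eps[k]

x = np.empty(N); x[0] = -5.0              # arbitrary initial estimate x_1
h = x[0]                                  # will hold h_k(x_1)
sigma = 0                                 # truncation counter, Eq. (7)
for k in range(N - 1):
    h = g(h)                              # h_k(x_1) = g_k(h_{k-1}(x_1)), Eq. (3)
    M_val = theta[k + 1] - g(x[k])        # M_{k+1}(g_k(x_k)) with M_k(x) = theta_k - x
    y = M_val + 0.1 * rng.normal()        # noisy observation, Eq. (2)
    cand = g(x[k]) + a[k] * y             # untruncated dynamic SA step
    if abs(cand - h) <= K[sigma]:         # inside the current bound: accept (Eq. (6))
        x[k + 1] = cand
    else:                                 # otherwise reset to the rough estimate h_k(x_1)
        x[k + 1] = h
        sigma += 1

print("truncations:", sigma, "| final error |x_k - theta_k| =", abs(x[-1] - theta[-1]))
```

On a typical run the number of truncations remains small and the tracking error decays toward zero, in line with Lemma 3 and Theorem 1 below.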

3. Boundedness of $\{x_k-\theta_k\}$

Set $\delta_k=x_k-\theta_k$. In this section, we show that after a finite number of steps the algorithm becomes untruncated:

$$x_{k+1}=g_k(x_k)+a_ky_{k+1}. \tag{10}$$

As a consequence, $\{\delta_k\}$ is shown to be bounded.

Lemma 1. Under Conditions A3 and A4, the sequence $\{h_k(x_1)-\theta_{k+1}\}$ is bounded for any $x_1$.

Proof. By Eq. (1) and Conditions A3 and A4, we have

$$\begin{aligned}
\|h_k(x_1)-\theta_{k+1}\| &= \|h_k(x_1)-g_k(\theta_k)-\varepsilon_k\| = \|g_k(h_{k-1}(x_1))-g_k(\theta_k)-\varepsilon_k\| \\
&= \|d_k(h_{k-1}(x_1))+h_{k-1}(x_1)-\theta_k-\varepsilon_k\| \\
&\le (1+\gamma_k)\|h_{k-1}(x_1)-\theta_k\|+\|\varepsilon_k\| \\
&\le \prod_{i=1}^{k}(1+\gamma_i)\|x_1-\theta_1\|+\sum_{i=1}^{k}\prod_{j=i+1}^{k}(1+\gamma_j)\|\varepsilon_i\| \\
&\le \prod_{i=1}^{\infty}(1+\gamma_i)\|x_1-\theta_1\|+\sum_{i=1}^{\infty}\prod_{j=1}^{\infty}(1+\gamma_j)\|\varepsilon_i\| =: \beta<\infty \quad\forall k,
\end{aligned}\tag{11}$$



where, by Condition A3, $\prod_{j=1}^{\infty}(1+\gamma_j)<\infty$, and by Condition A4, $\sum_{i=1}^{\infty}\|\varepsilon_i\|<\infty$. □
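As a quick numerical sanity check of the bound (11) (our own illustration, with the summable choices $\gamma_k=k^{-2}$, $\|\varepsilon_k\|=k^{-2}$, and $\|x_1-\theta_1\|=1$), the infinite product and the majorized sum are indeed finite:

```python
import math

K = 10**5
gamma = [1.0 / k**2 for k in range(1, K)]      # sum gamma_k < infty (Condition A3)
eps_norm = [1.0 / k**2 for k in range(1, K)]   # sum ||eps_k|| < infty (Condition A4)

prod = math.exp(sum(math.log1p(g) for g in gamma))  # prod_i (1 + gamma_i), finite since sum gamma_i < infty
beta = prod * (1.0 + sum(eps_norm))                 # crude majorization of the right-hand side of (11)
print(f"prod(1 + gamma_i) ~ {prod:.4f}, beta ~ {beta:.4f}")  # approx 3.676 and 9.72
```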

Lemma 2. Assume Conditions A1–A4 and A6. Let $\{\delta_{k_i}\}$ be a convergent subsequence such that $\delta_{k_i}\to\bar\delta$ as $i\to\infty$. Then there are a sufficiently small $T>0$ and a sufficiently large integer $i_0$ such that for $i\ge i_0$,

$$x_{m+1}=g_m(x_m)+a_my_{m+1}, \tag{12}$$

$$\|\delta_m-\delta_{k_i}\|\le ct \tag{13}$$

for all $m$: $k_i\le m\le m(k_i,t)$, $\forall t\in[0,T]$, where $c$ is a constant independent of $i$.

Proof. In the case $\sigma_k\to\sigma<\infty$ as $k\to\infty$, $\{g_k(x_k)+a_ky_{k+1}-h_k(x_1)\}$ is bounded, and hence $\{x_{k+1}-h_k(x_1)\}$ is bounded. By Lemma 1, $\{h_k(x_1)-\theta_{k+1}\}$ is bounded. Therefore, $\{\delta_k\}$ is bounded. For large $i$, $k_i$ exceeds the last truncation time, and

$$x_{k+1}=g_k(x_k)+a_ky_{k+1} \quad \forall k\ge k_i. \tag{14}$$

The following expression (15) and estimate (16) will be used frequently. By Eq. (1) and Condition A3, we have

$$\begin{aligned}
g_k(x_k)+a_kM_{k+1}(g_k(x_k)) &= g_k(x_k)-g_k(\theta_k)-\varepsilon_k+\theta_{k+1}+a_kM_{k+1}\big(g_k(x_k)-g_k(\theta_k)-\varepsilon_k+\theta_{k+1}\big) \\
&= \theta_{k+1}+d_k(x_k)+\delta_k-\varepsilon_k+a_kM_{k+1}\big(\theta_{k+1}+d_k(x_k)+\delta_k-\varepsilon_k\big)
\end{aligned}\tag{15}$$

and

$$\|g_k(x_k)+a_ky_{k+1}-\theta_{k+1}\| \le (1+\gamma_k)\|\delta_k\|+\|\varepsilon_k\|+a_k\,\phi\big((1+\gamma_k)\|\delta_k\|+\|\varepsilon_k\|\big)+\|a_kw_{k+1}\|. \tag{16}$$

Substituting Eq. (15) into Eq. (14) leads to

$$\|\delta_{m+1}-\delta_{k_i}\| \le \sum_{j=k_i}^{m}\gamma_j\|\delta_j\|+\sum_{j=k_i}^{m}\|\varepsilon_j\|+\sum_{j=k_i}^{m}a_j\,\phi\big((1+\gamma_j)\|\delta_j\|+\|\varepsilon_j\|\big)+\left\|\sum_{j=k_i}^{m}a_jw_{j+1}\right\|. \tag{17}$$

By boundedness of $\{\delta_j\}$ and Condition A3, $\sum_{j=k_i}^{m}\gamma_j\|\delta_j\|=\sum_{j=k_i}^{m}o(a_j)<ct/4$ for some $c>0$. By Condition A4, $\sum_{j=k_i}^{m}\|\varepsilon_j\|<ct/4$, while the last term is also less than $ct/4$ by Condition A6. Without loss of generality, we may assume $\phi\big((1+\gamma_j)\|\delta_j\|+\|\varepsilon_j\|\big)<c/4$. Therefore, $\|\delta_{m+1}-\delta_{k_i}\|\le ct$, and the lemma is true in the case $\sigma_k\to\sigma<\infty$ as $k\to\infty$.

We now consider the case $\sigma_k\to\infty$ as $k\to\infty$. Let $i_0$ be so large that for $i\ge i_0$

$$\|\delta_{k_i}\|<\|\bar\delta\|+1, \qquad \left\|\sum_{j=k_i}^{m(k_i,t)}a_jw_{j+1}\right\|\le c_1T \quad \forall t\in[0,T] \tag{18}$$

with a constant $c_1$, and

$$\gamma_j(2+\|\bar\delta\|)<a_j, \quad \|\varepsilon_j\|<a_j \quad \forall j\ge k_{i_0}, \tag{19}$$

$$K_{\sigma_{k_{i_0}}}>2\|\bar\delta\|+5+2c_1T+\beta+\phi(2\|\bar\delta\|+5), \tag{20}$$

where $\beta$ is given by Eq. (11). Without loss of generality, we may assume

$$a_j<1 \quad \forall j\ge k_{i_0}. \tag{21}$$

Define $c=2+c_1+\phi(2\|\bar\delta\|+5)$, and take $T$ so small that $Tc<1$. We prove the lemma by induction. By Eqs. (11) and (16), we see

$$\begin{aligned}
\big\|g_{k_{i_0}}(x_{k_{i_0}})&+a_{k_{i_0}}\big(M_{k_{i_0}+1}(g_{k_{i_0}}(x_{k_{i_0}}))+w_{k_{i_0}+1}\big)-h_{k_{i_0}}(x_1)\big\| \\
&\le 2\|\delta_{k_{i_0}}\|+1+\beta+\phi(2\|\delta_{k_{i_0}}\|+1)+c_1T \le K_{\sigma_{k_{i_0}}}.
\end{aligned}\tag{22}$$

Therefore, at time $k_{i_0}+1$ there is no truncation. Then by Eqs. (15) and (16),

$$\begin{aligned}
\|\delta_{k_{i_0}+1}-\delta_{k_{i_0}}\| &\le \gamma_{k_{i_0}}\|\delta_{k_{i_0}}\|+a_{k_{i_0}}+a_{k_{i_0}}\phi(2\|\delta_{k_{i_0}}\|+1)+c_1t \\
&\le \gamma_{k_{i_0}}(\|\bar\delta\|+2)+a_{k_{i_0}}+a_{k_{i_0}}\phi\big(2(\|\bar\delta\|+2)+1\big)+c_1t \\
&\le 2a_{k_{i_0}}+a_{k_{i_0}}\phi(2\|\bar\delta\|+5)+c_1t < ct,
\end{aligned}\tag{23}$$

where Eqs. (19) and (21) have been used. Let Eqs. (12) and (13) hold for $m=k_i,\dots,k$. We prove them for $m=k+1$. Again by Eq. (16),

$$\|g_k(x_k)+a_ky_{k+1}-h_k(x_1)\| \le 2(\|\bar\delta\|+2)+1+\beta+\phi(2\|\bar\delta\|+5)+2c_1t < K_{\sigma_{k_{i_0}}} \le K_{\sigma_k} \quad\text{for } k\ge k_{i_0}. \tag{24}$$



Hence there is no truncation at time $k+1$. By the inductive assumptions and Eqs. (15) and (16), it follows that

$$\begin{aligned}
\|\delta_{k+1}-\delta_{k_{i_0}}\| &\le \sum_{j=k_{i_0}}^{k}\Big(\gamma_j\|\delta_j\|+\|\varepsilon_j\|+a_j\,\phi\big((1+\gamma_j)\|\delta_j\|+\|\varepsilon_j\|\big)\Big)+\left\|\sum_{j=k_{i_0}}^{k}a_jw_{j+1}\right\| \\
&\le 2\sum_{j=k_{i_0}}^{k}a_j+\sum_{j=k_{i_0}}^{k}a_j\,\phi(2\|\bar\delta\|+5)+c_1t < ct,
\end{aligned}\tag{25}$$

where Eqs. (18) and (19) are invoked. □

Lemma 3. Under Conditions A1–A6, the number of truncations in Eq. (6) is finite and $\{\delta_k\}$ is bounded.

Proof. Using the argument in the proof of Lemma 2, the boundedness of $\{\delta_k\}$ follows from the boundedness of the number of truncations. Hence, it suffices to show that $\sigma_k\to\sigma<\infty$ as $k\to\infty$. Assume the converse: $\sigma_k\to\infty$ as $k\to\infty$. This means that the sequence $\{g_k(x_k)+a_ky_{k+1}-h_k(x_1)\}$ is unbounded. Let $\{k_i+1\}$ be the sequence of truncation times. We first prove that $\{\delta_k\}$ is also unbounded if $\sigma_k\to\infty$. Assume $\{\delta_k\}$ is bounded; then $\{\delta_{k_i}\}$ is also bounded. From $\{\delta_{k_i}\}$, we select a convergent subsequence, denoted by the same $\{\delta_{k_i}\}$ for notational simplicity, such that $\delta_{k_i}\to\bar\delta$. By assumption, truncation happens at the next time $k_i+1$. This contradicts Eq. (12). Therefore, in the case $\sigma_k\to\infty$, $\{\delta_k\}$ is also unbounded.

Since $\sigma_k\to\infty$, algorithm (6) returns to $h_k(x_1)$ infinitely many times. Let $x_{k_i}=h_{k_i-1}(x_1)$, $i=1,2,\dots$. This means that $\delta_{k_i}=h_{k_i-1}(x_1)-\theta_{k_i}$. By Lemma 1, $\{\delta_{k_i}\}$ is bounded, and by Eq. (11), $\|\delta_{k_i}\|\le\beta$.

Because $\{\delta_k\}$ is unbounded, starting from $k_i$, $\{\delta_k\}$ will exit the ball with radius $r$, where $r$ is given by Eq. (8). Therefore, there is an interval $[\mu_1,\mu_2]\subset\big(\sup_{\|y\|\le\beta}v(y),\,\inf_{\|x\|=r}v(x)\big)$, and for any $i$, there is a sequence $\delta_{m_i},\delta_{m_i+1},\dots,\delta_{\ell_i}$ such that $k_i\le m_i$, $v(\delta_{m_i})\le\mu_1$, $\mu_1<v(\delta_j)<\mu_2$ for all $j$: $m_i<j<\ell_i$, and $v(\delta_{\ell_i})\ge\mu_2$. In other words, the values of $v(\cdot)$ along the sequence $\{\delta_{m_i},\delta_{m_i+1},\dots,\delta_{\ell_i}\}$ cross the interval $[\mu_1,\mu_2]$ from the left. It is clear that $\|\delta_{m_i}\|<r$ $\forall i=1,2,\dots$. Select from $\{\delta_{m_i}\}$ a convergent subsequence, still denoted by $\{\delta_{m_i}\}$, such that $\delta_{m_i}\to\bar\delta$ as $i\to\infty$. It is clear that $\beta\le\|\bar\delta\|\le r$.

From now on, we assume $i$ is large enough and $T$ is small enough so that Lemma 2 is applicable and Eqs. (12) and (13) are valid with $k_i$ replaced by $m_i$. Since $\delta_{m_i}$ converges, by Condition A2 and Eq. (16) it follows that $\delta_{m_i+1}-\delta_{m_i}\to0$ as $i\to\infty$. Hence, we have

$$\lim_{i\to\infty}v(\delta_{m_i})=v(\bar\delta)=\mu_1. \tag{26}$$

By Lemma 2, $\|\delta_j-\delta_{m_i}\|\le cT$ for all $j$: $m_i\le j\le m(m_i,T)$. Noticing $\beta\le\|\bar\delta\|$, for small $T$ we then have

$$\|\delta_j\|\ge\frac{\beta}{2} \quad \forall j:\ m_i\le j\le m(m_i,T). \tag{27}$$

In the following Taylor expansion, $\tilde\delta\in\mathbb{R}^n$ is located in-between $\delta_{m_i}$ and $\delta_{m(m_i,T)}$, and by Lemma 2, $\|\tilde\delta\|\le cT+\|\bar\delta\|+1$. By Eqs. (12) and (15) we have

$$\begin{aligned}
v(\delta_{m(m_i,T)})-v(\delta_{m_i}) &= v_x^{\mathrm T}(\tilde\delta)\sum_{j=m_i}^{m(m_i,T)-1}\big[d_j(x_j)-\varepsilon_j+a_j\big(M_{j+1}(g_j(x_j))+w_{j+1}\big)\big] \\
&= v_x^{\mathrm T}(\tilde\delta)\Bigg[\sum_{j=m_i}^{m(m_i,T)-1}\big(d_j(x_j)-\varepsilon_j\big)+\sum_{j=m_i}^{m(m_i,T)-1}a_jw_{j+1}\Bigg] \\
&\quad+\sum_{j=m_i}^{m(m_i,T)-1}a_j\big(v_x^{\mathrm T}(\tilde\delta)-v_x^{\mathrm T}(g_j(x_j)-\theta_{j+1})\big)M_{j+1}(g_j(x_j)) \\
&\quad+\sum_{j=m_i}^{m(m_i,T)-1}a_j\,v_x^{\mathrm T}(g_j(x_j)-\theta_{j+1})\,M_{j+1}(g_j(x_j)).
\end{aligned}\tag{28}$$

Notice that by Lemma 2 and Eq. (18),

$$\|a_jy_{j+1}\| \le a_j\phi(2\|\bar\delta\|+5)+\left\|\sum_{k=m_i}^{j}a_kw_{k+1}-\sum_{k=m_i}^{j-1}a_kw_{k+1}\right\| \le a_j\phi(2\|\bar\delta\|+5)+2c_1T < \frac{\beta}{4} \tag{29}$$

for sufficiently large $m_i$. From Eqs. (27) and (29), it follows that

$$\|g_j(x_j)-\theta_{j+1}\| = \|g_j(x_j)+\delta_{j+1}-x_{j+1}\| \ge \frac{\beta}{4} \quad\text{for } j=m_i,\dots,m(m_i,T)-1. \tag{30}$$



On the other hand, by Lemma 2,

$$\|g_j(x_j)-\theta_{j+1}\| < \frac{\beta}{4}+\|\delta_{j+1}\| \le \frac{\beta}{4}+\|\delta_{j+1}-\delta_{m_i}\|+\|\delta_{m_i}\| \le \frac{\beta}{4}+cT+r. \tag{31}$$

Identifying $r_1$ and $r_2$ in Condition A5 with $\beta/4$ and $\beta/4+cT+r$, respectively, by Condition A5 we can find $a>0$ such that

$$v_x^{\mathrm T}(g_j(x_j)-\theta_{j+1})\,M_{j+1}(g_j(x_j)) < -a \quad \forall j:\ m_i\le j\le m(m_i,T)-1. \tag{32}$$

Let us consider the right-hand side of Eq. (28). Noticing $\|d_j(x_j)\|\le\gamma_j\|\delta_j\|\le\gamma_j(cT+\|\bar\delta\|+1)$, we have by Conditions A3 and A4

$$\lim_{i\to\infty}\sum_{j=m_i}^{m(m_i,T)-1}\big(d_j(x_j)-\varepsilon_j\big)=0. \tag{33}$$

By Condition A6,

$$\limsup_{i\to\infty}\left\|\sum_{j=m_i}^{m(m_i,T)-1}a_jw_{j+1}\right\| = o(T) \quad\text{as } T\to0. \tag{34}$$

Notice that

$$\begin{aligned}
\|\tilde\delta-(g_j(x_j)-\theta_{j+1})\| &\le \|\tilde\delta-\delta_{m_i}\|+\|\delta_j-\delta_{m_i}\|+\|g_j(x_j)-\theta_{j+1}-\delta_j\| \\
&\le 2cT+\|d_j(x_j)-\varepsilon_j\| \le 2cT+\gamma_j(cT+\|\bar\delta\|+1)+\|\varepsilon_j\| \to 0 \\
&\qquad \forall j:\ m_i\le j\le m(m_i,T)-1
\end{aligned}\tag{35}$$

as $i\to\infty$ and $T\to0$. Hence, by continuity of $v_x(\cdot)$, as $i\to\infty$ and $T\to0$, $v_x^{\mathrm T}(\tilde\delta)-v_x^{\mathrm T}(g_j(x_j)-\theta_{j+1})$ tends to zero. Noticing $\|M_{j+1}(g_j(x_j))\|\le\phi(2\|\bar\delta\|+5)$, we find that the sum of the first and second terms on the right-hand side of Eq. (28) is $o(T)$ as $i\to\infty$ and $T\to0$. Combining this with Eq. (32) yields that for $i\ge i_0$ with sufficiently large $i_0$ and for small enough $T$, from Eq. (28),

$$v(\delta_{m(m_i,T)})-v(\delta_{m_i}) \le -\frac{a}{2}T. \tag{36}$$

By Eq. (26), letting $i$ tend to $\infty$ in Eq. (36), we have

$$\limsup_{i\to\infty}v(\delta_{m(m_i,T)}) \le \mu_1-\frac{a}{2}T. \tag{37}$$

By Lemma 2 we have

$$\lim_{T\to0}\ \max_{m_i\le m\le m(m_i,T)}\big|v(\delta_m)-v(\delta_{m_i})\big|=0. \tag{38}$$

However, by definition, $v(\delta_{m_i})\le\mu_1$, $\mu_1<v(\delta_j)<\mu_2$ for $j$: $m_i<j<\ell_i$, and $v(\delta_{\ell_i})\ge\mu_2$. Hence, from Eq. (38), we must have $m(m_i,T)<\ell_i$ if $T$ is small enough. Therefore, $v(\delta_{m(m_i,T)})\in[\mu_1,\mu_2]$. This contradicts Eq. (37). The obtained contradiction shows that $\lim_{k\to\infty}\sigma_k<\infty$. □

4. Main results

Theorem 1. Under Conditions A1–A6, the estimation error $\delta_k=x_k-\theta_k$ tends to zero as $k\to\infty$.

Proof. We first show that $v(\delta_k)$ converges. Assume the converse:

$$v_1=\liminf_{k\to\infty}v(\delta_k) < \limsup_{k\to\infty}v(\delta_k)=v_2, \tag{39}$$

where $-\infty<v_1<v_2<\infty$ because $\{\delta_k\}$ is bounded by Lemma 3. It is clear that there exists an interval $[\mu_1,\mu_2]$ that does not contain zero such that $[\mu_1,\mu_2]\subset(v_1,v_2)$. Without loss of generality, assume $0<\mu_1<\mu_2$. From Eq. (39), it follows that there are infinitely many crossing sequences, i.e., $v(\delta_{m_i})\le\mu_1$, $v(\delta_{\ell_i})\ge\mu_2$ and $v(\delta_j)\in(\mu_1,\mu_2)$ for $j$: $m_i<j<\ell_i$, $i=1,2,\dots$. Without loss of generality, we may assume $\{\delta_{m_i}\}$ converges: $\delta_{m_i}\to\delta'$. Since $v(\delta')=\mu_1>0$, there is $\rho>0$ such that $\|\delta'\|\ge\rho$, and by Lemma 2, $\|\delta_j\|\ge\rho/2$ for all $j$: $m_i\le j\le m(m_i,T)$. Completely the same argument as that used for Eqs. (28)–(38) leads to a contradiction. Hence $v(\delta_k)$ is convergent.

We now show that $\delta_k\to0$ as $k\to\infty$. Assume the converse: there is a subsequence $\delta_{m_i}\to\delta'\ne0$. By the same argument we again arrive at Eq. (36). Letting $i\to\infty$, by convergence of $\{v(\delta_k)\}$, we obtain the contradictory inequality $0\le-aT/2$. This implies that $\delta_k\to0$ as $k\to\infty$. □

Theorem 2. Assume that Conditions A1–A5 hold and $M_k(\cdot)$ is continuous at $\theta_k$ uniformly in $k$. Then $x_k-\theta_k\to0$ as $k\to\infty$ if and only if Condition A6 holds. Furthermore, under Conditions A1–A5, the following three conditions are equivalent:

(i) Condition A6;



(ii) $\displaystyle \lim_{T\to0}\limsup_{k\to\infty}\frac{1}{T}\left\|\sum_{i=k}^{m(k,t)}a_iw_{i+1}\right\|=0 \quad \forall t\in[0,T]$;

(iii) $w_{k+1}$ can be decomposed into two parts, $w_{k+1}=w'_{k+1}+w''_{k+1}$, so that $\sum_{k=1}^{\infty}a_kw'_{k+1}<\infty$ and $w''_k\to0$ as $k\to\infty$.

Proof. Assume $x_k-\theta_k\to0$ as $k\to\infty$. Then $\{\delta_k\}$ is bounded. We have shown in the proof of Lemma 3 that the number of truncations must be finite if $\{\delta_k\}$ is bounded. Therefore, starting from some $k_0$, algorithm (6) becomes

$$x_{k+1}=g_k(x_k)+a_ky_{k+1}, \quad k\ge k_0. \tag{40}$$

By Eq. (15), we have

$$w_{k+1}=\frac{\delta_{k+1}-\delta_k}{a_k}-\frac{d_k(x_k)-\varepsilon_k}{a_k}-M_{k+1}\big(\theta_{k+1}+d_k(x_k)+\delta_k-\varepsilon_k\big). \tag{41}$$

Set

$$w'_{k+1}=\frac{\delta_{k+1}-\delta_k}{a_k}, \qquad w''_{k+1}=-\frac{d_k(x_k)-\varepsilon_k}{a_k}-M_{k+1}\big(\theta_{k+1}+d_k(x_k)+\delta_k-\varepsilon_k\big). \tag{42}$$

By Conditions A3 and A4 and $\delta_k\to0$, we have $(d_k(x_k)-\varepsilon_k)/a_k\to0$ as $k\to\infty$, while $M_{k+1}(\theta_{k+1}+d_k(x_k)+\delta_k-\varepsilon_k)$ tends to zero because $M_k(\cdot)$ is uniformly continuous at $\theta_k$ and $d_k(x_k)+\delta_k-\varepsilon_k\to0$. Moreover, $\sum_{k=1}^{\infty}a_kw'_{k+1}=\sum_{k=1}^{\infty}(\delta_{k+1}-\delta_k)$ converges since $\delta_k\to0$. Consequently, (iii) holds.

On the other hand, it is clear that (iii) implies (ii), which in turn implies Condition A6. By Theorem 1, under Conditions A1–A5, Condition A6 implies $x_k-\theta_k\to0$ as $k\to\infty$. Thus, the equivalence of (i)–(iii) has been justified under Conditions A1–A5. □
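To illustrate the decomposition in (iii) (our own construction, not from the paper): take for $w'_{k+1}$ bounded i.i.d. zero-mean variables, so that $\sum_k a_kw'_{k+1}$ converges almost surely whenever $\sum_k a_k^2<\infty$, plus a deterministic bias $w''_k=1/\sqrt{k}\to0$. The sketch below checks the averaged-noise condition (ii) numerically along one sample path.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10**6
k = np.arange(1, N + 1)
a = 1.0 / k                            # gains a_k = 1/k: Condition A1, and sum a_k^2 < infty
w1 = rng.choice([-1.0, 1.0], size=N)   # w'_{k+1}: bounded i.i.d., zero mean
w2 = 1.0 / np.sqrt(k)                  # w''_{k+1}: deterministic bias, -> 0
w = w1 + w2                            # observation noise built as in (iii)

# sum_k a_k w'_{k+1} converges a.s. (martingale convergence, since sum a_k^2 < infty):
print("partial sums of a_k w'_{k+1}:", np.cumsum(a * w1)[[999, 99_999, N - 1]])

# condition (ii): (1/T) |sum_{i=k}^{m(k,t)} a_i w_{i+1}| should be small for large k, small T
T, start = 0.1, N // 2
gain_sums = np.cumsum(a[start:])
stop = np.searchsorted(gain_sums, T)   # number of terms whose a-mass stays <= T
windowed = np.abs(np.cumsum(a[start:] * w[start:])[:stop])
print("max_t |sum a_i w_{i+1}| / T ~", windowed.max() / T)  # small, consistent with (ii)
```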

5. Concluding remarks

We have applied a stochastic approximation algorithm to track moving roots. The tracking error is shown to converge to zero. No growth rate restrictions are imposed on the functions whose roots are tracked. The noise condition is not only sufficient but also necessary.

Under some additional conditions on $\{M_k(\cdot)\}$ and $\{w_k\}$, the convergence rate and asymptotic normality can also be established. This is similar to the ordinary (without dynamics) stochastic approximation case. Similar results can be established for Kiefer–Wolfowitz-type algorithms. Similar to the results by Chen [4], convergence of the continuous-time dynamic stochastic approximation can also be proved. To consider more general dynamics of the roots is of interest.

References

[1] A. Benveniste, M. Métivier, P. Priouret, Adaptive Algorithms and Stochastic Approximation, Springer, Berlin, 1990.

[2] H.-F. Chen, Y.-M. Zhu, Stochastic approximation procedures with randomly varying truncations, Scientia Sinica Ser. A 29 (1986) 914–926.

[3] H.-F. Chen, Stochastic approximation and its new applications, Proc. 1994 Hong Kong International Workshop on New Directions of Control and Manufacturing, 1994, pp. 2–12.

[4] H.-F. Chen, Continuous-time stochastic approximation: convergence and asymptotic efficiency, Stochastics Stochastics Rep. 51 (1994) 111–132.

[5] H.-F. Chen, Recent developments in stochastic approximation, Proc. IFAC World Congr. D (1996) 375–380.

[6] V. Dupač, A dynamic stochastic approximation method, Ann. Math. Statist. 36 (1965) 1695–1702.

[7] V. Dupač, Stochastic approximation in the presence of trend, Czechoslovak Math. J. 16 (1966) 454–461.

[8] J. Kiefer, J. Wolfowitz, Stochastic estimation of the maximum of a regression function, Ann. Math. Statist. 23 (1952) 462–466.

[9] H. Robbins, S. Monro, A stochastic approximation method, Ann. Math. Statist. 22 (1951) 400–407.

[10] K. Uosaki, Some generalizations of dynamic stochastic approximation processes, Ann. Statist. 2 (1974) 1042–1048.

[11] I-J. Wang, E.K.P. Chong, S.R. Kulkarni, Equivalent necessary and sufficient conditions on noise sequences for stochastic approximation algorithms, Adv. Appl. Probab. 28 (1996) 784–801.