CONVERGENCE RATE OF STOCHASTIC APPROXIMATION ALGORITHMS IN THE DEGENERATE CASE*

HAN-FU CHEN†

SIAM J. CONTROL OPTIM., Vol. 36, No. 1, pp. 100–114, January 1998. © 1998 Society for Industrial and Applied Mathematics.
Abstract. Let f(·) be an unknown function whose root x_0 is sought by stochastic approximation (SA). Convergence rate and asymptotic normality are usually established for the nondegenerate case f′(x_0) ≠ 0. This paper demonstrates the convergence rate of SA algorithms for the degenerate case f′(x_0) = 0. In comparison with previous work, in this paper no growth rate restriction is imposed on f(·), no statistical property is required of the measurement noise, a general step size is considered, and the result is obtained for the multidimensional case, which is not a straightforward extension of the one-dimensional result. Although the observation noise may be either deterministic or random, the analysis is purely deterministic and elementary.
Key words. stochastic approximation, convergence rate
AMS subject classification. 62L20
PII. S0363012995281730
1. Introduction. The topic of SA is to search for the roots or extrema of an unknown function f(·) : R^l → R^l which can be observed only with noise. Since the pioneering work of Robbins and Monro [1], SA has received much attention from researchers [2, 3] and is applied in various areas, such as parameter identification, adaptive control, optimization, pattern recognition, and others [4].
In many applications not only the convergence but also the convergence rate of the algorithm is of interest. Intuitively, the rate of convergence depends on the derivative f′(x_0) of the function at its root x_0; the rate in the nondegenerate case (f′(x_0) ≠ 0) should be faster than in the degenerate case (f′(x_0) = 0). To be precise, the Robbins–Monro algorithm is defined by
x_{n+1} = x_n + a_n y_{n+1},  (1.1)

y_{n+1} = f(x_n) + ε_{n+1},  (1.2)

where y_{n+1} is the observation and ε_{n+1} is the noise. The step size {a_n} is selected to have the following properties:

a_n > 0,  a_n → 0 as n → ∞,  and  ∑_{i=1}^∞ a_i = ∞.
Under certain conditions [1–4, 7] imposed on f(·) and ε_n, the iterate x_n defined by (1.1), (1.2) converges to the root x_0 of f(·), i.e.,

x_n → x_0 as n → ∞,  f(x_0) = 0.
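The recursion (1.1), (1.2) is easy to exercise numerically. The following is a minimal sketch, assuming a hypothetical scalar test function f(x) = −(x − 2) with root x_0 = 2 and i.i.d. Gaussian observation noise (both choices are illustrative, not taken from the paper), together with the classical step size a_n = 1/n:

```python
import numpy as np

def robbins_monro(f, x0, n_iter, noise_std, seed=0):
    """Robbins-Monro recursion x_{n+1} = x_n + a_n * y_{n+1},
    where y_{n+1} = f(x_n) + eps_{n+1} is a noisy observation of f."""
    rng = np.random.default_rng(seed)
    x = x0
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n                                 # a_n > 0, a_n -> 0, sum a_n = inf
        y = f(x) + noise_std * rng.standard_normal()  # noisy measurement of f(x_n)
        x += a_n * y
    return x

# Hypothetical test function with root x_0 = 2; here f'(x_0) = -1 (nondegenerate case).
root = robbins_monro(lambda x: -(x - 2.0), x0=0.0, n_iter=100_000, noise_std=0.1)
```

With the nondegenerate f above, the iterate settles near the root quickly; replacing f by a degenerate function such as −x|x| makes the approach to the root logarithmically slow, which is the phenomenon studied in this paper.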
Further, in the nondegenerate multidimensional case assume

f(x) = H(x − x_0) + Δ(x),  H < 0,  (1.3)
*Received by the editors February 17, 1995; accepted for publication (in revised form) October 9, 1996. This research was supported by the National Natural Science Foundation of China. http://www.siam.org/journals/sicon/36-1/28173.html

†Laboratory of Systems and Control, Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, People's Republic of China ([email protected]).
Downloaded 11/20/14 to 132.206.205.6. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
Δ(x) = O(‖x − x_0‖^2) as x → x_0.  (1.4)

Then under some conditions on the observation noise,

‖x_n − x_0‖ = o(a_n^δ)  ∀δ ∈ (0, 1/2),  (1.5)

provided H + qδI is a stable matrix, where q by assumption is defined by

a_{n+1}^{-1} − a_n^{-1} → q ≥ 0 as n → ∞.
The convergence rate in the case f′(x_0) = 0 was addressed in [5] for the special case where (i) f(·) is a scalar function, i.e., l = 1, and f(x) grows no faster than linearly as |x| → ∞; (ii) (x − x_0)f(x) < 0 ∀x ≠ x_0; (iii) f(x) = f_0|x − x_0|^{1+γ} sign(x − x_0)(1 + o(1)) as x → x_0, γ > 0; (iv) the conditional variance of ε_{n+1} given x_n is bounded, i.e., Var(ε_{n+1}|x_n) ≤ σ^2; (v) ε_{n+1} is conditionally independent of x_0, …, x_{n−1} given x_n; and (vi) the step size is special: a_n = 1/n.

In comparison with [5], this paper derives the convergence rate for the general case. To be precise, we do not impose any growth rate restriction on f(·); we do not require any statistical property of the noise, which is allowed to be stochastic or deterministic; we consider a general step size a_n; and, finally, we give the convergence rate for both the multidimensional and one-dimensional cases. The approach used here is completely different from that used in [5] and is purely deterministic. A purely deterministic approach in a discrete setting was used in [9, 10] as an alternative means for obtaining convergence results, and the approach used here is similar in flavor. We further show the power of an elementary deterministic analysis by obtaining convergence rates. It is worth noting that the extension from the one-dimensional result to the multidimensional case is not straightforward. As will be seen in section 2, in the multidimensional case only an upper bound is obtained, while in the one-dimensional case it is shown that the upper bound is attainable.
2. Main results. Before describing the main results of the paper we present a convergence result, proved in [4, 6]. The algorithm considered in this paper is a modified version of (1.1), (1.2) and is defined as follows.
Let {M_k} be a sequence of real numbers, M_k > 0, M_k ↑ ∞, and let x* be a fixed point in R^l. The estimate x_n is recursively given by
x̄_{k+1} = x_k + a_k y_{k+1},  x_0 arbitrary,  (2.1)

x_{k+1} = x̄_{k+1} I_{[‖x̄_{k+1}‖ ≤ M_{σ_k}]} + x* I_{[‖x̄_{k+1}‖ > M_{σ_k}]},  (2.2)

σ_k = ∑_{i=0}^{k−1} I_{[‖x̄_{i+1}‖ > M_{σ_i}]},  (2.3)

y_{k+1} = f(x_k) + ε_{k+1},  (2.4)

where x̄_{k+1} denotes the candidate iterate before truncation and I_{[·]} is the indicator function.
Since M_k diverges, algorithm (2.1)–(2.4) coincides with the Robbins–Monro algorithm (1.1), (1.2) starting from some time, if we can prove that {x_k} defined by (2.1)–(2.4) is bounded.
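A minimal one-dimensional sketch of the truncated scheme (2.1)–(2.4), using a hypothetical test function, Gaussian noise, and the illustrative choices M_s = 2^s, x* = 0, a_k = 1/k (none of which are prescribed by the paper):

```python
import numpy as np

def truncated_rm(f, x_star, n_iter, noise_std, seed=0):
    """Robbins-Monro with expanding truncations, following (2.1)-(2.4):
    whenever the candidate iterate leaves the ball of radius M_sigma,
    it is reset to the fixed point x_star and the bound is enlarged."""
    rng = np.random.default_rng(seed)
    M = lambda s: 2.0 ** s        # truncation bounds M_0 < M_1 < ... diverging to infinity
    x, sigma = 5.0, 0             # deliberately remote initial point; sigma counts truncations
    for k in range(1, n_iter + 1):
        a_k = 1.0 / k
        y = f(x) + noise_std * rng.standard_normal()  # (2.4): y_{k+1} = f(x_k) + eps_{k+1}
        cand = x + a_k * y                            # (2.1): untruncated candidate
        if abs(cand) <= M(sigma):                     # (2.2): keep the candidate ...
            x = cand
        else:                                         # ... or reset to x_star and
            x, sigma = x_star, sigma + 1              # (2.3): enlarge the bound
    return x, sigma

x, n_trunc = truncated_rm(lambda x: -(x - 2.0), x_star=0.0, n_iter=50_000, noise_std=0.2)
```

After finitely many truncations the bound M_σ exceeds the region the iterates visit, and the scheme runs as the plain Robbins–Monro recursion thereafter, exactly as remarked above.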
Let us list the conditions which will be used later on.

A1. f(·) : R^l → R^l is a measurable and locally bounded function, and f(x) = 0 ∀x ∈ J; i.e., J is the root set of f(·).
A2. a_k > 0, a_k → 0 as k → ∞, ∑_{i=1}^∞ a_i = ∞.
A3. There is a differentiable function v(·) : R^l → R such that

d(v(x), v(J)) > 0 if d(x, J) > 0

and

sup_{δ ≤ d(x,J) ≤ Δ} f^τ(x) v_x(x) < 0  ∀ 0 < δ < Δ,

where v_x(x) denotes the gradient of v(x), d(x, J) = inf{‖x − y‖ : y ∈ J}, and v(J) = {v(x) : x ∈ J}.
A4. As x → x_0 the function f(x) is expressed as

f(x) = H(x − x_0)‖x − x_0‖^γ + r(x),  γ > 0,  (2.5)

where H is a stable matrix (i.e., all its eigenvalues have negative real parts) and

r(x) ∈ R^l,  r(x)/‖x − x_0‖^{1+γ} → 0 as x → x_0.  (2.6)
A5.

q_n ≜ a_{n+1}^{-1} − a_n^{-1},  0 ≤ q_n,  lim sup_{n→∞} q_n = q,  0 ≤ q < ∞,  (2.7)

∑_{i=1}^∞ b_i = ∞,  where b_i = a_i / log a_i^{-1}.  (2.8)
PROPOSITION. Assume A1–A3 hold. If there is a constant c_0 such that ‖x*‖ < c_0 and v(x*) < inf_{‖x‖=c_0} v(x), and if v(J) is not dense in any interval, then {x_k} defined by (2.1)–(2.4) converges to J,

lim_{k→∞} d(x_k, J) = 0,

whenever {ε_i} satisfies the following condition:

lim_{T→0} lim sup_{k→∞} (1/T) ‖ ∑_{i=k}^{m(k,t)} a_i ε_{i+1} ‖ = 0  ∀t ∈ [0, T],  (2.9)

where

m(k, t) = max{ m : ∑_{i=k}^m a_i ≤ t }.
Remark 1. An obvious condition which guarantees (2.9) is the convergence of the series ∑_{i=1}^∞ a_i ε_{i+1}. Condition (2.9) is also necessary for convergence of x_n to the root of f(x). This is discussed in the recent paper [8], which also shows that (2.9) is equivalent to the
standard Kushner–Clark condition [3]. However, when ε_k depends on {x_0, …, x_{k−1}}, it is difficult to verify (2.9) directly. In [4, 6] it is shown that it suffices to verify (2.9) not along the whole sequence {k} but along those subsequences {n_k} for which {x_{n_k}} converges. In [4, 6] it is also demonstrated that this verification can be carried out in many practically important problems.
Remark 2. If {x_k} given by (1.1), (1.2) is a priori known to be bounded, then under conditions A1–A3 and (2.9),

lim_{k→∞} d(x_k, J) = 0;  (2.10)

i.e., in this case the truncations introduced in (2.1)–(2.4) are not necessary.

In A4 the matrix H is stable. By the Lyapunov equation there is a positive definite matrix P > 0 such that

PH + H^τ P = −I.  (2.11)

Denote by λ_max and λ_min the maximum and minimum eigenvalues of P, respectively, and by K the condition number λ_max/λ_min.
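The pair (P, K) from (2.11) is easy to compute numerically. The sketch below, for an illustrative stable matrix H (not taken from the paper), solves the Lyapunov equation by vectorizing it with Kronecker products:

```python
import numpy as np

# An illustrative stable matrix H (its eigenvalues -1 and -2 have negative real parts).
H = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
n = H.shape[0]
I = np.eye(n)

# Vectorize PH + H^T P = -I: both vec(PH) and vec(H^T P) are linear in vec(P),
# giving (H^T kron I + I kron H^T) vec(P) = vec(-I).
A = np.kron(H.T, I) + np.kron(I, H.T)
P = np.linalg.solve(A, -I.flatten()).reshape(n, n)

lam = np.linalg.eigvalsh(P)   # P is symmetric positive definite since H is stable
lam_min, lam_max = lam[0], lam[-1]
K = lam_max / lam_min         # the condition number K appearing in the theorem below
```

For larger problems one would normally call a dedicated Lyapunov solver instead of forming the Kronecker system, but the vectorized form keeps the sketch dependency-free.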
THEOREM. (i) If conditions A1–A5 are satisfied and x_0 is the unique root of f(·), then for {x_n} defined by (2.1)–(2.4),

lim sup_{n→∞} (log a_n^{-1})^{1/γ} ‖x_n − x_0‖ ≤ √K ( 2qλ_max/γ )^{1/γ}  (2.12)

if {ε_i} satisfies the following condition:

∑_{i=1}^∞ a_i (log a_{i+1}^{-1})^{1/γ} ε_{i+1} < ∞,  (2.13)

where γ and q are given by (2.5) and (2.7), respectively.

(ii) If, in addition, H is symmetric, then

lim sup_{n→∞} (log a_n^{-1})^{1/γ} ‖x_n − x_0‖ ≤ ( q/(λ_l γ) )^{1/γ},  (2.14)

where λ_l is the smallest eigenvalue of −H and γ and q are given by (2.5) and (2.7), respectively.

(iii) Further, in the one-dimensional case, i.e., l = 1, under the conditions stated in (i) except A3, the upper bound in (2.14) is attainable if q_n → q > 0.
The proof of the theorem is given in section 3.

Remark 3. From the theorem it is seen that the convergence rate of (x_n − x_0) depends upon the rate at which a_n decreases. However, it is interesting to note that this dependence in the degenerate case is completely different from that in the nondegenerate case.
From (1.5) it is seen that for the nondegenerate case, if a_n = 1/n^α, 0 < α ≤ 1, then the convergence rate of (x_n − x_0) improves as α increases from 0 to 1. However, in the degenerate case the picture is different. By the theorem, lim_{n→∞} (α log n)^{1/γ} |x_n − x_0| equals 0 for all α ∈ (0, 1), while it may attain ( 1/(|H|γ) )^{1/γ} if α = 1. This means that in contrast to the nondegenerate case, the convergence rate of |x_n − x_0| for α ∈ (0, 1) is
better than that for α = 1. This fact is verified by simulation of the following simple example:

f(x) = −x|x|,  x_0 = 0,  f′(x_0) = 0,  γ = 1,  H = −1,  ε_i ≡ 0,

x^{(1)}_{n+1} = x^{(1)}_n − x^{(1)}_n |x^{(1)}_n| / n,  x^{(1)}_0 = 0.5,

x^{(2)}_{n+1} = x^{(2)}_n − x^{(2)}_n |x^{(2)}_n| / √n,  x^{(2)}_0 = 0.5.

The simulation shows that

x^{(1)}_n log n → 1,  while  x^{(2)}_n log n → 0  as n → ∞,
which is consistent with the results stated in the theorem.

It is also worth noting that the right-hand side of (2.14) depends upon the smallest eigenvalue λ_l of −H when H is symmetric. As λ_l decreases, the upper bound in (2.14) increases. In other words, the faster f(x) leaves the abscissa, the faster x_n converges to x_0. This phenomenon is consistent with the change in convergence rate from (1.5) for the nondegenerate case to (2.12) for the degenerate case. It is also verified by computation: if in the example considered above "H = −1" is replaced by "H = −1/2", i.e., if f(x) = −(1/2)x|x|, then the recursion with a_n = 1/n becomes x_{n+1} = x_n − x_n|x_n|/(2n), x_0 = 0.5. The computation shows

x_n log n → 2 as n → ∞,

which is larger than the limit of x^{(1)}_n log n.
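The three computations in this remark can be reproduced in a few lines; the sketch below iterates the noise-free recursions exactly as written above. Note that the limits 1, 0, and 2 are approached only logarithmically, so at any feasible n the computed values of x_n log n are still some way from their limits.

```python
import math

def run(step, c, n_steps, x0=0.5):
    """Iterate x_{n+1} = x_n - c * x_n|x_n| * a_n (i.e., f(x) = -c x|x|, no noise)
    and return x_n * log(n) at the final step."""
    x = x0
    for n in range(1, n_steps + 1):
        x -= c * x * abs(x) * step(n)
    return x * math.log(n_steps)

N = 1_000_000
v1 = run(lambda n: 1.0 / n, 1.0, N)             # H = -1,   a_n = 1/n:       limit 1
v2 = run(lambda n: 1.0 / math.sqrt(n), 1.0, N)  # H = -1,   a_n = 1/sqrt(n): limit 0
v3 = run(lambda n: 1.0 / n, 0.5, N)             # H = -1/2, a_n = 1/n:       limit 2
```

At n = 10^6 the first value sits below 1 and the third well below 2, while the second is already close to 0, illustrating both the stated limits and the slowness of the approach.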
Remark 4. In the case q > 0, the convergence rate given in the theorem cannot be improved. However, when q = 0, i.e., when a slowly decreasing gain is applied, we have only established

‖x_n − x_0‖ = o( (log a_n^{-1})^{−1/γ} ).  (2.15)

This estimate may not be sharp, but the computation shows that x^{(2)}_n log n in Remark 3 converges to zero very slowly. This means that we should not expect a much faster rate than (2.15).
3. Order of estimation error. In this section we establish the order of the estimation error when the estimation algorithm (2.1)–(2.4) is applied. As a matter of fact, we intend to show that ‖z_n‖ ≜ ‖(log a_n^{-1})^{1/γ}(x_n − x_0)‖ is bounded. This is an intermediate step toward proving the theorem, which gives either an upper bound or an exact limit for ‖z_n‖.
LEMMA 1. If A5 holds, then (2.13) implies (2.9).

Proof. Let (2.13) hold. Setting

s_n = ∑_{i=1}^n a_i (log a_{i+1}^{-1})^{1/γ} ε_{i+1},  s_0 = 0,

we have by summation by parts

∑_{i=m}^n a_i ε_{i+1} = ∑_{i=m}^n (s_i − s_{i−1}) (log a_{i+1}^{-1})^{−1/γ}
 = s_n (log a_{n+1}^{-1})^{−1/γ} − s_{m−1} (log a_{m+1}^{-1})^{−1/γ} + ∑_{i=m}^{n−1} s_i [ (log a_{i+1}^{-1})^{−1/γ} − (log a_{i+2}^{-1})^{−1/γ} ].  (3.1)
Since s_n converges and (log a_{n+1}^{-1})^{−1/γ} → 0, the terms outside the summation on the right-hand side of (3.1) tend to zero as n → ∞ and m → ∞, while the summation is dominated by

sup_{m≤i≤n} |s_i| ∑_{i=m}^{n−1} | (log a_{i+1}^{-1})^{−1/γ} − (log a_{i+2}^{-1})^{−1/γ} | = sup_{m≤i≤n} |s_i| [ (log a_{m+1}^{-1})^{−1/γ} − (log a_{n+1}^{-1})^{−1/γ} ],

which tends to zero as n → ∞ and m → ∞.

Hence ∑_{i=1}^∞ a_i ε_{i+1} converges, and (2.9) holds by Remark 1.
LEMMA 2. Under the conditions stated in (i) or in (iii) of the theorem, x_k defined by (2.1)–(2.4) converges to x_0 as k → ∞.

Proof. By Lemma 1 and the Proposition presented in section 2, under the conditions stated in (i) we see that x_k → x_0 as k → ∞. For the one-dimensional case stated in (iii) we also have x_k → x_0 if we can verify A3.

Since x_0 is the unique root of f(·) by A1, we have by A4

(x − x_0) f(x) < 0  ∀x ≠ x_0.

Then the function v(x) = (x − x_0)^2 satisfies A3.

Define

z_n = (log a_n^{-1})^{1/γ} (x_n − x_0),  (3.2)

h(z) = Hz‖z‖^γ + ((q + Δ)/γ) z,  z ∈ R^l,  Δ > 0.  (3.3)
LEMMA 3 (key lemma). Under the conditions stated in (i) of the theorem, {z_n} is bounded if (2.13) holds.

Proof. To prove the boundedness of {z_n} we first express z_n in recursive form. For any Δ > 0 and sufficiently large n, by (2.7) we have q_n ≤ q + Δ and
( log a_{n+1}^{-1} / log a_n^{-1} )^{1/γ} = ( ( log a_n^{-1} + log(a_{n+1}^{-1}/a_n^{-1}) ) / log a_n^{-1} )^{1/γ}
 = ( 1 + log(1 + a_n q_n)/log a_n^{-1} )^{1/γ}
 = ( 1 + ( a_n q_n + O(a_n^2) )/log a_n^{-1} )^{1/γ}
 = 1 + a_n (q + Δ + o(1)) / ( γ log a_n^{-1} ).  (3.4)
By Lemma 2, {x_n} is bounded, and hence x_k is defined by the Robbins–Monro algorithm starting from some n_0. Consequently, by (3.4), for n ≥ n_0 we derive the following recursive formula for {z_n}:
z_{n+1} = ( 1 + a_n (q + Δ + o(1))/(γ log a_n^{-1}) ) z_n
 + (a_n/log a_n^{-1}) ( 1 + a_n (q + Δ + o(1))/(γ log a_n^{-1}) ) (log a_n^{-1})^{1+1/γ} [ H(x_n − x_0)‖x_n − x_0‖^γ + r(x_n) ] + a_n (log a_{n+1}^{-1})^{1/γ} ε_{n+1}

 = ( 1 + a_n (q + Δ + o(1))/(γ log a_n^{-1}) ) z_n + (a_n/log a_n^{-1}) ( 1 + a_n (q + Δ + o(1))/(γ log a_n^{-1}) ) [ H z_n‖z_n‖^γ + ‖z_n‖^{1+γ} r(x_n)/‖x_n − x_0‖^{1+γ} ] + a_n (log a_{n+1}^{-1})^{1/γ} ε_{n+1}

 = z_n + b_n h_n(z_n) + a_n (log a_{n+1}^{-1})^{1/γ} ε_{n+1}  (3.5)

 = z_n + b_n H_n z_n + a_n (log a_{n+1}^{-1})^{1/γ} ε_{n+1},  (3.6)

where

h_n(z) = ( Hz‖z‖^γ + ‖z‖^{1+γ} r(x_n)/‖x_n − x_0‖^{1+γ} ) ( 1 + a_n (q + Δ + o(1))/(γ log a_n^{-1}) ) + ((q + Δ + o(1))/γ) z = H_n z  (3.7)

and

H_n = [ ( H + ( r(x_n)/‖x_n − x_0‖^{1+γ} ) z_n^τ/‖z_n‖ )(1 + o(1)) + ( (q + Δ + o(1))/(γ‖z_n‖^γ) ) I ] ‖z_n‖^γ.  (3.8)
Assume the converse is true, i.e., assume {‖z_n‖} is unbounded. Let us fix a constant c > 1 large enough that

( (q + Δ)/(γ c^γ) ) λ_max < 1/5.  (3.9)

Denote by {z_{l_i}}, i = 1, 2, …, n_c, those members of {z_n, n ≥ n_0} for which ‖z_{l_i}‖ ≤ c, so that ‖z_i‖ > c for all i ≥ n_0 with i ∉ {l_1, …, l_{n_c}}, where n_c may be infinite. In both cases, n_c < ∞ and n_c = ∞, we will obtain a contradiction from the unboundedness of {‖z_n‖}. This implies the conclusion of the lemma.
Case 1. If n_c < ∞, then ‖z_i‖ > c ∀i ≥ n_c.

We now show that with c selected by (3.9), the difference equation (3.5) is in this case asymptotically stable and z_n → 0 as n → ∞. This implies the impossibility of ‖z_i‖ > c ∀i ≥ n_c. Define
Φ_{n,j} = (I + b_n H_n)(I + b_{n−1} H_{n−1}) ⋯ (I + b_j H_j),  Φ_{j,j+1} ≜ I,  (3.10)

Φ_{n,j}^τ P Φ_{n,j} = Φ_{n−1,j}^τ ( P + b_n(H_n^τ P + P H_n) + b_n^2 H_n^τ P H_n ) Φ_{n−1,j},  (3.11)

where H_n is defined by (3.8).
From A4 and Lemma 2, notice that r(x_n)/‖x_n − x_0‖^{1+γ} → 0 as n → ∞ and that, for n ≥ n_c,

b_n‖H_n^τ P H_n‖ ≤ c_1 b_n‖z_n‖^{2γ} = c_1 (a_n/log a_n^{-1}) (log a_n^{-1})^2 ‖x_n − x_0‖^{2γ} → 0 as n → ∞,  (3.12)

where c_1 is a constant. Then by (2.11), (3.9), and (3.12), for sufficiently large n we have

(H_n^τ P + P H_n) + b_n H_n^τ P H_n < −(1/2)‖z_n‖^γ I.  (3.13)
Without loss of generality we may assume that n_0 is large enough that (3.13) is valid for n ≥ n_0. By (3.11) and (3.13), for j ≥ n_c we see that

Φ_{n,j}^τ P Φ_{n,j} ≤ Φ_{n−1,j}^τ ( P − (1/2) b_n‖z_n‖^γ I ) Φ_{n−1,j} ≤ ( 1 − b_n‖z_n‖^γ/(2λ_max) ) Φ_{n−1,j}^τ P Φ_{n−1,j},

where, as defined in section 2, λ_max is the maximum eigenvalue of P. This implies that

Φ_{n,n_c}^τ P Φ_{n,n_c} ≤ ( 1 − μ b_n‖z_n‖^γ ) Φ_{n−1,n_c}^τ P Φ_{n−1,n_c} < e^{−μ b_n‖z_n‖^γ} Φ_{n−1,n_c}^τ P Φ_{n−1,n_c} < λ_max e^{−μ ∑_{i=n_c}^n b_i‖z_i‖^γ} I,

where μ = 1/(2λ_max). Consequently, we have

‖Φ_{n,n_c}‖ < √K e^{−(μ/2) ∑_{i=n_c}^n b_i‖z_i‖^γ}.  (3.14)
We remind the reader that K = λ_max/λ_min, where λ_min is the minimum eigenvalue of P.

From (3.6) it follows that

z_{n+1} = Φ_{n,n_c} z_{n_c} + ∑_{j=n_c}^n Φ_{n,j+1} a_j (log a_{j+1}^{-1})^{1/γ} ε_{j+1}.  (3.15)
Since ‖z_i‖ > c ∀i ≥ n_c and ∑_{i=n_c}^∞ b_i = ∞, by (3.14) the first term on the right-hand side of (3.15) tends to zero as n → ∞. Let us now estimate the last term of (3.15). Set

ξ_n = ∑_{j=n_c}^n a_j (log a_{j+1}^{-1})^{1/γ} ε_{j+1}.
By (2.13) it follows that ξ_n → ξ < ∞ as n → ∞. We now have

∑_{j=n_c}^n Φ_{n,j+1} a_j (log a_{j+1}^{-1})^{1/γ} ε_{j+1} = ∑_{j=n_c}^n Φ_{n,j+1} (ξ_j − ξ_{j−1})
 = ξ_n − ∑_{j=n_c+1}^n (Φ_{n,j+1} − Φ_{n,j}) ξ_{j−1} − Φ_{n,n_c+1} ξ_{n_c−1}
 = ξ_n − ∑_{j=n_c+1}^n (Φ_{n,j+1} − Φ_{n,j}) ξ − ∑_{j=n_c+1}^n (Φ_{n,j+1} − Φ_{n,j})(ξ_{j−1} − ξ) − Φ_{n,n_c+1} ξ_{n_c−1}
 = (ξ_n − ξ) + Φ_{n,n_c+1} ξ − ∑_{j=n_c+1}^{n_1} (Φ_{n,j+1} − Φ_{n,j})(ξ_{j−1} − ξ) + ∑_{j=n_1+1}^n Φ_{n,j+1} b_j H_j (ξ_{j−1} − ξ) − Φ_{n,n_c+1} ξ_{n_c−1}.  (3.16)
By (3.14) it is clear that on the right-hand side of (3.16) all terms except the second-to-last one tend to zero as n → ∞ for any fixed n_1. We now show that the second-to-last term of (3.16) can be made arbitrarily small by choosing n_1 sufficiently large. For any ε > 0, take n_1 large enough that

‖ξ_j − ξ‖ < ε  and  μ b_j‖z_j‖^γ/2 ≤ 1  ∀j ≥ n_1,

which is possible because

b_j‖z_j‖^γ = (a_j/log a_j^{-1}) · [ (log a_j^{-1})^{1/γ} ‖x_j − x_0‖ ]^γ = a_j‖x_j − x_0‖^γ → 0.  (3.17)
Using (3.14), (3.17), and noticing that ‖H_n‖ ≤ c_2‖z_n‖^γ ∀n ≥ n_c for some constant c_2 > 0, we derive

‖ ∑_{j=n_1+1}^n Φ_{n,j+1} b_j H_j (ξ_{j−1} − ξ) ‖ ≤ ε c_2 √K ∑_{j=n_1+1}^n e^{−(μ/2) ∑_{i=j+1}^n b_i‖z_i‖^γ} b_j‖z_j‖^γ
 ≤ (4εc_2√K/μ) ∑_{j=n_1+1}^n e^{−(μ/2) ∑_{i=j+1}^n b_i‖z_i‖^γ} ( 1 − e^{−(μ/2) b_j‖z_j‖^γ} ) ≤ 4εc_2√K/μ,

where we have used the fact that x/2 ≤ 1 − e^{−x} for x ∈ [0, 1].
Consequently, the left-hand side of (3.16) tends to zero as n → ∞, and hence z_n → 0 as n → ∞. This contradicts ‖z_i‖ > c ∀i ≥ n_c. Therefore, n_c must be ∞.
Case 2. Assume n_c = ∞. In this case {z_i} returns to the ball {‖z‖ ≤ c} infinitely many times while, at the same time, {z_i} is unbounded. From this we can conclude that {‖z_i‖} crosses a nonempty interval infinitely often. To be precise, let z_{l_i}^τ P z_{l_i} ≤ λ_max c^2, i = 1, 2, …, where P is given by (2.11). Starting from any z_{l_i}, there exists an m_i > l_i such that z_{m_i}^τ P z_{m_i} > 4c^2 λ_max^2/λ_min, since {‖z_n‖} is unbounded. Further, since n_c = ∞, we can find an index n_{i+1} in the set {l_1, l_2, …} such that n_{i+1} > m_i. This procedure can be continued indefinitely. Without loss of generality, we may assume

z_{n_i}^τ P z_{n_i} ≤ λ_max c^2,  z_{m_i}^τ P z_{m_i} ≥ 4c^2 λ_max^2/λ_min,

λ_max c^2 < z_j^τ P z_j < 4c^2 λ_max^2/λ_min,  n_i < j < m_i,  i = 1, 2, ….  (3.18)
This implies the crossing property of {‖z_i‖}:

‖z_{n_i}‖ ≤ √K c,  ‖z_{m_i}‖ ≥ 2c√K,  c < ‖z_j‖ < 2cK,  n_i < j < m_i,  i = 1, 2, ….  (3.19)

We now show that ∑_{j=n_i}^{m_i−1} b_j ≥ T > 0 and that ‖z_s − z_{n_i}‖ = O(T) as T → 0 for all large i and all s with ∑_{j=n_i}^s b_j ≤ T. This yields a contradiction to (3.19). We now prove this in detail.

Noticing that there are constants c_3 and c_4 such that
b_n‖H_n z_n‖ ≤ (a_n/log a_n^{-1}) [ c_3‖z_n‖^{1+γ} + c_4‖z_n‖ ]  (3.20)
 ≤ (a_n/log a_n^{-1}) [ c_3 (log a_n^{-1})^{(γ+1)/γ} ‖x_n − x_0‖^{1+γ} + c_4 (log a_n^{-1})^{1/γ} ‖x_n − x_0‖ ] → 0 as n → ∞,
by (2.13), from (3.6) we see that

z_{n+1} − z_n → 0 as n → ∞.  (3.21)

Summing both sides of (3.6) from n_i to m_i − 1, we derive

z_{m_i} = z_{n_i} + ∑_{j=n_i}^{m_i−1} b_j H_j z_j + ∑_{j=n_i}^{m_i−1} a_j (log a_{j+1}^{-1})^{1/γ} ε_{j+1}.  (3.22)
From (3.22), using (3.18), (3.19), and (3.20), we obtain

2c√K ≤ √K c + ∑_{j=n_i}^{m_i−1} b_j ( c_3‖z_j‖^{1+γ} + c_4‖z_j‖ ) + ‖ ∑_{j=n_i}^{m_i−1} a_j (log a_{j+1}^{-1})^{1/γ} ε_{j+1} ‖
 ≤ √K c + ∑_{j=n_i}^{m_i−1} b_j ( c_3(2cK)^{1+γ} + 2c c_4 K ) + ‖ ∑_{j=n_i}^{m_i−1} a_j (log a_{j+1}^{-1})^{1/γ} ε_{j+1} ‖.  (3.23)
Since (2.13) holds, the last term of (3.23) can be made arbitrarily small, say, less than ε (< √K c) if i is sufficiently large. Then from (3.23) it follows that

∑_{j=n_i}^{m_i−1} b_j ≥ (√K c − ε) / ( c_3(2cK)^{1+γ} + 2c c_4 K ) ≜ T > 0

for all large enough i. This means that l(n_i, T) ≤ m_i − 1, where

l(n, t) = max{ l : ∑_{i=n}^l b_i ≤ t },  b_i = a_i / log a_i^{-1}.  (3.24)
Consequently, we have

‖z_j − z_{n_i}‖ ≤ ‖ ∑_{s=n_i}^{j−1} b_s H_s z_s + ∑_{s=n_i}^{j−1} a_s (log a_{s+1}^{-1})^{1/γ} ε_{s+1} ‖ ≤ ( c_3(2cK)^{1+γ} + 2c_4 cK ) T + o(1) ≤ αT,  α > 0,  ∀j = n_i, …, l(n_i, T),  (3.25)
where α is a constant. Therefore, by Taylor's formula there exists z ∈ R^l such that

‖z − z_{n_i}‖ ≤ αT  (3.26)

and

z_{l(n_i,T)}^τ P z_{l(n_i,T)} − z_{n_i}^τ P z_{n_i} = z^τ P [ ∑_{j=n_i}^{l(n_i,T)} b_j H_j z_j + ∑_{j=n_i}^{l(n_i,T)} a_j (log a_{j+1}^{-1})^{1/γ} ε_{j+1} ]
 = ∑_{j=n_i}^{l(n_i,T)} b_j z_j^τ P H_j z_j + ∑_{j=n_i}^{l(n_i,T)} b_j (z − z_j)^τ P H_j z_j + z^τ P ∑_{j=n_i}^{l(n_i,T)} a_j (log a_{j+1}^{-1})^{1/γ} ε_{j+1}.  (3.27)
Using (3.20), (3.25), and (3.26) we see that

‖ ∑_{j=n_i}^{l(n_i,T)} b_j (z − z_j)^τ P H_j z_j ‖ ≤ 2αT^2 λ_max ( c_3(2cK)^{1+γ} + 2c c_4 K ).
By (2.13), the last term of (3.27) can be made arbitrarily small. Hence, by (3.12), (3.13), (3.19), and (3.21) we have

z_j^τ P H_j z_j = (1/2) z_j^τ ( P H_j + H_j^τ P ) z_j < −(1/4)‖z_j‖^{2+γ} ≤ −(1/4) c^{2+γ}  ∀j = n_i, …, l(n_i, T).
From (3.27) we can conclude that

z_{l(n_i,T)}^τ P z_{l(n_i,T)} − z_{n_i}^τ P z_{n_i} ≤ −(1/4) c^{2+γ} T + 2αT^2 ( c_3(2cK)^{1+γ} + 2c c_4 K ) + o(1) ≤ −(1/5) c^{2+γ} T  (3.28)

if i is large enough and T is sufficiently small. By (3.18), inequality (3.28) implies that

λ_max c^2 < z_{l(n_i,T)}^τ P z_{l(n_i,T)} ≤ z_{n_i}^τ P z_{n_i} − (1/5) c^{2+γ} T ≤ λ_max c^2 − (1/5) c^{2+γ} T,

which is impossible. The obtained contradiction shows that {z_n} is bounded.
4. Proof of the theorem. We are now in a position to prove our theorem.

Proof of the theorem. (i) For assertion (2.12) of the theorem it suffices to show

lim sup_{n→∞} z_n^τ P z_n ≤ λ_max ( 2(q + Δ)λ_max/γ )^{2/γ} ≜ a  (4.1)

for arbitrarily small Δ > 0. Let us fix Δ > 0.

The idea of the proof of (4.1) is to show that if (4.1) is not true, then z_n^τ P z_n crosses a nonempty interval infinitely often while, at the same time, z_n^τ P z_n is decreasing in a certain sense. This yields a contradiction.
By Lemma 3, {z_n} is bounded; i.e.,

‖z_n‖ ≤ ζ < ∞  ∀n.  (4.2)

Hence, from (3.8) we see that

H_n = H‖z_n‖^γ + ((q + Δ)/γ) I + o(1).  (4.3)

From (3.3), (3.6), and (4.3) it follows that

z_{n+1} = z_n + b_n h(z_n) + b_n o(1) + a_n (log a_{n+1}^{-1})^{1/γ} ε_{n+1}.  (4.4)
Fix any small ε > 0 and consider z ∈ R^l for which

z^τ P z ≥ λ_max ( (2(q + Δ)λ_max + ε)/γ )^{2/γ} ≜ b.  (4.5)

This implies that

‖z‖ ≥ ( (2(q + Δ)λ_max + ε)/γ )^{1/γ}.
Then by (2.11) and (4.5) we have

z^τ ( (PH + H^τ P)‖z‖^γ + (2(q + Δ)/γ) P ) z ≤ z^τ ( −‖z‖^γ I + (2(q + Δ)λ_max/γ) I ) z ≤ −(ε/γ)‖z‖^2.  (4.6)
Assume (4.1) is not true. Then there is a small δ > 0 such that

lim sup_{n→∞} z_n^τ P z_n > a + δ.  (4.7)

Therefore, there is a subsequence {z_{n_k}} with

z_{n_k}^τ P z_{n_k} > a + δ,  k = 1, 2, ….  (4.8)

Let ε > 0 be small enough that

a + δ > b,  (4.9)

where b is given by (4.5).
We now show that from any n_k, k = 1, 2, …, {z_n} will enter the ellipsoid {z : z^τ P z ≤ b}. Assume the converse, i.e.,

z_i^τ P z_i ≥ b  ∀i ≥ n_k  (4.10)

for some n_k.
‖zj − zn‖ ≤ c5T, j = n, . . . , l(n, T ) ∀n,(4.11)
and hence
|zτj Pzj − zτnPzn| ≤ c6T, j = n, . . . , l(n, T ) ∀n,(4.12)
where c5 and c6 are constants. Similar to (3.27), by (4.2), (4.11) we obtain
z_{l(n_k,T)}^τ P z_{l(n_k,T)} − z_{n_k}^τ P z_{n_k} ≤ ∑_{j=n_k}^{l(n_k,T)} b_j z_j^τ P H_j z_j + c_7 T^2 + o(1)
 = ∑_{j=n_k}^{l(n_k,T)} b_j z_j^τ ( PH‖z_j‖^γ + ((q + Δ)/γ) P ) z_j + c_7 T^2 + o(1).  (4.13)
Using (2.11) leads to

z_{l(n_k,T)}^τ P z_{l(n_k,T)} − z_{n_k}^τ P z_{n_k} ≤ (1/2) ∑_{j=n_k}^{l(n_k,T)} b_j z_j^τ ( −‖z_j‖^γ I + (2(q + Δ)/γ) P ) z_j + c_7 T^2 + o(1).
From this, by (4.6) and (4.10), we obtain

z_{l(n_k,T)}^τ P z_{l(n_k,T)} − z_{n_k}^τ P z_{n_k} < −(ε/(2γ)) ∑_{j=n_k}^{l(n_k,T)} b_j ‖z_j‖^2 + c_7 T^2 + o(1)
 < −(ε/(2γ)) ( (2(q + Δ)λ_max + ε)/γ )^{2/γ} T + c_7 T^2 + o(1)
 ≤ −(ε/(3γ)) ( (2(q + Δ)λ_max + ε)/γ )^{2/γ} T  (4.14)

for sufficiently small T and large enough k. This means that after a finite number of steps (4.10) will no longer be satisfied; i.e., z_n will enter the ellipsoid {z : z^τ P z ≤ b}. Together with (4.8), this implies that {z_n^τ P z_n} crosses the interval [b, a + δ] infinitely often; i.e., there are two subsequences {z_{l_k}} and {z_{m_k}} such that

z_{l_k}^τ P z_{l_k} ≤ b,  z_{m_k}^τ P z_{m_k} ≥ a + δ,  b < z_i^τ P z_i < a + δ  ∀i : l_k < i < m_k.
Take T small enough that c_6 T < a + δ − b. Then by (4.12) we see that l(l_k, T) < m_k ∀k and

z_{l(l_k,T)}^τ P z_{l(l_k,T)} ∈ (b, a + δ),
which combined with (4.14) leads to a contradiction:

0 < z_{l(l_k,T)}^τ P z_{l(l_k,T)} − z_{l_k}^τ P z_{l_k} ≤ −(ε/(3γ)) ( (2(q + Δ)λ_max + ε)/γ )^{2/γ} T.
Therefore, (4.8), and hence (4.7), is impossible. Since δ may be arbitrarily small, the impossibility of (4.7) implies (4.1). Letting Δ tend to zero, from (4.1) we derive (2.12).
(ii) Now let H be symmetric. We simply consider ‖z‖^2 instead of z^τ P z and set in (4.1) and (4.5)

a = ( (q + Δ)/(λ_l γ) )^{2/γ},  b = ( (q + Δ + ε)/(λ_l γ) )^{2/γ}.
The proof can be carried out along the lines of that given for the general case. For example, corresponding to (4.5) and (4.6) we now have

‖z‖^2 ≥ b  and  z^τ ( H‖z‖^γ + ((q + Δ)/γ) I ) z ≤ ( −λ_l (q + Δ + ε)/(λ_l γ) + (q + Δ)/γ ) ‖z‖^2 = −(ε/γ)‖z‖^2,
respectively, while (4.13) becomes

‖z_{l(n_k,T)}‖^2 − ‖z_{n_k}‖^2 ≤ ∑_{j=n_k}^{l(n_k,T)} b_j z_j^τ ( H‖z_j‖^γ + ((q + Δ)/γ) I ) z_j + c_7 T^2 + o(1).
(iii) Since q > 0, we may set Δ = 0 in (3.3) and in the proofs of Lemma 3 and part (i) of the theorem.

In the one-dimensional case, H in (2.5) is a negative number and λ_l in (2.14) equals |H|. The root set of h(z) defined by (3.3) with Δ = 0 is J = { 0, ±( q/(−Hγ) )^{1/γ} }.
It is easy to define a twice differentiable function v(z) such that

v(z) = v(−z),  0 < v(z) < v(0)  ∀z : 0 < |z| ≤ ζ,

v′(z) h(z) < 0  ∀z ∉ { 0, ±( q/(−Hγ) )^{1/γ} },

where ζ is given in (4.2). For any t ≤ T we find that
lim_{T→0} lim sup_{k→∞} (1/T) ‖ ∑_{i=k}^{l(k,t)} ( b_i o(1) + a_i (log a_{i+1}^{-1})^{1/γ} ε_{i+1} ) ‖
 ≤ lim_{T→0} lim sup_{k→∞} (1/T) { t · o(1) } + lim_{T→0} lim sup_{k→∞} (1/T) ‖ ∑_{i=k}^{l(k,t)} a_i (log a_{i+1}^{-1})^{1/γ} ε_{i+1} ‖ = 0.  (4.15)
Applying Remark 2 of section 2 to (4.4) leads to

lim_{k→∞} d(z_k, J) = 0.
This is valid for any {ε_i} satisfying (4.15). In particular, if {ε_i} is such that b_i o(1) + a_i (log a_{i+1}^{-1})^{1/γ} ε_{i+1} = 0 ∀i ≥ 1, then (4.4) becomes the recursion

z_{n+1} = z_n + b_n h(z_n).  (4.16)

Note that h′(0) = q/γ > 0, and hence 0 is not stable for the differential equation

ż_t = h(z_t).

It is clear that 0 cannot be the limit point of (4.16). Therefore, in this case z_n converges either to ( q/(|H|γ) )^{1/γ} or to −( q/(|H|γ) )^{1/γ}. This verifies the attainability of the upper bound in (2.14).
5. Concluding remarks. By using a deterministic analysis we have shown the pathwise convergence rate of SA when f(x_0) = 0 and f′(x_0) = 0. Some problems remain open and are left to further research. First, it might be possible to obtain more precise results. For example, as a conjecture, the limit of the left-hand side of (2.14) is one of ( q/(λ_i γ) )^{1/γ}, i = 1, …, l, depending upon the initial value, where λ_i, i = 1, …, l, are the eigenvalues of −H. Second, it is not clear what happens if f(·) has more complicated behavior as x → x_0.
REFERENCES

[1] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), pp. 400–407.
[2] M. B. Nevelson and R. Z. Hasminskii, Stochastic Approximation and Recursive Estimation, Transl. Math. Monographs 47, Amer. Math. Soc., Providence, RI, 1976.
[3] H. J. Kushner and D. S. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer-Verlag, New York, 1978.
[4] H.-F. Chen, Stochastic approximation and its new applications, in Proc. 1994 Hong Kong International Workshop on New Directions of Control and Manufacturing, 1994, pp. 2–12.
[5] L. Ljung, G. Pflug, and H. Walk, Stochastic Approximation and Optimization of Random Systems, Birkhäuser, Basel, 1992, pp. 71–76.
[6] H.-F. Chen, T. Duncan, and B. Pasik-Duncan, On Ljung's approach to system parameter identification, in 10th IFAC Symposium on System Identification, Vol. 2, preprint, M. Blanke and T. Söderström, eds., Copenhagen, 1994, pp. 667–671.
[7] H.-F. Chen, Recursive Estimation and Control for Stochastic Systems, John Wiley, New York, 1985.
[8] I. J. Wang, E. K. P. Chong, and S. R. Kulkarni, Equivalent necessary and sufficient conditions on noise sequences for stochastic approximation algorithms, Adv. in Appl. Probab., accepted for publication.
[9] S. R. Kulkarni and C. Horn, Convergence of the Robbins–Monro algorithm under arbitrary disturbances, in Proc. 32nd Conf. on Decision and Control, 1993, pp. 537–538.
[10] S. R. Kulkarni and C. Horn, Alternative approach and conditions for convergence of stochastic approximation algorithms, in Proc. 34th Conf. on Decision and Control, New Orleans, IEEE Control Systems Society, 1995.