Systems & Control Letters 55 (2006) 139–145, www.elsevier.com/locate/sysconle

Stochastic approximation with ‘controlled Markov’ noise

Vivek S. Borkar∗,1

School of Technology and Computer Science, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India

Received 18 December 2004; received in revised form 14 May 2005; accepted 4 June 2005. Available online 18 July 2005.

Abstract

Stochastic approximation algorithms with additional noise that can be modelled as a controlled Markov process are analyzed and shown to track the solutions of a differential inclusion defined in terms of the ergodic occupation measures associated with the controlled Markov process.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Stochastic approximation; Controlled Markov noise; Differential inclusions; Ergodic occupation measures; Invariant sets

1. Introduction

Stochastic approximation algorithms with ‘Markov’ noise have been extensively analyzed by means of their limiting o.d.e.; see, e.g., [3]. The aim of this note is to extend these results to the case when the noise is ‘controlled Markov’, by relating it to a limiting differential inclusion defined in terms of the so-called ergodic occupation measures associated with the controlled Markov process. Stochastic approximation schemes whose limits are differential inclusions have recently been analyzed in [2].

The motivation for this study comes from the fact that often the so-called noise process (which may not in fact be a physical noise; see the example below) is not Markov, but its lack of the Markov property comes through its dependence on some possibly imperfectly known, possibly time-varying process which may be viewed as a ‘control’ for analysis purposes. Consider, for example, the case when the stochastic approximation scheme constitutes the parameter estimation component of a self-tuning controller, whence the controlled state process would be what is called ‘controlled Markov noise’ here. The control cannot be a priori assumed

∗ Tel.: +91 22 2280 4614x2293; fax: +91 22 2280 4610. E-mail address: [email protected].

1 Research supported in part by Grant no. III.5(157)/99 from the Department of Science and Technology, Government of India.

0167-6911/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.sysconle.2005.06.005

to be either Markov or stationary. Note that in this example the ‘control’ is in fact a physical control process, but this need not be so in general.

The next section derives some preliminary results, based on which the main result is established in Section 3.

2. Preliminaries

Specifically, we consider the iteration

x_{n+1} = x_n + a(n)[h(x_n, Y_n) + M_{n+1}], (1)

where {Y_n} is a random process taking values in a complete separable metric space S with dynamics we shall soon specify, and h : R^d × S → R^d is jointly continuous in its arguments and Lipschitz in its first argument uniformly w.r.t. the second. {M_n} is a martingale difference sequence w.r.t. the σ-fields F_n ≜ σ(x_m, Y_m, M_m, m ≤ n), n ≥ 0, satisfying

E[‖M_{n+1}‖² | F_n] ≤ K₀(1 + ‖x_n‖²) ∀n, (2)

for some K₀ > 0. Stepsizes {a(n)} satisfy the usual conditions:

Σ_n a(n) = ∞,  Σ_n a(n)² < ∞, (3)

with the additional condition that they be eventually decreasing.
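As an illustration outside the paper's formal development, iteration (1) can be simulated for a toy one-dimensional model. The drift h, the transition rule for {Y_n} (whose kernel depends on the current iterate x_n, making the noise ‘controlled Markov’ in the paper's sense), the stepsizes and all numerical values below are hypothetical choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x, y):
    # Hypothetical drift: pushes x toward the current noise state y.
    return y - x

def next_Y(y, x):
    # Toy controlled kernel: Y_{n+1} ~ Normal(0.5*Y_n + 0.1*x_n, 1),
    # so p(dy | y, x) depends on the iterate x as well as on y.
    return 0.5 * y + 0.1 * x + rng.normal()

x, y = 0.0, 0.0
for n in range(1, 20000):
    a_n = 1.0 / n                 # sum a(n) = inf, sum a(n)^2 < inf, decreasing
    M = 0.1 * rng.normal()        # martingale difference term
    x = x + a_n * (h(x, y) + M)   # iteration (1)
    y = next_Y(y, x)

print(x)  # should hover near 0, the root of the averaged dynamics here
```

For this toy kernel the stationary mean of {Y_n} given x is 0.2x, so the averaged o.d.e. is ẋ = 0.2x − x, whose unique equilibrium is 0; the iterates track it.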


We shall assume that {Y_n} is an S-valued controlled Markov process controlled by two control processes: {x_n} above and another random process {Z_n} taking values in a compact metric space U. Thus

P(Y_{n+1} ∈ A | Y_m, Z_m, x_m, m ≤ n) = ∫_A p(dy | Y_n, Z_n, x_n), n ≥ 0,

for A Borel in S, where (y, z, x) ∈ S × U × R^d → p(dw | y, z, x) ∈ P(S) is a continuous map specifying the controlled transition probability kernel. (Here and in what follows, P(· · ·) will denote the space of probability measures on the complete separable metric space ‘· · ·’ with the Prohorov topology; see, e.g., [5, Chapter 2].) We assume that the continuity in the x variable is uniform on compacts w.r.t. the other variables. We shall say that {Z_n} is a stationary randomized control if for each n the conditional law of Z_n given (Y_m, x_m, Z_{m−1}, m ≤ n) is φ(Y_n) for a fixed measurable map φ : y ∈ S → φ(y) = φ(y, dz) ∈ P(U) independent of n. Thus, in particular, Z_n is conditionally independent of (Y_m, x_m, Z_{m−1}, m ≤ n) given Y_n for n ≥ 0. By abuse of terminology, we identify the stationary randomized control above with the map φ(·).

If x_n = x ∀n for a fixed deterministic x ∈ R^d, then {Y_n} will be a time-homogeneous Markov process under any stationary randomized control φ with the transition kernel

p_{x,φ}(dw | y) = ∫ p(dw | y, z, x) φ(y, dz).

Suppose that this Markov process has a (possibly non-unique) invariant probability measure η_{x,φ}(dy) ∈ P(S). Correspondingly, we define the ergodic occupation measure

Ψ_{x,φ}(dy, dz) ≜ η_{x,φ}(dy) φ(y, dz) ∈ P(S × U).

This clearly satisfies

∫_S f(y) Ψ_{x,φ}(dy, U) = ∫_S ∫_U f(w) p(dw | y, z, x) Ψ_{x,φ}(dy, dz), (4)

for bounded continuous f : S → R. Conversely, if some Ψ ∈ P(S × U) satisfies the above for bounded f ∈ C(S), then it must be of the form Ψ_{x,φ} for some stationary randomized control φ. This is because we can always disintegrate Ψ as

Ψ(dy, dz) = η(dy) φ(y, dz),

with η, φ denoting resp. the marginal on S and the regular conditional law on U. Since φ(·) is a measurable map S → P(U), it can be identified with a stationary randomized control. Eq. (4) then implies that η is an invariant probability measure under the ‘stationary randomized control’ φ.

We denote by D(x) the set of all such ergodic occupation measures for the prescribed x. Since Eq. (4) is preserved under convex combinations and convergence in P(S × U), D(x) is closed and convex. We also assume that it is compact. Once again, using the fact that (4) is preserved under convergence in P(S × U), it follows that if x(n) → x in R^d and Ψ_n → Ψ in P(S × U) with Ψ_n ∈ D(x(n)) ∀n, then Ψ ∈ D(x), implying upper semicontinuity of the set-valued map x → D(x).
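For finite S and U the ergodic occupation measure and identity (4) are elementary to compute. The following sketch, with an entirely made-up kernel and randomized control (and the x-dependence suppressed), builds Ψ from the invariant measure of the φ-averaged kernel and checks (4) numerically:

```python
import numpy as np

# Hypothetical finite model: S = {0,1,2}, U = {0,1}; p[z] is the
# transition matrix of {Y_n} under the fixed control action z.
p = np.array([[[0.9, 0.1, 0.0],
               [0.2, 0.6, 0.2],
               [0.0, 0.3, 0.7]],
              [[0.5, 0.5, 0.0],
               [0.1, 0.1, 0.8],
               [0.3, 0.3, 0.4]]])

phi = np.array([[0.7, 0.3],    # phi(y, dz): randomized control at each state y
                [0.5, 0.5],
                [0.2, 0.8]])

# Kernel under the stationary randomized control:
# p_phi(w | y) = sum_z phi(y, z) p(z)[y, w]
p_phi = np.einsum('yz,zyw->yw', phi, p)

# Invariant measure eta: normalized left eigenvector of p_phi for eigenvalue 1
w, v = np.linalg.eig(p_phi.T)
eta = np.real(v[:, np.argmin(np.abs(w - 1))])
eta /= eta.sum()

# Ergodic occupation measure Psi(y, z) = eta(y) * phi(y, z)
Psi = eta[:, None] * phi

# Check (4): the S-marginal of Psi equals its one-step image under the kernel
lhs = Psi.sum(axis=1)
rhs = np.einsum('yz,zyw->w', Psi, p)
print(np.allclose(lhs, rhs))   # True
```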

Let δ_{(·)} denote the Dirac measure at ‘(·)’. We define a P(S × U)-valued random process μ(t), t ≥ 0, by

μ(t) = μ(t, dy, dz) ≜ δ_{(Y_n, Z_n)}, t ∈ [t(n), t(n+1)),

for n ≥ 0 (with the time instants t(n) as defined below). Also define for t > s ≥ 0, μ_s^t ∈ P(S × U × [s, t]) by

μ_s^t(A × B) ≜ (1/(t − s)) ∫_B μ(r)(A) dr,

for A, B Borel in S × U, [s, t] resp. Similar notation will be followed for other P(S × U)-valued processes. Recall that S, being a complete separable metric space, can be homeomorphically embedded as a dense subset of a compact metric space S̄. (See [5, Theorem 1.1.1, p. 2].) As any probability measure on S × U can be identified with a probability measure on S̄ × U that assigns zero probability to (S̄ − S) × U, we may view μ(·) as a random variable taking values in Ū ≜ the space of measurable functions μ(·) = μ(·, dy, dz) from [0, ∞) to P(S̄ × U). This space is topologized with the coarsest topology that renders continuous the maps μ(·) ∈ Ū → ∫_0^T g(t) ∫ f dμ(t) dt ∈ R for all f ∈ C(S̄), T > 0 and g ∈ L²[0, T]. (This topology, sometimes called the ‘stable’ topology, is quite common in the literature on existence of optimal controls; see, e.g., [4].) We shall assume that:

(∗) For f ∈ C(S̄), the function

(y, z, x) ∈ S × U × R^d → ∫ f(w) p(dw | y, z, x)

extends continuously to S̄ × U × R^d.

Later on we shall see a specific instance of how this might come about, viz., in the Euclidean case. With a minor abuse of notation, we retain the original notation to denote this extension. Finally, we denote by Ū₀ ⊂ Ū the subset {μ(·) ∈ Ū : μ(t)(S × U) = 1 ∀t} with the relative topology.

Lemma 2.1. Ū is compact metrizable.

Proof. For N ≥ 1, let {e_i^N(·), i ≥ 1} denote a complete orthonormal basis for L²[0, N]. Let {f_j} be countable dense in the unit ball of C(S̄). Then it is a convergence determining class for P(S̄). It can then be easily verified that

d(μ¹(·), μ²(·)) ≜ Σ_{N≥1} Σ_{i≥1} Σ_{j≥1} 2^{−(N+i+j)} ( | ∫_0^N e_i^N(t) ∫ f_j dμ¹(t) dt − ∫_0^N e_i^N(t) ∫ f_j dμ²(t) dt | ∧ 1 )


defines a metric on Ū consistent with its topology. To show sequential compactness, take {μ^n(·)} ⊂ Ū and, by a diagonal argument, pick a subsequence of {n}, denoted by {n} again by abuse of terminology, such that for each j and N, ∫ f_j dμ^n(·)|_{[0,N]} → ξ_j(·)|_{[0,N]} weakly in L²[0, N] for some real valued measurable functions {ξ_j(·), j ≥ 1} on [0, ∞) satisfying: ξ_j(·)|_{[0,N]} ∈ L²[0, N] for every N ≥ 1. Fix N. Mimicking the proof of the Banach–Saks theorem [1, Theorem 1.8.4], let n(1) = 1 and pick {n(k)} inductively to satisfy

Σ_{j=1}^∞ 2^{−j} max_{1≤m<k} | ∫_0^N ( ∫ f_j dμ^{n(k)}(t) − ξ_j(t) ) ( ∫ f_j dμ^{n(m)}(t) − ξ_j(t) ) dt | < 1/k,

which is possible because ∫ f_j dμ^n(·)|_{[0,N]} → ξ_j(·)|_{[0,N]} weakly in L²[0, N]. Denote by ‖ · ‖₂, ⟨·, ·⟩₂ the norm and inner product in L²[0, N]. Then for j ≥ 1,

‖ (1/m) Σ_{k=1}^m ∫ f_j dμ^{n(k)}(·) − ξ_j(·) ‖₂²
≤ (1/m²) ( 2mN² + 2 Σ_{i=2}^m Σ_{ℓ=1}^{i−1} | ⟨ ∫ f_j dμ^{n(i)}(·) − ξ_j(·), ∫ f_j dμ^{n(ℓ)}(·) − ξ_j(·) ⟩₂ | )
≤ (2N²/m²) [m + 2^j (m − 1)] → 0,

as m → ∞. Thus

(1/m) Σ_{k=1}^m ∫ f_j dμ^{n(k)}(·) → ξ_j(·)

strongly in L²[0, N] and hence a.e. along a subsequence {m(ℓ)} of {m}. Fix a t ≥ 0 for which this is true. Let μ′(t) be a limit point in P(S̄ × U) (which is a compact space by Prohorov's theorem; see [5, Chapter 2]) of the sequence

{ (1/m(ℓ)) Σ_{k=1}^{m(ℓ)} μ^{n(k)}(t), ℓ ≥ 1 }.

Then ξ_j(t) = ∫ f_j dμ′(t) ∀j, implying that

[ξ₁(t), ξ₂(t), …] ∈ { [ ∫ f₁ dμ, ∫ f₂ dμ, … ] : μ ∈ P(S̄ × U) }

a.e. in [0, N], where the ‘a.e.’ may be dropped by the choice of a suitable modification of the ξ_j's. Then this is true for all t, because N was arbitrary. By a standard measurable selection theorem (see, e.g., [7]), it then follows that there exists a μ*(·) ∈ Ū such that ξ_j(t) = ∫ f_j dμ*(t) ∀t, j. It follows that d(μ^n(·), μ*(·)) → 0. This completes the proof. □

We assume the stability conditions for {x_n}:

sup_n ‖x_n‖ < ∞ a.s., (5)

E[‖x_n‖] < ∞ ∀n. (6)

Eq. (6) is a mild condition. Eq. (5), i.e., the a.s. boundedness of iterates, usually has to be separately verified using a convenient stochastic Lyapunov type stability condition or otherwise. In addition, we shall need the following ‘stability’ condition for {Y_n}:

(†) For any t > 0, the set {μ_s^{s+t}, s ≥ 0} remains tight a.s.

By considering (†) for t rational and then using the obvious continuity of the map (s, t) → μ_s^t, we may take a common set of zero probability outside which (†) holds for all t > 0. A sufficient condition for (†) when S = R^k will be discussed later.

Define time instants t(0) = 0, t(n) = Σ_{m=0}^{n−1} a(m), n ≥ 1. By (3), t(n) ↑ ∞. Let I_n ≜ [t(n), t(n+1)], n ≥ 0. Define a continuous, piecewise linear x̄(t), t ≥ 0, by x̄(t(n)) = x_n, n ≥ 0, with linear interpolation on each interval I_n. That is,

x̄(t) = x_n + (x_{n+1} − x_n) (t − t(n)) / (t(n+1) − t(n)), t ∈ I_n.

Define h̃(x, ν) ≜ ∫ h(x, y) ν(dy, U) for ν ∈ P(S × U). For μ(·) as above, consider the non-autonomous o.d.e.

ẋ(t) = h̃(x(t), μ(t)). (7)

Let x^s(t), t ≥ s, denote the unique solution to (7) with x^s(s) = x̄(s), for s ≥ 0. Likewise, let x_s(t), t ≤ s, denote the unique solution to (7) ‘ending at s’:

ẋ_s(t) = h̃(x_s(t), μ(t)), t ≤ s,

with x_s(s) = x̄(s), for s ≥ 0. Define also

ζ_n = Σ_{m=0}^{n−1} a(m) M_{m+1}, n ≥ 1.

Then (ζ_n, F_n), n ≥ 1, is a zero mean martingale.
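The interpolated trajectory x̄(·) is straightforward to construct in code. A minimal sketch (notation and values made up for illustration):

```python
import numpy as np

def interpolate(x_iterates, a):
    """Return the piecewise-linear interpolation x_bar of the iterates,
    on the timescale t(0)=0, t(n)=sum_{m<n} a(m)."""
    t = np.concatenate(([0.0], np.cumsum(a)))
    def x_bar(s):
        n = np.searchsorted(t, s, side='right') - 1   # interval I_n containing s
        n = min(n, len(x_iterates) - 2)
        lam = (s - t[n]) / (t[n + 1] - t[n])
        return x_iterates[n] + lam * (x_iterates[n + 1] - x_iterates[n])
    return x_bar

x_it = [0.0, 1.0, 0.5, 0.75]
a = [1.0, 0.5, 0.25]
x_bar = interpolate(x_it, a)
print(x_bar(0.5))   # halfway through I_0: 0.5
print(x_bar(1.25))  # halfway through I_1: 0.75
```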

Lemma 2.2. For any T > 0,

lim_{s→∞} sup_{t∈[s, s+T]} ‖x̄(t) − x^s(t)‖ = 0, a.s.,

lim_{s→∞} sup_{t∈[s−T, s]} ‖x̄(t) − x_s(t)‖ = 0, a.s.

Proof. We shall only prove the first claim, as the arguments for proving the second claim are completely analogous.


Let [t] ≜ max{t(m) : t(m) ≤ t}. Then by construction,

x̄(t(n+m)) = x̄(t(n)) + Σ_{k=0}^{m−1} a(n+k) h̃(x̄(t(n+k)), μ(t(n+k))) + δ_{n,m}, (8)

where δ_{n,m} ≜ ζ_{n+m} − ζ_n. Compare this with

x^{t(n)}(t(n+m)) = x̄(t(n)) + ∫_{t(n)}^{t(n+m)} h̃(x^{t(n)}(t), μ(t)) dt
= x̄(t(n)) + Σ_{k=0}^{m−1} a(n+k) h̃(x^{t(n)}(t(n+k)), μ(t(n+k))) + ∫_{t(n)}^{t(n+m)} ( h̃(x^{t(n)}(y), μ(y)) − h̃(x^{t(n)}([y]), μ([y])) ) dy. (9)

Consider the integral on the right-hand side. In view of (5), C₀ ≜ sup_n ‖h(x_n, Y_n)‖ ∨ ‖x_n‖ < ∞ a.s. Then, by the Lipschitz property of h and (5), a straightforward application of the Gronwall inequality yields

sup_{t∈[s, s+T]} ‖x^s(t)‖ ≤ C_T < ∞

for a suitable C_T > 0, and hence

C ≜ sup_{t∈[s, s+T]} ‖h̃(x^s(t), μ(t))‖ < ∞.

Thus,

‖ ∫_{t(n)}^{t(n+m)} ( h̃(x^{t(n)}(t), μ(t)) − h̃(x^{t(n)}([t]), μ([t])) ) dt ‖
≤ ∫_{t(n)}^{t(n+m)} L ‖x^{t(n)}(t) − x^{t(n)}([t])‖ dt
≤ CL Σ_{k=n}^{n+m} (t(k+1) − t(k))²
≤ CL Σ_{k=n}^∞ a(k)² → 0 a.s. as n ↑ ∞, (10)

where L is the Lipschitz constant of h in its first argument and the second inequality uses the fact that

‖x^{t(n)}(t) − x^{t(n)}([t])‖ ≤ ‖ ∫_{[t]}^t h̃(x^{t(n)}(s), μ(s)) ds ‖ ≤ C(t − [t]) ≤ C a(m),

for t ∈ [t(m), t(m+1)]. Also, in view of (2), (3), and (5), we have

Σ_n a(n)² E[‖M_{n+1}‖² | F_n] < ∞, a.s.

Thus by [5, Theorem 3.3.4, p. 53], ζ_n converges a.s. as n → ∞. In particular,

sup_{m≥n} ‖δ_{n,m}‖ → 0 a.s. as n ↑ ∞. (11)

Subtracting (9) from (8) and taking norms, we have, in view of (10) and (11),

‖x̄(t(n+m)) − x^{t(n)}(t(n+m))‖ ≤ L Σ_{i=0}^{m−1} a(n+i) ‖x̄(t(n+i)) − x^{t(n)}(t(n+i))‖ + CL Σ_{k≥n} a(k)² + sup_{k≥n} ‖δ_{n,k}‖ a.s.

By the discrete Gronwall inequality, one then has

sup_{t∈[t(n), t(n)+T]} ‖x̄(t) − x^{t(n)}(t)‖ ≤ K_T ( CL Σ_{k≥n} a(k)² + sup_{k≥n} ‖δ_{n,k}‖ ) a.s.,

for a suitable K_T > 0. The claim follows from (10), (11) for the special case of s → ∞ along {t(n)}. The general claim follows easily from this. □

We shall also need the following lemma. Let μ^n(·) → μ^∞(·) in Ū₀.

Lemma 2.3. If x^n(·), n = 1, 2, …, ∞, denote solutions to (7) corresponding to μ(·) = μ^n(·) for n = 1, 2, …, ∞, with x^n(0) → x^∞(0), then lim_{n→∞} sup_{t∈[0,T]} ‖x^n(t) − x^∞(t)‖ = 0 for every T > 0.

Proof. By our choice of the topology on Ū₀,

∫_0^t g(s) ∫ f dμ^n(s) ds − ∫_0^t g(s) ∫ f dμ^∞(s) ds → 0,

for bounded continuous g : [0, t] → R, f : S̄ → R. Hence

∫_0^t ∫ f(s, ·) dμ^n(s) ds − ∫_0^t ∫ f(s, ·) dμ^∞(s) ds → 0,

for all bounded continuous f : [0, t] × S̄ → R of the form

f(s, w) = Σ_{m=1}^N a_m g_m(s) f_m(w),

for some N ≥ 1, scalars a_i and bounded continuous real valued functions g_i, f_i on [0, t], S̄ resp., for 1 ≤ i ≤ N. By the Stone–Weierstrass theorem, such functions can uniformly approximate any f ∈ C([0, t] × S̄). Thus the above convergence holds true for all such f, implying that μ^n(s)(dy, U) ds → μ^∞(s)(dy, U) ds in P(S̄ × [0, t]) and hence in P(S × [0, t]). Thus, in particular,

‖ ∫_0^t ( h̃(x^∞(s), μ^n(s)) − h̃(x^∞(s), μ^∞(s)) ) ds ‖ → 0.

A straightforward application of the Arzelà–Ascoli theorem implies that this convergence is in fact uniform for t in a compact set. Now for t > 0,

‖x^n(t) − x^∞(t)‖ ≤ ‖x^n(0) − x^∞(0)‖ + ∫_0^t ‖h̃(x^n(s), μ^n(s)) − h̃(x^∞(s), μ^n(s))‖ ds + ‖ ∫_0^t ( h̃(x^∞(s), μ^n(s)) − h̃(x^∞(s), μ^∞(s)) ) ds ‖.

In view of the foregoing, a straightforward application of the Gronwall inequality leads to the desired conclusion. □

3. Main results

The key consequence of (†) that we require is the following:

Lemma 3.1. Almost surely, every limit point (μ̃(·), x̃(·)) of (μ(s+·), x̄(s+·)) as s → ∞ is such that μ̃(·) satisfies μ̃(t) ∈ D(x̃(t)) for t > 0, and x̃(·) satisfies (7) with μ(·) = μ̃(·).

Proof. Let {f_i} be a countable set of bounded continuous functions S̄ × U → R that is a convergence determining class for P(S̄ × U). By replacing each f_i by a_i f_i + b_i for suitable a_i, b_i > 0, we may suppose that 0 ≤ f_i(·) ≤ 1 for all i. For each i,

ζ_n^i ≜ Σ_{m=1}^{n−1} a(m) ( f_i(Y_{m+1}) − ∫ f_i(w) p(dw | Y_m, Z_m, x_m) )

is a zero mean martingale with sup_n E[‖ζ_n^i‖²] ≤ Σ_n a(n)² < ∞. By the martingale convergence theorem, it converges a.s. Let τ(n, s) ≜ min{m ≥ n : t(m) ≥ t(n) + s} for s ≥ 0, n ≥ 0. Then as n → ∞,

Σ_{m=n}^{τ(n,t)} a(m) ( f_i(Y_{m+1}) − ∫ f_i(w) p(dw | Y_m, Z_m, x_m) ) → 0, a.s.,

for t > 0. By our choice of {f_i} and the fact that {a(n)} are eventually non-increasing (this is the only time the latter property is used),

Σ_{m=n}^{τ(n,t)} (a(m) − a(m+1)) f_i(Y_{m+1}) → 0, a.s.

Thus,

Σ_{m=n}^{τ(n,t)} a(m) ( f_i(Y_m) − ∫ f_i(w) p(dw | Y_m, Z_m, x_m) ) → 0, a.s.

Dividing by Σ_{m=n}^{τ(n,t)} a(m) ≥ t and using (∗), (†) and the uniform continuity of p(dw | y, z, x) in x on compacts, we obtain

∫_{t(n)}^{t(n)+t} ∫ ( f_i(y) − ∫ f_i(w) p(dw | y, z, x̄(s)) ) μ(s)(dy, dz) ds → 0, a.s.

Fix a sample point in the probability one set on which the above holds for all i. Let (μ̃(·), x̃(·)) be a limit point of (μ(s+·), x̄(s+·)) in Ū × C([0, ∞); R^d) as s → ∞. Then the above leads to

∫_0^t ∫ ( f_i(y) − ∫ f_i(w) p(dw | y, z, x̃(s)) ) μ̃(s)(dy, dz) ds = 0 ∀i. (12)

By (†), μ_0^t(S × U × [0, t]) = 1 ∀t, and thus μ_s^t(S × U × [s, t]) = 1 ∀t > s ≥ 0. By Lebesgue's theorem, μ̃(t)(S × U) = 1 for a.e. t. A similar application of Lebesgue's theorem in conjunction with (12) shows that

∫ ( f_i(y) − ∫ f_i(w) p(dw | y, z, x̃(t)) ) μ̃(t)(dy, dz) = 0 ∀i,

for a.e. t. The qualification ‘a.e. t’ here may be dropped throughout by choosing a suitable modification of μ̃(·). By our choice of {f_i}, this leads to

μ̃(t)(dw, U) = ∫ p(dw | y, z, x̃(t)) μ̃(t)(dy, dz).

The claim follows. □

Consider the differential inclusion

ẋ(t) ∈ ĥ(x(t)), (13)

where ĥ(x) ≜ {h̃(x, ν) : ν ∈ D(x)}. Recall that a closed set A ⊂ R^d is said to be an invariant set (resp., a positively/negatively invariant set) for (13) if for each x ∈ A, some trajectory x(t), −∞ < t < ∞ (resp., 0 ≤ t < ∞ / −∞ < t ≤ 0) of (13) with x(0) = x satisfies x(t) ∈ A ∀t ∈ R (resp., ∀t ≥ 0 / ∀t ≤ 0). It is said to be chain transitive in addition if for any x, y ∈ A and any ε, T > 0, there exist n ≥ 1 and points x₀ = x, x₁, …, x_{n−1}, x_n = y such that the trajectory of (13) initiated at x_i meets the ε-neighborhood of x_{i+1} at some t ≥ T, for 0 ≤ i < n. (If we restrict to y = x in the above, the set is said to be chain recurrent.)

Combining the above lemmas immediately leads to our main result.

Theorem 3.1. Almost surely, {x̄(s+·), s ≥ 0} converges to a chain transitive invariant set of (13). In particular, {x_n} converges a.s. to such a set.

Proof. Consider a sample point where (5) and the conclusions of Lemma 2.2 hold. Let A denote the set ∩_{t≥0} cl({x̄(s) : s ≥ t}), where cl(·) denotes closure. Since x̄(·) is continuous and bounded, A will be nonempty, compact and connected. If x ∈ A, there exist s_n ↑ ∞ in [0, ∞) such that x̄(s_n) → x. Then by the above lemma and the first part of Lemma 2.2, it follows that as n ↑ ∞, every limit point (x̃(·), μ̃(·)) of (x̄(s_n + ·), μ(s_n + ·)) in C([0, ∞); R^d) × Ū satisfies: x̃(·) is a trajectory of (7) with μ(·) replaced by μ̃(·) and with x̃(0) = x. Thus x̄(s_n + t) → x̃(t) along a subsequence for each t > 0, implying that x̃(t) ∈ A as well. A similar argument works for t < 0, using the second part of Lemma 2.2. Thus, A is invariant under (13).

Let x₁, x₂ ∈ A and ε, T > 0. Let X(x, s, t) denote the solution to (7) starting at x at time s, for t ≥ s. Pick δ ∈ (0, ε/4) such that if ‖x − y‖ < δ, then sup_{t∈[s,s+2T], s≥0} ‖X(x, s, t) − X(y, s, t)‖ < ε/4, regardless of the choice of μ(·). (This is possible by a standard Gronwall-based estimate, using the uniform Lipschitz condition on h̃(·, ν).) Pick n₀ ≥ 1 large enough so that for n ≥ n₀, sup_{t∈[t(n), t(n)+2T]} ‖x̄(t) − x^{t(n)}(t)‖ < ε/4 and, furthermore, x̄(t(n₀)+·) is contained in the δ-neighborhood of A. (This is possible by Lemma 2.2.) Let n₂ > n₁ > n₀ be such that ‖x̄(t(n_i)) − x_i‖ < δ for i = 1, 2. Set s(1) = t(n₁) and pick s(1) < s(2) < · · · < s(N) = t(n₂) for some N ≥ 1 such that s(i) = t(k(i)) for some k(i) (thus k(1) = n₁, k(N) = n₂) and T ≤ s(n+1) − s(n) ≤ 2T for 1 ≤ n < N. Let y(1) = x₁, y(N) = x₂, and pick y(n) ∈ A, 1 < n < N, such that y(n) is in the δ-neighborhood of x̄(s(n)) for each n in this range. For 1 ≤ n < N, let y^n(s(n)+·) denote a trajectory of (13) starting at y(n) such that

‖y^n(s(n+1)) − x^{s(n)}(s(n+1))‖ < ε/2.

This is possible for large n₀. By our choice of n₀, ‖x^{s(n)}(s(n+1)) − x̄(s(n+1))‖ < ε/4. Finally, by our choice of y(n+1), ‖y(n+1) − x̄(s(n+1))‖ < δ < ε/4. Hence y^n(s(n+1)) is in the ε-neighborhood of y(n+1). This proves chain transitivity, completing the proof. □

There are some important extensions of the foregoing that are immediate:

• When the set {sup_n ‖x_n‖ < ∞} has a positive probability not necessarily equal to one, we still have

Σ_n a(n)² E[‖M_{n+1}‖² | F_n] < ∞

a.s. on this set. [5, Theorem 3.3.4, p. 53] then tells us that ζ_n converges a.s. on this set. Thus, by the same arguments as before (which are pathwise), Theorem 3.1 continues to hold ‘a.s. on the set {sup_n ‖x_n‖ < ∞}’.

• While we took {a(n)} to be deterministic in the foregoing, the above arguments would go through if {a(n)} are random and bounded, satisfy (3), and the martingale difference property of {M_n} along with (2) hold with F_n redefined as σ(x_m, M_m, a(m), m ≤ n) for n ≥ 0. In fact, the boundedness condition for random {a(n)} could be relaxed by imposing appropriate moment conditions. There are applications, e.g., in system identification, where {a(n)} are naturally random.

• The foregoing would go through even if we replaced (1) by

x_{n+1} = x_n + a(n)[h(x_n, Y_n) + M_{n+1} + ε(n)], n ≥ 0,

where {ε(n)} is a deterministic or random bounded sequence which is o(1). This is because {ε(n)} then contributes an additional error term in the proof of Lemma 2.2 which is also asymptotically negligible and hence does not affect the conclusions.

For special cases, more can be said, e.g., the following:

Corollary 3.1. Suppose there is no additional control process {Z_n} as above, and for each x ∈ R^d, with x_n ≡ x ∀n, {Y_n} above is an ergodic Markov process with a unique invariant probability measure η(x) = η(x, dy). Then (13) above may be replaced by the o.d.e.

ẋ(t) = h̃(x(t), η(x(t))). (14)

If p(dw | y, x) denotes the transition kernel of this ergodic Markov process, then η(x) is characterized by

∫ ( f(y) − ∫ f(w) p(dw | y, x) ) η(x, dy) = 0

for bounded f ∈ C(S). Since this equation is preserved under convergence in P(S), it follows that x → η(x), and therefore x → h̃(x, η(x)), is a continuous map. This guarantees the existence of solutions to (14) by standard o.d.e. theory, though not their uniqueness. In general, the solution set for a fixed initial condition will be a non-empty compact subset of C([0, ∞); R^d). For uniqueness, we need h̃(·, η(·)) to be Lipschitz, which needs additional information about η and the transition kernel p.

We conclude this section with a sufficient condition for (†) when S = R^m for some m ≥ 1. The condition, which is a variant of the well known Foster–Lyapunov criterion (see, e.g., [6, Chapters 11 and 12]), is that there exist V ∈ C(R^m) such that lim_{‖x‖→∞} V(x) = ∞ and, furthermore,

sup_n E[V(Y_n)²] < ∞, (15)


and for some compact B ⊂ R^m and scalar ε₀ > 0,

E[V(Y_{n+1}) | F_n] ≤ V(Y_n) − ε₀ (16)

a.s. on {Y_n ∉ B}.

In the earlier framework, we now replace S by R^m and S̄ by R̄^m ≜ the one point compactification of R^m, with the additional ‘point at infinity’ denoted simply by ‘∞’. Let {f_i} denote a countable convergence determining class of functions for R^m satisfying lim_{‖x‖→∞} |f_i(x)| = 0 for all i. Thus, they extend continuously to R̄^m with value zero at ∞. Also, note that by the dominated convergence theorem, lim_{‖y‖→∞} ∫ f_i(w) p(dw | y, z, x) = 0 as well. We assume that this limit is uniform in z, x, which verifies (∗).
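As a concrete (made-up) instance of the drift condition, a stable scalar AR(1) kernel Y_{n+1} = 0.5 Y_n + ξ_{n+1}, with ξ standard Gaussian, satisfies (15)–(16) with V(y) = |y|: E[V(Y_{n+1}) | Y_n = y] ≤ 0.5|y| + E|ξ|, which is ≤ |y| − 1 once |y| ≥ 2(1 + E|ξ|). A quick Monte Carlo check of the drift outside a compact set (all numbers here are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def drift_at(y, n_samples=200_000):
    # Monte Carlo estimate of E[V(Y_{n+1}) | Y_n = y] - V(y) for V = abs,
    # under the toy kernel Y_{n+1} = 0.5*Y_n + xi, xi ~ N(0, 1).
    xi = rng.normal(size=n_samples)
    return np.mean(np.abs(0.5 * y + xi)) - abs(y)

# Outside B = [-4, 4] the drift is <= -eps0 for some eps0 > 0:
print(drift_at(6.0) < -1.0)    # drift <= 0.5*6 + E|xi| - 6 < -1, so True
print(drift_at(-8.0) < -1.0)   # True
```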

Lemma 3.2. As s → ∞, any limit point (μ*(·), x*(·)) of (μ(s+·), x̄(s+·)) in Ū × C([0, ∞); R^d) is of the form

μ*(t) = a(t) μ̃(t) + (1 − a(t)) δ_∞

for t ≥ 0, with a(·) a measurable function [0, ∞) → [0, 1] and μ̃(t) ∈ D(x*(t)) ∀t.

Proof. For {f_i} as above, argue as in the proof of Lemma 3.1 to conclude that

∫ ( f_i(y) − ∫ f_i(w) p(dw | y, z, x*(t)) ) μ*(t)(dy, dz) = 0 ∀i,

for all t, a.s. Write μ*(t) = a(t) μ̃(t) + (1 − a(t)) δ_∞ with a(·) : [0, ∞) → [0, 1] a measurable map, which is always possible: the decomposition is in fact unique for those t for which a(t) > 0, and μ̃(t) can be chosen arbitrarily when a(t) = 0. This ensures a measurable choice of a(·). Then when a(t) > 0, the above reduces to

∫ ( f_i(y) − ∫ f_i(w) p(dw | y, z, x*(t)) ) μ̃(t)(dy, dz) = 0 ∀i,

for all t. The claim follows. □

Corollary 3.2. (†) holds.

Proof. Replacing f_i by V in the proof of the above lemma, and using (15) to justify the use of the martingale convergence theorem therein, we have

lim_{s→∞} ∫_0^t ∫ ( ∫ V(w) p(dw | y, z, x̄(s+r)) − V(y) ) μ(s+r)(dy, dz) dr = 0,

a.s. Fix a sample point where this and the above lemma hold. Extend the map

(y, z, x) ∈ S × U × R^d → ∫ V(w) p(dw | y, z, x) − V(y) ∈ R

to S̄ × U × R^d by setting it equal to −ε₀ when y = ∞. Then it is upper semicontinuous on S̄ × U × R^d. Hence, taking the above limit along an appropriate subsequence (denoted by, say, {s(n)}), we get

0 ≤ −ε₀ ∫_0^t (1 − a(s)) ds + ∫_0^t a(s) ∫ ( ∫ V(w) p(dw | y, z, x*(s)) − V(y) ) μ̃(s)(dy, dz) ds = −ε₀ ∫_0^t (1 − a(s)) ds,

by the preceding lemma. Thus a(s) = 1 a.e., where the ‘a.e.’ may be dropped by taking a suitable modification of μ*(·). This implies that the convergence of μ(s(n) + ·) to μ*(·) is in fact in Ū₀. This establishes (†). □

References

[1] A.V. Balakrishnan, Applied Functional Analysis, Springer, New York, 1976.

[2] M. Benaïm, J. Hofbauer, S. Sorin, Stochastic approximations and differential inclusions, SIAM J. Control Optim., 2005, to appear.

[3] A. Benveniste, M. Metivier, P. Priouret, Adaptive Algorithms andStochastic Approximations, Springer, Berlin, Heidelberg, 1991.

[4] V.S. Borkar, Optimal Control of Diffusion Processes, Pitman LectureNotes in Mathematics, vol. 203, Longman Scientific and Technical,Harlow, UK, 1989.

[5] V.S. Borkar, Probability Theory: An Advanced Course, Springer, NewYork, 1995.

[6] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability,Springer, London, 1993.

[7] T. Parthasarathy, Selection Theorems and Their Applications, SpringerLecture Notes in Mathematics, vol. 263, Springer, Berlin, 1972.