
Systems & Control Letters 4 (1984) 1-4 North-Holland

February 1984

A note on controlled diffusions on line with time-averaged cost

Vivek S. BORKAR Tata Institute of Fundamental Research, P.O. Box 1234, Bangalore 560012, India

Received 24 August 1983 Revised 15 October 1983

Existence of stable optimal Markov controls is established for a class of cost functions for controlled one-dimensional diffusions with time-averaged cost.

Keywords: Controlled diffusions, Existence of optimal controls, Markov controls, Time-averaged cost, Invariant probabilities.

1. Introduction

This note extends a result of [5] to the control of one-dimensional diffusions with time-averaged cost.

The set-up is as follows: Let U be a compact metric space and

m : R × U → R,  σ : R → R

bounded continuous maps such that m is continuous in its first argument uniformly w.r.t. the second and σ(·) ≥ a for some a > 0. For any Polish space S, let P(S) denote the Polish space of probability measures on S with the topology of the Prohorov metric [4]. Define

m̄ : R × P(U) → R

by

m̄(x, π) = ∫_U m(x, u) π(du),  x ∈ R, π ∈ P(U).

It is easily verified that m̄ is bounded continuous and continuous in the first argument uniformly with respect to the second. Let {X(t)} be the unique weak solution to

X(t) = X(0) + ∫₀ᵗ m̄(X(s), r(s)) ds + ∫₀ᵗ σ(X(s)) dW(s),  t ∈ [0, ∞),  (1.1)

where {W(t)} is a Wiener process and {r(t)} a P(U)-valued process progressively measurable with respect to the completed natural filtration of {X(t)}. Call such an {r(t)} an admissible control. It is called a Markov control if in addition r(t) = g(X(t)) for all t and some measurable g : R → P(U). By abuse of terminology, we refer to this Markov control simply as g. Under a Markov control g, {X(t)} is a Markov process which is either positive recurrent or not; and if it is, it has a unique invariant probability measure ν_g ∈ P(R) such that

lim_{t→∞} (1/t) ∫₀ᵗ f(X(s)) ds = ∫ f dν_g  a.s.  (1.2)

for any bounded continuous f : R → R. (See [2], [3] and the references therein.) Call g a stable Markov control if it renders {X(t)} positive recurrent. We assume that at least one such control exists.

The cost function c is a bounded continuous map c : R → R satisfying:

c∞ = lim_{|x|→∞} c(x)

exists in [0, ∞) and satisfies

c∞ > α := inf ∫ c dν_g,  (1.3)

the infimum being over all stable Markov controls.

Remark. This condition clearly holds if c increases monotonically as |x| → ∞. It can be relaxed to:

lim inf_{|x|→∞} c(x) > α.

The modifications in the proofs that follow, required to accommodate this generalization, are minor and self-evident.

A Markov control g is called optimal if, under an arbitrary admissible control {r(t)},

lim inf_{t→∞} (1/t) ∫₀ᵗ c(X(s)) ds ≥ α = ∫ c dν_g  a.s.

0167-6911/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)


Clearly, g must be stable. Our aim is to show the existence of an optimal Markov control.
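To make the ergodic relation (1.2) concrete, here is a minimal simulation sketch: an Euler–Maruyama discretization of (1.1) for the illustrative uncontrolled choice m(x, u) = −x, σ ≡ 1 (an Ornstein–Uhlenbeck process, whose invariant law is N(0, 1/2)), comparing a long-run time average with the corresponding invariant-measure integral. The horizon, step size and seed are arbitrary choices, not from the paper.

```python
import math
import random

def time_average(f, drift, sigma, T=2000.0, dt=0.01, x0=0.0, seed=0):
    """Approximate (1/T) * integral_0^T f(X(s)) ds for the Euler-Maruyama
    discretization of dX = drift(X) dt + sigma(X) dW."""
    rng = random.Random(seed)
    x, acc = x0, 0.0
    sqdt = math.sqrt(dt)
    for _ in range(int(T / dt)):
        acc += f(x) * dt
        x += drift(x) * dt + sigma(x) * sqdt * rng.gauss(0.0, 1.0)
    return acc / T

# Ornstein-Uhlenbeck: dX = -X dt + dW, invariant law N(0, 1/2).
# The time average of f(x) = x^2 should approach the invariant-measure
# integral, which equals 1/2 here (this f is unbounded, unlike the test
# functions in (1.2), but the comparison still illustrates the point).
avg = time_average(f=lambda x: x * x, drift=lambda x: -x, sigma=lambda x: 1.0)
print(avg)   # close to 0.5 for a long horizon
```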

Remark. The admissible controls here are actually relaxed controls. If m(x, U) is assumed to be convex for each x (true, e.g., if U is connected), then standard arguments as in [1] show that for each measurable g : R → P(U), there is a measurable h : R → U such that

m̄(x, g(x)) = m(x, h(x)),  x ∈ R.

Then the results here also yield the existence of an optimal Markov control g which is non-relaxed, i.e. for each x, g(x) has a point support.

2. Main results

Consider a Markov control g. Let

m_g(x) = m̄(x, g(x))

and define ψ_g : R → R by

ψ_g(x) = ∫₀ˣ exp( −∫₀ʸ (2 m_g(z) / σ²(z)) dz ) dy.

Then ψ_g satisfies

d²ψ_g/dx² + (2 m_g / σ²) dψ_g/dx = 0.  (2.1)

Since ψ_g is monotone strictly increasing and continuous, ψ_g(R) has the form (a, b) for some b > a, a, b ∈ [−∞, ∞]. Thus φ_g : ψ_g(R) → R satisfying

φ_g ∘ ψ_g(x) = x  for all x

is a well-defined continuous onto map. Let C₁ (C₂) denote the Banach space of continuous maps (a, b) → R (R → R) vanishing at a, b (±∞), with the supremum norm. Let G ⊂ C₂ be countable dense such that f ∈ G implies that f is twice continuously differentiable with the first two derivatives f′, f″ vanishing at ±∞. Let

G_g = {h : (a, b) → R | h = f ∘ φ_g for some f ∈ G}.

Then it is easily seen that G_g ⊂ C₁ and the inclusion is dense. For each f ∈ G, the following identity can be verified by direct substitution:

(½σ²f″ + m_g f′) ∘ φ_g = ((½σ²(ψ_g′)²) ∘ φ_g) h″,  (2.2)

where h = f ∘ φ_g. Using this, it is easily seen that each h ∈ G_g is twice continuously differentiable with h′, h″ vanishing at a, b.

Lemma 2.1. If μ ∈ P(R) satisfies

∫ (½σ²f″ + m_g f′) dμ = 0  (2.3)

for all f ∈ G, then μ = ν_g.

Remark. Existence of ν_g, i.e. positive recurrence of {X(t)} under g, is a part of the conclusion.

Proof. Let ν ∈ P(R) be the image of μ under ψ_g. From (2.2), it follows that (2.3) holds if and only if

∫ ((½σ²(ψ_g′)²) ∘ φ_g) h″ dν = 0

for all h ∈ G_g. Using the criterion of [6] for invariant measures of Markov processes, it follows that ν is the unique invariant probability for the process

Y(t) = ψ_g(X(t)),  t ∈ [0, ∞),

with state space (a, b) and satisfying

Y(t) = Y(0) + ∫₀ᵗ (σψ_g′) ∘ φ_g (Y(s)) dW(s).

It is clear that invariance of ν for {Y(t)} implies the invariance of μ for {X(t)}. □
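Lemma 2.1 says that (2.3) pins down ν_g. As a numerical sanity check (not from the paper), take the uncontrolled drift m_g(x) = −x with σ ≡ 1, whose invariant density is proportional to exp(−x²); the integral in (2.3) then vanishes for smooth test functions. The grid and the test function f(x) = exp(−(x−1)²) are arbitrary choices.

```python
import math

def criterion_integral(f1, f2, m, lo=-8.0, hi=8.0, n=4000):
    """Riemann-sum approximation of the integral of (0.5*f'' + m*f') dnu,
    where nu(dx) is proportional to exp(-x^2) dx, the invariant law of
    dX = -X dt + dW."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    w = [math.exp(-x * x) for x in xs]     # unnormalised invariant density
    z = sum(w) * h                         # normalising constant
    vals = [(0.5 * f2(x) + m(x) * f1(x)) * wx for x, wx in zip(xs, w)]
    return sum(vals) * h / z

# Test function f(x) = exp(-(x-1)^2), with analytic derivatives.
f1 = lambda x: -2.0 * (x - 1.0) * math.exp(-(x - 1.0) ** 2)              # f'
f2 = lambda x: (4.0 * (x - 1.0) ** 2 - 2.0) * math.exp(-(x - 1.0) ** 2)  # f''

val = criterion_integral(f1, f2, m=lambda x: -x)
print(abs(val))   # numerically close to 0, as (2.3) requires
```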

Now consider {X(t)} governed by an arbitrary admissible control {r(t)}. Let R̄ = R ∪ {∞}, the one point compactification of R, and let

𝒜 = {A × B | A, B Borel sets in R̄, P(U) resp.}.

For t ∈ [0, ∞), let

μ_t⁰(A × B) = (1/t) ∫₀ᵗ I{X(s) ∈ A, r(s) ∈ B} ds.

For fixed sample point and fixed t, μ_t⁰ is a probability measure on the field 𝒜 and hence extends uniquely to a probability measure μ_t on the product σ-field of R̄ × P(U). Then {μ_t} is a P(R̄ × P(U))-valued process. By Prohorov's theorem [4], R̄ × P(U) and hence P(R̄ × P(U)) are compact, and thus {μ_t} has, for each sample point, a non-empty sample point-dependent compact set of limit points in P(R̄ × P(U)).


For each μ ∈ P(R̄ × P(U)), we can write the decomposition

μ(A) = δ_μ μ′(A ∩ (R × P(U))) + (1 − δ_μ) μ″(A ∩ ({∞} × P(U))),

for A Borel in R̄ × P(U), where δ_μ ∈ [0, 1], μ′ ∈ P(R × P(U)) and μ″ ∈ P({∞} × P(U)). Here, δ_μ is uniquely specified. If δ_μ > 0, μ′ is also unique. Then, since R and P(U) are Polish, there exist μ* ∈ P(R) and a μ*-a.s. unique measurable map q_μ : R → P(P(U)) such that for any bounded continuous f : R × P(U) → R,

∫ f dμ′ = ∫_R ∫_{P(U)} f(w₁, w₂) q_μ(w₁)(dw₂) μ*(dw₁),

where w₁, w₂ are dummy variables of integration for the outer and the inner integral respectively. In particular, since m̄(x, P(U)) is convex for each x, a simple application of Theorem 4 of [8] shows that there exists a measurable map g_μ : R → P(U) such that

m̄(w₁, g_μ(w₁)) = ∫_{P(U)} m̄(w₁, w₂) q_μ(w₁)(dw₂),  w₁ ∈ R.
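When m(x, U) is an interval (cf. the Remark on non-relaxed controls), the selection behind g_μ can be made concrete by elementary means: the averaged drift lies in m(x, U), so a point control attaining it can be found by bisection. The choices U = [0, 1], m(x, u) = u² − x and the uniform relaxed control below are hypothetical, purely for illustration; the paper itself invokes Theorem 4 of [8].

```python
def select_control(target, m_x, lo=0.0, hi=1.0, tol=1e-10):
    """Find u in [lo, hi] with m_x(u) == target by bisection, assuming
    m_x is continuous, monotone, and target lies in m_x([lo, hi])."""
    a, b = (lo, hi) if m_x(lo) <= m_x(hi) else (hi, lo)
    while abs(b - a) > tol:
        mid = 0.5 * (a + b)
        if m_x(mid) < target:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

# Hypothetical data: fix x, take m(x, u) = u^2 - x on U = [0, 1]; the
# uniform relaxed control gives averaged drift of (u^2 - x) over [0, 1],
# i.e. 1/3 - x.
x = 0.7
u_star = select_control(1.0 / 3.0 - x, lambda u: u * u - x)
print(u_star)   # approx. sqrt(1/3) = 0.5774, since u^2 - x = 1/3 - x there
```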

Lemma 2.2. Outside a null set, each limit point μ of {μ_t} in P(R̄ × P(U)) for which δ_μ > 0 satisfies μ* = ν_{g_μ}.

Proof. Fix f ∈ G. By Itô's formula,

f(X(t)) − f(X(0)) = ∫₀ᵗ ξ₁(s) ds + ∫₀ᵗ ξ₂(s) dW(s),  (2.4)

where

ξ₁(s) = ½σ²(X(s)) f″(X(s)) + m̄(X(s), r(s)) f′(X(s)),

ξ₂(s) = f′(X(s)) σ(X(s)).

The second term on the right-hand side in (2.4) equals W̃(T_t) for some Wiener process {W̃(t)}, where

T_t = ∫₀ᵗ ξ₂²(s) ds

([7], pp. 85-92). Then,

lim_{t→∞} W̃(T_t) exists finitely a.s. on {lim_{t→∞} T_t < ∞},

lim_{t→∞} W̃(T_t)/T_t = 0 a.s. on {lim_{t→∞} T_t = ∞},

lim sup_{t→∞} T_t/t < ∞ a.s.

Combining these facts, we have

lim_{t→∞} (1/t) ∫₀ᵗ ξ₂(s) dW(s) = 0 a.s.

Since f is bounded, we conclude that

lim_{t→∞} (1/t) ∫₀ᵗ ξ₁(s) ds = 0 a.s.

Recalling the definition of μ_t and the fact that f′, f″ vanish at ±∞, it is easily verified that the above implies: Outside a null set, each limit point μ of {μ_t} for which δ_μ > 0 satisfies

∫ (½σ²f″ + m_{g_μ} f′) dμ* = 0.

Since G is countable, we can choose a null set outside which the above holds for all f ∈ G. The claim now follows from Lemma 2.1. □

Let {gⁿ} be a sequence of stable Markov controls such that

∫ c dν_{gⁿ} ↓ α.

Define μₙ ∈ P(R̄ × U), n = 1, 2, …, as follows: For any bounded continuous f : R̄ × U → R, let

∫ f dμₙ = ∫_R ∫_U f(x, u) gⁿ(x)(du) ν_{gⁿ}(dx).

By Prohorov's theorem, R̄ × U and hence P(R̄ × U) are compact. Hence {μₙ} has a non-empty compact set of limit points in P(R̄ × U). Let μ_∞ be a limit point of {μₙ} and let δ = μ_∞(R × U). Then it is easily verified that

α = lim_{n→∞} ∫ c dν_{gⁿ} ≥ (1 − δ) c∞.

(1.3) then implies that δ > 0. Decompose μ_∞ as

μ_∞(A) = δ μ′_∞(A ∩ (R × U)) + (1 − δ) μ″_∞(A ∩ ({∞} × U)),



for A Borel in R̄ × U, where

μ′_∞ ∈ P(R × U) and μ″_∞ ∈ P({∞} × U).

Clearly, μ′_∞ is uniquely defined. Then there exist μ*_∞ ∈ P(R) and a μ*_∞-a.s. unique measurable map g_∞ : R → P(U) such that for any bounded continuous f : R × U → R,

∫ f dμ′_∞ = ∫_R ∫_U f(x, u) g_∞(x)(du) μ*_∞(dx).

Lemma 2.3. μ*_∞ = ν_{g_∞} and ∫ c dν_{g_∞} = α.

Proof. Let f ∈ G. Define f̃ : R̄ × U → R by

f̃(x, u) = ½σ²(x) f″(x) + m(x, u) f′(x),  x ∈ R, u ∈ U,

f̃(∞, u) = 0,  u ∈ U.

Then f̃ is bounded continuous and for n = 1, 2, …,

∫ f̃ dμₙ = 0.

(See [6], p. 1.) From this, using the facts that δ > 0 and f′, f″ vanish at ±∞, one can easily verify that

∫ (½σ²f″ + mf′) dμ′_∞ = ∫ (½σ²f″ + m_{g_∞} f′) dμ*_∞ = 0.

By Lemma 2.1, μ*_∞ = ν_{g_∞}. Hence

∫ c dν_{g_∞} ≥ α.

Thus

α = lim_{n→∞} ∫ c dν_{gⁿ} = δ ∫ c dν_{g_∞} + (1 − δ) c∞.

From (1.3), it follows that δ = 1 and

∫ c dν_{g_∞} = α.  □

Theorem 2.1. An optimal Markov control exists.

Proof. The stable Markov control g_∞ in the preceding lemma satisfies α = ∫ c dν_{g_∞}. Under an arbitrary admissible control {r(t)}, Lemma 2.2 implies that outside a null set, the following holds:


Suppose μ is a limit point of {μ_t} such that μ_{t_n} → μ for some {t_n} ⊂ [0, ∞). Then

lim_{n→∞} (1/t_n) ∫₀^{t_n} c(X(s)) ds = δ_μ ∫ c dμ* + (1 − δ_μ) c∞ ≥ δ_μ α + (1 − δ_μ) c∞ ≥ α.

Thus

lim inf_{t→∞} (1/t) ∫₀ᵗ c(X(s)) ds ≥ α  a.s.

The claim follows. □

Remarks. Though Echeverria's test can be directly applied for diffusion processes with bounded continuous coefficients, the same cannot be said when the drift coefficient is only measurable. This is precisely why we introduce the map ψ_g above, which works around this difficulty by transferring the original problem to an identical one concerning another diffusion (viz. Y(t) = ψ_g(X(t)) in Lemma 2.1), whose coefficients are continuous. A similar transformation was used by Zvonkin [9] to prove the existence and uniqueness of strong solutions to a one-dimensional stochastic differential equation with measurable drift coefficient and Lipschitz-continuous diffusion coefficient.
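The drift removal behind this transformation can be spelled out in one line of Itô calculus: for Y(t) = ψ_g(X(t)),

```latex
\mathrm{d}Y(t)
  = \psi_g'(X(t))\,\mathrm{d}X(t)
    + \tfrac12 \psi_g''(X(t))\,\mathrm{d}\langle X\rangle(t)
  = \underbrace{\bigl(m_g\psi_g' + \tfrac12\sigma^2\psi_g''\bigr)}_{=\,0
      \text{ by } (2.1)}(X(t))\,\mathrm{d}t
    + \bigl(\sigma\psi_g'\bigr)\circ\varphi_g\,(Y(t))\,\mathrm{d}W(t),
```

so {Y(t)} is a driftless diffusion with continuous diffusion coefficient, to which the criterion of [6] applies.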

References

[1] V.E. Beneš, Existence of optimal strategies based on specified information, for a class of stochastic decision problems, SIAM J. Control 8 (2) (1970).

[2] R.N. Bhattacharya, Asymptotic behaviour of several dimensional diffusions, in: L. Arnold and R. Lefever, Eds., Stochastic Nonlinear Systems (Springer, Berlin-New York, 1981).

[3] R.N. Bhattacharya, Criteria for recurrence and existence of invariant measures for multidimensional diffusions, Ann. Probab. 6 (1978). Correction note: Ann. Probab. 8 (1980).

[4] P. Billingsley, Convergence of Probability Measures (Wiley, New York, 1968).

[5] V.S. Borkar, On minimum cost per unit time control of Markov chains, to appear.

[6] P. Echeverria, A criterion for invariant measures of Markov processes, Z. Wahrsch. Verw. Gebiete 61 (1982).

[7] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes (North-Holland, Amsterdam; Kodansha, Tokyo, 1981).

[8] E.J. McShane and R.B. Warfield, On Filippov's implicit functions lemma, Proc. AMS (Feb. 1967).

[9] A.K. Zvonkin, A transformation of the phase space of a diffusion process that removes the drift, Math. USSR Sbornik 22 (1) (1974).