Systems & Control Letters 4 (1984) 1-4
North-Holland

February 1984

A note on controlled diffusions on line with time-averaged cost

Vivek S. BORKAR
Tata Institute of Fundamental Research, P.O. Box 1234, Bangalore 560012, India

Received 24 August 1983
Revised 15 October 1983

Existence of stable optimal Markov controls is established for a class of cost functions for controlled one-dimensional diffusions with time-averaged cost.

Keywords: Controlled diffusions, Existence of optimal controls, Markov controls, Time-averaged cost, Invariant probabilities.
1. Introduction

This note extends a result of [5] to the control of one-dimensional diffusions with time-averaged cost.

The set-up is as follows. Let $U$ be a compact metric space and let

$m: R \times U \to R$, $\sigma: R \to R$

be bounded continuous maps such that $m$ is continuous in its first argument uniformly w.r.t. the second and $\sigma(\cdot) \ge a$ for some $a > 0$. For any Polish space $S$, let $P(S)$ denote the Polish space of probability measures on $S$ with the topology of the Prohorov metric [4]. Define

$\hat{m}: R \times P(U) \to R$

by $\hat{m}(x, \pi) = \int_U m(x, u)\,\pi(du)$, $x \in R$, $\pi \in P(U)$.

It is easily verified that $\hat{m}$ is bounded continuous and continuous in the first argument uniformly with respect to the second. Let $\{X(t)\}$ be the unique weak solution to

$X(t) = \int_0^t \hat{m}(X(s), \pi(s))\,ds + \int_0^t \sigma(X(s))\,dW(s)$, $t \in [0, \infty)$,  (1.1)

where $\{W(t)\}$ is a Wiener process and $\{\pi(t)\}$ a $P(U)$-valued process progressively measurable with respect to the completed natural filtration of $\{X(t)\}$. Call such a $\{\pi(t)\}$ an admissible control. It is called a Markov control if in addition $\pi(t) = g(X(t))$ for all $t$ and some measurable $g: R \to P(U)$. By abuse of terminology, we refer to this Markov control simply as $g$. Under a Markov control $g$, $\{X(t)\}$ is a Markov process which is either positive recurrent or not; and if it is, it has a unique invariant probability measure $\nu_g \in P(R)$ such that

$\lim_{t\to\infty} \frac{1}{t}\int_0^t f(X(s))\,ds = \int f\,d\nu_g$  a.s.  (1.2)

for any bounded continuous $f: R \to R$. (See [2], [3] and the references therein.) Call $g$ a stable Markov control if it renders $\{X(t)\}$ positive recurrent. We assume that at least one such control exists.

The cost function $c$ is a bounded continuous map $R \to R$ satisfying: the limit

$\alpha_0 = \lim_{|x|\to\infty} c(x)$

exists in $[0, \infty)$ and satisfies

$\alpha_0 > \alpha \triangleq \inf \int c\,d\nu_g$,  (1.3)

the infimum being over all stable Markov controls.

Remark. This condition clearly holds if $c$ increases monotonically as $x \to \pm\infty$. It can be relaxed to:

$\liminf_{|x|\to\infty} c(x) > \alpha$.

The modifications in the proofs that follow, required to accommodate this generalization, are minor and self-evident.

A Markov control $g$ is called optimal if under an arbitrary admissible control $\{\pi(t)\}$,

$\liminf_{t\to\infty} \frac{1}{t}\int_0^t c(X(s))\,ds \ge \alpha = \int c\,d\nu_g$  a.s.

0167-6911/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)
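The ergodic relation (1.2) and the time-averaged cost can be illustrated numerically. The following sketch is not from the paper; the model $m(x, u) = -ux$, $\sigma \equiv \sqrt{2}$, the constant control $u = 1$ and the cost $c(x) = x^2/(1+x^2)$ are illustrative choices. It simulates the diffusion by Euler-Maruyama and compares the time average of $c$ along the path with $\int c\,d\nu_g$, where $\nu_g = N(0, 1)$ is the invariant law of this model:

```python
import numpy as np

# Illustrative model (not from the paper): m(x, u) = -u*x, sigma = sqrt(2).
# Under the constant Markov control u = 1 the process is Ornstein-Uhlenbeck
# with invariant law nu_g = N(0, 1).
rng = np.random.default_rng(0)

def c(x):
    # bounded continuous cost with limit alpha_0 = 1 at +/- infinity
    return x**2 / (1.0 + x**2)

dt, n_steps = 0.01, 400_000
noise = np.sqrt(2.0 * dt) * rng.standard_normal(n_steps)
x, cost_sum = 0.0, 0.0
for i in range(n_steps):
    cost_sum += c(x) * dt
    x += -x * dt + noise[i]   # Euler-Maruyama step for dX = -X dt + sqrt(2) dW
time_avg = cost_sum / (n_steps * dt)

# independent Monte Carlo estimate of int c dnu_g with nu_g = N(0, 1)
stationary_avg = c(rng.standard_normal(1_000_000)).mean()
print(time_avg, stationary_avg)
```

For a long horizon the two estimates agree to within Monte Carlo error, in line with (1.2).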
Clearly, $g$ must be stable. Our aim is to show the existence of an optimal Markov control.

Remark. The admissible controls here are actually relaxed controls. If $m(x, U)$ is assumed to be convex for each $x$ (true, e.g., if $U$ is connected), then standard arguments as in [1] show that for each measurable $g: R \to P(U)$, there is a measurable $h: R \to U$ such that

$\hat{m}(x, g(x)) = m(x, h(x))$, $x \in R$.

Then the results here also yield the existence of an optimal Markov control $g$ which is non-relaxed, i.e. for each $x$, $g(x)$ has a point support.

2. Main results

Consider a Markov control $g$. Let

$m_g(x) = \hat{m}(x, g(x))$

and define $\psi_g: R \to R$ by

$\psi_g(x) = \int_0^x \exp\left(-\int_0^y \frac{2 m_g(z)}{\sigma^2(z)}\,dz\right) dy$.

Then $\psi_g$ satisfies

$\frac{d^2\psi_g}{dx^2} + \frac{2 m_g}{\sigma^2}\,\frac{d\psi_g}{dx} = 0$.  (2.1)

Since $\psi_g$ is monotone strictly increasing and continuous, $\psi_g(R)$ has the form $(a, b)$ for some $b > a$, $a, b \in [-\infty, \infty]$. Thus $\phi_g: \psi_g(R) \to R$ satisfying

$\phi_g \circ \psi_g(x) = x$ for all $x$

is a well-defined continuous onto map. Let $C_1$ ($C_2$) denote the Banach space of continuous maps $(a, b) \to R$ ($R \to R$) vanishing at $a, b$ ($\pm\infty$), with the supremum norm. Let $G \subset C_2$ be countable dense such that $f \in G$ implies that $f$ is twice continuously differentiable with the first two derivatives $f', f''$ vanishing at $\pm\infty$. Let

$G_g = \{h: (a, b) \to R \mid h = f \circ \phi_g \text{ for some } f \in G\}$.

Then it is easily seen that $G_g \subset C_1$ and the inclusion is dense. For each $f \in G$, the following identity can be verified by direct substitution:

$\left(\tfrac{1}{2}\sigma^2 f'' + m_g f'\right) \circ \phi_g = \left(\left(\tfrac{1}{2}\sigma^2 (\psi_g')^2\right) \circ \phi_g\right) h''$,  (2.2)

where $h = f \circ \phi_g$. Using this, it is easily seen that each $h \in G_g$ is twice continuously differentiable with $h', h''$ vanishing at $a, b$.

Lemma 2.1. If $\mu \in P(R)$ satisfies

$\int \left(\tfrac{1}{2}\sigma^2 f'' + m_g f'\right) d\mu = 0$  (2.3)

for all $f \in G$, then $\mu = \nu_g$.

Remark. Existence of $\nu_g$, i.e. positive recurrence of $\{X(t)\}$ under $g$, is a part of the conclusion.

Proof. Let $\nu \in P(R)$ be the image of $\mu$ under $\psi_g$. From (2.2), it follows that (2.3) holds if and only if

$\int \left(\left(\tfrac{1}{2}\sigma^2 (\psi_g')^2\right) \circ \phi_g\right) h''\,d\nu = 0$

for all $h \in G_g$. Using the criterion of [6] for invariant measures of Markov processes, it follows that $\nu$ is the unique invariant probability for the process

$Y(t) = \psi_g(X(t))$, $t \in [0, \infty)$,

with state space $(a, b)$ and satisfying

$Y(t) = \int_0^t (\sigma\psi_g') \circ \phi_g(Y(s))\,dW(s)$.

It is clear that the invariance of $\nu$ for $\{Y(t)\}$ implies the invariance of $\mu$ for $\{X(t)\}$.  □
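The scale-function construction in the proof can be checked numerically. A minimal sketch, not from the paper and assuming the illustrative choices $m_g(x) = -x$, $\sigma^2 \equiv 2$: it builds $\psi_g'$ by quadrature and verifies that (2.1) holds up to discretization error. The normalizing constant of $\psi_g'$ is irrelevant here, since (2.1) is linear and homogeneous:

```python
import numpy as np

# Illustrative check of (2.1), not from the paper: m_g(x) = -x, sigma^2 = 2,
# for which psi_g'(x) is proportional to exp(x^2 / 2).
xs = np.linspace(-3.0, 3.0, 6001)
h = xs[1] - xs[0]
integrand = 2.0 * (-xs) / 2.0        # 2 m_g / sigma^2  (= -x here)

# psi_g'(x) = exp(-int 2 m_g / sigma^2 dz), built by a cumulative trapezoid
# rule; the constant of integration only rescales psi_g'.
F = np.concatenate(([0.0], np.cumsum((integrand[1:] + integrand[:-1]) / 2.0 * h)))
psi_prime = np.exp(-F)

# residual of psi_g'' + (2 m_g / sigma^2) psi_g' = 0, via central differences
psi_second = np.gradient(psi_prime, h)
residual = psi_second + integrand * psi_prime
max_rel_err = np.max(np.abs(residual[10:-10])) / np.max(np.abs(psi_second))
print(max_rel_err)
```

The relative residual is at the level of the finite-difference error, confirming that the quadrature construction solves (2.1).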
Now consider $\{X(t)\}$ governed by an arbitrary admissible control $\{\pi(t)\}$. Let $\bar{R} = R \cup \{\infty\}$, the one-point compactification of $R$, and let

$\mathscr{A} = \{A \times B \mid A, B \text{ Borel sets in } R, P(U) \text{ resp.}\}$.

For $t \in (0, \infty)$, let

$\mu_t^0(A \times B) = \frac{1}{t}\int_0^t I\{X(s) \in A,\ \pi(s) \in B\}\,ds$.

For a fixed sample point and fixed $t$, $\mu_t^0$ is a probability measure on the field $\mathscr{A}$ and hence extends uniquely to a probability measure $\mu_t$ on the product $\sigma$-field of $\bar{R} \times P(U)$. Then $\{\mu_t\}$ is a $P(\bar{R} \times P(U))$-valued process. By Prohorov's theorem [4], $\bar{R} \times P(U)$ and hence $P(\bar{R} \times P(U))$ are compact, and thus $\{\mu_t\}$ has, for each sample point, a nonempty compact (sample point-dependent) set of limit points in $P(\bar{R} \times P(U))$.
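The empirical occupation measures $\mu_t$ can be sketched numerically (illustrative model and control, not from the paper: $dX = -X\,dt + \sqrt{2}\,dW$ under the constant control $u = 1$). For a stable control, essentially all of the occupation mass stays on a compact window, the analogue of $\delta_\mu \approx 1$ below, and the state marginal matches the invariant law $N(0, 1)$:

```python
import numpy as np

# Illustrative (not from the paper): empirical occupation measure of the state
# under the constant control u = 1 in the model dX = -X dt + sqrt(2) dW.
rng = np.random.default_rng(1)
dt, n_steps = 0.01, 400_000
noise = np.sqrt(2.0 * dt) * rng.standard_normal(n_steps)
states = np.empty(n_steps)
x = 0.0
for i in range(n_steps):
    states[i] = x
    x += -x * dt + noise[i]

# occupation measure over a compact window; mass escaping the window plays
# the role of the mass placed at infinity by the compactification
edges = np.linspace(-5.0, 5.0, 101)
counts, _ = np.histogram(states, bins=edges)
mu_t = counts / n_steps
delta = mu_t.sum()        # analogue of delta_mu; close to 1 for a stable control
print(delta, states.mean(), states.var())
```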
For each $\mu \in P(\bar{R} \times P(U))$, we can write the decomposition

$\mu(A) = \delta_\mu\,\mu'(A \cap (R \times P(U))) + (1 - \delta_\mu)\,\mu''(A \cap (\{\infty\} \times P(U)))$,

for $A$ Borel in $\bar{R} \times P(U)$, where $\delta_\mu \in [0, 1]$, $\mu' \in P(R \times P(U))$ and $\mu'' \in P(\{\infty\} \times P(U))$. Here, $\delta_\mu$ is uniquely specified. If $\delta_\mu > 0$, $\mu'$ is also unique. Then, since $R$ and $P(U)$ are Polish, there exist $\mu^* \in P(R)$ and a $\mu^*$-a.s. unique measurable map $q_\mu: R \to P(P(U))$ such that for any bounded continuous $f: R \times P(U) \to R$,

$\int f\,d\mu' = \int_R \int_{P(U)} f(w_1, w_2)\,dq_\mu(w_1)(w_2)\,d\mu^*(w_1)$,

where $w_1, w_2$ are dummy variables of integration for the outer and the inner integral respectively. Since $\hat{m}(x, P(U))$ is convex for each $x$, a simple application of Theorem 4 of [8] shows that there exists a measurable map $g_\mu: R \to P(U)$ such that

$\hat{m}(x, g_\mu(x)) = \int_{P(U)} \hat{m}(x, y)\,dq_\mu(x)(y)$, $x \in R$.
Lemma 2.2. Outside a null set, each limit point $\mu$ of $\{\mu_t\}$ in $P(\bar{R} \times P(U))$ for which $\delta_\mu > 0$ satisfies $\mu^* = \nu_{g_\mu}$.
Proof. Fix $f \in G$. By Ito's formula,

$f(X(t)) - f(X(0)) = \int_0^t \xi_1(s)\,ds + \int_0^t \xi_2(s)\,dW(s)$,  (2.4)

where

$\xi_1(s) = \tfrac{1}{2}\sigma^2(X(s))\,f''(X(s)) + \hat{m}(X(s), \pi(s))\,f'(X(s))$,

$\xi_2(s) = f'(X(s))\,\sigma(X(s))$.

The second term on the r.h.s. in (2.4) equals $\tilde{W}(T_t)$ for some Wiener process $\{\tilde{W}(t)\}$, where

$T_t = \int_0^t \xi_2^2(s)\,ds$

([7], pp. 85-92). Then,

$\lim_{t\to\infty} \tilde{W}(T_t)$ exists finitely a.s. on $\{\lim_{t\to\infty} T_t < \infty\}$,

$\lim_{t\to\infty} \frac{\tilde{W}(T_t)}{T_t} = 0$ a.s. on $\{\lim_{t\to\infty} T_t = \infty\}$,

$\limsup_{t\to\infty} \frac{T_t}{t} < \infty$ a.s.

Combining these facts, we have

$\lim_{t\to\infty} \frac{1}{t}\int_0^t \xi_2(s)\,dW(s) = 0$ a.s.

Since $f$ is bounded, we conclude that

$\lim_{t\to\infty} \frac{1}{t}\int_0^t \xi_1(s)\,ds = 0$ a.s.

Recalling the definition of $\mu_t$ and the fact that $f', f''$ vanish at $\pm\infty$, it is easily verified that the above implies: outside a null set, each limit point $\mu$ of $\{\mu_t\}$ for which $\delta_\mu > 0$ satisfies

$\int \left(\tfrac{1}{2}\sigma^2 f'' + m_{g_\mu} f'\right) d\mu^* = 0$.

Since $G$ is countable, we can choose a null set outside which the above holds for all $f \in G$. The claim now follows from Lemma 2.1.  □
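The characterization (2.3) used above can be tested numerically in an illustrative case (not from the paper): $m_g(x) = -x$, $\sigma^2 \equiv 2$, for which $\nu_g = N(0, 1)$. For a smooth test function $f$ vanishing, together with $f'$ and $f''$, at $\pm\infty$, the integral of $\tfrac{1}{2}\sigma^2 f'' + m_g f'$ against $\nu_g$ should vanish. A sketch:

```python
import numpy as np

# Illustrative check of (2.3), not from the paper: m_g(x) = -x, sigma^2 = 2,
# invariant law nu_g = N(0, 1).  Test function f(x) = exp(-x^2).
xs = np.linspace(-10.0, 10.0, 200_001)
phi = np.exp(-xs**2 / 2.0) / np.sqrt(2.0 * np.pi)   # density of nu_g

f1 = -2.0 * xs * np.exp(-xs**2)                     # f'(x)
f2 = (4.0 * xs**2 - 2.0) * np.exp(-xs**2)           # f''(x)

# generator applied to f, integrated against nu_g by the trapezoid rule
integrand = (0.5 * 2.0 * f2 + (-xs) * f1) * phi
val = float(((integrand[1:] + integrand[:-1]) / 2.0 * (xs[1] - xs[0])).sum())
print(val)
```

The integral vanishes up to quadrature error, as (2.3) requires of the invariant probability.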
Now let $\{g_n\}$ be a sequence of stable Markov controls such that

$\int c\,d\nu_{g_n} \to \alpha$.

Define $\mu_n \in P(\bar{R} \times U)$, $n = 1, 2, \ldots$, as follows: for any bounded continuous $f: \bar{R} \times U \to R$, let

$\int f\,d\mu_n = \int_R \int_U f(x, u)\,g_n(x)(du)\,\nu_{g_n}(dx)$.

By Prohorov's theorem, $P(\bar{R} \times U)$ is compact. Hence $\{\mu_n\}$ converges in $P(\bar{R} \times U)$ to a nonempty compact limit set. Let $\mu_\infty$ be a limit point of $\{\mu_n\}$ and let $\delta = \mu_\infty(R \times U)$. Then it is easily verified that

$\alpha = \lim_{n\to\infty} \int c\,d\nu_{g_n} \ge (1 - \delta)\alpha_0$.

(1.3) then implies that $\delta > 0$. Decompose $\mu_\infty$ as

$\mu_\infty(A) = \delta\,\mu_\infty'(A \cap (R \times U)) + (1 - \delta)\,\mu_\infty''(A \cap (\{\infty\} \times U))$,
for $A$ Borel in $\bar{R} \times U$, where

$\mu_\infty' \in P(R \times U)$ and $\mu_\infty'' \in P(\{\infty\} \times U)$.

Clearly, $\mu_\infty'$ is uniquely defined. Then there exist $\mu_\infty^* \in P(R)$ and a $\mu_\infty^*$-a.s. unique measurable map $g_\infty: R \to P(U)$ such that for any bounded continuous $f: R \times U \to R$,

$\int f\,d\mu_\infty' = \int_R \int_U f(x, u)\,g_\infty(x)(du)\,d\mu_\infty^*(x)$.
Lemma 2.3. $\mu_\infty^* = \nu_{g_\infty}$ and $\int c\,d\nu_{g_\infty} = \alpha$.

Proof. Let $f \in G$. Define $\bar{f}: \bar{R} \times U \to R$ by

$\bar{f}(x, u) = \tfrac{1}{2}\sigma^2(x)\,f''(x) + m(x, u)\,f'(x)$, $x \in R$, $u \in U$,

$\bar{f}(\infty, u) = 0$, $u \in U$.

Then $\bar{f}$ is bounded continuous and for $n = 1, 2, \ldots$,

$\int \bar{f}\,d\mu_n = 0$.

(See [6], p. 1.) From this, using the facts that $\delta > 0$ and $f', f''$ vanish at $\pm\infty$, one can easily verify that

$\int \left(\tfrac{1}{2}\sigma^2 f'' + m f'\right) d\mu_\infty' = \int \left(\tfrac{1}{2}\sigma^2 f'' + m_{g_\infty} f'\right) d\mu_\infty^* = 0$.

By Lemma 2.1, $\mu_\infty^* = \nu_{g_\infty}$. Hence

$\int c\,d\mu_\infty^* \ge \alpha$.

Thus

$\alpha = \lim_{n\to\infty} \int c\,d\nu_{g_n} = \delta \int c\,d\mu_\infty^* + (1 - \delta)\alpha_0$.

From (1.3), it follows that $\delta = 1$ and

$\int c\,d\mu_\infty^* = \alpha$.  □
Theorem 2.1. An optimal Markov control exists.

Proof. The stable Markov control $g_\infty$ of the preceding lemma satisfies $\alpha = \int c\,d\nu_{g_\infty}$. Under any arbitrary admissible control $\{\pi(t)\}$, Lemma 2.2 implies that outside a null set the following holds: if $\mu$ is a limit point of $\{\mu_t\}$ such that $\mu_{t_n} \to \mu$ for some $\{t_n\} \subset [0, \infty)$, then

$\lim_{n\to\infty} \frac{1}{t_n}\int_0^{t_n} c(X(s))\,ds = \delta_\mu \int c\,d\mu^* + (1 - \delta_\mu)\alpha_0 \ge \delta_\mu \alpha + (1 - \delta_\mu)\alpha_0 \ge \alpha$,

since $\mu^* = \nu_{g_\mu}$ when $\delta_\mu > 0$ by Lemma 2.2, and $\alpha_0 > \alpha$ by (1.3). Thus

$\liminf_{t\to\infty} \frac{1}{t}\int_0^t c(X(s))\,ds \ge \alpha$ a.s.

The claim follows.  □
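Theorem 2.1 can be illustrated by simulation. The sketch below uses assumed, illustrative dynamics not taken from the paper: $dX = -uX\,dt + \sqrt{2}\,dW$ with $u \in [0.5, 2]$ and cost $c(x) = x^2/(1+x^2)$. Here the constant control $u = 2$ minimizes the stationary variance $1/u$, hence the stationary cost, and the time-averaged cost under some other admissible control should be no smaller:

```python
import numpy as np

# Illustrative (not from the paper): dX = -u X dt + sqrt(2) dW, u in [0.5, 2],
# cost c(x) = x^2 / (1 + x^2).  The constant control u = 2 gives the smallest
# stationary variance (= 1/u) and hence the smallest stationary cost.
rng = np.random.default_rng(2)

def c(x):
    return x**2 / (1.0 + x**2)

def run(control, dt=0.01, n_steps=200_000):
    # Euler-Maruyama simulation of the controlled diffusion; returns the
    # time-averaged cost (1/t) int_0^t c(X(s)) ds
    noise = np.sqrt(2.0 * dt) * rng.standard_normal(n_steps)
    x, total = 0.0, 0.0
    for i in range(n_steps):
        total += c(x) * dt
        u = control(i * dt, x)
        x += -u * x * dt + noise[i]
    return total / (n_steps * dt)

cost_markov = run(lambda t, x: 2.0)                        # best constant Markov control
cost_other = run(lambda t, x: 0.5 if int(t) % 2 else 1.0)  # some other admissible control
print(cost_markov, cost_other)
```

Consistent with the optimality inequality, the arbitrary control's time-averaged cost exceeds that of the best constant Markov control up to simulation error.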
Remarks. Though Echeverria's test can be directly applied for diffusion processes with bounded continuous coefficients, the same cannot be said when the drift coefficient is only measurable. This is precisely why we introduce the map $\psi_g$ above, which works around this difficulty by transferring the original problem to an identical one concerning another diffusion (viz. $Y(t) = \psi_g(X(t))$ in Lemma 2.1) whose coefficients are continuous. A similar transformation was used by Zvonkin [9] to prove the existence and uniqueness of strong solutions to a one-dimensional stochastic differential equation with measurable drift coefficient and Lipschitz-continuous diffusion coefficient.
References
[1] V.E. Beneš, Existence of optimal strategies based on specified information, for a class of stochastic decision problems, SIAM J. Control 8 (2) (1970).
[2] R.N. Bhattacharya, Asymptotic behaviour of several dimensional diffusions, in: L. Arnold and R. Lefever, Eds., Stochastic Nonlinear Systems (Springer, Berlin-New York, 1981).
[3] R.N. Bhattacharya, Criteria for recurrence and existence of invariant measures for multidimensional diffusions, Ann. Probab. 6 (1978). Correction note: Ann. Probab. 8 (1980).
[4] P. Billingsley, Convergence of Probability Measures (Wiley, New York, 1968).
[5] V.S. Borkar, On minimum cost per unit time control of Markov chains, to appear.
[6] P. Echeverria, A criterion for invariant measures of Markov processes, Z. Wahrsch. Verw. Gebiete 61 (1982).
[7] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes (North-Holland, Amsterdam; Kodansha, Tokyo, 1981).
[8] E.J. McShane and R.B. Warfield, On Filippov's implicit functions lemma, Proc. AMS (Feb. 1967).
[9] A.K. Zvonkin, A transformation of the phase space of a diffusion process that removes the drift, Math. USSR Sbornik 22 (1) (1974).