
Systems & Control Letters 12 (1989) 343-349. North-Holland.

'Minimum toll' control of diffusions

Vivek S. BORKAR

Tata Institute of Fundamental Research, Bangalore Centre, P.O. Box 1234, Bangalore 560 012, India

Received 17 October 1988; revised 23 January 1989

Abstract: This paper studies the problem of controlling a diffusion up to the first exit time from a bounded domain, with a cost associated with the crossings of a strip separating two subdomains. This leads to a system of Hamilton-Jacobi-Bellman equations coupled through their boundary conditions.

Keywords: Controlled diffusions; boundary-crossing cost; optimal control; HJB system.

1. Introduction

Recently, the author introduced in [4] the problem of controlling a diffusion up to the first exit time from a bounded domain so as to minimize a cost ('toll') associated with the boundary-crossings of a subdomain. The latter was modelled as a finite signed measure supported on this boundary. The standard dynamic programming heuristic then leads, at least formally, to a Hamilton-Jacobi-Bellman equation involving a measure-valued cost. The present paper considers another problem which is similar in spirit and can, in fact, be considered as a simplified caricature of the above. Here, we replace the boundary (a codimension-one submanifold) with a strip and associate with its crossings a cost that depends on the direction of crossing. The similarity of the two problems, however, ends at the conceptual level: mathematically, they are quite different. Unlike the above, the strip-crossing problem leads to a system of two standard Hamilton-Jacobi-Bellman equations coupled through their boundary data. Details follow.

Let $d \geq 1$ and let $A, B, D$ be bounded open subsets of $R^d$ satisfying $A \subset \bar{A} \subset B \subset \bar{B} \subset D$ with $C^2$ boundaries $\partial A$, $\partial B$, $\partial D$ respectively. Write $G = D \setminus \bar{A}$, $F = D \setminus \bar{B}$, $H = B \setminus \bar{A}$ and let $\partial G$, $\partial F$, $\partial H$ denote the corresponding boundaries. Let $S$ be a compact metric space and $X(\cdot) = [X_1(\cdot), \ldots, X_d(\cdot)]^T$ the controlled diffusion given by

$$X(t) = x + \int_0^t m(X(s), u(s))\, ds + \int_0^t \sigma(X(s))\, dW(s), \quad x \in D, \tag{1.1}$$

on some probability space $(\Omega, F, P)$, where:

(i) $m(\cdot, \cdot) = [m_1(\cdot, \cdot), \ldots, m_d(\cdot, \cdot)]^T : R^d \times S \to R^d$ is bounded continuous and Lipschitz in its first argument uniformly with respect to the second;

(ii) $\sigma(\cdot) = [[\sigma_{ij}(\cdot)]] : R^d \to R^{d \times d}$, $1 \leq i, j \leq d$, is bounded Lipschitz with the least eigenvalue of $\sigma(\cdot)\sigma^T(\cdot)$ uniformly bounded away from zero;

(iii) $W(\cdot) = [W_1(\cdot), \ldots, W_d(\cdot)]^T$ is a $d$-dimensional standard Wiener process;

(iv) $u(\cdot)$ is an $S$-valued process with measurable sample paths satisfying: for $t \geq s \geq y$, $W(t) - W(s)$ is independent of $u(z)$, $z \leq y$.

Call such a $u(\cdot)$ an admissible control. Call it a Markov control if $u(\cdot) = v(X(\cdot))$ for a measurable map $v : R^d \to S$. Under a Markov control, (1.1) has a unique strong solution which is a Feller process [11]. In particular, this makes Markov controls admissible.
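As an aside on what the dynamics (1.1) under a Markov control look like in practice, they can be simulated by an Euler-Maruyama discretisation. The sketch below is purely illustrative and not part of the paper's argument: it takes $d = 1$, $S = \{-1, +1\}$, $m(x, u) = u$, $\sigma \equiv 1$, and an assumed Markov control $v(x) = -\mathrm{sign}(x)$, which pushes the state towards the origin.

```python
import math
import random

def simulate_markov_control(x0=0.0, T=5.0, dt=0.01, n_paths=400, seed=0):
    """Euler-Maruyama sketch of (1.1) with d = 1, m(x, u) = u, sigma = 1,
    under the illustrative Markov control v(x) = -sign(x)."""
    rng = random.Random(seed)
    sqdt = math.sqrt(dt)
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(int(T / dt)):
            u = -1.0 if x > 0 else 1.0   # Markov control v(x) = -sign(x)
            x += u * dt + sqdt * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals
```

Under this control the drift is mean-reverting, so the empirical law of $X(T)$ concentrates near the origin (the stationary density is proportional to $e^{-2|x|}$).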

Let

$$\tau = \inf\{t \geq 0 \mid X(t) \notin D\}.$$

Arguments of [4], Lemma 2.2, can be employed to

0167-6911/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)

344 V.S. Borkar / Minimum toll control

show that $E[\tau]$ is bounded, uniformly in $x$ and $u(\cdot)$. Define the stopping times

$$\tau_1 = \inf\{t > 0 \mid X(t) \in \partial A\},$$

$$\sigma_1 = \inf\{t > 0 \mid X(t) \in \partial B\},$$

$$\tau_n = \inf\{t \geq \tau_{n-1} \mid X(t) \in \partial A \text{ and } X(s) \in \partial B \text{ for some } s \in [\tau_{n-1}, t]\},$$

$$\sigma_n = \inf\{t \geq \sigma_{n-1} \mid X(t) \in \partial B \text{ and } X(s) \in \partial A \text{ for some } s \in [\sigma_{n-1}, t]\},$$

$n = 2, 3, \ldots$. Let $k : \partial H \to R$ be continuous. For $x \in \bar{B}$, define the cost under control $u(\cdot)$ to be

$$J_x(u(\cdot)) = E\Big[\sum_{n:\, \sigma_n < \tau} k(X(\sigma_n)) + \sum_{n:\, \sigma_1 \leq \tau_n < \tau} k(X(\tau_n))\Big] \tag{1.2}$$

and for $x \in \bar{G}$,

$$J'_x(u(\cdot)) = E\Big[\sum_{n:\, \tau_n < \tau} k(X(\tau_n)) + \sum_{n:\, \tau_1 \leq \sigma_n < \tau} k(X(\sigma_n))\Big]. \tag{1.3}$$

The interpretation of (1.2)-(1.3) as a toll attached to the crossings of the strip $H$ is self-evident. The control problem is to minimize the above cost over all possible

$$(\Omega, F, P, X(\cdot), W(\cdot), u(\cdot))$$

for each $x \in D$.

The plan of the paper is as follows: In the next section, we show that (1.2)-(1.3) are well-defined and define the 'value functions'. Section 3 establishes the existence of an optimal control by probabilistic arguments, and this leads to a continuity result for the value functions. Section 4 studies the system of Hamilton-Jacobi-Bellman equations (henceforth called the HJB system) satisfied by these value functions and completes the picture.

The problem is motivated by the following situation: Suppose that the controlling mechanism is split (perhaps geographically) into two different centers, one of which controls the process when it is in $A$ or in $F$, having arrived there from $A$, while the other controls the process when it is in $H$ or in $F$, having arrived there from $H$. The cost may then be associated with the switchover from one center to the other. We have been a little more general in allowing the cost to depend on the state at the time of transfer.
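To make the cost (1.2) concrete, here is a Monte Carlo sketch for a one-dimensional uncontrolled special case ($S$ a singleton, $m \equiv 0$, $\sigma \equiv 1$) with $D = (-3, 3)$, $B = (-2, 2)$, $A = (-1, 1)$ and $k \equiv 1$; all of this geometry is illustrative and not from the paper. The toll is collected at the completion of each crossing of the strip $H$, in alternating directions, as in the sums in (1.2).

```python
import random

def estimate_toll(x0=0.0, k=1.0, dt=0.01, n_paths=800, seed=1):
    """Monte Carlo estimate of the strip-crossing cost (1.2) for a 1-d
    uncontrolled diffusion dX = dW on D = (-3, 3), with A = (-1, 1),
    B = (-2, 2), and toll k paid at the completion of each crossing
    (an Euler-Maruyama sketch with illustrative geometry)."""
    rng = random.Random(seed)
    sqdt = dt ** 0.5
    total = 0.0
    for _ in range(n_paths):
        x, toll = x0, 0.0
        on_a_side = True              # start counted with the A-side, as in (1.2)
        for _ in range(200000):       # safety cap on the number of steps
            x += sqdt * rng.gauss(0.0, 1.0)
            if abs(x) >= 3.0:         # exit from D: no further tolls
                break
            if on_a_side and abs(x) >= 2.0:
                toll += k             # crossing towards F completed at dB
                on_a_side = False
            elif (not on_a_side) and abs(x) <= 1.0:
                toll += k             # crossing towards A completed at dA
                on_a_side = True
        total += toll
    return total / n_paths
```

For this symmetric configuration one can compute by hand, using hitting probabilities of Brownian motion, that the expected toll from $x = 0$ is $3k$; the simulation estimate is slightly biased downwards because boundary hits between grid times are missed.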

2. Preliminaries

To start with, we show that (1.2)-(1.3) make sense, beginning with some preliminary technical lemmas. We introduce at this juncture an additional assumption which will be dropped later:

$$m(x, S) = \{m(x, y) \mid y \in S\} \text{ is convex for each } x. \tag{*}$$

Lemma 2.1. There exist a Wiener process $\tilde{W}$ and a nonanticipating feedback law

$$\tilde{u}(t) = f(\tilde{X}(\cdot), t), \quad t \geq 0,$$

such that if $\tilde{X}(\cdot)$ is the solution to (1.1) with $\tilde{W}$ replacing $W$ and $\tilde{u}$ replacing $u$, then $X$ and $\tilde{X}$ have the same family of finite dimensional probability distributions.

Remarks. We shall use the weak formulation of the control problem, i.e., optimize over all weak solutions attainable by some nonanticipative con- trol. The above result thus implies that we may assume u(-) to be adapted to the natural filtration of X(.) without any loss of generality.

Proof. Recall the notation of $(*)$. By $(*)$,

$$E[m(X(t), u(t)) \mid X(s), s \leq t] \in m(X(t), S) \quad \text{a.s.}, \tag{2.1}$$

where the conditional expectation is taken termwise. Applying the selection theorem of Lemma 1 in [2], we conclude that the left hand side of (2.1) a.s. equals $m(X(t), \tilde{u}(t))$ for some $\tilde{u}(\cdot)$ adapted to the natural filtration of $X(\cdot)$. From the results of [12], it follows that

$$X(t) = x + \int_0^t m(X(s), \tilde{u}(s))\, ds + \int_0^t \sigma(X(s))\, d\tilde{W}(s)$$

for a suitably defined Wiener process $\tilde{W}(\cdot)$. □

Corollary 2.1. For any stopping time $\xi$ w.r.t. the natural filtration $\{F_t\}$ of $X(\cdot)$, the regular conditional law of $X(\xi + \cdot)$ given $F_\xi$ is a.s. the law of some controlled diffusion of the type (1.1) with $X(\xi)$ replacing $x$.

Proof. By Lemma 2.1, we may assume that $u(\cdot)$ is adapted to $\{F_t\}$. Thus

$$u(t) = f(X(\cdot), t) \quad \text{a.s.}$$

for some $f : C([0, \infty); R^d) \times [0, \infty) \to S$ which is progressively measurable w.r.t. $\{F_t\}$. By Lemma 1.3.3, p. 33, of [10], it follows that the regular conditional law of $X(\xi + \cdot)$ given $F_\xi$ is a.s. the law of a diffusion $X'(\cdot)$ as in (1.1) controlled by $u'(\cdot) = f(X(\cdot), \cdot)$, with $X(t)$, $t \in [0, \xi]$, held fixed as a parameter and the initial data $X(\xi)$ again treated as a fixed parameter. □

Lemma 2.2. There exists an $\alpha \in (0, 1)$ such that

$$\sup_{x \in \bar{G},\ u(\cdot)\ \mathrm{admissible}} P(\tau > \tau_1) < \alpha.$$

Proof. Define $h : \partial G \to R$ by $h = 0$ on $\partial D$ and $h = -1$ on $\partial A$. Consider the problem of controlling $X(\cdot)$ of (1.1) with $x \in G$, with $E[h(X(\beta))]$ as the cost, where

$$\beta = \inf\{t \geq 0 \mid X(t) \notin G\}.$$

This is a classical situation for which we know that a Markov control $v$ exists which is optimal for any $x \in G$ ([3], Section IV.3). Let $X'(\cdot)$ denote the corresponding Markov process and $L'$ its extended generator. Let

$$\psi(x) = E[h(X'(\beta)) \mid X'(0) = x]$$

for $x \in G$. Then

$$\psi \in C(\bar{G}) \cap W^{2,p}_{loc}(G), \quad p \geq 2,$$

and $\psi$ satisfies

$$L'\psi = 0 \text{ in } G, \qquad \psi = h \text{ on } \partial G.$$

Here, $W^{2,p}_{loc}(G)$ is the space of measurable functions $G \to R$ which, along with their first and second order partial derivatives (in the sense of distributions), are in $L^p(O)$ when restricted to any arbitrary open set $O$ whose closure lies in $G$.

By the maximum principle for uniformly elliptic operators, $|\psi(x)| < 1$ for $x \in G$. Thus $\psi(x) \geq -\alpha$ for $x \in \bar{G}$ for some $\alpha \in (0, 1)$. The claim follows. □

Let $N$ be the maximum $n$ for which $\tau_n < \tau$.

Lemma 2.3. There exists an $a > 0$ such that

$$\sup_{x \in \bar{F},\ u(\cdot)\ \mathrm{admissible}} E[\exp(aN)] < \infty. \tag{2.2}$$

Proof. Pick $x \in \bar{F}$. Then under any admissible $u(\cdot)$ and $n \geq 1$,

$$P(N \geq n) = E\big[E[I\{N \geq n\} \mid F_{\tau_{n-1}}]\, I\{N \geq n-1\}\big].$$

By Corollary 2.1, $E[I\{N \geq n\} \mid F_{\tau_{n-1}}]$ a.s. equals $P(\tau > \tau_1)$ evaluated for some controlled diffusion $X'(\cdot)$ with $X'(0) = X(\tau_{n-1})$. Thus by Lemma 2.2,

$$E[I\{N \geq n\} \mid F_{\tau_{n-1}}] < \alpha \quad \text{a.s.}$$

Thus

$$P(N \geq n) < \alpha P(N \geq n-1).$$

Iterating,

$$P(N \geq n) < \alpha^n.$$

The claim follows easily from this. □

Lemma 2.4. For some $c > 0$,

$$\sup_{x,\ u(\cdot)} E[\exp(c\tau)] < \infty. \tag{2.3}$$

Proof. This follows from Corollary 4.1, p. 433, in [5]. □

Corollary 2.2. (1.2)-(1.3) are well-defined.

This is immediate in view of the above lemmas. Now define the 'value functions'

$$\psi_1 : \bar{B} \to R, \qquad \psi_2 : \bar{G} \to R$$

as follows: Defining $x \to J_x(u(\cdot)) : \bar{B} \to R$ and $x \to J'_x(u(\cdot)) : \bar{G} \to R$ by (1.2) and (1.3) respectively, let

$$\psi_1(x) = \inf_{u(\cdot)} J_x(u(\cdot)), \qquad \psi_2(x) = \inf_{u(\cdot)} J'_x(u(\cdot)).$$

We study these in the subsequent sections.
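Lemmas 2.2-2.3 are easy to visualise: each further crossing requires the diffusion to reach $\partial A$ once more before exiting through $\partial D$, which by Lemma 2.2 happens with probability at most $\alpha < 1$, so $N$ has a geometrically decaying tail and $E[\exp(aN)]$ is finite for small $a > 0$. The following simulation sketch uses an illustrative one-dimensional uncontrolled setting ($dX = dW$ on $D = (-3, 3)$ with $A = (-1, 1)$, $B = (-2, 2)$; this geometry is assumed, not from the paper):

```python
import random

def crossing_counts(x0=2.0, dt=0.01, n_paths=1500, seed=2):
    """Simulate dX = dW on D = (-3, 3) with A = (-1, 1), B = (-2, 2), and
    return samples of N = number of completed crossings into A-bar before
    the exit time tau (the quantity bounded in Lemma 2.3; illustrative)."""
    rng = random.Random(seed)
    sqdt = dt ** 0.5
    counts = []
    for _ in range(n_paths):
        x, n_cross, outside = x0, 0, True   # start on the F-side
        for _ in range(200000):             # safety cap on the number of steps
            x += sqdt * rng.gauss(0.0, 1.0)
            if abs(x) >= 3.0:               # exit from D
                break
            if outside and abs(x) <= 1.0:
                n_cross += 1                # reached dA: crossing completed
                outside = False
            elif (not outside) and abs(x) >= 2.0:
                outside = True              # back on dB: eligible to cross again
        counts.append(n_cross)
    return counts
```

In this configuration the empirical tail ratio $P(N \geq 2)/P(N \geq 1)$ comes out close to $1/2$, which plays the role of the $\alpha$ of Lemma 2.2.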


3. Continuity of the value functions

Let $x_n \to x_\infty$ in $\bar{G}$ and let $X^n(\cdot)$, $n = 1, 2, \ldots$, be the solutions to (1.1) with $x$ replaced by $x_n$, $n = 1, 2, \ldots$, and $u(\cdot)$ by some admissible controls $u^n(\cdot)$, $n = 1, 2, \ldots$, respectively. As in [9], one can show that the laws of $X^n(\cdot)$, $n = 1, 2, \ldots$, viewed as probability measures on the trajectory space $C([0, \infty); R^d)$, are tight and converge along a subsequence of $\{n\}$ to the law of a controlled diffusion $X^\infty(\cdot)$ starting at $x_\infty$ and governed by some admissible control $u^\infty(\cdot)$. (Note that the assumption $(*)$ plays a crucial role here.) Dropping to this subsequence, we denote it by $\{n\}$ again by abuse of notation. Furthermore, Skorohod's theorem ([7], p. 9) allows us to consider $X^n(\cdot)$, $n = 1, 2, \ldots, \infty$, to be defined on a common probability space $(\Omega, F, P)$ with $X^n(\cdot) \to X^\infty(\cdot)$ a.s. in $C([0, \infty); R^d)$. For each $n = 1, 2, \ldots, \infty$, define stopping times $\tau^n$, $\{\tau_i^n\}$, $\{\sigma_i^n\}$ for $X^n(\cdot)$ the same way as $\tau$, $\{\tau_i\}$, $\{\sigma_i\}$ were defined for $X(\cdot)$.

Lemma 3.1. $\tau^n \to \tau^\infty$ a.s., $\tau_i^n \to \tau_i^\infty$ a.s. for all $i$, and $\sigma_i^n \to \sigma_i^\infty$ a.s. for all $i$.

Proof. We shall prove that $\tau^n \to \tau^\infty$ a.s.; a similar argument proves the other claims. Let

$$\tau' = \inf\{t \geq 0 \mid X^\infty(t) \in \partial D\}.$$

Then simple geometric considerations show that a.s., any limit point of $\{\tau^n\}$ in $[0, \infty]$ must lie in $[\tau', \infty]$. But under our uniform ellipticity hypothesis on $\sigma\sigma^T$ and smoothness ($C^2$) hypothesis on $\partial D$, $\tau' = \tau^\infty$ a.s. (To see this, first note that by Corollary 2.1, it suffices to consider a controlled diffusion starting at some $x \in \partial D$ and show that its first exit time from $\bar{D}$ is a.s. zero. Next, by a local change of coordinates, we may assume that $x = 0$ and that a relatively open neighbourhood of $x$ in $\partial D$ is in fact the unit disc in the hyperplane $\{x_d = 0\}$. Thus one must show that the $d$-th component of this diffusion is strictly positive at some time point in $[0, \epsilon)$ for each $\epsilon > 0$, a.s. By a change of measure argument, we may assume that the corresponding drift component is identically zero. Then the said component is only a time-changed Brownian motion, for which the result is well known.) Thus it follows that $\tau^n \to \tau^\infty$ a.s. □

Corollary 3.1. For each $x \in \bar{G}$, there exists a control $u(\cdot)$ such that $J'_x(u(\cdot)) = \psi_2(x)$.

Proof. Fix $x \in \bar{G}$. Let $x_n = x$ for all $n$ in the above and take $\{u^n(\cdot)\}$ such that $J'_x(u^n(\cdot))$ decreases to $\psi_2(x)$. Then, in the above notation,

$$\sum_{i:\, \tau_i^n < \tau^n} k(X^n(\tau_i^n)) + \sum_{i:\, \tau_1^n \leq \sigma_i^n < \tau^n} k(X^n(\sigma_i^n)) \to \sum_{i:\, \tau_i^\infty < \tau^\infty} k(X^\infty(\tau_i^\infty)) + \sum_{i:\, \tau_1^\infty \leq \sigma_i^\infty < \tau^\infty} k(X^\infty(\sigma_i^\infty)) \quad \text{a.s.}, \tag{3.1}$$

by virtue of the above lemma and the fact that $\tau^\infty$ cannot coincide with any of $\{\tau_i^\infty, \sigma_i^\infty\}$ for any sample path and any $i$. By (2.2)-(2.3), the quantities in (3.1) are uniformly integrable. Thus we can take expectations to conclude (from the choice of $u^n(\cdot)$, $n = 1, 2, \ldots$) that $J'_x(u^\infty(\cdot)) = \psi_2(x)$. □

Corollary 3.2. $\psi_2$ is continuous.

Proof. Let $x_n \to x_\infty$ in $\bar{G}$. In the framework of Lemma 3.1, let $u^n(\cdot)$, $n = 1, 2, \ldots$, be such that $\psi_2(x_n) = J'_{x_n}(u^n(\cdot))$ for each $n$. Then

$$J'_{x_n}(u^n(\cdot)) \to J'_{x_\infty}(u^\infty(\cdot))$$

by arguments similar to those employed to prove Corollary 3.1. On the other hand, for a fixed control $u(\cdot)$ and Wiener process $W(\cdot)$ on some probability space, a standard argument involving the Lipschitz continuity of $m(\cdot, u)$, $\sigma(\cdot)$ and the Gronwall inequality shows that the law of the resulting $X(\cdot)$ in (1.1) depends continuously on $x$. In particular, the laws under $x = x_n$ converge as $n \to \infty$ to the law for $x = x_\infty$. Once again, arguments of the type that led to the preceding corollary can be used to prove that $J'_{x_n}(u(\cdot)) \to J'_{x_\infty}(u(\cdot))$. Since $J'_{x_n}(u(\cdot)) \geq J'_{x_n}(u^n(\cdot))$, we conclude that $J'_{x_\infty}(u(\cdot)) \geq J'_{x_\infty}(u^\infty(\cdot))$. Since $u(\cdot)$ was arbitrary, it follows that $J'_{x_\infty}(u^\infty(\cdot)) = \psi_2(x_\infty)$, i.e., $\psi_2(x_n) \to \psi_2(x_\infty)$. □

A parallel argument leads to the following:

Lemma 3.2. $\psi_1$ is continuous.

Corollary 3.3. An optimal control exists for any $x \in \bar{D}$.


4. The HJB system

We shall show that $\psi_1$, $\psi_2$ are characterized by the HJB system comprising (4.1)-(4.5) below. Let

$$(Lf)(x, u) = \sum_{i=1}^d m_i(x, u) \frac{\partial f}{\partial x_i} + \frac{1}{2} \sum_{i,j,k=1}^d \sigma_{ik}(x)\sigma_{jk}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}$$

for any function $f \in W^{2,p}_{loc}(R^d)$, $p \geq 1$, where $x = [x_1, \ldots, x_d]$ is a typical point in $R^d$.

Theorem 4.1. $(\psi_1, \psi_2)$ is the unique solution in

$$(C(\bar{B}) \cap W^{2,p}_{loc}(B)) \times (C(\bar{G}) \cap W^{2,p}_{loc}(G)), \quad p \geq 2,$$

to (4.1)-(4.5) below:

$$\inf_{u \in S} (L\psi_1)(x, u) = 0 \quad \text{a.e. in } B, \tag{4.1}$$

$$\inf_{u \in S} (L\psi_2)(x, u) = 0 \quad \text{a.e. in } G, \tag{4.2}$$

$$\psi_1 - \psi_2 = k \quad \text{on } \partial B, \tag{4.3}$$

$$\psi_2 - \psi_1 = k \quad \text{on } \partial A, \tag{4.4}$$

$$\psi_2 = 0 \quad \text{on } \partial D. \tag{4.5}$$

Proof. Let $x \in B$. Routine dynamic programming arguments show that, with $X(\cdot)$ as in (1.1),

$$\psi_1(x) = \inf_{u(\cdot)} E[\psi_1(X(\sigma_1))].$$

Thus $\psi_1$ is the value function for the classical control problem of controlling $X(\cdot)$ of (1.1) in $B$ with the terminal cost $E[\psi_1(X(\sigma_1))]$. In view of Lemma 3.2 and Section IV.2 of [3], it follows that

$$\psi_1 \in C(\bar{B}) \cap W^{2,p}_{loc}(B), \quad p \geq 2,$$

and (4.1) holds. Analogous arguments establish that

$$\psi_2 \in C(\bar{G}) \cap W^{2,p}_{loc}(G), \quad p \geq 2,$$

and (4.2), (4.5) hold. (4.3), (4.4) follow from the definitions of $\psi_1$, $\psi_2$. To prove uniqueness, let $(\tilde{\psi}_1, \tilde{\psi}_2)$ be another solution of the HJB system in the prescribed class. Let $x \in B$, $u(\cdot)$ an admissible control and $X(\cdot)$ the corresponding solution to (1.1). Define $\{\sigma_n\}$ as before and let $\pi_n$ be the least $\tau_m$ exceeding $\sigma_n$, for $n = 1, 2, \ldots$, with $\pi_0 = 0$. By (4.1), (4.2), for $t > 0$,

$$(L\tilde{\psi}_1)(X(t), u(t)) \geq 0 \quad \text{a.s. on } \{X(t) \in B\} \tag{4.6}$$

and

$$(L\tilde{\psi}_2)(X(t), u(t)) \geq 0 \quad \text{a.s. on } \{X(t) \in G\}. \tag{4.7}$$

By a repeated application of the extended Ito formula of [8], p. 122, coupled with the optional sampling theorem, we get, in view of (4.6), (4.7),

$$E[\tilde{\psi}_2(X(\pi_n \wedge \tau))\, I\{\tau > \sigma_n\}] \geq E[\tilde{\psi}_2(X(\sigma_n))\, I\{\tau > \sigma_n\}], \tag{4.8}$$

$$E[\tilde{\psi}_1(X(\sigma_n \wedge \tau))\, I\{\tau > \pi_{n-1}\}] \geq E[\tilde{\psi}_1(X(\pi_{n-1}))\, I\{\tau > \pi_{n-1}\}]. \tag{4.9}$$

Furthermore, from (4.3), (4.4), we have

$$E[\tilde{\psi}_2(X(\pi_n))\, I\{\tau > \pi_n\}] = E[(\tilde{\psi}_1(X(\pi_n)) + k(X(\pi_n)))\, I\{\tau > \pi_n\}], \tag{4.10}$$

$$E[\tilde{\psi}_1(X(\sigma_n))\, I\{\tau > \sigma_n\}] = E[(\tilde{\psi}_2(X(\sigma_n)) + k(X(\sigma_n)))\, I\{\tau > \sigma_n\}]. \tag{4.11}$$

Summing both sides of (4.8)-(4.11) for each $n$ and then over all $n$, we end up with $\tilde{\psi}_1(x) \leq J_x(u(\cdot))$. Now let $v_1 : B \to S$ be a measurable map such that

$$\sum_{i=1}^d m_i(x, v_1(x)) \frac{\partial \tilde{\psi}_1}{\partial x_i} = \inf_{u \in S} \sum_{i=1}^d m_i(x, u) \frac{\partial \tilde{\psi}_1}{\partial x_i} \quad \text{a.e.} \tag{4.12}$$

and $v_2 : G \to S$ a measurable map such that

$$\sum_{i=1}^d m_i(x, v_2(x)) \frac{\partial \tilde{\psi}_2}{\partial x_i} = \inf_{u \in S} \sum_{i=1}^d m_i(x, u) \frac{\partial \tilde{\psi}_2}{\partial x_i} \quad \text{a.e.} \tag{4.13}$$

Such $v_1$, $v_2$ can always be found by a standard selection theorem (Lemma 1, p. 185, of [2]). Consider a controlled diffusion $\tilde{X}(\cdot)$ as in (1.1) starting at $x \in B$ as above, but governed by the admissible control $\tilde{u}(\cdot)$ defined by: $\tilde{u}(t) = v_1(\tilde{X}(t))$ for $\pi_{n-1} \leq t < \sigma_n$, $n \geq 1$, and $\tilde{u}(t) = v_2(\tilde{X}(t))$ for $\sigma_n \leq t < \pi_n$, $n \geq 1$. Then by (4.12) and (4.13), equality holds in (4.6), (4.7) when $X(\cdot)$, $u(\cdot)$ are replaced by $\tilde{X}(\cdot)$, $\tilde{u}(\cdot)$. As a consequence, equality holds in (4.8)-(4.11) under this substitution, leading on summation over $n$ to $\tilde{\psi}_1(x) = J_x(\tilde{u}(\cdot))$. Thus

$$\tilde{\psi}_1(x) = \min_{u(\cdot)} J_x(u(\cdot)) = \psi_1(x).$$

A parallel argument establishes $\tilde{\psi}_2(x) = \psi_2(x)$. □
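As a numerical illustration of how the two equations couple only through their boundary data, consider a one-dimensional uncontrolled case ($S$ a singleton, $m \equiv 0$, $\sigma \equiv 1$), where each HJB equation reduces to a linear two-point boundary value problem. The sketch below assumes $D = (-3, 3)$, $A = (-1, 1)$, $B = (-2, 2)$, $k \equiv 1$, and the dynamic-programming reading of the coupling used above: the value on hitting $\partial B$ from inside $B$ is $\psi_2 + k$, and on hitting $\partial A$ from $G$ it is $\psi_1 + k$, the toll being collected at the completion of each crossing. All names and the geometry are illustrative, not from the paper.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal linear system (a: sub-, b: main, c: super-diagonal)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        den = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / den
        dp[i] = (d[i] - a[i] * dp[i - 1]) / den
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_bvp(xs, left, right, m=0.0, sig=1.0):
    """Finite-difference solution of (1/2) sig^2 psi'' + m psi' = 0 on the
    grid xs with Dirichlet data psi(xs[0]) = left, psi(xs[-1]) = right."""
    n = len(xs) - 2                      # number of interior points
    h = xs[1] - xs[0]
    diff = 0.5 * sig ** 2 / h ** 2
    adv = m / (2.0 * h)
    a = [diff - adv] * n                 # coefficient of psi[j-1]
    b = [-2.0 * diff] * n                # coefficient of psi[j]
    c = [diff + adv] * n                 # coefficient of psi[j+1]
    d = [0.0] * n
    d[0] -= a[0] * left                  # fold boundary values into the rhs
    d[-1] -= c[-1] * right
    return [left] + thomas(a, b, c, d) + [right]

def solve_coupled(k=1.0, n=81, iters=80):
    """Fixed-point iteration on the coupled boundary data: psi1 on [-2, 2],
    psi2 on [1, 3] (the left half of G follows by symmetry)."""
    xs1 = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]
    xs2 = [1.0 + 2.0 * i / (n - 1) for i in range(n)]
    i_a = min(range(n), key=lambda i: abs(xs1[i] - 1.0))   # grid index of x = 1
    i_b = min(range(n), key=lambda i: abs(xs2[i] - 2.0))   # grid index of x = 2
    s = 0.0                                                # guess for psi2 at dB
    for _ in range(iters):
        psi1 = solve_bvp(xs1, k + s, k + s)          # psi1 = psi2 + k on dB
        psi2 = solve_bvp(xs2, k + psi1[i_a], 0.0)    # psi2 = psi1 + k on dA, 0 on dD
        s = psi2[i_b]
    return psi1, psi2, xs1, xs2
```

The iteration on the boundary data is a contraction (here with factor $1/2$, the probability of reaching $\partial A$ before $\partial D$ from $\partial B$), which mirrors the role of the $\alpha$ of Lemma 2.2. For this configuration one can check by hand that the fixed point has $\psi_2 = 2k$ on $\partial B$ and $\psi_1 \equiv 3k$ on $B$, which the iteration reproduces to machine precision.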


Remark. An important detail glossed over in the above proof is the fact that to go over from (4.1), (4.2) to (4.6), (4.7) (or the equality therein when $\tilde{X}(\cdot)$, $\tilde{u}(\cdot)$ replace $X(\cdot)$, $u(\cdot)$), one has to invoke the mutual absolute continuity of the law of $X(t)$ for any $t > 0$ with respect to the Lebesgue measure. The latter can be proved as follows: By the Girsanov theorem, it suffices to consider zero drift, in which case the result is a minor consequence of some well-known estimates for parabolic equations as in, e.g., [1].

For measurable maps $v_1 : B \to S$, $v_2 : G \to S$, let $(v_1, v_2)$ denote the control policy described by: use $u(\cdot) = v_1(X(\cdot))$ till the first exit from $B$ (vacuous if initially not in $B$) and thereafter, use $u(\cdot) = v_1(X(\cdot))$ in $A$ and during a crossover from $A$ to $F$, and use $u(\cdot) = v_2(X(\cdot))$ in $F$ and during a crossover from $F$ to $A$, where in either crossover we include the initial time point, i.e. the time when the first of $\partial A$ or $\partial B$ is hit, but exclude the terminal time point when the other one is hit.

Corollary 4.1. The control policy $(v_1, v_2)$ defined above is optimal if and only if it satisfies (4.12) and (4.13).

Proof. The 'if' part is essentially contained in the proof of the above theorem. To prove the converse, note that (4.1) makes $\psi_1$ the value function for the classical control problem of controlling $X(\cdot)$ in $B$ with the exit cost $E[\psi_1(X(\sigma_1))]$. By (4.12), $v_1$ is the optimal Markov control for this problem and hence, under $v_1$, $\psi_1(X(\sigma_1 \wedge t))$, $t \geq 0$, is a martingale with respect to the natural filtration of $X(\cdot)$ ([3], pp. 160-161). Thus equality holds in (4.9) for $n = 1$. A similar argument applied to the process $X(\pi_{n-1} + \cdot)$ in place of $X(\cdot)$ establishes this equality for the other values of $n$. Again, a similar argument involving $\psi_2$, $G$ in place of $\psi_1$, $B$ leads to equality in (4.8) under the Markov control $v_2$ for all $n$. Summing (4.8)-(4.11) for each $n$ and then over all $n$ as in the proof of Theorem 4.1, one obtains

$$\psi_1(x) = J_x((v_1, v_2)) = \inf_{u(\cdot)} J_x(u(\cdot))$$

for $x \in \bar{B}$ and

$$\psi_2(x) = J'_x((v_1, v_2)) = \inf_{u(\cdot)} J'_x(u(\cdot))$$

for $x \in \bar{G}$. □

Corollary 4.2. Theorem 4.1 still holds if assumption $(*)$ is dropped.

Proof. Suppose $(*)$ is false. Move over to the 'relaxed control' framework of [6], i.e., let $S$ be the space of probability measures on a compact metric space $U$ with the Prohorov topology, and let $m(\cdot, \cdot)$ be of the form

$$m(x, u) = \int_U \bar{m}(x, y)\, u(dy)$$

(termwise integration) for some $\bar{m} : R^d \times U \to R^d$ which is bounded continuous and Lipschitz in the first argument uniformly with respect to the second. Now $(*)$ holds and the above applies. Note that the minimum in (4.12)-(4.13) will always be attained on the set of Dirac measures in $S$, itself a compact subset of $S$. By the selection theorem of Lemma 1, p. 185 in [2], $v_1$, $v_2$ may be chosen so as to have their range entirely in the set of Dirac measures, and can then be identified with $U$-valued maps in an obvious manner, leading to an ordinary (i.e., 'pre-relaxation') optimal control policy. □

Remark 1. Note that while defining the cost in (1.2)-(1.3), the strip $H$ was clubbed together with $A$ rather than with $F$. This is only a matter of convention; the opposite possibility can be handled similarly. A third possibility is to define, for $x \in H$, the cost to be

$$J''_x(u(\cdot)) = E\Big[\sum_{n=1}^\infty k(X(\sigma_n \wedge \tau_n))\, I\{\tau > \sigma_n \wedge \tau_n\}\Big].$$


This can be handled as follows: Define $\psi_3 : \bar{H} \to R$ by

$$\psi_3(x) = \inf_{u(\cdot)} J''_x(u(\cdot)).$$

Adjoin to (4.1)-(4.5) the equations

$$\min_{u \in S} (L\psi_3)(x, u) = 0 \quad \text{a.e. in } H,$$

$$\psi_3 - \psi_2 = k \quad \text{on } \partial B, \tag{4.14}$$

$$\psi_3 - \psi_1 = k \quad \text{on } \partial A. \tag{4.15}$$

Then $\psi_3$ is the classical value function for the problem of controlling $X(\cdot)$ in $H$ with exit cost $\psi_1 + k$ on $\partial A$ and $\psi_2 + k$ on $\partial B$. Thus it is uniquely characterized in $C(\bar{H}) \cap W^{2,p}_{loc}(H)$, $p \geq 2$, by (4.14), (4.15). Let $v_3 : \bar{H} \to S$ be such that

$$\sum_{i=1}^d m_i(x, v_3(x)) \frac{\partial \psi_3}{\partial x_i} = \min_{u \in S} \sum_{i=1}^d m_i(x, u) \frac{\partial \psi_3}{\partial x_i} \quad \text{a.e.} \tag{4.16}$$

Define $(v_1, v_2, v_3)$ to be the control policy that coincides with $(v_1, v_2)$ for $x \notin H$ and, for $x \in H$, uses the Markov control $v_3$ till $X(\cdot)$ exits from $H$ and $(v_1, v_2)$ thereafter. This is easily proven to be optimal. Moreover, an analog of Corollary 4.1 holds for $(v_1, v_2, v_3)$ with (4.16) adjoined to (4.12)-(4.13).

Remark 2. If we collapse the strip $H$ into a codimension-one submanifold by setting $A = B$, $H = \partial A = \partial B$, with one of (4.3), (4.4) dropped, the HJB system is ill-posed because uniqueness is lost. Probabilistically, the difficulty lies in defining the stopping times $\{\tau_i\}$, $\{\sigma_i\}$: once $X(\cdot)$ crosses $\partial A$, it will do so infinitely often in any arbitrarily small time interval thereafter. Thus the model of [4] involving a measure-valued cost seems the only way out.

References

[1] D.G. Aronson, Bounds for the fundamental solution of a parabolic equation, Bull. Amer. Math. Soc. 73 (1967) 890-896.

[2] V.E. Beneš, Existence of optimal strategies based on specified information, for a class of stochastic decision problems, SIAM J. Control Optim. 8 (1970) 179-188.

[3] A. Bensoussan, Stochastic Control by Functional Analysis Methods (North-Holland, Amsterdam, 1982).

[4] V.S. Borkar, Controlled diffusions with boundary-crossing costs, Appl. Math. Optim. 18 (1988) 67-83.

[5] V.S. Borkar, Control of a partially observed diffusion up to an exit time, Systems Control Lett. 8 (1987) 429-434.

[6] W.H. Fleming, Generalized solutions in optimal stochastic control, in: E. Roxin, P.T. Liu and R.L. Sternberg, Eds., Differential Games and Control Theory III (Marcel Dekker, New York, 1977) 147-165.

[7] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes (North-Holland/Kodansha, 1981).

[8] N.V. Krylov, Controlled Diffusion Processes (Springer, Berlin-New York, 1981).

[9] H.J. Kushner, Existence results for optimal stochastic control, J. Optim. Theory Appl. 15 (1975) 347-359.

[10] D.W. Stroock and S.R.S. Varadhan, Multidimensional Diffusion Processes (Springer, Berlin-New York, 1979).

[11] A.Ju. Veretennikov, On strong solutions and explicit formulas for solutions of stochastic differential equations, Math. USSR Sbornik 39 (1981) 387-401.

[12] E. Wong, Representation of martingales, quadratic variation and applications, SIAM J. Control Optim. 9 (1971) 621-633.