23
RATES OF CONVERGENCE FOR APPROXIMATION SCHEMES IN OPTIMAL CONTROL * PAUL DUPUIS AND MATTHEW R. JAMES SIAM J. CONTROL OPTIM. c 1998 Society for Industrial and Applied Mathematics Vol. 36, No. 2, pp. 719–741, March 1998 012 Abstract. We present a simple method for obtaining rate of convergence estimates for ap- proximations in optimal control problems. Although the method is applicable to a wide range of approximation problems, it requires in all cases some type of smoothness of the quantity being ap- proximated. We illustrate the method by presenting a number of examples, including finite difference schemes for stochastic and deterministic optimal control problems. A general principle can be ab- stracted, and indeed the method may be applied to a variety of approximation problems, such as the numerical approximation of nonlinear PDEs not a priori related to control theory. Key words. optimal control, numerical approximation, rate of convergence, finite differences, ergodic control, reflected diffusions, nonlinear PDE AMS subject classifications. 93E20, 65N12, 65N15, 65N06, 93E25 PII. S0363012994267789 1. Introduction. A fundamental problem in numerical analysis is the determi- nation of the rate of convergence of approximation schemes. In general, the rate of convergence depends on the nature of the approximation and on the smoothness of the quantity being approximated. For example, the standard finite difference scheme for Laplace’s equation in a smooth domain converges with a rate proportional to the discretization step size. In optimal control theory, one is often faced with the problem of computing the minimal cost function, also referred to as the value function. In many cases, the value function can be characterized as an appropriate solution to a Hamilton–Jacobi– Bellman (HJB) equation that takes the form of a nonlinear PDE. In general, one cannot compute the value function explicitly, and instead must resort to a numerical approximation. Various approximation schemes are available (e.g., finite difference or finite element), and convergence results are either analytic (e.g., Crandall and Lions [5], Barles and Souganidis [1]) or probabilistic (e.g., Kushner [18], Kushner and Dupuis [19]). There are two types of results available on the rate of convergence. One is a global rate of convergence for deterministic problems which makes few assumptions regarding the regularity of the minimal cost function. The first paper to obtain results of this type was [5]. The rate is in the form of an upper bound on the error and is proportional to the square root of the discretization step size. Later papers considered a number of extensions, such as deterministic control problems (e.g., Capuzzo Dolcetta and Falcone [3], Capuzzo Dolcetta and Ishii [4], Gonzalez and Rofman [17]) and differential games (Souganidis [26]). In all these papers the same type of global but locally suboptimal rate estimate as in [5] is obtained. A second type of rate was obtained by Menaldi [23] in the context of control of a nondegenerate diffusion process * Received by the editors May 16, 1994; accepted for publication (in revised form) February 4, 1997. http://www.siam.org/journals/sicon/36-2/26778.html Division of Applied Mathematics, Brown University, Providence, RI 02912 (dupuis@ cfm.brown.edu). The research of this author was supported in part by the Air Force Office of Scientific Research (F49620-93-1-0264) and the Army Research Office (DAAH04-93-G-0070). 
Department of Engineering, Faculty of Engineering and Information Technology, Aus- tralian National University, Canberra, ACT 0200, Australia ([email protected], http://spigot.anu.edu.au/people/mat/home.html). 719 Downloaded 07/07/14 to 155.97.178.73. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Rates of Convergence for Approximation Schemes in Optimal Control

Embed Size (px)

Citation preview

Page 1: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATION SCHEMESIN OPTIMAL CONTROL∗

PAUL DUPUIS† AND MATTHEW R. JAMES‡

SIAM J. CONTROL OPTIM. c© 1998 Society for Industrial and Applied MathematicsVol. 36, No. 2, pp. 719–741, March 1998 012

Abstract. We present a simple method for obtaining rate of convergence estimates for ap-proximations in optimal control problems. Although the method is applicable to a wide range ofapproximation problems, it requires in all cases some type of smoothness of the quantity being ap-proximated. We illustrate the method by presenting a number of examples, including finite differenceschemes for stochastic and deterministic optimal control problems. A general principle can be ab-stracted, and indeed the method may be applied to a variety of approximation problems, such as thenumerical approximation of nonlinear PDEs not a priori related to control theory.

Key words. optimal control, numerical approximation, rate of convergence, finite differences,ergodic control, reflected diffusions, nonlinear PDE

AMS subject classifications. 93E20, 65N12, 65N15, 65N06, 93E25

PII. S0363012994267789

1. Introduction. A fundamental problem in numerical analysis is the determi-nation of the rate of convergence of approximation schemes. In general, the rate ofconvergence depends on the nature of the approximation and on the smoothness ofthe quantity being approximated. For example, the standard finite difference schemefor Laplace’s equation in a smooth domain converges with a rate proportional to thediscretization step size.

In optimal control theory, one is often faced with the problem of computing theminimal cost function, also referred to as the value function. In many cases, thevalue function can be characterized as an appropriate solution to a Hamilton–Jacobi–Bellman (HJB) equation that takes the form of a nonlinear PDE. In general, onecannot compute the value function explicitly, and instead must resort to a numericalapproximation. Various approximation schemes are available (e.g., finite difference orfinite element), and convergence results are either analytic (e.g., Crandall and Lions[5], Barles and Souganidis [1]) or probabilistic (e.g., Kushner [18], Kushner and Dupuis[19]). There are two types of results available on the rate of convergence. One is aglobal rate of convergence for deterministic problems which makes few assumptionsregarding the regularity of the minimal cost function. The first paper to obtain resultsof this type was [5]. The rate is in the form of an upper bound on the error and isproportional to the square root of the discretization step size. Later papers considereda number of extensions, such as deterministic control problems (e.g., Capuzzo Dolcettaand Falcone [3], Capuzzo Dolcetta and Ishii [4], Gonzalez and Rofman [17]) anddifferential games (Souganidis [26]). In all these papers the same type of global butlocally suboptimal rate estimate as in [5] is obtained. A second type of rate wasobtained by Menaldi [23] in the context of control of a nondegenerate diffusion process

∗Received by the editors May 16, 1994; accepted for publication (in revised form) February 4,1997.

http://www.siam.org/journals/sicon/36-2/26778.html†Division of Applied Mathematics, Brown University, Providence, RI 02912 (dupuis@

cfm.brown.edu). The research of this author was supported in part by the Air Force Office ofScientific Research (F49620-93-1-0264) and the Army Research Office (DAAH04-93-G-0070).‡Department of Engineering, Faculty of Engineering and Information Technology, Aus-

tralian National University, Canberra, ACT 0200, Australia ([email protected],http://spigot.anu.edu.au/people/mat/home.html).

719

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 2: Rates of Convergence for Approximation Schemes in Optimal Control

720 PAUL DUPUIS AND MATTHEW JAMES

with discounted cost. Here the regularity of the minimal cost function was exploitedto obtain sharp rates of convergence.

In this paper we present a simple method for obtaining rate of convergence es-timates that is applicable to a range of approximation problems, including the im-portant case of numerical approximation. The approach is closer in spirit to thatof Menaldi rather than Crandall and Lions in that we exploit, wherever possible,smoothness of the minimal cost to obtain sharp rates and in some cases an expan-sion of the error in terms of the discretization parameter. A minimal requirementfor the applicability of our method is local smoothness of the quantity that is beingapproximated. For example, in the setting of deterministic optimal control problemswe obtain a rate of convergence that is proportional to the discretization step sizein these regions. The practice of considering separately those regions where greaterregularity applies is standard in numerical analysis, and the information so obtainedis often more useful than a global but locally suboptimal rate of convergence.

The basic idea is as follows. In the problems we consider, the quantity V to beapproximated is represented as a functional of some process x, and the approximatingquantity V h is analogously represented as a functional of an approximating processxh. For a very simple (uncontrolled) example, suppose that V has a representationof the form

V (x) = Ex

[∫ ∞0

e−λtk(xt)dt],

where x is, say, a diffusion process. Suppose also that V h has the representation

V h(x) = Ex

[∫ ∞0

e−λtk(xht )dt]

in terms of an approximating process xh, which for simplicity we will assume tobe Markov. As we illustrate via several examples below, the assumed regularityproperties of V allow one to derive a second representation for V , this time in termsof the process xh:

V (x) = Ex

[∫ ∞0

e−λt(k(xht ) + eh(xht )

)dt

].

The function eh = V − V h is given explicitly in terms of the function V and thegenerators of the processes x and xh. Since V h is supposed to be close to V , onewould expect xh to be close in some sense to x. In fact, one typically has the weakconvergence of xh to x. When this convergence is coupled with the explicit form forthe error eh, a comparison of the two representations of V gives a rate of convergence,and also formulas for rate coefficients when enough regularity is available.

The rate of convergence has a number of uses, the most obvious being as a guidein the selection of step sizes and the comparison of algorithms in numerical approxi-mations. A second use in this setting is in comparing the contributions to the overallerror made by various “parts” of an approximation; e.g., one can consider a problemthat is posed on a bounded domain and compare the contributions made by approxi-mations on the interior and approximations to the boundary condition. This is donefor a reflecting diffusion problem with ergodic cost in section 4.

To show that the basic idea can be used in a variety of situations, we give thedetails for a number of problems that are quite different. In section 2, we consider

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 3: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 721

the stochastic control problem analyzed by Menaldi [23], where the value functionis known to be smooth: V ∈ C2,α and the (global) rate of convergence for a finitedifference scheme is O(hα). Our approach appears to be simpler than Menaldi’s.The focus shifts in section 3 to a general class of approximations to a finite timedeterministic optimal control problem, including finite difference schemes. In section4 we treat an ergodic control problem for a reflecting diffusion process.

Other types of approximations arise in control theory. For example, a diffusionmodel is often a surrogate for a more realistic and more complicated controlled process.Underlying this replacement is (implicitly or explicitly) an approximation argument,in which it is supposed that the realistic process is embedded in a sequence of processeswhose weak limit is the controlled diffusion process. Rate of convergence estimatesare also useful in this context, and the method we discuss can in some cases be usedhere as well. We conclude with remarks on such possibilities and other extensions insection 5.

2. A stochastic optimal control example. In this section we introduce thebasic method for calculating rates of convergence. In order to place it in perspective, itis appropriate that we begin by considering one of the few stochastic control problemsfor which a rate of convergence is known, namely, the stochastic control problemstudied by Menaldi [23].

The dynamics are given by the controlled stochastic differential equation (SDE)

dxt = b(xt, ut) dt+ dwt,(2.1)

and the value function is defined by

V (x) .= infu∈U

Ex

[∫ τ

0e−λtk(xt, ut) dt

](2.2)

for x ∈ D. Here, D ⊂ Rn is a smooth bounded domain, τ = τ(x) = inf{t > 0 : xt 6∈D} is the exit time from D, λ > 0, U is a set of admissible U -valued control processes,U ⊂ Rm is compact, b ∈ C∞(Rn×Rm,Rn) and k ∈ C∞(Rn×Rm) are bounded anduniformly Lipschitz continuous, and Ex denotes expectation conditioned on x0 = x.For the definition of admissible controls, see [23].

Although in certain cases (e.g., uncontrolled systems or one-dimensional prob-lems) V may be more regular, it is known [16], [23, pp. 601–602] that in generalV ∈ C2,α(D) for some 0 < α < 1, and also that V is a classical solution of thedynamic programming or HJB equation{

λV (x) = minu∈U [LuV (x) + k(x, u)] in D,

V (x) = 0 on ∂D.(2.3)

In the last display Lu is the controlled diffusion generator defined for f ∈ C2(Rn) by

Luf(x) .= 〈b(x, u), fx(x)〉+12

tr[fxx(x)],

where trA denotes the trace of a square matrix A, and where fx(x) and fxx(x) denotethe gradient and Hessian of f at x, respectively.

For a real-valued function g(y) we define g+(y) = g(y)∨0 and g−(y) = −(g(y)∧0).For h > 0 let hZn = {hz : zi ∈ Z, i = 1, 2, . . . , n}. There are a number of ways toconstruct an approximation to V that has domain hZn. We focus for now on the

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 4: Rates of Convergence for Approximation Schemes in Optimal Control

722 PAUL DUPUIS AND MATTHEW JAMES

method that is probably most familiar. Thus we replace the differential operator Luby the finite difference operator Lhu defined by

Lhuf(x) .=12

n∑i=1

[f(x+ hei) + f(x− hei)− 2f(x)] /h2

+n∑i=1

b±i (x, u) [f(x± hei)− f(x)] /h,

where x ∈ hZn and ei, i = 1, . . . , n, are the standard unit vectors in Rn. Let us fixh0 > 0, and suppose that D′ is an open set containing the closed h0-neighborhood ofD. Given a function f ∈ C2,α(D), there exists a function f ′ ∈ C2,α

0 (D′) such thatf = f ′ in D (see [16, Lemma 6.37]). Henceforth, we assume h ≤ h0 and simply write ffor the extension f ′. By Taylor’s theorem, we see that the operator Lhu approximatesLu in the sense that if we define the “error”

ehf (x, u) .= Luf(x)− Lhuf(x),(2.4)

then

|ehf (x, u)| = O(hα) uniformly in D × U(2.5)

for all f ∈ C2,α(D). Note that this estimate is well defined, in view of our conventionof extension. The finite difference replacement of (2.3) that we consider is{

λV h(x) = minu∈U[LhuV

h(x) + k(x, u)]

in Dh,

V h(x) = 0 on ∂Dh,(2.6)

where Dh = D ∩ hZn and ∂Dh = (Rn\D) ∩ hZn.For v ∈ Rn and p ∈ [1,∞) define ‖v‖p

.= (|v1|p + · · ·+ |vn|p)1/p. The equation(2.6) can be interpreted as the HJB equation for a controlled Markov chain problem.To see this, multiply both sides of the first equation in (2.6) by ∆th(x), add V h(x),and then divide by (1 + λ∆th(x)). We thereby obtain the equation

V h(x) = minu∈U

∑z∈Nh(x)

11 + λ∆th(x)

(ph(x, z|u)V h(z) + ∆th(x)k(x, u)

) ,(2.7)

where

∆th(x) .=h2

n+ hmaxu∈U ‖b(x, u)‖1(2.8)

is a time interpolation scale, and the functions ph(x, z|u) are transition probabilitiesdefined by

ph(x, z|u) .=

h(maxu∈U ‖b(x, u)‖1 − ‖b(x, u)‖1)n+ hmaxu∈U ‖b(x, u)‖1

if z = x,

1/2 + hb±i (x, u)n+ hmaxu∈U ‖b(x, u)‖1

if z = x± hei, some i = 1, . . . , n,

0 otherwise.(2.9)

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 5: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 723

The functions ph and ∆th(x) are defined as they are so that equation (2.7) can beinterpreted as an HJB equation for a problem involving a controlled Markov chain.If {ξhk , k = 0, 1, . . .} is a controlled Markov chain satisfying

Phx

(ξhk+1 = z | ξhl , ul, l = 0, . . . , k

)= ph(ξhk , z|uhk),

then V h has the representation

V h(x) = infu∈Uh

Ehx

Nh−1∑k=0

(k−1∏l=0

11 + λ∆th(ξhl )

)k(ξhk , u

hk)∆th(ξhk )

.(2.10)

Here, Nh is the exit time from Dh and Uh is an appropriate set of control policies[19]. The minimal cost V h(x) for this problem is well defined and also the uniquesolution to (2.6) [18, 19]. It is easy to formally check that (2.6) is the HJB equationfor (2.10). For a rigorous proof, we refer the reader to [25].

The original motivation for (2.6) is as a finite difference replacement for (2.3).The replacement of the problem (2.3) by the problem (2.6) could be viewed as anapproximation at the level of the PDE. An alternative point of view is to approximateat the level of the process, and this is the perspective naturally associated with (2.7)and the representation (2.10). More precisely, for the transition probabilities andinterpolation interval defined by (2.8) and (2.9), we have

Ehx

[ξhk+1 − ξhk | ξhl , ul, l = 0, . . . , k

]= b(ξhk , u

hk)∆th(ξhk )(2.11)

and

(2.12)

covhx[ξhk+1 − ξhk | ξhl , ul, l = 0, . . . , k

]= I∆th(ξhk ) +O(h)∆th(ξhk ) + [O(∆th(ξhk ))]2,

where cov stands for conditional covariance. These equations imply that an inter-polated version of the chain {(ξhk , uhk)} that uses the interpolation intervals ∆th(ξhk )is a good approximation to the original controlled process (2.1) in the sense of weakconvergence (see [19] and section 3). Although in this section we have motivated theforms of the transition probabilities and interpolation interval by starting with finitedifference approximations, in later sections we will find it more convenient to con-struct them directly. We refer the reader to [19, Chapter 5] for an in-depth discussionon methods for constructing these interpolation times and transition functions andfor a precise statement of the conditions they should satisfy.

Using either weak convergence methods [18, 19] or viscosity solution methods[1, 14], one can show that the approximation V h converges to V :

limh→0

supx∈Dh

|V h(x)− V (x)| = 0.

A rate of convergence result was obtained by Menaldi [23], in a more general setting.Here we show that the same rate estimate can be obtained easily by employing aprobabilistic representation for V in terms of the controlled Markov chain. Indeed,using (2.3) and (2.4), we see that V satisfies

λV (x) = minu∈U

[LhuV (x) + k(x, u) + eh(x, u)

]in Dh.(2.13)

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 6: Rates of Convergence for Approximation Schemes in Optimal Control

724 PAUL DUPUIS AND MATTHEW JAMES

A comparison of (2.13) with (2.6) indicates that V has a representation of exactly thesame form as (2.10), save that the perturbed running cost k + eh is used; viz.,

(2.14)

V (x) = infu∈Uh

Ehx

Nh−1∑k=0

(k−1∏l=0

11 + λ∆th(ξhl )

)(k(ξhk , u

hk) + eh(ξhk , u

hk))

∆th(ξhk )

.Note in particular that the two representations are in terms of the same controlledMarkov chain. This easily leads to the following rate of convergence result.

THEOREM 2.1. Given that V ∈ C2,α(D), we have

supx∈Dh

|V h(x)− V (x)| = O(hα)(2.15)

as h ↓ 0.Proof. For ε > 0 consider an ε-optimal control policy for the right-hand side of

(2.10), and let {(ξhk , uhk)} denote the associated controlled chain. Thus

V h(x) ≥ Ehx

Nh−1∑k=0

(k−1∏l=0

11 + λ∆th(ξhl )

)k(ξhk , u

hk)∆th(ξhk )

− ε.From the representation (2.14) we obtain the estimate

V (x)− V h(x) ≤ Ehx

Nh−1∑k=0

(k−1∏l=0

11 + λ∆th(ξhl )

)eh(ξhk , u

hk)∆th(ξhk )

+ ε.

The definition of ∆th(x) implies the existence of 0 < γ1 ≤ γ2 <∞ such that γ1h2 ≤

∆th(x) ≤ γ2h2 for all x ∈ hZh. Using the bound (2.5), the last equation implies

V (x)− V h(x) ≤[ ∞∑k=0

(1

1 + λγ1h2

)kγ2h

2

]O(hα) + ε = O(hα) + ε,

where O(hα) is uniform in all admissible controls and x ∈ Dh. Since ε > 0 isarbitrary we have V (x)−V h(x) ≤ O(hα) uniformly in x ∈ Dh. The reverse inequalityV h(x)− V (x) ≤ O(hα) is proved in the same way, save that we consider policies thatare ε-optimal for the right-hand side of (2.14).

Remark 2.2. The results of this section continue to hold if the SDE (2.1) has auniformly elliptic diffusion coefficient σ(x, u).

A formula for the rate. If V enjoys greater regularity than was used above,and if there exist a unique (in law) optimal policy u∗ and process x∗ for each initialcondition x ∈ D, then one can derive an explicit formula for the rate of convergence.For example, if V ∈ C3,α(D), then

eh(x, u)h

= −12

n∑i=1

|bi(x, u)|Vxixi(x) +O(hα).

To simplify, we assume that all points where the controlled chain might be stopped liein ∂G. It turns out that the assumed uniqueness allows one to show that interpolated

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 7: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 725

versions of the optimal discrete time process and control converge weakly to theoptimal control and process for the continuous time problem. This can in turn beused to show that

limh→0

V h(x)− V (x)h

= Ex

[∫ τ

0e−λt

12

n∑i=1

|bi(x∗t , u∗t )|Vxixi(x∗t ) dt].(2.16)

The analogous argument for a different control problem will be given in detail insection 3.

As an example of how such information might be useful, suppose that insteadof the one-sided approximations to 〈b(x, u), fx(x)〉 used above, we consider instead acentral difference approximation:

〈b(x, u), fx(x)〉 →n∑i=1

bi(x, u) (f(x+ hei)− f(x− hei)) /2h.

In this case V h(x) has an interpretation as a functional of a controlled Markov chainif and only if h|bi(x, u)| ≤ 1 for all i, x, and u of interest. Let us assume thatthis condition holds. Then, under the assumption that V ∈ C3,α(D), we obtainV h(x) − V (x) = O(h1+α). Under additional regularity one can obtain an even morerefined expression in the spirit of (2.16).

Remark 2.3. Although we have used in a crucial way the fact that V solves theHJB equation (2.3), it is not actually necessary to make an analogous assumptionwith respect to the value functions for the approximations. This can be useful incases where the HJB equations for the prelimit problems are not sufficiently wellunderstood. In such cases an additional argument is needed to show that the minimalcosts for the prelimit problems can be arbitrarily well approximated by problems forwhich the associated HJB equations are known to hold rigorously, e.g., approximationin terms of a countable state space controlled Markov chain. This can be establishedin wide generality by means of weak convergence techniques [19]. However, even ifthey hold only in a formal sense the relations between the HJB equations for thelimiting and prelimit control problems are very useful in motivating the general lineof reasoning that we use.

3. A deterministic optimal control example. Consider the following deter-ministic control system: {

xs = b(xs, us), t < s < T,

xt = x,(3.1)

and finite-horizon value function

V (x, t) = infu∈Ut

[∫ T

t

k(xs, us) ds+ g(xT )

],(3.2)

where b, k, etc., are as in section 2, Ut consists of all measurable functions u :[t, T ]→ U , and g ∈ C∞(Rn) is bounded and uniformly Lipschitz continuous. DefineLuf(x, t) = 〈b(x, u), fx(x, t)〉. Then V ∈ C(Rn × [0, T ]) is Lipschitz continuous andis the unique (viscosity) solution of the HJB equation [14] Vt + min

u∈U[LuV (x, t) + k(x, u)] = 0 in Rn × (0, T ),

V (x, T ) = g(x) for x ∈ Rn.(3.3)

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 8: Rates of Convergence for Approximation Schemes in Optimal Control

726 PAUL DUPUIS AND MATTHEW JAMES

In this section we will consider a general class of approximations to V . In theprevious section we assumed that the solution to the appropriate HJB equation wasregular on the entire domain of interest. In contrast, in these deterministic exampleswe can consider the case where the solution may be regular only on a subset of thedomain. In the first subsection we introduce a class of approximations to (3.2). In thesecond and third subsections we consider the cases where the value function is globallyregular and regular only on a subset, respectively. Besides proving a rate result, wealso show how under certain conditions the coefficients can be identified. Finally, insection 3.4 we give examples from finite difference numerical approximation.

3.1. A general approximation. The class of approximations to (3.1) and (3.3)we consider can be thought of as discrete time “small noise” approximations. Includedare small noise optimal control problems associated with the large deviation theoryfor small noise discrete time stochastic systems, as well as explicit finite differenceschemes for the numerical approximation of V . Let δ > 0 denote the approximationparameter. We will restrict δ to values such that T/δ is an integer. While this isdone in part just for convenience, it also turns out that this assumption plays a rolein determining the specific form for the rate of convergence. See the remark afterTheorem 3.3.

For each such δ > 0, x ∈ Rn, and u ∈ U , let µδx,u denote a probability measureon Rn. In order to have the processes we work with well defined, we assume that themapping (x, u) → µδx,u(A) is Borel measurable for each Borel set A ⊂ Rn. Definetδk = kδ and Nδ = T/δ. We consider controlled discrete time processes {ξδi } thatevolve according to

Pδx,tδk

(ξδi+1 − ξδi

δ∈ A

∣∣∣∣∣ (ξδj , uδj), j ∈ {k, . . . , i})

= µδξδi ,uδi(A).

Here Pδx,tδk

denotes probability conditioned on ξδk = x.We consider the family of value functions

V δ(x, tδk) .= infuδ∈Uδ

Ehx,tδk

Nδ−1∑i=k

k(ξδi , uδi )δ + g(ξδNδ)

.The admissible controls in this case can be taken to be the feedback control laws(i.e., each uδk is simply a measurable function from Rn to U), which implies that thecontrolled process is a nonstationary Markov chain. In order for V δ to be close to Vwe must impose some conditions on µδx,u. Define bδ(x, u) to be the mean of µδx,u(dy):

bδ(x, u) .=∫

Rn

yµδx,u(dy).

We require that

bδ(x, u) = b(x, u) +O(δ) and∫

Rn

‖y‖2µδx,u(dy) = O(1),(3.4)

where the O(δ) and O(1) are uniform on compact subsets of Rn×U . For conveniencewe will also assume that the supports of the measures µδx,u(dy) are bounded uniformlyfor all δ > 0, x ∈ Rn, and u ∈ U , which automatically implies the second part of

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 9: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 727

(3.4). A less restrictive assumption that could be used instead is a uniform bound onthe moment generating functions: for each α ∈ Rn

supδ>0,u∈U,x∈Rn

∫Rn

exp〈α, y〉µδx,u(dy) <∞.(3.5)

Of course, this condition is implied by the assumption of a uniform bound on thesupports. These conditions are usually easy to check.

The minimal costs V δ satisfy the following HJB equation [2]:

∂δt Vδ(x, tδk) + min

u∈U

[LδuV

δ(x, tδk+1) + k(x, u)]

= 0 in Rn × {0, . . . , Nδ − 1},

V δ(x, T ) = g(x) for x ∈ Rn,

(3.6)

where

∂δt f(x, tδk) .=(f(x, tδk+1)− f(x, tδk)

)/δ

and

Lδuf(x, t) .=∫

Rn

(f(x+ δy, t)− f(x, t))µδx,u(dy)/δ.

By weak convergence methods [18, 19] (or, alternatively, by viscosity solutionmethods [1, 14]), one can prove convergence for this scheme:

limδ→0

supx∈Rn, |x|≤C

supk=0,1,...,Nδ

|V δ(x, tδk)− V (x, tδk)| = 0

for each C < ∞. The rate of convergence depends on the smoothness of the valuefunction V . In general, V is merely Lipschitz continuous and may fail to be dif-ferentiable everywhere. However, when V is smooth a rate estimate can easily beestablished.

3.2. The globally smooth case. Assume now that V ∈ C2(Rn × [0, T ]). Toobtain a rate of convergence we follow the same procedure as in section 2. Thus thefirst step is to obtain a representation for V in terms of the controlled chain. ByTaylor’s theorem, V satisfies the discrete equation

∂δt V (x, tδk) + minu∈U

[LδuV (x, tδk+1) + k(x, u) + eδ(x, u, tδk)

]= 0(3.7)

in Rn × {0, . . . , Nδ − 1}, where

eδ(x, u, t) .=(LuV (x, t)− LδuV (x, t)

)+(Vt(x, t)− ∂δt V (x, t)

).

We therefore have the representation

V (x, tδk) = infuδ∈Uδ

Eδx,tδk

Nδ−1∑i=k

(k(ξδi , u

δi ) + eδ(ξδi , u

δi , t

δi ))δ + g(ξδNδ)

.(3.8)

Equation (3.4) implies eδ(x, u, t) = O(δ) uniformly on compact subsets.This representation leads to the following result, whose proof is exactly analogous

to the proof of Theorem 2.1.

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 10: Rates of Convergence for Approximation Schemes in Optimal Control

728 PAUL DUPUIS AND MATTHEW JAMES

THEOREM 3.1. Assume that V ∈ C2(Rn × [0, T ]) and that (3.4) holds. Then foreach C <∞ we have

supx∈Rn, |x|≤C

supk=0,1...,Nδ

|V δ(x, tδk)− V (x, tδk)| = O(δ)(3.9)

as δ ↓ 0.Remark 3.2. Similar results can be obtained for differential game problems, and

also for implicit schemes [19].If V enjoys a greater degree of regularity and (3.4) is replaced by a stronger as-

sumption, we can refine this result and obtain an explicit expression for the coefficientin the rate of convergence. Let B be any n × n symmetric matrix. In place of (3.4)we assume

bδ(x, u) = b(x, u) + δs(x, u) + o(δ),∫Rn

〈By, y〉µδx,u(dy) = q(x, u,B) + o(1),(3.10)

where s(x, u) and q(x, u,B) are continuous in (x, u,B), and where the o(δ) and o(1)terms are uniform in compact subsets of Rn × U . We will also need to make thefollowing assumption:

minu∈U [b(x, u) · p+ k(x, u)] attains a unique minimum

at U∗(x, p), where U∗ is of class C1.(3.11)

Define

r(x, u, t) .= 〈Vx(x, t), s(x, u)〉+ q(x, u, Vxx(x, t)) +12Vtt(x, t).

Note that −eδ(x, u, t)/δ → r(x, u, t) uniformly on compact sets.THEOREM 3.3. Assume that V ∈ C3(Rn × [0, T ]) and (3.10), (3.11) hold. Then

we have the explicit rate of convergence

limδ→0, xδ→x, tδk→t

V δ(xδ, tδk)− V (xδ, tδk)δ

=∫ T

t

r(x∗s, u∗s, s) ds,(3.12)

uniformly on compact subsets, where x∗s is the optimal trajectory corresponding to theunique optimal feedback control u∗s = u∗(x∗s, s) = U∗(x∗s, Vx(x∗s, s)), t ≤ s ≤ T , withinitial condition x∗t = x.

Proof. We first prove that u∗ is the unique optimal control. Let us be any controland let xs be the associated controlled trajectory that starts at x at time t [24]. Wefollow the convention of saying that u = u∗ if and only if us = u∗s for almost every(a.e.) s ∈ [t, T ]. From equation (3.3) we obtain

Vs(x, s) + LusV (x, s) + k(x, us) ≥ 0,

with equality if and only if us = u∗(x, s). Integrating along the trajectory yields

∫ T

t

k(xs, us) ds+ g(xT ) ≥ V (x, t),(3.13)

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 11: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 729

with equality if and only if us = u∗(xs, s) for a.e. s ∈ [t, T ]. Since the solutionto φs = b(φs, u∗(φs, s)) is unique for any initial condition (i.e., x = x∗), we obtainequality if and only if u = u∗.

We now prove the rate result. Following the argument of Theorem 2.1, we let{(ξδi , uδi )} be a δ2-optimal chain and control for the representation of V (xδ, tδk) givenin (3.8). Define interpolated state and control processes xδ, uδ by xδs = ξδi , uδs = uδi on[tδi , t

δi+1) [18, 19]. It follows from the boundedness of the supports of the measures µδx,u

that the random processes {(xδ, uδ), δ > 0} are tight (for the precise topology usedon the control process, see [19]). It follows from (3.4) and an elementary martingaleargument that any limit satisfies (3.1) with probability 1 (w.p.1). Since we haveequality in (3.13) if and only if u = u∗, the fact that V δ(xδ, tδk) → V (x, t) (Theorem3.1) and an argument by contradiction imply the weak convergence

xδ, uδ =⇒ x∗, u∗

as δ → 0, xδ → x, tδk → t. Now since {(ξδi , uδi )} is δ2-optimal in the representation(3.8),

V δ(xδ, tδk)− V (xδ, tδk)δ

≤ Eδxδ,tδk

Nδ−1∑i=k

(r(ξδi , u

δi , t

δi ) + o(1)

+ δ,

where the o(1) term is uniform on compact sets. Thus, by the dominated convergencetheorem,

lim supδ→0,xδ→x, tδk→t

V δ(xδ, tδk)− V (xδ, tδk)δ

≤∫ T

t

r(x∗s, u∗s, s) ds.

The opposite inequality is proven similarly, completing the proof.Remark 3.4. Although the assumption that T/δ is an integer is not needed for

convergence or even Theorem 3.1, it is needed if we wish to identify the rate coefficientas in the last theorem.

3.3. The general case. In general, V is not smooth everywhere, and conse-quently one obtains a slower global rate of convergence (see the discussion in thefollowing subsection on rates for numerical schemes). However, because V is smoothin certain regions N ⊂ Rn × [0, T ] [11, 12], one might expect the rate to be faster inthese regions of smoothness. We now show that the rate is O(δ) in such regions.

Let N be an open, bounded subset of Rn × [0, T ]. Following [12, 13], the set Nis called a region of strong regularity (RSR) provided

1. V ∈ C3(N ).2. Assumption (3.11) holds.3. Given (x, t) ∈ N , denote by x∗s and u∗s = u∗(x∗s, s) = U∗(x∗s, Vx(x∗s, s)), t ≤s ≤ T , the unique optimal state trajectory and control with initial conditionx∗t = x. Define

σ = σx,t = inf {s > t : (x∗s, s) 6∈ N} ,

y = yx,t = x∗(σ), z = zx,t = (y, σ).

Then (x, t) ∈ N implies (x∗s, s) ∈ N , t ≤ s < σ, and σx,t = T .

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 12: Rates of Convergence for Approximation Schemes in Optimal Control

730 PAUL DUPUIS AND MATTHEW JAMES

4. ∂N = Γ1∪Γ2, where Γ1 = {zx,t : (x, t) ∈ N} is an open subset of Rn×{T}.For information regarding the existence of RSRs, we refer the reader to [11, 12].

THEOREM 3.5. Assume (3.10) and let N be an RSR. Then

limδ→0, xδ→x, tδk→t

V δ(xδ, tδk)− V (xδ, tδk)δ

=∫ T

t

r(x∗s, u∗s, s) ds(3.14)

uniformly on compact subsets of N . Consequently,

|V δ − V | = O(δ) in N

as δ → 0.Proof (sketch). This result is proven by modifying the proof of Theorem 3.3 along

the lines of [12, 13]. However, in this proof we use a slightly modified representationfor V (x, t). In place of (3.8) we exploit the strong Markov property to write

V (x, tδk) = infuδ∈Uδ

Eδx,tδk

Mδ−1∑i=k

(k(ξδi , u

δi ) + eδ(ξδi , u

δi , t

δi ))δ + V (zδ)

,where M δ = inf{i > k : (xδi , t

δi ) 6∈ N} is the discrete time of first exit from N ,

σδ = tδMδ , and zδ = (ξδMδ , σδ). We can also write an analogous representation for

V δ(x, tδk) in terms of this stopping time and location. If we let {(ξδi , uδi )} be δ2-optimalfor V as in the proof of Theorem 3.3, then we obtain

V δ(xδ, tδk)− V (xδ, tδk)δ

≤ Eδxδ,tδk

Mδ−1∑i=k

(r(ξδi , u

δi , t

δi ) + o(1)

)δ +

V δ(zδ)− V (zδ)δ

+ δ.

Recall that the bound (3.5) holds for the moment generating functions of thedistributions µδx,u. Because of this bound an upper large deviation principle holds forthe interpolated processes xδs [7, 8]. The large deviation upper bound implies that ifxδ → x and tδk → t as δ → 0, then given η > 0, there exists c > 0 such that for allsufficiently small δ > 0,

Pδxδ,tδk

(sup

tδk≤s≤T|xδs − x∗s| > η

)≤ e−c/δ,

and thus by parts (iii) and (iv) of the definition of a RSR, for all sufficiently smallη > 0,

Pδxδ,tδk

(|zδ − z| > η

)≤ e−c/δ.

Since V (x, t) and V δ(x, t) are uniformly bounded in (x, t) ∈ Rn× [0, T ] and δ ∈ (0, 1),(V δ(zδ)−V (zδ))/δ is uniformly bounded above by some constant times 1/δ. Therefore

V δ(xδ, tδk)− V (xδ, tδk)δ

≤ Eδxδ,tδk

[(∫ T∧σδ

tδk

r(xδs, uδs, s) ds+ o(1)

)1{|zδ−z|<η}

]

+ O(e−c/δ) (1 +O(1/δ)) ,

and we can conclude as in the proof of Theorem 3.3.

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 13: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 731

3.4. Numerical approximations. In this subsection we specialize from theprevious two subsections to the case of an explicit finite difference approximation V h

to V . As in section 2, let h > 0 denote the space discretization step size, ∆th denotethe time discretization, etc. Select v > 0 such that

v ≥ maxx∈Rn, u∈U

‖b(x, u)‖1,

and define the time step size

∆th .= h/v.

We restrict attention to values of h > 0 such that Nh = T/∆th is an integer. Wedefine the discrete times thk

.= k∆th and consider the transition probabilities

ph(x, z|u) .=

1− ‖b(x, u)‖1/v if z = x,

b±i (x, u)/v if z = x± hei for some i = 1, . . . , n,

0 otherwise.

We fit this example into the general framework by setting δ .= h/v and definingµδx,u by

µδx,u(A) .=∑

w∈Zn:vw∈Aph(x, x+ hw|u)

for all Borel sets A ⊂ Rn. These definitions imply∫Rn

yµδx,u(dy) = b(x, u),

∫Rn

〈By, y〉µδx,u(dy) =n∑i=1

vBii|bi(x, u)|,

where B = (Bij). Hence equation (3.4) holds and we may apply Theorem 3.1. Thefunction r(x, u, t) in this example takes the form

vn∑i=1

Vxixi(x, t)|bi(x, u)|+ 12Vtt(x, t),

and under the appropriate conditions Theorems 3.3 and 3.5 hold as well.As an application of the rate of convergence results, consider the following modifi-

cation of the numerical approximation. It is well known that it is at least theoreticallyadvantageous to allow the interpolation times ∆th to depend on the state and control:∆th = ∆th(x, u). While it is obvious that such added flexibility in the selection ofa numerical scheme can only help, it may not be the case that the additional effortrequired to program such schemes is worth the improvement in accuracy. The rate re-sult allows one to estimate the improvement before implementing a more complicatedscheme. Note also that the total time taken to complete the numerical computationsmay be reduced if the interpolation times are allowed to depend on the state andcontrol.

We consider such a modification for the example that was just considered. Theunderlying reason why one expects state- and control-dependent interpolation times

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 14: Rates of Convergence for Approximation Schemes in Optimal Control

732 PAUL DUPUIS AND MATTHEW JAMES

to improve numerical performance is because they allow one to reduce the probabilitythat the controlled chain remains at any given state; i.e., they allow one to designchains for which ph(x, x|u) = 0 [19, Chapter 5]. With only a little effort, one can mod-ify the proofs of the theorems stated above to allow such state and control dependencyof the interpolation times. (Note that if ∆th(x) is state- or control-dependent then forany given x one does not know a priori which continuous times correspond to the inter-polation times chosen by the discrete algorithm. Because of this, one must keep trackof the interpolation times used as one iterates backward when solving the discrete HJBequation and define V h(x, 0) via an interpolation.) The state-dependent interpolationtimes and transition probabilities that are appropriate are ∆th(x, u) = h/|b(x, u)‖1and

ph(x, z|u) =

0 if z = x,

b±i (x, u)/‖b(x, u)‖1 if z = x± hei for some i = 1, . . . , n,

0 otherwise.

The measures µδx,u are defined as before. With these definitions, we again have∫Rn yµ

δx,u(dy) = b(x, u), but now

r(x, u, t) = ‖b(x, u)‖1n∑i=1

Vxixi(x, t)|bi(x, u)|+ 12Vtt(x, t).

Since v ≥ ‖b(x, u)‖1 for all x and u, one expects the rate with the new transitionprobabilities and interpolation times to often be better than that of the previoussetup. If v is much larger than “typical values” of ‖b(x, u)‖1, then the extra pro-gramming effort may indeed be worthwhile. However, if one has a bound such asa ≤ infx,u ‖b(x, u)‖1 ≤ supx,u ‖b(x, u)‖1 ≤ Ca, where C is not very large, then it isprobably not worthwhile.

4. An example with ergodic cost and a reflecting diffusion. In order todemonstrate the versatility of the approach, in this section we will consider a variationon the numerical approximation problem considered in section 2. More precisely, wetreat the analogous problem where the cost to be minimized is an ergodic cost, andwhere a reflecting diffusion replaces the model (2.1). In order to define the reflectingdiffusion model we must specify a reflection direction for each point of ∂D. Let n(x)denote the inward unit normal to ∂D at x ∈ ∂D. The reflection direction will bedenoted by a unit vector r(x). We will assume that 〈r(x), n(x)〉 > 0 for all x ∈ ∂D,and that r ∈ C∞(Rm). Since ∂D is smooth, we can assume that the function nis defined and smooth in an open neighborhood O of ∂D, and that 〈r(x), n(x)〉 isuniformly bounded below away from zero on O.

We next describe the reflected diffusion model. Since the theory of such equationsis not our focus here, the description will only be heuristic. A precise definition canbe found in [21] or [6]. The replacement for (2.1) takes the form

dxt = b(xt, ut) dt+ dwt + dzt,(4.1)

where b satisfies all the assumptions used in section 2. The process zt is a w.p.1bounded variation function of t that constrains xt to remain in D. It acts in thefollowing way. As long as xt ∈ D (recall that D is open), zt does not affect theprocess xt at all, which means that dzt = 0 for all such t. If xt ∈ ∂D, then zt can“push” the process so as to maintain xt ∈ D. The requirements on the “push” are:

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 15: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 733

• that it be in the direction r(xt),• that xt ∈ D for all t w.p.1.

These requirements are formalized by the equations

|z|t =∫ t

0I{xs∈∂D}d|z|s and zt =

∫ t

0r(xs)d|z|s,

where |z|t denotes the total variation of z on the interval (0, t]. Under the assumptionsmade above on b, r, and D, a solution to (4.1) exists and is unique. For precisestatements and more discussion, we refer the reader to [21, 6, 19].

The reflecting diffusion model described above is especially useful when the con-trolled process is considered on an infinite time horizon, since it allows the domainon which the process is defined to be bounded without actually stopping the processwhen it hits ∂D. In some problems, there is a cost proportional to the constrainingaction of the process zt. Because of this, we consider the minimal cost defined by

γ.= inf

u∈Ulim supT→∞

1T

Ex

[∫ T

0k(xt, ut) dt+

∫ T

0l(xt) d|z|t

],(4.2)

where l ∈ C∞(Rn) is bounded and uniformly Lipschitz continuous. Although a priorithe minimal cost might depend on the initial condition x, it turns out under ourassumptions that the cost is independent of x.

The appropriate HJB equation for this problem isγ = min

u∈U[LuV (x) + k(x, u)] in D,

0 = l(x) + 〈Vx(x), r(x)〉 on ∂D,(4.3)

where Lu is again defined by

Luf(x) .= 〈b(x, u), fx(x)〉+12

tr[fxx(x)].

The solution to this equation is the pair (γ, V (·)). Note that if (γ, V (·)) solves (4.3),then so does (γ, V (·) + c) for any c ∈ R. It turns out that this is exactly the form ofnonuniqueness associated with the solutions to (4.3); i.e., if (γ1, V1(·)) and (γ2, V2(·))both solve (4.3), then γ1 = γ2 and V1(·)− V2(·) is a constant. We will assume, as insection 2, that V ∈ C2,α(D) (see [22]).

A general reference for the Markov chain optimal control problems discussed inthis section is section 5 of Chapter 7 in [19]. Recall the definitionDh .= D∩hZn. Whilethe process xt is in D it is the same as the process of section 2. This suggests thatwe can continue to use the transition probabilities and interpolation intervals definedby (2.9) and (2.8), respectively. However, we must still define the approximations forthe boundary condition. Define the operator A by

Af(x) = 〈fx(x), r(x)〉

for f ∈ C1(Rn). Then the boundary condition can be written 0 = l(x) + AV (x) forx ∈ ∂D. Let ∂Dh

+ be a set that contains all points in (Rn\D) ∩ hZn that can bereached from some point in Dh in one step for some choice of the control, i.e., all ysuch that

ph(x, y|u) > 0 for some x ∈ Dh and u ∈ U.

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 16: Rates of Convergence for Approximation Schemes in Optimal Control

734 PAUL DUPUIS AND MATTHEW JAMES

We interpret ∂Dh+ as the “discrete reflecting boundary.” Although one can often

take ∂Dh+ to be exactly those points that can be reached in one step from Dh, the

formulation as given above, which allows a bigger set, is sometimes needed. We willassume that for all h sufficiently small, ∂Dh

+ ⊂ O, and remind the reader that O isan open set on which both n(·) and r(·) are defined.

We next consider the transition probabilities for x ∈ ∂Dh+. The role of these

transitions will be to “mimic” the behavior of the reflecting term zt. The constructionof the transition functions obviously depends on the shape of ∂D, Dh∪∂Dh

+, and thefunction r(·). For most problems the construction is straightforward and intuitive,since we are dealing here with only first-order boundary operators. Since it is notour goal to discuss methods for constructing these functions, we will simply assumethe existence of transition probabilities that satisfy the local consistency equations(4.4) and (4.5) below, and refer the reader to Chapters 5 and 8 of [19] for furtherinformation.

Let ph(x, y) be the transition function for points x ∈ ∂Dh+. (Note that we do not

include a control for such states. This is because in our setup the reflection directionis not controlled. An interesting example where the reflection direction is controlledappears in [20].) Let αh(x)r(x)+sh(x) denote the decomposition of the mean discretereflection mh(x) .=

∑y∈Dh∪∂Dh+

[y − x] ph(x, y) into the orthogonal projection ontothe subspace spanned by r(x) and its complement. Then the minimal type of “localconsistency” we require of the functions ph(x, y) is

infh>0,x∈∂Dh+

αh(x)/h > 0,

sh(x)/h→ 0 uniformly in x ∈ ∂Dh+,(4.4)

and

ch(x)/h .=∑

y∈Dh∪∂Dh+

[y − x−mh(x)

] [y − x−mh(x)

]′ph(x, y)/h→ 0(4.5)

uniformly in x ∈ ∂Dh+. This last equation is automatic if ph is only supported on

neighboring points. The essential consequence of these conditions is that sh(x)/αh(x)→0 and ch(x)/αh(x) → 0 uniformly in ∂Dh

+, the first of which shows that the compo-nent orthogonal to r(x) vanishes faster than the component in the direction r(x), andthe second of which shows that the quadratic variation around the mean vanishesfaster than αh(x). It is often the case that one can choose the probabilities so thatsh(x) ≡ 0. We must also assume that the “radius” of ∂Dh

+ tends to zero:

supx∈∂Dh+

infy∈Dh

‖x− y‖ → 0(4.6)

as h→ 0.Define the operator Ah by

Ahf(x) =∑

y∈Dh∪∂Dh+

[f(y)− f(x)]ph(x, y)αh(x)

for points x ∈ ∂Dh+. Then the conditions given above imply for all f ∈ C1(Rn) that∣∣Af(x)−Ahf(x)

∣∣ = o(1)

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 17: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 735

D

x1

x2

FIG. 4.1. Boundary portion ∂Dh+.

uniformly in x ∈ ∂Dh+, which is sufficient for convergence. However, in order to specify

a rate of convergence, we need to be more precise in describing how fast sh(x) andch(x) tend to zero. We will assume that

sh(x) = O(h2), ch(x) = O(h2)(4.7)

uniformly in x ∈ ∂Dh+. (Note that if we want to identify coefficients, then more is

needed; i.e., we need expansions of the form

sh(x) = h2s(x) + o(h2), ch(x) = h2c(x) + o(h2)

for some continuous functions s(x) and c(x).)Example 4.1. We consider the case n = 2, D = {x : ‖x‖ ≤ 1}, and r(x) = n(x).

Suppose h = 1/k, where k is an integer. In this case we can take the set ∂Dh+ to be

as in Figure 4.1. The definition is

ph(x, y) =

x∓1|x1|+ |x2|

if y = x± h(1, 0),

x∓2|x1|+ |x2|

if y = x± h(0, 1),

0 otherwise.

For this example, we have

mh(x) = hr(x)‖x‖2/‖x‖1, αh(x) = h‖x‖2/‖x‖1, sh(x) = 0,

and

ch(x) = h2x22|x1|+ x2

1|x2|‖x‖31

(1 11 1

).

The transition probabilities at the points of ∂Dh+ will play the role of the con-

straining process z in (4.1); i.e., if the process attempts to leave Dh then it is returned

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 18: Rates of Convergence for Approximation Schemes in Optimal Control

736 PAUL DUPUIS AND MATTHEW JAMES

instantly by a “push” in the appropriate direction [19]. Because of the instantaneousnature of the push, the interpolation interval that is correct for the points in ∂Dh

+ is∆th(x) = 0.

The discrete replacement for (4.3) is γh = minu∈U

[LhuV

h(x) + k(x, u)]

in Dh,

0 = l(x) +AhV h(x) on ∂Dh+,

(4.8)

where Lhu is as in section 2.For an admissible control {uhk , k = 0, 1, . . .}, let {ξhk , k = 0, 1, . . .} be the corre-

sponding controlled Markov chain; i.e.,

Phx

(ξhk+1 = z | ξhl , ul, l = 0, . . . , k

)= ph(ξhk , z|uhk) if x ∈ Dh,

and

Phx

(ξhk+1 = z | ξhl , ul, l = 0, . . . , k

)= ph(ξhk , z) when x ∈ ∂Dh

+.

Define T hN =∑N−1i=0 ∆th(ξhi ). Note that when ξhi ∈ ∂Dh

+ the corresponding summandin ThN is zero. Equation (4.8) is the HJB equation for the Markov chain stochasticoptimal control problem whose transition probabilities are those given above and forwhich the cost to be minimized is

γh = lim supN→∞

Ehx

[(N−1∑i=0

k(ξhi , uhi )∆th(ξhi ) +

N−1∑i=0

I{ξhi ∈∂Dh+}l(ξhi )αh(ξhi )

)/ThN

]

(cf. [19, Chapter 7]). One can easily check that the chain is ergodic for any timeindependent feedback control. Because of this, the limit superior is actually a limit,and the limiting value is independent of x. The equation (4.8) exhibits the sametype of nonuniqueness as the original HJB equation (4.2), namely, if (γh1 , V

h1 (·)) and

(γh2 , Vh2 (·)) both solve (4.8), then γh1 = γh2 and V h1 (·)− V h2 (·) is a constant.

Define the boundary error

gh(x) .= AV (x)−AhV (x).

Thanks to our assumptions on V and (4.7), we have

|gh(x)| = O(h).

We can then rewrite (4.3) in a form analogous to that of (4.8):{γ = minu∈U

[LhuV (x) + k(x, u) + eh(x, u)

]in Dh,

0 = l(x) +AhV (x) + gh(x) on ∂Dh+,

(4.9)

where eh is defined as in section 2 by

eh(x, u) .= LuV (x)− LhuV (x).

(Recall that |eh(x, u)| = O(hα).) Thus γ has a representation as the minimal cost forthe Markov chain optimal control problem whose transition probabilities are the same

Dow

nloa

ded

07/0

7/14

to 1

55.9

7.17

8.73

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 19: Rates of Convergence for Approximation Schemes in Optimal Control

RATES OF CONVERGENCE FOR APPROXIMATIONS IN CONTROL 737

as those for γh and for which the cost to be minimized is

γ = lim supN→∞

Ehx

[(N−1∑i=0

[k(ξhi , u

hi ) + eh(ξhi , u

hi )]

∆th(ξhi )

+N−1∑i=0

I{ξhi ∈∂Dh+}[l(ξhi ) + gh(ξhi )

]αh(ξhi )

)/ThN

].

A comparison of these two representations allows us to prove the following rateof convergence.

THEOREM 4.2. Assume (4.7) and all the smoothness conditions assumed of b, k,∂D, etc., in this section and section 2. Given that V ∈ C2,α(D), we have

supx∈Dh

|V h(x)− V (x)| = O(hα)(4.10)

as h ↓ 0.Proof. We can use the same proof as that of Theorem 2.1 as soon as we show

that

lim supN→∞

Ehx

[(N−1∑i=0

eh(ξhi , uhi )∆th(ξhi ) +

N−1∑i=0

I{ξhi ∈∂Dh+}gh(ξhi )αh(ξhi )

)/ThN

]= O(hα)

(4.11)

uniformly in all admissible controls and x ∈ Dh.The main difficulty in proving such a bound is in dealing with the second term

in the sum. Let θ(x) (a C2 function from Rn → R) and η > 0 be such that for allsufficiently small h > 0

infx∈∂Dh+

⟨αh(x)r(x) + sh(x)

αh(x), θx(x)

⟩≥ η.(4.12)

Then for any k = 0, 1, 2, . . .,

θ(ξhk )− θ(ξh0 ) =k−1∑i=0

[θ(ξhi+1)− θ(ξhi )

]=k−1∑i=0

〈ξhi+1 − ξhi , θx(ξhi )〉+12

k−1∑i=0

(ξhi+1 − ξhi

)′θxx(ξhi )

(ξhi+1 − ξhi

),

where ξhi is an appropriately selected point between ξhi and ξhi+1. We rewrite this lastequation as

k−1∑i=0

I{ξhi ∈∂Dh+}〈ξhi+1 − ξhi , θx(ξhi )〉+

12

k−1∑i=0

I{ξhi ∈∂Dh+}(ξhi+1 − ξhi

)′θxx(ξhi )

(ξhi+1 − ξhi

)= θ(ξhk )− θ(ξh0 )

−k−1∑i=0

I{ξhi ∈Dh}〈ξhi+1 − ξhi , θx(ξhi )〉 − 1

2

k−1∑i=0

I{ξhi ∈Dh}(ξhi+1 − ξhi

)′θxx(ξhi )

(ξhi+1 − ξhi

).D

ownl

oade

d 07

/07/

14 to

155

.97.

178.

73. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 20: Rates of Convergence for Approximation Schemes in Optimal Control

738 PAUL DUPUIS AND MATTHEW JAMES

By using (4.12), the fact that ch(x) = o(αh(x)) uniformly in x, and equations (2.11)and (2.12), we obtain

η

2Ehx

k−1∑i=0

I{ξhi ∈∂Dh+}αh(ξhi ) ≤ 2‖θ‖∞ +KEh

xThk ,

for all sufficiently small h > 0, where K <∞ is independent of both h and k.For T ∈ [0,∞), define the stopping time MT = min{k : Thk ≥ T}. Note that

T hMT/T → 1 uniformly. It follows from the last display and the fact that MT is a

stopping time that

η

2Ehx

MT−1∑i=0

I{ξhi ∈∂Dh+}αh(ξhi ) ≤ 2‖θ‖∞ +KEh

xThMT

.(4.13)

We can now bound (4.11). According to equation (2.5) in section 2 |eh(ξhi , uhi )| =

O(hα). Thus

lim supN→∞

Ehx

[(N−1∑i=0

eh(ξhi , uhi )∆th(ξhi )

)/ThN

]= O(hα).

On the other hand, we recall that∣∣gh(x)

∣∣ =∣∣AV (x)−AhV (x)

∣∣ = O(h). By combiningthis with (4.13), we obtain

lim supN→∞

Ehx

[(N−1∑i=0

I{ξhi ∈∂Dh+}gh(ξhi )αh(ξhi )

)/ThN

]

= lim supT→∞

Ehx

[(MT−1∑i=0

I{ξhi ∈∂Dh+}gh(ξhi )αh(ξhi )

)/ThMT

]= O(h),

which proves (4.11).An examination of the proof just given shows that the errors in the approximations

to the boundary condition are of smaller order than the approximations on the interior.As in other sections, with added regularity one can identify the coefficient of the

rate of convergence. In this problem one finds two terms in the rate. One term is afunctional of the xs process and represents errors due to approximation on D, whilethe other is a functional of the boundary local time process zs and represents errorsdue to the approximation of the boundary condition.

5. Comments and extensions. In this final section we make some generalcomments and discuss some extensions of our methodology.

5.1. General method. The general method we have employed can be summa-rized with the following heuristics.

Consider the problem of approximating the solution V to the equation

A(V ) + k = 0(5.1)

by an approximation V h given by

Ah(V h) + k = 0.(5.2)


The key to our method lies in the use of appropriate representations of solutions to equations of the type (5.2). Let us suppose that
\[
V^h = R^h(k),
\tag{5.3}
\]
for some representation operator R^h. The operators A, A^h, and R^h are in general nonlinear. We have assumed that they are obtainable from linear operators via min, max, min-max, or max-min operations. Let us write
\[
e^h = A(V) - A^h(V).
\]
Then equation (5.1) can be rewritten as
\[
A^h(V) + [e^h + k] = 0,
\]
and consequently V has a representation determined by the method of approximation:
\[
V = R^h(e^h + k).
\tag{5.4}
\]
To compare V with V^h, we formally use the fact that R^h is obtained from a linear operator by one or more minimization or maximization operations. This allows us to write
\[
V = R^h(e^h + k) = R^h(k) + O(|e^h|).
\tag{5.5}
\]
Thus if |e^h| = O(h^α), depending on the smoothness of V, this yields the rate of convergence estimate
\[
V^h - V = O(h^\alpha).
\tag{5.6}
\]
More detailed information is available under stronger assumptions. Suppose that e^h = h^α φ + O(h^{α+δ}) for some δ > 0, and that (5.5) can be improved to
\[
V = R^h(e^h + k) = R^h(k) + R^h_1(e^h).
\tag{5.7}
\]
Then we have the explicit limit
\[
\lim_{h\to 0} \frac{V^h - V}{h^\alpha} = R_1(\phi).
\tag{5.8}
\]
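To make the heuristic concrete, here is a minimal linear sketch of (5.1)–(5.6); every concrete choice below is ours for illustration. Take A(V) = V″ on (0,1) with zero boundary data, let A^h be the standard second difference, let R^h be the corresponding linear solve, and choose k so that the exact solution is V(x) = sin(πx):

```python
import numpy as np

# Minimal linear instance of the heuristic (5.1)-(5.6); all concrete choices
# are ours for illustration.  A(V) = V'' on (0,1) with zero boundary data and
# k(x) = pi^2 sin(pi x), so the exact solution of A(V) + k = 0 is
# V(x) = sin(pi x).  A^h is the standard second difference and
# R^h(k) = solution of the tridiagonal system A^h V^h + k = 0.

for n in (20, 40, 80):
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]          # interior grid points
    k = np.pi**2 * np.sin(np.pi * x)

    # A^h as a matrix acting on interior values (zero Dirichlet data)
    Ah = (np.diag(-2.0 * np.ones(n - 1))
          + np.diag(np.ones(n - 2), 1)
          + np.diag(np.ones(n - 2), -1)) / h**2

    Vh = np.linalg.solve(Ah, -k)                    # V^h = R^h(k)

    V = np.sin(np.pi * x)                           # exact solution
    eh = -np.pi**2 * V - Ah @ V                     # e^h = A(V) - A^h(V)

    print(f"h = {h:.4f}:  sup|V^h - V| = {np.abs(Vh - V).max():.2e},"
          f"  sup|e^h| = {np.abs(eh).max():.2e}")
```

Both printed columns shrink by a factor of roughly four when h is halved, and the solution error is of the same order as the truncation error e^h, as (5.5)–(5.6) predict.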

5.2. Partial differential equations. Our approach is applicable to PDEs which need not have any a priori connection to control theory. The simplest instance is that of linear equations. For example, consider a linear uniformly elliptic PDE with smooth coefficients, boundary, and boundary data. Such a boundary value problem has a smooth solution, of sufficient regularity to apply our theory and obtain rate of convergence estimates for a variety of approximation methods. The representation for linear equations and their approximations is quite simple, in that no minimizations or maximizations are required (cf. Feynman–Kac formulas).
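In the linear case the representation can be sketched directly. As a minimal illustration, with all specifics ours: the problem (1/2)V″ + k = 0 on (0,1) with zero boundary data has the Feynman–Kac representation V(x) = E_x ∫₀^τ k(w_t) dt, where τ is the exit time of Brownian motion w from (0,1); with k ≡ 1 the exact solution x(1 − x) is available for comparison, and a symmetric random walk with space step h and time step h² approximates the expectation:

```python
import numpy as np

# Feynman-Kac sketch for the simplest linear case; every specific here is an
# illustrative choice.  The problem (1/2) V'' + k = 0 on (0,1) with V = 0 at
# the boundary has the representation V(x) = E_x [ integral_0^tau k(w_t) dt ],
# tau the exit time of Brownian motion w from (0,1).  With k = 1 the exact
# solution is V(x) = x (1 - x).  We replace w by a symmetric random walk with
# space step h and matching time step h^2.

rng = np.random.default_rng(1)
h, x0, n_paths = 0.05, 0.3, 10000

total = 0.0
for _ in range(n_paths):
    x, t = x0, 0.0
    while 0.0 < x < 1.0:
        x += h * rng.choice((-1.0, 1.0))     # walk step of size h
        t += h * h                           # time increment per step
    total += t                               # integral of k = 1 up to exit

print(total / n_paths, "vs exact", x0 * (1.0 - x0))   # both near 0.21
```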

A second instance of interest is the case of quasi-linear or even fully nonlinear uniformly elliptic/parabolic PDEs. Smooth classical solutions are often available; see [16]. To apply our approach, a representation is needed, and indeed this can be obtained in a great many cases using control or game theory.


To illustrate, let us consider an example similar to the problem of section 2. We wish only to communicate the general idea, and omit technical details. Suppose that the fully nonlinear equation
\[
\begin{cases}
\lambda V(x) = F(V_{xx}(x)) + k(x) & \text{in } D,\\
V(x) = 0 & \text{on } \partial D,
\end{cases}
\tag{5.9}
\]
has a unique classical solution V ∈ C^{2,α}(D), where F is a smooth nonlinear function with bounded gradient satisfying
(i) ξ′ F_X(X) ξ ≥ c|ξ|², c > 0, and
(ii) lim_{|X|→∞} |F(X)|/|X| = 0.
We have not assumed that F is convex, nor any other specific form. Following [9] (see also [10]), F admits a max-min representation:

\[
F(X) = \max_{v\in\mathbf{R}^{n^2}} \min_{u\in\mathbf{R}^{n^2}} \left[ \sum_{i,j=1}^n \left( \int_0^1 \frac{\partial F}{\partial X_{ij}}\big((1-r)v + ru\big)\, dr \right) (X_{ij} - v_{ij}) + F(v) \right].
\tag{5.10}
\]
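A quick numerical check of (5.10) in the scalar case n = 1 is instructive; the function F below is our illustrative choice, smooth and nonconvex with derivative bounded and bounded away from zero, as in the ellipticity condition (i). For u ≠ v the integral in (5.10) collapses to the difference quotient a(u, v) = (F(u) − F(v))/(u − v), and the max-min over a grid of (u, v) pairs reproduces F(X):

```python
import numpy as np

# Scalar (n = 1) numerical check of the max-min representation (5.10); the
# function F is our illustrative choice: smooth, nonconvex, with derivative
# bounded and bounded away from zero.  For u != v the integral in (5.10)
# collapses to the difference quotient
#   a(u, v) = (F(u) - F(v)) / (u - v),
# and the claim is F(X) = max_v min_u [ a(u, v) (X - v) + F(v) ].

def F(X):
    return X + 0.3 * np.sin(X)               # F'(X) = 1 + 0.3 cos(X) >= 0.7

grid = np.linspace(-8.0, 8.0, 801)           # discretize u and v
u, v = np.meshgrid(grid, grid)               # u varies along axis 1

# difference quotient, with F'(v) substituted on the diagonal u == v
diag = np.isclose(u, v)
a = np.where(diag, 1.0 + 0.3 * np.cos(v),
             (F(u) - F(v)) / np.where(diag, 1.0, u - v))

for X in (-2.0, 0.5, 3.0):                   # grid points, for exactness
    vals = a * (X - v) + F(v)
    maxmin = np.min(vals, axis=1).max()      # min over u, then max over v
    print(X, maxmin, F(X))                   # maxmin agrees with F(X)
```

The agreement is exact on the grid: taking v = X gives the value F(X) for every u, while taking u = X shows that no v can do better.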

In view of this, let us write

\[
F(X) + k(x) = \max_{v\in\mathbf{R}^{n^2}} \min_{u\in\mathbf{R}^{n^2}} \left[ \sum_{i,j=1}^n a_{ij}(u,v)\, X_{ij} + k(x,u,v) \right],
\]
where the matrix a_{ij}(u,v) is defined from (5.10) and k(x,u,v) = k(x) − Σ_{i,j=1}^n a_{ij}(u,v) v_{ij} + F(v). Suppose that we can write a(u,v) = (1/2) σ(u,v) σ(u,v)′ for some Lipschitz matrix function σ.

The desired game theoretic representation for V is

\[
V(x) = \inf_{u_\cdot} \sup_{v_\cdot} E_x\left[\int_0^\tau e^{-\lambda t}\, k(x_t, u_t, v_t)\, dt\right],
\tag{5.11}
\]
where
\[
dx_t = \sigma(u_t, v_t)\, dw_t.
\]

For precise information concerning games and their strategies, see [15].

A finite difference approximation V^h can be constructed, along the same lines as in section 2, which will have a game representation. Note that the various quantities will depend on the additional control variable v. Then a straightforward modification of the proof of Theorem 2.1 yields the rate of convergence estimate
\[
\sup_{x \in D_h} |V^h(x) - V(x)| = O(h^\alpha)
\]
as h ↓ 0.

REFERENCES

[1] G. BARLES AND P. E. SOUGANIDIS, Convergence of approximation schemes for fully nonlinear second order equations, Asymptotic Anal., 4 (1991), pp. 271–283.


[2] D. BERTSEKAS AND S. SHREVE, Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, 1978.

[3] I. CAPUZZO DOLCETTA AND M. FALCONE, Discrete dynamic programming and viscosity solutions of the Bellman equation, Ann. Inst. H. Poincaré Anal. Non Linéaire, 6 (1989), pp. 161–184.

[4] I. CAPUZZO DOLCETTA AND H. ISHII, Approximate solution of the Hamilton-Jacobi equation of deterministic control theory, Appl. Math. Optim., 11 (1984), pp. 161–181.

[5] M. G. CRANDALL AND P. L. LIONS, Two approximations of solutions of Hamilton-Jacobi equations, Math. Comp., 43 (1984), pp. 1–19.

[6] P. DUPUIS AND H. ISHII, SDEs with oblique reflection on nonsmooth domains, Ann. Probab., 21 (1993), pp. 554–580.

[7] P. DUPUIS AND H. J. KUSHNER, Stochastic approximation and large deviations: Upper bounds and w.p.1 convergence, SIAM J. Control Optim., 27 (1989), pp. 1108–1135.

[8] P. DUPUIS, R. S. ELLIS, AND A. WEISS, Large deviations for Markov processes with discontinuous statistics I: General upper bounds, Ann. Probab., 19 (1991), pp. 1280–1297.

[9] L. C. EVANS, On solving certain nonlinear partial differential equations by accretive operator methods, Israel J. Math., 36 (1980), pp. 225–247.

[10] W. H. FLEMING, The Cauchy problem for degenerate parabolic equations, J. Math. Mech., 13 (1964), pp. 987–1008.

[11] W. H. FLEMING, The Cauchy problem for a nonlinear first order partial differential equation, J. Differential Equations, 5 (1969), pp. 515–530.

[12] W. H. FLEMING, Stochastic control for small noise intensities, SIAM J. Control Optim., 9 (1971), pp. 473–517.

[13] W. H. FLEMING AND M. R. JAMES, Asymptotic series and exit time probabilities, Ann. Probab., 20 (1992), pp. 1369–1384.

[14] W. H. FLEMING AND H. M. SONER, Controlled Markov Processes and Viscosity Solutions, Springer-Verlag, New York, 1993.

[15] W. H. FLEMING AND P. E. SOUGANIDIS, On the existence of value functions of two-player zero sum stochastic differential games, Indiana Univ. Math. J., 38 (1989), pp. 293–314.

[16] D. GILBARG AND N. S. TRUDINGER, Elliptic Partial Differential Equations of Second Order, 2nd ed., Springer-Verlag, New York, 1983.

[17] R. GONZALEZ AND E. ROFMAN, On deterministic control problems: An approximation procedure for the optimal cost, parts I and II, SIAM J. Control Optim., 23 (1985), pp. 242–285.

[18] H. J. KUSHNER, Probability Methods for Approximations in Stochastic Control and for Elliptic Equations, Academic Press, New York, 1977.

[19] H. J. KUSHNER AND P. G. DUPUIS, Numerical Methods for Stochastic Control Problems in Continuous Time, Springer-Verlag, New York, 1992.

[20] H. J. KUSHNER AND J. YANG, A numerical method for controlled routing in large trunk line networks via stochastic control theory, ORSA J. Comput., to appear.

[21] P. L. LIONS AND A. S. SZNITMAN, Stochastic differential equations with reflecting boundary conditions, Comm. Pure Appl. Math., 37 (1984), pp. 511–553.

[22] P. L. LIONS AND N. S. TRUDINGER, Linear oblique derivative boundary problems for the uniformly elliptic Hamilton-Jacobi-Bellman equation, Math. Z., 191 (1986), pp. 1–15.

[23] J. MENALDI, Some estimates for finite difference approximations, SIAM J. Control Optim., 27 (1989), pp. 579–607.

[24] R. K. MILLER AND A. N. MICHEL, Ordinary Differential Equations, Academic Press, New York, 1982.

[25] M. L. PUTERMAN, Markov decision processes, in Stochastic Models, Vol. 2, D. P. Heyman and M. J. Sobel, eds., North-Holland, Amsterdam, 1991.

[26] P. E. SOUGANIDIS, Approximation schemes for viscosity solutions of Hamilton-Jacobi equations, J. Differential Equations, 59 (1985), pp. 1–43.
