
An Introduction to Stochastic Optimal Control

Herry P. Suryawan

Dept. of Mathematics, Sanata Dharma University, Yogyakarta

June 3, 2015


Outline

Motivation

Stochastic Optimal Control

Some Examples

Outlook: Beyond Brownian Motion


Motivation


Tracking a Diffusion Particle Under a Microscope

Situation: we study some kind of particles in detail by zooming in on one of the particles, i.e. we increase the magnification of the microscope until one particle fills a large part of the field of view.

Problem: the random movement of the particle causes it to rapidly leave our field of view. So, we have to keep moving the cover slide around in order to track the motion of the particle.

How to do it: we attach an electric motor to the microscope slide which allows us to move the slide around.

Model: let $z_t$ be the position of the slide relative to the focus of the microscope. Then we can write

$$\frac{dz_t}{dt} = \beta u_t,$$

where $u_t$ is the voltage applied to the motor and $\beta > 0$ is a gain constant. The position of the particle relative to the slide is modelled by a Brownian motion $x_t$. Hence, the position of the particle relative to the microscope focus is $x_t + z_t$.

Goal: to control the slide position to keep the particle in focus, i.e. to choose $u_t$ so that $x_t + z_t$ stays close to zero.
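To make the model concrete, here is a minimal simulation sketch (not from the slides) of the error process $e_t = x_t + z_t$ under a hypothetical proportional feedback $u_t = -k\,e_t$; the gain $k$, the step size, and all numerical values are illustrative assumptions only.

```python
import numpy as np

# Minimal Euler sketch of the tracking model dz/dt = beta*u with x_t a Brownian motion.
# The feedback u_t = -k * (x_t + z_t) is a hypothetical choice, not the optimal law.
rng = np.random.default_rng(0)
beta, sigma, k = 1.0, 1.0, 5.0      # gain, diffusion constant, assumed feedback gain
T, n = 1.0, 1000
dt = T / n

x = z = 0.0
errors = []
for _ in range(n):
    e = x + z                        # particle position relative to the focus
    u = -k * e                       # proportional feedback (assumption)
    z += beta * u * dt               # slide dynamics dz/dt = beta * u
    x += sigma * np.sqrt(dt) * rng.standard_normal()  # Brownian particle motion
    errors.append(x + z)

print("time-averaged squared error:", np.mean(np.square(errors)))
```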


Figure: tracking a Brownian particle


Formalization of the Problem

Cost Functional:

$$J_T[u] = p\,\mathbb{E}\left(\frac{1}{T}\int_0^{T}(x_t + z_t)^{2}\,dt\right) + q\,\mathbb{E}\left(\frac{1}{T}\int_0^{T} u_t^{2}\,dt\right),$$

where $p$ and $q$ are positive constants.
First term: the time-average of the mean square distance of the particle from the focus of the microscope over some time interval $[0,T]$.
Second term: the average power in the control signal.

Aim: to minimize the first and the second term.

Stochastic Optimal Control Problem: to find the feedback strategy $u_t$ which minimizes the cost functional $J_T[u]$.

Variation: problem on the infinite time horizon $[0,\infty)$:

$$J_\infty[u] = p\,\limsup_{T\to\infty}\,\mathbb{E}\left(\frac{1}{T}\int_0^{T}(x_t + z_t)^{2}\,dt\right) + q\,\limsup_{T\to\infty}\,\mathbb{E}\left(\frac{1}{T}\int_0^{T} u_t^{2}\,dt\right).$$
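As a sanity check, the finite-horizon cost $J_T[u]$ of any given feedback law can be estimated by Monte Carlo over simulated paths. The sketch below (an illustration, not part of the slides) reuses the hypothetical proportional feedback from above; the weights, path count, and step size are assumptions.

```python
import numpy as np

def estimate_cost(k, p=1.0, q=1.0, beta=1.0, sigma=1.0, T=1.0,
                  n_steps=500, n_paths=2000, seed=0):
    """Monte Carlo estimate of J_T[u] for the feedback u_t = -k*(x_t + z_t)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.zeros(n_paths)
    z = np.zeros(n_paths)
    tracking_cost = np.zeros(n_paths)
    control_cost = np.zeros(n_paths)
    for _ in range(n_steps):
        e = x + z
        u = -k * e                              # hypothetical feedback law
        tracking_cost += e**2 * dt
        control_cost += u**2 * dt
        z += beta * u * dt
        x += sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return p * tracking_cost.mean() / T + q * control_cost.mean() / T

# Compare two assumed gains; the truly optimal feedback comes from the HJB solution later.
print(estimate_cost(k=1.0), estimate_cost(k=5.0))
```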


Stochastic Optimal Control


A Brief History

Impetus for optimal control: the calculus of variations (in 1662 Pierre de Fermat used the method of calculus to minimize the passage time of a light ray through two optical media) and the brachistochrone problem (posed in 1696 by Johann Bernoulli).

Early work on optimal control theory: in the 1800s, e.g. Hamilton, Hurwitz, Maxwell, Poincaré, Lyapunov, Wiener, Kolmogorov.

Modern optimal control theory: from the end of World War II, e.g. Bellman, LaSalle, Blackwell, Fleming, Berkovitz, Pontryagin, Kalman. Key contributions: Bellman's dynamic programming method, Pontryagin's maximum principle, Kalman's LQ theory.

Stochastic optimal control:
Bellman in 1952 mentioned stochastic control in one of his earliest papers, but no Ito-type SDE was involved!
Florentin in 1961 derived a PDE associated with a controlled Markov process by using Bellman's dynamic programming.
Kushner in 1962 studied an optimal control problem using an Ito-type SDE as the state equation.
Merton in the late 1960s: first application of stochastic optimal control theory to financial mathematics.


Stochastic Optimal Control

Stochastic Optimal Control Problem (SOCP): a completely observed control problem with a state equation of the Ito type (diffusion model) and with a cost functional of the Bolza type.

The basic source of uncertainty in diffusion models is white noise (the time derivative of Brownian motion), which represents the joint effects of a large number of independent random forces acting on the system.

Two main approaches:
1. Pontryagin's maximum principle: a Hamiltonian system consisting of the adjoint equation (ODE/SDE), the original state equation, and the maximum conditions.
2. Bellman's dynamic programming: solving the HJB equation, which is a PDE of first order in the deterministic case and of second order in the stochastic case.

References:
1. W. H. Fleming and H. M. Soner. 2006. Controlled Markov Processes and Viscosity Solutions. Springer.
2. J. Yong and X. Y. Zhou. 2009. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer.
3. M. Nisio. 2015. Stochastic Control Theory: Dynamic Programming Principle. Springer.


Basics of SOCP

Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$ be a filtered probability space on which an $\mathcal{F}_t$-adapted Brownian motion $B = (B_t)_{t\ge 0}$ is defined. The state of the controlled system in an SOCP is described by an SDE

$$dX_t^{u} = b(t, X_t^{u}, u_t)\,dt + \sigma(t, X_t^{u}, u_t)\,dB_t, \qquad X_0 = x.$$

Here $b : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n}$ and $\sigma : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n\times m}$, where $U$ is the control set (the set of values that the control input can take). Often: $U = \mathbb{R}^{q}$.

Definition

The control strategy $u = (u_t)_{t\ge 0}$ is called an admissible strategy if

1. $u_t$ is an $\mathcal{F}_t$-adapted stochastic process,
2. $u_t(\omega) \in U$ for every $(\omega, t) \in \Omega\times[0,\infty)$,
3. the equation for $X_t^{u}$ has a unique strong solution.

Moreover, an admissible strategy $u$ is called a Markov strategy if it is of the form $u_t = \alpha(t, X_t^{u})$ for some function $\alpha : [0,\infty)\times\mathbb{R}^{n} \to U$.
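For intuition, a Markov strategy plugs a feedback map $\alpha(t,x)$ into the state SDE, which can then be simulated like any other Ito SDE. The Euler-Maruyama sketch below is an illustration with assumed scalar coefficients $b$, $\sigma$, and $\alpha$ (none of them come from the slides).

```python
import numpy as np

def simulate_controlled_sde(b, sigma, alpha, x0, T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama path of dX = b(t, X, u) dt + sigma(t, X, u) dB with u = alpha(t, X)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = x0
    path = [x0]
    for i in range(n_steps):
        t = i * dt
        u = alpha(t, x)                       # Markov strategy: feedback on the current state
        dB = np.sqrt(dt) * rng.standard_normal()
        x = x + b(t, x, u) * dt + sigma(t, x, u) * dB
        path.append(x)
    return np.array(path)

# Example with made-up scalar coefficients (assumptions, not from the slides):
path = simulate_controlled_sde(
    b=lambda t, x, u: u,                 # controlled drift
    sigma=lambda t, x, u: 0.3,           # constant noise intensity
    alpha=lambda t, x: -2.0 * x,         # stabilizing feedback map
    x0=1.0,
)
print(path[-1])
```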


Three common types of cost functionals:

1. SOCP on the finite time horizon $[0,T]$:
$$J[u] = \mathbb{E}\left(\int_0^{T} w(s, X_s^{u}, u_s)\,ds + z(X_T^{u})\right),$$
where $w : [0,T]\times\mathbb{R}^{n}\times U \to \mathbb{R}$ (the running cost) and $z : \mathbb{R}^{n} \to \mathbb{R}$ (the terminal cost) are measurable functions and $T < \infty$ is the terminal time.

2. SOCP on an indefinite time horizon:
$$J[u] = \mathbb{E}\left(\int_0^{\tau^{u}} w(X_s^{u}, u_s)\,ds + z(X_{\tau^{u}}^{u})\right),$$
where $S \subset \mathbb{R}^{n}$, $w : S\times U \to \mathbb{R}$, $z : \partial S \to \mathbb{R}$, and the stopping time $\tau^{u} = \inf\{t > 0 : X_t^{u} \notin S\}$.

3. SOCP on an infinite time horizon:
$$J[u] = \limsup_{T\to\infty}\,\mathbb{E}\left(\frac{1}{T}\int_0^{T} w(X_s^{u}, u_s)\,ds\right),$$
where $w : \mathbb{R}^{n}\times U \to \mathbb{R}$ is measurable.


Bellman's dynamic programming method

Given the state of the controlled system

$$dX_t^{u} = b(t, X_t^{u}, u_t)\,dt + \sigma(t, X_t^{u}, u_t)\,dB_t, \qquad X_0 = x$$

with a cost functional J[u].

The goal is to find a Markov strategy $u^{*}$ that minimizes the cost, i.e.

$$J[u^{*}] = \min\{J[u] : u \text{ is an admissible Markov strategy}\}.$$

Define the generator $\mathcal{L}_t^{\alpha}$, $\alpha \in U$, as

$$\mathcal{L}_t^{\alpha} g(x) = \sum_{i=1}^{n} b^{i}(t,x,\alpha)\,\frac{\partial g}{\partial x^{i}}(x) + \frac{1}{2}\sum_{i,j=1}^{n}\sum_{k=1}^{m} \sigma^{ik}(t,x,\alpha)\,\sigma^{jk}(t,x,\alpha)\,\frac{\partial^{2} g}{\partial x^{i}\partial x^{j}}(x)$$

and the cost-to-go function $J_t^{u}$ of the Markov strategy $u$ as

$$J_t^{u}(X_t^{u}) = \mathbb{E}\!\left(\int_t^{T} w(s, X_s^{u}, u_s)\,ds + z(X_T^{u}) \,\Big|\, \mathcal{F}_t\right).$$
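For orientation (this specialization is spelled out here for convenience and is not on the original slide), applying the definition to the one-dimensional tracking example used later, with state equation $de_t = \beta\alpha\,dt + \sigma\,dB_t$ (so $n = m = 1$, $b = \beta\alpha$, $\sigma^{11} = \sigma$), gives the operator that will appear inside its HJB equation:

$$\mathcal{L}_t^{\alpha} g(x) = \beta\alpha\,\frac{\partial g}{\partial x}(x) + \frac{\sigma^{2}}{2}\,\frac{\partial^{2} g}{\partial x^{2}}(x).$$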


Theorem

Suppose there is a $V_t(x)$, which is $C^{1}$ in $t$ and $C^{2}$ in $x$, such that

$$\frac{\partial V_t(x)}{\partial t} + \min_{\alpha\in U}\left\{\mathcal{L}_t^{\alpha} V_t(x) + w(t,x,\alpha)\right\} = 0, \qquad V_T(x) = z(x),$$

and $|\mathbb{E}(V_0(X_0))| < \infty$, and choose a minimizer

$$\alpha^{*}(t,x) \in \operatorname*{arg\,min}_{\alpha\in U}\left\{\mathcal{L}_t^{\alpha} V_t(x) + w(t,x,\alpha)\right\}.$$

Denote by $\mathcal{K}$ the class of admissible strategies $u$ such that

$$\sum_{i=1}^{n}\sum_{k=1}^{m}\int_0^{t} \frac{\partial V_s}{\partial x^{i}}(X_s^{u})\,\sigma^{ik}(s, X_s^{u}, u_s)\,dB_s^{k}$$

is a martingale, and suppose that the control $u^{*}(t) = \alpha^{*}(t, X_t^{u^{*}})$ defines an admissible Markov strategy which is in $\mathcal{K}$. Then $J[u^{*}] \le J[u]$ for any $u \in \mathcal{K}$, and $V_t(x) = J_t^{u^{*}}(x)$ is the value function for the control problem.
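The proof idea behind this verification theorem (only a sketch, not spelled out on the slide) is to apply Ito's formula to $V_t(X_t^{u})$ and use the HJB equation, which gives $\frac{\partial V_t}{\partial t} + \mathcal{L}_t^{u_t}V_t \ge -w(t,\cdot,u_t)$ for any admissible $u$:

$$z(X_T^{u}) = V_T(X_T^{u}) = V_0(X_0) + \int_0^{T}\left(\frac{\partial V_t}{\partial t} + \mathcal{L}_t^{u_t}V_t\right)(X_t^{u})\,dt + M_T \;\ge\; V_0(X_0) - \int_0^{T} w(t, X_t^{u}, u_t)\,dt + M_T,$$

where $M_T$ is the stochastic-integral term, a martingale for $u \in \mathcal{K}$. Taking expectations gives $J[u] \ge \mathbb{E}(V_0(X_0))$ for every $u \in \mathcal{K}$, with equality for $u^{*}$ because $\alpha^{*}$ attains the minimum in the HJB equation.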


Some Examples


Example 1: Tracking a Particle

System:

$$\frac{dz_t}{dt} = \beta u_t, \qquad x_t = x_0 + \sigma B_t,$$

where
$z_t$ is the position of the slide relative to the focus of the microscope,
$x_t$ is the position of the particle we wish to view under the microscope, relative to the center of the slide,
$\beta \in \mathbb{R}$ is the gain constant,
$\sigma > 0$ is the diffusion constant of the particle.

Aim: to keep the particle in focus (keep $x_t + z_t$ as close to zero as possible) and to introduce a power constraint on the control as well, as we cannot drive the servo motor with arbitrarily large input powers.

Cost Functional:

$$J[u] = \mathbb{E}\left(\frac{p}{T}\int_0^{T}(x_t + z_t)^{2}\,dt + \frac{q}{T}\int_0^{T} u_t^{2}\,dt\right),$$

where $p, q > 0$ allow us to select the tradeoff between good tracking and low feedback power.


Example 1: Tracking a Particle. cont'd.

As the cost depends on $x_t$ and $z_t$ only through $x_t + z_t$, it is more convenient to proceed directly with this quantity. That is, we define $e_t = x_t + z_t$, $P = \frac{p}{T}$ and $Q = \frac{q}{T}$, and note that

$$de_t = \beta u_t\,dt + \sigma\,dB_t, \qquad J[u] = \mathbb{E}\left(P\int_0^{T} e_t^{2}\,dt + Q\int_0^{T} u_t^{2}\,dt\right).$$

We obtain the HJB equation:

$$0 = \frac{\partial V_t(x)}{\partial t} + \min_{\alpha\in\mathbb{R}}\left\{\frac{\sigma^{2}}{2}\,\frac{\partial^{2} V_t(x)}{\partial x^{2}} + \beta\alpha\,\frac{\partial V_t(x)}{\partial x} + Px^{2} + Q\alpha^{2}\right\} = \frac{\partial V_t(x)}{\partial t} + \frac{\sigma^{2}}{2}\,\frac{\partial^{2} V_t(x)}{\partial x^{2}} - \frac{\beta^{2}}{4Q}\left(\frac{\partial V_t(x)}{\partial x}\right)^{2} + Px^{2}$$

with $V_T(x) = 0$ (as there is no terminal cost), and moreover

$$\alpha^{*}(t,x) = \operatorname*{arg\,min}_{\alpha\in\mathbb{R}}\left\{\beta\alpha\,\frac{\partial V_t(x)}{\partial x} + Q\alpha^{2}\right\} = -\frac{\beta}{2Q}\,\frac{\partial V_t(x)}{\partial x}.$$
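The pointwise minimization over $\alpha$ (only implicit on the slide) is an elementary quadratic minimization; completing the square makes both the optimal $\alpha^{*}$ and the $-\beta^{2}V_x^{2}/(4Q)$ term explicit:

$$\beta\alpha V_x + Q\alpha^{2} = Q\left(\alpha + \frac{\beta V_x}{2Q}\right)^{2} - \frac{\beta^{2}V_x^{2}}{4Q} \;\Longrightarrow\; \alpha^{*} = -\frac{\beta V_x}{2Q}, \quad \min_{\alpha}\left\{\beta\alpha V_x + Q\alpha^{2}\right\} = -\frac{\beta^{2}V_x^{2}}{4Q},$$

where $V_x = \partial V_t(x)/\partial x$.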


Example 1: Tracking a Particle. cont'd.

Use the ansatz $V_t(x) = a_t x^{2} + b_t$ with $V_T(x) = 0$:

$$\frac{da_t}{dt} + P - \frac{\beta^{2}}{Q}\,a_t^{2} = 0, \quad a_T = 0, \qquad \frac{db_t}{dt} + \sigma^{2} a_t = 0, \quad b_T = 0.$$

Solution:

$$a_t = \frac{\sqrt{PQ}}{\beta}\,\tanh\!\left(\beta\sqrt{\frac{P}{Q}}\,(T-t)\right), \qquad b_t = \frac{Q\sigma^{2}}{\beta^{2}}\,\ln\!\left(\cosh\!\left(\beta\sqrt{\frac{P}{Q}}\,(T-t)\right)\right).$$

Note that $V_t(x)$ is smooth in $x$ and $t$ and that $\alpha^{*}(t,x)$ is uniformly Lipschitz on $[0,T]$. Hence if we assume that $\mathbb{E}\big((x_t + z_t)^{2}\big) < \infty$, then we find that the feedback control

$$u_t^{*} = \alpha^{*}(t, e_t) = -\sqrt{\frac{P}{Q}}\,\tanh\!\left(\beta\sqrt{\frac{P}{Q}}\,(T-t)\right)(x_t + z_t)$$

satisfies $u^{*} \in \mathcal{K}$. Thus, by the above theorem, $u^{*}$ is an optimal control strategy.
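As an illustration (parameter values are assumptions, not from the slides), the closed-form optimal gain $\sqrt{P/Q}\,\tanh(\beta\sqrt{P/Q}\,(T-t))$ can be evaluated and plugged into the error dynamics $de_t = \beta u_t\,dt + \sigma\,dB_t$:

```python
import numpy as np

# Optimal tracking feedback from the HJB solution, simulated with assumed parameters.
beta, sigma, T = 1.0, 1.0, 1.0
p, q = 1.0, 0.1
P, Q = p / T, q / T
n_steps = 1000
dt = T / n_steps
rng = np.random.default_rng(1)

def optimal_gain(t):
    """Time-varying gain of u*_t = -gain(t) * e_t from the Riccati solution."""
    return np.sqrt(P / Q) * np.tanh(beta * np.sqrt(P / Q) * (T - t))

e = 0.0
cost = 0.0
for i in range(n_steps):
    t = i * dt
    u = -optimal_gain(t) * e            # optimal feedback u*_t = alpha*(t, e_t)
    cost += (P * e**2 + Q * u**2) * dt  # running cost P e^2 + Q u^2
    e += beta * u * dt + sigma * np.sqrt(dt) * rng.standard_normal()

print("realized cost along one path:", cost)
print("predicted optimal expected cost V_0(0) = b_0:",
      Q * sigma**2 / beta**2 * np.log(np.cosh(beta * np.sqrt(P / Q) * T)))
```

Starting from $e_0 = 0$, the expected optimal cost is $V_0(0) = b_0$, which the single-path realized cost fluctuates around.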


Example 2: Optimal Portfolio Selection

We consider:

a single stock with average return µ > 0 and volatility σ > 0

a bank account with interest rate r > 0.

Dynamics of the system:

$$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t, \quad S_0 = 1, \qquad dR_t = rR_t\,dt, \quad R_0 = 1.$$

Assumptions:

1 We can modify our investment at any point in time.

2 We consider self-financing investment strategies.

Let $X_t$ be the total wealth at time $t$, and let $u_t$ be the fraction of our wealth that is invested in the stock at time $t$. Then the self-financing condition implies that

$$dX_t = \big(\mu u_t + r(1 - u_t)\big)X_t\,dt + \sigma u_t X_t\,dB_t.$$
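For completeness (this intermediate step is not written out on the slide), the wealth dynamics follow from holding $u_t X_t / S_t$ shares of the stock and $(1-u_t)X_t / R_t$ units of the bank account, so that the self-financing condition gives

$$dX_t = \frac{u_t X_t}{S_t}\,dS_t + \frac{(1-u_t)X_t}{R_t}\,dR_t = u_t X_t(\mu\,dt + \sigma\,dB_t) + (1-u_t)X_t\,r\,dt = \big(\mu u_t + r(1-u_t)\big)X_t\,dt + \sigma u_t X_t\,dB_t.$$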

Goal: (obviously) to make money!


Example 2: Optimal Portfolio Selection. cont'd.

Let us fix a terminal time $T$, and try to choose a strategy $u_t$ that maximizes a suitable functional $U$ of our total wealth at time $T$; i.e., we choose the cost functional

$$J[u] = \mathbb{E}\big(-U(X_T^{u})\big).$$

How to choose the utility function?

The obvious choice $U(x) = x$ turns out not to admit an optimal control if we set $U = \mathbb{R}$, while if we set $U = [0,1]$ (we do not allow borrowing money or selling short) then we get a rather boring answer: we should always put all our money in stock if $\mu > r$, while if $\mu \le r$ we should put all our money in the bank.

Suppose that $U$ is nondecreasing and concave, e.g., $U(x) = \ln(x)$. Then the relative penalty for ending up with a low total wealth is much heavier than for $U(x) = x$, so that the resulting strategy will be less risky (concave utility functions lead to risk-averse strategies, while the utility $U(x) = x$ is called risk-neutral). As such, we would expect this idea to tell us to put some money in the bank to reduce our risk!


Example 2: Optimal Portfolio Selection. cont'd.

The Bellman equation reads (with $U = \mathbb{R}$)

$$0 = \frac{\partial V_t(x)}{\partial t} + \min_{\alpha\in\mathbb{R}}\left\{\frac{\sigma^{2}\alpha^{2}x^{2}}{2}\,\frac{\partial^{2} V_t(x)}{\partial x^{2}} + \big(\mu\alpha + r(1-\alpha)\big)\,x\,\frac{\partial V_t(x)}{\partial x}\right\} = \frac{\partial V_t(x)}{\partial t} + rx\,\frac{\partial V_t(x)}{\partial x} - \frac{(\mu - r)^{2}}{2\sigma^{2}}\,\frac{\big(\partial V_t(x)/\partial x\big)^{2}}{\partial^{2} V_t(x)/\partial x^{2}},$$

where $V_T(x) = -\ln x$, and moreover

$$\alpha^{*}(t,x) = -\frac{\mu - r}{\sigma^{2}}\,\frac{\partial V_t(x)/\partial x}{x\,\partial^{2} V_t(x)/\partial x^{2}},$$

provided that $\partial^{2} V_t(x)/\partial x^{2} > 0$ for all $x > 0$.

Ansatz: $V_t(x) = -\ln x + b_t$, which gives

$$\frac{db_t}{dt} - C = 0, \quad b_T = 0, \qquad C = r + \frac{(\mu - r)^{2}}{2\sigma^{2}}.$$


Example 2: Optimal Portfolio Selection. cont'd.

Solution to the Bellman equation:

$$V_t(x) = -\ln x - C(T - t)$$

(smooth in $x > 0$, with $\partial^{2} V_t(x)/\partial x^{2} > 0$).

The corresponding control:

$$\alpha^{*}(t,x) = \frac{\mu - r}{\sigma^{2}}.$$

By the existence theorem for SDEs, the conditions of the above theorem are met and we find that the optimal control is

$$u_t = \frac{\mu - r}{\sigma^{2}}.$$

Hence, the choice of the utility function tells us to put money in the bank, provided that $\mu - r < \sigma^{2}$. On the other hand, if $\mu - r$ is large, it is better to borrow money from the bank to invest in stock (this is possible in the current setting as we have chosen $U = \mathbb{R}$, rather than restricting to $U = [0,1]$).
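A quick numerical illustration (parameters are assumed for the sketch, not taken from the slides): simulate the wealth SDE under several constant fractions $u$ and compare the average log-wealth $\mathbb{E}(\ln X_T^{u})$; the Merton fraction $(\mu - r)/\sigma^{2}$ should come out on top.

```python
import numpy as np

mu, r, sigma, T = 0.08, 0.02, 0.30, 1.0   # assumed market parameters
n_steps, n_paths = 500, 20000
dt = T / n_steps
rng = np.random.default_rng(2)

def mean_log_wealth(u):
    """E[ln X_T] for the constant fraction u, via Euler simulation of the wealth SDE."""
    x = np.ones(n_paths)
    for _ in range(n_steps):
        dB = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + (mu * u + r * (1 - u)) * x * dt + sigma * u * x * dB
    return np.log(x).mean()

merton = (mu - r) / sigma**2              # optimal fraction for log utility
for u in (0.0, 0.5, merton, 1.0):
    print(f"u = {u:.3f}  E[ln X_T] ~ {mean_log_wealth(u):.4f}")
```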


Outlook: Beyond Brownian Motion


Idea for generalizations/improvements: replace the driving process, the Brownian motion $B_t$, in the state equation of the controlled system with some other stochastic process $M_t$:

$$dX_t^{u} = b(t, X_t^{u}, u_t)\,dt + \sigma(t, X_t^{u}, u_t)\,dM_t, \qquad X_0 = x.$$

Common options for $M_t$:

1. Lévy process: a process with stationary and independent increments and càdlàg sample paths $\mathbb{P}$-a.s.
Features: infinitely divisible distributions, Markov, semimartingale, jumps in the trajectories.
Reference: B. Oksendal and A. Sulem. 2005. Applied Stochastic Control of Jump Diffusions. Springer.

2. Fractional Brownian motion with Hurst parameter $H \in (0,1)$: a centered Gaussian process $B^{H} := (B_t^{H})_{t\ge 0}$ with covariance function
$$\mathbb{E}\big(B_t^{H} B_s^{H}\big) = \frac{1}{2}\left(t^{2H} + s^{2H} - |t - s|^{2H}\right).$$
Features: normally distributed, continuous trajectories, long/short memory, non-Markov, non-martingale. (A small simulation sketch of $B^{H}$ follows below.)
Reference: F. Biagini et al. 2008. Stochastic Calculus for Fractional Brownian Motion and Applications. Springer.
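As a quick way to see what fractional Brownian motion looks like, one can sample it exactly on a grid by a Cholesky factorization of the covariance matrix given above (this illustration and its parameter choices are mine, not the slides').

```python
import numpy as np

def fbm_path(H, n=500, T=1.0, seed=0):
    """Exact sample of fractional Brownian motion on a grid via Cholesky of the covariance."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)                      # grid points (excluding t = 0)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (u**(2 * H) + s**(2 * H) - np.abs(u - s)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))   # small jitter for numerical stability
    return t, L @ rng.standard_normal(n)

# H = 0.5 recovers standard Brownian motion; H > 0.5 gives smoother, persistent paths.
for H in (0.3, 0.5, 0.7):
    t, path = fbm_path(H)
    print(H, path[-1])
```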


Figure: Brownian motion

Figure: Fractional Brownian motion with different H


Figure: Poisson process

Figure: Independent sum of a compound Poisson process with Brownian motion


Lévy Driven Optimal Control

$L = (L_t)_{t\ge 0}$ is an $\mathcal{F}_t$-adapted Lévy process defined on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$.

The state of the controlled system:

$$dY_t^{u} = \underbrace{b(t, Y_t^{u}, u_t)\,dt}_{\text{drift part}} + \underbrace{\sigma(t, Y_t^{u}, u_t)\,dB_t}_{\text{diffusion part}} + \underbrace{\int_{\mathbb{R}^{n}} \gamma(t, Y_{t^-}^{u}, u_{t^-}, z)\,N(dt, dz)}_{\text{jump part}}$$

with $Y_0 = y \in \mathbb{R}^{n}$, $b : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n}$, $\sigma : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n\times m}$, and $\gamma : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n\times m}$.

Cost functional:

$$J[u] = \mathbb{E}\left(\int_0^{\tau} w(Y_s^{u}, u_s)\,ds + z(Y_{\tau}^{u})\right),$$

where $S \subset \mathbb{R}^{n}$, $w : S\times U \to \mathbb{R}$, $z : \partial S \to \mathbb{R}$, and $\tau = \inf\{t > 0 : Y_t^{u} \notin S\}$.
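To get a feel for the jump part, here is a minimal Euler-type sketch of a scalar jump-diffusion driven by a Brownian motion plus a compound Poisson process, a special case of the Lévy-driven state equation above; all coefficients, the jump intensity, and the jump-size law are illustrative assumptions.

```python
import numpy as np

def jump_diffusion_path(alpha, b, sig, gamma, y0=0.0, T=1.0, n_steps=1000,
                        jump_rate=5.0, seed=0):
    """Euler sketch of dY = b dt + sig dB + gamma dJ with feedback u = alpha(t, Y),
    where J is a compound Poisson process with N(0,1) jump sizes (an assumption)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = y0
    path = [y0]
    for i in range(n_steps):
        t = i * dt
        u = alpha(t, y)
        dB = np.sqrt(dt) * rng.standard_normal()
        n_jumps = rng.poisson(jump_rate * dt)                 # number of jumps in [t, t+dt)
        dJ = rng.standard_normal(n_jumps).sum() if n_jumps else 0.0
        y = y + b(t, y, u) * dt + sig(t, y, u) * dB + gamma(t, y, u) * dJ
        path.append(y)
    return np.array(path)

# Illustrative scalar coefficients (assumptions):
path = jump_diffusion_path(
    alpha=lambda t, y: -y,               # stabilizing feedback
    b=lambda t, y, u: u,
    sig=lambda t, y, u: 0.2,
    gamma=lambda t, y, u: 0.1,
)
print(path[-1])
```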


Thank you for your attention!
