
An Introduction to Stochastic Optimal Control

Herry P. Suryawan

Dept. of Mathematics, Sanata Dharma University, Yogyakarta

June 3, 2015


Outline

Motivation

Stochastic Optimal Control

Some Examples

Outlook: Beyond Brownian Motion


Motivation


Tracking a Diffusion Particle Under a Microscope

Situation: we study some kind of particles in detail by zooming in on one of the particles, i.e. we increase the magnification of the microscope until one particle fills a large part of the field of view.

Problem: the random movement of the particle causes it to rapidly leave our field of view. So, we have to keep moving the cover slide around in order to track the motion of the particle.

How to do it: we attach an electric motor to the microscope slide which allows us to move the slide around.

Model: let $z_t$ be the position of the slide relative to the focus of the microscope. Then we can write

$$\frac{dz_t}{dt} = \beta u_t,$$

where $u_t$ is the voltage applied to the motor and $\beta > 0$ is a gain constant. The position of the particle relative to the slide is modelled by a Brownian motion $x_t$. Hence, the position of the particle relative to the microscope focus is $x_t + z_t$.

Goal: to control the slide position to keep the particle in focus, i.e. to choose $u_t$ so that $x_t + z_t$ stays close to zero.
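To make the model concrete, here is a minimal simulation sketch (not from the slides) of the error process $e_t = x_t + z_t$ under a hypothetical proportional feedback $u_t = -k\,e_t$; the gain $k$, the step size, and all numerical values are illustrative assumptions only.

```python
import numpy as np

# Minimal Euler sketch of the tracking model dz/dt = beta*u with x_t a Brownian motion.
# The feedback u_t = -k * (x_t + z_t) is a hypothetical choice, not the optimal law.
rng = np.random.default_rng(0)
beta, sigma, k = 1.0, 1.0, 5.0      # gain, diffusion constant, assumed feedback gain
T, n = 1.0, 1000
dt = T / n

x = z = 0.0
errors = []
for _ in range(n):
    e = x + z                        # particle position relative to the focus
    u = -k * e                       # proportional feedback (assumption)
    z += beta * u * dt               # slide dynamics dz/dt = beta * u
    x += sigma * np.sqrt(dt) * rng.standard_normal()  # Brownian particle motion
    errors.append(x + z)

print("time-averaged squared error:", np.mean(np.square(errors)))
```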


Figure: tracking a Brownian particle


Formalization of the Problem

Cost Functional:

$$J_T[u] = p\,\mathbb{E}\left(\frac{1}{T}\int_0^{T}(x_t + z_t)^{2}\,dt\right) + q\,\mathbb{E}\left(\frac{1}{T}\int_0^{T} u_t^{2}\,dt\right),$$

where $p$ and $q$ are positive constants.
First term: the time-average of the mean square distance of the particle from the focus of the microscope over some time interval $[0,T]$.
Second term: the average power in the control signal.

Aim: to minimize the first and the second term.

Stochastic Optimal Control Problem: to find the feedback strategy $u_t$ which minimizes the cost functional $J_T[u]$.

Variation: problem on the infinite time horizon $[0,\infty)$:

$$J_\infty[u] = p\,\limsup_{T\to\infty}\,\mathbb{E}\left(\frac{1}{T}\int_0^{T}(x_t + z_t)^{2}\,dt\right) + q\,\limsup_{T\to\infty}\,\mathbb{E}\left(\frac{1}{T}\int_0^{T} u_t^{2}\,dt\right).$$
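As a sanity check, the finite-horizon cost $J_T[u]$ of any given feedback law can be estimated by Monte Carlo over simulated paths. The sketch below (an illustration, not part of the slides) reuses the hypothetical proportional feedback from above; the weights, path count, and step size are assumptions.

```python
import numpy as np

def estimate_cost(k, p=1.0, q=1.0, beta=1.0, sigma=1.0, T=1.0,
                  n_steps=500, n_paths=2000, seed=0):
    """Monte Carlo estimate of J_T[u] for the feedback u_t = -k*(x_t + z_t)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.zeros(n_paths)
    z = np.zeros(n_paths)
    tracking_cost = np.zeros(n_paths)
    control_cost = np.zeros(n_paths)
    for _ in range(n_steps):
        e = x + z
        u = -k * e                              # hypothetical feedback law
        tracking_cost += e**2 * dt
        control_cost += u**2 * dt
        z += beta * u * dt
        x += sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return p * tracking_cost.mean() / T + q * control_cost.mean() / T

# Compare two assumed gains; the truly optimal feedback comes from the HJB solution later.
print(estimate_cost(k=1.0), estimate_cost(k=5.0))
```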


Stochastic Optimal Control


A Brief History

Impetus for optimal control: the calculus of variations (in 1662 Pierre de Fermat used the method of calculus to minimize the passage time of a light ray through two optical media) and the brachistochrone problem (posed in 1696 by Johann Bernoulli).

Early work on optimal control theory: in the 1800s, e.g. Hamilton, Hurwitz, Maxwell, Poincaré, Lyapunov, Wiener, Kolmogorov.

Modern optimal control theory: from the end of World War II, e.g. Bellman, LaSalle, Blackwell, Fleming, Berkovitz, Pontryagin, Kalman. Key contributions: Bellman's dynamic programming method, Pontryagin's maximum principle, Kalman's LQ theory.

Stochastic optimal control:
Bellman in 1952 mentioned stochastic control in one of his earliest papers, but no Ito-type SDE was involved!
Florentin in 1961 derived a PDE associated with a controlled Markov process by using Bellman's dynamic programming.
Kushner in 1962 studied an optimal control problem using an Ito-type SDE as the state equation.
Merton in the late 1960s: first application of stochastic optimal control theory to financial mathematics.


Stochastic Optimal Control

Stochastic Optimal Control Problem (SOCP): a completely observed control problem with a state equation of the Ito type (diffusion model) and with a cost functional of the Bolza type.

The basic source of uncertainty in diffusion models is white noise (the time derivative of Brownian motion), which represents the joint effects of a large number of independent random forces acting on the system.

Two main approaches:
1. Pontryagin's maximum principle: a Hamiltonian system consisting of the adjoint equation (ODE/SDE), the original state equation, and the maximum conditions.
2. Bellman's dynamic programming: solving the HJB equation, which is a PDE of first order in the deterministic case and of second order in the stochastic case.

References:
1. W. H. Fleming and H. M. Soner. 2006. Controlled Markov Processes and Viscosity Solutions. Springer.
2. J. Yong and X. Y. Zhou. 2009. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer.
3. M. Nisio. 2015. Stochastic Control Theory: Dynamic Programming Principle. Springer.


Basics of SOCP

Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$ be a filtered probability space on which an $\mathcal{F}_t$-adapted Brownian motion $B = (B_t)_{t\ge 0}$ is defined. The state of the controlled system in an SOCP is described by an SDE

$$dX_t^{u} = b(t, X_t^{u}, u_t)\,dt + \sigma(t, X_t^{u}, u_t)\,dB_t, \qquad X_0 = x.$$

Here $b : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n}$ and $\sigma : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n\times m}$, where $U$ is the control set (the set of values that the control input can take). Often: $U = \mathbb{R}^{q}$.

Definition

The control strategy $u = (u_t)_{t\ge 0}$ is called an admissible strategy if

1. $u_t$ is an $\mathcal{F}_t$-adapted stochastic process,
2. $u_t(\omega) \in U$ for every $(\omega, t) \in \Omega\times[0,\infty)$,
3. the equation for $X_t^{u}$ has a unique strong solution.

Moreover, an admissible strategy $u$ is called a Markov strategy if it is of the form $u_t = \alpha(t, X_t^{u})$ for some function $\alpha : [0,\infty)\times\mathbb{R}^{n} \to U$.
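For intuition, a Markov strategy plugs a feedback map $\alpha(t,x)$ into the state SDE, which can then be simulated like any other Ito SDE. The Euler-Maruyama sketch below is an illustration with assumed scalar coefficients $b$, $\sigma$, and $\alpha$ (none of them come from the slides).

```python
import numpy as np

def simulate_controlled_sde(b, sigma, alpha, x0, T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama path of dX = b(t, X, u) dt + sigma(t, X, u) dB with u = alpha(t, X)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = x0
    path = [x0]
    for i in range(n_steps):
        t = i * dt
        u = alpha(t, x)                       # Markov strategy: feedback on the current state
        dB = np.sqrt(dt) * rng.standard_normal()
        x = x + b(t, x, u) * dt + sigma(t, x, u) * dB
        path.append(x)
    return np.array(path)

# Example with made-up scalar coefficients (assumptions, not from the slides):
path = simulate_controlled_sde(
    b=lambda t, x, u: u,                 # controlled drift
    sigma=lambda t, x, u: 0.3,           # constant noise intensity
    alpha=lambda t, x: -2.0 * x,         # stabilizing feedback map
    x0=1.0,
)
print(path[-1])
```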


Three common types of cost functionals:

1. SOCP on the finite time horizon $[0,T]$:
$$J[u] = \mathbb{E}\left(\int_0^{T} w(s, X_s^{u}, u_s)\,ds + z(X_T^{u})\right),$$
where $w : [0,T]\times\mathbb{R}^{n}\times U \to \mathbb{R}$ (the running cost) and $z : \mathbb{R}^{n} \to \mathbb{R}$ (the terminal cost) are measurable functions and $T < \infty$ is the terminal time.

2. SOCP on an indefinite time horizon:
$$J[u] = \mathbb{E}\left(\int_0^{\tau^{u}} w(X_s^{u}, u_s)\,ds + z(X_{\tau^{u}}^{u})\right),$$
where $S \subset \mathbb{R}^{n}$, $w : S\times U \to \mathbb{R}$, $z : \partial S \to \mathbb{R}$, and the stopping time $\tau^{u} = \inf\{t > 0 : X_t^{u} \notin S\}$.

3. SOCP on an infinite time horizon:
$$J[u] = \limsup_{T\to\infty}\,\mathbb{E}\left(\frac{1}{T}\int_0^{T} w(X_s^{u}, u_s)\,ds\right),$$
where $w : \mathbb{R}^{n}\times U \to \mathbb{R}$ is measurable.


Bellman's dynamic programming method

Given the state of the controlled system

$$dX_t^{u} = b(t, X_t^{u}, u_t)\,dt + \sigma(t, X_t^{u}, u_t)\,dB_t, \qquad X_0 = x$$

with a cost functional J[u].

The goal is to find a Markov strategy $u^{*}$ that minimizes the cost, i.e.

$$J[u^{*}] = \min\{J[u] : u \text{ is an admissible Markov strategy}\}.$$

Define the generator $\mathcal{L}_t^{\alpha}$, $\alpha \in U$, as

$$\mathcal{L}_t^{\alpha} g(x) = \sum_{i=1}^{n} b^{i}(t,x,\alpha)\,\frac{\partial g}{\partial x^{i}}(x) + \frac{1}{2}\sum_{i,j=1}^{n}\sum_{k=1}^{m} \sigma^{ik}(t,x,\alpha)\,\sigma^{jk}(t,x,\alpha)\,\frac{\partial^{2} g}{\partial x^{i}\partial x^{j}}(x)$$

and the cost-to-go function $J_t^{u}$ of the Markov strategy $u$ as

$$J_t^{u}(X_t^{u}) = \mathbb{E}\!\left(\int_t^{T} w(s, X_s^{u}, u_s)\,ds + z(X_T^{u}) \,\Big|\, \mathcal{F}_t\right).$$
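For orientation (this specialization is spelled out here for convenience and is not on the original slide), applying the definition to the one-dimensional tracking example used later, with state equation $de_t = \beta\alpha\,dt + \sigma\,dB_t$ (so $n = m = 1$, $b = \beta\alpha$, $\sigma^{11} = \sigma$), gives the operator that will appear inside its HJB equation:

$$\mathcal{L}_t^{\alpha} g(x) = \beta\alpha\,\frac{\partial g}{\partial x}(x) + \frac{\sigma^{2}}{2}\,\frac{\partial^{2} g}{\partial x^{2}}(x).$$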


Theorem

Suppose there is a $V_t(x)$, which is $C^{1}$ in $t$ and $C^{2}$ in $x$, such that

$$\frac{\partial V_t(x)}{\partial t} + \min_{\alpha\in U}\left\{\mathcal{L}_t^{\alpha} V_t(x) + w(t,x,\alpha)\right\} = 0, \qquad V_T(x) = z(x),$$

and $|\mathbb{E}(V_0(X_0))| < \infty$, and choose a minimizer

$$\alpha^{*}(t,x) \in \operatorname*{arg\,min}_{\alpha\in U}\left\{\mathcal{L}_t^{\alpha} V_t(x) + w(t,x,\alpha)\right\}.$$

Denote by $\mathcal{K}$ the class of admissible strategies $u$ such that

$$\sum_{i=1}^{n}\sum_{k=1}^{m}\int_0^{t} \frac{\partial V_s}{\partial x^{i}}(X_s^{u})\,\sigma^{ik}(s, X_s^{u}, u_s)\,dB_s^{k}$$

is a martingale, and suppose that the control $u^{*}(t) = \alpha^{*}(t, X_t^{u^{*}})$ defines an admissible Markov strategy which is in $\mathcal{K}$. Then $J[u^{*}] \le J[u]$ for any $u \in \mathcal{K}$, and $V_t(x) = J_t^{u^{*}}(x)$ is the value function for the control problem.
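The proof idea behind this verification theorem (only a sketch, not spelled out on the slide) is to apply Ito's formula to $V_t(X_t^{u})$ and use the HJB equation, which gives $\frac{\partial V_t}{\partial t} + \mathcal{L}_t^{u_t}V_t \ge -w(t,\cdot,u_t)$ for any admissible $u$:

$$z(X_T^{u}) = V_T(X_T^{u}) = V_0(X_0) + \int_0^{T}\left(\frac{\partial V_t}{\partial t} + \mathcal{L}_t^{u_t}V_t\right)(X_t^{u})\,dt + M_T \;\ge\; V_0(X_0) - \int_0^{T} w(t, X_t^{u}, u_t)\,dt + M_T,$$

where $M_T$ is the stochastic-integral term, a martingale for $u \in \mathcal{K}$. Taking expectations gives $J[u] \ge \mathbb{E}(V_0(X_0))$ for every $u \in \mathcal{K}$, with equality for $u^{*}$ because $\alpha^{*}$ attains the minimum in the HJB equation.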


Some Examples


Example 1: Tracking a Particle

System:

$$\frac{dz_t}{dt} = \beta u_t, \qquad x_t = x_0 + \sigma B_t,$$

where
$z_t$ is the position of the slide relative to the focus of the microscope,
$x_t$ is the position of the particle we wish to view under the microscope, relative to the center of the slide,
$\beta \in \mathbb{R}$ is the gain constant,
$\sigma > 0$ is the diffusion constant of the particle.

Aim: to keep the particle in focus (keep $x_t + z_t$ as close to zero as possible) and to introduce a power constraint on the control as well, as we cannot drive the servo motor with arbitrarily large input powers.

Cost Functional:

$$J[u] = \mathbb{E}\left(\frac{p}{T}\int_0^{T}(x_t + z_t)^{2}\,dt + \frac{q}{T}\int_0^{T} u_t^{2}\,dt\right),$$

where $p, q > 0$ allow us to select the tradeoff between good tracking and low feedback power.


Example 1: Tracking a Particle. cont'd.

As the cost depends on $x_t$ and $z_t$ only through $x_t + z_t$, it is more convenient to proceed directly with this quantity. That is, we define $e_t = x_t + z_t$, $P = \frac{p}{T}$ and $Q = \frac{q}{T}$, and note that

$$de_t = \beta u_t\,dt + \sigma\,dB_t, \qquad J[u] = \mathbb{E}\left(P\int_0^{T} e_t^{2}\,dt + Q\int_0^{T} u_t^{2}\,dt\right).$$

We obtain the HJB equation:

$$0 = \frac{\partial V_t(x)}{\partial t} + \min_{\alpha\in\mathbb{R}}\left\{\frac{\sigma^{2}}{2}\,\frac{\partial^{2} V_t(x)}{\partial x^{2}} + \beta\alpha\,\frac{\partial V_t(x)}{\partial x} + Px^{2} + Q\alpha^{2}\right\} = \frac{\partial V_t(x)}{\partial t} + \frac{\sigma^{2}}{2}\,\frac{\partial^{2} V_t(x)}{\partial x^{2}} - \frac{\beta^{2}}{4Q}\left(\frac{\partial V_t(x)}{\partial x}\right)^{2} + Px^{2}$$

with $V_T(x) = 0$ (as there is no terminal cost), and moreover

$$\alpha^{*}(t,x) = \operatorname*{arg\,min}_{\alpha\in\mathbb{R}}\left\{\beta\alpha\,\frac{\partial V_t(x)}{\partial x} + Q\alpha^{2}\right\} = -\frac{\beta}{2Q}\,\frac{\partial V_t(x)}{\partial x}.$$
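The pointwise minimization over $\alpha$ (only implicit on the slide) is an elementary quadratic minimization; completing the square makes both the optimal $\alpha^{*}$ and the $-\beta^{2}V_x^{2}/(4Q)$ term explicit:

$$\beta\alpha V_x + Q\alpha^{2} = Q\left(\alpha + \frac{\beta V_x}{2Q}\right)^{2} - \frac{\beta^{2}V_x^{2}}{4Q} \;\Longrightarrow\; \alpha^{*} = -\frac{\beta V_x}{2Q}, \quad \min_{\alpha}\left\{\beta\alpha V_x + Q\alpha^{2}\right\} = -\frac{\beta^{2}V_x^{2}}{4Q},$$

where $V_x = \partial V_t(x)/\partial x$.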


Example 1: Tracking a Particle. cont'd.

Use the ansatz $V_t(x) = a_t x^{2} + b_t$ with $V_T(x) = 0$:

$$\frac{da_t}{dt} + P - \frac{\beta^{2}}{Q}\,a_t^{2} = 0, \quad a_T = 0, \qquad \frac{db_t}{dt} + \sigma^{2} a_t = 0, \quad b_T = 0.$$

Solution:

$$a_t = \frac{\sqrt{PQ}}{\beta}\,\tanh\!\left(\beta\sqrt{\frac{P}{Q}}\,(T-t)\right), \qquad b_t = \frac{Q\sigma^{2}}{\beta^{2}}\,\ln\!\left(\cosh\!\left(\beta\sqrt{\frac{P}{Q}}\,(T-t)\right)\right).$$

Note that $V_t(x)$ is smooth in $x$ and $t$ and that $\alpha^{*}(t,x)$ is uniformly Lipschitz on $[0,T]$. Hence if we assume that $\mathbb{E}\big((x_t + z_t)^{2}\big) < \infty$, then we find that the feedback control

$$u_t^{*} = \alpha^{*}(t, e_t) = -\sqrt{\frac{P}{Q}}\,\tanh\!\left(\beta\sqrt{\frac{P}{Q}}\,(T-t)\right)(x_t + z_t)$$

satisfies $u^{*} \in \mathcal{K}$. Thus, by the above theorem, $u^{*}$ is an optimal control strategy.
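As an illustration (parameter values are assumptions, not from the slides), the closed-form optimal gain $\sqrt{P/Q}\,\tanh(\beta\sqrt{P/Q}\,(T-t))$ can be evaluated and plugged into the error dynamics $de_t = \beta u_t\,dt + \sigma\,dB_t$:

```python
import numpy as np

# Optimal tracking feedback from the HJB solution, simulated with assumed parameters.
beta, sigma, T = 1.0, 1.0, 1.0
p, q = 1.0, 0.1
P, Q = p / T, q / T
n_steps = 1000
dt = T / n_steps
rng = np.random.default_rng(1)

def optimal_gain(t):
    """Time-varying gain of u*_t = -gain(t) * e_t from the Riccati solution."""
    return np.sqrt(P / Q) * np.tanh(beta * np.sqrt(P / Q) * (T - t))

e = 0.0
cost = 0.0
for i in range(n_steps):
    t = i * dt
    u = -optimal_gain(t) * e            # optimal feedback u*_t = alpha*(t, e_t)
    cost += (P * e**2 + Q * u**2) * dt  # running cost P e^2 + Q u^2
    e += beta * u * dt + sigma * np.sqrt(dt) * rng.standard_normal()

print("realized cost along one path:", cost)
print("predicted optimal expected cost V_0(0) = b_0:",
      Q * sigma**2 / beta**2 * np.log(np.cosh(beta * np.sqrt(P / Q) * T)))
```

Starting from $e_0 = 0$, the expected optimal cost is $V_0(0) = b_0$, which the single-path realized cost fluctuates around.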


Example 2: Optimal Portfolio Selection

We consider:

a single stock with average return µ > 0 and volatility σ > 0

a bank account with interest rate r > 0.

Dynamics of the system:

$$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t, \quad S_0 = 1, \qquad dR_t = rR_t\,dt, \quad R_0 = 1.$$

Assumptions:

1 We can modify our investment at any point in time.

2 We consider self-financing investment strategies.

Let $X_t$ be the total wealth at time $t$, and let $u_t$ be the fraction of our wealth that is invested in the stock at time $t$. Then the self-financing condition implies that

$$dX_t = \big(\mu u_t + r(1 - u_t)\big)X_t\,dt + \sigma u_t X_t\,dB_t.$$
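For completeness (this intermediate step is not written out on the slide), the wealth dynamics follow from holding $u_t X_t / S_t$ shares of the stock and $(1-u_t)X_t / R_t$ units of the bank account, so that the self-financing condition gives

$$dX_t = \frac{u_t X_t}{S_t}\,dS_t + \frac{(1-u_t)X_t}{R_t}\,dR_t = u_t X_t(\mu\,dt + \sigma\,dB_t) + (1-u_t)X_t\,r\,dt = \big(\mu u_t + r(1-u_t)\big)X_t\,dt + \sigma u_t X_t\,dB_t.$$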

Goal: (obviously) to make money!


Example 2: Optimal Portfolio Selection. cont'd.

Let us fix a terminal time $T$, and try to choose a strategy $u_t$ that maximizes a suitable functional $U$ of our total wealth at time $T$; i.e., we choose the cost functional

$$J[u] = \mathbb{E}\big(-U(X_T^{u})\big).$$

How to choose the utility function?

The obvious choice $U(x) = x$ turns out not to admit an optimal control if we set $U = \mathbb{R}$, while if we set $U = [0,1]$ (we do not allow borrowing money or selling short) then we get a rather boring answer: we should always put all our money in stock if $\mu > r$, while if $\mu \le r$ we should put all our money in the bank.

Suppose that $U$ is nondecreasing and concave, e.g., $U(x) = \ln(x)$. Then the relative penalty for ending up with a low total wealth is much heavier than for $U(x) = x$, so that the resulting strategy will be less risky (concave utility functions lead to risk-averse strategies, while the utility $U(x) = x$ is called risk-neutral). As such, we would expect this idea to tell us to put some money in the bank to reduce our risk!


Example 2: Optimal Portfolio Selection. cont'd.

The Bellman equation reads (with $U = \mathbb{R}$)

$$0 = \frac{\partial V_t(x)}{\partial t} + \min_{\alpha\in\mathbb{R}}\left\{\frac{\sigma^{2}\alpha^{2}x^{2}}{2}\,\frac{\partial^{2} V_t(x)}{\partial x^{2}} + \big(\mu\alpha + r(1-\alpha)\big)\,x\,\frac{\partial V_t(x)}{\partial x}\right\} = \frac{\partial V_t(x)}{\partial t} + rx\,\frac{\partial V_t(x)}{\partial x} - \frac{(\mu - r)^{2}}{2\sigma^{2}}\,\frac{\big(\partial V_t(x)/\partial x\big)^{2}}{\partial^{2} V_t(x)/\partial x^{2}},$$

where $V_T(x) = -\ln x$, and moreover

$$\alpha^{*}(t,x) = -\frac{\mu - r}{\sigma^{2}}\,\frac{\partial V_t(x)/\partial x}{x\,\partial^{2} V_t(x)/\partial x^{2}},$$

provided that $\partial^{2} V_t(x)/\partial x^{2} > 0$ for all $x > 0$.

Ansatz: $V_t(x) = -\ln x + b_t$, which gives

$$\frac{db_t}{dt} - C = 0, \quad b_T = 0, \qquad C = r + \frac{(\mu - r)^{2}}{2\sigma^{2}}.$$


Example 2: Optimal Portfolio Selection. cont'd.

Solution to the Bellman equation:

$$V_t(x) = -\ln x - C(T - t)$$

(smooth in $x > 0$, with $\partial^{2} V_t(x)/\partial x^{2} > 0$).

The corresponding control:

$$\alpha^{*}(t,x) = \frac{\mu - r}{\sigma^{2}}.$$

By the existence theorem for SDEs, the conditions of the above theorem are met and we find that the optimal control is

$$u_t = \frac{\mu - r}{\sigma^{2}}.$$

Hence, the choice of the utility function tells us to put money in the bank, provided that $\mu - r < \sigma^{2}$. On the other hand, if $\mu - r$ is large, it is better to borrow money from the bank to invest in stock (this is possible in the current setting as we have chosen $U = \mathbb{R}$, rather than restricting to $U = [0,1]$).
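A quick numerical illustration (parameters are assumed for the sketch, not taken from the slides): simulate the wealth SDE under several constant fractions $u$ and compare the average log-wealth $\mathbb{E}(\ln X_T^{u})$; the Merton fraction $(\mu - r)/\sigma^{2}$ should come out on top.

```python
import numpy as np

mu, r, sigma, T = 0.08, 0.02, 0.30, 1.0   # assumed market parameters
n_steps, n_paths = 500, 20000
dt = T / n_steps
rng = np.random.default_rng(2)

def mean_log_wealth(u):
    """E[ln X_T] for the constant fraction u, via Euler simulation of the wealth SDE."""
    x = np.ones(n_paths)
    for _ in range(n_steps):
        dB = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + (mu * u + r * (1 - u)) * x * dt + sigma * u * x * dB
    return np.log(x).mean()

merton = (mu - r) / sigma**2              # optimal fraction for log utility
for u in (0.0, 0.5, merton, 1.0):
    print(f"u = {u:.3f}  E[ln X_T] ~ {mean_log_wealth(u):.4f}")
```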


Outlook: Beyond Brownian Motion


Idea for generalizations/improvements: replace the driving process, the Brownian motion $B_t$, in the state equation of the controlled system with some other stochastic process $M_t$:

$$dX_t^{u} = b(t, X_t^{u}, u_t)\,dt + \sigma(t, X_t^{u}, u_t)\,dM_t, \qquad X_0 = x.$$

Common options for $M_t$:

1. Lévy process: a process with stationary and independent increments and càdlàg sample paths $\mathbb{P}$-a.s.
Features: infinitely divisible distributions, Markov, semimartingale, jumps in the trajectories.
Reference: B. Oksendal and A. Sulem. 2005. Applied Stochastic Control of Jump Diffusions. Springer.

2. Fractional Brownian motion with Hurst parameter $H \in (0,1)$: a centered Gaussian process $B^{H} := (B_t^{H})_{t\ge 0}$ with covariance function
$$\mathbb{E}\big(B_t^{H} B_s^{H}\big) = \frac{1}{2}\left(t^{2H} + s^{2H} - |t - s|^{2H}\right).$$
Features: normally distributed, continuous trajectories, long/short memory, non-Markov, non-martingale. (A small simulation sketch of $B^{H}$ follows below.)
Reference: F. Biagini et al. 2008. Stochastic Calculus for Fractional Brownian Motion and Applications. Springer.
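As a quick way to see what fractional Brownian motion looks like, one can sample it exactly on a grid by a Cholesky factorization of the covariance matrix given above (this illustration and its parameter choices are mine, not the slides').

```python
import numpy as np

def fbm_path(H, n=500, T=1.0, seed=0):
    """Exact sample of fractional Brownian motion on a grid via Cholesky of the covariance."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)                      # grid points (excluding t = 0)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (u**(2 * H) + s**(2 * H) - np.abs(u - s)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))   # small jitter for numerical stability
    return t, L @ rng.standard_normal(n)

# H = 0.5 recovers standard Brownian motion; H > 0.5 gives smoother, persistent paths.
for H in (0.3, 0.5, 0.7):
    t, path = fbm_path(H)
    print(H, path[-1])
```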


Figure: Brownian motion

Figure: Fractional Brownian motion with different H


Figure: Poisson process

Figure: Independent sum of a compound Poisson process with Brownian motion


Lévy Driven Optimal Control

$L = (L_t)_{t\ge 0}$ is an $\mathcal{F}_t$-adapted Lévy process defined on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$.

The state of the controlled system:

$$dY_t^{u} = \underbrace{b(t, Y_t^{u}, u_t)\,dt}_{\text{drift part}} + \underbrace{\sigma(t, Y_t^{u}, u_t)\,dB_t}_{\text{diffusion part}} + \underbrace{\int_{\mathbb{R}^{n}} \gamma(t, Y_{t^-}^{u}, u_{t^-}, z)\,N(dt, dz)}_{\text{jump part}}$$

with $Y_0 = y \in \mathbb{R}^{n}$, $b : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n}$, $\sigma : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n\times m}$, and $\gamma : [0,\infty)\times\mathbb{R}^{n}\times U \to \mathbb{R}^{n\times m}$.

Cost functional:

$$J[u] = \mathbb{E}\left(\int_0^{\tau} w(Y_s^{u}, u_s)\,ds + z(Y_{\tau}^{u})\right),$$

where $S \subset \mathbb{R}^{n}$, $w : S\times U \to \mathbb{R}$, $z : \partial S \to \mathbb{R}$, and $\tau = \inf\{t > 0 : Y_t^{u} \notin S\}$.
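To get a feel for the jump part, here is a minimal Euler-type sketch of a scalar jump-diffusion driven by a Brownian motion plus a compound Poisson process, a special case of the Lévy-driven state equation above; all coefficients, the jump intensity, and the jump-size law are illustrative assumptions.

```python
import numpy as np

def jump_diffusion_path(alpha, b, sig, gamma, y0=0.0, T=1.0, n_steps=1000,
                        jump_rate=5.0, seed=0):
    """Euler sketch of dY = b dt + sig dB + gamma dJ with feedback u = alpha(t, Y),
    where J is a compound Poisson process with N(0,1) jump sizes (an assumption)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = y0
    path = [y0]
    for i in range(n_steps):
        t = i * dt
        u = alpha(t, y)
        dB = np.sqrt(dt) * rng.standard_normal()
        n_jumps = rng.poisson(jump_rate * dt)                 # number of jumps in [t, t+dt)
        dJ = rng.standard_normal(n_jumps).sum() if n_jumps else 0.0
        y = y + b(t, y, u) * dt + sig(t, y, u) * dB + gamma(t, y, u) * dJ
        path.append(y)
    return np.array(path)

# Illustrative scalar coefficients (assumptions):
path = jump_diffusion_path(
    alpha=lambda t, y: -y,               # stabilizing feedback
    b=lambda t, y, u: u,
    sig=lambda t, y, u: 0.2,
    gamma=lambda t, y, u: 0.1,
)
print(path[-1])
```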


Thank you for your attention!
