Ch 17. Optimal control theory and the linear Bellman equation HJ Kappen BTSM Seminar 12.07.19.(Thu) Summarized by Joon Shik Kim


Page 1: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

Ch 17. Optimal control theory and the linear Bellman equation

HJ Kappen

BTSM Seminar 12.07.19.(Thu)

Summarized by Joon Shik Kim

Page 2: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

Introduction

• Optimising a sequence of actions to attain some future goal is the general topic of control theory.

• In the example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost that consists of two terms.

• The first is a path cost that specifies the energy consumed in contracting the muscles.

• The second is an end cost that specifies whether the spear will kill the animal, just hurt it, or miss it.

• The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.

Page 3: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

Discrete Time Control (1/3)

• Consider the discrete-time dynamics

$$x_{t+1} = x_t + f(t, x_t, u_t), \qquad t = 0, 1, \ldots, T-1,$$

where $x_t$ is an n-dimensional vector describing the state of the system and $u_t$ is an m-dimensional vector that specifies the control or action at time t.

• A cost function assigns a cost to each sequence of controls:

$$C(x_0, u_{0:T-1}) = \Phi(x_T) + \sum_{t=0}^{T-1} R(t, x_t, u_t),$$

where R(t,x,u) is the cost associated with taking action u at time t in state x, and $\Phi(x_T)$ is the cost associated with ending up in state $x_T$ at time T.
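The cost of one particular control sequence can be evaluated by rolling the dynamics forward. The following is only a minimal sketch: the scalar state, the function name `rollout_cost`, and the choices f(t,x,u) = u, R(t,x,u) = u²/2 and Φ(x) = (x − 1)² are assumptions made for illustration, not part of the slides.

```python
def rollout_cost(x0, controls, f, R, phi):
    """Roll out x_{t+1} = x_t + f(t, x_t, u_t) and accumulate the cost C."""
    xs = [x0]
    cost = 0.0
    for t, u in enumerate(controls):
        x = xs[-1]
        cost += R(t, x, u)         # path cost at step t
        xs.append(x + f(t, x, u))  # state update
    cost += phi(xs[-1])            # end cost at x_T
    return xs, cost


# Hypothetical scalar example (assumed, not from the slides).
f = lambda t, x, u: u                # the control shifts the state directly
R = lambda t, x, u: 0.5 * u ** 2     # quadratic effort
phi = lambda x: (x - 1.0) ** 2       # penalty for missing the target x = 1

xs, cost = rollout_cost(0.0, [0.5, 0.5], f, R, phi)
```

Here the two controls move the state from 0 to 1, so the end cost vanishes and only the path cost remains.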

Page 4: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

Discrete Time Control (2/3)

• The problem of optimal control is to find the sequence $u_{0:T-1}$ that minimises $C(x_0, u_{0:T-1})$.

• The optimal cost-to-go:

$$J(t, x_t) = \min_{u_{t:T-1}} \left( \Phi(x_T) + \sum_{s=t}^{T-1} R(s, x_s, u_s) \right)$$

$$= \min_{u_t} \left( R(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) \right).$$

Page 5: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

Discrete Time Control (3/3)

• The algorithm to compute the optimal control, trajectory, and cost is given by:

• 1. Initialization: $J(T, x) = \Phi(x)$.

• 2. Backwards: For t = T-1, …, 0 and for all x, compute

$$u_t^*(x) = \arg\min_u \{ R(t, x, u) + J(t+1, x + f(t, x, u)) \},$$

$$J(t, x) = R(t, x, u_t^*) + J(t+1, x + f(t, x, u_t^*)).$$

• 3. Forwards: For t = 0, …, T-1, compute

$$x_{t+1}^* = x_t^* + f(t, x_t^*, u_t^*(x_t^*)).$$
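The three steps (initialization, backward pass, forward pass) can be sketched for a small problem with finitely many states and controls. This is an illustrative sketch only: the integer state grid, the three controls, and the particular f, R and Φ are assumptions chosen so the result is easy to check by hand, and the function names are hypothetical.

```python
def dynamic_programming(states, controls, f, R, phi, T):
    """Backward pass: returns cost-to-go tables J[t][x] and policy[t][x]."""
    J = [dict() for _ in range(T + 1)]
    policy = [dict() for _ in range(T)]
    # 1. Initialization: J(T, x) = phi(x)
    for x in states:
        J[T][x] = phi(x)
    # 2. Backwards: t = T-1, ..., 0 and all x
    for t in range(T - 1, -1, -1):
        for x in states:
            best_u = min(
                controls,
                key=lambda u: R(t, x, u) + J[t + 1][x + f(t, x, u)],
            )
            policy[t][x] = best_u
            J[t][x] = R(t, x, best_u) + J[t + 1][x + f(t, x, best_u)]
    return J, policy


def forward_pass(x0, policy, f, T):
    """3. Forwards: follow the optimal policy from x0."""
    xs = [x0]
    for t in range(T):
        x = xs[-1]
        xs.append(x + f(t, x, policy[t][x]))
    return xs


# Hypothetical toy problem (assumed, not from the slides).
states = list(range(5))                           # x in {0, ..., 4}
controls = [-1, 0, 1]                             # step left, stay, step right
T = 3
f = lambda t, x, u: u if 0 <= x + u <= 4 else 0   # moves off the grid are blocked
R = lambda t, x, u: u ** 2                        # each move costs 1
phi = lambda x: 3 * abs(x - 3)                    # end cost: distance to x = 3

J, policy = dynamic_programming(states, controls, f, R, phi, T)
trajectory = forward_pass(0, policy, f, T)
```

Starting from x = 0, each move costs 1 but lowers the end cost by 3, so the optimal trajectory steps right three times and the optimal cost J[0][0] equals 3.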

Page 6: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

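The HJB Equation (1/2)

• Taking the continuous-time limit of the recursion for the optimal cost-to-go and expanding to first order in dt:

$$J(t, x) = \min_u \left( R(t, x, u)\,dt + J(t + dt, x + f(x, u, t)\,dt) \right)$$

$$\approx \min_u \left( R(t, x, u)\,dt + J(t, x) + \partial_t J(t, x)\,dt + \partial_x J(t, x) f(x, u, t)\,dt \right),$$

• which gives the Hamilton-Jacobi-Bellman equation:

$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(t, x) \right).$$

• The optimal control at the current x, t is given by

$$u(x, t) = \arg\min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(t, x) \right).$$

• The boundary condition is

$$J(T, x) = \Phi(x).$$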
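As a small worked illustration of how the minimisation over u is carried out, consider a scalar problem with f(x,u,t) = u, R(t,x,u) = u²/2 and end cost Φ(x) = a x²/2. These choices are assumptions made for this example only and do not come from the slides.

```latex
\begin{align*}
  u^*(x,t) &= \arg\min_u \left( \tfrac{1}{2} u^2 + u\,\partial_x J \right)
            = -\partial_x J(t,x), \\
  -\partial_t J(t,x) &= \tfrac{1}{2}(u^*)^2 + u^*\,\partial_x J
            = -\tfrac{1}{2} \left( \partial_x J(t,x) \right)^2 . \\
  \intertext{With the ansatz $J(t,x) = \tfrac{1}{2} p(t)\,x^2$ and $p(T) = a$:}
  \dot p(t) &= p(t)^2
  \quad\Longrightarrow\quad
  p(t) = \frac{a}{1 + a\,(T - t)}, \qquad
  u^*(x,t) = -p(t)\,x .
\end{align*}
```

The control is a linear feedback whose gain grows as t approaches T, which is the expected behaviour when only the end state is penalised.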

Page 7: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

The HJB Equation (2/2)

• Example: optimal control of a mass on a spring.

Page 8: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

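Stochastic Differential Equations (1/2)

• Consider the random walk on the line

$$x_{t+1} = x_t + \xi_t, \qquad \xi_t = \pm\sqrt{\nu},$$

with $x_0 = 0$.

• In closed form, $x_t = \sum_{i=1}^{t} \xi_i$.

• The first two moments are $\langle x_t \rangle = 0$ and $\langle x_t^2 \rangle = \nu t$.

• In the continuous time limit we define

$$dx_t = x_{t+dt} - x_t = d\xi \qquad \text{(Wiener process)}.$$

• The conditional probability distribution is

$$\rho(x, t \mid x_0, 0) = \frac{1}{\sqrt{2\pi\nu t}} \exp\left( -\frac{(x - x_0)^2}{2\nu t} \right).$$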
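The moments of the random walk can be checked numerically. The sketch below assumes increments of +√ν or −√ν with equal probability and uses only the Python standard library; the function name and the parameter values are illustrative assumptions.

```python
import random


def simulate_random_walk(n_steps, nu, n_paths, seed=0):
    """Estimate <x_t> and <x_t^2> for x_{t+1} = x_t + xi_t, xi_t = +/- sqrt(nu)."""
    rng = random.Random(seed)
    step = nu ** 0.5
    finals = []
    for _ in range(n_paths):
        x = 0.0  # x_0 = 0
        for _ in range(n_steps):
            x += step if rng.random() < 0.5 else -step
        finals.append(x)
    mean = sum(finals) / n_paths
    second_moment = sum(x * x for x in finals) / n_paths
    return mean, second_moment


# Assumed parameters: t = 100 steps, nu = 0.5, so nu * t = 50.
mean, second_moment = simulate_random_walk(n_steps=100, nu=0.5, n_paths=5000)
```

Up to sampling error, `mean` should be close to 0 and `second_moment` close to νt = 50.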

Page 9: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

Stochastic Optimal Control Theory (2/2)

• The dynamics are

$$dx = f(x(t), u(t), t)\,dt + d\xi.$$

• $d\xi$ is a Wiener process with $\langle d\xi_i\, d\xi_j \rangle = \nu_{ij}(t, x, u)\,dt$.

• Since $\langle dx^2 \rangle$ is of order dt, we must make a Taylor expansion up to order $dx^2$. This gives the stochastic Hamilton-Jacobi-Bellman equation:

$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(t, x) + \frac{1}{2} \nu(t, x, u)\,\partial_x^2 J(t, x) \right).$$

• $\langle dx \rangle = f(x, u, t)\,dt$: drift. $\langle dx^2 \rangle = \nu(t, x, u)\,dt$: diffusion.
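The expansion step can be written out explicitly. The following is a sketch for a scalar state, keeping only terms of order dt and using the drift and diffusion moments listed above.

```latex
\begin{align*}
  \left\langle J(t + dt,\, x + dx) \right\rangle
    &= J(t,x) + \partial_t J\,dt
       + \partial_x J\,\langle dx \rangle
       + \tfrac{1}{2}\,\partial_x^2 J\,\langle dx^2 \rangle
       + \text{higher-order terms} \\
    &= J(t,x) + \left( \partial_t J + f\,\partial_x J
       + \tfrac{1}{2}\,\nu\,\partial_x^2 J \right) dt , \\
  J(t,x) &= \min_u \left( R(t,x,u)\,dt
       + \left\langle J(t + dt,\, x + dx) \right\rangle \right).
\end{align*}
```

Substituting the first result into the second, cancelling J(t,x) on both sides and dividing by dt yields the stochastic HJB equation. The second-derivative term is the only difference from the deterministic case.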

Page 10: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

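Path Integral Control (1/2)

• For problems in which the control enters the dynamics linearly and the cost quadratically, the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go:

$$J(x, t) = -\lambda \log \psi(x, t).$$

• The HJB equation becomes

$$-\partial_t \psi(x, t) = \left( -\frac{V}{\lambda} + f^T \partial_x + \frac{1}{2} \mathrm{Tr}\left( g \nu g^T \partial_x^2 \right) \right) \psi(x, t).$$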

Page 11: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

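Path Integral Control (2/2)

• Let $\rho(y, \tau \mid x, t)$ describe a diffusion process for $\tau > t$ defined by the Fokker-Planck equation

$$\partial_\tau \rho = -\frac{V}{\lambda}\rho - \partial_y^T (f \rho) + \frac{1}{2} \mathrm{Tr}\left( \partial_y^2 (g \nu g^T \rho) \right). \qquad (1)$$

• Then

$$\psi(x, t) = \int dy\, \rho(y, T \mid x, t) \exp(-\Phi(y)/\lambda).$$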

Page 12: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

The Diffusion Process as a Path Integral (1/2)

• Let's look at the first term on the right-hand side of equation (1) on the previous slide. It describes a process that kills a sample trajectory at rate V(x,t)/λ, that is, with probability V(x,t)dt/λ in each time step dt.

• Sampling process and Monte Carlo:

$$dx = f(x, t)\,dt + g(x, t)\,d\xi,$$

$x \to x + dx$ with probability $1 - V(x,t)\,dt/\lambda$,

$x_i \to \dagger$ with probability $V(x,t)\,dt/\lambda$, in which case the path is killed.

$$\psi(x, t) = \int dy\, \rho(y, T \mid x, t) \exp(-\Phi(y)/\lambda) \approx \frac{1}{N} \sum_{i \in \text{alive}} \exp(-\Phi(x_i(T))/\lambda).$$
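The sampling procedure can be sketched as a plain Monte Carlo loop. This is a minimal illustration for a scalar state with an Euler discretisation; the function name, the step count, and the test problem (f = 0, g = 1, Φ(y) = y²/2, λ = ν = 1) are assumptions made for the example and are not taken from the slides.

```python
import math
import random


def estimate_psi(x0, t0, T, f, g, V, phi, lam, nu,
                 n_paths=10000, n_steps=20, seed=0):
    """Monte Carlo estimate of psi(x0, t0) with path killing."""
    rng = random.Random(seed)
    dt = (T - t0) / n_steps  # must be small enough that V * dt / lam <= 1
    total = 0.0
    for _ in range(n_paths):
        x, t, alive = x0, t0, True
        for _ in range(n_steps):
            # Kill the path with probability V(x, t) dt / lambda.
            if rng.random() < V(x, t) * dt / lam:
                alive = False
                break
            # Otherwise take a diffusion step dx = f dt + g dxi, <dxi^2> = nu dt.
            dxi = rng.gauss(0.0, math.sqrt(nu * dt))
            x += f(x, t) * dt + g(x, t) * dxi
            t += dt
        if alive:
            total += math.exp(-phi(x) / lam)  # only surviving paths contribute
    return total / n_paths  # divide by N, the total number of sampled paths


# Assumed test problem: free diffusion with a quadratic end cost.
common = dict(f=lambda x, t: 0.0, g=lambda x, t: 1.0,
              phi=lambda y: 0.5 * y * y, lam=1.0, nu=1.0)
psi_free = estimate_psi(0.0, 0.0, 1.0, V=lambda x, t: 0.0, **common)
psi_killed = estimate_psi(0.0, 0.0, 1.0, V=lambda x, t: 1.0, **common)
```

For this test problem with V = 0 the integral is Gaussian and equals 1/√2, so `psi_free` should be close to 0.707. With V = 1 a fraction of the paths is killed and `psi_killed` is correspondingly smaller.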

Page 13: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

The Diffusion Process as a Path Integral (2/2)

$$p(x(t \to T) \mid x, t) = \frac{1}{\psi(x, t)} \exp\left( -\frac{1}{\lambda} S(x(t \to T)) \right),$$

where ψ is a partition function, J is a free energy, S is the energy of a path, and λ is the temperature.

Page 14: Ch  17. Optimal control theory and the linear Bellman  equation HJ  Kappen

Discussion

• One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time, but also among each other to maximise a common reward function.

• The path integral method has great potential for application in robotics.