Ch 17. Optimal control theory and the linear Bellman equation
HJ Kappen
BTSM Seminar, 12.07.19 (Thu)
Summarized by Joon Shik Kim
Introduction
• Optimising a sequence of actions to attain some future goal is the general topic of control theory.
• In the example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost that consists of two terms.
• The first is a path cost that specifies the energy consumption to contract the muscles.
• The second is an end cost that specifies whether the spear will kill the animal, just hurt it, or miss it.
• The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.
Discrete Time Control (1/3)
• The dynamics are
$$x_{t+1} = x_t + f(t, x_t, u_t), \qquad t = 0, 1, \ldots, T-1,$$
where $x_t$ is an n-dimensional vector describing the state of the system and $u_t$ is an m-dimensional vector that specifies the control or action at time t.
• A cost function assigns a cost to each sequence of controls:
$$C(x_0, u_{0:T-1}) = \Phi(x_T) + \sum_{t=0}^{T-1} R(t, x_t, u_t),$$
where $R(t,x,u)$ is the cost associated with taking action u at time t in state x, and $\Phi(x_T)$ is the cost associated with ending up in state $x_T$ at time T.
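As a minimal sketch of how this cost is evaluated for one given control sequence, the snippet below rolls the dynamics forward and accumulates the path cost and the end cost. The function name `total_cost` and the toy choices of `f`, `R` and `phi` are assumptions made for illustration only; they are not from the slides.

```python
def total_cost(x0, controls, f, R, phi):
    """Evaluate C(x0, u_{0:T-1}) = phi(x_T) + sum_t R(t, x_t, u_t)
    for the dynamics x_{t+1} = x_t + f(t, x_t, u_t)."""
    x = x0
    cost = 0
    for t, u in enumerate(controls):
        cost += R(t, x, u)      # path cost of taking action u in state x
        x = x + f(t, x, u)      # move to the next state
    return cost + phi(x)        # add the end cost at x_T


# Hypothetical toy problem: the control moves the state directly,
# effort costs u^2 and ending away from the origin costs 2 x^2.
f = lambda t, x, u: u
R = lambda t, x, u: u * u
phi = lambda x: 2 * x * x

print(total_cost(3, [-1, -1, -1, 0], f, R, phi))  # 3: three unit moves, end at 0
print(total_cost(3, [0, 0, 0, 0], f, R, phi))     # 18: no effort, end at 3
```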
Discrete Time Control (2/3)
• The problem of optimal control is to find the sequence $u_{0:T-1}$ that minimises $C(x_0, u_{0:T-1})$.
• The optimal cost-to-go is
$$J(t, x_t) = \min_{u_{t:T-1}} \left( \Phi(x_T) + \sum_{s=t}^{T-1} R(s, x_s, u_s) \right)$$
$$= \min_{u_t} \left( R(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) \right).$$
Discrete Time Control (3/3)
• The algorithm to compute the optimal control, trajectory, and cost is given by
• 1. Initialization:
$$J(T, x) = \Phi(x).$$
• 2. Backwards: For t = T-1, …, 0 and for all x compute
$$u_t^*(x) = \arg\min_u \{ R(t, x, u) + J(t+1, x + f(t, x, u)) \},$$
$$J(t, x) = R(t, x, u_t^*) + J(t+1, x + f(t, x, u_t^*)).$$
• 3. Forwards: For t = 0, …, T-1 compute
$$x_{t+1}^* = x_t^* + f(t, x_t^*, u_t^*(x_t^*)).$$
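The three steps above can be sketched for a small problem with finitely many states and controls. This is an illustration under stated assumptions, not the implementation from the chapter: the integer state grid, the rule that moves leaving the grid are skipped, and the names `dynamic_programming` and `rollout` are all hypothetical.

```python
def dynamic_programming(states, controls, T, f, R, phi):
    """Backward pass: returns cost-to-go tables J[t][x] and policies policy[t][x]."""
    J = [dict() for _ in range(T + 1)]
    policy = [dict() for _ in range(T)]
    for x in states:                       # 1. initialization: J(T, x) = phi(x)
        J[T][x] = phi(x)
    for t in range(T - 1, -1, -1):         # 2. backwards in time
        for x in states:
            best_u, best_cost = None, float("inf")
            for u in controls:
                x_next = x + f(t, x, u)
                if x_next not in J[t + 1]:
                    continue               # assumption: moves off the grid are not allowed
                cost = R(t, x, u) + J[t + 1][x_next]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[t][x] = best_cost
            policy[t][x] = best_u
    return J, policy


def rollout(x0, policy, f):
    """3. forward pass: follow the optimal controls from x0."""
    xs = [x0]
    for t, pi in enumerate(policy):
        x = xs[-1]
        xs.append(x + f(t, x, pi[x]))
    return xs


# Hypothetical toy problem: integer line, unit moves, effort u^2, end cost 2 x^2.
states = range(-5, 6)
controls = (-1, 0, 1)
f = lambda t, x, u: u
R = lambda t, x, u: u * u
phi = lambda x: 2 * x * x

J, policy = dynamic_programming(states, controls, 4, f, R, phi)
trajectory = rollout(3, policy, f)
print(J[0][3], trajectory)
```

The backward pass visits every state at every time step, so the cost grows with the size of the state grid; this is the reason the tabular form only suits low-dimensional problems.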
The HJB Equation (1/2)
• In the continuous-time limit the recursion for the cost-to-go reads
$$J(t, x) = \min_u \left( R(t, x, u)\,dt + J(t + dt, x + f(x, u, t)\,dt) \right)$$
$$\approx \min_u \left( R(t, x, u)\,dt + J(t, x) + \partial_t J(t, x)\,dt + \partial_x J(t, x) f(x, u, t)\,dt \right),$$
which gives
$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(t, x) \right).$$
• (Hamilton-Jacobi-Bellman equation)
• The optimal control at the current x, t is given by
$$u(x, t) = \arg\min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(t, x) \right).$$
• The boundary condition is
$$J(T, x) = \Phi(x).$$
The HJB Equation (2/2)
Optimal control of mass on a spring
Stochastic Differential Equations (1/2)
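• Consider the random walk on the line
$$x_{t+1} = x_t + \xi_t, \qquad \xi_t = \pm\sqrt{\nu},$$
with $x_0 = 0$.
• In closed form, $x_t = \sum_{i=1}^{t} \xi_i$, so that
$$\langle x_t \rangle = 0, \qquad \langle x_t^2 \rangle = \nu t.$$
• In the continuous time limit we define
$$dx_t = x_{t+dt} - x_t = d\xi \quad \text{(Wiener process)}.$$
• The conditional probability distribution is
$$\rho(x, t \mid x_0, 0) = \frac{1}{\sqrt{2\pi\nu t}} \exp\left( -\frac{(x - x_0)^2}{2\nu t} \right).$$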
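The moments of the random walk are easy to check numerically. The sketch below simulates many independent walkers and estimates the mean and mean square of the final position; the function name, the default step size `sqrt(nu)` with `nu = 1`, and the sample sizes are assumptions chosen for illustration.

```python
import math
import random


def random_walk_moments(n_steps, n_walkers, nu=1.0, seed=0):
    """Estimate <x_t> and <x_t^2> for a walk with steps of +/- sqrt(nu), x_0 = 0."""
    rng = random.Random(seed)
    step = math.sqrt(nu)
    finals = []
    for _ in range(n_walkers):
        x = 0.0
        for _ in range(n_steps):
            x += step if rng.random() < 0.5 else -step
        finals.append(x)
    mean = sum(finals) / n_walkers
    mean_sq = sum(x * x for x in finals) / n_walkers
    return mean, mean_sq


mean, mean_sq = random_walk_moments(n_steps=100, n_walkers=5000)
print(mean, mean_sq)  # expected to be close to 0 and to nu * t = 100
```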
Stochastic Optimal Control Theory (2/2)
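• The stochastic dynamics are
$$dx = f(x(t), u(t), t)\,dt + d\xi.$$
• $d\xi$ is a Wiener process with $\langle d\xi_i\, d\xi_j \rangle = \nu_{ij}(t, x, u)\,dt$.
• Since $\langle dx^2 \rangle$ is of order dt, we must make a Taylor expansion up to order $dx^2$:
$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(t, x) + \frac{1}{2} \nu(t, x, u)\,\partial_x^2 J(t, x) \right).$$
• This is the stochastic Hamilton-Jacobi-Bellman equation, with
$$\langle dx \rangle = f(x, u, t)\,dt \ \text{(drift)}, \qquad \langle dx^2 \rangle = \nu(t, x, u)\,dt \ \text{(diffusion)}.$$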
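The drift and diffusion relations can be illustrated with a single Euler-Maruyama step, sampled many times. This is a sketch for a scalar state with constant drift and constant noise variance; the function name `euler_maruyama_step` and all numerical values are assumptions for illustration.

```python
import math
import random


def euler_maruyama_step(x, u, t, f, nu, dt, rng):
    """Return one increment dx = f(x, u, t) dt + dxi, where dxi ~ N(0, nu dt)."""
    return f(x, u, t) * dt + rng.gauss(0.0, math.sqrt(nu * dt))


rng = random.Random(0)
dt, nu = 0.01, 0.5
f = lambda x, u, t: 1.0   # hypothetical constant drift

increments = [euler_maruyama_step(0.0, 0.0, 0.0, f, nu, dt, rng)
              for _ in range(20000)]
mean_dx = sum(increments) / len(increments)
mean_dx2 = sum(d * d for d in increments) / len(increments)
print(mean_dx, mean_dx2)  # close to f dt = 0.01 and nu dt = 0.005
```

The second moment also contains a term of order $dt^2$ from the drift, which is why the agreement with $\nu\,dt$ holds only to leading order in dt.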
Path Integral Control (1/2)
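• In the problem of linear control and quadratic cost, the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go:
$$J(x, t) = -\lambda \log \psi(x, t).$$
• The HJB equation becomes
$$-\partial_t \psi(x, t) = \left( -\frac{V}{\lambda} + f^T \partial_x + \frac{1}{2} \mathrm{Tr}\left( g \nu g^T \partial_x^2 \right) \right) \psi.$$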
Path Integral Control (2/2)
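• Let $\rho(y, \tau \mid x, t)$ describe a diffusion process for $\tau > t$ defined by the Fokker-Planck equation
$$\partial_\tau \rho = -\frac{V}{\lambda} \rho - \partial_y^T (f \rho) + \frac{1}{2} \mathrm{Tr}\left( \partial_y^2\, g \nu g^T \rho \right). \qquad (1)$$
• Then
$$\psi(x, t) = \int dy\, \rho(y, T \mid x, t) \exp(-\Phi(y)/\lambda).$$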
The Diffusion Process as a Path Integral (1/2)
• Consider the first term of equation (1) on the previous slide. It describes a process that kills a sample trajectory with rate V(x,t)/λ, that is, with probability V(x,t)dt/λ in each time step dt.
• Sampling process and Monte Carlo:
$$dx = f(x, t)\,dt + g(x, t)\,d\xi,$$
$$x \to x + dx \quad \text{with probability } 1 - V(x, t)\,dt/\lambda,$$
$$x_i = \dagger \quad \text{with probability } V(x, t)\,dt/\lambda; \text{ in this case the path is killed.}$$
$$\psi(x, t) = \int dy\, \rho(y, T \mid x, t) \exp(-\Phi(y)/\lambda) \approx \frac{1}{N} \sum_{i \in \text{alive}} \exp(-\Phi(x_i(T))/\lambda).$$
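The sampling scheme above can be sketched as a short Monte Carlo routine for a scalar state. It is an illustration under assumptions, not code from the chapter: the function name `estimate_psi`, the Gaussian increments with variance `nu * dt`, and the test problem with constant V and zero end cost are all chosen here for convenience.

```python
import math
import random


def estimate_psi(x, t, T, f, g, V, phi, lam, nu=1.0, dt=0.01,
                 n_samples=4000, seed=0):
    """Estimate psi(x, t) as (1/N) * sum over surviving paths of exp(-phi(x_i(T)) / lam)."""
    rng = random.Random(seed)
    n_steps = int(round((T - t) / dt))
    total = 0.0
    for _ in range(n_samples):
        xi, ti, alive = x, t, True
        for _ in range(n_steps):
            if rng.random() < V(xi, ti) * dt / lam:
                alive = False            # path is killed with probability V dt / lam
                break
            noise = rng.gauss(0.0, math.sqrt(nu * dt))
            xi += f(xi, ti) * dt + g(xi, ti) * noise
            ti += dt
        if alive:
            total += math.exp(-phi(xi) / lam)
    return total / n_samples             # killed paths contribute zero


# Hypothetical check: with V = 1, phi = 0 and lam = 1 the estimate is just the
# surviving fraction, which should be close to (1 - dt)^100, roughly exp(-1).
psi = estimate_psi(0.0, 0.0, 1.0,
                   f=lambda x, t: 0.0, g=lambda x, t: 1.0,
                   V=lambda x, t: 1.0, phi=lambda x: 0.0, lam=1.0)
print(psi, -1.0 * math.log(psi))  # psi and the cost-to-go J = -lam * log(psi)
```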
The Diffusion Process as a Path Integral (2/2)
•
$$p(x(t \to T) \mid x, t) = \frac{1}{\psi(x, t)} \exp\left( -\frac{1}{\lambda} S(x(t \to T)) \right),$$
where ψ is a partition function, J is a free energy, S is the energy of a path, and λ is the temperature.
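For a finite set of sampled paths, the same Boltzmann form gives a normalised weight per path. The sketch below assumes the path energies S are already computed and uses their sum of exponentials in place of ψ; the function name `path_weights` and the example energies are hypothetical.

```python
import math


def path_weights(energies, lam):
    """Normalised weights proportional to exp(-S / lam) for a list of path energies S."""
    s_min = min(energies)                       # subtract the minimum for numerical stability
    unnorm = [math.exp(-(s - s_min) / lam) for s in energies]
    z = sum(unnorm)                             # plays the role of the partition function
    return [w / z for w in unnorm]


weights = path_weights([1.0, 2.0, 3.0], lam=1.0)
print(weights)  # lower-energy paths receive larger weights
```

Raising the temperature λ flattens the weights towards uniform, while lowering it concentrates the weight on the lowest-energy path.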
Discussion
• One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time, but also among each other to maximise a common reward function.
• The path integral method has great potential for application in robotics.