Ch 17. Optimal control theory and the linear Bellman equation
HJ Kappen
BTSM Seminar 12.07.19.(Thu)
Summarized by Joon Shik Kim
Introduction
• Optimising a sequence of actions to attain some future goal is the general topic of control theory.
• In the example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost consisting of two terms.
• The first is a path cost that specifies the energy consumption to contract the muscles.
• The second is an end cost that specifies whether the spear will kill the animal, just hurt it, or miss it.
• The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.
Discrete Time Control (1/3)
• The dynamics are
$$x_{t+1} = x_t + f(t, x_t, u_t), \qquad t = 0, 1, \ldots, T-1,$$
where $x_t$ is an n-dimensional vector describing the state of the system and $u_t$ is an m-dimensional vector that specifies the control or action at time t.
• A cost function assigns a cost to each sequence of controls:
$$C(x_0, u_{0:T-1}) = \Phi(x_T) + \sum_{t=0}^{T-1} R(t, x_t, u_t),$$
where $R(t,x,u)$ is the cost associated with taking action u at time t in state x, and $\Phi(x_T)$ is the cost associated with ending up in state $x_T$ at time T.
Discrete Time Control (2/3)
• The problem of optimal control is to find the sequence $u_{0:T-1}$ that minimises $C(x_0, u_{0:T-1})$.
• The optimal cost-to-go:
$$J(t, x_t) = \min_{u_{t:T-1}} \left( \Phi(x_T) + \sum_{s=t}^{T-1} R(s, x_s, u_s) \right) = \min_{u_t} \big( R(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) \big).$$
Discrete Time Control (3/3)
• The algorithm to compute the optimal control, trajectory, and the cost is given by
• 1. Initialization:
$$J(T, x) = \Phi(x).$$
• 2. Backwards: For t = T-1, …, 0 and for all x compute
$$u_t^*(x) = \arg\min_u \{ R(t, x, u) + J(t+1, x + f(t, x, u)) \},$$
$$J(t, x) = R(t, x, u_t^*) + J(t+1, x + f(t, x, u_t^*)).$$
• 3. Forwards: For t = 0, …, T-1 compute
$$x_{t+1}^* = x_t^* + f(t, x_t^*, u_t^*(x_t^*)).$$
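The three steps above (initialization, backward pass, forward pass) can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the state is one-dimensional and restricted to a finite grid, the control set is finite, next states are snapped to the nearest grid point, and the toy problem (`f = u`, `R = 0.1 u^2`, `Φ = x^2`) as well as the function names `backward_pass` and `forward_pass` are hypothetical.

```python
import numpy as np

def backward_pass(f, R, phi, states, controls, T):
    """Steps 1 and 2: J(T,x) = phi(x), then recurse backwards in time.
    Assumes f and R accept a NumPy array of candidate controls."""
    n = len(states)
    J = np.zeros((T + 1, n))
    policy = np.zeros((T, n), dtype=int)   # index of u_t^*(x) in `controls`
    J[T] = phi(states)                     # initialization
    for t in range(T - 1, -1, -1):
        for i, x in enumerate(states):
            x_next = x + f(t, x, controls)             # one candidate per control
            # Assumption: snap each candidate next state to the nearest grid point.
            j = np.abs(states[None, :] - x_next[:, None]).argmin(axis=1)
            costs = R(t, x, controls) + J[t + 1, j]
            policy[t, i] = costs.argmin()
            J[t, i] = costs[policy[t, i]]
    return J, policy

def forward_pass(f, states, controls, policy, x0, T):
    """Step 3: roll the optimal controls forward from x0."""
    xs, us = [x0], []
    for t in range(T):
        i = np.abs(states - xs[-1]).argmin()
        u = controls[policy[t, i]]
        us.append(u)
        xs.append(xs[-1] + f(t, xs[-1], u))
    return xs, us

# Hypothetical toy problem: steer an integer position towards 0 in T = 5 steps.
states = np.arange(-5.0, 6.0)
controls = np.array([-1.0, 0.0, 1.0])
T = 5
f = lambda t, x, u: u                # x_{t+1} = x_t + u_t
R = lambda t, x, u: 0.1 * u ** 2     # path cost
phi = lambda x: x ** 2               # end cost

J, policy = backward_pass(f, R, phi, states, controls, T)
xs, us = forward_pass(f, states, controls, policy, x0=3.0, T=T)
print("optimal cost-to-go from x0=3:", J[0, np.abs(states - 3.0).argmin()])
print("optimal trajectory:", xs)
```

The backward pass visits every grid state at every time step, so its cost grows with the grid size; this is the usual limitation of tabular dynamic programming in higher dimensions.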
The HJB Equation (1/2)
• Writing the recursion over an infinitesimal step dt and expanding to first order in dt:
$$J(t, x) = \min_u \big( R(t, x, u)\,dt + J(t + dt, x + f(x, u, t)\,dt) \big)$$
$$\approx \min_u \big( R(t, x, u)\,dt + J(t, x) + \partial_t J(t, x)\,dt + \partial_x J(t, x)\, f(x, u, t)\,dt \big),$$
$$-\partial_t J(t, x) = \min_u \big( R(t, x, u) + f(x, u, t)\, \partial_x J(t, x) \big).$$
• (Hamilton-Jacobi-Bellman equation)
• The optimal control at the current x, t is given by
$$u(x, t) = \arg\min_u \big( R(t, x, u) + f(x, u, t)\, \partial_x J(t, x) \big).$$
• Boundary condition is
$$J(T, x) = \Phi(x).$$
The HJB Equation (2/2)
• Optimal control of mass on a spring
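Stochastic Differential Equations (1/2)
• Consider the random walk on the line
$$x_{t+1} = x_t + \xi_t, \qquad \xi_t = \pm\sqrt{\nu},$$
with $x_0 = 0$.
• In closed form, $x_t = \sum_{i=1}^{t} \xi_i$.
• $\langle x_t \rangle = 0, \quad \langle x_t^2 \rangle = \nu t.$
• In the continuous time limit we define
$$dx_t = x_{t+dt} - x_t = d\xi \quad \text{(Wiener process)}$$
• The conditional probability distribution
$$\rho(x, t \mid x_0, 0) = \frac{1}{\sqrt{2\pi\nu t}} \exp\left( -\frac{(x - x_0)^2}{2\nu t} \right).$$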
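The mean and variance of the random walk can be checked numerically. The sketch below is only an illustration: it assumes increments of size ±√ν with equal probability, and the function name `simulate_random_walk`, the value ν = 0.5, and the sample sizes are all hypothetical choices.

```python
import numpy as np

def simulate_random_walk(n_steps, n_walkers, nu=1.0, seed=0):
    """Return an array of shape (n_walkers, n_steps) with x_1, ..., x_t for each walker.
    Assumption: each increment xi_t is +sqrt(nu) or -sqrt(nu) with probability 1/2."""
    rng = np.random.default_rng(seed)
    xi = rng.choice([-1.0, 1.0], size=(n_walkers, n_steps)) * np.sqrt(nu)
    return xi.cumsum(axis=1)   # x_t = sum_{i=1}^t xi_i, starting from x_0 = 0

nu, n_steps = 0.5, 200
paths = simulate_random_walk(n_steps=n_steps, n_walkers=20_000, nu=nu)
mean_T = paths[:, -1].mean()
var_T = paths[:, -1].var()
print("empirical mean:", mean_T, "(theory: 0)")
print("empirical variance:", var_T, "(theory: nu * t =", nu * n_steps, ")")
```

For large t the histogram of `paths[:, -1]` should also approach the Gaussian density with variance νt, although the code above only checks the first two moments.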
Stochastic Optimal Control Theory (2/2)
• $dx = f(x(t), u(t), t)\,dt + d\xi$
• $d\xi$ is a Wiener process with $\langle d\xi_i\, d\xi_j \rangle = \nu_{ij}(t, x, u)\,dt$.
• Since $\langle dx^2 \rangle$ is of order dt, we must make a Taylor expansion up to order $dx^2$.
$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\, \partial_x J(t, x) + \frac{1}{2} \nu(t, x, u)\, \partial_x^2 J(t, x) \right).$$
Stochastic Hamilton-Jacobi-Bellman equation
$\langle dx \rangle = f(x, u, t)\,dt$ : drift
$\langle dx^2 \rangle = \nu(t, x, u)\,dt$ : diffusion
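The step from the Taylor expansion to the stochastic HJB equation can be written out explicitly. The following is a sketch for a scalar state only (an assumption made here for readability), using the drift and diffusion moments stated above and dropping terms that vanish faster than dt:

```latex
\begin{align}
J(t,x) &= \min_u \Big( R(t,x,u)\,dt + \big\langle J(t+dt,\, x+dx) \big\rangle \Big), \\
\big\langle J(t+dt,\, x+dx) \big\rangle
  &= J(t,x) + \partial_t J(t,x)\,dt + \partial_x J(t,x)\,\langle dx \rangle
     + \tfrac{1}{2}\,\partial_x^2 J(t,x)\,\langle dx^2 \rangle + o(dt) \\
  &= J(t,x) + \partial_t J(t,x)\,dt + f(x,u,t)\,\partial_x J(t,x)\,dt
     + \tfrac{1}{2}\,\nu(t,x,u)\,\partial_x^2 J(t,x)\,dt + o(dt).
\end{align}
```

Subtracting J(t,x) from both sides, dividing by dt, and letting dt go to zero leaves the stochastic HJB equation. The second-derivative term survives only because ⟨dx²⟩ is of order dt rather than dt².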
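Path Integral Control (1/2)
• When the dynamics are linear in the control and the cost is quadratic in the control, the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go:
$$J(x, t) = -\lambda \log \psi(x, t).$$
• HJB becomes
$$-\partial_t \psi(x, t) = \left( -\frac{V}{\lambda} + f^T \partial_x + \frac{1}{2} \mathrm{Tr}\left( g \nu g^T \partial_x^2 \right) \right) \psi.$$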
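Path Integral Control (2/2)
• Let $\rho(y, \tau \mid x, t)$ describe a diffusion process for $\tau > t$ defined by the Fokker-Planck equation
$$\partial_\tau \rho = -\frac{V}{\lambda} \rho - \partial_y^T (f \rho) + \frac{1}{2} \mathrm{Tr}\left( \partial_y^2\, g \nu g^T \rho \right). \qquad (1)$$
• Then
$$\psi(x, t) = \int dy\, \rho(y, T \mid x, t) \exp(-\Phi(y)/\lambda).$$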
The Diffusion Process as a Path Integral (1/2)
• Let's look at the first term in equation (1) on the previous slide. It describes a process that kills a sample trajectory at rate $V(x,t)/\lambda$, that is, with probability $V(x,t)\,dt/\lambda$ in each time step dt.
• Sampling process and Monte Carlo:
$$dx = f(x, t)\,dt + g(x, t)\,d\xi,$$
$x \to x + dx$ with probability $1 - V(x,t)\,dt/\lambda$,
$x_i \to \dagger$ with probability $V(x,t)\,dt/\lambda$; in this case, the path is killed.
$$\psi(x, t) = \int dy\, \rho(y, T \mid x, t) \exp(-\Phi(y)/\lambda) \approx \frac{1}{N} \sum_{i \in \text{alive}} \exp\big( -\Phi(x_i(T))/\lambda \big).$$
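The sampling scheme above can be sketched as a small Monte Carlo routine. This is an illustration under stated assumptions, not the presenter's implementation: the state is scalar, the noise has ⟨dξ²⟩ = ν dt, time is discretised with an Euler step, and the test problem (`f = 0`, `g = 1`, constant `V`, quadratic `Φ`) is a hypothetical choice made because ψ then has a closed form to compare against. The name `estimate_psi` is likewise hypothetical.

```python
import numpy as np

def estimate_psi(x0, t0, T, f, g, V, phi, lam, nu,
                 n_paths=100_000, dt=0.01, seed=0):
    """Estimate psi(x0, t0) by sampling the diffusion with killing.
    Assumption: scalar state, Euler discretisation with step dt."""
    rng = np.random.default_rng(seed)
    n_steps = int(round((T - t0) / dt))
    x = np.full(n_paths, float(x0))
    alive = np.ones(n_paths, dtype=bool)
    t = t0
    for _ in range(n_steps):
        # Kill each path with probability V(x,t) dt / lambda.
        alive &= rng.random(n_paths) >= V(x, t) * dt / lam
        # Otherwise move: dx = f dt + g dxi, with <dxi^2> = nu dt.
        dxi = rng.normal(0.0, np.sqrt(nu * dt), size=n_paths)
        x = x + f(x, t) * dt + g(x, t) * dxi
        t += dt
    # Killed paths contribute zero; the sum is still divided by N.
    weights = np.where(alive, np.exp(-phi(x) / lam), 0.0)
    return weights.mean()

# Hypothetical test problem with a known answer.
lam, nu, c, alpha = 1.0, 1.0, 0.5, 1.0
x0, t0, T = 0.5, 0.0, 1.0
psi_mc = estimate_psi(
    x0, t0, T,
    f=lambda x, t: np.zeros_like(x),
    g=lambda x, t: np.ones_like(x),
    V=lambda x, t: c * np.ones_like(x),
    phi=lambda x: 0.5 * alpha * x ** 2,
    lam=lam, nu=nu,
)

# Closed form for this special case: survival factor times a Gaussian integral.
a, s2 = alpha / (2 * lam), nu * (T - t0)
psi_exact = (np.exp(-c * (T - t0) / lam)
             * np.exp(-a * x0 ** 2 / (1 + 2 * a * s2)) / np.sqrt(1 + 2 * a * s2))
print("Monte Carlo:", psi_mc, " closed form:", psi_exact)
```

The two numbers should agree up to sampling noise and a small bias from the finite step dt. Cost-to-go estimates then follow from J = -λ log ψ.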
The Diffusion Process as a Path Integral (2/2)
•
$$p\big( x(t \to T) \mid x, t \big) = \frac{1}{\psi(x, t)} \exp\left( -\frac{1}{\lambda} S\big( x(t \to T) \big) \right),$$
where ψ is a partition function, J is a free energy, S is the energy of a path, and λ the temperature.
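The statistical-physics reading of this formula can be made concrete with a handful of sampled path energies. The sketch below assumes that ψ is estimated as the sample mean of exp(-S/λ) over N sampled paths (an assumption of this illustration), and the helper name `path_weights_and_free_energy` and the example energies are hypothetical.

```python
import numpy as np

def path_weights_and_free_energy(S, lam):
    """Given path energies S_i, return normalised path weights
    p_i proportional to exp(-S_i / lam) and the free energy J = -lam * log(psi).
    Assumption: psi is estimated as the sample mean of exp(-S_i / lam)."""
    S = np.asarray(S, dtype=float)
    a = -S / lam
    a_max = a.max()                       # shift for numerical stability
    log_psi = a_max + np.log(np.mean(np.exp(a - a_max)))
    J = -lam * log_psi
    w = np.exp(a - a_max)
    w /= w.sum()
    return w, J

S = [1.0, 2.0, 4.0]                       # hypothetical path energies
w, J = path_weights_and_free_energy(S, lam=1.0)
print("path weights:", w)
print("free energy J:", J)
```

Low-energy paths receive the largest weights. As the temperature λ decreases, the weights concentrate on the single cheapest path and J approaches the minimum path energy.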
Discussion
• One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time, but also among each other to maximise a common reward function.
• The path integral method has great potential for application in robotics.