Stochastic Optimal Control
Robert Stengel
Optimal Control and Estimation, MAE 546, Princeton University, 2012

Copyright 2012 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html
• Nonlinear systems with random inputs and perfect measurements
• Nonlinear systems with random inputs and imperfect measurements
• Certainty equivalence and separation
• Stochastic neighboring-optimal control
• Linear-quadratic-Gaussian (LQG) control
Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

$$\dot{x}(t) = f[x(t), u(t), w(t), t]$$
$$z(t) = x(t)$$

$$E[x(0)] = \bar{x}(0); \quad E\left\{[x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T\right\} = 0$$
$$E[w(t)] = 0; \quad E[w(t)w^T(\tau)] = W(t)\,\delta(t - \tau)$$
Assume that random disturbance effects are small and additive:

$$\dot{x}(t) = f[x(t), u(t), t] + L(t)w(t)$$
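For simulation, the white-noise term must be discretized with its sampled variance scaled by 1/Δt. A minimal Python sketch; the scalar dynamics f and the values of L and W are illustrative assumptions, not the lecture's example:

```python
# Euler propagation of x_dot = f(x,u,t) + L w with zero-mean white noise:
# E[w(t) w(tau)] = W delta(t - tau) discretizes to variance W/dt per sample.
import numpy as np

rng = np.random.default_rng(0)
dt, tf = 0.01, 5.0
L, W = 1.0, 0.04                      # disturbance input gain and spectral density
f = lambda x, u, t: -x + u            # placeholder scalar dynamics (assumption)

x, u = 1.0, 0.0
for k in range(int(tf / dt)):
    w = rng.normal(0.0, np.sqrt(W / dt))
    x += (f(x, u, k * dt) + L * w) * dt
```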
Cost Must Be an Expected Value

A deterministic cost function cannot be minimized, because
• the disturbance effect on the state cannot be predicted
• the state and control have become random variables

$$\min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\,dt$$

$$\min_{u(t)} J = E\left\{\phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\,dt\right\}$$

However, the expected value of a deterministic cost function can be minimized.
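Since only the expectation is well defined, it can be estimated by averaging the deterministic cost over sampled disturbance histories. A Monte Carlo sketch; phi, the Lagrangian, and the dynamics below are illustrative stand-ins:

```python
# Estimate E{ phi[x(tf)] + integral of L[x,u] dt } by sample averaging.
import numpy as np

f = lambda x, u, t: -x + u            # placeholder dynamics (assumption)
lagr = lambda x, u: x**2 + u**2       # running cost L[x(t), u(t)]
phi = lambda x: 10.0 * x**2           # terminal cost

def sample_cost(rng, u=0.0, dt=0.01, tf=5.0, L=1.0, W=0.04):
    x, J = 1.0, 0.0
    for k in range(int(tf / dt)):
        J += lagr(x, u) * dt
        x += (f(x, u, k * dt) + L * rng.normal(0.0, np.sqrt(W / dt))) * dt
    return J + phi(x)

rng = np.random.default_rng(1)
J_expected = np.mean([sample_cost(rng) for _ in range(500)])
```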
Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, so expected values of the Euler-Lagrange necessary conditions may not be well defined:

1) $$E[\lambda(t_f)] = E\left\{\left[\frac{\partial \phi}{\partial x}\bigg|_{t=t_f}\right]^T\right\}$$

2) $$E[\dot{\lambda}(t)] = -E\left\{\left[\frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x}\right]^T\right\}$$

3) $$E\left\{\frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u}\right\} = 0$$
Stochastic Value Function for a Nonlinear System

However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved. Base the optimization on the Principle of Optimality.

Optimal expected value function at t1:

$$V^*(t_1) = E\left\{\phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)]\,d\tau\right\} = \min_u E\left\{\phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)]\,d\tau\right\}$$
Rate of Change of the Value Function

$$\frac{dV^*}{dt}\bigg|_{t=t_1} = -E\left\{L[x^*(t_1), u^*(t_1)]\right\}$$

With perfect measurements, x(t) and u(t) can be known precisely; therefore, the total time derivative of V* is

$$\frac{dV^*}{dt}\bigg|_{t=t_1} = -L[x^*(t_1), u^*(t_1)]$$
Incremental Change in the Value Function

Apply the chain rule to the total derivative:

$$\frac{dV^*}{dt} = E\left\{\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\dot{x}\right\}$$

Incremental change in the value function, ΔV*, expanded to second degree:

$$\Delta V^* = \frac{dV^*}{dt}\Delta t = E\left\{\frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\dot{x}\,\Delta t + \frac{1}{2}\dot{x}^T\frac{\partial^2 V^*}{\partial x^2}\dot{x}\,\Delta t^2 + \cdots\right\}$$

$$= E\left\{\frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right]\Delta t + \frac{1}{2}\left[f(\cdot) + Lw(\cdot)\right]^T\frac{\partial^2 V^*}{\partial x^2}\left[f(\cdot) + Lw(\cdot)\right]\Delta t^2 + \cdots\right\}$$
Introduction of the Trace

Trace of a matrix product:

$$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$$
$$\mathrm{Tr}\left(x^TQx\right) = \mathrm{Tr}\left(xx^TQ\right) = \mathrm{Tr}\left(Qxx^T\right); \quad \dim\left[\mathrm{Tr}(\cdot)\right] = 1 \times 1$$

Apply the trace to the quadratic term and cancel one factor of Δt:

$$\frac{dV^*}{dt} \approx E\left\{\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \frac{1}{2}\mathrm{Tr}\left[\left(f(\cdot) + Lw(\cdot)\right)^T\frac{\partial^2 V^*}{\partial x^2}\left(f(\cdot) + Lw(\cdot)\right)\right]\Delta t\right\}$$

$$= E\left\{\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}\left(f(\cdot) + Lw(\cdot)\right)\left(f(\cdot) + Lw(\cdot)\right)^T\right]\Delta t\right\}$$
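The cyclic-permutation identity is easy to verify numerically; a quick check with arbitrary random matrices (illustrative only):

```python
# Numerical check of Tr(x^T Q x) = Tr(x x^T Q) = Tr(Q x x^T).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 1))
Q = rng.normal(size=(4, 4))

t1 = np.trace(x.T @ Q @ x)            # 1x1 quadratic form
t2 = np.trace(x @ x.T @ Q)
t3 = np.trace(Q @ x @ x.T)
assert np.isclose(t1, t2) and np.isclose(t2, t3)
```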
Toward the Stochastic HJB Equation

$$\frac{dV^*}{dt} = E\left\{\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}\left(f(\cdot) + Lw(\cdot)\right)\left(f(\cdot) + Lw(\cdot)\right)^T\right]\Delta t\right\}$$

$$= \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}f(\cdot) + E\left\{\frac{\partial V^*}{\partial x}Lw(\cdot)\right\} + \frac{1}{2}E\left\{\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}\left(f(\cdot) + Lw(\cdot)\right)\left(f(\cdot) + Lw(\cdot)\right)^T\right]\Delta t\right\}$$

... because x(t) and u(t) can be measured, the deterministic terms come out of the expectation.
Toward the Stochastic HJB Equation

• The disturbance is assumed to be zero-mean white noise:

$$E[w(t)] = 0; \quad E[w(t)w^T(\tau)] = W(t)\,\delta(t - \tau)$$

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}f(\cdot) + \frac{1}{2}\lim_{\Delta t \to 0}\mathrm{Tr}\left\{\frac{\partial^2 V^*}{\partial x^2}\left[E\left(f(\cdot)f(\cdot)^T\right)\Delta t + L\,E\left(w(\cdot)w(\cdot)^T\right)L^T\right]\Delta t\right\}$$

$$= \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\,f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]$$

• The uncertain disturbance input can only increase the rate of change of the value function.
Stochastic Principle of Optimality (Perfect Measurements)

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\,f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]$$

• Substitute for the total derivative, dV*/dt = −L(x*, u*)
• Solve for the partial derivative, ∂V*/∂t
• The result is the stochastic HJB equation:

$$\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]\right\}$$

Boundary (terminal) condition: $V^*(t_f) = E\left\{\phi[x(t_f)]\right\}$
Observations on the Stochastic Principle of Optimality (Perfect Measurements)

$$\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]\right\}$$

• The control has no effect on the disturbance input.
• The criterion for optimality is the same as for the deterministic case.
• Disturbance uncertainty increases the magnitude of the total optimal value function, V*(0).
Information Sets and Expected Cost

• Sigma algebra (Wikipedia definitions):
  • The collection of sets over which a measure is defined
  • The collection of events that can be assigned probabilities
  • A measurable space
The Information Set, I

• Information available at the current time, t1:
  • All measurements from the initial time, to
  • All control commands from the initial time

$$I[t_o, t_1] = \left\{z[t_o, t_1],\, u[t_o, t_1]\right\}$$

• Plus the available model structure, parameters, and statistics:

$$I[t_o, t_1] = \left\{z[t_o, t_1],\, u[t_o, t_1],\, f(\cdot),\, Q,\, R,\, \ldots\right\}$$

A Derived Information Set, ID

• Measurements may be directly useful, e.g., for
  • Displays
  • Simple feedback control
• ... or they may require processing, e.g.,
  • Transformation
  • Estimation
• Example of a derived information set: the history of the mean and covariance from a state estimator:

$$I_D[t_o, t_1] = \left\{\hat{x}[t_o, t_1],\, P[t_o, t_1],\, u[t_o, t_1]\right\}$$

• Markov derived information set: the most current mean and covariance from a state estimator:

$$I_{MD}(t_1) = \left\{\hat{x}(t_1),\, P(t_1),\, u(t_1)\right\}$$

Additional Derived Information Sets

• Multiple-model derived information set: parallel estimates of the current mean, covariance, and hypothesis probability mass function:

$$I_{MM}(t_1) = \left\{\left[\hat{x}_A(t_1), P_A(t_1), u(t_1), \Pr(H_A)\right],\ \left[\hat{x}_B(t_1), P_B(t_1), u(t_1), \Pr(H_B)\right],\ \ldots\right\}$$
Required and Available Information Sets for Optimal Control

• Optimal control requires propagation of information back from the final time.
• Hence, it requires the entire information set, extending from to to tf:

$$I[t_o, t_f]$$

• Separate the information set into knowable and predictable parts:

$$I[t_o, t_f] = I[t_o, t_1] + I(t_1, t_f]$$

• Knowable information has been received.
• Predictable information is yet to come.
Expected Values of State and Control

• Expected values of the state and control are conditioned on the information set:

$$E\left[x(t) \mid I_D\right] = \hat{x}(t)$$
$$E\left\{[x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D\right\} = P(t)$$

... where the conditional expected values are obtained from a Kalman-Bucy filter.
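For a linear-Gaussian model, the conditional mean and covariance evolve by the Kalman-Bucy filter equations. A minimal Euler-integration sketch, assuming measurements z = Hx + n with noise spectral density N (all matrix names are assumptions, not the lecture's example):

```python
# One Euler step of the Kalman-Bucy filter:
#   x_hat_dot = F x_hat + G u + K (z - H x_hat),  K = P H^T N^-1
#   P_dot     = F P + P F^T + L W L^T - P H^T N^-1 H P
import numpy as np

def kalman_bucy_step(x_hat, P, u, z, F, G, H, L, W, N, dt):
    K = P @ H.T @ np.linalg.inv(N)                       # filter gain
    x_hat = x_hat + (F @ x_hat + G @ u + K @ (z - H @ x_hat)) * dt
    P = P + (F @ P + P @ F.T + L @ W @ L.T - K @ N @ K.T) * dt
    return x_hat, P
```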
Dependence of the Stochastic Cost Function on the Information Set

• Expand the state covariance:

$$J = \frac{1}{2}E\left\{E\left[\mathrm{Tr}\left(S(t_f)x(t_f)x^T(t_f)\right) \mid I_D\right] + \int_0^{t_f} E\left\{\mathrm{Tr}\left[Qx(t)x^T(t)\right]\right\}dt + \int_0^{t_f} E\left\{\mathrm{Tr}\left[Ru(t)u^T(t)\right]\right\}dt\right\}$$

$$P(t) = E\left\{[x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D\right\} = E\left\{x(t)x^T(t) - \hat{x}(t)x^T(t) - x(t)\hat{x}^T(t) + \hat{x}(t)\hat{x}^T(t) \mid I_D\right\}$$

$$E\left\{\hat{x}(t)x^T(t) \mid I_D\right\} = E\left\{x(t)\hat{x}^T(t) \mid I_D\right\} = \hat{x}(t)\hat{x}^T(t)$$

$$P(t) = E\left\{x(t)x^T(t) \mid I_D\right\} - \hat{x}(t)\hat{x}^T(t)$$

or

$$E\left\{x(t)x^T(t) \mid I_D\right\} = P(t) + \hat{x}(t)\hat{x}^T(t)$$
Certainty-Equivalent and Stochastic Incremental Costs

• The cost function has two parts: the certainty-equivalent cost and the stochastic incremental cost.

$$J = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)\left(P(t_f) + \hat{x}(t_f)\hat{x}^T(t_f)\right)\right] + \int_0^{t_f}\mathrm{Tr}\left[Q\left(P(t) + \hat{x}(t)\hat{x}^T(t)\right)\right]dt + \int_0^{t_f}\mathrm{Tr}\left[Ru(t)u^T(t)\right]dt\right\} \triangleq J_{CE} + J_S$$

$$J_{CE} = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)\hat{x}(t_f)\hat{x}^T(t_f)\right] + \int_0^{t_f}\mathrm{Tr}\left[Q\hat{x}(t)\hat{x}^T(t)\right]dt + \int_0^{t_f}\mathrm{Tr}\left[Ru(t)u^T(t)\right]dt\right\}$$

$$J_S = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)P(t_f)\right] + \int_0^{t_f}\mathrm{Tr}\left[QP(t)\right]dt\right\}$$
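Given estimator histories on a uniform time grid, the two parts can be evaluated directly. A rectangle-rule sketch (the array shapes are assumptions):

```python
# J_CE uses the estimated mean x_hat; J_S uses the estimation covariance P.
import numpy as np

def cost_split(Sf, Q, R, x_hat, P, u, dt):
    # x_hat: (T, n); P: (T, n, n); u: (T, m) histories from the estimator
    J_CE = 0.5 * (x_hat[-1] @ Sf @ x_hat[-1]
                  + sum(x @ Q @ x for x in x_hat) * dt
                  + sum(v @ R @ v for v in u) * dt)
    J_S = 0.5 * (np.trace(Sf @ P[-1]) + sum(np.trace(Q @ Pk) for Pk in P) * dt)
    return J_CE, J_S
```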
Expected Cost of the Trajectory

• Optimized cost function:

$$V^*(t_o) \triangleq J^*(t_f) = E\left\{\phi[x^*(t_f)] + \int_{t_o}^{t_f} L[x^*(\tau), u^*(\tau)]\,d\tau\right\}$$

• Law of total expectation:

$$E(\cdot) = E\left(\cdot \mid I[t_o, t_1]\right)\Pr\left\{I[t_o, t_1]\right\} + E\left(\cdot \mid I(t_1, t_f]\right)\Pr\left\{I(t_1, t_f]\right\} = E\left[E(\cdot \mid I)\right]$$

• Because the past is established at t1:

$$E(J^*) = E\left(J^* \mid I[t_o, t_1]\right)\cdot 1 + E\left(J^* \mid I(t_1, t_f]\right)\Pr\left\{I(t_1, t_f]\right\} = E\left(J^* \mid I[t_o, t_1]\right) + E\left(J^* \mid I(t_1, t_f]\right)\Pr\left\{I(t_1, t_f]\right\}$$
Expected Cost of the Trajectory

• For planning or post-trajectory analysis, one can assume that the entire information set is available.
• For real-time control, t1 < tf, and future information can only be predicted.
• If the separation property applies (TBD), the future conditioning effect can be predicted.
• If not, the future conditioning effect can only be approximated.
Separation Property and Certainty Equivalence

Separation Property
• The Optimal Control Law and the Optimal Estimation Law can be derived separately.
• Their derivations are strictly independent.

Certainty Equivalence Property
• The separation property holds, plus ...
• The Stochastic Optimal Control Law and the Deterministic Optimal Control Law are the same.
• The Optimal Estimation Law can be derived separately.
• Linear-quadratic-Gaussian control is certainty-equivalent.
Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition
Immune Response Example

• Optimal open-loop drug therapy (control). Assumptions:
  • Initial condition known without error
  • No disturbance
• Optimal closed-loop therapy. Assumptions:
  • Small error in initial condition
  • Small disturbance
  • Perfect measurement of the state
• Stochastic optimal closed-loop therapy. Assumptions:
  • Small error in initial condition
  • Small disturbance
  • Imperfect measurement
  • Certainty equivalence applies to the perturbation control
Immune Response Example with Optimal Feedback Control

Open-Loop Optimal Control for Lethal Initial Condition

Open- and Closed-Loop Optimal Control for 150% Lethal Initial Condition
Immune Response with Stochastic Optimal Feedback Control
(Random Disturbance and Measurement Error Not Simulated)

• Low-bandwidth estimator (|W| < |N|): initial control is too sluggish to prevent divergence.
• High-bandwidth estimator (|W| > |N|): quick initial control prevents divergence.
Immune Response to Random Disturbance with Stochastic Neighboring-Optimal Control

• Disturbance due to:
  • Re-infection
  • Sequestered pockets of pathogen
• Noisy measurements
• Closed-loop therapy is robust ... but not robust enough: organ death occurs in one case.
• The probability of satisfactory therapy can be maximized by stochastic redesign of the controller.
Stochastic Linear-Quadratic Optimal Control

Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

• Quadratic value function:

$$V(t_o) = E\left\{\phi[x(t_f)] + \int_{t_o}^{t_f} L[x(\tau), u(\tau)]\,d\tau\right\} = \frac{1}{2}E\left\{x^T(t_f)S(t_f)x(t_f) + \int_{t_o}^{t_f}\begin{bmatrix} x^T(t) & u^T(t)\end{bmatrix}\begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t)\end{bmatrix}\begin{bmatrix} x(t) \\ u(t)\end{bmatrix}dt\right\}$$

• Linear dynamic constraint:

$$\dot{x}(t) = F(t)x(t) + G(t)u(t) + L(t)w(t)$$
Components of the LQ Value Function

• The quadratic value function has two parts:

$$V(t) = \frac{1}{2}x^T(t)S(t)x(t) + v(t)$$

• Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2}x^T(t)S(t)x(t)$$

• Stochastic value function increment:

$$v(t) = \frac{1}{2}\int_t^{t_f}\mathrm{Tr}\left[S(\tau)L(\tau)W(\tau)L^T(\tau)\right]d\tau$$
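A sketch of the increment as a trace integral over a stored S(t) history, by the rectangle rule (constant L and W are assumed for brevity):

```python
# v(t0) = 0.5 * integral over [t0, tf] of Tr[S(tau) L W L^T] d tau
import numpy as np

def stochastic_increment(S_hist, L, W, dt):
    # S_hist: list/array of S(t) samples on a uniform grid from t0 to tf
    return 0.5 * sum(np.trace(S @ L @ W @ L.T) for S in S_hist) * dt
```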
Value Function Gradient and Hessian

• Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2}x^T(t)S(t)x(t)$$

• Gradient with respect to the state:

$$\frac{\partial V}{\partial x}(t) = x^T(t)S(t)$$

• Hessian with respect to the state:

$$\frac{\partial^2 V}{\partial x^2}(t) = S(t)$$
Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

• Certainty-equivalent plus stochastic terms:

$$\frac{\partial V^*}{\partial t} = -\min_u\left[\frac{1}{2}E\left(x^{*T}Qx^* + 2x^{*T}Mu + u^TRu\right) + x^{*T}S\left(Fx^* + Gu\right) + \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)\right]$$

$$= -\min_u\left[\frac{1}{2}\left(x^{*T}Qx^* + 2x^{*T}Mu + u^TRu\right) + x^{*T}S\left(Fx^* + Gu\right) + \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)\right]$$

• Terminal condition:

$$V(t_f) = \frac{1}{2}x^T(t_f)S(t_f)x(t_f)$$
Optimal Control Law

• Differentiate the right side of the HJB equation with respect to u and set the result equal to zero:

$$\frac{\partial\left(\partial V^*/\partial t\right)}{\partial u} = 0 = \left(x^TM + u^TR\right) + x^TSG$$

• Solve for u, obtaining the feedback control law:

$$u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t) \triangleq -C(t)x(t)$$

LQ Optimal Control Law

$$u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t) \triangleq -C(t)x(t)$$

• The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.
Matrix Riccati Equation

• Substitute the optimal control law, $u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t)$, in the HJB equation:

$$\frac{1}{2}x^T\dot{S}x + \dot{v} = \frac{1}{2}x^T\left[-\left(Q - MR^{-1}M^T\right) - \left(F - GR^{-1}M^T\right)^TS - S\left(F - GR^{-1}M^T\right) + SGR^{-1}G^TS\right]x - \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)$$

• Matching the quadratic terms, the matrix Riccati equation provides S(t):

$$\dot{S}(t) = -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right] - \left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^TS(t) - S(t)\left[F(t) - G(t)R^{-1}(t)M^T(t)\right] + S(t)G(t)R^{-1}(t)G^T(t)S(t), \quad S(t_f) = \phi_{xx}(t_f)$$

• The stochastic value function increases the cost due to the disturbance; however, its calculation is independent of the Riccati equation:

$$\dot{v}(t) = -\frac{1}{2}\mathrm{Tr}\left[S(t)L(t)W(t)L^T(t)\right], \quad v(t_f) = 0$$
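A backward-Euler sketch of the Riccati integration, also returning the gain history C(t) = R⁻¹(GᵀS + Mᵀ); time-invariant F, G, Q, R, M are assumed for brevity:

```python
# Integrate S backward from S(tf), accumulating the LQ feedback gain C(t).
import numpy as np

def lq_gains(F, G, Q, R, M, Sf, tf, dt):
    Rinv = np.linalg.inv(R)
    Fbar = F - G @ Rinv @ M.T            # cross-term-adjusted dynamics
    Qbar = Q - M @ Rinv @ M.T            # cross-term-adjusted state weighting
    S, S_hist, C_hist = Sf, [], []
    for _ in range(int(tf / dt)):
        C_hist.append(Rinv @ (G.T @ S + M.T))   # gain for u = -C x
        S_hist.append(S)
        S_dot = -Qbar - Fbar.T @ S - S @ Fbar + S @ G @ Rinv @ G.T @ S
        S = S - S_dot * dt               # backward step: S(t - dt)
    return S_hist[::-1], C_hist[::-1]    # ordered from to to tf
```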
Evaluation of the Total Cost (Imperfect Measurements)

• Stochastic quadratic cost function, neglecting cross terms:

$$J = \frac{1}{2}E\left\{x^T(t_f)S(t_f)x(t_f) + \int_{t_o}^{t_f}\begin{bmatrix} x^T(t) & u^T(t)\end{bmatrix}\begin{bmatrix} Q(t) & 0 \\ 0 & R(t)\end{bmatrix}\begin{bmatrix} x(t) \\ u(t)\end{bmatrix}dt\right\}$$

$$= \frac{1}{2}\mathrm{Tr}\left\{S(t_f)E\left[x(t_f)x^T(t_f)\right] + \int_{t_o}^{t_f}\left(Q(t)E\left[x(t)x^T(t)\right] + R(t)E\left[u(t)u^T(t)\right]\right)dt\right\}$$

or

$$J = \frac{1}{2}\mathrm{Tr}\left\{S(t_f)P(t_f) + \int_{t_o}^{t_f}\left[Q(t)P(t) + R(t)U(t)\right]dt\right\}$$

where

$$P(t) \triangleq E\left[x(t)x^T(t)\right], \quad U(t) \triangleq E\left[u(t)u^T(t)\right]$$
Optimal Control Covariance

• Optimal control vector:

$$u(t) = -C(t)\hat{x}(t)$$

• Optimal control covariance (with M = 0, so that C = R⁻¹GᵀS):

$$U(t) = C(t)P(t)C^T(t) = R^{-1}(t)G^T(t)S(t)P(t)S(t)G(t)R^{-1}(t)$$
Revise Cost to Reflect State and Adjoint Covariance Dynamics

• Integration by parts:

$$S(t)P(t)\Big|_{t_o}^{t_f} = \int_{t_o}^{t_f}\left[\dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt$$

$$S(t_f)P(t_f) = S(t_o)P(t_o) + \int_{t_o}^{t_f}\left[\dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt$$

• Rewrite the cost function to incorporate the initial cost:

$$J = \frac{1}{2}\mathrm{Tr}\left\{S(t_o)P(t_o) + \int_{t_o}^{t_f}\left[Q(t)P(t) + R(t)U(t) + \dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt\right\}$$
Evolution of State and Adjoint Covariance Matrices (No Control)

$$u(t) = 0; \quad U(t) = 0$$

• State covariance response to the random disturbance:

$$\dot{P}(t) = F(t)P(t) + P(t)F^T(t) + L(t)W(t)L^T(t), \quad P(t_o)\ \text{given}$$

• Adjoint covariance response to the terminal cost:

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t), \quad S(t_f)\ \text{given}$$
Evolution of State and Adjoint Covariance Matrices (Optimal Control)

• State covariance response to the random disturbance (dependent on S(t) through the gain C(t)):

$$\dot{P}(t) = \left[F(t) - G(t)C(t)\right]P(t) + P(t)\left[F(t) - G(t)C(t)\right]^T + L(t)W(t)L^T(t)$$

• Adjoint covariance response to the terminal cost (independent of P(t)):

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t) + S(t)G(t)R^{-1}(t)G^T(t)S(t)$$
Total Cost With and Without Control

• With no control:

$$J_{no\ control} = \frac{1}{2}\mathrm{Tr}\left[S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\,dt\right]$$

• With optimal control, the equation for the cost has the same form:

$$J_{optimal\ control} = \frac{1}{2}\mathrm{Tr}\left[S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\,dt\right]$$

• ... but the evolution of S(t), and hence the value of S(t_o), is different in each case.
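A sketch evaluating both costs by integrating S backward under each rule; the controlled case adds the Riccati term, and time-invariant matrices are assumed:

```python
# J = 0.5 * Tr[ S(to) P(to) + integral of S L W L^T dt ] in both cases.
import numpy as np

def total_cost(F, G, Q, R, L, W, Sf, P0, tf, dt, controlled):
    S, J_int = Sf, 0.0
    for _ in range(int(tf / dt)):
        J_int += np.trace(S @ L @ W @ L.T) * dt
        S_dot = -F.T @ S - S @ F - Q
        if controlled:                   # closed-loop (Riccati) adjoint equation
            S_dot += S @ G @ np.linalg.inv(R) @ G.T @ S
        S = S - S_dot * dt               # backward step from tf toward to
    return 0.5 * (np.trace(S @ P0) + J_int)   # S now approximates S(to)
```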
Next Time: Linear-Quadratic-Gaussian Regulators

Supplemental Material
Dual Control (Fel'dbaum, 1965)

• Nonlinear system
• Uncertain system parameters to be estimated
• Parameter estimation can be aided by test inputs
• Approach: minimize the value function with three increments:
  • Nominal control
  • Cautious control
  • Probing control

$$\min_u V^* = \min_u\left(V^*_{nominal} + V^*_{cautious} + V^*_{probing}\right)$$

Estimation and control calculations are coupled and necessarily recursive.
Adaptive Critic Controller

The nonlinear control law, c, takes the general form

$$u(t) = c\left[x(t), a, y^*(t)\right]$$

where x(t) is the state, a contains the parameters of the operating point, and y*(t) is the command input.

On-line adaptive critic controller:
• Nonlinear control law (action network)
• Criticizes non-optimal performance via the critic network
• Adapts control gains to improve performance
• Adapts the cost model to improve its estimate
Algebraic Initialization of Neural Networks (Ferrari and Stengel)

• Initially, c[x, a, y*] is unknown.
• Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points.
• Scheduling variable: a

$$u(t) = C_F(a)\,y^* + C_B(a)\,\Delta x + C_I(a)\int\Delta\tilde{y}(t)\,dt \triangleq c\left[x(t), a, y^*(t)\right]$$
Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

$$u(t) = NN_F\left[y^*(t), a(t)\right] + NN_B\left[x(t), a(t)\right] + NN_I\left[\int\Delta\tilde{y}\,dt,\ a(t)\right] = c\left[x(t), a, y^*(t)\right]$$
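A toy sketch of one such replacement: a single-hidden-layer sigmoidal network mapping its input and the scheduling variable to a control contribution. The sizes, tanh activation, and random initialization are assumptions; in the actual scheme the weights are set algebraically to match the PI-LQ gains at the operating points.

```python
# Stand-in for one gain matrix, e.g. NN_B[x(t), a(t)].
import numpy as np

class SigmoidalNet:
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))

    def __call__(self, v, a):
        z = np.concatenate([v, a])       # network input plus scheduling variable
        return self.W2 @ np.tanh(self.W1 @ z + self.b1)

rng = np.random.default_rng(3)
NN_B = SigmoidalNet(n_in=5, n_hidden=8, n_out=2, rng=rng)   # 4 states + 1 scheduling var
u_fb = NN_B(np.zeros(4), np.array([0.5]))                   # feedback term of u(t)
```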
Initial Neural Control Law

• Algebraic training of the neural networks produces an exact fit of the linear control gains and trim conditions at n operating points.
• Interpolation and gain scheduling are performed by the neural networks.
• One node per operating point in each neural network.
On-line Optimization of the Adaptive Critic Neural Network Controller

• The critic adapts the neural network weights to improve performance, using approximate dynamic programming:
  • Heuristic Dynamic Programming (HDP) adaptive critic
  • Dual Heuristic Programming (DHP) adaptive critic for the receding-horizon optimization problem
• Critic and action (i.e., control) networks are adapted concurrently.
• LQ-PI cost function applied to the nonlinear problem.
• Modified resilient backpropagation for neural network training.

$$V\left[x(t_k)\right] = L\left[x(t_k), u(t_k)\right] + V\left[x(t_{k+1})\right]$$

$$\frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x}\frac{\partial x}{\partial u} = 0$$

$$\frac{\partial V\left[x_a(t)\right]}{\partial x_a(t)} = NN_C\left[x_a(t), a(t)\right]$$
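For the DHP critic, the recurrence above is differentiated with respect to the state to form a gradient target. A sketch for a linearized step x_{k+1} = Φx_k + Γu_k with quadratic utility; the names and the quadratic form are assumptions:

```python
# Target for lambda_k = dV/dx at x_k, from differentiating
# V(x_k) = L(x_k, u_k) + V(x_{k+1}) with u_k = c(x_k).
import numpy as np

def dhp_critic_target(x_k, u_k, lam_next, Phi, Gam, Q, R, dudx):
    dLdx = Q @ x_k + dudx.T @ (R @ u_k)      # direct plus through-control terms
    dxnext_dx = Phi + Gam @ dudx             # total sensitivity of x_{k+1} to x_k
    return dLdx + dxnext_dx.T @ lam_next
```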
Action Network On-line Training

Train the action network at time t, holding the critic parameters fixed.

[Block diagram: x_a(t) and a(t) drive the action network NN_A; the aircraft model supplies transition matrices and a state prediction; utility function derivatives and the critic NN_C feed the optimality condition, which generates the NN_A training target.]
Critic Network On-line Training

Train the critic network at time t, holding the action parameters fixed.

[Block diagram: x_a(t) and a(t) drive NN_C(old) and the action network NN_A; the aircraft model supplies transition matrices and a state prediction; utility function derivatives and the target cost gradient generate the NN_C training target.]