Stochastic Optimal Control
Robert Stengel
Optimal Control and Estimation, MAE 546, Princeton University, 2012

Copyright 2012 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html
• Nonlinear systems with random inputs and perfect measurements
• Nonlinear systems with random inputs and imperfect measurements
• Certainty equivalence and separation
• Stochastic neighboring-optimal control
• Linear-quadratic-Gaussian (LQG) control
Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

$$\dot{x}(t) = f[x(t), u(t), w(t), t]$$
$$z(t) = x(t)$$

$$E[x(0)] = \bar{x}(0); \quad E\left\{[x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T\right\} = 0$$
$$E[w(t)] = 0; \quad E[w(t)w^T(\tau)] = W(t)\,\delta(t - \tau)$$
Assume that random disturbance effects are small and additive:

$$\dot{x}(t) = f[x(t), u(t), t] + L(t)w(t)$$
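For simulation, the white-noise term must be discretized with its sampled variance scaled by 1/Δt. A minimal Python sketch; the scalar dynamics f and the values of L and W are illustrative assumptions, not the lecture's example:

```python
# Euler propagation of x_dot = f(x,u,t) + L w with zero-mean white noise:
# E[w(t) w(tau)] = W delta(t - tau) discretizes to variance W/dt per sample.
import numpy as np

rng = np.random.default_rng(0)
dt, tf = 0.01, 5.0
L, W = 1.0, 0.04                      # disturbance input gain and spectral density
f = lambda x, u, t: -x + u            # placeholder scalar dynamics (assumption)

x, u = 1.0, 0.0
for k in range(int(tf / dt)):
    w = rng.normal(0.0, np.sqrt(W / dt))
    x += (f(x, u, k * dt) + L * w) * dt
```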
Cost Must Be an Expected Value

A deterministic cost function cannot be minimized, because
• the disturbance effect on the state cannot be predicted
• the state and control have become random variables

$$\min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\,dt$$

$$\min_{u(t)} J = E\left\{\phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\,dt\right\}$$

However, the expected value of a deterministic cost function can be minimized.
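Since only the expectation is well defined, it can be estimated by averaging the deterministic cost over sampled disturbance histories. A Monte Carlo sketch; phi, the Lagrangian, and the dynamics below are illustrative stand-ins:

```python
# Estimate E{ phi[x(tf)] + integral of L[x,u] dt } by sample averaging.
import numpy as np

f = lambda x, u, t: -x + u            # placeholder dynamics (assumption)
lagr = lambda x, u: x**2 + u**2       # running cost L[x(t), u(t)]
phi = lambda x: 10.0 * x**2           # terminal cost

def sample_cost(rng, u=0.0, dt=0.01, tf=5.0, L=1.0, W=0.04):
    x, J = 1.0, 0.0
    for k in range(int(tf / dt)):
        J += lagr(x, u) * dt
        x += (f(x, u, k * dt) + L * rng.normal(0.0, np.sqrt(W / dt))) * dt
    return J + phi(x)

rng = np.random.default_rng(1)
J_expected = np.mean([sample_cost(rng) for _ in range(500)])
```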
Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, so expected values of the Euler-Lagrange necessary conditions may not be well defined:

1) $$E[\lambda(t_f)] = E\left\{\left[\frac{\partial \phi}{\partial x}\bigg|_{t=t_f}\right]^T\right\}$$

2) $$E[\dot{\lambda}(t)] = -E\left\{\left[\frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x}\right]^T\right\}$$

3) $$E\left\{\frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u}\right\} = 0$$
Stochastic Value Function for a Nonlinear System

However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved. Base the optimization on the Principle of Optimality.

Optimal expected value function at t1:

$$V^*(t_1) = E\left\{\phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)]\,d\tau\right\} = \min_u E\left\{\phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)]\,d\tau\right\}$$
Rate of Change of the Value Function

$$\frac{dV^*}{dt}\bigg|_{t=t_1} = -E\left\{L[x^*(t_1), u^*(t_1)]\right\}$$

With perfect measurements, x(t) and u(t) can be known precisely; therefore, the total time derivative of V* is

$$\frac{dV^*}{dt}\bigg|_{t=t_1} = -L[x^*(t_1), u^*(t_1)]$$
Incremental Change in the Value Function

Apply the chain rule to the total derivative:

$$\frac{dV^*}{dt} = E\left\{\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\dot{x}\right\}$$

Incremental change in the value function, ΔV*, expanded to second degree:

$$\Delta V^* = \frac{dV^*}{dt}\Delta t = E\left\{\frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\dot{x}\,\Delta t + \frac{1}{2}\dot{x}^T\frac{\partial^2 V^*}{\partial x^2}\dot{x}\,\Delta t^2 + \cdots\right\}$$

$$= E\left\{\frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right]\Delta t + \frac{1}{2}\left[f(\cdot) + Lw(\cdot)\right]^T\frac{\partial^2 V^*}{\partial x^2}\left[f(\cdot) + Lw(\cdot)\right]\Delta t^2 + \cdots\right\}$$
Introduction of the Trace

Trace of a matrix product:

$$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$$
$$\mathrm{Tr}\left(x^TQx\right) = \mathrm{Tr}\left(xx^TQ\right) = \mathrm{Tr}\left(Qxx^T\right); \quad \dim\left[\mathrm{Tr}(\cdot)\right] = 1 \times 1$$

Apply the trace to the quadratic term and cancel one factor of Δt:

$$\frac{dV^*}{dt} \approx E\left\{\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \frac{1}{2}\mathrm{Tr}\left[\left(f(\cdot) + Lw(\cdot)\right)^T\frac{\partial^2 V^*}{\partial x^2}\left(f(\cdot) + Lw(\cdot)\right)\right]\Delta t\right\}$$

$$= E\left\{\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}\left(f(\cdot) + Lw(\cdot)\right)\left(f(\cdot) + Lw(\cdot)\right)^T\right]\Delta t\right\}$$
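The cyclic-permutation identity is easy to verify numerically; a quick check with arbitrary random matrices (illustrative only):

```python
# Numerical check of Tr(x^T Q x) = Tr(x x^T Q) = Tr(Q x x^T).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 1))
Q = rng.normal(size=(4, 4))

t1 = np.trace(x.T @ Q @ x)            # 1x1 quadratic form
t2 = np.trace(x @ x.T @ Q)
t3 = np.trace(Q @ x @ x.T)
assert np.isclose(t1, t2) and np.isclose(t2, t3)
```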
Toward the Stochastic HJB Equation

$$\frac{dV^*}{dt} = E\left\{\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}\left(f(\cdot) + Lw(\cdot)\right)\left(f(\cdot) + Lw(\cdot)\right)^T\right]\Delta t\right\}$$

$$= \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}f(\cdot) + E\left\{\frac{\partial V^*}{\partial x}Lw(\cdot)\right\} + \frac{1}{2}E\left\{\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}\left(f(\cdot) + Lw(\cdot)\right)\left(f(\cdot) + Lw(\cdot)\right)^T\right]\Delta t\right\}$$

... because x(t) and u(t) can be measured, the deterministic terms come out of the expectation.
Toward the Stochastic HJB Equation

• The disturbance is assumed to be zero-mean white noise:

$$E[w(t)] = 0; \quad E[w(t)w^T(\tau)] = W(t)\,\delta(t - \tau)$$

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}f(\cdot) + \frac{1}{2}\lim_{\Delta t \to 0}\mathrm{Tr}\left\{\frac{\partial^2 V^*}{\partial x^2}\left[E\left(f(\cdot)f(\cdot)^T\right)\Delta t + L\,E\left(w(\cdot)w(\cdot)^T\right)L^T\right]\Delta t\right\}$$

$$= \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\,f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]$$

• The uncertain disturbance input can only increase the rate of change of the value function.
Stochastic Principle of Optimality (Perfect Measurements)

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\,f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]$$

• Substitute for the total derivative, dV*/dt = −L(x*, u*)
• Solve for the partial derivative, ∂V*/∂t
• The result is the stochastic HJB equation:

$$\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]\right\}$$

Boundary (terminal) condition: $V^*(t_f) = E\left\{\phi[x(t_f)]\right\}$
Observations on the Stochastic Principle of Optimality (Perfect Measurements)

$$\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]\right\}$$

• The control has no effect on the disturbance input.
• The criterion for optimality is the same as for the deterministic case.
• Disturbance uncertainty increases the magnitude of the total optimal value function, V*(0).
Information Sets and Expected Cost

• Sigma algebra (Wikipedia definitions):
  • The collection of sets over which a measure is defined
  • The collection of events that can be assigned probabilities
  • A measurable space
The Information Set, I

• Information available at the current time, t1:
  • All measurements from the initial time, to
  • All control commands from the initial time

$$I[t_o, t_1] = \left\{z[t_o, t_1],\, u[t_o, t_1]\right\}$$

• Plus the available model structure, parameters, and statistics:

$$I[t_o, t_1] = \left\{z[t_o, t_1],\, u[t_o, t_1],\, f(\cdot),\, Q,\, R,\, \ldots\right\}$$

A Derived Information Set, ID

• Measurements may be directly useful, e.g., for
  • Displays
  • Simple feedback control
• ... or they may require processing, e.g.,
  • Transformation
  • Estimation
• Example of a derived information set: the history of the mean and covariance from a state estimator:

$$I_D[t_o, t_1] = \left\{\hat{x}[t_o, t_1],\, P[t_o, t_1],\, u[t_o, t_1]\right\}$$

• Markov derived information set: the most current mean and covariance from a state estimator:

$$I_{MD}(t_1) = \left\{\hat{x}(t_1),\, P(t_1),\, u(t_1)\right\}$$

Additional Derived Information Sets

• Multiple-model derived information set: parallel estimates of the current mean, covariance, and hypothesis probability mass function:

$$I_{MM}(t_1) = \left\{\left[\hat{x}_A(t_1), P_A(t_1), u(t_1), \Pr(H_A)\right],\ \left[\hat{x}_B(t_1), P_B(t_1), u(t_1), \Pr(H_B)\right],\ \ldots\right\}$$
Required and Available Information Sets for Optimal Control

• Optimal control requires propagation of information back from the final time.
• Hence, it requires the entire information set, extending from to to tf:

$$I[t_o, t_f]$$

• Separate the information set into knowable and predictable parts:

$$I[t_o, t_f] = I[t_o, t_1] + I(t_1, t_f]$$

• Knowable information has been received.
• Predictable information is yet to come.
Expected Values of State and Control

• Expected values of the state and control are conditioned on the information set:

$$E\left[x(t) \mid I_D\right] = \hat{x}(t)$$
$$E\left\{[x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D\right\} = P(t)$$

... where the conditional expected values are obtained from a Kalman-Bucy filter.
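For a linear-Gaussian model, the conditional mean and covariance evolve by the Kalman-Bucy filter equations. A minimal Euler-integration sketch, assuming measurements z = Hx + n with noise spectral density N (all matrix names are assumptions, not the lecture's example):

```python
# One Euler step of the Kalman-Bucy filter:
#   x_hat_dot = F x_hat + G u + K (z - H x_hat),  K = P H^T N^-1
#   P_dot     = F P + P F^T + L W L^T - P H^T N^-1 H P
import numpy as np

def kalman_bucy_step(x_hat, P, u, z, F, G, H, L, W, N, dt):
    K = P @ H.T @ np.linalg.inv(N)                       # filter gain
    x_hat = x_hat + (F @ x_hat + G @ u + K @ (z - H @ x_hat)) * dt
    P = P + (F @ P + P @ F.T + L @ W @ L.T - K @ N @ K.T) * dt
    return x_hat, P
```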
Dependence of the Stochastic Cost Function on the Information Set

• Expand the state covariance:

$$J = \frac{1}{2}E\left\{E\left[\mathrm{Tr}\left(S(t_f)x(t_f)x^T(t_f)\right) \mid I_D\right] + \int_0^{t_f} E\left\{\mathrm{Tr}\left[Qx(t)x^T(t)\right]\right\}dt + \int_0^{t_f} E\left\{\mathrm{Tr}\left[Ru(t)u^T(t)\right]\right\}dt\right\}$$

$$P(t) = E\left\{[x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D\right\} = E\left\{x(t)x^T(t) - \hat{x}(t)x^T(t) - x(t)\hat{x}^T(t) + \hat{x}(t)\hat{x}^T(t) \mid I_D\right\}$$

$$E\left\{\hat{x}(t)x^T(t) \mid I_D\right\} = E\left\{x(t)\hat{x}^T(t) \mid I_D\right\} = \hat{x}(t)\hat{x}^T(t)$$

$$P(t) = E\left\{x(t)x^T(t) \mid I_D\right\} - \hat{x}(t)\hat{x}^T(t)$$

or

$$E\left\{x(t)x^T(t) \mid I_D\right\} = P(t) + \hat{x}(t)\hat{x}^T(t)$$
Certainty-Equivalent and Stochastic Incremental Costs

• The cost function has two parts: the certainty-equivalent cost and the stochastic incremental cost.

$$J = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)\left(P(t_f) + \hat{x}(t_f)\hat{x}^T(t_f)\right)\right] + \int_0^{t_f}\mathrm{Tr}\left[Q\left(P(t) + \hat{x}(t)\hat{x}^T(t)\right)\right]dt + \int_0^{t_f}\mathrm{Tr}\left[Ru(t)u^T(t)\right]dt\right\} \triangleq J_{CE} + J_S$$

$$J_{CE} = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)\hat{x}(t_f)\hat{x}^T(t_f)\right] + \int_0^{t_f}\mathrm{Tr}\left[Q\hat{x}(t)\hat{x}^T(t)\right]dt + \int_0^{t_f}\mathrm{Tr}\left[Ru(t)u^T(t)\right]dt\right\}$$

$$J_S = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)P(t_f)\right] + \int_0^{t_f}\mathrm{Tr}\left[QP(t)\right]dt\right\}$$
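Given estimator histories on a uniform time grid, the two parts can be evaluated directly. A rectangle-rule sketch (the array shapes are assumptions):

```python
# J_CE uses the estimated mean x_hat; J_S uses the estimation covariance P.
import numpy as np

def cost_split(Sf, Q, R, x_hat, P, u, dt):
    # x_hat: (T, n); P: (T, n, n); u: (T, m) histories from the estimator
    J_CE = 0.5 * (x_hat[-1] @ Sf @ x_hat[-1]
                  + sum(x @ Q @ x for x in x_hat) * dt
                  + sum(v @ R @ v for v in u) * dt)
    J_S = 0.5 * (np.trace(Sf @ P[-1]) + sum(np.trace(Q @ Pk) for Pk in P) * dt)
    return J_CE, J_S
```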
Expected Cost of the Trajectory

• Optimized cost function:

$$V^*(t_o) \triangleq J^*(t_f) = E\left\{\phi[x^*(t_f)] + \int_{t_o}^{t_f} L[x^*(\tau), u^*(\tau)]\,d\tau\right\}$$

• Law of total expectation:

$$E(\cdot) = E\left(\cdot \mid I[t_o, t_1]\right)\Pr\left\{I[t_o, t_1]\right\} + E\left(\cdot \mid I(t_1, t_f]\right)\Pr\left\{I(t_1, t_f]\right\} = E\left[E(\cdot \mid I)\right]$$

• Because the past is established at t1:

$$E(J^*) = E\left(J^* \mid I[t_o, t_1]\right)\cdot 1 + E\left(J^* \mid I(t_1, t_f]\right)\Pr\left\{I(t_1, t_f]\right\} = E\left(J^* \mid I[t_o, t_1]\right) + E\left(J^* \mid I(t_1, t_f]\right)\Pr\left\{I(t_1, t_f]\right\}$$
Expected Cost of the Trajectory

• For planning or post-trajectory analysis, one can assume that the entire information set is available.
• For real-time control, t1 < tf, and future information can only be predicted.
• If the separation property applies (TBD), the future conditioning effect can be predicted.
• If not, the future conditioning effect can only be approximated.
Separation Property and Certainty Equivalence

Separation Property
• The Optimal Control Law and the Optimal Estimation Law can be derived separately.
• Their derivations are strictly independent.

Certainty Equivalence Property
• The separation property holds, plus ...
• The Stochastic Optimal Control Law and the Deterministic Optimal Control Law are the same.
• The Optimal Estimation Law can be derived separately.
• Linear-quadratic-Gaussian control is certainty-equivalent.
Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition
Immune Response Example

• Optimal open-loop drug therapy (control). Assumptions:
  • Initial condition known without error
  • No disturbance
• Optimal closed-loop therapy. Assumptions:
  • Small error in initial condition
  • Small disturbance
  • Perfect measurement of the state
• Stochastic optimal closed-loop therapy. Assumptions:
  • Small error in initial condition
  • Small disturbance
  • Imperfect measurement
  • Certainty equivalence applies to the perturbation control
Immune Response Example with Optimal Feedback Control

Open-Loop Optimal Control for Lethal Initial Condition

Open- and Closed-Loop Optimal Control for 150% Lethal Initial Condition
Immune Response with Stochastic Optimal Feedback Control
(Random Disturbance and Measurement Error Not Simulated)

• Low-bandwidth estimator (|W| < |N|): initial control is too sluggish to prevent divergence.
• High-bandwidth estimator (|W| > |N|): quick initial control prevents divergence.
Immune Response to Random Disturbance with Stochastic Neighboring-Optimal Control

• Disturbance due to:
  • Re-infection
  • Sequestered pockets of pathogen
• Noisy measurements
• Closed-loop therapy is robust ... but not robust enough: organ death occurs in one case.
• The probability of satisfactory therapy can be maximized by stochastic redesign of the controller.
Stochastic Linear-Quadratic Optimal Control

Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

• Quadratic value function:

$$V(t_o) = E\left\{\phi[x(t_f)] + \int_{t_o}^{t_f} L[x(\tau), u(\tau)]\,d\tau\right\} = \frac{1}{2}E\left\{x^T(t_f)S(t_f)x(t_f) + \int_{t_o}^{t_f}\begin{bmatrix} x^T(t) & u^T(t)\end{bmatrix}\begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t)\end{bmatrix}\begin{bmatrix} x(t) \\ u(t)\end{bmatrix}dt\right\}$$

• Linear dynamic constraint:

$$\dot{x}(t) = F(t)x(t) + G(t)u(t) + L(t)w(t)$$
Components of the LQ Value Function

• The quadratic value function has two parts:

$$V(t) = \frac{1}{2}x^T(t)S(t)x(t) + v(t)$$

• Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2}x^T(t)S(t)x(t)$$

• Stochastic value function increment:

$$v(t) = \frac{1}{2}\int_t^{t_f}\mathrm{Tr}\left[S(\tau)L(\tau)W(\tau)L^T(\tau)\right]d\tau$$
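A sketch of the increment as a trace integral over a stored S(t) history, by the rectangle rule (constant L and W are assumed for brevity):

```python
# v(t0) = 0.5 * integral over [t0, tf] of Tr[S(tau) L W L^T] d tau
import numpy as np

def stochastic_increment(S_hist, L, W, dt):
    # S_hist: list/array of S(t) samples on a uniform grid from t0 to tf
    return 0.5 * sum(np.trace(S @ L @ W @ L.T) for S in S_hist) * dt
```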
Value Function Gradient and Hessian

• Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2}x^T(t)S(t)x(t)$$

• Gradient with respect to the state:

$$\frac{\partial V}{\partial x}(t) = x^T(t)S(t)$$

• Hessian with respect to the state:

$$\frac{\partial^2 V}{\partial x^2}(t) = S(t)$$
Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

• Certainty-equivalent plus stochastic terms:

$$\frac{\partial V^*}{\partial t} = -\min_u\left[\frac{1}{2}E\left(x^{*T}Qx^* + 2x^{*T}Mu + u^TRu\right) + x^{*T}S\left(Fx^* + Gu\right) + \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)\right]$$

$$= -\min_u\left[\frac{1}{2}\left(x^{*T}Qx^* + 2x^{*T}Mu + u^TRu\right) + x^{*T}S\left(Fx^* + Gu\right) + \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)\right]$$

• Terminal condition:

$$V(t_f) = \frac{1}{2}x^T(t_f)S(t_f)x(t_f)$$
Optimal Control Law

• Differentiate the right side of the HJB equation with respect to u and set the result equal to zero:

$$\frac{\partial\left(\partial V^*/\partial t\right)}{\partial u} = 0 = \left(x^TM + u^TR\right) + x^TSG$$

• Solve for u, obtaining the feedback control law:

$$u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t) \triangleq -C(t)x(t)$$

LQ Optimal Control Law

$$u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t) \triangleq -C(t)x(t)$$

• The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.
Matrix Riccati Equation

• Substitute the optimal control law, $u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t)$, in the HJB equation:

$$\frac{1}{2}x^T\dot{S}x + \dot{v} = \frac{1}{2}x^T\left[-\left(Q - MR^{-1}M^T\right) - \left(F - GR^{-1}M^T\right)^TS - S\left(F - GR^{-1}M^T\right) + SGR^{-1}G^TS\right]x - \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)$$

• Matching the quadratic terms, the matrix Riccati equation provides S(t):

$$\dot{S}(t) = -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right] - \left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^TS(t) - S(t)\left[F(t) - G(t)R^{-1}(t)M^T(t)\right] + S(t)G(t)R^{-1}(t)G^T(t)S(t), \quad S(t_f) = \phi_{xx}(t_f)$$

• The stochastic value function increases the cost due to the disturbance; however, its calculation is independent of the Riccati equation:

$$\dot{v}(t) = -\frac{1}{2}\mathrm{Tr}\left[S(t)L(t)W(t)L^T(t)\right], \quad v(t_f) = 0$$
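A backward-Euler sketch of the Riccati integration, also returning the gain history C(t) = R⁻¹(GᵀS + Mᵀ); time-invariant F, G, Q, R, M are assumed for brevity:

```python
# Integrate S backward from S(tf), accumulating the LQ feedback gain C(t).
import numpy as np

def lq_gains(F, G, Q, R, M, Sf, tf, dt):
    Rinv = np.linalg.inv(R)
    Fbar = F - G @ Rinv @ M.T            # cross-term-adjusted dynamics
    Qbar = Q - M @ Rinv @ M.T            # cross-term-adjusted state weighting
    S, S_hist, C_hist = Sf, [], []
    for _ in range(int(tf / dt)):
        C_hist.append(Rinv @ (G.T @ S + M.T))   # gain for u = -C x
        S_hist.append(S)
        S_dot = -Qbar - Fbar.T @ S - S @ Fbar + S @ G @ Rinv @ G.T @ S
        S = S - S_dot * dt               # backward step: S(t - dt)
    return S_hist[::-1], C_hist[::-1]    # ordered from to to tf
```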
Evaluation of the Total Cost (Imperfect Measurements)

• Stochastic quadratic cost function, neglecting cross terms:

$$J = \frac{1}{2}E\left\{x^T(t_f)S(t_f)x(t_f) + \int_{t_o}^{t_f}\begin{bmatrix} x^T(t) & u^T(t)\end{bmatrix}\begin{bmatrix} Q(t) & 0 \\ 0 & R(t)\end{bmatrix}\begin{bmatrix} x(t) \\ u(t)\end{bmatrix}dt\right\}$$

$$= \frac{1}{2}\mathrm{Tr}\left\{S(t_f)E\left[x(t_f)x^T(t_f)\right] + \int_{t_o}^{t_f}\left(Q(t)E\left[x(t)x^T(t)\right] + R(t)E\left[u(t)u^T(t)\right]\right)dt\right\}$$

or

$$J = \frac{1}{2}\mathrm{Tr}\left\{S(t_f)P(t_f) + \int_{t_o}^{t_f}\left[Q(t)P(t) + R(t)U(t)\right]dt\right\}$$

where

$$P(t) \triangleq E\left[x(t)x^T(t)\right], \quad U(t) \triangleq E\left[u(t)u^T(t)\right]$$
Optimal Control Covariance

• Optimal control vector:

$$u(t) = -C(t)\hat{x}(t)$$

• Optimal control covariance (with M = 0, so that C = R⁻¹GᵀS):

$$U(t) = C(t)P(t)C^T(t) = R^{-1}(t)G^T(t)S(t)P(t)S(t)G(t)R^{-1}(t)$$
Revise Cost to Reflect State and Adjoint Covariance Dynamics

• Integration by parts:

$$S(t)P(t)\Big|_{t_o}^{t_f} = \int_{t_o}^{t_f}\left[\dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt$$

$$S(t_f)P(t_f) = S(t_o)P(t_o) + \int_{t_o}^{t_f}\left[\dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt$$

• Rewrite the cost function to incorporate the initial cost:

$$J = \frac{1}{2}\mathrm{Tr}\left\{S(t_o)P(t_o) + \int_{t_o}^{t_f}\left[Q(t)P(t) + R(t)U(t) + \dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt\right\}$$
Evolution of State and Adjoint Covariance Matrices (No Control)

$$u(t) = 0; \quad U(t) = 0$$

• State covariance response to the random disturbance:

$$\dot{P}(t) = F(t)P(t) + P(t)F^T(t) + L(t)W(t)L^T(t), \quad P(t_o)\ \text{given}$$

• Adjoint covariance response to the terminal cost:

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t), \quad S(t_f)\ \text{given}$$
Evolution of State and Adjoint Covariance Matrices (Optimal Control)

• State covariance response to the random disturbance (dependent on S(t) through the gain C(t)):

$$\dot{P}(t) = \left[F(t) - G(t)C(t)\right]P(t) + P(t)\left[F(t) - G(t)C(t)\right]^T + L(t)W(t)L^T(t)$$

• Adjoint covariance response to the terminal cost (independent of P(t)):

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t) + S(t)G(t)R^{-1}(t)G^T(t)S(t)$$
Total Cost With and Without Control

• With no control:

$$J_{no\ control} = \frac{1}{2}\mathrm{Tr}\left[S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\,dt\right]$$

• With optimal control, the equation for the cost has the same form:

$$J_{optimal\ control} = \frac{1}{2}\mathrm{Tr}\left[S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\,dt\right]$$

• ... but the evolution of S(t), and hence the value of S(t_o), is different in each case.
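A sketch evaluating both costs by integrating S backward under each rule; the controlled case adds the Riccati term, and time-invariant matrices are assumed:

```python
# J = 0.5 * Tr[ S(to) P(to) + integral of S L W L^T dt ] in both cases.
import numpy as np

def total_cost(F, G, Q, R, L, W, Sf, P0, tf, dt, controlled):
    S, J_int = Sf, 0.0
    for _ in range(int(tf / dt)):
        J_int += np.trace(S @ L @ W @ L.T) * dt
        S_dot = -F.T @ S - S @ F - Q
        if controlled:                   # closed-loop (Riccati) adjoint equation
            S_dot += S @ G @ np.linalg.inv(R) @ G.T @ S
        S = S - S_dot * dt               # backward step from tf toward to
    return 0.5 * (np.trace(S @ P0) + J_int)   # S now approximates S(to)
```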
Next Time: Linear-Quadratic-Gaussian Regulators

Supplemental Material
Dual Control (Fel'dbaum, 1965)

• Nonlinear system
• Uncertain system parameters to be estimated
• Parameter estimation can be aided by test inputs
• Approach: minimize the value function with three increments:
  • Nominal control
  • Cautious control
  • Probing control

$$\min_u V^* = \min_u\left(V^*_{nominal} + V^*_{cautious} + V^*_{probing}\right)$$

Estimation and control calculations are coupled and necessarily recursive.
Adaptive Critic Controller

The nonlinear control law, c, takes the general form

$$u(t) = c\left[x(t), a, y^*(t)\right]$$

where x(t) is the state, a contains the parameters of the operating point, and y*(t) is the command input.

On-line adaptive critic controller:
• Nonlinear control law (action network)
• Criticizes non-optimal performance via the critic network
• Adapts control gains to improve performance
• Adapts the cost model to improve its estimate
Algebraic Initialization of Neural Networks (Ferrari and Stengel)

• Initially, c[x, a, y*] is unknown.
• Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points.
• Scheduling variable: a

$$u(t) = C_F(a)\,y^* + C_B(a)\,\Delta x + C_I(a)\int\Delta\tilde{y}(t)\,dt \triangleq c\left[x(t), a, y^*(t)\right]$$
Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

$$u(t) = NN_F\left[y^*(t), a(t)\right] + NN_B\left[x(t), a(t)\right] + NN_I\left[\int\Delta\tilde{y}\,dt,\ a(t)\right] = c\left[x(t), a, y^*(t)\right]$$
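A toy sketch of one such replacement: a single-hidden-layer sigmoidal network mapping its input and the scheduling variable to a control contribution. The sizes, tanh activation, and random initialization are assumptions; in the actual scheme the weights are set algebraically to match the PI-LQ gains at the operating points.

```python
# Stand-in for one gain matrix, e.g. NN_B[x(t), a(t)].
import numpy as np

class SigmoidalNet:
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))

    def __call__(self, v, a):
        z = np.concatenate([v, a])       # network input plus scheduling variable
        return self.W2 @ np.tanh(self.W1 @ z + self.b1)

rng = np.random.default_rng(3)
NN_B = SigmoidalNet(n_in=5, n_hidden=8, n_out=2, rng=rng)   # 4 states + 1 scheduling var
u_fb = NN_B(np.zeros(4), np.array([0.5]))                   # feedback term of u(t)
```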
Initial Neural Control Law

• Algebraic training of the neural networks produces an exact fit of the linear control gains and trim conditions at n operating points.
• Interpolation and gain scheduling are performed by the neural networks.
• One node per operating point in each neural network.
On-line Optimization of the Adaptive Critic Neural Network Controller

• The critic adapts the neural network weights to improve performance, using approximate dynamic programming:
  • Heuristic Dynamic Programming (HDP) adaptive critic
  • Dual Heuristic Programming (DHP) adaptive critic for the receding-horizon optimization problem
• Critic and action (i.e., control) networks are adapted concurrently.
• LQ-PI cost function applied to the nonlinear problem.
• Modified resilient backpropagation for neural network training.

$$V\left[x(t_k)\right] = L\left[x(t_k), u(t_k)\right] + V\left[x(t_{k+1})\right]$$

$$\frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x}\frac{\partial x}{\partial u} = 0$$

$$\frac{\partial V\left[x_a(t)\right]}{\partial x_a(t)} = NN_C\left[x_a(t), a(t)\right]$$
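For the DHP critic, the recurrence above is differentiated with respect to the state to form a gradient target. A sketch for a linearized step x_{k+1} = Φx_k + Γu_k with quadratic utility; the names and the quadratic form are assumptions:

```python
# Target for lambda_k = dV/dx at x_k, from differentiating
# V(x_k) = L(x_k, u_k) + V(x_{k+1}) with u_k = c(x_k).
import numpy as np

def dhp_critic_target(x_k, u_k, lam_next, Phi, Gam, Q, R, dudx):
    dLdx = Q @ x_k + dudx.T @ (R @ u_k)      # direct plus through-control terms
    dxnext_dx = Phi + Gam @ dudx             # total sensitivity of x_{k+1} to x_k
    return dLdx + dxnext_dx.T @ lam_next
```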
Action Network On-line Training

Train the action network at time t, holding the critic parameters fixed.

[Block diagram: x_a(t) and a(t) drive the action network NN_A; the aircraft model supplies transition matrices and a state prediction; utility function derivatives and the critic NN_C feed the optimality condition, which generates the NN_A training target.]
Critic Network On-line Training

Train the critic network at time t, holding the action parameters fixed.

[Block diagram: x_a(t) and a(t) drive NN_C(old) and the action network NN_A; the aircraft model supplies transition matrices and a state prediction; utility function derivatives and the target cost gradient generate the NN_C training target.]