Draguna Vrabie - UTA talks/Sain workshop 10 07

Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington

F.L. Lewis, Moncrief-O’Donnell Endowed Chair

Head, Controls & Sensors Group

Talk available online at http://ARRI.uta.edu/acs

Notes on Optimal Control

Supported by: NSF - PAUL WERBOS; ARO – RANDY ZACHERY

Draguna Vrabie

• Blocking zeros
• Zero synthesis
• Module theory of multivariable zeros of dynamical systems
• Exactness of maps
• TSP nonlinear feedback synthesis

Michael K. Sain and Cheryl B. Schrader, “Bilinear Operators and Matrices,” in Mathematics for Circuits and Filters. Wai-Kai Chen, ed., CRC Press, pp. 23-41, 2000.

Cheryl B. Schrader and Michael K. Sain, “Zero Principles for Implicit Feedback Systems,” Circuits, Systems, and Signal Processing: Special Issue on Implicit and Robust Systems, Vol. 13, No. 2-3, pp. 273-293, 1994.

Michael K. Sain and Cheryl B. Schrader, “Feedback, Zeros, and Blocking Dynamics,” in Recent Advances in Mathematical Theory of Systems, Control, Networks and Signal Processing I. H. Kimura and S. Kodama, eds., Tokyo: Mita Press, pp. 227-232, 1992.

Michael K. Sain, “Minimal Torsion Spaces and the Partial Input/Output Problem,” Information and Control, Vol. 29, No. 2, pp. 103-124, 1975.

Ronald W. Diersing, Michael K. Sain, and Chang-Hee Won, “Bi-Cumulant Games: A generalization of H-inf and H2/H-inf Control,” IEEE Transactions on Automatic Control, submitted, 2007.

M.K. Sain, B.F. Wyman, J.L. Peczkowski, “Extended zeros and model matching,” SIAM Journal on Control and Optimization, May 1991.

James L. Massey and Michael K. Sain, “Inverse Problems in Coding, Automata, and Continuous Systems,” FOCS 1967, pp. 226-232.

Michael K. Sain

Module theoretic zero structures for system matrices

Author(s): Wyman, Bostwick F.; Sain, Michael K.

Abstract: The coordinate-free module-theoretic treatment of transmission zeros for MIMO transfer functions developed by Wyman and Sain (1981) is generalized to include noncontrollable and nonobservable linear dynamical systems. ...

NASA Center: NASA (non Center Specific); Publication Year: 1987; Added to NTRS: 2004-11-03; Accession Number: 87A30190; Document ID: 19870042916

“Did I say that? Well, that was then. This is now.”

EIC editorial, IEEE Circuits & Systems magazine, v.3, no. 1, 2003

- M.K. Sain

Cell Homeostasis

The individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and it has only limited energy to do so.

Cellular Metabolism

Permeability control of the cell membrane

http://www.accessexcellence.org/RC/VL/GG/index.html

Optimality in Biological Systems

Optimality in Control Systems Design (R. Kalman, 1960)

Rocket Orbit Injection

http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf

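Dynamics:

$$\dot r = w$$
$$\dot w = \frac{v^2}{r} - \frac{\mu}{r^2} + \frac{F}{m}\sin\phi$$
$$\dot v = -\frac{wv}{r} + \frac{F}{m}\cos\phi$$
$$\dot m = -F_m$$

Objectives: get to orbit in minimum time; use minimum fuel.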

Adaptive Control is Not Optimal

Optimal Control is off-line, and needs to know the system dynamics to solve design eqs.

We want ONLINE ADAPTIVE OPTIMAL Control

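System: $\dot x = f(x,u)$

Cost:
$$V(x(t)) = \int_t^\infty r(x,u)\,dt = \int_t^\infty \big(Q(x) + u^TRu\big)\,dt$$

Hamiltonian:
$$0 = \dot V + r(x,u) = \Big(\frac{\partial V}{\partial x}\Big)^T\dot x + r(x,u) = \Big(\frac{\partial V}{\partial x}\Big)^T f(x,u) + r(x,u) \equiv H\Big(x, \frac{\partial V}{\partial x}, u\Big), \qquad V(0) = 0$$

Optimal cost:
$$0 = \min_{u(t)}\Big[r(x,u) + \Big(\frac{\partial V^*}{\partial x}\Big)^T\dot x\Big] = \min_{u(t)}\Big[r(x,u) + \Big(\frac{\partial V^*}{\partial x}\Big)^T f(x,u)\Big]$$

Optimal control (for input-affine dynamics $\dot x = f(x) + g(x)u$):
$$h^*(x(t)) = -\tfrac12 R^{-1}g^T(x)\frac{\partial V^*}{\partial x}$$

HJB equation:
$$0 = \Big(\frac{dV^*}{dx}\Big)^T f + Q(x) - \tfrac14\Big(\frac{dV^*}{dx}\Big)^T gR^{-1}g^T\frac{dV^*}{dx}, \qquad V(0) = 0$$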

Continuous-Time Optimal Control

Bellman

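$$0 = \min_{u(t)}\Big[r(x,u) + \Big(\frac{\partial V^*}{\partial x}\Big)^T\dot x\Big] = \min_{u(t)}\Big[r(x,u) + \Big(\frac{\partial V^*}{\partial x}\Big)^T f(x,u)\Big]$$

For a given control, the cost satisfies the Hamiltonian equation $0 = H(x, \partial V/\partial x, u)$. In LQR, this is a Lyapunov eq.

The optimal cost satisfies the HJB equation. In LQR, this is a Riccati eq.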

Linear system, quadratic cost

System: $\dot x = Ax + Bu$

Utility: $r(x,u) = x^TQx + u^TRu; \quad R > 0,\ Q \ge 0$

The cost is quadratic:
$$V(x(t)) = \int_t^\infty r(x,u)\,d\tau = x^T(t)Px(t)$$

Optimal control (state feedback):
$$u(t) = -R^{-1}B^TPx(t) = -Lx(t)$$

HJB equation is the algebraic Riccati equation (ARE):
$$0 = PA + A^TP + Q - PBR^{-1}B^TP$$

Full system dynamics must be known.
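A minimal numerical sketch of this LQR case follows. It assumes numpy and a hypothetical 2-state, open-loop unstable plant: the values of A, B, Q and R are illustrative and do not come from the talk. The ARE is solved with the standard Hamiltonian-eigenvector method, which is a classical model-based technique and not the method of these slides. As the slide notes, it needs the full system dynamics.

```python
import numpy as np


def solve_care(A, B, Q, R):
    """Stabilizing solution of 0 = A^T P + P A + Q - P B R^{-1} B^T P.

    Uses the n stable eigenvectors of the Hamiltonian matrix
    H = [[A, -B R^{-1} B^T], [-Q, -A^T]]; assumes (A, B) stabilizable
    and no Hamiltonian eigenvalues on the imaginary axis.
    """
    n = A.shape[0]
    G = B @ np.linalg.solve(R, B.T)               # B R^{-1} B^T
    H = np.block([[A, -G], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]         # basis of the stable subspace
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))           # P = X2 X1^{-1}
    return (P + P.T) / 2                          # remove round-off asymmetry


# Hypothetical open-loop unstable plant (eigenvalues 0.618 and -1.618).
A = np.array([[0.0, 1.0], [1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_care(A, B, Q, R)
L = np.linalg.solve(R, B.T @ P)                   # u(t) = -L x(t), L = R^{-1} B^T P
residual = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
print(np.round(P, 4))
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ L))
```

For this particular plant the ARE can also be solved by hand. That gives $P_{12} = 1 + \sqrt2$ and $P_{22} = \sqrt{4 + 2\sqrt2} - 1$, which is a convenient cross-check.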

CT Policy Iteration

To avoid solving the HJB equation:

Utility: $r(x,u) = Q(x) + u^TRu$

Cost for any given u(t):
$$0 = \Big(\frac{\partial V}{\partial x}\Big)^T f(x,u) + r(x,u) \equiv H\Big(x, \frac{\partial V}{\partial x}, u\Big)$$

Iterative solution:
1. Pick a stabilizing initial control $h_0(x)$.
2. Find the cost (Lyapunov equation):
$$0 = \Big(\frac{\partial V_k}{\partial x}\Big)^T f(x, h_k(x)) + r(x, h_k(x)), \qquad V_k(0) = 0$$
3. Update the control:
$$h_{k+1}(x) = -\tfrac12 R^{-1}g^T(x)\frac{\partial V_k}{\partial x}$$

Full system dynamics must be known.

• Convergence proved by Saridis 1979 if the Lyapunov eq. is solved exactly
• Beard & Saridis used complicated Galerkin integrals to solve the Lyapunov eq.
• Abu Khalaf & Lewis used NNs to approximate V for nonlinear systems and proved convergence

LQR Policy iteration = Kleinman algorithm (Kleinman 1968)

1. For a given control policy $u = -L_kx$, solve for the cost (Lyapunov eq.):
$$0 = A_k^TP_k + P_kA_k + C^TC + L_k^TRL_k, \qquad A_k = A - BL_k$$

2. Improve policy:
$$L_{k+1} = R^{-1}B^TP_k$$

If started with a stabilizing control policy $L_0$, the matrix $P_k$ monotonically converges to the unique positive definite solution of the Riccati equation. Every iteration step will return a stabilizing controller. The system has to be known.
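The two Kleinman steps can be sketched as follows. This assumes numpy, a hypothetical plant with $Q = C^TC = I$, and a hand-picked stabilizing $L_0$. The Lyapunov equation is solved through the column-major vec identity rather than a library routine. `solve_lyapunov` and `kleinman` are illustrative helper names, and both need the full model (A, B), as the slide says.

```python
import numpy as np


def solve_lyapunov(Ak, M):
    """Solve Ak^T P + P Ak + M = 0 via the column-major vec identity
    (I kron Ak^T + Ak^T kron I) vec(P) = -vec(M)."""
    n = Ak.shape[0]
    I = np.eye(n)
    K = np.kron(I, Ak.T) + np.kron(Ak.T, I)
    vec_p = np.linalg.solve(K, -M.flatten(order="F"))
    P = vec_p.reshape((n, n), order="F")
    return (P + P.T) / 2


def kleinman(A, B, Q, R, L0, iters=30, tol=1e-12):
    """LQR policy iteration: needs the full model (A, B) and a stabilizing L0."""
    L = L0
    history = []
    for _ in range(iters):
        Ak = A - B @ L                            # A_k = A - B L_k
        P = solve_lyapunov(Ak, Q + L.T @ R @ L)   # 1. cost of the current policy
        history.append(P)
        L_next = np.linalg.solve(R, B.T @ P)      # 2. L_{k+1} = R^{-1} B^T P_k
        if np.max(np.abs(L_next - L)) < tol:
            L = L_next
            break
        L = L_next
    return P, L, history


A = np.array([[0.0, 1.0], [1.0, -1.0]])    # hypothetical plant
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                               # plays the role of C^T C
R = np.array([[1.0]])
L0 = np.array([[2.0, 1.0]])                 # A - B L0 has eigenvalues -1, -1

P, L, history = kleinman(A, B, Q, R, L0)
print(np.round(P, 4), np.round(L, 4))
```

The `history` list makes the monotone convergence visible. Each $P_k - P_{k+1}$ should be positive semidefinite up to round-off.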

Policy Iteration Solution

Define the Riccati operator (here with R = I):
$$\mathrm{Ric}(P) \equiv A^TP + PA + Q - PBB^TP$$

Policy iteration:
$$(A - BB^TP_i)^TP_{i+1} + P_{i+1}(A - BB^TP_i) + P_iBB^TP_i + Q = 0$$

This is in fact Newton's method. With the Fréchet derivative
$$\mathrm{Ric}'_{P_i}(P) \equiv (A - BB^TP_i)^TP + P(A - BB^TP_i),$$
policy iteration is
$$P_{i+1} = P_i - \big(\mathrm{Ric}'_{P_i}\big)^{-1}\mathrm{Ric}(P_i), \qquad i = 0, 1, \ldots$$
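The Newton interpretation can be checked numerically. The sketch below assumes numpy, R = I as on the slide, a hypothetical plant, and a hand-picked $P_0$ for which $A - BB^TP_0$ is stable. It represents the Fréchet derivative as a Kronecker matrix acting on vec(X). It then confirms that one Newton step gives the same matrix as one policy-iteration (Lyapunov) step.

```python
import numpy as np

# Hypothetical plant with R = I, as on the slide.
A = np.array([[0.0, 1.0], [1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
n = A.shape[0]


def ric(P):
    """Ric(P) = A^T P + P A + Q - P B B^T P."""
    return A.T @ P + P @ A + Q - P @ B @ B.T @ P


def frechet_matrix(Pi):
    """Matrix of X -> Ai^T X + X Ai with Ai = A - B B^T Pi, acting on vec(X)."""
    Ai = A - B @ B.T @ Pi
    I = np.eye(n)
    return np.kron(I, Ai.T) + np.kron(Ai.T, I)


def newton_step(Pi):
    """P_{i+1} = P_i - (Ric'_{P_i})^{-1} Ric(P_i)."""
    delta = np.linalg.solve(frechet_matrix(Pi), ric(Pi).flatten(order="F"))
    return Pi - delta.reshape((n, n), order="F")


def kleinman_step(Pi):
    """Solve (A - BB^T Pi)^T P + P (A - BB^T Pi) + Pi B B^T Pi + Q = 0 for P."""
    rhs = -(Pi @ B @ B.T @ Pi + Q)
    vec_p = np.linalg.solve(frechet_matrix(Pi), rhs.flatten(order="F"))
    return vec_p.reshape((n, n), order="F")


P0 = np.array([[4.0, 2.0], [2.0, 2.0]])  # A - B B^T P0 = [[0, 1], [-1, -3]] is stable
print(np.max(np.abs(newton_step(P0) - kleinman_step(P0))))

P = P0
for _ in range(10):                      # Newton converges quadratically
    P = newton_step(P)
print(np.max(np.abs(ric(P))))
```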

Policy Iterations without Lyapunov Equations

Dynamic programming built on Bellman’s optimality principle – alternative form for CT systems [Lewis & Syrmos 1995]:

$$V^*(x(t)) = \min_{u(\tau),\ t \le \tau < t+\Delta t}\Big\{\int_t^{t+\Delta t} r(x(\tau), u(\tau))\,d\tau + V^*(x(t+\Delta t))\Big\}$$

$$r(x(\tau), u(\tau)) = x^T(\tau)Qx(\tau) + u^T(\tau)Ru(\tau)$$


f(x) and g(x) do not appear

Solving for the cost – Our approach

For a given control $u = -Lx$, the cost satisfies
$$V(x(t)) = \int_t^{t+T}\big(x^TQx + u^TRu\big)\,dt + V(x(t+T))$$

f(x) and g(x) do not appear.

LQR case:
$$x^T(t)Px(t) = \int_t^{t+T}\big(x^TQx + u^TRu\big)\,dt + x^T(t+T)Px(t+T)$$

Optimal gain is $L = R^{-1}B^TP$.
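The LQR interval equation can be checked numerically, assuming numpy and a hypothetical plant and stabilizing gain. Here A and B are used only to generate the "measured" trajectory. The cost integral is obtained by appending V as an extra state with $\dot V = x^TQx + u^TRu$ and integrating with classical RK4. The reference P is computed from the Lyapunov equation purely for comparison. `measure_interval` and `lyapunov_cost` are illustrative names.

```python
import numpy as np

# Hypothetical plant and stabilizing gain; A and B are used only to simulate
# the "measured" trajectory, not in the check itself.
A = np.array([[0.0, 1.0], [1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
L = np.array([[2.0, 1.0]])
n = 2


def lyapunov_cost(Ak, M):
    """P solving Ak^T P + P Ak + M = 0 (reference value for the check)."""
    I = np.eye(n)
    K = np.kron(I, Ak.T) + np.kron(Ak.T, I)
    return np.linalg.solve(K, -M.flatten(order="F")).reshape((n, n), order="F")


def measure_interval(x0, T=0.5, steps=500):
    """Integrate x' = Ax + Bu, u = -Lx, with the cost V appended as a state:
    V' = x^T Q x + u^T R u. Returns x(t+T) and the cost integral rho."""
    def f(z):
        x = z[:n]
        u = -L @ x
        return np.concatenate([A @ x + B @ u, [x @ Q @ x + u @ R @ u]])

    z = np.concatenate([x0, [0.0]])
    h = T / steps
    for _ in range(steps):                  # classical 4th-order Runge-Kutta
        k1 = f(z)
        k2 = f(z + h / 2 * k1)
        k3 = f(z + h / 2 * k2)
        k4 = f(z + h * k3)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z[:n], z[n]


P = lyapunov_cost(A - B @ L, Q + L.T @ R @ L)
x_t = np.array([1.0, -0.5])
x_tT, rho = measure_interval(x_t)
lhs = x_t @ P @ x_t                         # x^T(t) P x(t)
rhs = rho + x_tT @ P @ x_tT                 # integral + x^T(t+T) P x(t+T)
print(lhs, rhs)
```

Only x(t), x(t+T) and the measured integral enter the comparison. This is why the equation can later be used to estimate P without A.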

1. Policy iteration

Control policy: $u_k(t) = -L_kx(t)$

Cost update:
$$V_{k+1}(x(t)) = \int_t^{t+T}\big(x^TQx + u_k^TRu_k\big)\,dt + V_{k+1}(x(t+T))$$

For LQR case, the cost update is
$$x^T(t)P_{k+1}x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + x^T(t+T)P_{k+1}x(t+T)$$
A and B do not appear.

Control gain update:
$$L_{k+1} = R^{-1}B^TP_{k+1}$$
B needed for control update.

Initial stabilizing control is needed.


Theorem. This algorithm converges and is equivalent to Kleinman’s algorithm: the cost update
$$x^T(t)P_{k+1}x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + x^T(t+T)P_{k+1}x(t+T)$$
is equivalent to
$$(A - BR^{-1}B^TP_k)^TP_{k+1} + P_{k+1}(A - BR^{-1}B^TP_k) + P_kBR^{-1}B^TP_k + Q = 0,$$
which for R = I is the Newton iteration
$$P_{i+1} = P_i - \big(\mathrm{Ric}'_{P_i}\big)^{-1}\mathrm{Ric}(P_i), \qquad i = 0, 1, \ldots$$

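Lemma 1. With the closed-loop matrix $A_i = A - BK_i$, the interval equation
$$x^T(t)P_ix(t) = \int_t^{t+T}x^T(\tau)\big(Q + K_i^TRK_i\big)x(\tau)\,d\tau + x^T(t+T)P_ix(t+T)$$
is equivalent to the Lyapunov equation
$$\frac{d}{dt}\big(x^TP_ix\big) = x^T\big(A_i^TP_i + P_iA_i\big)x = -x^T\big(K_i^TRK_i + Q\big)x$$

Proof:
$$\int_t^{t+T}x^T\big(Q + K_i^TRK_i\big)x\,d\tau = -\int_t^{t+T}\frac{d}{d\tau}\big(x^TP_ix\big)\,d\tau = x^T(t)P_ix(t) - x^T(t+T)P_ix(t+T)$$

The interval equation therefore solves the Lyapunov equation without knowing A or B. Only B is needed, for the gain update $L_k = R^{-1}B^TP_k$.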

Critic update

$$x^T(t)P_{k+1}x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + x^T(t+T)P_{k+1}x(t+T)$$

Use the Kronecker product identity $\mathrm{vec}(ABC) = (C^T \otimes A)\,\mathrm{vec}(B)$ to set this up as
$$p_{k+1}^T\bar x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + p_{k+1}^T\bar x(t+T),$$
where $\bar x(t) = x(t) \otimes x(t)$ is the quadratic basis set, or equivalently as
$$p_{k+1}^T\varphi(t) \equiv p_{k+1}^T\big[\bar x(t) - \bar x(t+T)\big] = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau \equiv \rho(t, t+T),$$
with $\rho(t, t+T)$ the reinforcement on the time interval [t, t+T].

Now use RLS along the trajectory to get the new weights $p_{k+1}$. Unpack the weights into the matrix $P_{k+1}$. Then find the updated FB gain $L_{k+1} = R^{-1}B^TP_{k+1}$.
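The Kronecker bookkeeping can be made concrete with a short sketch, assuming numpy. `quad_basis`, `pack` and `unpack` are hypothetical helper names. The basis keeps only the n(n+1)/2 independent products of $x \otimes x$, so the off-diagonal weights carry a factor 2. The middle factor of the vec identity is called X here to avoid a clash with the input matrix B.

```python
import numpy as np


def quad_basis(x):
    """Independent entries of x kron x: [x_i * x_j for i <= j], n(n+1)/2 terms."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])


def pack(P):
    """Weights p with x^T P x = p . quad_basis(x): P_ii on the diagonal, 2 P_ij off it."""
    n = P.shape[0]
    return np.array([P[i, j] if i == j else 2 * P[i, j]
                     for i in range(n) for j in range(i, n)])


def unpack(p, n):
    """Inverse of pack: rebuild the symmetric matrix P from its weights."""
    P = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = p[k] if i == j else p[k] / 2
            k += 1
    return P


rng = np.random.default_rng(0)
A, X, C = rng.standard_normal((3, 3, 3))
lhs = (A @ X @ C).flatten(order="F")                 # vec(AXC), column-major
rhs = np.kron(C.T, A) @ X.flatten(order="F")         # (C^T kron A) vec(X)

M = rng.standard_normal((3, 3))
P = M + M.T                                          # any symmetric P
x = rng.standard_normal(3)
print(np.allclose(lhs, rhs), np.isclose(x @ P @ x, pack(P) @ quad_basis(x)))
```

Using the reduced basis rather than the full $n^2$ entries of $x \otimes x$ removes the duplicated columns. Without that reduction, the least-squares regression for p would be rank deficient.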

Algorithm Implementation

1. Select an initial stabilizing control policy $u_0 = -L_0x$.
2. Find the associated cost, using the quadratic regression vector $\bar x(t) - \bar x(t+T)$:
$$p_{k+1}^T\big[\bar x(t) - \bar x(t+T)\big] = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau = \rho(t, t+T)$$
This solves the Lyapunov eq. without knowing the dynamics.
3. Improve the control: $L_{k+1} = R^{-1}B^TP_{k+1}$.

Over each interval [t, t+T]:
• observe x(t)
• apply $u_k(t) = -L_kx(t)$
• observe the cost integral and x(t+T)
• update P; do RLS until convergence to $P_{k+1}$
• update the control gain to $L_{k+1}$

Measure the cost increment (reinforcement) by adding V as a state, with $\dot V = x^TQx + u_k^TRu_k$. Then A is not needed anywhere.


The critic update
$$x^T(t)P_{k+1}x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + x^T(t+T)P_{k+1}x(t+T)$$
can be set up as
$$p_{i+1}^T\varphi(t) \equiv p_{i+1}^T\big[\bar x(t) - \bar x(t+T)\big] = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau \equiv d(x(t), L_k),$$
where $\bar x(t) = x(t) \otimes x(t)$ is the quadratic basis set.

Since $p_{i+1}$ has n(n+1)/2 independent entries, evaluating at N ≥ n(n+1)/2 trajectory points gives a least-squares problem. Instead of RLS, one can use the batch least-squares solution along the trajectory:
$$p_{i+1} = (XX^T)^{-1}XY,$$
$$X = [\varphi(t_1)\ \ \varphi(t_2)\ \cdots\ \varphi(t_N)], \qquad Y = [d(x(t_1), L_k)\ \ d(x(t_2), L_k)\ \cdots\ d(x(t_N), L_k)]^T$$
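The following is a minimal end-to-end sketch of this policy iteration. It assumes numpy and a hypothetical 2-state plant. The learner uses only x(t), x(t+T), the measured reinforcement ρ and the matrix B. A appears only inside `plant_interval`, a hypothetical simulator standing in for the real system.

To keep the regression well conditioned, the sketch makes two simplifications of the scheme on the slides. It restarts each interval from a fresh random state rather than sampling along one trajectory. It also solves with batch least squares (`np.linalg.lstsq`) rather than RLS.

```python
import numpy as np

# Hypothetical plant. A appears ONLY in the simulator standing in for the
# real system; the learner uses measured x(t), x(t+T), rho and the matrix B.
A = np.array([[0.0, 1.0], [1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
n = 2


def quad_basis(x):
    """The n(n+1)/2 independent entries of x kron x."""
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])


def unpack(p):
    """Symmetric P from weights p = [P11, 2*P12, P22]."""
    P = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = p[k] if i == j else p[k] / 2
            k += 1
    return P


def plant_interval(x0, L, T=0.1, steps=50):
    """Simulated measurement over [t, t+T] under u = -L x, with V as an extra
    state (V' = x^T Q x + u^T R u). Returns x(t+T) and rho(t, t+T)."""
    def f(z):
        x = z[:n]
        u = -L @ x
        return np.concatenate([A @ x + B @ u, [x @ Q @ x + u @ R @ u]])

    z, h = np.concatenate([x0, [0.0]]), T / steps
    for _ in range(steps):                          # RK4 integration
        k1 = f(z)
        k2 = f(z + h / 2 * k1)
        k3 = f(z + h / 2 * k2)
        k4 = f(z + h * k3)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z[:n], z[n]


def irl_policy_iteration(L0, iters=8, samples=10, seed=0):
    rng = np.random.default_rng(seed)
    L = L0
    for _ in range(iters):
        rows, rhs = [], []
        for _ in range(samples):                    # N >= n(n+1)/2 intervals
            x_t = rng.standard_normal(n)            # hypothetical: fresh start per interval
            x_tT, rho = plant_interval(x_t, L)
            rows.append(quad_basis(x_t) - quad_basis(x_tT))   # phi(t)
            rhs.append(rho)                                   # d(x(t), L_k)
        p, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
        P = unpack(p)                               # critic: P_{k+1}
        L = np.linalg.solve(R, B.T @ P)             # actor: L_{k+1} = R^{-1} B^T P_{k+1}
    return P, L


P, L = irl_policy_iteration(np.array([[2.0, 1.0]]))
print(np.round(P, 4), np.round(L, 4))
```

With accurate data, each regression reproduces the Lyapunov solution for the current gain. The iterates therefore track Kleinman's algorithm, and L approaches the ARE gain.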

Direct Optimal Adaptive Controller

A hybrid continuous/discrete dynamic controller whose internal state is the observed cost over the interval

“This is a very weird Control Structure whose likes I have not seen in control system theory”

- suspicious quote by F.L. Lewis, 2007

[Figure: block diagram of the dynamic control system. System: $\dot x = Ax + Bu$, $x_0$. Critic: cost state $\dot V = x^TQx + u^TRu$, sampled through a ZOH with period T. Actor: feedback gain $L_k$ multiplier, $u_k(t) = -L_kx(t)$.]

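Continuous-time control with discrete gain updates

[Figure: gain $L_k$ held constant over each sample period along time t, updated at k = 0, 1, ..., 5; labels "Gain update (Policy)" and "Control".]

Sample periods need not be the same. They can be selected on-line in real time.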

2. CT ADP Greedy iteration

Control policy: $u_k(t) = -L_kx(t)$

Cost update:
$$V_{k+1}(x(t)) = \int_t^{t+T}\big(x^TQx + u_k^TRu_k\big)\,dt + V_k(x(t+T))$$

LQR:
$$x^T(t)P_{k+1}x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + x^T(t+T)P_kx(t+T)$$
A and B do not appear.

Control gain update:
$$L_{k+1} = R^{-1}B^TP_{k+1}$$
B needed for control update.

No initial stabilizing control needed.

Direct Optimal Adaptive Control for Partially Unknown CT Systems

The critic update
$$x^T(t)P_{k+1}x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + x^T(t+T)P_kx(t+T)$$
uses the Kronecker product identity $\mathrm{vec}(ABC) = (C^T \otimes A)\,\mathrm{vec}(B)$ to set this up as
$$p_{k+1}^T\bar x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + p_k^T\bar x(t+T),$$
where $\bar x(t) = x(t) \otimes x(t)$ is the quadratic basis set (regression vector) and $p_k$ are the previous weights.

Now use RLS along the trajectory to get the new weights $p_{k+1}$. Unpack the weights into the matrix $P_{k+1}$. Then find the updated FB gain $L_{k+1} = R^{-1}B^TP_{k+1}$.

1. Select a control policy.
2. Find the associated cost:
$$p_{k+1}^T\bar x(t) = \int_t^{t+T}x^T(\tau)\big(Q + L_k^TRL_k\big)x(\tau)\,d\tau + p_k^T\bar x(t+T)$$
This solves for the cost update without knowing the dynamics.
3. Improve the control: $L_{k+1} = R^{-1}B^TP_{k+1}$.

Over each interval [t, t+T]:
• observe x(t)
• apply u
• observe the cost integral and x(t+T)
• update P; do RLS until convergence to $P_{k+1}$
• update the control gain to $L_{k+1}$

Measure the cost increment by adding V as a state, with $\dot V = x^TQx + u_k^TRu_k$.

No initial stabilizing control needed. A is not needed anywhere.
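A minimal sketch of this greedy (HDP) version follows. It assumes numpy and a hypothetical open-loop unstable plant that is simulated only inside `plant_interval`. As with any simulation study of this kind, it uses fresh random initial states per interval and batch least squares in place of RLS. The regression target is ρ plus the previous cost $p_k^T\bar x(t+T)$. The iteration starts from $P_0 = 0$, that is, from the non-stabilizing gain $L_0 = 0$.

```python
import numpy as np

# Hypothetical open-loop UNSTABLE plant (eigenvalues 0.618, -1.618).
# A is used only by the simulator that stands in for the real system.
A = np.array([[0.0, 1.0], [1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
n = 2


def quad_basis(x):
    """Regression vector: the n(n+1)/2 independent entries of x kron x."""
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])


def unpack(p):
    """Symmetric P from weights p = [P11, 2*P12, P22]."""
    P = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = p[k] if i == j else p[k] / 2
            k += 1
    return P


def plant_interval(x0, L, T, steps=20):
    """Simulated measurement of x(t+T) and rho(t, t+T) under u = -L x (RK4)."""
    def f(z):
        x = z[:n]
        u = -L @ x
        return np.concatenate([A @ x + B @ u, [x @ Q @ x + u @ R @ u]])

    z, h = np.concatenate([x0, [0.0]]), T / steps
    for _ in range(steps):
        k1 = f(z)
        k2 = f(z + h / 2 * k1)
        k3 = f(z + h / 2 * k2)
        k4 = f(z + h * k3)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z[:n], z[n]


def ct_hdp(iters=150, samples=6, T=0.2, seed=0):
    rng = np.random.default_rng(seed)
    P = np.zeros((n, n))                            # V_0 = 0, hence L_0 = 0 (not stabilizing)
    for _ in range(iters):
        L = np.linalg.solve(R, B.T @ P)             # L_k = R^{-1} B^T P_k
        rows, targets = [], []
        for _ in range(samples):
            x_t = rng.standard_normal(n)
            x_tT, rho = plant_interval(x_t, L, T)
            rows.append(quad_basis(x_t))            # p_{k+1}^T xbar(t) ...
            targets.append(rho + x_tT @ P @ x_tT)   # ... = rho + p_k^T xbar(t+T)
        p, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        P = unpack(p)                               # P_{k+1}
    return P, np.linalg.solve(R, B.T @ P)


P, L = ct_hdp()
print(np.round(P, 4), np.round(L, 4))
```

Unlike the policy-iteration sketch, no stabilizing starting gain is required here. Convergence is slower, because each step advances the cost by only one interval of length T.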

Direct Optimal Adaptive Controller

A hybrid continuous/discrete dynamic controller whose internal state is the observed value over the interval

[Figure: the same actor-critic block diagram. System: $\dot x = Ax + Bu$, $x_0$. Critic with $\dot V = x^TQx + u^TRu$ and a ZOH of period T. Actor with FB gain $L_k$. Together they form the dynamic control system.]

Has a different critic cost update. No initial stabilizing gain needed.

Analysis of the algorithm

For a given control policy $u_k(t) = -L_kx(t)$ with $L_k = R^{-1}B^TP_k$, the greedy update
$$V_{k+1}(x(t)) = \int_t^{t+T}\big(x^TQx + u_k^TRu_k\big)\,d\tau + V_k(x(t+T)), \qquad V_0 = 0$$
is equivalent to
$$P_{k+1} = \int_0^T e^{A_k^Tt}\big(Q + L_k^TRL_k\big)e^{A_kt}\,dt + e^{A_k^TT}P_ke^{A_kT}$$
with $A_k = A - BR^{-1}B^TP_k$: a strange pseudo-discretized RE.

c.f. DT RE:
$$P_{k+1} = A^TP_kA + Q - A^TP_kB\big(R + B^TP_kB\big)^{-1}B^TP_kA,$$
or, in closed-loop form,
$$P_{k+1} = (A - BL_k)^TP_k(A - BL_k) + Q + L_k^TRL_k, \qquad L_k = \big(R + B^TP_kB\big)^{-1}B^TP_kA$$

Lemma 2. CT HDP is equivalent to
$$P_{k+1} - P_k = \int_0^T e^{A_k^Tt}\big(P_kA_k + A_k^TP_k + Q + L_k^TRL_k\big)e^{A_kt}\,dt, \qquad A_k = A - BR^{-1}B^TP_k$$

ADP solves the CT ARE without knowledge of the system dynamics A.

This extra term means the initial control action need not be stabilizing.

When ADP converges, the resulting P satisfies the Continuous-Time ARE !!

Direct OPTIMAL ADAPTIVE CONTROL

Solve the Riccati Equation WITHOUT knowing the plant dynamics

Model-free ADP

Works for Nonlinear Systems: a neural network is used to approximate the cost

Robustness? Comparison with adaptive control methods?

Policy Evaluation – Critic update

Let K be any state feedback gain for the system (1). One can measure the associated cost over the infinite time horizon as
$$V(t, x(t)) = \int_t^{t+T}x^T(\tau)\big(Q + K^TRK\big)x(\tau)\,d\tau + W(t+T, x(t+T)),$$
where $W(t+T, x(t+T))$ is an initial infinite-horizon cost to go.

What to do about the tail – issues in Receding Horizon Control


[Figure: simulation results versus time (s). Panels: control signal; controller parameters; system states; critic parameters P(1,1), P(1,2), P(2,2) together with their optimal values.]

Simulations on: F-16 autopilot; load frequency control for power system

A matrix not needed

Converges to the steady-state Riccati equation solution


c.f. DT value recursion, where f() and g() do not appear:
$$V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})$$
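A minimal pure-Python sketch of this recursion used for policy evaluation follows. It assumes a hypothetical scalar plant $x_{k+1} = a x_k + b u_k$ with a = 1.2 and b = 1, and a fixed policy $h(x) = -0.5x$. It also assumes the quadratic utility $r = qx^2 + r_w u^2$ and $V_h(x) = p x^2$. The update uses only the observed utility and next state, so the plant model never enters it. `hypothetical_plant` is an illustrative stand-in for the real system.

```python
def evaluate_policy_dt(step, gain, q_weight, r_weight, gamma, x0=1.0, sweeps=200):
    """Iterate V_h(x_k) = r(x_k, h(x_k)) + gamma * V_h(x_{k+1}) for a scalar
    plant with V_h(x) = p x^2 and h(x) = -gain * x. Only the observed
    transition (x_k, u_k) -> x_{k+1} is used, never the model itself."""
    p = 0.0
    for _ in range(sweeps):
        x = x0
        u = -gain * x
        x_next = step(x, u)                                # measured next state
        utility = q_weight * x * x + r_weight * u * u      # r(x_k, h(x_k))
        p = (utility + gamma * p * x_next * x_next) / (x * x)
    return p


def hypothetical_plant(x, u, a=1.2, b=1.0):
    """Stand-in for the unknown system x_{k+1} = a x_k + b u_k."""
    return a * x + b * u


p = evaluate_policy_dt(hypothetical_plant, gain=0.5, q_weight=1.0, r_weight=1.0, gamma=0.9)
print(p)
```

For these numbers the recursion contracts by $\gamma(a - 0.5b)^2 = 0.441$ per sweep. So p converges to $(1 + 0.25)/(1 - 0.441)$.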