39
Neighboring-Optimal Control via Linear-Quadratic (LQ) Feedback Robert Stengel Optimal Control and Estimation, MAE 546 Princeton University, 2018 Linearization of nonlinear dynamic models Nominal trajectory Perturbations about the nominal trajectory Linear, time-invariant dynamic models Examples Continuous-time, time-varying optimal LQ feedback control Discrete-time and sampled-data linear dynamic models Dynamic programming approach to optimal LQ sampled-data control Copyright 2018 by Robert Stengel. All rights reserved. For educational use only. http://www.princeton.edu/~stengel/MAE546.html http://www.princeton.edu/~stengel/OptConEst.html 1 Neighboring Trajectories Satisfy Same Dynamic Equations Neighboring-trajectory dynamic model is the same as the nominal dynamic model ! x N (t ) = f [ x N (t ), u N (t ), w N (t )], x N t o ( ) given ! x(t ) = f [ x(t ), u(t ), w(t )], x t o ( ) given Δx(t o ) = x(t o ) x N (t o ) Δx(t ) = x(t ) x N (t ) Δ ! x(t ) = ! x(t ) ! x N (t ) ! x N (t ) = f [ x N (t ), u N (t ), w N (t ), t ] ! x(t ) = ! x N (t ) + Δ ! x(t ) = f [ x N (t ) + Δx(t ), u N (t ) + Δu(t ), w N (t ) + Δw(t ), t ] Δu(t ) = u(t ) u N (t ) Δw(t ) = w(t ) w N (t ) 2

Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Neighboring-Optimal Control via Linear-Quadratic (LQ) Feedback

Robert Stengel Optimal Control and Estimation, MAE 546

Princeton University, 2018

•  Linearization of nonlinear dynamic models–  Nominal trajectory–  Perturbations about the nominal trajectory–  Linear, time-invariant dynamic models–  Examples

•  Continuous-time, time-varying optimal LQ feedback control

•  Discrete-time and sampled-data linear dynamic models

•  Dynamic programming approach to optimal LQ sampled-data control

Copyright 2018 by Robert Stengel. All rights reserved. For educational use only.http://www.princeton.edu/~stengel/MAE546.html

http://www.princeton.edu/~stengel/OptConEst.html

1

Neighboring Trajectories Satisfy Same Dynamic Equations

•  Neighboring-trajectory dynamic model is the same as the nominal dynamic model

!xN (t) = f[xN (t),uN (t),wN (t)], xN to( ) given

!x(t) = f[x(t),u(t),w(t)], x to( ) given

Δx(to ) = x(to )− xN (to )Δx(t) = x(t)− xN (t)Δ!x(t) = !x(t)− !xN (t)

!xN (t) = f[xN (t),uN (t),wN (t),t]!x(t) = !xN (t)+ Δ!x(t) = f[xN (t)+ Δx(t),uN (t)+ Δu(t),wN (t)+ Δw(t),t]

Δu(t) = u(t)− uN (t)Δw(t) = w(t)−wN (t)

2

Page 2: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Solve for the Nominal and Perturbation Trajectories Separately

Jacobian matrices of the linear model are evaluated along the nominal trajectory

!xN (t) = f[xN (t),uN (t),wN (t),t],xN to( ) given

Δ!x(t) ≈ ∂ f∂x

t( )Δx(t)+ ∂ f∂u

t( )Δu(t)+ ∂ f∂w

t( )Δw(t),

Δx to( ) given

F(t) ! ∂ f∂x x=xN (t )

u=uN (t )w=wN (t )

; G(t) ! ∂ f∂u x=xN (t )

u=uN (t )w=wN (t )

; L(t) ! ∂ f∂w x=xN (t )

u=uN (t )w=wN (t )

Δx(t) = F(t)Δx(t) +G(t)Δu(t) + L(t)Δw(t), Δx to( ) given3

!x(t) = !xN (t)+ Δ!x(t)≈ f[xN (t),uN (t),wN (t),t]

+ ∂ f∂x

Δx(t)+ ∂ f∂u

Δu(t)+ ∂ f∂w

Δw(t),

x(t0 ) = xN (t0 )+ Δx(to ) given

Linearization Example

4

Page 3: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Cubic Springs

Force is a nonlinear function of deflection

f x( ) = −k1x ± k2x3

moment θ( ) ≈ −k1θ + k2θ3

Example: Expand gsinθ in a power series

5

Stiffening Cubic Spring Example2nd-order nonlinear dynamic model

!x1(t) = f1 x(t)[ ] = x2 (t)!x2 (t) = f2 x(t)[ ] = −10x1(t)−10x1

3(t)− x2 (t)Integrate equations to produce nominal path

x1(0)x2 (0)

⎣⎢⎢

⎦⎥⎥→

f1N x(t)[ ]f2N x(t)[ ]

⎢⎢

⎥⎥dt→

0

t f

∫x1N (t)

x2N (t)

⎣⎢⎢

⎦⎥⎥

in 0,t f⎡⎣ ⎤⎦

Evaluate partial derivatives of the Jacobian matrices

∂ f1∂ x1

= 0; ∂ f1∂ x2

= 1

∂ f2∂ x1

= −10 − 30x1N2 (t); ∂ f2

∂ x2= −1

∂ f1∂u = 0;

∂ f1∂w = 0

∂ f2∂u = 0;

∂ f2∂w = 0

6

Page 4: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Nominal and Perturbation Dynamic Equations

!x1N (t) = x2N (t)

!x2N (t) = −10x1N (t)−10x1N3(t)− x2N (t)

Δx1(t)Δx2 (t)

⎣⎢⎢

⎦⎥⎥=

0 1− 10 + 30x1N

2 (t)( ) −1

⎣⎢⎢

⎦⎥⎥

Δx1(t)Δx2 (t)

⎣⎢⎢

⎦⎥⎥

x1N (0)

x2N (0)

⎣⎢⎢

⎦⎥⎥= 0

9⎡

⎣⎢

⎦⎥

Nonlinear, Time-Invariant (NTI) Equation

Linear, Time-Varying (LTV) Equation

Initial Conditions for Nonlinear and Linear Models

7

Δx1(0)Δx2 (0)

⎣⎢⎢

⎦⎥⎥= 0

1⎡

⎣⎢

⎦⎥

Comparison of Approximate and Exact Solutions

xN (t)Δx(t)xN (t)+ Δx(t)x(t)

x2N (0) = 9Δx2 (0) = 1x2N (t)+ Δx2 (t) = 10x2 (t) = 10

!xN (t)Δ!x(t)!xN (t)+ Δ!x(t)!x t( )

8

Page 5: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Euler-Lagrange Equations for Minimizing Variational

Cost Function

9

Optimize Nominal and Neighboring Dynamic Models Separately

Linear, Neighboring-Optimal Control Added to Nominal Control

Δu(t) = −C(t)Δx(t)= −C(t) x(t)− x*(t)[ ]

Nominal, Nonlinear, Open-Loop Optimal Control

u(t) = uopt (t) ! u*(t)

10

u(t) = u*(t)+ Δu(t)

Page 6: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Expand Optimal Control Function

J x*(t0 )+ Δx(t0 )[ ], x*(t f )+ Δx(t f )⎡⎣ ⎤⎦{ } !J * x*(t0 ),x*(t f )⎡⎣ ⎤⎦ + ΔJ Δx(t0 ),Δx(t f )⎡⎣ ⎤⎦ + Δ2J Δx(t0 ),Δx(t f )⎡⎣ ⎤⎦

Nominal optimized cost, plus nonlinear dynamic constraint

Expand optimized cost function to second degree

J * x*(t0 ),x*(t f )⎡⎣ ⎤⎦ = φ x*(t f )⎡⎣ ⎤⎦ + L x*(t),u*(t)[ ]t0

t f

∫ dt

subject to nonlinear dynamic equation!x*(t) = f x*(t),u*(t)[ ], x(t0 ) = x0

= J * x *(t0 ),x *(t f )⎡⎣ ⎤⎦ + Δ2J Δx(t0 ),Δx(t f )⎡⎣ ⎤⎦because First Variation, ΔJ Δx(t0 ),Δx(t f )⎡⎣ ⎤⎦ = 0

11

2nd Variation of the Cost Function

Cost weighting matrices expressed as

Objective: Given optimal nominal solution, minimize 2nd-variational cost subject to linear dynamic constraint

minΔu

Δ2J = 12ΔxT (t f )φxx (t f )Δx(t f )+

12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Lxx (t) Lxu(t)Lux (t) Luu(t)

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

t0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

P(t f ) ! φxx (t f ) =∂2φ∂x2

(t f )

Q(t) M(t)MT (t) R(t)

⎣⎢⎢

⎦⎥⎥!

Lxx (t) Lxu(t)Lux (t) Luu(t)

⎣⎢⎢

⎦⎥⎥

dim P(t f )⎡⎣ ⎤⎦ = dim Q(t)[ ] = n × ndim R(t)[ ] = m ×mdim M(t)[ ] = n ×m

subject to perturbation dynamicsΔ!x(t) = F(t)Δx(t)+G(t)Δu(t), Δx(t0 ) = Δxo

12

Page 7: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

2nd Variational Hamiltonian

Δ2J = 12ΔxT (t f )P(t f )Δx(t f )+

12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Q(t) M(t)MT (t) R(t)

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

t0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

H Δx(t),Δu(t),Δλ t( )⎡⎣ ⎤⎦ = L Δx(t),Δu(t)[ ]+ ΔλT t( )f Δx(t),Δu(t)[ ]= 12

ΔxT (t)Q(t)Δx(t)+ 2ΔxT (t)M(t)Δu(t)+ ΔuT (t)R(t)Δu(t)⎡⎣ ⎤⎦

+ ΔλT t( ) F(t)Δx(t)+G(t)Δu(t)[ ]

Variational Lagrangian plus adjoined dynamic constraint

Variational cost function

= 12ΔxT (t f )P(t f )Δx(t f )+

12

ΔxT (t)Q(t)Δx(t)+ 2ΔxT (t)M(t)Δu(t)+ ΔuT (t)R(t)Δu(t)⎡⎣ ⎤⎦dtt0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

13

2nd Variational Euler-Lagrange Equations

Δλ t f( ) = φxx (t f )Δx(t f ) = P(t f )Δx(t f )

Terminal condition, solution for adjoint vector, and optimality condition

Δ λ t( ) = −

∂H Δx(t),Δu(t),Δλ t( )⎡⎣ ⎤⎦∂Δx

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

T

= −Q(t)Δx(t) −M(t)Δu(t) − FT (t)Δλ t( )

∂H Δx(t),Δu(t),Δλ t( )⎡⎣ ⎤⎦∂Δu

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

T

=MT (t)Δx(t) + R(t)Δu(t) −GT (t)Δλ t( ) = 0

14Can explicitly solve for control

E-L #1

E-L #2

E-L #3

Page 8: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Two-Point Boundary-Value Problem

Δx(t) = F(t)Δx(t) +G(t)Δu(t)

Δλ t( ) = −Q(t)Δx(t) −M(t)Δu(t) − FT (t)Δλ t( )

Δλ t f( ) = P(t f )Δx(t f )

Δx(t0 ) = Δx0

State Equation

Adjoint Vector Equation

15

Δu(t) = −R−1(t) MT (t)Δx(t) +GT (t)Δλ t( )⎡⎣ ⎤⎦

From Hu = 0

Use Control Law to Solve the LTV Two-Point Boundary-Value Problem

Δu(t) = −R−1(t) MT (t)Δx(t) +GT (t)Δλ t( )⎡⎣ ⎤⎦

Δx(t)Δ λ t( )

⎣⎢⎢

⎦⎥⎥=

F(t) −G(t)R−1(t)MT (t)⎡⎣ ⎤⎦ −G(t)R−1(t)GT (t)

−Q(t) +M(t)R−1(t)MT (t)⎡⎣ ⎤⎦ − F(t) −G(t)R−1(t)MT (t)⎡⎣ ⎤⎦T

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

Δx(t)Δλ t( )

⎣⎢⎢

⎦⎥⎥

Δx(t0 )

Δλ t f( )⎡

⎢⎢

⎥⎥=

Δx0PfΔx f

⎣⎢⎢

⎦⎥⎥

Perturbation state vectorPerturbation adjoint vector

From Hu = 0

Substitute for control in system and adjoint equations

Control law that feeds back state and adjoint vectors

Adjoint relationship at end point

16

Page 9: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Use Control Law to Solve the LTV Two-Point Boundary-Value Problem

Assume the adjoint relationship between state and control applies over the entire interval

Δλ t( ) = P t( )Δx t( )Substitute for adjoint: Control law feeds back state alone

Δu*(t)= –R–1(t) MT(t)Δx(t)+GT(t)P t( )Δx t( )⎡⎣ ⎤⎦= –R–1(t) MT(t)+GT(t)P t( )⎡⎣ ⎤⎦Δx t( )

dim C( ) = m × n

17 Δu*(t) ! argmin

Δu t( ) in t0 , t f( )Δ2J Δx Δu( )⎡⎣ ⎤⎦

Δu*(t) = –C(t)Δx t( )

Time-Varying Linear-Quadratic (LQ) Optimal Control Gain Matrix

•  Properties of feedback gain matrix–  Full state feedback (m x n)–  Time-varying matrix

•  R, G, and M given•  Control weighting matrix, R•  State-control weighting matrix, M•  Control effect matrix, G

Δu(t) = −C(t)Δx t( )

C(t) = R−1(t) GT (t)P t( ) +MT (t)⎡⎣ ⎤⎦

Optimal feedback gain matrix

18P(t) remains to be determined

Page 10: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Solution for the Adjoining Matrix, P(t)

Δλ t( ) = P t( )Δx t( ) + P t( )Δx t( )

P t( )Δx t( ) = Δ λ t( ) − P t( )Δx t( )

Time-derivative of adjoint vector

Rearrange

!P t( )Δx t( ) = −Q(t)+M(t)R−1(t)MT (t)⎡⎣ ⎤⎦Δx t( )− F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦TΔλ t( )

−P t( ) F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦Δx t( )−G(t)R−1(t)GT (t)Δλ t( ){ }

Recall coupled state/adjoint equation

Δx(t)Δ λ t( )

⎣⎢⎢

⎦⎥⎥=

F(t) −G(t)R−1(t)MT (t)⎡⎣ ⎤⎦ −G(t)R−1(t)GT (t)

−Q(t) +M(t)R−1(t)MT (t)⎡⎣ ⎤⎦ − F(t) −G(t)R−1(t)MT (t)⎡⎣ ⎤⎦T

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

Δx(t)Δλ t( )

⎣⎢⎢

⎦⎥⎥

Substitute in adjoint matrix equation

19

Solution for the Adjoining Matrix, P(t)

Substitute for adjoint vector

Δλ t( ) = P t( )Δx t( )

!P t( )Δx t( ) = −Q(t)+M(t)R−1(t)MT (t)⎡⎣ ⎤⎦Δx t( )− F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦

TP t( )Δx t( )

−P t( ) F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦Δx t( )−G(t)R−1(t)GT (t)P t( )Δx t( ){ }

... and eliminate state vector

20

Page 11: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Matrix Riccati Equation for P(t)

!P t( ) = −Q(t)+M(t)R−1(t)MT (t)⎡⎣ ⎤⎦ − F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦TP t( )

−P t( ) F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦ + P t( )G(t)R−1(t)GT (t)P t( )P t f( ) = φxx t f( )

Nonlinear, ordinary differential equation for the Riccati Matrix, P(t), with terminal boundary conditions

21

Characteristics of the Adjoining ( Riccati) Matrix, P(t)

•  P(tf) is symmetric, n x n, and typically positive semi-definite

•  Matrix Riccati equation is symmetric•  Therefore, P(t) is symmetric and positive semi-definite

throughout•  Once P(t) has been determined, optimal feedback

control gain matrix, C(t) can be calculated

22

Page 12: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Example of Neighboring-Optimal Control: Improved Infection Treatment via

Feedback

23

Natural Response to Pathogen Assault (No Therapy)

24

Page 13: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

J = 12s11x1 f

2 + s44x4 f

2( ) + 12

q11x12 + q44x4

2 + ru2( )dtto

t f

Tradeoff Between Rate of Killing Pathogen, Preservation of Organ

Health, and Drug Use

Optimal Open-Loop Control

25

u(t) = u*(t)−C(t)Δx(t)= u*(t)−C(t)[x(t)− x*(t)]

50% Increased Infection: Nominal and Neighboring-Optimal Control (u1)

26

Page 14: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Optimal Control of Linear Systems

27

Time-Varying Linear-Quadratic (LQ) Feedback Control

LTV Continuous-time linear dynamic system

LTV optimal control law

Linear dynamic system with LTV optimal feedback control

Δx(t) = F(t)Δx(t) +G(t)Δu(t)

Δu(t) = −R−1(t) MT (t)+GT (t)P t( )⎡⎣ ⎤⎦Δx t( ) −C(t)Δx t( )

Δx(t) = F(t)Δx(t) +G(t)Δu(t)

= F(t)Δx(t)+G(t) −C(t)Δx t( )⎡⎣ ⎤⎦= F(t)−G(t)C(t)[ ]Δx(t)

28

Page 15: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

LTI System with Time-Varying LQ Feedback Control

Open-loop system is LTI, but optimal control gain matrix is time-varying

Closed-loop system Δu(t) = −R−1 MT +GTP t( )⎡⎣ ⎤⎦Δx t( ) −C(t)Δx t( )

Δ!x(t) = FΔx(t)+G −C(t)Δx t( )⎡⎣ ⎤⎦= F −GC(t)[ ]Δx(t) = Fclosed−loop t( )Δx(t)

29

LTI dynamic system

Δx(t) = FΔx(t) +GΔu(t)

Example: LTV Optimal Control of a LTI First-Order System

Δx = fΔx + gΔu

Δu = −r−1 gp t( )⎡⎣ ⎤⎦Δx t( )

= −gp t( )r

Δx

!p t( ) = −q − 2 fp t( ) + g2p2 t( )r

p t f( ) = pf

Δ2J = 12pfΔx

2 (t f )+12

qΔx2 + 0( )ΔxΔu + rΔu2( )dtt0

t f

30

Page 16: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

!p t( ) = −1+ 2p t( ) + p2 t( )p t f( ) = 1

Δx = −Δx + Δu; x 0( ) = 1

Δu = − p t( )Δx

Δx = − 1+ p t( )⎡⎣ ⎤⎦Δx

Example: LTV Optimal Control of a Stable First-Order Systemf = −1; g = 1

q = r = 1

Control gain = p t( )

Δxopen−loop t( )

Δxclosed−loop t( )

p t( )

Δu t( )

31

Δu = − p t( )Δx

Δx = 1− p t( )⎡⎣ ⎤⎦Δx

Example: LTV Optimal Control of an Unstable First-Order Systemf = +1; g = 1

Control gain = p t( )

Δx = Δx + Δu; x 0( ) = 1

!p t( ) = −1− 2p t( ) + p2 t( )p t f( ) = 1

Δxopen−loop t( )

Δxclosed−loop t( )

p t( )

Δu t( )

32

Page 17: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Discrete-Time and Sampled-Data Linear, Time-Invariant

(LTI) Systems

33

Continuous- and Discrete-Time LTI System Models

Δ!x(t) = FΔx(t)+GΔu(t)+LΔw(t)Δx(t0 ) given

Continuous-time (“analog”) model is based on an ordinary differential equation

Dynamic Process

Observation Process

34

Δx(tk+1) = ΦΔx(tk )+ ΓΔu(tk )+ ΛΔw(tk )Δx(t0 ) given

Dynamic Process

Observation Process

Discrete-time (“digital”) model is based on an ordinary difference equation

Δy(t) = HxΔx(t)+HuΔu(t)+HwΔw(t)

Δy(tk ) = HxΔx(tk )+HuΔu(tk )+HwΔw(tk )

Page 18: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Digital Control Systems Use Sampled Data

Δxk = Δx tk( ) = Δx kΔt( )

•  Periodic sequence

•  Sampler is an analog-to-digital (A/D) converter•  Reconstructor is a digital-to-analog (D/A) converter

35

Solution of a LTI dynamic model

Δ!x(t) = FΔx(t)+GΔu(t)+LΔw(t), Δx(t0 ) given

Δx(t) = Δx(t0 )+ FΔx(τ )+GΔu(τ )+LΔw(τ )[ ]dτt0

t

∫•  ... has two parts

–  Unforced (homogeneous) response to initial conditions–  Forced response to control and disturbance inputs

LTI System Response to Inputs and Initial Conditions

36

Initial condition response

Step input response

Page 19: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Unforced Response to Initial Conditions

The state transition matrix propagates the state from t0 to t by a single multiplication

Δx(t) = Δx(t0 )+ FΔx(τ )[ ]dτt0

t

∫ = Φ t,t0( )Δx(t0 )

Neglecting forcing functions

The state transition matrix is the matrix exponential

Δx(t) = Δx(t0 )+ FΔx(τ )[ ]dτt0

t

∫= Φ t − t0( )Δx(t0 ) = eF t−t0( )Δx(t0 ) 37

LTI State Transition Matrix is the Matrix Exponential

eF t−t0( ) ! eFδ t =Matrix Exponential! Φ δ t( ) = State Transition Matrix

= I+ Fδ t + 12!Fδ t[ ]2 + 1

3!Fδ t[ ]3 + ...

See pages 79-84 of Optimal Control and Estimation for a description of how the State Transition Matrix is calculated for LTV

and LTI systems

38

Page 20: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Response to Inputs

Solution of the LTI model with piecewise-constant

forcing functions

Δx(tk ) = Δx(tk−1) + FΔx(τ ) +GΔu(τ ) + LΔw(τ )[ ]dτtk−1

tk

Δx(tk ) = eFδ tΔx(tk−1)+ e

Fδ t e−F τ −tk−1( )⎡⎣ ⎤⎦dτtk−1

tk

∫ GΔu(tk−1)+LΔw(tk−1)[ ]= ΦΔx(tk−1)+ ΓΔu(tk−1)+ ΛΔw(tk−1)

39

δ t ! tk − tk−1 = constant

Φ ! eFδ t

Γ ! eFδ t I− e−Fδ t( )F−1G = eFδ t − I( )F−1G

Λ ! eFδ t I− e−Fδ t( )F−1L = eFδ t − I( )F−1L

Discrete-Time LTI System Response to Step Input

Δx(t1) = ΦΔx(t0 )+ ΓΔu(t0 )+ ΛΔw(t0 )Δx(t2 ) = ΦΔx(t1)+ ΓΔu(t1)+ ΛΔw(t1)Δx(t3) = ΦΔx(t2 )+ ΓΔu(t2 )+ ΛΔw(t2 )

!

Step Input: Discrete-time solution is exact at sampling instants

Propagation of Δx tk( )with constant δ t, Φ, Γ, and Λ

40

Page 21: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

As time interval becomes very small, discrete-time model approaches

continuous-time model

Φ δ t→0⎯ →⎯⎯ I + Fδt( )Γ δ t→0⎯ →⎯⎯ GδtΛ δ t→0⎯ →⎯⎯ Lδt

Series Expressions for Discrete-Time LTI System Matrices

Γ = eFδ t − I( )F−1G = I + 12!F δt( )⎡⎣ ⎤⎦ +

13!F δt( )⎡⎣ ⎤⎦

2 + ...⎛⎝⎜

⎞⎠⎟Gδt

Λ = eFδ t − I( )F−1L = I + 12!F δt( )⎡⎣ ⎤⎦ +

13!F δt( )⎡⎣ ⎤⎦

2 + ...⎛⎝⎜

⎞⎠⎟Lδt

41

Φ = e Fδt = I + F δt( ) + 1

2!F δt( )⎡⎣ ⎤⎦

2+ 13!F δt( )⎡⎣ ⎤⎦

3+!

Sampled-Data Cost Function

minΔu t( )

Δ2J = minΔu t( )

12ΔxT (t f )P(t f )Δx(t f )+

12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Q MMT R

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

t0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

Minimize subject to sampled-data dynamic constraint

Δx(tk+1) = Φ δt( )Δx(tk ) + Γ δt( )Δu(tk )

Sampled-Data Cost Function: a Discrete-Time Cost Function that accounts for system response between sampling instants

minΔu t( )

Δ2J = minΔu t( )

12Δxk f

TPk fΔxk f +12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Q MMT R

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

tk

tk+1

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

Sum integrals over short time intervals, (tk, tk+1)

42

Page 22: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Integrand of Sampled-Data Cost Function

Use dynamic equation ...

12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Q MMT R

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

tk

tk+1

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

...to express the integrand in the sampling interval, (tk,tk+1)

Δx t( ) = Φ t,tk( )Δx tk( ) + Γ t,tk( )Δu tk( )! Φ t,tk( )Δxk + Γ t,tk( )Δuk

43

Integrand of Sampled-Data Cost Function

12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Q MMT R

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

tk

tk+1

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

=12

ΔxkT Δuk

T⎡⎣

⎤⎦

ΦT t,tk( )QΦ t,tk( ) ΦT t,tk( ) QΓ t,tk( ) +M⎡⎣ ⎤⎦

QΓ t,tk( ) +M⎡⎣ ⎤⎦TΦ t,tk( ) R + ΓT t,tk( )M +MTΓ t,tk( ) + ΓT t,tk( )QΓ t,tk( )⎡⎣ ⎤⎦

⎢⎢⎢

⎥⎥⎥tk

tk+1

∫k

dtΔxkΔuk

⎣⎢⎢

⎦⎥⎥

⎨⎪

⎩⎪

⎬⎪

⎭⎪k=0

k f −1

= 12

ΔxkT Δuk

T⎡⎣

⎤⎦

Q̂ M̂M̂T R̂

⎣⎢⎢

⎦⎥⎥k

ΔxkΔuk

⎣⎢⎢

⎦⎥⎥

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

Bring state and control out of integralAssume control is constant in sampling interval

Integration has been replaced by summation

Berman, Gran, J. Aircraft, 1974 44

Page 23: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Sampled-Data Cost Function Weighting Matrices

Q̂ = ΦT t,tk( )QΦ t,tk( )dttk

tk+1

M̂ = ΦT t,tk( ) QΓ t,tk( ) +M⎡⎣ ⎤⎦dttk

tk+1

R̂ = R + ΓT t,tk( )M +MTΓ t,tk( ) + ΓT t,tk( )QΓ t,tk( )⎡⎣ ⎤⎦dttk

tk+1

The integrand accounts for continuous-time variations of the LTI system between sampling instants (“Inter-sample ripple”)

Assume Q, M, and R are constant in the integration interval

Φ t,tk( ) and Γ t,tk( ) vary in the integration interval

45

Dynamic Programming Approach to Sampled-Data

Optimal Control

46

Page 24: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Discrete-Time Hamilton-Jacobi-Bellman Equation

V (t0 ) =ϕk f+ Lk

k=0

k f −1

Value Function at toDiscrete HJB equation

•  Begin at terminal point with optimal value function•  Working backward, add minimum value function

increment (Bellman’s Principle of Optimality)

Vk*= −minΔuk

Lk +Vk+1 *{ }= −min

ΔukHk , V * Δxk f *⎡⎣ ⎤⎦ = given

… optimal policy … whatever the initial state and initial decision …remaining decisions must constitute an optimal policy with regard to the current state

47

subject toΔxk+1 = ΦΔxk + ΓΔuk

Sampled-Data Cost Function Contains Terminal and Summation Costs

Integral cost has been replaced by a summation costTerminal cost is the same

minΔuk

Jsampled

= minΔuk

12Δxk f

TPk fΔxk f +12

ΔxkT Δuk

T⎡⎣

⎤⎦

Q̂ M̂M̂T R̂

⎣⎢⎢

⎦⎥⎥k

ΔxkΔuk

⎣⎢⎢

⎦⎥⎥

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

∑⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

subject toΔxk+1 = ΦΔxk + ΓΔuk

48

Page 25: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Dynamic Programming Approach to Sampled-Data LQ Control

V (t0 ) =12Δxk f

TPk fΔxk f +12

ΔxkT Δuk

T⎡⎣

⎤⎦

Q̂ M̂M̂T R̂

⎣⎢⎢

⎦⎥⎥

ΔxkΔuk

⎣⎢⎢

⎦⎥⎥

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

Vk*= −minΔuk

12

Δxk *T Q̂Δxk *+2Δxk *T M̂Δuk + ΔukT R̂Δuk⎡⎣ ⎤⎦ +Vk+1 *⎧

⎨⎩

⎫⎬⎭

= −minΔuk

Hk , V * Δxk f *⎡⎣ ⎤⎦ = Δxk f *T Pk fΔxk f *T

subject to Δxk+1 = ΦΔxk + ΓΔuk

Quadratic Value Function at to

Discrete HJB equation

49

Optimality Condition

Vk =12Δxk

TPkΔxk ; Vk+1 =12Δxk+1

TPk+1Δxk+1

Assume value function takes a quadratic form

Optimality condition∂Hk

∂Δuk= Δxk

TM̂ + ΔukT R̂⎡⎣ ⎤⎦ +

∂Vk+1∂Δuk

= 0

∂Vk+1∂Δuk

=∂ 12Δxk+1

TPk+1Δxk+1⎡⎣⎢

⎤⎦⎥

∂Δuk= ΦΔxk + ΓΔuk[ ]T Pk+1Γ

ΔxkTM̂ + Δuk

T R̂⎡⎣ ⎤⎦ + ΔxkTΦT + Δuk

TΓT⎡⎣ ⎤⎦Pk+1Γ = 0

where

hence50

Page 26: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Minimizing Value of Control

Δuk = − R̂ + ΓTPk+1Γ⎡⎣ ⎤⎦

−1M̂T + ΓTPk+1Φ⎡⎣ ⎤⎦Δxk −CkΔxk

Must find Pk in 0,k f( )Use definitions of V * and Δu in HJB equation

∂Hk

∂Δuk= Δxk

T M̂ +ΦTPk+1Γ⎡⎣ ⎤⎦ + ΔukT R̂ + ΓTPk+1Γ⎡⎣ ⎤⎦ = 0

51

Solution for Pk

12Δxk

TPkΔxk =

−minΔuk

12

ΔxkTQ̂Δxk + 2Δxk

TM̂ −CkΔxk( ) + −CkΔxk( )T R̂ −CkΔxk( ) + Δxk+1TPk+1Δxk+1⎡

⎣⎤⎦

⎧⎨⎩

⎫⎬⎭

Vk =12Δxk

TPkΔxk ; Vk+1 =12Δxk+1

TPk+1Δxk+1

Pk = Q̂+Φ TPk+1Φ − M̂ T + Γ TPk+1Φ⎡⎣ ⎤⎦TR̂ + Γ TPk+1Γ⎡⎣ ⎤⎦

−1M̂ T + Γ TPk+1Φ⎡⎣ ⎤⎦

Pk f given

Δuk = − R̂ + ΓTPk+1Γ⎡⎣ ⎤⎦

−1M̂T + ΓTPk+1Φ⎡⎣ ⎤⎦Δxk −CkΔxk

Substitute for Vk ,Vk+1, and Δuk in discrete-time HJB equation

Rearrange and cancel Δxk on both sides of the equation to yield thediscrete-time Riccati equation

52

Page 27: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Discrete-Time System with Linear-Quadratic Feedback Control

Dynamic System

Control law

Dynamic system with LQ feedback control

Δxk+1 = ΦΔxk + ΓΔuk

Δuk = − R̂ + Γ TPk+1Γ⎡⎣ ⎤⎦

−1M̂ T + Γ TPk+1Φ⎡⎣ ⎤⎦Δxk −CkΔxk

Δxk+1 = ΦΔxk + ΓΔuk= ΦΔxk + Γ −CkΔx( )k= Φ − ΓCk( )Δxk

53

Next Time:Dynamic System Stability

ReadingOCE: Section 2.5

54

Page 28: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Supplemental Material

55

Neighboring Trajectories•  Nominal (or reference) trajectory and control history

xN (t), uN (t),wN (t){ } for t in [t0,t f ]

•  Trajectory perturbed by–  Small initial condition variation–  Small control variation–  Small disturbance variation

x(t), u(t),w(t){ } for t in [t0,t f ]

= xN (t)+ Δx(t), uN (t)+ Δu(t),wN (t)+ Δw(t){ }

x : dynamic stateu : control inputw : disturbance input

56

Page 29: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Example: Separate Solutions for Nominal and Perturbation Trajectories

Δ!x1Δ!x2Δ!x3

⎢⎢⎢

⎥⎥⎥=

0 1 0−2a1 x3N − x1N( ) −a2 a2 + 2a1 x3N − x1N( )c1 + b3u1N( ) c1 3c2x3N

2

⎢⎢⎢⎢

⎥⎥⎥⎥

Δx1Δx2Δx3

⎢⎢⎢

⎥⎥⎥

+0 0b1 b2b3x1N 0

⎢⎢⎢

⎥⎥⎥

Δu1Δu2

⎣⎢⎢

⎦⎥⎥+

d00

⎢⎢⎢

⎥⎥⎥Δw1 ;

Δx1(to )Δx2 (to )Δx3(to )

⎢⎢⎢

⎥⎥⎥given

Linear, time-varying equation describes perturbation dynamics

Original nonlinear equation describes nominal dynamics

xN =

x1Nx2Nx3N

⎢⎢⎢⎢

⎥⎥⎥⎥

=

x2N + dw1N

a2 x3N − x2N( ) + a1 x3N − x1N( )2 + b1u1Nc2x3N

3 + c1 x1N + x2N( ) + b3x1N u1N+ b2u2N

⎢⎢⎢⎢⎢

⎥⎥⎥⎥⎥

,

x1Nx2Nx3N

⎢⎢⎢⎢

⎥⎥⎥⎥

given

57

!p t( ) = −q + 2p t( ) + p2 t( ); q = 1 or 100

p t f( ) = 1 Δx = −Δx + Δu + Δw; x 0( ) = 0

Δu = − p t( )Δx Δx = − 1+ p t( )⎡⎣ ⎤⎦Δx

Example: LQ Optimal Control, Stable First-Order System, “White-Noise” Disturbance

q = 1 q = 100Open-Loop Response

Open- and Closed-Loop Responses

Δxopen−loop t( )

p t( )

Δu t( )

Δxopen−loop t( )

p t( )

Δu t( )

58

Page 30: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

!p t( ) = −1+ 2p t( ) + p2 t( )p t f( ) = 1 or 1000

Δx = −Δx + Δu + Δw; x 0( ) = 0

Δu = − p t( )Δx Δx = − 1+ p t( )⎡⎣ ⎤⎦Δx

pf = 1 pf = 1000Open-Loop Response

Open- and Closed-Loop Responses

Δxopen−loop t( )

p t( )

Δu t( )

Δxopen−loop t( )

p t( )

Δu t( )

59

Example: LQ Optimal Control, Stable First-Order System, “White-Noise” Disturbance

!p t( ) = −100 + 2p t( ) + p2 t( )p t f( ) = 1 or 0

Δx = −Δx + Δu + Δw; x 0( ) = 0

Δu = − p t( )Δx Δx = − 1+ p t( )⎡⎣ ⎤⎦Δx

pf = 1 pf = 0Δxopen−loop t( )

Δxopen−loop t( ), Δxclosed−loop t( )

p t( )

Δu t( )

Δxopen−loop t( )

Δxopen−loop t( ), Δxclosed−loop t( )

p t( )

Δu t( )

60

Example: LQ Optimal Control, Stable First-Order System, “White-Noise” Disturbance

Page 31: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Multivariable LQ Optimal Control with Cross Weighting, M, = 0

!P t( ) = −Q[ ]− F[ ]T P t( )− P t( ) F[ ]+ P t( )GR−1GTP t( )P t f( ) = Pf

Δu(t) = −R−1 GTP t( )⎡⎣ ⎤⎦Δx t( )

Δ2J = 12ΔxT (t f )P(t f )Δx(t f )

+ 12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦Q 00 R

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

t0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

No state/control coupling in cost function

Δx(t) = FΔx(t) +GΔu(t)

61

Nonlinear System with Neighboring-Optimal Feedback Control

Nonlinear dynamic system

x(t) = f x t( ),u(t)⎡⎣ ⎤⎦

u(t) = u * (t) −C(t)Δx t( ) = u * (t) −C(t) x t( ) − x * t( )⎡⎣ ⎤⎦

x(t) = f x t( ), u * (t) −C(t) x t( ) − x * t( )⎡⎣ ⎤⎦⎡⎣ ⎤⎦{ }

Neighboring-optimal control law

Nonlinear dynamic system with neighboring-optimal feedback control

62

Page 32: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Development of Neighboring-Optimal Therapy

x(t) = x * (t) + Δx(t)u(t) = u * (t) + Δu(t)

•  Expand dynamic equation to first degree

•  Compute nominal optimal control history using original nonlinear dynamic model

•  Compute optimal perturbation control using locally linearized dynamic model

•  Sum the two for neighboring-optimal control of the dynamic system

Δu(t) = −C(t) x(t)− xopt (t)⎡⎣ ⎤⎦u(t) = uopt (t)+ Δu(t)

u(t) = uopt (t)

63

First-Order LQ Example Code% First-Order LQ Example% Rob Stengel % 2/23/2011 clear global tp p q tw w xo = 0; to = 0; tf = 10; tspanx = [to tf]; tw = [0:0.01:10]; for k = 1:length(tw) w(k) = randn(1); end [tx,x] = ode15s('First',tspanx,xo); pf = 0; q = 100; tspanp = [tf to]; [tp,p] = ode15s('FirstRiccati',tspanp,pf); [tc,xc] = ode15s('FirstCL',tspanx,xo); u = interp1(tp,p,tc).*xc; figure subplot(4,1,1) plot(tx,x),grid,xlabel('Time'),ylabel('x-open-loop') subplot(4,1,2) plot(tp,p,'r'),grid,xlabel('Time'),ylabel('p') subplot(4,1,3) plot(tc,u),grid, xlabel('Time'),ylabel('u') subplot(4,1,4) plot(tc,xc,'r',tx,x,'b'),grid,xlabel('Time'),ylabel('x- closed-loop')

function xdot = First(t,x)global tp p tw w wt = interp1(tw,w,t,'nearest');xdot = -x + wt;

function pdot = FirstRiccati(t,p)

global qpdot = -q + 2*p + p^2;

function xdot = FirstCL(tc,xc);

global tp p tw w wt = interp1(tw,w,tc,'nearest');pt = interp1(tp,p,tc);xdot = -(1 + pt)*xc + wt;

64

Page 33: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Initial-Condition Response via State Transition

Φ = I+ F δ t( ) + 12!F δ t( )⎡⎣ ⎤⎦

2

+ 13!F δ t( )⎡⎣ ⎤⎦

3+ ...

Δx(t1) = Φ t1 − to( )Δx(t0 )Δx(t2 ) = Φ t2 − t1( )Δx(t1)Δx(t3) = Φ t3 − t2( )Δx(t2 )…

Δx(t1) = Φ δ t( )Δx(t0 ) = ΦΔx(t0 )Δx(t2 ) = ΦΔx(t1) = Φ2Δx(t0 )Δx(t3) = ΦΔx(t2 ) = Φ3Δx(t0 )

Propagation of Δx tk( ) in LTI system

State transition matrix is constant if tk − tk−1( ) = δ t = constant

65

Example: Equivalent Continuous-Time and Discrete-Time System Matrices

F = −1.2794 −7.98561 −1.2709

⎣⎢

⎦⎥

G = −9.0690

⎣⎢

⎦⎥

L = −7.9856−1.2709

⎣⎢

⎦⎥

Φ = 0.845 −0.69360.0869 0.8457

⎣⎢

⎦⎥

Γ = −0.8404−0.0414

⎣⎢

⎦⎥

Λ = −0.6936−0.1543

⎣⎢

⎦⎥

Φ = 0.0823 −1.47510.1847 0.0839

⎣⎢

⎦⎥

Γ = −2.4923−0.6429

⎣⎢

⎦⎥

Λ = −1.4751−0.9161

⎣⎢

⎦⎥

Continuous-time (“analog”) system

Discrete-time (“digital”) system

Time interval has a large

effect on the discrete-time

matrices

δt = 0.1s

δt = 0.5s

66

Page 34: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Evaluating Sampled-Data Weighting Matrices

Φ t,tk( ) and Γ t,tk( ) vary in the integration interval

Break interval into smaller intervals, and approximate as sum of short rectangular integration steps

Q̂ = ΦT t,0( )QΦ t,0( )dt0

Δt

! ΦT tk−1,0( )QΦ tk−1,0( )δ t⎡⎣ ⎤⎦k=1

100

∑ , δ t = Δt 100, tk = kδ t

! eFT tk−1QeFtk−1δ t⎡⎣ ⎤⎦

k=1

100

67

Q̂,M̂, and R̂ evaluated just once for LTI system

Γ = eFδ t − I( )F−1G = I + 12!F δt( )⎡⎣ ⎤⎦ +

13!F δt( )⎡⎣ ⎤⎦

2 + ...⎛⎝⎜

⎞⎠⎟Gδt

Evaluating Sampled-Data Weighting Matrices

M̂ = ΦT t,0( ) QΓ t,0( ) +M⎡⎣ ⎤⎦dt0

Δt

! eFT tk−1Q I+ 1

2!Ftk−1[ ]+ 13! Ftk−1[ ]2 + ...⎛

⎝⎜⎞⎠⎟Gtk−1

⎡⎣⎢

⎤⎦⎥+M

⎧⎨⎩

⎫⎬⎭δ t

k=1

100

R̂ ! R + ΓT tk−1( )M +MTΓ tk−1( ) + ΓT tk−1( )QΓ tk−1( )⎡⎣ ⎤⎦δ tk=1

100

∑68

Page 35: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Sampled-Data Cost Function Weighting Always Includes

State-Control Weighting

M̂ = ΦT t,tk( ) QΓ t,tk( ) +M⎡⎣ ⎤⎦dttk

tk+1

= ΦT t,tk( )QΓ t,tk( )dttk

tk+1

∫ even if M = 0

Lk =12

ΔxkTQ̂Δxk + 2Δxk

TM̂Δuk + ΔukT R̂Δuk⎡⎣ ⎤⎦

Sampled-Data Lagrangian

69

Continuous-Time LQ Optimization via Dynamic

Programming

70

Page 36: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Dynamic Programming Approach to Continuous-

Time LQ Control

V (t0 ) =12ΔxT (t f )P(t f ) f Δx(t f )

+ 12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Q(t) M(t)MT (t) R(t)

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

to

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

V (t1) =12ΔxT (t f )P(t f )Δx(t f )

− 12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Q(t) M(t)MT (t) R(t)

⎣⎢⎢

⎦⎥⎥

Δx(t)Δu(t)

⎣⎢⎢

⎦⎥⎥dt

tf

t1

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

Value Function at to

Value Function at t1

71

Dynamic Programming Approach to Continous-

Time LQ Control

∂V * Δx*(t1)[ ]∂t

=

−minΔu

12

Δx*T (t1)Q(t1)Δx*(t1)+ 2Δx*T (t1)M(t1)Δu(t1)+ ΔuT (t1)R(t1)Δu(t1)⎡⎣ ⎤⎦

+∂V * Δx*(t1)[ ]

∂ΔxF(t1)Δx*(t1)+G(t1)Δu(t1)[ ]

⎨⎪⎪

⎩⎪⎪

⎬⎪⎪

⎭⎪⎪

Time Derivative of Value Function

72

Page 37: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Dynamic Programming Approach to Continuous-

Time LQ Control

∂V * Δx*[ ]∂t

= −minΔu

H Δx*,Δu, ∂V *∂Δx

⎡⎣⎢

⎤⎦⎥,

V * Δx(t f )⎡⎣ ⎤⎦ = Δx*T (t f )P(t f )Δx*T (t f )

Hamiltonian

H Δx*,Δu, ∂V *∂Δx

⎡⎣⎢

⎤⎦⎥!

12

Δx*T QΔx*+2Δx*T MΔu+ ΔuTRΔu⎡⎣ ⎤⎦ +∂V * Δx*[ ]

∂ΔxFΔx*+GΔu[ ]

HJB Equation

73

Plausible Form for the Value Function

V * Δx * (t)[ ] = Δx *T (t)P(t)Δx *T (t)

∂V *∂t

= −12

Δx *T (t1) P(t1)Δx * (t1)⎡⎣ ⎤⎦

∂V *∂Δx

= Δx *T (t)P(t)

Time Derivative of the Value Function

Gradient of the Value Function with respect to the state

Quadratic Function of State Perturbation

74

Page 38: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Optimal Control Law and HJB Equation

∂H∂u

= ΔxTM + ΔuTR + ΔxTPG = 0

Δu(t) = −R−1 GTP+MT( )Δx(t)

ΔxT !PΔx =

ΔxT −Q+MR−1MT⎡⎣ ⎤⎦ − F −GR−1MT⎡⎣ ⎤⎦TP− P F −GR−1MT⎡⎣ ⎤⎦ + PGR

−1GTP{ }Δx

Optimal control law

Incorporate Value Function Model in HJB equation

Δx t( ) can be cancelled on left and right 75

Matrix Riccati Equation

!P t( ) = −Q(t)+M(t)R−1(t)MT (t)⎡⎣ ⎤⎦

− F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦TP t( )

−P t( ) F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦+P t( )G(t)R−1(t)GT (t)P t( )

P t f( ) = φxx t f( )

76

Page 39: Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear function of deflection f(x)=−k 1 x±k 2 x 3 moment(θ)≈−k 1θ+k 2θ3 Example:

Example: 1st-Order System with LQ Feedback Control

1st-order discrete-time dynamic system

LQ optimal control law

Dynamic system with LQ feedback controlΔxk+1 = φΔxk + γΔuk

= φΔxk + γ −ckΔx( )k= φ −γ ck( )Δxk

Δxk+1 = φΔxk + γΔuk

Δuk = −

m + φγ pk+1r + γ 2 pk+1

Δxk −ckΔxk pk = q + φ2 pk+1 −

m + φγ pk+1( )2

r + γ 2 pk+1

, pkf given

77