Neighboring-Optimal Control via Linear-Quadratic (LQ) Feedback
Robert Stengel Optimal Control and Estimation, MAE 546
Princeton University, 2018
• Linearization of nonlinear dynamic models
  – Nominal trajectory
  – Perturbations about the nominal trajectory
  – Linear, time-invariant dynamic models
  – Examples
• Continuous-time, time-varying optimal LQ feedback control
• Discrete-time and sampled-data linear dynamic models
• Dynamic programming approach to optimal LQ sampled-data control
Copyright 2018 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html
Neighboring Trajectories Satisfy Same Dynamic Equations
• Neighboring-trajectory dynamic model is the same as the nominal dynamic model
\dot{x}_N(t) = f[x_N(t), u_N(t), w_N(t), t], \quad x_N(t_0) \text{ given}
\dot{x}(t) = f[x(t), u(t), w(t), t], \quad x(t_0) \text{ given}

Perturbations about the nominal trajectory:

\Delta x(t_0) = x(t_0) - x_N(t_0)
\Delta x(t) = x(t) - x_N(t)
\Delta\dot{x}(t) = \dot{x}(t) - \dot{x}_N(t)
\Delta u(t) = u(t) - u_N(t)
\Delta w(t) = w(t) - w_N(t)

\dot{x}(t) = \dot{x}_N(t) + \Delta\dot{x}(t) = f[x_N(t) + \Delta x(t),\ u_N(t) + \Delta u(t),\ w_N(t) + \Delta w(t),\ t]
Solve for the Nominal and Perturbation Trajectories Separately
Jacobian matrices of the linear model are evaluated along the nominal trajectory
Nominal trajectory:

\dot{x}_N(t) = f[x_N(t), u_N(t), w_N(t), t], \quad x_N(t_0) \text{ given}

Perturbation trajectory:

\Delta\dot{x}(t) \approx \frac{\partial f}{\partial x}(t)\,\Delta x(t) + \frac{\partial f}{\partial u}(t)\,\Delta u(t) + \frac{\partial f}{\partial w}(t)\,\Delta w(t), \quad \Delta x(t_0) \text{ given}

F(t) \triangleq \left.\frac{\partial f}{\partial x}\right|_{\substack{x = x_N(t) \\ u = u_N(t) \\ w = w_N(t)}}; \quad G(t) \triangleq \left.\frac{\partial f}{\partial u}\right|_{\substack{x = x_N(t) \\ u = u_N(t) \\ w = w_N(t)}}; \quad L(t) \triangleq \left.\frac{\partial f}{\partial w}\right|_{\substack{x = x_N(t) \\ u = u_N(t) \\ w = w_N(t)}}

\Delta\dot{x}(t) = F(t)\Delta x(t) + G(t)\Delta u(t) + L(t)\Delta w(t), \quad \Delta x(t_0) \text{ given}

Total trajectory:

\dot{x}(t) = \dot{x}_N(t) + \Delta\dot{x}(t) \approx f[x_N(t), u_N(t), w_N(t), t] + \frac{\partial f}{\partial x}\Delta x(t) + \frac{\partial f}{\partial u}\Delta u(t) + \frac{\partial f}{\partial w}\Delta w(t),
x(t_0) = x_N(t_0) + \Delta x(t_0) \text{ given}
Linearization Example
Cubic Springs
Force is a nonlinear function of deflection
f(x) = -k_1 x \pm k_2 x^3
\text{moment}(\theta) \approx -k_1\theta + k_2\theta^3

Example: Expand g \sin\theta in a power series
Stiffening Cubic Spring Example
2nd-order nonlinear dynamic model

\dot{x}_1(t) = f_1[x(t)] = x_2(t)
\dot{x}_2(t) = f_2[x(t)] = -10x_1(t) - 10x_1^3(t) - x_2(t)

Integrate the equations to produce the nominal path in [0, t_f]:

\begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} \rightarrow \int_0^{t_f} \begin{bmatrix} f_{1_N}[x(t)] \\ f_{2_N}[x(t)] \end{bmatrix} dt \rightarrow \begin{bmatrix} x_{1_N}(t) \\ x_{2_N}(t) \end{bmatrix}

Evaluate the partial derivatives of the Jacobian matrices:

\frac{\partial f_1}{\partial x_1} = 0; \quad \frac{\partial f_1}{\partial x_2} = 1
\frac{\partial f_2}{\partial x_1} = -10 - 30x_{1_N}^2(t); \quad \frac{\partial f_2}{\partial x_2} = -1
\frac{\partial f_1}{\partial u} = 0; \quad \frac{\partial f_1}{\partial w} = 0
\frac{\partial f_2}{\partial u} = 0; \quad \frac{\partial f_2}{\partial w} = 0
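These Jacobian entries can be spot-checked numerically. A minimal Python sketch (the evaluation point (0.7, 2.0) is an arbitrary choice for illustration, not from the lecture) compares the analytic partials of f_2 with central differences:

```python
def f2(x1, x2):
    # Second state equation of the stiffening cubic spring: -10 x1 - 10 x1^3 - x2
    return -10.0 * x1 - 10.0 * x1**3 - x2

def df2_dx1(x1):
    # Analytic Jacobian entry from the lecture: -10 - 30 x1N^2
    return -10.0 - 30.0 * x1**2

# Central-difference approximation at an arbitrary nominal point
x1N, x2N, h = 0.7, 2.0, 1e-6
fd_x1 = (f2(x1N + h, x2N) - f2(x1N - h, x2N)) / (2 * h)
fd_x2 = (f2(x1N, x2N + h) - f2(x1N, x2N - h)) / (2 * h)

print(fd_x1, df2_dx1(x1N))   # both close to -24.7
print(fd_x2)                 # close to -1
```

The same check applies to any entry of F(t), G(t), or L(t) before committing to a hand-derived Jacobian.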
Nominal and Perturbation Dynamic Equations
Nonlinear, Time-Invariant (NTI) Equation:

\dot{x}_{1_N}(t) = x_{2_N}(t)
\dot{x}_{2_N}(t) = -10x_{1_N}(t) - 10x_{1_N}^3(t) - x_{2_N}(t)

Linear, Time-Varying (LTV) Equation:

\begin{bmatrix} \Delta\dot{x}_1(t) \\ \Delta\dot{x}_2(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\left(10 + 30x_{1_N}^2(t)\right) & -1 \end{bmatrix} \begin{bmatrix} \Delta x_1(t) \\ \Delta x_2(t) \end{bmatrix}

Initial Conditions for Nonlinear and Linear Models:

\begin{bmatrix} x_{1_N}(0) \\ x_{2_N}(0) \end{bmatrix} = \begin{bmatrix} 0 \\ 9 \end{bmatrix}; \quad \begin{bmatrix} \Delta x_1(0) \\ \Delta x_2(0) \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
Comparison of Approximate and Exact Solutions
[Figure: time histories of x_N(t), \Delta x(t), x_N(t) + \Delta x(t), and the exact x(t), and of the corresponding rates, with x_{2_N}(0) = 9 and \Delta x_2(0) = 1, so that x_2(0) = 10]
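The comparison can be reproduced numerically. A Python/SciPy sketch (assuming the model and initial conditions above) integrates the nominal nonlinear system, the LTV perturbation model, and the exact perturbed system, then measures how closely x_N(t) + Δx(t) tracks x(t):

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, x):
    # Stiffening cubic spring: x1' = x2, x2' = -10 x1 - 10 x1^3 - x2
    return [x[1], -10.0 * x[0] - 10.0 * x[0]**3 - x[1]]

tf = 5.0
t_eval = np.linspace(0.0, tf, 500)

# Nominal path from [0, 9]; exact perturbed path from [0, 10]
nom = solve_ivp(f, (0, tf), [0.0, 9.0], t_eval=t_eval, dense_output=True,
                rtol=1e-9, atol=1e-9)
exact = solve_ivp(f, (0, tf), [0.0, 10.0], t_eval=t_eval, rtol=1e-9, atol=1e-9)

def lin(t, dx):
    # LTV perturbation model; Jacobian evaluated along the nominal trajectory
    x1N = nom.sol(t)[0]
    return [dx[1], -(10.0 + 30.0 * x1N**2) * dx[0] - dx[1]]

pert = solve_ivp(lin, (0, tf), [0.0, 1.0], t_eval=t_eval, rtol=1e-9, atol=1e-9)

approx = nom.y + pert.y                  # x_N(t) + Δx(t)
err = np.abs(approx - exact.y).max()
print("max |(x_N + Δx) - x| =", err)     # small relative to states of order 10
```

The residual is second order in the perturbation size, which is the sense in which the neighboring trajectory "satisfies the same dynamic equations."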
Euler-Lagrange Equations for Minimizing the Variational Cost Function
Optimize Nominal and Neighboring Dynamic Models Separately
Nominal, Nonlinear, Open-Loop Optimal Control:

u(t) = u_{opt}(t) \triangleq u^*(t)

Linear, Neighboring-Optimal Control Added to Nominal Control:

\Delta u(t) = -C(t)\Delta x(t) = -C(t)\left[x(t) - x^*(t)\right]
u(t) = u^*(t) + \Delta u(t)
Expand Optimal Control Function
Expand the optimized cost function to second degree:

J\left\{\left[x^*(t_0) + \Delta x(t_0)\right], \left[x^*(t_f) + \Delta x(t_f)\right]\right\} \cong J^*\left[x^*(t_0), x^*(t_f)\right] + \Delta J\left[\Delta x(t_0), \Delta x(t_f)\right] + \Delta^2 J\left[\Delta x(t_0), \Delta x(t_f)\right]

Nominal optimized cost, plus nonlinear dynamic constraint:

J^*\left[x^*(t_0), x^*(t_f)\right] = \phi\left[x^*(t_f)\right] + \int_{t_0}^{t_f} L\left[x^*(t), u^*(t)\right]dt

subject to the nonlinear dynamic equation

\dot{x}^*(t) = f\left[x^*(t), u^*(t)\right], \quad x(t_0) = x_0

Because the first variation is zero at the optimum, \Delta J\left[\Delta x(t_0), \Delta x(t_f)\right] = 0, and

J = J^*\left[x^*(t_0), x^*(t_f)\right] + \Delta^2 J\left[\Delta x(t_0), \Delta x(t_f)\right]
2nd Variation of the Cost Function
Objective: Given the optimal nominal solution, minimize the 2nd-variational cost subject to the linear dynamic constraint:

\min_{\Delta u} \Delta^2 J = \frac{1}{2}\Delta x^T(t_f)\phi_{xx}(t_f)\Delta x(t_f) + \frac{1}{2}\int_{t_0}^{t_f} \begin{bmatrix} \Delta x^T(t) & \Delta u^T(t) \end{bmatrix} \begin{bmatrix} L_{xx}(t) & L_{xu}(t) \\ L_{ux}(t) & L_{uu}(t) \end{bmatrix} \begin{bmatrix} \Delta x(t) \\ \Delta u(t) \end{bmatrix} dt

subject to the perturbation dynamics

\Delta\dot{x}(t) = F(t)\Delta x(t) + G(t)\Delta u(t), \quad \Delta x(t_0) = \Delta x_0

Cost weighting matrices expressed as

P(t_f) \triangleq \phi_{xx}(t_f) = \frac{\partial^2\phi}{\partial x^2}(t_f)

\begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix} \triangleq \begin{bmatrix} L_{xx}(t) & L_{xu}(t) \\ L_{ux}(t) & L_{uu}(t) \end{bmatrix}

\dim\left[P(t_f)\right] = \dim\left[Q(t)\right] = n \times n; \quad \dim\left[R(t)\right] = m \times m; \quad \dim\left[M(t)\right] = n \times m
2nd Variational Hamiltonian
Variational cost function:

\Delta^2 J = \frac{1}{2}\Delta x^T(t_f)P(t_f)\Delta x(t_f) + \frac{1}{2}\int_{t_0}^{t_f}\left[\Delta x^T(t)Q(t)\Delta x(t) + 2\Delta x^T(t)M(t)\Delta u(t) + \Delta u^T(t)R(t)\Delta u(t)\right]dt

Variational Lagrangian plus adjoined dynamic constraint:

H\left[\Delta x(t), \Delta u(t), \Delta\lambda(t)\right] = L\left[\Delta x(t), \Delta u(t)\right] + \Delta\lambda^T(t)\,f\left[\Delta x(t), \Delta u(t)\right]
= \frac{1}{2}\left[\Delta x^T(t)Q(t)\Delta x(t) + 2\Delta x^T(t)M(t)\Delta u(t) + \Delta u^T(t)R(t)\Delta u(t)\right] + \Delta\lambda^T(t)\left[F(t)\Delta x(t) + G(t)\Delta u(t)\right]
2nd Variational Euler-Lagrange Equations
Terminal condition, solution for the adjoint vector, and optimality condition:

E-L #1: \quad \Delta\lambda(t_f) = \phi_{xx}(t_f)\Delta x(t_f) = P(t_f)\Delta x(t_f)

E-L #2: \quad \Delta\dot{\lambda}(t) = -\left\{\frac{\partial H\left[\Delta x(t), \Delta u(t), \Delta\lambda(t)\right]}{\partial\Delta x}\right\}^T = -Q(t)\Delta x(t) - M(t)\Delta u(t) - F^T(t)\Delta\lambda(t)

E-L #3: \quad \left\{\frac{\partial H\left[\Delta x(t), \Delta u(t), \Delta\lambda(t)\right]}{\partial\Delta u}\right\}^T = M^T(t)\Delta x(t) + R(t)\Delta u(t) + G^T(t)\Delta\lambda(t) = 0

E-L #3 can be solved explicitly for the control.
Two-Point Boundary-Value Problem
State Equation: \quad \Delta\dot{x}(t) = F(t)\Delta x(t) + G(t)\Delta u(t), \quad \Delta x(t_0) = \Delta x_0

Adjoint Vector Equation: \quad \Delta\dot{\lambda}(t) = -Q(t)\Delta x(t) - M(t)\Delta u(t) - F^T(t)\Delta\lambda(t), \quad \Delta\lambda(t_f) = P(t_f)\Delta x(t_f)

From H_u = 0:

\Delta u(t) = -R^{-1}(t)\left[M^T(t)\Delta x(t) + G^T(t)\Delta\lambda(t)\right]
Use Control Law to Solve the LTV Two-Point Boundary-Value Problem
Control law that feeds back the state and adjoint vectors (from H_u = 0):

\Delta u(t) = -R^{-1}(t)\left[M^T(t)\Delta x(t) + G^T(t)\Delta\lambda(t)\right]

Substitute for the control in the system and adjoint equations:

\begin{bmatrix} \Delta\dot{x}(t) \\ \Delta\dot{\lambda}(t) \end{bmatrix} = \begin{bmatrix} \left[F(t) - G(t)R^{-1}(t)M^T(t)\right] & -G(t)R^{-1}(t)G^T(t) \\ -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right] & -\left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^T \end{bmatrix} \begin{bmatrix} \Delta x(t) \\ \Delta\lambda(t) \end{bmatrix}

where \Delta x(t) is the perturbation state vector and \Delta\lambda(t) is the perturbation adjoint vector.

Boundary conditions (adjoint relationship at the end point):

\begin{bmatrix} \Delta x(t_0) \\ \Delta\lambda(t_f) \end{bmatrix} = \begin{bmatrix} \Delta x_0 \\ P_f\Delta x_f \end{bmatrix}
Use Control Law to Solve the LTV Two-Point Boundary-Value Problem
Assume the adjoint relationship between state and control applies over the entire interval
\Delta\lambda(t) = P(t)\Delta x(t)

Substitute for the adjoint: the control law feeds back the state alone.

\Delta u^*(t) = -R^{-1}(t)\left[M^T(t)\Delta x(t) + G^T(t)P(t)\Delta x(t)\right] = -R^{-1}(t)\left[M^T(t) + G^T(t)P(t)\right]\Delta x(t)

\Delta u^*(t) \triangleq \arg\min_{\Delta u(t) \text{ in } (t_0, t_f)} \Delta^2 J\left[\Delta x(\Delta u)\right]

\Delta u^*(t) = -C(t)\Delta x(t), \quad \dim(C) = m \times n
Time-Varying Linear-Quadratic (LQ) Optimal Control Gain Matrix
• Properties of feedback gain matrix
  – Full state feedback (m x n)
  – Time-varying matrix
• R, G, and M given
  – Control weighting matrix, R
  – State-control weighting matrix, M
  – Control effect matrix, G

Optimal feedback gain matrix:

\Delta u(t) = -C(t)\Delta x(t)
C(t) = R^{-1}(t)\left[G^T(t)P(t) + M^T(t)\right]

P(t) remains to be determined.
Solution for the Adjoining Matrix, P(t)
Time-derivative of the adjoint vector:

\Delta\dot{\lambda}(t) = \dot{P}(t)\Delta x(t) + P(t)\Delta\dot{x}(t)

Rearrange:

\dot{P}(t)\Delta x(t) = \Delta\dot{\lambda}(t) - P(t)\Delta\dot{x}(t)

Recall the coupled state/adjoint equation:

\begin{bmatrix} \Delta\dot{x}(t) \\ \Delta\dot{\lambda}(t) \end{bmatrix} = \begin{bmatrix} \left[F(t) - G(t)R^{-1}(t)M^T(t)\right] & -G(t)R^{-1}(t)G^T(t) \\ -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right] & -\left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^T \end{bmatrix} \begin{bmatrix} \Delta x(t) \\ \Delta\lambda(t) \end{bmatrix}

Substitute in the adjoint matrix equation:

\dot{P}(t)\Delta x(t) = -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right]\Delta x(t) - \left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^T\Delta\lambda(t)
\quad - P(t)\left\{\left[F(t) - G(t)R^{-1}(t)M^T(t)\right]\Delta x(t) - G(t)R^{-1}(t)G^T(t)\Delta\lambda(t)\right\}
Solution for the Adjoining Matrix, P(t)
Substitute for the adjoint vector, \Delta\lambda(t) = P(t)\Delta x(t):

\dot{P}(t)\Delta x(t) = -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right]\Delta x(t) - \left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^T P(t)\Delta x(t)
\quad - P(t)\left\{\left[F(t) - G(t)R^{-1}(t)M^T(t)\right]\Delta x(t) - G(t)R^{-1}(t)G^T(t)P(t)\Delta x(t)\right\}

... and eliminate the state vector.
Matrix Riccati Equation for P(t)
\dot{P}(t) = -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right] - \left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^T P(t)
\quad - P(t)\left[F(t) - G(t)R^{-1}(t)M^T(t)\right] + P(t)G(t)R^{-1}(t)G^T(t)P(t), \quad P(t_f) = \phi_{xx}(t_f)

Nonlinear, ordinary differential equation for the Riccati matrix, P(t), with terminal boundary condition.
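Because the boundary condition is terminal, the Riccati equation is integrated backward in time; SciPy's solve_ivp accepts a decreasing time span directly. A minimal Python sketch (the 2×2 system and weights below are illustrative assumptions, not from the lecture):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative LTI data (assumed for this sketch)
F = np.array([[0.0, 1.0], [-10.0, -1.0]])
G = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
M = np.zeros((2, 1))
Pf = np.eye(2)          # terminal condition P(tf) = phi_xx(tf)
tf = 5.0

Ri = np.linalg.inv(R)
Fm = F - G @ Ri @ M.T   # F - G R^-1 M^T

def riccati_rhs(t, p_flat):
    # Matrix Riccati equation, flattened for the ODE solver
    P = p_flat.reshape(2, 2)
    Pdot = -(Q - M @ Ri @ M.T) - Fm.T @ P - P @ Fm + P @ G @ Ri @ G.T @ P
    return Pdot.ravel()

# Integrate from tf back to 0
sol = solve_ivp(riccati_rhs, (tf, 0.0), Pf.ravel(), rtol=1e-9, atol=1e-9)
P0 = sol.y[:, -1].reshape(2, 2)
C0 = Ri @ (G.T @ P0 + M.T)     # C(0) = R^-1 [G^T P(0) + M^T]
print(P0)
```

Symmetry and positive semi-definiteness of P(t) along the backward sweep are useful sanity checks on the integration.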
Characteristics of the Adjoining (Riccati) Matrix, P(t)
• P(tf) is symmetric, n x n, and typically positive semi-definite
• The matrix Riccati equation is symmetric
• Therefore, P(t) is symmetric and positive semi-definite throughout
• Once P(t) has been determined, the optimal feedback control gain matrix, C(t), can be calculated
Example of Neighboring-Optimal Control: Improved Infection Treatment via Feedback
Natural Response to Pathogen Assault (No Therapy)
Tradeoff between rate of killing the pathogen, preservation of organ health, and drug use:

J = \frac{1}{2}\left(s_{11}x_{1_f}^2 + s_{44}x_{4_f}^2\right) + \frac{1}{2}\int_{t_0}^{t_f}\left(q_{11}x_1^2 + q_{44}x_4^2 + ru^2\right)dt
Optimal Open-Loop Control
[Figure: optimal open-loop control and state histories]

50% Increased Infection: Nominal and Neighboring-Optimal Control (u_1)

u(t) = u^*(t) - C(t)\Delta x(t) = u^*(t) - C(t)\left[x(t) - x^*(t)\right]
Optimal Control of Linear Systems
Time-Varying Linear-Quadratic (LQ) Feedback Control
LTV continuous-time linear dynamic system:

\Delta\dot{x}(t) = F(t)\Delta x(t) + G(t)\Delta u(t)

LTV optimal control law:

\Delta u(t) = -R^{-1}(t)\left[M^T(t) + G^T(t)P(t)\right]\Delta x(t) \triangleq -C(t)\Delta x(t)

Linear dynamic system with LTV optimal feedback control:

\Delta\dot{x}(t) = F(t)\Delta x(t) + G(t)\left[-C(t)\Delta x(t)\right] = \left[F(t) - G(t)C(t)\right]\Delta x(t)
LTI System with Time-Varying LQ Feedback Control
Open-loop system is LTI, but optimal control gain matrix is time-varying
LTI dynamic system:

\Delta\dot{x}(t) = F\Delta x(t) + G\Delta u(t)

Closed-loop system:

\Delta u(t) = -R^{-1}\left[M^T + G^T P(t)\right]\Delta x(t) \triangleq -C(t)\Delta x(t)
\Delta\dot{x}(t) = F\Delta x(t) + G\left[-C(t)\Delta x(t)\right] = \left[F - GC(t)\right]\Delta x(t) = F_{closed\text{-}loop}(t)\Delta x(t)
Example: LTV Optimal Control of a LTI First-Order System
\Delta\dot{x} = f\Delta x + g\Delta u

\Delta^2 J = \frac{1}{2}p_f\Delta x^2(t_f) + \frac{1}{2}\int_{t_0}^{t_f}\left[q\Delta x^2 + 2(0)\Delta x\Delta u + r\Delta u^2\right]dt

\dot{p}(t) = -q - 2fp(t) + \frac{g^2p^2(t)}{r}, \quad p(t_f) = p_f

\Delta u = -r^{-1}\left[gp(t)\right]\Delta x(t) = -\frac{gp(t)}{r}\Delta x
Example: LTV Optimal Control of a Stable First-Order System

f = -1; \quad g = 1; \quad q = r = 1

\Delta\dot{x} = -\Delta x + \Delta u; \quad \Delta x(0) = 1

\dot{p}(t) = -1 + 2p(t) + p^2(t), \quad p(t_f) = 1

Control gain = p(t):

\Delta u = -p(t)\Delta x; \quad \Delta\dot{x} = -\left[1 + p(t)\right]\Delta x

[Figure: \Delta x_{open\text{-}loop}(t), \Delta x_{closed\text{-}loop}(t), p(t), and \Delta u(t)]
Example: LTV Optimal Control of an Unstable First-Order System

f = +1; \quad g = 1; \quad q = r = 1

\Delta\dot{x} = \Delta x + \Delta u; \quad \Delta x(0) = 1

\dot{p}(t) = -1 - 2p(t) + p^2(t), \quad p(t_f) = 1

Control gain = p(t):

\Delta u = -p(t)\Delta x; \quad \Delta\dot{x} = \left[1 - p(t)\right]\Delta x

[Figure: \Delta x_{open\text{-}loop}(t), \Delta x_{closed\text{-}loop}(t), p(t), and \Delta u(t)]
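Both scalar examples can be reproduced in a few lines of Python (SciPy assumed). Backward in time, p(t) approaches the positive root of the algebraic Riccati equation — √2 − 1 for f = −1 and 1 + √2 for f = +1 — and in each case the closed-loop system f − p(t) is stable:

```python
import numpy as np
from scipy.integrate import solve_ivp

def riccati_gain(f, q=1.0, r=1.0, g=1.0, pf=1.0, tf=5.0):
    # Backward-integrate p' = -q - 2 f p + g^2 p^2 / r from p(tf) = pf
    rhs = lambda t, p: -q - 2.0 * f * p + (g**2) * p**2 / r
    sol = solve_ivp(rhs, (tf, 0.0), [pf], dense_output=True,
                    rtol=1e-10, atol=1e-10)
    return sol.sol                      # callable p(t) on [0, tf]

def closed_loop(f, p_of_t, x0=1.0, tf=5.0):
    # Closed loop: dx/dt = [f - p(t)] x  (with g = r = 1)
    rhs = lambda t, x: (f - p_of_t(t)[0]) * x
    return solve_ivp(rhs, (0.0, tf), [x0], rtol=1e-10, atol=1e-10)

p_stable = riccati_gain(f=-1.0)
p_unstable = riccati_gain(f=+1.0)
x_stable = closed_loop(-1.0, p_stable)
x_unstable = closed_loop(+1.0, p_unstable)
print(p_stable(0.0)[0], p_unstable(0.0)[0])
```

Away from the terminal transient, the time-varying gain is nearly constant, which foreshadows the steady-state LQ regulator.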
Discrete-Time and Sampled-Data Linear, Time-Invariant (LTI) Systems
Continuous- and Discrete-Time LTI System Models
Continuous-time ("analog") model is based on an ordinary differential equation.

Dynamic Process: \quad \Delta\dot{x}(t) = F\Delta x(t) + G\Delta u(t) + L\Delta w(t), \quad \Delta x(t_0) \text{ given}

Observation Process: \quad \Delta y(t) = H_x\Delta x(t) + H_u\Delta u(t) + H_w\Delta w(t)

Discrete-time ("digital") model is based on an ordinary difference equation.

Dynamic Process: \quad \Delta x(t_{k+1}) = \Phi\Delta x(t_k) + \Gamma\Delta u(t_k) + \Lambda\Delta w(t_k), \quad \Delta x(t_0) \text{ given}

Observation Process: \quad \Delta y(t_k) = H_x\Delta x(t_k) + H_u\Delta u(t_k) + H_w\Delta w(t_k)
Digital Control Systems Use Sampled Data
\Delta x_k = \Delta x(t_k) = \Delta x(k\Delta t)

• Periodic sequence
• Sampler is an analog-to-digital (A/D) converter
• Reconstructor is a digital-to-analog (D/A) converter
LTI System Response to Inputs and Initial Conditions

Solution of an LTI dynamic model

\Delta\dot{x}(t) = F\Delta x(t) + G\Delta u(t) + L\Delta w(t), \quad \Delta x(t_0) \text{ given}

\Delta x(t) = \Delta x(t_0) + \int_{t_0}^{t}\left[F\Delta x(\tau) + G\Delta u(\tau) + L\Delta w(\tau)\right]d\tau

... has two parts:
– Unforced (homogeneous) response to initial conditions
– Forced response to control and disturbance inputs

[Figures: initial-condition response; step-input response]
Unforced Response to Initial Conditions
The state transition matrix propagates the state from t_0 to t by a single multiplication. Neglecting forcing functions,

\Delta x(t) = \Delta x(t_0) + \int_{t_0}^{t}\left[F\Delta x(\tau)\right]d\tau = \Phi(t, t_0)\Delta x(t_0)

For the LTI system, the state transition matrix is the matrix exponential:

\Delta x(t) = \Phi(t - t_0)\Delta x(t_0) = e^{F(t - t_0)}\Delta x(t_0)
LTI State Transition Matrix is the Matrix Exponential
e^{F(t - t_0)} \triangleq e^{F\delta t} = \text{Matrix Exponential} \triangleq \Phi(\delta t) = \text{State Transition Matrix}
= I + F\delta t + \frac{1}{2!}\left[F\delta t\right]^2 + \frac{1}{3!}\left[F\delta t\right]^3 + ...

See pages 79-84 of Optimal Control and Estimation for a description of how the state transition matrix is calculated for LTV and LTI systems.
Response to Inputs
Solution of the LTI model with piecewise-constant forcing functions:

\Delta x(t_k) = \Delta x(t_{k-1}) + \int_{t_{k-1}}^{t_k}\left[F\Delta x(\tau) + G\Delta u(\tau) + L\Delta w(\tau)\right]d\tau

\Delta x(t_k) = e^{F\delta t}\Delta x(t_{k-1}) + e^{F\delta t}\int_{t_{k-1}}^{t_k}\left[e^{-F(\tau - t_{k-1})}\right]d\tau\left[G\Delta u(t_{k-1}) + L\Delta w(t_{k-1})\right]
= \Phi\Delta x(t_{k-1}) + \Gamma\Delta u(t_{k-1}) + \Lambda\Delta w(t_{k-1})

\delta t \triangleq t_k - t_{k-1} = \text{constant}
\Phi \triangleq e^{F\delta t}
\Gamma \triangleq e^{F\delta t}\left(I - e^{-F\delta t}\right)F^{-1}G = \left(e^{F\delta t} - I\right)F^{-1}G
\Lambda \triangleq e^{F\delta t}\left(I - e^{-F\delta t}\right)F^{-1}L = \left(e^{F\delta t} - I\right)F^{-1}L
Discrete-Time LTI System Response to Step Input
Propagation of \Delta x(t_k) with constant \delta t, \Phi, \Gamma, and \Lambda:

\Delta x(t_1) = \Phi\Delta x(t_0) + \Gamma\Delta u(t_0) + \Lambda\Delta w(t_0)
\Delta x(t_2) = \Phi\Delta x(t_1) + \Gamma\Delta u(t_1) + \Lambda\Delta w(t_1)
\Delta x(t_3) = \Phi\Delta x(t_2) + \Gamma\Delta u(t_2) + \Lambda\Delta w(t_2)
\vdots

Step input: the discrete-time solution is exact at the sampling instants.

As the time interval becomes very small, the discrete-time model approaches the continuous-time model:

\Phi \xrightarrow{\delta t \to 0} I + F\delta t; \quad \Gamma \xrightarrow{\delta t \to 0} G\delta t; \quad \Lambda \xrightarrow{\delta t \to 0} L\delta t
Series Expressions for Discrete-Time LTI System Matrices
\Phi = e^{F\delta t} = I + F\delta t + \frac{1}{2!}\left[F\delta t\right]^2 + \frac{1}{3!}\left[F\delta t\right]^3 + ...

\Gamma = \left(e^{F\delta t} - I\right)F^{-1}G = \left(I + \frac{1}{2!}\left[F\delta t\right] + \frac{1}{3!}\left[F\delta t\right]^2 + ...\right)G\delta t

\Lambda = \left(e^{F\delta t} - I\right)F^{-1}L = \left(I + \frac{1}{2!}\left[F\delta t\right] + \frac{1}{3!}\left[F\delta t\right]^2 + ...\right)L\delta t
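A short Python sketch (SciPy's expm; the 2×2 matrices are illustrative assumptions) evaluates Φ, Γ, and Λ from these formulas and checks the small-interval limits:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative continuous-time LTI model (assumed values)
F = np.array([[0.0, 1.0], [-10.0, -1.0]])
G = np.array([[0.0], [1.0]])
L = np.array([[1.0], [0.0]])
dt = 0.1
I = np.eye(2)

Phi = expm(F * dt)                            # Φ = e^{F δt}
Gam = (Phi - I) @ np.linalg.solve(F, G)       # Γ = (e^{F δt} - I) F^{-1} G
Lam = (Phi - I) @ np.linalg.solve(F, L)       # Λ = (e^{F δt} - I) F^{-1} L

# Small-interval limits: Φ → I + F δt, Γ → G δt
dts = 1e-5
Phi_s = expm(F * dts)
Gam_s = (Phi_s - I) @ np.linalg.solve(F, G)
print(Phi)
```

The closed form requires F to be invertible; when it is not, Γ can be evaluated from the series expression or by quadrature of the convolution integral.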
Sampled-Data Cost Function
Minimize subject to the sampled-data dynamic constraint:

\min_{\Delta u(t)} \Delta^2 J = \min_{\Delta u(t)}\left\{\frac{1}{2}\Delta x^T(t_f)P(t_f)\Delta x(t_f) + \frac{1}{2}\int_{t_0}^{t_f}\begin{bmatrix} \Delta x^T(t) & \Delta u^T(t) \end{bmatrix}\begin{bmatrix} Q & M \\ M^T & R \end{bmatrix}\begin{bmatrix} \Delta x(t) \\ \Delta u(t) \end{bmatrix}dt\right\}

\Delta x(t_{k+1}) = \Phi(\delta t)\Delta x(t_k) + \Gamma(\delta t)\Delta u(t_k)

Sampled-Data Cost Function: a discrete-time cost function that accounts for system response between sampling instants. Sum integrals over short time intervals, (t_k, t_{k+1}):

\min_{\Delta u(t)} \Delta^2 J = \min_{\Delta u(t)}\left\{\frac{1}{2}\Delta x_{k_f}^T P_{k_f}\Delta x_{k_f} + \frac{1}{2}\sum_{k=0}^{k_f - 1}\int_{t_k}^{t_{k+1}}\begin{bmatrix} \Delta x^T(t) & \Delta u^T(t) \end{bmatrix}\begin{bmatrix} Q & M \\ M^T & R \end{bmatrix}\begin{bmatrix} \Delta x(t) \\ \Delta u(t) \end{bmatrix}dt\right\}
Integrand of Sampled-Data Cost Function
Use the dynamic equation

\Delta x(t) = \Phi(t, t_k)\Delta x(t_k) + \Gamma(t, t_k)\Delta u(t_k) \triangleq \Phi(t, t_k)\Delta x_k + \Gamma(t, t_k)\Delta u_k

... to express the integrand in the sampling interval, (t_k, t_{k+1}):

\frac{1}{2}\sum_{k=0}^{k_f - 1}\int_{t_k}^{t_{k+1}}\begin{bmatrix} \Delta x^T(t) & \Delta u^T(t) \end{bmatrix}\begin{bmatrix} Q & M \\ M^T & R \end{bmatrix}\begin{bmatrix} \Delta x(t) \\ \Delta u(t) \end{bmatrix}dt
Integrand of Sampled-Data Cost Function
Bring the state and control out of the integral, assuming the control is constant in the sampling interval:

\frac{1}{2}\sum_{k=0}^{k_f - 1}\int_{t_k}^{t_{k+1}}\begin{bmatrix} \Delta x^T(t) & \Delta u^T(t) \end{bmatrix}\begin{bmatrix} Q & M \\ M^T & R \end{bmatrix}\begin{bmatrix} \Delta x(t) \\ \Delta u(t) \end{bmatrix}dt

= \frac{1}{2}\sum_{k=0}^{k_f - 1}\begin{bmatrix} \Delta x_k^T & \Delta u_k^T \end{bmatrix}\int_{t_k}^{t_{k+1}}\begin{bmatrix} \Phi^T(t, t_k)Q\Phi(t, t_k) & \Phi^T(t, t_k)\left[Q\Gamma(t, t_k) + M\right] \\ \left[Q\Gamma(t, t_k) + M\right]^T\Phi(t, t_k) & R + \Gamma^T(t, t_k)M + M^T\Gamma(t, t_k) + \Gamma^T(t, t_k)Q\Gamma(t, t_k) \end{bmatrix}dt\begin{bmatrix} \Delta x_k \\ \Delta u_k \end{bmatrix}

= \frac{1}{2}\sum_{k=0}^{k_f - 1}\begin{bmatrix} \Delta x_k^T & \Delta u_k^T \end{bmatrix}\begin{bmatrix} \hat{Q} & \hat{M} \\ \hat{M}^T & \hat{R} \end{bmatrix}_k\begin{bmatrix} \Delta x_k \\ \Delta u_k \end{bmatrix}

Integration has been replaced by summation.

Berman, Gran, J. Aircraft, 1974
Sampled-Data Cost Function Weighting Matrices
\hat{Q} = \int_{t_k}^{t_{k+1}}\Phi^T(t, t_k)Q\Phi(t, t_k)\,dt

\hat{M} = \int_{t_k}^{t_{k+1}}\Phi^T(t, t_k)\left[Q\Gamma(t, t_k) + M\right]dt

\hat{R} = \int_{t_k}^{t_{k+1}}\left[R + \Gamma^T(t, t_k)M + M^T\Gamma(t, t_k) + \Gamma^T(t, t_k)Q\Gamma(t, t_k)\right]dt

The integrand accounts for continuous-time variations of the LTI system between sampling instants ("inter-sample ripple"). Q, M, and R are assumed constant in the integration interval, while \Phi(t, t_k) and \Gamma(t, t_k) vary in the integration interval.
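These integrals are easy to evaluate by quadrature. The Python sketch below (illustrative scalar data; the function name sampled_weights is my own) accumulates Γ(s) as a running integral of e^{Fσ}G and applies the midpoint rule. With F = 0 and M = 0 the results can be checked against the closed forms Q̂ = qΔt, M̂ = qgΔt²/2, and R̂ = rΔt + qg²Δt³/3 — note that M̂ ≠ 0 even though M = 0:

```python
import numpy as np
from scipy.linalg import expm

def sampled_weights(F, G, Q, R, M, dt, n=400):
    # Midpoint-rule quadrature of the sampled-data weighting integrals;
    # Gam accumulates Γ(s) = ∫_0^s e^{Fσ} G dσ incrementally.
    h = dt / n
    Qh = np.zeros_like(Q)
    Mh = np.zeros_like(M)
    Rh = np.zeros_like(R)
    Gam = np.zeros_like(G)
    for i in range(n):
        Phi = expm(F * (i + 0.5) * h)
        Gam_mid = Gam + Phi @ G * (h / 2)        # Γ at the midpoint of the step
        Qh += Phi.T @ Q @ Phi * h
        Mh += Phi.T @ (Q @ Gam_mid + M) * h
        Rh += (R + Gam_mid.T @ M + M.T @ Gam_mid + Gam_mid.T @ Q @ Gam_mid) * h
        Gam = Gam_mid + Phi @ G * (h / 2)        # advance Γ to the end of the step
    return Qh, Mh, Rh

# Scalar check: F = 0 makes Φ = 1 and Γ(s) = g s
F = np.array([[0.0]]); G = np.array([[1.0]])
Q = np.array([[2.0]]); R = np.array([[3.0]]); M = np.zeros((1, 1))
Qh, Mh, Rh = sampled_weights(F, G, Q, R, M, dt=0.5)
print(Qh[0, 0], Mh[0, 0], Rh[0, 0])   # ≈ 1.0, 0.25, 1.5833
```

For an LTI system this computation is done once, since the same Q̂, M̂, R̂ apply at every sampling interval.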
Dynamic Programming Approach to Sampled-Data Optimal Control
Discrete-Time Hamilton-Jacobi-Bellman Equation
Value Function at t_0:

V(t_0) = \phi_{k_f} + \sum_{k=0}^{k_f - 1}L_k

• Begin at the terminal point with the optimal value function
• Working backward, add the minimum value-function increment (Bellman's Principle of Optimality):

"... optimal policy ... whatever the initial state and initial decision ... remaining decisions must constitute an optimal policy with regard to the current state"

Discrete HJB equation:

V_k^* = \min_{\Delta u_k}\left\{L_k + V_{k+1}^*\right\} = \min_{\Delta u_k}H_k, \quad V^*\left[\Delta x_{k_f}^*\right] \text{ given}

subject to \Delta x_{k+1} = \Phi\Delta x_k + \Gamma\Delta u_k
Sampled-Data Cost Function Contains Terminal and Summation Costs
The integral cost has been replaced by a summation cost; the terminal cost is the same.

\min_{\Delta u_k}J_{sampled} = \min_{\Delta u_k}\left\{\frac{1}{2}\Delta x_{k_f}^T P_{k_f}\Delta x_{k_f} + \frac{1}{2}\sum_{k=0}^{k_f - 1}\begin{bmatrix} \Delta x_k^T & \Delta u_k^T \end{bmatrix}\begin{bmatrix} \hat{Q} & \hat{M} \\ \hat{M}^T & \hat{R} \end{bmatrix}_k\begin{bmatrix} \Delta x_k \\ \Delta u_k \end{bmatrix}\right\}

subject to \Delta x_{k+1} = \Phi\Delta x_k + \Gamma\Delta u_k
Dynamic Programming Approach to Sampled-Data LQ Control
Quadratic Value Function at t_0:

V(t_0) = \frac{1}{2}\Delta x_{k_f}^T P_{k_f}\Delta x_{k_f} + \frac{1}{2}\sum_{k=0}^{k_f - 1}\begin{bmatrix} \Delta x_k^T & \Delta u_k^T \end{bmatrix}\begin{bmatrix} \hat{Q} & \hat{M} \\ \hat{M}^T & \hat{R} \end{bmatrix}\begin{bmatrix} \Delta x_k \\ \Delta u_k \end{bmatrix}

Discrete HJB equation:

V_k^* = \min_{\Delta u_k}\left\{\frac{1}{2}\left[\Delta x_k^{*T}\hat{Q}\Delta x_k^* + 2\Delta x_k^{*T}\hat{M}\Delta u_k + \Delta u_k^T\hat{R}\Delta u_k\right] + V_{k+1}^*\right\} = \min_{\Delta u_k}H_k

V^*\left[\Delta x_{k_f}^*\right] = \frac{1}{2}\Delta x_{k_f}^{*T}P_{k_f}\Delta x_{k_f}^*

subject to \Delta x_{k+1} = \Phi\Delta x_k + \Gamma\Delta u_k
Optimality Condition
Assume the value function takes a quadratic form:

V_k = \frac{1}{2}\Delta x_k^T P_k\Delta x_k; \quad V_{k+1} = \frac{1}{2}\Delta x_{k+1}^T P_{k+1}\Delta x_{k+1}

Optimality condition:

\frac{\partial H_k}{\partial\Delta u_k} = \left[\Delta x_k^T\hat{M} + \Delta u_k^T\hat{R}\right] + \frac{\partial V_{k+1}}{\partial\Delta u_k} = 0

where

\frac{\partial V_{k+1}}{\partial\Delta u_k} = \frac{\partial\left[\frac{1}{2}\Delta x_{k+1}^T P_{k+1}\Delta x_{k+1}\right]}{\partial\Delta u_k} = \left[\Phi\Delta x_k + \Gamma\Delta u_k\right]^T P_{k+1}\Gamma

hence

\left[\Delta x_k^T\hat{M} + \Delta u_k^T\hat{R}\right] + \left[\Delta x_k^T\Phi^T + \Delta u_k^T\Gamma^T\right]P_{k+1}\Gamma = 0
Minimizing Value of Control
\frac{\partial H_k}{\partial\Delta u_k} = \Delta x_k^T\left[\hat{M} + \Phi^T P_{k+1}\Gamma\right] + \Delta u_k^T\left[\hat{R} + \Gamma^T P_{k+1}\Gamma\right] = 0

\Delta u_k = -\left[\hat{R} + \Gamma^T P_{k+1}\Gamma\right]^{-1}\left[\hat{M}^T + \Gamma^T P_{k+1}\Phi\right]\Delta x_k \triangleq -C_k\Delta x_k

Must find P_k in (0, k_f): use the definitions of V^* and \Delta u in the HJB equation.
Solution for Pk
Substitute for V_k, V_{k+1}, and \Delta u_k in the discrete-time HJB equation:

V_k = \frac{1}{2}\Delta x_k^T P_k\Delta x_k; \quad V_{k+1} = \frac{1}{2}\Delta x_{k+1}^T P_{k+1}\Delta x_{k+1}
\Delta u_k = -\left[\hat{R} + \Gamma^T P_{k+1}\Gamma\right]^{-1}\left[\hat{M}^T + \Gamma^T P_{k+1}\Phi\right]\Delta x_k \triangleq -C_k\Delta x_k

\frac{1}{2}\Delta x_k^T P_k\Delta x_k = \min_{\Delta u_k}\left\{\frac{1}{2}\left[\Delta x_k^T\hat{Q}\Delta x_k + 2\Delta x_k^T\hat{M}\left(-C_k\Delta x_k\right) + \left(-C_k\Delta x_k\right)^T\hat{R}\left(-C_k\Delta x_k\right) + \Delta x_{k+1}^T P_{k+1}\Delta x_{k+1}\right]\right\}

Rearrange and cancel \Delta x_k on both sides of the equation to yield the discrete-time Riccati equation:

P_k = \hat{Q} + \Phi^T P_{k+1}\Phi - \left[\hat{M}^T + \Gamma^T P_{k+1}\Phi\right]^T\left[\hat{R} + \Gamma^T P_{k+1}\Gamma\right]^{-1}\left[\hat{M}^T + \Gamma^T P_{k+1}\Phi\right], \quad P_{k_f} \text{ given}
Discrete-Time System with Linear-Quadratic Feedback Control
Dynamic system:

\Delta x_{k+1} = \Phi\Delta x_k + \Gamma\Delta u_k

Control law:

\Delta u_k = -\left[\hat{R} + \Gamma^T P_{k+1}\Gamma\right]^{-1}\left[\hat{M}^T + \Gamma^T P_{k+1}\Phi\right]\Delta x_k \triangleq -C_k\Delta x_k

Dynamic system with LQ feedback control:

\Delta x_{k+1} = \Phi\Delta x_k + \Gamma\Delta u_k = \Phi\Delta x_k + \Gamma\left(-C_k\Delta x_k\right) = \left(\Phi - \Gamma C_k\right)\Delta x_k
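The backward recursion and gain sequence are straightforward to code. A Python sketch (the function name and the scalar data are illustrative, not from the lecture):

```python
import numpy as np

def sampled_data_lq(Phi, Gam, Qh, Rh, Mh, P_kf, N):
    # Backward recursion for P_k and the gain sequence C_k, k = N-1, ..., 0
    P = P_kf
    gains = []
    for _ in range(N):
        S = Rh + Gam.T @ P @ Gam              # R̂ + Γᵀ P_{k+1} Γ
        T = Mh.T + Gam.T @ P @ Phi            # M̂ᵀ + Γᵀ P_{k+1} Φ
        C = np.linalg.solve(S, T)             # C_k
        P = Qh + Phi.T @ P @ Phi - T.T @ np.linalg.solve(S, T)
        gains.append(C)
    gains.reverse()                           # gains[k] applies at stage k
    return P, gains

# Scalar illustration: Φ = γ = q̂ = r̂ = 1, m̂ = 0, P_kf = 0
one = np.array([[1.0]])
P0, gains = sampled_data_lq(one, one, one, one, np.zeros((1, 1)),
                            np.zeros((1, 1)), N=2)
print(P0[0, 0])    # 1 + 1 - 1/2 = 1.5 after two backward steps
```

Each stage applies the closed-loop update Δx_{k+1} = (Φ − ΓC_k)Δx_k with the stored gain for that stage.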
Next Time: Dynamic System Stability

Reading: OCE, Section 2.5
Supplemental Material
Neighboring Trajectories

• Nominal (or reference) trajectory and control history

\left\{x_N(t), u_N(t), w_N(t)\right\} \text{ for } t \text{ in } [t_0, t_f]

• Trajectory perturbed by
  – Small initial condition variation
  – Small control variation
  – Small disturbance variation

\left\{x(t), u(t), w(t)\right\} = \left\{x_N(t) + \Delta x(t),\ u_N(t) + \Delta u(t),\ w_N(t) + \Delta w(t)\right\} \text{ for } t \text{ in } [t_0, t_f]

x: dynamic state; u: control input; w: disturbance input
Example: Separate Solutions for Nominal and Perturbation Trajectories
Original nonlinear equation describes the nominal dynamics:

\dot{x}_N = \begin{bmatrix} \dot{x}_{1_N} \\ \dot{x}_{2_N} \\ \dot{x}_{3_N} \end{bmatrix} = \begin{bmatrix} x_{2_N} + dw_{1_N} \\ a_2\left(x_{3_N} - x_{2_N}\right) + a_1\left(x_{3_N} - x_{1_N}\right)^2 + b_1u_{1_N} \\ c_2x_{3_N}^3 + c_1\left(x_{1_N} + x_{2_N}\right) + b_3x_{1_N}u_{1_N} + b_2u_{2_N} \end{bmatrix}, \quad \begin{bmatrix} x_{1_N} \\ x_{2_N} \\ x_{3_N} \end{bmatrix}(t_0) \text{ given}

Linear, time-varying equation describes the perturbation dynamics:

\begin{bmatrix} \Delta\dot{x}_1 \\ \Delta\dot{x}_2 \\ \Delta\dot{x}_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ -2a_1\left(x_{3_N} - x_{1_N}\right) & -a_2 & a_2 + 2a_1\left(x_{3_N} - x_{1_N}\right) \\ \left(c_1 + b_3u_{1_N}\right) & c_1 & 3c_2x_{3_N}^2 \end{bmatrix}\begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ b_1 & 0 \\ b_3x_{1_N} & b_2 \end{bmatrix}\begin{bmatrix} \Delta u_1 \\ \Delta u_2 \end{bmatrix} + \begin{bmatrix} d \\ 0 \\ 0 \end{bmatrix}\Delta w_1; \quad \begin{bmatrix} \Delta x_1(t_0) \\ \Delta x_2(t_0) \\ \Delta x_3(t_0) \end{bmatrix} \text{ given}

The control-effect matrix follows from the nominal equations: b_1 enters the second equation through b_1u_{1_N}, while b_3x_{1_N} and b_2 enter the third.
Example: LQ Optimal Control, Stable First-Order System, "White-Noise" Disturbance

\Delta\dot{x} = -\Delta x + \Delta u + \Delta w; \quad \Delta x(0) = 0

\dot{p}(t) = -q + 2p(t) + p^2(t); \quad q = 1 \text{ or } 100; \quad p(t_f) = 1

\Delta u = -p(t)\Delta x; \quad \Delta\dot{x} = -\left[1 + p(t)\right]\Delta x + \Delta w

[Figures for q = 1 and q = 100: open-loop response; open- and closed-loop responses; \Delta x_{open\text{-}loop}(t), p(t), \Delta u(t)]
Example: LQ Optimal Control, Stable First-Order System, "White-Noise" Disturbance

\Delta\dot{x} = -\Delta x + \Delta u + \Delta w; \quad \Delta x(0) = 0

\dot{p}(t) = -1 + 2p(t) + p^2(t); \quad p(t_f) = 1 \text{ or } 1000

\Delta u = -p(t)\Delta x; \quad \Delta\dot{x} = -\left[1 + p(t)\right]\Delta x + \Delta w

[Figures for p_f = 1 and p_f = 1000: open-loop response; open- and closed-loop responses; \Delta x_{open\text{-}loop}(t), p(t), \Delta u(t)]
Example: LQ Optimal Control, Stable First-Order System, "White-Noise" Disturbance

\Delta\dot{x} = -\Delta x + \Delta u + \Delta w; \quad \Delta x(0) = 0

\dot{p}(t) = -100 + 2p(t) + p^2(t); \quad p(t_f) = 1 \text{ or } 0

\Delta u = -p(t)\Delta x; \quad \Delta\dot{x} = -\left[1 + p(t)\right]\Delta x + \Delta w

[Figures for p_f = 1 and p_f = 0: \Delta x_{open\text{-}loop}(t); \Delta x_{open\text{-}loop}(t) and \Delta x_{closed\text{-}loop}(t); p(t); \Delta u(t)]
Multivariable LQ Optimal Control with Cross Weighting M = 0

No state/control coupling in the cost function:

\Delta^2 J = \frac{1}{2}\Delta x^T(t_f)P(t_f)\Delta x(t_f) + \frac{1}{2}\int_{t_0}^{t_f}\begin{bmatrix} \Delta x^T(t) & \Delta u^T(t) \end{bmatrix}\begin{bmatrix} Q & 0 \\ 0 & R \end{bmatrix}\begin{bmatrix} \Delta x(t) \\ \Delta u(t) \end{bmatrix}dt

\Delta\dot{x}(t) = F\Delta x(t) + G\Delta u(t)

\dot{P}(t) = -Q - F^T P(t) - P(t)F + P(t)GR^{-1}G^T P(t), \quad P(t_f) = P_f

\Delta u(t) = -R^{-1}\left[G^T P(t)\right]\Delta x(t)
Nonlinear System with Neighboring-Optimal Feedback Control
Nonlinear dynamic system:

\dot{x}(t) = f\left[x(t), u(t)\right]

Neighboring-optimal control law:

u(t) = u^*(t) - C(t)\Delta x(t) = u^*(t) - C(t)\left[x(t) - x^*(t)\right]

Nonlinear dynamic system with neighboring-optimal feedback control:

\dot{x}(t) = f\left[x(t),\ u^*(t) - C(t)\left\{x(t) - x^*(t)\right\}\right]
Development of Neighboring-Optimal Therapy
x(t) = x^*(t) + \Delta x(t); \quad u(t) = u^*(t) + \Delta u(t)

• Compute the nominal optimal control history using the original nonlinear dynamic model: u(t) = u_{opt}(t)
• Expand the dynamic equation to first degree
• Compute the optimal perturbation control using the locally linearized dynamic model: \Delta u(t) = -C(t)\left[x(t) - x_{opt}(t)\right]
• Sum the two for neighboring-optimal control of the dynamic system: u(t) = u_{opt}(t) + \Delta u(t)
First-Order LQ Example Code

% First-Order LQ Example
% Rob Stengel, 2/23/2011
clear
global tp p q tw w

xo     = 0;
to     = 0;
tf     = 10;
tspanx = [to tf];

% "White-noise" disturbance sequence
tw = 0:0.01:10;
for k = 1:length(tw)
    w(k) = randn(1);
end

% Open-loop response
[tx,x] = ode15s('First',tspanx,xo);

% Backward integration of the scalar Riccati equation
pf     = 0;
q      = 100;
tspanp = [tf to];
[tp,p] = ode15s('FirstRiccati',tspanp,pf);

% Closed-loop response and control history, Delta-u = -p(t)*Delta-x
[tc,xc] = ode15s('FirstCL',tspanx,xo);
u = -interp1(tp,p,tc).*xc;

figure
subplot(4,1,1)
plot(tx,x), grid, xlabel('Time'), ylabel('x-open-loop')
subplot(4,1,2)
plot(tp,p,'r'), grid, xlabel('Time'), ylabel('p')
subplot(4,1,3)
plot(tc,u), grid, xlabel('Time'), ylabel('u')
subplot(4,1,4)
plot(tc,xc,'r',tx,x,'b'), grid, xlabel('Time'), ylabel('x-closed-loop')

function xdot = First(t,x)
    global tp p tw w
    wt   = interp1(tw,w,t,'nearest');
    xdot = -x + wt;

function pdot = FirstRiccati(t,p)
    global q
    pdot = -q + 2*p + p^2;

function xdot = FirstCL(tc,xc)
    global tp p tw w
    wt   = interp1(tw,w,tc,'nearest');
    pt   = interp1(tp,p,tc);
    xdot = -(1 + pt)*xc + wt;
Initial-Condition Response via State Transition
\Phi = I + F\delta t + \frac{1}{2!}\left[F\delta t\right]^2 + \frac{1}{3!}\left[F\delta t\right]^3 + ...

Propagation of \Delta x(t_k) in an LTI system:

\Delta x(t_1) = \Phi(t_1 - t_0)\Delta x(t_0)
\Delta x(t_2) = \Phi(t_2 - t_1)\Delta x(t_1)
\Delta x(t_3) = \Phi(t_3 - t_2)\Delta x(t_2)
\vdots

The state transition matrix is constant if (t_k - t_{k-1}) = \delta t = \text{constant}:

\Delta x(t_1) = \Phi(\delta t)\Delta x(t_0) = \Phi\Delta x(t_0)
\Delta x(t_2) = \Phi\Delta x(t_1) = \Phi^2\Delta x(t_0)
\Delta x(t_3) = \Phi\Delta x(t_2) = \Phi^3\Delta x(t_0)
\vdots
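The semigroup property behind this — e^{F·kδt} = (e^{Fδt})^k — is easy to confirm numerically (illustrative F; SciPy assumed):

```python
import numpy as np
from scipy.linalg import expm

F = np.array([[0.0, 1.0], [-4.0, -0.4]])   # illustrative LTI system
dt = 0.2
Phi = expm(F * dt)

x0 = np.array([1.0, 0.0])
x3_steps = Phi @ (Phi @ (Phi @ x0))                 # three single-step propagations
x3_power = np.linalg.matrix_power(Phi, 3) @ x0      # Φ³ Δx(t0)
x3_exact = expm(F * 3 * dt) @ x0                    # e^{F·3δt} Δx(t0)
print(x3_steps)
```

One matrix exponential therefore suffices for any number of uniform steps.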
Example: Equivalent Continuous-Time and Discrete-Time System Matrices
Continuous-time ("analog") system:

F = \begin{bmatrix} -1.2794 & -7.9856 \\ 1 & -1.2709 \end{bmatrix}; \quad G = \begin{bmatrix} -9.0690 \\ 0 \end{bmatrix}; \quad L = \begin{bmatrix} -7.9856 \\ -1.2709 \end{bmatrix}

Discrete-time ("digital") system, \delta t = 0.1 s:

\Phi = \begin{bmatrix} 0.845 & -0.6936 \\ 0.0869 & 0.8457 \end{bmatrix}; \quad \Gamma = \begin{bmatrix} -0.8404 \\ -0.0414 \end{bmatrix}; \quad \Lambda = \begin{bmatrix} -0.6936 \\ -0.1543 \end{bmatrix}

Discrete-time ("digital") system, \delta t = 0.5 s:

\Phi = \begin{bmatrix} 0.0823 & -1.4751 \\ 0.1847 & 0.0839 \end{bmatrix}; \quad \Gamma = \begin{bmatrix} -2.4923 \\ -0.6429 \end{bmatrix}; \quad \Lambda = \begin{bmatrix} -1.4751 \\ -0.9161 \end{bmatrix}

The time interval has a large effect on the discrete-time matrices.
Evaluating Sampled-Data Weighting Matrices
\Phi(t, t_k) and \Gamma(t, t_k) vary in the integration interval. Break the interval into smaller intervals, and approximate the integral as a sum of short rectangular integration steps:

\hat{Q} = \int_0^{\Delta t}\Phi^T(t, 0)Q\Phi(t, 0)\,dt
\approx \sum_{k=1}^{100}\left[\Phi^T(t_{k-1}, 0)Q\Phi(t_{k-1}, 0)\,\delta t\right], \quad \delta t = \Delta t/100, \quad t_k = k\delta t
\approx \sum_{k=1}^{100}\left[e^{F^T t_{k-1}}Qe^{Ft_{k-1}}\delta t\right]
\hat{Q}, \hat{M}, and \hat{R} are evaluated just once for an LTI system. With

\Gamma = \left(e^{F\delta t} - I\right)F^{-1}G = \left(I + \frac{1}{2!}\left[F\delta t\right] + \frac{1}{3!}\left[F\delta t\right]^2 + ...\right)G\delta t

the remaining weighting matrices are approximated by

\hat{M} = \int_0^{\Delta t}\Phi^T(t, 0)\left[Q\Gamma(t, 0) + M\right]dt \approx \sum_{k=1}^{100} e^{F^T t_{k-1}}\left[Q\left(I + \frac{1}{2!}\left[Ft_{k-1}\right] + \frac{1}{3!}\left[Ft_{k-1}\right]^2 + ...\right)Gt_{k-1} + M\right]\delta t

\hat{R} \approx \sum_{k=1}^{100}\left[R + \Gamma^T(t_{k-1})M + M^T\Gamma(t_{k-1}) + \Gamma^T(t_{k-1})Q\Gamma(t_{k-1})\right]\delta t
Sampled-Data Cost Function Weighting Always Includes State-Control Weighting

\hat{M} = \int_{t_k}^{t_{k+1}}\Phi^T(t, t_k)\left[Q\Gamma(t, t_k) + M\right]dt = \int_{t_k}^{t_{k+1}}\Phi^T(t, t_k)Q\Gamma(t, t_k)\,dt \quad \text{even if } M = 0

Sampled-Data Lagrangian:

L_k = \frac{1}{2}\left[\Delta x_k^T\hat{Q}\Delta x_k + 2\Delta x_k^T\hat{M}\Delta u_k + \Delta u_k^T\hat{R}\Delta u_k\right]
Continuous-Time LQ Optimization via Dynamic Programming
Dynamic Programming Approach to Continuous-Time LQ Control

Value Function at t_0:

V(t_0) = \frac{1}{2}\Delta x^T(t_f)P(t_f)\Delta x(t_f) + \frac{1}{2}\int_{t_0}^{t_f}\begin{bmatrix} \Delta x^T(t) & \Delta u^T(t) \end{bmatrix}\begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix}\begin{bmatrix} \Delta x(t) \\ \Delta u(t) \end{bmatrix}dt

Value Function at t_1:

V(t_1) = \frac{1}{2}\Delta x^T(t_f)P(t_f)\Delta x(t_f) + \frac{1}{2}\int_{t_1}^{t_f}\begin{bmatrix} \Delta x^T(t) & \Delta u^T(t) \end{bmatrix}\begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix}\begin{bmatrix} \Delta x(t) \\ \Delta u(t) \end{bmatrix}dt
Dynamic Programming Approach to Continuous-Time LQ Control

Time Derivative of the Value Function:

\frac{\partial V^*\left[\Delta x^*(t_1)\right]}{\partial t} = -\min_{\Delta u}\left\{\frac{1}{2}\left[\Delta x^{*T}(t_1)Q(t_1)\Delta x^*(t_1) + 2\Delta x^{*T}(t_1)M(t_1)\Delta u(t_1) + \Delta u^T(t_1)R(t_1)\Delta u(t_1)\right] + \frac{\partial V^*\left[\Delta x^*(t_1)\right]}{\partial\Delta x}\left[F(t_1)\Delta x^*(t_1) + G(t_1)\Delta u(t_1)\right]\right\}
Dynamic Programming Approach to Continuous-Time LQ Control

HJB Equation:

\frac{\partial V^*\left[\Delta x^*\right]}{\partial t} = -\min_{\Delta u}H\left[\Delta x^*, \Delta u, \frac{\partial V^*}{\partial\Delta x}\right], \quad V^*\left[\Delta x(t_f)\right] = \frac{1}{2}\Delta x^{*T}(t_f)P(t_f)\Delta x^*(t_f)

Hamiltonian:

H\left[\Delta x^*, \Delta u, \frac{\partial V^*}{\partial\Delta x}\right] \triangleq \frac{1}{2}\left[\Delta x^{*T}Q\Delta x^* + 2\Delta x^{*T}M\Delta u + \Delta u^T R\Delta u\right] + \frac{\partial V^*\left[\Delta x^*\right]}{\partial\Delta x}\left[F\Delta x^* + G\Delta u\right]
Plausible Form for the Value Function

Quadratic function of the state perturbation:

V^*\left[\Delta x^*(t)\right] = \frac{1}{2}\Delta x^{*T}(t)P(t)\Delta x^*(t)

Time derivative of the value function:

\frac{\partial V^*}{\partial t} = \frac{1}{2}\Delta x^{*T}(t)\dot{P}(t)\Delta x^*(t)

Gradient of the value function with respect to the state:

\frac{\partial V^*}{\partial\Delta x} = \Delta x^{*T}(t)P(t)
Optimal Control Law and HJB Equation
Optimal control law:

\frac{\partial H}{\partial\Delta u} = \Delta x^T M + \Delta u^T R + \Delta x^T PG = 0
\Delta u(t) = -R^{-1}\left(G^T P + M^T\right)\Delta x(t)

Incorporate the value function model in the HJB equation:

\Delta x^T\dot{P}\Delta x = \Delta x^T\left\{-\left[Q - MR^{-1}M^T\right] - \left[F - GR^{-1}M^T\right]^T P - P\left[F - GR^{-1}M^T\right] + PGR^{-1}G^T P\right\}\Delta x

\Delta x(t) can be cancelled on left and right.
Matrix Riccati Equation
\dot{P}(t) = -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right] - \left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^T P(t)
\quad - P(t)\left[F(t) - G(t)R^{-1}(t)M^T(t)\right] + P(t)G(t)R^{-1}(t)G^T(t)P(t)

P(t_f) = \phi_{xx}(t_f)
Example: 1st-Order System with LQ Feedback Control
1st-order discrete-time dynamic system:

\Delta x_{k+1} = \phi\Delta x_k + \gamma\Delta u_k

LQ optimal control law:

\Delta u_k = -\frac{m + \phi\gamma p_{k+1}}{r + \gamma^2 p_{k+1}}\Delta x_k \triangleq -c_k\Delta x_k; \quad p_k = q + \phi^2 p_{k+1} - \frac{\left(m + \phi\gamma p_{k+1}\right)^2}{r + \gamma^2 p_{k+1}}, \quad p_{k_f} \text{ given}

Dynamic system with LQ feedback control:

\Delta x_{k+1} = \phi\Delta x_k + \gamma\Delta u_k = \phi\Delta x_k + \gamma\left(-c_k\Delta x_k\right) = \left(\phi - \gamma c_k\right)\Delta x_k