Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington
F.L. Lewis, Moncrief-O'Donnell Endowed Chair
Head, Controls & Sensors Group
http://ARRI.uta.edu/acs
Nonlinear Network Structures for Feedback Control
Organized and invited by Professor Jie Huang, CUHK
SCUT / CUHK Lectures on Advances in Control, March 2005
Relevance - Machine Feedback Control
[Figure: tank gun-turret pointing system - desired trajectory qd; rigid coordinates qr1, qr2 (Az/El); barrel flexible modes qf; compliant coupling; moving tank platform; turret with backlash and compliant drive train; terrain and vehicle vibration disturbances d(t); barrel tip position.]
Single-Wheel/Terrain System with Nonlinearities
[Figure: vehicle mass m; parallel damper (mc, kc, cc) with active damping uc (if used); series damper, suspension, and wheel (k, c); vibratory modes qf(t); forward speed y(t); vertical motion z(t); surface roughness ρ(t); terrain input w(t).]
High-Speed Precision Motion Control with unmodeled dynamics, vibration suppression, disturbance rejection, friction compensation, deadzone/backlash control.
Applications: Vehicle Suspension, Industrial Machines, Military Land Systems, Aerospace.
Newton's Law
For a mass m with position p(t), velocity v(t), and applied force F(t):
F = ma \equiv m\ddot{x}, so m\ddot{x}(t) = u(t)
Mechanical Motion Systems (Vehicles, Robots)
Lagrange's Equations of Motion:
M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = B(q)\tau
with inertia M(q), Coriolis/centripetal force V_m, friction F, gravity G, disturbances \tau_d, and control input \tau entering through the actuator matrix B(q) (where actuator problems arise).
Darwinian Selection & Population Dynamics
\dot{x}_1 = a x_1 - b x_1 x_2
\dot{x}_2 = -c x_2 + d x_1 x_2
x1 = prey, x2 = predator
Volterra's fishes
Stable Limit Cycle
\dot{x}_1 = a x_1 - b x_1 x_2 - e x_1^2
\dot{x}_2 = -c x_2 + d x_1 x_2
Effects of Overcrowding - limited food and resources
Stable Equilibrium POINT
Favorable to Prey!
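The two population models are easy to check numerically. A minimal Python sketch, Euler-integrating both cases; the coefficients a, b, c, d, e and the initial populations are illustrative choices, not values from the slides:

```python
# Euler integration of the predator-prey models; all numbers are illustrative.
def simulate(a=1.0, b=0.5, c=1.0, d=0.5, e=0.0, x1=2.0, x2=1.0, dt=0.001, steps=200000):
    """e = 0 gives Volterra's model (limit cycle); e > 0 adds overcrowding."""
    for _ in range(steps):
        dx1 = a * x1 - b * x1 * x2 - e * x1 * x1   # prey
        dx2 = -c * x2 + d * x1 * x2                # predator
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2
```

With e = 0.2 the trajectory settles at the equilibrium x1 = c/d = 2.0, x2 = (a - e c/d)/b = 1.2: the prey level is unchanged while the predator level drops below the Volterra value a/b = 2.0, which is the sense in which overcrowding is favorable to the prey.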
Dynamical System Models
Continuous-Time Systems / Discrete-Time Systems
Nonlinear system:
\dot{x} = f(x) + g(x)u,  y = h(x)        x_{k+1} = f(x_k) + g(x_k)u_k,  y_k = h(x_k)
Linear system:
\dot{x} = Ax + Bu,  y = Cx               x_{k+1} = Ax_k + Bu_k,  y_k = Cx_k
[Block diagrams: integrator 1/s (continuous) or delay z^{-1} (discrete) in a loop with f(x), g(x), and output map h(x).]
Control Inputs, Internal States, Measured Outputs.
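All four model classes reduce to a one-line state update. A plain-Python sketch of the discrete-time linear case x_{k+1} = A x_k + B u_k, y_k = C x_k (the matrices in the usage below are illustrative, not from the slides):

```python
# Simulate x_{k+1} = A x_k + B u_k, y_k = C x_k with matrices as nested lists.
def simulate_dt(A, B, C, x0, inputs):
    x, ys = list(x0), []
    for u in inputs:
        ys.append([sum(c * xi for c, xi in zip(row, x)) for row in C])   # y_k = C x_k
        x = [sum(a * xi for a, xi in zip(rA, x)) + sum(b * uj for b, uj in zip(rB, u))
             for rA, rB in zip(A, B)]                                    # x_{k+1}
    return x, ys
```

For A = diag(0.5, 0.5), B = [1, 0]^T, C = [1, 0], x0 = [1, 0], and five zero inputs, the first output is 1 and the state decays geometrically to [0.03125, 0].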
Issues in Feedback Control
[Diagram: feedforward controller and feedback controller around the system; desired trajectories and measured outputs enter the controllers; control inputs drive the system; sensor noise and disturbances act on the loop.]
Stability, Tracking, Boundedness, Robustness - to disturbances and to unknown dynamics.
Definitions of System Stability
\dot{x} = f(x)   or   x_{k+1} = f(x_k)
[Plots: Asymptotic Stability - x(t) converges to the equilibrium xe; Marginal Stability - x(t) remains near xe without converging; Uniform Ultimate Boundedness - after a time T beyond t0, x(t) stays within a constant bound B of xe (between xe - B and xe + B); a disturbance of size d yields a bound B(d).]
Controller Topologies
Indirect Scheme: the controller drives the plant with control u(t); a system identifier produces the estimated output \hat{y}(t); the identification error between \hat{y}(t) and the output y(t) tunes the identifier, and the desired output y_d(t) drives the controller.
Direct Scheme: the tracking error between the desired output y_d(t) and the output y(t) directly drives the controller, which produces u(t).
Feedback/Feedforward Scheme: controller #1 acts on the tracking error (feedback) while controller #2 acts on the desired output y_d(t) (feedforward); their outputs combine to form u(t).
Cell Homeostasis: the individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and has only limited energy to do so.
Cellular Metabolism
Permeability control of the cell membrane
http://www.accessexcellence.org/RC/VL/GG/index.html
Optimality in Biological Systems
Optimality in Control Systems Design - R. Kalman 1960
Rocket Orbit Injection
http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf
Dynamics:
\dot{r} = w
\dot{w} = \frac{v^2}{r} - \frac{\mu}{r^2} + \frac{F}{m}\sin\phi
\dot{v} = -\frac{vw}{r} + \frac{F}{m}\cos\phi
\dot{m} = -\frac{F}{c}
Objectives: get to orbit in minimum time; use minimum fuel.
Performance Index, Cost, or Value function (the strategic utility)
CT:  J = \int_0^T [Q(x) + R(u)]\,dt = \int_0^T r(x,u)\,dt
DT:  J = \sum_{k=0}^N r(x_k, u_k)
Minimum energy: r(x,u) = x^T Q x + u^T R u
Minimum fuel:   r(x,u) = |u|
Minimum time:   r(x,u) = 1, so J = \int_0^T r(x,u)\,dt = T
Discounting:    J = \sum_{k=0}^N \gamma^k r(x_k, u_k)   or   J = \int_0^T e^{-\gamma t} r(x,u)\,dt
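These indices are just sums or integrals of the stage cost. A one-function Python sketch of the discounted discrete-time cost with the "minimum energy" stage cost, for a scalar state and input (all numbers illustrative):

```python
# Discounted DT cost J = sum_k gamma^k * r(x_k, u_k) with r(x,u) = q*x^2 + r_u*u^2.
def discounted_cost(xs, us, q=1.0, r_u=1.0, gamma=0.9):
    return sum(gamma ** k * (q * x * x + r_u * u * u)
               for k, (x, u) in enumerate(zip(xs, us)))
```

For example, the trajectory xs = [1, 1], us = [1, 0] with gamma = 0.5 costs 2 at step 0 and 0.5 at step 1, so J = 2.5.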
INTELLIGENT CONTROL TOOLS
Fuzzy Associative Memory (FAM): Input Membership Fns. -> Fuzzy Logic Rule Base -> Output Membership Fns.; input x, output u.
Neural Network (NN): NN input layer -> NN output layer; input x, output u.
Both FAM and NN define a function u = f(x) from inputs to outputs.
FAM and NN can both be used for:
1. Classification and Decision-Making
2. Control (includes Adaptive Control; adaptive control is a 1-layer NN)
Neural Network Properties
Learning
Recall
Function approximation
Generalization
Classification
Association
Pattern recognition
Clustering
Robustness to single node failure
Repair and reconfiguration
Nervous system cell. http://www.sirinet.net/~jgjohnso/index.html
First groups working on NN Feedback Control, c. 1995. In the CS community: Werbos. In the controls community: Narendra; Sanner & Slotine; F.C. Chen & Khalil; Lewis; Polycarpou & Ioannou; Christodoulou & Rovithakis; A.J. Calise, McFarland, Naira Hovakimyan; Edgar Sanchez & Poznyak; Sam Ge, Zhang, et al.; Jun Wang, Chinese Univ. Hong Kong.
Industry Standard - PD Controller
[Diagram: Unity-Gain Tracking Loop - desired trajectory qd and actual trajectory q form the error e; the filter [Λ I] produces the filtered error r, which passes through gain Kv to give torque τ to the Robot System.]
r(t) = \dot{e}(t) + \Lambda e(t)
Easy to implement with COTS controllers. Fast. Can be implemented with a few lines of code, e.g. in MATLAB.
But -- cannot handle: high-order unmodeled dynamics; unknown disturbances; high performance specifications for nonlinear systems; actuator problems such as friction, deadzones, backlash.
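The few-lines-of-code claim is easy to verify. A Python sketch of the PD tracking loop τ = Kv r, r = ė + Λe, e = qd − q, on a unit-mass double integrator standing in for one robot joint (the gains and the constant desired position are illustrative):

```python
# PD tracking loop with filtered error on q_ddot = tau (unit mass).
def pd_track(qd=1.0, Kv=10.0, Lam=5.0, dt=0.001, steps=10000):
    q = dq = 0.0
    for _ in range(steps):
        e, de = qd - q, -dq        # constant desired trajectory, so qd_dot = 0
        r = de + Lam * e           # filtered tracking error r(t)
        tau = Kv * r               # the industry-standard PD law
        dq += dt * tau             # integrate q_ddot = tau
        q += dt * dq
    return q
```

The closed-loop error obeys ë + Kv ė + Kv Λ e = 0, so q converges to qd with no steady-state offset.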
Control input
Two-layer feedforward static neural network (NN)
[Diagram: inputs x1...xn feed hidden-layer sigmoid units σ(.), numbered 1...L, through first-layer weights V^T; the hidden outputs feed outputs y1...ym through second-layer weights W^T.]
Summation equations:  y_i = \sum_{l=1}^{L} w_{il}\,\sigma\Big(\sum_{j=1}^{n} v_{lj} x_j + v_{l0}\Big) + w_{i0}
Matrix equation:      y = W^T \sigma(V^T x)
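The matrix equation is a two-line computation. A self-contained Python sketch of the forward pass y = W^T σ(V^T x), with the thresholds v_{l0}, w_{i0} handled by augmenting each layer with a constant 1 (shapes and weights in the example are illustrative):

```python
import math

# Two-layer NN forward pass. V: list of hidden-unit weight rows (length n+1, last
# entry is the threshold); W: list of output weight rows (length L+1, likewise).
def nn_forward(V, W, x):
    xa = x + [1.0]                                        # input plus bias entry
    z = [1.0 / (1.0 + math.exp(-sum(v * xi for v, xi in zip(row, xa)))) for row in V]
    za = z + [1.0]                                        # hidden layer plus bias entry
    return [sum(w * zi for w, zi in zip(row, za)) for row in W]
```

With one zero-weight hidden unit (σ(0) = 0.5) and output row [2, 1], any input maps to 2·0.5 + 1 = 2.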
Control System Design Approach
Robot dynamics:            M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = \tau
Tracking error definition: e(t) = q_d(t) - q(t),   r = \dot{e} + \Lambda e
Error dynamics:            M\dot{r} = -V_m r + f(x) + \tau_d - \tau
[Diagram: PD Tracking Loop - qd and q form e; the filter [Λ I] produces r; an as-yet-unknown controller block ("? controller") produces τ for the Robot System.]
Robot dynamics: M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = \tau
Tracking error: e(t) = q_d(t) - q(t)
Sliding variable: r = \dot{e} + \Lambda e
The equations give the FB controller structure.
Control System Design Approach
Robot dynamics:            M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = \tau
Tracking error definition: e(t) = q_d(t) - q(t),   r = \dot{e} + \Lambda e
Error dynamics:            M\dot{r} = -V_m r + f(x) + \tau_d - \tau
Approximate the UNKNOWN function by an NN (Universal Approximation Property):
f(x) = W^T\sigma(V^T x) + \varepsilon
Define the control input:
\tau = \hat{W}^T\sigma(\hat{V}^T x) + K_v r - v
Closed-loop dynamics:
M\dot{r} = -(K_v + V_m)r + W^T\sigma(V^T x) - \hat{W}^T\sigma(\hat{V}^T x) + \varepsilon + \tau_d + v(t)
        = -(K_v + V_m)r + \tilde{f} + \tau_d + v(t)
Neural Network Robot Controller
[Diagram: PD Tracking Loop with qd and \ddot{q}_d into a Nonlinear Inner Loop computing \hat{f}(x) (the Feedforward Loop); filtered error r through gain Kv; Robust Control Term v(t); torque τ to the Robot System.]
Feedback linearization. Universal Approximation Property.
Problem - the NN is nonlinear in the weights, so standard proof techniques do not work.
Easy to implement with a few more lines of code. The learning feature allows on-line updates to the NN memory as the dynamics change. Handles unmodeled dynamics, disturbances, and actuator problems such as friction. The NN universal basis property means no regression matrix is needed. The nonlinear controller allows faster and more precise motion.
Define a Lyapunov Energy Function
)~~()~~( 21
21
21 VVtrWWtrMrrL TTT ++=
Differentiate
)()'ˆˆ~(~)ˆ'ˆˆ~(~
)2(21
vwrWxrVVtr
xrVrWWtr
rVMrrKrL
TTTT
TTTT
mT
vT
++++
−++
−+−=
σ
σσ&
&
&&
Using certain special tuning rules, one can show that the energyderivative is negative outside a compact set.
L&negative
)(tr
)(~ tW This proves that all signals are bounded
Problems—1. How to characterize the NN weight errors as ‘small’?- use Frobenius Norm2. Nonlinearity in the parameters requires extra care in the proof
Theorem 1 (NN Weight Tuning for Stability)
Let the desired trajectory q_d(t) and its derivatives be bounded. Let the initial tracking error be within a certain allowable set U. Let Z_M be a known upper bound on the Frobenius norm of the unknown ideal weights Z. Take the control input as
\tau = \hat{W}^T\sigma(\hat{V}^T x) + K_v r - v,   with   v(t) = -K_Z(\|\hat{Z}\|_F + Z_M)\,r.
Let weight tuning be provided by
\dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x\, r^T - \kappa F \|r\| \hat{W}
\dot{\hat{V}} = G x\,(\hat{\sigma}'^T \hat{W} r)^T - \kappa G \|r\| \hat{V}
with any constant matrices F = F^T > 0, G = G^T > 0, and scalar tuning parameter \kappa > 0. Initialize the weight estimates as \hat{W} = 0, \hat{V} = random.
Then the filtered tracking error r(t) and NN weight estimates \hat{W}, \hat{V} are uniformly ultimately bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control gains K_v.
Backprop terms - Werbos. Extra robustifying terms - Narendra's e-mod extended to NLIP systems. Forward prop term? Can also use simplified tuning - Hebbian.
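One Euler step of the Theorem-1 tuning laws can be written directly from the matrix equations. A NumPy sketch; the scalar F, G, the sigmoid hidden layer, and the step size are illustrative simplifications of the matrix gains in the theorem:

```python
import numpy as np

# One Euler step of:
#   W_dot = F*sig*r^T - F*(sig' V^T x)*r^T - kappa*F*||r||*W
#   V_dot = G*x*(sig'^T W r)^T - kappa*G*||r||*V
def tune_step(W, V, x, r, F=1.0, G=1.0, kappa=0.1, dt=0.01):
    z = V.T @ x                          # hidden-layer preactivation V^T x
    sig = 1.0 / (1.0 + np.exp(-z))       # sigma-hat
    dsig = sig * (1.0 - sig)             # diagonal of the sigmoid Jacobian sigma-hat'
    nr = np.linalg.norm(r)
    W_dot = F * np.outer(sig, r) - F * np.outer(dsig * z, r) - kappa * F * nr * W
    V_dot = G * np.outer(x, dsig * (W @ r)) - kappa * G * nr * V
    return W + dt * W_dot, V + dt * V_dot
```

Per the theorem, \hat{W} would start at zero and \hat{V} at random values; when r = 0 the weights stop moving, and the κ terms damp the weights whenever the filtered error is nonzero.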
[Figure: W2 weight trajectories vs. time for the desired trajectory d = [0.5 sin(t), 0.5 cos(t)]^T.]
NN weights converge to the best learned values for the given system.
NN Friction Compensator - Trajectory Tracking Controller
[Figures: (a) desired position and velocity trajectories; (b) position and velocity tracking errors - solid = fixed-gain controller, dashed = NN controller.]
[Diagram: the NN tracking controller as before - Tracking Loop with qd and \ddot{q}_d, Nonlinear Inner Loop \hat{f}(x), Feedforward Loop, gain Kv, Robust Control Term v(t), Robot System.]
Static NN => Dynamic NN Feedback Controller
Dynamic NN and Passivity
[Diagram: closed-loop system H(s) - integrator 1/s with matrices A, B, C, inputs u1, u2, state x.]
Discrete-time case: x_{k+1} = A x_k + W^T\sigma(V^T x_k) + u_k
The closed-loop system with respect to the neural network is a dynamic (recursive) NN.
The backprop tuning algorithms
\dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x\, r^T,   \dot{\hat{V}} = G x\,(\hat{\sigma}'^T \hat{W} r)^T
make the closed-loop system passive.
The enhanced tuning algorithms
\dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x\, r^T - \kappa F \|r\| \hat{W},   \dot{\hat{V}} = G x\,(\hat{\sigma}'^T \hat{W} r)^T - \kappa G \|r\| \hat{V}
make the closed-loop system state-strict passive (SSP). SSP gives extra robustness properties to disturbances and high-frequency dynamics.
What about practical systems? Force control; flexible pointing systems; vehicle active suspension (SBIR contracts).
Flexible Systems with Vibratory Modes
\begin{bmatrix} M_{rr} & M_{rf} \\ M_{fr} & M_{ff} \end{bmatrix}\begin{bmatrix} \ddot{q}_r \\ \ddot{q}_f \end{bmatrix} + \begin{bmatrix} V_{rr} & V_{rf} \\ V_{fr} & V_{ff} \end{bmatrix}\begin{bmatrix} \dot{q}_r \\ \dot{q}_f \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & K_{ff} \end{bmatrix}\begin{bmatrix} q_r \\ q_f \end{bmatrix} + \begin{bmatrix} F + G \\ 0 \end{bmatrix} = \begin{bmatrix} B_r \\ B_f \end{bmatrix}\tau
Top row: rigid dynamics. Bottom row: flexible dynamics.
Problem - only one control input!
Flexible link pointing system
Neural network controller for a flexible-link robot arm
[Diagram: outer Tracking Loop - qd = [qd, \dot{q}_d] and q form e = [e, \dot{e}]; [Λ I] gives r; the NN inner loop \hat{f}(x) uses position, velocity, acceleration, and the flexible modes; gain Kv and Robust Control Term v(t) give τ. Fast Vibration Suppression Loop - fast PD gains act on the flexible modes qf, \dot{q}_f through the manifold equation and B_r^{-1} to give τ_F, added to τ (ξ the fast variable).]
Singular Perturbations: add an extra (fast) feedback loop. Use passivity to show stability.
Coupled Systems
Robot mechanical dynamics:  M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = \tau = K_T\,i
Motor electrical dynamics:  L\,\dot{i} + R(i,\dot{q}) = u_e
Problem - only one control input!
Vehicle Active Suspension control
[Figure: sprung mass (car body) m_s with position z_s; unsprung mass (tire) m_u with position z_u; actuator forces F+, F-; tire stiffness K_t; terrain input z_r.]
Neural network backstepping controller for a Flexible-Joint robot arm
[Diagram: Tracking Loop - qd = [qd, \dot{q}_d] and q form e = [e, \dot{e}]; [Λ I] gives r; gain Kr; Robust Control Term vi(t). NN #1 estimates \hat{F}_1(x) in a Nonlinear FB Linearization Loop; NN #2 estimates \hat{F}_2(x) in a Backstepping Loop with gain K_η, desired current i_d, current error η, block 1/K_{B1}, and motor input u_e driving the Robot System.]
Backstepping: add an extra feedback loop; two NNs are needed; use passivity to show stability.
Advantage over traditional backstepping - no regression functions needed.
Actuator Nonlinearities
[Figures: Deadzone - τ = D(u) with deadband [d-, d+] and slopes m-, m+; Backlash - hysteresis of width d-, d+ and slope m in the (u, τ) plane.]
[Diagram: the feedback controller produces the applied control inputs, which pass through the actuator nonlinearity to become the actual control inputs to the System; the outputs are fed back.]
NN in Feedforward Loop - Deadzone Compensation
[Diagram: Mechanical System in a tracking loop with qd, \ddot{q}_d, e, r, filter [Λ^T I], gain Kv, robust term w, and an estimate \hat{f}(x) of the nonlinear function; an NN deadzone precompensator (NN I and NN II) maps the desired torque \hat{\tau} through D(u) to the applied torque τ.]
Deadzone-precompensator NN tuning laws (enhanced backprop):
\dot{\hat{W}} = T\,\sigma(U^T w)\,r^T \hat{W}_i^T \hat{\sigma}'(U_i^T u)\,U_i - k_1 T \|r\| \hat{W} - k_2 T \|r\| \hat{W}\,\|\hat{W}_i\|
\dot{\hat{W}}_i = -S\,\hat{\sigma}'(U_i^T u)\,U_i \hat{W}^T \sigma(U^T w)\, r^T - k_1 S \|r\| \hat{W}_i
The compensator acts like a 2-layer NN with enhanced backprop tuning; the second NN acts as a "little critic" network.
Performance Results
[Figures: state x2(k) and error e2(k) vs. time.] PD control - the deadzone chops out the middle of the response. NN control fixes the problem.
Dynamic inversion NN compensator for a system with Backlash
[Diagram: Nonlinear System in a tracking loop with xd, yd^(n), e, r, filters [Λ^T I] and [0 Λ^T], gain Kv, robust terms v1, v2, and an estimate \hat{f}(x) of the nonlinear function; the desired torque \tau_{des} and its derivative \dot{\tau}_{des} feed a backstepping loop (filter, integrator 1/s, gain Kb) and an NN compensator y_{nn} with activation φ; the control u passes through the Backlash to become τ.]
U.S. patent - Selmic, Lewis, Calise, McFarland.
Performance Results
PD control - backlash chops off the tops and bottoms of the response. NN control fixes the problem.
[Figures: PD controller with backlash - position x1(t), velocity x2(t), and errors e1(t), e2(t) vs. time; PD controller with NN backlash compensation - position x1(t) and tracking error e1(t) vs. time.]
NN Observers - needed when all states are not measured.
[Diagram: Neural Network Observer feeding a Neural Network Controller around the ROBOT; observer states \hat{x}_1, \hat{x}_2, \hat{z}_2, gains k_D, k_P, K_v, [λ I], M^{-1}(.), integrators; estimation error \tilde{x}_1; signals τ(t), \hat{r}(t), \hat{e} = [\hat{e}, \dot{\hat{e}}], \hat{x} = [\hat{x}_1, \hat{x}_2] = [\hat{q}, \dot{\hat{q}}], qd = [qd, \dot{q}_d].]
Observer dynamics:
\dot{\hat{z}}_1 = \hat{x}_2 + k_D \tilde{x}_1
\dot{\hat{z}}_2 = M^{-1}(\hat{x}_1)\big[\hat{W}_o^T\sigma_o(\hat{x}_1, \hat{x}_2) + \tau(t)\big] + k_P \tilde{x}_1
Observer tuning: \dot{\hat{W}}_o = -k_D F_o \sigma_o(\hat{x})\tilde{x}_1^T - \kappa_o F_o \|\tilde{x}_1\| \hat{W}_o
Controller: \tau(t) = \hat{W}_c^T\sigma_c(\hat{x}_1, \hat{x}_2) + K_v\hat{r}(t) - v_c(t),   with   \hat{r}(t) = \dot{\hat{e}}(t) + \Lambda\hat{e}(t)
Controller tuning: \dot{\hat{W}}_c = F_c \sigma_c(\hat{x}_1, \hat{x}_2)\hat{r}^T - \kappa_c F_c \|\hat{r}\| \hat{W}_c
NN Control for Discrete-Time Systems
Dynamics: x(k+1) = f(x(k)) + g(x(k))\,u(k)
NN tuning - error-based, gradient descent with momentum, plus an extra robustifying term:
\hat{W}_i(k+1) = \hat{W}_i(k) - \alpha_i \hat{\varphi}_i(k)\,\hat{y}_i^T(k) - \Gamma\,\|I - \alpha_i \hat{\varphi}_i(k)\hat{\varphi}_i^T(k)\|\,\hat{W}_i(k)
where \hat{y}_i(k) \equiv \hat{W}_i^T(k)\hat{\varphi}_i(k) + K_v r(k) for i = 1, ..., N-1, and \hat{y}_N(k) \equiv r(k+1) for the last layer.
U.S. Patent - Jagannathan, Lewis.
Neural Network Properties (revisited)
Learning; Recall; Function approximation; Generalization; Classification; Association; Pattern recognition; Clustering; Robustness to single node failure; Repair and reconfiguration.
Nervous system cell. http://www.sirinet.net/~jgjohnso/index.html
Some of these properties are USED in the feedback controllers above; the others remain open (???).
Relation Between Fuzzy Systems and Neural Networks
[Figure: FL membership functions for a 2-D input vector x - a grid of triangular memberships over (x1, x2) with grid points X1^i, X1^{i+1}, X2^j, X2^{j+1}.]
Separable Gaussian activation functions give an RBF NN; separable triangular activation functions give a CMAC NN.
Two-layer NN as FL System
[Diagram: two-layer NN with inputs x1...xn, hidden sigmoid units 1...L with weights V^T, and outputs y1...ym with weights W^T; standard scalar thresholds θ1, θ2 contrasted with per-input thresholds θ11, θ12, ..., θ1n.]
FL system = NN with VECTOR thresholds.
Gaussian membership function: \phi_{A_i^l}(z, a, b) = e^{-a_i^l (z - b_i^l)^2}
Tuning laws:
\dot{\hat{W}} = K_W(\hat{\Phi} - A\hat{a} - B\hat{b})\,r - k_W K_W \|r\| \hat{W}
\dot{\hat{a}} = K_a A^T \hat{W} r - k_a K_a \|r\| \hat{a}
\dot{\hat{b}} = K_b B^T \hat{W} r - k_b K_b \|r\| \hat{b}
Fuzzy Logic Controllers - Dynamic Focusing of Awareness
[Diagram: tracking loop - xd(t) and x(t) form e(t); [Λ^T I] gives r(t); gain Kv plus a fuzzy system (input membership functions -> fuzzy rule base -> output membership functions) estimating g(x, x_d) drive the Controlled Plant.]
[Figures: initial vs. final membership functions; effect of changing the membership function spread "a"; effect of changing the membership function elasticities "c".]
Elastic Fuzzy Logic - c.f. P. Werbos. Weights the importance of factors in the rules:
\phi_B(z, a, b, c) = \phi(z, a, b)^{c^2}
\phi(z, a, b, c) = \left[\frac{\cos(a(z-b)) + (a(z-b))^2}{1 + (a(z-b))^2}\right]^{c^2}
Elastic Fuzzy Logic Control: control law u(t) = -K_v r - \hat{g}(x, x_d); tune the membership functions and the control representative values:
\dot{\hat{W}} = K_W(\hat{\Phi} - A\hat{a} - B\hat{b} - C\hat{c})\,r - k_W K_W \|r\| \hat{W}
\dot{\hat{a}} = K_a A^T \hat{W} r - k_a K_a \|r\| \hat{a}
\dot{\hat{b}} = K_b B^T \hat{W} r - k_b K_b \|r\| \hat{b}
\dot{\hat{c}} = K_c C^T \hat{W} r - k_c K_c \|r\| \hat{c}
Better performance.
[Diagrams: reinforcement learning structure - Unknown Plant with state x(t), control u(t), disturbance d(t); a Performance Evaluator produces the instantaneous utility; the critic signal R(t) tunes an Action Generating NN containing the estimate \hat{f}(x); the desired trajectory and filtered error r(t) drive the loop. A second version inserts an FL Critic between the performance evaluator and the action NN.]
Fuzzy Logic Critic NN controller
Learning FL Critic Controller
[Diagram: Unknown Plant with \hat{g}(x), u(t), x(t), d(t); Performance Evaluator; action-generating NN (two-layer, weights \hat{W}_1, \hat{V}_1) driven by xd(t) with gains Kv and robust term v(t); FL critic (input membership functions -> fuzzy rule base -> output membership functions) producing the reinforcement R from ρ, \dot{ρ}; equations (6-14), (6-15) of the REFERENCE.]
Tuning: the action-generating NN weights (\hat{W}_1, \hat{V}_1) and the FL critic weights (\hat{W}_2) are tuned on-line by enhanced-backprop laws driven by the reinforcement R and the filtered error r, with robustifying damping terms (gains H, Γ, μ).
Tune the action-generating NN (controller); tune the Fuzzy Logic Critic. The critic requires MEMORY.
Reinforcement Learning NN Controller
[Diagram: user-input reference signal qd(t); a Performance Measurement Mechanism produces the utility and, through a Critic Element, the reinforcement signal R(t); the Action Generating Neural Net (input pre-processing W, hidden layer σ(.), outputs y1...ym) produces \hat{g}(x); gains Kv and Robust Term v(t); control u(t) drives the PLANT with disturbance d(t) and friction f_r(t); output q(t).]
High-Level NN Controllers Need Exotic Lyapunov Functions
Reinforcement NN control - simplified critic signal: R(t) = sgn(r(t)) = ±1
Lyapunov function: L(t) = \sum_{i=1}^{n} |r_i| + \tfrac{1}{2}\,tr(\tilde{W}^T F^{-1}\tilde{W})
so that \dot{L} = sgn(r)^T \dot{r} + tr(\tilde{W}^T F^{-1}\dot{\tilde{W}}) - the Lyapunov derivative contains R(t)!
Alternative smooth Lyapunov function:
L(t) = \tfrac{1}{\alpha}\ln(1 + e^{\alpha r(t)}) + \tfrac{1}{\alpha}\ln(1 + e^{-\alpha r(t)}) + \tfrac{1}{2}\,tr(\tilde{W}^T F^{-1}\tilde{W})
Tuning law contains only R(t): \dot{\hat{W}} = F\sigma(x)R^T - \kappa F\hat{W}
Adaptive Reinforcement Learning
Critic is the output of NN #1: \hat{R} = \hat{W}_1^T\sigma(\chi(\cdot)) + \rho
Action is the output of a second NN: \hat{g}(x, x_d) = \hat{W}_2^T\sigma(\chi)
Tuning (the algorithm treats this as a SINGLE 2-layer NN):
\dot{\hat{W}}_1 = -\sigma(\chi)\hat{R}^T - \hat{W}_1
\dot{\hat{W}}_2 = \Gamma_2\,\sigma(\chi)\big(r^T + \hat{R}^T\hat{W}_1^T\sigma'(\cdot)\hat{V}_1\big) - \Gamma_2\hat{W}_2
Information-Theoretic Learning
Principe - entropy: H(x, u, t; p(u)) = -\int\!\!\int p(x_0, u_0)\ln p(x_0, u_0)\,dx_0\,du_0. Renyi's entropy; correntropy.
Brockett - minimum-attention control: awareness and effort (partial derivatives in the performance measure). Encode information into the value function:
V(x, u) = \int r(x, u)\,dt + \int\!\!\int \left[a\left(\frac{\partial u}{\partial t}\right)^2 + b\left(\frac{\partial u}{\partial x}\right)^2\right] dx\,dt
Before -
1. Neural Networks for Feedback Control: based on the FB control approach; unknown system dynamics; on-line tuning.
2. Neural Network Solution of Optimal Design Equations: nearly optimal control based on HJ optimal design equations; known system dynamics; preliminary off-line tuning.
H-Infinity Control Using Neural Networks - the L2 Gain Problem
System:
\dot{x} = f(x) + g(x)u + k(x)d,   y = x,   z = \psi(x, u),   u = l(y)
where d = disturbance, u = control, z = performance output, y = measured output, and \|z\|^2 = h^T h + \|u\|^2.
Find the control u(t) so that, for all L2 disturbances and a prescribed gain \gamma^2,
\frac{\int_0^\infty \|z(t)\|^2\,dt}{\int_0^\infty \|d(t)\|^2\,dt} = \frac{\int_0^\infty \big(h^T h + \|u\|^2\big)\,dt}{\int_0^\infty \|d(t)\|^2\,dt} \le \gamma^2
Zero-Sum Differential Game - Standard Bounded L2 Gain Problem
Take \|u\|^2 = u^T R u and \|d\|^2 = d^T d, with the game-theoretic value function
J(u, d) = \int_0^\infty \big(h^T h + \|u\|^2 - \gamma^2\|d\|^2\big)\,dt
Hamilton-Jacobi-Isaacs (HJI) equation:
0 = V_x^T f + h^T h - \tfrac{1}{4} V_x^T g R^{-1} g^T V_x + \tfrac{1}{4\gamma^2} V_x^T k k^T V_x
Stationary point:
Optimal control: u^* = -\tfrac{1}{2} R^{-1} g^T(x)\,V_x
Worst-case disturbance: d^* = \tfrac{1}{2\gamma^2} k^T(x)\,V_x
If the HJI has a positive definite solution V and the associated closed-loop system is AS, then the L2 gain is bounded by \gamma^2.
It is hard to solve the HJI; Beard proposed a successive solution method using Galerkin approximation. Viscosity solutions.
Bounded L2 Gain Problem for Constrained Input Systems
Control constrained by the saturation function \phi(\cdot), e.g. \phi(p) = \tanh(p) (saturating at ±1).
Encode the constraint through the quasi-norm
\{u\}_q^2 = 2\int_0^u \phi^{-1}(\nu)^T\,d\nu
This is weaker than a norm - the homogeneity property is replaced by the weaker symmetry property \{-x\}_q = \{x\}_q. (Used by Lyshevsky for H2 control.)
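For φ = tanh, the quasi-norm integral has the closed form 2∫₀ᵘ tanh⁻¹(ν)dν = 2[u·tanh⁻¹(u) + ½ln(1−u²)], which follows by integration by parts. A quick Python cross-check of the two forms (the quadrature step count is an arbitrary choice):

```python
import math

# Numeric vs. closed-form evaluation of the nonquadratic penalty 2*int_0^u atanh(nu) d nu.
def penalty_numeric(u, n=20000):
    h = u / n
    # midpoint rule keeps atanh away from the singular endpoints +/-1
    return 2.0 * h * sum(math.atanh((i + 0.5) * h) for i in range(n))

def penalty_closed(u):
    return 2.0 * (u * math.atanh(u) + 0.5 * math.log(1.0 - u * u))
```

Note the penalty is even in u, which is exactly the symmetry property that replaces homogeneity.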
Encode the constraint into the value function:
J(u, d) = \int_0^\infty \Big(h^T h + 2\int_0^u \phi^{-1}(\nu)^T\,d\nu - \gamma^2 d^T d\Big)\,dt
Hamiltonian:
H(x, V_x, u, d) \equiv V_x^T(f + gu + kd) + h^T h + 2\int_0^u \phi^{-1}(\nu)^T\,d\nu - \gamma^2 d^T d
Stationarity conditions:
0 = \frac{\partial H}{\partial u} = 2\phi^{-1}(u) + g^T V_x,   0 = \frac{\partial H}{\partial d} = -2\gamma^2 d + k^T V_x
Optimal inputs (solve for u(t) using Leibniz's formula):
u^* = -\phi\big(\tfrac{1}{2} g^T(x) V_x\big)   - note u(t) is bounded!
d^* = \tfrac{1}{2\gamma^2} k^T(x)\,V_x
Cannot solve the HJI directly! Successive Solution - Algorithm 1 (CT policy iteration for H-infinity control, c.f. Howard). Let \gamma be prescribed and fixed, and let u_0 be a stabilizing control with region of asymptotic stability \Omega_0.
1. Outer loop - update control. Initial disturbance d_0 = 0.
2. Inner loop - update disturbance. Solve the Value Equation
0 = \frac{\partial V_i^{j\,T}}{\partial x}\big(f + g u_j + k d_i\big) + h^T h + 2\int_0^{u_j} \phi^{-1}(\nu)^T\,d\nu - \gamma^2 d_i^T d_i
Inner loop update:
d_{i+1} = \frac{1}{2\gamma^2} k^T(x)\,\frac{\partial V_i^j}{\partial x};   go to 2. Iterate on i until convergence to V_\infty^j, d_\infty with RAS \Omega_\infty^j.
Outer loop update:
u_{j+1} = -\phi\Big(\tfrac{1}{2} g^T(x)\,\frac{\partial V_\infty^j}{\partial x}\Big);   go to 1. Iterate on j until convergence to u_\infty, V_\infty, with RAS \Omega_\infty.
(The Value Equation is a consistency equation.)
Results for this Algorithm
The algorithm converges to V^*(\Omega_0), u^*(\Omega_0), d^*(\Omega_0), the optimal solution on the RAS \Omega_0. For this to occur it is required that \Omega_0 \subseteq \Omega^*. Sometimes the algorithm converges to the optimal HJI solution V^*, \Omega^*, u^*, d^*.
For every iteration on the disturbance d_i one has V_i^j \le V_{i+1}^j (the value function increases) and \Omega_i^j \supseteq \Omega_{i+1}^j (the RAS decreases).
For every iteration on the control u_j one has V_\infty^j \ge V_\infty^{j+1} (the value function decreases) and \Omega_\infty^j \subseteq \Omega_\infty^{j+1} (the RAS does not decrease).
Neural Network Approximation for the Computational Technique
Problem - cannot solve the Value Equation in closed form! Use a neural network to approximate V^{(i)}(x):
V_L^{(i)}(x) = \sum_{j=1}^{L} w_j^{(i)} \sigma_j(x) = W_L^T \sigma_L(x)
The value-function gradient approximation is
\frac{\partial V}{\partial x} = \frac{\partial \sigma_L^T(x)}{\partial x} W_L = \nabla\sigma_L^T W_L
Substituting into the Value Equation gives
0 = w_j^{(i)T}\nabla\sigma(x)\,\dot{x} + r(x, u_j, d_i) = w_j^{(i)T}\nabla\sigma(x)\big(f + g u_j + k d_i\big) + h^T h + \{u_j\}^2 - \gamma^2\|d_i\|^2
Therefore, one may solve for the NN weights at iteration (i, j).
Neural Network Feedback Controller - the optimal solution becomes a NN feedback controller with nearly optimal weights:
u = -\phi\big(\tfrac{1}{2} g^T \nabla\sigma_L^T W_L\big),   d = \tfrac{1}{2\gamma^2} k^T(x)\,\nabla\sigma_L^T W_L
Example: Linear system with input saturation |u| \le 1:
\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\end{bmatrix} = \begin{bmatrix}0 & 0.5\\ -1 & -1.5\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} + \begin{bmatrix}0\\ 1\end{bmatrix}u
Activation functions: even polynomial basis up to order 6,
V(x_1, x_2) = w_1 x_1^2 + w_2 x_2^2 + w_3 x_1 x_2 + w_4 x_1^4 + w_5 x_2^4 + w_6 x_1^3 x_2 + w_7 x_1^2 x_2^2 + w_8 x_1 x_2^3 + w_9 x_1^6 + w_{10} x_2^6 + w_{11} x_1^5 x_2 + w_{12} x_1^4 x_2^2 + w_{13} x_1^3 x_2^3 + w_{14} x_1^2 x_2^4 + w_{15} x_1 x_2^5
The RAS is found by integrating \dot{x} = -f(x), that is, in reverse time dt = -d\tau. Initial gain found by LQR; optimal NN solution.
Rotational-Translational Actuator (RTAC) Benchmark Problem
The control input is the torque; N·F is a disturbance. With coupling \varepsilon = 0.2, the dynamics \dot{x} = f(x) + g(x)u + k(x)d have
f(x) = \begin{bmatrix} x_2 \\ \dfrac{-x_1 + \varepsilon x_4^2 \sin x_3}{1 - \varepsilon^2\cos^2 x_3} \\ x_4 \\ \dfrac{\varepsilon\cos x_3\,(x_1 - \varepsilon x_4^2\sin x_3)}{1 - \varepsilon^2\cos^2 x_3} \end{bmatrix},\quad g(x) = \begin{bmatrix} 0 \\ \dfrac{-\varepsilon\cos x_3}{1 - \varepsilon^2\cos^2 x_3} \\ 0 \\ \dfrac{1}{1 - \varepsilon^2\cos^2 x_3} \end{bmatrix}
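The drift and input vector fields can be coded directly. A Python sketch assuming the standard RTAC/TORA normalization shown above, with state x = (translation, its rate, angle, its rate) and ε = 0.2:

```python
import math

# RTAC vector fields under the standard TORA normalization (an assumption here).
EPS = 0.2

def f(x):
    x1, x2, x3, x4 = x
    den = 1.0 - EPS ** 2 * math.cos(x3) ** 2   # coupling denominator, never zero for EPS < 1
    return [x2,
            (-x1 + EPS * x4 ** 2 * math.sin(x3)) / den,
            x4,
            EPS * math.cos(x3) * (x1 - EPS * x4 ** 2 * math.sin(x3)) / den]

def g(x):
    den = 1.0 - EPS ** 2 * math.cos(x[2]) ** 2
    return [0.0, -EPS * math.cos(x[2]) / den, 0.0, 1.0 / den]
```

At the origin the drift vanishes (an equilibrium), while the input field reduces to [0, -ε/(1-ε²), 0, 1/(1-ε²)].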
[Figure: state evolution (x1, x2) for both controllers - switching-surface method vs. nonquadratic-functionals method.]
Minimum-Time Control - encode into the Value Function:
V = \int_0^\infty \Big[Q(x) + 2\int_0^u \tanh^{-1}(\mu)^T R\,d\mu\Big]\,dt
Before -
1. Neural Networks for Feedback Control: based on the FB control approach; unknown system dynamics; on-line tuning.
2. Neural Network Solution of Optimal Design Equations: nearly optimal control based on HJ optimal design equations; known system dynamics; preliminary off-line tuning.
3. Approximate Dynamic Programming: nearly optimal control based on a recursive equation for the optimal value; usually known system dynamics (except Q-learning). The goal - unknown dynamics, on-line tuning.
IEEE Trans. Neural Networks, Special Issue on Neural Networks for Feedback Control. Editors: Lewis, Wunsch, Prokhorov, Jie Huang, Parisini. Due date 1 December. Brings together the feedback control system community, the approximate dynamic programming community, and the neural network community.
Discrete-Time Systems:  x_{k+1} = f(x_k, u_k)
Value in difference form: V(x_k) = \sum_{i=k}^{N} \gamma^{i-k}\,r(x_i, u_i)
Recursive form (consistency equation): V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})
Howard Policy Iteration - iterate the following until convergence:
1. Find the value of the prescribed policy (solve completely): V_j(x_k) = r(x_k, h_j(x_k)) + \gamma V_j(x_{k+1})
2. Policy improvement: h_{j+1}(x_k) = \arg\min_{u_k}\big[r(x_k, u_k) + \gamma V_j(x_{k+1})\big]
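Howard policy iteration is easy to exercise on a toy problem. A Python sketch on a made-up 2-state, 2-action deterministic decision problem (transitions, costs, and γ = 0.9 are illustrative, not from the slides):

```python
# In state 1, action 1 is free and absorbing, so optimally V(1) = 0 and
# V(0) = 1 (pay 1 to move from state 0 to state 1).
NEXT = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}   # next state for (state, action)
COST = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
GAMMA = 0.9

def evaluate(policy, sweeps=500):
    """Step 1: value of the prescribed policy by iterating
    V(x) = r(x, h(x)) + gamma * V(x') to convergence."""
    V = {0: 0.0, 1: 0.0}
    for _ in range(sweeps):
        V = {x: COST[(x, policy[x])] + GAMMA * V[NEXT[(x, policy[x])]] for x in V}
    return V

def policy_iteration():
    policy = {0: 0, 1: 0}
    for _ in range(10):
        V = evaluate(policy)
        # Step 2: policy improvement, the argmin over actions
        policy = {x: min((0, 1), key=lambda u, x=x: COST[(x, u)] + GAMMA * V[NEXT[(x, u)]])
                  for x in V}
    return policy, evaluate(policy)
```

Starting from the poor policy "always action 0", one improvement step already recovers the optimal policy "always action 1" with values V(0) = 1, V(1) = 0.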
Four ADP Methods proposed by Werbos:
- Heuristic dynamic programming (HDP): critic NN approximates the value V(x_k).
- Dual heuristic programming (DHP): critic NN approximates the gradient \partial V/\partial x.
- AD (action-dependent) heuristic dynamic programming (Watkins Q-learning): critic NN approximates the Q function Q(x_k, u_k).
- AD dual heuristic programming: critic NN approximates the gradients \partial Q/\partial x, \partial Q/\partial u.
An action NN approximates the control.
Bertsekas - Neurodynamic Programming. Barto & Bradtke - Q-learning proof (imposed a settling time).
Continuous-Time Systems:
\dot{x} = f(x) + g(x)u + k(x)d,   y = x,   z = \psi(x, u),   u = l(y)
Optimal control: u^*(x(t)) = -\tfrac{1}{2}\,g^T(x)\,\frac{\partial V^*}{\partial x}
Hamiltonian (consistency equation):
0 = \Big(\frac{\partial V}{\partial x}\Big)^T\big(f + gu + kd\big) + r(x, u, d) \equiv H\Big(x, \frac{\partial V}{\partial x}, u, d\Big)
Worst-case disturbance: d^*(x(t)) = \tfrac{1}{2\gamma^2}\,k^T(x)\,\frac{\partial V^*}{\partial x}
HJB (here HJI) equation:
0 = \frac{dV^{*T}}{dx} f + h^T h - \tfrac{1}{4}\frac{dV^{*T}}{dx} g g^T \frac{dV^*}{dx} + \tfrac{1}{4\gamma^2}\frac{dV^{*T}}{dx} k k^T \frac{dV^*}{dx}
Value in differential form: V(x(t)) = \int_t^T r(x, u, d)\,dt
(Compare the DT consistency equation V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1}).)
Continuous-Time Policy Iteration (Abu-Khalaf and Lewis - H-infinity; Saridis - H2; c.f. Howard's work in DT systems). Select a stabilizing initial control; set the initial disturbance to zero.
1. Outer loop - update control.
2. Inner loop - update disturbance. Solve the Lyapunov equation
0 = \frac{\partial V_i^{j\,T}}{\partial x}\big(f + g u_j + k d_i\big) + h^T h + \|u_j\|^2 - \gamma^2\|d_i\|^2
Inner loop disturbance update:
d_{i+1} = \frac{1}{2\gamma^2}\,k^T(x)\,\frac{\partial V_i^j}{\partial x};   go to 2, until convergence.
Outer loop update:
u_{j+1} = -\tfrac{1}{2}\,g^T(x)\,\frac{\partial V_\infty^j}{\partial x};   go to 1, until convergence.
Neural Network Approximation of the Value Function:  \hat{V}_i^j(x) = w_i^{j\,T}\sigma(x)
The Lyapunov equation becomes
0 = w_j^{(i)T}\nabla\sigma(x)\big(f + g u_j + k d_i\big) + h^T h + \|u_j\|^2 - \gamma^2\|d_i\|^2
Control action: u^*(x) = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)\,w^*
CT Nearly Optimal NN Feedback - CT Approximate Policy Iteration (Abu-Khalaf & Lewis): nearly optimal FB control, off-line tuning, known dynamics.
Continuous-Time Adaptive Critic (Abu-Khalaf & Lewis; c.f. Doya). On-line tuning of the critic NN V(x) = w^T\sigma(x).
Hamiltonian (CT consistency check), with target value zero:
0 = H\Big(x, \frac{\partial V}{\partial x}, u\Big) = \frac{\partial V}{\partial x}^T\dot{x} + r(x, u) = \frac{\partial V}{\partial x}^T f(x, u) + r(x, u)
Residual equation error:
\delta = \frac{d(w^T\sigma)}{dt} + r(x, u) = w^T\nabla\sigma(x)\,\dot{x} + r(x, u) = w^T\nabla\sigma(x) f(x, u) + r(x, u)
Squared error E = \tfrac{1}{2}\delta^2, with gradient
\frac{\partial E}{\partial w} = \delta\,\frac{\partial\delta}{\partial w} = \delta\,\nabla\sigma(x) f(x, u)
Update the critic weights using, e.g., gradient descent (or RLS):
\dot{w} = -\alpha\,\nabla\sigma(x) f(x, u)\,\delta
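The gradient-descent critic tuning can be checked on a scalar linear example where the true value weight is known in closed form; for V = w x² and a fixed policy u = −Kx, the Hamiltonian residual vanishes at w* = (q + r_u K²)/(2(bK − a)). A Python sketch with illustrative system, policy, and step size:

```python
# Scalar system x_dot = a*x + b*u, fixed policy u = -K*x, stage cost q*x^2 + r_u*u^2,
# critic V = w*x^2 (basis sigma(x) = x^2, grad sigma = 2x). Here w* = 2/4 = 0.5.
a, b, q, r_u, K, alpha = -1.0, 1.0, 1.0, 1.0, 1.0, 0.005

def train_critic(steps=5000, w=0.0):
    xs = [0.5, 1.0, 1.5]        # sampled states, standing in for persistent excitation
    for k in range(steps):
        x = xs[k % 3]
        u = -K * x
        xdot = a * x + b * u                                    # closed-loop f(x, u)
        delta = w * (2.0 * x) * xdot + q * x * x + r_u * u * u  # Hamiltonian residual
        w -= alpha * (2.0 * x * xdot) * delta                   # gradient step on E = delta^2/2
    return w
```

Cycling over several states is the discrete analogue of the persistence-of-excitation condition: on a single decaying trajectory the regressor would vanish and the weight would stop learning.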
Action NN: the target action is
Y = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)\,w \equiv \phi^T(x)\,w,   with basis \phi^T(x) = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)
(note the action activation functions depend on the system dynamics, and w are the critic weights). The action NN output is Y_2 = \phi^T(x)\,v, with approximation error
e_2(x) = \hat{Y}_2 - Y = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)\,[v - w] = \phi^T(x)\,[v - w]
Update the action weights by gradient descent: \dot{v} = -\beta\,\phi(x)\,e_2.
Alternative: simply set u(x) = Y = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)\,w = \phi^T(x)\,w. This does not work - the proof development so far indicates that the critic NN must be tuned faster than the action NN, i.e. \alpha > \beta.
Small Time-Step Approximate Tuning for Continuous-Time Adaptive Critics (c.f. Bradtke & Barto DT Q-learning work):
0 = H\Big(x, \frac{\partial V}{\partial x}, u\Big) = \dot{V}_t + r(x_t, u_t) \approx \frac{V_{t+1} - V_t}{\Delta t} + r(x_t, u_t)
Baird's Advantage function:
A_D(x_t, u_t) = \frac{r(x_t, u_t)\,\Delta t + V^*(x_{t+1}) - V^*(x_t)}{\Delta t}
This is not in the standard DT form V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1}); this is the sampled-data systems setting.
Optimal Control, Lewis & Syrmos, 1995.
For more information, see the journal papers at http://arri.uta.edu/acs
In progress: M. Abu-Khalaf, Jie Huang, F.L. Lewis, "Nearly Optimal Control by HJ Equation Solution Using Neural Networks."
Theorem 1 (Necessary and Sufficient Conditions for H-infinity Static OPFB Control). Assume Q > 0. Then system (1) is output-feedback stabilizable with L2 gain bounded by \gamma if and only if:
i. (A, C) is detectable;
ii. there exist matrices K* and L such that
K^* C = R^{-1}(B^T P + L)
where P > 0, P^T = P, is a solution of
0 = A^T P + P A + Q + \frac{1}{\gamma^2} P D D^T P - P B R^{-1} B^T P + L^T R^{-1} L
(c.f. results by Kucera and De Souza). Note there is an (A, B) stabilizability condition hidden in the existence of a solution to the Riccati equation. ONLY TWO COUPLED EQUATIONS.
Solution Algorithm 1 (c.f. Geromel):
1. Initialize: set n = 0, L_0 = 0, and select \gamma, Q, R.
2. n-th iteration: solve for P_n in the ARE
0 = A^T P_n + P_n A + Q + \frac{1}{\gamma^2} P_n D D^T P_n - P_n B R^{-1} B^T P_n + L_n^T R^{-1} L_n
Evaluate the gain and update L:
K_{n+1} = R^{-1}(B^T P_n + L_n)\,C^T(C C^T)^{-1},   L_{n+1} = R K_{n+1} C - B^T P_n
Repeat until convergence. Based on the ARE, so no initial stabilizing gain is needed! The iteration tries to project the gain onto the nullspace perpendicular of C using the degrees of freedom in L.
Aircraft Autopilot Design: F-16 Normal Acceleration Regulator
[Diagram: reference r shaped by the command system 10/(s+10); error e = r - n_z integrated through k_I/s to give ε; control u through actuator dynamics 20.2/(s+20.2) to elevator δ_e; aircraft states α, q; sensor dynamics; normal acceleration output n_z = z.]
Measured output y = [α  q  e  ε]^T; static output feedback u = -K y = -[k_α  k_q  k_e  k_I]\,y.
Theorem 2 (new work): Parametrization of all H-infinity Static SVFB Controls. Assume Q > 0. Then K is a stabilizing SVFB with L2 gain bounded by \gamma if and only if:
i. (A, B) is stabilizable;
ii. there exists a matrix L such that
K = R^{-1}(B^T P + L)
where P > 0, P^T = P, is a solution of
0 = A^T P + P A + Q + \frac{1}{\gamma^2} P D D^T P - P B R^{-1} B^T P + L^T R^{-1} L
OPFB is a special case.
Chaos in Dynamic Neural Networks (c.f. Ron Chen)
[Diagram: the closed-loop dynamic NN H(s) as before, with x_{k+1} = A x_k + W^T\sigma(V^T x_k) + u_k.]
% MATLAB file for the chaotic NN from Jun Wang's paper
function [ki, x, y, z] = tcnn(N)
y(1) = rand;  ki(1) = 1;  z(1) = 0.08;
a = 0.9;  e = 1/250;  Io = 0.65;  g = 0.0001;  b = 0.001;
for k = 1:N-1
    ki(k+1) = k + 1;
    x(k) = 1/(1 + exp(-y(k)/e));
    y(k+1) = a*y(k) + g - z(k)*(x(k) - Io);
    z(k+1) = (1 - b)*z(k);
end
x(N) = 1/(1 + exp(-y(N)/e));
The recursion implemented above (with \rho = e, \alpha = a, \beta = b):
x_k = \frac{1}{1 + e^{-y_k/\rho}}
y_{k+1} = \alpha y_k + g - z_k\,(x_k - I_0)
z_{k+1} = (1 - \beta)\,z_k
Jun Wang
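A Python port of the MATLAB sketch above, with the same parameters; the MATLAB version seeds y(1) with rand, while here a fixed y0 is used for reproducibility. The decaying self-feedback z_k anneals the neuron from chaotic wandering toward a fixed point:

```python
import math

# Transiently chaotic neuron: self-feedback z decays geometrically at rate (1 - b).
def tcnn(N, a=0.9, e=1/250, Io=0.65, g=0.0001, b=0.001, y0=0.5):
    y, z = y0, 0.08
    xs, ys, zs = [], [], []
    for _ in range(N):
        x = 1.0 / (1.0 + math.exp(-y / e))    # steep sigmoid output
        xs.append(x); ys.append(y); zs.append(z)
        y = a * y + g - z * (x - Io)          # neuron state update
        z = (1.0 - b) * z                     # self-feedback decay
    return xs, ys, zs
```

The sigmoid output stays in [0, 1] throughout, and z shrinks by the factor 0.999 every step regardless of the chaotic transient.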