Stable Neural Control of a Flexible-Joint Manipulator Subjected to Sinusoidal Disturbance

C.J.B. Macnab
Dept. of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada
Email: [email protected]
Abstract—The proposed method aims at halting weight drift when using multilayer perceptron backpropagation networks in direct adaptive control schemes, without sacrificing performance or requiring unrealistically large control gains. Unchecked weight drift can lead to a chattering control signal and cause bursting. Previously proposed robust weight update methods, including e-modification and dead-zone, sacrifice significant performance if large control gains cannot be applied. In this work, a set of alternate weights guides the training in order to prevent drift. Experiments with a two-link flexible-joint robot demonstrate the improvement in performance compared to e-modification and dead-zone.
1. INTRODUCTION
Neural-adaptive control systems utilize neural networks that
adapt online in an unsupervised learning strategy, eliminating
the need for pretraining. Using a Lyapunov-stable direct-
adaptive control framework to derive neural-network weight
update laws can produce stable neural-network robot controls
[1]. However, if a persistent disturbance prevents the error from going to zero, the weights tend to drift upwards in
magnitude. The weight drift effect is well known in static
neural network learning, where it results in overtraining.
Weight drift will eventually cause a chattering control, which
may excite the dynamics and cause a sudden growth in
error (bursting) [2]. Several standard adaptive Lyapunov-
stable control designs guarantee bounded signals, which have
also been applied to neural adaptive control, including dead-zone [3], leakage [4], and e-modification [1]. However, these
methods require very large feedback gains to guarantee small
errors. To make the system robust to a significant persistent
disturbance while using realistic gains, one must sacrifice
significant performance (e.g. increasing the size of the dead-zone or increasing the e-modification gain).
This paper further develops and experimentally verifies
a novel technique for halting weight drift in a multilayer
perceptron backpropagation network [5]. This method does
not require a significant sacrifice of performance to achieve ro-
bustness to a sinusoidal disturbance near the natural frequency
of the arm. Experiments with a commercially available two-
link flexible-joint arm show the improvement in performance
over other methods.
Fig. 1. A linear-output, two-layer MLP
2. BACKGROUND
A two-layer linear-output multilayer perceptron (Figure 1), with $m$ hidden units and $p$ inputs, provides an output

$$o = w^T \sigma(H^T q) = w^T \sigma(Vq) \qquad (1)$$

where $w \in \mathbb{R}^m$ contains output weights, $H^T \in \mathbb{R}^{m \times p}$ contains hidden weights, $q \in \mathbb{R}^p$ contains inputs, and $\sigma(H^T q) \in \mathbb{R}^m$ contains the outputs from the hidden layer. The matrix $H^T = V$ contains row vectors and column vectors as follows:

$$H^T = \begin{bmatrix} h_{11} & \dots & h_{1p} \\ \vdots & \ddots & \vdots \\ h_{m1} & \dots & h_{mp} \end{bmatrix} = \begin{bmatrix} h_1^T \\ \vdots \\ h_m^T \end{bmatrix} = \begin{bmatrix} v_1 & \dots & v_p \end{bmatrix} = V \qquad (2)$$
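As a concrete illustration of (1), the forward pass of the linear-output MLP can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation; the sigmoid is one typical choice of hidden unit (the paper only says the hidden units are "typical sigmoidal functions"), and the numeric values are arbitrary.

```python
import numpy as np

def mlp_output(w, H, q):
    """Linear-output two-layer MLP, eq. (1): o = w^T sigma(H^T q).

    w : (m,)   output weights
    H : (p, m) hidden weights, so H^T = V has shape (m, p)
    q : (p,)   input vector
    """
    a = H.T @ q                          # hidden activations H^T q, shape (m,)
    sigma = 1.0 / (1.0 + np.exp(-a))     # sigmoidal hidden units (assumed form)
    return w @ sigma                     # scalar network output w^T sigma

# small numeric check: m = 3 hidden units, p = 2 inputs (illustrative values)
w = np.array([0.5, -0.2, 0.1])
H = np.ones((2, 3)) * 0.3
q = np.array([1.0, 2.0])
o = mlp_output(w, H, q)
```

Here every row of $H^T$ equals $[0.3, 0.3]$, so each hidden activation is $0.9$ and the output reduces to $(0.5 - 0.2 + 0.1)\,\sigma(0.9)$.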
Direct neural-adaptive control works for systems of the form

$$\dot{x}_1 = x_2 \qquad (3)$$

$$M \dot{x}_2 = g(x_1, x_2) + u \qquad (4)$$
Proceedings of the 4th International Conference on Autonomous Robots and Agents, Feb 10-12, 2009, Wellington, New Zealand
978-1-4244-2713-0/09/$25.00 ©2009 IEEE 698
Fig. 2. Alternate weights for the output layer
where g(x) contains linear and nonlinear functions and
$M$ is a positive constant. Given a desired trajectory $x_{1,d}(t)$, $x_{2,d}(t)$, $\dot{x}_{2,d}(t)$ and defining the augmented state error as $z = \lambda(x_1 - x_{1,d}) + (x_2 - x_{2,d})$, with constant $\lambda > 0$, results in the error dynamics

$$M \dot{z} = f(q, t) + u \qquad (5)$$

where $q = \begin{bmatrix} x_1 & x_2 & x_{1,d} & x_{2,d} & \dot{x}_{2,d} \end{bmatrix}^T$. The unknown
weights that would ideally model $f(q,t)$ are denoted $w$ and $H$:

$$f(q, t) = o(w, H, q) + d(q, t) \qquad (6)$$

where $d(q, t)$ is a uniformly bounded modeling error. The weight errors are $\tilde{w} = w - \hat{w}$ and $\tilde{H} = H - \hat{H}$. Consider
the Lyapunov-like function and its derivative

$$V = \frac{1}{2} M z^2 + \frac{1}{2\beta}\left[ \tilde{w}^T \tilde{w} + \mathrm{tr}(\tilde{V}^T \tilde{V}) \right] \qquad (7)$$

$$\dot{V} = z\left(f(q, t) + u\right) - \frac{1}{\beta}\left[ \tilde{w}^T \dot{\hat{w}} + \mathrm{tr}(\tilde{V}^T \dot{\hat{V}}) \right] \qquad (8)$$
where tr() denotes the trace of a matrix. The neural network
contributes a portion of the control signal, along with state
feedback and a nonlinear robust term, in
$$u = -\hat{o} - G z - \zeta z |z| \qquad (9)$$

where $G$ is a positive feedback gain and $\zeta$ is a positive constant. Using e-modification to achieve robustness to the modeling error (and any bounded external disturbance) results in output weight updates as in [1]:

$$\dot{\hat{w}} = \beta \left[ z(\hat{\sigma} - \hat{\sigma}' \hat{V} q) - \nu |z| \hat{w} \right], \qquad (10)$$
where $\beta > 0$ is the adaptation gain and $\nu > 0$ is the e-modification gain. Two equivalent expressions for the hidden weight updates are

$$\dot{\hat{h}}_k = \beta \left[ z q (\hat{w}^T \hat{\sigma}')_k - \nu |z| \hat{h}_k \right], \quad k = 1 \dots m, \qquad (11)$$

$$\dot{\hat{v}}_j = \beta \left[ z q_j (\hat{w}^T \hat{\sigma}')^T - \nu |z| \hat{v}_j \right], \quad j = 1 \dots p, \qquad (12)$$

where each $q_j$ is an element of vector $q$, $(\hat{w}^T \hat{\sigma}')_k$ is an element of $(\hat{w}^T \hat{\sigma}')$, and $\hat{\sigma}'$ contains the derivatives of $\sigma$ with
respect to $H^T q$. The control (9) and weight updates (10)-(12) result in a guarantee of semi-globally uniformly ultimately bounded (SGUUB) signals, established in Appendix A.

Fig. 3. Alternate weights for the hidden layer (the jth neuron illustrated)
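A discrete-time sketch of the e-modification updates (10) and (11) may make the mechanics concrete. This is a hedged illustration, not the paper's code: it assumes sigmoid hidden units, a scalar error $z$, simple Euler integration, and illustrative gain values; the function name and time step are invented for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def emod_step(w_hat, V_hat, q, z, beta=0.5, nu=0.2, dt=1e-3):
    """One Euler step of the e-modification updates (10) and (11).

    w_hat : (m,)   output-weight estimates
    V_hat : (m,p)  hidden-weight estimates (V = H^T)
    q     : (p,)   network input;  z : scalar augmented error
    """
    a = V_hat @ q
    s = sigmoid(a)
    sp = s * (1.0 - s)                          # sigmoid derivative sigma'
    # (10): w_dot = beta [ z (sigma - sigma' V q) - nu |z| w_hat ]
    w_dot = beta * (z * (s - sp * a) - nu * abs(z) * w_hat)
    # (11), row-wise: h_dot_k = beta [ z q (w^T sigma')_k - nu |z| h_k ]
    V_dot = beta * (z * np.outer(w_hat * sp, q) - nu * abs(z) * V_hat)
    return w_hat + dt * w_dot, V_hat + dt * V_dot
```

Note that both update terms scale with $z$, so inside a dead-zone ($z = 0$) the weights freeze, and the $-\nu|z|$ leakage pulls the estimates toward zero whenever error persists.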
Note that another popular method of robustifying the weight update is dead-zone, where the weight updates are not applied when $|z| < \delta$ with $\delta > d_{max}/G_{min}$, where $d_{max}$ is a bound on the disturbances and $G_{min}$ is the minimum eigenvalue of the gain.
3. PROPOSED METHOD
In the proposed method, alternate weights $\hat{p}$ help supervise the training of the output weights. The hidden-layer alternate weights are

$$R^T = \begin{bmatrix} r_{11} & \dots & r_{1p} \\ \vdots & \ddots & \vdots \\ r_{m1} & \dots & r_{mp} \end{bmatrix} = \begin{bmatrix} r_1^T \\ \vdots \\ r_m^T \end{bmatrix} = \begin{bmatrix} s_1 & \dots & s_p \end{bmatrix} = S \qquad (13)$$
The idea is that the alternate weights try to approximate the
outputs of the control weights w and H, on a per-layer basis
(Figures 2 and 3). The design of the training rule ensures the
alternate weights do not undergo the same weight drift as the
control weights.
A. Alternate Weights - Supervised learning
The output alternate weights undergo training

$$\dot{\hat{p}} = |z| \beta \left[ a(\hat{w}^T \sigma - \hat{p}^T \sigma)\sigma - C \hat{p} \right], \qquad (14)$$

where $a$ is a positive learning gain and $C$ is a positive-definite (p.d.) diagonal leakage gain. In words, the alternate output $\hat{p}^T \sigma$ trains to approximate the control output $\hat{w}^T \sigma$. The leakage term $-C\hat{p}$ prevents the weights from drifting to infinity. A proper design of $a$ and $C$ sacrifices a little approximation accuracy to keep the weights relatively small in magnitude
(preventing drift). For the alternate hidden weights,

$$\dot{\hat{s}}_k = |z| \beta \left[ b_k q_k (\hat{V} q - \hat{S} q) - D_k \hat{s}_k \right], \quad k = 1 \dots p \qquad (15)$$
where $\hat{s}_k \in \mathbb{R}^m$ are the column vectors of $\hat{S}$, $b_k$ is positive, and $D_k$ is p.d. diagonal. In words, the alternate weights on the hidden layer produce outputs $\hat{S} q$ which approximate $\hat{V} q$, and the term $-D_k \hat{s}_k$ provides leakage.

Although the learning rules (14) and (15) could produce a set of alternate weights in an off-line training stage, the next section proposes a method to accomplish this training on-line. The terms $a$, $b_k$, $C$, and $D_k$ then all become appropriately designed variables.
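The per-layer supervision of (14) and (15) can be sketched numerically. This is a simplified illustration under stated assumptions: sigmoid hidden units, a single Euler step, and scalar stand-ins (`a_gain`, `c_gain`) for the gain $a$ and the diagonals of $C$ (and likewise $b_k$, $D_k$), which the paper later makes state-dependent; the function name is invented for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def alternate_weight_step(p_hat, S_hat, w_hat, V_hat, q, z,
                          beta=0.5, a_gain=0.1, c_gain=0.01, dt=1e-3):
    """One Euler step of the alternate-weight rules (14) and (15).

    p_hat tracks the control output w_hat^T sigma (output layer);
    S_hat q tracks V_hat q (hidden layer), column by column.
    """
    s = sigmoid(V_hat @ q)
    # (14): p_dot = |z| beta [ a (w^T s - p^T s) s - C p ]
    p_dot = abs(z) * beta * (a_gain * (w_hat @ s - p_hat @ s) * s
                             - c_gain * p_hat)
    # (15): column k of S gets |z| beta [ b_k q_k (V q - S q) - D_k s_k ];
    # np.outer(resid, q) stacks resid * q_k into column k all at once
    resid = V_hat @ q - S_hat @ q
    S_dot = abs(z) * beta * (a_gain * np.outer(resid, q) - c_gain * S_hat)
    return p_hat + dt * p_dot, S_hat + dt * S_dot
```

As with the control weights, both rules are gated by $|z|$, so the alternate weights only train while tracking error persists.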
B. Online Training
Utilizing the alternate weights, the adaptation law (10) changes to

$$\dot{\hat{w}} = \beta \left[ z(\hat{\sigma} - \hat{\sigma}' \hat{V} q) + a|z|(\hat{p}^T \sigma - \hat{w}^T \sigma)\sigma + |z| C (\hat{p} - \hat{w}) \right] \qquad (16)$$
The update (16) prevents bursting as long as:
1) $\|\hat{p}\| < \|\hat{w}\|$;
2) each positive (diagonal) term in $C$ is large enough.
The adaptation law for the hidden weights is analogous:

$$\dot{\hat{v}}_j = \beta \left[ (\hat{w}^T \hat{\sigma}')^T z q_j + b_j q_j |z| (\hat{S} q - \hat{V} q) + |z| D_j (\hat{s}_j - \hat{v}_j) \right]$$

for $j = 1 \dots p$, and the requirements to prevent bursting are analogous.
To meet the first requirement, the alternate weights are
initialized to be smaller in magnitude and are kept smaller
by choosing
$$a(\hat{w}, \hat{p}) = \eta \left( \frac{1}{m} \sum_{k=1}^{m} |\hat{w}_k - \hat{p}_k| \right) \qquad (17)$$

$$b_j(\hat{V}_j, \hat{S}_j) = \eta \left( \frac{1}{m} \sum_{k=1}^{m} |\hat{h}_{kj} - \hat{r}_{kj}| \right) \qquad (18)$$
where η is a positive constant. In words, the adaptation rate
is proportional to the average difference between the alternate
and control weight magnitudes.
To meet the second requirement, measurements of the
weight drift indicate how large to make C and each Dj , called
weight drift indicators:
$$y_w = \frac{\begin{bmatrix} |\hat{w}_1 - \hat{p}_1| & \dots & |\hat{w}_m - \hat{p}_m| \end{bmatrix}^T}{\frac{1}{m}\sum_{k=1}^{m} |\hat{w}_k - \hat{p}_k|} \qquad (19)$$

$$y_{V_j} = \frac{\begin{bmatrix} |\hat{H}_{1j} - \hat{S}_{1j}| & \dots & |\hat{H}_{mj} - \hat{S}_{mj}| \end{bmatrix}^T}{\frac{1}{m}\sum_{k=1}^{m} |\hat{H}_{kj} - \hat{S}_{kj}|} \qquad (20)$$
That is, the relative magnitude of the difference between the control and alternate weights measures the drift of a particular control weight. Design $C$ and each $D_j$ to utilize the weight drift indicators:

$$C(\hat{w}, \hat{p}) = \mathrm{diag}\left( \zeta \exp(\mu\, y_w) + \rho \right) \qquad (21)$$

$$D_j(\hat{V}_j, \hat{S}_j) = \mathrm{diag}\left( \zeta \exp(\mu\, y_{V_j}) + \rho \right) \qquad (22)$$
where $\zeta$ and $\mu$ are positive constants and $\rho$ is any non-negative constant. Measuring the values of the weight drift indicators (19) and (20) at the point where a non-robust experiment goes unstable allows the quantitative design of appropriate exponential curves (21) and (22), ensuring that $C$ and $D_j$ are very small when the weight drift indicators are not near their critical values, but become large otherwise.

Fig. 4. Experimental two-link flexible-joint robot arm
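A numerical sketch of the gain design (17), (19), and (21) for the output layer follows. It is illustrative only: the weight values are made up, and the gain values $\zeta = 0.0015$, $\mu = 3$ are the ones the experiments section reports; the function names are invented for the example.

```python
import numpy as np

def adaptation_gain(w_hat, p_hat, eta=0.1):
    """Learning gain a from (17): eta times the mean |w_k - p_k| gap."""
    return eta * np.mean(np.abs(w_hat - p_hat))

def drift_indicator(w_hat, p_hat):
    """y_w from (19): per-weight gap normalised by the mean gap."""
    gaps = np.abs(w_hat - p_hat)
    return gaps / np.mean(gaps)

def leakage_gain(w_hat, p_hat, zeta=0.0015, mu=3.0, rho=0.0):
    """Diagonal of C from (21): exponential in the drift indicator, so
    leakage stays tiny until a weight drifts well past its neighbours."""
    return zeta * np.exp(mu * drift_indicator(w_hat, p_hat)) + rho

# illustrative 5-weight example: weight 0 has drifted twice the average gap
w_hat = np.array([1.0, 0.5, -0.4, 0.2, 0.1])
p_hat = np.array([0.8, 0.4, -0.3, 0.1, 0.1])
yw = drift_indicator(w_hat, p_hat)
```

The exponential shape means the weight whose gap is largest relative to the average receives by far the strongest leakage, which is the intended selective braking of drift.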
C. Dead-zone
The dead-zone is a region near zero error, $|z| < \delta$, where the control weights freeze. Unlike the traditional dead-zone, which requires knowledge of the maximum disturbance bound, this dead-zone is simply a very small region near the origin into which the method can typically bring the system.
4. EXPERIMENTAL APPARATUS
Trajectory tracking of a two-link flexible-joint robot experiment (Fig. 4) serves to validate the approach. During the experiments, a 1 kg payload sits at the end of the second link. Both natural frequencies of the robot are approximately 1 Hz with the payload.
In the experiment the tip of the second link traces a 4 cm square trajectory in 16 seconds, while being subjected to the disturbance

$$d(t) = 0.32 \sin(2\pi t) \qquad (23)$$

The disturbance has just enough amplitude to make the oscillations of the second flexible joint visible to the naked eye, and its frequency is very near the natural frequencies.
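To give a feel for the closed loop, the following is a minimal scalar simulation of (5) under the sinusoidal disturbance (23), using the control (9) and the e-modification updates (10)-(11). It is emphatically not the two-link experiment: the plant is reduced to a scalar with assumed $M = 1$, the gains mirror the reported $\beta = 0.5$ and $G = 2$ but $\zeta$, $\nu$, the network input, and the time step are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Scalar stand-in for (5): M z_dot = d(t) + u, with the 5-hidden-unit MLP
# and e-modification updates (10), (11); all specific values are assumptions.
M, G, zeta, beta, nu, dt = 1.0, 2.0, 0.1, 0.5, 0.2, 1e-3
m = 5
w_hat = np.zeros(m)                     # output weights start at zero
V_hat = 0.1 * np.ones((m, 3))           # small nonzero hidden weights
z, zs = 0.0, []
for i in range(10000):                  # 10 s of simulated time
    t = i * dt
    q = np.array([np.sin(2*np.pi*t), np.cos(2*np.pi*t), 1.0])  # assumed input
    a = V_hat @ q
    s = sigmoid(a)
    sp = s * (1.0 - s)
    o_hat = w_hat @ s                   # network output, eq. (1)
    u = -o_hat - G*z - zeta*z*abs(z)    # control (9)
    d = 0.32 * np.sin(2*np.pi*t)        # disturbance (23)
    z += dt * (d + u) / M               # Euler step of (5)
    w_hat += dt * beta * (z*(s - sp*a) - nu*abs(z)*w_hat)          # (10)
    V_hat += dt * beta * (z*np.outer(w_hat*sp, q) - nu*abs(z)*V_hat)  # (11)
    zs.append(z)
zs = np.asarray(zs)
```

Over such a short horizon the error stays bounded; the weight drift the paper studies only becomes destructive over many 16-second repetitive trials, as Figure 5 shows.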
Making the system equivalent to (5), a backstepping procedure produces appropriate virtual controls and controls as in [6], resulting in

$$I(q)\dot{z} = -Gz + F(q) - \hat{W}^T \sigma(\hat{V}^T q), \qquad (24)$$

where $q$ contains the robot states, $z$ contains the output errors and virtual control errors, $F$ contains linear and nonlinear terms, and $G$ is a positive-definite, constant, symmetric matrix.
One single-output MLP is used for each (virtual) control
signal, each with five hidden units. This ensures the training
can take place quickly, within 100 repetitive trials (although it
may not be able to learn additional trajectories). The hidden units contain typical sigmoidal functions.

Fig. 5. Training of MLP without any robust modification (RMS error in degrees and maximum weight magnitude vs. trials)
A. Results
All experiments use a common adaptation gain $\beta = 0.5$ and control gains $G_i = \mathrm{diag}(2, 2)$ for $i = 1, 2, 3$. When the weight updates do not have any robust modification, the RMS error briefly converges to 0.2 degrees before the weight drift causes bursting (and apparent instability) on the 50th repetitive trial (Figure 5). When using the e-modification weight updates (10), (11), three different values of $\nu$ fail to provide satisfactory performance (Figure 6). Only a value of $\nu = 1.6$ (or greater) is able to stop the weight drift, but the resulting error is five times worse than the optimum performance.
Other experiments allowed identification of the critical value of the weight drift indicators (when bursting occurs) as $y_w = y_{V_j} = 3$, leading to a choice of $\zeta = 0.0015$ and $\mu = 3$ according to the design method in Section 3-B. Using parameter values of $\rho = 0$ and $\eta = 0.1$ produced satisfactory alternate outputs. The only parameter that needs to be identified through further experiment is the size of the dead-zone $\delta$. Values of the dead-zone greater than 1 degree were all sufficient to halt the weight drift. The dead-zone of 1 degree results in the best performance, very near 0.4 degrees RMS error (Fig. 7). Note that a traditional dead-zone design requires $\delta > d_{max}/G_{min} = 0.32/2 = 0.16$ rad, or about 9 degrees. Thus, the new method performs about nine times better than traditional dead-zone, and from the experiments we see the proposed method performs four times better than e-modification.
Fig. 6. Training of MLP using e-modification ($\nu$ = 0.2, 0.9, 1.6; RMS error in degrees and maximum weight magnitude vs. trials)

Fig. 7. Training of MLP using the proposed method, varying the learning dead-zone (1, 2, and 5 degrees; RMS error in degrees and maximum weight magnitude vs. trials)

5. CONCLUSIONS

A method that uses an alternate set of weights to guide the training of a multilayer perceptron neural network can achieve stable control of a flexible-joint robot in the presence of a significant sinusoidal disturbance. In this situation, a
weight update with no robust modification first converges to
an optimum level of performance before going unstable (due
to unchecked weight drift). The traditional robust methods of
e-modification and dead-zone fail to produce a practical result
in this situation, sacrificing so much performance that no
significant adaptation occurs. The proposed method, however,
can still adapt and reduce the RMS error over a number of repetitive trials, coming within 0.2 degrees of the optimum
performance while completely stopping the weight drift.
APPENDIX A - STABLE BACKPROPAGATION
The solution to stable backpropagation using e-modification
that follows was introduced in [1]. A neural network can
uniformly approximate a nonlinear function f (q) in a local
region D ⊂ R p if there exists a set of weights w and V such
that

$$f(q) = w^T \sigma(Vq) + d(t, q) \qquad (25)$$

with $|d(t, q)| < d_{max}\ \forall\, q \in D$. Define weight errors $\tilde{w}^T = w^T - \hat{w}^T$ and hidden weight errors $\tilde{V} = V - \hat{V}$, and equation (25) becomes
$$f(q) = (\tilde{w}^T + \hat{w}^T)\left[\tilde{\sigma} + \sigma(\hat{V}q)\right] + d \qquad (26)$$

where $\tilde{\sigma} = \sigma(Vq) - \sigma(\hat{V}q)$. Define $\hat{\sigma} = \sigma(\hat{V}q)$. Then as in [1] use a Taylor series expansion of $\tilde{\sigma} = \sigma - \hat{\sigma}$ about $\hat{\sigma}$:

$$\tilde{\sigma} = \sigma(\hat{V}q) + \hat{\sigma}'(Vq - \hat{V}q) + O(\cdot)^2 - \hat{\sigma} = \hat{\sigma}' \tilde{V} q + O(\cdot)^2 \qquad (27)$$
where $O(\cdot)^2$ represents higher-order terms and

$$\hat{\sigma}' = \left. \frac{\partial \sigma}{\partial(Vq)} \right|_{V = \hat{V}}. \qquad (28)$$
Assume $\|\sigma\| \le \sigma_{max}$ and $\|\hat{\sigma}'\| \le \gamma$ with $\sigma_{max}$ and $\gamma$ positive constants. Bound the norm of the higher-order terms, assuming $\|x_d\| \le \kappa$ with $\kappa$ a positive constant, as follows:

$$\|O(\cdot)^2\| \le \|\tilde{\sigma}\| + \|\hat{\sigma}' \tilde{V} q\| \le 2\sigma_{max} + \gamma \|\tilde{V} q\| \le 2\sigma_{max} + \gamma\kappa\|\tilde{V}\| + \gamma \|\tilde{V}\| \|z\| \qquad (29)$$
where the matrix norm for $\tilde{V}$ is the Frobenius norm. Rewrite the nonlinear function approximation (26) as

$$f(q) = \tilde{w}^T(\hat{\sigma} - \hat{\sigma}' \hat{V} q) + \hat{w}^T(\hat{\sigma}' \tilde{V} q + \hat{\sigma}) + \epsilon \qquad (30)$$

where

$$\epsilon = \tilde{w}^T \hat{\sigma}' V q + w^T O(\cdot)^2 + d, \qquad (31)$$

$$\|\epsilon\| \le \gamma \|\tilde{w}\| \|Vq\| + \|w\| \|O(\cdot)^2\| + d_{max} \le \gamma\kappa\|\tilde{w}\| \|V\| + \gamma \|\tilde{w}\| \|V\| \|z\| + \|w\| \|O(\cdot)^2\| + d_{max} \qquad (32)$$

Combined with (29), the result is

$$\|\epsilon\| \le A_1 + A_2 \|\tilde{W}\| + A_3 \|z\| \|\tilde{W}\|, \qquad (33)$$
where $A_1$, $A_2$, and $A_3$ are positive constants and $\tilde{W} = \mathrm{diag}(\tilde{w}, \tilde{V})$. Equation (24) requires $n$ neural networks with corresponding weights given by $w_i$ and $V_i$ for $i = 1 \dots n$. Consider the (adaptive control) Lyapunov-like function
$$V = \frac{1}{2} z^T I z + \frac{1}{2\beta} \sum_{i=1}^{n} \left[ \tilde{w}_i^T \tilde{w}_i + \mathrm{tr}(\tilde{V}_i^T \tilde{V}_i) \right] \qquad (34)$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Then

$$\dot{V} = \frac{d}{dt}\left( \frac{1}{2} z^T I z \right) - \frac{1}{\beta} \sum_{i=1}^{n} \left[ \tilde{w}_i^T \dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T \dot{\hat{V}}_i) \right] \qquad (35)$$
Evaluate the first term:

$$\frac{d}{dt}\left( \frac{1}{2} z^T I z \right) = z^T I \dot{z} + \frac{1}{2} z^T \dot{I} z \qquad (36)$$

$$= z^T \left( -Z z - G z + F - \hat{c} + \frac{1}{2} \dot{I} z + \rho \right) \qquad (37)$$
and assume the $i$th neural network, for $i = 1 \dots n$, can model the nonlinearities:

$$F_i + \frac{1}{2}\left(\dot{I} z\right)_i = c_i + d_i = w_i^T \sigma(V_i q) + d_i \qquad (38)$$
Using the fact that $z^T Z z = 0$, write $\dot{V} = \sum_{i=1}^{n} \dot{V}_i$ and evaluate $\dot{V}_i$ by combining (35), (37), and (38) and expanding the vector $z = [z_1, z_2, \dots, z_n]^T$:

$$\dot{V}_i = z_i \left[ w_i^T \sigma(V_i q) + d_i - \hat{c}_i - G_i z_i - \rho_i \right] - \frac{1}{\beta}\left[ \tilde{w}_i^T \dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T \dot{\hat{V}}_i) \right] \qquad (39)$$
and using the result from (30):

$$\dot{V}_i = z_i \left[ \tilde{w}_i^T(\hat{\sigma}_i - \hat{\sigma}_i' \hat{V}_i q) + \hat{w}_i^T \hat{\sigma}_i' \tilde{V}_i q + \hat{w}_i^T \hat{\sigma}_i + \epsilon_i - \hat{c}_i - G_i z_i - \rho_i \right] - \frac{1}{\beta}\left[ \tilde{w}_i^T \dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T \dot{\hat{V}}_i) \right] \qquad (40)$$
Using the facts $\hat{c}_i = \hat{w}_i^T \hat{\sigma}_i$ and

$$\mathrm{tr}(\tilde{V}_i^T \dot{\hat{V}}_i) = \sum_{j=1}^{p} \tilde{v}_{i,j}^T \dot{\hat{v}}_{i,j} \quad \text{and} \quad \hat{w}_i^T \hat{\sigma}_i' \tilde{V}_i q = \sum_{j=1}^{p} \tilde{v}_{i,j}^T (\hat{w}_i^T \hat{\sigma}_i')^T q_j$$

results in

$$\dot{V}_i = z_i(\epsilon_i - G_i z_i - \rho_i) + \tilde{w}_i^T \left[ z_i(\hat{\sigma}_i - \hat{\sigma}_i' \hat{V}_i q) - \frac{\dot{\hat{w}}_i}{\beta} \right] + \sum_{j=1}^{p} \tilde{v}_{i,j}^T \left[ (\hat{w}_i^T \hat{\sigma}_i')^T z_i q_j - \frac{\dot{\hat{v}}_{i,j}}{\beta} \right] \qquad (41)$$

where $q_j$ is the $j$th element of $q$. The weight updates, using e-modification, are

$$\dot{\hat{w}}_i = \beta \left[ z_i(\hat{\sigma}_i - \hat{\sigma}_i' \hat{V}_i q) - \nu \|z\| \hat{w}_i \right] \qquad (42)$$

$$\dot{\hat{v}}_{i,j} = \beta \left[ (\hat{w}_i^T \hat{\sigma}_i')^T z_i q_j - \nu \|z\| \hat{v}_{i,j} \right] \qquad (43)$$
where $\nu$ is a positive constant which needs to be chosen large enough to prevent weight drift. This is a (stable) form of backpropagation. The resulting Lyapunov derivative is

$$\dot{V} = -z^T G z + z^T \rho + z^T \epsilon + \nu \|z\| \sum_{i=1}^{n} \left[ \tilde{w}_i^T \hat{w}_i + \mathrm{tr}(\tilde{V}_i^T \hat{V}_i) \right] \qquad (44)$$
Choosing a form of nonlinear damping for the robust term,

$$\rho = -\zeta z \|z\| \quad \text{with constant } \zeta > 0, \qquad (45)$$

results in a bound for the Lyapunov derivative:

$$\dot{V} \le \|z\| \left( -g\|z\| - \zeta \|z\|^2 + \sum_{i=1}^{n} \left[ \epsilon_i + \nu\, \mathrm{tr}(\tilde{W}_i^T \hat{W}_i) \right] \right) \qquad (46)$$
where $g$ is the minimum eigenvalue of $G$ and $\hat{W} = \mathrm{diag}(\hat{w}, \hat{V})$. Using $\hat{W}_i = W_i - \tilde{W}_i$ and the bound from (33) results in

$$\dot{V} \le \|z\| \left( \begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix}^T \begin{bmatrix} -\zeta & A_3/2 \\ A_3/2 & -\nu \end{bmatrix} \begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix} + \begin{bmatrix} -g \\ A_2 + \nu \|W\| \end{bmatrix}^T \begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix} + A_1 \right) \qquad (47)$$

where each $A_k = [A_{k,1} \dots A_{k,n}]$ and $\tilde{W} = \mathrm{diag}(\tilde{W}_1 \dots \tilde{W}_n)$.

Setting (47) equal to zero describes the boundary of a compact set $B$ in the $(\|z\|, \|\tilde{W}\|)$ plane. Outside of this compact set, $\dot{V} < 0$ if the matrix in the elliptic term is negative definite, which means the parameters must be chosen such that $\zeta\nu > A_3^2/4$.
Note that knowledge of the maximum bound on $W$ (the ideal weights) is required to calculate $A_3$. By standard Lyapunov arguments, the smallest Lyapunov surface enclosing $B$ is then a bound on the signals. By Barbalat's Lemma, the surface $B$ is an ultimate bound (as $t \to \infty$) if all signals are continuous. The system is described as semi-globally uniformly ultimately bounded.
APPENDIX B - STABILITY OF NEW METHOD
The method of alternate weights is Lyapunov stable in that all signals are semi-globally uniformly ultimately bounded. The ability to prevent weight drift better than e-modification is not apparent in the stability proof, but rather must be established in simulation and experiment. In order to save space, the stability proof for the scalar version is presented. The stability proof starts with the (adaptive control) Lyapunov-like function

$$V = \frac{1}{2} I z^2 + \frac{1}{2\beta}\left[ \tilde{w}^T \tilde{w} + \tilde{p}^T \tilde{p} + \mathrm{tr}(\tilde{V}^T \tilde{V}) + \mathrm{tr}(\tilde{S}^T \tilde{S}) \right] \qquad (48)$$
The derivative is

$$\dot{V} = z\left[ \tilde{w}^T(\hat{\sigma} - \hat{\sigma}'\hat{V}q) + \hat{w}^T \hat{\sigma}' \tilde{V} q + \hat{w}^T \hat{\sigma} + \epsilon - \hat{c} - Gz - r \right] - \frac{1}{\beta}\left[ \tilde{w}^T \dot{\hat{w}} + \mathrm{tr}(\tilde{V}^T \dot{\hat{V}}) + \tilde{p}^T \dot{\hat{p}} + \mathrm{tr}(\tilde{S}^T \dot{\hat{S}}) \right]$$

$$\dot{V} = z(\epsilon - Gz - r) + \tilde{w}^T\left[ z(\hat{\sigma} - \hat{\sigma}'\hat{V}q) - \frac{\dot{\hat{w}}}{\beta} \right] + \sum_{j=1}^{p} \tilde{v}_j^T\left[ (\hat{w}^T \hat{\sigma}')^T z q_j - \frac{\dot{\hat{v}}_j}{\beta} \right] - \frac{1}{\beta}\tilde{p}^T \dot{\hat{p}} - \frac{1}{\beta}\mathrm{tr}(\tilde{S}^T \dot{\hat{S}}) \qquad (49)$$
Substitution of the weight updates (16), (17), (14), and (15) gives

$$\dot{V} = -Gz^2 + zr + z\epsilon + |z|\Big( -\tilde{w}^T\left[ a(\hat{p}^T\sigma - \hat{w}^T\sigma)\sigma + C(\hat{p} - \hat{w}) \right] - \tilde{p}^T\left[ a(\hat{w}^T\sigma - \hat{p}^T\sigma)\sigma - C\hat{p} \right] - \sum_{j=1}^{p} \tilde{v}_j^T\left[ b_j q_j(\hat{S}q - \hat{V}q) + D_j(\hat{s}_j - \hat{v}_j) \right] - \sum_{j=1}^{p} \tilde{s}_j^T\left[ b_j q_j(\hat{V}q - \hat{S}q) - D_j \hat{s}_j \right] \Big)$$
Next establish the negative semi-definiteness of the terms

$$-a\left[ \tilde{w}^T(\hat{p}^T\sigma - \hat{w}^T\sigma)\sigma + \tilde{p}^T(\hat{w}^T\sigma - \hat{p}^T\sigma)\sigma \right] = -a(\hat{w}^T\sigma - \hat{p}^T\sigma)^T(\hat{w}^T\sigma - \hat{p}^T\sigma) \le 0$$

and again for the terms

$$-\sum_{j=1}^{p} b_j \left[ \tilde{v}_j^T q_j(\hat{S}q - \hat{V}q) + \tilde{s}_j^T q_j(\hat{V}q - \hat{S}q) \right] = -b_j \sum_{j=1}^{p} (\tilde{v}_j^T q_j - \tilde{s}_j^T q_j)(\hat{S}q - \hat{V}q) = -b_j(\hat{V}q - \hat{S}q)^T(\hat{V}q - \hat{S}q) \le 0 \qquad (50)$$
Now, using $r = -\zeta z|z|$, bound the derivative:

$$\dot{V} \le |z|\Big( -G|z| - \zeta z^2 + \epsilon - C\left[ \tilde{w}^T(\hat{p} - \hat{w}) + \tilde{p}^T \hat{p} \right] - \sum_{j=1}^{p} D_j\left[ \tilde{v}_j^T(\hat{s}_j - \hat{v}_j) - \tilde{s}_j^T \hat{s}_j \right] \Big) \qquad (51)$$
Establish bounds for the terms:

$$-C\left[ \tilde{w}^T(\hat{p} - \hat{w}) - \tilde{p}^T \hat{p} \right] = -C\left[ \tilde{w}^T(\hat{w} - \hat{p}) - \tilde{p}^T(\hat{w} - \hat{p}) \right] = C\left[ -\tilde{w}^T\tilde{w} + \tilde{w}^T\tilde{p} + \tilde{p}^T\tilde{w} - \tilde{p}^T\tilde{p} \right] \le -\frac{\rho}{2}\left\| [\tilde{w}^T\ \tilde{p}^T]^T \right\|^2 + \rho \|w\| \left\| [\tilde{w}^T\ \tilde{p}^T]^T \right\| \qquad (52)$$
and again establish bounds:

$$-\sum_{j=1}^{p} D_j\left[ \tilde{v}_j^T(\hat{s}_j - \hat{v}_j) + \tilde{s}_j^T \hat{s}_j \right] \le \sum_{j=1}^{p}\left( -\frac{\rho}{2}\left\| [\tilde{v}_j^T\ \tilde{s}_j^T]^T \right\|^2 + \rho \|v_j\| \left\| [\tilde{v}_j^T\ \tilde{s}_j^T]^T \right\| \right) \le -\frac{\rho}{2}\left\| [\tilde{V}\ \tilde{S}] \right\|^2 + \rho \|V\| \left\| [\tilde{V}\ \tilde{S}] \right\| \qquad (53)$$
Defining $\tilde{W}_a = \mathrm{diag}\left( [\tilde{w}^T\ \tilde{p}^T]^T, [\tilde{V}\ \tilde{S}]^T \right)$ results in

$$\dot{V} \le |z|\left( -G|z| - \zeta z^2 + A_1 + A_2\|\tilde{W}\| + A_3|z|\|\tilde{W}\| - \frac{\rho}{2}\|\tilde{W}_a\|^2 + \rho \|W\| \|\tilde{W}_a\| \right) \qquad (54)$$

which has the same basic form as (47), so that semi-global uniform ultimate boundedness of the signals can be established in the same way.
REFERENCES

[1] F. Lewis, S. Jagannathan, and A. Yesildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems. Philadelphia, PA: Taylor and Francis, 1999.
[2] L. Hsu and R. Costa, "Bursting phenomena in continuous-time adaptive systems with a σ-modification," IEEE Trans. Automat. Contr., vol. 32, no. 1, pp. 84-86, 1987.
[3] M. French, C. Szepesvári, and E. Rogers, Performance of Nonlinear Approximate Adaptive Controllers. West Sussex, England: Wiley, 2003.
[4] J. Spooner, M. Maggiore, R. Ordonez, and K. Passino, Stable Adaptive Control and Estimation for Nonlinear Systems: Neural and Fuzzy Approximator Techniques. Wiley-Interscience, 2001.
[5] C. Macnab, "A new robust weight update for multilayer-perceptron adaptive control," Control and Intelligent Systems, vol. 35, no. 3, pp. 279-288, 2007.
[6] C. Macnab, "Local basis functions in adaptive control of elastic systems," in Proc. IEEE Int. Conf. Mechatronics Automation, Niagara Falls, Canada, 2005, pp. 19-25.