Stable Neural Control of a Flexible-Joint Manipulator Subjected to Sinusoidal Disturbance

C.J.B. Macnab
Dept. of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada
Email: [email protected]
Abstract—The proposed method aims at halting weight drift when using multilayer perceptron backpropagation networks in direct adaptive control schemes, without sacrificing performance or requiring unrealistically large control gains. Unchecked weight drift can lead to a chattering control signal and cause bursting. Previously proposed robust weight update methods, including e-modification and dead-zone, sacrifice significant performance if large control gains cannot be applied. In this work, a set of alternate weights guides the training in order to prevent drift. Experiments with a two-link flexible-joint robot demonstrate the improvement in performance compared to e-modification and dead-zone.
1. INTRODUCTION
Neural-adaptive control systems utilize neural networks that
adapt online in an unsupervised learning strategy, eliminating
the need for pretraining. Using a Lyapunov-stable direct-
adaptive control framework to derive neural-network weight
update laws can produce stable neural-network robot controls
[1]. However, if a persistent disturbance prevents the error from going to zero, the weights tend to drift upwards in
magnitude. The weight drift effect is well known in static
neural network learning, where it results in overtraining.
Weight drift will eventually cause a chattering control, which
may excite the dynamics and cause a sudden growth in
error (bursting) [2]. Several standard adaptive Lyapunov-
stable control designs guarantee bounded signals, which have
also been applied to neural adaptive control, including dead-zone [3], leakage [4], and e-modification [1]. However, these
methods require very large feedback gains to guarantee small
errors. To make the system robust to a significant persistent
disturbance while using realistic gains, one must sacrifice
significant performance (e.g. increasing the size of the dead-zone or increasing the e-modification gain).
This paper further develops and experimentally verifies
a novel technique for halting weight drift in a multilayer
perceptron backpropagation network [5]. This method does
not require a significant sacrifice of performance to achieve ro-
bustness to a sinusoidal disturbance near the natural frequency
of the arm. Experiments with a commercially available two-
link flexible-joint arm show the improvement in performance
over other methods.
Fig. 1. A linear-output, two-layer MLP
2. BACKGROUND
A two-layer linear-output multilayer perceptron (Figure 1), with $m$ hidden units and $p$ inputs, provides an output

$$o = w^T \sigma(H^T q) = w^T \sigma(Vq) \qquad (1)$$

where $w \in \mathbb{R}^m$ contains output weights, $H^T \in \mathbb{R}^{m \times p}$ contains hidden weights, $q \in \mathbb{R}^p$ contains inputs, and $\sigma(H^T q) \in \mathbb{R}^m$ contains the outputs from the hidden layer. The matrix $H^T = V$ contains row vectors and column vectors as follows:

$$H^T = \begin{bmatrix} h_{11} & \dots & h_{1p} \\ \vdots & \ddots & \vdots \\ h_{m1} & \dots & h_{mp} \end{bmatrix} = \begin{bmatrix} h_1^T \\ \vdots \\ h_m^T \end{bmatrix} = \begin{bmatrix} v_1 & \dots & v_p \end{bmatrix} = V \qquad (2)$$
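As a concrete illustration of (1), the forward pass of the linear-output MLP can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation; the sigmoid is one typical choice of hidden unit (the paper only says the hidden units are "typical sigmoidal functions"), and the numeric values are arbitrary.

```python
import numpy as np

def mlp_output(w, H, q):
    """Linear-output two-layer MLP, eq. (1): o = w^T sigma(H^T q).

    w : (m,)   output weights
    H : (p, m) hidden weights, so H^T = V has shape (m, p)
    q : (p,)   input vector
    """
    a = H.T @ q                          # hidden activations H^T q, shape (m,)
    sigma = 1.0 / (1.0 + np.exp(-a))     # sigmoidal hidden units (assumed form)
    return w @ sigma                     # scalar network output w^T sigma

# small numeric check: m = 3 hidden units, p = 2 inputs (illustrative values)
w = np.array([0.5, -0.2, 0.1])
H = np.ones((2, 3)) * 0.3
q = np.array([1.0, 2.0])
o = mlp_output(w, H, q)
```

Here every row of $H^T$ equals $[0.3, 0.3]$, so each hidden activation is $0.9$ and the output reduces to $(0.5 - 0.2 + 0.1)\,\sigma(0.9)$.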
Direct neural-adaptive control works for systems of the form

$$\dot{x}_1 = x_2 \qquad (3)$$

$$M \dot{x}_2 = g(x_1, x_2) + u \qquad (4)$$
Proceedings of the 4th International Conference on Autonomous Robots and Agents, Feb 10-12, 2009, Wellington, New Zealand
978-1-4244-2713-0/09/$25.00 ©2009 IEEE 698
Fig. 2. Alternate weights for the output layer
where g(x) contains linear and nonlinear functions and
$M$ is a positive constant. Given a desired trajectory $x_{1,d}(t)$, $x_{2,d}(t)$, $\dot{x}_{2,d}(t)$ and defining the augmented state error as $z = \lambda(x_1 - x_{1,d}) + (x_2 - x_{2,d})$, with constant $\lambda > 0$, results in the error dynamics

$$M \dot{z} = f(q, t) + u \qquad (5)$$

where $q = \begin{bmatrix} x_1 & x_2 & x_{1,d} & x_{2,d} & \dot{x}_{2,d} \end{bmatrix}^T$. The unknown
weights that would ideally model $f(q,t)$ are denoted $w$ and $H$:

$$f(q, t) = o(w, H, q) + d(q, t) \qquad (6)$$

where $d(q, t)$ is a uniformly bounded modeling error. The weight errors are $\tilde{w} = w - \hat{w}$ and $\tilde{H} = H - \hat{H}$. Consider
the Lyapunov-like function and its derivative

$$V = \frac{1}{2} M z^2 + \frac{1}{2\beta}\left[ \tilde{w}^T \tilde{w} + \mathrm{tr}(\tilde{V}^T \tilde{V}) \right] \qquad (7)$$

$$\dot{V} = z\left(f(q, t) + u\right) - \frac{1}{\beta}\left[ \tilde{w}^T \dot{\hat{w}} + \mathrm{tr}(\tilde{V}^T \dot{\hat{V}}) \right] \qquad (8)$$
where tr() denotes the trace of a matrix. The neural network
contributes a portion of the control signal, along with state
feedback and a nonlinear robust term, in
$$u = -\hat{o} - G z - \zeta z |z| \qquad (9)$$

where $G$ is a positive feedback gain and $\zeta$ is a positive constant. Using e-modification to achieve robustness to the modeling error (and any bounded external disturbance) results in output weight updates as in [1]:

$$\dot{\hat{w}} = \beta \left[ z(\hat{\sigma} - \hat{\sigma}' \hat{V} q) - \nu |z| \hat{w} \right], \qquad (10)$$
where $\beta > 0$ is the adaptation gain and $\nu > 0$ is the e-modification gain. Two equivalent expressions for the hidden weight updates are

$$\dot{\hat{h}}_k = \beta \left[ z q (\hat{w}^T \hat{\sigma}')_k - \nu |z| \hat{h}_k \right], \quad k = 1 \dots m, \qquad (11)$$

$$\dot{\hat{v}}_j = \beta \left[ z q_j (\hat{w}^T \hat{\sigma}')^T - \nu |z| \hat{v}_j \right], \quad j = 1 \dots p, \qquad (12)$$

where each $q_j$ is an element of vector $q$, $(\hat{w}^T \hat{\sigma}')_k$ is an element of $(\hat{w}^T \hat{\sigma}')$, and $\hat{\sigma}'$ contains the derivatives of $\sigma$ with
respect to $H^T q$. The control (9) and weight updates (10)-(12) result in a guarantee of semi-globally uniformly ultimately bounded (SGUUB) signals, established in Appendix A.

Fig. 3. Alternate weights for the hidden layer (the jth neuron illustrated)
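A discrete-time sketch of the e-modification updates (10) and (11) may make the mechanics concrete. This is a hedged illustration, not the paper's code: it assumes sigmoid hidden units, a scalar error $z$, simple Euler integration, and illustrative gain values; the function name and time step are invented for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def emod_step(w_hat, V_hat, q, z, beta=0.5, nu=0.2, dt=1e-3):
    """One Euler step of the e-modification updates (10) and (11).

    w_hat : (m,)   output-weight estimates
    V_hat : (m,p)  hidden-weight estimates (V = H^T)
    q     : (p,)   network input;  z : scalar augmented error
    """
    a = V_hat @ q
    s = sigmoid(a)
    sp = s * (1.0 - s)                          # sigmoid derivative sigma'
    # (10): w_dot = beta [ z (sigma - sigma' V q) - nu |z| w_hat ]
    w_dot = beta * (z * (s - sp * a) - nu * abs(z) * w_hat)
    # (11), row-wise: h_dot_k = beta [ z q (w^T sigma')_k - nu |z| h_k ]
    V_dot = beta * (z * np.outer(w_hat * sp, q) - nu * abs(z) * V_hat)
    return w_hat + dt * w_dot, V_hat + dt * V_dot
```

Note that both update terms scale with $z$, so inside a dead-zone ($z = 0$) the weights freeze, and the $-\nu|z|$ leakage pulls the estimates toward zero whenever error persists.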
Note that another popular method of robustifying the weight update is dead-zone, where the weight updates are not applied when $|z| < \delta$ with $\delta > d_{max}/G_{min}$, where $d_{max}$ is a bound on the disturbances and $G_{min}$ is the minimum eigenvalue of the gain.
3. PROPOSED METHOD
In the proposed method, alternate weights $\hat{p}$ help supervise the training of the output weights. The hidden-layer alternate weights are

$$R^T = \begin{bmatrix} r_{11} & \dots & r_{1p} \\ \vdots & \ddots & \vdots \\ r_{m1} & \dots & r_{mp} \end{bmatrix} = \begin{bmatrix} r_1^T \\ \vdots \\ r_m^T \end{bmatrix} = \begin{bmatrix} s_1 & \dots & s_p \end{bmatrix} = S \qquad (13)$$
The idea is that the alternate weights try to approximate the
outputs of the control weights w and H, on a per-layer basis
(Figures 2 and 3). The design of the training rule ensures the
alternate weights do not undergo the same weight drift as the
control weights.
A. Alternate Weights - Supervised learning
The output alternate weights undergo training

$$\dot{\hat{p}} = |z| \beta \left[ a(\hat{w}^T \sigma - \hat{p}^T \sigma)\sigma - C \hat{p} \right], \qquad (14)$$

where $a$ is a positive learning gain and $C$ is a positive-definite (p.d.) diagonal leakage gain. In words, the alternate output $\hat{p}^T \sigma$ trains to approximate the control output $\hat{w}^T \sigma$. The leakage term $-C\hat{p}$ prevents the weights from drifting to infinity. A proper design of $a$ and $C$ sacrifices a little approximation accuracy to keep the weights relatively small in magnitude
(preventing drift). For the alternate hidden weights,

$$\dot{\hat{s}}_k = |z| \beta \left[ b_k q_k (\hat{V} q - \hat{S} q) - D_k \hat{s}_k \right], \quad k = 1 \dots p \qquad (15)$$
where $\hat{s}_k \in \mathbb{R}^m$ are the column vectors of $\hat{S}$, $b_k$ is positive, and $D_k$ is p.d. diagonal. In words, the alternate weights on the hidden layer produce outputs $\hat{S} q$ which approximate $\hat{V} q$, and the term $-D_k \hat{s}_k$ provides leakage.

Although the learning rules (14) and (15) could produce a set of alternate weights in an off-line training stage, the next section proposes a method to accomplish this training on-line. The terms $a$, $b_k$, $C$, and $D_k$ then all become appropriately designed variables.
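The per-layer supervision of (14) and (15) can be sketched numerically. This is a simplified illustration under stated assumptions: sigmoid hidden units, a single Euler step, and scalar stand-ins (`a_gain`, `c_gain`) for the gain $a$ and the diagonals of $C$ (and likewise $b_k$, $D_k$), which the paper later makes state-dependent; the function name is invented for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def alternate_weight_step(p_hat, S_hat, w_hat, V_hat, q, z,
                          beta=0.5, a_gain=0.1, c_gain=0.01, dt=1e-3):
    """One Euler step of the alternate-weight rules (14) and (15).

    p_hat tracks the control output w_hat^T sigma (output layer);
    S_hat q tracks V_hat q (hidden layer), column by column.
    """
    s = sigmoid(V_hat @ q)
    # (14): p_dot = |z| beta [ a (w^T s - p^T s) s - C p ]
    p_dot = abs(z) * beta * (a_gain * (w_hat @ s - p_hat @ s) * s
                             - c_gain * p_hat)
    # (15): column k of S gets |z| beta [ b_k q_k (V q - S q) - D_k s_k ];
    # np.outer(resid, q) stacks resid * q_k into column k all at once
    resid = V_hat @ q - S_hat @ q
    S_dot = abs(z) * beta * (a_gain * np.outer(resid, q) - c_gain * S_hat)
    return p_hat + dt * p_dot, S_hat + dt * S_dot
```

As with the control weights, both rules are gated by $|z|$, so the alternate weights only train while tracking error persists.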
B. Online Training
Utilizing the alternate weights, the adaptation law (10) changes to

$$\dot{\hat{w}} = \beta \left[ z(\hat{\sigma} - \hat{\sigma}' \hat{V} q) + a|z|(\hat{p}^T \sigma - \hat{w}^T \sigma)\sigma + |z| C (\hat{p} - \hat{w}) \right] \qquad (16)$$
The update (16) prevents bursting as long as:
1) $\|\hat{p}\| < \|\hat{w}\|$;
2) each positive (diagonal) term in $C$ is large enough.
The adaptation law for the hidden weights is analogous:

$$\dot{\hat{v}}_j = \beta \left[ (\hat{w}^T \hat{\sigma}')^T z q_j + b_j q_j |z| (\hat{S} q - \hat{V} q) + |z| D_j (\hat{s}_j - \hat{v}_j) \right]$$

for $j = 1 \dots p$, and the requirements to prevent bursting are analogous.
To meet the first requirement, the alternate weights are
initialized to be smaller in magnitude and are kept smaller
by choosing
$$a(\hat{w}, \hat{p}) = \eta \left( \frac{1}{m} \sum_{k=1}^{m} |\hat{w}_k - \hat{p}_k| \right) \qquad (17)$$

$$b_j(\hat{V}_j, \hat{S}_j) = \eta \left( \frac{1}{m} \sum_{k=1}^{m} |\hat{h}_{kj} - \hat{r}_{kj}| \right) \qquad (18)$$
where η is a positive constant. In words, the adaptation rate
is proportional to the average difference between the alternate
and control weight magnitudes.
To meet the second requirement, measurements of the
weight drift indicate how large to make C and each Dj , called
weight drift indicators:
$$y_w = \frac{\begin{bmatrix} |\hat{w}_1 - \hat{p}_1| & \dots & |\hat{w}_m - \hat{p}_m| \end{bmatrix}^T}{\frac{1}{m}\sum_{k=1}^{m} |\hat{w}_k - \hat{p}_k|} \qquad (19)$$

$$y_{V_j} = \frac{\begin{bmatrix} |\hat{H}_{1j} - \hat{S}_{1j}| & \dots & |\hat{H}_{mj} - \hat{S}_{mj}| \end{bmatrix}^T}{\frac{1}{m}\sum_{k=1}^{m} |\hat{H}_{kj} - \hat{S}_{kj}|} \qquad (20)$$
That is, the relative magnitude of the difference between the control and alternate weights measures the drift of a particular control weight. Design $C$ and each $D_j$ to utilize the weight drift indicators:

$$C(\hat{w}, \hat{p}) = \mathrm{diag}\left( \zeta \exp(\mu\, y_w) + \rho \right) \qquad (21)$$

$$D_j(\hat{V}_j, \hat{S}_j) = \mathrm{diag}\left( \zeta \exp(\mu\, y_{V_j}) + \rho \right) \qquad (22)$$
where $\zeta$ and $\mu$ are positive constants and $\rho$ is any non-negative constant. Measuring the values of the weight drift indicators (19) and (20) at the point where a non-robust experiment goes unstable allows the quantitative design of appropriate exponential curves (21) and (22), ensuring that $C$ and $D_j$ are very small when the weight drift indicators are not near their critical values, but become large otherwise.

Fig. 4. Experimental two-link flexible-joint robot arm
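A numerical sketch of the gain design (17), (19), and (21) for the output layer follows. It is illustrative only: the weight values are made up, and the gain values $\zeta = 0.0015$, $\mu = 3$ are the ones the experiments section reports; the function names are invented for the example.

```python
import numpy as np

def adaptation_gain(w_hat, p_hat, eta=0.1):
    """Learning gain a from (17): eta times the mean |w_k - p_k| gap."""
    return eta * np.mean(np.abs(w_hat - p_hat))

def drift_indicator(w_hat, p_hat):
    """y_w from (19): per-weight gap normalised by the mean gap."""
    gaps = np.abs(w_hat - p_hat)
    return gaps / np.mean(gaps)

def leakage_gain(w_hat, p_hat, zeta=0.0015, mu=3.0, rho=0.0):
    """Diagonal of C from (21): exponential in the drift indicator, so
    leakage stays tiny until a weight drifts well past its neighbours."""
    return zeta * np.exp(mu * drift_indicator(w_hat, p_hat)) + rho

# illustrative 5-weight example: weight 0 has drifted twice the average gap
w_hat = np.array([1.0, 0.5, -0.4, 0.2, 0.1])
p_hat = np.array([0.8, 0.4, -0.3, 0.1, 0.1])
yw = drift_indicator(w_hat, p_hat)
```

The exponential shape means the weight whose gap is largest relative to the average receives by far the strongest leakage, which is the intended selective braking of drift.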
C. Dead-zone
The dead-zone is a region near zero error, $|z| < \delta$, where the control weights freeze. Unlike the traditional dead-zone, which requires knowledge of the maximum disturbance bound, this dead-zone is simply a very small region near the origin into which the method can typically bring the system.
4. EXPERIMENTAL APPARATUS
Trajectory tracking of a two-link flexible-joint robot experiment (Fig. 4) serves to validate the approach. During the experiments, a 1 kg payload sits at the end of the second link. Both natural frequencies of the robot are approximately 1 Hz with the payload.
In the experiment the tip of the second link traces a 4 cm square trajectory in 16 seconds, while being subjected to the disturbance

$$d(t) = 0.32 \sin(2\pi t) \qquad (23)$$

The disturbance has just enough amplitude to make the oscillations of the second flexible joint visible to the naked eye, and its frequency is very near the natural frequencies.
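To give a feel for the closed loop, the following is a minimal scalar simulation of (5) under the sinusoidal disturbance (23), using the control (9) and the e-modification updates (10)-(11). It is emphatically not the two-link experiment: the plant is reduced to a scalar with assumed $M = 1$, the gains mirror the reported $\beta = 0.5$ and $G = 2$ but $\zeta$, $\nu$, the network input, and the time step are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Scalar stand-in for (5): M z_dot = d(t) + u, with the 5-hidden-unit MLP
# and e-modification updates (10), (11); all specific values are assumptions.
M, G, zeta, beta, nu, dt = 1.0, 2.0, 0.1, 0.5, 0.2, 1e-3
m = 5
w_hat = np.zeros(m)                     # output weights start at zero
V_hat = 0.1 * np.ones((m, 3))           # small nonzero hidden weights
z, zs = 0.0, []
for i in range(10000):                  # 10 s of simulated time
    t = i * dt
    q = np.array([np.sin(2*np.pi*t), np.cos(2*np.pi*t), 1.0])  # assumed input
    a = V_hat @ q
    s = sigmoid(a)
    sp = s * (1.0 - s)
    o_hat = w_hat @ s                   # network output, eq. (1)
    u = -o_hat - G*z - zeta*z*abs(z)    # control (9)
    d = 0.32 * np.sin(2*np.pi*t)        # disturbance (23)
    z += dt * (d + u) / M               # Euler step of (5)
    w_hat += dt * beta * (z*(s - sp*a) - nu*abs(z)*w_hat)          # (10)
    V_hat += dt * beta * (z*np.outer(w_hat*sp, q) - nu*abs(z)*V_hat)  # (11)
    zs.append(z)
zs = np.asarray(zs)
```

Over such a short horizon the error stays bounded; the weight drift the paper studies only becomes destructive over many 16-second repetitive trials, as Figure 5 shows.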
Making the system equivalent to (5), a backstepping procedure produces appropriate virtual controls and controls as in [6], resulting in

$$I(q)\dot{z} = -Gz + F(q) - \hat{W}^T \sigma(\hat{V}^T q), \qquad (24)$$

where $q$ contains the robot states, $z$ contains the output errors and virtual control errors, $F$ contains linear and nonlinear terms, and $G$ is a positive-definite, constant, symmetric matrix.
One single-output MLP is used for each (virtual) control
signal, each with five hidden units. This ensures the training
can take place quickly, within 100 repetitive trials (although it
may not be able to learn additional trajectories). The hidden units contain typical sigmoidal functions.

Fig. 5. Training of MLP without any robust modification (RMS error in degrees and maximum weight magnitude vs. trials)
A. Results
All experiments use a common adaptation gain $\beta = 0.5$ and control gains $G_i = \mathrm{diag}(2, 2)$ for $i = 1, 2, 3$. When the weight updates do not have any robust modification, the RMS error briefly converges to 0.2 degrees before the weight drift causes bursting (and apparent instability) on the 50th repetitive trial (Figure 5). When using the e-modification weight updates (10), (11), three different values of $\nu$ fail to provide satisfactory performance (Figure 6). Only a value of $\nu = 1.6$ (or greater) is able to stop the weight drift, but the resulting error is five times worse than the optimum performance.
Other experiments allowed identification of the critical value of the weight drift indicators (when bursting occurs) as $y_w = y_{V_j} = 3$, leading to a choice of $\zeta = 0.0015$ and $\mu = 3$ according to the design method in Section 3-B. Using parameter values of $\rho = 0$ and $\eta = 0.1$ produced satisfactory alternate outputs. The only parameter that needs to be identified through further experiment is the size of the dead-zone $\delta$. Values of the dead-zone greater than 1 degree were all sufficient to halt the weight drift. The dead-zone of 1 degree results in the best performance, very near 0.4 degrees RMS error (Fig. 7). Note that a traditional dead-zone design requires $\delta > d_{max}/G_{min} = 0.32/2 = 0.16$ rad, or about 9 degrees. Thus, the new method performs about nine times better than traditional dead-zone, and from the experiments we see the proposed method performs four times better than e-modification.
Fig. 6. Training of MLP using e-modification ($\nu$ = 0.2, 0.9, 1.6; RMS error in degrees and maximum weight magnitude vs. trials)

Fig. 7. Training of MLP using the proposed method, varying the learning dead-zone (1, 2, and 5 degrees; RMS error in degrees and maximum weight magnitude vs. trials)

5. CONCLUSIONS

A method that uses an alternate set of weights to guide the training of a multilayer perceptron neural network can achieve stable control of a flexible-joint robot in the presence of a significant sinusoidal disturbance. In this situation, a
weight update with no robust modification first converges to
an optimum level of performance before going unstable (due
to unchecked weight drift). The traditional robust methods of
e-modification and dead-zone fail to produce a practical result
in this situation, sacrificing so much performance that no
significant adaptation occurs. The proposed method, however,
can still adapt and reduce the RMS error over a number of repetitive trials, coming within 0.2 degrees of the optimum
performance while completely stopping the weight drift.
APPENDIX A - STABLE BACKPROPAGATION
The solution to stable backpropagation using e-modification
that follows was introduced in [1]. A neural network can
uniformly approximate a nonlinear function f (q) in a local
region D ⊂ R p if there exists a set of weights w and V such
that

$$f(q) = w^T \sigma(Vq) + d(t, q) \qquad (25)$$

with $|d(t, q)| < d_{max}\ \forall\, q \in D$. Define weight errors $\tilde{w}^T = w^T - \hat{w}^T$ and hidden weight errors $\tilde{V} = V - \hat{V}$, and equation (25) becomes
$$f(q) = (\tilde{w}^T + \hat{w}^T)\left[\tilde{\sigma} + \sigma(\hat{V}q)\right] + d \qquad (26)$$

where $\tilde{\sigma} = \sigma(Vq) - \sigma(\hat{V}q)$. Define $\hat{\sigma} = \sigma(\hat{V}q)$. Then as in [1] use a Taylor series expansion of $\tilde{\sigma} = \sigma - \hat{\sigma}$ about $\hat{\sigma}$:

$$\tilde{\sigma} = \sigma(\hat{V}q) + \hat{\sigma}'(Vq - \hat{V}q) + O(\cdot)^2 - \hat{\sigma} = \hat{\sigma}' \tilde{V} q + O(\cdot)^2 \qquad (27)$$
where $O(\cdot)^2$ represents higher-order terms and

$$\hat{\sigma}' = \left. \frac{\partial \sigma}{\partial(Vq)} \right|_{V = \hat{V}}. \qquad (28)$$
Assume $\|\sigma\| \le \sigma_{max}$ and $\|\hat{\sigma}'\| \le \gamma$ with $\sigma_{max}$ and $\gamma$ positive constants. Bound the norm of the higher-order terms, assuming $\|x_d\| \le \kappa$ with $\kappa$ a positive constant, as follows:

$$\|O(\cdot)^2\| \le \|\tilde{\sigma}\| + \|\hat{\sigma}' \tilde{V} q\| \le 2\sigma_{max} + \gamma \|\tilde{V} q\| \le 2\sigma_{max} + \gamma\kappa\|\tilde{V}\| + \gamma \|\tilde{V}\| \|z\| \qquad (29)$$
where the matrix norm for $\tilde{V}$ is the Frobenius norm. Rewrite the nonlinear function approximation (26) as

$$f(q) = \tilde{w}^T(\hat{\sigma} - \hat{\sigma}' \hat{V} q) + \hat{w}^T(\hat{\sigma}' \tilde{V} q + \hat{\sigma}) + \epsilon \qquad (30)$$

where

$$\epsilon = \tilde{w}^T \hat{\sigma}' V q + w^T O(\cdot)^2 + d, \qquad (31)$$

$$\|\epsilon\| \le \gamma \|\tilde{w}\| \|Vq\| + \|w\| \|O(\cdot)^2\| + d_{max} \le \gamma\kappa\|\tilde{w}\| \|V\| + \gamma \|\tilde{w}\| \|V\| \|z\| + \|w\| \|O(\cdot)^2\| + d_{max} \qquad (32)$$

Combined with (29), the result is

$$\|\epsilon\| \le A_1 + A_2 \|\tilde{W}\| + A_3 \|z\| \|\tilde{W}\|, \qquad (33)$$
where $A_1$, $A_2$, and $A_3$ are positive constants and $\tilde{W} = \mathrm{diag}(\tilde{w}, \tilde{V})$. Equation (24) requires $n$ neural networks with corresponding weights given by $w_i$ and $V_i$ for $i = 1 \dots n$. Consider the (adaptive control) Lyapunov-like function
$$V = \frac{1}{2} z^T I z + \frac{1}{2\beta} \sum_{i=1}^{n} \left[ \tilde{w}_i^T \tilde{w}_i + \mathrm{tr}(\tilde{V}_i^T \tilde{V}_i) \right] \qquad (34)$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Then

$$\dot{V} = \frac{d}{dt}\left( \frac{1}{2} z^T I z \right) - \frac{1}{\beta} \sum_{i=1}^{n} \left[ \tilde{w}_i^T \dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T \dot{\hat{V}}_i) \right] \qquad (35)$$
Evaluate the first term:

$$\frac{d}{dt}\left( \frac{1}{2} z^T I z \right) = z^T I \dot{z} + \frac{1}{2} z^T \dot{I} z \qquad (36)$$

$$= z^T \left( -Z z - G z + F - \hat{c} + \frac{1}{2} \dot{I} z + \rho \right) \qquad (37)$$
and assume the $i$th neural network, for $i = 1 \dots n$, can model the nonlinearities:

$$F_i + \frac{1}{2}\left(\dot{I} z\right)_i = c_i + d_i = w_i^T \sigma(V_i q) + d_i \qquad (38)$$
Using the fact that $z^T Z z = 0$, write $\dot{V} = \sum_{i=1}^{n} \dot{V}_i$ and evaluate $\dot{V}_i$ by combining (35), (37), and (38) and expanding the vector $z = [z_1, z_2, \dots, z_n]^T$:

$$\dot{V}_i = z_i \left[ w_i^T \sigma(V_i q) + d_i - \hat{c}_i - G_i z_i - \rho_i \right] - \frac{1}{\beta}\left[ \tilde{w}_i^T \dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T \dot{\hat{V}}_i) \right] \qquad (39)$$
and using the result from (30):

$$\dot{V}_i = z_i \left[ \tilde{w}_i^T(\hat{\sigma}_i - \hat{\sigma}_i' \hat{V}_i q) + \hat{w}_i^T \hat{\sigma}_i' \tilde{V}_i q + \hat{w}_i^T \hat{\sigma}_i + \epsilon_i - \hat{c}_i - G_i z_i - \rho_i \right] - \frac{1}{\beta}\left[ \tilde{w}_i^T \dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T \dot{\hat{V}}_i) \right] \qquad (40)$$
Using the facts $\hat{c}_i = \hat{w}_i^T \hat{\sigma}_i$ and

$$\mathrm{tr}(\tilde{V}_i^T \dot{\hat{V}}_i) = \sum_{j=1}^{p} \tilde{v}_{i,j}^T \dot{\hat{v}}_{i,j} \quad \text{and} \quad \hat{w}_i^T \hat{\sigma}_i' \tilde{V}_i q = \sum_{j=1}^{p} \tilde{v}_{i,j}^T (\hat{w}_i^T \hat{\sigma}_i')^T q_j$$

results in

$$\dot{V}_i = z_i(\epsilon_i - G_i z_i - \rho_i) + \tilde{w}_i^T \left[ z_i(\hat{\sigma}_i - \hat{\sigma}_i' \hat{V}_i q) - \frac{\dot{\hat{w}}_i}{\beta} \right] + \sum_{j=1}^{p} \tilde{v}_{i,j}^T \left[ (\hat{w}_i^T \hat{\sigma}_i')^T z_i q_j - \frac{\dot{\hat{v}}_{i,j}}{\beta} \right] \qquad (41)$$

where $q_j$ is the $j$th element of $q$. The weight updates, using e-modification, are

$$\dot{\hat{w}}_i = \beta \left[ z_i(\hat{\sigma}_i - \hat{\sigma}_i' \hat{V}_i q) - \nu \|z\| \hat{w}_i \right] \qquad (42)$$

$$\dot{\hat{v}}_{i,j} = \beta \left[ (\hat{w}_i^T \hat{\sigma}_i')^T z_i q_j - \nu \|z\| \hat{v}_{i,j} \right] \qquad (43)$$
where $\nu$ is a positive constant which needs to be chosen large enough to prevent weight drift. This is a (stable) form of backpropagation. The resulting Lyapunov derivative is

$$\dot{V} = -z^T G z + z^T \rho + z^T \epsilon + \nu \|z\| \sum_{i=1}^{n} \left[ \tilde{w}_i^T \hat{w}_i + \mathrm{tr}(\tilde{V}_i^T \hat{V}_i) \right] \qquad (44)$$
Choosing a form of nonlinear damping for the robust term,

$$\rho = -\zeta z \|z\| \quad \text{with constant } \zeta > 0, \qquad (45)$$

results in a bound for the Lyapunov derivative:

$$\dot{V} \le \|z\| \left( -g\|z\| - \zeta \|z\|^2 + \sum_{i=1}^{n} \left[ \epsilon_i + \nu\, \mathrm{tr}(\tilde{W}_i^T \hat{W}_i) \right] \right) \qquad (46)$$
where $g$ is the minimum eigenvalue of $G$ and $\hat{W} = \mathrm{diag}(\hat{w}, \hat{V})$. Using $\hat{W}_i = W_i - \tilde{W}_i$ and the bound from (33) results in

$$\dot{V} \le \|z\| \left( \begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix}^T \begin{bmatrix} -\zeta & A_3/2 \\ A_3/2 & -\nu \end{bmatrix} \begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix} + \begin{bmatrix} -g \\ A_2 + \nu \|W\| \end{bmatrix}^T \begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix} + A_1 \right) \qquad (47)$$

where each $A_k = [A_{k,1} \dots A_{k,n}]$ and $\tilde{W} = \mathrm{diag}(\tilde{W}_1 \dots \tilde{W}_n)$.

Setting (47) equal to zero describes the boundary of a compact set $B$ in the $(\|z\|, \|\tilde{W}\|)$ plane. Outside of this compact set, $\dot{V} < 0$ if the matrix in the elliptic term is negative definite, which means the parameters must be chosen such that $\zeta\nu > A_3^2/4$.
Note that knowledge of the maximum bound on $W$ (the ideal weights) is required to calculate $A_3$. By standard Lyapunov arguments, the smallest Lyapunov surface enclosing $B$ is then a bound on the signals. By Barbalat's Lemma, the surface $B$ is an ultimate bound (as $t \to \infty$) if all signals are continuous. The system is described as semi-globally uniformly ultimately bounded.
APPENDIX B - STABILITY OF NEW METHOD
The method of alternate weights is Lyapunov stable in that all signals are semi-globally uniformly ultimately bounded. The ability to prevent weight drift better than e-modification is not apparent in the stability proof, but rather must be established in simulation and experiment. In order to save space, the stability proof for the scalar version is presented. The stability proof starts with the (adaptive control) Lyapunov-like function

$$V = \frac{1}{2} I z^2 + \frac{1}{2\beta}\left[ \tilde{w}^T \tilde{w} + \tilde{p}^T \tilde{p} + \mathrm{tr}(\tilde{V}^T \tilde{V}) + \mathrm{tr}(\tilde{S}^T \tilde{S}) \right] \qquad (48)$$
The derivative is

$$\dot{V} = z\left[ \tilde{w}^T(\hat{\sigma} - \hat{\sigma}'\hat{V}q) + \hat{w}^T \hat{\sigma}' \tilde{V} q + \hat{w}^T \hat{\sigma} + \epsilon - \hat{c} - Gz - r \right] - \frac{1}{\beta}\left[ \tilde{w}^T \dot{\hat{w}} + \mathrm{tr}(\tilde{V}^T \dot{\hat{V}}) + \tilde{p}^T \dot{\hat{p}} + \mathrm{tr}(\tilde{S}^T \dot{\hat{S}}) \right]$$

$$\dot{V} = z(\epsilon - Gz - r) + \tilde{w}^T\left[ z(\hat{\sigma} - \hat{\sigma}'\hat{V}q) - \frac{\dot{\hat{w}}}{\beta} \right] + \sum_{j=1}^{p} \tilde{v}_j^T\left[ (\hat{w}^T \hat{\sigma}')^T z q_j - \frac{\dot{\hat{v}}_j}{\beta} \right] - \frac{1}{\beta}\tilde{p}^T \dot{\hat{p}} - \frac{1}{\beta}\mathrm{tr}(\tilde{S}^T \dot{\hat{S}}) \qquad (49)$$
Substitution of the weight updates (16), (17), (14), and (15) gives

$$\dot{V} = -Gz^2 + zr + z\epsilon + |z|\Big( -\tilde{w}^T\left[ a(\hat{p}^T\sigma - \hat{w}^T\sigma)\sigma + C(\hat{p} - \hat{w}) \right] - \tilde{p}^T\left[ a(\hat{w}^T\sigma - \hat{p}^T\sigma)\sigma - C\hat{p} \right] - \sum_{j=1}^{p} \tilde{v}_j^T\left[ b_j q_j(\hat{S}q - \hat{V}q) + D_j(\hat{s}_j - \hat{v}_j) \right] - \sum_{j=1}^{p} \tilde{s}_j^T\left[ b_j q_j(\hat{V}q - \hat{S}q) - D_j \hat{s}_j \right] \Big)$$
Next establish the negative semi-definiteness of the terms

$$-a\left[ \tilde{w}^T(\hat{p}^T\sigma - \hat{w}^T\sigma)\sigma + \tilde{p}^T(\hat{w}^T\sigma - \hat{p}^T\sigma)\sigma \right] = -a(\hat{w}^T\sigma - \hat{p}^T\sigma)^T(\hat{w}^T\sigma - \hat{p}^T\sigma) \le 0$$

and again for the terms

$$-\sum_{j=1}^{p} b_j \left[ \tilde{v}_j^T q_j(\hat{S}q - \hat{V}q) + \tilde{s}_j^T q_j(\hat{V}q - \hat{S}q) \right] = -b_j \sum_{j=1}^{p} (\tilde{v}_j^T q_j - \tilde{s}_j^T q_j)(\hat{S}q - \hat{V}q) = -b_j(\hat{V}q - \hat{S}q)^T(\hat{V}q - \hat{S}q) \le 0 \qquad (50)$$
Now, using $r = -\zeta z|z|$, bound the derivative:

$$\dot{V} \le |z|\Big( -G|z| - \zeta z^2 + \epsilon - C\left[ \tilde{w}^T(\hat{p} - \hat{w}) + \tilde{p}^T \hat{p} \right] - \sum_{j=1}^{p} D_j\left[ \tilde{v}_j^T(\hat{s}_j - \hat{v}_j) - \tilde{s}_j^T \hat{s}_j \right] \Big) \qquad (51)$$
Establish bounds for the terms:

$$-C\left[ \tilde{w}^T(\hat{p} - \hat{w}) - \tilde{p}^T \hat{p} \right] = -C\left[ \tilde{w}^T(\hat{w} - \hat{p}) - \tilde{p}^T(\hat{w} - \hat{p}) \right] = C\left[ -\tilde{w}^T\tilde{w} + \tilde{w}^T\tilde{p} + \tilde{p}^T\tilde{w} - \tilde{p}^T\tilde{p} \right] \le -\frac{\rho}{2}\left\| [\tilde{w}^T\ \tilde{p}^T]^T \right\|^2 + \rho \|w\| \left\| [\tilde{w}^T\ \tilde{p}^T]^T \right\| \qquad (52)$$
and again establish bounds:

$$-\sum_{j=1}^{p} D_j\left[ \tilde{v}_j^T(\hat{s}_j - \hat{v}_j) + \tilde{s}_j^T \hat{s}_j \right] \le \sum_{j=1}^{p}\left( -\frac{\rho}{2}\left\| [\tilde{v}_j^T\ \tilde{s}_j^T]^T \right\|^2 + \rho \|v_j\| \left\| [\tilde{v}_j^T\ \tilde{s}_j^T]^T \right\| \right) \le -\frac{\rho}{2}\left\| [\tilde{V}\ \tilde{S}] \right\|^2 + \rho \|V\| \left\| [\tilde{V}\ \tilde{S}] \right\| \qquad (53)$$
Defining $\tilde{W}_a = \mathrm{diag}\left( [\tilde{w}^T\ \tilde{p}^T]^T, [\tilde{V}\ \tilde{S}]^T \right)$ results in

$$\dot{V} \le |z|\left( -G|z| - \zeta z^2 + A_1 + A_2\|\tilde{W}\| + A_3|z|\|\tilde{W}\| - \frac{\rho}{2}\|\tilde{W}_a\|^2 + \rho \|W\| \|\tilde{W}_a\| \right) \qquad (54)$$

which has the same basic form as (47), so that semi-global uniform ultimate boundedness of the signals can be established in the same way.
REFERENCES

[1] F. Lewis, S. Jagannathan, and A. Yesildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems. Philadelphia, PA: Taylor and Francis, 1999.
[2] L. Hsu and R. Costa, "Bursting phenomena in continuous-time adaptive systems with a σ-modification," IEEE Trans. Automat. Contr., vol. 32, no. 1, pp. 84-86, 1987.
[3] M. French, C. Szepesvári, and E. Rogers, Performance of Nonlinear Approximate Adaptive Controllers. West Sussex, England: Wiley, 2003.
[4] J. Spooner, M. Maggiore, R. Ordonez, and K. Passino, Stable Adaptive Control and Estimation for Nonlinear Systems: Neural and Fuzzy Approximator Techniques. Wiley-Interscience, 2001.
[5] C. Macnab, "A new robust weight update for multilayer-perceptron adaptive control," Control and Intelligent Systems, vol. 35, no. 3, pp. 279-288, 2007.
[6] C. Macnab, "Local basis functions in adaptive control of elastic systems," in Proc. IEEE Int. Conf. Mechatronics Automation, Niagara Falls, Canada, 2005, pp. 19-25.