
Adaptive Inverse Control of Linear and Nonlinear Systems Using Dynamic Neural Networks

Gregory L. Plett, Senior Member, IEEE

Abstract—In this paper we see adaptive control as a three-part adaptive-filtering problem. First, the dynamical system we wish to control is modeled using adaptive system-identification techniques. Secondly, the dynamic response of the system is controlled using an adaptive feedforward controller. No direct feedback is used, except that the system output is monitored and used by an adaptive algorithm to adjust the parameters of the controller. Thirdly, disturbance canceling is performed using an additional adaptive filter. The canceler does not affect system dynamics, but feeds back plant disturbance in a way that minimizes output disturbance power. The techniques work to control minimum-phase or nonminimum-phase, linear or nonlinear, SISO or MIMO, stable or stabilized systems. Constraints may additionally be placed on control effort for a practical implementation. Simulation examples are presented to demonstrate that the proposed methods work very well.

Keywords—Adaptive inverse control, system identification, feedforward control, disturbance canceling, disturbance rejection.

I. INTRODUCTION

PRECISE control of dynamic systems (“plants”) can be exceptionally difficult, especially when the system in question is nonlinear. In response, many approaches have been developed to aid the control designer. For example, an early method called gain scheduling [1] linearizes the plant dynamics around a certain number of pre-selected operating conditions, and employs different linear controllers for each regime. This method is simple and often effective, but applications exist for which the approximate linear model is not accurate enough to ensure precise or even safe control. One example is the area of flight control, where stalling and spinning of the airframe can result if operated too far away from the linear region.

A more advanced technique called feedback linearization or dynamic inversion has been used in situations including flight control (e.g., see ref. [2]). Two feedback loops are used: (1) an inner loop uses an inverse of the plant dynamics to subtract out all nonlinearities, resulting in a closed-loop linear system, and (2) an outer loop uses a standard linear controller to aid precision in the face of inverse plant mismatch and disturbances.

Whenever an inverse is used, accuracy of the plant model is critical. So, in ref. [3] a method is developed that guarantees robustness even if the inverse plant model is not perfect. In ref. [4], a robust and adaptive method is used, to allow learning to occur on-line, tuning performance as the system runs. Yet, even here, a better initial analytical plant model results in better control and disturbance rejection.

Nonlinear dynamic inversion may be computationally intensive, and precise dynamic models may not be available, so ref. [5] uses two neural-network controllers to achieve feedback linearization and learning. One radial-basis-function neural network is trained off-line to invert the plant nonlinearities, and a second is trained on-line to compensate for inversion error. The use of neural networks reduces the computational complexity of the inverse-dynamics calculation, and improves precision by learning. In ref. [6], an alternate scheme is used where a feedforward inverse recurrent neural network is used with a feedback proportional-derivative controller to compensate for inversion error and to reject disturbances.

G. L. Plett is with the Department of Electrical and Computer Engineering, University of Colorado at Colorado Springs, P.O. Box 7150, Colorado Springs, CO 80933–7150 USA. Email: [email protected]; phone: +1 719 262–3468; fax: +1 719 262–3589.

All of these methods have certain limitations. For example, they require that an inverse exist, so they do not work for nonminimum-phase plants or for ones with differing numbers of inputs and outputs. They generally also require that a fairly precise model of the plant be known a priori. This paper instead considers a technique called adaptive inverse control [7]–[13]—as reformulated by Widrow and Walach [13]—which does not require a precise initial plant model. Like feedback linearization, adaptive inverse control is based on the concept of dynamic inversion, but an inverse need not exist. Rather, an adaptive element is trained to control the plant such that model-reference-based control is achieved in a least-mean-squared-error optimal sense. Control of plant dynamics and control of plant disturbance are treated separately, without compromise. Control of plant dynamics can be achieved by preceding the plant with an adaptive controller whose dynamics are a type of inverse of those of the plant. Control of plant disturbance can be achieved by an adaptive feedback process that minimizes plant output disturbance without altering plant dynamics. The adaptive controllers are implemented using nonlinear adaptive filters. Nonminimum-phase and non-square plants may be controlled with this method.

Fig. 1. Adaptive inverse control system.

Figure 1 shows a block diagram of an adaptive inverse control system. The system we wish to control is labeled the plant. It is subject to disturbances, and these are modeled additively at the plant output, without loss of generality. To control the plant, we first adapt a plant model $\hat{P}$ using adaptive system-identification techniques. Secondly, the dynamic response of the system is controlled using an adaptive controller $C$. The output of the plant model is compared to the measured plant output, and the difference is a good estimate of the disturbance. A special adaptive filter $X$ is used to cancel the disturbances.

Control of linear systems requires linear adaptive-filtering methods, and control of nonlinear systems requires nonlinear adaptive-filtering methods. However, even if the plant is linear, nonlinear methods will yield better results than linear methods if there are non-quadratic constraints on the control effort, or if the disturbance is generated from a nonlinear or chaotic source. Therefore, we focus on nonlinear adaptive inverse control (of linear or nonlinear plants) using nonlinear adaptive filters. We proceed by first reviewing nonlinear adaptive filtering and system identification. Next, we discuss adaptive inverse control of plant dynamics and adaptive disturbance canceling. We conclude with simulation examples to demonstrate the techniques.

II. DYNAMIC NEURAL NETWORKS

A. The Structure of a Dynamic Neural Network

An adaptive filter has an input $x_k$, an output $y_k$, and a “special input” $d_k$ called the desired response. The filter computes a dynamical function of its input, and the desired response specifies the output we wish the filter to have at that point in time. It is used to modify the internal parameters of the filter in such a way that the filter “learns” to perform a certain function.

A nonlinear adaptive filter computes a nonlinear dynamical function of its input. It has a tapped delay line connected to its input and, possibly, a tapped delay line connected to its output. The output of the filter is computed to be a nonlinear function of these delayed inputs and outputs. The nonlinear function may be implemented in any way, but here we use a layered feedforward neural network.

A neural network is an interconnected set of very simple processing elements called neurons. Each neuron computes an internal sum that is equal to a constant plus the weighted sum of its inputs. The neuron outputs a nonlinear “activation” function of this sum. In this work, the activation function for non-output neurons is chosen to be the $\tanh(\cdot)$ function, and all output neurons have the nonlinear function removed. This is done to give them unrestricted range.

Neural networks may be constructed from individual neurons connected in very general ways. However, it is sufficient to have layers of neurons, where the inputs to each neuron on a layer are identical, and equal to the collection of outputs from the previous layer (plus the augmented bias value “1”). The final layer of the network is called the output layer, and all other layers of neurons are called hidden layers. A layered network is a feedforward (non-recurrent) structure that computes a static nonlinear function. Dynamics are introduced via the tapped delay lines at the input to the network, resulting in a dynamic neural network.

This layered structure also makes it easy to compactly describe the topology of a nonlinear filter. The following notation is used: $\mathcal{N}_{(a,b):\alpha:\beta\ldots}$. This means: “The filter input is comprised of a tapped delay line with ‘$a$’ delayed copies of the exogenous input vector $x_k$, and ‘$b$’ delayed copies of the output vector $y_k$. Furthermore, there are ‘$\alpha$’ neurons in the neural network’s first layer of neurons, ‘$\beta$’ neurons in the second layer, and so on.” Occasionally, filters are encountered with more than one exogenous input. In that case, the ‘$a$’ parameter is a row vector describing how many delayed copies of each input are used in the input vector to the network. To avoid confusion, any strictly feedforward nonlinear filter is denoted explicitly with a zero in that part of its description. For example, $\mathcal{N}_{(2,0):3:3:1}$.

A dynamic neural network of this type is called a NARX (Nonlinear AutoRegressive eXogenous input) filter. It is general enough to approximate any nonlinear dynamical system [14]. A nonlinear single-input-single-output (SISO) filter is created if $x_k$ and $y_k$ are scalars, and the network has a single output neuron. A nonlinear multi-input-multi-output (MIMO) filter is constructed by allowing $x_k$ and $y_k$ to be (column) vectors, and by augmenting the output layer with the correct number of additional neurons. Additionally, notice that since the output layer of neurons is linear, a neural network with a single layer of neurons is a linear adaptive filter (if the bias weights of the neurons are zero). Therefore, the results of this paper—derived for the general NARX structure—can encompass the situation where linear adaptive filters are desired if a degenerate neural-network structure is used.
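To make the notation concrete, the following NumPy sketch implements the forward pass of a SISO NARX filter of roughly $\mathcal{N}_{(2,1):3:1}$ form: a tapped delay line on the input, one delayed copy of the output fed back, a tanh hidden layer, and a linear output neuron. This is a minimal illustration, not code from the paper; the weight shapes, the initialization, and the inclusion of the current input $x_k$ in the delay line are assumptions made for the example.

```python
import numpy as np

class NARXFilter:
    """Minimal SISO NARX filter, roughly N_(a,b):hidden:1 in the paper's
    notation. Illustrative only: shapes/initialization are assumptions, and
    the current input x_k is included alongside the 'a' delayed copies."""
    def __init__(self, a=2, b=1, hidden=3, seed=0):
        rng = np.random.default_rng(seed)
        n_in = (a + 1) + b                  # current + a delayed inputs, b delayed outputs
        self.W1 = 0.1 * rng.standard_normal((hidden, n_in + 1))  # +1 for bias input
        self.W2 = 0.1 * rng.standard_normal((1, hidden + 1))     # linear output layer
        self.x_tdl = np.zeros(a + 1)        # tapped delay line on the input
        self.y_tdl = np.zeros(b)            # tapped delay line on past outputs

    def step(self, x_k):
        self.x_tdl = np.concatenate(([x_k], self.x_tdl[:-1]))    # shift input TDL
        X = np.concatenate((self.x_tdl, self.y_tdl, [1.0]))      # composite input
        h = np.tanh(self.W1 @ X)                                 # tanh hidden layer
        y_k = (self.W2 @ np.concatenate((h, [1.0])))[0]          # linear output neuron
        if self.y_tdl.size:
            self.y_tdl = np.concatenate(([y_k], self.y_tdl[:-1]))
        return y_k

f = NARXFilter()
ys = [f.step(x) for x in np.sin(0.3 * np.arange(20))]
```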

B. Adapting Dynamic Neural Networks

A feedforward neural network is one whose input contains no self-feedback of its previous outputs. Its weights may be adapted using the popular backpropagation algorithm, discovered independently by several researchers [15], [16], and popularized by Rumelhart, Hinton and Williams [17]. A NARX filter, on the other hand, generally has self-feedback and must be adapted using a method such as real-time recurrent learning (RTRL) [18] or backpropagation through time (BPTT) [19]. Although compelling arguments may be made supporting either algorithm [20], we have chosen to use RTRL in this work since it easily generalizes as is required later, and is able to adapt filters used in an implementation of adaptive inverse control in real time.

We assume that the reader is familiar with neural networks and the backpropagation algorithm. If not, the tutorial paper [21] is an excellent place to start. Briefly, the backpropagation algorithm adapts the weights of a neural network using a simple optimization method known as gradient descent. That is, the change in the weights $\Delta W_k$ is calculated as
$$\Delta W_k^T = -\eta\,\frac{dJ_k}{dW},$$
where $J_k$ is a cost function to be minimized and the small positive constant $\eta$ is called the learning rate. Often, $J_k = \frac{1}{2}\mathbb{E}\!\left[e_k^T e_k\right]$, where $e_k = d_k - y_k$ is the network error computed as the desired response minus the actual neural-network output. A stochastic approximation to $J_k$ may be made as $J_k \approx \frac{1}{2} e_k^T e_k$, which results in
$$\frac{dJ_k}{dW} = -e_k^T\,\frac{dy_k}{dW}.$$
For a feedforward neural network, $dy_k/dW = \partial y_k/\partial W$, which may be verified using the chain rule for total derivatives. The backpropagation algorithm is an elegant way of recursively computing $\partial J_k/\partial W$ by “back-propagating” the vector $-e_k$ from the output of the neural network back through the network to its first layer of neurons. The values that are computed are multiplied by $-\eta$, and are used to adapt the neuron weights.

It is important for this work to notice that the backpropagation algorithm may also be used to compute the vector $v^T\,\partial y_k/\partial W$ (where $v$ is an arbitrary vector) by simply back-propagating the vector $v$ instead of the vector $-e_k$. Specifically, we will need to compute Jacobian matrices of the neural network. These Jacobians are $\partial y_k/\partial W$ and $\partial y_k/\partial X$, where $X$ is the composite input vector to the neural network (containing all tap-delay-line copies of the exogenous and feedback inputs). If we back-propagate a vector $v$ through the neural network, then we have computed the entries of the vector $v^T\,\partial y_k/\partial W$. If $v$ is chosen to be a unit vector, then we have computed one row of the matrix $\partial y_k/\partial W$. By back-propagating as many unit vectors as there are outputs to the network, we may compose the Jacobian $\partial y_k/\partial W$ one row at a time. Also, if we back-propagate the vector $v$ past the first layer of neurons to the inputs to the network itself, we have computed $v^T\,\partial y_k/\partial X$. Again, using the methodology of back-propagating unit vectors, we may simultaneously build up the Jacobian matrix $\partial y_k/\partial X$ one row at a time. Werbos called this ability of the backpropagation algorithm the “dual subroutine.” The primary subroutine is the feedforward aspect of the network. The dual subroutine is the recursive calculation of the Jacobians of the network.
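For illustration, the sketch below (hand-rolled for this discussion, not the paper's code) backpropagates an arbitrary vector $v$ through a one-hidden-layer tanh network with linear output, returning $v^T\,\partial y_k/\partial W$ for each weight layer and $v^T\,\partial y_k/\partial X$; stacking the results for unit vectors then assembles the Jacobian $\partial y_k/\partial X$ row by row. All names and dimensions are illustrative.

```python
import numpy as np

def forward(W1, W2, X):
    """Two-layer network: tanh hidden layer, linear output (bias via appended 1)."""
    Xa = np.append(X, 1.0)
    h = np.tanh(W1 @ Xa)
    ha = np.append(h, 1.0)
    return W2 @ ha, (Xa, h)

def dual(W1, W2, cache, v):
    """Backpropagate v: returns v^T dy/dW (per layer) and v^T dy/dX."""
    Xa, h = cache
    g2 = np.outer(v, np.append(h, 1.0))        # v^T dy/dW2 (same shape as W2)
    delta = (W2[:, :-1].T @ v) * (1.0 - h**2)  # error reaching the hidden layer
    g1 = np.outer(delta, Xa)                   # v^T dy/dW1 (same shape as W1)
    vdX = W1[:, :-1].T @ delta                 # v^T dy/dX (past the first layer)
    return g1, g2, vdX

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((4, 6)), rng.standard_normal((2, 5))
X = rng.standard_normal(5)
y, cache = forward(W1, W2, X)
# Build the Jacobian dy/dX one row at a time by backpropagating unit vectors:
J = np.vstack([dual(W1, W2, cache, e)[2] for e in np.eye(2)])
```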

Now that we have seen how to adapt a feedforward neural network and how to compute Jacobians of a neural network, it is a simple matter to extend the backpropagation algorithm to adapt NARX filters. This was first done by Williams and Zipser [18] and called “real-time recurrent learning” (RTRL). A similar presentation follows.

A NARX filter computes a function of the following form:
$$y_k = f(x_k, x_{k-1}, \ldots, x_{k-n}, y_{k-1}, y_{k-2}, \ldots, y_{k-m}, W).$$

To adapt using the familiar “sum of squared error” cost function, we need to be able to calculate
$$\frac{d\,\tfrac{1}{2}\|e_k\|^2}{dW} = -e_k^T\,\frac{dy_k}{dW}$$
$$\frac{dy_k}{dW} = \frac{\partial y_k}{\partial W} + \sum_{i=0}^{n} \frac{\partial y_k}{\partial x_{k-i}}\,\frac{dx_{k-i}}{dW} + \sum_{i=1}^{m} \frac{\partial y_k}{\partial y_{k-i}}\,\frac{dy_{k-i}}{dW}. \quad (1)$$

The first term, $\partial y_k/\partial W$, is the direct effect of a change in the weights on $y_k$, and is one of the Jacobians calculated by the dual subroutine of the backpropagation algorithm. The second term is zero, since $dx_k/dW$ is zero for all $k$. The final term may be broken up into two parts. The first, $\partial y_k/\partial y_{k-i}$, is a component of the matrix $\partial y_k/\partial X$, as delayed versions of $y_k$ are part of the network’s input vector $X$. The dual-subroutine algorithm may be used to compute this. The second part, $dy_{k-i}/dW$, is simply a previously-calculated and stored value of $dy_k/dW$. When the system is “turned on,” $dy_i/dW$ are set to zero for $i = 0, -1, -2, \ldots$, and the rest of the terms are calculated recursively from that point on.

Note that the dual-subroutine procedure naturally calculates the Jacobians in such a way that the weight update is done with simple matrix multiplication. Let
$$(d_{wy})_k \triangleq \left[\left(\frac{dy_{k-1}}{dW}\right)^{\!T} \left(\frac{dy_{k-2}}{dW}\right)^{\!T} \cdots \left(\frac{dy_{k-m}}{dW}\right)^{\!T}\right]^T$$
and
$$(d_{xy})_k \triangleq \left[\left(\frac{\partial y_k}{\partial y_{k-1}}\right) \left(\frac{\partial y_k}{\partial y_{k-2}}\right) \cdots \left(\frac{\partial y_k}{\partial y_{k-m}}\right)\right].$$
The latter is simply the columns of $\partial y_k/\partial X$ corresponding to the feedback inputs to the network, and is directly calculated by the dual subroutine. Then, the weight update is efficiently calculated as
$$\Delta W_k = \left(\eta\, e_k^T \left[\frac{\partial y_k}{\partial W} + (d_{xy})_k (d_{wy})_k\right]\right)^T.$$
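The following toy sketch (illustrative only, with an arbitrary target sequence) runs the RTRL recursion of Eq. (1) for a one-neuron NARX filter $y_k = \tanh(w_1 x_k + w_2 y_{k-1} + b)$, where the only feedback Jacobian is the scalar $\partial y_k/\partial y_{k-1}$:

```python
import numpy as np

# RTRL on a toy one-neuron NARX filter y_k = tanh(w1*x_k + w2*y_{k-1} + b).
# The total derivative dy_k/dW is maintained recursively as in Eq. (1).
rng = np.random.default_rng(2)
W = 0.1 * rng.standard_normal(3)       # [w1, w2, b]
dy_dW = np.zeros(3)                    # stored dy_{k-1}/dW
y_prev, eta = 0.0, 0.05
x = rng.uniform(-1.0, 1.0, 500)
d = np.tanh(0.8 * x)                   # arbitrary illustrative target sequence

for k in range(len(x)):
    y = np.tanh(W[0] * x[k] + W[1] * y_prev + W[2])
    dphi = 1.0 - y**2                                    # tanh'(s)
    grad_direct = dphi * np.array([x[k], y_prev, 1.0])   # direct term dy_k/dW
    dy_dW = grad_direct + (dphi * W[1]) * dy_dW          # + (dy_k/dy_{k-1}) dy_{k-1}/dW
    e = d[k] - y
    W += eta * e * dy_dW               # weight update (standard RTRL approximation:
    y_prev = y                         # stored derivatives ignore the slow drift of W)
```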

C. Optimal Solution for a Nonlinear Adaptive Filter

In principle, a neural network can emulate a very general nonlinear function. It has been shown that any “smooth” static nonlinear function may be approximated by a two-layer neural network with a “sufficient” number of neurons in its hidden layer [22]. Furthermore, a NARX filter can compute any dynamical finite-state machine (it can emulate any computer with finite memory) [14].

In practice, a neural network seldom achieves its full potential. Gradient-descent-based training algorithms converge to a local minimum in the solution space, and not to the global minimum. However, it is instructive to exactly determine the optimal performance that could be expected from any nonlinear system, and then to use it as a lower bound on the MSE of a trained neural network. Generally, a neural network will get quite close to this bound.

The optimal solution for a nonlinear filter is from reference [23, Theorem 4.2.1]. If the composite input vector to the adaptive filter is $X_k$, the output is $y_k$, and the desired response is $d_k$, then the optimal filter computes the conditional expectation
$$y_k = \mathbb{E}\left[d_k \mid X_k\right].$$

D. Practical Issues

While adaptation may proceed on-line, there is neither a mathematical guarantee of stability of adaptation, nor of convergence of the weights of the neural network. In practice, we find that if $\eta$ is “small enough,” the weights of the adaptive filter converge in a stable way, with a small amount of misadjustment. It may be wise to adapt the dynamic neural network off-line (using pre-measured data) at least until close to a solution, and then use a small value of $\eta$ to continue adapting on-line. Speed of adaptation is also an issue. The reader may wish to consider faster second-order adaptation methods such as Dynamically Decoupled Extended Kalman Filter (DDEKF) learning [24], where the gradient matrix is $H(k) = dy_k^T/dW$, as computed above. We have found order-of-magnitude improvement in speed of adaptation using DDEKF versus RTRL [25].


Fig. 2. Adaptive plant modeling. Simplified (a) and detailed (b) depictions.

III. ADAPTIVE SYSTEM IDENTIFICATION

The first step in performing adaptive inverse control is to make an adaptive model of the plant. The model should capture the dynamics of the plant well enough that a controller designed to control the plant model will also control the plant very well. This is a straightforward application of the adaptive-filtering techniques in Section II.

A method for adaptive plant modeling is depicted in Fig. 2(a). The plant is excited with the signal $u_k$, and the disturbed output $y_k$ is measured. The plant model $\hat{P}$ is also excited with $u_k$, and its output $\hat{y}_k$ is computed. The plant-modeling error $e^{(\mathrm{mod})}_k = y_k - \hat{y}_k$ is used by the adaptation algorithm to update the weight values of the adaptive filter.

NARX models have implicit feedback of delayed versions of their output to the input of the model. This feedback is assumed in all block diagrams, and is not drawn explicitly, except in Fig. 2(b). The purpose of Fig. 2(b) is to show that this feedback, when training an adaptive plant model, may be connected to either the model output $\hat{y}_k$ or the plant output $y_k$. The first method is called a parallel connection for system identification, and the second method is called a series-parallel connection for system identification. Networks configured in series-parallel may be trained using the standard backpropagation algorithm. Networks configured in parallel must be trained with either RTRL or BPTT. The series-parallel configuration is simple, but is biased by disturbance. The parallel configuration is more complex to train, but is unbiased by disturbance. In this work, nonlinear system identification is first performed using the series-parallel configuration to initialize the weight values of the plant model. When the weight values converge, the plant model is re-configured in the parallel configuration and training is allowed to continue. This procedure allows speedy training of the network, but is not compromised by disturbance.
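A minimal sketch of the distinction (the stand-in model and signals are assumptions, not the paper's code): the two configurations differ only in which past outputs enter the model's regressor.

```python
import numpy as np

def model(v, W):
    # Stand-in one-layer NARX plant model; maps a regressor v to a prediction.
    return float(np.tanh(W @ v))

W = 0.1 * np.ones(4)
u = np.linspace(-1, 1, 10)           # plant input history (illustrative)
y_plant = np.sin(u)                  # measured (possibly disturbed) plant output
y_hat = np.zeros(10)
k = 5

# Series-parallel: regressor uses *measured* plant outputs. Trainable with
# ordinary backprop, but any disturbance in y_plant biases the fit.
v_sp = np.array([u[k], u[k-1], y_plant[k-1], y_plant[k-2]])
y_hat[k] = model(v_sp, W)

# Parallel: regressor uses the *model's own* past outputs. Requires RTRL or
# BPTT to train, but the disturbance never enters the regressor (unbiased).
v_par = np.array([u[k], u[k-1], y_hat[k-1], y_hat[k-2]])
y_hat[k] = model(v_par, W)
```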

We now confirm that the adaptive plant model $\hat{P}$ converges to $P$. From Section II-C we can find the optimal solution for $\hat{P}$:
$$\begin{aligned}
\hat{P}^{(\mathrm{opt})}(\vec{u}_k) &= \mathbb{E}\left[y_k \mid \vec{u}_k\right] \\
&= \mathbb{E}\left[P(\vec{u}_k) + w_k \mid \vec{u}_k\right] \\
&= \mathbb{E}\left[P(\vec{u}_k) \mid \vec{u}_k\right] + \mathbb{E}\left[w_k \mid \vec{u}_k\right] \\
&= P(\vec{u}_k) + \mathbb{E}\left[w_k\right] \\
&= P(\vec{u}_k),
\end{aligned}$$
where $\vec{u}_k$ is a vector containing past values of the control signal $u_k, u_{k-1}, \ldots, u_0$, and where two assumptions were made: (1) in the fourth line we assume that the disturbance is statistically independent of the command input signal $u_k$; and (2) in the final line we assume that the disturbance is zero-mean. Under these two assumptions, the plant model converges to the plant despite the presence of disturbance.

A final comment should be made regarding the relative timing of the $u_k$ and $y_k$ signals. In order to be able to model either strictly- or non-strictly-proper plants, we output $u_k$ at time $t = (kT)^-$ and measure $y_k$ at time $t = (kT)^+$, $T$ being the sampling period. We will see that this assumption can limit the extent to which we can effectively cancel disturbance. If we know a priori that the plant is strictly proper, then we may instead measure $y_k$ at time $t = (kT)^-$ and use its value when computing $u_k$ (which is output at time $t = (kT)^+$), since we know that there is no immediate effect on the plant output due to $u_k$.

IV. ADAPTIVE FEEDFORWARD CONTROL OF PLANT DYNAMICS

To perform adaptive inverse control, we need to be able to adapt the three filters of Fig. 1: the plant model $\hat{P}$, the controller $C$, and the disturbance canceler $X$. We have seen how to adapt $\hat{P}$ to make a plant model. For the time being we set aside consideration of the disturbance-canceling filter $X$ and concentrate on the design of the feedforward controller $C$.¹ The goal is to make the dynamics of the controlled system $PC$ approximate the fixed filter $M$ as closely as possible, where $M$ is a user-specified reference model. The input reference signal $r_k$ is filtered through $M$ to create a desired response $d_k$ for the plant output. The measured plant output is compared with the desired plant output to create a system error signal $e^{(\mathrm{sys})}_k = d_k - y_k$. We will adapt $C$ to minimize the mean-squared system error while constraining the control effort $u_k$.

The reference model $M$ may be designed in a number of ways. Following traditions of control theory, we might design $M$ to have the step response of a second-order linear system that meets design specifications. However, we can often achieve even better tracking control if we let $M$ be simply a delay corresponding to the transport delay of the plant. The controller $C$ will adapt to a delayed inverse of the plant dynamics.²

¹ We shall restrict our development to apply only to stable plants. If the plant of interest is unstable, conventional feedback should be applied to stabilize it. Then the combination of the plant and its feedback stabilizer can be regarded as an equivalent stable plant.

² We note that nonlinear plants do not, in general, have inverses. So, when we say that $C$ adapts to a (delayed) inverse of the plant dynamics, we mean that $C$ adapts so that the cascade $PC$ approximates the (delayed) identity function as well as possible in the least mean-squared-error sense.


Fig. 3. Structure diagram illustrating the BPTM method.

A. Adapting a Constrained Controller via the BPTM Algorithm

Recall that an adaptive filter requires a desired-response input signal in order to be able to adapt its weights. While we do have a desired response for the entire system, $d_k$, we do not have a desired response for the output of the controller $C$. A key hurdle that must be overcome by the algorithm is to find a mechanism for converting the system error to an adaptation signal used to adjust $C$.

One solution is to regard the series combination of $C$ and $\hat{P}$ as a single adaptive filter. There is a desired response for this combined filter: $d_k$. Therefore, we can use $d_k$ to compute weight updates for the conglomerate filter. However, we only apply the weight updates to the weights in $C$; $\hat{P}$ is still updated using the plant-modeling error $e^{(\mathrm{mod})}_k$. Figure 3 shows the general framework to be used. We say that the system error is back-propagated through the plant model, and used to adapt the controller. For this reason, the algorithm is named “Back-Prop Through (Plant) Model” (BPTM). This solution allows for control of nonminimum-phase and non-square plants.

The algorithm is derived as follows. We wish to train the controller $C$ to minimize the squared system error and to simultaneously minimize some function of the control effort. We construct the following cost function that we will minimize:
$$J_k = \mathbb{E}\left[\tfrac{1}{2}\, e^{(\mathrm{sys})T}_k Q\, e^{(\mathrm{sys})}_k + h(u_k, u_{k-1}, \ldots, u_{k-r})\right].$$

The differentiable function $h(\cdot)$ defines the cost function associated directly with the control signal $u_k$, and is used to penalize excessive control effort, slew rate, and so forth. The system error is the signal $e^{(\mathrm{sys})}_k = d_k - y_k$, and the symmetric matrix $Q$ is a weighting matrix that assigns different performance objectives to each plant output.
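For instance (an illustrative choice; the paper does not prescribe a particular $h$), a quadratic penalty on control amplitude and slew rate could be
$$h(u_k, u_{k-1}) = \lambda_1 \|u_k\|^2 + \lambda_2 \|u_k - u_{k-1}\|^2,$$
whose partial derivatives, as required by the algorithm below, are
$$\frac{\partial h}{\partial u_k} = 2\lambda_1 u_k + 2\lambda_2 (u_k - u_{k-1}), \qquad \frac{\partial h}{\partial u_{k-1}} = -2\lambda_2 (u_k - u_{k-1}).$$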

If we let $g(\cdot)$ be the function implemented by the controller $C$, and $f(\cdot)$ be the function implemented by the plant model $\hat{P}$, we can state without loss of generality
$$\begin{aligned}
u_k &= g\big(u_{k-1}, u_{k-2}, \ldots, u_{k-m},\, r_k, r_{k-1}, \ldots, r_{k-q},\, W_C\big) \\
y_k \approx \hat{y}_k &= f\big(\hat{y}_{k-1}, \hat{y}_{k-2}, \ldots, \hat{y}_{k-n},\, u_k, u_{k-1}, \ldots, u_{k-p}\big),
\end{aligned} \quad (2)$$
where $W_C$ are the adjustable parameters (weights) of the controller.

The controller weights are updated in the direction of the negative gradient of the cost functional:
$$\begin{aligned}
(\Delta W_C)^T_k &= -\eta\,\frac{dJ_k}{dW_C} \\
&= -\eta\,\frac{d}{dW_C}\left\{ e^{(\mathrm{sys})T}_k Q\, e^{(\mathrm{sys})}_k + h(u_k, u_{k-1}, \ldots, u_{k-r}) \right\},
\end{aligned}$$
where $\eta$ is the adaptive learning rate. Continuing,
$$\frac{(\Delta W_C)^T_k}{\eta} = e_k^T Q\,\frac{d\hat{y}_k}{dW_C} - \sum_{j=0}^{r}\left(\frac{\partial h(u_k, \ldots, u_{k-r})}{\partial u_{k-j}}\right)^{\!T} \left(\frac{du_{k-j}}{dW_C}\right).$$

Using (2) and the chain rule for total derivatives, two further substitutions may be made at this time:
$$\frac{du_k}{dW_C} = \frac{\partial u_k}{\partial W_C} + \sum_{j=1}^{m}\left(\frac{\partial u_k}{\partial u_{k-j}}\right)\left(\frac{du_{k-j}}{dW_C}\right) \quad (3)$$
$$\frac{d\hat{y}_k}{dW_C} = \sum_{j=0}^{p}\left(\frac{\partial \hat{y}_k}{\partial u_{k-j}}\right)\left(\frac{du_{k-j}}{dW_C}\right) + \sum_{j=1}^{n}\left(\frac{\partial \hat{y}_k}{\partial \hat{y}_{k-j}}\right)\left(\frac{d\hat{y}_{k-j}}{dW_C}\right). \quad (4)$$
With these definitions, we may find the weight update. We need to find three quantities: $\partial h(\cdot)/\partial u_{k-j}$, $du_k/dW_C$, and $d\hat{y}_k/dW_C$.

First, we note that $\partial h(\cdot)/\partial u_{k-j}$ depends on the user-specified function $h(\cdot)$. It can be calculated given $h(\cdot)$. Secondly, we consider $du_k/dW_C$, as expanded in (3). It is the same in form as (1) and is computed the same way. Thirdly, we consider $d\hat{y}_k/dW_C$ in (4). The first term in the first summation is the Jacobian $\partial \hat{y}_k/\partial u_{k-j}$, which may be computed via the backpropagation algorithm. The next term, $du_{k-j}/dW_C$, is the current or a previously-computed and saved version of $du_k/dW_C$, computed via (3). The first term in the second summation, $\partial \hat{y}_k/\partial \hat{y}_{k-j}$, is another Jacobian. The final term, $d\hat{y}_{k-j}/dW_C$, is a previously-computed and saved version of $d\hat{y}_k/dW_C$.

A practical implementation is realized by compacting the notation into a collection of matrices. We define
$$\begin{aligned}
dU_k &\triangleq \left[\left(\frac{\partial u_k}{\partial W_C}\right)^{\!T} \left(\frac{\partial u_{k-1}}{\partial W_C}\right)^{\!T} \cdots \left(\frac{\partial u_{k-p}}{\partial W_C}\right)^{\!T}\right]^T \\
dH_k &\triangleq \left[\left(\frac{\partial h(\cdot)}{\partial u_k}\right)^{\!T} \left(\frac{\partial h(\cdot)}{\partial u_{k-1}}\right)^{\!T} \cdots \left(\frac{\partial h(\cdot)}{\partial u_{k-r}}\right)^{\!T}\right]^T \\
dY_k &\triangleq \left[\left(\frac{\partial \hat{y}_{k-1}}{\partial W_C}\right)^{\!T} \left(\frac{\partial \hat{y}_{k-2}}{\partial W_C}\right)^{\!T} \cdots \left(\frac{\partial \hat{y}_{k-n}}{\partial W_C}\right)^{\!T}\right]^T \\
\partial UY_k &\triangleq \left[\left(\frac{\partial \hat{y}_k}{\partial u_k}\right)^{\!T} \left(\frac{\partial \hat{y}_k}{\partial u_{k-1}}\right)^{\!T} \cdots \left(\frac{\partial \hat{y}_k}{\partial u_{k-p}}\right)^{\!T}\right]^T \\
\partial YY_k &\triangleq \left[\left(\frac{\partial \hat{y}_k}{\partial \hat{y}_{k-1}}\right)^{\!T} \left(\frac{\partial \hat{y}_k}{\partial \hat{y}_{k-2}}\right)^{\!T} \cdots \left(\frac{\partial \hat{y}_k}{\partial \hat{y}_{k-n}}\right)^{\!T}\right]^T \\
\partial UU_k &\triangleq \left[\left(\frac{\partial u_k}{\partial u_{k-1}}\right)^{\!T} \left(\frac{\partial u_k}{\partial u_{k-2}}\right)^{\!T} \cdots \left(\frac{\partial u_k}{\partial u_{k-p}}\right)^{\!T}\right]^T.
\end{aligned}$$

The algorithm is summarized in Fig. 4. Any programming language supporting matrix mathematics can very easily implement this algorithm. It works well.


begin {Adapt $C$}
  Update $du_k/dW_C$:
  • Shift $dU_k$ down $N_i$ rows, where $N_i$ is the number of plant inputs.
  • Backpropagate $N_i$ unit vectors through $C$ to form $\partial UU_k$ and $\partial u_k/\partial W_C$. Each backpropagation produces one row of both matrices.
  • Compute the top $N_i$ rows of $dU_k$ to be $\partial u_k/\partial W_C + (\partial UU_k)(dU_k)$.
  Update $d\hat{y}_k/dW_C$:
  • Backpropagate $N_o$ unit vectors through $\hat{P}$ to form $\partial UY_k$ and $\partial YY_k$, where $N_o$ is the number of plant outputs. Each backpropagation produces one row of both matrices.
  • Compute $dWY_k = (\partial UY_k)(dU_k) + (\partial YY_k)(dY_k)$.
  • Shift $dY_k$ down $N_o$ rows and save $dWY_k$ in the top $N_o$ rows.
  Compute $dH_k$.
  Update weights:
  • Compute $(\Delta W_C)^T_k = \eta\, e_k^T Q (dWY_k) - \eta\,(dH_k^T)(dU_k)$.
  • Adapt, enumerating weights in the same order as when computing $\partial u_k/\partial W_C$.
end {Adapt $C$}

Fig. 4. Algorithm to adapt a NARX controller for a NARX plant model.
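To make the data flow of Fig. 4 concrete, here is a minimal Python sketch of one BPTM update step. The Jacobians that the dual subroutine would return ($\partial u_k/\partial W_C$, $\partial UU_k$, $\partial UY_k$, $\partial YY_k$) are stubbed with random matrices of the right shapes, and the effort penalty $h(\cdot)$ is set to zero; all dimensions and names are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(3)
Ni, No, Nw = 2, 2, 30     # plant inputs, plant outputs, controller weights
p, n = 3, 2               # delay-line depths, as in Eqs. (3)-(4)
eta = 0.01

dU = np.zeros(((p + 1) * Ni, Nw))  # stacked du_{k-j}/dW_C, j = 0..p
dY = np.zeros((n * No, Nw))        # stacked dy^_{k-j}/dW_C, j = 1..n

def bptm_step(dU, dY, e, Q):
    # Stub Jacobians; a real system gets these by backpropagating unit
    # vectors through C and P^ (the dual subroutine).
    du_dW = rng.standard_normal((Ni, Nw))            # ∂u_k/∂W_C
    dUU   = rng.standard_normal((Ni, p * Ni))        # ∂u_k/∂u_{k-1..k-p}
    dUY   = rng.standard_normal((No, (p + 1) * Ni))  # ∂y^_k/∂u_{k..k-p}
    dYY   = rng.standard_normal((No, n * No))        # ∂y^_k/∂y^_{k-1..k-n}
    dH    = np.zeros((p + 1) * Ni)                   # ∂h/∂u terms (h ≡ 0 here)
    # Update dU_k: new top block is ∂u_k/∂W_C + (∂UU_k)(du_{k-1..k-p}/dW_C)
    dU = np.vstack((du_dW + dUU @ dU[:p * Ni], dU[:-Ni]))
    # dWY_k = (∂UY_k)(dU_k) + (∂YY_k)(dY_k); then shift dWY_k into dY_k
    dWY = dUY @ dU + dYY @ dY
    dY = np.vstack((dWY, dY[:-No]))
    # (ΔW_C)_k^T = η e_k^T Q (dWY_k) − η (dH_k^T)(dU_k)
    dW = eta * (e @ Q @ dWY) - eta * (dH @ dU)
    return dU, dY, dW

e = rng.standard_normal(No)        # system error e_k
dU, dY, dW = bptm_step(dU, dY, e, np.eye(No))
```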

A similar algorithm may be developed for linear systems, for which convergence and stability has been proven [7]. We have no mathematical proof of stability for the nonlinear version, although we have observed it to be stable if $\eta$ is “small enough.” As a practical matter, it may be wise to adapt the controller off-line until convergence is approximated. Adaptation may then continue on-line, with a small value of $\eta$. Faster adaptation may be accomplished using DDEKF as the adaptation mechanism (BPTM-DDEKF) versus using RTRL (BPTM-RTRL). To do so, set $H(k) = dJ_k^T/dW_C$ [25].

V. ADAPTIVE DISTURBANCE CANCELING

Referring again to Fig. 1, we have seen how to adaptively model the plant dynamics, and how to use the plant model to adapt a feedforward adaptive controller. Using this controller, an undisturbed plant will very closely track the desired output. A disturbed plant, on the other hand, may not track the desired output at all well. It remains to determine what can be done to eliminate plant disturbance.

The first idea that may come to mind is to simply “close the loop.” Two methods commonly used to do this are discussed in Section V-A. Unfortunately, neither of these methods is appropriate if an adaptive controller is being designed. Closing the loop will cause the controller to adapt to a “biased” solution. An alternate technique is introduced that leads to the correct solution if the plant is linear, and has substantially less bias if the plant is nonlinear.

A. Conventional Disturbance Rejection Methods Fail

Two approaches to disturbance rejection commonly seen in the literature are shown in Fig. 5. Both methods use feedback; the first feeds back the disturbed plant output $y_k$, as in Fig. 5(a), and the second feeds back an estimate of the disturbance, $\hat{w}_k$, as in Fig. 5(b). The approach shown in Fig. 5(a) is more conventional, but is difficult to use with adaptive inverse control since the dynamics of the closed-loop system are dramatically different from the dynamics of the open-loop system. Different methods than those presented here are required to adapt $C$. The approach shown in Fig. 5(b) is called internal model control [26]–[31]. The benefit of using this scheme is that the dynamics of the closed-loop system are equal to the dynamics of the open-loop system if the plant model is identical to the plant. Therefore, the methods found to adapt the controller for feedforward control may be used directly.

Fig. 5. Two methods to close the loop: (a) the output, $y_k$, is fed back to the controller; (b) an estimate of the disturbance, $\hat{w}_k$, is fed back to the controller.

Unfortunately, closing the loop using either method in Fig. 5 will cause $\hat{P}$ to adapt to an incorrect solution. In the following analysis the case of a plant controlled with internal model control is considered. A similar analysis may be performed for the conventional feedback system in Fig. 5(a), with the same conclusion.

B. Least Mean-Squared-Error Solution for $\hat{P}$ Using Internal Model Control

When the loop is closed as in Fig. 5(b), the estimated disturbance term $\hat{w}_{k-1}$ is subtracted from the reference input $r_k$. The resulting composite signal is filtered by the controller $C$ and becomes the plant input signal, $u_k$. In the analysis done so far, we have assumed that $u_k$ is independent of $w_k$, but that assumption is no longer valid. We need to revisit the analysis performed for system identification to see if the plant model $\hat{P}$ still converges to $P$.

The direct approach to the problem is to calculate the least mean-squared-error solution for $\hat{P}$ and see if it is equal to $P$. However, due to the feedback loop involved, it is not possible to obtain a closed-form solution for $\hat{P}$. An indirect approach is taken here. We do not need to know exactly to what $\hat{P}$ converges—we only need to know whether or not it converges to $P$. The following indirect procedure is used:

1. First, remove the feedback path (open the loop) and perform on-line plant modeling and controller adaptation. When convergence is reached, $\hat{P} \to P$.
2. At this time, $\hat{P} \approx P$, and the disturbance estimate $\hat{w}_k$ is very good. We assume that $\hat{w}_k = w_k$. This assumption allows us to construct a feedforward-only system that is equivalent to the internal-model-control feedback system by substituting $w_k$ for $\hat{w}_k$.
3. Lastly, analyze the system, substituting $w_k$ for $\hat{w}_k$. If the least mean-squared-error solution for $\hat{P}$ still converges to $P$, then the assumption made in the second step remains valid, and the plant is being modeled correctly. If $\hat{P}$ diverges from $P$ with the assumption made in step 2, then it cannot converge to $P$ in a closed-loop setting. We conclude that the assumption is justified for the purpose of checking for proper convergence of $\hat{P}$.

We now apply this procedure to analyze the system of Fig. 5(b). We first open the loop and allow the plant model to converge to the plant. Secondly, we assume that $\hat{w}_k \approx w_k$. Finally, we compute the least mean-squared-error solution for $\hat{P}$:

$$\begin{aligned}
\hat{P}^{(\mathrm{opt})}(\vec{u}_k) &= \mathbb{E}\left[y_k \mid \vec{u}_k\right] \\
&= \mathbb{E}\left[P(\vec{u}_k) + w_k \mid \vec{u}_k\right] \\
&= \mathbb{E}\left[P(\vec{u}_k) \mid \vec{u}_k\right] + \mathbb{E}\left[w_k \mid \vec{u}_k\right] \\
&= P(\vec{u}_k) + \mathbb{E}\left[w_k \mid \vec{u}_k\right],
\end{aligned}$$
where $\vec{u}_k$ is a function of $w_k$ since $u_k = C(\vec{r}_k - \vec{\hat{w}}_{k-1})$, and so the conditional expectation in the last line is not zero. The plant model is biased by the disturbance.

Assumptions concerning the nature of the $r_k$ and $w_k$ signals are needed to simplify this further. For example, if $w_k$ is white, then the plant model converges to the plant. Under almost all other conditions, however, the plant model converges to something else. In general, an adaptive plant model made using the internal model control scheme will be biased by disturbance.

The bottom line is that if the loop is closed, and the plant model is allowed to continue adapting after the loop is closed, the overall control system will become biased by the disturbance. One solution, applicable to linear plants, is to perform plant modeling with dither signals rather than with the command input signal $u_k$ [13, schemes B, C]. However, this will increase the output noise level. A better solution is presented next, in which the plant model is allowed to continue adaptation and the output noise level is not increased. The plant disturbances will be handled by a separate circuit from the one handling the task of dynamic response. This results in an overall control problem that is partitioned in a very nice way.
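A small Monte Carlo sketch (constructed for this discussion, not taken from the paper) makes the bias visible in the simplest possible case: a one-tap linear plant identified by least squares, first with an input independent of the disturbance, and then with an input that, as under internal model control, depends on the previous disturbance sample.

```python
import numpy as np

# Least-squares fit of a one-tap plant y_k = b*u_k + w_k (true b = 1), with a
# first-order Markov disturbance w. Open loop vs. an IMC-style input that
# depends on the previous disturbance sample. Illustrative numbers only.
rng = np.random.default_rng(4)
N = 100_000
w = np.zeros(N)
for k in range(1, N):                        # one-pole Markov disturbance
    w[k] = 0.99 * w[k - 1] + 0.1 * rng.uniform(-0.5, 0.5)
r = rng.uniform(-1.0, 1.0, N)

u_open = r                                   # open loop: u independent of w
u_imc = r - np.concatenate(([0.0], w[:-1]))  # closed loop: u_k = r_k - w_{k-1}

for name, u in (("open-loop", u_open), ("internal-model", u_imc)):
    y = u + w
    b_hat = np.dot(u, y) / np.dot(u, u)      # least-squares gain estimate
    print(f"{name}: b_hat = {b_hat:.3f}")    # open-loop near 1; IMC visibly low
```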

C. A Solution Allowing On-Line Adaptation of $\hat{P}$

The only means at our disposal to cancel disturbance is through the plant input signal. This signal must be computed in such a way that the plant output negates (as much as possible) the disturbance. Therefore, the plant input signal must be statistically dependent on the disturbance. However, we have just seen that the plant model input signal cannot contain terms dependent on the disturbance or the plant model will be biased. This conundrum was first solved by Widrow and Walach [13], as shown in Fig. 6.

By studying the figure, we see that the control scheme is very similar to internal model control. The main difference is that the feedback loop is “moved” in such a way that the disturbance dynamics do not appear at the input to the adaptive plant model, but do appear at the input to the plant. That is, the controller output $u_k$ is used as input to the adaptive plant model $\hat{P}$; on the other hand, the input to the plant is equal to $u_k + \tilde{u}_k$, where $\tilde{u}_k$ is the output of a special disturbance-canceling filter, $X$. $\hat{P}$ is not used directly to estimate the disturbance; rather, a filter whose weights are a digital copy of those in $\hat{P}$ is used to estimate the disturbance. This filter is called $\hat{P}_{\mathrm{COPY}}$.

Fig. 6. Correct on-line adaptive plant modeling in conjunction with disturbance canceling for linear plants. The circuitry for adapting $C$ has been omitted for clarity.

In later sections we will see how to adapt $X$ and how to modify the diagram if the plant is nonlinear. Now, we proceed to show that the design is correct if the plant is linear. We compute the optimal function $\hat{P}$:
$$\begin{aligned}
\hat{P}^{(\mathrm{opt})}(\vec{u}_k) &= \mathbb{E}\left[y_k \mid \vec{u}_k\right] \\
&= \mathbb{E}\left[P(\vec{u}_k + \vec{\tilde{u}}_k) + w_k \mid \vec{u}_k\right] \\
&= \mathbb{E}\left[P(\vec{u}_k + \vec{\tilde{u}}_k) \mid \vec{u}_k\right] + \mathbb{E}\left[w_k \mid \vec{u}_k\right] \\
&= \mathbb{E}\left[P(\vec{u}_k + \vec{\tilde{u}}_k) \mid \vec{u}_k\right] \\
&= \mathbb{E}\left[P(\vec{u}_k) \mid \vec{u}_k\right] + \mathbb{E}\left[P(\vec{\tilde{u}}_k) \mid \vec{u}_k\right] \\
&= P(\vec{u}_k),
\end{aligned}$$

where $\vec{u}_k$ is a vector containing past values of the control signal $u_k, u_{k-1}, \ldots, u_0$, and $\vec{\tilde{u}}_k$ is a vector containing past values of the disturbance-canceler output, $\tilde{u}_k, \tilde{u}_{k-1}, \ldots, \tilde{u}_0$. In the final line we assume that $\vec{\tilde{u}}_k$ is zero-mean (which it would be if $w_k$ is zero-mean and the plant is linear).

Unfortunately, there is still a bias if the plant is nonlinear. The derivation is the same until the fourth line,
$$\hat{P}^{(\mathrm{opt})}(\vec{u}_k) = \mathbb{E}\left[P(\vec{u}_k + \vec{\tilde{u}}_k) \mid \vec{u}_k\right],$$
which does not break up as it did for the linear plant. To proceed further than this, we must make an assumption about the plant. Considering a SISO plant, for example, if we assume that $P: \mathbb{R}^k \to \mathbb{R}$ is differentiable at “point” $\vec{u}_k$, then we may write the first-order Taylor expansion [32, Theorem 1]
$$P(\vec{u}_k + \vec{\tilde{u}}_k) = P(\vec{u}_k) + \sum_{i=0}^{k} \tilde{u}_i\,\frac{\partial P}{\partial u_i}(\vec{u}_k) + R_1(\vec{u}_k, \vec{\tilde{u}}_k),$$
where $u_i$ is the plant input at time index $i$. Furthermore, we have the result that $R_1(\vec{u}_k, \vec{\tilde{u}}_k)/\|\vec{\tilde{u}}_k\| \to 0$ as $\vec{\tilde{u}}_k \to 0$ in $\mathbb{R}^k$. Then,


$$\begin{aligned}
\hat{P}^{(\mathrm{opt})}(\vec{u}_k) &= \mathbb{E}\left[P(\vec{u}_k) + \sum_{i=0}^{k} \tilde{u}_i\,\frac{\partial P}{\partial u_i}(\vec{u}_k) + R_1(\vec{u}_k, \vec{\tilde{u}}_k) \,\Big|\, \vec{u}_k\right] \\
&= P(\vec{u}_k) + \mathbb{E}\left[\sum_{i=0}^{k} \tilde{u}_i\,\frac{\partial P}{\partial u_i}(\vec{u}_k) \,\Big|\, \vec{u}_k\right] + \mathbb{E}\left[R_1(\vec{u}_k, \vec{\tilde{u}}_k) \,\Big|\, \vec{u}_k\right].
\end{aligned}$$
Since $w_k$ is assumed to be independent of $u_k$, $\vec{\tilde{u}}_k$ is independent of $\vec{u}_k$. We also assume that $\vec{\tilde{u}}_k$ is zero-mean. Then
$$\begin{aligned}
\hat{P}^{(\mathrm{opt})}(\vec{u}_k) &= P(\vec{u}_k) + \sum_{i=0}^{k} \underbrace{\mathbb{E}\left[\tilde{u}_i\right]}_{=0}\,\frac{\partial P}{\partial u_i}(\vec{u}_k) + \mathbb{E}\left[R_1(\vec{u}_k, \vec{\tilde{u}}_k) \mid \vec{u}_k\right] \quad (5) \\
&= P(\vec{u}_k) + \mathbb{E}\left[R_1(\vec{u}_k, \vec{\tilde{u}}_k) \mid \vec{u}_k\right] \\
&\approx P(\vec{u}_k).
\end{aligned}$$
The linear term disappears, as we expect from our analysis for a linear plant. However, the remainder term is not eliminated. In principle, if the plant is very nonlinear, the plant model may be quite different from the plant. In practice, however, we have seen that this method seems to work well, and much better than an adaptive version of the internal-model-control scheme.

So, we conclude that this system is not as good as desired if the plant is nonlinear. It may not be possible to perform on-line adaptive plant modeling and disturbance canceling simultaneously while retaining an unbiased plant model. One solution might involve freezing the weights of the plant model for long periods of time, and scheduling shorter periods for adaptive plant modeling with the disturbance canceler turned off when requirements for output quality are not as stringent. Another solution suggests itself from (5): if the $\tilde{u}_k$ terms are kept small, the bias will disappear. We will see in Section V-E that $X$ may be adapted using the BPTM algorithm. We can enforce constraints on the output of $X$ using our knowledge from Section IV to ensure that $\tilde{u}_k$ remains small. Also, since the system is biased, we use $u_k$ as an additional exogenous input to $X$ to help $X$ determine plant state to cancel disturbance.

D. The Function of the Disturbance Canceler

It is interesting to consider the mathematical function that the disturbance-canceling circuit must compute. A little careful thought in this direction leads to a great deal of insight. The analysis is precise if the plant is minimum phase (that is, if it has a stable, causal inverse), but is merely qualitative if the plant is nonminimum phase. The goal of this analysis is not to be quantitative, but rather to develop an understanding of the function performed by $X$.

A useful way of viewing the overall system is to consider that the control goal is for $X$ to produce an output so that $y_k = M(\vec{r}_k)$. We can express $y_k$ as
$$y_k = w_k + P\big(C(\vec{r}_k) + X(\vec{\hat{w}}_{k-1}, \vec{u}_k)\big).$$

Fig. 7. Internal structure of $X$: (a) for a general plant; (b) for a linear plant.

Note that $X$ takes the optional input signal $u_k$. This signal is used when controlling nonlinear plants, as it allows the disturbance canceler some knowledge of the plant state. It is not required if the plant is linear.

Next, we substitute $y_k = M(\vec{r}_k)$ and rearrange to solve for the desired response of $X$. We see that
$$\begin{aligned}
X^{(\mathrm{opt})}(\vec{\hat{w}}_{k-1}, \vec{u}_k) &= P^{-1}\big(M(\vec{r}_k) - \vec{w}_k\big) - C(\vec{r}_k) \\
&= P^{-1}\big(P(\vec{u}_k) - \vec{w}_k\big) - u_k,
\end{aligned}$$

assuming that the controller has adapted until $P(\vec{u}_k) \approx M(\vec{r}_k)$. The function of $X$ is a deterministic combination of known (by adaptation) elements $P$ and $P^{-1}$, but also of the unknown signal $w_k$. Because of the inherent delay in discrete-time systems, we only know $\hat{w}_{k-1}$ at any time, so $w_k$ must be estimated from previous samples $\hat{w}_{k-1}, \ldots, \hat{w}_0$. Assuming that the adaptive plant model is perfect, and that the controller has been adapted to convergence, the internal structure of $X$ is then shown in Fig. 7(a). The $w_k$ signal is computed by estimating its value from previous samples of $\hat{w}_k$. These are combined and passed through the plant inverse to compute the desired signal $\tilde{u}_k$.

Thus, we see that the disturbance canceler contains two parts. The first part is an estimator part that depends on the dynamics of the disturbance source. The second part is the canceler part that depends on the dynamics of the plant. The diagram simplifies for a linear plant since some of the circuitry cancels. Figure 7(b) shows the structure of $X$ for a linear plant.

One very important point to notice is that the disturbance canceler still depends on both the disturbance dynamics and the plant dynamics. If the process generating the disturbance is nonlinear or chaotic, then the estimator required will in general be a nonlinear function. The conclusion is that the disturbance canceler should be implemented as a nonlinear filter even if the plant is a linear dynamical system.

If the plant is generalized minimum phase (minimum phase with a constant delay), then this solution must be modified slightly. The plant inverse must be a delayed plant inverse, with the delay equal to the transport delay of the plant. The estimator must estimate the disturbance one time step into the future plus the delay of the plant. For example, if the plant is strictly proper, there will be at least one delay in the plant’s impulse response, and the estimator must predict the disturbance at least two time steps into the future.³

It was stated earlier that these results are heuristic and do not directly apply if the plant is nonminimum phase. We can see this easily now, since a plant inverse does not exist. However, the results still hold qualitatively since a delayed approximate inverse exists; the solution for $X$ is similar to the one for a generalized minimum-phase plant. The structure of $X$ consists of a part depending on the dynamics of the system, which amounts to a delayed plant inverse, and a part that depends on the dynamics of the disturbance-generating source, which must now predict farther into the future than a single time step. Unlike the case of the generalized minimum-phase plant, however, these two parts do not necessarily separate. That is, $X$ implements some combination of predictors and delayed inverses that compute the least mean-squared-error solution.

E. Adapting a Disturbance Canceler via the BPTM Algorithm

We have seen how a disturbance-canceling filter can be inserted into the control-system design in such a way that it will not bias the controller for a linear plant, and will minimally bias a controller for a nonlinear plant. Proceeding to develop an algorithm to adapt $X$, we consider that the system error is composed of three parts:

• One part of the system error is dependent on the input command vector $\vec{r}_k$ in $C$. This part of the system error is reduced by adapting $C$.

• Another part of the system error is dependent on the estimated disturbance vector $\vec{\hat{w}}_k$ in $X$. This part of the system error is reduced by adapting $X$.

• The minimum-mean-squared-error is independent of both the input command vector in $C$ and the estimated disturbance vector in $X$. It is either irreducible (if the system dynamics prohibit improvement), or may be reduced by making the tapped delay lines at the input to $X$ or $C$ larger. In any case, adaptation of the weights in $X$ or $C$ will not reduce the minimum-mean-squared-error.

• The fourth possible part of the system error is the part that is dependent on both the input command vector and the disturbance vector. However, by assumption, $r_k$ and $w_k$ are independent, so this part of the system error is zero.

Using the BPTM algorithm to reduce the system error by adapting $C$, as discussed in Section IV, will reduce the component of the system error dependent on the input $r_k$. Since the disturbance and minimum-mean-squared-error are independent of $r_k$, their presence will not bias the solution of $C$. The controller will learn to control the feedforward dynamics of the system, but not to cancel disturbance.

If we were to use the BPTM algorithm and backpropagate the system error through the plant model, using it to adapt $X$ as well, the disturbance canceler would learn to reduce the component of the system error dependent on the estimated disturbance signal. The component of the system error due to unconverged $C$ and the minimum-mean-squared-error will not bias the disturbance canceler.

³ However, if the plant is known a priori to be strictly proper, the delay $z^{-1}I$ block may be removed from the feedback path. Then the estimator needs to predict the disturbance one fewer step into the future. Since estimation is imperfect, this will improve performance.

Fig. 8. An integrated nonlinear MIMO system.

BPTM for the disturbance canceler may be developed as follows. Let $g(\cdot)$ be the function implemented by $X$, $f(\cdot)$ be the function implemented by the plant model copy $\hat{P}_{\mathrm{COPY}}$, and $W_X$ be the weights of the disturbance-canceler neural network. Then
$$\begin{aligned}
\tilde{u}_k &= g\big(\tilde{u}_{k-1} \ldots \tilde{u}_{k-m},\, \hat{w}_{k-1} \ldots \hat{w}_{k-q},\, u_k \ldots u_{k-r},\, W_X\big) \\
y_k \approx \hat{y}_k &= f\big(\hat{y}_{k-1} \ldots \hat{y}_{k-n},\, \bar{u}_k \ldots \bar{u}_{k-p}\big),
\end{aligned}$$
where $\bar{u}_k = u_k + \tilde{u}_k$ is the input to the plant, $u_k$ is the output of the controller, and $\tilde{u}_k$ is the output of the disturbance canceler. BPTM computes the disturbance-canceler weight update $(\Delta W_X)^T_k = -\eta\, e_k^T\, d\hat{y}_k/dW_X$. This can be found by the set of equations:

$$\begin{aligned}
\frac{d\bar{u}_k}{dW_X} &= \frac{du_k}{dW_X} + \frac{d\tilde{u}_k}{dW_X} = \frac{d\tilde{u}_k}{dW_X} \\
&= \frac{\partial \tilde{u}_k}{\partial W_X} + \sum_{j=1}^{m}\left(\frac{\partial \tilde{u}_k}{\partial \tilde{u}_{k-j}}\right)\left(\frac{d\tilde{u}_{k-j}}{dW_X}\right) \\
\frac{d\hat{y}_k}{dW_X} &= \sum_{j=0}^{p}\left(\frac{\partial \hat{y}_k}{\partial \bar{u}_{k-j}}\right)\left(\frac{d\bar{u}_{k-j}}{dW_X}\right) + \sum_{j=1}^{n}\left(\frac{\partial \hat{y}_k}{\partial \hat{y}_{k-j}}\right)\left(\frac{d\hat{y}_{k-j}}{dW_X}\right).
\end{aligned}$$

This method is illustrated in Fig. 8, where a complete integrated MIMO nonlinear control system is drawn. The plant model is adapted directly, as before. The controller is adapted by backpropagating the system error through the plant model and using the BPTM algorithm of Section IV. The disturbance canceler is adapted by backpropagating the system error through the copy of the plant model and using the BPTM algorithm as well. So we see that the BPTM algorithm serves two functions: it is able to adapt both $C$ and $X$.

Since the disturbance canceler requires an accurate estimate of the disturbance at its input, the plant model should be adapted to near convergence before “turning the disturbance canceler on” (connecting the disturbance-canceler output to the plant input). Adaptation of the disturbance canceler may begin before this point, however.



Fig. 9. (a) System identification of four nonlinear SISO plants in the absence of disturbance. The gray line (when visible) is the true plant output, and the solid line is the output of the plant model.

VI. EXAMPLES

Four simple nonlinear SISO systems were chosen to demonstrate the principles of this paper. The collection includes both minimum-phase and nonminimum-phase plants; plants described using difference equations and plants described in state-space form; and a variety of ways to inject disturbance. Since the plants are not motivated by any particular "real" dynamical system, the command signal and disturbance sources are artificial as well. In each case, the command signal is uniform i.i.d., chosen because it is the most difficult signal to follow. The raw disturbance source is a first-order Markov process. In some cases the Markov process is driven by i.i.d. uniform random variables, and in other cases by i.i.d. Gaussian random variables. The disturbance is added to the input of the system, to the output of the system, to a specific state in the plant's state-space representation, or to an intermediate stage of the processing performed by the plant.
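For concreteness, a minimal sketch of such a disturbance source follows: i.i.d. noise (uniform or Gaussian) filtered by a one-pole filter yields the first-order Markov process. The function name and parameter choices are illustrative assumptions.

```python
# Sketch of the disturbance sources used in the examples: i.i.d. noise
# passed through a one-pole filter gives a first-order Markov process.
import numpy as np

def markov_disturbance(n, pole=0.99, driving='uniform', scale=0.5, seed=None):
    """Return n samples of a first-order Markov disturbance process."""
    rng = np.random.default_rng(seed)
    if driving == 'uniform':
        drive = rng.uniform(-scale, scale, size=n)  # e.g., [-0.5, 0.5]
    else:
        drive = rng.normal(0.0, scale, size=n)      # e.g., std. dev. 0.03
    w = np.zeros(n)
    for k in range(1, n):
        w[k] = pole * w[k - 1] + drive[k]           # pole at z = pole
    return w
```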

System 1: The first plant was introduced to the literature by Narendra and Parthasarathy [33]. The difference equations defining its dynamics are:

$$s_k = \frac{s_{k-1}}{1 + s_{k-1}^2} + u_{k-1}^3$$
$$y_k = s_k + \mathrm{dist}_k.$$

The plant model was an $\mathcal{N}(2,1)$:8:1 network, and system identification was initially performed with the plant input signal $u_k$ being i.i.d. uniformly distributed in [−2, 2]. Disturbance was a first-order Markov process, generated by filtering a primary random process of i.i.d. random variables. The i.i.d. random variables were uniformly distributed in the range [−0.5, 0.5]. The filter used to generate the first-order Markov process was a one-pole filter with the pole at $z = 0.99$. The resulting disturbance was added directly to the output of the system. Note that the disturbance is added after the nonlinear filter, and hence it does not affect the internal state of the system.
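A direct simulation of System 1 under the stated equations might look as follows; this is a sketch for generating identification data, not the code used in this work.

```python
# Sketch: simulate System 1 with output-additive disturbance.
import numpy as np

def simulate_system1(u, dist):
    """s_k = s_{k-1}/(1 + s_{k-1}^2) + u_{k-1}^3;  y_k = s_k + dist_k."""
    y, s = np.zeros(len(u)), 0.0
    for k in range(1, len(u)):
        s = s / (1.0 + s * s) + u[k - 1] ** 3
        y[k] = s + dist[k]   # disturbance enters after the nonlinearity
    return y

# Example: i.i.d. uniform input on [-2, 2]; Markov disturbance built from
# uniform noise on [-0.5, 0.5] through a pole at z = 0.99.
rng = np.random.default_rng(0)
N = 1000
u = rng.uniform(-2.0, 2.0, N)
drive = rng.uniform(-0.5, 0.5, N)
dist = np.zeros(N)
for k in range(1, N):
    dist[k] = 0.99 * dist[k - 1] + drive[k]
y = simulate_system1(u, dist)
```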

System 2: The second plant is a nonminimum-phase (meaning that it does not have a stable causal inverse) nonlinear transversal system. The difference equation defining its dynamics is:

$$y_k = \exp(u_{k-1} - 2u_{k-2} + \mathrm{dist}_{k-1}) - 1.$$

The plant model was an $\mathcal{N}(3,0)$:3:1 network, with the plant input signal $u_k$ being i.i.d. uniformly distributed in [−0.5, 0.5]. Disturbance was a first-order Markov process, generated by filtering a primary random process of i.i.d. random variables. The i.i.d. random variables were distributed according to a Gaussian distribution with zero mean and standard deviation 0.03. The filter used to generate the first-order Markov process was a one-pole filter with the pole at $z = 0.99$. The resulting disturbance was added to the input of the system.
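A corresponding sketch of System 2 follows; the exponential nonlinearity wraps a nonminimum-phase linear transversal filter.

```python
# Sketch: simulate System 2 with input-side disturbance.
import numpy as np

def simulate_system2(u, dist):
    """y_k = exp(u_{k-1} - 2 u_{k-2} + dist_{k-1}) - 1."""
    y = np.zeros(len(u))
    for k in range(2, len(u)):
        y[k] = np.exp(u[k - 1] - 2.0 * u[k - 2] + dist[k - 1]) - 1.0
    return y
```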

System 3: The third system is a nonlinear plant expressed in state-space form. The system consists of a linear filter followed by a squaring device. The difference equations defining this plant's dynamics are:

$$x_k = \begin{bmatrix} 0 & 1 \\ -0.2 & 0.2 \end{bmatrix} x_{k-1} + \begin{bmatrix} 0.2 \\ 1 \end{bmatrix} u_{k-1} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \mathrm{dist}_{k-1}$$
$$s_k = \begin{bmatrix} 1 & 2 \end{bmatrix} x_k$$
$$y_k = 0.3\,(s_k)^2.$$

The plant model was an $\mathcal{N}(10,5)$:8:1 network, with the plant input signal $u_k$ being i.i.d. uniformly distributed in [−1, 1].



Fig. 9. (b) System identification of four nonlinear SISO plants in the presence of disturbance. The dashed black line is the disturbed plant output, and the solid black line is the output of the plant model. The gray solid line (when visible) shows what the plant output would have been if the disturbance were absent. This signal is normally unavailable, but is shown here to demonstrate that the adaptive plant model captures the dynamics of the true plant very well.

Disturbance was a first-order Markov process, generated by filtering a primary random process of i.i.d. random variables. The i.i.d. random variables were uniformly distributed in the range [−0.5, 0.5]. The filter used to generate the first-order Markov process was a one-pole filter with the pole at $z = 0.99$. The resulting disturbance was added directly to the first state of the system.
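A sketch of System 3 follows. Reading the input vector as $[0.2,\ 1]^T$ is an assumption recovered from the printed equations; the disturbance vector $[1,\ 0]^T$ matches the statement that disturbance is added directly to the first state.

```python
# Sketch: simulate System 3 (linear state-space filter, then a squarer).
import numpy as np

def simulate_system3(u, dist):
    A = np.array([[0.0, 1.0], [-0.2, 0.2]])
    B = np.array([0.2, 1.0])   # assumed reading of the input vector
    E = np.array([1.0, 0.0])   # disturbance enters the first state
    C = np.array([1.0, 2.0])
    x = np.zeros(2)
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        x = A @ x + B * u[k - 1] + E * dist[k - 1]
        y[k] = 0.3 * (C @ x) ** 2   # linear filter followed by a squarer
    return y
```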

System 4: The final system is a generalization of one in reference [13]. The nonlinearity in this system has memory, or "phase," and is a type of hysteresis device. The system has two equilibrium states, as opposed to the previous plants, which all had a single equilibrium state. The difference equations defining its dynamics are:

$$s_k = 0.4\,s_{k-1} + 0.5\,u_k$$
$$y_k = \mathrm{dist}_k + \begin{cases} 0.8\,y_{k-1} + 0.8\tanh(s_k - 2), & \text{if } s_k > s_{k-1};\\ 0.8\,y_{k-1} + 0.8\tanh(s_k + 2), & \text{if } s_k \le s_{k-1}. \end{cases}$$

The plant model was an $\mathcal{N}(10,10)$:30:1 network, with the plant input signal $u_k$ being i.i.d. uniformly distributed in [−1, 1]. Disturbance was a first-order Markov process, generated by filtering a primary random process of i.i.d. random variables. The i.i.d. random variables were distributed according to a Gaussian distribution with zero mean and standard deviation 0.01. The filter used to generate the first-order Markov process was a one-pole filter with the pole at $z = 0.95$. The resulting disturbance was added to an intermediate point in the system, just before the output filter.
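A sketch of System 4 follows; the active tanh branch switches according to whether the internal state is increasing or decreasing, which produces the hysteresis behavior.

```python
# Sketch: simulate System 4 (hysteresis plant with two equilibrium states).
import numpy as np

def simulate_system4(u, dist):
    s_prev = s = y = 0.0
    out = np.zeros(len(u))
    for k in range(len(u)):
        s_prev, s = s, 0.4 * s + 0.5 * u[k]
        branch = s - 2.0 if s > s_prev else s + 2.0   # hysteresis switch
        y = dist[k] + 0.8 * y + 0.8 * np.tanh(branch)
        out[k] = y
    return out
```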

A. System Identification

System identification was first performed, starting with random weight values, in the absence of disturbance; a summary plot is presented in Fig. 9(a). This was repeated, starting again with random weight values, in the presence of disturbance; a summary plot is presented in Fig. 9(b). Plant models were trained using a series-parallel method first to accomplish coarse training. Final training was always done in a parallel connection to ensure unbiased results. The RTRL algorithm was used with an adaptive learning rate for each layer of neurons: $\mu_k^i = \min_{0 \le j \le k} 1/\|X_{i,j}\|^2$, where $i$ is the layer number and $X_{i,k}$ is the input vector to that layer. This simple rule of thumb, based on stability results for LMS, was very effective in providing a ballpark adaptation rate for RTRL.
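Because $\min_{0 \le j \le k} 1/\|X_{i,j}\|^2 = 1/\max_{0 \le j \le k} \|X_{i,j}\|^2$, the rule can be implemented by tracking only the largest input norm seen so far for each layer, as in the following sketch (names are illustrative):

```python
# Sketch of the per-layer adaptive learning rate used with RTRL/BPTM.
import numpy as np

class LayerLearningRate:
    """mu_k^i = min_{0<=j<=k} 1/||X_{i,j}||^2 for one layer i."""
    def __init__(self):
        self.max_norm_sq = 0.0

    def step(self, x_k):
        """x_k: input vector to this layer at time k; returns mu_k^i."""
        self.max_norm_sq = max(self.max_norm_sq, float(np.dot(x_k, x_k)))
        return 1.0 / self.max_norm_sq if self.max_norm_sq > 0.0 else 1.0
```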

In all cases, a neural network was found that satisfactorily identified the system. Each system was driven with an i.i.d. uniform control signal. This was not characteristic of the control signal generated by the trained controller in the next section, but it was a starting point and worked quite well to initialize the plant model for use in training the controller. Each plant produced its own very characteristic output for the same input, as seen in Fig. 9, but it was shown that neural networks could be trained to identify each system nearly perfectly.

When disturbance is added, it is useful to think of the "disturbed plant dynamics" and the "nominal plant dynamics." In each case, the system identification process matched the nominal dynamics of the plant, which is what theory predicts and what we would like.



Fig. 10. (a) Feedforward control of system 1. The controller, $\mathcal{N}(2,1)$:6:1, was trained to track uniformly distributed (white) random input, between [−8, 8]. Trained performance and generalization are shown.

B. Feedforward Control

After system identification was done, the controller was trained to perform feedforward control of each system. The control input signal was always uniform i.i.d. random input; since the plants themselves are artificial, an artificial control signal was chosen. In general, it is the hardest control signal to follow. The plants were undisturbed; disturbance canceling for disturbed plants is considered in the next section.

The control signals generated by the trained controllers are not i.i.d. uniform. Therefore, the system identification performed in the previous section is not sufficient to properly train the controller. It provides a very good initial set of values for the weights of the controller, however, and system identification continues on-line as the controller is trained with the BPTM algorithm. Again, the adaptive learning rate $\mu_k^i = \min_{0 \le j \le k} 1/\|X_{i,j}\|^2$ proved effective.

First, the controllers were trained with an i.i.d. uniform command input. The reference model in all cases (except for system 2) was a unit delay. When the weights had converged, the values of the network weights were frozen, and the controller was tested with an i.i.d. uniform input, a sinusoid, and a square wave to show the performance and generalization of the system. The results are presented in Fig. 10. The desired plant output (gray line) and the true plant output (solid line) are shown for each scenario.

Generally (specific variations are addressed below), the tracking of the uniform i.i.d. signal was nearly perfect, and the tracking of the sinusoidal and square waves was excellent as well. We see that the control signals generated by the controller are quite different for the different desired trajectories, so the controller generalizes well. When tracking a sinusoid, the control signal for a linear plant would also be sinusoidal. Here, the control signal is never sinusoidal, indicating in a way the degree of nonlinearity in the plants.

Notes on System 2: System 2 is a nonminimum-phase plant. This can be easily verified by noticing that its linear-filter part has a zero at $z = 2$. This plant cannot follow a unit-delay reference model.
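To see this, write the linear part of System 2's difference equation in the $z$-domain:

$$u_{k-1} - 2u_{k-2} \;\longleftrightarrow\; \big(z^{-1} - 2z^{-2}\big)U(z) = z^{-2}(z - 2)\,U(z),$$

so the zero at $z = 2$ lies outside the unit circle, no stable causal inverse exists, and a delayed (approximate) inverse must be used instead.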



Fig. 10. (b) Feedforward control of system 2. The controller, $\mathcal{N}(20,1)$:20:1, was trained to track uniformly distributed (white) random input, between [−0.75, 2.5].

Therefore, reference models with different latencies were tried, for delays of zero time samples up to 15 time samples. In each case, the controller was fully trained, and the steady-state mean-squared system error was measured. A plot of the results is shown in Fig. 11. Since both low MSE and low latency are desirable, the reference model for this work was chosen to be a delay of ten time samples: $M(z) = z^{-10}$.

Notes on System 3: The output of system 3 is constrained to be positive due to the squaring device in its representation. Therefore it is interesting to see how well this system generalizes when asked to track a zero-mean sinusoidal signal. As shown in Fig. 10(c), the result is something like a full-wave rectified version of the desired result. This is neither good nor bad; it is simply curious.

Simulations were also done to see what would happen if the system were trained to follow this kind of command input. In that case, the plant output looks like a half-wave rectified version of the input. Indeed, this is the result that minimizes MSE; the training algorithm works!

Notes on System 4: As can be seen from Fig. 10(d), the control signal required to control this plant is extremely harsh. The hysteresis in the plant requires a type of modulated bang-bang control. Notice that the control signals for the three different inputs are almost identical. The plant is very sensitive to its input, yet can be controlled precisely by a neural network trained with the BPTM algorithm.

C. Disturbance Canceling

With system identification and feedforward control accomplished, disturbance cancellation was performed. The input to the disturbance-canceling filter $X$ was chosen to be tap-delayed copies of the $u_k$ and $\hat{w}_k$ signals.

The BPTM algorithm was used to train the disturbance cancelers, with the same adaptive learning rate $\mu_k^i = \min_{0 \le j \le k} 1/\|X_{i,j}\|^2$. After training, the performance of the cancelers was tested, and the results are shown in Fig. 12. In this figure, each system was run with the disturbance canceler turned off for 500 time samples, and then turned on for the next 500 time samples. The squared system error is plotted. The disturbance cancelers do an excellent job of removing the disturbance from the systems.



Fig. 10. (c) Feedforward control of system 3. The controller, $\mathcal{N}(2,8)$:8:1, was trained to track uniformly distributed (white) random input, between [0, 3].


VII. CONCLUSIONS

Adaptive inverse control is very simple yet highly effective. It works for minimum-phase or nonminimum-phase, linear or nonlinear, SISO or MIMO, stable or stabilized plants. The control scheme is partitioned into smaller sub-problems that can be independently optimized. First, an adaptive plant model is made; secondly, a constrained adaptive controller is generated; finally, a disturbance canceler is adapted. All three processes may continue concurrently, and the control architecture is unbiased if the plant is linear, and is minimally biased if the plant is nonlinear. Excellent control and disturbance canceling for minimum-phase or nonminimum-phase plants is achieved.

ACKNOWLEDGMENT

This work was supported in part by the following organizations: the National Science Foundation under contract ECS–9522085, and the Electric Power Research Institute under contract WO8016–17.

REFERENCES

[1] K. J. Astrom and B. Wittenmark, Adaptive Control, Addison-Wesley, Reading, MA, second edition, 1995.

[2] S. H. Lane and R. F. Stengel, "Flight control design using non-linear inverse dynamics", Automatica, vol. 24, no. 4, pp. 471–83, 1988.

[3] D. Enns, D. Bugajski, R. Hendrick, and G. Stein, "Dynamic inversion: An evolving methodology for flight control design", International Journal of Control, vol. 59, no. 1, pp. 71–91, 1994.

[4] Y. D. Song, T. L. Mitchell, and H. Y. Lai, "Control of a class of nonlinear uncertain systems via compensated inverse dynamics approach", IEEE Transactions on Automatic Control, vol. 39, no. 9, pp. 1866–71, 1994.

[5] B. S. Kim and A. J. Calise, "Nonlinear flight control using neural networks", Journal of Guidance, Control and Dynamics, vol. 20, no. 1, pp. 26–33, January–February 1997.

[6] L. Yan and C. J. Li, "Robot learning control based on recurrent neural network inverse model", Journal of Robotic Systems, vol. 14, no. 3, pp. 199–212, 1997.



Fig. 10. (d) Feedforward control of system 4. The controller, $\mathcal{N}(10,1)$:30:1, was trained to track uniformly distributed (white) random input, between [−0.1, 0.1]. Notice that the control signals are almost identical for these three very different inputs!

[7] G. L. Plett, "Adaptive inverse control of unknown stable SISO and MIMO linear systems", International Journal of Adaptive Control and Signal Processing, vol. 16, no. 4, pp. 243–72, May 2002.

[8] G. L. Plett, Adaptive Inverse Control of Plants with Disturbances, PhD thesis, Stanford University, Stanford, CA 94305, May 1998.

[9] B. Widrow, G. L. Plett, E. Ferreira, and M. Lamego, "Adaptive inverse control based on nonlinear adaptive filtering", in Proceedings of 5th IFAC Workshop on Algorithms and Architectures for Real-Time Control AARTC'98, (Cancun, MX: April 1998), pp. 247–252, (invited paper).

[10] B. Widrow and G. L. Plett, "Nonlinear adaptive inverse control", in Proceedings of the 36th IEEE Conference on Decision and Control, (San Diego, CA: 10–12 December 1997), vol. 2, pp. 1032–1037.

[11] M. Bilello, Nonlinear Adaptive Inverse Control, PhD thesis, Stanford University, Stanford, CA, April 1996.

[12] B. Widrow and G. L. Plett, "Adaptive inverse control based on linear and nonlinear adaptive filtering", in Proceedings of the World Congress on Neural Networks, (San Diego, CA: September 1996), pp. 620–27.

[13] B. Widrow and E. Walach, Adaptive Inverse Control, Prentice Hall PTR, Upper Saddle River, NJ, 1996.

[14] H. T. Siegelmann, B. B. Horne, and C. L. Giles, "Computational capabilities of recurrent NARX neural networks", IEEE Trans. Sys., Man and Cybernetics, Part B: Cybernetics, vol. 27, no. 2, pp. 208–215, April 1997.

[15] P. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD thesis, Harvard University, Cambridge, MA, August 1974.

[16] D. B. Parker, "Learning logic", Tech. Rep. Invention Report S81–64, File 1, Office of Technology Licensing, Stanford University, October 1982.

[17] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation", in Parallel Distributed Processing, D. E. Rumelhart and J. L. McClelland, Eds., vol. 1, chapter 8. The MIT Press, Cambridge, MA, 1986.

[18] R. J. Williams and D. Zipser, "Experimental analysis of the real-time recurrent learning algorithm", Connection Science, vol. 1, no. 1, pp. 87–111, 1989.

[19] D. H. Nguyen and B. Widrow, "Neural networks for self-learning control systems", IEEE Control Systems Magazine, vol. 10, no. 3, pp. 18–23, April 1990.

[20] S. W. Piche, "Steepest descent algorithms for neural network controllers and filters", IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 198–212, March 1994.

[21] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: Perceptron, Madaline, and backpropagation", Proceedings of the IEEE, vol. 78, no. 9, pp. 1415–42, September 1990.

[22] A. N. Kolmogorov, "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition", Dokl. Akad. Nauk USSR, vol. 114, pp. 953–56, 1957, (in Russian).



Fig. 12. Plots showing disturbance cancellation. Each system is run with the disturbance canceler turned off for 500 time steps. Then, the disturbance canceler is turned on and the system is run for an additional 500 time steps. The squared amplitude of the system error is plotted. The disturbance-canceler architectures were: $\mathcal{N}([5,5],2)$:10:1, $\mathcal{N}([5,5],1)$:10:1, $\mathcal{N}([5,5],4)$:10:1, and $\mathcal{N}([5,5],1)$:30:1, respectively.


Fig. 11. Mean-squared system error (in dB) plotted versus control-system latency for System 2.

[23] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, MA, 1992.

[24] G. V. Puskorius and L. A. Feldkamp, "Recurrent network training with the decoupled extended Kalman filter algorithm", in Science of Artificial Neural Networks, (Orlando, FL: 21–24 April 1992), SPIE Proceedings Series, vol. 1710, part 2, pp. 461–473.

[25] G. L. Plett and H. Bottrich, "DDEKF learning for fast nonlinear adaptive inverse control", in CD-ROM Proceedings of 2002 World Congress on Computational Intelligence, (Honolulu, HI: May 2002).

[26] C. E. Garcia and M. Morari, "Internal model control. 1. A unifying review and some new results", Industrial and Engineering Chemistry Process Design and Development, vol. 21, no. 2, pp. 308–23, April 1982.

[27] C. E. Garcia and M. Morari, "Internal model control. 2. Design procedure for multivariable systems", Industrial and Engineering Chemistry Process Design and Development, vol. 24, no. 2, pp. 472–84, April 1985.

[28] C. E. Garcia and M. Morari, "Internal model control. 3. Multivariable control law computation and tuning guidelines", Industrial and Engineering Chemistry Process Design and Development, vol. 24, no. 2, pp. 484–94, April 1985.

[29] D. E. Rivera, M. Morari, and S. Skogestad, "Internal model control. 4. PID controller design", Industrial and Engineering Chemistry Process Design and Development, vol. 25, no. 1, pp. 252–65, January 1986.

[30] C. G. Economou and M. Morari, "Internal model control. 5. Extension to nonlinear systems", Industrial and Engineering Chemistry Process Design and Development, vol. 25, no. 2, pp. 403–11, April 1986.

[31] C. G. Economou and M. Morari, "Internal model control. 6. Multiloop design", Industrial and Engineering Chemistry Process Design and Development, vol. 25, no. 2, pp. 411–19, April 1986.

[32] J. E. Marsden and A. J. Tromba, Vector Calculus, pp. 243–7, W. H. Freeman and Company, third edition, 1988.

[33] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks", IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, March 1990.

Gregory L. Plett (S'97–M'98–SM'02) was born in Ottawa, ON, in 1968. He received the B.Eng. degree in computer systems engineering (with high distinction) from Carleton University, Ottawa, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, CA, in 1992 and 1998, respectively. He is currently Assistant Professor of Electrical and Computer Engineering at the University of Colorado at Colorado Springs.