1220 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 4, JULY 2007
Self-Organizing Approximation-Based Control for Higher Order Systems
Yuanyuan Zhao, Student Member, IEEE, and Jay A. Farrell, Senior Member, IEEE
Abstract—Adaptive approximation-based control typically uses approximators with a predefined set of basis functions. Recently, spatially dependent methods have defined self-organizing approximators where new locally supported basis elements were incorporated when existing basis elements were insufficiently excited. In this paper, performance-dependent self-organizing approximators will be defined. The designer specifies a positive tracking error criterion. The self-organizing approximation-based controller then monitors the tracking performance and adds basis elements only as needed to achieve the tracking specification. The method of this paper is applicable to general nth-order input-state feedback linearizable systems. This paper includes a complete stability analysis and a detailed simulation example.
Index Terms—Adaptive nonlinear control, locally weighted learning, self-organizing approximation-based control.
I. INTRODUCTION
ONLINE approximation-based control has been well developed to achieve stability and accurate reference input tracking in the presence of partially unknown nonlinear dynamics. The design and analysis of adaptive systems involving online approximation structures have been extensively addressed in, e.g., [1]–[8], including controller structure selection, automatic adjustment of the control law, and complete proofs of stability. Since such online approximation-based controllers can never achieve exact modeling of unknown nonlinearities, inherent approximation errors could arise even if optimal approximator parameters were selected. With the existing results, it is well understood that the tracking performance ultimately achieved is determined by the upper bound on these inherent approximation errors, which can be improved by the choices of the designer, such as the approximator structure, the parameter adaptation algorithm, etc. Therefore, proper choices of the approximator structure and parameter adaptation will produce better tracking performance without changing the control gain (i.e., bandwidth) of the control system.
Many published universal approximation results [9], [10] state that, under reasonable assumptions on the basis function and the function to be approximated, for any given ε > 0, if the network approximator has a sufficiently large number of nodes, then approximation accuracy ε can be achieved by
Manuscript received November 20, 2005; revised August 15, 2006; accepted February 5, 2007. This work was supported by the National Science Foundation under Grant ECS-0322635.
The authors are with the Department of Electrical Engineering, University of California, Riverside, CA 92521 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNN.2007.899217
proper selection of the approximator parameters. Thus, an ideal design needs to allocate a sufficiently large number of learning parameters to achieve the accuracy specification for function approximation while maintaining low computational complexity. Additionally, allocating too many learning parameters bears the danger of overparameterizing the approximation, which may then be capable of fitting the measurement noise. With these motivations, nonlinear adaptive control with function approximation employing automatic structure adaptation has been discussed in [2] and [11]–[16]. In [13] and [14], the authors use wavelet networks and adapt the structure of the network in response to the evaluation of the magnitude of the output weights by "hard-thresholding." Smoothly interpolated linear models are considered in [2].
The work of [15] and [16] defines local approximators within localized receptive fields, and allows the online approximation to be tuned in a local region without affecting the approximation accuracy previously achieved in other regions. Therefore, a more capable function approximation structure is produced to retain approximation accuracy as a function of the operating point. For the structure adaptation, these papers use gradient descent to adjust the distance metric of each local approximator so that each receptive field is tuned according to local curvature properties of the unknown function. In [2], [11], and [12], the authors use a linearly parameterized model locally, which is a special case of the receptive field weighted regression (RFWR) approach proposed in [15] and [16]. No stability results are given in [15] and [16]. The common drawbacks of the existing approaches [2], [11]–[14] are the following: 1) they only address the stability analysis for the state and the approximator parameters, not the change in the number of basis functions, and 2) the structure adaptation algorithms are driven by the trajectory, not by performance. New approximator nodes are added when the state is sufficiently far from all existing receptive field centers, whether or not additional approximation accuracy is required.
In [17], we suggested an approach to adjust the structure of the approximator based on the tracking performance of the control system. The structure of the function approximator is tuned online such that sufficiently accurate approximation is attained with the least number of local approximators to ensure the required state tracking performance without enlarging the bandwidth of the control system. The analysis of [17] focused on the scalar single-input–single-output (SISO) system with the input gain being known and constant.
In this paper, we extend the design method of the approximator structure adaptation to nth-order input-state feedback linearizable systems with unknown input gain. Our goal is to design a self-organizing online approximation-based control to achieve
1045-9227/$25.00 © 2007 IEEE
prespecified tracking accuracy for a linear combination of state tracking errors, without using high-gain control or large magnitude switching. We will also show that each component of the tracking error satisfies a known bound.
II. PROBLEM STATEMENT
In this paper, we extend the previous self-organizing controller design of [17] from scalar systems to nth-order SISO input-state feedback linearizable systems of the form
$$\dot{x}_i = x_{i+1}, \quad i = 1, \ldots, n-1 \tag{1}$$
$$\dot{x}_n = f_o(x) + f(x) + \left[g_o(x) + g(x)\right]u \tag{2}$$
where x = [x_1, …, x_n]ᵀ ∈ ℝⁿ is the state vector and u ∈ ℝ is the control signal. The functions f_o and g_o represent the known model at the design stage (i.e., the design model). The functions f and g represent nonlinear effects that are unknown at the design stage. The functions f_o, g_o, f, and g are each assumed to be continuous functions.
Our goal is to design the control signal u to steer x_1 to track the reference input and to achieve boundedness of the states x_i for i = 2, …, n. In regions of state space where the effect of the model error is too large for the desired level of tracking to be achieved, the (provably stable) approach presented herein will adapt the structure of the online approximator to allow the tracking specification to be achieved.
To ensure controllability, it is necessary to assume that g_o(x) + g(x) is bounded away from zero and of known sign. Therefore, without loss of generality, we will invoke the following assumption.
Assumption 2.1: The function g_o(x) + g(x) has a lower bound g_l(x) such that g_o(x) + g(x) ≥ g_l(x) ≥ g_0 > 0, where g_l(x) is a known function and g_0 is a known constant.
A. Reference Trajectory
When the physical system is designed, a specified operating range is defined for each element of the state vector as a compact set D_i ⊂ ℝ. Let D = D_1 × ⋯ × D_n. The control system should ensure that the state remains in the physical operating region D. It is, therefore, reasonable to assume that the desired state trajectory is sufficiently inside D. These constraints are stated more rigorously in the following.
There is a desired trajectory x_d(t) with derivatives x_d^(1), …, x_d^(n), each of which is available and bounded. The vector X_d = [x_d, x_d^(1), …, x_d^(n−1)]ᵀ ∈ D for all t, which implies that each element of X_d remains within its operating range for all t. In fact, we will assume the existence of a small constant δ > 0 such that
$$\inf_{y \in \partial D}\left\|X_d(t) - y\right\| \ge \delta \tag{3}$$
for any t. This condition states that the desired trajectory is at least a distance δ from the boundary of D. The unknown functions f and g will be approximated over the region D.
Throughout this paper,
$$\tilde{x}_i = x_i - x_d^{(i-1)}, \quad i = 1, \ldots, n$$
are the tracking error components and x̃ is the tracking error vector defined as x̃ = [x̃_1, …, x̃_n]ᵀ. Note that x̃ = x − X_d.
For parameter adaptation, it is convenient to define a scalar projection of the tracking error vector
$$s = \Lambda^T\tilde{x} \tag{4}$$
where Λ = [λ_1, …, λ_{n−1}, 1]ᵀ. Note that s = 0 defines an (n−1)-dimensional hyperplane in ℝⁿ. The absolute value of s represents the (scaled) distance of x̃ from this hyperplane. On the hyperplane s = 0, the dynamics of x̃_1 are defined by
$$\left(p^{n-1} + \lambda_{n-1}p^{n-2} + \cdots + \lambda_1\right)\tilde{x}_1 = 0$$
where p is the Laplace variable; therefore, Λ is selected so that p^{n−1} + λ_{n−1}p^{n−2} + ⋯ + λ_1 is a Hurwitz polynomial. In this case, the transfer function
$$\frac{\tilde{x}_1(p)}{s(p)} = \frac{1}{p^{n-1} + \lambda_{n-1}p^{n-2} + \cdots + \lambda_1}$$
is bounded-input–bounded-output (BIBO) stable. If s(t) can be shown to be bounded for all t ≥ 0, then each x̃_i is bounded.
To allow the bounds to be easily expressed, as suggested in [18, Sec. 7.1], it is convenient to choose Λ such that
$$s = \left(\frac{d}{dt} + \lambda\right)^{n-1}\tilde{x}_1$$
for some constant λ > 0. This implies that the vector Λ in (4) is defined by λ_i = C_{n−1}^{i−1}λ^{n−i}, where C_{n−1}^{i−1} = (n−1)!/[(i−1)!(n−i)!] is the binomial coefficient. The transfer functions to x̃_i from s are
$$\frac{\tilde{x}_i(p)}{s(p)} = \frac{p^{i-1}}{(p+\lambda)^{n-1}}, \quad i = 1, \ldots, n.$$
The advantage of defining s in this manner is that if there exists a constant ε_b > 0 such that the magnitude of s is asymptotically bounded as |s(t)| ≤ ε_b, then the tracking errors are asymptotically bounded by
$$\left|\tilde{x}_i\right| \le 2^{i-1}\lambda^{i-n}\varepsilon_b, \quad i = 1, \ldots, n \tag{5}$$
which yields
$$\|\tilde{x}\| \le \left(\sum_{i=1}^{n}\left(2^{i-1}\lambda^{i-n}\right)^2\right)^{1/2}\varepsilon_b \quad \text{as } t \to \infty \tag{6}$$
with x̃ = [x̃_1, …, x̃_n]ᵀ and ‖·‖ being the two-norm of a vector. See [18, pp. 279–280] for additional detail.
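The construction of Λ from the single design constant λ, and the resulting asymptotic error bounds in (5), can be sketched numerically. This is an illustrative sketch, not code from the paper; the function names and argument order are our own.

```python
from math import comb

def lambda_vector(n, lam):
    """Coefficients lambda_i = C(n-1, i-1) * lam**(n-i) for i = 1..n,
    i.e., the expansion of (d/dt + lam)**(n-1) applied to x~_1."""
    return [comb(n - 1, i) * lam ** (n - 1 - i) for i in range(n)]

def asymptotic_bounds(n, lam, eps_b):
    """Bounds (5): |x~_i| <= 2**(i-1) * lam**(i-n) * eps_b for i = 1..n."""
    return [2 ** i * lam ** (i + 1 - n) * eps_b for i in range(n)]
```

For n = 3 and λ = 2, `lambda_vector` returns [4, 4, 1], the coefficients of (p + 2)², and a deadzone of ε_b = 0.02 yields componentwise bounds 0.005, 0.02, and 0.08.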
The self-organizing online approximation-based controller developed in the subsequent sections is designed to maintain stability and to achieve a tracking accuracy of |s(t)| ≤ ε_b with ε_b prespecified at the design stage. If Λ is selected as previously mentioned, then |s(t)| ≤ ε_b ensures that each tracking error component satisfies the bound in (5) as t → ∞. This tracking objective will be achieved without using large magnitude switching or high gain control. The controller will also include terms to ensure that the region D is attractive (i.e., initial conditions outside D converge to D in finite time).
III. LWL ALGORITHM
In this paper, our development of an online approximation-based control employs locally weighted learning (LWL) [11], [12]. In LWL, the approximation f̂ of f at a point x is formed from the normalized weighted average of local approximators f̂_i such that
$$\hat{f}(x) = \frac{\sum_{i=1}^{N} w_i(x)\hat{f}_i(x)}{\sum_{j=1}^{N} w_j(x)} \tag{7}$$
where each w_i is nonzero only on a set denoted by S_i [defined in (8)] over which f̂_i will be adapted to improve its accuracy relative to f. In the following, we focus on a specific LWL algorithm and give all definitions required for the discussions that follow. Although we will only give definitions based on the function f, similar definitions and discussion also apply to g.
A. Weighting Functions
We define a continuous, nonnegative, and locally supported¹ weighting function w_i(x) for the ith local approximator. Denote the support of w_i by
$$S_i = \left\{x \in \mathbb{R}^n \mid w_i(x) > 0\right\}. \tag{8}$$
Let S̄_i denote the closure of S_i. Note that S̄_i is a compact set. An example of a weighting function satisfying the aforementioned conditions is the biquadratic kernel defined as
$$w_i(x) = \begin{cases}\left(1 - \dfrac{\|x - c_i\|^2}{\mu^2}\right)^2 & \text{if } \|x - c_i\| < \mu\\ 0 & \text{otherwise}\end{cases} \tag{9}$$
where c_i is the center location of the ith weighting function and μ is a constant which represents the radius of the region of support. In this example, the region of support is S_i = {x : ‖x − c_i‖ < μ}.
¹If we define the size of a set S by ρ(S) = max_{x,y∈S} ‖x − y‖, then "locally supported" means that ρ(S_i) is small relative to ρ(D).
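A minimal sketch of the biquadratic kernel (9) follows; the function name and argument layout are illustrative, not from the paper.

```python
def biquadratic(x, c, mu):
    """Biquadratic kernel (9): w(x) = (1 - ||x - c||^2 / mu^2)^2 on the
    open ball ||x - c|| < mu, and exactly zero outside it."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / mu ** 2
    return (1.0 - d2) ** 2 if d2 < 1.0 else 0.0
```

The kernel peaks at 1 at the center and decreases smoothly to zero at distance μ, so each local model influences only its own region of support.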
Since the approximator is self-organizing, the number of local approximators N(t) is not constant. Conditions for increasing N at discrete instants of time are presented in Section IV-D. Since N is time varying, the region over which the approximator f̂ defined in (7) can have a nonzero value is also time varying. This region is defined as
$$S(t) = \bigcup_{i=1}^{N(t)} S_i.$$
When x ∈ S, there exists at least one i such that w_i(x) > 0. The normalized weighting functions are defined as
$$\bar{w}_i(x) = \frac{w_i(x)}{\sum_{j=1}^{N} w_j(x)}.$$
The set of nonnegative functions {w̄_i} forms a partition of unity on S
$$\sum_{i=1}^{N}\bar{w}_i(x) = 1 \quad \text{for all } x \in S.$$
Note that the support of w̄_i is exactly the same as the support of w_i.
When x ∉ S, all w_i(x) for i = 1, …, N are zero. Therefore, to complete the approximator definition of (7) to be valid for any x ∈ D
$$\hat{f}(x) = \begin{cases}\sum_{i=1}^{N}\bar{w}_i(x)\hat{f}_i(x) & \text{if } x \in S\\ 0 & \text{if } x \notin S.\end{cases} \tag{10}$$
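The normalized LWL prediction (7), completed by the convention (10) that the approximator is zero outside the union of supports, can be sketched as follows. The names (`lwl_eval`, `phi`) and the affine basis used in the example are illustrative assumptions.

```python
def lwl_eval(x, centers, thetas, mu, phi):
    """Evaluate (10): normalized weighted average of local models
    theta_i . phi_i(x), using biquadratic kernels of radius mu.
    Returns 0.0 when x lies outside every region of support."""
    def w(c):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / mu ** 2
        return (1.0 - d2) ** 2 if d2 < 1.0 else 0.0
    ws = [w(c) for c in centers]
    total = sum(ws)
    if total == 0.0:
        return 0.0  # x outside S: the approximator is defined to be zero
    local = [sum(t * p for t, p in zip(th, phi(x, c)))
             for th, c in zip(thetas, centers)]
    return sum(wi * fi for wi, fi in zip(ws, local)) / total
```

With a single local affine model θ = [2, 0, 0] centered at the origin, the prediction equals 2 at the center and is exactly zero outside the kernel's support.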
In the remainder of this section, we will only consider the case when x ∈ S to give all definitions for the LWL algorithm.
B. Local Approximators
We define
$$\hat{f}_i(x) = \theta_i^T\phi_i(x) \tag{11}$$
where φ_i(x) is a prespecified vector of continuous basis functions. For the function f in (2), the vector θ_i* denotes the unknown optimal parameter estimates for f on S̄_i
$$\theta_i^* = \arg\min_{\theta}\;\max_{x \in \bar{S}_i}\left|f(x) - \theta^T\phi_i(x)\right| \tag{12}$$
and the corresponding optimal local model is
$$f_i^*(x) = \left(\theta_i^*\right)^T\phi_i(x). \tag{13}$$
Note that θ_i* is well defined for each i because f and φ_i are smooth on compact S̄_i. Therefore, f_i* will be referred to as the optimal local approximator to f on S̄_i.
Let the approximation error on S̄_i be denoted as
$$\varepsilon_i(x) = f(x) - f_i^*(x). \tag{14}$$
In order for ε_i to be defined everywhere, let
$$\varepsilon_i(x) = \begin{cases} f(x) - f_i^*(x) & \text{on } \bar{S}_i\\ 0 & \text{otherwise.}\end{cases}$$
The controller will use a known design constant ε_f > 0. We assume that the basis set φ_i is sufficiently rich and μ is sufficiently small such that |ε_i(x)| ≤ ε̄_f < ε_f for all i, for some (unknown) positive constant ε̄_f. Note that the boundedness of ε_i comes from the fact that ε_i is continuous on compact S̄_i.
For any x ∈ S, f(x) can be represented as the weighted sum of the local approximators
$$f(x) = \sum_{i=1}^{N}\bar{w}_i(x)f_i^*(x) + \varepsilon(x). \tag{15}$$
This expression defines the approximation error ε(x) = Σᵢ w̄_i(x)ε_i(x) on S, which satisfies |ε(x)| ≤ ε̄_f, since
$$|\varepsilon(x)| \le \sum_{i=1}^{N}\bar{w}_i(x)\left|\varepsilon_i(x)\right| \tag{16}$$
$$\le \bar{\varepsilon}_f\sum_{i=1}^{N}\bar{w}_i(x) = \bar{\varepsilon}_f. \tag{17}$$
Therefore, if each local model has accuracy ε̄_f on S̄_i, then the global accuracy of Σᵢ w̄_i f_i* on S also achieves at least accuracy ε̄_f. The term ε(x) in (15) is the inherent approximation error of f for x ∈ S.
Since we have assumed that f is unknown, the parameter vector θ_i* is unknown for each i. The control law will, therefore, be written using approximated functions defined globally on D by (10) and locally on S̄_i by (13). The controller will be adaptive in the sense that the local parameter vectors θ_i will be adjusted to improve the function approximation accuracy while the controller is in operation. For analysis of the convergence of the parameter estimates, we define the parameter error vector as θ̃_i = θ_i − θ_i* for the ith local approximator. The controller will also be self-organizing in the sense that the structure of the approximator will be augmented during operation as necessary to achieve a tracking specification.
IV. APPROXIMATION-BASED FEEDBACK LINEARIZATION
We consider the design and analysis of self-organizing online approximation-based control for feedback linearizable systems. The task of this paper is to design an algorithm to achieve prespecified tracking accuracy |s(t)| ≤ ε_b. In [17], we developed a self-organizing approach, for the special case of system (1) and (2) with n = 1 and the input gain known, that guaranteed ε_b-accurate tracking. In this section, we present a self-organizing control approach appropriate for nth-order input-state feedback linearizable systems for a general input gain satisfying Assumption 2.1.
A. Tracking-Error-Based Controller
The tracking error dynamics are described as
$$\dot{\tilde{x}}_i = \tilde{x}_{i+1}, \quad i = 1, \ldots, n-1 \tag{18}$$
$$\dot{\tilde{x}}_n = f_o(x) + f(x) + \left[g_o(x) + g(x)\right]u - x_d^{(n)} \tag{19}$$
which can be written in the matrix form as
$$\dot{\tilde{x}} = A\tilde{x} + b\left(f_o(x) + f(x) + \left[g_o(x) + g(x)\right]u - x_d^{(n)}\right) \tag{20}$$
for the tracking error vector x̃ defined prior to (4), where
$$A = \begin{bmatrix} 0 & 1 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & 1\\ 0 & 0 & \cdots & 0\end{bmatrix}, \qquad b = \begin{bmatrix} 0\\ \vdots\\ 0\\ 1\end{bmatrix}.$$
The time derivative of s can then be written as
$$\dot{s} = \Lambda^T A\tilde{x} + f_o(x) + f(x) + \left[g_o(x) + g(x)\right]u - x_d^{(n)}. \tag{21}$$
The control law is designed as
$$u = \frac{1}{g_o(x) + \hat{g}(x)}\left(-f_o(x) - \hat{f}(x) + x_d^{(n)} - \Lambda^T A\tilde{x} + v\right) \tag{22}$$
$$v = u_l + u_{sf} + u_{sg} + u_f + u_g. \tag{23}$$
The motivation for this choice of control law will become clear in the subsequent analysis. In the previously described control law design, u_l is a stabilizing linear controller that could take a variety of forms. Here, for simplicity, we choose the state feedback form
$$u_l = \begin{cases} -ks & \text{when } x \in D\\ -K\tilde{x} & \text{when } x \notin D\end{cases}$$
for a scalar control gain k > 0 defined for x ∈ D and a feedback gain vector K defined for x ∉ D such that Ā = A − bK is a Hurwitz matrix (i.e., all the eigenvalues of Ā are in the left half complex plane). We let u_sf and u_sg be zero for x ∈ D. Outside the region D, they are defined to ensure the states will converge to the approximation region D. We design the u_sf and u_sg terms as
$$u_{sf} = \begin{cases} 0 & \text{for } x \in D\\ -\rho_f(x) & \text{for } x \notin D \text{ and } b^T P\tilde{x} > 0\\ \rho_f(x) & \text{for } x \notin D \text{ and } b^T P\tilde{x} < 0\end{cases} \tag{24}$$
$$u_{sg} = \begin{cases} 0 & \text{for } x \in D\\ -\rho_g(x) & \text{for } x \notin D \text{ and } b^T P\tilde{x} > 0\\ \rho_g(x) & \text{for } x \notin D \text{ and } b^T P\tilde{x} < 0\end{cases} \tag{25}$$
with P the positive-definite matrix defined in Section IV-B,
where ρ_f(x) and ρ_g(x) are known bounding functions such that ρ_f(x) ≥ |f(x)| and ρ_g(x) ≥ |g(x)u| for x ∉ D, respectively. The u_f and u_g terms are designed to dominate the (small) residual approximation errors in f and g for x ∈ D. These terms will be zero for x ∉ D. We select the u_f and u_g terms as
$$u_f = -\varepsilon_f\,\mathcal{D}(s) \tag{26}$$
$$u_g = -\varepsilon_g|u|\,\mathcal{D}(s) \tag{27}$$
where ε_f and ε_g are the design accuracy constants and
$$\mathcal{D}(s) = \begin{cases}\operatorname{sgn}(s) & \text{for } |s| > \varepsilon_b\\ 0 & \text{otherwise.}\end{cases}$$
The function f̂ is defined in (10) and ĝ is defined similarly. We initialize the structure with no local approximators, i.e., N(0) = 0; therefore, the set S is initially empty.
The following analysis will be concerned with bounding the duration of time for which |s(t)| > ε_b. Therefore, to decrease redundancy later, we introduce the function T(t_1, t_2), which measures the amount of time in the interval [t_1, t_2] for which |s| > ε_b. For example, this function could be computed as
$$T(t_1, t_2) = \int_{t_1}^{t_2}\Delta(\tau)\,d\tau \tag{28}$$
where the function Δ is defined as
$$\Delta(\tau) = \begin{cases}1 & \text{if } |s(\tau)| > \varepsilon_b\\ 0 & \text{otherwise.}\end{cases}$$
In (28), the signal s(τ) is defined as s(τ) = Λᵀx̃(τ).
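A discrete-time sketch of (28), accumulating the sample period whenever the combined error is outside the deadzone (the names are ours, not the paper's):

```python
def time_outside_deadzone(s_samples, dt, eps_b):
    """Riemann-sum approximation of T(t1, t2) in (28): add dt for every
    sample where |s| exceeds the deadzone radius eps_b."""
    return sum(dt for s in s_samples if abs(s) > eps_b)
```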
B. Analysis for x ∉ D
The objective of this section is to demonstrate that the definitions of u_sf and u_sg in (24) and (25) ensure that all initial conditions will return to D in finite time.
For x ∉ D, u_l = −Kx̃ and all u_f and u_g terms are zero; therefore, the resulting closed-loop tracking error dynamics derived from (20) satisfy
$$\dot{\tilde{x}} = \bar{A}\tilde{x} + b\left[\left(f(x) + u_{sf}\right) + \left(g(x)u + u_{sg}\right)\right]. \tag{29}$$
Since we select the gain vector K such that Ā = A − bK is a Hurwitz matrix, for any given positive-definite Q, there exists a positive-definite matrix P that satisfies the Lyapunov equation
$$\bar{A}^T P + P\bar{A} = -Q.$$
In the following analysis, Q is selected to be the identity matrix. To analyze performance for x ∉ D, we define the Lyapunov function
$$V(\tilde{x}) = \tilde{x}^T P\tilde{x}. \tag{30}$$
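The Lyapunov equation above can be solved numerically; the sketch below uses a Kronecker-product vectorization with NumPy (the helper name is our own).

```python
import numpy as np

def lyapunov_P(A_bar):
    """Solve A_bar^T P + P A_bar = -I for a Hurwitz matrix A_bar.
    Uses vec(A X B) = (B^T kron A) vec(X) with column-major vec."""
    n = A_bar.shape[0]
    I = np.eye(n)
    M = np.kron(I, A_bar.T) + np.kron(A_bar.T, I)
    p = np.linalg.solve(M, -I.flatten(order="F"))
    return p.reshape(n, n, order="F")
```

For Ā = [[0, 1], [−2, −3]] (eigenvalues −1 and −2), the returned P is positive definite and satisfies the Lyapunov identity to machine precision.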
The derivative of V reduces to
$$\dot{V} = -\tilde{x}^T Q\tilde{x} + 2\tilde{x}^T Pb\left[\left(f + u_{sf}\right) + \left(gu + u_{sg}\right)\right].$$
The design of u_sf and u_sg in (24) and (25) will ensure that the second term on the right is nonpositive. This yields
$$\dot{V} \le -\tilde{x}^T\tilde{x} \le -\frac{V}{\lambda_{\max}(P)} \tag{31}$$
where λ_max(P) and λ_min(P) are the maximum and minimum eigenvalues of P, which are both positive since P is positive definite. The comparison principle yields
$$V(t) \le V(t_0)\,e^{-(t - t_0)/\lambda_{\max}(P)} \quad \text{for any } t \ge t_0.$$
Therefore, for any t larger than
$$t_0 + \lambda_{\max}(P)\ln\!\left(\frac{V(t_0)}{\lambda_{\min}(P)\,\delta^2}\right)$$
we have that ‖x̃‖ < δ. Because the desired trajectory, by (3), is at least δ from the boundary of D, this implies that x enters D in finite time.
If the region D is defined such that its boundary coincides with a level curve of (30), then once x ∈ D, the sliding mode terms will not allow x to leave D. If the boundary of D is not a level curve of (30), then states in D may leave that region, but will return to D in finite time. As long as x ∈ D, then satisfaction of the tracking specification on s (as proved subsequently) will ensure that ultimately the state is confined to D even in the case where its boundary is not a level curve of (30).
The remainder of this paper will only be concerned with the case of x ∈ D, where all u_sf and u_sg terms are zero.
C. Analysis for x ∈ D
With the control law defined in (22) and (23), for x ∈ D, we consider the closed-loop dynamic equation derived from (21) for s as
$$\dot{s} = -ks + \left[f(x) - \hat{f}(x)\right] + \left[g(x) - \hat{g}(x)\right]u + u_f + u_g. \tag{32}$$
We will first consider the tracking performance for x ∈ D∖S (i.e., f̂(x) = 0 and ĝ(x) = 0). In this case, the closed-loop dynamic equation for s defined in (32) becomes
$$\dot{s} = -ks + f(x) + g(x)u + u_f + u_g. \tag{33}$$
For x ∈ D, define the Lyapunov function as V_s = s²/2. Then, the derivative of V_s is
$$\dot{V}_s = s\dot{s} = -ks^2 + s\left[f(x) + u_f\right] + s\left[g(x)u + u_g\right]. \tag{34}$$
When x ∈ D∖S, |f(x)| < ε_f, and |g(x)| < ε_g, the definitions of u_f and u_g in (26) and (27) will ensure that
$$s\left[f(x) + u_f\right] \le 0 \quad \text{and} \quad s\left[g(x)u + u_g\right] \le 0 \quad \text{for } |s| > \varepsilon_b.$$
Then, the derivative of V_s is reduced to
$$\dot{V}_s \le -ks^2 = -2kV_s. \tag{35}$$
Therefore, if |f(x)| < ε_f and |g(x)| < ε_g while x ∈ D∖S, then the Lyapunov function must decrease. If V_s increases while x ∈ D∖S, then it must be true that |f(x)| ≥ ε_f or |g(x)| ≥ ε_g. This will be used as one condition for augmenting the approximator structure. Continuing from (35) using the comparison principle
$$V_s(t) \le V_s(t_1)\,e^{-2k(t - t_1)} \tag{36}$$
$$|s(t)| \le |s(t_1)|\,e^{-k(t - t_1)}. \tag{37}$$
Therefore, if |s(t)| ≥ ε_b while x ∈ D∖S for a duration longer than (37) allows (i.e., longer than (1/k)ln(|s(t_1)|/ε_b)), then it must be true that |f(x(t))| ≥ ε_f or |g(x(t))| ≥ ε_g for some range of t. This will be used as a second condition for augmenting the approximator structure.
D. Structure Adaptation
The analysis in Section IV-C demonstrated that the tracking specification (i.e., |s| ≤ ε_b ultimately) can be achieved without approximation of the unknown design model errors f or g in those regions of the operating envelope where the known design model provides sufficient accuracy (i.e., |f(x)| < ε_f and |g(x)| < ε_g). In addition, that analysis showed that the error variable s is a useful indicator of where the design model is not capable of achieving the tracking specification. Based on the analysis of Section IV-C, we define the following criteria for adding a new local approximator to the approximation structure. A local approximator is added if the following holds:
1) the present operating point x(t) does not activate any of the existing local approximators (i.e., w_i(x(t)) = 0 for i = 1, …, N);
2) either of the following two conditions is satisfied:
a) the function V_s is increasing while |s| > ε_b;
b) |s(t)| ≥ ε_b for all t over a sufficiently long interval.
For i ≥ 1, we denote the time at which the ith local model is added as t_i (i.e., N(t_i⁻) = i − 1 and N(t_i) = i). The center location of the new local approximator is defined to be c_i = x(t_i).
With this notation, N(t) is constant with value i for t ∈ [t_i, t_{i+1}). It is possible that for some finite i, the approximator has sufficient approximation capability, in which case t_{i+1} = ∞.
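The allocation test of this section can be sketched as a predicate evaluated at each sample; the function name, the history-window encoding, and the persistence test are illustrative choices, not the paper's exact algorithm.

```python
def should_add_center(no_active_kernel, s_history, eps_b, dt, window):
    """Return True when a new local approximator should be allocated:
    (1) no existing kernel is active at the current state, and
    (2a) |s| grew at the last sample while outside the deadzone, or
    (2b) |s| stayed outside the deadzone for the whole window."""
    if not no_active_kernel:
        return False
    outside = [abs(s) > eps_b for s in s_history]
    growing = (len(s_history) >= 2 and outside[-1]
               and abs(s_history[-1]) > abs(s_history[-2]))
    persistent = len(s_history) * dt >= window and all(outside)
    return growing or persistent
```

Gating on condition (1) is what prevents redundant centers: even when tracking is poor, no center is added at a state already covered by an existing kernel, since its parameters can still adapt there.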
E. Analysis for x ∈ S
The goal of this section is to prove that for t ∈ [t_N, t_{N+1}) (i.e., the number of local regions over this time interval is fixed at N), the signals s, x̃, x, θ_i, f̂, and ĝ are bounded for i = 1, …, N, and that the total time is bounded for which |s(t)| > ε_b. In this section, to simplify the notation, we use the fact that N is constant over the interval.
As shown in Section III-B, the accuracy of Σᵢ w̄_i f_i* on S achieves at least accuracy ε̄_f (i.e., |ε(x)| ≤ ε̄_f for any x ∈ S). Similarly, the corresponding optimal approximation of g achieves at least accuracy ε̄_g on S. Next, we will consider the adaptation of the local parameter vectors to achieve the tracking specification (i.e., |s| ≤ ε_b ultimately) for x ∈ S.
For x ∈ S, we consider the Lyapunov function
$$V_N = \frac{1}{2}s^2 + \frac{1}{2}\sum_{i=1}^{N}\left(\tilde{\theta}_i^{f\top}\Gamma_{f,i}^{-1}\tilde{\theta}_i^f + \tilde{\theta}_i^{g\top}\Gamma_{g,i}^{-1}\tilde{\theta}_i^g\right)$$
where θ̃_i^f = θ_i^f − θ_i^{f*}, θ̃_i^g = θ_i^g − θ_i^{g*}, and the positive-definite matrices Γ_{f,i} and Γ_{g,i} represent learning rates.
For x ∈ S, we choose the adaptive laws
$$\dot{\theta}_i^f = \begin{cases}\Gamma_{f,i}\,\bar{w}_i(x)\,\phi_i(x)\,s & \text{if } |s| > \varepsilon_b\\ 0 & \text{otherwise}\end{cases} \tag{38}$$
$$\tau_i^g = \Gamma_{g,i}\,\bar{w}_i(x)\,\phi_i(x)\,u\,s \tag{39}$$
$$\dot{\theta}_i^g = \begin{cases}\operatorname{Proj}\left(\tau_i^g\right) & \text{if } |s| > \varepsilon_b\\ 0 & \text{otherwise}\end{cases} \tag{40}$$
where Proj(·) is a projection modification operation to ensure that g_o(x) + ĝ(x) is bounded away from zero (see [19, App. E]). This is in accordance with Assumption 2.1 to ensure the controllability of the estimated model. The parameter adaptation will turn off when either |s| ≤ ε_b or x ∉ D.
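One Euler step of the deadzone-gated gradient law (38) can be sketched as follows (the projection modification of (40) is omitted; all names are illustrative):

```python
def adapt_step(theta, gamma, w_bar, phi, s, eps_b, dt):
    """Euler-integrate theta_dot = gamma * w_bar * phi * s for one step,
    freezing adaptation inside the deadzone |s| <= eps_b."""
    if abs(s) <= eps_b:
        return list(theta)  # adaptation off inside the deadzone
    return [t + dt * gamma * w_bar * p * s for t, p in zip(theta, phi)]
```

The deadzone gate is what prevents the parameter drift that measurement-level errors would otherwise cause: once the tracking specification is met, the estimates stop moving.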
Let [t_a, t_b) denote a time interval for which s is outside the deadzone (i.e., |s(t)| > ε_b). Over this time interval, the state x could be either outside or inside S.
1) For any subinterval [t_1, t_2) ⊂ [t_a, t_b) for which x ∉ S, the parameter adaptation will automatically stop because w̄_i(x) = 0 for every i. Therefore, the parameter-error portion of V_N is constant over this time interval. Section IV-C showed that |s| is decreasing over this interval. Therefore, V_N also decreases for t ∈ [t_1, t_2), such that |s(t_2)| ≤ |s(t_1)| and
$$V_N(t_2) \le V_N(t_1). \tag{41}$$
2) For any subinterval [t_1, t_2) ⊂ [t_a, t_b) for which x ∈ S, the time derivative of V_N along the solution of (32) is
$$\dot{V}_N = s\dot{s} + \sum_{i=1}^{N}\left(\tilde{\theta}_i^{f\top}\Gamma_{f,i}^{-1}\dot{\theta}_i^f + \tilde{\theta}_i^{g\top}\Gamma_{g,i}^{-1}\dot{\theta}_i^g\right). \tag{42}$$
Substituting (38)–(40) and the definitions of u_f and u_g from (26) and (27) into (42), we obtain
$$\dot{V}_N \le -ks^2 - |s|\left(\varepsilon_f - \bar{\varepsilon}_f\right) - |s||u|\left(\varepsilon_g - \bar{\varepsilon}_g\right). \tag{43}$$
Note that since the projection modification ensures that g_o(x) + ĝ(x) is positive and bounded away from zero, the control signal u in (22) is well defined. Thus
$$\dot{V}_N \le -ks^2 \tag{44}$$
for any t ∈ [t_1, t_2). Therefore, with |s| > ε_b, we can show that
$$\dot{V}_N \le -k\varepsilon_b^2. \tag{45}$$
From this, it is straightforward to show that
$$V_N(t_2) \le V_N(t_1) - k\varepsilon_b^2\left(t_2 - t_1\right)$$
which implies
$$t_2 - t_1 \le \frac{V_N(t_1)}{k\varepsilon_b^2}.$$
Next, we consider the case where s enters the deadzone at time t_a, stays inside the deadzone, and leaves the deadzone at t_b. Therefore, [t_a, t_b) denotes a time interval for which |s(t)| ≤ ε_b and N is constant. By definition
$$T(t_a, t_b) = 0.$$
In addition, the following facts are true: 1) on the interval [t_a, t_b), the approximator parameters are constant (i.e., adaptation is off); 2) |s(t_a)| = |s(t_b)| = ε_b; and 3) the parameter-error portion of V_N is unchanged. Using these facts, it is straightforward to show that V_N(t_b) = V_N(t_a) and |s(t_b)| = |s(t_a)|. Note that these facts are independent of whether x enters or leaves the region S over the time interval [t_a, t_b).
In the following, we will consider the stability properties for any t ∈ [t_N, t_{N+1}). According to the criteria given in Section IV-D for adding the Nth local region, at t = t_N, |s(t_N)| ≥ ε_b and V_N(t_N) is finite. Assume s starts at t_N outside the deadzone, enters the deadzone at t_{e,1}, leaves at t_{l,1}, with t_{e,i} < t_{l,i} < t_{e,i+1}, and eventually stays outside the deadzone until t_{N+1}. Let t_last be the last time in this interval such that |s(t_last)| = ε_b. Therefore, the total time outside the deadzone for t ∈ [t_N, t_{N+1}) is bounded as
$$T\left(t_N, t_{N+1}\right) \le \frac{V_N(t_N)}{k\varepsilon_b^2}. \tag{46}$$
This proves that on each interval [t_N, t_{N+1}), the total time outside the deadzone is finite. Therefore, either t_{N+1} is infinite with |s| ultimately bounded by ε_b, or t_{N+1} is finite with N increased by one at t_{N+1}.
Another important result is that
$$V_N\!\left(t_{N+1}^-\right) \le V_N\!\left(t_N\right)$$
which follows directly from the previous analysis whether x ∈ S or x ∉ S. Therefore, for any t ∈ [t_N, t_{N+1}), the signals s, x̃, x, θ_i^f, θ_i^g, f̂, and ĝ are bounded. Note that these properties hold even if the state enters or leaves the region S an infinite number of times, or if the combined tracking error enters or leaves the deadzone an infinite number of times.
Since D is compact and each increment of N covers a portion of a volume of D having radius μ, only a finite number of increments of N can occur. Thus, |s| is ultimately bounded by ε_b, and the signals s, x̃, x, θ_i^f, θ_i^g, f̂, and ĝ remain bounded.
V. STABILITY OF THE SELF-ORGANIZING PROCESS
Here, we summarize the stability properties of the proposed self-organizing controller.
Theorem 5.1: The system described by (1) and (2), with the control law of (22) and (23), using the self-organizing function approximation proposed in Section IV-D and the parameter adaptation laws given in (38)–(40), has the following properties:
1) s, x̃, x, θ_i^f, θ_i^g, f̂, and ĝ are bounded;
2) |s| is ultimately bounded by ε_b;
3) each |x̃_i| is ultimately bounded by 2^{i−1}λ^{i−n}ε_b, for i = 1, …, n, with λ being the constant selected for designing the Λ vector.
Proof: Let the time interval of operation be specified as
[0, t_f), where t_f can be infinite. We initialize the approximator structure with N(0) = 0. Denote the times at which N increases as t_1 < t_2 < ⋯, as discussed in Section IV-D. When t ∈ [0, t_1), N = 0 and S = ∅. As shown in Section IV-C, either the total time such that |s| > ε_b is finite, |s| is ultimately bounded by ε_b, and the theorem is proved, or t_1 is finite. In either case, all signals remain bounded on [0, t_1).
For N ≥ 1, the Nth local region is added at t_N. The previous sections have already proved items 1) and 2) of the theorem for t ∈ [t_N, t_{N+1}). The only remaining issue is the boundedness of each signal during the transition from N to N + 1. We want to show that V_{N+1}(t_{N+1}) has a finite value. The Lyapunov function at t_{N+1} is
$$V_{N+1}\!\left(t_{N+1}\right) = \frac{1}{2}s\!\left(t_{N+1}\right)^2 + \frac{1}{2}\sum_{i=1}^{N+1}\left(\tilde{\theta}_i^{f\top}\Gamma_{f,i}^{-1}\tilde{\theta}_i^f + \tilde{\theta}_i^{g\top}\Gamma_{g,i}^{-1}\tilde{\theta}_i^g\right)\Big|_{t=t_{N+1}}.$$
Note that s(t_{N+1}⁻) = s(t_{N+1}), because s is continuous during the transition from N to N + 1. Since x(t_{N+1}) does not activate the
first N local approximators when the (N+1)th local approximator is added at t_{N+1}, the parameter estimates θ_1, …, θ_N are unchanged from t_{N+1}⁻ to t_{N+1}. Thus
$$V_{N+1}\!\left(t_{N+1}\right) = V_N\!\left(t_{N+1}^-\right) + \frac{1}{2}\left(\tilde{\theta}_{N+1}^{f\top}\Gamma_{f,N+1}^{-1}\tilde{\theta}_{N+1}^f + \tilde{\theta}_{N+1}^{g\top}\Gamma_{g,N+1}^{-1}\tilde{\theta}_{N+1}^g\right)\Big|_{t=t_{N+1}}.$$
It has been proved in the previous section that, for any t ∈ [t_N, t_{N+1}), V_N(t) ≤ V_N(t_N). Then, we proceed to attain
$$V_{N+1}\!\left(t_{N+1}\right) \le V_1\!\left(t_1\right) + \frac{1}{2}\sum_{i=2}^{N+1}\left(\tilde{\theta}_i^{f\top}\Gamma_{f,i}^{-1}\tilde{\theta}_i^f + \tilde{\theta}_i^{g\top}\Gamma_{g,i}^{-1}\tilde{\theta}_i^g\right)\Big|_{t=t_i}. \tag{47}$$
For i = 2, …, N + 1, each θ̃_i^f(t_i) has a finite value as long as the initial parameter estimates at t_i are finite. Similarly, each θ̃_i^g(t_i) has a finite value. Since only a finite number of increments of N can occur (i.e., N is bounded), the summation term on the right-hand side of the inequality (47) has a finite value. The term V_1(t_1) is also finite. This directly yields that V_{N+1}(t_{N+1}) is finite, which implies that s, x̃, x, θ_i^f, θ_i^g, f̂, and ĝ are bounded.
The ultimate bound on each x̃_i comes directly from (5), which is guaranteed by the selection of Λ as discussed in [18].
VI. NUMERICAL EXAMPLE
We consider, for illustrative purposes, a second-order system given by
$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = f_o(x) + f(x) + \left[g_o(x) + g(x)\right]u$$
where we assume that there is only partial a priori knowledge of the system nonlinearities. The known "design models" are f_o and g_o; therefore, the unknown design model errors are f and g. We also assume that the system is designed to operate over a compact region D of the (x_1, x_2) phase plane.
A desired trajectory x_d(t) that is continuous, bounded, and has X_d(t) ∈ D for all t, is generated by a third-order prefilter driven by a user-specified signal r, where two functions are included to ensure that X_d(t) ∈ D for any t. In our simulations, we select the prefilter coefficients such that the transfer functions from r to x_d and its derivatives are BIBO stable. As long as r is bounded, the signals x_d, ẋ_d, and ẍ_d will be continuous and bounded. The main idea of such a prefilter approach to generating the reference trajectory is that the user specifies a signal r. The prefilter computes the necessary derivatives and ensures that the assumptions of the controller are satisfied. Theoretically, r can be any bounded signal. For the purpose of this simulation, r is selected to be the sum of
two components, one of which is a 0.05-Hz square wave of magnitude 1.5.
The tracking accuracy we expect to achieve is that, as t → ∞, |s(t)| ≤ ε_b with control gain k. The linearly combined tracking error is defined as s = λx̃_1 + x̃_2, with λ selected to be 1. The desired tracking accuracy for s is selected as ε_b = 0.02, so that by (5) we can ensure that |x̃_1| ≤ 0.02 and |x̃_2| ≤ 0.04 ultimately. The function approximation accuracies are specified as ε_f = ε_g = 0.03.
The weighting function is the biquadratic kernel of the form
$$w_i(x) = \begin{cases}\left(1 - \dfrac{\|x - c_i\|^2}{\mu^2}\right)^2 & \text{if } \|x - c_i\| < \mu\\ 0 & \text{otherwise}\end{cases} \tag{48}$$
where ‖x − c_i‖ is the Euclidean distance from the state to the ith center.
For this example, as done by the authors of [11] and [15], we specify the local basis function as
$$\phi_i(x) = \left[1,\; x_1 - c_{i,1},\; x_2 - c_{i,2}\right]^T$$
with c_i being the center of S_i. Therefore, f̂_i is the optimal local affine approximation to f on S_i. We select a fixed radius of support μ. The parameter estimates θ_1^f and θ_1^g on the first allocated region are initialized at t_1 as zero vectors. When the ith center is allocated at t_i, i ≥ 2, the initial parameter estimates θ_i^f(t_i) and θ_i^g(t_i) are selected either to be zero vectors or based on θ_j^f and θ_j^g, where
Fig. 1. Combined tracking error e(t) in the time interval of [0, 20] (dotted),[20, 40] (dash-dot), [40, 60] (dashed), and [60, 80] (solid). The second dottedline (with larger magnitude) shows e(t) in the time interval of [0, 20] for thesimulation with online approximation turned off.
j is the index of the closest existing center to the ith center. The logic for the parameter initialization is as follows: if the jth local model is applicable at the new center c_i, the constant term of θ_i^f is set to f̂_j(c_i) and the remaining terms to zero; else θ_i^f(t_i) is set to the zero vector.
This initialization forces the ith approximator to have the same value as the jth approximator would have at the center of S_i. This is a means of "bootstrapping" the learning process. The three elements of θ_i^g are initialized similarly. The adaptation rate matrices Γ_{f,i} and Γ_{g,i} are set to constant diagonal matrices, where diag(v) denotes the square diagonal matrix with diagonal components equal to the vector v.
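The bootstrapping initialization can be sketched as follows; the helper name, the `reach` test used to decide whether the neighbor's model is applicable, and the affine evaluation are illustrative assumptions.

```python
def init_theta(c_new, centers, thetas, dim, reach):
    """Initialize a new local model's parameters: copy the value the
    nearest existing affine model predicts at c_new into the constant
    term when that neighbor is within `reach`; otherwise start at zero."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    if centers:
        j = min(range(len(centers)), key=lambda k: dist(c_new, centers[k]))
        if dist(c_new, centers[j]) <= reach:
            # affine model theta[0] + theta[1:] . (x - c_j), evaluated at c_new
            val = thetas[j][0] + sum(t * (a - b) for t, a, b in
                                     zip(thetas[j][1:], c_new, centers[j]))
            return [val] + [0.0] * dim
    return [0.0] * (dim + 1)
```

Seeding the constant term this way makes the new model continuous with its neighbor's prediction at allocation time, rather than restarting the learning from zero.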
Fig. 1 shows the performance of e(t) for t ∈ [0, 80] s. Since the period of the reference input is 20 s, we have plotted e(t) for t ∈ [0, 20] (dotted), [20, 40] (dash-dot), [40, 60] (dashed), and [60, 80] (solid) along the same time axis. The time axis of each plot has been shifted by a multiple of 20 s to increase the resolution of the time axis and to facilitate the comparison over repetitions of the trajectory. Note that with online approximation, since the local regions are being revisited periodically, the tracking performance improves (i.e., |e(t)| tends to decrease) with each repetition of the trajectory. This indicates the local approximators are learning to achieve an increasingly more accurate approximation to the actual function. It is particularly important to note that the learning is a function of state, and the performance improvement will extend to different trajectories to the extent that the new trajectories utilize the same regions in state space. For comparison, the second dotted line shows a clipped portion of e(t) for simulations without learning of the online approximation. For the simulation without adaptation, the tracking performance does not show any improvement over the repetitions of the trajectory; the magnitude of e(t) is as large as 0.6.
Fig. 2. Plots of the tracking errors versus time t: (a) x̃_1 versus t; (b) x̃_2 versus t. Note that the horizontal axes are identical and that the caption is only applied to (b). The dotted lines are tracking errors in the time interval [0, 20]. The dash-dot lines are tracking errors in the time interval [20, 40]. The dashed lines are tracking errors in the time interval [40, 60]. The solid lines are tracking errors in the time interval [60, 80]. The second dotted lines (with larger magnitude) show the tracking errors in the time interval [0, 20] for the simulation with online approximation turned off.
The tracking performance is shown more clearly in Fig. 2, which plots the tracking errors x̃_1 and x̃_2 over the first four repetitions of the reference trajectory. Again, for comparison, the second dotted lines (with larger magnitude) show the tracking errors for simulations without the learning of the online approximation. Without learning, the range of x̃_1 was from −0.6 to 0.6.
Fig. 3(a) plots the time outside the deadzone at 20-s intervals, which is also the period of the commanded state. For example, Fig. 3(a) shows that the time outside the deadzone in the interval [0, 20] s is approximately 13 s. Prior to 300 s, the combined tracking error enters the deadzone and remains therein for the remainder of the simulation. Therefore, as shown in Fig. 3(b), the total time outside the deadzone is finite as t goes to infinity. This demonstrates that the tracking error specification is ultimately achieved and the controller does not include large gains or large amplitude high-frequency switching.
The allocated center locations for t ∈ [0, 80] s are indicated on a phase plane plot of x_1 versus x_2 in Fig. 4. During the first 80 s, 35 centers are allocated, with 19 on the right of the region M and 16 on the left. For the remaining simulation time, no more centers are added. Each "+" indicates an allocated center location. The small square around each center indicates the support S_i of each local approximator. Since this is an example for illustrative purposes, we already know the functions f and g. We can plot the region M
where both f_o has at least the accuracy ε_f and g_o has at least the accuracy ε_g (i.e., |f| < 0.03 and |g| < 0.03). Note that the separation between adjacent
Fig. 3. Graphs indicating the incremental and cumulative time that the trackingerror e is outside the deadzone. (a) Each circle indicates the total time duringthe previous 20-s interval that the combined tracking error e was outside thedeadzone (i.e., jej > 0:02). (b) Cumulative time outside the deadzone. Notethat the horizontal axes are identical and that the caption is only applied to (b).
Fig. 4. Phase-plane plot of x_1 versus x_2 for t ∈ [0, 80] s. The dotted line is the desired trajectory for t ∈ [0, 80] s. The solid line shows the actual trajectory. The markers indicate the assigned center locations, and the small square around each center location represents the associated region of support. M labels the region where |f| < 0.03 and |g| < 0.03.
centers is approximately uniform in both the horizontal and vertical directions. No local approximators are allocated inside the region M where both model errors f and g are small (i.e., |f| < 0.03 and |g| < 0.03). Even outside the region
Fig. 5. (a) Actual function f and its approximation f̂ at 400 s, plotted versus x_1 for x_2 = 0.8. (b) Actual function g and its approximation ĝ at 400 s, plotted versus x_1 for x_2 = 0.8. M = {x_1 | −2.21 ≤ x_1 ≤ 0.61} represents the region where |f(x)| < 0.03 and |g(x)| < 0.03 at the fixed value x_2 = 0.8. (c) Histogram of the x_1 values along the trajectory (x_1, x_2) such that 0.4 < x_2 < 1.2. The markers along the horizontal axis indicate the x_1 values of the center locations that would be active with x_2 = 0.8.
M, no local approximators are allocated where either the trajectory did not pass or the tracking error remained within the deadzone. Fig. 4 also depicts the desired (dotted) and actual (solid) trajectories over the time interval [0, 80].
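The allocation behavior visible in Fig. 4 can be summarized by a simple rule: add a locally supported element only when the combined tracking error is outside the deadzone and no existing center's region of support covers the current state. The sketch below is a hedged illustration of that rule under assumed names and values (square support regions of half-width 0.5, deadzone 0.02); it is not the paper's exact algorithm.

```python
import numpy as np

class CenterAllocator:
    """Performance-dependent center allocation (illustrative sketch)."""

    def __init__(self, radius=0.5, deadzone=0.02):
        self.radius = radius      # half-width of the square region of support
        self.deadzone = deadzone  # tracking-error specification
        self.centers = []         # allocated center locations in state space

    def covered(self, x):
        """True if x lies in the support square of an existing center."""
        return any(np.max(np.abs(x - c)) <= self.radius for c in self.centers)

    def update(self, x, e):
        """Allocate a center at the current state x if performance demands it.

        Returns True when a new center is added.
        """
        if abs(e) > self.deadzone and not self.covered(x):
            self.centers.append(np.asarray(x, dtype=float))
            return True
        return False
```

Note that no centers are added while the error is inside the deadzone, even in unexplored regions, which matches the observation that no approximators appear where the design model alone already meets the specification.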
Fig. 5(a) plots the actual function f (dotted) and the approximation f̂ (solid) at 400 s versus x_1 for fixed x_2 = 0.8. The figure also shows the region M in which, at x_2 = 0.8, both |f| < 0.03 and |g| < 0.03. Since no local approximators are allocated there, f̂ = 0 on M. For some x outside M, f̂ has adapted; over a portion of the operating range, f̂ has converged toward f. Fig. 5(c) shows a histogram of the values of x_1 along the trajectory for which 0.4 < x_2 < 1.2. These are the trajectory points that could have an influence on the function approximations shown in Fig. 5(a) and (b). Note that the approximators are zero in the following: 1) the region M, where they are not needed due to the accuracy of the design model; 2) regions that the trajectory did not enter; and 3) regions that the state did traverse, but where the performance was sufficiently good that center allocation did not occur. Similarly, Fig. 5(b) displays the actual function g (dotted) and the approximator ĝ (solid) at 400 s versus x_1 for x_2 = 0.8. Note that for negative values of x_1, the approximation ĝ has overestimated g. The theory herein gives no guarantee of convergence of the parameter estimates, but does ensure the following: 1) they are bounded; 2) the Lyapunov function is decreasing; and 3) the tracking error
Fig. 6. Phase-plane plot of x_1 versus x_2 for t ∈ [0, 120] s for a simulation with r(t) defined as a sequence of random step inputs with amplitudes in [−4.5, 4.5]. The markers indicate the assigned center locations. The small square around each center location represents the associated region of support. M labels the region where |f| < 0.03 and |g| < 0.03.
is ultimately within the deadzone. This can be accomplishedwithout convergence of the parameter errors.
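The property that the approximators are exactly zero where no centers exist follows from the locally weighted form of the approximation. The sketch below shows one assumed realization: each allocated center carries a local linear model, blended by a compactly supported weight. The weight shape and normalization are assumptions for illustration; the paper's exact LWR receptive fields may differ.

```python
import numpy as np

def lwr_eval(x, centers, thetas, radius):
    """Locally weighted evaluation of the online approximation at state x.

    centers : list of center locations c_i
    thetas  : list of local parameter vectors [bias, slope...] per center
    radius  : half-width of each square region of support
    """
    num, den = 0.0, 0.0
    for c, th in zip(centers, thetas):
        d = np.max(np.abs(x - c))
        if d <= radius:                    # compact support: weight is zero outside
            w = (1.0 - d / radius) ** 2    # assumed bump-shaped weight
            num += w * (th[0] + th[1:] @ (x - c))   # local linear model
            den += w
    # Exactly zero wherever no region of support covers x (cf. Fig. 5).
    return num / den if den > 0.0 else 0.0
```

Because every weight vanishes outside its square of support, the approximation contributes nothing in the region M, in unvisited regions, and in regions where allocation never occurred, consistent with the three cases enumerated above.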
For Figs. 1–5, r(t) was a periodic signal to facilitate illustration of the learning process. Fig. 6 shows the trajectory (solid) and the center distribution for a 120-s simulation in which r(t) is a step function of random amplitude in [−4.5, 4.5], where the step amplitude changes every 5.0 s. The randomness of the amplitude leads to a greater exploration of the operating region. Note that the allocated centers still (correctly) lie outside of M.
VII. CONCLUSION
In this paper, we have developed a new self-organizing scheme for nth-order input-state feedback linearizable systems. The online function approximation is based on the LWR framework, but the approach can be extended to other approximators with locally supported basis functions. The proposed online self-organizing structure is stable in the sense of Lyapunov and guarantees that a linear combination of the tracking errors (with the combination vector selected in the design stage) will ultimately achieve a specified tracking error bound. This is achieved without enlarging the bandwidth of the control system. In addition, with the appropriate selection of the linear combination, each state tracking error can be shown to ultimately achieve a certain accuracy that is related to the prespecified accuracy of the combined tracking error.
The advantage of the self-organizing approach is that it allocates new local approximators only as necessary to achieve the desired tracking accuracy. Previously existing approaches allocated new local approximators based on the state alone, independent of the tracking performance. Therefore, for the same tracking specification, the approach presented herein should always require fewer local approximators. Due to the "curse of dimensionality," the payoff for self-organizing systems is much higher for systems with higher state dimension, especially when the trajectory does not fully explore the operating envelope.
In our future work, adaptation of the radius of the region of support is of interest. In this paper, the radius was picked a priori, assuming that both continuous functions f and g could be approximated over each region of support by linear functions with at least the specified accuracies. While this is a much looser assumption than explicit knowledge of the functional forms of f and g, methods that adapt the support radius in a stable fashion are also of interest. Note that for continuous functions this assumption will hold for a small enough radius. Therefore, methods in which the (positive) radius is monotonically decreased should be straightforward. Of greater interest are methods that adjust the radius based on performance. An additional important direction for future research is the extension of performance-based self-organization to systems more general than those considered herein.
ACKNOWLEDGMENT
Any opinions, findings, and conclusions or recommendationsexpressed in this material are those of the author(s) and do notnecessarily reflect the views of the National Science Founda-tion.
REFERENCES
[1] F.-C. Chen and H. K. Khalil, “Adaptive control of a class of nonlineardiscrete-time systems using neural networks,” IEEE Trans. Autom.Control, vol. 40, no. 5, pp. 791–801, May 1995.
[2] J. Y. Choi and J. A. Farrell, “Nonlinear adaptive control using networksof piecewise linear approximators,” IEEE Trans. Neural Netw., vol. 11,no. 2, pp. 390–401, Mar. 2000.
[3] Z.-P. Jiang and L. Praly, "Design of robust adaptive controllers for nonlinear systems with dynamic uncertainties," Automatica, vol. 34, no. 7, pp. 825–840, 1998.
[4] F. L. Lewis, K. Liu, and A. Yesildirek, "Neural net robot control with guaranteed tracking performance," IEEE Trans. Neural Netw., vol. 6, no. 3, pp. 703–715, May 1995.
[5] F. L. Lewis, A. Yesildirek, and K. Liu, “Multilayer neural-net robotcontroller with guaranteed tracking performance,” IEEE Trans. NeuralNetw., vol. 7, no. 2, pp. 388–399, Mar. 1996.
[6] R. M. Sanner and J. J. Slotine, “Gaussian networks for direct adaptivecontrol,” IEEE Trans. Neural Netw., vol. 3, no. 6, pp. 837–863, Nov.1992.
[7] M. Polycarpou, “Stable adaptive neural control scheme for nonlinearsystems,” IEEE Trans. Autom. Control, vol. 41, no. 3, pp. 447–451,Mar. 1996.
[8] M. Polycarpou and M. Mears, “Stable adaptive tracking of uncertainsystems using nonlinearly parameterized on-line approximators,” Int.J. Control, vol. 70, no. 3, pp. 363–384, 1998.
[9] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforwardnetworks are universal approximators,” Neural Netw., vol. 2, pp.359–366, 1989.
[10] J. Park and I. W. Sandberg, “Universal approximation using radial basisfunction networks,” Neural Comput., vol. 3, no. 2, pp. 246–257, 1991.
[11] J. Nakanishi, J. A. Farrell, and S. Schaal, "A locally weighted learning composite adaptive controller with structure adaptation," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2002, pp. 882–889.
[12] J. Nakanishi, J. A. Farrell, and S. Schaal, “Composite adaptive controlwith locally weighted statistical learning,” Neural Netw., vol. 18, no. 1,pp. 71–90, 2005.
[13] R. M. Sanner and J. J. Slotine, “Structurally dynamic wavelet networksfor adaptive control of robotic systems,” Int. J. Control, vol. 70, no. 3,pp. 405–421, 1998.
[14] M. Cannon and J.-J. Slotine, “Space-frequency localized basis functionnetworks for nonlinear system estimation and control,” Neurocomput.,vol. 9, pp. 293–342, 1995.
[15] C. G. Atkeson, A. W. Moore, and S. Schaal, “Locally weighted learningfor control,” Artif. Intell. Rev., vol. 11, pp. 75–113, 1996.
[16] S. Schaal and C. G. Atkeson, “Constructive incremental learning fromonly local information,” Neural Comput., vol. 10, no. 8, pp. 2047–2084,1998.
[17] J. A. Farrell and Y. Zhao, “Self-organizing approximation based con-trol,” in Proc. Amer. Control Conf., Minneapolis, MN, Jun. 14–16,2006, pp. 3378–3384.
[18] J. Slotine and W. Li, Applied Nonlinear Control. Englewood Cliffs,NJ: Prentice-Hall, 1991.
[19] M. Krstic, I. Kanellakopoulos, and P. Kokotovic, Nonlinear and Adap-tive Control Design. New York: Wiley, 1995.
Yuanyuan Zhao (S’05) received the B.S. and M.S.degrees in electrical engineering from the Universityof Science and Technology, Beijing, China, in 1997and 2000, respectively and the Ph.D. degree in elec-trical engineering from the University of California,Riverside, in 2007.
Her research interests are in computational intel-ligence, learning control, nonlinear adaptive control,and nonlinear robust control.
Jay A. Farrell (S’87–M’89–SM’98) received the B.S. degrees in physics and electrical engineering from Iowa State University, Ames, in 1986 and the M.S. and Ph.D. degrees in electrical engineering from the University of Notre Dame, Notre Dame, IN, in 1988 and 1989, respectively.
From 1989 to 1994, he was a Principal Investi-gator on projects involving intelligent and learningcontrol systems for autonomous vehicles at theCharles Stark Draper Laboratory. He is a Professorand former Chair of the Department of Electrical
Engineering, University of California, Riverside. He is the author of over140 technical publications. He is also a coauthor of the books The GlobalPositioning System and Inertial Navigation (New York: McGraw-Hill, 1998)and Adaptive Approximation Based Control: Unifying Neural, Fuzzy andTraditional Adaptive Approximation Approaches (New York: Wiley, 2006).
Dr. Farrell received the Engineering Vice President’s Best Technical Publica-tion Award in 1990 and Recognition Awards for Outstanding Performance andAchievement in 1991 and 1993.