Recursive Identification for Nonlinear ARX Systems Based on Stochastic Approximation Algorithm


IEEE Transactions on Automatic Control, vol. 55, no. 6, June 2010


Wen-Xiao Zhao, Han-Fu Chen, Fellow, IEEE, and Wei Xing Zheng, Senior Member, IEEE

Abstract—Nonparametric identification for nonlinear autoregressive systems with exogenous inputs (NARX), described by y_{k+1} = f(y_k, …, y_{k+1−p}, u_k, …, u_{k+1−q}) + ε_{k+1}, is considered. First, a condition on f(·) is introduced to guarantee ergodicity and stationarity of {y_k}. Then a kernel-function-based stochastic approximation algorithm with expanding truncations (SAAWET) is proposed to recursively estimate the value of f(·) at any given point of its domain. It is shown that the estimate converges to the true value with probability one. In establishing the strong consistency of the estimate, the properties of the Markov chain associated with the NARX system play an important role. Numerical examples are given, which show that the simulation results are consistent with the theoretical analysis. The intention of the paper is not only to present a concrete solution to the problem under consideration but also to profile a new analysis method for nonlinear systems. The proposed method, which combines Markov chain properties with stochastic approximation algorithms, may be of future potential, although a restrictive condition has to be imposed on f(·): its growth rate should not be faster than linear with coefficient less than 1 as the argument tends to infinity.

Index Terms—Kernel function, Markov chain, nonlinear ARX system, recursive identification, stochastic approximation.

I. INTRODUCTION

Identification of linear stochastic systems (see, e.g., [7], [16]) has been extensively studied for many years, and it is relatively mature in comparison with that of nonlinear systems. The system consisting of a linear subsystem cascaded with a static nonlinearity, called the Hammerstein or Wiener system depending on the order of their connection, is probably the simplest nonlinear system. Identification of Hammerstein and Wiener systems has attracted considerable attention from researchers in recent years (see, e.g., [5], [6], [14], [15], [17], [29], [30] and references therein). For this kind of systems, the structural information greatly simplifies the work of identifying the systems.

Manuscript received July 24, 2008; revised March 05, 2009. First published February 05, 2010; current version published June 09, 2010. This work was supported in part by NSFC under Grants 60821091, 60625305, 60721003, and 60874001, by the 973 Program under Grant 2009CB320602, by a grant from the National Laboratory of Space Intelligent Control, and by the Australian Research Council. Recommended by Associate Editor B. Ninness.

W.-X. Zhao is with the Department of Automation, Tsinghua University, Beijing 100084, China (e-mail: wxzhao@mail.tsinghua.edu.cn; wxzhao@amss.ac.cn).

H.-F. Chen is with the Key Laboratory of Systems and Control of CAS, Institute of Systems Science, AMSS, Chinese Academy of Sciences, Beijing 100190, China (e-mail: hfchen@iss.ac.cn).

W. X. Zheng is with the School of Computing and Mathematics, University of Western Sydney, Sydney, NSW 1797, Australia (e-mail: w.zheng@uws.edu.au).


Digital Object Identifier 10.1109/TAC.2010.2042236


Let us consider the problem of identifying the following single-input single-output (SISO) nonlinear autoregressive system with exogenous inputs (NARX):

y_{k+1} = f(y_k, …, y_{k+1−p}, u_k, …, u_{k+1−q}) + ε_{k+1}    (1)

where u_k and y_k are the system input and output, respectively, ε_k is the system noise, the system orders are known, and f(·) is the unknown function to be identified. The NARX system (1) is a straightforward generalization of the linear ARX system and belongs to the class of nonlinear systems without a priori structural information. The problem under study in this paper is to recursively identify the nonlinear function f(·) on the basis of the observations and to prove the strong consistency of the estimates.

Identification of the NARX system (1) is the topic of many papers, e.g., [2], [3], [13], [26], among others. According to the description of f(·), the methods for identifying (1) can roughly be divided into two categories: the parametric approach [27] and the nonparametric approach [2], [3], [13], [26]. In the parametric approach, the unknown f(·) is expanded into a sum of known functions (for example, polynomials, neural networks, wavelets, and so on) with unknown coefficients [27]. Then identification of NARX systems becomes a parameter estimation problem.

It is noticed that the system (1) is representable as a regression, f(x) = E[y_{k+1} | φ_k = x], with the regressor φ_k ≜ [y_k, …, y_{k+1−p}, u_k, …, u_{k+1−q}]^T.

In the nonparametric approach, the value of f(x) is estimated for any fixed x. It is of practical significance to consider nonparametric identification, since it may be difficult to choose in advance an appropriate basis of functions to approximate the unknown f(·), and the assumptions made on nonlinear systems for such methods are, hopefully, weaker than those for parametric methods. In this paper the nonparametric method is adopted.

We now briefly review the existing works on nonparametric methods [2], [3], [13], [26]. In [26] a so-called direct weight optimization (DWO) method is proposed, which later is also considered in [2]. Let a data set of N input-output pairs be collected. The DWO method proposes to estimate f at a given point x* by

f̂(x*) = Σ_{k=1}^{N} w_k y_{k+1}    (2)



with appropriately chosen weights {w_k}. In [26], the weights are obtained by minimizing the following cost function, namely

(3)

while in [2] the cost function is defined as

(4)

with a prescribed estimation error bound. Since f(·) is unknown in (3) and (4), a min-max approach is usually applied. In [2] and [26], the case of a fixed data size is considered, and the estimate given by DWO is nonrecursive. On the other hand, in [13] kernel-function-based algorithms are introduced and the estimates are proved to converge to the true values in probability. It is pointed out in [13] that further work may concentrate on developing recursive estimation algorithms and on proving convergence with probability one. Kernel-function-based algorithms are also considered in [3], where, under an input-to-output exponential stability condition, the estimates are proved to be convergent in probability.
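To make the nonrecursive kernel approach concrete, the following sketch computes a Nadaraya-Watson-type estimate of f at a query point from a fixed batch of data. The Gaussian kernel, the bandwidth, and all names in the code are illustrative assumptions; they are not the exact algorithms of [3], [13], or [26].

```python
import numpy as np

def kernel_estimate(phi, y_next, x_star, h):
    """Nadaraya-Watson-type kernel estimate of f(x_star) from batch data.

    phi    : (N, d) array of regressors [y_k, ..., u_k, ...]
    y_next : (N,) array of the corresponding outputs y_{k+1}
    x_star : (d,) query point
    h      : bandwidth (illustrative choice; the cited works discuss
             data-driven selection rules)
    """
    # Gaussian product kernel; any symmetric density would do.
    w = np.exp(-np.sum((phi - x_star) ** 2, axis=1) / (2.0 * h ** 2))
    denom = w.sum()
    return np.dot(w, y_next) / denom if denom > 0 else 0.0
```

Such an estimator uses the whole data set at once, which is exactly the nonrecursive feature discussed above.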

From a statistical viewpoint, second-order statistics usually contain adequate information for identifying linear systems, but, in general, this is not true for nonlinear systems. The strict stationarity of a nonlinear system often plays an important role in its identification and control, but establishing this property may not be an easy task. In this paper, the NARX system (1) is first transformed into a state space equation, which defines a Markov chain, and the strict stationarity of the system is established by investigating the probabilistic properties of the obtained chain. It turns out that the value f(x) at an arbitrarily fixed x is closely connected with f(x)p(x), where p(·) is the invariant probability density of the chain. This implies that for estimating f(x) we have to estimate both f(x)p(x) and p(x). Therefore, the NARX identification problem can be transformed into a root-seeking problem for an unknown regression function whose root is the quantity to be estimated. To solve this, the stochastic approximation algorithm with expanding truncations (SAAWET) [4] is an appropriate tool. This is the basic idea of the new analysis method proposed in the paper.

Thus, a kernel-function-based SAAWET is proposed to recursively estimate the value of f(·) at any fixed point. The strong consistency of the estimates is established with the help of the geometric ergodicity and mixing properties of the Markov chain, as well as the general convergence theorem (GCT) of SAAWET.

Though the condition to be imposed, requiring that the growth rate of f(·) be no faster than linear with coefficient less than 1, could be restrictive in some situations, the analysis method used in the paper may be of future potential in dealing with other problems arising from systems and control, since many systems are Markovian and SAAWET is a powerful tool providing recursive estimates.

The rest of this work is arranged as follows. To keep the problem simple, we first consider the first-order NARX system (1), i.e., the case p = q = 1. In Section II, the assumptions and identification algorithms are presented. The probabilistic properties of the NARX system and the strong consistency of the estimates are investigated in Section III. The results are extended to the general case in Section IV. Simulation examples are presented in Section V. Some concluding remarks are made in Section VI. The detailed proofs of some theoretical results and some related results on stochastic processes are given in the Appendix.

Notations: For a vector x, its Euclidean norm is denoted by ||x|| and its weighted norm by ||x||_M ≜ (x^T M x)^{1/2}, where M is a positive definite matrix. Let (Ω, F, P) be the basic probability space. Denote the real line by ℝ and the Borel σ-algebra on ℝ by B(ℝ).

II. ASSUMPTIONS AND IDENTIFICATION ALGORITHMS

In this section, we consider the first order case

y_{k+1} = f(y_k, u_k) + ε_{k+1}    (5)

The identification task consists in recursively and consistently estimating the value of f at any fixed point (y, u) ∈ ℝ² based on the input-output measurements {y_{k+1}, u_k}.

Let the input {u_k} be a sequence of independent and identically distributed (iid) random variables with finite second moment and with a probability density function, denoted by p_u(·), which is positive and continuous on ℝ.

We make the following assumptions.
A1) {ε_k} is a sequence of iid random variables with zero mean and finite variance, and with a density function p_ε(·) which is positive and uniformly continuous on ℝ;
A2) {u_k} and {ε_k} are mutually independent.

By introducing x_k ≜ [y_k, u_k]^T and ξ_{k+1} ≜ [ε_{k+1}, u_{k+1}]^T, the NARX system (5) is expressible in the state space form as

x_{k+1} = F(x_k) + ξ_{k+1},  where F(x) ≜ [f(x), 0]^T.    (6)

Noticing that {ξ_k} is an iid vector sequence, for any x ∈ ℝ² and A ∈ B(ℝ²), we see that

(7)

(8)

From (7) and (8), it follows that {x_k} is a time-homogeneous Markov chain valued in (ℝ², B(ℝ²)) [20].

We further need the following condition.
A3) f(·) is continuous in x, and there exist a weighted vector norm ||·||_M on ℝ² and constants c ∈ (0, 1) and b ≥ 0 such that

||F(x)||_M ≤ c||x||_M + b,  ∀x ∈ ℝ².    (9)

Remark 1: If f(·) is bounded, i.e., sup_x |f(x)| < ∞, then A3) is satisfied with the Euclidean norm and any c ∈ (0, 1).

The weighted norm ||·||_M is equivalent to the Euclidean norm, because c₁||x|| ≤ ||x||_M ≤ c₂||x|| for some constants c₁ > 0 and c₂ > 0. However, there are systems for which A3) may or may not hold depending upon which norm is used. For the following system, A3) does not hold if the Euclidean norm is applied [9], but it does hold if the weight M is appropriately chosen.

Let the SISO system be such that

(10)

with appropriately chosen coefficients. Take a suitable matrix weight M. A simple calculation shows that M is positive definite and that the map associated with (10) satisfies the bound (9) in the norm ||·||_M with some c ∈ (0, 1). Hence, for system (10), A3) is satisfied.

However, if the weighted norm is replaced by the Euclidean norm ||·||, then, in general, A3) does not hold for (10). Indeed, if A3) were true with the Euclidean norm, there would exist c ∈ (0, 1) and b ≥ 0 satisfying (9), which, for suitably chosen arguments, contradicts the fact that the coefficients of (10) make the Euclidean growth rate exceed one.
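To make the role of the weight matrix concrete, the following short check illustrates the same phenomenon numerically for a hypothetical linear map of the form F(x) = Ax with zero second row, as in (6). The entries of A and the weight M are assumptions for illustration; they are not the quantities of system (10).

```python
import numpy as np

# Illustrative (assumed) linear map F(x) = A x on R^2.  The spectral radius
# of A is below 1, yet its Euclidean operator norm exceeds 1, so a bound of
# the form (9) fails for the Euclidean norm with c < 1.
A = np.array([[0.5, 1.2],
              [0.0, 0.0]])
print(np.max(np.abs(np.linalg.eigvals(A))))   # spectral radius: 0.5 < 1
print(np.linalg.norm(A, 2))                   # Euclidean operator norm: 1.3 > 1

# With a weighted norm ||x||_M = sqrt(x^T M x), M = diag(1, m) and m large
# enough, the map contracts: ||A x||_M <= c ||x||_M with c < 1.
m = 4.0
M = np.diag([1.0, m])
L = np.linalg.cholesky(M)                     # M = L L^T
# The M-weighted operator norm of A equals the Euclidean norm of L^T A L^{-T}.
c = np.linalg.norm(L.T @ A @ np.linalg.inv(L.T), 2)
print(c)                                      # about 0.78 < 1
```

The design point is the classical one: a matrix with spectral radius below one is a contraction in some norm, though not necessarily in the Euclidean one.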

Remark 2: In [26], with a fixed sample size, identification of the NARX system is transformed into a convex optimization problem with constraints. The asymptotic properties of the NARX system, for example, stability and stationarity, are not discussed there. Since the NARX system (1) is an infinite impulse response (IIR) system, to investigate its asymptotic properties a certain kind of condition has to be imposed on the system itself. In [3], an exponential input-to-output stability condition is introduced, which bounds the output in terms of the initial conditions and the past inputs with exponentially decaying weights. This condition requires that the input and output sequences be bounded. It is clear that, to investigate properties of a dynamic system, it is natural to impose conditions on f(·) rather than on its input-output data sequence. So in this paper A3) is imposed. By the tool of Markov chains, it will be shown that the considered NARX system is asymptotically stationary, with a unique invariant probability measure and with mixing coefficients tending to zero exponentially fast. It is noticed that conditions similar to A3) are discussed in [1] for some classes of nonlinear time series without exogenous inputs.

For a given point x* ∈ ℝ², we define the kernel functions using the experimental data as

(11)

with a bandwidth decreasing to zero at a fixed rate. Then an estimate of f at an arbitrarily fixed point x* can be obtained from two auxiliary quantities, which are generated by the following algorithms.

Let a sequence of positive truncation bounds, increasingly diverging to infinity, be chosen. With arbitrary initial values and with the truncation counters initialized at zero, the two estimate sequences are recursively calculated as follows:

(12)

(13)

and

(14)

(15)

where I_A denotes the indicator function of an event A, i.e., I_A = 1 if A holds and I_A = 0 otherwise.

Then, the sequence defined by (12)–(13) and the sequence defined by (14)–(15) serve as estimates for f(x*)p(x*) and p(x*), respectively, where p(·) is the invariant probability density of the chain defined by (6). So the ratio of the two sequences is used as an estimate for f(x*), provided that the denominator is nonzero.

Both (12)–(13) and (14)–(15) are SAAWETs [4]. For convenience of reading, the general convergence theorem (GCT) for SAAWET is formulated in Appendix A.

In the algorithm composed of (12)–(13), the truncation counter represents the number of truncations that have occurred up to time k, and it determines the truncation bound in force when the next estimate is generated. From (12) it is seen that, as long as the updated value stays within the current truncation bound, the algorithm evolves as the classical Robbins-Monro (RM) algorithm [25] and the counter is unchanged. If the bound is exceeded, then the estimate at time k+1 is pulled back to 0, and the truncation bound is enlarged to the next element of the bound sequence.

Under moderate conditions, it can be shown that truncation ceases in finite time, or equivalently, that the algorithm (12) becomes the RM algorithm after a finite number of steps. The conditions required for convergence of SAAWET are significantly weaker than those for convergence of the RM algorithm [22]. For details we refer to Chapters 1 and 2 of [4].
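To fix ideas, the following sketch mirrors the scheme described above: two coupled recursions with expanding truncations, one tracking f(x*)p(x*) and one tracking p(x*), whose ratio estimates f(x*). Since the displays (11)-(15) are not reproduced here, the step size 1/k, the Gaussian kernel, the bandwidth decay exponent, the truncation bound sequence, and all names in the code are illustrative assumptions rather than the paper's exact choices. A usage example is given after Example 1 in Section V.

```python
import numpy as np

class RecursiveKernelSAAWET:
    """Sketch of kernel-based SAAWET estimation of f at a fixed point x*.

    Two parallel recursions with expanding truncations track
    mu_k  -> f(x*) p(x*)   and   rho_k -> p(x*);
    their ratio estimates f(x*).
    """

    def __init__(self, x_star, delta=0.25):
        self.x_star = np.asarray(x_star, dtype=float)
        self.delta = delta          # bandwidth decay exponent (assumed)
        self.k = 0
        self.mu = 0.0               # estimate of f(x*) p(x*)
        self.rho = 0.0              # estimate of p(x*)
        self.sigma_mu = 0           # truncation counters
        self.sigma_rho = 0

    def _bound(self, sigma):
        return 2.0 ** (sigma + 1)   # truncation bounds: increasing, diverging

    def _kernel(self, x):
        b = 1.0 / (self.k ** self.delta)              # bandwidth b_k
        d = len(self.x_star)
        z = np.sum((x - self.x_star) ** 2) / (2.0 * b * b)
        return np.exp(-z) / ((2.0 * np.pi) ** (d / 2) * b ** d)

    def update(self, x_k, y_next):
        """Process one observation pair (x_k, y_{k+1})."""
        self.k += 1
        a_k = 1.0 / self.k                            # step size (assumed)
        w = self._kernel(np.asarray(x_k, dtype=float))

        cand = self.mu - a_k * (self.mu - w * y_next)
        if abs(cand) <= self._bound(self.sigma_mu):
            self.mu = cand                            # RM-type step
        else:
            self.mu = 0.0                             # pull back, enlarge bound
            self.sigma_mu += 1

        cand = self.rho - a_k * (self.rho - w)
        if abs(cand) <= self._bound(self.sigma_rho):
            self.rho = cand
        else:
            self.rho = 0.0
            self.sigma_rho += 1

    def estimate(self):
        return self.mu / self.rho if self.rho != 0 else 0.0
```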

III. ERGODICITY OF {x_k} AND STRONG CONSISTENCY OF ESTIMATES

Denote the Lebesgue measure on (ℝ², B(ℝ²)) by λ(·). Let ν be a signed measure on (ℝ², B(ℝ²)). The total variation norm of ν is denoted by ||ν||_var, i.e., ||ν||_var ≜ ν⁺(ℝ²) + ν⁻(ℝ²), where ν = ν⁺ − ν⁻ is the Jordan decomposition of ν. From the definition of ||·||_var, it follows that, for any measurable function g with |g| ≤ 1, |∫ g dν| ≤ ||ν||_var. The detailed definition and related properties of the total variation norm can be found in [8], [20].

For the chain {x_k} defined by (6), denote the k-step transition probability by P^k(x, A), where x ∈ ℝ² and A ∈ B(ℝ²), and write P(x, A) for the one-step transition probability.

We have the following lemma.

Lemma 1: If A1)–A3) hold, then the chain {x_k} defined by (6) is λ-irreducible, aperiodic, and λ is the maximal irreducibility measure of {x_k}. Further, any bounded set A ∈ B(ℝ²) such that λ(A) > 0 is a small set.

Proof: The detailed proof is given in Appendix C. The definitions of irreducibility, aperiodicity, and small set can be found in [20]. For convenience of reading, they are also listed in Appendix B.

For ergodicity of {x_k} defined by (6), we have the following result.

Theorem 1: Assume A1)–A3) hold. Then there exist a probability measure P_iv on (ℝ², B(ℝ²)), a nonnegative measurable function V(·), and a constant ρ ∈ (0, 1) such that

||P^k(x, ·) − P_iv(·)||_var ≤ V(x) ρ^k,  ∀x ∈ ℝ², ∀k ≥ 1.    (16)

Proof: The proof, which is motivated by the stability analysis of nonlinear time series (cf. [1]), is obtained by applying the drift criterion (Lemma B4 in Appendix B).

From assumption A3) and noticing that {ξ_k} is iid, we have the following inequalities:

(17)

(18)

where the involved constants are determined by A3) and the noise moments.

Choose a sufficiently large level and define C as the corresponding bounded set. By Lemma 1, C is a small set. From (17) and (18), we have

(19)

and

(20)

Using (19) and (20), by Lemma B4 (see Appendix B) with the function V(·) and the small set C chosen above, we conclude (16).

Remark 3: In fact, P_iv is the invariant probability measure of {x_k}, i.e.

(21)

For the nonnegative measurable function V(·) appearing in Theorem 1, we have the following lemma.

Lemma 2: Assume A1) and A2) hold.
(i) If f(·) is bounded, then V(·) can be chosen to be bounded.
(ii) If A3) holds, then V(·) can be chosen such that ∫ V(x) P_iv(dx) < ∞.

Proof: (i) can be proved by verifying the Doeblin Condition [20], and (ii) is an assertion of [24]. The detailed analysis is given in Appendix C.

In the sequel, the following condition is also needed.
A4) E V(x_1) < ∞, where x_1 denotes the initial state of the chain.

Remark 4: If f(·) is bounded, then by Lemma 2 the function V(·) can be chosen bounded and A4) holds. If the initial distribution coincides with P_iv, i.e., the chain is stationary, then A4) follows from Lemma 2(ii). So the assumption A4) is reasonable, though it is rather restrictive.

Theorem 2: Assume A1)–A4) hold. Then
(i) the distribution of x_k converges to P_iv in total variation at a geometric rate, i.e., with some constants c > 0 and r ∈ (0, 1);
(ii) P_iv has a positive and continuous density p(·) on ℝ².

Proof: By Theorem 1 and A4), and by noting the basic property of the total variation norm, for any k we have the asserted geometric bound. Hence (i) holds.

To prove (ii), noticing that both {u_k} and {ε_k} are iid sequences with densities p_u(·) and p_ε(·), respectively, for the one-step transition probability we obtain (for details see Appendix C, Proof of Lemma 1)

(22)

By using (21) and (22) and Fubini's theorem [8], it follows that:

(23)

with density

(24)

Since f(·) is continuous and, by A1), p_ε(·) is uniformly continuous on ℝ, the density p(·) is continuous on ℝ². According to A3), f(x) is finite at any fixed x. As both p_u(·) and p_ε(·) are positive, it follows that p(x) > 0 for every x. This proves (ii).

Let F_1^n ≜ σ{x_k, k ≤ n} and F_{n+m}^∞ ≜ σ{x_k, k ≥ n+m}.

The following theorem shows the mixing properties and mixing coefficients of the considered NARX system. The definitions of α- and β-mixing are given in Appendix B.

Theorem 3: Assume A1)–A4) hold. Then the chain {x_k} defined by (6) is α- and β-mixing, and the mixing coefficients, denoted by α(k) and β(k), are related as follows:

α(k) ≤ β(k) ≤ c r^k    (25)

for some c > 0 and r ∈ (0, 1).

Proof: By Proposition 1 in [10], the β-mixing coefficient can be expressed through the transition probability of the chain as follows:

(26)

Using Theorems 1 and 2 and noting A4), from (26) we conclude that β(k) decays geometrically. It is well known that α(k) ≤ β(k); hence (25) holds.

Remark 5: In the case where f(·) is bounded, it can be verified that the chain {x_k} is φ-mixing with mixing coefficients converging to zero exponentially fast. However, without the boundedness of f(·) it is difficult to show that {x_k} is φ-mixing (see [31] for details). This is why we turn to proving the α- and β-mixing properties of {x_k}.

Prior to presenting the main result, given in Theorem 4, we need the following lemma.

Lemma 3: Assume A1)–A4) hold. Then

(27)

(28)

(29)

(30)

for any fixed point x* ∈ ℝ², and

(31)

(32)

(33)

Proof: The detailed proof is given in Appendix C.

Theorem 4: Assume A1)–A4) hold. Then for any fixed x* ∈ ℝ²

(34)

and

(35)


Proof: The algorithm (12) can be rewritten as follows:

where

Define the regression function by

By the GCT for SAAWET (see [4] or Appendix A), for (34) it suffices to prove

(36)

where the noise term is given by the decomposition above. Noticing (27), (31) and (33), we know that (36) holds. Hence, (34) is true.

Convergence (35) can be proved in a similar way.

Proposition 1: By Theorem 2, p(·) is positive on ℝ². Then Theorem 4 implies that the ratio of the estimates generated by (12)–(13) and (14)–(15) converges to f(x*) almost surely as k → ∞.

IV. EXTENSION TO THE GENERAL CASE

We now consider the NARX system (1) with general orders:

(37)

Similar to (5), (37) can be rewritten in the state space form as

(38)

or equivalently, as a vector recursion driven by an iid sequence, so that the state process is a time-homogeneous Markov chain. The probabilistic properties of this chain, such as irreducibility, aperiodicity, and ergodicity, as well as recursive identification algorithms for the nonlinear system (37), can be investigated in a manner similar to that for the case p = q = 1 under the assumption:
A3') ||F(x)||_M ≤ c||x||_M + b for some matrix weight M > 0 and constants c ∈ (0, 1) and b ≥ 0, where F(·) denotes the state transition map in (38).
For example, for a fixed point x* in the domain of f,

the kernel function and the recursive identification algorithms for the two estimate sequences are given by

(39)

with a bandwidth decreasing to zero at a fixed rate,

(40)

(41)

and

(42)

(43)

Theorem 5: For the NARX system (37), if A1), A2), A3'), and A4) hold, then

(44)

provided that the invariant probability density of (38) is positive and continuous at the point of interest.

Proof: The proof is similar to those of Theorems 1–4 and is thus omitted. We only note that, for the case p = q = 1, the one-step transition probability is used for establishing irreducibility and aperiodicity of the chain (see Appendix C), while for the higher-order case the multi-step transition probability plays the same role.
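As an illustration of how the higher-order case reduces to the same recursion, the following helper (a hypothetical function, not taken from the paper) stacks lagged outputs and inputs into regressor vectors; each pair (φ_k, y_{k+1}) can then be fed, one at a time, to a point estimator such as the RecursiveKernelSAAWET sketch of Section II with x* a point in ℝ^{p+q}. The stacking order is an assumption.

```python
import numpy as np

def lagged_regressors(y, u, p, q):
    """Form regressors phi_k = [y_k, ..., y_{k-p+1}, u_k, ..., u_{k-q+1}]
    and the matching targets y_{k+1} from scalar sequences y and u."""
    m = max(p, q)
    phi, target = [], []
    for k in range(m - 1, len(y) - 1):
        phi.append(np.concatenate((y[k - p + 1:k + 1][::-1],
                                   u[k - q + 1:k + 1][::-1])))
        target.append(y[k + 1])
    return np.array(phi), np.array(target)
```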

We now present two examples for which A3') holds.
1) The ARX system:

(45)

where the autoregressive polynomial is stable;

2) The NARX system

(46)

where the nonlinear function is bounded.

V. NUMERICAL EXAMPLES

Example 1: Consider the following NARX system

(47)

where {u_k} and {ε_k} are iid sequences and mutually independent, with the distributions specified in the simulation. It is verified that A1) and A2) hold.


Fig. 1. Actual versus estimated surfaces.

Fig. 2. Magnitudes of estimation errors.

With the constants of (47) specified as in the simulation and similar to Remark 1, it can be verified that A3) is also satisfied.

Let both coordinates of the initial state be uniformly distributed over a bounded interval. The estimates of f at the grid points considered are calculated according to (12)–(15). In Fig. 1, the solid lines describe the true surface of the nonlinear function, while the dashed lines describe the estimated surface at the final time instant. The corresponding estimation error is shown in Fig. 2. The estimate sequences at several given points are denoted by the dotted lines in Fig. 3, where the solid lines denote the true values of f. It is observed that the estimates are close to their true values after a moderate number of data points.
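For completeness, a simulation driver in the spirit of Example 1 might look as follows. The nonlinearity f_true, the input and noise distributions, the sample size, and the evaluation points are all illustrative assumptions (they are not the specific choices of (47)), and the RecursiveKernelSAAWET class from the sketch in Section II is reused.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(y, u):
    # Illustrative nonlinearity with at most linear growth, as required by A3);
    # NOT the function of system (47).
    return 0.5 * np.sin(y) + 0.5 * u

# Simulate the first-order NARX system y_{k+1} = f(y_k, u_k) + eps_{k+1}.
N = 20000
u = rng.uniform(-3.0, 3.0, size=N)
eps = rng.normal(0.0, 0.5, size=N)
y = np.zeros(N + 1)
for k in range(N):
    y[k + 1] = f_true(y[k], u[k]) + eps[k]   # eps[k] plays the role of eps_{k+1}

# Recursive estimation at a few fixed points, reusing the sketch class above.
points = [(-1.0, 1.0), (0.0, 0.0), (1.0, -1.0)]
estimators = [RecursiveKernelSAAWET(np.array(pt)) for pt in points]
for k in range(N):
    for est in estimators:
        est.update(np.array([y[k], u[k]]), y[k + 1])

for pt, est in zip(points, estimators):
    print(pt, est.estimate(), f_true(*pt))   # estimate vs. true value
```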

Example 2: The following system in state-space form is defined in [21]; it is known as a benchmark problem for nonlinear system identification [3], [21], [26]:

Fig. 3. Sequences of estimates at given points.

where u_k and y_k are the system input and output, respectively, ε_k is the system noise, and the remaining variables are unmeasurable system states. An NARX model of the form (1) is used to model the unknown system.

First, samples are generated by an iid input sequence. The function f(·) is recursively estimated based on these data by the algorithms (12)–(15). Denote the resulting estimate of f by f̂.

Then a further sequence of input signals is fed to the estimated model to calculate the one-step predicted outputs, which are marked by the dashed line in Fig. 4. The same input signals are also fed to the true system, and the corresponding output is represented by the solid line in Fig. 4. For comparison, the dotted line in Fig. 4 plots the predicted outputs generated by the following identification algorithm considered in [3], [13]:

where the estimator uses a Gaussian-type kernel with a fixed bandwidth. In Fig. 5, the solid and dashed lines are the same as those in Fig. 4, but the dotted line indicates the one-step predicted outputs generated by the direct weight optimization (DWO) approach


Fig. 4. Actual outputs and predicted outputs.

Fig. 5. Actual outputs and predicted outputs.

[26], where f is estimated by a weighted sum of the observed outputs as in (2), with the weights determined by the DWO criterion.

From Figs. 4 and 5, it is seen that the performances of the three methods under comparison do not significantly differ from each other. However, since the classical kernel approach and the DWO approach deal with a data set of fixed size, they are unable to update the estimates when new data arrive, i.e., they are nonrecursive. As the data size increases, the estimates given by these two methods converge in probability. In the DWO approach, the noise variance is required to be known (see, e.g., [26]). Besides, the selection of the bandwidth may affect the performance. In contrast, the approach proposed in this paper is recursive and convergent with probability one.

VI. CONCLUSION

From a statistical viewpoint, second-order statistics usually contain adequate information for identifying linear systems. However, in general, this is not true for nonlinear systems. To identify nonlinear systems, strict stationarity is often required.

In this paper, recursive identification of nonlinear ARX systems is considered, and conditions are introduced to guarantee that the NARX system is strictly stationary with mixing coefficients converging to zero exponentially fast. Then the kernel-function-based SAAWET is applied to identify the system. The proposed recursive algorithms provide strongly consistent estimates under mild assumptions. The numerical simulations are consistent with the theoretical analysis.

Since many systems possess the Markov property, the analysis method used in this paper for identification may be applied to other problems in systems and control.

Identification of nonlinear systems is a developing research area, and there are many interesting issues to be studied.

(i) Assumption A3) actually restricts the nonlinear function f(·) of the NARX system from growing faster than linearly. It is of interest to relax A3) while keeping the ergodicity and stationarity of the system unchanged.

(ii) Removing the positivity and continuity assumptions on the densities of the input and the noise is also of interest.

(iii) In this paper, the system order is assumed to be available. Identification of the order, i.e., the structure identification of NARX systems, deserves further research. Besides, the convergence rate of the estimates and the extrapolation capability of the estimated model are also interesting topics for future research.

(iv) The method used in this paper may possibly be extended to the multivariable case, but the extension may not be straightforward.

APPENDIX A
GENERAL CONVERGENCE THEOREM (GCT) FOR SAAWET

For convenience of reading, let us formulate the GCT for the special case where the root of the regression function is a singleton.

Let a sequence of positive truncation bounds, increasingly diverging to infinity, be given, and let the estimates be generated by the following algorithm:

(48)

(49)

(50)
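Since the displays (48)–(50) are not reproduced above, the following block records the general form of SAAWET as given in [4], with notation chosen here (z_k the estimate, a_k the step size, O_{k+1} the observation of the regression function g at z_k corrupted by noise, M_{σ_k} the truncation bound); the reset point is taken to be 0, in line with the description in Section II.

```latex
% Standard SAAWET recursion (cf. [4]); notation assumed for this sketch.
z_{k+1} = \bigl(z_k + a_k O_{k+1}\bigr)\,
          I_{[\|z_k + a_k O_{k+1}\| \le M_{\sigma_k}]},
\qquad
\sigma_{k+1} = \sigma_k + I_{[\|z_k + a_k O_{k+1}\| > M_{\sigma_k}]},
\quad \sigma_0 = 0,
\qquad
O_{k+1} = g(z_k) + \varepsilon_{k+1}.
```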

We need the following conditions.
C1) The step sizes are positive, tend to zero, and their sum diverges.

C2) There is a continuously differentiable function serving as a Lyapunov function for the regression function with respect to the sought root. Further, the truncation bounds used in (48) are compatible with this Lyapunov function.


C3) For the sample path under consideration, the weighted partial sums of the observation noise vanish in the required sense along any subsequence of indices for which the estimates converge.

C4) The regression function is measurable and locally bounded.

General Convergence Theorem (GCT) [4]: Assume C1), C2), and C4) hold. Then the estimates converge to the root on those sample paths for which C3) holds.

APPENDIX B
RELATED RESULTS ON MARKOV CHAINS AND MIXING PROCESSES

Assume that the process {x_k} is a time-homogeneous Markov chain valued in (ℝ^m, B(ℝ^m)).

Definition 1 ([20], [28]): The chain {x_k} is called φ-irreducible if there exists a measure φ on B(ℝ^m) such that, for all x ∈ ℝ^m and all A ∈ B(ℝ^m) with φ(A) > 0, there is some k ≥ 1 with P^k(x, A) > 0.

The measure ψ is called the maximal irreducibility measure of the chain if

(i) the chain is ψ-irreducible;
(ii) for any other measure φ, the chain is φ-irreducible if and only if φ is absolutely continuous with respect to ψ, i.e., φ ≪ ψ;
(iii) ψ(A) = 0 implies ψ({x : P^k(x, A) > 0 for some k ≥ 1}) = 0.
In what follows, when the chain is said to be ψ-irreducible, it is implicitly assumed that ψ is the maximal irreducibility measure.

The following lemma is useful for justifying aperiodicity of a Markov chain.

Lemma B1 ([28]): Suppose that the Markov chain {x_k} is ψ-irreducible. The chain is aperiodic if and only if there exist a set A with ψ(A) > 0 and a positive integer k₀ such that

(51)

for any k ≥ k₀ and x ∈ A, where k₀ may depend on x.

Definition 2 (Doeblin Condition [20]): For a Markov chain, the following property is called the Doeblin Condition: there exists a probability measure φ on B(ℝ^m) such that

(52)

for some positive integer m₀, some δ ∈ (0, 1), and all x ∈ ℝ^m.

Lemma B2 (Theorems 16.2.1 and 16.2.3 in [20]): Suppose that the chain {x_k} is aperiodic and ψ-irreducible. If it satisfies the Doeblin Condition, then there exist constants c > 0 and ρ ∈ (0, 1) such that, for any x, ||P^k(x, ·) − P_iv(·)||_var ≤ c ρ^k, where P_iv is the invariant probability measure of {x_k}.

Definition 3 ([20]): Let {x_k} be a Markov chain in (ℝ^m, B(ℝ^m)). A nonempty set A ∈ B(ℝ^m) is called a small set if there exist some positive integer m₀, some constant δ > 0, and a probability measure ν on B(ℝ^m) such that

(53)

Denote the Lebesgue measure on B(ℝ^m) by λ(·).

Lemma B3 ([23]): Let {x_k} be a λ-irreducible and aperiodic Markov chain. For a set A ∈ B(ℝ^m) with λ(A) > 0, if there exist some δ > 0 and a positive integer m₀ such that, for any x ∈ A and any measurable B ⊂ A,

(54)

then A is a small set.

Lemma B4 (Drift Criterion [20]): Assume that the chain {x_k} is irreducible and aperiodic. If there exist a nonnegative measurable function V(·) on ℝ^m, a small set C, and suitable constants such that

(55)

(56)

then there exists a probability measure P_iv on B(ℝ^m) and ρ ∈ (0, 1) such that

(57)

According to [24], the nonnegative measurable function V(·) in (57) can be selected to be integrable with respect to the invariant probability measure. Notice that in Lemma B2 the factor multiplying the geometric rate is a constant, while in Lemma B4 it is a positive measurable function. So Lemma B4 is usually called the ergodic criterion, while Lemma B2 the uniformly ergodic criterion.
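The displays (55)–(57) are not reproduced above; for reference, the standard form of the geometric drift criterion in [20], to which they presumably correspond, reads as follows (with an assumed constant δ ∈ (0, 1) and the small set C):

```latex
% Geometric drift criterion (standard form, cf. [20]); notation assumed.
E\bigl[V(x_{k+1}) \mid x_k = x\bigr] \le (1-\delta)\,V(x), \qquad x \notin C,       % (55)
\sup_{x \in C} E\bigl[V(x_{k+1}) \mid x_k = x\bigr] < \infty,                        % (56)
\bigl\|P^k(x,\cdot) - P_{\mathrm{iv}}(\cdot)\bigr\|_{\mathrm{var}} \le V(x)\,\rho^k. % (57)
```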

Definition 4 ([10], [12]): For a random process {x_k}, let F_1^n ≜ σ{x_k, 1 ≤ k ≤ n} and F_{n+m}^∞ ≜ σ{x_k, k ≥ n+m}. The process is called α-mixing if

(58)

and is called β-mixing if

(59)

Mixing means that F_{n+m}^∞ is asymptotically independent of F_1^n as m tends to infinity. It is well known that α(m) ≤ β(m).
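The defining displays (58) and (59) are likewise not reproduced; the standard mixing coefficients to which they presumably correspond are:

```latex
% Standard alpha- and beta-mixing coefficients; notation assumed.
\alpha(m) = \sup_{n}\ \sup_{A \in \mathcal{F}_1^{n},\, B \in \mathcal{F}_{n+m}^{\infty}}
            \bigl|P(A \cap B) - P(A)P(B)\bigr|,
\qquad
\beta(m)  = \sup_{n}\ E\Bigl[\,\sup_{B \in \mathcal{F}_{n+m}^{\infty}}
            \bigl|P(B \mid \mathcal{F}_1^{n}) - P(B)\bigr|\Bigr].
```

The process is α-mixing (respectively, β-mixing) if α(m) → 0 (respectively, β(m) → 0) as m → ∞.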

Lemma B5 (Covariance Inequality for α-Mixing): Assume that the process is α-mixing. For random variables X and Y measurable with respect to F_1^n and F_{n+m}^∞, respectively, if E|X|^p < ∞ and E|Y|^q < ∞ with p, q > 1 and 1/p + 1/q < 1, then the covariance of X and Y is bounded by a constant multiple of α(m)^{1−1/p−1/q} (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.

Definition 5 ([18], [19]): Let {F_k} be a sequence of nondecreasing σ-algebras. The random sequence {w_k, F_k} is called a simple mixingale if w_k is F_k-measurable and if, for sequences of nonnegative constants {c_k} and {ψ_m} with ψ_m → 0 as m → ∞, the following conditions are satisfied:
(a) (E|E(w_k | F_{k−m})|²)^{1/2} ≤ ψ_m c_k for all k ≥ 1 and m ≥ 0;
(b) (E w_k²)^{1/2} ≤ c_k;
where F_{k−m} ≜ {∅, Ω} for k − m ≤ 0.


Lemma B6 ([18], [19]): Let {w_k, F_k} be a simple mixingale such that

(60)

and, for some constant,

(61)

Then

(62)

APPENDIX C
PROOFS

Proof of Lemma 1: Step 1: We first show that {x_k} defined by (6) is λ-irreducible.

Notice that {u_k} and {ε_k} are mutually independent with densities p_u(·) and p_ε(·), respectively. For any x ∈ ℝ² and A ∈ B(ℝ²) with λ(A) > 0, we have

(63)

From (63) it follows that, for each k:

(64)

Since both p_u(·) and p_ε(·) are positive and continuous on ℝ, it follows from (64) that for any A with λ(A) > 0 there exists a bounded set B ⊂ A with λ(B) > 0 such that

(65)

By Definition 1, the chain {x_k} is λ-irreducible.

Step 2: We now prove that λ is the maximal irreducibility measure of {x_k}.

Let φ be another measure on B(ℝ²) and let {x_k} be φ-irreducible. We first show that φ is absolutely continuous with respect to λ. Let A be a set with λ(A) = 0. From (64), for any x and any k, we have

(66)

From (66), we conclude that φ(A) = 0. Otherwise, by the assumption that {x_k} is φ-irreducible, from φ(A) > 0 it would follow that

(67)

which contradicts (66). Therefore, φ is absolutely continuous with respect to λ.

In (66) we have also shown that if λ(A) = 0, then the set of states from which A can be reached with positive probability is λ-null. So, by Definition 1, λ is the maximal irreducibility measure of {x_k}.

Step 3: We now show that the chain {x_k} is aperiodic. For this, by (65) and Lemma B1, it suffices to show that

(68)

whenever λ(B) > 0. Similar to (63) and (64), a direct calculation shows that

(69)

for any x and A. Since f(·) is continuous on ℝ² and p_u(·) and p_ε(·) are positive and continuous on ℝ, the corresponding transition density is also positive and continuous with respect to its arguments. For any B with λ(B) > 0, there exists some bounded subset B′ ⊂ B such that λ(B′) > 0. From (69) we then obtain the required positivity, so (68) holds whenever λ(B) > 0. Hence, by Lemma B1, {x_k} is aperiodic.

Step 4: Finally, we prove that any bounded set A with λ(A) > 0 is a small set.

For any bounded set A with λ(A) > 0, any x ∈ A, and any measurable B ⊂ A with λ(B) > 0, by noticing that p_u(·) and p_ε(·) are positive and continuous on ℝ, it follows from (64) that the lower bound required in (54) holds. Therefore, by Lemma B3, any bounded set A with λ(A) > 0 is a small set.

Proof of Lemma 2: To prove (i), by Lemma B2 we only need to show that the Doeblin Condition holds for {x_k} when f(·) is bounded. The following proof can be deduced from Lemmas 1 and 2 of Section 2.4.1.2 of [11], but, besides the boundedness of f(·), an additional condition is required therein. We now prove that the Doeblin Condition is satisfied without imposing any additional condition.

Since f(·) is bounded, from (64) and by Fubini's theorem, for any fixed x we have

(70)

By introducing suitable auxiliary quantities,

it follows from (70) that:

(71)

Since f(·) is bounded, the quantities so introduced are finite and positive. Using Fubini's theorem, we derive from (71) that

(72)

where the normalizing constant and the resulting measure are defined in the obvious way. Since the latter is a probability measure on B(ℝ²), the Doeblin Condition holds for the chain {x_k}.

By Lemma B2, the convergence (16) then holds with a bounded V(·), for some probability measure and some rate in (0, 1).

(ii) is an assertion of [24].

Proof of Lemma 3: We first prove (27). As x_k has a density, we have the following equalities:

(73)

where

Owing to A3), the integrand is dominated by an integrable function. Then, by the Lebesgue dominated convergence theorem, we see

(74)

By Theorem 2 and noting A3), we have

(75)

where the involved constants are positive. From (74) and (75) it follows that (27) holds. Similarly, convergence (28) can be proved.


To prove (29), for any fixed x* we note that

Hence (29) holds. Equation (30) can be proved in a similar manner.

Next, we prove (31) and (32). Introduce the centered summands and the associated σ-algebras. By Lemma B5, for all k and any fixed x*, we have the following inequality:

(76)

On the other hand, we have

(77)

Combining (76) and (77) leads to

(78)

where, by Theorem 3 and (29), the bounding terms decay geometrically and the normalizing constants are summable in the required sense. By Definition 5, the centered sequence is a simple mixingale.

Applying Lemma B6 then yields the desired almost sure convergence, for any fixed x*.

Hence (31) holds, while (32) can be proved in a similar manner.

Finally, we prove (33). Since the noise at time k+1 is independent of the data up to time k, for (33) we only need to verify

(79)

Noticing (30), we have

(80)

Hence (79) holds, which in turn implies that (33) is valid.

ACKNOWLEDGMENT

The authors would like to thank the Associate Editor and the reviewers for the valuable comments and suggestions which were helpful in improving the quality of the paper.

REFERENCES

[1] H. Z. An and F. C. Huang, "The geometrical ergodicity of nonlinear autoregressive models," Statistica Sinica, vol. 6, no. 4, pp. 943–956, 1996.

[2] E. W. Bai and Y. Liu, "Recursive direct weight optimization in nonlinear system identification: A minimal probability approach," IEEE Trans. Autom. Control, vol. 52, no. 2, pp. 1218–1231, Feb. 2007.

[3] E. W. Bai, R. Tempo, and Y. Liu, "Identification of IIR nonlinear systems without prior structural information," IEEE Trans. Autom. Control, vol. 52, no. 3, pp. 442–453, Mar. 2007.

[4] H. F. Chen, Stochastic Approximation and Its Applications. Dordrecht, The Netherlands: Kluwer, 2002.

[5] H. F. Chen, "Pathwise convergence of recursive identification algorithms for Hammerstein systems," IEEE Trans. Autom. Control, vol. 49, no. 10, pp. 1641–1649, Oct. 2004.

[6] H. F. Chen, "Recursive identification for Wiener model with discontinuous piece-wise linear function," IEEE Trans. Autom. Control, vol. 51, no. 3, pp. 390–400, Mar. 2006.

[7] H. F. Chen and L. Guo, Identification and Stochastic Adaptive Control. Boston, MA: Birkhäuser, 1991.

[8] Y. S. Chow and H. Teicher, Probability Theory. New York: Springer-Verlag, 1978.

[9] P. G. Ciarlet, Introduction à l'analyse numérique matricielle et à l'optimisation. Paris, France: Masson, 1982.

[10] Y. A. Davydov, "Mixing conditions for Markov chains," Theory Probab. Appl., vol. 18, no. 2, pp. 312–328, 1973.

[11] P. Doukhan, Mixing: Properties and Examples, Lecture Notes in Statistics 85. New York: Springer-Verlag, 1994.

[12] J. Q. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Approach. New York: Springer-Verlag, 2003.

[13] A. A. Georgiev, "Nonparametric system identification by kernel methods," IEEE Trans. Autom. Control, vol. 29, no. 4, pp. 356–358, Apr. 1984.

[14] W. Greblicki, "Stochastic approximation in nonparametric identification of Hammerstein systems," IEEE Trans. Autom. Control, vol. 47, no. 11, pp. 1800–1810, Nov. 2002.

[15] X. L. Hu and H. F. Chen, "Strong consistency of recursive identification for Wiener systems," Automatica, vol. 41, no. 11, pp. 1905–1916, 2005.

[16] L. Ljung, System Identification: Theory for the User. Upper Saddle River, NJ: Prentice Hall, 1987.

[17] L. Ljung, "Some aspects on nonlinear system identification," in Proc. 14th IFAC Symp. Syst. Identification, Newcastle, Australia, 2005, pp. 110–127.

[18] E. Masry and L. Györfi, "Strong consistency and rates for recursive probability density estimators of stationary processes," J. Multivariate Anal., vol. 22, no. 1, pp. 79–93, 1987.

[19] D. L. McLeish, "A maximal inequality and dependent strong laws," Ann. Probability, vol. 3, no. 5, pp. 829–839, 1975.

[20] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability. London, U.K.: Springer-Verlag, 1993.


[21] K. Narendra and S. Li, "Neural networks in control systems," in Mathematical Perspectives on Neural Networks, P. Smolensky, M. Mozer, and D. Rumelhart, Eds. Mahwah, NJ: Lawrence Erlbaum, 1996.

[22] M. B. Nevelson and R. Z. Khasminskii, Stochastic Approximation and Recursive Estimation, Translations of Mathematical Monographs, vol. 47. Providence, RI: Amer. Math. Soc., 1976.

[23] E. Nummelin, General Irreducible Markov Chains and Non-negative Operators. Cambridge, U.K.: Cambridge Univ. Press, 1984.

[24] E. Nummelin and P. Tuominen, "Geometric ergodicity of Harris recurrent Markov chains with applications to renewal theory," Stochastic Processes Appl., vol. 12, no. 2, pp. 187–202, 1982.

[25] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, no. 3, pp. 400–407, 1951.

[26] J. Roll, A. Nazin, and L. Ljung, "Nonlinear system identification via direct weight optimization," Automatica, vol. 41, no. 3, pp. 475–490, 2005.

[27] J. Sjöberg et al., "Nonlinear black-box modeling in system identification: A unified overview," Automatica, vol. 31, no. 12, pp. 1691–1724, 1995.

[28] H. Tong, Nonlinear Time Series. Oxford, U.K.: Oxford Univ. Press, 1990.

[29] J. Vörös, "Parameter identification of Wiener systems with multisegment piecewise-linear nonlinearities," Syst. Control Lett., vol. 56, no. 2, pp. 99–105, 2007.

[30] T. Wigren, "Recursive prediction error identification using the nonlinear Wiener model," Automatica, vol. 29, no. 4, pp. 1011–1025, 1993.

[31] W. X. Zhao, "Identification and adaptive tracking for some classes of stochastic systems," Ph.D. dissertation, Chinese Academy of Sciences, Beijing, China, 2008.

Wen-Xiao Zhao received the B.Sc. degree in applied mathematics from Shandong University, Jinan, China, in 2003 and the Ph.D. degree in operations research and control from the Institute of Systems Science, AMSS, Chinese Academy of Sciences, Beijing, China, in 2008.

He is currently a Postdoctoral Fellow with the Department of Automation, Tsinghua University, Beijing, China. His research interests are mainly in recursive identification, adaptive control, and systems biology.

Han-Fu Chen (SM'94–F'97) received the Diploma degree from the Leningrad (St. Petersburg) State University, Leningrad, Russia.

He joined the Institute of Mathematics, Chinese Academy of Sciences (CAS), Beijing, in 1961. Since 1979 he has been with the Institute of Systems Science, AMSS, CAS. He is a Professor of the Key Laboratory of Systems and Control of CAS. His research interests are mainly in stochastic systems, including system identification, adaptive control, and stochastic approximation. He has authored and coauthored more than 180 journal papers and seven books.

Mr. Chen is an IFAC Fellow, a Member of TWAS, and a Member of CAS.

Wei Xing Zheng (M'93–SM'98) received the Ph.D. degree in electrical engineering from Southeast University, Nanjing, China, in 1989.

He has held various faculty/research/visiting positions at Southeast University, China; Imperial College of Science, Technology and Medicine, U.K.; University of Western Australia; Curtin University of Technology, Australia; Munich University of Technology, Germany; University of Virginia, USA; and University of California-Davis, USA. Currently he holds the rank of Full Professor at the University of Western Sydney, Australia.

Dr. Zheng was an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: FUNDAMENTAL THEORY AND APPLICATIONS (2002–2004), the IEEE TRANSACTIONS ON AUTOMATIC CONTROL (2004–2007), the IEEE SIGNAL PROCESSING LETTERS (2007–), and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BRIEFS (2008–). He is also an Associate Editor of the IEEE Control Systems Society's Conference Editorial Board (2000–). Currently, he serves as a Guest Associate Editor of the Special Issue on Blind Signal Processing and Its Applications for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS. He has also served as the Chair of the IEEE Circuits and Systems Society's Technical Committee on Neural Systems and Applications and as the Chair-Elect of the IEEE Circuits and Systems Society's Technical Committee on Blind Signal Processing.
