[IEEE 1999 American Control Conference - San Diego, CA, USA (2-4 June 1999)] Proceedings of the 1999 American Control Conference (Cat. No. 99CH36251) - Important issues of system identification

Proceedings of the American Control Conference, San Diego, California, June 1999

Important Issues of System Identification using Algorithms for Structure Selection: Do noise terms have any influence when the data are chaotic?

Eduardo M. A. M. Mendes

Departamento de Eletricidade, FUNREI, Praça Frei Orlando 170, Centro

São João Del Rei, MG, 36300-000, Brazil. Tel.: +55 (0)32 3792542, Fax: +55 (0)32 3792306, E-mail: [email protected]

Abstract

This paper addresses one of the problems of model structure selection for nonlinear systems that behave chaotically. It will be shown that the noise terms can determine whether an estimated model is dynamically valid or not, even if the process terms of the model are perfectly correct.

Keywords: Linear Identification, Structure Detection and Chaos

1 Introduction

Mathematical modelling is of fundamental importance in science and engineering. It is a very useful and compact way of summarizing the knowledge about a process or system. This need for identifying and modelling underlying dependencies in observation data appears in a wide range of applications such as economics, medicine, process control and other areas like ecology and agriculture. Such model building processes enable many systems to be analyzed and understood.

A critical step in any identification procedure is the selection of the model structure. The search for simple but effective model structures is referred to as the model structure selection problem. Several techniques for model structure selection have been suggested in the literature [1, 2]. Such techniques provide a systematic approach to selecting the most relevant terms in accordance with some pre-defined (usually statistical) criterion. Amongst them is the error reduction ratio (ERR) devised by [3], which estimates the contribution that a term makes to the overall system output. The ERR is a by-product of the orthogonal least-squares (OLS) algorithm. This algorithm starts with an empty model; at each stage the contribution of every candidate term is measured, and the term with the greatest contribution at that stage is added to the final model.

There is an increasing need for models that represent the actual system not only statistically but also dynamically. To this end, one verifies whether the estimated model can reproduce the dynamical invariants of the system under scrutiny. In simulation studies it is not difficult to demonstrate that an identified model is a good representation of the original system. However, when no a priori knowledge is available, assessing the adequacy of a model is more difficult. The objective of this paper is to address the relationship between dynamically valid models and the presence of noise terms. It will be shown that the number and type of noise terms greatly affect the dynamics of the estimated models.

The paper is organized as follows. Section 2 covers most of the background needed to understand the paper. Section 3 shows the influence of noise terms on the process of identifying correct models. The main point of the paper is summarized in Section 4.

2 Preliminaries

This section briefly reviews some basic concepts which will be used in the identification of dynamically valid models.

2.1 Identification

Consider the nonlinear autoregressive moving average (NARMA) model [4]

$$ y(k) = F^{\ell}\big[y(k-1), \dots, y(k-n_y), e(k-1), \dots, e(k-n_e)\big] + e(k) \qquad (1) $$

where $n_y$ and $n_e$ are the maximum lags considered for the process and noise terms, respectively. Moreover, $y(k)$ is a time series, $e(k)$ accounts for uncertainties, possible noise, unmodelled dynamics, etc., and $F^{\ell}[\cdot]$ is some nonlinear function of $y(k)$ and $e(k)$ with degree of nonlinearity $\ell \in \mathbb{Z}^{+}$. To estimate the parameters of a polynomial map $F^{\ell}[\cdot]$ of degree $\ell$, equation (1) has to be expressed in prediction form as

$$ y(k) = \psi^{T}(k-1)\,\hat{\theta} + \xi(k) \qquad (2) $$

where $\psi^{T}(k-1) = [\psi_{y}^{T}(k-1)\;\; \psi_{y\xi}^{T}(k-1)\;\; \psi_{\xi}^{T}(k-1)]$ and $\hat{\theta} = [\hat{\theta}_{y}^{T}\;\; \hat{\theta}_{y\xi}^{T}\;\; \hat{\theta}_{\xi}^{T}]^{T}$. Here $\psi_{y}(k-1)$ is a matrix which contains output terms up to and including time $k-1$; the matrices $\psi_{y\xi}(k-1)$ and $\psi_{\xi}(k-1)$ are defined similarly. The parameters corresponding to the terms in these matrices are the elements of the vectors $\hat{\theta}_{y}$, $\hat{\theta}_{y\xi}$ and $\hat{\theta}_{\xi}$, respectively. Finally, $\xi(k)$ are the residuals, defined as the difference between the measured data $y(k)$ and the one-step-ahead prediction $\psi^{T}(k-1)\hat{\theta}$. The parameter vector $\hat{\theta}$ can be estimated by orthogonal least-squares techniques [5], which also allow the structure of the final model to be determined.

0-7803-4990-6/99 $10.00 © 1999 AACC
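To make the construction of the process part of the regressor in (2) concrete, the following Python sketch enumerates every monomial of degree 0 to $\ell$ in the lagged outputs $y(k-1), \dots, y(k-n_y)$ and evaluates each one over a data record. It is an illustration under stated assumptions, not the code used in this study: the helper names `candidate_terms` and `regressor_matrix` are hypothetical, the constant term is assumed to belong to the candidate pool, and the cross and noise terms ($\psi_{y\xi}$ and $\psi_{\xi}$) are left out because they require residuals from a previous fit.

```python
from itertools import combinations_with_replacement

import numpy as np


def candidate_terms(n_lags, degree):
    """All monomials in y(k-1), ..., y(k-n_lags) of total degree 0..degree.

    Each term is a tuple of lags, e.g. (1, 1, 4) means y(k-1)^2 * y(k-4);
    the empty tuple () is the constant term (assumed part of the pool).
    """
    terms = []
    for d in range(degree + 1):
        terms.extend(combinations_with_replacement(range(1, n_lags + 1), d))
    return terms


def regressor_matrix(y, terms, n_lags):
    """Evaluate every candidate term for k = n_lags, ..., N-1.

    Returns the regressor matrix (one column per term) and the target y(k).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    P = np.ones((n - n_lags, len(terms)))
    for j, lags in enumerate(terms):
        for lag in lags:
            # y(k - lag) for k = n_lags, ..., n-1
            P[:, j] *= y[n_lags - lag : n - lag]
    return P, y[n_lags:]


print(len(candidate_terms(5, 3)))  # 56
```

With $\ell = 3$ and five lags, the setting used in Section 3, such a pool contains $\binom{8}{3} = 56$ candidate process terms (including the constant), which is the set a structure-selection algorithm has to search.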

Parameter estimation for the aforementioned orthogonal techniques is performed for a linear-in-the-parameters model which is closely related to (2) and which is represented as $y(k) = \sum_{i=1}^{n_p+n_\xi} g_i w_i(k) + \xi(k)$, where $n_p + n_\xi$ is the number of (process plus noise) terms in the model, $\{g_i\}_{i=1}^{n_p+n_\xi}$ are constant parameters and the polynomials $\{w_i(k)\}_{i=1}^{n_p+n_\xi}$ are constructed from the original monomials so as to be orthogonal over the data records. Finally, the original parameters of the model in equation (2) can easily be retrieved from $\{g_i\}_{i=1}^{n_p+n_\xi}$.
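The retrieval of the original parameters can be sketched as follows, assuming the classical Gram–Schmidt variant (the orthogonal estimators discussed in [5] also include modified Gram–Schmidt and Householder-based forms). The regressor matrix is factored as $P = WA$, with mutually orthogonal columns in $W$ and a unit upper-triangular $A$; since $y \approx Wg = PA^{-1}g$, the original parameters follow from $A\hat{\theta} = g$ by back substitution. The function names are illustrative only.

```python
import numpy as np


def gram_schmidt(P):
    """Classical Gram-Schmidt factorization P = W @ A.

    The columns of W are mutually orthogonal over the data record and
    A is unit upper triangular.
    """
    n_rows, n_cols = P.shape
    W = np.zeros((n_rows, n_cols))
    A = np.eye(n_cols)
    for i in range(n_cols):
        w = P[:, i].astype(float)
        for j in range(i):
            A[j, i] = (W[:, j] @ P[:, i]) / (W[:, j] @ W[:, j])
            w = w - A[j, i] * W[:, j]
        W[:, i] = w
    return W, A


def original_parameters(P, y):
    """Estimate g in y = W g + xi, then recover theta from A theta = g."""
    W, A = gram_schmidt(np.asarray(P, dtype=float))
    g = (W.T @ y) / np.sum(W * W, axis=0)   # g_i = <w_i, y> / <w_i, w_i>
    theta = np.zeros_like(g)
    # Back substitution: A is unit upper triangular.
    for i in range(len(g) - 1, -1, -1):
        theta[i] = g[i] - A[i, i + 1 :] @ theta[i + 1 :]
    return theta, g
```

Classical Gram–Schmidt is used only for readability; for ill-conditioned regressors the modified variant is numerically safer.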

A criterion for selecting the most important terms in the model can be devised as a by-product of the orthogonal parameter estimation procedure. The maximum mean squared prediction error (MSPE) is obtained when no terms are included in the model; in this case the MSPE equals $\overline{y^2(k)}$, where the over-bar indicates time averaging. The reduction in the MSPE due to the inclusion of the $i$th term, $g_i w_i(k)$, in the auxiliary model mentioned in the previous paragraph is $g_i^2\,\overline{w_i^2(k)} = (1/N)\,g_i^2 \sum_{k=1}^{N} w_i^2(k)$. Expressing this reduction as a percentage of the total MSPE yields the error reduction ratio (ERR) [1]

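$$ \mathrm{ERR}_i = \frac{g_i^2\,\overline{w_i^2(k)}}{\overline{y^2(k)}} \times 100, \qquad i = 1, \dots, n_p + n_\xi \qquad (3) $$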

Hence those terms with large values of ERR are selected to form the model.
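A minimal sketch of the forward-regression procedure, assuming a fixed number of terms as the stopping rule (in practice a threshold on the accumulated ERR is often used instead): at each stage every remaining candidate column is orthogonalized against the terms already chosen, its ERR is computed as defined above, and the candidate with the largest ERR joins the model. The name `ols_err_select` is hypothetical.

```python
import numpy as np


def ols_err_select(P, y, n_terms, tol=1e-10):
    """Forward-regression OLS: add, one at a time, the candidate with the largest ERR.

    P: (N, M) matrix of candidate terms; y: (N,) output.
    Returns the selected column indices and their ERR values in percent.
    """
    P = np.asarray(P, dtype=float)
    y = np.asarray(y, dtype=float)
    yy = y @ y                      # N times the MSPE of the empty model
    selected, errs, basis = [], [], []
    for _ in range(n_terms):
        best = None
        for j in range(P.shape[1]):
            if j in selected:
                continue
            # Orthogonalize candidate j against the terms already chosen.
            w = P[:, j].copy()
            for b in basis:
                w -= (b @ w) / (b @ b) * b
            ww = w @ w
            if ww <= tol * (P[:, j] @ P[:, j]):
                continue            # (numerically) a combination of chosen terms
            g = (w @ y) / ww
            err = 100.0 * g * g * ww / yy   # ERR as defined above; 1/N cancels
            if best is None or err > best[1]:
                best = (j, err, w)
        if best is None:
            break
        selected.append(best[0])
        errs.append(best[1])
        basis.append(best[2])
    return selected, errs
```

Because the ERR values of the selected terms are computed against mutually orthogonal regressors, they add up to at most 100 %, and to exactly 100 % when the output lies in the span of the chosen terms.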

Note that the ERR is a statistical criterion and therefore does not guarantee that the final selected model is dynamically valid. The ERR's failure to select the relevant terms in some cases is addressed in [6, 7].

3 Do noise terms have any influence when the data are chaotic?

It is well known that when the data under investigation are finite, the number of possible nonlinear approximations can be quite large. In such cases it is very difficult to determine which model structure generated the data, since there is a multitude of possible model structures to choose from.

The objective of this section is to show that even if a good structure is correctly identified, additional steps are sometimes necessary to reproduce the chaotic behaviour. The data analyzed here contain noise, which makes it much more difficult for an algorithm such as OLS-ERR to choose a good structure. The reader is reminded that a good structure, in the context of this section, means a model structure which can reproduce the dynamical invariants of the original system. An alternative way of dealing with noisy data is to filter the data prior to identification [8].

The data used to illustrate the main results were generated from a simulation of Chua's circuit [9, 10, 11, 12] when it settles on the well-known double scroll attractor [9]. The Lyapunov spectrum is $\lambda_1 = 0.23$, $\lambda_2 = 0.0$ and $\lambda_3 = -1.78$, and therefore the Lyapunov dimension $D_L$ is 2.13 [13].
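The quoted dimension can be checked from the spectrum. The sketch below assumes that the Lyapunov dimension is the usual Kaplan–Yorke value, $D_L = j + (\lambda_1 + \dots + \lambda_j)/|\lambda_{j+1}|$, where $j$ is the largest index for which $\lambda_1 + \dots + \lambda_j \ge 0$; for the spectrum above this gives $2 + 0.23/1.78 \approx 2.13$. The function name is illustrative.

```python
def kaplan_yorke_dimension(exponents):
    """Lyapunov (Kaplan-Yorke) dimension of a spectrum of Lyapunov exponents."""
    lam = sorted(exponents, reverse=True)
    partial = 0.0
    for j, l in enumerate(lam):
        if partial + l < 0:
            # The first j exponents have a non-negative sum;
            # interpolate into the next (negative) one.
            return j + partial / abs(l)
        partial += l
    return float(len(lam))  # the partial sums never become negative


print(round(kaplan_yorke_dimension([0.23, 0.0, -1.78]), 2))  # 2.13
```

A spectrum with only negative exponents (a stable fixed point) gives $D_L = 0$, consistent with the value 0.0 reported below for non-chaotic models.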

For comparison, the data are the same as those used by Aguirre and Billings in a recent paper [14], where the benefits of one-step-ahead filtering were presented for data embedded in noise. Those authors could not find any model capable of reconstructing the attractor when the data were not filtered using their approach. Using MATLAB it was possible to test several other structures which, fortunately, had the desired characteristics, i.e., the chaotic behaviour of Chua's circuit.

The contaminated data used in [14] contained 18001 points sampled at $T_s = 0.015$. These particular data were used only for filtering. The data set used in the present study was obtained by decimating¹ the raw data by a factor of 10. It is worth stressing that no filtering was performed during decimation.
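Decimation without filtering amounts to plain downsampling. The sketch below assumes that the kept samples start at the first point of the record; under that assumption the 18001-point record becomes 1801 points with an effective sampling interval of $10\,T_s = 0.15$. By contrast, `scipy.signal.decimate` applies an anti-aliasing low-pass filter before downsampling, which is not what is described here. The helper name is hypothetical.

```python
import numpy as np


def decimate_without_filtering(x, factor):
    """Keep every `factor`-th sample, starting at the first one.

    No anti-aliasing filter is applied before discarding samples.
    """
    return np.asarray(x)[::factor]


Ts = 0.015                   # sampling interval of the raw record
raw = np.arange(18001.0)     # placeholder standing in for the 18001-point record
z = decimate_without_filtering(raw, 10)
print(len(z), round(10 * Ts, 3))   # 1801 0.15
```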

All models were obtained using the initial specification $\ell = 3$, $n_z = 5$². This yields models of lower complexity than those proposed in [14]. To analyze the different structures obtained by the OLS-ERR algorithm, the following strategy was adopted: i) the number of process terms was varied from 7 to 25; ii) the number of linear noise terms was 0, 5, 10, 20 and 25, corresponding to maximum noise lags of 0, 5, 10, 20 and 25, respectively (although the noise was added to the data, i.e., additive noise, noise model terms turn out to be important in recovering the chaotic behaviour); and iii) the models were validated by comparing the original attractor, the Lyapunov spectrum and the Lyapunov dimension.
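How the linear noise terms are estimated is not spelled out here. A common approach for NARMAX-type models, assumed in the sketch below, is an extended-least-squares iteration: the residuals $\xi(k)$ of the current fit stand in for the unobserved $e(k)$, the lagged residuals $\xi(k-1), \dots, \xi(k-n_\xi)$ are appended to the process regressors, and the fit is repeated. The function name and the number of iterations are hypothetical. The process columns never change in this loop; only their coefficients move, which is the kind of "tuning" referred to in this section.

```python
import numpy as np


def fit_with_linear_noise_terms(P, y, n_noise, n_iter=10):
    """Hypothetical extended-least-squares loop.

    P: (N, n_p) process regressors, y: (N,) target, n_noise: number of
    linear noise terms xi(k-1), ..., xi(k-n_noise).
    Returns (process coefficients, noise coefficients).
    """
    P = np.asarray(P, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.linalg.lstsq(P, y, rcond=None)[0]
    if n_noise == 0:
        return theta, np.empty(0)
    xi = y - P @ theta                  # residuals of the process-only fit
    for _ in range(n_iter):
        # Column for lag L holds xi(k - L), padded with zeros at the start.
        lagged = [np.concatenate([np.zeros(lag), xi[:-lag]])
                  for lag in range(1, n_noise + 1)]
        full = np.column_stack([P] + lagged)
        theta = np.linalg.lstsq(full, y, rcond=None)[0]
        xi = y - full @ theta           # refreshed residuals for the next pass
    n_p = P.shape[1]
    return theta[:n_p], theta[n_p:]
```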

Table (1) shows the Lyapunov dimension for several models as the number of process and noise terms is varied. The Lyapunov exponents are shown in Tables (2) and (3) only for models with 0 and 20 noise terms, respectively. It is interesting to note that the Lyapunov dimension varies considerably, over a range from 0 to 2.148. This means that completely different chaotic and non-chaotic behaviours have been identified from a single piece of data. Such a variety of behaviours is not observed when non-chaotic data are analysed; the inclusion of noise terms is therefore not as critical there as it is in the identification of chaotic systems. For instance, models with 13 process terms and different numbers of noise terms presented non-chaotic behaviour ($D_L = 0.0$), chaotic behaviour of an unknown type ($D_L = 1.159$ and $D_L = 1.379$) and the spiral attractor ($D_L = 2.018$ and $D_L = 2.020$). All identified models had the same process terms; only the number of noise terms was varied. The noise terms "tuned" the coefficients of the process terms, as will be explained in more detail later in this section.

To determine further the effects of the introduction of noise terms, consider models identified with 18 process terms. One example of such a model is

$$
\begin{aligned}
z(k) ={}& +0.15486\times10^{+1}\,z(k-1) - 0.63085\times10^{-?}\,z(k-3)z(k-4)^2 - 0.71827\times10^{+?}\,z(k-3) \\
& +0.30381\times10^{-?}\,z(k-1)^2 z(k-4) + 0.62423\times10^{+?}\,z(k-5) + 0.39056\times10^{-?}\,z(k-4)^2 z(k-5) \\
& +0.13292\times10^{-?}\,z(k-2)z(k-3)z(k-4) - 0.39112\times10^{+0}\,z(k-4) - 0.65396\times10^{-?}\,z(k-1)^3 \\
& +0.2453\times10^{?}\,z(k-2) - 0.17522\times10^{+?}\,z(k-5)^3 - 0.21519\times10^{+?}\,z(k-1)^2 z(k-5) \\
& -0.34382\times10^{+?}\,z(k-1)z(k-5)^2 + 0.77448\times10^{-?}\,z(k-1)^2 z(k-3) + 0.42399\times10^{?}\,z(k-3)z(k-5)^2 \\
& +0.35102\times10^{+0}\,z(k-1)z(k-3)z(k-5) + 0.58725\times10^{-?}\,z(k-1)z(k-4)z(k-5) - 0.25749\times10^{+?}\,z(k-3)^2 z(k-5)
\end{aligned}
\qquad (4)
$$

Table 1: Values of the Lyapunov dimension for models identified from noisy data of the z-coordinate of Chua's circuit. $n_p$ is the number of process terms in the model; $n_\xi$ is the number of noise terms.

¹ The terms sampling and decimating may cause confusion. In this paper, sampling always refers to the experimental part, whereas decimating refers to the pre-processing of the data under investigation.

² The letter z in $n_z$ indicates that the estimated models were obtained using data of the z-coordinate of Chua's equations.

Table (1) shows that, for this model structure, when the number of noise terms is zero the Lyapunov dimension is zero, indicating that the estimated model cannot reproduce the chaotic behaviour. However, when the $z(k-1)$ coefficient of model (4) is varied within a certain range of values, the model exhibits chaotic-like behaviour, as shown by the bifurcation diagram in Figure (1). Indeed, if the coefficient is set to 1.64, the model's attractor is closer to the double scroll attractor than when the estimated coefficient is used, as can be seen in Figure (2). This indicates that the model structure is correct, in the sense that it can reproduce the desired chaotic behaviour, but that the estimated coefficients are in error. The adequacy of this model structure is further confirmed by Table (4), which gives the Lyapunov spectrum and dimension; the values are in good agreement with the original values mentioned earlier in this section. One way to correct the coefficients is to add noise terms that "tune" them, even though, in principle, the model does not require such terms, since the noise is additive, IID and Gaussian. The assumption that the residuals are IID normal is considered a strong one [15]. Mees stated that the assumption that the residuals are independent is dubious whenever embedding techniques are used, but in simple cases such as this one the assumption is harmless.

Table 2: Lyapunov exponents for models without linear noise terms. Noisy data from the z-coordinate of Chua's circuit were used for identification. $n_p$ is the number of process terms in the models.
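A bifurcation diagram such as Figure (1) is obtained by sweeping one coefficient, iterating the model past its transient for each value, and plotting the values visited afterwards. The sketch below is generic, and `bifurcation_slices` is a hypothetical helper. To keep the example self-contained, the logistic map $x \leftarrow \gamma x(1-x)$ is used purely as a stand-in one-parameter map; for a delay model such as (4), `step` would instead shift the state vector $[z(k-1), \dots, z(k-5)]$ and evaluate the polynomial with $\gamma$ as the coefficient of $z(k-1)$.

```python
import numpy as np


def bifurcation_slices(step, gammas, x0, n_transient=500, n_keep=200):
    """For each parameter value, return the states visited after the transient."""
    slices = []
    for gamma in gammas:
        x = x0
        for _ in range(n_transient):      # discard the transient
            x = step(x, gamma)
        visited = []
        for _ in range(n_keep):
            x = step(x, gamma)
            visited.append(x)
        slices.append(np.array(visited))
    return slices


def logistic(x, gamma):
    """Stand-in one-parameter map (not model (4))."""
    return gamma * x * (1.0 - x)


slices = bifurcation_slices(logistic, [2.8, 3.2, 3.9], 0.3)
print([len(np.unique(np.round(s, 6))) for s in slices[:2]])   # [1, 2]
```

Plotting each slice against its parameter value gives the diagram: a single point for a fixed point, two points for a period-2 orbit, and a spread of points in chaotic regions.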

Figure 1: Part of the bifurcation diagram for the model with 18 process terms but without noise terms (horizontal axis: $\gamma$). The data were embedded using the time-delay method. The parameter $\gamma$ is the coefficient of the term $z(k-1)$.

When twenty noise terms with lags up to twenty are added, the identified model has a Lyapunov dimension close to 2.13, which suggests that the structure may exhibit the desired chaotic behaviour. Plotting the attractor of this model confirms its closeness to the original system, as can be seen in Figure (3). It is worth emphasizing that the process terms have not changed; only their coefficients were tuned by the inclusion of noise terms.
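Lyapunov spectra of identified models, such as those in Tables (2) to (6), can be estimated directly from the model equations. Which algorithm produced the tabulated values is not stated, so the sketch below assumes the standard QR-based (Benettin-type) method applied to the model written as a map on the delay vector; a model with five lags therefore has five exponents, as in Tables (5) and (6). As a checkable stand-in, the Hénon map is written in delay form, $x(k) = 1 - 1.4\,x(k-1)^2 + 0.3\,x(k-2)$, a polynomial model of the same kind; its exponents must sum to $\ln 0.3$ because its Jacobian determinant is constant. Function names are illustrative.

```python
import numpy as np


def lyapunov_spectrum(step, jacobian, state, n_iter=20000, n_transient=1000):
    """QR-based estimate of all Lyapunov exponents of a map (per iteration)."""
    x = np.asarray(state, dtype=float)
    for _ in range(n_transient):
        x = step(x)
    Q = np.eye(x.size)
    log_sums = np.zeros(x.size)
    for _ in range(n_iter):
        # Propagate an orthonormal frame with the tangent map, re-orthonormalize.
        Q, R = np.linalg.qr(jacobian(x) @ Q)
        log_sums += np.log(np.abs(np.diag(R)))
        x = step(x)
    return np.sort(log_sums / n_iter)[::-1]


def henon_step(s):
    """Delay form of the Henon map: s = [x(k-1), x(k-2)] -> [x(k), x(k-1)]."""
    return np.array([1.0 - 1.4 * s[0] ** 2 + 0.3 * s[1], s[0]])


def henon_jacobian(s):
    """Jacobian of henon_step with respect to the delay vector."""
    return np.array([[-2.8 * s[0], 0.3], [1.0, 0.0]])


lams = lyapunov_spectrum(henon_step, henon_jacobian, [0.1, 0.1])
print(lams.sum())   # about -1.204, i.e. ln 0.3
```

The exponents can then be passed to the Kaplan–Yorke formula to obtain $D_L$.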

Tables (5) and (6) show the variation of the Lyapunov dimension and exponents when the number of noise terms is changed, for models with 18 and 20 process terms, respectively. Table (6) shows that the model identified with 20 process terms and 10 linear noise terms does not have $D_L$ close to 2.13, which indicates different dynamical characteristics. A closer look at Figure (4) shows that its attractor resembles Chua's spiral attractor.

Figure 2: Attractor for the model with 18 process terms but without noise terms: (a) estimated coefficient; (b) coefficient equal to 1.64. $T_p = 3$ is used for embedding the data.

Table 3: Lyapunov exponents for models with 20 linear noise terms. Noisy data from the z-coordinate of Chua's circuit were used for identification. $n_p$ is the number of process terms in the models.

Lyapunov exponents: $\lambda_1 = 0.2628$, $\lambda_2 = -0.008203$, $\lambda_3 = -1.5$, $\lambda_4 = -1.545$, $\lambda_5 = -3.045$; $D_L = 2.1697$

Table 4: Lyapunov exponents of model (4) when the coefficient of the term $z(k-1)$ equals 1.64.

Model with 18 process terms: dynamic characteristics

$n_\xi$ | $D_L$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ | $\lambda_4$ | $\lambda_5$
20 | 2.148 | 0.230 | 0.001 | −1.562 | −1.549 | −4.637
25 | 2.113 | 0.199 | −0.013 | −1.638 | −1.646 | −4.277

Table 5: Values of the Lyapunov dimension and exponents for estimated models with 18 process terms. $n_\xi$ is the number of noise terms.

The inclusion of noise terms in the estimated models can be thought of as a regularization method, since additional information about the desired solution is incorporated in order to single out a useful and stable solution. For an introduction to this subject, see [16]. Admittedly, the inclusion of noise terms is a rather restricted form of regularization, in which the only free parameters are the number and type of noise terms.

Figure 3: (a) Double scroll attractor and (b) attractor for the model with 18 process terms and 20 linear noise terms. $T_p = 3$ is used for embedding the data.

Further analysis of the results presented in this section shows that model structure selection is more robust to noise than parameter estimation. That is, even if a model does not reproduce the desired dynamical characteristics, its structure can still be considered adequate. The results for Chua's circuit provide clear support for this idea. It is also believed that these results have important implications when real data are analyzed: models previously regarded as poor can be "regularized" in such a way that the desired dynamical characteristics are achieved. Of course, this can only be done if the model structure is correct.

Model with 20 process terms: dynamic characteristics

$n_\xi$ | $D_L$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ | $\lambda_4$ | $\lambda_5$
20 | 2.131 | 0.234 | −0.004 | −1.750 | −1.751 | −4.740
25 | 2.101 | 0.189 | (illegible) | −1.819 | −1.828 | −4.568

Table 6: Values of the Lyapunov dimension and exponents for estimated models with 20 process terms. $n_\xi$ is the number of noise terms.

Figure 4: Attractor for the model with 20 process terms and 10 linear noise terms.

It is worth mentioning that the identified models shown in this section exhibit sensitive dependence on the parameters (SSD), i.e., a slight change in one of the parameters can cause the model to behave completely differently. This phenomenon is exemplified in Figure (4), where a model identified from double scroll data reproduced Chua's spiral attractor, which corresponds to changing the parameter $c_1$ of the original equations from 9.0 to 8.0. It is conjectured that the SSD of estimated models is closely related to the sensitive dependence on parameters of the quadratic mapping discussed in [17].
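The parameter change behind Figure (4) can be explored on the original system. The sketch below integrates the dimensionless Chua equations in the form commonly associated with the double scroll family [9], $\dot{x} = \alpha(y - h(x))$, $\dot{y} = x - y + z$, $\dot{z} = -\beta y$, with $h(x) = m_1 x + \tfrac{1}{2}(m_0 - m_1)(|x+1| - |x-1|)$. The values $\beta = 100/7$, $m_0 = -1/7$, $m_1 = 2/7$, and the identification of $c_1$ with $\alpha$ (9.0 for the double scroll, 8.0 for the spiral attractor), are assumptions of this sketch rather than details given above; a fixed-step fourth-order Runge–Kutta scheme keeps it free of extra dependencies.

```python
import numpy as np


def chua_rhs(s, alpha, beta=100.0 / 7.0, m0=-1.0 / 7.0, m1=2.0 / 7.0):
    """Dimensionless Chua equations with a three-segment nonlinearity h(x)."""
    x, y, z = s
    h = m1 * x + 0.5 * (m0 - m1) * (abs(x + 1.0) - abs(x - 1.0))
    return np.array([alpha * (y - h), x - y + z, -beta * y])


def simulate_chua(alpha, s0=(0.1, 0.0, 0.0), dt=0.01, n_steps=20000):
    """Fixed-step RK4 integration; returns the (n_steps, 3) trajectory."""
    s = np.array(s0, dtype=float)
    traj = np.empty((n_steps, 3))
    for i in range(n_steps):
        k1 = chua_rhs(s, alpha)
        k2 = chua_rhs(s + 0.5 * dt * k1, alpha)
        k3 = chua_rhs(s + 0.5 * dt * k2, alpha)
        k4 = chua_rhs(s + dt * k3, alpha)
        s = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[i] = s
    return traj


double_scroll = simulate_chua(alpha=9.0)   # c1 = 9.0, assumed to be alpha
spiral = simulate_chua(alpha=8.0)          # c1 = 8.0
```

Comparing the two trajectories, for example by plotting $z$ against its delayed copy, shows how a one-unit change in a single parameter moves the system between qualitatively different attractors.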

Finally, it is important to stress that, although the results presented above were based on the analysis of a single example, they were confirmed when other systems were analysed, such as the Rössler equation for hyperchaos [18] and the Lorenz equations [19], and even when a dynamically valid model was identified from real data [20].

4 Conclusions

The influence of introducing noise terms into identified models has been investigated for chaotic systems. Despite having the correct structure, models identified from data generated by such systems have been shown to be sensitive to small variations in their parameters, especially when they are required to reproduce the dynamical invariants of the original chaotic systems.

Acknowledgments - We are grateful to Prof. S. A. Billings for his constant help.

References

[1] S. A. Billings, S. Chen, and M. J. Korenberg. Identification of MIMO nonlinear systems using a forward-regression orthogonal estimator. Int. J. Control, 49(6):2157–2189, 1989.

[2] H. Haber and H. Unbehauen. Structure identification of nonlinear dynamic systems: A survey on input/output approaches. Automatica, 26(4):651–677, 1990.

[3] S. A. Billings, M. Korenberg, and S. Chen. Identification of non-linear output-affine systems using an orthogonal least-squares algorithm. Int. J. Systems Sci., 19(8):1559–1568, 1988.

[4] S. Chen and S. A. Billings. Representations of nonlinear systems: the NARMAX model. Int. J. Control, 49(3):1013-1032, 1989.

[5] S. Chen, S. A. Billings, and W. Luo. Orthogonal least squares methods and their application to nonlinear system identification. Int. J. Control, 50(5):1873–1896, 1989.

[6] E. M. A. M. Mendes and S. A. Billings. An important issue of system identification using algorithms for structure selection. In 12th Conference on Robotics and Factories of the Future, Middlesex University, London, England, 1996.

[7] E. M. A. M. Mendes. An important issue of system identification using algorithms for structure selection: Residual overflow. In 13th Conference on Robotics and Factories to the Future, Pereira, Colombia, 1997.

[8] L. A. Aguirre, E. M. Mendes, and S. A. Billings. Smoothing data with local instabilities for the identification of chaotic systems. Int. J. Control, 1995.

[9] L. O. Chua, M. Komuro, and T. Matsumoto. The double scroll family. IEEE Trans. Circuits Syst., 33(11):1072–1118, 1986.

[10] Leon O. Chua. The genesis of Chua's circuit. International Journal of Electronics and Communications (AEU), 46(4):250–257, 1992.

[11] L. O. Chua and M. Hasler (Guest Editors). Special issue on chaos in nonlinear electronic circuits. IEEE Trans. Circuits Syst., 40(10–11), 1993.

[12] Leon O. Chua. Chua's circuit: An overview ten years later. Journal of Circuits, Systems and Computers, 4(2):117–159, 1994.

[13] S. Chialina, M. Hasler, and A. Premoli. Fast and accurate calculation of Lyapunov exponents for piecewise linear systems. Int. J. Bifurcation and Chaos, 4(1) (in press), 1994.

[14] L. A. Aguirre and S. A. Billings. Identification of models for chaotic systems from noisy data: implications for performance and nonlinear filtering. Physica D, 85(1,2):239–258, 1995.

[15] Alistair I. Mees. Parsimonious dynamical reconstruction. Int. J. Bifurcation and Chaos, 3(3):669–675, 1993.

[16] Per Christian Hansen. Regularization Tools: A Matlab Package for Analysis and Solution of Discrete Ill-Posed Problems. UNI-C, Danish Computing Center for Research and Education, Building 305, Technical University of Denmark, DK-2800 Lyngby, Denmark, 1993.

[17] J. D. Farmer. Sensitive dependence on parameters in nonlinear dynamics. Phys. Rev. Lett., 55(4):351–354, 1985.

[18] O. E. Rössler. An equation for hyperchaos. Physics Letters A, 71(2,3):155–157, 1979.

[19] Edward N. Lorenz. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20:131–141, 1963.

[20] E. M. A. M. Mendes. Identification of a fluidized bed system: Model structure and validation issues. In COC’97 - Control of Oscillations and Chaos, Saint Petersburg, Russia, 1997.
