
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 50, NO. 10, OCTOBER 2005 1597

Technical Notes and Correspondence

    On the Role of Prefiltering in Nonlinear

    System Identification

    William Spinelli, Luigi Piroddi, and Marco Lovera

Abstract—Data prefiltering is often used in linear system identification to increase model accuracy in a specified frequency band, as prefiltering is equivalent to a frequency weighting on the prediction error function. However, this interpretation applies only to a strictly linear setting of the identification problem. In this note, the role of data and error prefiltering in nonlinear system identification is analyzed and a frequency domain interpretation is provided, based on the Volterra series representation of nonlinear systems. Simulation results illustrate the conclusions of the analysis.

Index Terms—Frequency domain analysis, NARX modeling, nonlinear identification, prefiltering, sampling time, Volterra series.

    I. INTRODUCTION

Increasing attention is being devoted in the system identification literature to the problem of identifying suitable dynamic models for nonlinear systems: many different model classes and algorithms have been proposed in recent years (see, e.g., [1] and [2] for a comprehensive review of nonlinear black-box approaches and [3]–[9] for some recent contributions), some of which are devised as extensions to the nonlinear case of well-known linear models and procedures. However, though many aspects of linear identification can be conveniently extended, there are some that have become common practice also in nonlinear identification, but that should be carefully reexamined to be correctly applied in that context. One of these is data prefiltering, which is systematically employed in linear identification, mostly for antialiasing reasons and to reduce the effect of high-frequency disturbances in order to increase the signal-to-noise ratio. It is well known from the linear identification literature (see, e.g., [10]) that prefiltering may be used to improve model accuracy in a specified frequency band, e.g., as a result of control specifications which narrow the frequency region over which accurate models are actually needed. Typically, input and output data are filtered with suitable low-pass or band-pass filters and the filtered data are used for identification: in the linear case this is equivalent to a filtering of the prediction error, which determines a weighting on the cost function in prediction error estimation, and is thus one of the most important design variables for selectively emphasizing the fit of the linear model.

Little attention has been dedicated to the analysis of prefiltering in nonlinear system identification: in [11] it is pointed out that data prefiltering will introduce bias into the estimates even in the noise-free case, and a detailed analysis of mean-level-induced bias is performed.

In this note, a convenient setting for the general analysis of prefiltering-induced bias is proposed, which extends to the nonlinear case classical results on the frequency domain interpretation of prediction error filtering available in the linear system identification literature. In

Manuscript received May 29, 2004; revised February 17, 2005 and May 23, 2005. Recommended by Associate Editor L. Ljung. This work was supported by the Italian National MIUR project "Innovative Techniques for Identification and Adaptive Control of Industrial Systems."

The authors are with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133 Milano, Italy (e-mail: [email protected]; [email protected]; [email protected]).

    Digital Object Identifier 10.1109/TAC.2005.856655

addition, the discussion points out the pitfalls associated with the application to nonlinear problems of the practice of data prefiltering, while error filtering can be effective, though ad hoc parameter estimation algorithms have to be devised. Finally, it is shown how error prefiltering can be included in the identification cycle for specific nonlinear model classes of the NARMAX [1], [12] family.

The analysis employs the Volterra series representation of nonlinear systems [13], [14], for which frequency analysis tools have been developed, such as the generalized frequency response functions [15].

    II. PREFILTERING IN LINEAR IDENTIFICATION

Consider a linear identification problem where the input–output (I/O) data are generated by the true system

y(t) = G0(q⁻¹) u(t) + e(t)    (1)

where the noise source e(t) is a zero-mean, white Gaussian noise with variance σ², G0(q⁻¹) is a suitable linear transfer function, and q⁻¹ is the unit delay operator.

Given a model class defined as

ŷ(t|θ) = G(q⁻¹, θ) u(t)    (2)

and parameterized by the vector θ, the corresponding prediction error is given by

ε(t, θ) = y(t) − ŷ(t|θ) = ΔG(q⁻¹, θ) u(t) + e(t)    (3)

where ΔG(q⁻¹, θ) = G0(q⁻¹) − G(q⁻¹, θ) is the estimation error on the system's transfer function.

The prediction error minimization (PEM) approach to system identification is based on the minimization of a suitable norm of ε(t, θ). A typical choice is the quadratic norm

J(θ) = (1/N) Σ_{t=1}^{N} ε_F(t, θ)²    (4)

where N is the number of data and ε_F(t, θ) = L(q⁻¹) ε(t, θ) denotes a filtered version of the prediction error sequence through a stable linear filter L(q⁻¹). Clearly

ε_F(t, θ) = y_F(t) − G(q⁻¹, θ) u_F(t)    (5)

where y_F(t) = L(q⁻¹) y(t) and u_F(t) = L(q⁻¹) u(t). As is well known [10], filtering the prediction error is equivalent to performing the identification on a prefiltered I/O data set.

In the frequency domain, the filtered error can be rewritten as

E_F(ω, θ) = L(e^{jω}) [ΔG(e^{jω}, θ) U(ω) + E(ω)]    (6)

where U(ω) denotes the Fourier transform of the input signal. In general, if the considered model class is such that ΔG(q⁻¹, θ) is identically zero for some θ, i.e., if the true system belongs to the model class, then the PEM estimate will be consistent. Otherwise, the identified model will be biased and the prefilter L(q⁻¹) may be chosen to affect the bias distribution.
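Since both the model and the cost are linear in the parameters here, the equivalence between filtering the prediction error and identifying from prefiltered I/O data can be checked numerically. A minimal sketch in Python (the FIR coefficients, the prefilter, and the signal lengths are all hypothetical choices, not taken from the note):

```python
import numpy as np
from scipy.signal import lfilter, butter

def shift(x, k):
    """Causal delay of x by k samples, padding with zeros."""
    z = np.zeros_like(x)
    z[k:] = x[:len(x) - k] if k > 0 else x
    return z

rng = np.random.default_rng(0)
N = 4000
u = rng.standard_normal(N)
g = np.array([1.0, 0.5, -0.25])          # true FIR coefficients (hypothetical)
y = lfilter(g, [1.0], u) + 0.1 * rng.standard_normal(N)

b, a = butter(2, 0.3)                     # prefilter L(q^-1)

Phi = np.column_stack([shift(u, k) for k in range(3)])

# (a) identify from prefiltered I/O data: u_F = L u, y_F = L y
uF, yF = lfilter(b, a, u), lfilter(b, a, y)
PhiF_data = np.column_stack([shift(uF, k) for k in range(3)])
th_data = np.linalg.lstsq(PhiF_data, yF, rcond=None)[0]

# (b) minimize the filtered error: L(y - Phi th) = L y - (L Phi) th
PhiF_err = np.column_stack([lfilter(b, a, c) for c in Phi.T])
th_err = np.linalg.lstsq(PhiF_err, yF, rcond=None)[0]

print(np.allclose(th_data, th_err))       # prints True
```

The two estimates coincide exactly because delaying the input and filtering commute when the causal filter is applied with zero initial conditions.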

0018-9286/$20.00 © 2005 IEEE


    III. PREFILTERING IN NONLINEAR IDENTIFICATION

    A. Volterra Series Representation of Nonlinear Systems

The Volterra series [13], [14] generalizes the convolution expression used for linear systems to represent the I/O relationship of a discrete-time deterministic causal single-input–single-output (SISO) nonlinear system. The output of the system is given by

y(t) = Σ_{n=1}^{∞} Σ_{τ1=0}^{∞} ⋯ Σ_{τn=0}^{∞} h_n(τ1, …, τn) u(t−τ1) ⋯ u(t−τn)    (7)

where h_n(τ1, …, τn) is the nth order Volterra kernel of the nonlinear system. An equivalent representation in the frequency domain [13] is given by the generalized frequency response function (GFRF) of order n, defined as the Fourier transform of the nth order kernel

H_n(ω1, …, ωn) = Σ_{τ1=0}^{∞} ⋯ Σ_{τn=0}^{∞} h_n(τ1, …, τn) e^{−j(ω1τ1 + ⋯ + ωnτn)}    (8)

with n = 1, 2, …. The output y_n(t) of the nth order kernel can also be expressed as a function of the corresponding GFRF [15]

y_n(t) = (1/(2π)^n) ∫⋯∫ H_n(ω1, …, ωn) U(ω1) ⋯ U(ωn) e^{j(ω1+⋯+ωn)t} dω1 ⋯ dωn.    (9)
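For a finite-memory series truncated at order two, the second-order kernel contribution in (7) is just a quadratic form in the lagged inputs. A small illustrative sketch (the kernel values and memory length are hypothetical):

```python
import numpy as np

def volterra_output(u, h1, h2):
    """Output of a Volterra series truncated at order 2, finite memory M."""
    N, M = len(u), len(h1)
    y = np.zeros(N)
    for t in range(M - 1, N):
        w = u[t - M + 1:t + 1][::-1]      # w[k] = u(t - k), k = 0..M-1
        y[t] = h1 @ w + w @ h2 @ w        # sums over tau and (tau1, tau2)
    return y

# hypothetical kernels (memory 2); a rank-one h2 makes this a Wiener model:
# y(t) = v(t) + 0.5 v(t)^2 with v = h1 * u
h1 = np.array([1.0, 0.4])
h2 = 0.5 * np.outer(h1, h1)

u = np.array([0.0, 1.0, 0.0, 0.0])        # impulse through the series
y = volterra_output(u, h1, h2)            # y[1] = 1.5, y[2] = 0.48
```

Choosing h2 as an outer product of h1 is only a convenience for checking the result against the equivalent linear-filter-plus-static-nonlinearity form.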

    B. Formulation of the Nonlinear Identification Problem

We will assume that the identification data are generated by a true system of the form

y(t) = Σ_{n=1}^{∞} y_{0,n}(t) + e(t)    (10)

where y_{0,n}(t) is the output of the nth order Volterra kernel of the true system, as in (9), and e(t) is a zero-mean white noise, and we will consider a generic model class represented, for analysis purposes only, in Volterra series form. The general expression based on Volterra kernels for a NOE model parameterized by θ is

ŷ(t|θ) = Σ_{n=1}^{∞} ŷ_n(t|θ)    (11)

where ŷ_n(t|θ) is given by (9) with GFRFs Ĥ_n(ω1, …, ωn; θ); therefore, the prediction error ε(t, θ) = y(t) − ŷ(t|θ) for a NOE model can be expressed as a function of the differences between the actual and estimated GFRFs

ε(t, θ) = Σ_{n=1}^{∞} (1/(2π)^n) ∫⋯∫ ΔH_n(ω1, …, ωn; θ) U(ω1) ⋯ U(ωn) e^{j(ω1+⋯+ωn)t} dω1 ⋯ dωn + e(t)

where ΔH_n(ω1, …, ωn; θ) = H_n(ω1, …, ωn) − Ĥ_n(ω1, …, ωn; θ).

The following lemma (see [16]) describes the effect of linear filtering applied to a nonlinear system and will be useful later in this note.

Lemma 1: Consider the series connection of a linear filter with transfer function L(q⁻¹) and a nonlinear system described as a Volterra series with GFRFs H_n(ω1, …, ωn), n = 1, 2, …. Then, the GFRFs of the overall system are given by H_n(ω1, …, ωn) L(e^{jω1}) ⋯ L(e^{jωn}) if the linear filter is connected at the input of the nonlinear system, and by L(e^{j(ω1+⋯+ωn)}) H_n(ω1, …, ωn) if it is connected at the output.

Note that the first-order GFRF, which describes the linear part of the overall system, is equally affected by both kinds of filtering. In fact, it is given in both cases by L(e^{jω1}) H_1(ω1).

1) Error Filtering: Applying Lemma 1, the filtered prediction error can be written as

ε_F(t, θ) = Σ_{n=1}^{∞} (1/(2π)^n) ∫⋯∫ L(e^{j(ω1+⋯+ωn)}) ΔH_n(ω1, …, ωn; θ) U(ω1) ⋯ U(ωn) e^{j(ω1+⋯+ωn)t} dω1 ⋯ dωn + L(q⁻¹) e(t).

Clearly, if the considered model class is such that ΔH_n(ω1, …, ωn; θ) is identically zero for all n and some θ, then the estimate will be consistent, i.e., the minimization of the prediction error norm and of the filtered prediction error norm (4) lead to the same model, at least asymptotically, as in the linear case. If, on the other hand, it cannot be assumed that the true system belongs to the model class, then the identified model will be biased and the prefilter L(q⁻¹) may be chosen to affect the frequency distribution of the estimation error. Note, however, that the frequency weightings L(e^{j(ω1+⋯+ωn)}) acting on the kernel estimation errors ΔH_n of increasing order n have significantly different shapes. Further insight into the shape of the weighting factors can be obtained by inspecting their behavior along the main diagonal in the frequency domain, i.e., setting ω1 = ⋯ = ωn = ω. As an example, Fig. 1(a) illustrates the Bode plot of the weighting factor L(e^{jnω}) for a second-order digital low-pass Butterworth filter with bandwidth [0, 0.1], whereas Fig. 1(b) considers a second-order digital band-pass Butterworth filter with bandwidth [0.1, 1]. While in the linear case the prefiltering tends to cut out all the frequencies external to the bandwidth over which the fit is required, in the nonlinear case the bandwidth of the weighting factor is a function of the kernel order, and the fit is also important in different bands, due to the nonlinear effects that scatter the spectrum.
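The narrowing of the diagonal weighting with the kernel order can be reproduced numerically: under error filtering, the weight acting on the order-n kernel error along the main diagonal is the filter response evaluated at n times the frequency. A sketch with SciPy, using the low-pass filter of the example above (the frequency grid is an arbitrary choice kept below the Nyquist rate for all orders considered):

```python
import numpy as np
from scipy.signal import butter, freqz

# second-order digital low-pass Butterworth, normalized bandwidth [0, 0.1]
b, a = butter(2, 0.1)

w = np.linspace(1e-3, np.pi / 3, 500)    # keep n*w <= pi for n <= 3

def diag_weight(n):
    """|L(e^{j n w})|: weight on the order-n kernel error, main diagonal."""
    _, L = freqz(b, a, worN=n * w)
    return np.abs(L)

W1, W2, W3 = diag_weight(1), diag_weight(2), diag_weight(3)
# the effective passband of the weight shrinks as the kernel order grows
```

Plotting W1, W2, W3 against w reproduces the qualitative behavior of Fig. 1(a): the order-2 weight rolls off at half the filter bandwidth, the order-3 weight at one third.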

2) Data Prefiltering: Denoting by ε^F(t, θ) the prediction error sequence obtained after filtering the identification data, we can write, in view of Lemma 1

ε^F(t, θ) = Σ_{n=1}^{∞} (1/(2π)^n) ∫⋯∫ [L(e^{j(ω1+⋯+ωn)}) H_n(ω1, …, ωn) − L(e^{jω1}) ⋯ L(e^{jωn}) Ĥ_n(ω1, …, ωn; θ)] U(ω1) ⋯ U(ωn) e^{j(ω1+⋯+ωn)t} dω1 ⋯ dωn + L(q⁻¹) e(t).


Fig. 1. Bode diagram (magnitude) of the weighting factor for a low-pass filter and a band-pass filter.

Even assuming that the considered model structure is sufficiently complex so as to admit a value of θ for which ΔH_n(ω1, …, ωn; θ) is identically zero for all n, the input-dependent part of the prediction error will be (ideally) eliminated only for a choice of the model GFRFs such that

Ĥ_n(ω1, …, ωn; θ) = [L(e^{j(ω1+⋯+ωn)}) / (L(e^{jω1}) ⋯ L(e^{jωn}))] H_n(ω1, …, ωn).    (12)

Apart from the case n = 1, Ĥ_n will, therefore, correspond to a weighted version of H_n, where the weighting factor depends both on the filter L(q⁻¹) and on the order n of the kernel. In particular, the optimal model kernels approximate the true ones in the frequency regions where the weighting factor in (12) is close to 1, while the approximation degrades where this ratio significantly differs from 1. Moreover,

Fig. 2. Bode diagram (magnitude) of the weighting factor for a low-pass filter and a band-pass filter.

TABLE I. ESTIMATED PARAMETERS FOR THE MODEL IN EXAMPLE IV-A

for rational filters L(q⁻¹) with positive relative degree (such as conventional low-pass or band-pass filters) the frequency weighting factor (12) will be of high-pass type, taking abnormally high values outside the filter bandwidth for higher order kernels. The identification algorithm will thus try to fit a severely distorted kernel: though this distortion takes place outside the filter bandwidth, in some cases the shape of the weighting factor also affects the model accuracy inside the filter bandwidth. In fact, the minimization procedure in the identification algorithm will be enticed to consider as numerically significant only the portion of the identification data with frequency content outside the filter bandwidth, thus producing an unwanted bias


also inside the filter bandwidth. Finally, note that even in the ideal case when the system belongs to the model family, the presence of the weighting factor in (12) actually configures an under-parameterized identification problem. As an example, Fig. 2(a) illustrates the weighting factor on the main diagonal for a second-order digital low-pass Butterworth filter with bandwidth [0, 0.1], while Fig. 2(b) considers a second-order digital band-pass Butterworth filter with bandwidth [0.1, 1]. In the optimal model, the accuracy of the higher order kernels would be reasonable only within a fraction of the filter bandwidth, which gets smaller as the kernel order increases. Also, a severe distortion would result outside the bandwidth for both filters.
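The high-pass character of the weighting factor (12) along the main diagonal can likewise be checked numerically. The sketch below evaluates the ratio for the low-pass filter of the example (the frequency grid is an arbitrary choice kept below the Nyquist rate for all kernel orders considered):

```python
import numpy as np
from scipy.signal import butter, freqz

b, a = butter(2, 0.1)                    # low-pass, normalized bandwidth [0, 0.1]
w = np.linspace(1e-3, np.pi / 4, 500)    # keep n*w <= pi for n <= 3

def L(wgrid):
    _, h = freqz(b, a, worN=wgrid)
    return h

# weighting factor (12) restricted to the main diagonal, orders 2 and 3:
#   L(e^{j n w}) / L(e^{j w})^n
ratio2 = np.abs(L(2 * w) / L(w) ** 2)
ratio3 = np.abs(L(3 * w) / L(w) ** 3)
# close to 1 near DC, growing (high-pass behavior) outside the filter bandwidth
```

The amplification grows with the kernel order because the denominator rolls off n times faster than the numerator, matching the behavior described for Fig. 2(a).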

    C. Error Filtering for Nonlinear Polynomial NARMAX Models

Consider the general black-box nonlinear model structure [1]

ŷ(t|θ) = f(x(t), θ)    (13)

where f is a nonlinear function parameterized in θ and x(t) is a vector of regressors, defined as past input u(t−k), output y(t−k), and estimated output ŷ(t−k|θ) values. Function f is often assumed to depend linearly on θ

ŷ(t|θ) = φ(t)ᵀ θ    (14)

since straightforward parameter estimation methods of the least squares family can be employed for model identification [17]. In particular, polynomial input/output recursive black-box models belong to this class, where the elements of φ(t) are linear and nonlinear monomials of the arguments u(t−k), y(t−k), and ŷ(t−k|θ).

For such models, the filtered prediction error can be written as

ε_F(t, θ) = L(q⁻¹)[y(t) − φ(t)ᵀθ] = y_F(t) − φ_F(t)ᵀθ    (15)

where φ_F(t) = L(q⁻¹)φ(t) denotes the vector of filtered monomials. Filtered versions of monomials including u(t−k) and y(t−k) terms only can be computed at the onset of the minimization procedure, while monomials also containing terms of the type ŷ(t−k|θ) must be filtered at each algorithm iteration, after ŷ(t|θ) has been recalculated given the current parameterization θ. Therefore, for polynomial NFIR, NARX, and NOE models (see the classification of [1]), which do not have regressors including prediction error terms, error filtering simply amounts to a prefiltering of the regressors constructed from the original I/O data.
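For a polynomial NARX structure, the procedure just described reduces to a single pass of filtering over the output and the monomial regressors, followed by an ordinary least squares solve. A minimal sketch (the model structure, coefficients, and noise level are illustrative, not those of the examples below):

```python
import numpy as np
from scipy.signal import lfilter, butter

def narx_regressors(y, u):
    """Monomial regressors of a small hypothetical polynomial NARX model."""
    y1 = np.concatenate(([0.0], y[:-1]))    # y(t-1)
    u1 = np.concatenate(([0.0], u[:-1]))    # u(t-1)
    return np.column_stack([y1, u1, u1**2, y1 * u1])

def efls(y, u, b, a):
    """Error-filtered least squares: for NARX models, filtering the
    prediction error amounts to filtering y and every monomial regressor
    once, before the least squares solve."""
    Phi = narx_regressors(y, u)
    PhiF = np.column_stack([lfilter(b, a, c) for c in Phi.T])
    yF = lfilter(b, a, y)
    return np.linalg.lstsq(PhiF, yF, rcond=None)[0]

# simulate a system of the same structure (coefficients are illustrative)
rng = np.random.default_rng(1)
N = 5000
u = rng.standard_normal(N)
theta_true = np.array([0.5, 1.0, 0.3, -0.2])
y = np.zeros(N)
for t in range(1, N):
    y[t] = (0.5 * y[t-1] + 1.0 * u[t-1] + 0.3 * u[t-1]**2
            - 0.2 * y[t-1] * u[t-1] + 0.01 * rng.standard_normal())

b, a = butter(2, 0.5)                       # low-pass, bandwidth [0, 0.5]
theta_hat = efls(y, u, b, a)                # close to theta_true
```

With the system in the model class and a low noise level, the filtered estimate stays close to the true parameters, consistent with the discussion above.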

    IV. SIMULATION EXAMPLES

    A. Example 1

    Consider the task of identifying an NFIR model

    (16)

from data generated by a system of the same structure (see Table I for the actual values of the parameters). The identification is performed over a data set of 10 000 samples, generated with both input and noise signals selected as white Gaussian noises, with unit input variance and a noise variance chosen in order to obtain a prescribed signal-to-noise ratio. The parameters of the model have been estimated using basic least squares (LS), as well as least squares with data prefiltering (DFLS) and with error filtering (EFLS). In the last two cases, a second-order low-pass digital Butterworth filter with bandwidth [0, 0.5] has been considered.

Fig. 3. Magnitude of the first and second (main diagonal only) GFRFs of the identified NFIR models in Example IV-B: system (continuous), LS (dotted), DFLS (dashed), EFLS (dash-dot).

As shown in Table I, while the estimates computed using LS and EFLS are consistent, DFLS leads to abnormally high values for the parameters associated with the quadratic regressors. Actually, the DFLS algorithm correctly estimates only the first order GFRF, whereas the second-order GFRF is badly estimated over the whole frequency range. Here and in the following, the GFRFs are computed according to the probing method [15].

    B. Example 2

    Consider the system

    (17)


TABLE II. CLUSTER COEFFICIENTS FOR THE MODELS IN EXAMPLE IV-B

and the NFIR model structure

(18)

so that the true system does not belong to the model class. Model identification has been performed with the same algorithms and conditions of the previous example, using a data set of 1000 samples.

As can be seen from Fig. 3, both methods with filtering provide improved accuracy in the filter bandwidth as far as the first order GFRF is concerned (at the cost of reduced performance at higher frequencies) with respect to standard LS. The EFLS and DFLS methods, however, behave in significantly different ways with respect to the second order GFRF: EFLS ensures a significant reduction of the bias within the desired bandwidth with respect to LS, while the estimate of the second-order GFRF through DFLS is affected by significant distortion, mainly at high frequency [in accordance with (12)].

An interesting interpretation of these results can be given in terms of cluster coefficients. Terms with the same type of nonlinearity form a cluster [18]. For example, the cluster Ω_{y^p u^m} groups all regressors represented as products of a pth order factor in y and an mth order factor in u. The cluster coefficient is the sum of the coefficients of all the terms of a cluster and is a measure of the real significance of the corresponding nonlinearity in the model. The cluster coefficients of the system and identified models are reported in Table II: it turns out that EFLS is the most successful method in balancing the bias on the two cluster coefficients.
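Computing cluster coefficients is a simple bookkeeping exercise: group the model terms by their degrees in y and u, ignoring the lags, and sum the coefficients within each group. A sketch (the example model and its coefficients are hypothetical):

```python
def cluster_coefficients(terms):
    """Sum the coefficients of all model terms sharing the same nonlinearity
    type. A term is (coefficient, p, m): a product of a p-th order factor in
    y and an m-th order factor in u, whatever the lags."""
    clusters = {}
    for coeff, p, m in terms:
        clusters[(p, m)] = clusters.get((p, m), 0.0) + coeff
    return clusters

# hypothetical model: y(t) = 0.5 y(t-1) + 0.8 u(t-1) + 0.2 u(t-2)
#                            + 0.3 u(t-1)^2 - 0.1 u(t-1) u(t-2)
terms = [(0.5, 1, 0), (0.8, 0, 1), (0.2, 0, 1), (0.3, 0, 2), (-0.1, 0, 2)]
cc = cluster_coefficients(terms)
# cc[(0, 1)] = 1.0 and cc[(0, 2)] = 0.2: linear-in-u and quadratic-in-u clusters
```

A near-zero cluster coefficient built from several sizable term coefficients would suggest that the corresponding nonlinearity is spurious, which is the diagnostic use made of Table II.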

C. Example 3

Consider the NARX system

(19)

and the (under-parameterized) NARX model structure

(20)

The identification is performed over a data set of 1000 samples, with the same algorithms and conditions of the previous examples. Fig. 4 shows once again that (low-pass) error filtering can be used to reduce the bias in both the first and second order GFRF estimates in the frequency band of interest. Concerning the data prefiltering case, notice that the estimate of the second order GFRF does not exhibit a high frequency amplification as in the previous cases, despite the identical shape of the weighting function. This is due to a structural constraint of the model, which forces the second-order GFRF to vanish at high frequency for all parameterizations. The undesirable effect of the weighting function (12) is still visible in terms of the increased low frequency bias in the GFRF estimates obtained via DFLS with respect to LS.


Fig. 4. Magnitude of the first and second (main diagonal only) GFRFs of the identified NARX models in Example IV-C: system (continuous), LS (dotted), DFLS (dashed), EFLS (dash-dot).

    V. CONCLUDING REMARKS

The role of prefiltering in nonlinear system identification has been investigated within the framework of the Volterra series representation of nonlinear systems. The classical results on the frequency domain interpretation of prediction error filtering available in the linear system identification literature have been extended to the nonlinear case, and the pitfalls associated with the naive application of data prefiltering to nonlinear problems have been analyzed and illustrated via simulation examples of NFIR and NARX model identification. Future developments include the investigation of nonlinear data and error prefilters.

    REFERENCES

[1] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P. Glorennec, H. Hjalmarsson, and A. Juditsky, "Nonlinear black-box modeling in system identification: A unified overview," Automatica, vol. 31, pp. 1691–1724, 1995.

[2] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjöberg, and Q. Zhang, "Nonlinear black-box models in system identification: Mathematical foundations," Automatica, vol. 31, pp. 1725–1750, 1995.

[3] S. Chen and S. A. Billings, "Orthogonal least squares methods and their application to nonlinear system identification," Int. J. Control, vol. 50, pp. 1873–1896, 1989.

[4] ——, "Representations of nonlinear systems: The NARMAX model," Int. J. Control, vol. 49, pp. 1013–1032, 1989.

[5] G.-O. Glentis, P. Koukoulas, and N. Kalouptsidis, "Efficient algorithms for Volterra system identification," IEEE Trans. Signal Process., vol. 47, no. 11, pp. 3042–3057, Nov. 1999.

[6] L. Piroddi and W. Spinelli, "An identification algorithm for polynomial NARX models based on simulation error minimization," Int. J. Control, vol. 76, no. 17, pp. 283–295, 2001.

[7] F. Previdi and M. Lovera, "Identification of a class of nonlinear parametrically varying models," Int. J. Adapt. Control Signal Process., vol. 17, pp. 33–50, 2003.

[8] J. Schoukens, J. G. Nemeth, P. Crama, Y. Rolain, and R. Pintelon, "Fast approximate identification of nonlinear systems," Automatica, vol. 39, no. 7, pp. 1267–1274, 2003.

[9] R. Murray-Smith and T. A. Johansen, Multiple Model Approaches to Modeling and Control. London, U.K.: Taylor and Francis, 1997.

[10] L. Ljung, System Identification: Theory for the User. Upper Saddle River, NJ: Prentice-Hall, 1999.

[11] J. P. Jones and S. Billings, "Mean levels in nonlinear analysis and identification," Int. J. Control, vol. 58, no. 5, pp. 1033–1052, 1993.

[12] I. Leontaritis and S. Billings, "Input–output parametric models for nonlinear systems," Int. J. Control, vol. 41, pp. 303–344, 1985.

[13] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems. New York: Wiley, 1980.

[14] W. Rugh, Nonlinear System Theory: The Volterra/Wiener Approach. Baltimore, MD: Johns Hopkins Univ. Press, 1981.

[15] S. Billings and K. Tsang, "Spectral analysis for nonlinear systems, part I: Parametric nonlinear spectral analysis," Mech. Syst. Signal Process., vol. 3, no. 4, pp. 319–339, 1989.

[16] S. Boyd, L. Chua, and C. Desoer, "Analytical foundations of Volterra series," IMA J. Math. Control Inform., vol. 1, pp. 243–282, 1984.

[17] S. Billings, S. Chen, and M. Korenberg, "Identification of MIMO nonlinear systems using a forward-regression orthogonal estimator," Int. J. Control, vol. 49, pp. 2157–2189, 1989.

[18] L. Aguirre and S. Billings, "Improved structure selection for nonlinear models based on term clustering," Int. J. Control, vol. 62, pp. 569–587, 1995.

    Kernel Based Partially Linear Models and

    Nonlinear Identification

    Marcelo Espinoza, Johan A. K. Suykens, and Bart De Moor

Abstract—In this note, we propose partially linear models with least squares support vector machines (LS-SVMs) for nonlinear ARX models. We illustrate how full black-box models can be improved when prior information about model structure is available. A real-life example, based on the Silverbox benchmark data, shows significant improvements in the generalization ability of the structured model with respect to the full black-box model, reflected also by a reduction in the effective number of parameters.

Index Terms—Kernels, least squares support vector machine (LS-SVM), nonlinear system identification, partially linear models.

    I. INTRODUCTION

The objective of nonlinear system identification is to estimate a relation between inputs u(t) and outputs y(t) generated by an unknown dynamical system. Let x(t) = [y(t−1), …, y(t−n_a), u(t−1), …, u(t−n_b)] be the regression vector corresponding to the output y(t) in a NARX model. The nonlinear system identification task is to estimate a function f such that y(t) = f(x(t)) + e(t), where e(t) is a disturbance term. Under the nonlinear black-box setting, the function f can be estimated by using different techniques (artificial neural networks [6], (least-squares) support vector machines ((LS)-SVMs) [14], [15], wavelets [18], basis expansions [12], splines [16], polynomials [7], etc.), involving a training and a validation stage [14]. The one-step-ahead prediction is simply computed by evaluating the estimated f on the regression vector; simulation k steps ahead can be obtained by iteratively applying the prediction equation, replacing future outputs by their predictions [8], [12]. In this note, we focus on the case where the nonlinearity in the model does not apply over all the inputs, but rather over a subset of them, leading to the identification of a partially linear model [5], [10], [13].
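The iterated simulation mentioned above can be sketched as follows; the estimated model f_hat, its orders, and the input signal are hypothetical placeholders:

```python
import numpy as np

def simulate_narx(f, u, n_y, n_u, y0):
    """Free-run (iterated) simulation of an estimated NARX model
    y(t) = f(x(t)): future outputs in the regression vector are replaced
    by their own predictions. y0 must supply max(n_y, n_u) initial samples."""
    y = list(y0)
    for t in range(len(y0), len(u)):
        x = np.concatenate([y[t - n_y:t][::-1],    # y(t-1), ..., y(t-n_y)
                            u[t - n_u:t][::-1]])   # u(t-1), ..., u(t-n_u)
        y.append(f(x))
    return np.array(y)

# hypothetical estimated model: y(t) = 0.6 y(t-1) + 0.4 u(t-1)
f_hat = lambda x: 0.6 * x[0] + 0.4 * x[1]
u = np.ones(6)
y_sim = simulate_narx(f_hat, u, n_y=1, n_u=1, y0=[0.0])
# y_sim: 0, 0.4, 0.64, 0.784, ... (step response of the estimated model)
```

Replacing the slice of past predictions with measured outputs turns the same loop into one-step-ahead prediction, which is the distinction drawn in the text.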

We propose the partially linear LS-SVM (PL-LSSVM) [2] to improve the performance of an existing black-box model when there is evidence that some of the regressors enter the model linearly. Consider, e.g., a system for which the true model can be described as the sum of a linear combination of some regressors and a nonlinear function of the remaining ones; the regression vector then collects both groups of regressors. If a linear parametric model is estimated, it may suffer from a misspecification error, because the nonlinear term may not be known a priori to the user. However, by moving to a full nonlinear black-box technique, the linear parts are not identified as such, but rather as part of the general nonlinear model. In this case, a nonlinear black-box model will show a better performance than a full linear model, but it has a complexity which may be

Manuscript received May 28, 2004; revised March 15, 2005 and May 30, 2005. Recommended by Guest Editor L. Ljung. This work was supported by grants and projects for the Research Council K.U. Leuven (GOA-Mefisto 666, GOA-Ambiorics, several Ph.D./Postdoc and Fellow grants), the Flemish Government (FWO: Ph.D./Postdoc grants, Projects G.0240.99, G.0407.02, G.0197.02, G.0211.05, G.0141.03, G.0491.03, G.0120.03, G.0452.04, G.0499.04, ICCoS, ANMMM; AWI; IWT: Ph.D. grants, GBOU (McKnow, Soft4s)), the Belgian Federal Government (Belgian Federal Science Policy Office: IUAP V-22; PODO-II (CP/01/40)), the EU (FP5-Quprodis; ERNSI; Eureka 2063-Impact; Eureka 2419-FLiTE), and Contracts Research/Agreements (ISMC/IPCOS, Data4s, TML, Elia, LMS, IPCOS, Mastercard).

    The authors are with the Department of Electrical Engineering ESAT-SCD,Katholieke Universiteit Leuven, B-3001 Leuven, Belgium (e-mail: [email protected]; [email protected]; [email protected]).

    Digital Object Identifier 10.1109/TAC.2005.856656
