
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 50, NO. 10, OCTOBER 2005 1597

Technical Notes and Correspondence

    On the Role of Prefiltering in Nonlinear

    System Identification

    William Spinelli, Luigi Piroddi, and Marco Lovera

Abstract—Data prefiltering is often used in linear system identification to increase model accuracy in a specified frequency band, as prefiltering is equivalent to a frequency weighting on the prediction error function. However, this interpretation applies only to a strictly linear setting of the identification problem. In this note, the role of data and error prefiltering in nonlinear system identification is analyzed and a frequency domain interpretation is provided, based on the Volterra series representation of nonlinear systems. Simulation results illustrate the conclusions of the analysis.

Index Terms—Frequency domain analysis, NARX modeling, nonlinear identification, prefiltering, sampling time, Volterra series.

    I. INTRODUCTION

Increasing attention is being devoted in the system identification literature to the problem of identifying suitable dynamic models for nonlinear systems: many different model classes and algorithms have been proposed in recent years (see, e.g., [1] and [2] for a comprehensive review of nonlinear black-box approaches and [3]–[9] for some recent contributions), some of which are devised as extensions to the nonlinear case of well-known linear models and procedures. However, though many aspects of linear identification can be conveniently extended, there are some that have become common practice also in nonlinear identification, but that should be carefully reexamined to be correctly applied in that context. One of these is data prefiltering, which is systematically employed in linear identification, mostly for antialiasing reasons and to reduce the effect of high-frequency disturbances in order to increase the signal-to-noise ratio. It is well known from the linear identification literature (see, e.g., [10]) that prefiltering may be used to improve model accuracy in a specified frequency band, e.g., as a result of control specifications which narrow the frequency region over which accurate models are actually needed. Typically, input and output data are filtered with suitable low-pass or band-pass filters and the filtered data are used for identification: in the linear case this is equivalent to a filtering of the prediction error, which determines a weighting on the cost function in prediction error estimation, and is thus one of the most important design variables for selectively emphasizing the fit of the linear model.

Little attention has been dedicated to the analysis of prefiltering in nonlinear system identification: in [11] it is pointed out that data prefiltering will introduce bias into the estimates even in the noise-free case, and a detailed analysis of mean-level-induced bias is performed.

In this note, a convenient setting for the general analysis of prefiltering-induced bias is proposed, which extends to the nonlinear case classical results on the frequency domain interpretation of prediction error filtering available in the linear system identification literature. In

Manuscript received May 29, 2004; revised February 17, 2005 and May 23, 2005. Recommended by Associate Editor L. Ljung. This work was supported by the Italian National MIUR project "Innovative Techniques for Identification and Adaptive Control of Industrial Systems."

The authors are with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133 Milano, Italy (e-mail: [email protected]; [email protected]; [email protected]).

    Digital Object Identifier 10.1109/TAC.2005.856655

addition, the discussion points out the pitfalls associated with the application to nonlinear problems of the practice of data prefiltering, while error filtering can be effective, though ad hoc parameter estimation algorithms have to be devised. Finally, it is shown how error prefiltering can be included in the identification cycle for specific nonlinear model classes of the NARMAX [1], [12] family.

The analysis employs the Volterra series representation of nonlinear systems [13], [14], for which frequency analysis tools have been developed, such as the generalized frequency response functions [15].

    II. PREFILTERING IN LINEAR IDENTIFICATION

Consider a linear identification problem where the input–output (I/O) data are generated by the true system

y(t) = G0(q⁻¹) u(t) + e(t)    (1)

where the noise source e(t) is a zero-mean, white Gaussian noise with variance σ², G0(q⁻¹) is a suitable linear transfer function, and q⁻¹ is the unit delay operator.

Given a model class defined as

ŷ(t|θ) = G(q⁻¹, θ) u(t)    (2)

and parameterized by the vector θ, the corresponding prediction error is given by

ε(t, θ) = y(t) − ŷ(t|θ) = ΔG(q⁻¹, θ) u(t) + e(t)    (3)

where ΔG(q⁻¹, θ) = G0(q⁻¹) − G(q⁻¹, θ) is the estimation error on the system's transfer function.

The prediction error minimization (PEM) approach to system identification is based on the minimization of a suitable norm of ε(t, θ). A typical choice is the quadratic norm

J(θ) = (1/N) Σ_{t=1}^{N} ε_F(t, θ)²    (4)

where N is the number of data and ε_F(t, θ) = L(q⁻¹) ε(t, θ) denotes a filtered version of the prediction error sequence through a stable linear filter L(q⁻¹). Clearly

ε_F(t, θ) = y_F(t) − G(q⁻¹, θ) u_F(t)    (5)

where y_F(t) = L(q⁻¹) y(t) and u_F(t) = L(q⁻¹) u(t). As is well known [10], filtering the prediction error is equivalent to performing the identification on a prefiltered I/O data set.

In the frequency domain, the filtered error can be rewritten as

E_F(ω, θ) = L(e^{jω}) [ΔG(e^{jω}, θ) U(ω) + E(ω)]    (6)

where U(ω) denotes the Fourier transform of the input signal. In general, if the considered model class is such that ΔG(q⁻¹, θ) is identically zero for some θ, i.e., if the true system belongs to the model class, then the PEM estimate will be consistent. Otherwise, the identified model will be biased and the prefilter L(q⁻¹) may be chosen to affect the bias distribution.
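Since both the model and the cost are linear in the parameters here, the equivalence between filtering the prediction error and identifying from prefiltered I/O data can be checked numerically. A minimal sketch in Python (the FIR coefficients, the prefilter, and the signal lengths are all hypothetical choices, not taken from the note):

```python
import numpy as np
from scipy.signal import lfilter, butter

def shift(x, k):
    """Causal delay of x by k samples, padding with zeros."""
    z = np.zeros_like(x)
    z[k:] = x[:len(x) - k] if k > 0 else x
    return z

rng = np.random.default_rng(0)
N = 4000
u = rng.standard_normal(N)
g = np.array([1.0, 0.5, -0.25])          # true FIR coefficients (hypothetical)
y = lfilter(g, [1.0], u) + 0.1 * rng.standard_normal(N)

b, a = butter(2, 0.3)                     # prefilter L(q^-1)

Phi = np.column_stack([shift(u, k) for k in range(3)])

# (a) identify from prefiltered I/O data: u_F = L u, y_F = L y
uF, yF = lfilter(b, a, u), lfilter(b, a, y)
PhiF_data = np.column_stack([shift(uF, k) for k in range(3)])
th_data = np.linalg.lstsq(PhiF_data, yF, rcond=None)[0]

# (b) minimize the filtered error: L(y - Phi th) = L y - (L Phi) th
PhiF_err = np.column_stack([lfilter(b, a, c) for c in Phi.T])
th_err = np.linalg.lstsq(PhiF_err, yF, rcond=None)[0]

print(np.allclose(th_data, th_err))       # prints True
```

The two estimates coincide exactly because delaying the input and filtering commute when the causal filter is applied with zero initial conditions.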

0018-9286/$20.00 © 2005 IEEE


    III. PREFILTERING IN NONLINEAR IDENTIFICATION

    A. Volterra Series Representation of Nonlinear Systems

The Volterra series [13], [14] generalizes the convolution expression used for linear systems to represent the I/O relationship of a discrete-time deterministic causal single-input–single-output (SISO) nonlinear system. The output of the system is given by

y(t) = Σ_{n=1}^{∞} Σ_{τ1=0}^{∞} ⋯ Σ_{τn=0}^{∞} h_n(τ1, …, τn) u(t−τ1) ⋯ u(t−τn)    (7)

where h_n(τ1, …, τn) is the nth order Volterra kernel of the nonlinear system. An equivalent representation in the frequency domain [13] is given by the generalized frequency response function (GFRF) of order n, defined as the Fourier transform of the nth order kernel

H_n(ω1, …, ωn) = Σ_{τ1=0}^{∞} ⋯ Σ_{τn=0}^{∞} h_n(τ1, …, τn) e^{−j(ω1τ1 + ⋯ + ωnτn)}    (8)

with n = 1, 2, …. The output y_n(t) of the nth order kernel can also be expressed as a function of the corresponding GFRF [15]

y_n(t) = (1/(2π)^n) ∫⋯∫ H_n(ω1, …, ωn) U(ω1) ⋯ U(ωn) e^{j(ω1+⋯+ωn)t} dω1 ⋯ dωn.    (9)
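For a finite-memory series truncated at order two, the second-order kernel contribution in (7) is just a quadratic form in the lagged inputs. A small illustrative sketch (the kernel values and memory length are hypothetical):

```python
import numpy as np

def volterra_output(u, h1, h2):
    """Output of a Volterra series truncated at order 2, finite memory M."""
    N, M = len(u), len(h1)
    y = np.zeros(N)
    for t in range(M - 1, N):
        w = u[t - M + 1:t + 1][::-1]      # w[k] = u(t - k), k = 0..M-1
        y[t] = h1 @ w + w @ h2 @ w        # sums over tau and (tau1, tau2)
    return y

# hypothetical kernels (memory 2); a rank-one h2 makes this a Wiener model:
# y(t) = v(t) + 0.5 v(t)^2 with v = h1 * u
h1 = np.array([1.0, 0.4])
h2 = 0.5 * np.outer(h1, h1)

u = np.array([0.0, 1.0, 0.0, 0.0])        # impulse through the series
y = volterra_output(u, h1, h2)            # y[1] = 1.5, y[2] = 0.48
```

Choosing h2 as an outer product of h1 is only a convenience for checking the result against the equivalent linear-filter-plus-static-nonlinearity form.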

    B. Formulation of the Nonlinear Identification Problem

We will assume that the identification data are generated by a true system of the form

y(t) = Σ_{n=1}^{∞} y_{0,n}(t) + e(t)    (10)

where y_{0,n}(t) is the output of the nth order Volterra kernel of the true system, as in (9), and e(t) is a zero-mean white noise, and we will consider a generic model class represented, for analysis purposes only, in Volterra series form. The general expression based on Volterra kernels for a NOE model parameterized by θ is

ŷ(t|θ) = Σ_{n=1}^{∞} ŷ_n(t|θ)    (11)

where ŷ_n(t|θ) is given by (9) with GFRFs Ĥ_n(ω1, …, ωn; θ); therefore, the prediction error ε(t, θ) = y(t) − ŷ(t|θ) for a NOE model can be expressed as a function of the differences between the actual and estimated GFRFs

ε(t, θ) = Σ_{n=1}^{∞} (1/(2π)^n) ∫⋯∫ ΔH_n(ω1, …, ωn; θ) U(ω1) ⋯ U(ωn) e^{j(ω1+⋯+ωn)t} dω1 ⋯ dωn + e(t)

where ΔH_n(ω1, …, ωn; θ) = H_n(ω1, …, ωn) − Ĥ_n(ω1, …, ωn; θ).

The following lemma (see [16]) describes the effect of linear filtering applied to a nonlinear system and will be useful later in this note.

Lemma 1: Consider the series connection of a linear filter with transfer function L(q⁻¹) and a nonlinear system described as a Volterra series with GFRFs H_n(ω1, …, ωn), n = 1, 2, …. Then, the GFRFs of the overall system are given by H_n(ω1, …, ωn) L(e^{jω1}) ⋯ L(e^{jωn}) if the linear filter is connected at the input of the nonlinear system, and by L(e^{j(ω1+⋯+ωn)}) H_n(ω1, …, ωn) if it is connected at the output.

Note that the first-order GFRF, which describes the linear part of the overall system, is equally affected by both kinds of filtering. In fact, it is given in both cases by L(e^{jω1}) H_1(ω1).

1) Error Filtering: Applying Lemma 1, the filtered prediction error can be written as

ε_F(t, θ) = Σ_{n=1}^{∞} (1/(2π)^n) ∫⋯∫ L(e^{j(ω1+⋯+ωn)}) ΔH_n(ω1, …, ωn; θ) U(ω1) ⋯ U(ωn) e^{j(ω1+⋯+ωn)t} dω1 ⋯ dωn + L(q⁻¹) e(t).

Clearly, if the considered model class is such that ΔH_n(ω1, …, ωn; θ) is identically zero for all n and some θ, then the estimate will be consistent, i.e., the minimization of the prediction error norm and of the filtered prediction error norm (4) lead to the same model, at least asymptotically, as in the linear case. If, on the other hand, it cannot be assumed that the true system belongs to the model class, then the identified model will be biased and the prefilter L(q⁻¹) may be chosen to affect the frequency distribution of the estimation error. Note, however, that the frequency weightings L(e^{j(ω1+⋯+ωn)}) acting on the kernel estimation errors ΔH_n of increasing order n have significantly different shapes. Further insight into the shape of the weighting factors can be obtained by inspecting their behavior along the main diagonal in the frequency domain, i.e., setting ω1 = ⋯ = ωn = ω. As an example, Fig. 1(a) illustrates the Bode plot of the weighting factor L(e^{jnω}) for a second-order digital low-pass Butterworth filter with bandwidth [0, 0.1], whereas Fig. 1(b) considers a second-order digital band-pass Butterworth filter with bandwidth [0.1, 1]. While in the linear case the prefiltering tends to cut out all the frequencies external to the bandwidth over which the fit is required, in the nonlinear case the bandwidth of the weighting factor is a function of the kernel order, and the fit is also important in different bands, due to the nonlinear effects that scatter the spectrum.
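The narrowing of the diagonal weighting with the kernel order can be reproduced numerically: under error filtering, the weight acting on the order-n kernel error along the main diagonal is the filter response evaluated at n times the frequency. A sketch with SciPy, using the low-pass filter of the example above (the frequency grid is an arbitrary choice kept below the Nyquist rate for all orders considered):

```python
import numpy as np
from scipy.signal import butter, freqz

# second-order digital low-pass Butterworth, normalized bandwidth [0, 0.1]
b, a = butter(2, 0.1)

w = np.linspace(1e-3, np.pi / 3, 500)    # keep n*w <= pi for n <= 3

def diag_weight(n):
    """|L(e^{j n w})|: weight on the order-n kernel error, main diagonal."""
    _, L = freqz(b, a, worN=n * w)
    return np.abs(L)

W1, W2, W3 = diag_weight(1), diag_weight(2), diag_weight(3)
# the effective passband of the weight shrinks as the kernel order grows
```

Plotting W1, W2, W3 against w reproduces the qualitative behavior of Fig. 1(a): the order-2 weight rolls off at half the filter bandwidth, the order-3 weight at one third.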

2) Data Prefiltering: Denoting by ε^F(t, θ) the prediction error sequence obtained after filtering the identification data, we can write, in view of Lemma 1

ε^F(t, θ) = Σ_{n=1}^{∞} (1/(2π)^n) ∫⋯∫ [L(e^{j(ω1+⋯+ωn)}) H_n(ω1, …, ωn) − L(e^{jω1}) ⋯ L(e^{jωn}) Ĥ_n(ω1, …, ωn; θ)] U(ω1) ⋯ U(ωn) e^{j(ω1+⋯+ωn)t} dω1 ⋯ dωn + L(q⁻¹) e(t).


Fig. 1. Bode diagram (magnitude) of the weighting factor for a low-pass filter and a band-pass filter.

Even assuming that the considered model structure is sufficiently complex so as to admit a value of θ for which ΔH_n(ω1, …, ωn; θ) is identically zero for all n, the input-dependent part of the prediction error will be (ideally) eliminated only for a choice of the model GFRFs such that

Ĥ_n(ω1, …, ωn; θ) = [L(e^{j(ω1+⋯+ωn)}) / (L(e^{jω1}) ⋯ L(e^{jωn}))] H_n(ω1, …, ωn).    (12)

Apart from the case n = 1, Ĥ_n will, therefore, correspond to a weighted version of H_n, where the weighting factor depends both on the filter L(q⁻¹) and on the order n of the kernel. In particular, the optimal model kernels approximate the true ones in the frequency regions where the weighting factor in (12) is close to 1, while the approximation degrades where this ratio significantly differs from 1. Moreover,

Fig. 2. Bode diagram (magnitude) of the weighting factor for a low-pass filter and a band-pass filter.

TABLE I. ESTIMATED PARAMETERS FOR THE MODEL IN EXAMPLE IV-A

for rational filters L(q⁻¹) with positive relative degree (such as conventional low-pass or band-pass filters) the frequency weighting factor (12) will be of high-pass type, taking abnormally high values outside the filter bandwidth for higher order kernels. The identification algorithm will thus try to fit a severely distorted kernel: though this distortion takes place outside the filter bandwidth, in some cases the shape of the weighting factor also affects the model accuracy inside the filter bandwidth. In fact, the minimization procedure in the identification algorithm will be enticed to consider as numerically significant only the portion of the identification data with frequency content outside the filter bandwidth, thus producing an unwanted bias


also inside the filter bandwidth. Finally, note that even in the ideal case when the system belongs to the model family, the presence of the weighting factor in (12) actually configures an under-parameterized identification problem. As an example, Fig. 2(a) illustrates the weighting factor on the main diagonal for a second-order digital low-pass Butterworth filter with bandwidth [0, 0.1], while Fig. 2(b) considers a second-order digital band-pass Butterworth filter with bandwidth [0.1, 1]. In the optimal model, the accuracy of the higher order kernels would be reasonable only within a fraction of the filter bandwidth, which gets smaller as the kernel order increases. Also, a severe distortion would result outside the bandwidth for both filters.
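The high-pass character of the weighting factor (12) along the main diagonal can likewise be checked numerically. The sketch below evaluates the ratio for the low-pass filter of the example (the frequency grid is an arbitrary choice kept below the Nyquist rate for all kernel orders considered):

```python
import numpy as np
from scipy.signal import butter, freqz

b, a = butter(2, 0.1)                    # low-pass, normalized bandwidth [0, 0.1]
w = np.linspace(1e-3, np.pi / 4, 500)    # keep n*w <= pi for n <= 3

def L(wgrid):
    _, h = freqz(b, a, worN=wgrid)
    return h

# weighting factor (12) restricted to the main diagonal, orders 2 and 3:
#   L(e^{j n w}) / L(e^{j w})^n
ratio2 = np.abs(L(2 * w) / L(w) ** 2)
ratio3 = np.abs(L(3 * w) / L(w) ** 3)
# close to 1 near DC, growing (high-pass behavior) outside the filter bandwidth
```

The amplification grows with the kernel order because the denominator rolls off n times faster than the numerator, matching the behavior described for Fig. 2(a).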

    C. Error Filtering for Nonlinear Polynomial NARMAX Models

Consider the general black-box nonlinear model structure [1]

ŷ(t|θ) = f(x(t), θ)    (13)

where f is a nonlinear function parameterized in θ and x(t) is a vector of regressors, defined as past input u(t−k), output y(t−k), and estimated output ŷ(t−k|θ) values. Function f is often assumed to depend linearly on θ

ŷ(t|θ) = φ(t)ᵀ θ    (14)

since straightforward parameter estimation methods of the least squares family can be employed for model identification [17]. In particular, polynomial input/output recursive black-box models belong to this class, where the elements of φ(t) are linear and nonlinear monomials of the arguments u(t−k), y(t−k), and ŷ(t−k|θ).

For such models, the filtered prediction error can be written as

ε_F(t, θ) = L(q⁻¹)[y(t) − φ(t)ᵀθ] = y_F(t) − φ_F(t)ᵀθ    (15)

where φ_F(t) = L(q⁻¹)φ(t) denotes the vector of filtered monomials. Filtered versions of monomials including u(t−k) and y(t−k) terms only can be computed at the onset of the minimization procedure, while monomials also containing terms of the type ŷ(t−k|θ) must be filtered at each algorithm iteration, after ŷ(t|θ) has been recalculated given the current parameterization θ. Therefore, for polynomial NFIR, NARX, and NOE models (see the classification of [1]), which do not have regressors including prediction error terms, error filtering simply amounts to a prefiltering of the regressors constructed from the original I/O data.
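For a polynomial NARX structure, the procedure just described reduces to a single pass of filtering over the output and the monomial regressors, followed by an ordinary least squares solve. A minimal sketch (the model structure, coefficients, and noise level are illustrative, not those of the examples below):

```python
import numpy as np
from scipy.signal import lfilter, butter

def narx_regressors(y, u):
    """Monomial regressors of a small hypothetical polynomial NARX model."""
    y1 = np.concatenate(([0.0], y[:-1]))    # y(t-1)
    u1 = np.concatenate(([0.0], u[:-1]))    # u(t-1)
    return np.column_stack([y1, u1, u1**2, y1 * u1])

def efls(y, u, b, a):
    """Error-filtered least squares: for NARX models, filtering the
    prediction error amounts to filtering y and every monomial regressor
    once, before the least squares solve."""
    Phi = narx_regressors(y, u)
    PhiF = np.column_stack([lfilter(b, a, c) for c in Phi.T])
    yF = lfilter(b, a, y)
    return np.linalg.lstsq(PhiF, yF, rcond=None)[0]

# simulate a system of the same structure (coefficients are illustrative)
rng = np.random.default_rng(1)
N = 5000
u = rng.standard_normal(N)
theta_true = np.array([0.5, 1.0, 0.3, -0.2])
y = np.zeros(N)
for t in range(1, N):
    y[t] = (0.5 * y[t-1] + 1.0 * u[t-1] + 0.3 * u[t-1]**2
            - 0.2 * y[t-1] * u[t-1] + 0.01 * rng.standard_normal())

b, a = butter(2, 0.5)                       # low-pass, bandwidth [0, 0.5]
theta_hat = efls(y, u, b, a)                # close to theta_true
```

With the system in the model class and a low noise level, the filtered estimate stays close to the true parameters, consistent with the discussion above.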

    IV. SIMULATION EXAMPLES

    A. Example 1

    Consider the task of identifying an NFIR model

    (16)

from data generated by a system of the same structure (see Table I for the actual values of the parameters). The identification is performed over a data set of 10 000 samples, generated with both input and noise signals selected as white Gaussian noises, with unit input variance and a noise variance chosen in order to obtain a prescribed signal-to-noise ratio. The parameters of the model have been estimated using basic least squares (LS), as well as least squares with data prefiltering (DFLS) and with error filtering (EFLS). In the last two cases, a second-order low-pass digital Butterworth filter with bandwidth [0, 0.5] has been considered.

Fig. 3. Magnitude of the first and second (main diagonal only) GFRFs of the identified NFIR models in Example IV-B: system (continuous), LS (dotted), DFLS (dashed), EFLS (dash-dot).

As shown in Table I, while the estimates computed using LS and EFLS are consistent, DFLS leads to abnormally high values for the parameters associated with the quadratic regressors. Actually, the DFLS algorithm correctly estimates only the first order GFRF, whereas the second-order GFRF is badly estimated over the whole frequency range. Here and in the following, the GFRFs are computed according to the probing method [15].

    B. Example 2

    Consider the system

    (17)


TABLE II. CLUSTER COEFFICIENTS FOR THE MODELS IN EXAMPLE IV-B

and the NFIR model structure

(18)

so that the true system does not belong to the model class. Model identification has been performed with the same algorithms and conditions of the previous example, using a data set of 1000 samples.

As can be seen from Fig. 3, both methods with filtering provide improved accuracy in the filter bandwidth as far as the first order GFRF is concerned (at the cost of reduced performance at higher frequencies) with respect to standard LS. The EFLS and DFLS methods, however, behave in significantly different ways with respect to the second order GFRF: EFLS ensures a significant reduction of the bias within the desired bandwidth with respect to LS, while the estimate of the second-order GFRF through DFLS is affected by significant distortion, mainly at high frequency [in accordance with (12)].

An interesting interpretation of these results can be given in terms of cluster coefficients. Terms with the same type of nonlinearity form a cluster [18]. For example, the cluster Ω_{y^p u^m} groups all regressors represented as products of a pth order factor in y and an mth order factor in u. The cluster coefficient is the sum of the coefficients of all the terms of a cluster and is a measure of the real significance of the corresponding nonlinearity in the model. The cluster coefficients of the system and identified models are reported in Table II: it turns out that EFLS is the most successful method in balancing the bias on the two cluster coefficients.
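Computing cluster coefficients is a simple bookkeeping exercise: group the model terms by their degrees in y and u, ignoring the lags, and sum the coefficients within each group. A sketch (the example model and its coefficients are hypothetical):

```python
def cluster_coefficients(terms):
    """Sum the coefficients of all model terms sharing the same nonlinearity
    type. A term is (coefficient, p, m): a product of a p-th order factor in
    y and an m-th order factor in u, whatever the lags."""
    clusters = {}
    for coeff, p, m in terms:
        clusters[(p, m)] = clusters.get((p, m), 0.0) + coeff
    return clusters

# hypothetical model: y(t) = 0.5 y(t-1) + 0.8 u(t-1) + 0.2 u(t-2)
#                            + 0.3 u(t-1)^2 - 0.1 u(t-1) u(t-2)
terms = [(0.5, 1, 0), (0.8, 0, 1), (0.2, 0, 1), (0.3, 0, 2), (-0.1, 0, 2)]
cc = cluster_coefficients(terms)
# cc[(0, 1)] = 1.0 and cc[(0, 2)] = 0.2: linear-in-u and quadratic-in-u clusters
```

A near-zero cluster coefficient built from several sizable term coefficients would suggest that the corresponding nonlinearity is spurious, which is the diagnostic use made of Table II.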

C. Example 3

Consider the NARX system

(19)

and the (under-parameterized) NARX model structure

(20)

The identification is performed over a data set of 1000 samples, with the same algorithms and conditions of the previous examples. Fig. 4 shows once again that (low-pass) error filtering can be used to reduce the bias in both the first and second order GFRF estimates in the frequency band of interest. Concerning the data prefiltering case, notice that the estimate of the second order GFRF does not exhibit a high frequency amplification as in the previous cases, despite the identical shape of the weighting function. This is due to a structural constraint of the model, which forces the second-order GFRF to vanish at high frequency for all parameterizations. The undesirable effect of the weighting function (12) is still visible in terms of the increased low frequency bias in the GFRF estimates obtained via DFLS with respect to LS.


Fig. 4. Magnitude of the first and second (main diagonal only) GFRFs of the identified NARX models in Example IV-C: system (continuous), LS (dotted), DFLS (dashed), EFLS (dash-dot).

    V. CONCLUDING REMARKS

The role of prefiltering in nonlinear system identification has been investigated within the framework of the Volterra series representation of nonlinear systems. The classical results on the frequency domain interpretation of prediction error filtering available in the linear system identification literature have been extended to the nonlinear case, and the pitfalls associated with the naive application of data prefiltering to nonlinear problems have been analyzed and illustrated via simulation examples of NFIR and NARX model identification. Future developments include the investigation of nonlinear data and error prefilters.

    REFERENCES

[1] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P. Glorennec, H. Hjalmarsson, and A. Juditsky, "Nonlinear black-box modeling in system identification: A unified overview," Automatica, vol. 31, pp. 1691–1724, 1995.

[2] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjöberg, and Q. Zhang, "Nonlinear black-box models in system identification: Mathematical foundations," Automatica, vol. 31, pp. 1725–1750, 1995.

[3] S. Chen and S. A. Billings, "Orthogonal least squares methods and their application to nonlinear system identification," Int. J. Control, vol. 50, pp. 1873–1896, 1989.

[4] ——, "Representations of nonlinear systems: The NARMAX model," Int. J. Control, vol. 49, pp. 1013–1032, 1989.

[5] G.-O. Glentis, P. Koukoulas, and N. Kalouptsidis, "Efficient algorithms for Volterra system identification," IEEE Trans. Signal Process., vol. 47, no. 11, pp. 3042–3057, Nov. 1999.

[6] L. Piroddi and W. Spinelli, "An identification algorithm for polynomial NARX models based on simulation error minimization," Int. J. Control, vol. 76, no. 17, pp. 283–295, 2001.

[7] F. Previdi and M. Lovera, "Identification of a class of nonlinear parametrically varying models," Int. J. Adapt. Control Signal Process., vol. 17, pp. 33–50, 2003.

[8] J. Schoukens, J. G. Nemeth, P. Crama, Y. Rolain, and R. Pintelon, "Fast approximate identification of nonlinear systems," Automatica, vol. 39, no. 7, pp. 1267–1274, 2003.

[9] R. Murray-Smith and T. A. Johansen, Multiple Model Approaches to Modeling and Control. London, U.K.: Taylor and Francis, 1997.

[10] L. Ljung, System Identification: Theory for the User. Upper Saddle River, NJ: Prentice-Hall, 1999.

[11] J. P. Jones and S. Billings, "Mean levels in nonlinear analysis and identification," Int. J. Control, vol. 58, no. 5, pp. 1033–1052, 1993.

[12] I. Leontaritis and S. Billings, "Input–output parametric models for nonlinear systems," Int. J. Control, vol. 41, pp. 303–344, 1985.

[13] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems. New York: Wiley, 1980.

[14] W. Rugh, Nonlinear System Theory: The Volterra/Wiener Approach. Baltimore, MD: Johns Hopkins Univ. Press, 1981.

[15] S. Billings and K. Tsang, "Spectral analysis for nonlinear systems, part I: Parametric nonlinear spectral analysis," Mech. Syst. Signal Process., vol. 3, no. 4, pp. 319–339, 1989.

[16] S. Boyd, L. Chua, and C. Desoer, "Analytical foundations of Volterra series," IMA J. Math. Control Inform., vol. 1, pp. 243–282, 1984.

[17] S. Billings, S. Chen, and M. Korenberg, "Identification of MIMO nonlinear systems using a forward-regression orthogonal estimator," Int. J. Control, vol. 49, pp. 2157–2189, 1989.

[18] L. Aguirre and S. Billings, "Improved structure selection for nonlinear models based on term clustering," Int. J. Control, vol. 62, pp. 569–587, 1995.

    Kernel Based Partially Linear Models and

    Nonlinear Identification

    Marcelo Espinoza, Johan A. K. Suykens, and Bart De Moor

Abstract—In this note, we propose partially linear models with least squares support vector machines (LS-SVMs) for nonlinear ARX models. We illustrate how full black-box models can be improved when prior information about model structure is available. A real-life example, based on the Silverbox benchmark data, shows significant improvements in the generalization ability of the structured model with respect to the full black-box model, reflected also by a reduction in the effective number of parameters.

Index Terms—Kernels, least squares support vector machine (LS-SVM), nonlinear system identification, partially linear models.

    I. INTRODUCTION

The objective of nonlinear system identification is to estimate a relation between inputs u(t) and outputs y(t) generated by an unknown dynamical system. Let x(t) = [y(t−1), …, y(t−n_a), u(t−1), …, u(t−n_b)] be the regression vector corresponding to the output y(t) in a NARX model. The nonlinear system identification task is to estimate a function f such that y(t) = f(x(t)) + e(t), where e(t) is a disturbance term. Under the nonlinear black-box setting, the function f can be estimated by using different techniques (artificial neural networks [6], (least-squares) support vector machines ((LS)-SVMs) [14], [15], wavelets [18], basis expansions [12], splines [16], polynomials [7], etc.), involving a training and a validation stage [14]. The one-step-ahead prediction is simply computed by evaluating the estimated f on the regression vector; simulation k steps ahead can be obtained by iteratively applying the prediction equation, replacing future outputs by their predictions [8], [12]. In this note, we focus on the case where the nonlinearity in the model does not apply over all the inputs, but rather over a subset of them, leading to the identification of a partially linear model [5], [10], [13].
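The iterated simulation mentioned above can be sketched as follows; the estimated model f_hat, its orders, and the input signal are hypothetical placeholders:

```python
import numpy as np

def simulate_narx(f, u, n_y, n_u, y0):
    """Free-run (iterated) simulation of an estimated NARX model
    y(t) = f(x(t)): future outputs in the regression vector are replaced
    by their own predictions. y0 must supply max(n_y, n_u) initial samples."""
    y = list(y0)
    for t in range(len(y0), len(u)):
        x = np.concatenate([y[t - n_y:t][::-1],    # y(t-1), ..., y(t-n_y)
                            u[t - n_u:t][::-1]])   # u(t-1), ..., u(t-n_u)
        y.append(f(x))
    return np.array(y)

# hypothetical estimated model: y(t) = 0.6 y(t-1) + 0.4 u(t-1)
f_hat = lambda x: 0.6 * x[0] + 0.4 * x[1]
u = np.ones(6)
y_sim = simulate_narx(f_hat, u, n_y=1, n_u=1, y0=[0.0])
# y_sim: 0, 0.4, 0.64, 0.784, ... (step response of the estimated model)
```

Replacing the slice of past predictions with measured outputs turns the same loop into one-step-ahead prediction, which is the distinction drawn in the text.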

We propose the partially linear LS-SVM (PL-LSSVM) [2] to improve the performance of an existing black-box model when there is evidence that some of the regressors enter the model linearly. Consider, e.g., a system for which the true model can be described as the sum of a linear combination of some regressors and a nonlinear function of the remaining ones; the regression vector then collects both groups of regressors. If a linear parametric model is estimated, it may suffer from a misspecification error, because the nonlinear term may not be known a priori to the user. However, by moving to a full nonlinear black-box technique, the linear parts are not identified as such, but rather as part of the general nonlinear model. In this case, a nonlinear black-box model will show a better performance than a full linear model, but it has a complexity which may be

Manuscript received May 28, 2004; revised March 15, 2005 and May 30, 2005. Recommended by Guest Editor L. Ljung. This work was supported by grants and projects for the Research Council K.U. Leuven (GOA-Mefisto 666, GOA-Ambiorics, several Ph.D./Postdoc and Fellow grants), the Flemish Government (FWO: Ph.D./Postdoc grants, Projects G.0240.99, G.0407.02, G.0197.02, G.0211.05, G.0141.03, G.0491.03, G.0120.03, G.0452.04, G.0499.04, ICCoS, ANMMM; AWI; IWT: Ph.D. grants, GBOU (McKnow, Soft4s)), the Belgian Federal Government (Belgian Federal Science Policy Office: IUAP V-22; PODO-II (CP/01/40)), the EU (FP5-Quprodis; ERNSI; Eureka 2063-Impact; Eureka 2419-FLiTE), and Contracts Research/Agreements (ISMC/IPCOS, Data4s, TML, Elia, LMS, IPCOS, Mastercard).

    The authors are with the Department of Electrical Engineering ESAT-SCD,Katholieke Universiteit Leuven, B-3001 Leuven, Belgium (e-mail: [email protected]; [email protected]; [email protected]).

    Digital Object Identifier 10.1109/TAC.2005.856656
