IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 50, NO. 10, OCTOBER 2005 1597
Technical Notes and Correspondence
On the Role of Prefiltering in Nonlinear
System Identification
William Spinelli, Luigi Piroddi, and Marco Lovera
Abstract: Data prefiltering is often used in linear system identification to increase model accuracy in a specified frequency band, as prefiltering is equivalent to a frequency weighting on the prediction error function. However, this interpretation applies only to a strictly linear setting of the identification problem. In this note, the role of data and error prefiltering in nonlinear system identification is analyzed and a frequency domain interpretation is provided, based on the Volterra series representation of nonlinear systems. Simulation results illustrate the conclusions of the analysis.

Index Terms: Frequency domain analysis, NARX modeling, nonlinear identification, prefiltering, sampling time, Volterra series.
I. INTRODUCTION
Increasing attention is being dedicated in the system identification literature to the problem of identifying suitable dynamic models for nonlinear systems: Many different model classes and algorithms have been proposed in recent years (see, e.g., [1] and [2] for a comprehensive review of nonlinear black-box approaches and [3]-[9] for some recent contributions), some of which are devised as extensions to the nonlinear case of well-known linear models and procedures. However, though many aspects of linear identification can be conveniently extended, there are some that have become common practice also in nonlinear identification, but that should be carefully reexamined to be correctly applied in that context. One of these is data prefiltering, which is systematically employed in linear identification, mostly for antialiasing reasons and to reduce the effect of high frequency disturbances in order to increase the signal-to-noise ratio. It is well known from the linear identification literature (see, e.g., [10]) that prefiltering may be used to improve model accuracy in a specified frequency band, e.g., as a result of control specifications which narrow the frequency region over which accurate models are actually needed. Typically, input and output data are filtered with suitable low-pass or band-pass filters and the filtered data are used for identification: In the linear case this is equivalent to a filtering of the prediction error, which determines a weight on the cost function in prediction error estimation, and is thus one of the most important design variables for selectively emphasizing the fit of the linear model.
Little attention has been dedicated to the analysis of prefiltering in nonlinear system identification: In [11] it is pointed out that data prefiltering will introduce bias into the estimates even in the noise-free case, and a detailed analysis of mean-level-induced bias is performed.
In this note, a convenient setting for the general analysis of prefiltering-induced bias is proposed, which extends to the nonlinear case classical results on the frequency domain interpretation of prediction error filtering available in the linear system identification literature. In
Manuscript received May 29, 2004; revised February 17, 2005 and May 23, 2005. Recommended by Associate Editor L. Ljung. This work was supported by the Italian National MIUR project "Innovative Techniques for Identification and Adaptive Control of Industrial Systems."
The authors are with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133 Milano, Italy (e-mail: [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/TAC.2005.856655
addition, the discussion points out the pitfalls associated with the application to nonlinear problems of the practice of data prefiltering, while error filtering can be effective, though ad hoc parameter estimation algorithms have to be devised. Finally, it is shown how error prefiltering can be included in the identification cycle for specific nonlinear model classes of the NARMAX [1], [12] family.
The analysis employs the Volterra series representation of nonlinear
systems [13], [14], for which frequency analysis tools have been devel-
oped, such as the generalized frequency response functions [15].
II. PREFILTERING IN LINEAR IDENTIFICATION
Consider a linear identification problem where the input-output (I/O) data is generated by the true system

  $y(t) = G_0(q^{-1})\,u(t) + e(t)$  (1)

where the noise source $e(t)$ is a zero-mean, white Gaussian noise with variance $\sigma^2$, $G_0(q^{-1})$ is a suitable linear transfer function, and $q^{-1}$ is the unit delay operator.
Given a model class defined as

  $\hat y(t|\theta) = G(q^{-1},\theta)\,u(t)$  (2)

and parameterized by the vector $\theta$, the corresponding prediction error is given by

  $\varepsilon(t,\theta) = y(t) - \hat y(t|\theta) = \Delta G(q^{-1},\theta)\,u(t) + e(t)$  (3)

where $\Delta G(q^{-1},\theta) = G_0(q^{-1}) - G(q^{-1},\theta)$ is the estimation error on the system's transfer function.
The prediction error minimization (PEM) approach to system identification is based on the minimization of a suitable norm of $\varepsilon(t,\theta)$. A typical choice is the quadratic norm

  $V_N(\theta) = \frac{1}{N}\sum_{t=1}^{N} \varepsilon_F(t,\theta)^2$  (4)

where $N$ is the number of data and $\varepsilon_F(t,\theta) = F(q^{-1})\,\varepsilon(t,\theta)$ denotes a filtered version of the prediction error sequence through a stable linear filter $F(q^{-1})$. Clearly

  $\varepsilon_F(t,\theta) = \Delta G(q^{-1},\theta)\,u_F(t) + e_F(t)$  (5)

where $u_F(t) = F(q^{-1})\,u(t)$ and $e_F(t) = F(q^{-1})\,e(t)$. As is well known [10], filtering the prediction error is equivalent to performing the identification on a prefiltered I/O data set.
In the frequency domain, the filtered error can be rewritten as

  $E_F(\omega,\theta) = F(e^{j\omega})\left[\Delta G(e^{j\omega},\theta)\,U(\omega) + E(\omega)\right]$  (6)

where $U(\omega)$ denotes the Fourier transform of the input signal. In general, if the considered model class is such that $\Delta G(e^{j\omega},\theta)$ is identically zero for some $\theta = \theta^\circ$, i.e., if the true system belongs to the model class, then the PEM estimate will be consistent. Otherwise, the identified model will be biased and the prefilter $F(q^{-1})$ may be chosen to affect the bias distribution.
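The equivalence between error filtering and data prefiltering in the linear case can be checked numerically. The sketch below is our own illustration, not part of the note: the FIR system, the first-order low-pass filter, and all numerical values are hypothetical stand-ins. Least squares on prefiltered data and least squares with a filtered prediction error yield the same estimate.

```python
import numpy as np

def lp(x, a=0.7):
    """First-order low-pass F(q^-1) = (1-a)/(1 - a q^-1), zero initial state."""
    y, prev = np.zeros_like(x), 0.0
    for t in range(len(x)):
        prev = a * prev + (1 - a) * x[t]
        y[t] = prev
    return y

rng = np.random.default_rng(0)
N = 2000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
u1, u2 = np.r_[0.0, u[:-1]], np.r_[0.0, 0.0, u[:-2]]   # u(t-1), u(t-2)
y = 0.8 * u1 + 0.4 * u2 + e                            # hypothetical true FIR system

# Error filtering: minimize ||F(y - Phi theta)||^2, i.e. LS on filtered y and regressors
Phi = np.column_stack([u1, u2])
theta_ef = np.linalg.lstsq(np.column_stack([lp(c) for c in Phi.T]),
                           lp(y), rcond=None)[0]

# Data prefiltering: filter u and y first, then build the regressors from filtered input
uf, yf = lp(u), lp(y)
theta_df = np.linalg.lstsq(np.column_stack([np.r_[0.0, uf[:-1]],
                                            np.r_[0.0, 0.0, uf[:-2]]]),
                           yf, rcond=None)[0]
```

Because filtering commutes with the unit delay, the two estimates coincide to machine precision; Section III shows why this equivalence breaks down for nonlinear models.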
0018-9286/$20.00 2005 IEEE
III. PREFILTERING IN NONLINEAR IDENTIFICATION
A. Volterra Series Representation of Nonlinear Systems
The Volterra series [13], [14] generalizes the convolution integral expression used for linear systems to represent the I/O relationship for a discrete-time deterministic causal single-input-single-output (SISO) nonlinear system. The output of the system is given by

  $y(t) = \sum_{n=1}^{\infty} y_n(t), \qquad y_n(t) = \sum_{\tau_1=0}^{\infty}\cdots\sum_{\tau_n=0}^{\infty} h_n(\tau_1,\ldots,\tau_n)\prod_{i=1}^{n} u(t-\tau_i)$  (7)

where $h_n(\tau_1,\ldots,\tau_n)$ is the $n$th order Volterra kernel of the nonlinear system. An equivalent representation in the frequency domain [13] is given by the generalized frequency response function (GFRF) of order $n$, defined as the Fourier transform of the $n$th order kernel

  $H_n(j\omega_1,\ldots,j\omega_n) = \sum_{\tau_1=0}^{\infty}\cdots\sum_{\tau_n=0}^{\infty} h_n(\tau_1,\ldots,\tau_n)\, e^{-j(\omega_1\tau_1+\cdots+\omega_n\tau_n)}$  (8)

with $j = \sqrt{-1}$. The output of the $n$th order kernel can also be expressed as a function of the corresponding GFRF [15]

  $y_n(t) = \frac{1}{(2\pi)^n}\int_{-\pi}^{\pi}\cdots\int_{-\pi}^{\pi} H_n(j\omega_1,\ldots,j\omega_n)\,U(\omega_1)\cdots U(\omega_n)\, e^{j(\omega_1+\cdots+\omega_n)t}\, d\omega_1\cdots d\omega_n.$  (9)
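The time-domain representation (7) can be made concrete with a small numerical check (our own sketch; the kernels are chosen for illustration). For a Wiener-type system $y(t) = v(t) + v(t)^2$ with $v = h * u$, the kernels are $h_1(\tau) = h(\tau)$ and $h_2(\tau_1,\tau_2) = h(\tau_1)h(\tau_2)$, and the truncated double sum of (7) reproduces the closed-form output.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 8                      # kernel memory
h = 0.5 ** np.arange(M)    # impulse response of the inner linear block (illustrative)
u = rng.standard_normal(200)

# Closed form: v = h * u (causal convolution), then y = v + v^2
v = np.convolve(u, h)[:len(u)]
y_ref = v + v**2

def volterra(u, h1, h2):
    """Evaluate a second-order truncated Volterra series, eq. (7)."""
    N, M = len(u), len(h1)
    y = np.zeros(N)
    for t in range(N):
        # vector of past inputs u(t), u(t-1), ..., u(t-M+1), zero before t = 0
        past = np.array([u[t - k] if t - k >= 0 else 0.0 for k in range(M)])
        y[t] = h1 @ past + past @ h2 @ past
    return y

y_vol = volterra(u, h, np.outer(h, h))   # h2(t1, t2) = h(t1) h(t2)
```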
B. Formulation of the Nonlinear Identification Problem
We will assume that the identification data is generated by a true system of the form

  $y(t) = \sum_{n=1}^{\infty} y_n^0(t) + e(t)$  (10)

where $e(t)$ is a zero-mean white noise and $y_n^0(t)$ is the output of the $n$th order kernel of the system, and we will consider a generic model class represented, for analysis purposes only, in Volterra series form. The general expression based on Volterra kernels for a NOE model parameterized by $\theta$ is

  $\hat y(t|\theta) = \sum_{n=1}^{\infty} \hat y_n(t,\theta)$  (11)

therefore, the prediction error for a NOE model can be expressed as a function of the differences $\Delta H_n(j\omega_1,\ldots,j\omega_n;\theta) = H_n^0(j\omega_1,\ldots,j\omega_n) - \hat H_n(j\omega_1,\ldots,j\omega_n;\theta)$ between the actual and estimated GFRFs, where $H_n^0$ and $\hat H_n(\cdot;\theta)$ denote the GFRFs of the system and of the model, respectively.
The following lemma (see [16]) describes the effect of linear filtering applied to a nonlinear system and will be useful later in this note.

Lemma 1: Consider a series connection of a linear filter with transfer function $F(q^{-1})$ and a nonlinear system described as a Volterra series with GFRFs $H_n(j\omega_1,\ldots,j\omega_n)$, $n = 1, 2, \ldots$. Then, the GFRFs of the overall system are given by $H_n(j\omega_1,\ldots,j\omega_n)\prod_{i=1}^{n} F(e^{j\omega_i})$ if the linear filter is connected at the input of the nonlinear system, and by $F(e^{j(\omega_1+\cdots+\omega_n)})\,H_n(j\omega_1,\ldots,j\omega_n)$ if it is connected at the output.

Note that the first-order GFRF, which describes the linear part of the overall system, is equally affected by both kinds of filtering. In fact, it is given in both cases by $F(e^{j\omega_1})\,H_1(j\omega_1)$.
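Lemma 1 can be checked by harmonic probing on a small example (our own sketch with hypothetical blocks: a first-order low-pass followed by a static square). With the filter at the input, the second-order GFRF of the cascade is $F(e^{j\omega_1})F(e^{j\omega_2})$, so the steady-state response to $u(t) = \cos(\omega t)$ at frequency $2\omega$ has amplitude $|F(e^{j\omega})|^2/2$.

```python
import numpy as np

a = 0.5
def F_gain(w):
    """Frequency response of F(q^-1) = (1-a)/(1 - a q^-1)."""
    return (1 - a) / (1 - a * np.exp(-1j * w))

w = 2 * np.pi / 32
t = np.arange(4096)
u = np.cos(w * t)

# Cascade: v = F u (filter at the input), then static nonlinearity y = v^2
v, prev = np.zeros_like(u), 0.0
for k in range(len(u)):
    prev = a * prev + (1 - a) * u[k]
    v[k] = prev
y = v**2

# Amplitude of the 2w component, measured after the transient over whole periods
seg, ts = y[1024:1024 + 2048], t[1024:1024 + 2048]   # 2048 samples = 128 periods of 2w
c = 2 / len(seg) * np.sum(seg * np.exp(-1j * 2 * w * ts))
amp_measured = abs(c)
amp_predicted = abs(F_gain(w))**2 / 2   # |H2(jw, jw)| * A^2 / 2 with A = 1
```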
1) Error Filtering: Applying Lemma 1, the filtered prediction error can be written as a Volterra series in the input with $n$th-order GFRFs $F(e^{j(\omega_1+\cdots+\omega_n)})\,\Delta H_n(j\omega_1,\ldots,j\omega_n;\theta)$, plus the filtered noise term $e_F(t)$.

Clearly, if the considered model class is such that $\Delta H_n(j\omega_1,\ldots,j\omega_n;\theta^\circ) = 0$ for all $n$, for some $\theta^\circ$, then the estimate will be consistent, i.e., the minimization of the prediction error norm and of the filtered prediction error norm (4) lead to the same model, at least asymptotically, as in the linear case. If, on the other hand, it cannot be assumed that such a $\theta^\circ$ exists, then the identified model will be biased and the prefilter $F(q^{-1})$ may be chosen to affect the frequency distribution of the estimation error. Note, however, that the frequency weightings $F(e^{j(\omega_1+\cdots+\omega_n)})$ acting on the kernel estimation errors of increasing order have significantly different shapes. Further insight in the shape of the weighting factors can be obtained by inspecting their behavior along the main diagonal in the frequency domain, i.e., setting $\omega_1 = \cdots = \omega_n = \omega$, which yields $F(e^{jn\omega})$. As an example, Fig. 1(a) illustrates the Bode plot of the weighting factor $F(e^{jn\omega})$ for a second order digital low-pass Butterworth filter with bandwidth [0, 0.1], whereas Fig. 1(b) considers a second order digital band-pass Butterworth filter with bandwidth [0.1, 1]. While in the linear case the prefiltering tends to cut out all the frequencies external to the bandwidth over which the fit is required, in the nonlinear case the bandwidth of the weighting factor is a function of the kernel order, and the fit is also important in different bands, due to the nonlinear effect that scatters the spectrum.
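The order-dependent shrinking of the weighting bandwidth is easy to reproduce (our own sketch, with a first-order low-pass in place of the note's Butterworth filters; the cutoff values are illustrative). Along the main diagonal the $n$th-order weighting is $F(e^{jn\omega})$, so its cutoff falls at roughly $1/n$ of the filter's own cutoff.

```python
import numpy as np

a = 0.9
def F_mag(w):
    """Magnitude of F(q^-1) = (1-a)/(1 - a q^-1) (unit DC gain low-pass)."""
    return np.abs((1 - a) / (1 - a * np.exp(-1j * w)))

w_grid = np.linspace(1e-4, np.pi, 200000)

def cutoff(n):
    """First frequency where the diagonal weighting |F(e^{jnw})| drops below 1/sqrt(2)."""
    mag = F_mag(n * w_grid)
    return w_grid[np.argmax(mag < 1 / np.sqrt(2))]

c1, c2, c3 = cutoff(1), cutoff(2), cutoff(3)   # cutoffs for kernel orders 1, 2, 3
```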
2) Data Prefiltering: Denoting by $\varepsilon'_F(t,\theta)$ the prediction error sequence obtained after filtering the identification data, we can write, in view of Lemma 1, its input-dependent part as a Volterra series with $n$th-order GFRFs

  $F(e^{j(\omega_1+\cdots+\omega_n)})\,H_n^0(j\omega_1,\ldots,j\omega_n) - \hat H_n(j\omega_1,\ldots,j\omega_n;\theta)\prod_{i=1}^{n} F(e^{j\omega_i})$
Fig. 1. Bode diagram (magnitude) of the weighting factor $F(e^{jn\omega})$ for a low-pass filter and a band-pass filter.
where the first term derives from the filtering of the measured output and the second from feeding the model with the filtered input.
Even assuming that the considered model structure is sufficiently complex so as to admit a value $\theta^\circ$ such that $\Delta H_n(j\omega_1,\ldots,j\omega_n;\theta^\circ)$ is identically zero for all $n$, the input-dependent part of the prediction error will be (ideally) eliminated only for a choice of $\theta$ such that

  $\hat H_n(j\omega_1,\ldots,j\omega_n;\theta) = H_n^0(j\omega_1,\ldots,j\omega_n)\,\frac{F(e^{j(\omega_1+\cdots+\omega_n)})}{\prod_{i=1}^{n} F(e^{j\omega_i})}$  (12)

Apart from the case $n = 1$, $\hat H_n$ will, therefore, correspond to a weighted version of $H_n^0$, where the weighting factor depends both on the filter $F(q^{-1})$ and on the order $n$ of the kernel. In particular, $\hat H_n \approx H_n^0$ in the frequency regions where the weighting factor is close to 1, while the approximation degrades where this ratio significantly differs from 1. Moreover,
Fig. 2. Bode diagram (magnitude) of the weighting factor $F(e^{jn\omega})/F(e^{j\omega})^n$ for a low-pass filter and a band-pass filter.
TABLE I. ESTIMATED PARAMETERS FOR THE MODEL IN EXAMPLE IV-A
for rational filters $F(q^{-1})$ with positive relative degree (such as conventional low-pass or band-pass filters) the frequency weighting factor (12) will be of high-pass type, taking abnormally high values outside the filter bandwidth for higher order kernels. The identification algorithm will thus try to fit a severely distorted kernel: Though this distortion takes place outside the filter bandwidth, in some cases the shape of the weighting factor also affects the model accuracy inside the filter bandwidth. In fact, the minimization procedure in the identification algorithm will be enticed to consider as numerically significant only the portion of the identification data with frequency content outside the filter bandwidth, thus producing an unwanted bias
also inside the filter bandwidth. Finally, note that even in the ideal case when the system belongs to the model family, the presence of the weighting factor in (12) actually configures an under-parameterized identification problem. As an example, Fig. 2(a) illustrates the weighting factor on the main diagonal, $F(e^{jn\omega})/F(e^{j\omega})^n$, for a second order digital low-pass Butterworth filter with bandwidth [0, 0.1], while Fig. 2(b) considers a second-order digital band-pass Butterworth filter with bandwidth [0.1, 1]. In the optimal model, the accuracy of the higher order kernels would be reasonable only within a fraction of the filter bandwidth, which gets smaller as the kernel order increases. Also, a severe distortion would result outside the bandwidth for both filters.
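The high-pass character of the weighting in (12) can also be checked directly (again our own sketch, with a hypothetical first-order low-pass of relative degree one instead of the Butterworth filters). On the main diagonal the factor is $F(e^{jn\omega})/F(e^{j\omega})^n$: it stays near 1 inside the band and grows rapidly outside it for $n \ge 2$.

```python
import numpy as np

a = 0.9
def F(w):
    """F(q^-1) = (1-a) q^-1 / (1 - a q^-1): low-pass, relative degree one, unit DC gain."""
    z = np.exp(1j * w)
    return (1 - a) / (z - a)

def diag_weight(n, w):
    """Main-diagonal weighting factor of (12): |F(e^{jnw}) / F(e^{jw})^n|."""
    return np.abs(F(n * w) / F(w) ** n)

w_in, w_out = 0.01, 2.5   # inside vs. well outside the ~0.1 rad/sample band
```

Evaluating `diag_weight` at `w_in` gives a value near 1, while at `w_out` the second- and third-order weightings are amplified by orders of magnitude, which is the distortion mechanism discussed above.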
C. Error Filtering for Nonlinear Polynomial NARMAX Models
Consider the general black-box nonlinear model structure [1]

  $\hat y(t|\theta) = f(\varphi(t),\theta)$  (13)

where $f(\cdot)$ is a nonlinear function parameterized in $\theta$ and $\varphi(t)$ is a vector of regressors, defined as past input $u(t-k)$, output $y(t-k)$, and estimated output $\hat y(t-k|\theta)$ values. Function $f(\cdot)$ is often assumed to depend linearly on $\theta$

  $f(\varphi(t),\theta) = \varphi(t)^T\theta$  (14)

since straightforward parameter estimation methods of the least squares family can be employed for model identification [17]. In particular, polynomial input/output recursive black-box models belong to this class, where the elements of $\varphi(t)$ are linear and nonlinear monomials of the arguments $u(t-k)$, $y(t-k)$, and $\hat y(t-k|\theta)$.
For such models, the filtered prediction error can be written as

  $\varepsilon_F(t,\theta) = y_F(t) - \varphi_F(t)^T\theta$  (15)

where $\varphi_F(t)$ denotes the vector of filtered monomials. Filtered versions of monomials including $u(t-k)$ and $y(t-k)$ terms only can be computed at the onset of the minimization procedure, while monomials containing also terms of the type $\hat y(t-k|\theta)$ must be filtered at each algorithm iteration after $\hat y(t|\theta)$ has been recalculated given the current parameterization $\theta$. Therefore, for polynomial NFIR and NARX models (see the classification of [1]), which do not have regressors including $\hat y(t-k|\theta)$ terms, error filtering simply amounts to a prefiltering of the regressors constructed from the original I/O data.
IV. SIMULATION EXAMPLES
A. Example 1
Consider the task of identifying an NFIR model

(16)

from data generated by a system of the same structure (see Table I for the actual values of the parameters). The identification is performed over a data set of 10 000 samples generated with both input and noise signals selected as white Gaussian noises, with variances 1 and $\sigma^2$, respectively, the latter value being chosen in order to obtain the prescribed signal-to-noise ratio (in dB). The parameters of the model have been estimated using basic least squares (LS), as well as least squares with data prefiltering (DFLS) and error filtering (EFLS). In the last two cases, a second-order low-pass digital Butterworth filter with bandwidth [0, 0.5] has been considered.
Fig. 3. Magnitude of the first and second (main diagonal only) GFRFs of the identified NFIR models in Example IV-B: system (continuous), LS (dotted), DFLS (dashed), EFLS (dash-dot).
As shown in Table I, while the estimates computed using LS and EFLS are consistent, DFLS leads to abnormally high values for the parameters associated with the quadratic regressors. Actually, the DFLS algorithm correctly estimates only the first order GFRF, whereas the second-order GFRF is badly estimated over the whole frequency range. Here and in the following, the GFRFs are computed according to the probing method [15].
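The qualitative behavior reported in Table I can be reproduced with a self-contained experiment (everything below is a hypothetical stand-in of our own: the NFIR system, its parameter values, and a first-order low-pass in place of the Butterworth filter). EFLS remains consistent because the regressors depend on the input only, while DFLS inflates the quadratic coefficients.

```python
import numpy as np

def lp(x, a=0.9):
    """First-order low-pass with unit DC gain, zero initial state."""
    y, prev = np.zeros_like(x), 0.0
    for t in range(len(x)):
        prev = a * prev + (1 - a) * x[t]
        y[t] = prev
    return y

def monomials(v1, v2):
    """NFIR regressors: linear and quadratic monomials of v(t-1), v(t-2)."""
    return np.column_stack([v1, v2, v1**2, v2**2])

rng = np.random.default_rng(2)
N = 10000
u = rng.standard_normal(N)
u1, u2 = np.r_[0.0, u[:-1]], np.r_[0.0, 0.0, u[:-2]]
theta0 = np.array([0.5, -0.3, 0.4, 0.2])               # hypothetical true parameters
y = monomials(u1, u2) @ theta0 + 0.1 * rng.standard_normal(N)

# LS: plain least squares on the raw data
th_ls = np.linalg.lstsq(monomials(u1, u2), y, rcond=None)[0]

# EFLS: filter y and each monomial regressor, then least squares
Phi_f = np.column_stack([lp(c) for c in monomials(u1, u2).T])
th_ef = np.linalg.lstsq(Phi_f, lp(y), rcond=None)[0]

# DFLS: filter u and y first, then build the monomials from the filtered input
uf = lp(u)
th_df = np.linalg.lstsq(monomials(np.r_[0.0, uf[:-1]], np.r_[0.0, 0.0, uf[:-2]]),
                        lp(y), rcond=None)[0]
```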
B. Example 2
Consider the system
(17)
TABLE II. CLUSTER COEFFICIENTS FOR THE MODELS IN EXAMPLE IV-B
and the NFIR model structure
(18)
so that . Model identification has been performed with the
same algorithms and conditions of the previous example, using a data
set of 1000 samples.
As can be seen from Fig. 3, both methods with filtering provide improved accuracy in the filter bandwidth as far as the first order GFRF is concerned (at the cost of reduced performance at higher frequencies) with respect to standard LS. The EFLS and DFLS methods, however, behave in a significantly different way with respect to the second order GFRF: EFLS ensures a significant reduction of the bias within the desired bandwidth with respect to LS, while the estimate of the second-order GFRF through DFLS is affected by significant distortion, mainly at high frequency [in accordance with (12)].
An interesting interpretation of these results can be given in terms of cluster coefficients. Terms with the same type of nonlinearity form a cluster [18]. For example, the cluster $\Omega_{y^p u^m}$ groups all regressors represented as products of a $p$th order factor in $y$ and an $m$th order factor in $u$. The cluster coefficient $\Sigma_{y^p u^m}$ is the sum of the coefficients of all the terms of a cluster and is a measure of the real significance of the corresponding nonlinearity in the model. The cluster coefficients of the system and identified models are reported in Table II: It turns out that EFLS is the most successful method in balancing the bias on the two cluster coefficients.
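Cluster coefficients can be computed mechanically from a term list (a small utility of our own; the model terms below are illustrative, not those of Example 2). Each term is keyed by its degrees $(p, m)$ in the $y$ and $u$ factors, and the coefficients are summed per cluster, as in [18].

```python
from collections import defaultdict

def cluster_coefficients(terms):
    """terms: list of (coefficient, p, m), where p is the degree in y factors and
    m the degree in u factors. Returns {(p, m): sum of coefficients}."""
    sums = defaultdict(float)
    for coef, p, m in terms:
        sums[(p, m)] += coef
    return dict(sums)

# Illustrative polynomial NARX terms:
# 0.5*y(t-1), 0.3*u(t-1), 0.2*u(t-1)^2, -0.1*u(t-2)^2
model = [(0.5, 1, 0), (0.3, 0, 1), (0.2, 0, 2), (-0.1, 0, 2)]
cc = cluster_coefficients(model)
```

Here the two quadratic terms in $u$ fall in the same cluster $\Omega_{u^2}$, so their coefficients are summed into a single cluster coefficient.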
C. Example 3
Consider the NARX system
(19)
and the (under-parameterized) NARX model structure
(20)
The identification is performed over a data set of 1000 samples, with the same algorithms and conditions of the previous examples. Fig. 4 shows once again that (low-pass) error filtering can be used to reduce the bias both in the first and second order GFRF estimates in the frequency band of interest. Concerning the data prefiltering case, notice that the estimate of the second order GFRF does not exhibit a high frequency amplification as in the previous cases, despite the identical shape of the weighting function. This is due to a structural constraint of the model which is such that for all . The undesirable effect of the weighting function (12) is still visible in terms of the increased low frequency bias in the GFRF estimates obtained via DFLS with respect to LS.
Fig. 4. Magnitude of the first and second (main diagonal only) GFRFs of the identified NARX models in Example IV-C: system (continuous), LS (dotted), DFLS (dashed), EFLS (dash-dot).
V. CONCLUDING REMARKS
The role of prefiltering in nonlinear system identification has been investigated within the framework of the Volterra series representation of nonlinear systems. The classical results on the frequency domain interpretation of prediction error filtering available in the linear system identification literature have been extended to the nonlinear case, and the pitfalls associated with the naive application to nonlinear problems of the practice of data prefiltering have been analyzed and illustrated via simulation examples of NFIR and NARX model identification. Future developments include the investigation of nonlinear data and error prefilters.
REFERENCES
[1] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P. Glorennec, H. Hjalmarsson, and A. Juditsky, "Nonlinear black-box modeling in system identification: A unified overview," Automatica, vol. 31, pp. 1691-1724, 1995.
[2] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjöberg, and Q. Zhang, "Nonlinear black-box models in system identification: Mathematical foundations," Automatica, vol. 31, pp. 1725-1750, 1995.
[3] S. Chen and S. A. Billings, "Orthogonal least squares methods and their application to nonlinear system identification," Int. J. Control, vol. 50, pp. 1873-1896, 1989.
[4] S. Chen and S. A. Billings, "Representations of nonlinear systems: The NARMAX model," Int. J. Control, vol. 49, pp. 1013-1032, 1989.
[5] G.-O. Glentis, P. Koukoulas, and N. Kalouptsidis, "Efficient algorithms for Volterra system identification," IEEE Trans. Signal Process., vol. 47, no. 11, pp. 3042-3057, Nov. 1999.
[6] L. Piroddi and W. Spinelli, "An identification algorithm for polynomial NARX models based on simulation error minimization," Int. J. Control, vol. 76, no. 17, pp. 1767-1781, 2003.
[7] F. Previdi and M. Lovera, "Identification of a class of nonlinear parametrically varying models," Int. J. Adapt. Control Signal Process., vol. 17, pp. 33-50, 2003.
[8] J. Schoukens, J. G. Nemeth, P. Crama, Y. Rolain, and R. Pintelon, "Fast approximate identification of nonlinear systems," Automatica, vol. 39, no. 7, pp. 1267-1274, 2003.
[9] R. Murray-Smith and T. A. Johansen, Multiple Model Approaches to Modeling and Control. London, U.K.: Taylor and Francis, 1997.
[10] L. Ljung, System Identification: Theory for the User. Upper Saddle River, NJ: Prentice-Hall, 1999.
[11] J. P. Jones and S. Billings, "Mean levels in nonlinear analysis and identification," Int. J. Control, vol. 58, no. 5, pp. 1033-1052, 1993.
[12] I. Leontaritis and S. Billings, "Input-output parametric models for nonlinear systems," Int. J. Control, vol. 41, pp. 303-344, 1985.
[13] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems. New York: Wiley, 1980.
[14] W. Rugh, Nonlinear System Theory: The Volterra/Wiener Approach. Baltimore, MD: Johns Hopkins Univ. Press, 1981.
[15] S. Billings and K. Tsang, "Spectral analysis for nonlinear systems, part I: Parametric nonlinear spectral analysis," Mech. Syst. Signal Process., vol. 3, no. 4, pp. 319-339, 1989.
[16] S. Boyd, L. Chua, and C. Desoer, "Analytical foundations of Volterra series," IMA J. Math. Control Inform., vol. 1, pp. 243-282, 1984.
[17] S. Billings, S. Chen, and M. Korenberg, "Identification of MIMO nonlinear systems using a forward-regression orthogonal estimator," Int. J. Control, vol. 49, pp. 2157-2189, 1989.
[18] L. Aguirre and S. Billings, "Improved structure selection for nonlinear models based on term clustering," Int. J. Control, vol. 62, pp. 569-587, 1995.
Kernel Based Partially Linear Models and
Nonlinear Identification
Marcelo Espinoza, Johan A. K. Suykens, and Bart De Moor
Abstract: In this note, we propose partially linear models with least squares support vector machines (LS-SVMs) for nonlinear ARX models. We illustrate how full black-box models can be improved when prior information about model structure is available. A real-life example, based on the Silverbox benchmark data, shows significant improvements in the generalization ability of the structured model with respect to the full black-box model, reflected also by a reduction in the effective number of parameters.

Index Terms: Kernels, least squares support vector machine (LS-SVM), nonlinear system identification, partially linear models.
I. INTRODUCTION
The objective of nonlinear system identification is to estimate a relation between inputs $u(t)$ and outputs $y(t)$ generated by an unknown dynamical system. Let $x(t)$ be the regression vector corresponding to the output $y(t)$ in a NARX model, $x(t) = [y(t-1),\ldots,y(t-p), u(t-1),\ldots,u(t-p)]^T$. The nonlinear system identification task is to estimate a function $f$ such that $y(t) = f(x(t)) + e(t)$, where $e(t)$ is a disturbance term. Under the nonlinear black box setting, the function $f$ can be estimated by using different techniques (artificial neural networks [6], (least-squares) support vector machines ((LS)-SVM) [14], [15], wavelets [18], basis expansions [12], splines [16], polynomials [7], etc.), involving a training and validation stage [14]. The one-step-ahead prediction is simply $\hat y(t) = \hat f(x(t))$ using the estimated $\hat f$; simulation $n$ steps ahead can be obtained by iteratively applying the prediction equation, replacing future outputs by their predictions [8], [12]. In this note, we focus on the case where the nonlinearity in the model does not apply over all the inputs, but rather over a subset of them, leading to the identification of a partially linear model [5], [10], [13].
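The distinction between one-step-ahead prediction and free-run simulation can be sketched numerically (everything below is a hypothetical stand-in of our own: the map plays the role of an estimated NARX model with $p = 1$, and a single output disturbance is injected to separate the two error notions).

```python
import numpy as np

# Hypothetical estimated NARX map with p = 1: y(t) = 0.7 y(t-1) + 0.5 tanh(u(t-1))
def f_hat(y_prev, u_prev):
    return 0.7 * y_prev + 0.5 * np.tanh(u_prev)

T = 50
u = np.sin(0.2 * np.arange(T + 1))
d = np.zeros(T + 1)
d[10] = 0.5                          # a single output disturbance at t = 10

y = np.zeros(T + 1)                  # "measured" data
for t in range(1, T + 1):
    y[t] = f_hat(y[t - 1], u[t - 1]) + d[t]

# One-step-ahead prediction: regression vector built from measured outputs
y_pred = np.array([f_hat(y[t - 1], u[t - 1]) for t in range(1, T + 1)])

# Free-run simulation: future outputs replaced by their own predictions
y_sim = np.zeros(T + 1)
for t in range(1, T + 1):
    y_sim[t] = f_hat(y_sim[t - 1], u[t - 1])
```

The one-step predictor is hit by the disturbance only at the time it occurs, while in simulation the error propagates through the recursion (here decaying by 0.7 per step), which is why simulation is the harder test of generalization.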
We propose the partially linear LS-SVM (PL-LSSVM) [2] to improve the performance of an existing black-box model when there is evidence that some of the regressors in the model are linear. Consider, e.g., a system for which the true model can be described as . For this model, the regression vector is defined by . If a linear parametric model is estimated, it may have a misspecification error because the nonlinear term may not be known a priori to the user. However, by moving to a full nonlinear black-box technique, the linear parts are not identified as such, but as a part of the general nonlinear model. In this case, a nonlinear black-box model will show a better performance than a full linear model, but it has a complexity which may be
Manuscript received May 28, 2004; revised March 15, 2005 and May 30,
2005. Recommended by Guest Editor L. Ljung. This work was supported bygrants and projects for the Research Council K. U. Leuven (GOA-Mefisto666, GOA-Ambiorics, several Ph.D./Postdocs and Fellow grants), the FlemishGovernment (FWO: Ph.D./Postdocs Grants, Projects G.0240.99, G.0407.02,G.0197.02, G.0211.05, G.0141.03, G.0491.03, G.0120.03, G.0452.04,G.0499.04, ICCoS, ANMMM; AWI; IWT: Ph.D. Grants, GBOU (McKnow,Soft4s), the Belgian Federal Government (Belgian Federal Science PolicyOffice: IUAP V-22; PODO-II (CP/01/40), the EU (FP5-Quprodis; ERNSI, Eu-reka 2063-Impact; Eureka 2419-FLiTE) and Contracts Research/Agreements(ISMC/IPCOS, Data4s, TML, Elia, LMS, IPCOS, Mastercard).
The authors are with the Department of Electrical Engineering ESAT-SCD,Katholieke Universiteit Leuven, B-3001 Leuven, Belgium (e-mail: [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/TAC.2005.856656