860 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, … · 860 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 4, APRIL 2012 Robustness in Experiment Design Cristian R. Rojas,

860 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 4, APRIL 2012

Robustness in Experiment DesignCristian R. Rojas, Juan-Carlos Agüero, Member, IEEE, James S. Welsh, Member, IEEE,

Graham C. Goodwin, Fellow, IEEE, and Arie Feuer, Fellow, IEEE

Abstract—This paper focuses on the problem of robust exper-iment design, i.e., how to design an input signal which gives rela-tively good estimation performance over a large number of systemsand model structures. Specifically, we formulate the robust experi-ment design problem utilizing fundamental limitations on the vari-ance of estimated parametric models as constraints. Using this for-mulationwe design an input signal for situations where only diffusea priori information is known about the system. Furthermore, wepresent a robust version of the unprejudiced optimal input designproblem. To achieve this, we first develop a closed form solutionfor the input spectrum which minimizes the maximum weightedintegral of the variance of the frequency response estimate over allmodel structures.

Index Terms—Optimal input design, robust input design, systemidentification.

I. INTRODUCTION

A DVANCED control design is based on the availability ofmodels for the system under study [1]. The success of

these techniques depends on the quality of the models utilized todesign the control strategy. This has, inter-alia, inspired interestin the area of system identification over the last three decades(e.g., see [2]–[6]).A general requirement in system identification is to learn as

much about a system as possible from a given observation pe-riod. This has motivated substantial interest in the area of op-timal experiment design. Optimal experiment design has beenstudied both in the statistics literature [7]–[9] and in the engi-neering literature [3], [10]–[14], primarily focusing on the goalsof system identification.Most of the existing literature is based on designing the ex-

periment to optimize a given scalar function of the Fisher In-formation Matrix [3]. This presents a fundamental difficulty,namely, when the system response depends nonlinearly on theparameters, the Information Matrix depends on the true systemparameters. Moreover, models for dynamical systems (even if

Manuscript received April 28, 2009; revised December 08, 2009, August 13,2010; accepted April 28, 2011. Date of publication August 30, 2011; date ofcurrent versionMarch 28, 2012. This work was supported in part by the SwedishResearch Council under Contract 621-2009-4017. Recommended by AssociateEditor T. Zhou.C. R. Rojas is with the ACCESS Linnaeus Center, Electrical Engineering,

KTH–Royal Institute of Technology, S-100 44 Stockholm, Sweden (e-mail:[email protected]).J.-C. Agüero, J. S. Welsh, and G. C. Goodwin are with the School of Elec-

trical Engineering and Computer Science, The University of Newcastle, NSW2308, Australia (e-mail: [email protected]; [email protected]; [email protected]).A. Feuer is with the Department of Electrical Engineering, Technion—Israel

Institute of Technology, Haifa 32000, Israel (e-mail: [email protected]).Color versions of one or more of the figures in this paper are available online

at http://ieeexplore.ieee.org.Digital Object Identifier 10.1109/TAC.2011.2166294

linear) typically have the characteristic that their response de-pends nonlinearly on the parameters. Hence, the informationmatrix for models of dynamical systems generally depends uponthe true system parameters. Therefore experiment designs basedon the Fisher InformationMatrix will, in principle, depend uponknowledge of the true system parameters. This is paradoxicalsince the “optimal experiment” then depends on the very thingthat the experiment is aimed at estimating [13].The above reasoning has motivated the study of “robust” ex-

periment design with respect to uncertain a priori information.Work in this area has been growing in recent years [15]–[22].In particular, in [15] and [22] experiments are designed for thepurpose of robust control and the term least-costly identificationexperiment was coined. Based on geometric properties of theinformation matrix, a modification of the least-costly identifica-tion experiment approach is analyzed in [19] and [20]. Finally,in [21] the experiments are designed in a game-theory frame-work where the “true” system parameters belong to a compactset. Here, we propose an alternative approach motivated by theanalysis presented in [23], and the recent results in [24].In general, the choice of the “best” experiment to identify

a process depends on the prior knowledge we have about theprocess. In this paper we analyze, and solve, the followingproblem: Say we are just beginning to experiment on a systemand thus have very little (i.e., diffuse) prior knowledge about it.What would be a “good” initial experiment in order to estimatethe parameters of the system?To this end, we build on works such as [12] and [23], which

assume that both the true system and the noise dynamics areknown (at the time of designing the experiment). In this paper,we do not assume knowledge of the true system, but (for theresults of Section VI) we do assume that the noise spectrumis known. Basic prior knowledge about the plant can be ob-tained, for example, by using nonparametric frequency domainmethods based on a simple experiment [4]–[6], [25], [26]; how-ever, the use of this kind of prior knowledge for robust experi-ment design is not considered in the present contribution, and itwill be explored in a future publication.The results derived in this paper are valid for models with

a finite number of parameters, but where the number of sam-ples is sufficiently large. The possibility of removing this lastcondition and keeping at the same time the simplicity of our ex-pressions, which have a very intuitive interpretation, is beyondthe scope of this paper, since they are much more difficult toobtain, and most approaches to finite sample analysis are doneusing numerical techniques [27]. Moreover, exact variance re-sults for finite samples also rely on higher order moments of theunderlying distribution of the data, so they are inherently lessrobust with respect to the assumptions on the true system thanasymptotic results.The paper is structured as follows. Section II formulates

the problem and describes the notation used in the paper. In

0018-9286/$26.00 © 2011 IEEE

ROJAS et al.: ROBUSTNESS IN EXPERIMENT DESIGN 861

Section III, the basic tools used in the sequel are presented.Section IV deals with the design of an input signal based ondiffuse prior information. In Section V we study the problem ofdesigning an input spectrum having normalized power whichminimizes a weighted integral of the variance of the frequencyresponse of a model. Section VI revisits the problem of unprej-udiced input design. Section VII presents a simple numericalexample. Finally, Section VIII provides conclusions.

II. SYSTEM DESCRIPTION AND NOTATION

Consider a single-input single-output (SISO) linear systemgiven by

where is a quasi-stationary signal [5], and is azero mean Gaussian white noise sequence with variance . Wedenote the unit delay operator by and assume to be astable minimum phase transfer function with . Tosimplify the notation, we denote by .Given input-output data pairs , a model of

the form

will be estimated. Here the dimension of will specify the orderof the model.We assume that the estimators for and are asymptoti-

cally efficient (e.g., Maximum Likelihood, or PEM for Gaussiandisturbances). Note that this is not a limitation, since there arestandard estimation methods which satisfy this condition. Thisassumption allows us to decouple the problems of experimentdesign and estimation (c.f. [3, Sec. 6.2]).The spectrum of a quasi-stationary signal [5] is de-

fined as

where is the autocovariance of[5], and where .

Notation: If , then , and denote its com-plex conjugate, transpose and complex conjugate transpose, re-spectively. Let and

. The Hardy space of analytic functions on takingvalues on such that isdenoted as [28], [29]. Now define as the spaceof all functions from to having a continuousderivative, and as the space of all continuous func-tions such that for every .In the sequel, quantities with a hat “ ” correspond to estima-

tors (of their respective “unhatted” quantities), which implicitlydepend on the data length, . Covariance expressions are validas [5] (i.e., they are correct up to order1 ).

III. TECHNICAL PRELIMINARIES

The results presented below depend upon a fundamental lim-itation result developed in [24]. For completeness, we state the

1Loosely speaking, this means that all expressions in the sequel which involvevariances, of the form , should be interpreted as

.

main result of [24]. In this section, we assume there is no un-dermodelling, i.e., there exists a such that

and . In addition, andare independently parameterized,2 and the vector of true param-eters is split into two components, i.e., .Under these and some additional mild assumptions [5, Sec. 9.4],the estimate of , , has an asymp-totic variance satisfying

where .Theorem 1: If the parameter vector of , , has dimension, and is parameter identifiable under for the

maximum likelihood method [4],3 then

Proof: See [24].As explained in detail in [24], Theorem 1 shows that it is not

possible to reduce the variance of uniformly at all frequen-cies by choosing a suitable model structure, since if we reducethe variance at some frequencies, it will necessarily increase atothers, thus implying a “water-bed” effect [30]. Additionally,any over-parameterization of results in an increase in the in-tegrated variance of its estimate.The following converse to Theorem 1 will prove useful in the

sequel.Theorem 2: Let be continuous and

even. Also, let be such that

(1)

where . Then, there exists a function suchthat

(2)for every .

Proof: See Appendix A.Theorem 2 shows that if a function satisfies

relation (1), which is similar to the fundamental limitation ofTheorem 1, then it is possible to find a model structure forwith parameters for which is the variance of . In thiscase, the resulting model structure is characterized by , thegradient of with respect to . For instance, given from

2Having an independent parameterization for and means that can besplit into two components, and , such that functionally dependsonly on , and functionally depends only on .3The assumption that is parameter identifiable under for the

maximum likelihood (ML) method means the ML estimator of convergesalmost surely to , where .


Theorem 2, a model structure for whichis4 .From Theorems 1 and 2, we have that (1) provides a complete

characterization of those functions which could correspond tothe variance of .Note that the parameters involved in the fundamental limita-

tions of Theorems 1 and 2 must be evaluated at their true values.Also notice that the assumption that and are continuousand nonzero in Theorem 2 might seem restrictive, since it doesnot allow, for example, multisine inputs. However, it is a stan-dard assumption for the derivation of several variance results,e.g., see [31].

IV. EXPERIMENT DESIGN WITH DIFFUSE PRIOR INFORMATION

The problem of designing a good input signal with diffuseprior information was examined in [21], where results based onLjung’s asymptotic (in the number of data points, and also in thenumber of parameters in the model) variance expression wereobtained. In this section the results of [21] are shown to be valid,even for finite model orders, if we consider the worst case overall model structures of a given order. Again, note that we assumeno undermodelling.Our aim is to design an experiment which is “good” for a very

broad class of systems. This means that we need a measure of“goodness” of an experiment which is system independent.As argued in [16], [32]–[34], absolute variances are not par-

ticularly useful when one wants to design an experiment thatapplies to a broad class of systems. Specifically, an error stan-dard deviation of in a variable of nominal size 1 wouldbe considered to be insignificant, whereas the same error stan-dard deviation of in a variable of nominal size wouldbe considered catastrophic. Hence, it seems highly desirable towork with relative errors (see also [35]–[37]).Rather than look at a single frequency , we will look at an

“average” measure over a range of frequencies. This leads to ageneral measure of the “goodness” of an experiment, given by

(3)

where and

The functions and will be specified later.Essentially is a weighting function that allows one to

specify at which frequencies it would be preferable to obtain agood model (depending on the ultimate use of the model, butnot necessarily on the true system characteristics).In [21] it is argued that and should satisfy the following

criteria.

4 is not the only model structure for which. However, any model structure with such vari-

ance must satisfy locally around , forsome satisfying (2), by definition of the model gradient and the smoothnessof .

A.1) The optimal experiment, , which minimizesin (3), should be independent of the system

and the noise dynamics .A.2) The integrand in (3) should increase if the variance

increases at any frequency. This impliesthat should be a monotonically increasing function.

B) The weighting function should satisfy the following:

for every and every

s.t. we have

Criteria A.1 and A.2 are based on the desire to design aninput signal which is independent of the system and the noisedynamics. Criterion B, on the other hand, is based on the ob-servation that many properties of linear systems depend on theratio of poles and zeros rather than on their absolute locationsin the frequency domain [1], [30], [38]. This implies that if wescale the frequency by a constant, the optimal input must keepthe same shape and simply relocate on the frequency axis, sincethe poles and zeros of the new system will have the same ratiosas before.Note that it is not possible in our framework to consider the

full interval , since, as we will see later, the optimal signalwhich satisfies these criteria in the range is noise,which has infinite power over this range; hence it is unrealizablein practice. However, the assumption that in(3) seems reasonable, since for control design, it is well knownthat knowledge of the plant at low frequencies is unimportant, asthe controllers typically include an integrator which takes careof the steady state behavior of the closed loop. Similarly, it isnot necessary to estimate the high frequency region of , sinceplants are typically low-pass. What is actually required from thecontrol designer in order to use the proposed input signal is afrequency range where the relevant dynamics of the plantare believed to be.Note that Criterion A.1 is not the same as in [21], since we

are considering the worst case of over all possible systemsand model structures (of order ).The purpose of obtaining a robust input with respect to all

model structures comes from the fact that the optimal input typ-ically depends on the gradient of the model with respect to theparameter vector , evaluated at its true value . Therefore, fora nonlinearly parameterized model, even though the user knowsthe model structure (since it is a design variable), the gradienttypically depends on the true system, hence it will be unknownprior to the experiment. Of course, the gradient cannot take anypossible value in for some particular model structures (e.g.,linearly parameterized models, for which it is actually indepen-dent of ). However, in the sense of a fundamental limitation,the results derived in this paper establish a lower bound (and aninput spectrum which achieves it) on the performance of the pa-rameter estimation of the system, even before the selection of amodel structure.The following lemma, from [21], describes how must be

chosen to satisfy Criterion B.


Lemma 1: For , let . Ifsatisfies

(4)

for every and every such that, then there exists a such that for

every .Proof: Since is continuous, we have from (4) that

for . Thus

or, by defining and

By the continuity of , we also have that . Thisproves the lemma.Criteria A.1 and A.2 constrain to have a very particular

form, as shown in the following lemma.Lemma 2: Consider the experiment design problem

(5)

where , ,, is continuously differentiable on , and

for . Let be a stationary point. If does notdepend on , then there exist constants such that

and

,

otherwise.

Proof: See Appendix B.In the following lemma we establish that, for the choice ofgiven in Lemma 2, actually corresponds to the global

optimum of the experiment design problem (5).

Lemma 3: Consider the experiment design problem

where , , is continu-ously differentiable on , and

for . The solution to this problem is given by

,

otherwise.(6)

Proof: See Appendix C.Lemma 3 shows that, under Criteria A.1 and A.2, the optimal

input has to be proportional to the weighting function . Thismeans that the input should excite precisely those frequencieswhere higher model quality is required. This agrees with intu-ition. Notice also that the optimal input does not depend on thenoise spectrum (according to Criterion A.1).By combining Lemmas 1 and 2, Criteria A.1, A.2, and B

imply that, when only diffuse prior knowledge is available aboutthe system and the noise, then a reasonable experiment designproblem can be stated as

Moreover, by Lemma 3, the corresponding optimal input spec-trum is given by

which is bandlimited “ ” noise [16]. This extends the resultsof [21] to finite model orders.The result presented in this section can be explained in the

following way [36]: Practitioners who perform experimentsoften say that step type test signals are good, but typically donot excite high frequencies terms well enough. On the otherhand random signals such as PRBS are also considered good,but typically have too much energy in the high-frequency re-gion. Step type inputs have power spectral density that decaysas whereas random signals have constantpower spectral density. This implies that a signal having powerspectral density that lies somewhere between and a con-stant might be a good open-loop test signal. This suggestsnoise (over a limited bandwidth) as a possible good choice.


Examples which show the good performance of bandlimited“ ” noise as a first experiment when compared with othertypical input signals, such as bandlimited white noise or an op-timally designed input (based on perfect knowledge of the plantand noise properties), have been presented by the coauthors inseveral publications, e.g., [16], [21], [33], [36], [39], and [40].Remark 1: It is important to notice that the results of this

section obviously do not imply that bandlimited “ ” noiseis the optimal input signal under every possible circumstance.The optimality of this signal has been established for the casewhen there is a lack of prior knowledge. If, for instance, thefrequency response of the system were known to contain peaksin some frequency regions, then it is reasonable to incorporatethis prior knowledge into the optimization problem of Lemma2. This has already been discussed in [21], where it is notedthat the results of this section resemble the development of thePrinciple of Maximum Entropy as given in [41] and [42], wherethe same issue regarding the incorporation of prior knowledgearises.Remark 2: Since we are considering the case where there is

very little information about the plant, we cannot expect the op-timal input, i.e., bandlimited “ ” noise, to have spectacularperformance compared to a carefully designed input based onfull knowledge of the plant. The input signal we have proposedis designed to be used as the first experiment on the plant, inorder to determine its main features. As performance require-ments on the closed loop are increased, more experiments canbe performed on the plant, from which we can obtain a bettermodel, based on which one can design a better experiment andso on.

V. MIN-MAX ROBUST EXPERIMENT DESIGN

Here we utilize the results of Section III to analyse theproblem of designing an input signal which is robust, in anintegrated variance sense, against all possible model structures(and also the true values of the system parameters). This anal-ysis will then be used in the next section to design input signalswhich are optimally robust, in terms of both bias and varianceerrors, for a particular application.Theorem 3: Consider the experiment design problem:5

where and

5Note that the input power has been normalized to be less than or equal to 1.When the input power is constrained to be below some other value, it suffices,for the problems considered in this paper, to scale the optimal solution to satisfythat constraint. For other kinds of experiment design problems, the reader isreferred to [22] which provides methods to renormalize the optimal input.

for . The solution of this problem is given by

(7)

and the optimal cost is

Proof: See Appendix D.Theorem 3 gives the solution to a robust experiment design

problem. The nonrobust version of that optimization problem(i.e., without the maximization with respect to ) is a very stan-dard problem in experiment design. It was probably first studiedby Ljung in [31], where several choices for were considered,depending on the specific application of the model to be esti-mated. For example, if the purpose of the model is simulation,then could be chosen as

where is the spectrum of the input to be used during thesimulation; if the model is to be used for (one step ahead)prediction, then the choice should be

where is the spectrum of the input to be used during theprediction stage, and can be taken as an initial estimate of thenoise spectrum. The interested reader is referred to [12], [31],and [5, Sec. 12.2], where these choices are studied in detail.The original nonrobust version of the problem, studied in

[31], has a nice explicit closed-form solution for the case whenboth the number of samples and the model order are very large.Solutions for models having a finite number of parameters canbe obtained, in general, only numerically, by using convex op-timization techniques [43].Theorem 3 shows that it is possible to obtain analytic expres-

sions for the robust version of the experiment design problem,which are valid even for models with a finite number of pa-rameters (but which are still asymptotic in sample size). Theoptimal solution in this case, (7), also has a very nice interpre-tation: it is such that the frequency-wise signal-to-noise ratio,

, is proportional to the weighting function . Hence, itputs more power at those frequencies where the noise power ishigh and where a better model is required. Also notice that (7)does not explicitly depend on the number of parameters of themodel structures considered. Hence it is optimal for every fixed. Finally, note that, due to the robustification of the experimentdesign problem (by considering the maximum over all modelstructures having parameters), the optimal spectrum does notdepend (explicitly) on the true system.6

6The optimal spectrum (7) might still depend on the true system through .


VI. UNPREJUDICED INPUT DESIGN FOR FINITE MODEL ORDER

Finally we consider the unprejudiced optimal input designproblem. It is known, [23], that prejudice is introduced into theparameter estimation problem due to the assumption that thesystem of interest belongs to a limited set of models. This hasbeen addressed in [23], where Yuan and Ljung develop a frame-work for reducing the effect of this prejudice for experiment de-sign. This is accomplished in two ways: a) by including a biaseffect explicitly in the cost and b) by using an asymptotic vari-ance expression in the cost.Utilizing fundamental limitations on the variance [24], we

revisit the approach in [23] and develop an unprejudiced optimalinput for finite-order models.The result of the min max robust experiment design problem

in the previous section is used to obtain an improved unprej-udiced open-loop input design, in the sense of Yuan and Ljung[23]. First we recall the concept of an unprejudiced input design.The experiment design problem considered in [23] is of the

form

where , and undermodelling, i.e., bias in canexist. To solve this problem, the mean square error in the esti-mation of can be decomposed into bias and variance terms

where almost surely. This de-composition holds asymptotically in , in the sense that for fi-nite , the bias term should consider instead of thelimit estimate . This approximation, however, allowsfurther simplifications in the calculation of the optimal experi-ment. Minimization of the bias term leads to the following so-lution [5], [23]:

(8)

where almost surely, andis a normalization constant. Notice that this solution is indepen-dent of both and .With respect to the variance term, an asymptotic (in model

order) variance expression is used in [23], which is minimizedby the following input spectrum:

where is a normalization constant. Note that the asymp-totic (in model order) variance expression [31] used to developthis equation for the input spectrum does not consider the effectof bias.

In order to reconcile both expressions for , is con-sidered as a prefilter (designed by the user), such that

where . This solution is dimensionally inconsistent, sinceit constrains the noise prefilter to be proportional to the squareroot of the true noise spectrum, creating a paradox.This paradox arises due to the use of an asymptotic (in model

order) variance expression, which only holds approximately formodel sets with a shift structure [5, Sec. 9.4].To solve this dilemma, we consider the following experiment

design problem:

where is the set of all stable model structures with param-eters, i.e., is differentiablein the connected open set for all and

for all .In this problem formulation we consider the worst case of

the (weighted) mean square error over all model structures ofa given order. Again, the cost function can be decomposed intoboth bias and variance terms. The bias term is minimized by (8),since the solution is independent of and . This implies thattaking the supremum over all model structures in does notaffect the previous solution. The argument is formalized in thefollowing theorem.Theorem 4 (Optimality of Dominant Strategies): Let

be an arbitrary function, where and are arbitrarysets. Assume that there exists an such that

Then

Therefore is an optimal solution of the min-max problem7

.Proof: By definition of the infimum of a function, we have

that

(9)

On the other hand, by the definition of the supremum,

7In game-theoretical terms [44], Theorem 4 establishes that a dominatingstrategy is an equilibrium strategy.


Fig. 1. (Left) Magnitude Bode plot of , from Section VII. (Right) Normalized variance of , as a function of the parameter , obtained from an experimentbased on a sinusoid of unit power and frequency (red solid), and from an experiment based on bandlimited “ ” noise of unit power localized betweenthe frequencies 0:01 and 1 (blue dotted).

Thus, by taking the infimum over , we obtain

(10)

Since (10) holds for every , we can take the supremumover this quantity, which gives [45, Lemma 36.1]

(11)

Combining (9) and (11) and replacing inf by min (since the in-fimum is attained with ) concludes the proof.For the variance term, we consider the true asymptotic (in

sample size) variance expression

(12)

which is asymptotic only in . Notice, however, that we are stillnot considering the effect of bias on the variance of .The variance term, based on expression (12), corresponds ex-

actly to the min-max robust optimal experiment design problemconsidered in the previous section; hence the solution (fromTheorem 3) is

(13)

where is a normalization constant.Remark 3: Notice that (13) and(8) can be naturally combined

by letting !Just as in the robust experiment design problem considered

in Theorem 3, the optimal input obtained here has a nice inter-pretation, namely it is chosen such that the signal-to-noise ratiois proportional, at each frequency, to the weighting function .

VII. NUMERICAL EXAMPLE

Consider the following discrete-time linear time-invariantsystem of second order:

Notice that , and for , is highly reso-nant, with a peak at . The magnitude Bode plot ofis shown in Fig. 1.In order to verify the results of Section IV, let us consider a

model of the form

where is known, andwe need to estimate (whose true value is). The output measurements of system are contam-

inated with white noise of variance . The information matrixfor is

The maximum of is achieved by choosing, where

and is chosen to satisfy an input power constraint. If, then , whichmeans that the optimal input should

be a sinusoid of frequency approximately equal to the resonancefrequency of . Furthermore, as , the shape of


Fig. 2. (Left) Magnitude Bode plot of (red solid), and the spectrum (blue dotted). (Right) Magnitude Bode plots of 50 models estimated from experimentsbased on a white noise input (red dotted), a “ ” noise input (blue dashed), and the unprejudiced optimal input (green solid); for comparison, the Bode plot of

has also been included (yellow solid).

becomes sharper; hence missing the value of (which dependson the true value of ) may cause a huge performance degra-dation for . For example, let ,and . The variance of obtained from an experimentbased on a sinusoid of frequency (instead of ) ofunit power, as a function of the true value of , is shownin Fig. 1. In the same figure the normalized (i.e., multiplied by) variance obtained using “ ” noise (in the frequency range

[0.01, 1]) of unit power, is presented. As it can be seen, the signalproposed in Section IV has superior robustness properties com-pared to the nominal optimal input, since its resulting variance isless sensitive to the knowledge of than with the latter signal.In fact, the maximum variance obtained with the sinusoid inputis , while the maximum variance obtained with“ ” noise is just .The advantages of “ ” noise become apparent in situations

where there is lack of enough prior knowledge to design an“optimal” experiment and it is required to obtain a good “gen-eral-purpose” model for control. However, when there is a moreprecise specification of the final application for the model, theresults of Section VI become relevant.Say that a model of is required for simulating the output

of the system when the input has a spectrum given by

To study the properties of the inputs used during the identifica-tion in the presence of undermodelling, the following first-ordermodel structure is considered:

where . According to the result of Section VI,the unprejudiced optimal input to use has a spectrum pro-portional to . The results of 50 Monte Carlo simulations(with ) are presented in Fig. 2, where the inputsignals considered are white noise, “ ” noise (with the samecharacteristics as before) and the unprejudiced optimal input,normalized to unit power. As it can be seen from the figure,

TABLE IMEAN PERFORMANCE OF THE SIMULATION EXPERIMENT

FOR DIFFERENT INPUT SIGNALS

none of the identified models can appropriately capture theshape of . However, the models estimated using the unprej-udiced optimal input give a slightly better fit in the relevantfrequency range of . This reasoning is corroborated byTable I, which shows the mean performance of the experiments,

, obtained byMonte Carlo averages. This table reveals the benefits of usingthe unprejudiced optimal input obtained in Section VI.8

VIII. CONCLUSIONS

In this paper we have introduced a variational approach to ro-bust experiment design. Based on a fundamental limitation onthe variance of parametric estimated models, a closed form ex-pression is developed for several experiment design problemswhere the variance of the frequency response model is maxi-mized over all model structures of a given finite order. In partic-ular, we have revisited the problem of experiment design withdiffuse prior information, i.e., where an input spectrum is de-signed which is independent of the true system and the noise dy-namics. We have also studied the problem of unprejudiced inputdesign, following Yuan and Ljung’s formulation. Both problemshave been investigated in the literature, however the approachof the current paper leads to results which are valid, not only forhigh order models, but also for models of finite order.

8From Table I it might seem that “ ” noise is not a good input signal inthis case. However, the derivation of such an input was based on the assumptionthat we need a good “general-purpose” model in a given frequency range. In thesimulation example, we ask for a model which is better at high frequencies thanat low ones (because of ). “ ” noise has less power at high frequenciesthan white noise or the unprejudiced optimal input; hence it is expected to giveworse performance in this example.


APPENDIX APROOF OF THEOREM 2

To prove Theorem 2, we first establish the following lemma.Lemma 4 (Uniform Approximation of the Variance): Let

be continuous and even. Also, letbe such that

where . Then, for every there exists a vector-valued polynomial in , , such that

Proof: The idea is to approximate by a piecewise con-stant vector , then by a piecewise linear (continuous) vector, and finally by a trigonometric polynomial vector (using

Theorem 5 of Appendix E).In order to simplify the problem, we define the following:

It can be readily seen that Lemma 4 would follow from estab-lishing the existence of a function such that

(14)

Let such that whenever(for ). According to Lemma

5 of Appendix E, there are orthonormal vectors suchthat9

(15)

where10

9The condition that in Lemma 5 is satisfied if we choose largeenough, since the right side of (15) converges uniformly (in ) to 0 as .10The term in (15) is due to the requirement that

.

Thus, if we define the function byfor and

, then it holds that

where is the Kronecker Delta function, and hence

Now, for every ,

Thus

(16)

since

Let be a continuous function such thatfor all , and

(17)


Here we replace , for a given , by a piecewise linearfunction such that

and for every , thelater being possible since whenever

. Thus, we can choose small enough toensure that (17) holds.Finally, since is contin-

uous with respect to the uniform norm on in a neigh-borhood of , by Theorem 5 of Appendix E, there exists a(vector-valued) trigonometric polynomial

with for , such that

(18)

for every . Therefore, the function given bysatisfies (14), as can be seen by combining

(16)–(18).Proof of Theorem 2: We first outline the steps in the proof,

and then elaborate on each step:1) Construct a sequence of functions in , , usingLemma 4.

2) Show that this sequence is a normal family, from which itfollows that it has a subsequence converging to, say, .

3) Establish that the function

is continuous in a neighborhood of , hence proving thatsatisfies the conditions of the theorem.

The details of each step are as follows:Step 1) Construction of a sequence

To proceed, we use Lemma 4 to construct a sequenceof functions in , , such that

Since the ’s are polynomials in , they are an-alytic in the set , and,

in particular, are bounded in this set. This, togetherwith the fact that

is invariant under scaling of its argument, impliesthat we can assume

(19)

Furthermore, by applying a suitable constant unitarylinear transformation to each , we can further as-sume that

(20)

where for every .Step 2) A converging subsequence in

From Theorem 6 of Appendix E, it follows thatis uniformly bounded (by 1) on .

Therefore, by Theorem 7 of Appendix E we havethat is a normal family in , i.e., thereexists a subsequence which convergesuniformly on compact subsets of . Letbe the limit of this subsequence. Note that isanalytic in by Theorem 8 of Appendix E, andbelongs to due to .Since is compact, uniformlyin .

Step 3) Continuity of the variance expressionThe function

is continuous in a neighborhood of if

Therefore, we have

Thus, in order to show that satisfies the con-dition of the Theorem, we need to show that


(where has been defined in(20)). This can be seen from the expression

for and , where uniformly inas . Therefore, by maximizing over and

letting , we obtain

This implies

Otherwise (19) would not hold. This completes theproof.

APPENDIX BPROOF OF LEMMA 2

By Theorems 1 and 2, the experiment design problem isequivalent to

(21)

The idea here is that every gives rise to a variancewhich satisfies the integral constraint estab-

lished in Theorem 1, and conversely, everywhich satisfies the integral constraint can be related, by The-orem 2, to at least11 one . Therefore, the maximizationwith respect to can be replaced by a maximization withrespect to (imposing the integral constraint ofTheorem 1).Let and be fixed, and assume that exists. This

problem can now be solved using standard tools from calculusof variations [46].

11The possibility of havingmore than one associated with the same varianceis not an issue here, since the cost function of the experiment design problem

depends on only through .

The Lagrangian of problem (21) is

where and are Lagrange multipliers. By [46, Sec. 7.7,Theorem 2], there exist constants for which

, the solution of (21), is a stationary point of.

Thus, for every we have that, where

is the Fréchet differential of at with increment. This means [46, Sec. 7.5] that

Thus, by [46, Sec. 7.5, Lemma 1]

(22)

From (22) we have that . Thus,substituting this into the first equation of (22), and letting

, we have

(23)

The left side of (23) depends on (through ), but the right sidedoes not (due to the assumption of the independence of to). Thus, both sides are equal to a constant, say , which

implies that

(24)

Now, integrating both sides with respect to betweenand , we obtain

(25)

for a constant .On the other hand, we have that

(26)


Hence is proportional to in , and can bemade equalto 0 outside this interval. This concludes the proof.

APPENDIX CPROOF OF LEMMA 3

To simplify the development, we extend to a periodicfunction of period in by making it equal to 0 in

. Then, as in the proof of Lemma 2, the experiment designproblem is equivalent to

Let be fixed. The cost function can then be written as

where is a constant, independent of , given by

Note that, due to the constraint on , should satisfy

(27)

Let , where is chosen to satisfy (27), and letbe any function in which satisfies (27). Since

for all , with equality if and only if ,we have

(28)

with equality if and only if . This implies that, for agiven , we have that

where the supremum is taken over all satisfying the integralconstraint on the experiment design problem, and is givenby

Now, take as in (6). Then, following a similar derivationto that in (28), we have

with equality if and only if . This proves that isthe optimal solution of the experiment design problem.

APPENDIX DPROOF OF THEOREM 3

As in the proof of Lemma 2, by Theorems 1 and 2, the exper-iment design problem is equivalent to

(29)


Note that we have changed the sign in the input power con-straint to an equality, since it is an active constraint.We now fix and define

Then, problem (29) for becomes

This is a mass distribution problem (e.g., see of [16, eq. (17)]);hence the optimal cost is

Now, if it were not true that for almost every, as defined in (7), then for some

. Otherwise, in a set of positivemeasure, which implies that

thus contradicting the constraint on . Therefore

and the cost is minimized with .

APPENDIX EADDITIONAL RESULTS

Lemma 5 (Lieb’s Lemma): Let be a monotonicallynonincreasing sequence in [0, 1] such that. Then there exist orthonormal elements of ,

, such that for all.

Proof: See lemma from [47, p. 458].

Theorem 5 (Weierstrass’ Second Theorem): Ifthen, for every there is a (vector-valued) trigonometricpolynomial

with , such that forevery .

Proof: This is a simple extension of [48, Ch. I, Th. 21] tovector-valued functions on .Theorem 6 (Maximum Modulus Theorem): Let be an an-

alytic function in a region , which contains a closed disk ofradius and center . Then

(30)

Moreover, equality in (30) occurs if and only if is constantin .

Proof: See [49, Th. 10.24].Theorem 7 (Montel’s Theorem): Let be a set of analytic

functions in a region . Then is uniformly bounded on eachcompact subset of (or, equivalently, locally bounded), if andonly if is a normal family, i.e., every sequence of functions incontains a subsequence which converges uniformly on com-

pact subsets of .Proof: See [50, Th. 2.9].

Theorem 8 (Uniform Convergence of Analytic Functions):Let be a sequence of analytic functions in a region ,which converges to a function uniformly on compact subsetsof . Then is also analytic in , and converges touniformly on compact subsets of .Proof: See [49, Th. 10.28].

REFERENCES

[1] G. C. Goodwin, S. F. Graebe, and M. E. Salgado, Control System De-sign. Upper Saddle River, NJ: Prentice-Hall, 2001.

[2] P. Eykhoff, System Identification: Parameter and State Estimation.Hoboken, NJ: Wiley, 1974.

[3] G. C. Goodwin and R. L. Payne, Dynamic System Identification: Exper-iment Design and Data Analysis. New York: Academic Press, 1977.

[4] T. Söderström and P. Stoica, System Identification. Hertfordshire,U.K.: Prentice-Hall, 1989.

[5] L. Ljung, System Identification: Theory for the User, 2nd ed. UpperSaddle River, NJ: Prentice-Hall, 1999.

[6] R. Pintelon and J. Schoukens, System Identification: A Frequency Do-main Approach. New York: IEEE Press, 2001.

[7] D. R. Cox, Planning of Experiments. Hoboken, NJ: Wiley, 1958.[8] O. Kempthorne, Design and Analysis of Experiments. Hoboken, NJ:

Wiley, 1952.[9] V. V. Fedorov, Theory of Optimal Experiments. New York: Aca-

demic Press, 1972.[10] R. K. Mehra, “Optimal input signals for parameter estimation in dy-

namic systems—Survey and new results,” IEEE Trans. Autom. Con-trol, vol. AC-19, no. 6, pp. 753–768, Dec. 1974.

[11] M. Zarrop, Optimal Experiment Design for Dynamic System Identifi-cation. Berlin, Germany: Springer, 1979.

[12] M. Gevers and L. Ljung, “Optimal experiment designs with respectto the intended model application,” Automatica, vol. 22, no. 5, pp.543–554, 1986.

[13] H. Hjalmarsson, “From experiment design to closed-loop control,” Au-tomatica, vol. 41, no. 3, pp. 393–438, Mar. 2005.

[14] J. C. Agüero and G. C. Goodwin, “Choosing between open- and closed-loop experiments in linear system identification,” IEEE Trans. Autom.Control, vol. 52, no. 8, pp. 1475–1480, Aug. 2007.


[15] X. Bombois, G. Scorletti, M. Gevers, P. Van den Hof, and R.Hildebrand, “Least costly identification experiment for control,”Automatica, vol. 42, no. 10, pp. 1651–1662, Oct. 2006.

[16] C. R. Rojas, J. S.Welsh, G. C. Goodwin, and A. Feuer, “Robust optimalexperiment design for system identification,” Automatica, vol. 43, no.6, pp. 993–1008, 2007.

[17] J. Mårtensson and H. Hjalmarsson, “Robust input design using sumof squares constraints,” in 14th IFAC Symp. System Identification(SYSID’06), Newcastle, Australia, 2006, pp. 1352–1357.

[18] H. Hjalmarsson, J. Mårtensson, and B.Wahlberg, “On some robustnessissues in input design,” in Proc. 14th IFAC Symp. System Identification,Newcastle, Australia, 2006.

[19] J. Mårtensson and H. Hjalmarsson, “How to make bias and varianceerrors insensitive to system and model complexity in identification,”IEEE Trans. Autom. Control, vol. 56, no. 1, pp. 100–112, Jan. 2011.

[20] J. Mårtensson, “Geometric analysis of stochastic model errors insystem identification,” Ph.D. dissertation, Royal Inst. Technol. (KTH),Stockholm, Sweden, Oct. 2007, tRITA-EE 2007:061.

[21] C. R. Rojas, G. C. Goodwin, J. S. Welsh, and A. Feuer, “Optimal ex-periment design with diffuse prior information,” in Proc. Eur. ControlConf. (ECC), Kos, Greece, Jul. 2007, pp. 935–940.

[22] C. R. Rojas, J. C. Agüero, J. S. Welsh, and G. C. Goodwin, “On theequivalence of least costly and traditional experiment design for con-trol,” Automatica, vol. 44, no. 11, pp. 2706–2715, 2008.

[23] Z. D. Yuan and L. Ljung, “Unprejudiced optimal open loop input designfor identification of transfer functions,” Automatica, vol. 21, no. 6, pp.697–708, Nov. 1985.

[24] C. R. Rojas, J. S. Welsh, and J. C. Agüero, “Fundamental limitationson the variance of estimated parametric models,” IEEE Trans. Autom.Control, vol. 54, no. 5, pp. 1077–1081, May 2009.

[25] J. S. Welsh and G. C. Goodwin, “Finite sample properties of indirectnonparametric closed-loop identification,” IEEE Trans. Autom. Con-trol, vol. 47, no. 8, pp. 1277–1292, Aug. 2002.

[26] T. Zhou, “Frequency response estimation for NCFs of an MIMOplant from closed-loop time-domain experimental data,” IEEE Trans.Autom. Control, vol. 51, no. 1, pp. 38–51, Jan., 2006.

[27] M. C. Campi and E. Weyer, “Guaranteed non-asymptotic confidenceregions in system identification,” Automatica, vol. 41, pp. 1751–1764,2005.

[28] P. L. Duren, Theory of Spaces. San Diego, CA: Academic Press,1970.

[29] P. Koosis, Introduction to Spaces, 2nd ed. Cambridge, U.K.:Cambridge Univ. Press, 1998.

[30] M. M. Seron, J. H. Braslavsky, and G. C. Goodwin, Fundamental Lim-itations in Filtering and Control. Berlin, Germany: Springer-Verlag,1997.

[31] L. Ljung, “Asymptotic variance expressions for identified black-boxtransfer function models,” IEEE Trans. Autom. Control, vol. AC-30,no. 9, pp. 834–844, Sep. 1985.

[32] G. C. Goodwin, J. S. Welsh, A. Feuer, and M. Derpich, “Utilizing priorknowledge in robust optimal experiment design,” in Proc. 14th IFACSYSID, Newcastle, Australia, 2006, pp. 1358–1363.

[33] G. C. Goodwin, C. R. Rojas, and J. S. Welsh, “Good, bad and optimalexperiments for identification,” in Forever Ljung in System Identifica-tion—Workshop on the Occasion of Lennart Ljung’s 60th Birthday, T.Glad, Ed., Sep. 2006.

[34] J. S. Welsh, G. C. Goodwin, and A. Feuer, “Evaluation and comparisonof robust optimal experiment design criteria,” in Proc. Amer. ControlConf., Minneapolis, MN, 2006, pp. 1659–1664.

[35] G. C. Goodwin, J. I. Yuz, and J. C. Agüero, “Relative error issues insampled data models,” in IFAC World Congress, Seoul, Korea, 2008.

[36] G. C. Goodwin, J. C. Agüero, J. S. Welsh, J. I. Yuz, G. J. Adams, andC. R. Rojas, “Robust identification of process models from plant data,”J. Process Control, vol. 18, no. 9, pp. 810–820, 2008.

[37] J. C. Agüero, G. C. Goodwin, T. Söderström, and J. I. Yuz, “Sampleddata errors-in-variables systems,” in Proc. 15th IFAC Symp. SystemIdentification, Saint Malo, France, 2009, pp. 1157–1162.

[38] H. W. Bode, Network Analysis and Feedback Amplifier Design. NewYork: Van Nostrand, 1945.

[39] G. Goodwin, J. Welsh, A. Feuer, and M. Derpich, “Utilizing priorknowledge in robust optimal experiment design,” in Proc. 14thIFAC Symp. System Identification, Newcastle, Australia, 2006, pp.1358–1363.

[40] C. R. Rojas, “Robust experiment design,” Ph.D. dissertation, The Univ.Newcastle, Callaghan, Australia, Jun. 2008.

[41] J. E. Shore and R. W. Johnson, “Axiomatic derivation of the principleof maximum entropy and the principle of minimum cross-entropy,”IEEE Trans. Inf. Theory, vol. IT-26, no. 1, pp. 26–37, Jan. 1980.

[42] J. Skilling, “The axioms of maximum entropy,” in Maximum-Entropyand Bayesian Methods in Science and Engineering, G. J. Erickson andC. R. Smith, Eds. Dordrecht, The Netherlands: Kluwer Academic,1988, vol. 1, pp. 173–187.

[43] H. Jansson and H. Hjalmarsson, “Input design via LMIs admitting fre-quency-wise model specifications in confidence regions,” IEEE Trans.Autom. Control, vol. 50, no. 10, pp. 1534–1549, Oct. 2005.

[44] S. Karlin, Mathematical Methods and Theory in Games, Programming,and Economics. Reading, MA: Addison-Wesley, 1959.

[45] R. T. Rockafellar, Convex Analysis. Princeton, NJ: Princeton Univ.Press, 1970.

[46] D. G. Luenberger, Optimization by Vector Space Methods. Hoboken,NJ: Wiley, 1969.

[47] E. H. Lieb, “Variational principle for many-fermion systems,” Phys.Rev. Lett., vol. 46, no. 7, pp. 457–459, Feb. 1981.

[48] N. I. Achieser, Theory of Approximation. New York: FrederickUngar, 1956.

[49] W. Rudin, Real and Complex Analysis, 3rd ed. New York: McGraw-Hill, 1987.

[50] J. B. Conway, Functions of One Complex Variable, 2nd ed. NewYork: Springer-Verlag, 1978.

Cristian R. Rojaswas born in 1980. He received theM.S. degree in electronics engineering from the Uni-versidad Técnica Federico Santa María, Valparaíso,Chile, in 2004, and the Ph.D. degree in electrical en-gineering from The University of Newcastle, New-castle, Australia, in 2008.Since October 2008, he has been with the Royal

Institute of Technology, Stockholm, Sweden,where he is currently an Assistant Professor ofthe Automatic Control Laboratory, School ofElectrical Engineering. His research interest is in

system identification.

Juan-Carlos Agüero (S’02–M’04) was born in Os-orno, Chile. He received the professional title of In-geniero civil electrónico and a Master of engineeringfrom the Universidad Técnica Federico Santa María,Chile, in 2000, and the Ph.D. degree from The Uni-versity of Newcastle, Callaghan, Australia, in 2006.He gained industrial experience from 1997 to

1998 in the copper mining industry at El Teniente,Codelco, Chile. He is currently working as a Re-search Academic at The University of Newcastle.His research interests include system identification

and control.

James S. Welsh (S’02–M’03) was born in Maitland,Australia, in 1965. He received the B.E. degree(Hons. I) in electrical engineering in 1997 andthe Ph.D. degree, which studied ill-conditioningproblems arising in system identification, in 2004,both from The University of Newcastle, Callaghan,Australia.During the last several years, he has been actively

involved in research projects at the Centre for Com-plex Dynamic Systems and Control, The Universityof Newcastle, including Powertrain Control, Model

Predictive Control, and System Identification Toolboxes. His research interestsinclude auto-tuning, system identification, and process control. He is currently aSenior Lecturer in the School of Electrical Engineering and Computer Science,The University of Newcastle.


Graham C. Goodwin (F’86) received the B.Sc.(physics), B.Eng. (electrical engineering), and Ph.D.degrees from the University of New South Wales,Sydney, Australia.He is currently Professor Laureate of Elec-

trical Engineering at the University of Newcastle,Callaghan, Australia and is Director of The Uni-versity of Newcastle Priority Research Centre forComplex Dynamic Systems and Control. He holdsHonorary Doctorates from the Lund Institute ofTechnology, Sweden and the Technion—Israel

Institute of Technology. He is the coauthor of eight books, four edited books,and many technical papers.Dr. Goodwin is the recipient of the Control Systems Society 1999 Hendrik

Bode Lecture Prize, a Best Paper Award by the IEEE TRANSACTIONS ONAUTOMATIC CONTROL, a Best Paper Award by the Asian Journal of Control,and two Best Engineering Text Book Awards from the International Federationof Automatic Control in 1984 and 2005. In 2008 he received the QuazzaMedal from the International Federation of Automatic Control, and in 2010he received the IEEE Control Systems Award. He is an Honorary Fellow ofInstitute of Engineers, Australia; a Fellow of the International Federation ofAutomatic Control; a Fellow of the Australian Academy of Science; a Fellowof the Australian Academy of Technology, Science and Engineering; a Memberof the International Statistical Institute; a Fellow of the Royal Society, London;and a Foreign Member of the Royal Swedish Academy of Sciences.

Arie Feuer (F’04) received the B.Sc. and M.Sc.degrees from the Technion—Israel Institute of Tech-nology, Haifa, Israel, in 1967 and 1973, respectivelyand the Ph.D. degree from Yale University, NewHaven, CT, in 1978.He has been with the Electrical Engineering De-

partment, Technion since 1983 where he is currentlya Chaired Professor. From 1967 to 1970 he was inindustry working on automation design and between1978 and 1983 with Bell Labs in Holmdel, N.J. Inthe last 22 years he has been regularly visiting the

EE&CS department at the University of Newcastle. His current research in-terests include: medical imaging—in particular, ultrasound and CT; resolutionenhancement of digital images and videos; 3-D video and multiview data; sam-pling and combined representations of signals and images; adaptive systems insignal processing and control; and applications of vector quantization.Dr. Feuer received in 2009 an honorary doctorate from the University of New-

castle. Between the years 1994 and 2002 he served as the President of the Is-rael Association of Automatic Control and was a member of the IFAC Councilduring the years 2002–2005.

Documents

860 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, … · 860 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 4, APRIL 2012 Robustness in Experiment Design Cristian R. Rojas,