
Network of Excellence
Objective ICT-2009.3.5 Engineering of Networked Monitoring and Control Systems

FP7 – ICT-257462 HYCON2
Highly-complex and networked control systems

Starting date: 1st September 2010. Duration: 4 years.

Deliverable number: D.1.1.1
Title: Filtering, estimation and system identification methods for complex large scale systems: State of the art, potential synergies and limitations
Workpackage: 1
Due date: M12
Actual submission date: 2011-08-15
Organisation name(s) of lead contractor for this deliverable: KTH
Author(s): Håkan Hjalmarsson, with the help of Alessandro Chiuso
Nature: Report

Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013)

Dissemination Level: PU (Public)

Executive Summary

In this report we review the status of system identification and estimation for complex large scale systems. We stress the importance of understanding the fundamental limitations and of quantifying the experimental cost required to obtain an adequate model. For networked systems, we point out the essential methodologies and outline a statistical basis for the design and analysis of such networks from an estimation perspective. A key issue in all real-world system identification problems is that of model structure selection. This issue becomes increasingly accentuated as system complexity grows. We point out that new techniques are becoming available from the areas of sparse estimation and statistical learning theory.

1 Introduction

Many of the engineering challenges that society is facing are expected to be addressed by model-based design and optimization. Smart power grids, hybrid and electrical vehicles and intelligent infrastructures are examples where this technology will be essential. The increasing complexity of engineering systems is introducing new challenges in terms of modeling and estimation. It is highly desirable that these systems are able to operate reliably and autonomously in changing environments. This implies that these systems have to be equipped with “learning” abilities, i.e. they need to be able to infer important properties of their environment from a variety of sensors and to adapt their behaviour to the state of the environment and the requirements of the application. This implies the ability to adapt the underlying models, using sensor information. In order to determine when to learn, a system must also be able to assess its uncertainty about the surrounding environment and its own state. It is expected that for the learning to be effective, these systems need to be able to take active measures, i.e. use actuators to ensure that the required information becomes available.

Several of these systems are spatially highly distributed, cf. power grids which nowadays extend far beyond country limits, and it cannot be expected that information from all parts of the system can be communicated and processed at a single physical unit. Wireless sensors are attractive in many contexts but suffer from communication limitations.

With systems being decentralized and typically containing many actuators, sensors, states and non-linearities, but with limited access to sensor information, model building that delivers models of sufficient fidelity becomes very challenging. Existing reports on the cost and complexity of the modeling stage of large engineering projects involving dynamical models seem to be confined to the process industry. It is noted in [83] that obtaining the process model is the single most time-consuming task in the application of model-based control, and [56] reports that three quarters of the total costs associated with advanced control projects can be attributed to modeling. This suggests that modeling risks becoming a serious bottleneck in future engineering systems unless major theoretical advances are made.

The objective of this report is to outline the state of the art in the fields of system identification, filtering and estimation pertinent to complex systems. In particular, fundamental limitations and synergies with other fields will be examined.

2 Fundamental limitations

    2.1 The Cramér-Rao lower bound

The framework of stochastic models has proven to be extremely powerful for developing useful signal processing and estimation algorithms. In this framework the Cramér-Rao lower bound (CRLB) provides a lower bound on the achievable accuracy in parameter estimation. Let y be a random vector with probability density function (pdf) $p_\theta(\cdot)$ where $\theta \in \mathbb{R}^n$. The CRLB provides a lower bound for any unbiased estimator $\hat\theta$ of $\theta$ that is based on y. Subject to certain regularity conditions (see [63]), the covariance matrix of the parameter estimate is lower bounded by

$$E\left[(\hat\theta - \theta)(\hat\theta - \theta)^T\right] \geq I_F^{-1}(p_\theta) \qquad (1)$$

where $I_F(p_\theta)$ is the Fisher information matrix

$$I_F(p_\theta) = -E\left[\frac{\partial^2}{\partial\theta\,\partial\theta^T} \log p_\theta(y)\right] \qquad (2)$$

Under certain conditions the CRLB is achieved asymptotically (as the number of measurements grows) by the maximum likelihood (ML) estimate:

$$\hat\theta_{ML} = \arg\max_\theta\, p_\theta(y) \qquad (3)$$

The CRLB also translates into lower bounds on the achievable accuracy of system properties that are functions of the model parameters. It also implies performance bounds for model-based applications employing identified models. Let $\hat\theta_N \in \mathbb{R}^n$ be the estimated model parameters (we denote the true value by $\theta_o$ and N denotes the sample size) and let $V_{pd}(\theta)$ be a measure of the degradation in performance in the application that occurs when the model corresponding to $\theta$ is used instead of the true system in the design of the application. For example, suppose an incorrect model is used in a model-based control design and the resulting controller is then applied to the true system. Then $V_{pd}$ could be some norm of the difference between the achieved closed-loop transfer function (using the model in the design) and the ideal closed-loop transfer function (using the true system in the design). Now consider the expected performance degradation $E[V_{pd}(\hat\theta_N)]$. The asymptotic version of the CRLB for dynamical models then implies (without loss of generality, we assume that $V_{pd} \geq 0$ with minimum $V_{pd}(\theta_o) = 0$)

$$\lim_{N\to\infty} N\, E[V_{pd}(\hat\theta_N)] \geq \frac{1}{2}\,\mathrm{Tr}\left\{V''_{pd}(\theta_o)\, I_F^{-1}\right\} \qquad (4)$$

where $V''_{pd}(\theta_o)$ denotes the Hessian of $V_{pd}$ evaluated at $\theta = \theta_o$. With some abuse of notation, we will call this inequality the CRLB (for the performance index). It is thus important to understand how the experiment configuration (input excitation etc.), system and model complexity, and model structure, as well as the property of interest (through the performance index $V_{pd}$), influence the performance of the application in which the model is used. However, largely due to the matrix inverse of $I_F$, it is not trivial to analyze these dependencies and extensive matrix manipulations are often required. Understanding the nature of the CRLB has been the subject of rather intense research for a range of problem settings, including frequency function estimation [67], multi-input identification [41], and Hammerstein systems [81]. In order to facilitate the analysis of the CRLB, a geometric method has been developed recently [54]. The idea is as follows. Suppose that the Fisher information is given by

$$I_F = \langle \Psi, \Psi \rangle := \frac{1}{2\pi} \int_{-\pi}^{\pi} \Psi(e^{j\omega}) \Psi^*(e^{j\omega})\, d\omega$$

where $\Psi$ in $I_F$ is the (normalized) prediction error gradient. Suppose further that, for simplicity, $V_{pd}(\theta) = (J(\theta) - J(\theta_o))^2$ where $J(\theta_o)$ is a scalar quantity of interest, i.e. we are interested in assessing the variance of $J(\hat\theta_N)$. For a linear time-invariant system, the quantity $J(\theta_o)$ could for example be the frequency response at a particular frequency. The CRLB (4) can then be written

$$\lim_{N\to\infty} N\, E\left[(J(\hat\theta_N) - J(\theta_o))^2\right] \geq \Lambda I_F^{-1} \Lambda^* \qquad (5)$$

where $\Lambda = \left.\frac{dJ(\theta)}{d\theta}\right|_{\theta=\theta_o}$. Now take a function $\gamma$ such that

$$\Lambda = \langle \gamma, \Psi \rangle := \frac{1}{2\pi} \int_{-\pi}^{\pi} \gamma^T(e^{j\omega}) \Psi(e^{-j\omega})\, d\omega$$

Then it is easy to show that

$$\Lambda I_F^{-1} \Lambda^* = \left\| \mathrm{Proj}_{S_\Psi}\{\gamma\} \right\|^2 \qquad (6)$$

where $\mathrm{Proj}_{S_\Psi}\{\gamma\}$ denotes the orthogonal projection of $\gamma$ onto $S_\Psi$, the column space of $\Psi$, i.e.

$$\mathrm{Proj}_{S_\Psi}\{\gamma\} = \Psi \langle \Psi, \Psi \rangle^{-1} \langle \Psi, \gamma \rangle$$

The expression (6) turns the focus to the subspace $S_\Psi$ when analyzing the CRLB. In many cases this subspace can easily be derived, allowing structural properties of the CRLB to be deduced, e.g. that a certain sensor does not provide any information whatsoever about $J(\theta_o)$, the quantity of interest. Examples where this technique has been used are variance analysis for estimated poles and zeros [79], frequency response estimation [72], optimal experiment design for minimum variance control [75] and errors-in-variables problems [55].
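As a minimal illustration of (1)-(2), the sketch below (with illustrative numbers, not taken from the cited works) computes the CRLB for a two-parameter FIR model with Gaussian noise and checks by Monte Carlo that least squares, which is the ML estimate in this setting, attains the bound.

```python
import numpy as np

# Illustrative sketch of the CRLB (1)-(2): for the FIR model
# y_t = theta_1 u_t + theta_2 u_{t-1} + e_t, e_t ~ N(0, lam), with a unit
# variance white input, the Fisher information is I_F = N * E[phi phi^T]/lam,
# so the CRLB is (lam/N) * I. Least squares (= ML here) attains it.
rng = np.random.default_rng(0)
theta_o = np.array([1.0, 0.5])
lam, N, runs = 0.1, 500, 2000

ests = np.empty((runs, 2))
for r in range(runs):
    u = rng.standard_normal(N + 1)
    phi = np.column_stack([u[1:], u[:-1]])            # regressors [u_t, u_{t-1}]
    y = phi @ theta_o + np.sqrt(lam) * rng.standard_normal(N)
    ests[r] = np.linalg.lstsq(phi, y, rcond=None)[0]  # ML estimate

print(np.cov(ests.T))            # ~ CRLB = (lam/N) * I = 2e-4 * I
```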

    2.2 Invariants

For linear time-invariant (LTI) single-input single-output systems, it has been shown in [98] that an invariant, or water-bed effect, holds for the asymptotic variance of the frequency function estimate $\mathrm{AsVar}\, G(e^{j\omega}, \hat\theta_N)$. For output error models this effect can be expressed as

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_u(\omega)\, \mathrm{AsVar}\, G(e^{j\omega}, \hat\theta_N)\, d\omega = \lambda_e\, n \qquad (7)$$

where $\Phi_u$ is the input spectrum, $\lambda_e$ is the noise variance and n is the number of estimated parameters.
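The invariant (7) can be checked numerically. The sketch below uses the standard asymptotic covariance of least squares for an FIR model of order n; the one-pole input coloring and all numbers are illustrative assumptions. Changing the pole reshapes the variance over frequency, but the weighted integral stays at $\lambda_e n$.

```python
import numpy as np

# Numerical check of the invariant (7) for an FIR model of order n (output
# error structure), using the asymptotic least-squares covariance:
# AsVar G(e^{jw}) = lam * Gamma*(w) R^{-1} Gamma(w), with R = E[phi phi^T].
# The input is AR(1)-colored; the pole a and all numbers are illustrative.
n, lam, a = 5, 0.3, 0.6
taus = np.subtract.outer(np.arange(n), np.arange(n))
R = a ** np.abs(taus) / (1 - a**2)                         # Toeplitz input covariance

omegas = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
Phi_u = 1.0 / np.abs(1 - a * np.exp(-1j * omegas)) ** 2    # input spectrum
Gamma = np.exp(-1j * np.outer(omegas, np.arange(n)))       # [1, e^{-jw}, ...]
Rinv = np.linalg.inv(R)
AsVar = lam * np.einsum('ki,ij,kj->k', Gamma.conj(), Rinv, Gamma).real

dw = omegas[1] - omegas[0]
print(np.sum(Phi_u * AsVar) * dw / (2 * np.pi), lam * n)   # both ~ 1.5
```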

    2.3 Cost of complexity

As mentioned in the introduction, the modeling costs have turned out to be very high in the process industry. For the advanced engineering applications envisioned for the future it therefore seems relevant to investigate what the experimental cost will be for identifying models of sufficient accuracy. The basis for a theoretical investigation of this issue is the observation that the CRLB is influenced by the experimental conditions (through $\Psi$ in (4)). Thus in principle it is possible to compute the minimum experimental cost required to obtain a model which is accurate enough that some performance index can be guaranteed in the application. The quantification of how this cost depends on the performance requirements of the application and the system/model complexity we will call the cost of complexity. This quantity allows the user to make informed trade-offs, e.g. between performance specifications in the application and the experimental cost. The variance expressions developed in [67] can be seen as an early contribution to this problem. More recently, frequency function estimation has been considered in [102, 97]. For a finite impulse response model of order n it is shown that if the maximum error over the frequency band $[0, \omega_B]$ is $1/\gamma$, then the cost of complexity, as measured by the required input energy during the identification experiment, is given by $\lambda\gamma\omega_B/\pi$ where $\lambda$ is the noise variance.

The cost of complexity is the resulting minimum of the criterion obtained in the least-costly identification framework proposed in [17].
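The scaling in the expression $\lambda\gamma\omega_B/\pi$ is easy to explore numerically; a back-of-the-envelope sketch with illustrative numbers:

```python
import numpy as np

# Back-of-the-envelope sketch of the cost of complexity lam*gamma*omega_B/pi
# from [102, 97]: the minimum input energy grows linearly both in the
# accuracy gamma and in the bandwidth omega_B. All numbers are illustrative.
lam = 0.01                                # noise variance
for gamma in (10, 100, 1000):             # max frequency-function error 1/gamma
    for omega_B in (np.pi / 8, np.pi / 2):
        energy = lam * gamma * omega_B / np.pi
        print(f"gamma={gamma:4d}  omega_B={omega_B:.2f}  energy={energy:.3f}")
```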

3 Applications-oriented experiment design

    3.1 Introduction

For complex systems and high performance applications, not only will it be important to assess the cost of complexity, it will also be important to be able to compute experiments that achieve the minimum experimental cost.

The concept of least-costly identification was introduced in [18] and developed in a series of conference contributions, appearing later as [17]. The idea is to formulate the experiment design problem such that the objective is to minimize the experimental cost, typically measured in terms of required input and output energy, subject to constraints on the model’s accuracy. Optimal experiment design has witnessed a revival in system identification during the last decade, see, e.g., [91, 37, 65, 66, 12, 14, 13, 59, 108, 73, 99, 109, 95, 101, 38, 92, 8, 16, 7, 46]. Much of the effort has been devoted to reformulating optimal experiment design problems as computationally tractable convex optimization problems and to connecting optimal experiment design for parametric identification methods, such as prediction error identification, to control applications [64, 47, 57, 58, 74, 10, 48, 11, 17, 53, 9, 52, 39].

    3.2 A framework

In [50] a quite general framework is outlined. Consider a specific application and suppose that a measure of the performance degradation $V_{pd}$, as discussed in Section 2.1, is available. All conceivable models can now be split into two sets: the first, which we call the set of acceptable models $E_{app}$, is the set for which the performance degradation measure is sufficiently small, say less than $1/\gamma$ for some $\gamma > 0$ (which we will refer to as the accuracy), while the second set is the set of all other models, which from the applications point of view are then unsatisfactory. From this perspective the objective of system identification is to produce a model estimate in the set of acceptable models $E_{app}$. One can then formulate a least-costly experiment design problem with this as a constraint. Building on previous work, it is shown in [50] that a wide class of applications-oriented optimal experiment design problems can be cast as semi-definite programs for linear time-invariant systems. Preliminary results on the extension to nonlinear systems are given in [60].
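To convey the flavor of casting experiment design as a convex program, the sketch below chooses an input power spectrum on a frequency grid so as to maximize the Fisher information of an FIR model under a power budget. This is a much simplified illustration, not the semi-definite programming formulation of [50]; the model order, grid, criterion and budget are all assumptions made for the example.

```python
import numpy as np
import cvxpy as cp

# Simplified sketch: D-optimal-style input design for an FIR model of order n.
# Decision variable: input spectrum samples phi_k >= 0 on a frequency grid.
# The Fisher information (noise variance normalized) is linear in the
# spectrum, so maximizing log det I_F under a power budget is convex.
n = 4
omegas = np.linspace(0, np.pi, 50)
phi = cp.Variable(len(omegas), nonneg=True)

I_F = 0
for k, w in enumerate(omegas):
    Psi = np.exp(-1j * w * np.arange(n))                 # [1, e^{-jw}, ...]
    I_F = I_F + phi[k] * np.real(np.outer(Psi, Psi.conj())) / len(omegas)

power = cp.sum(phi) / len(omegas)                        # grid approximation
prob = cp.Problem(cp.Maximize(cp.log_det(I_F)), [power <= 1.0])
prob.solve()
print(prob.value, np.round(phi.value, 3))
```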

3.3 Added value of optimal experiment design to system identification

In [49, 50] it is argued that least-costly optimal experiment design simplifies the overall system identification problem in several respects. Firstly, if possible, only properties important to the application are excited in the system (in the interest of minimizing the experimental effort). Consequently, system properties irrelevant to the application are not excited (if it can be avoided). This means that sometimes simplified models can be used without loss of accuracy in the application. For example, in a mechanical system, resonances that will not be excited during the application will not be excited during the system identification experiment either. Formal results are given in [73]. Clearly this issue is very important for the identification of complex systems.

In [97] it is argued that the importance of the model structure is reduced when applications-oriented optimal experiment design is used. This suggests that high order general purpose models may be competitive with models tailored to the particular system to be modeled.

    3.4 Implementation

Optimal experiment designs typically depend on the true system and therefore cannot be implemented directly. A common route to handle this “chicken and egg” problem is to make the design adaptive, whereby the model is identified recursively and the most recent estimate is used to replace the true system in the experiment design, which consequently is updated as the model estimate is updated. In [39] it is shown formally that as time increases this procedure provides almost as good a result as if the true system had been used in the experiment design. A first attempt at a specific algorithm tailored for Model Predictive Control is presented in [61].

The added value of applications-oriented optimal experiment design discussed above, i.e. the possibility to use models of restricted complexity, is illustrated in [96], where it is proved that when adaptive experiment design is used it is sufficient to use a two-parameter finite impulse response model to consistently estimate the non-minimum phase zero of a stable linear time-invariant system of arbitrary order. The algorithm requires little in terms of prior knowledge.

4 Networked Systems and Structured Systems

    4.1 Introduction

Distributed large scale systems require interconnectivity implemented by way of networks. Cheap computational and wireless transmission devices, combined with a wide range of emerging new sensors and actuators, in turn enabled by new technologies such as micro- and nanosystems, have opened up for the use of networked control systems on a large scale [5].

Characteristic of networked systems is that they consist of “components” that are spatially distributed. Each component can be viewed as consisting of a physical part and an engineered device, in turn consisting of sensors, actuators and a processing unit. The physical parts may be interconnected in rather arbitrary ways (depending on the type of system) while the processing units are interconnected by communication networks, see Figure 1.

[Schematic: each component combines a physical part with sensing, actuation and processing; the processing units are linked through a communication network.]

Figure 1: A networked system.

This configuration changes the perspective of control design in two respects: firstly, the (dynamic) properties of the communication network have to be accounted for, and, secondly, the distributed nature of the system is emphasized. Large efforts have been (and still are) devoted to developing dynamic models for communication networks. These models depend very much on the particular technology that is used, e.g. wired and wireless networks exhibit widely different characteristics. In particular, attention has been given to wireless networks, but there is also extensive work on wired networks, e.g. the Internet [110]. Some features of wireless networks include sampling time jitter, random packet losses and delays, and requirements of low-grade quantization. Constraints on power consumption, costs, and channel capacity are consequences of the distributed nature, which limits local information processing and information exchange between the components.

Following [68], we will discuss some aspects that arise when a system is to be identified in a networked environment characterized by limitations in data communication and the possibility/necessity of local data processing, e.g., due to the communication constraints. Two main scenarios can be considered:

    1) Fusion centric

    2) Fully decentralized

where in the first scenario nodes transmit information to a fusion center for final processing, whereas in the second scenario no such center exists and nodes have to update each other with as little coordination, synchronization and communication as possible. Next we will discuss the statistical basis for identification under such schemes.

    4.2 A statistical basis

Consider the problem of estimating $\theta$ in a networked system. Let $p^{tot}_\theta$ denote the joint pdf of all measurements in the system. Then the inverse Fisher information matrix $I_F^{-1}(p^{tot}_\theta)$ is a universal lower bound for the covariance matrix of any unbiased parameter estimate. Notice that the ML estimate (3) requires the processing of all measurements, i.e. all data has to be gathered at a “fusion center” and processed there. When measurements are processed locally before being transmitted to the fusion center one may therefore expect a loss in accuracy. However, no information loss occurs when the measurements at a sensor are condensed into a sufficient statistic [63]. Thus, regardless of whether a fusion centric or a fully decentralized scheme is used, information loss will occur unless sufficient statistics are transmitted.
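The "no loss with sufficient statistics" point admits a one-screen illustration. In the sketch below (synthetic Gaussian data, illustrative numbers), each node transmits only the pair (sum, count), which is sufficient for the common mean, and the fusion center reproduces the centralized ML estimate exactly.

```python
import numpy as np

# Minimal sketch: each node measures y = theta + e, e ~ N(0, lam). The pair
# (sum of measurements, number of measurements) is a sufficient statistic for
# theta, so a fusion center combining only these pairs reproduces the
# centralized ML estimate exactly. All numbers are illustrative.
rng = np.random.default_rng(0)
theta, lam = 2.0, 0.5
node_data = [theta + np.sqrt(lam) * rng.standard_normal(n) for n in (50, 80, 30)]

# Each node transmits its local sufficient statistic (sum, count).
stats = [(y.sum(), len(y)) for y in node_data]

# Fusion center: combine the sufficient statistics.
theta_fused = sum(s for s, _ in stats) / sum(n for _, n in stats)

# Centralized ML estimate using all raw data.
theta_central = np.concatenate(node_data).mean()
print(theta_fused, theta_central)   # identical up to floating point
```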

A subtle issue arises when the nodes are privy to local information regarding their own measurement process. It may be that the processed measurements should be weighted according to some information available only locally. In a fully decentralized context, [125] communicates the necessary weights explicitly, whereas in [6] the weighting is done locally.

Another subtle issue arises when the transmission of information from the sensors to the fusion center is subject to rate constraints. It may then happen that ancillary statistics improve upon the estimate. An ancillary statistic is a statistic whose distribution does not depend on the unknown parameters. We refer to the surveys [42, 124] and references therein for details of the problem when the transmission to the fusion center is rate constrained.

    4.3 The impact of noise

When measurements at different nodes are correlated, it is no longer sufficient to distribute the local sufficient statistics between nodes, or to a fusion center. In this case the situation is much more complex. However, there exists a simple condition for when locally linearly processed measurements can be combined into the centralized linear minimum variance estimate [104].

    4.4 The impact of structure

Consider the distributed multi-sensor network in Figure 2, where sensor i may transmit the measurement $y_i$ to a fusion center. Both transfer functions $G_1$ and $G_2$ are unknown. Consider now the identification of the transfer function $G_1$. From the figure it is clear that measurements from Sensor 1 provide information regarding this transfer function. It also seems somewhat obvious that Sensor 2 can contribute to improving the accuracy of the estimate of $G_1$ since $y_2 = G_2 G_1 u + e_2$. However, in [51, 116] it is shown that under certain configurations of the model structures used for $G_1$ and $G_2$, Sensor 2 contains almost no information regarding $G_1$ even if the signal-to-noise ratio of this sensor is very high. The result can easily be generalized to an arbitrary number of nodes in a cascade structure. A dual result is developed in [41] regarding how additional inputs affect the modeling accuracy.

    4.5 Communication aspects

We will now discuss some of the issues associated with transmitting information over a communication channel.

Bandwidth limitations. One aspect is that the capacity of some communication channels is so low that it has to be accounted for. One typical such constraint is that the available bit rate may be restricted so that the information has to be quantized before transmission.

[Schematic: the input u drives $G_1$, whose output $u_2$ drives $G_2$; the noises $e_1$ and $e_2$ yield the measurements $y_1$ and $y_2$.]

Figure 2: A distributed multi-sensor network. The sensor measurements $y_1$ and $y_2$, together with the input u, are used by a fusion center to estimate $G_1$ and $G_2$.

Example 1. Suppose that node i in a network measures the scalar

$$y_i = \theta + e_i$$

where the $e_i$ are independent normally distributed random variables with zero mean and variance $\lambda$. When each node can only relay to the fusion center whether $y_i$ exceeds a certain threshold T (common to all nodes) or not, it can be shown that the CRLB is at least a factor $\pi/2$ above the CRLB for the non-quantized case [88]. The optimal threshold is $T = \theta$, which is therefore infeasible since $\theta$ is unknown. We refer to [94] for further details.
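A Monte Carlo sketch of Example 1 (illustrative, not the derivation of [88]): with the threshold placed at the infeasible optimum $T = \theta$, the ML estimate formed from the relayed bits has variance approaching $\pi/2$ times the unquantized CRLB $\lambda/N$.

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo sketch of Example 1 (illustrative numbers). N one-bit sensors
# relay b_i = 1{y_i > T}, with T at the infeasible optimum T = theta. Since
# P(b_i = 1) = Phi((theta - T)/sqrt(lam)), the ML estimate from the bits is
# theta_hat = T + sqrt(lam) * Phi^{-1}(mean(b)).
rng = np.random.default_rng(1)
theta, lam, N, runs = 1.0, 0.25, 2000, 2000
T = theta

est = np.empty(runs)
for r in range(runs):
    y = theta + np.sqrt(lam) * rng.standard_normal(N)
    p_hat = np.clip(np.mean(y > T), 1e-6, 1 - 1e-6)
    est[r] = T + np.sqrt(lam) * norm.ppf(p_hat)

print(N * est.var() / lam)   # ~ pi/2 = 1.571; equals 1 without quantization
```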

Results on quantized identification in a very general setting can be found in [118, 119, 120].

Sampling time jitter. By way of a frequency domain analysis it is shown in [36] that sampling time jitter induces bias. It is also shown how to compensate for this problem. In the case of time-stamping of packets, a continuous-time instrumental variable method is used in [117] to cope with irregular sampling.

    4.6 Relation to other areas

There are obvious connections between fully distributed identification and distributed optimization, see e.g. [114, 15]. A widely studied area is distributed estimation, where nodes share information regarding some random vector that is to be estimated. Relevant questions are whether the nodes will converge to the same estimate as the information exchange increases, and if so, whether the nodes reach a consensus, and what the quality of this estimate is as compared to the centralized estimate, see, e.g., [19]. Wireless sensor networks are also a closely related area [69]. A popular class of methods is consensus algorithms, where nodes are ensured to converge to the same result despite only local communication between nodes.

4.7 Summary

We have seen that identification of networked systems is a multifaceted problem with close ties to fields such as distributed estimation and optimization. Whereas for a fixed set-up the CRLB provides a lower bound on the estimation accuracy, the main challenge when there are communication constraints is to devise the entire scheme: when, what and where should a node transmit in order to maximize accuracy? For example, suppose that it is possible to transmit all raw data to a fusion center but that the communication channel is subject to noise. Then it may still be better to pre-process the measurements locally before transmission.

    5 Model structure selection

    5.1 Introduction

When collecting data from a (physical) control system, it is often the case that there is an underlying interconnection structure; for instance, the overall system could be the interconnection, via cascade, parallel, feedback and combinations thereof, of many dynamical systems. In this scenario any given variable may be directly related to only a few other variables. One convenient and popular way of graphically describing these relations is via so-called graphical models [62].

This sort of structure, which may be self-evident in certain engineering domains where the system has been designed via interconnection of smaller parts, also pops up when modeling phenomena described through a large number of variables. These interconnections, such as parallel, series or feedback, may be intrinsic, such as in economic and social “systems”.

In the static Gaussian case, the “relation” between variables can be expressed in terms of conditional independence conditions between subsets of them, see e.g. [31]. Estimation of sparse graphical models has been the subject of intense research which is impossible to survey here; we only point the reader to the early paper [78], which proposes using the Lasso for this purpose.

In the dynamic case, i.e. when observed data are trajectories of (possibly stationary) stochastic processes, one may consider several notions of conditional independence, which can be encoded via the so-called time series correlation (TSC) graphs, Granger causality graphs and “partial correlation” graphs, see [29] for details. When the number of measured variables is very large, and possibly larger than the number of data available (i.e. the number of “samples” available for statistical inference), even though there is no “physical” underlying interconnected system, constructing meaningful models which are useful for prediction/monitoring/interpretation requires trading off model complexity vs. fit. In a parametric setup this complexity depends on the number of parameters, which is related both to the complexity of each “subsystem” (e.g. measured via its order) and to their number (i.e. the number of dynamical systems which are “non-zero”).

Model structure selection is combinatorial in nature, as it is desirable to assess the performance of all possible model structures and model orders. For autonomous system identification of complex systems this is thus one of the key problems.

Estimation problems involving variable selection, which can also be framed as determining connectivity in a suitable graphical description, have recently been studied in the literature, see for instance [112, 80, 76, 77] and references therein. In the paper [112] coupled nonlinear oscillators (Kuramoto type) are considered where the coupling strengths are to be estimated; in [80] nonlinear dynamics are allowed and the attention is restricted to the linear term¹ in the state update equation, equivalent to a vector autoregressive (VAR) model of order one. In both cases it is assumed that the entire state space is measurable and an ℓ1-penalized regression problem is solved for estimating the coupling strengths/linear approximations. Sparse models under “smoothing” conditional independence relations, encoded by “partial correlation” graphs or equivalently via zeros in the inverse spectrum [21], have also been studied recently. For instance, [106] considers VAR models and ℓ1-type penalized regression, while in [76, 77] a methodology based on smoothing à la Wiener is proposed, where interconnections are found by putting a threshold on the estimated transfer functions.

We shall now focus on Granger causality concepts, where conditional independence conditions encode the fact that the prediction of (the future of) one variable (which we shall call the “output variable”) may require only the past history of a few other variables (which we shall call “inputs”) plus possibly its own past. This can be represented with a graph where nodes are variables and (directed) edges are (non-zero) transfer functions, with self-loops encoding dependence on the output’s own past². The concept of Granger causality is, in fact, the most “natural” description in the context of causal control systems since, under rather mild conditions [40], the model obtained in this way is the unique internally stable feedback interconnection which describes the second-order statistics of the stationary process under study. In general both the dynamical systems and the interconnection structure are unknown and have to be inferred from data. Without loss of generality we shall address the problem of modeling the relation between one node in this graph (the “output” variable) and all the other measured variables (the “inputs”). The focus is both on performing variable selection, i.e. finding the underlying connections in the graphical model encoding the Granger causality structure, and on obtaining reliable and easily interpretable models which can be used, e.g., for prediction/monitoring etc.

¹ Thinking of a first-order Taylor expansion around the trajectory.
² In the language of classical system identification, dependence of the predictor on the past outputs results in ARMAX models; lack of dependence yields Output Error (OE) models.

This sparsity principle permeates many well-known techniques in machine learning and signal processing, such as feature selection, selective shrinkage and compressed sensing [45, 32]. We shall now review some of the basic tools and then come back to the variable selection and identification problem.

    5.2 Sparse estimation

Many approaches have been suggested to deal with model structure selection. In forward selection, regressors are added one by one according to how statistically significant they are [121]. Forward stepwise selection and LARS (Least Angle Regression) [35] are refinements of this idea. Backwards elimination is another approach with a long history; here regressors are removed one by one. Another class of methods employs all regressors but uses thresholding to force insignificant parameters to become zero [34]. Yet another class of methods that can handle all regressors at once uses regularization, i.e. a penalty on the size of the parameter vector is added to the cost function. This approach has close ties with Bayesian estimation. Ridge regression is a classical regularization method where the penalty is proportional to the squared 2-norm of the parameter vector. While this method “pulls” parameters towards zero, it does not generate sparse estimates, i.e. even though parameters may be small they are generically non-zero.

    5.3 LASSO

An important insight has been that the regularization can be chosen so that sparse estimates are obtained in a one-shot procedure. The LASSO (Least Absolute Shrinkage and Selection Operator) is one of the early contributions to this field and has been of tremendous influence [111]. This algorithm performs minimization under a constraint on the ℓ1-norm of the parameter vector $\theta \in \mathbb{R}^n$. More precisely, the criterion is

$$\min_\theta\; V_N(\theta) \quad \text{s.t.}\quad \|\theta\|_1 \leq c \qquad (8)$$

Above, $V_N(\theta)$ is the least-squares cost function based on N samples. For linear regression problems

$$Y = \Phi\theta + E$$

where $\Phi$ is the regressor matrix, the above problem is convex. Sparse approximation and the closely related area of compressed sensing have been very active research areas in the last few years, e.g. [22], [33].
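A minimal sketch contrasting ridge regression and the LASSO on a synthetic sparse regression problem (all data and penalty levels are illustrative; note that scikit-learn's Lasso solves the penalized Lagrangian form of (8) rather than the constrained form):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Sketch on synthetic data Y = Phi theta + E with only 3 active parameters:
# ridge shrinks all coefficients but leaves them generically non-zero, while
# the l1 penalty drives most of them exactly to zero.
rng = np.random.default_rng(0)
N, n = 200, 20
theta_true = np.zeros(n)
theta_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
Phi = rng.standard_normal((N, n))
Y = Phi @ theta_true + 0.1 * rng.standard_normal(N)

ridge = Ridge(alpha=1.0).fit(Phi, Y)
lasso = Lasso(alpha=0.05).fit(Phi, Y)

print("non-zeros, ridge:", np.sum(np.abs(ridge.coef_) > 1e-6))  # ~20
print("non-zeros, lasso:", np.sum(np.abs(lasso.coef_) > 1e-6))  # ~3
```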

    5.4 Extensions

There exist a number of variations of the ℓ1-trick underlying the LASSO. In Group Lasso [126], or sum-of-norms regularization, the parameters are grouped into several groups and the sum of the norms of all groups is used in the regularization.

By instead penalizing changes in the parameters, ℓ1-segmentation algorithms can be developed [86]. Linear parameter varying and hybrid models can be estimated with similar algorithms. In [86] it is shown that this approach is highly competitive compared with existing algorithms; a small sketch of the idea follows below.

The ℓ1-trick can also be applied to smoothing problems, e.g. when disturbances change abruptly [84, 85].
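A minimal sketch of the ℓ1-segmentation idea (in the spirit of, but not identical to, the algorithm of [86]): penalizing the ℓ1 norm of parameter changes yields a piecewise-constant estimate that locates the abrupt change. The data and the penalty weight are synthetic and illustrative.

```python
import numpy as np
import cvxpy as cp

# l1-segmentation sketch: fit a time-varying level theta_t to noisy data
# while penalizing sum_t |theta_t - theta_{t-1}|, which favors a
# piecewise-constant estimate with few change points.
rng = np.random.default_rng(0)
T = 200
theta_true = np.where(np.arange(T) < 100, 1.0, -0.5)   # one abrupt change
y = theta_true + 0.2 * rng.standard_normal(T)

theta = cp.Variable(T)
fit = cp.sum_squares(y - theta)
changes = cp.norm1(cp.diff(theta))                     # total variation penalty
cp.Problem(cp.Minimize(fit + 10.0 * changes)).solve()

print(np.round(theta.value[::50], 2))   # roughly piecewise constant
```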

Another related framework, usually referred to as “Sparse Bayesian Learning”, whose origins can be traced back to the so-called “Automatic Relevance Determination” of [70] and to the “Relevance Vector Machine” [113], also provides quite powerful tools for the identification of complex systems; see e.g. the preliminary results presented in [25, 27].

With reference to the linear problem

$$Y = \Phi\theta + E, \qquad \theta := [\theta_1, \ldots, \theta_n]^\top$$

the main idea of sparse Bayesian learning is to assume a prior on $\theta$, where the components $\theta_i$ are independently distributed as $\theta_i \sim N(0, \lambda_i)$, and then to lift the problem of enforcing sparsity from parameter space, as in the Lasso [111], to the “scale factors” $\lambda_i \geq 0$.

This approach has interesting connections with the Lasso and Multiple Kernel Learning [4], as studied in [123, 2].
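A minimal sketch of this scheme using a generic EM-type update of the scale factors (a textbook ARD variant; this is not the SSEH/SSGLAR estimators of [27], and the model, data and iteration count are illustrative assumptions):

```python
import numpy as np

# Sketch of sparse Bayesian learning / ARD for Y = Phi theta + E,
# E ~ N(0, sigma2 I), prior theta_i ~ N(0, lambda_i): alternate the Gaussian
# posterior of theta with EM updates of the scale factors. All data synthetic.
rng = np.random.default_rng(0)
N, n, sigma2 = 100, 15, 0.01
theta_true = np.zeros(n)
theta_true[[1, 6]] = [2.0, -1.0]
Phi = rng.standard_normal((N, n))
Y = Phi @ theta_true + np.sqrt(sigma2) * rng.standard_normal(N)

lam = np.ones(n)                                       # initial scale factors
for _ in range(50):
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / lam))
    mu = Sigma @ Phi.T @ Y / sigma2                    # posterior mean of theta
    lam = np.maximum(mu**2 + np.diag(Sigma), 1e-12)    # EM: E[theta_i^2]

print(np.round(mu, 3))   # components 1 and 6 survive, the rest shrink to ~0
```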

  • 5.5 Choosing the Regularization Parameter

The choice of the regularization parameter in (8) (the constraint level c, or equivalently the multiplier λ in the penalized formulation) is usually made by way of cross-validation or generalized cross-validation: the regularization that performs best on fresh data is chosen. This requires solving (8) a number of times. In [100] it is shown that AIC provides a way to directly obtain a reasonable value for the regularization parameter. The idea is based on the observation that the full least squares estimate (let us denote it $\hat\theta^{LS}_N$) can model the true system. This suggests finding the estimate with smallest norm that has the same model fit as the full least-squares estimate, which leads to an Akaike-like constraint.

5.6 Bayesian methods for joint variable selection and identification

In this section, following [26, 25, 27], we briefly describe a Bayesian point of view on joint variable selection and identification of (sparse) linear systems. The starting point is the new identification paradigm developed in [90], which relies on nonparametric estimation of impulse responses (see also [89] for extensions to predictor estimation). In the paper [27], expanding on the recent works [26, 25], this nonparametric paradigm is extended to the design of optimal linear predictors so as to jointly perform identification and variable selection. Without loss of generality, the analysis may be restricted to MISO systems, where the variable to be predicted is called the “output variable” and all the other (say m − 1) available variables are called “inputs”. In this way we interpret the predictor as a system with m inputs (given by the past outputs and inputs) and one output (the output predictions). Thus, predictor design amounts to estimating m impulse responses modeled as realizations of Gaussian processes.

In [27] two new approaches are introduced: the first, which has been called Stable-Spline GLAR (SSGLAR), is based on the GLAR algorithm in [127] and can be seen as a variation of the so-called “elastic net” [128]; the second, which has been called Stable-Spline Exponential Hyperprior (SSEH), assigns an exponential hierarchical hyperprior with a common hypervariance to the scale factors. This second approach has connections with the so-called Relevance Vector Machine of [113]. The hierarchical hyperprior favors sparsity through an ℓ1 penalty on the kernel hyperparameters. Inducing sparsity by hyperpriors is an important feature of this second approach; in fact, it permits obtaining the marginal posterior of the hyperparameters in closed form and hence also their estimates in a robust way. Once the kernels are selected, the impulse responses are obtained in closed form via the Representation Theorem [3]. However, SSEH requires solving a non-linear optimization problem which may benefit from a “good” initialization. It has been verified in [27] that a “Bayesian” forward-selection procedure which works on the marginal density of the hyperparameters provides a robust and computationally attractive way of initializing SSEH. It is interesting to observe that, asymptotically, this procedure is equivalent to a forward selection based on the BIC criterion [105]; yet the nonparametric description allows one to compute marginals in closed form and, as such, all the results apply without any approximation to any finite segment of data. The paper [27] contains extensive numerical experiments involving sparse ARMAX systems showing that these new approaches provide a definite advantage over both the standard GLAR (applied to ARX models) and PEM (equipped with AIC or BIC) in terms of predictive capability on new output data, while also effectively capturing the “structural” properties of the dynamic network, i.e. being able to identify correctly, with high probability, the absence of dynamic links between certain variables.

    6 Filtering and Estimation

Also for large scale systems, filtering and estimation typically have to be carried out in a decentralized way.

Decentralized filtering has many applications, e.g. robotics [103]. In early work the nodes required access to the global estimate [44, 122]. Subsequently, fusion centric algorithms were developed where nodes operate locally and transmit their local estimates and error covariances to a fusion center [28, 43]. An early reference to a completely decentralized method is [93]; this algorithm requires all nodes to be able to communicate with each other, in practice quite a limiting constraint. One approach has been to use ideas from consensus and gossip algorithms [87]. This means that the information at each node is spread throughout the network between sampling instances by way of continuous information exchange between neighbors governed by a consensus algorithm. The algorithm in [125], referred to in Section 4.2, belongs to this class of algorithms. Consensus filters now exist for a variety of contexts, e.g. [23, 30, 24, 115, 107].

In most consensus algorithms, information exchange takes place repeatedly between nodes in between measurement updates. This allows information to spread beyond neighbours, so that when the network graph is connected each node possesses information from all other nodes.
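The building block behind these schemes is average consensus: repeated local averaging with a doubly stochastic weight matrix drives every node to the network-wide mean. A minimal sketch on an illustrative four-node ring:

```python
import numpy as np

# Average consensus sketch: each node repeatedly averages with its neighbours
# using a doubly stochastic weight matrix W. On a connected graph all values
# converge to the initial mean. The ring topology and weights are illustrative.
x = np.array([1.0, 4.0, 2.0, -3.0])      # local values at 4 nodes
W = np.array([[0.5,  0.25, 0.0,  0.25],  # Metropolis-type weights on a ring
              [0.25, 0.5,  0.25, 0.0 ],
              [0.0,  0.25, 0.5,  0.25],
              [0.25, 0.0,  0.25, 0.5 ]])

for _ in range(50):      # information exchanges between measurement updates
    x = W @ x            # each node averages with its neighbours only

print(x, x.mean())       # all entries converge to the initial mean 1.0
```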

When each node is only allowed to communicate with its neighbours, and only once between measurements, the situation changes drastically and the global Kalman filter cannot be achieved. This scenario is considered in, e.g., [1].

Closely related to consensus filtering is to view the Kalman filtering problem as an optimization problem and use ideas from decentralized optimization. Dual decomposition is used in [71]. A very recent idea in decentralized optimization is the alternating direction method of multipliers [20].

In certain networks extensive use of sensors is prohibited. For example, in wireless sensor networks with battery-operated sensors and transmitters, energy limits the use of sensors. A separate problem then becomes how to schedule the sensor operation. This is outside the scope of this report and we refer to the recent [82] for references to this area.

    7 Summary

Identification and estimation of complex large scale systems is very challenging. We have pointed out that it is important to assess the experimental cost of obtaining adequate models for such systems. We have also observed that a statistical framework is suitable for designing and analysing networked system identification and estimation algorithms.

It is clear that techniques from neighboring fields offer interesting opportunities: gossip and consensus algorithms, sparse estimation, Gaussian processes and distributed optimization techniques are some of them.

    References

[1] P. Alriksson and A. Rantzer. Model based information fusion in sensor networks. In Proceedings of the 17th IFAC World Congress, Seoul, Korea, July 2008.

[2] A. Aravkin, J. Burke, A. Chiuso, and G. Pillonetto. Convex vs nonconvex approaches for sparse estimation: Glasso, multiple kernel learning and hyperparameter glasso. In IEEE Conf. on Dec. and Control, to appear, 2011.

[3] N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950.

[4] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML ’04: Proceedings of the twenty-first international conference on Machine learning, page 6, New York, NY, USA, 2004. ACM.

[5] J. Baillieul and P. J. Antsaklis. Control and communication challenges in networked real-time systems. Proc. of the IEEE, 95(1):9–28, Jan. 2007.

[6] S. Barbarossa and G. Scutari. Decentralized maximum-likelihood estimation for sensor networks composed of nonlinearly coupled dynamical systems. IEEE Trans. Signal Proc., 55(7):3456–3470, Jul. 2007.

[7] M. Barenthin. Complexity Issues, Validation and Input Design for Control in System Identification. Doctoral thesis, KTH, Stockholm, Sweden, 2008.

[8] M. Barenthin, X. Bombois, H. Hjalmarsson, and G. Scorletti. Identification for control of multivariable systems: Controller validation and experiment design via LMIs. Automatica, 44(12):3070–3078, 2008.

[9] M. Barenthin and H. Hjalmarsson. Identification and control: Joint input design and H∞ state feedback with ellipsoidal parametric uncertainty via LMIs. Automatica, 44(2):543–551, 2008.

[10] M. Barenthin, H. Jansson, and H. Hjalmarsson. Applications of mixed H∞ and H2 input design in identification. In 16th World Congress on Automatic Control, Prague, Czech Republic, 2005. IFAC. Paper Tu-A13-TO/1.

[11] M. Barenthin, H. Jansson, H. Hjalmarsson, J. Mårtensson, and B. Wahlberg. A control perspective on optimal input design in system identification. In Forever Ljung in System Identification, chapter 10. Studentlitteratur, September 2006.

[12] G. Belforte and P. Gay. Optimal experiment design for regression polynomial models identification. International Journal of Control, 75(15):1178–1189, 2002.

[13] G. Belforte and P. Gay. Optimal input design for set-membership identification of Hammerstein models. International Journal of Control, 76(3):217–225, 2003.

[14] K. Bernaerts, R. D. Servaes, S. Kooyman, K. J. Versyck, and J. F. Van Impe. Optimal temperature input design for estimation of the square root model parameters: parameter accuracy and model validity restrictions. International Journal of Food Microbiology, 73(2-3):145–157, 2002.

[15] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, New York, 1989.

[16] X. Bombois and H. Hjalmarsson. Optimal input design for robust H2 deconvolution filtering. In 15th IFAC Symposium on System Identification, Saint-Malo, France, July 2009.

[17] X. Bombois, G. Scorletti, M. Gevers, P. M. J. Van den Hof, and R. Hildebrand. Least costly identification experiment for control. Automatica, 42(10):1651–1662, 2006.

[18] X. Bombois, G. Scorletti, P. M. J. Van den Hof, M. Gevers, and R. Hildebrand. Least costly identification experiment for control. In Proceedings of 16th MTNS, Leuven, Belgium, July 2004.

[19] V. Borkar and P. P. Varaiya. Asymptotic agreement in distributed estimation. IEEE Trans. Aut. Control, 27(3):650–655, Jun. 1982.

[20] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–124, 2011.

[21] D.R. Brillinger. Time series: Data analysis and theory. Holden-Day, 1981.

[22] E.J. Candès, J.R. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on pure and applied mathematics, 59:1207–1223, 2006.

[23] R. Carli, A. Chiuso, L. Schenato, and S. Zampieri. Distributed Kalman filtering based on consensus strategies. IEEE Journal on Selected Areas in Communications, 26(4):622–633, May 2008.

[24] R. L. G. Cavalcante and M. Bernard. Adaptive filter algorithms for accelerated discrete-time consensus. IEEE Transactions on Signal Processing, 58(3):1049–1058, Mar 2010.

[25] A. Chiuso and G. Pillonetto. Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors. In Proceedings of Neural Information Processing Symposium, Vancouver, 2010.

[26] A. Chiuso and G. Pillonetto. Nonparametric sparse estimators for identification of large scale linear systems. In Proceedings of IEEE Conf. on Dec. and Control, Atlanta, 2010.

[27] A. Chiuso and G. Pillonetto. A Bayesian approach to sparse dynamic network identification. Technical report, University of Padova, 2011. Submitted to Automatica, available at http://automatica.dei.unipd.it/people/chiuso.html.

[28] C. Y. Chong. Hierarchical estimation. In Proc. 2nd MIT/ONR C Workshop, Monterey, CA, July 1979.

[29] R. Dahlhaus and M. Eichler. Highly structured stochastic systems, chapter Causality and graphical models in time series analysis, pages 115–137. Oxford University Press, 2003.

[30] M. A. Demetriou. Design of consensus and adaptive consensus filters for distributed parameter systems. Automatica, 46(2):300–311, Feb 2010.

[31] A.P. Dempster. Covariance selection. Biometrics, 28(1):157–175, 1972.

[32] D. Donoho. Compressed sensing. IEEE Trans. on Information Theory, 52(4):1289–1306, 2006.

[33] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006.

[34] D.L. Donoho and I.M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.

[35] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–451, 2004.

[36] F. Eng and F. Gustafsson. Identification with stochastic sampling time jitter. Automatica, 44(3):637–646, 2008.

[37] U. Forssell and L. Ljung. Some results on optimal experiment design. Automatica, 36(5):749–756, May 2000.

[38] G. Franceschini and S. Macchietto. Model-based design of experiments for parameter precision: State of the art. Chemical Engineering Science, 63(19):4846–4872, 2008.

[39] L. Gerencsér, H. Hjalmarsson, and J. Mårtensson. Identification of ARX systems with non-stationary inputs - asymptotic analysis with application to adaptive input design. Automatica, 45(3):623–633, March 2009.

[40] M. Gevers and B.D.O. Anderson. On jointly stationary feedback-free stochastic processes. IEEE Trans. Aut. Contr., 27:431–436, 1982.

[41] M. Gevers, L. Miskovic, D. Bonvin, and A. Karimi. Identification of multi-input systems: variance analysis and input design issues. Automatica, 42(4):559–572, 2006.

[42] T. S. Han and S. Amari. Statistical inference under multiterminal data compression. IEEE Trans. Inf. Theory, 44(6):2300–2324, Oct. 1998.

[43] H. R. Hashemipour, S. Roy, and A. J. Laub. Decentralized structures for parallel Kalman filtering. IEEE Transactions on Automatic Control, 33(1):88–94, Jan 1988.

[44] M. F. Hassan, G. Salut, M. G. Singh, and A. Titli. A decentralized computational algorithm for the global Kalman filter. IEEE Transactions on Automatic Control, 23:262–268, Apr 1978.

[45] T. J. Hastie and R. J. Tibshirani. Generalized additive models. In Monographs on Statistics and Applied Probability, volume 43. Chapman and Hall, London, UK, 1990.

[46] R. Hildebrand. Optimal inputs for FIR system identification. In 47th IEEE Conference on Decision and Control, pages 5525–5530, Cancun, Mexico, 2008.

[47] R. Hildebrand and M. Gevers. Identification for control: Optimal input design with respect to a worst-case ν-gap cost function. SIAM J. Control Optim., 41(5):1586–1608, 2003.

[48] R. Hildebrand and G. Solari. Identification for control: Optimal input intended to identify a minimum variance controller. Automatica, 43(5):758–767, 2007.

[49] H. Hjalmarsson. From experiment design to closed loop control. Automatica, 41(3):393–438, March 2005.

[50] H. Hjalmarsson. System identification of complex and structured systems. In European Control Conference, pages 3424–3452, Budapest, Hungary, 2009. Plenary address.

[51] H. Hjalmarsson. System identification of complex and structured systems. European Journal of Control, 15(4):275–310, 2009. Plenary address, European Control Conference.

[52] H. Hjalmarsson and H. Jansson. Closed loop experiment design for linear time invariant dynamical systems via LMIs. Automatica, 44(3):623–636, 2008.

[53] H. Hjalmarsson and J. Mårtensson. Optimal input design for identification of non-linear systems: Learning from the linear case. In American Control Conference, New York City, USA, July 11-13 2007.

[54] H. Hjalmarsson and J. Mårtensson. A geometric approach to variance analysis in system identification. IEEE Transactions on Automatic Control, 56(5):983–997, 2011.

[55] H. Hjalmarsson, J. Mårtensson, C.R. Rojas, and T. Söderström. On the accuracy in errors-in-variables identification compared to prediction-error identification. Automatica, 2011. To appear.

[56] M. A. Hussain. Review of the applications of neural networks in chemical process control – simulation and on-line implementation. Artificial Intelligence in Engineering, 13(1):55–68, 1999.

[57] H. Jansson. Experiment design with applications in identification for control. Doctoral thesis, KTH, Stockholm, Sweden, 2004.

[58] H. Jansson and H. Hjalmarsson. Input design via LMIs admitting frequency-wise model specifications in confidence regions. IEEE Transactions on Automatic Control, 50(10):1534–1549, 2005.

[59] C. Jauberthie, L. Denis-Vidal, P. Coton, and G. Joly-Blanchard. An optimal input design procedure. Automatica, 42(5):881–884, 2006.

[60] C. A. Larsson, H. Hjalmarsson, and C. R. Rojas. On optimal input design for nonlinear FIR-type systems. In Proceedings 49th IEEE Conference on Decision and Control, Atlanta, GA, USA, 2010.

[61] C.A. Larsson, C.R. Rojas, and H. Hjalmarsson. MPC oriented experiment design. In 18th IFAC World Congress, Milano, Italy, 2011. To appear.

[62] S.L. Lauritzen. Graphical Models. Oxford Statistical Science Series 17. Clarendon Press, 1996.

[63] E. L. Lehmann and G. Casella. Theory of Point Estimation. John Wiley & Sons, New York, second edition, 1998.

[64] K. Lindqvist. On experiment design in identification of smooth linear systems. Licentiate thesis, 2001.

[65] K. Lindqvist and H. Hjalmarsson. Optimal input design using linear matrix inequalities. In Proceedings of the 12th IFAC Symposium on System Identification, 2000.

[66] K. Lindqvist and H. Hjalmarsson. Identification for control: Adaptive input design using convex optimization. In Conference on Decision and Control, pages 4326–4331, Orlando, Florida, USA, December 2001. IEEE.

[67] L. Ljung. A nonprobabilistic framework for signal spectra. In Proc. 24th IEEE Conf. on Decision and Control, pages 1056–1060, Fort Lauderdale, FL, 1985.

[68] L. Ljung and H. Hjalmarsson. Four encounters with system identification. European Journal of Control, 17, 2011. Plenary address, IEEE Conference on Decision and Control/European Control Conference.

[69] Z.-Q. Luo, M. Gatspar, J. Liu, and A. Swami, editors. IEEE Signal Processing Magazine: Special Issue on Distributed Signal Processing in Sensor Networks. IEEE, July 2006.

[70] D.J.C. Mackay. Bayesian non-linear modelling for the prediction competition. ASHRAE Trans., 100(2):3704–3716, 1994.

[71] J.M. Maestre, P. Giselsson, and A. Rantzer. Distributed receding horizon Kalman filter. In Proc. 49th IEEE Conference on Decision and Control, Atlanta, GA, December 2010.

[72] J. Mårtensson and H. Hjalmarsson. A geometric approach to variance analysis in system identification: Linear time-invariant systems. In 46th IEEE Conference on Decision and Control, pages 4269–4274, New Orleans, USA, December 2007.

[73] J. Mårtensson and H. Hjalmarsson. How to make bias and variance errors insensitive to system and model complexity in identification. IEEE Transactions on Automatic Control, 56(1):100–112, January 2011.

[74] J. Mårtensson, H. Jansson, and H. Hjalmarsson. Input design for identification of zeros. In 16th World Congress on Automatic Control, Prague, Czech Republic, 2005. IFAC.

[75] J. Mårtensson, C.R. Rojas, and H. Hjalmarsson. Conditions when minimum variance control is the optimal experiment for identifying a minimum variance controller. Automatica, 47(3):578–583, Mar. 2011.

[76] D. Materassi and G. Innocenti. Topological identification in networks of dynamical systems. IEEE Transactions on Automatic Control, 55(8):1860–1871, Aug. 2010.

[77] D. Materassi and M.V. Salapaka. On the problem of reconstructing an unknown topology. In Proc. of IEEE American Control Conference (ACC), 2010, pages 2113–2118, Jun. 2010.

[78] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34:1436–1462, 2006.

[79] J. Mårtensson and H. Hjalmarsson. Variance error quantifications for identified poles and zeros. Automatica, 45(11):2512–2525, Nov. 2009.

[80] D. Napoletani and T.D. Sauer. Reconstructing the topology of sparsely connected dynamical networks. Phys. Rev. E, 77(2):026103, Feb 2008.

[81] B. Ninness and S. Gibson. Quantifying the accuracy of Hammerstein model estimation. Automatica, 38:2037–2051, 2002.

[82] J. L. Ny, E. Feron, and M. A. Dahleh. Scheduling continuous-time Kalman filters. IEEE Transactions on Automatic Control, 56:1381–1394, Jun 2011.

[83] B. A. Ogunnaike. A contemporary industrial perspective on process control theory and practice. A. Rev. Control, 20:1–8, 1996.

[84] H. Ohlsson, F. Gustafsson, L. Ljung, and S. Boyd. State smoothing by sum-of-norms regularization. In Proceedings of the 49th IEEE Conference on Decision and Control, Atlanta, USA, December 2010.

[85] H. Ohlsson, F. Gustafsson, L. Ljung, and S. Boyd. Trajectory generation using sum-of-norms regularization. In Proceedings of the 49th IEEE Conference on Decision and Control, Atlanta, USA, December 2010.

[86] H. Ohlsson and L. Ljung. Piecewise affine system identification using sum-of-norms regularization. In Proceedings of the 18th IFAC World Congress, Milano, Italy, 2011. Submitted.

[87] R. Olfati-Saber. Distributed Kalman filtering for sensor networks. In Proc. 46th IEEE Conference on Decision and Control, New Orleans, LA, December 2007.

[88] H. Papadopoulos, G. Wornell, and A. Oppenheim. Sequential signal encoding from noisy measurements using quantizers with dynamic bias control. IEEE Trans. Inf. Theory, 47(3):978–1002, Mar. 2001.

[89] G. Pillonetto, A. Chiuso, and G. De Nicolao. Prediction error identification of linear systems: a nonparametric Gaussian regression approach. Automatica, 45(2):291–305, 2011.

[90] G. Pillonetto and G. De Nicolao. A new kernel-based approach for linear system identification. Automatica, 46(1):81–93, 2010.

[91] L. Pronzato. Adaptive optimization and D-optimum experimental design. The Annals of Statistics, 28(6):1743–1761, 2000.

[92] L. Pronzato. Optimal experimental design and some related control problems. Automatica, 44(2):303–325, February 2008.

[93] B. S. Rao and H. F. Durrant-Whyte. Fully decentralized algorithm for multisensor Kalman filtering. IEE Proceedings-D, 138:413–420, Sep 1991.

[94] A. Ribeiro and G. B. Giannakis. Bandwidth-constrained distributed estimation for wireless sensor networks - Part I: Gaussian case. IEEE Trans. Signal Proc., 54(3):1131–1143, Mar. 2006.

[95] C. R. Rojas, J. C. Agüero, J. S. Welsh, and G. C. Goodwin. On the equivalence of least costly and traditional experiment design for control. Automatica, 44(11):2706–2715, 2008.

[96] C. R. Rojas, H. Hjalmarsson, L. Gerencsér, and J. Mårtensson. An adaptive method for consistent estimation of real-valued non-minimum phase zeros in stable LTI systems. Automatica, 47(7):1388–1398, Jul 2011.

[97] C. R. Rojas, M. Barenthin Syberg, J. Welsh, and H. Hjalmarsson. The cost of complexity in system identification: The output error case. Automatica, 2011. To appear.

[98] C. R. Rojas, J. S. Welsh, and J. C. Agüero. Fundamental limitations on the variance of parametric models. IEEE Transactions on Automatic Control, 2008. Submitted.

[99] C. R. Rojas, J. S. Welsh, G. C. Goodwin, and A. Feuer. Robust optimal experiment design for system identification. Automatica, 43(6):993–1008, June 2007.

[100] C.R. Rojas and H. Hjalmarsson. Sparse estimation based on a validation criterion. In Proceedings 50th IEEE Conference on Decision and Control, Orlando, FL, USA, 2011. To appear.

[101] C.R. Rojas, H. Hjalmarsson, L. Gerencsér, and J. Mårtensson. Consistent estimation of real NMP zeros in stable LTI systems of arbitrary complexity. In 15th IFAC Symposium on System Identification, pages 922–927, Saint-Malo, France, July 2009.

[102] C.R. Rojas, M. Barenthin Syberg, J.S. Welsh, and H. Hjalmarsson. The cost of complexity in system identification: Frequency function estimation of Finite Impulse Response systems. IEEE Transactions on Automatic Control, 55(10):2298–2309, 2010.

[103] S. I. Roumeliotis and G. A. Bekey. Distributed multirobot localization. IEEE Transactions on Robotics and Automation, 18:781–795, Oct 2002.

[104] S. Roy and R. A. Iltis. Decentralized linear estimation in correlated measurement noise. IEEE Trans. Aerosp. and Electr. Syst., 27(6):939–941, Nov. 1991.

[105] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461–464, 1978.

[106] J. Songsiri and L. Vandenberghe. Topology selection in graphical models of autoregressive processes. Journal of Machine Learning Research, 11:2671–2705, 2010.

[107] S. Kar and J. M. F. Moura. Gossip and distributed Kalman filtering: Weak consensus under weak detectability. IEEE Transactions on Signal Processing, 59(4):1766–1784, Apr 2011.

[108] J. D. Stigter, D. Vries, and K. J. Keesman. On adaptive optimal input design: A bioreactor case study. AIChE Journal, 52(9):3290–3296, 2006.

[109] J. Swevers, W. Verdonck, and J. De Schutter. Dynamic model identification for industrial robots: Integrated experiment design and parameter estimation. IEEE Control Systems Magazine, 27(5):58–71, 2007.

[110] A. Tang, L.L.H. Andrew, K. Jacobsson, K.H. Johansson, H. Hjalmarsson, and S.H. Low. Queue dynamics with window flow control. IEEE/ACM Transactions on Networking, 18(5):1422–1435, Oct. 2010.

[111] R. Tibshirani. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

[112] M. Timme. Revealing network connectivity from response dynamics. Phys. Rev. Lett., 98(22):224101, 2007.

[113] M. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.

[114] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans. Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Aut. Control, 31(9):803–812, Sep. 1986.

[115] V. Ugrinovskii. Distributed robust filtering with H∞ consensus of estimates. Automatica, 47(1):1–13, Jan 2011.

[116] B. Wahlberg, H. Hjalmarsson, and J. Mårtensson. Variance results for identification of cascade systems. Automatica, 45(6):1443–1448, 2009.

[117] J. Wang, W. X. Zheng, and T. Chen. Identification of linear dynamic systems operating in a networked environment. Automatica, 45(12):2763–2772, Dec. 2009.

[118] L. Y. Wang and G. G. Yin. Asymptotically efficient parameter estimation using quantized output observations. Automatica, 43(7):1178–1191, Jul. 2007.

[119] L. Y. Wang and G. G. Yin. Quantized identification with dependent noise and Fisher information ratio of communication channels. IEEE Trans. Aut. Control, 55(3):674–690, Mar. 2010.

[120] L. Y. Wang, G. G. Yin, J.-F. Zhang, and Y. Zhao. System identification with quantized observations. Birkhäuser, Boston, USA, 2010.

[121] S. Weisberg. Applied Linear Regression. Wiley, New York, 1980.

[122] D. Willner, C. B. Chang, and K. P. Dunn. Kalman filter algorithm for a multisensor system. In Proceedings of the 15th IEEE Conference on Decision and Control, pages 570–574, Clearwater, FL, December 1976.

[123] D.P. Wipf, B.D. Rao, and S. Nagarajan. Latent variable Bayesian models for promoting sparsity. IEEE Transactions on Information Theory (to appear), 2011.

[124] J.-J. Xiao, A. Ribeiro, Z.-Q. Luo, and G.B. Giannakis. Distributed compression-estimation using wireless sensor networks. IEEE Signal Processing Magazine, 23(4):27–41, 2006.

[125] L. Xiao, S. Lall, and S. Boyd. A scheme for robust distributed sensor fusion based on average consensus. In IPSN05 (International Conference on Information Processing in Sensor Networks), pages 63–70, Los Angeles, CA, April 2005.

[126] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49–67, 2006.

[127] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49–67, 2006.

[128] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67:301–320, 2005.