Perspectives on System Identification
Lennart Ljung
Linköping University, Sweden




The Problem

• Flight tests with Gripen at high alpha
• Person in magnet camera, stabilizing a pendulum by thinking "right"-"left"
• fMRI picture of brain

The Confusion

Support Vector Machines * Manifold learning * prediction error method * Partial Least Squares * Regularization * Local Linear Models * Neural Networks * Bayes method * Maximum Likelihood * Akaike's Criterion * The Frisch Scheme * MDL * Errors In Variables * MOESP * Realization Theory * Closed Loop Identification * Cramér-Rao * Identification for Control * N4SID * Experiment Design * Fisher Information * Local Linear Models * Kullback-Leibler Distance * Maximum Entropy * Subspace Methods * Kriging * Gaussian Processes * Ho-Kalman * Self Organizing Maps * Quinlan's algorithm * Local Polynomial Models * Direct Weight Optimization * PCA * Canonical Correlations * RKHS * Cross Validation * co-integration * GARCH * Box-Jenkins * Output Error * Total Least Squares * ARMAX * Time Series * ARX * Nearest neighbors * Vector Quantization * VC-dimension * Rademacher averages * Manifold Learning * Local Linear Embedding * Linear Parameter Varying Models * Kernel smoothing * Mercer's Conditions * The Kernel trick * ETFE * Blackman-Tukey * GMDH * Wavelet Transform * Regression Trees * Yule-Walker equations * Inductive Logic Programming * Machine Learning * Perceptron * Backpropagation * Threshold Logic * LS-SVM * Generalization * CCA * M-estimator * Boosting * Additive Trees * MART * MARS * EM algorithm * MCMC * Particle Filters * PRIM * BIC * Innovations form * AdaBoost * ICA * LDA * Bootstrap * Separating Hyperplanes * Shrinkage * Factor Analysis * ANOVA * Multivariate Analysis * Missing Data * Density Estimation * PEM

This Talk

Two objectives:
• Place System Identification on the global map. Who are our neighbours in this part of the universe?
• Discuss some open areas in System Identification.

The communities

Constructing (mathematical) models from data is a prime problem in many scientific fields and many application areas.

Many communities and cultures around the area have grown, with their own nomenclatures and their own "social lives".

This has created a very rich, and somewhat confusing, plethora of methods and approaches for the problem.


A picture: There is a core of central material, encircled by the different communities

The core

Model
• Model Class
• Complexity (Flexibility)

Estimation

So we need to meet the data with a prejudice!

All data contain information and misinformation (“Signal and noise”)

Squeeze out the relevant information in data, but NOT MORE!

Estimation Prejudices

Nature is Simple!
Occam's razor
God is subtle, but He is not malicious (Einstein)

So, conceptually:

Ex: Akaike:

Regularization:
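As a concrete illustration of complexity-penalized criteria like the one named above, the sketch below uses Akaike's criterion in the common form n·log(RSS/n) + 2k to select a polynomial model order. The quadratic data-generating system and noise level are assumptions for the example, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(-1, 1, n)
# Assumed true system: a quadratic plus white measurement noise
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.3 * rng.standard_normal(n)

def aic(order):
    """AIC = n*log(RSS/n) + 2k for a polynomial fit of the given order."""
    coeffs = np.polyfit(x, y, order)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = order + 1                      # number of estimated parameters
    return n * np.log(rss / n) + 2 * k

aics = [aic(p) for p in range(9)]
best = int(np.argmin(aics))            # order with the smallest criterion
```

Underfitting (orders 0 and 1) is punished through a large residual sum of squares, while overfitting (high orders) is punished through the 2k penalty term.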

Estimation and Validation

So don't be impressed by a good fit to data in a flexible model set!

Bias and Variance

MSE = BIAS (B) + VARIANCE (V)

Error = Systematic + Random

This bias/variance tradeoff is at the heart of estimation!
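A minimal Monte Carlo sketch of this decomposition (the quadratic truth, noise level, and model orders are illustrative assumptions): data are drawn repeatedly from the same system, and the prediction error at a test point is split into squared bias plus variance for a too-rigid and a too-flexible model.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
truth = lambda t: 1.0 - 2.0 * t**2     # assumed true system
x0 = 0.5                               # test point
n_runs = 500

def mse_parts(order):
    """Squared bias and variance of a polynomial fit's prediction at x0."""
    preds = []
    for _ in range(n_runs):
        y = truth(x) + 0.4 * rng.standard_normal(x.size)
        preds.append(np.polyval(np.polyfit(x, y, order), x0))
    preds = np.array(preds)
    bias2 = (preds.mean() - truth(x0)) ** 2
    var = preds.var()
    return bias2, var

b_lo, v_lo = mse_parts(1)    # too rigid: high bias, low variance
b_hi, v_hi = mse_parts(10)   # too flexible: low bias, high variance
```

The rigid model's error is dominated by the systematic (bias) term, the flexible model's by the random (variance) term, which is exactly the tradeoff the slide points at.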

Information Contents in Data and the CR Inequality

The Communities Around the Core I

Statistics: the mother area
• EM algorithm for ML estimation
• Resampling techniques (bootstrap …)
• Regularization: LARS, Lasso …

Statistical learning theory
• Convex formulations, SVM (support vector machines)
• VC-dimensions

Machine learning
• Grown out of artificial intelligence: logical trees, self-organizing maps
• More and more influence from statistics: Gaussian processes, HMM, Bayesian nets

The Communities Around the Core II

Manifold learning
• Observed data belong to a high-dimensional space
• The action takes place on a lower-dimensional manifold: find that!

Chemometrics
• High-dimensional data spaces (many process variables)
• Find linear low-dimensional subspaces that capture the essential state: PCA, PLS (Partial Least Squares), …

Econometrics
• Volatility clustering
• Common roots for variations

The Communities Around the Core IIIThe Communities Around the Core IIIData miningData mining

Sort through large data bases looking for information: Sort through large data bases looking for information: ANN, NN, Trees, SVD…ANN, NN, Trees, SVD…

Google, Business, Finance…Google, Business, Finance…Artificial neural networksArtificial neural networks

Origin: Rosenblatt's perceptronOrigin: Rosenblatt's perceptron Flexible parametrization of hyper-Flexible parametrization of hyper-

surfacessurfacesFitting ODE coefficients to dataFitting ODE coefficients to data

No statistical framework: No statistical framework: Just link ODE/DAE solvers to Just link ODE/DAE solvers to optimizersoptimizers

System IdentificationSystem Identification Experiment designExperiment design Dualities between time- and frequency domainsDualities between time- and frequency domains

System Identification - Past and Present

Two basic avenues, both laid out in the 1960's:
• Statistical route: ML etc.: Åström-Bohlin 1965
  • Prediction error framework: postulate a predictor and apply curve-fitting
• Realization-based techniques: Ho-Kalman 1966
  • Construct/estimate states from data and apply LS (subspace methods)

Past and Present:
• Useful model structures
• Adapt and adopt the core's fundamentals
• Experiment design …
  • … with intended model use in mind ("identification for control")

System Identification - Future: Open Areas

• Spend more time with our neighbours! (Report from a visit later on)
• Model reduction and system identification
• Issues in identification of nonlinear systems
• Meet demands from industry
• Convexification: formulate the estimation task as a convex optimization problem

Model Reduction

System Identification is really "System Approximation" and therefore closely related to Model Reduction.

Model Reduction is a separate area with an extensive literature ("another satellite"), which can be more seriously linked to the system identification field.

• Linear systems - linear models
  • Divide, conquer and reunite (outputs)!
• Nonlinear systems - linear models
  • Understand the linear approximation - is it good for control?
• Nonlinear systems - nonlinear reduced models
  • Much work remains

Linear Systems - Linear Models: Divide - Conquer - Reunite!

Helicopter data: 1 pulse input; 8 outputs (only 3 shown here).

A state-space model of order 20 is wanted.

First fit all 8 outputs at the same time.

Next fit 8 SISO models of order 12, one for each output.

Linear Systems - Linear Models: Divide - Conquer - Reunite!

Now, concatenate the 8 SISO models, reduce the 96th-order model to order 20, and run some more iterations:

mm = [m1; …; m8]; mr = balred(mm, 20); model = pem(zd, mr); compare(zd, model)

Linear Models from Nonlinear Systems

Model reduction from nonlinear to linear could be surprising!

Nonlinear Systems

• A user's guide to nonlinear model structures suitable for identification and control
• Unstable nonlinear systems, stabilized by unknown regulator
• Stability handle on NL black-box models

Industrial Demands

• Data mining in large historical process databases ("K, M, G, T, P")
• A serious integration of physical modeling and identification (not just parameter optimization in simulation software)

PM 12, Stora Enso Borlänge: 75000 control signals, 15000 control loops.

All process variables, sampled at 1 Hz for 100 years = 0.1 PByte.

Industrial Demands: Simple Models

Simple models/experiments for certain aspects of complex systems:
• Use input that enhances the aspects …
• … and also conceals irrelevant features

Steady-state gain for arbitrary systems: use a constant input!

Nyquist curve at phase crossover: use relay feedback experiments.

But more can be done … Hjalmarsson et al: "Cost of Complexity".
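The constant-input recipe can be sketched numerically; the stable first-order system below is an illustrative assumption, not a system from the talk. Driving it with a constant input and waiting for the transient to die out recovers the steady-state (DC) gain without identifying the full model.

```python
import numpy as np

# Illustrative stable first-order system: y[k+1] = a*y[k] + b*u[k]
a, b = 0.9, 0.5
dc_gain = b / (1 - a)          # theoretical steady-state gain = 5.0

u = 1.0                        # constant input
y = 0.0
for _ in range(200):           # run until the transient has died out
    y = a * y + b * u

estimated_gain = y / u         # settled output over constant input
```

The same idea underlies the relay experiment for the phase-crossover point: the experiment is designed so that only the aspect of interest is excited.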

An Example of a Specific Aspect

Estimate a non-minimum-phase zero in complex systems (without estimating the whole system) - for control limitations.

A NMP zero at for an arbitrary system can be estimated by using the input

Example: 100 complex systems, all with a zero at 2, are estimated as 2nd-order FIR models.

System Identification - Future: Open Areas

• Spend more time with our neighbours! (Report from a visit later on)
• Model reduction and system identification
• Issues in identification of nonlinear systems
• Meet demands from industry
• Convexification: formulate the estimation task as a convex optimization problem

Convexification I

Are local minima an inherent feature of a model structure?

Example: Michaelis-Menten kinetics

Massage the equations: the result is a linear regression that relates the unknown parameters and measured variables. We can thus find them by a simple least-squares procedure. We have, in a sense, convexified the problem.

Is this a general property? Yes, any identifiable structure can be rearranged as a linear regression (Ritt's algorithm).
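A sketch of this massaging, assuming the standard Michaelis-Menten rate law v = Vmax·s/(Km + s) (the numerical values are illustrative): multiplying through by (Km + s) gives v·s = Vmax·s - Km·v, which is linear in the unknowns (Vmax, Km) given measured (s, v), so ordinary least squares solves it without any nonlinear iterations.

```python
import numpy as np

# Synthetic, noise-free Michaelis-Menten data (illustrative values)
Vmax_true, Km_true = 2.0, 0.5
s = np.linspace(0.1, 5.0, 50)          # substrate concentrations
v = Vmax_true * s / (Km_true + s)      # reaction rates

# Multiply through by (Km + s):  v*Km + v*s = Vmax*s
# =>  v*s = Vmax*s - Km*v   -- a linear regression with
# regressors (s, -v) and parameters (Vmax, Km)
A = np.column_stack([s, -v])
Vmax_est, Km_est = np.linalg.lstsq(A, v * s, rcond=None)[0]
```

The nonlinear least-squares problem in (Vmax, Km) has been replaced by a convex one; with noisy data the rearrangement changes the noise weighting, which is part of why this is only the starting point of the convexification story.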

Convexification II: Manifold Learning

1. X: original regressors
2. g(x): nonlinear, nonparametric re-coordinatization
3. Z: new regressor, possibly of lower dimension
4. h(z): simple convex map
5. Y: goal variable (output)
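The five-step pipeline above can be sketched with a linear stand-in: PCA plays the role of g(x) (the actual proposal is nonlinear and nonparametric), and ordinary least squares is the simple convex map h(z). All data here are synthetic assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 1: X, original regressors living near a 1-D subspace of R^3
t = rng.uniform(-1, 1, 100)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.standard_normal((100, 3))

# Steps 2-3: g(x) via PCA (a linear stand-in for the nonlinear
# re-coordinatization): project onto the dominant direction -> Z
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[0]                     # new 1-D regressor

# Steps 4-5: h(z), a simple convex least-squares map from Z to Y
Y = 3 * t + 0.01 * rng.standard_normal(100)
theta = np.polyfit(Z, Y, 1)
residual = Y - np.polyval(theta, Z)
```

The hard, nonconvex part is pushed into the recoordinatization g; once Z is found, estimating h is a convex problem.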

Narendra-Li's Example

Conclusions

System identification is a mature subject …
… same age as IFAC, with the longest-running symposium series …
… and much progress has allowed important industrial applications …
… but it still has an exciting and bright future!

Epilogue: The name of the game …

Thanks

Research: Martin Enqvist, Torkel Glad, Håkan Hjalmarsson, Henrik Ohlsson, Jacob Roll

Discussions: Bart de Moor, Johan Schoukens, Rik Pintelon, Paul van den Hof

Comments on paper: Michel Gevers, Manfred Deistler, Martin Enqvist, Jacob Roll, Thomas Schön

Comments on presentation: Martin Enqvist, Håkan Hjalmarsson, Kalle Johansson, Ulla Salaneck, Thomas Schön, Ann-Kristin Ljung

Special effects: Effektfabriken AB, Sciss AB
