
Page 1

SYSTEM IDENTIFICATION

Ali Karimpour, Associate Professor

Ferdowsi University of Mashhad

Reference: “System Identification Theory For The User” Lennart Ljung

Page 2

Lecture 7

Parameter Estimation Methods

Topics to be covered include:

Guiding Principles Behind Parameter Estimation Method.

Minimizing Prediction Error.

Linear Regressions and the Least-Squares Method.

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method.

Correlating Prediction Errors with Past Data.

Instrumental Variable Methods.

Page 3

Parameter Estimation Method

Topics to be covered include:

Guiding Principles Behind Parameter Estimation Method.

Minimizing Prediction Error.

Linear Regressions and the Least-Squares Method.

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method.

Correlating Prediction Errors with Past Data.

Instrumental Variable Methods.

Page 4

Guiding Principles Behind Parameter Estimation Method

Parameter Estimation Method

Suppose that we have selected a certain model structure $\mathcal{M}$. The set of models is defined as:

$$\mathcal{M}^* = \{\mathcal{M}(\theta) \mid \theta \in D_\mathcal{M}\}$$

Suppose the system is:

$$y(t) = G(q,\theta)\,u(t) + H(q,\theta)\,e(t)$$

For each $\theta$, the model represents a way of predicting future outputs. The predictor is a linear filter:

$$\mathcal{M}(\theta):\quad \hat{y}(t\,|\,\theta) = H^{-1}(q,\theta)\,G(q,\theta)\,u(t) + \left[1 - H^{-1}(q,\theta)\right]y(t)$$
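To make the predictor concrete, here is a minimal numerical sketch (not from the original slides): for an ARX structure with $G = B/A$ and $H = 1/A$, the predictor reduces to $\hat{y}(t|\theta) = B(q)u(t) + [1 - A(q)]y(t)$. The polynomial coefficients and noise level below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lfilter

# Illustrative ARX model: A(q) y = B(q) u + e, with G = B/A and H = 1/A.
# Then H^{-1} G = B and 1 - H^{-1} = 1 - A, so the predictor becomes
#   y_hat(t|theta) = B(q) u(t) + [1 - A(q)] y(t)   (a finite-impulse-response filter).
A = np.array([1.0, -1.5, 0.7])   # A(q) = 1 - 1.5 q^-1 + 0.7 q^-2  (assumed values)
B = np.array([0.0, 1.0, 0.5])    # B(q) =       1.0 q^-1 + 0.5 q^-2

rng = np.random.default_rng(0)
N = 200
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = lfilter(B, A, u) + lfilter([1.0], A, e)        # simulate the true system

# One-step-ahead prediction: filter u by B(q) and y by 1 - A(q).
y_hat = lfilter(B, [1.0], u) + lfilter(np.r_[0.0, -A[1:]], [1.0], y)
eps = y - y_hat                                     # prediction errors, should be close to e
print("mean squared prediction error:", np.mean(eps**2))
```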

Page 5

Guiding Principles Behind Parameter Estimation Method

Suppose that we collect a set of data from the system:

$$Z^N = \{y(1),\,u(1),\,y(2),\,u(2),\,\dots,\,y(N),\,u(N)\}$$

Formally, we are going to find a map from the data $Z^N$ to the set $D_\mathcal{M}$:

$$Z^N \;\longrightarrow\; \hat\theta_N \in D_\mathcal{M}$$

Such a mapping is a parameter estimation method.

Page 6

Guiding Principles Behind Parameter Estimation Method

Evaluating the candidate model

Let us define the prediction error as:

$$\varepsilon(t,\theta) = y(t) - \hat{y}(t\,|\,\theta)$$

When the data set $Z^N$ is known, these errors can be computed for $t = 1, 2, \dots, N$.

A guiding principle for parameter estimation is:

Based on $Z^t$ we can compute the prediction error $\varepsilon(t,\theta)$. Select $\hat\theta_N$ so that the prediction errors $\varepsilon(t,\hat\theta_N)$, $t = 1, 2, \dots, N$, become as small as possible.

We describe two approaches:

• Form a scalar-valued criterion function that measures the size of ε.

• Make $\varepsilon(t,\hat\theta_N)$ uncorrelated with a given data sequence.

Page 7

Parameter Estimation Method

Topics to be covered include:

Guiding Principles Behind Parameter Estimation Method.

Minimizing Prediction Error.

Linear Regressions and the Least-Squares Method.

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method.

Correlating Prediction Errors with Past Data.

Instrumental Variable Methods.

Page 8

Minimizing Prediction Error

Clearly, the prediction error

$$\varepsilon(t,\theta) = y(t) - \hat{y}(t\,|\,\theta)$$

can be computed once the data $Z^N$ are known.

Let us filter the prediction error through a stable linear filter L(q):

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

Then use the following norm:

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big)$$

where l(·) is a scalar-valued positive function.

The estimate is then defined by:

$$\hat\theta_N = \hat\theta_N(Z^N) = \arg\min_{\theta \in D_\mathcal{M}} V_N(\theta, Z^N)$$
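As an illustration of this criterion (a sketch, not part of the slides), the code below minimizes the quadratic prediction-error criterion with L(q) = 1 for a first-order output-error model. The model structure, the parameter values and the choice of SciPy's Nelder-Mead optimizer are assumptions made only for this example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.signal import lfilter

# A minimal PEM sketch: quadratic criterion l(eps) = eps^2/2, L(q) = 1,
# for an output-error model y(t) = b q^-1 / (1 + f q^-1) u(t) + e(t),
# whose predictor is y_hat(t|theta) = G(q,theta) u(t).  Values are illustrative.
rng = np.random.default_rng(1)
N = 500
u = rng.standard_normal(N)
b0, f0 = 2.0, -0.8                                   # "true" parameters (assumed)
y = lfilter([0.0, b0], [1.0, f0], u) + 0.1 * rng.standard_normal(N)

def V(theta, u, y):
    b, f = theta
    y_hat = lfilter([0.0, b], [1.0, f], u)           # one-step predictor for the OE model
    eps = y - y_hat
    return 0.5 * np.mean(eps**2)                      # V_N(theta, Z^N)

theta_hat = minimize(V, x0=[1.0, -0.5], args=(u, y), method="Nelder-Mead").x
print("estimated [b, f]:", theta_hat)                 # should be close to [2.0, -0.8]
```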

Page 9

Minimizing Prediction Error

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big)$$

$$\hat\theta_N = \hat\theta_N(Z^N) = \arg\min_{\theta \in D_\mathcal{M}} V_N(\theta, Z^N)$$

Generally, the term prediction-error identification methods (PEM) is used for this family of approaches.

Particular methods with specific names are used according to:

• Choice of l(.)

• Choice of L(.)

• Choice of model structure

• Method by which the minimization is realized

Page 10

Minimizing Prediction Error

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big)$$

$$\hat\theta_N = \hat\theta_N(Z^N) = \arg\min_{\theta \in D_\mathcal{M}} V_N(\theta, Z^N)$$

Choice of L

The effect of L is best understood in a frequency-domain interpretation. Thus L acts like frequency weighting.

See also >> 14.4 Prefiltering

Exercise 7-1: Consider the following system

$$y(t) = G(q,\theta)\,u(t) + H(q,\theta)\,e(t)$$

Show that the effect of prefiltering by L is identical to changing the noise model from $H(q,\theta)$ to

$$H_L(q,\theta) = L^{-1}(q)\,H(q,\theta)$$

Page 11

Minimizing Prediction Error

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big)$$

$$\hat\theta_N = \hat\theta_N(Z^N) = \arg\min_{\theta \in D_\mathcal{M}} V_N(\theta, Z^N)$$

Choice of l

A standard choice, which is convenient both for computation and analysis:

$$l(\varepsilon) = \frac{1}{2}\varepsilon^2$$

See also >> 15.2 Choice of norms: Robustness (against bad data)

One can also parameterize the norm independent of the model parameterization.

Page 12

Parameter Estimation Method

Topics to be covered include:

Guiding Principles Behind Parameter Estimation Method.

Minimizing Prediction Error.

Linear Regressions and the Least-Squares Method.

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method.

Correlating Prediction Errors with Past Data.

Instrumental Variable Methods.

Page 13

Linear Regressions and the Least-Squares Method

We introduced linear regressions before as:

$$\hat{y}(t\,|\,\theta) = \varphi^T(t)\,\theta + \mu(t)$$

φ is the regression vector; for the ARX structure it is

$$\varphi(t) = \big[-y(t-1)\ \cdots\ -y(t-n_a)\ \ u(t-1)\ \cdots\ u(t-n_b)\big]^T$$

μ(t) is a known, data-dependent term. For simplicity we set it to zero in the remainder of this section.

Least-squares criterion

Prediction error is:

$$\varepsilon(t,\theta) = y(t) - \varphi^T(t)\,\theta$$

Now let L(q) = 1 and l(ε) = ε²/2; then

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

This is the least-squares criterion for the linear regression.

Page 14

Linear Regressions and the Least-Squares Method

Least-squares criterion

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

The least-squares estimate (LSE) is:

$$\hat\theta_N^{LS} = \arg\min_\theta V_N(\theta, Z^N) = \left[\underbrace{\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,\varphi^T(t)}_{R(N)}\right]^{-1}\underbrace{\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,y(t)}_{f(N)}$$

$$\hat\theta_N^{LS} = R^{-1}(N)\,f(N)$$
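A minimal numerical sketch of this estimate (added for illustration, not from the slides): a first-order ARX model is simulated and $\hat\theta_N^{LS}$ is computed by solving the normal equations in least-squares form. All numerical values are assumed.

```python
import numpy as np

# Least-squares sketch for a first-order ARX model
#   y(t) + a y(t-1) = b u(t-1) + e(t),  regressor phi(t) = [-y(t-1), u(t-1)]^T.
rng = np.random.default_rng(2)
N = 400
u = rng.standard_normal(N)
a0, b0 = -0.7, 1.5                              # "true" parameters (assumed)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t-1] + b0 * u[t-1] + 0.1 * rng.standard_normal()

Phi = np.column_stack([-y[:-1], u[:-1]])        # rows are phi^T(t)
Y = y[1:]
# theta_hat = R(N)^{-1} f(N), computed in a numerically stable way:
theta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
print("estimated [a, b]:", theta_hat)            # close to [-0.7, 1.5]
```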

Page 15

Linear Regressions and the Least-Squares Method

We introduced the linear regression before as:

$$y(t) = \varphi^T(t)\,\theta + e(t)$$

Least-squares criterion:

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\big[y(t) - \varphi^T(t)\,\theta\big]^2 = \frac{1}{2N}\big(Y_N - \Phi_N\theta\big)^T\big(Y_N - \Phi_N\theta\big)$$

where $Y_N = [y(1)\ \cdots\ y(N)]^T$ and $\Phi_N$ is the matrix whose rows are $\varphi^T(t)$.

$$\hat\theta_N^{LS} = \arg\min_\theta V_N(\theta, Z^N) = \left[\sum_{t=1}^{N}\varphi(t)\,\varphi^T(t)\right]^{-1}\sum_{t=1}^{N}\varphi(t)\,y(t) = \big(\Phi_N^T\Phi_N\big)^{-1}\Phi_N^T Y_N$$

Suppose $\Phi_N^T\Phi_N$ is invertible, the noise components are uncorrelated with the regressors, and e(t) is white.

Under the above assumptions the LSE is BLUE (best linear unbiased estimator).

Page 16

Linear Regressions and the Least-Squares Method

We introduced the linear regression before as:

$$y(t) = \varphi^T(t)\,\theta + e(t)$$

Least-squares estimate:

$$\hat\theta_N^{LS} = \big(\Phi_N^T\Phi_N\big)^{-1}\Phi_N^T Y_N$$

Suppose $\Phi_N^T\Phi_N$ is invertible, the noise components are uncorrelated with the regressors, and e(t) is white.

Under the above assumptions the LSE is BLUE (best linear unbiased estimator):

Linear means $\hat\theta = T\,Y_N$.

Unbiased means $E\{\hat\theta\} = \theta$.

Best means $\mathrm{cov}\{\hat\theta\}$ is the minimum possible covariance.

Page 17

Linear Regressions and the Least-Squares Method

We write the linear regression in matrix form as:

$$Y_N = \Phi_N\theta + e$$

Linear means $\hat\theta = T\,Y_N$.

Condition for a linear unbiased estimate? Unbiased means $E\{\hat\theta\} = \theta$.

$$E\{\hat\theta\} = E\{T\,Y_N\} = E\{T\Phi_N\theta + Te\} = T\Phi_N\theta + E\{Te\}$$

If $T\Phi_N = I$ and T is uncorrelated with e, then the estimator is unbiased.

Clearly the LSE is unbiased.

Page 18

Linear Regressions and the Least-Squares Method

We write the linear regression in matrix form as:

$$Y_N = \Phi_N\theta + e$$

Linear means $\hat\theta = T\,Y_N$.

Condition for the best linear unbiased estimate?

$$\mathrm{cov}\{\hat\theta\} = E\big\{(\hat\theta - \theta)(\hat\theta - \theta)^T\big\} = E\big\{(T\,Y_N - \theta)(T\,Y_N - \theta)^T\big\} = E\big\{(T\Phi_N\theta + Te - \theta)(T\Phi_N\theta + Te - \theta)^T\big\}$$

If the estimator is unbiased then $T\Phi_N = I$, so:

$$\mathrm{cov}\{\hat\theta\} = E\big\{(Te)(Te)^T\big\} = T\,E\{ee^T\}\,T^T$$

If the estimator is unbiased then T is uncorrelated with e, so:

$$\mathrm{cov}\{\hat\theta\} = T\,\mathrm{cov}(e)\,T^T = \sigma^2\,T\,T^T$$

We must minimize it.

Page 19

Linear Regressions and the Least-Squares Method

$$\min_T\ \mathrm{cov}\{\hat\theta\} = \sigma^2\,T\,T^T \quad \text{subject to}\quad T\Phi_N = I$$

Exercise 7-2: Show that the solution of the above optimization is:

$$T = \big(\Phi_N^T\Phi_N\big)^{-1}\Phi_N^T$$

So the LSE is BLUE, since:

$$\hat\theta_N^{LS} = \big(\Phi_N^T\Phi_N\big)^{-1}\Phi_N^T Y_N$$

Exercise 7-3: Show that with the LSE in a linear regression one can find an unbiased estimate of cov{e} by

$$\hat\sigma_e^2 = \frac{2N}{N-d}\,V_N\big(\hat\theta_N^{LS}, Z^N\big)$$

where d is the dimension of θ.
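A Monte Carlo sketch of these claims (not on the slide): for an assumed linear regression with white Gaussian noise, the average of $\hat\theta$ over many realizations should approach θ₀, and the average of the variance estimate above should approach the true noise variance.

```python
import numpy as np

# Monte Carlo check of unbiasedness and of sigma_hat^2 = 2N/(N-d) * V_N(theta_hat).
# Dimensions and numerical values below are illustrative assumptions.
rng = np.random.default_rng(3)
N, d, sigma = 100, 3, 0.5
theta0 = np.array([1.0, -2.0, 0.5])
Phi = rng.standard_normal((N, d))              # fixed, deterministic regressors

est, var_est = [], []
for _ in range(2000):
    e = sigma * rng.standard_normal(N)
    Y = Phi @ theta0 + e
    theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
    eps = Y - Phi @ theta_hat
    V_N = 0.5 * np.mean(eps**2)
    est.append(theta_hat)
    var_est.append(2 * N / (N - d) * V_N)      # noise-variance estimate

print("mean of theta_hat:", np.mean(est, axis=0))      # ~ theta0
print("mean of sigma^2 estimate:", np.mean(var_est))   # ~ 0.25
```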

Page 20

Linear Regressions and the Least-Squares Method

Weighted Least Squares

Different measurements could be assigned different weights:

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\alpha_t\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

or

$$V_N(\theta, Z^N) = \sum_{t=1}^{N}\beta(N,t)\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

$$\hat\theta_N^{LS} = \left[\sum_{t=1}^{N}\beta(N,t)\,\varphi(t)\,\varphi^T(t)\right]^{-1}\sum_{t=1}^{N}\beta(N,t)\,\varphi(t)\,y(t)$$

The resulting estimate has the same structure as before.
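A small sketch of weighted least squares, assuming an exponential-forgetting weight profile β(N,t) = λ^(N−t); this particular choice of weights is an illustration, not from the slides.

```python
import numpy as np

# Weighted LS for a first-order ARX model with forgetting-factor weights.
rng = np.random.default_rng(4)
N = 300
u = rng.standard_normal(N)
a0, b0 = -0.6, 1.2                              # assumed "true" parameters
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t-1] + b0 * u[t-1] + 0.1 * rng.standard_normal()

Phi = np.column_stack([-y[:-1], u[:-1]])
Y = y[1:]
lam = 0.99
beta = lam ** np.arange(len(Y) - 1, -1, -1)     # beta(N, t): larger weight for recent t

R = (Phi * beta[:, None]).T @ Phi               # sum_t beta(N,t) phi(t) phi^T(t)
f = (Phi * beta[:, None]).T @ Y                 # sum_t beta(N,t) phi(t) y(t)
theta_hat = np.linalg.solve(R, f)
print("weighted LS estimate [a, b]:", theta_hat)
```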

Page 21

Linear Regressions and the Least-Squares Method

Colored Equation-error Noise

We showed that in a difference equation

$$y(t) + a_1 y(t-1) + \dots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \dots + b_{n_b} u(t-n_b) + v(t)$$

if the disturbance v(t) is not white noise, then the LSE will not converge to the true values $a_i$ and $b_i$. To deal with this problem, we may incorporate further modeling of the equation error v(t), as discussed in chapter 4; let us say

$$v(t) = k(q)\,e(t)$$

Now e(t) is white noise, but the new model takes us out of the LS environment, except in two cases:

• Known noise properties

• High-order models

Page 22

Linear Regressions and the Least-Squares Method

Colored Equation-error Noise: Known noise properties

$$y(t) + a_1 y(t-1) + \dots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \dots + b_{n_b} u(t-n_b) + v(t), \qquad v(t) = k(q)\,e(t)$$

Suppose the values of $a_i$ and $b_i$ are unknown, but k is a known filter (not too realistic a situation), so we have

$$A(q)\,y(t) = B(q)\,u(t) + k(q)\,e(t)$$

Filtering through $k^{-1}(q)$ gives

$$A(q)\,y_F(t) = B(q)\,u_F(t) + e(t)$$

where

$$y_F(t) = k^{-1}(q)\,y(t), \qquad u_F(t) = k^{-1}(q)\,u(t)$$

Since e(t) is white, the LS method can be applied without problems.

Notice that this is equivalent to applying the filter $L(q) = k^{-1}(q)$.
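A numerical sketch of this procedure (all values assumed for illustration): the equation error is colored by a known first-order filter k(q); filtering y and u through k⁻¹(q) and then applying ordinary LS recovers the parameters.

```python
import numpy as np
from scipy.signal import lfilter

# Known-noise-properties case: A(q) y = B(q) u + k(q) e with k(q) = 1/(1 - c q^-1) known.
rng = np.random.default_rng(5)
N = 2000
u = rng.standard_normal(N)
a0, b0, c = -0.7, 1.0, 0.9                      # assumed system and noise-filter values
e = 0.2 * rng.standard_normal(N)
v = lfilter([1.0], [1.0, -c], e)                # colored equation error v = k(q) e
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t-1] + b0 * u[t-1] + v[t]

yF = lfilter([1.0, -c], [1.0], y)               # y_F = k^{-1}(q) y
uF = lfilter([1.0, -c], [1.0], u)               # u_F = k^{-1}(q) u

Phi = np.column_stack([-yF[:-1], uF[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, yF[1:], rcond=None)
print("LS on filtered data [a, b]:", theta_hat)  # close to [-0.7, 1.0]
```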

Page 23

Linear Regressions and the Least-Squares Method

Colored Equation-error Noise: High-order models

$$y(t) + a_1 y(t-1) + \dots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \dots + b_{n_b} u(t-n_b) + v(t), \qquad v(t) = k(q)\,e(t)$$

Suppose that the noise v can be well described by $k(q) = 1/D(q)$, where D(q) is a polynomial of order r. So we have

$$A(q)\,y(t) = B(q)\,u(t) + \frac{1}{D(q)}\,e(t)$$

or

$$A(q)D(q)\,y(t) = B(q)D(q)\,u(t) + e(t)$$

Now we can apply the LS method. Note that $n_A = n_a + r$ and $n_B = n_b + r$.

After estimating A(q)D(q) and B(q)D(q), one can easily derive A and B, since

$$\frac{B(q)D(q)}{A(q)D(q)} = \frac{B(q)}{A(q)}$$
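A sketch of the high-order-model idea with assumed first-order A, B and D (so r = 1): fitting an ARX model of orders (n_a + r, n_b + r) by ordinary LS estimates the products A(q)D(q) and B(q)D(q).

```python
import numpy as np
from scipy.signal import lfilter

# True system A y = B u + (1/D) e with A = 1 - 0.7 q^-1, B = q^-1, D = 1 - 0.9 q^-1
# (illustrative values).  Fit an order-(2, 2) ARX model by LS.
rng = np.random.default_rng(8)
N = 5000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
A, B, D = [1.0, -0.7], [0.0, 1.0], [1.0, -0.9]
AD = np.polymul(A, D)            # true A(q)D(q) = [1, -1.6, 0.63]
BD = np.polymul(B, D)            # true B(q)D(q) = [0, 1, -0.9]
# simulate: AD y = BD u + e   <=>   y = (BD/AD) u + (1/AD) e
y = lfilter(BD, AD, u) + lfilter([1.0], AD, e)

# LS fit of the high-order ARX model (white equation error, so LS is consistent)
Phi = np.column_stack([-y[1:-1], -y[:-2], u[1:-1], u[:-2]])
theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
print("estimated A(q)D(q):", [1.0, theta[0], theta[1]])   # ~ [1, -1.6, 0.63]
print("estimated B(q)D(q):", [theta[2], theta[3]])        # ~ [1, -0.9]
```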

Page 24

Topics to be covered include:

Guiding Principles Behind Parameter Estimation Method.

Minimizing Prediction Error.

Linear Regressions and the Least-Squares Method.

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method.

Correlating Prediction Errors with Past Data.

Instrumental Variable Methods.

Parameter Estimation Method

Page 25

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Estimation and the Principle of Maximum Likelihood

The area of statistical inference deals with the problem of extracting information from observations that themselves could be unreliable.

Suppose that the observation $y^N = (y(1), y(2), \dots, y(N))$ has the following probability density function (PDF):

$$f_y(\theta;\, x_1, x_2, \dots, x_N) = f_y(\theta;\, x^N)$$

That is:

$$P(y^N \in A) = \int_{x^N \in A} f_y(\theta;\, x^N)\,dx^N$$

θ is a d-dimensional parameter vector. The purpose of the observation is in fact to estimate the vector θ using $y^N$, via an estimator

$$\hat\theta(y^N):\ \mathbb{R}^N \to \mathbb{R}^d$$

Suppose the observed value of $y^N$ is $y_*^N$; then

$$\hat\theta^* = \hat\theta(y_*^N)$$

Page 26

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Estimation and the Principle of Maximum Likelihood

$$\hat\theta(y^N):\ \mathbb{R}^N \to \mathbb{R}^d$$

Many such estimator functions are possible. A particular one is the maximum likelihood estimator (MLE).

The probability that the realization (= observation) indeed should take the value $y_*^N$ is proportional to

$$f_y(\theta;\, y_*^N)$$

This is a deterministic function of θ once the numerical value $y_*^N$ is inserted, and it is called the likelihood function. A reasonable estimator of θ could then be

$$\hat\theta_{ML}(y_*^N) = \arg\max_\theta f_y(\theta;\, y_*^N)$$

where the maximization is performed for fixed $y_*^N$. This function is known as the maximum likelihood estimator (MLE).

Page 27

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Example 7-1: Let y(i), i = 1, ..., N, be independent random variables with normal distribution, unknown mean θ₀ and known variances λᵢ:

$$y(i) \sim N(\theta_0, \lambda_i)$$

A common estimator is the sample mean:

$$\hat\theta_{SM}(y^N) = \frac{1}{N}\sum_{i=1}^{N} y(i)$$

To calculate the MLE, we start by determining the joint PDF for the observations. The PDF for y(i) is:

$$\frac{1}{\sqrt{2\pi\lambda_i}}\exp\!\left(-\frac{(x_i - \theta)^2}{2\lambda_i}\right)$$

The joint PDF for the observations is (since the y(i) are independent):

$$f_y(\theta;\, x^N) = \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\!\left(-\frac{(x_i - \theta)^2}{2\lambda_i}\right)$$

Page 28

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Example 7-1 (continued): Let y(i), i = 1, ..., N, be independent random variables with normal distribution, unknown mean θ₀ and known variances λᵢ.

A common estimator is the sample mean:

$$\hat\theta_{SM}(y^N) = \frac{1}{N}\sum_{i=1}^{N} y(i)$$

The joint PDF for the observations is (since the y(i) are independent):

$$f_y(\theta;\, x^N) = \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\!\left(-\frac{(x_i - \theta)^2}{2\lambda_i}\right)$$

So the likelihood function is $f_y(\theta;\, y^N)$.

Maximizing the likelihood function is the same as maximizing its logarithm. So

$$\hat\theta_{ML}(y^N) = \arg\max_\theta \log f_y(\theta;\, y^N) = \arg\max_\theta\left[-\frac{N}{2}\log 2\pi - \frac{1}{2}\sum_{i=1}^{N}\log\lambda_i - \frac{1}{2}\sum_{i=1}^{N}\frac{\big(y(i)-\theta\big)^2}{\lambda_i}\right]$$

$$\hat\theta_{ML}(y^N) = \frac{1}{\sum_{i=1}^{N} 1/\lambda_i}\,\sum_{i=1}^{N}\frac{y(i)}{\lambda_i}$$

Page 29

[Figure: the two estimators plotted over 10 different experiments]

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Example 7-1 (continued): Let y(i), i = 1, ..., N, be independent random variables with normal distribution, unknown mean θ₀ and known variances λᵢ: $y(i) \sim N(\theta_0, \lambda_i)$.

Suppose N = 15 and y(i) is generated randomly (normal distribution) such that the mean is 10 and the variances are:

10, 2, 3, 4, 61, 11, 0.1, 121, 10, 1, 6, 9, 11, 13, 15

The estimated means for 10 different experiments are shown in the figure, for

$$\hat\theta_{SM}(y^N) = \frac{1}{N}\sum_{i=1}^{N} y(i) \qquad\text{and}\qquad \hat\theta_{ML}(y^N) = \frac{1}{\sum_{i=1}^{N} 1/\lambda_i}\,\sum_{i=1}^{N}\frac{y(i)}{\lambda_i}$$

Exercise 7-4: Do the same procedure for other experiments and draw the corresponding figure.

Exercise 7-5: Do the same procedure for other experiments and draw the corresponding figure. Suppose all variances equal 10.
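An exercise-style sketch (using the variance values assumed on this slide): comparing the sample mean with the variance-weighted ML estimate for one realization.

```python
import numpy as np

# Example 7-1 in code: y(i) ~ N(theta0, lambda_i) with known, unequal variances.
rng = np.random.default_rng(6)
theta0 = 10.0
lam = np.array([10, 2, 3, 4, 61, 11, 0.1, 121, 10, 1, 6, 9, 11, 13, 15], dtype=float)

y = theta0 + np.sqrt(lam) * rng.standard_normal(lam.size)
theta_sm = y.mean()                                # sample mean
theta_ml = np.sum(y / lam) / np.sum(1.0 / lam)     # variance-weighted ML estimate
print("sample mean:", theta_sm, "  ML estimate:", theta_ml)
```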

Page 30

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Relationship to the Maximum A Posteriori (MAP) Estimate

Maximum likelihood estimator (MLE):

$$\hat\theta_{ML}(y_*^N) = \arg\max_\theta f_y(\theta;\, y_*^N)$$

The Bayesian approach leads to another parameter estimation method. In the Bayesian approach the parameter itself is thought of as a random variable.

Let the prior PDF for θ be $g_\theta(z)$.

After some manipulation,

$$P(\theta \,|\, y^N) \propto f_y(\theta;\, y^N)\,g_\theta(\theta)$$

so the maximum a posteriori (MAP) estimate is:

$$\hat\theta_{MAP}(y^N) = \arg\max_\theta f_y(\theta;\, y^N)\,g_\theta(\theta)$$

Page 31

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Cramer-Rao Inequality

The quality of an estimator can be assessed by its mean-square error matrix:

$$P = E\Big\{\big[\hat\theta(y^N) - \theta_0\big]\big[\hat\theta(y^N) - \theta_0\big]^T\Big\}, \qquad \theta_0 = \text{true value of } \theta$$

We may be interested in selecting estimators that make P small. The Cramer-Rao inequality gives a lower bound for P.

Let $E\,\hat\theta(y^N) = \theta_0$. Then

$$P = E\Big\{\big[\hat\theta(y^N) - \theta_0\big]\big[\hat\theta(y^N) - \theta_0\big]^T\Big\} \;\ge\; M^{-1}$$

where M is the Fisher information matrix,

$$M = E\left\{\left[\frac{\partial}{\partial\theta}\log f_y(\theta;\, y^N)\right]\left[\frac{\partial}{\partial\theta}\log f_y(\theta;\, y^N)\right]^T\right\}\Bigg|_{\theta=\theta_0}$$

An estimator is efficient if $P = M^{-1}$.

Exercise 7-6: Prove the Cramer-Rao inequality.

Page 32

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Asymptotic Properties of the MLE

Calculation of

$$P = E\Big\{\big[\hat\theta(y^N) - \theta_0\big]\big[\hat\theta(y^N) - \theta_0\big]^T\Big\}$$

is not an easy task. Therefore, limiting properties as the sample size tends to infinity are calculated instead.

For the MLE, in the case of independent observations, Wald and Cramer obtained the following.

Suppose that the random variables {y(i)} are independent and identically distributed, so that

$$f_y(\theta;\, x_1, x_2, \dots, x_N) = \prod_{i=1}^{N} f_{y(i)}(\theta;\, x_i)$$

Suppose also that the distribution of $y^N$ is given by $f_y(\theta_0;\, x^N)$ for some value θ₀. Then $\hat\theta_{ML}(y^N)$ tends to θ₀ with probability 1 as N tends to infinity, and

$$\sqrt{N}\,\big[\hat\theta_{ML}(y^N) - \theta_0\big]$$

converges in distribution to the normal distribution with zero mean and covariance matrix given by the Cramer-Rao lower bound $M^{-1}$.

Page 33

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Likelihood function for Probabilistic Models of Dynamical Systems

Probabilistic Models of Dynamical Systems

Suppose

$$\mathcal{M}(\theta):\quad \hat{y}(t\,|\,\theta) = g(t, Z^{t-1};\theta), \qquad \varepsilon(t,\theta) = y(t) - \hat{y}(t\,|\,\theta)$$

where ε is independent and has the PDF $f_e(x, t;\theta)$.

Recall that this kind of model is called a complete probabilistic model.

We note that the output is:

$$y(t) = \hat{y}(t\,|\,\theta) + \varepsilon(t,\theta), \quad\text{where } \varepsilon(t,\theta) \text{ has the PDF } f_e(x, t;\theta)$$

Now we must determine the likelihood function $f_y(\theta;\, y^N)$.

Page 34

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

$$f_y(\theta;\, y^N) = \;?$$

Lemma: Suppose $u^t$ is given as a deterministic sequence, and assume that the generation of $y^t$ is described by the model

$$y(t) = g(t, Z^{t-1};\theta) + \varepsilon(t)$$

where the conditional PDF of ε(t), given $Z^{t-1}$, is $f_e(x, t;\theta)$. Then the joint probability density function for $y^t$, given $u^t$, is:

$$f_m(\theta;\, y^t \,|\, u^t) = \prod_{k=1}^{t} f_e\big(y(k) - g(k, Z^{k-1};\theta),\, k;\, \theta\big) \qquad \text{(I)}$$

Proof: The conditional PDF of y(t), given $Z^{t-1}$, is

$$p(x_t \,|\, Z^{t-1}) = f_e\big(x_t - g(t, Z^{t-1};\theta),\, t;\, \theta\big)$$

Using Bayes's rule, the joint conditional PDF of y(t) and y(t-1), given $Z^{t-2}$, can be expressed as:

$$p(x_t, x_{t-1} \,|\, Z^{t-2}) = p\big(x_t \,|\, y(t-1) = x_{t-1},\, Z^{t-2}\big)\cdot p\big(x_{t-1} \,|\, Z^{t-2}\big) = f_e\big(x_t - g(t, Z^{t-1};\theta),\, t;\, \theta\big)\cdot f_e\big(x_{t-1} - g(t-1, Z^{t-2};\theta),\, t-1;\, \theta\big)$$

Similarly we derive (I).

Page 35

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Probabilistic Models of Dynamical Systems

Suppose

$$\mathcal{M}(\theta):\quad \hat{y}(t\,|\,\theta) = g(t, Z^{t-1};\theta), \qquad \varepsilon(t,\theta) = y(t) - \hat{y}(t\,|\,\theta)$$

where ε is independent and has the PDF $f_e(x, t;\theta)$.

Now we must determine the likelihood function. By the previous lemma,

$$f_y(\theta;\, y^N) = \prod_{t=1}^{N} f_e\big(y(t) - g(t, Z^{t-1};\theta),\, t;\, \theta\big) = \prod_{t=1}^{N} f_e\big(\varepsilon(t,\theta),\, t;\, \theta\big)$$

Maximizing this function is the same as maximizing

$$\frac{1}{N}\log f_y(\theta;\, y^N) = \frac{1}{N}\sum_{t=1}^{N}\log f_e\big(\varepsilon(t,\theta),\, t;\, \theta\big)$$

If we define

$$l\big(\varepsilon, \theta, t\big) = -\log f_e(\varepsilon, t;\theta)$$

Page 36

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

Probabilistic Models of Dynamical Systems

Maximizing this function is the same as maximizing

$$\frac{1}{N}\log f_y(\theta;\, y^N) = \frac{1}{N}\sum_{t=1}^{N}\log f_e\big(\varepsilon(t,\theta),\, t;\, \theta\big)$$

If we define

$$l\big(\varepsilon, \theta, t\big) = -\log f_e(\varepsilon, t;\theta)$$

we may write

$$\hat\theta_{ML}(y^N) = \arg\min_\theta \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon(t,\theta),\, \theta,\, t\big)$$

The ML method can thus be seen as a special case of the PEM.

Exercise 7-7: Find the Fisher information matrix for this system.

Exercise 7-8: Derive a lower bound for $\mathrm{cov}\,\hat\theta_N$.
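As a worked illustration of this connection (not on the original slide), assume the prediction errors are Gaussian with a fixed, θ-independent variance λ. Then

$$f_e(\varepsilon, t;\theta) = \frac{1}{\sqrt{2\pi\lambda}}\exp\!\left(-\frac{\varepsilon^2}{2\lambda}\right) \;\Longrightarrow\; l(\varepsilon, \theta, t) = -\log f_e = \frac{1}{2}\log(2\pi\lambda) + \frac{\varepsilon^2}{2\lambda}$$

so, up to constants and a scaling that do not depend on θ, the ML criterion coincides with the quadratic PEM criterion $\frac{1}{N}\sum_t \frac{1}{2}\varepsilon^2(t,\theta)$, i.e. L(q) = 1 and l(ε) = ε²/2.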

Page 37

Topics to be covered include:

Guiding Principles Behind Parameter Estimation Method.

Minimizing Prediction Error.

Linear Regressions and the Least-Squares Method.

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method.

Correlating Prediction Errors with Past Data.

Instrumental Variable Methods.

Parameter Estimation Method

Page 38

Correlating Prediction Errors with Past Data

Ideally, the prediction error ε(t,θ) for a good model should be independent of the past data $Z^{t-1}$.

If ε(t,θ) is correlated with $Z^{t-1}$, then there was more information available in $Z^{t-1}$ about y(t) than was picked up by $\hat{y}(t\,|\,\theta)$.

To test whether ε(t,θ) is independent of the data set $Z^{t-1}$, we would have to check that all transformations of ε(t,θ) are uncorrelated with all possible functions of $Z^{t-1}$. This is of course not feasible in practice.

Instead, we may select a certain finite-dimensional vector sequence {ζ(t)} derived from $Z^{t-1}$ and a certain transformation of {ε(t,θ)}, and require them to be uncorrelated. This would give

$$\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,\varepsilon(t,\theta) = 0$$

The θ so derived would be the best estimate based on the observed data.

Page 39

Correlating Prediction Errors with Past Data

Choose a linear filter L(q) and let

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

Choose a sequence of correlation vectors

$$\zeta(t,\theta) = \zeta(t, Z^{t-1}, \theta)$$

Choose a function α(ε) and define

$$f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\,\alpha\big(\varepsilon_F(t,\theta)\big)$$

Then calculate

$$\hat\theta_N = \mathrm{sol}_{\theta \in D_\mathcal{M}}\big[f_N(\theta, Z^N) = 0\big]$$

The instrumental-variable method (next section) is the best-known representative of this family.

Page 40

Correlating Prediction Errors with Past Data

$$f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\,\alpha\big(\varepsilon_F(t,\theta)\big), \qquad \hat\theta_N = \mathrm{sol}_{\theta \in D_\mathcal{M}}\big[f_N(\theta, Z^N) = 0\big]$$

Normally, the dimension of ζ would be chosen so that $f_N$ is a d-dimensional vector. Then there are as many equations as unknowns.

Sometimes one uses ζ with a higher dimension than d, so that there is an overdetermined set of equations, typically without an exact solution. Then

$$\hat\theta_N = \arg\min_{\theta \in D_\mathcal{M}} \big\| f_N(\theta, Z^N) \big\|$$

Exercise 7-9: Show that the prediction-error estimate obtained from

$$\hat\theta_N = \hat\theta_N(Z^N) = \arg\min_{\theta \in D_\mathcal{M}} V_N(\theta, Z^N)$$

can also be seen as a correlation estimate for a particular choice of L, ζ and α.

Page 41

Correlating Prediction Errors with Past Data

Pseudolinear Regressions

We saw in chapter 4 that a number of common prediction models could be written as:

$$\hat{y}(t\,|\,\theta) = \varphi^T(t,\theta)\,\theta$$

Since the pseudo-regression vector φ(t,θ) contains relevant past data, it is reasonable to require the resulting prediction errors to be uncorrelated with φ(t,θ). With $\zeta(t,\theta) = \varphi(t,\theta)$ this gives

$$\hat\theta_N^{PLR} = \mathrm{sol}\left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t,\theta)\big(y(t) - \varphi^T(t,\theta)\,\theta\big) = 0\right]$$

which we term the PLR (pseudolinear regression) estimate.

Page 42

Topics to be covered include:

Guiding Principles Behind Parameter Estimation Method.

Minimizing Prediction Error.

Linear Regressions and the Least-Squares Method.

A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method.

Correlating Prediction Errors with Past Data.

Instrumental Variable Methods.

Parameter Estimation Method

Page 43

Instrumental Variable Methods

Consider the linear regression

$$\hat{y}(t\,|\,\theta) = \varphi^T(t)\,\theta$$

The least-squares estimate of θ is given by

$$\hat\theta_N^{LS} = \mathrm{sol}\left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\big(y(t) - \varphi^T(t)\,\theta\big) = 0\right]$$

So it is a correlation-type estimate with L(q) = 1, α(ε) = ε and ζ(t,θ) = φ(t).

Now suppose that the data are actually described by

$$y(t) = \varphi^T(t)\,\theta_0 + v_0(t)$$

We found in section 7.3 that the LSE will not tend to θ₀ as N → ∞ in typical cases.

Page 44

Instrumental Variable Methods

$$\hat\theta_N^{LS} = \mathrm{sol}\left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\big(y(t) - \varphi^T(t)\,\theta\big) = 0\right], \qquad y(t) = \varphi^T(t)\,\theta_0 + v_0(t)$$

We found in section 7.3 that the LSE will not tend to θ₀ as N → ∞ in typical cases. Instead, replace φ(t) by a correlation vector ζ(t):

$$\hat\theta_N^{IV} = \mathrm{sol}\left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\big(y(t) - \varphi^T(t)\,\theta\big) = 0\right]$$

Such an application to a linear regression is called the instrumental-variable (IV) method. The elements of ζ are then called instruments or instrumental variables.

The estimate is:

$$\hat\theta_N^{IV} = \left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,y(t)$$

Page 45

Instrumental Variable Methods

$$\hat\theta_N^{LS} = \mathrm{sol}\left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\big(y(t) - \varphi^T(t)\,\theta\big) = 0\right]$$

We found in section 7.3 that the LSE will not tend to θ₀ as N → ∞ in typical cases.

$$\hat\theta_N^{IV} = \mathrm{sol}\left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\big(y(t) - \varphi^T(t)\,\theta\big) = 0\right] = \left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,y(t)$$

Does $\hat\theta_N^{IV} \to \theta_0$ as $N \to \infty$ in the IV method?

Exercise 7-10: Show that $\hat\theta_N^{IV}$ exists and tends to θ₀ if the following conditions hold:

$$E\big\{\zeta(t)\,\varphi^T(t)\big\}\ \text{is nonsingular}, \qquad E\big\{\zeta(t)\,v_0(t)\big\} = 0$$

Page 46

Instrumental Variable Methods

So we need:

$$\text{(I)}\quad E\big\{\zeta(t)\,\varphi^T(t)\big\}\ \text{nonsingular}, \qquad \text{(II)}\quad E\big\{\zeta(t)\,v_0(t)\big\} = 0$$

Consider an ARX model:

$$y(t) + a_1 y(t-1) + \dots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \dots + b_{n_b} u(t-n_b) + v(t)$$

A natural idea is to generate the instruments so as to secure (II) while also considering (I):

$$\zeta(t) = K(q)\big[-x(t-1)\ \cdots\ -x(t-n_a)\ \ u(t-1)\ \cdots\ u(t-n_b)\big]^T$$

where K is a linear filter and x(t) is generated through a linear system

$$N(q)\,x(t) = M(q)\,u(t)$$

Most instruments used in practice are generated in this way. Conditions (I) and (II) are then satisfied. Why?
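A sketch of this IV recipe (the numerical values, the auxiliary model and the choice K(q) = 1 are assumptions made for illustration): instruments are built from a signal x generated by filtering u through an auxiliary model, so they are uncorrelated with the colored disturbance v₀, whereas plain LS is biased here.

```python
import numpy as np
from scipy.signal import lfilter

# Data: y(t) = -a0 y(t-1) + b0 u(t-1) + v0(t) with colored v0 (LS is biased).
# Instruments zeta(t) = [-x(t-1), u(t-1)]^T with x from an auxiliary model driven by u.
rng = np.random.default_rng(7)
Ns = 3000
u = rng.standard_normal(Ns)
a0, b0 = -0.7, 1.0                                                  # assumed true values
v0 = lfilter([1.0], [1.0, -0.9], 0.3 * rng.standard_normal(Ns))     # colored noise
y = np.zeros(Ns)
for t in range(1, Ns):
    y[t] = -a0 * y[t-1] + b0 * u[t-1] + v0[t]

x = lfilter([0.0, 1.0], [1.0, -0.5], u)         # noise-free signal from an auxiliary model

Phi  = np.column_stack([-y[:-1], u[:-1]])       # regressors
Zeta = np.column_stack([-x[:-1], u[:-1]])       # instruments (depend on u only)
Y = y[1:]

theta_ls = np.linalg.lstsq(Phi, Y, rcond=None)[0]
theta_iv = np.linalg.solve(Zeta.T @ Phi, Zeta.T @ Y)
print("LS (biased):", theta_ls, "  IV:", theta_iv)   # IV should be near [-0.7, 1.0]
```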