
System Identification · sysid/course/2018slides/... · Another example: Autoregressive noise model. Our noise model is: $v(k) = \sum_{i=0}^{\infty} a^i e(k-i)$, $|a| < 1$ for stability. So, $H(z) = \sum_{i=0}^{\infty}$ ...


System Identification

Lecture 10: Prediction error methods and pseudo-linear regressions

Roy Smith

2018-11-20 10.1

Prediction

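[Figure: block diagram; $u(k)$ drives $G(z)$, $e(k)$ drives $H(z)$ to produce $v(k)$, and the two outputs are summed to give $y(k)$.]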

Typical assumptions

$G(z)$ and $H(z)$ are stable,

$H(z)$ is stably invertible (no zeros outside the unit disk),

$e(k)$ has known statistics: known pdf or known moments.

One-step ahead prediction

Given $Z^K = \{u(0), y(0), \ldots, u(K-1), y(K-1)\}$, what is the best estimate of $y(K)$?


Prediction

[Figure: $e(k) \rightarrow H(z) \rightarrow v(k)$]

Noise model invertibility

Given $v(k)$, $k = 0, \ldots, K-1$, can we determine $e(k)$, $k = 0, \ldots, K-1$?

Inverse filter $H_{inv}(z)$:

$$e(k) = \sum_{i=0}^{\infty} h_{inv}(i)\,v(k-i).$$

We also want the inverse filter to be causal and stable:

$$h_{inv}(k) = 0 \ \text{for}\ k < 0, \qquad \text{and} \qquad \sum_{k=0}^{\infty} |h_{inv}(k)| < \infty.$$

If $H(z)$ has no zeros for $|z| \ge 1$, then

$$H_{inv}(z) = \frac{1}{H(z)}.$$
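The coefficients $h_{inv}(i)$ can be obtained by long division of $1/H(z)$, using $\sum_{j=0}^{k} h(j)\,h_{inv}(k-j) = \delta(k)$. The following is a minimal sketch that assumes, for illustration only, that $H(z)$ is FIR and given as a finite coefficient list; `inverse_impulse_response` is a hypothetical helper name. The second call shows what happens when $H(z)$ has a zero outside the unit disk: the inverse is no longer stable.

```python
def inverse_impulse_response(h, n):
    """Return [h_inv(0), ..., h_inv(n-1)] for H_inv(z) = 1/H(z).

    h is the coefficient list [h(0), h(1), ...] of an FIR H(z) (an
    assumption made for this sketch), with h(0) != 0.  The coefficients
    follow from sum_{j=0}^{k} h(j) h_inv(k-j) = delta(k).
    """
    h_inv = []
    for k in range(n):
        acc = 1.0 if k == 0 else 0.0
        for j in range(1, min(k, len(h) - 1) + 1):
            acc -= h[j] * h_inv[k - j]
        h_inv.append(acc / h[0])
    return h_inv


# Zero at z = -0.5 (inside the unit disk): coefficients decay.
print(inverse_impulse_response([1.0, 0.5], 6))  # [1.0, -0.5, 0.25, -0.125, 0.0625, -0.03125]
# Zero at z = -2 (outside the unit disk): coefficients grow without bound.
print(inverse_impulse_response([1.0, 2.0], 6))  # [1.0, -2.0, 4.0, -8.0, 16.0, -32.0]
```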


Prediction


One-step ahead prediction

Given measurements of $v(k)$, $k = 0, \ldots, K-1$, can we predict $v(K)$?

Assume that we know $H(z)$. How much can we say about $v(K)$?

Assume also that $H(z)$ is monic ($h(0) = 1$).

$$v(k) = \sum_{i=0}^{\infty} h(i)\,e(k-i) = e(k) + \underbrace{\sum_{i=1}^{\infty} h(i)\,e(k-i)}_{=\,m(k-1)\ \text{("observed")}}$$


Prediction


One-step ahead prediction

The prediction of $v(k)$ based on measurements up to time $k-1$ is denoted $\hat{v}(k|k-1)$. We will argue that a good choice in this case is

$$\hat{v}(k|k-1) = m(k-1) = \sum_{i=1}^{\infty} h(i)\,e(k-i).$$

The error in our prediction is then $e(k)$, which we clearly cannot reduce, since $e(k)$ is independent of the past.


One-step prediction statistics

General case

Say $e(k)$ is identically distributed with pdf $f_e(x)$:

$$\mathrm{Prob}\{x \le e(k) \le x + \delta x\} = \int_{x}^{x+\delta x} f_e(\xi)\,d\xi \approx f_e(x)\,\delta x.$$

A posteriori distribution

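What are the statistics of $v(k)$ given $v_{-\infty}^{k-1} = \{v(-\infty), \ldots, v(k-1)\}$?

$$\begin{aligned} \mathrm{Prob}\{x \le v(k) \le x + \delta x \mid v_{-\infty}^{k-1}\} &= \mathrm{Prob}\{x \le e(k) + m(k-1) \le x + \delta x\} \\ &= \mathrm{Prob}\{x - m(k-1) \le e(k) \le x - m(k-1) + \delta x\} \\ &\approx f_e(x - m(k-1))\,\delta x. \end{aligned}$$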


One-step ahead prediction statistics

Maximum of the conditional (a posteriori) distribution

Select the prediction estimate as the peak value of the conditional distribution:

$$\hat{v}(k|k-1) = \arg\max_{x} f_e(x - m(k-1)) = m(k-1) \quad \text{for the (zero-mean) Gaussian case.}$$

This is the most probable value of $v(k)$ given the past.

Mean of the conditional distribution

Select the prediction estimate as the mean value of the conditional distribution:

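$$\hat{v}(k|k-1) = E\{v(k) \mid v_{-\infty}^{k-1}\} = E\{e(k) + m(k-1) \mid v_{-\infty}^{k-1}\} = m(k-1) + E\{e(k)\} = m(k-1),$$

since $m(k-1)$ is known from the past and $e(k)$ is zero mean. This is the expected value of $v(k)$ given the past.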


One-step ahead prediction

Calculation

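$$\begin{aligned} \hat{v}(k|k-1) = m(k-1) &= \sum_{i=1}^{\infty} h(i)\,e(k-i) \\ &= (H(z) - 1)\,e(k) \qquad \text{(assuming } H(z) \text{ is monic)} \\ &= \frac{H(z) - 1}{H(z)}\,v(k) = (1 - H_{inv}(z))\,v(k) \\ &= -\sum_{i=1}^{\infty} h_{inv}(i)\,v(k-i). \end{aligned}$$

Note that $\hat{v}(k|k-1)$ depends only on values up to time $k-1$.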

With measurements available only from time 0, the best we can do is:

$$\hat{v}(k|k-1) = -\sum_{i=1}^{k} h_{inv}(i)\,v(k-i) \approx -\sum_{i=1}^{\infty} h_{inv}(i)\,v(k-i).$$
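The truncated predictor is a finite convolution of past noise samples with the inverse-filter coefficients. A minimal sketch, assuming $H(z)$ is monic (so $h_{inv}(0) = 1$) and that only the first few $h_{inv}(i)$ are supplied, with later ones treated as zero; `predict_noise` is a hypothetical name.

```python
def predict_noise(h_inv, v):
    """One-step-ahead predictions v_hat(k|k-1) for k = 0, ..., len(v)-1.

    Implements v_hat(k|k-1) = -sum_{i=1}^{k} h_inv(i) v(k-i), i.e. only data
    from time 0 onwards is used.  h_inv[0] is assumed to be 1 (monic H) and
    coefficients beyond len(h_inv) - 1 are taken as zero (an assumption).
    """
    v_hat = []
    for k in range(len(v)):
        s = 0.0
        for i in range(1, min(k, len(h_inv) - 1) + 1):
            s -= h_inv[i] * v[k - i]
        v_hat.append(s)
    return v_hat


# Illustrative H_inv(z) = 1 - 0.5 z^-1, so v_hat(k|k-1) = 0.5 v(k-1).
v = [2.0, 4.0, -2.0, 1.0]
print(predict_noise([1.0, -0.5], v))  # [0.0, 1.0, 2.0, -1.0]
```

Note that $\hat{v}(0|-1) = 0$: with no past data the predictor falls back to the mean of $e(k)$.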


Example

Moving average model

$$v(k) = e(k) + c\,e(k-1) \implies H(z) = 1 + c z^{-1}.$$

For $H(z)$ to be stably invertible we require $|c| < 1$. Then

$$H_{inv}(z) = \frac{1}{1 + c z^{-1}} = \sum_{i=0}^{\infty} (-c)^i z^{-i}.$$

One-step ahead predictor

$$\begin{aligned} \hat{v}(k|k-1) = (1 - H_{inv}(z))\,v(k) &= -\sum_{i=1}^{\infty} (-c)^i\,v(k-i) \\ &\approx -\sum_{i=1}^{k} (-c)^i\,v(k-i) \\ &= c\,v(k-1) - c^2 v(k-2) + c^3 v(k-3) + \cdots - (-c)^k v(0). \end{aligned}$$


Example

Moving average model

$$v(k) = e(k) + c\,e(k-1) \implies H(z) = 1 + c z^{-1}.$$

Recursive formulation

Note that

$$H(z)\,\hat{v}(k|k-1) = (H(z) - 1)\,v(k).$$

So,

$$\hat{v}(k|k-1) + c\,\hat{v}(k-1|k-2) = c\,v(k-1),$$

$$\hat{v}(k|k-1) = c\,\underbrace{\big(v(k-1) - \hat{v}(k-1|k-2)\big)}_{\varepsilon(k-1)\ \text{(prediction error at } k-1\text{)}} = c\,\varepsilon(k-1).$$
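The recursive form needs only the previous prediction error instead of a sum that grows with $k$. A small sketch with hypothetical function names that checks the two forms agree when the predictor starts from $\hat{v}(0|-1) = 0$, and that, for noise generated from rest ($e(-1) = 0$), the prediction error equals $e(k)$. The value of $c$ and the random seed are illustrative assumptions.

```python
import random


def ma1_predict_sum(c, v):
    """Truncated sum: v_hat(k|k-1) = -sum_{i=1}^{k} (-c)^i v(k-i)."""
    return [-sum((-c) ** i * v[k - i] for i in range(1, k + 1))
            for k in range(len(v))]


def ma1_predict_recursive(c, v):
    """Recursion: v_hat(k|k-1) = c * eps(k-1), eps(k) = v(k) - v_hat(k|k-1),
    started from v_hat(0|-1) = 0."""
    v_hat, eps_prev = [], 0.0
    for vk in v:
        vk_hat = c * eps_prev
        v_hat.append(vk_hat)
        eps_prev = vk - vk_hat
    return v_hat


rng = random.Random(0)
c = 0.6                      # illustrative value; |c| < 1 so H(z) is stably invertible
e = [rng.gauss(0.0, 1.0) for _ in range(20)]
v = [e[k] + (c * e[k - 1] if k > 0 else 0.0) for k in range(len(e))]  # e(-1) = 0

v_sum = ma1_predict_sum(c, v)
v_rec = ma1_predict_recursive(c, v)
print(max(abs(a - b) for a, b in zip(v_sum, v_rec)))             # rounding error only
print(max(abs((v[k] - v_rec[k]) - e[k]) for k in range(len(v))))  # rounding error only
```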


Another example

Autoregressive noise model

Our noise model is:

$$v(k) = \sum_{i=0}^{\infty} a^i\,e(k-i), \qquad |a| < 1 \ \text{for stability.}$$

So,

$$H(z) = \sum_{i=0}^{\infty} a^i z^{-i} = \frac{1}{1 - a z^{-1}},$$

and $H_{inv}(z) = 1 - a z^{-1}$ (a moving average process).

Our one-step ahead predictor is

$$\hat{v}(k|k-1) = (1 - H_{inv}(z))\,v(k) = a\,v(k-1).$$
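A quick numerical check of this example, as a sketch with hypothetical helper names: simulate $v(k) = a\,v(k-1) + e(k)$ from rest (equivalent to $H(z) = 1/(1 - a z^{-1})$), predict with $a\,v(k-1)$, and confirm that the prediction error is the white sequence $e(k)$. The value of $a$ and the seed are illustrative assumptions.

```python
import random


def ar1_noise(a, e):
    """v(k) = a v(k-1) + e(k) with v(-1) = 0, i.e. v = e / (1 - a z^-1)."""
    v, prev = [], 0.0
    for ek in e:
        prev = a * prev + ek
        v.append(prev)
    return v


def ar1_predict(a, v):
    """One-step-ahead predictor v_hat(k|k-1) = a v(k-1), with v(-1) = 0."""
    return [a * v[k - 1] if k > 0 else 0.0 for k in range(len(v))]


rng = random.Random(1)
a = 0.8                                   # illustrative value, |a| < 1 for stability
e = [rng.gauss(0.0, 1.0) for _ in range(50)]
v = ar1_noise(a, e)
v_hat = ar1_predict(a, v)
eps = [vk - vk_hat for vk, vk_hat in zip(v, v_hat)]
print(max(abs(x - y) for x, y in zip(eps, e)))   # rounding error only
```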


Output prediction

$$y(k) = G(z)u(k) + v(k)$$


One-step ahead prediction

Take the mean (expected value) of the conditional distribution:

$$\begin{aligned} \hat{y}(k|k-1) = E\{y(k) \mid Z^k\} &= G(z)u(k) + \hat{v}(k|k-1) \\ &= G(z)u(k) + (1 - H_{inv}(z))\,v(k) \\ &= H_{inv}(z)G(z)u(k) + (1 - H_{inv}(z))\,y(k). \end{aligned}$$


Output prediction

$$y(k) = G(z)u(k) + v(k)$$


Prediction error

$$\begin{aligned} y(k) - \hat{y}(k|k-1) &= -H_{inv}(z)G(z)u(k) + H_{inv}(z)y(k) \\ &= H_{inv}(z)\big(y(k) - G(z)u(k)\big) = H_{inv}(z)v(k) \\ &= e(k). \end{aligned}$$

The innovation $e(k)$ is the part of the output that cannot be predicted from past measurements.
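The prediction-error identity can be checked numerically. The sketch below assumes, for illustration only, $G(z) = b_1 z^{-1}/(1 + a_1 z^{-1})$ and $H(z) = 1 + c_1 z^{-1}$ with zero initial conditions. `lfilter` and `polymul` are small hand-written helpers; they mimic, but are not, the SciPy and NumPy functions of similar names.

```python
import random


def lfilter(b, a, x):
    """y = (B(z)/A(z)) x with zero initial conditions; requires a[0] == 1."""
    y = []
    for k in range(len(x)):
        acc = sum(b[i] * x[k - i] for i in range(len(b)) if k - i >= 0)
        acc -= sum(a[i] * y[k - i] for i in range(1, len(a)) if k - i >= 0)
        y.append(acc)
    return y


def polymul(p, q):
    """Coefficients (in powers of z^-1) of the product P(z) Q(z)."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


a1, b1, c1 = -0.7, 1.0, 0.5                  # illustrative (assumed) coefficients
A, B, C = [1.0, a1], [0.0, b1], [1.0, c1]    # G = B/A, H = C (monic)

rng = random.Random(2)
K = 40
u = [rng.choice([-1.0, 1.0]) for _ in range(K)]
e = [rng.gauss(0.0, 0.1) for _ in range(K)]
y = [gu + v for gu, v in zip(lfilter(B, A, u), lfilter(C, [1.0], e))]

# y_hat = H_inv G u + (1 - H_inv) y, with H_inv G = B/(C A), 1 - H_inv = (C - 1)/C.
C_minus_1 = [0.0] + C[1:]
y_hat = [p + q for p, q in zip(lfilter(B, polymul(C, A), u),
                               lfilter(C_minus_1, C, y))]
eps = [yk - yk_hat for yk, yk_hat in zip(y, y_hat)]
print(max(abs(x - ek) for x, ek in zip(eps, e)))   # rounding error only
```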


Prediction error based identification

The one-step ahead predictor is parametrised by θ,

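$$\hat{y}(k|\theta, Z^K) = H_{inv}(\theta,z)G(\theta,z)u(k) + (1 - H_{inv}(\theta,z))\,y(k).$$

Define a parametrised prediction error,

$$\varepsilon(k,\theta) = y(k) - \hat{y}(k|\theta),$$

which we can optionally filter,

$$\varepsilon_F(k,\theta) = F(z)\,\varepsilon(k,\theta) \qquad \text{(weighted error).}$$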

Define a cost function,

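$$J(\theta, Z^K) = \frac{1}{K}\sum_{k=0}^{K-1} l(\varepsilon_F(k,\theta)), \qquad \text{typically } l(\varepsilon_F(k,\theta)) = \|\varepsilon_F(k,\theta)\|^2.$$

$$\hat{\theta} = \arg\min_{\theta} J(\theta, Z^K).$$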


Prediction error methods: ARX models

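[Figure: ARX block diagram; $u(k) \rightarrow B(\theta,z)/A(\theta,z)$ and $e(k) \rightarrow 1/A(\theta,z) \rightarrow v(k)$, summed to give $y(k)$.]

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)}, \qquad H(\theta,z) = \frac{1}{A(\theta,z)},$$

$$\begin{aligned} \hat{y}(k|\theta) &= H_{inv}(\theta,z)G(\theta,z)u(k) + (1 - H_{inv}(\theta,z))\,y(k) \\ &= B(z)u(k) + (1 - A(z))\,y(k) \\ &= \theta^T\varphi(k) = \varphi^T(k)\,\theta. \end{aligned}$$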

So,

$$Y - \Phi\theta = \varepsilon \qquad \leftarrow \text{vector of prediction errors.}$$

The least squares regression approach minimises the sum of squared prediction errors, $\|\varepsilon\|^2$.
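A minimal NumPy sketch of the ARX case: build $\Phi$ with rows $\varphi^T(k) = [u(k-1)\ \ u(k-2)\ \ -y(k-1)\ \ -y(k-2)]$ (the same ordering, $\theta = [b_1\ b_2\ a_1\ a_2]^T$, as the regressor in the constrained minimisation code later in the lecture), assume the plant starts at rest, and solve the least-squares problem. The true coefficients, noise level and data length are illustrative assumptions, not values from the slides.

```python
import numpy as np


def arx_regressor(y, u, na, nb):
    """Rows phi(k)^T = [u(k-1) ... u(k-nb), -y(k-1) ... -y(k-na)];
    samples before k = 0 are taken as zero (plant at rest)."""
    K = len(y)
    Phi = np.zeros((K, nb + na))
    for k in range(K):
        for i in range(1, nb + 1):
            if k - i >= 0:
                Phi[k, i - 1] = u[k - i]
        for i in range(1, na + 1):
            if k - i >= 0:
                Phi[k, nb + i - 1] = -y[k - i]
    return Phi


# Illustrative true system: A = 1 - 1.5 z^-1 + 0.7 z^-2, B = z^-1 + 0.5 z^-2.
a_true, b_true = [-1.5, 0.7], [1.0, 0.5]
rng = np.random.default_rng(0)
K = 2000
u = rng.choice([-1.0, 1.0], size=K)              # PRBS-like input
e = 0.1 * rng.standard_normal(K)                  # white equation error
y = np.zeros(K)
for k in range(K):
    y[k] = e[k]
    for i in (1, 2):
        if k - i >= 0:
            y[k] += b_true[i - 1] * u[k - i] - a_true[i - 1] * y[k - i]

Phi = arx_regressor(y, u, na=2, nb=2)
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # minimises ||Y - Phi theta||^2
eps = y - Phi @ theta_hat
J = np.mean(eps ** 2)                                 # J(theta) = (1/K) sum eps^2
print(theta_hat)   # expected to be close to [1.0, 0.5, -1.5, 0.7]
print(J)           # expected to be close to the noise variance 0.01
```

Because the ARX predictor is linear in $\theta$, minimising $J$ reduces to a single linear least-squares solve; no iteration is needed.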


Model structures

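[Figure: general model structure; $u(k) \rightarrow G(\theta,z)$ and $e(k) \rightarrow H(\theta,z) \rightarrow v(k)$, summed to give $y(k)$.]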

ARX model structure (equation error)

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)}, \qquad H(\theta,z) = \frac{1}{A(\theta,z)}.$$


ARMAX model structure

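[Figure: ARMAX block diagram; $u(k) \rightarrow B(\theta,z)/A(\theta,z)$ and $e(k) \rightarrow C(\theta,z)/A(\theta,z) \rightarrow v(k)$, summed to give $y(k)$.]

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)}, \qquad H(\theta,z) = \frac{C(\theta,z)}{A(\theta,z)},$$

with $A(z)$, $C(z)$ monic.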

Prediction error structure

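$$\hat{y}(k|\theta) = \frac{B(z)}{C(z)}\,u(k) + \left(1 - \frac{A(z)}{C(z)}\right) y(k)$$

$$C(z)\,\hat{y}(k|\theta) = B(z)u(k) + (C(z) - A(z))\,y(k)$$

$$\hat{y}(k|\theta) = B(z)u(k) + (1 - A(z))\,y(k) + (C(z) - 1)\underbrace{\big(y(k) - \hat{y}(k|\theta)\big)}_{\varepsilon(k)}$$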


Pseudolinear regression

One-step ahead ARMAX predictor

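$$\begin{aligned} \hat{y}(k|\theta) &= B(z)u(k) + (1 - A(z))\,y(k) + (C(z) - 1)\,\varepsilon(k) \\ &= \begin{bmatrix} b_1 & \cdots & a_1 & \cdots & c_1 & \cdots \end{bmatrix} \begin{bmatrix} u(k-1) & \cdots & -y(k-1) & \cdots & \varepsilon(k-1) & \cdots \end{bmatrix}^T \\ &= \varphi^T(\theta,k)\,\theta. \end{aligned}$$

This is not linear in $\theta$: the regressor entries $\varepsilon(k-i) = y(k-i) - \hat{y}(k-i|\theta)$ themselves depend on $\theta$.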

Optimisation-based algorithm

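$$\min_{\theta,\,\varepsilon}\ \|\varepsilon\|^2 \qquad (\text{or, more generally, } l(\varepsilon))$$

$$\text{subject to} \quad Y = \Phi(\varepsilon)\,\theta + \varepsilon \qquad (\text{nonlinear equality constraint}).$$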
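The slides handle the nonlinear constraint with a general-purpose constrained optimiser (`fmincon` in the code below). A widely used alternative, swapped in here for illustration and not the method of the slides, is to iterate ordinary least squares on the pseudolinear regression: fix $\varepsilon$, solve for $\theta$, recompute $\varepsilon$ with the new $\theta$, and repeat (often called extended least squares). The first-order system, noise level, data length and iteration count are assumptions. Convergence is not guaranteed in general; it is expected for this mildly coloured example.

```python
import numpy as np


def phi_row(k, y, u, eps, na, nb, nc):
    """phi(k, theta)^T = [u(k-1..k-nb), -y(k-1..k-na), eps(k-1..k-nc)]."""
    row = np.zeros(nb + na + nc)
    for i in range(1, nb + 1):
        if k - i >= 0:
            row[i - 1] = u[k - i]
    for i in range(1, na + 1):
        if k - i >= 0:
            row[nb + i - 1] = -y[k - i]
    for i in range(1, nc + 1):
        if k - i >= 0:
            row[nb + na + i - 1] = eps[k - i]
    return row


def els_armax(y, u, na, nb, nc, iterations=10):
    """Iterated least squares on the ARMAX pseudolinear regression."""
    K = len(y)
    eps = np.zeros(K)                      # first pass is a plain ARX fit
    for _ in range(iterations):
        Phi = np.array([phi_row(k, y, u, eps, na, nb, nc) for k in range(K)])
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        # Recompute eps(k) = y(k) - phi(k, theta)^T theta recursively, so that
        # eps(k-1), ..., eps(k-nc) in phi come from the same theta.
        eps = np.zeros(K)
        for k in range(K):
            eps[k] = y[k] - phi_row(k, y, u, eps, na, nb, nc) @ theta
    return theta


# Illustrative first-order ARMAX system (zero initial conditions):
# (1 - 0.5 z^-1) y = 1.0 z^-1 u + (1 + 0.3 z^-1) e, theta = [b1, a1, c1].
rng = np.random.default_rng(1)
K = 5000
u = rng.choice([-1.0, 1.0], size=K)
e = 0.2 * rng.standard_normal(K)
y = np.zeros(K)
for k in range(K):
    y[k] = e[k]
    if k >= 1:
        y[k] += 0.5 * y[k - 1] + 1.0 * u[k - 1] + 0.3 * e[k - 1]

theta_hat = els_armax(y, u, na=1, nb=1, nc=1)
print(theta_hat)   # expected to be close to [1.0, -0.5, 0.3]
```

Each iteration is a linear least-squares solve, which is cheaper than a general constrained optimisation over the $6 + K$ unknowns $(\theta, \varepsilon)$, at the price of weaker convergence guarantees.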


ARMAX example

ARMAX model structure

$$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}, \qquad B(z) = b_1 z^{-1} + b_2 z^{-2}, \qquad C(z) = 1 + c_1 z^{-1} + c_2 z^{-2},$$

$$\theta = \begin{bmatrix} b_1 & b_2 & a_1 & a_2 & c_1 & c_2 \end{bmatrix}^T.$$

Experiments

- The plant is “at rest” (zero initial conditions).
- Data length $K = 31$.
- PRBS input signal, $u(k)$.


ARMAX example

Typical experimental data

[Figure: three panels showing $u(k)$, $y(k)$ and $v(k)$ versus index $k$, for $k = 0, \ldots, 30$.]


Constrained minimisation code

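% Create data part of regressor. Assume plant at rest (zero initial conditions).
% Assumes column vectors u and y of length K are already in the workspace.
PhiTyu = zeros(K,4);                 % row 1 stays [0, 0, 0, 0]
PhiTyu(2,:) = [u(1), 0, -y(1), 0];
for i = 3:K
    PhiTyu(i,:) = [u(i-1), u(i-2), -y(i-1), -y(i-2)];
end

% Decision vector x = [theta; e] with theta = [b1 b2 a1 a2 c1 c2]'.
x0 = zeros(6 + K, 1);                % initial guess (assumed; none is given on the slide)
[x,fval] = fmincon(@(x)ARMAXobjective(x), x0, ...
    [],[],[],[],[],[], @(x)ARMAXconstraint(x,y,PhiTyu));
theta_hat = x(1:6);

function [f] = ARMAXobjective(x)     % x = [theta; e]
    f = sqrt(x(7:end)'*x(7:end));    % ||e||_2
end

function [c,ceq] = ARMAXconstraint(x,y,PhiTyu)
    K = size(PhiTyu,1);              % data length, from the regressor size
    theta = x(1:6);                  % model parameters
    e = x(7:end);                    % prediction errors (decision variables)
    PhiTe = zeros(K,2);              % noise part of the regressor
    PhiTe(2,1) = e(1);
    for j = 3:K
        PhiTe(j,:) = [e(j-1), e(j-2)];
    end
    ceq = y - [PhiTyu, PhiTe]*theta - e;   % nonlinear equality constraint
    c = [];                                % no inequality constraints
end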


ARMAX example

Transfer function averages: 128 experiments (data length $K = 31$)

[Figure: "Mean estimate comparison, K = 31"; magnitude versus frequency for the curves Gz, Hz, Gopt and Hopt, with a second panel below.]


ARMAX example

Coefficient statistics for 128 experiments

[Figure: estimated values of the coefficients $b_1, b_2, a_1, a_2, c_1, c_2$ over the 128 experiments; experiment data length = 31 (green circles are the true values).]


ARMAX example

Coefficient error for averages of 2, 4, 8, ..., 128 experiments

[Figure: theta error (log scale) versus total data length (#avgs × expt length), for $b_1, b_2, a_1, a_2, c_1, c_2$.]


ARMAX example

Transfer function estimates: 128 experiments (data length $K = 31$)

[Figure: $|G(j\omega)|$ estimates (128) and mean estimate for $K = 31$; magnitude versus frequency (rad/sample).]


ARMAX example

Transfer function estimates: 128 experiments (data length $K = 31$)

[Figure: $|H(j\omega)|$ estimates (128) and mean estimate for $K = 31$; magnitude versus frequency (rad/sample).]


ARMAX example

Prediction errors and actual innovations (data length $K = 31$)

[Figure: "Actual innovations vs. optimum prediction error"; mean $\|e\|/\sqrt{N}$ and mean $\|\varepsilon\|/\sqrt{N}$ versus total data length (#avgs × expt length).]


ARMAX example

Longer experiments: $K = 127$

[Figure: three panels showing $u(k)$, $y(k)$ and $v(k)$ versus index $k$, for $k = 0, \ldots, 126$.]


ARMAX example

Coefficient statistics comparison: $K = 31$ and $K = 127$

[Figure: spread of the estimates of $b_1$, $a_1$ and $c_1$ for $K = 31$ and $K = 127$; vertical axis: coefficients.]


ARMAX example

Coefficient error comparison: $K = 31$ and $K = 127$

[Figure: theta error (log scale) versus total data length (#avgs × expt length), for $b_1, b_2, a_1, a_2, c_1, c_2$.]


ARMAX example

Prediction error comparison: $K = 31$ and $K = 127$

[Figure: "Actual innovations vs. optimum prediction error"; mean $\|e\|/\sqrt{N}$ and mean $\|\varepsilon\|/\sqrt{N}$ versus total data length (#avgs × expt length).]


ARMAX example

Transfer function estimates: 32 experiments (data length $K = 127$)

[Figure: $|G(j\omega)|$ estimates (32) and mean estimate for $K = 127$; magnitude versus frequency (rad/sample).]


ARMAX example

Transfer function estimates: 32 experiments (data length $K = 127$)

[Figure: $|H(j\omega)|$ estimates (32) and mean estimate for $K = 127$; magnitude versus frequency (rad/sample).]


ARARMAX model structure

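[Figure: ARARMAX block diagram with blocks $B(\theta,z)$, $1/A(\theta,z)$ and $C(\theta,z)/D(\theta,z)$; signals $u(k)$, $e(k)$, $v(k)$, $y(k)$.]

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)}, \qquad H(\theta,z) = \frac{C(\theta,z)}{A(\theta,z)D(\theta,z)},$$

with $A(z)$, $C(z)$ and $D(z)$ monic.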


Output error model structure

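[Figure: output error block diagram; $u(k) \rightarrow B(\theta,z)/F(\theta,z)$, with $e(k)$ added directly to give $y(k)$.]

$$G(\theta,z) = \frac{B(\theta,z)}{F(\theta,z)}, \qquad H(\theta,z) = 1,$$

with $F(z)$ monic.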

Pseudolinear predictor framework

$$\hat{y}(k|\theta) = \frac{B(\theta,z)}{F(\theta,z)}\,u(k) = \varphi(k,\theta)^T\theta,$$

where

$$\varphi(k,\theta)^T = \big[\,u(k-1)\ \cdots\ u(k-m)\quad \underbrace{-\hat{y}(k-1|\theta)\ \cdots\ -\hat{y}(k-n_f|\theta)}_{\text{pseudolinear terms}}\,\big].$$
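Because $H = 1$, the output-error predictor is a pure simulation of $B/F$ driven by $u$; the regressor re-uses the predictor's own past outputs rather than the measured $y$. A small sketch, assuming zero initial conditions and $\theta = [b_1 \cdots b_m\ \ f_1 \cdots f_{n_f}]^T$; `oe_predict` is a hypothetical name.

```python
def oe_predict(b, f, u):
    """Illustrative helper: y_hat(k|theta) = phi(k, theta)^T theta with
    phi(k, theta)^T = [u(k-1) ... u(k-m), -y_hat(k-1) ... -y_hat(k-nf)],
    theta = [b1 ... bm, f1 ... fnf]; values before k = 0 are zero."""
    theta = list(b) + list(f)
    y_hat = []
    for k in range(len(u)):
        phi = [u[k - i] if k - i >= 0 else 0.0 for i in range(1, len(b) + 1)]
        phi += [-y_hat[k - i] if k - i >= 0 else 0.0
                for i in range(1, len(f) + 1)]
        y_hat.append(sum(p * t for p, t in zip(phi, theta)))
    return y_hat


# Impulse response of B/F = z^-1 / (1 - 0.5 z^-1):
print(oe_predict([1.0], [-0.5], [1.0, 0.0, 0.0, 0.0]))  # [0.0, 1.0, 0.5, 0.25]
```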


Box-Jenkins model structure

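[Figure: Box-Jenkins block diagram; $u(k) \rightarrow B(\theta,z)/F(\theta,z)$ and $e(k) \rightarrow C(\theta,z)/D(\theta,z) \rightarrow v(k)$, summed to give $y(k)$.]

$$G(\theta,z) = \frac{B(\theta,z)}{F(\theta,z)}, \qquad H(\theta,z) = \frac{C(\theta,z)}{D(\theta,z)},$$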

Predictor

$$\hat{y}(k|\theta) = \frac{D(z)}{C(z)}\frac{B(z)}{F(z)}\,u(k) + \left(1 - \frac{D(z)}{C(z)}\right) y(k)$$


General model structure

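[Figure: general block diagram with blocks $B(\theta,z)/F(\theta,z)$, $1/A(\theta,z)$ and $C(\theta,z)/D(\theta,z)$; signals $u(k)$, $e(k)$, $v(k)$, $y(k)$.]

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)F(\theta,z)}, \qquad H(\theta,z) = \frac{C(\theta,z)}{A(\theta,z)D(\theta,z)}$$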

Predictor

$$\hat{y}(k|\theta) = \frac{D(z)}{C(z)}\frac{B(z)}{F(z)}\,u(k) + \left(1 - \frac{D(z)A(z)}{C(z)}\right) y(k)$$

A pseudo-linear regression can be derived.


Known noise model (with ARMAX dynamics)

Assume that the noise model is known (and stably invertible):

$$v(k) = L(z)e(k).$$

So

$$A(z)y(k) = B(z)u(k) + L(z)e(k).$$

Filter the signals via $L^{-1}(z)$:

$$y_L(k) = L^{-1}(z)y(k), \qquad u_L(k) = L^{-1}(z)u(k),$$

giving

$$A(z)y_L(k) = B(z)u_L(k) + e(k),$$

for which LS methods give consistent estimates.
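A minimal NumPy sketch of the prefiltering idea, assuming a first-order system and a known moving-average noise filter $L(z) = 1 + 0.9z^{-1}$ (both illustrative choices): filter $y$ and $u$ through $L^{-1}(z)$, then run ordinary least squares on the filtered ARX regression. `inverse_filter` and `ls_first_order` are hypothetical helpers.

```python
import numpy as np


def inverse_filter(l, x):
    """x_L = L(z)^-1 x for monic L = [1, l1, ...], zero initial conditions."""
    xL = np.zeros(len(x))
    for k in range(len(x)):
        acc = x[k]
        for i in range(1, len(l)):
            if k - i >= 0:
                acc -= l[i] * xL[k - i]
        xL[k] = acc
    return xL


def ls_first_order(y, u):
    """LS fit of y(k) = b1 u(k-1) - a1 y(k-1) + residual; returns [b1, a1]."""
    Phi = np.column_stack([u[:-1], -y[:-1]])
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return theta


# Illustrative system: (1 - 0.8 z^-1) y = 1.0 z^-1 u + (1 + 0.9 z^-1) e.
a1, b1, L = -0.8, 1.0, [1.0, 0.9]
rng = np.random.default_rng(2)
K = 5000
u = rng.choice([-1.0, 1.0], size=K)
e = rng.standard_normal(K)
y = np.zeros(K)
for k in range(K):
    y[k] = e[k]
    if k >= 1:
        y[k] += -a1 * y[k - 1] + b1 * u[k - 1] + L[1] * e[k - 1]

theta_plain = ls_first_order(y, u)     # generally biased: coloured equation error
theta_filt = ls_first_order(inverse_filter(L, y), inverse_filter(L, u))
print(theta_plain, theta_filt)         # theta_filt expected near [1.0, -0.8]
```

The same filter is applied to both signals, so the plant polynomials $A(z)$ and $B(z)$ are unchanged; only the equation error is whitened.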


High-order model fitting

Assume that the noise is autoregressive (ARARX structure):

$$A(z)y(k) = B(z)u(k) + \frac{1}{D(z)}\,e(k), \qquad e(k) \sim \mathcal{N}(0, \lambda).$$

Fit a high-order model (the order of $D(z)$ is $n_d$):

$$A(z)D(z)y(k) = B(z)D(z)u(k) + e(k).$$

A least squares estimate with orders $n + n_d$ and $m + n_d$ gives a consistent estimate of

$$\frac{B(z)D(z)}{A(z)D(z)} = \frac{B(z)}{A(z)}.$$

This amounts to making the noise model sufficiently rich to capture additional autoregressive features in the noise.

In practice the cancellation will not be exact: the estimated $A(z)$ and $B(z)$ polynomials will be of high order.
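A minimal NumPy sketch of the high-order fit, assuming (for illustration) $n = m = n_d = 1$ with $A(z) = 1 - 0.7z^{-1}$, $B(z) = z^{-1}$ and $D(z) = 1 - 0.5z^{-1}$. An ARX model with orders $n + n_d = 2$ and $m + n_d = 2$ is fitted by least squares; its coefficients estimate those of $A(z)D(z) = 1 - 1.2z^{-1} + 0.35z^{-2}$ and $B(z)D(z) = z^{-1} - 0.5z^{-2}$. The fitted $G$ is then compared with $B/A$ at one frequency.

```python
import numpy as np

# Illustrative ARARX system: (1 - 0.7 z^-1) y = z^-1 u + e / (1 - 0.5 z^-1).
rng = np.random.default_rng(3)
K = 10000
u = rng.choice([-1.0, 1.0], size=K)
e = 0.2 * rng.standard_normal(K)
v = np.zeros(K)
y = np.zeros(K)
for k in range(K):
    v[k] = e[k] + (0.5 * v[k - 1] if k >= 1 else 0.0)      # v = e / D
    y[k] = v[k] + (0.7 * y[k - 1] + u[k - 1] if k >= 1 else 0.0)

# Second-order ARX regression: phi(k)^T = [u(k-1), u(k-2), -y(k-1), -y(k-2)].
Phi = np.column_stack([u[1:-1], u[:-2], -y[1:-1], -y[:-2]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
print(theta_hat)   # expected near [1.0, -0.5, -1.2, 0.35]

# Compare G_hat = (BD)_hat / (AD)_hat with G = B/A at w = pi/2 (z^-1 = e^{-jw}).
zi = np.exp(-1j * np.pi / 2)
G_hat = (theta_hat[0] * zi + theta_hat[1] * zi**2) / (
    1 + theta_hat[2] * zi + theta_hat[3] * zi**2)
G_true = zi / (1 - 0.7 * zi)
print(abs(G_hat - G_true))   # small, although the D(z) factors do not cancel exactly
```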
