System Identification
Lecture 4: Transfer function averaging and smoothing
Roy Smith
2019-10-4 4.1
Averaging

Multiple estimates

Multiple experiments: $u_r(k)$, $y_r(k)$, $r = 1, \dots, R$, and $k = 0, \dots, K-1$.

Multiple estimates (ETFE):
$$\hat{G}_r(e^{j\omega_n}) = \frac{Y_r(e^{j\omega_n})}{U_r(e^{j\omega_n})}$$
(we drop the $N$ from the $Y_N(e^{j\omega_n})$ notation).

Averaging to improve the estimate:
$$\hat{G}(e^{j\omega_n}) = \sum_{r=1}^{R} \alpha_r \hat{G}_r(e^{j\omega_n}), \quad \text{with} \quad \sum_{r=1}^{R} \alpha_r = 1.$$

How to choose $\alpha_r$? The plain "average" corresponds to $\alpha_r = 1/R$.
Averaging

Optimal weighted average

If the variance of $\hat{G}_r(e^{j\omega_n})$ is $\sigma_r^2(e^{j\omega_n})$ (with uncorrelated errors and identical means), then
$$\mathrm{var}\left(\hat{G}(e^{j\omega_n})\right) = \mathrm{var}\left(\sum_{r=1}^{R} \alpha_r(e^{j\omega_n})\, \hat{G}_r(e^{j\omega_n})\right) = \sum_{r=1}^{R} \alpha_r^2\, \sigma_r^2(e^{j\omega_n}).$$

This is minimized by
$$\alpha_r(e^{j\omega_n}) = \frac{1/\sigma_r^2(e^{j\omega_n})}{\displaystyle\sum_{r=1}^{R} 1/\sigma_r^2(e^{j\omega_n})}.$$

If $\mathrm{var}\left(\hat{G}_r(e^{j\omega_n})\right) = \dfrac{\phi_v(e^{j\omega_n})}{\frac{1}{N}\left|U_r(e^{j\omega_n})\right|^2}$, then
$$\alpha_r(e^{j\omega_n}) = \frac{\left|U_r(e^{j\omega_n})\right|^2}{\displaystyle\sum_{r=1}^{R} \left|U_r(e^{j\omega_n})\right|^2}.$$
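The inverse-variance weighting rule above can be checked with a quick Monte Carlo simulation. This is only a sketch: the per-experiment variances $\sigma_r^2$ below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = np.array([1.0, 4.0, 9.0])             # invented variances sigma_r^2
R, trials = len(sigma2), 200_000

# R independent, unbiased estimates of the same quantity (true value 0)
G_r = rng.standard_normal((trials, R)) * np.sqrt(sigma2)

plain = G_r.mean(axis=1)                       # alpha_r = 1/R
alpha = (1 / sigma2) / np.sum(1 / sigma2)      # alpha_r proportional to 1/sigma_r^2
weighted = G_r @ alpha

# Theory: var(plain) = sum(sigma_r^2)/R^2 ~ 1.56,
#         var(weighted) = 1/sum(1/sigma_r^2) ~ 0.73
print(plain.var(), weighted.var())
```

The weighted average always has a variance no larger than the plain average, with equality only when all $\sigma_r^2$ are equal.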
Averaging

Variance reduction

Best* result: $\left|U_r(e^{j\omega_n})\right|$ is the same for all $r = 1, \dots, R$. This gives
$$\mathrm{var}\left(\hat{G}(e^{j\omega_n})\right) = \frac{\mathrm{var}\left(\hat{G}_r(e^{j\omega_n})\right)}{R}.$$

If the estimates are biased we will not get as much variance reduction.

(The method of splitting the data record and averaging the periodograms is attributed to Bartlett (1948, 1950).)

* This is "best" given that we have $R$ experiments and an upper bound on $\left|U_r(e^{j\omega_n})\right|$.
Averaging example (noisy data)

Splitting the data record: $K = 72$, $R = 3$, $N = 24$.

[Figure: time plots of the input $u(k)$, the full output $y(k)$, and the three split records $y_1(k)$, $y_2(k)$, $y_3(k)$ against index $k$.]
Averaging example (noisy data)

Estimates: $\hat{G}_r(e^{j\omega_n})$, $r = 1, 2, 3$, and the weighted average, $\hat{G}_{\mathrm{avg}}(e^{j\omega_n})$.

[Figure: magnitudes of $G(e^{j\omega})$, $\hat{G}_1$, $\hat{G}_2$, $\hat{G}_3$, and $\hat{G}_{\mathrm{avg}}$ versus frequency $\omega$ (rad/sample), up to $\pi$.]
Averaging example (noisy data)

Estimates: $\hat{G}_r(e^{j\omega_n})$, $r = 1, 2, 3$, and the weighted average, $\hat{G}_{\mathrm{avg}}(e^{j\omega_n})$.
Errors: $E_r(e^{j\omega_n}) = G(e^{j\omega_n}) - \hat{G}_r(e^{j\omega_n})$.

[Figure: magnitudes of the errors $E_1$, $E_2$, $E_3$, and $E_{\mathrm{avg}}$ versus frequency $\omega$ (rad/sample), up to $\pi$.]
Averaging with periodic signals

Splitting the data record: $K = 72$, $N = 24$, $R = 3$, and $u(k)$ has period $M = 24$.

[Figure: time plots of the periodic input $u(k)$, the output $y(k)$, and the three split records $y_1(k)$, $y_2(k)$, $y_3(k)$ against index $k$.]
Averaging with periodic signals

Estimates: $\hat{G}_r(e^{j\omega_n})$ and $\hat{G}_{\mathrm{avg}}(e^{j\omega_n}) = \left(\hat{G}_2(e^{j\omega_n}) + \hat{G}_3(e^{j\omega_n})\right)/2$.

[Figure: magnitudes of $G(e^{j\omega})$, $\hat{G}_2$, $\hat{G}_3$, and $\hat{G}_{\mathrm{avg}}$ versus frequency $\omega$ (rad/sample), up to $\pi$.]
Averaging with periodic signals

Estimates: $\hat{G}_r(e^{j\omega_n})$ and $\hat{G}_{\mathrm{avg}}(e^{j\omega_n}) = \left(\hat{G}_2(e^{j\omega_n}) + \hat{G}_3(e^{j\omega_n})\right)/2$.
Errors: $E_r(e^{j\omega_n}) = G(e^{j\omega_n}) - \hat{G}_r(e^{j\omega_n})$.

[Figure: magnitudes of the errors $E_2$, $E_3$, $E_{\mathrm{avg}}$, and, for comparison, $E_{\mathrm{avg}}$ from the non-periodic case, versus frequency $\omega$ (rad/sample), up to $\pi$.]
Bias-variance trade-offs in data record splitting

Bartlett's procedure

Divide a (long) data record into smaller parts for averaging.

Data: $\{u(k), y(k)\}$, $k = 0, \dots, K-1$, with $K$ very large.

Choose the number of records, $R$, and a calculation length, $N$, with $NR \le K$:
$$u_r(n) = u(rN + n), \quad r = 0, \dots, R-1, \quad n = 0, \dots, N-1,$$
and average the resulting estimates:
$$\hat{G}(e^{j\omega_n}) = \frac{1}{R}\sum_{r=0}^{R-1} \hat{G}_r(e^{j\omega_n}) = \frac{1}{R}\sum_{r=0}^{R-1} \frac{Y_r(e^{j\omega_n})}{U_r(e^{j\omega_n})}.$$

Bias and variance effects

As $R$ increases:
§ The number of frequency points calculated, $N$, decreases (since $NR \le K$);
§ The variance decreases (by up to $1/R$);
§ The bias increases (due to non-periodicity transients).
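Bartlett's split-and-average procedure can be sketched in a few lines of NumPy. This is only an illustration: the first-order system, noise level, and signal lengths below are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
K, R = 72, 3
N = K // R                            # record length per split, N*R <= K

u = rng.standard_normal(K)            # input
y = np.zeros(K)                       # invented system: y(k) = 0.5 y(k-1) + u(k) + noise
for k in range(K):
    y[k] = 0.5 * (y[k - 1] if k else 0.0) + u[k] + 0.1 * rng.standard_normal()

# ETFE of each record, G_r = Y_r / U_r, then the equal-weight average
G_r = np.array([np.fft.fft(y[r * N:(r + 1) * N]) / np.fft.fft(u[r * N:(r + 1) * N])
                for r in range(R)])
G_avg = G_r.mean(axis=0)              # alpha_r = 1/R
print(G_avg.shape)                    # one estimate per frequency omega_n
```

Each split is non-periodic, so every $\hat{G}_r$ carries a transient bias; averaging reduces the variance but not that bias, which is the trade-off discussed above.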
Bias-variance trade-offs in data record splitting

Mean-square error
$$\mathrm{MSE} = \mathrm{bias}^2 + \mathrm{variance}.$$

§ Transient bias grows linearly with the number of data splits.
§ Variance decays at a rate of up to 1/(number of averages).

Optimal bias-variance trade-off

[Figure: error versus number of records, $R$; the bias error grows with $R$, the variance error decays with $R$, and the MSE curve has a minimum at an intermediate $R$.]
Smoothing transfer function estimates

What if we have no option of running periodic input experiments?

§ The system will not tolerate periodic inputs; or
§ The data has already been taken and given to us.
§ More data just gives more frequencies (with the same error variance).

Exploit the assumed smoothness of the underlying system

$G(e^{j\omega})$ is assumed to be a low order (smooth) system, and
$$E\left\{\left(G(e^{j\omega_n}) - \hat{G}_N(e^{j\omega_n})\right)\left(G(e^{j\omega_s}) - \hat{G}_N(e^{j\omega_s})\right)\right\} \longrightarrow 0 \quad (n \ne s),$$
(asymptotically at least).
Smoothing the ETFE

Smooth transfer function assumption

Assume the true system to be close to constant over a range of frequencies:
$$G(e^{j\omega_{n+r}}) \approx G(e^{j\omega_n}) \quad \text{for } r = 0, \pm 1, \pm 2, \dots, \pm R.$$

Smooth minimum variance estimate

The minimum variance smoothed estimate is
$$\hat{G}_N(e^{j\omega_n}) = \frac{\displaystyle\sum_{r=-R}^{R} \alpha_r\, \hat{G}_N(e^{j\omega_{n+r}})}{\displaystyle\sum_{r=-R}^{R} \alpha_r}, \qquad \alpha_r = \frac{\frac{1}{N}\left|U_N(e^{j\omega_{n+r}})\right|^2}{\phi_v(e^{j\omega_{n+r}})}.$$

(Here we smooth over $2R + 1$ points.)
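The $2R+1$-point weighted smoother above can be written down directly. This is a sketch with circular frequency indexing; the flat test ETFE, $|U_N|^2$, and $\phi_v$ values are invented to make the expected result obvious (smoothing a constant leaves it unchanged).

```python
import numpy as np

def smooth_etfe(Gn, U2_over_N, phi_v, R):
    """Weighted local average over 2R+1 points, alpha_r = (|U_N|^2/N) / phi_v."""
    N = len(Gn)
    alpha = U2_over_N / phi_v                 # per-frequency weights
    Gs = np.empty(N, dtype=complex)
    for n in range(N):
        idx = (n + np.arange(-R, R + 1)) % N  # wrap the frequency index
        Gs[n] = np.sum(alpha[idx] * Gn[idx]) / np.sum(alpha[idx])
    return Gs

Gn = np.ones(16, dtype=complex)               # flat ETFE: smoothing preserves it
Gs = smooth_etfe(Gn, U2_over_N=np.full(16, 2.0), phi_v=np.full(16, 0.5), R=3)
print(np.allclose(Gs, 1.0))                   # True
```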
Smoothing the ETFE

If $N$ is large (many closely spaced frequencies), the summation can be approximated by an integral:
$$\hat{G}_N(e^{j\omega_n}) = \frac{\displaystyle\sum_{r=-R}^{R} \alpha_r\, \hat{G}_N(e^{j\omega_{n+r}})}{\displaystyle\sum_{r=-R}^{R} \alpha_r} \approx \frac{\displaystyle\int_{\omega_{n-R}}^{\omega_{n+R}} \alpha(e^{j\xi})\, \hat{G}_N(e^{j\xi})\, d\xi}{\displaystyle\int_{\omega_{n-R}}^{\omega_{n+R}} \alpha(e^{j\xi})\, d\xi}, \qquad \text{with } \alpha(e^{j\xi}) = \frac{\frac{1}{N}\left|U_N(e^{j\xi})\right|^2}{\phi_v(e^{j\xi})}.$$
Smoothing the ETFE

Smoothing window
$$\hat{G}_N(e^{j\omega_n}) = \frac{\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi - \omega_n)})\, \alpha(e^{j\xi})\, \hat{G}_N(e^{j\xi})\, d\xi}{\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi - \omega_n)})\, \alpha(e^{j\xi})\, d\xi}, \qquad \text{with } \alpha(e^{j\xi}) = \frac{\frac{1}{N}\left|U_N(e^{j\xi})\right|^2}{\phi_v(e^{j\xi})}.$$

The $1/2\pi$ scaling will make it easier to derive time-domain windows later.
Smoothing the ETFE

Assumptions on $\phi_v(e^{j\omega})$

Assume $\phi_v(e^{j\omega})$ is also a smooth (and flat) function of frequency:
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi - \omega_n)}) \left| \frac{1}{\phi_v(e^{j\xi})} - \frac{1}{\phi_v(e^{j\omega_n})} \right| d\xi \approx 0.$$

Then use
$$\alpha(e^{j\xi}) = \frac{\frac{1}{N}\left|U_N(e^{j\xi})\right|^2}{\phi_v(e^{j\omega_n})}$$
to get
$$\hat{G}_N(e^{j\omega_n}) = \frac{\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi - \omega_n)})\, \frac{1}{N}\left|U_N(e^{j\xi})\right|^2 \hat{G}_N(e^{j\xi})\, d\xi}{\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi - \omega_n)})\, \frac{1}{N}\left|U_N(e^{j\xi})\right|^2\, d\xi}.$$
Weighting functions

Frequency smoothing window characteristics: width (specified by the $\gamma$ parameter)

The wider the frequency window (i.e. the smaller $\gamma$):
§ the more adjacent frequencies are included in the smoothed estimate,
§ the smoother the result,
§ the lower the noise-induced variance,
§ the higher the bias.
Weighting functions

Typical window (Hann) as a function of the width parameter, $\gamma$.

[Figure: $W_\gamma(e^{j\omega_n})$ for $\gamma = 1, 5, 10, 20$ with $N = 256$, plotted over $\omega \in [-\pi, \pi]$; larger $\gamma$ gives a taller, narrower window.]
Weighting functions

Window characteristics: shape

Some common choices:
$$\text{Bartlett:} \quad W_\gamma(e^{j\omega}) = \frac{1}{\gamma}\left(\frac{\sin(\gamma\omega/2)}{\sin(\omega/2)}\right)^2$$
$$\text{Hann:} \quad W_\gamma(e^{j\omega}) = \frac{1}{2}D_\gamma(\omega) + \frac{1}{4}D_\gamma(\omega - \pi/\gamma) + \frac{1}{4}D_\gamma(\omega + \pi/\gamma), \quad \text{where } D_\gamma(\omega) = \frac{\sin((\gamma + 0.5)\omega)}{\sin(\omega/2)}$$

Others include: Hamming, Parzen, Kaiser, ...

The differences are mostly in how much energy leaks to adjacent frequencies, and in the ability to resolve closely spaced frequency peaks.
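The Bartlett window above is the Fejér kernel, and a quick numeric check confirms that it satisfies the $\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma\, d\xi = 1$ normalisation used on the "Window properties" slide. A NumPy sketch (the choice $\gamma = 10$ is arbitrary):

```python
import numpy as np

def bartlett_window(omega, gamma):
    """W_gamma(e^{jw}) = (1/gamma) * (sin(gamma*w/2) / sin(w/2))^2."""
    w = np.asarray(omega, dtype=float)
    den = np.sin(w / 2)
    small = np.abs(den) < 1e-12                    # treat w = 0 by its limit, gamma
    safe = np.where(small, 1.0, den)
    return np.where(small, gamma, (np.sin(gamma * w / 2) / safe) ** 2 / gamma)

gamma = 10
omega = np.linspace(-np.pi, np.pi, 20001)
W = bartlett_window(omega, gamma)
area = W.sum() * (omega[1] - omega[0]) / (2 * np.pi)   # Riemann sum of the integral
print(area)                                            # close to 1.0 (normalised)
```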
Weighting functions

Example frequency domain windows.

[Figure: $W_\gamma(e^{j\omega_n})$ for the Welch, Hann, Hamming, and Bartlett windows, with $\gamma = 10$ and $N = 256$, plotted over $\omega \in [-\pi, \pi]$.]
ETFE smoothing example: Matlab calculations

U = fft(u);                        % calculate N point FFTs
Y = fft(y);
Gest = Y./U;                       % ETFE estimate
Gs = 0*Gest;                       % smoothed estimate
[omega,Wg] = WfHann(gamma,N);      % window (centered)
zidx = find(omega==0);             % shift to start at zero
omega = [omega(zidx:N);omega(1:zidx-1)];   % frequency grid
Wg = [Wg(zidx:N);Wg(1:zidx-1)];
a = U.*conj(U);                    % variance weighting, |U|^2
for wn = 1:N
    Wnorm = 0;                     % reset normalisation
    for xi = 1:N
        widx = mod(xi-wn,N)+1;     % wrap window index
        Gs(wn) = Gs(wn) + Wg(widx)*Gest(xi)*a(xi);
        Wnorm = Wnorm + Wg(widx)*a(xi);
    end
    Gs(wn) = Gs(wn)/Wnorm;         % weight normalisation
end
Window properties

Properties and characteristic values of window functions:
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(e^{j\xi})\, d\xi = 1 \quad \text{(normalised)}$$
$$\int_{-\pi}^{\pi} \xi\, W_\gamma(e^{j\xi})\, d\xi = 0 \quad \text{("even", sort of)}$$
$$M(\gamma) := \frac{1}{2\pi}\int_{-\pi}^{\pi} \xi^2\, W_\gamma(e^{j\xi})\, d\xi$$
$$W(\gamma) := \frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma^2(e^{j\xi})\, d\xi$$

Bartlett: $M(\gamma) = \dfrac{2.78}{\gamma}$, $\quad W(\gamma) \approx 0.67\gamma$ (for $\gamma > 5$)

Hamming: $M(\gamma) = \dfrac{\pi^2}{2\gamma^2}$, $\quad W(\gamma) \approx 0.75\gamma$ (for $\gamma > 5$)
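Under the $\frac{1}{2\pi}$-normalised convention above, the quoted Bartlett values $M(\gamma) \approx 2.78/\gamma$ and $W(\gamma) \approx 0.67\gamma$ can be reproduced numerically. A sketch ($\gamma = 20$ is an arbitrary choice; any $\gamma > 5$ behaves similarly):

```python
import numpy as np

gamma = 20
xi = np.linspace(-np.pi, np.pi, 200001)
den = np.sin(xi / 2)
small = np.abs(den) < 1e-12                     # handle xi = 0 by its limit, gamma
W = np.where(small, gamma,
             (np.sin(gamma * xi / 2) / np.where(small, 1.0, den)) ** 2 / gamma)
dxi = xi[1] - xi[0]

M = np.sum(xi**2 * W) * dxi / (2 * np.pi)       # bias constant M(gamma)
Wg = np.sum(W**2) * dxi / (2 * np.pi)           # variance constant W(gamma)
print(M * gamma, Wg / gamma)                    # close to 2.77 and 0.67
```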
Smoothed estimate properties

Asymptotic bias properties
$$E\left\{\hat{G}(e^{j\omega_n}) - G(e^{j\omega_n})\right\} = M(\gamma)\left(\frac{1}{2}G''(e^{j\omega_n}) + G'(e^{j\omega_n})\frac{\phi_u'(e^{j\omega_n})}{\phi_u(e^{j\omega_n})}\right) + \underbrace{O(C_3(\gamma))}_{\to\, 0 \text{ as } \gamma \to \infty} + \underbrace{O\!\left(1/\sqrt{N}\right)}_{\to\, 0 \text{ as } N \to \infty}$$

Increasing $\gamma$:
§ makes the frequency window narrower;
§ averages over fewer frequency values;
§ makes $M(\gamma)$ smaller; and
§ reduces the bias of the smoothed estimate, $\hat{G}(e^{j\omega_n})$.
Smoothed estimate properties

Asymptotic variance properties
$$E\left\{\left(\hat{G}(e^{j\omega_n}) - E\left\{\hat{G}(e^{j\omega_n})\right\}\right)^2\right\} = \frac{1}{N}\, W(\gamma)\, \frac{\phi_v(e^{j\omega_n})}{\phi_u(e^{j\omega_n})} + \underbrace{o\!\left(W(\gamma)/N\right)}_{\to\, 0}$$
as $\gamma \to \infty$, $N \to \infty$, and $\gamma/N \to 0$.

Increasing $\gamma$:
§ makes the frequency window narrower;
§ averages over fewer frequency values;
§ makes $W(\gamma)$ larger; and
§ increases the variance of the smoothed estimate, $\hat{G}(e^{j\omega_n})$.
Smoothed estimate properties

Asymptotic MSE properties
$$E\left\{\left|\hat{G}(e^{j\omega_n}) - G(e^{j\omega_n})\right|^2\right\} \approx M^2(\gamma)\left|F(e^{j\omega_n})\right|^2 + \frac{1}{N}\, W(\gamma)\, \frac{\phi_v(e^{j\omega_n})}{\phi_u(e^{j\omega_n})},$$
where
$$F(e^{j\omega_n}) = \frac{1}{2}G''(e^{j\omega_n}) + G'(e^{j\omega_n})\frac{\phi_u'(e^{j\omega_n})}{\phi_u(e^{j\omega_n})}.$$

If $M(\gamma) = \bar{M}/\gamma^2$ and $W(\gamma) = \bar{W}\gamma$, then the MSE is minimised by
$$\gamma_{\text{optimal}} = \left(\frac{4\bar{M}^2\left|F(e^{j\omega_n})\right|^2 \phi_u(e^{j\omega_n})}{\bar{W}\phi_v(e^{j\omega_n})}\right)^{1/5} N^{1/5},$$
and
$$\text{MSE at } \gamma_{\text{optimal}} \approx C\, N^{-4/5}.$$
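The $\gamma_{\text{optimal}}$ formula above can be evaluated directly. A sketch using the Hamming constants from the "Window properties" slide ($M(\gamma) = \bar{M}/\gamma^2$ with $\bar{M} = \pi^2/2$, $\bar{W} = 0.75$); the values of $|F|^2$ and the spectra below are invented purely for illustration:

```python
import numpy as np

Mbar = np.pi**2 / 2              # Hamming: M(gamma) = Mbar / gamma^2
Wbar = 0.75                      # Hamming: W(gamma) ~ Wbar * gamma
F2 = 1.0                         # |F(e^{j w_n})|^2, assumed
phi_u, phi_v = 1.0, 0.1          # input and noise spectra, assumed flat
N = 1024

gamma_opt = (4 * Mbar**2 * F2 * phi_u / (Wbar * phi_v)) ** 0.2 * N ** 0.2
print(gamma_opt)                 # grows only like N^{1/5}
```

Note how weakly $\gamma_{\text{optimal}}$ depends on the data length: doubling $N$ widens the optimal $\gamma$ by only $2^{1/5} \approx 1.15$.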
Bibliography

Windowing and ETFE smoothing

P. Stoica & R. Moses, Introduction to Spectral Analysis (see Chapters 1 and 2), Prentice-Hall, 1997.

L. Ljung, System Identification: Theory for the User (see Section 6.4), Prentice-Hall, 2nd Ed., 1999.

M.S. Bartlett, "Smoothing Periodograms from Time-Series with Continuous Spectra," Nature, vol. 161(4096), pp. 686–687, 1948.

M.S. Bartlett, "Periodogram analysis and continuous spectra," Biometrika, vol. 37, pp. 1–16, 1950.