Upload
trantu
View
217
Download
1
Embed Size (px)
Citation preview
Chapter 5 Estimation Theory and Applications
References: S.M.Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993
1
Estimation Theory and Applications
Application Areas
1. Radar
Radar system transmits an electromagnetic pulse )(ns . It is reflected by an aircraft, causing an echo )(nr to be received after 0τ seconds:
)()(( nwnsnr +τ−) 0α= where the range R of the aircraft is related to the time delay by
cR /20 =τ
2
3
2. Mobile Communications
The position of the mobile terminal can be estimated using the time-of-arrival measurements received at the base stations.
4
3. Speech Processing
Recognition of human speech by a machine is a difficult task because our voice changes from time to time.
Given a human voice, the estimation problem is to determine the speech as close as possible.
4. Image Processing
Estimation of the position and orientation of an object from a camera image is useful when using a robot to pick it up, e.g., bomb-disposal
5. Biomedical Engineering
Estimation the heart rate of a fetus and the difficulty is that the measurements are corrupted by the mother’s heart beat as well.
6. Seismology
Estimation of the underground distance of an oil deposit based on sound reflection due to the different densities of oil and rock layers.
5
Differences from Detection
1. Radar
Radar system transmits an electromagnetic pulse )(ns . After some time, it receives a signal )(nr . The detection problem is to decide whether )(nr is
echo from an object or it is not an echo
6
7
2. Communications In wired or wireless communications, we need to know the information sent from the transmitter to the receiver. e.g., for binary phase shift keying (BPSK) signals, it consists of only two symbols, “0” or “1”. The detection problem is to decide whether it is “0” or “1”.
8
9
3. Speech Processing
Given a human speech signal, the detection problem is decide what is the spoken word from a set of predefined words, e.g., “0”, “1”,…, “9”
Waveform of “0”
Another example is voice authentication: given a voice and it is indicated that the voice is from George Bush, we need to decide it’s Bush or not.
10
4. Image Processing
Fingerprint authentication: given a fingerprint image and his owner says he is “A”, we need to verify if it is true or not
Other biometric examples include face authentication, iris authentication, etc.
11
5. Biomedical Engineering
17 Jan. 2003, Hong Kong Economics Times
e.g., given some X-ray slides, the detection problem is to determine if she has breast cancer or not
6. Seismology
To detect if there is oil or there is no oil at a region
12
What is Estimation?
Extract or estimate some parameters from the observed signals, e.g., Use a voltmeter to measure a DC signal
1,,1,0],[][ −=+= NnnwAnx L
Given , we need to find the DC value, ][nx A ⇒ the parameter is the observed signal
Estimate the amplitude, frequency and phase of a sinusoid in noise
1,,1,0],[)cos(][ −=+φ+ωα= Nnnwnnx L
Given , we need to find ][nx α, ω and φ ⇒ the parameters are not directly observed in the received signal
13
Estimate the value of resistance R from a set of voltage and current readings:
1,,1,0],[][][],[][][ 2actual1actual −=+=+= NnnwnInInwnVnV L
Given N pairs of ( ][],[ nInV ), we need to estimate the resistance R , ideally, IVR /=
⇒ the parameter is not directly observed in the received signals Estimate the position of the mobile terminal using time-of-arrival measurements:
1,,1,0],[)()(
][22
−=+−−−
= Nnnwc
yyxxnr nsns L
Given ][nr , we need to find the mobile position ( ss yx , ) where is the signal propagation speed and (
cnn yx , ) represent the known position of
the nth base station
⇒ the parameters are not directly observed in the received signals
14
Types of Parameter Estimation Linear or non-linear
Linear: DC value, amplitude of the sine wave Non-linear: Frequency of the sine wave, mobile position
Single parameter or multiple parameters
Single: DC value; scalar Multiple: Amplitude, frequency and phase of sinusoid; vector
Constrained or unconstrained
Constrained: Use other available information & knowledge, e.g., from N ][],[ nInVthe pairs of ( ), we draw a line which best fits
the data points and the estimate of the resistance is given by the slope of the line. We can add a constraint that the line should cross the origin (0,0)
Unconstrained: No further information & knowledge is available
15
Parameter is unknown deterministic or random
Unknown deterministic: constant but unknown (classical) DC value is an unknown constant Random : random variable with prior knowledge of
PDF (Bayesian) If we have prior knowledge that the DC value
is bounded by 0A− and 0A with a particular PDF ⇒ better estimate
Parameter is stationary or changing
Stationary : Unknown deterministic for whole observation
period, time-of-arrivals of a static source Changing : Unknown deterministic at different time
instants, time-of-arrivals of a moving source
16
Performance Measures for Classical Parameter Estimation
Accuracy: Is the estimator biased or unbiased?
e.g., 1,,1,0],[][ −=+= NnnwAnx L
where is a zero-mean random noise with variance ][nw 2wσ
Proposed estimators: ]0[ˆ1 xA =
][1ˆ 1
02 nx
NA
N
n∑=−
=
][1
1ˆ 1
03 nx
NA
N
n∑
−=
−
=
NNN
nNxxxnxA ]1[]1[]0[][ˆ
1
04 −⋅=∏=
−
=L
17
Biased : AAE ≠}ˆ{Unbiased : AAE =}ˆ{Asymptotically unbiased : only if AAE =}ˆ{ ∞→N Taking the expected values for , and , we have 1A 2A 3A
AAwEAExEAE =+=+== 0]}0[{}{]}0[{}ˆ{ 1
AN
ANN
nwEN
AN
nwN
EAN
EnxN
EAE
N
n
N
n
N
n
N
n
N
n
N
n
=∑+⋅⋅=∑+∑=
∑+
∑=
∑=
−
=
−
=
−
=
−
=
−
=
−
=
011]}[{11
][11][1}ˆ{
1
0
1
0
1
0
1
0
1
0
1
02
AN
ANNAE ⋅
−=⋅
−=
/111
1}ˆ{ 3
Q. State the biasedness of , and . 1A 2A 3A
18
For , it is difficult to analyze the biasedness. However, for 4A 0][ =nw :
AAAAANxxx N NNN ==⋅=−⋅ LL ]1[]1[]0[ What is the value of the mean square error or variance?
They correspond to the fluctuation of the estimate in the second order:
})ˆ{(MSE 2AAE −= (5.1)
}})ˆ{ˆ{(var 2AEAE −= (5.2) :
If the estimator is unbiased, then MSE = var
19
In general,
2
2
22
22
(bias)var
)}ˆ{})(ˆ{}ˆ{(2)}ˆ{(var
)}}ˆ{})(ˆ{ˆ{(2})}ˆ{{(}})ˆ{ˆ{(
})}ˆ{}ˆ{ˆ{(})ˆ{(MSE
+=
−−+−+=
−−+−+−=
−+−=−=
AAEAEAEAAE
AAEAEAEAAEEAEAE
AAEAEAEAAE
(5.3)
22222
1 ]}0[{})]0[{(})]0[{(})ˆ{( wwEAwAEAxEAAE σ==−+=−=−
NnwE
NAnx
NEAAE wN
n
N
n
221
0
21
0
22 ][1][1})ˆ{(
σ=
∑=
−∑=−
−
=
−
=
11][
11})ˆ{(
2221
0
23 −
σ+
−=
−∑
−=−
−
= NNAAnx
NEAAE wN
n
20
An optimum estimator should give estimates which are Unbiased Minimum variance (MSE as well)
Q. How do we know the estimator has the minimum variance? Cramer-Rao Lower Bound (CRLB) Performance bound in terms of minimum achievable variance provided by any unbiased estimators Use for classical parameter estimation Require knowledge of the noise PDF and the PDF must have closed form More easier to determine than other variance bounds
21
Let the parameters to be estimated be TP ],,,[ 21 θθθ= Lθ , the CRLB for
θ in Gaussian noise is stated as follows i
[ ] [ ] iiiii ,1
, )()()CRLB( θIθJ −==θ (5.4) where
θ∂
∂
θ∂θ∂
∂
θ∂
∂
θ∂θ∂
∂
θ∂θ∂
∂
θ∂θ∂
∂
θ∂
∂
=
2
2
1
2
22
2
12
21
2
21
2
21
2
);(ln);(ln
);(ln);(ln
);(ln);(ln);(ln
)(
PP
P
pEpE
pEpE
pEpEpE
θx-θx-
θx-θx-
θx-θx-θx-
θI
OM
L
(5.5)
22
);( θxp represents PDF of and it is parameterized by the unknown parameter vector
TNxxx ]]1[,],1[],0[[ −= Lxθ
Note that is known as Fisher information matrix )(θI
is the ( ) element of ii,][J ii, J
e.g., 3][3221
22 =⇒
= ,JJ
θ∂θ∂
∂=
θ∂θ∂
∂
ijji
pEpE );(ln);(ln 22 θxθx
23
Review of Gaussian (Normal) Distribution The Gaussian PDF for a scalar random variable is defined as x
µ−
σ−
πσ= 2
22)(
21exp
2
1)( xxp (5.6)
We can write ),(~ σµNx The Gaussian PDF for a random vector of size x N is defined as
−⋅⋅−−
π= )()(
21exp
)(det)2(1)( 1
2/12/ µxCµxC
x -TN
p (5.7)
We can write ),(~ Cµx N
24
The covariance matrix has the form of C
µ−−µ−−µ−
µ−µ−µ−−µ−µ−
=
−⋅−=
−−
−
})]1[{()}]1[)(]0[{(
)}]1[)(]0[{()}]1[)(]0[{(})]0[{(
})(){(
2110
10
102
0
NN
N
T
NxENxxE
xxENxxExE
E
L
M
MO
L
µxµxC
(5.8) where
TNxxx ]]1[,],1[],0[[ −= Lx
TNE ]],,,[}{ 110 −µµµ== Lxµ
25
If is a zero-mean white vector and all vector elements have variance x 2σ
NTE IµxµxC ⋅σ=
σ
σσ
=−⋅−= 2
2
2
2
000
000
})(){(
L
OM
M
L
The Gaussian PDF for the random vector can be simplified as x
∑
σ−
πσ=
−
=][
21exp
)2(1)( 21
022/2 nxpN
nNx (5.9)
with the use of
N- IC ⋅σ= −21
NN 22 )(det σ=σ=(C)
26
Example 5.1 Determine the PDF of
]0[]0[ wAx += and
1,,1,0],[][ −=+= NnnwAnx L where { } is a white Gaussian process with known variance and )(nw 2
wσ A is a constant
−
σ−
πσ= 2
22)]0[(
21exp
2
1)];0[( AxAxpww
−∑
σ−
πσ=
−
=
21
022/2 )][(2
1exp)2(
1)( Anx;ApN
nwN
wx
27
Example 5.2 Find the CRLB for estimating A based on single measurement:
]0[]0[ wAx +=
22
2
22
22
2
222
1))];0[(ln(
)]0[(1)]0[(22
1))];0[(ln(
)]0[(2
1)2ln())];0[(ln(
)]0[(2
1exp2
1)];0[(
w
ww
ww
ww
AAxp
AxAxA
Axp
AxAxp
AxAxp
σ−=
∂
∂⇒
σ
−=−⋅−⋅
σ−=
∂∂
⇒
−σ
−πσ−=⇒
−
σ−
πσ=
28
As a result,
22
2 1))];0[(ln(
wAAxpE
σ−=
∂
∂
2
2
2
)CRLB(
)(
1)()(
w
w
w
A
AJ
AIA
σ=⇒
σ=⇒
σ==I
This means the best we can do is to achieve estimator variance = or 2
wσ
2)ˆvar( wA σ≥ where A is any unbiased estimator for estimating A
29
We also observe that a simple unbiased estimator
]0[ˆ1 xA =
achieves the CRLB:
222221 ]}0[{})]0[{(})]0[{(})ˆ{( wwEAwAEAxEAAE σ==−+=−=−
Example 5.3 Find the CRLB for estimating A based on N measurements:
1,,1,0],[][ −=+= NnnwAnx L
−∑
σ−
πσ=
−
=
21
022/2 )][(2
1exp)2(
1)( Anx;ApN
nwN
wx
30
22
2
2
1
01
02
21
022/2
21
022/2
));(ln(
)][(1)][(2
21));(ln(
)][(2
1))2ln(());(ln(
)][(2
1exp)2(
1);(
w
w
N
nN
nw
N
nw
Nw
N
nwN
w
NA
Ap
AnxAnx
AAp
AnxAp
AnxAp
σ−=
∂
∂⇒
σ
−∑=−⋅−∑⋅⋅
σ−=
∂∂
⇒
−∑σ
−πσ−=⇒
−∑
σ−
πσ=
−
=−
=
−
=
−
=
x
x
x
x
22
2 ));(ln(
w
NA
ApEσ
−=
∂
∂⇒
x
31
As a result,
NA
NAJ
NAIA
w
w
w
2
2
2
)CRLB(
)(
)()(
σ=⇒
σ=⇒
σ==I
This means the best we can do is to achieve estimator variance = or
Nw /2σ
NA w
2)ˆvar(
σ≥
where A is any unbiased estimator for estimating A
32
We also observe that a simple unbiased estimator
]0[ˆ1 xA = does not achieve the CRLB
222221 ]}0[{})]0[{(})]0[{(})ˆ{( wwEAwAEAxEAAE σ==−+=−=−
On the other hand, the sample mean estimator
][1ˆ 1
02 nx
NA
N
n∑=−
=
achieve the CRLB
NnwE
NAnx
NEAAE wN
n
N
n
221
0
21
0
22 ][1][1})ˆ{(
σ=
∑=
−∑=−
−
=
−
=
⇒ sample mean is the optimum estimator for white Gaussian noise
33
Example 5.4 Find the CRLB for A and given 2
wσ ]}[{ nx :
1,,1,0],[][ −=+= NnnwAnx L
2
1
01
02
21
022
21
022/2
221
022/2
)][(1)][(2
21));(ln(
)][(2
1)ln(2
)2ln(2
)][(2
1))2ln(());(ln(
,)][(2
1exp)2(
1);(
w
N
nN
nw
N
nww
N
nw
Nw
wN
nwN
w
AnxAnx
Ap
AnxNN
Anxp
A,Anxp
σ
−∑=−⋅−∑⋅⋅
σ−=
∂∂
⇒
−∑σ
−σ−π−=
−∑σ
−πσ−=⇒
σ=
−∑
σ−
πσ=
−
=−
=
−
=
−
=
−
=
θx
θx
][θθx
34
0]})[{(
));(ln(
])[()][());(ln(
));(ln(
));(ln(
4
1
02
2
4
1
04
1
02
2
22
2
22
2
=σ
∑−=
σ∂∂
∂⇒
σ
∑−=
σ
−∑−=
σ∂∂
∂⇒
σ−=
∂
∂⇒
σ−=
∂
∂⇒
−
=
−
=
−
=
w
N
n
w
w
N
n
w
N
n
w
w
w
nwE
ApE
nwAnx
Ap
NApE
NAp
θx
θx
θx
θx
35
42
6422
2
21
06422
2
21
0422
21
022
21
2)());(ln(
])[(12)(
));(ln(
)][(2
12
));(ln(
)][(2
1)ln(2
)2ln(2
));(ln(
ww
www
N
nwww
N
nwww
N
nww
NNNpE
nwNp
AnxNp
AnxNNp
σ−=σ⋅
σ−
σ=
σ∂
∂⇒
∑σ
−σ
=σ∂
∂⇒
−∑σ
+σ
−=σ∂
∂⇒
−∑σ
−σ−π−=
−
=
−
=
−
=
θx
θx
θx
θx
σ
σ=
4
2
20
0)(
w
wN
N
θI
36
σ
σ
==
N
Nw
w-
4
2
1
20
0)()( θIθJ
N
NA
ww
w
42
2
2)CRLB(
)CRLB(
σ=σ⇒
σ=⇒
⇒ the CRLBs for unknown and known noise power are identical Q. The CRLB is not affected by knowledge of noise power. Why? Q. Can you suggest a method to estimate ? 2
wσ
37
Example 5.5 Find the CRLB for phase of a sinusoid in white Gaussian noise:
1,,1,0],[)cos(][ 0 −=+φ+ω= NnnwnAnx L
where A and are assumed known 0ω The PDF is
20
1
022/2
20
1
022/2
))cos(][(2
1))2ln(());(ln(
))cos(][(2
1exp)2(
1);(
φ+ω−∑σ
−πσ−=φ⇒
φ+ω−∑
σ−
πσ=φ
−
=
−
=
nAnxp
nAnxp
N
nw
Nw
N
nwN
w
x
x
38
φ+ω−φ+ω∑
σ−=
φ+ω−⋅−⋅φ+ω−∑σ
−=φ∂
φ∂
−
=
−
=
)22sin(2
)sin(][
)sin())cos(][(22
1));(ln(
001
02
001
02
nAnnxA
nAnAnxp
N
nw
N
nw
x
[ ])22cos()cos(][));(ln(00
1
022
2φ+ω−φ+ω∑
σ−=
φ∂
φ∂ −
=nAnnxAp N
nw
x
[ ]
[ ]
φ+ω−φ+ω+∑
σ−=
φ+ω−φ+ω∑σ
−=
φ+ω−φ+ω⋅φ+ω∑σ
−=
φ∂
φ∂
−
=
−
=
−
=
)22cos()22cos(21
21
)22cos()(cos
)22cos()cos()cos());(ln(
001
02
2
0021
02
2
0001
022
2
nnA
nnA
nAnnAApE
N
nw
N
nw
N
nw
x
39
)22cos(22
)22cos(22
));(ln(
01
02
2
2
2
01
02
2
2
2
2
2
φ+ω∑σ
+σ
−=
φ+ω∑σ
+⋅σ
−=
φ∂
φ∂
−
=
−
=
nANA
nANApE
N
nww
N
nww
x
As a result, 11
002
211
002
2
2
2)22cos(112)22cos(
22)CRLB(
−−
=
−−
=
∑ φ+ω−
σ=
∑ φ+ω
σ−
σ=φ
N
n
wN
nwwn
NNAnANA
If 1>>N , 0)22cos(10
1
0≈φ+ω∑
−
=n
NN
n
then
2
22)CRLB(NAwσ≈φ
40
Example 5.6 Find the CRLB for A, 0ω and φ for
1,1,,1,0],[)cos(][ 0 >>−=+φ+ω= NNnnwnAnx L
))(cos)cos(][(1
)cos())cos(][(22
1));(ln(
))cos(][(2
1))2ln(());(ln(
,))cos(][(2
1exp)2(
1);(
02
01
02
001
02
20
1
022/2
02
01
022/2
φ+ω−φ+ω∑σ
=
φ+ω−⋅φ+ω−∑⋅⋅σ
−=∂
∂⇒
φ+ω−∑σ
−πσ−=⇒
φω=
φ+ω−∑
σ−
πσ=
−
=
−
=
−
=
−
=
nAnnx
nnAnxAp
nAnxp
A,nAnxp
N
nw
N
nw
N
nw
Nw
N
nwN
w
θx
θx
],[θθx
41
2
01
02021
022
2
2
)22cos(21
211)(cos1));(ln(
w
N
nw
N
nwN
nnAp
σ−≈
φ+ω+∑
σ−=φ+ω∑
σ−=
∂
∂ −
=
−
=
θx
22
2
2));(ln(
w
NApE
σ−≈
∂
∂ θx
Similarly,
0)22sin(2
))θ;x(ln(0
1
020
2≈φ+ω∑
σ=
ω∂∂∂ −
=nnA
ApE
N
nw
0)22sin(2
))θ;x(ln(0
1
02
2≈φ+ω∑
σ=
φ∂∂∂ −
=nA
ApE
N
nw
42
21
02
20
21
02
2
20
2
2)22cos(
21
21));(ln( nAnnApE
N
nw
N
nw∑
σ−≈
φ+ω−∑
σ−=
ω∂
∂ −
=
−
=
θx
nAnnApEN
nw
N
nw∑
σ−≈φ+ω∑
σ−=
φ∂ω∂∂ −
=
−
=
1
02
20
21
02
2
0
2
2)(sin))θ;x(ln(
2
20
21
02
2
2
2
2)(sin));(ln(
w
N
nw
NAnApEσ
−≈φ+ω∑σ
−=
φ∂
∂ −
=
θx
∑
∑∑σ
≈
−
=
−
=
−
=
220
220
002
1)(
21
0
2
1
0
221
0
2
2
NAnA
nAnA
N
N
n
N
n
N
nwθI
43
After matrix inversion, we have
NA w
22)CRLB(
σ≈
2
2
202
SNR,)1(SNR
12)CRLB(w
ANN σ
=−⋅
≈ω
)1(SNR)12(2)CRLB(+⋅
−≈φ
NNN
Note that
NANNNNN w
22SNR
1SNR
4)1(SNR
)12(2)CRLB( σ=
⋅>
⋅≈
+⋅−
≈φ
⇒ In general, the CRLB increases as the number of parameters to be
estimated increases ⇒ CRLB decreases as the number of samples increases
44
Parameter Transformation in CRLB Find the CRLB for θ)α (g= where ()g is a function e.g., 1,,1,0],[][ −=+= NnnwAnx L What is the CRLB for 2A ? The CRLB for parameter transformation of )(α θg= is given by
θ∂
θ∂−
θ∂θ∂
=α
2
2
2
));(ln(
)(
)CRLB(xpE
g
(5.10)
For nonlinear function, “=” is replaced by “ ” and it is true only for large ≈ N
45
Example 5.7 Find the CRLB for the power of the DC value, i.e., 2A :
1,,1,0],[][ −=+= NnnwAnx L
22
2
4)(2)(
)(
AAAgA
AAg
AAg
=
∂∂
⇒=∂
∂⇒
==α
From Example 5.3, we have
22
2 ));(ln(
w
NA
ApEσ
=
∂
∂−
x
As a result,
1,4
4)CRLB(222
22 >>σ
=σ⋅≈ N
NA
NAA ww
46
Example 5.8 Find the CRLB for Acc 21 +=α from
1,,1,0],[][ −=+= NnnwAnx L c
22
2
2
21
)()(
)(
cAAgc
AAg
AcAg
=
∂∂
⇒=∂
∂⇒
+==α
As a result,
Nc
NcAc
w
w
222
222
22 )CRLB()CRLB(
σ=
σ⋅=⋅=α
47
Maximum Likelihood Estimation Parameter estimation is achieved via maximizing the likelihood function Optimum realizable approach and can give performance close to CRLB Use for classical parameter estimation Require knowledge of the noise PDF and the PDF must have closed form Generally computationally demanding Let be the PDF of the observed vector parameterized by the parameter vector θ. The maximum likelihood (ML) estimate is
);( θxp x
);(maxargˆ θxθ
θp= (5.11)
48
e.g., given where is the observed data, as below θ);( 0xx =p 0x
Q. What is the most possible value of θ?
49
Example 5.9 Given
1,,1,0],[][ −=+= NnnwAnx L where A is an unknown constant and is a white Gaussian noise with known variance 2 . Find the ML estimate of
][nw
wσ A.
−∑
σ−
πσ=
−
=
21
022/2 )][(2
1exp)2(
1)( Anx;ApN
nwN
wx
Since ))};({ln(maxarg);(maxarg θxθx
θθpp = , taking log for gives )( ;Ap x
21
022/2 )][(
21))2ln(());(ln( AnxAp
N
nw
Nw −∑
σ−πσ−=
−
=x
50
Differentiate with respect to A yields
2
1
01
02
)][(1)][(2
21));(ln(
w
N
nN
nw
AnxAnx
AAp
σ
−∑=−⋅−∑⋅⋅
σ−=
∂∂
−
=−
=
x
)};({ln(maxargˆ ApA
Ax= is determined from
][1ˆ0)ˆ][(0)ˆ][( 1
0
1
02
1
0 nxN
AAnxAnx N
n
N
nw
N
n ∑=⇒=−∑⇒=σ
−∑ −
=
−
=
−
=
Note that
ML estimate is identical to the sample mean Attain CRLB
Q. How about if is unknown? 2wσ
51
Example 5.10 Find the ML estimate for phase of a sinusoid in white Gaussian noise:
1,,1,0],[)cos(][ 0 −=+φ+ω= NnnwnAnx L
where A and are assumed known 0ω The PDF is
20
1
022/2
20
1
022/2
))cos(][(2
1))2ln(());(ln(
))cos(][(2
1exp)2(
1);(
φ+ω−∑σ
−πσ−=φ⇒
φ+ω−∑
σ−
πσ=φ
−
=
−
=
nAnxp
nAnxp
N
nw
Nw
N
nwN
w
x
x
52
It is obvious that the maximum of );( φxp or ));(ln( φxp corresponds to the minimum of
20
1
02 ))cos(][(2
1φ+ω−∑
σ
−
=nAnx
N
nw or 2
01
0))cos(][( φ+ω−∑
−
=nAnx
N
n
Differentiating with respect to φ and then set the result to zero:
0)22sin(2
)sin(][
)sin())cos(][(2
001
0
001
0
=
φ+ω−φ+ω∑=
φ+ω−⋅−⋅φ+ω−∑
−
=
−
=
nAnnxA
nAnAnx
N
n
N
n
⇒ )ˆ22sin(2
)ˆsin(][ 01
00
1
0φ+ω∑=φ+ω∑
−
=
−
=nAnnx
N
n
N
n
The ML estimate for φ is determined from the root of the above equation
Q. Any ideas to solve the nonlinear equation?
53
Approximate ML (AML) solution may exist and it depends on the structure of the ML expression. For example, there exists an AML solution for φ
1,002
)ˆ22sin(12
)ˆsin(][1
)ˆ22sin(2
)ˆsin(][
01
00
1
0
01
00
1
0
>>=⋅≈φ+ω∑⋅=φ+ω∑⇒
φ+ω∑=φ+ω∑
−
=
−
=
−
=
−
=
NAnN
AnnxN
nAnnx
N
n
N
n
N
n
N
n
The AML solution is obtained from
)cos(][)ˆsin()sin(][)ˆcos(
0)ˆsin()cos(][)ˆcos()sin(][
0)ˆsin(][
01
00
1
0
01
00
1
0
01
0
nnxnnx
nnxnnx
nnx
N
n
N
n
N
n
N
n
N
n
ω∑⋅φ−=ω∑⋅φ⇒
=φω∑+φω∑⇒
=φ+ω∑
−
=
−
=
−
=
−
=
−
=
54
ω∑
ω∑−=φ −
=
−=−
)cos(][)sin(][
tanˆ0
10
0101
nnxnnx
Nn
Nn
In fact, the AML solution is reasonable:
ω∑+φ
ω∑−φ=
>>
ω∑+φ
ω∑+φ−−≈
ω+φ+ω∑
ω+φ+ω∑−=φ
−=
−=−
−=
−=−
−=
−=−
)cos(][2)cos(
)sin(][2)sin(tan
1,)cos(][)cos(
2
)sin(][)sin(2tan
)cos(])[)cos(()sin(])[)cos((
tanˆ
010
0101
010
0101
0010
00101
nnwNA
nnwNA
NnnwNA
nnwNA
nnwnAnnwnA
Nn
Nn
Nn
Nn
Nn
Nn
55
For parameter transformation, if there is a one-to-one relationship between )(θ=α g and θ, the ML estimate for α is simply:
)ˆ(ˆ θ=α g (5.12) Example 5.11 Given N samples of a white Gaussian process , ][nw 1,,1,0 −= Nn L , with unknown variance 2. Determine the power of in dB. σ ]n[w The power in dB is related to 2σ by
)(log10 210 σ=P
which is a one-to-one relationship. To find the ML estimate for P , we first find the ML estimate for 2σ
56
][2
1)ln(2
)2ln(2
))σ(ln(
][2
1exp)2(
1)σ(
21
0222
21
022/22
nxNN;p
nx;p
N
n
N
nN
∑σ
−σ−π−=⇒
∑
σ−
πσ=
−
=
−
=
w
w
Differentiating the log-likelihood function w.r.t. to 2σ :
][2
12σ
))σ(ln( 21
0422
2nxN;p N
n∑
σ+
σ−=
∂
∂ −
=
w
Setting the resultant expression to zero:
][1ˆ][ˆ21
ˆ221
0
221
042 nxN
nxN N
n
N
n∑=σ⇒∑
σ=
σ
−
=
−
=
As a result,
∑=σ=−
=][1log10)ˆ(log10ˆ 21
010
210 nx
NP
N
n
57
Example 5.12 Given
1,,1,0],[][ −=+= NnnwAnx L where A is an unknown constant and is a white Gaussian noise with unknown variance 2. Find the ML estimates of
][nwσ A and 2σ .
TN
nN AAnx;p ][θθx 221
022/2 ,)][(2
1exp)2(
1)( σ=
−∑
σ−
πσ=
−
=
⇒ 21
0422
1
02
)][(2
12σ
))(ln(
)][(1))(ln(
AnxN;p
AnxA;p
N
n
N
n
−∑σ
+σ
−=∂
∂
−∑σ
=∂
∂
−
=
−
=
θx
θx
58
Solving the first equation:
xnxN
AN
n=∑=
−
=][1ˆ 1
0
Putting xAA == ˆ in the second equation:
21
0
2 )][(1ˆ xnxN
N
n−∑=σ
−
=
Numerical Computation of ML Solution When the ML solution is not of closed form, it can be computed by Grid search
Numerical methods: Newton-Raphson, Golden section, bisection, etc
59
Example 5.13 From Example 5.10, the ML solution of φ is determined from
)ˆ22sin(2
)ˆsin(][ 01
00
1
0φ+ω∑=φ+ω∑
−
=
−
=nAnnx
N
n
N
n
Suggest methods to find φ Approach 1: Grid search
Let
)22sin(2
)sin(][)( 01
00
1
0φ+ω∑−φ+ω∑=φ
−
=
−
=nAnnxg
N
n
N
n
It is obvious that φ = root of )(φg
60
The idea of grid search is simple:
Search for all possible values of or a given range of to find root ⇒
φ φ Values are discrete tradeoff between resolution & computation
e.g., Range for : any values in φ )2,0[ π ⇒Discrete points : 1000 resolution is 1000/2π
MATLAB source code: N=100; n=[0:N-1]; w = 0.2*pi; A = sqrt(2); p = 0.3*pi; np = 0.1; q = sqrt(np).*randn(1,N); x = A.*cos(w.*n+p)+q; for j=1:1000 pe = j/1000*2*pi; s1 =sin(w.*n+pe); s2 =sin(2.*w.*n+2.*pe); g(j) = x*s1'-A/2*sum(s2); end
61
pe = [1:1000]/1000; plot(pe,g)
Note: x-axis is )2/( πφ
62
stem(pe,g) axis([0.14 0.16 -2 2])
23240)2152.0( .-g =π⋅ , 21680)2153.0( .g =π⋅
⇒ (π=π⋅=φ 306.02153.0ˆ π± 001.0 )
63
For a smaller resolution, say 200 discrete points: clear pe; clear s1; clear s2; clear g; for j=1:200 pe = j/200*2*pi; s1 =sin(w.*n+pe); s2 =sin(2.*w.*n+2.*pe); g(j) = x*s1'-A/2*sum(s2);end pe = [1:200]/200; plot(pe,g)
64
stem(pe,g) axis([0.14 0.16 -2 2])
13061)2150.0( .-g =π⋅ , 11501)2155.0( .g =π⋅
⇒ (π=π⋅=φ 310.02155.0ˆ π± 005.0 )
⇒ Accuracy increases as number of grids increases
65
Approach 2: Newton/Raphson iterative procedure
ˆ
With initial guess , the root of 0φ )(φg can be determined from
)ˆ(')ˆ(
)()ˆ(ˆˆ
ˆ
1k
kkkk g
g
ddgg
k
φφ
=
φφφ
−φ=φ
φ=φ
+ (5.13)
)22sin(2
)sin(][)( 01
00
1
0φ+ω∑−φ+ω∑=φ
−
=
−
=nAnnxg
N
n
N
n
)22cos()cos(][
2)22cos(2
)cos(][)('
01
00
1
0
01
00
1
0
φ+ω∑−φ+ω∑=
⋅φ+ω∑−φ+ω∑=φ
−
=
−
=
−
=
−
=
nAnnx
nAnnxg
N
n
N
n
N
n
N
n
with 0ˆ 0 =φ
66
p1 = 0; for k=1:10 s1 =sin(w.*n+p1); s2 =sin(2.*w.*n+2.*p1); c1 =cos(w.*n+p1); c2 =cos(2.*w.*n+2.*p1); g = x*s1'-A/2*sum(s2); g1 = x*c1'-A*sum(c2); p1 = p1 - g/g1; p1_vector(k) = p1; end stem(p1_vector/(2*pi))
Newton/Raphson method converges at ~ 3rd iteration
π=π⋅=φ 305.021525.0ˆ
Q. Can you comment on the grid search & Newton/Raphson method?
67
ML Estimation for General Linear Model The general linear data model is given by
wHθx += (5.14) where x is the observed vector of size N w is Gaussian noise vector with known covariance matrix
pC
N ×H is known matrix of size θ is parameter vector of size p Based on (5.7), the PDF of parameterized by is x θ
−⋅⋅−−
π= )()(
21exp
)(det)2(1)( 1
2/12/ HθxCHθxC
θx; -TNp (5.15)
68
Since is not a function of , the ML solution is equivalent to C θ
{ })J(θ θθ
minargˆ = where )())( 1 HθxCHθ(xθ −⋅⋅−= -TJ
Differentiating )(θJ with respect to and then set the result to zero: θ
θHCHxCH
θHCHxCH-ˆ
0ˆ2211
11
⋅⋅⋅=⋅⋅⇒
=⋅⋅+⋅⋅-T-T
-T-T
As a result, the ML solution for linear model is
xCHHCHθ 111 )(ˆ -T-T ⋅= − (5.16) For white noise:
xHHHxIHHIHθ TT-w
T-w
T ⋅=⋅σ⋅⋅σ= −− 112112 )()())((ˆ (5.17)
69
Example 5.14 Given N pair of ( yx, ) where is error-free but x y is subject to error:
1,,1,0,][][][ −=++⋅= Nnnwcnxmny L where is white Gaussian noise vector with known covariance matrix w C Find the ML estimates for and m c
Tcmnwnxnwcm
nxny
nwcnxmny
][],[]1][[][]1][[][
][][][
=+⋅=+
⋅=⇒
++⋅=
θθ
⇒
]1[]1]1[[]1[
]1[]1]1[[]1[]0[]1]0[[]0[
−+⋅−=−
+⋅=+⋅=
NwNxNy
wxywxy
θ
θθ
LLL
70
Writing in matrix form: wHθy +=
where
TNyyy ]]1[,],1[],0[[ −= Ly
−
=
1]1[
1]1[1[0]
Nx
xx
MMH
Applying (5.16) gives
yCHHCHθ 111 )(ˆˆˆ −−− ⋅=
= TTcm
71
Example 5.15 Find the ML estimates of A, 0ω and φ for
1,1,,1,0],[)cos(][ 0 >>−=+φ+ω= NNnnwnAnx L
where is a white Gaussian noise with variance ][nw 2wσ
Recall from Example 5.6:
],[θθx φω=
φ+ω−∑
σ−
πσ=
−
=0
20
1
022/2 ,))cos(][(2
1exp)2(
1);( A,nAnxpN
nwN
w
The ML solution for can be found by minimizing θ
20
1
00 ))cos(][(),,( φ+ω−∑=φω
−
=nAnxAJ
N
n
72
This can be achieved by using a 3-D grid search or Netwon/Raphson method but it is computationally complex Another simpler solution is as follows
200
1
0
20
1
00
))sin()sin()cos()cos(][(
))cos(][(),,(
nAnAnx
nAnxAJ
N
n
N
n
ωφ+ωφ−∑=
φ+ω−∑=φω
−
=
−
=
Since A and are not quadratic in φ ),,( 0 φωAJ , the first step is to use parameter transformation:
)sin()cos(
2
1φ−=α
φ=αAA
⇒
αα−
=φ
α+α=
−
1
21
22
21
tan
A
73
Let
T
T
N
N
))]1(sin()sin(0[
))]1(cos()cos(1[
00
00
−ωω=
−ωω=
L
L
s
c
We have
[ ]scHαHα-xHα-x
scxscx
=
αα
==
α−α−α−α−=ωαα
,),()(
)()(),,(
2
1
2121021
T
TJ
Applying (5.17) gives
xHHHα TT ⋅= −1)(ˆ
74
Substituting back to ),,( 021 ωααJ :
( ) ( )
xHHHHx-xx
xHHHH-Ix
xHHHH-IHHHH-Ix
xHHHH-IxHHHH-I
xHHHH-xxHHHH-x
αH-xαH-x
⋅⋅⋅⋅=
⋅⋅⋅=
⋅⋅⋅⋅=
⋅⋅⋅⋅=
⋅⋅=
=ω
−
−
−−
−−
−−
TTTT
TTT
TTTTTT
TTTTT
TTTTT
TJ
1
1
11
11
110
)(
))((
))(())((
))(())((
))(())((
)ˆ()ˆ()(
Minimizing )( 0ωJ is identical to maximizing
xHHHHx ⋅⋅⋅ − TTT 1)( or
{ }xHHHHx ⋅⋅⋅=ω −
ω
TTT 10 )(maxargˆ
0
⇒ 3-D search is reduced to a 1-D search
75
After determining , can be obtained as well 0ω α For sufficiently large N :
[ ]
[ ]
( ) ( )2
01
0
22
1
11
)exp(][2
2
2/002/
)(
njnxN
N
NN
N
n
TT
T
TTT
T
T
TT
TTTTTTT
ω−∑=
+=
⋅
⋅≈
⋅
⋅=⋅⋅⋅
−
=
−
−−
xsxc
xsxcxsxc
xsxc
sscsscccxsxcxHHHHx
⇒
ω−∑=−
=ω
2
01
00 )exp(][1maxargˆ
0njnx
N
N
nω periodogram maximum ⇒
76
Least Squares Methods Parameter estimation is achieved via minimizing a least squares (LS) cost function Generally not optimum but computationally simple Use for classical parameter estimation No knowledge of the noise PDF is required Can be considered as a generalization of LS filtering
77
Variants of LS Methods
1. Standard LS
Consider the general linear data model:
wHθx += where
x is the observed vector of size N w is zero-mean noise vector with unknown covariance matrix
pN ×H is known matrix of size θ is parameter vector of size p The LS solution is given by
{
( ) ( )} xHHHHθ-xHθ-xθθ
TTT 1)(minargˆ −== (5.18)
which is equal to (5.17)
78
⇒ LS solution is optimum if covariance matrix of is and is w IC ⋅σ= 2w w
Gaussian distributed Define
Hθ-xe =
where
TNeee )]1()1()0([ −= Le
(5.18) is equivalent to
∑=−
=)(minargˆ 21
0ke
N
kθθ (5.19)
which is similar to LS filtering Q. Any differences between (5.19) and LS filtering?
79
Example 5.16 Given
1,,1,0],[][ −=+= NnnwAnx L where A is an unknown constant and is a zero-mean noise ][nw Find the LS solution of A Using (5.19),
( )
−∑=−
=
21
0][minargˆ AnxA
N
nA
Differentiating ( )21
0][ Anx
N
n−∑
−
= with respect to A and set the result to 0:
][1ˆ 1
0nx
NA
N
n∑=−
=
80
On the other hand, writing ]}[{ nx in matrix form:
wHx += A where
=
1
11
MH
Using (5.18),
][
]1[
]1[]0[
111
1
11
111ˆ 1
0
1
1
nxN
Nx
xx
AN
n∑⋅=
−
⋅
⋅=−
=
−
−
ML
ML ][][
Both (5.18) and (5.19) give the same answer and the LS solution is
81
optimum if the noise is white Gaussian Example 5.17
Consider the LS filtering problem again. Given
1,,1,0],[][][ −=+⋅= NnnqWnXnd T L
where
][nd is desired response TLnxnxnxnX ]]1[]1[][[][ +−−= L is the input signal vector
TLwwwW ][ 110 −= L is the unknown filter weight vector
][nq is zero-mean noise
Writing in matrix form:
W, =+⋅= WqWHd Using (5.18):
dHHHW TT 1)(ˆ −=
82
where
−−−
=
−
=
][]2[]1[
0]0[]1[00]0[
)1(
)1()0(
LNxNxNx
xxx
NX
XX
T
T
T
L
MMMM
L
L
MH
with 0]2[]1[ ==−=− Lxx Note that
dH
HHT
dx
Txx
R
R
=
=
where xxR is not the original version but not modified version of (3.6)
83
Example 5.18 Find the LS estimate of A for
1,1,,1,0],[)cos(][ 0 >>−=+φ+ω= NNnnwnAnx L
where 0ω and are known constants while is zero-mean noise φ ][nw Using (5.19),
( )
φ+ω−∑=−
=
20
1
0)cos(][minargˆ nAnxA
N
nA
Differentiate ( )201
0)cos(][ φ+ω−∑
−
=nAnx
N
n with respect to A & set result to 0:
( )
)(cos)cos(][
0)cos()cos(][2
021
00
1
0
001
0
φ+ω∑=φ+ω∑⇒
=φ+ω−⋅φ+ω−∑
−
=
−
=
−
=
nAnnx
nnAnx
N
n
N
n
N
n
84
The LS solution is then
)(cos
)cos(][ˆ
021
0
01
0
φ+ω∑
φ+ω∑= −
=
−
=
n
nnxA N
n
N
n
2. Weighted LS Use a general form of LS via a symmetric weighting matrix W ( ) ( ){ } WxHWHHHθ-xWHθ-xθ
θTTT 1)(minargˆ −== (5.20)
such that TWW =
Due to the presence of , it is generally difficult to write the cost function ( )
W( )Hθ-xWHθ-x T in scalar form as in (5.19)
85
Rationale of using : put larger weights on data with smaller errors W put smaller weights on data with larger errors When = where is covariance matrix of the noise vector: W 1-C C xCHHCHθ 111 )(ˆ -T-T −= (5.21) which is equal to the ML solution and is optimum for Gaussian noise Example 5.19 Given two noisy measurements of A:
11 wAx += and 22 wAx +=
where and are zero-mean uncorrelated noises with known variances 2 and . Determine the optimum weighted LS solution
1w 2w22σ1σ
86
Use
σσ=
σσ==
−
22
21
1
22
211
/100/1
00-CW
Grouping and into matrix form: 1x 2x
+⋅
=
2
1
2
111
ww
Axx
or wHx +⋅= A
Using (5.21)
[ ] [ ]
σσ
σσ==
−−
2
122
21
1
22
21111
/100/111
11
/100/111)(ˆ
xx
A -T-T xCHHCH
87
As a result,
222
21
21
122
21
22
22
221
11
22
21
11ˆ xxxxA ⋅σ+σ
σ+⋅
σ+σ
σ=
σ+
σ
σ+
σ=
−
Note that If , a larger weight is placed on and vice versa 2
122 σ>σ 1x
If , the solution is equal to the standard sample mean w w
21
22 σ=σ
The solution will be more complicated if and are correlated 2 2
1 2 Exact values for and are not necessary, only ratio is needed 1σ 2σ
Define , we have 2
221 /σσ=λ
21 111ˆ xxA ⋅
λ+λ
+⋅λ+
=
88
3. Nonlinear LS
The LS cost function cannot be represented as a linear model as in
wHθx += In general, it is more complex to solve, e.g., The LS estimates for 0,ωA and φ can be found by minimizing
20
1
0))cos(][( φ+ω−∑
−
=nAnx
N
n
whose solution is not straightforward as seen in Example 5.15 Grid search and numerical methods are used to find the minimum
89
4. Constrained LS
The linear LS cost function is minimized subject to constraints:
{ }
( ) ( )Hθ-xHθ-xθθ
Tminargˆ = subject to (5.22) S where is a set of equalities/inequalities in terms of S θ Generally it can be solved by linear/nonlinear programming, but simpler solution exists for linear and quadratic constraint equations, e.g., Linear constraint equation: 10321 =θ+θ+θ Quadratic constraint equation: 1002
322
21 =θ+θ+θ
Other types of constraints: 10321 >θ>θ>θ
10033
221 ≥θ+θ+θ
90
Consider the constraints is S
bAθ = which contains r linear equations. The constrained LS problem for linear model is
( ) ( ){ }Hθ-xHθ-xθθ
Tminargˆ = subject to bAθ = (5.23)
The technique of Lagrangian multipliers can solve (5.23) as follows Define the Lagrangian
( ) ( ) )( b-AθλHθ-xHθ-x TTcJ += (5.24)
where λ is a r -length vector of Lagrangian multipliers The procedure is first solve λ then θ
91
Expanding (5.24):
bλ-AθλHθHθxH2θ-xx TTTTTTTcJ ++=
Differentiate cJ with respect to : θ
λAHθHx-2Hθ
TTTcJ ++=∂∂ 2
Set the result to zero:
λAHH-θλAHH-xHHHθ
0λAθHHx2H-
TTTTTTc
Tc
TT
111 )(21ˆ)(
21)(ˆ
ˆ2
−−− ==⇒
=++
where is the LS solution. Put into : θ cθ bAθ =
)ˆ())((2
)(21ˆˆ 111 b-θAAHHAλbλAHHA-θAθA −−− =⇒== TTTT
c
92
Put λ back to : cθ
)ˆ())(()(ˆˆ 111 b-θAAHHAAHH-θθ −−−= TTTTc
Idea of constrained LS can be illustrated by finding minimum value of y :
2 23 +−= xxy subject to 1=− yx
93
5. Total LS
Motivation: Noises at both and : x Hθ 21 wHθwx +=+ (5.25) where and are zero-mean noise vectors 1w 2w A typical example is LS filtering in the presence of both input noise and output noise. The noisy input is
1,,1,0),()()( −=+= Nnknkskx i L and the noisy output is
1,,1,0),()()()( −=+⊗= Nnknkhkskr o L The parameters to be estimated are )}({ kh given )(kx and )(ky
94
Another example is in frequency estimation using linear prediction: For a single sinusoid )cos()( φ+ω= kAks , it is true that
)2()1()cos(2)( −−−ω= ksksks s )(k is perfectly predicted by )1( −ks and )1( −ks :
)2()1()( 10 −+−= ksaksaks It is desirable to obtain )cos(20 ω=a and 11 −=a in estimation process In the presence of noise, the observed signal is
1,,1,0),()()( −=+= Nnkwkskx L The linear prediction model is now
95
1,,1,0),2()1()( 10 −=−+−= Nnkxakxakx L 0
)3()2()1(
)1()2()3()0()1()2(
10
10
10
−+−=−
+=+=
NxaNxaNx
xaxaxxaxax
LLL⇒
−−
=
−1
0
)2()2(
)1()2()()1(
)1(
)3()2(
aa
NxNx
xxxx
Nx
xx
MMM
−−
+
−−
=
−
+
−1
0
1
0
)2()2(
)1()2()0()1(
)2()2(
)1()2()0()1(
)1(
)3()2(
)1(
)3()2(
aa
NwNw
wwww
aa
NsNs
ssss
Nw
ww
Ns
ss
MMMMMM
6. Mixed LS
A combination of LS, weighted LS, nonlinear LS, constrained LS and/or total LS Examples: weighted LS with constraints, total LS with constraints, etc.
96
Questions for Discussion
1. Suppose you have N pairs of ),( ii yx , Ni ,,2,1 L= and you need to fit them into the model of axy = . Assuming that only }{ iy contain zero-mean noise, determine the least squares estimate for . a
(Hint: ithe relationship between and ix iy is
Ninaxy iii ,,2,1, L=+= where { } are the noise in in }{ iy .)
97
2. Use least squares to estimate the line axy = in Q.1 but now only }{ ix
contain zero-mean noise. 3. In a radar system, the received signal is
)()()( 0 nwnsnr +τ−α= where the range R of an object is related to the time delay by
cR /20 =τ Suppose we get an unbiased estimate of , say, 0τ 0τ , and its variance is
. Determine the corresponding range variance ˆ , where )ˆvar( 0τ )var(R R is the estimate of R . If and 2
0 )1.0()ˆvar( sµ=τ 18103 −×= msc , what is the value of ? )ˆvar(R
98
99