Upload
danilo-calle
View
217
Download
0
Embed Size (px)
Citation preview
7/28/2019 Espanol l Ectr 16
1/33
1
16. Mean Square Estimation
Given some information that is related to an unknown quantity of
interest, the problem is to obtain a good estimate for the unknown interms of the observed data.
Suppose represent a sequence of random
variables about whom one set of observations are available, and Y
represents an unknown random variable. The problem is to obtain agood estimate forYin terms of the observations
Let
represent such an estimate forY.
Note that can be a linear or a nonlinear function of the observationClearly
represents the error in the above estimate, and the square of
nXXX ,,, 21
.,,, 21 nXXX
)(),,,( 21 XXXXY n (16-1)
)(.,,, 21 nXXX
)()( XYYYX (16-2)
2
|| PILLAI
7/28/2019 Espanol l Ectr 16
2/33
2
the error. Since is a random variable, represents the mean
square error. One strategy to obtain a good estimator would be to
minimize the mean square error by varying over all possible forms
of and this procedure gives rise to the Minimization of the
Mean Square Error (MMSE) criterion for estimation. Thus under
MMSE criterion,the estimator is chosen such that the mean
square error is at its minimum.
Next we show that the conditional mean ofYgivenXis the
best estimator in the above sense.Theorem1: Under MMSE criterion, the best estimator for the unknown
Yin terms of is given by the conditional mean ofY
givesX. Thus
Proof : Let represent an estimate ofYin terms of
Then the error and the mean square
error is given by
}||{2E
),(
)(
nXXX ,,, 21
}.|{)( XYEXY (16-3)
)( XY ).,,,( 21 nXXXX ,YY
}|)(|{}||{}||{ 2222 XYEYYEE (16-4) PILLAI
}||{ 2E
7/28/2019 Espanol l Ectr 16
3/33
3
Since
we can rewrite (16-4) as
where the inner expectation is with respect to Y, and the outer one is
with respect to
Thus
To obtain the best estimator we need to minimize in (16-6)with respect to In (16-6), since
and the variable appears only in the integrand term, minimization
of the mean square error in (16-6) with respect to is
equivalent to minimization of with respect to
}]|{[][ XzEEzE zX
}]|)(|{[}|)(|{
z
2
z
22 XXYEEXYE YX
.X
.)(}|)(|{
}]|)(|{[
2
22
dxXfXXYE
XXYEE
X
(16-6)
(16-5)
,2
. ,0)( XfX ,0}|)(|{
2 XXYE
2
}|)(|{ 2 XXYE .
PILLAI
7/28/2019 Espanol l Ectr 16
4/33
4
Since X is fixed at some value, is no longer random,
and hence minimization of is equivalent to
This gives
or
But
since when is a fixed number Using (16-9)
)(X
}|)(|{ 2 XXYE
.0}|)(|{ 2
XXYE
(16-7)
0}|)({| XXYE
(16-8)
),(}|)({ XXXE (16-9)
)(, XxX ).(x
PILLAI
.0}|)({}|{ XXEXYE
7/28/2019 Espanol l Ectr 16
5/33
5
in (16-8) we get the desired estimator to be
Thus the conditional mean ofYgiven represents the bestestimator forYthat minimizes the mean square error.
The minimum value of the mean square error is given by
As an example, suppose is the unknown. Then the bestMMSE estimator is given by
Clearly if then indeed is the best estimator forY
}.,,,|{}|{)( 21 nXXXYEXYEXY
nXXX ,,, 21
.0)}|{var(
]}|)|(|{[}|)|(|{
)var(
222
min
XYE
XXYEYEEXYEYE
XY
(16-11)
(16-10)
3
XY
,3XY 3 XY
.}|{}|{ 33 XXXEXYEY (16-12)
PILLAI
7/28/2019 Espanol l Ectr 16
6/33
6
in terms ofX. Thus the best estimator can be nonlinear.
Next, we will consider a less trivial example.
Example : Let
where k> 0 is a suitable normalization constant. To determine the best
estimate forYin terms ofX, we need
Thus
Hence the best MMSE estimator is given by
otherwise,0
10,
),(,yxkxy
yxf YX
).|(|
xyfXY
1.x0,2
)1(
2
),()(
212
1
1
,
xkxkxy
kxydydyyxfxf
x
x xYXX
y
x
1
1
.10;1
2
2/)1()(
),()|(
22
,
yxx
y
xkx
kxy
xf
yxfxyf
X
YX
XY
PILLAI
(16-13)
7/28/2019 Espanol l Ectr 16
7/33
7
Once again the best estimator is nonlinear. In general the best
estimator is difficult to evaluate, and hence next we
will examine the special subclass of best linear estimators.
Best Linear EstimatorIn this case the estimator is a linear function of the
observations Thus
where are unknown quantities to be determined. The
mean square error is given by
.1
)1(
3
2
1
1
3
2
13
2
)|(}|{)(
2
2
2
31
2
3
1
2
1
21
1
2
1
22
|
x
xx
x
x
x
y
dyydyy
dyxyfyXYEXY
x
xxx x
y
x XY
}|{
XYE
(16-14)
Y
.,,, 21 nXXX
naaa ,,, 21
n
iiinnl XaXaXaXaY
12211 . (16-15)
)
( lYY PILLAI
7/28/2019 Espanol l Ectr 16
8/33
8
and under the MMSE criterion should be chosen so
that the mean square error is at its minimum possible
value. Let represent that minimum possible value. Then
To minimize (16-16), we can equate
This gives
But
naaa ,,, 21
}||{}|{|}||{ 222 iil XaYEYYEE
}||{ 2E2
n
n
iii
aaan XaYE
n 1
2
,,,
2 }.|{|min21
(16-17)
,n.,,kEak
21,0}|{| 2
(16-18)
.02||
}|{|
*22
kkk aE
aEE
a
(16-19)
PILLAI
(16-16)
7/28/2019 Espanol l Ectr 16
9/33
9
Substituting (16-19) in to (16-18), we get
or the best linear estimator must satisfy
Notice that in (16-21), represents the estimation error
and represents the data. Thus from(16-21), the error is orthogonal to the data for the
best linear estimator. This is the orthogonality principle.
In other words, in the linear estimator (16-15), the unknown
constants must be selected such that the error
.
)()(11
k
k
n
iii
kk
n
iii
k
Xa
Xa
a
Y
a
XaY
a
,0}{2}|{| *
2
k
k
XEa
E
.,,2,1,0}{*
nkXE k
(16-20)
(16-21)
n
i iiXaY 1 ),(
nkXk 1, nkXk 1,
naaa ,,, 21
PILLAI
7/28/2019 Espanol l Ectr 16
10/33
10
is orthogonal to every data for the
best linear estimator that minimizes the mean square error.
Interestingly a general form of the orthogonality principle
holds good in the case of nonlinear estimators also.Nonlinear Orthogonality Rule: Let represent any functional
form of the data and the best estimator forYgiven With
we shall show that
implying that
This follows since
n
i iiXaY
1 nXXX ,,, 21
)(Xh}|{ XYE .X
}|{ XYEYe
).(}|{ XhXYEYe
,0)}({ XehE
.0)}({)}({
]}|)([{)}({
})(]|[{)}({
)}(])|[{()}({
XYhEXYhE
XXYhEEXYhE
XhXYEEXYhE
XhXYEYEXehE
PILLAI
(16-22)
7/28/2019 Espanol l Ectr 16
11/33
11
Thus in the nonlinear version of the orthogonality rule the error is
orthogonal to any functional form of the data.
The orthogonality principle in (16-20) can be used to obtain
the unknowns in the linear case.
For example suppose n = 2, and we need to estimate Yin
terms of linearly. Thus
From (16-20), the orthogonality rule gives
Thus
or
naaa ,,, 21
21 andXX
2211 XaXaYl
0}){(}X{
0}){(}X{
*
22211
*
2
*
12211
*
1
XXaXaYEE
XXaXaYEE
}{}|{|}{
}{}{}|{|
*
22
2
21
*
21
*
12
*
121
2
1
YXEaXEaXXE
YXEaXXEaXE
PILLAI
7/28/2019 Espanol l Ectr 16
12/33
7/28/2019 Espanol l Ectr 16
13/33
13
where are the optimum values from (16-21).
Since the linear estimate in (16-15) is only a special case of
the general estimator in (16-1), the best linear estimator that
satisfies (16-20) cannot be superior to the best nonlinear estimator
Often the best linear estimator will be inferior to the best
estimator in (16-3).
This raises the following question. Are there situations in
which the best estimator in (16-3) also turns out to be linear ? In
those situations it is enough to use (16-21) and obtain the bestlinear estimators, since they also represent the best global estimators.
Such is the case ifYand are distributed as jointly Gaussia
We summarize this in the next theorem and prove that result.
Theorem2: If and Y are jointly Gaussian zero
naaa ,,, 21
)(X
}.|{ XYE
}{}|{|
}){(}{
1
*2
1
**2
n
i
ii
n
iiin
YXEaYE
YXaYEYE
(16-25)
nXXX ,,, 21
nXXX ,,, 21 PILLAI
7/28/2019 Espanol l Ectr 16
14/33
14
mean random variables, then the best estimate forYin terms of
is always linear.
Proof : Let
represent the best (possibly nonlinear) estimate ofY, and
the best linear estimate of Y. Then from (16-21)
is orthogonal to the data Thus
Also from (16-28),
nXXX ,,, 21
}|{),,,( 21 XYEXXXY n (16-26)
n
i
iil XaY1
(16-27)
.1, nkXk
(16-28)
.1,0}X{ *k nkE (16-29)
.0}{}{}{1
n
iii XEaYEE (16-30)
PILLAI
1
n
l i ii
Y Y Y a X
7/28/2019 Espanol l Ectr 16
15/33
15
Using (16-29)-(16-30), we get
From (16-31), we obtain that and are zero mean uncorrelatedrandom variables for But itself represents a Gaussian
random variable, since from (16-28) it represents a linear combination
of a set of jointly Gaussian random variables. Thus and X are
jointly Gaussian and uncorrelated random variables. As a result, and
X are independent random variables. Thus from their independence
But from (16-30), and hence from (16-32)
Substituting (16-28) into (16-33), we get
.1 nk kX
.1,0}{}{}{**
nkXEEXE kk (16-31)
}.{}|{ EXE (16-32)
,0}{ E
.0}|{ XE (16-33)
0}|{}|{ 1
XXaYEXEn
iii
PILLAI
7/28/2019 Espanol l Ectr 16
16/33
16
or
From (16-26), represents the best possible estimator,and from (16-28), represents the best linear estimator.
Thus the best linear estimator is also the best possible overall estimator
in the Gaussian case.
Next we turn our attention to prediction problems using linearestimators.
Linear Prediction
Suppose are known and is unknown.Thus and this represents a one-step prediction problem.
If the unknown is then it represents a k-step ahead prediction
problem. Returning back to the one-step predictor, let
represent the best linear predictor. Then
.}|{}|{1
1
l
n
iii
n
iii YXaXXaEXYE
(16-34)
)(}|{ xXYE
n
i iiXa
1
nXXX ,,, 21 1nX,1 nXY,knX
1
nX
PILLAI
7/28/2019 Espanol l Ectr 16
17/33
17
where the error
is orthogonal to the data, i.e.,
Using (16-36) in (16-37), we get
Suppose represents the sample of a wide sense stationaryiX
(16-35)
,1,
1
1
1
12211
1111
n
n
i ii
nnn
n
iiinnnn
aXa
XXaXaXa
XaXXX
(16-36)
.1,0}{ * nkXE kn (16-37)
1
1
** .1,0}{}{n
ikiikn nkXXEaXE (16-38)
PILLAI
11
= ,n
n i ii
X a X
7/28/2019 Espanol l Ectr 16
18/33
18
stochastic process so that
Thus (16-38) becomes
Expanding (16-40) for we get the following set of
linear equations.
Similarly using (16-25), the minimum mean square error is given by
**)(}{ ikkiki rrkiRXXE
)(tX
(16-39)
.1,1,0}{ 1
1
1
* nkaraXE n
n
ikiikn
(16-40)
,,,2,1 nk
.0
20
10
10
*
33
*
22
*
11
121302
*
11
1231201
nkrrararara
krrararara
krrararara
nnnn
nnn
nnn
(16-41)
PILLAI
7/28/2019 Espanol l Ectr 16
19/33
19
The n equations in (16-41) together with (16-42) can be represented as
Let
.
}){(
}{}{}|{|
01
*
23
*
12
*
1
1
1
*
1
1
1
*
1
*
1
*22
rrararara
raXXaE
XEYEE
nnnn
n
i
ini
n
i
nii
nnnn
(16-42)
.
0
0
0
0
1
2
n
3
2
1
0
*
1
*
1
*
10
*
2
*
1
20
*
1
*
2
110
*
1
210
n
nn
nn
n
n
n
a
a
a
a
rrrr
rrrr
rrrr
rrrr
rrrr
(16-43)
PILLAI
7/28/2019 Espanol l Ectr 16
20/33
20
Notice that is Hermitian Toeplitz and positive definite. Using
(16-44), the unknowns in (16-43) can be represented as
Let
nT
.
0*
1
*
1
*
110
*
1
210
rrrr
rrrr
rrrr
T
nn
n
n
n
(16-44)
1
2
2
n
13
2
1
of
column
Last
0
0
0
0
1
n
nn
n T
T
a
a
a
a
(16-45)
PILLAI
7/28/2019 Espanol l Ectr 16
21/33
21
Then from (16-45),
Thus
.
1,12,11,1
1,22221
1,11211
1
nn
n
n
n
n
n
n
nnn
n
nnn
n
TTT
TTT
TTT
T
.
1
1,1
1,2
1,1
2
2
1
nnn
n
n
n
n
n
n
T
T
T
a
a
a
,01
1,1
2 nn
n
n
T
(16-46)
(16-47)
(16-48)PILLAI
7/28/2019 Espanol l Ectr 16
22/33
22
and
Eq. (16-49) represents the best linear predictor coefficients, and they
can be evaluated from the last column of in (16-45). Using these,
The best one-step ahead predictor in (16-35) taken the form
and from (16-48), the minimum mean square error is given by the
(n +1, n +1) entry of
From (16-36), since the one-step linear prediction error
.
1
1,1
1,2
1,1
1,1
2
1
nn
n
n
n
n
n
nn
n
n T
T
T
T
a
a
a
(16-49)
nT
.1n
T
.)(1
1
1,
1,11
n
i
i
ni
nnn
n
n XTT
X (16-50)
,11111 XaXaXaX nnnnnn (16-51) PILLAI
7/28/2019 Espanol l Ectr 16
23/33
23
we can represent (16-51) formally as follows
Thus, let
them from the above figure, we also have the representation
The filter
represents anAR(n) filter, and this shows that linear prediction leads
to an auto regressive (AR) model.
n
n
nnn zazazaX
1 12
1
1
1
,1)( 12
1
1 n
nnn zazazazA
(16-52)
.)(
1 1 n
n
n XzA
n
nnn zazazazAzH
1
2
1
11
1
)(
1)(
(16-53)
PILLAI
7/28/2019 Espanol l Ectr 16
24/33
24
The polynomial in (16-52)-(16-53) can be simplified using
(16-43)-(16-44). To see this, we rewrite as
To simplify (16-54), we can make use of the following matrix identity
)(zAn)(zAn
2
n
11)1(
2
1
1)1(
12
1
)1(
21
0
0
0
]1,,,,[
1
]1,,,,[
1)(
n
nn
n
nn
nn
nn
n
Tzzz
a
a
a
zzz
zazazazazA
(16-54)
PILLAI
.
0
0
1
BCADC
A
I
ABI
DC
BA(16-55)
7/28/2019 Espanol l Ectr 16
25/33
25
Taking determinants, we get
In particular if we get
Using (16-57) in (16-54), with
.
1
BCADADC
BA
(16-56)
,0D
.0
)1(1C
BAA
BCAn
(16-57)
2
1)1(
0
,],1,,,,[
n
n
nnBTAzzzC
PILLAI
7/28/2019 Espanol l Ectr 16
26/33
26
we get
Referring back to (16-43), using Cramers rule to solve for
we get
.
1
||
01,,,
0
0
0
||
)1()(
1)1(
10
*
2
*
1
110*
1
210
2
1
2
zzz
rrrr
rrrr
rrrr
T
zz
T
TzA
nn
nn
n
n
n
n
n
n
n
n
n
n
(16-58)
),1(1 na
PILLAI
1||
||
||
1201
10
2
1
n
nn
n
n
n
n
nT
T
T
rr
rr
a
7/28/2019 Espanol l Ectr 16
27/33
27
or
Thus the polynomial (16-58) reduces to
The polynomial in (16-53) can be alternatively represented as
in (16-60), and in fact represents a stable
.0||
||
1
2 n
nn
T
T
.1
1
||
1)(
1
2
1
1
1)1(
10
*
2
*
1
110
*
1
210
1
n
nn
nn
nn
n
n
n
n
zazaza
zzz
rrrr
rrrr
rrrr
TzA
(16-59)
(16-60)
PILLAI
)(zAn
)(~)(
1)( nAR
zAzH
n
7/28/2019 Espanol l Ectr 16
28/33
28
AR filter of ordern, whose input error signal is white noise of
constant spectral height equal to and output is
It can be shown that has all its zeros in provided
thus establishing stability.
Linear prediction ErrorFrom (16-59), the mean square error using n samples is given
by
Suppose one more sample from the past is available to evaluate
( i.e., are available). Proceeding as abovethe new coefficients and the mean square error can be determined.
From (16-59)-(16-61),
n
||/|| 1nn TT .1nX
1nX
.0||
||
1
2 n
nn
T
T (16-61)
011 ,,,, XXXX nn 21n
PILLAI
.
||
|| 121
n
nn
T
T (16-62)
)(zAn 1|| z0|| nT
7/28/2019 Espanol l Ectr 16
29/33
29
Using another matrix identity it is easy to show that
Since we must have or for every n.
From (16-63), we have
or
since Thus the mean square error decreases as more
and more samples are used from the past in the linear predictor.
In general from (16-64), the mean square errors for the one-step
predictor form a monotonic nonincreasing sequence
).||1(||
|||| 21
1
2
1
nn
nn s
T
TT (16-63)
,0|| kT 0)||1(2
1 ns 1|| 1 ns
)||1(
||
||
||
|| 21
1
1
221
n
n
n
n
n s
T
T
T
T
nn
,)||1(22
1
22
1 nnnn s (16-64)
PILLAI
.1)||1( 21 ns
7/28/2019 Espanol l Ectr 16
30/33
30
whose limiting value
Clearly, corresponds to the irreducible error in linearprediction using the entire past samples, and it is related to the power
spectrum of the underlying process through the relation
where represents the power spectrum of
For any finite power process, we have
and since Thus
222
1
2 knn (16-65)
.02
)(nTX
).(nTX( ) 0XX
S
2
1
exp ln ( ) 0.2 XXS d
(16-66)
( ) ,XXS d
PILLAI
02
ln ( ) ( ) .XX XX
S d S d
( ( ) 0), ln ( ) ( ).XX XX XX
S S S
(16-67)
7/28/2019 Espanol l Ectr 16
31/33
31
Moreover, if the power spectrum is strictly positive at every
Frequency, i.e.,
then from (16-66)
and hence
i.e., For processes that satisfy the strict positivity condition in
(16-68) almost everywhere in the interval the finalminimum mean square error is strictly positive (see (16-70)).
i.e., Such processes are not completely predictable even using
their entire set of past samples, or they are inherently stochastic,
( ) 0, in - ,XX
S
ln ( ) .XX
S d
2
1exp ln ( ) 0
2XX
S d e
(16-68)
(16-69)
(16-70)
),,(
PILLAI
7/28/2019 Espanol l Ectr 16
32/33
32
since the next output contains information that is not contained in
the past samples. Such processes are known as regularstochastic
processes, and their power spectrum is strictly positive.
)(XX
S
Power Spectrum of a regular stochastic Process
PILLAI
Conversely, if a process has the following power spectrum,
such that in then from (16-70),( ) 0XX
S 21 .02
2
1
)(XX
S
7/28/2019 Espanol l Ectr 16
33/33
33
Such processes are completely predictable from their past data
samples. In particular
is completely predictable from its past samples, since consists
of line spectrum.
in (16-71) is a shape deterministic stochastic process.
k
kkk tanTX )cos()( (16-71)
( )XX
S
)(nTX
1
k
)(XX
S