Espanol l Ectr 16

7/28/2019 Espanol l Ectr 16


16. Mean Square Estimation

PILLAI

Given some information that is related to an unknown quantity of interest, the problem is to obtain a good estimate for the unknown in terms of the observed data.

Suppose $X_1, X_2, \ldots, X_n$ represent a sequence of random variables for which one set of observations is available, and $Y$ represents an unknown random variable. The problem is to obtain a good estimate for $Y$ in terms of the observations $X_1, X_2, \ldots, X_n$. Let

$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = \varphi(X) \tag{16-1}$$

represent such an estimate for $Y$. Note that $\varphi(\cdot)$ can be a linear or a nonlinear function of the observations $X_1, X_2, \ldots, X_n$. Clearly

$$\varepsilon(X) = Y - \hat{Y} = Y - \varphi(X) \tag{16-2}$$

represents the error in the above estimate, and $|\varepsilon|^2$ the square of the error. Since $\varepsilon$ is a random variable, $E\{|\varepsilon|^2\}$ represents the mean square error. One strategy to obtain a good estimator would be to minimize the mean square error by varying over all possible forms of $\varphi(\cdot)$, and this procedure gives rise to the Minimization of the Mean Square Error (MMSE) criterion for estimation. Thus under the MMSE criterion, the estimator $\varphi(\cdot)$ is chosen such that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum.

Next we show that the conditional mean of $Y$ given $X$ is the best estimator in the above sense.

Theorem 1: Under the MMSE criterion, the best estimator for the unknown $Y$ in terms of $X_1, X_2, \ldots, X_n$ is given by the conditional mean of $Y$ given $X$. Thus

$$\hat{Y} = \varphi(X) = E\{Y \mid X\}. \tag{16-3}$$

Proof: Let $\hat{Y} = \varphi(X)$ represent an estimate of $Y$ in terms of $X = (X_1, X_2, \ldots, X_n)$. Then the error is $\varepsilon = Y - \hat{Y}$, and the mean square error is given by

$$\sigma_\varepsilon^2 = E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}|^2\} = E\{|Y - \varphi(X)|^2\}. \tag{16-4}$$


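Since

$$E[z] = E_X\big[E_z\{z \mid X\}\big]$$

we can rewrite (16-4) as

$$\sigma_\varepsilon^2 = E\{|Y - \varphi(X)|^2\} = E_X\big[E_Y\{|Y - \varphi(X)|^2 \mid X\}\big] \tag{16-5}$$

where the inner expectation is with respect to $Y$, and the outer one is with respect to $X$. Thus

$$\sigma_\varepsilon^2 = E\big[E\{|Y - \varphi(X)|^2 \mid X\}\big] = \int_{-\infty}^{+\infty} E\{|Y - \varphi(X)|^2 \mid X = x\}\, f_X(x)\, dx. \tag{16-6}$$

To obtain the best estimator $\varphi$, we need to minimize $\sigma_\varepsilon^2$ in (16-6) with respect to $\varphi$. In (16-6), since $f_X(x) \ge 0$, $E\{|Y - \varphi(X)|^2 \mid X\} \ge 0$, and the variable $\varphi$ appears only in the integrand term, minimization of the mean square error $\sigma_\varepsilon^2$ in (16-6) with respect to $\varphi$ is equivalent to minimization of $E\{|Y - \varphi(X)|^2 \mid X\}$ with respect to $\varphi$.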


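Since $X$ is fixed at some value, $\varphi(X)$ is no longer random, and hence minimization of $E\{|Y - \varphi(X)|^2 \mid X\}$ is equivalent to

$$\frac{\partial}{\partial \varphi} E\{|Y - \varphi(X)|^2 \mid X\} = 0. \tag{16-7}$$

This gives

$$E\{Y - \varphi(X) \mid X\} = 0 \tag{16-8}$$

or

$$E\{Y \mid X\} - E\{\varphi(X) \mid X\} = 0.$$

But

$$E\{\varphi(X) \mid X\} = \varphi(X), \tag{16-9}$$

since when $X = x$, $\varphi(X)$ is a fixed number $\varphi(x)$. Using (16-9) in (16-8) we get the desired estimator to be

$$\hat{Y} = \varphi(X) = E\{Y \mid X\} = E\{Y \mid X_1, X_2, \ldots, X_n\}. \tag{16-10}$$

Thus the conditional mean of $Y$ given $X_1, X_2, \ldots, X_n$ represents the best estimator for $Y$ that minimizes the mean square error.

The minimum value of the mean square error is given by

$$\sigma_{\min}^2 = E\{|Y - E(Y \mid X)|^2\} = E\big[\underbrace{E\{|Y - E(Y \mid X)|^2 \mid X\}}_{\operatorname{var}(Y \mid X)}\big] = E\{\operatorname{var}(Y \mid X)\} \ge 0. \tag{16-11}$$

As an example, suppose $Y = X^3$ is the unknown. Then the best MMSE estimator is given by

$$\hat{Y} = E\{Y \mid X\} = E\{X^3 \mid X\} = X^3. \tag{16-12}$$

Clearly if $Y = X^3$, then indeed $\hat{Y} = X^3$ is the best estimator for $Y$ in terms of $X$. Thus the best estimator can be nonlinear.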
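Theorem 1 can be checked on a small discrete case. The sketch below uses a made-up joint probability table (`pmf` and its values are illustrative assumptions, not from the lecture), computes $E\{Y \mid X\}$ exactly, and compares its mean square error with that of the best estimator restricted to the form $aX$.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); the values are illustrative only.
pmf = {
    (-1, 1): F(1, 4), (-1, 2): F(1, 12),
    (0, 0): F(1, 4), (0, 1): F(1, 12),
    (1, 1): F(1, 4), (1, 3): F(1, 12),
}

def conditional_mean(pmf):
    """Return {x: E[Y | X = x]} for a discrete joint pmf."""
    px, pxy = {}, {}
    for (x, y), p in pmf.items():
        px[x] = px.get(x, 0) + p
        pxy[x] = pxy.get(x, 0) + y * p
    return {x: pxy[x] / px[x] for x in px}

def mse(pmf, estimator):
    """Mean square error E[(Y - estimator(X))^2]."""
    return sum(p * (y - estimator(x)) ** 2 for (x, y), p in pmf.items())

phi = conditional_mean(pmf)
mse_cond = mse(pmf, lambda x: phi[x])

# Best estimator of the restricted form a*X: a = E[XY] / E[X^2].
e_xy = sum(p * x * y for (x, y), p in pmf.items())
e_xx = sum(p * x * x for (x, y), p in pmf.items())
a = e_xy / e_xx
mse_lin = mse(pmf, lambda x: a * x)

print(phi, mse_cond, mse_lin)
```

For this table the conditional mean is not a multiple of $x$, and its error is well below that of the restricted estimator, in line with (16-11).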

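Next, we will consider a less trivial example.

Example: Let

$$f_{X,Y}(x, y) = \begin{cases} kxy, & 0 < x < y < 1, \\ 0, & \text{otherwise}, \end{cases}$$

where $k > 0$ is a suitable normalization constant. To determine the best estimate for $Y$ in terms of $X$, we need $f_{Y|X}(y \mid x)$.

[Figure: the support region $0 < x < y < 1$ in the $(x, y)$ plane]

$$f_X(x) = \int_x^1 f_{X,Y}(x, y)\, dy = \int_x^1 kxy\, dy = \left.\frac{kxy^2}{2}\right|_x^1 = \frac{kx(1 - x^2)}{2}, \qquad 0 < x < 1.$$

Thus

$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{kxy}{kx(1 - x^2)/2} = \frac{2y}{1 - x^2}, \qquad 0 < x < y < 1. \tag{16-13}$$

Hence the best MMSE estimator is given by

$$\hat{Y} = \varphi(X) = E\{Y \mid X\} = \int_x^1 y\, f_{Y|X}(y \mid x)\, dy = \int_x^1 y\, \frac{2y}{1 - x^2}\, dy = \frac{2}{1 - x^2} \int_x^1 y^2\, dy = \frac{2}{3}\, \frac{1 - x^3}{1 - x^2} = \frac{2}{3}\, \frac{1 + x + x^2}{1 + x}. \tag{16-14}$$

Once again the best estimator is nonlinear. In general the best estimator $E\{Y \mid X\}$ is difficult to evaluate, and hence next we will examine the special subclass of best linear estimators.

Best Linear Estimator

In this case the estimator $\hat{Y}$ is a linear function of the observations $X_1, X_2, \ldots, X_n$. Thus

$$\hat{Y}_l = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i, \tag{16-15}$$

where $a_1, a_2, \ldots, a_n$ are unknown quantities to be determined. The mean square error ($\varepsilon = Y - \hat{Y}_l$) is given by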


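$$E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}_l|^2\} = E\Big\{\Big|Y - \sum_{i=1}^{n} a_i X_i\Big|^2\Big\} \tag{16-16}$$

and under the MMSE criterion $a_1, a_2, \ldots, a_n$ should be chosen so that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum possible value. Let $\sigma_n^2$ represent that minimum possible value. Then

$$\sigma_n^2 = \min_{a_1, a_2, \ldots, a_n} E\Big\{\Big|Y - \sum_{i=1}^{n} a_i X_i\Big|^2\Big\}. \tag{16-17}$$

To minimize (16-16), we can equate

$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = 0, \qquad k = 1, 2, \ldots, n. \tag{16-18}$$

This gives

$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = E\left\{\frac{\partial |\varepsilon|^2}{\partial a_k}\right\} = 2E\left[\varepsilon \left(\frac{\partial \varepsilon}{\partial a_k}\right)^{*}\right] = 0. \tag{16-19}$$

But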


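$$\frac{\partial \varepsilon}{\partial a_k} = \frac{\partial \big(Y - \sum_{i=1}^{n} a_i X_i\big)}{\partial a_k} = \frac{\partial Y}{\partial a_k} - \frac{\partial \big(\sum_{i=1}^{n} a_i X_i\big)}{\partial a_k} = -X_k. \tag{16-20}$$

Substituting (16-20) into (16-19), we get

$$\frac{\partial E\{|\varepsilon|^2\}}{\partial a_k} = -2E\{\varepsilon X_k^*\} = 0,$$

or the best linear estimator must satisfy

$$E\{\varepsilon X_k^*\} = 0, \qquad k = 1, 2, \ldots, n. \tag{16-21}$$

Notice that in (16-21), $\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$ represents the estimation error and $X_k$, $k = 1 \to n$, represents the data. Thus from (16-21), the error $\varepsilon$ is orthogonal to the data $X_k$, $k = 1 \to n$, for the best linear estimator. This is the orthogonality principle.

In other words, in the linear estimator (16-15), the unknown constants $a_1, a_2, \ldots, a_n$ must be selected such that the error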


$$\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$$

is orthogonal to every data sample $X_1, X_2, \ldots, X_n$ for the best linear estimator that minimizes the mean square error.

Interestingly, a general form of the orthogonality principle also holds in the case of nonlinear estimators.

Nonlinear Orthogonality Rule: Let $h(X)$ represent any functional form of the data and $E\{Y \mid X\}$ the best estimator for $Y$ given $X$. With

$$e = Y - E\{Y \mid X\}$$

we shall show that

$$e = Y - E\{Y \mid X\} \perp h(X),$$

implying that

$$E\{e\, h(X)\} = 0. \tag{16-22}$$

This follows since

$$\begin{aligned}
E\{e\, h(X)\} &= E\{(Y - E[Y \mid X])\, h(X)\} \\
&= E\{Y h(X)\} - E\{E[Y \mid X]\, h(X)\} \\
&= E\{Y h(X)\} - E\{E[Y h(X) \mid X]\} \\
&= E\{Y h(X)\} - E\{Y h(X)\} = 0.
\end{aligned}$$
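The nonlinear orthogonality rule can be checked numerically on the density $f_{X,Y}(x, y) = kxy$ of the earlier example, using the conditional mean from (16-14). The sketch below is only a rough midpoint-rule check; the normalization $k = 8$ is derived here from requiring the density to integrate to one, and the test functions $h$ and the grid size `m` are arbitrary choices.

```python
def phi(x):
    """Conditional mean E{Y | X = x} from (16-14)."""
    return (2.0 / 3.0) * (1.0 + x + x * x) / (1.0 + x)

def joint_pdf(x, y, k=8.0):
    """f_{X,Y}(x, y) = k x y on 0 < x < y < 1; k = 8 makes it integrate to 1."""
    return k * x * y if 0.0 < x < y < 1.0 else 0.0

def expect(g, m=400):
    """Midpoint-rule approximation of E{g(X, Y)} over the triangle 0 < x < y < 1."""
    total = 0.0
    dx = 1.0 / m
    for i in range(m):
        x = (i + 0.5) * dx
        dy = (1.0 - x) / m          # inner grid covers y in (x, 1)
        for j in range(m):
            y = x + (j + 0.5) * dy
            total += g(x, y) * joint_pdf(x, y) * dx * dy
    return total

total_prob = expect(lambda x, y: 1.0)

# E{e h(X)} with e = Y - E{Y|X}, for a few arbitrary choices of h.
test_functions = (lambda x: 1.0, lambda x: x, lambda x: x * x, lambda x: x ** 3)
checks = [expect(lambda x, y, h=h: (y - phi(x)) * h(x)) for h in test_functions]

print(total_prob, checks)
```

Each entry of `checks` should be zero up to the discretization error of the grid, whatever function $h$ is used.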


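Thus in the nonlinear version of the orthogonality rule the error is orthogonal to any functional form of the data.

The orthogonality principle in (16-21) can be used to obtain the unknowns $a_1, a_2, \ldots, a_n$ in the linear case.

For example suppose $n = 2$, and we need to estimate $Y$ in terms of $X_1$ and $X_2$ linearly. Thus

$$\hat{Y}_l = a_1 X_1 + a_2 X_2.$$

From (16-21), the orthogonality rule gives

$$\begin{aligned}
E\{\varepsilon X_1^*\} &= E\{(Y - a_1 X_1 - a_2 X_2) X_1^*\} = 0, \\
E\{\varepsilon X_2^*\} &= E\{(Y - a_1 X_1 - a_2 X_2) X_2^*\} = 0.
\end{aligned}$$

Thus

$$\begin{aligned}
E\{|X_1|^2\}\, a_1 + E\{X_2 X_1^*\}\, a_2 &= E\{Y X_1^*\}, \\
E\{X_1 X_2^*\}\, a_1 + E\{|X_2|^2\}\, a_2 &= E\{Y X_2^*\},
\end{aligned}$$

or, in matrix form,

$$\begin{pmatrix} E\{|X_1|^2\} & E\{X_2 X_1^*\} \\ E\{X_1 X_2^*\} & E\{|X_2|^2\} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} E\{Y X_1^*\} \\ E\{Y X_2^*\} \end{pmatrix}.$$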
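To make the $n = 2$ case concrete, the sketch below solves the two orthogonality equations for real, zero-mean variables. The second-order moments are hypothetical values chosen for illustration, and all variable names are ours.

```python
from fractions import Fraction as F

# Hypothetical second-order moments for real X1, X2, Y (illustrative values).
r11, r22 = F(1), F(1)        # E{X1^2}, E{X2^2}
r12 = F(1, 2)                # E{X1 X2}
p1, p2 = F(4, 5), F(3, 5)    # E{Y X1}, E{Y X2}

# Equations obtained from E{eps X1} = 0 and E{eps X2} = 0:
#   a1*r11 + a2*r12 = p1
#   a1*r12 + a2*r22 = p2
det = r11 * r22 - r12 * r12
a1 = (p1 * r22 - p2 * r12) / det
a2 = (p2 * r11 - p1 * r12) / det

# Orthogonality check with eps = Y - a1 X1 - a2 X2.
e_x1 = p1 - a1 * r11 - a2 * r12   # E{eps X1}
e_x2 = p2 - a1 * r12 - a2 * r22   # E{eps X2}

print(a1, a2, e_x1, e_x2)
```

Exact rational arithmetic is used so that the two orthogonality residuals come out as exactly zero rather than as small floating-point numbers.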


$$\sigma_n^2 = E\{\varepsilon Y^*\} = E\Big\{\Big(Y - \sum_{i=1}^{n} a_i X_i\Big) Y^*\Big\} = E\{|Y|^2\} - \sum_{i=1}^{n} a_i E\{X_i Y^*\}, \tag{16-25}$$

where $a_1, a_2, \ldots, a_n$ are the optimum values from (16-21).

Since the linear estimate in (16-15) is only a special case of the general estimator $\varphi(X)$ in (16-1), the best linear estimator that satisfies (16-21) cannot be superior to the best nonlinear estimator $E\{Y \mid X\}$. Often the best linear estimator will be inferior to the best estimator in (16-3).

This raises the following question. Are there situations in which the best estimator in (16-3) also turns out to be linear? In those situations it is enough to use (16-21) and obtain the best linear estimators, since they also represent the best global estimators. Such is the case if $Y$ and $X_1, X_2, \ldots, X_n$ are jointly Gaussian. We summarize this in the next theorem and prove that result.

Theorem 2: If $X_1, X_2, \ldots, X_n$ and $Y$ are jointly Gaussian zero mean random variables, then the best estimate for $Y$ in terms of $X_1, X_2, \ldots, X_n$ is always linear.

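Proof: Let

$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = E\{Y \mid X\} \tag{16-26}$$

represent the best (possibly nonlinear) estimate of $Y$, and

$$\hat{Y}_l = \sum_{i=1}^{n} a_i X_i \tag{16-27}$$

the best linear estimate of $Y$. Then from (16-21)

$$\varepsilon \triangleq Y - \hat{Y}_l = Y - \sum_{i=1}^{n} a_i X_i \tag{16-28}$$

is orthogonal to the data $X_k$, $k = 1 \to n$. Thus

$$E\{\varepsilon X_k^*\} = 0, \qquad k = 1 \to n. \tag{16-29}$$

Also from (16-28),

$$E\{\varepsilon\} = E\{Y\} - \sum_{i=1}^{n} a_i E\{X_i\} = 0. \tag{16-30}$$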


Using (16-29)-(16-30), we get

$$E\{\varepsilon X_k^*\} = E\{\varepsilon\}\, E\{X_k^*\} = 0, \qquad k = 1 \to n. \tag{16-31}$$

From (16-31), we obtain that $\varepsilon$ and $X_k$ are zero mean uncorrelated random variables for $k = 1 \to n$. But $\varepsilon$ itself represents a Gaussian random variable, since from (16-28) it represents a linear combination of a set of jointly Gaussian random variables. Thus $\varepsilon$ and $X$ are jointly Gaussian and uncorrelated random variables. As a result, $\varepsilon$ and $X$ are independent random variables. Thus from their independence

$$E\{\varepsilon \mid X\} = E\{\varepsilon\}. \tag{16-32}$$

But from (16-30), $E\{\varepsilon\} = 0$, and hence from (16-32)

$$E\{\varepsilon \mid X\} = 0. \tag{16-33}$$

Substituting (16-28) into (16-33), we get

$$E\{\varepsilon \mid X\} = E\Big\{Y - \sum_{i=1}^{n} a_i X_i \,\Big|\, X\Big\} = 0$$


or

$$E\{Y \mid X\} = E\Big\{\sum_{i=1}^{n} a_i X_i \,\Big|\, X\Big\} = \sum_{i=1}^{n} a_i X_i = \hat{Y}_l. \tag{16-34}$$

From (16-26), $E\{Y \mid X\} = \varphi(x)$ represents the best possible estimator, and from (16-27), $\sum_{i=1}^{n} a_i X_i$ represents the best linear estimator. Thus the best linear estimator is also the best possible overall estimator in the Gaussian case.

Next we turn our attention to prediction problems using linear estimators.

Linear Prediction

Suppose $X_1, X_2, \ldots, X_n$ are known and $X_{n+1}$ is unknown. Thus $Y = X_{n+1}$, and this represents a one-step prediction problem. If the unknown is $X_{n+k}$, then it represents a $k$-step ahead prediction problem. Returning to the one-step predictor, let $\hat{X}_{n+1}$ represent the best linear predictor. Then


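$$\hat{X}_{n+1} \triangleq -\sum_{i=1}^{n} a_i X_i, \tag{16-35}$$

where the error

$$\begin{aligned}
\varepsilon_n = X_{n+1} - \hat{X}_{n+1} &= X_{n+1} + \sum_{i=1}^{n} a_i X_i \\
&= a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + X_{n+1} \\
&= \sum_{i=1}^{n+1} a_i X_i, \qquad a_{n+1} = 1,
\end{aligned} \tag{16-36}$$

is orthogonal to the data, i.e.,

$$E\{\varepsilon_n X_k^*\} = 0, \qquad k = 1 \to n. \tag{16-37}$$

Using (16-36) in (16-37), we get

$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i E\{X_i X_k^*\} = 0, \qquad k = 1 \to n. \tag{16-38}$$

Suppose $X_i$ represents the sample of a wide sense stationary stochastic process $X(t)$ so that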

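$$E\{X_i X_k^*\} = R(i - k) = r_{i-k} = r_{k-i}^*. \tag{16-39}$$

Thus (16-38) becomes

$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i r_{i-k} = 0, \qquad a_{n+1} = 1, \quad k = 1 \to n. \tag{16-40}$$

Expanding (16-40) for $k = 1, 2, \ldots, n$, we get the following set of linear equations.

$$\begin{aligned}
a_1 r_0 + a_2 r_1 + a_3 r_2 + \cdots + a_n r_{n-1} + r_n &= 0 &&\leftarrow k = 1 \\
a_1 r_1^* + a_2 r_0 + a_3 r_1 + \cdots + a_n r_{n-2} + r_{n-1} &= 0 &&\leftarrow k = 2 \\
&\;\;\vdots \\
a_1 r_{n-1}^* + a_2 r_{n-2}^* + a_3 r_{n-3}^* + \cdots + a_n r_0 + r_1 &= 0 &&\leftarrow k = n.
\end{aligned} \tag{16-41}$$

Similarly using (16-25), the minimum mean square error is given by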


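$$\begin{aligned}
\sigma_n^2 = E\{|\varepsilon|^2\} = E\{\varepsilon Y^*\} &= E\{\varepsilon_n X_{n+1}^*\} \\
&= E\Big\{\Big(\sum_{i=1}^{n+1} a_i X_i\Big) X_{n+1}^*\Big\} = \sum_{i=1}^{n+1} a_i r_{n+1-i}^* \\
&= a_1 r_n^* + a_2 r_{n-1}^* + a_3 r_{n-2}^* + \cdots + a_n r_1^* + r_0.
\end{aligned} \tag{16-42}$$

The $n$ equations in (16-41) together with (16-42) can be represented as

$$\begin{pmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
r_2^* & r_1^* & r_0 & \cdots & r_{n-2} \\
\vdots & & & \ddots & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_0 & r_1 \\
r_n^* & r_{n-1}^* & \cdots & r_1^* & r_0
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix}. \tag{16-43}$$

Let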


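$$T_n = \begin{pmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
\vdots & & & \ddots & \vdots \\
r_n^* & r_{n-1}^* & \cdots & r_1^* & r_0
\end{pmatrix}. \tag{16-44}$$

Notice that $T_n$ is Hermitian Toeplitz and positive definite. Using (16-44), the unknowns in (16-43) can be represented as

$$\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
= T_n^{-1} \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix}
= \sigma_n^2 \begin{pmatrix} \text{Last} \\ \text{column} \\ \text{of} \\ T_n^{-1} \end{pmatrix}. \tag{16-45}$$

Let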


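$$T_n^{-1} = \begin{pmatrix}
T_n^{11} & T_n^{12} & \cdots & T_n^{1,n+1} \\
T_n^{21} & T_n^{22} & \cdots & T_n^{2,n+1} \\
\vdots & & & \vdots \\
T_n^{n+1,1} & T_n^{n+1,2} & \cdots & T_n^{n+1,n+1}
\end{pmatrix}. \tag{16-46}$$

Then from (16-45),

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
= \sigma_n^2 \begin{pmatrix} T_n^{1,n+1} \\ T_n^{2,n+1} \\ \vdots \\ T_n^{n,n+1} \\ T_n^{n+1,n+1} \end{pmatrix}. \tag{16-47}$$

Thus

$$\sigma_n^2 = \frac{1}{T_n^{n+1,n+1}} > 0, \tag{16-48}$$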


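and

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
= \frac{1}{T_n^{n+1,n+1}} \begin{pmatrix} T_n^{1,n+1} \\ T_n^{2,n+1} \\ \vdots \\ T_n^{n,n+1} \\ T_n^{n+1,n+1} \end{pmatrix}. \tag{16-49}$$

Eq. (16-49) represents the best linear predictor coefficients, and they can be evaluated from the last column of $T_n^{-1}$ in (16-45). Using these, the best one-step ahead predictor in (16-35) takes the form

$$\hat{X}_{n+1} = -\frac{1}{T_n^{n+1,n+1}} \sum_{i=1}^{n} T_n^{i,n+1} X_i, \tag{16-50}$$

and from (16-48), the minimum mean square error is given by the reciprocal of the $(n+1, n+1)$ entry of $T_n^{-1}$.

From (16-36), since the one-step linear prediction error

$$\varepsilon_n = X_{n+1} + a_n X_n + a_{n-1} X_{n-1} + \cdots + a_1 X_1, \tag{16-51}$$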


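we can represent (16-51) formally as follows:

$$X_{n+1} \;\longrightarrow\; \boxed{\,1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}\,} \;\longrightarrow\; \varepsilon_n.$$

Thus, let

$$A_n(z) = 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}, \tag{16-52}$$

then from the above figure, we also have the representation

$$\varepsilon_n \;\longrightarrow\; \boxed{\,\frac{1}{A_n(z)}\,} \;\longrightarrow\; X_{n+1}.$$

The filter

$$H(z) = \frac{1}{A_n(z)} = \frac{1}{1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}} \tag{16-53}$$

represents an AR($n$) filter, and this shows that linear prediction leads to an autoregressive (AR) model.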
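The predictor coefficients can be obtained by solving the $n$ equations (16-41) directly, after which (16-42) gives $\sigma_n^2$. The sketch below does this for a real-valued example; the autocorrelation sequence $r_k = \rho^{|k|}$ with $\rho = 1/2$ is an assumed test case, and the helper names `solve` and `one_step_predictor` are ours.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact Fractions."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]       # normalize pivot row
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def one_step_predictor(r):
    """Given real r_0..r_n, return ([a_1..a_n], sigma_n^2) from (16-41)-(16-42)."""
    n = len(r) - 1
    # Row k of (16-41): sum_{i=1}^{n} a_i r_{|i-k|} = -r_{n+1-k}
    A = [[r[abs(i - k)] for i in range(1, n + 1)] for k in range(1, n + 1)]
    b = [-r[n + 1 - k] for k in range(1, n + 1)]
    a = solve(A, b)
    # (16-42) for real data: sigma_n^2 = a_1 r_n + ... + a_n r_1 + r_0
    sigma2 = r[0] + sum(a[i - 1] * r[n + 1 - i] for i in range(1, n + 1))
    return a, sigma2

rho = Fraction(1, 2)
r = [rho ** k for k in range(4)]      # r_0..r_3, so n = 3
a, sigma2 = one_step_predictor(r)
print(a, sigma2)
```

For this particular sequence only the most recent sample contributes: $a_n = -\rho$ and the remaining coefficients vanish, so $\hat{X}_{n+1} = \rho X_n$ and $A_n(z) = 1 - \rho z^{-1}$, a first-order AR model.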


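The polynomial $A_n(z)$ in (16-52)-(16-53) can be simplified using (16-43)-(16-44). To see this, we rewrite $A_n(z)$ as

$$\begin{aligned}
A_n(z) &= a_1 z^{-n} + a_2 z^{-(n-1)} + \cdots + a_{n-1} z^{-2} + a_n z^{-1} + 1 \\
&= [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1] \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
= [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1]\; T_n^{-1} \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix}.
\end{aligned} \tag{16-54}$$

To simplify (16-54), we can make use of the following matrix identity

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} I & -A^{-1}B \\ 0 & I \end{pmatrix} = \begin{pmatrix} A & 0 \\ C & D - CA^{-1}B \end{pmatrix}. \tag{16-55}$$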


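Taking determinants, we get

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\; |D - CA^{-1}B|. \tag{16-56}$$

In particular if $D \equiv 0$ is a scalar (so that $CA^{-1}B$ is a scalar as well), we get

$$CA^{-1}B = -\frac{1}{|A|} \begin{vmatrix} A & B \\ C & 0 \end{vmatrix}. \tag{16-57}$$

Using (16-57) in (16-54), with

$$C = [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1], \qquad A = T_n, \qquad B = [0, 0, \ldots, 0, \sigma_n^2]^T,$$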


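we get

$$A_n(z) = -\frac{1}{|T_n|}
\begin{vmatrix}
 & & & 0 \\
 & T_n & & \vdots \\
 & & & 0 \\
 & & & \sigma_n^2 \\
z^{-n} & \cdots & z^{-1},\; 1 & 0
\end{vmatrix}
= \frac{\sigma_n^2}{|T_n|}
\begin{vmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
\vdots & & & & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_0 & r_1 \\
z^{-n} & z^{-(n-1)} & \cdots & z^{-1} & 1
\end{vmatrix}. \tag{16-58}$$

Referring back to (16-43), using Cramer's rule to solve for $a_{n+1}\,(= 1)$, we get

$$a_{n+1} = \frac{1}{|T_n|}
\begin{vmatrix}
r_0 & \cdots & r_{n-1} & 0 \\
\vdots & & \vdots & \vdots \\
r_{n-1}^* & \cdots & r_0 & 0 \\
r_n^* & \cdots & r_1^* & \sigma_n^2
\end{vmatrix}
= \sigma_n^2\, \frac{|T_{n-1}|}{|T_n|} = 1$$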


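or

$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{16-59}$$

Thus the polynomial (16-58) reduces to

$$A_n(z) = \frac{1}{|T_{n-1}|}
\begin{vmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
\vdots & & & & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_0 & r_1 \\
z^{-n} & z^{-(n-1)} & \cdots & z^{-1} & 1
\end{vmatrix}
= 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}. \tag{16-60}$$

The polynomial $A_n(z)$ in (16-53) can be alternatively represented as in (16-60), and in fact

$$H(z) = \frac{1}{A_n(z)} \sim \mathrm{AR}(n)$$

represents a stable AR filter of order $n$, whose input error signal $\varepsilon_n$ is white noise of constant spectral height equal to $|T_n| / |T_{n-1}|$ and whose output is $X_{n+1}$.

It can be shown that $A_n(z)$ has all its zeros in $|z| < 1$ provided $|T_n| > 0$, thus establishing stability.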

Linear Prediction Error

From (16-59), the mean square error using $n$ samples is given by

$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{16-61}$$

Suppose one more sample from the past is available to evaluate $X_{n+1}$ (i.e., $X_n, X_{n-1}, \ldots, X_1, X_0$ are available). Proceeding as above, the new coefficients and the mean square error $\sigma_{n+1}^2$ can be determined. From (16-59)-(16-61),

$$\sigma_{n+1}^2 = \frac{|T_{n+1}|}{|T_n|}. \tag{16-62}$$
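The determinant ratio in (16-61)-(16-62) is easy to evaluate for small $n$. The sketch below does so with exact arithmetic for the assumed real autocorrelation sequence $r_k = \rho^{|k|}$, $\rho = 1/2$ (an illustrative choice, not from the lecture); `det` and `toeplitz` are helper names introduced here.

```python
from fractions import Fraction as F

def det(M):
    """Determinant by exact Gaussian elimination (Fractions avoid round-off)."""
    M = [list(row) for row in M]
    n = len(M)
    d = F(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return F(0)              # singular matrix
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d                   # a row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return d

def toeplitz(r, n):
    """Real symmetric T_n of (16-44): (n+1) x (n+1) with entries r_{|i-k|}."""
    return [[r[abs(i - k)] for k in range(n + 1)] for i in range(n + 1)]

rho = F(1, 2)
r = [rho ** k for k in range(6)]
dets = [det(toeplitz(r, n)) for n in range(6)]          # |T_0| .. |T_5|
sigma2 = [dets[n] / dets[n - 1] for n in range(1, 6)]   # sigma_1^2 .. sigma_5^2
print(dets, sigma2)
```

For this sequence every ratio equals $1 - \rho^2$, so extra past samples beyond the first bring no further reduction in the error.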


Using another matrix identity it is easy to show that

$$|T_{n+1}| = \frac{|T_n|^2}{|T_{n-1}|}\, \big(1 - |s_{n+1}|^2\big). \tag{16-63}$$

Since $|T_k| > 0$, we must have $(1 - |s_{n+1}|^2) > 0$ or $|s_{n+1}| < 1$ for every $n$.

From (16-63), we have

$$\sigma_{n+1}^2 = \frac{|T_{n+1}|}{|T_n|} = \frac{|T_n|}{|T_{n-1}|}\, \big(1 - |s_{n+1}|^2\big)$$

or

$$\sigma_{n+1}^2 = \sigma_n^2 \big(1 - |s_{n+1}|^2\big) \le \sigma_n^2, \tag{16-64}$$

since $(1 - |s_{n+1}|^2) \le 1$. Thus the mean square error does not increase, and in general decreases, as more and more samples are used from the past in the linear predictor.

In general from (16-64), the mean square errors for the one-step predictor form a monotonic nonincreasing sequence


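$$\sigma_n^2 \ge \sigma_{n+1}^2 \ge \cdots \ge \sigma_k^2 \ge \cdots \to \sigma_\infty^2 \tag{16-65}$$

whose limiting value $\sigma_\infty^2 \ge 0$.

Clearly, $\sigma_\infty^2 \ge 0$ corresponds to the irreducible error in linear prediction using the entire set of past samples, and it is related to the power spectrum of the underlying process $X(nT)$ through the relation

$$\sigma_\infty^2 = \exp\left\{\frac{1}{2\pi} \int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega\right\} \ge 0, \tag{16-66}$$

where $S_{XX}(\omega) \ge 0$ represents the power spectrum of $X(nT)$.

For any finite power process, we have

$$\int_{-\pi}^{+\pi} S_{XX}(\omega)\, d\omega < \infty,$$

and since $S_{XX}(\omega) \ge 0$, $\ln S_{XX}(\omega) \le S_{XX}(\omega)$. Thus

$$\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega \le \int_{-\pi}^{+\pi} S_{XX}(\omega)\, d\omega < \infty. \tag{16-67}$$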
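The relation (16-66) can be evaluated numerically for a given spectrum. The sketch below uses an assumed example, $S_{XX}(\omega) = (1 - \rho^2)/(1 - 2\rho\cos\omega + \rho^2)$ with $\rho = 0.5$, which corresponds to the autocorrelation $r_k = \rho^{|k|}$; the function names and the grid size `m` are our own choices.

```python
import math

def irreducible_error(spectrum, m=4096):
    """Midpoint-rule value of exp{(1/2pi) * integral over (-pi, pi) of ln S(w) dw}."""
    dw = 2.0 * math.pi / m
    log_sum = sum(math.log(spectrum(-math.pi + (i + 0.5) * dw)) for i in range(m))
    return math.exp(log_sum * dw / (2.0 * math.pi))

rho = 0.5

def s_xx(w):
    """Assumed example spectrum whose autocorrelation is r_k = rho**|k|."""
    return (1.0 - rho ** 2) / (1.0 - 2.0 * rho * math.cos(w) + rho ** 2)

sigma_inf2 = irreducible_error(s_xx)
print(sigma_inf2)
```

For this spectrum the result agrees with $1 - \rho^2$, the finite-order error $\sigma_n^2$ that the same autocorrelation sequence gives for every $n \ge 1$.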


Moreover, if the power spectrum is strictly positive at every frequency, i.e.,

$$S_{XX}(\omega) > 0 \quad \text{in } -\pi < \omega < \pi, \tag{16-68}$$

then from (16-66)

$$\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega > -\infty, \tag{16-69}$$

and hence

$$\sigma_\infty^2 = \exp\left\{\frac{1}{2\pi} \int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega\right\} > e^{-\infty} = 0, \tag{16-70}$$

i.e., for processes that satisfy the strict positivity condition in (16-68) almost everywhere in the interval $(-\pi, \pi)$, the final minimum mean square error is strictly positive (see (16-70)). Such processes are not completely predictable even using their entire set of past samples, or they are inherently stochastic, since the next output contains information that is not contained in the past samples. Such processes are known as regular stochastic processes, and their power spectrum is strictly positive.

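[Figure: Power Spectrum of a regular stochastic Process, $S_{XX}(\omega)$ versus $\omega$]

Conversely, if a process has the following power spectrum,

[Figure: $S_{XX}(\omega)$ versus $\omega$, vanishing on the band $\omega_1 < \omega < \omega_2$]

such that $S_{XX}(\omega) = 0$ in $\omega_1 < \omega < \omega_2$, then from (16-70), $\sigma_\infty^2 = 0$.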


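Such processes are completely predictable from their past data samples. In particular

$$X(nT) = \sum_k a_k \cos(\omega_k t + \phi_k) \tag{16-71}$$

is completely predictable from its past samples, since $S_{XX}(\omega)$ consists of a line spectrum.

[Figure: line spectrum, $S_{XX}(\omega)$ versus $\omega$, with lines at the frequencies $\omega_1, \ldots, \omega_k, \ldots$]

$X(nT)$ in (16-71) is a shape deterministic stochastic process.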
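A single spectral line gives the simplest illustration of complete predictability. The sketch below takes one term of (16-71), with hypothetical amplitude, frequency, sampling period and phase, and checks that each sample is reproduced exactly by a two-tap linear combination of the two preceding samples.

```python
import math

# Hypothetical single-line process X(nT) = a*cos(w0*n*T + phase); values are illustrative.
a, w0, T, phase = 1.5, 0.7, 1.0, 0.3
x = [a * math.cos(w0 * n * T + phase) for n in range(50)]

# Two-tap linear predictor from the identity
#   cos((n+1)u + p) = 2*cos(u)*cos(n*u + p) - cos((n-1)u + p),  u = w0*T.
c = 2.0 * math.cos(w0 * T)
errors = [x[n + 1] - (c * x[n] - x[n - 1]) for n in range(1, 49)]
max_error = max(abs(e) for e in errors)
print(max_error)
```

The prediction error is zero up to floating-point round-off and does not depend on the unknown amplitude or phase, only on the line frequency.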
