
OPTIMAL LINEAR FILTER: WIENER FILTER

5.1 Estimation of signal in presence of white Gaussian noise (WGN)

Consider the signal model

X[n] = S[n] + V[n]

where X[n] is the observed signal, S[n] is a zero-mean Gaussian random variable with variance 1 and V[n] is a white Gaussian sequence with mean 0 and variance 1. The problem is to find the best guess for S[n] given the observations X[i], i = 1, 2, ..., n.

Maximum likelihood estimation for S[n] determines that value of S[n] for which the sequence X[i], i = 1, 2, ..., n is the most likely. Let us represent the random samples X[i], i = 1, 2, ..., n by X[n] = [X[n] X[n-1] ... X[1]]' and the particular values x[1], x[2], ..., x[n] by x[n] = [x[n] x[n-1] ... x[1]]'.

The likelihood function f_{X[n]|S[n]}(x[n]|s[n]) will be Gaussian with mean s[n]:

f_{X[n]|S[n]}(x[n]|s[n]) = (1/(2π)^{n/2}) exp( -(1/2) Σ_{i=1}^{n} (x[i] - s[n])² )

The maximum likelihood estimate is given by

∂/∂s[n] f_{X[1],X[2],...,X[n]|S[n]}(x[1], x[2], ..., x[n]|s[n]) |_{s[n] = ŝ_MLE[n]} = 0

which gives

Ŝ_MLE[n] = (1/n) Σ_{i=1}^{n} x[i]

Similarly, to find Ŝ_MAP[n] and Ŝ_MMSE[n] we have to find the a posteriori probability density function (pdf).

[Figure: signal model. The noise V[n] is added to the signal S[n] to produce the observation X[n].]

f_{S[n]|X[n]}(s[n]|x[n]) = f_{S[n]}(s[n]) f_{X[n]|S[n]}(x[n]|s[n]) / f_{X[n]}(x[n])
                         = [1 / ((2π)^{(n+1)/2} f_{X[n]}(x[n]))] exp( -s²[n]/2 - (1/2) Σ_{i=1}^{n} (x[i] - s[n])² )

Taking the logarithm on both sides, we get

log_e f_{S[n]|X[n]}(s[n]|x[n]) = -s²[n]/2 - (1/2) Σ_{i=1}^{n} (x[i] - s[n])² - log_e f_{X[n]}(x[n]) + constant

log_e f_{S[n]|X[n]}(s[n]|x[n]) is maximum at ŝ_MAP[n]. Therefore, taking the partial derivative of log_e f_{S[n]|X[n]}(s[n]|x[n]) with respect to s[n] and equating it to 0, we get

-ŝ_MAP[n] + Σ_{i=1}^{n} (x[i] - ŝ_MAP[n]) = 0

ŝ_MAP[n] = (1/(n+1)) Σ_{i=1}^{n} x[i]

Similarly, the minimum mean-square error estimator is given by

ŝ_MMSE[n] = E(S[n] | X[n]) = (1/(n+1)) Σ_{i=1}^{n} x[i]

For the MMSE estimator we have to know the joint probability structure of the channel and the source, and hence the a posteriori pdf.
Finding the pdf is computationally very expensive and involves nonlinear operations on the samples.
Normally we may have only estimated values of the first-order and second-order statistics of the data.
We therefore look for a simpler estimator.
The answer is optimal filtering or Wiener filtering.
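As a quick numerical illustration of the closed-form estimators above, the short Python sketch below (an illustrative check we have added, not part of the original notes; variable names are arbitrary) draws S[n] and the observations from the stated unit-variance Gaussian model and compares the sample-mean MLE with the MAP/MMSE estimate.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    s = rng.standard_normal()            # S[n]: zero-mean, unit-variance Gaussian
    x = s + rng.standard_normal(n)       # X[i] = S[n] + V[i], V[i] white Gaussian, variance 1

    s_mle = x.mean()                     # (1/n) * sum of x[i]
    s_map = x.sum() / (n + 1)            # (1/(n+1)) * sum of x[i]; also the MMSE estimate here

    print(f"true S = {s:.3f}, MLE = {s_mle:.3f}, MAP/MMSE = {s_map:.3f}")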

We have seen that we can estimate an unknown (desired) signal S[n] from an observed signal X[n] on the basis of the known joint distributions of X[n] and S[n]. We could have used criteria like MMSE or MAP that we applied for parameter estimation. But such estimators are generally nonlinear, require the computation of a posteriori probabilities and involve considerable computational complexity.

The approach taken by Wiener is to specify a form for the estimator that depends on a number of parameters. The minimization of the error then results in the determination of an optimal set of estimator parameters. A mathematically simple and computationally easier estimator is obtained by assuming a linear structure for the estimator.

5.2 Linear Minimum Mean Square Error Estimator

The linear minimum mean square error (LMMSE) criterion is illustrated in the figure below. The estimation problem can be stated as follows:

Given observations represented as random variables X[n-M+1], X[n-M+2], ..., X[n], ..., X[n+N], determine a set of parameters h[-N], h[-N+1], ..., h[0], h[1], ..., h[M-1] such that

Ŝ[n] = Σ_{i=-N}^{M-1} h[i] X[n-i]

and the mean square error E(S[n] - Ŝ[n])² is a minimum with respect to each of h[-N], h[-N+1], ..., h[0], h[1], ..., h[M-1].

This minimization problem results in an elegant solution if we assume joint wide-sense stationarity of the signals S[n] and X[n]. The estimator parameters can be obtained from the second-order statistics of the processes {S[n]} and {X[n]}.

The problem of determining the estimator parameters by the LMMSE criterion is also called the Wiener filtering problem. Three subclasses of the problem are identified:

1. The optimal smoothing problem, N > 0
2. The optimal filtering problem, N = 0
3. The optimal prediction problem, N < 0

[Figure: Wiener filtering setup. The desired signal s[n] passes through a system, noise is added to give the observation x[n], and the filter forms the estimate ŝ[n] from the samples x[n-M+1], ..., x[n], ..., x[n+N].]

In the smoothing problem, an estimate of the signal is made at a location inside the observation window. The filtering problem estimates the current value of the signal on the basis of the present and past observations. The prediction problem addresses the issue of optimal prediction of a future value of the signal on the basis of the present and past observations.

5.3 Wiener-Hopf Equations

The mean-square error of estimation is given by

E e²[n] = E(S[n] - Ŝ[n])²
        = E(S[n] - Σ_{i=-N}^{M-1} h[i] X[n-i])²

We have to minimize E e²[n] with respect to each h[j] to get the optimal estimator. The corresponding minimization is given by

∂E e²[n] / ∂h[j] = 0,  for j = -N, ..., 0, ..., M-1

(E being a linear operator, E and ∂/∂h[j] can be interchanged)

E e[n] X[n-j] = 0,  j = -N, ..., 0, 1, ..., M-1        (1)

or

E (S[n] - Σ_{i=-N}^{M-1} h[i] X[n-i]) X[n-j] = 0,  j = -N, ..., 0, 1, ..., M-1        (2)

R_SX(j) = Σ_{i=-N}^{M-1} h[i] R_X(j-i),  j = -N, ..., 0, 1, ..., M-1        (3)

This set of N+M equations in (3) is called the Wiener-Hopf equations or Normal equations.

The result in (1) is the orthogonality principle, which implies that the estimation error is orthogonal to the observed data.
Ŝ[n] is the projection of S[n] onto the subspace spanned by the observations X[n-M+1], ..., X[n], ..., X[n+N].
The estimation uses second-order statistics, i.e. the autocorrelation and cross-correlation functions.

If {S[n]} and {X[n]} are jointly Gaussian, then the MMSE and the LMMSE estimators are equivalent. Otherwise we get a sub-optimal result.

Also observe that

S[n] = Ŝ[n] + e[n]

where Ŝ[n] and e[n] are the parts of S[n] respectively correlated and uncorrelated with X[n]. Thus the LMMSE estimator separates out that part of S[n] which is correlated with X[n]. Hence the Wiener filter can also be interpreted as a correlation canceller.

5.4 FIR Wiener Filter

Ŝ[n] = Σ_{i=0}^{M-1} h[i] X[n-i]

The filter parameters are given by the orthogonality principle:

E (S[n] - Σ_{i=0}^{M-1} h[i] X[n-i]) X[n-j] = 0,  j = 0, 1, ..., M-1

Σ_{i=0}^{M-1} h[i] R_X(j-i) = R_SX(j),  j = 0, 1, ..., M-1

In matrix form, we have

R_X h = r_SX

where

R_X = [ R_X[0]     R_X[-1]    ....  R_X[1-M]
        R_X[1]     R_X[0]     ....  R_X[2-M]
        ...
        R_X[M-1]   R_X[M-2]   ....  R_X[0]   ]

r_SX = [ R_SX[0]  R_SX[1]  ...  R_SX[M-1] ]'

h = [ h[0]  h[1]  ...  h[M-1] ]'

Therefore,

h = R_X⁻¹ r_SX
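Computationally, the FIR Wiener filter is therefore obtained by forming the (Toeplitz) autocorrelation matrix and solving the normal equations. The sketch below is a minimal illustration (the function name fir_wiener is our own, not from the notes); it assumes the lags R_X[0..M-1] and R_SX[0..M-1] are already known or have been estimated from data.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def fir_wiener(r_x, r_sx):
        """Solve R_X h = r_SX for the M-tap FIR Wiener filter.

        r_x  : autocorrelation of the observation, lags 0..M-1 (first column of R_X)
        r_sx : cross-correlation R_SX[0..M-1] between desired signal and observation
        """
        # R_X is symmetric Toeplitz, so a Levinson-type Toeplitz solver is sufficient
        return solve_toeplitz(np.asarray(r_x), np.asarray(r_sx))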

5.5 Minimum Mean Square Error - FIR Wiener Filter

E e²[n] = E e[n] (S[n] - Σ_{i=0}^{M-1} h[i] X[n-i])
        = E e[n] S[n]                                        (error is orthogonal to data)
        = E (S[n] - Σ_{i=0}^{M-1} h[i] X[n-i]) S[n]
        = R_S[0] - Σ_{i=0}^{M-1} h[i] R_SX[i]

[Figure: FIR Wiener filter structure. The observation x[n] passes through a tapped delay line (z⁻¹ elements) with coefficients h[0], h[1], ..., h[M-1], whose weighted sum is the estimate ŝ[n]; the coefficients satisfy R_X h = r_SX.]

Example 1: Noise Filtering

Consider the case of a carrier signal in presence of white Gaussian noise:

S[n] = A cos(w₀n + φ),  w₀ = π/4
X[n] = S[n] + V[n]

where φ is uniformly distributed in (0, 2π). V[n] is a white Gaussian noise sequence of variance 1 and is independent of S[n]. Find the parameters of the FIR Wiener filter with M = 3.

R_S[m] = (A²/2) cos(w₀m)

R_X[m] = E X[n] X[n-m]
       = E (S[n] + V[n]) (S[n-m] + V[n-m])
       = R_S[m] + R_V[m]
       = (A²/2) cos(w₀m) + δ[m]

R_SX[m] = E S[n] X[n-m]
        = E S[n] (S[n-m] + V[n-m])
        = R_S[m]

Hence the Wiener-Hopf equations are

[ R_X[0]  R_X[1]  R_X[2] ] [ h[0] ]   [ R_SX[0] ]
[ R_X[1]  R_X[0]  R_X[1] ] [ h[1] ] = [ R_SX[1] ]
[ R_X[2]  R_X[1]  R_X[0] ] [ h[2] ]   [ R_SX[2] ]

i.e.

[ A²/2 + 1         (A²/2)cos(π/4)   (A²/2)cos(π/2) ] [ h[0] ]   [ A²/2           ]
[ (A²/2)cos(π/4)   A²/2 + 1         (A²/2)cos(π/4) ] [ h[1] ] = [ (A²/2)cos(π/4) ]
[ (A²/2)cos(π/2)   (A²/2)cos(π/4)   A²/2 + 1       ] [ h[2] ]   [ (A²/2)cos(π/2) ]

Suppose A = 5 V. Then A²/2 = 12.5 and the equations become

[ 13.5      12.5/√2   0       ] [ h[0] ]   [ 12.5    ]
[ 12.5/√2   13.5      12.5/√2 ] [ h[1] ] = [ 12.5/√2 ]
[ 0         12.5/√2   13.5    ] [ h[2] ]   [ 0       ]

[ h[0] ]   [ 13.5      12.5/√2   0       ]⁻¹ [ 12.5    ]
[ h[1] ] = [ 12.5/√2   13.5      12.5/√2 ]   [ 12.5/√2 ]
[ h[2] ]   [ 0         12.5/√2   13.5    ]   [ 0       ]

h[0] ≈ 0.703
h[1] ≈ 0.340
h[2] ≈ -0.223

Plot the filter performance for the above values of h[0], h[1] and h[2]. The following figure shows the performance of the 20-tap FIR Wiener filter for noise filtering.

[Figure: noise-filtering performance of the 20-tap FIR Wiener filter.]
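The sketch below (our own numerical check, not part of the notes) builds the 3x3 Wiener-Hopf system of Example 1 for A = 5, solves it, and applies the resulting 3-tap filter to a simulated noisy carrier to compare the error power before and after filtering.

    import numpy as np

    A, w0, M = 5.0, np.pi / 4, 3
    m = np.arange(M)
    r_s = (A**2 / 2) * np.cos(w0 * m)                  # R_S[m] = R_SX[m]
    r_x = r_s + (m == 0)                               # R_X[m] = R_S[m] + delta[m]
    R_X = np.array([[r_x[abs(i - j)] for j in range(M)] for i in range(M)])
    h = np.linalg.solve(R_X, r_s)                      # Wiener-Hopf: R_X h = r_SX
    print("h =", np.round(h, 3))                       # roughly [0.70, 0.34, -0.22]

    rng = np.random.default_rng(1)
    n = np.arange(5000)
    s = A * np.cos(w0 * n + rng.uniform(0, 2 * np.pi))
    x = s + rng.standard_normal(n.size)
    s_hat = np.convolve(x, h)[: n.size]                # FIR filtering of the observation
    print("error power in :", np.mean((x - s) ** 2))
    print("error power out:", np.mean((s_hat - s) ** 2))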

Example 2: Active Noise Control

Suppose the observed signal X[n] is given by

X[n] = 0.5 cos(w₀n + φ) + V₁[n]

where φ is uniformly distributed in (0, 2π) and V₁[n] = 0.6 V[n-1] + V[n] is an MA(1) noise. We want to control V₁[n] with the help of another correlated noise V₂[n] given by

V₂[n] = 0.8 V[n-1] + V[n]

The filtering scheme is as follows:

[Figure: active noise control scheme. The reference noise v₂[n] is passed through a 2-tap FIR filter and its output is subtracted from the primary observation s[n] + v₁[n].]

The Wiener-Hopf equations are given by

R_V₂ h = r_V₁V₂

where h = [h[0]  h[1]]' and

R_V₂ = [ 1.64   0.8
         0.8    1.64 ]        r_V₁V₂ = [ 1.48
                                         0.6  ]

h[0] = 0.9500
h[1] = -0.0976
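The entries of R_V₂ and r_V₁V₂ follow directly from the two MA(1) models driven by the common unit-variance white noise V[n]. The sketch below (an illustrative check we have added, not from the notes) computes them and solves for the 2-tap canceller.

    import numpy as np

    # V1[n] = 0.6 V[n-1] + V[n],  V2[n] = 0.8 V[n-1] + V[n],  var(V) = 1
    a1, a2 = 0.6, 0.8
    R_V2 = np.array([[1 + a2**2, a2],
                     [a2,        1 + a2**2]])      # [[1.64, 0.8], [0.8, 1.64]]
    r_V1V2 = np.array([1 + a1 * a2, a1])           # [1.48, 0.6]
    h = np.linalg.solve(R_V2, r_V1V2)
    print(np.round(h, 4))                          # approximately [0.95, -0.0976]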

Example 3: (Continuous-time prediction)

Suppose we want to predict the continuous-time process S(t) at time (t + λ) by

Ŝ(t + λ) = a S(t)

Then, by the orthogonality principle,


E (S(t + λ) - a S(t)) S(t) = 0

a = R_SS(λ) / R_SS(0)

As a particular case, consider the first-order Markov process given by

(d/dt) S(t) = -A S(t) + v(t)

In this case,

R_S(τ) = R_S(0) e^{-Aτ},  τ ≥ 0

a = R_S(λ) / R_S(0) = e^{-Aλ}

Observe that for such a process

E (S(t + λ) - a S(t)) S(t - λ₁) = R_S(λ + λ₁) - a R_S(λ₁)
                                = R_S(0) e^{-A(λ+λ₁)} - e^{-Aλ} R_S(0) e^{-Aλ₁}
                                = 0

Therefore, the linear prediction of such a process based on any past value is the same as the linear prediction based on the current value.

5.6 IIR Wiener Filter (Causal)

Consider the IIR filter to estimate the signal S[n] shown in the figure below.

[Figure: the observation X[n] is applied to the causal filter h[n] to produce the estimate Ŝ[n].]

The estimator Ŝ[n] is given by

Ŝ[n] = Σ_{i=0}^{∞} h[i] X[n-i]

The mean-square error of estimation is given by

E e²[n] = E(S[n] - Ŝ[n])²
        = E(S[n] - Σ_{i=0}^{∞} h[i] X[n-i])²

We have to minimize E e²[n] with respect to each h[i] to get the optimal estimator. Applying the orthogonality principle, we get the Wiener-Hopf equations.

E (S[n] - Σ_{i=0}^{∞} h[i] X[n-i]) X[n-j] = 0,  j = 0, 1, ...

From which we get

Σ_{i=0}^{∞} h[i] R_X[j-i] = R_SX[j],  j = 0, 1, ...

We have to find h[i], i = 0, 1, ... by solving the above infinite set of equations.

This problem is better solved in the z-transform domain, though we cannot directly apply the convolution theorem of the z-transform. Here comes Wiener's contribution. The analysis is based on the spectral factorization theorem for a regular random process:

S_X(z) = σ_v² H_c(z) H_c(z⁻¹)

[Figure: causal IIR Wiener filter as a cascade. The observation X[n] passes through the whitening filter H₁(z) to give the innovation sequence V[n], which then passes through the Wiener filter H₂(z) to give Ŝ[n].]

Now h₂[n] is the coefficient sequence of the Wiener filter that estimates S[n] from the innovation sequence V[n]:

Ŝ[n] = Σ_{i=0}^{∞} h₂[i] V[n-i]

Applying the orthogonality principle results in the Wiener-Hopf equations

E (S[n] - Σ_{i=0}^{∞} h₂[i] V[n-i]) V[n-j] = 0

Σ_{i=0}^{∞} h₂[i] R_V[j-i] = R_SV[j],  j = 0, 1, ...

Note that

R_V[m] = σ_v² δ[m]

so that

σ_v² h₂[j] = R_SV[j],  j = 0, 1, ...

and the whitening filter is

H₁(z) = 1 / H_c(z)

h₂[j] = R_SV[j] / σ_v²,  j ≥ 0

H₂(z) = [S_SV(z)]₊ / σ_v²

where [S_SV(z)]₊ is the causal part (i.e., the part containing the non-positive powers of z) in the power series expansion of S_SV(z). To determine S_SV(z), consider

V[n] = Σ_{i=0}^{∞} h₁[i] X[n-i]

R_SV[j] = E S[n] V[n-j]
        = Σ_{i=0}^{∞} h₁[i] E S[n] X[n-j-i]
        = Σ_{i=0}^{∞} h₁[i] R_SX[j+i]

S_SV(z) = H₁(z⁻¹) S_SX(z) = S_SX(z) / H_c(z⁻¹)

Therefore,

H(z) = H₁(z) H₂(z) = (1 / (σ_v² H_c(z))) [ S_SX(z) / H_c(z⁻¹) ]₊

We have to

find the power spectrum of the data and the cross power spectrum of the desired signal and the data from the available model, or estimate them from the data;
factorize the power spectrum of the data using the spectral factorization theorem.

5.7 Mean Square Estimation Error – IIR Filter (Causal)

E e²[n] = E e[n] (S[n] - Σ_{i=0}^{∞} h[i] X[n-i])
        = E e[n] S[n]                                        (error is orthogonal to data)
        = E (S[n] - Σ_{i=0}^{∞} h[i] X[n-i]) S[n]
        = R_S[0] - Σ_{i=0}^{∞} h[i] R_SX[i]
        = (1/2π) ∫_{-π}^{π} ( S_S(ω) - H(ω) S_SX*(ω) ) dω
        = (1/2πj) ∮_C ( S_S(z) - H(z) S_SX(z⁻¹) ) z⁻¹ dz

Example 4:

Consider the observation model

X[n] = S[n] + V₁[n]

with

S[n] = 0.8 S[n-1] + W[n]

where V₁[n] is an additive zero-mean Gaussian white noise with variance 1 and W[n] is a zero-mean white noise with variance 0.68. Signal and noise are uncorrelated. Find the optimal causal Wiener filter to estimate S[n].

Solution:

[Figure: signal model. The white noise W[n] drives the filter 1/(1 - 0.8z⁻¹) to produce S[n].]

S_S(z) = 0.68 / ((1 - 0.8z⁻¹)(1 - 0.8z))

R_X[m] = E X[n] X[n-m]
       = E (S[n] + V₁[n]) (S[n-m] + V₁[n-m])
       = R_S[m] + R_V₁[m]

S_X(z) = S_S(z) + 1

Factorize:

S_X(z) = 0.68 / ((1 - 0.8z⁻¹)(1 - 0.8z)) + 1
       = 2 (1 - 0.4z⁻¹)(1 - 0.4z) / ((1 - 0.8z⁻¹)(1 - 0.8z))

so that

H_c(z) = (1 - 0.4z⁻¹) / (1 - 0.8z⁻¹)   and   σ_v² = 2

Also,

R_SX[m] = E S[n] X[n-m]
        = E S[n] (S[n-m] + V₁[n-m])
        = R_S[m]

S_SX(z) = S_S(z) = 0.68 / ((1 - 0.8z⁻¹)(1 - 0.8z))

Therefore,

H(z) = (1 / (σ_v² H_c(z))) [ S_SX(z) / H_c(z⁻¹) ]₊
     = ((1 - 0.8z⁻¹) / (2(1 - 0.4z⁻¹))) [ 0.68 / ((1 - 0.8z⁻¹)(1 - 0.4z)) ]₊
     = ((1 - 0.8z⁻¹) / (2(1 - 0.4z⁻¹))) (1 / (1 - 0.8z⁻¹))
     = 0.5 / (1 - 0.4z⁻¹)

since 0.68 / ((1 - 0.8z⁻¹)(1 - 0.4z)) = 1/(1 - 0.8z⁻¹) + 0.4z/(1 - 0.4z), whose causal part is 1/(1 - 0.8z⁻¹).

h[n] = 0.5 (0.4)ⁿ,  n ≥ 0
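Since the causal filter above reduces to the first-order recursion Ŝ[n] = 0.4 Ŝ[n-1] + 0.5 X[n], it is easy to check numerically. The sketch below (our own verification code, not part of the notes) simulates the stated AR(1)-plus-noise model and measures the estimation error; under this model the mean-square error R_S[0] - Σ h[i] R_SX[i] works out to about 0.5.

    import numpy as np

    rng = np.random.default_rng(2)
    N = 200_000
    w = rng.normal(scale=np.sqrt(0.68), size=N)    # process noise W[n], variance 0.68
    v = rng.standard_normal(N)                     # observation noise V1[n], variance 1

    s = np.zeros(N)
    for n in range(1, N):
        s[n] = 0.8 * s[n - 1] + w[n]               # S[n] = 0.8 S[n-1] + W[n]
    x = s + v                                      # X[n] = S[n] + V1[n]

    s_hat = np.zeros(N)
    for n in range(1, N):
        s_hat[n] = 0.4 * s_hat[n - 1] + 0.5 * x[n]     # H(z) = 0.5 / (1 - 0.4 z^-1)

    print("empirical MSE:", np.mean((s[1000:] - s_hat[1000:]) ** 2))   # close to 0.5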

5.8 IIR Wiener filter (Noncausal)

The estimator Ŝ[n] is given by

Ŝ[n] = Σ_{i=-∞}^{∞} h[i] X[n-i]

For the LMMSE estimator, the error is orthogonal to the data:

E (S[n] - Σ_{i=-∞}^{∞} h[i] X[n-i]) X[n-j] = 0,  for all j

[Figure: the observation X[n] is applied to the noncausal filter H(z) to produce the estimate Ŝ[n].]

Σ_{i=-∞}^{∞} h[i] R_X[j-i] = R_SX[j],  j = ..., -1, 0, 1, ...

This form of the Wiener-Hopf equation is simple to analyse and is easily solved in the frequency domain, although the resulting filter is not realizable in real time. Taking the z-transform, we get

H(z) S_X(z) = S_SX(z)

so that

H(z) = S_SX(z) / S_X(z)

or

H(ω) = S_SX(ω) / S_X(ω)

5.9 Mean Square Estimation Error – IIR Filter (Noncausal)

The mean square error of estimation is given by

E e²[n] = E e[n] (S[n] - Σ_{i=-∞}^{∞} h[i] X[n-i])
        = E e[n] S[n]                                        (error is orthogonal to data)
        = E (S[n] - Σ_{i=-∞}^{∞} h[i] X[n-i]) S[n]
        = R_S[0] - Σ_{i=-∞}^{∞} h[i] R_SX[i]
        = (1/2π) ∫_{-π}^{π} ( S_S(ω) - H(ω) S_SX*(ω) ) dω
        = (1/2πj) ∮_C ( S_S(z) - H(z) S_SX(z⁻¹) ) z⁻¹ dz

Example 5: Noise filtering by noncausal IIR Wiener Filter

Consider the case of a signal in the presence of white Gaussian noise:

X[n] = S[n] + V[n]

where V[n] is an additive zero-mean Gaussian white noise with variance σ_V². Signal and noise are uncorrelated. Then

S_X(ω) = S_S(ω) + S_V(ω) = S_S(ω) + σ_V²

S_SX(ω) = S_S(ω)

H(ω) = S_SX(ω) / S_X(ω)
     = S_S(ω) / (S_S(ω) + σ_V²)
     = 1 / (1 + σ_V²/S_S(ω))

Suppose the SNR is very high. Then

H(ω) ≈ 1

(i.e. the signal will be passed unattenuated). When the SNR is low,

H(ω) ≈ S_S(ω) / σ_V²

(i.e. if the noise is high, the corresponding signal component will be attenuated in proportion to the estimated SNR).

[Figure: sketch of the signal and noise spectra, the resulting SNR, and the corresponding Wiener filter response H(ω).]

Example 6: Image filtering by IIR Wiener filter

S_XX(ω) = power spectrum of the corrupted image
S_VV(ω) = power spectrum of the noise, estimated from the noise model or from a constant-intensity (like background) region of the image

H(ω) = S_SS(ω) / (S_SS(ω) + S_VV(ω))
     = (S_XX(ω) - S_VV(ω)) / S_XX(ω)
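In practice this formula is applied in the transform domain: estimate S_XX from the corrupted data, estimate S_VV separately, form H, and multiply. The 1-D sketch below is a simplified illustration we have added (the function name wiener_freq is our own; real image filtering would use a 2-D FFT and block-wise spectrum estimates).

    import numpy as np

    def wiener_freq(x, noise_var):
        """Noncausal Wiener filtering in the frequency domain.

        x         : observed (signal + noise) samples
        noise_var : estimated noise power spectrum level (white noise assumed)
        """
        X = np.fft.rfft(x)
        S_xx = np.abs(X) ** 2 / len(x)                 # crude periodogram estimate of S_XX
        # (S_XX - S_VV) / S_XX, clipped at zero so the gain stays non-negative
        H = np.maximum(S_xx - noise_var, 0.0) / np.maximum(S_xx, 1e-12)
        return np.fft.irfft(H * X, n=len(x))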

Example 7:

Consider the signal in the presence of white noise given by

s[n] = 0.8 s[n-1] + w[n]
x[n] = s[n] + v[n]

where v[n] is an additive zero-mean Gaussian white noise with variance 1 and w[n] is a zero-mean white noise with variance 0.68. Signal and noise are uncorrelated. Find the optimal noncausal Wiener filter to estimate s[n].

H(z) = S_SX(z) / S_XX(z)
     = ( 0.68 / ((1 - 0.8z⁻¹)(1 - 0.8z)) ) / ( 2 (1 - 0.4z⁻¹)(1 - 0.4z) / ((1 - 0.8z⁻¹)(1 - 0.8z)) )
     = 0.34 / ((1 - 0.4z⁻¹)(1 - 0.4z))          (one pole inside and one pole outside the unit circle)
     = 0.4048 / (1 - 0.4z⁻¹) + 0.4048 (0.4z) / (1 - 0.4z)

h[n] = 0.4048 (0.4)ⁿ u(n) + 0.4048 (0.4)⁻ⁿ u(-n-1)

[Figure: the two-sided impulse response h[n] of the noncausal filter.]

KALMAN FILTER - P.K. Bora

1. Introduction

To estimate a signal S[n] in the presence of noise:

The FIR Wiener filter is optimum when the data length and the filter length are equal.
The IIR Wiener filter is based on the assumption that an infinite length of data sequence is available.

Neither of the above filters represents the physical situation. We need a filter that adds a tap with each addition of data.

The basic mechanism in the Kalman filter (R.E. Kalman, 1960) is to estimate the signal recursively by the following relation:

Ŝ[n] = A_n Ŝ[n-1] + k_n X[n]

The Kalman filter is also based on the innovation representation of the signal. We used this model to develop the causal IIR Wiener filter.

2. Signal Model

The simplest Kalman filter uses the first-order AR signal model

S[n] = a S[n-1] + W[n]

where W[n] is a white noise sequence. The observed data are given by

X[n] = S[n] + V[n]

where V[n] is another white noise sequence independent of the signal.

The general stationary signal is modeled by a difference equation representing the ARMA(p,q) model. Such a signal can be modeled by the state-space model

S[n] = A S[n-1] + B W[n]        (1)

The observations can also be represented as a linear combination of the states and the observation noise:

X[n] = c' S[n] + V[n]           (2)

[Figure: the signal S[n] is corrupted by additive noise to give X[n]; a linear filter operates on X[n] to produce the estimate Ŝ[n].]

Equations (1) and (2) have a direct relation with the state-space model in control systems, where one has to estimate the 'unobservable' states of the system through an observer that performs well against noise.

Example 1:

Consider the AR(p) model

S[n] = a₁ S[n-1] + a₂ S[n-2] + ... + a_p S[n-p] + W[n]

Then the state-variable model for {S[n]} is given by

S[n] = A S[n-1] + b W[n]

where

S[n] = [S₁[n] S₂[n] ... S_p[n]]',  with S₁[n] = S[n], S₂[n] = S[n-1], ..., S_p[n] = S[n-p+1],

A = [ a₁   a₂   ..   a_{p-1}   a_p
      1    0    ..   0         0
      0    1    ..   0         0
      ..
      0    0    ..   1         0  ]

and b = [1  0  ...  0]'.

Our analysis will include only the simple (scalar) Kalman filter.

3. Innovation Representation

The Kalman filter also uses the innovation representation of the stationary signal, as the IIR Wiener filter does. The innovation representation is shown in the following diagram.

[Figure: innovation generation — the observation X[n] is transformed into the innovation X̃[n].]

Let Ŝ[n|n] denote the LMMSE estimate of S[n] based on the data X[0], X[1], ..., X[n]. Thus,

Ŝ[n|n] = Ê(S[n] | X[0], X[1], ..., X[n])

where Ê is the linear minimum mean-square estimation operator.

Let X̂[n|n-1] be the linear prediction of X[n] based on X[0], X[1], ..., X[n-1]. Define

X̃[n] = X[n] - X̂[n|n-1]

Now

X̂[n|n-1] = Ê(X[n] | X[0], X[1], ..., X[n-1])
          = Ê(S[n] + V[n] | X[0], X[1], ..., X[n-1])
          = Ê(a S[n-1] + W[n] + V[n] | X[0], X[1], ..., X[n-1])
          = a Ê(S[n-1] | X[0], X[1], ..., X[n-1])
          = a Ŝ[n-1|n-1]

Therefore

X̃[n] = X[n] - a Ŝ[n-1|n-1]
      = S[n] + V[n] - a Ŝ[n-1|n-1]
      = a S[n-1] + W[n] + V[n] - a Ŝ[n-1|n-1]
      = a e[n-1] + W[n] + V[n]

Similarly,

X̃[n-j] = X[n-j] - X̂[n-j|n-j-1]
        = a e[n-j-1] + W[n-j] + V[n-j]

It is easy to show that

E X̃[n] X̃[n-j] = 0,  j = 1, 2, ..., n

and

E X̃²[n] = σ_n², which varies with n.

In the above representation X̃[n] is the innovation of X[n] and contains the same information as the original sequence.

Exercise: Show that E X̃[n] X̃[n-j] = 0,  j = 1, 2, ..., n.

4. LMMSE estimation based on the Innovation Sequence

The LMMSE estimate of S[n] based on X[0], X[1], ..., X[n] is the same as the estimate based on the innovation sequence X̃[0], X̃[1], ..., X̃[n-1], X̃[n]. Therefore,

Ŝ[n|n] = Σ_{i=0}^{n} k_i^(n) X̃[i]

where the coefficients k_i^(n) can be obtained by using the orthogonality relation. Note that

e[n] = S[n] - Σ_{i=0}^{n} k_i^(n) X̃[i]

Then,

E (S[n] - Σ_{i=0}^{n} k_i^(n) X̃[i]) X̃[j] = 0,  j = 0, 1, ..., n

so that

k_j^(n) = E S[n] X̃[j] / σ_j²,  j = 0, 1, ..., n

where

σ_j² = E X̃²[j]

Similarly,

Ŝ[n-1|n-1] = Σ_{i=0}^{n-1} k_i^(n-1) X̃[i]

k_j^(n-1) = E S[n-1] X̃[j] / σ_j²,  j = 0, 1, ..., n-1

Now, for j = 0, 1, ..., n-1,

k_j^(n) = E S[n] X̃[j] / σ_j²
        = E (a S[n-1] + W[n]) X̃[j] / σ_j²
        = a E S[n-1] X̃[j] / σ_j²
        = a k_j^(n-1)

Therefore,

Ŝ[n|n] = Σ_{i=0}^{n} k_i^(n) X̃[i]
       = Σ_{i=0}^{n-1} k_i^(n) X̃[i] + k_n^(n) X̃[n]
       = a Σ_{i=0}^{n-1} k_i^(n-1) X̃[i] + k_n^(n) X̃[n]
       = a Ŝ[n-1|n-1] + k_n^(n) (X[n] - Ê(X[n] | X[0], X[1], ..., X[n-1]))
       = a Ŝ[n-1|n-1] + k_n^(n) (X[n] - a Ŝ[n-1|n-1])
       = (1 - k_n^(n)) a Ŝ[n-1|n-1] + k_n^(n) X[n]

where a Ŝ[n-1|n-1] = Ê(X[n] | X[0], X[1], ..., X[n-1]) is the linear prediction of X[n] based on the observations X[0], X[1], ..., X[n-1]. Thus

Ŝ[n|n] = A_n Ŝ[n-1|n-1] + k_n^(n) X[n]

with A_n = (1 - k_n^(n)) a.

Thus the recursive estimator is given by

Ŝ[n|n] = A_n Ŝ[n-1|n-1] + k_n^(n) X[n]

or

Ŝ[n|n] = a Ŝ[n-1|n-1] + k_n^(n) (X[n] - a Ŝ[n-1|n-1])

The filter can be represented by the following diagram.

[Figure: Kalman filter structure. The prediction a Ŝ[n-1] is subtracted from X[n], the difference is scaled by the gain k_n^(n), added back to the prediction, and fed through a unit delay z⁻¹ and the multiplier a to form the next prediction.]

5. Estimation of the filter-parameters

Consider the estimator

Ŝ[n|n] = A_n Ŝ[n-1|n-1] + k_n X[n]

The estimation error is given by

e[n] = S[n] - Ŝ[n|n]

Therefore e[n] must be orthogonal to the past and present observed data:

E e[n] X[n-m] = 0,  m ≥ 0

We want to find A_n and k_n using the above condition. The error e[n] is orthogonal to the current and past data. First consider the condition that e[n] is orthogonal to the current data:

E e[n] X[n] = 0
E e[n] (S[n] + V[n]) = 0
E e[n] S[n] + E e[n] V[n] = 0
E e[n] (Ŝ[n|n] + e[n]) + E e[n] V[n] = 0
E e²[n] + E e[n] V[n] = 0          (since E e[n] Ŝ[n|n] = 0 by orthogonality to the data)

Denoting E e²[n] by P[n] and noting that S[n] and Ŝ[n-1|n-1] are uncorrelated with V[n],

P[n] + E (S[n] - A_n Ŝ[n-1|n-1] - k_n X[n]) V[n] = 0
P[n] - k_n σ_V² = 0

k_n = P[n] / σ_V²

We have to estimate P[n] at every value of n.

Estimation of P[n]

We have

P[n] = E S[n] e[n]                                         (since E e[n] Ŝ[n|n] = 0)
     = E S[n] (S[n] - (1 - k_n) a Ŝ[n-1] - k_n X[n])
     = E S²[n] - (1 - k_n) a E S[n] Ŝ[n-1] - k_n E S[n] X[n]
     = (1 - k_n) σ_S² - (1 - k_n) a E S[n] Ŝ[n-1]          (since E S[n] X[n] = σ_S²)
     = (1 - k_n) (σ_S² - a E (a S[n-1] + W[n]) Ŝ[n-1])
     = (1 - k_n) (σ_S² - a² E S[n-1] Ŝ[n-1])
     = (1 - k_n) (σ_S² - a² E S[n-1] (S[n-1] - e[n-1]))
     = (1 - k_n) (σ_S² - a² σ_S² + a² P[n-1])

Hence, using k_n = P[n]/σ_V²,

P[n] = σ_V² (a² P[n-1] + σ_W²) / (a² P[n-1] + σ_W² + σ_V²)

where we have substituted σ_W² = (1 - a²) σ_S² for an AR(1) process.

We still have to find P[0]. For this assume Ŝ[-1] = S[-1] = 0. Hence, from the relation

P[n] = (1 - k_n) σ_S² - (1 - k_n) a² E S[n-1] Ŝ[n-1]

we get

P[0] = (1 - k_0^(0)) σ_S²

Substituting k_0^(0) = P[0]/σ_V² in the expression for P[0] above, we get

P[0] = σ_S² σ_V² / (σ_S² + σ_V²)

6. Scalar Kalman filter algorithm

Given: the signal model parameters a and σ_W², and the observation noise variance σ_V².

Initialisation: Ŝ[-1] = 0

Step 1: n = 0. Calculate

P[0] = σ_S² σ_V² / (σ_S² + σ_V²)

Step 2: Calculate the gain

k_n^(n) = P[n] / σ_V²

Step 3: Input X[n]. Estimate Ŝ[n] by

Predict:   Ŝ[n|n-1] = a Ŝ[n-1]
Correct:   Ŝ[n] = Ŝ[n|n-1] + k_n^(n) (X[n] - Ŝ[n|n-1])

Step 4: n = n + 1. Calculate

P[n] = σ_V² (a² P[n-1] + σ_W²) / (a² P[n-1] + σ_W² + σ_V²)

Step 5: Go to Step 2.

We have to initialize P[0]. Irrespective of this initialization, k_n^(n) and P[n] converge to their final values.
Considering a to be time varying, the filter can be used to estimate a nonstationary signal.

Example: Given

S[n] = 0.6 S[n-1] + W[n],  n ≥ 0
X[n] = S[n] + V[n],        n ≥ 0
σ_W² = 0.25,  σ_V² = 0.5

Find the expression for the Kalman filter equations at convergence and the corresponding mean square error.

Using

P[n] = σ_V² (a² P[n-1] + σ_W²) / (a² P[n-1] + σ_W² + σ_V²)

and setting P[n] = P[n-1] = P at convergence, we get

P = 0.5 (0.25 + 0.36 P) / (0.25 + 0.5 + 0.36 P)

Solving and taking the positive root,

P ≈ 0.195

(the corresponding a priori prediction error variance is a²P + σ_W² ≈ 0.320). Hence

lim_{n→∞} k_n^(n) = P / σ_V² ≈ 0.195 / 0.5 ≈ 0.39

so at convergence Ŝ[n] ≈ 0.366 Ŝ[n-1] + 0.390 X[n], and the corresponding mean square error is P ≈ 0.195.
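A minimal Python version of the scalar algorithm of Section 6 is sketched below (our own code, with arbitrary variable names). Run on the example above (a = 0.6, σ_W² = 0.25, σ_V² = 0.5), the gain and error variance settle to their converged values within a few steps.

    import numpy as np

    def scalar_kalman(x, a, var_w, var_v):
        """Scalar Kalman filter for S[n] = a S[n-1] + W[n], X[n] = S[n] + V[n]."""
        var_s = var_w / (1 - a**2)                 # stationary signal variance for AR(1)
        P = var_s * var_v / (var_s + var_v)        # P[0]
        s_hat, s_prev = np.zeros(len(x)), 0.0      # initialisation: S_hat[-1] = 0
        for n, xn in enumerate(x):
            k = P / var_v                          # Step 2: gain k_n = P[n]/var_v
            pred = a * s_prev                      # Step 3: predict
            s_hat[n] = pred + k * (xn - pred)      # Step 3: correct
            s_prev = s_hat[n]
            P = var_v * (a**2 * P + var_w) / (a**2 * P + var_w + var_v)   # Step 4
        return s_hat, k, P

    # Example of Section 6: the returned k and P approach roughly 0.39 and 0.195
    # s_hat, k, P = scalar_kalman(x, a=0.6, var_w=0.25, var_v=0.5)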

7. Vector Kalman Filter

Consider the time-varying state-space model representing a nonstationary signal. The state equation is

S[n] = A[n] S[n-1] + W[n]

where S[n] = [S₁[n] S₂[n] ... S_p[n]]' is the state vector, A[n] is a p×p system matrix, and W[n] is the zero-mean Gaussian process noise vector with p×p covariance matrix Q_W.

The observed data X[n] = [X₁[n] X₂[n] ... X_q[n]]' are related to the states by

X[n] = C[n] S[n] + V[n]

where C[n] is a q×p output matrix and V[n] is the zero-mean Gaussian observation noise vector with q×q covariance matrix Q_V.

Denote

Ŝ[n|n]   = LMMSE estimate of S[n] given X[0], X[1], ..., X[n]
Ŝ[n|n-1] = LMMSE estimate of S[n] given X[0], X[1], ..., X[n-1]

The corresponding state estimation errors are

e[n|n]   = S[n] - Ŝ[n|n]
e[n|n-1] = S[n] - Ŝ[n|n-1]

Generalising the scalar recursion, we have

Ŝ[n|n] = Ŝ[n|n-1] + k_n (X[n] - C[n] Ŝ[n|n-1])
       = A[n] Ŝ[n-1|n-1] + k_n (X[n] - C[n] A[n] Ŝ[n-1|n-1])

where k_n is a p×q gain matrix.

The mean-square estimation error of the scalar Kalman filter is now replaced by the error covariance matrix P. The a priori estimate of the error covariance matrix is given by

P[n|n-1] = E e[n|n-1] e'[n|n-1]

and the a posteriori error covariance matrix is

P[n|n] = E e[n|n] e'[n|n]

With these definitions and notations, the vector Kalman filter algorithm is as follows.


7. Vector Kalman filter algorithm

State equation:        S[n] = A[n] S[n-1] + W[n]
Observation equation:  X[n] = C[n] S[n] + V[n]

Given:
(a) the state matrices A[n], n = 0, 1, 2, ... and the process noise covariance matrix Q_W
(b) the observation parameter matrices C[n], n = 0, 1, 2, ... and the observation noise covariance matrix Q_V
(c) the observed data X[n], n = 0, 1, 2, ...

Initialization:

Ŝ[-1|-1] = E S[-1] = 0
P[-1|-1] = E S[-1] S'[-1]

Estimation: for n = 0, 1, 2, ... do

Prediction
  Predict the state:                            Ŝ[n|n-1] = A[n] Ŝ[n-1|n-1]
  Estimate the a priori error covariance:       P[n|n-1] = A[n] P[n-1|n-1] A'[n] + Q_W

Compute the Kalman gain:                        k_n = P[n|n-1] C'[n] (C[n] P[n|n-1] C'[n] + Q_V)⁻¹

Update the state:                               Ŝ[n|n] = Ŝ[n|n-1] + k_n (X[n] - C[n] Ŝ[n|n-1])
Update the a posteriori error covariance:       P[n|n] = (I - k_n C[n]) P[n|n-1]
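The matrix recursions above translate almost line by line into numpy. The sketch below is a minimal implementation we have added (names are our own); it assumes time-invariant A and C for brevity.

    import numpy as np

    def vector_kalman(X, A, C, Q_w, Q_v, s0, P0):
        """Vector Kalman filter: S[n] = A S[n-1] + W[n], X[n] = C S[n] + V[n].

        X  : observations, shape (N, q)
        s0 : initial state estimate S_hat[-1|-1], shape (p,)
        P0 : initial error covariance P[-1|-1], shape (p, p)
        """
        s_hat, P = s0.copy(), P0.copy()
        out = []
        for x in X:
            # Prediction
            s_pred = A @ s_hat
            P_pred = A @ P @ A.T + Q_w
            # Kalman gain
            K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + Q_v)
            # Update
            s_hat = s_pred + K @ (x - C @ s_pred)
            P = (np.eye(len(s_hat)) - K @ C) @ P_pred
            out.append(s_hat)
        return np.array(out)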

Least-squares Estimator

In least-squares (LS) estimation, the observed data are assumed to be a known function of some unknown parameters plus an error which is assumed to be random. Unlike the MVUE, MLE or Bayesian estimators, we do not have an explicit probabilistic model for the data; the randomness is only due to the measurement error.

Consider the simplest example

X_i = θ + e_i

where θ is the unknown parameter and e_i is the observation or measurement noise. According to the LS principle, we have to minimize the sum-square error

J(θ) = Σ_{i=1}^{n} (X_i - θ)²

with respect to θ.

Thus

θ̂_LS = arg min_θ J(θ)

and θ̂_LS is given by

∂J(θ)/∂θ |_{θ = θ̂_LS} = 0

Σ_{i=1}^{n} (X_i - θ̂_LS) = 0

θ̂_LS = (1/n) Σ_{i=1}^{n} X_i

which is the same as the MLE.

General Linear Model

The general linear model for the observed samples X_i, i = 1, 2, ..., n is given by

X = Hθ + e

where we have assumed X = [X₁ X₂ ... X_n]' and θ = [θ₁ θ₂ ... θ_M]'. In general, both the data and the parameters can be multidimensional or single dimensional. We consider the simple case of the signal model

X = Hθ + e

so that the sum-square error is given by

J(θ) = Σ_{i=1}^{n} e_i²
     = Σ_{i=1}^{n} (X_i - Σ_j h_ij θ_j)²
     = (X - Hθ)' (X - Hθ)
     = X'X - θ'H'X - X'Hθ + θ'H'Hθ

Thus θ̂_LS is given by

∂J(θ)/∂θ |_{θ = θ̂_LS} = 0

-2H'X + 2H'H θ̂_LS = 0

H'H θ̂_LS = H'X

The above equation is known as the normal equation. The optimal solution is given by

θ̂_LS = (H'H)⁻¹ H'X

The matrix (H'H)⁻¹H' is known as the pseudo-inverse.

We can also write

X = Σ_{j=1}^{m} h_j θ_j + e

so that

J(θ) = || X - Σ_{j=1}^{m} h_j θ_j ||²

Setting ∂J(θ)/∂θ_j = 0 for j = 1, 2, ..., m gives

h_j' (X - H θ̂_LS) = 0,  j = 1, 2, ..., m        (orthogonality principle)

The error is orthogonal to each column vector of the H matrix.

Geometrical Interpretation

From the orthogonality principle, a geometrical interpretation of the LS estimation can be given. The observed data X is orthogonally projected onto the space spanned by h_j, j = 1, 2, ..., m, and the optimal solution is found in that space. This is illustrated in the figure below.

[Figure: X is projected orthogonally onto the plane spanned by h₁ and h₂; the projection is H θ̂_LS and the error X - H θ̂_LS is perpendicular to that plane.]

Example:

x_i = θ₀ + θ₁ i + e_i,  i = 1, 2, ..., n

X = [x₁ x₂ ... x_n]',   H = [ 1  1
                              1  2
                              .  .
                              1  n ],   θ = [θ₀  θ₁]'

Normal equation:  H'H θ̂_LS = H'X

H'H = [ n                Σ_{i=1}^{n} i
        Σ_{i=1}^{n} i    Σ_{i=1}^{n} i²  ]
    = [ n            n(n+1)/2
        n(n+1)/2     n(n+1)(2n+1)/6 ]

H'X = [ Σ_{i=1}^{n} x_i
        Σ_{i=1}^{n} i x_i ]

Solving the above matrix equation, we can get θ̂₀,LS and θ̂₁,LS.
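For the straight-line model of this example, the normal equation can be solved directly, or equivalently with numpy's least-squares routine. The sketch below is an illustrative check we have added; the true parameter values are assumptions chosen only for the demonstration.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100
    i = np.arange(1, n + 1)
    theta_true = np.array([2.0, 0.5])              # assumed [theta_0, theta_1] for the demo
    x = theta_true[0] + theta_true[1] * i + rng.standard_normal(n)

    H = np.column_stack([np.ones(n), i])           # i-th row: [1, i]
    theta_ls = np.linalg.solve(H.T @ H, H.T @ x)   # normal equation (H'H) theta = H'x
    # equivalently: theta_ls, *rest = np.linalg.lstsq(H, x, rcond=None)
    print(theta_ls)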

Statistical properties of θ̂_LS

θ̂_LS is unbiased. We have

θ̂_LS = (H'H)⁻¹ H'X
      = (H'H)⁻¹ H'(Hθ + e)
      = θ + (H'H)⁻¹ H'e

E θ̂_LS = θ + (H'H)⁻¹ H' E e = θ        (since E e = 0)