A Tutorial on Kalman Filtering
Dr. Wei Dai
Imperial College London (IC)
January 2013
Contents
- MMSE estimator for the general case.
- Linear MMSE estimator.
- Kalman filtering.
MMSE Estimation Problem
Assumptions: let $x$ and $y$ be jointly distributed random variables.

The MMSE (minimum mean-squared error) estimation problem: given an observation $y$, find $\hat{x}$, an estimate of the unobserved $x$, such that
$$E\left[\|x - \hat{x}\|_2^2\right]$$
is minimized.
Theorem. The MMSE estimator is given by $\hat{x} = E[x \mid y]$.

Proof. For any function of $y$, say $g(y)$,
$$E\left[\|x - g(y)\|^2\right] = E_y\left[E_x\left[\|x - g(y)\|^2 \mid y\right]\right].$$
Since
$$x - g(y) = (x - \hat{x}) + (\hat{x} - g(y)),$$
and the cross term vanishes because $E_x[x - \hat{x} \mid y] = 0$, the inner expectation satisfies
$$E_x\left[(x - g(y))^T (x - g(y)) \mid y\right] = E_x\left[(x - \hat{x})^T (x - \hat{x}) \mid y\right] + E_x\left[(\hat{x} - g(y))^T (\hat{x} - g(y)) \mid y\right] \geq E_x\left[(x - \hat{x})^T (x - \hat{x}) \mid y\right]. \qquad \square$$
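As a sanity check on the theorem, here is a minimal Monte Carlo sketch (not from the slides): for a toy jointly Gaussian pair with correlation $\rho$, the conditional mean is $E[x \mid y] = \rho y$, and its empirical MSE is compared against another choice of $g(y)$.

```python
# Minimal Monte Carlo sketch, assuming a toy bivariate Gaussian pair
# (x, y) with correlation rho, for which E[x|y] = rho * y.
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.8, 200_000

# Sample (x, y) jointly Gaussian: zero means, unit variances, correlation rho.
y = rng.standard_normal(n)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(n)

mse_cond_mean = np.mean((x - rho * y) ** 2)  # the MMSE estimator E[x|y]
mse_other = np.mean((x - y) ** 2)            # some other g(y) = y

print(mse_cond_mean)  # ~ 1 - rho^2 = 0.36, the minimum achievable
print(mse_other)      # ~ 0.4: larger, as the theorem guarantees
```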
Example: the Gaussian Case
Model:
$$y = Hx + v,$$
where $x \sim \mathcal{N}(0, \Sigma_x)$ and $v \sim \mathcal{N}(0, \Sigma_v)$.

The estimation problem: find $\hat{x}$ such that $E\left[\|x - \hat{x}\|_2^2\right]$ is minimized.

It turns out that a linear estimator is optimal: $\hat{x} = Ky$ for some $K$.

Remark: when $x \sim \mathcal{N}(\mu, \Sigma_x)$, the estimator becomes $\hat{x} = \mu + K(y - H\mu)$.
MMSE Estimator for the Gaussian Case
MMSE estimator: $\hat{x} = E[x \mid y]$. The posterior $p(x \mid y)$:
$$\begin{aligned}
p(x \mid y) &\propto p(y \mid x)\, p(x) \\
&\propto \exp\left\{-\tfrac{1}{2}(y - Hx)^T \Sigma_v^{-1} (y - Hx)\right\} \exp\left\{-\tfrac{1}{2} x^T \Sigma_x^{-1} x\right\} \\
&\propto \exp\left\{-\tfrac{1}{2} x^T \left(H^T \Sigma_v^{-1} H + \Sigma_x^{-1}\right) x + y^T \Sigma_v^{-1} H x\right\} \\
&= \exp\left\{-\tfrac{1}{2} x^T \Sigma_\varepsilon^{-1} x + y^T \left(\Sigma_v^{-1} H \Sigma_\varepsilon\right) \Sigma_\varepsilon^{-1} x\right\} \\
&\propto \exp\left\{-\tfrac{1}{2} (x - Ky)^T \Sigma_\varepsilon^{-1} (x - Ky)\right\}.
\end{aligned}$$
Hence $p(x \mid y) = \mathcal{N}(Ky, \Sigma_\varepsilon)$, where
$$\Sigma_\varepsilon = \left(H^T \Sigma_v^{-1} H + \Sigma_x^{-1}\right)^{-1}, \qquad K = \Sigma_\varepsilon H^T \Sigma_v^{-1}.$$

MMSE estimator: $\hat{x} = E[x \mid y] = Ky$. It is linear.
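To make these formulas concrete, here is a small numerical sketch; $H$, $\Sigma_x$, $\Sigma_v$, and the observation $y$ are illustrative values, not from the slides.

```python
# Sketch of the Gaussian-case MMSE formulas with illustrative matrices.
import numpy as np

H = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, 1.0]])          # 3 observations of a 2-dim x
Sigma_x = np.diag([1.0, 2.0])       # prior covariance of x
Sigma_v = 0.1 * np.eye(3)           # observation-noise covariance

# Sigma_eps = (H^T Sv^{-1} H + Sx^{-1})^{-1},  K = Sigma_eps H^T Sv^{-1}
Sv_inv = np.linalg.inv(Sigma_v)
Sigma_eps = np.linalg.inv(H.T @ Sv_inv @ H + np.linalg.inv(Sigma_x))
K = Sigma_eps @ H.T @ Sv_inv

y = np.array([1.0, -0.3, 2.2])      # an observed y
x_hat = K @ y                       # MMSE estimate E[x|y]
print(x_hat, Sigma_eps)
```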
Linear MMSE Estimators
Problem: consider linear estimators $\hat{x}_L = Ly$. Find the $L$ that minimizes $E\left[\|x - \hat{x}_L\|_2^2\right]$.

- The general case: the orthogonality property.
- The linear signal model $y = Hx + v$: two other derivations.
Properties of the MMSE Estimator: Orthogonality
Theorem. Consider arbitrary functions $g(y)$. Let $\hat{x} = E_x[x \mid y]$ be the MMSE estimator. Then
$$E\left[(x - \hat{x})\, g^T(y)\right] = 0.$$

Proof:
$$E_x\left[(x - \hat{x})\, g^T(y) \mid y\right] = E_x\left[x\, g^T(y) \mid y\right] - E_x\left[\hat{x}\, g^T(y) \mid y\right] = E_x[x \mid y]\, g^T(y) - \hat{x}\, g^T(y) = 0.$$
Taking the expectation over $y$ gives the result. $\square$

Consequences:
- $g(y) = y$: $E\left[(x - \hat{x})\, y^T\right] = 0$.
- $g(y) = E_x[x \mid y]$: $E\left[(x - \hat{x})\, \hat{x}^T\right] = 0$.
LMMSE Estimators
Theorem. Let $x$ and $y$ be jointly distributed (not necessarily Gaussian). The LMMSE estimator is given by
$$\hat{x}_L = Ky, \qquad K = \Sigma_{xy} \Sigma_{yy}^{-1}.$$

Proof: the matrix $K$ satisfies the Wiener-Hopf equation $E\left[(x - Ky)\, y^T\right] = 0$:
$$E\left[(x - Ky)\, y^T\right] = E\left[x y^T\right] - K\, E\left[y y^T\right] = \Sigma_{xy} - K \Sigma_{yy} = 0.$$
Proof (continued): for any linear estimator $Ly$,
$$x - Ly = (x - Ky) + (Ky - Ly).$$
Then
$$\begin{aligned}
E\left[(x - Ly)^T (x - Ly)\right] &= E\left[(x - Ky)^T (x - Ky)\right] + 2\,\mathrm{tr}\left\{E\left[(x - Ky)\, y^T\right] (K - L)^T\right\} \\
&\quad + E\left[(Ky - Ly)^T (Ky - Ly)\right] \\
&\geq E\left[(x - Ky)^T (x - Ky)\right],
\end{aligned}$$
where the trace term vanishes by the Wiener-Hopf equation. $\square$

Consequence:
$$\Sigma_\varepsilon = E\left[(x - Ky)(x - Ky)^T\right] = \Sigma_x - K \Sigma_{yx}.$$
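The theorem needs no Gaussian assumption, which a short sketch can illustrate using sample covariances with deliberately non-Gaussian data; all matrices and sizes below are illustrative choices.

```python
# LMMSE via K = Sigma_xy Sigma_yy^{-1} with sample covariances; x is
# uniform (non-Gaussian, zero mean) to stress the distribution-free claim.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(-1, 1, size=(n, 2))                 # zero-mean signal
A = np.array([[1.0, 2.0], [0.5, -1.0], [0.0, 1.0]])
y = x @ A.T + 0.1 * rng.standard_normal((n, 3))     # noisy observations

Sigma_xy = (x.T @ y) / n                            # E[x y^T] (zero-mean data)
Sigma_yy = (y.T @ y) / n                            # E[y y^T]
K = Sigma_xy @ np.linalg.inv(Sigma_yy)

x_hat = y @ K.T
print(np.mean((x - x_hat) ** 2))                    # LMMSE residual error

# Wiener-Hopf check: E[(x - K y) y^T] should be ~ 0
print(((x - x_hat).T @ y) / n)
```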
Special Case for LMMSE: Linear Signal Model
$$y = Hx + v.$$
Since $x$ and $v$ are uncorrelated, $\Sigma_{xy} = \Sigma_x H^T$ and $\Sigma_{yy} = H \Sigma_x H^T + \Sigma_v$. From the Wiener-Hopf equation:
$$K = \Sigma_{xy} \Sigma_{yy}^{-1} = \Sigma_x H^T \left(H \Sigma_x H^T + \Sigma_v\right)^{-1},$$
$$\Sigma_\varepsilon = \Sigma_x - K \Sigma_{yx} = \Sigma_x - K H \Sigma_x = (I - KH)\, \Sigma_x.$$
Special Case: Another Derivation

Purely algebraic; minimal preliminaries required.

The optimization problem: find $K$ to minimize $E\left[\|x - Ky\|_2^2\right]$.

Steps:
1. Let $\varepsilon = x - \hat{x}_L = x - Ky = x - K(Hx + v)$. Then
$$E\left[\|x - Ky\|_2^2\right] = E\left[\varepsilon^T \varepsilon\right] = \mathrm{tr}\left(E\left[\varepsilon \varepsilon^T\right]\right) = \mathrm{tr}\left(\Sigma_\varepsilon\right).$$
2. $\Sigma_\varepsilon = E\left[\varepsilon \varepsilon^T\right] = (I - KH)\, \Sigma_x (I - KH)^T + K \Sigma_v K^T$.
3. $\dfrac{\partial\, \mathrm{tr}(\Sigma_\varepsilon)}{\partial K} = 2 (I - KH)\, \Sigma_x \left(-H^T\right) + 2 K \Sigma_v$.
4. Setting it to zero gives
$$K = \Sigma_x H^T \left(H \Sigma_x H^T + \Sigma_v\right)^{-1}.$$
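Step 3 is easy to get wrong, so here is a finite-difference check of the matrix derivative; $H$, $\Sigma_x$, $\Sigma_v$, and the test point $K_0$ are arbitrary illustrative choices.

```python
# Finite-difference check of the matrix derivative of tr(Sigma_eps) w.r.t. K.
import numpy as np

rng = np.random.default_rng(2)
H = rng.standard_normal((3, 2))
Sigma_x = np.diag([1.0, 2.0])
Sigma_v = 0.1 * np.eye(3)
K0 = rng.standard_normal((2, 3))      # arbitrary test point

def trace_sigma_eps(K):
    I = np.eye(2)
    S = (I - K @ H) @ Sigma_x @ (I - K @ H).T + K @ Sigma_v @ K.T
    return np.trace(S)

# Closed form from step 3: 2 (I - K H) Sigma_x (-H^T) + 2 K Sigma_v
grad_closed = 2 * (np.eye(2) - K0 @ H) @ Sigma_x @ (-H.T) + 2 * K0 @ Sigma_v

# Numerical gradient, entry by entry (central differences)
eps = 1e-6
grad_num = np.zeros_like(K0)
for i in range(K0.shape[0]):
    for j in range(K0.shape[1]):
        E = np.zeros_like(K0); E[i, j] = eps
        grad_num[i, j] = (trace_sigma_eps(K0 + E) - trace_sigma_eps(K0 - E)) / (2 * eps)

print(np.max(np.abs(grad_closed - grad_num)))  # ~ 0
```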
Another Special Case: Linear Gaussian Model

Model:
$$y = Hx + v,$$
where $x \sim \mathcal{N}(0, \Sigma_x)$ and $v \sim \mathcal{N}(0, \Sigma_v)$.

Estimators:
- MMSE estimator, $p(x \mid y) = \mathcal{N}(Ky, \Sigma_\varepsilon)$: $K = \Sigma_\varepsilon H^T \Sigma_v^{-1}$, $\Sigma_\varepsilon = \left(H^T \Sigma_v^{-1} H + \Sigma_x^{-1}\right)^{-1}$.
- LMMSE estimator: $K = \Sigma_x H^T \left(H \Sigma_x H^T + \Sigma_v\right)^{-1}$, $\Sigma_\varepsilon = \Sigma_x - K H \Sigma_x$.

Question: are they consistent? Answer: yes.
Proof of Consistency
Key: the Woodbury matrix identity.

Error covariance:
$$\begin{aligned}
\Sigma_\varepsilon &= \left(H^T \Sigma_v^{-1} H + \Sigma_x^{-1}\right)^{-1} \\
&= \Sigma_x - \Sigma_x H^T \left(H \Sigma_x H^T + \Sigma_v\right)^{-1} H \Sigma_x \qquad \text{(Woodbury)} \\
&= \Sigma_x - K H \Sigma_x.
\end{aligned}$$

Gain:
$$\begin{aligned}
K &= \Sigma_x H^T \left(H \Sigma_x H^T + \Sigma_v\right)^{-1} \\
&= \left(\Sigma_\varepsilon \Sigma_\varepsilon^{-1}\right) \Sigma_x H^T \left(H \Sigma_x H^T + \Sigma_v\right)^{-1} \\
&= \Sigma_\varepsilon \left(H^T \Sigma_v^{-1} H + \Sigma_x^{-1}\right) \Sigma_x H^T \left(H \Sigma_x H^T + \Sigma_v\right)^{-1} \\
&= \Sigma_\varepsilon H^T \Sigma_v^{-1} \left(H \Sigma_x H^T + \Sigma_v\right) \left(H \Sigma_x H^T + \Sigma_v\right)^{-1} \\
&= \Sigma_\varepsilon H^T \Sigma_v^{-1}.
\end{aligned}$$
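The same consistency can be confirmed numerically in a few lines; the matrices below are arbitrary illustrative picks, not from the slides.

```python
# Numerical confirmation that the covariance and information forms of
# K and Sigma_eps agree (a consequence of the Woodbury identity).
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((4, 3))
A = rng.standard_normal((3, 3))
Sigma_x = A @ A.T + np.eye(3)        # a symmetric positive-definite prior
Sigma_v = 0.5 * np.eye(4)
Sv_inv = np.linalg.inv(Sigma_v)

# LMMSE / "covariance" form
K1 = Sigma_x @ H.T @ np.linalg.inv(H @ Sigma_x @ H.T + Sigma_v)
Sigma_eps1 = Sigma_x - K1 @ H @ Sigma_x

# MMSE / "information" form
Sigma_eps2 = np.linalg.inv(H.T @ Sv_inv @ H + np.linalg.inv(Sigma_x))
K2 = Sigma_eps2 @ H.T @ Sv_inv

print(np.max(np.abs(K1 - K2)))                  # ~ 0
print(np.max(np.abs(Sigma_eps1 - Sigma_eps2)))  # ~ 0
```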
Summary
$$K = \Sigma_x H^T \left(H \Sigma_x H^T + \Sigma_v\right)^{-1} = \left(H^T \Sigma_v^{-1} H + \Sigma_x^{-1}\right)^{-1} H^T \Sigma_v^{-1} = \Sigma_\varepsilon H^T \Sigma_v^{-1}.$$

$$\hat{x} = \mu + K\, (y - H\mu).$$

$$\Sigma_\varepsilon = (I - KH)\, \Sigma_x (I - KH)^T + K \Sigma_v K^T = (I - KH)\, \Sigma_x = \left(H^T \Sigma_v^{-1} H + \Sigma_x^{-1}\right)^{-1}.$$
The Kalman Filter
Model:
$$x_t = A_t x_{t-1} + u_t, \qquad y_t = B_t x_t + v_t,$$
where $x_0 \sim \mathcal{N}(0, \Sigma_0)$, $u_t \sim \mathcal{N}(0, \Sigma_u)$, and $v_t \sim \mathcal{N}(0, \Sigma_v)$.

Prediction: given $y_1, \cdots, y_t$, find the MMSE estimate $\hat{x}_{t+1|t}$.

Estimation: given $y_1, \cdots, y_t$, find the MMSE estimate $\hat{x}_{t|t}$.
Theorem. Linear combinations of Gaussians are Gaussian: if $x \sim \mathcal{N}(\mu, \Sigma)$, then $y = Ax + b \sim \mathcal{N}\left(A\mu + b,\, A \Sigma A^T\right)$.

Proof: use the moment generating function (MGF):
$$M_x(\tau) = E\left[e^{\tau^T x}\right] = \int e^{\tau^T x} f(x)\, dx.$$
Same distribution $\Leftrightarrow$ same MGF. Since
$$M_x(\tau) = \exp\left(\tau^T \mu + \tfrac{1}{2} \tau^T \Sigma \tau\right)$$
(details on the next slide), and
$$M_y(\tau) = E\left[\exp\left(\tau^T (Ax + b)\right)\right] = \exp\left(\tau^T b\right) E\left[\exp\left(\left(A^T \tau\right)^T x\right)\right] = \exp\left(\tau^T (A\mu + b) + \tfrac{1}{2} \tau^T A \Sigma A^T \tau\right),$$
it follows that $y \sim \mathcal{N}\left(A\mu + b,\, A \Sigma A^T\right)$. $\square$
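An empirical spot check of the theorem (with illustrative $\mu$, $\Sigma$, $A$, $b$): sample $x$, map it through $y = Ax + b$, and compare the sample moments with $A\mu + b$ and $A\Sigma A^T$.

```python
# Empirical check of the moments of a linear transform of a Gaussian.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])
A = np.array([[1.0, 1.0], [0.0, 2.0], [3.0, -1.0]])
b = np.array([0.5, 0.0, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
y = x @ A.T + b

print(y.mean(axis=0) - (A @ mu + b))              # ~ 0
print(np.cov(y, rowvar=False) - A @ Sigma @ A.T)  # ~ 0
```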
MGF of the Gaussian: Details

Proposition. If $x \sim \mathcal{N}(\mu, \Sigma)$, then
$$M_x(\tau) = \exp\left(\tau^T \mu + \tfrac{1}{2} \tau^T \Sigma \tau\right).$$

Proof:
$$M_x(\tau) = \int \frac{1}{|2\pi \Sigma|^{1/2}} \exp\left(-\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) + \tau^T x\right) dx.$$
Completing the square, the exponent becomes
$$-\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) + \tau^T x = -\tfrac{1}{2} (x - \mu - \Sigma \tau)^T \Sigma^{-1} (x - \mu - \Sigma \tau) + \tau^T \mu + \tfrac{1}{2} \tau^T \Sigma \tau.$$
Hence,
$$M_x(\tau) = \int \mathcal{N}(x \mid \mu + \Sigma \tau, \Sigma)\, dx \cdot \exp\left(\tau^T \mu + \tfrac{1}{2} \tau^T \Sigma \tau\right) = \exp\left(\tau^T \mu + \tfrac{1}{2} \tau^T \Sigma \tau\right). \qquad \square$$
Back to the Kalman Filter: t = 1
Prediction:
$$x_1 = A_1 x_0 + u_1, \qquad p(x_1) = \mathcal{N}\left(0,\, A_1 \Sigma_0 A_1^T + \Sigma_u\right) = \mathcal{N}\left(0, \Sigma_{1|0}\right).$$

Estimation:
$$y_1 = B_1 x_1 + v_1, \qquad p(x_1 \mid y_1) \propto p(y_1 \mid x_1)\, p(x_1).$$
Algebra shows that
$$p(x_1 \mid y_1) = \mathcal{N}\left(\mu_{1|1}, \Sigma_{1|1}\right),$$
where
$$\Sigma_{1|1} = \left(B_1^T \Sigma_v^{-1} B_1 + \Sigma_{1|0}^{-1}\right)^{-1}, \qquad \mu_{1|1} = K_1 y_1 = \left(\Sigma_{1|1} B_1^T \Sigma_v^{-1}\right) y_1.$$
The Kalman Filter: t
Prediction:
$$p(x_t \mid y_1, \cdots, y_{t-1}) = p\left(A_t x_{t-1} + u_t \mid y_1, \cdots, y_{t-1}\right) = \mathcal{N}\left(\mu_{t|t-1}, \Sigma_{t|t-1}\right),$$
where
$$\mu_{t|t-1} = A_t \mu_{t-1|t-1}, \qquad \Sigma_{t|t-1} = A_t \Sigma_{t-1|t-1} A_t^T + \Sigma_u.$$

Estimation:
$$p(x_t \mid y_1, \cdots, y_t) = \mathcal{N}\left(\mu_{t|t}, \Sigma_{t|t}\right),$$
where
$$\Sigma_{t|t}^{-1} = B_t^T \Sigma_v^{-1} B_t + \Sigma_{t|t-1}^{-1}, \qquad \mu_{t|t} = \mu_{t|t-1} + K_t \left(y_t - B_t \mu_{t|t-1}\right), \quad K_t = \Sigma_{t|t} B_t^T \Sigma_v^{-1}.$$
Unlike at $t = 1$, the prior mean $\mu_{t|t-1}$ is no longer zero, so the update takes the form $\hat{x} = \mu + K(y - H\mu)$ from the summary slide.
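Putting the recursion together, here is a compact sketch of the filter in the information form used above; the model matrices and dimensions are illustrative choices, not from the slides.

```python
# Compact Kalman filter sketch: simulate the model, then run the
# prediction/estimation recursion in its information form.
import numpy as np

rng = np.random.default_rng(5)
d, m, T = 2, 3, 50
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # A_t, held constant here
B = rng.standard_normal((m, d))          # B_t, held constant here
Sigma0 = np.eye(d)
Sigma_u = 0.01 * np.eye(d)
Sigma_v = 0.1 * np.eye(m)
Sv_inv = np.linalg.inv(Sigma_v)

# Simulate x_t = A x_{t-1} + u_t, y_t = B x_t + v_t while filtering.
x = rng.multivariate_normal(np.zeros(d), Sigma0)
mu, Sigma = np.zeros(d), Sigma0          # posterior at t = 0
for t in range(1, T + 1):
    x = A @ x + rng.multivariate_normal(np.zeros(d), Sigma_u)
    y = B @ x + rng.multivariate_normal(np.zeros(m), Sigma_v)

    # Prediction: mu_{t|t-1}, Sigma_{t|t-1}
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Sigma_u

    # Estimation: Sigma_{t|t}^{-1} = B^T Sv^{-1} B + Sigma_{t|t-1}^{-1}
    Sigma = np.linalg.inv(B.T @ Sv_inv @ B + np.linalg.inv(Sigma_pred))
    K = Sigma @ B.T @ Sv_inv
    mu = mu_pred + K @ (y - B @ mu_pred)

print(mu, x)  # filtered estimate vs. true final state
```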
The Kalman Filter: Summary
Use the posterior: the posterior is always Gaussian, so track the full posterior.
- Recursive linear estimators.
- Recursive covariance matrix computation.
- Globally optimal.

Other derivations, based on the LMMSE derivations:
- Use the orthogonality principle.
- Use the derivative.
- With these, it is difficult to show global optimality.
What’s More
Having discussed the standard Kalman filter:
$$x_t = A_t x_{t-1} + u_t, \qquad y_t = B_t x_t + v_t.$$

Will discuss:

Nonlinear Kalman:
$$x_t = A_t x_{t-1} + u_t, \qquad y_t = f_t(x_t) + v_t,$$
where $f_t$ is nonlinear.

Sparse Kalman:
$$x_t = x_{t-1} + u_t, \qquad y_t = B_t x_t + v_t,$$
where $x_t$ is sparse.