
Bayesian filtering: An introduction

Daniela Calvetti, CWRU & La Sapienza

Roma, January 24, 2019


Dynamic Inverse Problems

The unknown of primary interest is a time-dependent random variable (stochastic process)

\[ X_t, \quad t \ge 0. \]

Observations are made at discrete time instances. Assuming additive noise,

\[ B_j = F(t_j, X_{t_j}) + \varepsilon_j, \quad t_1 < t_2 < \dots \]

The prior information may include an evolution model, e.g., an ordinary or stochastic differential equation.

Inverse problem: Estimate $X_t$ through the posterior model

\[ \pi_t(x_t \mid B_t), \quad B_t = \{ b_1, \dots, b_j \mid t_j \le t < t_{j+1} \}. \]


Discrete time model

Discrete time stochastic process

\[ X_0, X_1, \dots, \quad X_0 \sim \pi_0(x_0), \]

with stochastic evolution model

\[ X_{t+1} = F_t(X_t) + V_{t+1}, \quad V_t = \text{innovation}. \]

Observation model

\[ B_t = G(X_t) + W_t. \]

Posterior density

\[ \pi(X_t \mid D_t), \quad D_t = \{B_1, B_2, \dots, B_t\}. \]


Introductory Example

Description of the model:

A hunter walks in a forest with speed $v \le v_{\max}$.

At $t = 0$, the hunter is at $x = (0, 0)$.

The hunter's dog stays within a distance $d \le d_{\max}$ of the hunter.

The dog is observed at discrete times $t_1 < t_2 < \dots < t_m$.

Problem: Estimate the trajectory of the hunter from the observed positions of the dog.



Particle approach: For each time step $t_j$, generate a sample of possible positions,

\[ S_j = \{x_j^1, x_j^2, \dots, x_j^N\}. \]

Modeling step by step: At $t = t_0 = 0$, we know with certainty that the hunter is at the origin. Therefore,

\[ S_0 = \{x_0^1, x_0^2, \dots, x_0^N\}, \quad x_0^k = (0, 0), \quad 1 \le k \le N. \]

Assign to each particle the same probability,

\[ w_0^k = \frac{1}{N}, \quad 1 \le k \le N. \]



At $t = t_1$, before observing the dog, where is the hunter?

Propagate each particle:

\[ x_1^k = x_0^k + w_1^k, \quad 1 \le k \le N, \]

where $w_1^k$ is a random step. It must satisfy

\[ \|w_1^k\| \le v_{\max}(t_1 - t_0) = \gamma_1. \]

The direction of the step is arbitrary, so we choose a model

\[ W_1 = \gamma_1 S \begin{bmatrix} \cos\Theta \\ \sin\Theta \end{bmatrix}, \quad S \sim \mathrm{Uniform}([0, 1]), \quad \Theta \sim \mathrm{Uniform}([0, 2\pi]). \]



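Now the first observation of the dog arrives. Denote it by $b_1 \in \mathbb{R}^2$.

Likelihood:

\[ \pi_{B_1}(b_1 \mid x) \propto \chi_{D(x, d_{\max})}(b_1) = \begin{cases} 1, & \text{if } \|x - b_1\| \le d_{\max}, \\ 0, & \text{otherwise.} \end{cases} \]

Above,

\[ D(x_1, d_{\max}) = \{x \in \mathbb{R}^2 \mid \|x - x_1\| \le d_{\max}\}. \]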




To each $x_1^k$, assign a weight and normalize:

\[ w_1^k = \pi_{B_1}(b_1 \mid x_1^k), \quad w_1^k \leftarrow \frac{w_1^k}{\sum_{\ell=1}^N w_1^\ell}. \]

The weight $w_1^k$ expresses how likely the observation $b_1$ is if $x_1^k$ is the position of the hunter.



Resampling: From the sample

\[ \{(x_1^1, w_1^1), (x_1^2, w_1^2), \dots, (x_1^N, w_1^N)\}, \]

draw with replacement $N$ new particles. This is the new sample:

\[ S_1 = \{x_1^1, x_1^2, \dots, x_1^N\}. \]

Observe: Typically, some particles $x_1^k$ are repeated several times, while some other particles do not appear at all.



Estimating the hunter's position at $t = t_1$: Set

\[ \bar{x}_1 = \frac{1}{N} \sum_{k=1}^N x_1^k. \]

Now we can repeat the process to find $\bar{x}_2, \bar{x}_3, \dots$
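The steps above (propagate, weight with the indicator likelihood, resample, average) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the author's implementation: the function names, the observation `b1`, and the values chosen for `gamma` and `d_max` are all hypothetical.

```python
import numpy as np

def propagate(particles, gamma, rng):
    """Add a random step W = gamma * S * (cos Theta, sin Theta) to each particle."""
    n = particles.shape[0]
    s = rng.uniform(0.0, 1.0, size=n)                # S ~ Uniform([0, 1])
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)    # Theta ~ Uniform([0, 2*pi])
    step = gamma * s[:, None] * np.column_stack((np.cos(theta), np.sin(theta)))
    return particles + step

def indicator_weights(particles, b, d_max):
    """Weight 1 if the dog at b is within d_max of the particle, 0 otherwise; normalized."""
    w = (np.linalg.norm(particles - b, axis=1) <= d_max).astype(float)
    total = w.sum()
    if total == 0.0:
        raise ValueError("no particle is compatible with the observation")
    return w / total

def resample(particles, weights, rng):
    """Draw N particles with replacement according to the weights."""
    idx = rng.choice(len(particles), size=len(particles), replace=True, p=weights)
    return particles[idx]

def filter_step(particles, b, gamma, d_max, rng):
    """One propagate / weight / resample cycle; returns the new sample and its mean."""
    proposed = propagate(particles, gamma, rng)
    weights = indicator_weights(proposed, b, d_max)
    new_sample = resample(proposed, weights, rng)
    return new_sample, new_sample.mean(axis=0)

rng = np.random.default_rng(0)
N = 2000
S0 = np.zeros((N, 2))            # all particles start at the origin
b1 = np.array([1.0, 0.5])        # hypothetical first observation of the dog
S1, x1_bar = filter_step(S0, b1, gamma=1.0, d_max=0.8, rng=rng)
```

After the step, every surviving particle lies both within `gamma` of the origin and within `d_max` of the observed dog, which is exactly the support of the posterior in this model.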




Gaussian surrogate model: Replace the densities with Gaussian approximations.

At $t = t_1$, before observing the dog, where is the hunter?

Propagate each particle:

\[ x_1^k = x_0^k + w_1^k, \quad 1 \le k \le N, \]

where $w_1^k$ is a realization of $W_1$,

\[ W_1 \sim \mathcal{N}(0, \gamma_1^2 I_2). \]



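The first observation arrives: Use a Gaussian likelihood model,

\[ \pi(b_1 \mid x_1) \propto \exp\left( -\frac{1}{2 d_0^2} \|b_1 - x_1\|^2 \right). \]

Assign likelihoods to proposed particles:

\[ w_1^k = \exp\left( -\frac{1}{2 d_0^2} \|b_1 - x_1^k\|^2 \right), \quad w_1^k \leftarrow \frac{w_1^k}{\sum_{\ell=1}^N w_1^\ell}. \]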



Importance Resampling: From the sample

\[ \{(x_1^1, w_1^1), (x_1^2, w_1^2), \dots, (x_1^N, w_1^N)\}, \]

draw with replacement $N$ new particles. This is the new sample:

\[ S_1 = \{x_1^1, x_1^2, \dots, x_1^N\}. \]

Repeat with $j \leftarrow j + 1$.


Particle visualization

[Figures: five panels of the particle sample at successive stages. Panel legends, in order: (1) Prior; (2) Prior, Propagated prior; (3) Propagated prior, Prior; (4) Likelihood, Propagated prior; (5) Posterior, Likelihood.]


Gaussian Model and Kalman Filter

For the Gaussian approximation, explicit formulas are available:

Recall: $x_1^k$ is a realization of

\[ X_1 = x_0 + W_1, \quad W_1 \sim \mathcal{N}(0, \gamma_1^2 I_2). \]

Therefore,

\[ \pi_{X_1}(x_1) \propto \exp\left( -\frac{1}{2\gamma_1^2} \|x_1 - x_0\|^2 \right). \]

Use this as a prior when the observation comes!



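Likelihood:

\[ \pi_{B_1}(b_1 \mid x_1) \propto \exp\left( -\frac{1}{2 d_0^2} \|b_1 - x_1\|^2 \right). \]

Using Bayes' formula, write the posterior:

\[ \pi(x_1 \mid b_1) \propto \pi_{X_1}(x_1)\, \pi_{B_1}(b_1 \mid x_1), \]

or explicitly,

\[ \pi(x_1 \mid b_1) \propto \exp\left( -\frac{1}{2\gamma_1^2} \|x_1 - x_0\|^2 - \frac{1}{2 d_0^2} \|b_1 - x_1\|^2 \right). \]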



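The quadratic term in the exponent can be written as

\[ \frac{1}{2\gamma_1^2} \|x_1 - x_0\|^2 + \frac{1}{2 d_0^2} \|b_1 - x_1\|^2 = \left( \frac{1}{2\gamma_1^2} + \frac{1}{2 d_0^2} \right) \|x_1\|^2 - 2 \left( \frac{1}{2\gamma_1^2} x_0 + \frac{1}{2 d_0^2} b_1 \right)^T x_1 + \dots \]

Denote

\[ \frac{1}{\gamma_2^2} = \frac{1}{\gamma_1^2} + \frac{1}{d_0^2}, \]

to simplify the expression as

\[ \frac{1}{2\gamma_2^2} \left( \|x_1\|^2 - 2 \left( \frac{\gamma_2^2}{\gamma_1^2} x_0 + \frac{\gamma_2^2}{d_0^2} b_1 \right)^T x_1 + \dots \right). \]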



Conclusion: The posterior density is Gaussian,

\[ \pi_{X_1}(x_1 \mid b_1) \sim \mathcal{N}(\bar{x}_1, \gamma_2^2 I_2), \]

where the mean is

\[ \bar{x}_1 = \frac{\gamma_2^2}{\gamma_1^2} x_0 + \frac{\gamma_2^2}{d_0^2} b_1, \]

a weighted mean of the a priori mean $x_0$ and the position of the dog $b_1$. This is a particular case of Kalman filtering!
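As a sanity check on the closed-form posterior, the following sketch compares the weighted mean formula with an importance-weighted particle estimate under the same Gaussian prior and likelihood. The numerical values of `x0`, `b1`, `gamma1` and `d0`, and the function name, are illustrative assumptions only.

```python
import numpy as np

def gaussian_posterior(x0, b1, gamma1, d0):
    """Closed-form posterior mean and variance gamma_2^2 for the Gaussian hunter model."""
    gamma2_sq = 1.0 / (1.0 / gamma1**2 + 1.0 / d0**2)
    mean = (gamma2_sq / gamma1**2) * x0 + (gamma2_sq / d0**2) * b1
    return mean, gamma2_sq

x0 = np.array([0.0, 0.0])     # known starting point of the hunter
b1 = np.array([1.0, 0.5])     # hypothetical observation of the dog
gamma1, d0 = 1.0, 0.5         # hypothetical prior and likelihood standard deviations

mean, gamma2_sq = gaussian_posterior(x0, b1, gamma1, d0)

# Particle estimate of the same mean: sample the prior, weight by the likelihood.
rng = np.random.default_rng(1)
N = 200_000
particles = x0 + gamma1 * rng.standard_normal((N, 2))
w = np.exp(-np.sum((b1 - particles) ** 2, axis=1) / (2.0 * d0**2))
w /= w.sum()
mc_mean = w @ particles
```

With these values $\gamma_2^2 = 1/(1 + 4) = 0.2$ and the posterior mean is $0.2\,x_0 + 0.8\,b_1 = (0.8, 0.4)$; the particle estimate `mc_mean` agrees up to Monte Carlo error.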


Bayes filtering, basic form

Evolution–observation model:

\[ X_{j+1} = f_j(X_j) + V_{j+1}, \quad j = 0, 1, 2, \dots \]

\[ Y_j = g_j(X_j) + E_j, \quad j = 1, 2, \dots \]

Observations, or data:

\[ Y_j = y_j, \quad j = 1, 2, \dots \]

We assume further that the prior probability density of X0 is given.


Filtering algorithm

The goal is to design an algorithm along the following lines:

Given the density of $X_0$, predict the density of $X_1$ using the prior evolution model,

Using the predicted density of $X_1$ as prior, calculate the posterior density $\pi(x_1 \mid y_1)$,

Using the posterior density $\pi(x_1 \mid y_1)$, predict the density of $X_2$,

Using the predicted density of $X_2$ as prior, calculate the posterior density $\pi(x_2 \mid y_2)$,

Continue similarly.


Kalman filtering

Evolution–observation model:

\[ X_{j+1} = A X_j + V_{j+1}, \quad j = 0, 1, 2, \dots \]

\[ Y_j = B X_j + E_j, \quad j = 1, 2, \dots \]

Assumptions about the noise processes and the initial process:

1 Normality: $V_j \sim \mathcal{N}(0, \Gamma_j)$, $E_j \sim \mathcal{N}(0, \Sigma_j)$.

2 Independence: The variables $V_j$, $E_j$ are all mutually independent.

3 Initial density: $X_0 \sim \mathcal{N}(x_0, D_0)$, and $X_0$ is independent of the noise processes.


Propagation

Observation: To completely specify a Gaussian density, it is enough to know the mean and the covariance.

Assume that

\[ X_j \sim \mathcal{N}(x_j, D_j). \]

Mean: We have

\[ X_{j+1} = A X_j + V_{j+1}, \]

implying that the mean is

\[ x_{j+1} = E\{X_{j+1}\} = A\,E\{X_j\} + E\{V_{j+1}\} = A x_j. \]

Hence: Propagate the mean with $A$.


Propagation

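Covariance: Since

\[ X_{j+1} - x_{j+1} = A(X_j - x_j) + V_{j+1}, \]

by independence,

\[ \begin{aligned}
E\{(X_{j+1} - x_{j+1})(X_{j+1} - x_{j+1})^T\}
&= E\{(A(X_j - x_j) + V_{j+1})(A(X_j - x_j) + V_{j+1})^T\} \\
&= E\{A(X_j - x_j)(X_j - x_j)^T A^T\} + E\{V_{j+1} V_{j+1}^T\} \\
&= A D_j A^T + \Gamma_{j+1}.
\end{aligned} \]

Hence, after propagation,

\[ X_{j+1} \sim \mathcal{N}(A x_j, A D_j A^T + \Gamma_{j+1}). \]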


Correction

To implement the correction step, consider a linear inverse problem,

\[ Y = BX + E, \]

where

\[ X \sim \mathcal{N}(x, D), \quad E \sim \mathcal{N}(0, \Sigma). \]

To find the posterior density $\pi(x \mid y)$ we use the formulas for the Gaussian posterior.


Kalman filtering

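1 Initialize: $j = 0$, $x_0$ and $D_0$ given.

2 Prediction step: Calculate the predicted mean and covariance

\[ \hat{x}_{j+1} = A x_j, \]

\[ \hat{D}_{j+1} = A D_j A^T + \Gamma_{j+1}. \]

3 Updating step: Calculate

\[ x_{j+1} = \hat{x}_{j+1} + \hat{D}_{j+1} B^T (B \hat{D}_{j+1} B^T + \Sigma_{j+1})^{-1} (y_{j+1} - B \hat{x}_{j+1}), \]

\[ D_{j+1} = \hat{D}_{j+1} - \hat{D}_{j+1} B^T (B \hat{D}_{j+1} B^T + \Sigma_{j+1})^{-1} B \hat{D}_{j+1}. \]

4 Increase $j$ by one and repeat from 2.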
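The prediction and updating steps translate almost line by line into NumPy. The sketch below is one possible implementation under the stated linear Gaussian assumptions; the function name `kalman_step` and the test values (taken from the hunter example with hypothetical $\gamma_1 = 1$, $d_0 = 0.5$) are illustrative.

```python
import numpy as np

def kalman_step(x, D, y, A, B, Gamma, Sigma):
    """One prediction + updating step of the Kalman filter.

    x, D   : current mean and covariance
    y      : new observation
    A, B   : evolution and observation matrices
    Gamma  : innovation covariance, Sigma : observation noise covariance
    """
    # Prediction step: propagate mean and covariance.
    x_hat = A @ x
    D_hat = A @ D @ A.T + Gamma
    # Updating step: gain K = D_hat B^T (B D_hat B^T + Sigma)^{-1},
    # computed with a linear solve instead of an explicit inverse
    # (valid because D_hat and the innovation covariance S are symmetric).
    S = B @ D_hat @ B.T + Sigma
    K = np.linalg.solve(S, B @ D_hat).T
    x_new = x_hat + K @ (y - B @ x_hat)
    D_new = D_hat - K @ B @ D_hat
    return x_new, D_new

# Hunter example: A = B = I, exact initial position (D0 = 0).
I2 = np.eye(2)
gamma1, d0 = 1.0, 0.5
x0, D0 = np.zeros(2), np.zeros((2, 2))
b1 = np.array([1.0, 0.5])
x1, D1 = kalman_step(x0, D0, b1, I2, I2, gamma1**2 * I2, d0**2 * I2)
```

For this choice the update reproduces the weighted mean of the introductory example: the posterior mean is $0.8\,b_1 = (0.8, 0.4)$ and the posterior covariance is $\gamma_2^2 I_2 = 0.2\,I_2$.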


Dynamic model

Forward model

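\[ \frac{dx}{dt} = f(t, x, \theta), \quad x(0) = x_0, \tag{1} \]

$x = x(t) \in \mathbb{R}^n$ is the state vector,

$\theta \in \mathbb{R}^k$ is the unknown, or poorly known, parameter vector,

$f : \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^k \to \mathbb{R}^n$ is the model function,

$x_0$ is a possibly unknown, or poorly known, initial value.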


Observation model

Data: discrete noisy observations, which may depend on the parameter vector:

\[ b_j = g(x(t_j), \theta) + n_j, \quad t_1 < t_2 < \dots, \tag{2} \]

$g : \mathbb{R}^n \times \mathbb{R}^k \to \mathbb{R}$ is the observation function,

$n_j$ is the observation noise.

The inverse problem: Estimate the state vector and the parameter vector, $(x(t), \theta)$, based on the observations.


Two motivational problems

The dynamical system of acetate metabolism in the brain, observed through PET scan data:

\[ \frac{dm_1}{dt}(t) = K_1 c(t) - (k_2 + k_3)\, m_1(t), \]

\[ \frac{dm_2}{dt}(t) = k_3 m_1(t) - k_5 m_2(t). \]

Observation: Noisy measurements of

\[ c(t_j), \quad m(t_j) = V_0 c(t_j) + m_1(t_j) + m_2(t_j). \]


[Figure: compartment diagram of the acetate model. Labels: Blood, Precursor, Product; [1-11C]acetate, [1-11C]O2; Astrocyte, Astrocyte+Neuron; pools m1, m2; rate constants K1, k2, k3, k5.]

and a more complex one: the dynamical system of cellular metabolism of skeletal muscles:


[Figure: reaction network of skeletal muscle cellular metabolism. Metabolite labels: Glu, GA3P, BPG, G6P, Pyr, Ala, Lac, ACoA, Gly, PCr, Cr, TGL, GLC, FFA, FAC, OAA, Mal, Suc, SCoA, AKG, Cit, O2, H2O, CO2. Cofactor labels on the reactions: ATP, ADP, AMP, NADH, NAD.]


The discrete time Markov models framework

Evolution model:

\[ X_{j+1} = F(X_j, \theta) + V_{j+1}, \]

$F$ is a known propagation model,

$V_{j+1}$ is an innovation process,

$\theta$ is a parameter: assumed known now, later to be estimated.

The observation model:

\[ Y_j = G(X_j) + W_j, \]

with the observation noise $W_j$ independent of $X_j$.

Update scheme for posterior densities given accumulated data:

\[ \pi(x_j \mid D_j) \longrightarrow \pi(x_{j+1} \mid D_j) \longrightarrow \pi(x_{j+1} \mid D_{j+1}). \]


Bayesian filtering

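1 Propagation step: Chapman-Kolmogorov formula

\[ \begin{aligned}
\pi(x_{j+1} \mid D_j) &= \int \pi(x_{j+1} \mid x_j, D_j)\, \pi(x_j \mid D_j)\, dx_j \\
&= \int \pi(x_{j+1} \mid x_j)\, \pi(x_j \mid D_j)\, dx_j.
\end{aligned} \]

2 Analysis step: Bayes' formula conditional on $D_j$

\[ \begin{aligned}
\pi(x_{j+1} \mid D_{j+1}) &= \pi(x_{j+1} \mid y_{j+1}, D_j) \\
&\propto \pi(y_{j+1} \mid x_{j+1}, D_j)\, \pi(x_{j+1} \mid D_j) \\
&= \pi(y_{j+1} \mid x_{j+1})\, \pi(x_{j+1} \mid D_j).
\end{aligned} \]

Combining:

\[ \pi(x_{j+1} \mid D_{j+1}) \propto \pi(y_{j+1} \mid x_{j+1}) \int \pi(x_{j+1} \mid x_j)\, \pi(x_j \mid D_j)\, dx_j. \]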


Bayesian filtering

Assume that the current distribution is represented in terms of a sample

\[ S_j = \{(x_j^1, w_j^1), (x_j^2, w_j^2), \dots, (x_j^N, w_j^N)\}. \]

With Monte Carlo integration, the update formula

\[ \pi(x_{j+1} \mid D_{j+1}) \propto \pi(y_{j+1} \mid x_{j+1}) \int \pi(x_{j+1} \mid x_j)\, \pi(x_j \mid D_j)\, dx_j \]

can be written in the particle version

\[ \pi(x_{j+1} \mid D_{j+1}) \propto \pi(y_{j+1} \mid x_{j+1}) \sum_{n=1}^N w_j^n\, \pi(x_{j+1} \mid x_j^n). \]


Sampling Importance Resampling (SIR)

Layered sampling: For $n = 1, 2, \dots, N$,

1 Draw a candidate particle $x_{j+1}^n$ from $\pi(x_{j+1} \mid x_j^n)$;

2 Compute the relative likelihood $g_{j+1}^n = \pi(y_{j+1} \mid x_{j+1}^n)$;

3 Resample with replacement from

\[ \{(x_{j+1}^1, w_{j+1}^1), (x_{j+1}^2, w_{j+1}^2), \dots, (x_{j+1}^N, w_{j+1}^N)\}, \quad w_{j+1}^n = \frac{g_{j+1}^n}{\sum_{m=1}^N g_{j+1}^m}. \]

Data thinning:

Most particles $x_{j+1}^n$ may have vanishingly small likelihood.

Few candidate particles are sampled over and over: The new sample consists mostly of copies of a few candidate particles.

The density is poorly sampled.
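The thinning effect is easy to reproduce numerically. The sketch below runs one SIR step for a scalar random-walk model with a sharp Gaussian likelihood and reports the effective sample size $1/\sum_n (w^n)^2$, a standard diagnostic that is not part of the slides. The model, the noise levels and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def sir_step(particles, y, propagate, likelihood, rng):
    """One Sampling Importance Resampling step.

    Returns the resampled particles, the normalized weights of the candidates,
    the effective sample size, and the number of distinct surviving candidates.
    """
    candidates = propagate(particles, rng)           # 1. draw from pi(x_{j+1} | x_j^n)
    g = likelihood(y, candidates)                    # 2. relative likelihoods
    w = g / g.sum()                                  #    normalized weights
    ess = 1.0 / np.sum(w ** 2)                       #    effective sample size
    idx = rng.choice(len(w), size=len(w), replace=True, p=w)   # 3. resample
    return candidates[idx], w, ess, np.unique(idx).size

def random_walk(x, rng):
    """Hypothetical evolution model: X_{j+1} = X_j + V, V ~ N(0, 1)."""
    return x + rng.standard_normal(x.shape)

def make_gaussian_likelihood(sigma):
    """Hypothetical observation model: Y = X + W, W ~ N(0, sigma^2)."""
    def likelihood(y, x):
        return np.exp(-0.5 * ((y - x) / sigma) ** 2)
    return likelihood

rng = np.random.default_rng(2)
N = 5000
particles = np.zeros(N)
y = 2.0                                              # observation in the tail of the prediction
new_particles, w, ess, n_unique = sir_step(
    particles, y, random_walk, make_gaussian_likelihood(0.05), rng
)
```

Because the likelihood is much narrower than the predicted sample, `ess` and `n_unique` come out far below `N`: most candidates carry negligible weight and the new sample is made of copies of a few of them.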


Improvement: Auxiliary particles

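Before resampling, calculate an auxiliary predictor:

\[ \hat{x}_{j+1}^n = F(x_j^n). \]

We write

\[ \pi(x_{j+1} \mid D_{j+1}) \propto \sum_{n=1}^N w_j^n \underbrace{\pi(y_{j+1} \mid \hat{x}_{j+1}^n)}_{= g_{j+1}^n} \frac{\pi(y_{j+1} \mid x_{j+1})}{\pi(y_{j+1} \mid \hat{x}_{j+1}^n)}\, \pi(x_{j+1} \mid x_j^n). \]

The quantity $g_{j+1}^n$ is a predictor of how well the auxiliary particle $\hat{x}_{j+1}^n$ would explain the data.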


Survival of the Fittest (SOF)

Given the initial probability density $\pi_0(x_0)$,

1 Initialization: Draw the particle ensemble from $\pi_0(x_0)$:

\[ S_0 = \{(x_0^1, w_0^1), (x_0^2, w_0^2), \dots, (x_0^N, w_0^N)\}, \quad w_0^1 = w_0^2 = \dots = w_0^N = \frac{1}{N}. \]

Set $j = 0$.

2 Propagation: Compute the predictor:

\[ \hat{x}_{j+1}^n = F(x_j^n), \quad 1 \le n \le N. \]



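3 Survival of the fittest: For each $n$:

(a) Compute the fitness weights

\[ g_{j+1}^n = w_j^n\, \pi(y_{j+1} \mid \hat{x}_{j+1}^n), \quad g_{j+1}^n \leftarrow \frac{g_{j+1}^n}{\sum_{m} g_{j+1}^m}; \]

(b) Draw indices with replacement $\ell_n \in \{1, 2, \dots, N\}$ using probabilities $P\{\ell_n = k\} = g_{j+1}^k$;

(c) Reshuffle

\[ x_j^n \leftarrow x_j^{\ell_n}, \quad \hat{x}_{j+1}^n \leftarrow \hat{x}_{j+1}^{\ell_n}, \quad 1 \le n \le N. \]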



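4 Innovation: For each $n$: Proliferate

\[ x_{j+1}^n = \hat{x}_{j+1}^n + v_{j+1}^n. \]

5 Weight updating: For each $n$, compute

\[ w_{j+1}^n = \frac{\pi(y_{j+1} \mid x_{j+1}^n)}{\pi(y_{j+1} \mid \hat{x}_{j+1}^n)}, \quad w_{j+1}^n \leftarrow \frac{w_{j+1}^n}{\sum_{m} w_{j+1}^m}. \]

6 If $j < T$, increase $j \leftarrow j + 1$ and repeat from 2.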
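Steps 2 to 5 of the survival-of-the-fittest scheme can be sketched as a single function. This is a minimal illustration for a scalar state with Gaussian innovation, not the author's code: the model `F`, the noise levels, the synthetic data and every name in the block are hypothetical.

```python
import numpy as np

def sof_step(x, w, y, F, likelihood, innovation_std, rng):
    """One survival-of-the-fittest step (propagation through weight updating)."""
    N = len(w)
    x_pred = F(x)                                    # 2. predictor for each particle
    g = w * likelihood(y, x_pred)                    # 3(a) fitness weights
    g = g / g.sum()
    idx = rng.choice(N, size=N, replace=True, p=g)   # 3(b) draw indices
    x_pred = x_pred[idx]                             # 3(c) reshuffle the predictors
    x_new = x_pred + innovation_std * rng.standard_normal(x_pred.shape)  # 4. innovation
    w_new = likelihood(y, x_new) / likelihood(y, x_pred)                  # 5. weight update
    return x_new, w_new / w_new.sum()

# Hypothetical scalar test model: X_{j+1} = 0.9 X_j + V, Y_j = X_j + W.
sigma_obs, sigma_innov = 0.2, 0.3

def F(x):
    return 0.9 * x

def likelihood(y, x):
    return np.exp(-0.5 * ((y - x) / sigma_obs) ** 2)

rng = np.random.default_rng(4)
N, T = 1000, 20
x_true = 1.0
x = rng.standard_normal(N)                           # ensemble drawn from pi_0 = N(0, 1)
w = np.full(N, 1.0 / N)
for _ in range(T):
    x_true = F(x_true) + sigma_innov * rng.standard_normal()   # synthetic truth
    y = x_true + sigma_obs * rng.standard_normal()             # synthetic datum
    x, w = sof_step(x, w, y, F, likelihood, sigma_innov, rng)
estimate = w @ x                                     # weighted posterior mean
```

Resampling on the fitness of the predictor before adding the innovation concentrates the ensemble on particles that are likely to explain the datum, and the ratio in step 5 corrects for having used the predictor instead of the final particle.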


Estimating parameters: Sequential Monte Carlo

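For the discrete time model, the propagation (and possibly the likelihood) depends on the unknown $\theta$,

\[ x_{j+1} = F(x_j, \theta). \]

Monte Carlo integral for posterior update:

\[ \pi(x_{j+1}, \theta \mid D_{j+1}) \propto \pi(y_{j+1} \mid x_{j+1}, \theta) \times \int \pi(x_{j+1} \mid x_j, \theta)\, \pi(x_j \mid \theta, D_j)\, \pi(\theta \mid D_j)\, dx_j. \]

Sample update:

\[ S_j \to S_{j+1}, \quad S_j = \{(x_j^n, \theta_j^n, w_j^n)\}_{n=1}^N, \]

where $S_j$ is drawn from $\pi(x_j, \theta \mid D_j)$.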


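Denote

\[ \bar{\theta}_j = \sum_{n=1}^N w_j^n \theta_j^n, \quad C_j = \sum_{n=1}^N w_j^n (\theta_j^n - \bar{\theta}_j)(\theta_j^n - \bar{\theta}_j)^T. \]

Approximate the marginal probability density $\pi(\theta \mid D_j)$ of $\theta$ by a Gaussian mixture model,

\[ \pi(\theta \mid D_j) \approx \sum_{n=1}^N w_j^n\, \mathcal{N}(\theta \mid \hat{\theta}_j^n, s^2 C_j), \]

for which we define the auxiliary particle by

\[ \hat{\theta}_j^n = a \theta_j^n + (1 - a) \bar{\theta}_j, \]

where $a$ is a shrinkage factor, $0 < a < 1$, and $a^2 + s^2 = 1$ to avoid artificial diffusion.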
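The role of the condition $a^2 + s^2 = 1$ can be verified directly: the mixture has mean $\sum_n w^n \hat\theta^n$ and covariance $s^2 C + \sum_n w^n (\hat\theta^n - \bar\theta)(\hat\theta^n - \bar\theta)^T = (s^2 + a^2)\, C$, so the shrinkage leaves both moments of the sample unchanged. The sketch below checks this numerically on a random weighted sample; the sample, the value `a = 0.98` and the function names are illustrative assumptions.

```python
import numpy as np

def weighted_moments(theta, w):
    """Weighted mean and covariance of a sample with normalized weights w."""
    mean = w @ theta
    centered = theta - mean
    cov = (w[:, None] * centered).T @ centered
    return mean, cov

def shrink(theta, w, a):
    """Shrunk mixture centers a*theta + (1 - a)*mean and kernel covariance s^2 C."""
    mean, cov = weighted_moments(theta, w)
    theta_hat = a * theta + (1.0 - a) * mean
    s2 = 1.0 - a**2                                  # enforces a^2 + s^2 = 1
    return theta_hat, s2 * cov

rng = np.random.default_rng(3)
theta = rng.normal(size=(500, 2))                    # hypothetical parameter sample
w = rng.random(500)
w /= w.sum()
a = 0.98                                             # hypothetical shrinkage factor

mean, cov = weighted_moments(theta, w)
theta_hat, kernel_cov = shrink(theta, w, a)

# Moments of the Gaussian mixture sum_n w_n N(theta_hat_n, s^2 C):
mix_mean, centers_cov = weighted_moments(theta_hat, w)
mix_cov = centers_cov + kernel_cov
```

Without the shrinkage (centers at $\theta_j^n$ themselves) the mixture covariance would be $(1 + s^2)\, C_j$, which is the artificial diffusion the condition is designed to avoid.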


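Approximate

\[ \pi(x_{j+1}, \theta \mid D_{j+1}) \propto \sum_{n=1}^N w_j^n\, \pi(y_{j+1} \mid x_{j+1}, \theta)\, \pi(x_{j+1} \mid x_j^n, \theta)\, \mathcal{N}(\theta \mid \hat{\theta}_j^n, s^2 C_j), \]

which we write as

\[ \pi(x_{j+1}, \theta \mid D_{j+1}) \propto \sum_{n=1}^N w_j^n \underbrace{\pi(y_{j+1} \mid \hat{x}_{j+1}^n, \hat{\theta}_j^n)}_{= g_{j+1}^n} \times \frac{\pi(y_{j+1} \mid x_{j+1}, \theta)}{\pi(y_{j+1} \mid \hat{x}_{j+1}^n, \hat{\theta}_j^n)}\, \pi(x_{j+1} \mid x_j^n, \theta)\, \mathcal{N}(\theta \mid \hat{\theta}_j^n, s^2 C_j), \]

where the coefficient $g_{j+1}^n$ is the fitness of the predictor

\[ (\hat{x}_{j+1}^n, \hat{\theta}_j^n) = (F(x_j^n, \hat{\theta}_j^n), \hat{\theta}_j^n). \]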


Propagation and innovation

The problem we are addressing assumes

\[ \frac{dx}{dt} = f(t, x, \theta), \quad x(0) = x_0, \]

while the discrete propagation is written as

\[ x_{j+1} = F(x_j, \theta) + v_{j+1}. \]

Questions:

1 How do we propagate?

2 What is the innovation?


Stiffness and synchronization

For systems which are inherently stiff, we use a good stiff solver:

\[ x_{j+1} = F^{\mathrm{exact}}(x_j, \theta) = F(x_j, \theta) + \text{approximation error}, \]

where the approximation error is due to numerical integration.

If the stiffness of the system varies a lot with the parameter values, prescribing a fixed accuracy may be a problem because

The time needed to propagate the particles may vary widely;

The slowest particle determines the propagation speed;

We cannot take full advantage of parallel and vectorized computing environments.


Prescribe time, not accuracy

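Thus, to improve the performance of the algorithm we

Propagate each particle with a fixed propagation time.

Estimate the numerical accuracy for each particle.

Set the innovation variance of each particle proportional to its integration error.

This yields the innovation model

\[ V_{j+1} \sim \mathcal{N}(0, \Gamma_{j+1}), \]

with the covariance matrix

\[ \Gamma_{j+1} = \mathrm{diag}(\gamma) + \varepsilon I, \quad \gamma_i = \tau^2 (\hat{u}_{j+1} - \tilde{u}_{j+1})_i^2, \quad 1 \le i \le d, \]

where $\hat{u}_{j+1}$ and $\tilde{u}_{j+1}$ are two numerical approximations of the propagated state whose difference estimates the integration error, and $\tau > 1$.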
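To make the construction of $\Gamma_{j+1}$ concrete, the sketch below estimates the integration error as the difference between two one-step integrators of different order and turns it into a diagonal innovation covariance. The choice of explicit Euler and Heun steps is an assumption made purely for illustration (the slides use a stiff solver and do not specify the pair here), and the test equation, `tau`, `eps` and all function names are hypothetical.

```python
import numpy as np

def euler_step(f, t, x, h, theta):
    """Explicit Euler step (first order): the lower-accuracy approximation."""
    return x + h * f(t, x, theta)

def heun_step(f, t, x, h, theta):
    """Heun step (second order): the higher-accuracy approximation."""
    k1 = f(t, x, theta)
    k2 = f(t + h, x + h * k1, theta)
    return x + 0.5 * h * (k1 + k2)

def innovation_covariance(f, t, x, h, theta, tau=1.5, eps=1e-10):
    """Predictor and Gamma = diag(tau^2 * (difference)^2) + eps * I for one particle."""
    u_low = euler_step(f, t, x, h, theta)
    u_high = heun_step(f, t, x, h, theta)
    gamma = tau**2 * (u_high - u_low) ** 2           # componentwise squared error estimate
    return u_high, np.diag(gamma) + eps * np.eye(len(x))

def decay(t, x, theta):
    """Hypothetical test system dx/dt = -theta * x."""
    return -theta * x

x = np.array([1.0, 2.0])
x_pred, Gamma = innovation_covariance(decay, 0.0, x, h=0.1, theta=3.0)
```

For this test system the Euler step gives $(0.7, 1.4)$ and the Heun step $(0.745, 1.49)$, so the error estimate is $(0.045, 0.09)$ and the diagonal of `Gamma` is $2.25 \cdot (0.045^2, 0.09^2)$ plus the small regularization `eps`. Particles whose propagation is less accurate thus receive a larger innovation, at a fixed cost per particle.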


SOF with error estimate innovation

Given the initial probability density $\pi_0(x_0)$,

1 Initialization: Draw the particle ensemble from $\pi_0(x_0)$:

\[ S_0 = \{(x_0^1, w_0^1), (x_0^2, w_0^2), \dots, (x_0^N, w_0^N)\}, \quad w_0^1 = w_0^2 = \dots = w_0^N = \frac{1}{N}. \]

Set $j = 0$.

2 Propagation: Compute the predictor using a linear multistep method (LMM):

\[ \hat{x}_{j+1}^n = \Psi(x_j^n, h), \quad 1 \le n \le N. \]


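3 Survival of the fittest: For each $n$:

(a) Compute the fitness weights

\[ g_{j+1}^n = w_j^n\, \pi(y_{j+1} \mid \hat{x}_{j+1}^n), \quad g_{j+1}^n \leftarrow \frac{g_{j+1}^n}{\sum_{m} g_{j+1}^m}; \]

(b) Draw indices with replacement $\ell_n \in \{1, 2, \dots, N\}$ using probabilities $P\{\ell_n = k\} = g_{j+1}^k$;

(c) Reshuffle

\[ x_j^n \leftarrow x_j^{\ell_n}, \quad \hat{x}_{j+1}^n \leftarrow \hat{x}_{j+1}^{\ell_n}, \quad 1 \le n \le N. \]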


4 Innovation: For each $n$:

(a) Using the error estimate, compute $\Gamma_{j+1}^n = \Gamma_{j+1}(x_j^n)$;

(b) Draw $v_{j+1}^n \sim \mathcal{N}(0, \Gamma_{j+1}^n)$;

(c) Proliferate

\[ x_{j+1}^n = \hat{x}_{j+1}^n + v_{j+1}^n. \]

5 Weight updating: For each $n$, compute

\[ w_{j+1}^n = \frac{\pi(y_{j+1} \mid x_{j+1}^n)}{\pi(y_{j+1} \mid \hat{x}_{j+1}^n)}, \quad w_{j+1}^n \leftarrow \frac{w_{j+1}^n}{\sum_{m} w_{j+1}^m}. \]

6 If $j < T$, increase $j \leftarrow j + 1$ and repeat from 2.

The parameter estimation SMC can also be carried out concurrently.

