
Sampling Methods for Estimation: An Introduction

Particle Filters, Sequential Monte-Carlo, and Related Topics

Nahum Shimkin Faculty of Electrical Engineering Technion


Outline

The nonlinear estimation problem

Monte-Carlo Sampling:

- Crude Monte-Carlo

- Importance Sampling

Sequential Monte-Carlo


The Estimation Problem:

Consider a discrete-time stochastic system with state $x_t \in X$ (usually $X \subseteq \mathbb{R}^n$), and observation $y_t \in Y$. The system is specified by the following prior data:

$p(x_0)$: initial state distribution

$p(x_{t+1} \mid x_t)$: state transition law (t = 0, 1, 2, ...)

$p(y_t \mid x_t)$: measurement distribution (t = 1, 2, ...)

Denote $x_{0:t} = (x_0, x_1, \dots, x_t)$, $y_{1:t} = (y_1, \dots, y_t)$.


The Estimation Problem (cont’d)

Note:

As usual, we assume the Markovian state property:

$p(x_{t+1} \mid x_{0:t}, y_{1:t}) = p(x_{t+1} \mid x_t), \qquad p(y_t \mid x_{0:t}, y_{1:t-1}) = p(y_t \mid x_t).$

The above laws are often specified through state equations:

$x_{t+1} = f(x_t, w_t); \qquad y_t = h(x_t, v_t).$

The system may be time-varying; for simplicity we omit the time dependence from the notation.
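To make the state-equation form concrete, here is a minimal Python sketch of one such system. The particular f, h, and noise levels below are illustrative choices (a standard scalar benchmark-style nonlinear model), not part of the original slides.

```python
import numpy as np

# Illustrative scalar nonlinear state-space model (assumed, not from the slides):
#   x_{t+1} = f(x_t, w_t) = 0.5*x_t + 25*x_t/(1 + x_t**2) + w_t,  w_t ~ N(0, Q)
#   y_t     = h(x_t, v_t) = x_t**2/20 + v_t,                      v_t ~ N(0, R)
Q, R = 1.0, 0.1

def f(x, w):
    return 0.5 * x + 25.0 * x / (1.0 + x**2) + w

def h(x, v):
    return x**2 / 20.0 + v

def simulate(T, seed=0):
    """Draw a state trajectory x_0..x_T and observations y_1..y_T."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0)                          # x_0 ~ p(x_0)
    xs, ys = [x], []
    for _ in range(T):
        x = f(x, rng.normal(0.0, np.sqrt(Q)))         # state transition p(x_{t+1} | x_t)
        ys.append(h(x, rng.normal(0.0, np.sqrt(R))))  # measurement p(y_t | x_t)
        xs.append(x)
    return np.array(xs), np.array(ys)
```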


The Estimation Problem (cont’d)

We wish to estimate the state xt, based on the prior

data and the observations y1:t.

The MMSE-optimal estimator, for example, is given by

$\hat{x}_t = E(x_t \mid y_{1:t})$, with covariance $P_t = \mathrm{cov}(x_t \mid y_{1:t})$.

More generally, compute the posterior state distribution:

$\hat{p}_t(x) = p(x_t = x \mid y_{1:t}), \quad x \in X.$

We can then compute

$\hat{x}_t = \int_X x\, \hat{p}_t(x)\, dx$ (and its covariance),

$\hat{x}_t^{\mathrm{MAP}} = \arg\max_x \hat{p}_t(x)$, (etc.)


Posterior Distribution Formulas:

Using the Bayes rule, we easily obtain the following two-step recursive formulas that relate $p(x_t \mid y_{1:t})$ to $p(x_{t-1} \mid y_{1:t-1})$ and $y_t$:

Time Update (prediction step):

$p(x_t \mid y_{1:t-1}) = \int_{x_{t-1}} p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$

Measurement Update:

$p(x_t \mid y_{1:t}) = \dfrac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{\int_{x_t} p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})\, dx_t}$
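These two updates can be checked numerically on a low-dimensional problem by brute-force integration over a grid. Below is a rough Python sketch (my own illustration, not from the slides); `trans_pdf` and `lik_pdf` stand for the model densities $p(x_t \mid x_{t-1})$ and $p(y_t \mid x_t)$.

```python
import numpy as np

def bayes_filter_step(grid, prior, y, trans_pdf, lik_pdf):
    """One exact-recursion cycle, approximated by numerical integration on a 1-D grid.

    grid      : (M,) equally spaced grid points for x
    prior     : (M,) values of p(x_{t-1} | y_{1:t-1}) on the grid
    trans_pdf : vectorized p(x_t | x_{t-1}),  called as trans_pdf(x_next, x_prev)
    lik_pdf   : vectorized p(y_t | x_t),      called as lik_pdf(y, x)
    """
    dx = grid[1] - grid[0]
    # Time update: p(x_t | y_{1:t-1}) = int p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}
    T = trans_pdf(grid[:, None], grid[None, :])        # T[i, j] = p(grid[i] | grid[j])
    predicted = T @ prior * dx
    # Measurement update: multiply by the likelihood p(y_t | x_t) and renormalize
    posterior = lik_pdf(y, grid) * predicted
    posterior /= posterior.sum() * dx
    return predicted, posterior
```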


Posterior Distribution Formulas: (2)

Unfortunately, these are hard to compute directly –

especially when x is high-dimensional!

Some well-known approximations:

The Extended Kalman Filter

Gaussian Sum Filters

General/Specific Numerical Integration Methods

Sampling-based methods – Particle Filters.


Monte-Carlo Sampling:

Crude Monte Carlo

The basic idea:

Estimate the expectation integral

$I(g) = E_{x \sim p(\cdot)}\big(g(x)\big) = \int g(x)\, p(x)\, dx$

by the following sample mean:

$I_N(g) = \frac{1}{N} \sum_{i=1}^{N} g\big(x^{(i)}\big)$

where $x^{(i)} \sim p(x)$. That is, $\{x^{(i)}\}_{i=1}^{N}$ are independent samples from $p(x)$.
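A minimal numerical illustration of the crude estimator; the target p and test function g below are my own choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def crude_mc(g, sampler, N):
    """Estimate I(g) = E_{x~p}[g(x)] by the sample mean over N i.i.d. draws from p."""
    x = sampler(N)              # x^(i) ~ p(x), i = 1, ..., N
    return np.mean(g(x))

# Example: E[x^2] for x ~ N(0, 1); the true value is 1.
I_N = crude_mc(lambda x: x**2, lambda N: rng.standard_normal(N), N=100_000)
print(I_N)                      # close to 1, with O(1/sqrt(N)) fluctuation
```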


Crude Monte-Carlo – Interpretation

We can interpret Monte-Carlo sampling as trying to approximate p(x) by the discrete distribution

$\hat{p}_N(x) = \frac{1}{N} \sum_{i=1}^{N} \delta\big(x - x^{(i)}\big).$

It is easily seen that

$I_N(g) = \int g(x)\, \hat{p}_N(x)\, dx.$

The samples $x^{(i)}$ are sometimes called “particles”, with weights $w^{(i)} = N^{-1}$.

[Figure: the discrete approximation $\hat{p}_N(x)$ – equal-weight point masses of height $1/N$ at the sample points $x^{(i)}$.]


Crude Monte Carlo: Properties

The following basic properties follow from standard results for i.i.d. sequences.

1. Bias: $I_N(g)$ is an unbiased estimate of $I(g)$, namely $E\big(I_N(g)\big) = I(g)$.

2. SLLN: $I_N(g) \to I(g)$ a.s., as $N \to \infty$.

3. CLT: $\sqrt{N}\,\big(I_N(g) - I(g)\big) \to N(0, \sigma_g^2)$ in distribution, whenever $\sigma_g^2 = \mathrm{cov}\big(g(x)\big) < \infty$.

Advantages (over numerical integration):

Straightforward scheme.

Focuses on “important” areas of p(x).

Rate of convergence does not depend (directly) on dim(x).


The Monte-Carlo Method: Origins

Conceiver: Stanislaw Ulam (1946). Early developments (1940s-50s):

Ulam & von Neumann: Importance Sampling, Rejection Sampling. Metropolis & von Neumann: Markov-Chain Monte-Carlo (MCMC).

A surprisingly large array of applications, including: global optimization, statistical and quantum physics, computational biology, signal and image processing, rare-event simulation, state estimation, …


Importance Sampling (IS)

Idea:

Sample x from another distribution, q(x). Use weights to obtain the correct distribution.

Advantages:

May be (much) easier to sample from q than from p.

The covariance $\sigma_g^2$ may be (much) smaller:

Faster convergence / better accuracy.


Importance Sampling (2)

Let q(x) be a selected distribution over x – the proposal distribution. Assume that

$p(x) > 0 \;\Rightarrow\; q(x) > 0.$

Observe that:

$I(g) = \int g(x)\, p(x)\, dx = \int g(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx = \int g(x)\, w(x)\, q(x)\, dx$

where

$w(x) = \frac{p(x)}{q(x)}$

is the likelihood ratio, or simply the weight function.


Importance Sampling (3)

This immediately leads to the IS algorithm:

(1) Sample $x^{(i)} \sim q(x)$, $i = 1, \dots, N$.

(2) Compute the weights $w^{(i)} = \frac{p(x^{(i)})}{q(x^{(i)})}$, and $\hat{I}_N(g) = \frac{1}{N} \sum_{i=1}^{N} w^{(i)}\, g\big(x^{(i)}\big)$.

Basic Properties:

The above-mentioned convergence properties are maintained.

The CLT covariance is now $\hat{\sigma}_g^2 = \mathrm{cov}_{x \sim q}\big(g(x)\, w(x)\big)$.

This covariance is minimized by $q(x) \propto p(x)\, |g(x)|$.

Unfortunately, this choice is often too complicated, and we settle for approximations.
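A short Python sketch of the two-step IS algorithm. The target p, proposal q, and test function g in the example are my own illustrative choices (a Gaussian tail probability, where IS clearly beats crude Monte-Carlo); they are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def importance_sampling(g, p_pdf, q_pdf, q_sampler, N):
    x = q_sampler(N)                # (1) sample x^(i) ~ q(x)
    w = p_pdf(x) / q_pdf(x)         # (2) weights w^(i) = p(x^(i)) / q(x^(i))
    return np.mean(w * g(x))        #     I_N(g) = (1/N) sum_i w^(i) g(x^(i))

# Example: tail probability P(x > 3) for x ~ N(0, 1), with proposal q = N(3, 1).
I_N = importance_sampling(
    g=lambda x: (x > 3.0).astype(float),
    p_pdf=lambda x: gauss_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: gauss_pdf(x, 3.0, 1.0),
    q_sampler=lambda N: rng.normal(3.0, 1.0, size=N),
    N=100_000,
)
print(I_N)    # close to the true tail probability (about 1.35e-3)
```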


Importance Sampling (4)

Additional Comments:

It is often convenient to use the normalized weights $\tilde{w}^{(i)}$, with $\sum_{i=1}^{N} \tilde{w}^{(i)} = 1$, so that

$\hat{I}_N(g) = \sum_{i=1}^{N} \tilde{w}^{(i)}\, g\big(x^{(i)}\big).$

The weights $w^{(i)}$ are sometimes not properly normalized, for example when p(x) is known only up to a (multiplicative) constant. We can still use the last formula, with

$\tilde{w}^{(i)} = \frac{w^{(i)}}{\sum_{j=1}^{N} w^{(j)}}.$

We may interpret $\big(x^{(i)}, \tilde{w}^{(i)}\big)$ as a “weighted-particles” approximation to p(x):

$\hat{p}_N(x) = \sum_{i=1}^{N} \tilde{w}^{(i)}\, \delta\big(x - x^{(i)}\big) \approx p(x).$
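When p(x) is known only up to a constant, this normalized-weight (self-normalized) form of the estimator is the one actually used; a minimal sketch, with an arbitrary illustrative target of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def self_normalized_is(g, p_unnorm, q_pdf, q_sampler, N):
    """Self-normalized IS: p_unnorm may omit the normalizing constant of p."""
    x = q_sampler(N)
    w = p_unnorm(x) / q_pdf(x)          # unnormalized weights w^(i)
    w_tilde = w / w.sum()               # normalized weights, summing to 1
    return np.sum(w_tilde * g(x))       # I_N(g) = sum_i w~^(i) g(x^(i))

# Example: mean of a symmetric target known only up to a constant, exp(-|x|^3),
# using a wide Gaussian proposal; the true mean is 0.
est = self_normalized_is(
    g=lambda x: x,
    p_unnorm=lambda x: np.exp(-np.abs(x) ** 3),
    q_pdf=lambda x: gauss_pdf(x, 0.0, 2.0),
    q_sampler=lambda N: rng.normal(0.0, 2.0, size=N),
    N=50_000,
)
print(est)    # close to 0
```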


Two Discrete Approximations

[Figure: two discrete approximations of p(x) – equal-weight particles with weights $1/N$ (crude Monte-Carlo), and weighted particles (importance sampling).]


The Particle Filter Algorithm

We can now apply these ideas to our filtering problem. The basic algorithm is as follows:

0. Initialization: sample from $p_0(x)$ to obtain $\big\{x_0^{(i)}, w_0^{(i)}\big\}$, with $w_0^{(i)} = N^{-1}$. For $t = 1, 2, \dots$, given $\big\{x_{t-1}^{(i)}, w_{t-1}^{(i)}\big\}$:

1. Time update: For $i = 1, \dots, N$, sample $\hat{x}_t^{(i)} \sim p\big(x_t \mid x_{t-1}^{(i)}\big)$. Set $\hat{w}_t^{(i)} = w_{t-1}^{(i)}$.

2. Measurement update: For $i = 1, \dots, N$, evaluate the (unnormalized) importance weights: $\tilde{w}_t^{(i)} = \hat{w}_t^{(i)}\, p\big(y_t \mid \hat{x}_t^{(i)}\big)$.

3. Resampling: Sample N points $\big\{x_t^{(i)}\big\}_{i=1}^{N}$ from the discrete distribution corresponding to $\big\{\hat{x}_t^{(i)}, \tilde{w}_t^{(i)}\big\}$. Continue with $\big\{x_t^{(i)}, w_t^{(i)} = N^{-1}\big\}$.
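Putting steps 0-3 together gives a compact bootstrap-filter sketch in Python; the scalar model at the bottom is the same illustrative one used earlier and is not prescribed by the slides.

```python
import numpy as np

def bootstrap_pf(ys, N, propagate, lik, p0_sampler, seed=0):
    """Bootstrap particle filter (steps 0-3 above), returning posterior-mean estimates.

    propagate(x, rng)  : samples p(x_t | x_{t-1}) for an array of particles
    lik(y, x)          : evaluates p(y_t | x_t) per particle
    p0_sampler(N, rng) : draws the initial particles x_0^(i)
    """
    rng = np.random.default_rng(seed)
    x = p0_sampler(N, rng)                      # 0. initialization, weights = 1/N
    means = []
    for y in ys:
        x = propagate(x, rng)                   # 1. time update
        w = lik(y, x)                           # 2. measurement update (unnormalized)
        w = w / w.sum()
        means.append(np.sum(w * x))             #    posterior-mean estimate at time t
        x = x[rng.choice(N, size=N, p=w)]       # 3. resampling; weights reset to 1/N
    return np.array(means)

# Example, using the illustrative scalar model from before:
Q, R = 1.0, 0.1
propagate = lambda x, rng: 0.5*x + 25.0*x/(1.0 + x**2) + rng.normal(0.0, np.sqrt(Q), x.shape)
lik = lambda y, x: np.exp(-0.5 * (y - x**2/20.0)**2 / R) / np.sqrt(2.0*np.pi*R)
p0 = lambda N, rng: rng.normal(0.0, 1.0, N)
# x_hat = bootstrap_pf(ys, N=1000, propagate=propagate, lik=lik, p0_sampler=p0)
```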


The Particle Filter Algorithm (2)

Output: Based on $\big\{x_t^{(i)}, w_t^{(i)}\big\}$ we can estimate

$p(x_t \mid y_{1:t}) \approx \sum_{i=1}^{N} w_t^{(i)}\, \delta\big(x_t - x_t^{(i)}\big),$

$E\big(g(x_t) \mid y_{1:t}\big) \approx \sum_{i=1}^{N} w_t^{(i)}\, g\big(x_t^{(i)}\big).$

We may also use $\big\{\hat{x}_t^{(i)}, \tilde{w}_t^{(i)}\big\}$ for that purpose.

This algorithm is the original “bootstrap filter” introduced in Gordon et al. (1993).

CONDENSATION is a similar algorithm developed independently in the computer vision context (Isard and Blake, 1998).

Many improvements and variations now exist.


Particle Flow

[Figure: flow of the particle set through one filter cycle – from $p(x_{t-1} \mid y_{1:t-1})$, through the time update to $p(x_t \mid y_{1:t-1})$, reweighting by $p(y_t \mid x_t^{(i)})$ and resampling, to $p(x_t \mid y_{1:t})$ (from Doucet et al., 2001).]


The Particle Filter: Time Update

Recall that $\big\{x_{t-1}^{(i)}, w_{t-1}^{(i)}\big\}$ approximates $p(x_{t-1} \mid y_{1:t-1})$, and $p(x_t \mid x_{t-1})$ is given.

An approximation $\big\{\hat{x}_t^{(i)}, \hat{w}_t^{(i)}\big\}$ is therefore obtained by sampling $\hat{x}_t^{(i)} \sim p\big(x_t \mid x_{t-1}^{(i)}\big)$ and setting $\hat{w}_t^{(i)} = w_{t-1}^{(i)}$.

For example, for the familiar state model

$x_t = f(x_{t-1}, v_{t-1}), \quad v_{t-1} \sim N(0, Q),$

we get

$\hat{x}_t^{(i)} = f\big(x_{t-1}^{(i)}, v_{t-1}^{(i)}\big), \quad v_{t-1}^{(i)} \sim N(0, Q).$


The Particle Filter: Time Update (2)

It is possible (and often useful) to sample $\hat{x}_t^{(i)}$ from another (proposal) distribution $q(x_t \mid x_{t-1})$.

In that case we need to modify the weights:

$\hat{w}_t^{(i)} = w_{t-1}^{(i)}\, \dfrac{p\big(\hat{x}_t^{(i)} \mid x_{t-1}^{(i)}\big)}{q\big(\hat{x}_t^{(i)} \mid x_{t-1}^{(i)}\big)}.$

The ideal choice for $q(x_t \mid x_{t-1})$ would be

$q^*(x_t \mid x_{t-1}) = \dfrac{p(x_t \mid y_{1:t})}{p(x_{t-1} \mid y_{1:t-1})}.$

Some schemes exist which try to approximate sampling from $q^*$ – e.g., using MCMC.
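In code, switching to a general proposal changes only the propagation call and the weight line; a minimal sketch, where `prop_sampler`, `trans_pdf`, and `prop_pdf` are placeholders for whatever model and proposal are in use (not names from the slides):

```python
import numpy as np

def time_update_with_proposal(x_prev, w_prev, prop_sampler, trans_pdf, prop_pdf, rng):
    """Propagate particles with a proposal q(x_t | x_{t-1}) and correct the weights.

    prop_sampler(x_prev, rng) : samples x^_t^(i) ~ q(. | x_{t-1}^(i))
    trans_pdf(x_new, x_prev)  : evaluates p(x_t | x_{t-1}) per particle
    prop_pdf(x_new, x_prev)   : evaluates q(x_t | x_{t-1}) per particle
    """
    x_new = prop_sampler(x_prev, rng)
    w_new = w_prev * trans_pdf(x_new, x_prev) / prop_pdf(x_new, x_prev)
    return x_new, w_new
```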


The Particle Filter: Measurement Update

Recall that

$p(x_t \mid y_{1:t}) = \frac{1}{c}\, p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1}),$

where c is a normalizing constant, and $\big\{\hat{x}_t^{(i)}, \hat{w}_t^{(i)}\big\}$ approximate $p(x_t \mid y_{1:t-1})$.

To obtain an approximation $\big\{x_t^{(i)}, w_t^{(i)}\big\}_{i=1}^{N}$ to $p(x_t \mid y_{1:t})$, we only need to modify the weights accordingly:

$w_t^{(i)} = \hat{w}_t^{(i)}\, p\big(y_t \mid \hat{x}_t^{(i)}\big)$, with $x_t^{(i)} = \hat{x}_t^{(i)}$.


The Particle Filter: Resampling

Resampling is not strictly required for asymptotic convergence (as $N \to \infty$).

However, without it, the distributions tend to degenerate, with few “large” particles, and many small ones.

The idea is therefore to split large particles into several smaller ones, and discard very small particles.

Random sampling from $\big\{x_t^{(i)}, w_t^{(i)}\big\}$ is one option. Another is a deterministic splitting of particles according to $w_t^{(i)}$, namely: $N_i \approx N\, w_t^{(i)}$ particles at $\tilde{x}_t^{(i)} = x_t^{(i)}$, each with weight $\tilde{w}_t^{(i)} = N_i^{-1}\, w_t^{(i)}$.

In fact, the last option tends to reduce the variance.
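Both options can be sketched in a few lines; multinomial resampling implements the random option, while systematic resampling is one standard low-variance scheme in the spirit of the deterministic splitting described above (my own illustration, not code from the slides):

```python
import numpy as np

def multinomial_resample(x, w, rng):
    """Random option: draw N i.i.d. samples from the weighted set {x^(i), w^(i)}."""
    N = len(x)
    return x[rng.choice(N, size=N, p=w)]

def systematic_resample(x, w, rng):
    """Low-variance option: each particle is copied roughly N * w^(i) times."""
    N = len(x)
    positions = (rng.uniform() + np.arange(N)) / N    # N evenly spaced points in [0, 1)
    cum = np.cumsum(w)
    cum[-1] = 1.0                                     # guard against round-off
    return x[np.searchsorted(cum, positions)]
```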


Additional Issues and Comments

Data Association: This is an important problem for applications such as multiple-target tracking and visual tracking.

Within the Kalman Filter framework, such problems are often treated by creating multiple explicit hypotheses regarding the data origin, which may lead to a combinatorial explosion.

Isard and Blake (1998) observed that particle filters open new possibilities for data association for tracking. Specifically, different particles can be associated to different measurements, based on some “closeness” (or likelihood) measure.

Encouraging results have been reported.


Additional Issues and Comments (2)

Joint Parameter and State Estimation:

As is well known, this “adaptive estimation” problem can be reduced to a standard estimation problem by embedding the parameters in the state vector.

This, however, leads to a bilinear state equation, for which standard KF algorithms did not prove successful. The problem is currently treated using non-sequential algorithms (e.g., EM + smoothing).

Particle filters have been applied to this problem with encouraging results.

26

Additional Issues and Comments (3)

Low-noise State Components:

State components which are not excited by external noise (such as fixed parameters) pose a problem since the respective particle components may be fixed in their place, and cannot correct initially bad placements.

The simplest solution is to add a little noise. Additional suggestions involve intermediate MCMC steps, and related ideas.
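A minimal sketch of the “add a little noise” fix (often called roughening or jittering); the jitter scale is an assumed tuning parameter, not a value given in the slides:

```python
import numpy as np

def roughen(particles, jitter_std, rng):
    """Add small artificial noise to otherwise noise-free state components
    (e.g. static parameters embedded in the state), so that resampled copies
    can still move away from initially bad placements."""
    return particles + rng.normal(0.0, jitter_std, size=particles.shape)

# Example: jitter only the static parameter component of the particle set.
# theta_particles = roughen(theta_particles, jitter_std=0.01, rng=np.random.default_rng(0))
```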


Conclusion:

Particle Filters and Sequential Monte Carlo Methods are an evolving and active topic, with good potential to handle “hard” estimation problems, involving non-linearity and multi-modal distributions.

In general, these schemes are computationally expensive as the number of “particles” N needs to be large for precise results.

Additional work is required on the computational issue – in particular, optimizing the choice of N, and related error bounds.

Another promising direction is the merging of sampling methods with more disciplined approaches (such as Gaussian filters, and the Rao-Blackwellization scheme …).


To Probe Further:

Monte-Carlo Methods (at large):

R. Rubinstein, Simulation and the Monte-Carlo Method, Wiley, 1981 (and 1996).

J.S. Liu, Monte-Carlo Strategies in Scientific Computing, Springer, 2001.

M. Evans and T. Swartz, “Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems”, Statistical Science 10(3), 1995, pp. 254-272.

Particle Filters:

N.J. Gordon, D.J. Salmond and A. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation”, IEE Proceedings-F 140(2), 1993, pp. 107-113.

M. Isard and A. Blake, “CONDENSATION – conditional density propagation for visual tracking”, International Journal of Computer Vision 28(1), 1998, pp. 5-28.

A. Doucet, N. de Freitas and N. Gordon (eds.), Sequential Monte Carlo Methods in Practice, Springer, 2001.

S. Arulampalam et al., “A tutorial on particle filters for on-line nonlinear/non-Gaussian Bayesian tracking”, IEEE Trans. on Signal Processing 50(2), 2002. (Recommended)