Sampling Methods for Estimation: An Introduction
Particle Filters, Sequential Monte-Carlo, and Related Topics
Nahum Shimkin Faculty of Electrical Engineering Technion
Outline
The nonlinear estimation problem
Monte-Carlo Sampling:
- Crude Monte-Carlo
- Importance Sampling
Sequential Monte-Carlo
The Estimation Problem:
Consider a discrete-time stochastic system with state
x_t ∈ X (usually X = R^n), and observation y_t ∈ Y. The system is
specified by the following prior data:
p(x_0): initial state distribution
p(x_{t+1} | x_t): state transition law (t = 0, 1, 2, ...)
p(y_t | x_t): measurement distribution (t = 1, 2, ...)
Denote x_{0:t} = (x_0, x_1, ..., x_t), y_{1:t} = (y_1, ..., y_t).
The Estimation Problem (cont’d)
Note:
As usual, we assume the Markovian state property:
  p(x_{t+1} | x_{0:t}, y_{1:t}) = p(x_{t+1} | x_t),   p(y_t | x_{0:t}, y_{1:t-1}) = p(y_t | x_t)
The above laws are often specified through state
equations:
  x_{t+1} = f(x_t, w_t);   y_t = h(x_t, v_t)
The system may be time-varying; for simplicity we
omit the time variable.
The Estimation Problem (cont’d)
We wish to estimate the state x_t, based on the prior
data and the observations y_{1:t}.
The MMSE-optimal estimator, for example, is given by
  x̂_t = E(x_t | y_{1:t}),  with covariance  P_t = cov(x_t | y_{1:t}).
More generally, compute the posterior state
distribution:
  p̂_t(x) = p(x_t = x | y_{1:t}),  x ∈ X.
We can then compute
  x̂_t = ∫_X x p̂_t(x) dx  (and its covariance),
  x̂_t^MAP = arg max_x p̂_t(x),  etc.
Posterior Distribution Formulas:
Using the Bayes rule, we easily obtain the following two-
step recursive formulas that relate p(x_t | y_{1:t}) to
p(x_{t-1} | y_{1:t-1}) and y_t:
Time Update (prediction step):
  p(x_t | y_{1:t-1}) = ∫_X p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}
Measurement Update:
  p(x_t | y_{1:t}) = p(y_t | x_t) p(x_t | y_{1:t-1}) / ∫_X p(y_t | x_t) p(x_t | y_{1:t-1}) dx_t
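On a finite state space the two integrals above become sums, and the recursion can be run exactly. A minimal numpy sketch of one such run, where the two-state transition matrix and likelihood vectors are made-up illustrative numbers:

```python
import numpy as np

# Hypothetical two-state system: P[i, j] = p(x_t = j | x_{t-1} = i),
# and each observation y_t enters only through the vector p(y_t | x).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])           # p(x_0)
likelihoods = [np.array([0.7, 0.3]),   # p(y_1 | x), illustrative values
               np.array([0.4, 0.6])]   # p(y_2 | x)

belief = prior
for lik in likelihoods:
    predicted = belief @ P             # time update: sum over x_{t-1}
    unnorm = lik * predicted           # measurement update: numerator
    belief = unnorm / unnorm.sum()     # the normalizing integral is a sum here

print(belief)  # posterior p(x_t | y_{1:t}) over the two states; sums to 1
```

The same two lines (predict, then reweight and normalize) are exactly what the particle filter later approximates by sampling.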
Posterior Distribution Formulas: (2)
Unfortunately, these are hard to compute directly –
especially when x is high-dimensional!
Some well-known approximations:
The Extended Kalman Filter
Gaussian Sum Filters
General/Specific Numerical Integration Methods
Sampling-based methods – Particle Filters.
Monte-Carlo Sampling:
Crude Monte Carlo
The basic idea:
Estimate the expectation integral
  I(g) = E_{x~p}(g(x)) = ∫_X g(x) p(x) dx
by the following mean:
  I_N(g) = (1/N) Σ_{i=1}^N g(x^(i)),
where x^(i) ~ p(x). That is, {x^(i)}_{i=1}^N are independent
samples from p(x).
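As a quick sanity check, the crude Monte-Carlo estimate takes a few lines. The target p = N(0, 1) and g(x) = x² (so that I(g) = 1 exactly) are illustrative choices, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def crude_mc(g, sampler, N):
    x = sampler(N)          # N i.i.d. samples x^(i) ~ p
    return g(x).mean()      # I_N(g) = (1/N) * sum_i g(x^(i))

# With x ~ N(0,1) and g(x) = x**2, the true value is E[x^2] = 1.
est = crude_mc(lambda x: x**2, lambda N: rng.standard_normal(N), N=100_000)
print(est)  # close to 1
```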
Crude Monte-Carlo – Interpretation
We can interpret Monte-Carlo sampling as trying to
approximate p(x) by the discrete distribution
  p̂_N(x) = (1/N) Σ_{i=1}^N δ(x - x^(i)).
It is easily seen that
  I_N(g) = ∫ g(x) p̂_N(x) dx.
The samples x^(i) are sometimes called “particles”,
with weights w^(i) = N^{-1}.
[Figure: the discrete approximation p̂_N(x) – N equal spikes of height 1/N under the density p(x).]
Crude Monte Carlo: Properties
The following basic properties follow from standard results for i.i.d. sequences.
1. Bias: I_N(g) is an unbiased estimate of I(g), namely E(I_N(g)) = I(g).
2. SLLN: I_N(g) → I(g) a.s. as N → ∞.
3. CLT: √N (I_N(g) - I(g)) → N(0, σ_g²) in distribution,
   whenever σ_g² = cov(g(x)) < ∞.
Advantages (over numerical integration):
Straightforward scheme.
Focuses on “important” areas of p(x).
Rate of convergence does not depend (directly) on dim(x).
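The CLT rate can be checked empirically: growing N by a factor of 10 should shrink the spread of I_N(g) by about √10. A small illustrative experiment (p = N(0,1) and g(x) = x² are again arbitrary choices):

```python
import numpy as np

# Empirical check of the 1/sqrt(N) rate: the standard deviation of
# I_N(g) over repeated runs should drop by ~sqrt(10) when N grows 10x.
rng = np.random.default_rng(1)
g = lambda x: x**2            # with x ~ N(0,1): I(g) = 1, sigma_g^2 = 2

def mc_spread(N, runs=2000):
    ests = [g(rng.standard_normal(N)).mean() for _ in range(runs)]
    return np.std(ests)

ratio = mc_spread(100) / mc_spread(1000)
print(ratio)  # roughly sqrt(10), i.e. about 3.2
```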
The Monte-Carlo Method: Origins
Conceiver: Stanislaw Ulam (1946). Early developments (1940’s-50’s):
Ulam & von Neumann: Importance Sampling, Rejection Sampling.
Metropolis & von Neumann: Markov-Chain Monte-Carlo (MCMC).
A surprisingly large array of applications, including: global optimization,
statistical and quantum physics, computational biology, signal and image
processing, rare-event simulation, state estimation, ...
Importance Sampling (IS)
Idea:
Sample x from another distribution, q(x). Use weights to obtain the correct distribution.
Advantages:
May be (much) easier to sample from q than from p.
The covariance σ_g² may be (much) smaller,
giving faster convergence / better accuracy.
Importance Sampling (2)
Let q(x) be a selected distribution over x – the
proposal distribution. Assume that
  p(x) > 0 ⇒ q(x) > 0.
Observe that:
  I(g) = ∫ g(x) p(x) dx = ∫ g(x) [p(x)/q(x)] q(x) dx = ∫ g(x) w(x) q(x) dx,
where
  w(x) = p(x)/q(x)
is the likelihood ratio, or simply the weight-function.
Importance Sampling (3)
This immediately leads to the IS algorithm:
(1) Sample x^(i) ~ q(x), i = 1, ..., N.
(2) Compute the weights w^(i) = p(x^(i))/q(x^(i)), and
  Î_N(g) = (1/N) Σ_{i=1}^N w^(i) g(x^(i)).
Basic Properties:
The above-mentioned convergence properties are maintained.
The CLT covariance is now σ̂_g² = cov_{x~q}(g(x) w(x)).
This covariance is minimized by q(x) ∝ p(x) |g(x)|.
Unfortunately this choice is often too complicated, and we settle for approximations.
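A minimal sketch of steps (1)-(2), with an illustrative Gaussian target p = N(0, 1), proposal q = N(0, 2²), and g(x) = x² (true value 1); the densities are written out explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

N = 100_000
x = rng.normal(0.0, 2.0, size=N)                 # (1) x^(i) ~ q = N(0, 2^2)
w = normal_pdf(x, 0, 1) / normal_pdf(x, 0, 2)    # (2) weights w^(i) = p/q
est = np.mean(w * x**2)                          # (1/N) * sum_i w^(i) g(x^(i))
print(est)  # close to the true value E_p[x^2] = 1
```

Note that the weights correct for having sampled from the wrong distribution: regions where q over-samples relative to p get weights below 1, and vice versa.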
Importance Sampling (4)
Additional Comments:
It is often convenient to use the normalized weights
  w̃^(i) = w^(i) / Σ_{j=1}^N w^(j),  so that  Σ_{i=1}^N w̃^(i) = 1  and
  Î_N(g) = Σ_{i=1}^N w̃^(i) g(x^(i)).
The weights w^(i) are sometimes not properly normalized –
for example, when p(x) is known up to a (multiplicative)
constant. We can still use the last formula, with the
normalized weights w̃^(i).
We may interpret {(x^(i), w̃^(i))} as a “weighted-particles”
approximation to p(x):
  p̂_N(x) = Σ_{i=1}^N w̃^(i) δ(x - x^(i)).
Two Discrete Approximations
[Figure: two particle approximations of p(x) – equal-weight particles with spikes of height 1/N, and weighted particles with unequal spike heights.]
The Particle Filter Algorithm
We can now apply these ideas to our filtering problem. The basic algorithm is as follows:
0. Initialization: sample from p_0(x) to obtain {x_0^(i), w_0^(i)},
   with w_0^(i) = N^{-1}. For t = 1, 2, ..., given {x_{t-1}^(i), w_{t-1}^(i)}:
1. Time update: For i = 1, ..., N sample x̂_t^(i) ~ p(x_t | x_{t-1}^(i)).
   Set ŵ_t^(i) = w_{t-1}^(i).
2. Measurement update: For i = 1, ..., N, evaluate the
   (unnormalized) importance weights:
   w_t^(i) = ŵ_t^(i) p(y_t | x̂_t^(i)).
3. Resampling: Sample N points {x_t^(i)}_{i=1}^N from the discrete
   distribution corresponding to {x̂_t^(i), w_t^(i)}. Continue with
   {x_t^(i), w_t^(i) = N^{-1}}.
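Steps 0-3 can be sketched directly for a scalar model. Every constant below (the 0.9 dynamics, the noise variances, the observation sequence) is invented purely for illustration:

```python
import numpy as np

# Bootstrap particle filter sketch for an illustrative scalar model
#   x_t = 0.9 * x_{t-1} + v_t,   v_t ~ N(0, 1)      (state equation)
#   y_t = x_t + e_t,             e_t ~ N(0, 0.5^2)  (measurement)
rng = np.random.default_rng(3)
N = 5000

def bootstrap_filter(ys):
    x = rng.standard_normal(N)                   # 0. particles from p(x_0)
    estimates = []
    for y in ys:
        x = 0.9 * x + rng.standard_normal(N)     # 1. sample p(x_t | x_{t-1})
        w = np.exp(-0.5 * ((y - x) / 0.5) ** 2)  # 2. weights prop. to p(y_t | x_t)
        w /= w.sum()
        estimates.append(np.sum(w * x))          #    MMSE estimate before resampling
        x = rng.choice(x, size=N, p=w)           # 3. resample; weights reset to 1/N
    return estimates

est = bootstrap_filter([1.0, 1.2, 0.8])[-1]
print(est)  # tracks the observations
```

Since this model is linear-Gaussian, the particle estimate can be cross-checked against an ordinary Kalman filter on the same data.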
The Particle Filter Algorithm (2)
Output: Based on {x_t^(i), w_t^(i)} we can estimate
  p(x_t | y_{1:t}) ≈ Σ_{i=1}^N w_t^(i) δ(x - x_t^(i)),
  E(g(x_t) | y_{1:t}) ≈ Σ_{i=1}^N w_t^(i) g(x_t^(i)).
We may also use {x̂_t^(i), ŵ_t^(i)} for that purpose.
This algorithm is the original “bootstrap filter” introduced in Gordon et al. (1993).
CONDENSATION is a similar algorithm developed independently in the computer vision context (Isard and Blake, 1998).
Many improvements and variations now exist.
Particle Flow
[Figure: particle evolution through one filter cycle: p(x_{t-1} | y_{1:t-1}) → time update → p(x_t | y_{1:t-1}) → weighting by p(y_t | x_t^(i)) and resampling → p(x_t | y_{1:t}).]
(from Doucet et al., 2001)
The Particle Filter: Time Update
Recall that {x_{t-1}^(i), w_{t-1}^(i)} approximates p(x_{t-1} | y_{1:t-1}), and
p(x_t | x_{t-1}) is given.
An approximation {x̂_t^(i), ŵ_t^(i)} is therefore obtained by
sampling x̂_t^(i) ~ p(x_t | x_{t-1}^(i)), and setting ŵ_t^(i) = w_{t-1}^(i).
For example, for the familiar state model
  x_t = f(x_{t-1}, v_{t-1}),  v_t ~ N(0, Q),
we get
  x̂_t^(i) = f(x_{t-1}^(i), v^(i)),  v^(i) ~ N(0, Q).
The Particle Filter: Time Update (2)
It is possible (and often useful) to sample x̂_t^(i) from another (proposal) distribution q(x_t | x_{t-1}).
In that case we need to modify the weights:
  ŵ_t^(i) = w_{t-1}^(i) · p(x̂_t^(i) | x_{t-1}^(i)) / q(x̂_t^(i) | x_{t-1}^(i)).
The ideal choice for q(x_t | x_{t-1}) would be
  q*(x_t | x_{t-1}) = p(x_t | x_{t-1}, y_t).
Some schemes exist which try to approximate
sampling with q* – e.g., using MCMC.
The Particle Filter: Measurement Update
Recall that
  p(x_t | y_{1:t}) = c^{-1} p(y_t | x_t) p(x_t | y_{1:t-1}),
and {x̂_t^(i), ŵ_t^(i)} approximates p(x_t | y_{1:t-1}).
To obtain an approximation {x_t^(i), w_t^(i)}_{i=1}^N to p(x_t | y_{1:t}),
we only need to modify the weights accordingly:
  w_t^(i) = ŵ_t^(i) p(y_t | x̂_t^(i)),  with x_t^(i) = x̂_t^(i).
The Particle Filter: Resampling
Resampling is not strictly required for asymptotic convergence (as N → ∞).
However, without it, the distributions tend to degenerate, with few “large” particles, and many small ones.
The idea is therefore to split large particles into several smaller ones, and discard very small particles.
Random sampling from {x_t^(i), w_t^(i)} is one option. Another is
a deterministic splitting of particles according to w_t^(i),
namely: N^(i) ≈ N·w_t^(i) particles at x̃_t^(i) = x_t^(i), each with weight
w̃_t^(i) = (N^(i))^{-1} w_t^(i).
In fact, the last option tends to reduce the variance.
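The splitting idea can be sketched in code. The function below implements systematic resampling, a standard low-variance scheme in the same spirit as the deterministic splitting described above (one shared uniform draw, evenly spaced thereafter); the particle values and weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def systematic_resample(particles, weights):
    # Draw N offspring so large-weight particles are duplicated and
    # small-weight ones tend to disappear; new weights are all 1/N.
    N = len(particles)
    positions = (rng.uniform() + np.arange(N)) / N  # evenly spaced points in [0,1)
    cumulative = np.cumsum(weights)
    idx = np.searchsorted(cumulative, positions)    # which particle owns each point
    return particles[idx]

particles = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.7, 0.1, 0.1, 0.1])            # illustrative weights
resampled = systematic_resample(particles, weights)
print(resampled)  # the 0.7-weight particle appears about 0.7 * N times
```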
Additional Issues and Comments
Data Association: This is an important problem for applications such as
multiple-target tracking and visual tracking. Within the Kalman Filter framework, such problems are
often treated by creating multiple explicit hypotheses regarding the data origin, which may lead to combinatorial explosion.
Isard and Blake (1998) observed that particle filters open new possibilities for data association in tracking. Specifically, different particles can be associated with different measurements, based on some “closeness” (or likelihood) measure.
Encouraging results have been reported.
Additional Issues and Comments (2)
Joint Parameter and State Estimation:
As is well known, this “adaptive estimation” problem can be reduced to a standard estimation problem by embedding the parameters in the state vector.
This, however, leads to a bilinear state equation, for which standard KF algorithms did not prove successful. The problem is currently treated using non-sequential algorithms (e.g. EM + smoothing).
Particle filters have been applied to this problem with encouraging results.
Additional Issues and Comments (3)
Low-noise State Components:
State components which are not excited by external noise (such as fixed parameters) pose a problem, since the corresponding particle components remain fixed in place and cannot correct initially bad placements.
The simplest solution is to add a little noise. Additional suggestions involve intermediate MCMC steps, and related ideas.
Conclusion:
Particle Filters and Sequential Monte Carlo Methods are an evolving and active topic, with good potential to handle “hard” estimation problems, involving non-linearity and multi-modal distributions.
In general, these schemes are computationally expensive as the number of “particles” N needs to be large for precise results.
Additional work is required on the computational issue – in particular, optimizing the choice of N, and related error bounds.
Another promising direction is the merging of sampling methods with more disciplined approaches (such as Gaussian filters, and the Rao-Blackwellization scheme …).
To Probe Further:
Monte-Carlo Methods (at large):
R. Rubinstein, Simulation and the Monte-Carlo Method, Wiley, 1981 (and 1996).
J.S. Liu, Monte-Carlo Strategies in Scientific Computing, Springer, 2001.
M. Evans and T. Swartz, “Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems”, Statistical Science 10(3), 1995, pp. 254-272.
Particle Filters:
N.J. Gordon, D.J. Salmond and A.F.M. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation”, IEE Proceedings-F 140(2), 1993, pp. 107-113.
M. Isard and A. Blake, “CONDENSATION – conditional density propagation for visual tracking”, International Journal of Computer Vision 29(1), 1998, pp. 5-28.
A. Doucet, N. de Freitas and N. Gordon (eds.), Sequential Monte Carlo Methods in Practice, Springer, 2001.
S. Arulampalam et al., “A tutorial on particle filters for on-line nonlinear/non-Gaussian Bayesian tracking”, IEEE Trans. on Signal Processing 50(2), 2002.