

Online recursive particle path functional approximation for large datasets

R. Marques∗ G. Storvik†

February 1, 2017

Abstract

Online inference for prediction of the latent states of general state space models is a well-known challenging task, but particle filters have proven successful in a range of applications. Prediction of functions of the whole sequence of latent variables, called path functionals, may be of interest in itself, but such functionals are also crucial elements when estimation of static parameters is needed. Online inference of path functionals using particle filters has proven to be a much harder computational task, mainly due to path degeneracy problems for long time series. In this paper, we present particle blocking approximations that mitigate path degeneracy when recursive functionals are available. We provide illustrations of path functional estimation and of online parameter estimation in the Gaussian systems framework.

Keywords: Approximate Bayesian inference; Particle filter; Path recursive functional; Sequential Monte Carlo methods; Sufficient statistics; State space models

∗University of Oslo and Statistics for Innovation Centre, Oslo, Norway.
†University of Oslo and Statistics for Innovation Centre, Oslo, Norway. Email of correspondence: [email protected]


1 Introduction

General state space models are a broad class of time series models which encapsulate several real problems in financial econometrics, engineering and the life sciences (Cappe et al., 2005; Candy, 2009; Kevin, 2012; Douc et al., 2014). However, inference for the vast majority of state space models is analytically intractable. Sequential Monte Carlo (SMC) methods, and their respective algorithms, called particle filters (PF), are a powerful class of stochastic algorithms for performing sequential inference in state space models (Doucet et al., 2001; Chopin, 2004; Cappe et al., 2007; Fearnhead, 2008; Doucet and Johansen, 2009; Kunsch et al., 2013). Theoretical results, showing convergence of the marginal distributions as the number of particles increases, are presented in several works; see, for instance, Doucet et al. (2001); Del Moral (2004); Douc et al. (2014) and the references therein.

While sequential Monte Carlo (SMC) methods offer valuable techniques and algorithms for Bayesian inference in general state space models, they might still be inefficient in practice when some quantity based on the whole trajectory is of interest (Doucet and Johansen, 2009; Douc et al., 2014). Since SMC methods approximate a sequence of target distributions via sequential importance sampling, the particle weights will collapse as time increases (Doucet and Johansen, 2009). Including resampling steps is the usual mechanism to bypass this problem, but the successive use of resampling leads to sample impoverishment of the initial parts of the trajectory by duplicating the particles with high weights. This systematic impoverishment is a well-known obstacle of sequential Monte Carlo methods called path degeneracy. As time increases, the loss of diversity among the particles becomes stronger; as a consequence, the ancestry of the particles tends to be represented by only a few trajectories (in the limit by only one) (see Andrieu et al., 2005; Cappe et al., 2005; Kantas et al., 2009; Poyiadjis et al., 2011, and the references therein). MCMC moves or arbitrary updates have been used as potential mechanisms to reduce sample impoverishment (Gilks and Berzuini, 2001; Doucet et al., 2006; Lin et al., 2013). These techniques appear in the SMC literature with different sampling schemes, and most of them incur a computational cost that increases with time.

Path degeneracy is particularly problematic for static parameter estimation. To reduce this problem for online parameter estimation, the forgetting property of the filtering distributions might be used to reduce the number of resampling steps. This alternative is based on fixed-lag approximations of the smoothing distributions, and its theoretical justification, combined with simulation studies, can be found in Olsson et al. (2008). However, particle algorithms based on summary statistics for online Bayesian inference (Andrieu et al., 1999; Fearnhead, 2002; Storvik, 2002; Carvalho et al., 2010) still suffer from path degeneracy, and may therefore provide non-robust results for long time series. The effect of path degeneracy on such approaches for offline and online parameter estimation was exhaustively investigated in Kantas et al. (2009).

The main contribution of this article is an online Bayesian particle algorithm for path functionals that avoids path degeneracy in large-dataset applications. More precisely, our sequential Monte Carlo scheme is based on a composite likelihood approximation over non-overlapping blocks combined with a fixed-lag approximation. The size of the blocks is user-specified and should be chosen as an increasing function of the number of particles. In the limit, with a block size equal to the total length of the observation interval, the ordinary particle filter algorithm is recovered.

The rest of the paper is structured as follows. Section 2 introduces the notation and presents a short review of sequential Monte Carlo inference for general state space models. In section 3.1, the path approximation for recursive functionals is introduced, combined with sequential Monte Carlo algorithms for such functionals. In section 3.2, the particle blocking filter is adapted to online estimation of static parameters, and the implementation details are discussed in section 3.3. We demonstrate the performance of our simulation scheme for online Bayesian inference in section 4. Finally, section 5 closes with a discussion.

2 Background

Let $\{x_t\}_{t\in\mathbb{N}}$ with $x_t \in \mathcal{X} \subset \mathbb{R}^{p_x}$ be a Markov process. Denote the transition densities by $\pi(x_t|x_{t-1};\theta)$ and the initial distribution by $\pi(x_1|\theta)$. Let also $\{y_t\}_{t\in\mathbb{N}}$, $y_t \in \mathcal{Y}$, be an observed stochastic process, conditionally independent given $(x_1,\ldots,x_t)$, with marginal densities $\pi(y_t|x_t;\theta)$. Here $\theta \in \Theta \subset \mathbb{R}^{p_\theta}$ are some static parameters. We denote $\mathcal{X}^t = \prod_{k=1}^{t}\mathcal{X}$ and similarly for $\mathcal{Y}^t$. For any sequence $\{a_n\}_{n\in\mathbb{N}}$, we denote $a_{k:n} \equiv (a_k, a_{k+1}, \ldots, a_n)$. Our general interest will be in the posterior distributions $\pi(x_{1:t}|y_{1:t};\theta)$ or $\pi(x_{1:t},\theta|y_{1:t})$, where in the latter case we take a Bayesian view assuming a prespecified prior for $\theta$.

2.1 Sequential Monte Carlo inference for state space models

Monte Carlo inference is a powerful technique for dealing with the intractability of complex state space models in general. In this section we consider $\theta$ fixed and drop it from the notation.

In general, we have the filtering recursion
$$\pi(x_{1:t}|y_{1:t}) = \frac{\pi(y_t|x_t)\,\pi(x_t|x_{t-1})\,\pi(x_{1:t-1}|y_{1:t-1})}{\pi(y_t|y_{1:t-1})},$$
where
$$\pi(y_t|y_{1:t-1}) = \int \pi(y_t|x_t)\,\pi(x_t|y_{1:t-1})\,dx_t.$$
Only in a few special cases, for instance the linear Gaussian state space model or finite hidden Markov models, can these recursions be evaluated analytically.

Sequential Monte Carlo methods rely on approximating $\pi(x_{1:t}|y_{1:t})$ by (weighted) Monte Carlo samples. The particles are generated independently from some proposal distribution
$$q(x_{1:t}|y_{1:t}) = q(x_1|y_1)\prod_{k=2}^{t} q(x_k|x_{k-1}, y_k),$$
where we assume that $q(x_{1:t})$ dominates $\pi(x_{1:t}|y_{1:t})$ for all $x_{1:t} \in \mathcal{X}^t$. Hence, proper importance weights (in the definition of Liu and Chen, 1995) with respect to $\pi(x_{1:t}|y_{1:t})$ for an entire sequence $x_{1:t}^i$ are given by
$$w_t^i \equiv w_t(x_{1:t}^i) = \frac{\pi(x_{1:t}^i|y_{1:t})}{q(x_{1:t}^i|y_{1:t})} \propto w_{t-1}^i\,\frac{\pi(x_t^i|x_{t-1}^i)\,\pi(y_t|x_t^i)}{q(x_t^i|x_{t-1}^i, y_t)}.$$
The posterior distribution $\pi(x_{1:t}|y_{1:t})$ is then represented by a weighted sample $\{(x_{1:t}^i, w_t^i), i = 1, \ldots, N\}$. Suggested choices of $q(x_{1:t})$, including those in which the data are used to guide the particles to regions of high likelihood, are found in Pitt and Shephard (1999); Doucet et al. (2000); Cappe et al. (2007) and references therein.

We approximate the marginal posterior distribution by the empirical probability measure
$$\pi^N(x_{1:t}) \equiv \pi^N(x_{1:t}|y_{1:t}) = \sum_{i=1}^{N} W_t^i\,\delta_{x_{1:t}^i}(x_{1:t}),$$
where $W_t^i \equiv w_t^i/\sum_{j=1}^{N} w_t^j$ is the normalized weight of sample $i$ at time $t$ and $\delta_{x_{1:t}^i}(x_{1:t})$ denotes the Dirac measure at the point $x_{1:t}^i$.

In practice, when $t$ increases, very few particles (in the limit a single one) dominate the entire sample, resulting in a very small number of particles with non-zero weights to approximate the posterior distribution. This is an unavoidable problem in sequential importance sampling when the sample space of the target distributions increases over time (Liu, 2001; Doucet and Johansen, 2009). To bypass this problem, resampling steps (Doucet et al., 2001; Douc and Cappe, 2005) have been introduced as a simple means of reducing the sample impoverishment (also known as weight degeneracy). The resampling algorithms produce multiple copies of particles with high weights and eliminate those with negligible weights. Since the repeated use of resampling implies an impoverishment of the particles, resampling should only be performed when needed, typically when the variation in the weights is high. Two simple criteria are the effective sample size and the Shannon entropy, defined by (Cappe et al., 2005)
$$\mathrm{ESS}_t = 1\Big/\sum_{i=1}^{N} (W_t^i)^2, \qquad \mathrm{Ent}_t = -\sum_{i=1}^{N} W_t^i \log_2 W_t^i.$$
The maximum values of the effective sample size and the Shannon entropy are $N$ and $\log_2 N$, respectively, both obtained when all the weights are equal. Resampling can be applied when one of these measures falls below a threshold. Algorithm 1 describes the whole procedure in detail (the functionals involved will be discussed in the next subsection).
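Both criteria are cheap to evaluate from the normalized weights. A minimal sketch in Python (the threshold fraction `ess_frac` is an illustrative choice, not prescribed by the paper):

```python
import numpy as np

def ess(W):
    """Effective sample size ESS_t = 1 / sum_i (W_t^i)^2."""
    return 1.0 / np.sum(W ** 2)

def entropy(W):
    """Shannon entropy Ent_t = -sum_i W_t^i log2 W_t^i."""
    W = W[W > 0]  # convention: 0 * log2(0) = 0
    return -np.sum(W * np.log2(W))

def should_resample(W, ess_frac=0.5):
    """Trigger resampling when the ESS falls below a fraction of N."""
    return ess(W) < ess_frac * len(W)

# Equal weights attain the maxima N and log2(N).
W_eq = np.full(100, 1 / 100)
```

A degenerate weight vector (all mass on one particle) gives ESS 1 and entropy 0, triggering a resampling step under this rule.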

Theoretical results for sequential Monte Carlo methods, related to asymptotic convergence, can be found in Chopin (2004); Del Moral (2004); Cappe et al. (2005); Douc et al. (2014). In summary, under weak assumptions, convergence to the sequence of target distributions is ensured as the number of particles increases to infinity. In cases where interest is in the filter distribution $\pi(x_t|y_{1:t})$, that is, when only the last state variable is considered, the Monte Carlo error is uniformly bounded over time (Doucet et al., 2000; Douc et al., 2014).


2.2 Inference on path functionals

In many cases, including parameter estimation, functionals of the whole path $x_{1:t}$ are of interest, e.g. $\pi(S_t(x_{1:t})|y_{1:t})$ for some function $S_t: \mathcal{X}^t \to \mathbb{R}^d$. In this paper we will consider a class of recursive path functionals:

Definition 1 Let $\{S_t(x_{1:t})\}_{t>1}$ be a sequence of measurable functions defined on $\mathcal{X}^t \to \mathbb{R}^d$. If there exist functions $T_t$ on $\mathbb{R}^d \times \mathcal{X}^2 \to \mathbb{R}^d$ such that
$$S_t(x_{1:t}) = T_t(S_{t-1}(x_{1:t-1}), x_{t-1:t}), \qquad (1)$$
then $S_t(x_{1:t})$ is called a recursive path functional. An additive recursive path functional has the following additive structure:
$$S_t(x_{1:t}) = S_{t-1}(x_{1:t-1}) + s_t(x_{t-1}, x_t). \qquad (2)$$

Such functionals allow for sequential updates without increasing the computational cost in online applications. In several applications, such as climate science or financial risk modeling (Wilks, 2011; Tsay, 2005), practitioners are interested in computing $\mathbb{P}(\max(x_{1:t}) > c_t|y_{1:t})$ for some critical value $c_t$. Since $\max(x_{1:t}) = \max(\max(x_{1:t-1}), x_t)$, this quantity is easily encapsulated as a recursive path functional. Some simple examples of additive recursive path functionals are
$$S_t(x_{1:t}) = \sum_{k=1}^{t} x_k = S_{t-1}(x_{1:t-1}) + x_t,$$
$$S_t(x_{1:t}) = \sum_{k=2}^{t} x_k x_{k-1} = S_{t-1}(x_{1:t-1}) + x_t x_{t-1},$$
which can be used to derive sample means and lag-one correlations. Quantities involved in maximum likelihood estimation through the scoring algorithm or the EM algorithm will also in many cases be recursive path functionals (Doucet and Tadic, 2003; Cappe et al., 2005; Poyiadjis et al., 2011; Nemeth et al., 2013). Algorithm 1 shows pseudo-code for a standard particle filter including updating of additive functionals. Note that the notation within this algorithm is a bit sloppy with respect to the resampling step, in that the set $\{(x_{1:t}^i, S_t^i), i = 1, \ldots, N\}$ from steps 3 and 4 is not equal to the set with the same notation obtained in step 6.


Algorithm 1 Particle Filter.

(For all steps involving variables with index i, a loop over i = 1, ..., N is performed.)

1: set $x_0^i = \emptyset$ and $w_0^i = 1$.
2: for t = 1 to T do
3: sample $x_t^i \sim q(x_t|x_{t-1}^i, y_t)$ and set $x_{1:t}^i = (x_{1:t-1}^i, x_t^i)$;
4: update the functional $S_t^i = S_{t-1}^i + s(x_{t-1:t}^i)$;
5: compute and normalize the weights
$$w_t^i = w_{t-1}^i\,\frac{\pi(x_t^i|x_{t-1}^i)\,\pi(y_t|x_t^i)}{q(x_t^i|x_{t-1}^i, y_t)};$$
6: resample N new sets of particles from $\{(x_t^i, S_t^i, w_t^i), i = 1, \ldots, N\}$. ▷ Optional
7: end for
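Algorithm 1 can be sketched in Python for the linear Gaussian model used later in section 4 (transition $\mathcal{N}(\alpha + \phi x_{t-1}, \sigma^2)$, observations $\mathcal{N}(x_t, 1)$). This sketch uses the bootstrap proposal $q(x_t|x_{t-1}, y_t) = \pi(x_t|x_{t-1})$, so the incremental weight in step 5 reduces to $\pi(y_t|x_t)$; the ESS-triggered resampling and the functional $S_t = \sum_k x_k$ are illustrative choices, not prescriptions from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(y, alpha, phi, sigma, N=500, ess_frac=0.5):
    """Algorithm 1 with the bootstrap proposal, tracking the
    additive functional S_t = sum_k x_k for each particle."""
    x = rng.normal(alpha, sigma / np.sqrt(1 - phi ** 2), N)  # x_1 from prior
    S = x.copy()                                             # S_1 = x_1
    logw = -0.5 * (y[0] - x) ** 2                            # N(x_t, 1) obs density
    for t in range(1, len(y)):
        x = alpha + phi * x + sigma * rng.normal(size=N)     # step 3: propagate
        S = S + x                                            # step 4: update functional
        logw += -0.5 * (y[t] - x) ** 2                       # step 5: reweight
        W = np.exp(logw - logw.max())
        W /= W.sum()
        if 1.0 / np.sum(W ** 2) < ess_frac * N:              # step 6: optional resample
            idx = rng.choice(N, size=N, p=W)
            x, S = x[idx], S[idx]
            logw = np.zeros(N)
    W = np.exp(logw - logw.max())
    W /= W.sum()
    return x, S, W

x, S, W = particle_filter(rng.normal(size=200), alpha=0.0, phi=0.9, sigma=1.0)
```

The placeholder observations above are arbitrary; in the experiments of section 4 they would be simulated from the model itself.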

Estimation of recursive path functionals involves estimation of the complete collection of hidden states, that is, approximation of $\pi(x_{1:t}|y_{1:t})$. Due to path degeneracy, where all but the last states of the path degenerate, algorithms based on particle approximations are inefficient for estimation of path functionals in datasets with a long time horizon. Path degeneracy is a direct consequence of the resampling stage, which in turn is originally a problem of importance sampling on target spaces of increasing dimension, in which the importance weights collapse to give significant weight to only a few particles (Del Moral, 2004; Olsson et al., 2008; Doucet and Johansen, 2009; Poyiadjis et al., 2011). In particular, Poyiadjis et al. (2011) showed that for additive functionals
$$\mathbb{V}\Big[\int_{\mathcal{X}^t} S_t(x_{1:t})\,\pi^N(dx_{1:t})\Big] \geq C N^{-1} t^2,$$
where $C$ is a positive constant and the variance is with respect to the uncertainty due to the particle filter algorithm. A variance increasing at least quadratically with $t$ makes the use of SMC methods inappropriate for recursive path functional approximation. As a direct consequence, when $t$ is large enough, the earlier part of the particle path will, after resampling, with high probability be represented by only one single trajectory. Such problems are substantial in cases where $\pi(x_s|y_{1:s-1})$ differs considerably from $\pi(x_s|y_{1:s})$.

In order to reduce the sample impoverishment, one popular way of increasing the diversity among the particles is to move or update each particle to a new random position (Gilks and Berzuini, 2001). However, these strategies are subject to increased computational complexity or memory requirements, and may be too expensive in practice for real-time applications where it is necessary to rejuvenate the complete sample path.

2.3 Online Bayesian parameter estimation

In Bayesian scenarios we assume some prior model, $\pi(\theta)$ say, and our aim now is online inference about $\pi(x_t, \theta|y_{1:t})$. In general we have
$$\pi(x_{1:t}, \theta|y_{1:t}) \propto \pi(x_{1:t-1}|y_{1:t-1})\,\pi(\theta|x_{1:t-1}, y_{1:t-1})\,\pi(x_t|x_{t-1}, \theta)\,\pi(y_t|x_t, \theta).$$
Assuming now that weighted samples $\{(x_{1:t-1}^i, w_{t-1}^i), i = 1, \ldots, N\}$ are available at time $t-1$, new samples at time $t$ can be obtained by first simulating $\theta$ from some proposal distribution $q(\theta|x_{1:t-1}, y_{1:t})$, followed by simulating $x_t$ from some proposal distribution $q(x_t|\theta, x_{1:t-1}, y_{1:t})$. The importance weights are then updated by
$$w_t = \frac{\pi(x_{1:t-1}|y_{1:t-1})\,\pi(\theta|x_{1:t-1}, y_{1:t-1})\,\pi(x_t|x_{t-1}, \theta)\,\pi(y_t|x_t, \theta)}{q(x_{1:t-1}|y_{1:t-1})\,q(\theta|x_{1:t-1}, y_{1:t})\,q(x_t|\theta, x_{1:t-1}, y_{1:t})} \propto w_{t-1}\,\frac{\pi(\theta|x_{1:t-1}, y_{1:t-1})\,\pi(x_t|x_{t-1}, \theta)\,\pi(y_t|x_t, \theta)}{q(\theta|x_{1:t-1}, y_{1:t})\,q(x_t|\theta, x_{1:t-1}, y_{1:t})}.$$
Note that it is sufficient that $w_{t-1}$ is any weight function which makes the sample $x_{1:t-1}$ proper. In order to obtain a genuinely online algorithm, both the proposal distribution and the weights must be easily updated as new observations arrive. Fearnhead (2002) and Storvik (2002) assumed the existence of some low-dimensional sufficient statistic $S_t(x_{1:t})$ for $\theta$ which is a recursive path functional, and further that simulation of $\theta$ can be performed from $\pi(\theta|x_{1:t-1}, y_{1:t-1}) = \pi(\theta|S_{t-1}(x_{1:t-1}))$. In this case, the weights simplify to
$$w_t \propto w_{t-1}\,\frac{\pi(x_t|x_{t-1}, \theta)\,\pi(y_t|x_t, \theta)}{q(x_t|\theta, x_{1:t-1}, y_{1:t})}.$$
Pseudo-code for the resulting algorithm is given in Algorithm 2 for the case where the sufficient statistic is of the additive form (2). Note that the parameter $\theta$ is not stored from one time step to another; this is an application of the so-called Rao-Blackwellisation method, where $\theta$ is integrated out. As pointed out by e.g. Andrieu et al. (2005), this algorithm suffers from the same degeneracy problem as the path functionals described in the previous section.


Algorithm 2 Rao-Blackwellised parameter estimation by Particle filter.

(For all steps involving variables with index i, a loop over i = 1, ..., N is performed.)

1: set $x_0^i = \emptyset$, $S_0^i = \emptyset$ and $w_0^i = 1$.
2: for t = 1 to T do
3: sample $\theta^i \sim \pi(\theta|S_{t-1}^i)$;
4: sample $x_t^i \sim q(x_t|\theta^i, x_{t-1}^i, y_t)$ and set $x_{1:t}^i = (x_{1:t-1}^i, x_t^i)$;
5: update the sufficient statistic $S_t^i = S_{t-1}^i + s(x_{t-1:t}^i)$;
6: compute and normalize the weights
$$w_t^i = w_{t-1}^i\,\frac{\pi(x_t^i|x_{t-1}^i; \theta^i)\,\pi(y_t|x_t^i; \theta^i)}{q(x_t^i|\theta^i, x_{t-1}^i, y_t)}, \quad i = 1, \ldots, N;$$
7: resample N new sets of particles from $\{(x_t^i, S_t^i, w_t^i), i = 1, \ldots, N\}$. ▷ Optional
8: end for
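The only new ingredient relative to Algorithm 1 is step 3, the draw from $\pi(\theta|S_{t-1})$. As an illustration (a toy model of ours, not the paper's example), consider a random walk with drift, $x_t = \theta + x_{t-1} + \varepsilon_t$, with known noise variance and a Gaussian prior on $\theta$; the additive statistic $S_t = \sum_k (x_k - x_{k-1})$ is then sufficient and the draw is a standard conjugate normal update:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_theta(S, t, sigma2=1.0, m0=0.0, v0=10.0):
    """Step 3 of Algorithm 2 for the drift model x_t = theta + x_{t-1} + eps,
    eps ~ N(0, sigma2), prior theta ~ N(m0, v0). With the additive sufficient
    statistic S_t = sum_k (x_k - x_{k-1}), pi(theta | S_t) is Gaussian."""
    v = 1.0 / (1.0 / v0 + t / sigma2)   # posterior variance
    m = v * (m0 / v0 + S / sigma2)      # posterior mean
    return rng.normal(m, np.sqrt(v))

# with 1000 increments summing to 500, draws concentrate near a drift of 0.5
draws = [sample_theta(S=500.0, t=1000) for _ in range(200)]
```

Each particle carries its own statistic $S_{t-1}^i$, so one such draw is made per particle at every time step.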


3 Particle Blocking Filter

In this section we propose modifications of the path functional and parameter estimation algorithms described in sections 2.2 and 2.3 that avoid the degeneracy problems previously discussed. Our aim is to perform online inference on path functionals and/or static parameters using sequential Monte Carlo techniques that avoid degeneracy. Note that this is more ambitious than the approximation of expectations considered in e.g. Olsson et al. (2008).

3.1 Particle Blocking Filter for inference on additive path functionals

For simplicity of notation, we omit the dependence on the static parameters $\theta$, which in this section we assume to be known. Simultaneous inference on $\theta$ will be considered in section 3.2.

Consider the additive structure (2), where we now divide the time interval $1:t$ into sub-intervals of length $l$. For notational simplicity, consider a $t$ corresponding to the end of a block, $t = n_t l$. Then
$$S_t(x_{1:t}) = \sum_{b=1}^{n_t} S_b(x_{(b-1)l:bl}),$$
where
$$S_b \equiv S_b(x_{(b-1)l:bl}) = \sum_{u=(b-1)l+1}^{bl} s_u(x_{u-1:u}).$$

Simulation from $\pi(S_t(x_{1:t})|y_{1:t})$ can be obtained through simulations from $\pi(S_1, S_2, \ldots, S_{n_t}|y_{1:t})$. Due to the particle degeneracy discussed earlier, ordinary particle filters will only give one or a few significant particles for $S_b$ for all but the largest $b$'s. In the interest of maintaining particle diversity, we introduce two low-cost approximations for $\pi(S_1, S_2, \ldots, S_{n_t}|y_{1:t})$:
$$\pi(S_1, S_2, \ldots, S_{n_t}|y_{1:t}) \approx \prod_{b=1}^{n_t} \pi(S_b|y_{1:t}) \qquad \text{(conditional independence)}$$
$$\approx \prod_{b=1}^{n_t} \pi(S_b|y_{1:bl+k}) \qquad \text{(fixed-lag approximation)}$$


for some appropriately chosen lag $k$. The first approximation is a form of composite likelihood approximation (Varin et al., 2011), while the second, fixed-lag approximation is similar to the approach in Olsson et al. (2008).

Assume now that, through a particle filter, we have available a properly weighted sample $\{(x_{(b-1)l}^i, w_{(b-1)l}^i)\}$ from $\pi(x_{(b-1)l}|y_{1:(b-1)l})$. Run the particle filter further to time $bl + k$, giving samples from $\pi(x_{(b-1)l:bl}|y_{1:bl+k})$, which in turn gives samples of $S_b$ from $\pi(S_b|y_{1:bl+k})$. These samples are then stored and not further modified. In order to obtain approximate samples from $\pi(S_t(x_{1:t})|y_{1:t})$, samples from each block are drawn independently. The full details of the algorithm are described in Algorithm 3 for $k = 0$. The more general setting with $k \geq 0$ is considered in Appendix A.

The algorithm can easily be extended to more general path functionals. Steps 5 and 9 then have to be replaced by the more general updating schemes defined through (1).

Algorithm 3 Particle Blocking Filter with block length l.

(For all steps involving variables with index i, a loop over i = 1, ..., N is performed.)

1: set $x_0^i = \emptyset$, $w_0^i = 1$, $S_0^i = 0$, $S_{1,0}^i = 0$, $b = 1$ and $u = 0$.
2: for t = 1 to T do
3: sample $x_t^i \sim q(\cdot|x_{t-1}^i, y_t)$ and set $x_{1:t}^i = (x_{1:t-1}^i, x_t^i)$; ▷ Start filtering
4: compute and normalize weights $w_t^i \propto w_{t-1}^i\,\dfrac{\pi(x_t^i|x_{t-1}^i)\,\pi(y_t|x_t^i)}{q(x_t^i|x_{t-1}^i, y_t)}$;
5: increase $u \to u+1$ and update the local functional $S_{b,u}^i = S_{b,u-1}^i + s_t(x_{t-1:t}^i)$;
6: resample N new sets of particles from $\{(x_t^i, S_{b,u}^i, w_t^i)\}$. ▷ Optional
7: if u = l then ▷ New block
8: put $S_b^i = S_{b,l}^i$ and resample N new sets of particles from $\{(S_b^i, w_t^i), i = 1, \ldots, N\}$ to obtain equally weighted particles;
9: update the global functional $S_{bl}^i = S_{(b-1)l}^i + S_b^i$;
10: increase $b \leftarrow b+1$ and put $u = 0$, $S_{b,0}^i = 0$.
11: end if
12: end for
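A sketch of Algorithm 3 in Python, under the same illustrative assumptions as before (bootstrap proposal, $\mathcal{N}(x_t, 1)$ observations, $s_t(x_{t-1:t}) = x_t$, lag $k = 0$). For simplicity this sketch resamples the state particles jointly at block ends only, rather than separately via the optional step 6, and a partial final block is ignored:

```python
import numpy as np

rng = np.random.default_rng(2)

def particle_blocking_filter(y, alpha, phi, sigma, N=500, l=50):
    """Algorithm 3 sketch: the local functional S_{b,u} accumulates x_t
    within each block; at a block end the block sums are resampled to
    equal weights and frozen into the global functional."""
    x = rng.normal(alpha, sigma / np.sqrt(1 - phi ** 2), N)
    S_local = x.copy()                                    # S_{1,1}
    S_global = np.zeros(N)
    logw = -0.5 * (y[0] - x) ** 2
    u = 1
    for t in range(1, len(y)):
        x = alpha + phi * x + sigma * rng.normal(size=N)  # step 3: propagate
        logw += -0.5 * (y[t] - x) ** 2                    # step 4: reweight
        u += 1
        S_local = S_local + x                             # step 5: local update
        if u == l:                                        # steps 7-10: new block
            W = np.exp(logw - logw.max())
            W /= W.sum()
            idx = rng.choice(N, size=N, p=W)              # step 8: resample block
            S_global = S_global + S_local[idx]            # step 9: freeze block sum
            x, logw = x[idx], np.zeros(N)                 # equally weighted restart
            S_local, u = np.zeros(N), 0
    return S_global

S = particle_blocking_filter(rng.normal(size=500), 0.0, 0.9, 1.0)
```

Because each completed block is frozen and never resampled again, the early blocks retain their particle diversity, which is exactly what the standard filter loses through path degeneracy.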


3.2 Particle block filter for online Bayesian parameter estimation

Consider now the more general setting of simultaneous online inference on the underlying states and the static parameters $\theta$. Here, we limit ourselves to exploring blocking strategies when the conditional distribution of the static parameters given the underlying states depends on some low-dimensional sufficient statistic that is a recursive path functional.

The general idea is similar to that of the Rao-Blackwellised approach to parameter estimation described in section 2.3, in that the parameters are marginalised out by including sufficient statistics in the simulation procedure. For these sufficient statistics, the blocking strategy from section 3.1 is then applied. More formally, our interest is in simulation from $\pi(x_{t+1}, \theta|y_{1:t+1})$, which we now aim to achieve by simulation from the extended distribution $\pi(S_1, S_2, \ldots, S_{n_t}, x_{t+1}, \theta|y_{1:t+1})$. Simulation from this extended distribution is based on the following relationship:
$$\pi(S_1, S_2, \ldots, S_{n_t}, x_{t+1}, \theta|y_{1:t+1}) \propto \pi(S_1, S_2, \ldots, S_{n_t}|y_{1:t})\,\pi(\theta|S_1, S_2, \ldots, S_{n_t})\,\pi(x_{t+1}|\theta, S_{n_t})\,\pi(y_{t+1}|x_{t+1}),$$
where we now use the approximation
$$\pi(S_1, S_2, \ldots, S_{n_t}|y_{1:t}) \approx \prod_{b=1}^{n_t} \pi(S_b|y_{1:bl})$$
for some properly chosen block length $l$. This approximation again avoids degeneracy of the elements $S_b$ that go into the sufficient statistics needed for simulation of the parameters $\theta$. Algorithm 4 describes the procedure in this case. Comparing this with Algorithm 2, we see that the use of sufficient statistics for parameter updating is the same, but the main idea is to robustify the sampling of the sufficient statistics with respect to the degeneracy problem of path functionals, similarly to Algorithm 3. Note that the lag $k$ from Algorithm 3 is set to zero in this case. Also in this case, extensions to more general path functionals are possible.

Algorithm 4 Particle Blocking Filter with simultaneous parameter simulation.

(For all steps involving variables with index i, a loop over i = 1, ..., N is performed.)

1: set $x_0^i = \emptyset$, $S_0^i = 0$, $S_{1,0}^i = 0$, $w_0^i = 1$, $b = 1$ and $u = 0$.
2: for t = 1 to T do
3: sample $\theta^i \sim \pi(\theta|S_{t-1}^i)$. ▷ Start filtering
4: sample $x_t^i \sim q(\cdot|x_{t-1}^i, y_t; \theta^i)$ and set $x_{1:t}^i = (x_{1:t-1}^i, x_t^i)$.
5: compute and normalize weights $w_t^i \propto w_{t-1}^i\,\dfrac{\pi(x_t^i|\theta^i, x_{t-1}^i)\,\pi(y_t|x_t^i)}{q(x_t^i|x_{t-1}^i, y_t; \theta^i)}$.
6: increase $u \to u+1$ and update the local functional $S_{b,u}^i = S_{b,u-1}^i + s_t(x_{t-1:t}^i)$.
7: resample N sets of new particles from $\{(x_t^i, S_{b,u}^i, w_t^i), i = 1, \ldots, N\}$. ▷ Optional
8: set $S_t^i = S_{(b-1)l}^i + S_{b,u}^i$.
9: if u = l then ▷ New block
10: put $S_b^i = S_{b,l}^i$ and resample N sets of new particles from $\{(S_b^i, w_t^i), i = 1, \ldots, N\}$ to obtain equally weighted particles;
11: update the global functional $S_{bl}^i = S_{(b-1)l}^i + S_b^i$;
12: increase $b \leftarrow b+1$ and put $u = 0$, $S_{b,0}^i = 0$.
13: end if
14: end for

3.3 Computational issues and implications for implementation

The particle blocking filter aims to minimize path degeneracy without increasing the computational cost or memory requirements. In practice, we introduce one extra component in the particle filter algorithms that needs to be specified: the length of the blocks. An optimal choice of the block length depends at least on (i) the data-generating process $\{(x_t, y_t)\}_{t>1}$ and (ii) the purpose for which the particle algorithms are used, for instance approximating some test function or credibility intervals for the hyperparameters. In addition, $l$ should increase as a function of the number of particles in order to reduce the approximation error. Adaptive block-splitting can also be considered; for example, blocks might be split when the number of unique particles becomes relatively small.


4 Simulation experiments

In this section, we demonstrate the efficiency of the particle blocking methods in a univariate partially linear Gaussian state space model. The latent model is described as follows:
$$x_1 \sim \mathcal{N}(\alpha, \sigma^2/(1-\phi^2)),$$
$$x_t|x_{t-1} \sim \mathcal{N}(\alpha + \phi x_{t-1}, \sigma^2), \qquad t = 2, 3, \ldots$$
Here, $\alpha$ is a location parameter while $\phi \in (0, 1)$ describes the autocorrelation in the latent process. Our interest lies in approximating the posterior distribution of some recursive path functional, $\pi(S_t(x_{1:t})|y_{1:t})$, in a context where online particle methods are required. In our simulation studies we use the stratified resampling algorithm (Douc and Cappe, 2005) and a time lag $k$ equal to the block length $l$.

4.1 Blocking for approximating functionals

Consider first a fully linear Gaussian state space model where the observation model is given by
$$y_t|x_t \sim \mathcal{N}(x_t, 1). \qquad (3)$$
For this observation model, exact computations can be performed through Kalman techniques, making it easy to evaluate the performance of the sequential Monte Carlo algorithms. Assuming the static parameters $\theta = (\alpha, \phi, \sigma)$ are known, we are interested in comparing the posterior distributions of $S_{1,t} = t^{-1}\sum x_t^2$ and $S_{2,t} = t^{-1}\sum x_t x_{t-1}$ obtained by the standard particle filter (Algorithm 1) and by the particle blocking filter (Algorithm 3), as well as the exact solutions obtained by the Kalman filter/smoother. For both particle filters, the proposal distribution was $p(x_t|x_{t-1}^i, y_t, \theta)$.

In Figure 1 we report the trajectories of the posterior mean together with the lower and upper 10% posterior quantiles of $S_{1,t}$ and $S_{2,t}$ on a simulated dataset of length 50 000, with two choices of block size, for $N = 100$ and $N = 1\,000$. As highlighted previously (and also noticed in the literature; Andrieu et al., 2005), we can see the effect of path degeneracy in the standard particle algorithm. The blocking schemes, on the other hand, give much more accurate results at almost every time point, also at the end of the time series. Moreover, variations in the block size and smoothing lag ($l = k = 50$ and $l = k = 100$) give similar results, in particular when the number of particles is $N = 1\,000$.

In order to evaluate the performance for different block sizes under two scenarios of the target distributions, Figure 2 shows the estimate of the following measure:
$$d_t = \sup_v |F_{S_t}^N(v) - F_{S_t}(v)|,$$
where $F_{S_t}^N$ is the empirical distribution function obtained by the particle filter (either standard or blocked), and $F_{S_t}$ is obtained by the Kalman smoother. As we can see from the figure, the performance of the blocking algorithms is highly superior to that of the standard method in both scenarios of the static parameters. These empirical results are in agreement with what we would expect when excessive use of resampling steps over the entire path is avoided: the sample impoverishment in the genealogical tree is significantly reduced during the simulation process. Moreover, the blocking algorithms seem to be quite robust with respect to the choice of block size. Comparable results (not reported here) were achieved with different numbers of particles.
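The distance $d_t$ can be estimated from a weighted particle sample by evaluating both distribution functions at the jump points of the empirical CDF. A minimal sketch (the standard normal reference below is an arbitrary stand-in for the Kalman-smoother distribution used in the paper):

```python
import numpy as np
from math import erf

def sup_cdf_distance(samples, weights, ref_cdf):
    """Estimate d_t = sup_v |F^N(v) - F(v)|, where F^N is the weighted
    empirical CDF. The supremum is attained at a jump of F^N, so both
    sides of every jump are checked."""
    order = np.argsort(samples)
    s, w = samples[order], weights[order]
    Fn = np.cumsum(w)                          # value just after each jump
    F = np.array([ref_cdf(v) for v in s])
    return max(np.abs(Fn - F).max(), np.abs((Fn - w) - F).max())

std_normal_cdf = lambda v: 0.5 * (1 + erf(v / np.sqrt(2)))

rng = np.random.default_rng(4)
xs = rng.normal(size=5000)
ws = np.full(5000, 1 / 5000)
d = sup_cdf_distance(xs, ws, std_normal_cdf)   # small: sample matches reference
```

Shifting the sample away from the reference (e.g. `xs + 1.0`) drives the estimate towards the true sup-distance between the two normal CDFs, illustrating the sensitivity of $d_t$.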



Figure 1: Estimates of the posterior mean and percentiles (10%, 90%) of $S_{1,t} = t^{-1}\sum x_t^2$ (left) and $S_{2,t} = t^{-1}\sum x_t x_{t-1}$ (right) for the guided particle filter (gray, dashed), the particle blocking filter with $l = k = 50$ (light blue, solid) and $l = k = 100$ (dark blue, dot-dashed), and the Kalman smoother (black, dotted). Fixed static parameters: $\theta = (\alpha, \phi, \sigma) = (0, 0.9, 1)$, using $N = 100$ (top) and $N = 1\,000$ (bottom).


Figure 2: Time series plots of $d_t$ for $S_{1,t} = t^{-1}\sum x_t^2$ (left) and $S_{2,t} = t^{-1}\sum x_t x_{t-1}$ (right) in two scenarios using $N = 10^3$. Top: $\theta = (\alpha, \phi, \sigma) = (0, 0.5, 0.5)$. Bottom: $\theta = (0, 0.95, 0.5)$. Methods: standard particle filter (gray, dashed), particle blocking filter with $l = k = 50$ (light blue, solid), $l = k = 100$ (dark blue, dot-dashed) and $l = k = 200$ (blue, long-dashed).


4.2 Particle blocking with unknown static parameters

In this section we consider the performance of particle blocking algorithms when the static parameters are unknown, in this case through a stochastic volatility model. Consider the following observation model:

$$y_t \mid x_t \sim N(y_t; 0, e^{x_t}). \quad (4)$$

Introduce $z_t = (1, x_{t-1})'$ and $\beta = (\alpha, \phi)'$, so that the latent model can be rewritten as

$$x_t = z_t'\beta + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \tau^{-1}).$$

Then we proceed via full Bayesian inference, using the prior

$$\pi(\beta, \tau) = \pi(\beta \mid \tau)\,\pi(\tau) = N\bigl(\mu_\beta, (\tau Q_\beta)^{-1}\bigr)\,\mathrm{Gamma}(a_\tau, b_\tau).$$

For the given model,

$$p(x_{1:T} \mid \beta, \tau) \propto \prod_{t=1}^T \exp\Bigl\{-\tfrac{\tau}{2}(x_t - z_t'\beta)^2\Bigr\} \propto \exp\Bigl\{-\tfrac{\tau}{2}\sum_{t=1}^T (x_t - z_t'\beta)^2\Bigr\}$$

$$\propto \exp\Bigl\{-\tfrac{\tau}{2}\Bigl[\sum_{t=1}^T x_t^2 - 2\sum_{t=1}^T x_t z_t'\beta + \beta'\sum_{t=1}^T z_t z_t'\,\beta\Bigr]\Bigr\},$$

showing that sufficient statistics for $\theta = (\beta, \tau)$ are

$$\sum_{t=1}^T x_t^2, \qquad \sum_{t=1}^T x_t z_t, \qquad \sum_{t=1}^T z_t z_t'.$$
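To make the recursions concrete, the sketch below (our illustration; function names are ours) updates the three sufficient statistics one time step at a time and draws $(\beta, \tau)$ from the implied normal-gamma conditional, using the standard conjugate updates for the prior above:

```python
import numpy as np

def update_stats(stats, x_t, x_prev):
    """One-step recursive update of (sum x_t^2, sum x_t z_t, sum z_t z_t')
    with z_t = (1, x_{t-1})'."""
    sxx, sxz, szz = stats
    z = np.array([1.0, x_prev])
    return sxx + x_t**2, sxz + x_t * z, szz + np.outer(z, z)

def draw_beta_tau(stats, T, rng, mu_b=np.zeros(2), Q_b=np.eye(2),
                  a_tau=1.0, b_tau=1.0):
    """Draw (beta, tau) from the normal-gamma conditional given the
    sufficient statistics (conjugate update under the prior in the text)."""
    sxx, sxz, szz = stats
    Qn = Q_b + szz                                   # posterior precision factor
    mun = np.linalg.solve(Qn, Q_b @ mu_b + sxz)      # posterior mean of beta
    an = a_tau + T / 2.0
    bn = b_tau + 0.5 * (sxx + mu_b @ Q_b @ mu_b - mun @ Qn @ mun)
    tau = rng.gamma(an, 1.0 / bn)                    # Gamma(an, rate=bn)
    beta = rng.multivariate_normal(mun, np.linalg.inv(tau * Qn))
    return beta, tau
```

In a particle filter each particle carries its own copy of the statistics, so the parameter draw costs O(1) per particle per time step.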

In our simulation, we use $\mu_\beta = 0$, $Q_\beta = I$ and $a_\tau = b_\tau = 1$. In order to reduce the instability of the online particle algorithms when static parameters are unknown, we initialize the algorithms (Rao-Blackwellised and blocking) with the following strategies: for both schemes we run the first 500 time points using full MCMC simulations; in addition, for the blocking algorithm, we postpone the time point at which the first block is created (in our simulation we start blocking at $t = 2\,500$).

Figure 3 shows density estimates of the posterior distribution for three sufficient statistics and for all unknown static parameters at $t = 10^4$. The sufficient statistics obtained by the particle blocking filter do not degenerate, which is the case for the Rao-Blackwellised Algorithm 2. Since the summary statistics in this case vary little in scale, only a modest gain was observed in terms of posterior estimation of the parameters.


Figure 3: Density estimates of the posterior distribution of model (4) for $N = 2\,000$ and $t = 10\,000$. Top panel: $t^{-1}\sum x_t$ (left), $t^{-1}\sum x_t^2$ (middle) and $t^{-1}\sum x_t x_{t-1}$ (right). Bottom panel: $\alpha$, $\phi$ and $\tau$ with true values (red, vertical dotted line). Methods: Rao-Blackwellised particle filter (gray), particle blocking filter (blue, $l = k = 50$) and MCMC via BUGS (black).


5 Discussion

In this paper we have introduced a simple strategy based on creating blocks in particle filter algorithms, with the aim of reducing the effect of path degeneracy when dealing with large datasets. As has been extensively noted in the literature on online sequential Monte Carlo inference, path degeneracy is a common problem when approximating a sequence of target distributions. It becomes more evident for long time series and complex models, and even more so when Monte Carlo samples are expensive. The key idea of our approach is to introduce a block independence approximation for the posterior distribution of path recursive functionals.

With our experimental studies, we show that the particle blocking strategies outperform the standard particle filter algorithms. We also show that our methodology is stable with respect to the block size. Although we have built our scheme assuming a fixed length for each block, adaptive strategies taking the level of degeneracy into account can be considered. Lastly, particle blocking algorithms are very attractive for online Bayesian applications, with a computational complexity that is linear in time and in the number of particles, equivalent to the ordinary particle filter.

Acknowledgments

We gratefully acknowledge financial support from CAPES-Brazil and the Statistics for Innovation Centre, Norway.

A Particle blocking filter with general smoothing lag


Algorithm 5 Particle Blocking Filter with block length $l$ and smoothing lag $k > 0$.

(For all steps involving variables with index $i$, a loop over $i = 1, \ldots, N$ is performed.)

1: set $x_0^i = \emptyset$, $w_0^i = 1$, $S_0^i = 0$, $\bar{S}_0^i = 0$, $b = 1$ and $u = 0$.
2: for $t = 1$ to $T$ do
3: sample $x_t^i \sim q(\cdot \mid x_{t-1}^i, y_t)$. ▷ Start filtering
4: compute and normalize the weights $w_t^i = w_{t-1}^i \dfrac{\pi(x_t^i \mid x_{t-1}^i)\,\pi(y_t \mid x_t^i)}{q(x_t^i \mid x_{t-1}^i, y_t)}$.
5: increase $u \to u + 1$ and update the local functional $S_{b,u}^i = S_{b,u-1}^i + s_t(x_{t-1:t}^i)$;
6: update the local functional $\bar{S}_b^i = \bar{S}_b^i + s(x_{t-1:t}^i)$.
7: resample $N$ new sets of particles from $\{(x_t^i, S_{b-1}^i, \bar{S}_b^i, w_t^i),\ i = 1, \ldots, N\}$. ▷ Optional
8: if $u = k$ then
9: resample from $\{(S_{b-1}^i, w_t^i),\ i = 1, \ldots, N\}$ to obtain equally weighted particles.
10: update the global functional $S_{1:(b-1)l}^i = S_{1:(b-2)l}^i + S_{b-1}^i$.
11: end if
12: if $u = l$ then ▷ New block
13: increase $b \to b + 1$ and put $u = 0$, $S_{b,u}^i = 0$.
14: end if
15: end for
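A minimal Python sketch of Algorithm 5 for the linear Gaussian AR(1) example, with additive functional $s(x_{t-1:t}) = x_t^2$, is given below. It is our own illustration under simplifying assumptions: the bootstrap transition is used as proposal, multinomial resampling is performed at every step (so the separate resampling in step 9 is subsumed by the joint resampling), and all model parameters are assumed known.

```python
import numpy as np

def blocking_filter(y, l, k, N, phi=0.9, sigma=1.0, sigma_y=1.0, seed=0):
    """Particle blocking filter (sketch of Algorithm 5) for
    x_t = phi*x_{t-1} + sigma*eps_t, y_t = x_t + sigma_y*nu_t,
    tracking the path functional sum_t x_t^2 per particle."""
    rng = np.random.default_rng(seed)
    x = np.zeros(N)
    S_prev = np.zeros(N)      # local functional of the previous block
    S_cur = np.zeros(N)       # local functional of the current block
    S_glob = np.zeros(N)      # global functional over frozen blocks
    u = 0
    for t in range(len(y)):
        x = phi * x + sigma * rng.normal(size=N)      # step 3: propagate
        logw = -0.5 * ((y[t] - x) / sigma_y) ** 2     # step 4: weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        u += 1
        S_cur = S_cur + x**2                          # steps 5-6: local update
        idx = rng.choice(N, size=N, p=w)              # step 7: resample
        x, S_prev, S_cur = x[idx], S_prev[idx], S_cur[idx]
        if u == k:                                    # steps 8-11: freeze block b-1
            S_glob = S_glob + S_prev
            S_prev = np.zeros(N)
        if u == l:                                    # steps 12-14: new block
            S_prev, S_cur, u = S_cur, np.zeros(N), 0
    return S_glob + S_prev + S_cur   # per-particle estimate of sum_t x_t^2
```

After the smoothing lag $k$, the previous block's functional is frozen into the global sum and is no longer degraded by subsequent resampling, which is what mitigates the path degeneracy.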


B Extensions to more general proposal distributions

for the parameters

Algorithm 2 requires the ability to sample from $\pi(\theta \mid S_{t-1})$. The approach can, however, be extended to more general settings. Assume now that weighted samples $\{(x_{1:t-1}^i, \theta_{t-1}^i, w_{t-1}^i),\ i = 1, \ldots, N\}$ are available at time $t - 1$, where the weights are proper with respect to $\pi(x_{1:t-1}, \theta_{t-1} \mid y_{1:t-1})$. Assume a new $\theta_t^i$ is generated according to some proposal density $q(\theta_t \mid \theta_{t-1}^i, S_{t-1})$. Further, define an extended target density $p_t(x_{1:t}, \theta_{t-1}, \theta_t)$ containing $\pi(x_{1:t}, \theta_t \mid y_{1:t})$ as its marginal, which implies that

$$p_t(x_{1:t}, \theta_{t-1}, \theta_t) = \pi(x_{1:t}, \theta_t \mid y_{1:t})\,h(\theta_{t-1} \mid \theta_t, x_{1:t})$$

for some arbitrary density $h(\theta_{t-1} \mid \theta_t, x_{1:t})$. Using the extended target density approach by Storvik (2011), proper weights at time $t$ with respect to $\pi(x_{1:t}, \theta_t \mid y_{1:t})$ are given by

$$w_t = \frac{p_t(x_{1:t}, \theta_{t-1}, \theta_t)}{q(x_{1:t}, \theta_{t-1}, \theta_t)}
= \frac{\pi(x_{1:t}, \theta_t \mid y_{1:t})\,h(\theta_{t-1} \mid \theta_t, x_{1:t})}{q(x_{1:t-1}, \theta_{t-1})\,q(\theta_t \mid \theta_{t-1}^i, S_{t-1})\,q(x_t \mid \theta_t, x_{t-1}, y_t)}$$

$$\propto \frac{\pi(x_{1:t-1} \mid y_{1:t-1})\,\pi(\theta_t \mid x_{1:t-1})\,\pi(x_t \mid \theta_t, x_{t-1})\,\pi(y_t \mid x_t)\,h(\theta_{t-1} \mid \theta_t, x_{1:t})}{q_{t-1}(x_{1:t-1}, \theta_{t-1})\,q(\theta_t \mid \theta_{t-1}^i, S_{t-1})\,q(x_t \mid \theta_t, x_{t-1}, y_t)}$$

$$\propto w_{t-1}\,\frac{\pi(\theta_t \mid x_{1:t-1})\,\pi(x_t \mid \theta_t, x_{t-1})\,\pi(y_t \mid x_t)\,h(\theta_{t-1} \mid \theta_t, x_{1:t})}{q(\theta_t \mid \theta_{t-1}^i, S_{t-1})\,q(x_t \mid \theta_t, x_{t-1}, y_t)},$$

where we assume $w_{t-1}$ is a proper weight with respect to the distribution $\pi(x_{1:t-1}, \theta_{t-1} \mid y_{1:t-1})$, with $(x_{1:t-1}, \theta_{t-1})$ generated from $q_{t-1}(x_{1:t-1}, \theta_{t-1})$.

Equivalently, assume $(x_{1:t-1}^i, \theta_{t-1}^i)$ is simulated from $q_{t-1}(x_{1:t-1}, \theta_{t-1})$, so that with

$$w_{t-1}^i(x_{1:t-1}, \theta_{t-1}) = \frac{p_{t-1}(x_{1:t-1}^i, \theta_{t-1}^i)}{q_{t-1}(x_{1:t-1}^i, \theta_{t-1}^i)},$$

we have a properly weighted sample. Let us introduce the extended target density $p_t(x_{1:t}^i, \theta_{t-1}^i, \theta_t^i)$ and, for $t > 1$, propagate $(x_t^i, \theta_t^i)$ from $q_t(x_t, \theta_t \mid x_{1:t-1}^i, \theta_{t-1}^i)$. Therefore, by defining

$$w_t^i = w_t(x_{1:t}^i, \theta_t^i; \theta_{t-1}^i) \equiv \frac{p_t(x_{1:t}^i, \theta_{t-1}^i, \theta_t^i)}{q_{t-1}(x_{1:t-1}^i, \theta_{t-1}^i)\,q_t(x_t^i, \theta_t^i \mid x_{1:t-1}^i, \theta_{t-1}^i)} \qquad (5)$$

we also obtain a properly weighted sample $\{(x_{1:t}^i, \theta_t^i, w_t^i),\ i = 1, \ldots, N\}$ from $p_t(x_{1:t}, \theta_t)$. Factoring the pseudo target distribution as discussed in Marques and Storvik (2013), we can


rewrite the proper weight as

$$w_t(x_{1:t}, \theta_t; \theta_{t-1}) = \frac{p_t(x_{1:t}, \theta_t)\,h(\theta_{t-1} \mid \theta_t, x_{1:t})}{q_{t-1}(x_{1:t-1}, \theta_{t-1})\,q_t(x_t, \theta_t \mid x_{1:t-1}, \theta_{t-1})}$$

$$\propto \frac{\pi(x_{1:t-1} \mid y_{1:t-1})}{q_{t-1}(x_{1:t-1})} \times \frac{\pi(\theta_t \mid x_{1:t-1}, y_{1:t-1})\,\pi(x_t \mid x_{1:t-1}, \theta_t, y_{1:t-1})\,\pi(y_t \mid x_{1:t}, \theta_t, y_{1:t-1})\,h(\theta_{t-1} \mid \theta_t, x_{1:t})}{q_{t-1}(\theta_{t-1} \mid x_{1:t-1})\,q_t(x_t, \theta_t \mid x_{1:t-1}, \theta_{t-1})},$$

where $h(\theta_{t-1} \mid \theta_t, x_{1:t})$ is the backward kernel, which must integrate to one over $\Theta$. With the valid choice of backward kernel $h(\theta_{t-1} \mid \theta_t, x_{1:t}) = q_{t-1}(\theta_{t-1} \mid x_{1:t-1})$, and assuming that $q_{t-1}(x_{1:t-1}) = \pi_{t-1}(x_{1:t-1})$, we have that

$$w_t(x_{1:t}, \theta_t; \theta_{t-1}) = \frac{\pi(\theta_t \mid x_{1:t-1}, y_{1:t-1})\,\pi(x_t \mid x_{1:t-1}, \theta_t, y_{1:t-1})\,\pi(y_t \mid x_{1:t}, \theta_t, y_{1:t-1})\,q_{t-1}(\theta_{t-1} \mid x_{1:t-1})}{q_{t-1}(\theta_{t-1} \mid x_{1:t-1})\,q_t^\theta(\theta_t \mid x_{1:t-1}, y_{1:t-1})\,q_t^x(x_t \mid x_{1:t-1}, \theta_t)}$$

$$= \frac{\pi(\theta_t \mid x_{1:t-1}, y_{1:t-1})\,\pi(x_t \mid x_{1:t-1}, \theta_t, y_{1:t-1})\,\pi(y_t \mid x_{1:t}, \theta_t, y_{1:t-1})}{q_t^\theta(\theta_t \mid x_{1:t-1}, y_{1:t-1})\,q_t^x(x_t \mid x_{1:t-1}, \theta_t)},$$

which is the Storvik filter when $q_t^\theta$ is the exact Gibbs kernel, with $\pi(\theta_t \mid x_{1:t-1}, y_{1:t-1})$ depending on some low-dimensional sufficient statistics. Hence, in this case, the proper weight at time $t$ reduces to

$$w_t(x_{1:t}, \theta_t; \theta_{t-1}) = \frac{\pi(\theta_t \mid S_{t-1}(x_{1:t-1}, y_{1:t-1}))\,\pi(x_t \mid x_{1:t-1}, \theta_t, y_{1:t-1})\,\pi(y_t \mid x_{1:t}, \theta_t, y_{1:t-1})}{q_t^\theta(\theta_t \mid S_{t-1}(x_{1:t-1}, y_{1:t-1}))\,q_t^x(x_t \mid x_{1:t-1}, \theta_t)}$$

$$= \frac{\pi(x_t \mid x_{t-1}, \theta_t)\,\pi(y_t \mid x_t, \theta_t)}{q_t^x(x_t \mid x_{1:t-1}, \theta_t)}. \qquad (6)$$
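As a concrete illustration of (6) (our own sketch, with hypothetical function names): for the stochastic volatility model (4) with the bootstrap proposal $q_t^x(x_t \mid x_{1:t-1}, \theta_t) = \pi(x_t \mid x_{t-1}, \theta_t)$, the transition and the proposal cancel exactly and only the observation density survives:

```python
import numpy as np

def sv_step_weights(x_prev, y_t, alpha, phi, tau, rng):
    """One propagation step for model (4): x_t ~ N(alpha + phi*x_{t-1}, 1/tau)
    and y_t ~ N(0, exp(x_t)). With the bootstrap proposal, the proper
    weight (6) is just pi(y_t | x_t, theta), evaluated per particle."""
    x_t = alpha + phi * x_prev + rng.normal(size=x_prev.shape) / np.sqrt(tau)
    # log N(y_t; 0, exp(x_t)) = -0.5*(log(2*pi) + x_t + y_t^2 * exp(-x_t))
    logw = -0.5 * (np.log(2 * np.pi) + x_t + y_t**2 * np.exp(-x_t))
    w = np.exp(logw - logw.max())
    return x_t, w / w.sum()
```

Any other choice of $q_t^x$ would keep the full ratio in (6) instead of the likelihood term alone.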

The result above also holds for the extended parameter filter (Erol, 2013) when $q_t^\theta$ can be approximated by an $M$-order exponential family approximation in separable state space models. Clearly, in the schemes above, the simulations of the state vector and the parameters are performed simultaneously, with a marginalization step for $\theta$. Additionally, $S_{t-1}$ is updated deterministically in time to a new sample $S_t$, and we stress that this quantity is based on the particle approximation of $\pi_\theta(x_{1:t} \mid y_{1:t})$.

References

Christophe Andrieu, Nando De Freitas, and Arnaud Doucet. Sequential MCMC for Bayesian model selection. In Proceedings of the IEEE Signal Processing Workshop on Higher-Order Statistics, pages 130–134. IEEE, 1999.


Christophe Andrieu, Arnaud Doucet, and Vladislav B Tadic. On-line parameter estimation

in general state-space models. In Decision and Control, 2005 and 2005 European Control

Conference. CDC-ECC’05. 44th IEEE Conference on, pages 332–337. IEEE, 2005.

J.V. Candy. Bayesian signal processing: classical, modern, and particle filtering methods,

volume 54. Wiley-Interscience, 2009.

O. Cappé, E. Moulines, and T. Rydén. Inference in Hidden Markov Models. Springer-Verlag, 2005.

O. Cappé, S.J. Godsill, and E. Moulines. An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE, 95(5):899–924, 2007.

C.M. Carvalho, M.S. Johannes, H.F. Lopes, and N.G. Polson. Particle learning and smooth-

ing. Statistical Science, 25(1):88–106, 2010.

N. Chopin. Central limit theorem for sequential Monte Carlo methods and its application

to Bayesian inference. The Annals of Statistics, 32(6):2385–2411, 2004.

Pierre Del Moral. Feynman-Kac Formulae, Genealogical and Interacting Particle Systems

with Applications. New York: Springer-Verlag, 2004.

Randal Douc and Olivier Cappé. Comparison of resampling schemes for particle filtering. In Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (ISPA 2005), pages 64–69. IEEE, 2005.

Randal Douc, Eric Moulines, and David Stoffer. Nonlinear time series: Theory, Methods

and Applications with R Examples. CRC Press, 2014.

A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for

Bayesian filtering. Statistics and computing, 10(3):197–208, 2000.

A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo methods. Springer-Verlag,

2001.

Arnaud Doucet and Adam M Johansen. A tutorial on particle filtering and smoothing:

fifteen years later. Handbook of Nonlinear Filtering, 12:656–704, 2009.


Arnaud Doucet and Vladislav B. Tadic. Parameter estimation in general state-space models using particle methods. Annals of the Institute of Statistical Mathematics, 55(2):409–422, 2003.

Arnaud Doucet, Mark Briers, and Stéphane Sénécal. Efficient block sampling strategies for sequential Monte Carlo methods. Journal of Computational and Graphical Statistics, 15(3):693–711, 2006. doi: 10.1198/106186006X142744.

P. Fearnhead. Markov Chain Monte Carlo, sufficient statistics, and particle filters. Journal

of Computational and Graphical Statistics, 11(4):848–862, 2002.

Paul Fearnhead. Computational methods for complex stochastic systems: a review of some

alternatives to MCMC. Statistics and Computing, 18:151–171, 2008.

W.R. Gilks and C. Berzuini. Following a moving target: Monte Carlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(1):127–146, 2001.

N. Kantas, A. Doucet, S. Singh, and J. Maciejowski. An overview of sequential Monte

Carlo methods for parameter estimation in general state-space models. In Proceedings

of the IFAC Symposium on System Identification (SYSID), 2009.

K.P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012. ISBN 9780262018029.

Hans R. Künsch. Particle filters. Bernoulli, 19(4):1391–1403, 2013.

Ming Lin, Rong Chen, and Jun S. Liu. Lookahead strategies for sequential Monte Carlo. Statistical Science, 28(1):69–94, 2013.

Jun S. Liu. Monte Carlo strategies in scientific computing. Springer, 2001. ISBN

0387952306.

Jun S Liu and Rong Chen. Blind deconvolution via sequential imputations. Journal of the

American Statistical Association, 90(430):567–576, 1995.


Reinaldo Marques and Geir Storvik. Particle move-reweighting strategies for online infer-

ence. Technical report, University of Oslo Library, 2013.

Chris Nemeth, Paul Fearnhead, and Lyudmila Mihaylova. Particle approximations of the

score and observed information matrix for parameter estimation in state space models

with linear computational cost. arXiv preprint arXiv:1306.0735, 2013.

Jimmy Olsson, Olivier Cappé, Randal Douc, and Eric Moulines. Sequential Monte Carlo smoothing with application to parameter estimation in nonlinear state space models. Bernoulli, 14(1):155–179, 2008.

Michael K. Pitt and Neil Shephard. Filtering via simulation: Auxiliary particle filters.

Journal of the American Statistical Association, 94(446):590–599, 1999.

G. Poyiadjis, A. Doucet, and S.S. Singh. Particle approximations of the score and ob-

served information matrix in state space models with application to parameter estima-

tion. Biometrika, 98(1):65–80, 2011.

G. Storvik. Particle filters for state-space models with the presence of unknown static

parameters. Signal Processing, IEEE Transactions on, 50(2):281–289, 2002.

G. Storvik. On the flexibility of Metropolis–Hastings acceptance probabilities in auxiliary

variable proposal generation. Scandinavian Journal of Statistics, 38(2):342–358, 2011.

Ruey S Tsay. Analysis of financial time series, volume 543. John Wiley & Sons, 2005.

Cristiano Varin, Nancy Reid, and David Firth. An overview of composite likelihood meth-

ods. Statistica Sinica, pages 5–42, 2011.

Daniel S Wilks. Statistical methods in the atmospheric sciences, volume 100. Academic

press, 2011.
