Introduction to Sequential Monte Carlo and Particle MCMC ... · Motivation VanillaMonteCarlo ImportanceSampling MarkovChainMonteCarloMethods State-Space-ExtensionTricks SequentialMonteCarloMethods

Introduction to Sequential Monte Carloand Particle MCMC Methods

Axel Finke

Department of Statistics,University of Warwick, UK

Oberseminar, Münster: July 10, 2013

Piecewise Deterministic Processes (PDPs)

Ingredients� Time-dependent parameters: marked point process.�j ; �j /j2N[f0g with

– jump times 0 D �0 < �1 < �2 < : : :– jump sizes �0; �1; �2; : : :

� Static parameters: � .� Deterministic function: F � .

2 / 51





2 / 51





2 / 51





2 / 51

Sketch

PDP

Valu

e

Time

F θ( · , τ2, φ2)

F θ( · , τ1, φ1)F θ( · , τ0, φ0)

τ0 τ1 τ2

φ0

φ2

φ1

3 / 51

Observations

ObservationsPDP

Valu

e

Timeτ0 τ1 τ2

φ0

φ2

φ1

4 / 51

Example I: Object Tracking

ObservationsTrue trajectory

Dim

ensio

n2

Dimension 1 ×1040 1 2 3 4 5 6

−2000

0

2000

4000

6000

5 / 51

Example I: Object Tracking (continued)

not observed

[m/s

2 ]Ac

celer

atio

n

Time

not observed

[m/s

]Sp

eed

Observations[m

]Po

sitio

n

τj τj+1

−200

20

0

4000

104

6 / 51

Example II: Shot-Noise Cox Process

Insurance claims

Freq

uenc

y

Time

True intensity"Catastrophic" events

Inte

nsity

0 500 1000 1500 20000

20

400

5

10

15

7 / 51

Inference

� Sequential Monte Carlo filter for PDPs introduced byWhiteley, Johansen & Godsill (2011).� Efficient methods for estimating � still missing (though

Reversible-Jump MCMC works for simple models)

8 / 51

Inference

� Sequential Monte Carlo filter for PDPs introduced byWhiteley, Johansen & Godsill (2011).� Efficient methods for estimating � still missing (though

Reversible-Jump MCMC works for simple models)

8 / 51

Piecewise Deterministic ProcessesStatic Monte Carlo Methods

MotivationVanilla Monte CarloImportance SamplingMarkov Chain Monte Carlo MethodsState-Space-Extension Tricks

Sequential Monte Carlo MethodsMotivationGeneric SMC AlgorithmSample DegeneracySMC Samplers

Particle MCMC MethodsMotivation and SetupExtended Target DistributionParticle Marginal Metropolis–Hastings AlgorithmParticle Gibbs SamplerSMC2 Algorithm

Motivation

� X � � with support E.� f W E ! R some (�-integrable) function.� Want to calculate

�.f / WDZE

f .x/�.dx/�DZE

f .x/�.x/ dx�

D EŒf .X/�:

9 / 51

Motivation


�.f / WDZE

f .x/�.dx/�DZE

f .x/�.x/ dx�

D EŒf .X/�:

9 / 51

Motivation


�.f / WDZE

f .x/�.dx/�DZE

f .x/�.x/ dx�

D EŒf .X/�:

9 / 51

Motivation, continued

ExampleOften, E � R and �.x/ D p.xjy/ for some data y, so that

�.f / D

8̂̂̂̂<̂ˆ̂̂:

P.X 2 AjY D y/; if f D 1A for A � E,EŒXkjY D y�; if f D idk,varŒX jY D y�; if f D Œid��.f /�2,

:::

10 / 51

Motivation, continued

ExampleOften, E � R and �.x/ D p.xjy/ for some data y, so that

�.f / D

8̂̂̂̂<̂ˆ̂̂:

P.X 2 AjY D y/; if f D 1A for A � E,EŒXkjY D y�; if f D idk,varŒX jY D y�; if f D Œid��.f /�2,

:::

10 / 51

Monte Carlo Methods

� Problem: analytical evaluation of �.f / costly/impossible.� Idea:

1. construct approximation O� of � .2. estimate �.f / by

O�.f / DZE

f .x/ O�.dx/:

� Monte Carlo methods: construction of O� based on(computer-generated) (pseudo-)random numbers.

11 / 51

Monte Carlo Methods



O�.f / DZE

f .x/ O�.dx/:


11 / 51

Monte Carlo Methods



O�.f / DZE

f .x/ O�.dx/:


11 / 51

Monte Carlo Methods



O�.f / DZE

f .x/ O�.dx/:


11 / 51





Vanilla Monte Carlo� Sample X1; : : : ; XN iid� � .� Approximate �.dx/ by the empirical measure:

O�mc.dx/ WD 1

N

NXiD1

•X i .dx/:

CDF of �

Prob

abilit

y

x

0

1

12 / 51


O�mc.dx/ WD 1

N

NXiD1

•X i .dx/:

CDF: O�MCCDF: �Pr

obab

ility

x

N D 10

0

1

12 / 51


O�mc.dx/ WD 1

N

NXiD1

•X i .dx/:


obab

ility

x

N D 30

0

1

12 / 51


O�mc.dx/ WD 1

N

NXiD1

•X i .dx/:


obab

ility

x

N D 70

0

1

12 / 51


O�mc.dx/ WD 1

N

NXiD1

•X i .dx/:


obab

ility

x

N D 200

0

1

12 / 51


O�mc.dx/ WD 1

N

NXiD1

•X i .dx/:


obab

ility

x

N D 1000

0

1

12 / 51


O�mc.dx/ WD 1

N

NXiD1

•X i .dx/:


obab

ility

x

N D 5000

0

1

12 / 51

Vanilla Monte Carlo, continued

� Estimate �.f / D EŒf .X/� by

O�mc.f / DZE

f .x/ O�mc.dx/

D 1

N

NXiD1

f .X i/:

� Unbiased and consistent.� Monte Carlo methods are best viewed as simulation

techniques for approximating measures.

13 / 51



O�mc.f / DZE

f .x/ O�mc.dx/

D 1

N

NXiD1

f .X i/:



13 / 51



O�mc.f / DZE

f .x/ O�mc.dx/

D 1

N

NXiD1

f .X i/:



13 / 51





Motivation

SetupAssume that

�.x/ D .x/

Z

with normalising constant Z D RE .x/ dx, but

� we cannot sample from � .� Z is unknown (i.e. we can evaluate but not �)

Example (Bayesian inference)Let �.x/ WD p.xjy/ for some data y, then often,� we can evaluate .x/ D p.x; y/,� but Z D p.y/ D R

Ep.x; y/ dx is typically intractable.

14 / 51

Motivation

SetupAssume that

�.x/ D .x/

Z





14 / 51

Motivation

SetupAssume that

�.x/ D .x/

Z





14 / 51

Motivation

SetupAssume that

�.x/ D .x/

Z





14 / 51

Motivation

SetupAssume that

�.x/ D .x/

Z





14 / 51

Motivation

SetupAssume that

�.x/ D .x/

Z





14 / 51

Importance SamplingAssume that � is another distribution s.t. � � �.1. Sample X1; : : : ; XN iid� �.2. Approximate � by the weighted empirical measure:

O� is.dx/ WDNXiD1

W i•X i .dx/:

CDF: �CDF: �Pr

obab

ility

x

0

1

15 / 51


O� is.dx/ WDNXiD1

W i•X i .dx/:

CDF: �CDF: �Pr

obab

ility

x

0

1

15 / 51


O� is.dx/ WDNXiD1

W i•X i .dx/:


obab

ility

x

0

1

15 / 51


O� is.dx/ WDNXiD1

W i•X i .dx/:


obab

ility

x

0

1

15 / 51


O� is.dx/ WDNXiD1

W i•X i .dx/:

CDF: O� ISCDF: �Pr

obab

ility

x

0

1

15 / 51

Constructing the Importance Weights

PDF: �PDF: �

Dens

ity

x0

0:5

16 / 51

Constructing the Importance Weights, continuedWant to set W i / d�

d� .Xi/ D �.X i /

�.X i /.

Problem: can only evaluate G.X i/ WD d d� .X

i/ D .X i /

�.X i /.

Solution: approximate and Z separately, i.e.1. approximate .dx/ by

O is;u.dx/ WD 1

N

NXiD1

G.X i/•X i .dx/

2. approximate Z D REG.x/�.dx/ by

yZ WD O�mc.G/̃

‘vanilla’ Monte Carloestimate of �.G/

D 1

N

NXiD1

G.X i/:

17 / 51


d� .Xi/ D �.X i /

�.X i /.


i/ D .X i /

�.X i /.


O is;u.dx/ WD 1

N

NXiD1

G.X i/•X i .dx/


yZ WD O�mc.G/̃


D 1

N

NXiD1

G.X i/:

17 / 51


d� .Xi/ D �.X i /

�.X i /.


i/ D .X i /

�.X i /.


O is;u.dx/ WD 1

N

NXiD1

G.X i/•X i .dx/


yZ WD O�mc.G/̃


D 1

N

NXiD1

G.X i/:

17 / 51


d� .Xi/ D �.X i /

�.X i /.


i/ D .X i /

�.X i /.


O is;u.dx/ WD 1

N

NXiD1

G.X i/•X i .dx/


yZ WD O�mc.G/̃


D 1

N

NXiD1

G.X i/:

17 / 51


d� .Xi/ D �.X i /

�.X i /.


i/ D .X i /

�.X i /.


O is;u.dx/ WD 1

N

NXiD1

G.X i/•X i .dx/


yZ WD O�mc.G/̃


D 1

N

NXiD1

G.X i/:

17 / 51

Constructing the Importance Weights, continued

3. approximate �.dx/ by

O� is.dx/ WD O is;u.dx/= OZ

DNXiD1

W i•X i .dx/;

whereW i WD G.X i/PN

jD1G.Xj /:

Importance sampling yields unbiased (and consistent)estimates of normalising constants!

18 / 51

Constructing the Importance Weights, continued


O� is.dx/ WD O is;u.dx/= OZ

DNXiD1

W i•X i .dx/;

whereW i WD G.X i/PN

jD1G.Xj /:

Importance sampling yields unbiased (and consistent)estimates of normalising constants!

18 / 51





Markov Chain Monte Carlo Methods

Let K be a �-invariant ergodic Markov kernel.1. simulate a Markov chain with transitions K, i.e. sample

X1 � �. � /; X2 � K. � jX1/; X3 � K. � jX2/; : : :


O�mcmc.dx/ WD 1

N

RCNXiDRC1

•X i .dx/:

after a suitable burn-in time R.

19 / 51



X1 � �. � /; X2 � K. � jX1/; X3 � K. � jX2/; : : :


O�mcmc.dx/ WD 1

N

RCNXiDRC1

•X i .dx/:


19 / 51



X1 � �. � /; X2 � K. � jX1/; X3 � K. � jX2/; : : :


O�mcmc.dx/ WD 1

N

RCNXiDRC1

•X i .dx/:


19 / 51

Constructing the Markov Kernel K

Example (Gibbs sampler)Let E be d -dimensional. The standard Gibbs sampler cyclesthrough all full conditional distributions (under �), i.e.

K.dxi jxi�1/ WDdYjD1

�.dxij jxi1Wj�1; xi�1jC1Wd /

Partially-collapsed Gibbs sampler (Van Dyk & Park, 2008):� often no need to sample from full conditionals.� instead, sample from conditionals under a marginal of �

20 / 51






20 / 51






20 / 51


Example (Metropolis–Hastings algorithm)1. sample X? � Q. � jX i�1/ (where Q is not �-invariant)2. accept X i WD X? with probability

˛.X?jX i/ WD 1 ^ .X?/Q.X i jX?/

.X i/Q.X?jX i/;

otherwise, set X i WD X i�1.Thus, K has the form

K.dxi jxi�1/ WD ˛.xi jxi�1/Q.dxi jxi�1/C r.xi�1/•x.dxi/;

where r.x/ WD 1 � RE˛.zjx/Q.dzjx/.

21 / 51



˛.X?jX i/ WD 1 ^ .X?/Q.X i jX?/

.X i/Q.X?jX i/;




21 / 51



˛.X?jX i/ WD 1 ^ .X?/Q.X i jX?/

.X i/Q.X?jX i/;




21 / 51





State-Space Extension Tricks

� Want to target Q�.�/ D Q .�/=Z, for Z > 0.� What if we cannot evaluate Q .�/? (needed for IS/MCMC).� Idea:

1. instead, target �.�; x/ D .�; x/=Z, s.t.i. �.�; x/ admits Q�.�/ as a marginal,ii. .�; x/ can be evaluated.

2. construct IS/MCMC approximation O� of � D =Z.3. approximate Q� by �-marginal of O� .

� Many names for this: state-space extension,auxiliary-variable construction, data augmentation, . . .

22 / 51






22 / 51






22 / 51






22 / 51






22 / 51






22 / 51






22 / 51






22 / 51

State-Space Extension Tricks, continued

Example (Hidden Markov model)X0 � �� , and for n 2 N,

Xn � f �. � jXn�1/;Yn � g�. � jXn/;

where � are some ‘static’ parameters.Assume we are interested in Q�.�/ WD p.� jy1Wn/.� Q .�/ D p.�; y1Wn/ and Z D p.y1Wn/ are intractable.� but we can evaluate

.�; x0Wn; y1Wn/ WD p.�/��.x0/nY

pD1f �.xpjxp�1/g�.ypjxp/:

23 / 51



Xn � f �. � jXn�1/;Yn � g�. � jXn/;


.�; x0Wn; y1Wn/ WD p.�/��.x0/nY


23 / 51



Xn � f �. � jXn�1/;Yn � g�. � jXn/;


.�; x0Wn; y1Wn/ WD p.�/��.x0/nY


23 / 51



Xn � f �. � jXn�1/;Yn � g�. � jXn/;


.�; x0Wn; y1Wn/ WD p.�/��.x0/nY


23 / 51





Target distributions

SetupWant to target a sequence of related distributions.��n .x1Wn//n2N which� are defined on spaces .En/n2N of increasing dimension

[will be relaxed later],� have unknown normalising constants .Z�n/n2N .

� known [will be relaxed later].

ProblemIS/MCMC require new algorithm for each n 2 N.

24 / 51






24 / 51






24 / 51






24 / 51






24 / 51





Sequential Monte Carlo MethodsSequential Monte Carlo (SMC): propagate weighted samples(‘particles’) .X i

1Wn; W �;in /i2f1;:::;N g to construct

O��n .dx1Wn/ WDNXiD1

W �;in •X i

1Wn.dx1Wn/:

SMC AlgorithmAt time n, given .X i

1Wn�1; W�;in�1/i2f1;:::;N g,

1. sample X in � K�

n. � jX i1Wn�1/

– this approximates ��n�1.dx1Wn�1/K�n .dxnjx1Wn�1),2. re-weight particle paths .X i

1Wn/i2f1;:::;N g– to approximate ��n .dx1Wn/,

3. resample: reset weights, after– “multiplying” particles with large weights,– “killing off” paths with small weights.

25 / 51




W �;in •X i

1Wn.dx1Wn/:


1Wn�1; W�;in�1/i2f1;:::;N g,


n. � jX i1Wn�1/




25 / 51




W �;in •X i

1Wn.dx1Wn/:


1Wn�1; W�;in�1/i2f1;:::;N g,


n. � jX i1Wn�1/




25 / 51




W �;in •X i

1Wn.dx1Wn/:


1Wn�1; W�;in�1/i2f1;:::;N g,


n. � jX i1Wn�1/




25 / 51

Sequential Monte Carlo Methods, continued

Particle filters: SMC methods applied to the filtering problem.

Almost all SMC methods, e.g.� auxiliary particle filters (Pitt & Shepard, 1999),� block sampling (Doucet, Briers & Sénécal, 2006),� resample–move algorithms (Gilks & Berzuini, 2001),� SMC samplers (Del Moral, Doucet & Jasra, 2006),� discrete state-space particle filters (Fearnhead, 1998),� : : :

are special cases of this algorithm!

26 / 51

Sequential Monte Carlo Methods, continued

Particle filters: SMC methods applied to the filtering problem.

Almost all SMC methods, e.g.� auxiliary particle filters (Pitt & Shepard, 1999),� block sampling (Doucet, Briers & Sénécal, 2006),� resample–move algorithms (Gilks & Berzuini, 2001),� SMC samplers (Del Moral, Doucet & Jasra, 2006),� discrete state-space particle filters (Fearnhead, 1998),� : : :

are special cases of this algorithm!

26 / 51

SMC: Sketch

Parti

clelo

catio

n

Q�1.x1/Time

27 / 51

SMC: Sketch

Parti

clelo

catio

n

Q�2.x1; x2/Time

27 / 51

SMC: Sketch

Parti

clelo

catio

n

Q�3.x1W2; x3/

27 / 51

SMC: Sketch

Parti

clelo

catio

n

Q�4.x1W3; x4/Time

27 / 51

SMC: Sketch

Parti

clelo

catio

n

Q�5.x1W4; x5/Time

27 / 51





Sample Degeneracy

� Also known as sample impoverishment, or path degeneracy.� Eventually, all particles share a common ancestor.� Thus, SMC methods rely on an ergodicity (‘forgetting’)

property of .��n /n2N

28 / 51

Sample Degeneracy



28 / 51

Sample Degeneracy



28 / 51

Sample Degeneracy, continued

Hidden Markov model, continuedLet ��n .x1Wn/ WD p.x1Wnj�; y1Wn/ then we are often interested in� filtering distributions: ��n .xn/ D p.xnj�; y1Wn/,�! approximated by a diverse set of particles.� smoothing distributions: ��n .xk/ D p.xkj�; y1Wn/,�! often approximated by a single particle.

29 / 51

Sample Degeneracy, continuedHidden Markov model, continuedLet ��n .x1Wn/ WD p.x1Wnj�; y1Wn/ then we are often interested in� filtering distributions: ��n .xn/ D p.xnj�; y1Wn/,�! approximated by a diverse set of particles.� smoothing distributions: ��n .xk/ D p.xkj�; y1Wn/,�! often approximated by a single particle.

Parti

clelo

catio

n

Timep.x1jy1/

29 / 51


Parti

clelo

catio

n

Timep.x2jy1W2/

29 / 51


Parti

clelo

catio

n

p.x3jy1W3/

29 / 51


Parti

clelo

catio

n

Time p.x4jy1W4/

29 / 51


Parti

clelo

catio

n

Time p.x5jy1W5/

29 / 51


Parti

clelo

catio

n

Timep.x1jy1/

29 / 51


Parti

clelo

catio

n

Timep.x1jy1W2/

29 / 51


Parti

clelo

catio

n

Timep.x1jy1W3/

29 / 51


Parti

clelo

catio

n

Timep.x1jy1W4/

29 / 51


Parti

clelo

catio

n

Timep.x1jy1W5/

29 / 51

Alleviating the Degeneracy

� Use residual/stratified/systematic resampling�! avoid multinomial resampling!� Only resample when necessary.� Devise better proposal kernels K�

n .– e.g. avoid K�n . � jxn�1/ WD f � . � jxn�1/ in HMMs(’bootstrap’ filter),

– better: make use of yn when sampling X in.

30 / 51





30 / 51





30 / 51





30 / 51





30 / 51

Particle Lineages

�n .

all random variables generatedby the SMC algorithm‚ …„ ƒ

x1Wn; a1Wn�1/ WD"NYiD1

K�1 .x

i1/

#

�"

nYpD2

r�.ap�1jx1Wp�1; a1Wp�2/NYiD1

K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

x51

x41

x31

x21

x11

31 / 51

Particle Lineages

�n .



K�1 .x

i1/

#

�"

nYpD2


K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

a51

a41

a31

a21

a11

31 / 51

Particle Lineages

�n .



K�1 .x

i1/

#

�"

nYpD2


K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

x52

x42

x32

x22

x12

31 / 51

Particle Lineages

�n .



K�1 .x

i1/

#

�"

nYpD2


K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

a52

a42

a32

a22

a12

31 / 51

Particle Lineages

�n .



K�1 .x

i1/

#

�"

nYpD2


K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

x53

x43

x33

x23

x13

31 / 51

Particle Lineages

�n .



K�1 .x

i1/

#

�"

nYpD2


K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

a53

a43

a33

a23

a13

31 / 51

Particle Lineages

�n .



K�1 .x

i1/

#

�"

nYpD2


K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

x54

x44

x34

x24

x14

31 / 51

Particle Lineages

�n .



K�1 .x

i1/

#

�"

nYpD2


K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

a54

a44

a34

a24

a14

31 / 51

Particle Lineages

�n .



K�1 .x

i1/

#

�"

nYpD2


K�p .x

ipjx

aip�1

1Wp�1/

#Pa

rticle

no.

Time

x55

x45

x35

x25

x15

31 / 51

Interpretation as Importance Sampling

� SMC methods are just (standard!) importance sampling ona suitably extended space.� Hence, SMC methods yield an unbiased estimate of the

normalising constant Z�n , which is given by

yZ�n.X1Wn;A1Wn�1/ DnY

pD1

�1

N

NXiD1

G�p .Xi1Wp/˜

ith unnormalisedincremental weight at time n

�:

32 / 51

Interpretation as Importance Sampling

� SMC methods are just (standard!) importance sampling ona suitably extended space.� Hence, SMC methods yield an unbiased estimate of the

normalising constant Z�n , which is given by

yZ�n.X1Wn;A1Wn�1/ DnY

pD1

�1

N

NXiD1

G�p .Xi1Wp/˜

ith unnormalisedincremental weight at time n

�:

32 / 51





Motivation

What if target distributions . Q�n/n2N are defined on spaces. zEn/n2N of non-increasing dimension?

33 / 51

Motivation, continuedExample (Annealing):� Want to target complicated distribution � on a space E.� Idea: use SMC methods to target bridging distributions

Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;

for n D 1; : : : ; P , where– we can easily sample from �1,– 0 D �1 < �2 < � � � < �P�1 < �P D 1.

Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�1

Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�2

Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�3

Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�4

Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�5

Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�6

Q�

Dens

ity

x0

1

34 / 51


Q�n.x/ / Q n.x/ WD Œ�.x/��nŒ�1.x/�1��n;


Q�7

Q�

Dens

ity

x0

1

34 / 51

SMC Samplers

� Problem: evaluating the weights difficult/impossible.� Solution: target a sequence of extended distributions.�n/n2N s.t.1. �n admits Q�n as a marginal,2. .�n/n2N are defined on spaces of increasing dimension.

SMC Samplers (Del Moral, Doucet & Jasra, 2006)Use ‘backward’ Markov kernels Ln�1.xn�1jxn/, so that

�n.x1Wn/ WD Q�n.xn/nY

pD1Lp�1.xp�1jxp/:

35 / 51

SMC Samplers





35 / 51

SMC Samplers





35 / 51





Aim

� Now: want to approximate

�P .�; x1WP / D P .�; x1WP /Z

or its marginal �P .�/.

ExampleIf �P .�; x1WP / D p.�; x1WP jy1WP / for observations y1WP , then� P .�; x1WP / D p.�; x1WP ; y1WP /,� Z D p.y1WP /.

36 / 51

Aim


�P .�; x1WP / D P .�; x1WP /Z



36 / 51

Aim


�P .�; x1WP / D P .�; x1WP /Z



36 / 51

Aim


�P .�; x1WP / D P .�; x1WP /Z



36 / 51

Ideal MCMC Algorithms

Ideal Metropolis–Hastings AlgorithmGiven .�; X1WP /,1. propose new values .�?; X?

1WP /,2. accept with some probability.

BUT: difficult to design good proposals for X?1WP .

� Idea: use SMC approximation O��? as proposal for X?1WP .

� Problem: proposal density is intractable.H) cannot evaluate acceptance probability.

37 / 51







37 / 51







37 / 51







37 / 51







37 / 51







37 / 51

Ideal MCMC Algorithms, continued

Ideal Gibbs sampler:Given .�; X1WP /, sample from1. �P .� jx1WP /,2. �P .x1WP j�/ D ��P .x1WP /,

BUT: cannot sample from ��P .� Idea: sample from SMC approximation O�� of ��P .� Problem: O��P ¤ ��P .

38 / 51




38 / 51




38 / 51




38 / 51




38 / 51





Extended Target Distribution

Particle MCMC Methods� Andrieu, Doucet & Holenstein (2010).� exact MCMC methods, i.e.

– Metropolis–Hastings algorithm,– Gibbs sampler

targeting an extended distribution.� this distribution includes all random variables generated by

an SMC algorithm, i.e. .X1WP ;A1WP�1/ (and more).� basis: Pseudo-Marginal approach,

– Andrieu & Roberts (2009),– permits IS within MCMC,– and SMC can be interpreted as standard IS(on a suitably extended space).

39 / 51







39 / 51







39 / 51







39 / 51







39 / 51

Extended Target Distribution, continuedParametrisation I:x�P .�;x1WP ; a1WP�1; b�P /

/ p.�/ w�;b�PP–weight ofb�P th path

yZ�P .x1WP ; a1WP�1/œSMC estimate of

normalising constant

�P .x1WP ; a1WP�1/œSMC algorithm

:

Parti

cleno

.

Time

a54

a44

a34

a24

a14

a53

a43

a33

a23

a13

a52

a42

a32

a22

a12

a51

a41

a31

a21

a11

b�5

40 / 51

Reparametrisation

Using b�n D ab�

nC1

n for n 2 f1; : : : ; P � 1g,.x1WP ; a1WP�1; b�P /œ

Parametrisation I

! .x��1WP ; a��1WP�1; x

�1WP ; b

�1WP /Ÿ

Parametrisation II

:Pa

rticle

no.

Time

a54

a44

a34

a24

a14

a53

a43

a33

a23

a13

a52

a42

a32

a22

a12

a51

a41

a31

a21

a11

b�5

41 / 51

Reparametrisation

Using b�n D ab�

nC1

n for n 2 f1; : : : ; P � 1g,.x1WP ; a1WP�1; b�P /œ

Parametrisation I

! .x��1WP ; a��1WP�1; x

�1WP ; b

�1WP /Ÿ

Parametrisation II

:Pa

rticle

no.

Time

a54

a44

a34

a14

a53

a43

a23

a13

a52

a42

a32

a12

a51

a41

a21

a11

b�5

b�4

b�3

b�2b�

1

41 / 51

Extended Target DistributionParametrisation II:x�P .�;x��1WP ; a��1WP�1; x�1WP ; b�1WP /

D �P .�; x�1WP /™’actual’target

p.b�1WP /˜marginalof b�1WP

�P .x��1WP ; a

��1WP�1kx�1WP ; b�1WP /

‘conditional’ SMC algorithm

:

Parti

cleno

.

Time

a54

a44

a34

a14

a53

a43

a23

a13

a52

a42

a32

a12

a51

a41

a21

a11

b�5

b�4

b�3

b�2b�

1

42 / 51





PMMH AlgorithmA Metropolis–Hastings Algorithm Targeting x�P

� Notation: b WD b�P and � WD .�;x1WP ; a1WP�1; b/.� Proposal kernel:

Q.�?j�/ WD T .�?j�/ �?

P .x?1WP ; a?1WP�1/w

�?;b?

P ;

� Acceptance probability (using Parametrisation I):

˛.�?j�/ WD 1 ^ x�P .�?/Q.�j�?/

x�P .�/Q.�?j�/

D 1 ^ p.�?/

p.�/

yZ�?

P .x?1WP ; a

?1WP�1/

yZ�P .x1WP ; a1WP�1/T .� j�?/T .�?j�/:

� Special case of the GIMH algorithm(Andrieu & Roberts, 2009)

43 / 51

PMMH Algorithm, continued

� efficiency crucially depends on SMC estimate of Z�P� usually, varŒ yZ�P .X1WP ;A1WP�1/� grows linearly in P .� need to increase N at least linearly with P .�! otherwise: low acceptance rate.

44 / 51



44 / 51



44 / 51

Alternative “justification” of PMMH/GIMH

� only want to approximateZ�P .�; x1WP /dx1WP D �P .�/ / P .�/:

� p.�/Z�P D P .�/, so that

E�p.�/ yZ�P .X1WP ;A1WP�1/

� D P .�/:� View as MH algorithm with approximation

P .�?/T .� j�?/

P .�/T .�?j�/� p.�?/

p.�/

yZ�?

P .x?1WP ; a

?1WP�1/

yZ�P .x1WP ; a1WP�1/T .� j�?/T .�?j�/:

45 / 51






P .�?/T .� j�?/

P .�/T .�?j�/� p.�?/

p.�/

yZ�?

P .x?1WP ; a

?1WP�1/

yZ�P .x1WP ; a1WP�1/T .� j�?/T .�?j�/:

45 / 51






P .�?/T .� j�?/

P .�/T .�?j�/� p.�?/

p.�/

yZ�?

P .x?1WP ; a

?1WP�1/

yZ�P .x1WP ; a1WP�1/T .� j�?/T .�?j�/:

45 / 51





Particle Gibbs SamplerA Gibbs sampler targeting x�P

Recall thatx�P .�;x��1WP ; a��1WP�1; x�1WP ; b�1WP /

D �P .�; x�1WP /™‘actual’target

p.b�1WP /˜distribution ofindices of x�1WP

�P .x��1WP ; a

��1WP�1kx�1WP ; b�1WP /

‘conditional’ SMC algorithm

:

Parti

cleno

.

Time

b�5

b�4

b�3

b�2b�

1

46 / 51

Particle Gibbs SweepSample from:1. �P .� jx�1WP / [e.g. via a Metropolis–Hastings step],2. �P .x��1WP ; a��1WP�1kx�1WP ; b�1WP / [via ‘conditional’ SMC],3. x�P .b�P j�;x1WP ; a1WP�1/ [via Parametrisation I].

47 / 51


Parti

cleno

.

Time47 / 51


Parti

cleno

.

Time47 / 51


Parti

cleno

.

Time47 / 51


Parti

cleno

.

Time47 / 51


Parti

cleno

.

Time47 / 51


Parti

cleno

.

Time47 / 51

Particle Gibbs with Backward SamplingWhiteley (2010)

Sample from:1. �P .� jx�1WP / [e.g. via a Metropolis–Hastings step],2. �P .x��1WP ; a��1WP�1kx�1WP ; b�1WP / [via ‘conditional’ SMC],3. x�P .b�n j�;x1Wn; a1Wn�1; x�nC1WP ; b�nC1WP / for n D P; : : : ; 1.

48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51



Parti

cleno

.

Time 48 / 51

Particle Gibbs with Ancestor SamplingLindsten, Jordan & Schön (2012)

1. sample from �P .� jx�1WP /,2. for n D 1; : : : ; P , sample from

i. �P .x�b�nn ; a

�b�nn�1kx1Wn�1; a1Wn�2; x�nWP ; b�nWP /,

ii. x�P .b�n j�;x1Wn; a1Wn�1; x�nC1WP ; b�nC1WP /.

49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51






Parti

cleno

.

Time49 / 51





SMC2

Chopin, Jacob & Papaspiliopoulos, (2013)

� Particle MCMC methods can only target a singledistribution �P .�; x1WP /.� How to approximate .�n.�; x1Wn//n2N (e.g. if observations

arrive sequentially in time)?� Idea: instead of MCMC, use SMC to target (a marginal of)

the extended distribution x�n.x1Wn; a1Wn�1; b�n/.� Can be interpreted as nested SMC algorithms – SMCwithin SMC, i.e. each particle has its own SMC algorithm.

50 / 51

SMC2





50 / 51

SMC2





50 / 51

SMC2





50 / 51

LiteratureI C Andrieu, GO Roberts, (2009). The pseudo-marginal approach for efficient Monte

Carlo computations. ANN STAT, 37(2), 697–725.

I C Andrieu, A Doucet, R Holenstein (2010). Particle Markov chain Monte Carlomethods. J ROY STAT SOC B, 72(3), 269–342.

I N Chopin, PE Jacob, O Papaspiliopoulos, (2013). SMC2: an efficient algorithm forsequential analysis of state space models. J ROY STAT SOC B, 75(3), 397–426.

I P Del Moral, A Doucet, A Jasra, (2006). Sequential Monte Carlo samplers.J ROY STAT SOC B, 68(3), 411–436.

I F Lindsten, MI Jordan, T Schön, (2012). Ancestor Sampling for Particle Gibbs.NIPS.

I DA Van Dyk, T Park (2006). Partially collapsed Gibbs samplers. JASA, 103(482),790–796.

I N Whiteley (2010). Contribution to the discussion on ‘Particle Markov chainMonte Carlo methods’ by C Andrieu, A Doucet, R Holenstein.J ROY STAT SOC B, 72(3), 306–307.

I N Whiteley, AM Johansen, S Godsill (2011). Monte Carlo filtering of piecewisedeterministic processes. J COMPUT GRAPH STAT, 20(1), 119–139.

51 / 51

Documents

Introduction to Sequential Monte Carlo and Particle MCMC ... · Motivation VanillaMonteCarlo ImportanceSampling MarkovChainMonteCarloMethods State-Space-ExtensionTricks SequentialMonteCarloMethods