Outline Introduction Kernel Adaptive SMC (KASS) Implementation Details Evaluation Conclusion References
Kernel adaptive Sequential Monte Carlo
Ingmar Schuster (Paris Dauphine), Heiko Strathmann (University College London),
Brooks Paige (Oxford), Dino Sejdinovic (Oxford)
December 7, 2015
Section 1
Outline
1 Introduction
2 Kernel Adaptive SMC (KASS)
3 Implementation Details
4 Evaluation
5 Conclusion
Section 2
Introduction
Sequential Monte Carlo Samplers
Approximate integrals with respect to target distribution $\pi_T$
Build upon Importance Sampling: approximate integral of $h$ wrt density $\pi_T$ using samples following density $q$ (under certain conditions):
\[ \int h(x)\, d\pi_T(x) = \int h(x)\, \frac{\pi_T(x)}{q(x)}\, dq(x) \]
Given prior $\pi_0$, build sequence $\pi_0, \dots, \pi_i, \dots, \pi_T$ such that
  $\pi_{i+1}$ is closer to $\pi_T$ than $\pi_i$ ($\delta(\pi_{i+1}, \pi_T) < \delta(\pi_i, \pi_T)$ for some divergence $\delta$)
  sample from $\pi_i$ can approximate $\pi_{i+1}$ well using importance weight function $w(\cdot) = \pi_{i+1}(\cdot)/\pi_i(\cdot)$
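The importance sampling identity above can be illustrated with a minimal sketch. It uses the self-normalized variant (so densities only need to be known up to a constant), which is a choice made here rather than something stated on the slide; the Gaussian target, the Gaussian proposal and the function name `importance_estimate` are all illustrative assumptions.

```python
import numpy as np

def importance_estimate(h, log_target, log_proposal, samples):
    """Self-normalized importance sampling estimate of E_target[h(X)]."""
    log_w = log_target(samples) - log_proposal(samples)
    log_w -= log_w.max()              # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()                      # self-normalize the weights
    return np.sum(w * h(samples))

rng = np.random.default_rng(0)
# Assumed toy setting: target N(1, 1), proposal N(0, 2^2), log densities up to constants.
x = rng.normal(0.0, 2.0, size=200_000)
log_target = lambda x: -0.5 * (x - 1.0) ** 2
log_proposal = lambda x: -0.5 * (x / 2.0) ** 2
est_mean = importance_estimate(lambda x: x, log_target, log_proposal, x)
```

The proposal is deliberately wider than the target so that the weights have finite variance.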
Sequential Monte Carlo Samplers
At i = 0
  Using proposal density $q_0$, generate particles $\{(w_{0,j}, X_{0,j})\}_{j=1}^N$ where $w_{0,j} = \pi_0(X_{0,j})/q_0(X_{0,j})$
  importance resampling, resulting in $N$ equally weighted particles $\{(1/N, \bar{X}_{0,j})\}_{j=1}^N$
  rejuvenation move for each $\bar{X}_{0,j}$ by Markov kernel leaving $\pi_0$ invariant
At i > 0
  approximate $\pi_i$ by $\{(\pi_i(X_{i-1,j})/\pi_{i-1}(X_{i-1,j}), X_{i-1,j})\}_{j=1}^N$
  resampling
  rejuvenation leaving $\pi_i$ invariant
  if $\pi_i \neq \pi_T$, repeat
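One iteration of the reweight, resample, rejuvenate cycle can be sketched as follows. This is a sketch under assumptions: multinomial resampling and a single random-walk Metropolis-Hastings move are chosen for brevity (the slides do not fix either), and `smc_step`, the two Gaussian densities and the step size are hypothetical.

```python
import numpy as np

def smc_step(x, log_prev, log_next, step, rng):
    """One SMC iteration: reweight, resample, random-walk MH rejuvenation."""
    n = x.shape[0]
    log_w = log_next(x) - log_prev(x)            # incremental importance weights
    w = np.exp(log_w - log_w.max())
    idx = rng.choice(n, size=n, p=w / w.sum())   # multinomial resampling
    x = x[idx]
    prop = x + step * rng.standard_normal(x.shape)
    log_alpha = log_next(prop) - log_next(x)     # symmetric proposal, so no q-ratio
    accept = np.log(rng.uniform(size=n)) < log_alpha
    x = np.where(accept[:, None], prop, x)       # MH move leaves pi_next invariant
    return x, accept.mean()

rng = np.random.default_rng(0)
log_prev = lambda x: -0.5 * np.sum(x ** 2, axis=1) / 9.0      # N(0, 9 I), unnormalized
log_next = lambda x: -0.5 * np.sum((x - 1.0) ** 2, axis=1)    # N(1, I), unnormalized
x0 = 3.0 * rng.standard_normal((5000, 2))
x1, acc_rate = smc_step(x0, log_prev, log_next, step=1.0, rng=rng)
```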
Sequential Monte Carlo Samplers
estimate evidence $Z_T$ of $\pi_T$ by
\[ Z_T \approx Z_0 \prod_{i=1}^{T} \frac{1}{N} \sum_{j} w_{i,j} \]
(aka normalizing constant, marginal likelihood)
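A small sketch of how the product of mean weights might be accumulated in log space to avoid underflow. The function name `log_evidence` is hypothetical, and the weights are assumed to be the unnormalized incremental weights of each iteration.

```python
import numpy as np

def log_evidence(log_z0, log_weights):
    """log Z_T ~= log Z_0 + sum_i log((1/N) sum_j w_{i,j}), computed in log space."""
    log_z = log_z0
    for log_w in log_weights:                 # one array of N log-weights per iteration
        m = log_w.max()
        log_z += m + np.log(np.mean(np.exp(log_w - m)))
    return log_z

# Toy check: three iterations in which every particle has weight 2, with Z_0 = 1.
log_w_per_iter = [np.log(np.full(4, 2.0)) for _ in range(3)]
log_z = log_evidence(0.0, log_w_per_iter)
```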
Can be adaptive in rejuvenation steps without diminishing adaptation as required in adaptive MCMC
Will construct rejuvenation using RKHS-embedding of particles
Intractable Likelihoods and Evidence
in nonconjugate latent variable models, intractable likelihoods arise
when likelihood can be estimated unbiasedly, SMC still valid
simple case: estimate likelihood using IS or SMC, leads to IS$^2$ (Tran et al., 2013) and SMC$^2$ (Chopin et al., 2011)
results in noisy Importance Weights, but evidence approximation is still valid (Tran et al., 2013, Lemma 3)
Nonlinear proposals based on positive definite Kernels
Kernel Adaptive Metropolis Hastings (KAMH) was introduced in Sejdinovic et al. (2014)
Given previous samples from target distribution $\pi$, draw new ones more efficiently
  Each sample mapped to functional in Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_k$ using pd kernel $k(\cdot, \cdot)$
  Fit Gaussian $q_k$ in $\mathcal{H}_k$ with
\[ \mu = \int k(\cdot, x)\, d\pi(x) \approx \frac{1}{n} \sum_{i=1}^{n} k(\cdot, X_i) \]
\[ \Sigma = \int k(\cdot, x) \otimes k(\cdot, x)\, d\pi(x) - \mu \otimes \mu \]
Nonlinear proposals based on positive definite Kernels
Draw sample from $q_k$ and project back into original space, use as proposal in MH
KAMH set in adaptive MCMC, using vanishing adaptation (e.g. vanishing probability to use new samples for computing adaptive proposal)
Depending on used positive definite kernel, can adapt to nonlinear targets
Section 3
Kernel Adaptive SMC (KASS)
Adaptive SMC Sampler
SMC works on a sequence of targets, so we use an artificial sequence of distributions leading from prior $\pi_0$ to posterior $\pi_T$
parameters of rejuvenation kernel can be adapted before rejuvenation
Fearnhead and Taylor (2013) used global Gaussian approximation as proposal in Metropolis Hastings rejuvenation
resulting in adaptive SMC sampler (ASMC)
Kernel adaptive rejuvenation
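instead, we use RKHS-proposal projected into input space (in closed form)
given unweighted particles $\{\bar{X}_i\}_{i=1}^N$, proposal at $\bar{X}_j$ is
\[ q_{KAMH}(\cdot \mid \bar{X}_j) = \mathcal{N}\left(\cdot \mid \bar{X}_j,\ \nu^2 M_{X,\bar{X}_j} C M_{X,\bar{X}_j}^\top + \gamma^2 I\right) \]
where $C = I - \frac{1}{N}\mathbf{1}\mathbf{1}^\top$ is the $N \times N$ centering matrix and
\[ M_{X,\bar{X}_j} = 2\left[\nabla_x k(x, \bar{X}_1)|_{x=\bar{X}_j}, \dots, \nabla_x k(x, \bar{X}_N)|_{x=\bar{X}_j}\right] \]
results in
  ASMC using linear kernel
  \[ k(X, X') = X^\top X' \]
  locally adaptive fit using Gaussian RBF
  \[ k(X, X') = \exp\left(-\frac{\|X - X'\|^2}{2\sigma^2}\right) \]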
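The proposal covariance $\gamma^2 I + \nu^2 M C M^\top$ can be sketched in a few lines. The function names, the particle set and the parameter values below are illustrative assumptions; the gradients follow directly from the two kernels above. For the linear kernel the result reduces to a scaled empirical covariance of the particles, which matches the remark that the linear kernel gives an ASMC-style global Gaussian proposal.

```python
import numpy as np

def kamh_covariance(y, Z, grad_k, nu2, gamma2):
    """Covariance gamma^2 I + nu^2 M C M^T of the proposal centred at y."""
    n, d = Z.shape
    M = 2.0 * grad_k(y, Z).T                  # d x n, columns: gradients of k(., Z_i) at y
    C = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    return gamma2 * np.eye(d) + nu2 * M @ C @ M.T

def grad_linear(y, Z):
    return Z                                  # grad_x (x^T z) = z, one row per particle

def grad_rbf(y, Z, sigma=1.0):
    diff = Z - y                              # rows z_i - y
    k = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * sigma ** 2))
    return (k / sigma ** 2)[:, None] * diff   # grad_x k(x, z_i) evaluated at x = y

rng = np.random.default_rng(1)
Z = rng.standard_normal((50, 3))              # assumed particle set
y = Z[0]
cov_lin = kamh_covariance(y, Z, grad_linear, nu2=1.0, gamma2=0.1)
cov_rbf = kamh_covariance(y, Z, grad_rbf, nu2=1.0, gamma2=0.1)
proposal = rng.multivariate_normal(y, cov_rbf)   # one proposed move from y
```

With the RBF kernel, particles far from `y` receive exponentially small gradient weight, so the covariance reflects only the local particle geometry.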
KASS versus ASMC
green: ASMC / KASS with linear kernel
red: KASS with Gaussian RBF kernel
Related Work
Most direct relation to ASMC (which is a special case)
All SMC samplers related to Annealed Importance Sampling which however does not use resampling (Neal, 1998)
Local Adaptive Importance Sampling (Givens and Raftery, 1996, LAIS) has similar locally adaptive effect
  at each iteration compute pairwise distances between Importance Samples
  use k nearest neighbors for fitting local Gaussian proposal
  no resampling steps mean decrease in sampling efficiency which is exponential in dimensionality of problem
Section 4
Implementation Details
Construction of Target Sequence
For artificial distribution sequence we used geometric bridge
\[ \pi_i \propto \pi_0^{1-\rho_i}\, \pi_T^{\rho_i} \]
where $(\rho_i)_{i=1}^T$ is an increasing sequence satisfying $\rho_T = 1$
another standard choice in Bayesian Inference is adding datapoints one after another
\[ \pi_i(X) = \pi(X \mid d_1, \dots, d_{\lfloor \rho_i D \rfloor}) \]
resulting in Iterated Batch Importance Sampling (Chopin, 2002, IBIS)
Stochastic approximation tuning of $\nu^2$
KASS' free scaling parameter $\nu^2$ can be tuned for optimal scaling
Fearnhead and Taylor (2013) use auxiliary variable approach with ESJD criterion
We used stochastic approximation framework of Andrieu and Thoms (2008) instead
  asymptotically optimal acceptance rate for Random Walk proposals is $\alpha_{opt} = 0.234$ (Rosenthal, 2011)
  after rejuvenation, Rao-Blackwellized estimator $\hat{\alpha}_i$ available by averaging MH acceptance probabilities
  tune $\nu^2$ by
  \[ \nu^2_{i+1} = \nu^2_i + \lambda_i (\hat{\alpha}_i - \alpha_{opt}) \]
  for non-increasing $\lambda_1, \dots, \lambda_T$
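The update rule is a one-liner; a sketch follows. The function name `update_scale` is hypothetical, and the positivity floor is an added safeguard that is not part of the rule on the slide.

```python
def update_scale(nu2, acc_rate, lam, alpha_opt=0.234, floor=1e-8):
    """Stochastic approximation step: nu2 + lam * (acc_rate - alpha_opt)."""
    new_nu2 = nu2 + lam * (acc_rate - alpha_opt)
    return max(new_nu2, floor)   # floor is an assumption, keeps the scale positive

# Acceptance above the optimum widens the proposal, below it narrows the proposal.
nu2_up = update_scale(1.0, acc_rate=0.5, lam=0.1)
nu2_down = update_scale(1.0, acc_rate=0.1, lam=0.1)
```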
Section 5
Evaluation
Synthetic nonlinear target (Banana)
Synthetic target: Banana distribution in 8 dimensions, i.e. Gaussian with twisted second dimension
Synthetic nonlinear target (Banana)
Compare performance of Random-Walk rejuvenation with asymptotically optimal scaling ($\nu = 2.38/\sqrt{d}$), ASMC and KASS with Gaussian RBF kernel
Fixed learning rate of $\lambda = 0.1$ to adapt scale parameter using stochastic approximation
Geometric bridge of length 20
30 Monte Carlo runs
Report Maximum Mean Discrepancy (MMD) using polynomial kernel of order 3: distance of moments up to order 3 between ground truth samples and samples produced by each method
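A sketch of how an MMD with a degree-3 polynomial kernel might be computed between two sample sets. The kernel offset `c = 1` and the use of the biased (V-statistic) estimator are assumptions; the slides do not say which variant was used.

```python
import numpy as np

def poly_kernel(X, Y, degree=3, c=1.0):
    """Polynomial kernel (x^T y + c)^degree for all pairs of rows."""
    return (X @ Y.T + c) ** degree

def mmd2(X, Y, kernel=poly_kernel):
    """Biased (V-statistic) estimate of the squared MMD between samples X and Y."""
    return kernel(X, X).mean() - 2.0 * kernel(X, Y).mean() + kernel(Y, Y).mean()

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 2))     # stand-in for ground truth samples
Y = rng.standard_normal((300, 2))     # sampler output from the same distribution
Y_shifted = X + 3.0                   # sampler output with clearly wrong moments
mmd_same, mmd_shifted = mmd2(X, Y), mmd2(X, Y_shifted)
```

Because the degree-3 polynomial kernel only sees moments up to order 3, two distributions that agree on those moments are indistinguishable under this criterion.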
Synthetic nonlinear target (Banana)
[Plot: MMD to benchmark sample ($\times 10^7$) against population size for KASS, RW-SMC and ASMC]
Figure: Improved convergence of all mixed moments up to order 3 of KASS compared to ASMC and RW-SMC.
Sensor network localization
Applied problem: infer locations of $S = 3$ sensors in a sensor network measuring distance to each other
Known position for $B = 2$ base sensors
Measurements successful with probability decaying exponentially in squared distance (otherwise unobserved)
\[ Z_{i,j} \sim \mathrm{Binom}\left(1, \exp\left(-\frac{\|x_i - x_j\|_2^2}{2 \cdot 0.3^2}\right)\right) \]
Measurements corrupted by Gaussian noise
\[ \begin{cases} Y_{i,j} \sim \mathcal{N}(\|x_i - x_j\|,\ 0.02) & \text{if } Z_{i,j} = 1 \\ Y_{i,j} = 0 & \text{else} \end{cases} \]
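The observation model can be simulated as below. This is a sketch under assumptions: the slide does not say whether 0.02 is a variance or a standard deviation, and the code treats it as a standard deviation (`noise_sd`); the function name and the random sensor layout are hypothetical.

```python
import numpy as np

def simulate_measurements(x, rng, radius=0.3, noise_sd=0.02):
    """Draw observation indicators Z and noisy distances Y for all sensor pairs."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    p_obs = np.exp(-dist ** 2 / (2.0 * radius ** 2))   # success probability per pair
    iu = np.triu_indices(x.shape[0], k=1)              # each unordered pair once
    z = np.zeros_like(dist, dtype=int)
    z[iu] = rng.uniform(size=iu[0].size) < p_obs[iu]
    z = z + z.T                                        # symmetric indicator matrix
    y = np.zeros_like(dist)
    y[iu] = dist[iu] + noise_sd * rng.standard_normal(iu[0].size)
    y = (y + y.T) * z                                  # Y = 0 where unobserved
    return z, y

rng = np.random.default_rng(3)
x = rng.uniform(size=(5, 2))     # assumed layout: 3 unknown + 2 base sensors in the unit square
z, y = simulate_measurements(x, rng)
```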
Sensor network localization
run KASS and ASMC with geometric bridge of length 50 and 10,000 particles, fixed learning rate $\lambda_i = 1$
run KAMH for $50 \cdot 10{,}000$ iterations, discard first half as burn-in, diminishing adaptation $\lambda_i = 1/\sqrt{i}$
initialize both algorithms with samples from prior
qualitative comparison of KASS and closest adaptive MCMC algorithm KAMH
Sensor network localization: KAMH adaptive MCMC
[Plot: MCMC (KAMH)]
Figure: Posterior samples of unknown sensor locations (in color) by KAMH. Set-up of the true sensor locations (black dots) and base sensors (black stars) causes uncertainty in posterior.
Sensor network localization: KASS adaptive SMC
[Plot: SMC (KASS)]
Figure: Posterior samples of unknown sensor locations (in color) by KASS. Set-up of the true sensor locations (black dots) and base sensors (black stars) causes uncertainty in posterior.
Sensor network localization
MCMC algorithm not able to traverse all the modes without special care (e.g. Wormhole HMC by Lan et al., 2014)
KASS and ASMC perform similarly in this setup
with $S = 2$ (higher uncertainty), 1000 particles MMD of
  $0.76 \pm 0.4$ for KASS
  $0.94 \pm 0.7$ for ASMC
Evidence approximation for intractable likelihoods
in classification using Gaussian Processes (GP), logistic transformation renders likelihood intractable
likelihood can be unbiasedly estimated using Importance Sampling from EP approximation
estimate model evidence when using ARD kernel in the GP
particularly hard because noisy likelihoods mean noisy importance weights
ground truth by averaging evidence estimate over 20 long running SMC algorithms
Evidence approximation for intractable likelihoods
Figure: Ground truth in red, KASS in blue, ASMC in green.
Section 6
Conclusion
Conclusion (1)
Developed Kernel Adaptive SMC sampler for static models
KASS exploits local covariance of target through RKHS-informed rejuvenation proposals
combines these with general SMC advantages for multimodal targets and evidence estimation
especially attractive when likelihoods are intractable
Conclusion (2)
evaluated on a strongly twisted Banana where it was clearly better than ASMC
KASS enables exploring multiple modes in nonlinear sensor
KASS exhibits less variance than ASMC in evidence estimation for GP classification
evidence approximation even in case of intractable likelihoods
Thanks!
Literature I
Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC. Statistics and Computing, 18:343–373.
Chopin, N. (2002). A sequential particle filter method for static models. Biometrika, 89(3):539–552.
Chopin, N., Jacob, P. E., and Papaspiliopoulos, O. (2011). SMC$^2$: an efficient algorithm for sequential analysis of state-space models. 0(1):1–27.
Fearnhead, P. and Taylor, B. M. (2013). An Adaptive Sequential Monte Carlo Sampler. Bayesian Analysis, (2):411–438.
Givens, G. H. and Raftery, A. E. (1996). Local Adaptive Importance Sampling for Multivariate Densities with Strong Nonlinear Relationships. Journal of the American Statistical Association, 91(433):132–141.
Literature II
Lan, S., Streets, J., and Shahbaba, B. (2014). Wormhole Hamiltonian Monte Carlo. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
Neal, R. (1998). Annealed Importance Sampling. Technical report, University of Toronto.
Rosenthal, J. S. (2011). Optimal Proposal Distributions and Adaptive MCMC. In Handbook of Markov Chain Monte Carlo, chapter 4, pages 93–112. Chapman & Hall.
Sejdinovic, D., Strathmann, H., Lomeli, M. G., Andrieu, C., and Gretton, A. (2014). Kernel Adaptive Metropolis-Hastings. In International Conference on Machine Learning (ICML), pages 1665–1673.
Literature III
Tran, M.-N., Scharth, M., Pitt, M. K., and Kohn, R. (2013). Importance sampling squared for Bayesian inference in latent variable models. pages 1–39.