A Repelling-Attracting Metropolis (RAM) Algorithm for Multimodality
Tak (Hyungsuk) Tak (1), Xiao-Li Meng (1), and David A. van Dyk (2)
(1) Harvard University and (2) Imperial College London
International CHASC Astro-Statistics Collaboration
9 June 2016
Motivation
Time delay estimation problem
Credit: NASA’s Goddard Space Flight Center
The strong gravitational field of a lensing galaxy splits light into two images.
- The light rays take different routes with different lengths.
- The difference between their arrival times is the time delay ($\Delta$).
- The time delay is used to infer cosmological parameters, including $H_0$.
Motivation (cont.)
Modeling time delay → MCMC for multimodality
- Full posterior density function: $\pi(\Delta, \theta \mid \text{Data})$ (Tak et al., arXiv).
- Metropolis within Gibbs sampler (Tierney, 1994)
- A multimodal (marginal) posterior of $\Delta$ for Quasar Q0957+561
- Just 8 jumps between modes out of a million iterations!
- Could we improve Metropolis’ ability to jump between modes without losing its simple-to-implement characteristic?
Idea
There is a RAM on top of the mountain.
How would this RAM move to the top of the other mountain?
Image credit: www.launsteinimagery.com, www.cliparts.com, www.iconfinder.com
Idea (cont.)
1. Make a down-up movement in density to generate a proposal $x''$.
Image credit: http://www.bestofthetetons.com/, http://blog.showmenaturephotography.com/
2. Accept or reject $x''$ with probability
$$\min\left\{1, \frac{\pi(x'')\, q^{DU}(x^{(i)} \mid x'')}{\pi(x^{(i)})\, q^{DU}(x'' \mid x^{(i)})}\right\}.$$
Note: $q^{DU}$ is a down-up (DU) proposal density.
RAM: Proposal
Two-step procedure
$x^{(i)}$: Current state ↘ $x'$: Intermediate proposal ↗ $x''$: Final proposal
1. (Downhill Metropolis) Generate $x' \sim q(\cdot \mid x^{(i)})$, here $N(x^{(i)}, \sigma^2)$, and accept $x'$ with probability
$$\alpha^{D}_{\epsilon}(x' \mid x^{(i)}) = \min\left\{1, \frac{\pi(x^{(i)}) + \epsilon}{\pi(x') + \epsilon}\right\}.$$
Repeat this proposal step until one is accepted (forced Metropolis).
2. (Uphill Metropolis) Generate $x'' \sim q(\cdot \mid x')$ and accept $x''$ with probability
$$\alpha^{U}_{\epsilon}(x'' \mid x') = \min\left\{1, \frac{\pi(x'') + \epsilon}{\pi(x') + \epsilon}\right\}.$$
Repeat this proposal step until one is accepted (forced Metropolis).
Note: $\epsilon = 10^{-308}$ prevents a ratio of zeros (0/0).
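The two forced steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the bimodal `target`, the function name `forced_step`, and the proposal scale used in the usage lines are all hypothetical choices made for this example.

```python
import math
import random

EPS = 1e-308  # the slides' epsilon: guards against 0/0 when the density underflows


def target(x):
    """Hypothetical unnormalized bimodal density with modes at -10 and 10."""
    return math.exp(-0.5 * (x + 10.0) ** 2) + math.exp(-0.5 * (x - 10.0) ** 2)


def forced_step(x, pi, sigma, rng, downhill):
    """Redraw y ~ N(x, sigma^2) until the downhill (or uphill) test accepts it."""
    while True:
        y = rng.gauss(x, sigma)
        if downhill:
            # alpha^D: favors moves to lower density
            ratio = (pi(x) + EPS) / (pi(y) + EPS)
        else:
            # alpha^U: favors moves to higher density
            ratio = (pi(y) + EPS) / (pi(x) + EPS)
        if rng.random() < min(1.0, ratio):
            return y


rng = random.Random(0)
x_prime = forced_step(-10.0, target, 10.0, rng, downhill=True)       # step 1
x_dprime = forced_step(x_prime, target, 10.0, rng, downhill=False)   # step 2
```

The downhill test is the ordinary Metropolis ratio turned upside down, so a chain sitting on a mode is pushed into a valley before the uphill step pulls it toward a mode again, possibly a different one.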
RAM: Acceptance/Rejection
Accept $x''$ with a Metropolis-Hastings acceptance probability
$$\alpha^{DU}(x'' \mid x^{(i)}) = \min\left\{1, \frac{\pi(x'')\, q^{DU}(x^{(i)} \mid x'')}{\pi(x^{(i)})\, q^{DU}(x'' \mid x^{(i)})}\right\} = \min\left\{1, \frac{\pi(x'') \int q(x \mid x^{(i)})\, \alpha^{D}_{\epsilon}(x \mid x^{(i)})\, dx}{\pi(x^{(i)}) \int q(x \mid x'')\, \alpha^{D}_{\epsilon}(x \mid x'')\, dx}\right\}.$$
Is there a way to avoid calculating this intractable ratio?
If we explore an extended space whose marginal is the correct $\pi(x)$, then there can be a way to cancel this intractable ratio (Møller et al., 2006).
RAM: Auxiliary variable approach
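Introduce an auxiliary variable $z$ with a well-defined conditional density $\pi^{C}(z \mid x)$.
- Joint target density: $\pi^{J}(z, x) = \pi(x)\, \pi^{C}(z \mid x) = \pi(x)\, q(z \mid x)$
- Joint proposal density:
$$q^{J}(z'', x'' \mid z^{(i)}, x^{(i)}) = q_1(x'' \mid z^{(i)}, x^{(i)})\, q_2(z'' \mid x'', z^{(i)}, x^{(i)}) = q^{DU}(x'' \mid x^{(i)})\, q^{D}(z'' \mid x'')$$
Note: $q^{D}$ is a forced downhill kernel density.
- Joint acceptance probability:
$$\alpha^{J}(z'', x'' \mid z^{(i)}, x^{(i)}) = \min\left[1, \frac{\pi^{J}(z'', x'')\, q^{J}(z^{(i)}, x^{(i)} \mid z'', x'')}{\pi^{J}(z^{(i)}, x^{(i)})\, q^{J}(z'', x'' \mid z^{(i)}, x^{(i)})}\right] = \min\left[1, \frac{\pi(x'') \min\left\{1, \frac{\pi(x^{(i)}) + \epsilon}{\pi(z^{(i)}) + \epsilon}\right\}}{\pi(x^{(i)}) \min\left\{1, \frac{\pi(x'') + \epsilon}{\pi(z'') + \epsilon}\right\}}\right]$$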
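The cancellation behind the last equality can be checked term by term. The following is a sketch, assuming a symmetric $q$ (such as the Gaussian random walk used for the proposal) and writing $A^{D}(x) = \int q(z \mid x)\, \alpha^{D}_{\epsilon}(z \mid x)\, dz$ for the normalizing constant of the forced downhill kernel. The symbol $A^{D}$ is introduced here for illustration only and does not appear on the slides. The second identity in the first line holds because, for symmetric $q$, the product $\alpha^{D}_{\epsilon}(x' \mid x^{(i)})\, \alpha^{U}_{\epsilon}(x'' \mid x')$ is unchanged when $x^{(i)}$ and $x''$ are swapped.

```latex
\begin{align*}
q^{D}(z \mid x) &= \frac{q(z \mid x)\, \alpha^{D}_{\epsilon}(z \mid x)}{A^{D}(x)},
\qquad
\frac{q^{DU}(x^{(i)} \mid x'')}{q^{DU}(x'' \mid x^{(i)})} = \frac{A^{D}(x^{(i)})}{A^{D}(x'')}, \\[4pt]
\frac{\pi^{J}(z'', x'')\, q^{J}(z^{(i)}, x^{(i)} \mid z'', x'')}
     {\pi^{J}(z^{(i)}, x^{(i)})\, q^{J}(z'', x'' \mid z^{(i)}, x^{(i)})}
&= \frac{\pi(x'')\, q(z'' \mid x'')\, q^{DU}(x^{(i)} \mid x'')\, q^{D}(z^{(i)} \mid x^{(i)})}
        {\pi(x^{(i)})\, q(z^{(i)} \mid x^{(i)})\, q^{DU}(x'' \mid x^{(i)})\, q^{D}(z'' \mid x'')} \\[4pt]
&= \frac{\pi(x'')\, \alpha^{D}_{\epsilon}(z^{(i)} \mid x^{(i)})}
        {\pi(x^{(i)})\, \alpha^{D}_{\epsilon}(z'' \mid x'')}
   \cdot \frac{A^{D}(x^{(i)})}{A^{D}(x'')}
   \cdot \frac{A^{D}(x'')}{A^{D}(x^{(i)})} \\[4pt]
&= \frac{\pi(x'')\, \alpha^{D}_{\epsilon}(z^{(i)} \mid x^{(i)})}
        {\pi(x^{(i)})\, \alpha^{D}_{\epsilon}(z'' \mid x'')}.
\end{align*}
```

Only evaluations of $\pi$ at $x^{(i)}$, $z^{(i)}$, $x''$, and $z''$ remain, so no integral has to be computed.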
RAM: Overall algorithm
↘ $x'$ ↗ $x''$ ↘ $z''$
Image credit: http://www.bestofthetetons.com/, http://blog.showmenaturephotography.com/
Each RAM iteration consists of four steps.
Steps 1–3: Generate a joint proposal $(z'', x'')$
1. (↘) Redraw $x' \sim q(\cdot \mid x^{(i)})$ until $u_1 < \alpha^{D}_{\epsilon}(x' \mid x^{(i)})$, where $u_1 \sim \text{Unif}(0, 1)$.
2. (↗) Redraw $x'' \sim q(\cdot \mid x')$ until $u_2 < \alpha^{U}_{\epsilon}(x'' \mid x')$, where $u_2 \sim \text{Unif}(0, 1)$.
3. (↘) Redraw $z'' \sim q(\cdot \mid x'')$ until $u_3 < \alpha^{D}_{\epsilon}(z'' \mid x'')$, where $u_3 \sim \text{Unif}(0, 1)$.
Step 4: Accept or reject the joint proposal $(z'', x'')$
4. Set $(z^{(i+1)}, x^{(i+1)}) = (z'', x'')$ if $u_4 < \alpha^{J}(z'', x'' \mid z^{(i)}, x^{(i)})$, where $u_4 \sim \text{Unif}(0, 1)$, and set $(z^{(i+1)}, x^{(i+1)}) = (z^{(i)}, x^{(i)})$ otherwise.
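The four steps can be put together as follows. This is a minimal sketch rather than the authors' implementation: the toy bimodal `target`, the names `forced_step`, `ram_step`, `ram_sample`, and `count_mode_jumps`, the proposal scale, and the choice to initialize $z^{(0)} = x^{(0)}$ are all assumptions made for this example.

```python
import math
import random

EPS = 1e-308  # the slides' epsilon: avoids 0/0 when the density underflows


def target(x):
    """Hypothetical unnormalized bimodal density with modes at -10 and 10."""
    return math.exp(-0.5 * (x + 10.0) ** 2) + math.exp(-0.5 * (x - 10.0) ** 2)


def forced_step(x, pi, sigma, rng, downhill):
    """Redraw y ~ N(x, sigma^2) until the downhill (or uphill) test accepts it."""
    while True:
        y = rng.gauss(x, sigma)
        num, den = (pi(x), pi(y)) if downhill else (pi(y), pi(x))
        if rng.random() < min(1.0, (num + EPS) / (den + EPS)):
            return y


def ram_step(x, z, pi, sigma, rng):
    """One RAM iteration on the joint state; returns the new (x, z)."""
    x1 = forced_step(x, pi, sigma, rng, downhill=True)    # step 1: x^(i) -> x'
    x2 = forced_step(x1, pi, sigma, rng, downhill=False)  # step 2: x'   -> x''
    z2 = forced_step(x2, pi, sigma, rng, downhill=True)   # step 3: x''  -> z''
    # Step 4: joint acceptance ratio, written as num / den.
    num = pi(x2) * min(1.0, (pi(x) + EPS) / (pi(z) + EPS))
    den = pi(x) * min(1.0, (pi(x2) + EPS) / (pi(z2) + EPS))
    # u < min(1, num / den) is checked without dividing, so den == 0 is safe.
    if rng.random() * den < num:
        return x2, z2
    return x, z


def ram_sample(pi, x0, sigma, n_iter, seed=0):
    """Run RAM for n_iter iterations and return the draws of x."""
    rng = random.Random(seed)
    x, z = x0, x0  # assumption: start the auxiliary variable at x0
    draws = []
    for _ in range(n_iter):
        x, z = ram_step(x, z, pi, sigma, rng)
        draws.append(x)
    return draws


def count_mode_jumps(draws):
    """Count sign changes, i.e. moves between the modes at -10 and 10."""
    return sum(1 for a, b in zip(draws, draws[1:]) if (a < 0) != (b < 0))


samples = ram_sample(target, x0=-10.0, sigma=10.0, n_iter=20000)
print("jumps between modes:", count_mode_jumps(samples))
```

Each iteration evaluates only the target density at a handful of points, which is why the method keeps the simplicity of a Metropolis update while costing more density evaluations per iteration.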
Example: Quasar Q0957+561 (Hainline et al., 2012)
Metropolis within Gibbs sampler for $p(\Delta, \beta, \theta \mid \text{Data})$
- At iteration $i$,
  Step 1: Sample $\Delta^{(i)} \sim p(\Delta \mid \theta^{(i-1)}, \text{Data})$
  Step 2: Sample $\theta^{(i)} \sim p(\theta \mid \Delta^{(i)}, \text{Data})$
- We use tempered transitions (TT) (Neal, 1996), Metropolis, or RAM to draw $\Delta$ in Step 1.
- We run 10 chains, each of length 150,000 with a burn-in of 50,000, spreading the 10 initial values of $\Delta$ across its space $[-1100, 1100]$.
- We set an arbitrarily large proposal scale ($\sigma = 400$), assuming modal locations are unknown.
- We account for the CPU time required by each algorithm by running longer chains and thinning them so that each chain has 100,000 samples.
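The two-step structure of the sampler can be illustrated on a toy problem. The sketch below is not the time delay model: `log_joint` is a hypothetical two-parameter density (bimodal in `delta`, Gaussian in `theta` given `delta`), chosen only so that Step 1 needs a Metropolis update while Step 2 can be drawn exactly. In the slides' setting, Step 1 is where TT, Metropolis, or RAM would be plugged in.

```python
import math
import random


def log_joint(delta, theta):
    """Hypothetical toy log-density: bimodal in delta, Gaussian in theta | delta."""
    mix = math.exp(-0.5 * (delta + 2.0) ** 2) + math.exp(-0.5 * (delta - 2.0) ** 2)
    return math.log(mix + 1e-308) - 0.5 * (theta - 0.1 * delta) ** 2


def metropolis_within_gibbs(n_iter, sigma, seed=0):
    """Alternate a Metropolis update of delta with an exact draw of theta."""
    rng = random.Random(seed)
    delta, theta = 0.0, 0.0
    draws = []
    for _ in range(n_iter):
        # Step 1: random-walk Metropolis update of delta given the current theta.
        proposal = rng.gauss(delta, sigma)
        log_ratio = log_joint(proposal, theta) - log_joint(delta, theta)
        if math.log(1.0 - rng.random()) < log_ratio:
            delta = proposal
        # Step 2: exact draw from p(theta | delta) = N(0.1 * delta, 1).
        theta = rng.gauss(0.1 * delta, 1.0)
        draws.append((delta, theta))
    return draws


draws = metropolis_within_gibbs(n_iter=5000, sigma=2.0)
```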
Example: Quasar Q0957+561 (cont.)
Algorithm    Total # of iterations of each chain    Avg. CPU time (seconds)    # of jumps between modes
TT           150,000                                3,083                      16
Metropolis   1,992,032                              3,085                      52
RAM          488,267                                3,080                      120
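Because the three runs use nearly the same CPU time, the jump counts can be compared directly; dividing by the CPU time makes the comparison explicit. The short calculation below uses only the numbers reported in the table (the dictionary layout is just a convenience for this example): RAM achieves about 2.3 times the jump rate of Metropolis and about 7.5 times that of TT.

```python
# (average CPU seconds, number of jumps between modes) from the table
results = {
    "TT": (3083, 16),
    "Metropolis": (3085, 52),
    "RAM": (3080, 120),
}

# Jumps between modes per CPU second for each algorithm.
rates = {name: jumps / cpu for name, (cpu, jumps) in results.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.4f} jumps per CPU second")

print("RAM vs Metropolis:", round(rates["RAM"] / rates["Metropolis"], 2))
print("RAM vs TT:", round(rates["RAM"] / rates["TT"], 2))
```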
Conclusion
Take-home messages
- Easy to implement (keep it in your MCMC tool box!)
- It is always possible to replace Metropolis with RAM for multimodality.
- Our extensive simulation studies will be reported in the final paper (some of them are on arXiv).
Future directions
- Theoretical convergence rate
- Possibly many (and better) down-up schemes, e.g., anti-Langevin (Christian P. Robert) or negative temperature (Art B. Owen)
- A global optimizer based on the down-up idea (analogous to annealing)
References
1. Hainline, L. J. et al. (2012) “A New Microlensing Event in the Doubly Imaged Quasar Q0957+561” The Astrophysical Journal, 744, 2, 104.
2. Møller, J., Pettitt, A. N., Reeves, R., Berthelsen, K. K. (2006) “An Efficient Markov Chain Monte Carlo Method for Distributions with Intractable Normalising Constants” Biometrika, 93, 2, 451–458.
3. Neal, R. M. (1996) “Sampling From Multimodal Distributions Using Tempered Transitions” Statistics and Computing, 6, 4, 353–366.
4. Tak, H., Meng, X.-L., van Dyk, D. A. (in progress) “A Repulsive-Attractive Metropolis Algorithm for Multimodality” arXiv 1601.05633
5. Tak, H., Mandel, K., van Dyk, D. A., Kashyap, V. L., Meng, X.-L., Siemiginowska, A. (in progress) “Bayesian Estimates of Astronomical Time Delays between Gravitationally Lensed Stochastic Light Curves” arXiv 1602.01462
6. Tierney, L. (1994) “Markov Chains for Exploring Posterior Distributions” The Annals of Statistics, 22, 4, 1701–1728.