A Repelling-Attracting Metropolis (RAM) Algorithm for Multimodalitycschafer/SCMA6/Tak.pdf · A Repelling-Attracting Metropolis (RAM) Algorithm for Multimodality Tak (Hyungsuk) Tak

A Repelling-Attracting Metropolis(RAM) Algorithm for Multimodality

Tak (Hyungsuk) Tak1, Xiao-Li Meng1, and David A. van Dyk2

Harvard University1 and Imperial College London2

International CHASC Astro-Statistics Collaboration

9 June 2016

1 / 13

MotivationTime delay estimation problem

Credit: NASA’s Goddard Space Flight Center

The strong gravitational field of a lensing galaxy splits light into two images.

I Light rays take different routes with different lengths.

I Difference between their arrival times → Time delay (∆)

I Time delay is used to infer cosmological parameters including Ho .

2 / 13

Motivation (cont.)

Modeling time delay → MCMC for multimodality

I Full posterior density function: π(∆,θ | Data) (Tak et al., arXiv).

I Metropolis within Gibbs sampler (Tierney, 1994)

I A multimodal (marginal) posterior of ∆ for Quasar Q0957+561

I Just 8 jumps out of a million iterations!

I Could we improve Metropolis’ ability to jump between modeswithout losing its simple-to-implement characteristic?

3 / 13

IdeaThere is a RAM on top of the mountain.How would this RAM move to the top of the other mountain?

Image credit: www.launsteinimagery.com, www.cliparts.com, www.iconfinder.com

4 / 13

Idea (cont.)1. Make a down-up movement in density to generate a proposal x ′′.

Image credit: http://www.bestofthetetons.com/, http://blog.showmenaturephotography.com/

2. Accept or reject x ′′ with probability min{

1, π(x′′)qDU(x (i)|x′′)π(x (i))qDU(x′′|x (i))

}.

Note: qDU is a down-up (DU) proposal density.5 / 13

RAM: Proposal

Two-step procedurex (i): Current state ↘ x ′: Intermediate proposal ↗ x ′′: Final proposal

1. (Downhill Metropolis) Generate x ′ ∼ q(· | x (i)) ∼ N(x (i), σ2) and

accept x ′ with probability αDε (x ′ | x (i)) = min

{1, π(x (i))+ε

π(x′)+ε

}.

Repeat this proposal step until one is accepted (forced Metropolis).

2. (Uphill Metropolis) Generate x ′′ ∼ q(· | x ′) and

accept x ′′ with probability αUε (x ′′ | x ′) = min

{1, π(x′′)+ε

π(x′)+ε

}Repeat this proposal step until one is accepted (forced Metropolis).

Note: ε = 10−308 to prevent a ratio of zeros (0/0).6 / 13

RAM: Acceptance/Rejection

Accept x ′′ with a Metropolis-Hastings acceptance probability

αDU(x ′′ | x (i)) = min

{1,

π(x ′′)qDU(x (i) | x ′′)π(x (i))qDU(x ′′ | x (i))

}= min

{1,

π(x ′′)∫q(x | x (i))αD

ε (x | x (i))dx

π(x (i))∫q(x | x ′′)αU

ε (x | x ′′)dx

}.

Is there a way to avoid calculating this intractable ratio?

If we explore an extended space with a correct marginal π(x), then therecan be a way to cancel this intractable ratio (Møller et al., 2006).

7 / 13

RAM: Auxiliary variable approach

An auxiliary variable z with πC(z | x) well-defined.

I Joint target density: πJ(z , x) = π(x)πC(z | x) = π(x)q(z (i) | x (i))

I Joint proposal density:

qJ(z ′′, x ′′ | z (i), x (i)) = q1(x ′′ | z (i), x (i))q2(z ′′ | x ′′, z (i), x (i))

= qDU(x ′′ | x (i))qD(z ′′ | x ′′)

Note: qD is a forced downhill kernel density.

I Joint acceptance probability:

αJ(z ′′, x ′′ | z (i), x (i)) = min

[1,

πJ(z ′′, x ′′)qJ(z (i), x (i) | z ′′, x ′′)

πJ(z (i), x (i))qJ(z ′′, x ′′ | z (i), x (i))

]

= min

1,π(x ′′) min{1, π(x(i))+ε

π(z(i))+ε}

π(x (i)) min{1, π(x′′)+επ(z′′)+ε

}

8 / 13

RAM: Overall algorithm

↘ x ′ ↗ x ′′ ↘ z ′′

Image credit: http://www.bestofthetetons.com/, http://blog.showmenaturephotography.com/

A RAM is composed of four steps in each iteration.

Steps 1–3: Generating a joint proposal (z ′′, x ′′)

1. (↘) Redraw x ′ ∼ q(· | x (i)) until u1 ∼ Unif(0, 1) < αDε (x ′ | x (i))

2. (↗) Redraw x ′′ ∼ q(· | x ′) until u2 ∼ Unif(0, 1) < αUε (x ′′ | x ′)

3. (↘) Redraw z ′′ ∼ q(· | x ′′) until u3 ∼ Unif(0, 1) < αDε (z ′′ | x ′′)

Step 4: Accept or reject the joint proposal (z ′′, x ′′)

4. Set (z (i+1), x (i+1)) = (z ′′, x ′′) if u4 < αJ(z ′′, x ′′ | z (i), x (i)), whereu4 ∼ Unif(0, 1), and set (z (i+1), x (i+1)) = (z (i), x (i)) otherwise.

9 / 13

Example: Quasar Q0957+561 (Hainline et al, 2012)

Metropolis within Gibbs sampler for p(∆,β,θ | Data)

I At iteration i ,

Step 1: Sample ∆(i) ∼ p(∆ | θ(i−1),Data)

Step 2: Sample θ(i) ∼ p(θ | ∆(i),Data)

I We use tempered transitions (TT) (Neal, 1996), Metropolis or RAMto draw ∆ in Step 1.

I We run 10 chains each of length 150,000 with 50,000 burn-in,spreading 10 initial values of ∆ across its space [−1100, 1100].

I We set an arbitrarily large proposal scale (σ = 400), assuming modallocations are unknown.

I We consider the CPU time required by each algorithm, running andthinning longer chains so that each chain has 100,000 samples.

10 / 13

Example: Quasar Q0957+561 (cont.)

Total # of iterations Avg. CPU time # of jumpsof each chain (seconds) between modes

TT 150,000 3,083 16Metropolis 1,992,032 3,085 52

RAM 488,267 3,080 120

11 / 13

Conclusion

Take-home messages

I Easy to implement (Keep it in your MCMC tool box!)

I Always possible to replace Metropolis with RAM for multimodality.

I Our extensive simulation studies will be reported in the final paper(some of them are on arXiv).

Future directions

I Theoretical convergence rate

I Possibly many (and better) down-up schemes, e.g., anti-Langevin(Christian P. Robert) or negative temperature (Art B. Owen)

I A global optimizer based on the down-up idea (analog to annealing)

12 / 13

Reference

1. Hainline, L. J. et al. (2012) “A New Microlensing Event in the Doubly Imaged QuasarQ0957+ 561” The Astrophysical Journal, 744, 2, 104.

2. Møller, J., Pettitt, A. N., Reeves, R., Berthelsen, K. K. (2006) “An Efficient Markov ChainMonte Carlo Method for Distributions with Intractable Normalising Constants” Biometrika,93, 2, 451–458.

3. Neal, R. M. (1996) “Sampling From Multimodal Distributions Using Tempered Transitions”Statistics and Computing, 6, 4, 353–366.

4. Tak, H., Meng, X.-L., van Dyk, D. A. (in progress), “A Repulsive-Attractive MetropolisAlgorithm for Multimodality.” arXiv 1601.05633

5. Tak, H., Mandel, K., van Dyk, D. A., Kashyap, V. L., Meng, X.-L., Siemiginowska, A. (inprogress), “Bayesian Estimates of Astronomical Time Delays between Gravitationally LensedStochastic Light Curves.” arXiv 1602.01462

6. Tierney, L. (2009) “Markov Chains for Exploring Posterior Distributions” The Annals ofStatistics, 22, 4, 1701–1728.

13 / 13

Documents

A Repelling-Attracting Metropolis (RAM) Algorithm for Multimodalitycschafer/SCMA6/Tak.pdf · A Repelling-Attracting Metropolis (RAM) Algorithm for Multimodality Tak (Hyungsuk) Tak