
Page 1: Nested sampling

Nested Sampling for General Bayesian Computation

Presented by WU Changye

12 February 2015

Page 2: Nested sampling


Nested Sampling

Posterior Simulation

Nested Sampling Termination and Size of N

Numerical Examples

Conclusion

Page 3: Nested sampling

Introduction

In the Bayesian paradigm, the parameter θ follows the prior distribution π, and the observations y follow the distribution L(y|θ) given θ. The posterior distribution f(θ|y), the distribution of θ given the observations y, then has the form

f(θ|y) = L(y|θ)π(θ) / ∫_Θ L(y|θ)π(θ) dθ

The objective of nested sampling is to compute the "evidence":

Z = ∫_Θ L(y|θ)π(θ) dθ
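Since Z is a prior expectation of the likelihood, the most naive estimate is a plain Monte Carlo average over prior draws. A minimal sketch on a toy Beta-Bernoulli model where Z is known in closed form (the model, the data n and k, and the sample size are all assumptions for illustration):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
n, k = 10, 3   # toy data: k successes in n Bernoulli trials (assumed example)

def L(theta):
    # Bernoulli likelihood of the data as a function of the success probability
    return theta**k * (1 - theta) ** (n - k)

theta = rng.uniform(size=1_000_000)    # draws from the Uniform(0,1) prior
Z_mc = L(theta).mean()                 # Z = E_pi[L(theta)]

# exact evidence: the Beta function B(k+1, n-k+1)
Z_exact = factorial(k) * factorial(n - k) / factorial(n + 1)
```

Plain prior sampling works here because the likelihood is not concentrated; the rest of the slides explain why this fails when it is.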

Page 4: Nested sampling

Since θ is a random variable,

Z = E_π(L(θ))

For simplicity, let L(θ) denote the likelihood L(y|θ). The cumulative distribution function of L(θ) is

F(λ) = ∫_{L(θ)<λ} π(θ) dθ

Define the measure µ on R induced by the likelihood function and the prior as follows:

µ(A) = P_π(L(θ) ∈ A)

Page 5: Nested sampling

Lemma 1: E_π(L(θ)) = E_µ(X).
Proof: Let g be the indicator function of a measurable set A in R. Then

E_π(g(L(θ))) = E_π(I_A(L(θ))) = ∫_{L(θ)∈A} π(θ) dθ

However, µ(dx) = ∫_Θ δ_{L(θ)}(dx) π(θ) dθ, so

E_µ(g(X)) = ∫_R I_A(x) µ(dx) = ∫_Θ ( ∫_R I_A(x) δ_{L(θ)}(dx) ) π(θ) dθ

Therefore,

E_µ(g(X)) = E_π(I_A(L(θ))) = E_π(g(L(θ)))

Page 6: Nested sampling

In the general case, let {g_n} be an increasing sequence of step functions converging to the identity function Id; then {g_n ∘ L} is an increasing sequence of step functions converging to L, and the desired conclusion follows by taking limits (monotone convergence).

Page 7: Nested sampling

Lemma 2: If X is a positive-valued random variable with p.d.f. f and c.d.f. F, then

∫_0^∞ (1 − F(x)) dx = ∫_0^∞ x f(x) dx = E(X).

Proof:

∫_0^∞ (1 − F(x)) dx = ∫_0^∞ (1 − P(X < x)) dx
                    = ∫_0^∞ P(X ≥ x) dx
                    = ∫_0^∞ ∫_x^∞ f(y) dy dx
                    = ∫_0^∞ f(y) ∫_0^y dx dy
                    = ∫_0^∞ y f(y) dy = E(X)

Page 8: Nested sampling

According to Lemmas 1 and 2,

Z = E_µ(X) = ∫_0^∞ x dF(x) = ∫_0^∞ (1 − F(x)) dx

Let ϕ^(-1)(x) = 1 − F(x) = P{θ : L(θ) > x}. Then

Z = ∫_0^∞ ϕ^(-1)(x) dx = ∫_0^1 ϕ(x) dx

Therefore, the evidence is represented as a one-dimensional integral.

Page 9: Nested sampling

To compute the integral

J = ∫_0^1 ϕ(x) dx

three sampling-based methods are available.

Page 10: Nested sampling

1) Importance sampling: for i = 1, ..., n, draw U_i ∼ U[0,1] and set

J_1 = (1/n) Σ_{i=1}^n ϕ(U_i)

2) Riemann approximation: for i = 1, ..., n, draw U_i ∼ U[0,1]; let U_(i) denote the order statistics of (U_1, ..., U_n), with U_(1) ≤ ... ≤ U_(n), and set

J_2 = Σ_{i=1}^{n−1} ϕ(U_(i)) (U_(i+1) − U_(i))

3) A more elaborate method: set x_0 = 1.
Step 1: for i = 1, ..., N, draw U_i^1 ∼ U[0,1] and set x_1 = max{U_1^1, ..., U_N^1}.
Step 2: for i = 1, ..., N, draw U_i^2 ∼ U[0,x_1] and set x_2 = max{U_1^2, ..., U_N^2}.
......
Step n: for i = 1, ..., N, draw U_i^n ∼ U[0,x_{n−1}] and set x_n = max{U_1^n, ..., U_N^n}. Then

J_3 = Σ_{i=1}^n ϕ(x_i) (x_{i−1} − x_i)
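The three estimators can be compared on a toy ϕ whose integral is known; ϕ(x) = e^(−10x), the sample size n, and the pair (N, number of steps) below are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
phi = lambda x: np.exp(-10 * x)        # a rapidly decreasing phi (assumed toy example)
J_exact = (1 - np.exp(-10)) / 10

n = 100_000
U = rng.uniform(size=n)

# 1) plain Monte Carlo average over uniforms
J1 = phi(U).mean()

# 2) Riemann approximation on the order statistics
Us = np.sort(U)
J2 = np.sum(phi(Us[:-1]) * np.diff(Us))

# 3) the third scheme: shrink x by taking maxima of N uniforms on [0, x]
N, steps = 100, 1500
x, J3 = 1.0, 0.0
for _ in range(steps):
    x_new = rng.uniform(0, x, size=N).max()
    J3 += phi(x_new) * (x - x_new)
    x = x_new
```

The third estimator is noisier here, but unlike the first two its points automatically concentrate where ϕ is still changing, which is what makes it usable when ϕ decreases over many orders of magnitude.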

Page 11: Nested sampling

Nested sampling uses the third method, because ϕ is a decreasing function and in many cases it decreases rapidly.

Figure: Graph of ϕ(x) and the trace of (x_i, ϕ(x_i))

Page 12: Nested sampling

First, we consider the distributions of x_1, ..., x_n. For u ∈ [0, 1],

P(x_1 < u) = P(U_1^1 < u, ..., U_N^1 < u) = Π_{i=1}^N P(U_i^1 < u) = u^N

As a result, the density function of x_1 is

f(x_1) = N x_1^(N−1)

By the same method, we have

f(x_k | x_{k−1}) = (N / x_{k−1}) (x_k / x_{k−1})^(N−1)
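The claim P(x_1 < u) = u^N is easy to check empirically, since x_1 is just the maximum of N uniforms; the value of N, the number of replicates, and the evaluation point are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 5, 200_000
x1 = rng.uniform(size=(n, N)).max(axis=1)   # x1 = max of N uniforms, n replicates

# empirical CDF at u should match u**N, and E[x1] should be N/(N+1)
u = 0.7
emp_cdf = (x1 < u).mean()
emp_mean = x1.mean()
```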

Page 13: Nested sampling

Let t_k = x_k / x_{k−1}. Then

P(t_k ≤ t) = ∫ P(x_k ≤ tx | x_{k−1} = x) f_{x_{k−1}}(x) dx
           = ∫ ∫_0^{tx} f_{x_k|x_{k−1}}(y|x) f_{x_{k−1}}(x) dy dx
           = ∫ ∫_0^{tx} (N/x) (y/x)^(N−1) f_{x_{k−1}}(x) dy dx
           = ∫ t^N f_{x_{k−1}}(x) dx = t^N

Besides,

P(t_k ≤ t | x_{k−1} = x) = P(x_k ≤ tx | x_{k−1} = x) = t^N

As a result, we have t_k ⊥ x_{k−1}.

Page 14: Nested sampling


Moreover, a point estimate for x_k can be written entirely in terms of point estimates for the t_k:

x_k = (x_k / x_{k−1}) × (x_{k−1} / x_{k−2}) × ··· × (x_1 / x_0) × x_0 = t_k · t_{k−1} ··· t_1 · x_0 = ( Π_{i=1}^k t_i ) · x_0

More appropriate to the large ranges common to many problems, log x_k becomes

log x_k = log( ( Π_{i=1}^k t_i ) · x_0 ) = Σ_{i=1}^k log t_i + log x_0

where the logarithmic shrinkage is distributed as

f(log t) = N e^(N log t),  log t ∈ (−∞, 0]

with mean and variance

E(log t) = −1/N,  V(log t) = 1/N²
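The stated moments follow from f(log t) = N e^(N log t), and a quick simulation confirms them (N and the number of replicates here are assumed values for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 10, 500_000
t = rng.uniform(size=(n, N)).max(axis=1)   # each t_k is the max of N uniforms on [0,1]
logt = np.log(t)

# expect E(log t) = -1/N and V(log t) = 1/N^2
mean_logt = logt.mean()
var_logt = logt.var()
```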

Page 15: Nested sampling


Taking the mean as the point estimate for each log t_i finally gives

log(x_k / x_0) = −k/N ± √k / N

Parameterizing x_k in terms of the shrinkage proves immediately advantageous: because the log t_i are independent, the errors in the point estimates tend to cancel and the estimates for the x_k grow increasingly more accurate with k. This yields the deterministic assignment

x_k = exp(−k/N)

Page 16: Nested sampling

Next, we consider the distribution of ϕ(X), where X ∼ U[0,1]. Consider the random variable X = ϕ^(-1)(L(θ)), where θ ∼ π. Notice that

ϕ^(-1): [0, L_max] → [0, 1],  λ ↦ P(L(θ) > λ)

For u ∈ [0, 1],

P(X < u) = P(ϕ^(-1)(L(θ)) < u)
         = P(L(θ) > ϕ(u))
         = ϕ^(-1)(ϕ(u))
         = u

This means that ϕ^(-1)(L(θ)) follows U[0,1] and ϕ(X) ∼ L(θ).

Page 17: Nested sampling

Consider now the truncated distribution

π̃(θ) ∝ { π(θ)  if L(θ) > L_0 ;  0  otherwise }

Let X_0 = ϕ^(-1)(L_0) and X = ϕ^(-1)(L(θ)), where θ ∼ π̃. For u ∈ [0, X_0],

P(X < u) = P(ϕ^(-1)(L(θ)) < u | L(θ) > L_0)
         = P(L(θ) > ϕ(u)) / P(L(θ) > L_0)
         = ϕ^(-1)(ϕ(u)) / X_0
         = u / X_0

so X ∼ U[0, X_0]. As a result, ϕ(X) ∼ L(θ), where X ∼ U[0, X_0] and θ ∼ π̃.

Page 18: Nested sampling


Algorithm
The algorithm based on the method discussed in the previous section is as follows:
– Iteration 1: sample N points θ_{1,i} independently from the prior π(θ), determine θ_1 = argmin_{1≤i≤N} L(θ_{1,i}), and set ϕ_1 = L(θ_1).
– Iteration 2: obtain the N current values θ_{2,i} by reproducing the θ_{1,i}'s, except for θ_1, which is replaced by a draw from the prior distribution π conditional upon L(θ) ≥ ϕ_1; then select θ_2 = argmin_{1≤i≤N} L(θ_{2,i}), and set ϕ_2 = L(θ_2).
– Iterate the above step until a given stopping rule is satisfied, for instance when observing very small changes in the approximation Ẑ or when reaching the maximal value of L(θ) when it is known.
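The iterations above can be sketched end to end on a 1-D toy model with a known evidence. Everything below is an assumption for illustration: the model (N(0,1) prior, N(θ,1) likelihood), the rejection-based constrained sampler (only viable in low dimension), and the tuning values N and steps:

```python
import numpy as np

rng = np.random.default_rng(4)
y = 2.0   # toy 1-D model (assumed): theta ~ N(0,1), y | theta ~ N(theta,1)

def L(theta):
    return np.exp(-0.5 * (y - theta) ** 2) / np.sqrt(2 * np.pi)

def draw_prior_above(Lmin):
    # rejection sampling from the prior subject to L(theta) > Lmin;
    # fine for this toy, a constrained MCMC move in general
    while True:
        cand = rng.standard_normal(1000)
        ok = cand[L(cand) > Lmin]
        if ok.size:
            return ok[0]

N, steps = 50, 400
live = rng.standard_normal(N)            # N points drawn from the prior
Z, x_prev = 0.0, 1.0
for k in range(1, steps + 1):
    i = int(np.argmin(L(live)))          # worst live point
    phi_k = L(live[i])
    x_k = np.exp(-k / N)                 # deterministic shrinkage estimate
    Z += phi_k * (x_prev - x_k)
    x_prev = x_k
    live[i] = draw_prior_above(phi_k)    # replace it under the constraint

Z_exact = np.exp(-y**2 / 4) / (2 * np.sqrt(np.pi))
```

The fixed step count stands in for a proper stopping rule, chosen large enough that the unaccumulated mass is negligible for this toy.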

Page 19: Nested sampling

The resulting evidence estimate is

Ẑ = Σ_{i=1}^J ϕ_i (x_{i−1} − x_i)

Page 20: Nested sampling

By-product of Nested Sampling

Skilling indicates that nested sampling provides simulations from the posterior distribution at no extra cost: "the existing sequence of points θ_1, θ_2, θ_3, ... already gives a set of posterior representatives, provided the i'th is assigned the appropriate importance ω_i L_i".

E_π(f(θ)|y) = ∫_Θ π(θ)L(θ)f(θ) dθ / ∫_Θ π(θ)L(θ) dθ

We can use a single run of nested sampling to obtain estimators of both the numerator and the denominator, the latter being the evidence Z. The estimator of the numerator is

Σ_{i=1}^j (x_{i−1} − x_i) ϕ_i f(θ_i)    (1)
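A sketch of this by-product on the same assumed 1-D Gaussian toy model (rejection-based constrained sampler and tuning values again purely illustrative): with f(θ) = θ, the exact posterior mean is y/2, so a single run should recover both Z and E(θ|y) ≈ 1.

```python
import numpy as np

rng = np.random.default_rng(5)
y = 2.0   # toy model (assumed): N(0,1) prior, N(theta,1) likelihood

def L(theta):
    return np.exp(-0.5 * (y - theta) ** 2) / np.sqrt(2 * np.pi)

def draw_prior_above(Lmin):
    while True:                          # rejection from the prior (1-D toy only)
        cand = rng.standard_normal(1000)
        ok = cand[L(cand) > Lmin]
        if ok.size:
            return ok[0]

N, steps = 50, 400
live = rng.standard_normal(N)
Z, num, x_prev = 0.0, 0.0, 1.0
for k in range(1, steps + 1):
    i = int(np.argmin(L(live)))
    theta_k, phi_k = live[i], L(live[i])
    x_k = np.exp(-k / N)
    w = phi_k * (x_prev - x_k)          # weight (x_{k-1} - x_k) * phi_k
    Z += w                              # denominator: the evidence
    num += w * theta_k                  # numerator of (1) with f(theta) = theta
    x_prev = x_k
    live[i] = draw_prior_above(phi_k)

post_mean = num / Z                     # estimate of E(theta | y) = y/2 = 1
```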

Page 21: Nested sampling

Lemma 3 (N. Chopin & C. P. Robert): Let f̄(l) = E_π{f(θ)|L(θ) = l} for l > 0. Then, if f̄ is absolutely continuous,

∫_0^1 ϕ(x) f̄(ϕ(x)) dx = ∫ π(θ)L(θ)f(θ) dθ

Proof: Let ψ: x ↦ x f̄(x). Then

∫ π(θ)L(θ)f(θ) dθ = E_π[ψ{L(θ)}]
                  = ∫_0^{+∞} P_π(ψ{L(θ)} > l) dl
                  = ∫_0^{+∞} ϕ^(-1)(ψ^(-1)(l)) dl = ∫_0^1 ψ(ϕ(x)) dx

Page 22: Nested sampling

Termination

The author suggests that

max(L_1, ..., L_N) X_j < f Ẑ_j  ⟹  termination

where f is some small fraction.
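In code, the rule says that even assigning the largest live likelihood to all the remaining prior mass would change the accumulated evidence by a negligible fraction; the function name and default f below are illustrative choices, not from the slides:

```python
def should_terminate(L_live_max, X_j, Z_j, f=1e-3):
    # stop when the maximal possible remaining contribution, L_live_max * X_j,
    # is below the fraction f of the accumulated evidence Z_j
    return L_live_max * X_j < f * Z_j
```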

Page 23: Nested sampling

Size of N

The larger N is, the smaller the variability of the approximation.

Page 24: Nested sampling

How to sample N points from the constrained parameter space

Use an MCMC method that constructs a Markov chain whose invariant distribution is the truncated distribution.

Page 25: Nested sampling

A decentred Gaussian example

The prior is

π(θ) = Π_{k=1}^d (1/√(2π)) exp(−(θ^(k))²/2)

and the likelihood is

L(y|θ) = Π_{k=1}^d (1/√(2π)) exp(−(y_k − θ^(k))²/2)

In this example, we can calculate the evidence analytically:

Z = ∫_{R^d} L(θ)π(θ) dθ = exp(−Σ_{k=1}^d y_k²/4) / (2^d π^(d/2))
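The closed form can be checked against a plain Monte Carlo average over prior draws; the dimension d and the values of y below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
y = np.array([1.0, 2.0])        # assumed data, d = 2
d = y.size

# Monte Carlo: Z = E_pi[L(theta)] with theta drawn from the N(0, I_d) prior
theta = rng.standard_normal((1_000_000, d))
Lvals = np.exp(-0.5 * np.sum((y - theta) ** 2, axis=1)) / (2 * np.pi) ** (d / 2)
Z_mc = Lvals.mean()

# the analytic evidence from the slide
Z_exact = np.exp(-np.sum(y**2) / 4) / (2**d * np.pi ** (d / 2))
```

For the decentred case (large |y|, e.g. y = 10), the same prior Monte Carlo collapses, which is exactly what the box-plots on the following slides illustrate.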

Page 26: Nested sampling

Figure: Graph of ϕ(x) and the trace of (xi , ϕ(xi )) with d = 1 and y = 10.

Page 27: Nested sampling

Figure: The prior distribution and the likelihood with d = 1 and y = 10.

Page 28: Nested sampling

Figure: box-plot of log Ẑ − log Z with d = 1 and y = 10, for nested sampling and Monte Carlo.

Page 29: Nested sampling

Figure: box-plot of log Ẑ − log Z with d = 5 and y = (3, 3, 3, 3, 3).

Page 30: Nested sampling

A Probit Model

We consider the arsenic dataset and a probit model studied in Chapter 5 of Gelman & Hill (2006). The observations are independent Bernoulli variables y_i such that P(y_i = 1|x_i) = Φ(x_i^T θ), where x_i is a vector of d covariates, θ is a parameter vector of size d, and Φ denotes the standard normal distribution function. In this particular example, d = 7.

Page 31: Nested sampling

The prior is

θ ∼ N(0, 10² I_d)

and the likelihood is

L(θ) = Π_{i=1}^n Φ(x_i^T θ)^{y_i} (1 − Φ(x_i^T θ))^{1−y_i}
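This likelihood is straightforward to code on the log scale; the standard normal CDF below is built from the error function, and the synthetic X and y are assumptions for illustration (the slides use the arsenic data, not reproduced here):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def probit_loglik(theta, X, y):
    # log L(theta) = sum_i [ y_i log Phi(x_i'theta) + (1-y_i) log(1 - Phi(x_i'theta)) ]
    p = np.array([Phi(e) for e in X @ theta])
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# tiny synthetic sanity check: at theta = 0, every p equals 1/2
rng = np.random.default_rng(7)
X = rng.standard_normal((8, 3))
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
ll0 = probit_loglik(np.zeros(3), X, y)
```

Working with log L rather than L is essential here, since the product of n Bernoulli terms underflows quickly.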

Page 32: Nested sampling

Figure: box-plot of log Ẑ with N = 20 for HMC and random-walk MCMC. The blue line marks the true value of log Z (Chib's method).

Page 33: Nested sampling

Posterior Samples

We use the Gaussian example to illustrate this result. Let f(θ) = exp(−3θ + 9d/2).

Figure: box-plot of the log-relative errors log Ẑ − log Z and log Ê(f) − log E(f)

Page 34: Nested sampling

Conclusion

– Nested sampling reverses the accepted approach to Bayesian computation by putting the evidence first.

– Nested sampling samples more sparsely from the prior in regions where the likelihood is low and more densely where the likelihood is high, resulting in greater efficiency than a sampler that draws directly from the prior.

– The procedure runs with an evolving collection of N points, where N can be chosen small for speed or large for accuracy.

– Nested sampling always reduces a multidimensional integral to the integral of a one-dimensional monotonic function, no matter how many dimensions θ occupies, and no matter how strange the shape of the likelihood function L(θ) is.

Page 35: Nested sampling

Problems

– How to generate N independent points in the constrained parameter space is an important problem. Techniques to do so effectively and efficiently may vary from problem to problem.

– Termination is another practical issue.

Page 36: Nested sampling

Thank you!
