
Advanced Lectures on Bayesian Analysis

Alan Heavens

Imperial Centre for Inference and Cosmology (ICIC), Imperial College, London

[email protected]

November 21, 2016

Alan Heavens (ICIC, Imperial College) Advanced Topics November 21, 2016 1 / 19


Overview

1 Model Comparison

2 Bayesian Evidence, or Marginal Likelihood
  - Bayesian Information Criterion (BIC)
  - Nested Models: Savage-Dickey Density Ratio
  - Gaussian Example


Model Comparison

A higher-level question than parameter inference, in which one wants to know which theoretical framework ('model') is preferred, given the data (regardless of the parameter values).

The models may be completely different (e.g. compare Big Bang with Steady State, to use an old example),

or variants of the same idea, e.g. comparing a simple cosmological model where the Universe is assumed to be flat with a more general model where curvature is allowed to vary.

The sort of question asked here is essentially 'Do the data favour a more complex model?'

Clearly, in the latter type of comparison the likelihood itself will be of no use: it will always increase if we allow more freedom.


Model Comparison

Assuming uninformative priors for the models (i.e. the same a priori probability), the probability of the models given the data is simply proportional to the Bayesian Evidence.

The Bayesian Evidence, or Marginal Likelihood, is the denominator in Bayes' theorem:

p(θ|x) = p(x|θ) π(θ) / p(x)

It is much more obvious if we include the model dependence as a condition:

p(θ|x, M) = p(x|θ, M) π(θ|M) / p(x|M)

The Bayesian Evidence normalises the posterior, so is

p(x|M) = ∫ dθ p(x|θ, M) π(θ|M)

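As an illustration, the evidence integral above can be evaluated by brute-force quadrature in one dimension. This is a hedged sketch with invented numbers (a single Gaussian datum with unit variance and a uniform prior on θ), not an example from the lectures:

```python
import numpy as np

# Hedged sketch (numbers invented): brute-force quadrature of the evidence
# p(x|M) = ∫ dθ p(x|θ,M) π(θ|M) for a toy 1D model: a single Gaussian datum
# x with unit variance, and a uniform prior on θ over [theta_min, theta_max].

def likelihood(theta, x, sigma=1.0):
    """Gaussian likelihood of a single datum x given mean theta."""
    return np.exp(-0.5 * ((x - theta) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def evidence(x, theta_min=-10.0, theta_max=10.0, n_grid=100_000):
    """Riemann-sum estimate of ∫ dθ L(θ) π(θ) with a normalised uniform prior."""
    theta = np.linspace(theta_min, theta_max, n_grid)
    prior = 1.0 / (theta_max - theta_min)   # normalised uniform prior density
    dtheta = theta[1] - theta[0]
    return np.sum(likelihood(theta, x) * prior) * dtheta

# The likelihood integrates to 1 over θ, and the prior density is 1/20, so
# the evidence here is ≈ 0.05.
print(evidence(0.0))
```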

Model Comparison

Figure: The Planck power spectrum, with the theoretical model with best-fitting cosmological parameters.


Bayesian Evidence or Marginal Likelihood

We denote two competing models by M and M′.

We denote by x the data vector, and by θ and θ′ the parameter vectors (of length n and n′).

Rule 1: Write down what you want to know.

Here it is p(M|x): the probability of the model, given the data.

Use Bayes' theorem:

p(M|x) = p(x|M) π(M) / p(x)


Bayesian evidence

The Bayesian Evidence is

p(x|M) = ∫ dθ p(x|θ, M) π(θ|M).

If a model has no parameters, then the integral is simply replaced by p(x|M).

The relative probability of two models is

p(M′|x) / p(M|x) = [π(M′) / π(M)] × [∫ dθ′ p(x|θ′, M′) π(θ′|M′)] / [∫ dθ p(x|θ, M) π(θ|M)].

With uninformative priors on the models, π(M′) = π(M), this ratio simplifies to the ratio of evidences, called the Bayes Factor,

B ≡ [∫ dθ′ p(x|θ′, M′) π(θ′|M′)] / [∫ dθ p(x|θ, M) π(θ|M)].

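To make the Bayes factor concrete, here is a minimal sketch with toy models and priors invented for illustration: M has a free Gaussian mean θ under a uniform prior, while the simpler M′ fixes θ = 0 and so has no parameter integral at all:

```python
import numpy as np

# Hedged sketch: the Bayes factor B = p(x|M')/p(x|M) for toy nested models
# of a single datum x (all numbers invented for illustration):
#   M : Gaussian likelihood with free mean theta, uniform prior on [-5, 5]
#   M': the same likelihood with theta fixed to 0 (no free parameters)

def gauss(x, mean, sigma=1.0):
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def bayes_factor(x, half_width=5.0, n_grid=100_000):
    """B = p(x|M')/p(x|M); M' has no parameters, so p(x|M') is its likelihood."""
    theta = np.linspace(-half_width, half_width, n_grid)
    dtheta = theta[1] - theta[0]
    prior = 1.0 / (2 * half_width)                    # normalised uniform prior
    ev_M = np.sum(gauss(x, theta) * prior) * dtheta   # ∫ dθ p(x|θ,M) π(θ|M)
    ev_Mprime = gauss(x, 0.0)                         # θ fixed: no integral
    return ev_Mprime / ev_M

# Data near θ = 0 favour the simpler model (B > 1: an Occam penalty on the
# prior volume of M); data far from 0 favour the extra freedom of M (B < 1).
print(bayes_factor(0.0), bayes_factor(4.0))
```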

Nested models

We assume that M′ is a simpler model, which has fewer parameters in it (n′ < n).

We further assume that it is nested in model M, i.e. the n′ parameters of model M′ are common to M, which has p ≡ n − n′ extra parameters in it. These extra parameters are fixed to fiducial values in M′.

Note that the more complicated model M will (if M′ is nested) inevitably lead to a higher likelihood (or at least as high), but the evidence may favour the simpler model if the fit is nearly as good, through the smaller prior volume.


Nested models

We assume uniform (and hence separable) priors in each parameter, over ranges ∆θ (or ∆θ′). Hence π(θ|M) = (∆θ1 . . . ∆θn)⁻¹ and

B = [∫ dθ′ p(x|θ′, M′) / ∫ dθ p(x|θ, M)] × (∆θ1 . . . ∆θn) / (∆θ′1 . . . ∆θ′n′).

Note that if the prior ranges are not large enough to contain essentially all the likelihood, then the position of the boundaries would influence the Bayes factor. In what follows, we will assume the prior range is large enough to encompass all the likelihood.

In the nested case, the ratio of prior hypervolumes simplifies to

(∆θ1 . . . ∆θn) / (∆θ′1 . . . ∆θ′n′) = ∆θn′+1 . . . ∆θn′+p,

where p ≡ n − n′ is the number of extra parameters in the more complicated model.


Bayesian evidence

Challenges: The evidence requires a multidimensional integration over the likelihood and prior, and this may be very expensive to compute. Options include:

Fisher-like approach: assume the likelihood is a multivariate Gaussian (Laplace approximation).

Nested sampling (MultiNest, PolyChord), where one tries to sample the likelihood in an efficient way.

Approximations: e.g. the AIC and BIC may be unreliable, as they are based on the best-fit χ², whereas from a Bayesian perspective we want to know how much of parameter space would give the data with high probability. They also do not include the prior, and are not Bayesian.


Bayesian Information Criterion (BIC)

With very constraining data, the likelihood L will be approximately Gaussian near a narrow peak at θ_max:

L ≃ L_max exp[−(θ − θ_max)_i F_ij (θ − θ_max)_j / 2]

The evidence

p(x) = ∫ dθ p(x|θ) π(θ)

then becomes

p(x) ≃ L_max π_max ∫ dθ exp[−(θ − θ_max)_i F_ij (θ − θ_max)_j / 2]


Bayesian Information Criterion (BIC)

In k parameter dimensions, this is

p(x) ≃ (2π)^(k/2) |F|^(−1/2) L_max π_max

ln p(x) ≃ ln L_max + ln π_max + (k/2) ln 2π − (1/2) ln |F|

Since ln |F| ∼ k ln N (N = number of data points) to leading order,

ln p(x) ≃ ln L_max − (k/2) ln N = −BIC/2.

So maximising the evidence is roughly equivalent to minimising the BIC, except that this holds only asymptotically, and assumes different models have the same π_max.

These assumptions often do not hold in practice.
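A minimal sketch of computing the BIC for a toy problem: fitting a constant mean to simulated Gaussian data with known σ. The data and all numbers are invented for illustration:

```python
import numpy as np

# Hedged sketch: BIC = -2 ln L_max + k ln N for the toy model mu = const
# fitted to N Gaussian data points with known sigma. Data are simulated;
# nothing here is specific to the lecture's examples.

rng = np.random.default_rng(1)
sigma = 1.0
data = rng.normal(0.3, sigma, size=100)

def max_log_like(x, sigma):
    """ln L_max for the model mu = const: the maximum is at the sample mean."""
    mu_hat = x.mean()
    return np.sum(-0.5 * ((x - mu_hat) / sigma) ** 2
                  - 0.5 * np.log(2 * np.pi * sigma**2))

def bic(x, sigma, k=1):
    """BIC = -2 ln L_max + k ln N, so ln p(x) ≈ -BIC/2 in this approximation."""
    return -2.0 * max_log_like(x, sigma) + k * np.log(len(x))

# Each extra parameter costs k ln N unless ln L_max improves enough to
# compensate — the asymptotic Occam penalty the BIC encodes.
print(bic(data, sigma, k=1))
```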


Bayesian Evidence: Nested Models and the Savage-Dickey Density Ratio

Let M0 and M1 be nested models, such that M0 is a subset of M1,e.g., where one of M1’s parameters is fixed to a particular value.

Some notation. Let the parameters for M0 be ψ, and those of M1 beψ, φ.

M0 has φ = φ0 (fixed).

Assume that all probabilities are continuous, so

lim_(φ→φ0) π1(ψ|φ) = a π0(ψ), i.e. π1(ψ|φ = φ0) = a π0(ψ).

The factor a is needed to ensure that the priors are normalised, which we absolutely have to have. E.g. if M1 has a 2D flat prior with ranges ∆θ1, ∆θ2, then

π1 = 1/(∆θ1 ∆θ2), π0 = 1/∆θ1,

so a = 1/∆θ2.


SDDR

The Bayes factor is

BF01 ≡ p(x|M0) / p(x|M1) = [∫ p0(x|ψ) π0(ψ) dψ] / [∫ p1(x|ψ, φ) π1(ψ, φ) dψ dφ]

With the continuity assumption, we then have

BF01 = [∫ p1(x|ψ, φ = φ0) π1(ψ, φ = φ0) dψ] / [∫ p1(x|ψ, φ) π1(ψ, φ) dψ dφ] = a p1(x|φ = φ0) / p1(x).

Now, using Bayes' theorem,

p1(x|φ = φ0) = p1(φ = φ0|x) p1(x) / π1(φ = φ0)


SDDR

Hence

BF01 = a p1(φ = φ0|x) / π1(φ = φ0).

This is the Savage-Dickey Density Ratio (SDDR). It looks very simple, but we need to think about how to use it, since the numerator is a posterior, not a likelihood.

However, if we have sampled the posterior, we can estimate the SDDR.

The denominator is easy if the prior has a simple functional form.

The numerator may be estimated from samples of the posterior in model 1: e.g. if f(∆φ, φ0) is the fraction of samples within (a small range) ∆φ of φ0, then p1(φ0|x) ≃ f(∆φ, φ0)/(2∆φ).
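The sample-based estimate of the numerator can be sketched as follows. As a check, the "posterior samples" here are drawn from a known Gaussian, so the true density at φ0 is available analytically; all numbers are invented for illustration:

```python
import numpy as np

# Hedged sketch: estimate p1(phi = phi0 | x) from posterior samples by
# counting the fraction f of samples within ±dphi of phi0 and dividing by
# the window width 2*dphi. For the check, the "posterior" samples come from
# a known Gaussian, so the true density at phi0 is analytic.

rng = np.random.default_rng(0)

def density_at(samples, phi0, dphi=0.05):
    """Histogram-style estimate: p1(phi0|x) ≈ f(dphi, phi0) / (2*dphi)."""
    f = np.mean(np.abs(samples - phi0) < dphi)   # fraction inside the window
    return f / (2.0 * dphi)

samples = rng.normal(1.0, 0.5, size=200_000)     # stand-in posterior for φ|x
est = density_at(samples, phi0=0.0)
true = np.exp(-0.5 * (0.0 - 1.0) ** 2 / 0.5**2) / (np.sqrt(2 * np.pi) * 0.5)
print(est, true)  # the estimate should be close to the analytic density
```

Dividing this density estimate by the prior density π1(φ0) (with the factor a where needed) then gives the SDDR estimate of the Bayes factor.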


Gaussian Example

In this Gaussian example, we can evaluate the integrals analytically.

Let M0 be x ∼ N(0, σ²), and M1 be x ∼ N(µ, σ²), where the prior on µ is Gaussian with variance Σ². Let the measurement be x = λσ. Then

p1(x|µ) = [1/(√(2π) σ)] e^(−(x−µ)²/(2σ²))

and

p1(µ|x) = p1(x|µ) π1(µ) / p1(x) = p1(x|µ) π1(µ) / ∫ p1(x|µ) π1(µ) dµ


Gaussian Example

Hence

BF01 = p1(µ = 0|x) / π1(µ = 0) = p1(x|µ = 0) / p1(x),

i.e.

BF01 = [ (1/(√(2π) σ)) e^(−x²/(2σ²)) ] / [ (1/(√(2π) σ)) (1/(√(2π) Σ)) ∫_(−∞)^(∞) e^(−(x−µ)²/(2σ²)) e^(−µ²/(2Σ²)) dµ ],

so

BF01 = √(1 + Σ²/σ²) exp[ −λ² / (2(1 + σ²/Σ²)) ]


Gaussian Example

BF01 = √(1 + Σ²/σ²) exp[ −λ² / (2(1 + σ²/Σ²)) ]

If λ ≫ 1, then BF01 ≪ 1 and M1 is favoured. If λ ≃ 1 and σ ≪ Σ, then M0 is favoured (Occam's razor). If the likelihood is much broader than the prior, σ ≫ Σ, then BF01 ≃ 1 and nothing has been learned.
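The closed-form result can be checked numerically. This sketch (with invented values of λ, σ and Σ) compares the formula against a direct quadrature of the two evidences:

```python
import numpy as np

# Hedged sketch: check the closed-form Bayes factor
#   BF01 = sqrt(1 + Σ²/σ²) exp[-λ² / (2 (1 + σ²/Σ²))]
# against brute-force evidences for M0: x ~ N(0, σ²) and
# M1: x ~ N(µ, σ²) with prior µ ~ N(0, Σ²). Parameter values are invented.

def bf01_analytic(lam, sigma, Sig):
    return np.sqrt(1.0 + (Sig / sigma) ** 2) * np.exp(
        -lam**2 / (2.0 * (1.0 + (sigma / Sig) ** 2)))

def bf01_numeric(lam, sigma, Sig, n_grid=400_000):
    x = lam * sigma
    mu = np.linspace(-20 * max(sigma, Sig), 20 * max(sigma, Sig), n_grid)
    dmu = mu[1] - mu[0]
    like = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    prior = np.exp(-0.5 * (mu / Sig) ** 2) / (np.sqrt(2 * np.pi) * Sig)
    p1 = np.sum(like * prior) * dmu                            # evidence of M1
    p0 = np.exp(-0.5 * lam**2) / (np.sqrt(2 * np.pi) * sigma)  # evidence of M0
    return p0 / p1

# λ = 0 with a wide prior favours M0 (Occam's razor); large λ favours M1.
print(bf01_analytic(2.0, 1.0, 3.0), bf01_numeric(2.0, 1.0, 3.0))
```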

Figure: The Bayes factor for a Gaussian likelihood (variance σ²) and a Gaussian prior (variance Σ²). The x axis is log10(Σ/σ); the y axis is the datum/σ. [Figure not reproduced.]


Summary

The Bayesian formalism is easily generalised to model comparison.

The resulting integrals over parameter space may be challenging to compute (see Fred's lectures).

Approximations such as the BIC may not be accurate.

Evidence ratios are sensitive to the prior, even asymptotically.

The SDDR may be useful for nested models.
