Computing Entropies with Nested Sampling

Brendon J. Brewer
Department of Statistics, The University of Auckland
https://www.stat.auckland.ac.nz/~brewer/


What is entropy?

Firstly, I'm talking about information theory, not thermodynamics (though the two are connected).



Information Theory

The fundamental theorem of information theory

Theorem: If you take the log of a probability, it seems like you understand profound truths.


Shannon entropy

Consider a discrete probability distribution with probabilities p = {p_i}. The Shannon entropy is

H(p) = -\sum_i p_i \log p_i    (1)

It is a real-valued property of the distribution.
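As a concrete illustration (mine, not from the slides), here is equation (1) in a few lines of Python; `shannon_entropy` is just an illustrative helper name:

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i, in nats.
    Terms with p_i = 0 contribute nothing (0 log 0 = 0)."""
    p = np.asarray(p, dtype=float)
    nonzero = p > 0
    return -np.sum(p[nonzero] * np.log(p[nonzero]))

print(shannon_entropy([1/3, 1/3, 1/3]))  # log(3) ≈ 1.0986 nats
```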


Relative entropy

Consider two discrete probability distributions with probabilities p = {p_i} and q = {q_i}. The relative entropy is

H(p; q) = -\sum_i p_i \log\left(\frac{p_i}{q_i}\right)    (2)

Without the minus sign, it's the 'Kullback-Leibler divergence', and is more fundamental than the Shannon entropy. With uniform q, it reduces to the Shannon entropy (up to an additive constant).
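And a matching sketch of equation (2), checking the uniform-q reduction numerically (again, an illustrative helper, not from the talk):

```python
import numpy as np

def relative_entropy(p, q):
    """H(p; q) = -sum_i p_i log(p_i / q_i), in nats (negative KL)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz] / q[nz]))

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]                     # uniform q over 3 outcomes
H_p = -np.sum([pi * np.log(pi) for pi in p])
print(relative_entropy(p, q))           # Shannon entropy of p ...
print(H_p - np.log(3))                  # ... minus the constant log(3)
```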


Entropy quantifies uncertainty

If there are just N equally likely possibilities, i.e., p_i = 1/N, then H = log N.

[Figure: uniform distribution over 3 values of x; H = 1.0986 nats.]

[Figure: uniform distribution over 21 values of x; H = 3.0445 nats.]

[Figure: the same number of equally likely values spread over a wider range of x; H = 3.0445 nats again.]

Heuristic: standard deviation quantifies uncertainty 'horizontally'; entropy does it 'vertically'.

What about densities?

We get ‘differential entropy’

H = -\int_{\text{all } x} f(x) \log f(x) \, dx    (3)

This generalises log-volume, as defined with respect to dx.

Important: Differential entropy is coordinate-system dependent.
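A standard worked example (not from the slides) makes this concrete: for a uniform density f(x) = 1/L on [0, L],

\[
H = -\int_0^L \frac{1}{L} \log \frac{1}{L} \, dx = \log L,
\]

and under a rescaled coordinate x' = cx the same distribution has H = \log(cL). The entropy depends on the coordinate system, not just the distribution.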


Some entropies in Bayesian statistics

Written in terms of parameters θ and data d, for Bayesian purposes.

1 Entropy of the prior for the parameters, H(θ)
2 Entropy of the conditional prior for the data, H(d|θ)
3 Entropy of the posterior, H(θ|d)
4 Entropy of the prior for the data, H(d)

Some entropies in Bayesian statistics

Remark: Conditional entropies such as (2) and (3) above are defined using an expectation over the second argument (the thing conditioned on).


Interpretation of conditional entropies

How uncertain would the question “what's the value of θ, precisely?” be if the question “what's the value of d, precisely?” were to be resolved?


Connections

Entropy of the joint prior:

H(θ, d) = H(θ) + H(d|θ)    (4)
        = H(d) + H(θ|d).   (5)

Mutual information:

I(θ; d) = H(θ) - H(θ|d).    (6)

This quantifies dependence, or more fundamentally, relevance, or the potential for learning. There are many other ways of expressing I.
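A minimal numerical check of identities (4)-(6) on a made-up 2×2 joint distribution (the array below is illustrative, not from the talk):

```python
import numpy as np

# Joint distribution p(theta, d) over two binary variables (illustrative).
joint = np.array([[0.30, 0.10],
                  [0.05, 0.55]])

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p_theta = joint.sum(axis=1)              # marginal p(theta)
p_d = joint.sum(axis=0)                  # marginal p(d)

H_joint = H(joint.ravel())
H_d_given_theta = H_joint - H(p_theta)   # (4) rearranged
H_theta_given_d = H_joint - H(p_d)       # (5) rearranged
I = H(p_theta) - H_theta_given_d         # (6)

print(I)                                 # mutual information, in nats
print(H(p_d) - H_d_given_theta)          # same value: I is symmetric
```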


Pre-data considerations

We might want to know how relevant the data is to the parameters, before learning the data. We might want to optimise that quantity for experimental design. But it's nasty, especially if there are nuisance parameters.


Hard integrals

E.g.

H(θ|d) = -\int p(d) \int p(θ|d) \log p(θ|d) \, dθ \, dd    (7)

       = -\int p(d) \int p(θ|d) \log\left[\frac{p(θ)\, p(d|θ)}{p(d)}\right] dθ \, dd    (8)

But p(d), sitting there inside a logarithm, is already supposed to be a hard integral (the marginal likelihood / evidence)...

p(d) = \int p(θ)\, p(d|θ) \, dθ    (9)


Hard integrals

Hard integrals with nuisance parameters η and interesting parameter(s) φ.


Marginal Likelihood Integral

Nested Sampling was invented in order to do this hard integral

p(d) = \int p(θ)\, p(d|θ) \, dθ    (10)

or

Z = \int \pi(θ)\, L(θ) \, dθ    (11)

where π = prior, L = likelihood. It's just an expectation. Why is it hard?


Simple Monte Carlo fails

Z = \int \pi(θ)\, L(θ) \, dθ    (12)

  ≈ \frac{1}{N} \sum_{i=1}^{N} L(θ_i)    (13)

with θ_i ∼ π will probably miss the tiny regions where L is high. Equivalently, π implies a very heavy-tailed distribution of L-values, and simple Monte Carlo fails.
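A quick demonstration of the failure mode, with illustrative numbers that are mine, not the talk's (a 20-dimensional Gaussian likelihood under a broad uniform prior):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 20, 10**5                       # dimension, number of MC samples

def log_L(theta):
    return -0.5 * np.sum(theta**2, axis=-1)   # Gaussian likelihood

theta = rng.uniform(-5.0, 5.0, size=(N, d))   # theta_i ~ pi (uniform prior)
log_Z_hat = np.log(np.mean(np.exp(log_L(theta))))

# True value: Z ≈ (sqrt(2*pi)/10)^d, since the Gaussian mass fits in the box.
log_Z_true = d * np.log(np.sqrt(2.0 * np.pi) / 10.0)
print(log_Z_hat, log_Z_true)   # estimate is typically far off, and wildly
                               # variable across seeds: a few lucky samples
                               # near the origin dominate the average
```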


Nested Sampling

Nested Sampling takes the original problem and constructs a 1D problem from it.

Z = \int_0^1 L(X) \, dX    (14)

where

X(ℓ) = \int_{L(θ) > ℓ} \pi(θ) \, dθ    (15)

The meaning of X

X(ℓ) is the amount of prior mass whose likelihood exceeds ℓ. As ℓ increases, X decreases.


Nested Sampling

Figure from Skilling (2006).

Since X(ℓ) is the upper-tail CDF of the L-values implied by π, points θ ∼ π have a uniform distribution over X.


Nested Sampling

The idea is to generate a sequence of points with increasing likelihoods, such that we can estimate their X-values. Since we know their L-values, we can then do the integral numerically.

Z = \int_0^1 L(X) \, dX    (16)


Nested Sampling algorithm

1 Generate N points from π.
2 Find the worst one (lowest likelihood L*), save it.
3 Estimate its X using the Beta(N, 1) distribution¹.
4 Record the worst particle and its X-value, then discard it.
5 Replace that point with a new one from π, but with likelihood above L*.
6 Repeat steps 2-5 indefinitely. (A runnable sketch follows this list.)

¹From order statistics.
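Below is a minimal Python sketch of steps 1-6, under two simplifying assumptions that are mine, not the talk's: the prior mass uses the deterministic approximation X_k = exp(-k/N) rather than Beta(N, 1) draws, and the constrained replacement in step 5 uses naive rejection sampling from the prior, which is only viable for toy problems (real implementations use MCMC).

```python
import numpy as np

def nested_sampling(log_likelihood, prior_sample, num_particles=100,
                    num_iterations=800, rng=None):
    """Toy Nested Sampling run; returns an estimate of ln Z."""
    rng = rng or np.random.default_rng(0)
    particles = [prior_sample(rng) for _ in range(num_particles)]   # step 1
    logl = np.array([log_likelihood(p) for p in particles])

    log_Z, X_prev = -np.inf, 1.0
    for k in range(1, num_iterations + 1):
        worst = int(np.argmin(logl))              # step 2: lowest likelihood
        L_star = logl[worst]
        X_k = np.exp(-k / num_particles)          # step 3: E[-ln X_k] = k/N
        log_Z = np.logaddexp(log_Z, L_star + np.log(X_prev - X_k))  # step 4
        X_prev = X_k
        while True:                               # step 5: constrained prior draw
            theta = prior_sample(rng)
            ll = log_likelihood(theta)
            if ll > L_star:
                break
        particles[worst], logl[worst] = theta, ll
    return log_Z       # steps repeated; leftover live-point mass ignored here

# Example: uniform "prior" on (-5, 5), narrow Gaussian likelihood.
# Then Z = ∫ (1/10) N(θ; 0, 0.1²) dθ ≈ 1/10, so ln Z should be near -2.30.
log_Z = nested_sampling(
    log_likelihood=lambda t: -0.5 * (t / 0.1)**2 - np.log(0.1 * np.sqrt(2 * np.pi)),
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0))
print(log_Z)
```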

The sequence of X-values

The sequence of X-values, once transformed, has a Poisson process distribution with rate N:

-ln(X_1) ∼ Exponential(N)    (17)
-ln(X_2) ∼ -ln(X_1) + Exponential(N)    (18)
-ln(X_3) ∼ -ln(X_2) + Exponential(N)    (19)

(forgive the notational abuse)


Poisson process view of NS

The number of NS iterations taken to enter a small region (defined by a likelihood threshold), divided by N, is an unbiased estimator of minus the log-probability of that region!

Also, π(θ) can be any distribution (needn't be a prior) and L(θ) any scalar function. Opportunities here...
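As a quick illustrative check of this property (my example, not from the talk), simulate the Beta(N, 1) shrinkage directly and compare the iteration count, divided by N, with minus the log-probability of the target region:

```python
import numpy as np

# With shrinkage ratios t_k ~ Beta(N, 1), count the iterations needed for
# X = t_1 t_2 ... t_k to fall below a target mass x0. The count divided
# by N should concentrate around -ln(x0).
rng = np.random.default_rng(1)
N, x0 = 100, 1e-3
counts = []
for _ in range(1000):
    log_X, k = 0.0, 0
    while log_X > np.log(x0):
        log_X += np.log(rng.beta(N, 1))   # one NS compression step
        k += 1
    counts.append(k)

print(np.mean(counts) / N)   # ≈ 6.9
print(-np.log(x0))           # ln(1000) ≈ 6.9
```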


My algorithm

To compute H(θ) = -\int f(θ) \log f(θ) \, dθ when f can be sampled but not evaluated:

1 Generate a 'reference point' θ_ref from f.
2 Do a Nested Sampling run with f as "prior" and minus the distance to θ_ref as "likelihood".
3 Measure how many NS iterations were needed to make the distance to θ_ref really small, and divide by N. That gives an unbiased estimate of the log-prob near θ_ref.
4 Repeat steps 1-3 many times.
5 Average the estimated log-probs, then apply corrections to convert to density. (A toy implementation follows this list.)
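Here is a toy 1-D implementation of steps 1-5, under assumptions that are mine: distances are plain absolute differences, the constrained NS step uses rejection sampling from f (viable only for easy problems; real runs would use MCMC), and the talk's final corrections are skipped, so this is an approximate sketch rather than the paper's estimator.

```python
import numpy as np

def estimate_entropy(sample_f, num_reference=100, num_particles=25,
                     r_small=0.01, rng=None):
    """Estimate H(f) for a 1-D distribution we can sample but not evaluate."""
    rng = rng or np.random.default_rng(0)
    log_densities = []
    for _ in range(num_reference):                      # step 4: many refs
        theta_ref = sample_f(rng)                       # step 1
        dist = np.abs(np.array([sample_f(rng) for _ in range(num_particles)])
                      - theta_ref)                      # step 2: NS particles
        iterations = 0
        while dist.max() > r_small:     # worst particle = largest distance
            worst = int(np.argmax(dist))
            while True:   # replacement by rejection from f (toy problems only)
                candidate = abs(sample_f(rng) - theta_ref)
                if candidate < dist[worst]:
                    break
            dist[worst] = candidate
            iterations += 1
        # step 3: iterations/N estimates -ln P(|theta - theta_ref| < r_small);
        # dividing that probability by the interval width 2*r_small gives a
        # density estimate near theta_ref.
        log_densities.append(-iterations / num_particles
                             - np.log(2.0 * r_small))
    return -np.mean(log_densities)      # step 5, without the corrections

# Example: standard normal; true H = 0.5 * ln(2*pi*e) ≈ 1.42 nats.
print(estimate_entropy(lambda rng: rng.normal()))
```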


Movie

Play movie.mkv (it's also on YouTube).


Toy experimental design example

Two observing strategies, even and uneven. Which is better for measuring a period?

[Figure: the true signal y(t) on t ∈ [0, 1], with the even and uneven data overlaid.]


Specifics

Let τ = log10(period).

I knew H(τ) because I chose the prior. I used the algorithm to estimate H(τ|d), so marginal posteriors were the distributions whose entropies I estimated².

I then computed the mutual information

I(τ; d) = H(τ) - H(τ|d)    (20)

²If you only care about one parameter, define your distance function in terms of that parameter only!


Result

I_even = 5.441 ± 0.038 nats    (21)
I_uneven = 5.398 ± 0.038 nats    (22)

i.e., any difference is trivial.


In practice...

When a period is short relative to observations, you can get a multimodal posterior pdf³, and 'learn a lot' by ruling out most of the space while still having many peaks.

I did not investigate this aspect of the problem; I assumed long periods.

³e.g., see Larry Bretthorst's work connecting the posterior pdf to the periodogram.


Paper/software/endorsement

http://www.mdpi.com/1099-4300/19/8/422
https://github.com/eggplantbren/InfoNest


Thanks

Ruth Angus (Flatiron Institute), Ewan Cameron (Oxford), James Curran (Auckland), Tom Elliott (Auckland), David Hogg (NYU), Kevin Knuth (SUNY Albany), Thomas Lumley (Auckland), Iain Murray (Edinburgh), John Skilling (Maximum Entropy Data Consultants), Jared Tobin (jtobin.io).
