Fundamentals of Bayesian Inference


Page 1: Fundamentals of Bayesian Inference

Fundamentals of Bayesian Inference

Page 2: Fundamentals of Bayesian Inference

Brief Introduction of Your Lecturer

I am working at the Psychological Methods Group at the University of Amsterdam.

For the past 10 years or so, one of my main research interests has been Bayesian statistics.

I have been promoting Bayesian inference in psychology, mainly through a series of articles, workshops, and one book.

Page 3: Fundamentals of Bayesian Inference

The Bayesian Book

…is a course book used at UvA and UCI.
…will appear in print soon.
…is freely available at http://www.bayesmodels.com (well, the first part is freely available).

Page 4: Fundamentals of Bayesian Inference
Page 5: Fundamentals of Bayesian Inference

August 12 - August 16, 2013, University of Amsterdam

Bayesian Modeling for Cognitive Science: A WinBUGS Workshop

http://bayescourse.socsci.uva.nl/

Page 6: Fundamentals of Bayesian Inference

Why Do Bayesian Modeling?

It is fun. It is cool. It is easy. It is principled. It is superior. It is useful. It is flexible.

Page 7: Fundamentals of Bayesian Inference

Our Goals This Afternoon Are…

To discuss some of the fundamentals of Bayesian inference.

To make you think critically about statistical analyses that you have always taken for granted.

To present clear practical and theoretical advantages of the Bayesian paradigm.

Page 8: Fundamentals of Bayesian Inference

Want to Know More About Bayes?

Page 9: Fundamentals of Bayesian Inference

Want to Know More About Bayes?

Page 10: Fundamentals of Bayesian Inference

Prelude

Eric-Jan Wagenmakers

Page 11: Fundamentals of Bayesian Inference

Three Schools of Statistical Inference

Neyman-Pearson: α-level, power calculations, two hypotheses, guide for action (i.e., what to do).

Fisher: p-values, one hypothesis (i.e., H0), quantifies evidence against H0.

Bayes: prior and posterior distributions, attaches probabilities to parameters and hypotheses.

Page 12: Fundamentals of Bayesian Inference

A Freudian Analogy

Neyman-Pearson: The Superego. Fisher: The Ego. Bayes: The Id.

Claim: What Id really wants is to attach probabilities to hypotheses and parameters. This wish is suppressed by the Superego and the Ego. The result is unconscious internal conflict.

Page 13: Fundamentals of Bayesian Inference

Internal Conflict Causes Misinterpretations

p < .05 means that H0 is unlikely to be true, and can be rejected.

p > .10 means that H0 is likely to be true.

For a given parameter μ, a 95% confidence interval from, say, a to b means that there is a 95% chance that μ lies in between a and b.

Page 14: Fundamentals of Bayesian Inference

Two Ways to Resolve the Internal Conflict

1. Strengthen Superego and Ego by teaching the standard statistical methodology more rigorously. Suppress Id even more!

2. Give Id what it wants.

Page 15: Fundamentals of Bayesian Inference

Sentenced by p-value

The Unfortunate Case of Sally Clark

Page 16: Fundamentals of Bayesian Inference

The Case of Sally Clark

Sally Clark had two children die of SIDS. The chances of this happening are perhaps as small as 1 in 73 million: 1/8543 × 1/8543.

Can we reject the null hypothesis that Sally Clark is innocent, and send her to jail? Yes, according to an expert for the prosecution, Prof. Meadow.
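
(Aside on the arithmetic: 1/8543 × 1/8543 = 1/72,982,849, i.e., roughly 1 in 73 million. Note that multiplying the two rates treats the two deaths as statistically independent, an assumption that was itself disputed.)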

Page 17: Fundamentals of Bayesian Inference

Prof. Roy Meadow, British Paediatrician

“Meadow attributed many unexplained infant deaths to the disorder or condition in mothers called Münchausen Syndrome by Proxy.”

“According to this diagnosis some parents, especially mothers, harm or even kill their children as a means of calling attention to themselves.” (Wikipedia)

Page 18: Fundamentals of Bayesian Inference

Meadow’s Law

“One cot death is a tragedy, two cot deaths is suspicious and, until the contrary is proved, three cot deaths is murder.”

Page 19: Fundamentals of Bayesian Inference

The Outcome

In November 1999, Sally Clark was convicted of murdering both babies by a majority of 10 to 2 and sent to jail.

Page 20: Fundamentals of Bayesian Inference

The Outcome

Note the similarity to p-value hypothesis testing. A very rare event occurred, prompting the legal system to reject the null hypothesis (“Sally is innocent”) and send Sally to jail.

Page 21: Fundamentals of Bayesian Inference

Critique

The focus is entirely on the low probability of the deaths arising from SIDS.

But what of the probability of the deaths arising from murder? Isn’t this probability just as relevant? How likely is it that a mother murders her two children?

Page 22: Fundamentals of Bayesian Inference

2002 Royal Statistical Society Open Letter

“The jury needs to weigh up two competing explanations for the babies’ deaths: SIDS or murder. The fact that two deaths by SIDS is quite unlikely is, taken alone, of little value. Two deaths by murder may well be even more unlikely. What matters is the relative likelihood of the deaths under each explanation, not just how unlikely they are under one explanation.”

President Peter Green to the Lord Chancellor

Page 23: Fundamentals of Bayesian Inference

What is the p-value?

“The probability of obtaining a test statistic at least as extreme as the one you observed, given that the null hypothesis is true.”

Page 24: Fundamentals of Bayesian Inference

The Logic of p-Values

The p-value only considers how rare the observed data are under H0.

The fact that the observed data may also be rare under H1 does not enter consideration.

Hence, the logic of p-values has the same flaw as the logic that led to the sentencing of Sally Clark.

Page 25: Fundamentals of Bayesian Inference

Adjusted Open Letter

“Researchers need to weigh up two competing explanations for the data: H0 or H1. The fact that data are quite unlikely under H0 is, taken alone, of little value. The data may well be even more unlikely under H1. What matters is the relative likelihood of the data under each model, not just how unlikely they are under one model.”

Page 26: Fundamentals of Bayesian Inference

What is Bayesian Inference? Why Be Bayesian?

Page 27: Fundamentals of Bayesian Inference

What is Bayesian Inference?

Page 28: Fundamentals of Bayesian Inference

What is Bayesian Inference?

“Common sense expressed in numbers”

Page 29: Fundamentals of Bayesian Inference

What is Bayesian Inference?

“The only statistical procedure that is coherent, meaning that it avoids statements that are internally inconsistent.”

Page 30: Fundamentals of Bayesian Inference

What is Bayesian Statistics?

“The only good statistics”

[For more background see

Lindley, D. V. (2000). The philosophy of statistics. The Statistician, 49, 293-337.]

Page 31: Fundamentals of Bayesian Inference

Outline

Bayes in a Nutshell
The Bayesian Revolution
Hypothesis Testing

Page 32: Fundamentals of Bayesian Inference

Bayesian Inference in a Nutshell

In Bayesian inference, uncertainty or degree of belief is quantified by probability.

Prior beliefs are updated by means of the data to yield posterior beliefs.
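
In symbols, with θ the unknown parameter and D the data, this updating follows Bayes’ rule:

p(θ | D) = p(D | θ) p(θ) / p(D)

That is, the posterior is the likelihood times the prior, normalized by the marginal probability of the data.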

Page 33: Fundamentals of Bayesian Inference

Bayesian Parameter Estimation: Example

We prepare for you a series of 10 factual questions of equal difficulty.

You answer 9 out of 10 questions correctly. What is your latent probability θ of answering any one question correctly?

Page 34: Fundamentals of Bayesian Inference

Bayesian Parameter Estimation: Example

We start with a prior distribution for θ. This reflects all we know about θ prior to the experiment. Here we make a standard choice and assume that all values of θ are equally likely a priori.

Page 35: Fundamentals of Bayesian Inference

Bayesian Parameter Estimation: Example

We then update the prior distribution by means of the data (technically, the likelihood) to arrive at a posterior distribution.

The posterior distribution is a compromise between what we knew before the experiment and what we have learned from the experiment. The posterior distribution reflects all that we know about θ.

Page 36: Fundamentals of Bayesian Inference

Posterior mode = 0.9; 95% credible interval: (0.59, 0.98).
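
For this example the posterior is available in closed form: a uniform Beta(1, 1) prior combined with 9 successes out of 10 yields a Beta(10, 2) posterior. A minimal sketch in Python (ours, not part of the original slides) reproduces the numbers above:

```python
from scipy.stats import beta

# Uniform Beta(1, 1) prior + 9 correct, 1 incorrect -> Beta(10, 2) posterior
posterior = beta(9 + 1, 1 + 1)

mode = (10 - 1) / (10 + 2 - 2)              # posterior mode = 0.9
interval = posterior.ppf([0.025, 0.975])    # central 95% credible interval
print(mode, interval)                       # 0.9, approx (0.59, 0.98)
```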

Page 37: Fundamentals of Bayesian Inference

The Inevitability of Probability

Why would one measure “degree of belief” by means of probability? Couldn’t we choose something else that makes sense?

Yes, perhaps we can, but the choice of probability is anything but ad-hoc.

Page 38: Fundamentals of Bayesian Inference

The Inevitability of Probability

Assume “degree of belief” can be measured by a single number.

Assume you are rational, that is, not self-contradictory or “obviously silly”.

Then degree of belief can be shown to follow the same rules as the probability calculus.

Page 39: Fundamentals of Bayesian Inference

The Inevitability of Probability

For instance, a rational agent would not hold intransitive beliefs, such as:

Bel(A) > Bel(B)

Bel(B) > Bel(C)

Bel(C) > Bel(A)

Page 40: Fundamentals of Bayesian Inference

The Inevitability of Probability

When you use a single number to measure uncertainty or quantify evidence, and these numbers do not follow the rules of probability calculus, you can (almost certainly?) be shown to be silly or incoherent.

One of the theoretical attractions of the Bayesian paradigm is that it ensures coherence right from the start.

Page 41: Fundamentals of Bayesian Inference

Coherence I

Coherence is also key in de Finetti’s conceptualization of probability.

Page 42: Fundamentals of Bayesian Inference

Coherence II

One aspect of coherence is that “today’s posterior is tomorrow’s prior”.

Suppose we have exchangeable (iid) data x = {x1, x2}. We can update our prior using x all at once, using first x1 and then x2, or using first x2 and then x1. All of these procedures will result in exactly the same posterior distribution.
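
A quick numerical check of this order-invariance, using a beta-binomial sketch of our own:

```python
def update(prior, x):
    """Combine a Beta(a, b) prior with one Bernoulli observation x."""
    a, b = prior
    return (a + x, b + 1 - x)

prior = (1, 1)                   # uniform Beta(1, 1)
x1, x2 = 1, 0                    # two exchangeable observations

seq_12 = update(update(prior, x1), x2)            # x1 first, then x2
seq_21 = update(update(prior, x2), x1)            # x2 first, then x1
batch = (1 + x1 + x2, 1 + (1 - x1) + (1 - x2))    # both at once
print(seq_12 == seq_21 == batch)                  # True: identical posteriors
```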

Page 43: Fundamentals of Bayesian Inference

Coherence III

Assume we have three models: M1, M2, M3. After seeing the data, suppose that M1 is 3 times more plausible than M2, and M2 is 4 times more plausible than M3.

By transitivity, M1 is 3 × 4 = 12 times more plausible than M3.

Page 44: Fundamentals of Bayesian Inference

Outline

Bayes in a Nutshell
The Bayesian Revolution
Hypothesis Testing

Page 45: Fundamentals of Bayesian Inference

The Bayesian Revolution

Until about 1990, Bayesian statistics could only be applied to a select subset of very simple models.

Only recently has Bayesian statistics undergone a transformation: with current numerical techniques, Bayesian models are “limited only by the user’s imagination.”

Page 46: Fundamentals of Bayesian Inference

The Bayesian Revolution in Statistics

Page 47: Fundamentals of Bayesian Inference

The Bayesian Revolution in Statistics

Page 48: Fundamentals of Bayesian Inference

Why Bayes is Now Popular

Markov chain Monte Carlo!

Page 49: Fundamentals of Bayesian Inference

Markov Chain Monte Carlo

Instead of calculating the posterior analytically, numerical techniques such as MCMC approximate the posterior by drawing samples from it.

Consider again our earlier example…

Page 50: Fundamentals of Bayesian Inference
Page 51: Fundamentals of Bayesian Inference

Posterior mode = 0.89; 95% credible interval: (0.59, 0.98).

With 9000 samples, almost identical to the analytical result.
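
To make the idea concrete, here is a toy random-walk Metropolis sampler for our θ example (our own sketch; the slides themselves used WinBUGS):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_posterior(theta):
    # Uniform prior and 9 correct out of 10: log of theta^9 * (1 - theta)
    if not 0.0 < theta < 1.0:
        return -np.inf
    return 9 * np.log(theta) + np.log(1 - theta)

samples, theta = [], 0.5
for _ in range(9000):
    proposal = theta + rng.normal(0, 0.1)      # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal                       # accept; otherwise keep theta
    samples.append(theta)

print(np.percentile(samples, [2.5, 97.5]))     # approx (0.59, 0.98)
```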

Page 52: Fundamentals of Bayesian Inference
Page 53: Fundamentals of Bayesian Inference

Want to Know More About MCMC?

Page 54: Fundamentals of Bayesian Inference

MCMC

With MCMC, the models you can build and estimate are said to be “limited only by the user’s imagination”.

But how do you get MCMC to work? Option 1: write the code yourself. Option 2: use WinBUGS/JAGS/STAN!

Page 55: Fundamentals of Bayesian Inference

Want to Know More About WinBUGS?

Page 56: Fundamentals of Bayesian Inference

Outline

Bayes in a Nutshell
The Bayesian Revolution
Hypothesis Testing

Page 57: Fundamentals of Bayesian Inference

Intermezzo

Confidence Intervals

Page 58: Fundamentals of Bayesian Inference

Frequentist Inference

Procedures are used because they do well in the long run, that is, in many situations.

Parameters are assumed to be fixed, and do not have a probability distribution.

Inference is pre-experimental or unconditional.

Page 59: Fundamentals of Bayesian Inference

Confidence Intervals

[Figure: a distribution with width 1 and mean μ]

Page 60: Fundamentals of Bayesian Inference

Confidence Intervals

[Figure: the same width-1 distribution, with draws x and y marked]

Draw a random number x. Draw another random number y. What is the probability that it will lie on the other side of μ?

Page 61: Fundamentals of Bayesian Inference

Confidence Intervals

[Figure: the interval between x and y]

When we repeat this procedure many times, the mean μ will lie in the interval in 50% of the cases. Hence, the interval (x, y) with y > x is a 50% confidence interval for μ.

Page 62: Fundamentals of Bayesian Inference

Confidence Intervals

But now you observe the following data:

[Figure: x and y lie almost the full width of the distribution apart]

Page 63: Fundamentals of Bayesian Inference

Confidence Intervals

Because x and y lie almost one full unit apart, and the width of the distribution is 1, I am 100% confident that the mean lies in this particular 50% confidence interval!
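
The example is easy to verify by simulation. A sketch of our own, assuming a uniform distribution of width 1 (the slides specify only the width, not the shape):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.7                                   # the "unknown" mean, fixed for simulation
x = rng.uniform(mu - 0.5, mu + 0.5, 100_000)
y = rng.uniform(mu - 0.5, mu + 0.5, 100_000)

lo, hi = np.minimum(x, y), np.maximum(x, y)
covered = (lo < mu) & (mu < hi)
print(covered.mean())                      # approx 0.50: the long-run coverage

# Condition on the data: when x and y are nearly one unit apart,
# mu must lie between them, so for those samples coverage is 100%.
wide = (hi - lo) > 0.95
print(covered[wide].mean())                # 1.0
```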

Page 64: Fundamentals of Bayesian Inference

Why?

Frequentist procedures have good pre-experimental properties and are designed to work well for most data.

For particular data, however, these procedures may be horrible.

For more examples, see the 1988 book by Berger & Wolpert, The Likelihood Principle.

Page 65: Fundamentals of Bayesian Inference

Bayesian Hypothesis Testing in Nested Models

[Pictured: Harold Jeffreys and Jeff Rouder]

Page 66: Fundamentals of Bayesian Inference

Bayesian Hypothesis Testing: Example

We prepare for you a series of 10 factual true/false questions of equal difficulty.

You answer 9 out of 10 questions correctly. Have you been guessing?

H0: θ = 0.5
H1: θ ≠ 0.5

Page 67: Fundamentals of Bayesian Inference

Bayesian Hypothesis Testing: Example

The Bayesian hypothesis test starts by calculating, for each model, the (marginal) probability of the observed data.

The ratio of these quantities is called the Bayes factor:

BF01 = p(D | H0) / p(D | H1)

Page 68: Fundamentals of Bayesian Inference

Bayesian Model Selection

p(M1 | D) / p(M2 | D) = [p(D | M1) / p(D | M2)] × [p(M1) / p(M2)]

Posterior odds = Bayes factor × Prior odds

Page 69: Fundamentals of Bayesian Inference

Bayesian Hypothesis Test

BF01 = (prob. of data under the null hypothesis) / (prob. of data under the alternative hypothesis)

The result is known as the Bayes factor. It solves the problem of disregarding H1!

Page 70: Fundamentals of Bayesian Inference

Guidelines for Interpretation of the Bayes Factor

BF        Evidence
1 – 3     Anecdotal
3 – 10    Substantial
10 – 30   Strong
30 – 100  Very strong
> 100     Decisive

Page 71: Fundamentals of Bayesian Inference

Bayesian Hypothesis Testing: Example

BF01 = p(D|H0) / p(D|H1)

When BF01 = 2, the data are twice as likely under H0 as under H1.

When, a priori, H0 and H1 are equally likely, this means that the posterior probability in favor of H0 is 2/3.
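
In general, with equal prior odds, p(H0 | D) = BF01 / (BF01 + 1); here that gives 2 / (2 + 1) = 2/3.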

Page 72: Fundamentals of Bayesian Inference

Bayesian Hypothesis Testing: Example

The complication is that these so-called marginal probabilities are often difficult to compute.

For this simple model, everything can be done analytically…

Page 73: Fundamentals of Bayesian Inference

Bayesian Hypothesis Testing: Example

BF01 = p(D|H0) / p(D|H1) = 0.107

This means that the data are 1/0.107 ≈ 9.3 times more likely under H1, the “no-guessing” model.

The associated posterior probability for H0 is about 0.10.

For more interesting models, things are never this straightforward!
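
For this particular model the two marginal probabilities can be checked directly: under H0, θ is fixed at 0.5; under H1, the uniform prior turns the marginal likelihood into a beta integral. A sketch of our own:

```python
from math import comb
from scipy.special import beta as beta_fn

p_D_H0 = comb(10, 9) * 0.5 ** 10                # theta fixed at 0.5
p_D_H1 = comb(10, 9) * beta_fn(9 + 1, 1 + 1)    # integral of theta^9 (1 - theta)
print(p_D_H0 / p_D_H1)                          # BF01 approx 0.107
print(p_D_H0 / (p_D_H0 + p_D_H1))               # p(H0 | D) approx 0.10
```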

Page 74: Fundamentals of Bayesian Inference

Savage-Dickey

When the competing models are nested (i.e., one is a more complex version of the other), Savage and Dickey have shown that, in our example, the Bayes factor reduces to a simple ratio:

Page 75: Fundamentals of Bayesian Inference

Savage-Dickey

Height of prior distributionat point of interest

Height of posterior distributionat point of interest

Page 76: Fundamentals of Bayesian Inference

Height of prior = 1; height of posterior = 0.107. Therefore, BF01 = 0.107 / 1 = 0.107.
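
The same check in Python, using the analytical Beta(10, 2) posterior from before (our sketch):

```python
from scipy.stats import beta, uniform

prior_height = uniform.pdf(0.5)           # uniform prior has height 1 at theta = 0.5
posterior_height = beta.pdf(0.5, 10, 2)   # posterior height at theta = 0.5
print(posterior_height / prior_height)    # BF01 approx 0.107
```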

Page 77: Fundamentals of Bayesian Inference

Advantages of Savage-Dickey

In order to obtain the Bayes factor you do not need to integrate out the model parameters, as you normally would.

Instead, you only need to work with the more complex model, and study the prior and posterior distributions of the parameter that is under test.

Page 78: Fundamentals of Bayesian Inference

Intermezzo

In-Class Exercise...

Page 79: Fundamentals of Bayesian Inference

Practical Problem

Dr. John proposes a Seasonal Memory Model (SMM), which quickly becomes popular.

Dr. Smith is skeptical, and wants to test the model.

The model predicts that the increase in recall performance due to the intake of glucose is more pronounced in summer than in winter.

Dr. Smith conducts the experiment…

Page 80: Fundamentals of Bayesian Inference

Practical Problem

And finds the following results: for these data, t = 0.79, p = .44.

Note that, if anything, the result goes in the direction opposite to that predicted by the model.

Page 81: Fundamentals of Bayesian Inference

Practical Problem

Dr. Smith reports his results in a paper entitled “False Predictions from the SMM: The impact of glucose and the seasons on memory”, which he submits to the Journal of Experimental Psychology: Learning, Memory, and Seasons.

After some time, Dr. Smith receives three reviews, one signed by Dr. John, who invented the SMM. Part of this review reads:

Page 82: Fundamentals of Bayesian Inference

Practical Problem

“From a null result, we cannot conclude that no difference exists, merely that we cannot reject the null hypothesis. Although some have argued that with enough data we can argue for the null hypothesis, most agree that this is only a reasonable thing to do in the face of a sizeable amount of data [which] has been collected over many experiments that control for all concerns. These conditions are not met here. Thus, the empirical contribution here does not enable readers to conclude very much, and so is quite weak (...).”

Page 83: Fundamentals of Bayesian Inference

Practical Problem

How can Dr. Smith analyze his results and quantify exactly the amount of support in favor of H0 versus the SMM?

Page 84: Fundamentals of Bayesian Inference

Helping Dr. Smith

Recall, however, that in the case of Dr. Smith the alternative hypothesis was directional; SMM predicted that the increase should be larger in summer than in winter – the opposite pattern was obtained.

We’ll develop a test that can handle directional hypotheses, and some other stuff also.

Page 85: Fundamentals of Bayesian Inference

WSD t-test

WSD stands for WinBUGS Savage-Dickey t-test.

We’ll use WinBUGS to get samples from the posterior distribution for parameter δ, which represents effect size.

We’ll then use the Savage-Dickey method to obtain the Bayes factor. Our null hypothesis is always δ = 0.
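
A minimal sketch of this recipe in Python, with a hand-rolled Metropolis sampler standing in for WinBUGS and made-up difference scores (these are NOT Dr. Smith’s data; the priors below are our assumptions for illustration):

```python
import numpy as np
from scipy.stats import cauchy, norm, gaussian_kde

rng = np.random.default_rng(42)

# Hypothetical winter-minus-summer difference scores, for illustration only.
d = np.array([-0.3, 0.5, -0.8, 0.2, -0.1, 0.4, -0.6, 0.1])

def log_post(delta, log_sigma):
    sigma = np.exp(log_sigma)
    return (cauchy.logpdf(delta)                 # Cauchy(0, 1) prior on effect size
            + cauchy.logpdf(sigma) + log_sigma   # half-Cauchy prior on sigma (+ Jacobian)
            + norm.logpdf(d, delta * sigma, sigma).sum())

theta, chain = np.array([0.0, 0.0]), []          # (delta, log sigma)
for _ in range(20_000):
    proposal = theta + rng.normal(0, 0.2, size=2)
    if np.log(rng.uniform()) < log_post(*proposal) - log_post(*theta):
        theta = proposal
    chain.append(theta[0])

posterior_at_0 = gaussian_kde(chain[2000:])(0.0)[0]   # posterior height at delta = 0
print(posterior_at_0 / cauchy.pdf(0.0))               # Savage-Dickey BF01
```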

Page 86: Fundamentals of Bayesian Inference

Graphical Model for the One-Sample t-test

Page 87: Fundamentals of Bayesian Inference
Page 88: Fundamentals of Bayesian Inference

WSD t-test

The t-test can be implemented in WinBUGS. We get close correspondence to Rouder et al.’s analytical t-test.

The WSD t-test can also be implemented for two-sample tests (in which the two groups can also have different variances).

Here we’ll focus on the problem that still plagues Dr. Smith…

Page 89: Fundamentals of Bayesian Inference

Helping Dr. Smith (This Time for Real)

The Smith data (t = 0.79, p = .44)

This is a within-subject experiment, so we can subtract the recall scores (winter minus summer) and see whether the difference is zero or not.

Page 90: Fundamentals of Bayesian Inference

NB: Rouder’s test also gives BF01 = 6.08

Dr. Smith Data: Non-directional Test

Page 91: Fundamentals of Bayesian Inference

Dr. Smith Data: SMM Directional Test

This means that p(H0|D) is about 0.93 (in case H0 and H1 are equally likely a priori)

Page 92: Fundamentals of Bayesian Inference

Dr. Smith Data: Directional Test Inconsistent with SMM’s prediction

Page 93: Fundamentals of Bayesian Inference

Helping Dr. Smith (This Time for Real)

According to our t-test, the data are about 14 times more likely under the null hypothesis than under the SMM hypothesis.

This is generally considered strong evidence in favor of the null.

Note that we have quantified evidence in favor of the null, something that is impossible in p-value hypothesis testing.

Page 94: Fundamentals of Bayesian Inference

Practical Consequences: A Psych Science Example

Experiment 1: “participants who were unobtrusively induced to move in the portly way that is associated with the overweight stereotype ascribed more stereotypic characteristics to the target than did control participants, t(18) = 2.1, p < .05.”

NB. The Bayes factor is 1.59 in favor of H1 (i.e., “only worth a bare mention”)
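
This value can be checked against Rouder et al.’s (2009) analytical one-sample JZS Bayes factor. The sketch below is our own implementation of their formula and should approximately reproduce the reported value (t(18) = 2.1 implies N = 19):

```python
import numpy as np
from scipy.integrate import quad

def jzs_bf01(t, N):
    # JZS Bayes factor: Cauchy prior on effect size under H1.
    v = N - 1                                    # degrees of freedom
    null = (1 + t**2 / v) ** (-(v + 1) / 2)
    def integrand(g):
        return ((1 + N * g) ** -0.5
                * (1 + t**2 / ((1 + N * g) * v)) ** (-(v + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))
    alt, _ = quad(integrand, 0, np.inf)
    return null / alt

print(1 / jzs_bf01(t=2.1, N=19))                 # BF10; the slide reports about 1.59
```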

Page 95: Fundamentals of Bayesian Inference

Practical Consequences: A Psych Science Example

Experiment 2: “participants who were induced to engage in slow movements that are stereotypic of the elderly judged Angelika [a hypothetical person described in ambiguous terms - EJ] to be more forgetful than did control participants, t(35) = 2.1, p < .05.”

NB. The Bayes factor is 1.52 in favor of H1 (i.e., “only worth a bare mention”).

Page 96: Fundamentals of Bayesian Inference

Practical Consequences: A Psych Science Example

The author has somehow obtained a p-value smaller than .05 in both experiments.

Unfortunately (for the field), this constitutes little evidence in favor of the alternative hypothesis.

Note also that the prior plausibility of the alternative hypothesis is low. Extraordinary claims require extraordinary evidence!

Page 97: Fundamentals of Bayesian Inference

Empirical Comparison

In 252 articles, spanning 2394 pages, “we” found 855 t-tests.

This translates to an average of one t-test for every 2.8 pages, or about 3.4 t-tests per article.

Details in Wetzels et al., 2011, Perspectives on Psychological Science.

Page 98: Fundamentals of Bayesian Inference
Page 99: Fundamentals of Bayesian Inference

Main Problem

Page 100: Fundamentals of Bayesian Inference

Can People Look into the Future?

Intentions are posted online: design, intended analyses, the works.

Page 101: Fundamentals of Bayesian Inference

Optional Stopping is Allowed

“It is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience.”

Edwards, Lindman, & Savage, 1963, Psych Rev.

Page 102: Fundamentals of Bayesian Inference
Page 103: Fundamentals of Bayesian Inference

“Inside every Non-Bayesian, there is a Bayesian struggling to get out.”

Dennis Lindley