
Introduction to Bayesian Inference
September 8th, 2008

Reading: Gill Chapter 1-2


Phases of Statistical Analysis

1. Initial Data Manipulation
   - Assembling data
   - Checks of data quality - graphical and numeric

2. Preliminary Analysis: Clarify Directions for Analysis
   Identifying Data Structure:
   - What is to be regarded as an “individual”?
   - Are individuals grouped or associated?
   - Types of variables measured on each individual?
   - Response versus Explanatory Variables
   - Are any observations missing?

3. Definitive Analysis – Formulation of Model, Fitting, Checking (EDA) . . .

4. Presentation of Conclusions


Styles of Analysis

Descriptive Methods
   - Graphical
   - Numerical summaries

Probabilistic Methods
   - probabilistic properties of estimates (Sampling Distribution)
   - probability model for observed data (Likelihood)
   - probability model for quantifying prior uncertainty (Bayesian)

Bayesian analysis provides a formal approach for updating prior information with the observed data (likelihood) to quantify uncertainty a posteriori


Analysis Goals

Estimation of uncertain quantities (parameters)

Prediction of future events

Tests of hypotheses

Making Decisions

Theoretical framework based on probability models


Probability: Measurement of Uncertainty

Counting: equally likely outcomes (card and dice games, sampling)

Long-Run Frequency: Hurricane in September? Relative frequency from “similar” years

Subjective: measure of belief, judgment
   - applies to unique events – probability of terrorist attacks
   - expert opinion
   - computer simulations

Bayesian methods allow the use of subjective information (if available)


Steps for Bayesian Inference

Quantify prior knowledge using a probability distribution (Prior Distribution)

Observe event or data

Update prior beliefs (probabilities) with new data

Result is a Posterior Distribution

Formal updating rule given by Bayes’ Theorem


Probability

Events A, B, and B^C

B ∩ B^C = ∅ (1)

P(B ∪ B^C) = 1 (2)

Conditional Probability

P(A | B) ≡ P(A ∩ B) / P(B)   if P(B) ≠ 0 (3)


Bayes’ Theorem

Bayes’ Theorem reverses the conditioning: what is the probability of B given that A has occurred?

P(B | A) = P(A ∩ B) / P(A) (4)

         = P(A | B) P(B) / P(A) (5)

Law of Total Probability:

P(A) = P(A | B) P(B) + P(A | B^C) P(B^C)


Twin Example

Twins are either identical (single egg that splits) or fraternal (two eggs)

Sexes: Girl/Girl GG, Boy/Boy BB, Girl/Boy GB or Boy/Girl BG

GG or BB twins could be identical (I) or fraternal (F )

Given that I have GG twins, what is the probability that they are identical: P(I | GG)?

It is easy to calculate the probability the other way: P(GG | I)


Identical Twins

P(I | GG) = P(GG | I) P(I) / [P(GG | I) P(I) + P(GG | F) P(F)]

P(I) is the overall probability of having identical twins (among pregnancies with twins). Real-world data shows that about 1/3 of all twins are identical twins. This is our prior information.

If the twins are identical, they are either two boys or two girls. It is reasonable to assume that the two cases are equally probable (based on real-world experience): P(GG | I) = 1/2

In the case of non-identical twins, the four possible outcomes are assumed to be equally likely, so P(GG | F) = 1/4


Result

Substituting the probabilities

P(GG) = 1/2 × 1/3 + 1/4 × 2/3 = 1/3 (6)

P(I | GG) = P(GG | I) P(I) / P(GG) = (1/2 × 1/3) / (1/3) = 1/2 (7)

The probability that the twins are identical, given that they are both girls, is 1/2

The result combines experimental data (two girls) with prior information (1/3 of all twins are identical twins).
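
The same calculation can be checked in R. A minimal sketch (the variable names are illustrative, not from the slides):

prior.I = 1/3                               # prior: about 1/3 of twin pregnancies are identical
prior.F = 2/3
p.GG.I = 1/2                                # P(GG | I)
p.GG.F = 1/4                                # P(GG | F)
p.GG = p.GG.I * prior.I + p.GG.F * prior.F  # law of total probability: 1/3
p.GG.I * prior.I / p.GG                     # Bayes' Theorem: P(I | GG) = 1/2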


Probability Models

At least Two Physical Meanings for Probability Models

1. Random Sampling Model

well-defined Population

individuals in the sample are drawn from the population in a “random” way

could conceivably measure the entire population

probability model specifies properties of the full population


Probability Models

2. Observations are made on a system subject to random fluctuations

probability model specifies what would happen if, hypothetically, observations were repeated again and again under the same conditions.

Population is hypothetical


Mathematical versus Statistical Models

Flipping a Coin: Given enough physics and accurate measurements of initial conditions, could the outcome of a coin toss be predicted with certainty? Deterministic model.

Errors:

Wrong assumptions in “mathematical model”

Simplification in assumptions

⇒ seemingly random outcomes

All models are wrong, but some are more useful than others!


Probability Distributions for the Random Outcomes

Parametric Probability Models:

Y_i ∼ f(y, θ), i.i.d., for i = 1, . . . , n

Bernoulli/Binomial

Multinomial

Poisson

Normal

Lognormal

Exponential

Gamma

Models almost always have unknown parameters


Likelihood

Assume a parametric model Y ∼ f(y|θ)

Likelihood L(θ | y_1, . . . , y_n) = ∏_i f(y_i | θ)

For each value of θ, the likelihood says how well that value of θ explains the observed data

Maximum Likelihood Estimates θ̂

Interval Estimates/Testing via Likelihood Ratios

Quantification: Coverage Probabilities or Type I or Type II errors requires distribution theory

Exact sampling distributions or Asymptotic approximations


Exponential Example: Rate β

The time between accidents is modeled as exponential with rate β per day. Observations:
1.5 15.0 60.3 30.5 2.8 56.4 27.0 6.4 110.7 25.4

f(y | β) = β exp(−yβ),  y > 0

L(β; y_1, . . . , y_n) = ∏_i β exp(−y_i β) = β^n exp(−β ∑_i y_i)

Look at plot of L(β) for ∑ y_i = 336.


Likelihood function

l.exp = function(theta, y) {
  # likelihood of an exponential sample: theta^n * exp(-sum(y) * theta)
  n = length(y)
  sumy = sum(y)
  l = theta^n * exp(-sumy * theta)
  return(l)
}

Vectorized: can be used to evaluate L(θ) at multiple values of θ rather than using a loop

beta = seq(.00001, .25, length=1000)
l.exp(beta, y)
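
The slides do not show the assignment of y; a sketch using the observations listed earlier (y is the name assumed by l.exp(theta, y)). Since the likelihood values are on the order of 1e-20, the log-likelihood is often used instead to avoid numerical underflow:

y = c(1.5, 15.0, 60.3, 30.5, 2.8, 56.4, 27.0, 6.4, 110.7, 25.4)   # sum(y) = 336
# equivalent log-likelihood, safer numerically
loglik.exp = function(theta, y) length(y) * log(theta) - sum(y) * theta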


Plot of Likelihood

[Figure: plot of the exponential likelihood L(β) against β for β between 0 and 0.25]


Adding Greek and Math Symbols to Plots

In the plot command I used

xlab=expression(beta), ylab=expression(L(beta)))

If the ’text’ argument to one of the text-drawing functions (’text’, ’mtext’, ’axis’, ’titles’, x- and y-axis labels) in R is an expression, the argument is interpreted as a mathematical expression and the output will be formatted according to TeX-like rules. For lots of examples see:

demo(plotmath)
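
Putting the pieces together, the full plot call was presumably something like this sketch (assuming beta, y, and l.exp from the earlier slides):

plot(beta, l.exp(beta, y), type="l",
     xlab=expression(beta), ylab=expression(L(beta)),
     main="Exponential Likelihood")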


Estimate of β

We can find the value of β that maximizes the likelihood (or equivalently the log of the likelihood)

β̂ = 0.03

> ll = l.exp(beta, y)
> beta[ll == max(ll)]

Likelihood Ratio Λ(β) = L(β)/L(β̂)

Interval Estimates: Set of β such that Λ(β) > r

r = 0.1 or 0.2 gives β values that are not “too unlikely”, that is, within 10% or 20% of the maximum.


Interval Estimate

[Figure: plot of the exponential likelihood ratio Λ(β) against β for β between 0 and 0.25]


Intervals & Adding Filled Areas

> ll = ll/max(ll)   # LR
> # index of values LR > .1
> set = ll >= .1
> # min and max values of beta LR > .1
> llint = c(min(beta[set]), max(beta[set]))
[1] 0.014 0.055
> # add a filled polygon to plot
> polygon(c(beta[set], llint[c(2,1)]),
+         c(ll[set], 0, 0), col="orange", density=NA)


Coverage

How confident are we that the interval covers the true β?

The likelihood, and hence the interval, is a function of the data Y

Need the sampling distribution of Y

Need to work out theoretical probabilities that intervals will cover the parameter.

For large n, use the CLT to construct intervals.

Which intervals (symmetric, or LR)?

More later in STA 213
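
As a rough sketch of a CLT-based interval for this example (not from the slides): the MLE is β̂ = n/∑ y_i, and the usual asymptotic approximation from the Fisher information gives standard error β̂/√n.

n = length(y)
beta.hat = n / sum(y)               # MLE, about 0.0298
se = beta.hat / sqrt(n)             # approximate standard error from the Fisher information
beta.hat + c(-1.96, 1.96) * se      # approximate 95% confidence interval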


Likelihood

Subjective interpretation of probability from the likelihood?

Likelihood looks like a density function

Normalize it to construct a probability density for β

p(β | y_1, . . . , y_n) ∝ β^n exp(−β ∑ y_i)

As a function of β, looks like the kernel of a Gammadistribution
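
A numerical check (a sketch, reusing beta, y, and l.exp from the earlier slides): normalize the likelihood on the grid and overlay the Gamma density identified on the following slides.

lik = l.exp(beta, y)
norm.post = lik / (sum(lik) * diff(beta)[1])    # rescale so the curve integrates to 1
plot(beta, norm.post, type="l", xlab=expression(beta), ylab="density")
lines(beta, dgamma(beta, shape=length(y)+1, rate=sum(y)), lty=2)   # Gamma(n+1, sum(y_i))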


Gamma Distribution

Z has a Gamma distribution with mean a/b and variance a/b^2:

Z ∼ Gamma(a, b)

with density

f(z) = (b^a / Γ(a)) z^(a−1) exp(−zb),  z > 0

In this parameterization b is a rate parameter.

rgamma(n, shape=a, scale=s)  # mean = shape * scale

rgamma(n, shape=a, rate=r)   # mean = shape / rate
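
A quick simulation check of the rate parameterization (a sketch; the draws are random, so the results are only approximate):

a = 2; r = 1/2
z = rgamma(10000, shape=a, rate=r)
mean(z)   # close to a/r = 4
var(z)    # close to a/r^2 = 8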


Examples of Gamma Distributions

[Figure: Gamma densities for a = 1, r = 1/5; a = 2, r = 1/5; and a = 2, r = 1/2, plotted for x from 0 to 10]


Distribution of β

Identify the normalizing constant to make this a density for β, or recognize the form of the distribution

p(β | y_1, . . . , y_n) = c β^n exp(−β ∑ y_i)

where

c = 1 / ∫_0^∞ β^n exp(−β ∑ y_i) dβ

Looks like a Gamma(a = n + 1, b = ∑ y_i)

normalized likelihood has the same shape as the likelihood

same form that would be obtained using Bayes’ Theorem with a “uniform” prior


Bayesian Inference

Starts with a parametric model for the data Y | θ

Specify prior beliefs or prior uncertainty about θ using a prior probability distribution p(θ)

Use Bayes’ Theorem to reverse the conditioning to obtain the posterior distribution of θ conditional on the observed data

Posterior Distribution:

p(θ | y_1, . . . , y_n) = L(θ; Y) p(θ) / ∫ L(θ; Y) p(θ) dθ (8)

p(θ | y_1, . . . , y_n) ∝ L(θ; Y) p(θ) (9)


Posterior Distribution

Posterior distribution is a conditional distribution: the joint distribution of data and parameters divided by the distribution of the data

p(θ | y_1, . . . , y_n) = ∏ f(y_i | θ) p(θ) / f(y_1, . . . , y_n)

where f(y_1, . . . , y_n) = ∫ ∏ f(y_i | θ) p(θ) dθ

f(y_1, . . . , y_n) is the marginal distribution of the data


Prior Distribution

The “uniform” prior p(β) = 1 in the exponential example is not a proper distribution, although the posterior distribution is a proper distribution.

“Formal Bayes” posterior distribution obtained as a limit of a proper Bayes procedure.

Be very careful with improper prior distributions; they may not lead to proper posterior distributions!


Conjugate Prior Distributions

Prior and Posterior distributions are in the same family

β ∼ G(a, b) (10)

p(β | Y) ∝ (b^a / Γ(a)) β^(a−1) e^(−βb) · β^n e^(−β ∑ y_i) (11)

         ∝ β^(a+n−1) e^(−β(b + ∑ y_i)) (12)

β | Y ∼ G(a + n, b + ∑ y_i) (13)

For exponential data, the conjugate prior distribution is a Gamma

“Uniform” is a limiting case with a = 1 and b → 0
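
In R, the conjugate update is just arithmetic on the hyperparameters (a sketch; a and b stand for whatever prior values are chosen):

a = 1; b = 0.0001            # close to the "uniform" limiting case
n = length(y)
a.post = a + n               # posterior shape
b.post = b + sum(y)          # posterior rate
c(a.post, b.post)            # roughly Gamma(11, 336) for these data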


Examples of Conjugate Families

Data Distribution                   Prior/Posterior
Exponential                         Gamma
Poisson                             Gamma
Binomial                            Beta
Normal (known variance)             Normal
Normal (unknown mean/variance)      Normal-Gamma

Always available in exponential families


Posterior Quantities

posterior distribution G(11, 336)

posterior mode β = 0.03 (same as MLE)

posterior mean E(β|Y ) = 11/336 = 0.033

Interval Estimates – Probability Intervals or Credible Regions

- Highest Posterior Density regions
- Equal Tail area
- Intervals based on Asymptotics
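
The point summaries quoted above follow directly from the Gamma(11, 336) posterior; the interval estimates are taken up on the next slides. A sketch:

a = 11; b = 336
(a - 1) / b        # posterior mode, about 0.030 (matches the MLE)
a / b              # posterior mean, about 0.033
sqrt(a) / b        # posterior standard deviation, about 0.0099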


HPD Regions

Find

Θ_{1−α} = {θ : p(θ | Y) ≥ c_α} such that P(θ ∈ Θ_{1−α} | Y) = 1 − α

Often requires iterative solution

Find points such that p(θ|Y ) > c

Find probability of that set

Adjust c until the desired level is reached (see the sketch below)

may not have symmetric tail areas

a multimodal posterior may not give a single interval
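
A rough sketch of the iterative search described above, specialized to the Gamma(11, 336) posterior (the cutoff c is adjusted by bisection; this particular implementation is not from the slides):

hpd.gamma = function(level = 0.95, a = 11, b = 336) {
  theta = seq(1e-5, 0.15, length = 100000)   # grid covering essentially all the posterior mass
  dens = dgamma(theta, a, rate = b)
  lo = 0; hi = max(dens)
  for (i in 1:50) {                          # bisection on the cutoff c
    c0 = (lo + hi) / 2
    set = dens >= c0                         # points with density above the cutoff
    prob = pgamma(max(theta[set]), a, rate = b) - pgamma(min(theta[set]), a, rate = b)
    if (prob > level) lo = c0 else hi = c0   # raise the cutoff if the region is too wide
  }
  range(theta[set])
}
hpd.gamma()                                  # close to [0.014, 0.054]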


Highest Posterior Density Region

[Figure: posterior density of β with the 95% Highest Posterior Density region [0.014, 0.054] shaded]

P(0.014 < β < 0.054 | Y) = 0.959

pgamma(.054, 11, rate=336) - pgamma(.014, 11, rate=336)


Equal Tail Area Intervals

Easier alternative is to find points such that

P(θ < θ_l | Y) = α/2

P(θ > θ_u | Y) = α/2

(θ_l, θ_u) is a (1 − α)·100% Credible Interval (or Posterior Probability Interval) for θ.

> qgamma(.025, 11, 336)
[1] 0.01634274
> qgamma(.975, 11, 336)
[1] 0.0547332


Empirical Rule Estimates

Mean ± 2 Standard deviations (OK for symmetric bell-shaped distributions)

Mean of gamma a/b = 0.0327

Variance of gamma a/b^2 = 11/336^2

Standard deviation = 0.0099

Approximate 95% Interval is (.013, .053)
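
The same arithmetic in R (a sketch):

m = 11 / 336               # posterior mean
s = sqrt(11) / 336         # posterior standard deviation, about 0.0099
m + c(-2, 2) * s           # approximately (0.013, 0.053)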


HPD Regions via Simulation

Simulate from the posterior distribution rgamma()

Coerce them into class “mcmc” with as.mcmc()

Use HPDinterval() to find interval

> beta.out = rgamma(10000, 11, rate=sum(y))
> library(coda)
> beta.out = as.mcmc(beta.out)
> HPDinterval(beta.out)

          lower      upper
var1 0.01425439 0.05183348
attr(,"Probability")
[1] 0.95
> pgamma(.05183, 11, 336) - pgamma(.01425, 11, 336)
[1] 0.9493977


Summary

Show full posterior distribution when possible

Overlay the prior distribution (if it is proper)

Use HPD regions – smallest length of any region for the given coverage – when feasible
