Introduction to Bayesian Inference
September 8th, 2008
Reading: Gill, Chapters 1-2
Introduction to Bayesian Inference – p.1/40
Phases of Statistical Analysis
1. Initial Data Manipulation
- Assembling data
- Checks of data quality: graphical and numeric

2. Preliminary Analysis: Clarify Directions for Analysis
Identifying Data Structure:
- What is to be regarded as an “individual”?
- Are individuals grouped or associated?
- What types of variables are measured on each individual?
- Response versus Explanatory Variables
- Are any observations missing?
3. Definitive Analysis: Formulation of Model, Fitting, Checking (EDA) . . .

4. Presentation of Conclusions
Styles of Analysis
Descriptive Methods
- Graphical
- Numerical summaries

Probabilistic Methods
- probabilistic properties of estimates (Sampling Distribution)
- probability model for observed data (Likelihood)
- probability model for quantifying prior uncertainty (Bayesian)

Bayesian analysis provides a formal approach for updating prior information with the observed data (likelihood) to quantify uncertainty a posteriori
Analysis Goals
Estimation of uncertain quantities (parameters)
Prediction of future events
Tests of hypotheses
Making Decisions
Theoretical framework based on probability models
Probability: Measurement of Uncertainty
Counting: equally likely outcomes (card and dice games, sampling)

Long-Run Frequency: Hurricane in September? Relative frequency from “similar” years

Subjective: measure of belief, judgment
- applies to unique events, e.g. the probability of terrorist attacks
- expert opinion
- computer simulations

Bayesian methods allow the use of subjective information (if available)
Steps for Bayesian Inference
Quantify prior knowledge using a probability distribution (Prior Distribution)
Observe event or data
Update prior beliefs (probabilities) with new data
Result is a Posterior Distribution
Formal updating rule given by Bayes’ Theorem
Probability
Events A, B, and the complement B^C:

B ∩ B^C = ∅   (1)

P(B ∪ B^C) = 1   (2)

Conditional Probability:

P(A | B) ≡ P(A ∩ B) / P(B),   if P(B) ≠ 0   (3)
Bayes’ Theorem
Bayes’ Theorem reverses the conditioning: what is the probability of B given that A has occurred?

P(B | A) = P(A ∩ B) / P(A)   (4)

= P(A | B) P(B) / P(A)   (5)

Law of Total Probability:

P(A) = P(A | B)P(B) + P(A | B^C)P(B^C)
Twin Example
Twins are either identical (single egg that splits) orfraternal (two eggs)
Sexes: Girl/Girl GG, Boy/Boy BB, Girl/Boy GB or Boy/Girl BG
GG or BB twins could be identical (I) or fraternal (F )
Given that I have GG twins, what is the probability thatthey are identical: P (I | GG) ?
It is easy to calculate the probability the other way around: P(GG | I)
Identical Twins
P(I | GG) = P(GG | I)P(I) / [P(GG | I)P(I) + P(GG | F)P(F)]
P(I) is the overall probability of having identical twins (among pregnancies with twins). Real-world data show that about 1/3 of all twins are identical. This is our prior information.

If the twins are identical, they are either two boys or two girls. It is reasonable to assume that the two cases are equally probable (based on real-world experience): P(GG | I) = 1/2

In the case of non-identical (fraternal) twins, the four possible outcomes are assumed to be equally likely, so P(GG | F) = 1/4
Result
Substituting the probabilities:

P(GG) = 1/2 · 1/3 + 1/4 · 2/3 = 1/3   (6)

P(I | GG) = P(GG | I)P(I) / P(GG) = (1/2 · 1/3) / (1/3) = 1/2   (7)
The probability that the twins are identical, given that they are both girls, is 1/2
The result combines experimental data (two girls) withprior information (1/3 of all twins are identical twins).
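The update above is easy to check numerically; a minimal R sketch (variable names are mine):

```r
# Prior: about 1/3 of twin pregnancies are identical
p_I <- 1/3
p_F <- 1 - p_I
# Likelihood of observing two girls under each hypothesis
p_GG_given_I <- 1/2
p_GG_given_F <- 1/4
# Law of total probability, then Bayes' theorem
p_GG <- p_GG_given_I * p_I + p_GG_given_F * p_F
p_I_given_GG <- p_GG_given_I * p_I / p_GG
p_GG          # 1/3
p_I_given_GG  # 1/2
```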
Probability Models
At least Two Physical Meanings for Probability Models
1. Random Sampling Model
well-defined Population
individuals in the sample are drawn from the population in a “random” way

could conceivably measure the entire population

probability model specifies properties of the full population
Probability Models
2. Observations are made on a system subject to randomfluctuations
probability model specifies what would happen if, hypothetically, observations were repeated again and again under the same conditions.
Population is hypothetical
Mathematical versus Statistical Models
Flipping a Coin: Given enough physics and accurate measurements of the initial conditions, could the outcome of a coin toss be predicted with certainty? A deterministic model.
Errors:
Wrong assumptions in “mathematical model”
Simplification in assumptions
⇒ seemingly random outcomes
All models are wrong, but some are more useful than others!
Probability Distributions for the Random Outcomes
Parametric Probability Models:
Yi ∼ f(y; θ), iid, for i = 1, . . . , n
Bernoulli/Binomial
Multinomial
Poisson
Normal
Lognormal
Exponential
Gamma
Models almost always have unknown parameters
Likelihood
Assume a parametric model Y ∼ f(y|θ)
Likelihood: L(θ | y1, . . . , yn) = ∏i f(yi | θ)
For each value of θ, the likelihood says how well thatvalue of θ explains the observed data
Maximum Likelihood Estimate θ̂

Interval Estimates/Testing via Likelihood Ratios

Quantification: coverage probabilities or Type I and Type II error rates require distribution theory

Exact sampling distributions or asymptotic approximations
Exponential Example: Rate β
The time between accidents is modeled as exponential with rate β per day. Observations (days):

1.5 15.0 60.3 30.5 2.8 56.4 27.0 6.4 110.7 25.4

f(y | β) = β exp(−yβ),   y > 0

L(β; y1, . . . , yn) = ∏i β exp(−yi β)

= β^n exp(−β ∑i yi)

Look at a plot of L(β) for ∑ yi = 336.
Likelihood function
l.exp = function(theta, y) {
  n = length(y)
  sumy = sum(y)
  l = theta^n * exp(-sumy * theta)
  return(l)
}

Vectorized: can be used to evaluate L(θ) at multiple values of θ rather than using a loop:

beta = seq(.00001, .25, length=1000)
l.exp(beta, y)
Plot of Likelihood
[Figure: Exponential Likelihood: L(β) plotted against β for 0 ≤ β ≤ 0.25, peaked near β ≈ 0.03]
Adding Greek and Math Symbols to Plots
In the plot command I used
xlab = expression(beta), ylab = expression(L(beta))

If the 'text' argument to one of the text-drawing functions ('text', 'mtext', 'axis', 'title', x- and y-axis labels) in R is an expression, it is interpreted as a mathematical expression and the output is formatted according to TeX-like rules. For many examples see:
demo(plotmath)
Estimate of β
We can find the value of β that maximizes the likelihood (or equivalently the log of the likelihood)

β̂ ≈ 0.03

> ll = l.exp(beta, y)
> beta[ll == max(ll)]

Likelihood Ratio: Λ(β) = L(β)/L(β̂)

Interval Estimates: the set of β such that Λ(β) > r

r = .1, .2 gives β values that are not “too unlikely”, that is, within 10% or 20% of the maximum.
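For the exponential model the maximizer is also available in closed form, β̂ = n/∑yi, which gives a check on the grid search; a quick sketch:

```r
y <- c(1.5, 15.0, 60.3, 30.5, 2.8, 56.4, 27.0, 6.4, 110.7, 25.4)
# Closed-form MLE for an exponential rate: n / sum(y)
beta.hat <- length(y) / sum(y)
beta.hat  # 10/336, about 0.0298
```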
Interval Estimate
[Figure: Exponential Likelihood Ratio: Λ(β) plotted against β for 0 ≤ β ≤ 0.25]
Intervals & Adding Filled Areas
> ll = ll/max(ll)   # LR
> # index of values with LR >= .1
> set = ll >= .1
> # min and max values of beta with LR >= .1
> llint = c(min(beta[set]), max(beta[set]))
[1] 0.014 0.055
> # add a filled polygon to the plot
> polygon(c(beta[set], llint[c(2,1)]),
+         c(ll[set], 0, 0), col="orange", density=NA)
Coverage
How confident are we that the interval covers the true β?

The likelihood, and hence the interval, is a function of the data Y

Need the sampling distribution of Y

Need to work out the theoretical probability that such intervals cover the parameter.

For large n, use the CLT to construct intervals.

Which intervals (symmetric, or LR)?
More later in STA 213
Likelihood
Subjective interpretation of probability from the likelihood?
Likelihood looks like a density function
Normalize it to construct a probability density for β
p(β | y1, . . . , yn) ∝ β^n exp(−β ∑ yi)
As a function of β, looks like the kernel of a Gammadistribution
Gamma Distribution
Z has a Gamma distribution with mean a/b and variance a/b²:

Z ∼ Gamma(a, b)

with density

f(z) = (b^a / Γ(a)) z^(a−1) exp(−zb),   z > 0

In this parameterization b is a rate parameter.
rgamma(n, shape=a, scale=s)   # mean = shape * scale
rgamma(n, shape=a, rate=r)    # mean = shape / rate
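To see that R's rate parameterization matches the density above, compare dgamma() against the formula directly (the values of a, b, and z here are arbitrary choices for illustration):

```r
a <- 2; b <- 3; z <- 1.5
# Density formula: b^a / Gamma(a) * z^(a-1) * exp(-z*b)
dens <- b^a / gamma(a) * z^(a - 1) * exp(-z * b)
all.equal(dens, dgamma(z, shape = a, rate = b))  # TRUE
```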
Examples of Gamma Distributions
[Figure: example Gamma densities for (a = 1, r = 1/5), (a = 2, r = 1/5), and (a = 2, r = 1/2), plotted for 0 ≤ x ≤ 10]
Distribution of β
Identify the normalizing constant that makes this a density for β, or recognize the form of the distribution:

p(β | y1, . . . , yn) = c β^n exp(−β ∑ yi)

where

c = 1 / ∫₀^∞ β^n exp(−β ∑ yi) dβ

This is the kernel of a Gamma(a = n + 1, b = ∑ yi) distribution

the normalized likelihood has the same shape as the likelihood

same form as would be obtained using Bayes’ Theorem with a “uniform” prior
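The Gamma(n + 1, ∑yi) claim can be verified numerically: multiplying the likelihood by the Gamma normalizing constant (∑yi)^(n+1)/Γ(n+1) reproduces dgamma(). A sketch (the evaluation point beta is an arbitrary choice):

```r
y <- c(1.5, 15.0, 60.3, 30.5, 2.8, 56.4, 27.0, 6.4, 110.7, 25.4)
n <- length(y); s <- sum(y)
beta <- 0.03
lik <- beta^n * exp(-beta * s)        # likelihood at beta
const <- s^(n + 1) / gamma(n + 1)     # Gamma(n+1, s) normalizing constant
all.equal(const * lik, dgamma(beta, shape = n + 1, rate = s))  # TRUE
```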
Bayesian Inference
Start with a parametric model for the data, Y | θ

Specify prior beliefs or prior uncertainty about θ using a prior probability distribution p(θ)

Use Bayes’ Theorem to reverse the conditioning and obtain the posterior distribution of θ conditional on the observed data

Posterior Distribution:

p(θ | y1, . . . , yn) = L(θ; Y) p(θ) / ∫ L(θ; Y) p(θ) dθ   (8)

p(θ | y1, . . . , yn) ∝ L(θ; Y) p(θ)   (9)
Posterior Distribution
The posterior distribution is a conditional distribution: the joint distribution of data and parameters divided by the distribution of the data

p(θ | y1, . . . , yn) = ∏ f(yi | θ) p(θ) / f(y1, . . . , yn)

where f(y1, . . . , yn) = ∫ ∏ f(yi | θ) p(θ) dθ

f(y1, . . . , yn) is the marginal distribution of the data
Prior Distribution
The “uniform” prior p(β) = 1 in the exponential example is not a proper distribution, although the posterior distribution is proper.

“Formal Bayes”: the posterior distribution is obtained as a limit of proper Bayes procedures.

Be very careful with improper prior distributions; they may not lead to proper posterior distributions!
Conjugate Prior Distributions
Prior and Posterior distributions are in the same family
β ∼ Gamma(a, b)   (10)

p(β | Y) ∝ (b^a / Γ(a)) β^(a−1) e^(−βb) · β^n e^(−β ∑ yi)   (11)

∝ β^(a+n−1) e^(−β(b + ∑ yi))   (12)

β | Y ∼ Gamma(a + n, b + ∑ yi)   (13)

For exponential data, the conjugate prior distribution is a Gamma

“Uniform” is a limiting case with a = 1 and b → 0
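The conjugate update (10)-(13) is just bookkeeping on the shape and rate; a sketch, using an illustrative (hypothetical) weak Gamma(1, 0.01) prior:

```r
y <- c(1.5, 15.0, 60.3, 30.5, 2.8, 56.4, 27.0, 6.4, 110.7, 25.4)
a <- 1; b <- 0.01         # hypothetical weak Gamma prior on beta
a.post <- a + length(y)   # shape update: a + n
b.post <- b + sum(y)      # rate update: b + sum(y_i)
c(a.post, b.post)         # 11 and 336.01
```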
Examples of Conjugate Families
Data Distribution                   Conjugate Prior/Posterior
Exponential                         Gamma
Poisson                             Gamma
Binomial                            Beta
Normal (known variance)             Normal
Normal (unknown mean/variance)      Normal-Gamma
Always available in exponential families
Posterior Quantities
posterior distribution Gamma(11, 336)

posterior mode 10/336 ≈ 0.03 (same as the MLE)

posterior mean E(β | Y) = 11/336 ≈ 0.033

Interval Estimates: Probability Intervals or Credible Regions
- Highest Posterior Density regions
- Equal tail areas
- Intervals based on asymptotics
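The mode and mean follow from the Gamma(a, b) formulas mode = (a − 1)/b and mean = a/b; a quick check:

```r
a <- 11; b <- 336
post.mode <- (a - 1) / b   # 10/336, equal to the MLE
post.mean <- a / b         # 11/336, about 0.033
c(post.mode, post.mean)
```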
HPD Regions
Find

Θ₁₋α = {θ : p(θ | Y) ≥ cα} such that P(θ ∈ Θ₁₋α | Y) = 1 − α

Often requires an iterative solution:
- find the points such that p(θ | Y) ≥ c
- find the probability of that set
- adjust c until the desired level is reached

may not have symmetric tail areas

a multimodal posterior may not give an interval
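The iterative scheme can be sketched on a grid for the Gamma(11, 336) posterior: lower the cutoff c from the maximum density until the set {θ : p(θ | Y) ≥ c} has probability 1 − α (the grid and step sizes below are my choices):

```r
a <- 11; b <- 336; alpha <- 0.05
theta <- seq(0.0005, 0.12, length = 20000)
dens  <- dgamma(theta, shape = a, rate = b)
dtheta <- theta[2] - theta[1]
# Lower the cutoff c0 until {theta : dens >= c0} covers 1 - alpha
for (c0 in seq(max(dens), 0, length = 2000)) {
  set <- dens >= c0
  if (sum(dens[set]) * dtheta >= 1 - alpha) break   # Riemann approximation
}
range(theta[set])   # roughly (0.014, 0.052)
```

Because the Gamma posterior is unimodal, the resulting region is a single interval.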
Highest Posterior Density Region
[Figure: posterior density of β with the 95% Highest Posterior Density region [0.014, 0.054] shaded]

P(.014 < β < .054 | Y) = 0.959

pgamma(.054, 11, rate=336) - pgamma(.014, 11, rate=336)
Equal Tail Area Intervals
An easier alternative is to find points such that

P(θ < θl | Y) = α/2

P(θ > θu | Y) = α/2

(θl, θu) is a (1 − α) · 100% Credible Interval (or Posterior Probability Interval) for θ.

> qgamma(.025, 11, 336)
[1] 0.01634274
> qgamma(.975, 11, 336)
[1] 0.0547332
Empirical Rule Estimates
Mean ± 2 standard deviations (OK for symmetric, bell-shaped distributions)

Mean of the gamma: a/b = 11/336 ≈ 0.0327

Variance of the gamma: a/b² = 11/336²

Standard deviation ≈ 0.0099

Approximate 95% interval: (.013, .052)
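The empirical-rule arithmetic in R:

```r
a <- 11; b <- 336
m <- a / b                # posterior mean
s <- sqrt(a) / b          # posterior standard deviation, about 0.0099
c(m - 2 * s, m + 2 * s)   # approximately (0.013, 0.052)
```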
HPD Regions via Simulation
Simulate from the posterior distribution: rgamma()

Coerce the draws into “class” mcmc: as.mcmc()

Use HPDinterval() to find the interval

> beta.out = rgamma(10000, 11, rate=sum(y))
> library(coda)
> beta.out = as.mcmc(beta.out)
> HPDinterval(beta.out)
          lower      upper
var1 0.01425439 0.05183348
attr(,"Probability")
[1] 0.95
> pgamma(.05183, 11, 336) - pgamma(.01425, 11, 336)
[1] 0.9493977
Summary
Show full posterior distribution when possible
Overlay the prior distribution (if it is proper)
Use HPD regions – smallest length of any region forthe given coverage – when feasible