Introduction to Bayesian Inference
September 8th, 2008
Reading: Gill, Chapters 1-2
Introduction to Bayesian Inference – p.1/40
Phases of Statistical Analysis
1. Initial Data Manipulation
- Assembling data
- Checks of data quality: graphical and numeric

2. Preliminary Analysis: Clarify Directions for Analysis
Identifying Data Structure:
- What is to be regarded as an “individual”?
- Are individuals grouped or associated?
- What types of variables are measured on each individual?
- Response versus Explanatory Variables
- Are any observations missing?
3. Definitive Analysis: Formulation of Model, Fitting, Checking (EDA) . . .

4. Presentation of Conclusions
Styles of Analysis
Descriptive Methods
- Graphical
- Numerical summaries

Probabilistic Methods
- probabilistic properties of estimates (Sampling Distribution)
- probability model for observed data (Likelihood)
- probability model for quantifying prior uncertainty (Bayesian)

Bayesian analysis provides a formal approach for updating prior information with the observed data (likelihood) to quantify uncertainty a posteriori
Analysis Goals
Estimation of uncertain quantities (parameters)
Prediction of future events
Tests of hypotheses
Making Decisions
Theoretical framework based on probability models
Probability: Measurement of Uncertainty
Counting: equally likely outcomes (card and dice games, sampling)

Long-Run Frequency: Hurricane in September? Relative frequency from “similar” years

Subjective: measure of belief, judgment
- applies to unique events, e.g. the probability of terrorist attacks
- expert opinion
- computer simulations

Bayesian methods allow the use of subjective information (if available)
Steps for Bayesian Inference
Quantify prior knowledge using a probability distribution (Prior Distribution)
Observe event or data
Update prior beliefs (probabilities) with new data
Result is a Posterior Distribution
Formal updating rule given by Bayes’ Theorem
Probability
Events A, B, and the complement B^C:

B ∩ B^C = ∅   (1)

P(B ∪ B^C) = 1   (2)

Conditional Probability:

P(A | B) ≡ P(A ∩ B) / P(B),   if P(B) ≠ 0   (3)
Bayes’ Theorem
Bayes’ Theorem reverses the conditioning: what is the probability of B given that A has occurred?

P(B | A) = P(A ∩ B) / P(A)   (4)

= P(A | B) P(B) / P(A)   (5)

Law of Total Probability:

P(A) = P(A | B)P(B) + P(A | B^C)P(B^C)
Twin Example
Twins are either identical (single egg that splits) orfraternal (two eggs)
Sexes: Girl/Girl GG, Boy/Boy BB, Girl/Boy GB or Boy/Girl BG
GG or BB twins could be identical (I) or fraternal (F )
Given that I have GG twins, what is the probability thatthey are identical: P (I | GG) ?
It is easy to calculate the probability the other way around: P(GG | I)
Identical Twins
P(I | GG) = P(GG | I)P(I) / [P(GG | I)P(I) + P(GG | F)P(F)]
P(I) is the overall probability of having identical twins (among pregnancies with twins). Real-world data show that about 1/3 of all twins are identical. This is our prior information.

If the twins are identical, they are either two boys or two girls. It is reasonable to assume that the two cases are equally probable (based on real-world experience): P(GG | I) = 1/2

In the case of non-identical (fraternal) twins, the four possible outcomes are assumed to be equally likely, so P(GG | F) = 1/4
Result
Substituting the probabilities:

P(GG) = 1/2 · 1/3 + 1/4 · 2/3 = 1/3   (6)

P(I | GG) = P(GG | I)P(I) / P(GG) = (1/2 · 1/3) / (1/3) = 1/2   (7)
The probability that the twins are identical, given that they are both girls, is 1/2
The result combines experimental data (two girls) withprior information (1/3 of all twins are identical twins).
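The update above is easy to check numerically; a minimal R sketch (variable names are mine):

```r
# Prior: about 1/3 of twin pregnancies are identical
p_I <- 1/3
p_F <- 1 - p_I
# Likelihood of observing two girls under each hypothesis
p_GG_given_I <- 1/2
p_GG_given_F <- 1/4
# Law of total probability, then Bayes' theorem
p_GG <- p_GG_given_I * p_I + p_GG_given_F * p_F
p_I_given_GG <- p_GG_given_I * p_I / p_GG
p_GG          # 1/3
p_I_given_GG  # 1/2
```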
Probability Models
At least Two Physical Meanings for Probability Models
1. Random Sampling Model
well-defined Population
individuals in the sample are drawn from the population in a “random” way

could conceivably measure the entire population

probability model specifies properties of the full population
Probability Models
2. Observations are made on a system subject to randomfluctuations
probability model specifies what would happen if, hypothetically, observations were repeated again and again under the same conditions.
Population is hypothetical
Mathematical versus Statistical Models
Flipping a Coin: Given enough physics and accurate measurements of the initial conditions, could the outcome of a coin toss be predicted with certainty? A deterministic model.
Errors:
Wrong assumptions in “mathematical model”
Simplification in assumptions
⇒ seemingly random outcomes
All models are wrong, but some are more useful than others!
Probability Distributions for the Random Outcomes
Parametric Probability Models:
Yi ∼ f(y; θ), iid, for i = 1, . . . , n
Bernoulli/Binomial
Multinomial
Poisson
Normal
Lognormal
Exponential
Gamma
Models almost always have unknown parameters
Likelihood
Assume a parametric model Y ∼ f(y|θ)
Likelihood: L(θ | y1, . . . , yn) = ∏i f(yi | θ)
For each value of θ, the likelihood says how well thatvalue of θ explains the observed data
Maximum Likelihood Estimate θ̂

Interval Estimates/Testing via Likelihood Ratios

Quantification: coverage probabilities or Type I and Type II error rates require distribution theory

Exact sampling distributions or asymptotic approximations
Exponential Example: Rate β
The time between accidents is modeled as exponential with rate β per day. Observations (days):

1.5 15.0 60.3 30.5 2.8 56.4 27.0 6.4 110.7 25.4

f(y | β) = β exp(−yβ),   y > 0

L(β; y1, . . . , yn) = ∏i β exp(−yi β)

= β^n exp(−β ∑i yi)

Look at a plot of L(β) for ∑ yi = 336.
Likelihood function
l.exp = function(theta, y) {
  n = length(y)
  sumy = sum(y)
  l = theta^n * exp(-sumy * theta)
  return(l)
}

Vectorized: can be used to evaluate L(θ) at multiple values of θ rather than using a loop:

beta = seq(.00001, .25, length=1000)
l.exp(beta, y)
Plot of Likelihood
[Figure: Exponential Likelihood: L(β) plotted against β for 0 ≤ β ≤ 0.25, peaked near β ≈ 0.03]
Adding Greek and Math Symbols to Plots
In the plot command I used
xlab = expression(beta), ylab = expression(L(beta))

If the 'text' argument to one of the text-drawing functions ('text', 'mtext', 'axis', 'title', x- and y-axis labels) in R is an expression, it is interpreted as a mathematical expression and the output is formatted according to TeX-like rules. For many examples see:
demo(plotmath)
Estimate of β
We can find the value of β that maximizes the likelihood (or equivalently the log of the likelihood)

β̂ ≈ 0.03

> ll = l.exp(beta, y)
> beta[ll == max(ll)]

Likelihood Ratio: Λ(β) = L(β)/L(β̂)

Interval Estimates: the set of β such that Λ(β) > r

r = .1, .2 gives β values that are not “too unlikely”, that is, within 10% or 20% of the maximum.
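For the exponential model the maximizer is also available in closed form, β̂ = n/∑yi, which gives a check on the grid search; a quick sketch:

```r
y <- c(1.5, 15.0, 60.3, 30.5, 2.8, 56.4, 27.0, 6.4, 110.7, 25.4)
# Closed-form MLE for an exponential rate: n / sum(y)
beta.hat <- length(y) / sum(y)
beta.hat  # 10/336, about 0.0298
```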
Interval Estimate
[Figure: Exponential Likelihood Ratio: Λ(β) plotted against β for 0 ≤ β ≤ 0.25]
Intervals & Adding Filled Areas
> ll = ll/max(ll)   # LR
> # index of values with LR >= .1
> set = ll >= .1
> # min and max values of beta with LR >= .1
> llint = c(min(beta[set]), max(beta[set]))
[1] 0.014 0.055
> # add a filled polygon to the plot
> polygon(c(beta[set], llint[c(2,1)]),
+         c(ll[set], 0, 0), col="orange", density=NA)
Coverage
How confident are we that the interval covers the true β?

The likelihood, and hence the interval, is a function of the data Y

Need the sampling distribution of Y

Need to work out the theoretical probability that such intervals cover the parameter.

For large n, use the CLT to construct intervals.

Which intervals (symmetric, or LR)?
More later in STA 213
Likelihood
Subjective interpretation of probability from the likelihood?
Likelihood looks like a density function
Normalize it to construct a probability density for β
p(β | y1, . . . , yn) ∝ β^n exp(−β ∑ yi)
As a function of β, looks like the kernel of a Gammadistribution
Gamma Distribution
Z has a Gamma distribution with mean a/b and variance a/b²:

Z ∼ Gamma(a, b)

with density

f(z) = (b^a / Γ(a)) z^(a−1) exp(−zb),   z > 0

In this parameterization b is a rate parameter.
rgamma(n, shape=a, scale=s)   # mean = shape * scale
rgamma(n, shape=a, rate=r)    # mean = shape / rate
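To see that R's rate parameterization matches the density above, compare dgamma() against the formula directly (the values of a, b, and z here are arbitrary choices for illustration):

```r
a <- 2; b <- 3; z <- 1.5
# Density formula: b^a / Gamma(a) * z^(a-1) * exp(-z*b)
dens <- b^a / gamma(a) * z^(a - 1) * exp(-z * b)
all.equal(dens, dgamma(z, shape = a, rate = b))  # TRUE
```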
Examples of Gamma Distributions
[Figure: example Gamma densities for (a = 1, r = 1/5), (a = 2, r = 1/5), and (a = 2, r = 1/2), plotted for 0 ≤ x ≤ 10]
Distribution of β
Identify the normalizing constant that makes this a density for β, or recognize the form of the distribution:

p(β | y1, . . . , yn) = c β^n exp(−β ∑ yi)

where

c = 1 / ∫₀^∞ β^n exp(−β ∑ yi) dβ

This is the kernel of a Gamma(a = n + 1, b = ∑ yi) distribution

the normalized likelihood has the same shape as the likelihood

same form as would be obtained using Bayes’ Theorem with a “uniform” prior
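The Gamma(n + 1, ∑yi) claim can be verified numerically: multiplying the likelihood by the Gamma normalizing constant (∑yi)^(n+1)/Γ(n+1) reproduces dgamma(). A sketch (the evaluation point beta is an arbitrary choice):

```r
y <- c(1.5, 15.0, 60.3, 30.5, 2.8, 56.4, 27.0, 6.4, 110.7, 25.4)
n <- length(y); s <- sum(y)
beta <- 0.03
lik <- beta^n * exp(-beta * s)        # likelihood at beta
const <- s^(n + 1) / gamma(n + 1)     # Gamma(n+1, s) normalizing constant
all.equal(const * lik, dgamma(beta, shape = n + 1, rate = s))  # TRUE
```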
Bayesian Inference
Start with a parametric model for the data, Y | θ

Specify prior beliefs or prior uncertainty about θ using a prior probability distribution p(θ)

Use Bayes’ Theorem to reverse the conditioning and obtain the posterior distribution of θ conditional on the observed data

Posterior Distribution:

p(θ | y1, . . . , yn) = L(θ; Y) p(θ) / ∫ L(θ; Y) p(θ) dθ   (8)

p(θ | y1, . . . , yn) ∝ L(θ; Y) p(θ)   (9)
Posterior Distribution
The posterior distribution is a conditional distribution: the joint distribution of data and parameters divided by the distribution of the data

p(θ | y1, . . . , yn) = ∏ f(yi | θ) p(θ) / f(y1, . . . , yn)

where f(y1, . . . , yn) = ∫ ∏ f(yi | θ) p(θ) dθ

f(y1, . . . , yn) is the marginal distribution of the data
Prior Distribution
The “uniform” prior p(β) = 1 in the exponential example is not a proper distribution, although the posterior distribution is proper.

“Formal Bayes”: the posterior distribution is obtained as a limit of proper Bayes procedures.

Be very careful with improper prior distributions; they may not lead to proper posterior distributions!
Conjugate Prior Distributions
Prior and Posterior distributions are in the same family
β ∼ Gamma(a, b)   (10)

p(β | Y) ∝ (b^a / Γ(a)) β^(a−1) e^(−βb) · β^n e^(−β ∑ yi)   (11)

∝ β^(a+n−1) e^(−β(b + ∑ yi))   (12)

β | Y ∼ Gamma(a + n, b + ∑ yi)   (13)

For exponential data, the conjugate prior distribution is a Gamma

“Uniform” is a limiting case with a = 1 and b → 0
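The conjugate update (10)-(13) is just bookkeeping on the shape and rate; a sketch, using an illustrative (hypothetical) weak Gamma(1, 0.01) prior:

```r
y <- c(1.5, 15.0, 60.3, 30.5, 2.8, 56.4, 27.0, 6.4, 110.7, 25.4)
a <- 1; b <- 0.01         # hypothetical weak Gamma prior on beta
a.post <- a + length(y)   # shape update: a + n
b.post <- b + sum(y)      # rate update: b + sum(y_i)
c(a.post, b.post)         # 11 and 336.01
```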
Examples of Conjugate Families
Data Distribution                   Conjugate Prior/Posterior
Exponential                         Gamma
Poisson                             Gamma
Binomial                            Beta
Normal (known variance)             Normal
Normal (unknown mean/variance)      Normal-Gamma
Always available in exponential families
Posterior Quantities
posterior distribution Gamma(11, 336)

posterior mode 10/336 ≈ 0.03 (same as the MLE)

posterior mean E(β | Y) = 11/336 ≈ 0.033

Interval Estimates: Probability Intervals or Credible Regions
- Highest Posterior Density regions
- Equal tail areas
- Intervals based on asymptotics
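The mode and mean follow from the Gamma(a, b) formulas mode = (a − 1)/b and mean = a/b; a quick check:

```r
a <- 11; b <- 336
post.mode <- (a - 1) / b   # 10/336, equal to the MLE
post.mean <- a / b         # 11/336, about 0.033
c(post.mode, post.mean)
```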
HPD Regions
Find

Θ₁₋α = {θ : p(θ | Y) ≥ cα} such that P(θ ∈ Θ₁₋α | Y) = 1 − α

Often requires an iterative solution:
- find the points such that p(θ | Y) ≥ c
- find the probability of that set
- adjust c until the desired level is reached

may not have symmetric tail areas

a multimodal posterior may not give an interval
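The iterative scheme can be sketched on a grid for the Gamma(11, 336) posterior: lower the cutoff c from the maximum density until the set {θ : p(θ | Y) ≥ c} has probability 1 − α (the grid and step sizes below are my choices):

```r
a <- 11; b <- 336; alpha <- 0.05
theta <- seq(0.0005, 0.12, length = 20000)
dens  <- dgamma(theta, shape = a, rate = b)
dtheta <- theta[2] - theta[1]
# Lower the cutoff c0 until {theta : dens >= c0} covers 1 - alpha
for (c0 in seq(max(dens), 0, length = 2000)) {
  set <- dens >= c0
  if (sum(dens[set]) * dtheta >= 1 - alpha) break   # Riemann approximation
}
range(theta[set])   # roughly (0.014, 0.052)
```

Because the Gamma posterior is unimodal, the resulting region is a single interval.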
Highest Posterior Density Region
[Figure: posterior density of β with the 95% Highest Posterior Density region [0.014, 0.054] shaded]

P(.014 < β < .054 | Y) = 0.959

pgamma(.054, 11, rate=336) - pgamma(.014, 11, rate=336)
Equal Tail Area Intervals
An easier alternative is to find points such that

P(θ < θl | Y) = α/2

P(θ > θu | Y) = α/2

(θl, θu) is a (1 − α) · 100% Credible Interval (or Posterior Probability Interval) for θ.

> qgamma(.025, 11, 336)
[1] 0.01634274
> qgamma(.975, 11, 336)
[1] 0.0547332
Empirical Rule Estimates
Mean ± 2 standard deviations (OK for symmetric, bell-shaped distributions)

Mean of the gamma: a/b = 11/336 ≈ 0.0327

Variance of the gamma: a/b² = 11/336²

Standard deviation ≈ 0.0099

Approximate 95% interval: (.013, .052)
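The empirical-rule arithmetic in R:

```r
a <- 11; b <- 336
m <- a / b                # posterior mean
s <- sqrt(a) / b          # posterior standard deviation, about 0.0099
c(m - 2 * s, m + 2 * s)   # approximately (0.013, 0.052)
```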
HPD Regions via Simulation
Simulate from the posterior distribution: rgamma()

Coerce the draws into “class” mcmc: as.mcmc()

Use HPDinterval() to find the interval

> beta.out = rgamma(10000, 11, rate=sum(y))
> library(coda)
> beta.out = as.mcmc(beta.out)
> HPDinterval(beta.out)
          lower      upper
var1 0.01425439 0.05183348
attr(,"Probability")
[1] 0.95
> pgamma(.05183, 11, 336) - pgamma(.01425, 11, 336)
[1] 0.9493977
Summary
Show full posterior distribution when possible
Overlay the prior distribution (if it is proper)
Use HPD regions – smallest length of any region forthe given coverage – when feasible