A Simple Gibbs Sampler
Biostatistics 615/815
Lecture 19
Scheduling
2nd Midterm scheduled for December 18
• 8:00-10:00 am
• Review session on December 11
Alternative could be December 4
• 8:30-10:00 am
• Review session on December 2
Optimization Strategies
Single Variable
• Golden Search
• Quadratic Approximations
Multiple Variables
• Simplex Method
• E-M Algorithm
• Simulated Annealing
Simulated Annealing
Stochastic Method
Sometimes takes up-hill steps
• Avoids local minima
Solution is gradually frozen
• Values of parameters with largest impact on function values are fixed earlier
Gibbs Sampler
Another MCMC Method
Update a single parameter at a time
Sample from conditional distribution when other parameters are fixed
Gibbs Sampler Algorithm
Consider θ^(t), a particular choice of parameter values
Define the next set of parameter values by:
a. Selecting a component to update, say i
b. Sampling a value for θ_i^(t+1) from p(θ_i | x, θ_1, θ_2, ..., θ_(i-1), θ_(i+1), ..., θ_k)
Increment t and repeat the previous steps.
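The single-component update above can be sketched end to end on a toy example. The code below is a minimal illustration, not from the lecture; `joint`, `unif`, `gibbs_step` and `gibbs_freq11` are hypothetical names. It runs a random-scan Gibbs sampler over a 2x2 discrete joint distribution, redrawing one coordinate at a time from its full conditional.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Toy target: a 2x2 joint distribution p(a, b) (hypothetical example) */
static double joint[2][2] = { {0.1, 0.2}, {0.3, 0.4} };

static double unif(void) { return (rand() + 0.5) / ((double) RAND_MAX + 1.0); }

/* One random-scan Gibbs step: pick a coordinate at random and redraw it
   from its conditional distribution given the other coordinate */
static void gibbs_step(int * a, int * b)
   {
   if (unif() < 0.5)
      {  /* update a | b */
      double p0 = joint[0][*b] / (joint[0][*b] + joint[1][*b]);
      *a = (unif() < p0) ? 0 : 1;
      }
   else
      {  /* update b | a */
      double p0 = joint[*a][0] / (joint[*a][0] + joint[*a][1]);
      *b = (unif() < p0) ? 0 : 1;
      }
   }

/* Run the chain and return the empirical frequency of state (1, 1) */
double gibbs_freq11(int iterations)
   {
   int a = 0, b = 0, hits = 0, i;

   for (i = 0; i < iterations; i++)
      {
      gibbs_step(&a, &b);
      if (a == 1 && b == 1) hits++;
      }

   return (double) hits / iterations;
   }
```

Because each update leaves the joint distribution invariant, the long-run frequency of each state approaches its probability under `joint` (here, 0.4 for state (1, 1)).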
Alternative Algorithm
Consider θ^(t), a particular choice of parameter values
Define the next set of parameter values by:
a. Updating each component, 1 ... k, in turn
b. Sampling a value for θ_1^(t+1) from p(θ_1 | x, θ_2, θ_3, ..., θ_k)
c. Sampling a value for θ_2^(t+1) from p(θ_2 | x, θ_1^(t+1), θ_3, ..., θ_k)
...
z. Sampling a value for θ_k^(t+1) from p(θ_k | x, θ_1^(t+1), ..., θ_(k-1)^(t+1))
Increment t and repeat the previous steps.
Key Property: Stationary Distribution
Suppose that (θ_1^(t), θ_2^(t), ..., θ_k^(t)) ~ p(θ_1, θ_2, ..., θ_k | x)
Then (θ_1^(t+1), θ_2^(t), ..., θ_k^(t)) is distributed as p(θ_1, θ_2, ..., θ_k | x)
In fact...
p(θ_1, θ_2, ..., θ_k | x) = p(θ_1 | θ_2, ..., θ_k, x) p(θ_2, ..., θ_k | x)
Eventually, we expect the Gibbs sampler to sample parameter values from their posterior distribution:
θ^(t) ~ p(θ | x) ⇒ θ^(t+1) ~ p(θ | x)
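The invariance argument can be checked exactly, without simulation, on a small discrete target. The sketch below (hypothetical helper name `one_step`; not lecture code) propagates a 2x2 joint distribution through one "update θ_1 given θ_2" step: summing p(a, b)·p(a' | b) over a gives p(a' | b) p(b) = p(a', b), so the output equals the input term for term.

```c
#include <assert.h>
#include <math.h>

/* Exact one-step distribution of the chain after an "update a | b"
   Gibbs move, starting from state distributed as target[][] */
void one_step(double target[2][2], double out[2][2])
   {
   int a, b, a2;

   for (a2 = 0; a2 < 2; a2++)
      for (b = 0; b < 2; b++)
         {
         out[a2][b] = 0.0;

         /* sum over the old value of a; p(a' | b) is the conditional */
         for (a = 0; a < 2; a++)
            out[a2][b] += target[a][b] *
                          target[a2][b] / (target[0][b] + target[1][b]);
         }
   }
```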
Gibbs Sampling for Mixture Distributions
Sample each of the mixture parameters from its conditional distribution
• Dirichlet, Normal and Gamma distributions are typical
Simple alternative is to sample the source of each observation
• Assign observation to specific component
Sampling A Component

Pr(Z_i = j | x_i, π, φ, η) = π_j f(x_i | φ_j, η_j) / Σ_l π_l f(x_i | φ_l, η_l)

Calculate the probability that the observation originated from a specific component…
… can you recall how random numbers can be used to sample from one of a few discrete categories?
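One standard answer: draw u ~ Uniform(0, 1) and walk through the category probabilities, subtracting each until u falls below the current one (inverse-CDF sampling for a discrete distribution). A minimal sketch, with the hypothetical name `sample_category`:

```c
#include <assert.h>

/* Return the index of the category containing u under the cumulative
   distribution of prob[0..k-1]; u should be uniform on (0, 1) */
int sample_category(const double * prob, int k, double u)
   {
   int j;

   for (j = 0; j < k - 1; j++)
      {
      if (u < prob[j]) return j;
      u -= prob[j];
      }

   return k - 1;   /* last category; also guards against rounding error */
   }
```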
C Code: Sampling A Component

int sample_group(double x, int k,
                 double * probs, double * mean, double * sigma)
   {
   int group;
   double p = Random();
   double lk = dmix(x, k, probs, mean, sigma);

   for (group = 0; group < k - 1; group++)
      {
      double pgroup = probs[group] *
                      dnorm(x, mean[group], sigma[group]) / lk;

      if (p < pgroup) return group;

      p -= pgroup;
      }

   return k - 1;
   }
Calculating Mixture Parameters

n_j = Σ_{i: Z_i = j} 1
p_j = n_j / n
x̄_j = (Σ_{i: Z_i = j} x_i) / n_j
s_j^2 = (Σ_{i: Z_i = j} x_i^2 − n_j x̄_j^2) / n_j

Before sampling a new origin for an observation…
… update mixture parameters given current assignments
Could be expensive!
C Code: Updating Parameters

void update_estimates(int k, int n,
                      double * prob, double * mean, double * sigma,
                      double * counts, double * sum, double * sumsq)
   {
   int i;

   for (i = 0; i < k; i++)
      {
      prob[i] = counts[i] / n;
      mean[i] = sum[i] / counts[i];
      sigma[i] = sqrt((sumsq[i] - mean[i] * mean[i] * counts[i])
                      / counts[i] + 1e-7);
      }
   }
C Code: Updating Mixture Parameters II

void remove_observation(double x, int group,
                        double * counts, double * sum, double * sumsq)
   {
   counts[group] --;
   sum[group] -= x;
   sumsq[group] -= x * x;
   }

void add_observation(double x, int group,
                     double * counts, double * sum, double * sumsq)
   {
   counts[group] ++;
   sum[group] += x;
   sumsq[group] += x * x;
   }
Selecting a Starting State
Must start with an assignment of observations to groupings
Many alternatives are possible; I chose to perform random assignments with equal probabilities…
C Code: Starting State

void initial_state(int k, int * group,
                   double * counts, double * sum, double * sumsq)
   {
   int i;   /* n and data[] are globals holding the observations */

   for (i = 0; i < k; i++)
      counts[i] = sum[i] = sumsq[i] = 0.0;

   for (i = 0; i < n; i++)
      {
      group[i] = Random() * k;
      counts[group[i]] ++;
      sum[group[i]] += data[i];
      sumsq[group[i]] += data[i] * data[i];
      }
   }
The Gibbs Sampler
Select initial state
Repeat a large number of times:
• Select an element
• Update conditional on other elements
If appropriate, output summary for each run…
C Code: Core of The Gibbs Sampler

initial_state(k, probs, mean, sigma, group, counts, sum, sumsq);

for (i = 0; i < 10000000; i++)
   {
   int id = Random() * n;

   if (counts[group[id]] < MIN_GROUP) continue;

   remove_observation(data[id], group[id], counts, sum, sumsq);
   update_estimates(k, n - 1, probs, mean, sigma, counts, sum, sumsq);
   group[id] = sample_group(data[id], k, probs, mean, sigma);
   add_observation(data[id], group[id], counts, sum, sumsq);

   if ((i > BURN_IN) && (i % THIN_INTERVAL == 0))
      ;   /* Collect statistics here */
   }
Gibbs Sampler: Memory Allocation and Freeing

void gibbs(int k, double * probs, double * mean, double * sigma)
   {
   int i, * group = (int *) malloc(sizeof(int) * n);
   double * sum = alloc_vector(k);
   double * sumsq = alloc_vector(k);
   double * counts = alloc_vector(k);

   /* Core of the Gibbs Sampler goes here */

   free_vector(sum, k);
   free_vector(sumsq, k);
   free_vector(counts, k);
   free(group);
   }
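`alloc_vector` and `free_vector` are course utilities not shown in the lecture; plausible stand-ins (assumptions, not the actual course code) are thin wrappers around calloc and free:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a zero-initialized vector of n doubles */
double * alloc_vector(int n)
   {
   return (double *) calloc(n, sizeof(double));
   }

/* Release a vector; the length is kept for symmetry with matrix
   routines that need it, but is unused here */
void free_vector(double * v, int n)
   {
   (void) n;
   free(v);
   }
```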
Example Application: Old Faithful Eruptions (n = 272)
[Figure: histogram of eruption durations; x-axis Duration (mins), 1.5-5.0; y-axis Frequency, 0-20]
Notes …
My first few runs found excellent solutions by fitting components accounting for very few observations but with variance near 0
Why?
• Repeated values due to rounding
• To avoid this, I set MIN_GROUP to 13 (5%)
Gibbs Sampler Burn-In: Log-Likelihood
[Figure: log-likelihood vs. iteration, 0-10000; y-axis -450 to -250]
Gibbs Sampler Burn-In: Mixture Means
[Figure: mixture means vs. iteration, 0-10000; y-axis 1 to 5]
Gibbs Sampler After Burn-In: Likelihood
[Figure: log-likelihood vs. iteration (millions, 0-100); y-axis -300 to -250]
Gibbs Sampler After Burn-In: Mean for First Component
[Figure: first component mean vs. iteration (millions, 0-100); y-axis 1.80 to 1.95]
Gibbs Sampler After Burn-In: Component Means
[Figure: component mean vs. iteration (millions, 0-100); y-axis 2 to 6]
Notes on Gibbs Sampler
Previous optimizers settled on a minimum eventually
The Gibbs sampler continues wandering through the stationary distribution…
Forever!
Drawing Inferences…
To draw inferences, summarize parameter values from stationary distribution
For example, might calculate the mean, median, etc.
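Summarizing stored samples can be sketched directly (hypothetical helper names, not lecture code): the posterior mean is a running average, and the median comes from sorting the draws.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparator for qsort over doubles */
static int cmp_double(const void * a, const void * b)
   {
   double d = *(const double *) a - *(const double *) b;
   return (d > 0) - (d < 0);
   }

/* Mean of the stored samples */
double sample_mean(const double * x, int n)
   {
   double total = 0.0;
   int i;

   for (i = 0; i < n; i++) total += x[i];

   return total / n;
   }

/* Median of the stored samples; reorders x as a side effect */
double sample_median(double * x, int n)
   {
   qsort(x, n, sizeof(double), cmp_double);
   return (n % 2) ? x[n / 2] : 0.5 * (x[n / 2 - 1] + x[n / 2]);
   }
```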
Component Means
[Figure: posterior density of each component mean, three panels; x-axis Mean, 1-6; y-axis Density]
Component Probabilities
[Figure: posterior density of each component probability, three panels; x-axis Probability, 0.0-1.0; y-axis Density]
Overall Parameter Estimates
The means of the posterior distributions for the three components were:
• Frequencies of 0.073, 0.278 and 0.648
• Means of 1.85, 2.08 and 4.28
• Variances of 0.001, 0.065 and 0.182
Our previous estimates were:
• Components contributing 0.160, 0.195 and 0.644
• Component means are 1.856, 2.182 and 4.289
• Variances are 0.00766, 0.0709 and 0.172
Joint Distributions
Gibbs Sampler provides other interesting information and insights
For example, we can evaluate the joint distribution of two parameters…
Component Probabilities
[Figure: joint scatter of Group 3 Probability vs. Group 1 Probability, both axes 0.05-0.30]
So far today …
Introduction to Gibbs sampling
Generating posterior distributions of parameters conditional on data
Providing insight into joint distributions
A little bit of theory
Highlight connection between Simulated Annealing and the Gibbs sampler …
Fill in some details of the Metropolis algorithm
Both Methods Are Markov Chains
The probability of any state being chosen depends only on the previous state:
Pr(S_n = i_n | S_(n-1) = i_(n-1), ..., S_0 = i_0) = Pr(S_n = i_n | S_(n-1) = i_(n-1))
States are updated according to a transition matrix with elements p_ij. This matrix defines important properties, including periodicity and irreducibility.
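The transition-matrix view can be made concrete with a hypothetical 2-state chain (not lecture code): repeatedly multiplying a distribution by the matrix drives it to the stationary distribution, here (2/3, 1/3) for the matrix used in the check below.

```c
#include <assert.h>
#include <math.h>

/* One step of the chain: dist <- dist * p (row vector times matrix) */
void step(double p[2][2], double dist[2])
   {
   double next[2] = {
      dist[0] * p[0][0] + dist[1] * p[1][0],
      dist[0] * p[0][1] + dist[1] * p[1][1] };

   dist[0] = next[0];
   dist[1] = next[1];
   }

/* Apply n steps of the chain to the distribution */
void iterate(double p[2][2], double dist[2], int n)
   {
   int i;
   for (i = 0; i < n; i++) step(p, dist);
   }
```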
Metropolis-Hastings Acceptance Probability
Let q_ij = Pr(propose S_(n+1) = j | S_n = i)
Let π_i and π_j be the relative probabilities of each state
The Metropolis-Hastings acceptance probability is:
a_ij = min(π_j q_ji / (π_i q_ij), 1)   or   a_ij = min(π_j / π_i, 1) if q_ij = q_ji
Only the ratio must be known, not the actual values of π_i and π_j
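The acceptance rule translates directly into code (a sketch with hypothetical names; the π values need only be known up to a common constant, since only their ratio enters):

```c
#include <assert.h>
#include <math.h>

/* Metropolis-Hastings acceptance probability
   a_ij = min(pi_j * q_ji / (pi_i * q_ij), 1) */
double mh_accept(double pi_i, double pi_j, double q_ij, double q_ji)
   {
   double ratio = (pi_j * q_ji) / (pi_i * q_ij);
   return ratio < 1.0 ? ratio : 1.0;
   }
```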
Metropolis-Hastings Equilibrium
ondistributi mequilibriuan reach it will Chain, Markov aupdate toalgorithm Hastings-Metropolis theuse weIf
Pr where ii)(S π==
ecommunicattostatesall allowmust density proposal thehappen, toFor this
e.communicat tostates
Gibbs Sampler
The Gibbs sampler ensures that π_i q_ij = π_j q_ji
As a consequence, a_ij = min(π_j q_ji / (π_i q_ij), 1) = 1
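The claim can be checked numerically on a 2x2 joint (a hypothetical illustration, not lecture code): proposing a' from its full conditional given b gives q_ij = p(a' | b) and q_ji = p(a | b), and the Metropolis-Hastings ratio comes out exactly 1, so every Gibbs proposal is accepted.

```c
#include <assert.h>
#include <math.h>

/* MH ratio pi_j * q_ji / (pi_i * q_ij) for a Gibbs proposal that
   redraws a from its conditional given b on the joint p[a][b] */
double gibbs_mh_ratio(double p[2][2], int a, int a2, int b)
   {
   double colsum = p[0][b] + p[1][b];
   double q_ij = p[a2][b] / colsum;   /* propose a -> a2, given b */
   double q_ji = p[a][b] / colsum;    /* reverse proposal */

   return (p[a2][b] * q_ji) / (p[a][b] * q_ij);
   }
```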
Simulated Annealing
Given a temperature parameter τ, replace π_i with
π_i^(1/τ) / Σ_j π_j^(1/τ)
At high temperatures, the probability distribution is flattened
At low temperatures, larger weights are given to high probability states
Additional Reading
If you need a refresher on Gibbs sampling
• Bayesian Methods for Mixture Distributions, M. Stephens (1997), http://www.stat.washington.edu/stephens/
Numerical Analysis for Statisticians
• Kenneth Lange (1999)
• Chapter 24