A Simple Gibbs Sampler
Biostatistics 615/815
Lecture 19
Scheduling
2nd Midterm scheduled for December 18
• 8:00-10:00 am
• Review session on December 11
Alternative could be December 4
• 8:30-10:00 am
• Review session on December 2
Optimization Strategies
Single Variable
• Golden Search
• Quadratic Approximations
Multiple Variables
• Simplex Method
• E-M Algorithm
• Simulated Annealing
Simulated Annealing
Stochastic Method
Sometimes takes up-hill steps
• Avoids local minima
Solution is gradually frozen
• Values of parameters with largest impact on function values are fixed earlier
Gibbs Sampler
Another MCMC Method
Update a single parameter at a time
Sample from conditional distribution when other parameters are fixed
Gibbs Sampler Algorithm
Consider θ^(t), a particular choice of parameter values
Define the next set of parameter values by:
a. Selecting a component to update, say i
b. Sampling a value for θ_i^(t+1) from p(θ_i | x, θ_1, θ_2, ..., θ_(i-1), θ_(i+1), ..., θ_k)
Increment t and repeat the previous steps.
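The single-component update above can be sketched end to end on a toy example. The code below is a minimal illustration, not from the lecture; `joint`, `unif`, `gibbs_step` and `gibbs_freq11` are hypothetical names. It runs a random-scan Gibbs sampler over a 2x2 discrete joint distribution, redrawing one coordinate at a time from its full conditional.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Toy target: a 2x2 joint distribution p(a, b) (hypothetical example) */
static double joint[2][2] = { {0.1, 0.2}, {0.3, 0.4} };

static double unif(void) { return (rand() + 0.5) / ((double) RAND_MAX + 1.0); }

/* One random-scan Gibbs step: pick a coordinate at random and redraw it
   from its conditional distribution given the other coordinate */
static void gibbs_step(int * a, int * b)
   {
   if (unif() < 0.5)
      {  /* update a | b */
      double p0 = joint[0][*b] / (joint[0][*b] + joint[1][*b]);
      *a = (unif() < p0) ? 0 : 1;
      }
   else
      {  /* update b | a */
      double p0 = joint[*a][0] / (joint[*a][0] + joint[*a][1]);
      *b = (unif() < p0) ? 0 : 1;
      }
   }

/* Run the chain and return the empirical frequency of state (1, 1) */
double gibbs_freq11(int iterations)
   {
   int a = 0, b = 0, hits = 0, i;

   for (i = 0; i < iterations; i++)
      {
      gibbs_step(&a, &b);
      if (a == 1 && b == 1) hits++;
      }

   return (double) hits / iterations;
   }
```

Because each update leaves the joint distribution invariant, the long-run frequency of each state approaches its probability under `joint` (here, 0.4 for state (1, 1)).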
Alternative Algorithm
Consider θ^(t), a particular choice of parameter values
Define the next set of parameter values by:
a. Updating each component, 1 ... k, in turn
b. Sampling a value for θ_1^(t+1) from p(θ_1 | x, θ_2, θ_3, ..., θ_k)
c. Sampling a value for θ_2^(t+1) from p(θ_2 | x, θ_1^(t+1), θ_3, ..., θ_k)
...
z. Sampling a value for θ_k^(t+1) from p(θ_k | x, θ_1^(t+1), ..., θ_(k-1)^(t+1))
Increment t and repeat the previous steps.
Key Property: Stationary Distribution
Suppose that (θ_1^(t), θ_2^(t), ..., θ_k^(t)) ~ p(θ_1, θ_2, ..., θ_k | x)
Then (θ_1^(t+1), θ_2^(t), ..., θ_k^(t)) is distributed as p(θ_1, θ_2, ..., θ_k | x)
In fact...
p(θ_1, θ_2, ..., θ_k | x) = p(θ_1 | θ_2, ..., θ_k, x) p(θ_2, ..., θ_k | x)
Eventually, we expect the Gibbs sampler to sample parameter values from their posterior distribution:
θ^(t) ~ p(θ | x) ⇒ θ^(t+1) ~ p(θ | x)
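The invariance argument can be checked exactly, without simulation, on a small discrete target. The sketch below (hypothetical helper name `one_step`; not lecture code) propagates a 2x2 joint distribution through one "update θ_1 given θ_2" step: summing p(a, b)·p(a' | b) over a gives p(a' | b) p(b) = p(a', b), so the output equals the input term for term.

```c
#include <assert.h>
#include <math.h>

/* Exact one-step distribution of the chain after an "update a | b"
   Gibbs move, starting from state distributed as target[][] */
void one_step(double target[2][2], double out[2][2])
   {
   int a, b, a2;

   for (a2 = 0; a2 < 2; a2++)
      for (b = 0; b < 2; b++)
         {
         out[a2][b] = 0.0;

         /* sum over the old value of a; p(a' | b) is the conditional */
         for (a = 0; a < 2; a++)
            out[a2][b] += target[a][b] *
                          target[a2][b] / (target[0][b] + target[1][b]);
         }
   }
```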
Gibbs Sampling for Mixture Distributions
Sample each of the mixture parameters from its conditional distribution
• Dirichlet, Normal and Gamma distributions are typical
Simple alternative is to sample the source of each observation
• Assign observation to specific component
Sampling A Component

Pr(Z_i = j | x_i, π, φ, η) = π_j f(x_i | φ_j, η_j) / Σ_l π_l f(x_i | φ_l, η_l)

Calculate the probability that the observation originated from a specific component…
… can you recall how random numbers can be used to sample from one of a few discrete categories?
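One standard answer: draw u ~ Uniform(0, 1) and walk through the category probabilities, subtracting each until u falls below the current one (inverse-CDF sampling for a discrete distribution). A minimal sketch, with the hypothetical name `sample_category`:

```c
#include <assert.h>

/* Return the index of the category containing u under the cumulative
   distribution of prob[0..k-1]; u should be uniform on (0, 1) */
int sample_category(const double * prob, int k, double u)
   {
   int j;

   for (j = 0; j < k - 1; j++)
      {
      if (u < prob[j]) return j;
      u -= prob[j];
      }

   return k - 1;   /* last category; also guards against rounding error */
   }
```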
C Code: Sampling A Component

int sample_group(double x, int k,
                 double * probs, double * mean, double * sigma)
   {
   int group;
   double p = Random();
   double lk = dmix(x, k, probs, mean, sigma);

   for (group = 0; group < k - 1; group++)
      {
      double pgroup = probs[group] *
                      dnorm(x, mean[group], sigma[group]) / lk;

      if (p < pgroup) return group;

      p -= pgroup;
      }

   return k - 1;
   }
Calculating Mixture Parameters

n_j = Σ_{i: Z_i = j} 1
p_j = n_j / n
x̄_j = (Σ_{i: Z_i = j} x_i) / n_j
s_j^2 = (Σ_{i: Z_i = j} x_i^2 − n_j x̄_j^2) / n_j

Before sampling a new origin for an observation…
… update mixture parameters given current assignments
Could be expensive!
C Code: Updating Parameters

void update_estimates(int k, int n,
                      double * prob, double * mean, double * sigma,
                      double * counts, double * sum, double * sumsq)
   {
   int i;

   for (i = 0; i < k; i++)
      {
      prob[i] = counts[i] / n;
      mean[i] = sum[i] / counts[i];
      sigma[i] = sqrt((sumsq[i] - mean[i] * mean[i] * counts[i])
                      / counts[i] + 1e-7);
      }
   }
C Code: Updating Mixture Parameters II

void remove_observation(double x, int group,
                        double * counts, double * sum, double * sumsq)
   {
   counts[group] --;
   sum[group] -= x;
   sumsq[group] -= x * x;
   }

void add_observation(double x, int group,
                     double * counts, double * sum, double * sumsq)
   {
   counts[group] ++;
   sum[group] += x;
   sumsq[group] += x * x;
   }
Selecting a Starting State
Must start with an assignment of observations to groupings
Many alternatives are possible; I chose to perform random assignments with equal probabilities…
C Code: Starting State

void initial_state(int k, int * group,
                   double * counts, double * sum, double * sumsq)
   {
   int i;   /* n and data[] are globals holding the observations */

   for (i = 0; i < k; i++)
      counts[i] = sum[i] = sumsq[i] = 0.0;

   for (i = 0; i < n; i++)
      {
      group[i] = Random() * k;
      counts[group[i]] ++;
      sum[group[i]] += data[i];
      sumsq[group[i]] += data[i] * data[i];
      }
   }
The Gibbs Sampler
Select initial state
Repeat a large number of times:
• Select an element
• Update conditional on other elements
If appropriate, output summary for each run…
C Code: Core of The Gibbs Sampler

initial_state(k, probs, mean, sigma, group, counts, sum, sumsq);

for (i = 0; i < 10000000; i++)
   {
   int id = Random() * n;

   if (counts[group[id]] < MIN_GROUP) continue;

   remove_observation(data[id], group[id], counts, sum, sumsq);
   update_estimates(k, n - 1, probs, mean, sigma, counts, sum, sumsq);
   group[id] = sample_group(data[id], k, probs, mean, sigma);
   add_observation(data[id], group[id], counts, sum, sumsq);

   if ((i > BURN_IN) && (i % THIN_INTERVAL == 0))
      ;   /* Collect statistics here */
   }
Gibbs Sampler: Memory Allocation and Freeing

void gibbs(int k, double * probs, double * mean, double * sigma)
   {
   int i, * group = (int *) malloc(sizeof(int) * n);
   double * sum = alloc_vector(k);
   double * sumsq = alloc_vector(k);
   double * counts = alloc_vector(k);

   /* Core of the Gibbs Sampler goes here */

   free_vector(sum, k);
   free_vector(sumsq, k);
   free_vector(counts, k);
   free(group);
   }
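`alloc_vector` and `free_vector` are course utilities not shown in the lecture; plausible stand-ins (assumptions, not the actual course code) are thin wrappers around calloc and free:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a zero-initialized vector of n doubles */
double * alloc_vector(int n)
   {
   return (double *) calloc(n, sizeof(double));
   }

/* Release a vector; the length is kept for symmetry with matrix
   routines that need it, but is unused here */
void free_vector(double * v, int n)
   {
   (void) n;
   free(v);
   }
```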
Example Application: Old Faithful Eruptions (n = 272)
[Figure: histogram of eruption durations; x-axis Duration (mins), 1.5-5.0; y-axis Frequency, 0-20]
Notes …
My first few runs found excellent solutions by fitting components accounting for very few observations but with variance near 0
Why?
• Repeated values due to rounding
• To avoid this, I set MIN_GROUP to 13 (5%)
Gibbs Sampler Burn-In: Log-Likelihood
[Figure: log-likelihood vs. iteration, 0-10000; y-axis -450 to -250]
Gibbs Sampler Burn-In: Mixture Means
[Figure: mixture means vs. iteration, 0-10000; y-axis 1 to 5]
Gibbs Sampler After Burn-In: Likelihood
[Figure: log-likelihood vs. iteration (millions, 0-100); y-axis -300 to -250]
Gibbs Sampler After Burn-In: Mean for First Component
[Figure: first component mean vs. iteration (millions, 0-100); y-axis 1.80 to 1.95]
Gibbs Sampler After Burn-In: Component Means
[Figure: component mean vs. iteration (millions, 0-100); y-axis 2 to 6]
Notes on Gibbs Sampler
Previous optimizers settled on a minimum eventually
The Gibbs sampler continues wandering through the stationary distribution…
Forever!
Drawing Inferences…
To draw inferences, summarize parameter values from stationary distribution
For example, might calculate the mean, median, etc.
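Summarizing stored samples can be sketched directly (hypothetical helper names, not lecture code): the posterior mean is a running average, and the median comes from sorting the draws.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparator for qsort over doubles */
static int cmp_double(const void * a, const void * b)
   {
   double d = *(const double *) a - *(const double *) b;
   return (d > 0) - (d < 0);
   }

/* Mean of the stored samples */
double sample_mean(const double * x, int n)
   {
   double total = 0.0;
   int i;

   for (i = 0; i < n; i++) total += x[i];

   return total / n;
   }

/* Median of the stored samples; reorders x as a side effect */
double sample_median(double * x, int n)
   {
   qsort(x, n, sizeof(double), cmp_double);
   return (n % 2) ? x[n / 2] : 0.5 * (x[n / 2 - 1] + x[n / 2]);
   }
```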
Component Means
[Figure: posterior density of each component mean, three panels; x-axis Mean, 1-6; y-axis Density]
Component Probabilities
[Figure: posterior density of each component probability, three panels; x-axis Probability, 0.0-1.0; y-axis Density]
Overall Parameter Estimates
The means of the posterior distributions for the three components were:
• Frequencies of 0.073, 0.278 and 0.648
• Means of 1.85, 2.08 and 4.28
• Variances of 0.001, 0.065 and 0.182
Our previous estimates were:
• Components contributing 0.160, 0.195 and 0.644
• Component means are 1.856, 2.182 and 4.289
• Variances are 0.00766, 0.0709 and 0.172
Joint Distributions
Gibbs Sampler provides other interesting information and insights
For example, we can evaluate the joint distribution of two parameters…
Component Probabilities
[Figure: joint scatter of Group 3 Probability vs. Group 1 Probability, both axes 0.05-0.30]
So far today …
Introduction to Gibbs sampling
Generating posterior distributions of parameters conditional on data
Providing insight into joint distributions
A little bit of theory
Highlight connection between Simulated Annealing and the Gibbs sampler …
Fill in some details of the Metropolis algorithm
Both Methods Are Markov Chains
The probability of any state being chosen depends only on the previous state:
Pr(S_n = i_n | S_(n-1) = i_(n-1), ..., S_0 = i_0) = Pr(S_n = i_n | S_(n-1) = i_(n-1))
States are updated according to a transition matrix with elements p_ij. This matrix defines important properties, including periodicity and irreducibility.
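The transition-matrix view can be made concrete with a hypothetical 2-state chain (not lecture code): repeatedly multiplying a distribution by the matrix drives it to the stationary distribution, here (2/3, 1/3) for the matrix used in the check below.

```c
#include <assert.h>
#include <math.h>

/* One step of the chain: dist <- dist * p (row vector times matrix) */
void step(double p[2][2], double dist[2])
   {
   double next[2] = {
      dist[0] * p[0][0] + dist[1] * p[1][0],
      dist[0] * p[0][1] + dist[1] * p[1][1] };

   dist[0] = next[0];
   dist[1] = next[1];
   }

/* Apply n steps of the chain to the distribution */
void iterate(double p[2][2], double dist[2], int n)
   {
   int i;
   for (i = 0; i < n; i++) step(p, dist);
   }
```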
Metropolis-Hastings Acceptance Probability
Let q_ij = Pr(propose S_(n+1) = j | S_n = i)
Let π_i and π_j be the relative probabilities of each state
The Metropolis-Hastings acceptance probability is:
a_ij = min(π_j q_ji / (π_i q_ij), 1)   or   a_ij = min(π_j / π_i, 1) if q_ij = q_ji
Only the ratio must be known, not the actual values of π_i and π_j
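The acceptance rule translates directly into code (a sketch with hypothetical names; the π values need only be known up to a common constant, since only their ratio enters):

```c
#include <assert.h>
#include <math.h>

/* Metropolis-Hastings acceptance probability
   a_ij = min(pi_j * q_ji / (pi_i * q_ij), 1) */
double mh_accept(double pi_i, double pi_j, double q_ij, double q_ji)
   {
   double ratio = (pi_j * q_ji) / (pi_i * q_ij);
   return ratio < 1.0 ? ratio : 1.0;
   }
```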
Metropolis-Hastings Equilibrium
ondistributi mequilibriuan reach it will Chain, Markov aupdate toalgorithm Hastings-Metropolis theuse weIf
Pr where ii)(S π==
ecommunicattostatesall allowmust density proposal thehappen, toFor this
e.communicat tostates
Gibbs Sampler
The Gibbs sampler ensures that π_i q_ij = π_j q_ji
As a consequence, a_ij = min(π_j q_ji / (π_i q_ij), 1) = 1
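The claim can be checked numerically on a 2x2 joint (a hypothetical illustration, not lecture code): proposing a' from its full conditional given b gives q_ij = p(a' | b) and q_ji = p(a | b), and the Metropolis-Hastings ratio comes out exactly 1, so every Gibbs proposal is accepted.

```c
#include <assert.h>
#include <math.h>

/* MH ratio pi_j * q_ji / (pi_i * q_ij) for a Gibbs proposal that
   redraws a from its conditional given b on the joint p[a][b] */
double gibbs_mh_ratio(double p[2][2], int a, int a2, int b)
   {
   double colsum = p[0][b] + p[1][b];
   double q_ij = p[a2][b] / colsum;   /* propose a -> a2, given b */
   double q_ji = p[a][b] / colsum;    /* reverse proposal */

   return (p[a2][b] * q_ji) / (p[a][b] * q_ij);
   }
```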
Simulated Annealing
Given a temperature parameter τ, replace π_i with
π_i^(1/τ) / Σ_j π_j^(1/τ)
At high temperatures, the probability distribution is flattened
At low temperatures, larger weights are given to high probability states
Additional Reading
If you need a refresher on Gibbs sampling
• Bayesian Methods for Mixture Distributions, M. Stephens (1997), http://www.stat.washington.edu/stephens/
Numerical Analysis for Statisticians
• Kenneth Lange (1999)
• Chapter 24