
Page 1: Bayesian Generalized Product Partition Model

Bayesian Generalized Product Partition Model

By David Dunson and Ju-Hyun Park

Presentation by Eric Wang 2/15/08

Page 2: Bayesian Generalized Product Partition Model

Outline

• Introduce Product Partition Models (PPM).

• Relate PPM to DP via the Blackwell-MacQueen Polya Urn scheme.

• Introduce predictor dependence into PPM to form Generalized PPM (GPPM).

• Discussion and Results

• Conclusion

Page 3: Bayesian Generalized Product Partition Model

Product Partition Model

• A PPM is formally defined as

  \pi(S^*) \propto \prod_{h=1}^{k} c(S_h^*)   (1)

  – Where S^* = (S_1^*, \ldots, S_k^*) is a partition of \{1, \ldots, n\} and c(S_h^*) is the cohesion of subset S_h^*.
  – Let y^{(h)} = \{y_i : i \in S_h^*\} denote the data for the subjects in cluster h, h = 1, \ldots, k.
  – The probability of a partition is therefore the product of the cohesions of its independent subsets (a small sketch of (1) follows below).
  – The posterior on S^* after seeing the data y is also a PPM, with updated cohesion c(S_h^*) f(y^{(h)}), where f(y^{(h)}) is the marginal likelihood of the data in cluster h.
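As a quick illustration of (1), the sketch below evaluates the unnormalized log prior of a partition for an arbitrary cohesion function; the DP cohesion alpha * (n_h - 1)! that is derived later in the talk is plugged in as an example. Function and variable names are illustrative, not from the paper.

```python
import math

def ppm_log_prior(partition, cohesion):
    """Unnormalized log prior of a partition under a product partition model (1).

    partition -- list of clusters, each a list of subject indices (the S_h^*)
    cohesion  -- function mapping one cluster to a positive cohesion c(S_h^*)
    """
    return sum(math.log(cohesion(S_h)) for S_h in partition)

# Example: the Dirichlet-process cohesion c(S_h^*) = alpha * (n_h - 1)!
alpha = 1.0

def dp_cohesion(S_h):
    return alpha * math.factorial(len(S_h) - 1)

print(ppm_log_prior([[0, 1, 4], [2], [3]], dp_cohesion))
```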

Page 4: Bayesian Generalized Product Partition Model

Product Partition Model

• A PPM can also be induced hierarchically:

  y_i \mid S_i, \theta^* \overset{ind}{\sim} f(y_i \mid \theta^*_{S_i}), \qquad \theta^*_h \overset{iid}{\sim} G_0, \qquad S_i \mid \pi \sim \sum_{h=1}^{k} \pi_h \delta_h

  – Where S_i = h if i \in S_h^*, and S = (S_1, \ldots, S_n)'.

• Taking k \to \infty induces a nonparametric PPM.

• A prior on the weights \pi = (\pi_1, \ldots, \pi_k)' imposes a particular form on the cohesion; a convenient choice corresponds to the Dirichlet Process (a stick-breaking sketch follows below).
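One convenient prior on the weights is Sethuraman's stick-breaking construction, which corresponds to the Dirichlet process; below is a minimal truncated sketch with illustrative names and truncation level, not code from the paper.

```python
import numpy as np

def stick_breaking_weights(alpha, k, rng):
    """Truncated stick-breaking draw of the weights pi_1, ..., pi_k."""
    v = rng.beta(1.0, alpha, size=k)              # stick-breaking fractions
    v[-1] = 1.0                                   # close off the truncation
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining                          # pi_h = v_h * prod_{j<h} (1 - v_j)

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=1.0, k=20, rng=rng)
S = rng.choice(20, size=100, p=pi / pi.sum())     # cluster labels S_i ~ sum_h pi_h delta_h
print(np.round(pi, 3), np.bincount(S))
```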

Page 5: Bayesian Generalized Product Partition Model

Relating DP and PPM

• In DP, G \sim DP(\alpha G_0).
  – G is seen in stick breaking. If it is marginalized out, it yields the Blackwell-MacQueen (1973) formulation:

  \theta_i \mid \theta_1, \ldots, \theta_{i-1} \sim \frac{\alpha}{\alpha + i - 1} G_0 + \sum_{j=1}^{i-1} \frac{1}{\alpha + i - 1} \delta_{\theta_j}

  – Where \theta_i is the value taken by the ith subject.
  – The joint distribution of a particular set \theta = (\theta_1, \ldots, \theta_n)' is therefore the product of these sequential conditionals,

  p(\theta) = \prod_{i=1}^{n} p(\theta_i \mid \theta_1, \ldots, \theta_{i-1}).
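A minimal simulation sketch of this urn scheme (illustrative names; G_0 is taken to be a standard normal purely for the example):

```python
import numpy as np

def polya_urn(n, alpha, base_sampler, rng):
    """Draw theta_1, ..., theta_n sequentially from the Blackwell-MacQueen urn:
    a fresh G_0 draw with probability alpha / (alpha + i - 1), otherwise a copy
    of one of the i - 1 earlier values, each with probability 1 / (alpha + i - 1)."""
    theta = []
    for i in range(1, n + 1):
        if rng.random() < alpha / (alpha + i - 1):
            theta.append(base_sampler(rng))            # new unique value
        else:
            theta.append(theta[rng.integers(i - 1)])   # repeat an earlier value
    return theta

rng = np.random.default_rng(1)
draws = polya_urn(10, alpha=1.0, base_sampler=lambda r: r.normal(), rng=rng)
print(len(set(draws)), "unique values among", len(draws), "draws")
```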

Page 6: Bayesian Generalized Product Partition Model

Relating DP and PPM

• It can be shown directly that the Blackwell-MacQueen formulation leads to

  p(\theta) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + n)} \prod_{h=1}^{k} \alpha \, (n_h - 1)! \, G_0(\theta_h^*)   (2)

• Where n_h is the number of subjects taking the unique value \theta_h^*.
• \theta_{h,l} is the value of the l-th subject in cluster h after re-sorting the draws by cluster membership:

  \{\theta_{1,1}, \ldots, \theta_{1,n_1}, \theta_{2,1}, \ldots, \theta_{2,n_2}, \ldots, \theta_{k,1}, \ldots, \theta_{k,n_k}\}

• Also, \Gamma(\alpha)/\Gamma(\alpha + n) is a normalizing constant and the cohesion is c(S_h^*) = \alpha (n_h - 1)!. Then:

  \pi(S^*) \propto \prod_{h=1}^{k} c(S_h^*) = \prod_{h=1}^{k} \alpha \, (n_h - 1)!   (3)
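A small numerical check (illustrative code, not from the paper) that the sequential urn probabilities and the clustered form (2) agree once the common G_0 factors are dropped:

```python
import math
from collections import Counter

def urn_partition_prob(labels, alpha):
    """Sequential Polya-urn probability of a cluster-label sequence, G_0 factors omitted."""
    prob, seen = 1.0, Counter()
    for i, h in enumerate(labels):                      # subject i+1 joins cluster h
        prob *= (alpha if seen[h] == 0 else seen[h]) / (alpha + i)
        seen[h] += 1
    return prob

def clustered_form(labels, alpha):
    """Gamma(alpha) / Gamma(alpha + n) * prod_h alpha * (n_h - 1)!, i.e. (2) without G_0."""
    n, counts = len(labels), Counter(labels).values()
    norm = math.gamma(alpha) / math.gamma(alpha + n)
    return norm * math.prod(alpha * math.factorial(c - 1) for c in counts)

labels = [0, 0, 1, 0, 2, 1]                             # cluster indicators S_i for six subjects
print(urn_partition_prob(labels, 1.5), clustered_form(labels, 1.5))
```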

Page 7: Bayesian Generalized Product Partition Model

Relating DP and PPM

• From slide 3, writing the prior and likelihood together:

  \pi(S^*, y) \propto \prod_{h=1}^{k} c(S_h^*) \, f(y^{(h)}), \qquad f(y^{(h)}) = \int \prod_{i \in S_h^*} f(y_i \mid \theta) \, dG_0(\theta)   (4)

• Notice that from (1), G can be marginalized out to get the same form.

• Specifically, integrate over all possible unique values that can be taken by \theta_h^* for subset h.

Page 8: Bayesian Generalized Product Partition Model

Relating DP and PPM

• Therefore, DP is a special case of PPM with cohesion c(S_h^*) = \alpha (n_h - 1)! and normalizing constant \Gamma(\alpha)/\Gamma(\alpha + n).

• However, (2) follows the premise of DP that the data are exchangeable and does not incorporate dependence on predictors.

• Next, PPMs will be generalized such that predictor dependence is incorporated.

Page 9: Bayesian Generalized Product Partition Model

Generalized PPM

• The goal of the paper is to formulate (1) such that the cohesion depends on the subjects' predictors:

  \pi(S^*) \propto \prod_{h=1}^{k} c(S_h^* \mid x^{(h)}), \qquad x^{(h)} = \{x_i : i \in S_h^*\}

• This can be done following a process very similar to the non-predictor case above.

• Once again, the connection between DP and PPM will be used; the resulting model will henceforth be referred to as the GPPM.

• The formulation is interesting because the predictors will be treated as random variables rather than known fixed values (as in KSBP).

Page 10: Bayesian Generalized Product Partition Model

GPPM

• Consider the following hierarchical model:

  y_i \sim f(y_i \mid \theta_i), \qquad x_i \sim g(x_i \mid \psi_i), \qquad (\theta_i, \psi_i) \mid G \sim G, \qquad G \sim DP(\alpha G_0)

  – Where G_0 = G_0^{\theta} \times G_0^{\psi} constitutes a base measure on \theta and \psi, the parameters of the data and the predictor, respectively.

  – This model will segment the data \{1, \ldots, n\} into k clusters. As before, S_i = h (equivalently i \in S_h^*) denotes that subject i belongs to cluster h.

  – \theta_h^* and \psi_h^* denote the unique values of the parameters associated with the data and the predictors in cluster h.

Page 11: Bayesian Generalized Product Partition Model

GPPM

• The joint distribution of (\theta, x) can be developed in a similar manner to (2):

  p(\theta, x) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + n)} \prod_{h=1}^{k} \alpha \, (n_h - 1)! \, G_0^{\theta}(\theta_h^*) \int \prod_{i \in S_h^*} g(x_i \mid \psi) \, dG_0^{\psi}(\psi)   (5)

• The conditional distribution of \theta given the predictors x is

  p(\theta \mid x) \propto \prod_{h=1}^{k} \alpha \, (n_h - 1)! \, G_0^{\theta}(\theta_h^*) \int \prod_{i \in S_h^*} g(x_i \mid \psi) \, dG_0^{\psi}(\psi)   (6)

• For comparison, (2) is shown below:

  p(\theta) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + n)} \prod_{h=1}^{k} \alpha \, (n_h - 1)! \, G_0(\theta_h^*)   (2)

• The cohesion in (6) is

  c(S_h^* \mid x^{(h)}) = \alpha \, (n_h - 1)! \int \prod_{i \in S_h^*} g(x_i \mid \psi) \, dG_0^{\psi}(\psi)   (7)

• (7) meets the criteria originally set out: the cohesion now depends on the predictors of the subjects in the cluster (a numerical sketch of (7) follows below).
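To make the generalized cohesion (7) concrete, here is a hedged sketch for one simple conjugate choice of predictor model, a univariate normal with a normal base measure on its mean; all names and hyperparameters are illustrative, not the paper's.

```python
import math
import numpy as np
from scipy.stats import multivariate_normal

def generalized_cohesion(x_cluster, alpha, mu0=0.0, tau0=1.0, sigma=1.0):
    """c(S_h^* | x^(h)) = alpha * (n_h - 1)! * int prod_i g(x_i | psi) dG_0(psi)
    for the illustrative model x_i | mu ~ N(mu, sigma^2), mu ~ N(mu0, tau0^2),
    whose marginal over a cluster is a correlated multivariate normal."""
    x = np.asarray(x_cluster, dtype=float)
    m = len(x)
    cov = sigma**2 * np.eye(m) + tau0**2 * np.ones((m, m))     # marginal covariance
    marginal = multivariate_normal(mean=np.full(m, mu0), cov=cov).pdf(x)
    return alpha * math.factorial(m - 1) * marginal

# Predictors that sit close together earn a larger cohesion than spread-out ones
print(generalized_cohesion([0.1, 0.2, 0.15], alpha=1.0))
print(generalized_cohesion([-2.0, 0.2, 2.5], alpha=1.0))
```

Clusters whose predictors sit close together receive larger cohesion, which is exactly the predictor dependence the GPPM is designed to induce.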

Page 12: Bayesian Generalized Product Partition Model

GPPM

• Some thoughts on the GPPM so far:

  – As noted earlier, the posterior distribution of a PPM is still in the class of PPMs, but with updated cohesion c(S_h^*) f(y^{(h)}).

  – Similarly, the posterior of a GPPM will also take the form of a GPPM.

  – (2) and (6) are quite similar. The extra portion of (6) is the marginalized probability of the predictors x^{(h)}.

  – If the predictor likelihood g(x_i \mid \psi) does not depend on \psi, the predictor terms factor out of the cohesion and the GPPM reverts to the Blackwell-MacQueen formulation, seen clearly in the following theorem.

Page 13: Bayesian Generalized Product Partition Model

Generalized Polya Urn Scheme

• The following theorem shows that the GPPM can induce a Blackwell-MacQueen Polya urn scheme, generalized for predictor dependence:

  \theta_i \mid \theta_{-i}, x \sim w_{i0} \, G_0^{\theta} + \sum_{h} w_{ih} \, \delta_{\theta_h^*}, \qquad w_{i0} \propto \alpha \int g(x_i \mid \psi) \, dG_0^{\psi}(\psi), \qquad w_{ih} \propto n_h^{(-i)} \int g(x_i \mid \psi) \, dG_h^{\psi}(\psi)

  – Where n_h^{(-i)} is the number of other subjects in cluster h and G_h^{\psi} is the posterior of \psi given the predictors of those subjects.

Page 14: Bayesian Generalized Product Partition Model

Generalized Polya Urn Scheme

• By the above theorem, subject i will do either 1) or 2):

  – 1) Draw a previously unseen unique value, with probability proportional to the concentration parameter and the base marginal of its predictor.

  – 2) Draw the previously used unique value of cluster h, with probability proportional to the number of subjects that have previously chosen that unique value and the marginal likelihood of its predictor value under that cluster (a sketch of these weights follows below).

• Further, since the predictors are treated as random variables, updating the posteriors on each cluster's predictor parameters makes the GPPM a flexible, nonparametric way to adapt the distance measure in predictor space.

• In this paper G is always integrated out; however, Dunson alludes to variational techniques that could be developed in similar fashion, following the fast variational DP proposed by Kurihara et al. (2006).
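The sketch below computes the two kinds of urn weights just described for a single subject, reusing the illustrative conjugate univariate-normal predictor model from the cohesion example; the names and hyperparameters are hypothetical.

```python
import numpy as np

def norm_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def allocation_weights(x_i, clusters_x, alpha, mu0=0.0, tau0=1.0, sigma=1.0):
    """Normalized generalized-urn weights for subject i:
    entry 0 -> start a new cluster: alpha * prior marginal of x_i;
    entry h -> join cluster h: n_h * posterior-predictive of x_i given x^(h)."""
    w = [alpha * norm_pdf(x_i, mu0, tau0**2 + sigma**2)]
    for x_h in clusters_x:
        m, xbar = len(x_h), float(np.mean(x_h))
        post_var = 1.0 / (1.0 / tau0**2 + m / sigma**2)
        post_mean = post_var * (mu0 / tau0**2 + m * xbar / sigma**2)
        w.append(m * norm_pdf(x_i, post_mean, post_var + sigma**2))
    return np.array(w) / np.sum(w)

# A subject with x_i = 0.2 is pulled toward the cluster whose predictors are nearby
print(allocation_weights(0.2, [[0.1, 0.3], [4.8, 5.1, 5.3]], alpha=1.0))
```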

Page 15: Bayesian Generalized Product Partition Model

Generalized Polya Urn Scheme

• Consider, for example, a Normal-Wishart prior on the predictor parameters \psi_h = (\mu_h, \Sigma_h^{-1}):

  \mu_h \mid \Sigma_h \sim N(\mu_0, \Sigma_h / (c_1 c_2)), \qquad \Sigma_h^{-1} \sim W(\nu_0, \Sigma_0)

• Where c_1 and c_2 are multiplicative constants and W(\nu_0, \Sigma_0) is a Wishart distribution with \nu_0 degrees of freedom and mean \Sigma_0.

• Notice that this formulation adds another multiplier to the precision of the predictor distribution. This corresponds analogously to the kernel width in KSBP and encourages tight local clustering in predictor space.

• The marginal distributions on the predictors from Theorem 1 take the forms shown on the next slide.

Page 16: Bayesian Generalized Product Partition Model

Generalized Polya Urn Scheme

• The marginal distribution of the predictor in the first weight is a non-central multivariate t-distribution with \nu_0^* degrees of freedom, mean \mu_0^* and scale \Sigma_x^*:

  p(x_i \mid \mu_0^*, \Sigma_x^*, \nu_0^*) = \frac{\Gamma((\nu_0^* + p)/2)}{\Gamma(\nu_0^*/2) \, (\nu_0^* \pi)^{p/2}} \, |\Sigma_x^*|^{-1/2} \left[ 1 + \frac{1}{\nu_0^*} (x_i - \mu_0^*)' \Sigma_x^{*-1} (x_i - \mu_0^*) \right]^{-(\nu_0^* + p)/2}

• The marginal distribution of the predictor in the second weight has the same functional form, but with hyperparameters updated by the other predictors in cluster h; in particular, the location is centered on \bar{x}_h^{(-i)}, the empirical mean of the predictors in cluster h without predictor i.
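For reference, a short sketch evaluating that multivariate-t log density in its standard parameterization; the helper name and arguments are illustrative.

```python
import numpy as np
from scipy.special import gammaln

def mvt_logpdf(x, mu, Sigma, nu):
    """Log density of a p-dimensional Student-t with location mu, scale matrix
    Sigma and nu degrees of freedom, matching the marginal form above."""
    x, mu, Sigma = np.atleast_1d(x), np.atleast_1d(mu), np.atleast_2d(Sigma)
    p = x.size
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)          # (x - mu)' Sigma^{-1} (x - mu)
    return (gammaln((nu + p) / 2.0) - gammaln(nu / 2.0)
            - 0.5 * (p * np.log(nu * np.pi) + np.linalg.slogdet(Sigma)[1])
            - 0.5 * (nu + p) * np.log1p(quad / nu))

print(mvt_logpdf([0.3, -0.1], mu=[0.0, 0.0], Sigma=np.eye(2), nu=4.0))
```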

Page 17: Bayesian Generalized Product Partition Model

Generalized Polya Urn Scheme

• Posterior updating in this model is straightforward using MCMC, with the conditional posteriors following from Theorem 1.

• The membership indicators S are updated separately from the cluster parameters \theta^*. Each indicator S_i is sampled from its multinomial conditional posterior, with probabilities proportional to the weights from Theorem 1 multiplied by the likelihood of y_i (a schematic sweep is sketched below).

• Next, the parameters \theta^* are updated conditioned on S and the number of clusters k, each \theta_h^* being drawn from the base prior updated with the likelihood of the data in cluster h.
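A schematic collapsed-Gibbs sweep in the spirit of this update; it is not the paper's exact blocked sampler (the conjugate cluster parameters are integrated out here rather than updated in a separate step), and the univariate-normal models for y and x are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def post_pred(v, obs, mu0=0.0, tau0=1.0, sigma=1.0):
    """Posterior-predictive normal density of v given observations 'obs' under the
    conjugate model v_j | mu ~ N(mu, sigma^2), mu ~ N(mu0, tau0^2); an empty 'obs'
    gives the prior marginal, which is what the new-cluster weight needs."""
    post_var = 1.0 / (1.0 / tau0**2 + len(obs) / sigma**2)
    post_mean = post_var * (mu0 / tau0**2 + float(np.sum(obs)) / sigma**2)
    var = post_var + sigma**2
    return np.exp(-0.5 * (v - post_mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gibbs_sweep(y, x, labels, alpha):
    """One sweep over the membership indicators S_i: each subject is held out and
    reassigned with probability proportional to its generalized-urn weight (which
    involves the predictor) times the predictive density of its response."""
    for i in range(len(y)):
        labels[i] = -1                                   # hold subject i out
        existing = np.unique(labels[labels >= 0])
        weights = [alpha * post_pred(x[i], []) * post_pred(y[i], [])]   # new cluster
        for h in existing:
            idx = np.flatnonzero(labels == h)
            weights.append(len(idx) * post_pred(x[i], x[idx]) * post_pred(y[i], y[idx]))
        choice = rng.choice(len(weights), p=np.array(weights) / np.sum(weights))
        labels[i] = labels.max() + 1 if choice == 0 else int(existing[choice - 1])
    _, labels[:] = np.unique(labels, return_inverse=True)   # relabel clusters 0..k-1
    return labels

y = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
x = np.concatenate([rng.normal(-1.0, 0.3, 50), rng.normal(1.0, 0.3, 50)])
labels = np.zeros(100, dtype=int)
for _ in range(20):
    labels = gibbs_sweep(y, x, labels, alpha=1.0)
print("clusters found:", len(np.unique(labels)))
```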

Page 18: Bayesian Generalized Product Partition Model

Results

• Dunson and Park demonstrate results using a mixture model for conditional density regression, whose components are:
  – x_i: the p-dimensional predictor
  – f(y_i \mid \theta^*_{S_i}): the data likelihood
  – \theta_h^*: the parameters of cluster h

• Results are demonstrated on 3 datasets:
  – Simulated single Gaussian (p = 2)
  – Simulated mixture of two Gaussians (p = 2)
  – Epidemiology data (p = 3)

Page 19: Bayesian Generalized Product Partition Model

Results

• Simulated single Gaussian data, 500 data points
  – x_i is generated iid from a uniform distribution over (0, 1).
  – The data were then simulated from a single-Gaussian conditional model.

• The algorithm was run for 10,000 iterations with a 1,000-iteration burn-in, showing fast mixing and good estimates.

[Figure: raw data, y plotted against x]

• Shown are conditional distributions of y for two different values of x. The dotted line is the truth, the solid line is the estimate, and the dashed lines are 99% credible intervals.

[Figure: estimated conditional densities p(y | x) at two values of x]

Page 20: Bayesian Generalized Product Partition Model

Results

• Simulated mixture of two Gaussians, 500 data points
  – x_i is generated iid from a uniform distribution over (0, 1).
  – The data were then simulated from a two-component Gaussian mixture conditional model.

• Here, the left column of plots is for a PPM (non-generalized), while the right column is for the GPPM on the same dataset. Notice the much better fit in the bottom plots, and that the GPPM is not dragged toward 0 as the second peak appears when x approaches 0.

[Figure: estimated conditional densities, PPM (left) vs. GPPM (right)]

Page 21: Bayesian Generalized Product Partition Model

Results

• Epidemiologic application:
• DDE has been shown to increase the rate of pre-term birth. The two predictors correspond to the DDE dose for child i and the mother's age after normalization, respectively.
• The dataset contained 2,313 subjects.
• The GPPM MCMC was run for 30,000 iterations with a 10,000-iteration burn-in.
• The results confirmed earlier findings, with the estimated densities showing a slightly decreasing trend as the DDE level rises.
• These findings are similar to previous KSBP work on the same dataset, but the implementation was simpler.

Page 22: Bayesian Generalized Product Partition Model

Results

[Figure: raw data and estimated conditional densities; dashed lines indicate 99% credible intervals]

Page 23: Bayesian Generalized Product Partition Model

Conclusion

• A GPPM was formulated beginning with the Blackwell-MacQueen Polya urn scheme.

• The GPPM incorporates predictor dependence by treating the predictor as a random variable.
  – It is similar in spirit to the KSBP, but bypasses issues such as kernel width selection and the inability to place a continuous distribution over predictor space.

• Future research directions could explore Dunson's mention of a variational method similar to the formulation proposed in this paper.