
Poster DDP (BNP 2011 Veracruz)


Multidimensional covariate dependent Dirichlet processes

Julyan Arbel
ENSAE, CREST, France

Abstract
Introduction of a dependent Dirichlet process (DDP) conditional on a covariate $x$, with varying weights $p_j(x)$ and clusters $\theta_j(x)$.

1. General theoretical results on the measure of dependence,

2. Polya Urn type predictive rule,

3. Construction of a novel DDP, based on the simulation of gamma random variables. Easy and fast simulation of the posterior. Generalization to multidimensional covariates.

DDP definition
Wide class of DDP [MacEachern, 1999]: a family of random probability measures indexed by a covariate $x$ defined on a Euclidean space $\mathcal{X} \subset \mathbb{R}^k$, $G_{\mathcal{X}} = \{G_x : x \in \mathcal{X}\}$,

$$G_x = \sum_{j=1}^{\infty} p_j(x)\, \delta_{\theta_j(x)}, \qquad p_j(x) = V_j(x) \prod_{l<j} (1 - V_l(x)),$$
with $V_j(x) \sim \mathrm{Beta}(1, M)$ and $\theta_j(x) \sim G_0$ iid.

Important features:

• Multivariate processes $V_{\mathcal{X}}$ and $\theta_{\mathcal{X}}$.

• Stationarity across $x$, marginally $\mathrm{DP}(M, G_0)$.
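A minimal NumPy sketch of the marginal stick-breaking construction above, truncated at $J$ atoms and with a standard-normal base measure $G_0$ (both $J$ and $G_0$ are illustrative choices, not from the poster); covariate dependence would enter through how the $V_j(x)$ and $\theta_j(x)$ vary with $x$, which is the subject of Box 3.

```python
import numpy as np

def stick_breaking_dp(M, J, base_sampler, rng):
    """One truncated stick-breaking draw of a DP(M, G0): weights p_j and atoms theta_j."""
    V = rng.beta(1.0, M, size=J)                # V_j ~ Beta(1, M)
    V[-1] = 1.0                                 # close the truncation so the weights sum to 1
    p = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))  # p_j = V_j * prod_{l<j} (1 - V_l)
    theta = base_sampler(J, rng)                # theta_j ~ G0 iid
    return p, theta

rng = np.random.default_rng(0)
p, theta = stick_breaking_dp(M=1.0, J=50,
                             base_sampler=lambda n, r: r.standard_normal(n), rng=rng)
xi = rng.choice(theta, p=p)                     # one draw xi | G ~ G
```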

Literature review
There has been increasing interest since MacEachern [1999], with the following generalizations:

• ANOVA modeling [De Iorio et al., 2004],

• Spatial statistics [Gelfand et al., 2005, Duan et al., 2007],

• Dynamic models [Caron et al., 2006],

• Order-based DDP, or $\pi$-DDP [Griffin and Steel, 2006],

• Many priors by Dunson and coauthors, including weighted mixtures of DP [Dunson and Park, 2008] and kernel stick-breaking processes, or KSBP [Dunson et al., 2007],

• Spatial normalized gamma processes, denoted SNΓP [Rao and Teh, 2009].

1. Measure of dependence
Denote $\xi_x \mid G_x \sim G_x$ and $\xi_u \mid G_u \sim G_u$, $x \neq u$. Define the $\alpha_{A,B}$ dependence between $\xi_x$ and $\xi_u$ by
$$\alpha_{A,B}(\xi_x, \xi_u) = P(\xi_x \in A,\, \xi_u \in B) - P(\xi_x \in A)\, P(\xi_u \in B).$$
Motivation: $\alpha_{A,B}(\xi_x, \xi_u) = \mathrm{Cov}(G_x(A), G_u(B))$.

Proposition 1. We have
$$\alpha_{A,B}(\xi_x, \xi_u) = c_V(x, u)\, \alpha_{A,B}(\theta_x, \theta_u),$$
where $c_V(x, u) = \dfrac{\mu(x, u)}{2M + 1 - \mu(x, u)}$ and $\mu(x, u) = E(V_x V_u)$.

Interesting result: the $c_V(x, u)$ coefficient does not go to 0 as the distance between $x$ and $u$ goes to $\infty$.

FIGURE 1: Variation of $c_V(x, u)$ w.r.t. $d(x, u)$.

2. Predictive rule
We have a predictive rule of the same type as the Polya urn, derived in the same way as for the KSBP of Dunson and Park [2008].

Proposition 2. Denote $\xi_i \mid G_{x_i} \sim G_{x_i}$; then
$$P(\xi_i \in \cdot \mid \xi_1, \ldots, \xi_{i-1}) = \pi_0\, G_0(\cdot) + \sum_{j=1}^{i-1} \pi_j\, \delta_{\xi_j}(\cdot),$$
where the probability weights $\pi_j$ depend on the covariates $x$ through the expectations $\mu_I = E\!\left(\prod_{i \in I} V_{x_i}\right)$.
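As a consistency check (not stated on the poster): when all covariates coincide, $x_1 = \cdots = x_i$, the marginal $\mathrm{DP}(M, G_0)$ property implies that the rule reduces to the Blackwell–MacQueen urn,
$$P(\xi_i \in \cdot \mid \xi_1, \ldots, \xi_{i-1}) = \frac{M}{M + i - 1}\, G_0(\cdot) + \sum_{j=1}^{i-1} \frac{1}{M + i - 1}\, \delta_{\xi_j}(\cdot),$$
i.e. $\pi_0 = M/(M + i - 1)$ and $\pi_j = 1/(M + i - 1)$.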

Shape coefficients

FIGURE 2: Two definitions of $\alpha$ (cf. Box 3). Left: 1D covariates; idea from Trippa et al. [2011]. Right: 2D covariates (generalizes to $\mathbb{R}^k$). [Diagram: overlapping neighborhoods around $x_1$, $x_2$, $x_3$, with shape coefficients $\alpha_1, \alpha_2, \alpha_3, \alpha_{12}, \alpha_{23}, \alpha_{123}$ attached to the parts of the intersection pattern.]

3. Novel DDP based on gamma random variables
Process on the beta breaks, $V_{\mathcal{X}}$. Basic idea:

If $X \sim \mathrm{Ga}(a)$ and $Y \sim \mathrm{Ga}(b)$ are independent, then
$$X + Y \sim \mathrm{Ga}(a + b) \quad \text{and} \quad \frac{X}{X + Y} \sim \mathrm{Be}(a, b).$$
($\mathrm{Ga}(a, 1)$ is denoted $\mathrm{Ga}(a)$.)

Given covariates $\mathcal{X}_n = \{x_1, \ldots, x_n\}$:

• Intersecting neighborhoods (kernels or balls) $A_i$ centered at $x_i$, of $\lambda$-measure 1 ($\lambda$ the Lebesgue measure, e.g.).

• Partition of $A = \cup_{i=1}^n A_i$ into non-intersecting parts $A_I$, $I \subset \{1, \ldots, n\}$.

• Shape parameters $\alpha_I = \lambda(A_I)$ (cf. Fig. 2).

• $\Gamma_I \sim \mathrm{Ga}(\alpha_I)$ iid (and $\Gamma^M_I \sim \mathrm{Ga}(M \alpha_I)$ iid).

• $\Gamma_{x_i} = \sum_{I : i \in I} \Gamma_I$, dependent (and $\Gamma^M_{x_i} = \sum_{I : i \in I} \Gamma^M_I$). Since $\sum_{I : i \in I} \alpha_I = 1$, we get $\Gamma_{x_i} \sim \mathrm{Ga}(1)$.

Eventually, set $V_{x_i} = \dfrac{\Gamma_{x_i}}{\Gamma_{x_i} + \Gamma^M_{x_i}}$.

Example of Figure 2
$\mathcal{X} = \{x_1, x_2, x_3\}$, $\Gamma_1 \sim \mathrm{Ga}(\alpha_1), \ldots, \Gamma_{123} \sim \mathrm{Ga}(\alpha_{123})$,
$$\Gamma_{x_1} = \Gamma_1 + \Gamma_{12} + \Gamma_{123}, \qquad \Gamma^M_{x_1} = \Gamma^M_1 + \Gamma^M_{12} + \Gamma^M_{123}, \qquad V_{x_1} = \frac{\Gamma_{x_1}}{\Gamma_{x_1} + \Gamma^M_{x_1}}.$$
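A minimal NumPy sketch of this example (a sketch, not the poster's code): the shape coefficients below are placeholder values chosen so that, for each $x_i$, the parts containing $x_i$ have total mass 1, which is what gives $\Gamma_{x_i} \sim \mathrm{Ga}(1)$ and hence $V_{x_i} \sim \mathrm{Be}(1, M)$ marginally; dependence between the $V_{x_i}$ comes from the shared $\Gamma_I$ terms.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 1.0  # DP precision parameter

# Placeholder shape coefficients for the partition of A_1 ∪ A_2 ∪ A_3 (Figure 2),
# chosen so that the alphas of the parts containing each x_i sum to 1.
alpha = {(1,): 0.5, (2,): 0.2, (3,): 0.5, (1, 2): 0.3, (2, 3): 0.3, (1, 2, 3): 0.2}

# One gamma variable per part, with shapes alpha_I and M * alpha_I.
gamma_I  = {I: rng.gamma(a)     for I, a in alpha.items()}   # Gamma_I   ~ Ga(alpha_I)
gamma_MI = {I: rng.gamma(M * a) for I, a in alpha.items()}   # Gamma^M_I ~ Ga(M alpha_I)

V = {}
for i in (1, 2, 3):
    g  = sum(gamma_I[I]  for I in alpha if i in I)   # Gamma_{x_i}   ~ Ga(1)
    gM = sum(gamma_MI[I] for I in alpha if i in I)   # Gamma^M_{x_i} ~ Ga(M)
    V[i] = g / (g + gM)                              # V_{x_i} ~ Be(1, M) marginally, dependent across i
print(V)
```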

Process on the clusters, $\theta_{\mathcal{X}}$
Single-$\theta$ model (fixed clusters across $x$): does not give satisfactory results.
Multivariate Gaussian prior on $\theta_{\mathcal{X}}$, centered, with variance-covariance matrix elements $\Sigma_{xu} = \sigma^2 \alpha(x, u)$, where $\alpha(x, u)$ is defined from the $\alpha_I$ and $\sigma^2$ gives the diagonal elements. Conjugate to the Gaussian mixing kernel of the DP mixture.
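The poster does not spell out $\alpha(x, u)$; one plausible reading, sketched below under that assumption, takes $\alpha(x, u) = \sum_{I:\, x, u \in I} \alpha_I$ (the shape mass shared by $x$ and $u$, so that $\alpha(x, x) = 1$), which makes $\Sigma$ positive semidefinite with $\sigma^2$ on the diagonal. The helper `shared_alpha` and the reused placeholder `alpha` values are assumptions, not the poster's definition.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 1.0
# Same placeholder shape coefficients as in the sketch above.
alpha = {(1,): 0.5, (2,): 0.2, (3,): 0.5, (1, 2): 0.3, (2, 3): 0.3, (1, 2, 3): 0.2}
xs = (1, 2, 3)

def shared_alpha(i, j):
    """Assumed alpha(x_i, x_j): total shape mass of the parts containing both covariates."""
    return sum(a for I, a in alpha.items() if i in I and j in I)

# Sigma_{xu} = sigma^2 * alpha(x, u); diagonal entries equal sigma^2 since alpha(x, x) = 1.
Sigma = sigma2 * np.array([[shared_alpha(i, j) for j in xs] for i in xs])
theta_x = rng.multivariate_normal(mean=np.zeros(len(xs)), cov=Sigma)  # one cluster location per covariate
```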

Discussion
Difference with the SNΓP of Rao and Teh [2009]: a DDP built from two different processes $V_{\mathcal{X}}$ and $\theta_{\mathcal{X}}$ ⇒ more flexibility than the SNΓP, which is derived from a single gamma process.

Posterior computation

An MCMC algorithm based on the blocked Gibbs sampler for truncated Dirichlet processes iteratively samples from the full conditionals of:

1. Allocation variables,

2. Beta process,

3. Clusters (conjugate Gaussian),

4. Additional hyperparameters.

Metropolis-Hastings step for the Beta process sampling: the proposal is drawn from the prior, and the acceptance probability $\rho$ is given by the likelihood ratio ⇒ good acceptance rate (> 1/2).

Algorithm 1: Beta process full conditional
1: Given a current value $V_j$, sample a new value $V^*_j$ independently from the Beta process prior.
2: The acceptance probability is $\rho = \min\!\left(1, l(V^*_j)/l(V_j)\right)$.
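A minimal Python sketch of this independence Metropolis-Hastings step (the function names, the toy prior, and the dummy likelihood are placeholders, not the poster's implementation); since the proposal is the prior, the prior terms cancel and the acceptance ratio reduces to the likelihood ratio.

```python
import numpy as np

def mh_update_V(V_current, prior_sampler, log_lik, rng):
    """Independence MH step: propose V* from the prior and
    accept with probability min(1, l(V*) / l(V))."""
    V_prop = prior_sampler(rng)                                # step 1: V* drawn from the prior
    log_rho = min(0.0, log_lik(V_prop) - log_lik(V_current))   # step 2: log acceptance probability
    return V_prop if np.log(rng.uniform()) < log_rho else V_current

# Toy usage with a Beta(1, M) prior and a dummy likelihood, just to show the call pattern.
rng = np.random.default_rng(3)
M = 1.0
V = rng.beta(1.0, M)
V = mh_update_V(V, prior_sampler=lambda r: r.beta(1.0, M),
                log_lik=lambda v: -0.5 * (v - 0.3) ** 2, rng=rng)
```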

Acknowledgements
The author is funded by ENSAE. The project was partly supported by Fondation Sciences Mathématiques de Paris.