
Poster DDP (BNP 2011 Veracruz)


Multidimensional covariate dependent Dirichlet processes

Julyan Arbel
ENSAE, CREST, France

Abstract
Introduction of a dependent Dirichlet process (DDP) conditional on a covariate $x$, with varying weights $p_j(x)$ and clusters $\theta_j(x)$.

1. General theoretical results on the measure of dependence,

2. Polya Urn type predictive rule,

3. Construction of a novel DDP, based on the simulation of gamma random variables. Easy and fast simulation of the posterior. Generalization to multidimensional covariates.

DDP definition
Wide class of DDP [MacEachern, 1999]: a family of random probability measures indexed by a covariate $x$ defined on a Euclidean space $\mathcal{X} \subset \mathbb{R}^k$, $G_{\mathcal{X}} = \{G_x : x \in \mathcal{X}\}$,

$$G_x = \sum_{j=1}^{\infty} p_j(x)\, \delta_{\theta_j(x)}, \qquad p_j(x) = V_j(x) \prod_{l<j} (1 - V_l(x)),$$
with $V_j(x) \sim \mathrm{Beta}(1, M)$ and $\theta_j(x) \sim G_0$ iid.

Important features:

• Multivariate processes $V_{\mathcal{X}}$ and $\theta_{\mathcal{X}}$.

• Stationarity across $x$, marginally $\mathrm{DP}(M, G_0)$.
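A minimal NumPy sketch of the marginal stick-breaking construction above, truncated at $J$ atoms and with a standard-normal base measure $G_0$ (both $J$ and $G_0$ are illustrative choices, not from the poster); covariate dependence would enter through how the $V_j(x)$ and $\theta_j(x)$ vary with $x$, which is the subject of Box 3.

```python
import numpy as np

def stick_breaking_dp(M, J, base_sampler, rng):
    """One truncated stick-breaking draw of a DP(M, G0): weights p_j and atoms theta_j."""
    V = rng.beta(1.0, M, size=J)                # V_j ~ Beta(1, M)
    V[-1] = 1.0                                 # close the truncation so the weights sum to 1
    p = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))  # p_j = V_j * prod_{l<j} (1 - V_l)
    theta = base_sampler(J, rng)                # theta_j ~ G0 iid
    return p, theta

rng = np.random.default_rng(0)
p, theta = stick_breaking_dp(M=1.0, J=50,
                             base_sampler=lambda n, r: r.standard_normal(n), rng=rng)
xi = rng.choice(theta, p=p)                     # one draw xi | G ~ G
```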

Literature review
There has been increasing interest since MacEachern [1999], with the following generalizations:

• ANOVA modeling [De Iorio et al., 2004],

• Spatial statistics [Gelfand et al., 2005, Duan et al., 2007],

• Dynamic models [Caron et al., 2006],

• Order-based DDP, or $\pi$-DDP [Griffin and Steel, 2006],

• Many priors by Dunson and coauthors, including weighted mixtures of DP [Dunson and Park, 2008] and kernel stick-breaking processes, or KSBP [Dunson et al., 2007],

• Spatial normalized gamma processes, denoted SNΓP [Rao and Teh, 2009].

1. Measure of dependence
Denote $\xi_x \mid G_x \sim G_x$ and $\xi_u \mid G_u \sim G_u$, $x \neq u$. Define the $\alpha_{A,B}$ dependence between $\xi_x$ and $\xi_u$ by
$$\alpha_{A,B}(\xi_x, \xi_u) = P(\xi_x \in A,\, \xi_u \in B) - P(\xi_x \in A)\, P(\xi_u \in B).$$
Motivation: $\alpha_{A,B}(\xi_x, \xi_u) = \mathrm{Cov}(G_x(A), G_u(B))$.

Proposition 1. We have
$$\alpha_{A,B}(\xi_x, \xi_u) = c_V(x, u)\, \alpha_{A,B}(\theta_x, \theta_u),$$
where $c_V(x, u) = \dfrac{\mu(x, u)}{2M + 1 - \mu(x, u)}$ and $\mu(x, u) = E(V_x V_u)$.

Interesting result: the $c_V(x, u)$ coefficient does not go to 0 as the distance between $x$ and $u$ goes to $\infty$.

FIGURE 1: Variation of $c_V(x, u)$ w.r.t. $d(x, u)$.

2. Predictive rule
We have a predictive rule of the same type as the Polya urn, derived in the same way as for the KSBP of Dunson and Park [2008].

Proposition 2. Denote $\xi_i \mid G_{x_i} \sim G_{x_i}$; then
$$P(\xi_i \in \cdot \mid \xi_1, \ldots, \xi_{i-1}) = \pi_0\, G_0(\cdot) + \sum_{j=1}^{i-1} \pi_j\, \delta_{\xi_j}(\cdot),$$
where the probability weights $\pi_j$ depend on the covariates $x$ through the expectations $\mu_I = E\!\left(\prod_{i \in I} V_{x_i}\right)$.
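As a consistency check (not stated on the poster): when all covariates coincide, $x_1 = \cdots = x_i$, the marginal $\mathrm{DP}(M, G_0)$ property implies that the rule reduces to the Blackwell–MacQueen urn,
$$P(\xi_i \in \cdot \mid \xi_1, \ldots, \xi_{i-1}) = \frac{M}{M + i - 1}\, G_0(\cdot) + \sum_{j=1}^{i-1} \frac{1}{M + i - 1}\, \delta_{\xi_j}(\cdot),$$
i.e. $\pi_0 = M/(M + i - 1)$ and $\pi_j = 1/(M + i - 1)$.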

Shape coefficients

FIGURE 2: Two definitions of $\alpha$ (cf. Box 3). Left: 1D covariates; idea from Trippa et al. [2011]. Right: 2D covariates (generalizes to $\mathbb{R}^k$). [Diagram: overlapping neighborhoods around $x_1$, $x_2$, $x_3$, with shape coefficients $\alpha_1, \alpha_2, \alpha_3, \alpha_{12}, \alpha_{23}, \alpha_{123}$ attached to the parts of the intersection pattern.]

3. Novel DDP based on gamma random variables
Process on the beta breaks, $V_{\mathcal{X}}$. Basic idea:

If $X \sim \mathrm{Ga}(a)$ and $Y \sim \mathrm{Ga}(b)$ are independent, then
$$X + Y \sim \mathrm{Ga}(a + b) \quad \text{and} \quad \frac{X}{X + Y} \sim \mathrm{Be}(a, b).$$
($\mathrm{Ga}(a, 1)$ is denoted $\mathrm{Ga}(a)$.)

Given covariates $\mathcal{X}_n = \{x_1, \ldots, x_n\}$:

• Intersecting neighborhoods (kernels or balls) $A_i$ centered at $x_i$, of $\lambda$-measure 1 ($\lambda$ the Lebesgue measure, e.g.).

• Partition of $A = \cup_{i=1}^n A_i$ into non-intersecting parts $A_I$, $I \subset \{1, \ldots, n\}$.

• Shape parameters $\alpha_I = \lambda(A_I)$ (cf. Fig. 2).

• $\Gamma_I \sim \mathrm{Ga}(\alpha_I)$ iid (and $\Gamma^M_I \sim \mathrm{Ga}(M \alpha_I)$ iid).

• $\Gamma_{x_i} = \sum_{I : i \in I} \Gamma_I$, dependent (and $\Gamma^M_{x_i} = \sum_{I : i \in I} \Gamma^M_I$). Since $\sum_{I : i \in I} \alpha_I = 1$, we get $\Gamma_{x_i} \sim \mathrm{Ga}(1)$.

Eventually, set $V_{x_i} = \dfrac{\Gamma_{x_i}}{\Gamma_{x_i} + \Gamma^M_{x_i}}$.

Example of Figure 2
$\mathcal{X} = \{x_1, x_2, x_3\}$, $\Gamma_1 \sim \mathrm{Ga}(\alpha_1), \ldots, \Gamma_{123} \sim \mathrm{Ga}(\alpha_{123})$,
$$\Gamma_{x_1} = \Gamma_1 + \Gamma_{12} + \Gamma_{123}, \qquad \Gamma^M_{x_1} = \Gamma^M_1 + \Gamma^M_{12} + \Gamma^M_{123}, \qquad V_{x_1} = \frac{\Gamma_{x_1}}{\Gamma_{x_1} + \Gamma^M_{x_1}}.$$
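A minimal NumPy sketch of this example (a sketch, not the poster's code): the shape coefficients below are placeholder values chosen so that, for each $x_i$, the parts containing $x_i$ have total mass 1, which is what gives $\Gamma_{x_i} \sim \mathrm{Ga}(1)$ and hence $V_{x_i} \sim \mathrm{Be}(1, M)$ marginally; dependence between the $V_{x_i}$ comes from the shared $\Gamma_I$ terms.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 1.0  # DP precision parameter

# Placeholder shape coefficients for the partition of A_1 ∪ A_2 ∪ A_3 (Figure 2),
# chosen so that the alphas of the parts containing each x_i sum to 1.
alpha = {(1,): 0.5, (2,): 0.2, (3,): 0.5, (1, 2): 0.3, (2, 3): 0.3, (1, 2, 3): 0.2}

# One gamma variable per part, with shapes alpha_I and M * alpha_I.
gamma_I  = {I: rng.gamma(a)     for I, a in alpha.items()}   # Gamma_I   ~ Ga(alpha_I)
gamma_MI = {I: rng.gamma(M * a) for I, a in alpha.items()}   # Gamma^M_I ~ Ga(M alpha_I)

V = {}
for i in (1, 2, 3):
    g  = sum(gamma_I[I]  for I in alpha if i in I)   # Gamma_{x_i}   ~ Ga(1)
    gM = sum(gamma_MI[I] for I in alpha if i in I)   # Gamma^M_{x_i} ~ Ga(M)
    V[i] = g / (g + gM)                              # V_{x_i} ~ Be(1, M) marginally, dependent across i
print(V)
```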

Process on the clusters, $\theta_{\mathcal{X}}$
Single-$\theta$ model (fixed clusters across $x$): does not give satisfactory results.
Multivariate Gaussian prior on $\theta_{\mathcal{X}}$, centered, with variance-covariance matrix elements $\Sigma_{xu} = \sigma^2 \alpha(x, u)$, where $\alpha(x, u)$ is defined from the $\alpha_I$ and $\sigma^2$ gives the diagonal elements. Conjugate to the Gaussian mixing kernel of the DP mixture.
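The poster does not spell out $\alpha(x, u)$; one plausible reading, sketched below under that assumption, takes $\alpha(x, u) = \sum_{I:\, x, u \in I} \alpha_I$ (the shape mass shared by $x$ and $u$, so that $\alpha(x, x) = 1$), which makes $\Sigma$ positive semidefinite with $\sigma^2$ on the diagonal. The helper `shared_alpha` and the reused placeholder `alpha` values are assumptions, not the poster's definition.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 1.0
# Same placeholder shape coefficients as in the sketch above.
alpha = {(1,): 0.5, (2,): 0.2, (3,): 0.5, (1, 2): 0.3, (2, 3): 0.3, (1, 2, 3): 0.2}
xs = (1, 2, 3)

def shared_alpha(i, j):
    """Assumed alpha(x_i, x_j): total shape mass of the parts containing both covariates."""
    return sum(a for I, a in alpha.items() if i in I and j in I)

# Sigma_{xu} = sigma^2 * alpha(x, u); diagonal entries equal sigma^2 since alpha(x, x) = 1.
Sigma = sigma2 * np.array([[shared_alpha(i, j) for j in xs] for i in xs])
theta_x = rng.multivariate_normal(mean=np.zeros(len(xs)), cov=Sigma)  # one cluster location per covariate
```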

Discussion
Difference with the SNΓP of Rao and Teh [2009]: a DDP built from two different processes $V_{\mathcal{X}}$ and $\theta_{\mathcal{X}}$ ⇒ more flexibility than the SNΓP, which is derived from a single gamma process.

Posterior computation

An MCMC algorithm based on the blocked Gibbs sampler for truncated Dirichlet processes iteratively samples from the full conditionals of:

1. Allocation variables,

2. Beta process,

3. Clusters (conjugate Gaussian),

4. Additional hyperparameters.

Metropolis-Hastings step for the Beta process sampling: the proposal is drawn from the prior, and the acceptance probability $\rho$ is given by the likelihood ratio ⇒ good acceptance rate (> 1/2).

Algorithm 1: Beta process full conditional
1: Given a current value $V_j$, sample a new value $V^*_j$ independently from the Beta process prior.
2: The acceptance probability is $\rho = \min\!\left(1, l(V^*_j)/l(V_j)\right)$.
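A minimal Python sketch of this independence Metropolis-Hastings step (the function names, the toy prior, and the dummy likelihood are placeholders, not the poster's implementation); since the proposal is the prior, the prior terms cancel and the acceptance ratio reduces to the likelihood ratio.

```python
import numpy as np

def mh_update_V(V_current, prior_sampler, log_lik, rng):
    """Independence MH step: propose V* from the prior and
    accept with probability min(1, l(V*) / l(V))."""
    V_prop = prior_sampler(rng)                                # step 1: V* drawn from the prior
    log_rho = min(0.0, log_lik(V_prop) - log_lik(V_current))   # step 2: log acceptance probability
    return V_prop if np.log(rng.uniform()) < log_rho else V_current

# Toy usage with a Beta(1, M) prior and a dummy likelihood, just to show the call pattern.
rng = np.random.default_rng(3)
M = 1.0
V = rng.beta(1.0, M)
V = mh_update_V(V, prior_sampler=lambda r: r.beta(1.0, M),
                log_lik=lambda v: -0.5 * (v - 0.3) ** 2, rng=rng)
```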

Acknowledgements
The author is funded by ENSAE. The project was partly supported by Fondation Sciences Mathématiques de Paris.