Multidimensional covariate dependent Dirichlet processes
Julyan Arbel
ENSAE, CREST, France
Abstract
We introduce a dependent Dirichlet process (DDP) conditional on a covariate $x$, with varying weights $p_j(x)$ and clusters $\theta_j(x)$.
1. General theoretical results on the measure of dependence,
2. Polya urn type predictive rule,
3. Construction of a novel DDP, based on the simulation of gamma random variables. Easy and fast simulation of the posterior. Generalization to multidimensional covariates.
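DDP definition
Wide class of DDP [MacEachern, 1999]: family of random probability measures indexed by a covariate $x$ defined on a Euclidean space $\mathcal{X} \subset \mathbb{R}^k$, $G_{\mathcal{X}} = \{G_x : x \in \mathcal{X}\}$,
\[ G_x = \sum_{j=1}^{\infty} p_j(x)\,\delta_{\theta_j(x)}, \qquad p_j(x) = V_j(x) \prod_{l<j} (1 - V_l(x)), \]
with $V_j(x) \sim \mathrm{Beta}(1, M)$ and $\theta_j(x) \sim G_0$ iid.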
Important features:
• Multivariate processes $V_{\mathcal{X}} \perp \theta_{\mathcal{X}}$ (independent).
• Stationarity across $x$: marginally, $G_x \sim \mathrm{DP}(M, G_0)$.
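At a single covariate value, the stick-breaking definition above can be simulated directly. The following is a minimal sketch under stated assumptions: the infinite sum is truncated at `n_atoms`, the base measure $G_0$ is taken to be a standard normal, and the function name `stick_breaking_weights` is hypothetical.

```python
import numpy as np

def stick_breaking_weights(M, n_atoms, rng):
    """Truncated stick-breaking weights p_j = V_j * prod_{l<j} (1 - V_l)."""
    V = rng.beta(1.0, M, size=n_atoms)  # V_j ~ Beta(1, M)
    # Length of stick left before break j: prod_{l<j} (1 - V_l), equal to 1 for j = 1.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    return V * remaining

rng = np.random.default_rng(0)
p = stick_breaking_weights(M=2.0, n_atoms=200, rng=rng)
theta = rng.normal(size=p.size)  # atoms from an assumed base measure G0 = N(0, 1)
print(p.sum())                   # close to 1 when the truncation level is large
```

The pair `(p, theta)` is then one truncated draw of $G_x$ at a fixed $x$; the dependence across covariate values is not modeled in this sketch.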
Literature review
There has been increasing interest since MacEachern [1999], with the following generalizations:
• ANOVA modeling [De Iorio et al., 2004],
• Spatial statistics [Gelfand et al., 2005, Duan et al., 2007],
• Dynamic models [Caron et al., 2006],
• Order-based DDP, or π-DDP [Griffin and Steel, 2006],
• Many priors by Dunson and coauthors, including weighted mixtures of DP [Dunson and Park, 2008], and kernel stick-breaking processes, or KSBP [Dunson et al., 2007],
• Spatial normalized gamma processes, denoted SNΓP [Rao and Teh, 2009].
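1. Measure of dependence
Denote $\xi_x | G_x \sim G_x$ and $\xi_u | G_u \sim G_u$, $x \neq u$. Define the $\alpha_{A,B}$ dependence between $\xi_x$ and $\xi_u$ by
\[ \alpha_{A,B}(\xi_x, \xi_u) = P(\xi_x \in A, \xi_u \in B) - P(\xi_x \in A)\,P(\xi_u \in B). \]
Motivation: $\alpha_{A,B}(\xi_x, \xi_u) = \mathrm{Cov}(G_x(A), G_u(B))$.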
Proposition 1. We have
\[ \alpha_{A,B}(\xi_x, \xi_u) = c_V(x, u)\,\alpha_{A,B}(\theta_x, \theta_u), \]
where
\[ c_V(x, u) = \frac{\mu(x, u)}{\frac{2}{M+1} - \mu(x, u)} \quad \text{and} \quad \mu(x, u) = E(V_x V_u). \]
Interesting result: the coefficient $c_V(x, u)$ does not go to 0 as the distance between $x$ and $u$ goes to $\infty$.
FIGURE 1: Variation of $c_V(x, u)$ w.r.t. $d(x, u)$
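The two extreme cases of the coefficient can be checked numerically. The sketch below reads Proposition 1 as $c_V = \mu / (2/(M+1) - \mu)$ and relies on two assumptions that the poster does not state explicitly: at $x = u$, $\mu = E(V^2)$ for $V \sim \mathrm{Beta}(1, M)$, and at large distance $V_x$ and $V_u$ are independent, so that $\mu = E(V)^2$.

```python
def c_V(mu, M):
    """Coefficient of Proposition 1, read as mu / (2 / (M + 1) - mu)."""
    return mu / (2.0 / (M + 1.0) - mu)

M = 3.0
mu_same = 2.0 / ((M + 1.0) * (M + 2.0))  # E(V^2) for V ~ Beta(1, M), case x = u
mu_indep = 1.0 / (M + 1.0) ** 2          # E(V)^2, assuming V_x and V_u independent

print(c_V(mu_same, M))   # equals 1 / (M + 1), the usual DP value
print(c_V(mu_indep, M))  # equals 1 / (2 M + 1), strictly positive
```

Under these assumptions the coefficient decreases from $1/(M+1)$ to $1/(2M+1)$ rather than to 0.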
2. Predictive rule
We have a predictive rule of the same type as the Polya urn, derived in the same way as for the KSBP of Dunson and Park [2008].
Proposition 2. Denote $\xi_i | G_{x_i} \sim G_{x_i}$, then
\[ P(\xi_i \in \cdot \mid \xi_1, \ldots, \xi_{i-1}) = \pi_0 G_0(\cdot) + \sum_{j=1}^{i-1} \pi_j \delta_{\xi_j}(\cdot), \]
where the probability weights $\pi_j$ depend on the covariates $x$ through the expectations $\mu_I = E\left(\prod_{i \in I} V_{x_i}\right)$.
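For orientation, the classical Polya urn of a single $\mathrm{DP}(M, G_0)$ has the same form, with $\pi_0 = M/(M+i-1)$ and $\pi_j = 1/(M+i-1)$. The sketch below simulates only this classical rule, not the covariate-dependent weights of Proposition 2, which require the expectations $\mu_I$. The function name `polya_urn_sample` and the normal base measure are assumptions.

```python
import numpy as np

def polya_urn_sample(n, M, rng, base_sampler):
    """Classical Polya urn for DP(M, G0): no covariate dependence."""
    draws = []
    for i in range(n):
        # With i previous draws, a fresh value from G0 has probability M / (M + i);
        # otherwise one of the previous draws is copied uniformly at random.
        if rng.random() < M / (M + i):
            draws.append(base_sampler(rng))
        else:
            draws.append(draws[rng.integers(i)])
    return np.array(draws)

rng = np.random.default_rng(1)
draws = polya_urn_sample(50, M=1.0, rng=rng, base_sampler=lambda r: r.normal())
print(len(np.unique(draws)))  # number of distinct clusters among 50 draws
```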
FIGURE 2: Shape coefficients $\alpha_1, \alpha_2, \alpha_3, \alpha_{12}, \alpha_{23}, \alpha_{123}$ for covariates $x_1, x_2, x_3$. Two definitions of $\alpha$ (cf. Box 3). Left: 1D covariates; idea from Trippa et al. [2011]. Right: 2D covariates (generalizes to $\mathbb{R}^k$).
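3. Novel DDP based on gamma random variables
Process on the beta breaks, $V_{\mathcal{X}}$. Basic idea:
\[ X \sim \mathrm{Ga}(a) \perp Y \sim \mathrm{Ga}(b) \;\Rightarrow\; X + Y \sim \mathrm{Ga}(a + b) \ \text{and} \ \frac{X}{X + Y} \sim \mathrm{Be}(a, b) \]
($\mathrm{Ga}(a, 1)$ denoted $\mathrm{Ga}(a)$).
Let $X_n = \{x_1, \ldots, x_n\}$.
• Intersecting neighborhoods, kernels or balls, $A_i$ centered at $x_i$, of $\lambda$-measure 1 ($\lambda$ the Lebesgue measure, e.g.).
• Partition of $A = \cup_{i=1}^n A_i$ into non-intersecting parts $A_I$, $I \subset \{1, \ldots, n\}$.
• Shape parameters $\alpha_I = \lambda(A_I)$ (cf. Fig. 2).
• $\Gamma_I \sim \mathrm{Ga}(\alpha_I)$ independent (and $\Gamma^M_I \sim \mathrm{Ga}(M\alpha_I)$ independent).
• $\Gamma_{x_i} = \sum_{I | i \in I} \Gamma_I$, dependent across $i$ (and $\Gamma^M_{x_i} = \sum_{I | i \in I} \Gamma^M_I$). Since $\sum_{I | i \in I} \alpha_I = 1$, we get $\Gamma_{x_i} \sim \mathrm{Ga}(1)$ and $\Gamma^M_{x_i} \sim \mathrm{Ga}(M)$.
Eventually, set
\[ V_{x_i} = \frac{\Gamma_{x_i}}{\Gamma_{x_i} + \Gamma^M_{x_i}}, \]
so that $V_{x_i} \sim \mathrm{Be}(1, M)$ marginally.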
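Example of Figure 2
$X = \{x_1, x_2, x_3\}$,
$\Gamma_1 \sim \mathrm{Ga}(\alpha_1), \ldots, \Gamma_{123} \sim \mathrm{Ga}(\alpha_{123})$,
\[ \Gamma_{x_1} = \Gamma_1 + \Gamma_{12} + \Gamma_{123}, \qquad \Gamma^M_{x_1} = \Gamma^M_1 + \Gamma^M_{12} + \Gamma^M_{123}, \qquad V_{x_1} = \frac{\Gamma_{x_1}}{\Gamma_{x_1} + \Gamma^M_{x_1}}. \]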
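The construction of the breaks can be sketched in code for the 1D case. This is an illustration under assumptions, not the author's implementation: the neighborhoods are taken to be unit intervals $A_i = [x_i - 1/2, x_i + 1/2]$, the parts $A_I$ are found by cutting the line at all interval endpoints, and the function names `shape_coefficients` and `sample_breaks` are hypothetical.

```python
import numpy as np

def shape_coefficients(x):
    """Lengths alpha_I of the parts A_I, for unit intervals A_i = [x_i - 1/2, x_i + 1/2]."""
    x = np.asarray(x, dtype=float)
    cuts = np.unique(np.concatenate((x - 0.5, x + 0.5)))
    alpha = {}
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        mid = 0.5 * (lo + hi)
        # Indices i whose interval A_i contains this elementary piece.
        I = frozenset(np.flatnonzero(np.abs(x - mid) < 0.5).tolist())
        if I:
            alpha[I] = alpha.get(I, 0.0) + (hi - lo)
    return alpha

def sample_breaks(x, M, rng):
    """One joint draw of (V_{x_1}, ..., V_{x_n}) from shared gamma variables."""
    alpha = shape_coefficients(x)
    gam = {I: rng.gamma(a) for I, a in alpha.items()}        # Gamma_I ~ Ga(alpha_I)
    gam_M = {I: rng.gamma(M * a) for I, a in alpha.items()}  # Gamma^M_I ~ Ga(M alpha_I)
    V = np.empty(len(x))
    for i in range(len(x)):
        g = sum(v for I, v in gam.items() if i in I)         # Gamma_{x_i}
        g_M = sum(v for I, v in gam_M.items() if i in I)     # Gamma^M_{x_i}
        V[i] = g / (g + g_M)
    return V

rng = np.random.default_rng(0)
print(sample_breaks([0.0, 0.4, 0.9], M=2.0, rng=rng))
```

With covariates 0, 0.4 and 0.9, the parts are indexed by {1}, {1,2}, {1,2,3}, {2,3} and {3} (0-based in the code), and the coefficients containing each index sum to 1, as required for $\Gamma_{x_i} \sim \mathrm{Ga}(1)$.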
Process on the clusters, $\theta_{\mathcal{X}}$
Single-$\theta$ model (fixed clusters across $x$): results are not satisfactory.
Multivariate Gaussian prior on $\theta_{\mathcal{X}}$, centered, with variance-covariance matrix elements $\Sigma_{xu} = \sigma^2 \alpha(x, u)$, where $\alpha(x, u)$ is defined from $\alpha$ and the diagonal elements equal $\sigma^2$. This prior is conjugate to the Gaussian mixing kernel of the DP mixture.
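The poster does not spell out how $\alpha(x, u)$ is obtained from $\alpha$. One natural reading, used here purely as an assumption, is the overlap $\lambda(A_x \cap A_u)$, which for 1D unit intervals equals $\max(0, 1 - |x - u|)$ and gives diagonal elements $\sigma^2$. The function name `overlap_covariance` is hypothetical.

```python
import numpy as np

def overlap_covariance(x, sigma2):
    """Sigma_xu = sigma2 * alpha(x, u), with alpha(x, u) ASSUMED to be the
    overlap length of unit intervals centered at x and u."""
    x = np.asarray(x, dtype=float)
    overlap = np.clip(1.0 - np.abs(x[:, None] - x[None, :]), 0.0, None)
    return sigma2 * overlap

x = np.array([0.0, 0.4, 0.9])
Sigma = overlap_covariance(x, sigma2=2.0)
rng = np.random.default_rng(0)
theta = rng.multivariate_normal(np.zeros(x.size), Sigma)  # one draw of a cluster across x
print(Sigma)
```

Under this reading, clusters at covariates more than one unit apart are uncorrelated, and the correlation grows linearly as the neighborhoods overlap more.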
Discussion
Difference with the SNΓP of Rao and Teh [2009]: a DDP with two different processes, $V_{\mathcal{X}}$ and $\theta_{\mathcal{X}}$, gives more flexibility than the SNΓP, which is built from a single gamma process.
Posterior computation
MCMC algorithm based on the blocked Gibbs sampler for truncated Dirichlet processes; it iteratively samples from the full conditionals of
1. Allocation variables,
2. Beta process,
3. Clusters (conjugate Gaussian),
4. Additional hyperparameters.
Metropolis-Hastings step for the Beta process sampling: the proposal is drawn from the prior, so the acceptance probability $\rho$ is defined by the likelihood ratio. Good acceptance rate (> 1/2).
Algorithm 1 Beta process full conditional
1: Given a current value $V_j$, sample a new one $V_j^*$ independently from the Beta process prior.
2: Acceptance probability is $\rho = \min(1, l(V_j^*)/l(V_j))$.
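Algorithm 1 is an independence Metropolis-Hastings step whose proposal is the prior, so the prior terms cancel and only the likelihood ratio remains. The sketch below shows that step on a toy target. The scalar break, the likelihood $l(V) = V^a (1 - V)^b$ with made-up counts `a` and `b`, and all function names are assumptions for illustration; the poster's likelihood involves the whole process $V_{\mathcal{X}}$.

```python
import numpy as np

def mh_prior_proposal_step(v, log_lik, sample_prior, rng):
    """One Metropolis-Hastings step with the proposal drawn from the prior."""
    v_new = sample_prior(rng)
    # rho = min(1, l(v_new) / l(v)), computed on the log scale for stability.
    log_rho = min(0.0, log_lik(v_new) - log_lik(v))
    if rng.random() < np.exp(log_rho):
        return v_new, True
    return v, False

# Toy setting (assumed): prior V ~ Beta(1, M), likelihood l(V) = V^a (1 - V)^b.
M, a, b = 2.0, 5, 3
log_lik = lambda v: a * np.log(v) + b * np.log1p(-v)
sample_prior = lambda rng: rng.beta(1.0, M)

rng = np.random.default_rng(0)
v, chain, n_accept = 0.5, [], 0
for _ in range(5000):
    v, accepted = mh_prior_proposal_step(v, log_lik, sample_prior, rng)
    chain.append(v)
    n_accept += accepted
print(np.mean(chain), n_accept / len(chain))
```

In this toy case the exact full conditional is $\mathrm{Be}(1 + a, M + b)$, which gives a reference for checking the chain. The acceptance rate observed here says nothing about the rate reported on the poster.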
Acknowledgements
The author is funded by ENSAE. The project was partly supported by Fondation Sciences Mathématiques de Paris.