Dependent processes in Bayesian nonparametrics
Matteo Ruggiero
University of Torino and Collegio Carlo Alberto
Moncalieri, Feb 19 2016
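1. Motivation and general setting
BNP and discrete random probability measures
$p = (p_1, p_2, \ldots)$: frequencies in
$$\Delta_\infty = \Bigl\{p \in [0,1]^\infty : \sum_i p_i = 1\Bigr\}$$
$p^\downarrow = (p_{(1)}, p_{(2)}, \ldots)$: ordered frequencies in
$$\nabla_\infty = \Bigl\{p \in [0,1]^\infty : p_1 \ge p_2 \ge \cdots \ge 0,\ \sum_i p_i = 1\Bigr\}$$
Assign a law to $p$, which induces a distribution on $\Delta_\infty$, $\nabla_\infty$
Otherwise assign to the indices unique labels $X_1, X_2, \ldots \overset{iid}{\sim} P_0$, with $P_0$ continuous on $\mathbb{X}$, and define the discrete measure
$$\sum_{i=1}^\infty p_i \delta_{X_i}$$
which induces a distribution on $\mathcal{P}(\mathbb{X})$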
Matteo Ruggiero (Unito & CCA) Dependent processes in BNP 3
BNP and discrete random probability measures
Approach 1: model observations $Y_j$ directly with
$$p = (p_1, p_2, \ldots) \quad \text{or} \quad P = \sum_{i=1}^\infty p_i \delta_{X_i}$$
where $Y_j = X_i$ w.p. $p_i$, and the $(X_i, p_i)$ are random
Approach 2: use mixtures to yield more flexibility and possibly aim at continuous distributions
$$f(y) = \int_{\mathbb{X}} f(y \mid x) P(dx) \quad \Rightarrow \quad f(y) = \sum_{i=1}^\infty p_i f(y \mid X_i)$$
i.e. $Y_j \sim f(y \mid X_i)$ w.p. $p_i$ and the $(X_i, p_i)$ are random
Use either approach as a base for estimation, uncertainty quantification, forecasting, clustering, ...
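As a rough illustration of the two approaches, the sketch below draws observations from a finite stand-in for $P$. The three weights, the standard normal $P_0$, the Gaussian kernel and the function names `sample_direct` and `sample_mixture` are assumptions made for the example; they do not come from the slides.

```python
import numpy as np

def sample_direct(weights, atoms, n, rng):
    """Approach 1: Y_j = X_i with probability p_i."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return atoms[idx]

def sample_mixture(weights, atoms, n, rng, scale=0.5):
    """Approach 2: Y_j ~ f(. | X_i) with probability p_i.

    Assumption: f(y | x) is a Normal(x, scale^2) kernel.
    """
    idx = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(loc=atoms[idx], scale=scale)

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])   # hypothetical finite set of frequencies p_i
atoms = rng.normal(size=3)            # labels X_i drawn iid from an assumed P_0 = N(0, 1)

y_direct = sample_direct(weights, atoms, 1000, rng)    # ties: only 3 distinct values
y_mixture = sample_mixture(weights, atoms, 1000, rng)  # continuous observations
```

Approach 1 necessarily produces ties among the observations, whereas the kernel in Approach 2 smooths them into a continuous distribution.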
Motivation for dependent processes
Assumptions in the classical BNP approach:
  - observations are exchangeable
  - observations depend on a fixed environment/state of the world
  - inference is static (fixed time)/carried out on a single environment
Data may not satisfy these assumptions (e.g. price dynamics)
Need for more general types of dependence
Partial exchangeability
Natural extension is partial exchangeability (in the de Finetti sense), e.g.
$$\begin{array}{cccc}
X_{1,1} & X_{1,2} & X_{1,3} & \cdots \\
X_{2,1} & X_{2,2} & X_{2,3} & \cdots \\
X_{3,1} & X_{3,2} & X_{3,3} & \cdots \\
\cdots & \cdots & \cdots & \cdots
\end{array}$$
row-wise exchangeability (not overall): given $i$, the $X_{i,j}$ are exchangeable
Accommodates e.g. temporal structures
Collection of random probability measures, indexed by some covariate
Can be extended to an uncountable family
Dependent densities: discrete time
[Two figure slides; images not recoverable from the extraction]
Dependent densities: continuous time
[Three figure slides; images not recoverable from the extraction]
Modelling and inference with time-dependent processes
Temporal dependence structure
Partial exchangeability: for any $t$ we have a distribution (possibly a mixture)
(Possibly multiple) data available at discrete time points
Model a collection of random probability measures, forming
  - a discrete-time process, or
  - a continuous-time process, with continuous paths or jumps
Nonparametric approach to allow for full flexibility
Analyse properties of the resulting model
Devise suitable strategies for posterior computation
Carry out inference on desired quantities
General setting
$X_1, X_2, \ldots \overset{iid}{\sim} P_0$: unique labels or locations in $\mathbb{X}$
We are interested in time-dependent random probability measures of type
$$p(t) = (p_1(t), p_2(t), \ldots) \in \Delta_\infty$$
$$p^\downarrow(t) = (p_{(1)}(t), p_{(2)}(t), \ldots) \in \nabla_\infty$$
$$P(t) = \sum_{i=1}^\infty p_i(t) \delta_{X_i(t)} \in \mathcal{P}(\mathbb{X})$$
where $t \ge 0$ represents time.
Discrete sample paths: $p$, $p^\downarrow$, $P$ are countable collections of distributions, $t \in \mathbb{N}$
Continuous sample paths: $p$, $p^\downarrow$, $P$ are (random) $t$-continuous functions from $[0, \infty)$ to $\Delta_\infty$, $\nabla_\infty$ or $\mathcal{P}(\mathbb{X})$
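2. Diffusive Dirichlet mixture models
Dirichlet process
The Dirichlet process [Ferguson 1973] extends the Dirichlet distribution from $K$ to infinitely many types
Can be defined via stick-breaking [Sethuraman 1994]
$$V_i \overset{iid}{\sim} \mathrm{Beta}(1, \theta), \qquad p_i = V_i \prod_{k=1}^{i-1} (1 - V_k)$$
[Diagram: the unit interval $[0, 1]$ is split into $p_1 = V_1$ and a remainder $1 - V_1$; a fraction $V_2$ of the remainder gives $p_2$ and leaves $(1 - V_1)(1 - V_2)$; then $V_3$, and so on]
s.t. $p_i \to 0$ as $i \to \infty$ and $\sum_{i \ge 1} p_i = 1$
Take $X_i \overset{iid}{\sim} P_0$ with $P_0$ continuous on $\mathbb{X}$.
Then $P = \sum_{i=1}^\infty p_i \delta_{X_i}$ is a Dirichlet process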
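The stick-breaking recipe translates almost line by line into code. The following is a minimal sketch of a truncated draw, assuming a truncation at 200 sticks and a Uniform(0, 1) base measure $P_0$; both choices, and the name `stick_breaking`, are illustrative only.

```python
import numpy as np

def stick_breaking(theta, n_atoms, rng):
    """First n_atoms stick-breaking weights p_i = V_i * prod_{k<i} (1 - V_k)."""
    v = rng.beta(1.0, theta, size=n_atoms)            # V_i iid Beta(1, theta)
    # remaining[i] = prod_{k<i} (1 - V_k), with an empty product equal to 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(1)
theta = 2.0
weights = stick_breaking(theta, n_atoms=200, rng=rng)
atoms = rng.uniform(size=200)     # X_i iid from an assumed P_0 = Uniform(0, 1)
leftover = 1.0 - weights.sum()    # mass lost to the truncation
```

The pair `(weights, atoms)` is a finite approximation of $P$; `leftover` gives a quick check that the truncation level is large enough for the chosen $\theta$.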
Dirichlet process
[Figure: weights $p$ (0.00 to 0.10) plotted against locations $x$ in $[0, 1]$]
Dependent Dirichlet process
Basic idea [MacEachern, 1999]
We aim at defining a process
$$P(t) = \sum_{i=1}^\infty p_i(t) \delta_{X_i(t)}, \qquad t \ge 0,$$
with Dirichlet process marginals
Handling both $(p_1(t), p_2(t), \ldots)$ and $(X_1(t), X_2(t), \ldots)$ can be nontrivial. Consider instead
$$P(t) = \sum_{i=1}^\infty p_i(t) \delta_{X_i}, \qquad t \ge 0, \quad X_i \overset{iid}{\sim} P_0$$
Atoms are fixed, but there are infinitely many of them
In practice, as many as you need
Diffusive Dirichlet process
Take the Dirichlet stick-breaking weights
$$p_i = V_i \prod_{k=1}^{i-1} (1 - V_k), \qquad V_i \overset{iid}{\sim} \mathrm{Beta}(1, \theta)$$
Substitute each component $V_i \in [0, 1]$ with a diffusion $\{V_i(t)\}_{t \ge 0}$ on $[0, 1]$
Then take
$$p_i(t) = V_i(t) \prod_{k=1}^{i-1} (1 - V_k(t))$$
Each component needs to have Beta marginals, $V_i(t) \sim \mathrm{Beta}(1, \theta)$
One-dimensional Wright–Fisher diffusions satisfy this
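As a reminder of why this works, one common parametrisation of the one-dimensional Wright–Fisher diffusion with mutation parameters $(a, b)$ is the SDE below. The parametrisation and the symbols $B(t)$, $\mu$, $\sigma$ and $\pi$ are assumptions of this note rather than notation taken from the slides. The standard stationary-density formula for one-dimensional diffusions then gives a Beta$(a, b)$ law, which is Beta$(1, \theta)$ for $(a, b) = (1, \theta)$.

```latex
\begin{align*}
dV(t) &= \underbrace{\tfrac12\bigl[a(1 - V(t)) - b\,V(t)\bigr]}_{\mu(V(t))}\,dt
        + \underbrace{\sqrt{V(t)(1 - V(t))}}_{\sigma(V(t))}\,dB(t), \\
\pi(v) &\propto \frac{1}{\sigma^2(v)} \exp\Bigl(\int^{v} \frac{2\mu(u)}{\sigma^2(u)}\,du\Bigr)
        = \frac{1}{v(1 - v)} \exp\Bigl(\int^{v} \Bigl[\frac{a}{u} - \frac{b}{1 - u}\Bigr] du\Bigr) \\
       &= \frac{v^{a}(1 - v)^{b}}{v(1 - v)} = v^{a - 1}(1 - v)^{b - 1}.
\end{align*}
```

The last expression is the unnormalised Beta$(a, b)$ density, so a diffusion started from Beta$(1, \theta)$ with $(a, b) = (1, \theta)$ keeps that marginal at every $t$.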
Wright–Fisher diffusions
[Figure: plot with horizontal axis 0 to 10; image not recoverable]
[Figures: three slides, each with two panels. First panel: % of type 1 individuals, state space $[0, 1]$ against time 0 to 10 (50K steps). Second panel: ergodic frequencies against the stationary distribution, over the state space $[0, 1]$. Parameters per slide:
  - mutation rates theta_1 = 2, theta_2 = 8; stationary distribution Beta(2, 8)
  - mutation rates theta_1 = 8, theta_2 = 8; stationary distribution Beta(8, 8)
  - mutation rates theta_1 = 0.4, theta_2 = 0.4; stationary distribution Beta(0.4, 0.4)]
Diffusive Dirichlet process [Mena and R. 2016]
The resulting object
$$P(t) = \sum_{i=1}^\infty \Bigl( \underbrace{V_i(t) \prod_{k=1}^{i-1} (1 - V_k(t))}_{p_i(t)} \Bigr) \delta_{X_i}, \qquad V_i(t) \sim \mathrm{WF}(a, b)$$
  - has Dirichlet marginals for $(a, b) = (1, \theta)$, i.e. $P(t)$ is a DP for all $t$
  - has GEM marginals for $(a, b) \in \mathbb{R}^2_+$
  - has diffusive behaviour: $P(t)$ is $t$-continuous in total variation
See also
  - Gutierrez, Mena & R. 2016 (version with jumps)
  - Mena, R. & Walker 2011 (geometric weights, different marginals)
for related models
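To make the construction concrete, the sketch below simulates a truncated version of the time-dependent weights. It is only an approximation under stated assumptions: the sticks are taken to be independent, each WF$(a, b)$ path follows $dV = \tfrac12[a(1 - V) - bV]\,dt + \sqrt{V(1 - V)}\,dB$ and is discretised with a naive Euler scheme clipped to $[0, 1]$, and the truncation at 50 sticks, the Uniform(0, 1) base measure and the function names are hypothetical choices.

```python
import numpy as np

def simulate_wf_sticks(a, b, n_sticks, t_max, n_steps, rng):
    """Euler approximation of n_sticks independent WF(a, b) paths.

    Returns an array of shape (n_steps + 1, n_sticks).
    """
    dt = t_max / n_steps
    v = np.empty((n_steps + 1, n_sticks))
    v[0] = rng.beta(a, b, size=n_sticks)          # start from the Beta(a, b) law
    for k in range(n_steps):
        cur = v[k]
        drift = 0.5 * (a * (1.0 - cur) - b * cur)
        noise = np.sqrt(cur * (1.0 - cur) * dt) * rng.standard_normal(n_sticks)
        v[k + 1] = np.clip(cur + drift * dt + noise, 0.0, 1.0)  # keep paths in [0, 1]
    return v

def weights_from_sticks(v):
    """p_i(t) = V_i(t) * prod_{k<i} (1 - V_k(t)), computed separately for each time row."""
    remaining = np.cumprod(1.0 - v, axis=1)
    remaining = np.hstack([np.ones((v.shape[0], 1)), remaining[:, :-1]])
    return v * remaining

rng = np.random.default_rng(3)
theta = 1.0
sticks = simulate_wf_sticks(a=1.0, b=theta, n_sticks=50, t_max=1.0, n_steps=2_000, rng=rng)
p_t = weights_from_sticks(sticks)   # p_t[k, i] approximates p_{i+1}(t_k)
atoms = rng.uniform(size=50)        # fixed atoms X_i, iid from an assumed P_0
```

Each row of `p_t` paired with `atoms` is a truncated snapshot of $P(t)$; the atoms stay fixed while the weights move continuously in time.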
Diffusive Dirichlet process
[Figure: weights (0 to 0.088) over locations in $[0, 1]$, panel labelled "time 1"]
Estimation
At each time $t_i$ we have observations $(y_{i,1}, \ldots, y_{i,n_i})$.
Set up the hierarchical mixture
$$\{P_t, t \ge 0\} \sim \text{diff-DP or GSB (geometric stick-breaking)}$$
$$x_{t_i} \mid P_{t_i} \sim P_{t_i}$$
$$y_{i,j} \mid t_i, x_{t_i} \overset{iid}{\sim} f(\cdot \mid x_{t_i})$$
equivalently, each $y_{i,j}$ is drawn from the time-dependent nonparametric mixture model
$$f_{t_i}(y) = \int_{\mathbb{X}} f(y \mid x) P_{t_i}(dx) = \sum_{k=1}^\infty p_k(t_i) f(y \mid X_k)$$
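For a single time point, the mixture density is just a weighted sum of kernels. The sketch below evaluates it on a grid, assuming a Gaussian kernel with a fixed scale and using randomly generated stand-ins for the weights and atoms at that time; none of these choices comes from the slides, and the function names are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, scale):
    """Density of a Normal(mean, scale^2) law, used here as the kernel f(y | x)."""
    return np.exp(-0.5 * ((y - mean) / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))

def mixture_density(y_grid, weights, atoms, scale=0.3):
    """Weighted sum of kernels centred at the atoms, evaluated on a grid of y values."""
    w = weights / weights.sum()                  # renormalise truncated weights
    kernels = normal_pdf(y_grid[:, None], atoms[None, :], scale)
    return kernels @ w

rng = np.random.default_rng(4)
weights = rng.dirichlet(np.ones(20))             # stand-in for the weights at one time t_i
atoms = rng.normal(scale=2.0, size=20)           # stand-in for the atoms
y_grid = np.linspace(-15.0, 15.0, 3001)
density = mixture_density(y_grid, weights, atoms)
mass = np.sum(density) * (y_grid[1] - y_grid[0]) # Riemann-sum check of total mass
```

Repeating the evaluation with the weights at each $t_i$ gives a sequence of densities of the kind displayed as heat maps in the figures.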
Simulated data
True model
[Figure: image not recoverable]
Single data points
[Figure: horizontal axis 0 to 10, vertical axis −2 to 8] True model (heat map), posterior mode (solid), 95% credible intervals for the mean (dashed), 95% quantiles of posterior density estimate (dotted).
Multiple data points
[Figure: horizontal axis 0 to 10, vertical axis −2 to 8] True model (heat map), posterior mode (solid), 95% credible intervals for the mean (dashed), 95% quantiles of posterior density estimate (dotted).
Real data: S&P 500 (03/08 - 02/09)
Dependent density estimate
Heat map of estimated density (red), and mean estimate (solid)
[Figures: two further plots of the dependent density estimate, horizontal axis 160 to 200, vertical axis 800 to 1100]
3. Dynamic models for evolving populations
A different view: modelling evolving populations
A sample path of $p^\downarrow(t) = (p_{(1)}, \ldots, p_{(7)})$
[Figure: "Dynamic frequencies of 7 species", frequency (0.1 to 0.7) against time]
A different view: modelling evolving populations
Distinct values $X_1, X_2, \ldots$ are interpreted as
  - allelic types in genetics
  - plant or animal species
  - unique identifiers of some evolving groups
Large population → species abundances approximate diffusive behaviours
If one cannot provide an a priori upper bound, assume infinitely many species
Two different approaches:
  - constructing stochastic models for pseudo-realistic evolutionary mechanisms (mutation, selection, recombination, migration, ...)
  - studying the association between certain distributions and connected dynamics
Dynamics in the figure are related to a Dirichlet distribution
Can we extend them? To what extent? With what interpretation?
[Figure: "Dynamic frequencies of 7 species", frequency (0.1 to 0.7) against time]
Wright–Fisher signals: Dirichlet-Multinomial model
[Figure slide; image not recoverable from the extraction]
Poisson-Dirichlet case
No. species | Markov chain (N individuals) | Diffusion (∞ individuals) | Random measure (t fixed)
K | Wright-Fisher(N, K, θ), Fisher (1930), Wright (1931) | Wright-Fisher(K, θ), Sato (1976) | Dir(θ/K, ..., θ/K)
∞ | Moran(N, θ), Watterson (1976) | IMNA(θ), Ethier and Kurtz (1981) | PD(θ), Kingman (1975)
Relations between the entries:
  - Wright-Fisher(N, K, θ) →d Wright-Fisher(K, θ) as N → ∞
  - Moran(N, θ) →d IMNA(θ) as N → ∞
  - Wright-Fisher(K, θ) →d IMNA(θ) as K → ∞
  - Dir(θ/K, ..., θ/K) →d PD(θ) as K → ∞
  - Wright-Fisher(K, θ) is stationary w.r.t. Dir(θ/K, ..., θ/K)
  - IMNA(θ) is stationary w.r.t. PD(θ)
"→d" = convergence in distribution
IMNA = infinitely many neutral alleles
Two-parameter Poisson-Dirichlet case
No. species | Markov chain (N individuals) | Diffusion (∞ individuals) | Random measure (t fixed)
K | ?? WF(N, K, θ, α), Costantini, De Blasi, Ethier, R., Spano (2016) | ?? WF(K, θ, α), Costantini, De Blasi, Ethier, R., Spano (2016) | ??
∞ | ?? Moran(N, θ, α), R. and Walker (2009) | IMNA(θ, α), Petrov (2009) | PD(θ, α), Pitman (1995)
Relations between the entries:
  - WF(N, K, θ, α) →d WF(K, θ, α) as N → ∞
  - Moran(N, θ, α) →d IMNA(θ, α) as N → ∞
  - WF(K, θ, α) →d IMNA(θ, α) as K → ∞
  - ?? →d PD(θ, α) as K → ∞
  - WF(K, θ, α) is stationary w.r.t. ??
  - IMNA(θ, α) is stationary w.r.t. PD(θ, α)
Remarks:
  - IMNA = infinitely many neutral alleles
  - Based on Pitman's generalized Polya urn scheme
  - Mutation and immigration
4. Computing time dependent posteriors
Continuous-time Gamma-Poisson model
[Figure: CIR path X_t with data points from a Poisson(X_t) likelihood; horizontal axis 0 to 50, vertical axis 0 to 35]
[Two further figure slides with the same title; images not recoverable from the extraction]
The propagation mixture
Prior: $X \sim \pi_\alpha := \mathrm{Gamma}(\alpha_1, \alpha_2)$
Likelihood: $Y \mid X \sim \mathrm{Poisson}(X)$
Posterior: $X \mid Y_1, \ldots, Y_n \sim \pi_{\alpha,n} := \mathrm{Gamma}\bigl(\alpha_1 + \sum_{i=1}^n y_i,\ \alpha_2 + n\bigr)$
Propagation mixture [Papaspiliopoulos & R. 2014]
$$\psi_t(\pi_{\alpha,n}) := \int \pi_{\alpha,n}(x) P_t(x, dx')$$
is given by
$$\psi_t(\pi_{\alpha,n}) = \sum_{j=0}^n p_t(n, j)\, \mathrm{Gamma}\Bigl(\alpha_1 + \sum_{i=1}^n y_i - j,\ \alpha_2 + n - s_t\Bigr)$$
for appropriate time-varying weights $p_t(n, j)$
Can be extended to infinite dimensional models [Papaspiliopoulos, R. & Spano 2016]
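The static update step of this scheme is the usual Gamma-Poisson conjugacy and can be sketched as follows. The sketch assumes the shape-rate parametrisation of the Gamma law, which is what the posterior formula implies; the function name `gamma_poisson_update` and the simulated data are hypothetical. The propagation step is not implemented, since the weights $p_t(n, j)$ and the quantity $s_t$ are not specified on the slide.

```python
import numpy as np

def gamma_poisson_update(alpha1, alpha2, observations):
    """Gamma(alpha1, alpha2) prior (shape, rate) with Poisson counts.

    Returns the posterior parameters (alpha1 + sum(y), alpha2 + n).
    """
    y = np.asarray(observations)
    return alpha1 + y.sum(), alpha2 + y.size

rng = np.random.default_rng(5)
x_true = 4.0                                   # hypothetical value of the hidden signal
y = rng.poisson(x_true, size=50)               # n = 50 Poisson(X) observations
post_shape, post_rate = gamma_poisson_update(alpha1=1.0, alpha2=1.0, observations=y)
post_mean = post_shape / post_rate             # mean of a Gamma(shape, rate) law
```

With many observations the posterior mean approaches the sample mean of the counts, which is the behaviour the update formula predicts.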
Continuous-time Gamma-Poisson model
[Figures: two plots, horizontal axis 0 to 7, vertical axis 0.1 to 0.5, with an annotation relating $t$ and $t_0$ whose symbol was lost in extraction]
Some references
Costantini, De Blasi, Ethier, R. & Spano (2016). Wright–Fisher construction of the two-parameter Poisson–Dirichlet diffusion. arXiv:1601.06064
Gutierrez, Mena & R. (2016). A time dependent Bayesian nonparametric model for air quality analysis. Comput. Statist. Data Anal.
Mena & R. (2016). Dynamic density estimation with diffusive Dirichlet mixtures. Bernoulli
Mena, R. & Walker (2011). Geometric stick-breaking processes for continuous-time Bayesian nonparametric modeling. J. Statist. Plann. Inf.
Papaspiliopoulos & R. (2014). Optimal filtering and the dual process. Bernoulli
Papaspiliopoulos, R. & Spano (2014). Filtering hidden Markov measures. arXiv:1411.4944
R. & Walker (2009). Countable representation for infinite dimensional diffusions derived from the two-parameter Poisson–Dirichlet process. Electr. Comm. Probab.
For more info: www.matteoruggiero.it