Dependent processes in Bayesian Nonparametrics


Matteo Ruggiero

University of Torino and Collegio Carlo Alberto

Moncalieri, Feb 19 2016


1. Motivation and general setting

BNP and discrete random probability measures

$p = (p_1, p_2, \ldots)$: frequencies in

$\Delta_\infty = \{ p \in [0,1]^\infty : \sum_i p_i = 1 \}$

$p^\downarrow = (p_{(1)}, p_{(2)}, \ldots)$: ordered frequencies in

$\nabla_\infty = \{ p \in [0,1]^\infty : p_1 \ge p_2 \ge \cdots \ge 0, \ \sum_i p_i = 1 \}$

Assign a law to $p$, which induces a distribution on $\Delta_\infty$, $\nabla_\infty$

Otherwise, assign to the indices unique labels $X_1, X_2, \ldots \overset{iid}{\sim} P_0$, with $P_0$ continuous on $\mathbb{X}$, and define the discrete measure

$\sum_{i=1}^{\infty} p_i \delta_{X_i}$

which induces a distribution on $\mathcal{P}(\mathbb{X})$



Approach 1: model observations $Y_j$ directly with

$p = (p_1, p_2, \ldots)$ or $P = \sum_{i=1}^{\infty} p_i \delta_{X_i}$

where $Y_j = X_i$ w.p. $p_i$, and the $(X_i, p_i)$ are random

Approach 2: use mixtures to yield more flexibility and possibly aim at continuous distributions

$f(y) = \int_{\mathbb{X}} f(y \mid x) \, P(dx) \ \Rightarrow \ f(y) = \sum_{i=1}^{\infty} p_i f(y \mid X_i)$

i.e. $Y_j \sim f(y \mid X_i)$ w.p. $p_i$ and the $(X_i, p_i)$ are random

Use either approach as a base for estimation, uncertainty quantification, forecasting, clustering, ...
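To make the difference between the two approaches concrete, here is a minimal Python sketch, not taken from the slides. It assumes a finite set of atoms and weights (a truncation of the infinite sum) and a Gaussian kernel for f(y | x); the function names, the numerical values and the kernel choice are all illustrative assumptions.

```python
import random

def sample_direct(atoms, weights, n, rng):
    """Approach 1: Y_j = X_i with probability p_i (ties among the Y_j occur)."""
    return rng.choices(atoms, weights=weights, k=n)

def sample_mixture(atoms, weights, n, kernel_sd, rng):
    """Approach 2: pick X_i with probability p_i, then draw Y_j ~ N(X_i, kernel_sd^2).

    The Gaussian kernel stands in for f(y | x) and is an assumption of this sketch.
    """
    centres = rng.choices(atoms, weights=weights, k=n)
    return [rng.gauss(c, kernel_sd) for c in centres]

rng = random.Random(1)
atoms = [-2.0, 0.0, 3.0]    # hypothetical locations X_i
weights = [0.5, 0.3, 0.2]   # hypothetical frequencies p_i
direct = sample_direct(atoms, weights, 1000, rng)
smooth = sample_mixture(atoms, weights, 1000, 0.5, rng)
```

Under Approach 1 the sample only ever takes values among the atoms, which is what makes it natural for clustering; under Approach 2 the observations come from a continuous density.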


Motivation for dependent processes

Assumptions in the classical BNP approach:

  observations are exchangeable
  observations depend on a fixed environment/state of the world
  inference is static (fixed time)/carried out on a single environment

Data may not satisfy these assumptions (e.g. price dynamics)

Need for more general types of dependence


Partial exchangeability

Natural extension is partial exchangeability (de Finetti sense), e.g.

$\begin{array}{cccc} X_{1,1} & X_{1,2} & X_{1,3} & \cdots \\ X_{2,1} & X_{2,2} & X_{2,3} & \cdots \\ X_{3,1} & X_{3,2} & X_{3,3} & \cdots \\ \cdots & \cdots & \cdots & \cdots \end{array}$

row-wise exchangeability (not overall): given $i$, the $X_{i,j}$ are exchangeable

Accommodates e.g. temporal structures

Collection of random probability measures, indexed by some covariate

Can be extended to an uncountable family


Dependent densities: discrete time

[Figures: dependent densities in discrete time]

Dependent densities: continuous time

[Figure: dependent densities in continuous time]






Modelling and inference with time-dependent processes

Temporal dependence structure

Partial exchangeability: for any $t$ we have a distribution (possibly a mixture)

(Possibly multiple) data available at discrete time points

Model a collection of random probability measures, forming

  a discrete-time process, or
  a continuous-time process, with continuous paths or jumps

Nonparametric approach to allow for full flexibility

Analyse properties of the resulting model

Devise suitable strategies for posterior computation

Carry out inference on desired quantities


General setting

$X_1, X_2, \ldots \overset{iid}{\sim} P_0$: unique labels or locations in $\mathbb{X}$

We are interested in time-dependent random probability measures of type

$p(t) = (p_1(t), p_2(t), \ldots) \in \Delta_\infty$

$p^\downarrow(t) = (p_{(1)}(t), p_{(2)}(t), \ldots) \in \nabla_\infty$

$P(t) = \sum_{i=1}^{\infty} p_i(t) \delta_{X_i(t)} \in \mathcal{P}(\mathbb{X})$

where $t \ge 0$ represents time.

Discrete sample paths: $p$, $p^\downarrow$, $P$ are countable collections of distributions, $t \in \mathbb{N}$

Continuous sample paths: $p$, $p^\downarrow$, $P$ are (random) $t$-continuous functions from $[0, \infty)$ to $\Delta_\infty$, $\nabla_\infty$ or $\mathcal{P}(\mathbb{X})$


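2. Diffusive Dirichlet mixture models

Dirichlet process

The Dirichlet process [Ferguson 1973] extends the Dirichlet distribution from $K$ to infinitely many types

Can be defined via stick-breaking [Sethuraman 1994]

$V_i \overset{iid}{\sim} \mathrm{Beta}(1, \theta), \quad p_i = V_i \prod_{k=1}^{i-1} (1 - V_k)$

[Figure: the unit interval $[0, 1]$ is split into $p_1 = V_1$ and a remainder $1 - V_1$; a fraction $V_2$ of the remainder gives $p_2$, leaving $(1 - V_1)(1 - V_2)$; the procedure continues with $V_3, \ldots$]

s.t. $p_i \to 0$ as $i \to \infty$ and $\sum_{i \ge 1} p_i = 1$

Take $X_i \overset{iid}{\sim} P_0$ with $P_0$ continuous on $\mathbb{X}$.

Then $P = \sum_{i=1}^{\infty} p_i \delta_{X_i}$ is a Dirichlet process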
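The stick-breaking construction can be sketched in a few lines of Python. This is an illustrative truncation to finitely many atoms, not the slides' own code: the truncation level `n_atoms`, the standard normal base measure used for $P_0$ and all function names are assumptions made for the example.

```python
import random

def stick_breaking_weights(theta, n_atoms, rng):
    """First n_atoms weights p_i = V_i * prod_{k<i} (1 - V_k), with V_i ~ Beta(1, theta).

    Also returns the stick length left unassigned by the truncation.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, theta)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining

def truncated_dp_draw(theta, n_atoms, base_sampler, rng):
    """Truncated draw of P = sum_i p_i * delta_{X_i}, with X_i iid from the base measure."""
    weights, leftover = stick_breaking_weights(theta, n_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(n_atoms)]
    return atoms, weights, leftover

rng = random.Random(0)
atoms, weights, leftover = truncated_dp_draw(
    theta=2.0,
    n_atoms=50,
    base_sampler=lambda r: r.gauss(0.0, 1.0),  # hypothetical P_0 = N(0, 1)
    rng=rng,
)
```

The returned `leftover` is the mass of all atoms beyond the truncation; it shrinks geometrically in expectation, which is one way to choose how many atoms to keep.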


Dirichlet process

[Figure: Dirichlet process; horizontal axis x (0 to 1), vertical axis p (0 to 0.10)]


Dependent Dirichlet process

Basic idea [MacEachern, 1999]

We aim at defining a process

$P(t) = \sum_{i=1}^{\infty} p_i(t) \delta_{X_i(t)}, \quad t \ge 0,$

with Dirichlet process marginals

Handling both $(p_1(t), p_2(t), \ldots)$ and $(X_1(t), X_2(t), \ldots)$ can be nontrivial. Consider instead

$P(t) = \sum_{i=1}^{\infty} p_i(t) \delta_{X_i}, \quad t \ge 0, \quad X_i \overset{iid}{\sim} P_0$

Atoms are fixed, but there are infinitely many of them

In practice, as many as you need


Diffusive Dirichlet process

Take the Dirichlet stick-breaking weights

$p_i = V_i \prod_{k=1}^{i-1} (1 - V_k), \quad V_i \overset{iid}{\sim} \mathrm{Beta}(1, \theta)$

Substitute each component $V_i \in [0, 1]$ with a diffusion $\{V_i(t)\}_{t \ge 0}$ on $[0, 1]$

Then take

$p_i(t) = V_i(t) \prod_{k=1}^{i-1} (1 - V_k(t))$

Each component needs to have Beta marginals, $V_i(t) \sim \mathrm{Beta}(1, \theta)$

One-dimensional Wright–Fisher diffusions satisfy this
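A rough simulation sketch of this construction follows. The slides do not state the SDE, so the code assumes the usual parametrisation $dV = \tfrac{1}{2}\,(a(1 - V) - bV)\,dt + \sqrt{V(1 - V)}\,dW$, whose stationary law is Beta$(a, b)$, and discretises it with a plain Euler-Maruyama step clipped to $[0, 1]$. The step size, the truncation level and the function names are illustrative, and the clipped Euler scheme is only a crude approximation of the diffusion, not an exact sampler.

```python
import math
import random

def wf_step(v, a, b, dt, rng):
    """One Euler-Maruyama step of dV = 0.5*(a*(1-V) - b*V) dt + sqrt(V*(1-V)) dW."""
    drift = 0.5 * (a * (1.0 - v) - b * v)
    noise = math.sqrt(v * (1.0 - v) * dt) * rng.gauss(0.0, 1.0)
    return min(1.0, max(0.0, v + drift * dt + noise))  # crude clipping to [0, 1]

def stick_weights(vs):
    """p_i = V_i * prod_{k<i} (1 - V_k) for the current values of the V_i."""
    weights, remaining = [], 1.0
    for v in vs:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

def diffusive_weight_path(theta, n_atoms, n_steps, dt, rng):
    """Weights (p_1(t), ..., p_n(t)) on a time grid, started from the Beta(1, theta) law."""
    vs = [rng.betavariate(1.0, theta) for _ in range(n_atoms)]
    path = [stick_weights(vs)]
    for _ in range(n_steps):
        vs = [wf_step(v, 1.0, theta, dt, rng) for v in vs]  # (a, b) = (1, theta)
        path.append(stick_weights(vs))
    return path

rng = random.Random(2)
path = diffusive_weight_path(theta=2.0, n_atoms=20, n_steps=200, dt=0.001, rng=rng)
```

Each entry of `path` is a truncated weight vector at one grid time; pairing it with a fixed list of atoms drawn once from $P_0$ gives an approximate path of $P(t)$.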


Wright–Fisher diffusions

[Figure: three pairs of panels. Left panels: % of type 1 individuals against time (50K steps, time axis 0 to 10, state space 0 to 1), for mutation rates (theta_1, theta_2) = (2, 8), (8, 8) and (0.4, 0.4). Right panels: ergodic frequencies against the stationary distributions Beta(2, 8), Beta(8, 8) and Beta(0.4, 0.4) on the state space 0 to 1.]

Diffusive Dirichlet process [Mena and R. 2016]

The resulting object

$P(t) = \sum_{i=1}^{\infty} \underbrace{\Big( V_i(t) \prod_{k=1}^{i-1} (1 - V_k(t)) \Big)}_{p_i(t)} \delta_{X_i}, \quad V_i(t) \sim \mathrm{WF}(a, b)$

  has Dirichlet marginals for $(a, b) = (1, \theta)$, i.e. $P(t)$ is a DP for all $t$
  has GEM marginals for $(a, b) \in \mathbb{R}_+^2$
  has diffusive behaviour: $P(t)$ is $t$-continuous in total variation

See also

  Gutierrez, Mena & R. 2016 (version with jumps)
  Mena, R. & Walker 2011 (geometric weights, different marginals)

for related models


Diffusive Dirichlet process

[Figure: diffusive Dirichlet process; one axis from 0 to 1, second axis: time]


Estimation

At each time $t_i$ we have observations $(y_{i,1}, \ldots, y_{i,n_i})$.

Set up the hierarchical mixture

$\{P_t, t \ge 0\} \sim$ diff-DP or GSB (geometric stick-breaking)

$x_{t_i} \mid P_{t_i} \sim P_{t_i}$

$y_{i,j} \mid t_i, x_{t_i} \overset{iid}{\sim} f(\cdot \mid x_{t_i})$

equivalently, $y_i$ is drawn from the time-dependent nonparametric mixture model

$f_{t_i}(y) = \int_{\mathbb{X}} f(y \mid x) \, P_{t_i}(dx) = \sum_{k=1}^{\infty} p_k(t_i) f(y \mid x_k)$
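As an illustration of the mixture density above, the following sketch evaluates a truncated version of $f_t(y)$ at two time points. It assumes a Gaussian kernel for $f(y \mid x)$ with a fixed standard deviation and uses made-up atoms and weights; none of these choices come from the slides.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y; plays the role of the kernel f(y | x)."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(y, weights_t, atoms, kernel_sd):
    """Truncated f_t(y) = sum_k p_k(t) * N(y; x_k, kernel_sd^2)."""
    return sum(p * normal_pdf(y, x, kernel_sd) for p, x in zip(weights_t, atoms))

atoms = [-1.0, 0.5, 2.0]              # hypothetical fixed locations x_k
weights_by_time = {                   # hypothetical weights p_k(t) at two times
    0.0: [0.6, 0.3, 0.1],
    1.0: [0.2, 0.3, 0.5],
}
density_at_zero = {
    t: mixture_density(0.0, w, atoms, 1.0) for t, w in weights_by_time.items()
}
```

Only the weights change between the two time points; the atoms stay fixed, as in the diffusive Dirichlet process, so the density moves by shifting mass between the same kernel components.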


Simulated data

[Figure: true model, and estimates from single data points and from multiple data points per time; time axis 0 to 10]

True model (heat map), posterior mode (solid), 95% credible intervals for the mean (dashed), 95% quantiles of posterior density estimate (dotted).


Real data: S&P 500 (03/08 - 02/09)

Dependent density estimate

[Figure: heat map of estimated density (red), and mean estimate (solid); horizontal axis 160 to 200, vertical axis 800 to 1100]


3. Dynamic models for evolving populations

A different view: modelling evolving populations

[Figure: a sample path of $p^\downarrow(t) = (p_{(1)}, \ldots, p_{(7)})$, titled "Dynamic frequencies of 7 species"; horizontal axis: time, vertical axis: frequency (0.1 to 0.7)]



Distinct values $X_1, X_2, \ldots$ are interpreted as

  allelic types in genetics
  plant or animal species
  unique identifiers of some evolving groups

Large population → species abundances approximate diffusive behaviours

If one cannot provide an a priori upper bound, assume infinitely many species

Two different approaches:

  constructing stochastic models for pseudo-realistic evolutionary mechanisms (mutation, selection, recombination, migration, ...)
  studying the association between certain distributions and connected dynamics

Dynamics in the figure are related to a Dirichlet distribution

Can we extend them? To what extent? With what interpretation?



Wright–Fisher signals: Dirichlet-Multinomial model



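Poisson-Dirichlet case

No. species | Markov chain ($N$ individuals) | Diffusion ($\infty$ individuals) | Random measure ($t$ fixed)
$K$ | Wright-Fisher$(N, K, \theta)$, Fisher (1930), Wright (1931) | Wright-Fisher$(K, \theta)$, Sato (1976) | $\mathrm{Dir}(\theta/K, \ldots, \theta/K)$
$\infty$ | Moran$(N, \theta)$, Watterson (1976) | IMNA$(\theta)$, Ethier and Kurtz (1981) | PD$(\theta)$, Kingman (1975)

Within each row, the Markov chain converges in distribution to the diffusion as $N \to \infty$

Between rows, the $K$-species objects converge in distribution to the $\infty$-species objects as $K \to \infty$

Each diffusion is stationary w.r.t. the random measure in its row

"$\overset{d}{\to}$" = convergence in distribution

IMNA = infinitely many neutral alleles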



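Two-parameter Poisson-Dirichlet case

No. species | Markov chain ($N$ individuals) | Diffusion ($\infty$ individuals) | Random measure ($t$ fixed)
$K$ | WF$(N, K, \theta, \alpha)$, Costantini, De Blasi, Ethier, R., Spano (2016) | WF$(K, \theta, \alpha)$, Costantini, De Blasi, Ethier, R., Spano (2016) | ??
$\infty$ | Moran$(N, \theta, \alpha)$, R. and Walker (2009) | IMNA$(\theta, \alpha)$, Petrov (2009) | PD$(\theta, \alpha)$, Pitman (1995)

Within each row, the Markov chain converges in distribution to the diffusion as $N \to \infty$

Between rows, the $K$-species objects converge in distribution to the $\infty$-species objects as $K \to \infty$

IMNA$(\theta, \alpha)$ is stationary w.r.t. PD$(\theta, \alpha)$; WF$(K, \theta, \alpha)$ is stationary w.r.t. ??

Remarks:

  IMNA = infinitely many neutral alleles
  Based on Pitman’s generalized Polya urn scheme
  Mutation and immigration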


4. Computing time-dependent posteriors

Continuous-time Gamma-Poisson model

[Figure: CIR path X_t with observations from the Poisson(X_t) likelihood; horizontal axis 0 to 50, vertical axis 0 to 35]









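The propagation mixture

Prior: $X \sim \pi_\alpha := \mathrm{Gamma}(\alpha_1, \alpha_2)$

Likelihood: $Y \mid X \sim \mathrm{Poisson}(X)$

Posterior: $X \mid Y_1, \ldots, Y_n \sim \pi_{\alpha,n} := \mathrm{Gamma}\big(\alpha_1 + \sum_{i=1}^{n} y_i, \ \alpha_2 + n\big)$

Propagation mixture [Papaspiliopoulos & R. 2014]

$\psi_t(\pi_{\alpha,n}) := \int \pi_{\alpha,n}(x) \, P_t(x, dx')$

is given by

$\psi_t(\pi_{\alpha,n}) = \sum_{j=0}^{n} p_t(n, j) \, \mathrm{Gamma}\big(\alpha_1 + \sum_{i=1}^{n} y_i - j, \ \alpha_2 + n - s_t\big)$

for appropriate time-varying weights $p_t(n, j)$

Can be extended to infinite dimensional models [Papaspiliopoulos, R. & Spano 2016]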
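The conjugate update step is simple enough to sketch in Python. The code below assumes the shape/rate parametrisation of the Gamma law (suggested by the $\alpha_2 + n$ update) and represents a finite Gamma mixture as a list of (weight, shape, rate) triples. The propagation weights $p_t(n, j)$ are not specified on the slides, so the mixture in the example uses made-up weights purely to show the data structure; all function names are illustrative.

```python
def gamma_poisson_update(alpha1, alpha2, observations):
    """Gamma(alpha1, alpha2) prior + Poisson(X) data -> Gamma(alpha1 + sum y_i, alpha2 + n).

    alpha2 is treated as a rate parameter (assumption of this sketch).
    """
    return alpha1 + sum(observations), alpha2 + len(observations)

def gamma_mean(shape, rate):
    """Mean of a Gamma law in the shape/rate parametrisation."""
    return shape / rate

def mixture_mean(components):
    """Mean of a finite mixture given as (weight, shape, rate) triples."""
    return sum(w * gamma_mean(shape, rate) for w, shape, rate in components)

# Conjugate update with hypothetical prior parameters and data.
shape, rate = gamma_poisson_update(1.0, 2.0, [3, 0, 4])
posterior_mean = gamma_mean(shape, rate)

# A finite Gamma mixture with made-up weights, standing in for a propagated posterior.
example_mixture = [(0.5, 8.0, 5.0), (0.3, 7.0, 4.0), (0.2, 6.0, 3.0)]
example_mean = mixture_mean(example_mixture)
```

Because both the update and the propagation keep the posterior inside the family of finite Gamma mixtures, filtering reduces to bookkeeping on such lists of triples.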




[Figure: two panels plotted against $t - t_0$ (0 to 7); vertical axis 0.1 to 0.5]



Some references

Costantini, De Blasi, Ethier, R. & Spano (2016). Wright–Fisher construction of the two-parameter Poisson–Dirichlet diffusion. arXiv:1601.06064

Gutierrez, Mena & R. (2016). A time dependent Bayesian nonparametric model for air quality analysis. Comput. Statist. Data Anal.

Mena & R. (2016). Dynamic density estimation with diffusive Dirichlet mixtures. Bernoulli

Mena, R. & Walker (2011). Geometric stick-breaking processes for continuous-time Bayesian nonparametric modeling. J. Statist. Plann. Inf.

Papaspiliopoulos & R. (2014). Optimal filtering and the dual process. Bernoulli

Papaspiliopoulos, R. & Spano (2014). Filtering hidden Markov measures. arXiv:1411.4944

R. & Walker (2009). Countable representation for infinite dimensional diffusions derived from the two-parameter Poisson–Dirichlet process. Electr. Comm. Probab.

For more info: www.matteoruggiero.it

