PLNmodels - A collection of Poisson lognormal models for ... filePLNmodels A collection of Poisson lognormal models for multivariate analysis of count data Julien Chiquet, MIA Paris

PLNmodelsA collection of Poisson lognormal models for multivariate analysis of count data

Julien Chiquet, MIA Paris

joint work with M. Mariadasou, S. Robin

AG MIA, Jouy-en-Josas, May, 22 2019

J.C., Mahendra Mariadassou, Stephane Robin,

Variational inference for probabilistic Poisson PCAhttp://dx.doi.org/10.1214/18-AOAS1177 Ann Appl Statist 12: 2674–2698, 2018

J.C., Mahendra Mariadassou, Stephane Robin,

Variational inference for sparse network reconstruction from count dataIn Proceedings of the 19th International Conference on Machine Learning (ICML’19)

PLNmodels package, development version on githubinstall.packages("PLNmodels")

https://jchiquet.github.io/PLNmodels/

1

http://dx.doi.org/10.1214/18-AOAS1177

https://jchiquet.github.io/PLNmodels/

Motivations: oak powdery mildew pathobiome

Metabarcoding data from [JFS+16]

I n = 116 leaves, p = 114 species (66 bacteria, 47 fungies + E. alphitoides)

counts[1:3, c(1:4, 48:51)]

## f_1 f_2 f_3 f_4 E_alphitoides b_1045 b_109 b_1093## A1.02 72 5 131 0 0 0 0 0## A1.03 516 14 362 0 0 0 0 0## A1.04 305 24 238 0 0 0 0 0

I d = 8 covariates (tree susceptibility, distance to trunk, orientation, . . . )

covariates[1:3, ]

## tree distTOtrunk distTOground pmInfection orientation## A1.02 intermediate 202 155.5 1 SW## A1.03 intermediate 175 144.5 0 SW## A1.04 intermediate 168 141.5 0 SW

I Sampling effort in each sample (bacteria 6= fungi)

offsets[1:3, c(1:4, 48:51)]

## f_1 f_2 f_3 f_4 E_alphitoides b_1045 b_109 b_1093## [1,] 2488 2488 2488 2488 2488 8315 8315 8315## [2,] 2054 2054 2054 2054 2054 662 662 662## [3,] 2122 2122 2122 2122 2122 480 480 480

2

Problematic & Basic formalism

Data tables: Y = (Yij ),n × p; X = (Xik ),n × d ; O = (Oij ),n × p where

I Yij = abundance (read counts) of species j in sample i

I Xik = value of covariate k in sample i

I Oij = offset (sampling effort) for species j in sample i

Need a generic framework to model dependences between count variables

I account for peculiarities of count data vary over many orders of magnitude are overdispersed

I exhibit patterns of diversity summarize the information from Y (PCA, clustering, . . . )

I understand between-species interactions ’network’ inference (variable/covariance selection)

I correct for technical and confounding effects account for covariables and sampling effort

3

Models for multivariate count data

If we were in a Gaussian world, the general linear model would be appropriate

For each sample i = 1, . . . ,n, it explains

I the abundances of the p species (Yi)

I by the values of the d covariates Xi and the p offsets Oi

Yi = XiΘ︸︷︷︸account forcovariates

+ Oi︸︷︷︸account for

sampling effort

+εi , εi ∼ N (0p , Σ︸︷︷︸dependence

between species

)

But we are not, and there is no generic model for multivariate counts

I Data transformation (log ,√

): quick and dirty

I Non-Gaussian multivariate distributions: do not scale to data dimension

I Latent variable models: interaction occur in a latent (unobserved) layer

4

Models for multivariate count data

If we were in a Gaussian world, the general linear model would be appropriate

For each sample i = 1, . . . ,n, it explains

I the abundances of the p species (Yi)

I by the values of the d covariates Xi and the p offsets Oi

Yi = XiΘ︸︷︷︸account forcovariates

+ Oi︸︷︷︸account for

sampling effort

+εi , εi ∼ N (0p , Σ︸︷︷︸dependence

between species

)

But we are not, and there is no generic model for multivariate counts

I Data transformation (log ,√

): quick and dirty

I Non-Gaussian multivariate distributions: do not scale to data dimension

I Latent variable models: interaction occur in a latent (unobserved) layer

4

Poisson-log normal (PLN) distribution

A latent Gaussian model

Originally proposed by Atchisson [AH89]

Zi ∼ N (0,Σ)

Yi |Zi ∼ P(exp Oi + Xᵀi Θ + Zi)

Interpretation

I Dependency structure encoded in the latent space (i.e. in Σ)

I Additional effects are fixed

I Conditional Poisson distribution = noise model

Properties

+ over-dispersion

+ covariance with arbitrary signs

- maximum likelihood via EM algorithm is limited to a couple of variables5

Geometrical view

−2

0

2

−3 −2 −1 0 1 2

species 1

spec

ies

2

Latent Space (Z)

0

5

10

15

20

0.0 2.5 5.0 7.5 10.0 12.5

species 1

spec

ies

2

Observation Space (exp(Z))

0

5

10

15

20

0 5 10

species 1

spec

ies

2

Observation Space (Y = P(exp(Z))) + noise

0

5

10

15

0 5 10

species 1

spec

ies

2

Observation Space (Y) + noise

6

Geometrical view (with offset)

−2

0

2

−3 −2 −1 0 1 2

species 1

spec

ies

2

Latent Space (Z)

10

20

30

40

50

20 40 60

species 1

spec

ies

2

Observation Space (exp(Z + O))

10

20

30

40

50

20 40 60

species 1

spec

ies

2

Observation Space (Y = P(exp(Z + O))) + noise

10

20

30

40

50

20 40 60

species 1

spec

ies

2

Observation Space (Y) + noise

7

Intractable EM

Aim of the inference:

I estimate β = (Θ,Σ)

I predict the Zi

Maximum likelihood

PLN is an incomplete data model: try EM

log pβ(Y) = E[log pβ(Y,Z) |Y] +H[pβ(Z |Y)]

EM requires to evaluate (some moments of)

p(Z |Y) =∏i

p(Zi |Yi)

but no close form for p(Zi |Yi).

I [Kar05] resorts to numerical or Monte-Carlo integration.I Variational approach [WJ08]: use a proxy of p(Z |Y). 8

Variational EMVariational approximation: choose a class of distribution Q

Q =p : p(Z) =

∏i

pi(Zi), pi(Zi) = N (Zi ; mi , si)

and maximize the lower bound (E = expectation under p)

J (θ, p) = log pβ(Y)−KL[p(Z) || pβ(Z |Y)] = E[log pβ(Y,Z)] +H[p(Z)]

Variational EM.I VE step: find the optimal p:

ph = arg max J (βh , p) = arg minp∈Q

KL[p(Z) || pβh (Z |Y )]

I M step: update β

βh

= arg max J (β, ph) = arg maxβ

E[log pβ(Y,Z)]

9

Optimization & Implementation

Property: The lower J (β, p) is bi-concave, i.e.

I wrt p = (M, S) for given β

I wrt β = (Σ,Θ) for given p

but not jointly concave in general.

Optimization: projected gradient ascent for the complete parameter (m, s,β)

I algorithm: conservative convex separable approximations [Sva02]

I implementation: NLopt nonlinear-optimization package [Joh11]

I initialization: LM after log-trasnformation applied independently on each variables +concatenation of the regression coefficients + Pearson residuals

PLNmodels R/C++-package: https://jchiquet.github.io/PLNmodels

10

https://jchiquet.github.io/PLNmodels

PLN: natural extensions towards multivariate analysis

I PCA: rank constraint on Σ.

Zi ∼ N (µ,Σ = BB>), B ∈Mpk with orthogonal columns.

I Network: sparsity constraint on inverse covariance.

Zi ∼ N (µ,Σ = Ω−1), ‖Ω‖1 < c.

I LDA: maximize separation between groups with means M = [µ>1 , . . . ,µ>K ]>

Zi ∼ N (µi = g>i M,Σ), gi a group indicator vector.

I Clustering: mixture model in the latent space

Zi ∼K∏

k=1

πkN (µk ,Σk ), with, e.g., Σk diagonal matrices

Challenge: a variant of the variational algorithm is required for each model

11

PLN network modelModel:

Zi iid ∼ Np(0p ,Ω−1), Ω sparse, ‖Ω‖1,offdiagonal < c

Yi |Zi ∼ P(expOi + X>i Θ + Zi)Cheat: Use the PLN model and infer the graphical model of Z

(i , j ) /∈ E ⇔ Zi ⊥⊥ Zj |Z\i,j ⇔ Ωij = 0.

Graphical interpretation: p(Zi ,Yi) vs p(Yi)

Z1

Z2

Z3

Z4 Z5

Y1

Y2

Y3

Y4 Y5

Y1

Y2

Y3

Y4 Y5

12

PLN network modelModel:

Zi iid ∼ Np(0p ,Ω−1), Ω sparse, ‖Ω‖1,offdiagonal < c

Yi |Zi ∼ P(expOi + X>i Θ + Zi)Cheat: Use the PLN model and infer the graphical model of Z

(i , j ) /∈ E ⇔ Zi ⊥⊥ Zj |Z\i,j ⇔ Ωij = 0.

Graphical interpretation: p(Zi ,Yi) vs p(Yi)

Z1

Z2

Z3

Z4 Z5

Y1

Y2

Y3

Y4 Y5

Y1

Y2

Y3

Y4 Y5

12

Variational inference

Same problem: log pβ(Y) is intractable

Variational approximation: maximize

J (β, p)− λ ‖Ω‖1,off = E[log pβ(Y,Z)] +H[p(Z)]−λ ‖Ω‖1,off

taking p ∈ Q.

Still bi-concave in β = (Ω,Θ) and p = (M, S). Ex:

Ω = arg maxΩ

n

2

(log |Ω | − tr(ΣΩ)

)− λ‖Ω‖1,off : gLasso problem

13

Model selectionAlternative to model selection criteria

Sparsity level λ needs to be chosen.Stability-based approach for Network by resampling: StARS

1. Infers B networks Ω(b,λ) on subsamples of size m for varying λ.

2. Frequency of inclusion of each edges e = i ∼ j is estimated by

pλe = #b : Ω(b,λ)ij 6= 0/B

3. Variance of inclusion of edge e is vλe = pλe (1− pλe ).

4. Network stability is stab(λ) = 1− 2vλ where vλ is the average of the vλe .

StARS1 selects the smallest λ (densest network) for which stab(λ) ≥ 1− 2β

1[LRW10] suggest using 2β = 0.05 and m = b10√nc based on theoretical results.

14

An example in connection with the news

Data: first round of the French presidential election of 2017 (source: https://data.gouv.fr)

I votes cast for each of the 11 candidates in the more than 63, 000 polling stations

I voting population varied wildlyFrom 10 to 105,891 , with a median at 736 and 99.5% of the stations with less than 1,700 voters.

I patterns depend on geography

Models

I no offset

I offset: log-registered population of voters to account for different station sizes

I covariate: department as a proxy for geography.

Question: find competing candidates, who appeal to different voters, and compatible candidates

15

https://data.gouv.fr

French Presidential: no offset

Inferred network Latent Positions (PCA)

ARTHAUD

ASSELINEAU

CHEMINADE

DUPONT−AIGNAN

FILLON

HAMON

LASSALLE

LE PEN

MACRONMÉLENCHON

POUTOU

16

French Presidential: offset


ARTHAUD

ASSELINEAU

CHEMINADE

DUPONT−AIGNAN

FILLON

HAMON

LASSALLE

LE PEN

MACRONMÉLENCHON

POUTOU

17

French Presidential: departments


ARTHAUD

ASSELINEAU

CHEMINADE

DUPONT−AIGNAN

FILLON

HAMON

LASSALLE

LE PEN

MACRONMÉLENCHON

POUTOU

18

More ”conventional” example: Oak powdery mildew data set

Three setups

1. nr = 39 resistant samples, with covariates (orientation, distance to ground)

2. ns = 39 susceptible samples, with covariates (orientation, distance to ground)

3. both samples samples, with covariates + tree effect and interactions

Network inference

PLNnetwork + ’StARS’ for model selection

I 100 resamplings

I high level of stability (edges frequencies > 0.995)

Question: consensus or tree-specific networks?

19

PLNnetwork models: resistantTrees resistant to mildew (E. Alphitoıdes)

f1

f3

f4

f8

f10

f12f17f19

f25

f27f29

f32

f39

f1085

f1090f1278

Ea

b13

b153

b21 b25 b26b33

b364

b37

b44

b60

20

PLNnetwork models: susceptibleTrees susceptibles to mildew (E. Alphitoıdes)

f1

f3

f4

f8

f10

f12f17f19f25

f27

f29

f32f39

f1085

f1090

f1278

Eab13

b153

b21b25 b26

b33

b364

b37

b44

b60

21

PLNnetwork models: consensusBoth Trees

f1

f3

f4

f8

f10

f12f17f19f25

f27

f29

f32

f39

f1085

f1090

f1278

Ea

b13

b153

b21 b25 b26b33

b364

b37

b44

b60

22

PLNnetwork models: covariate effectcoefficients associated to orientation

orie

ntat

ion_

NE

orie

ntat

ion_

SW

resistant susceptible

−7.5

−5.0

−2.5

−7.5

−5.0

−2.5

typeBacteriaFungi

23

DiscussionSummary

I PLN = generic model for multivariate count data analysis

I Allows for covariates

I Flexible modeling of the covariance structure

I Efficient VEM algorithm

I PLNmodels package: https://github.com/jchiquet/PLNmodels

Ongoing extension...

I Confidence interval and tests for the regular PLN

I Other covariance structures (spatial, time series, ...), mixture models, . . .

I Zero-Inflation

Following PLN Network Raphaelle Momal’s PhD (supervized by S. Robin and C. Ambroise)

I Tree-based decomposition of the underlying graphical model

I Other Model selection criterion for network inference24

https://github.com/jchiquet/PLNmodels

References

John Aitchison and CH Ho.

The multivariate poisson-log normal distribution.Biometrika, 76(4):643–653, 1989.

B. Jakuschkin, V. Fievet, L. Schwaller, T. Fort, C. Robin, and C. Vacher.

Deciphering the pathobiome: Intra-and interkingdom interactions involving the pathogen Erysiphe alphitoides.Microbial ecology, pages 1–11, 2016.

Steven G Johnson.

The NLopt nonlinear-optimization package, 2011.

D. Karlis.

EM algorithm for mixed Poisson and other discrete distributions.Astin bulletin, 35(01):3–24, 2005.

Han Liu, Kathryn Roeder, and Larry Wasserman.

Stability approach to regularization selection (stars) for high dimensional graphical models.In Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 2, NIPS’10, pages 1432–1440, USA, 2010. Curran Associates Inc.

Krister Svanberg.

A class of globally convergent optimization methods based on conservative convex separable approximations.SIAM journal on optimization, 12(2):555–573, 2002.

M. J. Wainwright and M. I. Jordan.

Graphical models, exponential families, and variational inference.Found. Trends Mach. Learn., 1(1–2):1–305, 2008.

25

Documents

PLNmodels - A collection of Poisson lognormal models for ... filePLNmodels A collection of Poisson lognormal models for multivariate analysis of count data Julien Chiquet, MIA Paris