Penalized Maximum Likelihood Inference for Sparse Gaussian Graphical Models with Latent Structure

Christophe Ambroise, Julien Chiquet and Catherine Matias
Laboratoire Statistique et Génome, La génopole - Université d'Évry
Statistique et santé publique seminar, January 13, 2009


Biological networks
Different kinds of biological interactions

Families of networks:
- protein-protein interactions,
- metabolic pathways,
- regulation networks.

[Figure: a regulation example, the SOS network in E. coli, with nodes lexA, dinI, recF, rpD, rpH, SsB, recA, umD and rpS]

Let us focus on regulatory networks ... and look for an influence network.


What questions?

[Diagram: a map of network-analysis questions. Inference splits into supervised and unsupervised approaches; structure analysis covers degree distributions and community analysis, via statistical models or spectral clustering.]

- How do we find the interactions?
- What knowledge can the structure provide?
- Given a new node, what are its interactions with the known nodes?
- Given two nodes, do they interact?
- What are the communities' characteristics?


Problem
Infer the interactions between genes from microarray data

Microarray gene expression data: p genes, n experiments. Which genes interact/co-express?

[Figure: from expression profiles to an inferred graph over genes G0, ..., G9]

Major issues:
- combinatorics: $2^{p^2}$ possible graphs;
- dimension problem: $n \ll p$.

Here, we reduce p to a fixed number of genes of interest.


Our ideas to tackle these issues

Introduce a prior taking the topology of the network into account, for better edge inference.

[Figure: the same graph over genes G0, ..., G9, then redrawn with nodes grouped into clusters A1-A3, B1-B5, C1-C2]

Relying on biological constraints:
1. few genes effectively interact (sparsity),
2. networks are organized (latent structure).


Outline

Give the network a model
  Gaussian graphical models
  Providing the network with a latent structure
  The complete likelihood

Inference strategy by alternate optimization
  The E-step: estimation of the latent structure
  The M-step: inferring the connectivity matrix

Numerical experiments
  Synthetic data
  Breast cancer data


GGMs
General settings

The Gaussian model
- Let $X \in \mathbb{R}^p$ be a random vector such that $X \sim \mathcal{N}(0_p, \Sigma)$;
- let $(X^1, \ldots, X^n)$ be an i.i.d. size-$n$ sample (e.g., microarray experiments);
- let $\mathbf{X}$ be the $n \times p$ matrix whose $k$th row is $(X^k)^\top$;
- let $K = (K_{ij})_{(i,j) \in P^2} := \Sigma^{-1}$ be the concentration matrix.

The graphical interpretation

$X_i \perp\!\!\!\perp X_j \mid X_{P \setminus \{i,j\}} \iff K_{ij} = 0 \iff \text{edge } (i,j) \notin \text{network},$

since $r_{ij \mid P \setminus \{i,j\}} = -K_{ij} / \sqrt{K_{ii} K_{jj}}$.

K describes the graph of conditional dependencies.
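As a concrete illustration (not from the talk), a short numpy sketch that reads conditional independences off a concentration matrix; the example covariance is an arbitrary choice:

```python
import numpy as np

# Example covariance with an AR(1)-like pattern (arbitrary choice).
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
K = np.linalg.inv(Sigma)  # concentration matrix

# Partial correlations: r_{ij|rest} = -K_ij / sqrt(K_ii K_jj)
d = np.sqrt(np.diag(K))
partial_corr = -K / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

# Edge (i, j) present iff K_ij != 0 (here, up to numerical noise);
# variables 0 and 2 turn out to be conditionally independent given 1.
adjacency = (np.abs(K) > 1e-10) & ~np.eye(3, dtype=bool)
```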


GGMs and regression
Network inference as p independent regression problems

One may use p different linear regressions:

$X_i = (X_{\setminus i})^\top \alpha + \varepsilon, \quad \text{where } \alpha_j = -K_{ij}/K_{ii}.$

Meinshausen and Bühlmann's approach (2006)
Solve p independent Lasso problems (the $\ell_1$ norm enforces sparsity):

$\hat\alpha = \arg\min_\alpha \frac{1}{n} \left\| \mathbf{X}_i - \mathbf{X}_{\setminus i} \alpha \right\|_2^2 + \rho \|\alpha\|_{\ell_1},$

where $\mathbf{X}_i$ is the $i$th column of $\mathbf{X}$ and $\mathbf{X}_{\setminus i}$ is the full matrix with the $i$th column removed.

Major drawback: a symmetrization step is needed to obtain a final estimate of K.
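A minimal sketch of this neighborhood-selection scheme with scikit-learn; the penalty level and the AND symmetrization rule below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, rho = 200, 20, 0.1
X = rng.standard_normal((n, p))   # stand-in for expression data

# One Lasso regression per gene: which genes predict gene i?
neighbors = np.zeros((p, p), dtype=bool)
for i in range(p):
    mask = np.arange(p) != i
    fit = Lasso(alpha=rho).fit(X[:, mask], X[:, i])
    neighbors[i, mask] = fit.coef_ != 0

# The symmetrization step: AND rule (OR is the other common choice).
adjacency = neighbors & neighbors.T
```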


GGMs and Lasso
Solving p penalized regressions is equivalent to maximizing the penalized pseudo-likelihood

Consider the approximation $\mathbb{P}(X) = \prod_{i=1}^p \mathbb{P}(X_i \mid X_{\setminus i})$.

Proposition
The solution to

$\hat K = \arg\max_{K, K_{ij} \neq K_{ji}} \log \mathcal{L}(\mathbf{X}; K) - \rho \|K\|_{\ell_1}, \quad (1)$

with

$\mathcal{L}(\mathbf{X}; K) = \sum_{i=1}^p \left( \sum_{k=1}^n \log \mathbb{P}(X^k_i \mid X^k_{\setminus i}; K_i) \right),$

shares the same null entries as the solution of the p independent penalized regressions.

Those p terms are not independent, as K is not diagonal! This still requires the post-symmetrization step.


GGMs and penalized likelihood

The penalized likelihood of the Gaussian observations
Maximize the log-likelihood with an $\ell_1$ penalty term:

$\frac{n}{2}\left( \log\det(K) - \mathrm{Tr}(S_n K) \right) - \rho \|K\|_{\ell_1},$

where $S_n$ is the empirical covariance matrix.

Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, JMLR, 2008.

Natural generalization
Use different penalty parameters for different coefficients:

$\frac{n}{2}\left( \log\det(K) - \mathrm{Tr}(S_n K) \right) - \|\rho_Z(K)\|_{\ell_1},$

where $\rho_Z(K) = (\rho_{Z_i, Z_j}(K_{ij}))_{i,j}$ is a penalty function depending on an unknown underlying structure Z.
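For the uniform-penalty case, scikit-learn's graphical lasso gives a quick baseline; this is a sketch of that baseline, not the method of the talk, and the penalty level is arbitrary:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))   # stand-in data

# Uniform penalty rho on every entry of K
model = GraphicalLasso(alpha=0.1).fit(X)
K_hat = model.precision_             # sparse estimate of Sigma^{-1}
```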


The concentration matrix structure
Modelling connection heterogeneity

Assumption: there exists a latent structure spreading the vertices into a set $\mathcal{Q} = \{1, \ldots, q, \ldots, Q\}$ of classes of connectivity.

The classes of connectivity
Denote $Z = \{Z_i = (Z_{i1}, \ldots, Z_{iQ})\}_i$, where the $Z_{iq} = \mathbf{1}_{\{i \in q\}}$ are the latent independent variables, with
- $\alpha = \{\alpha_q\}$, the prior proportions of the groups,
- $Z_i \sim \mathcal{M}(1, \alpha)$, a multinomial distribution.

A mixture of Laplace distributions
Assume the $K_{ij}$ are independent given Z. Then $K_{ij} \mid \{Z_{iq} Z_{j\ell} = 1\} \sim f_{q\ell}(\cdot)$, where

$f_{q\ell}(x) = \frac{1}{2\lambda_{q\ell}} \exp\left\{ -\frac{|x|}{\lambda_{q\ell}} \right\}, \quad q, \ell \in \mathcal{Q}.$
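To make the generative model concrete, a sketch that samples a concentration-matrix pattern from this prior; the group proportions and Laplace scales below are made-up values:

```python
import numpy as np

rng = np.random.default_rng(2)
p, Q = 30, 3
alpha = np.array([0.5, 0.3, 0.2])   # prior group proportions (assumed)
lam = np.full((Q, Q), 0.1)          # inter-class scales, small entries
np.fill_diagonal(lam, 1.0)          # intra-class scales, large entries

Z = rng.choice(Q, size=p, p=alpha)  # latent class of each vertex
K = np.zeros((p, p))
for i in range(p):
    for j in range(i + 1, p):
        # K_ij | Z ~ Laplace with scale lambda_{Z_i, Z_j}
        K[i, j] = K[j, i] = rng.laplace(scale=lam[Z[i], Z[j]])
```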


Some possible structures

[Figure: from affiliation to bipartite; a clustered graph with nodes A1-A3, B1-B5, C1-C2]

Example: a modular (affiliation) network
Two kinds of Laplace distributions:
1. intra-cluster ($q = \ell$): $f_{\mathrm{in}}(\cdot\,; \lambda_{\mathrm{in}})$;
2. inter-cluster ($q \neq \ell$): $f_{\mathrm{out}}(\cdot\,; \lambda_{\mathrm{out}})$.



Looking for a criterion...

We wish to infer the non-null entries of K knowing the data. Our strategy is

$\hat K = \arg\max_{K \succ 0} \mathbb{P}(K \mid \mathbf{X}) = \arg\max_{K \succ 0} \log \mathbb{P}(\mathbf{X}, K).$

Marginalization over Z
... because the distribution of K is known conditionally on the structure!

$\hat K = \arg\max_{K \succ 0} \log \sum_{Z \in \mathcal{Z}} \mathcal{L}_c(\mathbf{X}, K, Z),$

where $\mathcal{L}_c(\mathbf{X}, K, Z) = \mathbb{P}(\mathbf{X}, K, Z)$ is the complete-data likelihood.

An EM-like strategy is used hereafter to solve this problem.


The complete likelihood

Proposition

$\log \mathcal{L}_c(\mathbf{X}, K, Z) = \frac{n}{2}\left( \log\det(K) - \mathrm{Tr}(S_n K) \right) - \|\rho_Z(K)\|_{\ell_1} - \sum_{\substack{i,j \in P,\, i \neq j \\ q, \ell \in \mathcal{Q}}} Z_{iq} Z_{j\ell} \log(2\lambda_{q\ell}) + \sum_{i \in P,\, q \in \mathcal{Q}} Z_{iq} \log \alpha_q + c,$

where $S_n$ is the empirical covariance matrix and $\rho_Z(K) = \left( \rho_{Z_i Z_j}(K_{ij}) \right)_{(i,j) \in P^2}$ is defined by

$\rho_{Z_i Z_j}(K_{ij}) = \sum_{q, \ell \in \mathcal{Q}} \frac{Z_{iq} Z_{j\ell} K_{ij}}{\lambda_{q\ell}}.$

The part concerning K: penalized maximum likelihood, with a LASSO-type approach.
The part concerning Z: estimation with a variational approach.



An EM strategy

The conditional expectation to maximize

$Q(K \mid K^{(m)}) = \mathbb{E}\left\{ \log \mathcal{L}_c(\mathbf{X}, K, Z) \mid \mathbf{X}; K^{(m)} \right\} = \sum_{Z \in \mathcal{Z}} \mathbb{P}(Z \mid \mathbf{X}, K^{(m)}) \log \mathcal{L}_c(\mathbf{X}, K, Z) = \sum_{Z \in \mathcal{Z}} \mathbb{P}(Z \mid K^{(m)}) \log \mathcal{L}_c(\mathbf{X}, K, Z).$

Problem
- There is no closed form of $Q(K \mid K^{(m)})$, because $\mathbb{P}(Z \mid K)$ cannot be factorized.
- We use a variational approach to approximate $\mathbb{P}(Z \mid K)$.



Variational estimation of the latent structure
Daudin et al., 2008

Principle
Use an approximation $R(Z)$ of $\mathbb{P}(Z \mid K)$ in factorized form, $R_\tau(Z) = \prod_i R_{\tau_i}(Z_i)$, where $R_{\tau_i}$ is a multinomial distribution with parameters $\tau_i$.

- Maximize a lower bound of the log-likelihood:

$\mathcal{J}(R_\tau(Z)) = \mathcal{L}(\mathbf{X}, K) - D_{\mathrm{KL}}\left( R_\tau(Z) \,\|\, \mathbb{P}(Z \mid K) \right).$

- Using its tractable form, we have

$\mathcal{J}(R_\tau(Z)) = \sum_Z R_\tau(Z) \mathcal{L}_c(\mathbf{X}, K, Z) + \mathcal{H}(R_\tau(Z)).$

The first term plays the role of $\mathbb{E}(\mathcal{L}_c(\mathbf{X}, K, Z) \mid \mathbf{X}, K^{(m)})$. Maximizing $\mathcal{J}$ leads to a fixed-point relationship for $\tau$.
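A sketch of the kind of fixed-point iteration this yields, derived from the complete likelihood above under the mean-field factorization; the update schedule and normalization details are my assumptions, not the paper's exact implementation:

```python
import numpy as np

def laplace_logpdf(x, lam):
    """log f_{ql}(x) for the Laplace density of the mixture."""
    return -np.log(2.0 * lam) - np.abs(x) / lam

def update_tau(K, tau, alpha, lam, n_iter=50):
    """Mean-field fixed point: tau_iq ~ alpha_q * prod_{j!=i,l} f_ql(K_ij)^tau_jl."""
    p, Q = tau.shape
    off = (~np.eye(p, dtype=bool)).astype(float)   # exclude j = i
    for _ in range(n_iter):
        log_tau = np.tile(np.log(alpha), (p, 1))
        for q in range(Q):
            for l in range(Q):
                log_tau[:, q] += (laplace_logpdf(K, lam[q, l]) * off) @ tau[:, l]
        log_tau -= log_tau.max(axis=1, keepdims=True)   # numerical stability
        tau = np.exp(log_tau)
        tau /= tau.sum(axis=1, keepdims=True)           # renormalize rows
    return tau
```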



The M-step
Seen as a penalized likelihood problem

We aim at solving

$\hat K = \arg\max_{K \succ 0} Q_\tau(K),$

where

$Q_\tau(K) = \frac{n}{2}\left( \log\det(K) - \mathrm{Tr}(S_n K) \right) - \|\rho_\tau(K)\|_{\ell_1} + \text{cst}.$

Friedman, Hastie, Tibshirani. Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 2008.
Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, JMLR, 2008.

We deal with a more complex penalty term here.


Let us work on the covariance matrix

Proposition
The maximization problem over K is equivalent to the following one, dealing with the covariance matrix $\Sigma$:

$\hat\Sigma = \arg\max_{\|(\Sigma - S_n)\, ./ P\|_\infty \le 1} \log\det(\Sigma),$

where $./$ denotes term-by-term division and

$P = (p_{ij})_{i,j \in P}, \quad p_{ij} = \frac{2}{n} \sum_{q,\ell} \frac{\tau_{iq}\tau_{j\ell}}{\lambda_{q\ell}}.$

The proof uses some optimization and primal/dual tricks.
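Given the variational parameters, the penalty matrix of this proposition is a one-liner; a sketch, where tau is p x Q and lam is Q x Q:

```python
import numpy as np

def penalty_matrix(tau, lam, n):
    """p_ij = (2/n) * sum_{q,l} tau_iq * tau_jl / lam_ql."""
    return (2.0 / n) * tau @ (1.0 / lam) @ tau.T
```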

A block-wise resolution

Denote

$\Sigma = \begin{bmatrix} \Sigma_{11} & \sigma_{12} \\ \sigma_{12}^\top & \Sigma_{22} \end{bmatrix}, \quad S_n = \begin{bmatrix} S_{11} & s_{12} \\ s_{12}^\top & S_{22} \end{bmatrix}, \quad P = \begin{bmatrix} P_{11} & p_{12} \\ p_{12}^\top & P_{22} \end{bmatrix}, \quad (2)$

where $\Sigma_{11}$ is a $(p-1) \times (p-1)$ matrix, $\sigma_{12}$ is a column vector of length $p-1$, and $\Sigma_{22}$ is a scalar.

Each column of $\Sigma$ satisfies (by the determinant of the Schur complement)

$\hat\sigma_{12} = \arg\min_{\{\|(y - s_{12})\, ./ p_{12}\|_\infty \le 1\}} \left\{ y^\top \Sigma_{11}^{-1} y \right\}.$

An $\ell_1$-norm penalized writing

Proposition
Solving the block-wise problem is equivalent to solving the following dual problem:

$\min_\beta \left\| \frac{1}{2} \Sigma_{11}^{1/2} \beta - \Sigma_{11}^{-1/2} s_{12} \right\|_2^2 + \|p_{12} \star \beta\|_{\ell_1},$

where $\star$ is the term-by-term product. The vectors $\sigma_{12}$ and $\beta$ are linked by

$\sigma_{12} = \Sigma_{11} \beta / 2.$

A LASSO-like formulation, for which efficient off-the-shelf algorithms exist.
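This sub-problem is a weighted Lasso, so any coordinate-descent solver applies. A generic sketch follows; its 1/2 scaling differs from the display above by a constant factor, which does not change the minimizer:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def weighted_lasso(A, b, w, n_iter=100):
    """Path-wise coordinate descent for min_beta 0.5*||A beta - b||_2^2 + sum_j w_j |beta_j|."""
    p = A.shape[1]
    beta = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0)
    r = b - A @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += A[:, j] * beta[j]                          # drop coordinate j
            beta[j] = soft_threshold(A[:, j] @ r, w[j]) / col_sq[j]
            r -= A[:, j] * beta[j]                          # put it back
    return beta
```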

The full EM algorithm

while Q̂_τ(K̂^(m)) has not stabilized do
    // THE E-STEP: LATENT STRUCTURE INFERENCE
    if m = 1 then
        // First pass
        Apply spectral clustering on the empirical covariance S to initialize τ̂
    else
        Compute τ̂ via the fixed-point algorithm, using K̂^(m-1)
    end
    // THE M-STEP: NETWORK INFERENCE
    Construct the penalty matrix P according to τ̂
    while Σ̂^(m) has not stabilized do
        for each column of Σ̂^(m) do
            Compute σ̂_12 by solving the LASSO-like problem with path-wise coordinate optimization
        end
    end
    Compute K̂^(m) by block-wise inversion of Σ̂^(m)
    m ← m + 1
end
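Wiring the earlier sketches together (update_tau, penalty_matrix, weighted_lasso) gives a rough Python skeleton of the whole procedure. The initialization of K, the diagonal warm start, the number of sweeps and the square-root reformulation of the column problem are my assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def m_step(S, P, n_sweeps=5):
    """Block-wise updates of Sigma: one weighted Lasso per column."""
    p = S.shape[0]
    Sigma = S + np.diag(np.diag(P))              # warm start (assumption)
    for _ in range(n_sweeps):
        for j in range(p):
            idx = np.arange(p) != j
            W11 = Sigma[np.ix_(idx, idx)]
            evals, evecs = np.linalg.eigh(W11)
            evals = np.clip(evals, 1e-8, None)
            root = (evecs * np.sqrt(evals)) @ evecs.T    # W11^{1/2}
            iroot = (evecs / np.sqrt(evals)) @ evecs.T   # W11^{-1/2}
            beta = weighted_lasso(root, iroot @ S[idx, j], P[idx, j])
            Sigma[idx, j] = Sigma[j, idx] = W11 @ beta
    return Sigma

def simone_like(X, Q, lam, alpha, n_outer=5):
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    # First pass: spectral clustering on |S| to initialize tau
    labels = SpectralClustering(n_clusters=Q, affinity="precomputed") \
        .fit_predict(np.abs(S))
    tau = np.eye(Q)[labels]
    K = np.linalg.inv(S + 0.1 * np.eye(p))       # crude start (assumption)
    for m in range(n_outer):
        if m > 0:
            tau = update_tau(K, tau, alpha, lam)     # E-step
        P = penalty_matrix(tau, lam, n)              # penalty from structure
        Sigma = m_step(S, P)                         # M-step
        K = np.linalg.inv(Sigma)
    return K, tau
```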



Simulation settings

Five inference methods
1. InvCor: edge estimation based on inverting the empirical correlation matrix.
2. GeneNet (Strimmer et al.): edge estimation based on partial correlations with shrinkage.
3. GLasso (Friedman et al.): edge estimation using a uniform penalty matrix.
4. "Perfect" SIMoNe (the best results our method can aspire to): edge estimation using a penalty matrix built from the theoretical node classification.
5. SIMoNe (Statistical Inference for MOdular NEtworks): edge estimation using a penalty matrix built, iteratively, from the estimated node classification.

Simulation setup

Simulated graphs
- Graphs are simulated using an affiliation model (two sets of parameters: intra-group and inter-group connections).
- p = 200 nodes, hence p(p-1)/2 = 19900 possible interactions.
- 50 graphs (repetitions) are simulated per situation.
- Gene expression data (i.e., Gaussian samples) are then simulated from each sampled graph, as sketched below:
  1. favorable setting (n = 10p),
  2. middle case (n = 2p),
  3. unfavorable setting (n = p/2).

Unstructured graphs
- When there is no structure, SIMoNe is comparable to GeneNet and GLasso.
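A sketch of such a simulation; the connection probabilities, edge weights and the diagonal-dominance trick are assumed values, chosen only to keep K positive definite:

```python
import numpy as np

rng = np.random.default_rng(3)
p, Q = 200, 3
labels = rng.integers(Q, size=p)                 # affiliation of each node
same = labels[:, None] == labels[None, :]
prob = np.where(same, 0.1, 0.005)                # intra vs inter connection
A = np.triu(rng.random((p, p)) < prob, 1)
A = A | A.T                                      # undirected adjacency

K = np.where(A, -0.15, 0.0)                      # edge weights (assumption)
np.fill_diagonal(K, np.abs(K).sum(axis=1) + 0.1) # diagonal dominance -> PD
Sigma = np.linalg.inv(K)

n = 2 * p                                        # the "middle case"
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
```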

Concentration matrix and structure

[Figure: simulation of the structured sparse concentration matrix. Adjacency matrix without (a) and with (b) columns reorganized according to the affiliation structure, and the corresponding graph (c).]

Example of graph recovery
Favorable case

[Figure: the theoretical graph and the SIMoNe estimation]


Precision/Recall curves
Definition

$\text{Precision} = \frac{TP}{TP + FP}$ = proportion of true positives among all predicted edges

$\text{Recall} = \frac{TP}{TP + FN}$ = proportion of true positives among all true edges
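Computed from adjacency matrices, this is a few lines; a sketch for undirected graphs, so only the upper triangle is scored:

```python
import numpy as np

def precision_recall(A_true, A_hat):
    iu = np.triu_indices_from(A_true, k=1)
    t = A_true[iu].astype(bool)
    h = A_hat[iu].astype(bool)
    tp = np.sum(t & h)
    precision = tp / max(np.sum(h), 1)   # among predicted edges
    recall = tp / max(np.sum(t), 1)      # among true edges
    return precision, recall
```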

Precision/Recall curves
From the favorable setting (n = 10p) down to the unfavorable one (n = p/2)

- With $n \gg p$, Perfect SIMoNe and SIMoNe perform equivalently.
- When $3p > n > p$, the structure is partially recovered, and SIMoNe improves the edge selection.
- When $n \le p$, all methods perform poorly...

[Figures: precision vs. recall curves for SIMoNe, GLasso, Perfect SIMoNe, GeneNet and InvCor, for n = 10p, 6p, 3p, 2p, p and p/2]



First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Two types of patients
1. Patient response can be classified either as a pathologic complete response (PCR),
2. or as residual disease (Not PCR).

Gene expression data
- 133 patients (99 Not PCR, 34 PCR)
- 26 genes selected by differential analysis

First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

[Figures: networks inferred over the 26 selected genes (AMFR, BB_S4, BECNI, BTG3, CA12, CTNND2, E2F3, ERBB4, FGFRIOP, FLJ10916, FLJI2650, GAMT, GFRAI, IGFBP4, JMJD2B, KIA1467, MAPT, MBTP_SI, MELK, METRN, PDGFRA, RAMPI, RRM2, SCUBE2, THRAP2, ZNF552) on the full sample, on the Not PCR group, and on the PCR group]

Conclusions

To sum up
- We proposed an inference strategy based on a penalization scheme driven by an underlying, unknown structure.
- The estimation strategy is based on a variational EM algorithm, in which a LASSO-like procedure is embedded.
- A preprint is available on arXiv.
- R package: SIMoNe.

Perspectives
- Consider alternative, more biologically relevant priors: hubs, motifs.
- Time segmentation when dealing with temporal data.


Penalty choice (1)

Let $C_i$ denote the connectivity component of node $i$ in the true conditional dependency graph, and $\hat C_i$ the corresponding component resulting from the estimate $\hat K$.

Proposition
Fix some $\varepsilon > 0$ and choose the penalty parameters $\lambda$ such that, for all $q, \ell \in \mathcal{Q}$,

$2p^2\, F_{n-2}\!\left( \frac{2}{n \lambda_{q\ell}} \left( \max_{i \neq j} S_{ii} S_{jj} - \frac{1}{\lambda_{q\ell}^2} \right)^{-1/2} (n-2)^{1/2} \right) \le \varepsilon,$

where $1 - F_{n-2}$ is the c.d.f. of a Student's t-distribution with $n-2$ degrees of freedom. Then

$\mathbb{P}(\exists k,\ \hat C_k \not\subseteq C_k) \le \varepsilon. \quad (3)$

Penalty choice (2)

It is enough to choose $\lambda_{q\ell}$ such that

$\lambda_{q\ell}(\varepsilon) \ge \frac{2}{n} \left( n - 2 + t_{n-2}^2\!\left( \frac{\varepsilon}{2p^2} \right) \right)^{1/2} \times \left( \max_{\substack{i \neq j \\ Z_{iq} Z_{j\ell} = 1}} S_{ii} S_{jj} \right)^{-1/2} t_{n-2}\!\left( \frac{\varepsilon}{2p^2} \right)^{-1}.$
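Evaluating this rule numerically is straightforward with scipy. The sketch below reflects my reading of the reconstructed formula; in particular, taking $t_{n-2}(\alpha)$ as the upper-$\alpha$ quantile of the Student distribution is an assumption:

```python
import numpy as np
from scipy.stats import t as student_t

def lambda_rule(S, n, eps, members_q, members_l):
    """Penalty level for the class pair (q, l); members_* are node index lists."""
    p = S.shape[0]
    tq = student_t.ppf(1.0 - eps / (2.0 * p ** 2), df=n - 2)  # t_{n-2}(eps/2p^2)
    prods = [S[i, i] * S[j, j]
             for i in members_q for j in members_l if i != j]
    return (2.0 / n) * np.sqrt(n - 2.0 + tq ** 2) / (np.sqrt(max(prods)) * tq)
```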

Penalty choice (3)

Practically,
- relax the $\lambda_{q\ell}$ in the E-step (variational inference), which turns the E-step into a variational EM step;
- fix the $\lambda_{q\ell}$ in the M-step, adapting the above rule to the context. E.g., for an affiliation structure, we fix the ratio $\lambda_{\mathrm{in}}/\lambda_{\mathrm{out}} = 1.2$, and either let the value $1/\lambda_{\mathrm{in}}$ vary when drawing precision/recall curves on synthetic data, or fix this parameter with the above rule when dealing with real data.