Penalized Maximum Likelihood Inference for Sparse Gaussian Graphical Models with Latent Structure

Christophe Ambroise, Julien Chiquet and Catherine Matias
Laboratoire Statistique et Génome, La Génopole, Université d'Évry
Statistique et santé publique seminar, January 13, 2009
Ambroise, Chiquet, Matias 1
Biological networks: different kinds of biological interactions

Families of networks:
- protein-protein interactions,
- metabolic pathways,
- regulation networks.

[Figure: regulation example, the SOS network of E. coli, with genes lexA, dinI, recF, rpD, rpH, SsB, recA, umD, rpS.]

Let us focus on regulatory networks... and look for an influence network.
What questions?

[Diagram: Network splits into Inference (supervised / unsupervised) and Structure (degree distribution / community analysis, via statistical models or spectral clustering).]

- How to find the interactions? What knowledge can the structure provide?
- Given a new node, what are its interactions with the known nodes?
- Given two nodes, do they interact?
- What are the communities' characteristics?
Problem: infer the interactions between genes from microarray data

Microarray gene expression data: p genes, n experiments. Which ones interact/co-express?

[Figure: example graph on genes G0-G9.]

Major issues:
- combinatorics: of the order of 2^{p²} possible graphs;
- dimension problem: n ≪ p.

Here, we reduce p by restricting to a fixed set of genes of interest.
Our ideas to tackle these issues

Introduce a prior taking the topology of the network into account for better edge inference.

[Figure: example graph on genes G0-G9, then reorganized into clusters A, B and C.]

Relying on biological constraints:
1. few genes effectively interact (sparsity),
2. networks are organized (latent structure).
Outline

1. Give the network a model: Gaussian graphical models; providing the network with a latent structure; the complete likelihood.
2. Inference strategy by alternate optimization: the E-step (estimation of the latent structure); the M-step (inferring the connectivity matrix).
3. Numerical experiments: synthetic data; breast cancer data.
GGMs: general settings

The Gaussian model
- Let X ∈ R^p be a random vector such that X ∼ N(0_p, Σ);
- let (X^1, …, X^n) be an i.i.d. size-n sample (e.g., microarray experiments);
- let X be the n × p matrix such that (X^k)ᵀ is the kth row of X;
- let K = (K_ij)_{(i,j) ∈ P²} := Σ^{−1} be the concentration matrix.

The graphical interpretation

X_i ⊥⊥ X_j | X_{P∖{i,j}} ⇔ K_ij = 0 ⇔ edge (i,j) ∉ network,

since r_{ij|P∖{i,j}} = −K_ij / √(K_ii K_jj).

K describes the graph of conditional dependencies.
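As a quick numerical illustration of the "K_ij = 0 ⇔ no edge" equivalence, here is a minimal sketch (the 3-variable precision matrix is made up for the example, not taken from the talk):

```python
import numpy as np

# A 3-variable GGM where X1 and X3 are conditionally independent
# given X2, encoded by K[0, 2] == 0 in the concentration matrix.
K = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
Sigma = np.linalg.inv(K)  # the covariance matrix is generally dense

# Partial correlation: r_{ij|rest} = -K_ij / sqrt(K_ii * K_jj)
d = np.sqrt(np.diag(K))
partial_corr = -K / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(Sigma[0, 2] != 0)         # True: marginally correlated...
print(partial_corr[0, 2] == 0)  # True: ...but no edge (0, 2) in the graph
```

The zero pattern of K, not of Σ, is what encodes the graph.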
GGMs and regression: network inference as p independent regression problems

One may use p different linear regressions:

X_i = (X_∖i)ᵀ α + ε, where α_j = −K_ij / K_ii.

Meinshausen and Bühlmann's approach (2006): solve p independent Lasso problems (the ℓ1 norm enforces sparsity):

α̂ = argmin_α (1/n) ‖X_i − X_∖i α‖²_2 + ρ ‖α‖_ℓ1,

where X_i is the ith column of X, and X_∖i is the full matrix with the ith column removed.

Major drawback: a symmetrization step is needed to obtain a final estimate of K.
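The neighborhood-selection scheme above can be sketched as follows; this is an illustrative implementation using scikit-learn's Lasso on synthetic data, with an arbitrary penalty level ρ, not the authors' code:

```python
import numpy as np
from sklearn.linear_model import Lasso

# One Lasso regression per gene; nonzero coefficients give its neighbors.
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))

rho = 0.3  # l1 penalty level (tuning parameter, arbitrary here)
neighbors = {}
for i in range(p):
    y = X[:, i]
    X_minus_i = np.delete(X, i, axis=1)
    alpha_hat = Lasso(alpha=rho).fit(X_minus_i, y).coef_
    others = [j for j in range(p) if j != i]
    neighbors[i] = {others[k] for k in np.flatnonzero(alpha_hat)}

# The estimated neighborhoods need not be symmetric, hence the drawback:
# post-symmetrize with an AND rule (both select each other) or an OR rule.
edges_and = {(i, j) for i in range(p) for j in neighbors[i]
             if i < j and i in neighbors[j]}
```

The final AND/OR choice is exactly the symmetrization step the slide calls a major drawback.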
GGMs and Lasso: solving p penalized regressions ⇔ maximizing the penalized pseudo-likelihood

Consider the approximation P(X) = ∏_{i=1}^p P(X_i | X_∖i).

Proposition. The solution of

K̂ = argmax_{K, K_ij ≠ K_ji} log L̃(X; K) − ρ ‖K‖_ℓ1,   (1)

with

log L̃(X; K) = ∑_{i=1}^p ∑_{k=1}^n log P(X_i^k | X_∖i^k ; K_i),

shares the same null entries as the solution of the p independent penalized regressions.

Those p terms are not independent, since K is not diagonal! Post-symmetrization is still required.
GGMs and penalized likelihood

The penalized likelihood of the Gaussian observations: maximize the criterion

(n/2)(log det(K) − Tr(S_n K)) − ρ ‖K‖_ℓ1,

where S_n is the empirical covariance matrix.

Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian, JMLR, 2008.
Natural generalization: use different penalty parameters for different coefficients:

(n/2)(log det(K) − Tr(S_n K)) − ‖ρ_Z(K)‖_ℓ1,

where ρ_Z(K) = (ρ_{Z_i,Z_j}(K_ij))_{i,j} is a penalty function depending on an unknown underlying structure Z.
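The uniform-penalty criterion is what the graphical Lasso solves; a minimal sketch with scikit-learn's GraphicalLasso, on illustrative data and with an arbitrary alpha playing the role of ρ:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sparse penalized maximum likelihood with a uniform penalty:
# maximize log det K - Tr(S K) - rho * ||K||_1.
rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.standard_normal((n, p))
X[:, 1] += 0.8 * X[:, 0]  # induce one strong partial correlation

model = GraphicalLasso(alpha=0.2).fit(X)  # alpha plays the role of rho
K_hat = model.precision_

# Nonzero off-diagonal entries of K_hat are the inferred edges.
edges = {(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(K_hat[i, j]) > 1e-8}
```

Unlike the regression approach, K_hat is symmetric by construction, so no post-symmetrization is needed.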
The concentration matrix structure: modelling connection heterogeneity

Assumption: there exists a latent structure spreading the vertices into a set Q = {1, …, q, …, Q} of connectivity classes.

The connectivity classes. Denote Z = {Z_i = (Z_i1, …, Z_iQ)}_i, where the Z_iq = 1_{{i ∈ q}} are the latent independent variables, with
- α = {α_q}, the prior group proportions,
- Z_i ∼ M(1, α), a multinomial distribution.

A mixture of Laplace distributions. Assume the K_ij | Z are independent. Then K_ij | {Z_iq Z_jℓ = 1} ∼ f_qℓ(·), where

f_qℓ(x) = (1 / (2λ_qℓ)) exp{ −|x| / λ_qℓ },  q, ℓ ∈ Q.
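This generative prior can be sketched in a few lines (all parameter values are made up): draw the latent classes, then draw each K_ij from the Laplace distribution attached to the class pair.

```python
import numpy as np

rng = np.random.default_rng(2)
p, Q = 6, 2
alpha = np.array([0.5, 0.5])   # prior group proportions
lam = np.array([[1.0, 0.05],   # lambda_{q,l}: large scale within groups,
                [0.05, 1.0]])  # small scale between groups

z = rng.choice(Q, size=p, p=alpha)  # latent classes Z_i
K = np.zeros((p, p))
for i in range(p):
    for j in range(i + 1, p):
        scale = lam[z[i], z[j]]
        K[i, j] = K[j, i] = rng.laplace(loc=0.0, scale=scale)

# A small lambda_{q,l} concentrates K_ij near 0: few edges between groups.
```

The Laplace scales thus encode where sparsity is expected, which is exactly what the structured penalty exploits.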
Some possible structures

[Figure: latent structures ranging from affiliation to bipartite; example graph with clusters A, B and C.]

Example: a modular (affiliation) network, with two kinds of Laplace distributions:
1. intra-cluster (q = ℓ): f_in(·; λ_in);
2. inter-cluster (q ≠ ℓ): f_out(·; λ_out).
Looking for a criterion…

We wish to infer the non-null entries of K given the data. Our strategy is

K̂ = argmax_{K ≻ 0} P(K | X) = argmax_{K ≻ 0} log P(X, K).

Marginalization over Z: because the distribution of K is known conditionally on the structure!

K̂ = argmax_{K ≻ 0} log ∑_{Z ∈ Z} L_c(X, K, Z),

where L_c(X, K, Z) = P(X, K, Z) is the complete-data likelihood.

An EM-like strategy is used hereafter to solve this problem.
The complete likelihood

Proposition.

log L_c(X, K, Z) = (n/2)(log det(K) − Tr(S_n K)) − ‖ρ_Z(K)‖_ℓ1
  − ∑_{i,j ∈ P, i ≠ j; q,ℓ ∈ Q} Z_iq Z_jℓ log(2λ_qℓ) + ∑_{i ∈ P, q ∈ Q} Z_iq log α_q + c,

where S_n is the empirical covariance matrix and ρ_Z(K) = (ρ_{Z_i Z_j}(K_ij))_{(i,j) ∈ P²} is defined by

ρ_{Z_i Z_j}(K_ij) = ∑_{q,ℓ ∈ Q} Z_iq Z_jℓ K_ij / λ_qℓ.
Part concerning K: penalized maximum likelihood with a LASSO-type approach.
Part concerning Z: estimation with a variational approach.
An EM strategy

The conditional expectation to maximize:

Q(K | K^(m)) = E{ log L_c(X, K, Z) | X; K^(m) }
  = ∑_{Z ∈ Z} P(Z | X, K^(m)) log L_c(X, K, Z)
  = ∑_{Z ∈ Z} P(Z | K^(m)) log L_c(X, K, Z).

Problem:
- There is no closed form for Q(K | K^(m)), because P(Z | K) cannot be factorized.
- We use a variational approach to approximate P(Z | K).
Variational estimation of the latent structure (Daudin et al., 2008)

Principle: use an approximation R(Z) of P(Z | K) in factorized form, R_τ(Z) = ∏_i R_{τ_i}(Z_i), where R_{τ_i} is a multinomial distribution with parameters τ_i.

- Maximize a lower bound of the log-likelihood:

  J(R_τ(Z)) = L(X, K) − D_KL(R_τ(Z) ‖ P(Z | K)).

- Using its tractable form, we have

  J(R_τ(Z)) = ∑_Z R_τ(Z) log L_c(X, K, Z) + H(R_τ(Z)).
The first term plays the role of E(log L_c(X, K, Z) | X, K^(m)) in the EM algorithm.
Maximizing J leads to a fixed-point relationship for τ̂.
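The fixed-point iteration for τ̂ can be sketched as follows. The update rule τ_iq ∝ α_q ∏_{j≠i} ∏_ℓ f_qℓ(K_ij)^{τ_jℓ} is the standard mean-field form for this kind of model (an assumption here, not a transcript of the paper's formula), and all numerical values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
p, Q = 6, 2
alpha = np.array([0.5, 0.5])
lam = np.array([[1.0, 0.1], [0.1, 1.0]])
K = rng.laplace(scale=0.5, size=(p, p))
K = (K + K.T) / 2  # symmetric stand-in for the current estimate of K

def log_laplace(x, scale):
    # log density of a centered Laplace with the given scale
    return -np.log(2 * scale) - np.abs(x) / scale

tau = np.full((p, Q), 1.0 / Q)
for _ in range(50):  # iterate the fixed point until stabilization
    log_tau = np.log(alpha)[None, :].repeat(p, axis=0)
    for i in range(p):
        for q in range(Q):
            for j in range(p):
                if j == i:
                    continue
                for l in range(Q):
                    log_tau[i, q] += tau[j, l] * log_laplace(K[i, j], lam[q, l])
    # normalize in log space for numerical stability
    tau = np.exp(log_tau - log_tau.max(axis=1, keepdims=True))
    tau /= tau.sum(axis=1, keepdims=True)
# Each row of tau is an approximate posterior class-membership distribution.
```

In practice the loops would be vectorized; the point is only the shape of the fixed-point update.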
The M-step, seen as a penalized likelihood problem

We aim at solving K̂ = argmax_{K ≻ 0} Q_τ(K), where

Q_τ(K) = (n/2)(log det(K) − Tr(S_n K)) − ‖ρ_τ(K)‖_ℓ1 + Cst.

Friedman, Hastie, Tibshirani. Sparse inverse covariance estimation with the Lasso, Biostatistics, 2007.
Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian, JMLR, 2008.

We deal with a more complex penalty term here.
Let us work on the covariance matrix

Proposition. The maximization problem over K is equivalent to the following problem on the covariance matrix Σ:

Σ̂ = argmax_{‖(Σ − S_n) ·/ P‖_∞ ≤ 1} log det(Σ),

where ·/ is the term-by-term division and

P = (p_ij)_{i,j ∈ P} = (2/n) ∑_{q,ℓ} τ̂_iq τ̂_jℓ / λ_qℓ.

The proof uses some optimization and primal/dual tricks.
A block-wise resolution

Denote

Σ = [ Σ_11  σ_12 ; σ_12ᵀ  Σ_22 ],  S_n = [ S_11  s_12 ; s_12ᵀ  S_22 ],  P = [ P_11  p_12 ; p_12ᵀ  P_22 ],   (2)

where Σ_11 is a (p−1) × (p−1) matrix, σ_12 is a column vector of length p−1, and Σ_22 is a scalar.

Each column of Σ satisfies (by the determinant of the Schur complement)

σ̂_12 = argmin_{‖(y − s_12) ·/ p_12‖_∞ ≤ 1} { yᵀ Σ̂_11⁻¹ y }.
An ℓ1-norm penalized formulation

Proposition. Solving the block-wise problem is equivalent to solving the following dual problem:

min_β ‖ (1/2) Σ̂_11^{1/2} β − Σ̂_11^{−1/2} s_12 ‖²_2 + ‖p_12 ⋆ β‖_ℓ1,

where ⋆ is the term-by-term product. The vectors σ_12 and β are linked by

σ_12 = Σ̂_11 β / 2.

A LASSO-like formulation, solvable with existing low-cost algorithms.
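The dual above is a weighted Lasso, which coordinate descent with soft-thresholding handles directly. Below is a generic weighted-Lasso solver sketched on synthetic data (an illustrative stand-in, not the exact SIMoNe routine; A, b and the weights w are made up):

```python
import numpy as np

def weighted_lasso_cd(A, b, w, n_iter=200):
    """Minimize 0.5 * ||A beta - b||_2^2 + sum_j w_j * |beta_j|
    by cyclic coordinate descent with soft-thresholding."""
    p = A.shape[1]
    beta = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = b - A @ beta + A[:, j] * beta[j]
            z = A[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - w[j], 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 5))
b = A @ np.array([1.5, 0.0, 0.0, -2.0, 0.0]) + 0.01 * rng.standard_normal(30)
beta_hat = weighted_lasso_cd(A, b, w=np.full(5, 1.0))
# Heavier weights w_j (large penalty entries p12) push beta_j toward 0.
```

In the M-step, the per-coordinate weights come from the penalty matrix P, so edges between loosely connected classes are penalized more.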
The full EM algorithm

while Q̂_τ(K̂^(m)) has not stabilized do
    // E-step: latent structure inference
    if m = 1 then
        // First pass
        apply spectral clustering on the empirical covariance S to initialize τ̂
    else
        compute τ̂ via the fixed-point algorithm, using K̂^(m−1)
    end
    // M-step: network inference
    construct the penalty matrix P according to τ̂
    while Σ̂^(m) has not stabilized do
        for each column of Σ̂^(m) do
            compute σ̂_12 by solving the LASSO-like problem with path-wise coordinate optimization
        end
    end
    compute K̂^(m) by block-wise inversion of Σ̂^(m)
    m ← m + 1
end
Simulation settings

Five inference methods:

1. InvCor: edge estimation based on inversion of the empirical correlation matrix.
2. GeneNet (Strimmer et al.): edge estimation based on partial correlation with shrinkage.
3. GLasso (Friedman et al.): edge estimation uses a uniform penalty matrix.
4. "Perfect" SIMoNe (the best results our method can aspire to): edge estimation uses a penalty matrix constructed from the true node classification.
5. SIMoNe (Statistical Inference for MOdular NEtworks): edge estimation uses a penalty matrix constructed, iteratively, from the estimated node classification.
Simulation setup

Simulated graphs:
- Graphs simulated using an affiliation model (two sets of parameters: intra-group and inter-group connections).
- p = 200 nodes, i.e. p(p−1)/2 = 19900 possible interactions.
- 50 graphs (repetitions) were simulated per situation.
- Gene expression data (i.e., Gaussian samples) was then simulated using the sampled graph:
  1. favorable setting (n = 10p),
  2. middle case (n = 2p),
  3. unfavorable setting (n = p/2).

Unstructured graphs: when there is no structure, SIMoNe is comparable to GeneNet and GLasso.
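A sketch of the affiliation-model graph simulation (the connection probabilities p_in and p_out are made-up values, not those of the experiments):

```python
import numpy as np

# Affiliation (modular) random graph: edges appear with probability
# p_in within a group and p_out between groups.
rng = np.random.default_rng(6)
p, Q = 200, 4
p_in, p_out = 0.10, 0.005
z = rng.integers(0, Q, size=p)  # group of each node

same = z[:, None] == z[None, :]
prob = np.where(same, p_in, p_out)
upper = np.triu(rng.random((p, p)) < prob, k=1)
A = (upper | upper.T).astype(int)  # symmetric adjacency, no self-loops

# A can then be turned into a sparse concentration matrix (e.g. by making
# it diagonally dominant) from which Gaussian samples are drawn.
```

This gives the two-parameter intra-/inter-group structure the slide describes.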
Concentration matrix and structure

[Figure: simulation of the structured sparse concentration matrix. Adjacency matrix without (a) and with (b) columns reorganized according to the affiliation structure, and the corresponding graph (c).]
Example of graph recovery (favorable case)

[Figure: theoretical graph and the SIMoNe estimation.]
Precision/Recall curves: definitions

Precision = TP / (TP + FP) = proportion of true positives among all predicted edges.

Recall = TP / (TP + FN) = proportion of true positives among all true edges.
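In code, with a toy pair of edge sets (both sets are illustrative):

```python
# Precision and recall of an inferred edge set against the true edge set.
true_edges = {(0, 1), (1, 2), (2, 3), (0, 3)}
inferred   = {(0, 1), (1, 2), (1, 3)}

tp = len(inferred & true_edges)         # 2 true positives
precision = tp / len(inferred)          # TP / (TP + FP) = 2/3
recall = tp / len(true_edges)           # TP / (TP + FN) = 2/4
print(precision, recall)
```

Sweeping the penalty level trades recall against precision, which is what the curves below plot.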
Precision/Recall curves across settings

The same comparison was run from the favorable to the unfavorable setting: n = 10p, 6p, 3p, 2p, p and p/2.

[Figures: precision/recall curves comparing GeneNet, GLasso, Perfect SIMoNe, SIMoNe and InvCor in each setting.]

- With n ≫ p, Perfect SIMoNe and SIMoNe perform equivalently.
- When 3p > n > p, the structure is partially recovered and SIMoNe improves the edge selection.
- When n ≤ p, all methods perform poorly…
First results on a real dataset: prediction of the outcome of preoperative chemotherapy

Two types of patients:
1. patient response classified as a pathologic complete response (PCR),
2. or residual disease (not PCR).

Gene expression data:
- 133 patients (99 not PCR, 34 PCR),
- 26 genes identified by differential analysis.
[Figures: networks inferred on the 26 selected genes (AMFR, BB_S4, BECNI, BTG3, CA12, CTNND2, E2F3, ERBB4, FGFRIOP, FLJ10916, FLJI2650, GAMT, GFRAI, IGFBP4, JMJD2B, KIA1467, MAPT, MBTP_SI, MELK, METRN, PDGFRA, RAMPI, RRM2, SCUBE2, THRAP2, ZNF552), estimated on the full sample, the not-PCR patients and the PCR patients.]
Conclusions

To sum up:
- We proposed an inference strategy based on a penalization scheme given by an underlying unknown structure.
- The estimation strategy is based on a variational EM algorithm in which a LASSO-like procedure is embedded.
- Preprint on arXiv.
- R package SIMoNe.

Perspectives:
- Consider alternative, more biologically relevant priors: hubs, motifs.
- Time segmentation when dealing with temporal data.
Penalty choice (1)

Let C_i denote the connectivity component of i in the true conditional dependency graph, and Ĉ_i the corresponding component resulting from the estimate K̂.

Proposition. Fix some ε > 0 and choose the penalty parameters λ such that, for all q, ℓ ∈ Q,

2p² F̃_{n−2}( (2 / (n λ_qℓ)) (max_{i≠j} S_ii S_jj − 1/λ²_qℓ)^{−1/2} (n − 2)^{1/2} ) ≤ ε,

where 1 − F̃_{n−2} is the c.d.f. of a Student's t-distribution with n − 2 degrees of freedom. Then

P(∃k, Ĉ_k ⊄ C_k) ≤ ε.   (3)
Penalty choice (2)

It is enough to choose λ_qℓ such that

λ_qℓ(ε) ≥ (2/n) (n − 2 + t²_{n−2}(ε / (2p²)))^{1/2} × ( max_{i≠j : Z_iq Z_jℓ = 1} S_ii S_jj )^{−1/2} t_{n−2}(ε / (2p²))^{−1}.
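Assuming the bound is applied with S the empirical covariance and the maximum taken over all pairs i ≠ j, it can be computed with SciPy's Student quantile (inputs are illustrative):

```python
import numpy as np
from scipy.stats import t as student_t

def lambda_threshold(S, n, eps):
    """Slide's lower bound on the Laplace scale:
    (2/n) * sqrt(n - 2 + t^2) * (max_{i!=j} S_ii S_jj)^(-1/2) / t,
    with t = t_{n-2}(eps / (2 p^2)) the upper Student quantile."""
    p = S.shape[0]
    t_val = student_t.ppf(1.0 - eps / (2 * p ** 2), df=n - 2)
    d = np.sort(np.diag(S))[::-1]
    max_prod = d[0] * d[1]  # max of S_ii * S_jj over distinct i, j
    return (2.0 / n) * np.sqrt(n - 2 + t_val ** 2) / (np.sqrt(max_prod) * t_val)

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 20))
S = np.cov(X, rowvar=False)
lam_min = lambda_threshold(S, n=100, eps=0.05)
```

The maximum of S_ii S_jj over i ≠ j is just the product of the two largest diagonal entries, hence the sort.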
Penalty choice (3)

Practically,

- relax the λ_qℓ in the E-step (variational inference), thus making the E-step a variational EM step;
- fix the λ_qℓ in the M-step, adapting the above rule to the context. E.g., for an affiliation structure, we fix the ratio λ_in/λ_out = 1.2 and either let the value 1/λ_in vary when considering precision/recall curves for synthetic data, or fix this parameter relying on the above rule when dealing with real data.