An Introduction to Biological Network Inference via Gaussian Graphical Models
Christophe Ambroise, Julien Chiquet
Statistique et Génome, CNRS & Université d'Évry Val d'Essonne
São Paulo – School on Advanced Science – October 2012
http://stat.genopole.cnrs.fr/~cambroise
Outline

Introduction
  Motivations
  Background on omics
  Modeling issue

Modeling tools
  Statistical dependence
  Graphical models
  Covariance selection and Gaussian vectors

Gaussian Graphical Models for genomic data
  Steady-state data
  Time-course data

Statistical inference
  Penalized likelihood approach
  Inducing sparsity and regularization
  The Lasso

Application in post-genomics
  Modeling time-course data
  Illustrations
  Multitask learning
Real networks
- Many scientific fields: World Wide Web; biology, sociology, physics.
- Nature of the data under study: interactions between N objects, with O(N²) possible interactions.
- Network topology: describes the way nodes interact and the structure/function relationship.
Figure: sample of 250 blogs (nodes) with their links (edges) from the French political blogosphere.
What the reconstructed networks are expected to be¹ (1)
Regulatory networks
The E. coli regulatory network:
- relationships between genes and their products,
- inhibition/activation,
- impossible to recover at large scale,
- always incomplete.
Figure 1: regulatory network of E. coli, displayed with one color per MixNet class (5 groups).
¹ and are presumably wrongly assumed to be
What the reconstructed networks are expected to be (2)
Regulatory networks
Figure: regulatory network identified in mammalian cells: highly structured.
What the reconstructed networks are expected to be (3)
Protein–protein interaction networks
Figure: yeast PPI network: do not be misled by the representation, trust the statistics!
What are we looking at?
Central dogma of molecular biology
DNA → mRNA (transcription) → proteins (translation), with DNA replication.
Proteins
- are the building blocks of any cellular functionality,
- are encoded by the genes,
- do interact (at the protein and at the gene level – regulations).
What questions in functional genomics? (1)
Various levels/scales of study
- genome: sequence analysis,
- transcriptome: gene expression levels,
- proteome: protein functions and interactions.
Questions
1. Biological understanding: mechanisms of diseases; gene/protein functions and interactions.
2. Medical/clinical care: diagnosis (type of disease), prognosis (survival analysis), treatment (prediction of response).
What questions in functional genomics? (2)
Central dogma of molecular biology
DNA → mRNA (transcription) → proteins (translation), with DNA replication.
Basic biostatistical issues
- selecting some genes of interest (biomarkers),
- looking for interactions between them (pathway analysis).
How is this measured? (1)
Microarray technology: parallel measurement of many biological features
Signal processing and pretreatment yield the matrix of features (n ≪ p)

$$X = \begin{pmatrix}
x_{11} & x_{21} & x_{31} & \dots & x_{p1} \\
\vdots & & & & \vdots \\
x_{1n} & x_{2n} & x_{3n} & \dots & x_{pn}
\end{pmatrix}$$

where the expression levels of p probes are simultaneously monitored for n individuals.
How is this measured? (2)
Next-generation sequencing: parallel measurement of even more biological features
Assembling and pretreatment yield the matrix of features (n ≪ p)

$$X = \begin{pmatrix}
k_{11} & k_{21} & k_{31} & \dots & k_{p1} \\
\vdots & & & & \vdots \\
k_{1n} & k_{2n} & k_{3n} & \dots & k_{pn}
\end{pmatrix}$$

where the expression counts are extracted from small repeated sequences and monitored for n individuals.
What questions are we dealing with? (1)
Supervised canonical example at the gene level: differential analysis
Leukemia (Golub data, thanks to P. Neuvial):
- AML – acute myeloblastic leukemia, n₁ = 11,
- ALL – acute lymphoblastic leukemia, n₂ = 27,
- a vector of n₁ + n₂ outcomes recording each patient's tumor type.
Supervised classification: find genes with significantly different expression levels between the groups – biomarkers.
→ prediction purpose
What questions are we dealing with? (2)
Unsupervised canonical example at the gene level: hierarchical clustering
Same kind of data, but no outcome is considered.
(Unsupervised) clustering: find groups of genes which show statistical dependencies/commonalities – hoping for biological interactions.
→ exploratory purpose, functional understanding
Can we do better than that? And how do genes interact anyway?
The problem at hand
≈ 10s/100s of microarray/sequencing experiments and ≈ 1000s of probes ("genes") → inference.
Modeling questions prior to inference
1. What do the nodes represent? (the easiest one)
2. What is/should be the meaning of an edge? (the toughest one) Biologically? Statistically?
More questions/issues
Modelling
- Is the network dynamic or static?
- How have the data been generated? (time-course/steady-state)
- Are the edges oriented or not? (causality)
- What do the edges represent for my particular problem?
Statistical challenges
- (ultra) high dimensionality,
- noisy data, lack of reproducibility,
- heterogeneity of the data (many techniques, various signals).
Canonical model settings
Biological microarrays in comparable conditions
Notations
1. a set P = {1, ..., p} of p variables: these are typically the genes (they could also be proteins);
2. a sample N = {1, ..., n} of individuals associated with the variables: these are typically the microarrays (they could also be sequence counts).
Basic statistical model
This can be viewed as
- a random vector X in ℝᵖ, whose jth entry is the jth variable;
- an n-size sample (X¹, ..., Xⁿ), such that Xⁱ is the ith microarray, whose copies
  - could be independent and identically distributed (steady-state data),
  - could be dependent in a certain way (time-course data);
- a parametric probability distribution assumed for X (Gaussian).
The data
Stacking (X¹, ..., Xⁿ), we recover the usual individual/variable table

$$X = \begin{pmatrix}
x_{11} & x_{21} & x_{31} & \dots & x_{p1} \\
\vdots & & & & \vdots \\
x_{1n} & x_{2n} & x_{3n} & \dots & x_{pn}
\end{pmatrix}$$
Modeling relationships between variables (1)
Independence
Definition (independence of events)
Two events A and B are independent if and only if
P(A, B) = P(A) P(B),
which is usually denoted by A ⊥⊥ B. Equivalently,
- A ⊥⊥ B ⇔ P(A|B) = P(A),
- A ⊥⊥ B ⇔ P(A|B) = P(A|Bᶜ).
Example (class vs. party)

  class \ party   Labour   Tory        class \ party   Labour   Tory
  working          0.42    0.28        working          0.60    0.40
  bourgeoisie      0.06    0.24        bourgeoisie      0.20    0.80

Table: joint probabilities P(class, party) (left) vs. conditional probabilities P(party | class) (right).
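A quick numerical check of the table above, as a minimal sketch in base R (the joint probabilities are those of the slide):

joint <- matrix(c(0.42, 0.28,
                  0.06, 0.24),
                nrow = 2, byrow = TRUE,
                dimnames = list(class = c("working", "bourgeoisie"),
                                party = c("Labour", "Tory")))
p.class <- rowSums(joint)      # marginal P(class) = (0.70, 0.30)
p.party <- colSums(joint)      # marginal P(party) = (0.48, 0.52)
outer(p.class, p.party)        # what the joint would be under independence
joint                          # P(working, Labour) = 0.42, not 0.70 * 0.48 = 0.336: dependent
sweep(joint, 1, p.class, "/")  # recovers the conditional table P(party | class)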
Modeling relationships between variables (2)
Conditional independence
Generalizing independence to more than two events requires strong assumptions (mutual independence). It is better handled with:
Definition (conditional independence of events)
Two events A and B are conditionally independent given C if and only if
P(A, B | C) = P(A|C) P(B|C),
which is usually denoted by A ⊥⊥ B | C.
Example (does IQ depend on weight?)
Consider the events A = "having a low IQ" and B = "having a low weight". Estimating² P(A, B), P(A) and P(B) in a sample would lead to
P(A, B) ≠ P(A) P(B).
But in fact, introducing C = "having a given age",
P(A, B | C) = P(A|C) P(B|C).
² stupidly
Independence of random vectors (1)
Independence and conditional independence: natural generalizations
Definition
Consider three random vectors X, Y, Z with marginal distributions f_X, f_Y, f_Z and joint distributions f_XY, f_XYZ. Then
- X and Y are independent iff f_XY(x, y) = f_X(x) f_Y(y);
- X and Y are conditionally independent given Z iff, for all z such that f_Z(z) > 0, f_{XY|Z}(x, y; z) = f_{X|Z}(x; z) f_{Y|Z}(y; z).
Proposition (factorization criterion)
X and Y are independent (resp. conditionally independent given Z) iff there exist functions g and h such that, for all x and y,
1. f_XY(x, y) = g(x) h(y),
2. f_XYZ(x, y, z) = g(x, z) h(y, z) for all z such that f_Z(z) > 0.
Independence of random vectors (2)
Independence vs. conditional independence
Figure: joint densities illustrating mutual independence (f = f_X f_Y f_Z), conditional independence (f with X ⊥⊥ Z | Y, f with X ⊥⊥ Y | Z, f with Y ⊥⊥ Z | X), and full dependence (f_XYZ).
Definition
A graphical model gives a graphical (intuitive) representation of the dependence structure of a probability distribution.
Graphical structure ↔ random variables/random vector
It links
1. a random vector (or a set of random variables) X = {X₁, ..., X_p} with distribution P,
2. a graph G = (P, E), where P = {1, ..., p} is the set of nodes, one per variable, and E is a set of edges describing the dependence relationships of X ~ P.
Conditional independence graphs
Definition
The conditional independence graph of a random vector X is the undirected graph G = (P, E) with node set P = {1, ..., p}, where
(i, j) ∉ E ⇔ X_i ⊥⊥ X_j | X_{P∖{i,j}}.
Property
It satisfies the Markov property: any two subsets of variables separated by a third one are independent conditionally on the variables of the third set.
Conditional independence graphs
An example
Let X₁, X₂, X₃, X₄ be four random variables with joint probability density function f_X(x) = exp(u + x₁ + x₁x₂ + x₂x₃x₄), with u a given constant.
Apply the factorization criterion:
f_X(x) = exp(u) · exp(x₁ + x₁x₂) · exp(x₂x₃x₄).
The term x₁x₂ links nodes 1 and 2, and the term x₂x₃x₄ makes {2, 3, 4} a clique.
Graphical representation: G = (P, E) with P = {1, 2, 3, 4} and
E = {(1, 2), (2, 3), (2, 4), (3, 4)}.
Directed acyclic conditional independence graphs (DAGs)
Motivation
Limitation of undirected graphs: sometimes an ordering of the variables is known, which allows one to break the symmetry of the graphical representation and introduce, in some sense, "causality" into the modeling.
Consequences
- each element of E has to be directed,
- there is no directed cycle in the graph.
We thus deal with a directed acyclic graph (DAG).
Directed acyclic conditional independence graphs (DAGs)
Definition
Definition (ordering)
An ordering of the variables {1, ..., p} is a relation ≺ such that: i) for every couple (i, j), either i ≺ j or j ≺ i; ii) ≺ is transitive; iii) ≺ is not reflexive.
- A natural ordering is obtained when the variables are observed across time;
- a natural conditioning set for a pair of variables (i, j) with i ≺ j is the past of j, denoted P(j) = {1, ..., j}.
Definition (DAG)
The directed conditional dependence graph of X is the directed graph G = (P, E≺) where, for i ≺ j,
(i, j) ∉ E ⇔ X_j ⊥⊥ X_i | X_{P(j)∖{i,j}}.
Directed acyclic conditional independence graphs (DAGs)
Factorization and Markov property
Another view relies on parent/descendant relationships to deal with the ordering of the nodes.
The factorization property:
$$f_X(x) = \prod_{k=1}^{p} f_{X_k | \mathrm{pa}_k}(x_k \,|\, \mathrm{pa}_k),$$
where pa_k are the parents of node k.
Directed acyclic conditional independence graphs (DAGs)
An example
Take seven nodes x₁, ..., x₇ with arrows 1 → 4, 2 → 4, 3 → 4, 1 → 5, 3 → 5, 4 → 6, 4 → 7 and 5 → 7; then, term by term,
f_X(x) = f_{X₁} · f_{X₂} · f_{X₃} · f_{X₄|X₁,X₂,X₃} · f_{X₅|X₁,X₃} · f_{X₆|X₄} · f_{X₇|X₄,X₅}.
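The factorization can be illustrated by simulation. Below is a minimal linear-Gaussian sketch of this DAG: the coefficients are arbitrary, only the parent sets match the example above.

set.seed(42)
n  <- 1000
x1 <- rnorm(n)                                    # f(x1)
x2 <- rnorm(n)                                    # f(x2)
x3 <- rnorm(n)                                    # f(x3)
x4 <- 0.8 * x1 - 0.5 * x2 + 0.7 * x3 + rnorm(n)   # f(x4 | x1, x2, x3)
x5 <- 0.6 * x1 + 0.4 * x3 + rnorm(n)              # f(x5 | x1, x3)
x6 <- 0.9 * x4 + rnorm(n)                         # f(x6 | x4)
x7 <- 0.5 * x4 - 0.6 * x5 + rnorm(n)              # f(x7 | x4, x5)
# Local Markov property, empirically: given its parent x4, node x6 is
# independent of the non-descendant x1, so x1 adds nothing to this regression:
summary(lm(x6 ~ x4 + x1))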
Directed acyclic conditional independence graphs (DAGs)
Markov property
Local Markov property: for any Y that is not a descendant of k (Y ∉ de_k, where de_k denotes the descendants of k),
X_k ⊥⊥ Y | pa_k,
that is, X_k is conditionally independent of its non-descendants given its parents.
Local Markov property: example
In a DAG where pa₄ = {x₂, x₃} and pa₅ = {x₃}, check that x₄ ⊥⊥ x₅ | {x₂, x₃} by using the factorization property:
$$P(x_4 \,|\, x_5, x_2, x_3) = \frac{P(x_2, x_3, x_4, x_5)}{P(x_2, x_3, x_5)}
= \frac{P(x_2)\, P(x_3)\, P(x_4 | x_2, x_3)\, P(x_5 | x_3)}{P(x_2)\, P(x_3)\, P(x_5 | x_3)}
= P(x_4 \,|\, x_2, x_3).$$
Modeling the genomic data
Gaussian assumption
The data: the individual/variable table X introduced above, from which we infer the network.
Assuming f_X multivariate Gaussian greatly simplifies the inference:
- it naturally links independence and conditional independence to the covariance and the partial covariance,
- it gives a straightforward interpretation to the graphical modeling previously considered.
Start gently with the univariate Gaussian distribution
The Gaussian distribution is the natural model for the expression level of a gene (noisy data).
We write X ~ N(μ, σ²), so that E X = μ, Var X = σ², and
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\},$$
and
$$\log f_X(x) = -\log\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2}(x-\mu)^2.$$
Useless for modeling the distribution of the expression levels of a whole bunch of genes.
One step forward: the bivariate Gaussian distribution
We need the concepts of covariance and correlation.
Let X, Y be two real random variables.
Definitions
cov(X, Y) = E[(X − EX)(Y − EY)] = E(XY) − E(X) E(Y),
ρ_XY = cor(X, Y) = cov(X, Y) / √(Var(X) · Var(Y)).
Proposition
- cov(X, X) = Var(X) = E[(X − EX)²],
- cov(X + Y, Z) = cov(X, Z) + cov(Y, Z),
- Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y),
- X ⊥⊥ Y ⇒ cov(X, Y) = 0,
- X ⊥⊥ Y ⇔ cov(X, Y) = 0 when (X, Y) is Gaussian.
The bivariate Gaussian distribution
$$f_{XY}(x, y) = \frac{1}{2\pi\sqrt{\det\Sigma}} \exp\left\{-\frac{1}{2}\begin{pmatrix} x-\mu_1 & y-\mu_2 \end{pmatrix} \Sigma^{-1} \begin{pmatrix} x-\mu_1 \\ y-\mu_2 \end{pmatrix}\right\},$$
where Σ is the variance/covariance matrix, which is symmetric and positive definite:
$$\Sigma = \begin{pmatrix} \mathrm{Var}(X) & \mathrm{cov}(X, Y) \\ \mathrm{cov}(Y, X) & \mathrm{Var}(Y) \end{pmatrix}.$$
If the variables are standardized,
$$\Sigma = \begin{pmatrix} 1 & \rho_{XY} \\ \rho_{XY} & 1 \end{pmatrix}
\quad\text{and}\quad
f_{XY}(x, y) = \frac{1}{2\pi\sqrt{1-\rho_{XY}^2}} \exp\left\{-\frac{x^2 - 2\rho_{XY}\,xy + y^2}{2(1-\rho_{XY}^2)}\right\},$$
where ρ_XY is the correlation between X and Y and describes the interaction between them.
The bivariate Gaussian distribution
The covariance matrix: let X ~ N(0, Σ) with unit variances and either ρ_XY = 0, i.e.
$$\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\qquad\text{or}\qquad \rho_{XY} = 0.9, \text{ i.e. }
\Sigma = \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix}.$$
The shape of the 2-D distribution evolves accordingly, from a spherical cloud to a cloud elongated along the diagonal.
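A small sketch of this fact, assuming the MASS package is available: simulate and plot the two clouds.

library(MASS)
set.seed(1)
Sigma0 <- diag(2)                           # rho = 0
Sigma9 <- matrix(c(1, 0.9, 0.9, 1), 2, 2)   # rho = 0.9
cloud0 <- mvrnorm(500, mu = c(0, 0), Sigma = Sigma0)
cloud9 <- mvrnorm(500, mu = c(0, 0), Sigma = Sigma9)
par(mfrow = c(1, 2))
plot(cloud0, asp = 1, main = "rho = 0",   xlab = "x", ylab = "y")
plot(cloud9, asp = 1, main = "rho = 0.9", xlab = "x", ylab = "y")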
Full generalization: the multivariate Gaussian vector
We now need partial covariance and partial correlation.
Let X, Y, Z be real random variables.
Definitions
cov(X, Y | Z) = cov(X, Y) − cov(X, Z) cov(Y, Z) / Var(Z),
$$\rho_{XY|Z} = \frac{\rho_{XY} - \rho_{XZ}\,\rho_{YZ}}{\sqrt{1-\rho_{XZ}^2}\,\sqrt{1-\rho_{YZ}^2}}.$$
These give the interaction between X and Y once the effect of Z has been removed.
Proposition
When X, Y, Z are jointly Gaussian,
cov(X, Y | Z) = 0 ⇔ cor(X, Y | Z) = 0 ⇔ X ⊥⊥ Y | Z.
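As a sanity check of the partial-correlation formula, here is a minimal sketch where X and Y are linked only through Z, so that ρ_{XY|Z} should be near 0 while ρ_XY clearly is not:

set.seed(2)
n <- 10000
z <- rnorm(n)
x <-  2 * z + rnorm(n)
y <- -1 * z + rnorm(n)
r <- cor(cbind(x, y, z))
# plug-in version of the formula above: close to 0
(r["x", "y"] - r["x", "z"] * r["y", "z"]) /
  (sqrt(1 - r["x", "z"]^2) * sqrt(1 - r["y", "z"]^2))
cor(x, y)   # marginal correlation: clearly non-zero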
The multivariate Gaussian distribution
It allows us to model the expression levels of a whole set of genes P.
Gaussian vector: let X ~ N(μ, Σ), and consider the block decomposition associated with any partition {a, b} of P:
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}.$$
Then
1. X_a is Gaussian with distribution N(μ_a, Σ_aa);
2. X_a | X_b = x is Gaussian with distribution N(μ_{a|b}, Σ_{a|b}), whose parameters are known in closed form: μ_{a|b} = μ_a + Σ_ab Σ_bb⁻¹ (x − μ_b) and Σ_{a|b} = Σ_aa − Σ_ab Σ_bb⁻¹ Σ_ba.
Steady-state data: scheme
≈ 10s of microarrays and ≈ 1000s of probes ("genes") → inference: which interactions?
Modeling the underlying distribution (1)
Model for data generation
- A microarray can be represented as a multivariate vector X = (X₁, ..., X_p) ∈ ℝᵖ;
- consider n biological replicates in the same condition, which form a usual n-size sample (X¹, ..., Xⁿ).
Consequence: a Gaussian graphical model
- X ~ N(μ, Σ), with X¹, ..., Xⁿ i.i.d. copies of X;
- Θ = (θ_ij)_{i,j∈P} := Σ⁻¹ is called the concentration matrix.
Modeling the underlying distribution (2)
Interpretation as a GGM
Multivariate Gaussian vector and covariance selection:
$$-\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}} = \mathrm{cor}\left(X_i, X_j \,|\, X_{P\setminus\{i,j\}}\right) = \rho_{ij|P\setminus\{i,j\}}.$$
Graphical interpretation: the matrix Θ = (θ_ij)_{i,j∈P} encodes the network G we are looking for:
a conditional dependency between X_i and X_j, i.e. a non-null partial correlation between X_i and X_j,
⇔ θ_ij ≠ 0 ⇔ an edge between i and j in G.
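The correspondence between Θ and G can be checked on a toy covariance matrix; a minimal sketch where genes 1–2–3 form a chain, so that θ₁₃ = 0:

Sigma <- matrix(c(1.00, 0.50, 0.25,
                  0.50, 1.00, 0.50,
                  0.25, 0.50, 1.00), 3, 3)   # toy covariance of a chain 1 - 2 - 3
Theta <- solve(Sigma)                        # concentration matrix
# rho_ij|rest = - theta_ij / sqrt(theta_ii * theta_jj)
partial.cor <- -Theta / sqrt(outer(diag(Theta), diag(Theta)))
diag(partial.cor) <- 1
round(partial.cor, 3)      # the (1,3) entry is 0: no edge between 1 and 3
A <- abs(Theta) > 1e-10    # adjacency matrix of G ...
diag(A) <- FALSE           # ... ignoring the diagonal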
Time-course data: scheme
t₀, t₁, ..., t_n: ≈ 10s of microarrays over time and ≈ 1000s of probes ("genes") → inference: which interactions?
Modeling time-course data with DAGs
Collecting gene expression
1. Follow-up of one single experiment/individual;
2. time points close enough to ensure (i) dependency between consecutive measurements and (ii) homogeneity of the Markov process.
The network G drawn over the genes X₁, ..., X₅ stands for the unrolled DAG whose arrows go from the variables X_i^t at time t to the variables X_j^{t+1} at time t+1, repeated along the n time points.
DAGs: remark
A graph over X₁, ..., X₅ with a cycle ("argh, there is a cycle!") is not allowed as such, whereas its time-unrolled version, with arrows from the X_i^t to the X_j^{t+1}, is indeed a DAG.
Unrolling time thus overcomes the rather restrictive acyclicity requirement.
Modeling the underlying distribution (1)
Model for data generation
A microarray can be represented as a multivariate vector X = (X₁, ..., X_p) ∈ ℝᵖ, generated through a first-order vector autoregressive process, VAR(1):
$$X^t = \Theta X^{t-1} + b + \varepsilon^t, \quad t \in \{1, \dots, n\},$$
where ε^t is a white noise ensuring the Markov property, and X⁰ ~ N(0, Σ₀).
Consequence: a Gaussian graphical model
- each X^t | X^{t−1} ~ N(ΘX^{t−1}, Σ);
- or, equivalently, X_j^t | X^{t−1} ~ N(Θ_j X^{t−1}, Σ),
where Σ is known and Θ_j is the jth row of Θ.
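A minimal simulation sketch of this VAR(1) model, with an arbitrary sparse Θ, b = 0 and Gaussian noise (each row of Θ lists the regulators of a gene):

set.seed(3)
p <- 3; n <- 200
Theta <- matrix(c(0.5,  0.3, 0,
                  0.4,  0.0, 0,
                  0.0, -0.6, 0), p, p, byrow = TRUE)
X <- matrix(0, n + 1, p)
X[1, ] <- rnorm(p)
for (t in 1:n)
  X[t + 1, ] <- Theta %*% X[t, ] + rnorm(p, sd = 0.5)
# theta_12 != 0 couples gene 1 at time t with gene 2 at time t - 1:
cor(X[-1, 1], X[-(n + 1), 2])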
Modeling the underlying distribution (2)
I
2
66666666664
X
1t
.
.X
it
.
.
.X
pt
3
77777777775
=
2
6666666666664
✓11 . . . ✓1j . . ✓1p. . . . . . . .. . . . . . . .. . . . . . . .
✓i1 . . . ✓ij . . ✓ip. . . . . . . .. . . . . . . .. . . . . . . .
✓p1 . . . ✓pj . . ✓pp
3
7777777777775
2
66666666664
X
1t�1
.
.
.
X
jt�1..
X
pt�1
3
77777777775
+
2
66666666664
b1
.
.bi
.
.
.bp
3
77777777775
+
2
66666666664
"1t.."it...
"pt
3
77777777775
I Example:
⇥ =
0
@✓11 ✓12 0
✓21 0 0
0 ✓32 0
1
A
Network inference 55
Modeling the underlying distribution (3)
Interpretation as a GGM
The VAR(1) as a covariance selection model:
$$\theta_{ij} = \frac{\mathrm{cov}\left(X_i^t, X_j^{t-1} \,|\, X_{P\setminus j}^{t-1}\right)}{\mathrm{var}\left(X_j^{t-1} \,|\, X_{P\setminus j}^{t-1}\right)}.$$
Graphical interpretation: the matrix Θ = (θ_ij)_{i,j∈P} encodes the network G we are looking for:
a conditional dependency between X_j^{t−1} and X_i^t, i.e. a non-null partial correlation between X_j^{t−1} and X_i^t,
⇔ θ_ij ≠ 0 ⇔ a directed edge from j to i in G.
The graphical models: a reminder¹
Assumption: a microarray can be represented as a multivariate Gaussian vector X.
Collecting gene expression
1. Steady-state data lead to an i.i.d. sample;
2. time-course data give a time series.
Graphical interpretation: a conditional dependency between X(i) and X(j) (or, for time-course data, between X_t(i) and X_{t−1}(j)), i.e. a non-null partial correlation, holds if and only if there is an edge between i and j. This is encoded in an unknown matrix of parameters Θ.
¹ for goldfish-like memories
The maximum likelihood estimator
The natural approach of parametric statistics
Let X be a random vector with distribution defined by f_X(x; Θ), where Θ are the model parameters.
Maximum likelihood estimator
$$\hat\Theta = \arg\max_\Theta \; L(\Theta; \mathbf{X}),$$
where L is the log-likelihood, a function of the parameters:
$$L(\Theta; \mathbf{X}) = \log \prod_{k=1}^{n} f_X(x^k; \Theta),$$
where x^k is the kth row of X.
Remarks
- This is a convex optimization problem;
- we just need to detect the nonzero coefficients of Θ.
The penalized likelihood approach
Let Θ be the parameters to infer (the edges).
A penalized likelihood approach:
$$\hat\Theta_\lambda = \arg\max_\Theta \; L(\Theta; \mathbf{X}) - \lambda\, \mathrm{pen}_{\ell_1}(\Theta),$$
- L is the model log-likelihood,
- pen_{ℓ₁} is a penalty function tuned by λ > 0.
It performs
1. regularization (needed when n ≪ p),
2. selection (sparsity induced by the ℓ₁-norm).
A geometric view of sparsity
Constrained optimization
We basically want to solve a problem of the form
$$\max_{\beta_1, \beta_2} \; f(\beta_1, \beta_2; \mathbf{X}),$$
where f is typically a concave likelihood function. This is strictly equivalent to solving
$$\min_{\beta_1, \beta_2} \; g(\beta_1, \beta_2; \mathbf{X}),$$
where g = −f is convex (for instance, the squared loss of the OLS). The constrained counterpart,
$$\max_{\beta_1, \beta_2} \; f(\beta_1, \beta_2; \mathbf{X}) \quad \text{s.t.} \quad \Omega(\beta_1, \beta_2) \le c,$$
where Ω defines a domain that constrains β, is in turn equivalent to
$$\max_{\beta_1, \beta_2} \; f(\beta_1, \beta_2; \mathbf{X}) - \lambda\, \Omega(\beta_1, \beta_2).$$
How shall we define Ω to induce sparsity?
A geometric view of sparsity
Supporting hyperplanes
A hyperplane supports a set iff
- the set is contained in one of the half-spaces it defines, and
- the set has at least one point on the hyperplane.
There are supporting hyperplanes at every point of a convex set: they generalize tangents.
A geometric view of sparsity
Dual cones
Dual cones generalize normals.
The shape of the dual cones along the boundary of the constraint set ⇒ the sparsity pattern.
The LASSO
R. Tibshirani, 1996. The Lasso: Least Absolute Shrinkage and Selection Operator.
S. Chen, D. Donoho, M. Saunders, 1995. Basis pursuit.
Weisberg, 1980. Forward stagewise regression.
Figure: comparison of the solutions of problems regularized by the ℓ₁ and ℓ₂ norms. On the left, β^{ℓ₁} is the estimator of the ℓ₁-regularized problem: its second component is set to zero, because the elliptical contours of the loss reach the admissible region at the corner lying on the axis β₂ = 0. On the right, β^{ℓ₂} is the estimator of the ℓ₂-regularized problem: the circular shape of the admissible region does not push the coefficients towards exactly null values.
To pursue this discussion with simple yet formal arguments, the coefficients of β^{ℓ₁} and β^{ℓ₂} have closed forms when the matrix X is orthogonal (which corresponds to circular contours of the quadratic loss). For ℓ₂,
$$\beta^{\ell_2}_m = \frac{1}{1+\lambda}\, \beta^{\mathrm{ls}}_m:$$
the coefficients undergo a proportional shrinkage through the factor 1/(1 + λ); in particular, β^{ℓ₂}_m can be null only if the least-squares coefficient β^{ls}_m is itself exactly null. For ℓ₁,
$$\beta^{\ell_1}_m = \mathrm{sign}\left(\beta^{\mathrm{ls}}_m\right) \left(|\beta^{\mathrm{ls}}_m| - \lambda\right)_+, \quad \text{where } (u)_+ = \max(0, u):$$
a "soft" thresholding, where the least-squares coefficients are shrunk by the constant λ when |β^{ls}_m| > λ, and set to zero otherwise.
$$\min_{\beta \in \mathbb{R}^2} \; \|y - \mathbf{X}\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 = |\beta_1| + |\beta_2| \le c
\qquad\Longleftrightarrow\qquad
\min_{\beta \in \mathbb{R}^2} \; \|y - \mathbf{X}\beta\|_2^2 + \lambda \|\beta\|_1.$$
Orthogonal case and link to the OLS
OLS shrinkage: the Lasso has no analytical solution except in the orthogonal case: when XᵀX = I (never the case for real data),
$$\hat\beta_j^{\mathrm{lasso}} = \mathrm{sign}\left(\hat\beta_j^{\mathrm{ols}}\right) \max\left(0, |\hat\beta_j^{\mathrm{ols}}| - \lambda\right).$$
Figure: the Lasso estimate as a soft-thresholded function of the OLS estimate β^{ols}.
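The soft-thresholding operator is a one-liner; a minimal sketch, applied coordinate-wise to an OLS estimate:

soft.threshold <- function(beta.ols, lambda) {
  # the Lasso solution in the orthogonal case
  sign(beta.ols) * pmax(0, abs(beta.ols) - lambda)
}
soft.threshold(c(-3, -0.5, 0.2, 1.5), lambda = 1)
# [1] -2.0  0.0  0.0  0.5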
LARS: least angle regression
B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, 2004. Least Angle Regression.
An efficient algorithm to compute the Lasso solutions: the LARS output is a curve giving the solution β for each value of λ. It
- constructs a piecewise-linear path of solutions starting from the null vector and going towards the OLS estimate,
- costs (almost) the same as the OLS,
- is well adapted to cross-validation (which helps us choose λ).
Example: prostate cancer (I)
Lasso solution path with lars:
> library(lars)
> load("prostate.rda")
> x <- scale(as.matrix(x))
> out <- lars(x, y)
> plot(out)
Example: prostate cancer (II)
Figure: Lasso solution path: the standardized coefficients as a function of |β|/max|β|, with variables entering the active set one at a time.
Choice of the tuning parameter λ (I)
Model selection criteria
$$\mathrm{BIC}(\lambda) = \|y - \mathbf{X}\hat\beta_\lambda\|_2^2 - \mathrm{df}(\hat\beta_\lambda)\,\frac{\log n}{2},
\qquad
\mathrm{AIC}(\lambda) = \|y - \mathbf{X}\hat\beta_\lambda\|_2^2 - \mathrm{df}(\hat\beta_\lambda),$$
where df(β̂_λ) is the number of nonzero entries of β̂_λ.
Cross-validation
1. split the data into K folds,
2. use each of the K folds in turn as the test set,
3. compute the test error on each of the K folds,
4. average to obtain the CV estimate of the test error.
λ is chosen to minimize the CV test error.
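A hand-rolled K-fold CV sketch for the Lasso path, with x and y as in the prostate example; cv.lars(), used on the next slide, does the same thing natively:

K <- 10
folds <- sample(rep(1:K, length.out = nrow(x)))
fractions <- seq(0, 1, length.out = 100)   # grid of |beta|/max|beta| values
cv.err <- sapply(1:K, function(k) {
  fit  <- lars(x[folds != k, ], y[folds != k])
  pred <- predict(fit, x[folds == k, ], s = fractions, mode = "fraction")$fit
  colMeans((y[folds == k] - pred)^2)       # test MSE of fold k, per fraction
})
best <- fractions[which.min(rowMeans(cv.err))]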
Choice of the tuning parameter λ (II)
CV choice of λ:
> cv.lars(x, y, K=10)
Choice of the tuning parameter λ (III)
Figure: cross-validated MSE as a function of the fraction of the final ℓ₁ norm.
Many variations
Group Lasso: activates the variables by groups (given by the user).
Adaptive/weighted Lasso: adjusts the penalty level of each variable, according to prior knowledge or with data-driven weights.
BoLasso: a bootstrapped version that removes false positives/stabilizes the estimate.
Etc., plus many theoretical results.
Problem
t₀, t₁, ..., t_n: ≈ 10s of microarrays over time and ≈ 1000s of probes ("genes") → inference: which interactions?
The main statistical issue is the high-dimensional setting.
Handling the scarcity of the data
By introducing some priors
Priors should be biologically grounded:
1. few genes effectively interact (sparsity),
2. networks are organized (latent clustering).
Figure: a 13-gene network (G1, ..., G13) whose nodes gather into three clusters, A, B and C.
Penalized log-likelihood
Banerjee et al., JMLR 2008:
$$\hat\Theta_\lambda = \arg\max_\Theta \; L_{\mathrm{iid}}(\Theta; S) - \lambda \|\Theta\|_{\ell_1},$$
efficiently solved by the graphical Lasso of Friedman et al., 2008.
Ambroise, Chiquet, Matias, EJS 2009: use adaptive penalty parameters for the different coefficients,
$$\tilde{L}_{\mathrm{iid}}(\Theta; S) - \lambda \|\mathbf{P}_Z \star \Theta\|_{\ell_1},$$
where P_Z is a matrix of weights depending on the underlying clustering Z. This works with the pseudo-log-likelihood L̃ (computationally efficient).
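A minimal sketch with the glasso package (Friedman et al., 2008), which solves the ℓ₁-penalized likelihood problem above from the empirical covariance matrix S; here X denotes an n × p expression matrix and the penalty value 0.1 is an arbitrary choice:

library(glasso)
S <- var(scale(X))            # empirical covariance of the scaled data
fit <- glasso(S, rho = 0.1)   # rho plays the role of lambda
Theta.hat <- fit$wi           # estimated concentration matrix
A <- abs(Theta.hat) > 1e-8    # adjacency matrix of the inferred network G
diag(A) <- FALSE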
Neighborhood selection (1)
Let
- X_i be the ith column of X,
- X_{∖i} be X deprived of X_i.
Then
X_i = X_{∖i} β + ε, where β_j = −θ_ij / θ_ii.
Meinshausen and Bühlmann, 2006: since sign(cor_{ij|P∖{i,j}}) = sign(β_j), select the neighbors of i with
$$\arg\min_\beta \; \frac{1}{n} \left\|X_i - \mathbf{X}_{\setminus i}\,\beta\right\|_2^2 + \lambda \|\beta\|_{\ell_1}.$$
The sign pattern of Θ̂_λ is inferred after a symmetrization step.
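A sketch of neighborhood selection with lars: one Lasso regression per gene, followed by the symmetrization step. X is the n × p scaled expression matrix, and the penalty, expressed here as a fraction of the ℓ₁ path, is an arbitrary choice:

library(lars)
p <- ncol(X)
neighbors <- matrix(FALSE, p, p)
for (i in 1:p) {
  fit  <- lars(X[, -i], X[, i], type = "lasso")
  beta <- coef(fit, s = 0.2, mode = "fraction")   # coefficients at that penalty
  neighbors[i, -i] <- beta != 0
}
A <- neighbors & t(neighbors)   # symmetrization: keep edges selected both ways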
Neighborhood selection (2)
The pseudo-log-likelihood of the i.i.d. Gaussian sample is
$$\tilde{L}_{\mathrm{iid}}(\Theta; S) = \sum_{i=1}^{p} \sum_{k=1}^{n} \log P\left(X^k(i) \,|\, X^k(P \setminus i); \Theta_i\right)
= \frac{n}{2} \log\det(D) - \frac{n}{2} \mathrm{Trace}\left(D^{-1/2}\,\Theta S \Theta\, D^{-1/2}\right) - \frac{n}{2}\log(2\pi),$$
where D = diag(Θ).
Proposition
$$\hat\Theta_\lambda^{\mathrm{pseudo}} = \arg\max_\Theta \; \tilde{L}_{\mathrm{iid}}(\Theta; S) - \lambda \|\Theta\|_{\ell_1}$$
has the same null entries as those inferred by neighborhood selection.
Structured regularization
Introduce prior knowledge
Building the weights:
1. Build w from prior biological information: transcription factors vs. regulatees, number of potential binding sites, KEGG pathways, Gene Ontology, ...
2. Build the weight matrix from a clustering algorithm: infer the network G⁰ with w = 1 for each node, apply a clustering algorithm to G⁰, then re-infer G with w built according to the clustering Z.
Structured regularization
A "Bayesian" interpretation of ℓ₁ regularization
A Laplacian prior on Θ depends on the clustering Z:
$$P(\Theta \,|\, Z) \propto \prod_{k,j} \exp\left\{-\lambda \cdot w_{kj}(Z)\, |\theta_{kj}|\right\}.$$
w_k(Z) summarizes the prior information on the position of the edges of node k (see the sketch below).
Figure: Laplace prior densities on θ_kj for λ w_kj(Z) = 0.5 and λ w_kj(Z) = 1.
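In practice, glasso accepts a full matrix of penalties, so a weight matrix P_Z built from a clustering Z plugs in directly; a sketch with a hypothetical two-cluster weighting (within-cluster edges penalized less), reusing S from the earlier sketch:

Z  <- rep(1:2, length.out = ncol(S))   # some clustering of the genes
PZ <- outer(Z, Z, function(a, b) ifelse(a == b, 0.5, 1.5))
fit.w <- glasso(S, rho = 0.1 * PZ)     # adaptive penalty lambda * P_Z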
Multiple sclerosis
Studying the treatment effect on 31 selected genes
Collaboration started with J.-C. Corvol, La Pitié-Salpêtrière (and C. Charbonnier, C. Ambroise).
The data set
- 31 genes identified as implicated in the MS physiopathology,
- each patient received either a placebo or the treatment at t = 0,
- gene expression recorded by microarray along time for each patient (time-course data),
- full data set: 100 microarrays, 25 patients (≈ 75%/25% treatment/placebo), 4 time points.
Goal: comparing the networks inferred under treatment and under placebo.
Multiple sclerosis
Detecting changes in the regulatory networks
Figure: the networks inferred over the 31 selected genes (IL2RA, ITGAM, JAK1, STAT5A, STAT3, ITGA6, IL2RB, CD58, ITGB2, ITGAL, VCAM1, IL2, TOB1, JAK3, IL7, IL2RG, IL21, STAT5B, ICAM1, ITGA4, and duplicated probes thereof) under the active treatment and under the placebo, together with their intersection and their difference.
- Better insight than differential analysis,
- the networks try to tell us how genes vary together,
- the hope is that sometimes an edge describes a regulation.
E. coli S.O.S. DNA repair network: the E. coli S.O.S. data
Figure: first page of the data's source article, M. Ronen, R. Rosenberg, B. I. Shraiman and U. Alon, "Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics", PNAS 99(16):10555–10560, 2002. The data are high temporal-resolution GFP-reporter measurements of promoter activities in the SOS DNA repair system of Escherichia coli.
▶ 8 major genes involved in the S.O.S. response: uvrD, lexA, umuDC, recA, uvrA, uvrY, ruvA, polB,
▶ 50 time points.
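To illustrate how a directed network can be read off such time-course data, here is a hedged sketch of one standard approach, an l1-penalized first-order autoregression, not necessarily the exact estimator benchmarked on the next slides: each gene at time t is Lasso-regressed on all genes at the previous time point, using the CRAN package glmnet; lambda is an arbitrary illustrative value.

library(glmnet)

## X: 50 x 8 matrix of expression levels (rows = time points, columns = genes)
var1_lasso <- function(X, lambda = 0.1) {
  past <- X[-nrow(X), , drop = FALSE]  # predictors: X_{t-1}
  pres <- X[-1, , drop = FALSE]        # responses:  X_t
  A <- sapply(seq_len(ncol(X)), function(j) {
    fit <- glmnet(past, pres[, j], lambda = lambda)
    as.numeric(as.matrix(coef(fit))[-1] != 0)  # keep the support, drop intercept
  })
  dimnames(A) <- list(colnames(X), colnames(X))
  A                                    # A[i, j] = 1 reads "gene i -> gene j"
}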
E. coli S.O.S. DNA repair network: precision and recall rates
Figure: precision and recall rates over the four experiments (Exp. 1 to Exp. 4) for Lasso, Adaptive, G1DBN, Renet, and the structured approach with known (KnownCl) or inferred (InferCl) classification.
E. coli S.O.S. DNA repair network: inferred networks
Figure: networks over uvrD, lexA, umuDC, recA, uvrA, uvrY, ruvA and polB, as inferred by the Lasso, and by the structured approach with known vs. inferred classification.
Reasonable computing time: comparing computation times with the multi-test approach
Figure: computing times on the log-log scale (log of CPU time in seconds, from 1 second to 4 hours, against log of p/n) for Renet-VAR, G1DBN and InfCl (including the inference of classes). Intel Dual Core 3.40 GHz processor.
Outline
Introduction: Motivations; Background on omics; Modeling issue
Modeling tools: Statistical dependence; Graphical models; Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data: Steady-state data; Time-course data
Statistical inference: Penalized likelihood approach; Inducing sparsity and regularization; The Lasso
Application in Post-genomics: Modeling time-course data; Illustrations; Multitask learning
Handling the scarcity of data (1): by reducing the number of parameters
Assumption: connections will only appear between informative genes.
▶ A differential analysis selects p key genes, with p "reasonable" compared to n (typically n ∈ [p/5; 5p]).
▶ The learning data set is then made of n size-p expression vectors (X_1, ..., X_n) with X_i ∈ R^p, on which the network is inferred (a toy sketch of the selection step follows).
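A toy sketch of this pre-selection step, assuming a simple two-condition design; a real analysis would rather use moderated statistics and multiple-testing control.

select_key_genes <- function(expr, condition, p = 30) {
  ## expr: n x P expression matrix; condition: two-level factor of length n
  pvals <- apply(expr, 2, function(g) t.test(g ~ condition)$p.value)
  order(pvals)[seq_len(p)]  # column indices of the p most differential genes
}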
Handling the scarcity of data (2): by collecting as many observations as possible
Multitask learning: the same organism is observed under several conditions (drug 1, drug 2, drug 3). How should we merge the data?
▶ By inferring each network independently: one sample (X^(t)_1, ..., X^(t)_{n_t}) with X^(t)_i ∈ R^{p_t}, and one inference, per condition.
▶ By pooling all the available data: a single sample (X_1, ..., X_n) with X_i ∈ R^p and n = n_1 + n_2 + n_3, and a single inference.
▶ By breaking the separability: keep the per-condition samples, but couple their inferences.
Coupling related problems
Consider
▶ T samples concerning the expressions of the same p genes,
▶ X^(t)_1, ..., X^(t)_{n_t} the t-th sample, drawn from N(0_p, Σ^(t)), with empirical covariance matrix S^(t).
Ignoring the relationships between the tasks leads to
\[
  \mathop{\arg\max}_{\Theta^{(t)},\, t=1,\dots,T}\;
  \sum_{t=1}^{T} \Bigl\{ L\bigl(\Theta^{(t)}; S^{(t)}\bigr)
  - \lambda\, \mathrm{pen}_{\ell_1}\bigl(\Theta^{(t)}, Z\bigr) \Bigr\}.
\]
Breaking the separability: either by modifying the objective function, or the constraints (a sketch of one such coupling follows).
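As one concrete way of coupling the tasks, a hedged sketch, not necessarily the authors' exact estimator: multitask neighborhood selection for a target gene j, where the T per-task regressions are stacked into a block-diagonal design and a group-Lasso penalty (CRAN package gglasso) ties each predictor's coefficients across the tasks.

library(gglasso)  # group-Lasso solver
library(Matrix)   # bdiag()

multitask_neighborhood <- function(Xs, j, lambda = 0.1) {
  Tn <- length(Xs); p <- ncol(Xs[[1]])          # Xs: list of n_t x p matrices
  y <- unlist(lapply(Xs, function(X) X[, j]))   # stacked responses
  D <- as.matrix(bdiag(lapply(Xs, function(X) X[, -j, drop = FALSE])))
  ## reorder columns so the Tn task-copies of each predictor are consecutive
  ord <- as.vector(outer((seq_len(Tn) - 1) * (p - 1), seq_len(p - 1), "+"))
  grp <- rep(seq_len(p - 1), each = Tn)         # one group per predictor
  fit <- gglasso(D[, ord], y, group = grp, loss = "ls", lambda = lambda)
  ## assumes gglasso's usual coef() layout with the intercept first
  matrix(coef(fit)[-1] != 0, nrow = Tn)         # [t, k]: k neighbors j in task t
}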
Coupling problems by grouping variables through the penalty
Figure: toy graph on X1, X2, X3, X4 comparing the group-Lasso (top row) and the cooperative-Lasso (bottom row): left, the potential groups; right, the group(s) induced by edge (1, 2).
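For reference, the two penalties at play can be written as follows (stated here as in the cooperative-Lasso literature of Chiquet, Grandvalet and Ambroise; each group g ∈ G gathers coefficients meant to be activated together, e.g. the task-wise copies of one edge):
\[
\mathrm{pen}_{\mathrm{group}}(\beta) = \sum_{g \in \mathcal{G}} \|\beta_g\|_2,
\qquad
\mathrm{pen}_{\mathrm{coop}}(\beta) = \sum_{g \in \mathcal{G}}
   \bigl( \|(\beta_g)^{+}\|_2 + \|(\beta_g)^{-}\|_2 \bigr),
\]
where (β_g)^+ and (β_g)^- denote the elementwise positive and negative parts of β_g: the group-Lasso switches whole groups on or off, while the cooperative-Lasso can zero the positive and negative parts separately, favoring sign-coherent groups.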
Conclusion
Versatile statistical tools:
▶ Gaussian graphical models,
▶ sparse regression,
▶ model-based graph clustering.
Available packages on CRAN:
▶ simone to infer graphs from various kinds of data,
▶ quadrupen to solve various penalized regression problems efficiently,
▶ osbm and mixer to search for structure in graphs.