
Supervised inference of biological networks
and classification of gene expression data with gene networks

Jean-Philippe Vert (Jean-Philippe.Vert@ensmp.fr)

Centre for Computational Biology, Ecole des Mines de Paris, ParisTech

Institute for Infocomm Research, Singapore, February 14th, 2007

J.-P. Vert (Ecole des Mines) Supervised network inference 1 / 32

Outline

1 Supervised inference of biological networks from heterogeneous genomic data

2 Using gene networks for gene expression data classification


Motivation

Data: gene expression, gene sequence, protein localization, ...

Graph: protein-protein interactions, metabolic pathways, signaling pathways, ...


Strategies

Unsupervised approaches: the graph is completely unknown

- model-based approaches: Bayes nets, dynamical systems, ...
- similarity-based approaches: connect similar nodes

Supervised approaches: part of the graph is known in advance

- prior knowledge in model-based approaches
- statistical / machine learning approaches: learn from the known subnetwork a rule that can predict edges from genomic data


Genomic Data

Data representation as distances: we assume that each type of data (expression, sequences, ...) defines a (negative definite) distance between genes. Many such distances exist (cf. kernel methods). Data integration is easily obtained by summing the distances to obtain an "integrated" distance.


Method 1: Direct similarity-based prediction

Motivation: "connect similar genes". Connect a and b if d(a, b) is below a threshold. This is an unsupervised approach (no use of the known subnetwork).
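The thresholding rule above can be sketched in a few lines (a minimal illustration with a hypothetical toy distance matrix, not a method from the talk):

```python
import numpy as np

def direct_inference(D, threshold):
    """Predict an edge between genes a and b whenever d(a, b) < threshold.

    D: (n, n) symmetric matrix of pairwise distances between genes.
    Returns a boolean adjacency matrix (no self-loops).
    """
    A = D < threshold
    np.fill_diagonal(A, False)  # a gene is never connected to itself
    return A

# Toy example: 3 genes, genes 0 and 1 are close in the data
D = np.array([[0.0, 0.2, 0.9],
              [0.2, 0.0, 0.8],
              [0.9, 0.8, 0.0]])
A = direct_inference(D, threshold=0.5)  # predicts only the edge 0-1
```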



Method 2: metric learning

Metric learning. Motivation: use the known subnetwork to refine the distance measure, before applying the similarity-based method. Based on kernel CCA (Yamanishi et al., 2004) or kernel metric learning (V. and Yamanishi, 2005).


Metric learning example

Kernel metric learning (V. and Yamanishi, 2005). Criterion: connected points should be near each other after mapping to a new d-dimensional Euclidean space. Add regularization to deal with high dimensions. Mapping f(x) = (f_1(x), ..., f_d(x)) with:

f_i = \arg\min_{f \perp f_1, \dots, f_{i-1},\ \mathrm{var}(f)=1} \sum_{i \sim j} \left( f(x_i) - f(x_j) \right)^2 + \lambda \| f \|_k^2 .

Interpolates between (kernel) PCA (λ = ∞) and graph embedding (λ = 0). Equivalent to a generalized eigenvalue problem.
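The generalized-eigenvalue view can be sketched numerically. This is a rough illustration under simplifying assumptions, not the authors' exact formulation: writing f = K α, the criterion becomes α'(KLK + λK)α with a variance constraint on f, which we solve as a generalized eigenproblem on a hypothetical toy graph and RBF kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: n genes with a genomic kernel K and a known
# subnetwork given by its graph Laplacian L = D - A.
n = 8
X = rng.normal(size=(n, 3))  # toy genomic features
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF kernel
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6)]:  # known edges
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

lam = 0.1  # large lam leans toward kernel PCA, small lam toward graph embedding
Aq = K @ L @ K + lam * K           # penalty term in alpha
Bq = K @ K / n + 1e-8 * np.eye(n)  # variance term, ridged for stability

# Solve the generalized problem Aq v = w Bq v via Cholesky whitening
C = np.linalg.cholesky(Bq)
Ci = np.linalg.inv(C)
vals, U = np.linalg.eigh(Ci @ Aq @ Ci.T)  # ascending eigenvalues
vecs = Ci.T @ U
F = K @ vecs[:, :2]  # 2-dimensional embedding of the genes
```

Connected genes end up close in F; the smallest eigenvalues correspond to the smoothest directions on the known subnetwork.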


Metric learning example

Kernel CCA (Yamanishi et al., 2004). Criterion: find a subspace where the graph distance and the genomic data distance match. Formulated as a search for correlated directions (kernel trick).

[Figure: genes g1-g8 shown as the known graph, their genomic-data embedding (kernel), their graph embedding (diffusion kernel), and the correlated directions found by kernel CCA.]


Method 3: Matrix completion

Motivation: fill missing entries in the adjacency matrix directly, by making it similar to (a variant of) the data matrix. Method: EM algorithm based on the information geometry of positive semidefinite matrices (Kato et al., 2005).


Method 4: Supervised binary classification

A pair can be connected (+1) or not connected (-1). Use the known network as a training set for an SVM that predicts whether a new pair is connected. Example: SVM with the tensor product pairwise kernel (Ben-Hur and Noble, 2006):

K_{TPPK}((x_1, x_2), (x_3, x_4)) = K(x_1, x_3) K(x_2, x_4) + K(x_1, x_4) K(x_2, x_3) .
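The pairwise kernel is straightforward to compute from a base gene kernel. A minimal sketch (the base kernel matrix below is a hypothetical example, not data from the talk):

```python
import numpy as np

def tppk(K, i, j, k, l):
    """Tensor product pairwise kernel between gene pairs (i, j) and (k, l),
    given a base gene kernel matrix K (Ben-Hur & Noble, 2006)."""
    return K[i, k] * K[j, l] + K[i, l] * K[j, k]

# Hypothetical base kernel on 4 genes (symmetric PSD matrix)
K = np.array([[1.0, 0.8, 0.1, 0.2],
              [0.8, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.7],
              [0.2, 0.1, 0.7, 1.0]])

# The kernel is symmetric in the order of genes inside each pair,
# which is exactly why the two product terms are summed:
assert tppk(K, 0, 1, 2, 3) == tppk(K, 1, 0, 2, 3)
```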


Method 5: Local predictions

Motivation: define a specific model for each target node, to discriminate between its neighbors and the other nodes. Treat each node independently from the others, then combine the predictions to rank candidate edges.
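The local strategy can be sketched with a deliberately simple local classifier (a nearest-centroid rule instead of the local SVM used in the talk; the toy features and adjacency are hypothetical):

```python
import numpy as np

def local_scores(X, A):
    """Local-prediction sketch: for each target node t, score every node u
    by how much closer u's genomic features are to the centroid of t's
    known neighbors than to the centroid of its known non-neighbors.
    X: (n, p) feature matrix, A: (n, n) known adjacency matrix (0/1)."""
    n = X.shape[0]
    S = np.full((n, n), -np.inf)
    for t in range(n):
        pos = A[t] == 1
        neg = (A[t] == 0) & (np.arange(n) != t)
        if not pos.any():   # no positive examples for this node:
            continue        # the main weakness of local models
        mu_pos = X[pos].mean(axis=0)
        mu_neg = X[neg].mean(axis=0)
        # higher score = closer to the neighbor centroid than to the rest
        S[t] = np.linalg.norm(X - mu_neg, axis=1) - np.linalg.norm(X - mu_pos, axis=1)
    return (S + S.T) / 2    # combine the two local predictions per pair

# Toy data: two tight clusters, one known edge inside each cluster
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 1
A[2, 3] = A[3, 2] = 1
S = local_scores(X, A)  # within-cluster pairs score higher than across
```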


Local predictions: pros and cons

Pros: allows very different models for nearby nodes on the graph; it is faster to train n models with n examples each than one model with n^2 examples.

Cons: few positive examples are available for some nodes.


Experiments

Network:
- metabolic network (668 vertices, 2782 edges)
- protein-protein interaction network (984 vertices, 2438 edges)

Data (yeast):
- gene expression (157 experiments)
- phylogenetic profiles (145 organisms)
- cellular localization (23 intracellular locations)
- yeast two-hybrid data (2438 interactions among 984 proteins)

Method:
- 5-fold cross-validation
- predict edges between the test set and the training set


Results: protein-protein interaction

[Figure: left, ROC curves (ratio of true positives vs. ratio of false positives); right, false discovery rate vs. ratio of true positives, for the methods Direct, kML, kCCA, em, local, and Pkernel.]


Results: metabolic gene network

[Figure: left, ROC curves (ratio of true positives vs. ratio of false positives); right, false discovery rate vs. ratio of true positives, for the methods Direct, kML, kCCA, em, local, and Pkernel.]


Results: effect of data integration

[Figure: left, ROC curves; right, false discovery rate vs. ratio of true positives, for individual data types (exp, loc, phy, y2h) and their integration (int).]

Local SVM, protein-protein interaction network.


Results: effect of data integration

[Figure: left, ROC curves; right, false discovery rate vs. ratio of true positives, for individual data types (exp, loc, phy) and their integration (int).]

Local SVM, metabolic gene network.


Conclusion

Summary: a variety of methods have been investigated recently. Some reach interesting performance on the benchmarks: the local SVM retrieves 45% of all true edges of the metabolic gene network at an FDR below 50%.

The approach is valid for any network, but it is a non-mechanistic model. Future work: experimental validation, improved data integration, semi-local approaches...


Outline

1 Supervised inference of biological networks from heterogeneous genomic data

2 Using gene networks for gene expression data classification


Tumor classification from microarray data

Data available: gene expression measures for more than 10k genes, measured on fewer than 100 samples of two (or more) different classes (e.g., different tumors).

Goal: design a classifier to automatically assign a class to future samples from their expression profiles, and interpret biologically the differences between the classes.


Linear classifiers

The approach: each sample is represented by a vector x = (x_1, ..., x_p), where p > 10^5 is the number of probes. Classification: given the set of labeled samples, learn a linear decision function

f(x) = \sum_{i=1}^{p} \beta_i x_i + \beta_0 ,

that is positive for one class and negative for the other. Interpretation: the weight β_i quantifies the influence of gene i on the classification.
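The decision function is just a weighted vote over genes. A minimal sketch with hypothetical weights (not a trained model):

```python
import numpy as np

def linear_classifier(beta, beta0):
    """Return the decision function f(x) = sum_i beta_i * x_i + beta0;
    the predicted class is the sign of f(x)."""
    def f(x):
        return float(np.dot(beta, x) + beta0)
    return f

# Hypothetical 3-gene example: gene 0 votes for class +1, gene 2 against
f = linear_classifier(beta=np.array([1.5, 0.0, -2.0]), beta0=0.1)
x = np.array([1.0, 0.5, 0.2])     # an expression profile
score = f(x)                      # 1.5*1.0 + 0.0*0.5 - 2.0*0.2 + 0.1
label = 1 if score > 0 else -1
```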


Linear classifiers

Pitfalls: no robust estimation procedure exists for 100 samples in 10^5 dimensions! It is necessary to reduce the complexity of the problem with prior knowledge.


Example 1: Norm Constraints

The approach: a common method in statistics to learn with few samples in high dimension is to constrain the norm of β, e.g.:

- Euclidean norm (support vector machines, ridge regression): ‖β‖_2^2 = \sum_{i=1}^{p} \beta_i^2
- L1-norm (lasso regression): ‖β‖_1 = \sum_{i=1}^{p} |\beta_i|

Pros: good performance in classification.

Cons: limited interpretation (small weights); no prior biological knowledge.
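The shrinking effect of a norm constraint is easy to see with ridge regression, which has a closed form (a sketch on hypothetical simulated data, intercept omitted):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: minimize ||y - X beta||^2 + lam * ||beta||_2^2.
    Closed form: beta = (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical data: 30 samples, 5 features, sparse true weights plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + 0.1 * rng.normal(size=30)

b_small = ridge_fit(X, y, lam=0.01)
b_large = ridge_fit(X, y, lam=100.0)
# Heavier regularization shrinks the weight vector toward zero
assert np.linalg.norm(b_large) < np.linalg.norm(b_small)
```

With an L1 penalty the same shrinkage also drives many weights exactly to zero, which is what the feature-selection approach on the next slide exploits.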


Example 2: Feature Selection

The approach: constrain most weights to be 0, i.e., select a few genes (< 20) whose expression is enough for classification. Interpretation is then about the selected genes.

Pros: good performance in classification; useful for biomarker selection; apparently easy interpretation.

Cons: the gene selection process is usually not robust; wrong interpretation is the rule (too much correlation between genes).


Pathway interpretation

Motivation: basic biological functions are usually expressed in terms of pathways (metabolic, signaling, regulatory), not single genes. Many pathways are already known. How can this prior knowledge be used to constrain the weights, so as to obtain an interpretation at the level of pathways?


Pathway-derived norm constraint

One solution (Rapaport et al., 2007): let the set of pathways be represented by an undirected graph, and consider the pathway-derived norm

\Omega(\beta) = \sum_{i \sim j} (\beta_i - \beta_j)^2 .

Constrain Ω(β) instead of ‖β‖_2^2.

Remark: this is equivalent to an SVM with a particular kernel.
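The penalty Ω(β) is the graph-Laplacian quadratic form β'Lβ: it is zero for weights that are constant along the pathway graph and large for weights that oscillate across edges. A minimal sketch on a hypothetical 4-gene pathway:

```python
import numpy as np

def pathway_norm(beta, edges):
    """Pathway-derived penalty Omega(beta) = sum over edges (i, j) of
    (beta_i - beta_j)^2, i.e. beta' L beta for the graph Laplacian L."""
    return sum((beta[i] - beta[j]) ** 2 for i, j in edges)

# Hypothetical pathway graph: genes 0-1-2 form a chain, gene 3 is isolated
edges = [(0, 1), (1, 2)]
smooth = np.array([1.0, 1.0, 1.0, -5.0])  # constant on the pathway
rough = np.array([1.0, -1.0, 1.0, 0.0])   # oscillates along the pathway
# The penalty favors weight vectors that are smooth along the graph,
# regardless of what happens on disconnected genes:
assert pathway_norm(smooth, edges) == 0.0
assert pathway_norm(rough, edges) == 8.0
```

Constraining this quantity instead of the plain Euclidean norm pushes the learned weights to vary smoothly over pathways, which is what makes the pathway-level interpretation possible.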


Pathway interpretation

[Figure: classifier weights projected onto the known yeast metabolic network, with labeled pathway regions including N-glycan biosynthesis, protein kinases, DNA and RNA polymerase subunits, glycolysis / gluconeogenesis, sulfur metabolism, porphyrin and chlorophyll metabolism, riboflavin metabolism, folate biosynthesis, biosynthesis of steroids / ergosterol metabolism, lysine biosynthesis, phenylalanine, tyrosine and tryptophan biosynthesis, purine metabolism, oxidative phosphorylation / TCA cycle, and nitrogen / asparagine metabolism.]

Bad example: the graph is the complete known metabolic network of the budding yeast (from the KEGG database). We project the classifier weights learned by an SVM. Good classification accuracy, but no possible interpretation!


Pathway interpretation

Good example: the graph is the complete known metabolic network of the budding yeast (from the KEGG database). We project the classifier weights learned by a spectral SVM. Good classification accuracy, and good interpretation!


Conclusion

Use the gene graph to encode prior knowledge about the classifier. Prior knowledge is always needed to classify few examples in large dimensions (sometimes implicitly). Future work: validation of the method on more data, other formulations, directed graphs...


Acknowledgements

Supervised graph inference: Yoshihiro Yamanishi, Minoru Kanehisa (Univ. Kyoto): kCCA, kML; Kevin Bleakley, Gerard Biau (Univ. Montpellier): local SVM.

Classification of microarray data: Franck Rapaport, Emmanuel Barillot, Andrei Zynoviev, Marie Dutreix (Curie Institute).
