
Supervised inference of biological networks
and classification of gene expression data with gene networks

Jean-Philippe Vert (Jean-Philippe.Vert@ensmp.fr)

Centre for Computational Biology, Ecole des Mines de Paris, ParisTech

Institute for Infocomm Research, Singapore, February 14th, 2007

J.-P. Vert (Ecole des Mines) Supervised network inference 1 / 32

Outline

1 Supervised inference of biological networks from heterogeneous genomic data

2 Using gene networks for gene expression data classification


Motivation

Data: gene expression, gene sequence, protein localization, ...

Graph: protein-protein interactions, metabolic pathways, signaling pathways, ...


Strategies

Unsupervised approaches: the graph is completely unknown

- model-based approaches: Bayes nets, dynamical systems, ...
- similarity-based approaches: connect similar nodes

Supervised approaches: part of the graph is known in advance

- prior knowledge in model-based approaches
- statistical / machine learning approaches: learn from the known subnetwork a rule that can predict edges from genomic data


Genomic Data

Data representation as distances: we assume that each type of data (expression, sequences, ...) defines a (negative definite) distance between genes. Many such distances exist (cf. kernel methods). Data integration is easily obtained by summing the distances to obtain an "integrated" distance.


Method 1: Direct similarity-based prediction

Motivation: "connect similar genes". Connect a and b if d(a, b) is below a threshold. This is an unsupervised approach (no use of the known subnetwork).
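The thresholding rule above can be sketched in a few lines (a minimal illustration with a hypothetical toy distance matrix, not a method from the talk):

```python
import numpy as np

def direct_inference(D, threshold):
    """Predict an edge between genes a and b whenever d(a, b) < threshold.

    D: (n, n) symmetric matrix of pairwise distances between genes.
    Returns a boolean adjacency matrix (no self-loops).
    """
    A = D < threshold
    np.fill_diagonal(A, False)  # a gene is never connected to itself
    return A

# Toy example: 3 genes, genes 0 and 1 are close in the data
D = np.array([[0.0, 0.2, 0.9],
              [0.2, 0.0, 0.8],
              [0.9, 0.8, 0.0]])
A = direct_inference(D, threshold=0.5)  # predicts only the edge 0-1
```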



Method 2: metric learning

Metric learning. Motivation: use the known subnetwork to refine the distance measure, before applying the similarity-based method. Based on kernel CCA (Yamanishi et al., 2004) or kernel metric learning (V. and Yamanishi, 2005).


Metric learning example

Kernel metric learning (V. and Yamanishi, 2005). Criterion: connected points should be near each other after mapping to a new d-dimensional Euclidean space. Add regularization to deal with high dimensions. Mapping f(x) = (f_1(x), ..., f_d(x)) with:

f_i = \arg\min_{f \perp f_1, \dots, f_{i-1},\ \mathrm{var}(f)=1} \sum_{i \sim j} \left( f(x_i) - f(x_j) \right)^2 + \lambda \| f \|_k^2 .

Interpolates between (kernel) PCA (λ = ∞) and graph embedding (λ = 0). Equivalent to a generalized eigenvalue problem.
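The generalized-eigenvalue view can be sketched numerically. This is a rough illustration under simplifying assumptions, not the authors' exact formulation: writing f = K α, the criterion becomes α'(KLK + λK)α with a variance constraint on f, which we solve as a generalized eigenproblem on a hypothetical toy graph and RBF kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: n genes with a genomic kernel K and a known
# subnetwork given by its graph Laplacian L = D - A.
n = 8
X = rng.normal(size=(n, 3))  # toy genomic features
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF kernel
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6)]:  # known edges
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

lam = 0.1  # large lam leans toward kernel PCA, small lam toward graph embedding
Aq = K @ L @ K + lam * K           # penalty term in alpha
Bq = K @ K / n + 1e-8 * np.eye(n)  # variance term, ridged for stability

# Solve the generalized problem Aq v = w Bq v via Cholesky whitening
C = np.linalg.cholesky(Bq)
Ci = np.linalg.inv(C)
vals, U = np.linalg.eigh(Ci @ Aq @ Ci.T)  # ascending eigenvalues
vecs = Ci.T @ U
F = K @ vecs[:, :2]  # 2-dimensional embedding of the genes
```

Connected genes end up close in F; the smallest eigenvalues correspond to the smoothest directions on the known subnetwork.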


Metric learning example

Kernel CCA (Yamanishi et al., 2004). Criterion: find a subspace where the graph distance and the genomic data distance match. Formulated as a search for correlated directions (kernel trick).

[Figure: genes g1-g8 shown as the known graph, their genomic-data embedding (kernel), their graph embedding (diffusion kernel), and the correlated directions found by kernel CCA.]


Method 3: Matrix completion

Motivation: fill missing entries in the adjacency matrix directly, by making it similar to (a variant of) the data matrix. Method: EM algorithm based on the information geometry of positive semidefinite matrices (Kato et al., 2005).


Method 4: Supervised binary classification

A pair can be connected (+1) or not connected (-1). Use the known network as a training set for an SVM that predicts whether a new pair is connected. Example: SVM with the tensor product pairwise kernel (Ben-Hur and Noble, 2006):

K_{TPPK}((x_1, x_2), (x_3, x_4)) = K(x_1, x_3) K(x_2, x_4) + K(x_1, x_4) K(x_2, x_3) .
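The pairwise kernel is straightforward to compute from a base gene kernel. A minimal sketch (the base kernel matrix below is a hypothetical example, not data from the talk):

```python
import numpy as np

def tppk(K, i, j, k, l):
    """Tensor product pairwise kernel between gene pairs (i, j) and (k, l),
    given a base gene kernel matrix K (Ben-Hur & Noble, 2006)."""
    return K[i, k] * K[j, l] + K[i, l] * K[j, k]

# Hypothetical base kernel on 4 genes (symmetric PSD matrix)
K = np.array([[1.0, 0.8, 0.1, 0.2],
              [0.8, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.7],
              [0.2, 0.1, 0.7, 1.0]])

# The kernel is symmetric in the order of genes inside each pair,
# which is exactly why the two product terms are summed:
assert tppk(K, 0, 1, 2, 3) == tppk(K, 1, 0, 2, 3)
```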


Method 5: Local predictions

Motivation: define a specific model for each target node, to discriminate between its neighbors and the other nodes. Treat each node independently from the others, then combine the predictions to rank candidate edges.
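The local strategy can be sketched with a deliberately simple local classifier (a nearest-centroid rule instead of the local SVM used in the talk; the toy features and adjacency are hypothetical):

```python
import numpy as np

def local_scores(X, A):
    """Local-prediction sketch: for each target node t, score every node u
    by how much closer u's genomic features are to the centroid of t's
    known neighbors than to the centroid of its known non-neighbors.
    X: (n, p) feature matrix, A: (n, n) known adjacency matrix (0/1)."""
    n = X.shape[0]
    S = np.full((n, n), -np.inf)
    for t in range(n):
        pos = A[t] == 1
        neg = (A[t] == 0) & (np.arange(n) != t)
        if not pos.any():   # no positive examples for this node:
            continue        # the main weakness of local models
        mu_pos = X[pos].mean(axis=0)
        mu_neg = X[neg].mean(axis=0)
        # higher score = closer to the neighbor centroid than to the rest
        S[t] = np.linalg.norm(X - mu_neg, axis=1) - np.linalg.norm(X - mu_pos, axis=1)
    return (S + S.T) / 2    # combine the two local predictions per pair

# Toy data: two tight clusters, one known edge inside each cluster
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 1
A[2, 3] = A[3, 2] = 1
S = local_scores(X, A)  # within-cluster pairs score higher than across
```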


Local predictions: pros and cons

Pros: allows very different models for nearby nodes on the graph; it is faster to train n models with n examples each than one model with n^2 examples.

Cons: few positive examples are available for some nodes.


Experiments

Network:
- metabolic network (668 vertices, 2782 edges)
- protein-protein interaction network (984 vertices, 2438 edges)

Data (yeast):
- gene expression (157 experiments)
- phylogenetic profiles (145 organisms)
- cellular localization (23 intracellular locations)
- yeast two-hybrid data (2438 interactions among 984 proteins)

Method:
- 5-fold cross-validation
- predict edges between the test set and the training set


Results: protein-protein interaction

[Figure: left, ROC curves (ratio of true positives vs. ratio of false positives); right, false discovery rate vs. ratio of true positives, for the methods Direct, kML, kCCA, em, local, and Pkernel.]


Results: metabolic gene network

[Figure: left, ROC curves (ratio of true positives vs. ratio of false positives); right, false discovery rate vs. ratio of true positives, for the methods Direct, kML, kCCA, em, local, and Pkernel.]


Results: effect of data integration

[Figure: left, ROC curves; right, false discovery rate vs. ratio of true positives, for individual data types (exp, loc, phy, y2h) and their integration (int).]

Local SVM, protein-protein interaction network.


Results: effect of data integration

[Figure: left, ROC curves; right, false discovery rate vs. ratio of true positives, for individual data types (exp, loc, phy) and their integration (int).]

Local SVM, metabolic gene network.


Conclusion

Summary: a variety of methods have been investigated recently. Some reach interesting performance on the benchmarks: the local SVM retrieves 45% of all true edges of the metabolic gene network at an FDR below 50%.

The approach is valid for any network, but it is a non-mechanistic model. Future work: experimental validation, improved data integration, semi-local approaches...


Outline

1 Supervised inference of biological networks from heterogeneous genomic data

2 Using gene networks for gene expression data classification


Tumor classification from microarray data

Data available: gene expression measures for more than 10k genes, measured on fewer than 100 samples of two (or more) different classes (e.g., different tumors).

Goal: design a classifier to automatically assign a class to future samples from their expression profiles, and interpret biologically the differences between the classes.


Linear classifiers

The approach: each sample is represented by a vector x = (x_1, ..., x_p), where p > 10^5 is the number of probes. Classification: given the set of labeled samples, learn a linear decision function

f(x) = \sum_{i=1}^{p} \beta_i x_i + \beta_0 ,

that is positive for one class and negative for the other. Interpretation: the weight β_i quantifies the influence of gene i on the classification.
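The decision function is just a weighted vote over genes. A minimal sketch with hypothetical weights (not a trained model):

```python
import numpy as np

def linear_classifier(beta, beta0):
    """Return the decision function f(x) = sum_i beta_i * x_i + beta0;
    the predicted class is the sign of f(x)."""
    def f(x):
        return float(np.dot(beta, x) + beta0)
    return f

# Hypothetical 3-gene example: gene 0 votes for class +1, gene 2 against
f = linear_classifier(beta=np.array([1.5, 0.0, -2.0]), beta0=0.1)
x = np.array([1.0, 0.5, 0.2])     # an expression profile
score = f(x)                      # 1.5*1.0 + 0.0*0.5 - 2.0*0.2 + 0.1
label = 1 if score > 0 else -1
```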


Linear classifiers

Pitfalls: no robust estimation procedure exists for 100 samples in 10^5 dimensions! It is necessary to reduce the complexity of the problem with prior knowledge.


Example 1: Norm Constraints

The approach: a common method in statistics to learn with few samples in high dimension is to constrain the norm of β, e.g.:

- Euclidean norm (support vector machines, ridge regression): ‖β‖_2^2 = \sum_{i=1}^{p} \beta_i^2
- L1-norm (lasso regression): ‖β‖_1 = \sum_{i=1}^{p} |\beta_i|

Pros: good performance in classification.

Cons: limited interpretation (small weights); no prior biological knowledge.
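The shrinking effect of a norm constraint is easy to see with ridge regression, which has a closed form (a sketch on hypothetical simulated data, intercept omitted):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: minimize ||y - X beta||^2 + lam * ||beta||_2^2.
    Closed form: beta = (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical data: 30 samples, 5 features, sparse true weights plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + 0.1 * rng.normal(size=30)

b_small = ridge_fit(X, y, lam=0.01)
b_large = ridge_fit(X, y, lam=100.0)
# Heavier regularization shrinks the weight vector toward zero
assert np.linalg.norm(b_large) < np.linalg.norm(b_small)
```

With an L1 penalty the same shrinkage also drives many weights exactly to zero, which is what the feature-selection approach on the next slide exploits.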


Example 2: Feature Selection

The approach: constrain most weights to be 0, i.e., select a few genes (< 20) whose expression is enough for classification. Interpretation is then about the selected genes.

Pros: good performance in classification; useful for biomarker selection; apparently easy interpretation.

Cons: the gene selection process is usually not robust; wrong interpretation is the rule (too much correlation between genes).


Pathway interpretation

Motivation: basic biological functions are usually expressed in terms of pathways (metabolic, signaling, regulatory), not single genes. Many pathways are already known. How can this prior knowledge be used to constrain the weights, so as to obtain an interpretation at the level of pathways?


Pathway-derived norm constraint

One solution (Rapaport et al., 2007): let the set of pathways be represented by an undirected graph, and consider the pathway-derived norm

\Omega(\beta) = \sum_{i \sim j} (\beta_i - \beta_j)^2 .

Constrain Ω(β) instead of ‖β‖_2^2.

Remark: this is equivalent to an SVM with a particular kernel.
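The penalty Ω(β) is the graph-Laplacian quadratic form β'Lβ: it is zero for weights that are constant along the pathway graph and large for weights that oscillate across edges. A minimal sketch on a hypothetical 4-gene pathway:

```python
import numpy as np

def pathway_norm(beta, edges):
    """Pathway-derived penalty Omega(beta) = sum over edges (i, j) of
    (beta_i - beta_j)^2, i.e. beta' L beta for the graph Laplacian L."""
    return sum((beta[i] - beta[j]) ** 2 for i, j in edges)

# Hypothetical pathway graph: genes 0-1-2 form a chain, gene 3 is isolated
edges = [(0, 1), (1, 2)]
smooth = np.array([1.0, 1.0, 1.0, -5.0])  # constant on the pathway
rough = np.array([1.0, -1.0, 1.0, 0.0])   # oscillates along the pathway
# The penalty favors weight vectors that are smooth along the graph,
# regardless of what happens on disconnected genes:
assert pathway_norm(smooth, edges) == 0.0
assert pathway_norm(rough, edges) == 8.0
```

Constraining this quantity instead of the plain Euclidean norm pushes the learned weights to vary smoothly over pathways, which is what makes the pathway-level interpretation possible.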


Pathway interpretation

[Figure: classifier weights projected onto the known yeast metabolic network, with labeled pathway regions including N-glycan biosynthesis, protein kinases, DNA and RNA polymerase subunits, glycolysis / gluconeogenesis, sulfur metabolism, porphyrin and chlorophyll metabolism, riboflavin metabolism, folate biosynthesis, biosynthesis of steroids / ergosterol metabolism, lysine biosynthesis, phenylalanine, tyrosine and tryptophan biosynthesis, purine metabolism, oxidative phosphorylation / TCA cycle, and nitrogen / asparagine metabolism.]

Bad example: the graph is the complete known metabolic network of the budding yeast (from the KEGG database). We project the classifier weights learned by an SVM. Good classification accuracy, but no possible interpretation!


Pathway interpretation

Good example: the graph is the complete known metabolic network of the budding yeast (from the KEGG database). We project the classifier weights learned by a spectral SVM. Good classification accuracy, and good interpretation!


Conclusion

Use the gene graph to encode prior knowledge about the classifier. Prior knowledge is always needed to classify few examples in large dimensions (sometimes implicitly). Future work: validation of the method on more data, other formulations, directed graphs...


Acknowledgements

Supervised graph inference: Yoshihiro Yamanishi, Minoru Kanehisa (Univ. Kyoto): kCCA, kML; Kevin Bleakley, Gerard Biau (Univ. Montpellier): local SVM.

Classification of microarray data: Franck Rapaport, Emmanuel Barillot, Andrei Zynoviev, Marie Dutreix (Curie Institute).
