3
Comparative and Functional Genomics Comp Funct Genom 2003; 4: 660–662. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.348 Meeting Report ISMB 2003 BioPathways SIG and 5th BioPathways Meeting ISMB’03, Brisbane, Australia, 27–28 June 2003 Vincent Schachter 1 * and Aviv Regev 2 1 Genoscope, 2 rue Gaston Cr´ emieux, F-91000 Evry, France 2 Bauer Center for Genomics Research, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA *Correspondence to: Vincent Schachter, Director of Bioinformatics, GENOSCOPE, 2 rue Gaston Cr´ emieux, F-91000 Evry, France. E-mail: [email protected] Received: 10 October 2003 Accepted: 10 October 2003 The 5th BioPathways Consortium Meeting gath- ered 21 speakers, close to 100 registered partici- pants and an undetermined number of visitors from neighbouring SIGs. The meeting featured two main scientific ses- sions, focusing respectively on ‘Regulation and Interactions on a Systems Scale’ and ‘Function and Evolution of Metabolic Networks’, an ‘Ontologies, Databases and Data Integration’ session, and a con- tributed session on software tools for pathways. Following the BioPathways tradition and to foster depth of exchange, scientific sessions were struc- tured as a series of long presentations, concluded by an hour of open discussion on the session theme. The meeting started with a short assessment of the evolution of the field — computational biol- ogy of networks, or ‘systems biology’? — which has matured fast in the 3 years of existence of the BioPathways SIG. While some theoretical subfields, such as network reconstruction from experimental data, are acquiring technical depth and generating predictions of increasing biolog- ical relevance, there is a clear trend towards a stronger coupling between theoretical and exper- imental approaches, leading to new open questions on both sides. Another noticeable trend is the strong revival of fields that had been perceived as fairly well understood and stable, such as metabolism, thanks both to the ‘systems-wide’ perspective and to new theoretical tools. The ‘Regulation and Interactions’ session revol- ved around a few key ideas, each illustrated by several speakers. A first theme was the search for the right notion of ‘module’ in biological networks, at different levels of molecular organization (e.g. in regulatory networks, protein interaction networks), using diverse theoretical tools (e.g. graph theory, graphical probabilistic models). Segal illustrated this theme by presenting a method to infer module networks, i.e. sets of genes sharing a regulatory mechanism, from expres- sion data. The original inference scheme learns a Bayesian Network from the data, but the Bayesian network models regulation of sets of genes — modules — rather than single genes. The learning algorithm optimizes on the structure of the network, but also on the partition of genes into modules and on the ‘regulation program’ of each module, an abstract representation of the regulatory mechanism as a decision tree. It was also shown that in order to ensure statisti- cal robustness and, better yet, biological relevance, Copyright 2003 John Wiley & Sons, Ltd.

ISMB 2003 BioPathways SIG and 5th BioPathways Meeting

Embed Size (px)

Citation preview

Page 1: ISMB 2003 BioPathways SIG and 5th BioPathways Meeting

Comparative and Functional GenomicsComp Funct Genom 2003; 4: 660–662.Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.348

Meeting Report

ISMB 2003 BioPathways SIG and5th BioPathways MeetingISMB’03, Brisbane, Australia, 27–28 June 2003

Vincent Schachter1* and Aviv Regev2

1Genoscope, 2 rue Gaston Cremieux, F-91000 Evry, France2Bauer Center for Genomics Research, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA

*Correspondence to:Vincent Schachter, Director ofBioinformatics, GENOSCOPE,2 rue Gaston Cremieux,F-91000 Evry, France.E-mail: [email protected]

Received: 10 October 2003Accepted: 10 October 2003

The 5th BioPathways Consortium Meeting gath-ered 21 speakers, close to 100 registered partici-pants and an undetermined number of visitors fromneighbouring SIGs.

The meeting featured two main scientific ses-sions, focusing respectively on ‘Regulation andInteractions on a Systems Scale’ and ‘Function andEvolution of Metabolic Networks’, an ‘Ontologies,Databases and Data Integration’ session, and a con-tributed session on software tools for pathways.Following the BioPathways tradition and to fosterdepth of exchange, scientific sessions were struc-tured as a series of long presentations, concludedby an hour of open discussion on the session theme.

The meeting started with a short assessment ofthe evolution of the field — computational biol-ogy of networks, or ‘systems biology’? — whichhas matured fast in the 3 years of existence ofthe BioPathways SIG. While some theoreticalsubfields, such as network reconstruction fromexperimental data, are acquiring technical depthand generating predictions of increasing biolog-ical relevance, there is a clear trend towards astronger coupling between theoretical and exper-imental approaches, leading to new open questionson both sides. Another noticeable trend is the strong

revival of fields that had been perceived as fairlywell understood and stable, such as metabolism,thanks both to the ‘systems-wide’ perspective andto new theoretical tools.

The ‘Regulation and Interactions’ session revol-ved around a few key ideas, each illustrated byseveral speakers. A first theme was the search forthe right notion of ‘module’ in biological networks,at different levels of molecular organization (e.g. inregulatory networks, protein interaction networks),using diverse theoretical tools (e.g. graph theory,graphical probabilistic models).

Segal illustrated this theme by presenting amethod to infer module networks, i.e. sets of genessharing a regulatory mechanism, from expres-sion data. The original inference scheme learnsa Bayesian Network from the data, but theBayesian network models regulation of sets ofgenes — modules — rather than single genes. Thelearning algorithm optimizes on the structure of thenetwork, but also on the partition of genes intomodules and on the ‘regulation program’ of eachmodule, an abstract representation of the regulatorymechanism as a decision tree.

It was also shown that in order to ensure statisti-cal robustness and, better yet, biological relevance,

Copyright 2003 John Wiley & Sons, Ltd.

Page 2: ISMB 2003 BioPathways SIG and 5th BioPathways Meeting

BioPathways SIG Meeting 661

integration across different levels could be neces-sary. Searching for modules across different specieswas proposed as a promising path to increase sup-port from experimental data, to filter out noise andperhaps to search for evolutionarily-driven designprinciples.

In an extension of the classical expression pro-file correlation and clustering approaches, Segalshowed how microarray data from several speciescould be combined to yield a ‘meta-gene’ network.First, genes are grouped across species into so-called ‘metagenes’ (groups of orthologues), usingan extension to n species of the BLAST bidirec-tional best hit criterion. Expression profiles are thentransformed into gene–gene correlation matricesfor each species, and these are merged into a singlemetagene–metagene correlation matrix. Discretiza-tion yields a metagene ‘co-expression network’,which was then subjected to a variety of statis-tical analyses, and used to predict function usingguilt-by-association. Among the biological insightsthat could be gained, the assessment of the fractionof co-expression links conserved from the singleto multi-species level showed clear structure infunctional space: links between genes related tometabolism, degradation and protein biosynthesiswere among the most conserved, while signallingand neuronal function seemed to be newly evolved.

Another theme was the use of stochastic mod-els to better account for molecular phenomenafor which ordinary or partial differential equationsmodelling is inadequate, and of stochastic inferencemethods. Tian showed how stochasticity arises nat-urally in regulatory networks, from the probabilisticnature of the binding process combined with thelow number of transcription factors present at agiven time. He also surveyed how noise could beof functional use to the cell, by helping it stabilizeits dynamics or by even by generating qualitativedifferences in phenotype. Finally, he reviewed howexisting types of network models, from Booleancircuits to models based on stochastic differentialequations, could account for stochasticity in simu-lations.

A third theme was the use of probabilisticmethods for the prediction of individual proper-ties of genes and proteins, using sets of (other)properties. Przulj presented a systematic searchfor correlation between local graph properties ofprotein networks — including degree, articulationpoints, hubs, existence of short paths between

vertices — and biological properties of the cor-responding proteins, such as lethality, functionalclass, or the existence of a genetic interaction.Lichtenberg and Zhang also illustrated this theme,with respective focuses on protein function predic-tion and on genetic and protein interaction predic-tion.

Finally, both the issue of the robustness of pre-dictive methods relative to false positives and neg-atives in experimental data, and the question ofhow to validate these predictions — from cross-validation to actual experiments — were ubiqui-tous. Although there appears to be no simple recipeto solve the validation issue, the level of sophisti-cation of statistical assessments of prediction accu-racy has risen, with combinations of strategies,such as systematic validation against functional cat-egories, cross-correlation between different data-types, or comparison against randomized networks.

Regarding the robustness issue, the commonstrategy is to increase support for predictionsby pooling experimental information across sev-eral genes (modules, families), across species, andacross types of experiments. As a consequence,predictions are generated on more abstract enti-ties. The use of model-based statistical methodsalso facilitates the definition of optimality criteriafor a model instance inferred from a given dataset,ensuring some measure of theoretical confidence inthe predictions.

The open discussion addressed these four themesin detail, with a special focus on protein interactionprediction, for which several participants proposedthe establishment of a comparative assessment ofstructure prediction (CASP)-like competition.

The ‘Metabolic Networks’ session illustratedwell how the generation of measurements on awhole-cell scale is driving the need for models(static and dynamic) for network reconstructionmethods and for analytical approaches.

An ambitious Japanese project to build a com-prehensive global picture of Escherichia coli meta-bolism, within the larger context of the Interna-tional E. coli Alliance, was described by Tomita.The project integrates several complementary ongo-ing experimental efforts: MS analyses of metabo-lites, in vivo dynamic analysis with labelled iso-topes, expression profiling, 2D gels, systematicmutagenesis, enzyme kinetics, protein interac-tion identification, etc. For instance, 10 000 pro-tein–protein interactions have been detected so far

Copyright 2003 John Wiley & Sons, Ltd. Comp Funct Genom 2003; 4: 660–662.

Page 3: ISMB 2003 BioPathways SIG and 5th BioPathways Meeting

662 V. Schachter and A. Regev

using his-tagged proteins and immunoprecipitation,and a complete single-deletion mutant library hasbeen generated. Another effort focuses on findingas-yet unidentified metabolites using CE/MS mea-surements for charged molecules and LC/MS forneutral ones, the combination allowing a good res-olution on the identification of peaks of distinctmetabolites. First results show a surprisingly largenumber of metabolites that are not present in exist-ing databases. Tomita confirmed that data generatedby these projects will be made publicly available.

On the theoretical side, one recurring theme wasthe use of comparative approaches, e.g. to fill inthe gaps in the reconstruction of a static metabolicnetwork, or to explain major phenotypic differencesbetween genetically close species using steady-state dynamics.

Shah presented an exploratory study aimedat reconstructing a minimal yet maximally self-sufficient metabolic network from KEGG multi-species metabolic data. In such a network, allenzymes necessary for survival with minimal inputshould be included, the goal being to connect everycompound to the network, possibly with the helpof as-yet unknown reactions that could perhaps bepredicted. A first attempt generated a large numberof hypotheses in need of biological validation.

Another important idea was that, whereas fulldynamical models are difficult to both reconstructfrom available data and to study, there are simpli-fied models (flux balance analysis, S-systems, cou-pling of steady-state with small differential equa-tions, etc.) based on specific biological hypotheses(e.g. steady-state, or proximity to such) which lendthemselves better to simulation or analysis of net-work properties.

Martinos dos Santos showed how flux balanceanalysis (FBA), the study of metabolic fluxes undera steady-state hypothesis, can help understand whyonly one of two bacteria — Pseudomonas putidaand Ps. aeruginosa — that are very similar genet-ically is pathogenic. FBA allowed the predictionthat the presence of two enzymes specific to Ps.aeruginosa in the metabolic network has a signif-icant influence on metabolic flux distribution. Forinstance, ATP production was increased in someextreme pathways, the minimal set of flows withinthe network from which all actual flux distributionscan be derived as linear combinations and whichdefine the maximum metabolite conversion capa-bilities of the network under a given set of input

conditions. One lesson from this work is that smallgenetic differences can be strongly amplified.

Dos Santos then presented a modelling approachadapted to the study of metabolic fluxes in a com-munity of interacting cells — typically bacteria.The approach combines FBA, yielding distributionof fluxes at each time point, with more detaileddynamic modelling at certain critical points.

Almeida surveyed existing mathematical modelsfor metabolic networks, focusing on the feasibilityof instantiating a model at a given level of detailusing the metabolic profiles time series availabletoday. Next, he presented S-systems, a dynamicmodel in which parameters have clear biologicalmeaning, yet simplified enough to permit networkreconstruction.

The discussion opened with the question of howmuch is actually known about metabolism. Dif-ferences between ‘potential’ and actually occur-ring pathways, or between canonical pathways andcondition-specific pathways, were underlined. Themain issue related to experimental data was, unsur-prisingly, that of availability, followed closely byreliability. Several biases related to experimen-tal conditions classical in the study of bacterialmetabolism were also discussed.

The ‘Databases and Ontologies’ session featuredpresentations on pathways exchange formats andlanguages by a fairly comprehensive selection ofthe groups working towards a standardization goal(SBML, BioPax, CellML, OMG-LSR), as well aspresentations on pathways databases, for whichthere is clearly an unfilled need. In particular,Luciano described recent and impressive progressin the BioPax effort to define a standard formatfor pathways exchange, and Hucka gave an updateon the evolution and adoption of the SBML (sys-tems biology mark-up language) format by mod-ellers.

Finally, the ‘Software Tools’ session includedpresentations on a variety of pathways data man-agement and visualization tools.

In summary, the meeting confirmed that thefield of networks-related computational biologyis more than ever in a fast-growth stage, withboth frontier and depth expanding, and provideda snapshot of that field. Next year’s meeting inGlasgow will undoubtedly photograph a fairlydifferent landscape. More details of the meetingcan be found at www.biopathways.org

Copyright 2003 John Wiley & Sons, Ltd. Comp Funct Genom 2003; 4: 660–662.