
Randomizing genome-scale metabolic networks

Talk presented at the meeting: Metabolic Pathway Analysis 2011, 19-23 September 2011, University of Chester, UK

Page 1: Randomizing genome-scale metabolic networks

Areejit Samal – MPA’11 Chester

Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ Paris-Sud, Orsay, France

and

Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany

Areejit Samal

Randomizing genome-scale metabolic networks

Page 2: Randomizing genome-scale metabolic networks

Architectural features of real-world networks

Small-world: small average path length and high local clustering. Watts and Strogatz (1998)

Scale-free and hierarchical organization: power-law degree distribution and hierarchical modularity. Barabasi and Albert (1999); Ravasz et al (2002)

Scale-free graphs can be generated using a simple model that combines growth with a preferential attachment scheme.
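The growth and preferential attachment scheme can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function name, the fully connected starting core and the parameters `n_nodes` and `m_links` are assumptions made here.

```python
import random
from collections import Counter

def preferential_attachment_graph(n_nodes, m_links, seed=0):
    """Grow an undirected graph: each new node links to m_links distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Assumed starting point: a fully connected core of m_links + 1 nodes.
    edges = [(i, j) for i in range(m_links + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new_node in range(m_links + 1, n_nodes):
        targets = set()
        while len(targets) < m_links:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new_node, target))
            stubs.extend((new_node, target))
    return edges

edges = preferential_attachment_graph(n_nodes=2000, m_links=2)
degree = Counter(node for edge in edges for node in edge)
print("largest degree:", max(degree.values()))
print("smallest degree:", min(degree.values()))
```

Early nodes accumulate many links while most nodes keep a degree close to `m_links`, which is the heavy-tailed pattern the model is meant to produce.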

Structure of real-world networks is very different from random graphs.

Page 3: Randomizing genome-scale metabolic networks

Network motifs and evolved design

Shen-Orr et al (2002); Milo et al (2002, 2004)

Network motifs are sub-graphs that are over-represented in real networks compared with randomized networks that have the same degree sequence.

Certain network motifs were shown to perform important information processing tasks.

For example, the coherent Feed Forward Loop (FFL) can be used to detect persistent signals.

The preponderance of certain sub-graphs in a real network may reflect evolutionary design principles favored by selection.
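As a small illustration of what counting a sub-graph means, the sketch below counts feed-forward loops (x → y, y → z and x → z) in a directed graph. It is a brute-force toy written for this transcript: the function name and the example edges are hypothetical, and edge signs (needed to tell coherent from incoherent loops) are ignored.

```python
from itertools import permutations

def count_feed_forward_loops(edges):
    """Count ordered triples (x, y, z) with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        1
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )

# Hypothetical toy network containing one feed-forward loop (X, Y, Z).
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(count_feed_forward_loops(toy_edges))
```

Comparing such a count with the counts obtained in randomized networks is what decides whether the sub-graph qualifies as a motif.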

Page 4: Randomizing genome-scale metabolic networks

“Null hypothesis” based approach for identifying design principles in real-world networks

[Figure: histogram of the ‘chosen property’ over the randomized networks. x-axis: Chosen Property (0.60 to 0.70); y-axis: Frequency (0 to 300).]

The procedure to deduce statistically significant network properties is:

Measure the ‘chosen property’ (e.g., motif significance profile or average clustering coefficient) in the investigated real network.

Generate randomized networks with structure similar to the investigated real network using an appropriate null model.

Use the distribution of the ‘chosen property’ for the randomized networks to estimate a p-value.
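The last step of this procedure can be sketched as follows. The helper names, the one-sided test, the +1 correction and the synthetic null values are all assumptions made for illustration; the talk does not specify how the p-value is computed.

```python
import random
import statistics

def empirical_p_value(observed, null_values):
    """One-sided p-value: fraction of null samples at least as large as
    the observed value, with a +1 correction so it is never exactly 0."""
    at_least = sum(1 for value in null_values if value >= observed)
    return (at_least + 1) / (len(null_values) + 1)

def z_score(observed, null_values):
    """Distance of the observed value from the null mean, in null std devs."""
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)

rng = random.Random(1)
# Hypothetical stand-in for the property measured on 1000 randomized networks.
null_values = [rng.gauss(0.65, 0.015) for _ in range(1000)]
observed = 0.72  # hypothetical value for the investigated network

print("p-value:", empirical_p_value(observed, null_values))
print("z-score:", z_score(observed, null_values))
```

In practice `null_values` would come from the chosen null model, so the quality of the p-value depends entirely on how appropriate that null model is.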

[Figure labels: Investigated network; p-value]

Page 5: Randomizing genome-scale metabolic networks

Edge-exchange algorithm: widely used null model

Randomized networks preserve the degree of each node, and therefore the degree sequence.

[Figure: the investigated network, and the same network after 1 exchange and after 2 exchanges.]

Reference: Shen-Orr et al (2002); Milo et al (2002,2004); Maslov & Sneppen (2002)
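A minimal sketch of the edge-exchange idea for a directed graph is given below. It is an illustration under stated assumptions, not the implementation used in the cited papers: the function name is hypothetical, and rejecting exchanges that would create self-loops or duplicate edges is one common variant chosen here.

```python
import random
from collections import Counter

def edge_exchange(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a directed graph given as a list
    of (source, target) pairs. Each exchange picks two edges a->b and c->d
    and rewires them to a->d and c->b, unless that would create a
    self-loop or a duplicate edge."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue  # rejected exchange, try another pair
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

# Hypothetical small directed network.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (1, 3), (2, 4)]
randomized = edge_exchange(original, n_swaps=20)

out_before = Counter(source for source, _ in original)
out_after = Counter(source for source, _ in randomized)
print("out-degrees preserved:", out_before == out_after)
```

Every node keeps its in-degree and out-degree because each exchange only swaps the targets of two existing edges.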

Page 6: Randomizing genome-scale metabolic networks

Carefully posed null models are required for deducing design principles with confidence

Artzy-Randrup et al, Science (2004)

In C. elegans, nearby neurons have a greater chance of forming a connection than distant neurons.

A null model incorporating this spatial constraint gives different results from the generic edge-exchange algorithm.

Misinterpretation can arise from the use of an incomplete null model (e.g. one that ignores important constraints)!!

A single generic null model (the edge-exchange algorithm) is not suitable for all types of real-world networks.

Page 7: Randomizing genome-scale metabolic networks

Edge-exchange algorithm: case of bipartite metabolic networks

[Figure: bipartite graph of the reactions ASPT and CITL with the metabolites asp-L, fum, nh4, cit, oaa and ac, shown before and after an edge exchange (ASPT*, CITL*).]

ASPT: asp-L → fum + nh4
CITL: cit → oaa + ac

ASPT*: asp-L → ac + nh4
CITL*: cit → oaa + fum

The edge exchange preserves the degree of each node in the network, but it generates fictitious reactions that violate the mass, charge and atomic balance satisfied by real chemical reactions!!

Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown.

Example: Guimera & Amaral (2005); Samal et al (2006)
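The carbon imbalance of the exchanged reactions can be checked with a few lines of Python. Only the carbon counts of fum (4) and ac (2) are given on the slide; the other counts are standard values filled in here for illustration, and the helper name is hypothetical.

```python
# Carbon atoms per metabolite. fum = 4 and ac = 2 are stated on the slide;
# the remaining counts are standard values added here as an assumption.
CARBONS = {"asp-L": 4, "fum": 4, "nh4": 0, "cit": 6, "oaa": 4, "ac": 2}

# Each reaction is (substrates, products).
REACTIONS = {
    "ASPT": (["asp-L"], ["fum", "nh4"]),
    "CITL": (["cit"], ["oaa", "ac"]),
    "ASPT*": (["asp-L"], ["ac", "nh4"]),
    "CITL*": (["cit"], ["oaa", "fum"]),
}

def carbon_imbalance(substrates, products):
    """Carbon atoms in the products minus carbon atoms in the substrates."""
    return sum(CARBONS[m] for m in products) - sum(CARBONS[m] for m in substrates)

imbalance = {name: carbon_imbalance(*rxn) for name, rxn in REACTIONS.items()}
for name, value in imbalance.items():
    print(name, "balanced" if value == 0 else f"unbalanced by {value:+d} carbon atoms")
```

The two real reactions balance, while ASPT* loses two carbon atoms and CITL* gains two.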

Biochemically meaningless randomization, inappropriate for metabolic networks.

Page 8: Randomizing genome-scale metabolic networks

Null models for metabolic networks: Blind-watchmaker network

Blind to basic constraints satisfied by biochemical reactions!!

Page 9: Randomizing genome-scale metabolic networks

Motivation of this work

We propose a new Markov Chain Monte Carlo (MCMC) based method to generate meaningful randomized ensembles for metabolic networks by successively imposing biochemical and functional constraints.

Increasing level of constraints:

Constraint I: Ensemble R. Given number of valid biochemical reactions.
Constraint II: Ensemble Rm. Additionally, fix the number of metabolites.
Constraint III: Ensemble uRm. Additionally, exclude blocked reactions.
Constraint IV: Ensemble uRm-V1. Additionally, impose the functional constraint of growth in a specified environment.

Page 10: Randomizing genome-scale metabolic networks

Constraint I: Given number of valid biochemical reactions

Instead of generating fictitious reactions with the edge-exchange mechanism, we can restrict ourselves to the >6000 valid reactions contained in the KEGG database.

We use the database of 5870 mass-balanced reactions derived from KEGG by Rodrigues and Wagner.

Page 11: Randomizing genome-scale metabolic networks

Global Reaction Set

KEGG Database + E. coli iJR904

Biomass composition: The E. coli biomass composition of iJR904 was used to determine the viability of genotypes.

Nutrient metabolites: The set of external metabolites in iJR904 was allowed for uptake/excretion in the global reaction set.

We can implement Flux Balance Analysis (FBA) within this database.

Reference: Rodrigues and Wagner (2009)
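For orientation, FBA is usually stated as a linear program. The formulation below is the standard textbook form rather than something shown in the talk; the symbols $S$ (stoichiometric matrix), $v$ (flux vector), $v_{\mathrm{biomass}}$ (flux through the biomass reaction) and the bounds $v_i^{\min}$, $v_i^{\max}$ are notation assumed here.

```latex
\begin{aligned}
\max_{v}\quad & v_{\mathrm{biomass}} \\
\text{subject to}\quad & S\,v = 0, \\
& v_i^{\min} \le v_i \le v_i^{\max} \quad \text{for each reaction } i .
\end{aligned}
```

A genotype is then called viable in a given environment when the optimal biomass flux is nonzero, with the environment entering through the bounds on the uptake reactions.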

Page 12: Randomizing genome-scale metabolic networks

Comparison with E. coli: Structural Properties of Interest

The E. coli metabolic network has m (~750) metabolites and r (~900) reactions.

In this work, we generate randomized metabolic networks using only reactions in KEGG and compare the structural properties of the randomized networks with those of E. coli.

We study the following large-scale structural properties:

• Metabolite degree distribution
• Clustering coefficient
• Average path length
  Scale-free and small-world. Ref: Jeong et al (2000); Wagner & Fell (2001)
• Pc: probability that a path exists between two nodes in the graph
• Largest strong component (LSC) and the union of the LSC, IN and OUT components
  Bow-tie architecture of the Internet and metabolism. Ref: Broder et al; Ma and Zeng, Bioinformatics (2003)
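Two of the listed properties, Pc and the largest strong component, can be sketched on a tiny directed graph. The graph used here is an assumed unipartite version of the three-reaction glycolysis example used in this talk (glc-D → g6p, g6p ↔ f6p, f6p → fdp, currency metabolites dropped); the function names are hypothetical and the brute-force approach only suits small graphs.

```python
from collections import deque

def reachable_from(source, successors):
    """Set of nodes (other than source) reachable by a directed path (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen

def path_probability_and_lsc(nodes, edges):
    """Return (Pc, LSC): the fraction of ordered node pairs joined by a
    directed path, and the largest set of mutually reachable nodes."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    reach = {u: reachable_from(u, successors) for u in nodes}
    n = len(nodes)
    pc = sum(len(r) for r in reach.values()) / (n * (n - 1))
    components = ({u} | {v for v in reach[u] if u in reach[v]} for u in nodes)
    return pc, max(components, key=len)

nodes = ["glc-D", "g6p", "f6p", "fdp"]
edges = [("glc-D", "g6p"), ("g6p", "f6p"), ("f6p", "g6p"), ("f6p", "fdp")]
pc, lsc = path_probability_and_lsc(nodes, edges)
print("Pc =", pc, "LSC =", sorted(lsc))
```

Here 7 of the 12 ordered pairs are connected, and the reversible reaction makes g6p and f6p the only mutually reachable pair.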

Page 13: Randomizing genome-scale metabolic networks

[Figure: different graph-theoretic representations of the metabolic network, illustrated with three reactions:

HEX1: atp + glc-D → adp + g6p + h
PGI: g6p ↔ f6p
PFK: atp + f6p → adp + fdp + h

(a) Bipartite representation, with reaction nodes (reversible or irreversible) and metabolite nodes (currency or other metabolites). (b) Unipartite representation over the metabolites glc-D, g6p, f6p and fdp. Properties listed alongside: degree distribution, clustering coefficient, average path length, largest strong component.]

Page 14: Randomizing genome-scale metabolic networks

Ensemble R: Random networks within KEGG with same number of reactions as E. coli

Bit string representation of metabolic networks

KEGG Database:
R1: 3pg + nad → 3php + h + nadh
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o
...
RN: ---

E. coli: 101.... (Present: 1, Absent: 0)

N = 5870 reactions within KEGG

The E. coli metabolic network, or any random network of reactions within KEGG, can be represented as a bit string of length N.

Ensemble R of random networks

To compare with E. coli, we generate an ensemble R of random networks:
• Pick random subsets with exactly r (~900) reactions within the KEGG database of N (~6000) reactions; r is the number of reactions in E. coli.
• A huge number of networks is possible within KEGG with exactly r reactions: ~ $\binom{6000}{900}$.
• Fixing the number of reactions is equivalent to fixing genome size.

[Figure: KEGG with N reactions, and random networks containing r reactions (1's in the bit string).]
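Sampling a member of ensemble R amounts to drawing a random bit string with exactly r ones. The sketch below assumes N = 5870 and r = 900 from the figures quoted in the talk; the function and variable names are hypothetical.

```python
import math
import random

N_UNIVERSE = 5870  # reactions in the KEGG-derived universe (from the talk)
R_ECOLI = 900      # approximate number of E. coli reactions (from the talk)

def sample_ensemble_R(n_universe, r, rng):
    """Return one random network as a bit string (list of 0/1) with exactly
    r reactions present, chosen uniformly among all such subsets."""
    present = set(rng.sample(range(n_universe), r))
    return [1 if i in present else 0 for i in range(n_universe)]

rng = random.Random(0)
network = sample_ensemble_R(N_UNIVERSE, R_ECOLI, rng)

# Number of distinct networks with exactly r reactions: "N choose r".
log10_count = math.log10(math.comb(N_UNIVERSE, R_ECOLI))
print("reactions present:", sum(network))
print(f"log10 of the number of possible networks: {log10_count:.1f}")
```

The size of this number is why the ensemble can only be explored by sampling, never by enumeration.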

Page 15: Randomizing genome-scale metabolic networks

Metabolite degree distribution

[Figure: log-log plots of the metabolite degree distribution P(k) versus degree k (k from 1 to 1000) for E. coli, for random networks in ensemble R, and for the complete KEGG.]

Power-law exponent γ:
Complete KEGG: γ = 2.31
E. coli: γ = 2.17; most organisms: γ is 2.1-2.3
Random networks in ensemble R: γ = 2.51

Reference: Jeong et al (2000); Wagner and Fell (2001)

Any (large enough) random subset of reactions in KEGG has a power-law degree distribution!!

The degree of a metabolite is the number of reactions in the network in which the metabolite participates.
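The definition of metabolite degree translates directly into code. The sketch below applies it to the three glycolysis reactions used as an example in this talk (HEX1, PGI, PFK); the function names are hypothetical and three reactions are of course far too few to show a power law.

```python
from collections import Counter

# Three reactions from the talk's glycolysis example: (substrates, products).
REACTIONS = {
    "HEX1": (["atp", "glc-D"], ["adp", "g6p", "h"]),
    "PGI": (["g6p"], ["f6p"]),
    "PFK": (["atp", "f6p"], ["adp", "fdp", "h"]),
}

def metabolite_degrees(reactions):
    """Degree of a metabolite = number of reactions it participates in."""
    degree = Counter()
    for substrates, products in reactions.values():
        for metabolite in set(substrates) | set(products):
            degree[metabolite] += 1
    return degree

def degree_distribution(degree):
    """P(k): fraction of metabolites that have degree k."""
    counts = Counter(degree.values())
    total = len(degree)
    return {k: count / total for k, count in sorted(counts.items())}

degree = metabolite_degrees(REACTIONS)
print(dict(degree))
print(degree_distribution(degree))
```

Applied to a genome-scale reaction list, the same two functions give the P(k) curves that are compared between E. coli, KEGG and the random ensembles.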

Page 16: Randomizing genome-scale metabolic networks

Reaction Degree Distribution

[Figure: histogram of the number of substrates in a reaction (x-axis: 0 to 12; y-axis: Frequency, 0 to 3000), with bars split by the number of currency metabolites in the reaction (0 to 7).]

Tanaka and Doyle have suggested a classification of metabolites into three categories based on their biochemical roles:
(a) ‘Carriers’ (very high degree)
(b) ‘Precursors’ (intermediate degree)
(c) ‘Others’ (low degree)
Reference: Tanaka (2005); Tanaka, Csete, and Doyle (2008)

The degree of a reaction is the number of metabolites that participate in it.

The presence of ubiquitous (high degree) ‘currency’ metabolites along with low degree ‘other’ metabolites within a given reaction explains the observed power laws in metabolism.

Page 17: Randomizing genome-scale metabolic networks

Network Properties: Given number of valid biochemical reactions

No. of metabolites in E. coli < No. of metabolites in any random network

Page 18: Randomizing genome-scale metabolic networks

Constraint II: Fix the number of metabolites

Direct sampling of randomized networks with the same number of reactions and metabolites as E. coli is not possible.

We use a simple MCMC algorithm to sample networks within KEGG that have the same number of reactions and metabolites as E. coli.

Accept/reject criterion for a reaction swap: accept if the number of metabolites in the new network is less than or equal to that in E. coli.
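The reaction-swap sampling can be sketched on a toy reaction universe. Everything concrete below is an assumption made for illustration: the synthetic universe of 60 reactions over 30 metabolites, the function names, and the small burn-in and saving interval (the talk uses far longer runs on the KEGG universe).

```python
import random

def n_metabolites(network, universe):
    """Number of distinct metabolites used by the reactions in `network`."""
    return len(set().union(*(universe[r] for r in network)))

def mcmc_sample(start, universe, max_metabolites, burn_in, save_every, n_saved, seed=0):
    """Reaction-swap MCMC. `start` is a set of reaction ids; a swap removes
    one present reaction and adds one absent reaction, so the number of
    reactions stays constant."""
    rng = random.Random(seed)
    current = set(start)
    all_ids = sorted(universe)
    saved = []
    step = 0
    while len(saved) < n_saved:
        step += 1
        out_rxn = rng.choice(sorted(current))
        in_rxn = rng.choice([r for r in all_ids if r not in current])
        proposal = (current - {out_rxn}) | {in_rxn}
        # Accept/reject criterion from the slide: metabolite count must not
        # exceed the reference value. A rejected swap still counts as a step.
        if n_metabolites(proposal, universe) <= max_metabolites:
            current = proposal
        if step > burn_in and (step - burn_in) % save_every == 0:
            saved.append(frozenset(current))
    return saved

# Hypothetical universe: 60 reactions, each touching 3 of 30 metabolites.
toy_rng = random.Random(42)
universe = {f"R{i}": frozenset(toy_rng.sample(range(30), 3)) for i in range(60)}
start = {f"R{i}" for i in range(10)}  # plays the role of E. coli
limit = n_metabolites(start, universe)

ensemble = mcmc_sample(start, universe, limit, burn_in=1000, save_every=100, n_saved=5)
print([n_metabolites(net, universe) for net in ensemble])
```

The burn-in plays the role of erasing the memory of the starting network, and saving only every few hundred steps reduces the correlation between stored networks.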

[Figure: MCMC sampling scheme. Networks are bit strings over the N = 5870 reactions of the KEGG reaction universe (R1: 3pg + nad → 3php + h + nadh; R2: 6pgc + nadp → co2 + nadph + ru5p-D; R3: 6pgc → 2ddg6p + h2o; ...). Starting from E. coli, each proposed swap is accepted or rejected. Step 1: 10^6 swaps to erase the memory of the initial network. Step 2: store a network every 10^3 swaps (saving frequency). The stored networks form ensemble Rm.]

A reaction swap keeps the number of reactions in the network constant.

Page 19: Randomizing genome-scale metabolic networks

Network Properties: After fixing number of metabolites

Randomized network properties become closer to E. coli.

Page 20: Randomizing genome-scale metabolic networks

Constraint III: Exclude Blocked Reactions

[Figure: KEGG reduced to Unblocked KEGG.]

Large-scale metabolic networks typically have certain dead-end or blocked reactions which cannot contribute to growth of the organism.

Blocked reactions have zero flux for every investigated chemical environment, under any steady-state condition with nonzero biomass growth flux.

We next exclude blocked reactions from our biochemical universe used for sampling randomized networks.

Page 21: Randomizing genome-scale metabolic networks

MCMC Sampling: Exclusion of blocked reactions

Restrict the reaction swaps to the set of unblocked reactions within KEGG.

Accept/reject criterion for a reaction swap: accept if the number of metabolites in the new network is less than or equal to that in E. coli.

[Figure: MCMC sampling scheme over bit strings of the N = 5870 reactions in the KEGG reaction universe. Starting from E. coli, each proposed swap is accepted or rejected. Step 1: 10^6 swaps to erase the memory of the initial network. Step 2: store a network every 10^3 swaps (saving frequency). The stored networks form ensemble uRm.]

A reaction swap keeps the number of reactions in the network constant.

Page 22: Randomizing genome-scale metabolic networks

Network Properties: After excluding blocked reactions

Randomized network properties become still closer to E. coli.

Page 23: Randomizing genome-scale metabolic networks

Constraint IV: Growth Phenotype

Definition of Phenotype

[Figure: bit string and equivalent network representation of two genotypes, GA (011....) and GB (001....), which differ by one deleted reaction (marked x). Flux Balance Analysis (FBA) determines the ability of each genotype to synthesize biomass components and hence its viability: growth (viable genotype) or no growth (unviable genotype).]

For FBA, see Kauffman et al (2003) for a review.

Page 24: Randomizing genome-scale metabolic networks

MCMC Sampling: Growth phenotype in one environment

Restrict the reaction swaps to the set of unblocked reactions within KEGG.

Accept/reject criterion for a reaction swap: accept if
(a) the number of metabolites in the new network is less than or equal to that in E. coli, and
(b) the new network is able to grow in the specified environment.

[Figure: MCMC sampling scheme over bit strings of the N = 5870 reactions in the KEGG reaction universe. Starting from E. coli, each proposed swap is accepted or rejected. Step 1: 10^6 swaps to erase the memory of the initial network. Step 2: store a network every 10^3 swaps (saving frequency). The stored networks form ensemble uRm-V1.]

A reaction swap keeps the number of reactions in the network constant.

Page 25: Randomizing genome-scale metabolic networks

Network Properties: Growth phenotype in one environment

Further improvement obtained by imposing phenotypic constraint on randomized networks.

Page 26: Randomizing genome-scale metabolic networks

Network Properties: Growth phenotype in multiple environments

Still further improvement obtained by increasing phenotypic constraints on randomized networks.

Page 27: Randomizing genome-scale metabolic networks

Global structural properties of real metabolic networks: Consequence of simple biochemical and functional constraints

[Figure label: Increasing level of constraints]

Global network structure could be an indirect consequence of biochemical and functional constraints.

Samal and Martin, PLoS ONE (2011) 6(7): e22295

Conjectured by: A. Wagner (2007); B. Papp et al. (2008)

Page 28: Randomizing genome-scale metabolic networks

High level of genetic diversity in our randomized ensembles

[Figure: mean Hamming distance for random and viable genotypes (y-axis, 0 to 1400) versus the number of reactions in the genotype (x-axis, 500 to 2500), for the glucose, acetate and succinate environments. Legend: mean distance and maximum distance as functions of n.]

Any two random networks in our most constrained ensemble differ in ~ 60% of their reactions.
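The distance between two genotypes in the bit string representation can be sketched as follows. The normalization by 2r (the largest Hamming distance two genotypes with r reactions each can have) is an assumption made here to turn the distance into a fraction; the talk does not spell out how the ~60% figure is computed, and the example genotypes are hypothetical.

```python
def hamming_distance(g1, g2):
    """Number of positions at which two equal-length bit strings differ."""
    if len(g1) != len(g2):
        raise ValueError("genotypes must have the same length")
    return sum(a != b for a, b in zip(g1, g2))

def fraction_of_reactions_differing(g1, g2):
    """Hamming distance divided by 2r, where r is the number of reactions in
    each genotype. With equal r, a distance d means d/2 reactions of each
    genotype are absent from the other (assumed normalization)."""
    r = sum(g1)
    if sum(g2) != r:
        raise ValueError("genotypes must contain the same number of reactions")
    return hamming_distance(g1, g2) / (2 * r)

# Hypothetical genotypes over a universe of 5 reactions, 3 present in each.
genotype_a = [1, 1, 0, 0, 1]
genotype_b = [1, 0, 1, 0, 1]
print(hamming_distance(genotype_a, genotype_b))
print(fraction_of_reactions_differing(genotype_a, genotype_b))
```

In this toy case the genotypes share two of their three reactions, so one third of the reactions differ.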

Page 29: Randomizing genome-scale metabolic networks

Necessity of MCMC sampling

Constraint of having super-essential reactions

The convex curve indicates that the effect on viability of successive reaction removals becomes more and more severe.

Reference: Samal, Rodrigues, Jost, Martin, Wagner BMC Systems Biology (2010)

Page 30: Randomizing genome-scale metabolic networks

Summary

We propose an MCMC-based method to generate randomized metabolic networks by successively imposing a few macroscopic biochemical and functional constraints in our sampling criteria.

By imposing a few macroscopic constraints in our sampling criteria, we were able to approach the properties of real metabolic networks.

Page 31: Randomizing genome-scale metabolic networks

Concluding Remarks

We can extend the study to other organisms using the SEED database.

The list of reactions in KEGG is incomplete and its use may reflect evolutionary constraints associated with natural organisms.

In a synthetic biology context, there could be many more possible reactions that are not in KEGG.

An alternative approach was proposed by Basler, Ebenhöh, Selbig and Nikoloski, Bioinformatics, 2011.

Page 32: Randomizing genome-scale metabolic networks

Neutral networks: A many-to-one genotype to phenotype map

Genotype → Phenotype
RNA sequence → Secondary structure of pairings (folding)
Protein sequence → Three-dimensional structure
Gene network → Gene expression levels

Neutral network or genotype network refers to the set of (many) genotypes that have a given phenotype, and their “organization” in genotype space.

Reference: Fontana & Schuster (1998); Lipman & Wilbur (1991); Ciliberti, Martin & Wagner (2007)

Page 33: Randomizing genome-scale metabolic networks

See also: Common approaches in the study of design principles of metabolic networks

Page 34: Randomizing genome-scale metabolic networks

Acknowledgement

Andreas Wagner, João Rodrigues: University of Zurich
Jürgen Jost: Max Planck Institute for Mathematics in the Sciences
Olivier C. Martin: Université Paris-Sud

Funding

CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG GDRE 513) for a Postdoctoral Fellowship.

Reference and Related Work

Genotype networks in metabolic reaction spaces. Areejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner*. BMC Systems Biology 2010, 4:30

Environmental versatility promotes modularity in genome-scale metabolic networks. Areejit Samal, Andreas Wagner* and Olivier C Martin*. BMC Systems Biology 2011, 5:135

Thanks for your attention!! Questions??