Areejit Samal Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ Paris-Sud, Orsay, France and Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Areejit Samal Large-scale structure of metabolic networks: Role of Biochemical and Functional constraints

Areejit Samal Randomizing metabolic networks


Page 1: Areejit Samal Randomizing metabolic networks

Areejit Samal

Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ Paris-Sud, Orsay, France
and
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany

Large-scale structure of metabolic networks: Role of Biochemical and Functional constraints

Page 2: Areejit Samal Randomizing metabolic networks


Outline

Architectural features of metabolic networks

Issues with existing null models for metabolic networks

Our method to generate realistic randomized metabolic networks using Markov Chain Monte Carlo (MCMC) sampling and Flux Balance Analysis (FBA)

Biological function is a main driver of large-scale structure in metabolic networks

Page 3: Areejit Samal Randomizing metabolic networks

Architectural features of real-world networks

Small-world: Small Average Path Length & High Local Clustering
Watts and Strogatz, Nature (1998)

Scale-free: Power law degree distribution
Barabasi and Albert, Science (1999)

Scale-free graphs can be generated using a simple model incorporating growth and a preferential attachment scheme.
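The growth and preferential attachment idea can be illustrated with a minimal Python sketch. This is an illustration under assumptions, not the model as specified in the cited paper: the function names, the fully connected starting core and the parameter m (links added per new node) are choices made here.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow an undirected graph: each new node attaches to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes (an assumption).
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # 'stubs' lists every edge endpoint, so a uniform draw from it
    # picks a node with probability proportional to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

edges = preferential_attachment(2000, m=2)
deg = degrees(edges)
```

Early nodes accumulate links as the graph grows, which is what produces a few high-degree hubs alongside many low-degree nodes.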

Page 5: Areejit Samal Randomizing metabolic networks

Large-scale structure of metabolic networks

Small-world, Scale-free and Hierarchical Organization
Small Average Path Length, High Local Clustering, Power-law degree distribution
Wagner & Fell (2001); Jeong et al (2000); Ravasz et al (2002)

Bow-tie architecture
Directed graph with a giant component of strongly connected nodes along with associated IN and OUT components
Ma and Zeng, Bioinformatics (2003)

The bow-tie architecture of metabolism is similar to that found for the WWW by Broder et al.

Page 7: Areejit Samal Randomizing metabolic networks

Network motifs and Evolutionary design

Sub-graphs that are over-represented in real networks compared to randomized networks with the same degree sequence.

Certain network motifs were shown to perform important information processing tasks.

For example, the coherent Feed Forward Loop (FFL) can be used to detect persistent signals.

Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004)

Page 9: Areejit Samal Randomizing metabolic networks

[Figure: histogram of the chosen property (x-axis, 0.60 to 0.70) versus frequency (y-axis, 0 to 300).]

“Null hypothesis” based approach for identifying design principles in real-world networks

Statistically significant network properties are deduced as follows:

1) Measure the ‘chosen property’ (E.g., motif significance profile, average clustering coefficient, etc.) in the investigated real network.

2) Generate randomized networks with structure similar to the investigated real network using an appropriate null model.

3) Use the distribution of the ‘chosen property’ for the randomized networks to estimate a p-value.
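Step 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming a one-sided test and a "+1" correction so the estimate is never exactly zero; the function name and the Gaussian stand-in data are hypothetical and not taken from the talk.

```python
import random

def empirical_p_value(observed, null_values):
    """One-sided empirical p-value: fraction of randomized networks whose
    property is at least as large as the observed one (with a +1
    correction so the estimate is never exactly zero)."""
    n_extreme = sum(1 for x in null_values if x >= observed)
    return (n_extreme + 1) / (len(null_values) + 1)

rng = random.Random(1)
# Stand-in for the 'chosen property' measured on 1000 randomized networks.
null_values = [rng.gauss(0.65, 0.015) for _ in range(1000)]
p_typical = empirical_p_value(0.65, null_values)
p_extreme = empirical_p_value(0.80, null_values)
```

A value far in the tail of the randomized distribution gives a small p-value, while a value near its centre does not; the conclusion is only as good as the null model used in step 2.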

[Figure annotations: Investigated network; p-value]

Page 10: Areejit Samal Randomizing metabolic networks

Edge-exchange algorithm: commonly-used null model

Randomized networks preserve the degree at each node and the degree sequence.

[Figure: the investigated network, and the same network after 1 exchange and after 2 exchanges.]

Reference: Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004); Maslov & Sneppen Science (2002)
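A minimal Python sketch of a degree-preserving edge exchange on a directed edge list follows. It is an illustration, not the implementation used in the cited papers: the rejection of self-loops and duplicate edges, the attempt cap and the example graph are assumptions made here.

```python
import random
from collections import Counter

def edge_exchange(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a directed edge list.
    Picks two edges (a -> b, c -> d) and rewires them to (a -> d, c -> b),
    rejecting swaps that would create a self-loop or a duplicate edge."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3), (4, 0), (2, 4)]
randomized = edge_exchange(original, n_swaps=20)
```

Each accepted swap leaves the in-degree and out-degree of every node unchanged, which is why the degree sequence is preserved.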

Page 11: Areejit Samal Randomizing metabolic networks

Carefully posed null models are required for deducing design principles with confidence

C. elegans neuronal network: nearby neurons have a greater chance of forming a connection than distant neurons.

A null model incorporating this spatial constraint gives different results from the generic edge-exchange algorithm.

Artzy-Randrup et al Science (2004)

Page 14: Areejit Samal Randomizing metabolic networks

Edge-exchange algorithm: case of bipartite metabolic networks

[Figure: bipartite graph of the reactions ASPT and CITL and the metabolites asp-L, fum, nh4, cit, oaa and ac, shown before the edge exchange and after it (ASPT*, CITL*).]

ASPT: asp-L → fum + nh4
CITL: cit → oaa + ac

ASPT*: asp-L → ac + nh4
CITL*: cit → oaa + fum

The exchange preserves the degree of each node in the network, but generates fictitious reactions that violate the mass, charge and atomic balance satisfied by real chemical reactions!

Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown.

Null model used in many studies including: Guimera & Amaral Nature (2005); Samal et al BMC Bioinformatics (2006)
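The carbon imbalance in the swapped reactions can be checked with a short Python sketch. Only the counts for fum (4) and ac (2) come from the slide; the other carbon counts are standard values added here for illustration, and the function and variable names are hypothetical.

```python
# Carbon atoms per metabolite. fum = 4 and ac = 2 are stated on the slide;
# the other counts are standard values added here for illustration.
CARBONS = {"asp-L": 4, "fum": 4, "nh4": 0, "cit": 6, "oaa": 4, "ac": 2}

def carbon_balanced(substrates, products):
    """True if both sides of the reaction carry the same number of carbons."""
    left = sum(CARBONS[m] for m in substrates)
    right = sum(CARBONS[m] for m in products)
    return left == right

REACTIONS = {
    "ASPT": (["asp-L"], ["fum", "nh4"]),
    "CITL": (["cit"], ["oaa", "ac"]),
    "ASPT*": (["asp-L"], ["ac", "nh4"]),   # after the edge exchange
    "CITL*": (["cit"], ["oaa", "fum"]),    # after the edge exchange
}

balance = {name: carbon_balanced(s, p) for name, (s, p) in REACTIONS.items()}
```

The two real reactions balance on carbon, while both exchanged reactions do not. A full check would also cover the other elements and charge.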

Page 16: Areejit Samal Randomizing metabolic networks

A common generic null model (edge exchange algorithm) is unsuitable for different types of real-world networks.

Page 17: Areejit Samal Randomizing metabolic networks

Null models for metabolic networks: Blind-watchmaker network

Page 19: Areejit Samal Randomizing metabolic networks

Randomizing metabolic networks

We propose a new method using Markov Chain Monte Carlo (MCMC) sampling and Flux Balance Analysis (FBA) to generate meaningful randomized ensembles for metabolic networks by successively imposing biochemical and functional constraints.

Increasing level of constraints:

Constraint I: Ensemble R: Given no. of valid biochemical reactions
+ Constraint II: Ensemble Rm: Fix the no. of metabolites
+ Constraint III: Ensemble uRm: Exclude blocked reactions
+ Constraint IV: Ensemble uRm-V1: Functional constraint of growth on specified environments

Page 20: Areejit Samal Randomizing metabolic networks

Comparison with E. coli: Large-scale structural properties of interest

The E. coli metabolic network has m (~750) metabolites and r (~900) reactions.

In this work, we generate randomized metabolic networks using only reactions in KEGG and compare the structural properties of the randomized networks with those of E. coli.

We study the following large-scale structural properties:

• Metabolite degree distribution

• Clustering coefficient

• Average path length

• Pc: Probability that a path exists between two nodes in the directed graph

• Largest strongly connected component (LSC) and the union of LSC, IN and OUT components

Scale-free and small-world. Ref: Jeong et al (2000); Wagner & Fell (2001)

Bow-tie architecture of the Internet and metabolism. Ref: Broder et al; Ma and Zeng, Bioinformatics (2003)
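Two of the listed measures, Pc and the bow-tie decomposition (LSC, IN, OUT), can be sketched in plain Python. This is a minimal illustration for small graphs, assuming Pc is taken over ordered pairs of distinct nodes; the function names and the brute-force approach are choices made here, not the method used in the talk.

```python
from collections import deque

def reachable(adj, start):
    """Set of nodes reachable from start by directed paths (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie_summary(nodes, edges):
    """Return (Pc, LSC, IN, OUT) for a directed graph."""
    adj, radj = {}, {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        radj.setdefault(v, set()).add(u)
    fwd = {u: reachable(adj, u) for u in nodes}
    n = len(nodes)
    # Pc: fraction of ordered pairs (u, v), u != v, with a directed path u -> v.
    pc = sum(len(fwd[u]) - 1 for u in nodes) / (n * (n - 1))
    # Strongly connected component of u: nodes reachable from u that also reach u.
    lsc = set()
    for u in nodes:
        scc = {v for v in fwd[u] if u in fwd[v]}
        if len(scc) > len(lsc):
            lsc = scc
    seed = next(iter(lsc))
    out_comp = fwd[seed] - lsc                 # reached from the LSC
    in_comp = reachable(radj, seed) - lsc      # can reach the LSC
    return pc, lsc, in_comp, out_comp

nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b"), ("d", "e"), ("f", "e")]
pc, lsc, in_comp, out_comp = bow_tie_summary(nodes, edges)
```

In the toy graph, b, c and d form the strongly connected core, a feeds into it and e is fed by it, while f belongs to none of the three sets.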

Page 21: Areejit Samal Randomizing metabolic networks

Different Graph-theoretic Representations of the Metabolic Network

HEX1: atp + glc-D → adp + g6p + h
PGI: g6p ↔ f6p
PFK: atp + f6p → adp + fdp + h

[Figure: (a) Bipartite Representation, with the reaction nodes HEX1, PGI and PFK linked to the metabolite nodes atp, adp, h, glc-D, g6p, f6p and fdp; (b) Unipartite Representation, with the metabolite nodes glc-D, g6p, f6p and fdp. Legend: Currency Metabolite, Other Metabolite, Reversible Reaction, Irreversible Reaction.]

Page 23: Areejit Samal Randomizing metabolic networks

Different Graph-theoretic Representations of the Metabolic Network

[Figure repeated from the previous slide: (a) Bipartite Representation and (b) Unipartite Representation of HEX1, PGI and PFK. Network measures annotated on the figure: Degree Distribution; Clustering coefficient; Average Path Length; Largest Strong Component.]

The values obtained for different network measures depend on the list of currency metabolites used to generate the unipartite graph of metabolites from the bipartite graph. In this work, we have used a well-defined, biochemically sound list of currency metabolites to construct the unipartite graph corresponding to each metabolic network.
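The projection from the bipartite to the unipartite graph can be sketched in Python using the three reactions on the slide. The projection rule (every non-currency substrate linked to every non-currency product) and the currency list {atp, adp, h} are assumptions made here for illustration; the talk does not spell out its exact rule or list.

```python
def unipartite_edges(reactions, currency):
    """Project a reaction list onto a directed metabolite graph.
    Each reaction is (substrates, products, reversible). Every non-currency
    substrate is linked to every non-currency product; reversible reactions
    add the reverse links as well."""
    edges = set()
    for substrates, products, reversible in reactions.values():
        subs = [m for m in substrates if m not in currency]
        prods = [m for m in products if m not in currency]
        for s in subs:
            for p in prods:
                edges.add((s, p))
                if reversible:
                    edges.add((p, s))
    return edges

reactions = {
    "HEX1": (["atp", "glc-D"], ["adp", "g6p", "h"], False),
    "PGI": (["g6p"], ["f6p"], True),
    "PFK": (["atp", "f6p"], ["adp", "fdp", "h"], False),
}
currency = {"atp", "adp", "h"}  # assumed list, for illustration only
edges = unipartite_edges(reactions, currency)
```

Adding or removing a metabolite from the currency set changes the resulting edges, which is why the measured network properties depend on that list.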

Page 24: Areejit Samal Randomizing metabolic networks

Constraint I: Given number of valid biochemical reactions

Instead of generating fictitious reactions using the edge-exchange mechanism, we can restrict ourselves to the >6000 valid reactions contained within the KEGG database.

We use the database of 5870 mass-balanced reactions derived from KEGG by Rodrigues and Wagner. For simplicity, I will refer to this database in the talk as ‘KEGG’.

Page 25: Areejit Samal Randomizing metabolic networks

Global Reaction Set

Global reaction set: KEGG Database + E. coli iJR904

Biomass composition: The E. coli biomass composition of iJR904 was used to determine viability of genotypes.

Nutrient metabolites: The set of external metabolites in iJR904 were allowed for uptake/excretion in the global reaction set.

We can implement Flux Balance Analysis (FBA) within this database.

Page 26: Areejit Samal Randomizing metabolic networks

Genome-scale metabolic models: Flux Balance Analysis (FBA)

Inputs to FBA:

• List of metabolic reactions with stoichiometric coefficients

• Biomass composition

• Set of nutrients in the environment

Outputs of FBA:

• Growth rate for the given medium (‘Biomass Yield’)

• Fluxes of all reactions

Advantages: FBA does not require enzyme kinetic information, which is not known for most reactions.

Disadvantages: FBA cannot predict internal metabolite concentrations and is restricted to steady states. Basic models do not account for metabolic regulation.

Reference: Varma and Palsson, Biotechnology (1994); Price et al Nature Reviews Microbiology (2004)
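For orientation, FBA is usually written as a linear program. The statement below is the standard textbook form rather than a formula taken from the slides; the symbols are assumptions introduced here: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, and the bounds encode reaction reversibility and nutrient availability.

```latex
\begin{aligned}
\max_{v} \quad & v_{\mathrm{biomass}} \\
\text{subject to} \quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \text{for each reaction } j .
\end{aligned}
```

The constraint S v = 0 is the steady-state condition mentioned under the disadvantages: every internal metabolite is produced as fast as it is consumed. A network is counted as able to grow in a given environment when the optimal biomass flux is positive.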

Page 27: Areejit Samal Randomizing metabolic networks

Bit string representation of metabolic networks within the global reaction set

KEGG Database: N = 5870 reactions within KEGG (or global reaction set)

R1: 3pg + nad → 3php + h + nadh
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o
...
RN: ---------------------------------

E. coli: 101.... (Present = 1, Absent = 0), containing r reactions (1’s in the bit string)

The E. coli metabolic network, or any random network of reactions within KEGG, can be represented as a bit string of length N with exactly r entries equal to 1, where r is the number of reactions in the E. coli metabolic network.

Number of such bit strings: $^{6000}C_{900}$

Page 28: Areejit Samal Randomizing metabolic networks

Ensemble R: Random networks within KEGG with same number of reactions as E. coli

To compare with E. coli, we generate an ensemble R of random networks as follows:

Pick random subsets with exactly r (~900) reactions within the KEGG database of N (~6000) reactions, where r is the no. of reactions in E. coli.

Each random network or bit string (e.g. 100...., 001...., 111...., 101....) contains exactly r reactions (1’s in the string), as in E. coli.

The no. of networks possible within KEGG (N = 5870 reactions) with exactly r reactions is huge: ~ $^{6000}C_{900}$

Fixing the no. of reactions is equivalent to fixing genome size.
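Drawing a member of ensemble R amounts to choosing r positions out of N uniformly at random. The Python sketch below is illustrative only: the function name is hypothetical, r = 900 is the approximate value from the slide, and the last line simply counts the decimal digits of the binomial coefficient to show how large the space is.

```python
import math
import random

def random_network(n_universe, r, rng):
    """Bit string of length n_universe with exactly r ones, drawn uniformly."""
    present = set(rng.sample(range(n_universe), r))
    return [1 if i in present else 0 for i in range(n_universe)]

rng = random.Random(42)
N, r = 5870, 900  # r is approximate on the slide (~900)
network = random_network(N, r, rng)

# Number of decimal digits in the count of possible networks, C(6000, 900).
n_digits = len(str(math.comb(6000, 900)))
```

The count of possible networks runs to more than a thousand decimal digits, so exhaustive enumeration is out of the question and sampling is the only practical route.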

Page 29: Areejit Samal Randomizing metabolic networks

Metabolite degree distribution

[Figure: log-log plot of P(k) versus metabolite degree k (k from 1 to 1000, P(k) from 10^-6 to 10^0) for E. coli.]

The degree of a metabolite is the number of reactions in which the metabolite participates in the network.

E. coli: γ = 2.17; most organisms: γ is 2.1-2.3

Reference: Jeong et al (2000); Wagner and Fell (2001)

Page 30: Areejit Samal Randomizing metabolic networks

Metabolite degree distribution

[Figure: log-log plots of P(k) versus metabolite degree k (k from 1 to 1000, P(k) from 10^-6 to 10^0) for E. coli alone and for E. coli together with random networks.]

E. coli: γ = 2.17; most organisms: γ is 2.1-2.3

Random networks in ensemble R: γ = 2.51

Page 31: Areejit Samal Randomizing metabolic networks

Metabolite degree distribution

[Figure: log-log plots of P(k) versus metabolite degree k for E. coli, for random networks, and for the complete KEGG (k from 1 to 1000, P(k) from 10^-8 to 10^0).]

E. coli: γ = 2.17; most organisms: γ is 2.1-2.3

Random networks in ensemble R: γ = 2.51

Complete KEGG: γ = 2.31

Any (large enough) random subset of reactions in KEGG has a power-law degree distribution!

The degree of a metabolite is the number of reactions in which the metabolite participates in the network.
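As a sketch of how metabolite degrees, P(k) and a power-law exponent could be computed, the Python below counts degrees from a reaction list and applies a continuous maximum-likelihood estimator to synthetic data. The estimator is an assumption made here for illustration; the talk does not say how the exponents quoted on these slides were fitted, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

def metabolite_degrees(reactions):
    """Degree of a metabolite = number of reactions it participates in."""
    deg = Counter()
    for metabolites in reactions:
        for m in set(metabolites):
            deg[m] += 1
    return deg

def degree_distribution(deg):
    """P(k): fraction of metabolites with degree k."""
    counts = Counter(deg.values())
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}

def powerlaw_exponent_mle(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of the exponent in
    p(x) ~ x^(-exponent), using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

reactions = [
    ["atp", "glc-D", "adp", "g6p", "h"],
    ["g6p", "f6p"],
    ["atp", "f6p", "adp", "fdp", "h"],
]
deg = metabolite_degrees(reactions)
pk = degree_distribution(deg)

rng = random.Random(0)
# Synthetic check: Pareto samples with shape a have density ~ x^-(a + 1).
samples = [rng.paretovariate(1.2) for _ in range(20000)]
gamma_hat = powerlaw_exponent_mle(samples)
```

On real degree data, which are discrete and noisy in the tail, the choice of x_min and of the estimator affects the fitted exponent, so quoted values are best compared only when fitted the same way.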

Page 32: Areejit Samal Randomizing metabolic networks

Reaction Degree Distribution

[Figure: histogram of frequency (0 to 3000) versus number of substrates in a reaction (0 to 12), with a legend for reactions containing 0, 1, 2, 3, 4, 5, 6 or 7 currency metabolites.]

The degree of a reaction is the number of metabolites that participate in it.

In each reaction, we have counted the number of currency and other metabolites, as shown in the figure. Most reactions involve 4 metabolites, of which 2 are currency metabolites and 2 are other metabolites.

The reaction degree distribution is bell shaped, with a typical reaction in the network involving 4 metabolites. It is very different in nature from the metabolite degree distribution, which follows a power law.

[Figure: schematic of a reaction R involving two other metabolites (A, B) and two currency metabolites (atp, adp).]
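The counting described on this slide can be sketched as follows. The reactions are the three glycolysis steps shown earlier in the talk, and the currency list {atp, adp, h} is an assumption made here for illustration; the function name is hypothetical.

```python
from collections import Counter

def reaction_degree_profile(reactions, currency):
    """For each reaction, return (degree, number of currency metabolites)."""
    profile = {}
    for name, metabolites in reactions.items():
        members = set(metabolites)
        profile[name] = (len(members), len(members & currency))
    return profile

reactions = {
    "HEX1": ["atp", "glc-D", "adp", "g6p", "h"],
    "PGI": ["g6p", "f6p"],
    "PFK": ["atp", "f6p", "adp", "fdp", "h"],
}
currency = {"atp", "adp", "h"}  # assumed list, for illustration only
profile = reaction_degree_profile(reactions, currency)
# Histogram of reaction degrees across the network.
histogram = Counter(degree for degree, _ in profile.values())
```

Reaction degrees stay within a narrow range because a single reaction can only involve a handful of metabolites, whereas a metabolite such as atp can appear in a very large number of reactions.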

Page 33: Areejit Samal Randomizing metabolic networks

Scale-free versus Scale-rich network: Origin of Power laws in metabolic networks

Tanaka and Doyle have suggested a classification of metabolites into three categories based on their biochemical roles:
(a) ‘Carriers’ (very high degree)
(b) ‘Precursors’ (intermediate degree)
(c) ‘Others’ (low degree)

It has been argued that gene duplication and divergence at the level of protein interaction networks can lead to the observed power laws in metabolic networks. However, the presence of ubiquitous (high degree) ‘currency’ metabolites along with low degree ‘other’ metabolites in each reaction can explain the observed fat tail in the metabolite degree distribution.

[Figure: schematic of a reaction R involving two other metabolites (A, B) and two currency metabolites (atp, adp).]

Reference: Tanaka (2005); Tanaka, Csete, and Doyle (2008)

Page 34: Areejit Samal Randomizing metabolic networks

Network Properties: Given number of valid biochemical reactions

[Figure: four panels comparing the ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli. Panels: Pc, the probability that a path exists between two nodes (axis 0.0 to 0.8); Largest Strong Component, shown as the fraction of nodes in the LSC and in LSC + IN + OUT (axis 0.0 to 1.0); Clustering Coefficient <C> (axis 0.00 to 0.10); Path Length <L> (axis 0 to 10).]

Page 36: Areejit Samal Randomizing metabolic networks

Constraint II: Fix the number of metabolites

Direct sampling of randomized networks with the same no. of reactions and metabolites as in E. coli is not possible.

Markov Chain Monte Carlo (MCMC) sampling of randomized networks

KEGG Database: N = 5870 reactions within KEGG reaction universe

R1: 3pg + nad → 3php + h + nadh
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o
...
RN: ---------------------------------

A reaction swap keeps the number of reactions in the network constant.

Accept/Reject criterion of reaction swap: Accept if the no. of metabolites in the new network is less than or equal to that in E. coli.

Step 1: Starting from E. coli, perform 10^6 swaps to erase memory of the initial network.
Step 2: Store a network every 10^3 swaps (saving frequency). The stored networks form Ensemble RM.

[Figure: chain of bit strings (10101.., 11001.., 10110.., 11100.., 00101..) starting from E. coli, with each proposed swap marked Accept or Reject.]
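The reaction-swap sampling scheme can be sketched in Python on a toy reaction universe. This is an illustration under assumptions, not the code used in the study: the universe of 30 hypothetical reactions, the generic accept callback and the small burn-in and saving intervals are all choices made here so that the example runs quickly.

```python
import random

def count_metabolites(network, universe):
    """Number of distinct metabolites used by the reactions in 'network'."""
    used = set()
    for idx in network:
        used |= universe[idx]
    return len(used)

def mcmc_sample(universe, start, accept, n_burn, n_save, n_samples, seed=0):
    """Reaction-swap Markov chain. 'start' is a set of reaction indices.
    Each step proposes swapping one present reaction for one absent
    reaction, so the number of reactions stays constant. A proposal is
    kept only if accept(candidate) is True."""
    rng = random.Random(seed)
    current = set(start)
    absent = set(range(len(universe))) - current
    samples = []
    total_steps = n_burn + n_save * n_samples
    for step in range(1, total_steps + 1):
        out_rxn = rng.choice(sorted(current))
        in_rxn = rng.choice(sorted(absent))
        candidate = (current - {out_rxn}) | {in_rxn}
        if accept(candidate):
            current = candidate
            absent = (absent - {in_rxn}) | {out_rxn}
        # After burn-in, store the current network every n_save steps.
        if step > n_burn and (step - n_burn) % n_save == 0:
            samples.append(frozenset(current))
    return samples

# Toy universe: 30 hypothetical reactions over 12 hypothetical metabolites.
rng = random.Random(1)
universe = [frozenset(rng.sample(range(12), 3)) for _ in range(30)]
start = set(range(8))
max_metabolites = count_metabolites(start, universe)
samples = mcmc_sample(
    universe, start,
    accept=lambda net: count_metabolites(net, universe) <= max_metabolites,
    n_burn=200, n_save=20, n_samples=10,
)
```

Because the acceptance test is passed in as a function, further constraints, such as restricting swaps to unblocked reactions or requiring growth in a given environment, could be added by changing only that callback.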

Page 37: Areejit Samal Randomizing metabolic networks

Network Properties: After fixing number of metabolites

[Figure: four panels comparing the ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli. Panels: Pc, the probability that a path exists between two nodes (axis 0.0 to 0.8); Largest Strong Component, shown as the fraction of nodes in the LSC and in LSC + IN + OUT (axis 0.0 to 1.0); Clustering Coefficient <C> (axis 0.00 to 0.10); Path Length <L> (axis 0 to 10).]

Page 39: Areejit Samal Randomizing metabolic networks

Constraint III: Exclude Blocked Reactions

[Figure: KEGG and Unblocked KEGG.]

Large-scale metabolic networks typically have certain dead-end or blocked reactions which cannot contribute to growth of the organism.

Blocked reactions have zero flux for every investigated chemical environment under any steady-state condition and do not contribute to biomass growth flux.

We next exclude blocked reactions from the biochemical universe used for sampling randomized networks.

For blocked reactions, see Burgard et al Genome Research (2004)

Page 40: Areejit Samal Randomizing metabolic networks

MCMC Sampling: Exclusion of blocked reactions

Accept/Reject criterion of reaction swap:

• Restrict the reaction swaps to the set of unblocked reactions within KEGG.

• Accept if the no. of metabolites in the new network is less than or equal to that in E. coli.

Step 1: Starting from E. coli, perform 10^6 swaps to erase memory of the initial network.
Step 2: Store a network every 10^3 swaps (saving frequency). The stored networks form Ensemble uRM.

[Figure: chain of bit strings (10101.., 11001.., 10110.., 11100.., 00101..) drawn from the KEGG reaction universe (N = 5870 reactions), starting from E. coli, with each proposed swap marked Accept or Reject.]

Page 41: Areejit Samal Randomizing metabolic networks

Network Properties: After excluding blocked reactions

[Figure: four panels comparing the ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli. Panels: Pc, the probability that a path exists between two nodes (axis 0.0 to 0.8); Largest Strong Component, shown as the fraction of nodes in the LSC and in LSC + IN + OUT (axis 0.0 to 1.0); Clustering Coefficient <C> (axis 0.00 to 0.10); Path Length <L> (axis 0 to 10).]

Page 43: Areejit Samal Randomizing metabolic networks

Constraint IV: Growth Phenotype

Bit string and equivalent network representation of genotypes

Definition of Phenotype: Flux Balance Analysis (FBA) determines each genotype's ability to synthesize biomass components, and hence its viability.

Genotype | Ability to synthesize biomass components | Viability
GA (011....) | Growth | Viable genotype
GB (001...., with one reaction deleted, marked x) | No Growth | Unviable genotype

For FBA, see Price et al (2004) for a review

Page 44: Areejit Samal Randomizing metabolic networks

MCMC Sampling: Growth phenotype in one environment

Accept/Reject criterion of reaction swap:

• Restrict the reaction swaps to the set of unblocked reactions within KEGG.

• Accept if (a) the no. of metabolites in the new network is less than or equal to that in E. coli, and (b) the new network is able to grow on the specified environment.

Step 1: Starting from E. coli, perform 10^6 swaps to erase memory of the initial network.
Step 2: Store a network every 10^3 swaps (saving frequency). The stored networks form Ensemble uRM-V1.

[Figure: chain of bit strings (10101.., 11001.., 10110.., 11100.., 00101..) drawn from the KEGG reaction universe (N = 5870 reactions), starting from E. coli, with each proposed swap marked Accept or Reject.]

Page 45: Areejit Samal Randomizing metabolic networks

Network Properties: Growth phenotype in one environment

[Figure: four panels comparing the ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli. Panels: Pc, the probability that a path exists between two nodes (axis 0.0 to 0.8); Largest Strong Component, shown as the fraction of nodes in the LSC and in LSC + IN + OUT (axis 0.0 to 1.0); Clustering Coefficient <C> (axis 0.00 to 0.10); Path Length <L> (axis 0 to 10).]

Page 47: Areejit Samal Randomizing metabolic networks

Network Properties: Growth phenotype in multiple environments

[Figure: four panels comparing the ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli. Panels: Pc, the probability that a path exists between two nodes (axis 0.0 to 0.8); Largest Strong Component, shown as the fraction of nodes in the LSC and in LSC + IN + OUT (axis 0.0 to 1.0); Clustering Coefficient <C> (axis 0.00 to 0.10); Path Length <L> (axis 0 to 10).]

Page 49: Areejit Samal Randomizing metabolic networks

Global structural properties of real metabolic networks: Consequence of simple biochemical and functional constraints

[Figure, with an axis labelled: Increasing level of constraints]

Samal and Martin, PLoS ONE (2011) 6(7): e22295

Conjectured by: A. Wagner (2007) and B. Papp et al. (2008)

Page 51: Areejit Samal Randomizing metabolic networks

High level of genetic diversity in our randomized ensembles

Any two random networks in our most constrained ensemble uRM-V10 differ in ~60% of their reactions.

[Figure: bit strings 101.... and 100...., comparing E. coli versus a random network in ensemble uRM-V10.]

Hamming distance between the two networks is ~60% of the maximum possible between two bit strings.

There is a set of ~100 reactions which are present in all of our random networks.

[Figure: bit strings 101.... and 100...., comparing a random network in ensemble uRM-V10 versus another random network in ensemble uRM-V10.]
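The Hamming distance comparison can be sketched in Python. The normalization by 2r, the largest distance two bit strings with r ones each can have when r is at most N/2, is an assumption about what "maximum possible" means here; the function names and the toy bit strings are illustrative.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def normalized_distance(a, b):
    """Hamming distance divided by the largest value it can take for two
    bit strings that each contain r ones (2r, assuming r <= N/2)."""
    r = sum(a)
    if sum(b) != r:
        raise ValueError("both networks must have the same number of reactions")
    return hamming_distance(a, b) / (2 * r)

net_a = [1, 0, 1, 1, 0, 0, 0, 0]
net_b = [1, 0, 0, 0, 1, 1, 0, 0]
d = hamming_distance(net_a, net_b)
frac = normalized_distance(net_a, net_b)
```

In the toy example the two networks share one of their three reactions, so the distance is two thirds of its maximum.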

Page 52: Areejit Samal Randomizing metabolic networks

Necessity of MCMC sampling

Embedded sets like Russian dolls: Ensemble R, Ensemble Rm, Ensemble uRm, Ensemble uRm-V1, Ensemble uRm-V5, Ensemble uRm-V10.

[Figure: fraction of genotypes that are viable (log scale, 10^0 down to 10^-20) versus number of reactions in genotype (2000 to 2800), for Glucose, Acetate and Succinate, together with an analytic prefactor.]

Including the viability constraint on the first chemical environment leads to a reduction by at least a factor of 10^22.

Reference: Samal et al, BMC Systems Biology (2010)

Page 53: Areejit Samal Randomizing metabolic networks

Summary

We have proposed a new method based on MCMC sampling and FBA to generate randomized metabolic networks by successively imposing a few macroscopic biochemical and functional constraints in our sampling criteria.

With only these few macroscopic constraints, we were able to approach the properties of real metabolic networks.

We conclude that biological function is a main driver of structure in metabolic networks.

Page 54: Areejit Samal Randomizing metabolic networks

Limitations and Future Outlook

The list of reactions in KEGG is incomplete and its use may reflect evolutionary constraints associated with natural organisms.

From a synthetic biology context, there could be many more reactions that can occur which are not in KEGG.

We can extend the study to many organisms using the SEED database: automated reconstruction of metabolic models for >140 organisms, on which FBA can be performed.

An alternate approach was proposed by Basler et al, Bioinformatics (2011): it generates new mass-balanced reactions but does not impose a functionality constraint (no constraint of biomass production).

Page 55: Areejit Samal Randomizing metabolic networks

Genotype networks: A many-to-one genotype to phenotype map

Genotype → Phenotype
RNA Sequence → (folding) → Secondary structure of pairings
Protein Sequence → Three dimensional structure
Gene network → Gene expression levels

Reference: Fontana & Schuster (1998); Lipman & Wilbur (1991); Ciliberti, Martin & Wagner (2007)

Genotype network refers to the set of (many) genotypes that have a given phenotype, and their “organization” in genotype space. In the context of RNA, genotype networks are commonly referred to as neutral networks.

Page 57: Areejit Samal Randomizing metabolic networks


Properties of the genotype-to-phenotype map

Are there many genotypes that have a given phenotype?

Do the genotypes with a given phenotype form a significant fraction of all possible genotypes?

Are all the genotypes with a given phenotype rather similar?

Are biological genotypes atypically robust compared to random genotypes with similar phenotype?

Are biological genotypes atypically innovative?

Can genotypes change gradually in response to selective pressures?

We have considered these questions in the context of metabolic networks.

Genotype = lists of metabolic enzymes or reactions
Phenotype = growth or not in a given environment

A. Wagner, Nat. Rev. Genet. (2008)

Page 58: Areejit Samal Randomizing metabolic networks


Computational tool for Evolutionary Systems Biology

Page 59: Areejit Samal Randomizing metabolic networks

Acknowledgement

Andreas Wagner and João Rodrigues, University of Zurich
Jürgen Jost, Max Planck Institute for Mathematics in the Sciences
Olivier C. Martin, Université Paris-Sud

Funding

CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG GDRE 513) for a Postdoctoral Fellowship.

Reference

Other Work

Genotype networks in metabolic reaction spaces. Areejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner*. BMC Systems Biology 2010, 4:30

Environmental versatility promotes modularity in genome-scale metabolic networks. Areejit Samal, Andreas Wagner* and Olivier C Martin*. BMC Systems Biology 2011, 5:135