
Volume 2 No.7, JULY 2011 ISSN 2079-8407

Journal of Emerging Trends in Computing and Information Sciences

©2010-11 CIS Journal. All rights reserved.

http://www.cisjournal.org


Pune University Metabolic Pathway Engineering (PuMPE) Resource

A.S. Kolaskar (1), Shweta Kolhi (2)

(1) KIIT University, Bhubaneswar - 751024, India. (2) Bioinformatics Center, University of Pune, Pune - 411007, India.


ABSTRACT

PuMPE is a comprehensive resource that provides integrated information on the metabolomes of bacterial systems. Genome data are annotated to infer metabolic pathways using in-house tools and web-based sources. PuMPE introduces a novel approach to metabolic categorization, and it is the first resource to provide a metabolome-based tree computed by comparing metabolomes across bacteria. PuMPE holds metabolic pathway information for 581 bacteria with completely sequenced genomes. Michaelis constant (Km) values, catalytic site data and 3D structures of enzymes are integrated and made available on one platform. The open-source relational database management system MySQL is used at the backend, and the software used to visualize structures and pathway interactions is also open source. The resource is updated regularly with minimal human intervention. It is user friendly and provides unique integrated information for carrying out metabolic pathway engineering. It is available at http://115.111.37.202/mpe/

Keywords: Metabolic pathways database, pathway interactions, metabolome-based tree, metabolic categorization

1. INTRODUCTION

Advances in instrumentation over the last two decades have led to an exponential increase in biological data. These data take the form of genome and proteome sequences, microarray gene expression profiles, metabolome data describing metabolic pathways, and so on. Many public-domain databases catalogue this information systematically, and these databases can be general or specific in nature. NCBI [1], EBI [2] and Ensembl [3] are examples of public-domain databases holding general molecular biology information, whereas the Stanford Microarray Database [4], the Catalytic Site Atlas [5] and miRBase, the microRNA database [6], are examples of databases holding specific biological data. These static databases contain enormous amounts of information on genes and proteins. However, continuous dynamic interaction with the environment is an important property of any living system, and hence there is a need for a comprehensive resource that provides information on the dynamic interactions between genes, proteins and ligands. A metabolic pathway database is one example of a resource that captures such dynamic interactions.

Metabolism is one of the better-documented biological processes and represents an interacting network of genes. Metabolic pathway databases such as BioCyc [7] and KEGG PATHWAY [8] provide extensive organism-specific pathway data. However, they do not include data on how pathways interact with one another within the metabolome, or on how organisms are related on the basis of their metabolomes. Furthermore, enzyme kinetics data, which are important for metabolic pathway engineering, are also absent from these databases, although such information reflects the behaviour of an organism. To study the biology of an organism at the molecular level in a holistic manner, these data must be catalogued systematically in a user-friendly form from which knowledge can then be extracted. Software tools are also needed to analyse the data in the database and extract knowledge relevant to the user. Such databases and software tools become important resources, helping users build their own programs to engineer metabolic pathways and to design new biological species or cells. The Pune University Metabolic Pathway Engineering (PuMPE) resource is one such attempt.

2. PuMPE DESCRIPTION

The Pune University Metabolic Pathway Engineering (PuMPE) resource holds primary as well as derived data useful for metabolic pathway engineering. The database includes metabolic pathway information for bacteria whose genomes are fully sequenced. PuMPE contains all the information available in KEGG PATHWAY for fully sequenced bacteria, organized according to the BioCyc ontology. In addition to the KEGG PATHWAY data, several new primary and derived data types have been added to increase the utility of the resource. Unique features of PuMPE include metabolic pathway categorization, a metabolome-based tree, visualization of the interactions of each pathway with the rest of the metabolome, and Michaelis constant (Km) values [10]. PuMPE also provides information on catalytic sites [5], 3D structures of enzymes [9], choke point enzymes, dynamic links to the literature database PubMed, and more. The data are organized in a MySQL relational database at the backend, with a user-friendly front end. PuMPE currently holds metabolic pathway information on 581 bacteria with completely sequenced genomes, covering 1750 pathways and 10201 reactions.


2.1 Structure and implementation of PuMPE

PuMPE consists of modules for data acquisition and curation. The data are organized in the relational database management system MySQL; the schema is given in Figure 1. PuMPE is composed of 11 linked tables, whose names and contents are given in Table 1.

Figure 1: Schema of PuMPE

Table 1: Names and contents of tables in PuMPE

The query system was developed using ASP, and the user-friendly web interface was built in HTML with JavaScript. Parsing, annotation and data updates have been automated to minimize human intervention. The workflow of data collection and analysis in PuMPE is given in Figure 2: data are fetched from various sources, and the genomes are annotated with tools developed in-house to identify pathways and to analyse them to extract knowledge.

Figure 2: PuMPE workflow

In addition to these primary data elements, derived data elements such as the categorization of metabolic pathways and the metabolome-based bacterial relationship tree are also incorporated (see Figure 1 for the schema of the database part of the PuMPE resource).

2.2 Data acquisition

Bacterial genome sequences are obtained from the nucleic acid sequence repository at the NCBI server [1]. Information on metabolic pathway ontology and pathway enzymes is obtained from BioCyc [7]. The latest release of the PDB is used to obtain 3D structures of the enzymes [9]. Reaction kinetics data and enzyme catalytic site data are obtained from BRENDA [10] and the Catalytic Site Atlas (CSA) [5], respectively. Drug target data specific to bacterial systems were retrieved from DrugBank [11]. Homology models were built in-house using Insight II with a distance-dependent dielectric constant and without explicit water molecules.

2.3 Data annotation and curation

The usefulness and quality of any data resource depend on the accuracy and currency of the data in its database. In PuMPE, special care is taken to improve the annotation and curation of the data. Pathway annotations for each bacterium are enriched using the following approach.

An enzyme in a pathway is provisionally considered present if the query sequence has a hit, meeting a bit-score threshold of 100 and an E-value threshold of 0.05, to an annotated enzyme that belongs to a closely related species and is known to be present in the same pathway. Such a sequence is then checked for a catalytic site identical to that of the reference enzyme. If both results are positive, the Shotgun method [12] is used to confirm the presence of the enzyme in the pathway. A pathway is marked as present in a bacterium only if all of its enzymes are found to be present. This approach identified additional pathways, which are included in PuMPE and marked with an asterisk (*).
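The enzyme-level and pathway-level rules above can be sketched as follows. This is a minimal illustration, not PuMPE's actual code: the `EnzymeEvidence` record and both functions are hypothetical, the similarity search, catalytic-site comparison and Shotgun confirmation are assumed to have been run beforehand, and the sketch assumes the thresholds are inclusive (bit score at least 100, E-value at most 0.05).

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds from the text; treating them as inclusive bounds is an assumption.
MIN_BIT_SCORE = 100.0
MAX_E_VALUE = 0.05


@dataclass
class EnzymeEvidence:
    """Hypothetical evidence record for one candidate enzyme in one bacterium."""
    bit_score: float            # best hit to an annotated enzyme of a closely related species
    e_value: float              # E-value of that hit
    catalytic_site_match: bool  # catalytic site identical to that of the reference enzyme
    shotgun_confirmed: bool     # presence confirmed with the Shotgun method


def enzyme_present(ev: EnzymeEvidence) -> bool:
    """An enzyme counts as present only if all three checks pass."""
    similar = ev.bit_score >= MIN_BIT_SCORE and ev.e_value <= MAX_E_VALUE
    return similar and ev.catalytic_site_match and ev.shotgun_confirmed


def pathway_present(enzymes: dict[str, EnzymeEvidence]) -> bool:
    """A pathway is marked present only if every one of its enzymes is present."""
    return bool(enzymes) and all(enzyme_present(ev) for ev in enzymes.values())


# Illustrative values only, keyed by EC number.
pathway = {
    "2.7.1.1": EnzymeEvidence(250.0, 1e-40, True, True),
    "5.3.1.9": EnzymeEvidence(180.0, 1e-20, True, True),
}
print(pathway_present(pathway))  # True
pathway["5.3.1.9"] = EnzymeEvidence(90.0, 1e-5, True, True)  # bit score below 100
print(pathway_present(pathway))  # False
```

Returning False for a pathway with no mapped enzymes is a design choice of this sketch: such a pathway cannot be confirmed by the all-enzymes rule.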

2.4 Data Visualization

Data visualization is done at three different levels:

i) 3D structures of enzymes, whether determined experimentally and available in the PDB or predicted using Insight II, are visualized using Jmol, an open-source Java molecular viewer.

ii) An in-house visualizer, written in Java, displays 2D and 3D structures of metabolites.

iii) The "JavaScript Information Visualization Toolkit" is used to visualize the interactions of an individual pathway with the remaining pathways through common compounds.

2.5 Search and retrieval of data from the database

Users can search for enzymes, compounds and pathways.

Enzymes can be searched by:

• EC number (the four-part Enzyme Commission number)
• Enzyme name
• CAS number (Chemical Abstracts Service registry number)

Compounds can be searched by:

• Name
• Formula
• CAS number

The complete list and total number of pathways present in any bacterium can be obtained by selecting the bacterium of interest from a drop-down box (Figure 3). For a pathway present in a bacterium, navigation proceeds logically from pathway information to enzyme information, which includes 3D structures, PROSITE patterns [13], dynamically generated PubMed links, Km values, amino acid residues in the catalytic site and homology models where available, and finally to the nucleotide and protein sequences of the enzymes in the pathway selected by the user.

Figure 3: Retrieval of metabolic pathways from bacteria

3. UTILITIES AT PuMPE

The usefulness of a database increases when analysis utilities are also provided, so that users can extract knowledge from the data. With this aim, the following analysis tools were developed and incorporated into the resource.

(a) Comparison of metabolic pathways between two bacteria

Metabolic pathways can be compared between two bacteria, and the presence or absence of each pathway in one bacterium relative to the other can be identified (Figure 4). This tool is written in ASP with JavaScript and uses the unique ID of each pathway to compare and report the presence or absence of a pathway.

Figure 4: Comparison between metabolic pathways from two organisms.
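Because each pathway carries a unique ID, the comparison in (a) reduces to set operations on the two bacteria's pathway-ID sets. A minimal sketch, assuming each bacterium's pathways are available as a set of ID strings; the function name and the IDs in the example are hypothetical.

```python
from __future__ import annotations


def compare_pathways(pathways_a: set[str], pathways_b: set[str]) -> dict[str, set[str]]:
    """Classify pathway IDs as shared by two bacteria or present in only one of them."""
    return {
        "present_in_both": pathways_a & pathways_b,  # intersection
        "only_in_a": pathways_a - pathways_b,        # present in A, absent in B
        "only_in_b": pathways_b - pathways_a,        # present in B, absent in A
    }


# Hypothetical pathway IDs for two bacteria.
bacterium_a = {"PWY-1", "PWY-2", "PWY-3"}
bacterium_b = {"PWY-2", "PWY-3", "PWY-4"}
for key, ids in compare_pathways(bacterium_a, bacterium_b).items():
    print(key, sorted(ids))
```

Comparing IDs rather than pathway contents relies on the ontology assigning the same ID to the same pathway in every organism, which is the property the text attributes to the BioCyc ontology.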






(b) Metabolic categorization

The organization of the metabolome into categories begins with identifying the core pathways. Core pathways are identified by comparing unique pathway IDs among 94 bacteria having 250 annotated metabolic pathways; the pathway IDs present in all 94 bacteria are taken as core pathways. Forty-two core pathways were identified that are common to every one of the 94 bacteria considered in this analysis [14]. These form Stage I of metabolic categorization, its starting point. The remaining pathways in every bacterium are then categorized according to the direct or indirect interaction of each with the core (Stage I) pathways. Two pathways are defined to interact if they share at least one common compound. The categorization utility therefore compares the compound IDs of each Stage I pathway with those of each remaining pathway, and pathways sharing a compound ID with a Stage I pathway are categorized as Stage II pathways. Applying the same logic to the newly categorized pathways and the remaining pathways, the tool categorizes the metabolome iteratively. The process stops when no common compounds exist between the newly categorized pathways and the remaining pathways. The interactions of pathways in different categories are documented in PuMPE and can be browsed (Figure 5) and visualized (Figure 6). As depicted in Figure 6, interacting pathways in two categories are connected by an edge that displays their common compound.
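The two steps of metabolic categorization, finding the core pathways and then assigning stages iteratively through shared compounds, can be sketched as below. This is an illustrative reading of the description, not the PuMPE utility itself; the function names and the input layout (each pathway ID mapped to the set of compound IDs it contains) are assumptions. The iteration amounts to a breadth-first expansion outward from the Stage I pathways.

```python
from __future__ import annotations


def core_pathways(profiles: dict[str, set[str]]) -> set[str]:
    """Stage I: pathway IDs present in every bacterium of the reference set."""
    return set.intersection(*profiles.values()) if profiles else set()


def categorize(stage1: set[str], pathways: dict[str, set[str]]) -> list[set[str]]:
    """
    Assign the pathways of one bacterium to Stage I, Stage II, ...

    `pathways` maps each pathway ID to its compound IDs; two pathways interact
    if they share at least one compound. Pathways that are never reached
    (no chain of common compounds back to Stage I) are left out.
    """
    stages = [set(stage1) & set(pathways)]   # core pathways present in this bacterium
    remaining = set(pathways) - stages[0]
    frontier = stages[0]                      # pathways categorized in the last round
    while frontier and remaining:
        frontier_compounds = set().union(*(pathways[p] for p in frontier))
        next_stage = {p for p in remaining if pathways[p] & frontier_compounds}
        if not next_stage:
            break                             # no common compounds left: stop
        stages.append(next_stage)
        remaining -= next_stage
        frontier = next_stage
    return stages


# Hypothetical data: three bacteria define Stage I, then one bacterium is categorized.
profiles = {"b1": {"P1", "P2", "P3"}, "b2": {"P1", "P2"}, "b3": {"P1", "P2", "P4"}}
stage1 = core_pathways(profiles)
pathways = {"P1": {"c1", "c2"}, "P2": {"c3"}, "P3": {"c2", "c4"},
            "P4": {"c4", "c5"}, "P5": {"c9"}}
print([sorted(s) for s in categorize(stage1, pathways)])  # [['P1', 'P2'], ['P3'], ['P4']]
```

In this example P5 shares no compound with any categorized pathway, so it stays uncategorized, which matches the stopping rule described above.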

For visualization, in graph-theoretic terms, each pathway is represented as a node, and an edge representing a common compound connects interacting pathway nodes. The visualization is modular, avoiding the complex interconnectivity of large-scale metabolic networks. Pathway interactions are depicted in systematic order, with the query pathway (the pathway for which the user wants to obtain interacting pathways) as the root and the interacting pathways as internal or leaf nodes. Each internal node is connected to its own interacting pathways, and so on, until a leaf node with no further interacting pathways is reached. This simple depiction makes it easy to see how disrupting a particular pathway could affect the global network, which is useful for anticipating the effects of targeting an enzyme with a drug on other metabolic pathways.
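The rooted view described above can be approximated by a breadth-first traversal from the query pathway that links each newly reached pathway to the pathway through which it was found and labels the edge with their common compounds. A minimal sketch under stated assumptions: the input layout and function name are hypothetical, and each pathway is placed in the tree only once so that cycles in the network are not repeated, a detail the text does not specify.

```python
from __future__ import annotations

from collections import deque


def interaction_tree(query: str, pathways: dict[str, set[str]]) -> dict[str, list[tuple[str, set[str]]]]:
    """
    Build a tree rooted at `query` (which must be a key of `pathways`).
    Each node maps to a list of (child pathway ID, common compound IDs on the edge).
    Nodes with an empty child list are leaves.
    """
    tree: dict[str, list[tuple[str, set[str]]]] = {query: []}
    queue = deque([query])
    while queue:
        parent = queue.popleft()
        for other in sorted(pathways):       # sorted for a deterministic layout
            if other in tree:
                continue                     # already placed; avoids cycles
            shared = pathways[parent] & pathways[other]
            if shared:                       # interaction = at least one common compound
                tree[parent].append((other, shared))
                tree[other] = []
                queue.append(other)
    return tree


# Hypothetical pathways and their compound IDs.
pathways = {"P1": {"c1", "c2"}, "P2": {"c3"}, "P3": {"c2", "c4"},
            "P4": {"c4", "c5"}, "P5": {"c9"}}
for node, children in interaction_tree("P3", pathways).items():
    print(node, [(child, sorted(common)) for child, common in children])
```

Pathways with no chain of common compounds back to the query (P2 and P5 here) do not appear in the tree at all.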

Figure 5: Pathway categories

Figure 6: Interactions between different categories of pathways through common compounds.

(c) Bacterial family-wise distribution of metabolic pathways

The distribution of each metabolic pathway within a bacterial family can be studied by selecting the bacterial family and the metabolic pathway of interest from drop-down boxes (Figure 7). The tool directly shows whether the selected pathway is identical, similar or absent across all bacteria belonging to the selected family. To report a pathway as identical, the tool checks that the start compound ID, the intermediate compound IDs and the end compound ID are the same in the reference pathway and in the pathway present in the bacterium. A pathway is instead reported as similar if the start and end compound IDs are the same but the intermediate compound IDs differ. For similar pathways, the utility also provides the alternative pathway's reaction IDs and the corresponding enzymes as hyperlinks.
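The identical/similar test can be sketched by representing each pathway as an ordered list of compound IDs, from the start compound through the intermediates to the end compound. This is a hypothetical illustration, not the tool's code: the list representation, the ordered comparison of intermediates, and the extra "different" label for a pathway whose start or end compound differs (a case the text does not define) are all assumptions.

```python
from __future__ import annotations

from typing import Optional, Sequence


def classify_pathway(reference: Sequence[str], candidate: Optional[Sequence[str]]) -> str:
    """Compare a bacterium's pathway with the reference; both are [start, intermediates..., end]."""
    if not candidate:
        return "absent"      # pathway not found in this bacterium
    if list(candidate) == list(reference):
        return "identical"   # same start, same intermediates, same end
    if candidate[0] == reference[0] and candidate[-1] == reference[-1]:
        return "similar"     # same start and end, different intermediates
    return "different"       # not defined in the text; label assumed by this sketch


# Hypothetical reference pathway and members of one family.
reference = ["C1", "C2", "C3", "C4"]
family = {
    "bacterium_1": ["C1", "C2", "C3", "C4"],
    "bacterium_2": ["C1", "C5", "C4"],
    "bacterium_3": None,
}
report = {name: classify_pathway(reference, path) for name, path in family.items()}
print(report)  # {'bacterium_1': 'identical', 'bacterium_2': 'similar', 'bacterium_3': 'absent'}
```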

Figure 7: Family-wise distribution of pathway

(d) Metabolic pathway profile-based metabolome tree

A metabolome tree based on metabolic pathway profiles is computed to understand the relatedness of metabolomes among bacterial species belonging to the same family. The relationships among the metabolomes of bacteria may be similar to, or different from, the relationships obtained by comparing full genomes or sets of proteins. The order of biochemical reactions in a pathway evolves differently in different organisms and depends on the requirement for products as well as on the delicate balance of intermediates. Pathway evolution is thus a multidimensional process in which the biochemical reactions, their rates and their order are all optimized. Metabolic pathway profiling provides insight into this aspect of the biology of the bacterial species in a family (Figure 8).
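The text does not state which distance measure or tree-building method PuMPE uses. As one possible illustration only, the sketch below treats each bacterium's metabolic pathway profile as a set of pathway IDs, measures dissimilarity with the Jaccard distance, and builds a tree by UPGMA (average-linkage) clustering, returned in Newick notation without branch lengths. Both method choices and all names are assumptions of this sketch.

```python
from __future__ import annotations

from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B| between two pathway-ID profiles."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def upgma_tree(profiles: dict[str, set[str]]) -> str:
    """UPGMA clustering of pathway profiles; returns a Newick string without branch lengths."""
    if not profiles:
        raise ValueError("need at least one profile")
    # cluster label -> (Newick subtree, number of leaves in the cluster)
    clusters = {name: (name, 1) for name in profiles}
    dist = {frozenset(pair): jaccard_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}
    while len(clusters) > 1:
        a, b = sorted(min(dist, key=dist.get))    # closest pair of clusters
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"{a}+{b}"
        # Distance from the new cluster = leaf-count-weighted average of the old distances.
        for c in clusters:
            dist[frozenset((merged, c))] = (
                n_a * dist[frozenset((a, c))] + n_b * dist[frozenset((b, c))]
            ) / (n_a + n_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[merged] = (f"({tree_a},{tree_b})", n_a + n_b)
    tree, _ = next(iter(clusters.values()))
    return tree + ";"


# Hypothetical pathway profiles for three bacteria of one family.
profiles = {
    "A": {"P1", "P2", "P3", "P4"},
    "B": {"P1", "P2", "P3", "P5"},
    "C": {"P1", "P6", "P7", "P8"},
}
print(upgma_tree(profiles))  # ((A,B),C);
```

A presence/absence profile ignores the reaction order and rates that the text highlights; a closer match to that idea would compare pathways at the reaction level, which this sketch does not attempt.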

Figure 8: Metabolome based tree

The resource also provides a list of choke point enzymes for each bacterium, mapped onto its metabolic pathways. Choke points are critical points in metabolic networks: a choke point reaction either uniquely consumes a specific substrate or uniquely produces a specific product [15]. Inactivating a choke point may cause an organism to fail to produce or consume particular metabolites, which could seriously affect its fitness or survival [15]. Choke point enzyme information can therefore be used to identify potential drug targets [16, 17].
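Following the choke point definition of Yeh et al. [15], a reaction is a choke point if it is the only reaction consuming some substrate or the only reaction producing some product. A minimal sketch, assuming reactions are given as hypothetical (substrates, products) pairs keyed by reaction ID; it treats every reaction as irreversible and does not exclude ubiquitous cofactors, simplifications that a real analysis would need to revisit.

```python
from __future__ import annotations

from collections import defaultdict


def choke_point_reactions(reactions: dict[str, tuple[set[str], set[str]]]) -> set[str]:
    """
    `reactions` maps reaction ID -> (substrates, products).
    A reaction is a choke point if it is the only consumer of at least one
    substrate or the only producer of at least one product.
    """
    consumers: dict[str, set[str]] = defaultdict(set)
    producers: dict[str, set[str]] = defaultdict(set)
    for rxn, (substrates, products) in reactions.items():
        for s in substrates:
            consumers[s].add(rxn)
        for p in products:
            producers[p].add(rxn)
    return {
        rxn
        for rxn, (substrates, products) in reactions.items()
        if any(len(consumers[s]) == 1 for s in substrates)
        or any(len(producers[p]) == 1 for p in products)
    }


# Hypothetical network A -> B -> C -> D with two parallel reactions for B -> C.
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"C"}),
    "R4": ({"C"}, {"D"}),
}
print(sorted(choke_point_reactions(reactions)))  # ['R1', 'R4']
```

The choke point enzymes would then be the enzymes catalysing the returned reactions; R2 and R3 are not choke points because each can be bypassed by the other.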

4. DISCUSSION

This resource uses the BioCyc ontology, which has several advantages over the ontology used in KEGG PATHWAY [18]: it treats the smallest pathway as the unit and assigns each such pathway a unique ID, which facilitates the comparison of pathways. PuMPE also has many additional derived data fields that add value to the database and make PuMPE a useful resource for pathway engineering.

The novel aspects of this resource are "metabolic categorization" and the "metabolome-based tree". Metabolic categorization is governed by the interactions of a set of identical pathways (Stage I pathways, present in all completely sequenced, well-annotated bacteria) with the remaining pathways in a bacterium. This has important implications for drug discovery, where complications from adverse drug reactions arise from a lack of complete information about the global interactions of metabolic pathways [19], generally when a drug target plays a role in more than one pathway [19]. Metabolic categorization can be used to identify targets participating in a unique pathway with the least global interaction. Non-interacting pathways from each stage can be potential drug targets with minimal side effects, as they do not interfere with the functioning of the rest of the pathways.

In addition, choke point enzymes are identified, reported in the resource and mapped onto metabolic pathways. Knowledge of metabolic stages and choke point enzymes can help make the drug discovery process more efficient and reliable.

Efficient metabolic engineering has been shown to be achievable by knocking out competing pathways to improve the yield of target metabolites [20, 21, 22]. The knowledge of the global interactions of each pathway in PuMPE can be used to block competing pathways and thus maximize the yield of the required metabolite.

Aligning single or multiple pathways across organisms to infer a metabolome-based tree is known to provide valuable information on the metabolic capabilities of different organisms. Although multiple efforts have been made to infer metabolome-based trees, no single web resource provides this information readily. PuMPE fills this gap by including a metabolome-based tree for each bacterial family. Furthermore, the distribution of each metabolic pathway across the bacteria of a given family can be browsed in PuMPE, with pathways interpreted as identical, similar or absent across the family.

Taken together, PuMPE holds useful information pertaining to systems biology and offers a reliable platform for studying biology in a holistic manner. PuMPE has been developed at the Bioinformatics Centre, University of Pune. Monthly updates of PuMPE are planned. It can be accessed at http://115.111.37.202/mpe/

ACKNOWLEDGEMENT

One of the authors, Shweta Kolhi, acknowledges financial assistance from the Department of Biotechnology - Center of Excellence Scheme, Government of India. The authors also thank Dr. Sangeeta Sawant, Mr. Om Prakash Pandey and Miss Deshpande for their help.

REFERENCES

[1] Sayers E, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, Feolo M, Fingerman IM, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Mizrachi I, Ostell J, Panchenko A, Phan L, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Slotta D, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Wang Y, Wilbur WJ, Yaschenko E, Ye J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 39(Database issue): D38–D51, 2011.

[2] Brooksbank C, Cameron G, Thornton J. The European Bioinformatics Institute's data resources. Nucleic Acids Res. 38(Database issue): D17–D25, 2010.

[3] Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P, Longden I, McLaren W, Overduin B, Pritchard B, Riat HS, Rios D, Ritchie GRS, Ruffier M, Schuster M, Sobral D, Spudich G, Tang YA, Trevanion S, Vandrovcova J, Vilella AJ, White S, Wilder SP, Zadissa A, Zamora J, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suárez XM, Herrero J, Hubbard TJP, Parker A, Proctor G, Vogel J, Searle SMJ. Ensembl 2011. Nucleic Acids Res. 39(Database issue): D800–D806, 2011.

[4] Hubble J, Demeter J, Jin H, Mao M, Nitzberg M, Reddy TB, Wymore F, Zachariah ZK, Sherlock G, Ball CA. Implementation of GenePattern within the Stanford Microarray Database. Nucleic Acids Res. 37(Database issue): D898–D901, 2009.

[5] Porter CT, Bartlett GJ, Thornton JM. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 32: D129–D133, 2004.

[6] Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 36(Database issue): D154–D158, 2008.

[7] Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, Walk TC, Zhang P, Karp PD. The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 36(Database issue): D623–D631, 2008.

[8] Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38: D355–D360, 2010.

[9] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 28: 235–242, 2000.

[10] Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res. 32(Database issue): D431–D433, 2004.

[11] Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 34(Database issue): D668–D672, 2006.

[12] Pegg SC, Babbitt PC. Shotgun: getting more from sequence similarity searches. Bioinformatics 15: 729–740, 1999.

[13] Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJA. The PROSITE database. Nucleic Acids Res. 34: D227–D230, 2006.

[14] Kolaskar AS, Kolhi S. Categorization of Metabolome in Bacterial Systems. Unpublished manuscript in preparation.

[15] Yeh I, Hanekamp T, Tsoka S, Karp PD, Altman RB. Computational analysis of Plasmodium falciparum metabolism: organizing genomic information to facilitate drug discovery. Genome Res. 14: 917–924, 2004.

[16] Perumal D, Lim CS, Sakharkar KR, Sakharkar MK. 'Load Points' and 'Choke Points' as Nodes for Prioritizing Drug Targets in Pseudomonas aeruginosa. Current Bioinformatics 4: 48–53, 2009.

[17] Lee DY, Chung BKS, Yusufi FNK, Selvarasu S. In Silico Genome-Scale Modeling and Analysis for Identifying Anti-Tubercular Drug Targets. Drug Development Research 72: 121–129, 2011.

[18] Green ML, Karp PD. The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Res. 34: 3687–3697, 2006.

[19] Watterson S, Marshall S, Ghazal P. Logic models of pathway biology. Drug Discov Today 13(9–10): 447–456, 2008.

[20] Jarboe LR, Grabar TB, Yomano LP, Shanmugam KT, Ingram LO. Development of ethanologenic bacteria. Adv Biochem Eng Biotechnol 108: 237–261, 2007.

[21] Leonard E, Yan Y, Fowler ZL, Li Z, Lim CG, Lim KH, Koffas MA. Strain improvement of recombinant Escherichia coli for efficient production of plant flavonoids. Mol Pharm 5: 257–265, 2008.

[22] Causey TB, Shanmugam KT, Yomano LP, Ingram LO. Engineering Escherichia coli for efficient conversion of glucose to pyruvate. Proc Natl Acad Sci USA 101: 2235–2240, 2004.
