Volume 2 No.7, JULY 2011 ISSN 2079-8407
Journal of Emerging Trends in Computing and Information Sciences
©2010-11 CIS Journal. All rights reserved.
http://www.cisjournal.org
Pune University Metabolic Pathway Engineering (PuMPE) Resource
1A.S.Kolaskar,
2Shweta Kolhi
1KIIT University, Bhubaneswar - 751024, India. 2Bioinformatics Center, University of Pune, Pune – 411007, India.
[email protected], [email protected]
ABSTRACT
PuMPE is a comprehensive resource that provides integrated information on the metabolome of bacterial systems. Genome
data are annotated to infer metabolic pathways using in-house tools and web-based sources. PuMPE introduces a novel
aspect, metabolic categorization, and is the first resource to provide a metabolome-based tree computed by comparing
metabolomes between bacteria. PuMPE holds metabolic pathway information for 581 bacteria with completely sequenced
genomes. Information on Km (Michaelis constant) values, catalytic site data and 3D structures of enzymes is integrated and
made available on one platform. The open source relational database management system MySQL is used at the backend, and
the software used for visualization of structures and pathway interactions is also open source. Updates are performed
regularly with minimal human intervention. The resource is user friendly and provides unique integrated information for
carrying out metabolic pathway engineering. It is available at http://115.111.37.202/mpe/
Keywords— Metabolic pathways database, pathways interactions, metabolome-based tree, metabolic categorization
1. INTRODUCTION
Advancements in instrumentation over the last
two decades have led to an exponential increase in
biological data. These data take the form of
sequence data for genomes and proteomes, microarray data
for gene expression profiles, metabolome data for
metabolic pathway information, etc. Many public domain
databases catalogue this information in a systematic
manner. These databases can be general or specific in
nature. NCBI [1], EBI [2] and Ensembl [3] are examples
of public domain databases holding general
molecular biology information. The Stanford Microarray
Database [4], the Catalytic Site Atlas [5] and miRBase, the
microRNA database [6], are a few examples of databases
holding specific biological data. These static databases
hold enormous information on genes and proteins. On the
other hand, continuous dynamic interaction with the
environment is an important property of any living system,
and hence there is a need for a comprehensive resource
that provides information on dynamic interactions between
genes, proteins and ligands. A metabolic pathways database
is an example of a resource that captures such dynamic interactions.
Metabolism is one of the better-documented
biological processes and represents an interacting network of
genes. Metabolic pathway databases such as
BioCyc [7] and KEGG PATHWAY [8] provide
extensive information on organism-specific pathway data.
However, they do not include data on the interaction of
pathways among themselves within the metabolome, or on the
relationship between organisms based on their
metabolomes. Further, enzyme kinetics data important for
metabolic pathway engineering are also absent from the
above-mentioned databases. Such information
reflects the behaviour of an organism. To study the biology of an
organism at the molecular level in a holistic manner, it is
necessary to catalogue these data systematically in a user-friendly
form that can then be used to extract knowledge.
There is also a need to develop software tools to analyse the
data in the database and extract knowledge relevant to the
user. Such databases and software tools become
important resources: they help in building user-specific
programs to engineer metabolic pathways and in
designing new biological species or cells. The Pune
University Metabolic Pathways Engineering resource is
one such attempt.
2. PuMPE DESCRIPTION
The Pune University Metabolic Pathways Engineering
(PuMPE) resource holds primary as well as derived data that
are useful for carrying out metabolic pathway
engineering. The database includes metabolic pathway
information for bacterial systems whose genomes are fully
sequenced. PuMPE contains all the
information that is available in KEGG PATHWAY for
fully sequenced bacteria, organized following the BioCyc
ontology. In addition to data from KEGG PATHWAY,
several new primary and derived data are added to
increase the utility of the resource. Unique
features of PuMPE include metabolic pathway
categorization, a metabolome-based tree, visualization of
the interaction of each pathway with the metabolome, and Km
(Michaelis constant) values [10]. PuMPE also provides
information on catalytic sites [5], 3D structures of enzymes
[9], choke point enzymes, and dynamic links to the literature
database PubMed. Data are organized in the relational
database MySQL at the backend, with a user-friendly
front-end. Currently PuMPE holds metabolic pathway
information on 581 bacteria with completely sequenced
genomes. It contains information on 1750 pathways and
10201 reactions.
2.1 Structure and implementation of PuMPE
PuMPE includes a module for data acquisition
and curation. The data are organized in the relational database
management system MySQL; the schema is given in
Figure 1. PuMPE is composed of 11 linked tables, whose
contents are listed in Table 1.
Figure 1: Schema of PuMPE
Table 1: Names and contents of tables in PuMPE
The query system has been developed using ASP.
A user-friendly web interface is designed in HTML
with JavaScript. Parsing, annotation and data
updates have been automated to minimize human
intervention. The workflow of data collection and analysis in
PuMPE is given in Figure 2. As can be seen, data are
fetched from various sources, and genome annotation is
undertaken using tools developed in-house to identify
pathways and to analyse them to extract knowledge.
Figure 2: PuMPE workflow
In addition to these primary data elements,
derived data elements such as the categorization of metabolic
pathways and the metabolome-based bacterial relationship tree
are also incorporated (see Figure 1 for the schema of the database
part of the PuMPE resource).
2.2 Data acquisition
The bacterial genome sequences are obtained
from the repository of nucleic acid sequences available at
the NCBI server [1]. Information on the metabolic pathway
ontology and pathway enzymes is obtained from
BioCyc [7]. The latest PDB release is used to obtain 3D structures of the
enzymes [9]. Reaction kinetics data and enzyme
catalytic site data are obtained from BRENDA [10] and the
Catalytic Site Atlas (CSA) [5], respectively. Drug target
data specific to bacterial systems are retrieved from
DrugBank [11]. Homology models were built in-house using
Insight II with a distance-dependent dielectric constant and
no explicit water molecules.
2.3 Data annotation and curation
The usefulness and quality of any data resource
depend on the accuracy and currency of the data in the
database. In PuMPE special care is taken to improve the
annotation and curation of the data. Enrichment of
pathway annotations for each bacterium is carried out in
PuMPE using the following approach.
An enzyme in a pathway is considered to be
present if the query sequence has a bit score ≥ 100 and an
E-value ≤ 0.05 against an annotated enzyme belonging to a
closely related species and known to be present in the
same pathway. Further analysis is done to check whether such a
sequence has a catalytic site identical to that of the reference
enzyme. If both results are positive, the Shotgun
methodology [12] is used to confirm the presence of the
enzyme in the pathway. A pathway is marked
as present in the bacterium in question only if all the enzymes
in the pathway are found to be present. This
approach helped to identify additional pathways, which are
included in PuMPE and marked with "*".
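The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not the in-house PuMPE tool: the `Hit` record, its field names and the two helper functions are hypothetical, the comparison directions (bit score at least 100, E-value at most 0.05) are assumed from the usual meaning of these scores, and the catalytic-site check and the Shotgun confirmation [12] are represented only by their boolean outcomes.

```python
from dataclasses import dataclass

BIT_SCORE_MIN = 100.0  # threshold quoted in the text (direction assumed: at least)
E_VALUE_MAX = 0.05     # threshold quoted in the text (direction assumed: at most)


@dataclass
class Hit:
    """Hypothetical record: a query sequence compared with a reference enzyme."""
    ec_number: str
    bit_score: float
    e_value: float
    catalytic_site_identical: bool  # outcome of the catalytic-site comparison
    shotgun_confirmed: bool         # outcome of the Shotgun confirmation step


def enzyme_present(hit: Hit) -> bool:
    """Apply the similarity thresholds, then the two confirmation checks."""
    return (
        hit.bit_score >= BIT_SCORE_MIN
        and hit.e_value <= E_VALUE_MAX
        and hit.catalytic_site_identical
        and hit.shotgun_confirmed
    )


def pathway_present(required_ec_numbers, hits) -> bool:
    """A pathway is present only if every required enzyme is confirmed."""
    confirmed = {h.ec_number for h in hits if enzyme_present(h)}
    return set(required_ec_numbers) <= confirmed


hits = [
    Hit("1.1.1.1", 250.0, 1e-30, True, True),
    Hit("2.7.1.1", 80.0, 1e-3, True, True),  # fails the bit-score threshold
]
print(pathway_present(["1.1.1.1"], hits))             # True
print(pathway_present(["1.1.1.1", "2.7.1.1"], hits))  # False
```

The all-or-nothing test in `pathway_present` mirrors the statement that a pathway is marked as present only when all of its enzymes are found.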
2.4 Data Visualization
Data visualization is done at three different levels:
i) 3D structures of enzymes, whether
determined experimentally and available in PDB
or predicted using Insight II,
are visualized using Jmol, an open-source
molecular viewer.
ii) An in-house visualizer has been developed to display
2D and 3D structures of metabolites. This
visualization tool is written in Java.
iii) The "JavaScript Information Visualization
Toolkit" is used to visualize the interaction of
an individual pathway with the remaining pathways
through common compounds.
2.5 Search and retrieval of data from database
Users can search for enzymes, compounds and pathways.
Enzymes can be searched by EC number (the four-part
Enzyme Commission number), enzyme name or CAS number
(Chemical Abstracts Service number).
Compounds can be searched by name, formula or CAS number.
The entire list and total number of pathways present
in any bacterium can be obtained by selecting the bacterium of
interest from a drop-down box (Figure 3). If a particular
pathway is present in a bacterium, a logical navigation path is
provided, beginning with pathway information, followed by
enzyme information (3D structures,
PROSITE patterns [13], dynamically generated PubMed
links, Km values, amino acid residues in the catalytic site, and
homology models wherever available), and finally the
nucleotide and protein sequences of the enzymes in the
pathway selected by the user.
Figure 3: Retrieval of metabolic pathways
from bacteria
3. UTILITIES AT PuMPE
The usefulness of a database increases if analysis
utilities are also provided, so that users can extract
knowledge with them. With this aim, the following
analysis tools were developed and incorporated in the
resource.
(a) The metabolic pathways of two
bacteria can be compared, and the presence or
absence of each pathway in one bacterium relative to the other can
be identified (Figure 4). This tool is written in
ASP with JavaScript. It uses
the unique id of each pathway to compare and
report the presence or absence of a pathway.
Figure 4: Comparison between metabolic pathways
from two organisms.
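Because the comparison relies only on unique pathway ids, it reduces to set operations. The following Python sketch illustrates the idea; the function name, the result keys and the pathway ids are hypothetical and are not taken from the ASP implementation.

```python
def compare_pathways(pathways_a, pathways_b):
    """Compare two collections of pathway ids and report presence/absence."""
    a, b = set(pathways_a), set(pathways_b)
    return {
        "shared": sorted(a & b),          # present in both bacteria
        "only_in_first": sorted(a - b),   # absent from the second bacterium
        "only_in_second": sorted(b - a),  # absent from the first bacterium
    }


# Hypothetical pathway ids, for illustration only.
result = compare_pathways(["PWY-1", "PWY-2", "PWY-3"], ["PWY-2", "PWY-4"])
print(result["shared"])          # ['PWY-2']
print(result["only_in_first"])   # ['PWY-1', 'PWY-3']
print(result["only_in_second"])  # ['PWY-4']
```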
(b) Metabolic categorization
The organization of the metabolome into
categories begins by identifying the
core pathways. Core pathways are identified by
comparing unique pathway ids among 94
bacteria having 250 annotated metabolic
pathways. Pathway ids present in all 94
bacteria are included in the core pathways. In this way 42 core
pathways, common to
each of the 94 bacteria considered for this
analysis, were identified [14]. These form Stage I, the
starting point of metabolic
categorization. The remaining pathways in every
bacterium are then categorized according to
their direct or indirect interaction
with the Core/Stage I
pathways. An interaction between two pathways is
defined by the presence of at least one common
compound. Thus the pathway categorization
utility compares the compound ids of each
Stage I pathway with the compound ids
of each remaining pathway.
Pathways that share compound ids
with Stage I pathways are categorized as
Stage II pathways. Following the same logic of
identifying common compound ids between
newly categorized pathways and the remaining
pathways, the tool categorizes the metabolome
iteratively. The categorization process stops
when no common compounds exist between the
newly categorized pathways and the remaining
pathways. The interactions of pathways present in
different categories are documented in PuMPE and
can be browsed (Figure 5) and visualized (Figure
6). As depicted in Figure 6, each pathway is
represented as a node. Interacting pathways
in two categories are connected through an
edge displaying the common compound.
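The iterative staging described above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the PuMPE utility itself: the function names, the dictionary layout (pathway id mapped to a set of compound ids) and the toy ids are hypothetical.

```python
def core_pathways(organism_pathways):
    """Pathway ids present in every organism (Stage I candidates)."""
    sets = [set(p) for p in organism_pathways.values()]
    return set.intersection(*sets) if sets else set()


def categorize(pathway_compounds, core):
    """Assign the pathways of one bacterium to successive stages.

    pathway_compounds maps pathway id -> set of compound ids.
    Returns (stages, uncategorized): stages[0] is Stage I, stages[1] is
    Stage II, and so on; uncategorized holds pathways never reached.
    """
    stages = [set(core) & set(pathway_compounds)]
    remaining = set(pathway_compounds) - stages[0]
    while remaining:
        # Compounds of the most recently categorized stage.
        frontier = set().union(*(pathway_compounds[p] for p in stages[-1]))
        next_stage = {p for p in remaining if pathway_compounds[p] & frontier}
        if not next_stage:  # no common compounds left: stop
            break
        stages.append(next_stage)
        remaining -= next_stage
    return stages, remaining


# Toy data: P1 is the only core pathway; P4 shares no compound with the rest.
compounds = {
    "P1": {"c1", "c2"},
    "P2": {"c2", "c3"},
    "P3": {"c3", "c4"},
    "P4": {"c9"},
}
stages, uncategorized = categorize(compounds, core={"P1"})
print(stages)         # [{'P1'}, {'P2'}, {'P3'}]
print(uncategorized)  # {'P4'}
```

Each pass compares only the newly categorized pathways with the remaining ones, which matches the stopping condition stated in the text.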
For visualization, in the parlance of graph theory,
each pathway is represented as a node, and an
edge (representing a common compound)
connects interacting pathway nodes. The
visualization is modular in nature, which avoids the
complex interconnectivity of large-scale
metabolic networks. Pathway interactions are
depicted in systematic order, with the query pathway
(the pathway for which the user intends to obtain
interacting pathways) as the root and the
interacting pathways as internal or leaf nodes.
Each internal node is connected to its own
interacting pathways, and so on, until a leaf node is
reached that has no further interacting pathways.
This depiction makes it easy to
see the impact of disrupting a particular
pathway on the global network, which is useful for
assessing the effects of targeting an enzyme with a drug on other
metabolic pathways.
Figure 5: Pathway categories
Figure 6: Interactions between different categories of
pathways through common compound.
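A rooted view of this kind can be produced by a breadth-first traversal from the query pathway. The Python sketch below is one possible reading of the description, with assumptions made explicit: the function name and data layout are hypothetical, and each pathway is placed in the tree only once (the first time it is reached), which the text does not specify.

```python
from collections import deque


def interaction_tree(query, pathway_compounds):
    """Return (parent, child, shared_compounds) edges rooted at the query pathway.

    pathway_compounds maps pathway id -> set of compound ids.
    Assumption: a pathway appears once, attached to the first parent that reaches it.
    """
    visited = {query}
    edges = []
    queue = deque([query])
    while queue:
        parent = queue.popleft()
        for child in sorted(pathway_compounds):
            if child in visited:
                continue
            shared = pathway_compounds[parent] & pathway_compounds[child]
            if shared:  # at least one common compound defines an interaction
                visited.add(child)
                edges.append((parent, child, sorted(shared)))
                queue.append(child)
    return edges


compounds = {
    "P1": {"c1", "c2"},
    "P2": {"c2", "c3"},
    "P3": {"c3", "c4"},
    "P4": {"c9"},
}
for edge in interaction_tree("P1", compounds):
    print(edge)
# ('P1', 'P2', ['c2'])
# ('P2', 'P3', ['c3'])
```

Pathways that never appear as a parent in the returned edges are the leaf nodes; here P4 is not reachable from P1 and is therefore absent from the tree.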
(c) Bacterial family-wise metabolic pathways
distribution
The bacterial family-wise distribution of
each metabolic pathway can be studied by
selecting the bacterial family and metabolic
pathway of interest from the drop-down box
(Figure 7). This tool directly shows whether the selected
pathway is identical, similar or absent across all
bacteria belonging to the selected bacterial
family. To report a pathway as identical, the tool
checks that the start compound id, intermediate
compound ids and end compound id are the same
in the reference pathway and the pathway
present in the bacterium. A pathway is
reported as similar if the start and end compound ids
are the same but the intermediate compound ids
differ between the reference pathway and the
pathway present in the bacterium. Further, if the
pathway is similar, the alternate pathway
reaction ids and the corresponding enzymes are
provided as hyperlinks through this utility.
Figure 7: Family-wise distribution of pathway
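The identical/similar/absent rule can be written compactly. In this hedged Python sketch a pathway is represented as an ordered list of compound ids from start to end; that representation, the function name and the "unclassified" label for the case the text does not cover (different start or end compound) are assumptions made for illustration.

```python
def classify_pathway(reference, candidate):
    """Classify a bacterium's pathway against the reference pathway.

    Both arguments are ordered lists of compound ids (start ... end);
    candidate is None or empty when the bacterium lacks the pathway.
    """
    if not candidate:
        return "absent"
    if list(candidate) == list(reference):
        return "identical"  # same start, intermediates and end
    if candidate[0] == reference[0] and candidate[-1] == reference[-1]:
        return "similar"    # same start and end, different intermediates
    return "unclassified"   # case not described in the text (assumption)


reference = ["c1", "c2", "c3", "c4"]
print(classify_pathway(reference, ["c1", "c2", "c3", "c4"]))  # identical
print(classify_pathway(reference, ["c1", "c7", "c4"]))        # similar
print(classify_pathway(reference, None))                      # absent
```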
(d) Metabolic pathway profile based metabolome
tree
A metabolic pathway profile based metabolome
tree is computed to understand the relatedness of
metabolomes among bacterial species belonging to the
same family. Such relations among the metabolomes of
bacteria may be similar to or different from
the relationships obtained by
comparing full genomes or several proteins. The order
of biochemical reactions in a pathway has evolved
differently in different organisms and depends on the requirement for products
as well as on the delicate balance of intermediates.
Thus pathway evolution is a multidimensional process
in which biochemical reactions, rates of reactions and the
order of reactions are optimised. Metabolic
pathway profiling provides insight into this aspect of the
biology of bacterial species in a family (Figure 8).
Figure 8: Metabolome based tree
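The text does not state which distance measure or tree-building method is used to compute the metabolome-based tree. Purely as an illustration of how pathway profiles could be compared, the Python sketch below computes pairwise Jaccard distances between presence/absence profiles; the choice of Jaccard distance, the function names and the toy profiles are all assumptions. A distance matrix of this kind could then be passed to any standard clustering or tree-building routine.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of pathway ids (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Pairwise distances between organisms; profiles maps name -> pathway ids."""
    names = sorted(profiles)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(profiles[x], profiles[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Hypothetical pathway profiles for three organisms of one family.
profiles = {
    "A": {"P1", "P2"},
    "B": {"P1", "P2"},
    "C": {"P1", "P3"},
}
dist = distance_matrix(profiles)
print(dist[("A", "B")])  # 0.0
print(round(dist[("A", "C")], 3))  # 0.667
```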
Further, the resource provides a list of choke point
enzymes for each bacterium, and these enzymes are
mapped onto metabolic pathways. Choke points are critical
points in metabolic networks. Inactivation of choke points
may lead to an organism's failure to produce or consume
particular metabolites, which could cause serious problems
for the fitness or survival of the organism [15]. Using choke
point enzyme information, potential drug targets can be
identified [16, 17].
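In the work cited as [15], a choke point is a reaction that uniquely consumes a given substrate or uniquely produces a given product. The Python sketch below applies that definition to a toy reaction set. Whether PuMPE uses exactly this procedure is not stated, so the function name, the data layout and the reaction ids should be read as assumptions; reaction reversibility and dead-end metabolites are ignored for brevity.

```python
from collections import defaultdict


def choke_point_reactions(reactions):
    """Return ids of reactions that uniquely consume or uniquely produce a compound.

    reactions maps reaction id -> (substrates, products), each a set of compound ids.
    """
    consumers = defaultdict(set)
    producers = defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        for compound in substrates:
            consumers[compound].add(rid)
        for compound in products:
            producers[compound].add(rid)

    chokes = set()
    for rids in list(consumers.values()) + list(producers.values()):
        if len(rids) == 1:  # only one reaction consumes/produces this compound
            chokes |= rids
    return chokes


# Toy network: R1 is the only consumer of "a" and the only producer of "b".
reactions = {
    "R1": ({"a"}, {"b"}),
    "R2": ({"b"}, {"c"}),
    "R3": ({"b"}, {"c"}),
}
print(choke_point_reactions(reactions))  # {'R1'}
```

Enzymes catalysing the returned reactions would be the choke point enzymes in this simplified picture.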
4. DISCUSSIONS
This resource uses the BioCyc ontology, which
has several advantages over the ontology used in
KEGG PATHWAY [18]: it considers the smallest pathway
as the unit and assigns a unique id to each such pathway. This
helps in the comparison of pathways. PuMPE has many
additional derived data fields that add value to the
database and are essential to make PuMPE a useful
resource for pathway engineering.
The novel aspects of this resource are "Metabolic
categorization" and the "Metabolome-based tree".
Metabolic categorization is governed by the
interactions of a set of identical pathways (Stage I
pathways, present in all completely sequenced,
well-annotated bacteria) with the remaining pathways in a
bacterium. This has important implications for drug discovery,
where complications resulting from adverse drug
reactions arise from a lack of complete
information about the global interaction of metabolic
pathways [19]. This is generally observed when a drug
target plays a role in more than one pathway [19].
Metabolic categorization can be used to identify targets
participating in a unique pathway with the least global
interaction. Non-interacting pathways from each Stage can
be potential drug targets with minimal side effects, as they
do not interfere with the functioning of the rest of the pathways.
Further, choke point enzymes are identified and
reported in the resource. These choke point enzymes are
mapped onto metabolic pathways. Knowledge of the
metabolic Stage and of choke point enzymes will help to
make the drug discovery process more efficient and reliable.
It has been shown that efficient metabolic
engineering can be undertaken by knocking out competing
pathways to improve the yield of target metabolites
[20,21,22]. Knowledge of the global interaction of each
pathway in PuMPE can be used to block the competing
pathways and thus maximize the yield of the required
metabolite.
Pathway alignment of single or multiple pathways
across organisms in order to infer a metabolome-based
tree is known to provide valuable information on the
metabolic capabilities of different organisms. Though
multiple efforts have been made to infer metabolome-based
trees, no web resource provides
this information readily. This void is filled by the
inclusion of a metabolome-based tree for each bacterial
family in PuMPE. Further, the distribution of each metabolic
pathway across bacteria belonging to a given bacterial
family can be browsed in PuMPE to interpret pathways as
identical, similar or absent across the family.
Taken together, PuMPE holds useful
information pertaining to systems biology. It offers a
reliable platform to study biology in a holistic manner.
PuMPE has been developed at the Bioinformatics Centre,
University of Pune. Monthly updates of PuMPE are
planned. It can be accessed at
http://115.111.37.202/mpe/
ACKNOWLEDGEMENT
One of the authors, Shweta Kolhi acknowledges
financial assistance from Department of Biotechnology -
Center of Excellence Scheme, Government of India. The
authors would also like to acknowledge Dr. Sangeeta
Sawant, Mr. Om Prakash Pandey and Miss. Deshpande for
their help.
REFERENCES
[1] Sayers E, Tanya Barrett, Dennis A. Benson, Evan
Bolton, Stephen H. Bryant, Kathi Canese, Vyacheslav
Chetvernin, Deanna M. Church, Michael DiCuccio,
Scott Federhen, Michael Feolo, Ian M. Fingerman,
Lewis Y. Geer, Wolfgang Helmberg, Yuri Kapustin,
David Landsman, David J. Lipman, Zhiyong Lu,
Thomas L. Madden, Tom Madej, Donna R. Maglott,
Aron Marchler-Bauer, Vadim Miller, Ilene Mizrachi,
James Ostell, Anna Panchenko, Lon Phan, Kim D.
Pruitt, Gregory D. Schuler, Edwin Sequeira, Stephen
T. Sherry, Martin Shumway, Karl Sirotkin, Douglas
Slotta, Alexandre Souvorov, Grigory Starchenko,
Tatiana A. Tatusova, Lukas Wagner, Yanli Wang, W.
John Wilbur, Eugene Yaschenko, and Jian Ye.
Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res.
2011 January; 39(Database issue): D38-D51. Published
online 2010 November 20.
[2] Catherine Brooksbank, Graham Cameron, and Janet
Thornton. The European Bioinformatics Institute's
data resources. Nucleic Acids Res. 2010 January;
38(Database issue): D17-D25. Published online 2010
January.
[3] Flicek P, M. Ridwan Amode, Daniel Barrell, Kathryn
Beal, Simon Brent, Yuan Chen, Peter Clapham, Guy
Coates, Susan Fairley, Stephen Fitzgerald, Leo
Gordon, Maurice Hendrix, Thibaut Hourlier, Nathan
Johnson, Andreas Kähäri, Damian Keefe, Stephen
Keenan, Rhoda Kinsella, Felix Kokocinski, Eugene
Kulesha, Pontus Larsson, Ian Longden, William
McLaren, Bert Overduin, Bethan Pritchard, Harpreet
Singh Riat, Daniel Rios, Graham R. S. Ritchie,
Magali Ruffier, Michael Schuster, Daniel Sobral,
Giulietta Spudich, Y. Amy Tang, Stephen Trevanion,
Jana Vandrovcova, Albert J. Vilella, Simon White,
Steven P. Wilder, Amonida Zadissa, Jorge Zamora,
Bronwen L. Aken, Ewan Birney, Fiona Cunningham,
Ian Dunham, Richard Durbin, Xosé M. Fernández-
Suárez, Javier Herrero, Tim J. P. Hubbard, Anne
Parker, Glenn Proctor, Jan Vogel and Stephen M. J.
Searle. Ensembl 2011. Nucleic Acids Research 39
(Database issue): D800-D806. 2011
[4] Hubble J, Demeter J, Jin H, Mao M, Nitzberg M,
Reddy TB, Wymore F, Zachariah ZK, Sherlock G,
Ball CA. Implementation of GenePattern within the
Stanford Microarray Database. Nucleic Acids
Res;37(Database Issue):D898-901. 2009 Jan 1
[5] Craig T. Porter, Gail J. Bartlett, and Janet M.
Thornton. The Catalytic Site Atlas: a resource of
catalytic sites and residues identified in enzymes
using structural data. Nucl. Acids. Res. 32: D129-
D133. 2004
[6] Griffiths-Jones S, Saini HK, van Dongen S, Enright
AJ. miRBase: tools for microRNA genomics. Nucleic
Acids Res 36(Database Issue):D154-D158. 2008
[7] Caspi R, Foerster H, Fulcher CA, Kaipa P,
Krummenacker M, Latendresse M, Paley S, Rhee SY,
Shearer AG, Tissier C, Walk TC, Zhang P, Karp
PD. The MetaCyc Database of metabolic pathways
and enzymes and the BioCyc collection of
Pathway/Genome Databases. Nucleic Acids Res.
Jan;36(Database issue):D623-31. Epub 2007 Oct 27.
2008
[8] Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M.,
and Hirakawa, M.; KEGG for representation and
analysis of molecular networks involving diseases and
drugs. Nucleic Acids Res. 38, D355-D360. 2010.
[9] Berman H.M, J. Westbrook, Z. Feng, G. Gilliland,
T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne
The Protein Data Bank Nucleic Acids Research, 28:
235-242. 2000
[10] Schomburg I, Chang A, Ebeling C, Gremse M, Heldt
C, Huhn G, Schomburg D. "BRENDA, the enzyme
database: updates and major new developments".
Nucleic Acids Res 32 (Database issue): D431–433.
2004
[11] Wishart DS, Knox C, Guo AC, Shrivastava S,
Hassanali M, Stothard P, Chang Z, Woolsey J.
DrugBank: a comprehensive resource for in silico
drug discovery and exploration. Nucleic Acids Res.
Jan 1;34(Database issue):D668-72. 2006
[12] Pegg S.C, Babbitt P.C, Shotgun: getting more from
sequence similarity searches. Bioinformatics 15, 729-
740 1999.
[13] Hulo N., Bairoch A., Bulliard V., Cerutti L., De
Castro E., Langendijk-Genevaux P.S., Pagni M.,
Sigrist C.J.A. The PROSITE database. Nucleic Acids
Res. 34:D227-D230. 2006.
[14] Kolaskar A.S., Kolhi Shweta., Categorization of
Metabolome in Bacterial Systems. Unpublished.
Manuscript under preparation.
[15] Yeh I, Hanekamp T, Tsoka S, Karp PD, Altman RB.
Computational analysis of Plasmodium falciparum
metabolism: organizing genomic information to
facilitate drug discovery. Genome Res. 14, 917-924.
2004
[16] Deepak Perumal, Chu Sing Lim, Kishore R.
Sakharkar and Meena K. Sakharkar. 'Load Points' and
'Choke Points' as Nodes for Prioritizing Drug Targets
in Pseudomonas aeruginosa. Current Bioinformatics,
4, 48-53. 2009
[17] Dong-Yup Lee, Bevan Kai Sheng Chung, Faraaz N.K.
Yusufi, and Suresh Selvarasu. In Silico Genome-Scale
Modeling and Analysis for Identifying Anti-
Tubercular Drug Targets. Drug Development
Research 72: 121-129. 2011
[18] Green M.L, Karp P.D, The outcomes of pathway
database computations depend on pathway ontology.
Nucleic Acids Res. 34, 3687-97. 2006.
[19] Watterson S, Marshall S, Ghazal P. Logic models of
pathway biology. Drug Discov Today. May;13(9-
10):447-56. Epub 2008 Apr 23. Review. 2008
[20] Jarboe LR, Grabar TB, Yomano LP, Shanmugan KT,
Ingram LO. Development of ethanologenic bacteria,
Adv. Biochem. Eng. Biotechnol. 108 , pp. 237–261.
2007
[21] Leonard E, Yan Y, Fowler ZL, Li Z, Lim CG, Lim
KH, Koffas MA. Strain improvement of recombinant
Escherichia coli for efficient production of plant
flavonoids, Mol. Pharm. 5 , pp. 257–265. 2008
[22] Causey TB, Shanmugam KT, Yomano LP, Ingram
LO. Engineering Escherichia coli for efficient
conversion of glucose to pyruvate, Proc. Natl. Acad.
Sci. U. S. A. 101 , pp. 2235–2240. 2004