8
Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike Wittig * , Renate Kania, Martin Golebiewski, Olga Krebs, Saqib Mir, Andreas Weidemann, Henriette Engelken and Isabel Rojas Scientific Databases and Visualization Group, EML Research gGmbH, Heidelberg, Germany E-Mail: * [email protected] Received: 30 th April 2008 / Published: 20 th August 2008 Abstract SABIO-RK is a curated database for the systems biology community containing biochemical reactions and their kinetic properties, the latter being manually extracted from literature sources. This information is crucial for the quantitative understanding of biological systems. Mod- ellers and wet-lab scientists alike require reliable information about reaction kinetics which is normally contained in publications generated worldwide. In SABIO-RK kinetic data are related to reactions, organ- isms and biological locations. The type of kinetic mechanism and corresponding rate equations are presented together with their para- meters and experimental conditions. In order to enable comprehensive understanding, integration and comparison of data it is necessary to provide annotations and links to community resources, such as external databases and ontologies that augment the content and the semantics of the SABIO-RK database entries. In this short paper we will present SABIO-RK ( http://sabio.villa-bosch.de/SABIORK) and our approach towards integration and annotation of the kinetic data and their respec- tive biochemical context. 85 http://www.beilstein-institut.de/escec2007/proceedings/Wittig/Wittig.pdf ESCEC, September 23 rd – 26 th , 2007, Ru ¨ desheim/Rhein, Germany

Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

Beilstein-Institut

Integration and Annotation of Kinetic

Data of Biochemical Reactions in

SABIO-RK

Ulrike Wittig*, Renate Kania, Martin Golebiewski,

Olga Krebs, Saqib Mir, Andreas Weidemann,

Henriette Engelken and Isabel Rojas

Scientific Databases and Visualization Group, EML Research gGmbH,Heidelberg, Germany

E-Mail: *[email protected]

Received: 30th April 2008 / Published: 20th August 2008

Abstract

SABIO-RK is a curated database for the systems biology community

containing biochemical reactions and their kinetic properties, the latter

being manually extracted from literature sources. This information is

crucial for the quantitative understanding of biological systems. Mod-

ellers and wet-lab scientists alike require reliable information about

reaction kinetics which is normally contained in publications generated

worldwide. In SABIO-RK kinetic data are related to reactions, organ-

isms and biological locations. The type of kinetic mechanism and

corresponding rate equations are presented together with their para-

meters and experimental conditions. In order to enable comprehensive

understanding, integration and comparison of data it is necessary to

provide annotations and links to community resources, such as external

databases and ontologies that augment the content and the semantics of

the SABIO-RK database entries. In this short paper we will present

SABIO-RK (http://sabio.villa-bosch.de/SABIORK) and our approach

towards integration and annotation of the kinetic data and their respec-

tive biochemical context.

85

http://www.beilstein-institut.de/escec2007/proceedings/Wittig/Wittig.pdf

ESCEC, September 23rd – 26th, 2007, Rudesheim/Rhein, Germany

Page 2: Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

Introduction

For the understanding of complex biochemical processes, the simulation and modelling of

biochemical reactions and complex networks require reliable kinetic information about the

individual reaction steps within the process. Information such as the kinetic laws describing

the dynamics of the reactions with their respective parameters determined under certain

experimental conditions is of paramount importance. These kinetic data are mainly found

in the literature and described in many different formats. Within the publications no con-

trolled vocabulary is used for the representation of biochemical reactions, kinetic data and

environmental conditions. The integration of these data needs the definition and use of

standards for reporting and exchanging.

SABIO-RK extends and supplements the information content of other databases containing

kinetic information (e. g. BRENDA [1], BioModels [2], and JWS Online [3]) by storing

highly interrelated information about biochemical reactions and their kinetics. It includes

reactants and modifying compounds (enzymes, cofactors, inhibitors or activators) of reac-

tions. The kinetic laws with their parameters and information about experimental conditions

are connected with the reaction information. The data about biochemical reactions, their rate

equations and parameters can be exported in SBML (Systems Biology Markup Language)

[4] file format.

The STRENDA [5] (Standards for Reporting Enzymology Data) commission is working on

the definition of a standard for reporting on enzyme activity. The standard should contain the

minimum amount of information that should accompany any published enzyme activity

data. The use of references to controlled vocabularies and ontologies is also of great

importance for the implementation of the STRENDA guidelines.

In this paper we describe new implementations in SABIO-RK and in the input interface to

match the defined requirements of STRENDA.

Data Integration

The data contained in SABIO-RK are extracted from different sources, in order to establish a

broad information basis. Most of the reactions, their associations with biochemical pathways

and their enzymes are automatically extracted from the KEGG database [6]. Information

about chemical compounds is extended additionally by data from ChEBI [7] or

PubChem [8].

Kinetic data in SABIO-RK are mainly extracted from literature and inserted manually using

a web-based input interface to enter the data into a temporary database. The temporary

database is used by biological experts to curate the data and to insert them into the final

SABIO-RK database. The main objective of the input interface is to supply a uniform format

86

Wittig, U. et al.

Page 3: Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

for users entering and curating the data found in publications. Apart from this, automatic

control mechanisms have been implemented to check for errors and inconsistencies during

the integration process. The systems check, for example: whether all reaction participants of

a reaction equation have to be defined as compound species; if all parameters in a rate

equation have to be defined in the parameter list independent of whether or not the para-

meter value is known; if the rate equations are mathematically correct; and which (if

applicable) parameter type should be related to a compound species (for example, a Km

value always should be related to a compound, however Vmax or kcat do not need a species

assignment).

To avoid errors and inconsistencies within the SABIO-RK database during the first input

process, the interface offers lists of controlled vocabularies for the selection of values for the

following data:

. Species (compounds search function)

. Species roles (e. g. substrate, product, modifier...)

. Biochemical reactions (search functions)

. Organisms, tissues and cellular locations

. Kinetic law types (e.g. Competitive inhibition or Sequential ordered Bi Bi)

. Parameter types (e. g. Km, kcat, Vmax, Ki, Kd, rate constant)

. Parameter units (e. g. mM, mM, 1/s, nmol/min)

Based on information in the literature, detailed information about the protein catalysing the

reaction is inserted. This includes information about a specific isoenzyme or mutations used

in the experiments (e. g. wildtype, mutant E540K, wildtype isoenzyme A), UniProt [9]

accession numbers and information about the composition of subunits (e. g. (Q6UG02)*4

for a homotetramer).

The data format defined in the input interface fits most of the current requirements of the

STRENDA commission for reporting enzymology data. So SABIO-RK offers the structure

for inserting and storing data required for a complete description of an experiment (Level 1,

List A) which comprise, for example, information about the enzyme (EC number, biological

location, isoenzyme information etc.) and the assay conditions. The description of enzyme

activity data (Level 1, List B) includes kinetic parameters necessary for enzyme function

definition (Vmax, kcat, Km etc.), information about enzyme inhibition or activation and the

description of the kinetic mechanisms including rate equations.

Another possible way to insert kinetic data into SABIO-RK is by using an XML-based

integration tool to import a higher amount of kinetic data automatically.

87

Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK

Page 4: Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

Data Annotation

Scientific communication needs standards and a shared vocabulary to avoid misinterpreta-

tions. Such standards are especially important when gathering data from different sources.

To identify entities or terms unambiguously and to facilitate the search, interpretation and

comparison of data, these are standardized to a uniform format and structure. This implies

the usage and development of controlled vocabularies. Entities and expressions in SABIO-

RK are annotated to other resources and biological ontologies to clarify the biological terms

used and to embed the data into its context. This enables users to collect further information

through links to external databases and facilitates the integration of different database entries

into kinetic models.

Biological ontologies used in SABIO-RK are ChEBI, Gene Ontology (GO) [10], Systems

Biology Ontology (SBO) [11], and NCBI taxonomy [12]. ChEBI is a dictionary and onto-

logical classification of small chemical compounds. Gene Ontology comprises controlled

vocabularies for molecular functions, biological processes and cellular components of gene

products. Systems Biology Ontology defines controlled vocabularies for systems biology,

especially in the context of computational modelling. NCBI taxonomy is a controlled

vocabulary and complex classification system of organisms. These controlled vocabularies

and ontologies are the basis for the definition of the selection lists for compounds, organ-

isms, kinetic law types, parameter types etc. in the SABIO-RK input interface.

Links to external databases in the user interface (Fig. 1) allow the user to connect to other

data sources to get further information about the selected entry. Annotations are shown for

chemical compounds to KEGG, ChEBI and PubChem. Enzymes are linked through their EC

number to ExPASy ENZYME [13], KEGG, IntEnz [14], IUBMB [15] and Reactome [16].

Further links to KEGG are implemented for the reaction equation and the information source

is referenced to PubMed [17]. Proteins or protein complexes catalysing the reactions are

linked to the specific UniProt accession number(s). These annotations of proteins or protein

complexes to UniProt are done manually by biological experts based on information in the

publication.

88

Wittig, U. et al.

Page 5: Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

Figure 1. Database entry containing links to external databases and controlled voca-

bularies.

Many annotations to external databases and controlled vocabularies defined in the user

interface are also stored in the SMBL file for exporting the information (Fig. 2).

89

Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK

Page 6: Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

Figure 2. SBML file containing annotations to external databases and controlled

vocabularies.

Future Directions

In order to offer the users more references to additional information we are working on the

cross-linking and annotation of the database content to more database resources and ontol-

ogies.

One of the next steps for further SABIO-RK developments is the incorporation of detailed

information about the mechanisms of the reaction to allow the user to obtain information

about the kinetic properties of sub-reactions or binding mechanisms of enzymes and sub-

strates. This includes the visualization of reaction mechanism information in the user-inter-

face.

In the future the input interface will be further adapted to the current and changing require-

ments of the STRENDA commission which could comprise, for example, more detailed and

structured information about experimental conditions and the methodology of measurement,

purification etc. SABIO-RK will participate in further discussions about STRENDA require-

ments and work on their implementation. SABIO-RK could serve as the basis for a general

data input and storage system for experimentalists and modellers.

90

Wittig, U. et al.

Page 7: Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

Summary

SABIO-RK is a curated data resource for modellers of biochemical networks to assemble

information about reactions and their kinetics. It also offers experimentalists the opportunity

to obtain information about biochemical reactions and their kinetics, within the context of

cellular locations, tissues and organisms. The data are extracted automatically from different

databases and kinetic information is manually extracted from literature. The database uses

controlled vocabularies and links to other ontologies or external databases to allow the

comparison of data and to extract additional information from other sources.

Acknowledgements

We would like to thank the Klaus Tschira Foundation as well as the German Research

Council (BMBF) for their funding. We thank all the student helpers, who have contributed

to the population of the database.

References

[1] Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., Schomburg,

D. (2004) BRENDA, the enzyme database: updates and major new developments.

Nucleic Acids Res. 32:D431-D433.

[2] Le Novere, N., Bornstein, B., Broicher, A., Courtot, M., Donizelli, M., Dharuri, H.,

Li, L., Sauro, H., Schilstra, M., Shapiro, B., Snoep, J.L., Hucka, M. (2006) BioMo-

dels Database: a free, centralized database of curated, published, quantitative kinetic

models of biochemical and cellular systems. Nucleic Acids Res. 34:D689-D691.

[3] Olivier, B.G., Snoep, J.L. (2004) Web-based kinetic modelling using JWS Online.

Bioinformatics 20:2143 – 2144.

[4] Hucka, M., Finney, A., Sauro, H. M., Bolouri, H., Doyle, J.C., Kitano, H., Arkin,

A.P., Bornstein, B.J., Bray, D., Cornish-Bowden, A., Cuellar, A.A., Dronov, S.,

Gilles, E.D., Ginkel, M., Gor, V., Goryanin, I.I., Hedley, W.J., Hodgman, T.C.,

Hofmeyr, J.H., Hunter, P.J., Juty, N.S., Kasberger, J.L., Kremling, A., Kummer, U.,

Le Novere, N., Loew, L.M., Lucio, D., Mendes, P., Minch, E., Mjolsness, E.D.,

Nakayama, Y., Nelson, M.R., Nielsen, P.F., Sakurada, T., Schaff, J.C., Shapiro,

B.E., Shimizu, T.S., Spence, H.D., Stelling, J., Takahashi, K., Tomita, M., Wagner,

J., Wang, J. (2003) The systems biology markup language (SBML): a medium for

representation and exchange of biochemical network models. Bioinformatics

19:524 – 531.

[5] STRENDA: http://www.strenda.org

91

Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK

Page 8: Integration and Annotation of Kinetic Data of Biochemical Reactions … · Beilstein-Institut Integration and Annotation of Kinetic Data of Biochemical Reactions in SABIO-RK Ulrike

[6] Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S.,

Katayama, T., Araki, M., Hirakawa, M. (2006) From genomics to chemical geno-

mics: new developments in KEGG. Nucleic Acids Res. 34:D354 –D735.

[7] ChEBI: http://www.ebi.ac.uk/chebi/

[8] PubChem: http://www.ncbi.nlm.nih.gov/sites/entrez? db=pccompound

[9] The UniProt Consortium. The Universal Protein Resource (UniProt). (2008) Nucleic

Acids Res. 36:D190 –D195.

[10] Gene Ontology: http://www.geneontology.org/

[11] Systems Biology Ontology: http://www.ebi.ac.uk/sbo/

[12] NCBI taxonomy: http://www.ncbi.nlm.nih.gov/sites/entrez? db=taxonomy

[13] ExPASy ENZYME: http://www.expasy.org/enzyme/

[14] IntEnz: http://www.ebi.ac.uk/intenz/

[15] IUBMB: http://www.chem.qmul.ac.uk/iubmb/enzyme/

[16] Joshi-Tope, G., Gillespie, M., Vastrik, I., D’Eustachio, P., Schmidt, E., de Bono, B.,

Jassal, B., Gopinath, G.R., Wu, G.R., Matthews, L., Lewis, S., Birney, E., Stein, L.

(2005) Reactome: a knowledgebase of biological pathways. Nucleic Acids Res.

33:Database Issue:D428 –D432.

[17] PubMed: http://www.pubmed.gov

92

Wittig, U. et al.