Systems Biology Model Semantics and Integration

Allyson Lister

Biology, Neurosciences & Computing Group

Newcastle University

29 July 2011

This presentation is licensed under Creative Commons BY-SA 2.5

Background in Standards

In the beginning, there was syntax: TrEMBL, FuGE, SBML

...then came content: MIGS/MIMS

...and ultimately, semantics: OBI, SBO, BioPAX

Background in Integration

Throughout, an interest in data integration:

Redundancy removal in TrEMBL

International Protein Index (IPI)

FuGE-based metadata database and data storage (SyMBA)

Semantic data integration

Rule-Based Mediation (RBM)

Integrate data from multiple data sources into a single, core ontology for reasoning, querying and data extraction back to a chosen (non-OWL) format

RBM (continued)

Resolution of syntactic and semantic heterogeneity occurs separately

The core ontology is a semantically-rich description of the research domain of interest

Syntactic ontologies pass data to the core via SWRL

Syntactic ontologies are either syntactic translations of data formats into OWL or pre-existing OWL ontologies

Mainly uses existing, independent ontologies and off-the-shelf libraries and applications
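
The two-step idea above (syntactic instances first, then rules that lift them into the core) can be sketched in a few lines. This is only an illustration under loud assumptions: dictionaries stand in for OWL individuals, plain Python functions stand in for SWRL rules, and every class and property name other than BioPAX's `displayName` is hypothetical. It is not the RBM implementation.

```python
# Illustrative sketch only: dictionaries stand in for OWL individuals, and
# Python functions stand in for SWRL rules. Class names are hypothetical.

def sbml_species_rule(inst):
    """Map a protein-like species from an SBML-derived source to the core."""
    if inst["class"] == "sbml:Species" and inst.get("kind") == "protein":
        return {"class": "core:Protein", "name": inst["name"]}
    return None


def biopax_protein_rule(inst):
    """Map a BioPAX protein to the same core class."""
    if inst["class"] == "biopax:Protein":
        return {"class": "core:Protein", "name": inst["displayName"]}
    return None


RULES = [sbml_species_rule, biopax_protein_rule]


def mediate(syntactic_instances, rules=RULES):
    """Apply every rule to every source instance; keep distinct core instances."""
    core = []
    for inst in syntactic_instances:
        for rule in rules:
            mapped = rule(inst)
            if mapped is not None and mapped not in core:
                core.append(mapped)
    return core


# Records as they might look after the syntactic step (made-up example data).
sources = [
    {"class": "sbml:Species", "kind": "protein", "name": "Cdc13"},
    {"class": "biopax:Protein", "displayName": "Cdc13"},
    {"class": "sbml:Species", "kind": "metabolite", "name": "ATP"},
]

core_instances = mediate(sources)
print(core_instances)
```

In this toy run the two source records describing the same protein collapse into one core instance, while the record that no rule covers is simply left out of the core.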

RBM (continued)

RBM is not:

Ontology alignment. Alignment is often a prelude to ontology merging, and is used where domains at least partly intersect. We do not intend to merge ontologies, and each data source may be very different from another.

Format reconciliation. We are not trying to create a single, overarching format, just to quickly pull data from many formats.

Systems Biology and RBM

Add information to models:

Add new interactions/pathways to existing models

Add new biological annotation to existing models

Build skeleton models of requested interactions/pathways
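
As a rough illustration of what a "skeleton model" could look like on export, the sketch below builds a bare SBML Level 2 Version 4 document with Python's standard `xml.etree.ElementTree`. The species, reaction and compartment identifiers are invented for the example, and the function `skeleton_sbml` is hypothetical; the real export step in RBM is not shown in these slides, and the output is not claimed to pass full SBML validation.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def tag(name):
    """Qualify an element name with the SBML namespace."""
    return f"{{{SBML_NS}}}{name}"


def skeleton_sbml(model_id, species, reactions):
    """Build a bare-bones SBML string (hypothetical helper, for illustration).

    species:   list of species ids
    reactions: list of (reaction id, reactant ids, product ids)
    """
    ET.register_namespace("", SBML_NS)  # serialise SBML as the default namespace
    root = ET.Element(tag("sbml"), {"level": "2", "version": "4"})
    model = ET.SubElement(root, tag("model"), {"id": model_id})

    compartments = ET.SubElement(model, tag("listOfCompartments"))
    ET.SubElement(compartments, tag("compartment"), {"id": "cell"})

    species_list = ET.SubElement(model, tag("listOfSpecies"))
    for sid in species:
        ET.SubElement(species_list, tag("species"),
                      {"id": sid, "compartment": "cell"})

    reaction_list = ET.SubElement(model, tag("listOfReactions"))
    for rid, reactants, products in reactions:
        reaction = ET.SubElement(reaction_list, tag("reaction"), {"id": rid})
        lhs = ET.SubElement(reaction, tag("listOfReactants"))
        for sid in reactants:
            ET.SubElement(lhs, tag("speciesReference"), {"species": sid})
        rhs = ET.SubElement(reaction, tag("listOfProducts"))
        for sid in products:
            ET.SubElement(rhs, tag("speciesReference"), {"species": sid})

    return ET.tostring(root, encoding="unicode")


# Made-up interaction used purely as example input.
xml_text = skeleton_sbml(
    "skeleton_demo",
    ["Cdc13", "Stn1", "Cdc13_Stn1"],
    [("binding", ["Cdc13", "Stn1"], ["Cdc13_Stn1"])],
)
print(xml_text)
```

A skeleton like this carries only structure (which species and reactions exist); kinetics, units and annotation would still have to be added afterwards.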

RBM Overview

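[Figure: RBM overview diagram. Labels recovered from the slide: data sources UniProtKB, CellML, Pathway Commons, BioModels, ...; source formats XML and OWL; syntactic ontologies MFO and BioPAX; steps "Resolve syntactic heterogeneity", "Core ontology instances", "Resolve semantic heterogeneity, reasoning, querying", "Retrieve data", "Export to SBML, other formats if required".]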

Why have an OWL intermediary where converters exist?

Converters between BioPAX and SBML, and between CellML and SBML, exist

They are lossy (due to the different scopes of each format)

Might not get what we need from such conversion

Why have an OWL intermediary where converters exist? (continued)

Converters between BioPAX and SBML, and between CellML and SBML, exist

With SWRL rules we can pull information for exactly those portions of each format we're interested in

Not dependent upon external developers if the meaning or structure of a format changes

Easier to change rules (especially for web applications or novices) and re-run mappings than to rewrite hard-coded Java, Perl, etc.
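
The point about changing rules rather than code can be illustrated with mappings held as plain data. In this sketch (an assumption-laden stand-in, not SWRL and not the RBM code), each rule is a tuple of source class, source property, core class and core property; apart from BioPAX's `displayName` and `comment` properties, all names are hypothetical.

```python
# Rules as data: (source class, source property, core class, core property).
# Editing this table changes the mapping; the engine below stays untouched.
RULE_TABLE = [
    ("biopax:Protein", "displayName", "core:Protein", "name"),
    ("cellml:Component", "name", "core:ModelComponent", "name"),
]


def apply_rule_table(instances, rule_table):
    """Generic engine: copy the selected property of each matching instance."""
    mapped = []
    for inst in instances:
        for src_cls, src_prop, core_cls, core_prop in rule_table:
            if inst.get("class") == src_cls and src_prop in inst:
                mapped.append({"class": core_cls, core_prop: inst[src_prop]})
    return mapped


# Made-up source record for the example.
data = [{"class": "biopax:Protein", "displayName": "Rad9", "comment": "checkpoint"}]

before = apply_rule_table(data, RULE_TABLE)

# "Changing a rule" is one extra row, followed by a re-run of the mapping.
extended_rules = RULE_TABLE + [("biopax:Protein", "comment", "core:Protein", "note")]
after = apply_rule_table(data, extended_rules)

print(before)
print(after)
```

Only the table differs between the two runs, which is the property that makes rule files easier to hand to a web application or a novice than compiled converter code.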

The core ontology

Ideally, a core ontology should be a tightly-scoped ontology describing the domain of interest

Multiple core ontologies can be created as necessary to address multiple biological questions

We began with an ontology describing the basics of telomere uncapping

Sharing common concepts among core ontologies

To make it easier to swap out core ontologies, use a common ontology from which all can inherit

BioPAX Level 3 (and perhaps the SBPAX3 extension) is being considered for my research

Such an ontology can be selectively enriched with the biological information of interest

Only a small number of domain-specific SWRL rules would be needed with each new core ontology

Visible face of RBM

Saint pulls suggested MIRIAM annotation and possible interactors from web services

Syntactic: integration of data, or direct querying of web services based on query strings built from the SBML/CellML models

Semantic: Saint will also pull information out of RBM-integrated data
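
A minimal sketch of building query strings from a model, assuming an SBML Level 2 Version 4 input and a placeholder endpoint (`https://example.org/search` and the `query` parameter are made up; Saint's actual web service calls are not shown in these slides). It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Tiny made-up model used as example input.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="demo">
    <listOfSpecies>
      <species id="s1" name="Cdc13" compartment="nucleus"/>
      <species id="s2" name="Rad9" compartment="nucleus"/>
      <species id="s3" compartment="nucleus"/>
    </listOfSpecies>
  </model>
</sbml>"""


def species_names(sbml_text):
    """Return the name of every species element that has one."""
    root = ET.fromstring(sbml_text)
    names = []
    for species in root.iter(f"{{{SBML_NS}}}species"):
        name = species.get("name")
        if name:
            names.append(name)
    return names


def build_queries(names, base_url="https://example.org/search"):
    """Turn each species name into a URL-encoded query (placeholder endpoint)."""
    queries = []
    for name in names:
        params = urlencode({"query": name})
        queries.append(f"{base_url}?{params}")
    return queries


queries = build_queries(species_names(SBML_TEXT))
for url in queries:
    print(url)
```

Species without a usable name are skipped here; in practice they are exactly the ones where integrated data and reasoning may help more than a string query.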

Reasoning

Not much reasoning over BioPAX yet, though as a component of my core ontology this will be coming soon

Reasoning over MFO models is quick, which is to be expected given the (deliberate) relative lack of complexity

Reasoning (continued)

Reasoning and querying over the core ontology have already discovered new annotations as well as possible identifications of unknown species in SBML models

Reasoning tends to be slower than I'd like, although much of it can be done behind the scenes and the results stored for later queries (i.e. with SQWRL)
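
The "reason once, query later" pattern can be sketched without any OWL tooling. The example below is a deliberately simple stand-in: it materialises class membership through subclass links once and then answers queries from the stored result. It is not an OWL reasoner and not SQWRL, and the class hierarchy and instance names are hypothetical.

```python
def infer_types(subclass_of, instance_types):
    """Materialise every class each instance belongs to via subclass links."""
    inferred = {}
    for instance, direct_classes in instance_types.items():
        seen = set()
        stack = list(direct_classes)
        while stack:
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(subclass_of.get(cls, []))
        inferred[instance] = seen
    return inferred


# Hypothetical core-ontology fragment and asserted instance types.
SUBCLASS_OF = {
    "core:Protein": ["core:PhysicalEntity"],
    "core:PhysicalEntity": ["core:Entity"],
}
INSTANCE_TYPES = {
    "Cdc13": ["core:Protein"],
    "ATP": ["core:PhysicalEntity"],
}

# The slow step runs once, behind the scenes; its result is stored.
STORE = infer_types(SUBCLASS_OF, INSTANCE_TYPES)


def instances_of(cls, store=STORE):
    """Cheap lookup against the stored inferences; no reasoning at query time."""
    return sorted(name for name, classes in store.items() if cls in classes)


print(instances_of("core:Protein"))
print(instances_of("core:PhysicalEntity"))
```

The stored result has to be recomputed whenever the ontology or the integrated data changes, which is the usual trade-off of materialising inferences in advance.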

Many interesting projects

Model annotation for synthetic biology: Goksel Misirli and others

BioPAX and SBML: SBPAX3 and other work by Oliver Ruebenacker and others

EBI conversion between BioPAX and SBML

RBM using both as data sources

Many interesting projects (continued)

SBML and OWL:

MFO

SBMLHarvester by Robert Hoehndorf and others

CellML and OWL

Related Work from Us

Model annotation for synthetic biology: http://dx.doi.org/10.1093/bioinformatics/btr048

Rule-Based Mediation: http://cisban-silico.cs.ncl.ac.uk/RBM/ http://dx.doi.org/10.1186/2041-1480-1-S1-S3

MFO: http://cisban-silico.cs.ncl.ac.uk/MFO/ doi:10.2390/biecoll-jib-2007-80

Related Work from Us (continued)

SyMBA: http://symba.sf.net

Saint: http://saint-annotate.sf.net http://dx.doi.org/10.1093/bioinformatics/btp523

Other Related Work

SBPAX3: http://sourceforge.net/apps/mediawiki/biopax/index.php?title=SBPAX3

SBMLHarvester: http://bioonto.gen.cam.ac.uk/sbmlharvester/

SBML to BioPAX conversion (sbml2biopax): http://www.ebi.ac.uk/compneur-srv/sbml/converters/SBMLtoBioPax.html

CellML and OWL (Wimalaratne et al.): doi:10.1093/bioinformatics/btp391

Thank you!

And thanks also to Phil Lord and Neil Wipat, my PhD supervisors

Biology, Neurosciences & Computing Group at the Computing Science Department, Newcastle University

CISBAN

BBSRC

Contact Me

Contact me: @allysonlister

http://themindwobbles.wordpress.com

06/29/11