How Ontologies Add Value
BioPAX: Biological Pathway Data Exchange Ontology
Joanne Luciano
BioPAX Workgroup (biopax.org)
BioPathways Consortium Liaison (biopathways.org)
3 May 2005
KM Pro Forum
Bentley College, Waltham MA, USA
Introduction
BioPAX = Biopathway Exchange Language
Emerged at ISMB
• conceived at ISMB ’01
• born at ISMB ’02
• crawling at ISMB ’03 (Level 0.5)
• walking at ISMB ’04 (Level 1.0)
• now in the “terrible twos”
Ontology Intro
• Natural language does a poor job of conveying complex information without ambiguity
• Ontologies provide a means to give concise, unambiguous meanings to pieces of data from a particular domain
– Thereby facilitating computational operations on the data
• Ontologies are becoming increasingly common in the biological community
– See http://obo.sourceforge.net/obo.htm
Ontology: Components
• Class hierarchy: e.g., chemical, protein
• Relations & attributes: fields (slots) on the classes; slot values can be other classes
• Constraints: define allowable values and connections within an ontology
• Objects: instances of classes
• Values: occupy slots
• Controlled vocabularies (CVs)
• BioPAX will use classes, attributes, constraints, values and CVs. Objects are the user's responsibility
* From Peter Karp, “Ontologies: Definitions, Components, Subtypes”, SRI International, presentation available at http://www.biopax.org
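To make the component list concrete, here is a minimal Python sketch of a toy ontology: a class hierarchy, slots with value constraints, a controlled vocabulary, and user-supplied objects checked against them. All class, slot, and vocabulary names are hypothetical and simplified; they are not drawn from the BioPAX schema.

```python
# Hypothetical toy ontology illustrating the components listed on the slide.
# Class, slot, and vocabulary names are invented; they are not BioPAX terms.

# Class hierarchy: each class maps to its parent class (None marks the root).
CLASS_PARENT = {
    "entity": None,
    "chemical": "entity",
    "protein": "chemical",
    "smallMolecule": "chemical",
}

# Relations & attributes: slots declared per class. Each slot carries a
# constraint: either a Python type or the name of a controlled vocabulary.
SLOTS = {
    "entity": {"name": str},
    "protein": {"cellularLocation": "CV:location"},
}

# Controlled vocabulary (CV): the only terms allowed in a CV-constrained slot.
CONTROLLED_VOCABULARIES = {
    "CV:location": {"cytoplasm", "nucleus", "plasma membrane"},
}


def ancestors(cls):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = CLASS_PARENT[cls]
    return chain


def allowed_slots(cls):
    """Collect the slots a class inherits from its superclasses plus its own."""
    slots = {}
    for c in reversed(ancestors(cls)):
        slots.update(SLOTS.get(c, {}))
    return slots


def validate(obj_class, values):
    """Check an object's slot values against the constraints of its class.

    Returns a list of problems; an empty list means the object is valid.
    """
    problems = []
    slots = allowed_slots(obj_class)
    for slot, value in values.items():
        if slot not in slots:
            problems.append(f"{slot!r} is not a slot of {obj_class!r}")
            continue
        constraint = slots[slot]
        if isinstance(constraint, type):
            if not isinstance(value, constraint):
                problems.append(f"{slot!r} must be of type {constraint.__name__}")
        elif value not in CONTROLLED_VOCABULARIES[constraint]:
            problems.append(f"{value!r} is not a term in {constraint}")
    return problems


# Objects (instances of classes) and their values are supplied by the user.
gsk3b = {"name": "GSK3B", "cellularLocation": "cytoplasm"}
print(validate("protein", gsk3b))  # []
print(validate("protein", {"cellularLocation": "cytosol"}))
```

Because slots are inherited down the hierarchy, a protein picks up the `name` slot declared on `entity`. The constraints and vocabulary live in the ontology, while the objects are left to the user, matching the division of labor described above.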
What is a Pathway?
Depends on who you ask!
• Metabolic Pathways (e.g., Glycolysis)
• Molecular Interaction Networks (e.g., Protein-Protein)
• Signaling Pathways (e.g., Apoptosis)
• Gene Regulation (e.g., Lac Operon)
High Throughput Experimental Methods
Figure: data from PubMed (existing literature), microarray, two-hybrid, mass spectrometry, and genetics experiments (expression, interaction data, protein modifications, function) spread across multiple pathway databases: “Integration Nightmare!”
So many pathway databases…
Each has its own data model, format, and data access methods
Source: Pathway Resource List (http://cbio.mskcc.org/prl/)
Research Community Needs
Pathway Databases: WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, PubGene, GeneWays
Semantic Aggregation, Integration, Inference
(Pedantic Aggravation, Irritation, and Interference)
A Common Exchange Language
Figure: without BioPAX vs. with BioPAX (database, application, user)
• Over 170 DBs and tools
• Common “computable semantic” enables scientific discovery
• Promotes collaboration (big science) and accessibility
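One back-of-the-envelope way to read the without/with comparison, assuming every resource would otherwise need its own one-way translator to every other resource: point-to-point exchange among n resources needs n(n - 1) translators, while a common format needs only one exporter and one importer per resource, 2n. The sketch below simply evaluates both counts; the function names are illustrative.

```python
def translators_point_to_point(n: int) -> int:
    """Directed translators needed if every resource converts directly to every other."""
    return n * (n - 1)


def translators_common_format(n: int) -> int:
    """One exporter to and one importer from the shared format per resource."""
    return 2 * n


for n in (5, 20, 170):
    # Prints: resource count, point-to-point count, common-format count.
    print(n, translators_point_to_point(n), translators_common_format(n))
# 5 20 10
# 20 380 40
# 170 28730 340
```

For the 170 resources mentioned on the slide this gives 28,730 translators versus 340. Not every pair of resources actually exchanges data, so the point-to-point figure is an upper bound; the point is that it grows quadratically while the common-format count grows linearly.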
Closes Gaps in Pathway Data Space
Figure: exchange language domains. Database exchange formats (BioPAX, PSI-MI 2) and simulation model exchange formats (SBML, CellML) set against the types of pathway data: genetic interactions; molecular interactions (Pro:Pro, All:All); interaction networks (molecular and non-molecular: Pro:Pro, TF:Gene, genetic); regulatory pathways, metabolic pathways, and small molecules (each from low to high detail); biochemical reactions; rate formulas.
Design Goals
• Encapsulation: an entire pathway in one record
• Compatible: use existing standards wherever possible
• Computable: from file reading to logical inference
• Successful: buy-in from the research community
Technical Goals
Interoperability
– Integration and exchange of pathway data
– Interchange through a common (standard) representation
– Accommodate existing database representations
– Provide a basis for future databases
– Enable development of tools for searching and reasoning over the data
Development of tools and API to facilitate conversion (libBioPAX)
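As an illustration of what a conversion tool does (this is not libBioPAX, whose API is not described here), the hypothetical Python sketch below converts flat reaction rows from an imagined source database into one nested record per pathway, echoing the “entire pathway in one record” design goal. The input layout and output keys are assumptions, loosely named after the left/right sides of a reaction.

```python
# Hypothetical converter sketch: turns a source database's flat reaction rows
# into one nested, pathway-level record. The input layout, field names, and
# output keys are invented for illustration; this is not the libBioPAX API.
from collections import defaultdict

# Rows as they might be exported from a (hypothetical) source database.
SOURCE_ROWS = [
    {"pathway": "glycolysis", "reaction": "R1", "side": "left", "compound": "glucose"},
    {"pathway": "glycolysis", "reaction": "R1", "side": "left", "compound": "ATP"},
    {"pathway": "glycolysis", "reaction": "R1", "side": "right", "compound": "glucose-6-phosphate"},
    {"pathway": "glycolysis", "reaction": "R1", "side": "right", "compound": "ADP"},
]


def convert(rows):
    """Group rows into one record per pathway, with each reaction's
    participants split into left and right sides."""
    pathways = defaultdict(lambda: defaultdict(lambda: {"left": [], "right": []}))
    for row in rows:
        reaction = pathways[row["pathway"]][row["reaction"]]
        reaction[row["side"]].append(row["compound"])
    # Emit plain dicts: one self-contained record per pathway.
    return {
        name: {
            "type": "pathway",
            "components": [
                {"type": "biochemicalReaction", "id": rid, **sides}
                for rid, sides in reactions.items()
            ],
        }
        for name, reactions in pathways.items()
    }


record = convert(SOURCE_ROWS)
print(record["glycolysis"]["components"][0]["left"])  # ['glucose', 'ATP']
```

A real converter would also carry identifiers, cross-references, and provenance from the source; the sketch only shows the structural step of regrouping rows into a pathway-level record.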
Technical Goals (cont’d)
Why OWL? Why OWL DL?
• Expressivity (biology = “complex relationships”)
• W3C Standard (use existing standards)
– “Semantic Web enabled”
• XML based (the exchange language in computing)
• Machine Computable
– Facilitate integration of knowledge, data, tool development
– Uncover inconsistencies and new knowledge
– OWL DL
  • Enables full reasoning capability for users, from file reading to logical inference
  • Complete: all conclusions are guaranteed to be computed
  • Decidable: all computations will finish in finite time (with OWL Lite, in a short amount of time)
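The slide credits reasoning with uncovering new knowledge and inconsistencies. The toy Python sketch below illustrates both ideas on a hypothetical class hierarchy. It is not an OWL DL reasoner (production OWL DL reasoners handle far richer constructs); it covers only subclass transitivity, type inheritance, and pairwise class disjointness.

```python
# Toy illustration of deriving implied facts and detecting inconsistencies.
# NOT an OWL DL reasoner: only subclass transitivity, type inheritance, and
# pairwise class disjointness are handled. Class names are hypothetical.

SUBCLASS_OF = {
    ("protein", "physicalEntity"),
    ("smallMolecule", "physicalEntity"),
    ("proteinKinase", "protein"),
}
DISJOINT = {frozenset({"protein", "smallMolecule"})}


def superclasses(cls):
    """All classes that cls is (directly or transitively) a subclass of."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def infer_types(asserted):
    """Expand each individual's asserted types with every implied superclass."""
    return {ind: set(types).union(*(superclasses(t) for t in types))
            for ind, types in asserted.items()}


def inconsistencies(inferred):
    """Individuals whose inferred types include a pair of disjoint classes."""
    return sorted(ind for ind, types in inferred.items()
                  if any(pair <= types for pair in DISJOINT))


asserted = {
    "GSK3B": {"proteinKinase"},                        # only this type is stated
    "ATP": {"smallMolecule"},
    "bad_record": {"proteinKinase", "smallMolecule"},  # contradictory annotation
}
inferred = infer_types(asserted)
print(sorted(inferred["GSK3B"]))  # ['physicalEntity', 'protein', 'proteinKinase']
print(inconsistencies(inferred))  # ['bad_record']
```

The first result is “new knowledge” in the slide's sense: nobody stated that GSK3B is a physical entity, but it follows from the hierarchy. The second is an inconsistency that only becomes visible after inference, because `bad_record` was never directly asserted to be both a protein and a small molecule.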
Social Logistics
Get organized
• Make the decision & commitment
• 2 or 3 dedicated individuals to be the contact points
Small core group
– Bi-weekly conference calls, bi-monthly F2F
– Commitment & resources
• Participants willing and able to cover their own costs
• Outside funding (DOE)
Special interests and needs form subgroup task forces
• Core group member(s)
• Outside experts
International representation & participation (Outreach & Community Building)
• Conferences and mailing lists
• Follow-up and individual
Collaborate with complementary/competing representations
Social Logistics (cont’d)
How we engendered buy-in from the field, which made life much easier:
Take things in steps:
• Pathway database vision -> data exchange format as 1st step
• Data exchange format -> release in levels of increasing complexity: Level 1 supports metabolic pathways, Level 2
Early success leads to early adoption, which increases the probability of overall project success.
Get “buy in” and get involvement: this leads to acceptance later
• Support the existing databases (BioCyc, WIT, BIND, etc.)
– Got database sources to agree to participate in the development to ensure that their DBs will be properly represented
• Got database sources to agree to export in the new format once it is defined
Social Logistics (cont’d)
Get “buy in” (continued)
• Community Involvement and Support
– Core group (represents voice of community, small, committed)
– Mailing list
– User community interaction (BioPAX-Boston)
– Subgroups
• International Meetings and Presentations
– Tool developers
– Modelers
– Users (researchers)
– Ontology developers
– Database providers
– Complementary representations (SBML, CellML)
– Like minds
– General community
Implementation of BioPAX
Designed using GKB Editor and Protégé
BioPAX uses OWL to define the “Schema”
BioPAX instances store the data
Technically, an ontology with instance data is a knowledge base
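Since an ontology plus instance data is a knowledge base, one simple way to picture this is a single store of subject-predicate-object triples holding both schema statements and instance statements, queried by pattern matching. The Python sketch below assumes simplified, hypothetical names (not exact BioPAX terms) and uses a hand-rolled matcher rather than a real RDF library or SPARQL engine.

```python
# Hypothetical sketch: schema (class-level) and instance (individual-level)
# statements stored together as subject-predicate-object triples. Names are
# illustrative and simplified; they are not exact BioPAX terms.

SCHEMA = [
    ("pathway", "rdf:type", "owl:Class"),
    ("biochemicalReaction", "rdf:type", "owl:Class"),
    ("smallMolecule", "rdf:type", "owl:Class"),
    ("pathwayComponent", "rdfs:domain", "pathway"),
    ("left", "rdfs:domain", "biochemicalReaction"),
]

INSTANCES = [
    ("glycolysis", "rdf:type", "pathway"),
    ("hexokinaseRxn", "rdf:type", "biochemicalReaction"),
    ("glycolysis", "pathwayComponent", "hexokinaseRxn"),
    ("hexokinaseRxn", "left", "ATP"),
    ("hexokinaseRxn", "left", "glucose"),
    ("ATP", "rdf:type", "smallMolecule"),
]

KNOWLEDGE_BASE = SCHEMA + INSTANCES  # ontology + instance data


def match(pattern, triples):
    """Yield variable bindings for one pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.get(p, t) != t:  # same variable must bind consistently
                    break
                binding[p] = t
            elif p != t:
                break
        else:
            yield binding


def query(patterns, triples):
    """Join several patterns (a conjunctive query) and return all bindings."""
    results = [{}]
    for pattern in patterns:
        new_results = []
        for b in results:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(b.get(term, term) for term in pattern)
            for extra in match(bound, triples):
                new_results.append({**b, **extra})
        results = new_results
    return results


# Which pathways contain a reaction that consumes ATP?
answers = query([("?pw", "rdf:type", "pathway"),
                 ("?pw", "pathwayComponent", "?rxn"),
                 ("?rxn", "left", "ATP")], KNOWLEDGE_BASE)
print(answers)  # [{'?pw': 'glycolysis', '?rxn': 'hexokinaseRxn'}]
```

The same `query` function works on the schema half, e.g. `query([("?c", "rdf:type", "owl:Class")], KNOWLEDGE_BASE)` lists the three declared classes, which is the sense in which schema and data live in one knowledge base.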
BioPAX – Ontology
Level 1: Metabolic Pathways
Creating and Editing
Mapping Pathways to BioPAX
Figure: OWL (schema); instances (individuals); data
Mapping Pathways to BioPAX
Challenges & Bottlenecks
• Scientific
– What’s a pathway? Depends on who you ask.
• Technical
– Each resource has its own syntax & semantics
– Immaturity of tools for data integration
• Social / Logistical
– Community organization and adoption
• Financial
– Mostly volunteer effort by stakeholders
– Dept of Energy
Bridging Chemistry and Molecular Biology
Uniprot:P49841
• Different views have different semantics: lenses
• When there is a correspondence between objects, a semantic binding is possible
Apply correspondence rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
Source: Eric Neumann
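The correspondence rule can be sketched in Python as follows. The record layout (`Obj`) and the LSID strings are hypothetical, and the LSID shown for Uniprot:P49841 is an assumed form rather than a verified identifier. The sketch indexes the BioPAX proteins by xref LSID, so each target is matched with a dictionary lookup instead of a comparison against every protein.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """A hypothetical object from one view, with its cross-reference LSIDs."""
    id: str
    xref_lsids: set = field(default_factory=set)
    corresponds_to: list = field(default_factory=list)


def apply_correspondence_rule(targets, bpx_proteins):
    """For each target, record every BioPAX protein that shares an xref LSID.

    Implements: if ?target.xref.lsid == ?bpx:prot.xref.lsid
                then ?target.correspondsTo ?bpx:prot
    """
    # Index proteins by LSID so each lookup is a dictionary access.
    by_lsid = {}
    for prot in bpx_proteins:
        for lsid in prot.xref_lsids:
            by_lsid.setdefault(lsid, []).append(prot)
    for target in targets:
        for lsid in sorted(target.xref_lsids):
            for prot in by_lsid.get(lsid, []):
                if prot.id not in target.corresponds_to:  # avoid duplicate bindings
                    target.corresponds_to.append(prot.id)
    return targets


# Usage with hypothetical identifiers (assumed LSID form, not verified).
shared = "urn:lsid:uniprot.org:uniprot:P49841"
chem_view = Obj("chem:GSK3B_structure", {shared})
bpx_prot = Obj("bpx:protein_GSK3B", {shared})
other = Obj("bpx:protein_other", {"urn:lsid:example.org:hypothetical:other"})
apply_correspondence_rule([chem_view], [bpx_prot, other])
print(chem_view.corresponds_to)  # ['bpx:protein_GSK3B']
```

Applying the rule twice leaves the bindings unchanged, since a protein already bound to a target is skipped.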
BioPAX increases collaboration and accessibility in the field, and enables 'big science', because it delivers a scalable solution
Captures the complex relationships inherent in biology
Solves some nasty integration problems
Saves a lot of time and money
Enables Computable Biology
BioPAX Supporting Groups
Groups
• Memorial Sloan-Kettering Cancer Center: G. Bader, M. Cary, J. Luciano, C. Sander
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• University of Colorado Health Sciences Center: I. Shah
• BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
• Argonne National Laboratory: N. Maltsev, E. Marland
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
• NIST: R. Goldberg
• Stanford: T. Klein
• Columbia: A. Rzhetsky
• Dana Farber Cancer Institute: J. Zucker
Collaborating Organizations:
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML)
• CellML
• Chemical Markup Language (CML)
Databases
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
• PharmGKB (www.pharmgkb.org)
Grants
• Department of Energy (Workshop)
The BioPAX Community
Thank you!
Questions?