4/1/2008 OWL-ED 2008, Gaithersburg, MD

OWL: PAX of Mind or the AX? Experiences of Using OWL in the Development of BioPAX

Joanne Luciano¹ & Robert Stevens²

¹Harvard Medical School, ²Manchester University

OWL-ED DC, April 1-2, 2008, Gaithersburg, MD, USA

BioPAX

The Vision
- Integrate biological processes data of different types

The Reality
- An abstraction of the different types of processes that enables them to co-exist
- A controlled vocabulary for them

What went wrong?
- No real interest in using OWL features or reasoners
- No real examples of why using them would be of any value

The domain: Biological pathways

Main categories:
• Metabolic Pathways
• Molecular Interaction Networks
• Signaling Pathways

A few technical factors

• the complexity of the language and its syntax
• the open world assumption was foreign to people
• the logical framework was unfamiliar
• the steep learning curve
• the lack of tutorials and examples
• the lack of tools of any quality
• the general lack of experience (new language)
• the BioPAX community did not have a coherent set of requirements

A few social factors

• there was disagreement (two camps, OWL and XML Schema)
• OWL was not seen as necessary by all members of the community and it required considerably more work
• there were existing known methods
• mentality: do enough to get the job at hand done
• human nature: to resist the new or unknown

Why bother?

• Much basic scientific research produces pathway data
  – environmental research, energy research, genetic and clinical research, and virtually all of life science research today
  – At some point, the question is asked “What pathways are involved?”
• Therefore, it is important to provide a mechanism for access to and reuse of these data
  – enable them to have broad impact for science
• The major problem for researchers who use pathway databases has been that the representations of pathway data within these resources are not consistent or interchangeable

At the conceptual level

• In signaling pathways, it is the activation or inhibition of a process (apoptosis)
• In metabolic pathways, a series of chemical reactions transforms a chemical molecule (glucose → pyruvate)

At the syntax level

• HumanCyc’s term: D-glucose-6-phosphate
• KEGG’s term: D-Glucose-6P

It is clear we are referring to the same molecule, i.e. the same real world class of instances. The vocabulary label used to name these instances differs, and while this difference is insignificant for a human reader, it is significant for computational processing.
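
The point about labels can be sketched in a few lines of Python. This is only an illustration: the two labels come from the slide, but the synonym table, the shared key "glucose-6-phosphate", and the function names are hypothetical and not taken from HumanCyc, KEGG, or BioPAX.

```python
# Illustrative only: the synonym table and the shared key are hypothetical,
# not taken from HumanCyc, KEGG, or BioPAX.

humancyc_label = "D-glucose-6-phosphate"
kegg_label = "D-Glucose-6P"

# Naive string comparison treats the two labels as different molecules.
same_by_string = humancyc_label == kegg_label

# A shared vocabulary maps each source-specific label to one agreed key.
SYNONYMS = {
    "D-glucose-6-phosphate": "glucose-6-phosphate",
    "D-Glucose-6P": "glucose-6-phosphate",
}


def canonical(label):
    """Return the agreed key for a label, or None if the label is unknown."""
    return SYNONYMS.get(label)


def same_molecule(label_a, label_b):
    """True only when both labels are known and map to the same key."""
    key_a, key_b = canonical(label_a), canonical(label_b)
    return key_a is not None and key_a == key_b


same_by_vocabulary = same_molecule(humancyc_label, kegg_label)

print(same_by_string)      # False
print(same_by_vocabulary)  # True
```

A human reader makes this mapping without noticing; a program needs it stated explicitly, which is the role a shared controlled vocabulary plays.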


Reasons for choosing OWL-DL

• OWL’s Expressivity

• Future uses: Enable reasoning

Reasons for choosing OWL-DL

• Future uses: Enable reasoning (whatever that means)
  – Future (not us, not now)
  – Reasoning
    • by choosing OWL-DL it would “be enabled”
      – (didn’t really think much about this)

Mistakes in using OWL (nothing new here)

• Bad Conceptualizations
  – confusion about what was being represented
    • biological processes or database records of biological processes
  – Utility class, a concept used in Java, not in biology, was a
• Poor Understanding of OWL
  – Assumed axioms were disjoint
  – Domain and range
  – Open world assumptions and implications
  – Semantics in comments rather than in the ontology
  – What was said in the ontology was not what was meant
• OWL as an EXPORT File Format
• For more details see Luciano and Stevens (2007)
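
Two of the misunderstandings listed above (domain and range, and assumed disjointness) can be illustrated with a toy sketch. It is not a real OWL reasoner and does not reproduce the BioPAX ontology: the `ex:` names, the triples, and the helper functions are all hypothetical. It applies the standard RDFS domain rule once by hand and then checks for clashes only against disjointness axioms that were actually declared.

```python
# Toy illustration only: one hand-written inference step, not an OWL reasoner.
# All ex: names and helper functions are hypothetical.

DOMAIN = "rdfs:domain"
TYPE = "rdf:type"


def infer_domain_types(triples):
    """Apply the RDFS domain rule once: (p domain C) and (s p o) => (s type C)."""
    domains = {(s, o) for (s, p, o) in triples if p == DOMAIN}
    inferred = set()
    for (s, p, o) in triples:
        for (prop, cls) in domains:
            if p == prop:
                inferred.add((s, TYPE, cls))
    return inferred - set(triples)


def types_of(individual, triples):
    """Collect every class the individual is stated or inferred to belong to."""
    return {o for (s, p, o) in triples if s == individual and p == TYPE}


def disjointness_clashes(classes, disjoint_pairs):
    """Return the declared-disjoint pairs that both appear in `classes`."""
    return [(a, b) for (a, b) in disjoint_pairs if a in classes and b in classes]


triples = {
    ("ex:hasParticipant", DOMAIN, "ex:Interaction"),
    ("ex:record42", TYPE, "ex:DatabaseRecord"),
    ("ex:record42", "ex:hasParticipant", "ex:glucose"),
}

# Domain is not a validation constraint: the rule adds a new type instead of
# rejecting the use of ex:hasParticipant on a database record.
new_facts = infer_domain_types(triples)
classes = types_of("ex:record42", triples | new_facts)

# Without a declared disjointness axiom, belonging to both classes is no error.
no_axiom = disjointness_clashes(classes, [])
with_axiom = disjointness_clashes(
    classes, [("ex:DatabaseRecord", "ex:Interaction")]
)

print(new_facts)
print(no_axiom, with_axiom)
```

Under schema-style, closed-world thinking the third triple looks like a data error. Under OWL semantics it simply yields a new inferred fact, and a contradiction surfaces only once disjointness is stated in the ontology rather than assumed or left in comments.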

Social Factors

• BioPathways Consortium enabled Chris Sander by obtaining a funding commitment from the Dept of Energy
• Chris in turn funded 2 people to get the initiative organized (The DEF Group)
• An initial group of stakeholders decided to organize a “core group” for decisions and hold meetings “by-invitation-only”

Social Factors

• Ignorance
  – of the task
  – of how to achieve it
• Internal biases…
  – Which tool
    • first there were none
    • then there were (promises)
    • then there were none
  – XML-Schema vs. OWL

Social Factors

• Understanding of OWL increased
  – Knowledge (papers, tutorials) became increasingly available
  – Tools for OWL became available
• Discovery of mistakes made
• However, remember “future”
  – Pressure to release
• Breakdown
  – in-fighting, undermining, ugliness (more mistakes!)

What Went Right

• Helped the Semantic Web community spread the word about OWL by having a user to point to
• Community outreach helped BioPAX adoption
• BioPAX brought the wider community together
• Created the higher level abstraction that included generalized concepts common to the different “pathway” conceptualizations
  – Upper level ontology for pathways

Looking ahead

Current Starting Point
• multiple OWL syntaxes
• multiple tools
• support materials
• methodologies for development

Looking ahead

• NEED tools to support developers:
  – analyze the semantic complexity needed to support use cases
  – facilitate development in a staged process with increasing complexity at each stage
  – support basic requirements first: controlled vocabularies, taxonomies (XML data exchange), then interoperability (SBML/BioPAX)
  – then support richer semantics enabling integration, inference, and possibly integrated or in-line rules
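
The first two stages of the staged process above could look roughly like the sketch below. It is a minimal illustration under assumptions: the term names, the parent links, and the function names are hypothetical and do not reproduce the actual BioPAX class hierarchy.

```python
# Hypothetical staged sketch: stage 1 is a controlled vocabulary, stage 2
# layers a taxonomy (is-a links) over the same terms. Term names are
# illustrative and not the actual BioPAX class hierarchy.

# Stage 1: controlled vocabulary, a closed set of approved terms.
VOCABULARY = {"Pathway", "MetabolicPathway", "SignalingPathway", "Interaction"}


def is_valid_term(term):
    """Stage 1 check: is the term part of the agreed vocabulary?"""
    return term in VOCABULARY


# Stage 2: taxonomy over the same vocabulary (child -> parent).
PARENT = {
    "MetabolicPathway": "Pathway",
    "SignalingPathway": "Pathway",
}


def ancestors(term):
    """Walk the parent links upward and return every ancestor in order."""
    result = []
    while term in PARENT:
        term = PARENT[term]
        result.append(term)
    return result


def is_a(term, candidate):
    """Stage 2 check: is `term` the same as, or a descendant of, `candidate`?"""
    return term == candidate or candidate in ancestors(term)


print(is_valid_term("MetabolicPathway"))    # True
print(is_valid_term("metabolic pathway"))   # False
print(is_a("SignalingPathway", "Pathway"))  # True
print(is_a("Interaction", "Pathway"))       # False
```

Each stage is usable on its own, so richer semantics can be added later without blocking the basic data exchange use cases.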

Conclusions and Lessons Learnt

OUCH, that hurt, don’t do that again

• Assess
  – complexity of the use cases
  – needs of the community
  – capability of the language (and its limitations)
  – tools available
• Process
  – support subsequent levels of complexity on a sound foundation (O??-Foundry)
• Evaluate
  – correct (specification)
  – complete/comprehensive (concepts and detail)
  – utility/effectiveness (use cases)