Daniel Schober, Martin Boeker, University Medical Center Freiburg Ontology Simplification Buzzword or real Need ? OBML 2010

.

Daniel Schober, Martin Boeker, University Medical Center Freiburg

‘Ontology Simplification’: Buzzword or real Need?

OBML 2010

.

State of the Art

• Transition from taxonomies to description logics
  – Increasing formal semantics & expressivity

• OWL DL applied widely
  – W3C standard for the ‘Semantic Web’
  – Multiple efforts with massive funding
  – Ontology libraries & best practice providers

.

Problem

• DL ontologies rarely used in application settings
  – Used in .ac & .edu, but not in .com projects
  – Only small data sets exploited

• Competency questions and use cases usually retrofitted

• DL aspects rarely exploited
  – Complex LEGO-style definitions
  – RL profile expressivity
    • Cardinalities, disjoints, nested universal restrictions, …

Usability decreases as formality and expressive power increase

Usability is inversely proportional to complexity

.

Potential Reasons

• Inherent complexity of biodomain
  – Dealing with non-linear behaviors, non-classical physics
  – Far from the sensorily perceptible world (‘meso-level’)

• Complexity of DL
  – Set-theoretic approach counter-intuitive to object-orientation
  – Long class expressions, nested with multiple brackets
  – Hard-to-read syntax

• Reasoners can’t cope with expressivity on larger scale
  – Computationally not feasible

• DL (semantics), OWL (syntax)? … not really very robust
• Tools & editors? … very far from being robust
• Ontologists can’t keep up with frequent changes
• Steep learning curve on engineering & usage side

.

Hypothesis

‘Is advertising ‘simplifications’ a solution?’
• Increase end-user compliance
• Render DL ontologies human-understandable
  … while not sacrificing reasoning capabilities

Approach
• Collect and review existing simplifications
  – Check whether the whole approach is feasible
• Introduce new simplification methods
• Create typology of simplification methods
• Raise general awareness
• Make methods accessible
• Test if they increase compliance

.

‘Understandable’?

• Ontology is understandable if its RUs are understandable
• RU is understandable if …
  – it is traceable and readily applicable by the user
  – its intended meaning can be grasped in a short time
• This is the case if …
  – it is labeled in line with user expectations
  – it is instantiated often
    • easy mapping to everyday language constructs
    • high everyday usage frequency
  – it resides in the ‘intuitive’ meso-level
    • i.e. neither too abstract nor too special
    • directly perceivable by human senses
  – it belongs to a traceable and intelligible top-level category
    • e.g. MaterialEntity vs. DependentContinuant
  – it has short logical definitions
    • built from simple RUs themselves
• Understanding can be facilitated by tools
  – Using principles of software ergonomics
  – Implementing simplification and normalisation strategies

.

Typology (naive start)


1. Syntax simplifications
2. Structural simplifications
3. Shortcuts and local approximate models
4. Views showing subsets of entities
5. Modularizations, partitions, slims
6. GUI simplifications / software ergonomics

Some examples …

.

Normalize syntax (1.a)

Normalizing equivalent constructs into simpler forms
– Syntactical complexity reduction
  • E.g. via specialized language constructs

E.g. for disjointness

Simplify

<owl:Class rdf:about="#AdrenalineReceptor">
  <rdfs:subClassOf>
    <owl:Class>
      <owl:complementOf rdf:resource="#VoltageGatedReceptor"/>
    </owl:Class>
  </rdfs:subClassOf>
</owl:Class>

into

<owl:Class rdf:about="#AdrenalineReceptor">
  <owl:disjointWith rdf:resource="#VoltageGatedReceptor"/>
</owl:Class>
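The normalization above could be automated along the following lines. This is a minimal Python sketch, not the authors' tooling: axioms are modelled as plain tuples (an encoding assumed here for illustration), and the class `Receptor` is a hypothetical addition used only to show that unrelated axioms pass through unchanged.

```python
# Axioms as plain tuples (illustrative encoding, not an OWL API):
#   ("SubClassOf", sub, ("ComplementOf", other))  -> verbose form
#   ("DisjointWith", sub, other)                  -> normalized form

def normalize_disjointness(axioms):
    """Rewrite 'A SubClassOf (not B)' into 'A DisjointWith B'; keep the rest."""
    result = []
    for axiom in axioms:
        is_complement_subclass = (
            axiom[0] == "SubClassOf"
            and isinstance(axiom[2], tuple)
            and axiom[2][0] == "ComplementOf"
        )
        if is_complement_subclass:
            result.append(("DisjointWith", axiom[1], axiom[2][1]))
        else:
            result.append(axiom)
    return result


axioms = [
    ("SubClassOf", "AdrenalineReceptor", ("ComplementOf", "VoltageGatedReceptor")),
    # "Receptor" is a hypothetical class, only here to show a pass-through axiom.
    ("SubClassOf", "AdrenalineReceptor", "Receptor"),
]
normalized = normalize_disjointness(axioms)
for axiom in normalized:
    print(axiom)
```

Only the syntactic pattern is matched; both forms state the same disjointness, so no reasoning capability is lost.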

.

Conflate redundancy in restrictions (1.b)

Avoid redundancy in restrictions

Simplify

NeuralInflammation ⊑ Inflammation ⊓ ∃has-participant. CNS_Tissue ⊓ ∃has-participant. PNS_Tissue ⊓ ∃has-participant. Brain_Tissue

into

NeuralInflammation ⊑ ∃ has-participant. (CNS_Tissue ⊓ PNS_Tissue ⊓ Brain_Tissue)

.

Increase human readability (1.c)

Human readable syntax
• Omit logics-specific symbolism

Simplify

HeparinBiosynthesis ⊑ (HeparinMetabolism ⊓ (Biosynthesis ⊓ ∃acts_on. Heparin))

into Manchester OWL Syntax

HeparinBiosynthesis SubClassOf HeparinMetabolism AND (Biosynthesis AND acts_on SOME Heparin)

or Attempto Controlled English (CNL)

“Every HeparinBiosynthesis is a HeparinMetabolism. Every HeparinBiosynthesis is a Biosynthesis that acts_on a Heparin.”
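A renderer that produces both notations from one internal representation might look like the sketch below. The nested-tuple encoding and the function names `to_dl` and `to_manchester` are assumptions made for illustration; keyword casing follows the slide (`AND`, `SOME`), and only intersections and existential restrictions are covered.

```python
# Class expressions as nested tuples (hypothetical encoding):
#   "Name"                     -> named class
#   ("and", expr, expr, ...)   -> intersection
#   ("some", property, expr)   -> existential restriction

def to_dl(expr):
    """Render an expression in description-logic notation."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "and":
        return "(" + " ⊓ ".join(to_dl(e) for e in expr[1:]) + ")"
    if expr[0] == "some":
        return "∃" + expr[1] + "." + to_dl(expr[2])
    raise ValueError("unsupported constructor: " + str(expr[0]))


def to_manchester(expr):
    """Render the same expression with keyword-style operators."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "and":
        return "(" + " AND ".join(to_manchester(e) for e in expr[1:]) + ")"
    if expr[0] == "some":
        return expr[1] + " SOME " + to_manchester(expr[2])
    raise ValueError("unsupported constructor: " + str(expr[0]))


definition = (
    "and",
    "HeparinMetabolism",
    ("and", "Biosynthesis", ("some", "acts_on", "Heparin")),
)
dl_text = "HeparinBiosynthesis ⊑ " + to_dl(definition)
manchester_text = "HeparinBiosynthesis SubClassOf " + to_manchester(definition)
print(dl_text)
print(manchester_text)
```

Because both renderers read the same structure, the readable view hides the symbolism without changing the axiom.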

.

Simplify labels (2.c)

Naming Conventions
• Shorten long relation names
  – “Anatomic_Structure_Is_Physical_Part_Of”
• Remove redundancy

Simplify

Ovary ⊑ ∃Anatomic_Structure_Is_Physical_Part_Of. Reproductive_System

into

Ovary ⊑ ∃Is_Physical_Part_Of. Reproductive_System

• ‘Anatomic_Structure’ prefix is already specified via the domain of the relation
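The label shortening can be sketched as a small helper that drops the prefix only when it repeats the relation's declared domain. The function name `shorten_relation_name` and the underscore-joined naming scheme are assumptions for illustration.

```python
def shorten_relation_name(relation, domain):
    """Drop a leading '<domain>_' from a relation name, if present."""
    prefix = domain + "_"
    if relation.startswith(prefix):
        return relation[len(prefix):]
    # Leave names untouched when the prefix does not repeat the domain.
    return relation


short_name = shorten_relation_name(
    "Anatomic_Structure_Is_Physical_Part_Of", "Anatomic_Structure"
)
print(short_name)
```

Tying the rewrite to the declared domain keeps the shortened label unambiguous, since the removed information remains available in the ontology itself.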

.

Conflate property chains (3.a)

Use Shortcuts
• Property chains (OWL 2) allow shortening expressions
  – Compress two triples into one
  – Conflate / fold expression over 2 or more properties

Simplify

GeneA transcribed_to GeneA_mRNA
GeneA_mRNA translated_to GeneA_Protein

into

GeneA_Protein product_of GeneA
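How a two-step chain folds into one shortcut triple can be sketched over plain (subject, property, object) tuples. The helper `apply_chain` and its `inverse` flag are assumptions for illustration: the slide's shortcut `product_of` points from the protein back to the gene, so the sketch emits the inferred triple in the inverse direction.

```python
def apply_chain(triples, first, second, shortcut, inverse=False):
    """Infer shortcut triples for every path x -first-> y -second-> z."""
    inferred = set()
    for s1, p1, o1 in triples:
        if p1 != first:
            continue
        for s2, p2, o2 in triples:
            if p2 == second and s2 == o1:
                if inverse:
                    inferred.add((o2, shortcut, s1))
                else:
                    inferred.add((s1, shortcut, o2))
    return inferred


triples = [
    ("GeneA", "transcribed_to", "GeneA_mRNA"),
    ("GeneA_mRNA", "translated_to", "GeneA_Protein"),
]
shortcuts = apply_chain(
    triples, "transcribed_to", "translated_to", "product_of", inverse=True
)
print(shortcuts)
```

The original two triples stay in place; the shortcut is an additional, derived statement that a view can show instead of the full path.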

.

Simplified umbrella classes (3.b)

Allows for graceful evolution through temporary proximity models – which can later be untangled seamlessly

Goal model for diseases
PathologicalDisposition ⊑ ∃inheresIn. PathologicalStructure
PathologicalDisposition ⊑ ∀hasRealization. PathologicalProcess
PathologicalProcess ⊑ ∃hasParticipant. PathologicalStructure
PathologicalProcess ⊑ ∃realizationOf. PathologicalDisposition

Pre-coordination is labor-intensive due to combinatorial explosion

.

Simplified umbrella classes (3.b)

• A pragmatic proximity model can be introduced
  – Insert temporary umbrella class
  – Ignore the disposition / structure / process distinction

PathologicalEntity ≡ PathologicalStructure ⊔ PathologicalDisposition ⊔ PathologicalProcess

• Later gracefully evolve towards complex model
• All needed relations for …
  – Pathological Structures: part-of / located-in
  – Pathological Dispositions: inheres-in
  – Pathological Processes: has-participant / located-in
  … can be captured via one super-relation has-locus
• Allows connecting from any PathologicalEntity to the relevant location
  – but without commitment to granularity
  – still, the simplified model supports some inferences
• It can later be expanded without rendering the simplification false
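The role of has-locus as a common super-relation can be illustrated with a short sketch that lifts specific relations to it. The mapping follows the relations listed above; the instance names (`lesion1`, `risk1`, `process1`, `liver1`) and the relation `has-cause` are hypothetical placeholders.

```python
# Specific relations from the complex model, each treated as a
# sub-relation of the single super-relation has-locus.
SUPER_RELATION = {
    "part-of": "has-locus",
    "located-in": "has-locus",
    "inheres-in": "has-locus",
    "has-participant": "has-locus",
}


def generalize(triples):
    """Replace each specific relation by its super-relation, if it has one."""
    return [(s, SUPER_RELATION.get(p, p), o) for s, p, o in triples]


# Hypothetical instance data, for illustration only.
detailed = [
    ("lesion1", "part-of", "liver1"),
    ("risk1", "inheres-in", "liver1"),
    ("process1", "has-participant", "liver1"),
    ("process1", "has-cause", "risk1"),
]
simplified = generalize(detailed)
for triple in simplified:
    print(triple)
```

Every statement in the simplified view follows from the detailed one, which is why refining the model later does not make the simplification false.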

.

Discussion

• Typology in early stage
  – Re-structure into polyhierarchy of disjoint orthogonal branches

• Potential sortals ordering simplifications
  – By entity tackled
  – By persistence
  – By life cycle
    • kick-off, development, deployment/usage
  – By ergonomics (perceptual psychology)
  – By user role
  – By user background
    • mathematician, computer scientist, logician, philosopher, linguist, biologist, …

.

Discussion

• Ease access to simplification methods
• Publish
  – OBO Foundry initiative
  – Ontology Engineering and Patterns Task Force (SWBPD-WG)
  – Ontology Design Pattern portals
• None currently addresses ‘simplifications’
• Rather seen as properties of general design patterns
• Introduce special ‘simplification pattern type’ or add additional descriptor to existing pattern types?

.

Conclusions

• Reasons for limited impact of OWL DL
  – Performance problems
  – Inherent complexity

• Complexity can be coped with by simplifications
• Collection of >30 reviewed simplification methods
  – Put in typology
  – Collection and typology to be expanded

• Cross-talk with ODP community

• Compare user compliance pre- and post-simplification
  – Test how fast two codes/ontologies lead to the desired result for the same test task

• Feedback appreciated

.

Resources & Acknowledgements

Resources
• Find simplifications & reviews at http://www.imbi.uni-freiburg.de/~schober/Simplifications/

Acknowledgements

• Martin Boeker
• Stefan Schulz
• Josef Ingenerf
• The DebugIT community

.

Normalize syntax (1.a)

E.g. for instance-assertions

Simplify

<rdf:Description rdf:ID="Beta_receptor_94">
  <rdf:type rdf:resource="#AdrenalineReceptor"/>
</rdf:Description>

into

<AdrenalineReceptor rdf:ID="Beta_receptor_94"/>

.

Towards Simplification Methods

• Two types of simplifications
  1. Removing complexity
    • Prevents full exploitation of semantics
    • Format transformation into OWL Lite or SKOS
  2. Hiding complexity
    • Allows full exploitation of semantics
    • Views and excerpts of ontologies

• Define characteristics for ‘simplicity’ and ‘understanding’ for the following aspects
  – Individual cognitive abilities
  – Semantics & syntax
  – Software ergonomics

.

Simplification Collection and Review

.

Conflate redundancy in restrictions (1.b)

Avoid redundancy in restrictions
• Frequent source of errors from inadequate modeling
  – E.g. below: each individual AdrenalineReceptor is simultaneously expressed in three different body parts

Simplify

AdrenalineReceptor ⊑ ∃Gene_Product_Expressed_In_Tissue. Lung ⊓ ∃Gene_Product_Expressed_In_Tissue. Brain ⊓ ∃Gene_Product_Expressed_In_Tissue. Muscle

into

AdrenalineReceptor ⊑ ∀Expressed_In. (Lung ⊔ Brain ⊔ Muscle)

.

Conflate property chains (3.a)

Use Shortcuts
• Property chains (OWL 2) allow shortening expressions
  – Compress two triples into one
  – Conflate and fold expression over 2 or more properties

Simplify

Pneumonia outcome_of LungInflammation
LungInflammation treated_by Antibiotics

Tryptophan substrate_of IndolePhosphatase
IndolePhosphatase has_product TryptophanPhosphate

A is_son_of B and B is_brother_of C

into

Pneumonia improved_by Antibiotics

Tryptophan processed_to TryptophanPhosphate

A has_uncle C

• Two properties can be chained by a new property
• In particular views, shortcuts increase understanding

.

• To investigate
  – What complexities can be automatically detected and removed?
    • Parsers can unify / normalize and simplify syntax
  – ‘Guided simplification finder’ chooses appropriate simplifications based on user requirements?
