Ontologies & Databases: Similarities & Differences
Dr. Leo Obrst, MITRE
Information Semantics, Center for Innovative Computing & Informatics
October 12, 2006


Ontolog Panel


Summary

• Databases:
– Focus on local semantics that capture only some aspects of the real world
– Typically keep that semantics implicit
– Use logic structurally
– Have schemas that are not generally reusable

• Ontologies:
– Focus on the global semantics of the real world
– Make that semantics explicit
– Enable machine interpretability by using a logic-based modeling language
– Are reusable as true models of a portion of the world


Tightness of Coupling & Semantic Explicitness

[Figure: semantic explicitness (vertical axis, from implicit and tight to explicit and loose) plotted against looseness of coupling (horizontal axis, from local to far: 1 system with a small set of developers, systems of systems, enterprise, community, Internet), moving from synchronous interaction to asynchronous communication, with separate Application and Data tracks. Tightly coupled, implicit approaches include same address space, same process space, compiling, linking, and same DBMS; loosely coupled, explicit ones include Web services (SOAP, UDDI, WSDL), semantic brokers, RDF/S, OWL, OWL-S, rules and policies (SWRL, FOL+), and enterprise and EA ontologies. Annotation: Performance = k / Integration_Flexibility.]


Ontology Spectrum: One View

[Figure: the Ontology Spectrum, running from weak to strong semantics (from less to more expressive): Taxonomy ("is sub-classification of"), Thesaurus ("has narrower meaning than"), Conceptual Model ("is subclass of"), and Logical Theory ("is disjoint subclass of", with transitivity property). Representative languages along the spectrum include the Relational Model and XML; ER, DB Schemas, and XML Schema; Extended ER, UML, RDF/S, and XTM; and Description Logic (DAML+OIL, OWL), First Order Logic, and Modal Logic. Interoperability correspondingly progresses from syntactic to structural to semantic.]
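At the Logical Theory end of the spectrum, a relation such as "is disjoint subclass of", combined with the transitivity of subclassing, licenses inferences that a taxonomy alone does not. The following minimal Python sketch illustrates this under stated assumptions: the class names, the individuals, and the helper functions are all hypothetical, and this is not how any particular reasoner is implemented. It computes the transitive closure of the subclass relation and then flags any individual whose inferred classes include two classes declared disjoint, a contradiction the ontology can detect mechanically.

```python
"""Sketch: transitive subclass inference plus a disjointness check."""

# Hypothetical toy ontology: direct "is subclass of" assertions (sub, super).
SUBCLASS_OF = {
    ("FighterJet", "MilitaryAircraft"),
    ("MilitaryAircraft", "Aircraft"),
    ("Airliner", "CivilAircraft"),
    ("CivilAircraft", "Aircraft"),
}

# Hypothetical disjointness axiom: no individual is both military and civil.
DISJOINT = {frozenset({"MilitaryAircraft", "CivilAircraft"})}

# Hypothetical individuals with their asserted (direct) types.
INSTANCES = {
    "F-14D": {"FighterJet"},
    "Tupolev TU154": {"Airliner"},
    "Unknown-7": {"FighterJet", "Airliner"},  # deliberately contradictory
}


def subclass_closure(pairs):
    """Transitive closure of (sub, super) pairs by repeated composition."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def inferred_types(asserted, closure):
    """Asserted types plus every superclass reachable through the closure."""
    return set(asserted) | {sup for (sub, sup) in closure if sub in asserted}


def disjointness_violations(instances, closure, disjoint):
    """Individuals whose inferred types contain both members of a disjoint pair."""
    found = []
    for name in sorted(instances):
        types = inferred_types(instances[name], closure)
        for pair in disjoint:
            if pair <= types:
                found.append((name, tuple(sorted(pair))))
    return found


closure = subclass_closure(SUBCLASS_OF)
violations = disjointness_violations(INSTANCES, closure, DISJOINT)

if __name__ == "__main__":
    print(sorted(inferred_types({"FighterJet"}, closure)))
    print(violations)
```

`Unknown-7` is reported because being a fighter jet makes it military and being an airliner makes it civil; neither class membership was asserted directly, both were inferred.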


Ontology Spectrum: Application

[Figure: applications of each level of the Ontology Spectrum, ordered by increasing expressivity from weak, term-based semantics to strong, concept-based semantics (ontologies).]
• Taxonomy: Categorization, Simple Search & Navigation, Simple Indexing
• Thesaurus: Synonyms, Enhanced Search (Improved Recall) & Navigation, Cross Indexing
• Conceptual Model: Enterprise Modeling (system, service, data), Question-Answering (Improved Precision), Querying, SW Services
• Logical Theory: Real World Domain Modeling, Semantic Search (using concepts, properties, relations, rules), Machine Interpretability (M2M, M2H semantic interoperability), Automated Reasoning, SW Services
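The thesaurus level's "Improved Recall" comes from query expansion: a search term is widened to its synonyms, so documents that use a different word for the same thing are also found. A minimal Python sketch, assuming a hypothetical hand-built synonym table and a toy document set (real thesauri also record broader/narrower terms, which are ignored here):

```python
# Hypothetical thesaurus: each head term maps to its synonyms.
SYNONYMS = {
    "aircraft": {"airplane", "plane", "jet"},
    "washer": {"spacer", "shim"},
}

# Hypothetical document collection.
DOCS = {
    1: "new jet engine tested",
    2: "aircraft maintenance schedule",
    3: "round spacer in stock",
    4: "quarterly budget report",
}

RELEVANT = {1, 2}  # documents actually about aircraft


def expand(term):
    """Return the term plus its synonyms, looked up in both directions."""
    expanded = {term} | SYNONYMS.get(term, set())
    for head, syns in SYNONYMS.items():
        if term in syns:
            expanded |= {head} | syns
    return expanded


def search(terms, docs):
    """Ids of documents containing at least one of the terms as a word."""
    return {doc_id for doc_id, text in docs.items() if terms & set(text.split())}


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


plain = search({"aircraft"}, DOCS)
expanded = search(expand("aircraft"), DOCS)

if __name__ == "__main__":
    print(sorted(plain), recall(plain, RELEVANT))
    print(sorted(expanded), recall(expanded, RELEVANT))
```

With the plain query only document 2 is found (recall 0.5); the expanded query also finds document 1, which says "jet" rather than "aircraft" (recall 1.0).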


Example: Metadata Registry/Repository – Contains Objects + Classification

[Figure: a metadata registry/repository holding four kinds of objects: Data Objects, Classification Objects, Terminology Objects, and Meaning Objects. Items shown include data elements, data attributes, data values, data schemas, documents, XML DTDs, XML Schemas, taxonomies, thesauri, conceptual models, ontologies, namespaces, keyword lists, terms (which can be multilingual), concepts, classes, properties, relations, attributes, values, instances, and the privileged taxonomic relation.]


Approximate Cost/Benefit of Moving up the Ontology Spectrum

[Figure: cost and benefit plotted over time for Taxonomy, Thesaurus, Conceptual Model, and Logical Theory. Annotations: higher initial costs at each step up; much lower eventual costs because of reuse and less analyst labor; increasingly greater benefit because of increased semantic interoperability, precision, and level of machine-human interaction.]


What Problems Do Ontologies Help Solve?

• Heterogeneous database problem
– Different organizational units and Service Needers/Providers have radically different databases
– Different syntactically: what's the format?
– Different structurally: how are they structured?
– Different semantically: what do they mean?
– They all speak different languages

• Enterprise-wide system interoperability problem
– Currently: system-of-systems, vertical stovepipes
– Ontologies act as a conceptual model representing enterprise consensus semantics
– Well-defined, sound, consistent, extensible, reusable, modular models

• Relevant document retrieval/question-answering problem
– What is the meaning of your query?
– What is the meaning of documents that would satisfy your query?
– Can you obtain only meaningful, relevant documents?


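A Business Example of Ontology

[Figure: a buyer integrates washer data from Supplier A and Supplier B, whose tables use different schemas, through an ontology concept Washer with the properties Catalog No., Shape, Size, and Price. The manufacturers shown are iMetal Corp. and E-Machina.]

One supplier's table:
Catalog No. | Shape | Size (in) | Price ($US) | …
XAB035 | Square | 1.25 | .25 | …
XAB023 | Round | 1.5 | .75 | …

The other supplier's table:
Part No. | Geom. | Diam (mm) | Price ($US) | …
550298 | S | 31 | .45 | …
550296 | R | 37 | .35 | …

The buyer's integrated view (the figure also shows a Manufacturer column with the values iMetal Corp. and E-Machina):
Mfr No. | Shape | Size (in) | Price ($US) | …
550298 | Square | 1.25 | .45 | …
550296 | Round | 1.5 | .35 | …
XAB023 | Round | 1.5 | .75 | …
XAB035 | Square | 1.25 | .25 | …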


Ontologies & the Data Integration Problem

• DBs provide generality of storage and efficient access
• The formal data model of databases is insufficiently semantically expressive
• The process of developing a database discards meaning
– Conceptual Model → Logical Model → Physical Model
– Keys signify some relation, but carry no solid semantics
– DB Semantics = Schema + Business Rules + Application Code

• Ontologies can represent the rich common semantics that spans DBs
– Link the different structures
– Establish semantic properties of data
– Provide mappings across data based on meaning
– Also capture the rest of the meaning of data:
• Enterprise rules
• Application code (the inextricable semantics)

A Military Example of Ontology

[Figure: Army and Navy sources describe observed aircraft with different schemas. One table uses the columns Tid, Type, Lat, Long, Tstamp, … with decimal coordinates (e.g., a MIG-29 at 121.135°, and a Tupolev TU154); the other uses S-code, Model, Coord, SenseTime, … with sexagesimal coordinates (e.g., an F-14D at 121°8'6" and an AH-1G at 121°2'2"). An ontology concept Aircraft, with Identifier, Signature, Location, and Time Observed, integrates both into a single view with a Service (Army/Navy) column for the Commander, S2, and S3. Geographic coordinates may be represented as decimal, sexagesimal, or UTM coordinates.]
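One of the mappings the ontology must supply here is between coordinate representations: one source records positions in sexagesimal degrees, minutes, and seconds (e.g., 121°2'2"), the other in decimal degrees. The standard conversion is decimal = degrees + minutes/60 + seconds/3600. A minimal Python sketch, with illustrative function names, assuming non-negative components (hemisphere or sign handled separately) and leaving out UTM, which requires a map projection:

```python
def dms_to_decimal(degrees, minutes, seconds):
    """Convert sexagesimal degrees-minutes-seconds to decimal degrees."""
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    return degrees + minutes / 60 + seconds / 3600


def decimal_to_dms(value):
    """Inverse conversion, rounding to the nearest whole second."""
    total_seconds = round(value * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return degrees, minutes, seconds


if __name__ == "__main__":
    print(dms_to_decimal(121, 2, 2))
    print(dms_to_decimal(121, 8, 6))
    print(decimal_to_dms(121.135))
```

For example, 121°8'6" converts to 121 + 8/60 + 6/3600 = 121.135°, a value that also appears in the example's decimal-coordinate data.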


Background on Relational Calculus for Databases

• Relational Calculus
– Tuple Relational Calculus (TRC)
• More like a pre-relational file structure format
– Domain Relational Calculus (DRC)
• Similar to logic as a modeling language
– Relational Algebra (RA)
– All of the above have roughly equivalent expressivity
– SQL: slightly more powerful because it adds some computation, ordering, etc.

• These use the syntax of FOL but only a very simplified semantics
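To illustrate the claim of roughly equivalent expressivity, here is one query (names of employees in the Sales department) written four ways. It is a Python sketch under stated assumptions: the `Employee` relation and its rows are hypothetical, the tuple and domain calculus forms are mimicked with set comprehensions (which borrow FOL-style notation), relational algebra uses two small hand-written `select` and `project` helpers, and the SQL form runs on an in-memory SQLite database from the standard library.

```python
import sqlite3

# Hypothetical relation Employee(name, dept), stored as a set of tuples.
EMPLOYEE = {("Ada", "Sales"), ("Bob", "R&D"), ("Cy", "Sales")}

# Tuple relational calculus:  { t.name | t ∈ Employee ∧ t.dept = 'Sales' }
trc = {t[0] for t in EMPLOYEE if t[1] == "Sales"}

# Domain relational calculus: { <n> | ∃d (Employee(n, d) ∧ d = 'Sales') }
# Variables range over the active domain (values occurring in the relation),
# which keeps the query "safe", i.e., its answer is finite.
ACTIVE_DOMAIN = {value for row in EMPLOYEE for value in row}
drc = {n for n in ACTIVE_DOMAIN
       if any((n, d) in EMPLOYEE and d == "Sales" for d in ACTIVE_DOMAIN)}


# Relational algebra: π_name(σ_dept='Sales'(Employee))
def select(relation, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, *positions):
    """π: keep only the listed column positions (duplicates collapse)."""
    return {tuple(row[i] for i in positions) for row in relation}


ra = {name for (name,) in project(select(EMPLOYEE, lambda r: r[1] == "Sales"), 0)}

# SQL over an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", sorted(EMPLOYEE))
sql = {name for (name,) in conn.execute(
    "SELECT name FROM employee WHERE dept = 'Sales'")}
conn.close()

if __name__ == "__main__":
    print(sorted(trc), trc == drc == ra == sql)
```

SQL's extra power shows up in features such as `ORDER BY` and aggregates like `COUNT(*)`, which the pure calculi and the basic algebra sketched here do not provide.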


Ontologies & Databases

• Ontologies are about vocabularies and their meanings, with an explicit, expressive, and well-defined semantics, possibly machine-interpretable

• Ontologies try to limit the possible formal models of interpretation (semantics) of those vocabularies to the set of meanings a modeler intends, i.e., close to the human conceptualization

• None of the other "vocabularies" such as database schemas or object models, with less expressive semantics, does that

• The approaches with less expressive semantics typically assume that humans will look at the "vocabularies" and supply the semantics via the human semantic interpreter (your mental model)

• Additionally, a human developer will code programs to enforce the local semantics that the database/DBMS cannot enforce
– They may or may not get it right
– Other humans will have to read that code, interpret it, and see if it's actually doing what everyone thinks it should be doing
– The higher you go in terms of data warehouses, marts, etc., the more human-interpreted semantic error creeps in

• Ontologies model generic real-world concepts and their meanings, unlike either database schemas or object models, which are typically very specific to a particular set of applications and represent limited semantics

• A given ontology cannot completely model any given domain
– However, in capturing real-world (and imaginary, if you wish; e.g., you might want a theory of unicorns and other fantastic beasts) semantics, you are thereby able to reuse, extend, refine, generalize, etc., that semantic model


Ontologies & Databases

• It's recommended that you reuse ontologies
– You cannot reuse database schemas
– You might be able to take a database conceptual schema and use it as the basis of an ontology, but that would still be a leap from an Entity-Relationship model to a Conceptual Model (say, UML, i.e., a weak ontology) to a Logical Theory (strong ontology)

– In much the same way, you can start with a taxonomy or a thesaurus and migrate it to an ontology

– But logical and physical schemas are typically pretty useless for this purpose, since they incorporate knowledge that is not about the real world (and in non-machine-interpretable form)

– By the time you have the physical schema, you just have relations and key information: you've thrown away the little semantics you had at the conceptual schema level

• The methodologies for ontologies and databases are similar (as for all models in the Ontology Spectrum) insofar as the database designer or knowledge/ontology engineer has to consider an information space that captures certain kinds of knowledge

– However, a database designer does not care about the real world per se, but about constructing a specific local container/structure of data that will hold his or her users' data in an access-efficient way

– A good database designer will sit down with users and generate use cases/scenarios based on interaction with the users. Similarly, for ontologists: they'll sit down with domain experts/SMEs and get a sense of the semantics of the part of the world that these folks are knowledgeable about

– A good ontologist will analyze the data available (if available; bottom up) and also analyze what the domain expert says (top down)

– In many cases (e.g., intelligence analysis), the ontologist will ask the SME not only what kinds of questions that person asks for their tasks, but also what kinds of questions they would like to ask and which are impossible to get answered currently by using mainstream database and system technology


The Database Design Process: 3 Stages

1) In interaction with prospective users and stakeholders of the proposed database, the database designer will create a conceptual schema, usually using a modeling language and tools based on Entity-Relationship (ER) models, extended ER models, or, more recently, object-oriented models using UML

2) Once this conceptual schema is captured, the designer will refine it into a logical schema, sometimes called a logical data model, still in an ER language or UML. The logical schema typically results from refining the conceptual schema using normalization and other techniques, i.e., normalizing the relations (and attributes, if the conceptual schema contains these) in the same ER and UML languages, to move closer to the so-called physical model that will be implemented to create the actual database

3) Finally, the designer refines the logical schema into the physical schema, where the tables, columns, keys, etc., are defined; the physical tables are then optimized in terms of which elements to index and in which sectors of the database to place the various data elements

– A data dictionary may be created for the database; it documents, in natural language, what the various elements of the database are intended to mean

– The data dictionary is only semantically interpretable by human beings, since it is written in natural language

– The most expressive real-world semantics of the database creation process thus exists in the conceptual schema and the data dictionary

– The conceptual schema may be kept around as part of the documentation of the process of developing the database, an artifact of that process

– The data dictionary will typically be kept as documentation
– Unfortunately, the underlying physical database and its schema may be changed dramatically without the original conceptual schema and the data dictionary being comparably changed
– This is also typically the case with UML models used to create object-oriented systems and sometimes to define enterprise architectures
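Stage 2's normalization step can be illustrated with a small decomposition. In this hypothetical Python sketch (the relation, attribute names, and helpers are illustrative, not from the source), an unnormalized relation repeats each department's location on every employee row. Because dept functionally determines location, the relation can be split in two, and a natural join of the parts reconstructs the original rows exactly, i.e., the decomposition is lossless. Nothing in the result says what a "location" is: that meaning lives only in the designer's head or the data dictionary.

```python
# Hypothetical unnormalized relation: (employee, dept, dept_location).
# dept_location depends only on dept, so it is repeated for every employee.
EMP_DEPT = {
    ("Ada", "Sales", "Boston"),
    ("Cy", "Sales", "Boston"),
    ("Bob", "R&D", "Denver"),
}


def fd_holds(relation, lhs, rhs):
    """True if the column at index lhs functionally determines the column at rhs."""
    seen = {}
    for row in relation:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def decompose(relation):
    """Split on dept -> location into Employee(name, dept) and Dept(dept, location)."""
    employee = {(name, d) for (name, d, _) in relation}
    dept = {(d, loc) for (_, d, loc) in relation}
    return employee, dept


def natural_join(employee, dept):
    """Recombine the two relations on their shared dept column."""
    return {(name, d, loc) for (name, d) in employee for (d2, loc) in dept if d == d2}


employee, dept = decompose(EMP_DEPT)

if __name__ == "__main__":
    print(fd_holds(EMP_DEPT, 1, 2))                  # dependency justifying the split
    print(sorted(dept))                              # each location now stored once
    print(natural_join(employee, dept) == EMP_DEPT)  # lossless: original recovered
```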


The Database Design Process

• Databases typically try to enforce 3 kinds of integrity:

• 1) Domain integrity (note that this is not the same notion of "domain" we use in general in logic/ontologies): domains are usually datatype domains, i.e., integers, strings, real numbers, or column-data domains
– Typically you don't have any symbolic objects at all in a database, just strings
– So on data entry or update of, say, a row, some program (or the DBMS) will make sure that if a column is defined to contain only integer data, the user can only enter integer data

• 2) Referential integrity: this refers to key relationships, primary and foreign
– This kind of integrity is structural, making sure that if a key gets updated, any key elsewhere that depends on it gets updated appropriately too. This applies to Add, Delete, and Update (an Update is usually considered an initial Delete followed by an Add)

• 3) Semantic integrity: this is the hardest part. It represents real-world constraints, sometimes called "business rules", that you want to hold over your data
– Databases and DBMSs can't usually do this (even with active and passive triggers), so auxiliary programming code usually has to enforce it
– Example: "no other employee can make more than the CEO", or other cross-dependencies

• You can't really check the consistency of a database in the same way you can for an ontology in a logical knowledge representation language

• For databases, you can just enforce the above 3 kinds of integrity as best you can

• For an ontology, you can check consistency in two ways:
– Syntactically (proof theory)
– Semantically (model theory)
– But you can do this at two levels: (1) prove that your KR language is sound and complete, i.e., at the meta-level
• Sound ($\Phi \vdash A$ implies $\Phi \models A$): the proof system will not prove anything that is not valid
• Complete ($\Phi \models A$ implies $\Phi \vdash A$): the proof system is strong enough to prove everything that is valid
• $\Phi \vdash A$ means that A is derivable from $\Phi$ in the proof system, i.e., A is a syntactic consequence of $\Phi$
• $\Phi \models A$ means that A is a semantic consequence (entailment) of $\Phi$: A is true in every model (or valuation) M, with truth values, etc., in which all of $\Phi$ is true, i.e., the argument is valid
• $\vdash$ and $\models$ are called turnstiles, syntactic and semantic respectively
– (2) Check the consistency of a theory (ontology), i.e., at the object level
– This is usually something like negation consistency: there is no A such that both $\Phi \vdash A$ and $\Phi \vdash \neg A$, i.e., no contradiction
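The three kinds of integrity can be sketched with SQLite through Python's standard `sqlite3` module. The schema, rows, and helper functions below are hypothetical. A `CHECK (typeof(salary) = 'integer')` constraint provides domain integrity, a foreign key provides referential integrity (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`), and the business rule "no other employee can make more than the CEO" is checked in application code, the pattern the slide describes for semantic integrity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    name    TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    -- domain integrity: salary must be stored as an integer
    salary  INTEGER NOT NULL CHECK (typeof(salary) = 'integer'),
    -- referential integrity: dept_id must name an existing dept row
    dept_id INTEGER NOT NULL REFERENCES dept(id)
);
INSERT INTO dept VALUES (1, 'Executive'), (2, 'Sales');
INSERT INTO employee VALUES ('Ann', 'CEO', 300000, 1), ('Bo', 'Rep', 90000, 2);
"""


def make_db():
    """Fresh in-memory database with the hypothetical schema above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, row):
    """Insert an employee row; return the DBMS error text, or None if accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None


def salary_rule_holds(conn):
    """Semantic integrity in application code: no one out-earns the CEO."""
    (ceo,) = conn.execute(
        "SELECT MAX(salary) FROM employee WHERE title = 'CEO'").fetchone()
    (others,) = conn.execute(
        "SELECT MAX(salary) FROM employee WHERE title <> 'CEO'").fetchone()
    return ceo is None or others is None or others <= ceo


if __name__ == "__main__":
    db = make_db()
    print(try_insert(db, ("Cal", "Rep", "ninety", 2)))  # rejected: domain integrity
    print(try_insert(db, ("Dee", "Rep", 80000, 99)))    # rejected: referential integrity
    print(try_insert(db, ("Eve", "Rep", 500000, 2)))    # accepted by the DBMS...
    print(salary_rule_holds(db))                         # ...but the business rule fails
```

The last insert is accepted by the DBMS even though it breaks the business rule; only the separate application-level check notices.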


Ontology Design

• If you are creating common knowledge (as opposed to deep domain knowledge), you can in fact use your own intuition and understanding of the world to develop your ontology

• It certainly helps to have a good background in formal ontology or formal semantics, because then you've already learned:
– 1) a rigorous, systematic methodology
– 2) formal machinery for expressing fine details of world semantics
– 3) an appreciation of many alternative analyses, pitfalls, errors, etc.
– 4) complex knowledge about things in the world and insight into your pretheoretical knowledge
– In linguistics we say that although everyone knows how to use a natural language like English, very few know how to characterize that knowledge or know about prospective theories of that knowledge

– Naive speakers don't have good subjective insight into how they do things; they just do them


Ontologies vs. Databases

• As is so often the case with non-ontological approaches to capturing the semantics of data, systems, and services, the modeling process stops at a syntactic and structural model and throws even the impoverished semantic model away, to serve as a historical artifact completely separated from the evolution of the live database, system, or service, and still only semantically interpretable by a human being who can read the documents, interpret the graphics, supply the real-world knowledge of the domain, and understand how the database, system, or service will actually be implemented and used

• Ontologists want to shift some of that "semantic interpretative burden" to machines and have them eventually mimic human semantics, i.e., understand what we mean

• The result would be to bring the machine up to the human, not force the human down to the machine level

• By "machine semantic interpretation" we mean that, by structuring and constraining in a logical, axiomatic language the symbols humans supply, the machine will conclude via an automated inference process roughly what a human would in comparable circumstances

• The knowledge representation language that enables this automated inference must both make fine modeling distinctions and have a formal or axiomatic semantics for those distinctions, so that no direct human involvement is necessary; this is what "automated inference" means

• Databases' primary purpose is storage and ease of access to data, not complex use

• Software applications (with the data semantics embedded in nonreusable code by programmers) and human beings must focus on data use, manipulation, and transformation, all of which require a high degree of interpretation of the data

• Extending the capabilities of a database often requires significant reprogramming and restructuring of the database schema

• Extending the capabilities of an ontology can often be done by adding to its set of constituent relationships

• In theory, this may also include relationships for semantic mapping, whereas semantic mapping between multiple databases will require external applications
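The last two points can be made concrete with a toy triple store. In this hypothetical Python sketch, an ontology is just a set of (subject, predicate, object) triples, and a query for instances of a class follows `subClassOf` links transitively. Two sources' classes (named here `army:Aircraft` and `navy:Aircraft`, illustrative names echoing the military example) start out unrelated; adding two mapping triples, with no change to any schema or to the query code, makes a single query return data from both. The predicate names borrow RDF Schema vocabulary for readability, but this is not an RDF library.

```python
# Hypothetical triples: (subject, predicate, object).
TRIPLES = {
    ("F-14D", "type", "navy:Aircraft"),
    ("MIG-29", "type", "army:Aircraft"),
}


def superclasses(cls, triples):
    """cls plus every class reachable from it through subClassOf links."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def instances_of(cls, triples):
    """Individuals typed with cls or with any (transitive) subclass of it."""
    return {s for (s, p, o) in triples
            if p == "type" and cls in superclasses(o, triples)}


before = instances_of("onto:Aircraft", TRIPLES)

# Extend the ontology only by adding relationships: map each source class
# onto a shared ontology class. No schema or query code changes.
extended = TRIPLES | {
    ("army:Aircraft", "subClassOf", "onto:Aircraft"),
    ("navy:Aircraft", "subClassOf", "onto:Aircraft"),
}
after = instances_of("onto:Aircraft", extended)

if __name__ == "__main__":
    print(sorted(before), sorted(after))
```

Before the mapping triples are added, the query for `onto:Aircraft` returns nothing; afterwards it returns both aircraft, while queries against each source class still return only that source's data.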