97
March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 2004 1 Knowledge Representation Chapter 10

Embed Size (px)

Citation preview

Page 1: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20041

Knowledge RepresentationChapter 10

Page 2: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20042

Outline

• KR Introduction• Ontological Engineering• Categories and Objects• Actions, Situations, and Events• Mental Events and Mental Objects• Reasoning Systems for Categories• Reasoning with Default Information• Truth Maintenance Systems• Bio-Ontologies

Page 3: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20043

KR Introduction• General problem in Computer Science• Solutions = Data Structures

– words– arrays– records– list

• More specific problem in AI• Solutions = knowledge structures

– lists– trees– procedural representations– logic and predicate calculus– rules– semantic nets and frames– scripts

Page 4: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20044

Kinds of Knowledge

• Objects– Descriptions– Classifications

• Events– Time sequence– Cause and effect

• Relationships– Among objects– Between objects and events

• Meta-knowledge

Things we need to talk about and reason about; what do we know?

Distinguish between knowledge and its representation

Page 5: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20045

Representation Mappings

• Knowledge Level

• Symbol Level

• Mappings are not one-to-one

• Never get it complete or exactly right

Internal Representation

English Representation

FactsReasoning Programs

Page 6: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20046

Ontological Engineering

• Like knowledge engineering but applies to general-purpose knowledge bases

• Ultimate goal is to represent everything in the world!!

• Result is an upper ontologyAnything/Root

AbstractObjects GeneralizedEvents

RepresentationalObjects

Categories

NumbersSets

Sentences

Places

Measurements

Interval

WeightsTimes

Processes

ThingsMoments

PhyscialObjects

Solid Liquid Gas

Stuff

Agents

Humans

Animals

Page 7: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20047

Special- and General-purpose Ontologies

• Special-purpose ontology:– Designed to represent a specific domain of

knowledge;• genetics (GO)• immune system (IMGT)• mathematics (Tom Gruber)

• General-purpose ontology:– Should be applicable in any special-purpose

domain– Unifies different domains of knowledge

• Upper ontology provides highest level framework - all other concepts follow

Page 8: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20048

Cyc Upper Ontology

• Cycorp released 3,000 upper-level concepts into public domain

• Cyc Upper Ontology satisfies two important criteria;– It is universal: Every concept can be linked to it– It is articulate: Distinctions are necessary and

sufficient for most purposes

Page 9: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 20049

Categories - Representation

• Two choices for representation:– Predicate

• Basketball(b)

– Object• Basketballs

• Member(b, Basketballs)

• Subset(Basketballs, Balls)

Page 10: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200410

Categories - Organizing

• Inheritance:– All instances of the category Food are edible– Fruit is a subclass of Food– Apples is a subclass of Fruit– Therefore, Apples are edible

• The Class/Subclass relationships among Food, Fruit and Apples is a taxonomy

Page 11: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200411

Categories - Partitioning

• Disjoint: The categories have no members in common

• Exhaustive Decomposition: Every member of the category is included in at least one of the subcategories

• Partition: Disjoint exhaustive decomposition

Page 12: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200412

Categories - Partitioning

Disjoint({Animals,Vegetables})

Page 13: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200413

Categories - Partitioning

Disjoint({Animals,Vegetables})

Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})

Page 14: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200414

Categories - Partitioning

Disjoint({Animals,Vegetables})

Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})

ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans})

Page 15: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200415

Categories - Partitioning

Disjoint({Animals,Vegetables})

Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})

ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans})

ExhaustiveDecomposition(s,c) (i ic c2 c2s ic2)

Page 16: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200416

Categories - Partitioning

Disjoint({Animals,Vegetables})

Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})

ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans})

ExhaustiveDecomposition(s,c) (i ic c2 c2s ic2)

Partition({Males,Females},Animals)

Page 17: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200417

Categories - Partitioning

Disjoint({Animals,Vegetables})

Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})

ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans})

ExhaustiveDecomposition(s,c) (i ic c2 c2s ic2)

Partition({Males,Females},Animals)

Parition(s,c) Disjoint(s) ExhaustiveDecomposition(s,c)

Page 18: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200418

Categories - More

• PartOfPartOf(Bucharest,Romania)

PartOf(Romania,EasternEurope)

PartOf(EasternEurope,Europe)

PartOf(Europe,Earth)

• Composite ObjectsBiped(a) c1,c2,b Leg(c1) Leg(c2)

Body(b) PartOf(c1,a) PartOf(c2,a) PartOf(b,a) Attached(c1,b) Attached(c2,b) c1c2 [c3 Leg(c3) PartOf(c3,a) (c3=c1 c3=c2)]

Page 19: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200419

Categories – And More

• Count Nouns and Mass Nouns– How many aardvarks? How many butters!?!

x Butter PartOf(y,x) y Butter

• Intrinsic and Extrinsic Properties– Intrinsic properties belong to the very substance

of the object; e.g. flavor, color, density, boiling point, etc.

– Extrinsic properties change if the object is changed (cut in half); e.g. weight, length, shape, etc.

Page 20: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200420

Actions, Situations and Events

Page 21: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200421

Situation Calculus• The states resulting from executing actions

• Ontology:– Situations: logical terms describing initial

situation and all situations that result from executing actions on a given situation

Result(a,s)

– Fluents: functions and predicates that may be different in different situations

Age(Wumpus,S0) is Wumpus age in situation S0

– Atemporal or eternal: functions and predicates that are constant across all situations

Gold(G1)

Page 22: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200422

Situation Calculus – Actions

• Each action described by two axioms:– Possibility Axiom:

Preconditions Poss(a,s)– Effect Axiom:

Poss(a,s) changes that result from taking action

Page 23: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200423

Situation Calculus - ExamplePossibility Axioms:

At(Agent,x,s) Adjacent(x,y) Poss(Go(x,y),s).

Gold(g) At(Agent,x,s) At(g,x,s) Poss(Grab(g),s).

Holding(g,s) Poss(Release(g),s).

Effect Axioms:

Poss(Go(x,y),s) At(Agent,y,Result(Go(x,y),s).

Poss(Grab(g),s) Holding(g,Result(Grab(g),s)).

Poss(Release(g),s) Holding(g,Result(Grab(g),s)).

Page 24: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200424

Go for the Gold!GOAL: Bring the gold from [1,2] to [1,1]

At(Agent,[1,1],S0) At(G1,[1,2],S0).Holding(G1,S0).Gold(G1).Adjacent([1,1],[1,2]) Adjacent([1,2],[1,1]).

Do It:Go([1,1],[1,2]).

Result:At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).

Now, can I grab the gold?Grab(G1).

Page 25: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200425

Go for the Gold!GOAL: Bring the gold from [1,2] to [1,1]

At(Agent,[1,1],S0) At(G1,[1,2],S0).Holding(G1,S0).Gold(G1).Adjacent([1,1],[1,2]) Adjacent([1,2],[1,1]).

Do It:Go([1,1],[1,2]).

Result:At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).

Now, can I grab the gold?Grab(G1).

Page 26: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200426

Go for the Gold!GOAL: Bring the gold from [1,2] to [1,1]

At(Agent,[1,1],S0) At(G1,[1,2],S0).Holding(G1,S0).Gold(G1).Adjacent([1,1],[1,2]) Adjacent([1,2],[1,1]).

Do It:Go([1,1],[1,2]).

Result:At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).

Now, can I grab the gold?Grab(G1).

Page 27: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200427

Go for the Gold!GOAL: Bring the gold from [1,2] to [1,1]

At(Agent,[1,1],S0) At(G1,[1,2],S0).Holding(G1,S0).Gold(G1).Adjacent([1,1],[1,2]) Adjacent([1,2],[1,1]).

Do It:Go([1,1],[1,2]).

Result:At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).

Now, can I grab the gold?Grab(G1).

Page 28: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200428

The Frame Problem

Result:

At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).

Now, can I grab the Gold?

Grab(G1).

Page 29: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200429

The Frame Problem

Result:

At(Agent,[1,2],Result(Go([1,1],[1,2]),S0).

Now, can I grab the Gold?

Grab(G1).

What in the knowledge base allows me to go from my Result (above) to Grab(G1)?

Page 30: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200430

The Frame Problem

Result:

At(Agent,[1,2],Result(Go([1,1],[1,2]),S0).

Now, can I grab the Gold?

Grab(G1).

What in the knowledge base allows me to go from my Result (above) to Grab(G1)?

nothing

Page 31: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200431

The Frame Problem

• How do we represent all the things in the world that stay the same?– Represent all things at all situations; the

representational frame problem– Project the results of a sequence of actions; the

inferential frame problem

Page 32: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200432

Representational Frame Problem

• Successor-State Axiom:Action is possible (Fluent is true in result state

Action’s effect made it true It was true before and action left it alone).

• Truth value of each fluent in the next state depends on action and truth value in the current state

Poss(a,s) (At(Agent,y,Result(a,s)) a = Go(x,y) (At(Agent,y,s) a Go(y,z))).

Page 33: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200433

Time and Event Calculus

• Event Calculus based on points in time

• Fluents hold at points in time as opposed to holding in situations

“A fluent is true at a point in time if the fluent was initiated by an event at some time in the past and was not terminated by an intervening event.”

Page 34: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200434

Event Calculus

• Initiates(e,f,t) and Terminates(w,f,t)

• Event Calculus Axiom:T(f,t2) e,t Happens(e,t) Initiates(e,f,t)

(t<t2) Clipped(f,t,t2)

Clipped(f,t,t2) e,t Happens(e,t1) Terminates(e,f,t1) (t < t1) (t1 < t2)

Page 35: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200435

Event Calculus - more

• Can be extended to handle;– indirect effects– continuous change– nondeterministic effects– causal constraints– . . .

Page 36: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200436

Generalized Events

• Combines aspects of space and time calculus

• Allows representation of events occurring in a space-time continuum

World War II is an event that happened in various geographic locations during a specific period of time within the 20th century.

Page 37: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200437

Processes

• Discrete Events: the event is a whole and a part of the event is no longer the same event

• Processes can include subintervals; a part of a plane flight is still a member of the Flying class (aka liquid events)

• Stated more precisely: “Any subinterval of a process is also a member of the same process category.”

Page 38: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200438

Intervals

• Moment: has temporal duration of zero

• Extended Interval: has temporal duration of greater than zeroPartition({Moments,ExtendedIntervals},Intervals)

Member(i,Moments) Duration(i) = Seconds(0).

Page 39: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200439

Intervals Ontology

Meet(i,j) Time(End(i)) = Time(Start(j)).

Before(i,j) Time(End(i)) < Time(Start(j)).

After(j,i) Before(i,j).

During(i,j) Time(Start(j)) Time(Start(i)) Time(End(i)) Time(End(j)).

Overlap(i,j) k During(k,i) During(k,j).

Page 40: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200440

Mental Events and Mental Objects

• Knowledge about beliefs, specifically about those beliefs held by an agent– “Which agent knows about the geography of Maine?”

• Provides an agent the ability to reason about beliefs of agents

• However, need to define propositional attitudes, such as Believes, Knows and Wants as relations where the second argument is referentially opaque (no substitution of equal terms)

Page 41: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200441

Reasoning Systems for Categories• Categories are KR building blocks

• Two primary systems for reasoning:– Semantic Networks

• Graphical aids for visualizing knowledge

• Mechanisms for inferring properties of objects based on category membership

– Description Logics• Formal language for constructing and combining

category definitions

• Algorithms for classifying objects and determining subsumption relationships

Page 42: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200442

Semantic Networks

• Graphical notation with underlying logical representation

• A form of logic, but not FOL

• Capable of representing objects, relations, quantification, …

• Convenient representation of inheritance

• Multiple Inheritance (sometimes)

• Inverse links

• Extendable using procedural attachments

Page 43: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200443

Semantic Networks - More

• Can only express binary relationships – making it more difficult to express n-ary predicates; e.g. Fly(Shankar,NewYork,NewDelhi,Monday)

• Negation, disjunction, nested function symbols, and existential quantification are missing

• Some SNs include procedural attachments

• Represents default values – assertions may be overridden by more specific values

Page 44: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200444

Semantic Networks

Mammals

Persons

Females Males

Mary JohnSisterOf

SubsetOf

SubsetOf SubsetOf

Legs 1

HasMother2Legs

Page 45: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200445

Description Logics

• Notations to make it easier to describe definitions and properties of categories

• Taxonomic structure is organizing principle• Subsumption: Determine if one category is a

subset of another• Classification: Determine the category in

which an object belongs• Consistency: Determine if membership criteria

are logically satisfiable

Page 46: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200446

Description Logics

• CLASSIC was one of first languages (Borgida, et al, 1989)

• “All bachelors are unmarried adult males.”– DL:Bachelor = And(Unmarried,Adult,Male).– FOL:Bachelor(x) Unmarried(x) Adult(x) Male(x)

Page 47: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200447

Description Logics

• What does this DL statement say?

And(Man,AtLeast(3,Son), AtMost(2,Daughter), All(Son,And(Unemployed,Married, All(Spouse,Doctor))), All(Daughter,And(Professor, Fills(Department,Physics,Math)))).

Page 48: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200448

Description Logics - More

• Emphasis on tractability of inference• Inference happens by;

– Describe the problem instance

– Asserting the instance into the KB to be handled by the subsumption apparatus

• FOL cannot predict solution time• DL solve in time polynomial in size of KB• DLs usually lack disjuntion and negation (for

time/speed considerations)

Page 49: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200449

Current Description Logic• DAML+OIL

– DARPA Agent Mark-up Language + Ontology Inference Language (OIL)

– Comes out of DARPA initiative– OIL from University of Manchester– http://www.w3.org/TR/daml+oil-reference

• OWL– Ontology Web Language– A language for the semantic web– “Next generation” DAML+OIL– Flavors: OWL-Lite, OWL-DL and OWL (full)– W3C recommendation as of Feb 10, 2004– http://www.w3.org/TR/2004/REC-owl-features-20040210/

Page 50: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200450

Reasoning with Default Information

• Open and Closed worlds– Open World: Information provided is not

assumed to be complete, therefore inferences may result in sentences whose truth value is unknown

– Closed World: Information provided is assumed complete, therefore ground sentences not asserted to be true are assumed false

– Negation as Failure: A negative literal, not P, can be “proved” true if the proof of P fails

Page 51: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200451

Nonmonotonic Logics: Circumscription

• Version of closed-world assumption• Specify predicates that are almost always

false• Default rule stating that birds fly:Bird(x) Abnormal(x) Flies(x)• Abnormal() is circumscribed – reasoner

assumes Abnormal() unless Abnormal() is known to be true

• Circumspection is model preference logic; notion of preferred models in KB

Page 52: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200452

Nonmonotonic Logics:Default Logic

• Default rules express contingencies:Bird(x) : Flies(x)/Flies(x)• If Bird(x) is true, and Flies(x) consistent

with KB, then conclude Flies(x) (by default)

• Default rule form is;

P : J1, …, Jn/C• P = Prerequisite; J = Justifications; C =

Conclusions• If any J is false, then C is not true

Page 53: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200453

Truth Maintenance Systems

• Designed to handle Belief Revision:

Page 54: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200454

Truth Maintenance Systems

• Designed to handle Belief Revision:

Let’s say our KB contains sentence P

Page 55: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200455

Truth Maintenance Systems

• Designed to handle Belief Revision:

Let’s say our KB contains sentence P

But P is found to be incorrect/untrue

Page 56: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200456

Truth Maintenance Systems

• Designed to handle Belief Revision:

Let’s say our KB contains sentence P

But P is found to be incorrect/untrue

So, we want to say Tell(KB,P)

Page 57: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200457

Truth Maintenance Systems

• Designed to handle Belief Revision:

Let’s say our KB contains sentence P

But P is found to be incorrect/untrue

So, we want to say Tell(KB,P)

First, though, Retract(KB,P) to avoid P P

Page 58: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200458

Truth Maintenance Systems

• Designed to handle Belief Revision:

Let’s say our KB contains sentence P

But P is found to be incorrect/untrue

So, we want to say Tell(KB,P)

First, though, Retract(KB,P) to avoid P P

What if P Q? What happens to Q?

Page 59: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200459

Truth Maintenance Systems

• Designed to handle Belief Revision:

Let’s say our KB contains sentence P

But P is found to be incorrect/untrue

So, we want to say Tell(KB,P)

First, though, Retract(KB,P) to avoid P P

What if P Q? What happens to Q?

Retract Q?

Page 60: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200460

Truth Maintenance Systems

• Designed to handle Belief Revision:

Let’s say our KB contains sentence P

But P is found to be incorrect/untrue

So, we want to say Tell(KB,P)

First, though, Retract(KB,P) to avoid P P

What if P Q? What happens to Q?

Retract Q?

But what if we also have R Q?

Page 61: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200461

Truth Maintenance Systems

• Designed to handle Belief Revision:

Let’s say our KB contains sentence P

But P is found to be incorrect/untrue

So, we want to say Tell(KB,P)

First, though, Retract(KB,P) to avoid P P

What if P Q? What happens to Q?

Retract Q?

But what if we also have R Q?

Therefore…

Page 62: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200462

Truth Maintenance Systems• “Rollback” mechanism – doesn’t scale up• Justification-based Truth Maintenance System

(JTMS)– Includes in the KB the set of sentences from which the

sentence was inferred

– Sentences are in or out, based on truth value of supporting sentences

• Assumption-based Truth Maintenance System (ATMS)– Maintains a set of supporting sentences, representing

all states

– Sentence holds in just those cases where all assumptions in one of the assumptions sets hold

Page 63: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200463

Justification-based TMS

• Each sentence in KB includes all sentences that made it true

P Q has justification {P, P Q}

• What if Q has the following justifications, and we Retract(P)?

Page 64: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200464

Justification-based TMS

• Each sentence in KB includes all sentences that made it true

P Q has justification {P, P Q}

• What if Q has the following justifications, and we Retract(P)?

{P, P Q}

Page 65: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200465

Justification-based TMS

• Each sentence in KB includes all sentences that made it true

P Q has justification {P, P Q}

• What if Q has the following justifications, and we Retract(P)?

{P, P Q}

{P, P R Q}

Page 66: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200466

Justification-based TMS

• Each sentence in KB includes all sentences that made it true

P Q has justification {P, P Q}

• What if Q has the following justifications, and we Retract(P)?

{P, P Q}

{P, P R Q}

{R, R P Q}

Page 67: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200467

Justification-based TMS

• Each sentence in KB includes all sentences that made it true

P Q has justification {P, P Q}

• What if Q has the following justifications, and we Retract(P)?

{P, P Q}

{P, P R Q}

{R, R P Q}

• Sentences that comprise Justifications are in or out (not removed from KB) – efficiency

Page 68: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200468

Assumption-based TMS

• Designed to make Belief Revision efficient

• Represents all states at the same time

• Each sentence in the KB has a set of assumption sets

• For each sentence in the KB, the sentence holds when all assumptions in one of it’s assumption sets hold

Page 69: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200469

Ontologies in Practice:The BioOntologies Consortium

Page 70: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200470

Outline

• Motivation– The problem– The solution

• Exchange Languages Evaluation– Initial Evaluation– Second-level Evaluation– Conclusions/Recommendations

• Future Work

Page 71: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200471

Motivation

• Explosive and uncontrolled growth of Bioinformation

• It is increasingly important in the life sciences to integrate information across scientific disciplines and business areas– Terminology in the domain of molecular

biology is inconsistent - information searches can be incomplete and inaccurate

– Definitions and descriptions of life sciences objects differ among data sources - significant time and effort is required to integrate those data sources

Page 72: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200472

What is DNA Topoisomerase?EC 5.99.1.2

DNA Nicking-Closing Protein

DNA Relaxing Enzyme

DNA Relaxing Protein

DNA Topoisomerase

DNA Topoisomerase I

DNA Type 1 Topoisomerase

DNA Untwisting Enzyme

DNA Untwisting Protein

Omega Protein

Topoisomerase I

Type I DNA Topoisomerase

Nicking-closing enzyme

Relaxing enzyme

Untwisting enzyme

w-Protein

Swivelase

UMLS says it’s ==>

Page 73: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200473

Motivation - Shared Ontologies

• Ontologies in the life sciences currently exist, but not in a coordinated/shared manner

• Shared ontologies provide benefits;– sharing the work– database integration– exchange of biological data– developing shared understandings– differences can provide focus on interesting

problems

Page 74: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200474

The Solution: Ontologies

“An ontology is a specification of a conceptualization.”

“An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents. ... A common ontology defines the vocabulary with which queries and assertions are exchanged among agents.”

T.R. Gruber (1993)

Page 75: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200475

Goals of Ontologies• Provide standardized vocabularies for text

mining and information retrieval

• Formalized ontologies are expressed in a common language (or a small number of languages), facilitating representation and exchange of ontological knowledge

• Building common ontologies will establish shared understandings within the community so, create a consortium as a forum to develop these ontologies

Page 76: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200476

Bio-Ontologies Consortium Goals• Enable interoperability/exchange of life

sciences information• Establish a consortium for promoting and

sharing open-source ontologies in the Life Sciences

• Establish user community for sharing experiences with designing and building ontologies for the Life Sciences

• Develop synergies with the Knowledge Management community to target tools/languages to life sciences ontologies

• Create a permanent portal for the exchange of ontologies and ontology building tools

Page 77: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200477

Bio-Ontologies Consortium Activity

• Enable interoperability/exchange of life sciences information– Successful exchange depends on;

• Common, shared definitions

• Common language to describe definitions

– Therefore, select a language, or a small set of languages, for the exchange of life sciences ontologies

Page 78: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200478

Select Candidate Languages (1)• Ontolingua

– Long-standing effort in KR community

– Based on work for common interchange language

• CycL– Significant effort in KR community

– Largest commercial vendor of ontological tools

• OML/CKML– XML based language

– new language, so possible to influence development

• OPM– OO model to describe single- and multi-DB schemas

– tool used in bioinformatic community

Page 79: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200479

Select Candidate Languages (2)• XML and XML/RDF

– Web-based language– Significant work going on to extend expressivity

• UML– Widely used modeling tool in commercial marketplace– Based on OO concepts (supported by industry)

• OKBC– API for accessing distributed Knowledge Bases– Current work by KR community

• ASN.1– Early representation language for Bioinformatics

• ODL– De facto standard for OO databases

Page 80: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200480

Evaluation Criteria (1)• Language Support and Standardization

– Does the language have a formal specification?– What support (documentation, tutorials, tech

support, …) is available?– Does the language implement a standard? If so,

who controls this standard?

• Data model/capabilities– How rich is the expressiveness of the language,

I.e., does the language support negation, conjunction, disjunction, relations, ...

Page 81: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200481

Evaluation Criteria (2)

• Performance– Scalability to real-world problems– Stability (languages with tools/environments)

• Other Issues/Pragmatics– Current users of the language– Domains in which the language has been

applied– Connection to data sources (knowledge sources

& storage formats (relational, OO, …))

Page 82: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200482

Initial Evaluation - Results

• Keys to acceptance:– Rich expressive power– Stability and history of use– Approachable/understandable syntax– Open to collaboration

• Keys to non-acceptance:– Proprietary language– Wedded to a commercial system

Page 83: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200483

Initial Evaluation - Results

Express Stable Tools Support Collab Status

Ontolingua High High High Med Med IN

OML/CKML High High Low Med High IN

XML/RDF Low Med Low High High OUT

OKBC Med High Med Med Med OUT

OPM Med High High Med Low OUT

CycL High High High Med Low OUT

UML/XMI Low Med High Low Low OUT

Page 84: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200484

Next Level Evaluation• Two languages stood out as strong

candidates;– Ontolingua– OML/CKML

• Conduct experiments to represent biological entities;– select two life sciences ontologies

• Ecocyc Gene Ontology

• GeneClinics data model/ontology

– represent each ontology in both Ontolingua and OML

Page 85: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200485

Gene Ontology - Ontolingua (1)(DEFINE-CLASS |Genes| (?X)"The class of all genes is divided into several subclasses. Genes whose function is unknown or known only approximately are grouped into the classes ORFs and Unclassified-Genes, respectively. Genes of known function have been classified using two orthogonal classification schemes developed by Monica Riley. One scheme classifies genes according to the physiological role of their product class (Physiological-Roles); the other scheme classifies genes according to the function of their product, such as enzymes and transport proteins (Product-Types).”:DEF (AND (|DNA-Segments| ?X))) ?VALUE)))

(DEFINE-FUNCTION CENTISOME-POSITION (?FRAME) :-> ?VALUE"This slot lists the map position of this gene on the chromosome in centisome units.”:DEF (AND (|Genes| ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION CITATIONS (?FRAME ?VALUE) "This slot lists general citations pertaining to the object containing the slot. Each value of the slot is a citation of the form[reference-id].”:DEF (AND (|Organisms| ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION COMMENT (?FRAME ?VALUE)"The Comment slot stores a general comment about the object that contains the slot.”:DEF (AND (:THING ?FRAME) (STRING ?VALUE)))

(DEFINE-FUNCTION COMMON-NAME (?FRAME) :-> ?VALUE "The primary name by which an object is known to scientists -- a widely used and familiar name (in some cases arbitrary choices must be made).”:DEF (AND (|Organisms| ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION EVIDENCE (?FRAME ?VALUE) "Describes evidence for the defined function of this object. Currently we distinguish between function that is determined experimentally, and function that is determined through computational sequence analysis.”:DEF (AND (|Genes| ?FRAME) ((:ONE-OF :EXPERIMENT :SEQUENCE-ANALYSIS) ?VALUE)))

Page 86: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200486

Gene Ontology - Ontolingua (2)(DEFINE-RELATION HISTORY (?FRAME ?VALUE)"Contains a textual history of changes made to this frame. Each item is either a string or a note frame.":DEF (AND (:THING ?FRAME) ((:OR :STRING |Notes|) ?VALUE)))

(DEFINE-FUNCTION INTERRUPTED? (?FRAME) :-> ?VALUE "The value of this slot is T for genes that are interrupted, i.e., those that have an early stop codon inserted.”:DEF (AND (|Genes| ?FRAME) (BOOLEAN ?VALUE)))

(DEFINE-FUNCTION LEFT-END-POSITION (?FRAME) :-> ?VALUE“”:DEF (AND (|DNA-Segments| ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION PRODUCT (?FRAME ?VALUE)"This slot lists the product of a gene, which could be a polypeptide or a tRNA. Multiple products will be recorded in the case that several chemically modified forms of the protein product exist." :DEF (AND (|Genes| ?FRAME) ((:OR |Polypeptides| RNA) ?VALUE)))

(DEFINE-RELATION PRODUCT-STRING (?FRAME ?VALUE) "This slot holds a text string that describes the product of this gene; this slot is only used when EcoCyc does not describe the gene product as a frame (such as a polypeptide frame).”:DEF (AND (|Genes| ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION PRODUCT-TYPES (?FRAME ?VALUE) "Describes the type of the gene product, e.g., is it an enzyme, an RNA, etc.”:DEF (AND (|Genes| ?FRAME) ((:ONE-OF :ENZYME :REGULATOR :LEADER :MEMBRANE :TRANSPORT :STRUCTURAL :RNA :PHENOTYPE :FACTOR :CARRIER) ?VALUE)))

Page 87: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200487

Gene Ontology - Ontolingua (3)(DEFINE-FUNCTION RIGHT-END-POSITION (?FRAME) :-> ?VALUE””:DEF (AND (|DNA-Segments| ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION SYNONYMS (?FRAME ?VALUE) "One or more secondary names for an object -- names that a scientist might attempt to use to retrieve the object. The Synonyms should include any name a user might use to try to retrieve an object.”:DEF (AND (|Generalized-Reactions| ?FRAME) (STRING ?VALUE)))

(DEFINE-FUNCTION TRANSCRIPTION-DIRECTION (?FRAME) :-> ?VALUE "This slot specifies the direction along the chromosome in which this gene is transcribed; allowable values are + or -." :DEF (AND (DNA ?FRAME) ((:ONE-OF "+" "-") ?VALUE)))

Page 88: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200488

Gene Ontology - OML/CKML (1)<CKML> <Ontology id="Riley's Gene Classes" version="1.0"> <comment> This OML ontology defines an encoding of the gene classificationsystem developed by Monica Riley. </comment> <extends ontology="http://www.ckml.org/ontology/" prefix="CKML"/> <Object type="Genes"> <comment> The class of all genes is divided into several subclasses. Genes whose function is unknown or known only approximately are grouped into the classes ORFs and Unclassified-Genes, respectively. Genes of known function have been classified using two orthogonal classification schemes developed by Monica Riley. One scheme classifies genes according to the physiological role of their product class (Physiological-Roles); the other scheme classifies genes according to the function of their product, such as enzymes and transport proteins (Product-Types). </comment> </Object>

<Function type="LEFT-END-POSITION" srcType="Genes" tgtType="data.Real"/> <Function type="INTERRUPTED?" srcType="Genes" tgtType="data.Boolean"> <comment> The value of this slot is T for genes that are interrupted, i.e., those that have an early stop codon inserted. </comment> </Function> <BinaryRelation type="HISTORY" srcType="CKML#Object" tgtType="data.String"> <comment> Contains a textual history of changes made to this frame. Each item is either a string or a note frame. </comment> </BinaryRelation> <Theory genus="Evidence"> <Object type="EXPERIMENT"/> <Object type="SEQUENCE-ANALYSIS"/> </Theory>

Page 89: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200489

Gene Ontology - OML/CKML (2)<BinaryRelation type="EVIDENCE" srcType="Genes" tgtType="Evidence"> <comment> Describes evidence for the defined function of this object. Currently we distinguish between function that is determined experimentally, and function that is determined through computational sequence analysis. </comment> </BinaryRelation> <Function type="CENTISOME-POSITION" srcType="Genes" tgtType="data.Real"> <comment> This slot lists the map position of this gene on the chromosome in centisome units. </comment> </Function> <BinaryRelation type="CITATIONS" srcType="CKML#Object" tgtType="data.String"> <comment> This slot lists general citations pertaining to the object containing the slot. Each value of the slot is a citation of the form [reference-id]. </comment> </BinaryRelation> <BinaryRelation type="COMMENT" srcType="CKML#Object" tgtType="data.String"> <comment> The Comment slot stores a general comment about the object that contains the slot. </comment> </BinaryRelation> <Function type="COMMON-NAME" srcType="CKML#Object" tgtType="data.String"> <comment> The primary name by which an object is known to scientists -- a widely used and familiar name (in some cases arbitrary choices must be made). </comment> </Function> <Theory genus="Transcription-Direction"> <Object type="+"/> <Object type="-"/> </Theory> <Function type="TRANSCRIPTION-DIRECTION" srcType="Genes"tgtType="Transcription-Direction"> <comment> This slot specifies the direction along the chromosome in which this gene is transcribed; allowable values are + or -. </comment> </Function> <BinaryRelation type="PRODUCT" srcType="Genes" tgtType="Polypeptides"/> </BinaryRelation>

Page 90: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200490

Gene Ontology - OML/CKML (3)<BinaryRelation type="SYNONYMS" srcType="CKML#Object" tgtType="data.String"> <comment> One or more secondary names for an object -- names that a scientist might attempt to use to retrieve the object. The Synonyms should include any name a user might use to try to retrieve an object. </comment> </BinaryRelation> <BinaryRelation type="PRODUCT-STRING" srcType="Genes" tgtType="data.String"> <comment> This slot holds a text string that describes the product of this gene; this slot is only used when EcoCyc does not describe the gene product as a frame (such as a polypeptide frame). </comment> </BinaryRelation> <Theory genus="Product-Types"> <Object type="ENZYME"/> <Object type="REGULATOR"/> <Object type="LEADER"/> <Object type="MEMBRANE"/> <Object type="TRANSPORT"/> <Object type="STRUCTURAL"/> <Object type="RNA"/> <Object type="PHENOTYPE"/> <Object type="FACTOR"/> <Object type="CARRIER"/> </Theory> <BinaryRelation type="PRODUCT-TYPES" srcType="Genes" tgtType="Product-Types"> <comment> Describes the type of the gene product, e.g., is it an enzyme, an RNA, etc. </comment> </BinaryRelation> <Function type="RIGHT-END-POSITION" srcType="Genes" tgtType="data.Real"/>

Page 91: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200491

<Collection.Object> <Genes id="EG10707" text="pheA"> <LEFT-END-POSITION tgt="2735765"/> <CENTISOME-POSITION tgt="58.97035d0"/> <TRANSCRIPTION-DIRECTION tgt="+"/> <RIGHT-END-POSITION tgt="2736925"/> </Genes> </Collection.Object> <Collection.BinaryRelation> <EVIDENCE src="EG10707" tgt="EXPERIMENT"/> <NAMES src="EG10707" tgt="pheA"/> <NAMES src="EG10707" tgt="b2599"/> <PRODUCT src="EG10707" tgt="CHORISMUTPREPHENDEHYDRAT-MONOMER"/> <PRODUCT-STRING src="EG10707" tgt="chorismate mutase-P and prephenate dehydratase"/> </Collection.BinaryRelation>

Gene Ontology - OML/CKML (4)

Page 92: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200492

Experiments - Results (Ecocyc)

• OML representation:– OML’s expressive capabilities captured most

aspects of gene ontology– some limitations in expressive capability; no

facets, cardinality or multiple collection types– terminology differences and definitions not

modular

• Ontolingua representation:– Ontolingua expressed all of gene ontology– Lisp syntax of Ontolingua not readily

approachable

Page 93: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200493

Experiments - Results (GeneClinics)

• OML representation:– Expressive capabilities adequate to the job– OML/CKML is based on conceptual graphs and

may have more expressive capabilities in the long term

• Ontolingua representation:– Ontolingua based on frames semantics which

more closely aligns with relational and OO data models

– Lisp syntax not acceptable to larger community

• Both languages would benefit from life sciences examples

Page 94: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200494

Conclusions and Recommendations

• The language most suitable for the exchange of life sciences ontologies should have the following key characteristics:– Frame-based representation

• Long history of work with frame-based representation model

• Mappings between this model and relational and/or OO data sources are easily expressed

– XML-based syntax• Critical for exchange among physically dispersed

community• New tools being developed in XML community• Lots of momentum in the web-based community

Page 95: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200495

Current Efforts

• Developed specification for an XML-based exchange language – XOL (XML Ontology Language) based on Ontolingua (Karp/Chaudhri)

• Frame-based semantics for OML/CKML

• Developing process for submission of life sciences ontologies to the Bio-Ontologies Consortium

Page 96: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200496

Other Ontology Efforts

• Gene Ontology Consortium (http://genome-www.stanford.edu/GO/)

• BioPathways Consortium (http://www.3rdmill.com/BioPathways)

• mmCIF (http://ndbserver.rutgers.edu/mmcif)

Page 97: March 11, 2004 1 Knowledge Representation Chapter 10

March 11, 200497

Bio-Ontologies Consortium - Future Work

• Content development• Elicit and review ontology submissions• Synergies with OMG• Provide public-domain ontologies to the

Life Sciences community and encourage use of those ontologies

• Bio-Ontologies 2000