Upload
allyson-mckinney
View
229
Download
1
Tags:
Embed Size (px)
Citation preview
March 11, 20041
Knowledge RepresentationChapter 10
March 11, 20042
Outline
• KR Introduction• Ontological Engineering• Categories and Objects• Actions, Situations, and Events• Mental Events and Mental Objects• Reasoning Systems for Categories• Reasoning with Default Information• Truth Maintenance Systems• Bio-Ontologies
March 11, 20043
KR Introduction• General problem in Computer Science• Solutions = Data Structures
– words– arrays– records– list
• More specific problem in AI• Solutions = knowledge structures
– lists– trees– procedural representations– logic and predicate calculus– rules– semantic nets and frames– scripts
March 11, 20044
Kinds of Knowledge
• Objects– Descriptions– Classifications
• Events– Time sequence– Cause and effect
• Relationships– Among objects– Between objects and events
• Meta-knowledge
Things we need to talk about and reason about; what do we know?
Distinguish between knowledge and its representation
March 11, 20045
Representation Mappings
• Knowledge Level
• Symbol Level
• Mappings are not one-to-one
• Never get it complete or exactly right
Internal Representation
English Representation
FactsReasoning Programs
March 11, 20046
Ontological Engineering
• Like knowledge engineering but applies to general-purpose knowledge bases
• Ultimate goal is to represent everything in the world!!
• Result is an upper ontologyAnything/Root
AbstractObjects GeneralizedEvents
RepresentationalObjects
Categories
NumbersSets
Sentences
Places
Measurements
Interval
WeightsTimes
Processes
ThingsMoments
PhyscialObjects
Solid Liquid Gas
Stuff
Agents
Humans
Animals
March 11, 20047
Special- and General-purpose Ontologies
• Special-purpose ontology:– Designed to represent a specific domain of
knowledge;• genetics (GO)• immune system (IMGT)• mathematics (Tom Gruber)
• General-purpose ontology:– Should be applicable in any special-purpose
domain– Unifies different domains of knowledge
• Upper ontology provides highest level framework - all other concepts follow
March 11, 20048
Cyc Upper Ontology
• Cycorp released 3,000 upper-level concepts into public domain
• Cyc Upper Ontology satisfies two important criteria;– It is universal: Every concept can be linked to it– It is articulate: Distinctions are necessary and
sufficient for most purposes
March 11, 20049
Categories - Representation
• Two choices for representation:– Predicate
• Basketball(b)
– Object• Basketballs
• Member(b, Basketballs)
• Subset(Basketballs, Balls)
March 11, 200410
Categories - Organizing
• Inheritance:– All instances of the category Food are edible– Fruit is a subclass of Food– Apples is a subclass of Fruit– Therefore, Apples are edible
• The Class/Subclass relationships among Food, Fruit and Apples is a taxonomy
March 11, 200411
Categories - Partitioning
• Disjoint: The categories have no members in common
• Exhaustive Decomposition: Every member of the category is included in at least one of the subcategories
• Partition: Disjoint exhaustive decomposition
March 11, 200412
Categories - Partitioning
Disjoint({Animals,Vegetables})
March 11, 200413
Categories - Partitioning
Disjoint({Animals,Vegetables})
Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})
March 11, 200414
Categories - Partitioning
Disjoint({Animals,Vegetables})
Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})
ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans})
March 11, 200415
Categories - Partitioning
Disjoint({Animals,Vegetables})
Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})
ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans})
ExhaustiveDecomposition(s,c) (i ic c2 c2s ic2)
March 11, 200416
Categories - Partitioning
Disjoint({Animals,Vegetables})
Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})
ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans})
ExhaustiveDecomposition(s,c) (i ic c2 c2s ic2)
Partition({Males,Females},Animals)
March 11, 200417
Categories - Partitioning
Disjoint({Animals,Vegetables})
Disjoint(s) <=> (c1,c2 c1s c2s c1c2 Intersection(c1,c2) = {})
ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans})
ExhaustiveDecomposition(s,c) (i ic c2 c2s ic2)
Partition({Males,Females},Animals)
Parition(s,c) Disjoint(s) ExhaustiveDecomposition(s,c)
March 11, 200418
Categories - More
• PartOfPartOf(Bucharest,Romania)
PartOf(Romania,EasternEurope)
PartOf(EasternEurope,Europe)
PartOf(Europe,Earth)
• Composite ObjectsBiped(a) c1,c2,b Leg(c1) Leg(c2)
Body(b) PartOf(c1,a) PartOf(c2,a) PartOf(b,a) Attached(c1,b) Attached(c2,b) c1c2 [c3 Leg(c3) PartOf(c3,a) (c3=c1 c3=c2)]
March 11, 200419
Categories – And More
• Count Nouns and Mass Nouns– How many aardvarks? How many butters!?!
x Butter PartOf(y,x) y Butter
• Intrinsic and Extrinsic Properties– Intrinsic properties belong to the very substance
of the object; e.g. flavor, color, density, boiling point, etc.
– Extrinsic properties change if the object is changed (cut in half); e.g. weight, length, shape, etc.
March 11, 200420
Actions, Situations and Events
March 11, 200421
Situation Calculus• The states resulting from executing actions
• Ontology:– Situations: logical terms describing initial
situation and all situations that result from executing actions on a given situation
Result(a,s)
– Fluents: functions and predicates that may be different in different situations
Age(Wumpus,S0) is Wumpus age in situation S0
– Atemporal or eternal: functions and predicates that are constant across all situations
Gold(G1)
March 11, 200422
Situation Calculus – Actions
• Each action described by two axioms:– Possibility Axiom:
Preconditions Poss(a,s)– Effect Axiom:
Poss(a,s) changes that result from taking action
March 11, 200423
Situation Calculus - ExamplePossibility Axioms:
At(Agent,x,s) Adjacent(x,y) Poss(Go(x,y),s).
Gold(g) At(Agent,x,s) At(g,x,s) Poss(Grab(g),s).
Holding(g,s) Poss(Release(g),s).
Effect Axioms:
Poss(Go(x,y),s) At(Agent,y,Result(Go(x,y),s).
Poss(Grab(g),s) Holding(g,Result(Grab(g),s)).
Poss(Release(g),s) Holding(g,Result(Grab(g),s)).
March 11, 200424
Go for the Gold!GOAL: Bring the gold from [1,2] to [1,1]
At(Agent,[1,1],S0) At(G1,[1,2],S0).Holding(G1,S0).Gold(G1).Adjacent([1,1],[1,2]) Adjacent([1,2],[1,1]).
Do It:Go([1,1],[1,2]).
Result:At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
Now, can I grab the gold?Grab(G1).
March 11, 200425
Go for the Gold!GOAL: Bring the gold from [1,2] to [1,1]
At(Agent,[1,1],S0) At(G1,[1,2],S0).Holding(G1,S0).Gold(G1).Adjacent([1,1],[1,2]) Adjacent([1,2],[1,1]).
Do It:Go([1,1],[1,2]).
Result:At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
Now, can I grab the gold?Grab(G1).
March 11, 200426
Go for the Gold!GOAL: Bring the gold from [1,2] to [1,1]
At(Agent,[1,1],S0) At(G1,[1,2],S0).Holding(G1,S0).Gold(G1).Adjacent([1,1],[1,2]) Adjacent([1,2],[1,1]).
Do It:Go([1,1],[1,2]).
Result:At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
Now, can I grab the gold?Grab(G1).
March 11, 200427
Go for the Gold!GOAL: Bring the gold from [1,2] to [1,1]
At(Agent,[1,1],S0) At(G1,[1,2],S0).Holding(G1,S0).Gold(G1).Adjacent([1,1],[1,2]) Adjacent([1,2],[1,1]).
Do It:Go([1,1],[1,2]).
Result:At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
Now, can I grab the gold?Grab(G1).
March 11, 200428
The Frame Problem
Result:
At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
Now, can I grab the Gold?
Grab(G1).
March 11, 200429
The Frame Problem
Result:
At(Agent,[1,2],Result(Go([1,1],[1,2]),S0).
Now, can I grab the Gold?
Grab(G1).
What in the knowledge base allows me to go from my Result (above) to Grab(G1)?
March 11, 200430
The Frame Problem
Result:
At(Agent,[1,2],Result(Go([1,1],[1,2]),S0).
Now, can I grab the Gold?
Grab(G1).
What in the knowledge base allows me to go from my Result (above) to Grab(G1)?
nothing
March 11, 200431
The Frame Problem
• How do we represent all the things in the world that stay the same?– Represent all things at all situations; the
representational frame problem– Project the results of a sequence of actions; the
inferential frame problem
March 11, 200432
Representational Frame Problem
• Successor-State Axiom:Action is possible (Fluent is true in result state
Action’s effect made it true It was true before and action left it alone).
• Truth value of each fluent in the next state depends on action and truth value in the current state
Poss(a,s) (At(Agent,y,Result(a,s)) a = Go(x,y) (At(Agent,y,s) a Go(y,z))).
March 11, 200433
Time and Event Calculus
• Event Calculus based on points in time
• Fluents hold at points in time as opposed to holding in situations
“A fluent is true at a point in time if the fluent was initiated by an event at some time in the past and was not terminated by an intervening event.”
March 11, 200434
Event Calculus
• Initiates(e,f,t) and Terminates(w,f,t)
• Event Calculus Axiom:T(f,t2) e,t Happens(e,t) Initiates(e,f,t)
(t<t2) Clipped(f,t,t2)
Clipped(f,t,t2) e,t Happens(e,t1) Terminates(e,f,t1) (t < t1) (t1 < t2)
March 11, 200435
Event Calculus - more
• Can be extended to handle;– indirect effects– continuous change– nondeterministic effects– causal constraints– . . .
March 11, 200436
Generalized Events
• Combines aspects of space and time calculus
• Allows representation of events occurring in a space-time continuum
World War II is an event that happened in various geographic locations during a specific period of time within the 20th century.
March 11, 200437
Processes
• Discrete Events: the event is a whole and a part of the event is no longer the same event
• Processes can include subintervals; a part of a plane flight is still a member of the Flying class (aka liquid events)
• Stated more precisely: “Any subinterval of a process is also a member of the same process category.”
March 11, 200438
Intervals
• Moment: has temporal duration of zero
• Extended Interval: has temporal duration of greater than zeroPartition({Moments,ExtendedIntervals},Intervals)
Member(i,Moments) Duration(i) = Seconds(0).
March 11, 200439
Intervals Ontology
Meet(i,j) Time(End(i)) = Time(Start(j)).
Before(i,j) Time(End(i)) < Time(Start(j)).
After(j,i) Before(i,j).
During(i,j) Time(Start(j)) Time(Start(i)) Time(End(i)) Time(End(j)).
Overlap(i,j) k During(k,i) During(k,j).
March 11, 200440
Mental Events and Mental Objects
• Knowledge about beliefs, specifically about those beliefs held by an agent– “Which agent knows about the geography of Maine?”
• Provides an agent the ability to reason about beliefs of agents
• However, need to define propositional attitudes, such as Believes, Knows and Wants as relations where the second argument is referentially opaque (no substitution of equal terms)
March 11, 200441
Reasoning Systems for Categories• Categories are KR building blocks
• Two primary systems for reasoning:– Semantic Networks
• Graphical aids for visualizing knowledge
• Mechanisms for inferring properties of objects based on category membership
– Description Logics• Formal language for constructing and combining
category definitions
• Algorithms for classifying objects and determining subsumption relationships
March 11, 200442
Semantic Networks
• Graphical notation with underlying logical representation
• A form of logic, but not FOL
• Capable of representing objects, relations, quantification, …
• Convenient representation of inheritance
• Multiple Inheritance (sometimes)
• Inverse links
• Extendable using procedural attachments
March 11, 200443
Semantic Networks - More
• Can only express binary relationships – making it more difficult to express n-ary predicates; e.g. Fly(Shankar,NewYork,NewDelhi,Monday)
• Negation, disjunction, nested function symbols, and existential quantification are missing
• Some SNs include procedural attachments
• Represents default values – assertions may be overridden by more specific values
March 11, 200444
Semantic Networks
Mammals
Persons
Females Males
Mary JohnSisterOf
SubsetOf
SubsetOf SubsetOf
Legs 1
HasMother2Legs
March 11, 200445
Description Logics
• Notations to make it easier to describe definitions and properties of categories
• Taxonomic structure is organizing principle• Subsumption: Determine if one category is a
subset of another• Classification: Determine the category in
which an object belongs• Consistency: Determine if membership criteria
are logically satisfiable
March 11, 200446
Description Logics
• CLASSIC was one of first languages (Borgida, et al, 1989)
• “All bachelors are unmarried adult males.”– DL:Bachelor = And(Unmarried,Adult,Male).– FOL:Bachelor(x) Unmarried(x) Adult(x) Male(x)
March 11, 200447
Description Logics
• What does this DL statement say?
And(Man,AtLeast(3,Son), AtMost(2,Daughter), All(Son,And(Unemployed,Married, All(Spouse,Doctor))), All(Daughter,And(Professor, Fills(Department,Physics,Math)))).
March 11, 200448
Description Logics - More
• Emphasis on tractability of inference• Inference happens by;
– Describe the problem instance
– Asserting the instance into the KB to be handled by the subsumption apparatus
• FOL cannot predict solution time• DL solve in time polynomial in size of KB• DLs usually lack disjuntion and negation (for
time/speed considerations)
March 11, 200449
Current Description Logic• DAML+OIL
– DARPA Agent Mark-up Language + Ontology Inference Language (OIL)
– Comes out of DARPA initiative– OIL from University of Manchester– http://www.w3.org/TR/daml+oil-reference
• OWL– Ontology Web Language– A language for the semantic web– “Next generation” DAML+OIL– Flavors: OWL-Lite, OWL-DL and OWL (full)– W3C recommendation as of Feb 10, 2004– http://www.w3.org/TR/2004/REC-owl-features-20040210/
March 11, 200450
Reasoning with Default Information
• Open and Closed worlds– Open World: Information provided is not
assumed to be complete, therefore inferences may result in sentences whose truth value is unknown
– Closed World: Information provided is assumed complete, therefore ground sentences not asserted to be true are assumed false
– Negation as Failure: A negative literal, not P, can be “proved” true if the proof of P fails
March 11, 200451
Nonmonotonic Logics: Circumscription
• Version of closed-world assumption• Specify predicates that are almost always
false• Default rule stating that birds fly:Bird(x) Abnormal(x) Flies(x)• Abnormal() is circumscribed – reasoner
assumes Abnormal() unless Abnormal() is known to be true
• Circumspection is model preference logic; notion of preferred models in KB
March 11, 200452
Nonmonotonic Logics:Default Logic
• Default rules express contingencies:Bird(x) : Flies(x)/Flies(x)• If Bird(x) is true, and Flies(x) consistent
with KB, then conclude Flies(x) (by default)
• Default rule form is;
P : J1, …, Jn/C• P = Prerequisite; J = Justifications; C =
Conclusions• If any J is false, then C is not true
March 11, 200453
Truth Maintenance Systems
• Designed to handle Belief Revision:
March 11, 200454
Truth Maintenance Systems
• Designed to handle Belief Revision:
Let’s say our KB contains sentence P
March 11, 200455
Truth Maintenance Systems
• Designed to handle Belief Revision:
Let’s say our KB contains sentence P
But P is found to be incorrect/untrue
March 11, 200456
Truth Maintenance Systems
• Designed to handle Belief Revision:
Let’s say our KB contains sentence P
But P is found to be incorrect/untrue
So, we want to say Tell(KB,P)
March 11, 200457
Truth Maintenance Systems
• Designed to handle Belief Revision:
Let’s say our KB contains sentence P
But P is found to be incorrect/untrue
So, we want to say Tell(KB,P)
First, though, Retract(KB,P) to avoid P P
March 11, 200458
Truth Maintenance Systems
• Designed to handle Belief Revision:
Let’s say our KB contains sentence P
But P is found to be incorrect/untrue
So, we want to say Tell(KB,P)
First, though, Retract(KB,P) to avoid P P
What if P Q? What happens to Q?
March 11, 200459
Truth Maintenance Systems
• Designed to handle Belief Revision:
Let’s say our KB contains sentence P
But P is found to be incorrect/untrue
So, we want to say Tell(KB,P)
First, though, Retract(KB,P) to avoid P P
What if P Q? What happens to Q?
Retract Q?
March 11, 200460
Truth Maintenance Systems
• Designed to handle Belief Revision:
Let’s say our KB contains sentence P
But P is found to be incorrect/untrue
So, we want to say Tell(KB,P)
First, though, Retract(KB,P) to avoid P P
What if P Q? What happens to Q?
Retract Q?
But what if we also have R Q?
March 11, 200461
Truth Maintenance Systems
• Designed to handle Belief Revision:
Let’s say our KB contains sentence P
But P is found to be incorrect/untrue
So, we want to say Tell(KB,P)
First, though, Retract(KB,P) to avoid P P
What if P Q? What happens to Q?
Retract Q?
But what if we also have R Q?
Therefore…
March 11, 200462
Truth Maintenance Systems• “Rollback” mechanism – doesn’t scale up• Justification-based Truth Maintenance System
(JTMS)– Includes in the KB the set of sentences from which the
sentence was inferred
– Sentences are in or out, based on truth value of supporting sentences
• Assumption-based Truth Maintenance System (ATMS)– Maintains a set of supporting sentences, representing
all states
– Sentence holds in just those cases where all assumptions in one of the assumptions sets hold
March 11, 200463
Justification-based TMS
• Each sentence in KB includes all sentences that made it true
P Q has justification {P, P Q}
• What if Q has the following justifications, and we Retract(P)?
March 11, 200464
Justification-based TMS
• Each sentence in KB includes all sentences that made it true
P Q has justification {P, P Q}
• What if Q has the following justifications, and we Retract(P)?
{P, P Q}
March 11, 200465
Justification-based TMS
• Each sentence in KB includes all sentences that made it true
P Q has justification {P, P Q}
• What if Q has the following justifications, and we Retract(P)?
{P, P Q}
{P, P R Q}
March 11, 200466
Justification-based TMS
• Each sentence in KB includes all sentences that made it true
P Q has justification {P, P Q}
• What if Q has the following justifications, and we Retract(P)?
{P, P Q}
{P, P R Q}
{R, R P Q}
March 11, 200467
Justification-based TMS
• Each sentence in KB includes all sentences that made it true
P Q has justification {P, P Q}
• What if Q has the following justifications, and we Retract(P)?
{P, P Q}
{P, P R Q}
{R, R P Q}
• Sentences that comprise Justifications are in or out (not removed from KB) – efficiency
March 11, 200468
Assumption-based TMS
• Designed to make Belief Revision efficient
• Represents all states at the same time
• Each sentence in the KB has a set of assumption sets
• For each sentence in the KB, the sentence holds when all assumptions in one of it’s assumption sets hold
March 11, 200469
Ontologies in Practice:The BioOntologies Consortium
March 11, 200470
Outline
• Motivation– The problem– The solution
• Exchange Languages Evaluation– Initial Evaluation– Second-level Evaluation– Conclusions/Recommendations
• Future Work
March 11, 200471
Motivation
• Explosive and uncontrolled growth of Bioinformation
• It is increasingly important in the life sciences to integrate information across scientific disciplines and business areas– Terminology in the domain of molecular
biology is inconsistent - information searches can be incomplete and inaccurate
– Definitions and descriptions of life sciences objects differ among data sources - significant time and effort is required to integrate those data sources
March 11, 200472
What is DNA Topoisomerase?EC 5.99.1.2
DNA Nicking-Closing Protein
DNA Relaxing Enzyme
DNA Relaxing Protein
DNA Topoisomerase
DNA Topoisomerase I
DNA Type 1 Topoisomerase
DNA Untwisting Enzyme
DNA Untwisting Protein
Omega Protein
Topoisomerase I
Type I DNA Topoisomerase
Nicking-closing enzyme
Relaxing enzyme
Untwisting enzyme
w-Protein
Swivelase
UMLS says it’s ==>
March 11, 200473
Motivation - Shared Ontologies
• Ontologies in the life sciences currently exist, but not in a coordinated/shared manner
• Shared ontologies provide benefits;– sharing the work– database integration– exchange of biological data– developing shared understandings– differences can provide focus on interesting
problems
March 11, 200474
The Solution: Ontologies
“An ontology is a specification of a conceptualization.”
“An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents. ... A common ontology defines the vocabulary with which queries and assertions are exchanged among agents.”
T.R. Gruber (1993)
March 11, 200475
Goals of Ontologies• Provide standardized vocabularies for text
mining and information retrieval
• Formalized ontologies are expressed in a common language (or a small number of languages), facilitating representation and exchange of ontological knowledge
• Building common ontologies will establish shared understandings within the community so, create a consortium as a forum to develop these ontologies
March 11, 200476
Bio-Ontologies Consortium Goals• Enable interoperability/exchange of life
sciences information• Establish a consortium for promoting and
sharing open-source ontologies in the Life Sciences
• Establish user community for sharing experiences with designing and building ontologies for the Life Sciences
• Develop synergies with the Knowledge Management community to target tools/languages to life sciences ontologies
• Create a permanent portal for the exchange of ontologies and ontology building tools
March 11, 200477
Bio-Ontologies Consortium Activity
• Enable interoperability/exchange of life sciences information– Successful exchange depends on;
• Common, shared definitions
• Common language to describe definitions
– Therefore, select a language, or a small set of languages, for the exchange of life sciences ontologies
March 11, 200478
Select Candidate Languages (1)• Ontolingua
– Long-standing effort in KR community
– Based on work for common interchange language
• CycL– Significant effort in KR community
– Largest commercial vendor of ontological tools
• OML/CKML– XML based language
– new language, so possible to influence development
• OPM– OO model to describe single- and multi-DB schemas
– tool used in bioinformatic community
March 11, 200479
Select Candidate Languages (2)• XML and XML/RDF
– Web-based language– Significant work going on to extend expressivity
• UML– Widely used modeling tool in commercial marketplace– Based on OO concepts (supported by industry)
• OKBC– API for accessing distributed Knowledge Bases– Current work by KR community
• ASN.1– Early representation language for Bioinformatics
• ODL– De facto standard for OO databases
March 11, 200480
Evaluation Criteria (1)• Language Support and Standardization
– Does the language have a formal specification?– What support (documentation, tutorials, tech
support, …) is available?– Does the language implement a standard? If so,
who controls this standard?
• Data model/capabilities– How rich is the expressiveness of the language,
I.e., does the language support negation, conjunction, disjunction, relations, ...
March 11, 200481
Evaluation Criteria (2)
• Performance– Scalability to real-world problems– Stability (languages with tools/environments)
• Other Issues/Pragmatics– Current users of the language– Domains in which the language has been
applied– Connection to data sources (knowledge sources
& storage formats (relational, OO, …))
March 11, 200482
Initial Evaluation - Results
• Keys to acceptance:– Rich expressive power– Stability and history of use– Approachable/understandable syntax– Open to collaboration
• Keys to non-acceptance:– Proprietary language– Wedded to a commercial system
March 11, 200483
Initial Evaluation - Results
Express Stable Tools Support Collab Status
Ontolingua High High High Med Med IN
OML/CKML High High Low Med High IN
XML/RDF Low Med Low High High OUT
OKBC Med High Med Med Med OUT
OPM Med High High Med Low OUT
CycL High High High Med Low OUT
UML/XMI Low Med High Low Low OUT
March 11, 200484
Next Level Evaluation• Two languages stood out as strong
candidates;– Ontolingua– OML/CKML
• Conduct experiments to represent biological entities;– select two life sciences ontologies
• Ecocyc Gene Ontology
• GeneClinics data model/ontology
– represent each ontology in both Ontolingua and OML
March 11, 200485
Gene Ontology - Ontolingua (1)(DEFINE-CLASS |Genes| (?X)"The class of all genes is divided into several subclasses. Genes whose function is unknown or known only approximately are grouped into the classes ORFs and Unclassified-Genes, respectively. Genes of known function have been classified using two orthogonal classification schemes developed by Monica Riley. One scheme classifies genes according to the physiological role of their product class (Physiological-Roles); the other scheme classifies genes according to the function of their product, such as enzymes and transport proteins (Product-Types).”:DEF (AND (|DNA-Segments| ?X))) ?VALUE)))
(DEFINE-FUNCTION CENTISOME-POSITION (?FRAME) :-> ?VALUE"This slot lists the map position of this gene on the chromosome in centisome units.”:DEF (AND (|Genes| ?FRAME) (NUMBER ?VALUE)))
(DEFINE-RELATION CITATIONS (?FRAME ?VALUE) "This slot lists general citations pertaining to the object containing the slot. Each value of the slot is a citation of the form[reference-id].”:DEF (AND (|Organisms| ?FRAME) (STRING ?VALUE)))
(DEFINE-RELATION COMMENT (?FRAME ?VALUE)"The Comment slot stores a general comment about the object that contains the slot.”:DEF (AND (:THING ?FRAME) (STRING ?VALUE)))
(DEFINE-FUNCTION COMMON-NAME (?FRAME) :-> ?VALUE "The primary name by which an object is known to scientists -- a widely used and familiar name (in some cases arbitrary choices must be made).”:DEF (AND (|Organisms| ?FRAME) (STRING ?VALUE)))
(DEFINE-RELATION EVIDENCE (?FRAME ?VALUE) "Describes evidence for the defined function of this object. Currently we distinguish between function that is determined experimentally, and function that is determined through computational sequence analysis.”:DEF (AND (|Genes| ?FRAME) ((:ONE-OF :EXPERIMENT :SEQUENCE-ANALYSIS) ?VALUE)))
March 11, 200486
Gene Ontology - Ontolingua (2)(DEFINE-RELATION HISTORY (?FRAME ?VALUE)"Contains a textual history of changes made to this frame. Each item is either a string or a note frame.":DEF (AND (:THING ?FRAME) ((:OR :STRING |Notes|) ?VALUE)))
(DEFINE-FUNCTION INTERRUPTED? (?FRAME) :-> ?VALUE "The value of this slot is T for genes that are interrupted, i.e., those that have an early stop codon inserted.”:DEF (AND (|Genes| ?FRAME) (BOOLEAN ?VALUE)))
(DEFINE-FUNCTION LEFT-END-POSITION (?FRAME) :-> ?VALUE“”:DEF (AND (|DNA-Segments| ?FRAME) (NUMBER ?VALUE)))
(DEFINE-RELATION PRODUCT (?FRAME ?VALUE)"This slot lists the product of a gene, which could be a polypeptide or a tRNA. Multiple products will be recorded in the case that several chemically modified forms of the protein product exist." :DEF (AND (|Genes| ?FRAME) ((:OR |Polypeptides| RNA) ?VALUE)))
(DEFINE-RELATION PRODUCT-STRING (?FRAME ?VALUE) "This slot holds a text string that describes the product of this gene; this slot is only used when EcoCyc does not describe the gene product as a frame (such as a polypeptide frame).”:DEF (AND (|Genes| ?FRAME) (STRING ?VALUE)))
(DEFINE-RELATION PRODUCT-TYPES (?FRAME ?VALUE) "Describes the type of the gene product, e.g., is it an enzyme, an RNA, etc.”:DEF (AND (|Genes| ?FRAME) ((:ONE-OF :ENZYME :REGULATOR :LEADER :MEMBRANE :TRANSPORT :STRUCTURAL :RNA :PHENOTYPE :FACTOR :CARRIER) ?VALUE)))
March 11, 200487
Gene Ontology - Ontolingua (3)(DEFINE-FUNCTION RIGHT-END-POSITION (?FRAME) :-> ?VALUE””:DEF (AND (|DNA-Segments| ?FRAME) (NUMBER ?VALUE)))
(DEFINE-RELATION SYNONYMS (?FRAME ?VALUE) "One or more secondary names for an object -- names that a scientist might attempt to use to retrieve the object. The Synonyms should include any name a user might use to try to retrieve an object.”:DEF (AND (|Generalized-Reactions| ?FRAME) (STRING ?VALUE)))
(DEFINE-FUNCTION TRANSCRIPTION-DIRECTION (?FRAME) :-> ?VALUE "This slot specifies the direction along the chromosome in which this gene is transcribed; allowable values are + or -." :DEF (AND (DNA ?FRAME) ((:ONE-OF "+" "-") ?VALUE)))
March 11, 200488
Gene Ontology - OML/CKML (1)<CKML> <Ontology id="Riley's Gene Classes" version="1.0"> <comment> This OML ontology defines an encoding of the gene classificationsystem developed by Monica Riley. </comment> <extends ontology="http://www.ckml.org/ontology/" prefix="CKML"/> <Object type="Genes"> <comment> The class of all genes is divided into several subclasses. Genes whose function is unknown or known only approximately are grouped into the classes ORFs and Unclassified-Genes, respectively. Genes of known function have been classified using two orthogonal classification schemes developed by Monica Riley. One scheme classifies genes according to the physiological role of their product class (Physiological-Roles); the other scheme classifies genes according to the function of their product, such as enzymes and transport proteins (Product-Types). </comment> </Object>
<Function type="LEFT-END-POSITION" srcType="Genes" tgtType="data.Real"/> <Function type="INTERRUPTED?" srcType="Genes" tgtType="data.Boolean"> <comment> The value of this slot is T for genes that are interrupted, i.e., those that have an early stop codon inserted. </comment> </Function> <BinaryRelation type="HISTORY" srcType="CKML#Object" tgtType="data.String"> <comment> Contains a textual history of changes made to this frame. Each item is either a string or a note frame. </comment> </BinaryRelation> <Theory genus="Evidence"> <Object type="EXPERIMENT"/> <Object type="SEQUENCE-ANALYSIS"/> </Theory>
March 11, 200489
Gene Ontology - OML/CKML (2)<BinaryRelation type="EVIDENCE" srcType="Genes" tgtType="Evidence"> <comment> Describes evidence for the defined function of this object. Currently we distinguish between function that is determined experimentally, and function that is determined through computational sequence analysis. </comment> </BinaryRelation> <Function type="CENTISOME-POSITION" srcType="Genes" tgtType="data.Real"> <comment> This slot lists the map position of this gene on the chromosome in centisome units. </comment> </Function> <BinaryRelation type="CITATIONS" srcType="CKML#Object" tgtType="data.String"> <comment> This slot lists general citations pertaining to the object containing the slot. Each value of the slot is a citation of the form [reference-id]. </comment> </BinaryRelation> <BinaryRelation type="COMMENT" srcType="CKML#Object" tgtType="data.String"> <comment> The Comment slot stores a general comment about the object that contains the slot. </comment> </BinaryRelation> <Function type="COMMON-NAME" srcType="CKML#Object" tgtType="data.String"> <comment> The primary name by which an object is known to scientists -- a widely used and familiar name (in some cases arbitrary choices must be made). </comment> </Function> <Theory genus="Transcription-Direction"> <Object type="+"/> <Object type="-"/> </Theory> <Function type="TRANSCRIPTION-DIRECTION" srcType="Genes"tgtType="Transcription-Direction"> <comment> This slot specifies the direction along the chromosome in which this gene is transcribed; allowable values are + or -. </comment> </Function> <BinaryRelation type="PRODUCT" srcType="Genes" tgtType="Polypeptides"/> </BinaryRelation>
March 11, 200490
Gene Ontology - OML/CKML (3)<BinaryRelation type="SYNONYMS" srcType="CKML#Object" tgtType="data.String"> <comment> One or more secondary names for an object -- names that a scientist might attempt to use to retrieve the object. The Synonyms should include any name a user might use to try to retrieve an object. </comment> </BinaryRelation> <BinaryRelation type="PRODUCT-STRING" srcType="Genes" tgtType="data.String"> <comment> This slot holds a text string that describes the product of this gene; this slot is only used when EcoCyc does not describe the gene product as a frame (such as a polypeptide frame). </comment> </BinaryRelation> <Theory genus="Product-Types"> <Object type="ENZYME"/> <Object type="REGULATOR"/> <Object type="LEADER"/> <Object type="MEMBRANE"/> <Object type="TRANSPORT"/> <Object type="STRUCTURAL"/> <Object type="RNA"/> <Object type="PHENOTYPE"/> <Object type="FACTOR"/> <Object type="CARRIER"/> </Theory> <BinaryRelation type="PRODUCT-TYPES" srcType="Genes" tgtType="Product-Types"> <comment> Describes the type of the gene product, e.g., is it an enzyme, an RNA, etc. </comment> </BinaryRelation> <Function type="RIGHT-END-POSITION" srcType="Genes" tgtType="data.Real"/>
March 11, 200491
<Collection.Object> <Genes id="EG10707" text="pheA"> <LEFT-END-POSITION tgt="2735765"/> <CENTISOME-POSITION tgt="58.97035d0"/> <TRANSCRIPTION-DIRECTION tgt="+"/> <RIGHT-END-POSITION tgt="2736925"/> </Genes> </Collection.Object> <Collection.BinaryRelation> <EVIDENCE src="EG10707" tgt="EXPERIMENT"/> <NAMES src="EG10707" tgt="pheA"/> <NAMES src="EG10707" tgt="b2599"/> <PRODUCT src="EG10707" tgt="CHORISMUTPREPHENDEHYDRAT-MONOMER"/> <PRODUCT-STRING src="EG10707" tgt="chorismate mutase-P and prephenate dehydratase"/> </Collection.BinaryRelation>
Gene Ontology - OML/CKML (4)
March 11, 200492
Experiments - Results (Ecocyc)
• OML representation:– OML’s expressive capabilities captured most
aspects of gene ontology– some limitations in expressive capability; no
facets, cardinality or multiple collection types– terminology differences and definitions not
modular
• Ontolingua representation:– Ontolingua expressed all of gene ontology– Lisp syntax of Ontolingua not readily
approachable
March 11, 200493
Experiments - Results (GeneClinics)
• OML representation:– Expressive capabilities adequate to the job– OML/CKML is based on conceptual graphs and
may have more expressive capabilities in the long term
• Ontolingua representation:– Ontolingua based on frames semantics which
more closely aligns with relational and OO data models
– Lisp syntax not acceptable to larger community
• Both languages would benefit from life sciences examples
March 11, 200494
Conclusions and Recommendations
• The language most suitable for the exchange of life sciences ontologies should have the following key characteristics:– Frame-based representation
• Long history of work with frame-based representation model
• Mappings between this model and relational and/or OO data sources are easily expressed
– XML-based syntax• Critical for exchange among physically dispersed
community• New tools being developed in XML community• Lots of momentum in the web-based community
March 11, 200495
Current Efforts
• Developed specification for an XML-based exchange language – XOL (XML Ontology Language) based on Ontolingua (Karp/Chaudhri)
• Frame-based semantics for OML/CKML
• Developing process for submission of life sciences ontologies to the Bio-Ontologies Consortium
March 11, 200496
Other Ontology Efforts
• Gene Ontology Consortium (http://genome-www.stanford.edu/GO/)
• BioPathways Consortium (http://www.3rdmill.com/BioPathways)
• mmCIF (http://ndbserver.rutgers.edu/mmcif)
March 11, 200497
Bio-Ontologies Consortium - Future Work
• Content development• Elicit and review ontology submissions• Synergies with OMG• Provide public-domain ontologies to the
Life Sciences community and encourage use of those ontologies
• Bio-Ontologies 2000