1
From thesauri to rich ontologies:
The AGROVOC case
Boris Lauser
Food and Agriculture Organization (FAO)
Rome, Italy
[email protected], www.fao.org
DELOS Workshop
Lund, Sweden, June 23, 2004
2
The problem
• AI and Semantic Web applications need full-fledged ontologies that support reasoning
• Constructing such ontologies is expensive
• Existing KOS (Knowledge Organization Systems), both large and small, do not provide the full set of precise concept relationships needed for reasoning, but they represent much intellectual capital
• How can this intellectual capital be put to use in constructing full-fledged ontologies?
• Specifically: From AGROVOC to a full-fledged Food and Agriculture Ontology
3
Some applications of a Food and Agriculture Ontology
• Advice on crops and crop management (fertilization, irrigation)
• Advice on pest management
• Tracking contaminants through the food chain
• Advice on safe food processing
• Computing nutrition labels
• Advice on healthy eating
• Improved searching
4
AGROVOC relationships compared with more differentiated relationships
of a Food and Agriculture Ontology
5
AGROVOC: Undifferentiated hierarchical relationships
milk
  NT cow milk
  NT milk fat
cows
  NT cow milk
Cheddar cheese
  BT cow milk
Food and Agriculture Ontology: Differentiated relationships
milk
  <includesSpecific> cow milk
  <containsSubstance> milk fat
cows
  <hasComponent> cow milk*
Cheddar cheese
  <madeFrom> cow milk
Rule 1
Part X <mayContainSubstance> Substance Y
IF Animal W <hasComponent> Part X AND Animal W <ingests> Substance Y
Rule 2
Food Z <containsSubstance> Substance Y
IF Food Z <madeFrom> Part X AND Part X <containsSubstance> Substance Y
6
From AGROVOC to FA Ontology
1) Define the FA Ontology structure
2) Fill in values from AGROVOC to the extent possible
3) Edit manually with computer assistance, using the rules-as-you-go approach and an ontology editor:
• make existing information more precise
• add new information
7
Define ontology structure
Overall model
8
[Figure: overall model of the ontology structure. Recoverable labels:]
Concept
Relationships between concepts
designated by
Lexicalization/Term
Relationships between terms
manifested as
String
Relationships between strings
Other information: language/culture, subvocabulary/scope, audience, type, etc.
Note
annotation relationship
Relationship
Relationships between relationships
9
Define ontology structure
Relationship types
10
Isa
Relationship                  Inverse relationship
X <includesSpecific> Y        Y <isa> X
X <inheritsTo> Y              Y <inheritsFrom> X
11
Holonymy / meronymy (the generic whole-part relationship)
Relationship                  Inverse relationship
X <containsSubstance> Y       Y <substanceContainedIn> X
X <hasIngredient> Y           Y <ingredientOf> X
X <madeFrom> Y                Y <usedToMake> X
X <yieldsPortion> Y           Y <portionOf> X
X <spatiallyIncludes> Y       Y <spatiallyIncludedIn> X
X <hasComponent> Y            Y <componentOf> X
X <includesSubprocess> Y      Y <subprocessOf> X
X <hasMember> Y               Y <memberOf> X
12
Further relationship examples
Relationship                  Inverse relationship
X <causes> Y                  Y <causedBy> X
X <instrumentFor> Y           Y <performedByInstrument> X
X <processFor> Y              Y <usesProcess> X
X <beneficialFor> Y           Y <benefitsFrom> X
X <treatmentFor> Y            Y <treatedWith> X
X <harmfulFor> Y              Y <harmedBy> X
X <hasPest> Y                 Y <afflicts> X
X <growsIn> Y                 Y <growthEnvironmentFor> X
X <hasProperty> Y             Y <propertyOf> X
X <hasSymptom> Y              Y <indicates> X
X <similarTo> Y               Y <similarTo> X
X <oppositeTo> Y              Y <oppositeTo> X
X <hasPhase> Y                Y <phaseOf> X
X <ingests> Y                 Y <ingestedBy> X
X <madeFrom> Y                Y <usedToMake> X
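Because every relationship type is listed with its inverse, only one direction has to be stored; the other can be generated. The sketch below shows one possible way to do this. The names `INVERSE_PAIRS`, `INVERSE` and `invert` are illustrative assumptions, and only a handful of the pairs listed above are included.

```python
# A few (relationship, inverse relationship) pairs taken from the tables above.
INVERSE_PAIRS = [
    ("includesSpecific", "isa"),
    ("containsSubstance", "substanceContainedIn"),
    ("madeFrom", "usedToMake"),
    ("hasComponent", "componentOf"),
    ("hasPest", "afflicts"),
    ("similarTo", "similarTo"),   # symmetric: the relationship is its own inverse
]

# Build a lookup that works in both directions.
INVERSE = {}
for forward, backward in INVERSE_PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def invert(triple):
    """Return the inverse statement of a (subject, relationship, object) triple."""
    subject, relationship, obj = triple
    return (obj, INVERSE[relationship], subject)


print(invert(("Cheddar cheese", "madeFrom", "cow milk")))
# prints ('cow milk', 'usedToMake', 'Cheddar cheese')
```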
13
Fill in values from AGROVOC
• Fill in values from AGROVOC to the extent possible
• Arrange in a structured sequence (to the extent possible based on the information in AGROVOC) to facilitate editing. (The editor can deal with similar problems at the same time.)
14
Undifferentiated relationships from AGROVOC → Edited relationships (none yet)
milk NT cow milk
milk NT goat milk
milk NT buffalo milk
milk NT milk fat
milk RT milk protein
milk RT lactose
cows RT cow milk
goats RT goat milk
ewes RT ewe milk
goat milk RT goat cheese
ewe milk RT ewe cheese
acid soils BT chemical soil types
acrisols BT genetic soil types
alkaline soils BT chemical soil types
alluvial soils BT lithological soil types
chemical soil types BT soil types
Cichorium BT Asteraceae
Cichorium endivia BT Cichorium
Cichorium intybus BT Cichorium
Cichorium intybus RT coffee substitutes
Cichorium intybus RT root vegetables
blood NT blood protein
blood NT blood lipids
15
Edit manually with computer assistance
• Use the rules-as-you-go approach and good ontology editing software that handles large ontologies efficiently
• make existing information more precise
• add new information
Assumption:
Entity types of concepts are known from AGROVOC or other sources (Langual, UMLS, WordNet); for example
milk fat is a Substance
Asteraceae is a taxon
The editor may need to determine the entity type
16
The rules-as-you-go approach
Exploit patterns to automate the conversion process
Example
1. An editor has determined that milk NT cow milk should become milk <includesSpecific> cow milk
2. She recognizes that this is an example of the general pattern milk NT * milk → milk <includesSpecific> * milk (where * is the wildcard character)
3. Given this pattern, the system can derive automatically that milk NT goat milk should become milk <includesSpecific> goat milk
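The wildcard step can be sketched with regular expressions. This is only an illustration under stated assumptions: relationships are held as plain strings, `*` stands for one or more characters, and the helper names `compile_pattern` and `apply_pattern` are hypothetical rather than part of any AGROVOC tool.

```python
import re


def compile_pattern(source_pattern):
    """Turn 'milk NT * milk' into a regex in which each * captures some text."""
    parts = [re.escape(part) for part in source_pattern.split("*")]
    return re.compile("^" + "(.+)".join(parts) + "$")


def apply_pattern(source_pattern, target_pattern, relations):
    """Return {thesaurus relationship: ontology relationship} for every match."""
    regex = compile_pattern(source_pattern)
    converted = {}
    for relation in relations:
        match = regex.match(relation)
        if match:
            result = target_pattern
            # Fill the wildcards of the target pattern from left to right.
            for captured in match.groups():
                result = result.replace("*", captured, 1)
            converted[relation] = result
    return converted


relations = [
    "milk NT cow milk",
    "milk NT goat milk",
    "milk NT buffalo milk",
    "milk NT milk fat",
    "milk RT milk protein",
]
converted = apply_pattern("milk NT * milk", "milk <includesSpecific> * milk", relations)
for source, target in converted.items():
    print(source, "->", target)
```

Only the three relationships whose narrower term ends in "milk" are converted; milk NT milk fat and milk RT milk protein do not match and are left for a later pattern.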
Result:
17
Undifferentiated relationships from AGROVOC → Edited relationships
milk NT cow milk → milk <includesSpecific> cow milk
milk NT goat milk → milk <includesSpecific> goat milk
milk NT buffalo milk → milk <includesSpecific> buffalo milk
milk NT milk fat
milk RT milk protein
milk RT lactose
cows RT cow milk
goats RT goat milk
ewes RT ewe milk
goat milk RT goat cheese
ewe milk RT ewe cheese
acid soils BT chemical soil types
acrisols BT genetic soil types
alkaline soils BT chemical soil types
alluvial soils BT lithological soil types
chemical soil types BT soil types
Cichorium BT Asteraceae
Cichorium endivia BT Cichorium
Cichorium intybus BT Cichorium
Cichorium intybus RT coffee substitutes
Cichorium intybus RT root vegetables
blood NT blood protein
blood NT blood lipids
18
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor: milk NT milk fat → milk <containsSubstance> milk fat
2. Pattern: Substance NT/RT Substance → Substance <containsSubstance> Substance
3. Therefore: milk RT milk protein → milk <containsSubstance> milk protein
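Patterns over entity types can be sketched in the same spirit. The example assumes that entity types are available as a simple lookup table and that the relationships matching earlier patterns have already been removed from the input. The names `ENTITY_TYPE` and `convert` are hypothetical, and the entity types given for cows and cow milk are assumptions for illustration.

```python
# Assumed entity types (in practice taken from AGROVOC or other sources).
ENTITY_TYPE = {
    "milk": "Substance",
    "milk fat": "Substance",
    "milk protein": "Substance",
    "lactose": "Substance",
    "cows": "Animal",
    "cow milk": "BodyPart",
}

# Relationships still unconverted, as (term, thesaurus relationship, term).
REMAINING = [
    ("milk", "NT", "milk fat"),
    ("milk", "RT", "milk protein"),
    ("milk", "RT", "lactose"),
    ("cows", "RT", "cow milk"),
]


def convert(relations, source_type, thesaurus_rels, target_type, ontology_rel):
    """Apply one typed pattern and return the derived ontology triples."""
    derived = []
    for subject, rel, obj in relations:
        if (rel in thesaurus_rels
                and ENTITY_TYPE.get(subject) == source_type
                and ENTITY_TYPE.get(obj) == target_type):
            derived.append((subject, ontology_rel, obj))
    return derived


# Pattern: Substance NT/RT Substance -> Substance <containsSubstance> Substance
derived = convert(REMAINING, "Substance", {"NT", "RT"}, "Substance", "containsSubstance")
for triple in derived:
    print(triple)
```

The relationship cows RT cow milk is not touched, because its entity types do not fit the pattern.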
Result:
19
Undifferentiated relationships from AGROVOC → Edited relationships
milk NT cow milk → milk <includesSpecific> cow milk
milk NT goat milk → milk <includesSpecific> goat milk
milk NT buffalo milk → milk <includesSpecific> buffalo milk
milk NT milk fat → milk <containsSubstance> milk fat
milk RT milk protein → milk <containsSubstance> milk protein
milk RT lactose → milk <containsSubstance> lactose
cows RT cow milk
goats RT goat milk
ewes RT ewe milk
goat milk RT goat cheese → goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese → ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types
acrisols BT genetic soil types
alkaline soils BT chemical soil types
alluvial soils BT lithological soil types
chemical soil types BT soil types
Cichorium BT Asteraceae
Cichorium endivia BT Cichorium
Cichorium intybus BT Cichorium
Cichorium intybus RT coffee substitutes
Cichorium intybus RT root vegetables
blood NT blood protein → blood <containsSubstance> blood protein
blood NT blood lipids → blood <containsSubstance> blood lipids
20
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor: cows RT cow milk → cows <hasComponent> cow milk
2. Pattern: Animal RT BodyPart → Animal <hasComponent> BodyPart
3. Therefore: goats RT goat milk → goats <hasComponent> goat milk
Result:
21
Undifferentiated relationships from AGROVOC → Edited relationships
milk NT cow milk → milk <includesSpecific> cow milk
milk NT goat milk → milk <includesSpecific> goat milk
milk NT buffalo milk → milk <includesSpecific> buffalo milk
milk NT milk fat → milk <containsSubstance> milk fat
milk RT milk protein → milk <containsSubstance> milk protein
milk RT lactose → milk <containsSubstance> lactose
cows RT cow milk → cows <hasComponent> cow milk
goats RT goat milk → goats <hasComponent> goat milk
ewes RT ewe milk → ewes <hasComponent> ewe milk
goat milk RT goat cheese → goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese → ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types
acrisols BT genetic soil types
alkaline soils BT chemical soil types
alluvial soils BT lithological soil types
chemical soil types BT soil types
Cichorium BT Asteraceae
Cichorium endivia BT Cichorium
Cichorium intybus BT Cichorium
Cichorium intybus RT coffee substitutes
Cichorium intybus RT root vegetables
blood NT blood protein → blood <containsSubstance> blood protein
blood NT blood lipids → blood <containsSubstance> blood lipids
22
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor: acid soils BT chemical soil types → acid soils <isa> chemical soil types
2. Pattern: X BT * type* → X <isa> * type*
3. Therefore: acrisols BT genetic soil types → acrisols <isa> genetic soil types
Result:
23
Undifferentiated relationships from AGROVOC → Edited relationships
milk NT cow milk → milk <includesSpecific> cow milk
milk NT goat milk → milk <includesSpecific> goat milk
milk NT buffalo milk → milk <includesSpecific> buffalo milk
milk NT milk fat → milk <containsSubstance> milk fat
milk RT milk protein → milk <containsSubstance> milk protein
milk RT lactose → milk <containsSubstance> lactose
cows RT cow milk → cows <hasComponent> cow milk
goats RT goat milk → goats <hasComponent> goat milk
ewes RT ewe milk → ewes <hasComponent> ewe milk
goat milk RT goat cheese → goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese → ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types → acid soils <isa> chemical soil types
acrisols BT genetic soil types → acrisols <isa> genetic soil types
alkaline soils BT chemical soil types → alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types → alluvial soils <isa> lithological soil types
chemical soil types BT soil types → chemical soil types <isa> soil types
Cichorium BT Asteraceae
Cichorium endivia BT Cichorium
Cichorium intybus BT Cichorium
Cichorium intybus RT coffee substitutes
Cichorium intybus RT root vegetables
blood NT blood protein → blood <containsSubstance> blood protein
blood NT blood lipids → blood <containsSubstance> blood lipids
24
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor: Cichorium BT Asteraceae → Cichorium <isa> Asteraceae
2. Pattern: Taxon BT Taxon → Taxon <isa> Taxon
3. Therefore: Cichorium endivia BT Cichorium → Cichorium endivia <isa> Cichorium
Result:
25
Undifferentiated relationships from AGROVOC → Edited relationships
milk NT cow milk → milk <includesSpecific> cow milk
milk NT goat milk → milk <includesSpecific> goat milk
milk NT buffalo milk → milk <includesSpecific> buffalo milk
milk NT milk fat → milk <containsSubstance> milk fat
milk RT milk protein → milk <containsSubstance> milk protein
milk RT lactose → milk <containsSubstance> lactose
cows RT cow milk → cows <hasComponent> cow milk
goats RT goat milk → goats <hasComponent> goat milk
ewes RT ewe milk → ewes <hasComponent> ewe milk
goat milk RT goat cheese → goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese → ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types → acid soils <isa> chemical soil types
acrisols BT genetic soil types → acrisols <isa> genetic soil types
alkaline soils BT chemical soil types → alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types → alluvial soils <isa> lithological soil types
chemical soil types BT soil types → chemical soil types <isa> soil types
Cichorium BT Asteraceae → Cichorium <isa> Asteraceae
Cichorium endivia BT Cichorium → Cichorium endivia <isa> Cichorium
Cichorium intybus BT Cichorium → Cichorium intybus <isa> Cichorium
Cichorium intybus RT coffee substitutes
Cichorium intybus RT root vegetables
blood NT blood protein → blood <containsSubstance> blood protein
blood NT blood lipids → blood <containsSubstance> blood lipids
26
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor: Cichorium intybus RT coffee substitutes → Cichorium intybus <usedToMake> coffee substitutes
2. Pattern: Taxon RT FoodProduct → Taxon <usedToMake> FoodProduct
3. Therefore: Cichorium intybus RT root vegetables → Cichorium intybus <usedToMake> root vegetables
Result:
27
Undifferentiated relationships from AGROVOC → Edited relationships
milk NT cow milk → milk <includesSpecific> cow milk
milk NT goat milk → milk <includesSpecific> goat milk
milk NT buffalo milk → milk <includesSpecific> buffalo milk
milk NT milk fat → milk <containsSubstance> milk fat
milk RT milk protein → milk <containsSubstance> milk protein
milk RT lactose → milk <containsSubstance> lactose
cows RT cow milk → cows <hasComponent> cow milk
goats RT goat milk → goats <hasComponent> goat milk
ewes RT ewe milk → ewes <hasComponent> ewe milk
goat milk RT goat cheese → goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese → ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types → acid soils <isa> chemical soil types
acrisols BT genetic soil types → acrisols <isa> genetic soil types
alkaline soils BT chemical soil types → alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types → alluvial soils <isa> lithological soil types
chemical soil types BT soil types → chemical soil types <isa> soil types
Cichorium BT Asteraceae → Cichorium <isa> Asteraceae
Cichorium endivia BT Cichorium → Cichorium endivia <isa> Cichorium
Cichorium intybus BT Cichorium → Cichorium intybus <isa> Cichorium
Cichorium intybus RT coffee substitutes → Cichorium intybus <usedToMake> coffee substitutes
Cichorium intybus RT root vegetables → Cichorium intybus <usedToMake> root vegetables
blood NT blood protein → blood <containsSubstance> blood protein
blood NT blood lipids → blood <containsSubstance> blood lipids
28
The rules-as-you-go approach
Discussion
Main idea: Formulate constraints to assist the editor
• Ontology may have many relationship types, perhaps > 100
• Constraints limit the relationship types that are possible in a specific case; show the editor only these
• If the constraints limit possible relationship types to 1, conversion is automatic
• Constraints may depend on the thesaurus to be converted
29
Constraints
Thesaurus relationships       Possible ontology relationships
NT / BT                       <hasMember> | <memberOf>
                              <includesSpecific> | <isa>
                              <hasComponent> | <componentOf>
                              <spatiallyIncludes> | <spatiallyIncludedIn>
                              etc.
RT                            <similarTo> | <similarTo>
                              <growsIn> | <growthEnvironmentFor>
                              <treatmentFor> | <treatedWith>
                              <hasMember> | <memberOf>
                              <hasComponent> | <componentOf>
                              <madeFrom> | <usedToMake>
                              etc.
30
Constraints
Thesaurus relationships + entity types or values → Possible ontology relationships
milk NT * milk → milk <includesSpecific> * milk
Substance NT Substance → Substance <containsSubstance> Substance
X BT * type* → X <isa> * type*
Taxon BT Taxon → Taxon <isa> Taxon
GeogrEntity BT GeogrEntity → GeogrEntity <spatiallyIncludedIn> GeogrEntity
BodyPart BT BodyPart → BodyPart <componentOf> BodyPart
ChemSubstance BT ChemSubstance → ChemSubstance <isa> ChemSubstance
31
Constraints
Thesaurus relationships + entity types or values → Possible ontology relationships
Substance RT Substance
  → Substance <containsSubstance> Substance
  → Substance <substanceContainedIn> Substance
  → Substance <usedToMake> Substance
  → Substance <madeFrom> Substance
LivingOrganism RT BodyPart
  → LivingOrganism <hasComponent> BodyPart
Taxon RT FoodProduct
  → Taxon <usedToMake> FoodProduct
GeogrEntity RT GeogrGrouping
  → GeogrEntity <memberOf> GeogrGrouping
Process RT Object
  → Process <performedByInstrument> Object
  → Process <affects> Object
ChemSubstance RT Function
  → ChemSubstance <usedFor> Function
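The constraint tables can be read as a lookup from (entity type, thesaurus relationship, entity type) to the ontology relationship types that remain possible. The sketch below illustrates that reading. The names `CONSTRAINTS` and `propose` and the three outcome labels are assumptions for illustration, and the relationship names follow the relationship-type tables earlier in the slides.

```python
# A few constraints transcribed from the tables above.
CONSTRAINTS = {
    ("Substance", "RT", "Substance"): [
        "containsSubstance", "substanceContainedIn", "usedToMake", "madeFrom",
    ],
    ("LivingOrganism", "RT", "BodyPart"): ["hasComponent"],
    ("Taxon", "RT", "FoodProduct"): ["usedToMake"],
    ("Taxon", "BT", "Taxon"): ["isa"],
    ("Process", "RT", "Object"): ["performedByInstrument", "affects"],
}


def propose(subject_type, thesaurus_rel, object_type):
    """Return how a thesaurus relationship would be handled, with its candidates."""
    candidates = CONSTRAINTS.get((subject_type, thesaurus_rel, object_type), [])
    if len(candidates) == 1:
        return ("automatic", candidates)    # exactly one option: convert directly
    if candidates:
        return ("menu", candidates)         # show the editor only these options
    return ("unconstrained", [])            # no constraint known yet


print(propose("Taxon", "BT", "Taxon"))
print(propose("Substance", "RT", "Substance"))
```

With a constraint table of this kind, the editor sees a short menu instead of the full list of relationship types, and single-candidate cases need no manual selection.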
32
Checking by editor
• Relationship instances created by the editor by selecting from a constraint-generated menu are final
• Relationship instances created automatically must be presented to the editor
• If the editor determines that the automatically created relationship instances are almost always correct, she checks the box "accept without checking"
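One way to picture this checking policy is a small review filter, sketched below. It assumes that "accept without checking" is recorded per pattern, which the slides do not state explicitly. The class `RelationshipInstance`, the function `needs_review`, and the pattern strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RelationshipInstance:
    triple: Tuple[str, str, str]
    pattern: Optional[str] = None   # None: chosen by the editor from a menu


def needs_review(instance, accepted_patterns):
    """Decide whether an instance must still be presented to the editor."""
    if instance.pattern is None:
        return False    # menu selections made by the editor are final
    # Automatic instances are shown unless their pattern is accepted unchecked.
    return instance.pattern not in accepted_patterns


instances = [
    RelationshipInstance(("milk", "containsSubstance", "milk fat")),
    RelationshipInstance(("milk", "includesSpecific", "goat milk"),
                         pattern="milk NT * milk"),
    RelationshipInstance(("goat milk", "containsSubstance", "goat cheese"),
                         pattern="Substance NT/RT Substance"),
]
accepted = {"milk NT * milk"}   # patterns with "accept without checking" ticked
to_review = [i.triple for i in instances if needs_review(i, accepted)]
print(to_review)
```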
33
Overall conversion process
• One master editor must go through the file from start to finish, processing the relationship instances, creating patterns, and creating new relationship types as needed
• Assistant editors can apply the patterns.
• In the first pass, the master editor should deal with the easy cases.
• Deal with the remaining cases later. Groups of similar relationship instances can be seen more easily in a smaller set.
34
Adding new relationship types and new relationship instances
• AGROVOC does not contain all relationship types or relationship instances for AI applications
• Need to add data. For example:
Organism X <hasPest> Organism Y
ChemSubstance X <actsAgainst> Organism Y
Organism X <actsAgainst> Organism Y
Plant X <growsIn> Environment Y
FoodProduct X <suitableFor> Diet Y
35
Conclusion
The rules-as-you-go approach is a realistic method for developing a rich ontology from an existing thesaurus
Full paper: Reengineering Thesauri for New Applications: the AGROVOC Example
Journal of Digital Information, Volume 4 Issue 4
http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/
36
References
• For questions and discussion contact
Boris [email protected]
Dagobert [email protected]
• AOS: Agricultural Ontology Service Project, http://www.fao.org/agris/aos
• AGMES: http://www.fao.org/agris/agmes