Automatic Term Relationship Cleaning and Refinement for AGROVOC
* Asanee Kawtrakul, Aurawan Imsombut, Aree Thunkijjanukit, ** Dagobert Soergel,
*** Anita Liang, Margherita Sini, Gudrun Johannsen, and Johannes Keizer
* Department of Computer Engineering, Kasetsart University, Bangkok, Thailand
** College of Library and Information Services, University of Maryland, College Park
*** FAO, Library & Documentation Systems Division, Rome, Italy
6th AOS Workshop - EFITA/WCCA2005
Portugal, 26 Jul 2005
2April 12, 2023
Outline
Introduction
Structural Problems in AGROVOC
System Overview
Experimental Results
Conclusion and Future Work
Introduction
Ontologies can enhance the performance of information processing systems.
Ontologies have been constructed automatically from raw text, dictionaries, and thesauri.
We use AGROVOC as a resource to build an ontology in the domain of food and agriculture.
Many relationships in a thesaurus are incorrectly applied or defined too broadly.
Structural Problems in AGROVOC
Incorrectly assigned relationships
Vaguely defined relationships
Relationship   Examples                        Remark (More Appropriate Relationship)
USE/UF         1. Locomotion UF Walking        Incorrect relationship
               2. Digestive juices UF Chyme    Incorrect relationship
BT/NT          1. Milk NT Milk fat             Milk <containsSubstance> Milk fat
               2. Portugal BT Western Europe   Portugal <spatiallyIncludedIn> Western Europe
RT             1. Mutton RT Sheep              Mutton <madeFrom> Sheep
               2. Rice RT Rice flour           Rice <usedToMake> Rice flour
               3. FAO RT UN                    FAO <memberOf> UN
System Overview
[Figure: system architecture. AGROVOC is the input to three modules. Rules Acquisition Module: experts define explicit rules, and further rules are learned from annotated examples with the help of WordNet. Detection and Suggestion Module: applies expert-defined rules, noun phrase analysis (NP rules), WordNet alignment, and training (statistics-based) rules. Verification Module: verification of the results.]
Rules Acquisition Module
Expert-defined Rules
Learning-from-Examples Rules
Expert-defined Rules
Experts define rules by using data on concept types given in AGROVOC
Concept 1   Relation   Concept 2
17          BT         10
30926       BT         55
94          BT         5104
6124        BT         8364

Concept data:
Concept Code   Concept          Concept Type
17             Abies procera    TP
10             Abies            TP
55             Acarapis         TA
94             Acinetobacter    TB
8364           Western Europe   GG
6124           Portugal         GC
If X and Y are marked as "T*" in the concept type field, and X BT Y
Then taxonomic term X <subclassOf> taxonomic term Y
Ex. Abies procera BT Abies

If X and Y are marked as "G*" in the concept type field, and X BT Y
Then geographical entity X <spatiallyIncludedIn> geographical entity Y
Ex. Portugal BT Western Europe

Remark:
TP: Taxonomic term - Plant
GC: Geographic term - Country level
GG: Geographic term - above country level
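The two expert-defined rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the system's actual implementation: the `CONCEPTS` dictionary and the function name `refine_bt` are hypothetical, and the data is limited to concepts listed on this slide.

```python
from typing import Optional

# Concept code -> (label, concept type), copied from the concept data above.
CONCEPTS = {
    17: ("Abies procera", "TP"),
    10: ("Abies", "TP"),
    6124: ("Portugal", "GC"),
    8364: ("Western Europe", "GG"),
}


def refine_bt(x: int, y: int) -> Optional[str]:
    """Refine 'X BT Y' from the concept types of X and Y.

    Returns the refined relationship name, or None when neither
    expert-defined rule applies (hypothetical helper).
    """
    if x not in CONCEPTS or y not in CONCEPTS:
        return None
    type_x = CONCEPTS[x][1]
    type_y = CONCEPTS[y][1]
    # Rule for taxonomic terms: both concept types match "T*".
    if type_x.startswith("T") and type_y.startswith("T"):
        return "subclassOf"
    # Rule for geographic terms: both concept types match "G*".
    if type_x.startswith("G") and type_y.startswith("G"):
        return "spatiallyIncludedIn"
    return None


print(refine_bt(17, 10))      # Abies procera BT Abies
print(refine_bt(6124, 8364))  # Portugal BT Western Europe
```

A pair with mixed types (for example a taxonomic term and a geographic term) falls through to `None`, which leaves the relationship for the later techniques to handle.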
Learning-from-Examples Rules
Concept 1   Relation   Concept 2
5015        RT         7030
6599        RT         25500
2791        RT         8069
4826        RT         896
4826        NT         4828
25784       RT         4954

Concept data:
Concept Code   Concept
5015           Mutton
7030           Sheep
6599           Rice
25500          Rice flour
2791           FAO
8069           UN
4826           Milk
4828           Milk fat
25784          Engine Parts
4954           Engines
madeFrom / usedToMake
  Ex. Mutton <madeFrom> Sheep / Sheep <usedToMake> Mutton
usedToMake / madeFrom
  Ex. Rice <usedToMake> Rice flour / Rice flour <madeFrom> Rice
memberOf / hasMember
  Ex. FAO <memberOf> UN / UN <hasMember> FAO
containsSubstance / substanceContainedIn
  Ex. Milk <containsSubstance> Milk fat / Milk fat <substanceContainedIn> Milk
componentOf / hasComponent
  Ex. Engine part <componentOf> Engine / Engine <hasComponent> Engine part
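Each refined relationship above comes with an inverse, so one statement implies the other. A minimal sketch of that pairing, assuming a plain dictionary representation (the names `INVERSE` and `invert` are hypothetical):

```python
# Each refined relationship and its inverse, as listed above.
INVERSE = {
    "madeFrom": "usedToMake",
    "memberOf": "hasMember",
    "containsSubstance": "substanceContainedIn",
    "componentOf": "hasComponent",
}
# Make the mapping symmetric so lookups work in both directions.
INVERSE.update({inverse: name for name, inverse in list(INVERSE.items())})


def invert(triple):
    """Return the inverse statement of (term1, relationship, term2)."""
    term1, relationship, term2 = triple
    return (term2, INVERSE[relationship], term1)


print(invert(("Mutton", "madeFrom", "Sheep")))
print(invert(("FAO", "memberOf", "UN")))
```

Storing only one direction and deriving the other keeps the two statements from drifting apart when a relationship is revised.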
Learning-from-Examples Rules
Use WordNet and a decision-tree learning method to extract semantic relationship rules:
1. Manually tag term senses and specify the appropriate semantic relationship by using the annotation tool.
   Ex. Mutton#1 <madeFrom> Sheep#1
2. Extract the complete hypernym path of each term from WordNet.
   Ex. {mutton#1, meat#1, food#2, solid#1, substance#1, entity#1}
       {sheep#1, bovid#1, ruminant#1, mammal#1, vertebrate#1, animal#1, organism#1, living_thing#1, object#1, entity#1}
3. Apply C4.5 to learn the common ancestral concept for term1 and term2, and generate the rules.
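The path-extraction step can be illustrated with a small Python sketch. It assumes a hand-written `HYPERNYM` table holding only the links from the two example paths above; a real system would read these links from WordNet, and the C4.5 learning step itself is not shown. The names `hypernym_path` and `training_row` are hypothetical.

```python
# Toy hypernym links (sense -> direct hypernym), copied from the two
# example paths above. This stands in for a WordNet lookup.
HYPERNYM = {
    "mutton#1": "meat#1", "meat#1": "food#2", "food#2": "solid#1",
    "solid#1": "substance#1", "substance#1": "entity#1",
    "sheep#1": "bovid#1", "bovid#1": "ruminant#1",
    "ruminant#1": "mammal#1", "mammal#1": "vertebrate#1",
    "vertebrate#1": "animal#1", "animal#1": "organism#1",
    "organism#1": "living_thing#1", "living_thing#1": "object#1",
    "object#1": "entity#1",
}


def hypernym_path(sense):
    """Follow hypernym links from a tagged sense up to the root."""
    path = [sense]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def training_row(term1, term2, relationship):
    """Package one annotated example as input for a rule learner."""
    return {
        "path1": hypernym_path(term1),
        "path2": hypernym_path(term2),
        "label": relationship,
    }


row = training_row("mutton#1", "sheep#1", "madeFrom")
print(row["path1"])
print(row["path2"])
```

Each row pairs the two hypernym paths with the annotated relationship, which is the kind of input a decision-tree learner needs to find the ancestor classes that predict the label.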
Learning-from-Examples Rules
Examples of hierarchical data used for training the 'usedToMake' relationship, and the resulting training rule:
[Figure: WordNet hierarchies of the training terms, top to bottom. term1: entity#1, group#1; ...; animal#1; vertebrate#1, young#1; sheep#1, swine#1, calf#1. term2: entity#1; object#1, substance#1; food#2; meat#1; mutton#1, pork#1, veal#1.]
If class X is animal#1 and class Y is meat#1, and X RT Y
Then X <usedToMake> Y
Ex. Sheep RT Mutton, Swine RT Pork, Calf RT Veal
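Once learned, such a rule is a simple test on the ancestor classes of the two terms. The sketch below assumes abridged, hand-written ancestor sets based on the hierarchies on this slide; the `ANCESTORS` table and the function name `apply_used_to_make_rule` are hypothetical.

```python
# Abridged ancestor sets (an assumption; simplified from the
# hierarchies shown on this slide, not a full WordNet extract).
ANCESTORS = {
    "sheep#1": {"vertebrate#1", "animal#1", "entity#1"},
    "swine#1": {"vertebrate#1", "animal#1", "entity#1"},
    "calf#1": {"young#1", "animal#1", "entity#1"},
    "mutton#1": {"meat#1", "food#2", "substance#1", "entity#1"},
    "pork#1": {"meat#1", "food#2", "substance#1", "entity#1"},
    "veal#1": {"meat#1", "food#2", "substance#1", "entity#1"},
}


def apply_used_to_make_rule(x, y, relationship):
    """Return 'usedToMake' when the learned rule fires, else None."""
    if relationship != "RT":
        return None
    x_is_animal = "animal#1" in ANCESTORS.get(x, set())
    y_is_meat = "meat#1" in ANCESTORS.get(y, set())
    return "usedToMake" if x_is_animal and y_is_meat else None


for animal, meat in [("sheep#1", "mutton#1"), ("swine#1", "pork#1"), ("calf#1", "veal#1")]:
    print(animal, apply_used_to_make_rule(animal, meat, "RT"), meat)
```

The rule is directional: swapping the arguments (meat first, animal second) does not match, so the inverse statement would need its own rule or the inverse mapping.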
Detection and Suggestion Module
Noun Phrase Analysis
If the head word of a term has the same surface form as its broader term, the system applies the 'subclassOf' / 'superclassOf' relationship to them.
Rules of compound nouns (head word of terms):
  NP -> MOD + NCN
  MOD -> NCN (Common Noun), NPN (Proper Noun), ADJ (Adjective), ...
Ex.
  Milk BT Cow milk: same surface form, so Milk <superclassOf> Cow milk
  Milk BT Milk fat: different surface form, so other techniques are used to refine the relationship
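The head-word test can be sketched in a few lines of Python. This assumes that the head of an English compound noun is its last word (the NCN in MOD + NCN) and that the broader term is a single word; the function names are hypothetical, and plural forms or multi-word heads are not handled.

```python
def head_word(term: str) -> str:
    """Return the head of a compound noun, taken to be its last word."""
    return term.lower().split()[-1]


def headword_is_compatible(narrower: str, broader: str) -> bool:
    """True when the narrower term is the broader term plus modifiers."""
    narrower_l = narrower.lower()
    broader_l = broader.lower()
    return narrower_l != broader_l and head_word(narrower) == broader_l


print(headword_is_compatible("Cow milk", "Milk"))
print(headword_is_compatible("Milk fat", "Milk"))
```

"Cow milk" has the head "milk" and passes the test, while "Milk fat" has the head "fat", so the pair is left for the other techniques.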
WordNet Alignment
Use the hypernym/hyponym relationships of WordNet to align the BT/NT relationship in AGROVOC, and use the synset of a term in WordNet to align the UF/USE relationship in AGROVOC.
[Figure: WordNet data: Cabbage -Hypernym-> Cruciferous vegetable -Hypernym-> Vegetable. AGROVOC data before alignment: Cabbage -BT-> Vegetable. AGROVOC data after alignment: Cabbage -Hypernym-> Vegetable.]
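The BT/NT alignment amounts to checking whether the broader term lies somewhere on the hypernym path of the narrower term. A minimal sketch, assuming a toy `HYPERNYM` table that holds only the cabbage example from this slide in place of a real WordNet lookup (the function name is hypothetical):

```python
# Toy hypernym links (term -> direct hypernym) from the example above.
HYPERNYM = {
    "cabbage": "cruciferous vegetable",
    "cruciferous vegetable": "vegetable",
}


def is_on_hypernym_path(narrower: str, broader: str) -> bool:
    """True when 'broader' is reachable from 'narrower' via hypernym links."""
    target = broader.lower()
    current = HYPERNYM.get(narrower.lower())
    while current is not None:
        if current == target:
            return True
        current = HYPERNYM.get(current)
    return False


print(is_on_hypernym_path("Cabbage", "Vegetable"))
print(is_on_hypernym_path("Vegetable", "Cabbage"))
```

The check accepts indirect hypernyms, which matters here because AGROVOC links Cabbage directly to Vegetable while WordNet has an intermediate node.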
Detection and Suggestion Module
An Algorithm for Data Cleaning and Relationship Refinement

AGROVOC_Cleaning_&_Refinement (T1, T2, Rel) ; returns new_relationship
Input: Term1, Term2, Relationship
Output: New Relationship
1. If (Rel = BT or Rel = NT)
   Then If Agree_Expert_defined_Rules (T1, T2, Rel)
           Then return new_refined_relationship. ; following the rules
        Else If Headword-Is-Compatible (T1, T2)
           Then return subclass/superclass relationship.
        Else If Is_Wordnet_HypernymPath (T1, T2)
           Then return subclass/superclass relationship.
        Else If Agree_Revision_Rules (T1, T2, Rel)
           Then return new_relationship. ; following the rules
        Else return U. ; Un-refined
2. Else If (Rel = UF or Rel = USE)
   Then If Is_Wordnet_Synset (T1, T2)
           Then return synonym relationship.
        Else If Agree_Revision_Rules (T1, T2, Rel)
           Then return new_relationship. ; following the rules
        Else return U. ; Un-refined
3. Else If (Rel = RT)
   Then If Agree_Revision_Rules (T1, T2, Rel)
           Then return new_relationship. ; following the rules
        Else return U. ; Un-refined
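The algorithm is a cascade: for each relationship type, a fixed sequence of checks is tried and the first one that succeeds decides the result. The Python sketch below shows that control flow only. The five checks are passed in as functions that return a relationship name or `None`; the stub checks in the usage part are hypothetical placeholders, and returning "U" for an unknown relationship type is an assumption that the slide does not specify.

```python
from typing import Callable, Optional

Check = Callable[[str, str, str], Optional[str]]
UNREFINED = "U"


def make_refiner(expert: Check, headword: Check, hypernym: Check,
                 synset: Check, revision: Check):
    """Build a refine(t1, t2, rel) function from the five checks."""
    hierarchical = [expert, headword, hypernym, revision]
    equivalence = [synset, revision]
    cascade = {
        "BT": hierarchical, "NT": hierarchical,
        "UF": equivalence, "USE": equivalence,
        "RT": [revision],
    }

    def refine(t1: str, t2: str, rel: str) -> str:
        # Try each check in order; the first non-None answer wins.
        for check in cascade.get(rel, []):
            result = check(t1, t2, rel)
            if result is not None:
                return result
        return UNREFINED

    return refine


# Hypothetical stub checks, for illustration only.
def no_match(t1, t2, rel):
    return None


def headword_stub(t1, t2, rel):
    same_head = t1.lower().split()[-1] == t2.lower()
    return "subclassOf" if same_head and t1.lower() != t2.lower() else None


def revision_stub(t1, t2, rel):
    return "madeFrom" if (t1, t2, rel) == ("Mutton", "Sheep", "RT") else None


refine = make_refiner(no_match, headword_stub, no_match, no_match, revision_stub)
print(refine("Cow milk", "Milk", "BT"))
print(refine("Mutton", "Sheep", "RT"))
print(refine("Locomotion", "Walking", "UF"))
```

Passing the checks in as functions keeps the ordering logic separate from the individual techniques, so each can be replaced or tested on its own.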
Detection and Suggestion Module
[Figure: techniques applied to AGROVOC for each relationship type. BT/NT: expert-defined rules, noun phrase analysis (NP rules), WordNet alignment, and training (statistics-based) rules. USE/UF: WordNet alignment and training (statistics-based) rules. RT: training (statistics-based) rules.]
Verification Module
Allows the expert to verify the results of the semantic relationship refinement.
Experimental Results
                                     Expert-defined rules   NP Analysis     WordNet Alignment   Training Rules
Relationship   No.     No. of        No.     PC(%)          No.    PC(%)    No.    PC(%)        No.    PC(%)
                       refinement
BT/NT          32176   21072         16587   100%           2062   95%      2423   95%          **     **
USE/UF         21605   3553          -       -              -      -        3553   70%          **     **
RT             27589   1420          -       -              -      -        -      -            798*   72%*
Total          81370   26045         16587   100%           2062   95%      5976   80%          798*   72%*

Remarks:
-  indicates this technique cannot revise this relationship
*  indicates the experiment is run with some data
** indicates the experiment is in initial state
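The Total row can be cross-checked against the per-relationship rows with a short calculation. The sketch below assumes that the 80% figure for WordNet alignment is a count-weighted mean of the per-type PC values; this reading is an assumption, since the slide does not say how the total was computed.

```python
# (relationship, total, refined) copied from the results table.
COUNTS = [("BT/NT", 32176, 21072), ("USE/UF", 21605, 3553), ("RT", 27589, 1420)]
# BT/NT refinements per technique: expert rules, NP analysis, WordNet.
BT_NT_BY_TECHNIQUE = [16587, 2062, 2423]
# WordNet alignment: (number refined, PC in percent) for BT/NT and USE/UF.
WORDNET = [(2423, 95), (3553, 70)]

total_relationships = sum(total for _, total, _ in COUNTS)
total_refined = sum(refined for _, _, refined in COUNTS)
bt_nt_refined = sum(BT_NT_BY_TECHNIQUE)

wordnet_count = sum(n for n, _ in WORDNET)
# Count-weighted mean of the per-type percentages (assumed reading).
wordnet_pc = sum(n * pc for n, pc in WORDNET) / wordnet_count

print(total_relationships, total_refined, bt_nt_refined)
print(wordnet_count, round(wordnet_pc))
```

Under that assumption the computed values agree with the Total row: the counts sum to 81370 and 26045, and the weighted WordNet figure rounds to 80.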
Some synonym relationships should be replaced with abbreviation_of.
Ex. AMP <synonym> Adenosine monophosphate
Ambiguity in the concept classes produced by the training system:
Ex. If class X is food#1 and class Y is food#1, and X RT Y, then X <usedToMake> Y
- pork RT hams
Conclusion and Future Work
A platform for semi-automatic cleaning and refinement of relationships has been developed, based on multiple techniques:
- Noun phrase analysis
- WordNet alignment
- Semantic relationship rules (expert-defined and learning-from-examples)
Wishes: Collaborative training of the system is needed in order to extract rules automatically.
Thank you for your attention.