Automatic Term Relationship Cleaning and Refinement for AGROVOC

* Asanee Kawtrakul, Aurawan Imsombut, Aree Thunkijjanukit, ** Dagobert Soergel,
*** Anita Liang, Margherita Sini, Gudrun Johannsen, and Johannes Keizer

* Department of Computer Engineering, Kasetsart University, Bangkok, Thailand
** College of Library and Information Services, University of Maryland, College Park
*** FAO, Library & Documentation Systems Division, Rome, Italy

6th AOS Workshop - EFITA/WCCA2005
Portugal, 26 Jul 2005

Outline

Introduction

Structural Problems in AGROVOC

System Overview

Experimental Results

Conclusion and Future Work

Introduction

Ontologies can enhance the performance of information processing systems.

Ontologies have been constructed automatically from raw text, dictionaries, and thesauri.

We use AGROVOC as a resource to build an ontology in the domain of food and agriculture.

Many relationships in a thesaurus are incorrectly applied or defined too broadly.

Structural Problems in AGROVOC

Incorrectly assigned relationships

Vaguely defined relationships

Relationship | Example | Remark (more appropriate relationship)
USE/UF | Locomotion UF Walking | Incorrect relationship
USE/UF | Digestive juices UF Chyme | Incorrect relationship
BT/NT | Milk NT Milk fat | Milk <containsSubstance> Milk fat
BT/NT | Portugal BT Western Europe | Portugal <spatiallyIncludedIn> Western Europe
RT | Mutton RT Sheep | Mutton <madeFrom> Sheep
RT | Rice RT Rice flour | Rice <usedToMake> Rice flour
RT | FAO RT UN | FAO <memberOf> UN

System Overview

Figure: system overview with three modules. The Rules Acquisition Module obtains rules from experts (explicit, expert-defined rules) and by learning from annotated examples using WordNet (training statistics-based rules). The Detection and Suggestion Module applies expert-defined rules, noun phrase analysis (NP rules), WordNet alignment, and training statistics-based rules to AGROVOC. The Verification Module lets experts verify the results.

Rules Acquisition Module

Expert-defined Rules

Learning-from-Examples Rules

Expert-defined Rules

Experts define rules using the concept type data given in AGROVOC.

Concept 1 | Relation | Concept 2
17 | BT | 10
30926 | BT | 55
94 | BT | 5104
6124 | BT | 8364

Concept Code | Concept | Concept Type Data
17 | Abies procera | TP
10 | Abies | TP
55 | Acarapis | TA
94 | Acinetobacter | TB
8364 | Western Europe | GG
6124 | Portugal | GC

If X and Y are marked as “T*” in the concept type field, and X BT Y
Then taxonomic term X <subclassOf> taxonomic term Y
Ex. Abies procera BT Abies

Remark: TP: Taxonomic term-Plant; GC: Geographic term-Country level; GG: Geographic term-above country level

If X and Y are marked as “G*” in the concept type field, and X BT Y
Then geographical entity X <spatiallyIncludedIn> geographical entity Y
Ex. Portugal BT Western Europe
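The two expert-defined rules can be sketched as a lookup over the concept type data. This is a minimal illustration, not the authors' implementation: the names `CONCEPTS`, `RELATIONS`, and `apply_expert_rules` are hypothetical, and “T*”/“G*” are read as "the type code starts with T/G".

```python
from typing import Optional, Tuple

# Concept code -> (term, concept type), copied from the concept table.
CONCEPTS = {
    17: ("Abies procera", "TP"),
    10: ("Abies", "TP"),
    55: ("Acarapis", "TA"),
    94: ("Acinetobacter", "TB"),
    8364: ("Western Europe", "GG"),
    6124: ("Portugal", "GC"),
}

# (concept 1, relation, concept 2) rows, copied from the relationship table.
RELATIONS = [(17, "BT", 10), (30926, "BT", 55), (94, "BT", 5104), (6124, "BT", 8364)]


def apply_expert_rules(c1: int, rel: str, c2: int) -> Optional[Tuple[str, str, str]]:
    """Return a refined (term1, relation, term2) triple, or None if no rule fires."""
    # Both concepts need type data; rows such as 30926 BT 55 stay unrefined.
    if rel != "BT" or c1 not in CONCEPTS or c2 not in CONCEPTS:
        return None
    (term1, type1), (term2, type2) = CONCEPTS[c1], CONCEPTS[c2]
    if type1.startswith("T") and type2.startswith("T"):
        return (term1, "subclassOf", term2)  # taxonomic rule
    if type1.startswith("G") and type2.startswith("G"):
        return (term1, "spatiallyIncludedIn", term2)  # geographic rule
    return None


for row in RELATIONS:
    print(row, "->", apply_expert_rules(*row))
```

Only the first and last rows are refined here; the other two involve concept codes (30926, 5104) whose type data is not shown in the table, so the sketch leaves them alone.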

Learning-from-Examples Rules

Concept 1 | Relation | Concept 2
5015 | RT | 7030
6599 | RT | 25500
2791 | RT | 8069
4826 | RT | 896
4826 | NT | 4828
25784 | RT | 4954

Concept Code | Concept | Concept Type Data
5015 | Mutton |
7030 | Sheep |
6599 | Rice |
25500 | Rice flour |
2791 | FAO |
8069 | UN |
4826 | Milk |
4828 | Milk fat |
25784 | Engine Parts |
4954 | Engines |

madeFrom / usedToMake: Ex. Mutton <madeFrom> Sheep / Sheep <usedToMake> Mutton
usedToMake / madeFrom: Ex. Rice <usedToMake> Rice flour / Rice flour <madeFrom> Rice
memberOf / hasMember: Ex. FAO <memberOf> UN / UN <hasMember> FAO
containsSubstance / substanceContainedIn: Ex. Milk <containsSubstance> Milk fat / Milk fat <substanceContainedIn> Milk
componentOf / hasComponent: Ex. Engine part <componentOf> Engine / Engine <hasComponent> Engine part
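Because each learned relationship comes paired with an inverse, storing one direction is enough to derive the other. A minimal sketch, assuming triples are plain (subject, relation, object) tuples; `INVERSE_PAIRS`, `INVERSE`, and `with_inverse` are hypothetical names.

```python
# Each refined relationship and its inverse, as listed above.
INVERSE_PAIRS = [
    ("madeFrom", "usedToMake"),
    ("memberOf", "hasMember"),
    ("containsSubstance", "substanceContainedIn"),
    ("componentOf", "hasComponent"),
]

# Make the mapping symmetric so either direction can be looked up.
INVERSE = {}
for forward, backward in INVERSE_PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def with_inverse(subject, relation, obj):
    """Return the refined triple together with its inverse triple."""
    return [(subject, relation, obj), (obj, INVERSE[relation], subject)]


print(with_inverse("FAO", "memberOf", "UN"))
# [('FAO', 'memberOf', 'UN'), ('UN', 'hasMember', 'FAO')]
```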

Learning-from-Examples Rules

Using WordNet and a decision tree learning method to extract semantic relationship rules:

1. Manually tag term senses and specify the appropriate semantic relationship using the annotation tool.
   Ex. Mutton#1 <madeFrom> Sheep#1
2. Extract the complete hypernym path of each term from WordNet.
   Ex. {mutton#1, meat#1, food#2, solid#1, substance#1, entity#1}
       {sheep#1, bovid#1, ruminant#1, mammal#1, vertebrate#1, animal#1, organism#1, living_thing#1, object#1, entity#1}
3. Apply C4.5 to learn the common ancestor concepts of term1 and term2, and generate the rules.
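The slides use C4.5 for step 3. The sketch below swaps in a much simpler generalization so the idea is visible in a few lines: take the most specific concept shared by the hypernym paths of all annotated term1 senses, and likewise for term2. It is not C4.5 and not the authors' method. The sheep#1 and mutton#1 paths are copied from step 2; the swine#1, calf#1, pork#1, and veal#1 paths are hypothetical, abbreviated stand-ins for real WordNet data, and all function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypernym paths, most specific concept first. sheep#1 and mutton#1 follow the
# slide; the other paths are hypothetical, abbreviated stand-ins for WordNet data.
PATHS: Dict[str, List[str]] = {
    "sheep#1": ["sheep#1", "bovid#1", "ruminant#1", "mammal#1", "vertebrate#1",
                "animal#1", "organism#1", "living_thing#1", "object#1", "entity#1"],
    "swine#1": ["swine#1", "vertebrate#1", "animal#1", "organism#1",
                "living_thing#1", "object#1", "entity#1"],
    "calf#1": ["calf#1", "young#1", "animal#1", "organism#1",
               "living_thing#1", "object#1", "entity#1"],
    "mutton#1": ["mutton#1", "meat#1", "food#2", "solid#1", "substance#1", "entity#1"],
    "pork#1": ["pork#1", "meat#1", "food#2", "solid#1", "substance#1", "entity#1"],
    "veal#1": ["veal#1", "meat#1", "food#2", "solid#1", "substance#1", "entity#1"],
}

# Annotated training pairs: (term1 sense, term2 sense, relationship).
EXAMPLES = [
    ("sheep#1", "mutton#1", "usedToMake"),
    ("swine#1", "pork#1", "usedToMake"),
    ("calf#1", "veal#1", "usedToMake"),
]


def most_specific_common_ancestor(senses: List[str]) -> str:
    """Most specific concept that lies on every sense's hypernym path."""
    shared = set(PATHS[senses[0]]).intersection(*(PATHS[s] for s in senses[1:]))
    # Paths run from specific to general, so the first shared node is the most specific.
    return next(c for c in PATHS[senses[0]] if c in shared)


def learn_rule(examples: List[Tuple[str, str, str]]) -> Tuple[str, str, str]:
    """Generalize pairs sharing one relationship into (class X, class Y, relationship)."""
    relation = examples[0][2]
    class_x = most_specific_common_ancestor([t1 for t1, _, _ in examples])
    class_y = most_specific_common_ancestor([t2 for _, t2, _ in examples])
    return (class_x, class_y, relation)


print(learn_rule(EXAMPLES))  # ('animal#1', 'meat#1', 'usedToMake')
```

A decision tree learner such as C4.5 can additionally use negative examples to choose between candidate ancestors; this shortcut simply picks the tightest shared class.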

Learning-from-Examples Rules

Examples of hierarchical data used to train the ‘usedToMake’ relationship, and the resulting training rule:

If class X is animal#1 and class Y is meat#1, and X RT Y
Then X <usedToMake> Y
Ex. Sheep RT Mutton, Swine RT Pork, Calf RT Veal

……

Figure: WordNet hierarchies of the training examples. term1 nodes: entity#1, group#1, animal#1, vertebrate#1, young#1, sheep#1, swine#1, calf#1. term2 nodes: entity#1, object#1, substance#1, food#2, meat#1, mutton#1, pork#1, veal#1.
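Applying a learned rule amounts to checking class membership along each term's hypernym path. A minimal sketch, assuming the paths are already available as lists (the sheep#1 and mutton#1 paths are copied from the hypernym path examples above); `apply_rule`, `USED_TO_MAKE`, and the rule tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# (class X, class Y, original relation, refined relation)
Rule = Tuple[str, str, str, str]

# The training rule above: animal#1 RT meat#1 -> usedToMake.
USED_TO_MAKE: Rule = ("animal#1", "meat#1", "RT", "usedToMake")

SHEEP = ["sheep#1", "bovid#1", "ruminant#1", "mammal#1", "vertebrate#1",
         "animal#1", "organism#1", "living_thing#1", "object#1", "entity#1"]
MUTTON = ["mutton#1", "meat#1", "food#2", "solid#1", "substance#1", "entity#1"]


def apply_rule(path_x: List[str], path_y: List[str], relation: str, rule: Rule) -> Optional[str]:
    """Return the refined relation if X is under class X, Y under class Y, and the relation matches."""
    class_x, class_y, original, refined = rule
    if relation == original and class_x in path_x and class_y in path_y:
        return refined
    return None


print(apply_rule(SHEEP, MUTTON, "RT", USED_TO_MAKE))  # usedToMake
print(apply_rule(MUTTON, SHEEP, "RT", USED_TO_MAKE))  # None: direction matters
```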

Detection and Suggestion Module

Figure: the system overview diagram from the System Overview slide, repeated to introduce the Detection and Suggestion Module.

Noun Phrase Analysis

If the head word of a term has the same surface form as its broader term, the system applies the ‘subclassOf’/‘superclassOf’ relationship to the pair.

Compound noun rule: NP → MOD + NCN
MOD → NCN (common noun) | NPN (proper noun) | ADJ (adjective) | …

Ex.
Milk NT Cow milk: the head word of "Cow milk" has the same surface form as "Milk", so Milk <superclassOf> Cow milk.
Milk NT Milk fat: the head word "fat" has a different surface form, so other techniques are used to refine the relationship.
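A minimal sketch of the head word test, assuming English compound nouns whose head is the final token (as in NP → MOD + NCN) and a case-insensitive comparison of surface forms. `head_word` and `np_refine` are hypothetical names, and the sketch ignores plurals and multi-word broader terms.

```python
from typing import Optional, Tuple


def head_word(term: str) -> str:
    """Head of an English compound noun (NP -> MOD + NCN): its last token."""
    return term.lower().split()[-1]


def np_refine(narrower: str, broader: str) -> Optional[Tuple[str, str, str]]:
    """Return (narrower, subclassOf, broader) when the head word matches the broader term."""
    # Require a modifier, so a bare term is not made a subclass of itself.
    if len(narrower.split()) > 1 and head_word(narrower) == broader.lower():
        return (narrower, "subclassOf", broader)
    return None  # different surface form: leave it to other techniques


print(np_refine("Cow milk", "Milk"))  # ('Cow milk', 'subclassOf', 'Milk')
print(np_refine("Milk fat", "Milk"))  # None
```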

WordNet Alignment

The hypernym/hyponym relationships of WordNet are used to align the BT/NT relationships in AGROVOC, and the synset of a term in WordNet is used to align the UF/USE relationships in AGROVOC.

Figure: in the WordNet data, Cabbage has the hypernym Cruciferous vegetable, whose hypernym is Vegetable; in the AGROVOC data, Cabbage BT Vegetable. After alignment, the AGROVOC data records Vegetable as a hypernym of Cabbage.
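Both alignment checks can be sketched against a tiny hand-made stand-in for WordNet. The tables below are toy data (the Cabbage chain mirrors the figure; the AMP synset is illustrative), and `is_wordnet_hypernym_path` and `is_wordnet_synset` are hypothetical Python versions of the helpers named in the algorithm. A real system would query WordNet itself; NLTK, for instance, exposes `wordnet.synsets()` and `Synset.hypernym_paths()` for this.

```python
from typing import Dict, FrozenSet, List

# Toy stand-in for WordNet: the direct hypernym of each lemma, plus synsets.
HYPERNYM: Dict[str, str] = {
    "cabbage": "cruciferous vegetable",
    "cruciferous vegetable": "vegetable",
}
SYNSETS: List[FrozenSet[str]] = [
    frozenset({"amp", "adenosine monophosphate"}),
]


def hypernym_path(term: str) -> List[str]:
    """Follow direct hypernyms upward from the term, most specific first."""
    path = [term.lower()]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def is_wordnet_hypernym_path(t1: str, t2: str) -> bool:
    """True if t2 is a direct or indirect hypernym of t1 (supports T1 BT T2)."""
    return t2.lower() in hypernym_path(t1)[1:]


def is_wordnet_synset(t1: str, t2: str) -> bool:
    """True if both terms are lemmas of one synset (supports a UF/USE synonym link)."""
    return any(t1.lower() in s and t2.lower() in s for s in SYNSETS)


print(is_wordnet_hypernym_path("Cabbage", "Vegetable"))  # True
```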

Detection and Suggestion Module

AGROVOC_Cleaning_&_Refinement (T1, T2, Rel) ; returns new_relationship
Input: Term1, Term2, Relationship
Output: New relationship

1. If (Rel = BT or Rel = NT)
     Then If Agree_Expert_defined_Rules (T1, T2, Rel)
          Then return new_refined_relationship ; following the rules
     Else If Headword-Is-Compatible (T1, T2)
          Then return subclass/superclass relationship
     Else If Is_Wordnet_HypernymPath (T1, T2)
          Then return subclass/superclass relationship
     Else If Agree_Revision_Rules (T1, T2, Rel)
          Then return new_relationship ; following the rules
     Else return U ; un-refined
2. Else If (Rel = UF or Rel = USE)
     Then If Is_Wordnet_Synset (T1, T2)
          Then return synonym relationship
     Else If Agree_Revision_Rules (T1, T2, Rel)
          Then return new_relationship ; following the rules
     Else return U ; un-refined
3. Else If (Rel = RT)
     Then If Agree_Revision_Rules (T1, T2, Rel)
          Then return new_relationship ; following the rules
     Else return U ; un-refined

An Algorithm for Data Cleaning and Relationship Refinement
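The algorithm can be sketched as runnable Python, with each technique passed in as a function. This is a hypothetical skeleton, not the authors' code: `Techniques`, `refine`, and the field names are assumptions that mirror the helper names in the pseudocode. Mapping BT to subclassOf and NT to superclassOf is an assumed reading of "subclass/superclass relationship", and relationship types outside BT/NT/UF/USE/RT are returned as U.

```python
from dataclasses import dataclass
from typing import Callable, Optional

UNREFINED = "U"  # marker for pairs the system could not refine


@dataclass
class Techniques:
    # All five callables are hypothetical stand-ins for the pseudocode helpers.
    expert_rules: Callable[[str, str, str], Optional[str]]  # Agree_Expert_defined_Rules
    headword_compatible: Callable[[str, str], bool]  # Headword-Is-Compatible
    wordnet_hypernym_path: Callable[[str, str], bool]  # Is_Wordnet_HypernymPath
    wordnet_synset: Callable[[str, str], bool]  # Is_Wordnet_Synset
    revision_rules: Callable[[str, str, str], Optional[str]]  # Agree_Revision_Rules


def refine(t1: str, t2: str, rel: str, tech: Techniques) -> str:
    """Return a refined relationship for (t1 rel t2), or UNREFINED."""
    if rel in ("BT", "NT"):
        refined = tech.expert_rules(t1, t2, rel)
        if refined is not None:
            return refined
        if tech.headword_compatible(t1, t2) or tech.wordnet_hypernym_path(t1, t2):
            # Assumption: T1 BT T2 -> T1 subclassOf T2; T1 NT T2 -> T1 superclassOf T2.
            return "subclassOf" if rel == "BT" else "superclassOf"
    elif rel in ("UF", "USE"):
        if tech.wordnet_synset(t1, t2):
            return "synonym"
    elif rel != "RT":
        return UNREFINED  # relationship types the algorithm does not cover
    # Last resort for BT/NT, UF/USE, and the only option for RT: learned revision rules.
    return tech.revision_rules(t1, t2, rel) or UNREFINED
```

Any technique can be replaced by a real implementation, for example the WordNet checks, without changing the order in which the techniques are tried.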

Detection and Suggestion Module

Figure: techniques applied to each AGROVOC relationship type. BT/NT: expert-defined rules, noun phrase analysis (NP rules), WordNet alignment, and training statistics-based rules. USE/UF: WordNet alignment and training statistics-based rules. RT: training statistics-based rules.

Verification Module

Lets the expert verify the results of the semantic relationship refinement.

Experimental Results

Relationship | No. | No. of refinements | Expert-defined rules No. / PC (%) | NP analysis No. / PC (%) | WordNet alignment No. / PC (%) | Training rules No. / PC (%)
BT/NT | 32176 | 21072 | 16587 / 100% | 2062 / 95% | 2423 / 95% | ** / **
USE/UF | 21605 | 3553 | - / - | - / - | 3553 / 70% | ** / **
RT | 27589 | 1420 | - / - | - / - | - / - | 798* / 72%*
Total | 81370 | 26045 | 16587 / 100% | 2062 / 95% | 5976 / 80% | 798* / 72%*

Remarks: - indicates this technique cannot revise this relationship; * indicates the experiment was run on part of the data; ** indicates the experiment is at an initial stage.

Some synonym relationships should be replaced with abbreviation_of.

Ex. AMP <synonym> Adenosine monophosphate

Ambiguity in the concept classes produced by the training system.
Ex. If class X is food#1 and class Y is food#1, and X RT Y, then X <usedToMake> Y (applied to: pork RT hams)
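The table's totals are internally consistent. A quick check using only the figures in the table, assuming PC is the percentage of correct refinements so that each Total-row PC is a count-weighted average. It also shows that roughly 32% of all relationships were refined.

```python
# Counts from the results table: relationship -> (no. of relationships, no. refined).
ROWS = {"BT/NT": (32176, 21072), "USE/UF": (21605, 3553), "RT": (27589, 1420)}

total_relationships = sum(n for n, _ in ROWS.values())
total_refined = sum(r for _, r in ROWS.values())

# BT/NT refinements split across expert rules, NP analysis, and WordNet alignment.
bt_nt_split = 16587 + 2062 + 2423

# WordNet alignment: (count, PC) per relationship type; the Total PC is count-weighted.
wordnet = [(2423, 0.95), (3553, 0.70)]
wordnet_count = sum(c for c, _ in wordnet)
wordnet_pc = sum(c * pc for c, pc in wordnet) / wordnet_count

print(total_relationships, total_refined, bt_nt_split)  # 81370 26045 21072
print(wordnet_count, round(100 * wordnet_pc))  # 5976 80
print(round(100 * total_refined / total_relationships, 1))  # 32.0
```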

Conclusion and Future Work

A platform for semi-automatic cleaning and refinement of relationships has been developed, based on multiple techniques:
Noun phrase analysis
WordNet alignment
Semantic relationship rules: expert-defined and learning-from-examples

Wishes: collaborative training of the system is needed to extract rules automatically.

Thank you for your attention.