Food and Agriculture
Organization of the UN
Library and Documentation
Systems Division
Slide 1
July 2005
Mapping CAT to AGROVOC
6th AOS Workshop
Vila Real(Portugal)
26-27 July 2005
Mapping FAO’s AGROVOC Thesaurus and the
Chinese Agricultural Thesaurus (CAT)
Anita Liang
July 26, 2005
Sixth AOS Workshop
Vila Real, Portugal
Slide 2
Outline
• The goals
• Benefits
• Project outputs
• Characteristics of the terminologies
• Definitions
• Guidelines
• Mapping
• Outputs
• Issues
Slide 3
The goals
• To draw equivalences between corresponding concepts within the two agricultural terminologies
• To enrich and structurally improve both sources
[Diagram: CAT (Chinese world view) linked by MAPPING to AGROVOC (English world view)]
Slide 4
Benefits
• Multilinguality: improved language coverage
• Domain coverage: improved domain coverage
• Interoperability: between information systems (IS) and applications
Slide 5
Project outputs
(1) a mapping file that links corresponding concepts from CAT to AGROVOC
(2) a list of modifications to be applied to AGROVOC that serve both to improve its content and to provide a valid mapping
Slide 6
Comparison
• AGROVOC:
– 27736 English terms: 16769 descriptors, 10967 non-descriptors
– 25060 Chinese terms: 16628 descriptors, 8432 non-descriptors
– It is hierarchically structured with BT/NT relations. It has associative relations RT and UF/USE, as well as UF+.
• CAT:
– Chinese terms: 64638 (51614 descriptors, 13024 non-descriptors)
– English terms: 90% descriptors (10% have both English and Latin translations or no translations), no non-descriptors
– It is hierarchically structured with BT/NT relations and contains associative relations RT, UF/USE.
Slide 7
Definitions: Mapping v. integration
• Integration: different sources are merged into a single unified thesaurus. This may involve complete restructuring of both sources; recoverability and integrity of the original sources are less of a priority than the overall logical consistency of the integrated product.
• Mapping: one source is mapped to the other. The sources are revised, but each retains its original structure; mutual consistency is desirable but less of a priority than establishing approximate equivalences.
Slide 8
Definitions (cont’d)
• The source vocabulary is CAT; the target vocabulary is AGROVOC.
• Mapping means linking an entry in the source vocabulary to an entry in the target vocabulary.
• A term is a lexical representation of a concept.
• An entry in CAT consists of the Chinese term and any English translation(s), along with its relations to other entries. An entry in AGROVOC consists of at least one English or Chinese term and its translations, along with its relations to other entries.
==> entry = concept
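The definitions above can be sketched as a small data structure. This is only an illustration: the class names `Entry` and `Mapping`, the field names, and the sample terms are assumptions, and the IDs 123 and 345 are simply the placeholder values used on the next slide.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus entry (= concept): its terms per language plus relations."""
    entry_id: int
    terms: dict = field(default_factory=dict)    # language code -> list of terms
    broader: set = field(default_factory=set)    # BT links, stored as entry IDs
    related: set = field(default_factory=set)    # RT links, stored as entry IDs


@dataclass(frozen=True)
class Mapping:
    """A link from a source (CAT) entry to a target (AGROVOC) entry."""
    source_id: int
    target_id: int


# Illustrative entries only; the IDs are placeholders, not real termcodes.
cat_entry = Entry(123, {"zh": ["中国"], "en": ["China"]})
agrovoc_entry = Entry(345, {"zh": ["中国"], "en": ["China"], "fr": ["Chine"], "es": ["China"]})

# The mapping refers to entries (concepts), never to individual terms.
link = Mapping(cat_entry.entry_id, agrovoc_entry.entry_id)
```

Because the link stores entry IDs, adding or correcting a translation inside either entry does not invalidate the mapping.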
Slide 9
Mapping between entries/concepts
[Diagram: a CAT entry (CAT_ID = 123, the CAT termcode) with zh and en terms is linked by a mapping to an AGROVOC entry (AGROVOC_ID = 345, the AGROVOC termcode) with zh, en, fr and es terms.]
Slide 10
Working formats
What we have: RDBMS
• AGROVOC scheme (MySQL)
• CAT scheme (MySQL)
What we need: RDF(S)-based
• SKOS?
• OWL Lite
Slide 11
Guidelines: General (1/4)
• Entries should be mapped irrespective of their status as descriptors or non-descriptors
• Mappings should be between entry IDs, not term IDs.
• Many to one: multiple CAT entries can be mapped to the same entry in the target vocabulary
• One to many: an entry in CAT can be mapped to one or more entries in the target vocabulary
• Mapping relations are based on SKOS Mapping relations and should include only the following:
– Exact
– Broader/Narrower (subsumption)
– AND, OR, NOT
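A minimal sketch of how these general guidelines might be enforced on a list of mapping records. The record layout, the numeric IDs, and the relation names (taken from the later slides) are assumptions, and the AND/OR/NOT combinators are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass

# Only the relation types named in the guidelines are accepted.
ALLOWED_RELATIONS = {"exactMatch", "broaderMatch", "narrowMatch"}


@dataclass(frozen=True)
class Mapping:
    cat_id: int        # source entry ID (not a term ID)
    relation: str
    agrovoc_id: int    # target entry ID (not a term ID)

    def __post_init__(self):
        if self.relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unsupported mapping relation: {self.relation}")


# Hypothetical IDs, chosen only to show both cardinalities.
mappings = [
    Mapping(101, "exactMatch", 900),
    Mapping(102, "broaderMatch", 900),   # many to one: 101 and 102 share target 900
    Mapping(103, "narrowMatch", 901),
    Mapping(103, "narrowMatch", 902),    # one to many: 103 has two targets
]

# Index the records in both directions so either cardinality is easy to inspect.
by_source = defaultdict(list)
by_target = defaultdict(list)
for m in mappings:
    by_source[m.cat_id].append(m)
    by_target[m.agrovoc_id].append(m)
```

Nothing in the record depends on whether the underlying term is a descriptor or a non-descriptor, which matches the first guideline.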
Slide 12
Guidelines: Source/Target Modifications (2/4)
• When a gap occurs in either vocabulary because the corresponding term is missing, the term should be added to the appropriate vocabulary.
• When a gap occurs in the target vocabulary because the concept does not exist:
– If there is no parent in the target vocabulary to which it could be matched, then add the concept to the target vocabulary. Add the Chinese term even if the English term does not exist. Add relations where possible. Then do an exact mapping.
Slide 13
Guidelines: Source/Target Modifications (3/4)
• Wrong translations should be fixed in both sources.
• Inconsistencies should be fixed within the terminologies
cat_zh1 BT cat_zh2
agr_zh1 UF agr_zh2
• Conflicting semantics should be fixed within the terminologies
cat_zh1 BT cat_zh2    cat_en1 BT cat_en2
agr_zh1 NT agr_zh2    agr_en1 NT agr_en2
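The "conflicting semantics" case (a BT link in CAT that appears as an NT link in AGROVOC) could be detected automatically once exact mappings exist. The sketch below is one possible check, not the project's actual tooling; the function name, the pair representation, and the sample IDs are assumptions.

```python
def find_conflicts(cat_bt, agrovoc_bt, exact):
    """Return CAT (entry, broader) pairs whose direction is reversed in AGROVOC.

    cat_bt / agrovoc_bt: sets of (entry, broader_entry) pairs, i.e. "entry BT broader_entry".
    exact: dict mapping a CAT entry ID to its exactMatch AGROVOC entry ID.
    """
    conflicts = []
    for entry, broader in sorted(cat_bt):
        a_entry = exact.get(entry)
        a_broader = exact.get(broader)
        if a_entry is None or a_broader is None:
            continue  # one side is unmapped, so nothing can be compared
        # Conflict: in AGROVOC the counterpart of `broader` sits below the counterpart of `entry`.
        if (a_broader, a_entry) in agrovoc_bt:
            conflicts.append((entry, broader))
    return conflicts


# cat_1 BT cat_2, but agr_1 NT agr_2 (stored here as agr_2 BT agr_1).
cat_bt = {("cat_1", "cat_2")}
agrovoc_bt = {("agr_2", "agr_1")}
exact = {"cat_1": "agr_1", "cat_2": "agr_2"}

conflicts = find_conflicts(cat_bt, agrovoc_bt, exact)
```

Each reported pair is a candidate for manual review; the check does not say which of the two terminologies is wrong.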
Slide 14
Guidelines: Source/Target Modifications (4/4)
• If two source entries that have the same English translation need to be added to the target vocabulary, add a scope note or a definition to explain the difference.
Slide 15
Mapping: exactMatch
• If CAT entry A and AGROVOC entry B mean the same thing, i.e., are synonymous, they should be exact matches.
e.g., zh1 and zh2 are synonyms
[Diagram: cat_zh1 / cat_en1 linked by exactMatch to agr_zh2 / agr_en1]
Slide 16
Mapping: broaderMatch
B is a concept that exists in CAT but not in AGROVOC
• solution 1) broaderMatch
• solution 2) add the concept in the target (only in the original language) and do an exactMatch
[Diagram: nodes a_A, c_A and c_B, with links labelled broaderMatch and exactMatch]
Slide 17
Mapping: narrowMatch
A is a concept that exists in CAT but not in AGROVOC
• solution 1) narrowMatch
• solution 2) add the concept a_B in the target (only in the original language) and do an exactMatch
[Diagram: nodes a_A, c_A and a_B, with links labelled narrowMatch and exactMatch]
Slide 18
Problem?
• CAT has concept { Mathematics } containing nearly 200 narrower terms
• AGROVOC has concept { Mathematics } with no narrower terms
• ==> Map all 200 CAT terms as broaderMatch to ag_Mathematics?
Slide 19
Mapping: inheritance (1/3)
Map every source entry at the most general level in the target vocabulary.
1. Map c_A to a_A
2. Descendants of c_A are by inheritance mapped as descendants of a_A
3. If there are corresponding descendants of c_A and a_A, they should be mapped.
[Diagram: source hierarchy with c_A, c_B, c_C and c_D; target hierarchy with a_A, a_B, a_C and a_D; links numbered 1, 2 and 3 correspond to the steps above.]
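The inheritance rule can be read as "walk up the source hierarchy until an explicitly mapped ancestor is found". The following is a sketch under that reading, assuming each source entry has at most one parent; the function name and the dictionaries are illustrative, not part of the project.

```python
def resolve_mapping(entry, parent_of, explicit):
    """Return (target, inherited) for a source entry.

    parent_of: source entry -> its parent (broader) entry, one parent assumed.
    explicit: source entry -> target entry, for explicitly mapped entries only.
    inherited is True when the target comes from an ancestor's mapping.
    """
    current = entry
    seen = set()
    while current is not None and current not in seen:
        if current in explicit:
            return explicit[current], current != entry
        seen.add(current)              # guards against cycles in the hierarchy
        current = parent_of.get(current)
    return None, False                 # no mapped ancestor at all


# c_A is the top entry; c_B and c_C are its children; c_D is a child of c_B.
parent_of = {"c_B": "c_A", "c_C": "c_A", "c_D": "c_B"}
# Step 1 maps c_A to a_A; step 3 maps the corresponding descendant c_B to a_B.
explicit = {"c_A": "a_A", "c_B": "a_B"}

c_c_target = resolve_mapping("c_C", parent_of, explicit)   # inherits from c_A
c_d_target = resolve_mapping("c_D", parent_of, explicit)   # inherits from c_B
```

Under this reading, a case such as the { Mathematics } one needs only a single explicit mapping at the top; the narrower CAT terms resolve to it by inheritance.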
Slide 20
Mapping process: inheritance (2/3)
Another type of inheritance:
1. Map c_A to a_A1 with exactMatch
2. If there are corresponding descendants of c_A and a_A, they should be mapped (c_B with a_B2).
3. Descendants of c_B are by inheritance mapped as descendants of a_B2
[Diagram: source hierarchy with c_A, c_B, c_C and c_D; target entries a_A1, a_A2, a_B1 and a_B2; link 1 is the exactMatch from c_A to a_A1, and links 2 and 3 correspond to the steps above.]
Slide 21
Mapping: inheritance (3/3)
In case of partial inheritance, do not map single children (fig. 1), but map the parent and use NOT to exclude the entries that should not be mapped (fig. 2).
[Figures 1 and 2: c_fo is mapped to a_fo, whose children are a_B, a_C and a_D; in the figure that uses NOT, a_C is the excluded entry.]
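One way to picture partial inheritance is to expand the mapped parent's subtree and skip whatever is marked NOT. The sketch below assumes that a_C is the excluded child, as the figure labels suggest; the function name and the hierarchy dictionary are hypothetical.

```python
def covered_targets(root, children_of, excluded):
    """List the target entries under `root` (inclusive), skipping excluded subtrees."""
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in excluded:
            continue                   # a NOT entry removes the node and everything below it
        result.append(node)
        stack.extend(children_of.get(node, []))
    return sorted(result)


# a_fo has three children; the mapping c_fo -> a_fo carries "NOT a_C".
children_of = {"a_fo": ["a_B", "a_C", "a_D"]}
covered = covered_targets("a_fo", children_of, excluded={"a_C"})
```

A single parent mapping plus one exclusion replaces what would otherwise be a separate mapping for every remaining child.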
Slide 22
The output (1/2)
<c:Concept ID="uri">
  <prefLabel lang="zh">中国</prefLabel>
  <map:exactMatch>
    <a:Concept ID="uri">
      <prefLabel lang="en">China</prefLabel>
    </a:Concept>
  </map:exactMatch>
</c:Concept>
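A record shaped like the one above could be generated with Python's standard `xml.etree.ElementTree` module. This is a sketch only: the namespace URIs, the helper names, and the placeholder URIs are assumptions, since the slide does not say what the `c:`, `a:` and `map:` prefixes are bound to.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the slide does not give the real ones.
NS = {
    "c": "http://example.org/cat#",
    "a": "http://example.org/agrovoc#",
    "map": "http://example.org/mapping#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Build a namespace-qualified tag name in ElementTree's {uri}local form."""
    return f"{{{NS[prefix]}}}{local}"


def mapping_record(cat_uri, zh_label, agrovoc_uri, en_label, relation="exactMatch"):
    """Build one CAT concept element that wraps its mapped AGROVOC concept."""
    source = ET.Element(q("c", "Concept"), {"ID": cat_uri})
    ET.SubElement(source, "prefLabel", {"lang": "zh"}).text = zh_label
    link = ET.SubElement(source, q("map", relation))
    target = ET.SubElement(link, q("a", "Concept"), {"ID": agrovoc_uri})
    ET.SubElement(target, "prefLabel", {"lang": "en"}).text = en_label
    return source


# Placeholder URIs stand in for the "uri" values on the slide.
record = mapping_record("uri-of-cat-entry", "中国", "uri-of-agrovoc-entry", "China")
xml_text = ET.tostring(record, encoding="unicode")
```

Serialising with `encoding="unicode"` returns a string and keeps the Chinese label readable instead of escaping it.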
Slide 23
Application
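[Diagram: a JSP page accepts a search (example: "cow"); the search terms pass through the mapping between AGROVOC (RDF) and CAT (RDF); records are searched in FAOBIB, AGRIS (Chinese) and the CAAS bibliographic DB; results are returned to the page.]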
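The application appears to use the mapping to widen a search across both vocabularies. The following sketch shows one way such query expansion might work; the function name, the in-memory dictionaries, and the sample terms are all made up for illustration and do not reflect the actual JSP application or real thesaurus data.

```python
def expand_query(term, agrovoc_terms, cat_terms, cat_to_agrovoc):
    """Return the search term plus all terms of matching and mapped concepts.

    agrovoc_terms / cat_terms: concept ID -> {language code: [terms]}.
    cat_to_agrovoc: CAT concept ID -> mapped AGROVOC concept ID.
    """
    needle = term.casefold()
    # AGROVOC concepts that carry the search term in any language.
    hits = {
        concept_id
        for concept_id, by_lang in agrovoc_terms.items()
        for terms in by_lang.values()
        for t in terms
        if t.casefold() == needle
    }
    expanded = {term}
    for concept_id in hits:
        for terms in agrovoc_terms[concept_id].values():
            expanded.update(terms)
    # Follow the mapping backwards to pick up the CAT terms as well.
    for cat_id, agrovoc_id in cat_to_agrovoc.items():
        if agrovoc_id in hits:
            for terms in cat_terms.get(cat_id, {}).values():
                expanded.update(terms)
    return expanded


# Made-up sample data, not taken from either thesaurus.
agrovoc_terms = {"a_1": {"en": ["cow"]}}
cat_terms = {"c_1": {"zh": ["母牛"], "en": ["cow"]}}
cat_to_agrovoc = {"c_1": "a_1"}

search_terms = expand_query("cow", agrovoc_terms, cat_terms, cat_to_agrovoc)
```

The expanded term set could then be sent to each bibliographic database in the language it is indexed in.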
Slide 24
Issues
• SKOS mapping
– “interlingua”: language independence - mapping is oriented towards source terminology
– Set theory metaphor: difficult to put into practice
• Both terminologies are multilingual in overlapping languages - what is being mapped?
Slide 25
Thank you.