Thesauri, interoperability and the role of ISO 25964
Stella G Dextre Clarke, Project Leader, ISO NP 25964
Chair, ISKO UK
stella@lukehouse.org
Summary
- Brief thesaurus chronology
- What role does the thesaurus have now?
- The demand for interoperability
- Highlights from ISO 25964
Thesauri – a brief chronology
- Once upon a time, thesauri were at the cutting edge of Information Retrieval (IR) technology
- Heyday in the 1960s and 1970s; after the mid-1980s, popularity declined
- ISO 2788 and ISO 5964 (for monolingual and multilingual thesauri respectively) came out between 1974 and 1986
- Internet/intranets in the 1990s brought resurgence and diversification (into other forms of controlled vocabulary, such as “taxonomies”)
- TREC (1992 onwards) has shown the dominance of statistical methods in IR. But stats alone are not enough!
- At the turn of the century, thesauri came back into fashion and work began on refurbishing the British and International standards
- Semantic Web and SKOS developments provide more incentive
- Today, even Google employs some “taxonomists”.
Slide unearthed from TR’01 (2001): The thesaurus coming back into fashion!
The role of controlled vocabularies today
- Needed where full text is not available, e.g. image libraries and audio resources
- Invaluable for crossing language barriers
- Especially useful in-house, where page-rank algorithms are less effective
- Essential for accessing vast databases and catalogues of bibliographic data from decades past
- Provide added value in combination with other methods, often hidden behind the scenes
In all these contexts, interoperability is key.
Introducing ISO 25964
ISO 25964: Thesauri and interoperability with other vocabularies
- Part 1: Thesauri for information retrieval
- Part 2: Interoperability with other vocabularies
It updates ISO 2788 and ISO 5964, based on BS 8723, with much reworking:
- Part 1, published in August 2011, covers monolingual and multilingual thesauri
- Part 2, to be published in January 2013, covers mapping between thesauri and other types of vocabulary
- Information retrieval is seen as the main application, including indexing as well as searching
What does “interoperability” mean?
Definition: ability of two or more systems or components to exchange information and to use the information that has been exchanged.
In the case of thesauri and other KOS (knowledge organization systems), interoperability applies, broadly speaking, at more than one level:
- presenting data in a standard way to enable import and use in other systems (ISO 25964 Part 1)
- providing mappings between the terms/concepts of one KOS and those of another (ISO 25964 Part 2)
- any other type of exchange between one KOS and another (ISO 25964 Part 2)
Linked Data Cloud in 2011, by Richard Cyganiak and Anja Jentzsch (see http://lod-cloud.net/)
A simplified view of interoperability
Diagram centred on “My thesaurus”:
- Interoperability between vocabularies (see ISO 25964-2): my thesaurus linked with your thesaurus, GEMET, AGROVOC, LCSH, Dewey and WordNet
- Interoperability between applications (see ISO 25964-1): my thesaurus used by vocabulary management software, indexing/tagging software and search/browsing software
Content of ISO 25964-1, supporting interoperability between applications
- thesaurus content and construction, mono- or multilingual (i.e. a complete update of ISO 2788 and ISO 5964)
- guidance on applying facet analysis to thesauri
- guidance on managing thesaurus development and maintenance
- functional requirements for software to manage thesauri
- a data model and derived XML schema
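To make the idea of “a data model and derived XML schema” concrete, here is a minimal sketch of a concept-based thesaurus record (a concept with one preferred term, optional non-preferred terms, and broader and related links) serialized to XML with Python's standard library. The class names, element names and sample terms are hypothetical illustrations; they are not the element names of the actual ISO 25964 schema, which NISO publishes. Narrower relationships are omitted because they can be derived as the inverse of the broader ones.

```python
# Hypothetical sketch of a thesaurus data model with XML export.
# Element and class names are illustrative, NOT those of the ISO 25964 schema.
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Concept:
    concept_id: str
    preferred_term: str                                            # one preferred term per concept
    non_preferred_terms: list[str] = field(default_factory=list)   # entry terms (UF)
    broader: list[str] = field(default_factory=list)               # ids of broader concepts (BT)
    related: list[str] = field(default_factory=list)               # ids of related concepts (RT)


def to_xml(concepts: list[Concept]) -> str:
    """Serialize concepts to a simple XML document with hypothetical element names."""
    root = ET.Element("thesaurus")
    for c in concepts:
        el = ET.SubElement(root, "concept", id=c.concept_id)
        ET.SubElement(el, "preferredTerm").text = c.preferred_term
        for term in c.non_preferred_terms:
            ET.SubElement(el, "nonPreferredTerm").text = term
        for ref in c.broader:
            ET.SubElement(el, "broader", ref=ref)
        for ref in c.related:
            ET.SubElement(el, "related", ref=ref)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data.
roads = Concept("c1", "Roads")
streets = Concept("c2", "Streets", broader=["c1"])
motorways = Concept("c3", "Motorways", non_preferred_terms=["Freeways"], broader=["c1"])
xml_text = to_xml([roads, streets, motorways])
print(xml_text)
```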
Content of ISO 25964-2, supporting interoperability between vocabularies
- Models for mapping
- Guidelines for mapping
- Recommendations on mapping types
- How to handle pre-coordination
- Mapping to vocabularies other than thesauri:
  - classification schemes
  - file plans (classification schemes used for records management)
  - taxonomies
  - subject heading schemes
  - ontologies
  - terminologies
  - name authority lists
  - synonym rings
- Brief guidance on handling mappings data
Recommended “Models for mapping”
(Diagram with nodes labelled A–H and P–S; not reproduced here.)
What does “mapping” mean?
Definition: process of establishing relationships between the concepts of one vocabulary and those of another
- Recommended types of mapping are based on the standard internal relationship types, basically: equivalence, hierarchical and associative
- Greater differentiation of mapping types is allowed but optional, to avoid complexity in simple applications
Full range of ISO 25964-2 mapping types
Basic mapping types:
- Equivalence
  - Simple
  - Compound
    - Intersecting compound equivalence
    - Cumulative compound equivalence
- Hierarchical
  - Broader
  - Narrower
- Associative
Simple equivalence can be marked as “Exact” or “Inexact”.
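Because the finer distinctions are optional, a mapping store that records them can still serve a simple application by collapsing each detailed type into one of the three basic ones. The sketch below assumes hypothetical string labels for the types listed above; ISO 25964-2 does not prescribe these labels.

```python
# Hypothetical labels for the detailed mapping types listed above, each
# paired with the basic type (equivalence, hierarchical, associative) it specialises.
BASIC_TYPE = {
    "simple equivalence": "equivalence",
    "exact equivalence": "equivalence",
    "inexact equivalence": "equivalence",
    "intersecting compound equivalence": "equivalence",
    "cumulative compound equivalence": "equivalence",
    "broader": "hierarchical",
    "narrower": "hierarchical",
    "associative": "associative",
}


def simplify(mapping_type: str) -> str:
    """Collapse a detailed mapping type to its basic type."""
    try:
        return BASIC_TYPE[mapping_type]
    except KeyError:
        raise ValueError(f"unknown mapping type: {mapping_type!r}") from None


def simplify_all(mappings: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Apply simplify() to (source, type, target) triples."""
    return [(source, simplify(kind), target) for source, kind, target in mappings]


print(simplify_all([
    ("Streets", "broader", "Roads"),
    ("Horticulture", "inexact equivalence", "Gardening"),
]))
```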
Full range of ISO 25964-2 mapping types, with examples
Basic mapping types:
- Equivalence
  - Simple: Laptop computers EQ Notebook computers
  - Compound
    - Intersecting compound equivalence: Women executives EQ Women + Executives
    - Cumulative compound equivalence: Inland waterways EQ Rivers | Canals
- Hierarchical
  - Broader: Streets BM Roads
  - Narrower: Roads NM Streets
- Associative: e-Learning RM Distance education
Exact equivalence: Aubergines =EQ Egg-plants
Inexact equivalence: Horticulture ~EQ Gardening
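The abbreviated notation in these examples is regular enough to parse mechanically. The sketch below, a hypothetical helper rather than anything defined in ISO 25964-2, reads statements such as “Streets BM Roads” into a source term, a mapping type and a list of target terms, treating “+” as intersecting and “|” as cumulative compound equivalence. It assumes terms never contain the codes or operators themselves, and it does not handle parenthesized or mixed-operator expressions.

```python
# Hypothetical parser for the mapping notation used in the examples above.
import re
from dataclasses import dataclass

# The non-greedy source group stops at the first recognised code
# surrounded by whitespace.
_STATEMENT = re.compile(r"^(?P<source>.+?)\s+(?P<code>=EQ|~EQ|EQ|BM|NM|RM)\s+(?P<target>.+)$")

_KINDS = {
    "EQ": "simple equivalence",
    "=EQ": "exact equivalence",
    "~EQ": "inexact equivalence",
    "BM": "broader",
    "NM": "narrower",
    "RM": "associative",
}


@dataclass
class Mapping:
    source: str
    kind: str
    targets: list[str]  # one target, or several for compound equivalence


def _split(text: str, operator: str) -> list[str]:
    return [part.strip() for part in text.split(operator)]


def parse_mapping(statement: str) -> Mapping:
    match = _STATEMENT.match(statement.strip())
    if match is None:
        raise ValueError(f"unrecognised mapping statement: {statement!r}")
    source, code, target = match.group("source", "code", "target")
    if code == "EQ" and "+" in target:
        return Mapping(source, "intersecting compound equivalence", _split(target, "+"))
    if code == "EQ" and "|" in target:
        return Mapping(source, "cumulative compound equivalence", _split(target, "|"))
    return Mapping(source, _KINDS[code], [target])


for s in ["Women executives EQ Women + Executives", "Streets BM Roads"]:
    print(parse_mapping(s))
```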
The joys of pre-coordination
Examples:
- 599.742.71(084.12): photographs of lions (from UDC)
- Automobiles--Air conditioning--Maintenance and repair (from LCSH)
Pre-coordination occurs characteristically in subject heading schemes, classification schemes, taxonomies and file plans.
Mapping it obliges use of the more complicated mapping types, especially compound equivalence.
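For a heading like the LCSH example, a compound equivalence can be sketched by splitting the heading into its components and looking each one up in a target thesaurus. All of the following are assumptions of this hypothetical sketch: the heading uses the “--” display form (MARC records actually carry subdivisions in separate subfields), the lookup table is built by hand, and the components combine by intersection. UDC notation such as 599.742.71(084.12) would need a proper notation parser and is not covered.

```python
# Hypothetical sketch: express a pre-coordinated, LCSH-style heading as an
# intersecting compound equivalence ("A + B + C") against a target thesaurus.


def split_heading(heading: str, separator: str = "--") -> list[str]:
    """Split a heading on its subdivision separator, trimming whitespace."""
    return [part.strip() for part in heading.split(separator) if part.strip()]


def as_compound_equivalence(heading: str, lookup: dict[str, str]) -> str:
    """Render 'heading EQ term1 + term2 + ...'.

    `lookup` maps each heading component to a preferred term in the target
    thesaurus; it is a hand-built, hypothetical table, not part of LCSH.
    """
    components = split_heading(heading)
    missing = [c for c in components if c not in lookup]
    if missing:
        raise KeyError(f"no target term for: {missing}")
    return f"{heading} EQ " + " + ".join(lookup[c] for c in components)


# Hypothetical target-thesaurus terms, for illustration only.
LOOKUP = {
    "Automobiles": "Cars",
    "Air conditioning": "Air conditioning",
    "Maintenance and repair": "Maintenance",
}
heading = "Automobiles--Air conditioning--Maintenance and repair"
print(as_compound_equivalence(heading, LOOKUP))
# -> Automobiles--Air conditioning--Maintenance and repair EQ Cars + Air conditioning + Maintenance
```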
Vocabularies other than thesauri
ISO 25964 is a standard for thesauri; it does not attempt to standardize other types of KOS. It gives guidance only on interoperability between thesauri and other types of KOS.
The clause on each KOS type presents:
- key characteristics of the KOS (non-normative)
- semantic components/relationships (non-normative)
- recommendations for interoperability between the KOS and a thesaurus, especially mapping (normative)
Vocabularies other than thesauri
The following are dealt with in ISO 25964:
- classification schemes
- file plans (classification schemes used for records management)
- taxonomies
- subject heading schemes
- name authority lists
- synonym rings
- terminologies
- ontologies
General prospects for mapping
- thesauri: mapping relatively straightforward
- classification schemes, file plans, taxonomies, subject heading schemes: concept mapping useful in IR; pre-coordination common
- name authority lists: mapping usually straightforward, but common concepts are few
- synonym rings, terminologies, ontologies: concept mapping rarely useful; complementary uses are a more likely prospect
Ontologies are special…
- The definition of ontology excludes “lightweight” examples such as thesauri and classification schemes.
- The Gruber/Studer definition is adopted, and interpreted broadly enough to admit OWL-based examples such as ORE and FOAF.
- Mapping between ontologies and thesauri is not recommended.
- Interoperability recommendations focus on use cases such as re-engineering a thesaurus as an ontology, and complementary use of a thesaurus with an ontology.
Simple ontology illustration (credit: Jutta Lindenthal; see http://www.jlindenthal.de/IID/2012/Kurs_2012.htm)
Structural comparison
The illustration is used in ISO 25964 to draw out key similarities and differences between ontologies and thesauri.
The aim is to encourage emerging applications in which thesauri and ontologies can usefully interoperate.
Interoperability at the level of standards
(Diagram relating the following standards and protocols: SKOS, ISO 25964, OWL, RDF, XML, SRU, Z39.19, MARC 21, REST, HTTP, BS 8723, ZThes, SPARQL, JSON, ISO 2709, Z39.50.)
Dextre Clarke and Zeng, 2012. http://www.niso.org/publications/isq/2012/v24no1/clarke/
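As one example of interoperability between standards, mappings recorded with ISO 25964-2 codes can be published as SKOS mapping properties. The correspondence in the table below is an assumption of this sketch rather than a quotation from either standard: exact equivalence becomes skos:exactMatch, while unmarked and inexact equivalence become skos:closeMatch, because skos:exactMatch is transitive in SKOS and so makes a stronger claim. The sketch handles one target per mapping, so compound equivalences (which have no direct SKOS mapping property) are outside its scope. The example IRIs are placeholders.

```python
# Hypothetical sketch: publish ISO 25964-2-style mappings as SKOS mapping
# triples in Turtle. The code-to-property table is an assumption.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."

SKOS_PROPERTY = {
    "=EQ": "skos:exactMatch",
    "EQ": "skos:closeMatch",   # unmarked equivalence: assumed closeMatch, the weaker claim
    "~EQ": "skos:closeMatch",
    "BM": "skos:broadMatch",   # target concept is broader than the source
    "NM": "skos:narrowMatch",  # target concept is narrower than the source
    "RM": "skos:relatedMatch",
}


def to_turtle(mappings: list[tuple[str, str, str]]) -> str:
    """Turn (source IRI, mapping code, target IRI) triples into Turtle text."""
    lines = [SKOS_PREFIX]
    for source, code, target in mappings:
        prop = SKOS_PROPERTY.get(code)
        if prop is None:
            raise ValueError(f"no SKOS mapping property for code {code!r}")
        lines.append(f"<{source}> {prop} <{target}> .")
    return "\n".join(lines)


print(to_turtle([
    ("http://example.org/mine/streets", "BM", "http://example.org/yours/roads"),
    ("http://example.org/mine/aubergines", "=EQ", "http://example.org/yours/egg-plants"),
]))
```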
The thesaurus coming back into fashion…
…although often hidden behind the scenes
And interoperability makes new tricks easier…
Want a copy of the standards?
- Download Part 1 from ISO at http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=53657
- Part 2 will be in the ISO catalogue next year
- Order from your national standards body (e.g. BSI, DIN, ANSI, AFNOR)
- Some public/academic reference libraries stock them
- ISO standards are not cheap to purchase. However, the data model and XML schema for exchange of thesaurus data are available online without charge or password control: go to http://www.niso.org/schemas/iso25964/
APPENDIX
Some extra slides with more detail
Who is involved in developing the standard?
- A Working Group (WG8), under the ISO subcommittee known as ISO TC46/SC9, has drafted the standard.
- WG8 has members from 15 countries.
- The WG8 Secretariat is provided by NISO in the USA.
- Currently active members of WG8 include: Johan De Smedt, Marianne Lykke, Stella Dextre Clarke (Leader), Esther Scheven, Michèle Hudon, Douglas Tudhope, Daniel Kless, Leonard Will, Jutta Lindenthal, Marcia Lei Zeng
Intersecting versus cumulative equivalence
Mapping example from a pre-coordinated concept: inland waterway transport
Inland waterway transport EQ transport + (rivers | canals)
Image: The Rialto Bridge, Venice, by Michele Marieschi (© Bridgeman Education)
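One practical use of compound equivalence is query expansion: intersecting components (“+”) behave like Boolean AND, cumulative ones (“|”) like Boolean OR. The recursive-descent sketch below turns an expression such as the inland waterway transport mapping into a generic Boolean query string. The grammar is an assumption: “+” binds more tightly than “|” (as AND conventionally binds more tightly than OR), parentheses group as usual, and multi-word terms are quoted; the output syntax is not that of any particular search engine.

```python
# Hypothetical sketch: turn a compound-equivalence expression into a generic
# Boolean query. Assumed grammar (lowest to highest precedence):
#   or_expr  := and_expr ("|" and_expr)*     cumulative   -> OR
#   and_expr := atom ("+" atom)*             intersecting -> AND
#   atom     := TERM | "(" or_expr ")"
import re


def tokenize(text: str) -> list[str]:
    """Split into operators, parentheses and (trimmed) terms."""
    parts = re.findall(r"[()+|]|[^()+|]+", text)
    return [p.strip() for p in parts if p.strip()]


class _Parser:
    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def or_expr(self):
        parts = [self.and_expr()]
        while self.peek() == "|":
            self.take()
            parts.append(self.and_expr())
        return parts[0] if len(parts) == 1 else ("OR", parts)

    def and_expr(self):
        parts = [self.atom()]
        while self.peek() == "+":
            self.take()
            parts.append(self.atom())
        return parts[0] if len(parts) == 1 else ("AND", parts)

    def atom(self):
        token = self.take()
        if token == "(":
            node = self.or_expr()
            if self.take() != ")":
                raise ValueError("missing ')'")
            return node
        if token is None or token in {")", "+", "|"}:
            raise ValueError(f"unexpected token {token!r}")
        return token


def render(node, top: bool = True) -> str:
    """Print the tree, quoting multi-word terms and bracketing nested groups."""
    if isinstance(node, str):
        return f'"{node}"' if " " in node else node
    op, parts = node
    text = f" {op} ".join(render(p, top=False) for p in parts)
    return text if top else f"({text})"


def to_boolean_query(expression: str) -> str:
    parser = _Parser(tokenize(expression))
    tree = parser.or_expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return render(tree)


print(to_boolean_query("transport + (rivers | canals)"))
# -> transport AND (rivers OR canals)
```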