Classification in contextualising, mapping and switching between vocabularies

Aida Slavic, UDC Consortium, The Hague

[email protected]

Colloquium Research Information Systems and Science Classifications: revisiting the NARCIS classification

28.09.2018

OUTLINE

• Classification
• Vocabulary mapping
• Mapping projects
• Mapping and linked data

Aida Slavic The Hague 28 September 2018

• classification for presentation of knowledge: philosophical classifications, pedagogical classifications

• classification for organization of knowledge: scientific, economic or administrative classifications

• classifications for applications of knowledge: encyclopaedic classifications, lexicographical and linguistic classifications, classifications in ontologies (expert systems)

• classifications for communication of knowledge: library & documentary, i.e. bibliographic classifications

KNOWLEDGE CLASSIFICATIONS

• classification schemes
  • position of each subject is determined by its relation to other subjects
  • concept is usually represented by symbols
  • types: general and special knowledge schemes

§ alphabetical subject indexing languages
  § concept is represented by natural language terms
  § position of subject is determined by its name
  § types:
    § subject headings (syntagmatic relationships between concepts): can list concepts from all areas of knowledge, no semantic relationships between entries
    § thesauri (language control, semantic relationships between concepts within a limited field of knowledge): each concept can only have one broader category

KNOWLEDGE ORGANIZATION SYSTEMS (KOS)

Bamboo sharks
Blind sharks
Brachaeluridae
Brachaelurus
Carpet sharks
Cartilaginous fishes
Chiloscyllium
Chondrichthyes
Cladoselachiformes
Collared carpet sharks
Elasmobranchii
Galeomorphii
Ginglymostoma
Ginglymostomatidae
Hemiscylliidae
Hemiscyllium
Hybodontiformes
Nurse sharks
Orectolobiformes
Parascylliidae
Parascyllium
Rhincodon
Rhincodontidae
Shark - bamboo s., blind s., carpet s., collared carpet s., nurse s., whale s., etc.
Whale shark
Xenacanthiformes

ALPHABETICAL vs SYSTEMATIC ARRANGEMENT
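The contrast between the two arrangements can be illustrated with a minimal Python sketch. It assumes a simplified broader-term table for a handful of the shark terms listed on this slide; the `BROADER` table and both function names are illustrative and not part of any classification scheme.

```python
# Simplified, illustrative hierarchy: each term points to its broader term.
BROADER = {
    "Chondrichthyes": None,
    "Elasmobranchii": "Chondrichthyes",
    "Galeomorphii": "Elasmobranchii",
    "Orectolobiformes": "Galeomorphii",
    "Brachaeluridae": "Orectolobiformes",
    "Ginglymostomatidae": "Orectolobiformes",
    "Hemiscylliidae": "Orectolobiformes",
    "Parascylliidae": "Orectolobiformes",
    "Rhincodontidae": "Orectolobiformes",
}


def alphabetical(terms):
    """Alphabetical arrangement: position is determined by the name alone."""
    return sorted(terms, key=str.lower)


def systematic(broader):
    """Systematic arrangement: position is determined by relation to other terms."""
    children = {}
    for term, parent in broader.items():
        children.setdefault(parent, []).append(term)

    lines = []

    def walk(parent, depth):
        # Siblings are sorted only to make the output deterministic.
        for term in sorted(children.get(parent, [])):
            lines.append("  " * depth + term)
            walk(term, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(alphabetical(BROADER)))
    print()
    print("\n".join(systematic(BROADER)))
```

In the alphabetical listing related families are scattered by the accident of their names; in the systematic listing they sit together under their common broader class.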

English full ed. 1982 (class 628) Dutch abridged ed. 2013

ALPHABETICAL INDEX TO CLASSIFICATION

The index shows all subject fields in which a given term and/or concept appears (relative index). It can list only terms as they appear in the schedules, or include additional terms/synonyms to assist discovery.

alphabetical display of the thesaurus | systematic display of the thesaurus

“THESAURUSIFICATION” (UDC)

• class content is represented by a notation (language independent)

=512.16
Jižní skupina turkických jazyků [Czech]
Южная группа тюркских языков [Russian]
Թուրքական լեզուների հարավային խումբ [Armenian]
Νότια ομάδα των Τουρκικών γλωσσών [Greek]
[Hindi, Chinese, Bengali, Japanese and Kannada captions: text garbled in extraction]

• hierarchical organization

MAIN CLASSIFICATION FEATURES

• disciplinary organization (in compliance with some recognized forms of knowledge and scientific and educational consensus)

• perspective or aspect classifications: one concept – many fields of knowledge (aspect classifications)

• organize information about entities and not entities themselves

• express context in which knowledge may be created, presented, recorded or used (form of presentation, carrier)

• semantic and syntactic relationships

UNIVERSAL BIBLIOGRAPHIC CLASSIFICATIONS

Zoology > Mammalia (mammals) > Lagomorpha (lagomorphs) > Leporidae > Oryctolagus (rabbit)

Agriculture > Hunting > Small game > Rabbit

Agriculture > Animal husbandry > Rodents kept for fur > Rabbit

Agriculture > Animal husbandry > Animals kept as pets > Rabbit

Various industries > Textile industry > Animal fibres > Hare fur. Rabbit fur > Angora rabbit hair

Home economics > Preparation of foodstuff > Methods of cooking > Braising > Braised rabbit

ASPECT CLASSIFICATION
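A relative index over an aspect classification can be sketched as a lookup from one term to every disciplinary context in which it occurs. The sketch below is a hedged illustration: the caption chains come from the rabbit slide, but their grouping into paths is a reading of that diagram, no notations are given there, and `relative_index` is a hypothetical helper rather than a feature of any UDC tool.

```python
# Caption chains read off the rabbit slide (broadest class first).
# The slide gives no notations, so classes are identified by captions only.
CLASSES = [
    ("Zoology", "Mammalia", "Lagomorpha", "Leporidae", "Oryctolagus (rabbit)"),
    ("Agriculture", "Hunting", "Small game", "Rabbit"),
    ("Agriculture", "Animal husbandry", "Rodents kept for fur", "Rabbit"),
    ("Agriculture", "Animal husbandry", "Animals kept as pets", "Rabbit"),
    ("Various industries", "Textile industry", "Animal fibres",
     "Hare fur. Rabbit fur", "Angora rabbit hair"),
    ("Home economics", "Preparation of foodstuff", "Methods of cooking",
     "Braising", "Braised rabbit"),
]


def relative_index(classes, term):
    """Return the contexts whose most specific caption mentions `term`."""
    needle = term.lower()
    contexts = []
    for path in classes:
        if needle in path[-1].lower():
            # The context is everything above the matching class.
            contexts.append(" > ".join(path[:-1]))
    return contexts


if __name__ == "__main__":
    for context in relative_index(CLASSES, "rabbit"):
        print("rabbit:", context)
```

One concept yields six entries, one per field of knowledge, which is the behaviour an aspect classification is designed to support.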

BEYOND HIERARCHY

The expressive power of a classification lies not only in its level of specificity, or in the number of classes and subclasses available…

It lies in the possibility of indexing, and enabling the search of, content that can be complex, inter- or cross-disciplinary and unpredictable…

CLASSIFICATION VOCABULARY (UDC)

MAIN CLASSES (DISCIPLINES)

SPECIAL AUXILIARY CONCEPTS
.0
-1/-9

COMMON AUXILIARY CONCEPTS
LANGUAGE =…
FORM (0…)
PLACE (1/9)
ETHNICS (=…)
TIME "…"
PROPERTIES -02
MATERIALS -03
RELATIONS -04
PERSONS -05

15,000 CLASSES
55,000 CLASSES

FACETED STRUCTURE AND SYNTHESIS IN UDC

Discipline 1
Discipline 2
Discipline 3

MAIN TABLES
81 Linguistics and languages
811.112.2 German
811.112.22 Upper German
811.112.24 Middle German
811.112.3/.4 Low German
811.112.3 Plattdeutsch
811.112.4 Frisian
811.112.5 Dutch
811.112.58 Dutch-based pidgin and creole

COMMON AUXILIARIES
Materials
Language
Time
Form
(1/9) Place
(4) Europe
(430) Germany
(436) Austria
(437.3) Czech Republic
(437.5) Slovakia
(438) Poland

SPECIAL AUXILIARY NUMBERS
-1/-9 Schools, trends, methods
-116 Structuralism
-116.2 Geneva school
'0 Origins and periods of languages
'1/'9 General theory of linguistics
'1 Metatheory
'2 Subject fields, facets of linguistics
'34 Phonetics. Phonology
'35 Graphemics. Orthography
'36 Grammar
'37 Semantics

RELATIONSHIPS BETWEEN SUBJECTS

37:004 Education : Computers

338.48:61 Tourism : Medicine

602.72:17 Embryonic cloning : Ethics

FACET SYNTHESIS

94(410.5)"18"  History, Scotland, 19th century

94       History
(410.5)  Scotland
"18"     19th century

FACET INDICATORS
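Because each facet carries its own indicator, a synthesised notation can be split back into its parts mechanically. The following is a minimal sketch, not a full UDC parser: it recognises only the indicators shown on these slides (parentheses for form, ethnic grouping and place, quotation marks for time, = for language, the colon for relation, the hyphen for auxiliaries), and the `TOKEN` pattern and `split_facets` name are illustrative assumptions.

```python
import re

# One alternative per facet indicator; order matters for the bracketed facets.
TOKEN = re.compile(
    r'''
    (?P<form>\(0[^)]*\))            |  # (0...)  form
    (?P<ethnic>\(=[^)]*\))          |  # (=...)  ethnic grouping
    (?P<place>\([1-9][^)]*\))       |  # (1/9)   place
    (?P<time>"[^"]*")               |  # "..."   time
    (?P<language>=[0-9.]+)          |  # =...    language
    (?P<relation>:)                 |  # :       relation between subjects
    (?P<auxiliary>-[0-9][0-9.]*)    |  # -...    hyphen auxiliaries
    (?P<number>[0-9][0-9.]*)           # main-table number
    ''',
    re.VERBOSE,
)


def split_facets(notation):
    """Split a synthesised notation into (facet, value) pairs."""
    # Normalise typographic quotes and drop layout spaces.
    text = notation.replace("\u201c", '"').replace("\u201d", '"').replace(" ", "")
    facets, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unrecognised notation at position {pos}: {text[pos:]!r}")
        facets.append((match.lastgroup, match.group()))
        pos = match.end()
    return facets


if __name__ == "__main__":
    for example in ['94(410.5)"18"', "37 :004", "-056.257 -057.177 -053.88"]:
        print(example, "->", split_facets(example))
```

The same indicators make facets searchable on their own: every record about Scotland can be found through `(410.5)` regardless of the discipline it is attached to.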

MORE DETAILS

-057 Persons according to occupation, work, livelihood, education
-057.17 Managers in general. The management
-057.177 Higher management. Top management
-057.177.3 Directors. Board members
-057.177.32 Non-executive directors
-057.177.321 Deputy directors. Assistant directors

-056 Persons according to constitution, health, disposition, hereditary or other traits
-056.2 Persons according to physical state and health
-056.25 Persons according to nourishment (nutritional state) or body weight
-056.257 Overweight persons. Overnourished. Fat. Obese. Hypertrophic

-053 Persons according to age or age-groups
-053.8 Adults. Grown-ups
-053.88 Persons in late middle age (troisième âge)

-056.257 -057.177 -053.88
Top management, persons in late middle age, overweight

612.12-009.92

Angina pectoris

1963 - GEIS (Groupe d’Etude sur l’Information Scientifique) first idea of linking information centres via an intermediary language

INTERMEDIATE LANGUAGE (1960-1980)

UNISIST (1960s), a project on an international information network, considered the UDC a serious candidate for a 'switching language'

• Following reports from experts who examined the existing bibliographic classifications, it was concluded that UDC, although the least faulty, would require too much work

• Standard Reference Code (5,000 subjects organized hierarchically) was proposed as a remedy; some initial work was put into this but the idea never took off

UNISIST PROJECT (1960s)

• Broad System of Ordering: the British Classification Research Group proposed that a new, fully faceted broad scheme be built to serve as an intermediary language. BSO was created in 1979 and has not been implemented, used or developed since that date (http://www.ucl.ac.uk/fatks/bso/outline.htm#910)

• Information Coding Classification, by I. Dahlberg

• Bliss Bibliographic Classification 2, by J. Mills and V. Broughton

None of the newly created schemes has ever been implemented or widely used, completed or updated

NEW SCHEMES FOR “SWITCHING”

The most common way of complementing a classification scheme in a library environment is with an alphabetical indexing language. Example: UDC + subject heading index in the National Subject Authority File (CZENAS)

CLASSIFICATION & SUBJECT HEADING SYSTEMS (1)

SUBJECT AUTHORITY FILE (CZENAS)

Trends towards thesaurus/descriptors-like systems

CLASSIFICATION & SUBJECT HEADING SYSTEMS (2)

• project Multilingual Subject Access to Catalogues of National Libraries (MSAC), 2005-2007: mapping subject heading systems of 8 national libraries (Czech Republic, Slovakia, Slovenia, Croatia, Macedonia, Lithuania and Latvia) using UDC as a pivot

• MSAC results were utilized in the TEL-ME-MORE project (The European Library: Modular Extensions for Mediating Online Resources)

CLASSIFICATION & SUBJECT HEADING SYSTEMS (3)

WTO Library Thesaurus -> UDC (2008)

Example of an in-house solution, not connected to the UDC management. Problems:
§ not updated to match terminology changes in UDC
§ data cannot be publicly shared as it does not have copyright clearance

UDC & THESAURI MAPPING

Available at: http://www.taranco.eu/udk/

Conversion table between UDC and SAB (Konverteringstabell mellan UDK och SAB)

UDC – SAB (Swedish Classification System)

Mapping of around 400 UDC/DDC classes to Conspectus subject categories

Dewey UDC Conspectus

UDC/DDC MAPPING BY CZECH NATIONAL LIBRARY

UDC – LBC (Library Bibliographic Class.)

UDC LBC

LBC is the main Russian classification scheme; a mapping of an outline of 1,000 classes has been produced by the Book Chamber of Ukraine

HILT (High Level Thesaurus Project) 2000-2004
• funded by JISC (UK Joint Information Systems Committee); in Phase 2 changed to building a terminology service providing access to mapping data
• pivot-type intellectual mapping to Dewey Summaries (1000 classes) from thesauri, classifications and subject headings (LCSH, AAT, MeSH, UNESCO, etc.)
• publishing these mappings and making them available as SKOS data
• the service closed in 2004

PUBLISHING AND SHARING KOS MAPPINGS (1)

CrissCross project (2006-2010)
• funded by the German Research Foundation at the time of the publication of the German translation of Dewey by the German National Library and the beginning of indexing of the German national bibliography using Dewey
• mapping between SWD (German subject headings) and Dewey

Coli-conc (2017-)
• project by Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG), Common Library Network
• creation of concordances between different vocabularies used in German libraries: Basic Classification (BC), Dewey, Regensburg (network) Classification (RVK), Standard Thesaurus Wirtschaft (STW) and Universal Decimal Classification (UDC)
• Coli-conc is a mapping registry; it provides infrastructure for mapping management, creation, assessment and quality control (Cocoda module)

PUBLISHING AND SHARING KOS MAPPINGS (2)

Vocabulary 1 <-> UDC <-> Vocabulary 2

DDC            UDC
536 Heat       536 Heat. Thermodynamics    [English]
Chaleur        Chaleur. Thermodynamique    [French]
Теплота        Тепло. Термодинамика        [Russian]
[Chinese captions: text garbled in extraction]

Examples from UDC and DDC summaries

UDC SUMMARY AS A SWITCHING LANGUAGE (1)
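The case for a switching language is largely arithmetic: mapping n vocabularies directly to one another needs n(n-1)/2 mapping sets, while mapping each of them once to a pivot needs only n. The sketch below illustrates this together with a two-step lookup through the pivot. The two subject vocabularies and their mappings to class 536 are hypothetical illustration data (only the 536 captions come from the slide), and all function names are assumptions.

```python
def direct_mapping_sets(n):
    """Mapping sets needed when every pair of n vocabularies is mapped directly."""
    return n * (n - 1) // 2


def pivot_mapping_sets(n):
    """Mapping sets needed when each of n vocabularies is mapped only to the pivot."""
    return n


def switch(term, source_to_pivot, pivot_to_target):
    """Translate a term from one vocabulary to another through pivot classes."""
    targets = set()
    for pivot_class in source_to_pivot.get(term, ()):
        targets.update(pivot_to_target.get(pivot_class, ()))
    return sorted(targets)


# Hypothetical mappings of an English and a French vocabulary to UDC 536.
ENGLISH_TO_UDC = {"Heat": ["536"], "Thermodynamics": ["536"]}
UDC_TO_FRENCH = {"536": ["Chaleur", "Thermodynamique"]}

if __name__ == "__main__":
    print(direct_mapping_sets(8), "direct sets vs", pivot_mapping_sets(8), "pivot sets")
    print(switch("Heat", ENGLISH_TO_UDC, UDC_TO_FRENCH))
```

The lookup for "Heat" returns both French terms attached to 536. This shows the usual trade-off of pivot mapping: far fewer mapping sets to maintain, at the cost of precision limited by the granularity of the pivot class.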

Example of a UDC class: =162.3 Czech [common auxiliary of language]

UDC SUMMARY AS A SWITCHING LANGUAGE (2)

2600 UDC classes > 1000 DDC classes

UDC SUMMARY > DEWEY SUMMARIES

• many other research projects dealt with KOS mapping: DESIRE, Renardus, TEL-ME-MORE, TELplus, STITCH, CARMEN, KoMoHe, Europeana, etc.

• numerous good quality vocabularies have already been published as SKOS, i.e. linked data (https://www.w3.org/2001/sw/wiki/SKOS/Datasets), although better linking between KOS is yet to happen

• publishing, use and re-use of KOS mappings are still sparse

cf. Isaac, A. (2010) “Progress in Semantic Mapping”, NKOS Workshop, Glasgow, 9 September.

CONTINUOUS INTEREST IN MAPPINGS
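Publishing a mapping as linked data usually means stating it with the SKOS mapping properties (skos:exactMatch, skos:closeMatch, skos:broadMatch, skos:narrowMatch, skos:relatedMatch). The following minimal sketch serialises mapping triples as Turtle using only the Python standard library. The concept URIs under example.org are placeholders, not the real identifiers of any published scheme, and `mappings_to_turtle` is a hypothetical helper.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# The five mapping properties defined by SKOS.
MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mappings_to_turtle(mappings):
    """Serialise (source URI, SKOS mapping property, target URI) triples as Turtle."""
    lines = [f"@prefix skos: <{SKOS_NS}> .", ""]
    for source, relation, target in mappings:
        if relation not in MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {relation}")
        lines.append(f"<{source}> skos:{relation} <{target}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder URIs for illustration only.
    example = [
        ("http://example.org/udc/536", "closeMatch", "http://example.org/ddc/536"),
    ]
    print(mappings_to_turtle(example))
```

A closeMatch is used here rather than an exactMatch because the two captions shown earlier differ in scope ("Heat" versus "Heat. Thermodynamics"); which property is appropriate is an editorial decision for whoever maintains the mapping.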

• relevant for alphabetical indexing languages (especially thesauri), typically one-to-one mapping (pivot mapping not excluded)

• NLP and various methodologies to determine the equivalence between terms, concepts and hierarchical relationships in different thesauri, utilising concept equivalence algorithms

• large (digital) collections of documents indexed by thesauri are also processed to enhance/confirm semantic similarities

• speedy progress made possible through publishing KOS as open linked data (SKOS): explicit semantics & resolved copyright issues for vocabularies already published as LOD

cf.

S. Faro, E. Francesconi, V. Sandrucci (2007) EUROVOC Studies LOT2: “D1.5 – Thesauri KOS analysis and selected thesaurus mapping methodology on the project case-study” (http://eurovoc.europa.eu/drupal/sites/all/files/Presentation-D1.5_Mapping_Methodology.pdf)

A. Isaac et al. (2011) “Europeana and semantic alignment of vocabularies”, 10th NKOS Workshop, Berlin, September 2011. https://at-web1.comp.glam.ac.uk/pages/research/hypermedia/nkos/nkos2011/presentations/Isaac-NKOS11.pdf

AUTOMATIC MAPPING / ONTOLOGY MAPPING
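The simplest form of automatic term equivalence is lexical: labels from two thesauri are normalised and compared, and matching pairs are proposed as candidates for human review. The sketch below shows only that baseline step under stated assumptions. It is not the method of any project cited on this slide, the two small thesauri are hypothetical, and real systems add synonym handling, hierarchy comparison and evidence from indexed documents.

```python
import unicodedata


def normalise(label):
    """Case-fold, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def candidate_equivalences(labels_a, labels_b):
    """Pair concepts from two vocabularies that share a normalised label."""
    index = {}
    for concept_id, labels in labels_b.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(concept_id)

    pairs = set()
    for concept_id, labels in labels_a.items():
        for label in labels:
            for match in index.get(normalise(label), ()):
                pairs.add((concept_id, match))
    return sorted(pairs)


# Hypothetical thesauri: concept identifier -> preferred and alternative labels.
THESAURUS_A = {"a1": ["Thermodynamics"], "a2": ["Cafés"], "a3": ["Sharks"]}
THESAURUS_B = {"b7": ["thermodynamics", "Heat theory"], "b9": ["Cafes"]}

if __name__ == "__main__":
    print(candidate_equivalences(THESAURUS_A, THESAURUS_B))
```

Concepts without a lexical match, such as "Sharks" here, are exactly where the hierarchical and corpus-based methods mentioned above have to take over.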

• mapping between KOS in the background, driven by practical needs (organizations, national information networks)

• mapping to/between universal library classifications (dominance imposed by national bibliographies, physical collections), esp. subject heading systems

• Internet ... cross-collection, cross-language, cross-domain resource discovery and mapping between vocabularies: central to numerous international projects

• solutions for sharing and publishing KOS & KOS mappings, terminological services, KOS linked data, automatic vocabulary mapping, ontology mapping, big data bottom-up approach in categorization of information

SUMMARY

• need for vocabulary mapping registries and terminological services (updates and sustainability problem)

• discovery and recovery of in-house/localized mapping outputs (not known to the KOS data owners)

• clarify copyright and linked data (LD) related issues (with respect to mappings)

WORK AHEAD

Useful links:

• UDC Summary (2,600 classes, 57 languages): http://www.udcsummary.info

• UDC Summary as linked data: http://udcdata.info/

• UDC Hub (complete UDC schedules online, 71,000 classes): http://www.udc-hub.com/

THANK YOU