MANAGING CHANGES IN CLASSIFICATION: the case of UDC
Aida Slavic Editor-in-Chief, UDC
FOCUS
• Bibliographic classification in the linked data environment
• Practical issues to do with changes in classification scheme
  § Consequences these changes have on information exchange
  § Importance of publishing historical classification data as linked data
539.1 Nuclear physics. Atomic physics. Molecular physics
539.12 Elementary and simple particles
539.123/.124 Leptons. Including: Muons
539.123 Neutrinos
539.123.6 Antineutrinos
539.124 Electrons (including beta-particles)
539.124.6 Positrons
539.125/.126 Hadrons. Baryons and mesons
539.125 Nucleons
539.125.4 Protons
539.125.46 Antiprotons
539.125.5 Neutrons
539.125.56 Antineutrons
539.126.3 Mesons
539.126.4 Resonances
539.126.6 Hyperons
ALPHABETICAL vs SYSTEMATIC
Antineutrinos, Antineutrons, Antiprotons, Atomic physics, Baryons, Beta-particles, Bosons, Electrons, Hadrons, Hyperons, Leptons, Mesons, Molecular physics, Muons, Neutrinos, Neutrons, Nuclear physics, Nuclei, Nucleons, Positrons, Protons, Resonances
Words alone can only be arranged or ordered alphabetically
Classification orders concepts systematically
NOTATION
Alphabetical order vs systematic order: semantic relationships are fixed by notation
NOTATION – enables mechanical ordering of subjects
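A minimal sketch of what "mechanical ordering" means in practice. It assumes simple UDC numbers only (digits and dots, no ranges such as 539.123/.124 and no auxiliaries) and treats each number as a decimal fraction, so the dots are ignored when filing. The function name `filing_key` is made up for this illustration and is not part of any UDC tool.

```python
def filing_key(notation: str) -> str:
    """Sort key for simple UDC numbers: dots are punctuation only,
    so the digits are compared as a decimal fraction."""
    return notation.replace(".", "")


# A few classes from the 539.12 schedule, deliberately out of order.
classes = {
    "539.125.4": "Protons",
    "539.12": "Elementary and simple particles",
    "539.124": "Electrons",
    "539.123.6": "Antineutrinos",
    "539.125": "Nucleons",
    "539.123": "Neutrinos",
    "539.124.6": "Positrons",
}

# Ordering by notation keeps related subjects together.
systematic = sorted(classes, key=filing_key)

# Ordering by caption scatters them.
alphabetical = sorted(classes.values())

for notation in systematic:
    print(notation, classes[notation])
```

Sorting by notation keeps Neutrinos next to Antineutrinos and Electrons next to Positrons, while sorting by caption separates them. Real UDC filing rules for ranges and auxiliaries are more involved than this key.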
NOTATION - LANGUAGE INDEPENDENT
Example: class =162.3 Czech (SKOS export from UDC Summary)
CLASS vs CONCEPT
§ Class notation rarely represents a single concept
§ Sometimes the notation serves for practical grouping of phenomena
§ This causes many issues when it comes to using ontology-based standards as vehicles for presenting and managing classification schemes
NOTATION: A PLACE HOLDER
598.2 Aves (Birds)
598.24 Gruiformes. Charadriiformes. Ciconiiformes
598.244 Ciconiiformes
598.244.2 Ciconiidae
Including: Storks (genera Ciconia and Mycteria); the Jabiru (genus Jabiru); openbill storks (genus Anastomus) and adjutants (genus Leptoptilos)
Note: "storks" (in English) can be roughly taken as a common term for most of the extant species of class Ciconiidae. In other languages the species in this class do not share a common name, so the English word "storks" cannot be translated accurately into other languages.
NOTATION: A CONTAINER OF INFORMATION
582.53 Alismatales
Including: Strictly extinct genus Heleophyton
SN: Class here also Alismatidae (scientifically outdated)
... 597.2/.5 Pisces (fishes) (scientifically outdated)
Note: Bibliographic classifications often have to contain concepts – even after these cease to be scientifically relevant.
BIBLIOGRAPHIC CLASSIFICATIONS
§ deal with recorded knowledge, i.e. after it has been embodied in documents
§ organize literature about entities and not entities themselves
§ have to fulfil additional requirements with respect to the context in which knowledge may be created, presented, recorded or used
NOT AN ONTOLOGY…
§ Bibliographic classifications are primarily concerned with subjects
Subject = systematized body of ideas
Concept = an idea
§ What is the subject? (forms of knowledge): Mining, Chemistry, Medicine
§ What is the subject about? (topics): mining of gold; physical properties of water; angina pectoris
BIBLIOGRAPHIC CLASSIFICATIONS
Two dominant characteristics:
§ disciplinary organization - organize the universe of knowledge by disciplines, i.e. forms of knowledge - based on some scientific and educational consensus
§ aspect classification - groups phenomena according to the way they are researched, described and studied in documents
POLYHIERARCHY
§ in the universe of knowledge one concept can belong to more than one broader category
Domestic animals > Pets > Dog
Carnivora > Canidae > Dog
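A small sketch of how polyhierarchy can be represented as data. It assumes the diagram shows "Dog" under both "Pets" and "Canidae"; the dictionary `broader` and the function `paths_to_top` are names invented for this illustration.

```python
# Each concept maps to its broader categories; "Dog" has two of them.
broader = {
    "Dog": ["Pets", "Canidae"],
    "Pets": ["Domestic animals"],
    "Canidae": ["Carnivora"],
}


def paths_to_top(concept):
    """Return every chain of broader categories leading to the concept,
    written from the top category down."""
    parents = broader.get(concept, [])
    if not parents:
        return [[concept]]
    return [
        path + [concept]
        for parent in parents
        for path in paths_to_top(parent)
    ]


for path in paths_to_top("Dog"):
    print(" > ".join(path))
```

A strict tree would allow only one entry per list in `broader`; allowing several is what turns the hierarchy into a polyhierarchy.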
“DISTRIBUTED RELATIVES”
Chemical industry > Pest-control chemicals > Chemicals for controlling rodents. Rodenticides > Mouse
Agriculture > Animal husbandry > Rodents kept for fur > Mouse
Zoology > Mammals > Rodentia. Lagomorpha > Myomorpha > Muridae. Mice and rats > Mouse
Agriculture > Plant protection > Control of plant diseases and pests > Destruction of vertebrate pests > Mouse
The separate locations of "Mouse" are linked by "see also" references.
LINKING CONCEPTS ACROSS KNOWLEDGE
Sharks
Natural Sciences > Biology > Animals > Vertebrata > Pisces (Fishes) > Elasmobranchii > Sharks
Arts. Recreation. Entertainment. Sport > Film. Cinema (motion pictures) > Film genres > Documentary films > Documentaries about sharks
Social Sciences > Economic science > Economic sectors > Tourism > Adventure tourism > Swimming with sharks
Arts. Recreation. Entertainment. Sport > Sport > Sport fishing > Sea fishing > Shark fishing
Applied Sciences > Agriculture > Fishing > Fishing for deep-sea species > Shark fishing
Applied Sciences > Industries > Leather industry > Fish skin > Sharkskin
681 PRECISION MECHANISMS AND INSTRUMENTS
681.1 Apparatus with wheel or motor mechanisms
681.2 Instrument-making in general. Instrumentation
681.3 Computers (first placed here before the 1980s)
681.5 Automatic control engineering
681.6 Graphic reproduction machines and equipment
681.7 Optical apparatus and instruments
681.8 Technical acoustics. Musical instruments
NEW KNOWLEDGE EMERGES
Relocated to a new class: 004 (UDC), 004/006 (Dewey)
IT HAPPENS ALL THE TIME... STARTS AS ONE CONCEPT...
§ Finding logical place for new and pervasive concepts
NANOTECHNOLOGY: medicine, technology, industry, computer technology, agriculture
BIOTECHNOLOGY: agriculture, biology, genetics, industry, medicine
=2 Western languages
=20 English
=3 Germanic languages
=4 Romance or Neo-Latin languages
=50 Italian
=60 Spanish
=690 Portuguese
=7 Classic languages. Latin and Greek
=81 Slavonic languages
=88 Baltic languages
=9 Oriental, African and other languages
=91 Various Indo-European languages
=92 Semitic languages
=94 Hamitic languages
...
REMOVING BIAS
Wrong classification of languages causes wrong classification of: peoples, literatures, philology
CORRECTED 25 YEARS AGO (UDC)
Wrong classification of languages causes wrong classification of: peoples, literatures, linguistics
Change to new scientific classification (1980s)
=1/=2 Indo-European languages
=3 Caucasian & other languages. Basque
=4 Afro-Asiatic, Nilo-Saharan, Congo-Kordofanian, Khoisan
=5 Ural-Altaic, Japanese, Korean, Ainu, Palaeo-Siberian, Eskimo-Aleut, Dravidian, Sino-Tibetan
=6 Austro-Asiatic. Austronesian
=7 Indo-Pacific, Australian
=8 American Indian (Amerindian) languages
=9 Artificial languages
MORE CULTURAL BIAS…
2 RELIGION. FAITHS
21/28 CHRISTIANITY
21 Natural theology. Theodicy. De Deo
22 The Bible. Holy scripture
23 Dogmatic theology
24 Practical theology
25 Pastoral theology
26 Christian church in general
27 General history of the Christian church
28 Christian churches, sects
29 NON CHRISTIAN RELIGIONS
EXAMPLE 3: CORRECTED 15 YEARS AGO
NOW.....
2 RELIGION. FAITHS
21 Prehistoric and primitive religions
22 Religions of the Far East
23 Religions of the Indian subcontinent
24 Buddhism
25 Religions of antiquity
26 Judaism
27 Christianity
28 Islam
29 Modern spiritual movements
-1 Theory, nature of religion
-2 Evidence of religion
-3 Persons in religion
-4 Religious practice
-5 Worship. Rites. Cult
-6 Processes in religion
-7 Religious organization
-8 Various properties
-9 History of the faith, religion, denomination or church
GEO-POLITICAL ENTITIES
§ new entities are being created, many entities become ‘historical’
§ administrative subdivisions of modern countries change (approximately every 20 years)
  § counties, districts, administrative units
§ at the same time...
  § ‘old’ subjects have and will continue to have literature written about them
    § Roman Empire, Venetian Republic, Austro-Hungarian Empire (Bukowina, Galizia), British Empire, USSR, Czechoslovakia, Yugoslavia
  § Living and inanimate objects and cultural artefacts are studied and written about long after they are extinct, out of use or practice
TYPE OF CHANGES IN SCHEME
§ Relocation: moving/introducing entire hierarchies from one place in the classification structure to another, e.g. 40% of UDC changed between 1990 and 2008
§ class is cancelled
§ new classes are added
§ class scope may change
§ description may change, references may change
TRADITIONAL APPROACH IN HANDLING CHANGES
Changes as published in the Extensions and Corrections to the UDC
More information about semantics of changes is kept in the UDC database (apart from revision field indicators, date of changes, date of introduction, source of change)
NOTATION BECOMES AMBIGUOUS
Bible: was represented by 22; now represented by 27-23 and 26-23
22: reused for Religions originating in the Far East
Reuse of a notation for different concepts:
§ unpopular but unavoidable
§ can happen 10-50 years apart (desirable) or instantly (to be avoided)
CANCELLATION MAPPING DATA
Managing notation history in the UDC database:
CLASS ID: 16544
NOTATION: 22
CAPTION: Religions originating in the Far East
INTRODUCED/DATE: 0012
REPLACES ID 15991: 299.1 Religions of Oriental Peoples
NOTATION HISTORY: yes
USED FOR: ID:17054: Bible
REPLACED BY: ID:17355: Christian Bible
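The record above can be pictured as a small data structure. The sketch below is only an illustration of the idea, not the actual UDC database schema: the class `ClassRecord`, its field names and the helper functions are invented here, and the notation 27-23 given to record 17355 is an assumption inferred from the preceding slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassRecord:
    class_id: int                      # permanent database ID
    notation: str                      # may be shared by several records over time
    caption: str
    replaced_by: Optional[int] = None  # ID of the record that took over the concept


records = [
    ClassRecord(17054, "22", "Bible", replaced_by=17355),
    ClassRecord(17355, "27-23", "Christian Bible"),  # notation assumed, see lead-in
    ClassRecord(16544, "22", "Religions originating in the Far East"),
]
by_id = {record.class_id: record for record in records}


def history(notation):
    """All records that have ever used this notation."""
    return [record for record in records if record.notation == notation]


def resolve(class_id):
    """Follow 'replaced by' links until a record with no successor is reached."""
    record = by_id[class_id]
    while record.replaced_by is not None:
        record = by_id[record.replaced_by]
    return record


print([record.caption for record in history("22")])
print(resolve(17054).caption)
```

The point of the design is that the notation alone returns two records, while the database ID always identifies exactly one.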
UDC CHANGES AND LIBRARIES
§ libraries continue to use classification numbers for 20-50 years or longer – few libraries have resources to re-classify
§ libraries rarely record the UDC number provenance – if they do, this may represent a particular language edition
§ consequence: new and old concept representations are used side by side, causing many issues in managing/mapping changes to facilitate information exchange
COMPLEX CLASSIFICATION STRINGS
Any part of the complex subject description can change over time
Such complex UDC codes are typical of bibliographic databases/library catalogues
GOOD PRACTICE IN MANAGING SUBJECT ACCESS
DOCUMENT
IsDescribedBy
IsDescribedIn
CAN LINKED DATA SOLVE THE PROBLEM?
LINKED DATA THAT CANNOT BE LINKED
§ National library of Hungary
<bibo:Document rdf:about="http://nektar.oszk.hu/resource/manifestation/2645471">
  <dcterms:subject>
    <rdf:Description>
      <dcam:memberOf rdf:resource="http://purl.org/dc/terms/UDC"/>
      <rdf:value>894.511-32</rdf:value>
    </rdf:Description>
  </dcterms:subject>
§ Trondheim - Library of Norwegian University Of Science And Technology (NTNU) – TEKORD (http://ckan.net/package/tekord)
• all sets contain obsolete records cancelled from UDC 25 years ago or longer
• all sets contain complex UDC numbers that need to be parsed in order to be validated and linked
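To give a feel for what "parsing" a complex UDC number involves, here is a rough tokenizer sketch. It only splits a string into components and does not validate anything against the UDC schedules; the token names (`number`, `hyphen_aux`, `connector` and so on) and the function `split_udc` are invented for this illustration, and the patterns cover only a simplified subset of UDC syntax.

```python
import re

# Simplified token patterns, tried in this order at each position.
TOKEN = re.compile(r'''
    (?P<bracketed>\([^)]*\))   |  # auxiliaries in parentheses, e.g. (437)
    (?P<time>"[^"]*")          |  # time auxiliaries in quotes, e.g. "19"
    (?P<language>=[\d.]+)      |  # language auxiliaries, e.g. =162.3
    (?P<hyphen_aux>-[\d.]+)    |  # hyphen auxiliaries, e.g. -32
    (?P<connector>::|[+:/])    |  # connecting symbols
    (?P<number>\.?\d[\d.]*)       # a main number, or the tail of a range
''', re.VERBOSE)


def split_udc(code):
    """Split a UDC string into (kind, text) pairs; raise on unknown input."""
    parts, pos = [], 0
    while pos < len(code):
        match = TOKEN.match(code, pos)
        if match is None:
            raise ValueError(f"cannot parse {code!r} at position {pos}")
        parts.append((match.lastgroup, match.group()))
        pos = match.end()
    return parts


print(split_udc("894.511-32"))     # the number from the Hungarian example
print(split_udc("539.123/.124"))   # a range from the physics schedule
```

Once a string is split this way, each component could be looked up separately, which is the step where obsolete numbers would have to be recognised.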
ON THE OTHER HAND…
§ UDC archive contains historical data and tracks changes of UDC numbers (from 1900-1990 in paper form)
§ from 1990-2014 changes in UDC recorded in the database – these can be accessed in the UDC Online service
§ UDC Online can be used as a vehicle for proper support to libraries – allowing validation, parsing and number building, but also storing and downloading UDC strings as authority records
URI
§ Option 1: using unique database ID for the class (avoiding notation as an identifier, as it can have different meanings over time):
....//UDCMRF/17054 [Bible]
...//UDCMRF/16554 [Religions originating in the Far East]
This approach was used in UDC Summary LD http://udcdata.info/
§ Option 2: notation + database ID
....//UDCMRF/22_17054 [Bible]
...//UDCMRF/22_16554 [Religions originating in the Far East]
§ Option 3: notation + ‘release stamp’ - problem: does a notation introduced in UDC MRF93 continue to mean the same in the MRF00 release?
....//UDCMRF/MRF93/22 [Bible]
....//UDCMRF/MRF99/22 [Bible]
....//UDCMRF/MRF00/22 [Religions originating in the Far East]
Together with the ‘absolute’ MRF ….UDCMRF/22
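The three URI patterns can be compared side by side in a short sketch. The base address below is a placeholder, since the slides elide the real one, and the function names are invented for illustration. Percent-encoding the notation is an added assumption, made because UDC notations can contain characters such as "/" that have a meaning of their own in URIs.

```python
from urllib.parse import quote

BASE = "http://example.org/UDCMRF/"  # hypothetical base; the real one is not given


def uri_by_id(class_id):
    """Database ID only: stable, but says nothing about the notation."""
    return f"{BASE}{class_id}"


def uri_by_notation_and_id(notation, class_id):
    """Notation plus database ID: readable and still unambiguous."""
    return f"{BASE}{quote(notation, safe='')}_{class_id}"


def uri_by_release(release, notation):
    """Notation plus release stamp: one URI per notation per release."""
    return f"{BASE}{release}/{quote(notation, safe='')}"


print(uri_by_id(17054))
print(uri_by_notation_and_id("22", 17054))
print(uri_by_release("MRF93", "22"))
print(uri_by_release("MRF00", "22"))
```

With the release-stamp pattern, the last two URIs carry the same notation but may identify different concepts, which is exactly the question raised on the slide.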
STANDARDS: LACKING APPROPRIATE SOLUTION
§ SKOS lacks a solution to represent historical data or to track historical changes, so one has to look for solutions in other ontology-type standards for representing vocabularies
§ Solution 1 (by A. Isaac): extending SKOS using DC Terms to model changes as isVersionOf and isReplacedBy relationships – introducing notation as a udcmrf:reference that can aggregate different concepts - but most importantly allowing for the introduction of a concept into UDC (an empty node)
But it is not only about indicating the relationship – rather it is about documenting the change. Hence a more complex model would be needed
§ Solution 2 (by C. Guéret): extending SKOS/MRF data with either
  § event ontology (LODE http://linkedevents.org/ontology/)
  § PROV ontology (provenance)
which would allow publishing/sharing information about what is actually happening with the class.
SOLUTION 1: TOWARDS UDC CONCEPT (A. Isaac)
[Diagram: RDF graph for Solution 1]
Nodes: udcmrf:reference/22; udcmrf:22_17054; udcmrf:22_16544; udcmrf:299.1_15999; udc:concept-FarEastReligion
Labels (skos:prefLabel): "Bible"@en (udcmrf:22_17054); "Far East religions"@en (udcmrf:22_16544); "Religions of Oriental Peoples"@en (udcmrf:299.1_15999)
Notations (skos:notation): "22"^^udc:notation (twice); "299.1"^^udc:notation
Relationships: ore:aggregates (twice); dct:isReplacedBy (twice); dct:isVersionOf (twice)
SOLUTION 2: CLASS CHANGES AS AN EVENT (C. Guéret)
§ this would make it possible to publish/share all UDC classes that ever existed, with all data related to the class lifecycle as well as the various attributes relevant for automatic linking or replacement
§ Such an approach would have to be supported with an appropriate service model
§ Works with URI that is based on a ‘release stamp’ and notation
udc:Creation rdfs:subClassOf lode:Event
udc:Replacement rdfs:subClassOf lode:Event
udc:Reuse rdfs:subClassOf lode:Event
to get something similar to the following:
udc:class/22 ical:hasEvent udc:event/1
udc:event/1 rdf:type udc:Creation
udc:event/1 lode:involved udc:release/MRF10
udc:event/1 rdfs:comment "new class"
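The event statements above can be tried out without any RDF tooling by holding them as plain (subject, predicate, object) tuples. This is only a sketch of the idea: the helper names `objects` and `lifecycle` are invented, the prefixed names are kept as plain strings rather than resolved to full URIs, and `udc:Creation` is used for the creation event class to match the rdf:type statement.

```python
# The statements from the slide as (subject, predicate, object) tuples.
triples = [
    ("udc:Creation", "rdfs:subClassOf", "lode:Event"),
    ("udc:Replacement", "rdfs:subClassOf", "lode:Event"),
    ("udc:Reuse", "rdfs:subClassOf", "lode:Event"),
    ("udc:class/22", "ical:hasEvent", "udc:event/1"),
    ("udc:event/1", "rdf:type", "udc:Creation"),
    ("udc:event/1", "lode:involved", "udc:release/MRF10"),
    ("udc:event/1", "rdfs:comment", "new class"),
]


def objects(subject, predicate):
    """All objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def lifecycle(udc_class):
    """Collect every event recorded for a class, with its details."""
    return [
        {
            "event": event,
            "type": objects(event, "rdf:type"),
            "release": objects(event, "lode:involved"),
            "comment": objects(event, "rdfs:comment"),
        }
        for event in objects(udc_class, "ical:hasEvent")
    ]


for entry in lifecycle("udc:class/22"):
    print(entry)
```

A later replacement or reuse of the class would simply add another event to the same list, so the full history stays attached to the class.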
DATA NEED SHARING: NOTATION & CONCEPT HISTORY
§ Whenever UDC notation is re-used, e.g.
  § notation used for: term describing concept for which the notation was previously used
  § old concept moved to: ID of the class to which the concept was moved
  § date of concept move
  § source of concept move
§ Whenever a concept is moved from one class to another
  § concept that moved: term representing concept
  § concept previously at: ID of the class at which concept was before
  § date of move
  § source of move
DATA NEED SHARING: CANCELLATION
§ UDC number may be cancelled but its record and its ID stay permanently
  § cancellation date (date of cancellation)
  § cancellation source (issue of Extensions & Corrections in which this is published)
  § replaced by: ID of the record to which the UDC number is redirected
  § replacement type: controlled list of types, expressing what the cancelled number is replaced with: new class; colon combination; combination with common auxiliary; combination with special auxiliary; other
  § replacement (semantic) alignment: controlled list: exact match, to broader, to narrower, approximation
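The cancellation data listed above maps naturally onto a record with two controlled lists. The following is a sketch under assumptions, not the UDC database format: the class and field names are invented, the example record uses made-up IDs and placeholder values, and the rule in `can_auto_replace` is just one possible policy a library system might choose.

```python
from dataclasses import dataclass
from enum import Enum


class ReplacementType(Enum):
    NEW_CLASS = "new class"
    COLON_COMBINATION = "colon combination"
    COMMON_AUXILIARY = "combination with common auxiliary"
    SPECIAL_AUXILIARY = "combination with special auxiliary"
    OTHER = "other"


class Alignment(Enum):
    EXACT_MATCH = "exact match"
    TO_BROADER = "to broader"
    TO_NARROWER = "to narrower"
    APPROXIMATION = "approximation"


@dataclass(frozen=True)
class Cancellation:
    class_id: int             # ID of the cancelled record (kept permanently)
    cancellation_date: str
    cancellation_source: str  # issue of Extensions & Corrections
    replaced_by: int          # ID of the record the number is redirected to
    replacement_type: ReplacementType
    alignment: Alignment


def can_auto_replace(cancellation):
    """Assumed policy: only exact matches are safe to replace automatically."""
    return cancellation.alignment is Alignment.EXACT_MATCH


# Made-up example record; none of these values come from UDC data.
example = Cancellation(
    class_id=1,
    cancellation_date="(placeholder date)",
    cancellation_source="(placeholder issue)",
    replaced_by=2,
    replacement_type=ReplacementType.NEW_CLASS,
    alignment=Alignment.TO_BROADER,
)
print(can_auto_replace(example))
```

Keeping the alignment as a controlled value is what lets software decide which old numbers can be redirected without human review.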
UDC LINKED DATA ARCHITECTURE WILL GET MORE COMPLICATED
Towards look-up service based on classification RDF triple store… (C. Guéret)
CONCLUDING REMARKS
§ UDC RDF triple store should contain all data necessary to resolve and interpret strings coming from library catalogues (including historical UDC data)
§ libraries should not need to worry about resolving the semantics of UDC codes
§ UDC linked data should be supported by a front-end service (number look-up/resolution service) – which would enable parsing, validating and resolving URI for UDC codes
CFP closes on 8th March http://seminar.udcc.org/2015/
THANK YOU