CLASSIFICATION ON THE NETWORK: MACHINE READABLE, SHARED, CLONED AND HIDDEN

Aida Slavic, associate editor of the UDC

Glasgow, 3-5 September, CILIP Cataloguing and Indexing Group Conference


2

CLASSIFICATION ON THE NETWORK

the use of classification outside the bibliographic domain, brought about by the Internet

• broad knowledge browsing, presentation (initially)
• automatic classification

moving ‘behind the screen’ with digital repositories and cross-repository resource discovery

• information integration and searching across distributed collections
• mapping between vocabularies
• supporting cross-language searching
• supplementing simple text retrieval techniques to enable search expansion
• alerting services
• filtering by subject areas for various types of reports, auditing, statistics
• source of vocabulary to build new vocabularies

common to all is an interest in readily available, rich classification data on which services and tools can be built at lower cost

3

VOCABULARY SHARING ON THE NETWORK

Need for generally applicable standards for representing vocabularies in a machine-readable way

Preference for XML and XML/RDF technology – to promote domain, system and platform independence

Publishing, exposing and sharing controlled vocabularies on the network

• ISO/IEC 13250 Topic Maps
• BS 8723 Structured vocabularies for information retrieval
• Simple Knowledge Organization System (SKOS)

See: Sharing Vocabularies on the Web via SKOS
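As a rough illustration of what exposing a class in SKOS could look like, the sketch below writes one class as a skos:Concept in Turtle syntax using only string formatting. The base URI, the class identifiers and the helper name to_turtle are assumptions made for this example; they are not official UDC identifiers or part of any UDC export. Labels containing quotation marks would need escaping, which is left out here.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/udc/"  # hypothetical namespace, not an official UDC URI


def to_turtle(class_id, notation, label, broader_id=None, lang="en"):
    """Serialize one class as a skos:Concept in Turtle syntax."""
    lines = [
        f"<{BASE}{class_id}> a skos:Concept ;",
        f'    skos:notation "{notation}" ;',
        f'    skos:prefLabel "{label}"@{lang}',
    ]
    if broader_id is not None:
        # Link to the broader class by its identifier, not by its notation.
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{BASE}{broader_id}>")
    lines[-1] += " ."
    return "\n".join(lines)


header = f"@prefix skos: <{SKOS_NS}> .\n\n"
# Identifiers "c539125" and "c53912" are made up for the example.
record = to_turtle("c539125", "539.125", "Nucleons", broader_id="c53912")
print(header + record)
```

Keeping the identifier separate from the notation means a class can be renumbered without breaking links that point to it.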

4

VOCABULARY SERVICES & REGISTRIES

Making the content of knowledge organization systems (KOS) available through web services

initiative by NKOS – Networked Knowledge Organization Systems and Services (http://nkos.slis.kent.edu/)

For registries we need:

• machine-accessible vocabularies using standard representations and access protocols
• metadata for describing KOS (using a standard for identifying and describing vocabularies)
• business case/cost effectiveness
• upload of vocabularies into registries by owners, with regular maintenance and upload of versions

See: Tudhope, D. Knowledge Organization System Services: brief review of NKOS activities and possibility of KOS registries. http://www.iskouk.org/presentations/tudhope_ISKOUKseminar1.pdf

5

CLASSIFICATION & SEMANTIC WEB

classification’s capacity to represent and control complex semantic relationships across the universe of knowledge is compatible with the semantic web goal of universal and meaningful linking of concepts

large collections of resources already organized according to classification schemes are a source of concept/subject relationships that can be utilized to improve automatic information integration

prerequisites:

• full machine readability of data
• open access to classification data on the network

6

ABOUT CLASSIFICATION: OUTLINE

role of classification in supporting subject access

subject authority control: managing, sharing, re-use of classification

improving classification source data

SEMANTIC RELATIONSHIPS

words alone can only be ordered alphabetically

alphabetical order: no semantic relationships

Antineutrinos
Antineutrons
Antiprotons
Atomic physics
Baryons
Beta-particles
Bosons
Electrons
Hadrons
Hyperons
Leptons
Mesons
Mesons
Molecular physics
Muons
Neutrinos
Neutrons
Nuclear physics
Nuclei
Nucleons
Positrons
Protons
Resonances

grouping concepts into classes according to similarity

systematic order: semantic relationships fixed by notation

539.1 Nuclear physics. Atomic physics. Molecular physics
539.12 Elementary and simple particles
539.123/.124 Leptons. Including: Muons
539.123 Neutrinos
539.123.6 Antineutrinos
539.124 Electrons (including beta-particles)
539.124.6 Positrons
539.125/.126 Hadrons. Baryons and mesons
539.125 Nucleons
539.125.4 Protons
539.125.46 Antiprotons
539.125.5 Neutrons
539.125.56 Antineutrons
539.126.3 Mesons
539.126.4 Resonances
539.126.6 Hyperons

NOTATION – enables mechanical ordering of subjects
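The contrast between the two orders can be sketched in a few lines. The example sorts a handful of the classes above once by caption and once by notation. The helper filing_key is an assumption for illustration: it treats a simple main number as a decimal fraction by dropping the dots, and it does not implement the full UDC filing rules for ranges ("/") or auxiliary signs.

```python
# A few simple classes from the schedule above (ranges with "/" are left out).
classes = {
    "539.125.4": "Protons",
    "539.124": "Electrons",
    "539.125": "Nucleons",
    "539.123": "Neutrinos",
    "539.125.5": "Neutrons",
    "539.12": "Elementary and simple particles",
}


def filing_key(notation):
    """Sort key for simple main numbers: the digits read as a decimal fraction."""
    return notation.replace(".", "")


# Alphabetical order scatters related concepts.
alphabetical = sorted(classes.values())

# Sorting by notation restores the systematic order mechanically.
systematic = [classes[n] for n in sorted(classes, key=filing_key)]

print(alphabetical)
print(systematic)
```

In the systematic list Protons and Neutrons follow Nucleons, their broader class, whereas the alphabetical list separates them.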

9

WORDS

classification is ‘language independent’ but...

• words are an essential part of every classification system
• the separation of concepts from words using notation simply means that an infinite number of natural language expressions can be attached to every class notation in order to provide access points
• verbal access points are managed separately as: captions; subject-alphabetical index (relative index, chain index); alphabetical controlled vocabularies (thesauri, subject headings); folksonomy

10

HIERARCHICAL ORGANIZATION

6 Applied sciences. Medicine. Technology
62 Engineering. Technology in general
621 Mechanical engineering in general. Nuclear technology. Electrical engineering. Machinery
621.8 Machine elements. Motive power engineering. Materials handling. Fixings. Lubrication
621.88 Fastening, fixing devices. Fasteners
621.882 Threaded fasteners. Screws. Nuts and bolts. Washers
621.882.2 Screws, bolts according to head form. Screws and bolts for various materials
621.882.21 Screws and bolts according to head form
621.882.214 Other polygonal-headed screws and bolts
621.882.214.2 Screws and bolts with knurled or milled head. Thumb screws

freedom to choose and change the level of specificity

browsing function

semantic search expansion
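Because the hierarchy is carried by the notation, the broader classes of a simple main number can be derived by truncating it digit by digit. The sketch below assumes plain decimal main numbers punctuated with a dot after every third digit, as in the schedule above. The function name broader_chain is made up for this example, and a real system would still check each candidate against the schedule, since not every truncation is necessarily a defined class.

```python
def broader_chain(notation):
    """Return candidate broader classes of a simple main number, broadest first."""
    digits = notation.replace(".", "")
    chain = []
    for length in range(1, len(digits)):
        prefix = digits[:length]
        # Re-insert the dot after every third digit, e.g. "6218" -> "621.8".
        groups = [prefix[i:i + 3] for i in range(0, len(prefix), 3)]
        chain.append(".".join(groups))
    return chain


print(broader_chain("621.882.2"))
```

Moving up or down this chain is what lets a user change the level of specificity while browsing or broaden a search step by step.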

11

UNIVERSAL KNOWLEDGE CLASSIFICATION – ASPECT CLASSIFICATIONS

organizes the universe of knowledge by disciplines - based on some scientific and educational consensus (criticism!)

groups phenomena according to the way they are researched, described and studied in documents

assumption – collocation of books by the field in which they are used saves the user’s time

users looking for books on managing rabbits as a pest will not be interested in the fur industry or the physiology of rodents... They will find all books on rodent pest control in the closest proximity

the same phenomenon will find its place in all disciplines in which it may be a subject of study

12

SUBJECT CONTEXT – ASSOCIATIVE RELATIONSHIPS

Chemical industry > Pest-control chemicals > Chemicals for controlling rodents. Rodenticides > Mouse

Agriculture > Animal husbandry > Rodents kept for fur > Mouse

Zoology > Mammals > Rodentia. Lagomorpha > Myomorpha > Muridae. Mice and rats > Mouse

Agriculture > Plant protection > Control of plant diseases and pests > Destruction of vertebrate pests > Mouse

see also (links between the occurrences of Mouse in the different disciplines)

13

LINEAR PRESENTATION OF KNOWLEDGE

the role of classification is to establish a systematic, linear presentation of knowledge – an order of classes

two types of classification with respect to the flexibility of access points:

• enumerative – single, pre-established order of simple and complex subjects (e.g. Dewey, LCC)
• faceted and semi-faceted – allow a range of options in class ordering, control over access points to subjects, and unlimited combinations of subjects

14

SUBJECT ACCESS POINTS

bibliographic classifications are designed to denote the following elements of content :

subject and subject facets: entity (its parts, kinds), processes, materials, agents, operations, instruments, space, time

relationships between subjects treated within the document (influence, bias, application, comparison)

inner form of presentation: theoretical, historical or criticism

outer form of presentation such as audience, purpose, form of expression

manifestations: text, image, sound

carriers: paper, magnetic/optical discs, film, analogue recordings

15

CLASSIFICATION VOCABULARY (e.g. UDC)

COMMON AUXILIARY NUMBERS

TIME “...”
ETHNICS (=...)
PLACE (1/9)
FORM (0...)
PROPERTIES -02
MATERIALS -03
PERSONS -05
LANGUAGE =...
RELATIONS -04

MAIN CLASSES (DISCIPLINES)

SPECIAL AUXILIARY NUMBERS

.0
-1/-9

16

SYNTHESIS

MAIN TABLES

Discipline 1
Discipline 2
Discipline 3

81 Linguistics and languages
811.12.2 German
811.12.22 Upper German
811.12.24 Middle German
811.12.3/.4 Low German
811.12.3 Plattdeutsch
811.12.4 Frisian
811.12.5 Dutch
811.12.58 Dutch based pidgin and creole

COMMON AUXILIARIES

Materials
Language
Time
Form
(1/9) Place

(4) Europe
(430) Germany
(436) Austria
(437.3) Czech Republic
(437.5) Slovakia
(438) Poland

SPECIAL AUXILIARY NUMBERS

-1/-9 Schools, trends, methods
-116 Structuralism
-116.2 Geneva school

'0 Origins and periods of languages

'1/'9 General theory of linguistics
'1 Metatheory
'2 Subject fields, facets of lin.
'34 Phonetics. Phonology
'35 Graphemics. Orthography
'36 Grammar
'37 Semantics

17

RELATING SUBJECTS ACROSS DISCIPLINES = PHASE RELATIONSHIPS

37:004 Education : Computers

338.48:61 Tourism : Medicine

602.72:17 Embryonic cloning : Ethics

-04 Relations, Processes and Operations

-042 Phase relations
-042.1 Bias phase
-042.2 Comparison phase
-042.3 Influence phase
-042.4 Tool phase. Exposition phase

18

SUBJECT FACETS AND FLEXIBILITY OF ORDER

94 History
(410.5) Scotland
“18” 19th century

94(410.5)“18” History, Scotland, 19th century
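One simple way to picture flexible facet order is to rotate the facets so that each one leads an entry once. The sketch below does this for the example on this slide. Rotation is only one possible ordering strategy, chosen here for brevity, and the function name rotated_entries is an assumption made for the example. Straight quotation marks are used for the time auxiliary.

```python
# Facets of the example classmark: discipline, place, time.
facets = [("94", "History"), ("(410.5)", "Scotland"), ('"18"', "19th century")]


def rotated_entries(facets):
    """Build one entry per facet by rotating the facet order."""
    entries = []
    for i in range(len(facets)):
        order = facets[i:] + facets[:i]
        entries.append(", ".join(label for _, label in order))
    return entries


# The classmark itself is the concatenation of the facet notations.
classmark = "".join(notation for notation, _ in facets)

print(classmark)
for entry in rotated_entries(facets):
    print(entry)
```

Each entry gives a different access point (by discipline, by place, by period) to the same classmark.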

19

FACETS OF PERSONS

-057 Persons according to occupation, work, livelihood, education
-057.17 Managers in general. The management
-057.177 Higher management. Top management
-057.177.3 Directors. Board members
-057.177.32 Non-executive directors
-057.177.321 Deputy directors. Assistant directors

-056 Persons according to constitution, health, disposition, hereditary or other traits
-056.2 Persons according to physical state and health
-056.25 Persons according to nourishment (nutritional state) or body weight
-056.257 Overweight persons. Overnourished. Fat. Obese. Hypertrophic

-053 Persons according to age or age-groups
-053.8 Adults. Grown-ups
-053.88 Persons in late middle age (troisième âge)

-056.257

-057.177

-053.88

Top management – Persons in late middle age – Overweight

612.12-009.92

Angina pectoris

20

FACETED STRUCTURE ALLOWS - FACET BASED VIEW

21

MANAGING SUBJECT ACCESS

DOCUMENT

author, title, publisher, format ...

METADATA

SUBJECT CLASSMARK: 94(410)"19"

AUTHORITY FILE

UDC CLASS: 94(410)"19"

DESCRIPTION: History of the U. K.

WAS BEFORE: 941.0

BROADER: 94(4)

SEE ALSO: 94(73), 94(54), 94(366)

SEARCH TERMS: History, United Kingdom, Great Britain, 20th century

DISPLAY AS: United Kingdom - History

-----------------------------------------------------------

MAPPING TO:

Dewey: 94

LCSH: History, 20th century United Kingdom

LCC: DA566-592

IS DESCRIBED BY

IS DESCRIBED IN
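The authority record on this slide can be pictured as a small data structure. The sketch below models it with a Python dataclass and builds a lookup from search terms to the class notation. The class name AuthorityRecord and its field names are assumptions chosen to mirror the labels on the slide; they do not reproduce any MARC or UDC record format. Only the LCC mapping is copied from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthorityRecord:
    """One class in a classification authority file (field names are illustrative)."""
    notation: str
    description: str
    was_before: Optional[str] = None
    broader: Optional[str] = None
    see_also: List[str] = field(default_factory=list)
    search_terms: List[str] = field(default_factory=list)
    display_as: str = ""
    mappings: Dict[str, str] = field(default_factory=dict)


record = AuthorityRecord(
    notation='94(410)"19"',
    description="History of the U. K.",
    was_before="941.0",
    broader="94(4)",
    see_also=["94(73)", "94(54)", "94(366)"],
    search_terms=["History", "United Kingdom", "Great Britain", "20th century"],
    display_as="United Kingdom - History",
    mappings={"LCC": "DA566-592"},
)

# Verbal access: every search term leads to the same class notation.
term_index = {term.lower(): record.notation for term in record.search_terms}
print(term_index["great britain"])
```

Document metadata then only needs to store the classmark; captions, search terms and mappings are maintained once, in the authority file.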

22

SEMANTIC SEARCH EXPANSION

hadrons search

SUBJECT                                      # HITS

539.12 Elementary and simple particles       132
539.125/.126 Hadrons. Baryons and mesons     58
539.125 Nucleons                             38
539.125.4 Protons                            5
539.125.46 Antiprotons                       2
539.125.5 Neutrons                           7
539.125.56 Antineutrons                      1
539.126.3 Mesons                             9
539.126.4 Resonances                         11
539.126.6 Hyperons                           6
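Search expansion over the hierarchy can be sketched as a prefix match on the notation. The example uses the hit counts shown on this slide, treated as direct postings per class. The range class 539.125/.126 is left out because handling "/" needs extra rules, and the helper names expand and total_hits are assumptions made for this illustration.

```python
# Hits per class as shown on the slide (the range class is omitted).
hits = {
    "539.12": 132,
    "539.125": 38,
    "539.125.4": 5,
    "539.125.46": 2,
    "539.125.5": 7,
    "539.125.56": 1,
    "539.126.3": 9,
    "539.126.4": 11,
    "539.126.6": 6,
}


def digits(notation):
    """Drop the punctuation dots so that prefixes can be compared."""
    return notation.replace(".", "")


def expand(notation, index):
    """Return the class itself and all narrower classes present in the index."""
    stem = digits(notation)
    return sorted((n for n in index if digits(n).startswith(stem)), key=digits)


def total_hits(notation, index):
    """Sum the hits of a class and everything filed beneath it."""
    return sum(index[n] for n in expand(notation, index))


print(expand("539.125", hits))
print(total_hits("539.125", hits))
```

A search on Nucleons expanded this way also retrieves the documents indexed under Protons, Antiprotons, Neutrons and Antineutrons.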

23

ADVANTAGES IN RESOURCE DISCOVERY: DISAMBIGUATION

mercury search

results.....

SUBJECT                                                  # HITS

523.41 ASTRONOMY. Mercury                                2
531.787.4 PHYSICS. Mercury barometers                    3
543.272.81 ANALYTICAL CHEMISTRY. Mercury                 38
546.49 INORGANIC CHEMISTRY. Mercury, Hg                  10
621.181.232 ENGINEERING. Mercury vapour generators       9
66.095.712.49 CHEMICAL INDUSTRY. Mercuration             3

24

ADVANTAGES IN RESOURCE DISCOVERY: PRECISION

rabbit search

results....

SUBJECT                                                # HITS

569.32 Zoology: Rodentia and Lagomorpha                7
632.935.7 Protection of crops                          3
636.92 Animal husbandry. Domestic rabbits              38
636.92.045 Animal husbandry. Domestic rabbits. Pets    10
636.932 Animal husbandry. Rodents kept for fur         9
639.112 Hunting. Small game generally                  22
641.8 Cooking. Main dishes                             2
677.354 Textile industry. Hare fur. Rabbit fur         8

25

SEARCHING INTERFACE

Lang 1 Lapin
Lang 2 Coniglio
Lang 3 Kaninchen
Lang 4 Rabbit

CLASSIFICATION AUTHORITY FILE

SUBJECT AREAS       NOTATION     SEARCH TERMS

ZOOLOGY             599.325.1    Lapin, Coniglio, Kaninchen, Rabbit...
ANIMAL HUSBANDRY    636.92       Lapin, Coniglio, Kaninchen, Rabbit...
FUR INDUSTRY        677.354      Lapin, Coniglio, Kaninchen, Rabbit

hierarchical organization of concepts

SUPPORTING MULTILINGUAL SEARCHING
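The idea on this slide can be sketched as a lookup in which terms in any language resolve to the same class notations. The data below comes from the slide; the language codes, the dictionary layout and the function name search are assumptions made for the example.

```python
# Terms in four languages attached to each class notation.
TERMS = {"fr": "Lapin", "it": "Coniglio", "de": "Kaninchen", "en": "Rabbit"}

authority_file = {
    "599.325.1": {"area": "ZOOLOGY", "terms": dict(TERMS)},
    "636.92": {"area": "ANIMAL HUSBANDRY", "terms": dict(TERMS)},
    "677.354": {"area": "FUR INDUSTRY", "terms": dict(TERMS)},
}


def search(term):
    """Return (notation, subject area) pairs whose terms match in any language."""
    needle = term.casefold()
    return [
        (notation, entry["area"])
        for notation, entry in sorted(authority_file.items())
        if needle in (t.casefold() for t in entry["terms"].values())
    ]


print(search("Kaninchen"))
```

Whatever the language of the query, the user is offered the same choice of subject areas, and retrieval then runs on the notation rather than on the words.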

26

INTEGRATION OF INFORMATION

UDC

Vocabulary 2

Vocabulary 1

library classification is often used as a pivot, i.e. a central mapping structure for the alignment of different vocabularies
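A pivot mapping can be sketched as two lookups chained through the UDC notation. All terms in the sample mappings below are hypothetical and serve only to show the mechanism; the function names invert and translate are likewise assumptions.

```python
# Hypothetical sample mappings: each vocabulary is aligned only with UDC.
vocabulary_1_to_udc = {"Rabbit breeding": "636.92", "Rabbit fur": "677.354"}
vocabulary_2_to_udc = {"Kaninchenzucht": "636.92", "Kaninchenfell": "677.354"}


def invert(mapping):
    """Turn a term -> notation mapping into notation -> list of terms."""
    inverted = {}
    for term, notation in mapping.items():
        inverted.setdefault(notation, []).append(term)
    return inverted


def translate(term, source_to_udc, target_to_udc):
    """Map a term to another vocabulary by going through the UDC class."""
    notation = source_to_udc.get(term)
    if notation is None:
        return []
    return invert(target_to_udc).get(notation, [])


print(translate("Rabbit fur", vocabulary_1_to_udc, vocabulary_2_to_udc))
```

With a pivot, each vocabulary is mapped once to the classification instead of being mapped separately to every other vocabulary.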

27

EXAMPLE

Nebis subject authority file record, ETH-Bibliothek (Zürich) - http://www.ethbib.ethz.ch/index_e.html

28

MARC CLASSIFICATION FORMATS

MARC 21 Concise Format for Classification Data http://www.loc.gov/marc/classification/

Concise UNIMARC Classification Format http://www.ifla.org/VI/3/p1996-1/concise.htm

• offer sufficient support for semantic relationships, but no support for managing and exploiting the complexity of classification syntax or for managing global changes, i.e. the heading field is not structured and does not allow multidirectional access to the meaningful elements of a complex notation

29

REQUIREMENT

machine readable identification of each structural part of notation separates display of numbers/symbols from their function

data element identifiers

51 (410) (091)

UDC number encoding for database management
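To make the requirement concrete, the sketch below splits a complex notation into tagged elements with a regular expression, so that each part is identified by its function rather than by its display form. The element names are labels invented for this example, not the data element identifiers of any MARC or UDC format. The pattern covers only a few common signs, expects straight quotation marks for time, and silently skips characters it does not recognize.

```python
import re

# One alternative per element type; more specific patterns come first.
TOKEN = re.compile(r'''
    (?P<form>\(0[^)]*\))              # common auxiliaries of form, e.g. (091)
  | (?P<ethnic>\(=[^)]*\))            # ethnic grouping, e.g. (=...)
  | (?P<place>\([1-9][^)]*\))         # place, e.g. (410)
  | (?P<time>"[^"]*")                 # time, e.g. "18"
  | (?P<language>=[\d.]+)             # language, e.g. =...
  | (?P<relation>:)                   # colon relation between subjects
  | (?P<hyphen_auxiliary>-\d[\d.]*)   # auxiliaries introduced by a hyphen
  | (?P<main>\d[\d.]*)                # main number
''', re.VERBOSE)


def tag_elements(notation):
    """Return (element type, text) pairs for a notation, ignoring spaces."""
    compact = notation.replace(" ", "")
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(compact)]


print(tag_elements("51 (410) (091)"))
print(tag_elements('94(410.5)"18"'))
```

Once the elements are stored with their types, a database can retrieve, for example, every record with place (410) regardless of where it occurs in the notation.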

30

NETWORK – INITIALLY RESOURCE DISCOVERY

German Harvest Automated Retrieval and Directory - GERHARD

subject gateway - automatic classification of the German web based on UDC data from the ETH library authority file (GERHARD website was shut down in 2005).

read more at http://www.bis.uni-oldenburg.de/abt1/waetjen/publ/Article.pdf

31

32

TASKS FOR CLASSIFICATION DEVELOPERS

Improving classification data at their source:

• provide rich, machine readable classification data exposing semantic relationships and providing multiple access points to notation and words

• enable sharing by distributing data in different standard formats

• find a way of releasing part of the data into the public domain for testing and training

• make sure that copyright regulations do not impede the use of classification in information integration and exchange

33

EXAMPLE - UDC

UDC Master Reference File (MRF) data has been distributed to users in a file format since 1993.

data is improved: unique identifier for every class (independent from notation), semantic and syntactic relationships declared, syndetic structure improved

MRF 2008 exports will be available in MARC and SKOS standards or as on-demand SQL statements, plus various TEXT/XML outputs

pending:

• improvement of verbal access (subject-alphabetical index)
• merging the existing multilingual data into one database

future plans: inclusion of mapping to other vocabularies

looking for projects to test semantic technologies and to explore how part of the UDC data can be tested in an open m2m environment

34

IN SUMMARY

development of new standards opens new possibilities for sharing and use of classification: new services and new solutions

to support a new kind of user, classification has to be exposed in a machine-readable, standardized format and made accessible to programs and services on the network

issues for owners: costs, copyright policy

--- END OF PRESENTATION ---