Linking Library of Congress Subject Headings Owen Stephens 14th July 2011 @ostephens http://www.meanboyfriend.com/overdue_ideas Thursday, 14 July 2011

Linking lcsh and other stuff


DESCRIPTION

Lightning talk I gave at the Libraries and Linked Data day organised by Talis at the British Library on 14th July 2011


Page 1: Linking lcsh and other stuff

Linking Library of Congress Subject Headings

Owen Stephens 14th July 2011 @ostephens

http://www.meanboyfriend.com/overdue_ideas


Page 2: Linking lcsh and other stuff

LNKNLCSH @ostephens


This is the lightning version

I should preface this talk by saying I’m really pleased that the LoC have invested in experimenting with Linked Data representations of aspects of their data. Nothing in this talk is a criticism of that work; it is about the issues we encountered using aspects of the data. It’s possible that some or all of these problems may have been down to my lack of understanding of LCSH and Linked Data :)

Page 3: Linking lcsh and other stuff

Library chops


I’m a librarian - by nature and qualification :) - see http://www.meanboyfriend.com/overdue_ideas/2010/11/library-routes/

I’ve been working on the cusp between libraries and IT since 1995. Spending the early part of my career in small libraries means I have worked in just about every area of library front of house and back office. However, although I’ve catalogued books, and have more than a passing familiarity with MARC, I’m not a cataloguer, and not an expert on LCSH.

Page 4: Linking lcsh and other stuff

Linked Data chops


I’ve been trying to understand the Semantic Web/Linked Data for several years :) My understanding has been accelerated over the last couple of years by involvement in several projects in the Linked Data space. Specifically the Lucero and CORE projects at the Open University

Page 5: Linking lcsh and other stuff


Expressing similarity between published papers in UK research repositories
Harvest metadata and full-text (50k papers from 143 UK repos so far)
Text mine for relationships
Expose ‘similarity’ measure as RDF triples using MuSIM Ontology (originally developed for Music, but equally applicable)
For more information http://core-project.kmi.open.ac.uk

Page 6: Linking lcsh and other stuff

Exposing RDF


Three ‘products’
CORE Portal - search or SPARQL metadata for harvested papers
CORE Mobile - Android application to search & navigate across related papers & downloading articles
CORE Plugin - Designed to integrate into existing repository interface to link to ‘related papers’ in other repos, based on CORE ‘similarity’

For more information http://core-project.kmi.open.ac.uk
SPARQL Endpoint at http://core.kmi.open.ac.uk:8081/COREWeb/squery
How we express data in RDF: http://core-project.kmi.open.ac.uk/node/13

Page 7: Linking lcsh and other stuff

Lucero


For more information see http://lucero-project.info
Data and SPARQL Endpoint available via http://data.open.ac.uk

Lucero published a variety of data from the Open University as linked open data - admin data (buildings), course data (course catalogue, OERs), research data and data about bibliographic resources - including materials in the library (focussed on materials related to course materials - around 30k catalogue records)

Page 8: Linking lcsh and other stuff

LCSH

Lots has been written about LCSH, its structure, and whether it should be replaced. I don’t want to spend too much time on this today but it may come up in places.

However it is probably worth recapping my understanding (if only to let those more knowledgeable correct it)

The key aspect in the context of this talk is that LCSH is primarily a pre-coordinated system - that is, facets of subject headings are pre-combined into a single, multi-faceted heading. Although... “LCSH itself requires some degree of post-coordination of the pre-coordinated strings to bring out specific topics of works.” (http://www.loc.gov/catdir/cpso/pre_vs_post.pdf)

In fact the way that LCSH is structured in MARC records, and the way that indexes can be built on this in library management systems means that

I’m going to focus on ‘Topical’ subject headings (confusingly to me, LCSH can also cover Name, Title and Geographic headings)

Topical Terms can represent “a concrete object, animal, etc.; a category of people, animals, or objects; a more abstract concept, belief, process, or phenomenon; an institution, etc.” (http://www.tulane.edu/~techserv/lcsh%20introd.html) Topical LC Subject Headings are built by combining ‘Topical Terms’ with qualifiers (‘subdivisions’) which allow you to contextualise the term. The types of subdivision available are:

General (a high level general qualifier - e.g. ‘History’)
Chronological (period of time - e.g. ‘20th Century’)
Geographic (place - e.g. ‘Great Britain’)
Form (the type/genre of material - e.g. ‘Dictionary’)

There are a large number of rules that express how these subdivisions can be used in conjunction with Topical Terms, and the order in which they should be expressed. Not all combinations are valid - for example, only certain General subdivisions may be further subdivided Geographically. The rules are not always black and white - they have ‘examples’ lists which you can use to judge whether a combination might be valid in a given situation.

Perhaps suffice to say that a document called ‘BASIC SUBJECT CATALOGING USING LCSH: Trainee’s Manual’ is 382 pages long.

Subject heading strings can be valid (i.e. constructed according to rules/patterns) while not being ‘Authorized’ - in this context an Authorized Heading is “A preferred subject term as decided and established by the Library of Congress by means of an authority record.” (Thanks to Tom Meehan for this definition)

Page 9: Linking lcsh and other stuff


Thanks to work of Ed Summers and others, the Library of Congress have a Linked Data representation of LCSH in SKOS. However, this only covers ‘Authorized’ LCSH - presumably because only those LCSH with an Authority record have an identifier within LoC systems? (I’m speculating)

Page 10: Linking lcsh and other stuff


This is a catalogue record from the OU - the two strings listed as ‘Subjects’ are LCSH (for the cataloguers amongst you, MARC 650s)

Can see the linked data representation at http://data.open.ac.uk/page/library/289148

Page 11: Linking lcsh and other stuff

Science--Study and Teaching--Research

Topical Term

General Subdivision

General Subdivision


This is made up of a Topical Term - Science and two general subdivisions ‘Study and Teaching’ and ‘Research’
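As a rough illustration of that breakdown, here is a minimal Python sketch that splits a heading string into its leading Topical Term and the subdivisions that follow. It assumes the `--` separator used on the slide; the `Heading` class and `parse_heading` function are hypothetical names, and the sketch makes no attempt to tell General, Chronological, Geographic and Form subdivisions apart (in a MARC record that information comes from the subfield codes, not the string).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Heading:
    """A subject heading string split into its parts (illustrative only)."""
    topical_term: str
    subdivisions: List[str]


def parse_heading(heading: str) -> Heading:
    # Split on the "--" separator and drop surrounding whitespace.
    parts = [p.strip() for p in heading.split("--") if p.strip()]
    if not parts:
        raise ValueError("empty heading string")
    # The first part is treated as the Topical Term, the rest as subdivisions.
    return Heading(topical_term=parts[0], subdivisions=parts[1:])


if __name__ == "__main__":
    h = parse_heading("Science--Study and Teaching--Research")
    print(h.topical_term)
    print(h.subdivisions)
```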

Page 12: Linking lcsh and other stuff

Science--Study and Teaching--Research

id.loc.gov ?


This is (afaik - I trust the cataloguers) a valid LCSH ... however it is not authorized ... and so does not have a URI on id.loc.gov

Page 13: Linking lcsh and other stuff

Science--Study and Teaching--Research

http://id.loc.gov/authorities/sh85118587#concept


“Science--Study and Teaching”, however, is an authorized heading

Page 14: Linking lcsh and other stuff

Science--Study and Teaching--Research

http://id.loc.gov/authorities/sh85118553#concept

N.B. This is the URI for Science as a Topical Term, not http://id.loc.gov/authorities/sh00007934#concept which is the URI for Science as a General Subdivision


As is “Science”

Page 15: Linking lcsh and other stuff

Science--Study and Teaching--Research

http://id.loc.gov/authorities/sh2001008697#concept


Also “Study and Teaching” (as a topical subdivision) is an authorized heading

Page 16: Linking lcsh and other stuff

Science--Study and Teaching--Research

http://id.loc.gov/authorities/sh2002006576#concept

N.B. This is the URI for Research as a General Subdivision, not http://id.loc.gov/authorities/sh85113021#concept which is the URI for Research as a Topical Term


Also “Research” (as a topical subdivision) is an authorized heading

Page 17: Linking lcsh and other stuff

More links please


If we only used id.loc.gov URIs where we had an authorised LCSH, we would end up with only a small number of links. Some URIs in id.loc.gov would never be used in this way as they only represent subdivisions - never valid by themselves.

Therefore we decided to check a variety of combinations against id.loc.gov
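A minimal Python sketch of one way such a check could work, assuming the candidate combinations are the leading runs of the heading plus each later component treated as a subdivision. This is not necessarily the exact set of combinations we tried. The `KNOWN_URIS` table is a hypothetical local stand-in for a lookup against id.loc.gov, filled only with the URIs shown on the slides; `candidates` and `link_heading` are illustrative names.

```python
from typing import Dict, List, Tuple

# Local stand-in for an id.loc.gov lookup, keyed by (label, role).
# Only the URIs quoted on the slides are included.
KNOWN_URIS: Dict[Tuple[str, str], str] = {
    ("Science--Study and Teaching", "heading"):
        "http://id.loc.gov/authorities/sh85118587#concept",
    ("Science", "heading"):
        "http://id.loc.gov/authorities/sh85118553#concept",
    ("Study and Teaching", "subdivision"):
        "http://id.loc.gov/authorities/sh2001008697#concept",
    ("Research", "subdivision"):
        "http://id.loc.gov/authorities/sh2002006576#concept",
}


def candidates(heading: str) -> List[Tuple[str, str]]:
    """Return (label, role) pairs to try, longest leading run first."""
    parts = [p.strip() for p in heading.split("--") if p.strip()]
    # Leading runs: the full string, then progressively shorter prefixes.
    result = [("--".join(parts[:end]), "heading")
              for end in range(len(parts), 0, -1)]
    # Every component after the first is tried as a subdivision.
    result += [(part, "subdivision") for part in parts[1:]]
    return result


def link_heading(heading: str) -> Dict[str, str]:
    """Map each candidate label that has a known URI to that URI."""
    return {label: KNOWN_URIS[(label, role)]
            for label, role in candidates(heading)
            if (label, role) in KNOWN_URIS}


if __name__ == "__main__":
    links = link_heading("Science--Study and Teaching--Research")
    for label, uri in links.items():
        print(label, "->", uri)
```

Keying the table on role as well as label keeps Science as a Topical Term separate from Science as a General Subdivision, which have different URIs.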

Page 18: Linking lcsh and other stuff

Science--Study and Teaching--Research
http://data.open.ac.uk/page/topic/library/science--study_and_teaching--research

Science--Study and teaching
http://id.loc.gov/authorities/sh85118587#concept

Science
http://id.loc.gov/authorities/sh85118553#concept

Study and Teaching
http://id.loc.gov/authorities/sh2001008697#concept

Research
http://id.loc.gov/authorities/sh2002006576#concept


Page 19: Linking lcsh and other stuff

MADS?

http://www.loc.gov/standards/mads/rdf/


As far as I can see MADS (apart from looking complex) models the Authority, not the heading, so it doesn’t solve the problem we saw here!

That is, MADS would solve the problem only for Authorized headings (which it does represent as component parts - which I think addresses the issues raised by Karen Coyle at http://kcoyle.blogspot.com/2009/05/lcsh-as-linked-data-beyond-dash-dash.html)

Happy to be corrected...

Page 20: Linking lcsh and other stuff

bibo:authorList ( <http://examples.net/contributors/2> <http://examples.net/contributors/1> )

lcsh:headingList ( <http://id.loc.gov/authorities/sh85118553#concept> <http://id.loc.gov/authorities/sh2001008697#concept> <http://id.loc.gov/authorities/sh2002006576#concept> )

A different approach?


If we could use an rdf:List to represent the pre-coordinated string of headings, then we wouldn’t care whether it was ‘authorized’ or not, and would have all the individual headings there as well (BIBO lists authors individually and as a list)

Again copying BIBO, which has each author as a dc:creator as well, we could represent each part of the subject string as a separate dc:subject.
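A minimal sketch of what that could look like, building the Turtle with plain Python string formatting. The lcsh:headingList property is only the one floated on the slide, not something published at id.loc.gov. The lcsh: namespace URI and the record URI below are made-up placeholders, and binding dc: to the DCMI terms namespace is an assumption. `heading_to_turtle` is a hypothetical helper name.

```python
from typing import List

PREFIXES = (
    "@prefix dc: <http://purl.org/dc/terms/> .\n"
    # Placeholder namespace: no such vocabulary is known to exist.
    "@prefix lcsh: <http://example.org/lcsh-sketch#> .\n"
)


def heading_to_turtle(record_uri: str, component_uris: List[str]) -> str:
    """Express a subject string as an ordered list plus individual subjects."""
    if not component_uris:
        raise ValueError("need at least one component URI")
    # Turtle collection syntax "( a b c )" yields an rdf:List, preserving order.
    members = " ".join(f"<{uri}>" for uri in component_uris)
    # Each component is also repeated as a plain dc:subject.
    subjects = " ,\n        ".join(f"<{uri}>" for uri in component_uris)
    return (
        f"{PREFIXES}\n"
        f"<{record_uri}>\n"
        f"    lcsh:headingList ( {members} ) ;\n"
        f"    dc:subject {subjects} .\n"
    )


if __name__ == "__main__":
    # Science, Study and Teaching, Research (URIs as given on the slides).
    components = [
        "http://id.loc.gov/authorities/sh85118553#concept",
        "http://id.loc.gov/authorities/sh2001008697#concept",
        "http://id.loc.gov/authorities/sh2002006576#concept",
    ]
    print(heading_to_turtle("http://example.org/record/1", components))
```

The list keeps the order of the pre-coordinated string, while the repeated dc:subject statements let a simple query find the record by any single component.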

In a MADS world there would be an advantage to expressing the full authorized heading as well (for relationships derived in MADS), although there is still the question of expressing ‘authorized fragments’, which seems to me would also be useful with MADS for the same reasons

This feels like a simple approach that would at least allow us to capture the component parts of the subject string (and personally I’m not sure we ought to go further than this? do we need to? why?). My feeling is that lots of the work goes into representing the ‘Authority file’ as opposed to how subject headings are used in the real world ... is this fair?

Page 21: Linking lcsh and other stuff

Details: http://discovery.ac.uk/developers/competition/
Datasets: http://ckan.net/group/ukdiscovery
Ask Questions: http://getthedata.org or #discodev


Finally, just an advert - if you are interested in open data in the library/archive/museum space please consider entering this competition :) - it’s a chance to really show the value of this stuff!