Linked data and the LOCAH project ILI2011


DESCRIPTION

Slides for a presentation given at the Internet Librarian International Conference (ILI2011), October 2011


Bethan Ruddock, Library and Archival Services, Mimas, University of Manchester

bethan.ruddock@manchester.ac.uk @bethanar

LINKED DATA AND THE LOCAH PROJECT

#ILI2011

LINKED OPEN COPAC & ARCHIVES HUB

JISC-funded project (under JISCexpo - exposing digital content for education and research)

September 2010 – August 2011

Staff from Mimas, UKOLN, Eduserv

Additional expertise from Talis, OCLC, Library of Congress

PROJECT AIMS

Put archival and bibliographic data at the heart of the Linked Data Web, making new links between diverse content sources, enabling the free and flexible exploration of data and enabling researchers to make new connections between subjects, people, organisations and places to reveal more about our history and society.

Make a collection of resources available on the Web as structured data, in particular linked data, where a case can be made that it would benefit teaching, learning, research, administration and/or knowledge transfer in UK higher education

Develop a prototype with instructional step-by-step demonstration and documentation to show how the structured content can be used by 3rd party tools and services

Explore and report on the opportunities and barriers in making content structured and exposed on the Web for discovery and use. Such opportunities and barriers may coalesce around licensing implications, trust, provenance, sustainability and usability

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

THE DATA: COPAC

• Merged union catalogue of the holdings of over 60 UK libraries

• Over 50 million records

• Consolidated records

• MODS XML (not MARC)

A Copac consolidated record created from 5 contributed records. Lines show how contributed records match with one another.

THE DATA: ARCHIVES HUB

• Descriptions of archive collections from over 200 UK repositories

• Nearly 25,000 descriptions – collection-level and multi-level

• EAD (Encoded Archival Description)

CHALLENGES: VARIANCE

• Data from many sources – should adhere to standards

AACR2, ISAD(G)

BUT

Differences in implementation

CHALLENGES: DATA

dct:publisher: unknown

260 $b: unknown

dct:publisher definition: ‘entity responsible for making the resource available’
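One possible reading of this slide is that a placeholder such as "unknown" in 260 $b does not name an entity at all, so it arguably should not become a dct:publisher statement. The sketch below illustrates that idea only; it is not the LOCAH transformation. The function name, the placeholder set and the literal return value are all assumptions made for illustration.

```python
# Values treated as "no publisher given". Only "unknown" comes from the slide;
# extending this set for real data would be a local decision.
PLACEHOLDER_VALUES = {"unknown"}


def publisher_statement(subfield_b):
    """Return a (property, value) pair for a 260 $b value, or None.

    Returns None when the value is empty or a placeholder, so that no
    dct:publisher statement is produced for a publisher nobody can name.
    """
    value = subfield_b.strip()
    # Ignore surrounding brackets, full stops and spaces when comparing.
    key = value.strip("[]. ").lower()
    if not key or key in PLACEHOLDER_VALUES:
        return None
    return ("dct:publisher", value)
```

Called with "unknown" the function returns None, while a real name is passed through unchanged.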

CHALLENGES: MULTIPLE SOURCES

A ‘match graph’ of a consolidated Copac record

CHALLENGES: VOCABULARY

Diagram labels: Stuff; created; collected; relates to

ORIGINATION

LICENSING

• Data comes from contributors: not ours to redistribute!

• Concerns: provenance, trust, control

• Consulted: liaised with contributors and stakeholders

THE TECHY STUFF

Specifications required a lot of brainstorming…

Image used under a CC licence from http://www.flickr.com/photos/blankdots/4865831504/

ARCHIVES HUB MODEL

Diagram of the Archives Hub data model.

Entities: ArchivalResource, Finding Aid, EAD Document, Biographical History, Agent, Family, Person, Organisation, Repository (Agent), Place, PostcodeUnit, Concept, ConceptScheme, Genre, Function, Book, Object, Language, Level, Extent, Creation, Birth, Death, TemporalEntity

Relationships: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, encodedAs/encodes, administeredBy/administers, hasBiogHist/isBiogHistFor, foaf:focus, is-a, level, language, inScheme, representedBy, extent, participates in, at time, product of, in

COPAC MODEL

Node name: BibliographicResource    MODS field: <modscollection>    Ontology: bibo

Cardinality  Property               URI/literal                                       Ontology
0..1         copac:creator          Creator URI                                       dc
0..m         copac:contributor      Contributor URI                                   copac
0..1         event:producedIn       Production Date URI                               event
0..1         dct:issued             Production Date URI                               dc
0..m         pode:publicationPlace  Place URI                                         pode
0..m         isbd:P1016             Place URI                                         isbd
0..m         dct:publisher          Publisher URI                                     dc
0..1         dct:isPartOf           Series URI                                        dc
1..m         copac:HeldBy           Institution URI (with Institution as subject)
1..1         bibo:type              Type URI                                          bibo
0..m         dct:subject            Subject URI                                       dc
0..m         skos:subject           Subject URI                                       skos
0..m         dct:language           Language URI                                      dc
1..1         hub:encodedAs          mods URI                                          hub
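The cardinality column lends itself to a simple automated check. The following is a minimal sketch, assuming a resource is held as a mapping from prefixed property name to a list of value URIs. That representation, the function name and the example.org URIs are hypothetical and are not part of the LOCAH code.

```python
# Cardinality rules transcribed from the Copac model table: (min, max),
# where a max of None stands for "m" (many).
CARDINALITY = {
    "copac:creator": (0, 1),
    "copac:contributor": (0, None),
    "event:producedIn": (0, 1),
    "dct:issued": (0, 1),
    "pode:publicationPlace": (0, None),
    "isbd:P1016": (0, None),
    "dct:publisher": (0, None),
    "dct:isPartOf": (0, 1),
    "copac:HeldBy": (1, None),
    "bibo:type": (1, 1),
    "dct:subject": (0, None),
    "skos:subject": (0, None),
    "dct:language": (0, None),
    "hub:encodedAs": (1, 1),
}


def check_cardinality(resource):
    """Return a list of problems found in one resource description.

    `resource` maps a prefixed property name to a list of value URIs.
    An empty list means every rule in CARDINALITY is satisfied.
    """
    problems = []
    for prop, (minimum, maximum) in CARDINALITY.items():
        count = len(resource.get(prop, []))
        if count < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {count}")
        if maximum is not None and count > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {count}")
    return problems


# Hypothetical record: every URI here is a placeholder, not real Copac data.
example = {
    "bibo:type": ["http://example.org/type/book"],
    "hub:encodedAs": ["http://example.org/mods/1"],
    "copac:HeldBy": ["http://example.org/institution/a"],
    "dct:subject": ["http://example.org/subject/x", "http://example.org/subject/y"],
}
```

Passing `example` to `check_cardinality` returns an empty list. Removing one of the three required properties (copac:HeldBy, bibo:type, hub:encodedAs) produces one message for that property.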

data.copac.ac.uk

data.archiveshub.ac.uk

Visualisation prototype using Timemap: Google Maps and Simile

http://code.google.com/p/timemap/

Early stages with this

Will give location and ‘extent’ of archive.

Will link through to Archives Hub

LINKING

Diagram nodes: BBC:Cranford; Copac:Cranford; VIAF:Dickens; DBPedia:Dickens; Hub:Dickens; DBPedia:Gaskell; Hub:Gaskell; Geonames:Manchester

CHALLENGES: ANONYMOUS

Mask image used under a CC licence from http://www.yourbdnews.com

Variant forms shown on the slide: Anonymous, anonymous, Anon., anon., anon
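The slide does not say how the project dealt with these values. As a minimal sketch, the variant spellings could be recognised before any agent URI is minted, so that unrelated works are not all attached to one "Anonymous" agent. The function names and the set of recognised forms are assumptions based only on the variants shown on the slide.

```python
# Lower-cased forms, without the trailing full stop, that count as "anonymous".
ANONYMOUS_FORMS = {"anonymous", "anon"}


def is_anonymous(name):
    """Return True if a creator string is one of the 'anonymous' variants."""
    key = name.strip().rstrip(".").strip().lower()
    return key in ANONYMOUS_FORMS


def normalise_creator(name):
    """Collapse every anonymous variant to one label; leave other names alone."""
    return "Anonymous" if is_anonymous(name) else name.strip()
```

A pipeline could call `is_anonymous` on each creator string and treat the matches separately from named agents.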

data.copac.ac.uk/doc/bibliographicresource/6947473

data.copac.ac.uk/doc/concept/agent/6947473lacywilliam

data.copac.ac.uk/doc/bibliographicresource/6947473

data.copac.ac.uk/doc/agent/rys

data.archiveshub.ac.uk/doc/archivalresource/gb1086colour

data.archiveshub.ac.uk/doc/concept/unesco/photography
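The example addresses above share a visible pattern: dataset host, then "doc", then a resource type of one or more segments, then an identifier. The sketch below splits an address along those lines. The pattern is inferred from these six examples only, and the function name and returned field names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_locah_uri(uri):
    """Split an address such as data.copac.ac.uk/doc/agent/rys into its parts."""
    # The slides omit the scheme; add one so urlsplit can find the host.
    if "://" not in uri:
        uri = "http://" + uri
    parts = urlsplit(uri)
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) < 3:
        raise ValueError(f"unexpected path layout: {parts.path!r}")
    return {
        "dataset": parts.netloc,
        "kind": segments[0],                # "doc" in every example shown
        "type": "/".join(segments[1:-1]),   # may span several segments
        "identifier": segments[-1],
    }
```

For data.archiveshub.ac.uk/doc/concept/unesco/photography this gives the type "concept/unesco" and the identifier "photography".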

WHAT NEXT?

Linking Lives

• name-based approach into the data

• integrating archival resource with other resources: DBPedia, VIAF, Copac...

• route into archives for different audiences?

• issues around trust and provenance to be explored

FINALLY…

The LOCAH data is open for use…

…please play with it!

Image used under a CC licence from http://www.flickr.com/photos/huladancer22/530743543/

@bethanar

bethaninfoprof.wordpress.com

bethan.ruddock@manchester.ac.uk

LOCAH blog: http://blogs.ukoln.ac.uk/locah/

Image used under a CC licence from http://www.flickr.com/photos/theilluminated/5386099858/
