
Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data


DESCRIPTION

Presentation given at the Dev8D Developer Days event at the University of London Students Union, London, UK on 15th February 2011. The talk was primarily aimed at developers, with the assumption that they knew a bit about RDF and Linked Data, so it doesn’t discuss these except in passing. I was mainly trying to give some specifics on the technicalities involved, and on the platforms and tools we’re using, so people can follow the same path if they want. More info at http://blogs.ukoln.ac.uk/locah/2011/02/14/locah-lightening-at-dev8d/ and http://wiki.2011.dev8d.org/w/Session-L18


Page 1: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

                                                             

www.bath.ac.uk

UKOLN is supported by:

Do the LOCAH-Motion: How to Make Archival and Bibliographic Linked Data

16th February 2011

Dev8D, University of London, UK

Adrian Stevenson

LOCAH Project Manager

Page 2: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

What is the LOCAH Project?
• Linked Open Copac and Archives Hub
• Funded by #JiscEXPO 2/10 ‘Expose’ call
• 1 year project. Started August 2010
• Partners & Consultants:
– UKOLN – Adrian Stevenson, Julian Cheal
– Mimas – Jane Stevenson, Bethan Ruddock, Yogesh Patel
– Eduserv – Pete Johnston
– Talis – Leigh Dodds, Tim Hodson
– OCLC – Ralph LeVan, Thom Hickey
– Ed Summers
• http://blogs.ukoln.ac.uk/locah/ tag: #locah

Page 3: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

What are the Archives Hub and Copac?
• The Archives Hub is an aggregation of archival descriptions from archive repositories across the UK - http://archiveshub.ac.uk
• Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries - http://copac.ac.uk

Page 4: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

What is Linked Data?

• URIs

• LD Design Issues

• Triples

http://www.w3.org/DesignIssues/LinkedData.html

Page 5: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

What does Linked Data Offer?
• Haven’t we been putting linked data on the web for years?
– In CSV, relational databases, XML etc?
• Well yes, but these approaches are not easy to integrate
• Web 2.0 mashups work against a fixed set of data sources
• Linked Data applications operate on top of an unbound, global data space.

Page 6: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

What is LOCAH Doing?

• Part 1: Exposing the Linked Data

• Part 2: Creating a prototype visualisation

• Part 3: Reporting on opportunities and barriers

Page 7: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

How are we Exposing the LOCAH Linked Data?
1. Model our ‘things’ into RDF
2. Transform the existing data into RDF/XML
3. Enhance the data
4. Load the RDF/XML into a triple store
5. Create Linked Data Views
6. Document the process, opportunities and barriers on LOCAH Blog

Page 8: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

1. Modelling ‘things’ into RDF
• Archives Hub data is in ‘Encoded Archival Description’ (EAD) XML form
• Copac data is in ‘Metadata Object Description Schema’ (MODS) XML form
• Take a step back from the data format
– Think about your ‘things’
– What is the EAD document “saying” about “things in the world”?
– What questions do we want to answer about those “things”?
• Can help make data more user-centric

http://www.loc.gov/ead/
http://www.loc.gov/standards/mods/

Page 9: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Triples
• Thinking falls naturally into ‘triple’ statements
– ‘Things’ have ‘properties’ with ‘values’
– Subject – Predicate – Object
• Triples are the basis of RDF
• More on all this at http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/

[Diagram: Repository – Provides Access To – ArchivalResource]
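The Repository / Provides Access To / ArchivalResource statement can be written down as a plain subject-predicate-object tuple. This is only a rough sketch: the example.ac.uk URIs and the providesAccessTo property URI are made up for illustration and are not the identifiers LOCAH publishes.

```python
from collections import namedtuple

# A triple is just (subject, predicate, object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical URIs, invented for this sketch.
BASE = "http://example.ac.uk/"
repository = BASE + "id/repository/example-repository"
archive = BASE + "id/archivalresource/example-archive"
provides_access_to = BASE + "def/providesAccessTo"

statement = Triple(repository, provides_access_to, archive)


def to_ntriples(triple):
    """Serialise a triple whose three parts are all URIs as one N-Triples line.

    Literal objects (strings, dates) would need quoting and are not handled here.
    """
    return "<{}> <{}> <{}> .".format(*triple)


print(to_ntriples(statement))
```

Keeping the statement as a tuple makes the 'things have properties with values' idea concrete before any RDF tooling is involved.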

Page 10: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Data Modelling Challenges
• Archival description is hierarchical and multi-level
• Information is provided about an aggregation of records, and then about its component parts
• Multi-level approach gives a strong sense of “context”
– “lower level” units are interpreted in the context of the higher levels of description
– Arguably “incomplete” without the contextual data
• Linked Data involves ‘bounded’ descriptions
– Relations are asserted, e.g. member-of/component-of
– But there is no requirement or expectation that data consumers will follow the links describing the relations
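To make the tension concrete, here is a small sketch of a ‘bounded’ description next to the hierarchical context it leaves out. The data, the ex: identifiers and the simplified partOf and title property names are invented for illustration. Treating a bounded description as "all triples with this subject" is one simple reading, not necessarily the one the project uses.

```python
PART_OF = "partOf"
TITLE = "title"

# Made-up three-level hierarchy: collection > series > item.
TRIPLES = [
    ("ex:collection", TITLE, "Example family papers"),
    ("ex:series1", TITLE, "Correspondence"),
    ("ex:series1", PART_OF, "ex:collection"),
    ("ex:item1", TITLE, "Letter"),
    ("ex:item1", PART_OF, "ex:series1"),
]


def bounded_description(triples, resource):
    """Return only the triples whose subject is the given resource."""
    return [t for t in triples if t[0] == resource]


def context_chain(triples, resource):
    """Follow partOf links upwards and return the ancestors, nearest first."""
    chain = []
    seen = {resource}
    current = resource
    while True:
        parents = [o for s, p, o in triples if s == current and p == PART_OF]
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)


print(bounded_description(TRIPLES, "ex:item1"))
print(context_chain(TRIPLES, "ex:item1"))
```

The bounded description of the item carries the partOf link but none of the collection-level information. A consumer only gets the archival context by choosing to follow the chain.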

Page 11: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Data Modelling Challenges
• Hub: inconsistencies in data and lack of standardisation
– there's actually no content standard in the UK
• Copac: not a standard library catalogue
– merged catalogues, de-duplicated to an extent, but de-duplication cannot be done entirely

Page 12: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

1. Modelling ‘things’ into RDF
• Decide on patterns for URIs we generate
• Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’
– E.g. http://example.ac.uk/id/findingaid/gb1086skinner ‘thing’ URI
– Use HTTP 303 ‘See Other’ to redirect to …
– E.g. http://example.ac.uk/doc/id/findingaid/gb1086skinner doc URI
– Content negotiates to …
– http://example.ac.uk/doc/…/doc.rdf , …/doc.html for documents about things
– More info at http://blogs.ukoln.ac.uk/locah/2010/11/16/identifying-the-things-uri-patterns-for-the-hub-linked-data/

http://www.w3.org/TR/cooluris/
http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
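The 303 and content negotiation steps can be sketched as two small functions. This is an illustration under assumptions, not the project's server code. The prefix swap simply mirrors the two example URIs on the slide, and the Accept-header handling is a simplified q-value comparison. The function only returns the document name (doc.rdf or doc.html) because the slide elides the full path.

```python
THING_PREFIX = "http://example.ac.uk/id/"
DOC_PREFIX = "http://example.ac.uk/doc/id/"  # mirrors the slide's example doc URI

# Media types this sketch knows how to serve, mapped to the slide's document names.
SUPPORTED = {"application/rdf+xml": "doc.rdf", "text/html": "doc.html"}


def see_other(thing_uri):
    """Return the (status, Location) pair for a request to a 'thing' URI."""
    if not thing_uri.startswith(THING_PREFIX):
        raise ValueError("not a 'thing' URI: " + thing_uri)
    return 303, DOC_PREFIX + thing_uri[len(THING_PREFIX):]


def negotiate(accept_header, default="doc.html"):
    """Pick doc.rdf or doc.html from an Accept header (simplified q-value handling)."""
    best_name, best_q = default, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > best_q:
            best_name, best_q = SUPPORTED[media_type], q
    return best_name


print(see_other("http://example.ac.uk/id/findingaid/gb1086skinner"))
print(negotiate("application/rdf+xml"))
```

Wildcards such as */* are not interpreted here and fall through to the HTML default. A real deployment would normally leave this to the web server or framework.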

Page 13: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

1. Modelling ‘things’ into RDF
• Using existing RDF vocabularies:
– DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies
• Define additional RDF terms where required
– FindingAid
– ArchivalResource
– maintenanceAgency
• It can be hard to know where to look for vocabs and ontologies
• Decide on license – CC0, ODC PDDL

Page 14: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

[Diagram: Archives Hub Model (as at 14/2/2011)]

Entities shown: ArchivalResource, Finding Aid, EAD Document, Biographical History, Agent, Family, Person, Organisation, Repository (Agent), Place, Concept, ConceptScheme, Genre, Function, Book, Language, Level, Object, PostcodeUnit, Extent, Creation, Birth, Death, TemporalEntity

Relationships shown: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, encodedAs/encodes, administeredBy/administers, hasBiogHist/isBiogHistFor, foaf:focus, is-a, level, language, inScheme, representedBy, extent, participates in, at time, product of, in

Page 15: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Copac Model (as at November 2010 – work in progress)

Page 16: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Feedback Requested!
• We would like feedback on the model
• Appreciate this will be easier when the data is available
• Via blog
– http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/
– http://blogs.ukoln.ac.uk/locah/2010/11/08/some-more-things-some-extensions-to-the-hub-model/
– http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/
• Via email, twitter, in person at Dev8D

Page 17: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

2. Transforming into RDF/XML
• Need to transform data in EAD and MODS to RDF/XML, based on our models
• For Hub data we created an XSLT stylesheet and used the Saxon XSLT processor
– http://saxon.sourceforge.net/
– Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files
• For Copac data we created an in-house Java transformation program
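To give a feel for the EAD-to-RDF/XML step, here is a rough Python sketch using xml.etree.ElementTree. It is a stand-in for illustration only: the project itself uses XSLT with Saxon, and the real stylesheet covers far more of the model. The sample EAD fragment, its title text, the archivalresource URI segment and the choice of dcterms:title and dcterms:identifier are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dcterms", DCT_NS)

# Tiny made-up EAD fragment; real finding aids are much richer.
EAD_SAMPLE = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>GB 1086 Skinner</unitid>
      <unittitle>Example collection title</unittitle>
    </did>
  </archdesc>
</ead>"""


def ead_to_rdfxml(ead_xml, base="http://example.ac.uk/id/archivalresource/"):
    """Turn the top-level <did> of an EAD document into a small RDF/XML description."""
    did = ET.fromstring(ead_xml).find("./archdesc/did")
    if did is None:
        raise ValueError("no archdesc/did element found")
    unitid = (did.findtext("unitid") or "").strip()
    title = (did.findtext("unittitle") or "").strip()
    # Hypothetical URI pattern: lower-cased unit id with whitespace removed.
    slug = "".join(unitid.lower().split())
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": base + slug}
    )
    ET.SubElement(desc, f"{{{DCT_NS}}}title").text = title
    ET.SubElement(desc, f"{{{DCT_NS}}}identifier").text = unitid
    return ET.tostring(root, encoding="unicode")


print(ead_to_rdfxml(EAD_SAMPLE))
```

An XSLT stylesheet expresses the same mapping declaratively, which is easier to maintain once the number of EAD elements being handled grows.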

Page 18: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

3. Enhancing our data
• Already have some links:
– lexvo.org URIs for languages of archival materials
– reference.data.gov.uk URIs for time periods
– URIs for postcodes, using both UK Postcodes URIs and Ordnance Survey URIs
• Currently also looking at:
– Virtual International Authority File
• Matches and links widely-used authority files - http://viaf.org/
– Library of Congress Subject Headings
– DBPedia
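As an example of the simplest kind of enhancement, a language code can be turned into a link to lexvo.org. This sketch assumes the input is a three-letter ISO 639-3 code and uses dcterms:language as the linking property. Both are assumptions for illustration, as is the example.ac.uk resource URI. The actual LOCAH property may differ.

```python
import re

LEXVO_BASE = "http://lexvo.org/id/iso639-3/"
DCT_LANGUAGE = "http://purl.org/dc/terms/language"  # assumed linking property


def lexvo_uri(code):
    """Build a lexvo.org language URI from a three-letter ISO 639-3 code."""
    code = code.strip().lower()
    if not re.fullmatch(r"[a-z]{3}", code):
        raise ValueError("expected a three-letter language code, got %r" % code)
    return LEXVO_BASE + code


def language_link(resource_uri, code):
    """Return a (subject, predicate, object) triple linking a resource to its language."""
    return (resource_uri, DCT_LANGUAGE, lexvo_uri(code))


print(language_link("http://example.ac.uk/id/archivalresource/gb1086skinner", "eng"))
```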

Page 19: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

4. Load the RDF/XML into a triple store
• Using the Talis Platform triple store
• RDF/XML is HTTP POSTed
• We’re using Pynappl – Python client for the Talis Platform
– http://code.google.com/p/pynappl/
• Store provides us with a SPARQL query interface
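The ‘RDF/XML is HTTP POSTed’ step boils down to a request like the one sketched below. This uses Python's standard urllib rather than Pynappl, so it says nothing about Pynappl's own API. The store URL is a placeholder, and authentication, which a real store would require, is left out. The request is only built here, not sent.

```python
import urllib.request


def build_rdf_post(store_url, rdf_xml):
    """Build (but do not send) an HTTP POST carrying an RDF/XML document."""
    return urllib.request.Request(
        store_url,
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


# Placeholder store URL and a trivial, empty RDF/XML document.
request = build_rdf_post(
    "http://example.org/stores/example-store/meta",
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```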

Page 20: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

5. Create Linked Data Views
• Expose ‘bounded’ descriptions from the triple store over the Web
• Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)
• Using Paget ‘Linked Data Publishing Framework’
– http://code.google.com/p/paget/
– PHP scripts query SPARQL endpoint
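Behind a Linked Data view there is typically a query of roughly this shape going to the SPARQL endpoint. The sketch below builds the GET URL for a DESCRIBE query using the standard `query` parameter of the SPARQL protocol. The endpoint address is a placeholder, and this is not a description of what Paget does internally.

```python
from urllib.parse import urlencode


def describe_url(endpoint, resource_uri):
    """Build a SPARQL protocol GET URL asking the endpoint to DESCRIBE a resource."""
    # Reject characters that would break out of the <...> URI syntax.
    if any(ch in resource_uri for ch in '<> "'):
        raise ValueError("unsafe character in URI: %r" % resource_uri)
    query = "DESCRIBE <{}>".format(resource_uri)
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint; the 'thing' URI is the example from the URI patterns slide.
url = describe_url(
    "http://example.org/sparql",
    "http://example.ac.uk/id/findingaid/gb1086skinner",
)
print(url)
```

The HTML, RDF/XML, JSON and Turtle views are then different serialisations of whatever description comes back.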

Page 21: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

• ‘Out-of-the-box’ Paget view
• Linkedhub.ac.uk domain just given as example

Page 22: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Other Stuff We Might Try
• Linked Data API
– APIs, data formats and supporting tools to aid the adoption of linked data
– http://code.google.com/p/linked-data-api/
• Entity extraction from free text
– Open Calais
• “creates rich semantic metadata for the content you submit” - http://www.opencalais.com/
– DBPedia Spotlight (announced yesterday)
• “solution for linking unstructured information sources to the Linked Open Data”
• http://dbpedia.org/spotlight

Page 23: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data
Page 24: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Can I Access the Locah Linked Data?

• Not quite yet …

• Hoping to release the Hub data by end February 2011

• Copac data end March 2011

• Release will include Linked Data views, Sparql endpoint details, example queries and supporting documentation

Page 25: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

How are we creating the Visualisation Prototype?
• Based on researcher use cases
• Data queried from SPARQL endpoint
• Use tools such as Simile, Many Eyes, Google Charts

Page 26: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Visualisation Prototype
• Using Timemap
– Google Maps and Simile
– http://code.google.com/p/timemap/
• Early stages with this
• Will give location and ‘extent’ of archive
• Will link through to Archives Hub

Page 27: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

How are we reporting on opportunities and barriers?
• Recording these as we go along on the blog (tags: ‘opportunities’ ‘barriers’)
• Feed into #JiscEXPO synthesis work
• No time to go into these today
• More at:
– http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/
– http://blogs.ukoln.ac.uk/locah/2010/12/01/assessing-linked-data

Page 28: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Questions?
• Contacts:
– Ade Stevenson @adrianstevenson
– Jane Stevenson @janestevenson
– Pete Johnston @ppetej
– Bethan Ruddock @bethanar
– Julian Cheal @juliancheal
– Yogesh Patel http://mimas.ac.uk/staff/

Page 29: Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data

Attribution and CC License
• Sections of this presentation adapted from materials created by other members of the LOCAH Project
• This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/