SENESCHAL: Semantic ENrichment Enabling Sustainability of arCHAeological Links / Peter McKeague, RCAHMS, on behalf of the SENESCHAL Project team. Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013

Page 1: Cigs lod rcahms_seneschal_pm_20131118

SENESCHAL: Semantic ENrichment Enabling Sustainability of arCHAeological Links

Peter McKeague (On behalf of project partners)

[email protected]

SENESCHAL

www.rcahms.gov.uk http://canmore.rcahms.gov.uk

Page 2: Cigs lod rcahms_seneschal_pm_20131118

SENESCHAL - Semantic ENrichment Enabling Sustainability of arCHAeological Links

Outline of talk

Part I: RCAHMS

• What we do

• What we hold

• Classifying

Part II: Drivers for Linked Data

Part III: SENESCHAL

• Project Partners

• The Project so far

• Prospects

Page 3: Cigs lod rcahms_seneschal_pm_20131118

• Identifies, surveys and analyses the historic and built environment of Scotland

• Preserves, cares for and adds to the information and items in its national collection

• Promotes understanding, education and enjoyment through interpretation of the information it collects and the items it looks after

RCAHMS Mission Statement

Page 5: Cigs lod rcahms_seneschal_pm_20131118

Standards: Midas Heritage

http://www.english-heritage.org.uk/publications/midas-heritage/

CIDOC Conceptual Reference Model (CRM) http://www.cidoc-crm.org/

Page 6: Cigs lod rcahms_seneschal_pm_20131118

Monuments: internal staff database

Thesaurus: Events

Thesauri: Monuments, Objects, Maritime Craft

Pick lists

Page 7: Cigs lod rcahms_seneschal_pm_20131118

Information is published on Canmore

Thesauri: Monuments, Objects, Maritime Craft

http://canmore.rcahms.gov.uk

Page 8: Cigs lod rcahms_seneschal_pm_20131118

Information is published on Canmore

Thesauri: Monuments, Objects, Maritime Craft

Page 9: Cigs lod rcahms_seneschal_pm_20131118

RCAHMS thesauri: text search

http://orapweb.rcahms.gov.uk/apex/f?p=210:1:

Page 10: Cigs lod rcahms_seneschal_pm_20131118

RCAHMS thesauri: term definition

http://orapweb.rcahms.gov.uk/apex/f?p=210:1:

Page 11: Cigs lod rcahms_seneschal_pm_20131118

RCAHMS thesauri: suggest a term

http://orapweb.rcahms.gov.uk/apex/f?p=210:1:

Page 12: Cigs lod rcahms_seneschal_pm_20131118

Part II: Drivers for Linked Data

We already publish our thesauri as key reference datasets for use by professional archaeologists in national organisations and in local authority Historic Environment Records, as well as by anyone interested in the historic environment.

BUT

• Our vocabularies (and other data) are not visible

• The thesaurus architecture limits the potential of the terminology

• Terms lack the persistent URIs that would allow our resources to act as hubs for the Web of Data

Interoperability

For heritage, the main exponents of Linked Data are from the research community, and in Scotland primarily from computer scientists.

Page 13: Cigs lod rcahms_seneschal_pm_20131118

Drivers for Linked Open Data

Open Data White Paper, June 2012: http://data.gov.uk/sites/default/files/Open_data_White_Paper.pdf

Scotland’s Digital Future, April 2013: http://www.scotland.gov.uk/Resource/0042/00421478.pdf

It is Government policy

Page 14: Cigs lod rcahms_seneschal_pm_20131118

Drivers for Linked Open Data

• Public data policy and practice will be clearly driven by the public and businesses who want to use the data, including what data is released, when and in what form

• Public data will be published in reusable, machine-readable form

• Public data will be released under the same open licence which enables free reuse, including commercial reuse

• Public data will be published using open standards, and following relevant recommendations of the World Wide Web Consortium

• Public data from different departments about the same subject will be published in the same, standard formats and with the same definitions

• Public data underlying the Government’s own website will be published in re-usable form

• Release data quickly, and then work to make sure it is available in open standard formats, including Linked Data forms

It is Government policy: Open Data White Paper June 2012:

Page 15: Cigs lod rcahms_seneschal_pm_20131118

... And a practical use

An online submission form to report fieldwork from contractors to curators

Page 16: Cigs lod rcahms_seneschal_pm_20131118

©University of Glamorgan

“the key to interoperability”

http://www.heritagedata.org/

Part III: The partners

Page 17: Cigs lod rcahms_seneschal_pm_20131118

Lineage

STAR: Semantic Technologies for Archaeological Resources, 2007-2010
AHRC funded project with English Heritage to apply semantic and knowledge-based technologies to the digital archaeological domain. STAR developed new methods for linking digital archive databases, vocabularies and the associated grey literature, exploiting the potential of a high level, core ontology and natural language processing techniques.
http://hypermedia.research.southwales.ac.uk/kos/star/

STELLAR: Semantic Technologies Enhancing Links and Linked data for Archaeological Resources, 2010-2011
AHRC funded project with the ADS and English Heritage. Building on the outcomes of STAR, STELLAR provided support for non-specialist users to map and extract datasets.
http://hypermedia.research.southwales.ac.uk/kos/stellar/

SENESCHAL: Semantic ENrichment Enabling Sustainability of arCHAeological Links, 2013-2014
AHRC funded project with the ADS, English Heritage, RCAHMS, RCAHMW and Wessex Archaeology.
http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/ and http://www.heritagedata.org

Page 18: Cigs lod rcahms_seneschal_pm_20131118

The SENESCHAL Project

seneschal n. Historical: the steward or major-domo of a medieval great house

12 month AHRC funded project, March 2013 - February 2014

Deliverables

• Controlled vocabularies online: Linked data (SKOS), downloadable files

• Web services: term suggestion, term validation, legacy data alignment

• Tools to align data with controlled vocabularies: browser-based ‘widget’ controls

Page 19: Cigs lod rcahms_seneschal_pm_20131118

Interoperability

• “The terminology of a subject is the key to interoperability” (John F. Sowa)

• Interoperability requires more than just a common data model

• Data compatibility occurs on 2 levels: semantic and syntactic. Ontologies / data structures deal with the semantic but not necessarily the syntactic

• “The CRM relies on existing syntactic interoperability and is concerned only with adding semantic interoperability” (CIDOC CRM documentation)

Page 20: Cigs lod rcahms_seneschal_pm_20131118

You say potato, I say tomato…

• Multiple datasets, multiple organisations, multiple languages

• Unification of data structures is possible, BUT…

• Incompatible terminology hinders cross search and prevents greater interoperability

• Applications attempting to reuse data must all individually sort out the same old problems

E.g. Get all the iron age post holes…

Feature                 Period
Post-hole               IRON AGE
Posthole                |ron age
POST HOLE               Iron age?
POSTHLOLE               EARLY IRON AGE
POST HOLE (POSSIBLE)    250 BC
POSTHOLES               C 500-200 B.C.

Solution: data cleansing and controlled vocabularies?

Page 21: Cigs lod rcahms_seneschal_pm_20131118

Typical interoperability issues encountered

• Simple spelling errors: “POSTHLOLE”, “CESS PITT”, “FURRROWS”, “FLINT SCRAPPER”

• Alternate word forms: “BOUNDARY”/“BOUNDARIES”, “GULLEY”/“GULLIES”

• Prefixes / suffixes: “RED HILL (POSSIBLE)”, “TRACKWAY (COBBLED)”, “CROFT?”, “CAIRN (POSSIBLE)”, “PORTAL DOLMEN (RE-ERECTED)”

• Nested delimiters: “POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS”

• Terms not intended for indexing: “NONE”, “UNIDENTIFIED OBJECT”, “N/A”, “NA”, “INCOHERENT”

• Terms that would not be in (any) thesauri: “WOTSITS PACKET”, “CHARLES 2ND COIN”, “ROMAN STRUCTURE POSSIBLY A VILLA”, “ST GUTHLACS BENEDICTINE PRIORY”, “WORCESTER-BIRMINGHAM CANAL”, “KUNGLIGA SLOTTET”, “SUB-FOSSIL BEETLES”

• More specific phrases: “SIDE WALL OF POT WITH LUG”, “BRICK-LINED INDUSTRIAL WELL OR MINE SHAFT”, “ALIGNMENT OF PLATFORMS AND STONES”
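Several of the issues listed above (spelling errors, qualifiers such as "(POSSIBLE)" or "?", packed delimiters, placeholder values) can be handled mechanically before a human looks at the leftovers. The following is a minimal sketch of that idea, not the project's alignment tooling. The `VOCABULARY` list, the function names and the 0.8 similarity cutoff are all assumptions chosen for illustration, and the fuzzy matching uses Python's standard `difflib`.

```python
import difflib
import re
from typing import List, Optional, Tuple

# Hypothetical controlled vocabulary (preferred labels only), for illustration.
VOCABULARY = ["POST HOLE", "CESS PIT", "FURROW", "CAIRN", "TRACKWAY"]

# Placeholder values that were never meant to be index terms.
NOT_FOR_INDEXING = {"NONE", "N/A", "NA", "UNIDENTIFIED OBJECT", "INCOHERENT"}

# Compare on labels with spaces removed, so "POSTHOLE" and "POST HOLE" agree.
_KEYS = {label.replace(" ", ""): label for label in VOCABULARY}


def split_terms(raw: str) -> List[str]:
    """Split a field that packs several terms behind commas or semicolons."""
    return [part.strip() for part in re.split(r"[,;]", raw) if part.strip()]


def strip_qualifiers(term: str) -> str:
    """Drop '(POSSIBLE)'-style suffixes, '?' and hyphens; upper-case the rest."""
    term = re.sub(r"\([^)]*\)", " ", term)
    term = term.replace("?", " ").replace("-", " ")
    return " ".join(term.upper().split())


def align(raw: str, cutoff: float = 0.8) -> List[Tuple[str, Optional[str]]]:
    """Return (original term, matched label or None) for each indexable term."""
    results = []
    for part in split_terms(raw):
        cleaned = strip_qualifiers(part)
        if not cleaned or cleaned in NOT_FOR_INDEXING:
            continue  # skip placeholders such as "N/A"
        key = cleaned.replace(" ", "")
        match = difflib.get_close_matches(key, list(_KEYS), n=1, cutoff=cutoff)
        results.append((part, _KEYS[match[0]] if match else None))
    return results


if __name__ == "__main__":
    for value in ["POSTHLOLE", "POST HOLE (POSSIBLE)", "Post-hole; CESS PITT", "N/A"]:
        print(value, "->", align(value))
```

Terms that come back paired with `None` (for example "WOTSITS PACKET") are the ones left for manual review. The sketch does no stemming, so plural or alternate word forms may need alternative labels in the vocabulary or a lower cutoff.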

Page 22: Cigs lod rcahms_seneschal_pm_20131118

Solutions - SENESCHAL

• Controlled vocabularies (again): commonly agreed concepts, terminology and identifiers; existing / new thesauri (community contributions?)

• Openness and availability: licensing, web services, downloads, data formats

• Alignment of existing data: data cleansing tools, alignment techniques

• Alignment of new data: interactive embedded data entry tools, validation at point of data entry

Rather than trying to solve this vocabulary problem, help to prevent it from happening in the first place

Page 23: Cigs lod rcahms_seneschal_pm_20131118

Vocabularies online as (SKOS) Linked Data

• Vocabularies from English Heritage: Monument Types Thesaurus, Objects Thesaurus, Event Types Thesaurus, Maritime Craft Thesaurus, RCHME Cultural Periods List / MIDAS Archaeological Periods List

• Vocabularies from RCAHMS: Monument Thesaurus (Scotland) (multilingual, includes Scottish Gaelic translations!), Objects (Scotland), Maritime Craft (Scotland)

• Vocabularies from RCAHMW: Monument Thesaurus (Wales), Event (Wales), Period (Wales)

• Moving from term based towards concept based indexing

• Start to create links between concepts… between vocabularies… between datasets… between sites… between countries

• Cross searching of (multilingual) cultural heritage resources

Page 24: Cigs lod rcahms_seneschal_pm_20131118

(partial) SKOS model

[Diagram]

• skos:Concept: skos:prefLabel, skos:altLabel, skos:notation, skos:scopeNote, skos:changeNote → [literal value]

• skos:Concept: skos:broader, skos:narrower, skos:related → skos:Concept

• skos:Concept: skos:inScheme → skos:ConceptScheme

• skos:ConceptScheme: dc:title, dc:description → [literal value]

• skos:ConceptScheme: skos:hasTopConcept → skos:Concept (inverse skos:topConceptOf)

• skos:Collection: skos:member → skos:Concept
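To make the partial SKOS model concrete, here is a minimal sketch that writes one concept out as N-Triples using only the Python standard library. The concept URI, label and note text are the TENEMENT (Scotland) example that appears later in this deck. The scheme URI, the `@en` language tag, the use of skos:scopeNote for the definition text, and the `Concept` class itself are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(uri: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{uri}>"


def literal(text: str, lang: str = "en") -> str:
    """Language-tagged literal with backslashes and double quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


@dataclass
class Concept:
    uri: str
    scheme: str
    pref_label: str
    scope_note: str = ""
    broader: List[str] = field(default_factory=list)

    def triples(self) -> Iterator[Tuple[str, str, str]]:
        yield iri(self.uri), iri(RDF_TYPE), iri(SKOS + "Concept")
        yield iri(self.uri), iri(SKOS + "inScheme"), iri(self.scheme)
        yield iri(self.uri), iri(SKOS + "prefLabel"), literal(self.pref_label)
        if self.scope_note:
            yield iri(self.uri), iri(SKOS + "scopeNote"), literal(self.scope_note)
        for parent in self.broader:
            yield iri(self.uri), iri(SKOS + "broader"), iri(parent)


def to_ntriples(concept: Concept) -> str:
    """Serialise the concept as one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in concept.triples())


tenement = Concept(
    uri="http://purl.org/heritagedata/schemes/1/concepts/467",
    scheme="http://purl.org/heritagedata/schemes/1",  # assumed scheme URI
    pref_label="TENEMENT",
    scope_note=(
        "A large building containing a number of rooms or flats, "
        "access to which is usually gained via a common stairway."
    ),
)

if __name__ == "__main__":
    print(to_ntriples(tenement))
```

In practice an RDF library would handle serialisation. The hand-rolled version is used here only to keep the sketch dependency-free and to show how few statements a basic concept needs.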

Page 25: Cigs lod rcahms_seneschal_pm_20131118

Data licensing and attribution using CC REL

[Diagram: skos:ConceptScheme and skos:Concept each carry cc:license, cc:attributionURL, cc:attributionName, dct:creator and dc:source. Value nodes are labelled URI, apart from one [literal value] at cc:attributionName.]

Attribution back to original data providers

Page 26: Cigs lod rcahms_seneschal_pm_20131118

General System Architecture

[Diagram components: Native vocabularies; STELLAR (SKOS) templates; SKOS RDF vocabularies; Additional metadata; (upload); SENESCHAL data store; Linked Data REST API; SPARQL query endpoint; Web Services REST API; web controls & applications]

Page 27: Cigs lod rcahms_seneschal_pm_20131118

Linked Data API (preliminary)

• The project will implement a Linked Data (RESTful) API

• The base URI may be http://www.heritagedata.org/ or http://purl.org/xxx/..

• Seneschal is a sub-project within the wider scope of ‘heritagedata.org’, so: http://www.heritagedata.org/seneschal for the wiki/blog with project details, and <base uri>/schemes/123 (e.g.) for the actual data API (see below)

Proposed REST API:

/schemes – return list of all SKOS concept schemes held
/schemes/search (with parameters) – search for schemes
/schemes/{id} – return details of specified SKOS concept scheme (current version)
/schemes/{id}.html, .n3, .rdf, .json – return different serializations of that data, obtained either by content negotiation or by direct request including extension
/schemes/{id}/concepts – return list of ALL SKOS concepts in specified scheme
/schemes/{id}/concepts/search – search for concepts in the specified scheme
/concepts – return list of all SKOS concepts in ALL schemes
/concepts/search (with parameters) – search for concepts in any scheme
/concepts/{id} – return details of specified SKOS concept (current version)
/concepts/{id}.html, .n3, .rdf, .json – return different serializations of the data, obtained either by content negotiation or by direct request including extension
/concepts/{id}/schemes – return list of all schemes referencing the specified concept
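The proposed routes are regular enough that a client can build them from a handful of parts. The sketch below only constructs URLs and `Accept` headers and makes no network calls. The helper names are hypothetical, the base URI is one of the two candidates named on the slide, and the mapping from extensions to media types is an assumption (the media types themselves are standard).

```python
from typing import Dict, Optional

# One of the two candidate base URIs; the final choice was still open.
BASE_URI = "http://www.heritagedata.org"

# Assumed mapping from the proposed extensions to standard media types.
MEDIA_TYPES: Dict[str, str] = {
    "html": "text/html",
    "n3": "text/n3",
    "rdf": "application/rdf+xml",
    "json": "application/json",
}


def resource_url(kind: str, ident: Optional[str] = None,
                 fmt: Optional[str] = None, base: str = BASE_URI) -> str:
    """Build /schemes, /schemes/{id}, /concepts/{id}.json and similar URLs."""
    if kind not in ("schemes", "concepts"):
        raise ValueError(f"unknown resource kind: {kind}")
    url = f"{base.rstrip('/')}/{kind}"
    if ident is None:
        if fmt is not None:
            raise ValueError("an extension only applies to a single resource")
        return url
    url = f"{url}/{ident}"
    if fmt is not None:
        if fmt not in MEDIA_TYPES:
            raise ValueError(f"unsupported serialization: {fmt}")
        url = f"{url}.{fmt}"
    return url


def concepts_in_scheme_url(scheme_id: str, base: str = BASE_URI) -> str:
    """Build /schemes/{id}/concepts."""
    return f"{resource_url('schemes', scheme_id, base=base)}/concepts"


def accept_header(fmt: str) -> Dict[str, str]:
    """Header for content negotiation instead of an explicit extension."""
    return {"Accept": MEDIA_TYPES[fmt]}


if __name__ == "__main__":
    print(resource_url("schemes"))
    print(resource_url("concepts", "409", "json"))
    print(concepts_in_scheme_url("1"))
    print(accept_header("rdf"))
```

Either style reaches the same representation: request `/concepts/409.json` directly, or request `/concepts/409` with the header from `accept_header("json")`.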

Page 28: Cigs lod rcahms_seneschal_pm_20131118

Project deliverables

http://www.heritagedata.org/blog/

Page 29: Cigs lod rcahms_seneschal_pm_20131118

Schema List

http://heritagedata.org/test/getAllSchemes.php

Page 30: Cigs lod rcahms_seneschal_pm_20131118

Scottish Monument types

http://heritagedata.org/test/schemes/1.html

Page 31: Cigs lod rcahms_seneschal_pm_20131118

Scottish Monument types: Top level

http://heritagedata.org/test/schemes/1/concepts/405.html

Page 32: Cigs lod rcahms_seneschal_pm_20131118

Scottish Monument types: concept

http://purl.org/heritagedata/schemes/1/concepts/409

Page 33: Cigs lod rcahms_seneschal_pm_20131118

http://heritagedata.org/test/searchForm.php

Page 35: Cigs lod rcahms_seneschal_pm_20131118

http://heritagedata.org/test/sparql.php

Page 36: Cigs lod rcahms_seneschal_pm_20131118

Versioning (preliminary)

/schemes/{id} – returns current version of the specified scheme
/schemes/{id}/versions – returns all versions of the specified scheme
/schemes/{id}/versions/{id} – returns specified version of the specified scheme
/concepts/{id} – returns current version of the specified concept
/concepts/{id}/versions – returns all versions of the specified concept
/concepts/{id}/versions/{id} – returns specified version of the specified concept

[Diagram: data:schemes/123 [skos:ConceptScheme] is linked by dct:hasVersion (inverse dct:isVersionOf) to data:schemes/123/versions/20111005 [skos:ConceptScheme] and to data:schemes/123/versions/2013020301 [skos:ConceptScheme]]
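The versioning scheme pairs each resource with dated version URIs through dct:hasVersion and dct:isVersionOf. The sketch below generates those links and picks a "current" version. The `DATA` namespace is a hypothetical expansion of the `data:` prefix, and choosing the current version by sorting identifiers as a YYYYMMDD date followed by an optional sequence suffix is an assumption based on the two identifiers shown in the diagram.

```python
from typing import Iterable, List, Tuple

DCT = "http://purl.org/dc/terms/"
DATA = "http://example.org/data/"  # hypothetical expansion of the 'data:' prefix

Triple = Tuple[str, str, str]


def version_uri(resource: str, version_id: str) -> str:
    """Build {resource}/versions/{id}."""
    return f"{resource}/versions/{version_id}"


def version_links(resource: str, version_ids: Iterable[str]) -> List[Triple]:
    """Emit dct:hasVersion and the inverse dct:isVersionOf for each version."""
    triples: List[Triple] = []
    for vid in version_ids:
        versioned = version_uri(resource, vid)
        triples.append((resource, DCT + "hasVersion", versioned))
        triples.append((versioned, DCT + "isVersionOf", resource))
    return triples


def current_version(version_ids: Iterable[str]) -> str:
    """Pick the latest identifier, assuming YYYYMMDD plus optional suffix."""
    return max(version_ids, key=lambda vid: (vid[:8], vid[8:]))


if __name__ == "__main__":
    scheme = DATA + "schemes/123"
    ids = ["20111005", "2013020301"]
    for triple in version_links(scheme, ids):
        print(triple)
    print("current:", version_uri(scheme, current_version(ids)))
```

Under these assumptions a request for `/schemes/123` would be answered with the content of `/schemes/123/versions/2013020301`.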

Page 37: Cigs lod rcahms_seneschal_pm_20131118

Published vocabularies

Vocabulary England Scotland Wales

Monument type YES YES YES

Objects YES YES

Maritime craft YES YES

Period YES YES

Events (activities) YES ???

Archaeological Sciences YES ???

Components YES

Building materials YES

Evidence YES

Page 38: Cigs lod rcahms_seneschal_pm_20131118

A question of jurisdiction

289 Allison Street, Glasgow: TENEMENT
http://canmore.rcahms.gov.uk/en/site/148111/

TENEMENT (Scotland)
http://purl.org/heritagedata/schemes/1/concepts/467
A large building containing a number of rooms or flats, access to which is usually gained via a common stairway.

TENEMENT (England)
http://purl.org/heritagedata/schemes/eh_tmt2/concepts/68997
A parcel of land.

TENEMENT (Wales)
http://purl.org/heritagedata/schemes/10/concepts/68997

TENEMENT BLOCK (England)
http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71489
Use for speculatively built 19th century "model dwellings", rather than those built by a philanthropic society.

TENEMENT BLOCK (Wales)
http://purl.org/heritagedata/schemes/10/concepts/71489

TENEMENT HOUSE (England)
http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71476
Originally built as a family house. Converted into flats during the 19th or 20th century.

SC674834
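The TENEMENT example shows why a label alone cannot identify a concept: the same word resolves to different URIs, with different meanings, in each national thesaurus. A minimal sketch of a lookup keyed on jurisdiction and label follows. The URIs and definitions are those quoted on the slide, while the `Concept` tuple, the `CONCEPTS` table and the function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Concept(NamedTuple):
    uri: str
    definition: Optional[str]  # None where the slide gives no definition


PURL = "http://purl.org/heritagedata/schemes/"

# (jurisdiction, label) -> concept, using the values quoted on the slide.
CONCEPTS: Dict[Tuple[str, str], Concept] = {
    ("Scotland", "TENEMENT"): Concept(
        PURL + "1/concepts/467",
        "A large building containing a number of rooms or flats, "
        "access to which is usually gained via a common stairway.",
    ),
    ("England", "TENEMENT"): Concept(
        PURL + "eh_tmt2/concepts/68997", "A parcel of land."
    ),
    ("Wales", "TENEMENT"): Concept(PURL + "10/concepts/68997", None),
}


def resolve(label: str, jurisdiction: str) -> Optional[Concept]:
    """Look a label up within one jurisdiction's thesaurus."""
    return CONCEPTS.get((jurisdiction, label.upper()))


def jurisdictions_for(label: str) -> List[str]:
    """List every jurisdiction whose thesaurus uses this label."""
    wanted = label.upper()
    return sorted(j for (j, lbl) in CONCEPTS if lbl == wanted)


if __name__ == "__main__":
    print(jurisdictions_for("tenement"))
    print(resolve("tenement", "Scotland").uri)
    print(resolve("tenement", "England").definition)
```

Indexing records with the concept URI rather than the string "TENEMENT" keeps the Glasgow building from being cross-searched as an English parcel of land.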

Page 39: Cigs lod rcahms_seneschal_pm_20131118

A question of jurisdiction

A Cruck House in Wick, Worcestershire
Cruck cottage in Wick, Philip Halling, http://creativecommons.org/licenses/by-sa/2.0/

Cruck Framed Byre, Latheron, Caithness
http://canmore.rcahms.gov.uk/en/site/86630/

SC683414

Page 40: Cigs lod rcahms_seneschal_pm_20131118

A bheil Gàidhlig agaibh? (Do you speak Gaelic?)

The Cenotaph, George Square, Glasgow: http://canmore.rcahms.gov.uk/en/site/143264/

DP151933

Page 41: Cigs lod rcahms_seneschal_pm_20131118

A bheil Gàidhlig agaibh?

Page 42: Cigs lod rcahms_seneschal_pm_20131118

Multilinguality

• Multilingual labels & notes

• Search in one language, retrieve another

• Potential to manage regional terms
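"Search in one language, retrieve another" follows directly from concept based indexing: labels in every language hang off the same concept URI. The sketch below illustrates the lookup. The concept URIs are placeholders, and the English and Scottish Gaelic words are ordinary dictionary words chosen for illustration, not labels taken from the RCAHMS thesaurus.

```python
from typing import Dict, Optional

# Hypothetical concepts with labels keyed by language tag ('gd' = Scottish Gaelic).
LABELS: Dict[str, Dict[str, str]] = {
    "http://example.org/concepts/castle": {"en": "CASTLE", "gd": "CAISTEAL"},
    "http://example.org/concepts/church": {"en": "CHURCH", "gd": "EAGLAIS"},
}


def find_concept(label: str, lang: str) -> Optional[str]:
    """Return the URI of the concept carrying this label in this language."""
    wanted = label.casefold()
    for uri, by_lang in LABELS.items():
        if by_lang.get(lang, "").casefold() == wanted:
            return uri
    return None


def translate(label: str, source: str, target: str) -> Optional[str]:
    """Search with a label in one language, retrieve the label in another."""
    uri = find_concept(label, source)
    if uri is None:
        return None
    return LABELS[uri].get(target)


if __name__ == "__main__":
    print(translate("caisteal", "gd", "en"))
    print(translate("Church", "en", "gd"))
```

The same structure could carry regional terms by adding further language or dialect tags alongside `en` and `gd`.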

Page 43: Cigs lod rcahms_seneschal_pm_20131118

Challenges for RCAHMS

• Controlled vocabularies online

• Integration of project deliverables into RCAHMS processes

• Managing candidate terms

• Publishing additional vocabularies

• Jurisdiction: a single British thesaurus for cultural heritage?

• Adding images

• Moving the goalposts

Page 44: Cigs lod rcahms_seneschal_pm_20131118

Summary

• Controlled vocabularies online: Linked data (SKOS), downloadable files

• Linking out: mapping between the different thesauri

• Web services: term suggestion, term validation, legacy data alignment

• Tools to align data with controlled vocabularies: browser-based ‘widget’ controls

http://www.heritagedata.org/blog/work-in-the-pipeline/

Page 45: Cigs lod rcahms_seneschal_pm_20131118

©University of Glamorgan

“the key to interoperability”

http://www.heritagedata.org/