
Page 1: Integrating Data for Archaeology

ARIADNE is funded by the European Commission's Seventh Framework Programme

Integrating Data for Archaeology

Dimitris Gavrilis, Eleni Afiontzi, Johan Fihn, Olof Olsson, Achille Felicetti, Franco Niccolucci, Sebastian Cuy

Page 2: Integrating Data for Archaeology

Introduction
• Traditional projects in archaeology focused on aggregating data into a single format / system
  – Provide users with a unified interface
  – Improve search and retrieval
  – Improve retrieval semantics through specialized metadata schemas
• ARIADNE goes one step further: data integration
  – Try to model the domain information (ARIADNE Catalog Data Model)
  – Use a curation-aware aggregator to enrich information using the above model
  – Improve user experience through more substantial and powerful queries

Page 3: Integrating Data for Archaeology

Innovation
• Why hasn’t anyone done this before?
  – Complexity
  – Performance
  – Domain knowledge
• Standard aggregation systems / architectures are insufficient.
ARIADNE Infrastructure

Page 4: Integrating Data for Archaeology

ARIADNE Infrastructure
• Flexibility
  – Ingest diverse and heterogeneous data
    • XML, RDF, Excel, CSV, …
  – Handle each datastream independently and according to its requirements
    • Adapting aggregation, validation, enrichment workflows
  – Add new curation services easily and on demand
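A minimal sketch of how each datastream could carry its own workflow and how a new service could be added on demand. It assumes a simple in-process registry; the step functions (`validate`, `clean`, `tag_source`), the `WORKFLOWS` table and the record layout are hypothetical names for illustration, not part of the ARIADNE infrastructure.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[Record], Record]


def clean(record: Record) -> Record:
    # Hypothetical cleaning step: trim whitespace from every value.
    return {key: value.strip() for key, value in record.items()}


def validate(record: Record) -> Record:
    # Hypothetical validation step: require a non-empty title.
    if not record.get("title"):
        raise ValueError("record has no title")
    return record


def tag_source(fmt: str) -> Step:
    # Hypothetical enrichment step: remember the format the record came from.
    def step(record: Record) -> Record:
        return {**record, "source_format": fmt}
    return step


# One workflow per datastream format; each list can be adapted independently.
WORKFLOWS: Dict[str, List[Step]] = {
    "xml": [clean, validate, tag_source("xml")],
    "csv": [clean, tag_source("csv")],
}


def register_step(fmt: str, step: Step) -> None:
    # Append a new service to the workflow of one format only.
    WORKFLOWS.setdefault(fmt, []).append(step)


def run_workflow(fmt: str, record: Record) -> Record:
    # Apply the steps registered for this format, in order.
    for step in WORKFLOWS[fmt]:
        record = step(record)
    return record
```

For example, `run_workflow("xml", {"title": "  Roman villa  "})` trims the title, validates it and tags the record with its source format, while the CSV workflow skips validation.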

Page 5: Integrating Data for Archaeology

ARIADNE Infrastructure
• Complexity
  – De-couple service complexity through a micro-service oriented architecture
  – Use loosely coupled services in a highly scalable environment
• Performance
  – Scalable technologies

Page 6: Integrating Data for Archaeology

ARIADNE Infrastructure
• Domain knowledge
  – Integrate the domain model (ACDM) into the infrastructure
  – Make extensive use of domain thesauri (e.g. AAT) and label every resource accordingly
  – Create specialized micro-services for curating content according to the domain needs

Page 7: Integrating Data for Archaeology

Data Integration Overall Architecture
[Diagram. Labelled components: Repository, Excel Sheet, ARIADNE Registry, Validation, Cleaning, Enrichment, Integration, RDF Store (RDF), Elastic Search, RDF Store (CRM), Archive, ARIADNE Portal, Integration Experiments]

Page 8: Integrating Data for Archaeology

Use of RDF
• Every resource is assigned a unique and persistent identifier that is resolved through a URI
• Every resource has an RDF representation according to the ACDM schema
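The two points above can be illustrated with a small sketch that mints a stable URI and serializes a resource as N-Triples. The base URI, the `ACDM_NS` namespace and the choice of properties are placeholders assumed for illustration; the slides do not give the real ACDM namespace or the actual identifier scheme.

```python
import uuid

# Placeholder namespaces, assumed for illustration only.
BASE_URI = "http://example.org/ariadne/resource/"
ACDM_NS = "http://example.org/acdm#"

# Well-known vocabulary terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"


def mint_uri(provider: str, local_id: str) -> str:
    # uuid5 is deterministic: the same provider and local id always
    # produce the same identifier, so re-ingestion keeps the URI stable.
    name = f"{provider}/{local_id}"
    return BASE_URI + str(uuid.uuid5(uuid.NAMESPACE_URL, name))


def escape_literal(text: str) -> str:
    # Escape the characters that N-Triples string literals require.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(uri: str, resource_type: str, title: str) -> str:
    # Two triples: the resource type and its title.
    lines = [
        f"<{uri}> <{RDF_TYPE}> <{ACDM_NS}{resource_type}> .",
        f'<{uri}> <{DCT_TITLE}> "{escape_literal(title)}" .',
    ]
    return "\n".join(lines)
```

Calling `to_ntriples(mint_uri("provider-a", "42"), "Dataset", "Excavation records")` yields two triples about the same subject URI.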

Page 9: Integrating Data for Archaeology

Data Curation
• Use of curation micro-services for enriching content
  – Geo-normalization (identify, extract and normalize places and coordinates)
  – Geo-coding (e.g. GeoNames)
  – Thesauri mappings (map native subject terms to a common thesaurus: AAT)
  – Temporal normalization (identify, extract and normalize dates)
  – Gazetteers (e.g. DAI Gazetteer)
  – Historical & ancient place name identification (Pelagios & Pleiades)
  – Temporal information mappings (Perio.do)
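As an illustration of one such micro-service, here is a minimal temporal normalization sketch. It assumes dates arrive as short strings such as "300 BC" or "AD 79"; the function names, the `date` and `normalized_year` fields and the signed-year convention are assumptions, not the behaviour of the ARIADNE services.

```python
import re
from typing import Dict, Optional

# Optional leading era (AD/CE), a year of up to four digits,
# and an optional trailing era (BC/BCE/AD/CE).
_DATE = re.compile(
    r"^\s*(?:(AD|CE)\s+)?(\d{1,4})\s*(BCE|BC|CE|AD)?\s*$",
    re.IGNORECASE,
)


def normalize_year(text: str) -> Optional[int]:
    # Return a signed year (negative for BC/BCE), or None if the
    # text is not a simple year expression. This is a plain signed
    # convention, not ISO 8601 astronomical year numbering.
    match = _DATE.match(text)
    if match is None:
        return None
    year = int(match.group(2))
    era = (match.group(3) or match.group(1) or "CE").upper()
    return -year if era in ("BC", "BCE") else year


def enrich(record: Dict[str, object]) -> Dict[str, object]:
    # Add a normalized year next to the native date; the native value is kept.
    enriched = dict(record)
    enriched["normalized_year"] = normalize_year(str(record.get("date", "")))
    return enriched
```

Period names such as "Bronze Age" are deliberately left unresolved here (`None`); mapping them would need a period gazetteer.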

Page 10: Integrating Data for Archaeology

Data Integration
• Data integration is based on 3+1 dimensions
  – Subject
  – Space
  – Time
  – Resource type
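A minimal sketch of what a query across the 3+1 dimensions could look like, assuming each resource already carries a subject label, a point location, a year range and a resource type. The `Resource` fields and the `matches` function are hypothetical and do not reflect the ACDM schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Resource:
    title: str
    subject: str          # subject label, e.g. a thesaurus term
    lat: float            # latitude in decimal degrees
    lon: float            # longitude in decimal degrees
    start_year: int       # signed year, negative for BC
    end_year: int
    resource_type: str    # e.g. "dataset", "report"


def matches(
    resource: Resource,
    subject: Optional[str] = None,
    bbox: Optional[Tuple[float, float, float, float]] = None,
    period: Optional[Tuple[int, int]] = None,
    resource_type: Optional[str] = None,
) -> bool:
    # Each dimension is optional; a resource must satisfy all given ones.
    if subject is not None and resource.subject != subject:
        return False
    if bbox is not None:
        min_lat, min_lon, max_lat, max_lon = bbox
        inside = (min_lat <= resource.lat <= max_lat
                  and min_lon <= resource.lon <= max_lon)
        if not inside:
            return False
    if period is not None:
        start, end = period
        # Keep the resource only if its year range overlaps the period.
        if resource.end_year < start or resource.start_year > end:
            return False
    if resource_type is not None and resource.resource_type != resource_type:
        return False
    return True
```

Leaving a dimension as `None` means "no constraint", so the same function covers subject-only searches as well as combined subject, space, time and type queries.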

Page 11: Integrating Data for Archaeology

Identify & Link together Resource Types

• Model individual information resource types (e.g. collections, bibliographic reports, databases, datasets, etc.).
• Identify each resource's type during ingestion
• Link / group different resource types
  – E.g. put all related heterogeneous resource types (reports, datasets, …) under the same collection
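The grouping step can be sketched as follows, assuming every ingested record already names its resource type and the collection it belongs to. The `id`, `type` and `collection` keys are hypothetical field names chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_collection(
    resources: Iterable[Dict[str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    # collection name -> resource type -> list of resource ids
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for resource in resources:
        grouped[resource["collection"]][resource["type"]].append(resource["id"])
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {name: dict(by_type) for name, by_type in grouped.items()}
```

A collection with one report and two datasets would come back as `{"report": [...], "dataset": [..., ...]}` under the collection's name, which keeps heterogeneous resources together while preserving their types.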

Page 12: Integrating Data for Archaeology

Thematic integration
• ARIADNE uses the AAT thesaurus to semantically label ALL aggregated information.
• AAT terms act as glue and, when combined with spatial and temporal information, can produce great results.
• Semantic expansion of terms is used extensively to improve retrieval.
• Expansion of multi-lingual terms facilitates cross-language search without requiring automatic translation.
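A minimal sketch of semantic and multi-lingual term expansion, assuming a thesaurus with per-language labels and narrower-concept links. The two concepts below are a toy example with invented ids; they are not actual AAT records, and the real expansion strategy used by ARIADNE is not described on the slide.

```python
from typing import Dict, Optional, Set

# Toy thesaurus (not AAT data): labels per language and narrower concepts.
THESAURUS: Dict[str, dict] = {
    "c1": {
        "labels": {"en": "ceramics", "de": "Keramik", "it": "ceramica"},
        "narrower": ["c2"],
    },
    "c2": {
        "labels": {"en": "amphorae", "de": "Amphoren", "it": "anfore"},
        "narrower": [],
    },
}


def find_concept(term: str) -> Optional[str]:
    # Case-insensitive lookup of a label in any language.
    wanted = term.casefold()
    for concept_id, concept in THESAURUS.items():
        if any(label.casefold() == wanted
               for label in concept["labels"].values()):
            return concept_id
    return None


def expand(term: str) -> Set[str]:
    # Return every label, in every language, of the matching concept
    # and of all concepts below it; unknown terms are returned unchanged.
    start = find_concept(term)
    if start is None:
        return {term}
    expanded: Set[str] = set()
    seen: Set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        expanded.update(THESAURUS[current]["labels"].values())
        stack.extend(THESAURUS[current]["narrower"])
    return expanded
```

A search for the German "Keramik" is expanded to the English and Italian labels as well as to the narrower concept's labels, so records labelled only in another language can still match without translating the query.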

Page 13: Integrating Data for Archaeology

Spatial & Temporal
• All resources with spatial information
  – Are assigned WGS84 projected coordinates
• All resources with temporal information
  – Are normalized according to the ACDM dates (which take into account periods and period names and support the ISO date format).
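For the spatial side, a minimal sketch of coordinate normalization: converting degrees/minutes/seconds to decimal degrees and checking the valid latitude and longitude ranges. It assumes the input already uses the WGS84 datum (no datum transformation is attempted), and the function names are hypothetical.

```python
from typing import Tuple


def dms_to_decimal(degrees: float, minutes: float, seconds: float,
                   hemisphere: str) -> float:
    # Convert degrees/minutes/seconds plus hemisphere to signed decimal degrees.
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def validate_wgs84(lat: float, lon: float) -> Tuple[float, float]:
    # Reject coordinates outside the valid latitude/longitude ranges
    # and round to six decimal places.
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return (round(lat, 6), round(lon, 6))
```

For instance, `dms_to_decimal(41, 30, 0, "N")` gives `41.5` and `dms_to_decimal(12, 15, 0, "W")` gives `-12.25`.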

Page 14: Integrating Data for Archaeology

Subject Terms Curation Lifecycle
[Diagram. Labelled components: Provider Native Repository, Excel Sheet, XML Files, Registry, Vocabulary Mapping Tool, MORe, AAT, Elastic Search, ARIADNE Portal. Flow labels: Native Subjects, mappings.]
Subject fields annotated in the diagram:
– *nativeSubjects
– *nativeSubjects, *providedSubjects (shown at two stages)
– ACDM / Subjects (JSON): **providedSubjects, **derivedSubjects, **broaderGenericSubjects, *nativeSubjects
Legend: * mono-lingual (prefLabel only); ** multi-lingual (prefLabel & altLabel)