co-funded by the European Union Work Package 2: Interoperability Infrastructure DM2E Final Event December, 11th 2014, Pisa Kai Eckert


co-funded by the European Union

Work Package 2: Interoperability Infrastructure

DM2E Final Event December, 11th 2014, Pisa

Kai Eckert


DM2E Architecture

DM2E Final Event: Work Package 2 2 11.12.2014

WP 1

WP 2

WP 3


WP2 Infrastructure


Access to the data

Search and browse the data


View and access the data


Data Model


DM2E Model

The DM2E Model is an application profile refining the Europeana Data Model.

Reused vocabularies: Bibliographic Ontology, FaBiO, Publishing Roles Ontology, VIVO Ontology, VoID.


[Figure: DM2E class hierarchy. Classes shown: edm:NonInformationResource, edm:Place, edm:PhysicalThing, bibo:Book, dm2e:Manuscript, dm2e:Page, edm:Event, skos:Concept, fabio:Chapter, dm2e:Work, edm:TimeSpan, edm:Agent, foaf:Organization, foaf:Person]


DM2E Model: Metalevel

• Levels of Abstraction in DM2E


Class | Uplink | Metadata

edm:ProvidedCHO | ore:isAggregatedBy | About the content

ore:Aggregation | ore:isDescribedBy | About the provided metadata, provider's perspective, record level

ore:ResourceMap, dm2e:DataResource, foaf:Document | void:inDataset |

void:Dataset (Named Graph) | | About the RDF data, DM2E perspective

Core data, created by provider mappings

Metalevel, managed by DM2E Infrastructure
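The uplink properties on this slide form a chain from a CHO up to its dataset. The following is a minimal sketch of walking that chain, assuming the data is available as plain (subject, predicate, object) tuples. The `ex:` identifiers and the helper `follow_uplinks` are hypothetical and not part of the DM2E infrastructure.

```python
# Uplink properties in the order named on the slide.
UPLINKS = ["ore:isAggregatedBy", "ore:isDescribedBy", "void:inDataset"]

# Hypothetical example triples; the ex: identifiers are made up.
TRIPLES = [
    ("ex:cho1", "ore:isAggregatedBy", "ex:aggregation1"),
    ("ex:aggregation1", "ore:isDescribedBy", "ex:resourcemap1"),
    ("ex:resourcemap1", "void:inDataset", "ex:dataset1"),
]


def follow_uplinks(start, triples, uplinks=UPLINKS):
    """Return the resources reached by following each uplink once, in order."""
    path = [start]
    current = start
    for prop in uplinks:
        targets = [o for s, p, o in triples if s == current and p == prop]
        if not targets:
            break  # chain is incomplete; stop at the last resource found
        current = targets[0]
        path.append(current)
    return path


chain = follow_uplinks("ex:cho1", TRIPLES)
print(" -> ".join(chain))
```

Starting from a CHO, the last element of the returned path is the void:Dataset, provided that every uplink is present.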


Center of the Infrastructure

• Core: DM2E Model

• OMNOM and TYPES vocabulary:

– Describe transformation and contextualisation workflows.

• Specifications for

– URI schemes,

– the organisation of CHOs on different levels,

– methods to link to external authority data.


onto.dm2e.eu


Iterative Process (Ingestion Model)

1. Issues are tracked based on validation and test reports.

2. Changes to the model are collected and included in the current draft version of the model.

3. Providers adjust their mappings.

4. A new draft is published on a regular basis for additional feedback.

5. Based on the feedback, a new version of the model is released.

[Figure labels: Automatic validation, Ingestion Tests, Mapping creation]


Evaluation

• The model evaluation took place in April/May 2014

• Basis:

– 10 datasets

– Delivered by eight data providers

– Mapped by six different institutions

– Altogether 61,365,146 triples


Evaluation: Some Insights

• Many classes and properties are not mapped

– For example: edm:Event, dm2e:misattributed, edm:happenedAt, skos:hiddenLabel

– Some of these were asked for by providers!

– Conclusion: Unused classes and properties could be removed to simplify the model

• Different providers have different mapping styles

– Conclusion: Mapping recommendations are important!


Evaluation: Property Usage

• A few properties were used very often

• Most properties were rarely used (Long tail phenomenon)

• About a third of all properties were never used
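A usage analysis of this kind can be sketched in a few lines. The example below is an illustration only: the property list and triples are made up, and `property_usage` is a hypothetical helper, not the script used for the DM2E evaluation.

```python
from collections import Counter

# Hypothetical subset of model properties and a few made-up triples.
MODEL_PROPERTIES = {"dc:title", "dc:creator", "edm:happenedAt", "skos:hiddenLabel"}
TRIPLES = [
    ("ex:a", "dc:title", "A"),
    ("ex:b", "dc:title", "B"),
    ("ex:c", "dc:title", "C"),
    ("ex:a", "dc:creator", "ex:p1"),
]


def property_usage(triples, model_properties):
    """Count how often each model property occurs and list the unused ones."""
    counts = Counter(p for _, p, _ in triples if p in model_properties)
    unused = sorted(model_properties - set(counts))
    return counts, unused


counts, unused = property_usage(TRIPLES, MODEL_PROPERTIES)
for prop, n in counts.most_common():  # most frequently used first
    print(prop, n)
print("never used:", unused)
```

Sorting by frequency makes the long tail visible, and the set difference yields the properties that never occur in the data.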


RDF Application Profiles and Validation

• Important questions beyond the limits of DM2E.

• Initiation of a task group within the Dublin Core Metadata Initiative.

• Currently around 30 participants from 11 countries.

• Collaboration with W3C.

• Goals:

– Establish RDF Application Profiles to provide combinations and refinements of existing vocabularies or application profiles globally, but with a local context.

– Develop mechanisms to support the access to data using different application profiles.

– Support constraint definitions and automatic validation of RDF data.

• http://wiki.dublincore.org/index.php/RDF-Application-Profiles
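As an illustration of what a constraint definition with automatic validation could look like, here is a minimal sketch that checks mandatory properties per class. The constraint table `REQUIRED`, the triples, and the `validate` function are assumptions made for this example; they do not reproduce the DM2E validation rules or any task group specification.

```python
# Hypothetical constraints: class -> properties every instance must have.
REQUIRED = {"edm:ProvidedCHO": ["dc:title", "dc:type"]}

# Made-up data: ex:cho2 lacks dc:type.
TRIPLES = [
    ("ex:cho1", "rdf:type", "edm:ProvidedCHO"),
    ("ex:cho1", "dc:title", "Codex 1"),
    ("ex:cho1", "dc:type", "ex:Manuscript"),
    ("ex:cho2", "rdf:type", "edm:ProvidedCHO"),
    ("ex:cho2", "dc:title", "Codex 2"),
]


def validate(triples, required):
    """Return (resource, missing property) pairs for violated constraints."""
    present = {(s, p) for s, p, _ in triples}
    problems = []
    for s, p, o in triples:
        if p == "rdf:type" and o in required:
            for prop in required[o]:
                if (s, prop) not in present:
                    problems.append((s, prop))
    return problems


report = validate(TRIPLES, REQUIRED)
print(report)
```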


Ingestion


The DM2E Data Bridge


This is YOUR data.

This is the void:Dataset in DM2E.


Some more links are actually available...


Workflow Example: XSLT-Transformation

• XML → XSLT → RDF/XML → DM2E Store


MINT (NTUA)

• XSLT mapping editor, alignment to DM2E data model

• User support by context-sensitive lists of available elements:

– Appropriate classes for resources

– Consistency checks using domain and range specifications
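A consistency check based on domain and range specifications can be sketched as follows. This is not MINT's implementation: the `SCHEMA` entry (edm:happenedAt with domain edm:Event and range edm:Place), the type table, and the `check` function are assumptions for illustration, and the comparison is an exact type match without subclass reasoning.

```python
# Assumed domain and range declaration for one property.
SCHEMA = {"edm:happenedAt": {"domain": "edm:Event", "range": "edm:Place"}}

# Made-up resources with their declared classes.
TYPES = {"ex:event1": "edm:Event", "ex:pisa": "edm:Place", "ex:kai": "edm:Agent"}


def check(triple, schema, types):
    """Return error messages if subject or object violates domain or range."""
    s, p, o = triple
    spec = schema.get(p)
    if spec is None:
        return []  # no declaration known for this property
    errors = []
    if types.get(s) != spec["domain"]:
        errors.append(f"{s} is not a {spec['domain']}")
    if types.get(o) != spec["range"]:
        errors.append(f"{o} is not a {spec['range']}")
    return errors


ok = check(("ex:event1", "edm:happenedAt", "ex:pisa"), SCHEMA, TYPES)
bad = check(("ex:event1", "edm:happenedAt", "ex:kai"), SCHEMA, TYPES)
print(ok, bad)
```

An editor can use the same declarations in the other direction, offering only those classes in a selection list that satisfy the domain or range of the chosen property.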


Open Workflows

• Distributed infrastructure to ingest and create data in DM2E.

• 100% RDF, 100% REST

• Components:

– Input services (File services, D2R instances, OAI-PMH, ...)

– Transformation services (Generic XSLT, MINT, R2R)

– Ingestion services (Output of an ingestion pipeline)

– Contextualization services (Silk)

– Configuration services (MINT and Silk act as editors)


OmNom User Interface

• OmNom UI: Orchestration of web services

– Mapping and transformation

– Contextualisation


UI Integration


MINT

Silk

OmNom


Contextualisation


Contextualisation

• Silk: Link Discovery Framework (UMA)

• Definition of linkage rules to create links between Linked Data resources.

• http://context.dm2e.eu
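The idea behind a linkage rule can be illustrated with a small sketch: compare labels of two resource sets and emit a link when the similarity exceeds a threshold. This is plain Python and not Silk's rule language or API. The identifiers, the use of `difflib` as similarity measure, the threshold, and the choice of owl:sameAs as link predicate are all assumptions for this example.

```python
from difflib import SequenceMatcher

# Made-up source resources and entries of a hypothetical authority file.
SOURCE = {
    "ex:agent1": "Humboldt, Alexander von",
    "ex:agent2": "Wittgenstein, Ludwig",
}
TARGET = {
    "auth:1": "Humboldt, Alexander von",
    "auth:2": "Wittgenstein, Ludwig Josef Johann",
}


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_links(source, target, threshold=0.9):
    """Emit a link triple for every label pair at or above the threshold."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links


strict = generate_links(SOURCE, TARGET, threshold=0.9)
loose = generate_links(SOURCE, TARGET, threshold=0.7)
print(len(strict), len(loose))
```

A high threshold favours precision, while a lower one finds more candidate links at the risk of wrong matches.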


Integration of Silk

• Silk is integrated in OmNom as a web service


Use generated linkage rules

Generate links


Access to Contextualisation Results

• Contextualisation results (Linksets) are kept separate from ingested data.

• Linksets are further described and versioned (like datasets).

• Additional linkset properties:

– Automatically created,

– Manually created,

– Recall-oriented (exploratory, but may contain wrong links),

– Precision-oriented (incomplete, but high quality),

– ...


Contextualisation Resources


[Figure: contextualisation resources for places, agents, and subjects. Sources shown: Geonames, GND, LCSH, DBpedia, Freebase, DDC, LinkedGeoData]


Information for Contextualisation

• Structured data, often shallow

• Rich, but unstructured data


Contextualisation of Structured Data


[Figure: bar chart, vertical axis from 0% to 100%. Datasets: ONB Codices, ONB ABO, MPIWG Rara, MPIWG Harriot, UBFFM Sammlungen, GEI Digital, BBAW DTA, UIB WAB, UBER DINGLER. Series: New, Baseline, Potential]


Statement-level Provenance

• Generally, all statements for a resource like a CHO stem from the same data provider in DM2E and from one single data ingestion.

• But: Data about contextualisation resources (agents, places, subjects) are combined from different sources and contain additional links from the contextualisation process.

• We therefore have to deal with the provenance differently for them.


The "Oh, yeah?" Button


At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, "so how do I know I can trust this information?". The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons.

Source: http://www.w3.org/DesignIssues/UI.html

Page 31: 06 dm2 e_pisa-wp2-no-anim

Statement-level Provenance

• The "Oh, yeah?" button points you to the ingested dataset or linkset containing the statement.

• It also points you to the contextualisation resources which are linked to the same external resource. The latter is important because you can't provide the information under the URI of the external resource, as you can't add data to its representation.
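Both lookups can be sketched over a set of quads, where the fourth element names the graph (dataset or linkset) a statement was loaded into. The graph names, resources, the owl:sameAs link predicate, and both helper functions are hypothetical; this is an illustration of the idea, not the DM2E implementation.

```python
# Quads: (subject, predicate, object, named graph). All identifiers are made up.
QUADS = [
    ("ex:agent1", "skos:prefLabel", "Humboldt", "graph:dataset/provider-a/1"),
    ("ex:agent1", "owl:sameAs", "ext:humboldt", "graph:linkset/silk/1"),
    ("ex:agent7", "owl:sameAs", "ext:humboldt", "graph:linkset/manual/3"),
]


def graphs_for(statement, quads):
    """Named graphs (datasets or linksets) that contain the statement."""
    return sorted({g for s, p, o, g in quads if (s, p, o) == statement})


def co_linked(resource, quads, link_prop="owl:sameAs"):
    """Other resources that are linked to the same external resource."""
    externals = {o for s, p, o, _ in quads if s == resource and p == link_prop}
    return sorted(
        {s for s, p, o, _ in quads
         if p == link_prop and o in externals and s != resource}
    )


origin = graphs_for(("ex:agent1", "owl:sameAs", "ext:humboldt"), QUADS)
siblings = co_linked("ex:agent1", QUADS)
print(origin, siblings)
```

Keeping linksets in their own named graphs is what makes the first lookup possible: the graph name alone tells whether a statement came from an ingestion or from contextualisation.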


Great, where do I get it?

• Access to our data: http://data.dm2e.eu

• Documentation, Downloads: http://dm2e.eu

• DM2E in a Box:

– Virtual Machine Image (VirtualBox).

– Provides the full DM2E stack.

– Load your own data, browse it, annotate it.


Thank you.
