Kai Eckert: A Linked Data based Infrastructure for DM2E (WP2)
Kai Eckert, University of Mannheim
DM2E All WP Meeting, November 30th, 2012, Vienna

Page 1: Kai Eckert - A Linked Data based Infrastructure for DM2E

Kai Eckert
University of Mannheim

DM2E All WP Meeting
November 30th, 2012
Vienna

A Linked Data based Infrastructure for DM2E (WP2)

Page 2: Kai Eckert - A Linked Data based Infrastructure for DM2E

Agenda

Motivation: Follow the Linked Data Principles

Linked Data Architecture

Provenance in the Europeana Data Model

OAI-ORE vs. Named Graphs

Linked Data Publishing with Provenance

Page 3: Kai Eckert - A Linked Data based Infrastructure for DM2E

Architecture

Integrated Webclient

Page 4: Kai Eckert - A Linked Data based Infrastructure for DM2E

The Europeana Data Model (EDM)

Provenance realized by means of OAI-ORE.

Problems?

Users have to understand Proxies.

Users have to understand Aggregations.

Wouldn't named graphs be nicer?

Page 5: Kai Eckert - A Linked Data based Infrastructure for DM2E

Removing the proxies

Proxies are stand-in resources for the actual resources. Every data provider gets its "own" proxy to describe, as a placeholder for the actual resource.

Practical approach: we use named graphs to distinguish descriptions from different providers within our store.
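
A minimal sketch of this idea in plain Python, assuming a store that keeps each provider's triples in a separate named graph. The graph names, the resource URI, and the QuadStore class are hypothetical and only illustrate how two providers can describe the same resource without proxies.

```python
from collections import defaultdict


class QuadStore:
    """Toy in-memory store: each named graph holds a set of triples."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        self.graphs[graph].add((subject, predicate, obj))

    def descriptions(self, subject):
        """Return {graph name: triples about subject}, one entry per graph."""
        result = {}
        for graph, triples in self.graphs.items():
            matching = {t for t in triples if t[0] == subject}
            if matching:
                result[graph] = matching
        return result


# Hypothetical graph names and resource URI, for illustration only.
store = QuadStore()
item = "http://example.org/item/1"
graph_a = "http://example.org/graph/providerA"
graph_b = "http://example.org/graph/providerB"
store.add(graph_a, item, "dc:title", "Faust")
store.add(graph_b, item, "dc:title", "Faust. Eine Tragödie")

# Both providers describe the same URI; the graph name tells them apart.
by_provider = store.descriptions(item)
```

Because the provider is recoverable from the graph name, the resource itself needs no per-provider placeholder.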

Page 6: Kai Eckert - A Linked Data based Infrastructure for DM2E

Removing the Aggregations?

What is an aggregation?

"Aggregations are used in Europeana to represent the complex constructs that are provided by contributors. An aggregation is associated to the object that it is about, by the property edm:aggregatedCHO."

Level of aggregation:

One aggregation per edm:ProvidedCHO.

EuropeanaAggregation aggregates other aggregations (from data providers).

Page 7: Kai Eckert - A Linked Data based Infrastructure for DM2E

A Named Graph per Resource

Corresponds to the EDM Aggregations.

Fine-grained... feasible?

Named Graphs as first-class members in the model.

Statements about the aggregation that are only valid for one resource!

If we allow this, the named graph must never get lost!

Page 8: Kai Eckert - A Linked Data based Infrastructure for DM2E

A Named Graph per Collection

This information must not get lost either.

But: It is not only valid for one resource. We are now more flexible regarding the publication of the data.

But: Where are the aggregations?

Page 9: Kai Eckert - A Linked Data based Infrastructure for DM2E

Page 10: Kai Eckert - A Linked Data based Infrastructure for DM2E

One Named Graph per Provided Dataset

Naturally fits the provenance requirements: All statements stem from some dataset.

Positive aspect: Data providers do not have to care any more!

Page 11: Kai Eckert - A Linked Data based Infrastructure for DM2E

Provenance and Versioning on Collection Level

Page 12: Kai Eckert - A Linked Data based Infrastructure for DM2E

Overlapping Resource Descriptions

Page 13: Kai Eckert - A Linked Data based Infrastructure for DM2E

Crosswalk to EDM

Are we still backwards compatible?

YES :-)

Page 14: Kai Eckert - A Linked Data based Infrastructure for DM2E

Publishing

Page 15: Kai Eckert - A Linked Data based Infrastructure for DM2E

What's inside our store?

RDF Datasets per Collection, organized in Named Graphs.

NG URI scheme:

http://data.dm2e.eu/data/collection/[provider]/[collectionId]/[version]

Additional provenance statements for each collection.
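
The URI scheme above can be sketched as a small helper. This is only an illustration: the function name, the example values, and the choice to percent-encode each path segment are assumptions, not part of the DM2E specification.

```python
from urllib.parse import quote

BASE = "http://data.dm2e.eu/data"


def collection_graph_uri(provider, collection_id, version):
    """Build a named graph URI of the form
    BASE/collection/[provider]/[collectionId]/[version].

    Each segment is percent-encoded so that a slash inside a value
    cannot change the path structure (an assumption of this sketch).
    """
    segments = [quote(str(part), safe="") for part in (provider, collection_id, version)]
    return f"{BASE}/collection/" + "/".join(segments)


# Hypothetical provider, collection, and version values.
graph_uri = collection_graph_uri("providerA", "collection1", 1)
```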

Page 16: Kai Eckert - A Linked Data based Infrastructure for DM2E

Make it available

Web documents (each with a URI) deliver RDF; provenance is included as statements about that URI.

On the client side, each document becomes a new Named Graph, with the document URI as its name.
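
A rough sketch of the client-side behaviour, assuming the document has already been fetched and parsed into triples (HTTP retrieval and RDF parsing are left out). The function names and example values are hypothetical.

```python
def load_document(dataset, document_uri, triples):
    """Store the triples delivered by document_uri in a named graph
    whose name is the document URI itself."""
    dataset.setdefault(document_uri, set()).update(triples)
    return dataset


def provenance_of(dataset, document_uri):
    """Provenance is whatever the graph states about its own URI."""
    graph = dataset.get(document_uri, set())
    return {(p, o) for (s, p, o) in graph if s == document_uri}


# Hypothetical document URI and content.
doc = "http://data.dm2e.eu/data/collection/providerA/collection1/1"
item = "http://data.dm2e.eu/data/resource/providerA/collection1/item7"
dataset = load_document({}, doc, [
    (doc, "dc:creator", "providerA"),   # statement about the document
    (item, "dc:title", "Example item"),  # statement about a resource
])
provenance = provenance_of(dataset, doc)
```

Since the graph name and the subject of the provenance statements are the same URI, the client can keep data and provenance together without any extra bookkeeping.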

Page 17: Kai Eckert - A Linked Data based Infrastructure for DM2E

RESTful API (Publishing)

http://data.dm2e.eu/data/...

... collection/[provider]/[collectionID]/[version] (dm2e:Collection) => dump of one whole ingested dataset

... resource/[provider]/[collectionId]/[identifier] (edm:ProvidedCHO) => 303 to latest version of describing Aggregation

... collection/[provider]/[collectionID]/[version]/[identifier] (ore:Aggregation) => data about a single resource

... linkset/[provider]/[linksetID]/[version] (dm2e:LinkSet) => generated links

... linkset/[provider]/[linksetID]/[version]/[provider]/[collectionID]/[identifier] (dm2e:LinkAggregation) => links for a specific resource
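
The path patterns above could be dispatched as in the following sketch. It is not the DM2E implementation: the function names and field names are made up, identifiers are assumed to contain no slashes, and versions are assumed to be integers so that the latest one can be picked with max().

```python
from urllib.parse import urlparse

BASE = "http://data.dm2e.eu/data"

# (first path segment, number of remaining segments) -> (type, field names)
PATTERNS = {
    ("collection", 3): ("dm2e:Collection", ("provider", "collection", "version")),
    ("collection", 4): ("ore:Aggregation", ("provider", "collection", "version", "identifier")),
    ("resource", 3): ("edm:ProvidedCHO", ("provider", "collection", "identifier")),
    ("linkset", 3): ("dm2e:LinkSet", ("provider", "linkset", "version")),
    ("linkset", 6): ("dm2e:LinkAggregation",
                     ("provider", "linkset", "version",
                      "target_provider", "collection", "identifier")),
}


def classify(uri):
    """Return (type, fields) for a URI that follows the patterns above."""
    path = urlparse(uri).path
    if not path.startswith("/data/"):
        raise ValueError(f"not a data URI: {uri}")
    parts = path[len("/data/"):].strip("/").split("/")
    key = (parts[0], len(parts) - 1)
    if key not in PATTERNS:
        raise ValueError(f"unknown URI pattern: {uri}")
    kind, names = PATTERNS[key]
    return kind, dict(zip(names, parts[1:]))


def redirect_target(resource_uri, known_versions):
    """Location for the 303 response: the Aggregation in the latest version."""
    kind, f = classify(resource_uri)
    if kind != "edm:ProvidedCHO":
        raise ValueError("only resource URIs are redirected")
    latest = max(known_versions)
    return f"{BASE}/collection/{f['provider']}/{f['collection']}/{latest}/{f['identifier']}"


# Hypothetical provider, collection, and identifier values.
target = redirect_target(f"{BASE}/resource/providerA/collection1/item7", [1, 2, 3])
```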

Page 18: Kai Eckert - A Linked Data based Infrastructure for DM2E

Provenance in Documents

Generated from provenance information about datasets:

dc:creator => Data provider

dc:date => Timestamp

dm2e:version => version number

dm2e:nextVersion => link to next version of the document

dm2e:previousVersion => link to previous version

dm2e:links => link to a linkset

Optional: PROV statements for full provenance chain.

Maintained by the DM2E infrastructure.

Version always means the version of the underlying dataset.
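
As an illustration, the provenance statements for one document could be generated from dataset-level information like this. The function and its parameters are hypothetical; only the property names come from the list above, and the optional PROV statements are not covered.

```python
def provenance_statements(document_uri, provider, timestamp, version,
                          previous_uri=None, next_uri=None, linkset_uri=None):
    """Return the provenance triples about a document.

    The version is the version of the underlying dataset; links to
    neighbouring versions and to a linkset are added only if known.
    """
    statements = [
        (document_uri, "dc:creator", provider),
        (document_uri, "dc:date", timestamp),
        (document_uri, "dm2e:version", version),
    ]
    optional = (
        ("dm2e:previousVersion", previous_uri),
        ("dm2e:nextVersion", next_uri),
        ("dm2e:links", linkset_uri),
    )
    statements += [(document_uri, p, o) for p, o in optional if o is not None]
    return statements


# Hypothetical values for the second version of a collection.
base = "http://data.dm2e.eu/data/collection/providerA/collection1"
triples = provenance_statements(
    f"{base}/2", "providerA", "2012-11-30T10:00:00Z", 2,
    previous_uri=f"{base}/1",
)
```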

Page 19: Kai Eckert - A Linked Data based Infrastructure for DM2E

Consuming our Data (WP1 and WP3)

Fetch data from our URIs.

Fetch suitable linksets from our URIs (links provided with data).

Local data cleansing (recommended for WP3): Unify all URIs based on owl:sameAs links for better local querying (or use reasoning).

Client has to maintain a mapping for original URIs per Named Graph for the proper representation of annotations.
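
The local cleansing step could look roughly like the sketch below: a union-find over the owl:sameAs links picks one representative URI per group, and a per-graph mapping records which original URIs were replaced. Function names and the choice of the lexicographically smallest URI as representative are assumptions of this sketch.

```python
def canonical_map(same_as_links):
    """Map every URI that occurs in a sameAs link to one representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Arbitrary but deterministic: the smaller URI becomes the root.
            low, high = sorted((ra, rb))
            parent[high] = low
    return {uri: find(uri) for uri in parent}


def unify(dataset, same_as_links):
    """Rewrite subjects and objects to their representative URI.

    Returns (unified dataset, originals), where originals[graph] maps a
    representative URI back to the original URIs used in that graph.
    """
    canon = canonical_map(same_as_links)
    unified, originals = {}, {}
    for graph, triples in dataset.items():
        unified[graph] = set()
        originals[graph] = {}
        for s, p, o in triples:
            cs, co = canon.get(s, s), canon.get(o, o)
            unified[graph].add((cs, p, co))
            for original, canonical in ((s, cs), (o, co)):
                if original != canonical:
                    originals[graph].setdefault(canonical, set()).add(original)
    return unified, originals


# Hypothetical graphs and URIs.
dataset = {
    "g1": {("http://a.example/x", "dc:title", "X")},
    "g2": {("http://b.example/x", "dc:creator", "Y")},
}
links = [("http://b.example/x", "http://a.example/x")]
unified, originals = unify(dataset, links)
```

With the originals mapping, an annotation made on the unified data can still be expressed against the URI that the named graph originally used.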

Page 20: Kai Eckert - A Linked Data based Infrastructure for DM2E

Annotations

Annotations always on resource or statement level.

Subject of an annotation:

[Graph URI]/URLencode(resource URI) or [Graph URI]/URLencode(subject,predicate,object)

Example:

http://data.dm2e.eu/data/collection/[provider]/[collectionID]/[version]/[identifier]/[subject,predicate,object]

Similar to XPointer, SharedCanvas, ...
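
One possible reading of this scheme, sketched in Python. The slide does not say how subject, predicate, and object are combined before encoding; here each term is percent-encoded on its own and the terms are joined with literal commas, which is an assumption. The function names and example values are hypothetical.

```python
from urllib.parse import quote, unquote


def annotation_subject(graph_uri, *terms):
    """Build [Graph URI]/URLencode(resource URI) for one term, or
    [Graph URI]/URLencode(subject,predicate,object) for three terms."""
    if len(terms) not in (1, 3):
        raise ValueError("expected a resource URI or a (subject, predicate, object) triple")
    encoded = ",".join(quote(term, safe="") for term in terms)
    return graph_uri.rstrip("/") + "/" + encoded


def annotation_target(graph_uri, subject_uri):
    """Inverse of annotation_subject: recover the annotated term(s)."""
    prefix = graph_uri.rstrip("/") + "/"
    if not subject_uri.startswith(prefix):
        raise ValueError("annotation subject does not belong to this graph")
    return tuple(unquote(part) for part in subject_uri[len(prefix):].split(","))


# Hypothetical graph and statement.
graph = "http://data.dm2e.eu/data/collection/providerA/collection1/1/item7"
statement = ("http://example.org/item/1", "dc:title", "Faust, Part One")
on_resource = annotation_subject(graph, statement[0])
on_statement = annotation_subject(graph, *statement)
```

Encoding each term separately keeps a comma inside a literal (as in the example title) from being confused with the separator between terms.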

Page 21: Kai Eckert - A Linked Data based Infrastructure for DM2E

What is missing?

Several definitions of dm2e: terms.

A vocabulary for the annotations:

Requirements of the scientists

Mark wrong statements!

Comment on statements.

...

A vocabulary for the classification of linksets:

Automatically created

Manually created

Recall-oriented (exploratory, but with wrong links)

Precision-oriented (incomplete, but high quality)

...

...

Page 22: Kai Eckert - A Linked Data based Infrastructure for DM2E

Implementation pending ;-)

Questions?

Suggestions?