Translation of Relational and Non-Relational Databases into RDF with xR2RML

F. Michel, L. Djimenou, C. Faron-Zucker, J. Montagnat
I3S lab, CNRS, Univ. Nice Sophia

Web-scale data integration

Web of data: publication and interlinking of open datasets
• Goal: publish heterogeneous data in a common format (RDF)

Driven by data integration initiatives, e.g.:
• Linking Open Data, 1015 datasets
• W3C Data Activity
• BIO2RDF, 35 datasets
• Neuroscience Information Framework (12598 registry entries)

(Data: Apr. 2015)

Figure: Linked Datasets as of Aug. 30th 2014. (c) R. Cyganiak and A. Jentzsch

Web-scale data integration

Need to access data from the Deep Web [1]
• Structured and unstructured data hardly indexed by search engines, hardly linked with other data sources

Exponential data growth goes on
• Various types of DBs: RDB, NoSQL, NewSQL, native XML, LDAP directory, OODB...
• Heterogeneous data models and query capabilities

[1] B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Accessing the deep web. Communications of the ACM, 50(5):94–101, 2007

Web-scale data integration

To enrich the web of data with existing and new data being created ever faster...

... we need standardized approaches to enable the translation of heterogeneous data sources to RDF

Agenda

• Previous works
• Background: R2RML and RML
• Description of xR2RML
• Evaluation and perspectives

Previous works

Much work achieved on RDBs
• D2RQ, Virtuoso, R2RML (W3C)…
• Goals: generic RDB-to-RDF, OBDA, ontology learning, schema mapping…
• Methods: direct mapping vs. domain-specific mapping, materialization vs. SPARQL-to-SQL query rewriting

XML: using either XPath (RML), XQuery (XSPARQL, SPARQL2XQuery) or XSLT (Scissor-Lift), XSD-to-OWL (SPARQL2XQuery)

CSV/TSV/Spreadsheets: CSV on the web (W3C WG)

JSON: using JSONPath (RML)

Integration frameworks: DataLift, RML, Asio Tool Suite…

Previous works

Existing approaches map specific types of databases or specific data formats to RDF
• Each comes with its own mapping language or UI
• Supporting a new system (data model and QL) is not straightforward

No unified mapping language applies equally to most common databases (RDB, NoSQL, XML, LDAP, OO…)

Goal: supporting a new data model and/or QL requires developing a DB connector, but no change in the mapping language

R2RML – RDB To RDF Mapping Language

W3C recommendation, 2012

Goals:
• Describe mappings of relational entities to RDF
• Reuse of existing ontologies
• Operationalization not addressed

How: TriplesMaps (TM) define how to generate RDF triples
• 1 logical table: the rows to process
• 1 subject map: the subject IRIs
• N (predicate map, object map) couples
• 1 optional graph map: the graph IRIs

An R2RML mapping is an RDF graph

R2RML – RDB To RDF Mapping Language

Table Study (Centre_Id is a foreign key to Centre.Id):

Id   Acronym   Centre_Id
10   CAC2010   4

Table Centre:

Id   Name      address
4    Pasteur   ...

R2RML mapping graph:

<#Centre> a rr:TriplesMap;
    rr:logicalTable [ rr:tableName "Centre" ];
    rr:subjectMap [
        rr:class ex:Centre;
        rr:template "http://example.org/centre#{Name}";
    ].

<#Study> a rr:TriplesMap;
    rr:logicalTable [ rr:tableName "Study" ];
    rr:subjectMap [
        rr:class ex:Study;
        rr:template "http://example.org/study#{Id}";
    ];
    rr:predicateObjectMap [
        rr:predicate ex:hasName;
        rr:objectMap [ rr:column "Acronym" ];
    ];
    rr:predicateObjectMap [
        rr:predicate ex:locatedIn;
        rr:objectMap [
            rr:parentTriplesMap <#Centre>;
            rr:joinCondition [
                rr:child "Centre_Id";
                rr:parent "Id";
            ];
        ];
    ].

Produced RDF:

<http://example.org/centre#Pasteur> a ex:Centre.

<http://example.org/study#10> a ex:Study;
    ex:hasName "CAC2010";
    ex:locatedIn <http://example.org/centre#Pasteur>.
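To make the semantics of the two TriplesMaps concrete, here is a minimal Python sketch that applies them by hand to in-memory copies of the Study and Centre rows. It is an illustration under assumptions, not an R2RML processor: the helper names (`expand`, `centre_triples`, `study_triples`) are hypothetical, triples are plain tuples, and the IRI-safe encoding that R2RML applies to template values is left out.

```python
import re

# In-memory stand-ins for the two relational tables of the example.
STUDY = [{"Id": 10, "Acronym": "CAC2010", "Centre_Id": 4}]
CENTRE = [{"Id": 4, "Name": "Pasteur", "address": "..."}]


def expand(template, row):
    """Replace each {column} placeholder with the row's value, as rr:template does."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def centre_triples(centres):
    """<#Centre>: one subject IRI per row, typed ex:Centre."""
    for row in centres:
        subject = expand("http://example.org/centre#{Name}", row)
        yield (subject, "rdf:type", "ex:Centre")


def study_triples(studies, centres):
    """<#Study>: class, column-valued object, and a join to <#Centre>."""
    for row in studies:
        subject = expand("http://example.org/study#{Id}", row)
        yield (subject, "rdf:type", "ex:Study")
        yield (subject, "ex:hasName", row["Acronym"])
        # rr:joinCondition: child column Centre_Id equals parent column Id.
        for parent in centres:
            if parent["Id"] == row["Centre_Id"]:
                obj = expand("http://example.org/centre#{Name}", parent)
                yield (subject, "ex:locatedIn", obj)


triples = list(centre_triples(CENTRE)) + list(study_triples(STUDY, CENTRE))
for triple in triples:
    print(triple)
```

The join is written as a nested loop for readability; an actual engine would more likely push it to the database as a SQL join.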

RML extensions to R2RML

XML document:

<centres>
  <centre Id="4"> <name>Pasteur</name> </centre>
  <centre Id="6"> <name>Pontchaillou</name> </centre>
</centres>

RML mapping graph:

<#Centre>
    rml:logicalSource [
        rml:source "http://example.org/Centres.xml";
        rml:referenceFormulation ql:XPath;
        rml:iterator "/centres/centre";
    ];
    rr:subjectMap [
        rr:class ex:Centre;
        rr:template "http://example.org/centre#{//centre/@Id}";
    ];
    rr:predicateObjectMap [
        rr:predicate ex:hasName;
        rr:objectMap [ rml:reference "//centre/name" ];
    ].

Advantages:
• Extends to CSV, JSON, XML sources
• Map several sources simultaneously

Limitations:
• Fixed list of reference formulations
• No distinction between reference formulation and query language
• No RDF collections
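The iterator-plus-reference idea of the RML mapping can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. This is only an illustration: the function name `rml_centres` is hypothetical, the XML is inlined rather than fetched from the `rml:source` URL, and references are evaluated relative to each iterated `centre` node, which is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

XML = """<centres>
  <centre Id="4"><name>Pasteur</name></centre>
  <centre Id="6"><name>Pontchaillou</name></centre>
</centres>"""


def rml_centres(xml_text):
    """Produce (subject, predicate, object) tuples for each <centre> element."""
    root = ET.fromstring(xml_text)
    triples = []
    # rml:iterator "/centres/centre": root is <centres>, so iterate its <centre> children.
    for centre in root.findall("centre"):
        # Subject template built from the Id attribute of the current node.
        subject = "http://example.org/centre#" + centre.get("Id")
        triples.append((subject, "rdf:type", "ex:Centre"))
        # Reference to the <name> child of the current node.
        triples.append((subject, "ex:hasName", centre.findtext("name")))
    return triples


for triple in rml_centres(XML):
    print(triple)
```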

xR2RML - Overall picture

Flexible language to describe mappings from most common types of DB to RDF. Extends R2RML and leverages RML extensions.

[Figure: architecture diagram. Components: xR2RML translation engine, xR2RML mapping description, source database, native QL, domain ontologies. Edge labels: "refers to", "uses".]

xR2RML: Logical source

XML database supporting XQuery:

<centres>
  <centre Id="4"> <name>Pasteur</name> </centre>
  <centre Id="6"> <name>Pontchaillou</name> </centre>
</centres>

xR2RML mapping graph (rr: R2RML vocabulary, xrr: xR2RML vocabulary):

<#Centre>
    xrr:logicalSource [
        xrr:query '''for $x in doc("centres.xml")/centres/centre
                     where ... return $x''';
    ];

xR2RML: Data element references

XML database supporting XQuery:

<centres>
  <centre Id="4"> <name>Pasteur</name> </centre>
  <centre Id="6"> <name>Pontchaillou</name> </centre>
</centres>

xR2RML mapping graph (rr: R2RML vocabulary, xrr: xR2RML vocabulary):

<#Centre>
    xrr:logicalSource [
        xrr:query '''for $x in doc("centres.xml")/centres/centre
                     where ... return $x''';
    ];
    rr:subjectMap [
        rr:class ex:Centre;
        rr:template "http://example.org/centre#{//centre/@Id}";
    ];
    rr:predicateObjectMap [
        rr:predicate ex:hasName;
        rr:objectMap [ xrr:reference "//centre/name" ];
    ].

xR2RML engine usage guidelines:

Types of DB            xrr:query               xrr:reference, rr:template
RDB, column stores     SQL, CQL, HQL           Column name
Native XML DB          XQuery                  XPath
NoSQL document store   Proprietary, JS-based   JSONPath
SPARQL endpoint        SPARQL                  Variable name, column name (s, p, o)
Neo4J (graph DB)       Cypher                  Column name (s, p, o)
LDAP directory         LDAP query              Attribute name
...                    ...                     ...

xR2RML: multiple values vs. RDF list/container

JSON document:

{ "studyid": 10, "acronym": "CAC2010",
  "centres": [ { "centreid": 4, "name": "Pasteur" }, { "centreid": 6, "name": "Pontchaillou" } ] }

Mapping case: link the study with the centres it involves

Either as multiple values:

<http://example.org/study#10> ex:involves "Pasteur".
<http://example.org/study#10> ex:involves "Pontchaillou".

Or as an RDF list:

<http://example.org/study#10> ex:involvesCenters ( "Pasteur" "Pontchaillou" ).

The RDF list is obtained by setting an xR2RML term type on the object map:

rr:objectMap [
    xrr:reference "$.centres.*.name";
    rr:termType xrr:RdfList;
];

R2RML term types: rr:IRI, rr:Literal, rr:BlankNode

xR2RML term types: xrr:RdfList, xrr:RdfSeq, xrr:RdfBag, xrr:RdfAlt
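The difference between the two outputs can be illustrated with a short Python sketch over the JSON document of this example. The function names are hypothetical, `centre_names` is a hand-written stand-in for the JSONPath expression `$.centres.*.name`, and the blank node labels (`_:b0`, `_:b1`) are arbitrary. The point is only that multiple values yield one triple per value, whereas an RDF list yields a single object that heads an `rdf:first`/`rdf:rest` chain.

```python
import json

DOC = json.loads("""{ "studyid": 10, "acronym": "CAC2010",
  "centres": [ { "centreid": 4, "name": "Pasteur" },
               { "centreid": 6, "name": "Pontchaillou" } ] }""")


def centre_names(doc):
    """Hand-written equivalent of the JSONPath $.centres.*.name."""
    return [centre["name"] for centre in doc["centres"]]


def as_multiple_values(subject, predicate, values):
    """Default behaviour: one triple per value."""
    return [(subject, predicate, value) for value in values]


def as_rdf_list(subject, predicate, values):
    """List behaviour: a chain of blank nodes linked by rdf:rest, closed by rdf:nil."""
    if not values:
        return [(subject, predicate, "rdf:nil")]
    nodes = [f"_:b{i}" for i in range(len(values))]
    triples = [(subject, predicate, nodes[0])]
    for i, value in enumerate(values):
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", value))
        triples.append((nodes[i], "rdf:rest", rest))
    return triples


study = "http://example.org/study#10"
names = centre_names(DOC)
print(as_multiple_values(study, "ex:involves", names))
print(as_rdf_list(study, "ex:involvesCenters", names))
```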

xR2RML: nested collections

From structured values (XML, JSON...): nested collections and key-value associations...

... to RDF: generate nested lists/containers, qualify members (data type, language tag...)

E.g.: produce a list of lists of strings

rr:objectMap [
    xrr:reference "...";
    rr:termType xrr:RdfList;
    xrr:nestedTermMap [
        xrr:reference "...";
        rr:termType xrr:RdfList;
        xrr:nestedTermMap [ rr:datatype xsd:string; ];
    ];
];

( ( "John"^^xsd:string "Bob"^^xsd:string ) ( "Ted"^^xsd:string "Mark"^^xsd:string ) )
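The recursive structure of nested term maps mirrors a recursive traversal of the source value. The sketch below, with a hypothetical function `to_turtle`, renders nested Python lists as nested Turtle collections and qualifies the leaves with a datatype. It assumes the input values have already been extracted from the source, and it does not escape quotes inside strings.

```python
def to_turtle(value, datatype="xsd:string"):
    """Render nested Python lists as nested Turtle collections of typed literals."""
    if isinstance(value, list):
        # One nesting level corresponds to one xrr:nestedTermMap with an RdfList term type.
        inner = " ".join(to_turtle(member, datatype) for member in value)
        return f"( {inner} )"
    # Innermost nested term map: qualify the member with a datatype.
    return f'"{value}"^^{datatype}'


nested = [["John", "Bob"], ["Ted", "Mark"]]
print(to_turtle(nested))
# ( ( "John"^^xsd:string "Bob"^^xsd:string ) ( "Ted"^^xsd:string "Mark"^^xsd:string ) )
```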

xR2RML: cross-references

MongoDB database:

Collection "studies":
{ "studyid": 10, "acronym": "CAC2010", "centres": [ 4, 6 ] }

Collection "centres":
{ "centreid": 4, "name": "Pasteur" },
{ "centreid": 6, "name": "Pontchaillou" }

xR2RML mapping graph:

<#Centre>
    xrr:logicalSource [ ... ];
    rr:subjectMap [ ... ].

<#Study>
    xrr:logicalSource [ ... ];
    rr:subjectMap [ ... ];
    rr:predicateObjectMap [
        rr:predicate ex:involvesSeq;
        rr:objectMap [
            rr:parentTriplesMap <#Centre>;
            rr:joinCondition [
                rr:child "$.centres.*";
                rr:parent "$.centreid";
            ];
            rr:termType xrr:RdfSeq;
        ];
    ].

Produced RDF:

<http://example.org/study#10> ex:involvesSeq
    [ a rdf:Seq;
      rdf:_1 <http://example.org/centre#Pasteur>;
      rdf:_2 <http://example.org/centre#Pontchaillou>; ].

The join query is pushed to the DB if supported, and performed by the xR2RML engine otherwise
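When the database cannot evaluate the join itself, the engine has to do it. A minimal Python sketch of such an engine-side join over the two collections follows. The function name `involves_seq` and the blank node labels are hypothetical, and the centre subject template (`centre#` followed by the name) is an assumption inferred from the produced RDF, since the subject map is elided in the mapping.

```python
STUDIES = [{"studyid": 10, "acronym": "CAC2010", "centres": [4, 6]}]
CENTRES = [{"centreid": 4, "name": "Pasteur"},
           {"centreid": 6, "name": "Pontchaillou"}]


def involves_seq(studies, centres):
    """Join studies to centres and emit one rdf:Seq per study."""
    # Index the parent documents by the parent reference $.centreid.
    by_id = {centre["centreid"]: centre for centre in centres}
    triples = []
    for n, study in enumerate(studies):
        subject = f"http://example.org/study#{study['studyid']}"
        seq = f"_:seq{n}"
        triples.append((subject, "ex:involvesSeq", seq))
        triples.append((seq, "rdf:type", "rdf:Seq"))
        # The child reference $.centres.* yields each id; keep ids that have a parent.
        matches = [by_id[cid] for cid in study["centres"] if cid in by_id]
        for i, centre in enumerate(matches, start=1):
            obj = f"http://example.org/centre#{centre['name']}"
            triples.append((seq, f"rdf:_{i}", obj))
    return triples


for triple in involves_seq(STUDIES, CENTRES):
    print(triple)
```

Indexing the parent collection once keeps the join linear in the number of documents, which matters when the engine, not the database, performs it.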

xR2RML: content with mixed formats

Data with mixed content: relational table "STAFF", column "Name" contains JSON data:

...   Name                                          ...
...   { "FirstName": "Bob", "LastName": "Smith" }   ...

xR2RML mapping graph:

<#Centre>
    xrr:logicalSource [ xrr:sourceName "STAFF"; ];
    ...
    rr:predicateObjectMap [
        rr:predicate ex:first-name;
        rr:objectMap [ xrr:reference "Column(Name)/JSONPath($.FirstName)" ];
    ];

Syntax path constructors per data format:

Data format   Syntax path constructor
Row           Column(), CSV(), TSV()
XML           XPath()
JSON          JSONPath()
...           ...
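A mixed-format reference such as `Column(Name)/JSONPath($.FirstName)` can be read as a pipeline: each path constructor is applied to the value produced by the previous one. The Python sketch below illustrates this reading. The function `eval_mixed_path` is hypothetical, it handles only `Column()` and a dotted-field subset of `JSONPath()`, and it is not how Morph-xR2RML is implemented.

```python
import json
import re


def eval_mixed_path(row, path):
    """Evaluate a path such as Column(Name)/JSONPath($.FirstName) on one row."""
    value = row
    # Each step looks like Constructor(argument); apply the steps left to right.
    for name, arg in re.findall(r"(\w+)\(([^)]*)\)", path):
        if name == "Column":
            value = value[arg]
        elif name == "JSONPath":
            # The column holds JSON text: parse it, then follow the dotted fields.
            value = json.loads(value) if isinstance(value, str) else value
            for key in arg.lstrip("$").strip(".").split("."):
                if key:
                    value = value[key]
        else:
            raise ValueError(f"unsupported path constructor: {name}")
    return value


row = {"Name": '{ "FirstName": "Bob", "LastName": "Smith" }'}
print(eval_mixed_path(row, "Column(Name)/JSONPath($.FirstName)"))  # Bob
```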

Use case in Digital Humanities

Use case: study the history and transmission of zoological knowledge along historical periods

TAXREF taxonomical reference
• Designed to support studies in Conservation Biology, enriched with bioarchaeological taxa
• Maintained by the French National Museum of Natural History
• ~450,000 terms, CSV/JSON/XML

Use case in Digital Humanities

Ongoing work [2]: construction of a SKOS* thesaurus based on TAXREF
• Import of TAXREF/JSON into MongoDB
• Use of Morph-xR2RML, the prototype implementation of xR2RML, to convert the MongoDB data to RDF
• Make alignments with existing well-adopted ontologies (e.g. NCBI Taxonomic Classification, GeoNames...)
  • Static alignments at mapping design time
  • Using automatic alignment methods

* SKOS: Simple Knowledge Organization System, a W3C RDF-based standard to represent controlled vocabularies, taxonomies and thesauri. It bridges the gap between existing KOS and the Semantic Web and Linked Data.

Perspectives

Ongoing discussion about the use of xR2RML to support ecology and agronomic studies
• Large phenotype databases

Consider the query rewriting approach to support large datasets

How to write xR2RML mappings
• Automatic xR2RML mapping generation from data schema (XSD/DTD, JSON schema, JSON-LD...)
• Schema mapping
• Schema discovery

Conclusions

The data deluge keeps accelerating

Data is stored in many kinds of DBs

xR2RML:
• Flexible language to map most common types of database to RDF
• Supports various data models and query languages
• Rich features: RDF collections/containers, joins, content with mixed formats

Applied to the construction of a SKOS thesaurus of TAXREF, a taxonomical reference

Contacts:

Franck Michel

Johan Montagnat

Catherine Faron-Zucker

[2] C. Callou, F. Michel, C. Faron-Zucker, C. Martin, J. Montagnat. Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archaeozoology and Conservation Biology. In SW4SH workshop, ESWC’15.

[3] F. Michel, L. Djimenou, C. Faron-Zucker, and J. Montagnat. xR2RML: Non-Relational Databases to RDF Mapping Language. Research report. ISRN I3S/RR 2014-04-FR. http://hal.archives-ouvertes.fr/hal-01066663

https://github.com/frmichel/morph-xr2rml/