Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store

Natanael Arndt, Norman Radtke and Michael Martin

SEMANTiCS 2016, Leipzig

September 14, 2016



Problem & Motivation

2 / 23

Problem & Motivation

[Figure: "Linked Datasets as of August 2014" (Public LOD Cloud); datasets are cloned into an Enterprise Workspace and enriched there 😱]

3 / 23

Problem & Motivation

Remark (Co-Evolution): The process of datasets evolving simultaneously and separately from each other while influencing each other's evolution

4 / 23

Problem & Motivation

Usage of public LOD as background knowledge
Mobile use cases
Distributed collaboration on RDF datasets

⇒ Support for multiple versions of the same dataset at the same time

5 / 23

Approach

The same problem exists for source code repositories
For around 10 years, distributed version control systems have been solving this problem
Multiple working copies exist at the same time and can be synchronized

[Figure: five Server/Client nodes]

6 / 23

Approach

Git is successful in software development

We have decided to see if this also works for RDF
So we have put RDF into the repositories


7 / 23


Methodology

[Figure: Quit Store architecture: an RDF Quad Store with a SPARQL 1.1 Interface for Query & Update]

Read/write interface
Translating read/write operations to versioning
Synchronizes the store with the current working copy

8 / 23

Methodology: Serialization of RDF data

Multiple RDF serialization formats are available
For versioning with Git we need:

Same RDF graph = same representation
Minimal difference between versions
Meaningful difference between versions

⇒ We have chosen a canonicalized N-Quads serialization
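A minimal sketch of such a canonicalization in Python follows. It assumes the data is already N-Quads with one statement per line and no blank nodes. The function name `canonicalize_nquads` is hypothetical. Sorting and de-duplicating the lines gives the same text for the same set of statements, so Git's line-based diff shows only statements that actually changed. Whitespace and escapes inside a statement are not normalized here; a full canonicalization would re-serialize every term.

```python
def canonicalize_nquads(text: str) -> str:
    """Canonicalize N-Quads text by sorting and de-duplicating its statements.

    Assumes one statement per line and no blank nodes; whitespace inside a
    statement is not normalized.
    """
    statements = {
        line.strip()
        for line in text.splitlines()
        # Drop empty lines and comment lines (N-Quads comments start with '#').
        if line.strip() and not line.strip().startswith("#")
    }
    # Sorted output: the same set of statements always yields the same text.
    return "".join(statement + "\n" for statement in sorted(statements))


if __name__ == "__main__":
    v1 = '<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .\n<urn:ex:Tilia> <urn:ex:age> "1000" .\n'
    v2 = '<urn:ex:Tilia> <urn:ex:age> "1000" .\n\n<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .\n'
    print(canonicalize_nquads(v1) == canonicalize_nquads(v2))  # True
```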

9 / 23

Methodology: Blank Nodes in Versioning

With RDF as exchange format, blank nodes are still a problem
Blank node identifiers only have a local scope,
… are not persistent or portable identifiers,
… are purely an artifact of the serialization

We follow the recommendation of RDF 1.1 to replace blank nodes with IRIs

([Cyganiak et al., 2014] sections 3.4 and 3.5)
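RDF 1.1 Concepts (section 3.5) recommends that systems replacing blank nodes with IRIs (Skolemization) use well-known IRIs whose path starts with `/.well-known/genid/`. Below is a minimal sketch for line-based N-Quads input. The function name `skolemize` and the default authority `http://example.org` are hypothetical placeholders; a real deployment would use an authority it controls. Because blank node labels are scoped to one document, the label-to-IRI mapping is built once per call.

```python
import re
import uuid

# Match an IRI, a quoted literal, or a blank node label (in that order), so
# that "_:" sequences inside IRIs or literals are left untouched.
_TOKEN = re.compile(
    r'<[^>]*>'
    r'|"(?:[^"\\]|\\.)*"'
    r'|_:([A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?)'
)


def skolemize(nquads: str, base: str = "http://example.org") -> str:
    """Replace every blank node label with a fresh, well-known genid IRI."""
    mapping: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        label = match.group(1)
        if label is None:  # an IRI or a literal: keep as-is
            return match.group(0)
        if label not in mapping:  # same label -> same IRI within this document
            mapping[label] = f"<{base}/.well-known/genid/{uuid.uuid4().hex}>"
        return mapping[label]

    return _TOKEN.sub(replace, nquads)
```

Random UUIDs keep the generated IRIs globally unique. As a consequence, skolemizing the same input twice produces different IRIs, so this step should run once, before the data is first committed.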

10 / 23


Methodology: Read/Write Interface


SPARQL 1.1 Query and Update
Query proxy providing a SPARQL endpoint
Executes queries on the store
Triggers read or write operations on the versioning layer
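The slides do not say how the proxy decides between a read and a write operation. The rough sketch below assumes a request can be classified by its first keyword after the PREFIX/BASE prologue; a real implementation would use a full SPARQL parser, such as the one in RDFLib. The name `classify` and the keyword sets are hypothetical, and SPARQL comments are not handled.

```python
import re

# Query forms answered from the store (read operations).
READ_FORMS = {"SELECT", "CONSTRUCT", "ASK", "DESCRIBE"}
# Update operations that change the store (write operations -> new commit).
WRITE_FORMS = {"INSERT", "DELETE", "LOAD", "CLEAR", "CREATE", "DROP",
               "COPY", "MOVE", "ADD", "WITH"}

# One PREFIX or BASE declaration at the start of the remaining text.
_PROLOGUE = re.compile(r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)",
                       re.IGNORECASE)
_KEYWORD = re.compile(r"\s*([A-Za-z]+)")


def classify(request: str) -> str:
    """Return "read" or "write" for a SPARQL request (comments not handled)."""
    rest = request
    # Skip the prologue so the first keyword is the query or update form.
    while (m := _PROLOGUE.match(rest)):
        rest = rest[m.end():]
    m = _KEYWORD.match(rest)
    keyword = m.group(1).upper() if m else ""
    if keyword in READ_FORMS:
        return "read"
    if keyword in WRITE_FORMS:
        return "write"
    raise ValueError(f"unrecognized SPARQL request form: {keyword!r}")
```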

11 / 23

Methodology: Translating Read/Write Operations


SPARQL read/write operations are transformed to commit, merge, revert, push and pull

12 / 23

Methodology: Commit

Is triggered by UPDATE queries
The changed graphs are added and committed in a new Git commit
A commit contains the lines, i.e. the statements, that were added or removed

[Figure: commit B with predecessor A]
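Every statement is one line of the canonical serialization, so the content of a commit can be viewed as two sets of statements. The minimal sketch below computes them from two versions of a graph file. `statement_changes` is a hypothetical helper, not part of Git or the Quit Store.

```python
from __future__ import annotations


def _statements(text: str) -> set[str]:
    """Non-empty, stripped lines of a line-based serialization."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def statement_changes(old: str, new: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) statements between two versions of a file."""
    before, after = _statements(old), _statements(new)
    return after - before, before - after
```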

13 / 23

Methodology: Commit

A commit always refers to its predecessor, not vice versa
We can also create two commits with the same predecessor

Branching/Forking

[Figure: commits A, B and C in sequence, and a commit D that shares its predecessor with C]
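The parent-pointer structure can be illustrated with a toy model of a commit graph. The `Commit` class and `history` function are hypothetical. They only mirror the idea that a commit stores references to its predecessors, which Git records as parent hashes. Branching is then just two commits naming the same parent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A commit knows only its predecessors (parents), never its successors."""
    name: str
    parents: tuple[Commit, ...] = ()


def history(commit: Commit) -> list[str]:
    """Names of all commits reachable from `commit` via parent links."""
    seen: set[str] = set()
    stack, order = [commit], []
    while stack:
        c = stack.pop()
        if c.name in seen:
            continue
        seen.add(c.name)
        order.append(c.name)
        stack.extend(c.parents)
    return order


if __name__ == "__main__":
    a = Commit("A")
    b = Commit("B", (a,))
    c = Commit("C", (b,))
    d = Commit("D", (b,))  # second commit with the same predecessor: a branch
    print(history(c), history(d))
```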

14 / 23


Methodology: Merge

If the commits have diverged, we need to synchronize the versions

Create a commit with two predecessors
Still, we need to actually consolidate the graphs

[Figure: the diverged commits C and D, joined by a merge commit E that has both as predecessors]
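Git consolidates two diverged versions relative to their common ancestor, the merge base. The toy sketch below finds that ancestor in a small commit graph; the `Commit` class and `merge_base` function are hypothetical. It returns the first common ancestor found by a breadth-first walk from one side. That is enough for simple histories, but it is not Git's full algorithm.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    parents: tuple[Commit, ...] = ()  # a merge commit has two parents


def ancestors(commit: Commit) -> set[str]:
    """Names of `commit` and everything reachable through parent links."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c.name not in seen:
            seen.add(c.name)
            stack.extend(c.parents)
    return seen


def merge_base(x: Commit, y: Commit) -> Commit | None:
    """First ancestor of `x` (breadth-first) that is also an ancestor of `y`."""
    reachable_from_y = ancestors(y)
    queue = deque([x])
    while queue:
        c = queue.popleft()
        if c.name in reachable_from_y:
            return c
        queue.extend(c.parents)
    return None
```

For the graph in the figure, `merge_base(c, d)` would be B, and the merge commit is `Commit("E", (c, d))`.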

15 / 23


Methodology: Merge

Using the default three-way merge from Git

<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:label> "Linda"@de .
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .

16 / 23

On a syntactical level, Git produces conflicts

Branch A
<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
+ <urn:ex:Tilia> <urn:ex:height> "40"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:label> "Linda"@de .
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .

Branch B
<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
+ <urn:ex:Tilia> <urn:ex:label> "Linde"@de .
- <urn:ex:Tilia> <urn:ex:label> "Linda"@de .
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .



Git Merge:

<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
<<<<<<< HEAD
<urn:ex:Tilia> <urn:ex:height> "40"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:label> "Linda"@de .
=======
<urn:ex:Tilia> <urn:ex:label> "Linde"@de .
>>>>>>> typo
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .


But actually, there is no conflict
Conflicts have to be looked for on other levels

<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:height> "40"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:label> "Linde"@de .
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .
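One way to merge on the level of statements instead of text lines is to merge the sets of statements against the common ancestor. The merge keeps what both branches still contain, plus everything that either branch added; a statement removed by either branch is dropped. This is a minimal sketch, not necessarily the strategy Quit Merge implements, and `merge_statements` is a hypothetical name. Applied to the Tilia example above, it yields exactly the merged graph shown. Note that it never reports a conflict, so semantic conflicts would need additional rules.

```python
from __future__ import annotations


def merge_statements(base: set[str], ours: set[str], theirs: set[str]) -> set[str]:
    """Three-way merge of statement sets relative to their common ancestor."""
    in_both = ours & theirs          # present in both branches
    added_by_ours = ours - base      # new in our branch
    added_by_theirs = theirs - base  # new in their branch
    return in_both | added_by_ours | added_by_theirs


if __name__ == "__main__":
    base = {
        '<urn:ex:Tilia> a <urn:ex:Tree> .',
        '<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .',
        '<urn:ex:Tilia> <urn:ex:label> "Linda"@de .',
        '<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .',
    }
    branch_a = base | {'<urn:ex:Tilia> <urn:ex:height> "40"^^xsd:integer .'}
    branch_b = (base - {'<urn:ex:Tilia> <urn:ex:label> "Linda"@de .'}) | {
        '<urn:ex:Tilia> <urn:ex:label> "Linde"@de .'}
    for statement in sorted(merge_statements(base, branch_a, branch_b)):
        print(statement)
```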

16 / 23

Methodology: Revert

Reverting a commit undoes an earlier change
This is done by exchanging the add- and delete-set of statements

[Figure: commits A, B and B⁻¹, where B⁻¹ reverts B]
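A commit is described by its added and removed statements, so its inverse is obtained by swapping the two sets. Below is a minimal sketch with the hypothetical names `Change`, `invert` and `apply_change`. Git itself provides the corresponding operation as `git revert`, which creates a new commit undoing an earlier one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """The statements added and removed by one commit."""
    added: frozenset[str]
    removed: frozenset[str]


def invert(change: Change) -> Change:
    """B⁻¹: exchange the add- and delete-set of statements."""
    return Change(added=change.removed, removed=change.added)


def apply_change(graph: frozenset[str], change: Change) -> frozenset[str]:
    """Apply a change to a graph given as a set of statements."""
    return (graph - change.removed) | change.added
```

For a change whose removed statements are in the graph and whose added ones are not, applying the change and then its inverse restores the original graph.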

17 / 23

Implementation

[Figure: implementation architecture with the components SPARQL 1.1 Interface, Query-Analyzer, Quad-Store, File References, Local Git Repository and Public Git Repository; arrows labeled SPARQL Query, Select, Update, Response, Dump to files and Parse files]

Written in Python, using Flask API as the HTTP interface and RDFLib for SPARQL and RDF
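The "Dump to files" and "Parse files" steps can be sketched with the standard library alone. The sketch assumes the store is modelled as a mapping from graph IRI to a set of N-Quads lines. It also assumes a hypothetical `file_refs` mapping that assigns each graph to one file in the local working copy. The implementation described here relies on RDFLib for RDF handling instead; the function names are illustrative only.

```python
from __future__ import annotations

from pathlib import Path


def dump_to_files(store: dict[str, set[str]], file_refs: dict[str, Path]) -> None:
    """Write each graph's statements, sorted, to the file referenced for it."""
    for graph_iri, path in file_refs.items():
        statements = sorted(store.get(graph_iri, set()))
        path.write_text("".join(s + "\n" for s in statements), encoding="utf-8")


def parse_files(file_refs: dict[str, Path]) -> dict[str, set[str]]:
    """Read the referenced files back into a graph-to-statements mapping."""
    store: dict[str, set[str]] = {}
    for graph_iri, path in file_refs.items():
        text = path.read_text(encoding="utf-8") if path.exists() else ""
        store[graph_iri] = {line.strip() for line in text.splitlines() if line.strip()}
    return store
```

Writing sorted lines keeps the files in a canonical order, so a subsequent Git commit records only the statements that actually changed.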

18 / 23

Integration

Quit Store has the role of managing the repository, providing the read/write interface, and synchronizing the repository and the store.

[Figure: the Quit components Quit Store, Quit Diff Δ( , ), Quit Merge and Quit Notify, together with the Quit Store architecture (RDF Quad Store, SPARQL 1.1 Interface for Query & Update)]

19 / 23


Quit Diff can calculate differences between commits, trace the provenance of statements, and transmit patches to collaborators.
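A patch between two commits can, for example, be expressed as a SPARQL 1.1 Update request that a collaborator's endpoint can execute. This minimal sketch assumes the changes concern a single named graph and the statements are given as N-Triples lines. `to_sparql_update` is a hypothetical helper, and Quit Diff's own output formats may differ.

```python
from __future__ import annotations


def to_sparql_update(graph_iri: str, added: set[str], removed: set[str]) -> str:
    """Render added/removed N-Triples lines as one SPARQL 1.1 Update request."""
    operations = []
    # Deletions before insertions, the usual order for applying a patch.
    for keyword, statements in (("DELETE DATA", removed), ("INSERT DATA", added)):
        if statements:
            body = "\n".join(f"    {s}" for s in sorted(statements))
            operations.append(
                f"{keyword} {{\n  GRAPH <{graph_iri}> {{\n{body}\n  }}\n}}"
            )
    # Operations in one request are separated by ';' (empty request if no change).
    return " ;\n".join(operations)


if __name__ == "__main__":
    print(to_sparql_update(
        "urn:ex:g",
        added={'<urn:ex:Tilia> <urn:ex:label> "Linde"@de .'},
        removed={'<urn:ex:Tilia> <urn:ex:label> "Linda"@de .'},
    ))
```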


Integration & Future Work

Quit Notify can actively inform other clones of updates. This enables distributed setups for collaboration and synchronization.


20 / 23


Quit Merge will implement various merge strategies for RDF and detect conflicts in diverged versions.


Poster

Quit Diff (Natanael Arndt, Norman Radtke)

LEDS: Linked Enterprise Data Services

P 34

21 / 23

Conclusion

With Quit we have presented a methodology for
version control and tracking provenance of contributions,
synchronization (clone, push and pull) by other participants, and
distributed collaboration on RDF datasets (gitflow)

Hopefully this can help to make use of the large ecosystem of methodologies and tools around Git

Questions?
Natanael Arndt <[email protected]>


22 / 23


References I

Cyganiak, R., Wood, D., and Lanthaler, M. (2014). RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation. https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.

23 / 23