Enabling Fine-grained RDF Data Completeness Assessment

Fariz Darari, Simon Razniewski, Radityo E. Prasojo, Werner Nutt

KRDB, Free University of Bozen-Bolzano, Italy

ICWE 2016, Lugano, Switzerland

June 8, 2016

Supported by the project MAGIC, funded by the province of Bolzano

Managing Completeness over Web Data June 8, 2016 1 / 31

Quality of Web Data: Completeness

How complete are Web data sources?

How complete is Wikidata for Apollo 11’s crew?

NASA says . . .

Wikidata is complete for Apollo 11’s crew!

Wikidata supports a special form of completeness statement

Completeness Statements

Syntax: Compl(s, p, ?o)

Semantics:

Graph G has Compl(s, p, ?o)
↓
G is complete for all p-values of s that exist in reality
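The semantics can be made concrete with a small sketch. It assumes, as a modelling device that the slide does not show, an explicit "ideal" graph standing in for reality. The names `Compl` (as a Python class), `holds`, and the triple encoding are illustrative choices, not part of the presented work.

```python
from typing import NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]

class Compl(NamedTuple):
    """Completeness statement Compl(s, p, ?o): all p-values of s are present."""
    s: str
    p: str

def holds(stmt: Compl, available: Set[Triple], ideal: Set[Triple]) -> bool:
    """True iff every (s, p, o) triple of the ideal graph is also in the available graph."""
    wanted = {t for t in ideal if t[0] == stmt.s and t[1] == stmt.p}
    return wanted <= available

# Assumed encoding of reality: Apollo 11 had three crew members.
ideal = {
    ("Apollo11", "crew", "Armstrong"),
    ("Apollo11", "crew", "Aldrin"),
    ("Apollo11", "crew", "Collins"),
}
full = set(ideal)                                       # graph listing all three
partial = ideal - {("Apollo11", "crew", "Collins")}     # graph missing one member

crew_complete = Compl("Apollo11", "crew")
print(holds(crew_complete, full, ideal))     # True
print(holds(crew_complete, partial, ideal))  # False
```

In practice the ideal graph is not available, which is why the statement has to be asserted by someone who knows the facts rather than computed.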

Usages of Completeness Statements

Tracking data completion progress of KB contributors

Providing statistics about completeness of KBs

Example: For 25% of Swiss cantons, Wikidata is complete for their official languages.

Checking query completeness
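A statistic such as the canton example could be computed along the following lines. This is a minimal sketch: the function name `completeness_ratio`, the predicate name `officialLanguage`, and the four-canton sample are assumptions made for illustration, not the data behind the 25% figure.

```python
from typing import Iterable, Set, Tuple

Statement = Tuple[str, str]  # (subject, predicate) of Compl(subject, predicate, ?o)

def completeness_ratio(entities: Iterable[str], predicate: str,
                       statements: Set[Statement]) -> float:
    """Fraction of entities that carry a completeness statement for the predicate."""
    entities = list(entities)
    if not entities:
        return 0.0
    covered = sum(1 for e in entities if (e, predicate) in statements)
    return covered / len(entities)

# Hypothetical sample: four cantons, one of them annotated as complete.
cantons = ["Bern", "Ticino", "Geneva", "Zurich"]
statements = {("Ticino", "officialLanguage")}

ratio = completeness_ratio(cantons, "officialLanguage", statements)
print(f"{ratio:.0%} of the sampled cantons are complete for officialLanguage")
```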

Checking Query Completeness

GA99: graph about the space mission A99

P1: query for schools of the children of A99’s crew

{ (A99, crew, ?cr), (?cr, child, ?ch), (?ch, school, ?sc) }

Evaluating P1 over GA99 gives one answer mapping:

{ ?cr ↦ Chan, ?ch ↦ Dani, ?sc ↦ USI }

Is P1 complete over GA99? We don’t know!
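The evaluation step can be reproduced with a small pattern matcher. The slides do not list the triples of GA99, so the graph below is an assumption pieced together from the walkthrough (Bob and Chan as crew, Dani as Chan's child, USI as Dani's school). The helper names `is_var` and `evaluate` are likewise illustrative.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]

def is_var(term: str) -> bool:
    """Variables are written with a leading question mark, as on the slides."""
    return term.startswith("?")

def evaluate(pattern: List[Triple], graph: Set[Triple]) -> List[Mapping]:
    """Evaluate a basic graph pattern by extending mappings one triple pattern at a time."""
    mappings: List[Mapping] = [{}]
    for tp in pattern:
        extended: List[Mapping] = []
        for mu in mappings:
            for triple in graph:
                candidate = dict(mu)
                ok = True
                for term, value in zip(tp, triple):
                    if is_var(term):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        mappings = extended
    return mappings

# Assumed content of GA99, reconstructed from the example.
GA99 = {
    ("A99", "crew", "Bob"),
    ("A99", "crew", "Chan"),
    ("Chan", "child", "Dani"),
    ("Dani", "school", "USI"),
}
P1 = [("A99", "crew", "?cr"), ("?cr", "child", "?ch"), ("?ch", "school", "?sc")]

answers = evaluate(P1, GA99)
print(answers)  # [{'?cr': 'Chan', '?ch': 'Dani', '?sc': 'USI'}]
```

The single answer says nothing about whether Bob has children missing from the graph, which is exactly the gap that completeness statements are meant to close.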

Checking Query Completeness

P1 = { (A99, crew, ?cr), (?cr, child, ?ch), (?ch, school, ?sc) }

CA99: set of completeness statements consisting of

C1 = Compl(A99, crew, ?o)
C2 = Compl(Bob, child, ?o)
C3 = Compl(Chan, child, ?o)
C4 = Compl(Dani, school, ?o)

Is P1 complete over GA99 wrt. CA99?

Checking Query Completeness

P1 = { (A99, crew, ?cr), (?cr, child, ?ch), (?ch, school, ?sc) }

C1 matches the first triple of P1
→ Complete for Pc1 = (A99, crew, ?cr)

Instantiating the rest of P1 with the answers of Pc1 gives:

P2 = { (Bob, child, ?ch), (?ch, school, ?sc) }
P3 = { (Chan, child, ?ch), (?ch, school, ?sc) }

Checking Query Completeness

P2 = { (Bob, child, ?ch), (?ch, school, ?sc) }

C2 matches the first triple of P2
→ Complete for Pc2 = (Bob, child, ?ch)

Instantiating the rest of P2 with the answers of Pc2 gives: nothing (Pc2 has no answers over GA99)

→ Complete for P2

Checking Query Completeness

P3 = { (Chan, child, ?ch), (?ch, school, ?sc) }

C3 matches the first triple of P3
→ Complete for Pc3 = (Chan, child, ?ch)

Instantiating the rest of P3 with the answers of Pc3 gives:

P4 = { (Dani, school, ?sc) }

Checking Query Completeness

P4 = { (Dani, school, ?sc) }

C4 matches the only triple of P4
→ Complete for the whole P4

Conclusion: We found complete matches for all query instantiations from P1
→ P1 is complete over GA99 wrt. CA99

Algorithm for Checking Query Completeness

Input: query P, graph G, set C of completeness statements

Output: true iff P is complete wrt. G and C

𝒫 ← {P}
while 𝒫 ≠ ∅ do
    choose and remove P0 ∈ 𝒫
    Pc0 ← FindMatch(P0, C)
    if Pc0 = ∅ then
        return false
    else
        Prest0 ← P0 \ Pc0
        𝒫 ← 𝒫 ∪ { µPrest0 | µ ∈ ⟦Pc0⟧G }
return true
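The loop can be sketched in Python as follows. This is not the authors' Java implementation; it makes several assumptions of its own: completeness statements are stored as a set of `(subject, predicate)` pairs, `find_match` returns a single matched triple pattern rather than a sub-pattern, an empty remainder is treated as needing no further check, and the graph contents are reconstructed from the A99 example.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = FrozenSet[Triple]
Mapping = Dict[str, str]
Statement = Tuple[str, str]  # (s, p) stands for Compl(s, p, ?o)

def is_var(term: str) -> bool:
    return term.startswith("?")

def match_triple(tp: Triple, graph: Set[Triple]) -> List[Mapping]:
    """Answers of a single triple pattern over the graph."""
    answers = []
    for triple in graph:
        mu: Mapping = {}
        if all((mu.setdefault(t, v) == v) if is_var(t) else (t == v)
               for t, v in zip(tp, triple)):
            answers.append(mu)
    return answers

def find_match(pattern: Pattern, statements: Set[Statement]) -> Optional[Triple]:
    """Return a triple pattern whose subject and predicate are covered by a statement."""
    for tp in sorted(pattern):
        s, p, _ = tp
        if not is_var(s) and not is_var(p) and (s, p) in statements:
            return tp
    return None

def substitute(pattern: Pattern, mu: Mapping) -> Pattern:
    """Apply a mapping to every triple pattern."""
    return frozenset(tuple(mu.get(t, t) for t in tp) for tp in pattern)

def is_complete(query: Pattern, graph: Set[Triple], statements: Set[Statement]) -> bool:
    pending: Set[Pattern] = {query}
    while pending:
        current = pending.pop()
        if not current:
            continue  # assumption: nothing left to check for this instantiation
        matched = find_match(current, statements)
        if matched is None:
            return False
        rest = current - {matched}
        for mu in match_triple(matched, graph):
            pending.add(substitute(rest, mu))
    return True

# Assumed graph and statements, following the A99 example.
GA99 = {("A99", "crew", "Bob"), ("A99", "crew", "Chan"),
        ("Chan", "child", "Dani"), ("Dani", "school", "USI")}
CA99 = {("A99", "crew"), ("Bob", "child"), ("Chan", "child"), ("Dani", "school")}
P1 = frozenset({("A99", "crew", "?cr"), ("?cr", "child", "?ch"), ("?ch", "school", "?sc")})

print(is_complete(P1, GA99, CA99))                        # True
print(is_complete(P1, GA99, CA99 - {("Dani", "school")}))  # False
```

Keeping the statements in a hash-based set makes each lookup in `find_match` a constant-time operation, in the spirit of the hash-map matching mentioned in the experimental setup.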

Experimental Questions

What is the relationship between the number of query answers and completeness checking time?

How do query evaluation time and completeness checking time compare?

Is there a difference between completeness checking time for complete and incomplete cases?

Experimental Setup

Graph: Wikidata

Queries: Three sets of path queries with an increasing number of query results (3 sets x 40 queries)

Pmot = { ($c$, mother, ?w), (?w, mother, ?x), (?x, mother, ?y) }
Pcre = { ($c$, crew, ?w), (?w, mission, ?x), (?x, operator, ?y) }
Pdiv = { ($c$, division, ?w), (?w, division, ?x), (?x, area, ?y) }

Completeness statements:
Complete case: generated by traversing the query structure (1.7 million statements)
Incomplete case: randomly drop 20% of the statements in the complete case
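One plausible reading of "traversing the query structure" is to walk a path query from its start constant and emit a statement for every node reached at every step. The sketch below follows that reading. The functions `statements_for_path` and `drop_fraction`, the fixed seed, and the toy graph are assumptions for illustration and do not reproduce the actual experiment.

```python
import random
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Statement = Tuple[str, str]  # (s, p) stands for Compl(s, p, ?o)

def statements_for_path(start: str, predicates: List[str],
                        graph: Set[Triple]) -> Set[Statement]:
    """Walk the path query from `start` and emit Compl(node, predicate, ?o) at every step."""
    statements: Set[Statement] = set()
    frontier = {start}
    for predicate in predicates:
        next_frontier = set()
        for node in frontier:
            statements.add((node, predicate))
            next_frontier |= {o for s, p, o in graph if s == node and p == predicate}
        frontier = next_frontier
    return statements

def drop_fraction(statements: Set[Statement], fraction: float,
                  seed: int = 0) -> Set[Statement]:
    """Return a copy with a randomly chosen `fraction` of the statements removed."""
    rng = random.Random(seed)           # fixed seed keeps the sketch reproducible
    ordered = sorted(statements)        # sample from a list with a stable order
    k = round(len(ordered) * fraction)
    return statements - set(rng.sample(ordered, k))

# Hypothetical toy graph shaped like the Pdiv query.
graph = {
    ("CH", "division", "TI"), ("CH", "division", "BE"),
    ("TI", "division", "Lugano"), ("BE", "division", "Thun"),
    ("Lugano", "area", "76"), ("Thun", "area", "22"),
}

complete_case = statements_for_path("CH", ["division", "division", "area"], graph)
incomplete_case = drop_fraction(complete_case, 0.20)
print(len(complete_case), len(incomplete_case))  # 5 4
```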

Experimental Setup

Implementation: Java with the Apache Jena library
Completeness statement matching = standard Java HashMap
Triple store = Jena-TDB

Machine: 2.4 GHz laptop with 8 GB memory

Experimental Results

The more query results, the longer the completeness check takes

Though slower than query evaluation, on an absolute scale completeness checking performs reasonably well (at most 35 ms)

Complete cases are slower than incomplete cases

Practical Applications of Completeness Statements

How complete are Web data sources?

To answer the question, we need to provide
A way to annotate complete parts of a data source using completeness statements
Ways to utilize the completeness statements to give insights on how complete the data source is

COOL-WD: COmpleteness toOL for WikiData

We have developed a demo of a completeness management tool for Wikidata

COOL-WD provides ways to
annotate complete parts of Wikidata
utilize completeness statements to do completeness aggregation and query completeness assessment

COOL-WD: Detailed Features

Management of completeness statements
Adding or removing completeness statements of any property of a Wikidata entity
Viewing an entity page with its completeness annotations

Aggregation of completeness statements

Assessment of query completeness

COOL-WD: Architecture

[Figure: COOL-WD architecture. Components: COOL-WD User Interface, COOL-WD Engine, Completeness DB, SPARQL Endpoint, MediaWiki API. Connection labels: Web Browsing, HTTP Requests, Data Access, SPARQL Queries, API Calls.]

COOL-WD: Demo

http://cool-wd.inf.unibz.it/

Conclusions

We developed a sound and complete algorithm for query completeness checking wrt. an RDF graph and completeness statements

The algorithm can be generalized to consider a more general form of completeness statements: Compl(P) where P is a basic graph pattern (BGP)

We evaluated completeness checking performance

We developed COOL-WD, a completeness tool for Wikidata

Ongoing Work

We plan to leverage completeness statements for checking the soundness of queries with negation [1]

We plan to develop fast completeness checks for arbitrary completeness statements [1]

We plan to exploit the potential of natural language completeness statements already available on the Web: 14K in Wikipedia, 24K in IMDb, 2200 in OpenStreetMap

We plan to extend COOL-WD with new cool features
Completeness analytics
Query completeness diagnostics
Linked data publication of completeness statements
Completeness gadget for tighter integration with Wikidata

[1] Darari et al. Ensuring Soundness for SPARQL with Negation Using Completeness Statements. Submitted to a conference.

Thank you!

Questions? Just drop Fariz an email: fadirra@gmail.com

Big thanks to Springer for the travel grant!

Have a look at the paper: http://dx.doi.org/10.1007/978-3-319-38791-8_10

And finally, a completeness statement for all the slides :-)
Compl(thisSlideset, hasSlide, ?o)
