
Completeness Statements at ISWC 2013


Completeness statements make explicit which information is complete for data sources. They can be used to ensure the completeness of query answers.


Completeness Statements about RDF Data Sources and Their Use for Query Answering

Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski

Presented at ISWC 2013. Supported by SWSA with the Travel Grant.

Oct 23rd, 2013

Fariz Darari (ISWC 2013) Completeness Reasoning @ Linked Data Oct 23rd, 2013 1 / 30


Motivation

Completeness statement about the IMDB data source

Quentin Tarantino was the character Mr. Brown.

[Figure: IMDb cast list, http://www.imdb.com/title/tt0105236/fullcredits?ref_=tt_ov_st_sm#cast]


Incomplete Data Source

Quentin Tarantino is missing.

An incomplete data source of Tarantino movies, $G_{dbp} = (G^a_{dbp}, G^i_{dbp})$:

[Figure: the graphs $G^a_{dbp}$ and $G^i_{dbp}$]

Incomplete Data Source

An incomplete data source is a pair of two graphs, $G = (G^a, G^i)$, where $G^a \subseteq G^i$.

We call $G^a$ the available graph and $G^i$ the ideal graph.
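The definition above can be sketched in a few lines of Python. This is only an illustration: the class name `IncompleteDataSource`, the method `missing`, and the toy triples are assumptions made for this example and are not the graphs shown on the slides.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class IncompleteDataSource:
    """A pair (available, ideal) of RDF graphs, each modelled as a set of triples."""
    available: frozenset[Triple]
    ideal: frozenset[Triple]

    def __post_init__(self) -> None:
        # The definition requires the available graph to be contained in the ideal graph.
        if not self.available <= self.ideal:
            raise ValueError("available graph must be a subset of the ideal graph")

    def missing(self) -> frozenset[Triple]:
        """Triples of the ideal graph that the available graph lacks."""
        return self.ideal - self.available


# Hypothetical toy data (an assumption for illustration only).
ideal = frozenset({
    ("reservoirDogs", "type", "Movie"),
    ("reservoirDogs", "director", "tarantino"),
    ("reservoirDogs", "actor", "tarantino"),
})
available = ideal - {("reservoirDogs", "actor", "tarantino")}

g_toy = IncompleteDataSource(available, ideal)
print(g_toy.missing())
```

In practice the ideal graph is not materialised anywhere; it is a conceptual device, so a class like this is useful for experiments and tests rather than for production data.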


Completeness Statements: Examples

To express that a source is complete for all the triples about movies directed by Tarantino, we use the statement

$C_{dir} = Compl((?m, type, Movie), (?m, director, tarantino) \mid \emptyset)$.

To express that a source is complete for all triples about actors in movies directed by Tarantino, we use

$C_{act} = Compl((?m, actor, ?a) \mid (?m, type, Movie), (?m, director, tarantino))$.


Completeness Statement: Definition

Let $P_1$ be a non-empty BGP (Basic Graph Pattern) and $P_2$ a BGP. A completeness statement is defined as

$Compl(P_1 \mid P_2)$

where we call $P_1$ the pattern and $P_2$ the condition of the statement.
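As a minimal sketch, a completeness statement can be held in a small record with the two parts named in the definition. The class name `Compl`, its field names, and the encoding of variables as strings starting with `?` are assumptions of this example.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]
BGP = tuple[Triple, ...]


@dataclass(frozen=True)
class Compl:
    """Compl(P1 | P2): `pattern` is P1, `condition` is P2."""
    pattern: BGP
    condition: BGP = ()

    def __post_init__(self) -> None:
        # The definition requires P1 to be non-empty; P2 may be empty.
        if not self.pattern:
            raise ValueError("the pattern P1 must be non-empty")


# The two example statements from the previous slides.
C_DIR = Compl(pattern=(("?m", "type", "Movie"), ("?m", "director", "tarantino")))
C_ACT = Compl(
    pattern=(("?m", "actor", "?a"),),
    condition=(("?m", "type", "Movie"), ("?m", "director", "tarantino")),
)
print(C_DIR)
print(C_ACT)
```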


Satisfaction of Completeness Statements

To the statement

$C = Compl(P_1 \mid P_2)$,

we associate the CONSTRUCT query

$Q_C = (P_1, P_1 \cup P_2)$.

Then, we say:

$C$ is satisfied by an incomplete data source (IDS) $G = (G^a, G^i)$, written $G \models C$, if

$\llbracket Q_C \rrbracket_{G^i} \subseteq G^a$.


Completeness Statements

$C_{dir} = Compl((?m, type, Movie), (?m, director, tarantino) \mid \emptyset)$

Question: $G_{dbp} \models C_{dir}$?

Yes, because $\llbracket Q_{C_{dir}} \rrbracket_{G^i_{dbp}} \subseteq G^a_{dbp}$.


$C_{act} = Compl((?m, actor, ?a) \mid (?m, type, Movie), (?m, director, tarantino))$

Question: $G_{dbp} \models C_{act}$?

No, because among the results of $\llbracket Q_{C_{act}} \rrbracket_{G^i_{dbp}}$, there is $(reservoirDogs, actor, tarantino)$, which is not in $G^a_{dbp}$.
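The two checks above can be reproduced with a small evaluator. The sketch below is written under stated assumptions: graphs are Python sets of string triples, variables are strings starting with `?`, and the toy graphs are hypothetical stand-ins for the graphs on the slides. The helpers `match`, `construct` and `satisfies` are illustrative names, not part of any library.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, graph, binding=None):
    """Yield every variable binding under which all triples of `pattern` occur in `graph`."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        extended = dict(binding)
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or fail if it is already bound to something else.
                if extended.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            yield from match(rest, graph, extended)


def construct(template, where, graph):
    """Evaluate the CONSTRUCT query (template, where) over `graph`."""
    result = set()
    for b in match(where, graph):
        result |= {tuple(b.get(t, t) for t in triple) for triple in template}
    return result


def satisfies(available, ideal, pattern, condition):
    """G |= Compl(P1 | P2) iff the answer of Q_C = (P1, P1 u P2) over the ideal graph is in the available graph."""
    return construct(pattern, pattern + condition, ideal) <= available


# Hypothetical toy graphs (assumption): the actor triple is missing from the available graph.
ideal = {
    ("reservoirDogs", "type", "Movie"),
    ("reservoirDogs", "director", "tarantino"),
    ("reservoirDogs", "actor", "tarantino"),
}
available = ideal - {("reservoirDogs", "actor", "tarantino")}

MOVIES_BY_TARANTINO = [("?m", "type", "Movie"), ("?m", "director", "tarantino")]
C_DIR = (MOVIES_BY_TARANTINO, [])
C_ACT = ([("?m", "actor", "?a")], MOVIES_BY_TARANTINO)

print(satisfies(available, ideal, *C_DIR))  # True
print(satisfies(available, ideal, *C_ACT))  # False
```

The matcher is a naive nested-loop join, which is enough for slide-sized examples; a real implementation would delegate this step to a SPARQL engine.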


Completeness Statements in RDF

$C_{act} = Compl((?m, actor, ?a) \mid (?m, type, Movie), (?m, director, tarantino))$


Query Completeness: Example

$Q_{dir} = (\{?m\}, \{(?m, type, Movie), (?m, director, tarantino)\})$

Then, $\llbracket Q_{dir} \rrbracket_{G^i_{dbp}} = \llbracket Q_{dir} \rrbracket_{G^a_{dbp}}$, and therefore $G_{dbp} \models Compl(Q_{dir})$.


Query Completeness: Definition

Definition

Let $Q$ be a SELECT query. We write

$Compl(Q)$

to say that $Q$ is complete. An incomplete data source $G = (G^a, G^i)$ satisfies $Compl(Q)$, written

$G \models Compl(Q)$, if and only if $\llbracket Q \rrbracket_{G^i} = \llbracket Q \rrbracket_{G^a}$.
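Query completeness can be tested directly when both graphs are at hand. The following sketch assumes that answers are compared as bags (multisets) of tuples, that graphs are sets of string triples, and that variables start with `?`; the toy graphs and the helper names `match`, `select` and `query_complete` are hypothetical.

```python
from collections import Counter


def is_var(term):
    return term.startswith("?")


def match(pattern, graph, binding=None):
    """Yield every variable binding under which all triples of `pattern` occur in `graph`."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        extended = dict(binding)
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                if extended.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            yield from match(rest, graph, extended)


def select(variables, pattern, graph):
    """Evaluate the SELECT query (variables, pattern) as a bag of answer tuples."""
    return Counter(tuple(b[v] for v in variables) for b in match(pattern, graph))


def query_complete(available, ideal, variables, pattern):
    """G |= Compl(Q) iff Q returns the same answers over the ideal and the available graph."""
    return select(variables, pattern, ideal) == select(variables, pattern, available)


# Hypothetical toy graphs (assumption): the actor triple is missing from the available graph.
ideal = {
    ("reservoirDogs", "type", "Movie"),
    ("reservoirDogs", "director", "tarantino"),
    ("reservoirDogs", "actor", "tarantino"),
}
available = ideal - {("reservoirDogs", "actor", "tarantino")}

Q_DIR = (["?m"], [("?m", "type", "Movie"), ("?m", "director", "tarantino")])
Q_ACT = (["?a"], [("?m", "actor", "?a")])

print(query_complete(available, ideal, *Q_DIR))  # True
print(query_complete(available, ideal, *Q_ACT))  # False
```

Because the ideal graph is normally unknown, this direct comparison is only possible in experiments; the reasoning slides that follow decide completeness from the statements alone.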


Completeness Entailment

Let $\mathcal{C}$ be a set of completeness statements and $Q$ a SELECT query. We say that $\mathcal{C}$ entails the completeness of $Q$, written

$\mathcal{C} \models Compl(Q)$,

if any incomplete data source satisfying $\mathcal{C}$ also satisfies $Compl(Q)$.


Completeness Reasoning

Transfer Operator

For any set $\mathcal{C}$ of completeness statements and a graph $G$, we define the transfer operator $T_{\mathcal{C}}$:

$T_{\mathcal{C}}(G) = \bigcup_{C \in \mathcal{C}} \llbracket Q_C \rrbracket_G$

Prototypical Graph

Let $Q = (W, P)$ be a SELECT query. The prototypical graph $\tilde{P}$ is the graph resulting from the mapping of variables in $P$ to fresh, unique IRIs.
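Both notions are short to write down in code. In this sketch a statement is a pair `(pattern, condition)` of triple lists, and `freeze` builds the prototypical graph by renaming each variable `?x` to the string `c?x`. That naming, like the assumption that no such term already occurs in the data, is a choice made for this illustration, as are the helper names and the toy graph.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, graph, binding=None):
    """Yield every variable binding under which all triples of `pattern` occur in `graph`."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        extended = dict(binding)
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                if extended.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            yield from match(rest, graph, extended)


def construct(template, where, graph):
    """Evaluate the CONSTRUCT query (template, where) over `graph`."""
    result = set()
    for b in match(where, graph):
        result |= {tuple(b.get(t, t) for t in triple) for triple in template}
    return result


def transfer(statements, graph):
    """T_C(G): union of the answers of Q_C = (P1, P1 u P2) over G, for every statement in C."""
    result = set()
    for pattern, condition in statements:
        result |= construct(pattern, pattern + condition, graph)
    return result


def freeze(pattern):
    """Prototypical graph: replace every variable ?x by the constant c?x."""
    return {tuple("c" + t if is_var(t) else t for t in triple) for triple in pattern}


MOVIES_BY_TARANTINO = [("?m", "type", "Movie"), ("?m", "director", "tarantino")]
C_DIR = (MOVIES_BY_TARANTINO, [])

# Hypothetical toy graph (assumption).
graph = {
    ("reservoirDogs", "type", "Movie"),
    ("reservoirDogs", "director", "tarantino"),
    ("reservoirDogs", "actor", "tarantino"),
}

print(transfer([C_DIR], graph))     # the type and director triples only
print(freeze(MOVIES_BY_TARANTINO))  # the same pattern with ?m replaced by c?m
```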


Completeness of Basic Queries

Theorem

Let $\mathcal{C}$ be a set of completeness statements and $Q = (W, P)$ a basic query. Then,

$\mathcal{C} \models Compl(Q)$ if and only if $\tilde{P} = T_{\mathcal{C}}(\tilde{P})$.


Completeness Reasoning: Example

Consider the set of completeness statements

$\mathcal{C}_{dir,act} = \{C_{dir}, C_{act}\}$

and the query

$Q_{dir+act} = (\{?m\}, P_{dir+act})$

where

$P_{dir+act} = \{(?m, type, Movie), (?m, director, tarantino), (?m, actor, tarantino)\}$.


Then,

$\tilde{P}_{dir+act} = \{(c_{?m}, type, Movie), (c_{?m}, director, tarantino), (c_{?m}, actor, tarantino)\}$.


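Therefore,

$T_{\mathcal{C}_{dir,act}}(\tilde{P}_{dir+act})$
$= \llbracket Q_{C_{dir}} \rrbracket_{\tilde{P}_{dir+act}} \cup \llbracket Q_{C_{act}} \rrbracket_{\tilde{P}_{dir+act}}$
$= \{(c_{?m}, type, Movie), (c_{?m}, director, tarantino), (c_{?m}, actor, tarantino)\}$
$= \tilde{P}_{dir+act}$.

Thus, $\mathcal{C}_{dir,act} \models Compl(Q_{dir+act})$.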
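The computation above can be replayed mechanically. The sketch below implements the fixpoint test from the theorem for basic queries; it is an illustration under the assumptions that triples are string tuples, variables start with `?`, and frozen variables are written `c?m`. The function names are hypothetical and do not reflect how CoRner is implemented.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, graph, binding=None):
    """Yield every variable binding under which all triples of `pattern` occur in `graph`."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        extended = dict(binding)
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                if extended.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            yield from match(rest, graph, extended)


def transfer(statements, graph):
    """T_C(G): union of the answers of Q_C = (P1, P1 u P2) over G, for every statement in C."""
    result = set()
    for pattern, condition in statements:
        for b in match(pattern + condition, graph):
            result |= {tuple(b.get(t, t) for t in triple) for triple in pattern}
    return result


def freeze(pattern):
    """Prototypical graph: replace every variable ?x by the constant c?x."""
    return {tuple("c" + t if is_var(t) else t for t in triple) for triple in pattern}


def entails(statements, pattern):
    """C |= Compl(Q) for a basic query Q = (W, P) iff freeze(P) == T_C(freeze(P))."""
    frozen = freeze(pattern)
    return transfer(statements, frozen) == frozen


MOVIES_BY_TARANTINO = [("?m", "type", "Movie"), ("?m", "director", "tarantino")]
C_DIR = (MOVIES_BY_TARANTINO, [])
C_ACT = ([("?m", "actor", "?a")], MOVIES_BY_TARANTINO)
P_DIR_ACT = MOVIES_BY_TARANTINO + [("?m", "actor", "tarantino")]

print(entails([C_DIR, C_ACT], P_DIR_ACT))  # True
print(entails([C_DIR], P_DIR_ACT))         # False
```

The second call shows why both statements are needed: with $C_{dir}$ alone, the transfer operator cannot reproduce the frozen actor triple, so the fixpoint test fails.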


The framework can also be applied to:

- DISTINCT Queries
- OPT Queries
- Queries with RDFS Data Sources


Federated Framework


CoRner: Completeness Reasoner

- Can check the completeness of a subset of SPARQL queries depending on the completeness statements of a single data source.
- Developed in Java using the Apache Jena library.
- Takes three inputs:
  - Completeness statements in RDF format
  - A SPARQL query
  - (optional) an RDFS ontology
- Available at http://rdfcorner.wordpress.com/.


Conclusions and Future Work

- We developed a theoretical framework for the expression of completeness statements about data sources, from which we can ensure query completeness.
- We provide a compact RDF representation of completeness statements, which can be embedded in VoID descriptions, and implemented the framework in CoRner.
- We are interested in studying completeness reasoning with more expressive queries and OWL 2.


Thank you! (Terima kasih, grazie, danke!)
