Default Negation for Datalog±
Andreas Pieris
Institute of Information Systems, Vienna University of Technology, Austria
GTTV, Lexington, KY, USA, September 27, 2015
Goal of the Datalog± Project
Transform Datalog from a first-class database query language
to a first-class language for knowledge representation
(and other applications)
But first, let us say a few words about good old plain Datalog
Datalog
• Recursive database query language defined in the 1980s
• A useful framework for inductive definitions
• Simple syntax and clear semantics
• Well-understood (query answering and containment, optimisations)
• Large projects and companies are “Datalog-based”
Datalog
Is Glasgow reachable from Vienna?
[Figure: flight network over London, Vienna, Larnaca, Glasgow and Edinburgh]
Flight(X,Y) → Reachable(X,Y)
Flight(X,Y), Reachable(Y,Z) → Reachable(X,Z)
Reachable(Vienna,Glasgow) → Yes()
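The three rules can be evaluated bottom-up by applying them until nothing new is derived. What follows is a minimal Python sketch of naive fixpoint evaluation. The Flight facts are hypothetical, because the route map in the original figure cannot be recovered.

```python
# Naive bottom-up (fixpoint) evaluation of the Reachable program.
# The Flight facts below are hypothetical; the original route map is not recoverable.

def reachable(flights: set[tuple[str, str]]) -> set[tuple[str, str]]:
    reach: set[tuple[str, str]] = set()
    while True:
        # Rule 1: Flight(X,Y) -> Reachable(X,Y)
        new = set(flights)
        # Rule 2: Flight(X,Y), Reachable(Y,Z) -> Reachable(X,Z)
        new |= {(x, z) for (x, y) in flights for (y2, z) in reach if y == y2}
        if new <= reach:
            return reach  # fixpoint: no rule derives anything new
        reach |= new


flights = {
    ("Vienna", "London"),
    ("London", "Edinburgh"),
    ("Edinburgh", "Glasgow"),
    ("Larnaca", "Vienna"),
}
reach = reachable(flights)
# Rule 3: Reachable(Vienna,Glasgow) -> Yes()
yes = ("Vienna", "Glasgow") in reach
```

Each round re-derives every fact from scratch. That cost is what semi-naive evaluation avoids.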
Datalog
DATALOG = Select-Project-Join + Recursion
Recursion is essential: FOL queries (or SQL without recursion) are not enough
Modeling Ontologies using Datalog
DL Axiom | Rule-based Representation | Example
A ⊓ B ⊑ C | A(X), B(X) → C(X) | Parent ⊓ Male ⊑ Father
A ⊑ ∀R.B | A(X), R(X,Y) → B(Y) | MetalDevice ⊑ ∀hasPart.Metal
R ⊑ S | R(X,Y) → S(X,Y) | brotherOf ⊑ relativeOf
R inv S | R(X,Y) → S(Y,X) | parentOf inv childOf
trans(R) | R(X,Y), R(Y,Z) → R(X,Z) | trans(ancestorOf)
A ⊑ ∃R.B | A(X) → ∃Y (R(X,Y), B(Y)) | Student ⊑ ∃attends.Course
A ⊑ ∃≤1R.B | A(X), R(X,Y), B(Y), R(X,Z), B(Z) → Y = Z | Person ⊑ ∃≤1hasPassport.Valid
A disj B | A(X), B(X) → ⊥ | Student disj Professor
Much is possible with Datalog: the first five axiom types are plain Datalog rules.
Much is not possible with Datalog: the last three need ∃, = or ⊥ in the rule head.
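The trans(R) row is ordinary recursive Datalog. Below is a minimal Python sketch of semi-naive evaluation for that rule. Each round joins only the facts that are new from the previous round (the "delta"), so the same derivation is never repeated from two old facts. The ancestorOf individuals are hypothetical.

```python
# Semi-naive evaluation of  R(X,Y), R(Y,Z) -> R(X,Z).
# At least one body atom of every new derivation must come from the delta.

def transitive_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    total = set(edges)
    delta = set(edges)
    while delta:
        # Delta on the left body atom, or delta on the right body atom.
        derived = {(x, z) for (x, y) in delta for (y2, z) in total if y == y2}
        derived |= {(x, z) for (x, y) in total for (y2, z) in delta if y == y2}
        delta = derived - total  # keep only genuinely new facts
        total |= delta
    return total


# Hypothetical ancestorOf facts.
ancestor_of = transitive_closure({("anna", "bob"), ("bob", "carl"), ("carl", "dora")})
```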
Datalog+
• Extend Datalog by allowing in the head:
o Existential quantification (∃)
o Equality atoms (=)
o Constant false (⊥)
…for query answering over databases
• The result, Datalog[∃,=,⊥], is a highly expressive KR language
Datalog+ vs. DLs
• Several Horn-DLs (no disjunction) can be expressed via Datalog+ rules
• But Datalog+ rules can express more
• Higher-arity predicates allow for more flexibility
o DLs have only unary and binary predicates (concepts and roles)
Boss(X) → supervisorOf(X,X)
siblingOf(X,Y) → ∃Z (parentOf(Z,X), parentOf(Z,Y))
Datalog+: Other Applications
• Data Exchange
• Data Extraction
• Conceptual Modeling (e.g., UML)
• Querying the Semantic Web (RDF graphs)
• Automated Product Configuration
Data Exchange
Source schema S: employee(Name, Address)
Target schema T: person(ID, Name)
Σst (source-to-target dependency): employee(N,A) → ∃ID person(ID,N)
Σt (target dependency): person(ID,N1), person(ID,N2) → N1 = N2
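The existential in Σst forces the chase to invent values: for every employee name, it creates a person atom with a fresh labelled null as its ID. Below is a minimal Python sketch of that restricted-chase step, followed by a check of the key constraint Σt. The source tuples and the "_N" naming of nulls are hypothetical. This sketch only checks Σt. A full chase would instead try to equate the two names, and it would fail if both are distinct constants.

```python
import itertools

# Hypothetical source instance over employee(Name, Address).
employee = {("Alice", "Vienna"), ("Bob", "Oxford")}

_null_ids = itertools.count()


def fresh_null() -> str:
    """Return a new labelled null standing for an unknown ID value."""
    return f"_N{next(_null_ids)}"


def chase_st(source: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Apply Sigma_st: employee(N,A) -> exists ID person(ID,N)."""
    person: set[tuple[str, str]] = set()
    for name, _address in sorted(source):
        # Restricted chase: fire only if no person(_, name) exists yet.
        if not any(n == name for _, n in person):
            person.add((fresh_null(), name))
    return person


def violates_key(person: set[tuple[str, str]]) -> bool:
    """Check Sigma_t: person(ID,N1), person(ID,N2) -> N1 = N2."""
    name_of: dict[str, str] = {}
    for pid, name in person:
        if name_of.setdefault(pid, name) != name:
            return True
    return False


person = chase_st(employee)
```

Each person atom receives a distinct null, so Σt is satisfied by construction here.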
Data Extraction
[Figure: a web page with two adjacent tables, T1 (products) and T2 (prices)]
T1: PRODUCT | T2: PRICE
Toshiba_Protege_cx | 480
Dell_25416 | 360
Dell_23233 | 470
Acer_78987 | 390
We need object creation: a new box T that contains both tables.
table(T1), table(T2), sameColor(T1,T2), isNeighbourRight(T1,T2) → ∃T tablebox(T), contains(T,T1), contains(T,T2)
Conceptual Modeling
[Figure: UML class diagram with classes Stock, Company, Executive and Person; associations Issues, Member, Owns and Competes with multiplicities 0..1 and 1..1; attribute Index[0..1]:Str and operation getIndex():List]
Company(X) → ∃Y Issues(X,Y)
Stock(X), Issues(Y,X), Issues(Z,X) → Y = Z
Stock(X), Index(X,Y) → Str(Y)
Stock(X), getIndex(X,Y) → List(Y)
Stock(X) → ∃Y Issues(Y,X)
Main Reasoning Service in Datalog+
Given a database D and a Datalog+ program Σ, consider the knowledge base ⟨D, Σ⟩
Query = ∃X φ(X)
⟨D, Σ⟩ ⊨ Query ⇔ D ∪ Σ ⊨ Query
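When D ∪ Σ has a finite materialisation (for example, plain Datalog, or a chase that terminates), checking ⊨ for a Boolean conjunctive query amounts to finding a homomorphism from the query atoms into the materialised facts. Below is a minimal backtracking sketch in Python. Marking variables with a leading "?" is an assumption of this sketch, and so is the facts format. The facts themselves are hypothetical. This does not cover infinite chases.

```python
from typing import Optional

Atom = tuple[str, tuple[str, ...]]


def is_var(term: str) -> bool:
    # Convention assumed for this sketch: query variables start with "?".
    return term.startswith("?")


def match(args, fargs, subst: dict[str, str]) -> Optional[dict[str, str]]:
    """Extend subst so that args maps onto fargs; return None on a clash."""
    extended = dict(subst)
    for a, c in zip(args, fargs):
        if is_var(a):
            if extended.setdefault(a, c) != c:
                return None
        elif a != c:
            return None
    return extended


def holds(query: list[Atom], facts: set[Atom], subst: Optional[dict[str, str]] = None) -> bool:
    """Backtracking search for a homomorphism from the query into the facts."""
    subst = {} if subst is None else subst
    if not query:
        return True  # every query atom has been mapped
    (pred, args), rest = query[0], query[1:]
    for fpred, fargs in facts:
        if fpred == pred and len(fargs) == len(args):
            extended = match(args, fargs, subst)
            if extended is not None and holds(rest, facts, extended):
                return True
    return False


# Hypothetical materialised facts and two Boolean CQs.
facts = {("Reachable", ("vienna", "london")), ("Reachable", ("london", "glasgow"))}
q1 = [("Reachable", ("vienna", "?X")), ("Reachable", ("?X", "glasgow"))]
q2 = [("Reachable", ("glasgow", "?X"))]
```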
Datalog±
• Extend Datalog by allowing in the head:
o Existential quantification (∃)
o Equality atoms (=)
o Constant false (⊥)
…for query answering over databases
• But already Datalog[∃] is undecidable
• Datalog[∃,=,⊥] is syntactically restricted → Datalog±
Main Decidability Paradigms for Datalog[∃]
Finite Treewidth Sets (FTS):
• Database expansion is tree-like
• Forward chaining procedures
Finite Unification Sets (FUS):
• Backward resolution terminates
• Proof-theoretic procedures
…but identifying the above properties is an undecidable problem
The Main Decidable Datalog[∃] Languages
[Figure: Linear, Guarded and Sticky placed within the FUS and FTS classes, with DL-LiteR and EL as reference DLs]
Linear Datalog[∃]
• Linearity: there exists only one body-atom
• LOGSPACE data complexity & PSPACE-complete combined complexity
• Strictly more expressive than DL-LiteR
person(P) → ∃F hasFather(P,F), person(F)
[Calì, Gottlob & Lukasiewicz, JWS 2012]
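The example rule shows why forward chaining alone cannot decide query answering. Every new father is again a person, so the chase never terminates. Below is a minimal Python sketch that runs a bounded number of breadth-first rounds of the restricted chase for this one rule. The starting individual "john" and the "_F" naming of nulls are hypothetical.

```python
import itertools


def chase_rounds(people: set[str], rounds: int):
    """Apply person(P) -> exists F hasFather(P,F), person(F) for a bounded
    number of breadth-first rounds of the restricted chase."""
    fresh = (f"_F{i}" for i in itertools.count())
    person = set(people)
    has_father: set[tuple[str, str]] = set()
    for _ in range(rounds):
        # Active triggers: person(P) atoms whose head is not yet satisfied.
        with_father = {child for child, _ in has_father}
        triggers = sorted(p for p in person if p not in with_father)
        for p in triggers:
            f = next(fresh)  # a new labelled null for the unknown father
            has_father.add((p, f))
            person.add(f)
    return person, has_father


person, has_father = chase_rounds({"john"}, 5)
```

Every round adds exactly one new null, however many rounds are run. This is why decidability for linear rules has to come from another route, such as query rewriting, rather than from chase termination.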
DL-LiteR into Linear Datalog[∃]
DL-Lite: a popular family of DLs, at the basis of the OWL 2 QL profile of OWL
DL-LiteR Axioms | Linear Datalog[∃]
A ⊑ B | A(X) → B(X)
A ⊑ ∃R | A(X) → ∃Y R(X,Y)
∃R ⊑ A | R(X,Y) → A(X)
∃R ⊑ ∃P | R(X,Y) → ∃Z P(X,Z)
A ⊑ ∃R.B | A(X) → ∃Y R(X,Y), B(Y)
R ⊑ P | R(X,Y) → P(X,Y)
A ⊑ ¬B | A(X), B(X) → ⊥
First-Order Rewritability
The program Σ and the query Q are compiled into a first-order query Q_FO. Q_FO is translated into an SQL query Q_SQL, which is evaluated over D and optimized in the usual way:
∀D : ⟨D, Σ⟩ ⊨ Q ⇔ D ⊨ Q_SQL
[Calvanese, De Giacomo, Lembo, Lenzerini & Rosati, JAR 2007]
Linear Datalog[∃]
• Linearity: there exists only one body-atom
• LOGSPACE data complexity & PSPACE-complete combined complexity
• Strictly more expressive than DL-LiteR
• Query answering is first-order rewritable ⇒ low data complexity
person(P) → ∃F hasFather(P,F), person(F)
[Calì, Gottlob & Lukasiewicz, JWS 2012]
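First-order rewritability can be illustrated in a deliberately simplified setting. Take a Boolean atomic query ∃X̄ p(X̄) and linear rules with a single head atom, where every atom has pairwise distinct variables and no constants. Under these assumptions, every resolution step with a rule whose head predicate is p succeeds. The rewriting is then just the set of predicates reachable backwards from p, and that union can be emitted as SQL and evaluated over D alone. Below is a minimal Python sketch of this special case. General rewriting algorithms must also handle unification, shared variables and existential positions. The rules are hypothetical DL-LiteR-style examples.

```python
def rewrite_atomic(query_pred: str, rules: list[tuple[str, str]]) -> set[str]:
    """UCQ rewriting of 'exists X1..Xn query_pred(X1,..,Xn)'.

    Assumption of this sketch: each rule is linear, has one head atom, and all
    its atoms have pairwise distinct variables, so a rule is given simply as
    (body_predicate, head_predicate)."""
    result = {query_pred}
    stack = [query_pred]
    while stack:
        p = stack.pop()
        for body, head in rules:
            # Resolving the atomic query p(...) with a rule whose head is p(...)
            # yields the atomic query body(...).
            if head == p and body not in result:
                result.add(body)
                stack.append(body)
    return result


def to_sql(preds: set[str]) -> str:
    # Q_SQL: the Boolean query holds iff this union returns at least one row.
    return " UNION ".join(f"SELECT 1 FROM {p}" for p in sorted(preds))


def evaluate(preds: set[str], database: set[tuple[str, tuple[str, ...]]]) -> bool:
    # Evaluate the rewriting directly over D, without using the rules.
    return any(pred in preds for pred, _ in database)


# Hypothetical rules: A(X) -> B(X);  R(X,Y) -> A(X);  C(X) -> exists Y R(X,Y)
rules = [("A", "B"), ("R", "A"), ("C", "R")]
preds = rewrite_atomic("B", rules)
sql = to_sql(preds)
```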
Guarded Datalog[∃]
• Guardedness: a single body-atom contains all the body-variables
• PTIME-c data complexity & 2EXPTIME-c combined complexity
• Strictly more expressive than EL
supervisorOf(S,E), employee(E) → employee(S)
[Calì, Gottlob & Lukasiewicz, JWS 2012] & [Calì, Gottlob & Kifer, JAIR 2013]
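Linearity and guardedness are purely syntactic conditions on a single rule, so both can be checked directly. Below is a minimal Python sketch. The rule encoding as (body, head) lists of atoms is a hypothetical choice for illustration. Following the slides, it assumes that variables start with an uppercase letter and constants do not.

```python
Atom = tuple[str, tuple[str, ...]]
Rule = tuple[list[Atom], list[Atom]]  # (body, head)


def is_var(term: str) -> bool:
    # Assumed convention (as in the slides): variables are capitalised.
    return term[:1].isupper()


def body_vars(rule: Rule) -> set[str]:
    body, _ = rule
    return {t for _, args in body for t in args if is_var(t)}


def is_linear(rule: Rule) -> bool:
    # Linearity: exactly one body atom.
    return len(rule[0]) == 1


def is_guarded(rule: Rule) -> bool:
    # Guardedness: some body atom (the guard) contains all body variables.
    variables = body_vars(rule)
    return any(variables <= set(args) for _, args in rule[0])


r_guarded = ([("supervisorOf", ("S", "E")), ("employee", ("E",))], [("employee", ("S",))])
r_linear = ([("person", ("P",))], [("hasFather", ("P", "F")), ("person", ("F",))])
r_join = ([("R", ("X", "Y")), ("P", ("Y", "Z"))], [("T", ("X", "Y", "W"))])
```

Every linear rule is trivially guarded, since its single body atom is the guard. The join rule r_join is neither linear nor guarded.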
EL into Guarded Datalog[∃]
EL: a popular DL for biological applications, at the basis of the OWL 2 EL profile
EL Axioms | Guarded Datalog[∃]
A ⊑ B | A(X) → B(X)
A ⊓ B ⊑ C | A(X), B(X) → C(X)
A ⊑ ∃R.B | A(X) → ∃Y (R(X,Y), B(Y))
∃R.B ⊑ A | R(X,Y), B(Y) → A(X)
…several extensions of EL are captured by Guarded Datalog[∃]
Datalog Rewritability
The program Σ and the query Q are compiled into a Datalog query Q_DAT, which is evaluated over D by exploiting a Datalog engine:
∀D : ⟨D, Σ⟩ ⊨ Q ⇔ D ⊨ Q_DAT
Guarded Datalog[∃]
• Guardedness: a single body-atom contains all the body-variables
• PTIME-c data complexity & 2EXPTIME-c combined complexity
• Strictly more expressive than EL
• Query answering is Datalog rewritable (but cannot be first-order rewritable) ⇒ low data complexity
supervisorOf(S,E), employee(E) → employee(S)
[Calì, Gottlob & Lukasiewicz, JWS 2012] & [Calì, Gottlob & Kifer, JAIR 2013]
Why Beyond Tree-like Models?
elephant(X) → ∃Y hasEAncestor(X,Y), elephant(Y)
cat(X) → ∃Y hasCAncestor(X,Y), cat(Y)
elephant(X), cat(Y) → biggerThan(X,Y)
The model contains elephant(e), elephant(e1), elephant(e2), … and cat(c), cat(c1), cat(c2), …
Its biggerThan atoms ≅ an infinite complete bipartite graph, which is not tree-like
Sticky Datalog[∃]
• Stickiness: join-variables stick to the inferred atoms
• LOGSPACE data complexity & EXPTIME-complete combined complexity
• Strictly more expressive than DL-LiteR
• Query answering is first-order rewritable ⇒ low data complexity
[Calì, Gottlob & P., AIJ 2012]
Sticky:
R(X,Y), P(Y,Z) → ∃W T(X,Y,W)
T(X,Y,Z) → ∃W S(Y,W)
Not sticky (the join variable Y is lost by the second rule):
R(X,Y), P(Y,Z) → ∃W T(X,Y,W)
T(X,Y,Z) → ∃W S(X,W)
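Stickiness is usually defined through a variable-marking procedure. Below is a minimal Python sketch of that procedure as we understand it. First, body variables that do not appear in the head are marked. Then, if a head variable sits at a position where some rule has a marked body variable, the head variable is marked in its own rule's body. The program is sticky if no marked variable occurs more than once in a rule body. The sketch assumes positive rules with no constants, and the rule encoding is hypothetical.

```python
Atom = tuple[str, tuple[str, ...]]
Rule = tuple[list[Atom], list[Atom]]  # (body, head); all terms are variables


def is_sticky(rules: list[Rule]) -> bool:
    # Initial step: mark body variables that do not occur in the head.
    marked: list[set[str]] = []
    for body, head in rules:
        head_vars = {t for _, args in head for t in args}
        marked.append({t for _, args in body for t in args if t not in head_vars})

    # Propagation step, repeated until nothing changes.
    changed = True
    while changed:
        changed = False
        marked_positions = {
            (pred, k)
            for (body, _), m in zip(rules, marked)
            for pred, args in body
            for k, t in enumerate(args)
            if t in m
        }
        for (body, head), m in zip(rules, marked):
            body_vars = {t for _, args in body for t in args}
            for pred, args in head:
                for k, t in enumerate(args):
                    if t in body_vars and t not in m and (pred, k) in marked_positions:
                        m.add(t)
                        changed = True

    # Sticky iff no marked variable occurs more than once in its rule body.
    for (body, _), m in zip(rules, marked):
        occurrences = [t for _, args in body for t in args]
        if any(occurrences.count(v) > 1 for v in m):
            return False
    return True


sigma1 = [
    ([("R", ("X", "Y")), ("P", ("Y", "Z"))], [("T", ("X", "Y", "W"))]),
    ([("T", ("X", "Y", "Z"))], [("S", ("Y", "W"))]),
]
sigma2 = [
    ([("R", ("X", "Y")), ("P", ("Y", "Z"))], [("T", ("X", "Y", "W"))]),
    ([("T", ("X", "Y", "Z"))], [("S", ("X", "W"))]),
]
```

In sigma2, the second rule marks position T[2]. That marks Y in the first rule, where Y is a join variable, so sigma2 is not sticky.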
Several Interesting Extensions
Field of intense research, e.g., Montpellier, Dresden, Calabria, Oxford, Vienna, …
[Figure: extensions of Linear and Guarded (Frontier-Guarded, Weakly-Guarded, Weakly-Frontier-Guarded) and of Sticky (Sticky-Join, Weakly-Sticky, Weakly-Sticky-Join)]
Complexity of the Main Datalog[∃] Languages
Language | Data Complexity | Combined Complexity (bounded arity) | Combined Complexity
Linear | in AC0 | NP-c | PSPACE-c
Guarded | PTIME-c | EXPTIME-c | 2EXPTIME-c
Sticky | in AC0 | NP-c | EXPTIME-c
(The AC0 upper bounds are obtained via query rewriting.)
…can we go beyond positive rules, i.e., Datalog[∃,¬]?
Datalog[∃,¬]
• Rules extended with negative literals in their body
• But what is the semantics of Datalog[∃,¬]?
Number(X) → ∃Y Succ(X,Y), Number(Y)
Number(X), ¬Even(X) → Odd(X)
Number(X), ¬Odd(X) → Even(X)
Well-Founded Semantics (WFS) & Stable Model Semantics (SMS)
Semantics of Datalog[∃,¬]
1. Convert the Datalog[∃,¬] program into a normal LP (via Skolemization)
2. Use the existing WFS and SMS for normal LPs
WFS(D,Σ) := WFS(Π_{D,Σ})
SMS(D,Σ) := SMS(Π_{D,Σ})
Example:
D = {R(a,b), P(a)}
Σ = {R(X,Y) → ∃Z R(Y,Z),
R(X,Y), P(X), ¬S(X) → P(Y),
R(X,Y), ¬P(X) → S(Y)}
After Skolemization:
Π_{D,Σ} = {R(a,b), P(a),
R(X,Y) → R(Y,f(X,Y)),
R(X,Y), P(X), ¬S(X) → P(Y),
R(X,Y), ¬P(X) → S(Y)}
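Skolemization replaces each existentially quantified head variable with a term built from a fresh function symbol. Following the example, the term ranges over the rule's body variables, as in R(X,Y) → R(Y,f(X,Y)). Below is a minimal Python sketch of that rewriting for a single rule. Skolem terms are represented as plain strings, and the atom encoding is hypothetical. The caller must pass a different function symbol (fn) for each rule. With several existential variables, the symbols fn0, fn1, … are used; this naming is also an assumption. Only positive body atoms are passed in, since under the usual safety condition the variables of negative literals already occur there.

```python
Atom = tuple[str, tuple[str, ...]]


def skolemize(body: list[Atom], head: list[Atom], fn: str = "f") -> list[Atom]:
    """Replace existential head variables by Skolem terms over the body variables."""
    body_vars: list[str] = []
    for _, args in body:
        for t in args:
            if t not in body_vars:
                body_vars.append(t)  # keep first-occurrence order
    existentials: list[str] = []
    for _, args in head:
        for t in args:
            if t not in body_vars and t not in existentials:
                existentials.append(t)
    # One fresh function symbol per existential variable (hypothetical naming).
    names = {v: fn if len(existentials) == 1 else f"{fn}{i}" for i, v in enumerate(existentials)}
    terms = {v: f"{names[v]}({','.join(body_vars)})" for v in existentials}
    return [(pred, tuple(terms.get(t, t) for t in args)) for pred, args in head]


# R(X,Y) -> exists Z R(Y,Z)   becomes   R(X,Y) -> R(Y, f(X,Y))
skolem_head = skolemize([("R", ("X", "Y"))], [("R", ("Y", "Z"))])
```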
Query Answering and Datalog[∃,¬]
WFS Boolean Conjunctive Query Answering (WFS-BCQ):
Input: database D, Datalog[∃,¬] program Σ, BCQ Q
Question: WFS(D,Σ) ⊨ Q?
SMS Boolean Conjunctive Query Answering (SMS-BCQ):
Input: database D, Datalog[∃,¬] program Σ, BCQ Q
Question: M ⊨ Q for every M ∈ SMS(D,Σ)?
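For a finite ground normal program, the SMS-BCQ question can be checked by brute force. The steps are: enumerate candidate atom sets, keep those that equal the least model of their Gelfond-Lifschitz reduct, and ask whether the query holds in all of them. Below is a minimal Python sketch. It uses a hypothetical grounding of the Number/Even/Odd rules for a single number "0". Skolemized programs are in general infinite, so this only illustrates the definition and is not a decision procedure.

```python
from itertools import chain, combinations

# A ground normal rule: (head, positive body atoms, negated body atoms).
Rule = tuple[str, frozenset[str], frozenset[str]]


def least_model(definite: list[tuple[str, frozenset[str]]]) -> set[str]:
    model: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def is_stable(candidate: set[str], program: list[Rule]) -> bool:
    # Gelfond-Lifschitz reduct: drop rules whose negated atoms meet the
    # candidate, then drop the remaining negative literals.
    reduct = [(h, pos) for h, pos, neg in program if not (neg & candidate)]
    return least_model(reduct) == candidate


def stable_models(program: list[Rule]) -> list[set[str]]:
    atoms = sorted({h for h, _, _ in program})
    subsets = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(s) for s in subsets if is_stable(set(s), program)]


def cautious(atom: str, models: list[set[str]]) -> bool:
    # SMS-BCQ for a single ground atom: true in every stable model.
    return all(atom in m for m in models)


E: frozenset[str] = frozenset()
program = [
    ("number0", E, E),
    ("even0", frozenset({"number0"}), frozenset({"odd0"})),
    ("odd0", frozenset({"number0"}), frozenset({"even0"})),
]
models = stable_models(program)
```

The two stable models make "0" either even or odd. Under SMS, only atoms that hold in both models, such as number0, are entailed.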
Guarded Datalog[∃,¬]
[Gottlob, Hernich, Kupke & Lukasiewicz, PODS 2013, KR 2014]
• Tree-likeness of the underlying models is preserved
R(X,Y,Z), P(X,Y), ¬S(Z,X) → ∃W R(Y,Z,W), S(W,Z)
• Query answering via a blocking technique
• Guarded Datalog[∃,¬] under SMS ≤p Guarded Datalog[∃,¬s,∨], i.e., with disjunction and stratified negation
Problem | Data | Combined
WFS-BCQ | PTIME-c | 2EXPTIME-c
SMS-BCQ | coNP-c | 2EXPTIME-c
Linear Datalog[∃,¬]
[Gottlob, Hernich, Kupke & Lukasiewicz, PODS 2013, KR 2014]
R(X,Y,Z), ¬S(Z,X) → ∃W R(Y,Z,W), S(W,Z)
Problem | Data | Combined
WFS-BCQ | PTIME-c | 2EXPTIME-c
SMS-BCQ | coNP-c | 2EXPTIME-c
(compare: LOGSPACE data and PSPACE-c combined complexity for positive Linear Datalog[∃])
Linear Datalog[∃,¬] behaves like Guarded Datalog[∃,¬]
Sticky Datalog[∃,¬]
[Alviano & P., PODS 2015]
• Stickiness: join-variables stick to the inferred atoms
• What is the right definition for Sticky Datalog[∃,¬]?
…either consider or ignore the variables in negative literals
R(X,Y), P(Y,Z) → ∃W T(X,Y,W)
T(X,Y,Z) → ∃W S(Y,W)
R(X,Y), P(Y,Z) → ∃W T(X,Y,W)
T(X,Y,Z) → ∃W S(X,W)
Problem | Sticky | Sticky+
WFS-BCQ | EXPTIME-c / in PTIME | Undecidable
SMS-BCQ | Undecidable | Undecidable
(Entries: combined / data complexity. Sticky+: variables in negative literals do not obey the stickiness condition.)
Sticky Datalog[∃,¬] is Undecidable under SMS
• Each stable model encodes a possible computation of the Turing machine
• The query checks whether at least one stable model represents a valid halting computation
[Figure: grid in which the k-th horizontal row represents the k-th configuration of the Turing machine]
[Alviano & P., PODS 2015]
Sticky Datalog[∃,¬]: Sum Up
[Alviano & P., PODS 2015]
• Stickiness + WFS: a proof-theoretic approach
• Stickiness + SMS: ∃-quantification + cartesian products + guessing
But…
move(X,Y), ¬win(Y) → win(X)
• Even rules with exactly one positive atom may not be sticky
• Can we do better?
…the second dimension of stickiness
[Alviano & P., PODS 2015]
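The win/move rule is the classic example for the well-founded semantics. A position is won if some move leads to a lost position, and positions on a cycle stay undefined. Below is a minimal Python sketch of Van Gelder's alternating fixpoint, which is one standard way to compute the WFS of a finite ground normal program. The move graph is hypothetical.

```python
# A ground normal rule: (head, positive body atoms, negated body atoms).
Rule = tuple[str, frozenset[str], frozenset[str]]


def gamma(program: list[Rule], interp: set[str]) -> set[str]:
    """Least model of the Gelfond-Lifschitz reduct of program w.r.t. interp."""
    reduct = [(h, pos) for h, pos, neg in program if not (neg & interp)]
    model: set[str] = set()
    changed = True
    while changed:
        changed = False
        for h, pos in reduct:
            if pos <= model and h not in model:
                model.add(h)
                changed = True
    return model


def well_founded(program: list[Rule]) -> tuple[set[str], set[str]]:
    """Alternating fixpoint: returns (true atoms, undefined atoms)."""
    true: set[str] = set()
    while True:
        possibly_true = gamma(program, true)      # overestimate
        new_true = gamma(program, possibly_true)  # underestimate
        if new_true == true:
            return true, possibly_true - true
        true = new_true


def win_program(moves: list[tuple[str, str]]) -> list[Rule]:
    # Ground instances of  move(X,Y), not win(Y) -> win(X)  over the move facts.
    return [(f"win({x})", frozenset(), frozenset({f"win({y})"})) for x, y in moves]


# Hypothetical game: a -> b -> c, plus a cycle d <-> e.
true_atoms, undefined = well_founded(win_program([("a", "b"), ("b", "c"), ("d", "e"), ("e", "d")]))
```

Here c has no moves, so win(c) is false, win(b) is true and win(a) is false. The atoms win(d) and win(e) remain undefined.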
2D-Stickiness
• 1st Dimension: the positive part is sticky
• 2nd Dimension: negative literals that lose a variable stick to one positive atom
Examples, each checked against the 1st and 2nd dimension:
move(X,Y), ¬win(Y) → win(X)
P(X,Y), R(Y), ¬R(X) → ∃Z S(Y,Z)

T(X,Y), R(Y,Z) → P(X,Y)
P(X,Y), R(Y,Z), ¬R(X,X) → ∃Z S(Y,Z)

P(X), P(Y) → T(Y,X)
T(X,Y), ¬R(Y,X) → ∃Z S(Z)
2D-Sticky Datalog[∃,¬]
[Alviano & P., PODS 2015]
Problem | 2D-Sticky | 2D-Sticky+2
WFS-BCQ | 2EXPTIME-c / PTIME-c | Undecidable
SMS-BCQ | Undecidable | Undecidable
(Entries: combined / data complexity. 2D-Sticky+2: negative literals may stick to two positive atoms.)
• Employ a proof-theoretic approach
Datalog[∃,¬]: An Overview
[Figure: Linear, Guarded, Sticky and 2D-Sticky Datalog[∃,¬]]
Conclusions and Future Work
Transform Datalog from a first-class database query language to a
first-class language for knowledge representation
(and other applications)
Problems under investigation:
• Stickiness + stable model semantics
• Deal with equality: Datalog[∃,¬,=]
• New semantics without applying Skolemization, following the
approach to stable models by Ferraris, Lee & Lifschitz
Thank you!