Query answering and query abstraction through ontologies

Maurizio Lenzerini

Dipartimento di Ingegneria Informatica Automatica e Gestionale Antonio Ruberti

EROSS 2020

ER Online Summer Seminars

October 14, 2020

Maurizio Lenzerini Query answering and query abstraction through ontologies (1/59)

ER Online Summer Seminars: very interesting event

ER Online Summer Seminars: very, very interesting event

ER Online Summer Seminars: very, very, very interesting event

Conceptual modeling in the ’70s: top-down data design

[Figure: at design time, conceptual modeling yields the conceptual schema and implementation yields the DB; at run time, only the DB is used.]

Why does the conceptual schema disappear at run time?

(My) research question

[Figure: at run time, the conceptual schema is used together with the DB, to which it is linked by a mapping.]

Principles and tools for “using” the conceptual schema at run time

Ontology-based data management

Based on three main components:

Ontology, a declarative, logic-based specification of the domain of interest, used as a unified, conceptual view for clients

Data sources, representing external, independent, heterogeneous storage (or, more generally, computational) structures

Mappings, used to semantically link data at the sources to the ontology

Abstraction for several data management and interoperability scenarios

Data interoperability architectures

Data independence [Date et al.]
Data integration [Doan et al. 2012]
Data exchange [Arenas et al. 2014]
Collaborative data sharing [Karvounarakis et al. 2013]

Data independence at run time (data access through ontology)

(Virtual) data integration (data federation)

Data exchange/consolidation (materialized data integration, ETL, ELT)

Collaborative data sharing (P2P data integration, Peer data management)

Common feature: mappings (arrows)

Mappings

The relationship between two nodes of the data interoperability network is specified by means of a set of mapping assertions.

Syntax of mapping assertions

Each mapping assertion from (source) node S to (target) node G has the form

$\forall \vec{x}\, (\Phi(\vec{x}) \rightarrow \Psi(\vec{x}))$

where

$\Phi(\vec{x})$ is a query over S, whose free variables are $\vec{x}$

$\Psi(\vec{x})$ is a query over G, whose free variables are from $\vec{x}$.

Intuitive semantics of mapping assertions

The data items in S satisfying the pattern expressed as $\Phi(\vec{x})$ “match” the data items in G satisfying the pattern expressed as $\Psi(\vec{x})$.

Types of mappings

Taxonomy from data integration [L. 2002]

Global as view (GAV) – $\Psi(\vec{x})$ is an atom $v(\vec{x})$

$\forall \vec{x}\, (\Phi(\vec{x}) \rightarrow v(\vec{x}))$

the element v of the target node is associated with a view over the source node

Local as view (LAV) – $\Phi(\vec{x})$ is an atom $v(\vec{x})$

$\forall \vec{x}\, (v(\vec{x}) \rightarrow \Psi(\vec{x}))$

the element v of the source node is associated with a view over the target node

GLAV

$\forall \vec{x}\, (\Phi(\vec{x}) \rightarrow \Psi(\vec{x}))$

a view over the source node is associated with a view over the target node

Example of mapping

Consider the following mapping relating Node 1 to Node 2:

∀x ∀y ∀z (Student(x) ∧ Grade(x, y, z) → ∃w (Teaches(w, y) ∧ Enrolled(x, y)))

Data at Node 1:
{ Student(Mario), Grade(Mario, IS, B), Student(Anna), Grade(Anna, AI, A) }

Data at Node 2 after data exchange:
{ Teaches(w1, IS), Enrolled(Mario, IS), Teaches(w2, AI), Enrolled(Anna, AI) }
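The exchange step in this example can be sketched in a few lines of Python. This is only an illustration for this one mapping, not a general chase procedure; the function name `exchange` and the encoding of facts as `(predicate, arguments)` pairs are assumptions made for the sketch.

```python
from itertools import count


def exchange(source_facts):
    """Apply Student(x) ∧ Grade(x, y, z) → ∃w (Teaches(w, y) ∧ Enrolled(x, y))."""
    fresh = count(1)  # generator of labeled nulls w1, w2, ...
    students = {args[0] for pred, args in source_facts if pred == "Student"}
    target = []
    for pred, args in source_facts:
        if pred == "Grade" and args[0] in students:
            x, y, _z = args
            w = f"w{next(fresh)}"  # one fresh null per match of the left-hand side
            target.append(("Teaches", (w, y)))
            target.append(("Enrolled", (x, y)))
    return target


node1 = [
    ("Student", ("Mario",)), ("Grade", ("Mario", "IS", "B")),
    ("Student", ("Anna",)), ("Grade", ("Anna", "AI", "A")),
]
node2 = exchange(node1)
```

The existential variable w is witnessed by a fresh labeled null for every match, which is why the teachers at Node 2 appear as w1 and w2 rather than as known constants.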


This talk: how to use mappings for processing queries in ontology-based data management

Compile time
  Discovery
  Analysis (equivalence, redundancy, optimization)
  Reasoning (inverse, composition, etc.)

Run time
  Exchanging data
  Update propagation
  Translation
  Processing queries in ontology-based data management

[Figure: ontology linked to the data sources]

Outline

1 Ontology-based data management

2 Processing queries in OBDM: answering and abstraction

3 Query answering

4 Query abstraction

5 Conclusion


1 Ontology-based data management

Formal framework of ontology-based data management

An ontology-based data management (OBDM) specification [Poggi et al. 2008, L. 2018] is a triple J = ⟨O, M, S⟩, where

O is the ontology, expressed as a logical theory – here, as usual, a TBox in a Description Logic (DL)

S is a data schema representing the data sources – here, a (federated) relational schema

M is a set of mapping assertions, each of the form

$\forall \vec{x}\, (\Phi(\vec{x}) \rightarrow \Psi(\vec{x}))$

where (in this talk we focus on conjunctive queries, but we could be more general)

$\Phi(\vec{x})$ is a conjunctive query (SPJ query) over S, with free variables $\vec{x}$

$\Psi(\vec{x})$ is a conjunctive query (SPJ query) over O, with free variables from $\vec{x}$.

An OBDM system is a pair (J, D), where J = ⟨O, M, S⟩ is an OBDM specification, and D is an S-database, i.e., a source database that is legal for S.

Ontology-based data management specification – Example

Ontology O (TBox)

Employee ⊑ ∃worksFor
Employee ⊑ ∃empCode
Employee ⊑ ∃salary
Project ⊑ ∃worksFor⁻
Project ⊑ ∃projectName
∃worksFor ⊑ Employee
∃worksFor⁻ ⊑ Project

Federated source schema S

D1[SSN: String, PrName: String]   Employees and Projects they work for
D2[Code: String, Salary: Int]     Employee’s Code with salary
D3[Code: String, SSN: String]     Employee’s Code with SSN
. . .

Mapping M

M1: SELECT SSN, PrName
    FROM D1
    → Employee(pe(SSN)), Project(pr(PrName)), projectName(pr(PrName), PrName), worksFor(pe(SSN), pr(PrName))

M2: SELECT SSN, Salary
    FROM D2, D3
    WHERE D2.Code = D3.Code
    → Employee(pe(SSN)), salary(pe(SSN), Salary)

M3: SELECT SSN, Salary
    FROM D2, D3
    WHERE D2.Code = D3.Code
    AND SSN NOT IN (SELECT SSN FROM D1)
    → Employee(pe(SSN)), salary(pe(SSN), Salary)

Ontology-based data management: Semantics

Let J = ⟨O, M, S⟩ be an OBDM specification, and let I = (Δ^I, ·^I) be an interpretation for the ontology O.

Def.: Semantics

The semantics of J is given with respect to an S-database D. I = (Δ^I, ·^I) is a model of (J, D) if:

I satisfies all axioms of O, i.e., is a model of O;

I satisfies M wrt D, i.e., satisfies every assertion $\Phi(\vec{x}) \rightarrow \Psi(\vec{x})$ in M wrt D, which means that the sentence $\forall \vec{x}\, (\Phi(\vec{x}) \rightarrow \Psi(\vec{x}))$ is true in I ∪ D.

(J, D) is satisfiable, or consistent, if it admits at least one model.

Semantics of queries

The semantics of a query $q(\vec{x})$ posed to the OBDM system (J, D) is given in terms of the set cert(q, J, D) of certain answers, i.e., those answers that hold in all the models of (J, D).
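The definition of certain answers can also be written as an intersection over all models. The formula below is only a restatement of the sentence above; the symbol Mod(J, D) for the set of models of (J, D) and the notation q^I for the evaluation of q in I are introduced here for convenience and are not taken from the slides.

```latex
\mathit{cert}(q, \mathcal{J}, D) \;=\;
  \bigl\{\, \vec{c} \;\bigm|\; \vec{c} \in q^{\mathcal{I}}
  \text{ for every } \mathcal{I} \in \mathit{Mod}(\mathcal{J}, D) \,\bigr\}
\;=\; \bigcap_{\mathcal{I} \in \mathit{Mod}(\mathcal{J}, D)} q^{\mathcal{I}}
```

Here the tuples are understood as tuples of constants, so that the same tuple can be compared across different models.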


Ontology-mediated query answering

A DL-knowledge base is a pair ⟨T, A⟩, where T is a TBox (also called ontology, i.e., general axioms on concepts and relations) and A is an ABox (specific axioms on individuals)

Ontology-mediated query answering, or query answering over DL-KBs

Given a DL-knowledge base ⟨T, A⟩, a query q over such KB and a tuple a of constants in A, check whether a is a certain answer to q over ⟨T, A⟩, i.e., if q(a) is true in every model of T ∪ A.

Let us denote by M(D) the ABox obtained by “applying” the mapping M to D (i.e., by (i) computing the answer to the left-hand-side query of each mapping assertion m, and (ii) for each such answer adding into the ABox the “corresponding” tuple satisfying the right-hand-side query of m)

Proposition

If q is a query posed to J = ⟨O, M, S⟩ and D is an S-database, then a ∈ cert(q, J, D) if and only if a is a certain answer to q over ⟨O, M(D)⟩.
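Steps (i) and (ii) of the construction of M(D) can be illustrated with a small Python sketch that uses the standard `sqlite3` module. It materializes the ABox for the assertion M1 of the earlier example (with the role written worksFor, as in the TBox). The encoding of head atoms as templates, the function name `apply_mapping` and the sample row are assumptions made for this sketch.

```python
import sqlite3

# One assertion = (SQL query over S, list of head atom templates).
# A template argument is ("pe", col) or ("pr", col) for an object term,
# or ("val", col) for a plain data value.
M1 = (
    "SELECT SSN, PrName FROM D1",
    [
        ("Employee", [("pe", "SSN")]),
        ("Project", [("pr", "PrName")]),
        ("projectName", [("pr", "PrName"), ("val", "PrName")]),
        ("worksFor", [("pe", "SSN"), ("pr", "PrName")]),
    ],
)


def apply_mapping(conn, mapping):
    """Return the ABox M(D) as a set of ground atoms (predicate, args)."""
    abox = set()
    for sql, head in mapping:
        cur = conn.execute(sql)                      # (i) answer the left-hand side
        cols = [d[0] for d in cur.description]
        for row in cur:
            binding = dict(zip(cols, row))
            for pred, templ in head:                 # (ii) add the head atoms
                args = tuple(
                    binding[col] if fn == "val" else f"{fn}({binding[col]})"
                    for fn, col in templ
                )
                abox.add((pred, args))
    return abox


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE D1 (SSN TEXT, PrName TEXT)")
conn.execute("INSERT INTO D1 VALUES ('123', 'Optique')")  # made-up sample row
abox = apply_mapping(conn, [M1])
```

In a real OBDM system the ABox is usually kept virtual rather than materialized; the sketch only makes the definition of M(D) concrete.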


2 Processing queries in OBDM: answering and abstraction

Two fundamental problems regarding queries

Given OBDM specification J = 〈O,M,S〉

1 Query answering:
given a query qO over the ontology O, compute a query qS that captures the certain answers to qO at best under J
−→ Direct rewriting (or, Ontology-to-Source rewriting – top-down)

2 Query abstraction:
given a query qS over the source schema S, compute a query qO whose certain answers capture qS at best under J
−→ Reverse rewriting (or, Source-to-Ontology rewriting – bottom-up)

Usages of query answering and abstraction

[Figure: ontology, data sources and query, shown once for query answering and once for query abstraction]

1 Query answering:
  Compute the certain answers of a query expressed over the ontology
  Look for source data that are inconsistent/incomplete with respect to the ontology
  Tell me which sources store data corresponding to instances of relevant concepts/relationships, or relevant views of such ontology elements

2 Query abstraction:
  Explain the content of a data source in terms of the ontology
  Verify if a given data service expressed over the data sources can be expressed in terms of the ontology
  Automatically associate semantics to open data sets

3 Query answering

Computing certain answers

Problem

Given an OBDM specification J = ⟨O, M, S⟩ and a query qO over the ontology O, compute a query qS over the source schema S that best characterizes qO under J.

The best characterization is obtained by the so-called perfect rewriting.

Formal definition of perfect rewriting

A query qS over S is a perfect O-to-S J-rewriting of qO if for every S-database D, $q_S^D$ coincides with the certain answers cert(qO, J, D).

Two basic computational problems:

Verification (check if qS is a perfect O-to-S J-rewriting of qO)

Computation (compute the perfect O-to-S J-rewriting of qO)

Starting from [Calvanese et al. 2007], computing perfect O-to-S J-rewritings in several scenarios has been one of the most studied problems in KR&R in recent years.


Direct rewriting for computing certain answers

Note that ComputerProfessor is partitioned into ComputerScientist and ComputerEngineer. Here is the ABox corresponding to M(D), rendered as a “knowledge graph”:

[Figure: Andrea’s example (knowledge graph)]

Query: { (x) | ∃y, z. supervisedBy(x, y), ComputerSC(y), hates(y, z), ComputerEng(z) }

Answer: { john }. To obtain this answer, we need to reason by cases.

−→ No first-order query is a perfect ontology-to-source rewriting of q
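Reasoning by cases can be made concrete with a small Python sketch. Since the knowledge graph of the slide is not available in this text, the ABox below is a hypothetical one in the same spirit (the individuals ann, bob and carl are invented): bob is only known to be a ComputerProfessor, so every model must classify him either as a scientist or as an engineer. For this positive query it is enough to evaluate it under each of the two classifications and intersect the results.

```python
from itertools import product

# Hypothetical ABox (not the one drawn on the slide).
supervised_by = {("john", "ann"), ("john", "bob")}
hates = {("ann", "bob"), ("bob", "carl")}
scientist = {"ann"}        # known ComputerScientist
engineer = {"carl"}        # known ComputerEngineer
professor_only = ["bob"]   # known ComputerProfessor, subclass unknown


def answers(sc, eng):
    """Evaluate the query once the scientist and engineer sets are fixed."""
    return {
        x
        for (x, y) in supervised_by
        if y in sc
        for (y2, z) in hates
        if y2 == y and z in eng
    }


def certain_answers():
    """Intersect the answers over every way of resolving the partition."""
    result = None
    for choice in product(("SC", "ENG"), repeat=len(professor_only)):
        sc = scientist | {p for p, c in zip(professor_only, choice) if c == "SC"}
        eng = engineer | {p for p, c in zip(professor_only, choice) if c == "ENG"}
        found = answers(sc, eng)
        result = found if result is None else result & found
    return result
```

On this ABox john is an answer in both cases, but with a different witness each time (ann hates bob, or bob hates carl), so evaluating the query directly over the explicit facts finds nothing.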


Complexity of conjunctive query answering in DLs

Studied extensively for (unions of) CQs and various ontology languages:

                   Combined complexity   Data complexity
Plain databases    NP-complete           AC0 (1)
OWL 2              ???                   coNP-hard (2)

(1) This makes it possible to scale with the data.

(2) Already for a TBox with a single disjunction (see example above).

Research question

Can we find interesting DLs for which we can always rewrite the query over the ontology into a FOL query over the source? A lot of research work/answers in the last decade!


The DL-Lite family

DL-Lite is a family [Calvanese et al. 2007] of DLs optimized according to the tradeoff between expressive power and complexity of query answering, with emphasis on data.

The same complexity as relational databases.

In fact, query answering is FOL-rewritable and hence can be delegated to a relational DB engine.

Nevertheless they have the right expressive power to capture the essential features of conceptual modeling formalisms.

DL-Lite provides robust foundations for Ontology-Based Data Management.

The DL-Lite family is at the basis of the OWL 2 QL profile of the W3C standard Web Ontology Language OWL.

Capturing basic ontology constructs in DL-Lite

Modeling construct                           DL-Lite

ISA between classes                          Student ⊑ Person
... and of relations                         isMotherOf ⊑ isParentOf
Disjointness between classes                 Student ⊑ ¬Professor
... and of relations                         isMotherOf ⊑ ¬isFatherOf
Domain of relations                          ∃livesIn ⊑ Person
Range of relations                           ∃livesIn⁻ ⊑ City
Mandatory participation (min card = 1)       Person ⊑ ∃livesIn
                                             City ⊑ ∃livesIn⁻
Functionality of relations (max card = 1)    (funct livesIn)
                                             (funct livesIn⁻)
                                             but a functional relation cannot be specialized

Note: DL-Lite distinguishes between abstract objects and data values as well (we can represent concept attributes) (ignored here).

Query answering in DL-Lite-based OBDM systems

In a DL-Lite-based OBDM specification ⟨O, M, S⟩:

O is expressed in DL-Lite;

M is a set of GAV mapping assertions (the right-hand side Ψ is a conjunctive query (CQ) without existential variables);

queries over O are unions of conjunctive queries (UCQs).

Query answering is performed through ontology-to-source rewriting:

Given a (U)CQ q, an OBDM specification J = ⟨O, M, S⟩, and an S-database D, we compute cert(q, J, D) as follows:

1 we compute the ontology rewriting $r_{q,O}$ of q using O;

2 we compute the mapping rewriting r of $r_{q,O}$ using M;

3 we evaluate the UCQ r directly over D.

Correctness of this procedure shows that r is the perfect ontology-to-source rewriting of q wrt J, and shows FOL-rewritability of query answering in DL-Lite.

Query answering in DL-Lite-based OBDM systems

S contains two tables:

T_REG(ID, JOB)

T_STUDENT(ID, UNIVERSITY)

[Figure: ontology with the concepts Animal, Person, Employee, Student and University, linked by mappings to the source tables T_REG (with the selection JOB = ‘e’) and T_STUDENT]

Ontology query qO: { (x) | Person(x) }

Ontology rewriting: { (x) | Person(x) ∨ Employee(x) ∨ Student(x) }

Mapping rewriting: { (x) | T_REG(x, y) ∨ T_STUDENT(x, z) }
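The three steps (ontology rewriting, mapping rewriting, evaluation over the sources) can be sketched in Python for atomic concept queries. The sketch is deliberately restricted to a concept hierarchy and does not implement a full DL-Lite rewriting algorithm. The inclusions and mapping assertions below are assumptions read off the example (T_REG rows are Persons, rows with JOB = 'e' are Employees, T_STUDENT rows are Students), and the sample rows are invented.

```python
import sqlite3

# Assumed ontology: Employee ⊑ Person, Student ⊑ Person.
SUBCLASS_OF = {"Employee": "Person", "Student": "Person"}

# Assumed GAV mapping: concept -> SQL queries returning its instances.
MAPPING = {
    "Person": ["SELECT ID FROM T_REG"],
    "Employee": ["SELECT ID FROM T_REG WHERE JOB = 'e'"],
    "Student": ["SELECT ID FROM T_STUDENT"],
}


def ontology_rewriting(concept):
    """Step 1: the concept plus every concept (transitively) subsumed by it."""
    result, changed = {concept}, True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return sorted(result)


def mapping_rewriting(concepts):
    """Step 2: unfold each concept into the SQL of its mapping assertions."""
    parts = [sql for c in concepts for sql in MAPPING.get(c, [])]
    return " UNION ".join(parts)


def certain_answers(conn, concept):
    """Step 3: evaluate the unfolded query directly over the sources."""
    sql = mapping_rewriting(ontology_rewriting(concept))
    return {row[0] for row in conn.execute(sql)}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_REG (ID TEXT, JOB TEXT);
    CREATE TABLE T_STUDENT (ID TEXT, UNIVERSITY TEXT);
    INSERT INTO T_REG VALUES ('ann', 'e'), ('bob', 'x');
    INSERT INTO T_STUDENT VALUES ('carl', 'Sapienza');
""")
```

The unfolded SQL for Person contains the Employee branch as well, which is redundant here because it is subsumed by the unrestricted selection over T_REG; this matches the simplified mapping rewriting shown above.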


Computational complexity of query answering

Proposition

Query answering on a DL-Lite ontology-based data management system (⟨O, M, S⟩, D) of the kind considered so far is

1 PTime in the size of the ontology O and the mappings M.

2 AC0 in the size of the database D, in fact FOL-rewritable.

3 Exponential in the size of the query, more precisely NP-complete.

Precisely the complexity of evaluating CQs in plain relational DBs.

Can we go beyond DL-Lite and remain in AC0?

The DLs of the DL-Lite family are essentially the maximally expressive DLs enjoying these nice computational properties.

4 Query abstraction

Query abstraction

Problem

Given an OBDM specification J = ⟨O, M, S⟩ and a query qS over the source schema S, compute a query qO over the ontology O whose certain answers best characterize qS under J.

A sort of reverse engineering problem, which is relevant in several tasks, e.g.,

providing the semantics of the various data sources;

providing the semantics of data services expressed over the sources;

checking whether the ontology and the mapping assertions are “adequate” for answering source queries through the OBDM system;

automatically documenting the semantics of open data sets.

The notion of abstraction (sometimes called realization or reverse rewriting) aims at addressing this problem [Cima 2017, Lutz et al. 2018, Cima et al. 2019, Cima 2020].

Perfect abstraction

Let J = ⟨O, M, S⟩ be an OBDM specification, and let qS be a query over S. The best characterization is obtained by the so-called perfect abstraction.

Definition (Formal definition of abstraction)

A query qO over O is a perfect J-abstraction of qS if, for every S-database D such that ⟨J, D⟩ is consistent, we have that

$q_S^D = \mathit{cert}(q_O, \mathcal{J}, D)$

            Query answering                 Query abstraction
            Ontology-to-Source rewriting    Source-to-Ontology rewriting
            (top-down)                      (bottom-up)
input:      qO                              qS
output:     qS                              qO

Two basic computational problems:

Verification (check if qO is a perfect J-abstraction of qS)

Computation (compute the perfect J-abstraction of qS)


Perfect abstraction - Example

[Figure: ontology with the concepts Animal, Person, Employee, Student and University, linked by mappings to the source tables T_REG (with the selection JOB = ‘e’) and T_STUDENT]

Source query qS: select ID as x from T_STUDENT

Source query q′S: select ID as x from T_REG

perfect J-abstraction of qS: Student(x)

perfect J-abstraction of q′S: none

Sound and complete abstractions

The perfect abstraction for a source query may not exist.

Let J = 〈O,M,S〉 be an OBDM specification, and let qS be a query over S.

Definition (Sound abstraction)

A query qO over O is a sound J-abstraction of qS if, for every S-database D such that ⟨J, D⟩ is consistent, we have that

$\mathit{cert}(q_O, \mathcal{J}, D) \subseteq q_S^D$

Definition (Complete abstraction)

A query qO over O is a complete J-abstraction of qS if, for every S-database D such that ⟨J, D⟩ is consistent, we have that

$q_S^D \subseteq \mathit{cert}(q_O, \mathcal{J}, D)$
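The two inclusions can be tested on a single database with a short Python sketch. Such a test can only refute soundness or completeness (by exhibiting a counterexample database); it can never establish them, since the definitions quantify over every S-database. The setting below is an assumption modelled on the running T_REG / T_STUDENT example, and the function names and sample rows are hypothetical.

```python
# Assumed setting: T_REG rows give Persons, rows with JOB = 'e' give Employees,
# T_STUDENT rows give Students; Employee ⊑ Person and Student ⊑ Person.
def cert(concept, db):
    """Certain answers of an atomic concept query under the assumed specification."""
    reg = {i for (i, job) in db["T_REG"]}
    emp = {i for (i, job) in db["T_REG"] if job == "e"}
    stu = {i for (i, uni) in db["T_STUDENT"]}
    return {"Person": reg | emp | stu, "Employee": emp, "Student": stu}[concept]


def q_reg(db):
    """Source query: select ID as x from T_REG."""
    return {i for (i, job) in db["T_REG"]}


def is_sound_on(concept, source_query, db):
    """cert(qO, J, D) ⊆ qS^D on this particular database D."""
    return cert(concept, db) <= source_query(db)


def is_complete_on(concept, source_query, db):
    """qS^D ⊆ cert(qO, J, D) on this particular database D."""
    return source_query(db) <= cert(concept, db)


db = {
    "T_REG": {("ann", "e"), ("bob", "x")},
    "T_STUDENT": {("carl", "Sapienza")},
}
```

On this database Person(x) passes the completeness test but fails the soundness test (carl is a Person who is not in T_REG), while Employee(x) passes the soundness test but fails the completeness test (bob is in T_REG without being an Employee).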



Sound and complete abstractions - Example

[Figure: ontology with the concepts Animal, Person, Employee, Student and University, linked by mappings to the source tables T_REG (with the selection JOB = ‘e’) and T_STUDENT]

Source query qS: select ID as x from T_STUDENT

Source query q′S: select ID as x from T_REG

perfect J-abstraction of qS: Student(x)

perfect J-abstraction of q′S: none

complete J-abstraction of q′S: Animal(x)

complete J-abstraction of q′S: Person(x)

sound J-abstraction of q′S: Person(x), University(x)

sound J-abstraction of q′S: Employee(x)

Maximally sound and minimally complete abstractions

Let L be a class of queries.

Definition

If qO ∈ L is a sound J-abstraction of qS, then qO is L-maximally sound if no q′ ∈ L exists such that

(i) q′ is a sound J-abstraction of qS,

(ii) for every S-database D, cert(qO, J, D) ⊆ cert(q′, J, D), and

(iii) there exists an S-database D s.t. cert(qO, J, D) ⊂ cert(q′, J, D).

Definition

If qO ∈ L is a complete J-abstraction of qS, then qO is L-minimally complete if no q′ ∈ L exists such that

(i) q′ is a complete J-abstraction of qS,

(ii) for every S-database D, cert(q′, J, D) ⊆ cert(qO, J, D), and

(iii) there exists an S-database D s.t. cert(q′, J, D) ⊂ cert(qO, J, D).


Maximal and minimal abstractions - Example

[Figure: ontology with the concepts Animal, Person, Employee, Student and University, linked by mappings to the source tables T_REG (with the selection JOB = ‘e’) and T_STUDENT]

Source query qS: select ID as x from T_STUDENT

Source query q′S: select ID as x from T_REG

perfect J-abstraction of qS: Student(x)

perfect J-abstraction of q′S: none

UCQ-minimally complete J-abstraction of q′S: Person(x)

UCQ-maximally sound J-abstraction of q′S: Employee(x)

Can we compute the UCQ-minimally complete abstractions?

Theorem

The verification problem for complete abstractions is NP-complete.

What about the computation problem? We now focus on the problem of computing the UCQ-minimally complete abstraction of a query qS over the source schema S with respect to an OBDM specification J = ⟨O, M, S⟩, where

qS is a CQ,

O is a DL-Lite_R ontology (therefore, no functionality assertions), and

M is a set of GLAV mapping assertions of the form

$\forall \vec{x}\, \forall \vec{y}\, (\Phi_S(\vec{x}, \vec{y}) \rightarrow \exists \vec{z}\, \Psi_O(\vec{x}, \vec{z}))$

where $\Phi_S$ is a CQ over S and $\Psi_O$ is a CQ over the ontology O.

Note that computing such an abstraction also solves the problem of computing the perfect abstraction (it is sufficient to check the result for soundness).

How to compute it: basic idea

There is a strict correlation between CQs and relational databases. Given a CQ q over a schema S it is possible to construct in linear time an S-database D_q that fully captures q, and vice versa:

every constant in q becomes a value in D_q;
every variable in q becomes a labeled null in D_q;
every atom $R_i(\vec{u})$ ∈ q becomes a fact (tuple) in D_q.

Example

Let S contain the tables Tab1(id, city) and Tab2(id, city), and consider the following CQ qS over S

qS(x) ← Tab1(x, y), Tab2(‘sara’, y)

The S-database D_qS associated to the query qS is:

Tab1(x, y), Tab2(‘sara’, y)

where x, y are labeled nulls.
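The construction of the database associated to a CQ is short enough to write down directly. The Python sketch below follows the three rules above; the `Null` class and the encoding of atoms as `(relation, terms)` pairs are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    """A labeled null standing for a query variable."""
    name: str


def canonical_database(atoms, variables):
    """Turn the body of a CQ into its associated database.

    atoms: list of (relation, terms), where terms are strings.
    variables: the names that are variables; every other term is a constant.
    """
    db = {}
    for relation, terms in atoms:
        row = tuple(Null(t) if t in variables else t for t in terms)
        db.setdefault(relation, set()).add(row)
    return db


# qS(x) ← Tab1(x, y), Tab2('sara', y)
body = [("Tab1", ("x", "y")), ("Tab2", ("sara", "y"))]
d_q = canonical_database(body, variables={"x", "y"})
```

Each atom is visited once, which reflects the linear-time bound stated above. The same null `Null("y")` occurs in both tables, so the join of the query is preserved in the database.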


How to compute it: basic idea (cont’d)

Roughly speaking, the S-database D_qS associated to qS is representative of those instances of S on which the evaluation of qS is non-empty.

Given the OBDM specification J = ⟨O, M, S⟩, we denote by M(D_qS) the set of O-facts (ABox, in DL terminology) obtained by chasing D_qS with the mapping assertions in M.

Intuition

The query qO over the ontology corresponding to the UCQ-minimally complete J-abstraction of qS is the query corresponding to M(D_qS).

qS −→ D_qS −→ M(D_qS) −→ qO

The three arrows stand for: compute the associated DB, compute the chase, compute the corresponding query.

How to compute it: basic idea (cont’d)

Example

Suppose we have the following mapping:

m1: Tab1(x, y) ⇝ Person(x), livesIn(x, y)
m2: Tab2(x, y) ⇝ Person(x), worksIn(x, y)

and consider the instance D_qS of S associated to the query qS:

Tab1(x, y), Tab2(‘sara’, y)

It is easy to see that M(D_qS) is the ABox containing the following assertions:

Person(x), livesIn(x, y), Person(‘sara’), worksIn(‘sara’, y)

From M(D_qS) we can construct the following query over the ontology, denoting all the Persons living in the same city where the person ‘sara’ works

qO(x) ← Person(x), livesIn(x, y), Person(‘sara’), worksIn(‘sara’, y)

UCQ-minimally complete abstraction: algorithm

We denote by ⊤ the special conjunctive query over O that is true for every tuple.

Algorithm FindMinimallyCompleteAbstraction

Input: a DL-Lite_R OBDM specification ⟨O, M, S⟩,
       UCQ qS = q1S(t1) ∪ . . . ∪ qnS(tn) over S

Output: UCQ qO over O

begin
  1 qO := { t1 | M(q1S), ⊤(t1) } ∪ . . . ∪ { tn | M(qnS), ⊤(tn) }
  2 return qO
end

Informally, the algorithm computes the output query as a union of CQs obtained by simply applying the mapping M to each CQ qiS in qS, using ⊤ to bind the distinguished variables of the output query that do not appear in M(qiS).
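A simplified Python sketch of this procedure follows. It is an illustration under stated restrictions, not the authors' implementation: mapping heads may only use variables of the mapping body (no existential variables), mapping bodies contain no constants, and the ⊤ atom is added only for distinguished variables that the chase did not reach, which is equivalent to adding it for all of them. All function names and the encoding of queries are assumptions of the sketch.

```python
def match(body, facts, binding=None):
    """Yield every assignment of the body variables mapping all body atoms into facts."""
    binding = binding or {}
    if not body:
        yield dict(binding)
        return
    (rel, args), rest = body[0], body[1:]
    for frel, fargs in facts:
        if frel != rel or len(fargs) != len(args):
            continue
        new = dict(binding)
        if all(new.setdefault(a, f) == f for a, f in zip(args, fargs)):
            yield from match(rest, facts, new)


def abstraction_of_cq(head_vars, body, mapping):
    """Chase the frozen body of one CQ with the mapping and read it back as a CQ."""
    facts = list(body)  # frozen query: its variables act as labeled nulls
    atoms = []
    for m_body, m_head in mapping:
        for b in match(m_body, facts):
            for rel, args in m_head:
                atom = (rel, tuple(b[a] for a in args))
                if atom not in atoms:
                    atoms.append(atom)
    used = {t for _, args in atoms for t in args}
    # ⊤ binds the distinguished variables that the chase did not reach.
    atoms += [("⊤", (v,)) for v in head_vars if v not in used]
    return (tuple(head_vars), atoms)


def find_minimally_complete_abstraction(ucq, mapping):
    """Apply the construction to every CQ of the input UCQ."""
    return [abstraction_of_cq(h, b, mapping) for h, b in ucq]


# Mapping with two assertions: Tab1(x, y) ⇝ Person(x), livesIn(x, y)
# and Tab2(x, y) ⇝ Person(x), worksIn(x, y).
mapping = [
    ([("Tab1", ("x", "y"))], [("Person", ("x",)), ("livesIn", ("x", "y"))]),
    ([("Tab2", ("x", "y"))], [("Person", ("x",)), ("worksIn", ("x", "y"))]),
]
# qS(x) ← Tab1(x, y), Tab2('sara', y); constants are written with quotes.
q_s = (("x",), [("Tab1", ("x", "y")), ("Tab2", ("'sara'", "y"))])
result = find_minimally_complete_abstraction([q_s], mapping)
```

Note that the ontology plays no role in the construction itself, in line with the informal description above: the output is obtained by applying the mapping only.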


UCQ-minimally complete abstraction: properties

Let J = ⟨O, M, S⟩ be a DL-Lite_R OBDM specification, and let qS be a CQ over S.

FindMinimallyCompleteAbstraction(J, qS) terminates and runs in:

(i) PTime in the size of qS;
(ii) PTime in the size of O;
(iii) ExpTime in the size of M.

The query returned by FindMinimallyCompleteAbstraction(J, qS) is a UCQ-minimally complete J-abstraction of qS.

The UCQ-minimally complete J-abstraction of qS is unique up to logical equivalence, and it can be expressed as a CQ if qS is a CQ.

A perfect J-abstraction of qS exists in the class of positive queries if and only if the query qO returned by FindMinimallyCompleteAbstraction(J, qS) is a sound J-abstraction of qS, in which case qO is the perfect J-abstraction of qS.

UCQ-maximally sound abstraction

Computing UCQ-maximally sound J-abstractions is more involved.

Theorem

The verification problem for sound abstractions is $\Pi^p_2$-complete.

Theorem

UCQ-maximally sound J-abstractions of a query qS may not exist if the OBDM specification J has one of the following features:

1 disjointness axioms in the ontology;

2 inclusion axioms with ∃R as right-hand side in the ontology;

3 LAV mapping assertions, even without joins involving existential variables in the right-hand side;

4 non-pure GAV mapping assertions;

5 qS has joins on existential variables.

We have devised an algorithm for computing UCQ-maximally sound J-abstractions in the restricted setting with none of the above features.

UCQ-maximally sound abstraction with non-pure GAV mappings

A GAV mapping is pure if, for each mapping assertion, the variables in the right-hand-side query are pairwise distinct. Consider the following OBDM specification J = ⟨O, M, S⟩ with a non-pure GAV mapping:

O = ∅, S = { s1, s2 }, M = { s1(x1, x2) → R(x1, x2), s2(x) → R(x, x) },

and the following source query qS = {(x1, x2) | s1(x1, x2)}.

q′O = {(x1, x2) | R(x1, x2)} is the UCQ-minimally complete J-abstraction of qS, but is not sound. Intuitively, the infinite query

$q_O = \bigcup_{a, b \,\in\, \mathit{const},\ a \neq b} \{(a, b) \mid R(a, b)\}$

is the maximally sound J-abstraction of qS in the class of positive queries, but is not a UCQ → no UCQ-maximally sound J-abstraction of qS exists.
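The failure of soundness in this example can be checked on a concrete database. The Python sketch below hard-codes the two mapping assertions; since the ontology is empty and the mapping is GAV, the certain answers of R(x1, x2) are taken to be exactly the facts produced by the mapping. The databases d1 and d2 and the function names are made up for the illustration.

```python
def chase(db):
    """Apply M = { s1(x1, x2) → R(x1, x2),  s2(x) → R(x, x) }."""
    return {(a, b) for (a, b) in db["s1"]} | {(a, a) for (a,) in db["s2"]}


def q_s(db):
    """qS = {(x1, x2) | s1(x1, x2)}."""
    return set(db["s1"])


def cert_r(db):
    """Certain answers of {(x1, x2) | R(x1, x2)}: the chased R facts."""
    return chase(db)


def cert_r_distinct(db):
    """Certain answers of the infinite union: R(a, b) restricted to a ≠ b."""
    return {(a, b) for (a, b) in chase(db) if a != b}


d1 = {"s1": set(), "s2": {("c",)}}      # only s2 is populated
d2 = {"s1": {("c", "c")}, "s2": set()}  # s1 holds a pair with equal components
```

On d1 the query R(x1, x2) returns (c, c) although qS is empty, so it is not sound; the version restricted to distinct constants stays inside qS on both databases, at the price of missing (c, c) on d2.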


5 Conclusion

Future work on query abstraction

A notable challenge: can we obtain better sound abstractions if we express abstractions with languages going beyond UCQ?

[Figure: ontology with the concepts Animal, Person, Employee, Student and University, linked by mappings to the source tables T_REG (with the selection JOB = ‘e’) and T_STUDENT]

Source query q′S: select ID as x from T_REG

UCQ-maximally sound J-abstraction of q′S: Employee(x)

a better sound J-abstraction of q′S: Person(x), ¬K Student(x), corresponding to the non-monotonic query in EQL-lite asking for all persons that are not known to be students.

Future work on query abstraction

More challenges:

The existence problem.

Investigate ontology query languages beyond UCQ (such as EQL-lite)

Single out more restricted settings guaranteeing existence of UCQ-maximally sound abstractions.

Going beyond DL-Lite_R.

We have assumed that the query language used to express qS is the language of CQs. This is too limited: real applications often require aggregation, negation, universal quantification, etc. Can we compute abstractions of queries with such operators?

Study specific applications of abstraction, and try to specialize the corresponding techniques.

Investigate the possible use of query abstraction in the explanation of classifiers

Research themes in HOPE, a national MIUR-PRIN project with Sapienza University of Rome, Free University of Bozen/Bolzano, Polytechnic University of Milan, University of Milan, and University of Cagliari.

Thanks

Joint work with(*)

Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Antonella Poggi, Riccardo Rosati, and many others (query answering)

Gianluca Cima, Antonella Poggi (query abstraction)

Thank you for your attention!

(*) My heart is not weary, it’s light and it’s free; I’ve got nothin’ but affection for all those who’ve sailed with me – Bob Dylan