AMW'11 dependencies-sem index-t-mappings

DESCRIPTION

First talk where I introduce the semantic index technique for query answering with inferences, the T-mappings technique (a mapping transformation/optimisation technique to avoid exponential blow-ups during query rewriting), and the role of dependencies in query answering by query rewriting.

Dependencies: Making Ontology Based Data Access Work in Practice

Mariano Rodriguez-Muro and Diego Calvanese
{rodriguez,calvanese}@inf.unibz.it

KRDB Research Centre, Free University of Bozen-Bolzano

May 11, 2011

Rodriguez-Muro and Calvanese (UNIBZ) Dependencies and OBDA May 11, 2011 1 / 33

The context

DL Ontologies

Description Logics:

• Formalisms for knowledge representation.

• Decidable fragments of FOL

• Base of OWL

• World is described by means of Concepts and Roles

Ontologies

• Intensional knowledge: TBox T.

• Extensional knowledge: ABox A.

OBDA with DL-Lite

A family of light-weight ontology languages

• DL-Lite_F concepts: B ::= A | ∃R

• DL-Lite_F roles: R ::= P | P⁻

• DL-Lite_F TBoxes: B ⊑ B | B ⊑ ¬B | (funct R)

• DL-Lite_F ABoxes: A(a) | R(a, b)

Query Answering

TBox:

Man ⊑ Person, Woman ⊑ Person, Person ⊑ ∃hasFather, ∃hasFather⁻ ⊑ Person

ABox: Man(mariano)

Query: q(x) ← Person(x), hasFather(x, y), Person(y)

Problem: Compute the certain answers of Q, denoted cert(Q,O).

The promise

We can do this as efficiently as answering DB queries, also in the virtual setting.

Query Answering with PerfectRef (2005)

Query: q(x) ← Person(x), hasFather(x, y), Person(y)

Reformulation:

q(x) ← Person(x), hasFather(x, y), Person(y)
q(x) ← Person(x), hasFather(x, y), hasFather(z, y)
q(x) ← Person(x), hasFather(x, y)
q(x) ← Person(x), Person(x)
q(x) ← Person(x)
q(x) ← Person(x), hasFather(x, y), Man(y)
q(x) ← Person(x), hasFather(x, y), Woman(y)
q(x) ← hasFather(x, m), hasFather(x, y), Person(y)
q(x) ← hasFather(x, m), hasFather(x, y), hasFather(z, y)
q(x) ← hasFather(x, m), hasFather(x, y)
q(x) ← hasFather(x, m), Person(x)
q(x) ← hasFather(x, m), hasFather(x, t)
q(x) ← hasFather(x, m)
q(x) ← hasFather(x, m), hasFather(x, y), Man(y)
q(x) ← hasFather(x, m), hasFather(x, y), Woman(y)
q(x) ← Man(x), hasFather(x, y), Person(y)
q(x) ← Man(x), hasFather(x, y), hasFather(y, z)
q(x) ← Man(x), hasFather(x, y), Man(y)
q(x) ← Man(x), hasFather(x, y), Woman(y)
q(x) ← Woman(x), hasFather(x, y), Person(y)
q(x) ← Woman(x), hasFather(x, y), hasFather(y, z)
q(x) ← Woman(x), hasFather(x, y), Man(y)
q(x) ← Woman(x), hasFather(x, y), Woman(y)
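The building block of such a reformulation is rewriting a single atom with a single TBox inclusion. The following is a minimal Python sketch of that one step only, under stated assumptions: the tuple encoding of basic concepts, the `_` marker for a variable that occurs once, and the names `rewrite_atom`, `concept_of` and `atom_for` are illustrative and do not come from the talk. The full PerfectRef algorithm also unifies atoms (the reduce step), which this sketch leaves out.

```python
# Hypothetical encoding of basic concepts as (kind, name):
#   ("A", "Person")              atomic concept Person
#   ("exists", "hasFather")      the concept ∃hasFather
#   ("exists_inv", "hasFather")  the concept ∃hasFather⁻
TBOX = [
    (("A", "Man"), ("A", "Person")),
    (("A", "Woman"), ("A", "Person")),
    (("A", "Person"), ("exists", "hasFather")),
    (("exists_inv", "hasFather"), ("A", "Person")),
]

UNBOUND = "_"  # stands for a variable that occurs only once in the query


def concept_of(atom):
    """Return (basic concept, variable) constrained by the atom, or None."""
    pred, args = atom
    if len(args) == 1:
        return ("A", pred), args[0]
    first, second = args
    if second == UNBOUND and first != UNBOUND:
        return ("exists", pred), first
    if first == UNBOUND and second != UNBOUND:
        return ("exists_inv", pred), second
    return None  # other binding patterns are not handled in this sketch


def atom_for(concept, var):
    """Build the atom that states 'var is an instance of concept'."""
    kind, name = concept
    if kind == "A":
        return (name, (var,))
    if kind == "exists":
        return (name, (var, UNBOUND))
    return (name, (UNBOUND, var))


def rewrite_atom(atom, tbox):
    """All atoms obtained by applying one inclusion B1 ⊑ B2 backwards."""
    target = concept_of(atom)
    if target is None:
        return []
    concept, var = target
    return [atom_for(sub, var) for sub, sup in tbox if sup == concept]
```

With the TBox of the running example, `rewrite_atom(("Person", ("x",)), TBOX)` produces the atoms Man(x), Woman(x) and hasFather(_, x). Applying this step to every atom of every query generated so far is what makes the union grow so quickly.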

Alternatives

• Improved version of PerfectRef (2007-2011)

• RQR (Urbina et al., 2007)

Too many unions, cannot execute!

• PRESTO (Rosati et al., 2010)

Better, but eventually it breaks.

• Combined Approach (Kontchakov et al., 2010)

Fast, but too much data and too much time.

What can we do?

?

Query Answering: It is not only about existential constants

Query: q(x, y) ← Person(x), hasFather(x, y), Person(y)

Reformulation:

q(x, y) ← Person(x), hasFather(x, y), Person(y)
q(x, y) ← Person(x), hasFather(x, y), hasFather(z, y)
q(x, y) ← Person(x), hasFather(x, y), Man(y)
q(x, y) ← Person(x), hasFather(x, y), Woman(y)
q(x, y) ← hasFather(x, m), hasFather(x, y), Person(y)
q(x, y) ← hasFather(x, m), hasFather(x, y), hasFather(z, y)
q(x, y) ← hasFather(x, m), hasFather(x, y), Man(y)
q(x, y) ← hasFather(x, m), hasFather(x, y), Woman(y)
q(x, y) ← Man(x), hasFather(x, y), Person(y)
q(x, y) ← Man(x), hasFather(x, y), hasFather(z, y)
q(x, y) ← Man(x), hasFather(x, y), Man(y)
q(x, y) ← Man(x), hasFather(x, y), Woman(y)
q(x, y) ← Woman(x), hasFather(x, y), Person(y)
q(x, y) ← Woman(x), hasFather(x, y), hasFather(z, y)
q(x, y) ← Woman(x), hasFather(x, y), Man(y)
q(x, y) ← Woman(x), hasFather(x, y), Woman(y)

The full picture: Ontology Based Data Access

[Figure: OBDA architecture with users, queries, ontology, mappings, and sources]

To deal with OBDA we need to consider:

• If in the backend we have RDBMSs, we cannot go beyond their capabilities.

• All systems are composed of T, D = 〈R, I〉, and M.

First Observation: Is my data complete?

Completeness of A

The TBox says: Manager ⊑ Employee

In the ABox: all Managers are already Employees.

In any realistic scenario:

• We don’t use arbitrary sources;

• Intersection of semantics is reflected in completeness (e.g., no need to chase, expand or rewrite)

• This happens a lot!

Keyword

Redundancy

Second Observation: There are no ABoxes

THERE ARE NO ABOXES!

Any ontology-based query answering system today:

• Uses relational DBs to store the ABox data;

• In such a D, both R and I can be manipulated;

• Implementors may choose any M for their system.

Opportunity

To complete an ABox we can do more than expansion.

How to approach the problem: Two-level approach

How to approach OBDA in practice?

• Efficient ways to deal with redundancy due to completeness.

• Efficient ways to complete (virtual) ABoxes.

Contributions: Dealing with redundancy

Characterizing completeness

ABox Dependencies

Definition

An assertion B ⊑_A B that restricts valid ABoxes.

Syntax: B1 ⊑_A B2

Semantics: A |= Manager ⊑_A Employee if Manager(x) ∈ A implies Employee(x) ∈ A.

ABox dependencies are fundamentally different from TBox assertions. Think open world.
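Because a dependency talks only about the facts explicitly present in the ABox, checking it is a plain set-containment test. Below is a minimal Python sketch of that check, assuming a hypothetical encoding that is not from the talk: ABox facts are tuples `(concept, a)` or `(role, a, b)`, and basic concepts are `("A", name)`, `("exists", role)` or `("exists_inv", role)`.

```python
def members(abox, concept):
    """Individuals that the ABox explicitly asserts to be in the basic concept."""
    kind, name = concept
    if kind == "A":
        return {fact[1] for fact in abox if len(fact) == 2 and fact[0] == name}
    if kind == "exists":
        return {fact[1] for fact in abox if len(fact) == 3 and fact[0] == name}
    if kind == "exists_inv":
        return {fact[2] for fact in abox if len(fact) == 3 and fact[0] == name}
    raise ValueError(f"unknown concept kind: {kind}")


def satisfies(abox, dependency):
    """True if the ABox satisfies the dependency sub ⊑_A sup."""
    sub, sup = dependency
    return members(abox, sub) <= members(abox, sup)


# Illustrative data only; the individuals are made up.
COMPLETE = {("Manager", "ann"), ("Employee", "ann"), ("Employee", "bob")}
INCOMPLETE = {("Manager", "ann"), ("Employee", "bob")}
MANAGER_DEP = (("A", "Manager"), ("A", "Employee"))
```

Here `satisfies(COMPLETE, MANAGER_DEP)` holds, while `satisfies(INCOMPLETE, MANAGER_DEP)` does not, even though the TBox assertion Manager ⊑ Employee would still let a reasoner infer Employee(ann) in the second case. That gap is the difference between a dependency and a TBox assertion.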

Where to deal with redundancy?

Given a TBox T, an ABox A, a set of dependencies Σ and a query Q, what do we do?

Available Options:

• Optimize the query reformulation algorithm to deal with Σ.

• Optimize the TBox T with respect to Σ.

When is an assertion redundant?

Direct Redundancy: Case 1

Let T imply the following hierarchy:

[Figure: hierarchy over ∃hasFather, Person, Human]

Redundant if Σ is:

[Figure: dependencies over ∃hasFather, Person, Human]

Σ says: hasFather(mariano, ramon) ∈ A → Human(mariano) ∈ A.

When is an assertion redundant?

Direct Redundancy: Case 2

Let T be the following TBox:

[Figure: hierarchy over Person, ∃hasFather⁻, ∃hasFather, Man]

Redundant if Σ is:

[Figure: dependencies over Person, ∃hasFather⁻, ∃hasFather, Man]

Σ says: Man(ramon) ∈ A → ∃a′ such that hasFather(ramon, a′) ∈ A and Person(a′) ∈ A.

When is an assertion redundant? Indirect Redundancy

Let T be the following TBox:

[Figure: hierarchy over Animal, Man, Human]

Redundant if Σ is:

[Figure: dependencies over Animal, Man, Human]

Σ says: if Man(mariano) ∈ A then Animal(mariano) ∈ A.

Formalization: Redundancy

Given a TBox T and a set of dependencies Σ over T, the optimized version of T w.r.t. Σ, denoted optim(T, Σ), is the set of inclusion assertions

{α ∈ sat(T ) | α is not redundant in sat(T ) w.r.t. sat(Σ)}

We can compute optim(T ,Σ) in linear time.
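To make the shape of this definition concrete, here is a deliberately simplified Python sketch. It is an assumption-laden illustration, not the procedure from the talk: basic concepts are opaque strings, `sat` is approximated by the transitive closure of positive inclusions, and an inclusion counts as redundant only when the saturated dependencies contain the very same pair. The talk's notion of redundancy is broader, and this naive closure is not linear time.

```python
def sat(inclusions):
    """Transitive closure of a set of (sub, sup) pairs (simplified saturation)."""
    closure = set(inclusions)
    while True:
        derived = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c and a != d
        }
        if derived <= closure:
            return closure
        closure |= derived


def optim_simplified(tbox, sigma):
    """Keep the saturated inclusions that no saturated dependency already covers."""
    return sat(tbox) - sat(sigma)


# Example in the spirit of 'Direct Redundancy: Case 1'; the exact Σ is assumed.
TBOX = {("∃hasFather", "Person"), ("Person", "Human")}
SIGMA = {("∃hasFather", "Human")}
OPTIMIZED = optim_simplified(TBOX, SIGMA)
```

In this example the saturated TBox contains ∃hasFather ⊑ Human, and the sketch drops it because Σ already guarantees that every individual with a father is explicitly asserted to be Human.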

Contributions: Completing ABoxes

General considerations

OBDA systems have no ABoxes; instead they have virtual ABoxes V = 〈D, M〉 with D = 〈R, I〉.

If we want V |= A ⊑_A B, we make sure that the mappings for B include all the data coming from the mappings of A.

Trade-off:

• Degree of completeness (# of dependencies),

• Cost of the procedure,

• Performance of query answering.

We can complete virtual ABoxes up to B ⊑ ∃R without the need for new data.
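The idea of making the mappings for B cover everything mapped to A can be sketched as follows. This is a minimal Python illustration under assumptions, not the authors' implementation: a mapping is modelled as a set of SQL strings per concept name, only inclusions between atomic concepts are handled, and the table names `managers` and `staff` are hypothetical.

```python
def transitive_closure(pairs):
    """All (sub, sup) pairs implied by chaining the given inclusions."""
    closure = set(pairs)
    while True:
        derived = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c and a != d
        }
        if derived <= closure:
            return closure
        closure |= derived


def complete_mappings(mappings, inclusions):
    """For every implied sub ⊑ sup, add the source queries of sub to sup."""
    completed = {concept: set(queries) for concept, queries in mappings.items()}
    for sub, sup in transitive_closure(inclusions):
        completed.setdefault(sup, set()).update(mappings.get(sub, set()))
    return completed


# Hypothetical source queries for the Manager/Employee example.
MAPPINGS = {
    "Manager": {"SELECT id FROM managers"},
    "Employee": {"SELECT id FROM staff"},
}
COMPLETED = complete_mappings(MAPPINGS, {("Manager", "Employee")})
```

After completion, the mapping for Employee is the union of both source queries, so the virtual ABox satisfies Manager ⊑_A Employee without copying or expanding any data.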

Semantic Index for OBDA

General Idea

• To encode the semantics of T in numeric indexes and ranges for concept names and roles.

• Store the ABox in the database using those indexes and ranges.

• Make mappings for the system that take the ranges into account.

We can do this by using the implied hierarchy of T to generate the index and ranges!
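One way to derive indexes and ranges from a hierarchy is a depth-first numbering followed by interval merging. The Python sketch below is an illustration under assumptions rather than the talk's exact procedure: the hierarchy is given as a `children` dictionary plus an ordered list of `roots`, and the function names are made up. The small TBox {B ⊑ A, C ⊑ A, C ⊑ D} serves as input.

```python
def reachable(children, node):
    """The node itself plus all of its descendants in the hierarchy."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def to_intervals(sorted_indexes):
    """Merge consecutive integers into (low, high) intervals."""
    intervals = []
    for i in sorted_indexes:
        if intervals and i == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], i)
        else:
            intervals.append((i, i))
    return intervals


def semantic_index(children, roots):
    """Number nodes in depth-first order and compute the ranges of each node."""
    index = {}

    def visit(node):
        if node in index:
            return  # already numbered via another parent
        index[node] = len(index) + 1
        for child in children.get(node, []):
            visit(child)

    for root in roots:
        visit(root)

    ranges = {
        node: to_intervals(sorted(index[n] for n in reachable(children, node)))
        for node in index
    }
    return index, ranges


# Hierarchy of T = {B ⊑ A, C ⊑ A, C ⊑ D}
CHILDREN = {"A": ["B", "C"], "D": ["C"]}
INDEX, RANGES = semantic_index(CHILDREN, ["A", "D"])
```

A concept with several parents, such as C here, is numbered only once, which is why a node may in general need more than one interval. In this small hierarchy every node ends up with a single interval.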

Semantic Index Example

T = {B ⊑ A, C ⊑ A, C ⊑ D}

[Figure: hierarchy of T; each node annotated with its index and ranges]

Node  Index  Ranges
A     1      {(1, 3)}
B     2      {(2, 2)}
C     3      {(3, 3)}
D     4      {(3, 4)}

We create a table TC with constant and idx columns. To insert the data we use the indexes, e.g., if B(mariano) ∈ A then we put (mariano, 2) into TC.

We create the mappings using the ranges, e.g., SELECT constant FROM TC WHERE idx >= 1 AND idx <= 3 ⇝ A(constant)
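The storage and mapping steps can be tried end to end with an in-memory SQLite database. The sketch below hard-codes the indexes and ranges of this example; the helper names, the use of SQLite, and the individuals `diego` and `ramon` are assumptions added for illustration.

```python
import sqlite3

# Indexes and ranges of the example hierarchy T = {B ⊑ A, C ⊑ A, C ⊑ D}
INDEX = {"A": 1, "B": 2, "C": 3, "D": 4}
RANGES = {"A": [(1, 3)], "B": [(2, 2)], "C": [(3, 3)], "D": [(3, 4)]}


def load_abox(conn, assertions):
    """Store each assertion concept(constant) as a row (constant, idx) in TC."""
    conn.execute("CREATE TABLE TC (constant TEXT, idx INTEGER)")
    conn.executemany(
        "INSERT INTO TC VALUES (?, ?)",
        [(constant, INDEX[concept]) for concept, constant in assertions],
    )


def mapping_sql(concept):
    """SQL source query of the mapping for a concept, built from its ranges."""
    conditions = " OR ".join(
        f"(idx >= {low} AND idx <= {high})" for low, high in RANGES[concept]
    )
    return f"SELECT DISTINCT constant FROM TC WHERE {conditions}"


def instances(conn, concept):
    """Sorted constants that the mapping returns for the concept."""
    return sorted(row[0] for row in conn.execute(mapping_sql(concept)))


CONN = sqlite3.connect(":memory:")
load_abox(CONN, [("B", "mariano"), ("C", "diego"), ("D", "ramon")])
```

Only three rows are stored, yet `instances(CONN, "A")` also returns the constants asserted for B and C, because the range of A covers their indexes. No expanded facts are ever materialized.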

Experimentation I

The Resource Index features:

• Search over 22 document collections

• Semantics given by the hierarchies of 200 ontologies (SNOMED, GO)

Implementation in a nutshell:

(i) Understand documents with natural language processing and annotate, e.g.,

Cervical Cancer('doc224')

(ii) Expand the ABox

(iii) Pose queries that retrieve documents as

q(x)← A1(x) ∧ · · · ∧ An(x)

Experimentation II

The challenge:

• ≈ 3 million concepts and ≈ 2.5 million is-a assertions

• Split second responses

• 150 GB of data

• Expansion data: 1.5 TB

The experimentation data:

• ClinicalTrials.gov (CT)

• 181 million assertions (≈ 14 GB of data, ≈ 140 GB when expanded)

Results

The query:

q(x) ← DNA Repair Gene(x) ∧ Antigen Gene(x) ∧ Cancer Gene(x)

Results:

• Traditional reformulation: union of 467874 SQL SPJ queries.

• Semantic Index: 1 SQL query; execution time 3.582s (0.082s if warm); time to compute the semantic index: 1 min; size of data: +≈ 4 GB.

• ABox expansion: 1 SQL query; execution time 3s (0.6s if warm); expansion time ≈ 7 days; size of data: +≈ 126 GB.

The Query

The query:

q(x) ← DNA Repair Gene(x) ∧ Antigen Gene(x) ∧ Cancer Gene(x)

SELECT DISTINCT r0.element_id AS element_id
FROM RESOURCE_INDEX.CT_ANN r0
JOIN RESOURCE_INDEX.CT_ANN r1 ON r0.element_id = r1.element_id
JOIN RESOURCE_INDEX.CT_ANN r2 ON r1.element_id = r2.element_id
WHERE (r0.idx >= 1783559 AND r0.idx <= 1783657)
  AND (r1.idx >= 1782996 AND r1.idx <= 1783029)
  AND (r2.idx >= 1783115 AND r2.idx <= 1783253);

Standard SQL query efficient in ANY DBMS.
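
The shape of this query follows mechanically from the atoms of q(x): one alias of the annotation table per atom, joined on element_id, each with its own range test. The sketch below rebuilds the query from the three intervals and also counts how many index values each interval covers. The function name conjunctive_query_sql is hypothetical, and the count assumes that every index value inside an interval stands for exactly one concept.

```python
from math import prod

# Index intervals copied from the WHERE clause of the query above,
# one per atom of q(x).
INTERVALS = [(1783559, 1783657), (1782996, 1783029), (1783115, 1783253)]

def conjunctive_query_sql(table, intervals):
    """Build one self-join over `table` with one alias and one range test per atom."""
    joins = [f"{table} r0"]
    for i in range(1, len(intervals)):
        joins.append(f"JOIN {table} r{i} ON r{i - 1}.element_id = r{i}.element_id")
    conditions = [
        f"(r{i}.idx >= {low} AND r{i}.idx <= {high})"
        for i, (low, high) in enumerate(intervals)
    ]
    return (
        "SELECT DISTINCT r0.element_id AS element_id\n"
        "FROM " + "\n".join(joins) + "\n"
        "WHERE " + "\n  AND ".join(conditions) + ";"
    )

# Number of index values covered by each interval (bounds are inclusive).
widths = [high - low + 1 for low, high in INTERVALS]
combinations = prod(widths)

print(conjunctive_query_sql("RESOURCE_INDEX.CT_ANN", INTERVALS))
print(widths, combinations)
```

The intervals cover 99, 34 and 139 index values, and 99 × 34 × 139 = 467874, which coincides with the size of the union reported for traditional reformulation. Under the assumption above, this is consistent with one SPJ query per combination of subsumed concepts.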

Conclusions

Contributions

• We indicated that efficient OBDA requires taking into account more than just T, A and Q.

• We provided means to deal with redundancy at the level of the TBox.

• We showed that expansion is not necessary in order to complete ABoxes.

• We presented two efficient ways to complete ABoxes, one for the general OBDA setting and one for the virtual setting.

Future work

• Exploring more expressive languages.

• Exploring the RDFS/SPARQL setting.

• Handling updates of T and A.

Extra examples

First Observation (cont.)

Mappings will introduce dependencies over ABoxes

Let R be a DB schema with the relation schema employee with attributes id, dept, and salary. Let M be the following mappings:

SELECT id, dept FROM employee ⇝ q(id, dept) ← Employee(id) ∧ WORKS-FOR(id, dept)

SELECT id, dept FROM employee WHERE salary > 1000 ⇝ q(id, dept) ← Manager(id) ∧ MANAGES(id, dept)

Then for any instance I, if Manager(John) ∈ A we have that Employee(John) ∈ A.

This is an indicator of completeness of all ABoxes A for M and R, e.g., A is complete w.r.t. Manager ⊑_A Employee.
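
The effect of the two mappings can be observed on a small instance. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module, the rows and the helper virtual_instances are hypothetical (only the name John comes from the slide), and it looks only at the concept atoms Employee and Manager, not at the roles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id TEXT, dept TEXT, salary INTEGER)")
# Hypothetical rows; only the name John comes from the slide.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("John", "sales", 2500), ("Mary", "sales", 900), ("Ann", "it", 1800)],
)

# Source queries of the two mappings, keyed by the concept they populate.
MAPPINGS = {
    "Employee": "SELECT id, dept FROM employee",
    "Manager": "SELECT id, dept FROM employee WHERE salary > 1000",
}

def virtual_instances(concept):
    """Individuals that the mapping for `concept` puts into the virtual ABox."""
    return {row[0] for row in conn.execute(MAPPINGS[concept])}

managers = virtual_instances("Manager")
employees = virtual_instances("Employee")
print(sorted(managers), sorted(employees))
print(managers <= employees)  # every Manager is also an Employee here
```

Checking one instance proves nothing in general. The dependency holds for every instance because the source query of the Manager mapping is the source query of the Employee mapping with an extra filter, so its answers are always a subset.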

Formalization: Chains

Let T be a TBox, B, C basic concepts, and Σ a set of dependencies over T. A T-chain from B to C in T (resp., a Σ-chain from B to C in Σ) is a sequence of concept inclusion assertions (B_i ⊑ B'_i)_{i=0}^{n} in T (resp., a sequence of inclusion dependencies (B_i ⊑_A B'_i)_{i=0}^{n} in Σ), for some n ≥ 0, such that:

1. B_0 = B, B'_n = C, and

2. for 1 ≤ i ≤ n, we have that B'_{i−1} and B_i are basic concepts s.t. either

(i) B'_{i−1} = B_i, or

(ii) B'_{i−1} = ∃R and B_i = ∃R−, for some basic role R.
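
The two conditions translate directly into a check over a list of inclusions. The sketch below is one possible encoding, not the authors' implementation: inclusions are (left, right) pairs, an existential concept ∃R is the tuple ("exists", "R"), an inverse role is written with a trailing "-", and the TBox about managers and departments is hypothetical.

```python
def inverse(role):
    """Inverse of a basic role: P <-> P-."""
    return role[:-1] if role.endswith("-") else role + "-"

def links(prev_rhs, next_lhs):
    """Condition 2: equal concepts, or an ∃R followed by ∃R-."""
    if prev_rhs == next_lhs:
        return True
    return (
        isinstance(prev_rhs, tuple)
        and isinstance(next_lhs, tuple)
        and next_lhs[1] == inverse(prev_rhs[1])
    )

def is_chain(inclusions, start, end, axioms):
    """Check that `inclusions` is a chain from `start` to `end` drawn from `axioms`."""
    # A chain has at least one inclusion (n >= 0), all taken from the given set.
    if not inclusions or any(inc not in axioms for inc in inclusions):
        return False
    # Condition 1: it starts at B and ends at C.
    if inclusions[0][0] != start or inclusions[-1][1] != end:
        return False
    # Condition 2: consecutive inclusions must link up.
    return all(
        links(inclusions[i - 1][1], inclusions[i][0])
        for i in range(1, len(inclusions))
    )

# Hypothetical TBox, written as (left, right) pairs for left ⊑ right.
TBOX = {
    ("Manager", ("exists", "MANAGES")),
    (("exists", "MANAGES-"), "Department"),
    ("Department", "Unit"),
}
chain = [
    ("Manager", ("exists", "MANAGES")),
    (("exists", "MANAGES-"), "Department"),
    ("Department", "Unit"),
]
print(is_chain(chain, "Manager", "Unit", TBOX))
print(is_chain([chain[0], chain[2]], "Manager", "Unit", TBOX))
```

The same function can check a Σ-chain by passing the inclusion dependencies of Σ as the axioms argument.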

Formalization: Redundancy

Let T be a TBox, B, C basic concepts, and Σ a set of dependencies. The concept inclusion assertion B ⊑ C is directly redundant in T w.r.t. Σ if

(i) Σ ⊨ B ⊑_A C and

(ii) for every T-chain (B_i ⊑ B'_i)_{i=0}^{n} with B'_n = B in T, there is a Σ-chain (B_i ⊑_A B'_i)_{i=0}^{n}.

Then, B ⊑ C is redundant in T w.r.t. Σ if

(a) it is directly redundant, or

(b) there exists B' ≠ B s.t.

(i) T ⊨ B' ⊑ C,

(ii) B' ⊑ C is not redundant in T w.r.t. Σ, and

(iii) B ⊑ B' is directly redundant in T w.r.t. Σ.
