Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
KR in Database Systems Implementation
David Toman
Cheriton School of CS
Joint work with Alexander Hudek and Grant Weddell
David Toman KR in DBMS Implementation 1 / 40
Data and Constraints
The Textbook View of Relational Databases
Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)
⇒ the instance is a model of the constraints
What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ
where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)
What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)
. . . what is then the difference between the primary index on R and R itself?
David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40
Data and Constraints
The Textbook View of Relational Databases
Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)
⇒ the instance is a model of the constraints
What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ
where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)
What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)
. . . what is then the difference between the primary index on R and R itself?
David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40
Data and Constraints
The Textbook View of Relational Databases
Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)
⇒ the instance is a model of the constraints
What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ
where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)
What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)
. . . what is then the difference between the primary index on R and R itself?
David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40
Data and Constraints
The Textbook View of Relational Databases
Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)
⇒ the instance is a model of the constraints
What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ
where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)
What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)
. . . what is then the difference between the primary index on R and R itself?
David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40
Data and Constraints
The Textbook View of Relational Databases
Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)
⇒ the instance is a model of the constraints
What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ
where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)
What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)
. . . what is then the difference between the primary index on R and R itself?
David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40
Phyical Data Independence
IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.
independent customized user views,changes to conceptual structure withoutaffecting users,physical storage details hidden fromusers,changes to physical storage withoutaffecting logical view,
Originally just two levels: physicaland conceptual/logical [Codd1970].
[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]
David Toman (et al.) KR in DBMS Implementation Motivation 3 / 40
Phyical Data Independence
IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.
independent customized user views,changes to conceptual structure withoutaffecting users,physical storage details hidden fromusers,changes to physical storage withoutaffecting logical view,
Originally just two levels: physicaland conceptual/logical [Codd1970].
[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]
David Toman (et al.) KR in DBMS Implementation Motivation 3 / 40
Phyical Data Independence
IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.
independent customized user views,changes to conceptual structure withoutaffecting users,physical storage details hidden fromusers,changes to physical storage withoutaffecting logical view,
Originally just two levels: physicaland conceptual/logical [Codd1970].
[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]
David Toman (et al.) KR in DBMS Implementation Motivation 3 / 40
Phyical Data Independence
IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.
independent customized user views,changes to conceptual structure withoutaffecting users,physical storage details hidden fromusers,changes to physical storage withoutaffecting logical view,
Originally just two levels: physicaland conceptual/logical [Codd1970].
[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]
David Toman (et al.) KR in DBMS Implementation Motivation 3 / 40
Ontology-based Data Access (OBDA) [Calvanese et al.: Mastro, 2011]
Semantic Web 2 (2011) 43–53 43DOI 10.3233/SW-2011-0029IOS Press
The MASTRO system for ontology-based dataaccessEditor(s): Thomas Lukasiewicz, Oxford University, UKSolicited review(s): Carsten Lutz, Universität Bremen, Germany; Roman Kontchakov, Birkbeck College London, UK; one anonymous reviewer
Diego Calvanese a,*, Giuseppe De Giacomo b, Domenico Lembo b, Maurizio Lenzerini b,Antonella Poggi b, Mariano Rodriguez-Muro a, Riccardo Rosati b, Marco Ruzzi b andDomenico Fabio Savo b
a Free University of Bozen-Bolzano, Piazza Domenicani 3, I-39100, Bolzano, ItalyE-mail: [email protected] Sapienza Universita di Roma, Via Ariosto 25, I-00185, Roma, ItalyE-mail: [email protected]
Abstract. In this paper we present MASTRO, a Java tool for ontology-based data access (OBDA) developed at Sapienza Univer-sità di Roma and at the Free University of Bozen-Bolzano. MASTRO manages OBDA systems in which the ontology is specifiedin DL-LiteA,id , a logic of the DL-Lite family of tractable Description Logics specifically tailored to ontology-based data access,and is connected to external JDBC enabled data management systems through semantic mappings that associate SQL queriesover the external data to the elements of the ontology. Advanced forms of integrity constraints, which turned out to be veryuseful in practical applications, are also enabled over the ontologies. Optimized algorithms for answering expressive queriesare provided, as well as features for intensional reasoning and consistency checking. MASTRO provides a proprietary API, anOWLAPI compatible interface, and a plugin for the Protégé 4 ontology editor. It has been successfully used in several projectscarried out in collaboration with important organizations, on which we briefly comment in this paper.
Keywords: Ontology-based data access, Description Logics, reasoning over ontologies
1. Introduction
In this paper we present MASTRO, a tool forontology-based data access developed at SapienzaUniversità di Roma and at the Free University ofBozen-Bolzano. Ontology-based data access (OBDA)refers to a setting in which an ontology is used as ahigh-level, conceptual view over data repositories, al-lowing users to access data without the need to knowhow they are actually organized and where they arestored (cf. Fig. 1).
The OBDA approach turns out to be very useful inall scenarios in which accessing data in a unified andcoherent way is difficult. This may happen for several
*Corresponding author.
reasons. For example, databases may have undergoneseveral manipulations during the years, often for op-timizing applications using them, and may have lost
Fig. 1. Ontology-based data access.
1570-0844/11/$27.50 c! 2011 – IOS Press and the authors. All rights reserved
Information Integration [Genesereth: Data Integration, 2010]
Data Exchange [Arenas et el.: Data Exchange, 2014]
David Toman (et al.) KR in DBMS Implementation Motivation 4 / 40
Common Threads and Issues
In general two schemas: Conceptual/Logical and Physical
⇒ both endowed with metadata (vocabulary, constraints, . . . )⇒ mappings that connect the two schemas
Rules: (1) queries (updates) over the conceptual/logical schema only,(2) raw data as instance of the physical schema only,(3) conceptual/logical and physical schemata are disjoint.
Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata, mappings, etc.3 User Interface:
Data ModelQuery (and Update) Language (e.g., when an answer is an answer?)
Algorithms/Execution model for user queries and updates
David Toman (et al.) KR in DBMS Implementation Motivation 5 / 40
Common Threads and Issues
In general two schemas: Conceptual/Logical and Physical
⇒ both endowed with metadata (vocabulary, constraints, . . . )⇒ mappings that connect the two schemas
Rules: (1) queries (updates) over the conceptual/logical schema only,(2) raw data as instance of the physical schema only,(3) conceptual/logical and physical schemata are disjoint.
Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata, mappings, etc.3 User Interface:
Data ModelQuery (and Update) Language (e.g., when an answer is an answer?)
Algorithms/Execution model for user queries and updates
David Toman (et al.) KR in DBMS Implementation Motivation 5 / 40
Common Threads and Issues
In general two schemas: Conceptual/Logical and Physical
⇒ both endowed with metadata (vocabulary, constraints, . . . )⇒ mappings that connect the two schemas
Rules: (1) queries (updates) over the conceptual/logical schema only,(2) raw data as instance of the physical schema only,(3) conceptual/logical and physical schemata are disjoint.
Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata, mappings, etc.3 User Interface:
Data ModelQuery (and Update) Language (e.g., when an answer is an answer?)
Algorithms/Execution model for user queries and updates
David Toman (et al.) KR in DBMS Implementation Motivation 5 / 40
What to do?
Definability and RewritingQueries range-restricted FOL (a.k.a. SQL)Ontology/Schema range-restricted FOLData CWA (complete information)
ΣL SL ϕoo (Logical Schema)
Query (SL)��
CompilerRelational Algebra (over SA)
��Schema (SL ∪ SP)
OO
Evaluator // Answers
Data (SA ⊂ SP)
OO
David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40
What to do?
Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)
ΣL SL ϕoo (Logical Schema)
ΣLP (rewriting)
��ΣP SA ⊆ SP ψoo (Physical Schema)
Query (SL)��
CompilerRelational Algebra (over SA)
��Schema (SL ∪ SP)
OO
Evaluator // Answers
Data (SA ⊂ SP)
OO
David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40
What to do?
Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)
Query (SL)��
CompilerRelational Algebra (over SA)
��Schema (SL ∪ SP)
OO
Evaluator // Answers
Data (SA ⊂ SP)
OO
David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40
What to do?
Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)
Query (SL)��
CompilerRelational Algebra (over SA)
��Schema (SL ∪ SP)
OO
Evaluator // Answers
Data (SA ⊂ SP)
OO
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
©2011
users: looks like a single model (of the conceptual schema)implementation: many models
but definable queries answer the same in each of them
David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40
What to do?
Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)
Query (SL)��
CompilerRelational Algebra (over SA)
��Schema (SL ∪ SP)
OO
Evaluator // Answers
Data (SA ⊂ SP)
OO
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
©2011
decidable (schema) languages?finite models?
David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40
GRAND UNIFIED APPROACH
TO QUERY COMPILATION
PART I: WHAT CAN IT DO?
David Toman (et al.) KR in DBMS Implementation 7 / 40
What can this do?
GOALGenerate query plans that compete with hand-written programs in C
1 Standard Designs (base files, primary/secondary indexing,horizontal/vertical partitioning, coverage/disjointness)
2 linked data structures, pointers, . . .3 access to search structures (index access and selection),4 hash-based access to data (including hash-joins),5 multi-level storage (aka disk/remote/distributed files), . . .6 materialized views (FO-definable),7 updates through logical schema (needs id invention!), . . .
. . . all without having to code (too much) in C/C++ !
David Toman (et al.) KR in DBMS Implementation What can it do? 8 / 40
Basics: Base File
david$ cat 848ex/base3.fol[% constraintstable(x,y,z) <-> ex(r,p0basetable(r,x,y,z)),% queryq(x,y) <-> ex(z,table(x,x,z) and table(z,y,y))].
David Toman (et al.) KR in DBMS Implementation What can it do? 9 / 40
Basics: Base File (cont.)
david$ cat 848ex/base3.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x116202000
Closing sets:L: { -p0basetable(slc:5,sl8:3,sl0:2,sl0:2)/0 }
{ -p0basetable(slc:4,sl0:1,sl0:1,sl8:3)/0 }R: { +p0basetable(slc:4,sl0:1,sl0:1,sl8:3)/0
+p0basetable(slc:5,sl8:3,sl0:2,sl0:2)/0 }
Success with fragment:+EXISTS[sl8:3.E](
+JOIN(+EXISTS[slc:4.E](+ATOM p0basetable(slc:4,sl0:1,sl0:1,sl8:3)),+EXISTS[slc:5.E](
+ATOM p0basetable(slc:5,sl8:3,sl0:2,sl0:2))))
David Toman (et al.) KR in DBMS Implementation What can it do? 10 / 40
Basics: Vertical Partition
david$ cat 848ex/indexonly.fol[% constraintstable(x,y,z) <-> ex(r,basetable(r,x,y,z)),% indicesp0indexx(r,x) <-> ex([y,z],basetable(r,x,y,z)),p0indexy(r,y) <-> ex([x,z],basetable(r,x,y,z)),p0indexz(r,z) <-> ex([x,y],basetable(r,x,y,z)),% keys(basetable(r,x1,y1,z1) and basetable(r,x2,y2,z2))
->(x1=x2 and y1=y2 and z1=z2),% queryq(x,y,z) <-> table(x,y,z)].
David Toman (et al.) KR in DBMS Implementation What can it do? 11 / 40
Basics: Vertical Partition
david$ cat 848ex/indexonly.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x11304f000
Closing sets:L: { -p0indexx(sl19:4,sl0:1)/0 }
{ -p0indexy(sl19:4,sl0:2)/0 }{ -p0indexz(sl19:4,sl0:3)/0 }
R: { +p0indexz(sl19:4,sl0:3)/0+p0indexy(sl19:4,sl0:2)/0+p0indexx(sl19:4,sl0:1)/0 }
Success with fragment:+EXISTS[sl19:4.E](
+JOIN(+ATOM p0indexx(sl19:4,sl0:1),+JOIN(
+ATOM p0indexy(sl19:4,sl0:2),+ATOM p0indexz(sl19:4,sl0:3))))
David Toman (et al.) KR in DBMS Implementation What can it do? 12 / 40
Basics: Vertical Partition (and Projection)
% queryq(x,y) <-> ex(z,table(x,y,z))].
david$ cat 848ex/indexonly2.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x114117000
Closing sets:L: { -p0indexx(sl1c:4,sl0:1)/0 }
{ -p0indexy(sl1c:4,sl0:2)/0 }{ -p0indexz(sl1c:4,sl11:3)/0 }
R: { +p0indexy(sl1c:4,sl0:2)/0 +p0indexx(sl1c:4,sl0:1)/0 }
Success with fragment:+EXISTS[sl1c:4.E](
+JOIN(+ATOM p0indexx(sl1c:4,sl0:1),+ATOM p0indexy(sl1c:4,sl0:2)))
David Toman (et al.) KR in DBMS Implementation What can it do? 13 / 40
Basics: Horizontal Partition
david$ cat 848ex/horizontal.fol[% constraintstable(x,y,z) <-> ex(r,basetable(r,x,y,z)),% horizontal partitionsp0hpp1(r,x,y,z) -> basetable(r,x,y,z),p0hpp2(r,x,y,z) -> basetable(r,x,y,z),p0hpp3(r,x,y,z) -> basetable(r,x,y,z),basetable(r,x,y,z) -> (p0hpp1(r,x,y,z) or
p0hpp2(r,x,y,z) orp0hpp3(r,x,y,z)),
% keys(basetable(r,x1,y1,z1) and basetable(r,x2,y2,z2))
->(x1=x2 and y1=y2 and z1=z2),% queryq(x,y,z) <-> table(x,y,z)].
David Toman (et al.) KR in DBMS Implementation What can it do? 14 / 40
Basics: Coverage
david$ cat 848ex/horizontal.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x116d93000
Closing sets:L: { -p0hpp3(sl13:4,sl0:1,sl0:2,sl0:3)/0
-p0hpp2(sl13:4,sl0:1,sl0:2,sl0:3)/0-p0hpp1(sl13:4,sl0:1,sl0:2,sl0:3)/0 }
R: { +p0hpp1(sl13:4,sl0:1,sl0:2,sl0:3)/0 }{ +p0hpp3(sl13:4,sl0:1,sl0:2,sl0:3)/0 }{ +p0hpp2(sl13:4,sl0:1,sl0:2,sl0:3)/0 }
Success with fragment:+UNION(+UNION(+EXISTS[sl13:4.E](+ATOM p0hpp2(sl13:4,sl0:1,sl0:2,sl0:3)),+EXISTS[sl13:4.E](+ATOM p0hpp3(sl13:4,sl0:1,sl0:2,sl0:3))),+EXISTS[sl13:4.E](+ATOM p0hpp1(sl13:4,sl0:1,sl0:2,sl0:3)))
David Toman (et al.) KR in DBMS Implementation What can it do? 15 / 40
Basics: Coverage and Disjointnss
david$ cat 848ex/subclass2.fol[% constraintstable(x,y,z) <-> ex(r,basetable(r,x,y,z)),% superclass and complementp0super(r,x,y,z) -> (p0complement(r) or basetable(r,x,y,z)),basetable(r,x,y,z) -> p0super(r,x,y,z),p0complement(r) -> ex([x,y,z],p0super(r,x,y,z)),% disjointnessp0complement(r) and ex([x,y,z],basetable(r,x,y,z)) -> bot,% keys(p0super(r,x1,y1,z1) and p0super(r,x2,y2,z2))->
(x1=x2 and y1=y2 and z1=z2),% queryq(x,y,z) <-> table(x,y,z)].
David Toman (et al.) KR in DBMS Implementation What can it do? 16 / 40
Basics: Coverage (cont.)
david$ cat 848ex/subclass2.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x10d09b000
Closing sets:L: { -p0super(sl12:4,sl0:1,sl0:2,sl0:3)/0 }
{ +p0complement(sl12:4)/0 }R: { -p0complement(sl12:4)/0
+p0super(sl12:4,sl0:1,sl0:2,sl0:3)/0 }
Success with fragment:+EXISTS[sl12:4.E](+JOIN(
+ATOM p0super(sl12:4,sl0:1,sl0:2,sl0:3),-ATOM p0complement(sl12:4)))
David Toman (et al.) KR in DBMS Implementation What can it do? 17 / 40
Lists and Pointers
1 Logical Schemaemployee works department
number oo // enumber number//
name dnumber name
salary manager
oo
2 Physical Design: a linked list of emp records pointing to dept records.record emp of
integer numstring nameinteger salaryreference dept
record dept ofinteger numstring namereference manager
3 Access Paths: empfile/1/0, emp-num/2/1, . . . (but no deptfile)4 Integrity Constraints (many), e.g.,
∀x , y , z.employee(x , y , z)→ ∃w .empfile(w) ∧ emp-num(w , x),∀a, x .empfile(a) ∧ emp-num(a, x)→ ∃y , z.employee(x , y , z), . . .
David Toman (et al.) KR in DBMS Implementation What can it do? 18 / 40
What can this do: navigating pointers
1 List all employee numbers and names (∃z.employee(x , y , z)):
∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)
2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):
.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)
⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.
.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)
⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.
David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40
What can this do: navigating pointers
1 List all employee numbers and names (∃z.employee(x , y , z)):
∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)
or, in C-like syntax: for a in empfile dox := a->num;y := a->name;
2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):
.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)
⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.
.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)
⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.
David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40
What can this do: navigating pointers
1 List all employee numbers and names (∃z.employee(x , y , z)):
∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)
2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):
.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)
⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.
.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)
⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.
David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40
What can this do: navigating pointers
1 List all employee numbers and names (∃z.employee(x , y , z)):
∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)
2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):
∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)
⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.
.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)
⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.
David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40
What can this do: navigating pointers
1 List all employee numbers and names (∃z.employee(x , y , z)):
∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)
2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):
∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)
⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.
∃a,b,d .empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)
⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.
David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40
What can this do: navigating pointers
1 List all employee numbers and names (∃z.employee(x , y , z)):
∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)
2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):
∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)
⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.
∃a,b,d .empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)
⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.
David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40
What can it do: Hashing, Lists, et al.
Hash Index with (list-based) Separate Chaining
... D1
i : • // •
//
• // •
//
⊥
... D3
j : ⊥
...
n : • // •
//
⊥ D2
Hash Array Separate Chaining Linked Lists Dept Records
David Toman (et al.) KR in DBMS Implementation What can it do? 20 / 40
What can it do: Hashing, Lists, et al.
Hash Index on department’s name:Access paths:
SA ⊇ {hash/2/1,hasharraylookup/2/1,listscan/2/1}.Physical Constraints:
ΣLP ⊇ {∀x , y .((deptfile(x) ∧ dept-name(x , y))→ ∃z,w .(hash(y , z)∧ hasharraylookup(z,w) ∧ listscan(w , x))),
∀x , y .(hash(x , y)→ ∃z.hasharraylookup(y , z)),∀x , y .(listscan(x , y)→ deptfile(y)) }
Query:∃y , z.(department(x1,p, y) ∧ employee(y , x2, z)){p}.
E?x6.Hash(p,?x6)^E?x5.Hasharraylookup(?x6,?x5)^E?x4.Listscan(?x5,?x4)^E?s0.Dept-name(?x4,?s0)^Cmp(p,?s0))^Dept-num(?x4,x1)^E?x3.Dept-manager(?x4,?x3)^Emp-name(?x3,x2)
David Toman (et al.) KR in DBMS Implementation What can it do? 21 / 40
What can it do: Hashing, Lists, et al.
Hash Index on department’s name:Access paths:
SA ⊇ {hash/2/1,hasharraylookup/2/1,listscan/2/1}.Physical Constraints:
ΣLP ⊇ {∀x , y .((deptfile(x) ∧ dept-name(x , y))→ ∃z,w .(hash(y , z)∧ hasharraylookup(z,w) ∧ listscan(w , x))),
∀x , y .(hash(x , y)→ ∃z.hasharraylookup(y , z)),∀x , y .(listscan(x , y)→ deptfile(y)) }
Query:∃y , z.(department(x1,p, y) ∧ employee(y , x2, z)){p}.
E?x6.Hash(p,?x6)^E?x5.Hasharraylookup(?x6,?x5)^E?x4.Listscan(?x5,?x4)^E?s0.Dept-name(?x4,?s0)^Cmp(p,?s0))^Dept-num(?x4,x1)^E?x3.Dept-manager(?x4,?x3)^Emp-name(?x3,x2)
David Toman (et al.) KR in DBMS Implementation What can it do? 21 / 40
What can this do: two-level store
The access path empfile is refined by emppages/1/0 and emprecords/2/1:
emppages returns (sequentially) disk pages containing emp records, andemprecords given a disc page, returns emp records in that page.
5 List all employees with the same name(∃z,u, v ,w , t .employee(x1, z,u, v) ∧ employee(x2, z,w , t)):
∃y , z,w , v ,p,q.emppages(p) ∧ emppages(q)∧ emprecords(p, y) ∧ emp-num(y , x1) ∧ emp-name(y ,w)∧ emprecords(q, z) ∧ emp-num(z, x2) ∧ emp-name(z, v)
∧ compare(w , v).
⇒ this plan implements the block nested loops join algorithm.
. . . more examples inMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
.David Toman (et al.) KR in DBMS Implementation What can it do? 22 / 40
What can this do: materialized views
Brain TeaserGiven materialized views
1 V1(x , y)↔ ∃z,u.R(z, x) ∧ R(z,u) ∧ R(u, y)
2 V2(x , y)↔ ∃z.R(x , z) ∧ R(z, y)
3 V3(x , y)↔ ∃z,u.R(x , z) ∧ R(z,u) ∧ R(u, y)
can one rewrite the query
{x , y | ∃z,u, v .R(z, x) ∧ R(z,u) ∧ R(u, v) ∧ R(v , y)}
in terms of V1, V2, ad V3?
David Toman (et al.) KR in DBMS Implementation What can it do? 23 / 40
What can this do: updates
User updates only through logical schema:⇒ supplying “delta” relations (sets of tuples)
Two copies of the schema: Σold and Σnew ;Delta relations: R+ (insertions) and R− (deletions);Constraints: ∀x .(Rold (x) ∨ R+(x)) ≡ (Rnew (x) ∨ R−(x)),
∀x .(R+(x) ∧ R−(x))→ ⊥
ΣoldL Sold
L SnewL
//U+,U−
ΣnewL
ΣoldLP Σnew
LP
��Σold
P SA ⊆ SoldP SA ⊆ Snew
P//
A+,A−Σnew
P
Update turned into definability question
Is Anew (or A+,A−) definable in terms of Aoldi ∈ Sold
A (old access paths)and U+
j , U−j (user updates) for every access path A ∈ SA?
How do we get “anonymous values”?Constant Complement tables (access paths that do malloc and such);
cyclic dependencies between anonymous values broken via reification.
David Toman (et al.) KR in DBMS Implementation What can it do? 24 / 40
What can this do: updates
User updates only through logical schema:⇒ supplying “delta” relations (sets of tuples)
Two copies of the schema: Σold and Σnew ;Delta relations: R+ (insertions) and R− (deletions);Constraints: ∀x .(Rold (x) ∨ R+(x)) ≡ (Rnew (x) ∨ R−(x)),
∀x .(R+(x) ∧ R−(x))→ ⊥
Update turned into definability question
Is Anew (or A+,A−) definable in terms of Aoldi ∈ Sold
A (old access paths)and U+
j , U−j (user updates) for every access path A ∈ SA?
How do we get “anonymous values”?Constant Complement tables (access paths that do malloc and such);
cyclic dependencies between anonymous values broken via reification.
David Toman (et al.) KR in DBMS Implementation What can it do? 24 / 40
What can this do: updates
User updates only through logical schema:⇒ supplying “delta” relations (sets of tuples)
Two copies of the schema: Σold and Σnew ;Delta relations: R+ (insertions) and R− (deletions);Constraints: ∀x .(Rold (x) ∨ R+(x)) ≡ (Rnew (x) ∨ R−(x)),
∀x .(R+(x) ∧ R−(x))→ ⊥
Update turned into definability question
Is Anew (or A+,A−) definable in terms of Aoldi ∈ Sold
A (old access paths)and U+
j , U−j (user updates) for every access path A ∈ SA?
How do we get “anonymous values”?Constant Complement tables (access paths that do malloc and such);
cyclic dependencies between anonymous values broken via reification.
David Toman (et al.) KR in DBMS Implementation What can it do? 24 / 40
GRAND UNIFIED APPROACH
TO QUERY COMPILATION
PART II: HOW DOES IT WORK?
David Toman (et al.) KR in DBMS Implementation 25 / 40
The Plan
Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)
ΣL SL ϕoo (Logical Schema)
ΣLP (rewriting)
��ΣP SA ⊆ SP ψoo (Physical Schema)
David Toman (et al.) KR in DBMS Implementation How does it work? 26 / 40
Query Plans via Interpolation
Idea #1: Plans as FormulasRepresent query plans as (annotated) range-restricted formulas ψ over SA:
atomic formula 7→ access path (get-first–get-next iterator)conjunction 7→ nested loops joinexistential quantifier 7→ projection (annotated w/duplicate info)disjunction 7→ concatenationnegation 7→ simple complement
Non-logical (but necessary) Add-ons1 Non-logical properties/operators
binding patternsduplication of data and duplicate-preserving/eliminating projectionssortedness of data (with respect to the iterator semantics) and sorting
2 Cost model(s)
David Toman (et al.) KR in DBMS Implementation How does it work? 27 / 40
Query Plans via Interpolation
Idea #1: Plans as FormulasRepresent query plans as (annotated) range-restricted formulas ψ over SA:
atomic formula 7→ access path (get-first–get-next iterator)conjunction 7→ nested loops joinexistential quantifier 7→ projection (annotated w/duplicate info)disjunction 7→ concatenationnegation 7→ simple complement
Non-logical (but necessary) Add-ons1 Non-logical properties/operators
binding patternsduplication of data and duplicate-preserving/eliminating projectionssortedness of data (with respect to the iterator semantics) and sorting
2 Cost model(s)
David Toman (et al.) KR in DBMS Implementation How does it work? 27 / 40
Query Plans via Interpolation
Idea #1 (cont):Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.
⇒ reduces correctness of ψ to logical implication Σ |= ϕ↔ ψ
IDEA #2:Use interpolation to search for ψ:
extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗
⇒ Beth definability of ϕ over Σ and SA resolves the existence of ψ.
David Toman (et al.) KR in DBMS Implementation How does it work? 28 / 40
Query Plans via Interpolation
Idea #1 (cont):Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.
⇒ reduces correctness of ψ to logical implication Σ |= ϕ↔ ψ
IDEA #2:Use interpolation to search for ψ:
extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗
⇒ Beth definability of ϕ over Σ and SA resolves the existence of ψ.
David Toman (et al.) KR in DBMS Implementation How does it work? 28 / 40
Query Plans via Interpolation
Idea #1 (cont):Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.
⇒ reduces correctness of ψ to logical implication Σ |= ϕ↔ ψ
IDEA #2:Use interpolation to search for ψ:
extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗
⇒ Beth definability of ϕ over Σ and SA resolves the existence of ψ.
David Toman (et al.) KR in DBMS Implementation How does it work? 28 / 40
Issues with TABLEAU
Dealing with the subformula property of Tableau⇒ analytic tableau explores formulas structurally⇒ (to large degree ) the structure of interpolant
depends on where access paths are present in queries/constraints.
IDEA #3:Separate general constraints from physical rules in the formulation ofthe definability question (and the subsequent interpolant extraction):
ΣL ∪ ΣR ∪ ΣLR |= ϕL → ϕR where ΣLR = {∀x .PL ↔ P ↔ PR | P ∈ SA}
Factoring logical reasoning from plan enumeration⇒ backtracking tableau to get alternative plans: too slow, too few plans
IDEA #4:Define conditional tableau exploration (using general constraints)
and separate it from plan generation (using physical rules)
David Toman (et al.) KR in DBMS Implementation How does it work? 29 / 40
Issues with TABLEAU
Dealing with the subformula property of Tableau⇒ analytic tableau explores formulas structurally⇒ (to large degree ) the structure of interpolant
depends on where access paths are present in queries/constraints.
IDEA #3:Separate general constraints from physical rules in the formulation ofthe definability question (and the subsequent interpolant extraction):
ΣL ∪ ΣR ∪ ΣLR |= ϕL → ϕR where ΣLR = {∀x .PL ↔ P ↔ PR | P ∈ SA}
Factoring logical reasoning from plan enumeration⇒ backtracking tableau to get alternative plans: too slow, too few plans
IDEA #4:Define conditional tableau exploration (using general constraints)
and separate it from plan generation (using physical rules)
David Toman (et al.) KR in DBMS Implementation How does it work? 29 / 40
Conditional Formulæ and Tableau
Conditional Formulæϕ[C] where C is a set of (ground) atoms over SA
ϕ only exists if all atoms in C are “used” in a plan tableau.
Absorbed Range-restricted Formulæ: ANF
Q ::= R(x) | ⊥ | Q ∧Q | Q ∨Q | ∀x .R(x)→ Q,
. . . and all ∃’s are Skolemized.
Conditional Tableau Rules for ANF
S ∪ {ϕ[C], ψ[C]}(ϕ ∧ ψ)[C] ∈ S
(conj)S ∪ {ϕ[C]} S ∪ {ψ[C]}
(ϕ ∨ ψ)[C] ∈ S(disj)
S ∪ {(ϕ[t/x ])[C ∪ D]}{R(t)[C], (∀x .R(x)→ ϕ)[D]} ⊆ S
(abs)S ∪ {R(t)[R(t)]}
SR(x) ∈ SA (phys)
David Toman (et al.) KR in DBMS Implementation How does it work? 30 / 40
Conditional Tableau: Outline
Goal: a closed tableau of the form
T P
T L T R T L . . . T R
T L and T R are logical left- and right-tableau(only logical constraints ΣL and ΣR , built 1st)
T P is a physical tableau (only physical rules ΣLR , built 2nd)
David Toman (et al.) KR in DBMS Implementation How does it work? 31 / 40
Conditional Tableau: Logical Parts
Conditional Tableau for (Q,Σ,SA)
Proof trees (T L,T R): T L for ΣL ∪ {QL(a)} over {PL | P ∈ SA}T R for ΣR ∪ {QR(a)→ ⊥} over {PR | P ∈ SA}
Closing Set(s)We call S of atoms over SA a closing set for T if, for every branch,
1 there is an atom R(t)[C], R(t) 6∈ C, such that C ∪ {¬R(t)} ⊆ S, or2 there is ⊥[C] such that C ⊆ S.
⇒ there are many different minimal closing sets for T .
ObservationFor arbitrary closing set S, an interpolant for T L(T R) is ⊥(>).
David Toman (et al.) KR in DBMS Implementation How does it work? 32 / 40
Plan Enumeration
Physical Tableau T P
P : LP RP
:R(t) : {{¬RL(t)}} {{RR(t)}}P1 ∧ P2 : LP1 ∪ LP2 {S1 ∪ S2 | S1 ∈ RP1 ,S2 ∈ RP2}P1 ∨ P2 : {S1 ∪ S2 | S1 ∈ LP1 ,S2 ∈ LP2} RP1 ∪ RP2
¬P1 : {{LL(t) | LR (t) ∈ S} | S ∈ RP1} {{LR (t) | LL(t) ∈ S} | S ∈ LP1}∃x .P1 : LP1[t/x ] RP1[t/x ]
ObservationFor a range-restricted formula P over SA there is an analytic tableau T P thatuses only formulæ in ΣLR ∪ {∀x .trueR(x)} such that:
1 Open branches of T P correspond to sets S ∈ LP (left branch) or S ∈ RP(right branch); and
2 The interpolant extracted from this tableau is logically equivalent to P,assuming that the closure of each left branch interpolates to ⊥ and forright branch to >.
David Toman (et al.) KR in DBMS Implementation How does it work? 33 / 40
Logical&Physical Combined, Controlling the Search
Basic Strategy1 build (T L,T R) for (Q,Σ,SA) to a certain depth,2 build T P and test if each element in LP(RP) closes T L(T R).
if so, T P [T L,T R] is closed tableau yielding an interpolant equivalent to P;(. . . otherwise extend depth in step 1 and repeat.)
In step 2 we can “test” many Ps (plan enumeration), buthow do we know which ones to try? while building these bottom-up?
Controlling the Searchonly use the (phys) rule in T L(T R) for R(t) that appears in T R(T L),only consider fragments that help closing (T L,T R)
⇒ this is determined using the minimal closing sets for (T L,T R).
. . . combine with search (among P ’s) with respect to a cost model.
David Toman (et al.) KR in DBMS Implementation How does it work? 34 / 40
Logical&Physical Combined, Controlling the Search
Basic Strategy1 build (T L,T R) for (Q,Σ,SA) to a certain depth,2 build T P and test if each element in LP(RP) closes T L(T R).
if so, T P [T L,T R] is closed tableau yielding an interpolant equivalent to P;(. . . otherwise extend depth in step 1 and repeat.)
In step 2 we can “test” many Ps (plan enumeration), buthow do we know which ones to try? while building these bottom-up?
Controlling the Searchonly use the (phys) rule in T L(T R) for R(t) that appears in T R(T L),only consider fragments that help closing (T L,T R)
⇒ this is determined using the minimal closing sets for (T L,T R).
. . . combine with search (among P ’s) with respect to a cost model.
David Toman (et al.) KR in DBMS Implementation How does it work? 34 / 40
Postprocessing
Duplicate Elimination Elimination
Q[{R(x1, . . . , xk )}]↔Q[R(x1, . . . , xk )]Q[{Q1 ∧Q2}]↔Q[{Q1} ∧ {Q2}]
Q[{¬Q1}]↔Q[¬Q1]Q[¬{Q1}]↔Q[¬Q1]
Q[{Q1 ∨Q2}]↔Q[{Q1} ∨ {Q2}] if Σ ∪ {Q[]} |= Q1 ∧Q2 → ⊥Q[{∃x .Q1}]↔Q[∃x .{Q1}]
if Σ ∪ {Q[] ∧Q1[y1/x ] ∧Q1[y2/x} |= y1 ≈ y2
CFDnc logic to approximate Σ and to reason about FDs/disjointness in PTIME.
David Toman (et al.) KR in DBMS Implementation How does it work? 35 / 40
Postprocessing
Duplicate Elimination Elimination
Q[{R(x1, . . . , xk )}]↔Q[R(x1, . . . , xk )]Q[{Q1 ∧Q2}]↔Q[{Q1} ∧ {Q2}]
Q[{¬Q1}]↔Q[¬Q1]Q[¬{Q1}]↔Q[¬Q1]
Q[{Q1 ∨Q2}]↔Q[{Q1} ∨ {Q2}] if Σ ∪ {Q[]} |= Q1 ∧Q2 → ⊥Q[{∃x .Q1}]↔Q[∃x .{Q1}]
if Σ ∪ {Q[] ∧Q1[y1/x ] ∧Q1[y2/x} |= y1 ≈ y2
CFDnc logic to approximate Σ and to reason about FDs/disjointness in PTIME.
⇒ also CUT insertion (similar—seeMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
).
David Toman (et al.) KR in DBMS Implementation How does it work? 35 / 40
Summary of the Approach
1 FO (DLFDE) tableau based interpolation algorithm⇒ enumeration of plans factored from reasoning⇒ range-restricted queries and constraints→ ground terms only⇒ extra-logical binding patterns and cost model
2 post processing (using CFD approximation)⇒ duplicate elimination elimination⇒ cut insertion
3 run time⇒ library of common data structures+schema constraints
or an interface to a legacy system⇒ finger data structures to simulate merge joins et al.
David Toman (et al.) KR in DBMS Implementation Summary 36 / 40
Brain Teaser
david$ cat nash.fol% Nash unlabelled[% constraintsp0v1(x,y)<->ex([z,u],r(z,x) and r(z,u) and r(u,y)),p0v2(x,y)<->ex(z,r(x,z) and r(z,y)),p0v3(x,y)<->ex([z,u],r(x,z) and r(z,u) and r(u,y))% queryq(x,y)<->ex([z,u,t],r(z,x) and r(z,u) and r(u,t) and r(t,y)),].
David Toman (et al.) KR in DBMS Implementation Summary 37 / 40
Brain Teaser
david$ cat nash.fol | ./trans | ./main -
Closing sets:L: { -p0v1(sl0:1,sl21:5)/0 }
{ -p0v3(sl17:3,sl0:2)/0 }{ -p0v1(sl0:1,sr1a:1f)/0 +p0v2(sl17:3,sr1a:1f)/0 }{ -p0v3(sr10:d,sl0:2)/0 +p0v2(sr10:d,sl21:5)/0 }
R: { -p0v2(sr10:d,sl21:5)/0 +p0v1(sl0:1,sl21:5)/0 }{ -p0v2(sl17:3,sr1a:1f)/0 +p0v3(sl17:3,sl0:2)/0 }{ +p0v3(sr10:d,sl0:2)/0 +p0v1(sl0:1,sl21:5)/0 }{ +p0v3(sl17:3,sl0:2)/0 +p0v1(sl0:1,sr1a:1f)/0 }
Success with fragment:+EXISTS[sl21:5.E](+JOIN(+ATOM p0v1(sl0:1,sl21:5),-EXISTS[sr10:d.A](+JOIN(
+ATOM p0v2(sr10:d,sl21:5),-ATOM p0v3(sr10:d,sl0:2)))))
David Toman (et al.) KR in DBMS Implementation Summary 38 / 40
Brain Teaser
david$ cat nash.fol | ./trans | ./main -
Closing sets:L: { -p0v1(sl0:1,sl21:5)/0 }
{ -p0v3(sl17:3,sl0:2)/0 }{ -p0v1(sl0:1,sr1a:1f)/0 +p0v2(sl17:3,sr1a:1f)/0 }{ -p0v3(sr10:d,sl0:2)/0 +p0v2(sr10:d,sl21:5)/0 }
R: { -p0v2(sr10:d,sl21:5)/0 +p0v1(sl0:1,sl21:5)/0 }{ -p0v2(sl17:3,sr1a:1f)/0 +p0v3(sl17:3,sl0:2)/0 }{ +p0v3(sr10:d,sl0:2)/0 +p0v1(sl0:1,sl21:5)/0 }{ +p0v3(sl17:3,sl0:2)/0 +p0v1(sl0:1,sr1a:1f)/0 }
Success with fragment:+EXISTS[sl17:3.E](+JOIN(+ATOM p0v3(sl17:3,sl0:2),-EXISTS[sr1a:1f.A](+JOIN(
+ATOM p0v2(sl17:3,sr1a:1f),-ATOM p0v1(sl0:1,sr1a:1f)))))
David Toman (et al.) KR in DBMS Implementation Summary 38 / 40
Open Issues
1 Dealing with ordered data? (merge-joins etc.: we have a partial solution)
2 Decidable schema languages (decidable interpolation problem)?
3 More powerful schema languages (inductive types, etc.)?
4 Beyond FO Queries/Views (e.g., count/sum aggregates)?
5 Coding extra-logical bits (e.g., binding patterns, postprocessing, etc. )in the schema itself?
6 Standard Designs (a plan can always be found as in SQL)?
7 Explanation(s) of non-definability?
8 Fine(r)-grained updates?
9 . . .
. . . and, as always, performance, performance, performance!
David Toman (et al.) KR in DBMS Implementation Summary 39 / 40
Message from our Sponsors
Data Systems Group at the University of Waterloo∼ 10 professors, affiliated faculty, postdocs, 40+ graduate students, . . .wide range of research interests
Advanced query processing/Knowledge representationSystem aspects of database systems and Distributed data managementData quality/Managing uncertain data/Data miningNew(-ish) domains (text, streaming, graph data/RDF, OLAP)
research sponsored by governments, and local/global companiesNSERC/CFI/OIT and Google, IBM, SAP, OpenText, . . .
. . . and we are always looking for good graduate students (MMath/PhD)⇒ comes with full support over multiple years
David Toman (et al.) KR in DBMS Implementation Summary 40 / 40