KR in Database Systems Implementation (or Life beyond Lite Logics and CQ/UCQ)
David Toman
D.R. Cheriton School of Computer Science
Joint work with Alexander Hudek and Grant Weddell
David Toman (et al.) KR in DBMS Implementation 1 / 18
Data and Constraints: the Database Recap

The Textbook View
  Data: represented as an instance of a relational structure
  Queries: access to data via open formulæ (in an appropriate logic)
  Constraints: data integrity enforced by sentences (in the same logic)
  ⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?
  View declaration ∼ a sentence ∀x.V(x) ↔ ϕ (in our logic),
  where V is a (new) relational symbol and ϕ is a query.

Much Bigger Deal: Physical Data Independence
  Logical Symbols: user (visible) relations/tables
  Mapping: C/C++ goo ⇒ constraints (+ a minimal runtime)
  Physical Symbols: data structures (indices)
The KR Way

Queries and Ontologies
  Queries are answered not only w.r.t. explicit data (A)
  but also w.r.t. background knowledge (T) under the open-world assumption (OWA)
  ⇒ Ontology-based Data Access (OBDA)

Example
  Socrates is a MAN (explicit data)
  Every MAN is MORTAL (ontology)
  List all MORTALs ⇒ {Socrates} (query)

How do we answer queries?
  Using logical implication (to define certain answers):
    Ans(Q, A, T) := {Q(a1, …, ak) | T ∪ A ⊨ Q(a1, …, ak)}
  ⇒ answers are ground Q-atoms logically implied by A ∪ T.
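The Socrates example can be replayed in a few lines. This is a minimal sketch, assuming the ontology contains only atomic inclusions of the form "every SUB is a SUP"; for that restricted case, saturating the data with all implied memberships yields exactly the certain answers. The function name and the pair encoding are illustrative choices, not part of the talk.

```python
def certain_answers(query_concept, abox, tbox):
    """Individuals a such that T ∪ A ⊨ query_concept(a).

    abox: set of (concept, individual) pairs (explicit data).
    tbox: set of (sub, sup) pairs, read as "every sub is a sup".
    Assumption: only atomic inclusions, so forward chaining is complete.
    """
    facts = set(abox)
    changed = True
    while changed:  # saturate until no new membership can be derived
        changed = False
        for concept, individual in list(facts):
            for sub, sup in tbox:
                if sub == concept and (sup, individual) not in facts:
                    facts.add((sup, individual))
                    changed = True
    return {ind for concept, ind in facts if concept == query_concept}


abox = {("MAN", "Socrates")}   # Socrates is a MAN
tbox = {("MAN", "MORTAL")}     # Every MAN is MORTAL
print(certain_answers("MORTAL", abox, tbox))
```

Without the ontology the same query returns the empty set, which is the difference between plain query evaluation and OBDA.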
Complexity

Good/Standard News
  LOGSPACE/PTIME (data complexity) for query answering:
    (U)CQ and DL-Lite / EL⊥ / CFD∀nc / “rules”-lite (Horn)

Bad News
  no negative queries/sub-queries
  no negations in ABox
  no closed-world assumption
  counter-intuitive query answers
Difficulties: Unintuitive Answers

Example
  EMP(Sue)
  EMP ⊑ ∃PHONENUM (or ∀x.EMP(x) → ∃y.PHONENUM(x, y))

  User: Does Sue have a phone number?
  Information System: YES
  User: OK, tell me Sue’s phone number!
  Information System: (no answer)
  User:
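The dialogue above can be reproduced with a small sketch. It assumes a chase-style reading of EMP ⊑ ∃PHONENUM in which each employee without a stored number receives a labelled null as witness; the `Null` class and the function names are hypothetical.

```python
class Null:
    """A labelled null: a value known to exist whose identity is unknown."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"_:{self.label}"


emp = {"Sue"}        # EMP(Sue)
phonenum = set()     # no PHONENUM facts are stored explicitly

# Apply EMP ⊑ ∃PHONENUM once for every employee lacking a witness.
for e in emp:
    if not any(x == e for x, _ in phonenum):
        phonenum.add((e, Null(f"phone_of_{e}")))


def has_phone(person):
    # Boolean query ∃y.PHONENUM(person, y): any witness suffices.
    return any(x == person for x, _ in phonenum)


def phone_numbers(person):
    # Certain answers must be named constants, so nulls are filtered out.
    return {y for x, y in phonenum
            if x == person and not isinstance(y, Null)}


print(has_phone("Sue"))       # the system answers YES
print(phone_numbers("Sue"))   # but has no number to report
```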
What to do?

Definability and Rewriting
  Queries: range-restricted FOL (a.k.a. SQL) over S_L, definable w.r.t. Σ and S_A
  Ontology/Schema: range-restricted FOL, Σ := Σ_L ∪ Σ_LP ∪ Σ_P
  Data: CWA (complete information for S_A symbols)

    Σ_L     S_L           ← ϕ    Logical Schema and User Queries
    Σ_LP                    ↓    (rewriting)
    Σ_P     S_A ⊆ S_P     ← ψ    Physical Schema and Query Plans

  users: looks like a single model (of the logical schema)
  implementation: many models,
    but definable queries answer the same in each of them

  Query (S_L) → Compiler ← Schema (S_L ∪ S_P)
  Compiler → Relational Algebra (over S_A) → Evaluator → Answers
  Data (S_A ⊂ S_P) → Evaluator
[Book cover] David Toman and Grant Weddell, Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management (Series Editor: M. Tamer Özsu, University of Waterloo), Morgan & Claypool Publishers, ©2011. ISBN: 978-1-60845-278-1. Series ISSN: 2153-5418. www.morganclaypool.com

Query compilation is the problem of translating user requests formulated over purely conceptual and domain specific ways of understanding data, commonly called logical designs, to efficient executable programs called query plans. Such plans access various concrete data sources through their low-level often iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and how such capabilities relate to logical design is commonly called a physical design. This book is an introduction to the fundamental methods underlying database technology that solves the problem of query compilation. The methods are presented in terms of first-order logic which serves as the vehicle for specifying physical design, expressing user requests and query plans, and understanding how query plans implement user requests.
GRAND UNIFIED APPROACH
TO QUERY COMPILATION
PART I: WHAT CAN IT DO?
What can this do?

GOAL: Generate query plans that compete with hand-written programs in C

  1 linked data structures, pointers, …
  2 access to search structures (index access and selection),
  3 hash-based access to data (including hash-joins),
  4 multi-level storage (aka disk/remote/distributed files), …
  5 materialized views (FO-definable),
  6 updates through logical schema (needs id invention!), …

… all without having to code (too much) in C/C++!
Lists and Pointers

1 Logical Schema
    employee(number, name, salary)
    works(enumber, dnumber)
    department(number, name, manager)
    references: works.enumber → employee.number, works.dnumber → department.number,
                department.manager → employee.number

2 Physical Design: a linked list of emp records pointing to dept records.
    record emp of            record dept of
      integer num              integer num
      string name              string name
      integer salary           reference manager
      reference dept

3 Access Paths: empfile/1/0, emp-num/2/1, … (but no deptfile)

4 Integrity Constraints (many), e.g.,
    ∀x,y,z. employee(x, y, z) → ∃w. empfile(w) ∧ emp-num(w, x),
    ∀a,x. empfile(a) ∧ emp-num(a, x) → ∃y,z. employee(x, y, z), …
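One way to picture the physical design is as plain in-memory records plus iterator-style access paths. The sketch below is an illustration under several assumptions: the suffix in `empfile/1/0` and `emp-num/2/1` is read as arity followed by the number of input arguments, the `next_emp` link field and the sample data are invented, and Python objects stand in for record addresses.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)  # identity semantics: a record stands for its address
class Dept:
    num: int
    name: str
    manager: Optional["Emp"] = None


@dataclass(eq=False)
class Emp:
    num: int
    name: str
    salary: int
    dept: Dept
    next_emp: Optional["Emp"] = None  # hypothetical link to the next record


def build_list() -> Emp:
    """Create a two-element linked list of emp records (sample data)."""
    rd = Dept(10, "R&D")
    sue = Emp(1, "Sue", 90, rd)
    bob = Emp(2, "Bob", 80, rd, next_emp=sue)
    rd.manager = sue
    return bob  # head of the list


HEAD = build_list()


def empfile() -> Iterator[Emp]:
    """empfile/1/0: no inputs; enumerates the addresses of all emp records."""
    node = HEAD
    while node is not None:
        yield node
        node = node.next_emp


def emp_num(a: Emp) -> Iterator[int]:
    """emp-num/2/1: given a record address, yields that record's number."""
    yield a.num


print([x for a in empfile() for x in emp_num(a)])
```

There is deliberately no `deptfile` scan: dept records are reachable only by following the `dept` reference from an emp record, which is why the plans need extra integrity constraints.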
What can this do: navigating pointers

Example queries:

1 List all employee numbers and names (∃z,w. employee(x, y, z, w)):

    ∃a. empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names
  (∃z,u,v,w. department(x, z, u) ∧ employee(u, y, v, w)):

    ∃a,d,e. empfile(a) ∧ emp-dept(a, d) ∧ dept-num(d, x) ∧ dept-mgr(d, e) ∧ emp-name(e, y)
    ⇒ needs “departments have at least one employee”.
    … needs duplicate elimination during projection.

    ∃a,b,d. empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a, d) ∧ dept-num(d, x) ∧ dept-mgr(d, b) ∧ compare(a, b)
    ⇒ needs “managers work in their own departments”.
    … NO duplicate elimination during projection.
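The two plans for query 2 can be compared side by side. This is a sketch over invented sample data that satisfies both side conditions (every department has an employee, and every manager works in their own department); `compare(a, b)` is assumed to be an address-equality test, and `Rec` is a hypothetical stand-in for a record.

```python
class Rec:
    """A record whose fields are set from keyword arguments."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


rd = Rec(num=10, name="R&D", mgr=None)
ops = Rec(num=20, name="Ops", mgr=None)
sue = Rec(num=1, name="Sue", dept=rd)
bob = Rec(num=2, name="Bob", dept=rd)
ann = Rec(num=3, name="Ann", dept=ops)
rd.mgr, ops.mgr = sue, ann
EMPFILE = [sue, bob, ann]  # scan order of the empfile access path


def plan_via_manager_pointer():
    """First plan: follow emp -> dept -> manager; duplicates must be removed."""
    seen = set()
    for a in EMPFILE:                 # empfile(a)
        d = a.dept                    # emp-dept(a, d)
        x, e = d.num, d.mgr           # dept-num(d, x), dept-mgr(d, e)
        y = e.name                    # emp-name(e, y)
        if (x, y) not in seen:        # one copy per employee otherwise
            seen.add((x, y))
            yield (x, y)


def plan_via_compare():
    """Second plan: keep only employees who manage their own department."""
    for a in EMPFILE:                 # empfile(a)
        y = a.name                    # emp-name(a, y)
        d = a.dept                    # emp-dept(a, d)
        x, b = d.num, d.mgr           # dept-num(d, x), dept-mgr(d, b)
        if a is b:                    # compare(a, b)
            yield (x, y)              # each department appears once


print(list(plan_via_manager_pointer()))
print(list(plan_via_compare()))
```

On this data both plans return the same pairs; the second one does so without a `seen` set because each department has exactly one manager record in the scan.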
What can this do: two-level store

The access path empfile is refined by emppages/1/0 and emprecords/2/1:
  emppages returns (sequentially) disk pages containing emp records, and
  emprecords, given a disk page, returns emp records in that page.

5 List all employees with the same name
  (∃z,u,v,w,t. employee(x1, z, u, v) ∧ employee(x2, z, w, t)):

    ∃y,z,w,v,p,q. emppages(p) ∧ emppages(q)
      ∧ emprecords(p, y) ∧ emp-num(y, x1) ∧ emp-name(y, w)
      ∧ emprecords(q, z) ∧ emp-num(z, x2) ∧ emp-name(z, v)
      ∧ compare(w, v).

  ⇒ this plan implements the block nested loops join algorithm.
… many more examples in David Toman and Grant Weddell, Fundamentals of Physical Design and Query Compilation, Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011.
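Reading the conjuncts of the two-level-store plan left to right as nested loops gives the loop structure below. This is a sketch with invented page contents; pages are modelled as list indexes and records as (number, name) tuples, both of which are assumptions made for illustration.

```python
# Two hypothetical disk pages, each holding (emp-num, emp-name) records.
PAGES = [
    [(1, "Sue"), (2, "Bob")],
    [(3, "Sue"), (4, "Ann")],
]


def emppages():
    """emppages/1/0: yields the pages sequentially (here: page indexes)."""
    yield from range(len(PAGES))


def emprecords(p):
    """emprecords/2/1: given a page, yields the emp records stored in it."""
    yield from PAGES[p]


def same_name_pairs():
    """Block nested loops: the page loops are outermost, so each pair of
    pages is paired up once and all record comparisons happen in memory."""
    for p in emppages():                  # emppages(p)
        for q in emppages():              # emppages(q)
            for y in emprecords(p):       # emprecords(p, y)
                x1, w = y                 # emp-num(y, x1), emp-name(y, w)
                for z in emprecords(q):   # emprecords(q, z)
                    x2, v = z             # emp-num(z, x2), emp-name(z, v)
                    if w == v:            # compare(w, v)
                        yield (x1, x2)


print(sorted(same_name_pairs()))
```

As in the query on the slide, nothing requires x1 and x2 to differ, so every employee is also paired with itself.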
GRAND UNIFIED APPROACH
TO QUERY COMPILATION
PART II: HOW DOES IT WORK?
The Plan

Definability and Rewriting
  Queries: range-restricted FOL over S_L, definable w.r.t. Σ and S_A
  Ontology/Schema: range-restricted FOL
  Data: CWA (complete information for S_A symbols)

    Σ_L     S_L           ← ϕ    (Logical Schema)
    Σ_LP                    ↓    (rewriting)
    Σ_P     S_A ⊆ S_P     ← ψ    (Physical Schema)
Query Plans via Interpolation

IDEA #1:
  Represent physical design as access paths (S_A) and constraints (Σ).
  Represent query plans as (annotated) range-restricted formulas ψ over S_A.

    atomic formula          ↦ access path
    conjunction             ↦ nested loops join
    existential quantifier  ↦ projection (annotated w/ duplicate info)
    disjunction             ↦ concatenation
    negation                ↦ simple complement

  ⇒ reduces correctness of ψ w.r.t. the user query ϕ to Σ ⊨ ϕ ↔ ψ

IDEA #2:
  Use interpolation to search for ψ:
    extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ* ⊨ ϕ → ϕ*
  ⇒ Beth Definability of ϕ over Σ and S_A resolves the existence of ψ
    (except for binding patterns)
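The formula-to-operator table of IDEA #1 can be read as a tiny interpreter. The sketch below is one possible rendering, not the authors' implementation: plans are nested tuples, access paths are simplified to full scans of stored tuples (so binding patterns are ignored), and the tuple tags, the `dedup` flag and the sample data are all invented for illustration.

```python
def run(plan, env, paths):
    """Yield every extension of the variable binding `env` satisfying `plan`."""
    op = plan[0]
    if op == "atom":                      # atomic formula -> access path
        _, name, args = plan
        for row in paths[name]:
            new = dict(env)
            # Bind unbound variables; reject rows that clash with bound ones.
            if all(new.setdefault(v, c) == c for v, c in zip(args, row)):
                yield new
    elif op == "and":                     # conjunction -> nested loops join
        _, left, right = plan
        for e1 in run(left, env, paths):
            yield from run(right, e1, paths)
    elif op == "exists":                  # quantifier -> projection
        _, var, body, dedup = plan
        seen = set()
        for e in run(body, env, paths):
            out = {k: v for k, v in e.items() if k != var}
            key = tuple(sorted(out.items()))
            if dedup:                     # annotation: eliminate duplicates
                if key in seen:
                    continue
                seen.add(key)
            yield out
    elif op == "or":                      # disjunction -> concatenation
        _, left, right = plan
        yield from run(left, env, paths)
        yield from run(right, env, paths)
    elif op == "not":                     # negation -> complement test
        _, body = plan
        if next(run(body, env, paths), None) is None:
            yield env
    else:
        raise ValueError(f"unknown plan operator: {op}")


# Hypothetical contents of three access paths; "a1"/"a2" are record addresses.
paths = {
    "empfile": [("a1",), ("a2",)],
    "emp-num": [("a1", 1), ("a2", 2)],
    "emp-name": [("a1", "Sue"), ("a2", "Bob")],
}

# ∃a. empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y), projection without dedup.
plan = ("exists", "a",
        ("and", ("atom", "empfile", ("a",)),
                ("and", ("atom", "emp-num", ("a", "x")),
                        ("atom", "emp-name", ("a", "y")))),
        False)

print(list(run(plan, {}, paths)))
```

Because every operator is a generator, the whole plan runs as a pipeline of nested loops, which matches the iterator-style reading of conjunction in the table.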
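The step from IDEA #2 to the correctness condition of IDEA #1 can be spelled out as follows. The slides do not define Σ* and ϕ*; the derivation assumes the usual reading in which they are copies of Σ and ϕ with every symbol outside S_A renamed to a fresh one, treats Σ as a finite set of sentences, and treats the free variables of ϕ as constants.

```latex
\begin{align*}
&\Sigma \cup \Sigma^{*} \models \varphi \rightarrow \varphi^{*}
  && \text{(implicit definability of } \varphi \text{ over } S_A\text{)}\\
&\Sigma \cup \{\varphi\} \models \psi
  \quad\text{and}\quad
  \Sigma^{*} \cup \{\psi\} \models \varphi^{*}
  && \text{(Craig interpolant } \psi \text{, which mentions only } S_A\text{)}\\
&\Sigma \cup \{\psi\} \models \varphi
  && \text{(undo the renaming; } \psi \text{ is unaffected)}\\
&\Sigma \models \varphi \leftrightarrow \psi
  && \text{(first half of line 2 together with line 3)}
\end{align*}
```

The interpolant is therefore a formula over the access paths that is equivalent to the user query under Σ, which is exactly what a correct plan must be.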
Engineering Issues

Subformula (structural) Property: not enough rewritings (plans)
  • Σ^L ∪ Σ^R ∪ Σ^LR ⊨ ϕ^L → ϕ^R where Σ^LR = {∀x̄. P^L ↔ P ↔ P^R | P ∈ S_A}

Alternative Proofs/Plans: backtracking is too slow
  • conditional formulæ: ϕ[C] where C is a set of (ground) literals over S_A
  • logical (non-backtrackable) conditional tableau (T^L, T^R)
  • cost-based plan enumeration based on closing sets in (T^L, T^R) and Σ^LR

Non-logical Features: dealing with duplicates et al.
  • Q[∃x.Q1] ↦ Q[∃x.Q1] if Σ ∪ {Q[] ∧ Q1[y1/x] ∧ Q1[y2/x]} ⊨ y1 ≈ y2
  • Q[Q1 ∨ Q2] ↦ Q[Q1 ∨ Q2] if Σ ∪ {Q[]} ⊨ Q1 ∧ Q2 → ⊥

⇒ CFDInc description logic approximation of Σ (PTIME reasoning).
. . . for details seeMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.
Fundamentals ofPhysical Design andQuery Compilation
David Toman
Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m
Series Editor: M. Tamer Özsu, University of Waterloo
CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT
SYNTHESIS LECTURES ON DATA MANAGEMENT
About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com
M. Tamer Özsu, Series Editor
MORGAN
&CLAYPO
OL
ISBN: 978-1-60845-278-1
9 781608 452781
90000
Series ISSN: 2153-5418
FUNDAMENTALS OF PHYSICAL DESIGN AND Q
UERY COMPILATION
Fundamentals of Physical Design andQuery Compilation
University of Waterloo
Engineering Issues
Subformula (structural) Property: not enough rewritings (plans)
• Σ^L ∪ Σ^R ∪ Σ^LR ⊨ ϕ^L → ϕ^R where Σ^LR = {∀x̄. P^L ↔ P ↔ P^R | P ∈ S_A}
Alternative Proofs/Plans: backtracking is too slow
• conditional formulæ: ϕ[C] where C is a set of (ground) literals over S_A
• logical (non-backtrackable) conditional tableau (T^L, T^R)
• cost-based plan enumeration based on closing sets in (T^L, T^R) and Σ^LR
Non-logical Features: dealing with duplicates et al.
• Q[∃x.Q1] ↦ Q[∃x.Q1] if Σ ∪ {Q[] ∧ Q1[y1/x] ∧ Q1[y2/x]} ⊨ y1 ≈ y2
• Q[Q1 ∨ Q2] ↦ Q[Q1 ∨ Q2] if Σ ∪ {Q[]} ⊨ Q1 ∧ Q2 → ⊥
⇒ CFDInc description logic approximation of Σ (PTIME reasoning).
. . . for details see Fundamentals of Physical Design and Query Compilation (Morgan & Claypool Publishers, www.morganclaypool.com).
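The two duplicate-handling conditions above can be illustrated with a small Python sketch. It rests on assumptions that are not from the slides: relations are lists of tuples treated as bags, the helper names `has_duplicates`, `project_out`, and `determined` are hypothetical, and the checks run on a single instance, whereas the slide requires the conditions to be entailed by Σ. Read this way, the first condition says the projected-out variable x is determined by the remaining variables, so projecting it away cannot create duplicates. The second says Q1 and Q2 are disjoint, so concatenating their answers cannot create duplicates either.

```python
from collections import Counter


def has_duplicates(bag):
    """True if some tuple occurs more than once in the bag (a list of tuples)."""
    return any(count > 1 for count in Counter(bag).values())


def project_out(rows, col):
    """Bag projection: drop column `col` and keep every resulting tuple."""
    return [r[:col] + r[col + 1:] for r in rows]


def determined(rows, col):
    """Instance-level check that the remaining columns determine column `col`."""
    seen = {}
    for r in rows:
        rest = r[:col] + r[col + 1:]
        # setdefault records the first value seen for `rest`; a later
        # different value means `col` is not determined by the rest.
        if seen.setdefault(rest, r[col]) != r[col]:
            return False
    return True


# Hypothetical instance (id, name): each name belongs to exactly one id.
emp = [(1, "ann"), (2, "bob")]
# Hypothetical instance (id, dept): two ids share the same dept.
works = [(1, "sales"), (2, "sales")]

keyed = project_out(emp, 0)      # projecting out id keeps the bag duplicate-free
unkeyed = project_out(works, 0)  # projecting out id introduces a duplicate

# Disjoint inputs: concatenation (bag union) stays duplicate-free.
q1, q2 = [("ann",)], [("bob",)]
union_all = q1 + q2
```

When `determined` holds, a duplicate-elimination step after the projection would do no work and could be dropped; the same applies to the union of disjoint inputs.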
Summary of the Approach
1 FO (DLFDE) tableau-based interpolation algorithm
⇒ enumeration of plans factored from reasoning
⇒ range-restricted queries and constraints → ground terms only
⇒ extra-logical binding patterns and cost model
2 Post-processing (using CFDInc approximation)
⇒ duplicate elimination elimination
⇒ cut insertion
3 Run time
⇒ library of common data structures + schema constraints, or an interface to a legacy system
⇒ finger data structures to simulate merge joins et al.
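As a rough illustration of the last point, the sketch below shows how a cursor that remembers its position (a "finger") lets a nested-loop-style plan behave like a merge join. It is a generic sketch and not the authors' run-time library: the class `Finger`, its `probe` method, and the function `fingered_join` are hypothetical names, and both inputs are assumed to be lists of `(key, value)` pairs already sorted by key.

```python
class Finger:
    """Cursor over a key-sorted list that remembers where the last probe stopped."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0  # index of the first row whose key may still match

    def probe(self, key):
        """Yield all rows with the given key, assuming keys arrive in ascending order."""
        # Skip rows with smaller keys; the finger never moves backwards.
        while self.pos < len(self.rows) and self.rows[self.pos][0] < key:
            self.pos += 1
        # Scan the matching run without moving the finger, so a repeated
        # outer key sees the same matches again.
        i = self.pos
        while i < len(self.rows) and self.rows[i][0] == key:
            yield self.rows[i]
            i += 1


def fingered_join(outer, inner):
    """Nested-loop join shape; the finger makes the inner side scan forward only."""
    finger = Finger(inner)
    return [(k, a, b) for (k, a) in outer for (_, b) in finger.probe(k)]


outer = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
inner = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = fingered_join(outer, inner)
```

The plan keeps the simple nested-loop structure, while the finger ensures that no inner row before the current position is ever revisited.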
Research Directions and Open Issues
1 Dealing with ordered data? (merge-joins etc.: we have a partial solution)
2 Decidable schema languages (decidable interpolation problem)?
3 More powerful schema languages (inductive types, etc.)?
4 Beyond FO Queries/Views (e.g., count/sum aggregates)?
5 Coding extra-logical bits (e.g., binding patterns, postprocessing, etc.) in the schema itself?
6 Standard Designs (a plan can always be found as in SQL)?
7 Explanation(s) of non-definability?
8 Fine(r)-grained updates?
9 . . .
. . . and, as always, performance, performance, performance!
Message from our Sponsors
Database Group at the University of Waterloo
• 7 professors, affiliated faculty, postdocs, 30+ graduate students, . . .
• wide range of research interests:
  Advanced query processing/Knowledge representation
  System aspects of database systems and Distributed data management
  Data quality/Managing uncertain data/Data mining
  New(-ish) domains (text, streaming, graph data/RDF, OLAP)
• research sponsored by governments and local/global companies: NSERC/CFI/OIT and Google, IBM, SAP, OpenText, . . .
• part of a School of CS with 75+ professors, 300+ grad students, etc.: AI&ML, Algorithms&Data Structures, PL, Theory, Systems, . . .
Cheriton School of Computer Science has been ranked #18 in CS in the world by US News and World Report (#1 in Canada).
. . . and we are always looking for good graduate students (MMath/PhD)
⇒ comes with full support over multiple years