Upload
phungdiep
View
216
Download
0
Embed Size (px)
Citation preview
Uncertainty in Databases
Lecture 2: Essential Database Foundations
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Table of Contents
1 Historical Background
2 Database Model
3 Queries
4 Constraints
5 Complexity
6 References
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Codd’s VisionCodd Catches OnTop Academic RecognitionSelected Publication Venues
Table of Contents
1 Historical Background
2 Database Model
3 Queries
4 Constraints
5 Complexity
6 References
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Codd’s VisionCodd Catches OnTop Academic RecognitionSelected Publication Venues
Codd’s Vision
I 1970: Edgar F. Codd invents the Relational Database modelI Data stored as a collection fo relations that conform to a
schema and can be accessed through a query language overthe schema
I Separation between the logical and phisical layersI Work done in IBM San Jose, which is now IBM AlmadenI E. F. Codd: A Relational Model of Data for Large Shared Data
Banks. In Communications of the ACM 13(6): 377-387 (1970)
I 1972: Codd introduced the relational algebra and therelational calculus (logical view of database querying), andproved their equal expressive power [Cod72]
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Codd’s VisionCodd Catches OnTop Academic RecognitionSelected Publication Venues
Codd Catches On
I 1973: Michael Stonebraker and Eugene Wong implementCodd’s vision in INGRES, which commercialised in 1983 andevolved to Postgres (now PostgreSQL) in 1989
I 1974: A group from the IBM San Jose lab implements Codd’svision in System R, which evolved to DB2 in 1983
I SQL initially developed at IBM by Donald D. Chamberlin andRaymond F. Boyce [CB74]
I 1977: Influenced by Codd, Larry Ellison founds SoftwareDevelopment Laboratories, which becomes RelationalSoftware in 1979, which becomes Oracle Systems Corporationin 1982, named after its flagship product—Oracle database
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Codd’s VisionCodd Catches OnTop Academic RecognitionSelected Publication Venues
Top Academic Recognition
1981: Codd receives the ACM Turing Award
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Codd’s VisionCodd Catches OnTop Academic RecognitionSelected Publication Venues
Breaking News
Last week: ACM announces Michael Stonebraker as the 2014Turing Award winner for fundamental contributions to the concepts
and practices underlying modern database systems
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Codd’s VisionCodd Catches OnTop Academic RecognitionSelected Publication Venues
Selected Publication Venues
I Conferences:I SIGMOD: ACM Special Interest Group on Management of
Data (since 1975)I PODS: ACM Symp. on Principles of Database Systems (since
1982)I VLDB: Intl. Conf. on Very Large Databases (since 1975)I ICDE: IEEE Intl. Conf. on Data Engineering (since 1984)I ICDT: Intl. Conference on Database Theory (since 1986)I EDBT: Intl. Conference on Extending Database Technology
(since 1988)
I Journals:I TODS: ACM Transactions on Database Systems (since 1976)I VLDBJ: The VLDB Journal (since 1992)I SIGMOD REC: ACM SIGMOD Record (since 1969)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
RelationsDatabase SchemasLogical Viewpoint
Table of Contents
1 Historical Background
2 Database Model
3 Queries
4 Constraints
5 Complexity
6 References
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
RelationsDatabase SchemasLogical Viewpoint
Relations
I A relation r consists of a heading (A1, . . . , Am), which is asequence (A1, . . . , Am) of distinct attributes, and a body,which is a finite collection of tuples t = (a1, . . . , am) of values
I We assume some infinite domain of values (or constants)
I By a convenient abuse of notation, a relation is oftenidentified with its bodyI (e.g., t ∈ r means that t is a tuple in the body of r)
I We may refer to the ith value in a tuple t ∈ R as t.Ai or t[i]
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
RelationsDatabase SchemasLogical Viewpoint
Database Schemas
I A relation schema has a relation name (or relation symbol) Rand a heading (A1, . . . , An), which is again a sequence ofattributes; it is denoted by R(A1, . . . , Am)I Or simply R if the attributes are not important or clear from
the context
I The arity of R(A1, . . . , Am) is m, and is denoted by ar(R)
I Sometimes the attributes are not important, and we may usejust R/m to specify that R is a relation name of arity m
I A schema is a pair S = (R,Σ), where R is a set of relationschemas with distinct names, and Σ is a set of constraintsover RI R is often called a signatureI We later discuss languages of constraints
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
RelationsDatabase SchemasLogical Viewpoint
Database Instance
I A relation r is said to be over a relation schema R if r and Rhave the same heading
I A database instance (or just instance) I over a schemaS = (R,Σ) associates with every relation name R a relationRI over R, such that all the constraints in Σ are satisfied
I We denote by Inst(S) the set of all the instances over S
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
RelationsDatabase SchemasLogical Viewpoint
Logical Viewpoint
I It is convenient and common to view the database as a logicalsystem (first-order of higher-order logic)I vocabulary=schema+built-in predicates (e.g., <, >, =), and
structure=instance
I Database queries are viewed as logical formulas ϕ(x) over thedatabase: Q(I) = {a | I |= ϕ(a)}
I But there are some significant restrictions of the logic:I We usually have only relation (no function) symbolsI We usually consider only finite structures (cf. finite-model
theory)I Queries should be independent of the domain outside the
database (cf. relational calculus)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
RelationsDatabase SchemasLogical Viewpoint
On Finite Structures
I Consider the set F = {ϕi | i = 1, 2, . . . } of sentences whereϕi is the sentence “R has at least i distinct tuples”
I Each ϕi is expressible in first-order logic
I Every finite subset of F has a finite model, but there is nofinite model for F
I Hence, the compactness theorem no longer holds in the finite
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Table of Contents
1 Historical Background
2 Database Model
3 Queries
4 Constraints
5 Complexity
6 References
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Queries
I The queries we will consider are such that produce a relationfrom the database
I Formally, a query Q over a schema S is associated with aheading (A1, . . . , Ak), and it maps every instance I ∈ Inst(S)into a relation with that heading
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Example
Takes
student cno
Ahuva 1Alon 1
Ahuva 2
Course
cno cname
1 AI2 DB3 PL
Find all pairs (s, c) of students and courses, such that there existsa course number x where both Takes(s, x) and Course(x, c) hold
student cname
Ahuva AIAlon AI
Ahuva DB
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Boolean Queries
I A special case is where k = 0, and then the result eithercontains the empty tuple or is empty; in this case we say thatthe query is Boolean
I Example: Is it the case that for some x, bothTakes(’Ahuva’, x) and Course(x, ’DB’) hold?
I We often denoteI Q(I) = {()} by Q(I) = true or I |= QI Q(I) = ∅ by Q(I) = false or I 6|= Q
I Boolean queries are very important in the analysis querylanguages (expressiveness, complexity, optimization andequivalence, etc.)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Relational Algebra
I Introduced by Codd [Cod72]
I Used by existing database systems, mainly for internalquery-plan optimization
I A collection of operations over relationsI Unary: r → t; binary: (r, s)→ t
I Queries via:1 Applying the operators to the database relations2 Composition
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Algebraic Operators
I Union (∪), difference (−)I R ∪ S and R− S allowed if R and S are union compatible,
that is, the have the same heading
I Cartesian product (×)I R× S allowed only when R and S have disjoint headings
I Projection (π)I πA′
1,...,A′k(R) allowed if A′
1, . . . , A′k are distinct attributes of R
I Selection (σ)I σϕ(R) allowed if ϕ is a condition over the attributes of R
(e.g., A1 = A2 or A1 6= A2)
I Renaming (ρ)I ρA→B(R) allowed if A is an attribute of R and B is not an
attribute of R
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Example
Takesstudent cno
Ahuva 1Alon 1
Ahuva 2
Coursecno cname
1 AI2 DB3 PL
Takes× ρcno→cCourse
student cno c cname
Ahuva 1 1 AIAlon 1 1 AI
Ahuva 2 2 DBAhuva 1 2 DBAlon 1 3 PL
Ahuva 2 3 PL
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Example
Takesstudent cno
Ahuva 1Alon 1
Ahuva 2
Coursecno cname
1 AI2 DB3 PL
σcno=c(Takes× ρcno→cCourse)
student cno c cname
Ahuva 1 1 AIAlon 1 1 AI
Ahuva 2 2 DB
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Example
Takesstudent cno
Ahuva 1Alon 1
Ahuva 2
Coursecno cname
1 AI2 DB3 PL
πstudent,cno,cname
(σcno=c(Takes× ρcno→cCourse)
)student cno cname
Ahuva 1 AIAlon 1 AI
Ahuva 2 DB
In short: Takes ./ Course (natural join)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Example
Takesstudent cno
Ahuva 1Alon 1
Ahuva 2
Coursecno cname
1 AI2 DB3 PL
πstudent,cname
(σcno=c(Takes× ρcno→cCourse)
)student cname
Ahuva AIAlon AI
Ahuva DB
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
SQL
I SQL (Structured Query Language) is natural language toexpress relational algebraI SELECT (projection) . . . AS (rename)I FROM (Cartesian product)I WHERE (selection)I UNIONI MINUS
I And much more, e.g., aggregate operators (e.g., COUNT,SUM), clustering operators (e.g., GROUP BY, HAVING),ranking (e.g., ORDER BY, LIMIT), and more
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
The Example in SQL
Takes
student cno
Ahuva 1Alon 1
Ahuva 2
Course
cno cname
1 AI2 DB3 PL
Find all pairs (s, c) of students and courses, such that there existsa course number x where both Takes(s, x) and Course(x, c) hold
SELECT S.student, C.cname
FROM Takes T, Course C
WHERE T.cno = C.cno
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Conjunctive Queries
I Conjunctive Queries (CQs) are SELECT-FROM-WHEREqueries (no MINUS, UNION, etc.) such that all the WHEREconditions are equalities among attributes
I CQs are typically represented in the following FOL notation:
Q(x) :− ∃y[ϕ1(x,y) ∧ · · · ∧ ϕk(x,y)
]where:I x and y are disjoint sequences of variablesI Each ϕi(x,y) is a an atomic formula of the form
R(τ1, . . . , τm) where R is an m-ary relation in the schema andeach τj is either a variable in x, a variable in y, or a constantvalue (e.g., 7 or ’Ahuva’)
I Every variable in x occurs at least once on the right hand side
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
CQ Terminology and Notation
Q(x) :− ∃y[ϕ1(x,y) ∧ · · · ∧ ϕk(x,y)
]For simplification, quantification and conjunction are omitted:
Q(x)︸ ︷︷ ︸head
:−atom︷ ︸︸ ︷
ϕ1(x,y) , · · · , ϕk(x,y)︸ ︷︷ ︸body
A variables in x is called a free or head variable, and a variable iny is called an existential variable
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
CQ Terminology and Notation
Q(x) :− ∃y[ϕ1(x,y) ∧ · · · ∧ ϕk(x,y)
]For simplification, quantification and conjunction are omitted:
Q(x)
︸ ︷︷ ︸head
:−
atom︷ ︸︸ ︷
ϕ1(x,y) , · · · , ϕk(x,y)
︸ ︷︷ ︸body
A variables in x is called a free or head variable, and a variable iny is called an existential variable
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
CQ Terminology and Notation
Q(x) :− ∃y[ϕ1(x,y) ∧ · · · ∧ ϕk(x,y)
]For simplification, quantification and conjunction are omitted:
Q(x)︸ ︷︷ ︸head
:−atom︷ ︸︸ ︷
ϕ1(x,y) , · · · , ϕk(x,y)︸ ︷︷ ︸body
A variables in x is called a free or head variable, and a variable iny is called an existential variable
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
The Example as CQ
Takes
student cno
Ahuva 1Alon 1
Ahuva 2
Course
cno cname
1 AI2 DB3 PL
Find all pairs (s, c) of students and courses, such that there existsa course number x where both Takes(s, x) and Course(x, c) hold
Q(s, c) :− Takes(s, x) ∧ Course(x, c)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Relational AlgebraSQLConjunctive Queries
Why Are CQs Interesting?
I This is the class of all the queries that can be phrased in RAwhen using only:I Selection with equality predicateI ProjectionI Join
I For that reason, CQs are often called SPJ queries
I CQs are the building block of expressive query languages, suchas Datalog
I Useful queries that are simple enough to perform deepinvestigation for various database problemsI As we shall see, there are significant deep insights and
algorithms that apply only to conjunctive queries
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Table of Contents
1 Historical Background
2 Database Model
3 Queries
4 Constraints
5 Complexity
6 References
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Why Do We Care about Constraints?
I Allow to enforce database coherence and avoid bugs
I Allow to formally determine what inconsistency meansI Very relevant to us
I May have dramatic effect on algorithms and complexity
I By focusing on specific classes of constraint languages, weallow for nontrivial analysis
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Functional Dependencies
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I A lab belongs to just one faculty (i.e., name is a key for Lab)
I A specific room in a specific building belongs to only one lab
I A lab may have multiple rooms, but all in the same building
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Formal Definition
I Let S be a schema
I A Functional Dependency (FD) over S is an expression of theform R : U → V , where U and V are sets of attributes of R
I An instance I over S satisfies the FD R : U → V if for everytwo tuples t1 and t2 of RI :
t1 and t2 agree on U ⇒ t1 and t2 agree on V
I By “agree on W” we mean that t1 and t2 have the same valuein every position that corresponds to an attribute of W
I If U and V cover all the attributes of R, then R : U → V is akey constraint and U is said to be a key for R
I As a simplified notation, we write U and V by simply listingtheir attributes (no set notation)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Example Revisited
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Example Revisited
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I A lab belongs to just one faculty (i.e., lab is a key for Lab)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Example Revisited
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I A lab belongs to just one faculty (i.e., lab is a key for Lab)
Lab : name→ faculty
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Example Revisited
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I A specific room in a specific building belongs to only one lab
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Example Revisited
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I A specific room in a specific building belongs to only one lab
LabRoom : building room→ lab
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Example Revisited
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I A lab may have multiple rooms, but all in the same building
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Example Revisited
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I A lab may have multiple rooms, but all in the same building
LabRoom : lab→ building
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Generalizing FDs
I There are various formalisms that naturally extend FDs tocross-relation dependencies
I Following are two popular examples:I Equality-Generating Dependencies (EGDs)I Denial Constraints (DCs)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
EGDs
I The FD Lab : name→ faculty can be phrased in FOL as
∀x, y, z[Lab(x, y) ∧ Lab(x, z)→ y = z
]I An EGD is an expression of the form
∀x[ϕ(x)→ y1 = y2
]I ϕ(x) is a conjunction of atomic formulasI y1 and y2 are variables in x
I Example:
Lab(l1, f1), Lab(l2, f2), LabRoom(l1, b, r1), LabRoom(l2, b, r2)
→ f1 = f2
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
DCs
I The FD Lab : name→ faculty can be phrased in FOL as
∀x, y, z ¬(Lab(x, y) ∧ Lab(x, z) ∧ y 6= z
)I A DC is an expression of the form
∀x ¬(ϕ(x) ∧ ψ(x)
)I x = (x1, . . . , xn) is a sequence of variablesI Each ϕ(x) is a conjunction of atomic formulasI Each γ(x) is a conjunction of comparisons between two
variables in x (e.g., x1 6= x2, x1 < x2, x1 ≥ x2, etc.)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Inclusion Dependencies
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I Every lab in LabRoom should be listed in the Lab relation(foreign key)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Formal Defintion
I Let S be a schema
I An INclusion Dependency (IND) over S is an expression δ ofthe form
R[A1, . . . , Am] ⊆ S[B1, . . . , Bm]
where:I R and S are relation name in S
I R and S may be equal
I A1, . . . , Am are distinct attributes of RI B1, . . . , Bm are distinct attributes of S
I An instance I over S satisfies δ if
πA1,...,Am(RI) ⊆ πB1,...,Bm(SI)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Example
Consider the relation Friend[person1, person2]
Friend[person1, person2] ⊆ Friend[person2, person1]
means that friendship is symmetric
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Another Example
Lab
name faculty
SIPL EELCL CS
SSDL CSSTAT IE
LabRoom
lab building room
SIPL Meyer 100SIPL Meyer 101LCL Taub 100
SSDL Taub 200
I Every lab should be listed in the Lab relation (foreign key)
LabRoom[lab] ⊆ Lab[name]
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Tuple-Generating Dependencies
I The IND LabRoom[lab] ⊆ Lab[name] can be phrased in FOLas
∀x, y, z[LabRoom(x, y, z)→ ∃w[Lab(x,w)]
]I A Tuple-Generating Dependency (TGD) is an expression of
the form∀x[ϕ(x)→ ∃yψ(x,y)
]where ϕ(x) and ψ(x,y) are conjunctions of atomic formulas
I Example:
Researcher(p, l), LabRoom(l, b, r)→ ∃r′[PersonRoom(p, b, r′)]
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Functional DependenciesGeneralizing FDsInclusion DependenciesTuple-Generating Dependencies
Question
Can we express that Friends is transitive with TGDs?
Can we express that Friends has no triangles with TGDs?
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Table of Contents
1 Historical Background
2 Database Model
3 Queries
4 Constraints
5 Complexity
6 References
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Types of Database Complexity
I We consider computational problems that involve one or moreof the following components:I Schema SI A set Σ of constraintsI A query QI A database instance I
I Common complexity measures designed to distinguish oneinput from another (e.g., instances are far bigger thanschemas/queries)
I Combined complexity: everyting is given as input
I Data complexity: I is given as input, everything else is fixedI Formally, we consider infinitely many computational problems
PS,Σ,Q, one per combination of S, Σ and Q
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Example: Complexity of CQ Answering
Problem Def. (Boolean CQ Evaluation)
Given a schema S, a Boolean CQ Q over S and an instance I overS, determine whether Q(I) = true.
We will show that this problem is NP-complete under combinedcomplexity, by reduction from the Clique problem.
Problem Def. (Clique)
Given a graph G = (V,E) and a number k, determine whether Gcontains a clique of size k, that is, a subset U of V such that|U | = k and every two nodes in U are neighbours.
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Reduction
I Given G = (V,E) with V = {1, . . . , n}, and k, construct:I S = {RE/2}I IG = {RE(i, j) | {i, j} ∈ E and i < j}I Qk is a CQ with existential variables X1, . . . , Xk, and an atom
RE(Xi, Xj) for every i and j with 1 ≤ i < j ≤ kI For example, suppose that G is the following graph:
3
1 2
4
IG = RE
1 32 32 43 4
Q3 :− RE(X1, X2), RE(X1, X3), R(X2, X3)
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Correctness
I The reduction is correct since the following two areequivalent:
1 G has a clique of size at least k2 Qk(IG) = true
I Hence, determining whether Q(I) = true, given S, Q and I,is NP-hardI Membership in NP is straightforward, hence, the problem is
NP-complete
I Note: The schema S does not depend on the input (G, k),but the size of Q is quadratic in k
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Data Complexity
What is the data complexity of answering a query in RA?
I We consider the problem PS,Q of computing the answers for aquery Q in RA (Relational Algebra) over a given inputinstance I over S
I The naive way of straightforwardly executing Q runs inpolynomial time!
I As a special case, CQ evaluation is in polynomial time underdata complexityI Note that data complexity is insensitive to the representation
of the query
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Other Yardsticks of Efficiency
Other yardsticks of efficiency are often used in databasecomplexity; here are two examples:
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Parameterized Complexity
I Parameterized complexity is between data complexity andcombined complexity;I Query evaluation is Fixed Parameter Tractable if it can be
evaluated in time O(f(Q) · p(I)), where f is any (computable)function of the query in p(I) is polynomial in the size of II Our reduction from clique shows that CQ evaluation is not
likely to be FPT
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Data and Combined ComplexityOther Yardsticks of Efficiency
Input-Output Complexity
I A query (e.g., CQ) may be required to output an exponentialnumber of results
I Hence, it makes no sense to require evaluation in polynomialtime
I Input-output complexity measures the time as a function ofboth the input and the outputI Polynomial total time: the running time is polynomial in the
combined size of the input and the outputI Polynomial delay: the answers are produced one by one, where
the delay between every two answers is polynomial in the inputonly
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
Table of Contents
1 Historical Background
2 Database Model
3 Queries
4 Constraints
5 Complexity
6 References
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
Historical BackgroundDatabase Model
QueriesConstraintsComplexityReferences
References I
Donald D. Chamberlin and Raymond F. Boyce, SEQUEL: Astructured english query language, SIGMOD, ACM, 1974,pp. 249–264.
E. F. Codd, A relational model of data for large shared databanks, Commun. ACM 13 (1970), no. 6, 377–387.
, Relational completeness of data base sublanguages,Database Systems (1972), 65–98.
Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2
End of lecture 2Essential Database Foundations