33
Database Systems 236363 Relational Calculus

Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Embed Size (px)

Citation preview

Page 1: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Database Systems236363

Relational Calculus

Page 2: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Relational Algebra vs. Relational Calculus

• Relational algebra queries are relatively easy to implement in a programming language, yet the translation of a query from a natural language (or even SQL) to relational algebra is not intuitive

• In the Domain Relational Calculus (DRC) language and its variant Tuple Relational Calculus (TRC), the terms of the query are expressed as first-order logic expression, which is more similar to the way humans compose questions, in contrast to the “algorithmic” structure of relational algebra

Page 3: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Domain Relational Calculus (DRC)

Page 4: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

DRC – Basic Concept• DRC enables writing logical expressions describing conditions on the

records• Each such expression will have free variables (defined later); the

truth/false value of the expression will be evaluated with respect to these variables

• The expression (x1,...,xn) describes a condition whose free variables are x1,...,xn

– is also called a formula• The free variables of a formula describe the attributes of the records

– For any set of possible values 1,...,n for the free variables x1,...,xn, the expression returns a truth value• Either (1,..., n) is satisfied or (1,..., n) is not satisfied

– R[A1,...,An] = {<x1,...,xn> : (x1,...,xn)} stands for the relation consisting of exactly all n-tuples satisfying (x1,...,xn)

Page 5: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Atomic Expressions

• Belongs to formulas– If R[A1,...,An] is a relation in the database, then the formula

R(x1,...,xn) is a formula with n free variables• For any sequence of values 1,...,n, the formula R(1,...,n) is satisfied iff

the relation includes a record with attributes corresponding to these values

– We can also write <1 ,..., n>R instead of R(1,...,n)

• Comparisons– Between variables and between variables and constants using

the usual comparison operators such as “z=2”, “x<y”, etc.– The free variables are the ones appearing in the formula

Page 6: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Composing Expressions – Boolean

• Negation– If (x1,...,xn) is a valid expression, then (x1,...,xn)=(x1,...,xn) is also a

valid expression with the same free variables– Meaning: (x1,...,xn) is satisfied iff (x1,...,xn) is not satisfied

• Disjunction– If 1(x1,...,xn) and 2(x1,...,xn) are valid expressions, then (x1,...,xn)=

1(x1,...,xn)2(x1,...,xn) is a valid expression that is satisfied if either 1

or 2 is satisfied (or both)

– It is possible that only some of the variables x1,...,xn in either 1 or 2

are free variables in this expression• It is possible to “declare” for a given expression the free variables that do

not appear in it explicitly

Page 7: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Additional Boolean Operations• Conjunction

– The expression 1(x1,...,xn)2(x1,...,xn) is satisfied for a given sequence of values x1,...,xn iff it is satisfied for both 1(x1,...,xn) and 2(x1,...,xn)

– It is equivalent to ((1(x1,...,xn))(2(x1,...,xn)))• The parenthesis describe the order in which operators are applied in the

composed expression when it is not obvious

• Implication– Written as 1(x1,...,xn)2(x1,...,xn)

– This expression is equivalent to ((1(x1,...,xn))(2(x1,...,xn)))

• Equivalence– Written as 1(x1,...,xn)2(x1,...,xn)– This expression is equivalent to

(1(x1,...,xn)2(x1,...,xn))(2(x1,...,xn)1(x1,...,xn))

Page 8: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

The Existential Quantifier

• If (x1,...,xn) is a valid expression, then (x2,...,xn)=x1(x1,...,xn) is a valid expression whose free variables are x2,...,xn

– The meaning of this expression is that (2,...,n) is satisfied iff there exists a value 1 for which (1,...,n) is satisfied

• Example:– For attributes whose value ranges are the natural numbers,

(x2,x3)=x1 ((x2 > x1) (x1 > x3) is satisfied iff x2,x3 are a pair of decreasing non-consecutive numbers

Page 9: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

The Universal Quantifier

• Given a valid expression (x1,...,xn), the expression (x2,...,xn)=x1(x1,...,xn) has the following meaning: for any given sequence of values 2,...,n, (2,..., n) is satisfied iff for every possible value 1, (1,..., n) is satisfied

• How do we define “every possible value”?– Naturally, if the set of possible values is infinite, verifying such an

expressions becomes problematic…– For now, we assume that this is restricted to the range of possible values for

the corresponding attribute– If the range is not clear, we need to state it explicitly

– A similar problem also exists for due to negation. In particular, x1(x1,...,xn) is equivalent to x1(x1,...,xn)

Page 10: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Quantified and Free Variables

• Consider the formula (y,z)=(x1(x,y))(x2(x,z)) – The free variables here are y and Z– What about x?

• x appears twice, once on each side of the operator. Yet, there is no relation between these occurances

• In fact, it is equivalent to the formula (y,z)=(x11(x1,y))(x22(x2,z))

– Denote R1[A,B] the relation defined by 1 and R2[A,C] the relation defined by 2

• The equivalent expression in RA is (BR1)(CR2)

– To enforce the same value on both sides of the operator, the formula should be written (y,z)= x(1(x,y)2(x,z))• This is equivalent in RA to B,C ( R1 ⋈ R2)

Page 11: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Implementing RA Expressions in DRC

• Let (x1,...,xn) be a formula representing T[A1,...,An]– The relation A1,...,Am T[A1,...,An] (for m<n) is obtained by

{<x1,...,xm>:xm+1,...,xn(x1,...,xm,xm+1,...,xn)}

– To obtain T[A1,...,An] for a given expression, denote (x1,...,xn) the expression obtained from by replacing each Ai with Xi (for each 1in)• We get the following first order logic expression

T={<x1,...,xn> : (x1,...,xn) (x1,...,xn)}

Page 12: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Cartesian Product in DRC

• Let 1(x1,...,xn) be a formula representing T1[A1,...,An] and 2(x1,...,xm) be a formula representing T2[B1,...,Bm]

– We can represent T1xT2 by the following expression

T1T2= {<x1,...,xn,xn+1,...,xn+m> : 1(x1,...,xn)2(xn+1,...,xn+m)}

• Notice that we need to use different sets of variables for 1 and 2

Page 13: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Subtract, Union, Intersection

• Let 1(x1,...,xn) be a formula representing T1[A1,...,An] and 2(x1,...,xn) be a formula representing T2[A1,...,An]

• We have:– T1 \ T2 = {<x1,...,xn> : 1(x1,...,xn) 2(x1,...,xn)}

– T1T2 = {<x1,...,xn> : 1(x1,...,xn) 2(x1,...,xn)}

– T1T2 = {<x1,...,xn> : 1(x1,...,xn) 2(x1,...,xn)}

Page 14: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Composed Expressions in DRC

• Let 1(x1,...,xn,y1,...,ym) be a formula representing T1[A1,...,An,B1,...,Bm] and 2(y1,...,ym,z1,...,zk) a formula representing T2[B1,...,Bm,C1,...,Cc]– The natural join of T1 and T2 can be expressed as follows

T1 ⋈ T2 = {<x1,...,xn,y1,...,ym,z1,...,zk> : 1(x1,...,xn,y1,...,ym) 2(y1,...,ym,z1,...,zk)}

– The semi-join can be expressed asT1 ⋉ T2 = {<x1,...,xn,y1,...,ym> :

z1,...,zk (1(x1,...,xn,y1,...,ym) 2(y1,...,ym,z1,...,zk))}

Page 15: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Formal Equivalence Proof• In order to exemplify how equivalence proofs are structured, we

will formally prove that if T1 corresponds to 1(x1,...,xn,y1,...,ym) and T2 corresponds to 2(y1,...,ym,z1,...,zk) then for the expression

(x1,...,xn,y1,...,ym,z1,...,zk) = 1(x1,...,xn,y1,...,ym)2(y1,...,ym,z1,...,zk)

the following holds: T1⋈T2={<x1,...,zk>:(x1,...,zk)}

• To that end, we need to show that for every two relations T1 and T2

and every sequence of values t=(1,...,n,1,...,m,1,...,k):

– If tT1⋈T2 then (t) is satisfied

– If (t) is satisfied then tT1⋈T2

Page 16: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

First Direction

• If t=(1,...,n,1,...,m,1,...,k)T1⋈T2 then by the definition of join, (1,...,n,1,...,m)T1 meaning that 1(1,...,n,1,...,m) is satisfied and (1,...,m,1,...,k)T2

meaning that 2(1,...,m,1,...,k) is also satisfied

• Hence, 1(1,...,n,1,...,m)2(1,...,m,1,...,k) =(1,...,n,1,...,m,1,...,k) is

satisfied, as needed

Page 17: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Second Direction

• If (t)=1(1,...,n,1,...,m)2(1,...,m,1,...,k) is satisfied, then by the definition of conjunction, 1(1,...,n,1,...,m) is satisfied, meaning that (1,...,n,1,...,m) T1

• Similarly, 2(1,...,m,1,...,k) is satisfied, meaning that (1,...,m,1,...,k)T2

• Hence, by the definition of natural join T1⋈T2, we have (1,...,n,1,...,m,1,...,k)T1⋈T2, as needed

Page 18: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Boolean Queries

• If is a DRC expression without any free variables (but uses the database relations), then the query “{<>:}” is still meaningful

• The result will be a relation without any attributes– It is the empty relation if is not satisfied over the database– Otherwise, it is the relation that includes a single empty record

• We sometimes use the notation “” for such a query

• For example, the following query checks whether the relations R and S satisfy R⊆S: {<>: x1,...,xn (R(x1,...,xn) → S(x1,...,xn))}

Page 19: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Problematic Expressions

• Consider the DRC expression: {<x> : R(x)}– Can it be written in RA?– Can it be implemented in a table?

• How about the expression: {<x,y> : x=y} ?

• We should avoid extremely large (and in particular infinite) tables as well as queries (and sub-queries) that require addressing extremely large (and in particular infinite) domains– In other words, we would like to restrict ourselves to the content

of the DB and avoid being dependent on the domains

Page 20: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Safe DRC

• Safe-DRC is intended to ensure that all expressions can be implemented without having to scan entire domains– This is both in order to prepare the input and in order to evaluate

expressions with quantifiers

• A DRC expression is called safe if it satisfies some additional requirements that ensure that its result only depends on the relations in the database and not on the entire possible range of attribute values– Not every DRC expression has an equivalent Safe-DRC expression

• A more formal definition is given in the tutorial

Page 21: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

DRC – Summarizing Example• We are going to compare between a few RA and DRC queries

Station

HeightS_Name

S_Type

Line

L_Type

L_Num

Direction

ServesKm

Train

T_NumDays

Service

T_CategoryClass

Food

Gives

Arrives

Platform

D_TimeA_Time

Page 22: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

The Relevant Relations• Since in DRC the order of variables may impact the result, we will first

define the order of attributes for each relation:

– Station[S_Name, Height] Station_Type[S_Name,S_Type]

– Line[L_Num,Direction,L_Type] Serves[S_Name,L_Num,Direction,Km]

– Train[T_Num,Days] Service[T_Category,Class,Food]

– Gives[T_Num,T_Category,Class]

– Arrives[T_Num,S_Name,L_Num,Direction,A_Time,D_Time,Platform]

Page 23: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Sample Query in DRC

• Which stations are served on Line 1-south?– As in RA, we can extract the data from “Serves”

{<x> : y,z,w ( Serves(x,y,z,w) y=1 z=“south” )}– This translates directly to RA as

S_Name ((L_Num=1) (Direction=“south”) (Serves))

– Such an expression can be also written in a shortened manner as

{<x> : w Serves(x,1,“south”,w)}

Page 24: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Another Example

• Which lines have stations below sea level?– For every pair of variables y,z representing a line (L_Num,Direction)

for which there are stations, we can identify the stations using the variable x for which there is some w such that Serves(x,y,z,w) is true

– Hence, the expression {<y,z>:x,w(Serves(x,y,z,w)(x))}where (x) is satisfied iff x is a station below sea level

– For this, we have(x) = u(Station(x,u) (u < 0 ))

– Resulting in the following query{<y,z>:x,w(Serves(x,y,z,w) (u(Station(x,u) (u<0)))}

Page 25: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

The Other Direction• What does the following query means? {<x>:y1,y2(Station(y1,y2)

z1,z2,z3,z4,z5(Arrives(x,y1,z1,z2,z3,z4,z5)))}

– The sub-expression z1,z2,z3,z4,z5(Arrives(x,y1,z1,z2,z3,z4,z5)) represents that train number x stops in station called y1 of a certain line number, direction, etc.

– The sub-expression around the sign (without the quantifiers) indicates that if (y1,y2) is a station then x is a train that stops there

– Hence, the complete expression gives the train numbers stopping in all stations• Always? How can this be fixed?

– Personal exercise: write an expression that will only take into account stations in which trains stop at

Page 26: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

A More Complex Example

• What are the lines that share a change station?– First, here is an expression for obtaining all change stations

1(x)=y(Station(x,y) Station_Type(x,“change”))

– Next, we locate pairs of lines appearing with the same change station 2(u1,v1,u2,v2)=

x,w1,w2(1(x)Serves(x,u1,v1,w1)Serves(x,u2,v2,w2))

– The complete query, including line types, is then {<u1,v1,t1,u2,v2,t2> :

Line(u1,v1,t1) Line(u2,v2,t2) 2(u1,v1,u2,v2) (u1≠u2)}

Page 27: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Tuple Relational Calculus (TRC)

Page 28: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

TRC

• The main difference between TRC and DRC is that in TRC variables’ value are complete records rather than a single attribute

• Hence, to define a query that returns a relation, we need an expression (t) that has a single free variable (although it may include sub-expressions that have quantified variables)

• The general query structure is {t[A1,...,An]:(t)}, which returns a relation with attributes A1,...,An including all possible records having corresponding attributes for which is satisfied

Page 29: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Atomic Formulas in TRC

• Belongs to– For a relation R[A1,...,An] and variable t, the atomic formula

“tR” is satisfied iff the value of t is a record inside R

• Comparisons– We denote t[A] the value of attribute A in the variable t– For example, “t[A]=2” is satisfied if that value of A in t is 2– “r[B]<s[C]” is satisfied if the values of the corresponding

attributes of s and r satisfy this condition

Page 30: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Composed Formulas in TRC• Boolean Formulas

– These are written in TRC exactly the same as in DRC and have the same meaning– For example, the formula 1(t1,...,tk)2(t1,...,tk) is satisfied iff both 1 and 2 are

satisfied with respect to t1,...,tk • Recall that here each ti is a complete record (tuple)

• Quantifiers– Are also written as expected– For example, t1 (t1,...,tk) is satisfied iff there is a possible record (regardless of

whether it is in the database or not) for which (t1,...,tk) is satisfied – here t2,...,tk

are free variables– Yet, here the quantifier applies to complete records (lines) rather than single

attributes– When it is not clear from the context, we write the attributes of the quantified

variable explicitly, e.g., t1[A1,…,An] (t1,...,tk)

Page 31: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

Sample Queries in TRC

• In the train operation example, what is the formula for “which stations are served by line 1-south”?– {t[S_Name] : s[S_Name,L_Num,Direction,Km]

( sServes t[S_Name]=s[S_Name] s[L_Num]=1 s[Direction]=“south” )}

• What about “which lines serve stations below sea level”?– {t[L_Num,Direction] : r,s (rStation sServes

s[L_Num]=t[L_Num] s[Direction]=t[Direction] r[S_Name]=s[S_Name] r[Height]<0 )}

Page 32: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

The Expressive Power of TRC and DRC

• Every query that can be expressed in TRC can also be expressed in DRC and vice versa

• In particular, TRC also has the problem that it can be used to define queries that cannot be implemented in a database

• Similarly to DRC, TRC also has a safe version– Safe-TRC and Safe-DRC have the same expressive power,

i.e., any formula written by Safe-TRC can be expressed in Safe-DRC and vice versa

Page 33: Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in

The Expressive Power of Relational Calculus

• The expressive power of relational calculus is at least as strong as RA since as we saw for DRC (and it is true also for TRC), any query in RA can be expressed in DRC

• Yet, there are expressions in DRC (and TRC) that cannot be expressed in RA– Although DRC enables writing expressions that cannot be implemented in a database

• Safe relational calculus (both Safe-DRC and Safe-TRC) as studied in the tutorials have the same expressive power as RA

• Relational calculus (both safe and unsafe) cannot be used for expressing transitive closure– E.g., “What are all the stations that can be reached from station S in a finite number

of train changes?”