Relational Algebra & Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 16, 2004 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan


Relational Algebra & Calculus

Zachary G. Ives, University of Pennsylvania

CIS 550 – Database & Information Systems

September 16, 2004

Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan


Administrivia

Homework 1 handed out
  Will be due in 1 week, unless otherwise announced

Focus: relational algebra and calculus


Example Data Instance

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin

Takes
sid  exp-grade  cid
1    A          550-0103
1    A          700-1003
3    C          500-0103

COURSE
cid       subj  sem
550-0103  DB    F03
700-1003  AI    S03
501-0103  Arch  F03

PROFESSOR
fid  name
1    Ives
2    Saul
8    Roth

Teaches
fid  cid
1    550-0103
2    700-1003
8    501-0103


Codd’s Relational Algebra

A set of mathematical operators that compose, modify, and combine tuples within different relations

Relational algebra operations operate on relations and produce relations (“closure”)
  f: Relation → Relation
  f: Relation × Relation → Relation


A Set of Logical Operations: The Relational Algebra

Six basic operations:
  Projection    π_α(R)
  Selection     σ_θ(R)
  Union         R1 ∪ R2
  Difference    R1 – R2
  Product       R1 × R2
  (Rename)      ρ_{α→β}(R)

And some other useful ones:
  Join          R1 ⋈ R2
  Semijoin      R1 ⊲ R2
  Intersection  R1 ∩ R2
  Division      R1 ÷ R2
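The basic operators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a relation is modeled as a pair (schema, rows), and the helper names project, select, union, difference, product, and rename are hypothetical, not part of the course material. The data is the operator-example instance from the slides.

```python
# Illustrative model (an assumption, not the slides' notation):
# a relation is a pair (schema, rows), where schema is a tuple of attribute
# names and rows is a set of equal-length tuples.

def project(rel, attrs):
    """pi_attrs(rel): keep only the listed attributes; duplicates collapse."""
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    return (tuple(attrs), {tuple(r[i] for i in idx) for r in rows})

def select(rel, pred):
    """sigma_pred(rel): keep rows whose attribute dict satisfies pred."""
    schema, rows = rel
    return (schema, {r for r in rows if pred(dict(zip(schema, r)))})

def union(r1, r2):
    """r1 UNION r2: both inputs must share a schema."""
    assert r1[0] == r2[0], "union needs identical schemas"
    return (r1[0], r1[1] | r2[1])

def difference(r1, r2):
    """r1 - r2: both inputs must share a schema."""
    assert r1[0] == r2[0], "difference needs identical schemas"
    return (r1[0], r1[1] - r2[1])

def product(r1, r2):
    """r1 x r2: attribute names are assumed to be disjoint."""
    assert not set(r1[0]) & set(r2[0]), "rename first to avoid name clashes"
    return (r1[0] + r2[0], {a + b for a in r1[1] for b in r2[1]})

def rename(rel, new_schema):
    """rho: same rows, new attribute list."""
    schema, rows = rel
    assert len(schema) == len(new_schema)
    return (tuple(new_schema), rows)

STUDENT = (("sid", "name"), {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")})
PROFESSOR = (("fid", "name"), {(1, "Ives"), (2, "Saul"), (8, "Roth")})
TAKES = (("sid", "exp-grade", "cid"), {
    (1, "A", "550-0103"), (1, "A", "700-1003"), (3, "A", "700-1003"),
    (3, "C", "500-0103"), (4, "C", "500-0103"),
})

# sids of students expecting an A: projection over a selection.
expecting_a = project(select(TAKES, lambda t: t["exp-grade"] == "A"), ["sid"])

# sids that appear in STUDENT but never in Takes: difference of projections.
not_enrolled = difference(project(STUDENT, ["sid"]), project(TAKES, ["sid"]))

# All ids in use: union requires matching schemas, so rename both sides.
all_ids = union(rename(project(STUDENT, ["sid"]), ["id"]),
                rename(project(PROFESSOR, ["fid"]), ["id"]))

# Every student paired with every distinct course id seen in Takes.
pairs = product(STUDENT, project(TAKES, ["cid"]))
```

Each function takes relations and returns a relation, which is the closure property: any result can be fed straight into another operator.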


Data Instance for Operator Examples

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

Takes
sid  exp-grade  cid
1    A          550-0103
1    A          700-1003
3    A          700-1003
3    C          500-0103
4    C          500-0103

COURSE
cid       subj  sem
550-0103  DB    F03
700-1003  AI    S03
501-0103  Arch  F03

PROFESSOR
fid  name
1    Ives
2    Saul
8    Roth

Teaches
fid  cid
1    550-0103
2    700-1003
8    501-0103


Projection, π


Selection, σ


Product, ×


Join, ⋈: A Combination of Product and Selection
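The slide title can be read literally: a join is a product followed by a selection. A minimal sketch, assuming relations are lists of dicts and using a hypothetical helper theta_join that prefixes attribute names with "l." and "r." so both cid columns survive the product:

```python
from itertools import product as cartesian

TEACHES = [
    {"fid": 1, "cid": "550-0103"},
    {"fid": 2, "cid": "700-1003"},
    {"fid": 8, "cid": "501-0103"},
]
COURSE = [
    {"cid": "550-0103", "subj": "DB", "sem": "F03"},
    {"cid": "700-1003", "subj": "AI", "sem": "S03"},
    {"cid": "501-0103", "subj": "Arch", "sem": "F03"},
]

def theta_join(r1, r2, cond):
    """R1 join_cond R2, computed literally as sigma_cond(R1 x R2)."""
    result = []
    for left, right in cartesian(r1, r2):
        # Prefix attribute names so that both cid columns stay distinct.
        row = {f"l.{k}": v for k, v in left.items()}
        row.update({f"r.{k}": v for k, v in right.items()})
        if cond(row):
            result.append(row)
    return result

# Teaches joined with COURSE on equal course ids.
taught = theta_join(TEACHES, COURSE, lambda t: t["l.cid"] == t["r.cid"])
fid_subject = {(t["l.fid"], t["r.subj"]) for t in taught}
```

This form enumerates every pair of tuples before filtering; a real engine would usually pick a cheaper algorithm that yields the same relation.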


Union, ∪


Difference –


Rename, ρ

The rename operator can be expressed several ways:
  The book has a very odd definition that’s not algebraic
  An alternate definition: ρ_{α→β}(x)
    Takes the relation x with schema α
    Returns a relation with the attribute list β

Rename isn’t all that useful, except if you join a relation with itself

Why would it be useful here?
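One way to see why rename matters for self-joins: joining Takes with itself needs two distinct names for each attribute. The sketch below is an assumption-laden illustration (the helper rename and the attribute names sid1, sid2, grade1, grade2, cid2 are hypothetical); it finds pairs of students who share a course.

```python
SCHEMA = ("sid", "exp-grade", "cid")
TAKES = [
    (1, "A", "550-0103"), (1, "A", "700-1003"), (3, "A", "700-1003"),
    (3, "C", "500-0103"), (4, "C", "500-0103"),
]

def rename(schema, rows, mapping):
    """rho: return the rows as dicts keyed by the renamed attribute list."""
    new_schema = [mapping.get(a, a) for a in schema]
    return [dict(zip(new_schema, r)) for r in rows]

# Two copies of Takes whose attribute names no longer clash.
left = rename(SCHEMA, TAKES, {"sid": "sid1", "exp-grade": "grade1"})
right = rename(SCHEMA, TAKES, {"sid": "sid2", "exp-grade": "grade2", "cid": "cid2"})

# Product plus selection: same course, and sid1 < sid2 to list each pair once.
classmates = sorted(
    {(l["sid1"], r["sid2"], l["cid"])
     for l in left for r in right
     if l["cid"] == r["cid2"] and l["sid1"] < r["sid2"]}
)
```

Without the rename step, the condition could not say which copy of sid or cid it refers to.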


Mini-Quiz

This completes the basic operations of the relational algebra. We shall soon find out in what sense this is an adequate set of operations. Try writing queries for these:
  The names of students named “Bob”
  The names of students expecting an “A”
  The names of students in Amir Roth’s 501 class
  The sids and names of students not enrolled


Deriving Intersection

Intersection: as with set operations, derivable from difference

[Figure: Venn diagram of A and B, showing the regions A – B and B – A]

A ∩ B ≡ (A ∪ B) – (A – B) – (B – A) ≡ A – (A – B)
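A quick check of the derivation with Python sets. The sets A and B are arbitrary illustrative values; the second form, A – (A – B), is the standard difference-only identity.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

# Remove from the union everything that belongs to only one side.
via_union = (A | B) - (A - B) - (B - A)

# Difference alone suffices: drop from A whatever is not also in B.
via_difference = A - (A - B)
```

Both expressions use only union and difference, so intersection adds convenience but no expressive power.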


Division

A somewhat messy operation that can be expressed in terms of the operations we have already defined

Used to express queries such as “The fid's of faculty who have taught all subjects”

Paraphrased: “The fid’s of professors for which there does not exist a subject that they haven’t taught”


Division Using Our Existing Operators

All possible teaching assignments: Allpairs:
  π_{fid,subj}(PROFESSOR × π_subj(COURSE))

NotTaught, all (fid,subj) pairs for which professor fid has not taught subj:
  Allpairs – π_{fid,subj}(Teaches ⋈ COURSE)

Answer is all faculty not in NotTaught:
  π_fid(PROFESSOR) – π_fid(NotTaught)
  ≡ π_fid(PROFESSOR) – π_fid(π_{fid,subj}(PROFESSOR × π_subj(COURSE)) – π_{fid,subj}(Teaches ⋈ COURSE))
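The three steps (Allpairs, NotTaught, answer) can be traced in Python. This is a sketch, not the course's implementation: the function name taught_all_subjects and the EXTRA tuples are hypothetical, added only so that one professor actually covers every subject.

```python
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}
COURSE = {("550-0103", "DB", "F03"), ("700-1003", "AI", "S03"), ("501-0103", "Arch", "F03")}
TEACHES = {(1, "550-0103"), (2, "700-1003"), (8, "501-0103")}

# Hypothetical extra assignments (not in the slides): professor 1 teaches everything.
EXTRA = {(1, "700-1003"), (1, "501-0103")}

def taught_all_subjects(teaches):
    fids = {fid for (fid, _name) in PROFESSOR}
    subjects = {subj for (_cid, subj, _sem) in COURSE}
    # Allpairs: every conceivable (fid, subj) teaching assignment.
    allpairs = {(f, s) for f in fids for s in subjects}
    # (fid, subj) pairs that actually occur in Teaches joined with COURSE.
    taught = {(f, subj) for (f, c) in teaches
              for (cid, subj, _sem) in COURSE if c == cid}
    # NotTaught: assignments that never happened.
    not_taught = allpairs - taught
    # Answer: professors with no missing subject.
    return fids - {f for (f, _s) in not_taught}

on_slide_data = taught_all_subjects(TEACHES)
with_extra = taught_all_subjects(TEACHES | EXTRA)
```

On the slide instance each professor teaches exactly one subject, so the answer is empty; with the extra tuples only fid 1 remains.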


Division: R1 ÷ R2

Requirement: schema(R1) ⊃ schema(R2)
Result schema: schema(R1) – schema(R2)
“Professors who have taught all courses”:
  π_fid(π_{fid,subj}(Teaches ⋈ COURSE) ÷ π_subj(COURSE))

What about “Courses that have been taught by all faculty”?


The Big Picture: SQL to Algebra to Query Plan to Web Page

SELECT *
FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sID AND Takes.cID = COURSE.cid

[Figure: system components: Web Server / UI / etc., Optimizer, Execution Engine, Storage Subsystem. Query plan (an operator tree): STUDENT, Takes, and COURSE combined by Merge and Hash operators, each labeled “by cid”.]


Hint of Future Things: Optimization Is Based on Algebraic Equivalences

Relational algebra has laws of commutativity, associativity, etc. that imply certain expressions are equivalent in semantics

They may be different in cost of evaluation!

σ_{c ∨ d}(R) ≡ σ_c(R) ∪ σ_d(R)

σ_c(R1 × R2) ≡ R1 ⋈_c R2

σ_{c ∧ d}(R) ≡ σ_c(σ_d(R))

Query optimization finds the most efficient representation to evaluate (or one that’s not bad)
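The point that equivalent expressions can differ in cost can be made concrete on the slide data. The sketch below is illustrative only: the predicates c and d, the helper names, and the use of a comparison count as a stand-in for cost are all assumptions. It checks the disjunction law, the cascade of selections for a conjunction, and compares product-then-select against a hash-based join.

```python
TAKES = {
    (1, "A", "550-0103"), (1, "A", "700-1003"), (3, "A", "700-1003"),
    (3, "C", "500-0103"), (4, "C", "500-0103"),
}
COURSE = {("550-0103", "DB", "F03"), ("700-1003", "AI", "S03"), ("501-0103", "Arch", "F03")}

def c(t):
    return t[1] == "A"   # expected grade is A

def d(t):
    return t[0] == 3     # student 3

def sel(pred, rel):
    return {t for t in rel if pred(t)}

# Selection on a disjunction equals the union of the two selections.
disj_lhs = sel(lambda t: c(t) or d(t), TAKES)
disj_rhs = sel(c, TAKES) | sel(d, TAKES)

# Selection on a conjunction equals a cascade of selections.
conj_lhs = sel(lambda t: c(t) and d(t), TAKES)
conj_rhs = sel(c, sel(d, TAKES))

def product_then_select(r1, r2):
    """sigma_{r1.cid = r2.cid}(r1 x r2): inspects every pair of tuples."""
    out, comparisons = set(), 0
    for a in r1:
        for b in r2:
            comparisons += 1
            if a[2] == b[0]:
                out.add(a + b)
    return out, comparisons

def hash_join(r1, r2):
    """Same relation via a hash index on r2.cid: one probe per r1 tuple."""
    index = {}
    for b in r2:
        index.setdefault(b[0], []).append(b)
    out, probes = set(), 0
    for a in r1:
        probes += 1
        for b in index.get(a[2], []):
            out.add(a + b)
    return out, probes

slow, n_slow = product_then_select(TAKES, COURSE)
fast, n_fast = hash_join(TAKES, COURSE)
```

Both join strategies return the same three tuples, but the first inspects 15 pairs while the second performs 5 probes; an optimizer exploits exactly this kind of gap.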


Switching Gears: An Equivalent, But Very Different, Formalism

Codd invented a relational calculus that he proved was equivalent in expressiveness
  Based on a subset of first-order logic – declarative, without an implicit order of evaluation
    Tuple relational calculus
    Domain relational calculus

More convenient for describing certain things, and for certain kinds of manipulations

The database uses the relational algebra internally

But query languages (e.g., SQL) are mostly based on the relational calculus


Domain Relational Calculus

Queries have form:

{<x1,x2, …, xn> | p}

where x1, x2, …, xn are domain variables and p is a predicate

Predicate: boolean expression over x1, x2, …, xn
  Precise operations depend on the domain and query language – may include special functions, etc.
  Assume the following at minimum:
    <xi,xj,…> ∈ R
    X op Y
    X op const
    const op X
  where op is =, ≠, <, >, ≤, ≥ and xi, xj, … are domain variables


More Complex Predicates

Starting with these atomic predicates, build up new predicates by the following rules:
  Logical connectives: If p and q are predicates, then so are p ∧ q, p ∨ q, ¬p, and p → q
    (x>2) ∧ (x<4)
    (x>2) → (x>0)
  Existential quantification: If p is a predicate, then so is ∃x.p
    ∃x. (x>2) ∧ (x<4)
  Universal quantification: If p is a predicate, then so is ∀x.p
    ∀x. x>2
    ∀x. ∃y. y>x


Some Examples

Faculty ids
Subjects for courses with students expecting a “C”
All course numbers for which there exists a smaller course number
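Set comprehensions read much like domain relational calculus: the loop variables play the role of domain variables, and any(...) plays the role of ∃. The following sketch evaluates the three example queries on the operator-example instance; the function name subjects_with_grade is hypothetical, and course numbers are compared as strings.

```python
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}
COURSE = {("550-0103", "DB", "F03"), ("700-1003", "AI", "S03"), ("501-0103", "Arch", "F03")}
TAKES = {
    (1, "A", "550-0103"), (1, "A", "700-1003"), (3, "A", "700-1003"),
    (3, "C", "500-0103"), (4, "C", "500-0103"),
}

# {<fid> | exists name. <fid,name> in PROFESSOR}
faculty_ids = {fid for (fid, _name) in PROFESSOR}

# {<subj> | exists cid,sem,sid. <cid,subj,sem> in COURSE and <sid,grade,cid> in Takes}
def subjects_with_grade(grade):
    return {subj
            for (cid, subj, _sem) in COURSE
            for (_sid, g, cid2) in TAKES
            if g == grade and cid2 == cid}

# {<cid> | exists subj,sem. <cid,subj,sem> in COURSE
#          and exists cid2,s2,m2. <cid2,s2,m2> in COURSE and cid2 < cid}
has_smaller = {cid for (cid, _s, _m) in COURSE
               if any(other < cid for (other, _s2, _m2) in COURSE)}

# On the instance as printed, the "C" rows in Takes use cid 500-0103, which
# has no matching COURSE row (COURSE lists 501-0103), so that query is empty.
subjects_c = subjects_with_grade("C")
subjects_a = subjects_with_grade("A")
```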


Logical Equivalences

There are two logical equivalences that will be heavily used:
  p → q ≡ ¬p ∨ q
    (Whenever p is true, q must also be true.)
  ∀x. p(x) ≡ ¬∃x. ¬p(x)
    (p is true for all x)
The second can be a lot easier to check!

Example: The highest course number offered
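Both equivalences can be checked mechanically. In this sketch (names such as implies and CIDS are illustrative assumptions), the implication rule is tested over its full truth table, and the highest course number is computed twice: once with a universal quantifier and once as "no counterexample exists".

```python
from itertools import product

def implies(p, q):
    """p -> q rewritten as (not p) or q."""
    return (not p) or q

# Compare against the defining truth table of implication.
truth_table_ok = all(
    implies(p, q) == (q if p else True)
    for p, q in product([False, True], repeat=2)
)

CIDS = {"550-0103", "700-1003", "501-0103"}

# forall d. d <= c
highest_forall = {c for c in CIDS if all(d <= c for d in CIDS)}

# not exists d. not (d <= c)
highest_not_exists = {c for c in CIDS if not any(not (d <= c) for d in CIDS)}
```

The second form is the one that maps onto query languages that offer an existence test but no direct universal quantifier.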


Free and Bound Variables

A variable v is bound in a predicate p when p is of the form ∀v… or ∃v…

A variable occurs free in p if it occurs in a position where it is not bound by an enclosing ∀ or ∃

Examples:
  x is free in x > 2
  x is bound in ∃x. x > y


Can Rename Bound Variables Only

When a variable is bound one can replace it with some other variable without altering the meaning of the expression, providing there are no name clashes

Example: ∃x. x > 2 is equivalent to ∃y. y > 2

Otherwise, the variable is defined outside our “scope”…


Safety

Pitfall in what we have done so far – how do we interpret:
  {<sid,name> | <sid,name> ∉ STUDENT}
Set of all binary tuples that are not students: an infinite set (and unsafe query)

A query is safe if, no matter how we instantiate the relations, it always produces a finite answer
  Domain independent: the answer is the same regardless of the domain in which it is evaluated
  Unfortunately, both this definition of safety and domain independence are semantic conditions, and are undecidable
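Domain dependence is easy to observe once the domain is made an explicit parameter. In this sketch the two finite domains are arbitrary illustrative choices (a real domain would be infinite), and the function names are hypothetical. The negated query changes with the domain; a query guarded by membership in STUDENT does not.

```python
STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin")}

def non_students(sid_domain, name_domain):
    """{<sid,name> | <sid,name> not in STUDENT}, evaluated over a given domain."""
    return {(s, n) for s in sid_domain for n in name_domain
            if (s, n) not in STUDENT}

def students_after_first(sid_domain, name_domain):
    """{<sid,name> | <sid,name> in STUDENT and sid > 1}: guarded by a relation."""
    return {(s, n) for s in sid_domain for n in name_domain
            if (s, n) in STUDENT and s > 1}

small = ({1, 2, 3}, {"Jill", "Qun", "Nitin"})
large = (range(1, 6), {"Jill", "Qun", "Nitin", "Bob"})

unsafe_small = non_students(*small)
unsafe_large = non_students(*large)
safe_small = students_after_first(*small)
safe_large = students_after_first(*large)
```

The unsafe answer grows from 6 to 17 tuples as the domain grows, while the guarded query returns the same two tuples either way.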


Safety and Termination Guarantees

There are syntactic conditions that are used to guarantee “safe” formulas
  The definition is complicated, and we won’t discuss it; you can find it in Ullman’s Principles of Database and Knowledge-Base Systems

The formulas that are expressible in real query languages based on relational calculus are all “safe”

Many DB languages include additional features, like recursion, that must be restricted in certain ways to guarantee termination and consistent answers


Mini-Quiz

How do you write:
  Which students have taken more than one course from the same professor?


Translating from RA to DRC

Core of relational algebra: π, σ, ∪, ×, −
We need to work our way through the structure of an RA expression, translating each possible form.
  Let TR[e] be the translation of RA expression e into DRC.

Relation names: For the RA expression R, the DRC expression is
  {<x1,x2, …, xn> | <x1,x2, …, xn> ∈ R}


Selection: TR[σ_θ R]

Suppose we have σ_θ(e’), where e’ is another RA expression that translates as:
  TR[e’] = {<x1,x2, …, xn> | p}
Then the translation of σ_θ(e’) is
  {<x1,x2, …, xn> | p ∧ θ’}
where θ’ is obtained from θ by replacing each attribute with the corresponding variable

Example: TR[σ_{#1=#2 ∧ #4>2.5} R] (if R has arity 4) is
  {<x1,x2,x3,x4> | <x1,x2,x3,x4> ∈ R ∧ x1=x2 ∧ x4>2.5}


Projection: TR[π_{i1,…,im}(e)]

If TR[e] = {<x1,x2, …, xn> | p} then
  TR[π_{i1,i2,…,im}(e)] = {<x_{i1}, x_{i2}, …, x_{im}> | ∃ x_{j1}, x_{j2}, …, x_{jk}. p},
where x_{j1}, x_{j2}, …, x_{jk} are the variables in x1, x2, …, xn that are not in x_{i1}, x_{i2}, …, x_{im}

Example: With R as before,
  π_{#1,#3}(R) = {<x1,x3> | ∃x2,x4. <x1,x2,x3,x4> ∈ R}


Union: TR[R1 ∪ R2]

R1 and R2 must have the same arity
For e1 ∪ e2, where e1, e2 are algebra expressions:
  TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<y1,…,yn> | q}
Relabel the variables in the second:
  TR[e2] = {<x1,…,xn> | q’}
This may involve relabeling bound variables in q to avoid clashes
  TR[e1 ∪ e2] = {<x1,…,xn> | p ∨ q’}

Example: TR[R1 ∪ R2] = {<x1,x2,x3,x4> | <x1,x2,x3,x4> ∈ R1 ∨ <x1,x2,x3,x4> ∈ R2}


Other Binary Operators

Difference: The same conditions hold as for union
  If TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<x1,…,xn> | q}
  Then TR[e1 − e2] = {<x1,…,xn> | p ∧ ¬q}

Product:
  If TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<y1,…,ym> | q}
  Then TR[e1 × e2] = {<x1,…,xn, y1,…,ym> | p ∧ q}

Example: TR[R × S] = {<x1,…,xn, y1,…,ym> | <x1,…,xn> ∈ R ∧ <y1,…,ym> ∈ S}
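The translation rules can be sanity-checked on a small instance by evaluating the DRC form by brute force over the active domain and comparing it with the algebra result. Everything here is an illustrative assumption: R is a made-up arity-4 relation of integers, and the active domain (all values that occur in R) stands in for the real domain, which is sound only because both queries are safe.

```python
from itertools import product

# Hypothetical arity-4 relation used only for this check.
R = {(1, 1, 7, 3), (1, 2, 8, 3), (2, 2, 9, 1)}
adom = {v for row in R for v in row}   # active domain

# Selection sigma_{#1=#2 and #4>2.5}(R), algebra side.
sel_ra = {r for r in R if r[0] == r[1] and r[3] > 2.5}

# DRC side: {<x1,x2,x3,x4> | <x1,x2,x3,x4> in R and x1=x2 and x4>2.5}
sel_drc = {
    (x1, x2, x3, x4)
    for x1, x2, x3, x4 in product(adom, repeat=4)
    if (x1, x2, x3, x4) in R and x1 == x2 and x4 > 2.5
}

# Projection pi_{#1,#3}(R), algebra side.
proj_ra = {(r[0], r[2]) for r in R}

# DRC side: {<x1,x3> | exists x2,x4. <x1,x2,x3,x4> in R}
proj_drc = {
    (x1, x3)
    for x1 in adom for x3 in adom
    if any((x1, x2, x3, x4) in R for x2 in adom for x4 in adom)
}
```

The brute-force evaluation is exponential in the arity and is meant only to confirm that the two formalisms agree, not to suggest how a database evaluates calculus queries.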


What about the Tuple Relational Calculus?

We’ve been looking at the Domain Relational Calculus

The Tuple Relational Calculus is nearly the same, but variables are at the level of a tuple, not an attribute

{Q | ∃ S ∈ COURSE, ∃ T ∈ Takes (S.cid = T.cid ∧ Q.cid = S.cid ∧ Q.exp-grade = T.exp-grade)}
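In Python terms, tuple variables correspond to loop variables that range over whole rows. The sketch below evaluates the query above on the operator-example instance; the namedtuple types are illustrative assumptions, and exp-grade is spelled exp_grade because Python identifiers cannot contain a hyphen.

```python
from collections import namedtuple

Course = namedtuple("Course", ["cid", "subj", "sem"])
Take = namedtuple("Take", ["sid", "exp_grade", "cid"])
Q = namedtuple("Q", ["cid", "exp_grade"])

COURSE = {
    Course("550-0103", "DB", "F03"),
    Course("700-1003", "AI", "S03"),
    Course("501-0103", "Arch", "F03"),
}
TAKES = {
    Take(1, "A", "550-0103"), Take(1, "A", "700-1003"), Take(3, "A", "700-1003"),
    Take(3, "C", "500-0103"), Take(4, "C", "500-0103"),
}

# S and T range over whole tuples; the result tuple Q is built from their fields.
result = {Q(S.cid, T.exp_grade) for S in COURSE for T in TAKES if S.cid == T.cid}
```

Compared with the domain calculus, no per-attribute variables are needed; attributes are reached through the tuple variable, as in S.cid.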


Limitations of the Relational Algebra / Calculus

Can’t do:
  Aggregate operations
  Recursive queries
  Complex (non-tabular) structures

Most of these are expressible in SQL, OQL, XQuery – using other special operators

Sometimes we even need the power of a Turing-complete programming language


Summary

Can translate relational algebra into relational calculus
  DRC and TRC are slightly different syntaxes but equivalent

Given syntactic restrictions that guarantee safety of DRC query, can translate back to relational algebra

These are the principles behind initial development of relational databases
  SQL is close to calculus; query plan is close to algebra
  Great example of theory leading to practice!