37
CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Embed Size (px)

Citation preview

Page 1: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

CS 405G: Introduction to Database Systems

Lecture 7: Relational Algebra IIInstructor: Chen Qian

Spring 2014

Page 2: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

(1) 7 points. 0.5 for each mistake

04/18/23 2

Page 3: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

(2) 3 points. No. If starting and ending dates are recorded in multi-

valued attributes, we cannot determine which starting date is corresponding to which ending data.

04/18/23 3

Page 4: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Review: Summary of core operators

Selection: Projection: Cross product: Union: Difference: Renaming:

04/18/23 Chen Qian @ University of Kentucky

σp R

πL R

R X S

R SR - Sρ S(A1, A2, …) R

4

Page 5: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Review Summary of derived operators

Join: Natural join: Intersection:

04/18/23 Chen Qian @ University of Kentucky

R p S

R S

R S

Many more Outer join, Division, Semijoin, anti-semijoin, …

5

Page 6: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Using Join

What are the ids of Lisa’s classes? Student(sid: string, name: string, gpa: float) Course(cid: string, department: string) Enrolled(sid: string, cid: string, grade: character)

An Answer: Student_Lisa σname = “Lisa”Student Lisa_Enrolled Student_Lisa Enrolled Lisa’s classes πCID Lisa_Enrolled

Or: Student_Enrolled Student Enrolled Lisa_Enrolled σname = “Lisa” Student_Enrolled Lisa’s classes πCID Lisa_Enrolled

6

Page 7: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Join Example

sid name age gpa

1234 John 21 3.5

1123 Mary 22 3.8

1012 Lisa 22 2.6

sid name age gpa

1012 Lisa 22 2.6

sid cid grade

1123 108 A

1012 647 A

1012 108 B

sid name age gpa cid grade

1012 Lisa 22 2.6 647 A

1012 Lisa 22 2.6 108 B

σname=“Lisa”

πcid

cid

647

108

7

Page 8: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

IDs of Lisa’s Classes

π CID( (σname = “Lisa”Student) Enrolled)

Enroll

πCIDIDs of Lisa’s classes

Student

σname = “Lisa”

Who’s Lisa?

8

Page 9: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Students in Lisa’s Classes SID of Students in Lisa’s classes

Student_Lisa σname = “Lisa”Student Lisa_Enrolled Student_Lisa Enrolled Lisa’s classes πCID Lisa_Enrolled Enrollment in Lisa’s classes Lisa’s classes Enrolled Students in Lisa’s class πSID Enrollment in Lisa’s classes

Students inLisa’s classes

Enroll

πSID

Enroll

πCIDLisa’s classes

Student

σname = “Lisa”

Who’s Lisa?

9

Page 10: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Tips in Relational Algebra

Use temporary variables Use foreign keys to join tables

10

Page 11: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

An exercise

Names of students in Lisa’s classes

Students inLisa’s classes Student

πnameTheir names

Enroll

πSID

Enroll

πCIDLisa’s classes

Student

σname = “Lisa”

Who’s Lisa?

11

Page 12: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Set Minus (difference) Operation

CID’s of the courses that Lisa is NOT taking

CID’s of the coursesthat Lisa IS taking

All CID’s-

πCID

Course

Enroll

Student

σname = “Lisa”

πCID

12

Page 13: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Renaming Operation

Enrolled1(SID1, CID1,Grade1) Enrolled

sid cid grade

1234 647 A

1123 108 A

sid1 cid1 grade1

1234 647 A

1123 108 A

Enroll1(SID1,

CID1,Grade1)

13

Page 14: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Example

We have the following relational schemas Student(sid: string, name: string, gpa: float) Course(cid: string, department: string) Enrolled(sid: string, cid: string, grade: character)

SID’s of students who take at least two coursesEnrolled Enrolled

πSID (Enrolled Enrolled.SID = Enrolled.SID & Enrolled.CID Enrolled.CID Enrolled)

14

Page 15: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Example (cont.)

Enroll1(SID1, CID1,Grade1) Enrolled

Enroll2(SID2, CID2,Grade2) Enrolled

πSID (Enroll1 SID1 = SID2 & CID1 CID2Enroll2)

ρEnroll1(SID1, CID1,Grade1) ρEnroll2(SID2, CID2, Grade2)

Enroll Enroll

SID1 = SID2 & CID1 CID2

πSID1

Expression tree syntax:

15

Page 16: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

How does it work?

04/18/23 Chen Qian @ University of Kentucky

sid cid grade

1123 108 A

1012 647 A

1012 108 B

sid2 cid2 grade2

1123 108 A

1012 647 A

1012 108 B

Enroll1 SID1 = SID2 Enroll2

sid cid grade sid2 cid2 grade2

1123 108 A 1123 108 A

1012 647 A 1012 647 A

1012 647 A 1012 108 B

1012 108 B 1012 647 A

1012 108 B 1012 108 B

16

Page 17: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

sid cid grade

1123 108 A

1012 647 A

1012 108 B

sid2 cid2 grade2

1123 108 A

1012 647 A

1012 108 B

Enroll1 SID1 = SID2 & CID1 CID2Enroll2

sid cid grade sid2 cid2 grade2

1123 108 A 1123 108 A

1012 647 A 1012 647 A

1012 647 A 1012 108 B

1012 108 B 1012 647 A

1012 108 B 1012 108 B

17

Page 18: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Tips in Relational Algebra

A comparison is to identify a relationship

18

Page 19: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

A trickier exercise

Who has the highest GPA? Who has a GPA? Who does NOT have the highest GPA?

Whose GPA is lower than somebody else’s?

πSID

Student

-

StudentStudent

ρStudent1 ρStudent2

Student1.GPA < Student2.GPA

πStudent1.SID

A deeper question:When (and why) is “-” needed?

19

Page 20: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Review: Summary of core operators

Selection: Projection: Cross product: Union: Difference: Renaming:

σp R

πL R

R X S

R SR - Sρ S(A1, A2, …) R

20

Page 21: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Review: Summary of derived operators

Join: Natural join: Intersection:

R p S

R SR S

21

Page 22: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Review

Relational algebra Use temporary variable Use foreign key to join relations A comparison is to identify a relationship

22

Page 23: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Exercises of R. A.

sid sname rating age

22 dustin 7 45.0

31 lubber 8 55.558 rusty 10 35.0

bid bname color101 Interlake Blue102 Interlake Red103 Clipper Green104 Marine Red

sid bid day

22 101 10/10/9658 103 11/12/96

Reserves

SailorsBoats

Chen Qian @ University of Kentucky04/18/23 23

Page 24: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Problem 1 Find names of sailors who’ve reserved boat #103

Solution:

))Re((103

Sailorsservesbidsname

Sailors

πsnameWho reserved boat

#103?

Reserves

σbid = “103”Boat #103

Chen Qian @ University of Kentucky04/18/23 24

Page 25: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Information about boat color only available in Boats; so need an extra join:

Problem 2: Find names of sailors who’ve reserved a red boat

Names of sailors who

reserved red boat

Sailors

πsname

Reserve

πSIDWho reserved red

boats?

Boat

σcolor = “red”Red boats

Chen Qian @ University of Kentucky04/18/23 25

Page 26: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Problem 3: Find names of sailors who’ve reserved a red boat or a green boat

Can identify all red or green boats, then find sailors who’ve reserved one of these boats:

Names of sailors who

reserved red boat

Sailors

πsname

Reserve

πSIDWho reserved red boats?

Boat

σcolor = “red” color = “green”

Red boats

Chen Qian @ University of Kentucky04/18/23 26

Page 27: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Monotone operators

If some old output rows may need to be removed Then the operator is non-monotone

Otherwise the operator is monotone That is, old output rows always remain “correct” when

more rows are added to the input Formally, for a monotone operator op:

R µ R’ implies op( R ) µ op( R’ )

RelOpAdd more rows

to the input...

What happensto the output?

28

Page 28: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Why is “-” needed for highest GPA?

Composition of monotone operators produces a monotone query Old output rows remain “correct” when more rows are

added to the input Highest-GPA query is non-monotone

Current highest GPA is 4.1 Add another GPA 4.2 Old answer is invalidated

So it must use difference!

29

Page 29: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Classification of relational operators

Selection: σp R

Projection: πL R Cross product: R X S Join: R p S Natural join: R S Union: R U S Difference: R - S Intersection: R ∩ S

Monotone

Monotone

Monotone

Monotone

Monotone

MonotoneMonotone w.r.t. R; non-monotone w.r.t S

Monotone

30

Page 30: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Why do we need core operator X?

Cross product The only operator that adds columns

Difference The only non-monotone operator

Union The only operator that allows you to add rows?

Selection? Projection?

31

Page 31: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Aggregate Functions and Operations

Aggregation function takes a collection of values and returns a single value as a result.

avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values

Aggregate operation in relational algebra

G1, G2, …, Gn g F1( A1), F2( A2),…, Fn( An) (E) E is any relational-algebra expression G1, G2 …, Gn is a list of attributes on which to group (can be

empty) Each Fi is an aggregate function Each Ai is an attribute name

04/18/23 Chen Qian @ University of Kentucky 32

Page 32: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Aggregate Operation – Example

Relation r:

A B

C

7

7

3

10

g sum(c) (r)sum-C

27

04/18/23 Chen Qian @ University of Kentucky 33

Page 33: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Aggregate Operation – Example

Relation account grouped by branch-name:

branch-name g sum(balance) (account)

branch-name account-number balance

PerryridgePerryridgeBrightonBrightonRedwood

A-102A-201A-217A-215A-222

400900750750700

branch-name balance

PerryridgeBrightonRedwood

13001500700

04/18/23 Chen Qian @ University of Kentucky 34

Page 34: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Null Values

It is possible for tuples to have a null value, denoted by null, for some of their attributes

null signifies an unknown value or that a value does not exist.

The result of any arithmetic expression involving null is null.

Aggregate functions simply ignore null values For duplicate elimination and grouping, null is treated like

any other value, and two nulls are assumed to be the same

04/18/23 Chen Qian @ University of Kentucky 35

Page 35: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

Null Values

Comparisons with null values return the special truth value unknown If false was used instead of unknown, then not (A < 5)

would not be equivalent to A >= 5 Three-valued logic using the truth value unknown:

OR: (unknown or true) = true, (unknown or false) = unknown (unknown or unknown) = unknown

AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown

NOT: (not unknown) = unknown Result of select predicate is treated as false if it evaluates to

unknown

04/18/23 Chen Qian @ University of Kentucky 36

Page 36: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

04/18/23 Chen Qian @ University of Kentucky

Review

Expression tree Tips in writing R.A.

Use temporary variables Use foreign keys to join tables A comparison is to identify a relationship Use difference in non-monotonic results

37

Page 37: CS 405G: Introduction to Database Systems Lecture 7: Relational Algebra II Instructor: Chen Qian Spring 2014

How to write answers to a R.A. problem

Go ahead to write down a single expression as long as you think it is correct.

However, the followings are recommended: Draw an expression tree Write down any

intermediate expressions temporary variables renaming operations

Because then you can get partial credits!

04/18/23 Chen Qian @ University of Kentucky 38