69
CMU SCS Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications Lecture #5: Relational Algebra

Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

  • Upload
    reia

  • View
    26

  • Download
    0

Embed Size (px)

DESCRIPTION

Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications. Lecture #5: Relational Algebra. Overview. history concepts Formal query languages relational algebra rel. tuple calculus rel. domain calculus. History. before: records, pointers, sets etc - PowerPoint PPT Presentation

Citation preview

Page 1: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

Carnegie Mellon Univ.School of Computer Science

15-415 - Database Applications

Lecture #5: Relational Algebra

Page 2: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #2

Overview

• history• concepts• Formal query languages

– relational algebra– rel. tuple calculus– rel. domain calculus

Page 3: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #3

History

• before: records, pointers, sets etc• introduced by E.F. Codd in 1970• revolutionary!• first systems: 1977-8 (System R; Ingres) • Turing award in 1981

Page 4: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #4

Concepts - reminder

• Database: a set of relations (= tables)• rows: tuples• columns: attributes (or keys)• superkey, candidate key, primary key

Page 5: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #5

ExampleDatabase:

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

SSN c-id grade123 15-413 A234 15-413 B

Page 6: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #6

Example: cont’dDatabase:

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

rel. schema (attr+domains)

tuple

k-th attribute

(Dk domain)

Page 7: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #7

Example: cont’d

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

rel. schema (attr+domains)

instance

Page 8: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #8

Example: cont’d

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

rel. schema (attr+domains)

instance

• Di: the domain of the i-th attribute (eg., char(10)

Page 9: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #9

Overview

• history• concepts• Formal query languages

– relational algebra– rel. tuple calculus– rel. domain calculus

Page 10: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #10

Formal query languages

• How do we collect information?• Eg., find ssn’s of people in 415• (recall: everything is a set!)• One solution: Rel. algebra, ie., set operators • Q1: Which ones??• Q2: what is a minimal set of operators?

Page 11: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #11

• .• .• .• set union U • set difference ‘-’

Relational operators

Page 12: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #12

Example:

FT-STUDENTSsn Name

129 peters main str239 lee 5th ave

PT-STUDENTSsn Name Address

123 smith main str234 jones forbes ave

• Q: find all students (part or full time)• A: PT-STUDENT union FT-STUDENT

Page 13: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #13

Observations:

• two tables are ‘union compatible’ if they have the same attributes (‘domains’)

• Q: how about intersection U

Page 14: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #14

Observations:

• A: redundant:• STUDENT intersection STAFF =

STUDENT STAFF

Page 15: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #15

Observations:

• A: redundant:• STUDENT intersection STAFF =

STUDENT STAFF

Page 16: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #16

Observations:

• A: redundant:• STUDENT intersection STAFF = STUDENT - (STUDENT - STAFF)

STUDENT STAFF

Page 17: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #17

Observations:

• A: redundant:• STUDENT intersection STAFF = STUDENT - (STUDENT - STAFF)

Double negation:

We’ll see it again, later…

Page 18: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #18

• .• .• .• set union • set difference ‘-’

Relational operators

U

Page 19: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #19

Other operators?

• eg, find all students on ‘Main street’• A: ‘selection’

)('' STUDENTstrmainaddress

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

Page 20: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #20

Other operators?

• Notice: selection (and rest of operators) expect tables, and produce tables (-> can be cascaded!!)

• For selection, in general:

)(RELATIONcondition

Page 21: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #21

Selection - examples

• Find all ‘Smiths’ on ‘Forbes Ave’

)('''' STUDENTaveForbesaddressSmithname

‘condition’ can be any boolean combination of ‘=‘, ‘>’, ‘>=‘, ...

Page 22: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #22

• selection• .• .• set union • set difference R - S

Relational operators

)(Rcondition

R U S

Page 23: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #23

• selection picks rows - how about columns?• A: ‘projection’ - eg.: finds all the ‘ssn’ - removing duplicates

Relational operators

)(STUDENTssn

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

Page 24: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #24

Cascading: ‘find ssn of students on ‘forbes ave’

Relational operators

))(( '' STUDENTaveforbesaddressssn

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

Page 25: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #25

• selection• projection• .• set union • set difference R - S

Relational operators

)(Rcondition)(Rlistatt

R U S

Page 26: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #26

Are we done yet?Q: Give a query we can not answer yet!

Relational operators

Page 27: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #27

A: any query across two or more tables,eg., ‘find names of students in 15-415’

Q: what extra operator do we need??

Relational operators

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

TAKESSSN c-id grade

123 15-413 A234 15-413 B

Page 28: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #28

A: any query across two or more tables,eg., ‘find names of students in 15-415’

Q: what extra operator do we need??A: surprisingly, cartesian product is enough!

Relational operators

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

TAKESSSN c-id grade

123 15-413 A234 15-413 B

Page 29: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #29

Cartesian product

• eg., dog-breeding: MALE x FEMALE• gives all possible couples

MALEnamespikespot

FEMALEnamelassieshiba

x =M.name F.namespike lassiespike shibaspot lassiespot shiba

Page 30: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #30

so what?

• Eg., how do we find names of students taking 415?

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

SSN c-id grade123 15-415 A234 15-413 B

Page 31: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #31

Cartesian product

• A:

Ssn Name Address ssn cid grade123 smith main str 123 15-415 A234 jones forbes ave 123 15-415 A123 smith main str 234 15-413 B234 jones forbes ave 234 15-413 B

)(......... .. TAKESxSTUDENTssnTAKESssnSTUDENT

Page 32: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #32

Cartesian product

))((.. ..41515 TAKESxSTUDENTssnTAKESssnSTUDENTcid

Ssn Name Address ssn cid grade123 smith main str 123 15-415 A234 jones forbes ave 123 15-415 A123 smith main str 234 15-413 B234 jones forbes ave 234 15-413 B

Page 33: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #33

)))((

(

..41515 TAKESxSTUDENTssnTAKESssnSTUDENTcid

name

Ssn Name Address ssn cid grade123 smith main str 123 15-415 A234 jones forbes ave 123 15-415 A123 smith main str 234 15-413 B234 jones forbes ave 234 15-413 B

Page 34: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #34

• selection• projection• cartesian product MALE x FEMALE• set union • set difference R - S

FUNDAMENTALRelational operators

)(Rcondition)(Rlistatt

R U S

Page 35: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #35

Relational ops

• Surprisingly, they are enough, to help us answer almost any query we want!!

• derived/convenience operators:– set intersection– join (theta join, equi-join, natural join)– ‘rename’ operator– division

)(' RRSR

Page 36: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #36

Joins

• Equijoin: SR bSaR .. )(.. SRbSaR

Page 37: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #37

Cartesian product

• A: )(......... .. TAKESxSTUDENTssnTAKESssnSTUDENT

Ssn Name Address ssn cid grade123 smith main str 123 15-415 A234 jones forbes ave 123 15-415 A123 smith main str 234 15-413 B234 jones forbes ave 234 15-413 B

Page 38: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #38

Joins

• Equijoin: • theta-joins: generalization of equi-join - any condition

SR bSaR .. )(.. SRbSaR

SR

Page 39: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #39

Joins

• very popular: natural join: R S• like equi-join, but it drops duplicate

columns: STUDENT (ssn, name, address) TAKES (ssn, cid, grade)

Page 40: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #40

Joins

• nat. join has 5 attributes TAKESSTUDENT

TAKESSTUDENT ssnTAKESssnSTUDENT .. equi-join: 6

Ssn Name Address ssn cid grade123 smith main str 123 15-415 A234 jones forbes ave 123 15-415 A123 smith main str 234 15-413 B234 jones forbes ave 234 15-413 B

Page 41: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #41

Natural Joins - nit-picking

• if no attributes in common between R, S:nat. join -> cartesian product

Page 42: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #42

Overview - rel. algebra

• fundamental operators• derived operators

– joins etc– rename– division

• examples

Page 43: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #43

Rename op.

• Q: why?• A: shorthand; self-joins; …• for example, find the grand-parents of

‘Tom’, given PC (parent-id, child-id)

)(BEFOREAFTER

Page 44: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #44

Rename op.

• PC (parent-id, child-id) PCPC

PCp-id c-idMary TomPeter MaryJohn Tom

PCp-id c-idMary TomPeter MaryJohn Tom

Page 45: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #45

Rename op.

• first, WRONG attempt:

• (why? how many columns?)• Second WRONG attempt:

PCPC

PCPC idpPCidcPC ..

Page 46: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #46

Rename op.

• we clearly need two different names for the same table - hence, the ‘rename’ op.

PCPC idpPCidcPCPC ..11 )(

Page 47: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #47

Overview - rel. algebra

• fundamental operators• derived operators

– joins etc– rename– division

• examples

Page 48: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #48

Division

• Rarely used, but powerful.• Example: find suspicious suppliers, ie.,

suppliers that supplied all the parts in A_BOMB

Page 49: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #49

Division

SHIPMENTs# p#s1 p1s2 p1s1 p2s3 p1s5 p3

ABOMBp#p1p2

BAD_Ss#s1

Page 50: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #50

Division

• Observations: ~reverse of cartesian product• It can be derived from the 5 fundamental

operators (!!)• How?

Page 51: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #51

Division

• Answer:

• Observation: find ‘good’ suppliers, and subtract! (double negation)

]))([()( )()()( rsrrsr SRSRSR

Page 52: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #52

Division

• Answer:

• Observation: find ‘good’ suppliers, and subtract! (double negation)

]))([()( )()()( rsrrsr SRSRSR

SHIPMENTs# p#s1 p1s2 p1s1 p2s3 p1s5 p3

ABOMBp#p1p2

BAD_Ss#s1

Page 53: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #53

Division

• Answer:

]))([()( )()()( rsrrsr SRSRSR

SHIPMENTs# p#s1 p1s2 p1s1 p2s3 p1s5 p3

ABOMBp#p1p2

BAD_Ss#s1

All suppliers

All bad parts

Page 54: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #54

Division

• Answer:

]))([()( )()()( rsrrsr SRSRSR

SHIPMENTs# p#s1 p1s2 p1s1 p2s3 p1s5 p3

ABOMBp#p1p2

BAD_Ss#s1

all possible

suspicious shipments

Page 55: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #55

Division

• Answer:

]))([()( )()()( rsrrsr SRSRSR

SHIPMENTs# p#s1 p1s2 p1s1 p2s3 p1s5 p3

ABOMBp#p1p2

BAD_Ss#s1

all possible

suspicious shipments

that didn’t happen

Page 56: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #56

Division

• Answer:

]))([()( )()()( rsrrsr SRSRSR

SHIPMENTs# p#s1 p1s2 p1s1 p2s3 p1s5 p3

ABOMBp#p1p2

BAD_Ss#s1

all suppliers who missed

at least one suspicious shipment,

i.e.: ‘good’ suppliers

Page 57: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #57

Overview - rel. algebra

• fundamental operators• derived operators

– joins etc– rename– division

• examples

Page 58: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #58

Sample schema

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

CLASSc-id c-name units15-413 s.e. 215-412 o.s. 2

TAKESSSN c-id grade

123 15-413 A234 15-413 B

find names of students that take 15-415

Page 59: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #59

Examples

• find names of students that take 15-415

Page 60: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #60

Examples

• find names of students that take 15-415

)]([ 41515 TAKESSTUDENTidcname

Page 61: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #61

Sample schema

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

CLASSc-id c-name units15-413 s.e. 215-412 o.s. 2

TAKESSSN c-id grade

123 15-413 A234 15-413 B

find course names of ‘smith’

Page 62: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #62

Examples

• find course names of ‘smith’

)]

([ ''

CLASSTAKESSTUDENT

smithnamenamec

Page 63: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #63

Examples

• find ssn of ‘overworked’ students, ie., that take 412, 413, 415

Page 64: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #64

Examples

• find ssn of ‘overworked’ students, ie., that take 412, 413, 415: almost correct answer:

)()()(

415

413

412

TAKESTAKESTAKES

namec

namec

namec

Page 65: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #65

Examples

• find ssn of ‘overworked’ students, ie., that take 412, 413, 415 - Correct answer:

)]([)]([)]([

412

412

412

TAKESTAKESTAKES

namecssn

namecssn

namecssn

c-name=413

c-name=415

Page 66: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #66

Examples

• find ssn of students that work at least as hard as ssn=123, ie., they take all the courses of ssn=123, and maybe more

Page 67: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #67

Sample schema

STUDENTSsn Name Address

123 smith main str234 jones forbes ave

CLASSc-id c-name units15-413 s.e. 215-412 o.s. 2

TAKESSSN c-id grade

123 15-413 A234 15-413 B

Page 68: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #68

Examples

• find ssn of students that work at least as hard as ssn=123 (ie., they take all the courses of ssn=123, and maybe more

)]([)]([ 123, TAKESTAKES ssnidcidcssn

Page 69: Carnegie Mellon Univ. School of Computer Science 15-415 - Database Applications

CMU SCS

CMU SCS 15-415 Faloutsos #69

Conclusions

• Relational model: only tables (‘relations’)• relational algebra: powerful, minimal: 5

operators can handle almost any query!