31
1 Relational Algebra Lecture 5

Relational+algebra (1)

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Relational+algebra (1)

1

Relational Algebra

Lecture 5

Page 2: Relational+algebra (1)

2

Relational Query Languages

• Languages for describing queries on a relational database

• Structured Query Language (SQL)– Predominant application query language– Declarative

• Relational Algebra– Intermediate language within DBMS– Procedural

Page 3: Relational+algebra (1)

3

Algebra

• Based on operators and a domain of values• Operators map arguments from domain into

another domain value• Hence, an expression involving operators

and arguments produces a value in the domain. We refer to the expression as a query and the value produced as the result of that query

Page 4: Relational+algebra (1)

4

Relational Algebra

• Domain: set of relations, i.e., result is another relation

• Basic operators: select, project, union, set difference, Cartesian product (or cross product)

• Derived operators: set intersection, division, join• Procedural: Relational expression specifies query

by describing an algorithm (the sequence in which operators are applied) for determining the result of an expression

Page 5: Relational+algebra (1)

5

Relational Algebra in a DBMS

parser

SQLRelationalalgebraexpression

OptimizedRelationalalgebraexpression

Query optimizer

Codegenerator

Queryexecutionplan

Executablecode

DBMS

Page 6: Relational+algebra (1)

6

Select Operator• Produce table containing subset of rows of

argument table satisfying condition

<condition>(<relation>)

• Example: H o b b y = ’ s t a m p s ’ ( P e r s o n )

1123 John 123 Main stamps1123 John 123 Main coins5556 Mary 7 Lake Dr hiking9876 Bart 5 Pine St stamps

1123 John 123 Main stamps9876 Bart 5 Pine St stamps

Person

Id Name Address Hobby Id Name Address Hobby

Page 7: Relational+algebra (1)

7

Selection Condition

• Operators: <, , , >, =, • Simple selection condition:

– <attribute> operator <constant>– <attribute> operator <attribute>

• <condition> AND <condition>• <condition> OR <condition>• NOT <condition>• NOTE: Each attribute in condition must be one of

the attributes of the relation name

Page 8: Relational+algebra (1)

8

Id>3000 OR Hobby=‘hiking’ (Person)

Id>3000 AND Id <3999 (Person)

NOT(Hobby=‘hiking’) (Person)

Hobby‘hiking’ (Person)

(Id > 3000 AND Id < 3999) OR Hobby‘hiking’ (Person)

Selection Condition - Examples

Page 9: Relational+algebra (1)

9

Project Operator

• Produce table containing subset of columns of argument table

<attribute-list> (<relation>) Example: Name, Hobby (Person)

1123 John 123 Main stamps1123 John 123 Main coins5556 Mary 7 Lake Dr hiking9876 Bart 5 Pine St stamps

John stampsJohn coinsMary hikingBart stamps

Id Name Address Hobby Name Hobby

Person

Page 10: Relational+algebra (1)

10

Project OperatorExample: Name, Address (Person)

1123 John 123 Main stamps1123 John 123 Main coins5556 Mary 7 Lake Dr hiking9876 Bart 5 Pine St stamps

John 123 MainMary 7 Lake DrBart 5 Pine St

Result is a table (no duplicates)

Id Name Address Hobby Name Address

Person

Page 11: Relational+algebra (1)

11

Relational Expressions: Combining Operators

1123 John 123 Main stamps1123 John 123 Main coins5556 Mary 7 Lake Dr hiking9876 Bart 5 Pine St stamps

Id, Name ( Hobby=’stamps’ OR Hobby=’coins’ (Person) )

1123 John9876 Bart

Id Name Address Hobby Id Name

Person

Page 12: Relational+algebra (1)

12

Set Operators

• Relation is a set of tuples => set operations should apply

• Result of combining two relations with a set operator is a relation => all its elements must be tuples having same structure

• Hence, scope of set operations limited to union compatible relations

Page 13: Relational+algebra (1)

13

Union Compatible Relations

• Two relations are union compatible if– Both have same number of columns– Names of attributes are the same in both– Attributes with the same name in both relations

have the same domain

• Union compatible relations can be combined using union, intersection, and set difference

Page 14: Relational+algebra (1)

14

Example

Tables: Person (SSN, Name, Address, Hobby) Professor (Id, Name, Office, Phone)are not union compatible. However Name (Person) and Name (Professor)are union compatible and Name (Person) - Name (Professor)makes sense.

Page 15: Relational+algebra (1)

15

Cartesian Product• If R and S are two relations, R S is the set of all

concatenated tuples <x,y>, where x is a tuple in R and y is a tuple in S– (R and S need not be union compatible)

• R S is expensive to compute:– Factor of two in the size of each row– Quadratic in the number of rows

a b c d a b c d x1 x2 y1 y2 x1 x2 y1 y2 x3 x4 y3 y4 x1 x2 y3 y4 x3 x4 y1 y2 R S x3 x4 y3 y4 R S

Page 16: Relational+algebra (1)

16

Renaming

• Result of expression evaluation is a relation• Attributes of relation must have distinct names.

This is not guaranteed with Cartesian product– e.g., suppose in previous example a = c

• Renaming operator tidies this up. To assign the names A1, A2,… An to the attributes of the n column relation produced by expression use

expression [A1, A2, … An]

Page 17: Relational+algebra (1)

17

Example

Transcript (StudId, CrsCode, Semester, Grade)

Teaching (ProfId, CrsCode, Semester)

StudId, CrsCode (Transcript)[StudId, SCrsCode] ProfId, CrsCode(Teaching) [ProfId, PCrscode]

This is a relation with 4 attributes: StudId, SCrsCode, ProfId, PCrsCode

Page 18: Relational+algebra (1)

18

Derived Operation: Join

The expression : join-condition´ (R S) where join-condition´ is a conjunction of terms: Ai oper Bi

in which Ai is an attribute of R, Bi is an attribute of S, and oper is one of =, <, >, , , is referred to as the (theta) join of R and S and denoted: R join-condition SWhere join-condition and join-condition´ are (roughly) the same …

Page 19: Relational+algebra (1)

19

Join and Renaming

• Problem: R and S might have attributes with the same name – in which case the Cartesian product is not defined

• Solution: – Rename attributes prior to forming the product

and use new names in join-condition´.– Common attribute names are qualified with

relation names in the result of the join

Page 20: Relational+algebra (1)

20

Theta Join – Example

Output the names of all employees that earnmore than their managers.

Employee.Name (Employee MngrId=Id AND Salary>Salary Manager)The join yields a table with attributes:

Employee.Name, Employee.Id, Employee.Salary, MngrIdManager.Name, Manager.Id, Manager.Salary

Page 21: Relational+algebra (1)

21

Equijoin Join - Example

Name,CrsCode(Student Id=StudId Grade=‘A’(Transcript))

Id Name Addr Status111 John ….. …..222 Mary ….. …..333 Bill ….. …..444 Joe ….. …..

StudId CrsCode Sem Grade 111 CSE305 S00 B 222 CSE306 S99 A 333 CSE304 F99 A

Mary CSE306Bill CSE304

The equijoin is commonlyused since it combines related data in different relations.

Student Transcript

Equijoin: Join condition is a conjunction of equalities.

Page 22: Relational+algebra (1)

22

Natural Join• Special case of equijoin:

– join condition equates all and only those attributes with the same name (condition doesn’t have to be explicitly stated)

– duplicate columns eliminated from the result

Transcript (StudId, CrsCode, Sem, Grade)Teaching (ProfId, CrsCode, Sem)

Transcript Teaching = StudId, Transcript.CrsCode, Transcript.Sem, Grade, ProfId ( Transcript CrsCode=CrsCode AND Sem=SemTeaching)

[StudId, CrsCode, Sem, Grade, ProfId ]

Page 23: Relational+algebra (1)

23

Natural Join (con’t)• More generally:

R S = attr-list (join-cond (R × S) )

where attr-list = attributes (R) attributes (S)(duplicates are eliminated) and join-cond has the form: A1 = A1 AND … AND An = An

where {A1 … An} = attributes(R) attributes(S)

Page 24: Relational+algebra (1)

24

Natural Join Example

• List all Id’s of students who took at least two different courses:

StudId ( CrsCode CrsCode2 ( Transcript

Transcript [StudId, CrsCode2, Sem2, Grade2])) (don’t join on CrsCode, Sem, and Grade attributes)

Page 25: Relational+algebra (1)

25

Division• Goal: Produce the tuples in one relation, r,

that match all tuples in another relation, s– r (A1, …An, B1, …Bm)

– s (B1 …Bm)

– r/s, with attributes A1, …An, is the set of all tuples <a> such that for every tuple <b> in s, <a,b> is in r

• Can be expressed in terms of projection, set difference, and cross-product

Page 26: Relational+algebra (1)

26

Division (con’t)

Page 27: Relational+algebra (1)

27

Division - Example

• List the Ids of students who have passed all courses that were taught in spring 2000

• Numerator: StudId and CrsCode for every course passed by every student StudId, CrsCode (Grade ‘F’ (Transcript) )

• Denominator: CrsCode of all courses taught in spring 2000 CrsCode (Semester=‘S2000’ (Teaching) )

• Result is numerator/denominator

Page 28: Relational+algebra (1)

28

Schema for Student Registration System

Student (Id, Name, Addr, Status)Professor (Id, Name, DeptId)Course (DeptId, CrsCode, CrsName, Descr)Transcript (StudId, CrsCode, Semester, Grade)Teaching (ProfId, CrsCode, Semester)Department (DeptId, Name)

Page 29: Relational+algebra (1)

29

Query Sublanguage of SQL

• Tuple variable C ranges over rows of Course.• Evaluation strategy:

– FROM clause produces Cartesian product of listed tables

– WHERE clause assigns rows to C in sequence and produces table containing only rows satisfying condition

– SELECT clause retains listed columns

• Equivalent to: CrsNameDeptId=‘CS’(Course)

SELECT C.CrsNameFROM Course CWHERE C.DeptId = ‘CS’

Page 30: Relational+algebra (1)

30

Join Queries

• List CS courses taught in S2000• Tuple variables clarify meaning.• Join condition “C.CrsCode=T.CrsCode”

– eliminates garbage

• Selection condition “ T.Sem=‘S2000’ ” – eliminates irrelevant rows

• Equivalent (using natural join) to:

SELECT C.CrsNameFROM Course C, Teaching TWHERE C.CrsCode=T.CrsCode AND T.Sem=‘S2000’

CrsName(Course Sem=‘S2000’ (Teaching) )

CrsName (Sem=‘S2000’ (Course Teaching) )

Page 31: Relational+algebra (1)

31

Correspondence Between SQL and Relational Algebra

SELECT C.CrsNameFROM Course C, Teaching TWHERE C.CrsCode=T.CrsCode AND T.Sem=‘S2000’

Also equivalent to:CrsName C_CrsCode=T_CrsCode AND Sem=‘S2000’

(Course [C_CrsCode, DeptId, CrsName, Desc] Teaching [ProfId, T_CrsCode, Sem])

This is the simple evaluation algorithm for SELECT.Relational algebra expressions are procedural. Which of the two equivalent expressions is more easily evaluated?