hoangthuan
Relational database
• Relational database systems are expected to provide a query language that helps users query database instances.
• There are two kinds of formal query languages: relational algebra and relational calculus.
What is Relational Algebra?
• An algebra whose operands are relations or variables
that represent relations.
• Operators are designed to do the most common things
that we need to do with relations in a database.
• The result is an algebra that can be used as a query
language for relations.
Relational Algebra
• Operands: tables
• Operators:
choose only the rows you want
choose only the columns you want
combine tables
and a few other things
Example
• Movies(mID, title, director, year, length)
• Artists(aID, aName, nationality)
• Roles(mID, aID, character)
• Foreign key constraints:
–Roles[mID] ⊆ Movies[mID]
–Roles[aID] ⊆ Artists[aID]
Select: choose rows
• Notation: σ_c(R)
R is a table
Condition c is a boolean expression
It can use comparison operators and boolean operators
• The result is a relation
with the same schema as the operand
but with only the tuples that satisfy the condition
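As a minimal Python sketch of select, assume a relation is modeled as a list of dictionaries, one per tuple. The `select` helper and the sample Movies rows are illustrative assumptions and are not part of the original example.

```python
# Hypothetical sample rows for Movies(mID, title, director, year, length).
movies = [
    {"mID": 1, "title": "Movie A", "director": "Director X", "year": 1975, "length": 120},
    {"mID": 2, "title": "Movie B", "director": "Director Y", "year": 1984, "length": 95},
    {"mID": 3, "title": "Movie C", "director": "Director X", "year": 1979, "length": 140},
]

def select(relation, condition):
    """sigma_c(R): keep only the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]

# The condition combines comparison operators with a boolean operator.
seventies = select(movies, lambda t: t["year"] >= 1970 and t["year"] <= 1979)

print([t["title"] for t in seventies])
```

Every tuple in the result keeps all five attributes, so the schema is the same as that of the operand.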
Project : choose columns
• Notation: π_L(R)
R is a table.
L is a subset (not necessarily a proper subset) of
the attributes of R.
• The result is a relation
with all the tuples from R
but with only the attributes in L, and in that order
About project
• Why is it called “project”?
• What is the value of π_director(Movies)?
• Exercise: Write an RA expression to find the names of all directors of movies from the 1970s
• Now, suppose you want the names of all characters in movies from the 1970s. We need to be able to combine tables.
Example
People
Name      Age
Karim     20
Ruth      18
Minh      20
Sofia     19
Jennifer  19
Sasha     20

π_Age(People)
Age
20
18
19
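The People example can be reproduced with a short Python sketch. It assumes a relation is a list of tuples plus a separate schema, and that `project` (a hypothetical helper) drops duplicate tuples, which is consistent with each age appearing only once in the result above.

```python
people_schema = ("Name", "Age")
people = [
    ("Karim", 20), ("Ruth", 18), ("Minh", 20),
    ("Sofia", 19), ("Jennifer", 19), ("Sasha", 20),
]

def project(relation, schema, attrs):
    """pi_L(R): keep only the attributes in attrs, in that order, without duplicates."""
    idx = [schema.index(a) for a in attrs]
    seen, result = set(), []
    for t in relation:
        row = tuple(t[i] for i in idx)
        if row not in seen:  # a relation is a set, so repeated rows are dropped
            seen.add(row)
            result.append(row)
    return result

ages = project(people, people_schema, ["Age"])
print(ages)  # [(20,), (18,), (19,)]
```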
Cartesian Product
• Notation: R1 × R2
• The result is a relation with
every combination of a tuple from R1 concatenated to a tuple from R2
• Its schema is every attribute of R1 followed by every attribute of R2, in order
Cartesian Product
• How many tuples are in R1 × R2?
• Example: Movies × Roles
• If an attribute occurs in both relations, it occurs twice
in the result (prefixed by relation name)
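A small Python sketch can make the tuple count and the attribute prefixing concrete. The helper `cartesian_product`, the shortened Movies schema, and all sample rows are assumptions made for illustration.

```python
from itertools import product

def cartesian_product(r_name, r_schema, r_rows, s_name, s_schema, s_rows):
    """R x S: every tuple of R concatenated with every tuple of S."""
    shared = set(r_schema) & set(s_schema)
    # An attribute that occurs in both relations is prefixed by its relation name.
    schema = tuple(f"{r_name}.{a}" if a in shared else a for a in r_schema) + \
             tuple(f"{s_name}.{a}" if a in shared else a for a in s_schema)
    rows = [r + s for r, s in product(r_rows, s_rows)]
    return schema, rows

# Hypothetical rows; Movies is cut down to two attributes to keep the output short.
movies_schema = ("mID", "title")
movies = [(1, "Movie A"), (2, "Movie B")]
roles_schema = ("mID", "aID", "character")
roles = [(1, 10, "Hero"), (1, 11, "Villain"), (2, 10, "Detective")]

schema, rows = cartesian_product("Movies", movies_schema, movies,
                                 "Roles", roles_schema, roles)
print(schema)     # ('Movies.mID', 'title', 'Roles.mID', 'aID', 'character')
print(len(rows))  # 6, which is len(movies) * len(roles)
```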
Multi-cultural class
ClassInfo = π_{Name, Country, GPA}(σ_{Student.name = AlgoList.name}(AlgoList × Student))
Cartesian product can be inconvenient
• It can introduce nonsense tuples.
• You can get rid of them with selects.
• But this is so common that an operation was defined to make it easier: join.
Special case of equijoin: Natural Join
• Notation: R ⋈ S
• The result is defined by
taking the Cartesian product
selecting to ensure equality on attributes that are in
both relations (as determined by name)
projecting to remove duplicate attributes.
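The three steps of this definition can be sketched in Python as follows. The function `natural_join` and the sample Roles and Artists rows are hypothetical. Relations are modeled as lists of tuples with a separate schema.

```python
def natural_join(r_schema, r_rows, s_schema, s_rows):
    """Natural join: product, equality on same-named attributes, duplicates removed."""
    common = [a for a in r_schema if a in s_schema]
    s_only = [a for a in s_schema if a not in common]
    schema = tuple(r_schema) + tuple(s_only)
    rows = []
    for r in r_rows:                      # Cartesian product: every pair (r, s)
        rd = dict(zip(r_schema, r))
        for s in s_rows:
            sd = dict(zip(s_schema, s))
            if all(rd[a] == sd[a] for a in common):             # select on equality
                rows.append(r + tuple(sd[a] for a in s_only))   # drop repeated columns
    return schema, rows

# Hypothetical rows for Roles(mID, aID, character) and Artists(aID, aName, nationality).
roles_schema = ("mID", "aID", "character")
roles = [(1, 10, "Hero"), (1, 11, "Villain"), (2, 10, "Detective")]
artists_schema = ("aID", "aName", "nationality")
artists = [(10, "Artist P", "Canadian"), (11, "Artist Q", "French")]

schema, rows = natural_join(roles_schema, roles, artists_schema, artists)
print(schema)  # ('mID', 'aID', 'character', 'aName', 'nationality')
for row in rows:
    print(row)
```

When the two relations share no attribute names, the equality test is vacuously true, so this sketch returns the full Cartesian product.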
Natural Join
• The following examples show what natural join does when the tables have:
no attributes in common
one attribute in common
a different attribute in common
(Note that we change the attribute names for relation follows to set up these scenarios.)
Properties of Natural Join
• Commutative: R ⋈ S = S ⋈ R
(although attribute order may vary; this will matter later
when we use set operations)
• Associative: R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T
• So when writing n-ary joins, brackets are irrelevant.
We can just write: R1 ⋈ R2 ⋈ . . . ⋈ Rn
Assignment Operator
• Notation:
R = Expression
• Alternate notation:
R(A1, ..., An) = Expression
Lets you name all the attributes of the new relation.
Sometimes you don't want the names they would
get from Expression.
R must be a temporary variable, not one of the
relations in the schema, i.e., you are not updating
the content of a relation.
Assignment
• Example:
• CSCoffering = σ_{dept=‘csc’}(Offering)
• TookCSC(sid, grade) = π_{sid, grade}(CSCoffering ⋈ Took)
• PassedCSC(sid) = π_sid(σ_{grade>50}(TookCSC))
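The same step-by-step style carries over to ordinary variables in a program. The Python sketch below assumes hypothetical schemas Offering(oid, dept) and Took(sid, oid, grade), since the slides do not give them, and the rows are invented for illustration.

```python
# Hypothetical schemas and rows: Offering(oid, dept), Took(sid, oid, grade).
offering = [{"oid": 1, "dept": "csc"}, {"oid": 2, "dept": "mat"}]
took = [
    {"sid": 100, "oid": 1, "grade": 82},
    {"sid": 101, "oid": 1, "grade": 45},
    {"sid": 100, "oid": 2, "grade": 90},
]

# CSCoffering = select Offering where dept = 'csc'
csc_offering = [o for o in offering if o["dept"] == "csc"]

# TookCSC(sid, grade) = project (CSCoffering join Took) over sid, grade.
# The natural join matches on the shared attribute oid.
took_csc = {(t["sid"], t["grade"])
            for o in csc_offering for t in took if o["oid"] == t["oid"]}

# PassedCSC(sid) = project (select TookCSC where grade > 50) over sid
passed_csc = {sid for (sid, grade) in took_csc if grade > 50}

print(passed_csc)  # {100}
```

Each intermediate name holds a new relation and none of the base relations is modified, which mirrors the rule that R must be a temporary variable.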
Assignment
• Assignment helps us break a problem down
• It also allows us to change the names of relations [and attributes].
• There is another way to rename things ...
Set Operations on Relations
• R ∪ S, the union of R and S, is the set of tuples that
are in R or S or both.
• R - S, the difference of R and S, is the set of tuples
that are in R but not in S.
• Note that R - S is different from S - R.
• R ∩ S, the intersection of R and S, is the set of tuples
that are in both R and S.
Condition for set operators
• Set operators can operate only on two union-compatible relations.
• Two relations are union-compatible if they have the same number of attributes and each pair of corresponding attributes is drawn from the same domain.
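Python's built-in sets behave like these operators, which allows a quick sketch. The `union_compatible` check, the (name, domain) schema encoding, and the sample rows are assumptions made for illustration.

```python
def union_compatible(r_schema, s_schema):
    """Same number of attributes, with matching domains position by position."""
    return len(r_schema) == len(s_schema) and all(
        r_dom == s_dom for (_, r_dom), (_, s_dom) in zip(r_schema, s_schema)
    )

# Schemas as (attribute name, domain) pairs; the rows are hypothetical.
r_schema = (("sid", int), ("name", str))
s_schema = (("id", int), ("fullname", str))
R = {(1, "Karim"), (2, "Ruth")}
S = {(2, "Ruth"), (3, "Minh")}

assert union_compatible(r_schema, s_schema)
print(R | S)  # union: tuples in R or S or both
print(R - S)  # difference: {(1, 'Karim')}
print(S - R)  # {(3, 'Minh')}, which differs from R - S
print(R & S)  # intersection: {(2, 'Ruth')}
```

Attribute names may differ between the two relations in this sketch, because only the count and the domains are compared.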
Outer Join
Motivation
• Suppose we join R ⋈ S.
• A tuple of R which doesn't join with any tuple of S is
said to be dangling.
• Similarly for a tuple of S.
• Problem: We lose dangling tuples.
Outer Join
• Preserves dangling tuples by padding them with a
special NULL symbol in the result.
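A minimal Python sketch of a full outer join follows, using `None` to stand in for NULL. The function name, the relations R(A, B) and S(B, C), and their rows are hypothetical.

```python
def full_outer_join(r_schema, r_rows, s_schema, s_rows):
    """Natural join that also keeps dangling tuples, padded with None (NULL)."""
    common = [a for a in r_schema if a in s_schema]
    s_only = [a for a in s_schema if a not in common]
    schema = tuple(r_schema) + tuple(s_only)
    rows, matched_s = [], set()
    for r in r_rows:
        rd = dict(zip(r_schema, r))
        found = False
        for j, s in enumerate(s_rows):
            sd = dict(zip(s_schema, s))
            if all(rd[a] == sd[a] for a in common):
                rows.append(r + tuple(sd[a] for a in s_only))
                matched_s.add(j)
                found = True
        if not found:  # dangling tuple of R: pad the S-only attributes
            rows.append(r + (None,) * len(s_only))
    for j, s in enumerate(s_rows):
        if j not in matched_s:  # dangling tuple of S: pad the R-only attributes
            sd = dict(zip(s_schema, s))
            rows.append(tuple(sd.get(a) for a in r_schema)
                        + tuple(sd[a] for a in s_only))
    return schema, rows

schema, rows = full_outer_join(("A", "B"), [(1, 2), (4, 5)],
                               ("B", "C"), [(2, 3), (6, 7)])
print(schema)  # ('A', 'B', 'C')
print(rows)    # [(1, 2, 3), (4, 5, None), (None, 6, 7)]
```

Here (4, 5) and (6, 7) are the dangling tuples. A plain natural join would return only (1, 2, 3).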
RA is procedural
• An RA query itself suggests a procedure for constructing the result (i.e., how one could implement the query).
• We say that it is “procedural.”
Understand the database!
You need to understand the tables in a database and
how they relate to each other before you can work
with it.
Schema
• There are 5 tables:
1. BUYER(BUYERID, BUYERPHONE, BUYER_RATING)
2. POPART(PO#, PART#, PARTCOST, PARTQTY)
3. PART(PART#, PARTNAME, PARTTYPE, PARTSIZE)
4. SUPPLIER(SUPPLIER#, SUPPLIERNAME, SUPPLIERCITY, TIMEZONE)
5. SUPPLY(PART#, SUPPLIER#, BUYERID)
What is the primary key of the BUYER table?
• BUYER(BUYERID, BUYERPHONE, BUYER_RATING)
– BUYERID - identified by a single underline.
• This means that across all the records in the table,
BUYERID is unique.
Q. What is the primary key of POPART (pronounced P-O-Part)?
• POPART(PO#, PART#, PARTCOST, PARTQTY)
• PO#, PART# - a concatenated or composite key
• We are given the criterion ‘small’, which is in the PART table.
• We want to know the names, which are also in the PART table.
• First do a select to eliminate the rows that are not ‘small’,
• and then a project to show just the names.
• select part where partsize = ‘small’ giving temp
• project temp over (partname) giving answer
Notes: Temp is a convenient name for the output of the select since it is not yet the final answer. Since the output of a statement is a new table, the new table can be used in subsequent statements so we can do the project on the temp table.
Temp
PART#  PARTNAME         PARTTYPE  PARTSIZE
S11    Smiley Widget    STUFFED   small
F11    Frowning Widget  STUFFED   small

Answer
PARTNAME
Smiley Widget
Frowning Widget
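The select-then-project sequence can be sketched in Python with the rows of this document's PART table. Modeling each record as a dictionary is an assumption made for illustration.

```python
# Rows of the PART table used in this document.
part = [
    {"PART#": "S11", "PARTNAME": "Smiley Widget", "PARTTYPE": "STUFFED", "PARTSIZE": "small"},
    {"PART#": "F11", "PARTNAME": "Frowning Widget", "PARTTYPE": "STUFFED", "PARTSIZE": "small"},
    {"PART#": "L12", "PARTNAME": "Left-handed Widget", "PARTTYPE": "STUFFED", "PARTSIZE": "large"},
]

# select part where partsize = 'small' giving temp
temp = [row for row in part if row["PARTSIZE"] == "small"]

# project temp over (partname) giving answer
answer = [{"PARTNAME": row["PARTNAME"]} for row in temp]

for row in answer:
    print(row["PARTNAME"])
```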
What are the names of parts purchased on po# 1234?
• We know the po# which is in POPART and we want
to know partname which is in PART. Thus, we must
join the two tables.
• POPART(PO#, PART#, PARTCOST, PARTQTY)
• PART(PART#, PARTNAME, PARTTYPE,
PARTSIZE)
• Two tables can be joined if and only if they are
join compatible. Two tables are join compatible if
they have a common field.
• In this case, both tables have a part# field so they are
join compatible.
• (Note that it is the contents of the fields that must
be the same, not necessarily the names of the
fields. It is very convenient if the names of the fields
are the same when the contents are the same.)
• When two tables are joined, each record from table 1 is compared to each record from table 2.
• When a match is found, a record is written to a new table which contains all fields from both table 1 and table 2.
• When PART and POPART are joined, the 3 records in PART are compared with all 3 records in POPART for a total of 9 comparisons.
• For example, a 50,000 record table joined with a
1,000,000 record table requires 50 billion
comparisons (50,000 × 1,000,000).
• If the join criterion (specifying the common field) is
left out, every record matches every record and
the new table will have 50 billion records.
PART
PART#  PARTNAME            PARTTYPE  PARTSIZE
S11    Smiley Widget       STUFFED   small
F11    Frowning Widget     STUFFED   small
L12    Left-handed Widget  STUFFED   large

POPART
PO#   PART#  PARTCOST  PARTQTY
1233  L12    2.00      24
1234  L12    2.00      30
1234  F11    3.00      128

The join of PART and POPART generates three matches, so the resulting table is:

Newtable
PART#  PARTNAME            PARTTYPE  PARTSIZE  PO#   PART#  PARTCOST  PARTQTY
F11    Frowning Widget     STUFFED   small     1234  F11    3.00      128
L12    Left-handed Widget  STUFFED   large     1233  L12    2.00      24
L12    Left-handed Widget  STUFFED   large     1234  L12    2.00      30
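The record-by-record comparison described above corresponds to a nested loop. The Python sketch below uses the PART and POPART rows from this document and counts the comparisons. Representing records as tuples is an assumption made for illustration.

```python
# PART(PART#, PARTNAME, PARTTYPE, PARTSIZE)
part = [
    ("S11", "Smiley Widget", "STUFFED", "small"),
    ("F11", "Frowning Widget", "STUFFED", "small"),
    ("L12", "Left-handed Widget", "STUFFED", "large"),
]
# POPART(PO#, PART#, PARTCOST, PARTQTY)
popart = [
    (1233, "L12", 2.00, 24),
    (1234, "L12", 2.00, 30),
    (1234, "F11", 3.00, 128),
]

comparisons = 0
newtable = []
for p in part:              # each record from table 1 ...
    for po in popart:       # ... is compared to each record from table 2
        comparisons += 1
        if p[0] == po[1]:   # join criterion: PART.PART# = POPART.PART#
            newtable.append(p + po)

print(comparisons)          # 9
print(len(newtable))        # 3
print(50_000 * 1_000_000)   # 50000000000 comparisons for the larger example
```

Dropping the `if` test would append all 9 combinations, which is the Cartesian product.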
What are the names of parts supplied by
suppliers in Hamilton?
• We know SUPPLIERCITY is in SUPPLIER and want to know PARTNAME in PART. These two tables cannot be directly joined because they have no common field. However, we can get to PART by going through SUPPLY, so we will involve that table in a join as well.
• BUYER(BUYERID, BUYERPHONE, BUYER_RATING)
• POPART(PO#, PART#, PARTCOST, PARTQTY)
• PART(PART#, PARTNAME, PARTTYPE, PARTSIZE)
• SUPPLIER(SUPPLIER#, SUPPLIERNAME, SUPPLIERCITY, TIMEZONE)
• SUPPLY(PART#, SUPPLIER#, BUYERID)
• select supplier where suppliercity = ‘Hamilton’ giving temp1
• join temp1 and supply over supplier# giving temp2
• join temp2 and part over part# giving temp3
• project temp3 over (partname) giving answer
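The four statements can be traced in a short Python sketch. The PART rows come from this document, while the SUPPLIER and SUPPLY rows and the `join` helper are hypothetical, since the slides show no data for those tables.

```python
def join(left, right, field):
    """Join two lists of records (dicts) over a common field."""
    return [{**l, **r} for l in left for r in right if l[field] == r[field]]

# Hypothetical SUPPLIER and SUPPLY rows.
supplier = [
    {"SUPPLIER#": "A1", "SUPPLIERNAME": "Supplier One", "SUPPLIERCITY": "Hamilton", "TIMEZONE": "EST"},
    {"SUPPLIER#": "B2", "SUPPLIERNAME": "Supplier Two", "SUPPLIERCITY": "Toronto", "TIMEZONE": "EST"},
]
supply = [
    {"PART#": "S11", "SUPPLIER#": "A1", "BUYERID": "Z1"},
    {"PART#": "L12", "SUPPLIER#": "A1", "BUYERID": "Z2"},
    {"PART#": "F11", "SUPPLIER#": "B2", "BUYERID": "Z1"},
]
# PART rows from this document.
part = [
    {"PART#": "S11", "PARTNAME": "Smiley Widget", "PARTTYPE": "STUFFED", "PARTSIZE": "small"},
    {"PART#": "F11", "PARTNAME": "Frowning Widget", "PARTTYPE": "STUFFED", "PARTSIZE": "small"},
    {"PART#": "L12", "PARTNAME": "Left-handed Widget", "PARTTYPE": "STUFFED", "PARTSIZE": "large"},
]

temp1 = [s for s in supplier if s["SUPPLIERCITY"] == "Hamilton"]  # select
temp2 = join(temp1, supply, "SUPPLIER#")                          # join over supplier#
temp3 = join(temp2, part, "PART#")                                # join over part#
answer = sorted({row["PARTNAME"] for row in temp3})               # project over partname

print(answer)  # ['Left-handed Widget', 'Smiley Widget']
```

Doing the select first keeps the intermediate tables small, because only Hamilton suppliers take part in the two joins.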