59
Relational Algebra SQL specifies what to retrieve but not how to retrieve it. Need a process to translate a descriptive statement into a collection of activities. This is “behind the scenes”, but important to know nonetheless. If a CS student doesn’t know it – who will?

Relational Algebra

  • Upload
    hal

  • View
    22

  • Download
    2

Embed Size (px)

DESCRIPTION

Relational Algebra. SQL specifies what to retrieve but not how to retrieve it. Need a process to translate a descriptive statement into a collection of activities. This is “behind the scenes”, but important to know nonetheless. If a CS student doesn’t know it – who will?. - PowerPoint PPT Presentation

Citation preview

Page 1: Relational Algebra

Relational Algebra

SQL specifies what to retrieve but not how to retrieve it.

Need a process to translate a descriptive statement into a collection of activities.

This is “behind the scenes”, but important to know nonetheless.

If a CS student doesn’t know it – who will?

Page 2: Relational Algebra

SQL StatementHigh level logic involving table (relation) manipulation

Mid level logic involving loops and comparisons

Low level logic involving B-Tree, index, hash table access, etc.

Result

Page 3: Relational Algebra

SQL queries relations and generates a new relation.

Need something that manipulates relations in order to create other relations.

Need to do it efficiently.

Page 4: Relational Algebra

Relational Algebra: a collection of operations (activities) to construct new relations from given relations.

Typical process:

SQL relational algebra operations (or something similar) mid/low level

logic result

Page 5: Relational Algebra

Reference: First few slides of [http://www.cs.wayne.edu/~shiyong/csc6710/slides/kiferComp_348761_ppt11.ppt], though the notation may be intimidating.

Page 6: Relational Algebra

Background:

Tuple (n-tuple): collection of n things written as (a1, a2,…, an).

Typically a tuple represents an entity in a relation (a row in a table).

Page 7: Relational Algebra

Definition: 2 relations are Union-compatible if they have same degree (number of attributes) and the i th attributes of each are defined on same domains.

The first three definitions that follow apply only to Union-compatible relations.

Set operations:

Page 8: Relational Algebra

Union

A B is the set of elements belonging to either A or B.

Venn Diagram

A B

Page 9: Relational Algebra

SQL Server:SELECT *

FROM S

WHERE status = 20

UNION

SELECT *

FROM S

WHERE status = 50

Page 10: Relational Algebra

Difference (Minus):

A – B is the set of elements belonging to A but not to B.

Venn Diagram.

A – B

Page 11: Relational Algebra

SQL Server uses Except instead of Minus

Supplier table: Create a view defined by that below (all suppliers not in Paris).SELECT *

FROM S

EXCEPT

SELECT *

FROM S

WHERE City = 'Paris'

Page 12: Relational Algebra

Intersection

A B is the set of elements belonging to both A and B.

Venn Diagram

A B

Page 13: Relational Algebra

SQL Server:SELECT *

FROM S

WHERE status > 20

INTERSECT

SELECT *

FROM S

WHERE status < 50

Page 14: Relational Algebra

A×B is the set of all elements of the form (x, y) where x belongs to A and y belongs to B.

Cartesian Product:

Page 15: Relational Algebra

A = {a, b} B = {1, 2, 3}

A×B = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3) }

Supplier table: Create a view defined by

SELECT * FROM S CROSS JOIN SP

Page 16: Relational Algebra

Selection (Restriction)

R where predicate where the predicate is some condition that evaluates to true or false.

A row subset of relation R.

Select * from R where predicate.

Page 17: Relational Algebra

Let R be a relation and X, Y, …, Z be among R’s attributes.

Then R[X, Y, …, Z] is the set of elements from R restricted to attributes from X, Y, … , Z with duplicates removed.

Column subset of R

Select distinct attribute list from R

Projection

Page 18: Relational Algebra

Join

Cartesian Product followed by a selection

A×B where predicate.

SELECT * FROM S CROSS JOIN SP where S.S# = SP.S#

Or

SELECT * FROM S, SP where S.S#=SP.S#

Page 19: Relational Algebra

A join B

Let Ci, … , Cj be attribute names common to A and B. Then A JOIN B is

(A ×B)[all attributes with duplicates removed] where

A.Ci = B.Ci and

:

:

A.Cj = B.Cj

Natural Join:

Page 20: Relational Algebra

Example, Consider S join SP. Suppose S9 is a supplier but S9 supplies no parts, yet. An entry for S9 will NOT appear in S join SP.

S left outerjoin SP will include every row from the left table (S) with NULLS filled in if there’s no matching S# in the SP table.

Outer Join:

Page 21: Relational Algebra

Inner Join

Conventional join operation but the phrase is used if there’s the potential for confusion with outer join.

Page 22: Relational Algebra

Examples

select * from S inner join SP on S.S#=SP.S#

select * from S left outer join SP on S.S#=SP.S#

Page 23: Relational Algebra

Get supplier names for suppliers who supply part P2.

( ( SP where P# = ‘P2) join S) [SName]

1. Find a row subset (Selection) of SP where P#=‘P2’

2. Do a natural join with S

3. Find a column subset of the result (project on Sname)

Page 24: Relational Algebra

Get supplier names for suppliers who supply at least one red part.

( ( ( P where color = ‘Red’) Join SP) [S#] Join S) [Sname]

1. Find a row subset (Selection) of P where color=‘Red’

2. Do a natural join with SP

3. Project on S#

4. Do a natural join with S

5. Project on SName

Page 25: Relational Algebra

Get supplier names for suppliers who do not supply part P2.

S[Sname] Minus ( ( SP where P# = ‘P2) join S) [Sname]

1. Project S on SName

2. Find a row subset of SP where P#=‘P2’

3. Join the result of 2 with S

4. Project the result of 3 on Sname

5. Calculate the difference between the result of 1 and the result of 2

Page 26: Relational Algebra

Let A be a relation of degree m+n and B be a relation of degree n

Visualize an m+n tuple (entry) of A as a pair (x, y) where

x is an m-tuple (1st m attributes) and y is an n-tuple (last n attributes)

Also suppose the n attributes of B are defined on the same domains as the last n attributes of A.

Then C = A dividedby B is a relation consisting of m-tuples x where for all y in B there is a pair (x, y) in A.

Division:

Page 27: Relational Algebra

NOTE: only 5 operations: restriction, projection, product, union, and difference are primitive.

AB = A – (A – B)

A dividedby B = A[x] – (A[x]×B – A)[x].

Page 28: Relational Algebra

SP dividedby B

B is: 1) P1 or 2) P2 and P4 or 3) P1 through P6

Supplier numbers that supply all parts in B

SP Dividedby B is: 1) S1 and S2 or 2) S1 and S4 or 3) S1

Example:

S# P#S1 P1S1 P2S1 P3S1 P4S1 P5S1 P6S2 P1S2 P2S2 P3S3 P2S3 P3S4 P2S4 P3S4 P4

Page 29: Relational Algebra

A dividedby B = A[x] – (A[x]×B – A)[x].

Apply to previous example where A = SP.

B = P1 B = P2 and P4 B = P1 thru P6

A[x] S1 thru S4 S1 thru S4 S1 thru S4A[x] ×B (Si, P1) i = 1…4 (Si, P2) i = 1…4

(Si, P4) i = 1…4(S1, Pi) i = 1…6(S2, Pi) i = 1…6(S3, Pi) i = 1…6(S4, Pi) i = 1…6

A[x] ×B - A (S3, P1)(S4, P1)

(S2, P4)(S3, P4)

(S2, Pi) i=3,4,5,6(S3, Pi) i=1,3,4,5,6(S4, Pi) i=1,3,6

(A[x] ×B – A)[x] S3 and S4 S2 and S3 S2, S3, S4A[x] -(A[x] × B – A)[x] S1 and S2 S1 and S4 S1

Page 30: Relational Algebra

Get supplier names for suppliers who supply all parts.

( ( SP dividedby P[P#]) Join S)[Sname]

or

( ( SP[S#] - (SP[S#] times P[P#] – SP)[S#]) Join S)[Sname]

Recall previous notes in SQL where finding ALL of something required double “not exist” subqueries.

Example

Page 31: Relational Algebra

Select S.name

From S, SP

Where S.S# = SP.S#

And SP.P# = ‘P2’

Suppose 100 suppliers and 10,000 shipments

Consider

(S times SP) where S.S#=SP.S# and SP.P#=’P2’

vs

(S times (SP where SP.P#=’P2’) where S.S# = SP.S#

First option creates a VERY large intermediate table.

Query Optimization: Introduction to Database Systems by C. J. Date 8th ed

Page 32: Relational Algebra

Slide 4 of [http://www.cs.wayne.edu/~shiyong/csc6710/slides/kiferComp_348761_ppt11.ppt]

Internal representation of a relational algebra expression (e.g. a query tree)

Stages of query optimization:

S SPSPS

Where SP.P# = ‘P2

join

Project on name

join

Where SP.P# = ‘P2

Project on name

(S join (SP where P#=’P2’))[name] ((S join SP) where P#=’P2) [name]

Page 33: Relational Algebra

Canonical formsSet of queries, called C, such that for every possible query there is a query in C that is equivalent.

Reason for this? Get suppliers who supply P2 can be expressed in 8 different ways. Ideally, the efficiency should not depend on the query form.

Page 34: Relational Algebra

Where p or (q and r) where (p or q) and (p or r) -- Conjunctive normal form

(A where p) where q A where (p and q)

(A[attributes]) where p (A where p)[attributes]

A.F1 > B.F2 and B.F2 = 3 A.F1 > 3

Examples:

Page 35: Relational Algebra

Consider: Get suppliers not in London or Paris.

SELECT S#, Sname, Status, City

FROM dbo.S

WHERE (City <> 'Paris') OR

(City <> 'London')

Does this yield the expected result?

Page 36: Relational Algebra

NOT (p and q) NOT p or NOT q (DeMorgan’s laws

NOT (p or q) NOT p and NOT q (DeMorgan’s laws

If system knew that SP.P# is a foreign key matching a primary key P.P# in P then

(SP join P) [S#] SP[S#]

Page 37: Relational Algebra

Query optimizers do not optimize – just try to find “reasonably good” evaluation strategies. Might take longer to find optimal strategy than to do brute force method.

[http://www.cs.wayne.edu/~shiyong/csc6710/slides/kiferComp_348761_ppt11.ppt]

Page 38: Relational Algebra

Summaries and References:

“MySQL's query optimization engine isn't always the most efficient when it comes to subqueries.”

often a good idea to convert a subquery to a join

[http://www.databasejournal.com/features/mysql/article.php/3813821/Five-Query-Optimizations-in-MySQL.htm]

Page 40: Relational Algebra

SQL Server: Query Analysis:

Select a database

Create a query window (right click on the database and select new query) and enter an query

Query Display Estimated Execution Plan

Query Include Actual Execution Plan.

When the query is executed there will be an execution plan tab on the result pane.

Page 41: Relational Algebra

Some example queries to analyze:

select * from Sselect * from S order by S#select * from S order by statusselect * from SP where Qty = 200select * from S, SP where S.S#=SP.S# and P#='P2'select * from S where S# in (select S# from SP where P#='P2')S2 Replacement ViewAny Red part View.Only Red Parts1 view.Do some views from the genealogy database (see next slide for family tree)

Page 42: Relational Algebra

Mike Sarah

Tom Ellen

Jeanne Joseph

TimDanielIraBrianKathy

Jim Alice

John Joanne

Jason Sally Linda

Page 44: Relational Algebra

Some useful terms:

[http://technet.microsoft.com/en-us/library/cc917672.aspx]

Clustered indexA clustered index is organized as a B-tree, where the nonleaf nodes are index pages and the leaf nodes are data pages.

A table can have only 1.

Page 45: Relational Algebra

Nonclustered indexA nonclustered index is organized as a B-tree. Unlike a clustered index, a nonclustered index consists of only index pages. The leaf nodes in a nonclustered index are not data pages, but contain row locators for individual rows in the data pages.

Page 46: Relational Algebra

Hash match inner join: [http://blogs.msdn.com/craigfr/archive/2006/08/10/687630.aspx].

Page 47: Relational Algebra

Left Semi Join“returns rows from one table that would join with another table without performing a complete join” [http://blogs.msdn.com/craigfr/archive/2006/07/19/671712.aspx]

Left Anti Semi JoinSimilar to above but returns those that would NOT join.

Page 48: Relational Algebra

Stream Aggregates: [http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx]

Page 49: Relational Algebra

Index scan vs index seek: [http://blogs.msdn.com/craigfr/archive/2006/06/26/647852.aspx

Page 50: Relational Algebra

Spooling:

Temporary “caching” of data for use by another activity: [http://technet.microsoft.com/en-us/library/ms191221.aspx]

Page 52: Relational Algebra

Construct query plans from low level routines and make some attempt at optimizing.

True optimization is sometimes more costly than the savings.

Generating low level routines

Page 53: Relational Algebra

Brute Force: Cardinality of R is m, cardinality of S is n

For i = 1 to m do

For j = 1 to n do

If R[i].C = S[j].C then

Add joined tuple, R[i]:S[j] to result table 

Join of relations R and S (on attribute C): Intro to Database Systems by C. J. Date

Page 54: Relational Algebra

Time is proportional to m*n

Worsedepending on many factors, number of disk I/Os could be proportional to m*n

Potentially large if tens or hundreds of thousands of tuples in each table or if tuples are very large and require a lot of disk I/Os.

Page 55: Relational Algebra

Assume index on S.CFor i = 1 to m do

{

Find all records S[j], in S, where S[j].C=R[i].C

//requires a search through the index which is

//far less work than looking through all records in S

Join R[i] with corresponding tuples and add to result table.

}

Indexed method

Page 56: Relational Algebra

Assume (or build) a hash table on S.C

For i = 1 to m do

{

k = hash(R[i].C)

search hash list at H[k] looking for matches to R[i].C

if found, add joined tuple to result table.

}

Hash Table:

Page 57: Relational Algebra

R… relation; A…attributes; R[i] tuple i of R;

Chosen is a boolean vector, one bit for each R[i]; initially all False.

Nested Loop :

For i = 1 to m do

//look for alternative

If Chosen[i] is true then

{

add R[i].A to result table

for j = i+1 to n

if R[j].A = R[i].A then set Chosen[j]=true

}

Projection:

Page 58: Relational Algebra

For i = 1 to m do

{

dup = false

k = hash(R[i].A)

search all tuples at H[k], looking for R[i].A; if not found add R[i].A to tuples at H[k]

}

Gather all tuples from hash table to store into result table.

Hashing: using a hash table H

Page 59: Relational Algebra

CreateView P’

As Select * from P where Color = ‘Red’ and Weight < 20

 

CreateView SP’

As Select * from SP where Qty > 200

 

CreateView S’

As Select * from S where City = ‘London’

 

Select name from S’, SP’, P’

Where S’.S# = SP’.S# and

SP’.P# = P’.P#

 

Could create these three views concurrently with multiprocessors or multi-core processors.

Find London suppliers who supply red parts with weight < 20 in quantities of 200