Upload
hal
View
22
Download
2
Embed Size (px)
DESCRIPTION
Relational Algebra. SQL specifies what to retrieve but not how to retrieve it. Need a process to translate a descriptive statement into a collection of activities. This is “behind the scenes”, but important to know nonetheless. If a CS student doesn’t know it – who will?. - PowerPoint PPT Presentation
Citation preview
Relational Algebra
SQL specifies what to retrieve but not how to retrieve it.
Need a process to translate a descriptive statement into a collection of activities.
This is “behind the scenes”, but important to know nonetheless.
If a CS student doesn’t know it – who will?
SQL StatementHigh level logic involving table (relation) manipulation
Mid level logic involving loops and comparisons
Low level logic involving B-Tree, index, hash table access, etc.
Result
SQL queries relations and generates a new relation.
Need something that manipulates relations in order to create other relations.
Need to do it efficiently.
Relational Algebra: a collection of operations (activities) to construct new relations from given relations.
Typical process:
SQL relational algebra operations (or something similar) mid/low level
logic result
Reference: First few slides of [http://www.cs.wayne.edu/~shiyong/csc6710/slides/kiferComp_348761_ppt11.ppt], though the notation may be intimidating.
Background:
Tuple (n-tuple): collection of n things written as (a1, a2,…, an).
Typically a tuple represents an entity in a relation (a row in a table).
Definition: 2 relations are Union-compatible if they have same degree (number of attributes) and the i th attributes of each are defined on same domains.
The first three definitions that follow apply only to Union-compatible relations.
Set operations:
Union
A B is the set of elements belonging to either A or B.
Venn Diagram
A B
SQL Server:SELECT *
FROM S
WHERE status = 20
UNION
SELECT *
FROM S
WHERE status = 50
Difference (Minus):
A – B is the set of elements belonging to A but not to B.
Venn Diagram.
A – B
SQL Server uses Except instead of Minus
Supplier table: Create a view defined by that below (all suppliers not in Paris).SELECT *
FROM S
EXCEPT
SELECT *
FROM S
WHERE City = 'Paris'
Intersection
A B is the set of elements belonging to both A and B.
Venn Diagram
A B
SQL Server:SELECT *
FROM S
WHERE status > 20
INTERSECT
SELECT *
FROM S
WHERE status < 50
A×B is the set of all elements of the form (x, y) where x belongs to A and y belongs to B.
Cartesian Product:
A = {a, b} B = {1, 2, 3}
A×B = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3) }
Supplier table: Create a view defined by
SELECT * FROM S CROSS JOIN SP
Selection (Restriction)
R where predicate where the predicate is some condition that evaluates to true or false.
A row subset of relation R.
Select * from R where predicate.
Let R be a relation and X, Y, …, Z be among R’s attributes.
Then R[X, Y, …, Z] is the set of elements from R restricted to attributes from X, Y, … , Z with duplicates removed.
Column subset of R
Select distinct attribute list from R
Projection
Join
Cartesian Product followed by a selection
A×B where predicate.
SELECT * FROM S CROSS JOIN SP where S.S# = SP.S#
Or
SELECT * FROM S, SP where S.S#=SP.S#
A join B
Let Ci, … , Cj be attribute names common to A and B. Then A JOIN B is
(A ×B)[all attributes with duplicates removed] where
A.Ci = B.Ci and
:
:
A.Cj = B.Cj
Natural Join:
Example, Consider S join SP. Suppose S9 is a supplier but S9 supplies no parts, yet. An entry for S9 will NOT appear in S join SP.
S left outerjoin SP will include every row from the left table (S) with NULLS filled in if there’s no matching S# in the SP table.
Outer Join:
Inner Join
Conventional join operation but the phrase is used if there’s the potential for confusion with outer join.
Examples
select * from S inner join SP on S.S#=SP.S#
select * from S left outer join SP on S.S#=SP.S#
Get supplier names for suppliers who supply part P2.
( ( SP where P# = ‘P2) join S) [SName]
1. Find a row subset (Selection) of SP where P#=‘P2’
2. Do a natural join with S
3. Find a column subset of the result (project on Sname)
Get supplier names for suppliers who supply at least one red part.
( ( ( P where color = ‘Red’) Join SP) [S#] Join S) [Sname]
1. Find a row subset (Selection) of P where color=‘Red’
2. Do a natural join with SP
3. Project on S#
4. Do a natural join with S
5. Project on SName
Get supplier names for suppliers who do not supply part P2.
S[Sname] Minus ( ( SP where P# = ‘P2) join S) [Sname]
1. Project S on SName
2. Find a row subset of SP where P#=‘P2’
3. Join the result of 2 with S
4. Project the result of 3 on Sname
5. Calculate the difference between the result of 1 and the result of 2
Let A be a relation of degree m+n and B be a relation of degree n
Visualize an m+n tuple (entry) of A as a pair (x, y) where
x is an m-tuple (1st m attributes) and y is an n-tuple (last n attributes)
Also suppose the n attributes of B are defined on the same domains as the last n attributes of A.
Then C = A dividedby B is a relation consisting of m-tuples x where for all y in B there is a pair (x, y) in A.
Division:
NOTE: only 5 operations: restriction, projection, product, union, and difference are primitive.
AB = A – (A – B)
A dividedby B = A[x] – (A[x]×B – A)[x].
SP dividedby B
B is: 1) P1 or 2) P2 and P4 or 3) P1 through P6
Supplier numbers that supply all parts in B
SP Dividedby B is: 1) S1 and S2 or 2) S1 and S4 or 3) S1
Example:
S# P#S1 P1S1 P2S1 P3S1 P4S1 P5S1 P6S2 P1S2 P2S2 P3S3 P2S3 P3S4 P2S4 P3S4 P4
A dividedby B = A[x] – (A[x]×B – A)[x].
Apply to previous example where A = SP.
B = P1 B = P2 and P4 B = P1 thru P6
A[x] S1 thru S4 S1 thru S4 S1 thru S4A[x] ×B (Si, P1) i = 1…4 (Si, P2) i = 1…4
(Si, P4) i = 1…4(S1, Pi) i = 1…6(S2, Pi) i = 1…6(S3, Pi) i = 1…6(S4, Pi) i = 1…6
A[x] ×B - A (S3, P1)(S4, P1)
(S2, P4)(S3, P4)
(S2, Pi) i=3,4,5,6(S3, Pi) i=1,3,4,5,6(S4, Pi) i=1,3,6
(A[x] ×B – A)[x] S3 and S4 S2 and S3 S2, S3, S4A[x] -(A[x] × B – A)[x] S1 and S2 S1 and S4 S1
Get supplier names for suppliers who supply all parts.
( ( SP dividedby P[P#]) Join S)[Sname]
or
( ( SP[S#] - (SP[S#] times P[P#] – SP)[S#]) Join S)[Sname]
Recall previous notes in SQL where finding ALL of something required double “not exist” subqueries.
Example
Select S.name
From S, SP
Where S.S# = SP.S#
And SP.P# = ‘P2’
Suppose 100 suppliers and 10,000 shipments
Consider
(S times SP) where S.S#=SP.S# and SP.P#=’P2’
vs
(S times (SP where SP.P#=’P2’) where S.S# = SP.S#
First option creates a VERY large intermediate table.
Query Optimization: Introduction to Database Systems by C. J. Date 8th ed
Slide 4 of [http://www.cs.wayne.edu/~shiyong/csc6710/slides/kiferComp_348761_ppt11.ppt]
Internal representation of a relational algebra expression (e.g. a query tree)
Stages of query optimization:
S SPSPS
Where SP.P# = ‘P2
join
Project on name
join
Where SP.P# = ‘P2
Project on name
(S join (SP where P#=’P2’))[name] ((S join SP) where P#=’P2) [name]
Canonical formsSet of queries, called C, such that for every possible query there is a query in C that is equivalent.
Reason for this? Get suppliers who supply P2 can be expressed in 8 different ways. Ideally, the efficiency should not depend on the query form.
Where p or (q and r) where (p or q) and (p or r) -- Conjunctive normal form
(A where p) where q A where (p and q)
(A[attributes]) where p (A where p)[attributes]
A.F1 > B.F2 and B.F2 = 3 A.F1 > 3
Examples:
Consider: Get suppliers not in London or Paris.
SELECT S#, Sname, Status, City
FROM dbo.S
WHERE (City <> 'Paris') OR
(City <> 'London')
Does this yield the expected result?
NOT (p and q) NOT p or NOT q (DeMorgan’s laws
NOT (p or q) NOT p and NOT q (DeMorgan’s laws
If system knew that SP.P# is a foreign key matching a primary key P.P# in P then
(SP join P) [S#] SP[S#]
Query optimizers do not optimize – just try to find “reasonably good” evaluation strategies. Might take longer to find optimal strategy than to do brute force method.
[http://www.cs.wayne.edu/~shiyong/csc6710/slides/kiferComp_348761_ppt11.ppt]
Summaries and References:
“MySQL's query optimization engine isn't always the most efficient when it comes to subqueries.”
often a good idea to convert a subquery to a join
[http://www.databasejournal.com/features/mysql/article.php/3813821/Five-Query-Optimizations-in-MySQL.htm]
Some additional information[http://avid.cs.umass.edu/courses/645/s2009/lectures/Lec12-QueryOptimizer-x6.pdf]
SQL Server: Query Analysis:
Select a database
Create a query window (right click on the database and select new query) and enter an query
Query Display Estimated Execution Plan
Query Include Actual Execution Plan.
When the query is executed there will be an execution plan tab on the result pane.
Some example queries to analyze:
select * from Sselect * from S order by S#select * from S order by statusselect * from SP where Qty = 200select * from S, SP where S.S#=SP.S# and P#='P2'select * from S where S# in (select S# from SP where P#='P2')S2 Replacement ViewAny Red part View.Only Red Parts1 view.Do some views from the genealogy database (see next slide for family tree)
Mike Sarah
Tom Ellen
Jeanne Joseph
TimDanielIraBrianKathy
Jim Alice
John Joanne
Jason Sally Linda
Intro to SQL Server Optimization
[http://www.c-sharpcorner.com/UploadFile/john_charles/QueryoptimizationinSQLServer200512112007154303PM/QueryoptimizationinSQLServer2005.aspx]
Some useful terms:
[http://technet.microsoft.com/en-us/library/cc917672.aspx]
Clustered indexA clustered index is organized as a B-tree, where the nonleaf nodes are index pages and the leaf nodes are data pages.
A table can have only 1.
Nonclustered indexA nonclustered index is organized as a B-tree. Unlike a clustered index, a nonclustered index consists of only index pages. The leaf nodes in a nonclustered index are not data pages, but contain row locators for individual rows in the data pages.
Hash match inner join: [http://blogs.msdn.com/craigfr/archive/2006/08/10/687630.aspx].
Left Semi Join“returns rows from one table that would join with another table without performing a complete join” [http://blogs.msdn.com/craigfr/archive/2006/07/19/671712.aspx]
Left Anti Semi JoinSimilar to above but returns those that would NOT join.
Stream Aggregates: [http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx]
Index scan vs index seek: [http://blogs.msdn.com/craigfr/archive/2006/06/26/647852.aspx
Spooling:
Temporary “caching” of data for use by another activity: [http://technet.microsoft.com/en-us/library/ms191221.aspx]
Summaries at [http://blogs.msdn.com/craigfr/attachment/8508493.ashx] and [http://technet.microsoft.com/en-us/library/ms191158.aspx]
Construct query plans from low level routines and make some attempt at optimizing.
True optimization is sometimes more costly than the savings.
Generating low level routines
Brute Force: Cardinality of R is m, cardinality of S is n
For i = 1 to m do
For j = 1 to n do
If R[i].C = S[j].C then
Add joined tuple, R[i]:S[j] to result table
Join of relations R and S (on attribute C): Intro to Database Systems by C. J. Date
Time is proportional to m*n
Worsedepending on many factors, number of disk I/Os could be proportional to m*n
Potentially large if tens or hundreds of thousands of tuples in each table or if tuples are very large and require a lot of disk I/Os.
Assume index on S.CFor i = 1 to m do
{
Find all records S[j], in S, where S[j].C=R[i].C
//requires a search through the index which is
//far less work than looking through all records in S
Join R[i] with corresponding tuples and add to result table.
}
Indexed method
Assume (or build) a hash table on S.C
For i = 1 to m do
{
k = hash(R[i].C)
search hash list at H[k] looking for matches to R[i].C
if found, add joined tuple to result table.
}
Hash Table:
R… relation; A…attributes; R[i] tuple i of R;
Chosen is a boolean vector, one bit for each R[i]; initially all False.
Nested Loop :
For i = 1 to m do
//look for alternative
If Chosen[i] is true then
{
add R[i].A to result table
for j = i+1 to n
if R[j].A = R[i].A then set Chosen[j]=true
}
Projection:
For i = 1 to m do
{
dup = false
k = hash(R[i].A)
search all tuples at H[k], looking for R[i].A; if not found add R[i].A to tuples at H[k]
}
Gather all tuples from hash table to store into result table.
Hashing: using a hash table H
CreateView P’
As Select * from P where Color = ‘Red’ and Weight < 20
CreateView SP’
As Select * from SP where Qty > 200
CreateView S’
As Select * from S where City = ‘London’
Select name from S’, SP’, P’
Where S’.S# = SP’.S# and
SP’.P# = P’.P#
Could create these three views concurrently with multiprocessors or multi-core processors.
Find London suppliers who supply red parts with weight < 20 in quantities of 200