Part 4

Relational Model


Why Other “Productivity Boosters” Have Failed

- Low level of structural detail
- Record-at-a-time processing orientation (unwilling to give up “control”)
- No sharp distinction between logical and physical
- Limited data independence
- No operations on “aggregates” (files, sets, tables, relations, ...)

Copyright 1971-2004 Thomas P. Sturm Relational Model Part 4, Page 2


Relational Database Model (“Codd's” Model)

- E. F. (Ted) Codd, “A Relational Model of Data for Large Shared Data Banks”, CACM V13 #6 (June, 1970), pp. 377-87.
- Developed in mid-1970’s
- Based on the mathematical theory of relations
- Codd's definition:

  Given sets S1, S2, ... , Sn (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on.

  We shall refer to Sj as the jth domain of R.
  R is said to have degree n.
  If R has m n-tuples (or just tuples), R is said to have cardinality m.
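Codd's definition can be tried out directly with ordinary sets. The following is a minimal sketch, not part of the original notes; the domains `S1` and `S2`, the tuples in `R`, and the helper `is_relation_on` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical domains (assumptions, not from the notes).
S1 = {"able", "baker", "codd"}      # first domain: names
S2 = set(range(0, 120))             # second domain: ages

# A relation on S1, S2: a set of 2-tuples, first element from S1, second from S2.
R = {("able", 20), ("baker", 40), ("codd", 60)}

def is_relation_on(rel, domains):
    """True if every tuple takes its j-th element from the j-th domain."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in rel
    )

degree = len((S1, S2))              # n: number of domains
cardinality = len(R)                # m: number of tuples

# Equivalently, a relation is a subset of the Cartesian product of its domains.
is_subset_of_product = R <= set(product(S1, S2))
```

Because `R` is a set, a duplicate tuple cannot occur and the tuples carry no order, which matches the requirements placed on relations later in these notes.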


The Beginning of Codd's Historic Article


Conceptual Idea of a Relation

Conceptual (but not physical) ideas:

- A relation is a table or a flat file with n columns or fields and m rows or records
- Column (or field) j represents a set of values (from a possible set of values, Sj, the “domain”) for a particular attribute of all the entities
- Each row (or record) represents a set of values for an entity, one for each attribute (column, field)
- Degree - number of columns (fields, domains)
- Cardinality - number of rows (records, entities, tuples)


Translation of Relational Terms

Relational Term   Loose Equivalent
Relation          Table
Tuple             Row
Degree            # of attributes
Cardinality       # of table entries
Domain            field-level edit criteria and integrity constraints


Requirements of a Relation

- All rows of the relation must have the same attributes in the same order
- No repeating groups
- Each row must be unique (no duplicate rows - if there are, they are “cast out”)
- A set of columns that forms an identifier is the table key


Recognizing and Eliminating Repeating Groups

Consider the 3M master sales history file:

1,000,000 entries of the form:

Co Sect Div Grp Dept Item # Item Desc Qty $ Qty $ Qty $ Qty $ Qty $ Qty $ Qty $ Qty $ Qty $ Qty $

where the first Qty/$ group represents sales this month, the second sales last month, and the last sales 9 months ago. The primary key is the first 6 columns.

This needs to be broken up into two tables, with two different keys, as follows:

Co Sect Div Grp Dept Item # Item Desc

and

Co Sect Div Grp Dept Item # Month Year Qty $

- The first table has a 6-part primary key, with 1,000,000 entries.
- The second table has an 8-part primary key, with up to 10,000,000 entries.
- BUT zeroes for both Qty and $ need not be stored - those rows can be eliminated. This actually SAVES space.
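The split described above can be sketched in a few lines. This is an illustrative sketch only: the record layout as a flat tuple, the helper names `months_back` and `split_record`, and the sample values are assumptions, not the actual file format.

```python
from datetime import date

def months_back(anchor, k):
    """Return (month, year) for the month k months before the anchor date."""
    idx = anchor.year * 12 + (anchor.month - 1) - k
    return idx % 12 + 1, idx // 12

def split_record(record, anchor):
    """Split one wide record into an item row and its non-zero sales rows.

    record = (co, sect, div, grp, dept, item_no, item_desc,
              qty0, amt0, qty1, amt1, ..., qty9, amt9)
    """
    key, desc, groups = record[:6], record[6], record[7:]
    item_row = key + (desc,)                      # 6-part key + description
    sales_rows = []
    for k in range(len(groups) // 2):
        qty, amt = groups[2 * k], groups[2 * k + 1]
        if qty == 0 and amt == 0:
            continue                              # all-zero months are not stored
        month, year = months_back(anchor, k)
        sales_rows.append(key + (month, year, qty, amt))   # 8-part key + Qty, $
    return item_row, sales_rows

# Hypothetical record with sales only this month and two months ago.
wide = (1, 2, 3, 4, 5, "A100", "tape") + (12, 30.0, 0, 0, 7, 17.5) + (0, 0) * 7
item, sales = split_record(wide, date(2004, 2, 15))
```

Here the ten Qty/$ groups of one wide record shrink to two stored rows, which is where the space saving comes from.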


Advantages of the Relational Model

- Logical not physical model - easy to communicate
- Data independence - implementation independent
- Record interconnections are dynamically generated based on data value (no user-visible navigation links)
- Set-at-a-time database operations (relational operators): locate, permute, join, select, project, derive, order, format, present
- Join - the operator that “connects” tables - is unrestricted - it is not necessary to pre-define access paths


Differences in the Relational Model

Relational      Non-relational
Set-oriented    Navigational
High-level      Low-level
What            How


Relational View of Sample Database

department

employee

task

project


Details of Department Relation

DeptNo  Dname        Loc  Dbudget
400     programming  200  150000
401     financial    200  275000
402     academic     100  390000
403     support      300  7000

(Figure annotations: the columns are the attributes, with DeptNo labeled domain 1 and Dbudget labeled domain 4; each row is a tuple describing one entity.)


Organization of Relations in Sample Database

Relation (Entity type)

Attributes (Key underlined)

emp (Ename, Job, Mgr, Hired, Rate, Bonus, DeptNo)

dept (DeptNo, Dname, Loc, Dbudget)

task (Ename, Project_id, Tname, Hours)

proj (Project_id, Description, Pbudget, Due_date)


(Single Table) Relational Operations

(Figure: each operation, in sequence, with the specification it takes)

named file, view, or relation          ->  locate relation
boolean entity selection expression    ->  selection
named attributes                       ->  projection
derivation rules                       ->  entry-level derivations
ordering specification                 ->  order
set-function specification             ->  file-level derivations
format, edit spec., destination        ->  formatting & presentation


Relational Algebra

- Relational operators take one or two relations as their “operands” or arguments
- Result of applying a relational operator to a relation (or pair of relations) is another relation
- Consequently, relational operators can be used in sequence to achieve the desired results
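Because every operator returns a relation, operators can be nested freely. A minimal sketch of this property, assuming rows are modeled as frozen sets of (attribute, value) pairs; the helpers `row`, `select`, and `project` are illustrative names, not part of the notes.

```python
def row(**attrs):
    """Build one row as an immutable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def select(rel, pred):
    """Keep the rows satisfying pred; the result is again a relation."""
    return {r for r in rel if pred(dict(r))}

def project(rel, *names):
    """Keep only the named attributes; set semantics casts out duplicates."""
    return {frozenset((k, v) for k, v in r if k in names) for r in rel}

# The department relation from the sample database.
dept = {
    row(DeptNo=400, Dname="programming", Loc=200, Dbudget=150000),
    row(DeptNo=401, Dname="financial", Loc=200, Dbudget=275000),
    row(DeptNo=402, Dname="academic", Loc=100, Dbudget=390000),
    row(DeptNo=403, Dname="support", Loc=300, Dbudget=7000),
}

# Operators used in sequence: the output of select feeds project.
big_locs = project(select(dept, lambda r: r["Dbudget"] > 100000), "Loc")
```

Three departments pass the selection, but two of them share location 200, so the projected relation holds only two rows.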


Locate Relation

- Table may only “logically” exist

  (Figure: user; view 1, view 2; relation 1 through relation 4; file 1 through file 3)

- Table may contain “derived” elements

  finance table (view)

  Ename   Rate  Bonus  task_hours
  smith   35           165
  jones   35           145
  king    18           49
  turner  75    1000   57

  Ename, Rate, and Bonus come from the emp table; task_hours comes from the task table.

- Cardinality may be determined by user type (how much you get depends on who you are)


Selection

- Construct a new table by taking a subset of the tuples in the relation
- Done by taking a horizontal subset of the table
- Select all rows of the relation that satisfy some specified condition
- In SQL: SELECT * FROM EMP WHERE JOB = 'PROGRAMMER';
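The selection can be run against an in-memory SQLite database. This is a sketch under assumptions: the `emp` table is cut down to three columns, and the job titles are hypothetical (the notes do not list them); only the names and rates follow the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, rate INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("smith", "PROGRAMMER", 35),   # job values are hypothetical
        ("jones", "PROGRAMMER", 35),
        ("king", "CLERK", 18),
        ("turner", "MANAGER", 75),
    ],
)

# Horizontal subset: all columns, only the rows meeting the condition.
selected = conn.execute(
    "SELECT * FROM emp WHERE job = 'PROGRAMMER' ORDER BY ename"
).fetchall()
```

The result keeps every column of `emp` but only the two rows whose job matches.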


Projection

- Construct a new table by taking a subset of the attributes in a relation
- Done by taking vertical slices out of the table
- Extract all columns whose names have been specified
- (Prior to permanent storage) Remove all duplicate rows in the resulting table
- In SQL: SELECT ENAME, JOB, RATE FROM EMP;
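In most SQL implementations a plain SELECT does not cast out duplicate rows by itself; DISTINCT asks for that explicitly. A small sketch with SQLite, again using a cut-down `emp` table with hypothetical job titles:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, rate INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("smith", "PROGRAMMER", 35),   # job values are hypothetical
        ("jones", "PROGRAMMER", 35),
        ("king", "CLERK", 18),
        ("turner", "MANAGER", 75),
    ],
)

# Vertical slice: two columns only. Dropping ename makes two rows identical.
with_duplicates = conn.execute("SELECT job, rate FROM emp").fetchall()

# DISTINCT removes the duplicate, giving a true relation.
as_relation = conn.execute("SELECT DISTINCT job, rate FROM emp").fetchall()
```

Projecting onto ENAME, JOB, RATE as in the notes produces no duplicates as long as ENAME is a key, which is why the difference only shows up once the key column is left out.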


Entry-Level Derivations

- Construct a new table by appending new attributes to the relation that are derived from existing attributes
- Done by extending the table horizontally with generated columns
- Specify a rule for generating the new columns from existing columns
- New columns generated may be named for future reference
- In SQL: SELECT ENAME, RATE, BONUS, TASK_HOURS, RATE*TASK_HOURS + BONUS AS EXTEND FROM FINANCE;

finance table (view)

Ename   Rate  Bonus  task_hours  extend
smith   35           165         5775
jones   35           145         5075
king    18           49          882
turner  75    1000   57          5275

New column (extend) generated by the formula

(Rate × task_hours) + Bonus
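One detail is worth checking when trying the derivation: if the blank Bonus entries are stored as NULL (an assumption; they could equally be stored as 0), arithmetic on them yields NULL in standard SQL. The sketch below uses SQLite and shows both the plain formula and a variant with COALESCE, which is an addition not found in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE finance (ename TEXT, rate INTEGER, bonus INTEGER, task_hours INTEGER)"
)
conn.executemany(
    "INSERT INTO finance VALUES (?, ?, ?, ?)",
    [
        ("smith", 35, None, 165),    # blank bonus assumed to be NULL
        ("jones", 35, None, 145),
        ("king", 18, None, 49),
        ("turner", 75, 1000, 57),
    ],
)

# Formula exactly as written in the notes: NULL bonus makes the result NULL.
plain = conn.execute(
    "SELECT ename, rate*task_hours + bonus AS extend FROM finance ORDER BY ename"
).fetchall()

# Treating a missing bonus as zero reproduces the extend column shown above.
safe = conn.execute(
    "SELECT ename, rate*task_hours + COALESCE(bonus, 0) AS extend "
    "FROM finance ORDER BY ename"
).fetchall()
```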


Order

• Construct a “new” table by permuting the tuples in a relation
• Done by rearranging the rows of the table
• Sort the rows of the relation using any combination of the attributes as sorting keys
• Actually, two relations differing only in the positions of their tuples (rows) are considered the same (this is also true of permuting the attributes, provided the names and domains are carried along)
• In SQL: SELECT * FROM DEPT ORDER BY DNAME;

DeptNo  Dname        Loc  Dbudget
400     programming  200  150000
401     financial    200  275000
402     academic     100  390000
403     support      300  7000

Order by Dname to obtain:

DeptNo  Dname        Loc  Dbudget
402     academic     100  390000
401     financial    200  275000
400     programming  200  150000
403     support      300  7000
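The ordering can be reproduced with SQLite. A minimal sketch; the column types are assumptions, and the data is the department relation from the sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT, loc INTEGER, dbudget INTEGER)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?, ?)",
    [
        (400, "programming", 200, 150000),
        (401, "financial", 200, 275000),
        (402, "academic", 100, 390000),
        (403, "support", 300, 7000),
    ],
)

ordered = conn.execute("SELECT * FROM dept ORDER BY dname").fetchall()
dept_numbers = [r[0] for r in ordered]

# As relations (sets of tuples) the sorted and unsorted results are identical.
unordered = conn.execute("SELECT * FROM dept").fetchall()
same_relation = set(ordered) == set(unordered)
```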


File-Level Derivations

- Obtain the results of set-level functions (everything is a set in the relational model)
- Done by obtaining one or more “tables” of results (many are “tables” with one row and one attribute)
- Perform sum, count, count unique, minimum, maximum, mean, variance, standard deviation, etc. on specified columns
- Function results are generally not “appended” to the existing “table” in the relational sense - they represent “new” tables - with their own domains
- In SQL: SELECT COUNT(DEPTNO), COUNT(DISTINCT LOC), SUM(DBUDGET) FROM DEPT;

DeptNo  Dname        Loc  Dbudget
400     programming  200  150000
401     financial    200  275000
402     academic     100  390000
403     support      300  7000

Result:

4        count of departments
3        count of unique locations
822000   total of all budgets
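The three set functions can be checked against the department data with SQLite. A minimal sketch; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT, loc INTEGER, dbudget INTEGER)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?, ?)",
    [
        (400, "programming", 200, 150000),
        (401, "financial", 200, 275000),
        (402, "academic", 100, 390000),
        (403, "support", 300, 7000),
    ],
)

# The result is a new one-row table, not extra columns on dept.
result = conn.execute(
    "SELECT COUNT(deptno), COUNT(DISTINCT loc), SUM(dbudget) FROM dept"
).fetchall()
```

The query returns a single row holding the three values, which illustrates the point that function results form a new table with its own domains.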


Formatting and Presentation

Prepare the results in a form suitable for:

- transmission
- storage
- printing
- plotting
- screen display, etc.

Results may be sent to

- user screens
- hard-copy devices
- other software packages (e.g. Lotus 1-2-3), etc.

Specifications may be given through

- the query language (interactive or embedded)
- a separate report writer
- a variety of screen-oriented tools


Two Table Relational Operations

Cartesian Product
- Each row of the second table appended to each row of the first table
- No compatibility requirements

Join
- A form of parallel table lookup
- Both tables must share a domain

Union
- All rows of the second table appended to the rows of the first table
- Both tables must have the same domains

Set Difference
- All rows of the first table whose keys do not appear as keys in the second table
- Both tables must share the same domains for their keys


Cartesian Product

If R1 and R2 are relations, the Cartesian product is written R1 × R2 (in relational algebra) or SELECT * FROM R1, R2; (in SQL)

A new relation is generated that consists of every tuple in R1 followed by every tuple in R2, one row for each possible pairing

relation empl

name   age  dept
able   20   35
baker  40   45
codd   60   45
date   30   25

relation group

dept  loc
35    100
45    200
25    100

Cartesian product empl × group

empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
able       20        35         45          200
able       20        35         25          100
baker      40        45         35          100
baker      40        45         45          200
baker      40        45         25          100
codd       60        45         35          100
codd       60        45         45          200
codd       60        45         25          100
date       30        25         35          100
date       30        25         45          200
date       30        25         25          100
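The product of a 4-row and a 3-row relation has 4 × 3 = 12 rows, which can be confirmed with SQLite. In this sketch the second table is named `grp` rather than `group`, an assumption made because GROUP is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empl (name TEXT, age INTEGER, dept INTEGER)")
conn.execute("CREATE TABLE grp (dept INTEGER, loc INTEGER)")   # "group" is reserved
conn.executemany(
    "INSERT INTO empl VALUES (?, ?, ?)",
    [("able", 20, 35), ("baker", 40, 45), ("codd", 60, 45), ("date", 30, 25)],
)
conn.executemany("INSERT INTO grp VALUES (?, ?)", [(35, 100), (45, 200), (25, 100)])

# No WHERE clause: every empl row is paired with every grp row.
rows = conn.execute("SELECT * FROM empl, grp").fetchall()
n_empl = conn.execute("SELECT COUNT(*) FROM empl").fetchone()[0]
n_grp = conn.execute("SELECT COUNT(*) FROM grp").fetchone()[0]
```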


Join Operation

- Form the Cartesian product between two relations
- Cast out duplicates (assuming projection is done also)
- Apply join conditions to select a subset of the Cartesian product (selection)
- There are a variety of different join types, differentiated by
  • which relations are used
  • what the join conditions are
  • what results are desired


Natural Join Operation (Simple join, inner equijoin)

- Start with two different tables, form the Cartesian product (e.g. empl × group)

empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
able       20        35         45          200
able       20        35         25          100
baker      40        45         35          100
baker      40        45         45          200
baker      40        45         25          100
codd       60        45         35          100
codd       60        45         45          200
codd       60        45         25          100
date       30        25         35          100
date       30        25         45          200
date       30        25         25          100

- Select rows where values of a pair of fields are equal (e.g. empl.dept and group.dept)

empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
baker      40        45         45          200
codd       60        45         45          200
date       30        25         25          100

- Project all except the duplicated column

empl.name  empl.age  dept  group.loc
able       20        35    100
baker      40        45    200
codd       60        45    200
date       30        25    100
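The three steps above translate almost line for line into Python. This is a conceptual sketch with tuples in lists; a real DBMS would normally avoid materializing the full product.

```python
from itertools import product

empl = [("able", 20, 35), ("baker", 40, 45), ("codd", 60, 45), ("date", 30, 25)]
group = [(35, 100), (45, 200), (25, 100)]

# Step 1: Cartesian product (name, age, empl.dept, group.dept, loc).
cartesian = [e + g for e, g in product(empl, group)]

# Step 2: selection, keep rows where empl.dept equals group.dept.
matched = [r for r in cartesian if r[2] == r[3]]

# Step 3: projection, drop the duplicated dept column.
joined = [(name, age, dept, loc) for name, age, dept, _dup, loc in matched]
```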


Expressing the Natural Join

- The natural join is written:

  empl × group WHERE empl.dept = group.dept

  in the relational algebra
- The natural join is written:

  SELECT * FROM EMPL, GROUP WHERE EMPL.DEPT = GROUP.DEPT;

  in SQL
- The natural join performs a “table lookup” function by “looking up” data from the second table for a field in the first table
- Unfortunately, if no match is found for an item “looked up” in the first table, that row in the first table is “lost”


Inner and Outer Joins

relation empl

name  age  dept
fox   30   25
gun   35   27
hal   27   30

relation group

dept  loc
25    100
30    200
40    150

Inner join produces:

empl.name  empl.age  dept  group.loc
fox        30        25    100
hal        27        30    200

Outer join produces:

empl.name  empl.age  dept  group.loc
fox        30        25    100
hal        27        30    200
gun        35        27
                     40    150
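The inner and outer results above can be reproduced with SQLite. In this sketch the second table is named `grp` (an assumption, since GROUP is a reserved word), and the full outer join is assembled from a LEFT JOIN plus the unmatched right-hand rows so that it does not depend on FULL OUTER JOIN support, which only newer SQLite versions provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empl (name TEXT, age INTEGER, dept INTEGER)")
conn.execute("CREATE TABLE grp (dept INTEGER, loc INTEGER)")
conn.executemany(
    "INSERT INTO empl VALUES (?, ?, ?)",
    [("fox", 30, 25), ("gun", 35, 27), ("hal", 27, 30)],
)
conn.executemany("INSERT INTO grp VALUES (?, ?)", [(25, 100), (30, 200), (40, 150)])

inner = conn.execute(
    "SELECT e.name, e.age, e.dept, g.loc FROM empl e "
    "JOIN grp g ON e.dept = g.dept ORDER BY e.name"
).fetchall()

# Left outer join: unmatched empl rows are kept, padded with NULL.
left = conn.execute(
    "SELECT e.name, e.age, e.dept, g.loc FROM empl e "
    "LEFT JOIN grp g ON e.dept = g.dept ORDER BY e.name"
).fetchall()

# Rows of grp that matched nothing, padded with NULL on the empl side.
unmatched_right = conn.execute(
    "SELECT NULL, NULL, g.dept, g.loc FROM grp g "
    "WHERE NOT EXISTS (SELECT 1 FROM empl e WHERE e.dept = g.dept)"
).fetchall()

full_outer = left + unmatched_right
```

Python's `None` plays the role of the blank (null) cells in the tables above.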


Outer Joins

Outer Join

• selects from two different tables
• keeps rows where values of a pair of fields are equal
• if a row in either original table does not appear as a part of any row after the inner join, that original row is appended to the join table with null values in the remaining fields
• In ANSI SQL (Oracle's (+) operator cannot be applied to both sides of one condition): SELECT * FROM EMPL FULL OUTER JOIN GROUP ON EMPL.DEPT = GROUP.DEPT;

Left Outer Join

• only members from the left original table are appended

Right Outer Join

• only members from the right original table are appended


Left and Right Outer Joins

Left outer join produces:

empl.name  empl.age  dept  group.loc
fox        30        25    100
hal        27        30    200
gun        35        27

• In Oracle SQL: SELECT * FROM EMPL, GROUP WHERE EMPL.DEPT = GROUP.DEPT (+);

Right outer join produces:

empl.name  empl.age  dept  group.loc
fox        30        25    100
hal        27        30    200
                     40    150

• In Oracle SQL: SELECT * FROM EMPL, GROUP WHERE EMPL.DEPT (+) = GROUP.DEPT;


Other Types of Joins

Self join

• joins a table with itself
• e.g. desire a list of all employees with the department number of their manager
• two fields in the table must have the same domain

Non-equijoin (θ join or theta join)

• the condition for keeping rows is not an equality
• inequality (<, >, ≤, ≥, !=)
• wild-card match (= 'S*', = '*MAN', = '1?')
• other (IN, BETWEEN, LIKE, IS NULL, NOT)
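The self join example (employees with the department number of their manager) might look as follows in SQLite. This sketch assumes that the Mgr column holds the manager's Ename, so that Mgr and Ename share a domain; the rows themselves are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, mgr TEXT, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("smith", "turner", 400),   # hypothetical reporting structure
        ("jones", "king", 401),
        ("king", "turner", 402),
        ("turner", None, 403),      # no manager recorded
    ],
)

# The same table appears twice under two aliases: e = employee, m = manager.
rows = conn.execute(
    "SELECT e.ename, m.deptno AS mgr_deptno "
    "FROM emp AS e, emp AS m "
    "WHERE e.mgr = m.ename "
    "ORDER BY e.ename"
).fetchall()
```

Because this is an inner join, `turner`, who has no manager, does not appear in the result.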


Relational Algebra Notation

Project:       R2 = πR1 (attribute 1, attribute 2, ...)

Select:        R3 = R2 WHERE condition

Product (or Cartesian Product):  R5 = R3 × R4

Join:          R7 = R5 × R6 WHERE condition

Union:         R9 = R7 UNION R8  or  R9 = R7 + R8

Difference:    R11 = R9 - R10

Intersection:  R13 = R11 INTERSECT R12

Division:      R15 = R13 / R14


Union, Difference, and Intersection

Suppose sets A and B have duplicated rows, as shown by the crosshatched region below.

(Figure: overlapping sets A and B, with the overlap crosshatched)

Union (A + B): Everything in one relation or the other or both, but only one copy

Difference (A - B): Everything in the first relation that is NOT duplicated in the second relation

Intersection (A Intersect B): One copy of everything that appears in both relations
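With relations modeled as Python sets of tuples, the three operations map directly onto the built-in set operators. The rows in `A` and `B` are hypothetical and share the same two domains (name, age).

```python
# Two union-compatible relations; ("codd", 60) is the duplicated row.
A = {("able", 20), ("baker", 40), ("codd", 60)}
B = {("codd", 60), ("date", 30)}

union = A | B           # A + B: everything in either, only one copy
difference = A - B      # A - B: rows of A not duplicated in B
intersection = A & B    # A INTERSECT B: one copy of the shared rows
```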


Division

- Division is a rather unusual construction in the relational algebra.
- It is generally used to establish association rather than causation.
- It is useful in data mining applications.
- If we divide relation A by relation B, then the result is those first parts of rows in A that appear in A paired with every one of the rows of relation B.

For example, if relation A is:

Ename   Project_id
allen   admit
allen   billing
barger  admit
barger  alumni
barger  billing
jones   billing
jones   budget
king    admit

...and if relation B is:

Project_id
admit
billing

... then A / B is

Ename
allen
barger
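The division example can be verified with a short function. This is a sketch for the two-column case only; the name `divide` is illustrative.

```python
def divide(a, b):
    """Return the x values paired in a with every y value listed in b.

    a is a set of (x, y) tuples; b is a set of (y,) tuples.
    """
    required = {y for (y,) in b}
    candidates = {x for x, _ in a}
    return {
        x
        for x in candidates
        if required <= {y for x2, y in a if x2 == x}   # subset test
    }

A = {
    ("allen", "admit"), ("allen", "billing"),
    ("barger", "admit"), ("barger", "alumni"), ("barger", "billing"),
    ("jones", "billing"), ("jones", "budget"),
    ("king", "admit"),
}
B = {("admit",), ("billing",)}

quotient = divide(A, B)
```

`jones` and `king` are excluded because each is missing one of the two projects in B, while the extra `alumni` row for `barger` does no harm.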


Example Use of Relational Operators

1. Retrieve information on “large” projects - projects with a budget in excess of $10,000

   R1 = proj WHERE Pbudget > 10000

2. The project_id's of large projects

   R2 = πR1 (Project_id)

3. Budgets and names of projects 'olson' is working on

   R3 = proj × task (Ename, Description, Pbudget) WHERE task.Project_id = proj.Project_id
   R4 = R3 WHERE Ename = 'olson'
   R5 = πR4 (Description, Pbudget)

4. Budgets and names of projects 'olson' is working on - in a single expression

   R6 = proj × task (Description, Pbudget) WHERE task.Project_id = proj.Project_id AND Ename = 'olson'


Example Use of SQL

1. Retrieve information on “large” projects - projects with a budget in excess of $10,000

   SELECT * FROM PROJ WHERE PBUDGET > 10000;

2. The project_id's of large projects

   SELECT PROJECT_ID FROM PROJ WHERE PBUDGET > 10000;

3. Budgets and names of projects 'olson' is working on

   SELECT PROJ.DESCRIPTION, PROJ.PBUDGET FROM PROJ, TASK WHERE TASK.PROJECT_ID = PROJ.PROJECT_ID AND TASK.ENAME = 'olson';
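The three queries can be exercised with SQLite. The schema follows the proj and task relations of the sample database, but every row below (descriptions, budgets, due dates, task names, hours) is hypothetical, as are the column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE proj (project_id TEXT, description TEXT, pbudget INTEGER, due_date TEXT)"
)
conn.execute("CREATE TABLE task (ename TEXT, project_id TEXT, tname TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO proj VALUES (?, ?, ?, ?)",
    [
        ("admit", "admissions system", 25000, "2004-06-30"),   # hypothetical rows
        ("billing", "billing rewrite", 8000, "2004-09-30"),
        ("alumni", "alumni records", 12000, "2004-12-31"),
    ],
)
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?, ?)",
    [
        ("olson", "admit", "design", 40),
        ("olson", "billing", "test", 10),
        ("king", "alumni", "code", 25),
    ],
)

# 1. Large projects (ORDER BY added only to make the output deterministic).
large = conn.execute(
    "SELECT * FROM proj WHERE pbudget > 10000 ORDER BY project_id"
).fetchall()

# 2. Their project ids.
large_ids = conn.execute(
    "SELECT project_id FROM proj WHERE pbudget > 10000 ORDER BY project_id"
).fetchall()

# 3. Budgets and names of the projects olson works on.
olson = conn.execute(
    "SELECT proj.description, proj.pbudget FROM proj, task "
    "WHERE task.project_id = proj.project_id AND task.ename = 'olson' "
    "ORDER BY proj.pbudget"
).fetchall()
```

Note that the string comparison is case-sensitive, so the literal has to match the stored value exactly.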