2004-10-06 SLIDE 1 IS 257 – Fall 2004 Relational Algebra and Calculus: Introduction to SQL University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management

Chapter 9

Page 1: Chapter 9

2004-10-06 SLIDE 1 IS 257 – Fall 2004

Relational Algebra and Calculus: Introduction to SQL

University of California, Berkeley

School of Information Management and Systems

SIMS 257: Database Management

Page 2: Chapter 9

Lecture Outline

• Review
  – Design to Relational Implementation

• Relational Algebra

• Relational Calculus

• Introduction to SQL

Page 4: Chapter 9

Database Design Process

[Diagram. Labels: Conceptual requirements (Applications 1–4), Conceptual Model, Logical Model, Internal Model, External Models (Applications 1–4).]

Page 5: Chapter 9

Original Cookie ER Diagram

[ER diagram. Entities: BIBFILE, LIBFILE, INDXFILE, SUBFILE, CALLFILE, PUBFILE, AUTHORS, AU_BIB. Attributes shown: accno, pubid, libid, subcode, AU_ID, Author.]

Note: the diagram contains only the attributes used for linking.

Page 6: Chapter 9

What Problems?

• What sorts of problems and missing features arise given the previous ER diagram?

Page 7: Chapter 9

Problems Identified

• Subtitles, parallel titles?
• Edition information
• Series information
• Lending status
• Material type designation
• Genre, class information
• Better codes (ISBN?)
• Missing information (ISBN)
• Authority control for authors
• Missing/incomplete data
• Data entry problems
• Ordering information
• Illustrations
• Subfield separation (such as last_name, first_name)
• Separate personal and corporate authors

Page 8: Chapter 9

Problems (Cont.)

• Location field inconsistent
• No notes field
• No language field
• Zipcode doesn’t support plus-4
• No publisher shipping addresses
• No (indexable) keyword search capability
• No support for multivolume works
• No support for URLs
  – to online version
  – to libraries
  – to publishers

Page 9: Chapter 9

Cookie2: Separate Name Authorities

[ER diagram. The original entities (BIBFILE, LIBFILE, INDXFILE, SUBFILE, CALLFILE, PUBFILE; linking attributes accno, pubid, libid, subcode) plus the new entities AUTHFILE and AUTHBIB, with attributes nameid, authid, authtype, accno.]

Page 10: Chapter 9

Cookie 3: Keywords

[ER diagram. The Cookie2 entities (BIBFILE, LIBFILE, INDXFILE, SUBFILE, CALLFILE, PUBFILE, AUTHFILE, AUTHBIB) plus the new entities KEYMAP and TERMS, with linking attributes accno and termid.]

Page 11: Chapter 9

Cookie 4: Series

[ER diagram. The Cookie 3 entities (BIBFILE, LIBFILE, INDXFILE, SUBFILE, CALLFILE, PUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS) plus the new entity SERIES, with attributes seriesid and ser_title.]

Page 12: Chapter 9

Cookie 5: Circulation

[ER diagram. The Cookie 4 entities (BIBFILE, LIBFILE, INDXFILE, SUBFILE, CALLFILE, PUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS, SERIES) plus the new entities CIRC and PATRON, with attributes circid, copynum, patronid.]

Page 13: Chapter 9

Logical Model: Mapping to Relations

• Take each entity
  – BIBFILE
  – LIBFILE
  – CALLFILE
  – SUBFILE
  – PUBFILE
  – INDXFILE
• And make it a table...

Page 14: Chapter 9

Lecture Outline

• Review
  – Design to Relational Implementation

• Relational Algebra

• Relational Calculus

• Introduction to SQL

Page 15: Chapter 9

Relational Algebra

• Relational Algebra is a collection of operators that take relations as their operands and return a relation as their result
• First defined by Codd
  – Includes 8 operators
    • 4 derived from traditional set operators
    • 4 new relational operations

From: C.J. Date, Database Systems 8th ed.

Page 16: Chapter 9

Relational Algebra Operations

• Restrict

• Project

• Product

• Union

• Intersect

• Difference

• Join

• Divide

Page 17: Chapter 9

Restrict

• Extracts specified tuples (rows) from a specified relation (table)
  – Restrict is AKA “Select”

Page 18: Chapter 9

Project

• Extracts specified attributes (columns) from a specified relation.
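Restrict and Project can be sketched in a few lines of Python. This is only an illustration: the `parts` rows and the helper names `restrict` and `project` are assumptions made for the example, not part of the lecture.

```python
# Hypothetical sample rows, loosely modeled on the Part table used later in the slides.
parts = [
    {"part_no": 1, "name": "Big blue widget", "price": 3.76},
    {"part_no": 3, "name": "Tiny red widget", "price": 5.25},
    {"part_no": 4, "name": "Large red widget", "price": 157.23},
]

def restrict(relation, predicate):
    """Restrict: keep only the tuples (rows) for which the predicate holds."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Project: keep only the named attributes (columns), dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {attr: row[attr] for attr in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

cheap = restrict(parts, lambda row: row["price"] < 10)
names = project(cheap, ["name"])
print(names)
```

Both helpers return a new relation, so their results can be fed straight into further operators.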

Page 19: Chapter 9

Product

• Builds a relation from two specified relations consisting of all possible concatenated pairs of tuples, one from each of the two relations. (AKA Cartesian Product)

[Figure: Product of {a, b, c} and {x, y} gives (a, x), (a, y), (b, x), (b, y), (c, x), (c, y).]
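The Cartesian product on this slide can be reproduced with Python's standard `itertools.product`. The variable names are chosen for the example; each relation is simplified to a list of single values.

```python
from itertools import product

left = ["a", "b", "c"]   # first relation, one attribute
right = ["x", "y"]       # second relation, one attribute

# Every tuple of `left` paired with every tuple of `right`.
pairs = list(product(left, right))
print(pairs)
```

The result has 3 × 2 = 6 tuples, which is why an unrestricted product of large tables grows so quickly.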

Page 20: Chapter 9

Union

• Builds a relation consisting of all tuples appearing in either or both of two specified relations.

Page 21: Chapter 9

Intersect

• Builds a relation consisting of all tuples appearing in both of two specified relations

Page 22: Chapter 9

Difference

• Builds a relation consisting of all tuples appearing in the first relation but not the second.
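Union, Intersect, and Difference map directly onto Python's built-in set operators. The two relations `r` and `s` below are made-up sample data with the same two attributes, as the set operators require.

```python
# Two hypothetical relations with identical attributes, stored as sets of tuples.
r = {("A1", "B1"), ("A2", "B1")}
s = {("A2", "B1"), ("A3", "B2")}

union = r | s          # tuples in either or both relations
intersect = r & s      # tuples in both relations
difference = r - s     # tuples in r but not in s

print(sorted(union), sorted(intersect), sorted(difference))
```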

Page 23: Chapter 9

Join

• Builds a relation from two specified relations consisting of all possible concatenated pairs, one from each of the two relations, such that in each pair the two tuples satisfy some condition. (E.g., equal values in a given col.)

First relation:           (A1, B1), (A2, B1), (A3, B2)
Second relation:          (B1, C1), (B2, C2), (B3, C3)
(Natural or Inner) Join:  (A1, B1, C1), (A2, B1, C1), (A3, B2, C2)
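A minimal sketch of the natural join, using the values from this slide. Tuples are joined when their B values are equal; the list names are chosen for the example.

```python
left = [("A1", "B1"), ("A2", "B1"), ("A3", "B2")]
right = [("B1", "C1"), ("B2", "C2"), ("B3", "C3")]

# Concatenate each pair of tuples whose shared attribute (B) matches,
# keeping the shared attribute only once.
joined = [(a, b, c) for (a, b) in left for (b2, c) in right if b == b2]
print(joined)
```

The (B3, C3) tuple has no partner in the first relation, so it does not appear in the result.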

Page 24: Chapter 9

Outer Join

• Outer Joins are similar to JOIN, but keep every row of the first table, leaving NULLs where there is no corresponding row in the second.

First table:   (A1, B1), (A2, B1), (A3, B2), (A4, B7)
Second table:  (B1, C1), (B2, C2), (B3, C3)
Outer Join:    (A1, B1, C1), (A2, B1, C1), (A3, B2, C2), (A4, *, *)
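The outer join can be sketched as a small extension of the join logic, using the values from this slide. The function name `left_outer_join` is an assumption for the example, and `None` stands in for NULL; here the unmatched row keeps its own B value and gets `None` only for the missing C.

```python
left = [("A1", "B1"), ("A2", "B1"), ("A3", "B2"), ("A4", "B7")]
right = [("B1", "C1"), ("B2", "C2"), ("B3", "C3")]

def left_outer_join(left_rows, right_rows):
    """Join on B, but keep left rows without a match, padded with None."""
    result = []
    for a, b in left_rows:
        matches = [c for (b2, c) in right_rows if b2 == b]
        if matches:
            result.extend((a, b, c) for c in matches)
        else:
            result.append((a, b, None))  # no partner: NULL for the right-hand column
    return result

outer = left_outer_join(left, right)
print(outer)
```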

Page 25: Chapter 9

Divide

• Takes two relations, one binary and one unary, and builds a relation consisting of all values of one attribute of the binary relation that match (in the other attribute) all values in the unary relation.

Binary relation:  (a, x), (a, y), (a, z), (b, x), (c, y)
Unary relation:   x, y
Divide result:    a
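Divide is the least intuitive of the eight operators, so a short sketch may help. It uses the values from this slide, read as a binary relation of (letter, value) pairs divided by the unary relation {x, y}; that reading of the figure and the variable names are assumptions.

```python
dividend = {("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("c", "y")}
divisor = {"x", "y"}

# Candidate answers are the distinct first-attribute values of the dividend.
candidates = {first for first, _ in dividend}

# Keep a candidate only if it is paired with every value in the divisor.
quotient = {c for c in candidates if all((c, d) in dividend for d in divisor)}
print(quotient)
```

Only `a` is paired with both `x` and `y`; `b` lacks `y` and `c` lacks `x`.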

Page 26: Chapter 9

ER Diagram: Acme Widget Co.

[ER diagram. Entities: Part, Customer, Invoice, Line-Item, Employee, Sales-Rep, Hourly. Relationships: Contains, Orders, Writes, ISA. Attributes shown: Part#, Count, Price, Quantity, Cust#, Invoice#, Rep#, Sales, Emp#, Wage.]

Page 27: Chapter 9

Employee

SSN          Lastname   Firstname  Middlename  Birthdate  Address
123-76-3423  Jones      Janet      Mary        6/25/1963  234 State
342-88-7865  Smith      Thomas     Frederick   8/4/1970   12 Lambert
486-87-6543  Hendersen  Charles    Robert      9/23/1961  44 Central
843-36-7659  Martinez   Roberto    Garcia      7/8/1958   76 Highland

Page 28: Chapter 9

Part

Part #  Name                Price   Count
1       Big blue widget     3.76    2
2       Small blue Widget   7.35    4
3       Tiny red widget     5.25    7
4       large red widget    157.23  23
5       double widget rack  10.44   12
6       Small green Widget  30.45   58
7       Big yellow widget   7.96    1
8       Tiny orange widget  81.75   42
9       Big purple widget   55.99   9

Page 29: Chapter 9

Sales-Rep

SSN          Rep #  Sales
123-76-3423  1      $12,345.45
843-36-7659  2      $231,456.75

Hourly

SSN          Wage
342-88-7865  $12.75
486-87-6543  $20.50

Page 30: Chapter 9

Customer

Cust #  COMPANY                    STREET1                    STREET2                 CITY         STATE  ZIPCODE
1       Integrated Standards Ltd.  35 Broadway                Floor 12                New York     NY     02111
2       MegaInt Inc.               34 Bureaucracy Plaza       Floors 1-172            Phildelphia  PA     03756
3       Cyber Associates           3 Control Elevation Place  Cyber Assicates Center  Cyberoid     NY     08645
4       General Consolidated       35 Libra Plaza                                     Nashua       NH     09242
5       Consolidated MultiCorp     1 Broadway                                         Middletown   IN     32467
6       Internet Behometh Ltd.     88 Oligopoly Place                                 Sagrado      TX     78798
7       Consolidated Brands, Inc.  3 Independence Parkway                             Rivendell    CA     93456
8       Little Mighty Micro        34 Last One Drive                                  Orinda       CA     94563
9       SportLine Ltd.             38 Champion Place          Suite 882               Compton      CA     95328

Page 31: Chapter 9

Invoice

Invoice #  Cust #  Rep #
93774      3       1
84747      4       1
88367      5       2
88647      9       1
776879     2       2
65689      6       2

Page 32: Chapter 9

Line-Item

Invoice #  Part #  Quantity
93774      3       10
84747      23      1
88367      75      2
88647      4       3
776879     22      5
65689      76      12
93774      23      10
88367      34      2

Page 33: Chapter 9

Join Items

[Figure: the Part, Line-Item, Invoice, and Customer tables shown together, linked through Part #, Invoice #, and Cust #. The Part, Line-Item, and Customer rows repeat the preceding slides. The Invoice rows as printed on this slide are:]

Invoice #  Cust #  Rep #
93774      3       1
84747      4       1
88647      5       2
88367      9       1
776879     2       2
65689      6       2

Page 34: Chapter 9

Relational Algebra

• What is the name of the customer who ordered Large Red Widgets?
  – Restrict “large Red Widgets” row from Part as temp1
  – Join temp1 with Line-item on Part # as temp2
  – Join temp2 with Invoice on Invoice # as temp3
  – Join temp3 with Customer on cust # as temp4
  – Project Company from temp4 as answer
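The five steps above can be traced in Python on a small subset of the sample tables. This is a sketch under assumptions: the column names (`part_no`, `invoice_no`, `cust_no`) and the `join` helper are invented for the example, and the Invoice rows follow the standalone Invoice table (the copy on the "Join Items" slide lists different customer numbers for two invoices).

```python
# Subset of the sample tables, with hypothetical column names.
part = [{"part_no": 3, "name": "Tiny red widget"},
        {"part_no": 4, "name": "large red widget"}]
line_item = [{"invoice_no": 93774, "part_no": 3, "quantity": 10},
             {"invoice_no": 88647, "part_no": 4, "quantity": 3}]
invoice = [{"invoice_no": 93774, "cust_no": 3, "rep_no": 1},
           {"invoice_no": 88647, "cust_no": 9, "rep_no": 1}]
customer = [{"cust_no": 3, "company": "Cyber Associates"},
            {"cust_no": 9, "company": "SportLine Ltd."}]

def join(left, right, attr):
    """Equijoin on a shared attribute name, merging each matching pair of rows."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

temp1 = [p for p in part if p["name"].lower() == "large red widget"]  # Restrict
temp2 = join(temp1, line_item, "part_no")
temp3 = join(temp2, invoice, "invoice_no")
temp4 = join(temp3, customer, "cust_no")
answer = [row["company"] for row in temp4]                            # Project
print(answer)
```

Restricting first keeps every later join small, which is the usual reason for ordering the steps this way.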

Page 35: Chapter 9

Lecture Outline

• Review
  – Design to Relational Implementation

• Relational Operations

• Relational Algebra

• Relational Calculus

• Introduction to SQL

Page 36: Chapter 9

Relational Calculus

• Relational Algebra provides a set of explicit operations (select, project, join, etc) that can be used to build some desired relation from the database

• Relational Calculus provides a notation for formulating the definition of that desired relation in terms of the relations in the database without explicitly stating the operations to be performed

• SQL is based on the relational calculus and algebra

Page 37: Chapter 9

Lecture Outline

• Review
  – Design to Relational Implementation

• Relational Operations

• Relational Algebra

• Relational Calculus

• Introduction to SQL

Page 38: Chapter 9

SQL

• Structured Query Language

• Used for database definition, modification, and querying

• Basic language is standardized across relational DBMSs. Each system may have proprietary extensions to the standard.

• SQL combines the Restrict, Project, and Join operations in a single command: SELECT.

Page 39: Chapter 9

SQL - History

• QUEL (Query Language from Ingres)

• SEQUEL from IBM San Jose

• ANSI 1992 Standard is the version used by most DBMS today (SQL92)

• Basic language is standardized across relational DBMSs. Each system may have proprietary extensions to standard.

Page 40: Chapter 9

SQL99

• In 1999, SQL99 – also known as SQL3 – was adopted and contains the following eight parts:
  – The SQL/Framework (75 pages)
  – SQL/Foundation (1100 pages)
  – SQL/Call Level Interface (400 pages)
  – SQL/Persistent Stored Modules (PSM) (160 pages)
  – SQL/Host Language Bindings (250 pages)
  – SQL Transactions (??)
  – SQL Temporal objects (??)
  – SQL Objects (??)

• Designed to be compatible with SQL92

Page 41: Chapter 9

SQL99

• The SQL/Framework -- SQL basic concepts and general requirements.

• SQL/Call Level Interface (CLI) -- An API for SQL. This is similar to ODBC.

• SQL/Foundation -- The syntax and SQL operations that are the basis for the language.

Page 42: Chapter 9

SQL99

• SQL/Persistent Stored Modules (PSM) -- Defines the rules for developing SQL routines, modules, and functions such as those used by stored procedures and triggers. This is implemented in many major RDBMSs through proprietary, nonportable languages, but for the first time we have a standard for writing procedural code that is transportable across databases.

Page 43: Chapter 9

SQL99

• SQL/Host Language Bindings -- Define ways to code embedded SQL in standard programming languages. This simplifies the approach used by CLIs and provides performance enhancements.

• SQL Transactions -- Transactional support for RDBMSs.

• SQL Temporal objects -- Deal with time-based data.

• SQL Objects -- The new Object-Relational features, which represent the largest and most important enhancements to this new standard.

Page 44: Chapter 9

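SQL99 Data Types

[Type hierarchy diagram, reconstructed from its labels:]

SQL Data Types
  – Predefined Types
    • Numeric (Exact, Approximate)
    • String: Bit (Fixed, Varying), Character (Fixed, Varying, CLOB), Blob
    • DateTime (Date, Time, Timestamp)
    • Interval
    • Boolean
  – Ref Types
  – Arrays
  – ROW (Data Struct)
  – User-Defined Types

The diagram marks some of these types as “NEW IN SQL99”.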

Page 45: Chapter 9

SQL Uses

• Database Definition and Querying
  – Can be used as an interactive query language
  – Can be embedded in programs

• SQL combines the Select (Restrict), Project, and Join operations in a single command: SELECT

Page 46: Chapter 9

SELECT

• Syntax:

SELECT [DISTINCT] attr1, attr2, …, attr3
FROM rel1 r1, rel2 r2, … rel3 r3
WHERE condition1 {AND | OR} condition2
ORDER BY attr1 [DESC], attr3 [DESC]

Page 47: Chapter 9

SELECT

• Example:

SELECT a.author, b.title
FROM authors a, bibfile b, au_bib c
WHERE a.AU_ID = c.AU_ID AND c.accno = b.accno
ORDER BY a.author;

• Examples in Access...
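The three-table SELECT above can be tried outside Access with Python's built-in `sqlite3` module. This is a sketch: the table layouts are reduced to the columns the query needs, and the author and title values are placeholder data invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Minimal, hypothetical versions of the three tables.
    CREATE TABLE authors (AU_ID INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE au_bib  (AU_ID INTEGER, accno INTEGER);
    INSERT INTO authors VALUES (1, 'Author B'), (2, 'Author A');
    INSERT INTO bibfile VALUES (10, 'Title X'), (20, 'Title Y');
    INSERT INTO au_bib  VALUES (1, 10), (2, 20);
""")

# au_bib is the linking table: it pairs each author with each work.
rows = conn.execute("""
    SELECT a.author, b.title
    FROM authors a, bibfile b, au_bib c
    WHERE a.AU_ID = c.AU_ID AND c.accno = b.accno
    ORDER BY a.author
""").fetchall()
print(rows)
```

The WHERE clause carries the two join conditions, SELECT does the projection, and ORDER BY sorts the result by author.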

Page 48: Chapter 9

SELECT Conditions

• = equal to a particular value
• >= greater than or equal to a particular value
• > greater than a particular value
• <= less than or equal to a particular value
• <> not equal to a particular value
• LIKE “*term*” (may be other wild cards in other systems)
• IN (“opt1”, “opt2”, …, “optn”)
• BETWEEN val1 AND val2
• IS NULL
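A few of these conditions, run against a small table with `sqlite3`. Note the dialect difference: the `*` wildcard shown above is the Access form, while SQLite (like standard SQL) uses `%` and single-quoted strings. The table, the `on_hand` column, the NULL value, and the `names` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (part_no INTEGER, name TEXT, price REAL, on_hand INTEGER);
    INSERT INTO part VALUES
        (1, 'Big blue widget',    3.76,   2),
        (3, 'Tiny red widget',    5.25,   7),
        (4, 'large red widget',   157.23, 23),
        (5, 'double widget rack', 10.44,  NULL);  -- NULL is hypothetical
""")

def names(where):
    """Return part names matching a WHERE condition, in part_no order."""
    sql = "SELECT name FROM part WHERE " + where + " ORDER BY part_no"
    return [row[0] for row in conn.execute(sql)]

print(names("price >= 5.25"))
print(names("name LIKE '%red%'"))
print(names("part_no IN (1, 4)"))
print(names("price BETWEEN 5 AND 11"))
print(names("on_hand IS NULL"))
```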

Page 49: Chapter 9

Relational Algebra Selection using SELECT

• Syntax:

SELECT * FROM rel1 WHERE condition1 {AND | OR} condition2;

Page 50: Chapter 9

Relational Algebra Projection using SELECT

• Syntax:

SELECT [DISTINCT] attr1, attr2,…, attr3 FROM rel1 r1, rel2 r2,… rel3 r3;

Page 51: Chapter 9

Relational Algebra Join using SELECT

• Syntax:

SELECT * FROM rel1 r1, rel2 r2 WHERE r1.linkattr = r2.linkattr ;

Page 52: Chapter 9

Sorting

• SELECT BIOLIFE.[Common Name], BIOLIFE.[Length (cm)]

FROM BIOLIFE

ORDER BY BIOLIFE.[Length (cm)] DESC;

Note: the square brackets are not part of the standard, but are used in Access for names with embedded blanks.

Page 53: Chapter 9

Subqueries

• SELECT SITES.[Site Name], SITES.[Destination no]

FROM SITES

WHERE sites.[Destination no] IN (SELECT [Destination no] from DEST where [avg temp (f)] >= 78);

• Can be used as a form of JOIN.
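To see why a subquery can act as a form of join, the sketch below runs the same question both ways in `sqlite3`. Column names are rewritten without spaces (so no Access brackets are needed), and the site names and temperatures are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dest  (destination_no INTEGER PRIMARY KEY, avg_temp_f REAL);
    CREATE TABLE sites (site_name TEXT, destination_no INTEGER);
    INSERT INTO dest  VALUES (1, 82.0), (2, 65.0);
    INSERT INTO sites VALUES ('Reef A', 1), ('Wreck B', 2), ('Wall C', 1);
""")

# Subquery form: the inner SELECT produces the list of warm destinations.
subquery_rows = conn.execute("""
    SELECT site_name FROM sites
    WHERE destination_no IN
        (SELECT destination_no FROM dest WHERE avg_temp_f >= 78)
    ORDER BY site_name
""").fetchall()

# Equivalent join form.
join_rows = conn.execute("""
    SELECT s.site_name FROM sites s, dest d
    WHERE s.destination_no = d.destination_no AND d.avg_temp_f >= 78
    ORDER BY s.site_name
""").fetchall()
print(subquery_rows, join_rows)
```

The two forms agree here because `destination_no` is unique in `dest`; the subquery form can only return columns from `sites`.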

Page 54: Chapter 9

Aggregate Functions

• COUNT
• AVG
• SUM
• MAX
• MIN
• Others may be available in different systems

Page 55: Chapter 9

Using Aggregate functions

• SELECT attr1, Sum(attr2) AS name FROM tab1, tab2 ...

GROUP BY attr1, attr3 HAVING condition;

Page 56: Chapter 9

Using an Aggregate Function

• SELECT DIVECUST.Name, Sum([Price]*[qty]) AS Total

FROM (DIVECUST INNER JOIN DIVEORDS ON DIVECUST.[Customer No] = DIVEORDS.[Customer No]) INNER JOIN DIVEITEM ON DIVEORDS.[Order No] = DIVEITEM.[Order No]

GROUP BY DIVECUST.Name HAVING (((DIVECUST.Name) Like "*Jazdzewski"));

Page 57: Chapter 9

GROUP BY

• SELECT DEST.[Destination Name], Count(*) AS Expr1

FROM DEST INNER JOIN DIVEORDS ON DEST.[Destination Name] = DIVEORDS.Destination

GROUP BY DEST.[Destination Name]

HAVING ((Count(*))>1);

• Provides a list of Destinations that have more than one order, with the number of orders going to each destination
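A reduced version of this query, run with `sqlite3`, shows how HAVING filters whole groups after the counting is done. Only the orders table is used, and its rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE diveords (order_no INTEGER PRIMARY KEY, destination TEXT);
    INSERT INTO diveords VALUES
        (1, 'Cozumel'), (2, 'Cozumel'), (3, 'Fiji'), (4, 'Cozumel');
""")

rows = conn.execute("""
    SELECT destination, COUNT(*) AS order_count
    FROM diveords
    GROUP BY destination      -- one output row per destination
    HAVING COUNT(*) > 1       -- drop groups with a single order
""").fetchall()
print(rows)
```

WHERE filters individual rows before grouping; HAVING filters the groups afterwards, which is why it can refer to COUNT(*).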

Page 58: Chapter 9

Create Table

• CREATE TABLE table-name (attr1 attr-type PRIMARY KEY, attr2 attr-type, …, attrN attr-type);

• Adds a new table with the specified attributes (and types) to the database.
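A concrete instance of the CREATE TABLE pattern, run with `sqlite3`. The table and column names are assumptions modeled on the Part table, and the type names are SQLite's; Access and Oracle use the type names listed on the following slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE part (
        part_no INTEGER PRIMARY KEY,
        name    TEXT,
        price   REAL
    )
""")

# PRAGMA table_info lists one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(part)")]
print(columns)
```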

Page 59: Chapter 9

Access Data Types

• Numeric (1, 2, 4, 8 bytes, fixed or float)
• Text (255 max)
• Memo (64000 max)
• Date/Time (8 bytes)
• Currency (8 bytes, 15 digits + 4 digits decimal)
• Autonumber (4 bytes)
• Yes/No (1 bit)
• OLE (limited only by disk space)
• Hyperlinks (up to 64000 chars)

Page 60: Chapter 9

Access Numeric types

• Byte – Stores numbers from 0 to 255 (no fractions). 1 byte

• Integer – Stores numbers from –32,768 to 32,767 (no fractions). 2 bytes

• Long Integer (Default) – Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes

• Single – Stores numbers from –3.402823E38 to –1.401298E–45 for negative values and from 1.401298E–45 to 3.402823E38 for positive values. 4 bytes

• Double – Stores numbers from –1.79769313486231E308 to –4.94065645841247E–324 for negative values and from 4.94065645841247E–324 to 1.79769313486231E308 for positive values. Decimal precision 15. 8 bytes

• Replication ID – Globally unique identifier (GUID). Decimal precision N/A. 16 bytes

Page 61: Chapter 9

Oracle Data Types

• CHAR(size) -- max 2000
• VARCHAR2(size) -- up to 4000
• DATE
• DECIMAL, FLOAT, INTEGER, INTEGER(s), SMALLINT, NUMBER, NUMBER(size,d)
  – All numbers internally in same format…
• LONG, LONG RAW, LONG VARCHAR
  – up to 2 GB -- only one per table
• BLOB, CLOB, NCLOB -- up to 4 GB
• BFILE -- file pointer to binary OS file

Page 62: Chapter 9

Creating a new table from existing tables

• Syntax:

SELECT [DISTINCT] attr1, attr2, …, attr3
INTO newtablename
FROM rel1 r1, rel2 r2, … rel3 r3
WHERE condition1 {AND | OR} condition2
ORDER BY attr1 [DESC], attr3 [DESC]
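The SELECT … INTO form above is the one Access uses. Some systems, including SQLite and Oracle, express the same idea as CREATE TABLE … AS SELECT instead. The sketch below uses that variant in `sqlite3`; the table names and rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (part_no INTEGER, name TEXT, price REAL);
    INSERT INTO part VALUES
        (1, 'Big blue widget', 3.76),
        (4, 'large red widget', 157.23);

    -- New table built from a query over an existing table.
    CREATE TABLE cheap_part AS
        SELECT part_no, name FROM part WHERE price < 10;
""")

rows = conn.execute("SELECT * FROM cheap_part").fetchall()
print(rows)
```

The new table is a snapshot: later changes to `part` do not show up in `cheap_part`.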

Page 63: Chapter 9

Using Oracle

• Go to “My.SIMS” from the SIMS web site and click on Oracle (you don’t need to do it again)

• Use SSH to login to dream (unix shell)

• At the command line type “sqlplus”

• Oracle will prompt you for login and password

• If everything is set up you are logged into Oracle and will get the SQL> prompt…