
The Structured Query Language

Zachary G. Ives / Nicholas Taylor
University of Pennsylvania
CIS 550 – Database & Information Systems
September 26, 2007

Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan


Administrivia
- Homework 2 handed out today
- Due 10/8


Recall: Basic SQL

SELECT [DISTINCT] {T1.attrib, …, T2.attrib}     (select-list)
FROM {relation} T1, {relation} T2, …            (from-list)
WHERE {predicates}                               (qualification)

- SELECT *: all attributes (e.g., all STUDENTs)
- AS: declares a “range variable” (tuple variable), where the keyword is optional, or acts as an attribute rename operator


Our Example Data Instance

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    C          501-0105

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105


Some Nice Features
- SELECT *: all STUDENTs
- AS: as a “range variable” (tuple variable), where it is optional, or as an attribute rename operator

Example: Which students (names) have taken more than one course from the same professor?
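One possible answer uses two range variables over Takes and two over Teaches. The sketch below is plain Java with no DBMS: it shows that SQL in a comment and evaluates the same join as nested loops over the original example instance. The record types and the `names` method are hypothetical names of our own.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SameProfessor {
    // Hypothetical row types for the example relations
    record Student(int sid, String name) {}
    record Takes(int sid, String grade, String cid) {}
    record Teaches(int fid, String cid) {}

    // Original example instance
    static final List<Student> STUDENTS = List.of(
        new Student(1, "Jill"), new Student(2, "Qun"), new Student(3, "Nitin"));
    static final List<Takes> TAKES = List.of(
        new Takes(1, "A", "550-0105"), new Takes(1, "A", "700-1005"),
        new Takes(3, "C", "501-0105"));
    static final List<Teaches> TEACHES = List.of(
        new Teaches(1, "550-0105"), new Teaches(2, "700-1005"),
        new Teaches(8, "501-0105"));

    // In-memory evaluation of:
    //   SELECT DISTINCT S.name
    //   FROM STUDENT S, Takes T1, Takes T2, Teaches E1, Teaches E2
    //   WHERE S.sid = T1.sid AND S.sid = T2.sid
    //     AND T1.cid = E1.cid AND T2.cid = E2.cid
    //     AND E1.fid = E2.fid AND T1.cid <> T2.cid
    // Each range variable in the FROM clause becomes one nested loop.
    static Set<String> names(List<Student> students, List<Takes> takes,
                             List<Teaches> teaches) {
        Set<String> result = new TreeSet<>(); // a set gives DISTINCT
        for (Student s : students)
            for (Takes t1 : takes)
                for (Takes t2 : takes)
                    for (Teaches e1 : teaches)
                        for (Teaches e2 : teaches)
                            if (s.sid() == t1.sid() && s.sid() == t2.sid()
                                    && t1.cid().equals(e1.cid())
                                    && t2.cid().equals(e2.cid())
                                    && e1.fid() == e2.fid()
                                    && !t1.cid().equals(t2.cid()))
                                result.add(s.name());
        return result;
    }

    public static void main(String[] args) {
        // Empty on this instance: Jill's two courses have different professors
        System.out.println(names(STUDENTS, TAKES, TEACHES));
    }
}
```

The condition T1.cid <> T2.cid matters: without it, every student who took any course would qualify by pairing a course with itself.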


Expressions in SQL
- Can do computation over scalars (int, real, or string) in the select-list or the qualification, e.g., show all student IDs decremented by 1
- Strings: fixed length (CHAR(x)) or variable length (VARCHAR(x))
  - Use single quotes: 'A string'
  - Special comparison operator: LIKE
  - Not equal: <>
- Typecasting: CAST(S.sid AS VARCHAR(255))
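To make the select-list arithmetic and LIKE concrete, here is a small Java sketch (no database involved). `decremented` mirrors `SELECT S.sid - 1 FROM STUDENT S`; `like` translates a LIKE pattern into a regular expression, where '%' matches any sequence of characters and '_' matches exactly one. Both method names are our own, and the sketch assumes no ESCAPE clause and case-sensitive matching (LIKE's case sensitivity varies between DBMSs).

```java
import java.util.List;
import java.util.regex.Pattern;

public class SqlExpressions {
    // SELECT S.sid - 1 FROM STUDENT S: arithmetic in the select-list
    static List<Integer> decremented(List<Integer> sids) {
        return sids.stream().map(sid -> sid - 1).toList();
    }

    // Sketch of LIKE: '%' -> any sequence (possibly empty), '_' -> one character,
    // every other character is quoted so it matches only itself.
    static boolean like(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') regex.append(".*");
            else if (c == '_') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        // matches() must cover the whole string, just as LIKE does
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(decremented(List.of(1, 2, 3)));
        // ... WHERE S.name LIKE 'J%'
        for (String name : List.of("Jill", "Qun", "Nitin"))
            System.out.println(name + " LIKE 'J%': " + like(name, "J%"));
    }
}
```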


Set Operations
Set operations default to set semantics, not bag semantics:

(SELECT … FROM … WHERE …)
{op}
(SELECT … FROM … WHERE …)

where op is one of: UNION, INTERSECT, MINUS/EXCEPT (MINUS is Oracle's name for EXCEPT; many DBs don’t support these last ones!)
Bag semantics: add ALL (e.g., UNION ALL)
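A rough in-memory analogy for the difference between set and bag semantics: UNION behaves like merging into a set (duplicates vanish, even within one input), while UNION ALL simply concatenates. The method names are hypothetical; the sample sids are the DB and AI takers from the original example instance (sid 1 in both cases).

```java
import java.util.*;

public class SetVsBag {
    // UNION: set semantics, duplicates removed (order kept only for readability)
    static <T> List<T> union(List<T> a, List<T> b) {
        Set<T> s = new LinkedHashSet<>(a);
        s.addAll(b);
        return new ArrayList<>(s);
    }

    // UNION ALL: bag semantics, every duplicate kept
    static <T> List<T> unionAll(List<T> a, List<T> b) {
        List<T> r = new ArrayList<>(a);
        r.addAll(b);
        return r;
    }

    // INTERSECT (set semantics)
    static <T> List<T> intersect(List<T> a, List<T> b) {
        Set<T> s = new LinkedHashSet<>(a);
        s.retainAll(new HashSet<>(b));
        return new ArrayList<>(s);
    }

    // EXCEPT / MINUS (set semantics)
    static <T> List<T> except(List<T> a, List<T> b) {
        Set<T> s = new LinkedHashSet<>(a);
        s.removeAll(new HashSet<>(b));
        return new ArrayList<>(s);
    }

    public static void main(String[] args) {
        // sids of students taking DB and AI in the original example instance
        List<Integer> db = List.of(1), ai = List.of(1);
        System.out.println("UNION:     " + union(db, ai));
        System.out.println("UNION ALL: " + unionAll(db, ai));
        System.out.println("INTERSECT: " + intersect(db, ai));
        System.out.println("EXCEPT:    " + except(db, ai));
    }
}
```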


Exercise: Find all students who have taken DB but not AI
Hint: use EXCEPT
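One possible answer selects sids rather than names, so that two students who share a name are not confused. The sketch evaluates that query in plain Java over the original example instance; `sidsTaking` and `dbButNotAi` are hypothetical helper names. On this instance the answer is empty, because Jill (sid 1), the only DB student, also took AI.

```java
import java.util.*;

public class DbNotAi {
    record Takes(int sid, String grade, String cid) {}
    record Course(String cid, String subj, String sem) {}

    // Original example instance
    static final List<Takes> TAKES = List.of(
        new Takes(1, "A", "550-0105"), new Takes(1, "A", "700-1005"),
        new Takes(3, "C", "501-0105"));
    static final List<Course> COURSES = List.of(
        new Course("550-0105", "DB", "F05"), new Course("700-1005", "AI", "S05"),
        new Course("501-0105", "Arch", "F05"));

    // SELECT T.sid FROM Takes T, COURSE C WHERE T.cid = C.cid AND C.subj = <subj>
    static Set<Integer> sidsTaking(String subj, List<Takes> takes, List<Course> courses) {
        Set<Integer> sids = new TreeSet<>();
        for (Takes t : takes)
            for (Course c : courses)
                if (t.cid().equals(c.cid()) && c.subj().equals(subj))
                    sids.add(t.sid());
        return sids;
    }

    // (SELECT T.sid ... C.subj = 'DB')
    // EXCEPT
    // (SELECT T.sid ... C.subj = 'AI')
    static Set<Integer> dbButNotAi(List<Takes> takes, List<Course> courses) {
        Set<Integer> result = sidsTaking("DB", takes, courses);
        result.removeAll(sidsTaking("AI", takes, courses));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dbButNotAi(TAKES, COURSES));
    }
}
```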



Revised Example Data Instance

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    A          700-1005
3    C          501-0105
4    C          501-0105

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05
555-1006  Sys   S06

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105


Nested Queries in SQL
Simplest: IN/NOT IN
Example: Students who have taken subjects that have (at any point) been taught by Martin
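A sketch of how an uncorrelated IN subquery can be evaluated: compute the inner result once (the subjects Martin has taught), then test each outer tuple for membership. It runs in plain Java over the revised example instance; the record types and `students`/`subjOf` helpers are hypothetical, and `subjOf` stands in for the join with COURSE (cid is assumed to be COURSE's key).

```java
import java.util.*;

public class TaughtByMartin {
    record Student(int sid, String name) {}
    record Takes(int sid, String grade, String cid) {}
    record Course(String cid, String subj, String sem) {}
    record Professor(int fid, String name) {}
    record Teaches(int fid, String cid) {}

    // Revised example instance
    static final List<Student> STUDENTS = List.of(new Student(1, "Jill"),
        new Student(2, "Qun"), new Student(3, "Nitin"), new Student(4, "Marty"));
    static final List<Takes> TAKES = List.of(new Takes(1, "A", "550-0105"),
        new Takes(1, "A", "700-1005"), new Takes(3, "A", "700-1005"),
        new Takes(3, "C", "501-0105"), new Takes(4, "C", "501-0105"));
    static final List<Course> COURSES = List.of(new Course("550-0105", "DB", "F05"),
        new Course("700-1005", "AI", "S05"), new Course("501-0105", "Arch", "F05"),
        new Course("555-1006", "Sys", "S06"));
    static final List<Professor> PROFESSORS = List.of(new Professor(1, "Ives"),
        new Professor(2, "Saul"), new Professor(8, "Martin"));
    static final List<Teaches> TEACHES = List.of(new Teaches(1, "550-0105"),
        new Teaches(2, "700-1005"), new Teaches(8, "501-0105"));

    // Join with COURSE on its key cid
    static String subjOf(String cid) {
        for (Course c : COURSES) if (c.cid().equals(cid)) return c.subj();
        return null;
    }

    // SELECT DISTINCT S.name FROM STUDENT S, Takes T, COURSE C
    // WHERE S.sid = T.sid AND T.cid = C.cid
    //   AND C.subj IN (SELECT C2.subj FROM PROFESSOR P, Teaches E, COURSE C2
    //                  WHERE P.name = 'Martin' AND P.fid = E.fid AND E.cid = C2.cid)
    static Set<String> students(String profName) {
        // Inner query: it does not refer to S, so it is evaluated once
        Set<String> subjects = new HashSet<>();
        for (Professor p : PROFESSORS)
            for (Teaches e : TEACHES)
                if (p.name().equals(profName) && p.fid() == e.fid())
                    subjects.add(subjOf(e.cid()));
        // Outer query: keep students whose course subject is IN the inner result
        Set<String> names = new LinkedHashSet<>();
        for (Student s : STUDENTS)
            for (Takes t : TAKES)
                if (s.sid() == t.sid() && subjects.contains(subjOf(t.cid())))
                    names.add(s.name());
        return names;
    }

    public static void main(String[] args) {
        System.out.println(students("Martin")); // Martin taught Arch
    }
}
```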


Correlated Subqueries
A correlated subquery refers to a tuple variable of the enclosing query.
Most common: EXISTS/NOT EXISTS
Example: Find all students who have taken DB but not AI
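Here is one possible NOT EXISTS formulation, with a plain-Java evaluation that makes the correlation visible: because the subquery mentions S.sid, it is conceptually re-run for every student. The data is the revised example instance; method and record names are hypothetical. The answer is again empty on this instance.

```java
import java.util.*;

public class CorrelatedExists {
    record Student(int sid, String name) {}
    record Takes(int sid, String grade, String cid) {}
    record Course(String cid, String subj, String sem) {}

    // Revised example instance
    static final List<Student> STUDENTS = List.of(new Student(1, "Jill"),
        new Student(2, "Qun"), new Student(3, "Nitin"), new Student(4, "Marty"));
    static final List<Takes> TAKES = List.of(new Takes(1, "A", "550-0105"),
        new Takes(1, "A", "700-1005"), new Takes(3, "A", "700-1005"),
        new Takes(3, "C", "501-0105"), new Takes(4, "C", "501-0105"));
    static final List<Course> COURSES = List.of(new Course("550-0105", "DB", "F05"),
        new Course("700-1005", "AI", "S05"), new Course("501-0105", "Arch", "F05"),
        new Course("555-1006", "Sys", "S06"));

    // EXISTS (SELECT * FROM Takes T, COURSE C
    //         WHERE T.sid = S.sid AND T.cid = C.cid AND C.subj = <subj>)
    // S comes from the outer query, so this runs once per outer tuple.
    static boolean exists(Student s, String subj, List<Takes> takes) {
        for (Takes t : takes)
            for (Course c : COURSES)
                if (t.sid() == s.sid() && t.cid().equals(c.cid()) && c.subj().equals(subj))
                    return true; // EXISTS can stop at the first witness
        return false;
    }

    // SELECT S.name FROM STUDENT S
    // WHERE EXISTS (... C.subj = 'DB') AND NOT EXISTS (... C.subj = 'AI')
    static List<String> dbButNotAi(List<Takes> takes) {
        List<String> names = new ArrayList<>();
        for (Student s : STUDENTS)
            if (exists(s, "DB", takes) && !exists(s, "AI", takes))
                names.add(s.name());
        return names;
    }

    public static void main(String[] args) {
        // Empty: the only DB student, Jill, also took AI
        System.out.println(dbButNotAi(TAKES));
    }
}
```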


Universal and Existential Quantification
Generally used with subqueries: {op} ANY, {op} ALL
Example: Find the students with the best expected grades
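One way to phrase "best" is `<= ALL`: a tuple qualifies if its grade is no larger than every grade in the subquery. The SQL in the comment assumes the column is spelled `exp_grade` (a hyphen is not legal in an unquoted SQL identifier) and that grades are plain letters, so 'A' sorts first. The Java evaluation over the revised instance is a sketch with hypothetical names.

```java
import java.util.*;

public class BestGrades {
    record Student(int sid, String name) {}
    record Takes(int sid, String grade, String cid) {}

    // Revised example instance
    static final List<Student> STUDENTS = List.of(new Student(1, "Jill"),
        new Student(2, "Qun"), new Student(3, "Nitin"), new Student(4, "Marty"));
    static final List<Takes> TAKES = List.of(new Takes(1, "A", "550-0105"),
        new Takes(1, "A", "700-1005"), new Takes(3, "A", "700-1005"),
        new Takes(3, "C", "501-0105"), new Takes(4, "C", "501-0105"));

    // SELECT DISTINCT S.name FROM STUDENT S, Takes T
    // WHERE S.sid = T.sid
    //   AND T.exp_grade <= ALL (SELECT T2.exp_grade FROM Takes T2)
    static Set<String> best(List<Student> students, List<Takes> takes) {
        Set<String> names = new LinkedHashSet<>();
        for (Student s : students)
            for (Takes t : takes) {
                if (s.sid() != t.sid()) continue;
                // "<= ALL": the comparison must hold against every subquery row
                boolean leqAll = true;
                for (Takes t2 : takes)
                    if (t.grade().compareTo(t2.grade()) > 0) { leqAll = false; break; }
                if (leqAll) names.add(s.name());
            }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(best(STUDENTS, TAKES));
    }
}
```

Replacing `<= ALL` with `< ANY` would instead ask for students who are not (tied for) worst, which is a handy contrast between the two quantifiers.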


Table Expressions
Can substitute a subquery for any relation in the FROM clause:

SELECT S.sid
FROM (SELECT sid FROM STUDENT WHERE sid = 5) S
WHERE S.sid = 4

Notice that we can actually simplify this query!
What is this equivalent to?
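One way to see the answer, sketched in relational algebra (our own notation, not from the slides): the outer selection only uses sid, so it can be pushed below the inner projection, the two projections onto sid collapse, and the two selections merge into one conjunction that no tuple can satisfy.

```latex
\begin{aligned}
\pi_{sid}\bigl(\sigma_{sid=4}(\pi_{sid}(\sigma_{sid=5}(\mathit{STUDENT})))\bigr)
  &\equiv \pi_{sid}\bigl(\sigma_{sid=4}(\sigma_{sid=5}(\mathit{STUDENT}))\bigr) \\
  &\equiv \pi_{sid}\bigl(\sigma_{sid=5 \,\wedge\, sid=4}(\mathit{STUDENT})\bigr)
  = \emptyset
\end{aligned}
```

So the query is equivalent to SELECT sid FROM STUDENT WHERE sid = 5 AND sid = 4, which returns no tuples on any instance.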


Aggregation: GROUP BY

SELECT {group-attribs}, {aggregate-operator}(attrib)
FROM {relation} T1, {relation} T2, …
WHERE {predicates}
GROUP BY {group-list}

Aggregate operators: AVG, COUNT, SUM, MAX, MIN
DISTINCT keyword for AVG, COUNT, SUM


Some Examples
- Number of students in each course offering
- Number of different grades expected for each course offering
- Number of (distinct) students taking AI courses
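Possible SQL for the three examples appears in the comments below, each paired with a Java Streams equivalent (`Collectors.groupingBy` plays the role of GROUP BY) over the revised example instance. The SQL assumes the column is spelled `exp_grade`; method and record names are hypothetical. Note that GROUP BY over Takes yields no row for an offering nobody takes, such as 555-1006.

```java
import java.util.*;
import java.util.stream.Collectors;

public class GroupByExamples {
    record Takes(int sid, String grade, String cid) {}
    record Course(String cid, String subj, String sem) {}

    // Revised example instance
    static final List<Takes> TAKES = List.of(new Takes(1, "A", "550-0105"),
        new Takes(1, "A", "700-1005"), new Takes(3, "A", "700-1005"),
        new Takes(3, "C", "501-0105"), new Takes(4, "C", "501-0105"));
    static final List<Course> COURSES = List.of(new Course("550-0105", "DB", "F05"),
        new Course("700-1005", "AI", "S05"), new Course("501-0105", "Arch", "F05"),
        new Course("555-1006", "Sys", "S06"));

    // SELECT T.cid, COUNT(T.sid) FROM Takes T GROUP BY T.cid
    static Map<String, Long> studentsPerOffering(List<Takes> takes) {
        return takes.stream().collect(
            Collectors.groupingBy(Takes::cid, TreeMap::new, Collectors.counting()));
    }

    // SELECT T.cid, COUNT(DISTINCT T.exp_grade) FROM Takes T GROUP BY T.cid
    static Map<String, Integer> distinctGradesPerOffering(List<Takes> takes) {
        Map<String, Set<String>> grades = takes.stream().collect(
            Collectors.groupingBy(Takes::cid, TreeMap::new,
                Collectors.mapping(Takes::grade, Collectors.toSet())));
        Map<String, Integer> counts = new TreeMap<>();
        grades.forEach((cid, gs) -> counts.put(cid, gs.size()));
        return counts;
    }

    // SELECT COUNT(DISTINCT T.sid) FROM Takes T, COURSE C
    // WHERE T.cid = C.cid AND C.subj = 'AI'
    static long distinctStudentsTaking(String subj, List<Takes> takes, List<Course> courses) {
        Set<String> cids = courses.stream().filter(c -> c.subj().equals(subj))
            .map(Course::cid).collect(Collectors.toSet());
        return takes.stream().filter(t -> cids.contains(t.cid()))
            .map(Takes::sid).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(studentsPerOffering(TAKES));
        System.out.println(distinctGradesPerOffering(TAKES));
        System.out.println(distinctStudentsTaking("AI", TAKES, COURSES));
    }
}
```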


Data Instance, Again

(Same STUDENT, Takes, COURSE, PROFESSOR, and Teaches instance as in the Revised Example Data Instance.)


What If You Want to Only Show Some Groups?
The HAVING clause lets you do a selection based on an aggregate (there must be 1 value per group):

SELECT C.subj, COUNT(S.sid)
FROM STUDENT S, Takes T, COURSE C
WHERE S.sid = T.sid AND T.cid = C.cid
GROUP BY subj
HAVING COUNT(S.sid) > 5

Exercise: For each subject taught by at least two professors, list the minimum expected grade
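One possible answer counts professors in a subquery, so that offerings without any students still count toward a subject's professors. The Java below is a sketch over the revised instance with a hypothetical `minProfessors` parameter (the exercise uses 2; no subject qualifies on this instance, so a threshold of 1 is handy for checking the logic). MIN over letter grades is alphabetical, so it returns the best grade; if "minimum" is meant as the worst grade, use MAX instead. The SQL assumes the column is spelled `exp_grade`.

```java
import java.util.*;

public class HavingExercise {
    record Takes(int sid, String grade, String cid) {}
    record Course(String cid, String subj, String sem) {}
    record Teaches(int fid, String cid) {}

    // Revised example instance
    static final List<Takes> TAKES = List.of(new Takes(1, "A", "550-0105"),
        new Takes(1, "A", "700-1005"), new Takes(3, "A", "700-1005"),
        new Takes(3, "C", "501-0105"), new Takes(4, "C", "501-0105"));
    static final List<Course> COURSES = List.of(new Course("550-0105", "DB", "F05"),
        new Course("700-1005", "AI", "S05"), new Course("501-0105", "Arch", "F05"),
        new Course("555-1006", "Sys", "S06"));
    static final List<Teaches> TEACHES = List.of(new Teaches(1, "550-0105"),
        new Teaches(2, "700-1005"), new Teaches(8, "501-0105"));

    // SELECT C.subj, MIN(T.exp_grade)
    // FROM COURSE C, Takes T
    // WHERE C.cid = T.cid
    //   AND C.subj IN (SELECT C2.subj FROM COURSE C2, Teaches E
    //                  WHERE C2.cid = E.cid
    //                  GROUP BY C2.subj
    //                  HAVING COUNT(DISTINCT E.fid) >= 2)
    // GROUP BY C.subj
    static Map<String, String> minGradeBySubject(int minProfessors) {
        // Subquery: the set of distinct professors per subject
        Map<String, Set<Integer>> profs = new HashMap<>();
        for (Course c : COURSES)
            for (Teaches e : TEACHES)
                if (c.cid().equals(e.cid()))
                    profs.computeIfAbsent(c.subj(), k -> new HashSet<>()).add(e.fid());
        // Outer query: MIN(grade) per subject that passes the HAVING test
        Map<String, String> result = new TreeMap<>();
        for (Course c : COURSES)
            for (Takes t : TAKES)
                if (c.cid().equals(t.cid())
                        && profs.getOrDefault(c.subj(), Set.of()).size() >= minProfessors)
                    result.merge(c.subj(), t.grade(), (a, b) -> a.compareTo(b) <= 0 ? a : b);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(minGradeBySubject(2)); // each subject has one professor here
    }
}
```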


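Aggregation and Table Expressions (aka Derived Relations)
Sometimes we need to compute results over the results of a previous aggregation:

SELECT subj, AVG(size)
FROM (SELECT C.cid AS id, C.subj AS subj, COUNT(S.sid) AS size
      FROM STUDENT S, Takes T, COURSE C
      WHERE S.sid = T.sid AND T.cid = C.cid
      GROUP BY C.cid, C.subj) CourseSizes
GROUP BY subj

Most systems require the derived relation to be named (CourseSizes here), and cid must be qualified because both Takes and COURSE have a cid attribute.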
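The two levels of aggregation can be mirrored step by step in Java: first build the derived relation (one size per course offering), then average those sizes per subject. This is a plain-Java sketch over the revised instance with hypothetical names; it assumes every sid in Takes appears in STUDENT, as it does here, so the join with STUDENT does not drop rows.

```java
import java.util.*;
import java.util.stream.Collectors;

public class AvgClassSize {
    record Takes(int sid, String grade, String cid) {}
    record Course(String cid, String subj, String sem) {}

    // Revised example instance
    static final List<Takes> TAKES = List.of(new Takes(1, "A", "550-0105"),
        new Takes(1, "A", "700-1005"), new Takes(3, "A", "700-1005"),
        new Takes(3, "C", "501-0105"), new Takes(4, "C", "501-0105"));
    static final List<Course> COURSES = List.of(new Course("550-0105", "DB", "F05"),
        new Course("700-1005", "AI", "S05"), new Course("501-0105", "Arch", "F05"),
        new Course("555-1006", "Sys", "S06"));

    static Map<String, Double> avgSizeBySubject(List<Takes> takes, List<Course> courses) {
        // Step 1 (inner query): COUNT(S.sid) ... GROUP BY C.cid, C.subj
        Map<String, Long> sizeByCid = takes.stream()
            .collect(Collectors.groupingBy(Takes::cid, Collectors.counting()));
        Map<String, List<Long>> sizesBySubj = new TreeMap<>();
        for (Course c : courses) {
            Long size = sizeByCid.get(c.cid());
            if (size != null) // offerings without students produce no inner row
                sizesBySubj.computeIfAbsent(c.subj(), k -> new ArrayList<>()).add(size);
        }
        // Step 2 (outer query): AVG(size) ... GROUP BY subj
        Map<String, Double> avg = new TreeMap<>();
        sizesBySubj.forEach((subj, sizes) ->
            avg.put(subj, sizes.stream().mapToLong(Long::longValue).average().orElseThrow()));
        return avg;
    }

    public static void main(String[] args) {
        System.out.println(avgSizeBySubject(TAKES, COURSES));
    }
}
```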


Thought Exercise…
Tables are great, but…
- Not everyone is uniform: I may have a cell phone but not a fax
- We may simply be missing certain information
- We may be unsure about values
How do we handle these things?


One Answer: Null Values
We designate a special “null” value to represent “unknown” or “N/A”.

CONTACT
Name   Home      Fax
Sam    123-4567  NULL
Li     234-8972  234-8766
Maria  789-2312  789-2121

But a question: what does the following query do?

SELECT * FROM CONTACT WHERE Fax < '789-1111'
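A small Java sketch of how WHERE treats the NULL fax: the comparison yields UNKNOWN (modeled here as a `null` Boolean), and WHERE keeps a row only when the predicate is TRUE. It assumes Java's `String.compareTo` stands in for the DBMS's string comparison, which is safe for these digit-and-hyphen values but may differ under other collations. The names are hypothetical.

```java
import java.util.*;

public class NullComparison {
    record Contact(String name, String home, String fax) {} // fax == null models NULL

    static final List<Contact> CONTACT = List.of(
        new Contact("Sam", "123-4567", null),
        new Contact("Li", "234-8972", "234-8766"),
        new Contact("Maria", "789-2312", "789-2121"));

    // Three-valued "fax < literal": a null result stands for UNKNOWN
    static Boolean lessThan(String fax, String literal) {
        if (fax == null) return null; // NULL compared with anything is UNKNOWN
        return fax.compareTo(literal) < 0;
    }

    // SELECT * FROM CONTACT WHERE Fax < '789-1111'
    // Rows whose predicate is FALSE or UNKNOWN are both dropped.
    static List<String> query(String literal) {
        List<String> names = new ArrayList<>();
        for (Contact c : CONTACT)
            if (Boolean.TRUE.equals(lessThan(c.fax(), literal)))
                names.add(c.name());
        return names;
    }

    public static void main(String[] args) {
        System.out.println(query("789-1111")); // Sam is not returned: his fax is NULL
    }
}
```

Sam would not appear under `Fax >= '789-1111'` either, which is why `IS NULL` exists as a separate test.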


Three-State Logic
- Need ways to evaluate boolean expressions and have the result be “unknown” (or T/F)
- Need ways of composing these three-state expressions using AND, OR, NOT:

  T AND U = U    F AND U = F    U AND U = U
  T OR U = T     F OR U = U     U OR U = U
  NOT U = U

- Can also test for null-ness: attr IS NULL, attr IS NOT NULL
- Finally: need rules for arithmetic, aggregation
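The truth table above follows from one observation: order the values F < U < T; then AND is the minimum, OR is the maximum, and NOT flips the order. A minimal Java sketch of that rule (the enum and method names are our own):

```java
public class ThreeValuedLogic {
    // Declaration order FALSE < UNKNOWN < TRUE makes AND = min and OR = max
    enum TV { FALSE, UNKNOWN, TRUE }

    static TV and(TV a, TV b) { return a.ordinal() <= b.ordinal() ? a : b; }
    static TV or(TV a, TV b)  { return a.ordinal() >= b.ordinal() ? a : b; }
    static TV not(TV a)       { return TV.values()[2 - a.ordinal()]; } // T<->F, U stays U

    public static void main(String[] args) {
        for (TV a : TV.values())
            for (TV b : TV.values())
                System.out.println(a + " AND " + b + " = " + and(a, b)
                    + ",   " + a + " OR " + b + " = " + or(a, b));
        System.out.println("NOT UNKNOWN = " + not(TV.UNKNOWN));
    }
}
```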


Nulls and Joins
Sometimes need special variations of joins:
- I want to see all courses and their students…
- But what if there’s a course with no students?
Outer join: most common is the left outer join:

SELECT C.subj, C.cid, T.sid
FROM COURSE C LEFT OUTER JOIN Takes T ON C.cid = T.cid
WHERE …
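A left outer join keeps every tuple of the left relation: matching tuples are joined as usual, and a left tuple with no partner appears once, padded with NULL. The plain-Java sketch below (hypothetical names, revised instance, `null` standing for NULL) shows that the Sys offering 555-1006, which nobody takes, still appears.

```java
import java.util.*;

public class LeftOuterJoin {
    record Course(String cid, String subj, String sem) {}
    record Takes(int sid, String grade, String cid) {}
    record Row(String subj, String cid, Integer sid) {} // sid == null models NULL

    // Revised example instance
    static final List<Course> COURSES = List.of(new Course("550-0105", "DB", "F05"),
        new Course("700-1005", "AI", "S05"), new Course("501-0105", "Arch", "F05"),
        new Course("555-1006", "Sys", "S06"));
    static final List<Takes> TAKES = List.of(new Takes(1, "A", "550-0105"),
        new Takes(1, "A", "700-1005"), new Takes(3, "A", "700-1005"),
        new Takes(3, "C", "501-0105"), new Takes(4, "C", "501-0105"));

    // SELECT C.subj, C.cid, T.sid
    // FROM COURSE C LEFT OUTER JOIN Takes T ON C.cid = T.cid
    static List<Row> leftOuterJoin(List<Course> courses, List<Takes> takes) {
        List<Row> out = new ArrayList<>();
        for (Course c : courses) {
            boolean matched = false;
            for (Takes t : takes)
                if (c.cid().equals(t.cid())) {
                    out.add(new Row(c.subj(), c.cid(), t.sid()));
                    matched = true;
                }
            // An unmatched left tuple is kept once, padded with NULL
            if (!matched) out.add(new Row(c.subj(), c.cid(), null));
        }
        return out;
    }

    public static void main(String[] args) {
        leftOuterJoin(COURSES, TAKES).forEach(System.out::println);
    }
}
```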


Data Instance, Again (!)

(Same STUDENT, Takes, COURSE, PROFESSOR, and Teaches instance as in the Revised Example Data Instance.)


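Warning on Outer Join
Older versions of Oracle (before 9i) don’t support the standard SQL syntax here. Instead, Oracle’s proprietary (+) operator is attached to the column of the table whose rows may be missing:

SELECT C.subj, C.cid, T.sid
FROM COURSE C, Takes T
WHERE C.cid = T.cid(+)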


Beyond Null
Can have much more complex ideas of incomplete or approximate information:
- Probabilistic models (tuple 80% likely to be an answer)
- Naïve tables (can have variables instead of NULLs)
- Conditional tables (tuple IF some condition holds)

… And what if you want “0 or more”?
- In relational databases, create a new table and foreign key
- But can have semistructured data (like XML)


Modifying the Database: Inserting Data
Inserting a new literal tuple is easy, if wordy:

INSERT INTO PROFESSOR (fid, name)
VALUES (4, 'Simpson')

But we can also insert the results of a query!

INSERT INTO PROFESSOR (fid, name)
SELECT sid AS fid, name FROM STUDENT WHERE sid < 20


Deleting Tuples
Deletion is a fairly simple operation:

DELETE FROM STUDENT S
WHERE S.sid < 25


Updating Tuples
What kinds of updates might you want to do?

UPDATE STUDENT S
SET S.sid = 1 + S.sid, S.name = 'Janet'
WHERE S.name = 'Jane'


Now, How Do I Talk to the DB?
Generally, apps are written in a different (“host”) language with embedded SQL statements:
- Static (query fixed): SQLJ, embedded SQL in C
- Dynamic (query generated by program at runtime): ODBC, JDBC, ADO, OLE DB, …

Predefined mappings between SQL types and host language types, e.g., for Java:
- CHAR, VARCHAR → String
- INTEGER → int
- DOUBLE → double


Static SQL using SQLJ

int sid = 5;
String name = "Jim", name6;
// Database connection setup omitted

#sql { INSERT INTO STUDENT VALUES (:sid, :name) };

#sql { SELECT name INTO :name6 FROM STUDENT WHERE sid = 6 };


JDBC: Dynamic SQL

import java.sql.*;

Connection conn = DriverManager.getConnection(…);
Statement s = conn.createStatement();

int sid = 5;
String name = "Jim";
s.executeUpdate("INSERT INTO STUDENT VALUES(" + sid + ", '" + name + "')");
// or equivalently
s.executeUpdate("INSERT INTO STUDENT VALUES(5, 'Jim')");


Static vs. Dynamic SQL
Syntax:
- Static is cleaner than Dynamic
- Dynamic doesn’t extend language syntax, so you can use any tool you like
Execution:
- Static must be precompiled
  - Can be faster at runtime
  - Extra step is needed to deploy application
- Static checks SQL syntax at compilation time, Dynamic at run time

We’ll focus on JDBC, since it’s easy to use.


The Impedance Mismatch and Cursors
- SQL is set-oriented: it returns relations
- There’s no relation type in most languages!
- Solution: a cursor that’s opened, then read one tuple at a time

ResultSet rs = stmt.executeQuery("SELECT * FROM STUDENT");
while (rs.next()) {
    int sid = rs.getInt("sid");
    String name = rs.getString("name");
    System.out.println(sid + ": " + name);
}


JDBC: Prepared Statements (1)
But query compilation takes a (relatively) long time! This example is therefore inefficient:

int[] students = {1, 2, 4, 7, 9};
for (int i = 0; i < students.length; ++i) {
    ResultSet rs = stmt.executeQuery("SELECT * " +
        "FROM STUDENT WHERE sid = " + students[i]);
    while (rs.next()) {
        …
    }
}


JDBC: Prepared Statements (2)
- To speed things up, prepare statements and bind arguments to them
- This also means you don’t have to worry about escaping strings, formatting dates, etc.
  - Problems with this lead to a lot of security holes (SQL injection)
  - Or suppose a user inputs the name “O’Reilly”

PreparedStatement stmt = conn.prepareStatement("SELECT * " +
    "FROM STUDENT WHERE sid = ?");
int[] students = {1, 2, 4, 7, 9};
for (int i = 0; i < students.length; ++i) {
    stmt.setInt(1, students[i]);
    ResultSet rs = stmt.executeQuery();
    while (rs.next()) {
        …
    }
}
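To see why string concatenation is risky, this sketch only builds the SQL text that a naive program would send to the server; no database is contacted, and `naiveQuery` is a hypothetical helper. An apostrophe in "O'Reilly" ends the string literal early (a syntax error at best), and crafted input can change the meaning of the WHERE clause.

```java
public class InjectionDemo {
    // Naive dynamic SQL: user input is pasted directly into the query text
    static String naiveQuery(String name) {
        return "SELECT * FROM STUDENT WHERE name = '" + name + "'";
    }

    public static void main(String[] args) {
        // The literal ends after 'O': the DBMS sees a syntax error
        System.out.println(naiveQuery("O'Reilly"));
        // The WHERE clause now matches every row: SQL injection
        System.out.println(naiveQuery("x' OR '1'='1"));
    }
}
```

With a prepared statement the query text stays fixed as `SELECT * FROM STUDENT WHERE name = ?`, and the value is bound with `stmt.setString(1, name)`, so the driver passes it as data rather than as SQL.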


Database-Backed Web Sites
We all know traditional static HTML web sites:

[Figure: the web browser sends an HTTP request (GET ...) to the web server, which loads the HTML file from the file system and returns it to the browser.]


Common Gateway Interface (CGI)
Can have the web server invoke code (with parameters) to generate HTML.

[Figure: for each HTTP request, the web server either loads an HTML file from the file system or executes a program; the program (which may do I/O, network, and DB access) writes HTML output that the server returns to the browser.]
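A minimal sketch of the program side of CGI, assuming the web server is configured to run this Java program as a CGI script: the server passes request parameters in environment variables such as `QUERY_STRING`, and whatever the program writes to standard output (a header, a blank line, then the body) goes back to the browser. The class and method names are hypothetical; the HTML escaping illustrates that security is the CGI programmer's job.

```java
public class CgiHello {
    // Build the complete CGI response: header, blank line, then the HTML body
    static String respond(String queryString) {
        String q = (queryString == null) ? "" : queryString;
        return "Content-Type: text/html\n\n"
            + "<html><body><p>You asked for: " + escapeHtml(q) + "</p></body></html>\n";
    }

    // Never echo request data unescaped; '&' must be replaced first
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // A new process runs this for every request, one source of CGI's inefficiency
        System.out.print(respond(System.getenv("QUERY_STRING")));
    }
}
```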


CGI: Discussion
Advantages:
- Standardized: works for every web server and browser
- Flexible: any language (C++, Perl, Java, …) can be used
Disadvantages:
- Statelessness: query-by-query approach
- Inefficient: a new process is forked for every request
- Security: the CGI programmer is responsible for security
- Updates: to update the layout, one has to be a programmer


DB Access in Java

[Figure labels: browser JVM running a Java applet; TCP/UDP over IP; Java server process; JDBC driver manager; one JDBC driver each for Sybase, Oracle, ...]


Java Applets: Discussion
Advantages:
- Can take advantage of client processing
- Platform independent, assuming standard Java
Disadvantages:
- Requires JVM on client; self-contained
- Inefficient: loading can take a long time...
- Resource intensive: client needs to be state of the art
- Restrictive: can only connect to the server the applet was loaded from (for security; can be configured)


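*SP Server Pages and Servlets (IIS, Tomcat, …)

[Figure: the web server answers an HTTP request either by loading an HTML file from the file system or by running a script/servlet inside a server extension (which may have a built-in VM: JVM, CLR); the script's output, produced using I/O, network, and DB access, is returned as HTML.]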


One Step Beyond: DB-Driven Web Sites (Strudel, Cocoon, …)

[Figure: a DB-driven web server answers HTTP requests with HTML produced by dynamic HTML generation, which combines scripts, styles, and data from a local database, a cache, and other data sources.]


Wrapping Up
We’ve seen how to query in SQL:
- Basic foundation is TRC-based
- Subqueries and aggregation add extra power beyond *RC
- Nulls and outer joins add flexibility of representation
- We can update tables
We’ve also seen that SQL doesn’t precisely match standard host language semantics:
- Embedded SQL
- Dynamic SQL
We’ve seen a hint of data-driven web site architectures.