Upload
edith-hardy
View
222
Download
3
Embed Size (px)
Citation preview
Dan Suciu -- p544 Fall 2011 2
Staff
• Instructor: Dan Suciu– CSE 662, [email protected]– Office hours: Wednesdays, 5:30-6:20
• TAs: – Sandra Fan, [email protected]
Dan Suciu -- p544 Fall 2011 3
Communications
• Web page: http://www.cs.washington.edu/p544 – Lectures will be available here– Homework will be posted here– Announcements may be posted here
• Mailing list:– Announcements, group discussions– If you registered, you are automatically subscribed
Dan Suciu -- p544 Fall 2011 4
Textbook(s)
Main textbook:• Database Management Systems,
Ramakrishnan and Gehrke
Second textbook:• Database Systems: The Complete Book,
Garcia-Molina, Ullman, Widom
Dan Suciu -- p544 Fall 2011 5
Course Format
• Lectures Wednesdays, 6:30-9:20
• 7 Homework Assignments
• Take-home Final
Dan Suciu -- p544 Fall 2011 7
Homework Assignments
1. SQL2. Conceptual design3. JAVA/SQL4. Transactions5. Database tuning6. XML/XPath/XQuery7. Pig Latin, on AWS
Due: Mondays’, by 11:59pm. Three late days per person
Dan Suciu -- p544 Fall 2011 8
Take-home Final
• Posted on December 8, at 11:59pm
• Due on December 10, by 10:00pm
• No late days/hours/minutes/seconds
Dan Suciu -- p544 Fall 2011 9
Software Tools• Postgres:
– Preferred usage: download from download http://www.postgresql.org/download/– Other option: use postgres on lab machines
• SQL Server 2008– Download client from http://msdnaa.cs.washington.edu – Username is your full @cs.washington.edu email address– Doesn’t work ? Email [email protected] – Connect to IPROJSRV (may need tunneling)– OK to use you own server, just import IMDB
• Xquery: download one interpreter from– Preferred: Saxon: http://saxon.sourceforge.net/ (from apache; very popular)– Others:
• Zorba: http://www.zorba-xquery.com/ (I used this one: ½ day installation)• Galax: http://galax.sourceforge.net/ (great in the past, seems less well maintained)
• Pig Latin: – We will run it on Amazon Web Services– You may download from http://hadoop.apache.org/pig/, but you won’t need it
Dan Suciu -- p544 Fall 2011 10
Accessing SQL Server
• SQL Server Management Studio• Server Type = Database Engine• Server Name = IPROJSRV• Authentication = SQL Server Authentication
– Login = your UW email address (not the CSE email)– Password = [in class]
• Must connect from within CSE, or must use tunneling• Alternatively: install your own, get it from MSDNAA
(see earlier slide)• Then play with IMDB, start working on HW 1
Dan Suciu -- p544 Fall 2011 11
Rest of Today’s Lecture
• Overview of DBMS
• Overview of the course content
• SQL
Dan Suciu -- p544 Fall 2011 13
Database
What is a database ?• A collection of files storing related data
Give examples of databases• Accounts database; payroll database; UW’s
students database; Amazon’s products database; airline reservation database
Dan Suciu -- p544 Fall 2011 15
Database Management System
What is a DBMS ?• A big C program written by someone else that allows
us to manage efficiently a large database and allows it to persist over long periods of time
Give examples of DBMS• DB2 (IBM), SQL Server (MS), Oracle, Sybase• MySQL, Postgres, …
SQL for Nerds, Greenspun, http://philip.greenspun.com/sql/ (Chap 1)
Dan Suciu -- p544 Fall 2011 16
Market Shares
From 2006 Gartner report:
• IBM: 21% market with $3.2BN in sales
• Oracle: 47% market with $7.1BN in sales
• Microsoft: 17% market with $2.6BN in sales
Dan Suciu -- p544 Fall 2011 17
An Example
The Internet Movie Databasehttp://www.imdb.com
• Entities: Actors (800k), Movies (400k), Directors, …
• Relationships:who played where, who directed what, …
Dan Suciu -- p544 Fall 2011 18
Key concept 1: Relational Data Model
Actor: Cast:
Movie:
id fName lName gender
195428 Tom Hanks M
645947 Amy Hanks F
. . .
id Name year
337166 Toy Story 1995
. . . . . . . ..
pid mid
195428 337166
. . .
Dan Suciu -- p544 Fall 2011 19
Key concept 2: Declarative Query LanguageSELECT *
FROM Actor
SELECT count(*)
FROM Actor
SELECT *
FROM Actor
WHERE lName = ‘Hanks’SQL
We write what we want,not how we want it.
20
Key concept 3: Data Independence
SELECT *
FROM Actor, Casts, Movie
WHERE lname='Hanks' and Actor.id = Casts.pid
and Casts.mid=Movie.id and Movie.year=1995
817k actors, 3.5M casts, 380k movies;
How can it be so fast ?
Physical data independence:query is independent of physical storage
Dan Suciu -- p544 Fall 2011 21
How Can We Evaluate the Query ?
Actor: Cast: Movie:
id fName lName gender
. . . Hanks
. . .
id Name year
. . . 1995
. . .
pid mid
. . .
. . .
Plan 1: . . . . [ in class ]
Plan 2: . . . . [ in class ]
Dan Suciu -- p544 Fall 2011 22
Actor Cast Movie
lName=‘Hanks’ year=1995
Actor Cast Movie
lName=‘Hanks’ year=1995
Indexes: on Actor.lName, on Movie.year
Alternative query plans:
Query optimizationDatabase Statistics histograms, synopses, etc
Dan Suciu -- p544 Fall 2011 23
Key concept 4: Transactions
X = Read(Account_1);X.amount = X.amount - 100;Write(Account_1, X);
Y = Read(Account_2);Y.amount = Y.amount + 100;Write(Account_2, Y);
CRASH !
What is the problem ?
Recovery from systems failures:Transfer $100 from account 1 to account 2:
Dan Suciu -- p544 Fall 2011 24
X = Read(Account);if (X.amount >= 100) { dispense_money( ); X.amount = X.amount – 100; }else error(“Insufficient funds”);
X = Read(Account);if (X.amount >= 100) { dispense_money( ); X.amount = X.amount – 100; }else error(“Insufficient funds”);
What can go wrong ?
Concurrency Control
Overdrafting an account:
User 1: User 2:
Dan Suciu -- p544 Fall 2011 25
Transactions
ACID =• Atomicity ( = recovery)• Consistency• Isolation ( = concurrency control)• Durability
Dan Suciu -- p544 Fall 2011 26
Client/Server Database Architecture
• Single server that stores the database• Many clients running apps and connecting to DBMS• Performance bottlenecks:
– Client/server communication– Transactional semantics
• Other architectures: – main memory database– replicated databases
Dan Suciu -- p544 Fall 2011 27
Two Types of Database Usage
• OLTP (online-transaction-processing)– Many updates– Many simple “point queries”– Few (or no) complex aggregate queries
• Decision-Support– Many aggregate/group-by queries.– Few (or no) updates
Dan Suciu -- p544 Fall 2011 28
Trends in Data Management
• Large scale data analytics: Map/Reduce, Pig, …• Cloud based database service: AWS, Azure, …• NoSQL: sacrifice ACID for performance• Data privacy• Data provenance• Complex data analytics: probabilistic databases
Dan Suciu -- p544 Fall 2011 29
Outline of Course Content
1. SQL2. Relational Calculus, Database Design3. Constraints, Views4. Transactions: recovery5. Transactions: concurrency control6. XML, XPath, XQuery7. Data storage, indexes, physical tuning8. Query execution9. Query optimization10. Big Data: Parallel databases, Map/Reduce, Pig Latin11. Advanced topics: privacy, provenance, probabilistic dbs
Dan Suciu -- p544 Fall 2011 30
Announcement: Homework 1
• Homework 1 is posted; • Due on Monday, Oct. 10• Tools:
– Postgres: install on your computer (PREFERRED) or use the installation in the lab
– SQL Server, for testing only; connect to IPROJSRV:login: your UW email address; password: ……..
• Tasks: create db, import data, create indices, write 11 SQL queries
Dan Suciu -- p544 Fall 2011 31
Outline for rest of today
• Basics SQL (Chapters 5.2, 5.3)• Aggregates (Chapter 5.5.)• Nulls, Outer joins (Chapter 5.6)• Subqueries (Chapters 5.4)
– This is tough ! Next lecture we will discuss Relational Calculus (a.k.a. Tuple Calculus, Chapter 4.3). See supplementary text Three Query Language Formalisms
Dan Suciu -- p544 Fall 2011 32
SQL
• Data Definition Language (DDL)– Create/alter/delete tables and their attributes– Read from the book
• Data Manipulation Language (DML)– Query tables, Insert/delete/modify – Discussed in class
Dan Suciu -- p544 Fall 2011 33
Tables in SQL
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product
Attribute namesTable name
Tuples or rows
Key
Dan Suciu -- p544 Fall 2011 34
The Relational Data ModelData is stored in tables , a.k.a. relations
Each relation has:1. A schema = name+attributes
– Product(PName, Price, Category, Manufacturer)– Each relation has a key, which we underline
2. An instance = set of rows
SQL departs from the pure relational model in that it allows duplicate tuples• Set semantics bag semantics
{1, 2, 3} {1, 1, 2, 3, 3, 3}
Dan Suciu -- p544 Fall 2011 35
Data Types in SQL
• Atomic types:– Characters: CHAR(20), VARCHAR(50)– Numbers: INT, BIGINT, SMALLINT, FLOAT– Others: MONEY, DATETIME, …
• Record (aka tuple)– Has atomic attributes
• Table (relation)– A set of tuples
Dan Suciu -- p544 Fall 2011 36
Simple SQL Query
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT *FROM ProductWHERE category=‘Gadgets’
Product
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks“selection”
Dan Suciu -- p544 Fall 2011 37
Simple SQL QueryPName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT PName, Price, ManufacturerFROM ProductWHERE Price > ‘$100’
Product
PName Price Manufacturer
SingleTouch $149.99 Canon
MultiTouch $203.99 Hitachi
“selection” and
“projection”
Dan Suciu -- p544 Fall 2011 38
Details• Case insensitive:
SELECT = Select = select
Product = product
BUT: ‘Seattle’ ≠ ‘seattle’
• Constants:
‘abc’ - yes
“abc” - no
Dan Suciu -- p544 Fall 2011 39
Eliminating Duplicates
SELECT DISTINCT category
FROM Product
Compare to:
SELECT category
FROM Product
Category
Gadgets
Gadgets
Photography
Household
Category
Gadgets
Photography
Household
Dan Suciu -- p544 Fall 2011 40
Ordering the Results
SELECT pname, price, manufacturer
FROM Product
WHERE category=‘Gadgets’ AND price > ‘$10’
ORDER BY price, pname
Ties are broken by the second attribute on the ORDER BY list.
Ordering is ascending, unless you specify the DESC keyword.
Dan Suciu -- p544 Fall 2011 41
SELECT Category
FROM Product
ORDER BY PName
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
?SELECT DISTINCT category
FROM Product
ORDER BY category
SELECT DISTINCT category
FROM Product
ORDER BY PName
?
?
Dan Suciu -- p544 Fall 2011 42
Keys and Foreign Keys
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product
Company
CName Country
GizmoWorks USA
Canon Japan
Hitachi Japan
Key
Foreignkey
Dan Suciu -- p544 Fall 2011 43
Joins
Product (PName, Price, Category, Manufacturer)
Company (CName,, Country)
Find all products under $200 manufactured in Japan;return their names and prices.
SELECT PName, PriceFROM Product, CompanyWHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= ‘$200’
Joinbetween Product
and Company
Dan Suciu -- p544 Fall 2011 44
Joins
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product Company
Cname Country
GizmoWorks USA
Canon Japan
Hitachi Japan
PName Price
SingleTouch $149.99
SELECT PName, PriceFROM Product, CompanyWHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= ‘$200’
Dan Suciu -- p544 Fall 2011 45
Tuple Variables
SELECT DISTINCT name, countryFROM Person, CompanyWHERE worksfor = cname
Whichcountry ?
Product (PName, Price, Category, Manufacturer)
Company (CName,, Country)
Person(name, Country, Worksfor)
SELECT DISTINCT Person.name, Company.countryFROM Person, CompanyWHERE Person.worksfor = Company.cname
SELECT DISTINCT x.name, y.countryFROM Person AS x, Company AS yWHERE x.worksfor = y.cname
Dan Suciu -- p544 Fall 2011 46
In Class
Product (pname, price, category, manufacturer)
Company (cname, country)
Find all Chinese companies that manufacture products in the ‘toy’ category
SELECT cname
FROM
WHERE
Dan Suciu -- p544 Fall 2011 47
In Class
Product (pname, price, category, manufacturer)
Company (cname, country)
Find all Chinese companies that manufacture products both in the ‘electronic’ and ‘toy’ categories
SELECT cname
FROM
WHERE
Dan Suciu -- p544 Fall 2011 48
The Nested Loop Semantics of SQL Queries
SELECT a1, a2, …, ak
FROM R1 AS x1, R2 AS x2, …, Rn AS xnWHERE Conditions
Answer = {}
for x1 in R1 do
for x2 in R2 do …..
for xn in Rn do if Conditions
then Answer = Answer {(a1,…,ak)}return Answer
Dan Suciu -- p544 Fall 2011 49
SELECT DISTINCT R.A
FROM R, S, T
WHERE R.A=S.A OR R.A=T.A
Using the Formal Semantics
If S ≠ and T ≠ ∅ ∅then returns R Ç (S È T)else returns ∅
What do these queries compute ?
SELECT DISTINCT R.A
FROM R, S
WHERE R.A=S.A
Returns R Ç S
Dan Suciu -- p544 Fall 2011 50
Aggregation
SELECT count(*)FROM Product
Except count, all aggregations apply to a single attribute
SELECT sum(price)FROM ProductWHERE manufacturer=‘GizmoWorks’
SQL supports several aggregation operations:
sum, count, min, max, avg
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 51
COUNT applies to duplicates, unless otherwise stated:
SELECT count(category) FROM ProductWHERE price > ‘$20’
If category has no nulls,then count(category)=count(*)
We probably want:
SELECT count(DISTINCT category)FROM ProductWHERE price > ‘$20’
Aggregation: CountProduct (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 52
Grouping and Aggregation
SELECT manufacturer, count(*) AS totalFROM ProductWHERE price < ‘$200’GROUP BY manufacturer
Let’s see what this means…
For each manufacturer, find total number of its products under $200.
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 53
Grouping and Aggregation
1. Compute the FROM and WHERE clauses.
2. Group by the attributes in the GROUPBY
3. Compute the SELECT clause, including aggregates.
Dan Suciu -- p544 Fall 2011 54
1&2. FROM-WHERE-GROUPBY
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT manufacturer, count(*) AS totalFROM ProductWHERE price < ‘$200’GROUP BY manufacturer
Dan Suciu -- p544 Fall 2011 55
3. SELECTSELECT manufacturer, count(*) AS totalFROM ProductWHERE price < ‘$200’GROUP BY manufacturer
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
count(*) Manufacturer
2 GizmoWorks
1 Canon
Dan Suciu -- p544 Fall 2011 56
HAVING Clause
SELECT manufacturer, count(*) AS totalFROM ProductWHERE price < ‘$200’GROUP BY manufacturerHAVING min(price) >’$20’
Same query, except that we return only those manufacturersthat make only products with price > $20
HAVING clause contains conditions on aggregates.
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 57
General form of Grouping and Aggregation
SELECT SFROM R1,…,Rn
WHERE C1GROUP BY a1,…,ak
HAVING C2
S = may contain attributes a1,…,ak and/or any aggregates but NO OTHER ATTRIBUTES
C1 = is any condition on the attributes in R1,…,Rn
C2 = is any condition on aggregate expressions
Why ?
Dan Suciu -- p544 Fall 2011 58
General form of Grouping and Aggregation
Evaluation steps:1. Evaluate FROM-WHERE, apply condition C1
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)4. Compute aggregates in S and return the result
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak
HAVING C2
Dan Suciu -- p544 Fall 2011 59
NULLS in SQL
• Whenever we don’t have a value, we can put a NULL• Can mean many things:
– Value does not exists– Value exists but is unknown– Value not applicable– Etc.
• The schema specifies for each attribute if can be null (nullable attribute) or not
• How does SQL cope with tables that have NULLs ?
Dan Suciu -- p544 Fall 2011 60
Null Values
• If x= NULL then 4*(3-x)/7 is still NULL
• If x= NULL then x=‘Joe’ is UNKNOWN• In SQL there are three boolean values:
FALSE = 0UNKNOWN = 0.5TRUE = 1
Dan Suciu -- p544 Fall 2011 61
Null Values
• C1 AND C2 = min(C1, C2)• C1 OR C2 = max(C1, C2)• NOT C1 = 1 – C1
Rule in SQL: include only tuples that yield TRUE
SELECT *FROM PersonWHERE (age < 25) AND (height > 6 OR weight > 190)
E.g.age=20heigth=NULLweight=200
Dan Suciu -- p544 Fall 2011 62
Null Values
Unexpected behavior:
Some Persons are not included !
SELECT *FROM PersonWHERE age < 25 OR age >= 25
Dan Suciu -- p544 Fall 2011 63
Null Values
Can test for NULL explicitly:– x IS NULL– x IS NOT NULL
Now it includes all Persons
SELECT *FROM PersonWHERE age < 25 OR age >= 25 OR age IS NULL
Dan Suciu -- p544 Fall 2011
Outerjoins
64
SELECT x.country, y.pnameFROM Company x JOIN Product y ON x.cname = y.manufacturer
SELECT x.country, y.pnameFROM Company x, Product y WHERE x.cname = y.manufacturer
Same as:
But countries that don’t manufacture will not be listed !
Product (pname, price, category, manufacturer)
Company (cname, country)Normally, joins are “inner joins”:
Dan Suciu -- p544 Fall 2011
Outerjoins
65
SELECT x.country, y.pnameFROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer
If we want to see the companies that don’t produce anything,then we use an outer join:
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 66
ProductCompanyPName Price Category
Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Cname Country
GizmoWorks USA
Canon Japan
Hitachi Japan
MuseumPass Vatican
Cname PName
USA GizmoWorks
USA GizmoWorks
Japan Canon
Japan Hitachi
Vatican NULL
Dan Suciu -- p544 Fall 2011
Application
67
SELECT x.country, count(*)FROM Company x, Product y WHERE x.cname = y.manufacturerGROUP BY x.country
What’s wrong ?
Product (pname, price, category, manufacturer)
Company (cname, country)
Compute the total number of products made by each country
Dan Suciu -- p544 Fall 2011
Application
68
SELECT x.country, count(y.pname)FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturerGROUP BY x.country
Now we also get the products who sold in 0 quantity
Product (pname, price, category, manufacturer)
Company (cname, country)
Compute the total number of products made by each country
Note: we don’tuse count(*)
WHY ?
Dan Suciu -- p544 Fall 2011 69
Outer Joins
• Left outer join:– Include the left tuple even if there’s no match
• Right outer join:– Include the right tuple even if there’s no match
• Full outer join:– Include the both left and right tuples even if there’s no
match
Dan Suciu -- p544 Fall 2011
Subqueries
• A subquery is another SQL query nested inside a larger query
• Such inner-outer queries are called nested queries• A subquery may occur in:
1. A SELECT clause2. A FROM clause3. A WHERE clause
70
Rule of thumb: avoid writing nested queries when possible; sometimes it’s impossible
Dan Suciu -- p544 Fall 2011 71
1. Subqueries in SELECT
Product (pname, price, category, manufacturer)
Company (cname, country)
For each product return the country that manufactures it
SELECT X.pname, (SELECT Y.country FROM Company Y WHERE Y.cname=X.manufacturer)FROM Product X
What happens if a subquery returns more than one country ?
Dan Suciu -- p544 Fall 2011 72
1. Subqueries in SELECT
Whenever possible, don’t use a nested queries:
= We have “unnested”the query
SELECT X.pname, (SELECT Y.country FROM Company Y WHERE Y.cname=X.manufacturer)FROM Product X
SELECT pname, countryFROM Product, CompanyWHERE cname=manufacturer
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 73
1. Subqueries in SELECT
Compute the number of products made by each country
SELECT DISTINCT x.country, (SELECT count(*) FROM Company y, Product WHERE y.cname=manufacturer and y.country = x.country)FROM Company x
Better: we can unnest by using a GROUP BY
Product (pname, price, category, manufacturer)
Company (cname, country)
SELECT x.country, count(*)FROM Company x, Product zWHERE x.cname = z.manufacturerGROUP BY x.country
74
GROUP BY v.s. Nested Quereis
SELECT manufacturer, count(*) AS totalFROM ProductWHERE price < '$200’GROUP BY manufacturer
SELECT DISTINCT x.manufacturer, (SELECT count(*) FROM Product y WHERE x.manufacturer = y.manufacturer AND price < '$200’) AS totalFROM Product xWHERE price < '$200’
Why twice ?Dan Suciu -- p544 Fall 2011
Dan Suciu -- p544 Fall 2011 75
2. Subqueries in FROM
Find all products whose prices is > 20 and < 30
SELECT *FROM (SELECT * FROM Product AS Y WHERE Y.price > ‘$20’) AS xWHERE x.price < ‘$30’
Unnest this query !
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 76
3. Subqueries in WHERE
Find all countries that make some products with price < 100
SELECT DISTINCT x.countryFROM Company xWHERE EXISTS (SELECT * FROM Product y WHERE y.manufacturer = x.cname and y.price < ‘$100’)
Existential quantifiers
Using EXISTS:
Product (pname, price, category, manufacturer)
Company (cname, country)
Correlated subqery:uses x from outer query
Dan Suciu -- p544 Fall 2011 77
3. Subqueries in WHERE
Find all countries that make some products with price < 100
Predicate Calculus (a.k.a. First Order Logic)
{ y | x.Company(x,y)∃ ∧( z. p. c.Product(z,p,c,x)∃ ∃ ∃ ∧p<100)}
Existential quantifiersProduct (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 78
3. Subqueries in WHERE
Find all countries that make some products with price < 100
SELECT DISTINCT countryFROM CompanyWHERE cname IN (SELECT Product.manufacturer FROM Product WHERE Product.price < ‘$100’)
Using IN
Existential quantifiersProduct (pname, price, category, manufacturer)
Company (cname, country)
De-correlated subqery
Dan Suciu -- p544 Fall 2011 79
3. Subqueries in WHERE
Find all countries that make some products with price < 100
SELECT DISTINCT Company.countryFROM CompanyWHERE ‘$100’ > ANY (SELECT price FROM Product WHERE manufacturer = cname)
Using ANY:
Existential quantifiersProduct (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 80
3. Subqueries in WHERE
Find all countries that make some products with price < 100
SELECT DISTINCT x.countryFROM Company x, Product yWHERE x.cname = y.manufacturer and y.price < ‘$100’
Existential quantifiers are easy !
Now let’s unnest it:
Existential quantifiersProduct (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 81
3. Subqueries in WHERE
Universal quantifiers are hard !
Find the countries of all companies that make only products with price < 100
Universal quantifiersProduct (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 82
3. Subqueries in WHERE
Predicate Calculus (a.k.a. First Order Logic)
{ y | x.Company(x,y)∃ ∧( z. p. c.Product(z,p,c,x)∀ ∀ ∀ p<100) }
Find the countries of all companies that make only products with price < 100
Universal quantifiersProduct (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 83
3. Subqueries in WHERE
{ y | x. Company(x,y)∃ ∧( z. p. c.Product(z,p,c,x)∀ ∀ ∀ p<100) }
De Morgan’s Laws:¬(A B) = ¬A ¬B∧ ∨¬(A B) = ¬A ¬B∨ ∧¬ x. P(x) = x. ¬ P(x)∀ ∃¬ x. P(x) = x. ¬ P(x)∃ ∀
{ y| x.Company(x,y)∃ ∧¬( z p. p.Product(z,p,c,x)∃ ∃ ∃ ∧p≥100) }
{ y | x. Company(x,y)) } −∃{ y | x. Company(x,y) ∃ ∧( z p. c.Product(z,p,c,x)∃ ∃ ∃ ∧p≥100) }
¬(A B) = A ¬B ∧
=
=
Dan Suciu -- p544 Fall 2011 84
3. Subqueries in WHERE
2. Find all companies s.t. all their products have price < 100
1. Find the other companies: i.e. s.t. some product 100
SELECT DISTINCT countryFROM CompanyWHERE cname IN (SELECT manufacturer FROM Product WHERE price >= ‘$100’)
SELECT DISTINCT countryFROM CompanyWHERE cname NOT IN (SELECT manufacturer FROM Product WHERE price >= ‘$100’)
Dan Suciu -- p544 Fall 2011 85
3. Subqueries in WHERE
Find the countries of all companies that make only products with price < 100
Universal quantifiers
Using EXISTS:
SELECT DISTINCT x.countryFROM Company xWHERE NOT EXISTS (SELECT * FROM Product y WHERE y.manufacturer = x.cname and y.price >= ‘$100’)
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 86
3. Subqueries in WHERE
SELECT DISTINCT Company.countryFROM CompanyWHERE ‘$100’ > ALL (SELECT price FROM Product WHERE manufacturer = cname)
Using ALL:
Find the countries of all companies that make only products with price < 100
Universal quantifiers
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 87
Question for Database Fansand their Friends
• Can we unnest this query ?
Find the countries of all companies that make only products with price < 100
Dan Suciu -- p544 Fall 2011 88
Monotone Queries• A query Q is monotone if:
– Whenever we add tuples to one or more of the tables…– … the answer to the query cannot contain fewer tuples
• Fact: all unnested queries are monotone – Proof: using the “nested for loops” semantics
• Fact: A query a universal quantifier is not monotone
• Consequence: we cannot unnest a query with a universal quantifier
Dan Suciu -- p544 Fall 2011
Queries that must be nested
89
Rule of Thumb:Non-monotone queries cannot be unnested. In particular, queries with a universal quantifier cannot be unnested
Dan Suciu -- p544 Fall 2011 90
More SQL
Read the following commands in the book
• CREATE TABLE• INSERT• DELETE• UPDATE
They are easy; but we need/use them all the time in class, and in the homework assignments
Unnesting Aggregates
For each category, find the maximum price
SELECT DISTINCT X.category, (SELECT max(Y.price) FROM Product Y WHERE X.category = Y.category)FROM Product X
SELECT category, max(price)FROM ProductGROUP BY category
Equivalent queries
Note: no need for DISTINCT(DISTINCT is the same as GROUP BY)
Product (pname, price, category, manufacturer)
Company (cname, country)
Unnesting Aggregates
Find the number of products made in each country
SELECT DISTINCT X.country, (SELECT count(*) FROM Company Y, Product Z WHERE Y.cname=Z.manufacturer AND Y.country = X.country)FROM Company X
SELECT X.country, count(*)FROM Company X, Product YWHERE X.cname=Y.manufacturer GROUP BY X.country
They are NOTequivalent !
(WHY?)
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 94
More Unnesting
• Find authors who wrote ³ 10 documents:• Attempt 1: with nested queries
SELECT DISTINCT Author.nameFROM AuthorWHERE count(SELECT Wrote.url FROM Wrote WHERE Author.login=Wrote.login) > 10
This isSQL bya novice
Author(login,name)
Wrote(login,url)
Dan Suciu -- p544 Fall 2011 95
More Unnesting
• Find all authors who wrote at least 10 documents:
• Attempt 2: SQL style (with GROUP BY)
SELECT DISTINCT Author.nameFROM Author, WroteWHERE Author.login=Wrote.loginGROUP BY Author.nameHAVING count(wrote.url) > 10
This isSQL by
an expert
Dan Suciu -- p544 Fall 2011 96
Finding Witnesses
For each country, find its most expensive products
Product (pname, price, category, manufacturer)
Company (cname, country)
Finding Witnesses
SELECT x.country, max(y.price)FROM Company x, Product yWHERE x.cname = y.manufacturerGROUP BY x.country
Finding the maximum price is easy…
But we need the witnesses, i.e. the products with max price
For each country, find its most expensive products
Product (pname, price, category, manufacturer)
Company (cname, country)
Dan Suciu -- p544 Fall 2011 98
Finding Witnesses
SELECT u.country, v.pname, v.priceFROM Company u, Product v, (SELECT x.country, max(y.price) as mprice FROM Company x, Product y WHERE x.cname = y.manufacturer GROUP BY x.country) AS pWHERE u.country = p.country and v.price = p.mprice
To find the witnesses, compute the maximum pricein a subquery
Dan Suciu -- p544 Fall 2011 99
Finding Witnesses
There is a more concise solution here:
SELECT x.country, y.pname, y.priceFROM Company x, Product yWHERE x.cname = y.manufacturer and y.price >= ALL (SELECT z.price FROM Product z WHERE x.cname = z.manufacturer)
Product (pname, price, category, manufacturer)
Company (cname, country)