Upload
ada-johnston
View
217
Download
0
Embed Size (px)
DESCRIPTION
Outline and Reading Material Schema Normalization: 19.1 – 19.7 –Read only about BCNF, skip 3NF –Note: you need BCNF for HW2 Constraints and triggers: 3.2, 3.3, 5.8 Views: 3.6 –Answering queries using views: Sec. 1,2,3 Dan Suciu -- CSEP544 Fall Some material is NOT covered in the book. Read the slides carefully: we will not cover all in class!
Citation preview
Dan Suciu -- CSEP544 Fall 2011
Lecture 03:Normal Forms, Constraints, Views
Wednesday, October 12, 2011
1
2
Announcements
• HW1: was due Monday
• HW2: due next Monday
Dan Suciu -- CSEP544 Fall 2011
Outline and Reading Material
• Schema Normalization: 19.1 – 19.7– Read only about BCNF, skip 3NF– Note: you need BCNF for HW2
• Constraints and triggers: 3.2, 3.3, 5.8• Views: 3.6
– Answering queries using views: Sec. 1,2,3
Dan Suciu -- CSEP544 Fall 2011 3
Some material is NOT covered in the book.Read the slides carefully: we will not cover all in class!
Schema Normalization
Dan Suciu -- CSEP544 Fall 2011 4
5
Normal Forms
• 1st Normal Form = all tables are flat• 2nd Normal Form = obsolete• Boyce Codd Normal Form = will study• 3rd Normal Form = see book• 4th Normal Form = golden standard
Dan Suciu -- CSEP544 Fall 2011
6
First Normal Form (1NF)• A database schema is in First Normal
Form if all tables are flat
Name GPA Courses
Alice 3.8
Bob 3.7
Carol 3.9
Math
DB
OS
DB
OS
Math
OS
Student Name GPA
Alice 3.8
Bob 3.7
Carol 3.9
Student
Course
Math
DB
OS
Student Course
Alice Math
Carol Math
Alice DB
Bob DB
Alice OS
Carol OS
Takes Course
May needto add keys
7
Relational Schema Design
PersonbuysProduct
name
price name ssn
Conceptual Model:
Relational Model:plus FD’s
Normalization:Eliminates anomalies
Dan Suciu -- CSEP544 Fall 2011
8
Data Anomalies
When a database is poorly designed we get anomalies:
Redundancy: data is repeated
Updated anomalies: need to change in several places
Delete anomalies: may lose data when we don’t want
Dan Suciu -- CSEP544 Fall 2011
9
Relational Schema Design
Anomalies:• Redundancy = repeat data• Update anomalies = Fred moves to “Bellevue”• Deletion anomalies = Joe deletes his phone number:
what is his city ?
Recall set attributes (persons with several phones):Name SSN PhoneNumber CityFred 123-45-6789 206-555-1234 SeattleFred 123-45-6789 206-555-6543 SeattleJoe 987-65-4321 908-555-2121 Westfield
One person may have multiple phones, but lives in only one city
10
Relation DecompositionBreak the relation into two:
Name SSN City
Fred 123-45-6789 SeattleJoe 987-65-4321 Westfield
SSN PhoneNumber
123-45-6789 206-555-1234123-45-6789 206-555-6543987-65-4321 908-555-2121Anomalies have gone:
• No more repeated data• Easy to move Fred to “Bellevue” (how ?)• Easy to delete all Joe’s phone number (how ?)
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 SeattleFred 123-45-6789 206-555-6543 SeattleJoe 987-65-4321 908-555-2121 Westfield
11
Relational Schema Design(or Logical Design)
Main idea:• Start with some relational schema• Find out its functional dependencies• Use them to design a better relational
schema
Dan Suciu -- CSEP544 Fall 2011
12
Functional Dependencies
What is a functional dependency?• It is a kind of constraint• In theory one should find the FDs during
requirement analysis, then apply rigorously the normalization steps that we discuss next
• In practice one rarely follows this strict prescription
Dan Suciu -- CSEP544 Fall 2011
13
Functional Dependencies
Meaning: If two tuples agree on the attributes
then they must also agree on the attributes
Notation:
A1, A2, …, An B1, B2, …, Bm
A1, A2, …, An
B1, B2, …, Bm
Dan Suciu -- CSEP544 Fall 2011
14
When Does an FD Hold
Definition: A1, ..., Am B1, ..., Bn holds in R if:
t, t’ R, (t.A1=t’.A1 ... t.Am=t’.Am t.B1=t’.B1 ... t.Bn=t’.Bn )
A1 ... Am B1 ... nm
if t, t’ agree here then t, t’ agree here
t
t’
R
15
Examples
EmpID Name, Phone, PositionPosition Phonebut not Phone Position
An FD holds, or does not hold on an instance:
EmpID Name Phone PositionE0045 Smith 1234 ClerkE3542 Mike 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer
Dan Suciu -- CSEP544 Fall 2011
16
Example
Position Phone
EmpID Name Phone PositionE0045 Smith 1234 ClerkE3542 Mike 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer
Dan Suciu -- CSEP544 Fall 2011
17
Example
EmpID Name Phone PositionE0045 Smith 1234 ClerkE3542 Mike 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer
but not Phone PositionDan Suciu -- CSEP544 Fall 2011
18
ExampleFD’s are constraints: On some instances they hold On others they don’t
name category color department price
Gizmo Gadget Green Toys 49
Tweaks Gadget Green Toys 99
Does this instance satisfy all the FDs ?
name colorcategory departmentcolor, category price
Dan Suciu -- CSEP544 Fall 2011
19
Example
name category color department price
Gizmo Gadget Green Toys 49
Tweaks Gadget Black Toys 99
Gizmo Stationary Green Office-supp. 59
What about this one ?
name colorcategory departmentcolor, category price
20
An Interesting Observation
If all these FDs are true:name colorcategory departmentcolor, category price
Then this FD also holds: name, category price
Why ??
21
Goal: Find ALL Functional Dependencies
• Anomalies occur when certain “bad” FDs hold
• We know some of the FDs
• Need to find all FDs, then look for the bad ones
Dan Suciu -- CSEP544 Fall 2011
22
Armstrong’s Rules (1/3)
Is equivalent to
Splitting rule and Combing rule
A1 ... Am B1 ... Bm
A1, A2, …, An B1, B2, …, Bm
A1, A2, …, An B1
A1, A2, …, An B2
. . . . .A1, A2, …, An Bm
Dan Suciu -- CSEP544 Fall 2011
23
Armstrong’s Rules (2/3)
Trivial Rule
Why ?
A1 … Am
where i = 1, 2, ..., n
A1, A2, …, An Ai
Dan Suciu -- CSEP544 Fall 2011
24
Armstrong’s Rules (3/3)
Transitive Closure Rule
If
and
then
Why ?
A1, A2, …, An B1, B2, …, Bm
B1, B2, …, Bm C1, C2, …, Cp
A1, A2, …, An C1, C2, …, Cp
Dan Suciu -- CSEP544 Fall 2011
25
A1 … Am B1 … Bm C1 ... Cp
Dan Suciu -- CSEP544 Fall 2011
26
Example (continued)Start with these:
Infer these:
Inferred FD Which Ruledid we apply ?
4. name, category name5. name, category color6. name, category category7. name, category color, category8. name, category price
1. name color2. category department3. color, category price
27
Example (continued)
Answers:
Inferred FD Which Ruledid we apply ?
4. name, category name Trivial rule5. name, category color Transitivity on 4, 16. name, category category Trivial rule7. name, category color, category Split/combine on 5, 68. name, category price Transitivity on 3, 7
1. name color2. category department3. color, category price
THIS IS TOO HARD ! There is an easier way.
28
Closure of a set of AttributesGiven a set of attributes A1, …, An
The closure, {A1, …, An}+ = the set of attributes B s.t. A1, …, An B
name colorcategory departmentcolor, category price
Example:
name+ = {name, color}{name, category}+ = {name, category, color, department, price}color+ = {color}
29
Closure AlgorithmX={A1, …, An}.
repeat until X doesn’t change do: if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X.
{name, category}+ = ? [ …in class…]
name colorcategory departmentcolor, category price
Example:
30
Practice at Home…
Compute {A,B}+ X = {A, B, }
Compute {A, F}+ X = {A, F, }
R(A,B,C,D,E,F) A, B CA, D EB DA, F B
Dan Suciu -- CSEP544 Fall 2011
31
Why Do We Need Closure
• With closure we can find all FD’s easily
• To check if X A– Compute X+
– Check if A Î X+
Dan Suciu -- CSEP544 Fall 2011
Practice at Home
A, B CA, D BB D
Find all FD’s implied by:
Practice at Home
A, B CA, D BB D
Step 1: Compute X+, for every X:A+ = A, B+ = BD, C+ = C, D+ = DAB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD
Find all FD’s implied by:
Practice at Home
A, B CA, D BB D
Step 1: Compute X+, for every X:A+ = A, B+ = BD, C+ = C, D+ = DAB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X Y, s.t. Y X+ and XY = :AB CD, ADBC, ABC D, ABD C, ACD B
Find all FD’s implied by:
Keys
A superkey is a set of attributes X = A1, ..., An s.t. for any other attribute B, we have X B
Note: X is a superkey if X+ = [all attributes]
A key is a minimal superkey X
How do we compute all keys X ?
36
Example
Product(name, price, category, color)
name, category pricecategory color
What are the keys ?
Dan Suciu -- CSEP544 Fall 2011
37
Example
Product(name, price, category, color)
name, category pricecategory color
(name, category) + = name, category, price, color
Hence X = (name, category) is a key
What are the keys ?
Practice at Home
Dan Suciu -- CSEP544 Fall 2011 38
student addressroom, time coursestudent, course room, time
Find all keys
Enrollment(student, address, course, room, time)
39
Read at Home
We can have more than one key.
Example: relation R(A,B,C) satisfies
ABCBCA
ABCBACor
Find all keys
40
Boyce-Codd Normal Form
There are no“bad” FDs:
A relation R is in BCNF if:
Whenever X B is a non-trivial dependency, then X is a superkey.
Equivalently:
Dan Suciu -- CSEP544 Fall 2011
A relation R is in BCNF if: " X, either X+ = X or X+ = [all attributes]
41
Example
The only key is: {SSN, PhoneNumber}Hence SSN Name, City is a “bad” dependency
SSN Name, City
Alternatively: SSN+ = Name, City and is neither SSN nor all attributes
Name SSN PhoneNumber CityFred 123-45-6789 206-555-1234 SeattleFred 123-45-6789 206-555-6543 SeattleJoe 987-65-4321 908-555-2121 WestfieldJoe 987-65-4321 908-555-1234 Westfield
42
BCNF Decomposition AlgorithmNormalize(R) find X s.t.: X ≠ X+ ≠ [all attributes] if (not found) then “R is in BCNF” let Y = X+ - X; Z = [all attributes] - X+ decompose R into R1(X Y) and R2(X Z) Normalize(R1); Normalize(R2);
Y X Z
X+
43
Example BCNF DecompositionPerson(name, SSN, age, hairColor, phoneNumber)
SSN name, ageage hairColor
Find X s.t.: X ≠X+ ≠ [all attributes]
Dan Suciu -- CSEP544 Fall 2011
44
Example BCNF DecompositionPerson(name, SSN, age, hairColor, phoneNumber)
SSN name, ageage hairColor
Find X s.t.: X ≠X+ ≠ [all attributes]
Dan Suciu -- CSEP544 Fall 2011
Iteration 1: Person: SSN+ = SSN, name, age, hairColorDecompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber)
SSNname,age,hairColor
phoneNumber
45
Example BCNF DecompositionPerson(name, SSN, age, hairColor, phoneNumber)
SSN name, ageage hairColor
Find X s.t.: X ≠X+ ≠ [all attributes]
Dan Suciu -- CSEP544 Fall 2011
Iteration 1: Person: SSN+ = SSN, name, age, hairColorDecompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber)
Iteration 2: P: age+ = age, hairColorDecompose: People(SSN, name, age) Hair(age, hairColor) Phone(SSN, phoneNumber)
What arethe keys ?
46
Practice at HomeA BB C
R(A,B,C,D) A+ = ABC ≠ ABCD
R(A,B,C,D)
Dan Suciu -- CSEP544 Fall 2011
47
Practice at Home
What arethe keys ?
A BB C
R(A,B,C,D) A+ = ABC ≠ ABCD
R(A,B,C,D)
What happens if in R we first pick B+ ? Or AB+ ?
R1(A,B,C) B+ = BC ≠ ABC
R2(A,D)
R11(B,C) R12(A,B)
Dan Suciu -- CSEP544 Fall 2011
48
Decompositions in General
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
Dan Suciu -- CSEP544 Fall 2011
49
Theory of Decomposition
Sometimes it is correct:Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Price
Gizmo 19.99
OneClick 24.99
Gizmo 19.99
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Lossless decomposition
50
Incorrect Decomposition
Sometimes it is not:
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Price Category
19.99 Gadget
24.99 Camera
19.99 Camera
What’sincorrect ??
Lossy decomposition
51
Decompositions in GeneralR(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
If A1, ..., An B1, ..., Bm
Then the decomposition is lossless
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
BCNF decomposition is always lossless. WHY ?
Note: don’t need A1, ..., An C1, ..., Cp
Constraints
Dan Suciu -- CSEP544 Fall 2011 52
Dan Suciu -- CSEP544 Fall 2011 53
Integrity Constraints in SQL
• Constraint = a property we want to hold• System enforces constraints at updates• Constraints in SQL:
– Keys, foreign keys– Attribute-level constraints– Tuple-level constraints– Global constraints: assertions
54
Keys
OR:
CREATE TABLE Product (name CHAR(30) PRIMARY KEY,price INT)
CREATE TABLE Product (name CHAR(30),price INT,
PRIMARY KEY (name))
Product(name, price)
Dan Suciu -- CSEP544 Fall 2011
55
Keys with Multiple Attributes
CREATE TABLE Product (name CHAR(30),category VARCHAR(20),price INT,
PRIMARY KEY (name, category))
Name Category Price
Gizmo Gadget 10
Camera Photo 20
Gizmo Photo 30
Gizmo Gadget 40
Product(name, category, price)
56
Other Keys
CREATE TABLE Product ( productID CHAR(10),
name CHAR(30),category VARCHAR(20),price INT,
PRIMARY KEY (productID), UNIQUE (name, category))
There is at most one PRIMARY KEY;there can be many UNIQUE
Dan Suciu -- CSEP544 Fall 2011
57
Foreign Key Constraints
CREATE TABLE Purchase ( buyer CHAR(30), seller CHAR(30), product CHAR(30) REFERENCES Product(name), store VARCHAR(30))
Foreign key
Purchase(buyer, seller, product, store)Product(name, price)
Dan Suciu -- CSEP544 Fall 2011
58
Name Category
Gizmo gadget
Camera Photo
OneClick Photo
ProdName Store
Gizmo Wiz
Camera Ritz
Camera Wiz
Product Purchase
Dan Suciu -- CSEP544 Fall 2011
Foreign Key Constraints
59
Purchase(buyer, seller, product, category, store)Product(name, category, price)
CREATE TABLE Purchase( buyer VARCHAR(50), seller VARCHAR(50), product CHAR(20), category VARCHAR(20), store VARCHAR(30), FOREIGN KEY (product, category) REFERENCES Product(name, category) );
Dan Suciu -- CSEP544 Fall 2011
60
Name Category
Gizmo gadget
Camera Photo
OneClick Photo
ProdName Store
Gizmo Wiz
Camera Ritz
Camera Wiz
Product Purchase
What happens during updates ?
Types of updates:• In Purchase: insert/update• In Product: delete/update
61
What happens during updates ?
• SQL has three policies for maintaining referential integrity:
• Reject violating modifications (default)• Cascade: after a delete/update do a
delete/update• Set-null set foreign-key field to NULL
Dan Suciu -- CSEP544 Fall 2011
Constraints on Attributes and Tuples
62
CREATE TABLE Purchase ( . . . store VARCHAR(30) NOT NULL, . . . )
CREATE TABLE Product ( . . . price INT CHECK (price >0 and price < 999))
Attribute level constraints:
Tuple level constraints:
Dan Suciu -- CSEP544 Fall 2011 . . . CHECK (price * quantity < 10000) . . .
63
CREATE TABLE Purchase (prodName CHAR(30)
CHECK (prodName IN SELECT Product.name FROM Product), date DATETIME NOT NULL)
Whatis the difference from
Foreign-Key ?
Dan Suciu -- CSEP544 Fall 2011
64
General Assertions
CREATE ASSERTION myAssert CHECK NOT EXISTS(
SELECT Product.nameFROM Product, PurchaseWHERE Product.name = Purchase.prodNameGROUP BY Product.nameHAVING count(*) > 200)
Dan Suciu -- CSEP544 Fall 2011
65
Comments on Constraints
• Can give them names, and alter later
• We need to understand exactly when they are checked
• We need to understand exactly what actions are taken if they fail
Dan Suciu -- CSEP544 Fall 2011
Semantic Optimization using Constraints
66
SELECT Purchase.storeFROM Product, PurchaseWHERE Product.name=Purchase.product
Purchase(buyer, seller, product, store)Product(name, price)
SELECT Purchase.storeFROM Purchase
Why ? and When ?
Views
Dan Suciu -- CSEP544 Fall 2011 67
Overview
Views are ubiquitous in data management:
• Used in SQL as names for predefined queries
• More generally, any derived data is a view
Dan Suciu -- CSEP544 Fall 2011 68
69
CREATE VIEW CustomerPrice AS SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname
View Basics
CustomerPrice(customer, price) “virtual table”Dan Suciu -- CSEP544 Fall 2011
Views are relations, defined by a query
Purchase(customer, product, store)Product(pname, price)
CustomerPrice(customer, price)
View Basics
Dan Suciu -- CSEP544 Fall 2011 70
SELECT DISTINCT u.customer, v.storeFROM CustomerPrice u, Purchase vWHERE u.customer = v.customer AND u.price > 100
We can later use the view:
Purchase(customer, product, store)Product(pname, price)
CustomerPrice(customer, price)
71
View Basics
SELECT DISTINCT u.customer, v.storeFROM CustomerPrice u, Purchase vWHERE u.customer = v.customer AND u.price > 100
CREATE VIEW CustomerPrice AS SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname
View:
Query:
Purchase(customer, product, store)Product(pname, price)
CustomerPrice(customer, price)
Dan Suciu -- CSEP544 Fall 2011
72
View Basics
SELECT DISTINCT u.customer, v.storeFROM (SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname) u, Purchase vWHERE u.customer = v.customer AND u.price > 100
Modified query:
Dan Suciu -- CSEP544 Fall 2011
Purchase(customer, product, store)Product(pname, price)
CustomerPrice(customer, price)
Next, unnest the query…
73
View Basics
SELECT DISTINCT x.customer, v.storeFROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname
Modified and unnested query:
Dan Suciu -- CSEP544 Fall 2011
Purchase(customer, product, store)Product(pname, price)
CustomerPrice(customer, price)
74
Practice at Home…
SELECT DISTINCT u.customer, v.storeFROM CustomerPrice u, Purchase vWHERE u.customer = v.customer AND u.price > 100
??
Dan Suciu -- CSEP544 Fall 2011
Purchase(customer, product, store)Product(pname, price)
CustomerPrice(customer, price)
75
Answer
SELECT DISTINCT u.customer, v.storeFROM CustomerPrice u, Purchase vWHERE u.customer = v.customer AND u.price > 100
Dan Suciu -- CSEP544 Fall 2011
Purchase(customer, product, store)Product(pname, price)
CustomerPrice(customer, price)
SELECT DISTINCT x.customer, v.storeFROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname
76
Types of Views• Virtual views:
– Pros/cons ?
• Materialized views
– Pros/cons ?
Dan Suciu -- CSEP544 Fall 2011
77
Types of Views• Virtual views:
– Used in databases– Computed only on-demand – slow at runtime– Always up to date
• Materialized views– Used in databases and data warehouses– Pre-computed offline – fast at runtime– May have stale data or expensive synchronization
Dan Suciu -- CSEP544 Fall 2011
Technical Aspects• View inlining, or query
modification
• Query answering using views
• Updating views
• Incremental view update
Db View
Answer
V
Q
Db View
Answer
V
Q
Db ViewV
Update??
Db ViewV
Update ??
79
Applications
Views have lots of applications• Physical and logical data independence• Security• Indexes• Denormalization• Semantic caching• …
Dan Suciu -- CSEP544 Fall 2011
Physical Data Independence: Vertical Partitioning
SSN Name Address Resume Picture234234 Mary Huston Clob1… Blob1…345345 Sue Seattle Clob2… Blob2…345343 Joan Seattle Clob3… Blob3…234234 Ann Portland Clob4… Blob4…
Resumes
SSN Name Address234234 Mary Huston
345345 Sue Seattle . . .
SSN Resume234234 Clob1…345345 Clob2…
SSN Picture234234 Blob1…
345345 Blob2…
T1 T2 T3
81
Vertical Partitioning
CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn and T2.ssn=T3.ssn
Dan Suciu -- CSEP544 Fall 2011
82
Vertical Partitioning
SELECT addressFROM ResumesWHERE name = ‘Sue’
We want the system to query only table T1.
Will that happen ?
Dan Suciu -- CSEP544 Fall 2011
Vertical Partitioning
• Hot trend in databases today for analytics• Main idea:
– Storage = Column(TID, value) pairs– Sort by TID enables reconstructing the table– Compress great compression, minimize I/O– Updates = VERY, VERY expensive
• Companies: C-Store and Vertica
Dan Suciu -- CSEP544 Fall 2011 83
Horizontal Partitioning
SSN Name City234234 Mary Huston
345345 Sue Seattle
345343 Joan Seattle
234234 Ann Portland
-- Frank Calgary
-- Jean Montreal
Customers
SSN Name City234234 Mary Huston
CustomersInHuston
SSN Name City345345 Sue Seattle
345343 Joan Seattle
CustomersInSeattle
. . . . .
85
Horizontal Partitioning
CREATE VIEW Customers AS CustomersInHuston UNION ALL CustomersInSeattle UNION ALL . . .
Dan Suciu -- CSEP544 Fall 2011
86
Horizontal Partitioning
SELECT nameFROM CustomersWHERE city = ‘Seattle’
Which tables are queried by the system ?
WHY ???Dan Suciu -- CSEP544 Fall 2011
87
Horizontal Partitioning
SELECT nameFROM CustomersWHERE city = ‘Seattle’
Now even humanscan’t tell which tablecontains customersin Seattle
Dan Suciu -- CSEP544 Fall 2011
CREATE VIEW Customers AS CustomersInXXX UNION ALL CustomersInYYY UNION ALL . . .
88
Horizontal Partitioning
CREATE VIEW Customers AS (SELECT SSN, name, ‘Huston’ as city FROM CustomersInHuston) UNION ALL (SELECT SSN, name, ‘Seattle’ as city FROM CustomersInSeattle) UNION ALL . . .
A hack around the problem:
Dan Suciu -- CSEP544 Fall 2011
89
Horizontal Partitioning
SELECT nameFROM CustomersWHERE city = ‘Seattle’
SELECT nameFROM CustomersInSeattle
Dan Suciu -- CSEP544 Fall 2011
Data Integration TerminologyLocal DB1
Local DBk
…
Integrated Data
Local DB1
Local DBk
…
Integrated Data
Global as View
VV1
Vk
Local as View
Which one needs query expansion,which one needs query answering using views ?
Horizontal Partitioning as LAV
CREATE VIEW CustomersInSeattle AS (SELECT * FROM Customers WHERE city = ‘Seattle’)CREATE VIEW CustomersInHuston AS (SELECT * FROM Customers WHERE city = ‘Huston’)….
SELECT name FROM CustomersWHERE city = ‘Seattle’
SELECT name FROM CustomersInSeattle
Views and Security
Name Address BalanceMary Huston 450.99Sue Seattle -240Joan Seattle 333.25Ann Portland -520
Customers:
Fred is notallowed to
see Balance
CREATE VIEW PublicCustomers SELECT Name, Address FROM Customers
Name Address BalanceMary Huston 450.99Sue Seattle -240Joan Seattle 333.25Ann Portland -520
John isnot allowedto see >0balances
CREATE VIEW BadCreditCustomers SELECT * FROM Customers WHERE Balance < 0
93
IndexesREALLY important to speed up query processing time.
SELECT *FROM PersonWHERE name = 'Smith'
CREATE INDEX myindex05 ON Person(name)
Person (name, age, city)
May take too long to scan the entire Person table
Now, when we rerun the query it will be much faster
B+ Tree Index
94
Adam Betty Charles …. Smith ….
We will discuss them in detail in a later lecture.
Dan Suciu -- CSEP544 Fall 2011
95
Creating IndexesIndexes can be created on more than one attribute:
CREATE INDEX doubleindex ON Person (age, city)Example:
96
Creating IndexesIndexes can be created on more than one attribute:
SELECT * FROM Person WHERE age = 55 AND city = 'Seattle'
Helps in:
CREATE INDEX doubleindex ON Person (age, city)Example:
97
Creating IndexesIndexes can be created on more than one attribute:
SELECT * FROM Person WHERE age = 55 AND city = 'Seattle'
Helps in:
CREATE INDEX doubleindex ON Person (age, city)Example:
SELECT * FROM Person WHERE age = 55
and even in:
98
Creating IndexesIndexes can be created on more than one attribute:
SELECT * FROM Person WHERE age = 55 AND city = 'Seattle'
Helps in:
SELECT * FROM Person WHERE city = 'Seattle'
But not in:
CREATE INDEX doubleindex ON Person (age, city)Example:
SELECT * FROM Person WHERE age = 55
and even in:
CREATE INDEX W ON Product(weight)CREATE INDEX P ON Product(price)
Indexes are Materialized Views
SELECT weight, priceFROM ProductWHERE weight > 10 and price < 100
Product(pid, name, weight, price, …)
SELECT x.weight, y.priceFROM W x, P yWHERE x.weight > 10 and y.price < 100 and x.pid = y.pid
CREATE VIEW W AS SELECT weight, pid FROM Product yCREATE VIEW P AS SELECT price, pid FROM Product y
Indexes as LAV:
“Covering indexes”:query usesonly the indexes
Denormalization
• Compute a view that is the join of several tables
• The view is now a relation that is not in BCNF (why not?)
Dan Suciu -- CSEP544 Fall 2011 100
CREATE VIEW CustomerPurchase AS SELECT x.customer, x.store, y.pname, y.price FROM Purchase x, Product y WHERE x.product = y.pname
Purchase(customer, product, store)Product(pname, price)
101
Semantic Caching• Queries Q1, Q2, … have been executed,
and their results are stored at the client• Now we need to compute a new query Q• Sometimes we can use the prior results in
answering Q• These queries can be seen as
materialized views• The problem becomes: answer Q using
the views Q1,Q2,…
Technical Aspects of Views• Simplifying queries after the views have
been in-lined– Query un-nesting– Query minimization
• Handling updates– Updating virtual views– Incremental update of materialized views
102Dan Suciu -- CSEP544 Fall 2011
103
Set v.s. Bag Semantics
SELECT DISTINCT a,b,cFROM R, S, TWHERE . . .
SELECT a,b,cFROM R, S, TWHERE . . .
Set semantics
Bag semantics
Dan Suciu -- CSEP544 Fall 2011
104
Unnesting: Sets/Sets
SELECT DISTINCT a,b,cFROM (SELECT DISTINCT u,v FROM R,S WHERE …), TWHERE . . .
SELECT DISTINCT a,b,cFROM R, S, TWHERE . . .
Dan Suciu -- CSEP544 Fall 2011
105
Unnesting: Sets/Bags
SELECT DISTINCT a,b,cFROM (SELECT u,v FROM R,S WHERE …), TWHERE . . .
SELECT DISTINCT a,b,cFROM R, S, TWHERE . . .
Dan Suciu -- CSEP544 Fall 2011
106
Unnesting: Bags/Bags
SELECT a,b,cFROM (SELECT u,v FROM R,S WHERE …), TWHERE . . .
SELECT a,b,cFROM R, S, TWHERE . . .
Dan Suciu -- CSEP544 Fall 2011
107
Unnesting: Bags/Sets
SELECT a,b,cFROM (SELECT DISTINCT u,v FROM R,S WHERE …), TWHERE . . .
NO
Dan Suciu -- CSEP544 Fall 2011
Query Minimization• Replace a query Q with Q’ having fewer tables in the
FROM clause
• When Q has fewest number of tables in the FROM clause, then we say it is minimized
• Usually (but not always) users write queries that are already minimized
• But the result of rewriting a query over view is often not minimized
Dan Suciu -- CSEP544 Fall 2011 108
Query Minimization under Bag Semantics
Rule 1: If:• x, y are tuple variables over the same
table and:• The condition x.key = y.key is in the
WHERE clauseThen combine x, y into a single variablequery
Dan Suciu -- CSEP544 Fall 2011 109
Query Minimization under Bag Semantics
SELECT x.date, y.nameFROM Order x, Product y, Order zWHERE x.pid = y.pid and y.price < 99 and z.weight > 150 and y.pid = z.pid and x.cid = z.cid
Order(cid, pid, weight, date)Product(pid, name, price)
SELECT x.date, y.nameFROM Order x, Product yWHERE x.pid = y.pid and y.price < 99 and x.weight > 150
What constraintsdo we need to havefor this optimization ?
111
Query Minimization under Bag Semantics
Rule 2: If • x ranges over S, y ranges over T, and• The condition x.fk = y.key is in the
WHERE clause, and• there is a not null constraint on x.fk• y is not used anywhere else, andThen remove T (and y) from the query
Dan Suciu -- CSEP544 Fall 2011
Query Minimization under Bag Semantics
Order(cid, pid, weight, date)Product(pid, name, price)
SELECT x.cid, x.date FROM Order x WHERE x.weight > 20
SELECT x.cid, x.date FROM Order x, Product y WHERE x.pid = y.pid and x.weight > 20
Q: Where do weencounter non-
minimized queries ?
What constraintsdo we need to havefor this optimization ?
Query Minimization under Bag Semantics
CREATE VIEW CheapOrders AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid and y.price < 99
CREATE VIEW HeavyOrders AS SELECT a.cid,a.pid,a.date,b.name,b.price FROM Order a, Product b WHERE a.pid = b.pid and a.weight > 150
Order(cid, pid, weight, date)Product(pid, name, price)
SELECT u.cidFROM CheapOrders u, HeavyOrders vWHERE u.pid = v.pid and u.cid = v.cid
A: in queries resultingfrom view inlining
Customers who orderedcheap, heavy products
Query Minimization
CREATE VIEW CheapOrders AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid and y.price < 99
CREATE VIEW HeavyOrders AS SELECT a.cid,a.pid,a.date,b.name,b.price FROM Order a, Product b WHERE a.pid = b.pid and a.weight > 150
Order(cid, pid, weight, date)Product(pid, name, price)
SELECT u.cidFROM CheapOrders u, HeavyOrders vWHERE u.pid = v.pid and u.cid = v.cid
SELECT a.cidFROM Order x, Product y Order a, Product bWHERE . . . .
Redundant Orders and Products
SELECT a.cidFROM Order x, Product y, Order a, Product bWHERE x.pid = y.pid and a.pid = b.pid and y.price < 99 and a.weight > 150 and x.cid = a.cid and x.pid = a.pid
SELECT x.cidFROM Order x, Product y, Product bWHERE x.pid = y.pid and x.pid = b.pid and y.price < 99 and x.weight > 150
x = a
SELECT x.cidFROM Order x, Product yWHERE x.pid = y.pid and y.price < 99 and x.weight > 150
y = b
Query Minimization under Set Semantics
• Rules 1 and 2 still apply
• Rule 3 involves homomorphisms
116Dan Suciu -- CSEP544 Fall 2011
Definition of a Homomorphism
A homomorphism from Q’ to Qis a mapping h : {y1, …, ym} {x1, …, xk}such that: (a) If h(yi) = xj, then Ri’ = Rj(b) C logically implies h(C’) and(c) h(A’) = A
SELECT DISTINCT AFROM R1 x1, …, Rk xkWHERE C
SELECT DISTINCT A’FROM R1’ y1, …, Rm’ ymWHERE C’
Q Q’
Definition of a HomomorphismTheorem If there exists a homomorphismfrom Q’ to Q, then every answer returned by Qis also returned by Q’.
We say that Q is contained in Q’
If there exists a homomorphism from Q’ to Q,and a homomorphism from Q to Q’,then Q and Q’ are equivalent
Find Homomorphism
Dan Suciu -- CSEP544 Fall 2011 119
SELECT DISTINCT x.cidFROM Order x, Product yWHERE x.pid = y.pid and y.price < 99 and x.weight > 150
QSELECT DISTINCT x.cidFROM Order x, Product y, Order zWHERE x.pid = y.pid and y.pid = z.pid and y.price < 99 and x.weight > 150 and z.weight > 100
Q’
Order(cid, pid, weight, date)Product(pid, name, price)
Homomorphism Q Q’
Dan Suciu -- CSEP544 Fall 2011 120
SELECT DISTINCT x.cidFROM Order x, Product yWHERE x.pid = y.pid and y.price < 99 and x.weight > 150
QSELECT DISTINCT x.cidFROM Order x, Product y, Order zWHERE x.pid = y.pid and y.pid = z.pid and y.price < 99 and x.weight > 150 and z.weight > 100
Q’
Order(cid, pid, weight, date)Product(pid, name, price)
Every answer to Qis also an answer to Q’
WHY ?
Homomorphism Q Q’
Dan Suciu -- CSEP544 Fall 2011 121
SELECT DISTINCT x.cidFROM Order x, Product yWHERE x.pid = y.pid and y.price < 99 and x.weight > 150
QSELECT DISTINCT x.cidFROM Order x, Product y, Order zWHERE x.pid = y.pid and y.pid = z.pid and y.price < 99 and x.weight > 150 and z.weight > 100
Q’
Order(cid, pid, weight, date)Product(pid, name, price)
Q and Q’ are equivalent !
Query Minimization under Set Semantics
SELECT DISTINCT x.pidFROM Product x, Product y, Product zWHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10
SELECT DISTINCT x.pidFROM Product x, Product zWHERE x.category = z.category and z.price > 500 and z.weight > 10
Same as:
123
Query Minimization under Set Semantics
Rule 3: Let Q’ be the query obtained by removing the tuple variable x from Q. If:
• Q has set semantics (and same for Q’)• there exists a homomorphism from Q to Q’
Then Q’ is equivalent to Q. Hence one can safely remove x.
Dan Suciu -- CSEP544 Fall 2011
Example
SELECT DISTINCT x.pidFROM Product x, Product y, Product zWHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10
SELECT DISTINCT x’.pidFROM Product x’, Product z’WHERE x’.category = z’.category and z’.price > 500 and z’.weight > 10
Q
Q’ Find a homomorphism h: Q Q’
Example
SELECT DISTINCT x.pidFROM Product x, Product y, Product zWHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10
SELECT DISTINCT x’.pidFROM Product x’, Product z’WHERE x’.category = z’.category and z’.price > 500 and z’.weight > 10
Q
Q’ Answer: H(x) = x’, H(y) = H(z) = z’
126
CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100
Updating Views
INSERT INTO Expensive-Product VALUES(‘Gizmo’)
Purchase(customer, product, store)Product(pname, price)
Updateableview
Dan Suciu -- CSEP544 Fall 2011
Updatable Views• Have a virtual view V(A1, A2, …) over
tables R1, R2, …• User wants to update a tuple in V
– Insert/modify/delete• Can we translate this into updates to
R1, R2, … ?• If yes: V = “an updateable view”• If not: V = “a non-updateable view”
Dan Suciu -- CSEP544 Fall 2011 127
128
CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100
Updating Views
INSERT INTO Product VALUES(‘Gizmo’, NULL)
Purchase(customer, product, store)Product(pname, price)
Updateableview
Dan Suciu -- CSEP544 Fall 2011
INSERT INTO Expensive-Product VALUES(‘Gizmo’)
129
CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’
Updating Views
INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’)
Purchase(customer, product, store)Product(pname, price)
Updateableview
Dan Suciu -- CSEP544 Fall 2011
130
CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’
Updating Views
INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’)
INSERT INTO PurchaseVALUES(‘Joe’,’Gizmo’,NULL)
Notethis
Purchase(customer, product, store)Product(pname, price)
Updateableview
Dan Suciu -- CSEP544 Fall 2011
131
Updating Views
INSERT INTO CustomerPrice VALUES(‘Joe’, 200)
? ? ? ? ?
Non-updateableview
Most views arenon-updateable
CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname
Purchase(customer, product, store)Product(pname, price)
Dan Suciu -- CSEP544 Fall 2011
132
Incremental View Update
Also known as view synchronization• Immediate synchronization = after each
update• Deferred synchronization
– Lazy = at query time– Periodic– Forced = manual
133
CREATE VIEW FullOrder AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid
Incremental View Update
UPDATE ProductSET price = price / 2WHERE pid = ‘12345’
Order(cid, pid, date)Product(pid, name, price)
UPDATE FullOrderSET price = price / 2WHERE pid = ‘12345’
No need to recompute the entire view !Dan Suciu -- CSEP544 Fall 2011
134
CREATE VIEW Categories AS SELECT DISTINCT category FROM Product
Incremental View Update
DELETE ProductWHERE pid = ‘12345’
Product(pid, name, category, price)
DELETE CategoriesWHERE category in (SELECT category FROM Product WHERE pid = ‘12345’)
It doesn’t work ! Why ? How can we fix it ?
135
CREATE VIEW Categories AS SELECT category, count(*) as c FROM Product GROUP BY category
Incremental View Update
DELETE ProductWHERE pid = ‘12345’
Product(pid, name, category, price)
UPDATE CategoriesSET c = c-1 WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’);DELETE CategoriesWHERE c = 0
136
Answering Queries Using Views
[See the paper]• We have several materialized views:
– V1, V2, …, Vn• Given a query Q
– Answer it by using views instead of base tables• Variation: Query rewriting using views
– Answer it by rewriting it to another query first• Example: if the views are indexes, then we
rewrite the query to use indexesDan Suciu -- CSEP544 Fall 2011
Rewriting Queries Using Views
137
Purchase(buyer, seller, product, store)Person(pname, city)
CREATE VIEW SeattleView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x.pname = y.buyer
SELECT y.buyer, y.sellerFROM Person x, Purchase yWHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’
Goal: rewrite this queryin terms of the view
Have thismaterializedview:
Rewriting Queries Using Views
138
SELECT y.buyer, y.sellerFROM Person x, Purchase yWHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’
SELECT buyer, sellerFROM SeattleViewWHERE product= ‘gizmo’
Dan Suciu -- CSEP544 Fall 2011
Rewriting is not always possible
139
CREATE VIEW DifferentView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y, Product z WHERE x.city = ‘Seattle’ AND x.pname = y.buyer AND y.product = z.name AND z.price < 100
SELECT y.buyer, y.sellerFROM Person x, Purchase yWHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT buyer, seller
FROM DifferentViewWHERE product= ‘gizmo’
“Maximallycontainedrewriting”