Principles of Database Management Systems
CSE 544
Introduction
March 31st, 1999
Staff
Instructor: Alon Levy (Sieg, Room 310). Office hours: Wednesdays, 2:30-3:30, or by email.
TAs: Zack Ives and Rachel Pottinger
Office hours:
• Zack: Mondays at noon (224)
• Rachel: Thursdays at 2:30pm (224)
Mailing list: cse544@cs
Web page (a lot of stuff already there):
http://www.cs.washington.edu/education/courses/544/99sp/
Course Times
In general, Wednesdays and Fridays, 12:00-1:20pm (with a 5-minute break in the middle).
Two special dates: Monday, April 5th and Monday, April 19th.
No classes during the last week.
Goals of the Course
Purpose:
• Foundations of database management systems.
• Issues in building database systems.
• Introduction to current research issues in databases.
• Have fun: databases are not just bunches of tuples.
Grading
• Homeworks: 15% (SQL querying fun; join implementations)
• Project: 25% (a query optimization engine for data integration)
• Midterm: 15%
• Final: 35%
• Participation and intangibles: 10%
Textbook
Database System Implementation, Ullman, Widom, and Garcia-Molina, to be published by Prentice-Hall in June; available from the copy center.
Other Useful Texts
• Database Management Systems (Ramakrishnan)
• Foundations of Databases (Abiteboul, Hull & Vianu)
• Parallel and Distributed DBMS (Ozsu and Valduriez)
• Transaction Processing (Gray and Reuter)
• Database Systems (Silberschatz, Korth and Sudarshan)
• Data and Knowledge based Systems, volumes I and II (Ullman)
• Readings in Database Systems (Stonebraker and Hellerstein)
• Proceedings of the SIGMOD, VLDB, and PODS conferences
Prerequisites
Real Prerequisites
• Operating systems
• Data structures and algorithms
• Distributed systems
• Complexity theory
• Mathematical logic
• Knowledge representation
• User interface design
• Programming languages
• Artificial intelligence (search)
• Greek, Hebrew, French
Why Use a DBMS?
• Large amounts of data (gigabytes, terabytes)
• Data is very structured
• Persistent data
• Valuable data
• Performance requirements
• Concurrent access to the data
• Restricted access to data
All programs manipulate data, so why use a database?
Functionality of a DBMS
• Persistent storage management
• Transaction management
• Resiliency: recovery from crashes
• Separation between logical and physical views of the data
• High-level query and data manipulation language
• Efficient query processing
• Interface with programming languages
Persistent Storage
Storage becomes a hard problem because of its interaction with the other levels of the DBMS:
• What are we storing?
• Efficient indexing
• Special issues due to resiliency requirements
• Exploiting “semantic” knowledge
Issue: interaction with the operating system. Should we rely on the OS?
Transaction Processing and Recovery
For efficient use of resources, we want concurrent access to data.
Systems sometimes crash.
A “real” database guarantees ACID:
• Atomicity: a transaction happens all or nothing.
• Consistency: always leave the DB consistent.
• Isolation: every transaction runs as if it’s the only one in the system.
• Durability: if committed, we really mean it.
Do we really want ACID?
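Atomicity is easiest to see in a small example. The sketch below uses Python's built-in sqlite3 module, which is our assumption since the slides name no system. The Account table, its rows, and the transfer function are hypothetical. When a sqlite3 connection is used as a context manager, it commits if the block succeeds and rolls back if an exception escapes, so a transfer that fails halfway leaves no trace.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst, both updates or neither."""
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE Account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM Account WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # undoes the UPDATE above
        conn.execute("UPDATE Account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("Joe", 50), ("Jack", 10)])
conn.commit()

transfer(conn, "Joe", "Jack", 30)        # succeeds: Joe 20, Jack 40
try:
    transfer(conn, "Joe", "Jack", 100)   # fails midway and is rolled back
except ValueError:
    pass
print(dict(conn.execute("SELECT name, balance FROM Account")))
```

Joe's balance is never left at -80. The first UPDATE of the failed transfer is undone together with the rest of the transaction.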
Physical vs. Logical Levels
(Figure: the schema levels, top to bottom: External Schema 1 and External Schema 2 → Relational Schema → Physical Schema → Disk)
• Conceptual schema: tables and their attributes.
• Physical schema: files, indexes, hash tables.
• External schema: views for the different applications and classes of users.
System catalog: the component of the database that stores metadata.
Conceptual design: a precursor to the relational schema.
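The external and conceptual levels, and the catalog, can be seen in any SQL system. The following is a minimal sketch using Python's sqlite3 module (our choice). The Enroll table, its hypothetical grade column, and the Roster view are illustrative assumptions. The view is an external schema that hides grades. SQLite exposes its system catalog as the ordinary table sqlite_master.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual (relational) schema: one base table (hypothetical columns).
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT, grade REAL)")
# External schema for one class of users: a view that hides grades.
conn.execute("CREATE VIEW Roster AS SELECT name, course, quarter FROM Enroll")
conn.execute("INSERT INTO Enroll VALUES ('Charles', 'CS 444', 'Fall, 1997', 3.8)")

print(conn.execute("SELECT * FROM Roster").fetchall())
# [('Charles', 'CS 444', 'Fall, 1997')]

# The catalog stores metadata and can itself be queried like a table.
for kind, name in conn.execute("SELECT type, name FROM sqlite_master ORDER BY name"):
    print(kind, name)
# table Enroll
# view Roster
```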
The Relational Model
Student | Course | Quarter
Charles | CS 444 | Fall, 1997
Dan     | CS 142 | Winter, 1998
…       | …      | …
Data is organized into tables with attributes. Rows in the tables are tuples.
The power of simplicity!
Logical Model Issues
• What data model should we use? Relational, object-oriented, object-relational, deductive, semi-structured.
• How do we design a good schema? (normal forms, index selection)
• Are we really providing an abstraction?
• How does this abstraction interact with the programming language? (the impedance mismatch)
Querying a Database
Find all the students who have taken CSE444 in Winter, 1998.
S(tructured) Q(uery) L(anguage):

    select E.name
    from Enroll E
    where E.course = 'CSE444' and E.quarter = 'Winter, 1998'

SQL also provides update facilities.
SQL: an acquired taste (try datalog first).
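To try the query, you can run it against a small in-memory database. This sketch uses Python's sqlite3 module. The Enroll rows are hypothetical sample data, not taken from the course. The query itself only states *what* is wanted, with no loops or access paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
conn.executemany(
    "INSERT INTO Enroll VALUES (?, ?, ?)",
    [("Charles", "CSE444", "Fall, 1997"),   # hypothetical sample rows
     ("Dan", "CSE444", "Winter, 1998"),
     ("Dan", "CSE142", "Winter, 1998")],
)

# Declarative: we describe the answer, the DBMS decides how to compute it.
query = """
    SELECT E.name
    FROM Enroll E
    WHERE E.course = 'CSE444' AND E.quarter = 'Winter, 1998'
"""
print(conn.execute(query).fetchall())  # [('Dan',)]
```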
Issues in Query Languages
• Does it provide the appropriate functionality? SQL books get thicker and thicker.
• Expressive power of a query language
• Ease of use (query by example)
• Declarativity
• Provide guidance in writing “good” queries?
Query Optimization
A query is a declarative specification of “what” you want.
A query execution plan is an imperative program to produce the answer.
Query optimization: produce an efficient query execution plan.
Issues: large search space of plans, cost estimation, semantic transformations
Real goal: avoid the bad plans.
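The split between the declarative "what" and the plan's "how" can be observed directly. In this sketch, SQLite's planner stands in for a query optimizer; it is not the optimizer the course builds. The table and the index name `enroll_course` are hypothetical. The query text never changes, but once an index exists the planner switches from a full scan to an index lookup. The exact wording of EXPLAIN QUERY PLAN output differs between SQLite versions, so the printed lines are only indicative.

```python
import sqlite3

def plan(conn, sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
query = "SELECT name FROM Enroll WHERE course = 'CSE444'"

before = plan(conn, query)   # no index: a full scan is the only option
conn.execute("CREATE INDEX enroll_course ON Enroll(course)")
after = plan(conn, query)    # same query; an index lookup is now possible

print(before)  # e.g. ['SCAN Enroll']
print(after)   # e.g. ['SEARCH Enroll USING INDEX enroll_course (course=?)']
```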
Database Industry
Relational databases are a great success of theoretical ideas.
“Big 3” DBMS companies are among the largest software companies in the world.
IBM (with DB2) and Microsoft (SQL Server, Microsoft Access) are also important players.
$20B industry. Moving to warehousing and decision support.
Course (Rough) Outline
The basics:
• The relational model
• SQL
• Views, integrity constraints
• Conceptual modeling
• Datalog (recursive queries)
Physical representation: Index structures.
Course Outline (cont)
Query execution: Algorithms for joins, selections, projections.
Query optimization
Advanced topics:
• data integration
• data mining
• semi-structured data
Transaction processing
The relational data model
Terminology
Product (relation name):

Name        | Price   | Category    | Manufacturer
gizmo       | $19.99  | gadgets     | GizmoWorks
Power gizmo | $29.99  | gadgets     | GizmoWorks
SingleTouch | $149.99 | photography | Canon
MultiTouch  | $203.99 | household   | Hitachi

The header row holds the attribute names; each data row is a tuple.

Product(name: string, price: real, category: enum, manufacturer: string)
Arity: 4
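To make schema versus instance concrete, here is a sketch in Python, our choice of modeling. A `NamedTuple` plays the role of the relation schema (name, attribute names, types), and a Python set plays the role of a relation instance. The enum type is approximated by a plain string, and prices are floats for brevity.

```python
from typing import NamedTuple

class Product(NamedTuple):
    """Relation schema: relation name + attribute names + attribute types."""
    name: str
    price: float
    category: str          # an enum in the schema; a plain string here
    manufacturer: str

# Relation instance: a *set* of tuples, so a duplicate tuple collapses.
products = {
    Product("gizmo", 19.99, "gadgets", "GizmoWorks"),
    Product("Power gizmo", 29.99, "gadgets", "GizmoWorks"),
    Product("SingleTouch", 149.99, "photography", "Canon"),
    Product("MultiTouch", 203.99, "household", "Hitachi"),
    Product("gizmo", 19.99, "gadgets", "GizmoWorks"),  # duplicate, ignored
}

arity = len(Product._fields)
print(arity, len(products))  # 4 4
```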
More Terminology
Every attribute has an atomic type.
Relation Schema: relation name + attribute names + attribute types
Relation instance: a set of tuples. Only one copy of any tuple! (Not quite: SQL tables can in fact contain duplicate tuples.)
Database Schema: a set of relation schemas.
Database instance: a relation instance for every relation in the schema.
More on Tuples
Formally, a tuple is a mapping from attribute names to (correctly typed) values:
name → gizmo, price → $19.99, category → gadgets, manufacturer → GizmoWorks
Sometimes we refer to a tuple by itself (note the order of attributes):
(gizmo, $19.99, gadgets, GizmoWorks) or
Product (gizmo, $19.99, gadgets, GizmoWorks).
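The two views of a tuple differ in one respect: the positional form only makes sense once an attribute order is fixed. The sketch below illustrates this in Python, with dictionaries as our stand-in for the formal mapping. The alternative attribute order is arbitrary.

```python
# A tuple as a mapping from attribute names to values: order-independent.
as_mapping = {"name": "gizmo", "price": 19.99,
              "category": "gadgets", "manufacturer": "GizmoWorks"}

# The positional form needs a fixed attribute order.
attribute_order = ("name", "price", "category", "manufacturer")
as_positional = tuple(as_mapping[a] for a in attribute_order)
print(as_positional)  # ('gizmo', 19.99, 'gadgets', 'GizmoWorks')

# A different attribute order gives a different positional tuple,
# but it still describes the same mapping.
other_order = ("manufacturer", "name", "price", "category")
reordered = dict(zip(other_order, (as_mapping[a] for a in other_order)))
assert reordered == as_mapping
```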
Integrity Constraints
An important function of a DBMS is to enable the specification of integrity constraints and to enforce them.
Knowledge of integrity constraints is also useful for query optimization.
Examples of constraints:
• Keys, superkeys
• Foreign keys
• Domain constraints, tuple constraints
• Functional dependencies, multivalued dependencies
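For a feel of how a DBMS enforces such constraints, this sketch declares a key and two domain constraints on the Product relation with SQLite (via Python's sqlite3). The allowed category values and the price > 0 rule are hypothetical choices made for illustration. Any row that violates a declared constraint is rejected with `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        name         TEXT PRIMARY KEY,                 -- key constraint
        price        REAL CHECK (price > 0),           -- domain constraint (hypothetical rule)
        category     TEXT CHECK (category IN ('gadgets', 'photography', 'household')),
        manufacturer TEXT
    )
""")
conn.execute("INSERT INTO Product VALUES ('gizmo', 19.99, 'gadgets', 'GizmoWorks')")

def rejected(row):
    """Try to insert `row`; True if the DBMS refuses it for violating a constraint."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

print(rejected(("gizmo", 5.0, "gadgets", "Other")))    # True: duplicate key
print(rejected(("widget", -1.0, "gadgets", "Other")))  # True: price not positive
print(rejected(("widget", 1.0, "toys", "Other")))      # True: category outside the enum
print(rejected(("widget", 1.0, "gadgets", "Other")))   # False: accepted
```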
Keys
A key is a minimal set of attributes that uniquely identifies a tuple (i.e., there is no pair of tuples with the same values for the key attributes). Possible choices of key for Person:
• social security number
• name
• name + address
• name + address + age
Perfect keys are often hard to find, but organizations usually invent something anyway.
Superkey: a set of attributes that contains a key.
A relation may have multiple keys (but only one primary key), e.g.:
• employee number
• social-security number
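A key is a constraint on every possible instance, so a single instance can only refute a key, never prove one. The sketch below makes this concrete by checking keys on one hypothetical Employee instance: it enumerates the minimal attribute sets that happen to be unique. Both functions and the data are illustrative assumptions. Note that (name, dept) qualifies here only by coincidence.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in `attrs` (this instance only)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def keys_of_instance(rows, attributes):
    """Minimal attribute sets that are superkeys for this particular instance."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Supersets of a key are superkeys but not keys (not minimal).
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

employees = [  # hypothetical instance
    {"emp_no": 1, "ssn": "123-321-99", "name": "Fred", "dept": "Toys"},
    {"emp_no": 2, "ssn": "909-438-44", "name": "Joe",  "dept": "Toys"},
    {"emp_no": 3, "ssn": "555-123-45", "name": "Fred", "dept": "Sales"},
]
print(keys_of_instance(employees, ["emp_no", "ssn", "name", "dept"]))
# [('emp_no',), ('ssn',), ('name', 'dept')]
```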
Foreign Key Constraints
Purchase:
buyer | price | product
Joe   | $20   | gizmo
Jack  | $20   | E-gizmo

Product:
name    | manufacturer | description
gizmo   | G-sym        | great stuff
E-gizmo | G-sym        | even better
An attribute of a relation R must refer to a key of a relation S (here, Purchase.product refers to Product.name).
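The Purchase/Product example can be declared and enforced directly. This sketch uses SQLite through Python's sqlite3. In SQLite, foreign key enforcement is off until it is turned on with `PRAGMA foreign_keys = ON`. The rejected buyer "Jill" and product "no-such-gizmo" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.execute("CREATE TABLE Product (name TEXT PRIMARY KEY, manufacturer TEXT, description TEXT)")
conn.execute("""
    CREATE TABLE Purchase (
        buyer   TEXT,
        price   REAL,
        product TEXT REFERENCES Product(name)   -- must refer to a key of Product
    )
""")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [("gizmo", "G-sym", "great stuff"), ("E-gizmo", "G-sym", "even better")])
conn.executemany("INSERT INTO Purchase VALUES (?, ?, ?)",
                 [("Joe", 20, "gizmo"), ("Jack", 20, "E-gizmo")])

try:
    # A purchase of a product that does not exist violates the constraint.
    conn.execute("INSERT INTO Purchase VALUES ('Jill', 20, 'no-such-gizmo')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```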
Functional Dependencies
Definition:
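If two tuples agree on the attributes $A_1, A_2, \ldots, A_n$,
then they must also agree on the attributes $B_1, B_2, \ldots, B_m$.
Formally:
$A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$
Key of a relation: $A_1, \ldots, A_n$ is a key when every attribute of the relation appears either on the left or on the right of such a dependency (and no proper subset of $A_1, \ldots, A_n$ has this property).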
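The definition translates almost literally into a check on one instance. In this sketch, `satisfies_fd` is a hypothetical helper. It records, for each left-hand value, the first right-hand value seen, and fails as soon as two tuples agree on the left but differ on the right. The rows come from the Name/SSN/Phone example in these notes. As with keys, an instance can show that an FD is violated, but it cannot prove that the FD holds in general.

```python
def satisfies_fd(rows, lhs, rhs):
    """Check A1..An -> B1..Bm on one instance: tuples agreeing on lhs agree on rhs."""
    witness = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[b] for b in rhs)
        if witness.setdefault(left, right) != right:
            return False  # two tuples agree on lhs but differ on rhs
    return True

people = [
    {"name": "Fred", "ssn": "123-321-99", "phone": "(201) 555-1234"},
    {"name": "Fred", "ssn": "123-321-99", "phone": "(206) 572-4312"},
    {"name": "Joe",  "ssn": "909-438-44", "phone": "(908) 464-0028"},
]
print(satisfies_fd(people, ["ssn"], ["name"]))   # True
print(satisfies_fd(people, ["ssn"], ["phone"]))  # False: Fred has two phones
```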
Some Obvious Properties of FD’s
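$A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$
is equivalent to
$A_1, A_2, \ldots, A_n \rightarrow B_1$
$A_1, A_2, \ldots, A_n \rightarrow B_2$
…
$A_1, A_2, \ldots, A_n \rightarrow B_m$
(splitting rule and combining rule)

$A_1, A_2, \ldots, A_n \rightarrow A_i$ always holds, for any $i$ between 1 and $n$ (trivial dependency).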
Comparing Functional Dependencies
Entailment: a set of functional dependencies S1 entails a set S2 if any database that satisfies S1 must also satisfy S2.
Example: $\{A \rightarrow B, B \rightarrow C\}$ entails $A \rightarrow C$.
Equivalence: two sets of FDs are equivalent if each entails the other.
$\{A \rightarrow B, B \rightarrow C\}$ is equivalent to $\{A \rightarrow B, A \rightarrow C, B \rightarrow C\}$.
Closure: Given a set of attributes A and a set of dependencies C, we want to find all the other attributes that are functionally determined by A.
Closure Algorithm
Start with Closure = A.
Until Closure doesn’t change, do:
    if $A_1, A_2, \ldots, A_n \rightarrow B$ is in C, and
       $A_1, A_2, \ldots, A_n$ are all in Closure, and
       B is not in Closure
    then
       add B to Closure.
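The closure algorithm fits in a few lines of Python. The sketch below represents each FD as a pair of attribute sets, an encoding we chose for illustration. It also adds `entails` and `equivalent`, which do not appear on the slide. They rely on the standard fact that S1 entails X → Y exactly when Y is contained in the closure of X under S1. The final lines reproduce the examples from the previous slide.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`.

    Each FD is a pair (lhs, rhs) of attribute sets, e.g. ({"A"}, {"B"}) for A -> B.
    A multi-attribute rhs is treated like its split form A1..An -> Bi.
    """
    result = set(attrs)
    changed = True
    while changed:                      # until the closure doesn't change
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def entails(s1, s2):
    """S1 entails S2 iff every X -> Y in S2 has Y inside closure(X) under S1."""
    return all(set(rhs) <= closure(lhs, s1) for lhs, rhs in s2)

def equivalent(s1, s2):
    return entails(s1, s2) and entails(s2, s1)

S1 = [({"A"}, {"B"}), ({"B"}, {"C"})]
S2 = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, S1)))      # ['A', 'B', 'C']
print(entails(S1, [({"A"}, {"C"})]))   # True
print(equivalent(S1, S2))              # True
```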
Problems in Designing Schema
Name | SSN        | Phone Number
Fred | 123-321-99 | (201) 555-1234
Fred | 123-321-99 | (206) 572-4312
Joe  | 909-438-44 | (908) 464-0028
Joe  | 909-438-44 | (212) 555-4000
Problems:
- redundancy
- update anomalies
- deletion anomalies
Relation Decomposition
Break the relation into two relations:

Name | SSN
Fred | 123-321-99
Joe  | 909-438-44

Name | Phone Number
Fred | (201) 555-1234
Fred | (206) 572-4312
Joe  | (908) 464-0028
Joe  | (212) 555-4000
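Decomposing the relation means projecting it onto the two smaller schemas. The original can be recovered by a natural join on Name. The sketch below does this with Python sets. It is lossless on this instance because each Name here has exactly one SSN. If two different people shared a name, joining on Name would produce spurious tuples. Decomposing on SSN, the attribute that determines Name, avoids that problem.

```python
people = {
    ("Fred", "123-321-99", "(201) 555-1234"),
    ("Fred", "123-321-99", "(206) 572-4312"),
    ("Joe",  "909-438-44", "(908) 464-0028"),
    ("Joe",  "909-438-44", "(212) 555-4000"),
}

# Projections onto the two smaller schemas (sets, so duplicates disappear).
name_ssn   = {(name, ssn) for name, ssn, _ in people}
name_phone = {(name, phone) for name, _, phone in people}

# Natural join on Name rebuilds the original relation.
rejoined = {(n1, ssn, phone)
            for n1, ssn in name_ssn
            for n2, phone in name_phone
            if n1 == n2}

print(len(name_ssn), len(name_phone))  # 2 4: each SSN is now stored once
print(rejoined == people)              # True
```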
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if and only if:
whenever there is a nontrivial dependency $A_1, A_2, \ldots, A_n \rightarrow B$ for R, it is the case that $\{A_1, A_2, \ldots, A_n\}$ is a superkey for R.
In English (though a bit vague):
whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
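Combining the BCNF condition with attribute closure gives a simple test. For each given nontrivial FD, compute the closure of its left side and check that it covers every attribute of R. For a relation whose FDs are stated directly on it, checking only the given FDs is enough. The FD SSN → Name used below is our assumption, since the slides do not list the FDs of the phone example.

```python
def closure(attrs, fds):
    """Attribute closure, following the closure algorithm."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Nontrivial FDs whose left side is not a superkey of the relation."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency: always holds, never a violation
        if not set(attributes) <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations

fds = [({"SSN"}, {"Name"})]  # assumed FD for the phone example

# (Name, SSN, Phone): SSN does not determine Phone, so SSN is not a superkey.
print(bcnf_violations({"Name", "SSN", "Phone"}, fds))  # [({'SSN'}, {'Name'})]
# (Name, SSN): SSN determines everything, so the relation is in BCNF.
print(bcnf_violations({"Name", "SSN"}, fds))           # []
```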