Principles of Database Management Systems CSE 544 Introduction March 31st, 1999


Principles of Database Management Systems

CSE 544

Introduction

March 31st, 1999


Staff

Instructor: Alon Levy
  Sieg, Room 310, [email protected]
  Office hours: Wed, 2:30-3:30, or by email.

TAs: Zack Ives and Rachel Pottinger
  Office hours:
    Zack: Mondays at noon (224)
    Rachel: Thursdays at 2:30pm (224)

Mailing list: cse544@cs

Web page: (a lot of stuff already there)

http://www.cs.washington.edu/education/courses/544/99sp/


Course Times

In general, WF, 12-1:20pm (with a 5 minute breather in the middle).

Two special dates:
  Monday, April 5th
  Monday, April 19th

No classes in the last week.


Goals of the Course

Purpose:
  Foundations of database management systems.
  Issues in building database systems.
  Introduction to current research issues in databases.

Have fun: databases are not just bunches of tuples.


Grading

Homeworks: 15%
  SQL querying fun
  Join implementations
Project: 25%
  A query optimization engine for data integration.
Midterm: 15%
Final: 35%
Participation and intangibles: 10%


Textbook

Database System Implementation, Ullman, Widom, and Garcia-Molina, to be published by Prentice-Hall in June; available from the copy center.


Other Useful Texts

Database Management Systems (Ramakrishnan)
Foundations of Databases (Abiteboul, Hull & Vianu)
Parallel and Distributed DBMS (Ozsu and Valduriez)
Transaction Processing (Gray and Reuter)
Database Systems (Silberschatz, Korth and Sudarshan)
Data and Knowledge based Systems (volumes I, II) (Ullman)
Readings in Database Systems (Stonebraker and Hellerstein)
Proceedings of SIGMOD, VLDB, PODS conferences.


Prerequisites


Real Prerequisites

Operating systems

Data structures and algorithms

Distributed systems

Complexity theory

Mathematical Logic

Knowledge Representation

User interface design

Programming languages

Artificial Intelligence (Search)

Greek, Hebrew, French


Why Use a DBMS?

• Large amounts of data (Giga’s, Tera’s)
• Data is very structured
• Persistent data
• Valuable data
• Performance requirements
• Concurrent access to the data
• Restricted access to data

All programs manipulate data, so why use a database?


Functionality of a DBMS

Persistent storage management

Transaction management

Resiliency: recovery from crashes.

Separation between logical and physical views of the data.

High level query and data manipulation language.

Efficient query processing

Interface with programming languages


Persistent Storage

Becomes a hard problem because of the interaction with the other levels of the DBMS:
  What are we storing?
  Efficient indexing
  Special issues due to resiliency requirements
  Exploit “semantic” knowledge

Issue: interaction with the operating system. Should we rely on the OS?


Transaction Processing and Recovery

For efficient use of resources, we want concurrent access to data.

Systems sometimes crash.

A “real” database guarantees ACID:
  Atomicity: all or nothing of a transaction.
  Consistency: always leave the DB consistent.
  Isolation: every transaction runs as if it’s the only one in the system.
  Durability: if committed, we really mean it.

Do we really want ACID?
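As a rough illustration of atomicity ("all or nothing"), here is a minimal sketch using Python's built-in sqlite3 module. The `account` table, the names, and the `transfer` helper are hypothetical and not part of the course material; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one all-or-nothing transaction."""
    # `with conn:` commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        (lowest,) = conn.execute("SELECT MIN(balance) FROM account").fetchone()
        if lowest < 0:
            # Consistency rule (an assumption for this sketch): no negative balances.
            raise ValueError("insufficient funds")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds and is committed
try:
    transfer(conn, "alice", "bob", 500)  # fails, so both updates are undone
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

The second transfer debits and credits before the check fails, yet neither change survives the rollback.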


Physical vs. Logical Levels

[Figure: schema levels, top to bottom: External Schema 1 and External Schema 2; Relational Schema; Physical Schema; Disk]

• Conceptual schema: tables and their attributes.
• Physical schema: files, indexes, hash tables.
• External schema: views of the different applications, classes of users.

System catalog: The component of the database that stores meta data.

Conceptual design: a precursor to the relational schema.


The Relational Model

Student   Course   Quarter
Charles   CS 444   Fall, 1997
Dan       CS 142   Winter, 1998
…         …        …

Data is organized into tables with attributes. Rows in the tables are tuples.

The power of simplicity!


Logical Model Issues

What data model should we use?
  Relational, object-oriented, object-relational, deductive database model, semi-structured

How do we design a good schema? (normal forms, index selection)

Are we really providing an abstraction?

How does this abstraction interact with the programming language? (the impedance mismatch)


Querying a Database

Find all the students who have taken CSE444 in Winter, 1998.

S(tructured) Q(uery) L(anguage):

  select E.name
  from Enroll E
  where E.course = 'CSE444' and E.quarter = 'Winter, 1998'

SQL also provides update facilities.

SQL: an acquired taste (try datalog first)
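To try the query above, one option is Python's built-in sqlite3 module. This is only a sketch: the slide does not give the schema of `Enroll`, so the three columns and the sample rows below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the slide only shows that Enroll has name, course, and quarter.
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
conn.executemany(
    "INSERT INTO Enroll VALUES (?, ?, ?)",
    [
        ("Charles", "CSE444", "Fall, 1997"),   # made-up sample rows
        ("Dan", "CSE142", "Winter, 1998"),
        ("Eve", "CSE444", "Winter, 1998"),
    ],
)

query = """
    SELECT E.name
    FROM Enroll E
    WHERE E.course = 'CSE444' AND E.quarter = 'Winter, 1998'
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

Only rows matching both conditions are returned; note that string constants in SQL take single quotes.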


Issues in Query Languages

Does it provide the appropriate functionality? SQL books get thicker and thicker.

Expressive power of a query language.

Ease of use (query by example)

Declarativity

Provide guidance in writing “good” queries?


Query Optimization

A query is a declarative specification of “what” you want.

A query execution plan is an imperative program to produce the answer.

Query optimization: produce an efficient query execution plan.

Issues: large search space of plans, cost estimation, semantic transformations

Real goal: avoid the bad plans.
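The difference between a query and its execution plans can be sketched in a few lines of Python. Both functions below answer the same declarative question ("names of students enrolled in a given course") but build intermediate results of very different sizes. The data, the field names, and the use of intermediate size as a stand-in for cost are all assumptions of this toy example, not a real optimizer.

```python
def join_then_filter(students, enroll, course):
    """Plan A: join every enrollment with its student, then filter by course."""
    joined = [(s, e) for e in enroll for s in students if s["id"] == e["sid"]]
    answer = sorted(s["name"] for s, e in joined if e["course"] == course)
    return answer, len(joined)

def filter_then_join(students, enroll, course):
    """Plan B: filter enrollments first, then join only the survivors."""
    selected = [e for e in enroll if e["course"] == course]
    joined = [(s, e) for e in selected for s in students if s["id"] == e["sid"]]
    answer = sorted(s["name"] for s, e in joined)
    return answer, len(joined)

# Hypothetical data: 100 students, one enrollment each, 10 of them in CSE444.
students = [{"id": i, "name": f"s{i}"} for i in range(100)]
enroll = [{"sid": i, "course": "CSE444" if i % 10 == 0 else "CSE142"} for i in range(100)]

answer_a, size_a = join_then_filter(students, enroll, "CSE444")
answer_b, size_b = filter_then_join(students, enroll, "CSE444")
print(answer_a == answer_b, size_a, size_b)
```

Both plans return the same answer, but Plan B's join produces a tenth of the intermediate tuples, which is the kind of difference an optimizer tries to find.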


Database Industry

Relational databases are a great success of theoretical ideas.

“Big 3” DBMS companies are among the largest software companies in the world.

IBM (with DB2) and Microsoft (SQL Server, Microsoft Access) are also important players.

$20B industry

Moving to warehousing, decision support.


Course (Rough) Outline

The basics:
  The relational model
  SQL
  Views, integrity constraints
  Conceptual modeling
  datalog (recursive queries)

Physical representation:
  Index structures.


Course Outline (cont)

Query execution:
  Algorithms for joins, selections, projections.

Query Optimization

Advanced topics:
  data integration
  data mining
  semi-structured data

Transaction processing


The relational data model


Terminology

Product (relation name)

Name         Price    Category     Manufacturer    (attribute names)
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $149.99  photography  Canon
MultiTouch   $203.99  household    Hitachi

Each row of the table is a tuple.

Product(name: string, Price: real, category: enum, Manufacturer: string)

(Arity=4)


More Terminology

Every attribute has an atomic type.

Relation Schema: relation name + attribute names + attribute types

Relation instance: a set of tuples. Only one copy of any tuple! (Not quite: in practice, SQL tables may contain duplicate rows.)

Database Schema: a set of relation schemas.

Database instance: a relation instance for every relation in the schema.


More on Tuples

Formally, a mapping from attribute names to (correctly typed) values:

  name → gizmo
  price → $19.99
  category → gadgets
  manufacturer → GizmoWorks

Sometimes we refer to a tuple by itself: (note order of attributes)

(gizmo, $19.99, gadgets, GizmoWorks) or

Product (gizmo, $19.99, gadgets, GizmoWorks).
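The two views of a tuple can be sketched in Python: a dict plays the role of the mapping from attribute names to values, and a `NamedTuple` plays the role of the positional form, where the attribute order matters. The class name `Product` follows the slide; storing the price as a float without the dollar sign is an assumption made for the sketch.

```python
from typing import NamedTuple

class Product(NamedTuple):
    """Relation schema: attribute names, their types, and their order."""
    name: str
    price: float
    category: str
    manufacturer: str

# Tuple as a mapping: the order of attributes is irrelevant.
as_mapping = {"manufacturer": "GizmoWorks", "name": "gizmo",
              "category": "gadgets", "price": 19.99}

# Tuple written by itself: values are matched to attributes by position.
as_positional = Product("gizmo", 19.99, "gadgets", "GizmoWorks")
same_tuple = as_positional._asdict() == as_mapping

# A relation instance as a set keeps only one copy of any tuple.
instance = {as_positional, Product("gizmo", 19.99, "gadgets", "GizmoWorks")}
print(same_tuple, len(instance))
```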


Integrity Constraints

An important functionality of a DBMS is to enable the specification of integrity constraints and to enforce them.

Knowledge of integrity constraints is also useful for query optimization.

Examples of constraints:

  keys, superkeys
  foreign keys
  domain constraints, tuple constraints.
  Functional dependencies, multivalued dependencies.


Keys

A key is a minimal set of attributes that uniquely identifies the tuple (i.e., there is no pair of tuples with the same values for the key attributes).

Person:
  social security number
  name
  name + address
  name + address + age

Perfect keys are often hard to find, but organizations usually invent something anyway.

Superkey: a set of attributes that contains a key.

A relation may have multiple keys (but only one primary key):
  employee number, social-security number
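The definitions of superkey and key can be checked against a concrete relation instance, as in the sketch below. Keep in mind that a key is a constraint on every legal instance of the schema, so passing this test on one instance only shows that the instance does not contradict the key. The `people` rows and attribute names are made up for illustration.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows in this instance agree on all of `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """A key is a superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical Person instance.
people = [
    {"ssn": "111", "name": "Fred", "address": "Seattle"},
    {"ssn": "222", "name": "Fred", "address": "Tacoma"},
    {"ssn": "333", "name": "Joe", "address": "Seattle"},
]

print(is_key(people, ("ssn",)))             # ssn alone identifies each row
print(is_superkey(people, ("name",)))       # two rows share the name Fred
print(is_key(people, ("name", "address")))  # minimal in this instance
print(is_key(people, ("ssn", "name")))      # a superkey, but not minimal
```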


Foreign Key Constraints

Purchase:

buyer  price  product
Joe    $20    gizmo
Jack   $20    E-gizmo

Product:

name     manufacturer  description
gizmo    G-sym         great stuff
E-gizmo  G-sym         even better

An attribute of a relation R must refer to a key of a relation S.
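A DBMS can enforce such a constraint for us. The sketch below uses Python's built-in sqlite3 module with the Purchase and Product relations from this slide; the column types are assumptions, and SQLite in particular only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute(
    "CREATE TABLE Product (name TEXT PRIMARY KEY, manufacturer TEXT, description TEXT)"
)
conn.execute("""
    CREATE TABLE Purchase (
        buyer   TEXT,
        price   REAL,
        product TEXT REFERENCES Product(name)  -- must refer to a key of Product
    )
""")
conn.execute("INSERT INTO Product VALUES ('gizmo', 'G-sym', 'great stuff')")
conn.execute("INSERT INTO Purchase VALUES ('Joe', 20, 'gizmo')")  # refers to an existing product

try:
    # 'E-gizmo' has not been inserted into Product, so this row is rejected.
    conn.execute("INSERT INTO Purchase VALUES ('Jack', 20, 'E-gizmo')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

purchases = conn.execute("SELECT COUNT(*) FROM Purchase").fetchone()[0]
print(rejected, purchases)
```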


Functional Dependencies

Definition:

If two tuples agree on the attributes $A_1, A_2, \ldots, A_n$, then they must also agree on the attributes $B_1, B_2, \ldots, B_m$.

Formally:

$A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$

Key of a relation: all the attributes are either on the left or right.
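The definition translates directly into a check on a relation instance: remember the right-hand values seen for each combination of left-hand values, and fail as soon as two tuples disagree. This is a sketch only; a functional dependency is a statement about all legal instances, so one instance can refute it but never prove it. The rows reuse the Name/SSN/Phone example that appears later in these slides, with hypothetical lowercase attribute names.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[b] for b in rhs)
        # setdefault stores the first right-hand side seen for this left-hand side.
        if seen.setdefault(left, right) != right:
            return False
    return True

rows = [
    {"name": "Fred", "ssn": "123-321-99", "phone": "(201) 555-1234"},
    {"name": "Fred", "ssn": "123-321-99", "phone": "(206) 572-4312"},
    {"name": "Joe", "ssn": "909-438-44", "phone": "(908) 464-0028"},
    {"name": "Joe", "ssn": "909-438-44", "phone": "(212) 555-4000"},
]

print(satisfies_fd(rows, ["ssn"], ["name"]))   # ssn -> name holds here
print(satisfies_fd(rows, ["ssn"], ["phone"]))  # ssn -> phone is violated
```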


Some Obvious Properties of FD’s

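$A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$

is equivalent to

$A_1, A_2, \ldots, A_n \rightarrow B_1$

$A_1, A_2, \ldots, A_n \rightarrow B_2$

$\ldots$

$A_1, A_2, \ldots, A_n \rightarrow B_m$

(the splitting rule and the combining rule).

$A_1, A_2, \ldots, A_n \rightarrow A_i$ always holds.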


Comparing Functional Dependencies

Entailment: a set of functional dependencies S1 entails a set S2 if any database that satisfies S1 must also satisfy S2.

Example: $A \rightarrow B$, $B \rightarrow C$ entails $A \rightarrow C$.

Equivalence: two sets of FD’s are equivalent if each entails the other.

$\{A \rightarrow B,\ B \rightarrow C\}$ is equivalent to $\{A \rightarrow B,\ A \rightarrow C,\ B \rightarrow C\}$.

Closure: Given a set of attributes A and a set of dependencies C, we want to find all the other attributes that are functionally determined by A.


Closure Algorithm

Start with Closure = A.

Until Closure doesn’t change do:

  if $A_1, A_2, \ldots, A_n \rightarrow B$ is in C, and
     $A_1, A_2, \ldots, A_n$ are all in the closure, and
     B is not in Closure
  then
     add B to Closure.
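The closure algorithm can be sketched in Python as follows. Representing each dependency as a pair of attribute sets is an assumption of this sketch, and the `entails` helper is an addition showing one common use of the closure: a dependency is entailed by a set of FDs exactly when its right side lies in the closure of its left side.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:                      # "until the closure doesn't change"
        changed = False
        for lhs, rhs in fds:
            # The whole left side is in the closure, and the right side adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def entails(fds, lhs, rhs):
    """True if `fds` entails the dependency lhs -> rhs."""
    return set(rhs) <= closure(lhs, fds)

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, fds)))
print(entails(fds, {"A"}, {"C"}))
print(entails(fds, {"C"}, {"A"}))
```

The loop terminates because each pass either adds at least one attribute or stops, and there are finitely many attributes.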


Problems in Designing Schema

Name  SSN         Phone Number
Fred  123-321-99  (201) 555-1234
Fred  123-321-99  (206) 572-4312
Joe   909-438-44  (908) 464-0028
Joe   909-438-44  (212) 555-4000

Problems:

- redundancy
- update anomalies
- deletion anomalies


Relation Decomposition

Break the relation into two relations:

Name  SSN
Fred  123-321-99
Joe   909-438-44

Name  Phone Number
Fred  (201) 555-1234
Fred  (206) 572-4312
Joe   (908) 464-0028
Joe   (212) 555-4000
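A decomposition is a pair of projections, and it is only useful if joining the pieces gives back the original relation. The sketch below checks this for the instance on the slide, using hypothetical lowercase attribute names. It verifies this one instance only; whether the decomposition is lossless in general depends on the functional dependencies that hold.

```python
def project(rows, attrs):
    """Project onto `attrs`; using a set drops duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

original = [
    {"name": "Fred", "ssn": "123-321-99", "phone": "(201) 555-1234"},
    {"name": "Fred", "ssn": "123-321-99", "phone": "(206) 572-4312"},
    {"name": "Joe", "ssn": "909-438-44", "phone": "(908) 464-0028"},
    {"name": "Joe", "ssn": "909-438-44", "phone": "(212) 555-4000"},
]

name_ssn = project(original, ("name", "ssn"))      # each SSN is now stored once
name_phone = project(original, ("name", "phone"))

# Natural join of the two pieces on their shared attribute, name.
rejoined = {
    (name, ssn, phone)
    for (name, ssn) in name_ssn
    for (other_name, phone) in name_phone
    if name == other_name
}

lossless_here = rejoined == project(original, ("name", "ssn", "phone"))
print(len(name_ssn), len(name_phone), lossless_here)
```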


Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

A relation R is in BCNF if and only if:

Whenever there is a nontrivial dependency $A_1, A_2, \ldots, A_n \rightarrow B$ for R, it is the case that $\{A_1, A_2, \ldots, A_n\}$ is a super-key for R.

In English (though a bit vague):

Whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
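The BCNF condition can be tested mechanically with the attribute closure: a left side is a super-key exactly when its closure contains every attribute of R. The sketch below checks only the dependencies it is given, and the dependency ssn → name is an assumption about the Name/SSN/Phone example, not something stated on the slides.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the nontrivial listed FDs whose left side is not a super-key."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        if nontrivial and closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations

# Assumed dependency for the example: an SSN determines a name.
fds = [(frozenset({"ssn"}), frozenset({"name"}))]

# The original three-attribute relation: {ssn} is not a super-key.
before = bcnf_violations({"name", "ssn", "phone"}, fds)

# A two-attribute relation holding only name and ssn: {ssn} is a super-key.
after = bcnf_violations({"name", "ssn"}, fds)

print(len(before), len(after))
```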