Database Management Systems CSE 590DB Introduction March 30, 1998



Staff

Instructor: Alon Levy
Sieg, Room 310, [email protected]
Office hours: by appointment

TA: Rachel Pottinger
Sieg 223, [email protected]
Office hours: TBA

Mailing list: cse590db@cs

Web page: http://www.cs.washington.edu/education/courses/590db/98sp/

Purpose and Format

Purpose: Foundations of database management systems. Introduction to current research issues in databases.

Format:
• Lectures introducing the main topics.
• Student presentations of selected research papers.
• Projects (for 3-credit takers).

Textbooks (none required)

• Database Management Systems (Ramakrishnan)
• Foundations of Databases (Abiteboul, Hull & Vianu)
• Fundamentals of Database Systems (Elmasri and Navathe)
• Database System Concepts (Silberschatz, Korth and Sudarshan)
• Principles of Database and Knowledge-Base Systems, volumes I and II (Ullman)
• Readings in Database Systems (Stonebraker)
• Proceedings of the SIGMOD, VLDB and PODS conferences

Prerequisites

Real Prerequisites

• Operating systems
• Data structures and algorithms
• Distributed systems
• Complexity theory
• Mathematical logic
• Knowledge representation
• User interface design
• Programming languages
• Artificial intelligence (search)
• Greek, Hebrew, French

Why Use a DBMS?

• Large amounts of data (gigabytes)
• Data is very structured
• Persistent data
• Valuable data
• Performance requirements
• Concurrent access to the data
• Restricted access to data

All programs manipulate data, so why use a database?

Functionality of a DBMS

• Persistent storage management.
• Multiple abstraction levels of the data (in particular, a logical view).
• A high-level query and data manipulation language.
• Efficient query processing.
• Transaction management.
• Resiliency: recovery from crashes.
• Interface with programming languages.

Persistent Storage

Becomes a hard problem because of the interaction with the other levels of the DBMS:
• What are we storing?
• Efficient indexing.
• Special issues due to resiliency requirements.
• Exploiting “semantic” knowledge.

Issue: interaction with the operating system. Should we rely on the OS?

Levels of Abstraction

[Figure: External Schema 1 and External Schema 2 sit on top of the Conceptual Schema, which sits on the Physical Schema, which sits on the Disk.]

• Conceptual schema: tables and their attributes.
• Physical schema: files, indexes, hash tables.
• External schema: views for the different applications and classes of users.

System catalog: the component of the database that manages the metadata about the different levels of abstraction.
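The three levels can be sketched in SQL. This is a minimal illustration using Python's built-in sqlite3 module and a hypothetical Enroll table: the table is the conceptual schema, an index belongs to the physical schema, and a view (the hypothetical CourseRoster) plays the role of an external schema. In SQLite, the sqlite_master table serves as the system catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: a table and its attributes (hypothetical Enroll table).
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")

# Physical schema: an index is a storage structure; queries never name it directly.
conn.execute("CREATE INDEX enroll_course_idx ON Enroll (course)")

# External schema: a view exposing only the columns one class of users needs.
conn.execute("CREATE VIEW CourseRoster AS SELECT course, name FROM Enroll")

conn.executemany(
    "INSERT INTO Enroll VALUES (?, ?, ?)",
    [("Charles", "CS 444", "Fall, 1997"), ("Dan", "CS 142", "Winter, 1998")],
)

# Applications query the view; the DBMS maps it down to the table and may use the index.
roster = conn.execute(
    "SELECT name FROM CourseRoster WHERE course = 'CS 444'"
).fetchall()
print(roster)  # [('Charles',)]

# SQLite's system catalog records the objects defined at every level.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(catalog)
```

The application code would not change if the index were dropped or replaced: that independence is the point of separating the levels.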

The Relational Model

Student    Course    Quarter
Charles    CS 444    Fall, 1997
Dan        CS 142    Winter, 1998
…          …         …

Data is organized into tables (relations) whose columns are attributes; each row of a table is a tuple.

The power of simplicity!
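To make “tables of tuples” concrete, here is a sketch that models a relation as a Python set of named tuples, with selection and projection written as plain functions. The names Enrollment, select and project are hypothetical and chosen for illustration; a real DBMS implements these operators very differently.

```python
from typing import Callable, NamedTuple


class Enrollment(NamedTuple):
    """One tuple of a hypothetical Enroll relation."""
    student: str
    course: str
    quarter: str


# A relation is a set of tuples that all share the same attributes.
enroll: set[Enrollment] = {
    Enrollment("Charles", "CS 444", "Fall, 1997"),
    Enrollment("Dan", "CS 142", "Winter, 1998"),
}


def select(relation: set[Enrollment], predicate: Callable[[Enrollment], bool]) -> set[Enrollment]:
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation: set[Enrollment], *attrs: str) -> set[tuple]:
    """Projection: keep only the named attributes; duplicates collapse since the result is a set."""
    return {tuple(getattr(t, a) for a in attrs) for t in relation}


fall_students = project(select(enroll, lambda t: t.quarter == "Fall, 1997"), "student")
print(fall_students)  # {('Charles',)}
```

Because both operators take a relation and return a relation, they compose freely, which is much of what makes the model simple.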

Logical Model Issues

• What data model should we use? Relational, object-oriented, object-relational, deductive, semi-structured?
• How do we design a good conceptual schema? (normal forms, index selection)
• Are we really providing an abstraction?
• How does this abstraction interact with the programming language? (the impedance mismatch)
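The impedance mismatch shows up in a few lines: a query returns a flat collection of tuples, while the program works with nested objects, so the application must translate between the two by hand. A minimal sketch with sqlite3, assuming a hypothetical Student class and made-up sample rows:

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Student:
    """Hypothetical application-side object: one student with a nested list of courses."""
    name: str
    courses: list[str] = field(default_factory=list)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
conn.executemany(
    "INSERT INTO Enroll VALUES (?, ?, ?)",
    [  # hypothetical sample rows
        ("Charles", "CS 444", "Fall, 1997"),
        ("Dan", "CS 142", "Winter, 1998"),
        ("Dan", "CS 444", "Winter, 1998"),
    ],
)

# The query yields flat (name, course) tuples; the program wants one object per
# student, so it must regroup the rows itself.
students: dict[str, Student] = {}
for name, course in conn.execute("SELECT name, course FROM Enroll ORDER BY name, course"):
    if name not in students:
        students[name] = Student(name)
    students[name].courses.append(course)

print(students["Dan"].courses)  # ['CS 142', 'CS 444']
```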

Querying a Database

Find all the students who have taken CSE444 in Winter, 1998.

S(tructured) Q(uery) L(anguage):

select E.name
from Enroll E
where E.course = 'CSE444' and E.quarter = 'Winter, 1998'

SQL also provides update facilities. SQL: an acquired taste (try datalog first).
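The query can be run against an in-memory SQLite database from Python. This is a sketch: the Enroll rows are hypothetical sample data, and the course and quarter are bound as parameters rather than quoted inline. A DELETE illustrates the update side of the language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
conn.executemany(
    "INSERT INTO Enroll VALUES (?, ?, ?)",
    [  # hypothetical sample rows, for illustration only
        ("Ann", "CSE444", "Winter, 1998"),
        ("Bob", "CSE444", "Fall, 1997"),
        ("Carl", "CSE142", "Winter, 1998"),
    ],
)

# The query from the slide, with the constants passed as parameters.
winter_444 = conn.execute(
    "SELECT E.name FROM Enroll E WHERE E.course = ? AND E.quarter = ?",
    ("CSE444", "Winter, 1998"),
).fetchall()
print(winter_444)  # [('Ann',)]

# Updates are declarative too: remove every Fall, 1997 enrollment in one statement.
conn.execute("DELETE FROM Enroll WHERE quarter = 'Fall, 1997'")
remaining = conn.execute("SELECT COUNT(*) FROM Enroll").fetchone()[0]
print(remaining)  # 2
```

Note that string literals in SQL use single quotes; in standard SQL, double quotes delimit identifiers such as column names.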

Issues in Query Languages

Does it provide the appropriate functionality? SQL books get thicker and thicker.

• Expressive power of a query language.
• Ease of use (query by example).
• Declarativity.
• Provide guidance in writing “good” queries?

Query Optimization

A query is a declarative specification of “what” you want.

A query execution plan is an imperative program to produce the answer.

Query optimization: produce an efficient query execution plan.

Issues: large search space of plans, cost estimation, semantic transformations

Real goal: avoid the bad plans.
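Choosing among plans by estimated cost can be illustrated with a toy optimizer that enumerates left-deep join orders for three tables. Everything here is an assumption for illustration: the statistics are made up, a join's size is estimated as the product of its input sizes divided by a per-predicate divisor (no predicate means a cross product), and a plan's cost is the sum of the estimated sizes of its join results. Real optimizers use much richer cost models and prune the search space instead of enumerating all of it.

```python
from itertools import permutations

# Hypothetical statistics. Course is assumed to be already filtered to 5 rows.
card = {"Student": 10_000, "Enroll": 100_000, "Course": 5}

# Hypothetical join predicates: estimated |R join S| = |R| * |S| / divisor.
divisor = {
    frozenset({"Student", "Enroll"}): 10_000,
    frozenset({"Enroll", "Course"}): 500,
}


def join_size(left_size, left_tables, right):
    """Estimate the size of joining an intermediate result with one more table."""
    size = left_size * card[right]
    for t in left_tables:
        # A pair with no join predicate contributes no reduction (a cross product).
        size /= divisor.get(frozenset({t, right}), 1)
    return size


def plan_cost(order):
    """Cost of a left-deep plan: the sum of the estimated sizes of all join results."""
    size, joined, cost = card[order[0]], [order[0]], 0
    for right in order[1:]:
        size = join_size(size, joined, right)
        joined.append(right)
        cost += size
    return cost


plans = {order: plan_cost(order) for order in permutations(card)}
best = min(plans, key=plans.get)
worst = max(plans, key=plans.get)
print(best, plans[best])    # ('Enroll', 'Course', 'Student') 2000.0
print(worst, plans[worst])  # ('Student', 'Enroll', 'Course') 101000.0
```

Under these assumed numbers, joining Enroll with the filtered Course table first keeps every intermediate result near 1,000 rows, while starting with Student and Enroll materializes an estimated 100,000 rows: the kind of bad plan the optimizer should avoid.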

Transaction Processing and Recovery

For efficient use of resources, we want concurrent access to data.

Systems sometimes crash. A “real” database guarantees ACID:

• Atomicity: all or nothing of a transaction.
• Consistency: always leave the DB consistent.
• Isolation: every transaction runs as if it’s the only one in the system.
• Durability: if committed, then we really mean it.
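Atomicity can be demonstrated with sqlite3: using a connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception is raised. In this sketch the Account table, the transfer function and its CHECK constraint are hypothetical. The failing transfer credits one account before the debit violates the constraint, and the rollback undoes the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move money between accounts as one transaction: both updates or neither."""
    # The connection as a context manager commits on success and rolls back
    # if the block raises; it does not swallow the exception.
    with conn:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))


transfer(conn, "A", "B", 30)  # succeeds: A = 70, B = 80

try:
    transfer(conn, "A", "B", 500)  # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the credit to B was rolled back along with the failed debit

balances = dict(conn.execute("SELECT id, balance FROM Account ORDER BY id"))
print(balances)  # {'A': 70, 'B': 80}
```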

Database Industry

Relational databases are a great success of theoretical ideas.

“Big 3” DBMS companies are among the largest software companies in the world.

IBM (with DB2) and Microsoft (SQL Server, Microsoft Access) are also important players.

$20B industry. Moving to warehousing and decision support.

Why Use a DBMS?

Data independence and efficient access.

Reduced application development time.

Data integrity and security.

Uniform data administration.

Concurrent access and recovery from crashes.

DBMS Development

Issues in scaleup:
• Indexing and storing large amounts of data.
• Algorithms: sorting, joins.

“Novel” issues:
• Modeling data (models, constraints, schema design).
• Query languages.
• Optimization: from a declarative specification to an efficient program.

Course (Rough) Outline

• Data models and their associated query languages:
  - Relational: SQL, datalog, relational algebra.
  - Object-oriented: OQL.
  - Object-relational: novel features in SQL3.
  - Semi-structured: languages for querying graphs.
• Storage (very briefly).
• Query optimization: foundations and current limitations.

Outline (continued)

Semantic analysis: query containment, using views.

Decision support, data warehousing: data warehouse design, maintainability issues.

Data integration: querying heterogeneous sources in a uniform fashion.

Data mining.

SIGMOD/PODS.