Archetypal Databases, or, What Is a DBMS? Zachary G. Ives University of Pennsylvania CIS 650 – Implementing Data Management Systems September 9, 2008


Archetypal Databases, or, What Is a DBMS?

Zachary G. Ives
University of Pennsylvania

CIS 650 – Implementing Data Management Systems

September 9, 2008


Today
• Last time we saw the “vision” of the relational database – decoupling “what” from “how”
  • Storage scheme shouldn’t matter to the programmer …
  • Only the operations to be performed
• But how do we realize this vision?
• And what about all of the other aspects of a DBMS?
  • Concurrency, recovery
  • Creating tables, performing updates, …


The INGRES System
• Interactive Graphics and Retrieval System
  • Probably the first usable RDBMS
• One of the first real projects built on UNIX
  • On a minicomputer, the PDP-11, which greatly constrained things due to limited memory
  • No low-level access to disk – prevented clustering in storage
• Based on a relational query language called QUEL
  • Many religious wars between QUEL and SEQUEL camps
• Stonebraker’s first commercialized project
  • Changed university IP rules forever
• Today: was recently open-sourced by Computer Associates
  • … but mostly overshadowed by MySQL and Postgres (a later Stonebraker project)


Processes
• Precompiler converts C+EQUEL into a C program that gets compiled
• Due to 64K process size limits, needed to break things into 4 processes communicating via pipes
  • Even some of these needed to be broken into overlays
  • Later added a Process 2.5 to make 5 processes

[Figure: pipeline of four processes. Process 1: C program. Process 2: lexer, parser, concurrency, query modification. Process 3: query processing. Process 4: utilities, recovery.]


System-R
• Probably a bit closer to today’s DBMSs, at least at the low level – but didn’t run on UNIX
• Based on years of experience with IMS and other IBM database systems
  • At this point, prior to Selinger (note limited optimization), Lindsay (not a huge performance focus)
• Components were built to be generally reusable
• Default language was called SEQUEL, by Chamberlin and Boyce


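System-R Architecture

[Figure: layered architecture, top to bottom]
• SEQUEL or e-SEQUEL, entering through the Relational Data Interface (RDI)
• Relational Data System (RDS): parsing, (limited) optimization
• Relational Storage Interface (RSI)
• Relational Storage System (RSS): storage, concurrency, access paths (“images”, “links”), triggers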


Languages
• QUEL vs. SEQUEL
  • The focus of many religious wars, though they borrowed each other’s ideas
  • Ultimately IBM won due to (1) market presence, (2) Oracle
  • SEQUEL had NULL concept, bag semantics, aggregation
  • Postgres was originally based on QUEL (its language was POSTQUEL); the name PostgreSQL came with the later switch to SQL
• QUEL: more orthogonal, simple EQUEL embedding
    RANGE OF X IS MYREL
    RETRIEVE (X.B)
    WHERE …
    ## C-block
• SEQUEL: more block-oriented, embedded via cursors and row sets, aggregation
    SELECT X.B
    FROM MYREL X
    WHERE …


Administration of a Database
• INGRES:
  • Only the DBA can create shared relations and grant access to them
  • It’s possible for a user to create temporary relations (and required for query processing)
• System-R:
  • Anyone can create private relations and grant access to them
  • It’s possible to create persistent or temporary relations
  • Mechanisms to add columns to tables (by default these become NULL)


Integrity and Security
• Both systems allowed much more general notions of integrity than key constraints
  • Assertions as a means of validation!
• Query modification is one method used to do this
  • Conjunction of assertions plus the query/update
  • Only limited expressiveness – in INGRES there must be at most one variable
• Security models:
  • System-R used a view-based security model
  • INGRES used query modification
  • How do these differ? Which is better?
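A rough sketch of query modification, to make “conjunction of assertions plus the query/update” concrete. It treats qualifications as plain strings and simply ANDs each assertion onto the user’s qualification. The function name `modify_query` and the sample predicates are hypothetical, and real INGRES worked on parsed query trees rather than strings.

```python
def modify_query(qualification, assertions):
    """Return the user's qualification with every assertion ANDed onto it."""
    clauses = [f"({qualification})"] + [f"({a})" for a in assertions]
    return " AND ".join(clauses)


# Hypothetical update qualification and a one-variable integrity assertion.
user_qualification = 'E.NAME = "Smith"'
integrity_assertions = ["E.SALARY > 0"]

modified = modify_query(user_qualification, integrity_assertions)
print(modified)
```

Because the assertion just narrows the qualification, a violating update silently touches no tuples instead of raising an error. That is one way the approach differs from a view-based model.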


Data Storage


Physical Data Layout
• Tuples must fit within a page
• All tuples have TIDs
  • TID is a page + index
• System-R
  • Allowed links between tuples
  • Built-in concept of NULL
• Where else are TIDs useful?
• Page-level clustering: done by System-R in extents
  • What about INGRES?

[Figure: a page holding tuples t1, t2, t3]
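A minimal sketch of “TID is a page + index”, assuming an in-memory heap file with a fixed number of tuple slots per page. The classes `TID` and `HeapFile` and the slot capacity are illustrative only and do not follow either system’s actual page format.

```python
from typing import NamedTuple


class TID(NamedTuple):
    page_no: int  # which page holds the tuple
    slot: int     # index of the tuple within that page


class HeapFile:
    """Unordered pages of tuples; every inserted tuple gets a stable TID."""

    def __init__(self, slots_per_page=2):
        self.slots_per_page = slots_per_page
        self.pages = []  # each page is a list of tuples

    def insert(self, row):
        # Start a new page when there is none yet or the last one is full.
        if not self.pages or len(self.pages[-1]) >= self.slots_per_page:
            self.pages.append([])
        page = self.pages[-1]
        page.append(row)
        return TID(len(self.pages) - 1, len(page) - 1)

    def fetch(self, tid):
        # One page access plus an index into it: no searching required.
        return self.pages[tid.page_no][tid.slot]


heap = HeapFile(slots_per_page=2)
tids = [heap.insert(t) for t in (("t1",), ("t2",), ("t3",))]
print(tids)
```

Since a TID names a physical location directly, an index entry or a System-R style link can store it and reach the tuple without scanning.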


Access Paths – INGRES
• INGRES Access Methods Interface (AMI):
  • Can have unordered “heap file”
  • Hash-based indexing
  • ISAM-like indexing on a primary key
    • Predates B+ Tree – initially height-balanced, but afterwards index structure is static; requires overflow pages
    • “ISAM-like” because can’t sort across pages – can only lock one page at a time!
  • Lookups are done via OPENR(), GET()
  • FIND() can be used to find a start or stop point
  • Can call PARAMD, PARAMI to get parameters of data or index
• Idea of extensible access methods – later revisited in Generalized Search Trees (GiST)
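To illustrate “index structure is static; requires overflow pages”, here is a small sketch of an ISAM-style file. The index is built once from sorted keys, and later inserts never touch it: they land in an overflow list attached to the primary page the index points to. The class `IsamFile` and its layout are assumptions for illustration, not the INGRES AMI.

```python
import bisect


class IsamFile:
    def __init__(self, sorted_keys, page_size=2):
        # Primary pages are laid out once, when the file is built.
        self.pages = [list(sorted_keys[i:i + page_size])
                      for i in range(0, len(sorted_keys), page_size)]
        # Static index: the lowest key of each primary page. Never updated.
        self.index = [page[0] for page in self.pages]
        # One overflow list per primary page, initially empty.
        self.overflow = [[] for _ in self.pages]

    def _page_for(self, key):
        # Rightmost page whose low key is <= key (page 0 for smaller keys).
        return max(bisect.bisect_right(self.index, key) - 1, 0)

    def insert(self, key):
        # The index is frozen, so new keys go to the page's overflow list.
        self.overflow[self._page_for(key)].append(key)

    def lookup(self, key):
        p = self._page_for(key)
        return key in self.pages[p] or key in self.overflow[p]


isam = IsamFile([10, 20, 30, 40], page_size=2)
isam.insert(25)
isam.insert(5)
print(isam.index, isam.overflow)
```

Lookups degrade as overflow lists grow, which is the weakness a B+ tree avoids by splitting pages and updating the index.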


Access Paths – System-R
• “Images”
  • Slightly more than indices (which are included there) – also linked lists and orderings
  • Index structures include ISAM and B Trees
  • Search arguments, aka sargable predicates
  • Note that clustering was a key concept here
• Links via TIDs
  • Note that these were NOT relational in spirit!
  • Intention was to support IMS over RSS
  • Some code likely made its way back into IMS
• How do these concepts relate to OO databases and today’s XML databases?


Catalogs and Indices as Logical Abstractions
• System-R and INGRES decided to express catalogs as relations, making them accessible to queries as well as enabling reuse of code
  • How are they accessible in today’s RDBMSs?
• INGRES even did this with indices – 1:1 mapping between files and relations
  • Indices vs. views: how are they different?
  • How are they different in the presence of TIDs or links?
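As one modern data point for “catalogs as relations”, SQLite exposes its catalog as an ordinary queryable table named `sqlite_master`. The sketch below uses Python’s standard `sqlite3` module. The `employee` table and `employee_name` index are made up for the example, and other systems expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.execute("CREATE INDEX employee_name ON employee (name)")

# The catalog is itself a relation, so a plain SELECT lists tables and indices.
catalog_rows = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"
).fetchall()

for row in catalog_rows:
    print(row)
```

The index shows up as a catalog row describing it. It is not itself a queryable relation, unlike the 1:1 file-to-relation mapping the slide describes for INGRES.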


Query Processing


Query Processing – INGRES
• DECOMP algorithm and One Variable Query Processor
  • Break every query into separate operations that generate a temp relation for every projection, selection, join
  • Pick a relation to iterate over, substitute value for variable
  • Repeat recursively
• Note that (1) it’s interpreted, (2) it’s non-pipelined, and (3) it’s adaptive, always choosing the smallest relation for the substitution
• Note that access paths were chosen in a pretty ad hoc way
  • Also, no concept of sargable predicates – instead, could FIND both upper and lower bounds, but needed to GET between


DECOMP Example
  RANGE OF E, M IS EMPLOYEE
  RANGE OF D IS DEPT
  RETRIEVE (E.NAME)
  WHERE E.SALARY > M.SALARY AND
        E.MANAGER = M.NAME AND
        E.DEPT = D.DEPT AND
        D.FLOOR# = 1 AND
        E.AGE > 40

First: apply selection and projection


DECOMP, Ctd.
  RANGE OF D IS DEPT
  RETRIEVE INTO T1 (D.DEPT)
  WHERE D.FLOOR# = 1

  RANGE OF E IS EMPLOYEE
  RETRIEVE INTO T2 (E.NAME, E.SALARY, E.MANAGER, E.DEPT)
  WHERE E.AGE > 40

Now substitute these back into the main query


DECOMP, Ctd.
  RANGE OF E IS T2
  RANGE OF M IS EMPLOYEE
  RANGE OF D IS T1
  RETRIEVE (E.NAME)
  WHERE E.SALARY > M.SALARY AND
        E.MANAGER = M.NAME AND
        E.DEPT = D.DEPT

Now pick the relation with smallest cardinality, e.g., T1, and substitute


One-Variable Substitution
foreach D in T1, recursively process:
  RANGE OF E IS T2
  RANGE OF M IS EMPLOYEE
  RETRIEVE (E.NAME)
  WHERE E.SALARY > M.SALARY AND
        E.MANAGER = M.NAME AND
        E.DEPT = *valueOf(D.DEPT)*

Now apply the selection to E…
  RANGE OF E IS T2
  RETRIEVE INTO T3 (E.NAME, E.SALARY, E.MANAGER, E.DEPT)
  WHERE E.DEPT = *valueOf(D.DEPT)*


DECOMP, Recursively
So the query looks like:
foreach D in T1
  RANGE OF E IS T2
  RETRIEVE INTO T3 (E.NAME, E.SALARY, E.MANAGER, E.DEPT)
  WHERE E.DEPT = *valueOf(D.DEPT)*

  RANGE OF E IS T3
  RANGE OF M IS EMPLOYEE
  RETRIEVE (E.NAME)
  WHERE E.SALARY > M.SALARY AND
        E.MANAGER = M.NAME

Now choose the smallest relation for substitution (e.g., T3)


DECOMP, Recursively
foreach D in T1
  RANGE OF E IS T2
  RETRIEVE INTO T3 (E.NAME, E.SALARY, E.MANAGER, E.DEPT)
  WHERE E.DEPT = *valueOf(D.DEPT)*

  foreach E2 in T3
    RANGE OF M IS EMPLOYEE
    RETRIEVE INTO (val1)
    WHERE *valueOf(E2.SALARY)* > M.SALARY AND
          *valueOf(E2.MANAGER)* = M.NAME
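The decomposition above can be traced in a few lines of Python. This is only a sketch of the control flow on made-up data. The sample rows are hypothetical, the substitution order (T1, then T3) is hard-coded to match the slides instead of chosen by cardinality at run time, and each list comprehension stands in for a one-variable query that materializes a temporary relation.

```python
# Hypothetical sample data; FLOOR stands in for the FLOOR# attribute.
EMPLOYEE = [
    {"NAME": "Ann", "SALARY": 90, "MANAGER": "Bob", "DEPT": "Toys", "AGE": 45},
    {"NAME": "Bob", "SALARY": 80, "MANAGER": "Cy", "DEPT": "Toys", "AGE": 50},
    {"NAME": "Cy", "SALARY": 120, "MANAGER": "Cy", "DEPT": "Shoes", "AGE": 60},
    {"NAME": "Dee", "SALARY": 95, "MANAGER": "Bob", "DEPT": "Toys", "AGE": 30},
]
DEPT = [{"DEPT": "Toys", "FLOOR": 1}, {"DEPT": "Shoes", "FLOOR": 2}]


def decomp(employee, dept):
    # Detach the one-variable subqueries first (selection + projection).
    t1 = [{"DEPT": d["DEPT"]} for d in dept if d["FLOOR"] == 1]
    t2 = [{k: e[k] for k in ("NAME", "SALARY", "MANAGER", "DEPT")}
          for e in employee if e["AGE"] > 40]

    result = []
    for d in t1:                      # substitute each value of D
        t3 = [e for e in t2 if e["DEPT"] == d["DEPT"]]
        for e2 in t3:                 # substitute each value of E
            for m in employee:        # remaining one-variable query over M
                if e2["SALARY"] > m["SALARY"] and e2["MANAGER"] == m["NAME"]:
                    result.append(e2["NAME"])
    return result


answers = decomp(EMPLOYEE, DEPT)
print(answers)
```

Note that T3 is rebuilt in full for every D before the inner loop starts, which is the non-pipelined behaviour called out earlier.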


Query Processing – System-R
• Optimization is cost-based
  • Consider both disk cost (primary) and CPU cost (scaled by some H)
  • Compare clustered, non-clustered indices; sequential scan
  • Needed to consider cases where data was interspersed with other relations
• Every join is binary; query plans are compiled
  • Merge join and nested loops join are present
  • Also have a link-based join
  • No dynamic programming yet – Selinger was not yet aboard
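A back-of-the-envelope sketch of the cost comparison: total cost is page fetches plus H times a CPU term. The per-path formulas below are textbook-style simplifications chosen for illustration, not System-R’s actual estimates, and the value of `h` and the table sizes are arbitrary assumptions.

```python
def cost(page_fetches, cpu_calls, h=0.01):
    """Disk cost dominates; CPU work is scaled down by the weight h."""
    return page_fetches + h * cpu_calls


def pick_access_path(n_pages, n_tuples, selectivity, h=0.01):
    matching = selectivity * n_tuples
    candidates = {
        # Read every page and examine every tuple.
        "sequential scan": cost(n_pages, n_tuples, h),
        # Matching tuples sit on adjacent pages.
        "clustered index": cost(selectivity * n_pages, matching, h),
        # Assume the worst case: one page fetch per matching tuple.
        "non-clustered index": cost(matching, matching, h),
    }
    best = min(candidates, key=candidates.get)
    return best, candidates


best, candidates = pick_access_path(n_pages=1000, n_tuples=100_000,
                                    selectivity=0.05)
print(best, candidates)
```

With these numbers the non-clustered index comes out worse than a full scan, which is why the comparison has to distinguish clustered from non-clustered indices.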


Views
• INGRES:
  • Not supported at that time (though later)
• System-R:
  • Single-table views are typically updatable, if one-to-one (contrast with today’s SQL)
  • Additionally, cursors could typically be modified!
• Views as a security/encapsulation mechanism
  • GRANT and REVOKE privileges


Triggers
• Note that System-R had triggers from the beginning!
• Later work led to the idea of active databases, which had very rich trigger languages


ACIDity


Rollback / Abort
• Notion of transactions in System-R encompassed multiple operations
  • BEGIN_TRANS, END_TRANS, SAVE, RESTORE
• Both systems had a notion of “old” and “new” pages (“shadow paging”)
  • Could roll back transactions by swapping back the old page
  • But how did that work with concurrency?
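A toy model of shadow paging, assuming a single transaction at a time and an in-memory page table. Writes install new page versions while the old page table is kept aside, and abort swaps the old table back. The class and method names are hypothetical, and the sketch deliberately leaves the concurrency question open.

```python
class ShadowPagedStore:
    def __init__(self, pages):
        self.current = dict(pages)  # page table: page number -> contents
        self.shadow = None          # old page table while a txn is open

    def begin(self):
        # Remember the old page table; nothing is copied page by page.
        self.shadow = dict(self.current)

    def write(self, page_no, data):
        # Install a "new" page; the "old" one stays reachable via shadow.
        self.current[page_no] = data

    def commit(self):
        # The new pages become the only version.
        self.shadow = None

    def abort(self):
        # Roll back by swapping the old page table back in.
        self.current = self.shadow
        self.shadow = None


store = ShadowPagedStore({1: "old contents"})
store.begin()
store.write(1, "uncommitted change")
store.abort()
print(store.current)
```

With two concurrent transactions sharing one page, swapping the old page back would also discard the other transaction’s change, which hints at why the slide asks about concurrency.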


Concurrency & Locking
• INGRES:
  • Supports single QUEL statements as transactions
  • Query locked all resources (table-level) before it began
  • All page updates are atomic via locks
  • Avoid deadlocks by preventing cases where they could occur – this is why they don’t do true ISAM
• System-R:
  • Multiple levels of lock granularity
    • logical: table-, range-level
    • physical: page-, tuple-level
  • Shared, exclusive locks
  • Multiple isolation levels (READ UNCOMMITTED, READ COMMITTED, SERIALIZABLE; no REPEATABLE READ)
  • Resolve deadlocks by choosing a victim, restarting
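A small sketch of the two System-R ideas above: shared/exclusive lock compatibility, and deadlock resolution by finding a cycle in a waits-for graph and picking a victim. The function names are hypothetical, and the victim rule used here (the transaction whose name sorts last) is an arbitrary stand-in, not the policy System-R used.

```python
# S = shared, X = exclusive. Only two shared locks are compatible.
COMPATIBLE = {
    ("S", "S"): True, ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """A lock is granted only if it is compatible with every held lock."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


def find_deadlock(waits_for):
    """Return the transactions on some waits-for cycle, or None."""
    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]
        for blocker in waits_for.get(txn, ()):
            cycle = visit(blocker, path + [txn])
            if cycle:
                return cycle
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


waits_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}
cycle = find_deadlock(waits_for)
victim = max(cycle)  # arbitrary victim rule, for illustration only
print(cycle, victim)
```

Aborting and restarting the victim breaks the cycle. INGRES avoided this machinery by locking everything a query needs before it starts.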


Recovery
• INGRES:
  • Deferred updates – for performance, isolation
  • Also used to make recovery possible
  • But how far does this take you?
• System-R:
  • Notion of checkpoints and restarting
  • Transaction logging and replay
  • Not much mentioned about possibility of recovery from failed restart
    • Today’s techniques even recover from that


Analysis


What Ideas from this Work Were Broken?
• Shadow paging
  • Now everything is purely log-based
• SQL idiosyncrasies
• INGRES recovery
• Relations as files
• Query optimization


Discussion: What Ideas Are We Still Using?