Archetypal Databases, or, What Is a DBMS?
Zachary G. Ives
University of Pennsylvania
CIS 650 – Implementing Data Management Systems
September 9, 2008

Today
  Last time we saw the “vision” of the relational database: decoupling “what” from “how”
    Storage scheme shouldn’t matter to the programmer
    … only the operations to be performed
  But how do we realize this vision?
  And what about all of the other aspects of a DBMS?
    Concurrency, recovery
    Creating tables, performing updates, …

The INGRES System
  Interactive Graphics and Retrieval System
  Probably the first usable RDBMS
  One of the first real projects built on UNIX
    On a minicomputer, the PDP-11, which greatly constrained things due to limited memory
    No low-level access to disk, which prevented clustering in storage
  Based on a relational query language called QUEL
    Many religious wars between QUEL and SEQUEL camps
  Stonebraker’s first commercialized project
    Changed university IP rules forever
  Today: was recently open-sourced by Computer Associates
    … but mostly overshadowed by MySQL and Postgres (a later Stonebraker project)

Processes
  Precompiler converts C+EQUEL into a C program that gets compiled
  Due to 64K process size limits, needed to break things into 4 processes communicating via pipes
    Even some of these needed to be broken into overlays
    Later added a Process 2.5 to make 5 processes
  [Figure: pipeline of processes. Process 1: C program. Process 2: lexer, parser, concurrency, query modification. Process 3: query processing. Process 4: utilities, recovery.]

System-R
  Probably a bit closer to today’s DBMSs, at least at the low level, but didn’t run on UNIX
  Based on years of experience with IMS and other IBM database systems
  At this point, prior to Selinger (note limited optimization), Lindsay (not a huge performance focus)
  Components were built to be generally reusable
  Default language was called SEQUEL, by Chamberlin and Boyce

System-R Architecture
  [Figure: layered architecture, top to bottom]
    SEQUEL or e-SEQUEL
    (RDI)
    Relational Data System (RDS): parsing, (limited) optimization
    (RSI)
    Relational Storage System (RSS): storage, concurrency, access paths (“images”, “links”), triggers

Languages
  QUEL vs. SEQUEL
    The focus of many religious wars, though they borrowed each other’s ideas
    Ultimately IBM won due to (1) market presence, (2) Oracle
    SEQUEL had NULL concept, bag semantics, aggregation
    Postgres was originally based on QUEL; its later switch to SQL is where the name PostgreSQL comes from
  QUEL: more orthogonal, simple EQUEL embedding
    RANGE OF X IS MYREL
    RETRIEVE (X.B)
    WHERE …
    ## C-block
  SEQUEL: more block-oriented, embedded via cursors and row sets, aggregation
    SELECT X.B
    FROM MYREL X
    WHERE …

Administration of a Database
  INGRES:
    Only the DBA can create shared relations and grant access to them
    It’s possible for a user to create temporary relations (and required for query processing)
  System-R:
    Anyone can create private relations and grant access to them
    It’s possible to create persistent or temporary relations
    Mechanisms to add columns to tables (by default these become NULL)

Integrity and Security
  Both systems allowed much more general notions of integrity than key constraints
    Assertions as a means of validation!
    Query modification is one method used to do this
      Conjunction of assertions plus the query/update
      Only limited expressiveness: in INGRES there must be at most one variable
  Security models:
    System-R used a view-based security model
    INGRES used query modification
    How do these differ? Which is better?
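
Query modification can be pictured as ANDing every stored predicate for a relation onto the qualification of whatever query or update touches it. The following is a minimal sketch, assuming qualifications are plain strings over a single range variable E; `ASSERTIONS`, `modify_query`, and the example predicates are hypothetical names for illustration, not INGRES's actual catalogs or code.

```python
# Hypothetical stored predicates per relation (illustration only).
# Each one mentions a single range variable, mirroring the one-variable limit.
ASSERTIONS = {
    "EMPLOYEE": ["E.SALARY > 0", 'E.DEPT = "toy"'],
}

def modify_query(relation, qualification):
    """Conjoin every stored predicate for `relation` onto the qualification."""
    clauses = [qualification] if qualification else []
    clauses += ASSERTIONS.get(relation, [])
    return " AND ".join(f"({c})" for c in clauses)

# The user's qualification is rewritten before the query is ever processed.
print(modify_query("EMPLOYEE", "E.AGE > 40"))
```

The same rewrite serves both integrity (assertions) and security (a per-user restriction), which is one way to frame the comparison with a view-based model.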
Data Storage

Physical Data Layout
  Tuples must fit within a page
  All tuples have TIDs
    TID is a page + index
  System-R
    Allowed links between tuples
    Built-in concept of NULL
  Where else are TIDs useful?
  Page-level clustering: done by System-R in extents
    What about INGRES?
  [Figure: tuples t1, t2, t3]
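
A TID as "page + index" can be sketched with a toy heap file. This is an illustrative sketch only: `TID`, `Page`, `HeapFile`, and the fixed `rows_per_page` limit are assumptions made for the example, not the layout either system used.

```python
from typing import NamedTuple

class TID(NamedTuple):
    page: int   # page number within the file
    slot: int   # index into that page's slot array

class Page:
    def __init__(self):
        self.slots = []              # slot i holds one tuple

    def insert(self, row):
        self.slots.append(row)
        return len(self.slots) - 1   # slot index of the new tuple

class HeapFile:
    def __init__(self, rows_per_page=3):
        # Stand-in for "a tuple must fit within a page": a fixed row count.
        self.rows_per_page = rows_per_page
        self.pages = [Page()]

    def insert(self, row):
        if len(self.pages[-1].slots) >= self.rows_per_page:
            self.pages.append(Page())
        page_no = len(self.pages) - 1
        return TID(page_no, self.pages[page_no].insert(row))

    def fetch(self, tid):
        return self.pages[tid.page].slots[tid.slot]

heap = HeapFile()
tids = [heap.insert((f"t{i}",)) for i in range(1, 5)]

# One place TIDs get reused: an index maps a key to a TID, not to the tuple.
index = {heap.fetch(tid)[0]: tid for tid in tids}
```

With three rows per page, the fourth tuple spills onto page 1, and `index["t3"]` resolves to a tuple through `heap.fetch` without scanning.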

Access Paths - INGRES
  INGRES Access Methods Interface (AMI):
    Can have unordered “heap file”
    Hash-based indexing
    ISAM-like indexing on a primary key
      Predates B+ Tree: initially height-balanced, but afterwards index structure is static; requires overflow pages
      “ISAM-like” because can’t sort across pages; can only lock one page at a time!
  Lookups are done via OPENR(), GET()
    FIND() can be used to find a start or stop point
    Can call PARAMD, PARAMI to get parameters of data or index
  Idea of extensible access methods, later revisited in Generalized Search Trees (GiST)
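
The "static index plus overflow pages" behaviour can be sketched as below. This is a toy model under stated assumptions: `IsamFile`, its one-level directory, and the tiny `page_size` are hypothetical and are not the AMI; the file is assumed to be built over a non-empty sorted key list.

```python
import bisect

class IsamFile:
    """Directory is built once over sorted keys; later inserts may overflow."""

    def __init__(self, sorted_keys, page_size=2):
        self.page_size = page_size
        # Primary pages are filled in key order at build time.
        self.pages = [list(sorted_keys[i:i + page_size])
                      for i in range(0, len(sorted_keys), page_size)]
        # The directory (lowest key per page) is never updated afterwards.
        self.directory = [page[0] for page in self.pages]
        self.overflow = [[] for _ in self.pages]

    def _page_for(self, key):
        # Last page whose lowest key is <= key (page 0 for smaller keys).
        return max(bisect.bisect_right(self.directory, key) - 1, 0)

    def insert(self, key):
        i = self._page_for(key)
        if len(self.pages[i]) < self.page_size:
            self.pages[i].append(key)       # room left on the primary page
        else:
            self.overflow[i].append(key)    # static index: chain an overflow

    def lookup(self, key):
        i = self._page_for(key)
        return key in self.pages[i] or key in self.overflow[i]

f = IsamFile([10, 20, 30, 40, 50])
f.insert(25)   # page 0 is full, so 25 lands in its overflow chain
f.insert(60)   # the last page still has room
```

Because the directory never changes, lookups stay cheap to route, but a skewed insert pattern turns one page's overflow chain into a linear scan, which is the weakness B+ trees later addressed.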

Access Paths - System-R
  “Images”
    Slightly more than indices (which are included there): also linked lists and orderings
    Index structures include ISAM and B Trees
    Search arguments, aka sargable predicates
    Note that clustering was a key concept here
  Links via TIDs
    Note that these were NOT relational in spirit!
    Intention was to support IMS over RSS
    Some code likely made its way back into IMS
  How do these concepts relate to OO databases and today’s XML databases?

Catalogs and Indices as Logical Abstractions
  System-R and INGRES decided to express catalogs as relations, making them accessible to queries as well as enabling reuse of code
    How are they accessible in today’s RDBMSs?
  INGRES even did this with indices: 1:1 mapping between files and relations
    Indices vs. views: how are they different?
    How are they different in the presence of TIDs or links?
Query Processing

Query Processing - INGRES
  DECOMP algorithm and One Variable Query Processor
    Break every query into separate operations that generate a temp relation for every projection, selection, join
    Pick a relation to iterate over, substitute value for variable
    Repeat recursively
  Note that (1) it’s interpreted, (2) it’s non-pipelined, and (3) it’s adaptive, always choosing the smallest relation for the substitution
  Note that access paths were chosen in a pretty ad hoc way
    Also, no concept of sargable predicates; instead, could FIND both upper and lower bounds, but needed to GET between

DECOMP Example
  RANGE OF E, M IS EMPLOYEE
  RANGE OF D IS DEPT
  RETRIEVE (E.NAME)
  WHERE E.SALARY > M.SALARY AND
        E.MANAGER = M.NAME AND
        E.DEPT = D.DEPT AND
        D.FLOOR# = 1 AND
        E.AGE > 40
  First: apply selection and projection

DECOMP, Ctd.
  RANGE OF D IS DEPT
  RETRIEVE INTO T1 (D.DEPT)
  WHERE D.FLOOR# = 1

  RANGE OF E IS EMPLOYEE
  RETRIEVE INTO T2 (E.NAME, E.SALARY, E.MANAGER, E.DEPT)
  WHERE E.AGE > 40

  Now substitute these back into the main query

DECOMP, Ctd.
  RANGE OF E IS T2
  RANGE OF M IS EMPLOYEE
  RANGE OF D IS T1
  RETRIEVE (E.NAME)
  WHERE E.SALARY > M.SALARY AND
        E.MANAGER = M.NAME AND
        E.DEPT = D.DEPT
  Now pick the relation with smallest cardinality, e.g., T1, and substitute

One-Variable Substitution
  foreach D in T1, recursively process:
    RANGE OF E IS T2
    RANGE OF M IS EMPLOYEE
    RETRIEVE (E.NAME)
    WHERE E.SALARY > M.SALARY AND
          E.MANAGER = M.NAME AND
          E.DEPT = *valueOf(D.DEPT)*
  Now apply the selection to E…
    RANGE OF E IS T2
    RETRIEVE INTO T3 (E.NAME, E.SALARY, E.MANAGER, E.DEPT)
    WHERE E.DEPT = *valueOf(D.DEPT)*

DECOMP, Recursively
  So the query looks like:
  foreach D in T1
    RANGE OF E IS T2
    RETRIEVE INTO T3 (E.NAME, E.SALARY, E.MANAGER, E.DEPT)
    WHERE E.DEPT = *valueOf(D.DEPT)*

    RANGE OF E IS T3
    RANGE OF M IS EMPLOYEE
    RETRIEVE (E.NAME)
    WHERE E.SALARY > M.SALARY AND
          E.MANAGER = M.NAME
  Now choose the smallest relation for substitution (e.g., T3)

DECOMP, Recursively
  foreach D in T1
    RANGE OF E IS T2
    RETRIEVE INTO T3 (E.NAME, E.SALARY, E.MANAGER, E.DEPT)
    WHERE E.DEPT = *valueOf(D.DEPT)*
    foreach E2 in T3
      RANGE OF M IS EMPLOYEE
      RETRIEVE INTO (val1)
      WHERE *valueOf(E2.SALARY)* > M.SALARY AND
            *valueOf(E2.MANAGER)* = M.NAME
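
The nested plan above can be traced with a small Python sketch. It is hand-unrolled for this one example query and fixes the substitution order the slides chose (T1, then T3), so it is not the general DECOMP algorithm; the sample rows and the helper names `project` and `decomp_example` are made up for illustration.

```python
# Made-up sample data for the EMPLOYEE and DEPT relations.
EMPLOYEE = [
    {"NAME": "ann", "SALARY": 90, "MANAGER": "bob", "DEPT": "toy", "AGE": 45},
    {"NAME": "bob", "SALARY": 80, "MANAGER": "bob", "DEPT": "toy", "AGE": 50},
    {"NAME": "cal", "SALARY": 70, "MANAGER": "bob", "DEPT": "toy", "AGE": 41},
    {"NAME": "dee", "SALARY": 95, "MANAGER": "bob", "DEPT": "shoe", "AGE": 60},
    {"NAME": "eve", "SALARY": 99, "MANAGER": "bob", "DEPT": "toy", "AGE": 30},
]
DEPT = [{"DEPT": "toy", "FLOOR#": 1}, {"DEPT": "shoe", "FLOOR#": 2}]

def project(row, cols):
    return {c: row[c] for c in cols}

def decomp_example(employee, dept):
    # One-variable subqueries first, each materialized as a temp relation.
    t1 = [project(d, ["DEPT"]) for d in dept if d["FLOOR#"] == 1]
    t2 = [project(e, ["NAME", "SALARY", "MANAGER", "DEPT"])
          for e in employee if e["AGE"] > 40]
    result = []
    for d in t1:                        # substitute a value for D
        t3 = [e for e in t2 if e["DEPT"] == d["DEPT"]]
        for e2 in t3:                   # substitute a value for E
            for m in employee:          # remaining one-variable query on M
                if e2["SALARY"] > m["SALARY"] and e2["MANAGER"] == m["NAME"]:
                    result.append(e2["NAME"])
    return result

print(decomp_example(EMPLOYEE, DEPT))
```

Note how every intermediate result (T1, T2, T3) is fully materialized before the next step reads it, which is the non-pipelined behaviour the earlier slide points out.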

Query Processing - System-R
  Optimization is cost-based
    Consider both disk cost (primary) and CPU cost (scaled by some H)
    Compare clustered, non-clustered indices; sequential scan
    Needed to consider cases where data was interspersed with other relations
  Every join is binary; query plans are compiled
    Merge join and nested loops join are present
    Also have a link-based join
    No dynamic programming yet: Selinger was not yet aboard
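
A cost comparison of this shape (page fetches plus H times CPU work) might look like the sketch below. The per-path estimates are textbook-style simplifications chosen for the example, not System-R's actual formulas, and `Stats`, `cost`, `access_path_costs`, and `cheapest` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Stats:
    pages: int          # pages holding the relation
    tuples: int         # tuples in the relation
    index_pages: int    # pages in the index
    selectivity: float  # estimated fraction of tuples matching the predicate

def cost(page_fetches, tuples_examined, h):
    # Disk cost is primary; CPU work is folded in through the weight h.
    return page_fetches + h * tuples_examined

def access_path_costs(s, h=0.01):
    matching = s.selectivity * s.tuples
    return {
        "sequential scan": cost(s.pages, s.tuples, h),
        # Assumed: with clustering, matching tuples sit on adjacent pages.
        "clustered index": cost(
            s.selectivity * (s.index_pages + s.pages), matching, h),
        # Assumed: without clustering, up to one page fetch per matching tuple.
        "non-clustered index": cost(
            s.selectivity * s.index_pages + matching, matching, h),
    }

def cheapest(s, h=0.01):
    costs = access_path_costs(s, h)
    return min(costs, key=costs.get)

s = Stats(pages=1000, tuples=50000, index_pages=100, selectivity=0.1)
print(cheapest(s))
```

Under these assumptions a non-clustered index can cost more than a full scan once the predicate matches more tuples than the relation has pages, which is why clustering matters so much to the comparison.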

Views
  INGRES:
    Not supported at that time (though later)
  System-R:
    Single-table views are typically updatable, if one-to-one (contrast with today’s SQL)
    Additionally, cursors could typically be modified!
    Views as a security/encapsulation mechanism
      GRANT and REVOKE privileges

Triggers
  Note that System-R had triggers from the beginning!
  Later work led to the idea of active databases, which had very rich trigger languages
ACIDity

Rollback / Abort
  Notion of transactions in System-R encompassed multiple operations
    BEGIN_TRANS, END_TRANS, SAVE, RESTORE
  Both systems had a notion of “old” and “new” pages (“shadow paging”)
    Could roll back transactions by swapping back the old page
    But how did that work with concurrency?
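
The "old" and "new" page idea can be sketched with two page tables. This is a single-transaction toy, so it sidesteps the concurrency question raised above; `ShadowPagedFile` and its methods are hypothetical names, and no page reclamation is attempted.

```python
class ShadowPagedFile:
    def __init__(self, pages):
        self.store = dict(enumerate(pages))          # physical id -> contents
        self.current = {i: i for i in self.store}    # logical -> physical
        self.shadow = dict(self.current)             # table as of last commit
        self.next_id = len(self.store)

    def write(self, logical, data):
        # Never overwrite an "old" page: write a "new" copy and repoint.
        self.store[self.next_id] = data
        self.current[logical] = self.next_id
        self.next_id += 1

    def read(self, logical):
        return self.store[self.current[logical]]

    def commit(self):
        self.shadow = dict(self.current)    # the new pages become the old ones

    def abort(self):
        self.current = dict(self.shadow)    # swap back to the old pages

f = ShadowPagedFile(["a", "b"])
f.write(0, "A")
f.abort()          # page 0 reads "a" again
f.write(1, "B")
f.commit()         # page 1 now durably reads "B"
```

Rollback is just a pointer swap, which is the appeal; the cost is that two transactions updating different tuples on the same page cannot each have their own shadow of it.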

Concurrency & Locking
  INGRES:
    Supports single QUEL statements as transactions
    Query locked all resources (table-level) before it began
    All page updates are atomic via locks
    Avoid deadlocks by preventing cases where they could occur; this is why they don’t do true ISAM
  System-R:
    Multiple levels of lock granularity
      logical: table-, range-level
      physical: page-, tuple-level
    Shared, exclusive locks
    Multiple isolation levels (READ UNCOMMITTED, READ COMMITTED, SERIALIZABLE; no REPEATABLE READ)
    Resolve deadlocks by choosing a victim, restarting
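
"Choosing a victim" presupposes detecting the deadlock first. One common way to do that is a cycle search over a wait-for graph, sketched below. The graph representation, `find_cycle`, `choose_victim`, and the youngest-transaction policy are all assumptions made for the example, not a description of System-R's code.

```python
def find_cycle(waits_for):
    """Return one cycle in the wait-for graph as a list, or None."""
    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]       # closed a loop
        for blocker in waits_for.get(txn, ()):
            cycle = visit(blocker, path + [txn])
            if cycle:
                return cycle
        return None

    for txn in waits_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None

def choose_victim(cycle, start_time):
    # Assumed policy: restart the youngest transaction (latest start time).
    return max(cycle, key=lambda t: start_time[t])

# T1 waits for T2, T2 for T3, T3 for T1; T4 waits on T1 but is not in the cycle.
waits_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}
start_time = {"T1": 1, "T2": 5, "T3": 3, "T4": 9}

cycle = find_cycle(waits_for)
victim = choose_victim(cycle, start_time)
```

INGRES avoids needing any of this by locking everything a statement touches before it starts, at the price of coarser concurrency.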

Recovery
  INGRES:
    Deferred updates, for performance and isolation
    Also used to make recovery possible
    But how far does this take you?
  System-R:
    Notion of checkpoints and restarting
    Transaction logging and replay
    Not much mentioned about possibility of recovery from failed restart
      Today’s techniques even recover from that
Analysis

What Ideas from this Work Were Broken?
  Shadow paging
    Now everything is purely log-based
  SQL idiosyncrasies
  INGRES recovery
  Relations as files
  Query optimization

Discussion: What Ideas Are We Still Using?