BACKGROUND OVERVIEW OF RELATIONAL DBS

BACKGROUND OVERVIEW OF RELATIONAL DBS. BASICS Highly structured Schema based - we can leverage this to address volume Semantics SQL App Middleware Users


BASICS

• Highly structured
  • Schema based - we can leverage this to address volume
• Semantics
  • SQL
  • App
  • Middleware
  • Users
• Structure
  • Tables/relations, rows/tuples, columns/attributes
  • User defined data types
  • PKs and FKs
  • Null or not null
  • Triggers as a catch-all integrity constraint
  • Normalization for formal table minimization
• Uses
  • Bank checks
  • Insurance claims
  • Credit card payments


NOSQL

• Cluster based broad distribution
• Semi structured
• More flexible access of data
• Hierarchical
• Similar structure


RELATIONAL DBS: FORMALLY UNDERSTOOD

• Set theoretic

• Originally defined with an algebra, with Selection, Projection, Join, and Union/Difference/Intersection

• Declarative calculus that is based on the algebra and supports large grained queries

• Clean implementation spec

• Unambiguous optimization - with its own algebra of query parse tree transformations


SEMANTICS ARE IN QUERIES

• Relational algebra compliant

• Queries written in declarative calculus

• Set-oriented

• But at least programmers tend to follow PK/FK pairs, and infer semantics from attribute names and associations in tuples

• Query results are legal tables (Views)


ALSO WE GET (GOOD AND BAD)

• Fixed size tuples for easy row-optimization

• 2P transactions

• Table, Row distribution

• Two language based, with lowest common denominator semantics

• Security

• Checkpointing

• Powerful query optimizers


OBJECT-RELATIONAL DBS

• This runs somewhat counter to NoSQL trends - we make the data types even more complex

• We make domains out of type constructors

• Object IDs

• A row can be a tuple - or an object, with an object ID and a tuple, making all relational DBs also O-R


OBJECT-ORIENTED DBS

• No tuple rows

• Blend SQL and the app language

• This avoids lowest common denominator semantics

• These bombed, as relational DBs were not O-O

• And they are tough to optimize


THE RELATIONAL ALGEBRA AND CALCULUS: THE HEART OF RELATIONAL DBS

… SQL


THE BIG 3:

• Selection and projection are unary ops
• Join is binary
• Selection is based on a formula and returns a table that contains all tuples from a given table where the formula is valid
• Projection returns a table consisting of a subset of attributes from a given table, with dupes removed
• Join creates tuples with attributes from two given tables, where a specific attribute in one matches a specific attribute in another (often a PK, FK pair)
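The three operators can be tried out in SQL. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than MySQL; the vendors and invoices tables and their contents are made up for illustration.

```python
import sqlite3

# Hypothetical sample data, not taken from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, vendor_city TEXT);
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY,
        vendor_id INTEGER REFERENCES vendors(vendor_id),
        invoice_total REAL
    );
    INSERT INTO vendors VALUES (1, 'Boulder'), (2, 'Denver'), (3, 'Boulder');
    INSERT INTO invoices VALUES (10, 1, 50.0), (11, 1, 200.0), (12, 2, 75.0);
""")

# Selection: keep the tuples for which the formula holds.
selection = conn.execute(
    "SELECT * FROM invoices WHERE invoice_total > 60 ORDER BY invoice_id"
).fetchall()

# Projection: keep a subset of attributes; DISTINCT removes the dupes.
projection = conn.execute(
    "SELECT DISTINCT vendor_city FROM vendors ORDER BY vendor_city"
).fetchall()

# Join: match the FK in invoices to the PK in vendors.
join = conn.execute("""
    SELECT i.invoice_id, v.vendor_city
    FROM invoices AS i JOIN vendors AS v ON i.vendor_id = v.vendor_id
    ORDER BY i.invoice_id""").fetchall()
```

Note that plain SQL keeps duplicate rows unless DISTINCT is given, which is one place where SQL departs from the set-based algebra.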


ALGEBRAIC CLOSURE

• Any relational algebra operation returns a legal derived table
• The set operators are also part of the algebra
• From a formal perspective, the join operator is not a minimal operator, and is therefore represented as a cross product followed by a selection (where the PK equals the FK)
• Note that joins are symmetric
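Closure and the "join as cross product plus selection" idea can be sketched in a few lines of Python. This is only an illustration: relations are modeled as sets of tuples with fixed attribute positions, and the helper names and sample data are assumptions.

```python
from itertools import product

# A relation is modeled as a set of tuples; attribute positions are fixed.
vendors = {(1, "Boulder"), (2, "Denver")}   # (vendor_id, city)
invoices = {(10, 1), (11, 1), (12, 2)}      # (invoice_id, vendor_id)

def cross(r, s):
    """Cross product: every tuple of r concatenated with every tuple of s."""
    return {a + b for a, b in product(r, s)}

def select(r, predicate):
    """Selection: the tuples of r for which the predicate holds."""
    return {t for t in r if predicate(t)}

def project(r, positions):
    """Projection: keep the given attribute positions; the set drops dupes."""
    return {tuple(t[i] for i in positions) for t in r}

# Join expressed as a cross product followed by a selection (FK = PK).
joined = select(cross(invoices, vendors), lambda t: t[1] == t[2])

# Closure: the result is itself a relation, so it can feed another operator.
cities_with_invoices = project(joined, [3])
```

Because every helper takes relations and returns a relation, the operators compose freely, which is what a query optimizer relies on when it rearranges a query.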


JOINS CAN BE GENERALIZED

• Complex join conditions
• Non-equi joins
• A “natural” join is based on matching all attributes with equal names in both tables
• A left “outer” join creates null-padded tuples when tuples on the left do not match any on the right; there is also a right outer join
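The difference between an inner, a left outer, and a natural join shows up when one vendor has no invoices. This is a small sketch using Python's sqlite3 module; the tables and rows are hypothetical.

```python
import sqlite3

# Hypothetical data: vendor 2 has no invoices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, vendor_id INTEGER);
    INSERT INTO vendors VALUES (1, 'Acme'), (2, 'Zenith');
    INSERT INTO invoices VALUES (10, 1);
""")

# Inner join: unmatched vendors disappear.
inner = conn.execute("""
    SELECT v.vendor_name, i.invoice_id
    FROM vendors AS v JOIN invoices AS i ON v.vendor_id = i.vendor_id
    ORDER BY v.vendor_id""").fetchall()

# Left outer join: unmatched vendors are kept, padded with NULL (None in Python).
left_outer = conn.execute("""
    SELECT v.vendor_name, i.invoice_id
    FROM vendors AS v LEFT OUTER JOIN invoices AS i ON v.vendor_id = i.vendor_id
    ORDER BY v.vendor_id""").fetchall()

# Natural join: matches on the one column name both tables share (vendor_id).
natural = conn.execute("""
    SELECT vendor_name, invoice_id
    FROM vendors NATURAL JOIN invoices""").fetchall()
```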


THE CALCULUS

• It is a tuple calculus, not a domain calculus
• SQL is equivalent
• Select From Where
• The part after the Where is declarative
• A tuple calculus (SQL)
• Notice that the variables are indeed tuples
• Note that set operators often act on tables that are being created in the query
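To make the "variables are tuples" point concrete, here is a hedged sketch (Python's sqlite3, made-up data) in which the aliases i and j each range over the rows of the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY, vendor_id INTEGER, invoice_total REAL);
    INSERT INTO invoices VALUES (10, 1, 50.0), (11, 1, 200.0), (12, 2, 75.0);
""")

# i and j are tuple variables: each ranges over the rows of invoices.
# The WHERE clause states a condition on them, not a procedure for finding them.
rows = conn.execute("""
    SELECT i.invoice_id
    FROM invoices AS i, invoices AS j
    WHERE i.vendor_id = j.vendor_id
      AND i.invoice_total > j.invoice_total
    ORDER BY i.invoice_id""").fetchall()
```

The query asks for invoices that are larger than some other invoice from the same vendor. How the engine finds them is left to the optimizer.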


SQL

-- Assumes the client delimiter was changed beforehand (DELIMITER //)
CREATE PROCEDURE test()
BEGIN
  DECLARE sql_error TINYINT DEFAULT FALSE;

  -- On any SQL error, record it and keep going
  DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
    SET sql_error = TRUE;

  START TRANSACTION;

  INSERT INTO invoices
  VALUES (115, 34, 'ZXA-080', '2012-01-18', 14092.59, 0, 0, 3, '2012-04-18', NULL);

  INSERT INTO invoice_line_items VALUES (115, 1, 160, 4447.23, 'HW upgrade');
  INSERT INTO invoice_line_items VALUES (115, 2, 167, 9645.36, 'OS upgrade');

  -- Commit only if all three inserts succeeded
  IF sql_error = FALSE THEN
    COMMIT;
    SELECT 'The transaction was committed.';
  ELSE
    ROLLBACK;
    SELECT 'The transaction was rolled back.';
  END IF;
END//


MORE

• IN operator is “element of”
• EXISTS
• Nesting
• FOR ALL
• FOR SOME
• Putting computations in the SELECT clause
• COUNT, SUM, AVG, MAX, MIN operators
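Several of these features fit in one short sketch. It uses Python's sqlite3 module instead of MySQL, and the tables and data are invented for illustration.

```python
import sqlite3

# Hypothetical data: vendor 3 ('Idle') has no invoices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY, vendor_id INTEGER, invoice_total REAL);
    INSERT INTO vendors VALUES (1, 'Acme'), (2, 'Zenith'), (3, 'Idle');
    INSERT INTO invoices VALUES (10, 1, 50.0), (11, 1, 200.0), (12, 2, 75.0);
""")

# IN as "element of": vendors whose id is in the set built by the nested query.
with_invoices = conn.execute("""
    SELECT vendor_name FROM vendors
    WHERE vendor_id IN (SELECT vendor_id FROM invoices)
    ORDER BY vendor_id""").fetchall()

# NOT EXISTS with a correlated nested query: vendors with no invoices.
without_invoices = conn.execute("""
    SELECT vendor_name FROM vendors AS v
    WHERE NOT EXISTS (
        SELECT 1 FROM invoices AS i WHERE i.vendor_id = v.vendor_id)""").fetchall()

# Aggregates plus a computation in the SELECT clause.
stats = conn.execute("""
    SELECT COUNT(*), SUM(invoice_total), MAX(invoice_total) - MIN(invoice_total)
    FROM invoices""").fetchone()
```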


STORED PROGRAMS

• Stored procedures (can be called by an application)
• Stored functions (can be called by an SQL program)
• Triggers (tied to an operation like INSERT)
• Events (tied to a clock)


VARIABLES

• DECLARE statement
• SET statement
• DEFAULT statement
• INTO (from a SELECT clause)


EXAMPLE…

CREATE PROCEDURE test()
BEGIN
  DECLARE max_invoice_total DECIMAL(9,2);
  DECLARE min_invoice_total DECIMAL(9,2);
  DECLARE percent_difference DECIMAL(9,4);
  DECLARE count_invoice_id INT;
  DECLARE vendor_id_var INT;

  SET vendor_id_var = 95;

  SELECT MAX(invoice_total), MIN(invoice_total), COUNT(invoice_id)
  INTO max_invoice_total, min_invoice_total, count_invoice_id
  FROM invoices
  WHERE vendor_id = vendor_id_var;


EXAMPLE, CONTINUED

  SET percent_difference =
    (max_invoice_total - min_invoice_total) / min_invoice_total * 100;

  SELECT CONCAT('$', max_invoice_total) AS 'Maximum invoice',
         CONCAT('$', min_invoice_total) AS 'Minimum invoice',
         CONCAT('%', ROUND(percent_difference, 2)) AS 'Percent difference',
         count_invoice_id AS 'Number of invoices';
END//


DOMAIN TYPES – CHAPTER 8

• Character
• Integers
• Reals
• Date
• Time
• Large object, BLOB and CLOB
• 2D vector spatial types
• Enumerated


THE ACID PROPERTIES, NORMALIZATION, AND DATABASE DESIGN


ACID TRANSACTIONS

• Atomic: Either all of a transaction or none of it affects the database
• Consistent: When a transaction ends, the database obeys all constraints
• Isolated: Two running transactions cannot pass values to each other, via the database or other data store
• Durable: Once a transaction has “committed”, its updates are permanent


ATOMICITY

• Use a local log to store a transaction’s partial result
• If a transaction does something illegal, toss out the log
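The all-or-nothing behavior can be observed from application code. The sketch below uses Python's sqlite3 module with simplified, hypothetical versions of the invoices and invoice_line_items tables; the helper name insert_invoice is an assumption. It mirrors the commit-or-rollback pattern of the stored procedure shown earlier.

```python
import sqlite3

# isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, invoice_total REAL)")
conn.execute(
    "CREATE TABLE invoice_line_items ("
    "invoice_id INTEGER, line_no INTEGER, amount REAL, "
    "PRIMARY KEY (invoice_id, line_no))")

def insert_invoice(conn, invoice_id, total, line_amounts):
    """Insert an invoice and its line items as one all-or-nothing unit."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO invoices VALUES (?, ?)", (invoice_id, total))
        for line_no, amount in line_amounts:
            conn.execute(
                "INSERT INTO invoice_line_items VALUES (?, ?, ?)",
                (invoice_id, line_no, amount))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # discard the partial result
        return False

ok = insert_invoice(conn, 115, 14092.59, [(1, 4447.23), (2, 9645.36)])
# The duplicate line number violates the PK, so the whole transaction is undone.
failed = insert_invoice(conn, 116, 100.0, [(1, 60.0), (1, 40.0)])
invoice_ids = [row[0] for row in conn.execute("SELECT invoice_id FROM invoices")]
```

Invoice 116 never becomes visible, even though its first two inserts succeeded before the third one failed.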


CONSISTENT

• Check constraints in phase 1
  • Some are immediate, like domains
  • Others don’t have to be true until the commit point, like FKs
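The difference between immediate and commit-time checking can be sketched with SQLite, which supports deferred foreign keys. This is an illustration under assumptions: the schema is hypothetical, and MySQL's InnoDB checks foreign keys immediately, so the deferred part does not carry over to it unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.execute("CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE invoice_line_items (
        invoice_id INTEGER REFERENCES invoices(invoice_id)
            DEFERRABLE INITIALLY DEFERRED,
        amount REAL CHECK (amount >= 0)
    )""")

# Domain-style constraint: rejected as soon as the statement runs.
try:
    conn.execute("INSERT INTO invoice_line_items VALUES (NULL, -5)")
    check_is_immediate = False
except sqlite3.IntegrityError:
    check_is_immediate = True

# Deferred FK: the child row may arrive before its parent inside a transaction.
conn.execute("BEGIN")
conn.execute("INSERT INTO invoice_line_items VALUES (115, 10.0)")  # no parent yet
conn.execute("INSERT INTO invoices VALUES (115)")                  # now satisfied
conn.execute("COMMIT")

# If the FK is still violated at the commit point, the COMMIT itself fails.
conn.execute("BEGIN")
conn.execute("INSERT INTO invoice_line_items VALUES (999, 1.0)")
try:
    conn.execute("COMMIT")
    commit_rejected = False
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    commit_rejected = True

line_count = conn.execute("SELECT COUNT(*) FROM invoice_line_items").fetchone()[0]
```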


ISOLATED

• Transactions commit in a linear order
• Serializability is enforced
• Results become available only after atomic commit point


DURABLE

• Database has one state and it is in nonvolatile storage
• Keep checkpoints and transaction logs


DEADLOCK

• Loops of transactions wait on each other
• Detection: use time-outs
• Prevention: use “waits for” graph
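A "waits for" graph has an edge from each transaction to the transactions it is waiting on, and a loop in that graph is a deadlock. One common way to use the graph is to search it for cycles. The sketch below does this with a depth-first search; the function name and the dictionary representation are assumptions for illustration, not a description of any particular DBMS.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    waits_for maps each transaction to the transactions it is waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GRAY
        path.append(txn)
        for other in waits_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GRAY:
                # other is already on the current path: we closed a loop.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        color[txn] = BLACK
        path.pop()
        return None

    for txn in list(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None

deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})
no_deadlock = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": []})
```

A system could run such a check before granting a wait and refuse any request that would close a loop, or run it periodically and abort one transaction from each cycle it finds.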


ANOTHER VIEW OF TRANSACTIONS

• Prevents
  • Lost updates from one of two transactions
  • Dirty reads when a transaction reads an uncommitted value
  • Nonrepeatable reads in one transaction because the value gets updated in between
  • Phantom reads, when a set of rows a transaction is reading or updating is simultaneously changed by another transaction (for example by an insert or delete)


CONTINUED…

• Options
  • Serializable isolates transactions completely and is the highest level of protection
  • Read uncommitted lets our four problems occur – no locks
  • Read committed prevents dirty reads
  • Repeatable read is the default (in MySQL's InnoDB engine) and it means that a transaction will always read a given value the same because the values are locked


DEADLOCK

• Detect by closing transactions that have been open a long time
• Use the lowest acceptable locking level
• Try to do heavy update transactions when database can be completely reserved


THE DB DESIGN PROCESS

• Start with an entity model
• Map to tables
• Create PKs and FKs
• Create other constraints
• Normalize tables


OUR FOCUS: NORMALIZATION

• Goals
  • Minimize redundant data
  • Minimize “update anomalies”


FUNCTIONAL AND MULTIVALUED DEPENDENCIES

• FD
  • We say that ai FD-> aj
  • Or “ai functionally determines aj”
• MVD
  • We say that ai MVD-> aj
  • Or “ai multivalued determines aj”
• Note: the right side of an FD or an MVD can be a set of attributes
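An FD can be tested against a concrete set of rows: whenever two rows agree on the left side, they must agree on the right side. The sketch below is a hypothetical helper (holds_fd is not a standard function) applied to the customer rows used in the 3NF example of this deck. A table instance can only refute an FD; whether the FD holds in general is a statement about the schema's meaning.

```python
def holds_fd(rows, lhs, rhs):
    """True if, in rows, equal values on lhs always imply equal values on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value; any later
        # row with the same lhs but a different rhs violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

customers = [
    {"customer_id": 18, "address": "112 First", "zip": "80304"},
    {"customer_id": 17, "address": "123 Ash", "zip": "80303"},
    {"customer_id": 16, "address": "123 Ash", "zip": "80303"},
]

address_determines_zip = holds_fd(customers, ["address"], ["zip"])
address_determines_id = holds_fd(customers, ["address"], ["customer_id"])
```

Both lhs and rhs are lists, which reflects the note that either side can be a set of attributes.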


FIRST 3 NORMAL FORMS

• First (1NF) The value stored at the intersection of each row and column must be a scalar value, and a table must not contain any repeating columns.
• Second (2NF) Every non-key column must depend on the entire primary key.
• Third (3NF) Every non-key column must depend only on the primary key.


NF3 FIXED AND NF4

• Boyce-Codd (BCNF) A non-key column can’t be dependent on another non-key column.
• Fourth (4NF) A table must not have more than one multivalued dependency, where the primary key has a one-to-many relationship to non-key columns.


EXAMPLE: 1NF


EXAMPLE: 2NF


EXAMPLE: 2NF, CONTINUED


3NF: REMOVE TRANSITIVE DEPENDENCIES

Customer ID   Address     ZIP
18            112 First   80304
17            123 Ash     80303
16            123 Ash     80303


3NF, CONTINUED

Break into two tables:

Table 1: Customer ID, Address
Table 2: Address, Zip
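The split can be checked mechanically: project the original table onto the two smaller schemas, then join them back on Address. This is a sketch using Python's sqlite3 module; the table names customers_flat, customer_address, and address_zip are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers_flat (
        customer_id INTEGER PRIMARY KEY, address TEXT, zip TEXT);
    INSERT INTO customers_flat VALUES
        (18, '112 First', '80304'),
        (17, '123 Ash', '80303'),
        (16, '123 Ash', '80303');

    -- Decomposition: project the flat table onto the two smaller schemas.
    CREATE TABLE customer_address AS
        SELECT customer_id, address FROM customers_flat;
    CREATE TABLE address_zip AS
        SELECT DISTINCT address, zip FROM customers_flat;
""")

# Each address/ZIP pair is now stored once.
zip_rows = conn.execute("SELECT COUNT(*) FROM address_zip").fetchone()[0]

# Joining the pieces on address reproduces the original rows.
rejoined = conn.execute("""
    SELECT c.customer_id, c.address, z.zip
    FROM customer_address AS c JOIN address_zip AS z ON c.address = z.address
    ORDER BY c.customer_id""").fetchall()
original = conn.execute("""
    SELECT customer_id, address, zip
    FROM customers_flat ORDER BY customer_id""").fetchall()
```

The ZIP for "123 Ash" is stored in one place after the split, so it cannot be updated for one customer and missed for the other.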


4NF: SEPARATE PAIRS OF MVDS

Mothers_Phone   Fathers_Phone   Child_Name

Break into:

Mothers_Phone   Child_Name
3030000000      Sue
3031111111      Sue

And

Fathers_Phone   Child_Name
3032222222      Sue
3033333333      Sue

Note: both fields needed for PK


TRADEOFFS

• “Decomposition” makes it harder to misunderstand the database schema
• But decomposition creates narrow tables that might not correspond to forms in the real world
• And decomposition leads to extra joins
• One solution is to pre-join data