
Sept. 2004 Yangjun Chen 91.3902 1

Database: Review

Database

•Introduction, system architecture, basic concepts
•ER-model, data modeling
•B+-tree, hashing
•Relational algebra, relational data model
•SQL: DDL, DML
•Normalization
•Lossless join
•Hierarchical databases


Introduction to database systems

What is a database?

The main characteristics of a database

The basic database design method

The entity-relationship data model for application modeling


The main characteristics of the database approach:

•single repository of data
•sharable by multiple users
•concurrency control and transaction concept
•security and integrity constraints
•self-describing - system catalogue contains meta data
•program-data independence
•some changes to the database are transparent to programs/users
•multiple views of data - to support individual needs of programs/users


Concepts and Architecture

•Database schema, schema evolution, database state
•Working process with a database system
•Database system architecture
•Data independence concept


Database schema

Relation schema

Schema evolution

Database state

Student
Name   StNo  Class  Major
Smith  17    1      CS
Brown  8     2      CS

Course
CName     CNo   CrHrs  Dept
Database  8803  3      CS
C         2606  3      CS

Section
SId  CNo   Semester  Yr    Instructor
32   8803  Spring    2000  Smith
25   8803  Winter    2000  Smith
43   2606  Spring    2000  Jones

Grades
StNo  SId  Grade
17    25   A
17    43   B


Working process with a database system:

•Definition
- record structure
- data elements: names, data types, constraints, etc.

•Construction
- create database files
- populate the database with records

•Manipulation
- querying
- updating


Database Management System (DBMS)

•collection of software facilitating the definition, construction and manipulation of databases

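[Figure: DBMS architecture. Users/actors interact with the DBMS, whose components include the request manager, query evaluation and the storage manager; the DBMS accesses the meta data and the stored database.]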


Three-schema architecture

•External view: a specific user's or group's view of the database
•Conceptual schema: describes the whole database for all users
•Internal schema: physical storage structures and details


Data modeling using the ER-model

Entity-relationship model
- Entity types: strong entities, weak entities
- Relationships among entities
- Attributes - attribute classification
- Constraints: cardinality constraints, participation constraints

ER-to-Relation mapping


ER-model:

[Figure: ER diagram. Entity types: employee (ssn, bdate, name (fname, minit, lname), sex, address, salary), department (name, number, location, number of employees), project (name, number, location), dependent (name, sex, birthdate, relationship). Relationships: works for, manages (startdate), works on (hours), controls, supervision (supervisor, supervisee), dependents of. Cardinality labels 1, N and M appear on the relationship edges.]


Hashing technique

•external hashing
•static hashing & dynamic hashing
•hash function: mathematical function that maps a key to a bucket address
•collisions
•collision resolution scheme
- open addressing
- chaining
- multiple hashing
•linear hashing


External hashing: the data are on the disk.

Static hashing:
•using a hashing function to map keys to bucket addresses
•the primary area cannot be changed
•collision resolution scheme:
- open addressing
- chaining
- multiple hashing

Dynamic hashing:
•the primary area can be changed
•linear hashing


Linear hashing:

1. What is a phase?

2. How to split a bucket?

3. When to split a bucket?

4. What bucket will be chosen to split next?


Linear hashing:
•initially the hash file contains M buckets
•hi = key mod (2^i × M) (i = 0, 1, 2, ...)
•the insertion process can be divided into several phases

phase 1:
•insertion using h0 = key mod M
•splitting using h1 = key mod 2M
•splitting rule: overflow of a bucket, or load factor > constant (e.g., 0.70)
•an overflow is put in the overflow area or redistributed through splitting a bucket
•buckets are split in order from n = 0 to n = M - 1 (after each split, n is increased by 1)
•phase 1 finishes when n = M (in this case, the primary area becomes 2M buckets long)


phase 2:
•insertion using h1 = key mod 2M
•splitting using h2 = key mod 4M
•splitting rule: overflow of a bucket, or load factor > constant (e.g., 0.70)
•an overflow is put in the overflow area or redistributed through splitting a bucket
•buckets are split in order from n = 0 to n = 2M - 1 (after each split, n is increased by 1)
•phase 2 finishes when n = 2M (in this case, the primary area will contain 4M buckets)

phase 3: ... … h2 = …, h3 = …, ...
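The phases can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the slides' implementation: the class name `LinearHashFile`, the parameters `capacity` and `max_load`, and the choice to let a bucket list simply grow (standing in for the overflow area) are all assumptions, and only the load-factor splitting rule is modeled.

```python
class LinearHashFile:
    """Illustrative linear hashing over integer keys (hypothetical names)."""

    def __init__(self, m=4, capacity=2, max_load=0.70):
        self.m = m                    # M: initial number of buckets
        self.capacity = capacity      # assumed records per primary bucket
        self.max_load = max_load      # split when the load factor exceeds this
        self.level = 0                # i: the current phase inserts with h_i
        self.next_split = 0           # n: next bucket to be split
        self.buckets = [[] for _ in range(m)]
        self.count = 0

    def _address(self, key):
        # h_i for buckets not yet split in this phase, h_{i+1} otherwise.
        b = key % (2 ** self.level * self.m)
        if b < self.next_split:
            b = key % (2 ** (self.level + 1) * self.m)
        return b

    def insert(self, key):
        # A bucket longer than `capacity` models records in the overflow area.
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / (len(self.buckets) * self.capacity) > self.max_load:
            self._split()

    def _split(self):
        # Redistribute bucket n with h_{i+1}; the new bucket is n + 2^i * M.
        modulus = 2 ** (self.level + 1) * self.m
        old = self.buckets[self.next_split]
        self.buckets[self.next_split] = [k for k in old if k % modulus == self.next_split]
        self.buckets.append([k for k in old if k % modulus != self.next_split])
        self.next_split += 1
        if self.next_split == 2 ** self.level * self.m:
            self.level += 1           # the phase is finished
            self.next_split = 0

    def contains(self, key):
        return key in self.buckets[self._address(key)]
```

At any moment the number of buckets equals 2^i × M + n, which is why the bucket created by a split always lands at the end of the primary area.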


Multi-level index

tree
- root, internal, leaf, subtree
- parent, child, sibling

balanced, unbalanced

B+-tree
- splits on overflow; merges on underflow
- in practice it is usually 3 or 4 levels deep

search, insert, delete algorithms


B+-tree insertion: leaf node splitting, internal node splitting

Leaf splitting

When a leaf splits, a new leaf is allocated:
•the original leaf is the left sibling, the new one is the right sibling
•key and pointer pairs are redistributed: the left sibling will have smaller keys than the right sibling
•a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent

[Figure: insert 31. Before: parent (33) with leaves (12 22 33) and (44 48 55). After: parent (22 33) with leaves (12 22), (31 33) and (44 48 55).]
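The leaf-splitting rule can be sketched as follows. The function name `split_leaf` is hypothetical, and the sketch tracks keys only; record pointers and the parent update are left out.

```python
import bisect
import math


def split_leaf(keys, new_key):
    """Insert new_key into a full, sorted leaf and split it in two."""
    merged = list(keys)
    bisect.insort(merged, new_key)       # keep the keys ordered
    cut = math.ceil(len(merged) / 2)     # the left sibling keeps the smaller keys
    left, right = merged[:cut], merged[cut:]
    promoted = left[-1]                  # copy of the largest key in the left sibling
    return left, right, promoted
```

For the slide's example, `split_leaf([12, 22, 33], 31)` gives the leaves `[12, 22]` and `[31, 33]` and promotes a copy of 22 to the parent.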


Internal node splitting

If an internal node splits and it is not the root:
•insert the key and pointer and then determine the middle key
•a new 'right' sibling is allocated
•everything to its left stays in the left sibling
•everything to its right goes into the right sibling
•the middle key value along with the pointer to the new right sibling is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between this left and right sibling)

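[Figure: insert 26. Before: parent (55) above the internal node (22 33). After: parent (26 55) above the internal nodes (22) and (33).]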
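A minimal sketch of the internal-node split, assuming the new key and pointer have already been inserted so the node is overfull by one key. The function name `split_internal` and the child labels in the usage note are hypothetical.

```python
def split_internal(keys, children):
    """Split an overfull internal node around its middle key.

    keys has q-1 sorted entries and children has q pointers.
    Returns (left, middle_key, right), where left and right are
    (keys, children) pairs and middle_key moves up to the parent.
    """
    mid = len(keys) // 2
    middle_key = keys[mid]                         # moved, not copied
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, middle_key, right
```

With keys `[22, 26, 33]` and four child pointers, the middle key 26 moves to the parent, leaving `[22]` on the left and `[33]` on the right. Unlike a leaf split, the promoted key does not stay behind in either sibling.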


Internal node splitting

When a new root is formed, a key value and two pointers must be placed into it.

[Figure: insert 40. Before: root (26 55). After: new root (40) above the internal nodes (26) and (55).]


Deleting nodes from a B+-tree:

1. When deleting a key from a node A, check whether the number of remaining keys (or pointers) is still at least ⌈p/2⌉.

2. If it is not, redistribute keys with the left sibling B or the right sibling C, if possible. Otherwise, merge A and B, or merge A and C.

3. When redistributing or merging, change the key values in the parent node so that the following condition is satisfied:

•each internal node has the form < P_1, K_1, P_2, K_2, …, P_{q-1}, K_{q-1}, P_q >
•K_1 < K_2 < ... < K_{q-1} (i.e. it is an ordered set)
•for the key values, X, in the subtree pointed to by P_i:
- K_{i-1} < X <= K_i for 1 < i < q
- X <= K_1 for i = 1
- K_{q-1} < X for i = q
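The ordering condition on keys and pointers is what a search relies on when it picks the subtree to descend into. A small sketch, with the hypothetical helper name `child_index` and 0-based indexing (index 0 corresponds to the first pointer):

```python
import bisect


def child_index(keys, x):
    """Return the 0-based index of the pointer whose subtree may contain x.

    bisect_left finds the first key that is >= x, which matches the
    condition 'previous key < x <= this key' for that pointer.
    """
    return bisect.bisect_left(keys, x)
```

For an internal node with keys `[7, 8]`, the values 6 and 7 follow the first pointer, 8 follows the second, and 9 follows the third.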


A B+-tree

[Figure: root (5); internal nodes (3) and (7 8); leaves (1 3), (5), (6 7), (8), (9 12), which point to the records.]

p = 3, pleaf = 2.


Entry deletion

- deletion sequence: 8, 12, 9, 7

[Figure: root (5); internal nodes (3) and (7 9); leaves (1 3), (5), (6 7), (9), (12).]

Deleting 8 causes a redistribution between leaf nodes.


[Figure: root (5); internal nodes (3) and (7); leaves (1 3), (5), (6 7), (9).]

12 is removed.


[Figure: root (5); internal nodes (3) and (6); leaves (1 3), (5), (6), (7).]

9 is removed.


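[Figure: root (5); internal nodes (3) and (6); leaves (1 3), (5), (6). The pointer to the deleted leaf (7) is marked.]

Deleting 7 leaves this pointer unused. Therefore, a merge at the level above the leaf level occurs.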


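[Figure: internal node B (3), with leaves (1 3) and (5), and internal node C, with the single leaf (6), are merged into node A (3 5); the old root (5) is left with one pointer.]

For this merge, 5 will be taken as a key value in A since any key value in B is less than or equal to 5 but any key value in C is larger than 5.

The pointer in the old root becomes useless. The corresponding node should also be removed.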


[Figure: final tree with root (3 5) and leaves (1 3), (5), (6).]


Data modeling using the relational model and relational algebra

Relational Data Model

- relational schema, relations

- database schema, database state

- integrity constraints and updating

Relational algebra

- select, project, join, cartesian product

- division

- set operations: union, intersection, difference


Integrity Constraints

•any database will have some number of constraints that must be applied to ensure correct data (valid states)

1. domain constraints
•a domain is a restriction on the set of valid values
•domain constraints specify that the value of each attribute A must be an atomic value from the domain dom(A).

2. key constraints
•a superkey is any combination of attributes that uniquely identifies a tuple: for any two distinct tuples t1 and t2, t1[superkey] ≠ t2[superkey].
- Example: <Name, SSN> (in Employee)
•a key is a superkey that has a minimal set of attributes
- Example: <SSN> (in Employee)


Integrity Constraints

•If a relation schema has more than one key, each of them is called a candidate key.
•one candidate key is chosen as the primary key (PK)
•foreign key (FK) is defined as follows:

i) Consider two relation schemas R1 and R2;
ii) The attributes in FK in R1 have the same domain(s) as the primary key attributes PK in R2; the attributes FK are said to reference or refer to the relation R2;
iii) A value of FK in a tuple t1 of the current state r(R1) either occurs as a value of PK for some tuple t2 in the current state r(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2.

Example: Employee(SSN, …, Dno), Dept(Dno, …), where Dno in Employee is a FK referencing Dno in Dept.


Integrity Constraints

3. entity integrity
•no part of a PK can be null

4. referential integrity
•domain of FK must be same as domain of PK
•FK must be null or have a value that appears as a PK value

5. semantic integrity
•other rules that the application domain requires:
- state constraint: gross salary > net income
- transition constraint: Widowed can only follow Married; salary of an employee cannot decrease


Relational algebra

Retrieve for each female employee a list of the names of her dependents:

FEMALE_EMPS ← σ_{SEX = 'F'}(EMPLOYEE)

EMPNAMES ← π_{FNAME, LNAME, SSN}(FEMALE_EMPS)

ACTUAL_DEPENDENTS ← EMPNAMES ⋈_{SSN = ESSN} DEPENDENT

RESULT ← π_{FNAME, LNAME, DEPENDENT_NAME}(ACTUAL_DEPENDENTS)
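The same query can be traced with a small in-memory sketch of select, project and join over lists of dictionaries. The helper functions and the two sample relations are illustrative assumptions, not data from the course database.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attrs):
    """Projection: keep the listed attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out


def join(left, right, left_attr, right_attr):
    """Equijoin on left_attr = right_attr."""
    return [{**l, **r} for l in left for r in right if l[left_attr] == r[right_attr]]


# Hypothetical sample data.
EMPLOYEE = [
    {"FNAME": "Ann", "LNAME": "Lee", "SSN": "111", "SEX": "F"},
    {"FNAME": "Bob", "LNAME": "Ray", "SSN": "222", "SEX": "M"},
]
DEPENDENT = [
    {"ESSN": "111", "DEPENDENT_NAME": "Kim"},
    {"ESSN": "222", "DEPENDENT_NAME": "Max"},
]

female_emps = select(EMPLOYEE, lambda t: t["SEX"] == "F")
empnames = project(female_emps, ["FNAME", "LNAME", "SSN"])
actual_dependents = join(empnames, DEPENDENT, "SSN", "ESSN")
result = project(actual_dependents, ["FNAME", "LNAME", "DEPENDENT_NAME"])
```

Each assignment mirrors one step of the algebra expression, so the intermediate relations can be inspected one at a time.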


SQL

DDL
- creating schemas
- modifying schemas

DML
- select-from-where clause
- group by, having, order by
- update
- view


DDL - Examples:

•Create schema:
Create schema COMPANY authorization JSMITH;

•Create table:
Create table EMPLOYEE
(FNAME VARCHAR(15) NOT NULL,
 MINIT CHAR,
 LNAME VARCHAR(15) NOT NULL,
 SSN CHAR(9) NOT NULL,
 BDATE DATE,
 ADDRESS VARCHAR(30),
 SEX CHAR,
 SALARY DECIMAL(10, 2),
 SUPERSSN CHAR(9),
 DNO INT NOT NULL,
 PRIMARY KEY(SSN),
 FOREIGN KEY(SUPERSSN) REFERENCES EMPLOYEE(SSN),
 FOREIGN KEY(DNO) REFERENCES DEPARTMENT(DNUMBER));


DDL - Examples:

•drop schema
DROP SCHEMA COMPANY CASCADE;
DROP SCHEMA COMPANY RESTRICT;

•drop table
DROP TABLE DEPENDENT CASCADE;
DROP TABLE DEPENDENT RESTRICT;

•alter table
ALTER TABLE COMPANY.EMPLOYEE ADD JOB VARCHAR(12);
ALTER TABLE COMPANY.EMPLOYEE DROP ADDRESS CASCADE;


DML - select-from-where clause

Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, ordered alphabetically by last name, first name:

SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER
ORDER BY DNAME, LNAME, FNAME

•order by clause
•group by clause
•having clause
•aggregation functions: max, min, average, count, sum


DML - select-from-where clause

•Insert
•Update
•Delete

INSERT INTO employee ( fname, lname, ssn, dno )
VALUES ( "Joe", "Smith", 909, 1);

UPDATE employee SET salary = 100000
WHERE ssn=909;

DELETE FROM employee WHERE ssn=909;

Note that Access changes the INSERT above to read:
INSERT INTO employee ( fname, lname, ssn, dno )
SELECT "Joe", "Smith", 909, 1;


View definition

•Use a Create View command

•essentially a select specifying the data that makes up the view

•Create View Enames as select lname, fname from employee

CREATE VIEW Enames (lname, fname)
AS SELECT LNAME, FNAME FROM EMPLOYEE


CREATE VIEW DEPT_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
AS SELECT DNAME, COUNT(*), SUM(SALARY)
FROM DEPARTMENT, EMPLOYEE
WHERE DNUMBER = DNO
GROUP BY DNAME;
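To see what the DEPT_INFO view returns, it can be tried against a throwaway in-memory SQLite database from Python. The two table definitions are cut down to the columns the view needs and the rows are made-up sample data, so this is only a sketch under those assumptions; SQLite's type handling also differs from other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNUMBER INT PRIMARY KEY, DNAME VARCHAR(15));
CREATE TABLE EMPLOYEE (
    SSN CHAR(9) PRIMARY KEY,
    LNAME VARCHAR(15),
    SALARY DECIMAL(10, 2),
    DNO INT REFERENCES DEPARTMENT(DNUMBER)
);
-- Hypothetical sample rows.
INSERT INTO DEPARTMENT VALUES (1, 'Research'), (2, 'Admin');
INSERT INTO EMPLOYEE VALUES
    ('111', 'Smith', 30000, 1),
    ('222', 'Wong', 40000, 1),
    ('333', 'Zelaya', 25000, 2);

CREATE VIEW DEPT_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
AS SELECT DNAME, COUNT(*), SUM(SALARY)
FROM DEPARTMENT, EMPLOYEE
WHERE DNUMBER = DNO
GROUP BY DNAME;
""")

# The view is queried like a table; it is recomputed from the base tables.
rows = conn.execute("SELECT * FROM DEPT_INFO ORDER BY DEPT_NAME").fetchall()
```

Because the view stores no data of its own, inserting another employee into EMPLOYEE changes what the next query on DEPT_INFO returns.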


Normalization

functional dependencies
- data redundancy, update anomalies
- what is a functional dependency?
- inference rules, minimal set of FDs

normal forms
- first normal form
- second normal form
- third normal form
- Boyce-Codd normal form


Data redundancy and update anomalies:

EmployeeDepartment(ename, ssn, bdate, address, dnumber, dname)

This is similar to Employee, but we have included dname.


In the two prior cases with EmployeeDepartment and EmployeeProject, we have redundant information in the database …

•if two employees work in the same department, then that department name is replicated

•if more than one employee works on a project then the project location is replicated

•if an employee works on more than one project his/her name is replicated

Redundant data leads to

•additional space requirements

•update anomalies


Suppose EmployeeDepartment is the only relation where department name is recorded

insert anomalies

•adding a new department is complicated unless there is also an employee for that department

deletion anomalies

•if we delete all employees for some department, what should happen to the department information?

modification anomalies

•if we change the name of a department, then we must change it in all tuples referring to that department


Functional dependencies:

Suppose we have a relation R comprising attributes X,Y, …

We say a functional dependency exists between the attributes X and Y,

if, whenever a tuple exists with the value x for X, it will always have the same value y for Y.

X → Y, where X is the left-hand side (LHS) and Y is the right-hand side (RHS).


Student(student_no, student_name, course_no, gender)

Given a specific student number, there is only one value for student name and only one value for gender found with it.

student_no → student_name
student_no → gender


Inference Rules for Functional Dependencies

•From a set of FDs, we can derive some other FDs

Example:

F = {ssn → {ename, bdate, address, dnumber},
     dnumber → {dname, dmgrssn}}

By inference:

ssn → {dname, dmgrssn}, ssn → dnumber, dnumber → dname.

•F+ (closure of F): The set of all FDs that can be deduced from F, together with F itself, is called the closure of F.


Inference Rules for Functional Dependencies

•Inference rules:

- IR1 (reflexive rule): If X ⊇ Y, then X → Y. (X → X.)

- IR2 (augmentation rule): {X → Y} |= ZX → ZY.

- IR3 (transitive rule): {X → Y, Y → Z} |= X → Z.

- IR4 (decomposition, or projective, rule): {X → YZ} |= X → Y, X → Z.

- IR5 (union, or additive, rule): {X → Y, X → Z} |= X → YZ.

- IR6 (pseudotransitive rule): {X → Y, WY → Z} |= WX → Z.
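In practice the rules are rarely applied by hand. A common shortcut is to compute the closure of an attribute set X under F: X → Y follows from F exactly when Y is contained in that closure. The sketch below assumes FDs are given as (LHS, RHS) pairs of Python sets; the function name `attribute_closure` is illustrative, and the set `F` is the example used with the inference rules.

```python
def attribute_closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already determined, so is the RHS.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


F = [
    ({"ssn"}, {"ename", "bdate", "address", "dnumber"}),
    ({"dnumber"}, {"dname", "dmgrssn"}),
]
```

Here `attribute_closure({"ssn"}, F)` contains `dname` and `dmgrssn`, which confirms the inferred dependency ssn → {dname, dmgrssn}.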


Equivalence of Sets of FDs

E and F are equivalent if E+ = F+.

Minimal sets of FDs

•every dependency has a single attribute on the RHS

•the attributes on the LHS of a dependency are minimal

•we cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.

Example: EmployeeProject(ssn, pnumber, hours, ename, plocation)

{ssn, pnumber} → hours
ssn → ename
pnumber → plocation


Normal Forms

•A series of normal forms are known that have, successively, better update characteristics.

•We’ll consider 1NF, 2NF, 3NF, and BCNF.

•A technique used to improve a relation is decomposition, where one relation is replaced by two or more relations. When we do so, we want to eliminate update anomalies without losing any information.


1NF - First Normal Form

The domain of an attribute must only contain atomic values.

•This disallows repeating values, sets of values, relations within relations, nested relations, …

•In the example database we have a department located in possibly several locations: department 5 is located in Bellaire, Sugarland, and Houston.

•If we had the relation

Department
dnumber  dname     dmgrssn    dlocations
5        Research  333445555  Bellaire, Sugarland, Houston

then it would not be 1NF because there are multiple values to be kept in dlocations.


1NF - First Normal Form

If we have a non-1NF relation we can decompose it, or modify it appropriately, to generate 1NF relations.

There are 3 options:

•option 1: split off the problem attribute into a new relation (create a DepartmentLocation relation).

Department
dnumber  dname     dmgrssn
5        Research  333445555

DepartmentLocation
dnumber  dlocation
5        Bellaire
5        Sugarland
5        Houston

Generally considered the best solution


2NF - Second Normal Form

•full functional dependency

X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.

EmployeeProject(ssn, pnumber, hours, ename, plocation)

{ssn, pnumber} → hours is a full dependency (neither ssn → hours nor pnumber → hours holds).


2NF - Second Normal Form

•partial functional dependency

X → Y is a partial functional dependency if removal of some attribute A from X does not affect the dependency.

EmployeeProject(ssn, pnumber, hours, ename, plocation)

{ssn, pnumber} → ename is a partial dependency (because ssn → ename holds).


2NF - Second Normal Form

A relation schema is in 2NF if

(1) it is in 1NF and

(2) every non-key attribute must be fully functionally dependent on the primary key.

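If we had the relation

EmployeeProject(ssn, pnumber, hours, ename, plocation)

then this relation would not be 2NF because of two separate violations of the 2NF definition: ename depends on ssn alone (ssn → ename) and plocation depends on pnumber alone (pnumber → plocation), so both are only partially dependent on the key {ssn, pnumber}.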


2NF - Second Normal Form

•We correct this by decomposing the relation into three relations - splitting off the offending attributes - splitting off partial dependencies on the key.

EmployeeProject(ssn, pnumber, hours, ename, plocation)

is decomposed into three 2NF relations:

(ssn, pnumber, hours)
(ssn, ename)
(pnumber, plocation)


3NF - Third Normal Form

•Transitive dependency

A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is not a subset of any key of R, and both X → Z and Z → Y hold.

EmployeeDept(ename, ssn, bdate, address, dnumber, dname)

ssn → dnumber and dnumber → dname


3NF - Third Normal Form

A relation schema is in 3NF if

(1) it is in 2NF and

(2) each non-key attribute must not be fully functionally dependent on another non-key attribute (there must be no transitive dependency of a non-key attribute on the PK)

•If we had the relation

(ename, ssn, bdate, address, dnumber, dname)

then this relation would not be 3NF because dname is functionally dependent on dnumber and neither is a key attribute.


3NF - Third Normal Form

•We correct this by decomposing - splitting off the transitive dependencies

EmployeeDept(ename, ssn, bdate, address, dnumber, dname)

is decomposed into two 3NF relations:

(ename, ssn, bdate, address, dnumber)
(dnumber, dname)


Boyce Codd Normal Form, BCNF

•Consider a different definition of 3NF, which is equivalent to the previous one.

A relation schema R is in 3NF if, whenever a functional dependency X → A holds in R, either

(a) X is a superkey of R, or

(b) A is a prime attribute of R.

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal state r of R will have t1[S] = t2[S]. An attribute is called a prime attribute if it is a member of some key.


Boyce Codd Normal Form, BCNF

•If we remove (b) from the previous definition for 3NF, we have the definition for BCNF.

•A relation schema is in BCNF if every determinant is a superkey. Stronger than 3NF:

- no partial dependencies

- no transitive dependencies where a non-key attribute is dependent on another non-key attribute

- no non-key attributes appear in the LHS of a functional dependency.


Boyce Codd Normal Form, BCNF

Consider:

student_no course_no instr_no

Instructor teaches one course only.

Student takes a course and has one instructor.

In 3NF!

{student_no, course_no} → instr_no

instr_no → course_no
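A short check makes the difference between 3NF and BCNF concrete for this relation. This is a sketch with hypothetical helper names; it tests only the two dependencies stated on the slide.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]


schema = {"student_no", "course_no", "instr_no"}
fds = [
    (frozenset({"student_no", "course_no"}), frozenset({"instr_no"})),
    (frozenset({"instr_no"}), frozenset({"course_no"})),
]

violations = bcnf_violations(schema, fds)

# {student_no, course_no} determines every attribute, so course_no is prime.
key_is_superkey = closure({"student_no", "course_no"}, fds) == schema

print(violations)
print(key_is_superkey)
```

The dependency instr_no → course_no breaks BCNF because instr_no is not a superkey, yet it is allowed in 3NF because course_no belongs to the key {student_no, course_no}.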


Boyce Codd Normal Form, BCNF

This decomposition preserves all the information.

student_no (S#)   instr_no (I#)
121               99
121               77
222               66
222               77

course_no (C#)    instr_no (I#)
1803              99
1903              77
1803              66

Only FD is instr_no → course_no

but the join preserves

{student_no, course_no} → instr_no
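Joining the two smaller relations on instr_no shows what "preserves" means here. The sketch below uses the rows as read from the slide; the variable names are illustrative only.

```python
# (student_no, instr_no) and (course_no, instr_no), as read from the slide.
student_instr = {(121, 99), (121, 77), (222, 66), (222, 77)}
course_instr = {(1803, 99), (1903, 77), (1803, 66)}

# Natural join on the shared attribute instr_no.
joined = {
    (student, course, instr)
    for (student, instr) in student_instr
    for (course, instr2) in course_instr
    if instr == instr2
}

# Check that {student_no, course_no} -> instr_no holds in the joined relation.
instr_by_pair = {}
fd_holds = True
for student, course, instr in joined:
    if instr_by_pair.setdefault((student, course), instr) != instr:
        fd_holds = False

for row in sorted(joined):
    print(row)
print(fd_holds)
```

Each student/course pair appears with exactly one instructor after the join, even though that dependency is not declared on either of the two relations.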


Definition of lossless join property

- relation decomposition

- lossless join property

Testing algorithm

- matrix construction

- matrix initialization

- matrix modification

Lossless join


•Basic definition of Lossless-join

A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds:

⋈(π_R1(r), ..., π_Rm(r)) = r,

where π_Ri(r) is the projection of r onto the attributes of Ri and ⋈ is the natural join of all the relations in D.

The word loss in lossless refers to loss of information, not to loss of tuples.
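The remark about losing information rather than tuples can be seen on a tiny instance: a lossy decomposition returns more tuples after the join, not fewer. The following sketch uses hypothetical data (the names and values are invented for illustration) and hypothetical helper functions `project` and `natural_join`.

```python
def project(rows, schema, attrs):
    """Project tuples over schema onto attrs; duplicate tuples collapse."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(rows1, schema1, rows2, schema2):
    """Natural join on the attributes shared by the two schemas."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    joined = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[schema1.index(a)] == r2[schema2.index(a)] for a in common):
                joined.add(r1 + tuple(r2[schema2.index(a)] for a in extra))
    return joined, schema1 + extra


# Hypothetical instance: two employees on different projects in the same city.
schema = ["ename", "pnum", "plocation"]
r = {("Smith", 1, "Houston"), ("Wong", 2, "Houston")}

s1, s2 = ["ename", "plocation"], ["pnum", "plocation"]
joined, joined_schema = natural_join(
    project(r, schema, s1), s1, project(r, schema, s2), s2
)
rejoined = project(joined, joined_schema, schema)  # restore column order
spurious = rejoined - r

print(sorted(rejoined))
print(sorted(spurious))
```

Every original tuple is still present, but the join can no longer tell which employee works on which project, so two spurious tuples appear.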


Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)

F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}

R1(SSN, ENAME)

R2(PNUM, PNAME, PLOCATION)

R3(SSN, PNUM, hours)

Lossless join


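•decomposition-1

Matrix construction (one row per relation Ri, one column per attribute Aj):

      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    b11    b12     b13    b14     b15         b16
R2    b21    b22     b23    b24     b25         b26
R3    b31    b32     b33    b34     b35         b36

Matrix initialization (entry (i, j) becomes aj if Ri contains Aj):

      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    a1     a2      b13    b14     b15         b16
R2    b21    b22     a3     a4      a5          b26
R3    a1     b32     a3     b34     b35         a6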


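Matrix modification, applying SSN → ENAME (R1 and R3 agree on SSN, so b32 becomes a2):

      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    a1     a2      b13    b14     b15         b16
R2    b21    b22     a3     a4      a5          b26
R3    a1     a2      a3     b34     b35         a6

Matrix modification, applying PNUM → {PNAME, PLOCATION} (R2 and R3 agree on PNUM, so b34 and b35 become a4 and a5):

      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    a1     a2      b13    b14     b15         b16
R2    b21    b22     a3     a4      a5          b26
R3    a1     a2      a3     a4      a5          a6

Row R3 now consists only of a symbols, so the decomposition has the lossless join property.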


•Example: decomposition-2

Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)

F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}

R1(ENAME, PLOCATION)

R2(SSN, PNUM, hours, PNAME, PLOCATION)

Not lossless join


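•decomposition-2

Matrix construction:

      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    b11    b12     b13    b14     b15         b16
R2    b21    b22     b23    b24     b25         b26

Matrix initialization:

      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    b11    a2      b13    b14     a5          b16
R2    a1     b22     a3     a4      a5          a6

The matrix cannot be changed! The two rows do not agree on the left-hand side of any of the dependencies:

SSN → ENAME

PNUM → {PNAME, PLOCATION}

{SSN, PNUM} → hours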
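The matrix construction, initialization, and modification steps can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the author's code: the function name `lossless_join_test` is hypothetical, symbols are stored as strings such as "a1" and "b13", and when two symbols are equated every occurrence in that column is replaced.

```python
def lossless_join_test(attributes, decomposition, fds):
    """Return (is_lossless, final_matrix) for the matrix-based test."""
    col = {a: j for j, a in enumerate(attributes)}
    # Construction and initialization: a_j where Ri contains Aj, else b_ij.
    matrix = [
        [f"a{j + 1}" if a in ri else f"b{i + 1}{j + 1}"
         for j, a in enumerate(attributes)]
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:  # modification: repeat until no dependency changes the matrix
        changed = False
        for lhs, rhs in fds:
            # Group rows that carry identical symbols in all left-hand-side columns.
            groups = {}
            for row in matrix:
                groups.setdefault(tuple(row[col[a]] for a in lhs), []).append(row)
            for rows in groups.values():
                for a in rhs:
                    j = col[a]
                    symbols = {row[j] for row in rows}
                    if len(symbols) > 1:
                        # Prefer an "a" symbol; otherwise pick one "b" symbol.
                        target = min(symbols, key=lambda s: (not s.startswith("a"), s))
                        for row in matrix:
                            if row[j] in symbols:
                                row[j] = target
                        changed = True
    lossless = any(all(s.startswith("a") for s in row) for row in matrix)
    return lossless, matrix


attributes = ["SSN", "ENAME", "PNUM", "PNAME", "PLOCATION", "hours"]
fds = [
    (["SSN"], ["ENAME"]),
    (["PNUM"], ["PNAME", "PLOCATION"]),
    (["SSN", "PNUM"], ["hours"]),
]

d1 = [{"SSN", "ENAME"}, {"PNUM", "PNAME", "PLOCATION"}, {"SSN", "PNUM", "hours"}]
d2 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUM", "hours", "PNAME", "PLOCATION"}]

ok1, m1 = lossless_join_test(attributes, d1, fds)
ok2, m2 = lossless_join_test(attributes, d2, fds)
print(ok1, m1[2])
print(ok2, m2)
```

For decomposition-1 the row for R3 ends up with only a symbols, so the test succeeds; for decomposition-2 no dependency applies and neither row is all a symbols, so the test fails.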


Hierarchical database schema

- hierarchical schema

- record type, PCR type

- virtual PCR: virtual child, virtual parent

Database languages

- HDDL

- HDML

Hierarchical databases


[Figure: ERD for Chapter 6 database example. Entities: employee, department, project, dependent, Dept_locations. Relationships include "Works on"; edges are labeled with cardinalities 1, n, and m.]


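•Virtual Parent-child Relationships - Hierarchical schema using VPCR - for a Company database

[Figure: hierarchical schema. Record types and the fields visible on the slide:

Department: Dname, Dnum
Project: Pname, … ...
Dlocation: Location
Demployee: EPTR
Dmanager: MPTR
Pworker: Hours, WPTR
Employee: Ename, Minit, … ...
Esupervisee: SPTR
Dependent: DEPname, Minit, ...

The figure also shows a StartDate field and the record-type labels D, E, L, P, Y, M, W, S, T.]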