IELM 511: Information System design


Introduction

Part I: ISD for well structured data – relational and other DBMS

- Info storage (modeling, normalization)
- Info retrieval (Relational algebra, Calculus, SQL)
- DB integrated APIs

Part II: ISD for systems with non-uniformly structured data

Part III: (one out of)

- Basics of web-based IS (www, web2.0, …)
- Markups, HTML, XML
- Design tools for Info Sys: UML
- APIs for mobile apps
- Security, Cryptography
- IS product lifecycles
- Algorithm analysis, P, NP, NPC

Agenda

Relational design

1. Converting ER diagram into (a set of) DB tables

2. Normal forms – a theoretical basis for RDB design

(Recap) Example: Banking system

The bank is organized into branches. Each branch is located in a particular city and identified by a unique name. The bank monitors the assets of each branch.

Customers are identified by their SSN (equivalent to HKID). The bank stores each customer’s name and address. Customers may have accounts and can take out loans. A customer may be associated with a particular banker, who may act as a loan officer or personal banker for that customer.

Bank employees are also identified by SSN. The bank stores the name, address, phone number, and employment start date of each employee, the names of all dependents of the employee, and the employee’s manager.

The bank offers two types of accounts: savings and checking. Accounts can be held by more than one customer, and a customer may have many accounts. Each account has a unique account number. We store each account’s balance, and the most recent date when the account was accessed by each customer holding the account. Each savings account has an interest rate, and overdrafts are recorded for each checking account.

A loan originates at a particular branch and is held by one or more customers. Each loan has a unique number. For each loan, the bank stores the loan amount and the payments (date and amount). Payment numbers are not unique, but a payment number uniquely identifies a payment for a specific loan.

(Recap) Bank ER

[Figure: ER diagram of the banking system; its relationships carry 1:n and m:n cardinality labels]


Converting ER into Relational tables: rationale

There is an informal set of rules for converting ER diagrams into tables.

These rules yield a very good initial design for most DBs.

Normalization can then be used to verify or improve this initial design.

Basic terminology

- All data is stored in tables. Columns: attributes; rows: tuples

- Domain of an attribute A, $\mathrm{dom}(A)$: the set of values that A can take

- Schema: TableName( A1, A2, …, An)

- Tuple t of R(A1, A2, …, An): an ORDERED set of values $\langle v_1, v_2, \dots, v_n \rangle$ with $v_i \in \mathrm{dom}(A_i)$

All tables in a DB must obey four types of constraints

Constraints on DB tables

A. Domain constraints

$t[A_i] \in \mathrm{dom}(A_i)$ for every tuple $t$ and every attribute $A_i$

B. Key constraints

Superkey of R: a set of attributes SK of R such that $t_1[SK] \neq t_2[SK]$ for any two tuples $t_1 \neq t_2$

Key: a minimal superkey of R

Minimal: removing any attribute from the key yields a set that is no longer a superkey of R

Constraints on tables..

B. Key constraints, examples:

CAR( State, LicensePlateNo, VehicleID, Model, Year, Manufacturer)

K1 = {State, LicensePlateNo}: K1 is a minimal superkey, so K1 is a key

K2 = {VehicleID}: K2 is a key (why?)

K3 = {VehicleID, Manufacturer}: superkey? key?
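To make the superkey and key definitions concrete, here is a minimal Python sketch that tests them against one table instance stored as a list of dicts. The CAR rows are made-up sample data, and the function names `is_superkey` and `is_key` are illustrative. Keep in mind that a single instance can only refute a key; whether a set of attributes is a key is a property of the schema and what it means, not of one snapshot of data.

```python
def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs.

    rows: list of dicts (one dict per tuple). On a single instance this can
    only refute a superkey, never prove one.
    """
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in sorted(attrs))
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """A superkey from which no attribute can be removed (i.e., minimal)."""
    attrs = set(attrs)
    if not is_superkey(rows, attrs):
        return False
    # Supersets of a superkey are superkeys, so it is enough to check that
    # dropping any single attribute breaks the superkey property.
    return all(not is_superkey(rows, attrs - {a}) for a in attrs)


# Hypothetical CAR instance.
cars = [
    {"State": "CA", "LicensePlateNo": "ABC123", "VehicleID": "V1", "Model": "Civic", "Year": 2019, "Manufacturer": "Honda"},
    {"State": "NV", "LicensePlateNo": "ABC123", "VehicleID": "V2", "Model": "Civic", "Year": 2019, "Manufacturer": "Honda"},
    {"State": "CA", "LicensePlateNo": "XYZ789", "VehicleID": "V3", "Model": "Model 3", "Year": 2021, "Manufacturer": "Tesla"},
]

print(is_key(cars, {"State", "LicensePlateNo"}))         # True
print(is_superkey(cars, {"VehicleID", "Manufacturer"}))  # True
print(is_key(cars, {"VehicleID", "Manufacturer"}))       # False: VehicleID alone suffices
```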

Constraints on tables...

C. Entity Integrity constraints

If PK is the primary key of R, then $t[PK] \neq \text{NULL}$ for every tuple $t \in r(R)$

D. Referential constraints

- All referential constraints must be defined

- If X(Ai) references Y(Bj), then $\mathrm{dom}(A_i) = \mathrm{dom}(B_j)$

- Foreign key: attributes that reference a primary key (of another table, or of the same table)

Foreign Key examples

CUSTOMER( SSN, Name, Address, BankerSSN)

EMPLOYEE( SSN, Name, StartDate, TelNo, MgrSSN)

FKs: CUSTOMER.BankerSSN references EMPLOYEE(SSN); EMPLOYEE.MgrSSN references EMPLOYEE(SSN)
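The constraints above can be declared directly in SQL. The sketch below uses Python's built-in `sqlite3` module with the CUSTOMER/EMPLOYEE example; the sample rows and the helper `violates` are hypothetical. Two SQLite-specific details are assumed: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and the PK columns are declared `NOT NULL` explicitly because SQLite, for legacy reasons, otherwise accepts NULL in a non-INTEGER primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    SSN       TEXT NOT NULL PRIMARY KEY,      -- entity integrity: PK must not be NULL
    Name      TEXT,
    StartDate TEXT,
    TelNo     TEXT,
    MgrSSN    TEXT REFERENCES EMPLOYEE(SSN)   -- self-referencing FK
);
CREATE TABLE CUSTOMER (
    SSN       TEXT NOT NULL PRIMARY KEY,
    Name      TEXT,
    Address   TEXT,
    BankerSSN TEXT REFERENCES EMPLOYEE(SSN)   -- FK to the banker's employee row
);
""")

conn.execute("INSERT INTO EMPLOYEE VALUES ('E1', 'Ann', '2020-01-06', '555-0100', NULL)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('E2', 'Bob', '2021-03-01', '555-0101', 'E1')")
conn.execute("INSERT INTO CUSTOMER VALUES ('C1', 'Carol', 'Kowloon', 'E2')")


def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(violates("INSERT INTO CUSTOMER VALUES ('C2', 'Dave', 'Central', 'E9')"))  # True: no employee E9
print(violates("INSERT INTO CUSTOMER VALUES (NULL, 'Eve', 'Sha Tin', 'E1')"))  # True: NULL primary key
```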

Converting ER into Relational tables

1. For each regular entity type E: create one table E with all the simple attributes of E.

Select a primary key for E, and mark it.

2. For each binary relation type, R, between entity types, S and T:

For a 1:1 relationship between S and T: either add PK(S) as an FK in T, or add PK(T) as an FK in S.

For a 1:N relationship between S and T (S: the N-side): add PK(T) as a foreign key in S.

For an M:N relationship R between S and T: create a new table R with the PKs of S and T as FKs of R, plus any attributes of R.
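Rules 1 and 2 are mechanical enough to sketch in code. The Python function below (`map_binary`, a hypothetical helper) represents each table as a list of attribute names and applies rule 2 to one relationship. The relationship names such as `LOAN_BRANCH` and `WORKS_FOR`, and the optional `rename` argument (used to give FK columns the names chosen for the bank design, e.g. `br-name`, `mgr-ssn`), are assumptions for illustration.

```python
def map_binary(tables, pks, rel, s, t, kind, rel_attrs=(), rename=None):
    """Apply rule 2 to one binary relationship `rel` between entity tables S and T.

    tables:    dict table name -> list of attribute names (updated in place)
    pks:       dict table name -> list of primary-key attribute names
    kind:      "1:1", "1:N" (S is the N side) or "M:N"
    rel_attrs: attributes of the relationship itself
    rename:    optional dict giving the FK column name to use for a PK attribute
    """
    rename = rename or {}
    fk_t = [rename.get(a, a) for a in pks[t]]
    if kind in ("1:1", "1:N"):
        # Add PK(T) as a foreign key in S; `rel` gets no table of its own.
        tables[s] += fk_t + list(rel_attrs)
    elif kind == "M:N":
        # New table R: PK(S) and PK(T) as FKs (together R's PK) plus R's attributes.
        fk_s = [rename.get(a, a) for a in pks[s]]
        tables[rel] = fk_s + fk_t + list(rel_attrs)
        pks[rel] = fk_s + fk_t
    else:
        raise ValueError(f"unknown cardinality: {kind!r}")


# Step 1 tables (a subset of the bank design).
tables = {
    "BRANCH": ["b-name", "city", "assets"],
    "CUSTOMER": ["cssn", "c-name", "street", "city"],
    "LOAN": ["l-no", "amount"],
    "EMPLOYEE": ["e-ssn", "e-name", "tel", "start-date"],
}
pks = {"BRANCH": ["b-name"], "CUSTOMER": ["cssn"], "LOAN": ["l-no"], "EMPLOYEE": ["e-ssn"]}

map_binary(tables, pks, "LOAN_BRANCH", "LOAN", "BRANCH", "1:N", rename={"b-name": "br-name"})
map_binary(tables, pks, "CUST_BANKER", "CUSTOMER", "EMPLOYEE", "1:N",
           rel_attrs=["banker-type"], rename={"e-ssn": "banker"})
map_binary(tables, pks, "WORKS_FOR", "EMPLOYEE", "EMPLOYEE", "1:N", rename={"e-ssn": "mgr-ssn"})
map_binary(tables, pks, "BORROWS", "CUSTOMER", "LOAN", "M:N",
           rename={"cssn": "cust-ssn", "l-no": "loan-num"})

for name, attrs in tables.items():
    print(f"{name}( {', '.join(attrs)})")
```

A recursive relationship (an employee and his or her manager) is handled the same way; renaming the FK column is what keeps it distinct from the table's own PK.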

Converting ER into Relational tables..

3. For each weak entity type W whose identifying entity is E: create one table W with all attributes of W and the primary key of E. Mark the primary key (PK of E plus the partial key of W).

4. For each multi-valued attribute A: create a new table R including A, plus the PK of the entity/relationship containing A.

5. For each n-ary relationship R with degree > 2: create a table R with the PK of each participating entity as an FK, plus all simple attributes of R.
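Rules 3 and 4 can be illustrated with SQLite DDL for PAYMENT (a weak entity of LOAN) and DEPENDENT (the multi-valued dependents attribute of EMPLOYEE). This is a sketch: hyphens in the attribute names are replaced by underscores, `date` is renamed `pay_date`, and the sample values are invented. The composite primary key (l_no, pay_no) lets the same payment number appear under different loans, but not twice for one loan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE LOAN (
    l_no   TEXT NOT NULL PRIMARY KEY,
    amount REAL
);
-- Rule 3: weak entity PAYMENT = PK(LOAN) + its partial key pay_no.
CREATE TABLE PAYMENT (
    l_no     TEXT NOT NULL REFERENCES LOAN(l_no),
    pay_no   INTEGER NOT NULL,
    pay_date TEXT,
    amount   REAL,
    PRIMARY KEY (l_no, pay_no)
);
CREATE TABLE EMPLOYEE (
    e_ssn  TEXT NOT NULL PRIMARY KEY,
    e_name TEXT
);
-- Rule 4: the multi-valued attribute (dependents) gets its own table.
CREATE TABLE DEPENDENT (
    emp_ssn  TEXT NOT NULL REFERENCES EMPLOYEE(e_ssn),
    dep_name TEXT NOT NULL,
    PRIMARY KEY (emp_ssn, dep_name)
);
INSERT INTO LOAN VALUES ('L-17', 1000.0), ('L-23', 2000.0);
INSERT INTO EMPLOYEE VALUES ('E1', 'Ann');
INSERT INTO DEPENDENT VALUES ('E1', 'Amy'), ('E1', 'Ben');
""")

# Payment number 1 may repeat across different loans ...
conn.execute("INSERT INTO PAYMENT VALUES ('L-17', 1, '2024-01-05', 100.0)")
conn.execute("INSERT INTO PAYMENT VALUES ('L-23', 1, '2024-01-07', 250.0)")

# ... but not within the same loan: the composite PK rejects it.
try:
    conn.execute("INSERT INTO PAYMENT VALUES ('L-17', 1, '2024-02-05', 100.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```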

Converting ER into Relational tables…

6. Specializations*

If P is the highest-level entity type of a specialization (generalization) hierarchy, with specialization entity types R and S, then:

Create a table for P, with each regular attribute of P.

Create a table for each of R and S, each with all of their respective attributes plus the primary key of P.

* We will ignore other, special cases of specialization.
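For rule 6, a minimal SQLite sketch of the ACCOUNT specialization: the subclass tables SACCOUNT and CACCOUNT reuse PK(ACCOUNT) both as their own primary key and as a foreign key. Column names use underscores instead of hyphens, and the rows are made up. A complete savings-account record is recovered with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One table for the general entity type P (ACCOUNT) ...
CREATE TABLE ACCOUNT (ac_no TEXT NOT NULL PRIMARY KEY, balance REAL);
-- ... and one per specialization, keyed by (and referencing) PK(ACCOUNT).
CREATE TABLE SACCOUNT (ac_no TEXT NOT NULL PRIMARY KEY REFERENCES ACCOUNT(ac_no), int_rate REAL);
CREATE TABLE CACCOUNT (ac_no TEXT NOT NULL PRIMARY KEY REFERENCES ACCOUNT(ac_no), od_amt REAL);
INSERT INTO ACCOUNT VALUES ('A-101', 500.0), ('A-102', 900.0);
INSERT INTO SACCOUNT VALUES ('A-101', 0.02);
INSERT INTO CACCOUNT VALUES ('A-102', 0.0);
""")

# The full savings-account record: general attributes + specialized ones.
savings = conn.execute("""
    SELECT a.ac_no, a.balance, s.int_rate
    FROM ACCOUNT AS a JOIN SACCOUNT AS s ON a.ac_no = s.ac_no
""").fetchall()
print(savings)  # [('A-101', 500.0, 0.02)]
```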

Initial DB design for the bank ER: step 1 (entities)

BRANCH( b-name, city, assets, …)

CUSTOMER( cssn, c-name, street, city, ….)

LOAN( l-no, amount, ….)

PAYMENT( l-no, pay-no, date, amount, ….)

EMPLOYEE( e-ssn, e-name, tel, start-date, ….)

ACCOUNT( ac-no, balance, ….)

SACCOUNT( ac-no, int-rate, ….)

CACCOUNT( ac-no, od-amt, ….)


Initial DB design: step 2 (1-1, 1-n relationships)


CUSTOMER( cssn, c-name, street, city, banker, banker-type, ….)

LOAN( l-no, amount, br-name, ….)


EMPLOYEE( e-ssn, e-name, tel, start-date, mgr-ssn, ….)





Initial DB design: step 3 (m-n relationships)


CUSTOMER( cssn, c-name, street, city, banker, banker-type, ….)

LOAN( l-no, amount, br-name, ….)


EMPLOYEE( e-ssn, e-name, tel, start-date, mgr-ssn, ….)




BORROWS( cust-ssn, loan-num, ….)

DEPOSIT( c-ssn, ac-num, access-date….)


Initial DB design: step 4 (multi-valued attributes)

BRANCH( b-name, city, assets)

CUSTOMER( cssn, c-name, street, city, banker, banker-type)

LOAN( l-no, amount, br-name)

PAYMENT( l-no, pay-no, date, amount)

EMPLOYEE( e-ssn, e-name, tel, start-date, mgr-ssn)

ACCOUNT( ac-no, balance)

SACCOUNT( ac-no, int-rate)

CACCOUNT( ac-no, od-amt)

BORROWS( cust-ssn, loan-num)

DEPOSIT( c-ssn, ac-num, access-date)

DEPENDENT( emp-ssn, dep-name)

Normalization: the theoretical basis for RDB design

How can we tell if a DB design is ‘Good’?

A DB design is good if:

(1) it provides a way to store all information in the system

(2) the design is not bad

How can we tell if a DB design is ‘Bad’?

Bad DB design examples:

CUST_LOAN( cssn, cname, addr, banker, banker-type, loan-no, amt, branch)

CUST_DEPOSIT( cssn, cname, addr, banker, banker-type, ac-no, bal, access-date)

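(a) Information is stored redundantly: a customer’s name, address, and banker are repeated for every loan (or account) the customer holds

(b) Insertion anomalies: a customer who has no loan cannot be recorded in CUST_LOAN without NULL values in the loan columns

(c) Deletion anomalies: deleting a customer’s only loan also deletes that customer’s name and address

(d) Modification anomalies: changing a customer’s address requires updating every row for that customer; missing one leaves inconsistent data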
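The anomalies can be reproduced on a tiny, made-up CUST_LOAN instance in plain Python (banker columns are omitted for brevity). The sketch shows the redundant copies (a), a partial update that leaves one customer with two addresses (d), and a deletion that erases a customer entirely (c).

```python
# Hypothetical CUST_LOAN instance: one row per (customer, loan) pair.
cust_loan = [
    {"cssn": "C1", "cname": "Chan", "addr": "Mong Kok", "loan-no": "L-17", "amt": 1000, "branch": "Central"},
    {"cssn": "C1", "cname": "Chan", "addr": "Mong Kok", "loan-no": "L-23", "amt": 2000, "branch": "Sha Tin"},
    {"cssn": "C2", "cname": "Wong", "addr": "Tai Po", "loan-no": "L-31", "amt": 500, "branch": "Central"},
]


def addresses_of(rows, cssn):
    """Distinct addresses recorded for one customer (should be exactly one)."""
    return {r["addr"] for r in rows if r["cssn"] == cssn}


# (a) Redundancy: C1's name and address are stored once per loan.
copies = sum(1 for r in cust_loan if r["cssn"] == "C1")

# (d) Modification anomaly: updating only the first matching row
# leaves two conflicting addresses for the same customer.
cust_loan[0]["addr"] = "Kowloon Tong"
print(copies, addresses_of(cust_loan, "C1"))

# (c) Deletion anomaly: removing C2's only loan also erases C2's address.
cust_loan = [r for r in cust_loan if r["loan-no"] != "L-31"]
print(addresses_of(cust_loan, "C2"))  # set()
```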

Normalization: the theoretical basis for RDB design..

Design requirement: Avoid too many NULL values in some rows

STUDENT( SID, Name, Phone, Email, SocietyName, MemberNo)

OR

STUDENT( SID, Name, Phone, Email)

MEMBERSHIP( SID, SocietyName, MembershipNo)

Bad DB Designs..

- Spurious Tuples must not be created when ‘join’-ing tables

PROJECT_PARTS

PartNo   ProjectNo
P1       Proj1
P1       Proj2
P2       Proj2

SUPPLIER_PARTS

PartNo   SupplierNo
P1       S1
P2       S2
P1       S2

Qty: 25, 10, 20

Example:

- Who supplied P2 to Proj2? (Answering requires us to ‘join’ the two tables.)

- Who supplied P1 to Proj2?

[Figure: ER diagram in which the relationship supplies links the entity types supplier (supp_no), part (part_no), and project (proj_no)]

A (bad) design
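The spurious-tuple problem can be reproduced in a few lines of Python. The sketch assumes a hypothetical set of true (supplier, part, project) facts, chosen so that its two projections give the PartNo/ProjectNo and PartNo/SupplierNo rows of the example (Qty is left out). Joining the projections on PartNo returns every true fact plus extra ones.

```python
# Hypothetical "true" facts: (supplier, part, project).
supplies = {("S1", "P1", "Proj1"), ("S2", "P1", "Proj2"), ("S2", "P2", "Proj2")}

# The bad design keeps only two projections of those facts.
project_parts = {(part, proj) for _, part, proj in supplies}   # (PartNo, ProjectNo)
supplier_parts = {(part, supp) for supp, part, _ in supplies}  # (PartNo, SupplierNo)

# Natural join on PartNo, the only attribute the two tables share.
joined = {
    (supp, part, proj)
    for part, proj in project_parts
    for part2, supp in supplier_parts
    if part == part2
}

spurious = joined - supplies
print(sorted(spurious))  # [('S1', 'P1', 'Proj2'), ('S2', 'P1', 'Proj1')]

# "Who supplied P1 to Proj2?" The join cannot tell S1 from S2.
print(sorted(s for s, p, j in joined if p == "P1" and j == "Proj2"))  # ['S1', 'S2']
```

Both S1 and S2 come back for P1 and Proj2, although only one of them actually supplies it; a single table keyed on supp_no, part_no, and proj_no together (the supplies relationship) avoids the problem.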

Normal forms: functional dependencies

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

$X \to Y$ implies that for any two tuples $t_1$ and $t_2$: if $t_1[X] = t_2[X]$, then $t_1[Y] = t_2[Y]$

Notation: $X \to Y$

Examples:

In table CUSTOMER, $\{\text{SSN}\} \to \{\text{Customer name}\}$

In table PAYMENT, $\{\text{l-no}, \text{pay-no}\} \to \{\text{date}, \text{amount}\}$

The concept of FD is central to developing normalized DB designs.
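The FD definition translates directly into a check over a table instance. The Python sketch below (`fd_holds` is a hypothetical helper) uses invented PAYMENT rows. As with keys, an instance can disprove an FD but cannot prove it.

```python
def fd_holds(rows, X, Y):
    """Check X -> Y on one table instance (a list of dicts).

    Returns False as soon as two tuples agree on X but differ on Y.
    True only means this instance does not violate the FD.
    """
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        # First Y value seen for this X value must match every later one.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


payment = [
    {"l-no": "L-17", "pay-no": 1, "date": "2024-01-05", "amount": 100},
    {"l-no": "L-17", "pay-no": 2, "date": "2024-02-05", "amount": 100},
    {"l-no": "L-23", "pay-no": 1, "date": "2024-01-07", "amount": 250},
]
print(fd_holds(payment, ["l-no", "pay-no"], ["date", "amount"]))  # True
print(fd_holds(payment, ["pay-no"], ["date", "amount"]))          # False
```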

First normal form, 1NF

A table is in 1NF if it does not contain:
- any composite attributes,
- any multi-valued attributes,
- any nested relations

Any non-1NF schema can be converted into a set of 1NF schemas

STUDENT_COURSES (not 1NF: Name and SemYr are composite, Courses is multi-valued)

SID    Name         SemYr     Courses
0401   John Smith   Fall 05   ie110, ie215
0402   Jane Doe     Fall 05   ie110, ie317

STUDENT_COURSES_1NF (1NF)

SID    Lname   Fname   Sem    Yr   Course
0401   Smith   John    Fall   05   ie110
0401   Smith   John    Fall   05   ie215
0402   Doe     Jane    Fall   05   ie110
0402   Doe     Jane    Fall   05   ie317
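The STUDENT_COURSES conversion can be sketched in Python. The parsing assumptions ("First Last" names, "Sem Yr" values, comma-separated courses) are illustrative and only fit this sample data.

```python
# Non-1NF rows: Name and SemYr are composite, Courses is multi-valued.
student_courses = [
    {"SID": "0401", "Name": "John Smith", "SemYr": "Fall 05", "Courses": "ie110, ie215"},
    {"SID": "0402", "Name": "Jane Doe", "SemYr": "Fall 05", "Courses": "ie110, ie317"},
]


def to_1nf(rows):
    """Split composite values into atomic columns and emit one row per course."""
    out = []
    for r in rows:
        fname, lname = r["Name"].split(" ", 1)  # assumes "First Last"
        sem, yr = r["SemYr"].split(" ", 1)      # assumes "<Sem> <Yr>"
        for course in r["Courses"].split(","):
            out.append((r["SID"], lname, fname, sem, yr, course.strip()))
    return out


for row in to_1nf(student_courses):
    print(row)  # (SID, Lname, Fname, Sem, Yr, Course)
```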

First normal form, 1NF..

EMPLOYEE_PROJECTS (not 1NF: Projects is a nested relation)

SSN    Lname   Fname   Projects (ProjNo, Hours)
1123   Smith   John    (P1, 10), (P2, 5)
3312   Doe     Jane    (P2, 10), (P3, 5)

Converted to 1NF: EMPLOYEE and EMP_PROJECTS

EMPLOYEE

SSN    Lname   Fname
1123   Smith   John
3312   Doe     Jane

EMP_PROJECTS

SSN    ProjNo   Hours
1123   P1       10
1123   P2       5
3312   P2       10
3312   P3       5
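A nested relation is removed by moving the inner relation into its own table and copying the outer key into each inner row. A Python sketch over the EMPLOYEE_PROJECTS sample, represented (by assumption) as dicts holding a list of (ProjNo, Hours) pairs; `unnest` is a hypothetical helper:

```python
# Non-1NF: each employee row nests a relation Projects(ProjNo, Hours).
employee_projects = [
    {"SSN": "1123", "Lname": "Smith", "Fname": "John", "Projects": [("P1", 10), ("P2", 5)]},
    {"SSN": "3312", "Lname": "Doe", "Fname": "Jane", "Projects": [("P2", 10), ("P3", 5)]},
]


def unnest(rows):
    """Split into EMPLOYEE(SSN, Lname, Fname) and EMP_PROJECTS(SSN, ProjNo, Hours)."""
    employee = [(r["SSN"], r["Lname"], r["Fname"]) for r in rows]
    # The outer key SSN is copied into every inner tuple so each row stands alone.
    emp_projects = [(r["SSN"], proj, hours) for r in rows for proj, hours in r["Projects"]]
    return employee, emp_projects


employee, emp_projects = unnest(employee_projects)
print(employee)
print(emp_projects)
```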

Second normal form, 2NF

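Prime attribute: an attribute that is a member of the primary key

Full functional dependency: an FD $Y \to Z$ such that $X \to Z$ is false for every proper subset $X \subset Y$

$\{\text{SSN}, \text{PNumber}\} \to \{\text{Hours}\}$: full FD?

$\{\text{SSN}, \text{PNumber}\} \to \{\text{EName}\}$: full FD?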
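Whether an FD is full depends on the set of FDs that hold, so the sketch below assumes an FD set for a hypothetical table with SSN, PNumber, Hours, and EName: {SSN, PNumber} → {Hours} and {SSN} → {EName}. It uses the standard attribute-closure algorithm (compute everything a set of attributes determines) to test the proper subsets of the left-hand side.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by `attrs` under `fds`.

    fds: list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(Y, Z, fds):
    """Y -> Z is full if it holds and no proper subset X of Y determines Z."""
    Y, Z = set(Y), set(Z)
    if not Z <= closure(Y, fds):
        return False  # Y -> Z does not even hold
    # Checking subsets of size |Y| - 1 suffices: if a smaller X determined Z,
    # so would every superset of X, including some (|Y| - 1)-subset of Y.
    return all(not Z <= closure(Y - {a}, fds) for a in Y)


# Assumed FDs for the example.
fds = [({"SSN", "PNumber"}, {"Hours"}), ({"SSN"}, {"EName"})]
print(is_full_fd({"SSN", "PNumber"}, {"Hours"}, fds))  # True
print(is_full_fd({"SSN", "PNumber"}, {"EName"}, fds))  # False: SSN alone determines EName
```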

Second normal form, 2NF..

A schema R is in 2NF if every non-prime attribute A in R is fully functionally dependent on the primary key.

Any non-2NF design can be converted into a set of 2NF designs

CUST_DEPOSIT( cssn, cname, addr, banker, banker-type, ac-no, bal, access-date)

is not in 2NF, because PK = {cssn, ac-no}, but $\{\text{cssn}\} \to \{\text{cname}\}$, so cname depends on only part of the PK.

Decomposition into 2NF:

CUSTOMER( cssn, c-name, addr, banker, banker-type)

ACCOUNT( ac-no, balance)

DEPOSIT( c-ssn, ac-num, access-date)
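The closure idea also detects 2NF violations: compute what each proper subset of the PK determines and keep the non-prime attributes. The FDs for CUST_DEPOSIT below are assumptions read off the banking description, and `partial_dependencies` is a hypothetical helper.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attrs, pk, fds):
    """Map each proper subset of the PK to the non-prime attributes it determines.

    Any entry is a 2NF violation: those attributes are not fully
    functionally dependent on the primary key.
    """
    pk = sorted(pk)
    non_prime = set(attrs) - set(pk)
    violations = {}
    for size in range(1, len(pk)):
        for part in combinations(pk, size):
            dependents = closure(part, fds) & non_prime
            if dependents:
                violations[part] = dependents
    return violations


cust_deposit = ["cssn", "cname", "addr", "banker", "banker-type", "ac-no", "bal", "access-date"]
fds = [
    ({"cssn"}, {"cname", "addr", "banker", "banker-type"}),
    ({"ac-no"}, {"bal"}),
    ({"cssn", "ac-no"}, {"access-date"}),
]
for part, deps in partial_dependencies(cust_deposit, ["cssn", "ac-no"], fds).items():
    print(part, "->", sorted(deps))
```

Each group of partially dependent attributes, together with the part of the key that determines it, becomes its own table (CUSTOMER, ACCOUNT); what remains keyed on the full PK becomes DEPOSIT.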

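Third normal form, 3NF

A transitive functional dependency is an FD $Y \to Z$ that can be derived from two FDs $Y \to X$ and $X \to Z$ (where X is neither a key nor a subset of a key).

A schema R is in 3NF if it is in 2NF and no non-prime attribute of R is transitively dependent on the primary key.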

Example (poor DB design):

CUST_BANKER( cssn, cname, addr, banker, banker-type, banker-mgr)

$\{\text{cssn}\} \to \{\text{banker-mgr}\}$ is a transitive FD [why?]

Any non-3NF design can be converted into a set of 3NF designs:

CUSTOMER( cssn, c-name, street, city, banker, banker-type)

EMPLOYEE( e-ssn, mgr-ssn)
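For 3NF, a minimal Python sketch that checks each listed FD X → Y: if X is not a superkey and Y contains non-prime attributes outside X, those attributes depend on the PK only via X. The FDs for CUST_BANKER, including {banker} → {banker-mgr}, are assumptions, and the sketch examines only the FDs it is given rather than every implied FD.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attrs, pk, fds):
    """Listed FDs X -> Y where X is not a superkey and Y has non-prime attributes outside X.

    Since the PK determines every attribute, PK -> X -> A holds for each
    flagged A, i.e., A depends on the PK only through X.
    """
    attrs, pk = set(attrs), set(pk)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # X is a superkey, so X -> Y is allowed
        bad = rhs - lhs - pk  # non-prime attributes determined by a non-superkey
        if bad:
            violations.append((set(lhs), bad))
    return violations


cust_banker = {"cssn", "cname", "addr", "banker", "banker-type", "banker-mgr"}
fds = [
    ({"cssn"}, {"cname", "addr", "banker", "banker-type"}),
    ({"banker"}, {"banker-mgr"}),  # assumed: each banker has one manager
]
print(third_nf_violations(cust_banker, {"cssn"}, fds))  # [({'banker'}, {'banker-mgr'})]

# After decomposition, the customer table no longer holds banker-mgr.
customer = cust_banker - {"banker-mgr"}
print(third_nf_violations(customer, {"cssn"}, fds[:1]))  # []
```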

Concluding remarks on normal forms

1. Normalized designs avoid problems associated with “bad” designs

2. Notice that the informal ER diagram → tables mapping yields 3NF schemas!

3. General 3NF: notice that our definition of 3NF depends on our choice of a PK. If a table has multiple candidate PKs, further problems may arise.

There is a general 3NF definition that avoids such issues.

However, in practical cases, such issues are rare and outside our scope.


Next: Relational algebra, calculus, and SQL
