6.1 Lecture 6 Data Design IMS2805 - Systems Design and Implementation

6.1

Lecture 6

Data Design

IMS2805 - Systems Design and Implementation

6.2

References

HOFFER, J.A., GEORGE, J.F. and VALACICH (2002) 3rd ed., Modern Systems Analysis and Design, Prentice-Hall, New Jersey, Chap 12

HOFFER, J.A., GEORGE, J.F. and VALACICH (2005) 4th ed., Modern Systems Analysis and Design, Prentice-Hall, New Jersey, Chap 10

WHITTEN, J.L., BENTLEY, L.D. and DITTMAN, K.C. (2001) 5th ed., Systems Analysis and Design Methods, Irwin/McGraw-Hill, New York, NY. Chapter 12

6.3

Data Design

Logical data design:

Logical data modelling

Physical file and database design

6.4

Overview of logical data modelling

Logical data modelling involves:

collecting detailed attributes for each entity and relationship identified

converting ER models to relations normalising the relations merging relations from each user viewpoint converting the normalised and merged relations

to create a data structure diagram

6.5

The data structures developed in the logical system design phase need to be converted to physical data structures for the physical design stage.

A knowledge of the relevant data base management system (DBMS) is important for this phase.

The aim is to develop a compromise between the “ideal” (most flexible) structure developed during normalisation and the most efficient (fastest, cheapest) structure preferred for implementation.

Data Design

6.6

Information required:

Normalised relations, including volume estimates

Definitions of each attribute

Descriptions of where and when data are used (CRUD)

Expectations/requirements for response time and data integrity

Descriptions of technologies to be used for implementing the files and databases

Physical Data Design

6.7

Key decisions: The storage format (data type) for each attribute, e.g.

length,coding scheme, decimal places, range of values

Grouping of attributes into physical records (data structure)

Arranging records in secondary memory for storage, retrieval, update (file organisation)

Selection of media and structures for efficient access

DBMS technologies require selection of options for retrieval and for implementation of relationships between entities

Physical Data Design

6.8

The relational database model

The relational database model represents data in the form of tables or relations

Important concepts are: relation primary key foreign key functional dependency normalisation

6.9

Normalisation

Normalisation is a process for converting complex data structures into simple, stable data structures in the form of relations.

Data models consisting of normalised relations: are robust and stable have minimum redundancy are flexible are technology-independent (logical)

6.10

Normalisation ensures that each attribute is attached to the appropriate relation:

i.e. each attribute is contained in the relation which represents the real world system object or concept that the attribute describes or is a property ofe.g. the attribute Student-name should be in the relation STUDENT which represents the real world object “student” of interest to a student records system

Normalisation

6.11

Normalisation

Normalisation was originally developed as part of relational database theory by E.F. Codd (1970)

Normalisation is accomplished in stages, each of which corresponds to a “normal form”

Originally, Codd defined first, second and third normal forms. Third normal form is adequate for most business applications.

Later extensions include Boyce-Codd, 4th, 5th and domain-key normal forms.

6.12

Relation

A relation is a named, two-dimensional table of data

Each relation consists of a set of named columns and an arbitrary number of rows

Each column corresponds to an attribute of the relation

Each row corresponds to an instance (or record) that contains values for that instance

6.13

Example relation

A relation generally corresponds to some real world object or concept of interest to the system (similar to an entity) e.g.:

Emp# Name Salary Dept

1247

1982

9314

Adams

Smith

Jones

24000

27000

33000

Finance

MIS

Finance

Employee

Employee (Emp#, Name, Salary, Dept)

6.14

Properties of relations

Relational tables are tables in which:

data values are atomic (single-valued)

data values in columns are from the same domain

each row in the relation is unique

the sequence of columns is insignificant

the sequence of rows is insignificant

6.15

Primary key An attribute or group of attributes which uniquely

identifies a row of a relation Entity integrity (relational data base theory)

requires that each relation has a non-null primary key

Where several possible keys are identified, they are known as candidate keys - choose one to be the primary key

Employee (Emp#, Name, Salary, Dept)

Order-item (Order#, Item#, Qty-ordered)

E.g.

6.16

Foreign key

A foreign key is an attribute in one relation that is also a primary key in another relation

The referential integrity constraint (relational database theory) specifies that if an attribute value exists in one relation then it must also exist in a linked relation

A foreign key must satisfy referential integrity

6.17

Foreign key example

In the example below, if a given Dept# exists in an Employee relation then that Dept# must exist in the Department relation

Employee (Emp#, Name, Salary, Dept#)

Department (Dept#, Dname, Budget)

foreign key

6.18

Functional dependency A functional dependency is a particular

relationship between attributes in a relation For any relation R, attribute B is functionally

dependent on attribute A if each value of A has only ONE value of B associated with it,i.e. if for every valid instance of A, that value of A uniquely determines the value of B

A identifies B

A B

Emp# Emp-name

Emp# Salary

6.19

Normalisation to third normal form is accomplished in 3 steps each corresponding to a basic normal form

A normal form is a state of a relation that can be determined by applying simple rules concerning dependencies within that relation

Each step of the normalisation process is applied to a single relation in sequence so that the relation is converted to third normal form

Steps in normalisation

6.20

Steps in normalisation

First Normal Form

Second Normal Form

Third Normal Form

Unnormalised table

Remove repeating groups

Remove partial dependencies

Remove transitive dependencies

6.21

Well Structured Relations

A well structured relation contains a minimum amount of redundancy and allows users to insert, modify, and delete rows in a table without errors or inconsistencies (known as “anomalies”)

Three types of anomaly are possible: insertion deletion Modification

Third normal form relations are considered to be well structured relations

6.22

First normal form (1 NF) A relation is in first normal form if it contains no

repeating data :the value of the data at the intersection of each row and column must be single-valued

Remove any repeating groups of attributes to convert a relation to 1 NF (key of the removed group is a composite key)

Order (Order#, Customer#, (Item#, Desc, Qty))

Order-Item (Order#, Item#, Desc, Qty)

Order (Order#, Customer#)

6.23

Second normal form (2 NF)

A relation is in 2 NF if it is in 1 NF and every non-ket attribute is fully functionally dependent on the primary key

A partial dependency exists if one or more non-key attributes are dependent on only part of a composite primary key

Remove any partial dependencies to convert a 1 NFrelation to 2 NF

If the primary key consists of only one attribute or there are no non-key attributes, then a 1 NF relation is automatically in 2 NF

6.24

Second normal form (2 NF)

Remove Partial Dependencies a non-key attribute cannot be identified by part

of a composite key

Order (Order#, Item#, Desc, Qty-ordered)

Order-Item (Order#, Item#, Qty-ordered)

Item (Item#, Desc)

6.25

Partial dependency anomalies

Order# Item# Item-desc Qty

27

28

28

873

402

873

nut

bolt

nut

2

1

10

Order-Item

30 495 washer 50

UPDATE change item =-desc in many places DELETE data for last item lost when last order

for that item is deleted CREATE cannot add new item until it is

ordered

6.26

The solution for these anomalies: 2 NF

Order# Item# Qty

27

28

28

873

402

873

2

1

10

Order

30 495 50

Item# Desc

873

402

nut

bolt

495 washer

delete last order for item, but item remains

add new item at any time

change item description in one place only

Item

6.27

If a relation has a composite key, it must be checked for dependencies within the key to determine whether they should be retained e.g.

DEPT (Dept#, Dept-name, (Emp#, Emp-name))

DEPT (Dept#, Dept-name), DEPT-EMP (Dept#, Emp#, Emp-name)

But DEPT-EMP (Dept#, Emp#, Emp-name)

Remove Dept# from the key:EMP (Emp# ,Emp-name, Dept#)

Dependencies within the primary key

6.28

Third normal form (3 NF)

A relation is in 3 NF if it is in 2 NF and no transitive dependencies exist

A transitive dependency is a functional dependency between two or more non-key attributes

Remove any transitive dependencies to convert a 2 NF relation to 3 NF

6.29

Third normal form

Remove Transitive Dependencies a non-key attribute cannot be identified

by another non-key attribute

Employee (Emp#, Ename, Dept#, Dname)

Employee (Emp#, Ename, Dept#)

Department( Dept#, Dname)

(look for foreign keys and their attributes)

6.30

Transitive Dependency Anomalies

Emp# Emp-name Dept# Dname

10

20

25

Smith

Jones

Smith

D5

D7

D7

MIS

Finance

Finance

Employee

30 Black D8 Sales

UPDATE change dept name in many places DELETE data for dept lost when last employee

for that dept is deleted CREATE cannot add new dept until an

employee is allocated to it

6.31

The solution for these anomalies: 3 NF

Emp# Ename Dept#

10

20

25

Smith

Jones

Smith

D5

D7

D7

Employee

30 Black D8

Dept# Dname

D5

D7

MIS

Finance

D8 Sales

delete last emp in dept, but dept remains

add new dept at any time

change dept name in one place

Item

6.32

Normalisation to 3 NF

A relation is normalised if all attributes are fullyfunctionally dependent on the primary key:

Remove repeating groups

Remove partial dependencies

Remove transitive dependencies

6.33

Transforming ER Diagrams into Relations

Transforming an ER diagram into normalised relations, and then merging all the relations into one final, consolidated set of relations can be accomplished in four steps:

1. Represent entities as relations2. Represent relationships as relations3. Normalise each relation4. Merge the relations

6.34

Representing Entities as Relations

Each entity in the ER model is transformed into a relation e.g.

becomes

CUSTOMER

Customer (Customer-no, Name, Address, City, State, Postcode, Discount)

6.35

Representing Relationships as Relations

1. Binary Relationships (1:N, 1:1)

CUSTOMER ORDER

Customer (Customer-no, Name, Address, City, State, Postcode, Discount)

Order (Order-no, Order-date, Promised-date, Customer-no)

places

6.36


1. Binary Relationships (1:N, 1:1)

For 1:M, add the primary key of the entity on the one side of the relationship as a foreign key in the relation that is on the many side

For 1:1 relationship involving entities A and B, choose from

add the primary key of A as a foreign key of Badd the primary key of B as a foreign key of Aboth of the above

6.37

2. Binary and Higher Degree Relationships (M:N)


PRODUCT ORDER

Where we wish to know the quantity of each product on each order, i.e. this attribute is an attribute of the relationship requests/requested by:

requests

Order (Order-no, Order-date, Promised-date)Order Line (Order-no, Product-no, Quantity-ordered)Product (Product-no, description, (other attributes))

6.38


2. Binary and Higher Degree Relationships (M:N)

For M:N, first create a relation for each for each of the entity types, and then create a relation for the relationship, with a composite primary key formed from the primary keys of the participating entity types.

6.39

3. Unary Relationships


EMPLOYEEITEM

Has component manages

Reports to

Is a componentof

(1:N)(M:N)

Employee (Emp-id, Name, Birthdate, Manager-id)

Item (Item-no, Name, Cost)Item-Bill (Item-no, Component-no, Quantity)

6.40


4. IS-A Relationship (Class-Subclass or generalisation)

PROPERTY

BEACH PROPERTY

MOUNTAINPROPERTY

Property (Street-address, City-state-postcode, No-rooms, Typical-rent)Beach (Street-address, City-state-postcode , Distance-to-beach)Mountain (Street-address, City-state-postcode , Skiing)

6.41

During the normalisation process two or more relations with the same primary key may appear

The set of 3NF relations must not contain any duplicate data

Relations with the same primary key should be merged

Normalisation of relations: merging relations

6.42

Normalisation of relations

Synonyms Two or more attributes may have different names

but the same meaning Either adopt one of the names as a standard or

choose a third nameSTUDENT1 (Student-id, Name, Phone-no)

STUDENT2 (VCE-no, Name, Address)

STUDENT (Student-id, Name, Address, Phone-no)

6.43

Homonyms Two or more attributes may have the same name

but different meanings To resolve the conflict, new attribute names need

to be created

STUDENT1 (Student-id, Name, Address)STUDENT2 (Student-id , Name, Phone-no, Address)

STUDENT (Student-id, Name, Phone-no, Campus-address, Permanent-address )

Normalisation of relations

6.44

Normalisation of relations Transitive dependencies

When two 3NF relations are merged, transitive dependencies may result e.g.

STUDENT1 (Student-id, Major)STUDENT2 (Student-id , Advisor)

STUDENT (Student-id, Major, Advisor)

BUT MAJOR ADVISOR

STUDENT1 (Student-id, Major)MAJOR (Major , Advisor)

6.45

Data Structure Diagrams

A set of 3 NF relations may be converted to a simple diagrammatic form to begin physical database design

The conversion is simple:

1. Draw a named rectangle for each relation2. Draw a relationship line between rectangles

linked by foreign keys with a “many” cardinality at the foreign key end of the relationship

6.46

Data Structure Diagram example

CUSTOMER (Cust#, Cname,Phone number) SALES ORDER (Sord#, Sord-date, Cust#) SALES ORDER-ITEM (Sord#, Item#, Qty) ITEM (Item#, Item-desc)

CUSTOMER

SALESORDER

ITEM

SALES ORDERLINE

6.47

Data Structure Diagram example

Eliminate Redundant Relationships

(Tourcode, ...)

(Tourcode, depdate, ...)

(Booking#, ... , Tourcode, depdate)

redundant

TOUR

DEPARTURE

BOOKING

6.48

Logical Data Model+

Designer Added Data+

Volume & Sizing Info+

Constraints

Logical Process Model+

Designer Added Procedures+

Frequency & Response Req.+

Constraints

Logical Database Design+

Logical Transaction Model

Target Platform Specific Design

Relational Database Design

6.49

Structure (Stability / Adaptability)Preserve correspondence to normalised data model

Preserve data integrity

Performance (Speed and Resource Utilisation)Transaction response requirements

Minimise secondary storage requirementsMinimise transfers between primary and secondary

storage

Database Design Objectives

6.50

Logical Design: Designer added entity types and attribute types Review primary and foreign keys Review derived and aggregate data Define table and column structure Establish row and column ratios Define transaction access maps Review transaction access maps considering

performance Develop composite load map Evaluate design in terms of corporate standards,

response requirements and resource utilisation.

Database Design - Logical Design

6.51

Selection of table access paths including indexes and file organisation.

Selection of data types and sizes Selection of other DBMS dependent

variables page size, free space management option, data and index clustering, compression, concurrency control issues, database sizing - % fill for tables, buffer sizes, access control constraints.

Database Design - Physical Design

6.52

It is often necessary to extend the logical data model with extra entity types, attribute types and / or relationship types to reflect design and implementation decisions.

Additional data may be required to support: archiving back-up access control restart and recovery audit and integrity requirements system housekeeping functions and the storing of

historical information

Designer Added Data

6.53

Adding entities for the purpose of archiving:

Project EmployeeHas Assigned

ArchivedProject

ArchivedEmployee

Last UsedEmpno

Example: Designer Added Data

6.54

Relations become tables. Attributes become columns

a number of decisions must be made when mapping attributes to columns.

Domains must be defined and linked to the appropriate table columns. (if domains are supported by the DBMS)

Convert Logical Data Model

6.55

Some guidelines regarding primary keys: Primary keys identified in the logical data model may

not always be suitable for implementation. The length of each attribute forming the primary key

should be as short as possible. Avoid long character fields.

Keep the use of composite keys to a minimum. Surrogate keys may be used to reduce these

problems. Sometimes these keys may be hidden from the users.

Compact unique identifiers are particularly suitable as primary and therefore foreign keys.

No component of a primary key can accept null values, and the uniqueness property must be enforced.

Primary Keys

6.56

A technique to analyse the data access loads generated by each transaction type.

This allows alternative approaches and their impact on data access loads to be considered.

The loads generated by each transaction type may be aggregated to calculate the total access loads on the database, and this information is useful in identifying heavily loaded structures of the database.

Transaction Analysis

6.57

To properly evaluate a design it may be necessary to calculate several sets of performance statistics.

Transaction loads may vary widely from peak to average.

Database size may vary widely during application execution particularly if cumulative data is held.

It is much safer to perform design calculations using peak loads, rather than 'average' loads.

Planned equipment utilisation should never exceed 70%.

Statistics Must be Representative

6.58

Define the cardinality of each table, and the ratios between table rows.

Develop transaction access maps for each transaction type showing the sequence in which each table is accessed and the basis of each access.

Use table cardinality and ratio information to determine the number of rows of each table that each transaction type will access.

Transaction Analysis Technique

6.59

Aggregate these details to produce total access loads for each table in the database.

The individual loads for each transaction type and the composite load map should be carefully examined for opportunities to improve the efficiency of the system.

Great care needs to be taken to balance flexibility against performance, when considering amendments to transaction access paths.


6.60

The following data model fragment will be used for examples. The numbers relate to table cardinality (eg Order table 900) table ratios (eg Order to Orderline ratio 1:10)

NextOrder#

(1)

Order Item

(900) (1000)

(9000)

(1:10) (1:9)

OrderLine


6.61

Item (Item#, Description, Price, Quantity-on-hand) 1000 records with 5% growth pa

Order-line (Order#, Item#, Qty-ordered, Qty-shipped, Unit-price) 9000 records with 7% growth pa, Ratios ( 1 Item: 9 Order-lines), (1 Order : 10 Order-

lines) Order ( Order#, Ord-date, etc)

900 records with 7% growth pa, Next Order (Order#)

1 record with no growth Each Item has on average 9 Order-lines Each Order has on average 10 Order-lines

Table Cardinality & Row Ratios

6.62

Developed from action diagrams for elementary processes. For each transaction, or at least the most critical, it is necessary to identify: Each table access, The sequence of the access within the transaction, The type of access C. R. U. D.(create, read, update,

delete) The columns and predicate of the search condition for

the CRUD access Whether the predicate values come from other tables

accessed by the same transaction, The columns required as a result of this access, The number of accesses per transaction, The number of accesses per period.

Transaction Access Path Maps

6.63

Transaction Acces Path MapTransaction Number 1 Frequency 100 per hour peakTransaction Name: Enter New Order 30 per hour average

Destination Destination Destination Source Source Accesses AccessAccess Access Table Access Columns Table Column Per PerNo Type Name Predicate Required Name Names Tran Period

1 read Next Order Order# 1 1002 update Next Order current row Order# 1 1003 insert Order all screen 1 1004 read Item Item= screen all screen 10 10005 update Item current row Quantity -OH screen 10 10006 insert Order-line screen 10 1000

Total Logical Acceses 33 3300

NextOrder#

(1)

1 & 2 3 4 & 5

6

Order Item

(900) (1000)

(9000)

(1:10) (1:9)

OrderLine

Example: Transaction Access Path Map

6.64

Item

1 & 2

(1000)

Transaction Acces Path MapTransaction Number 33 Frequency 300 per day peakTransaction Name: Item Status Enquiry 100 per day average

Destination Destination Destination Source Source Accesses AccessAccess Access Table Access Columns Table Column Per PerNo Type Name Predicate Required Name Names Tran Period

1 read Item Item = screen Item#, Desc, QOH 0.7 2102 read Item Desc= screen Item#, Desc, QOH 0.3 90


Assumption: 70% of enquiries are by Item#, 30% by description (Desc)


6.65

3.

2.

1.

Transaction Acces Path Map

Transaction Number 66 Frequency 20 per hour peak

Transaction Name: Check Order Details 15 per hour average

Destination Destination Destination Source Source Accesses Access

Access Access Table Access Columns Table Column Per Per

No Type Name Predicate Required Name Names Tran Period

1 read Order all 1 20

2 read Order-line all Order Order# 10 200

3 read Item Item# = Item# all Order-line Item# 10 200


Order Item

(900) (1000)

(9000)

(1:10) (1:9)

OrderLine


6.66

Complicate the situation, because they are difficult to quantify and predict in advance.

Some controls on when ad hoc queries are permitted may be necessary.

Some DBMS allow the cost of a query to be calculated and the user may be forewarned or prevented from executing an expensive query.

Data may be extracted from the production database during quiet times and loaded onto another database on a different machine to minimise the impact of ad hoc query.

Ad Hoc Queries

6.67

The access path for each transaction should be reviewed for efficiency.

A number of techniques may be used to reduce the cost of transactions. Alteration of aggregation time. Alteration of column derivation time. Data Replication Combining and splitting tables Introducing codes simplify identification

and compress data

Analysis of the Transaction Access Path Map

6.68

Summarise the information from each transaction access map to show the total number of accesses by table, access type, table columns used for access and the access predicate.

This will highlight possible keys and indexes, row sequencing requirements and the columns used to join tables.

Ideally the production of the composite load map should be automated to allow for automatic re-calculation of totals following changes to transaction access maps.

Developing the Composite Load Map

6.69

Composite Load MapDestination Destination Destination Destination Accesses Source SourceTable Access Column Access Per Table Column Trans AccessName Type Name(s) Predicate Hour Name Name(s) Number NumberItem read Item# "=" 200 Order-line Item# 66 3

"=" 1000 screen 1 4"=" 27 screen 33 1Total 1227

Description "=" 12 screen 33 2Total 12

update item# "=" 1000 item item# 1 5Total 1000

Next Order read Order# 100 1 1Total 100

Next Order update Order# 100 1 2Total 100

Order insert all 100 screen 1 3Total 100

read order# "=" 20 screen 66 1Total 20

Order-line insert all 1000 screen 1 6Total 1000

read order# "=" 200 order order# 66 2Total 200

Example: Composite Load Map

6.70

The composite load map is designed to highlight table access types that have a very high frequency.

One logical input-output operation does not necessarily result in one physical input-output operation. Buffering techniques, indexes, and logging complicate the actual number of physical I/O operations.

The transactions contributing to these high levels of access should be closely scrutinised to determine whether the number of I/O accesses could be reduced by altering their data access strategy.

The composite load map also provides a base for determining where entry point access methods such as primary and secondary indexes may provide substantial benefit.

Interpreting the Composite Load Map

6.71

Where the composite load map shows high join frequencies between tables, control of table row placement, or the use of table access methods such as hashing techniques or indexes may be used to optimise join operations.

Where a table has a high frequency of sequential access, physical ordering of table rows, or the creation of an index to support the access could be useful.

Tables with a small number of rows could be held in main storage, if this option is supported by the DBMS.

The actual access path taken will depend on the DBMS query optimiser.

Interpreting the Composite Load Map

6.72

Given the limitations of some DBMS it may be necessary to compromise the logical data model for performance reasons.

In general this should be hidden from application logic by the use of special access routines that are responsible for mapping the normalised application view to the de-normalised storage view.

Some DBMS specific design techniques may allow a normalised application view to be maintained.

Compromising the Logical Model

6.73

Some transactions may implement designer added procedures that are part of the designed business system rather than part of the process model developed during analysis.

The starting point for transaction design for process implementing procedures is the logic defined via an action diagram for the elementary process(es) which the transaction (procedure) is to implement.

If action diagrams have not been developed during analysis then these should be developed in close association with the data model as part of design.

Transaction Design

6.74

Operational, audit and access control issues. Impact of designer added data on procedure logic. Concurrency and locking issues. Failure, fallback, restart and recovery issues. Standardisation of transaction resource

requirements. Frequency of use and transaction response

requirements. The use of derived and redundant data to minimise

access to secondary storage. The most appropriate time to derive aggregate

data.

Physical Issues in Transaction Design

Documents

6.1 Lecture 6 Data Design IMS2805 - Systems Design and Implementation