38
ISYS 200 Week #5 1 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

Embed Size (px)

Citation preview

Page 1: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 1

Systems Analysis IDesigning Databases

ISYS 200

Glenn Booker

Page 2: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 2

Databases

Storing and manipulating data are the heart of an information system

Data must be Available when needed Accurate Consistent in its definition Efficiently stored and retrieved Presented in some useful form

Page 3: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 3

Databases

There are two approaches for database design – flat file and relational A flat (or ‘conventional’) file database keeps

separate files for each application COBOL systems, and FileMaker Pro Like using Microsoft Excel as a database

A relational database stores data in a single structure for use by many applications Every major database system; Oracle, DB2, Access,

MySQL, Informix, etc.

Page 4: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 4

Databases

Flat file systems are, by definition, redundant Data is copied to each report table only when the

report is generated, so later changes make the table obsolete

Relational databases keep the connections to each set of data Each record exists once in the system Is much more flexible to generate custom views

of the data, without changing its structure

Page 5: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 5

Reality, Data, Metadata

A database models some collection of data which appears in reality Reality is the thing being modeled Data is used to describe the key attributes

of that thing About what aspects of that thing do we wish to

keep track? Metadata (data about data) describes the

characteristics of the data

Page 6: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 6

Data Modeling

Data modeling (or database or information modeling) is a way of organizing and describing the data in a system

It is a logical model to describe the specific data fields (elements) we wish to capture, and how they are related to each other

Page 7: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 7

Where to start?

Recall that data modeling starts with thinking about the things involved in your system

These things are formally called “entities” – nouns, if you will

Start by identifying all of the places, people, events, and ideas which are affected by your system

Page 8: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 8

Permanent vs. Transient Data

A key for relational data modeling is that we are primarily concerned with data we need to keep permanently

Data which is only needed briefly isn’t modeled in an ERD Major difference between relational and

object-oriented analysis

Page 9: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 9

Lots of Types of Files

Master files contain permanent information Transaction files describe business events Document files contain copies of historic data Archival files contain master and

transaction file records offline, which are no longer needed

Table look-up files contain static data for reference calculations (tax tables, Zip codes)

Audit files record changes to other files

Page 10: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 10

Characterize Entities

Then examine each entity and determine the attributes which you are interested in – what data do you need to know in order to describe one such entity meaningfully?

Consider if some attributes can be readily grouped together, thereby forming compound attributes (e.g. name)

Page 11: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 11

Characterize Entities

Entities are generally one of two types: A set of data you want to keep permanently

(customer orders, product information, etc.), or A lookup list or table (types of status codes,

shipping rates, tax rates, etc.) Data which is transient is generally kept in

local variables, and doesn’t appear in an ERD (e.g. change of address info)

Page 12: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 12

Keep it or not?

In trying to decide if data needs to be kept, consider whether someone might want to analyze that data in the future

For examples, to look for sales patterns, trace relocation history, keep record of data changes (who modified what data and when?)

When in doubt, keep it for now

Page 13: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 13

Attributive Entities

An entity which depends on the existence of another entity, but only indicates multiple examples of that ‘parent’ entity, is an attributive entity Hence it’s referred to as describing ‘repeating

groups of data’, or they are ‘parent/child’ entities Shown using part of an oval inside the entity

box (p. 36)

Page 14: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 14

Database Notation

In week 2 we covered the basic notation for cardinality Only one Zero or one One or many Zero, one, or many Strictly many

Now we want to put all the two-entity pairs together, into a single diagram

Page 15: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 15

ERD

The Entity-Relationship Diagram (ERD) captures all the entities for our system, and shows the relationships among them (the lines connecting them)

The relationships all need to have A verb phrase in at least one direction Cardinality at both ends

Now we can add attributes and keys

Page 16: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 16

Attributes

Each entity is almost certain to have at least two attributes (often more) At least one of them is a unique identifier for each

record in the entity, called the primary key Other attributes describe significant

characteristics of the entity So, about what aspects of the entity do you want

to retain knowledge? Attributes are also called fields, data items, etc.

Page 17: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 17

Characterize Attributes

For each attribute, define its data type: Text (“Fred”) [and the character set (Latin)] Number (real (3.56) or integer (124)) Date and/or time Yes/No (a.k.a. T/F, binary, or Boolean) A fixed set of possible values (e.g. grades)

Formally known as an enumerated list Multimedia: photos, drawings, movies, sounds

Page 18: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 18

Attributes & Records

Attributes can be fixed or variable length Length of integer (8-64 bits, in powers of 2), Lengths of text fields (the number of characters

or bytes it contains) Length of real number fields and the number of

decimals (8.1 only has six digits, -12345.6) We won’t worry about attribute lengths

A record is a set of attributes from one entity Typically corresponding to one row in a data table

Page 19: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 19

Relevant Data Type Standards

Character sets ISO/IEC 8859 – a set of nine 8-bit character sets,

each used for a different family of languages (Latin, Cyrillic, Greek, Arabic, etc.)

Unicode – a 16-bit character set to represent almost all languages

Representation of dates and times ISO 8601

Page 20: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 20

Characterize Attributes

Other traits to look for, beyond the scope of this course: Identify the domain of each attribute

What is the range of allowable values? Determine if there is a default value for

each attribute Is each attribute mandatory (required) for each

entity? (Avoid many mandatory fields) We care: could an attribute be a key?

Page 21: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 21

Key Attributes

An attribute or group of attributes may be a unique identifier, or key, for each entity Examples are Social Security Number,

driver’s license number, ISBN, Student ID If a group of attributes is used, it is

a concatenated (or composite or compound) key {Course number, section number, and

term} could form a concatenated key

Page 22: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 22

Many Keys Possible

There might be more than one usable key for an entity

Each possible key is called a candidate key One candidate key is selected to be the

primary key (PK) All others are alternate keys (AK)

Example: the electric company may use a customer ID or account number as a primary key, and your phone number as an alternate key

Page 23: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 23

Primary Key may be Meaningless

A primary key may correspond to some recognizable attribute SSN, student ID, ISBN, etc.

Or it may be completely meaningless A sequential number, called Order_ID

As long as the primary key is unique for every record, either kind is acceptable

Page 24: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 24

Foreign Keys

A foreign key is an attribute which establishes the connection between two entities A Sales entity might have a foreign key attribute

Customer_ID, which points to all of the data in the Customer entity for the particular customer who placed a given sale

If that were the case, the primary key in the Customer entity must be Customer_ID

This type of connection is the heart of relational database modeling (!)

Page 25: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 25

Foreign Keys

A foreign key (FK) is an attribute which exists, in an entity other than where it is a primary key (PK), to establish the relationship between the two entities Primary key must be unique for each record, but

a foreign key value may appear many times Only one PK-FK connection is required for

the relationship to exist Entity with FK generally has a PK of its own

Page 26: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 26

Other Key Traits

A PK attribute may also be a FK Especially for 1:1 relationships

An associative entity builds a concatenated primary key from more than one entity Uses a diamond shape inside the normal box

to show its special nature

Page 27: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 27

Other Relationships

A many-to-many (non-specific) relationship implies a lot of one-to-many relationships Often use an associative entity to bridge

between them An identifying relationship is when a parent

entity’s PK is used as part of the PK for a child entity Child entity is then considered “weak”

because it depends on the parent

Page 28: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 28

Summary of Key Traits

The bottom line for keys is: Each entity must have at least one PK

More than one PK implies a concatenated key Alternate keys are completely optional Each entity may have from zero to many FK’s Each FK is a PK in another, related entity Only one PK-FK relationship is needed to relate

two entities Some keys are not inherently meaningful data

Page 29: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 29

Data Normalization

Analysis of a data model for implementation is done using data normalization Normalization organizes data attributes to

form simple, non-redundant, flexible, adaptive entities

There are five levels of data normalization, of which three are ever used They build on each other; so to get to second

level you have to comply with first level too

Page 30: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 30

First Normal Form (1NF)

An entity is in first normal form if there are no attributes which can have more than one value for each instance (record) of the entity

Attributes which could have more than one value for a given entity belong to a different kind of entity

In other words, every attribute appears only once for each record

Page 31: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 31

Second Normal Form (2NF)

Must be first normal form, and: Each non-primary-key attribute is uniquely

determined by the entire primary key Non-primary-key attributes may not be

dependent on only part of the primary key If any are, move them to another table which uses

only that part of the primary key If there are no concatenated keys, second

normal form comes automatically with 1NF

Look at concatenated keys only!

Page 32: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 32

Third Normal Form (3NF)

Must be second normal form, and: The value of each non-primary-key attribute

is not dependent upon any other non-primary-key attribute Every attribute depends only on the primary key

The two ways to look for this are derived attributes and transitive dependencies...

Page 33: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 33

Third Normal Form (3NF)

Derived attributes (data) are fields calculated or logically derived from other fields Exception: OK to keep attribute if multiple entities

are involved in deriving an attribute Transitive dependencies may exist for

non-concatenated keyed tables; is when a non-key attribute depends on another non-key attribute

Page 34: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 34

Third Normal Form (3NF)

Or in brief, for third normal form…

An entity is in third normal form if every non-primary key attribute is dependent on the primary key, the whole primary key, and nothing but the primary key

(as in, “Do you swear to tell the truth…”)

Page 35: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 35

Further Normalization

Additional improvement in data structure is possible through “Simplification by Inspection” - look for other redundancies or simplifications possible

Many CASE tools can also inspect for first level normalization, but generally no further

Just for the record, here are the 4th and 5th normal forms I’ve never seen them used in industry

Page 36: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 36

Fourth and Fifth Normal Forms

Fourth normal form (4NF) involves removing multivalued dependencies If a pair of records has two matching attributes,

decompose the data structure to remove that Fifth normal form (5NF) involves removing

join dependencies (nearly impossible to do) This is when business rules define a connection

among many entities (e.g. if you replace a tire, you must also replace the valve stem)

Taken from the INFO 605 text, pp. 351-354

Page 37: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 37

File Organization

There are many ways to organize files Sequential organization Linked lists Hash files

We won’t go into detail on them You can forget about 4NF and 5NF, too

Page 38: ISYS 200Week #51 Systems Analysis I Designing Databases ISYS 200 Glenn Booker

ISYS 200 Week #5 38

Summary: How to create an ERD

Define the entities about which you’ll want to keep information

Define the relationships between those entities Must have a verb phrase and both cardinalities

Define the attributes for each entity At a minimum, define the name and data type

of each attribute Identify a primary key for each entity Identify foreign keys as needed

Normalize to third normal form (and fix keys as needed)