Upload
brianna-porter
View
221
Download
2
Tags:
Embed Size (px)
Citation preview
ISYS 200 Week #5 1
Systems Analysis IDesigning Databases
ISYS 200
Glenn Booker
ISYS 200 Week #5 2
Databases
Storing and manipulating data are the heart of an information system
Data must be Available when needed Accurate Consistent in its definition Efficiently stored and retrieved Presented in some useful form
ISYS 200 Week #5 3
Databases
There are two approaches for database design – flat file and relational A flat (or ‘conventional’) file database keeps
separate files for each application COBOL systems, and FileMaker Pro Like using Microsoft Excel as a database
A relational database stores data in a single structure for use by many applications Every major database system; Oracle, DB2, Access,
MySQL, Informix, etc.
ISYS 200 Week #5 4
Databases
Flat file systems are, by definition, redundant Data is copied to each report table only when the
report is generated, so later changes make the table obsolete
Relational databases keep the connections to each set of data Each record exists once in the system Is much more flexible to generate custom views
of the data, without changing its structure
ISYS 200 Week #5 5
Reality, Data, Metadata
A database models some collection of data which appears in reality Reality is the thing being modeled Data is used to describe the key attributes
of that thing About what aspects of that thing do we wish to
keep track? Metadata (data about data) describes the
characteristics of the data
ISYS 200 Week #5 6
Data Modeling
Data modeling (or database or information modeling) is a way of organizing and describing the data in a system
It is a logical model to describe the specific data fields (elements) we wish to capture, and how they are related to each other
ISYS 200 Week #5 7
Where to start?
Recall that data modeling starts with thinking about the things involved in your system
These things are formally called “entities” – nouns, if you will
Start by identifying all of the places, people, events, and ideas which are affected by your system
ISYS 200 Week #5 8
Permanent vs. Transient Data
A key for relational data modeling is that we are primarily concerned with data we need to keep permanently
Data which is only needed briefly isn’t modeled in an ERD Major difference between relational and
object-oriented analysis
ISYS 200 Week #5 9
Lots of Types of Files
Master files contain permanent information Transaction files describe business events Document files contain copies of historic data Archival files contain master and
transaction file records offline, which are no longer needed
Table look-up files contain static data for reference calculations (tax tables, Zip codes)
Audit files record changes to other files
ISYS 200 Week #5 10
Characterize Entities
Then examine each entity and determine the attributes which you are interested in – what data do you need to know in order to describe one such entity meaningfully?
Consider if some attributes can be readily grouped together, thereby forming compound attributes (e.g. name)
ISYS 200 Week #5 11
Characterize Entities
Entities are generally one of two types: A set of data you want to keep permanently
(customer orders, product information, etc.), or A lookup list or table (types of status codes,
shipping rates, tax rates, etc.) Data which is transient is generally kept in
local variables, and doesn’t appear in an ERD (e.g. change of address info)
ISYS 200 Week #5 12
Keep it or not?
In trying to decide if data needs to be kept, consider whether someone might want to analyze that data in the future
For examples, to look for sales patterns, trace relocation history, keep record of data changes (who modified what data and when?)
When in doubt, keep it for now
ISYS 200 Week #5 13
Attributive Entities
An entity which depends on the existence of another entity, but only indicates multiple examples of that ‘parent’ entity, is an attributive entity Hence it’s referred to as describing ‘repeating
groups of data’, or they are ‘parent/child’ entities Shown using part of an oval inside the entity
box (p. 36)
ISYS 200 Week #5 14
Database Notation
In week 2 we covered the basic notation for cardinality Only one Zero or one One or many Zero, one, or many Strictly many
Now we want to put all the two-entity pairs together, into a single diagram
ISYS 200 Week #5 15
ERD
The Entity-Relationship Diagram (ERD) captures all the entities for our system, and shows the relationships among them (the lines connecting them)
The relationships all need to have A verb phrase in at least one direction Cardinality at both ends
Now we can add attributes and keys
ISYS 200 Week #5 16
Attributes
Each entity is almost certain to have at least two attributes (often more) At least one of them is a unique identifier for each
record in the entity, called the primary key Other attributes describe significant
characteristics of the entity So, about what aspects of the entity do you want
to retain knowledge? Attributes are also called fields, data items, etc.
ISYS 200 Week #5 17
Characterize Attributes
For each attribute, define its data type: Text (“Fred”) [and the character set (Latin)] Number (real (3.56) or integer (124)) Date and/or time Yes/No (a.k.a. T/F, binary, or Boolean) A fixed set of possible values (e.g. grades)
Formally known as an enumerated list Multimedia: photos, drawings, movies, sounds
ISYS 200 Week #5 18
Attributes & Records
Attributes can be fixed or variable length Length of integer (8-64 bits, in powers of 2), Lengths of text fields (the number of characters
or bytes it contains) Length of real number fields and the number of
decimals (8.1 only has six digits, -12345.6) We won’t worry about attribute lengths
A record is a set of attributes from one entity Typically corresponding to one row in a data table
ISYS 200 Week #5 19
Relevant Data Type Standards
Character sets ISO/IEC 8859 – a set of nine 8-bit character sets,
each used for a different family of languages (Latin, Cyrillic, Greek, Arabic, etc.)
Unicode – a 16-bit character set to represent almost all languages
Representation of dates and times ISO 8601
ISYS 200 Week #5 20
Characterize Attributes
Other traits to look for, beyond the scope of this course: Identify the domain of each attribute
What is the range of allowable values? Determine if there is a default value for
each attribute Is each attribute mandatory (required) for each
entity? (Avoid many mandatory fields) We care: could an attribute be a key?
ISYS 200 Week #5 21
Key Attributes
An attribute or group of attributes may be a unique identifier, or key, for each entity Examples are Social Security Number,
driver’s license number, ISBN, Student ID If a group of attributes is used, it is
a concatenated (or composite or compound) key {Course number, section number, and
term} could form a concatenated key
ISYS 200 Week #5 22
Many Keys Possible
There might be more than one usable key for an entity
Each possible key is called a candidate key One candidate key is selected to be the
primary key (PK) All others are alternate keys (AK)
Example: the electric company may use a customer ID or account number as a primary key, and your phone number as an alternate key
ISYS 200 Week #5 23
Primary Key may be Meaningless
A primary key may correspond to some recognizable attribute SSN, student ID, ISBN, etc.
Or it may be completely meaningless A sequential number, called Order_ID
As long as the primary key is unique for every record, either kind is acceptable
ISYS 200 Week #5 24
Foreign Keys
A foreign key is an attribute which establishes the connection between two entities A Sales entity might have a foreign key attribute
Customer_ID, which points to all of the data in the Customer entity for the particular customer who placed a given sale
If that were the case, the primary key in the Customer entity must be Customer_ID
This type of connection is the heart of relational database modeling (!)
ISYS 200 Week #5 25
Foreign Keys
A foreign key (FK) is an attribute which exists, in an entity other than where it is a primary key (PK), to establish the relationship between the two entities Primary key must be unique for each record, but
a foreign key value may appear many times Only one PK-FK connection is required for
the relationship to exist Entity with FK generally has a PK of its own
ISYS 200 Week #5 26
Other Key Traits
A PK attribute may also be a FK Especially for 1:1 relationships
An associative entity builds a concatenated primary key from more than one entity Uses a diamond shape inside the normal box
to show its special nature
ISYS 200 Week #5 27
Other Relationships
A many-to-many (non-specific) relationship implies a lot of one-to-many relationships Often use an associative entity to bridge
between them An identifying relationship is when a parent
entity’s PK is used as part of the PK for a child entity Child entity is then considered “weak”
because it depends on the parent
ISYS 200 Week #5 28
Summary of Key Traits
The bottom line for keys is: Each entity must have at least one PK
More than one PK implies a concatenated key Alternate keys are completely optional Each entity may have from zero to many FK’s Each FK is a PK in another, related entity Only one PK-FK relationship is needed to relate
two entities Some keys are not inherently meaningful data
ISYS 200 Week #5 29
Data Normalization
Analysis of a data model for implementation is done using data normalization Normalization organizes data attributes to
form simple, non-redundant, flexible, adaptive entities
There are five levels of data normalization, of which three are ever used They build on each other; so to get to second
level you have to comply with first level too
ISYS 200 Week #5 30
First Normal Form (1NF)
An entity is in first normal form if there are no attributes which can have more than one value for each instance (record) of the entity
Attributes which could have more than one value for a given entity belong to a different kind of entity
In other words, every attribute appears only once for each record
ISYS 200 Week #5 31
Second Normal Form (2NF)
Must be first normal form, and: Each non-primary-key attribute is uniquely
determined by the entire primary key Non-primary-key attributes may not be
dependent on only part of the primary key If any are, move them to another table which uses
only that part of the primary key If there are no concatenated keys, second
normal form comes automatically with 1NF
Look at concatenated keys only!
ISYS 200 Week #5 32
Third Normal Form (3NF)
Must be second normal form, and: The value of each non-primary-key attribute
is not dependent upon any other non-primary-key attribute Every attribute depends only on the primary key
The two ways to look for this are derived attributes and transitive dependencies...
ISYS 200 Week #5 33
Third Normal Form (3NF)
Derived attributes (data) are fields calculated or logically derived from other fields Exception: OK to keep attribute if multiple entities
are involved in deriving an attribute Transitive dependencies may exist for
non-concatenated keyed tables; is when a non-key attribute depends on another non-key attribute
ISYS 200 Week #5 34
Third Normal Form (3NF)
Or in brief, for third normal form…
An entity is in third normal form if every non-primary key attribute is dependent on the primary key, the whole primary key, and nothing but the primary key
(as in, “Do you swear to tell the truth…”)
ISYS 200 Week #5 35
Further Normalization
Additional improvement in data structure is possible through “Simplification by Inspection” - look for other redundancies or simplifications possible
Many CASE tools can also inspect for first level normalization, but generally no further
Just for the record, here are the 4th and 5th normal forms I’ve never seen them used in industry
ISYS 200 Week #5 36
Fourth and Fifth Normal Forms
Fourth normal form (4NF) involves removing multivalued dependencies If a pair of records has two matching attributes,
decompose the data structure to remove that Fifth normal form (5NF) involves removing
join dependencies (nearly impossible to do) This is when business rules define a connection
among many entities (e.g. if you replace a tire, you must also replace the valve stem)
Taken from the INFO 605 text, pp. 351-354
ISYS 200 Week #5 37
File Organization
There are many ways to organize files Sequential organization Linked lists Hash files
We won’t go into detail on them You can forget about 4NF and 5NF, too
ISYS 200 Week #5 38
Summary: How to create an ERD
Define the entities about which you’ll want to keep information
Define the relationships between those entities Must have a verb phrase and both cardinalities
Define the attributes for each entity At a minimum, define the name and data type
of each attribute Identify a primary key for each entity Identify foreign keys as needed
Normalize to third normal form (and fix keys as needed)