Chapter VI – Logical Design and Relational Data Model
1 | P a g e Team Crescendo
Logical Database Schema
The relational model gives us a single way to represent data: as a two-dimensional table
called a relation.
1. Schema
Figure 1
The name of a relation and the set of attributes for a relation is called the schema for that
relation. To show the schema of the relation, use the relation name followed by a parenthesized
list of its attributes. Using figure 1 above, we can form the schema:
Movies (Title, Year, Length, Film Type)
The attributes in a relation schema are a set, not a list. However, to talk about a relation
we fix a standard order for its attributes, and we follow that order whenever we display the
relation or any of its rows.
2. Tuples
The rows of a relation, other than the header row containing the attribute names, are called
tuples. A tuple has one component for each attribute. When we want to display a tuple alone,
not as part of the relation, we use commas to separate the components and parentheses to
surround the tuple. For example, the first row of the given relation is:
(Star Wars, 1977, 124, color)
When a tuple appears alone, the attribute names are not displayed, so we should always list
the components in the order in which the attributes were listed in the relation schema.
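The link between a schema's standard order and a bare tuple can be sketched in a few lines of Python. This is only an illustration: the names MOVIES_SCHEMA and as_record are assumptions made for this sketch and are not part of the relational model.

```python
# Hypothetical in-memory sketch of the Movies schema from Figure 1.
# The schema fixes a standard attribute order; a bare tuple follows it.
MOVIES_SCHEMA = ("Title", "Year", "Length", "FilmType")


def as_record(schema, row):
    """Pair each component of a tuple with its attribute name."""
    if len(schema) != len(row):
        raise ValueError("a tuple needs exactly one component per attribute")
    return dict(zip(schema, row))


record = as_record(MOVIES_SCHEMA, ("Star Wars", 1977, 124, "color"))
print(record["Year"])  # 1977
```

Because the tuple carries no attribute names, the meaning of 1977 is recoverable only through its position in the schema.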
3. Domains
The relational data model requires that each component of each tuple should be atomic,
meaning that its values cannot be broken into smaller components.
Each component of a tuple of the relation must hold a value that belongs to the domain
of the corresponding column. For example, tuples of the Movies relation
of Fig. 1 must have a first component that is a string, second and third components that are
integers, and a fourth component whose value is one of the constants color and blackAndWhite.
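A domain check for the Movies example might look like the following sketch. The names FILM_TYPES, DOMAINS and in_domains are hypothetical, and Python types stand in for the domains described above.

```python
# Allowed constants for the fourth component, as stated in the text.
FILM_TYPES = {"color", "blackAndWhite"}

# One membership test per attribute, in schema order:
# Title (string), Year (integer), Length (integer), FilmType (constant).
DOMAINS = (
    lambda v: isinstance(v, str),
    lambda v: isinstance(v, int),
    lambda v: isinstance(v, int),
    lambda v: v in FILM_TYPES,
)


def in_domains(row):
    """True if every component belongs to the domain of its column."""
    return len(row) == len(DOMAINS) and all(
        check(value) for check, value in zip(DOMAINS, row)
    )


print(in_domains(("Star Wars", 1977, 124, "color")))    # True
print(in_domains(("Star Wars", "1977", 124, "color")))  # False
```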
4. Relation Instances
A relation about movies is not static; rather, relations change over time. We expect that
these changes involve the tuples of the relation, such as adding new tuples, editing the
components of the tuples, and deleting the tuples.
A set of tuples for a given relation is called an instance of that relation. For example, the
first three tuples in figure 1 form an instance of relation Movies.
Presumably, the relation Movies has changed over time and will continue to change over
time. For example, in 1980, Movies did not contain the tuples for Mighty Ducks or Wayne's World.
However, a conventional database system maintains only one version of any relation: the set of
tuples that are in the relation "now." This instance of the relation is called the current instance.
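As a rough illustration, an instance can be modelled as a Python set of tuples that changes over time. The variable names are assumptions for this sketch; the two rows use values that appear in this chapter's examples.

```python
# The relation starts with a single tuple.
movies = {("Star Wars", 1977, 124, "color")}
earlier_instance = frozenset(movies)  # snapshot kept only for comparison

# Insertion produces a new instance.
movies.add(("Mighty Ducks", 1991, 104, "color"))
after_insert = frozenset(movies)

# Deletion produces another one; only this last state is "current".
movies.discard(("Star Wars", 1977, 124, "color"))
current_instance = frozenset(movies)

print(len(earlier_instance), len(after_insert), len(current_instance))  # 1 2 1
```

A conventional database system keeps only the last of these states; the snapshots here exist purely to make the change visible.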
Relational Data Model
Model
- a representation of ‘real world’ objects and events, and their associations.
- concentrates on the essential, inherent aspects of an organization and ignores the accidental
properties
Data Model
- an integrated collection of concepts for describing data, relationships between data, and
constraints on the data used by an organization
- attempts to represent the data requirements of the organization, or the part of the
organization that you wish to model.
- provides the basic concepts and notations that will allow database designers and end-users to
communicate their understanding of the organization data unambiguously and accurately
- consists of three components:
(1) a structural part: a set of rules that define how the database is to be constructed
(2) a manipulative part: the types of operations/transactions that are allowed on the data
(including operations used for updating or retrieving data and for changing the structure
of the database)
(3) a set of integrity rules, which ensures that the data is accurate
- the purpose of a data model is to represent data and to make the data understandable
The relational data model is based on the mathematical concept of a relation which is physically
represented as a table. Codd, a trained mathematician, used terminology taken from
mathematics, principally set theory and predicate logic.
Relation – a table with columns and rows
A relational DBMS requires only that the database be perceived by the user as tables
Attribute – a named column of a relation
In a relational model, we use relations to hold information about the objects that we
want to represent in the database.
The rows of the table correspond to individual records and the columns correspond to
the attributes
Attributes can appear in any order and the relation will still be the same relation and
convey the same meaning
Domain – the set of allowable values for one or more attributes
An important feature of the relational model is that every attribute in a relational
database is associated with a domain.
Domains may be distinct for each attribute, or two or more attributes may be associated
with the same domain.
Note that, at any given time, typically there will be values in a domain that don’t
currently appear as values in the corresponding attribute. In other words, a domain
describes possible values for an attribute.
Allows us to define the meaning and source of values that attributes can hold.
More information is available to the system and it can (theoretically) reject operations
that don’t make sense
Tuple – a record of a relation
The fundamental elements of a relation
Relational database – a collection of normalized tables
consists of tables that are appropriately structured
Properties of relational tables
the table has a name that is distinct from all other tables in the database
Each cell of the table contains exactly one value; tables don’t contain repeating
groups of data
A relational table that satisfies this property is said to be normalized (first normal
form)
Each column has a distinct name
The values of a column are all from the same domain
The order of columns has no significance.
Each record is distinct; there are no duplicate records
The order of records has no significance, theoretically.
Relational keys
Each record in a table must be unique; therefore we must be able to identify a column
or combination of columns (relational keys) that provides uniqueness.
Superkey – a column or set of columns that uniquely identifies a record within a table
Candidate key – a superkey that contains only the minimum number of columns
necessary for unique identification
– has two properties: (1) Uniqueness (2) Irreducibility – no proper subset
of the candidate key has the uniqueness property
Primary key – the candidate key that is selected to identify records uniquely within the
table
Foreign key – a column or set of columns within one table that matches the candidate
key of some table (possibly the same table)
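The difference between a superkey and a candidate key can be sketched as follows. The sample rows and column names are hypothetical. Note also that the functions test uniqueness in one instance only, whereas a real key must hold for every possible instance of the table.

```python
from itertools import combinations


def is_superkey(rows, columns):
    """True if no two rows agree on all of the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, all_columns):
    """Superkeys that contain no smaller superkey (irreducibility)."""
    found = []
    # Visiting column sets by increasing size means any reducible
    # superkey already has a minimal subset recorded in `found`.
    for size in range(1, len(all_columns) + 1):
        for cols in combinations(all_columns, size):
            reducible = any(set(k) <= set(cols) for k in found)
            if not reducible and is_superkey(rows, cols):
                found.append(cols)
    return found


# Hypothetical staff rows, invented for this sketch.
rows = [
    {"staff_no": "S1", "name": "Ann", "branch": "B1"},
    {"staff_no": "S2", "name": "Ann", "branch": "B2"},
    {"staff_no": "S3", "name": "Bob", "branch": "B1"},
    {"staff_no": "S4", "name": "Ann", "branch": "B1"},
]

print(is_superkey(rows, ("staff_no", "name")))               # True
print(candidate_keys(rows, ("staff_no", "name", "branch")))  # [('staff_no',)]
```

Here (staff_no, name) is a superkey but not a candidate key, because staff_no alone already identifies each record.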
Representing Relational Databases
A relational database consists of one or more tables. The common convention for
representing a description of a relational database is to give the name of each table,
followed by the column names in parentheses. Normally, the primary key is underlined.
The description of the relational database for the StayHome video rental company is:
Relational Integrity
Since every column has an associated domain, there are constraints (called domain
constraints) in the form of restrictions on the set of values allowed for the columns
of tables.
There are two important integrity rules, which are constraints that apply to all
instances of the database.
1. Entity Integrity
2. Referential Integrity
Nulls
represent a value for a column that is currently unknown or is not applicable for
this record
A way to deal with incomplete or exceptional data
It is not the same as a zero numeric value or a text string filled with spaces, but a
null represents the absence of a value
Entity Integrity
- In a base table no column of a primary key can be null
- A base table is a named table whose records are physically stored in the
database.
Referential Integrity
- If a foreign key exists in a table, either the foreign key value must match a candidate
key value of some record in its home table or the foreign key must be wholly null.
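Both integrity rules can be tried out with Python's built-in sqlite3 module. The sketch below makes several assumptions: the branch and staff tables and their columns are invented for illustration, SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and the primary key columns are declared NOT NULL explicitly because SQLite does not always imply it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE branch (branch_no TEXT NOT NULL PRIMARY KEY)")
conn.execute(
    "CREATE TABLE staff ("
    " staff_no TEXT NOT NULL PRIMARY KEY,"
    " branch_no TEXT REFERENCES branch(branch_no))"
)
conn.execute("INSERT INTO branch VALUES ('B1')")


def try_insert(staff_no, branch_no):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?)", (staff_no, branch_no))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("S1", "B1"),  # foreign key matches an existing branch
    try_insert("S2", None),  # wholly null foreign key is allowed
    try_insert("S3", "B9"),  # no such branch: referential integrity
    try_insert(None, "B1"),  # null primary key: entity integrity
]
print(results)  # ['accepted', 'accepted', 'rejected', 'rejected']
```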
Advantages:
1. Ease of use: The representation of information as tables consisting of rows and columns
is much easier to understand.
2. Flexibility: Different tables from which information has to be linked and extracted can
be easily manipulated by operators such as project and join to give information in the
form in which it is desired.
3. Precision: The usage of relational algebra and relational calculus in the manipulation of
the relations between the tables ensures that there is no ambiguity, which may otherwise
arise in establishing the linkages in a complicated network type database.
4. Security: Security control and authorization can also be implemented more easily by
moving sensitive attributes in a given table into a separate relation with its own
authorization controls. If authorization requirement permits, a particular attribute could
be joined back with others to enable full information retrieval.
5. Data Independence: Data independence is achieved more easily with the normalized
structure used in a relational database than in the more complicated tree or network
structure.
6. Data Manipulation Language: Responding to queries by means of a language based on
relational algebra and relational calculus, e.g. SQL, is easy in the relational database
approach. For data organized in other structures, the query language either becomes
complex or is extremely limited in its capabilities.
Disadvantages :
1. Performance: A major constraint, and therefore disadvantage, in the use of a relational
database system is machine performance. If the number of tables between which
relationships must be established is large, and the tables themselves are large,
performance in responding to SQL queries degrades.
2. Physical Storage Consumption: With an interactive system, an operation such as join
also depends on the physical storage. It is therefore common to tune relational
databases, choosing the physical data layout so as to give good performance in the most
frequently run operations. As a natural result, the less frequently run operations tend
to become even slower.
3. Slow extraction of meaning from data: if the data is naturally organized in a hierarchical
manner and stored as such, the hierarchical approach may give quick meaning for that
data.
Concept of Normalization
Normalization of Database
Normalization is a systematic approach of decomposing tables to eliminate data redundancy and
undesirable characteristics such as insertion, update, and deletion anomalies. It is a multi-step
process that puts data into tabular form by removing duplicated data from the relation tables.
Uses of Normalization
1. Eliminating redundant (useless) data
2. Ensuring data dependencies make sense i.e. data is logically stored
Without normalization, it becomes difficult to handle and update the database without facing
data loss. Insertion, update, and deletion anomalies are very frequent if the database is not
normalized. To understand these anomalies, let us take the example of a Student table.
Student table:
Update Anomaly: To update the address of a student who occurs twice or more in the table,
we have to update the S_Address column in all of those rows; otherwise the data becomes
inconsistent.
Insertion Anomaly: Suppose that for a new admission we have the student id (S_id), name, and
address of a student, but the student has not opted for any subjects yet. We then have to
insert NULL for the subject, leading to an insertion anomaly.
Deletion Anomaly: If the student with S_id 401 has only one subject and temporarily drops it,
deleting that row deletes the entire student record along with it.
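The update and deletion anomalies can be reproduced with a small in-memory table. In this sketch only S_id and S_Address are column names taken from the text; S_Name, Subject and every row value are assumptions invented for illustration.

```python
# Unnormalized Student table: one row per (student, subject) pair.
student = [
    {"S_id": 401, "S_Name": "Adam", "S_Address": "Old Street", "Subject": "Math"},
    {"S_id": 402, "S_Name": "Alex", "S_Address": "Hill Road", "Subject": "Math"},
    {"S_id": 402, "S_Name": "Alex", "S_Address": "Hill Road", "Subject": "Physics"},
]

# Update anomaly: changing only one of the two rows for student 402
# leaves the table holding two different addresses for the same person.
student[1]["S_Address"] = "Lake View"
addresses_402 = {row["S_Address"] for row in student if row["S_id"] == 402}
print(sorted(addresses_402))  # ['Hill Road', 'Lake View']

# Deletion anomaly: student 401 has a single row, so removing the
# subject removes every trace of the student as well.
after_drop = [
    row for row in student
    if not (row["S_id"] == 401 and row["Subject"] == "Math")
]
ids_left = {row["S_id"] for row in after_drop}
print(401 in ids_left)  # False
```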
Normalization Rule
Normalization rules are divided into the following normal forms.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
First Normal Form (1NF)
A row of data cannot contain a repeating group of data; that is, each column must hold a single
value. Each row must also have a unique identifier, the primary key. For example, consider a table
that is not in First Normal Form.
You can see that the student name Adam appears twice in the table and that the subject math is
also repeated. This violates First Normal Form. To reduce the table to First Normal Form,
break it into two different tables.
In the Student table, the primary key is a concatenated key that includes subject_id. Now both
the Student table and the Subject table are normalized to First Normal Form.
Second Normal form (2NF)
A table in Second Normal Form must meet all the requirements of First Normal Form, and no
column may have a partial dependency on the primary key. That is, for a table with a
concatenated primary key, each column that is not part of the primary key must depend on the
entire concatenated key. If any column depends on only one part of the concatenated key, the
table fails Second Normal Form. For example, consider a table that is not in Second Normal Form.
In customer table concatenation of Customer_id and Order_id is the primary key. This table is in
First Normal form but not in Second Normal form because there are partial dependencies of
columns on primary key. Customer_Name is only dependent on customer_id, Order_name is
dependent on Order_id and there is no link between sale_detail and Customer_name.
To reduce Customer table to Second Normal form break the table into following three different
tables.
Denormalization
Databases intended for online transaction processing (OLTP) are typically more normalized
than databases intended for online analytical processing (OLAP). OLTP applications are
characterized by a high volume of small transactions such as updating a sales record at a
supermarket checkout counter. The expectation is that each transaction will leave the database in
a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly"
databases. Denormalization is also used to improve performance on smaller computers as in
computerized cash-registers and mobile devices, since these may use the data for look-up only
(e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such
as Palm), or no changes are to be made to the data and a swift response is crucial.
Some Good Reasons Not To Normalize
That said, there are some good reasons not to normalize your database. Let’s look at a few:
1. Joins are expensive. Normalizing your database often involves creating lots of tables. In fact,
you can easily wind up with what might seem like a simple query spanning five or ten tables.
If you’ve ever tried doing a five-table join, you know that it works in principle, but it’s
painstakingly slow in practice. If you’re building a web application that relies upon multiple-
join queries against large tables, you might find yourself thinking: “If only this database wasn’t
normalized!” When you hear that thought in your head, it’s a good time to consider
denormalizing. If you can stick all of the data used by that query into a single table without
really jeopardizing your data integrity, go for it! Be a rebel and denormalize your database.
You won’t look back!
2. Normalized design is difficult. If you’re working with a complex database schema, you’ll
probably find yourself banging your head against the table over the complexity of
normalization. As a simple rule of thumb, if you’ve been banging your head against the table
for an hour or two trying to figure out how to move to the fourth normal form, you might be
taking normalization too far. Step back and ask yourself if it’s really worth continuing.
3. Quick and dirty should be quick and dirty. If you’re just developing a prototype, just do
whatever works quickly. Really. It’s OK. Rapid application development is sometimes more
important than elegant design. Just remember to go back and take a careful look at your
design once you’re ready to move beyond the prototyping phase. The price you pay for a quick
and dirty database design is that you might need to throw it away and start over when it’s time
to build for production.
Five Basic Normal Forms
I. First Normal Form
An entity is in the first normal form if it contains no repeating groups. In relational terms,
a table is in the first normal form if it contains no repeating columns. Repeating columns make
your data less flexible, waste disk space, and make it more difficult to search for data.
Example: In the telephone directory, it appears that the name table contains repeating
columns, child1, child2, and child3.
You can see some problems in the current table. The table always reserves space on the
disk for three child records, whether the person has children or not. The maximum number of
children that you can record is three, but some of your acquaintances might have four or more
children. To look for a particular child, you have to search all three columns in every row.
To eliminate the repeating columns and bring the table to the first normal form, separate
the table into two tables. Put the repeating columns into one of the tables. The association
between the two tables is established with a primary-key and foreign-key combination. Because
a child cannot exist without an association in the name table, you can reference the name table
with a foreign key, rec_num.
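The split described above can be sketched as a small function. The column names rec_num and child1 to child3 come from the example; the name and child_name columns, the function name and the sample rows are assumptions made for this sketch.

```python
# Hypothetical rows of the unnormalized name table.
wide_rows = [
    {"rec_num": 1, "name": "Ana", "child1": "Ben", "child2": "Cy", "child3": None},
    {"rec_num": 2, "name": "Dee", "child1": None, "child2": None, "child3": None},
]


def to_first_normal_form(rows):
    """Split repeating child columns into a separate child table."""
    names, children = [], []
    for row in rows:
        names.append({"rec_num": row["rec_num"], "name": row["name"]})
        for column in ("child1", "child2", "child3"):
            if row[column] is not None:
                # rec_num acts as the foreign key back to the name table.
                children.append(
                    {"rec_num": row["rec_num"], "child_name": row[column]}
                )
    return names, children


names, children = to_first_normal_form(wide_rows)
print(len(names), len(children))  # 2 2
```

The child table now stores one row per child, so a person with no children takes no extra space and a person with four or more children is no longer a special case.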
II. Second Normal Form
An entity is in second normal form if each attribute that is not in the primary key
provides a fact that depends on the entire key. A violation of the second normal form occurs
when a non-primary key attribute is a fact about a subset of a composite key.
Example: An inventory entity records quantities of specific parts that are stored at particular
warehouses.
Here, the primary key consists of the PART and the WAREHOUSE attributes together.
Because the attribute WAREHOUSE_ADDRESS depends only on the value of WAREHOUSE, the
entity violates the rule for second normal form. This design causes several problems:
- Each instance for a part that this warehouse stores repeats the address of the
warehouse.
- If the address of the warehouse changes, every instance referring to a part that is stored
in that warehouse must be updated.
- Because of the redundancy, the data might become inconsistent. Different instances
could show different addresses for the same warehouse.
- If at any time the warehouse has no stored parts, the address of the warehouse might
not exist in any instances in the entity.
To satisfy second normal form, the information in the figure above would be in two entities.
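One way to picture the two resulting entities is to project the original rows onto two column sets. The attribute names PART, WAREHOUSE and WAREHOUSE_ADDRESS come from the example; QUANTITY, the project helper and all row values are assumptions for this sketch.

```python
# Hypothetical rows of the inventory entity before decomposition.
inventory = [
    {"PART": "P1", "WAREHOUSE": "W1", "QUANTITY": 10, "WAREHOUSE_ADDRESS": "1 Dock Rd"},
    {"PART": "P2", "WAREHOUSE": "W1", "QUANTITY": 5, "WAREHOUSE_ADDRESS": "1 Dock Rd"},
    {"PART": "P1", "WAREHOUSE": "W2", "QUANTITY": 7, "WAREHOUSE_ADDRESS": "9 Pier St"},
]


def project(rows, columns):
    """Keep only the given columns and drop duplicate results."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


# Facts about the whole key (PART, WAREHOUSE) stay together ...
inventory_2nf = project(inventory, ("PART", "WAREHOUSE", "QUANTITY"))
# ... and the fact about WAREHOUSE alone moves to its own entity.
warehouse = project(inventory, ("WAREHOUSE", "WAREHOUSE_ADDRESS"))

print(len(inventory_2nf), len(warehouse))  # 3 2
```

Each warehouse address is now stored once, so a change of address touches a single row.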
III. Third Normal Form
An entity is in third normal form if each non-primary key attribute provides a fact that is
independent of other non-key attributes and depends only on the key. A violation of the third
normal form occurs when a non-primary key attribute is a fact about another non-key attribute.
Example: The first entity in the following figure contains the attributes EMPLOYEE_NUMBER and
DEPARTMENT_NUMBER. Suppose that a program or user adds an attribute,
DEPARTMENT_NAME, to the entity. The new attribute depends on DEPARTMENT_NUMBER,
whereas the primary key is on the EMPLOYEE_NUMBER attribute. The entity now violates third
normal form.
Changing the DEPARTMENT_NAME value based on the update of a single employee,
David Brown, does not change the DEPARTMENT_NAME value for other employees in that
department. The updated version of the entity illustrates the resulting inconsistency.
Additionally, updating the DEPARTMENT_NAME in this table does not update it in any other
table that might contain a DEPARTMENT_NAME column.
You can normalize the entity by modifying the EMPLOYEE_DEPARTMENT entity and
creating two new entities: EMPLOYEE and DEPARTMENT. The DEPARTMENT entity contains
attributes for DEPARTMENT_NUMBER and DEPARTMENT_NAME. Now, an update such as
changing a department name is much easier. You need to make the update only to the
DEPARTMENT entity.
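A minimal sketch of the normalized design, using Python structures in place of tables. The attribute names follow the example; the employee numbers, department number and department names are invented for illustration.

```python
# EMPLOYEE entity: the department is recorded by number only.
employee = [
    {"EMPLOYEE_NUMBER": 1, "DEPARTMENT_NUMBER": 10},
    {"EMPLOYEE_NUMBER": 2, "DEPARTMENT_NUMBER": 10},
]

# DEPARTMENT entity: DEPARTMENT_NUMBER -> DEPARTMENT_NAME, stored once.
department = {10: "Sales"}


def department_name(employee_number):
    """Look up an employee's department name through the DEPARTMENT entity."""
    for row in employee:
        if row["EMPLOYEE_NUMBER"] == employee_number:
            return department[row["DEPARTMENT_NUMBER"]]
    raise KeyError(employee_number)


# Renaming the department is a single update ...
department[10] = "Marketing"
# ... and every employee in it sees the same new name.
print(department_name(1), department_name(2))  # Marketing Marketing
```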
IV. Fourth Normal Form
An entity is in fourth normal form if no instance contains two or more independent,
multi-valued facts about an entity.
Example: Consider the EMPLOYEE entity. Each instance of EMPLOYEE could have both
SKILL_CODE and LANGUAGE_CODE. An employee can have several skills and know several
languages. Two relationships exist, one between employees and skills, and one between
employees and languages. An entity is not in fourth normal form if it represents both
relationships.
Instead, you can avoid this violation by creating two entities, one for each relationship.
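The cost of keeping both relationships in one entity can be sketched by counting rows. The sketch assumes one common way of storing two independent multi-valued facts together, namely one row per skill and language combination; the employee, skill and language codes are hypothetical.

```python
from itertools import product

# Hypothetical SKILL_CODE and LANGUAGE_CODE values for one employee.
skills = ["S1", "S2"]
languages = ["L1", "L2", "L3"]

# One entity for both relationships: every skill must be paired with
# every language so that no false association is implied.
combined = {("E1", s, l) for s, l in product(skills, languages)}

# Two entities, one per relationship.
employee_skill = {("E1", s) for s in skills}
employee_language = {("E1", l) for l in languages}

print(len(combined))                                # 6
print(len(employee_skill) + len(employee_language))  # 5
```

The gap widens quickly: adding one more skill adds three rows to the combined entity but only one row to the separate design.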
V. Fifth Normal Form
Fifth normal form deals with cases where information can be reconstructed from smaller
pieces of information that can be maintained with less redundancy. Second, third, and fourth
normal forms also serve this purpose, but fifth normal form generalizes to cases not covered by
the others.
Example: If agents represent companies, companies make products, and agents sell products,
then we might want to keep a record of which agent sells which product for which company.
This information could be kept in one record type with three fields:
In this case, it turns out that we can reconstruct all the true facts from a normalized form
consisting of three separate record types, each containing two fields:
Roughly speaking, we may say that a record type is in fifth normal form when its
information content cannot be reconstructed from several smaller record types. If a record type
can only be decomposed into smaller records which all have the same key, then the record type
is considered to be in fifth normal form without decomposition. A record type in fifth normal
form is also in fourth, third, second, and first normal forms.
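The reconstruction can be checked mechanically. In the sketch below the agent, company and product codes are hypothetical, and the sample rows were chosen so that the rule holds: an agent who represents a company and sells a product that the company makes also sells that product for that company.

```python
# One record type with three fields: (agent, company, product).
sells = {
    ("A1", "C1", "P1"), ("A1", "C1", "P2"),
    ("A1", "C2", "P1"), ("A1", "C2", "P2"),
    ("A2", "C1", "P1"),
}

# Three smaller record types, each with two fields.
agent_company = {(a, c) for a, c, p in sells}
company_product = {(c, p) for a, c, p in sells}
agent_product = {(a, p) for a, c, p in sells}

# Join the three pieces back together.
rejoined = {
    (a, c, p)
    for a, c in agent_company
    for c2, p in company_product
    if c == c2 and (a, p) in agent_product
}

print(rejoined == sells)  # True
```

If the rule did not hold for the data, the join would in general produce extra rows, and the three-field record type could not be replaced by the three smaller ones.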
Transforming E-R Diagrams into Relations
It is useful to transform the conceptual data model into a set of normalized relations.
Steps:
1. Represent entities
2. Represent relationships
3. Normalize the relations
4. Merge the relations
In translating a relationship set to a relation, attributes of the relation must include:
- The primary key for each participating entity set
- All descriptive attributes of the relationship set
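The rule above can be sketched as a small helper that assembles the schema of a relationship's relation. The function name, its parameters and the renaming scheme (prefixing with the entity set name on a clash) are assumptions made for this illustration.

```python
def relationship_schema(participants, descriptive=()):
    """Build a relation schema for a relationship set.

    participants: list of (entity_set_name, key_attributes) pairs.
    descriptive:  attributes that belong to the relationship itself.
    """
    schema = []
    for entity, keys in participants:
        for attr in keys:
            # Rename only when the bare attribute name is already taken.
            name = attr if attr not in schema else f"{entity}_{attr}"
            schema.append(name)
    schema.extend(descriptive)
    return tuple(schema)


owns = relationship_schema(
    [("Movies", ("title", "year")), ("Studios", ("name",))]
)
print(owns)  # ('title', 'year', 'name')
```

In practice the studio's key would usually be renamed to something more descriptive, such as studioName, as the examples in this chapter do.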
From E/R Diagrams to Relational Designs
From Entity Sets to Relations
- For each non-weak entity set, create a relation of the same name and with the same
set of attributes.
Example: (Entity = Stars)
Name            Address
Carrie Fisher   123 Maple St., Hollywood
Mark Hamill     456 Oak Rd., Brentwood
From E/R Relationships to Relations
- For each entity set involved in relationship R, we take its key attribute or attributes as
part of the schema of the relation for R.
- If the relationship has attributes, then these are also attributes of relation R
Example: (Relationship = Owns)
Title          Year   studioName
Star Wars      1977   Fox
Mighty Ducks   1991   Disney
Combining Relations
Example: (Combining relation Movies with relation Owns)
Title          Year   Length   filmType   studioName
Star Wars      1977   124      color      Fox
Mighty Ducks   1991   104      color      Disney
Handling Weak Entity Sets
If W is a weak entity set, construct for W a relation whose schema consists of:
1. All attributes of the weak entity set W.
2. All attributes of the supporting relationships for W.
3. For each supporting relationship for W, say a many-one relationship from W to an entity
set E, all the key attributes of E.
Rename attributes, if necessary, to avoid name conflicts.
Do not construct a relation for any supporting relationship for W.
Example:
Schema for Relation Contracts:
Contracts (starName, studioName, title, year, salary)
Converting Subclass Structures to Relations
Principal conversion strategies:
1. Follow the E/R viewpoint. For each entity set E in the hierarchy, create a relation that
includes the key attributes from the root and any attributes belonging to E.
2. Treat entities as objects belonging to a single class. For each possible subtree including
the root, create one relation, whose schema includes all the attributes of all the entity sets
in the subtree.
3. Use null values. Create one relation with all the attributes of all the entity sets in the
hierarchy. Each entity is represented by one tuple, and that tuple has a null value for
whatever attributes the entity does not have.
E/R – Style Conversion
- Create a relation for each entity set. If the entity set E is not the root of the hierarchy,
then the relation for E will include the key attributes at the root, to identify the entity
represented by each tuple, plus all the attributes of E.
Example:
1. Movies (title, year, length, filmType)
2. MurderMysteries (title, year, weapon)
3. Cartoons (title, year)
An Object-Oriented Approach
- An alternate strategy for converting isa-hierarchies to relations is to enumerate all
the possible subtrees of the hierarchy.
- For each, create one relation that represents entities that have components in exactly
those subtrees; the schema for this relation has all the attributes of any entity set in
the subtree.
Example: (refer to image in previous example)
Four possible subtrees including the root:
1. Movie alone.
Movies (title, year, length, filmType)
2. Movies and Cartoons only.
MoviesC (title, year, length, filmType)
3. Movies and Murder-Mysteries only.
MoviesMM (title, year, length, filmType, weapon)
4. All three entity sets.
MoviesCMM (title, year, length, filmType, weapon)
We can combine Movies with MoviesC and MoviesMM with MoviesCMM, although
doing so loses some information.
Using Null Values to Combine Relations
- If we are allowed to use NULL as a value in tuples, we can handle a hierarchy of entity
sets with a single relation. This relation has all the attributes belonging to any entity
set of the hierarchy.
Example: (based on the previous examples)
Movie (title, year, length, filmType, weapon)
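A sketch of the single-relation approach, with Python's None standing in for NULL. The second row is entirely hypothetical and the helper name is an assumption for this illustration.

```python
MOVIE_SCHEMA = ("title", "year", "length", "filmType", "weapon")

movie = [
    # Not a murder mystery, so the weapon component is NULL.
    ("Star Wars", 1977, 124, "color", None),
    # Hypothetical murder mystery, invented for this sketch.
    ("Example Mystery", 2000, 90, "color", "knife"),
]


def attributes_present(row):
    """Names of the attributes for which the tuple holds a real value."""
    return {
        name for name, value in zip(MOVIE_SCHEMA, row) if value is not None
    }


print("weapon" in attributes_present(movie[0]))  # False
print("weapon" in attributes_present(movie[1]))  # True
```

One limitation shows up here: since Cartoons adds no attributes of its own, a cartoon cannot be told apart from an ordinary movie in this single relation without some further marker.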
Combining Relations:
Relations are sets. Therefore, set operations (∪, ∩, −) can be applied to relations with respect to
the underlying sets to form a new relation.
Let
A= {1, 2, 3} and
B= {1, 2, 3, 4}
The relation R1 is on A and the relation R2 is on B:
R1= {(1, 1), (2, 2), (3, 3)} and
R2= {(1, 1), (1, 2), (1, 3), (1, 4)}
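Since relations are sets of tuples, the three operations can be tried directly on R1 and R2 with Python's built-in set type. The variable names are assumptions for this sketch.

```python
R1 = {(1, 1), (2, 2), (3, 3)}
R2 = {(1, 1), (1, 2), (1, 3), (1, 4)}

union = R1 | R2          # tuples in R1 or R2
intersection = R1 & R2   # tuples in both
difference = R1 - R2     # tuples in R1 but not in R2

print(sorted(union))
# [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (3, 3)]
print(sorted(intersection))  # [(1, 1)]
print(sorted(difference))    # [(2, 2), (3, 3)]
```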
Because a movie has several stars, we are forced to repeat all the information about the movie
once for each star: the length of Star Wars is repeated three times, once for each star, as
is the fact that the movie is owned by Fox. This redundancy is undesirable, so a relation that
has been combined in this way should be split again to remove the redundancy.