
Chapter VI – Logical Design and Relational Data Model

Team Crescendo

Logical Database Schema

The relational model gives us a single way to represent data: as a two-dimensional table called a relation.

1. Schema

Figure 1: The relation Movies

The name of a relation together with its set of attributes is called the schema for that relation. To show the schema of a relation, write the relation name followed by a parenthesized list of its attributes. Using Figure 1 above, we can form the schema:

Movies (Title, Year, Length, Film Type)

The attributes in a relation schema form a set, not a list. However, once a standard order of the attributes has been fixed, that order must be followed whenever we display the relation or any of its rows.

2. Tuples

The rows of a relation, other than the header row containing the attribute names, are called tuples. A tuple has one component for each attribute. When we want to display a tuple alone, not as part of the relation, we use commas to separate the components and parentheses to surround the tuple. For example, the first row of the relation in Figure 1 is:

(Star Wars, 1977, 124, color)

Because the attribute names are not displayed alongside a lone tuple, we must always list its components in the order in which the attributes appear in the relation schema.

3. Domains

The relational data model requires that each component of each tuple be atomic, meaning that its value cannot be broken into smaller components.

Each component of any tuple of the relation must have a value that belongs to the domain of the corresponding column. For example, tuples of the Movies relation in Figure 1 must have a first component that is a string, second and third components that are integers, and a fourth component whose value is one of the constants color and blackAndWhite.
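To make schema, tuple, and domain concrete, here is a minimal Python sketch. The representation is an assumption of ours, not something the relational model prescribes: the schema is a tuple of attribute names, each domain is modeled as a predicate on a single value, and a tuple's components are matched to attributes purely by position.

```python
# Hypothetical representation: the schema is a tuple of attribute names, and
# each domain is modeled as a predicate that accepts or rejects one value.
MOVIES_SCHEMA = ("Title", "Year", "Length", "Film Type")
MOVIES_DOMAINS = {
    "Title": lambda v: isinstance(v, str),
    "Year": lambda v: isinstance(v, int),
    "Length": lambda v: isinstance(v, int),
    "Film Type": lambda v: v in {"color", "blackAndWhite"},
}


def in_domains(tup, schema=MOVIES_SCHEMA, domains=MOVIES_DOMAINS):
    """True if tup has one component per attribute and each lies in its domain.

    Components are matched to attributes by position only, so the tuple must
    list its values in the order the schema lists the attributes.
    """
    if len(tup) != len(schema):
        return False
    return all(domains[attr](value) for attr, value in zip(schema, tup))


print(in_domains(("Star Wars", 1977, 124, "color")))   # True
print(in_domains((1977, "Star Wars", 124, "color")))   # False: components out of order
print(in_domains(("Star Wars", 1977, 124, "sepia")))   # False: not a valid film type
```

The second call shows why the order of a lone tuple matters: the same values in a different order no longer fit the domains.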

4. Relation Instances

A relation about movies is not static; relations change over time. We expect these changes to involve the tuples of the relation: inserting new tuples, changing components of existing tuples, and deleting tuples.

A set of tuples for a given relation is called an instance of that relation. For example, the first three tuples in Figure 1 form an instance of the relation Movies.

Presumably, the relation Movies has changed over time and will continue to change. For example, in 1980, Movies did not contain the tuples for Mighty Ducks or Wayne's World. However, a conventional database system maintains only one version of any relation: the set of tuples that are in the relation "now." This instance of the relation is called the current instance.


Relational Data Model

Model

- a representation of ‘real world’ objects and events, and their associations.

- concentrates on the essential, inherent aspects of an organization and ignores the accidental properties

Data Model

- an integrated collection of concepts for describing data, relationships between data, and constraints on the data used by an organization

- attempts to represent the data requirements of the organization, or the part of the organization that you wish to model

- provides the basic concepts and notations that allow database designers and end-users to communicate their understanding of the organization's data unambiguously and accurately

- consists of three components: (1) a structural part, the set of rules that define how the database is to be constructed; (2) a manipulative part, defining the types of operations (transactions) that are allowed on the data, including operations used for updating or retrieving data and for changing the structure of the database; and (3) a set of integrity rules, which ensure that the data is accurate

- the purpose of a data model is to represent data and to make the data understandable

The relational data model is based on the mathematical concept of a relation, which is physically represented as a table. Codd, a trained mathematician, used terminology taken from mathematics, principally set theory and predicate logic.

Relation – a table with columns and rows

A relational DBMS requires only that the database be perceived by the user as tables

Attribute – a named column of a relation

In the relational model, we use relations to hold information about the objects that we want to represent in the database.

The rows of the table correspond to individual records and the columns correspond to the attributes.

Attributes can appear in any order and the relation will still be the same relation and convey the same meaning.

Domain – the set of allowable values for one or more attributes

An important feature of the relational model is that every attribute in a relational database is associated with a domain.

Domains may be distinct for each attribute, or two or more attributes may be associated with the same domain.


Note that, at any given time, there will typically be values in a domain that do not currently appear as values in the corresponding attribute. In other words, a domain describes the possible values for an attribute.

Domains allow us to define the meaning and source of the values that attributes can hold.

More information is available to the system, so it can (theoretically) reject operations that do not make sense.


Tuple – a record of a relation

The fundamental elements of a relation

Relational database – a collection of normalized tables

consists of tables that are appropriately structured

Properties of relational tables

The table has a name that is distinct from all other tables in the database.

Each cell of the table contains exactly one value; tables do not contain repeating groups of data.

A relational table that satisfies this property is said to be normalized (in first normal form).

Each column has a distinct name.

The values of a column are all from the same domain.

The order of columns has no significance.

Each record is distinct; there are no duplicate records.

The order of records has no significance, theoretically.
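Several of these properties follow from treating a relation as a mathematical set. The Python sketch below assumes an instance is modeled as a set of tuples; it shows that a duplicate record collapses, that record order is irrelevant, and that column order stops mattering once each value is paired with its attribute name. The helper `as_records` is hypothetical.

```python
# An instance modeled as a Python set of tuples (a representation assumed for illustration).
rows_a = {("Star Wars", 1977, 124, "color"), ("Mighty Ducks", 1991, 104, "color")}

# Same records in a different order, with one record written twice.
rows_b = {
    ("Mighty Ducks", 1991, 104, "color"),
    ("Star Wars", 1977, 124, "color"),
    ("Star Wars", 1977, 124, "color"),
}

print(rows_a == rows_b)  # True: record order is irrelevant and the duplicate collapses
print(len(rows_b))       # 2


def as_records(rows, schema):
    """Pair every value with its attribute name, so column order no longer matters."""
    return {frozenset(zip(schema, row)) for row in rows}


# The same instance with its columns stored in a different order.
rows_c = {(year, title, film_type, length) for (title, year, length, film_type) in rows_a}
print(as_records(rows_a, ("Title", "Year", "Length", "Film Type"))
      == as_records(rows_c, ("Year", "Title", "Film Type", "Length")))  # True
```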

Relational keys

Each record in a table must be unique; therefore, we must be able to identify a column or combination of columns (a relational key) that provides uniqueness.

Superkey – a column or set of columns that uniquely identifies a record within a table


Candidate key – a superkey that contains only the minimum number of columns necessary for unique identification. A candidate key has two properties: (1) uniqueness and (2) irreducibility, meaning that no proper subset of the candidate key has the uniqueness property.

Primary key – the candidate key that is selected to identify records uniquely within the table

Foreign key – a column or set of columns within one table that matches the candidate key of some table (possibly the same table)
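Superkeys and candidate keys can be tested mechanically against a table's contents. The sketch below is illustrative only: `is_superkey` and `candidate_keys` are hypothetical helpers, and the third row is a made-up remake added so that Title alone is not unique. Because it inspects a single instance, such a check can rule a key out but never prove one; on this tiny sample (Year, Length) also passes by accident, which is why keys are chosen from the meaning of the data rather than from its current contents.

```python
from itertools import combinations

SCHEMA = ("Title", "Year", "Length", "Film Type")
ROWS = [
    ("Star Wars", 1977, 124, "color"),
    ("Mighty Ducks", 1991, 104, "color"),
    ("Star Wars", 1991, 124, "color"),  # hypothetical remake, added for illustration
]


def is_superkey(cols, schema, rows):
    """True if no two rows agree on all the columns in cols (this instance only)."""
    idx = [schema.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(projected) == len(set(projected))


def candidate_keys(schema, rows):
    """Minimal superkeys: unique on this instance, and no proper subset is a key."""
    keys = []
    for size in range(1, len(schema) + 1):
        for cols in combinations(schema, size):
            if is_superkey(cols, schema, rows) and not any(set(k) < set(cols) for k in keys):
                keys.append(cols)
    return keys


print(is_superkey(("Title",), SCHEMA, ROWS))  # False: "Star Wars" appears twice
print(candidate_keys(SCHEMA, ROWS))           # [('Title', 'Year'), ('Year', 'Length')]
```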

Representing Relational Databases

A relational database consists of one or more tables. The common convention for representing a description of a relational database is to give the name of each table, followed by the column names in parentheses. Normally, the primary key is underlined.

The description of the relational database for the StayHome video rental company is:

Relational Integrity

Since every column has an associated domain, there are constraints (called domain constraints) in the form of restrictions on the set of values allowed for the columns of tables.

There are two important integrity rules, which are constraints that apply to all instances of the database:


1. Entity Integrity

2. Referential Integrity

Nulls

Represent a value for a column that is currently unknown or is not applicable to this record

A way to deal with incomplete or exceptional data

A null is not the same as a numeric zero or a text string filled with spaces; it represents the absence of a value.

Entity Integrity

- In a base table, no column of a primary key can be null.

- A base table is a named table whose records are physically stored in the database.

Referential Integrity

- If a foreign key exists in a table, either the foreign key value must match a candidate key value of some record in its home table or the foreign key must be wholly null.
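Entity integrity, referential integrity, and domain constraints can all be declared in SQL. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the `Studio` and `Movies` tables and the rejected rows are hypothetical. Two SQLite specifics are assumed: foreign-key enforcement must be switched on with `PRAGMA foreign_keys = ON`, and primary-key columns need an explicit `NOT NULL`, because SQLite otherwise accepts NULLs in most primary-key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key enforcement off by default
conn.execute("CREATE TABLE Studio (name TEXT NOT NULL PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Movies (
        title      TEXT    NOT NULL,  -- NOT NULL is spelled out because SQLite,
        year       INTEGER NOT NULL,  -- unlike standard SQL, tolerates NULL keys here
        filmType   TEXT    CHECK (filmType IN ('color', 'blackAndWhite')),  -- domain
        studioName TEXT    REFERENCES Studio (name),                        -- foreign key
        PRIMARY KEY (title, year)
    )
""")
conn.execute("INSERT INTO Studio VALUES ('Fox')")


def accepted(row):
    """Try to insert one Movies row; True if stored, False if an integrity rule rejected it."""
    try:
        conn.execute("INSERT INTO Movies VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [accepted(r) for r in [
    ("Star Wars", 1977, "color", "Fox"),        # valid row
    ("Star Wars", 1977, "color", "Fox"),        # duplicate primary key
    (None, 1977, "color", "Fox"),               # entity integrity: NULL in the primary key
    ("Mighty Ducks", 1991, "color", "Disney"),  # referential integrity: no such studio
    ("Mighty Ducks", 1991, "color", None),      # a wholly NULL foreign key is allowed
    ("Star Wars", 1980, "sepia", None),         # domain constraint (hypothetical bad row)
]]
print(results)  # [True, False, False, False, True, False]
```

Each rejected insert raises `sqlite3.IntegrityError`, so it is the DBMS, not the application code, that enforces the rules.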

Advantages:

1. Ease of use: Representing information as tables consisting of rows and columns makes it much easier to understand.

2. Flexibility: Different tables from which information has to be linked and extracted can be easily manipulated by operators such as project and join to give information in the form in which it is desired.

3. Precision: The use of relational algebra and relational calculus in manipulating the relations between tables ensures that there is no ambiguity, which might otherwise arise in establishing the linkages in a complicated network-type database.

4. Security: Security control and authorization can also be implemented more easily by moving sensitive attributes in a given table into a separate relation with its own authorization controls. If authorization requirements permit, a particular attribute could be joined back with the others to enable full information retrieval.


5. Data Independence: Data independence is achieved more easily with the normalized structure used in a relational database than with the more complicated tree or network structures.

6. Data Manipulation Language: Responding to queries by means of a language based on relational algebra and relational calculus (e.g., SQL) is easy in the relational database approach. For data organized in other structures, the query language either becomes complex or is extremely limited in its capabilities.

Disadvantages:

1. Performance: A major constraint, and therefore a disadvantage, in the use of a relational database system is machine performance. If the number of tables between which relationships must be established is large, and the tables themselves are large, performance in responding to SQL queries suffers.

2. Physical Storage Consumption: In an interactive system, the performance of an operation such as join also depends on the physical storage. It is therefore common to tune relational databases, choosing the physical data layout so as to give good performance for the most frequently run operations. As a result, the less frequently run operations tend to become even slower.

3. Slow extraction of meaning from data: If the data is naturally organized in a hierarchical manner and stored as such, a hierarchical approach may extract meaning from that data more quickly.


Concept of Normalization

Normalization of Database

Normalization is a systematic approach to decomposing tables to eliminate data redundancy and undesirable characteristics such as insertion, update, and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.

Uses of Normalization

1. Eliminating redundant (useless) data

2. Ensuring data dependencies make sense, i.e., data is stored logically

Without normalization, it becomes difficult to handle and update the database without facing data loss. Insertion, update, and deletion anomalies are very frequent if a database is not normalized. To understand these anomalies, let us take the example of a Student table.

Student table:

Update Anomaly: To update the address of a student who occurs two or more times in the table, we have to update the S_Address column in all of those rows; otherwise, the data will become inconsistent.

Insertion Anomaly: Suppose that for a new admission we have the student's id (S_id), name, and address, but the student has not opted for any subjects yet. Then we have to insert NULL for the subject, leading to an insertion anomaly.

Deletion Anomaly: If the student with S_id 401 has only one subject and temporarily drops it, then deleting that row deletes the entire student record along with it.
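The update anomaly can be reproduced in a few lines of Python. The text names only S_id and S_Address; the Subject column, the helper functions, and all values below are hypothetical.

```python
# Hypothetical Student rows: S_id and S_Address come from the text; Subject and all values are made up.
student = [
    {"S_id": 401, "S_Address": "Old Street", "Subject": "Biology"},
    {"S_id": 401, "S_Address": "Old Street", "Subject": "Physics"},
    {"S_id": 402, "S_Address": "Elm Road", "Subject": "Maths"},
]


def addresses_consistent(rows):
    """True if every S_id has the same S_Address in all of its rows."""
    seen = {}
    for row in rows:
        if seen.setdefault(row["S_id"], row["S_Address"]) != row["S_Address"]:
            return False
    return True


def update_one_row(rows, s_id, new_address):
    """A careless update that changes only the first row for s_id (the update anomaly)."""
    for row in rows:
        if row["S_id"] == s_id:
            row["S_Address"] = new_address
            return


print(addresses_consistent(student))  # True
update_one_row(student, 401, "New Lane")
print(addresses_consistent(student))  # False: student 401 now has two addresses
```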

Normalization Rule

Normalization rules are divided into the following normal forms:

1. First Normal Form


2. Second Normal Form

3. Third Normal Form

4. BCNF

First Normal Form (1NF)

A row of data cannot contain a repeating group of data; that is, each column must hold a single (atomic) value. Each row of data must have a unique identifier, i.e., a primary key. For example, consider a table that is not in First Normal Form.

Here, the student name Adam is used twice in the table, and the subject Math is also repeated. This violates First Normal Form. To bring the table above into First Normal Form, break it into two different tables.

In the Student table, a concatenated key that includes subject_id is the primary key. Now both the Student table and the Subject table are normalized to First Normal Form.


Second Normal Form (2NF)

To be in Second Normal Form, a table must meet all the requirements of First Normal Form, and there must not be any partial dependency of any column on the primary key. This means that, for a table with a concatenated primary key, each column that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends on only one part of the concatenated key, the table fails Second Normal Form. For example, consider a table that is not in Second Normal Form.

In the Customer table, the concatenation of Customer_id and Order_id is the primary key. This table is in First Normal Form but not in Second Normal Form because some columns depend on only part of the primary key: Customer_Name depends only on Customer_id, Order_name depends only on Order_id, and there is no link between sale_detail and Customer_Name.

To bring the Customer table into Second Normal Form, break it into the following three tables:


Denormalization

Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. Denormalization is also used to improve performance on smaller computers, as in computerized cash registers and mobile devices, since these may use the data for look-up only (e.g., price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.

Some Good Reasons Not To Normalize

That said, there are some good reasons not to normalize your database. Let’s look at a few:

1. Joins are expensive. Normalizing your database often involves creating lots of tables. In fact, you can easily wind up with what might seem like a simple query spanning five or ten tables. If you’ve ever tried doing a five-table join, you know that it works in principle, but it can be painfully slow in practice. If you’re building a web application that relies upon multiple-join queries against large tables, you might find yourself thinking: “If only this database weren’t normalized!” When you hear that thought in your head, it’s a good time to consider denormalizing. If you can stick all of the data used by that query into a single table without really jeopardizing your data integrity, go for it! Be a rebel and denormalize your database. You won’t look back!

2. Normalized design is difficult. If you’re working with a complex database schema, you’ll probably find yourself banging your head against the table over the complexity of normalization. As a simple rule of thumb, if you’ve been banging your head against the table for an hour or two trying to figure out how to move to the fourth normal form, you might be taking normalization too far. Step back and ask yourself if it’s really worth continuing.

3. Quick and dirty should be quick and dirty. If you’re just developing a prototype, just do whatever works quickly. Really. It’s OK. Rapid application development is sometimes more important than elegant design. Just remember to go back and take a careful look at your design once you’re ready to move beyond the prototyping phase. The price you pay for a quick and dirty database design is that you might need to throw it away and start over when it’s time to build for production.


Five Basic Normal Forms

I. First Normal Form

An entity is in the first normal form if it contains no repeating groups. In relational terms, a table is in the first normal form if it contains no repeating columns. Repeating columns make your data less flexible, waste disk space, and make it more difficult to search for data.

Example: In the telephone directory, the name table contains the repeating columns child1, child2, and child3.

You can see some problems in the current table. The table always reserves space on the disk for three child records, whether the person has children or not. The maximum number of children that you can record is three, but some of your acquaintances might have four or more children. To look for a particular child, you have to search all three columns in every row.

To eliminate the repeating columns and bring the table to the first normal form, separate the table into two tables, moving the repeating columns into the second table so that each child becomes a separate row. The association between the two tables is established with a primary-key and foreign-key combination. Because a child cannot exist without an association in the name table, you can reference the name table with a foreign key, rec_num.
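The split described above can be sketched in Python. The column names rec_num and child1 to child3 come from the example; the name and child_name columns, the helper function, and all values are hypothetical.

```python
# Hypothetical name-table rows: rec_num and child1..child3 come from the example;
# the "name" column and every value are made up.
name_table = [
    {"rec_num": 1, "name": "Person A", "child1": "Ann", "child2": "Ben", "child3": None},
    {"rec_num": 2, "name": "Person B", "child1": None, "child2": None, "child3": None},
]
REPEATING = ("child1", "child2", "child3")


def to_first_normal_form(rows):
    """Split rows with repeating columns into a name table and a child table.

    Each non-empty child slot becomes its own row in the child table, linked
    back to its parent row through the foreign key rec_num.
    """
    names, children = [], []
    for row in rows:
        names.append({k: v for k, v in row.items() if k not in REPEATING})
        for col in REPEATING:
            if row[col] is not None:  # empty slots produce no rows at all
                children.append({"rec_num": row["rec_num"], "child_name": row[col]})
    return names, children


names, children = to_first_normal_form(name_table)
print(names)     # [{'rec_num': 1, 'name': 'Person A'}, {'rec_num': 2, 'name': 'Person B'}]
print(children)  # [{'rec_num': 1, 'child_name': 'Ann'}, {'rec_num': 1, 'child_name': 'Ben'}]
```

A person with four or more children simply contributes more rows to the child table, and finding a particular child means searching a single column.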


II. Second Normal Form

An entity is in second normal form if each attribute that is not in the primary key provides a fact that depends on the entire key. A violation of the second normal form occurs when a non-primary key attribute is a fact about a subset of a composite key.

Example: An inventory entity records quantities of specific parts that are stored at particular warehouses.

Here, the primary key consists of the PART and the WAREHOUSE attributes together.

Because the attribute WAREHOUSE_ADDRESS depends only on the value of WAREHOUSE, the entity violates the rule for second normal form. This design causes several problems:

Each instance for a part that this warehouse stores repeats the address of the warehouse.

If the address of the warehouse changes, every instance referring to a part that is stored in that warehouse must be updated.

Because of the redundancy, the data might become inconsistent. Different instances could show different addresses for the same warehouse.

If at any time the warehouse has no stored parts, the address of the warehouse might not exist in any instance in the entity.

To satisfy second normal form, the information in the figure above would be split into two entities.
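A Python sketch of the inventory example, assuming rows are dictionaries. The helper `fd_holds` (hypothetical) checks on the sample data whether WAREHOUSE alone determines WAREHOUSE_ADDRESS, which is the partial dependency that breaks second normal form, and `project` (also hypothetical) performs the split into two entities. The QUANTITY attribute and all values are made up.

```python
# Hypothetical INVENTORY rows; QUANTITY and all values are made up.
INVENTORY = [
    {"PART": "P1", "WAREHOUSE": "W1", "QUANTITY": 10, "WAREHOUSE_ADDRESS": "1 Dock Rd"},
    {"PART": "P2", "WAREHOUSE": "W1", "QUANTITY": 5, "WAREHOUSE_ADDRESS": "1 Dock Rd"},
    {"PART": "P1", "WAREHOUSE": "W2", "QUANTITY": 7, "WAREHOUSE_ADDRESS": "9 Pier St"},
]


def fd_holds(rows, lhs, rhs):
    """True if, on this instance, rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Keep only attrs and drop duplicate rows, preserving first-seen order."""
    out = []
    for row in rows:
        reduced = {a: row[a] for a in attrs}
        if reduced not in out:
            out.append(reduced)
    return out


# WAREHOUSE alone (part of the key PART, WAREHOUSE) determines the address.
print(fd_holds(INVENTORY, ["WAREHOUSE"], ["WAREHOUSE_ADDRESS"]))  # True

inventory = project(INVENTORY, ["PART", "WAREHOUSE", "QUANTITY"])
warehouse = project(INVENTORY, ["WAREHOUSE", "WAREHOUSE_ADDRESS"])
print(len(inventory), len(warehouse))  # 3 2: each address is now stored once
```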


III. Third Normal Form

An entity is in third normal form if each non-primary key attribute provides a fact that is independent of other non-key attributes and depends only on the key. A violation of the third normal form occurs when a non-primary key attribute is a fact about another non-key attribute.

Example: The first entity in the following figure contains the attributes EMPLOYEE_NUMBER and DEPARTMENT_NUMBER. Suppose that a program or user adds an attribute, DEPARTMENT_NAME, to the entity. The new attribute depends on DEPARTMENT_NUMBER, whereas the primary key is on the EMPLOYEE_NUMBER attribute. The entity now violates third normal form.

Changing the DEPARTMENT_NAME value based on the update of a single employee, David Brown, does not change the DEPARTMENT_NAME value for other employees in that department. The updated version of the entity illustrates the resulting inconsistency. Additionally, updating the DEPARTMENT_NAME in this table does not update it in any other table that might contain a DEPARTMENT_NAME column.

You can normalize the entity by modifying the EMPLOYEE_DEPARTMENT entity and creating two new entities: EMPLOYEE and DEPARTMENT. The DEPARTMENT entity contains attributes for DEPARTMENT_NUMBER and DEPARTMENT_NAME. Now, an update such as changing a department name is much easier. You need to make the update only to the DEPARTMENT entity.
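The difference between the two designs can be sketched in Python. The attribute names come from the example; the employee numbers, department number, department names, and the `department_name_of` helper are hypothetical.

```python
# Hypothetical data; only the attribute names come from the example.
# Before: EMPLOYEE_DEPARTMENT repeats DEPARTMENT_NAME for every employee.
employee_department = [
    {"EMPLOYEE_NUMBER": "000010", "DEPARTMENT_NUMBER": "A00", "DEPARTMENT_NAME": "Planning"},
    {"EMPLOYEE_NUMBER": "000020", "DEPARTMENT_NUMBER": "A00", "DEPARTMENT_NAME": "Planning"},
]
employee_department[0]["DEPARTMENT_NAME"] = "Strategy"  # updated through one employee only
names_a00 = {r["DEPARTMENT_NAME"] for r in employee_department if r["DEPARTMENT_NUMBER"] == "A00"}
print(sorted(names_a00))  # ['Planning', 'Strategy']: one department, two names

# After: EMPLOYEE keeps DEPARTMENT_NUMBER; DEPARTMENT stores each name once.
employee = {"000010": "A00", "000020": "A00"}  # EMPLOYEE_NUMBER -> DEPARTMENT_NUMBER
department = {"A00": "Planning"}               # DEPARTMENT_NUMBER -> DEPARTMENT_NAME


def department_name_of(emp_no):
    """Join EMPLOYEE to DEPARTMENT on DEPARTMENT_NUMBER."""
    return department[employee[emp_no]]


department["A00"] = "Strategy"  # a single update in DEPARTMENT
print(department_name_of("000010"), department_name_of("000020"))  # Strategy Strategy
```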


IV. Fourth Normal Form

An entity is in fourth normal form if no instance contains two or more independent, multi-valued facts about an entity.

Example: Consider the EMPLOYEE entity. Each instance of EMPLOYEE could have both SKILL_CODE and LANGUAGE_CODE. An employee can have several skills and know several languages. Two relationships exist, one between employees and skills, and one between employees and languages. An entity is not in fourth normal form if it represents both relationships.

Instead, you can avoid this violation by creating two entities, one for each relationship.
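A small Python sketch, assuming hypothetical skill and language codes for one employee E1, shows the cost of keeping both independent multi-valued facts in one entity: every skill must be paired with every language. The two separate entities hold the same information in fewer rows, and joining them on the employee reproduces the combined form. The `rebuild` helper is hypothetical.

```python
from itertools import product

# Hypothetical codes for one employee, E1.
skills = {"S1", "S2"}
languages = {"EN", "FR", "DE"}

# One EMPLOYEE entity holding both independent multi-valued facts: to avoid
# implying a link between a skill and a language, every pairing must be stored.
combined = {("E1", s, lang) for s, lang in product(skills, languages)}

# Fourth-normal-form design: one entity per relationship.
employee_skill = {("E1", s) for s in skills}
employee_language = {("E1", lang) for lang in languages}


def rebuild(emp_skill, emp_lang):
    """Join the two entities on the employee to recover the combined rows."""
    return {(e, s, lang) for (e, s) in emp_skill for (e2, lang) in emp_lang if e == e2}


print(len(combined), len(employee_skill) + len(employee_language))  # 6 5
print(rebuild(employee_skill, employee_language) == combined)       # True
```

Adding a third skill to the combined entity would need three new rows (one per language), but only one new row in employee_skill.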

V. Fifth Normal Form

Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy. Second, third, and fourth normal forms also serve this purpose, but fifth normal form generalizes to cases not covered by the others.

Example: If agents represent companies, companies make products, and agents sell products, then we might want to keep a record of which agent sells which product for which company. This information could be kept in one record type with three fields:


In this case, it turns out that we can reconstruct all the true facts from a normalized form consisting of three separate record types, each containing two fields:

Roughly speaking, we may say that a record type is in fifth normal form when its information content cannot be reconstructed from several smaller record types. If a record type can only be decomposed into smaller records which all have the same key, then the record type is considered to be in fifth normal form without decomposition. A record type in fifth normal form is also in fourth, third, second, and first normal forms.
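The agent, company, and product example can be checked with a Python sketch. The rows below are made up, and they are assumed to obey the rule that makes the three-way split lossless: if an agent represents a company, the company makes a product, and the agent sells that product, then the agent sells that product for that company. Joining all three two-field record types reconstructs the original exactly, while joining only two of them produces a fact that was never recorded. The helper names are hypothetical.

```python
# Made-up (agent, company, product) rows, assumed to obey the rule stated above.
SELLS = {
    ("Smith", "Ford", "car"),
    ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"),
    ("Smith", "GM", "truck"),
    ("Jones", "Ford", "car"),
}

# Three record types with two fields each (projections of SELLS).
agent_company = {(a, c) for a, c, _ in SELLS}
company_product = {(c, p) for _, c, p in SELLS}
agent_product = {(a, p) for a, _, p in SELLS}


def join_three(ac, cp, ap):
    """Join all three two-field relations back into (agent, company, product) rows."""
    return {(a, c, p) for (a, c) in ac for (c2, p) in cp if c == c2 and (a, p) in ap}


def join_two(ac, cp):
    """Join only agent_company and company_product, ignoring agent_product."""
    return {(a, c, p) for (a, c) in ac for (c2, p) in cp if c == c2}


print(join_three(agent_company, company_product, agent_product) == SELLS)  # True
print(join_two(agent_company, company_product) - SELLS)  # {('Jones', 'Ford', 'truck')}
```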


Transforming E-R Diagrams into Relations

It is useful to transform the conceptual data model into a set of normalized relations.

Steps:

1. Represent entities

2. Represent relationships

3. Normalize the relations

4. Merge the relations

In translating a relationship set to a relation, attributes of the relation must include:

- The primary key for each participating entity set

- All descriptive attributes of the relationship set

From E/R Diagrams to Relational Designs

From Entity Sets to Relations

- For each non-weak entity set, create a relation of the same name and with the same set of attributes.

Example: (Entity = Stars)

Name | Address

Carrie Fisher | 123 Maple St., Hollywood

Mark Hamill | 456 Oak Rd., Brentwood


From E/R Relationships to Relations

- For each entity set involved in relationship R, we take its key attribute or attributes as part of the schema of the relation for R.

- If the relationship has attributes, then these are also attributes of relation R

Example: (Relationship = Owns)

Combining Relations

Example: (Combining relation Movies with relation Owns)

Handling Weak Entity Sets

If W is a weak entity set, construct for W a relation whose schema consists of:

1. All attributes of the weak entity set W.

2. All attributes of the supporting relationships for W.

3. For each supporting relationship for W, say a relationship to entity set E, all the key attributes of E.

Rename attributes, if necessary, to avoid name conflicts.

Do not construct a relation for any supporting relationship for W.

Example:

Title | Year | studioName

Star Wars | 1977 | Fox

Mighty Ducks | 1991 | Disney

Title | Year | Length | filmType | studioName

Star Wars | 1977 | 124 | color | Fox

Mighty Ducks | 1991 | 104 | color | Disney
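The two tables above can be produced mechanically. In this Python sketch, the relation Owns holds the key (title, year) of a movie together with studioName, and combining it with Movies is a join on (title, year). The sets-of-tuples representation and the `combine` helper are assumptions for illustration; the rows are the ones shown above.

```python
# Rows from the tables above, as sets of tuples.
movies = {  # Movies(title, year, length, filmType)
    ("Star Wars", 1977, 124, "color"),
    ("Mighty Ducks", 1991, 104, "color"),
}
owns = {  # Owns: key of Movies (title, year) plus the studio's key (studioName)
    ("Star Wars", 1977, "Fox"),
    ("Mighty Ducks", 1991, "Disney"),
}


def combine(movies, owns):
    """Natural join of Movies and Owns on their shared attributes (title, year)."""
    return {
        (title, year, length, film_type, studio)
        for (title, year, length, film_type) in movies
        for (o_title, o_year, studio) in owns
        if (title, year) == (o_title, o_year)
    }


for row in sorted(combine(movies, owns)):
    print(row)
# ('Mighty Ducks', 1991, 104, 'color', 'Disney')
# ('Star Wars', 1977, 124, 'color', 'Fox')
```

Combining is safe here because each movie is assumed to be owned by exactly one studio, so no movie information is repeated; the sketch assumes that uniqueness rather than enforcing it.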


Schema for Relation Contracts:

Contracts (starName, studioName, title, year, salary)

Converting Subclass Structures to Relations

Principal conversion strategies:

1. Follow the E/R viewpoint. For each entity set E in the hierarchy, create a relation that includes the key attributes from the root and any attributes belonging to E.

2. Treat entities as objects belonging to a single class. For each possible subtree including the root, create one relation whose schema includes all the attributes of all the entity sets in the subtree.

3. Use null values. Create one relation with all the attributes of all the entity sets in the hierarchy. Each entity is represented by one tuple, and that tuple has a null value for whatever attributes the entity does not have.

E/R-Style Conversion

- Create a relation for each entity set. If the entity set E is not the root of the hierarchy, then the relation for E will include the key attributes at the root, to identify the entity represented by each tuple, plus all the attributes of E.

Example:

1. Movies (title, year, length, filmType)

2. MurderMysteries (title, year, weapon)

3. Cartoons (title, year)
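With the E/R-style relations, a murder mystery's length and film type live in Movies while its weapon lives in MurderMysteries, so retrieving everything about it means matching the two on the key (title, year). The Python sketch below assumes sets of tuples; the titles Mystery X and Cartoon Y, their values, and the helper name are hypothetical.

```python
# E/R-style relations as sets of tuples; every row here is hypothetical.
movies = {  # Movies(title, year, length, filmType): one tuple for every movie
    ("Star Wars", 1977, 124, "color"),
    ("Mystery X", 2001, 95, "color"),
    ("Cartoon Y", 1995, 80, "color"),
}
murder_mysteries = {("Mystery X", 2001, "knife")}  # MurderMysteries(title, year, weapon)
cartoons = {("Cartoon Y", 1995)}                   # Cartoons(title, year): key only


def full_murder_mystery(title, year):
    """Reassemble a murder mystery from the root relation and its subclass relation."""
    base = next(m for m in movies if m[:2] == (title, year))
    weapon = next(w for (t, y, w) in murder_mysteries if (t, y) == (title, year))
    return base + (weapon,)


print(full_murder_mystery("Mystery X", 2001))  # ('Mystery X', 2001, 95, 'color', 'knife')
```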


An Object-Oriented Approach

- An alternate strategy for converting isa-hierarchies to relations is to enumerate all the possible subtrees of the hierarchy.

- For each, create one relation that represents entities that have components in exactly those subtrees; the schema for this relation has all the attributes of any entity set in the subtree.

Example: (refer to image in previous example)

Four possible subtrees including the root:

1. Movies alone.

Movies (title, year, length, filmType)

2. Movies and Cartoons only.

MoviesC (title, year, length, filmType)

3. Movies and Murder-Mysteries only.

MoviesMM (title, year, length, filmType, weapon)

4. All three entity sets.

MoviesCMM (title, year, length, filmType, weapon)

We can combine Movies with MoviesC and MoviesMM with MoviesCMM, although doing so loses some information.

Using Null Values to Combine Relations

- If we are allowed to use NULL as a value in tuples, we can handle a hierarchy of entity sets with a single relation. This relation has all the attributes belonging to any entity set of the hierarchy.

Example: (based on the previous examples)

Movie (title, year, length, filmType, weapon)
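The null-value approach can be tried with Python's built-in `sqlite3` module. In this sketch the murder-mystery row is hypothetical. The queries show that murder mysteries are exactly the tuples whose weapon is not NULL, and that NULL must be tested with IS NULL or IS NOT NULL, because a comparison such as `weapon = NULL` is never true. Note that this single schema has no column recording which movies are cartoons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movie (
        title    TEXT,
        year     INTEGER,
        length   INTEGER,
        filmType TEXT,
        weapon   TEXT,  -- NULL for every movie that is not a murder mystery
        PRIMARY KEY (title, year)
    )
""")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?, ?)",
    [
        ("Star Wars", 1977, 124, "color", None),    # Python None is stored as SQL NULL
        ("Mystery X", 2001, 95, "color", "knife"),  # hypothetical murder mystery
    ],
)

mysteries = conn.execute("SELECT title, weapon FROM Movie WHERE weapon IS NOT NULL").fetchall()
plain = conn.execute("SELECT title FROM Movie WHERE weapon IS NULL").fetchall()
wrong = conn.execute("SELECT title FROM Movie WHERE weapon = NULL").fetchall()

print(mysteries)  # [('Mystery X', 'knife')]
print(plain)      # [('Star Wars',)]
print(wrong)      # []: comparing with = NULL is never true
```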


Combining Relations:

Relations are sets. Therefore, set operations (∪, ∩, −) can be applied to relations with respect to the underlying sets to form a new relation.

Let A = {1, 2, 3} and B = {1, 2, 3, 4}. The relation R1 is on A and the relation R2 is on B:

R1 = {(1, 1), (2, 2), (3, 3)} and R2 = {(1, 1), (1, 2), (1, 3), (1, 4)}
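Because R1 and R2 are ordinary sets of pairs, Python's built-in set operators compute the union, intersection, and differences directly. This is a minimal sketch; the variable names mirror the example above.

```python
R1 = {(1, 1), (2, 2), (3, 3)}          # relation on A = {1, 2, 3}
R2 = {(1, 1), (1, 2), (1, 3), (1, 4)}  # relation on B = {1, 2, 3, 4}

print(sorted(R1 | R2))  # R1 ∪ R2: [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (3, 3)]
print(sorted(R1 & R2))  # R1 ∩ R2: [(1, 1)]
print(sorted(R1 - R2))  # R1 − R2: [(2, 2), (3, 3)]
print(sorted(R2 - R1))  # R2 − R1: [(1, 2), (1, 3), (1, 4)]
```

Because A is a subset of B, every pair in R1 is also a pair of elements of B, so all four results are relations on B.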

Because a movie has several stars, combining relations can force us to repeat all the information about the movie, once for each star. For example, the length of Star Wars is repeated three times, once for each star, as is the fact that the movie is owned by Fox. This redundancy is undesirable; the remedy is to split such a combined relation into smaller relations and thereby remove the redundancy.
