Introduction to Database Design

Dr. Kanda Runapongsa

(krunapon@kku.ac.th)

Dept of Computer Engineering

Khon Kaen University

Overview

What are the steps in designing a database?

Why is the ER model used to create an initial design?

What are the main concepts in the ER model?

What are guidelines for using the ER model effectively?

Database Design Process

The process can be divided into six steps

The ER model is most relevant to the first three

Requirement Analysis

Conceptual database design

Logical database design

Schema refinement

Physical database design

Application and security design

Requirement Analysis

Understand what data is to be stored

Know what applications must be built on top of the database

Understand what operations are most frequent and subject to performance requirements

Summary: Find out what the users want from the database

Conceptual Database Design

Create a simple description of the data that closely matches how users and developers think of the data

This step uses the ER model, which is one of several high-level, or semantic, data models used in database design

The initial design must be translated into a data model supported by a commercial system

Logical Database Design

Choose a DBMS to implement the database design

Convert the conceptual database design into a database schema in the data model of the chosen DBMS

Here, we consider only relational DBMSs

The task is to convert an ER schema into a relational database schema

Schema Refinement

Analyze the collection of relations in the relational database schema to identify potential problems and to refine it

Schema refinement can be guided by some elegant and powerful theory

For example, we can apply the theory of normalizing relations to reduce the redundancy of data stored in the database

Physical Database Design

Consider typical expected workloads that the database must support and further refine the database design to ensure that it meets desired performance criteria

For example, we may build indexes on some tables and cluster some tables into the same location on the disk
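
As a rough illustration of the indexing step, the sketch below uses Python's built-in sqlite3 module. The workload (frequent lookups of employees by name), the sample rows, and the index name idx_employees_name are assumptions made for this example; clustering is not shown because it depends on the chosen DBMS.

```python
import sqlite3

# Assumed workload: employees are looked up by name far more often than by ssn.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("111", "A", 12), ("222", "B", 24)],
)

# Physical design decision: add a secondary index on the searched column.
conn.execute("CREATE INDEX idx_employees_name ON Employees (name)")

# User-created indexes are recorded in the catalog with their CREATE statement.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL"
    )
]

# The query result is unchanged by the index; only the access path differs.
rows = conn.execute("SELECT ssn FROM Employees WHERE name = ?", ("B",)).fetchall()
```

Whether such an index pays off depends on the real workload, since every index also adds cost to inserts and updates.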

Application and Security Design

Any software project that involves a DBMS must consider aspects of the application that go beyond the database itself

We must describe the role of each entity in every process as part of a complete workflow for that task

For each role, we must identify the parts of the database that must be accessible and the parts of the database that must not be accessible

Conceptual Design

What are the entities and relationships in the enterprise?

What information about these entities and relationships should we store in the database?

What are the integrity constraints or business rules that hold?

A database schema in the ER model can be represented by using ER diagrams

We can map an ER diagram into a relational schema

ER Model Basics: Entity (1/2)

Entity: a real-world object distinguishable from other objects

An entity is described (in DB) using a set of attributes

Entity set: a collection of similar entities

All entities in an entity set have the same set of attributes; this is what we mean by similar

Each entity set has a key

Each attribute has a domain

ER Model Basics: Entity (2/2)

A key is a minimal set of attributes whose values uniquely identify an entity in the set

There could be more than one candidate key

If so, we designate one of them as the primary key

Which one should we designate?

Example: Entity

[Figure: ER diagram of the entity set “Employees” with attributes ssn, name, and lot]

The key of this entity set is “ssn”, which is underlined in the diagram

Entity set “Employees”

ssn   name   lot
111   A      12
222   B      24

Entity “111 A 12” is similar to entity “222 B 24” since both of them are described using the same set of attributes
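
A minimal sketch of how this entity set could be stored as a table, using Python's built-in sqlite3 module. The column types are assumptions; the slides only give the attribute names and the two sample entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each attribute becomes a column; the key ssn becomes the primary key.
conn.execute(
    """
    CREATE TABLE Employees (
        ssn  TEXT PRIMARY KEY,
        name TEXT,
        lot  INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("111", "A", 12), ("222", "B", 24)],
)

# A second entity with an existing ssn would violate the key.
try:
    conn.execute("INSERT INTO Employees VALUES (?, ?, ?)", ("111", "C", 36))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
```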

ER Model Basics: Relationship (1/2)

A relationship is an association among two or more entities, e.g., Khun Mana works in the department of Computer Engineering

Relationship set: a collection of similar relationships

ER Model Basics: Relationship (2/2)

An n-ary relationship set R relates n entity sets E1, …, En; each relationship in R involves entities e1 ∈ E1, …, en ∈ En

The same entity set can participate in different relationship sets

Which department does an employee work in?

Which other employees does an employee supervise?

The same entity set can play different “roles” in the same relationship set

Employees supervise subordinates

Employees are supervised by supervisors

Example 1: Relationships

A relationship between “Employees” and “Departments”

“since” is a descriptive attribute of this relationship

A relationship must be uniquely identified by the participating entities

[Figure: ER diagram. Works_In (since) connects Employees (ssn, name, lot) and Departments (did, dname, budget)]
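
One way to see what "uniquely identified by the participating entities" means is to store Works_In as a table whose key is the pair (ssn, did). The sketch below uses Python's built-in sqlite3 module; the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);

    -- The relationship set: keys of both participants plus the
    -- descriptive attribute since. The key does not include since.
    CREATE TABLE Works_In (
        ssn   TEXT    REFERENCES Employees (ssn),
        did   INTEGER REFERENCES Departments (did),
        since TEXT,
        PRIMARY KEY (ssn, did)
    );

    INSERT INTO Employees VALUES ('111', 'A', 12);
    INSERT INTO Departments VALUES (51, 'Computer Engineering', 100000);
    INSERT INTO Works_In VALUES ('111', 51, '2001-01-01');
    """
)

# The same employee-department pair with a different since value is rejected,
# because the pair alone identifies the relationship.
try:
    conn.execute("INSERT INTO Works_In VALUES ('111', 51, '2005-06-01')")
    same_pair_rejected = False
except sqlite3.IntegrityError:
    same_pair_rejected = True

works_in_count = conn.execute("SELECT COUNT(*) FROM Works_In").fetchone()[0]
```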

Example 2: Relationships

The same entity set could participate in different relationship sets, or in different “roles” in the same set.

[Figure: ER diagram. Employees (ssn, name, lot) participates in Reports_To twice, in the roles supervisor and supervisee; Employees is also connected to Departments through Works_In and Manages]
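
When one entity set plays two roles, a table for the relationship set can name its columns after the roles. The following sqlite3 sketch assumes the column names supervisor_ssn and supervisee_ssn and two sample employees; none of these appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);

    -- Both columns reference Employees; the role names keep them apart.
    CREATE TABLE Reports_To (
        supervisor_ssn TEXT REFERENCES Employees (ssn),
        supervisee_ssn TEXT REFERENCES Employees (ssn),
        PRIMARY KEY (supervisor_ssn, supervisee_ssn)
    );

    INSERT INTO Employees VALUES ('111', 'A', 12);
    INSERT INTO Employees VALUES ('222', 'B', 24);
    INSERT INTO Reports_To VALUES ('111', '222');
    """
)

# Join Employees twice, once per role, to list (supervisor, supervisee) names.
pairs = conn.execute(
    """
    SELECT sup.name, sub.name
    FROM Reports_To AS r
    JOIN Employees AS sup ON sup.ssn = r.supervisor_ssn
    JOIN Employees AS sub ON sub.ssn = r.supervisee_ssn
    """
).fetchall()
```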

Key Constraints (1/2)

Consider Works_In: an employee can work in many departments; a department can have many employees

In contrast, each dept has at most one manager, according to the key constraint in Manages

Many kinds of relationships

One-to-one

One-to-many

Many-to-one

Many-to-many

Key Constraints (2/2)

[Figure: ER diagram. Manages (since) connects Employees (ssn, name, lot) and Departments (did, dname, budget); smaller sketches illustrate 1-to-1, 1-to-many, many-to-1, and many-to-many relationship sets]
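
The key constraint on Manages (each department has at most one manager) can be sketched in a table by making did alone the key. This is one possible encoding using Python's sqlite3 module; the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);

    -- did alone is the key, so a department appears in at most one row.
    CREATE TABLE Manages (
        did   INTEGER,
        ssn   TEXT,
        since TEXT,
        PRIMARY KEY (did),
        FOREIGN KEY (did) REFERENCES Departments (did),
        FOREIGN KEY (ssn) REFERENCES Employees (ssn)
    );

    INSERT INTO Employees VALUES ('111', 'A', 12);
    INSERT INTO Employees VALUES ('222', 'B', 24);
    INSERT INTO Departments VALUES (51, 'Computer Engineering', 100000);
    INSERT INTO Departments VALUES (52, 'Electrical Engineering', 80000);
    INSERT INTO Manages VALUES (51, '111', '2020-01-01');
    INSERT INTO Manages VALUES (52, '111', '2021-01-01');
    """
)

# A second manager for department 51 violates the key constraint.
try:
    conn.execute("INSERT INTO Manages VALUES (51, '222', '2022-01-01')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

# One employee may still manage several departments (one-to-many).
managed_by_111 = conn.execute(
    "SELECT COUNT(*) FROM Manages WHERE ssn = '111'"
).fetchone()[0]
```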

Participation Constraints

Does every department have a manager?

If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial)

If each employee works in at least one department and each department has at least one employee, the participation of both Employees and Departments in Works_In is total

A participation that is not total is said to be partial

Example: Participation Constraints

Employees and Departments have total participation in the relationship “Works_In”

For the relationship “Manages”, Departments has total participation but Employees has partial participation

[Figure: ER diagram. Employees (ssn, name, lot) and Departments (did, dname, budget) are connected by Manages (since) and Works_In (since)]
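
Total participation of Departments in Manages, combined with the at-most-one-manager key constraint, can be approximated by folding Manages into the department table and requiring a manager. The table name Dept_Mgr and the sample rows below are assumptions for this sqlite3 sketch; the total participation of both sides in Works_In is not covered by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);

    -- Departments and Manages in one table: NOT NULL on ssn means
    -- every department row must name a manager.
    CREATE TABLE Dept_Mgr (
        did    INTEGER PRIMARY KEY,
        dname  TEXT,
        budget REAL,
        ssn    TEXT NOT NULL REFERENCES Employees (ssn),
        since  TEXT
    );

    INSERT INTO Employees VALUES ('111', 'A', 12);
    INSERT INTO Dept_Mgr VALUES (51, 'Computer Engineering', 100000, '111', '2020-01-01');
    """
)

# A department without a manager is rejected.
try:
    conn.execute(
        "INSERT INTO Dept_Mgr VALUES (52, 'Electrical Engineering', 80000, NULL, NULL)"
    )
    managerless_rejected = False
except sqlite3.IntegrityError:
    managerless_rejected = True

department_count = conn.execute("SELECT COUNT(*) FROM Dept_Mgr").fetchone()[0]
```

Employees that manage nothing are still allowed, which matches the partial participation of Employees in Manages.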

Weak Entities

A weak entity can be identified uniquely only by considering the primary key of another (owner) entity

Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities)

Weak entity set must have total participation in this identifying relationship set

Example: Weak Entities

“Employees” is an owner entity set and “Dependents” is a weak entity set

“Dependents” has total participation in the relationship

[Figure: ER diagram. Employees (ssn, name, lot) is connected through Policy (cost) to the weak entity set Dependents (pname, age)]
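
A sketch of one common way to store a weak entity set: combine Dependents and the identifying relationship Policy in one table whose key includes the owner's key, and delete dependents together with their owner. The table name Dep_Policy, the column types, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);

    -- pname is only unique per owner, so the key is (pname, ssn).
    -- ON DELETE CASCADE removes dependents when their owner is deleted.
    CREATE TABLE Dep_Policy (
        pname TEXT,
        age   INTEGER,
        cost  REAL,
        ssn   TEXT NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES Employees (ssn) ON DELETE CASCADE
    );

    INSERT INTO Employees VALUES ('111', 'A', 12);
    INSERT INTO Employees VALUES ('222', 'B', 24);
    -- Two dependents share a name but belong to different owners.
    INSERT INTO Dep_Policy VALUES ('Joe', 5, 100, '111');
    INSERT INTO Dep_Policy VALUES ('Joe', 7, 120, '222');
    """
)

# Deleting the owner also deletes the owner's dependents.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
```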

ISA (‘is a’) Hierarchies (1/3)

As with superclasses and subclasses in C++, attributes of the superclass are inherited by the subclass

If we declare A ISA B, every A entity is also considered to be a B entity

Hourly_Emps ISA Employees

Overlap constraints: Can Chai be an Hourly_Emps entity as well as a Contract_Emps entity? (Allowed/disallowed)

ISA (‘is a’) Hierarchies (2/3)

Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)

Reasons for using ISA:

To add descriptive attributes specific to a subclass

To identify entities that participate in a relationship

ISA (‘is a’) Hierarchies (3/3)

[Figure: ER diagram. Employees (ssn, name, lot) is the superclass; Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid) are connected to it through ISA]
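
One possible relational encoding of this hierarchy keeps a table per entity set, with each subclass table sharing the key of Employees. This is only one of several options, and the column types and sample rows in the sqlite3 sketch below are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);

    -- Subclass tables hold only the subclass-specific attributes.
    CREATE TABLE Hourly_Emps (
        ssn          TEXT PRIMARY KEY REFERENCES Employees (ssn) ON DELETE CASCADE,
        hourly_wages REAL,
        hours_worked INTEGER
    );
    CREATE TABLE Contract_Emps (
        ssn        TEXT PRIMARY KEY REFERENCES Employees (ssn) ON DELETE CASCADE,
        contractid INTEGER
    );

    INSERT INTO Employees VALUES ('111', 'A', 12);
    INSERT INTO Hourly_Emps VALUES ('111', 150.0, 40);
    """
)

# Inherited attributes (name, lot) are recovered by joining with Employees.
hourly_row = conn.execute(
    """
    SELECT e.name, e.lot, h.hourly_wages, h.hours_worked
    FROM Hourly_Emps AS h
    JOIN Employees AS e ON e.ssn = h.ssn
    """
).fetchone()

# Every Hourly_Emps entity must also be an Employees entity.
try:
    conn.execute("INSERT INTO Hourly_Emps VALUES ('999', 120.0, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Nothing in this encoding enforces overlap or covering constraints; those would need extra checks.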

Aggregation

Used when we have to model a relationship involving (entity sets and) a relationship set

Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships

Aggregation vs. ternary relationship

Monitors is a distinct relationship, with a descriptive attribute

Also, can say that each sponsorship is monitored by at most one employee

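Example: Aggregation

[Figure: ER diagram. Sponsors (since) connects Projects (pid, started_on, pbudget) and Departments (did, dname, budget); Monitors (until) connects Employees (ssn, name, lot) to the Sponsors relationship set, which is treated as an aggregate]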
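
A hedged sketch of how the aggregation might look in tables: Monitors references a row of Sponsors through the pair (did, pid) rather than referencing Departments and Projects directly. Making (did, pid) the key of Monitors encodes the optional rule that each sponsorship is monitored by at most one employee. Column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);
    CREATE TABLE Projects (pid INTEGER PRIMARY KEY, started_on TEXT, pbudget REAL);

    CREATE TABLE Sponsors (
        did   INTEGER REFERENCES Departments (did),
        pid   INTEGER REFERENCES Projects (pid),
        since TEXT,
        PRIMARY KEY (did, pid)
    );

    -- Monitors relates an employee to a sponsorship, not to a
    -- department and a project separately.
    CREATE TABLE Monitors (
        did   INTEGER,
        pid   INTEGER,
        ssn   TEXT NOT NULL REFERENCES Employees (ssn),
        until TEXT,
        PRIMARY KEY (did, pid),
        FOREIGN KEY (did, pid) REFERENCES Sponsors (did, pid)
    );

    INSERT INTO Employees VALUES ('111', 'A', 12);
    INSERT INTO Employees VALUES ('222', 'B', 24);
    INSERT INTO Departments VALUES (51, 'Computer Engineering', 100000);
    INSERT INTO Projects VALUES (7, '2020-01-01', 5000);
    INSERT INTO Projects VALUES (8, '2020-06-01', 3000);
    INSERT INTO Sponsors VALUES (51, 7, '2020-01-01');
    INSERT INTO Monitors VALUES (51, 7, '111', '2021-01-01');
    """
)


def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Department 51 does not sponsor project 8, so there is nothing to monitor.
unsponsored_rejected = is_rejected(
    "INSERT INTO Monitors VALUES (51, 8, '222', '2021-01-01')"
)
# A second monitor for the same sponsorship breaks the at-most-one rule.
second_monitor_rejected = is_rejected(
    "INSERT INTO Monitors VALUES (51, 7, '222', '2021-01-01')"
)
```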

Conceptual Design Using the ER Model

Design choices:

Should a concept be modeled as an entity or an attribute?

Should a concept be modeled as an entity or a relationship?

Identifying relationships: Binary or ternary? Aggregation?

Constraints in the ER model:

A lot of data semantics can (and should) be captured

Entity vs. Attribute (1/2)

Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?

Depends upon the use we want to make of address information, and the semantics of the data

If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued)

If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic)

Entity vs. Attribute (2/2)

Works_In4 does not allow an employee to work in a department for two or more periods.

Similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship. This is accomplished by introducing a new entity set, Duration.

[Figure: first ER diagram. Works_In4 (from, to) connects Employees (ssn, name, lot) and Departments (did, dname, budget)]

[Figure: second ER diagram. Works_In4 connects Employees (ssn, name, lot), Departments (did, dname, budget), and Duration (from, to)]
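
To make the effect of Duration concrete, the sketch below stores the ternary Works_In4 with the Duration key as part of the relationship's key, so the same employee and department can appear with several periods. The column names from_date and to_date are assumptions (from and to are SQL keywords), and Duration is folded into the relationship table instead of getting its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);

    -- With Duration as a participant, its key (from_date, to_date)
    -- becomes part of the key of the relationship.
    CREATE TABLE Works_In4 (
        ssn       TEXT    REFERENCES Employees (ssn),
        did       INTEGER REFERENCES Departments (did),
        from_date TEXT,
        to_date   TEXT,
        PRIMARY KEY (ssn, did, from_date, to_date)
    );

    INSERT INTO Employees VALUES ('111', 'A', 12);
    INSERT INTO Departments VALUES (51, 'Computer Engineering', 100000);
    -- Two separate periods for the same employee and department.
    INSERT INTO Works_In4 VALUES ('111', 51, '2001-01-01', '2003-12-31');
    INSERT INTO Works_In4 VALUES ('111', 51, '2005-06-01', '2008-05-31');
    """
)

periods = conn.execute(
    "SELECT from_date, to_date FROM Works_In4 "
    "WHERE ssn = '111' AND did = 51 ORDER BY from_date"
).fetchall()
```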

Entity vs. Relationship

The first ER diagram is OK if a manager gets a separate discretionary budget for each dept.

What if a manager gets a discretionary budget that covers all managed depts?

Redundancy: dbudget is stored for each dept managed by the manager.

Misleading: suggests dbudget is associated with the department-mgr combination.

[Figure: first ER diagram. Manages2 (since, dbudget) connects Employees (ssn, name, lot) and Departments (did, dname, budget)]

[Figure: second ER diagram. Managers (dbudget) ISA Employees (ssn, name, lot); Manages2 (since) connects to Departments (did, dname, budget). This fixes the problem!]

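Binary vs. Ternary Relationships (1/3)

[Figure: bad design. A ternary relationship set Covers connects Employees (ssn, name, lot), Policies (policyid, cost), and Dependents (pname, age)]

[Figure: better design. Purchaser connects Employees (ssn, name, lot) and Policies (policyid, cost); Beneficiary connects Dependents (pname, age) and Policies]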

Binary vs. Ternary Relationships (2/3)

If each policy is owned by just 1 employee, and each dependent is tied to the covering policy, the first diagram is inaccurate

What are the additional constraints in the 2nd diagram?

Each policy is owned by at most one employee

Each dependent is covered by at most one policy

Binary vs. Ternary Relationships (3/3)

An example in the other direction: a ternary relationship set Contracts relates entity sets Parts, Departments, and Suppliers, and has descriptive attribute qty.

No combination of binary relationships is an adequate substitute:

S ‘can-supply’ P, D ‘needs’ P, and D ‘deals-with’ S do not imply that D has agreed to buy P from S

How do we record qty?
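
The ternary relationship set gives qty a natural home: it belongs to a specific (part, department, supplier) triple. A minimal sqlite3 sketch follows; the key columns pid, did, and sid and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Parts (pid INTEGER PRIMARY KEY);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY);
    CREATE TABLE Suppliers (sid INTEGER PRIMARY KEY);

    -- One row per agreed contract; qty describes the whole triple.
    CREATE TABLE Contracts (
        pid INTEGER REFERENCES Parts (pid),
        did INTEGER REFERENCES Departments (did),
        sid INTEGER REFERENCES Suppliers (sid),
        qty INTEGER,
        PRIMARY KEY (pid, did, sid)
    );

    INSERT INTO Parts VALUES (1);
    INSERT INTO Departments VALUES (51);
    INSERT INTO Suppliers VALUES (10);
    INSERT INTO Suppliers VALUES (20);
    -- The same department buys the same part from two suppliers,
    -- with a different quantity in each contract.
    INSERT INTO Contracts VALUES (1, 51, 10, 500);
    INSERT INTO Contracts VALUES (1, 51, 20, 300);
    """
)

qty = conn.execute(
    "SELECT qty FROM Contracts WHERE pid = 1 AND did = 51 AND sid = 10"
).fetchone()[0]
supplier_count = conn.execute(
    "SELECT COUNT(*) FROM Contracts WHERE pid = 1 AND did = 51"
).fetchone()[0]
```

With three separate binary tables there would be no single row for the triple, so no place to put the quantity.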

Summary of ER (1/4)

Conceptual design follows requirement analysis

Yields a high-level description of data to be stored

ER model popular for conceptual design

Constructs are expressive, close to the way people think about their applications

Basic constructs: entities, relationships, and attributes (of entities and relationships)

Summary of ER (2/4)

Some additional constructs: weak entities, ISA hierarchies, and aggregation

Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies

Some constraints (notably, functional dependencies) cannot be expressed in the ER model

Summary of ER (3/4)

ER design is subjective. There are often many ways to model a given scenario!

Analyzing alternatives can be tricky, especially for a large enterprise.

Common choices include:

Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation

Summary of ER (4/4)

Ensuring good database design: the resulting relational schema should be analyzed and refined further

Functional dependency information and normalization techniques are especially useful

Reference

R. Ramakrishnan and J. Gehrke, Database Management Systems, 3rd edition