Database Design: From Conceptual Design to Physical Implementation

SIMS 257 – Database Management, 9/21/2000

University of California, Berkeley, School of Information Management and Systems

Review

• Database Design Process

• Normalization

Database Design Process

[Diagram: Applications 1 to 4 each have their own External Model and set of conceptual requirements; the conceptual requirements are merged into a single Conceptual Model, which is mapped to a Logical Model and then to an Internal Model.]

Normalization

• Normalization theory is based on the observation that relations with certain properties are better suited to inserting, updating and deleting data than other sets of relations containing the same data.

• Normalization is a multi-step process beginning with an “unnormalized” relation.

– Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management.

Normal Forms

• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

• Boyce-Codd Normal Form (BCNF)

• Fourth Normal Form (4NF)

• Fifth Normal Form (5NF)

Normalization

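• First Normal Form: atomic values only; functional dependency of nonkey attributes on the primary key

• Second Normal Form: full functional dependency of nonkey attributes on the primary key

• Third Normal Form: no transitive dependency between nonkey attributes

• Boyce-Codd Normal Form: all determinants are candidate keys

• Fourth Normal Form and higher: a single multivalued dependency (no independent multivalued facts in one relation)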
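To make the dependency conditions above concrete, here is a minimal Python sketch (standard library only) that tests whether a functional dependency X → Y holds in a given set of rows, using a few rows of the hospital example. The function name `fd_holds`, the attribute names, and the assumption that (patient, surgeon, date) is the key of a surgery row are illustrative, not from the slides. Sample data can refute a dependency but never prove it; real dependencies come from the meaning of the data.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if no two rows agree on lhs but disagree on rhs."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault records the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few rows of the hospital example (illustrative field names).
rows: List[Row] = [
    {"patient": 1111, "surgeon": 145, "date": "01-Jan-95", "patient_name": "John White",
     "drug": "Penicillin", "side_effects": "rash"},
    {"patient": 1111, "surgeon": 311, "date": "12-Jun-95", "patient_name": "John White",
     "drug": "none", "side_effects": "none"},
    {"patient": 1234, "surgeon": 243, "date": "05-Apr-94", "patient_name": "Mary Jones",
     "drug": "Tetracycline", "side_effects": "Fever"},
    {"patient": 6845, "surgeon": 243, "date": "05-Apr-94", "patient_name": "Ann Hood",
     "drug": "Tetracycline", "side_effects": "Fever"},
]

# Depends on only part of the assumed key (patient, surgeon, date): a 2NF problem.
print(fd_holds(rows, ["patient"], ["patient_name"]))  # True
# Nonkey attribute determined by another nonkey attribute: a 3NF problem.
print(fd_holds(rows, ["drug"], ["side_effects"]))     # True
# Surgeon 243 operated on two different patients, so this dependency fails.
print(fd_holds(rows, ["surgeon"], ["patient"]))       # False
```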

Unnormalized Relations

• The first step in normalization is to convert the data into a two-dimensional table.

• In unnormalized relations, data can repeat within a column: a single cell may hold a repeating group, such as several surgery dates for one patient.

Unnormalized Relation

| Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects |
|---|---|---|---|---|---|---|---|---|
| 1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none |
| 1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none |
| 2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none |
| 4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none |
| 5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none |
| 6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever |
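Converting the relation above to First Normal Form means giving each member of a repeating group its own row. The Python sketch below does this for two of the patients, assuming the repeating groups are held as a list of surgery records per patient; the function `to_first_normal_form` and the field names are hypothetical, and the dates use the format of the First Normal Form table.

```python
from typing import Dict, List

Row = Dict[str, object]

# Unnormalized: one record per patient, with a repeating group of surgeries.
unnormalized: List[dict] = [
    {"patient": 1111, "name": "John White", "addr": "15 New St. New York, NY",
     "surgeries": [
         {"surgeon": 145, "surgeon_name": "Beth Little", "date": "01-Jan-95",
          "surgery": "Gallstones removal", "drug": "Penicillin", "side_effects": "rash"},
         {"surgeon": 311, "surgeon_name": "Michael Diamond", "date": "12-Jun-95",
          "surgery": "Kidney stones removal", "drug": "none", "side_effects": "none"},
     ]},
    {"patient": 2345, "name": "Charles Brown", "addr": "Dogwood Lane Harrison, NY",
     "surgeries": [
         {"surgeon": 189, "surgeon_name": "David Rosen", "date": "08-Jan-96",
          "surgery": "Open Heart Surgery", "drug": "Cephalosporin", "side_effects": "none"},
     ]},
]


def to_first_normal_form(records: List[dict]) -> List[Row]:
    """Emit one flat row per surgery, repeating the patient's single-valued columns."""
    rows: List[Row] = []
    for rec in records:
        patient_cols = {k: v for k, v in rec.items() if k != "surgeries"}
        for surgery in rec["surgeries"]:
            rows.append({**patient_cols, **surgery})
    return rows


for row in to_first_normal_form(unnormalized):
    print(row["patient"], row["surgeon"], row["date"], row["surgery"])
# 1111 145 01-Jan-95 Gallstones removal
# 1111 311 12-Jun-95 Kidney stones removal
# 2345 189 08-Jan-96 Open Heart Surgery
```

The price of flattening is visible in the output: John White's name and address are now stored twice, which is exactly the redundancy the later normal forms remove.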

First Normal Form

| Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug Admin | Side Effects |
|---|---|---|---|---|---|---|---|---|
| 1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash |
| 1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none |
| 1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever |
| 1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none |
| 2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none |
| 4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none |
| 5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none |
| 6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever |
| 6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none |

Second Normal Form

| Patient # | Patient Name | Patient Address |
|---|---|---|
| 1111 | John White | 15 New St. New York, NY |
| 1234 | Mary Jones | 10 Main St. Rye, NY |
| 2345 | Charles Brown | Dogwood Lane Harrison, NY |
| 4876 | Hal Kane | 55 Boston Post Road, Chester, CN |
| 5123 | Paul Kosher | Blind Brook Mamaroneck, NY |
| 6845 | Ann Hood | Hilton Road Larchmont, NY |

Second Normal Form

| Surgeon # | Surgeon Name |
|---|---|
| 145 | Beth Little |
| 189 | David Rosen |
| 243 | Charles Field |
| 311 | Michael Diamond |
| 467 | Patricia Gold |

Second Normal Form

| Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects |
|---|---|---|---|---|---|
| 1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash |
| 1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none |
| 1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever |
| 1234 | 467 | 10-May-95 | Thrombosis removal | none | none |
| 2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none |
| 4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none |
| 5123 | 145 | 10-May-95 | Gallstones Removal | none | none |
| 6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none |
| 6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever |
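The step from 1NF to 2NF is a set of relational projections: attributes that depend on Patient # alone and attributes that depend on Surgeon # alone move to their own tables, and the duplicate rows disappear. A minimal Python sketch over three rows of the 1NF table, with a hypothetical `project` helper and illustrative field names:

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection: keep only attrs and drop duplicate tuples (order preserved)."""
    seen = set()
    result: List[Row] = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result


first_nf: List[Row] = [
    {"patient": 1111, "surgeon": 145, "date": "01-Jan-95", "patient_name": "John White",
     "surgeon_name": "Beth Little", "surgery": "Gallstones removal"},
    {"patient": 1111, "surgeon": 311, "date": "12-Jun-95", "patient_name": "John White",
     "surgeon_name": "Michael Diamond", "surgery": "Kidney stones removal"},
    {"patient": 4876, "surgeon": 145, "date": "05-Nov-95", "patient_name": "Hal Kane",
     "surgeon_name": "Beth Little", "surgery": "Cholecystectomy"},
]

patients = project(first_nf, ["patient", "patient_name"])    # depends on Patient # only
surgeons = project(first_nf, ["surgeon", "surgeon_name"])    # depends on Surgeon # only
surgeries = project(first_nf, ["patient", "surgeon", "date", "surgery"])
print(len(first_nf), len(patients), len(surgeons), len(surgeries))  # 3 2 2 3
```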

Third Normal Form

| Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin |
|---|---|---|---|---|
| 1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin |
| 1111 | 311 | 12-Jun-95 | Kidney stones removal | none |
| 1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline |
| 1234 | 467 | 10-May-95 | Thrombosis removal | none |
| 2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin |
| 4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin |
| 5123 | 145 | 10-May-95 | Gallstones Removal | none |
| 6845 | 243 | 15-Dec-84 | Eye cataract removal | none |
| 6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline |

Third Normal Form

| Drug Admin | Side Effects |
|---|---|
| Cephalosporin | none |
| Demicillin | none |
| none | none |
| Penicillin | rash |
| Tetracycline | Fever |
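In the 2NF Surgery table, Side Effects depends on Drug Admin rather than on the key, which is a transitive dependency. The Python sketch below (illustrative names, four rows of that table) moves the dependency into a separate drug lookup and then checks that joining the pieces back together recovers the original rows. After the split, each drug's side effect is stored once instead of once per surgery.

```python
# 2NF Surgery rows: (patient, surgeon, date, surgery, drug, side_effects)
surgery_2nf = [
    (1111, 145, "01-Jan-95", "Gallstones removal", "Penicillin", "rash"),
    (1234, 243, "05-Apr-94", "Eye Cataract removal", "Tetracycline", "Fever"),
    (6845, 243, "05-Apr-94", "Eye Cornea Replacement", "Tetracycline", "Fever"),
    (5123, 145, "10-May-95", "Gallstones Removal", "none", "none"),
]

# 3NF Surgery table: drop the transitively dependent column.
surgery_3nf = [row[:5] for row in surgery_2nf]

# Drug table: drug -> side effects, stored once per drug.
drug_effects = {}
for *_, drug, effect in surgery_2nf:
    if drug_effects.setdefault(drug, effect) != effect:
        # The split is only safe if Drug Admin -> Side Effects really holds.
        raise ValueError(f"{drug} recorded with conflicting side effects")

print(drug_effects)
# {'Penicillin': 'rash', 'Tetracycline': 'Fever', 'none': 'none'}

# Joining on the drug column rebuilds the 2NF rows exactly.
rebuilt = [row + (drug_effects[row[4]],) for row in surgery_3nf]
print(rebuilt == surgery_2nf)  # True
```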

Most 3NF Relations are also BCNF

| Patient # | Patient Name | Patient Address |
|---|---|---|
| 1111 | John White | 15 New St. New York, NY |
| 1234 | Mary Jones | 10 Main St. Rye, NY |
| 2345 | Charles Brown | Dogwood Lane Harrison, NY |
| 4876 | Hal Kane | 55 Boston Post Road, Chester, CN |
| 5123 | Paul Kosher | Blind Brook Mamaroneck, NY |
| 6845 | Ann Hood | Hilton Road Larchmont, NY |
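Whether a relation is in BCNF can be checked mechanically: every non-trivial functional dependency X → Y needs a determinant X whose attribute closure covers the whole relation (that is, X is a superkey). The Python sketch below uses the standard attribute-closure algorithm. The dependency sets are assumptions written for illustration: one for the patient table above, and one for the 2NF Surgery table with the assumed key (patient, surgeon, date).

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build a functional dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: keep applying FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema: Set[str], fds: List[FD]) -> List[FD]:
    """Non-trivial FDs whose determinant is not a superkey of the schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]


patient = {"patient", "name", "address"}
print(bcnf_violations(patient, [fd("patient", "name address")]))  # []

surgery = {"patient", "surgeon", "date", "surgery", "drug", "side_effects"}
surgery_fds = [fd("patient surgeon date", "surgery drug"), fd("drug", "side_effects")]
print(bcnf_violations(surgery, surgery_fds))
# [(frozenset({'drug'}), frozenset({'side_effects'}))]
```

The patient table passes because its only determinant is its key; the 2NF Surgery table fails on Drug Admin, which is the same dependency that the 3NF step removed.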

Fourth Normal Form

• A relation is in Fourth Normal Form if it is in BCNF and all of its multivalued dependencies are trivial.

• Eliminate non-trivial multivalued dependencies by projecting the relation into simpler tables.
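A multivalued dependency X ->> Y holds when, for each value of X, the Y-values and the values of the remaining attributes occur in every combination. The Python sketch below tests that condition and shows the 4NF fix of projecting into two tables. The course/teacher/book relation is a hypothetical textbook-style example, not part of the hospital data, and `mvd_holds` is an illustrative name.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]


def mvd_holds(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """Test X ->> Y: within each X-group, every Y-value must appear with
    every value of the remaining attributes Z (a Cartesian product)."""
    if not rows:
        return True
    z = [a for a in rows[0] if a not in x and a not in y]
    groups: Dict[tuple, Set[Tuple[tuple, tuple]]] = defaultdict(set)
    for r in rows:
        key = tuple(r[a] for a in x)
        groups[key].add((tuple(r[a] for a in y), tuple(r[a] for a in z)))
    for pairs in groups.values():
        ys = {yv for yv, _ in pairs}
        zs = {zv for _, zv in pairs}
        if set(product(ys, zs)) != pairs:
            return False
    return True


# Hypothetical: a course's teachers and its books are independent facts.
offering = [
    {"course": "DB", "teacher": "T1", "book": "B1"},
    {"course": "DB", "teacher": "T1", "book": "B2"},
    {"course": "DB", "teacher": "T2", "book": "B1"},
    {"course": "DB", "teacher": "T2", "book": "B2"},
]
print(mvd_holds(offering, ["course"], ["teacher"]))      # True
print(mvd_holds(offering[:3], ["course"], ["teacher"]))  # False

# 4NF decomposition: one table per independent multivalued fact.
teaches = sorted({(r["course"], r["teacher"]) for r in offering})
uses_book = sorted({(r["course"], r["book"]) for r in offering})
print(teaches)    # [('DB', 'T1'), ('DB', 'T2')]
print(uses_book)  # [('DB', 'B1'), ('DB', 'B2')]
```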

Fifth Normal Form

• A relation is in 5NF if every join dependency in the relation is implied by the candidate keys of the relation.

• Decompositions made in the earlier normal forms are lossless: the projected relations can be recombined via natural joins to recreate the original relation.
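The join-dependency idea can be demonstrated with a relation that loses information when split into two projections but is recovered exactly by joining three. The Python sketch below uses a hypothetical supplier/part/project relation (s, p, j) of the kind found in textbooks; the `project` and `natural_join` helpers are illustrative. Because this join dependency is not implied by the relation's only key (all three attributes), the relation is not in 5NF, and the three-way decomposition is the 5NF design.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection with duplicate elimination."""
    unique = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in sorted(unique)]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Nested-loop natural join on every attribute name the inputs share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**lt, **rt} for lt in left for rt in right
            if all(lt[a] == rt[a] for a in common)]


def as_set(rows: List[Row]) -> Set[Tuple[Tuple[str, str], ...]]:
    """Compare relations as sets of tuples, ignoring row and column order."""
    return {tuple(sorted(r.items())) for r in rows}


# Hypothetical supplier (s) / part (p) / project (j) relation.
spj = [
    {"s": "s1", "p": "p1", "j": "j2"},
    {"s": "s1", "p": "p2", "j": "j1"},
    {"s": "s2", "p": "p1", "j": "j1"},
    {"s": "s1", "p": "p1", "j": "j1"},
]
sp, pj, js = project(spj, ["s", "p"]), project(spj, ["p", "j"]), project(spj, ["j", "s"])

two_way = natural_join(sp, pj)         # contains the spurious tuple (s2, p1, j2)
three_way = natural_join(two_way, js)  # the third projection filters it out
print(len(two_way), as_set(two_way) == as_set(spj))      # 5 False
print(len(three_way), as_set(three_way) == as_set(spj))  # 4 True
```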

Normalization

• Normalization is performed to reduce or eliminate insertion, deletion, and update anomalies.

• However, a completely normalized database may not be the most efficient or effective implementation.

• “Denormalization” is sometimes used to improve efficiency.

Denormalization

• Usually driven by the need to improve query speed

• Query speed is improved at the expense of more complex or problematic DML (Data manipulation language) for updates, deletions and insertions.

Downward Denormalization

Before:

• Customer (ID, Address, Name, Telephone)

• Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID)

After:

• Customer (ID, Address, Name, Telephone)

• Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name)
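The cost of downward denormalization shows up in the DML: once Cust Name is copied into Order, every change to a customer's name must also reach that customer's orders. One way to automate this is with triggers. The sketch below uses Python's built-in `sqlite3` module as a stand-in for whatever DBMS is in use (the course itself uses Access); the table, column and trigger names and the sample customer are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    address   TEXT,
    telephone TEXT
);
-- ORDER is a reserved word in SQL, so the table is called orders here.
CREATE TABLE orders (
    order_no        INTEGER PRIMARY KEY,
    date_taken      TEXT,
    date_dispatched TEXT,
    date_invoiced   TEXT,
    cust_id         INTEGER REFERENCES customer(id),
    cust_name       TEXT  -- denormalized copy of customer.name
);
-- Fill in the copy when an order is inserted.
CREATE TRIGGER order_added AFTER INSERT ON orders
BEGIN
    UPDATE orders SET cust_name = (SELECT name FROM customer WHERE id = NEW.cust_id)
     WHERE order_no = NEW.order_no;
END;
-- Propagate every rename of a customer to all of that customer's orders.
CREATE TRIGGER customer_renamed AFTER UPDATE OF name ON customer
BEGIN
    UPDATE orders SET cust_name = NEW.name WHERE cust_id = NEW.id;
END;
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Books', '1 Main St', '555-0100')")
conn.execute("INSERT INTO orders (order_no, date_taken, cust_id) VALUES (100, '2000-09-21', 1)")
conn.execute("UPDATE customer SET name = 'Acme Books Inc.' WHERE id = 1")

# Listing orders with customer names no longer needs a join.
print(conn.execute("SELECT order_no, cust_name FROM orders").fetchall())
# [(100, 'Acme Books Inc.')]
```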

Upward Denormalization

Before:

• Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name)

• Order Item (Order No, Item No, Item Price, Num Ordered)

After:

• Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name, Order Price)

• Order Item (Order No, Item No, Item Price, Num Ordered)
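Upward denormalization stores a value that could otherwise be computed from the child table, here Order Price as the sum of Item Price × Num Ordered. A minimal `sqlite3` sketch under the same assumptions (hypothetical names and data, SQLite standing in for Access). It only maintains the total on insert; a real design would also need UPDATE and DELETE triggers on the item table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    cust_id     INTEGER,
    order_price REAL NOT NULL DEFAULT 0  -- denormalized total of the items below
);
CREATE TABLE order_item (
    order_no    INTEGER REFERENCES orders(order_no),
    item_no     INTEGER,
    item_price  REAL,
    num_ordered INTEGER,
    PRIMARY KEY (order_no, item_no)
);
-- Recompute the stored total whenever an item is added.
CREATE TRIGGER item_added AFTER INSERT ON order_item
BEGIN
    UPDATE orders
       SET order_price = (SELECT SUM(item_price * num_ordered)
                            FROM order_item WHERE order_no = NEW.order_no)
     WHERE order_no = NEW.order_no;
END;
""")

conn.execute("INSERT INTO orders (order_no, cust_id) VALUES (1, 42)")
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?, ?)",
                 [(1, 10, 2.50, 4), (1, 11, 10.00, 1)])

# The order total is read directly instead of being aggregated at query time.
print(conn.execute("SELECT order_price FROM orders WHERE order_no = 1").fetchone())
# (20.0,)
```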

Today: New Design

• Today we will build the COOKIE database from needs (rough) through the conceptual model, logical model and finally physical implementation in Access.

ER Diagram Symbols

• Entity: rectangles are used to indicate entities (that is, the records describing persons, things, or events in the database).

• Relationship: diamonds are used to indicate relationships between entities (that is, some association between the data records of different entities).

• Attribute: ovals are used to indicate the attributes associated with an entity or relationship (that is, the pieces of information recorded in the database about the entity or relationship).

• Primary key: an underlined attribute name indicates that the attribute is a primary key (that is, it can uniquely identify the entity).

Cookie Requirements

• Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries.

• Need to record which books are held by which libraries

• Need to search on bibliographic information

– Author, title, subject, call number for a given library, etc.

• Need to know who publishes the books for ordering, etc.

Cookie Database

• There are currently five main types of entities in the database, plus a table of links between subjects and books:

– Books (bibfile)

– Local Call numbers (callfile)

– Libraries (libfile)

– Publishers (pubfile)

– Subject headings (subfile)

– Links between subject and books (indxfile)

BIBFILE

• Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are:

– accno -- an “accession” or serial number

– author -- The author’s name (unrealistically, only one author per book)

– title -- The title of the book

– loc -- Location of publication (where published)

– date -- Date of publication

– price -- Price of the book

– pagination -- Number of pages

– ill -- What type of illustrations (maps, etc) if any

– height -- Height of the book in centimeters

Books/BIBFILE

[Diagram: entity Books with attributes accno, Author, Title, Loc, Date, Price, Pagination, Ill and Height]

CALLFILE

• CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are:

– accno -- the book accession number

– libid -- the id of the holding library

– callno -- the call number of the book in the particular library

– copies -- the number of copies held by the particular library

LocalInfo/CALLFILE

[Diagram: entity CALLFILE with attributes accno, libid, Callno and Copies]

LIBFILE

• LIBFILE contains information about the libraries participating in this union catalog. Its attributes include:

– libid -- Library id number

– library -- Name of the library

– laddress -- Street address for the library

– lcity -- City name

– lstate -- State code (postal abbreviation)

– lzip -- Zip code

– lphone -- Phone number

– mop through suncl -- Library opening and closing times for each day of the week (Monday opening through Sunday closing)

Libraries/LIBFILE

[Diagram: entity LIBFILE with attributes Libid, Library, laddress, lcity, lstate, lzip, lphone, and opening/closing times MOp, Mcl, TuOp, TuCl, WOp, WCl, ThOp, ThCl, FOp, FCl, SatOp, SatCl, SunOp and Suncl]

PUBFILE

• PUBFILE contains information about the publishers of books. Its attributes include:

– pubid -- The publisher’s id number

– publisher -- Publisher name

– paddress -- Publisher street address

– pcity -- Publisher city

– pstate -- Publisher state

– pzip -- Publisher zip code

– pphone -- Publisher phone number

– ship -- Standard shipping time in days

Publisher/PUBFILE

[Diagram: entity PUBFILE with attributes pubid, Publisher, paddress, pcity, pstate, pzip, pphone and Ship]

SUBFILE

• SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are:

– subcode -- Subject identification number

– subject -- The subject heading/description

Subjects/SUBFILE

[Diagram: entity SUBFILE with attributes subid and Subject]

INDXFILE

• INDXFILE provides a many-to-many mapping between subject headings and books. Its attributes consist entirely of links to other tables:

– subcode -- link to subject id

– accno -- link to book accession number
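A sketch of how INDXFILE resolves the many-to-many relationship, using Python's `sqlite3` module as a stand-in for Access. Only the columns needed for the example are created, the column types are assumptions, and the accession numbers and subject codes are hypothetical. The composite primary key (subcode, accno) stops the same subject from being attached to the same book twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, author TEXT, title TEXT);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno   INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
""")
conn.executemany("INSERT INTO bibfile VALUES (?, ?, ?)", [
    (1, "Flexner, Abraham", "Universities: American, English, German"),
    (2, "Whitehead, Alfred North", "The Aims of Education and Other Essays"),
])
conn.executemany("INSERT INTO subfile VALUES (?, ?)",
                 [(100, "Higher education"), (200, "Education")])
# One book can carry several subjects, and one subject can index several books.
conn.executemany("INSERT INTO indxfile VALUES (?, ?)", [(100, 1), (200, 1), (200, 2)])

# Books on a given subject: join through the linking table.
titles = conn.execute("""
    SELECT b.title
      FROM bibfile b
      JOIN indxfile i ON i.accno = b.accno
      JOIN subfile  s ON s.subcode = i.subcode
     WHERE s.subject = ?
     ORDER BY b.title
""", ("Education",)).fetchall()
print(titles)
# [('The Aims of Education and Other Essays',), ('Universities: American, English, German',)]
```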

Linking Subjects and Books

[Diagram: linking entity INDXFILE with attributes accno and subid]

Some examples of Cookie Searches

• Who wrote Microcosmographia Academica?

• How many pages long is Alfred Whitehead’s The Aims of Education and Other Essays?

• Which branches in Berkeley’s public library system are open on Sunday?

• What is the call number of Moffitt Library’s copy of Abraham Flexner’s book Universities: American, English, German?

• What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries?

• Print a list of the Mechanics Library holdings, in descending order by height.

• What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)?

• Which library closes earliest on Friday night?
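As an example of turning one of these questions into SQL, the sketch below answers "Which library closes earliest on Friday night?" against a cut-down LIBFILE in Python's `sqlite3` module. It assumes closing times are stored as zero-padded 24-hour 'HH:MM' text (so string order matches time order) and that NULL means the library is closed that day; the library names and times are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns this query needs.
conn.execute("CREATE TABLE libfile (libid INTEGER PRIMARY KEY, library TEXT, fcl TEXT)")
conn.executemany("INSERT INTO libfile VALUES (?, ?, ?)", [
    (1, "Library A", "21:00"),
    (2, "Library B", "17:30"),
    (3, "Library C", None),  # closed on Friday
])

# SQLite sorts NULLs first in ascending order, so closed libraries are filtered out.
earliest = conn.execute("""
    SELECT library, fcl
      FROM libfile
     WHERE fcl IS NOT NULL
     ORDER BY fcl
     LIMIT 1
""").fetchone()
print(earliest)  # ('Library B', '17:30')
```

Without the `WHERE fcl IS NOT NULL` filter, the closed library would be reported as "closing earliest", which is why the storage convention for times and closed days matters in the table design.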

Cookie ER diagram

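[Diagram: entities BIBFILE, CALLFILE, LIBFILE, PUBFILE, INDXFILE and SUBFILE, connected by the relationships “Has call”, “Has copy”, “publishes”, “Has index” and “Has subject”. The linking attributes are accno (BIBFILE to CALLFILE and to INDXFILE), libid (LIBFILE to CALLFILE), pubid (PUBFILE to BIBFILE) and subcode (SUBFILE to INDXFILE).]

Note: the diagram contains only attributes used for linking.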

What Problems?

• What sorts of problems and missing features arise given the previous ER diagram?

Problems Identified

• Field sizes inappropriate

• Author doesn’t allow multiple authors (editors, etc.)

• Subtitles, parallel titles

• Edition information

• Series information

• Lending status

• Material type designation

• Genre, class information

• Better codes (ISBN?)

• Missing information (ISBN)

• Authority control for authors

• Missing/incomplete data

• Data entry problems

• Ordering information

• Illustrations

• Subfield separation (such as last_name, first_name)

• Separate personal and corporate authors

Problems (Cont.)

• Location field inconsistent

• No notes field

• No language field

• Zip code doesn’t support ZIP+4

• No publisher shipping addresses

• No (indexable) keyword search capability

• No support for multivolume works

• No support for URLs

– to online version

– to libraries

– to publishers

Original Cookie ER diagram

[Diagram: the same entities and relationships as the Cookie ER diagram, with the subject link labeled subid, plus some non-linking attributes: Library and Address, etc. on LIBFILE, Callno on CALLFILE, and subject on SUBFILE.]

Cookie2: Separate Name Authorities

[Diagram: the Cookie ER diagram (BIBFILE, CALLFILE, LIBFILE, PUBFILE, INDXFILE, SUBFILE) extended with an AUTHFILE entity (nameid, name) and an AUTHBIB linking table (accno, nameid, authtype) connecting authors to books.]

Cookie3: Keywords

[Diagram: the Cookie2 diagram extended with a TERMS entity (termid) and a KEYMAP linking table (accno, termid) connecting keyword terms to books.]

Cookie 4: Series

[Diagram: the Cookie3 diagram extended with a SERIES entity (seriesid, ser_title), linked to the book records by seriesid.]

Cookie 5: Circulation

[Diagram: the Cookie 4 diagram extended with CIRC and PATRON entities for circulation; the attributes shown include circid, copynum and patronid.]

Mapping to Relations

• Take each entity

– BIBFILE

– LIBFILE

– CALLFILE

– SUBFILE

– PUBFILE

– INDXFILE

• And make it a table...
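Following that recipe, here is one possible DDL for the six Cookie tables, run through Python's `sqlite3` module rather than Access. The column types, the primary keys, and the pubid column on bibfile (implied by the ER diagram's "publishes" relationship but not listed among the BIBFILE attributes) are assumptions; Access would use its own data types.

```python
import sqlite3

# Column types and keys are assumptions; the slides list only attribute names.
COOKIE_DDL = """
CREATE TABLE pubfile (
    pubid     INTEGER PRIMARY KEY,
    publisher TEXT, paddress TEXT, pcity TEXT, pstate TEXT, pzip TEXT, pphone TEXT,
    ship      INTEGER               -- standard shipping time in days
);
CREATE TABLE bibfile (
    accno      INTEGER PRIMARY KEY,  -- accession number
    author     TEXT, title TEXT, loc TEXT, date TEXT,
    price      REAL, pagination INTEGER, ill TEXT,
    height     REAL,                 -- centimeters
    pubid      INTEGER REFERENCES pubfile(pubid)
);
CREATE TABLE libfile (
    libid   INTEGER PRIMARY KEY,
    library TEXT, laddress TEXT, lcity TEXT, lstate TEXT, lzip TEXT, lphone TEXT,
    mop TEXT, mcl TEXT, tuop TEXT, tucl TEXT, wop TEXT, wcl TEXT, thop TEXT, thcl TEXT,
    fop TEXT, fcl TEXT, satop TEXT, satcl TEXT, sunop TEXT, suncl TEXT
);
CREATE TABLE callfile (             -- holdings: one row per book per library
    accno  INTEGER REFERENCES bibfile(accno),
    libid  INTEGER REFERENCES libfile(libid),
    callno TEXT,
    copies INTEGER,
    PRIMARY KEY (accno, libid)
);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno   INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(COOKIE_DDL)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
# ['bibfile', 'callfile', 'indxfile', 'libfile', 'pubfile', 'subfile']
```

Making (accno, libid) the key of callfile encodes the assumption that a library has one call number per book and records its copies as a count; tracking individual copies (as the circulation extension suggests) would need a different key.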

Implementing the Physical Database...

• For each of the entities, we will build a table…

• Start up Access…

• Use “New” in Tables…

• Loading data

• Entering data

• Data entry forms