Callie’s Birthday 2004-10-05 - SLIDE 1
IS 202 – FALL 2004
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
http://www.sims.berkeley.edu/academics/courses/is202/f04/
SIMS 202:
Information Organization
and Retrieval
Lecture 11: Intro to Database Design
Lecture Overview
• Review
  – Evaluation exercise
• Databases and Database Design
• Database Life Cycle
• ER Diagrams
• Discussion
• Next Time/Readings
What is a Database?
Files and Databases
• File: A collection of records or documents dealing with one organization, person, area or subject (Rowley)
  – Manual (paper) files
  – Computer files
• Database: A collection of similar records with relationships between the records (Rowley)
  – Bibliographic, statistical, business data, images, etc.
Database
• A Database is a collection of stored operational data used by the application systems of some particular enterprise (C.J. Date)
  – Paper “Databases”
    • Still contain a large portion of the world’s knowledge
  – File-Based Data Processing Systems
    • Early batch processing of (primarily) business data
  – Database Management Systems (DBMS)
Why DBMS?
• History
  – 50’s and 60’s: all applications were custom built for particular needs
  – File based
  – Many similar/duplicative applications dealing with collections of business data
  – Early DBMS were extensions of programming languages
  – 1970 - E.F. Codd and the Relational Model
  – 1979 - Ashton-Tate and first Microcomputer DBMS
File Based Systems
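[Diagram: separate applications (Naughty, Nice “Just what asked for”, Coal Estimation, Delivery List), each using its own Application File; files shown include Toys, Addresses, and a second Toys file]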
From File Systems to DBMS
• Problems with file processing systems
  – Inconsistent data
  – Inflexibility
  – Limited data sharing
  – Poor enforcement of standards
  – Excessive program maintenance
DBMS Benefits
• Minimal data redundancy
• Consistency of data
• Integration of data
• Sharing of data
• Ease of application development
• Uniform security, privacy, and integrity controls
• Data accessibility and responsiveness
• Data independence
• Reduced program maintenance
Terms and Concepts
• Data independence
  – Physical representation and location of data and the use of that data are separated
    • The application doesn’t need to know how or where the database has stored the data, but just how to ask for it
    • Moving a database from one DBMS to another should not have a material effect on application programs
    • Recoding, adding fields, etc. in the database should not affect applications
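A minimal sketch of data independence, assuming Python's built-in `sqlite3` module as a stand-in DBMS (the slides do not name one). The table, its columns, and the rows are hypothetical. The application only states what it wants in a query; adding an index (a physical change) and adding a field (a schema change) leave that query and its answer untouched.

```python
import sqlite3

# Hypothetical employee table; SQLite stands in for "the DBMS".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, last TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("111", "Smith", "Sales"), ("222", "Jones", "Research"), ("333", "Wynar", "Sales")],
)

# The application says *what* it wants, not *how* or *where* the data is stored.
QUERY = "SELECT last FROM employee WHERE dept = ? ORDER BY last"

def staff_in(dept):
    return [last for (last,) in conn.execute(QUERY, (dept,))]

before = staff_in("Sales")

# Physical change: add an index (a new access path the DBMS may choose to use).
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after_index = staff_in("Sales")

# Schema change: add a field the application does not know about.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
after_new_field = staff_in("Sales")

print(before, after_index, after_new_field)  # each is ['Smith', 'Wynar']
```

The index may change how SQLite locates the rows, but nothing in `staff_in` has to change.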
Database Environment
[Diagram: components of the database environment: CASE Tools, Repository, DBMS, Database, User Interface, Application Programs]
Database Components
DBMS
• Design tools
  – Table creation
  – Form creation
  – Query creation
  – Report creation
  – Procedural language compiler (4GL)
• Run time
  – Form processor
  – Query processor
  – Report writer
  – Language run time
User interface applications and application programs
• Database contains:
  – User’s data
  – Metadata
  – Indexes
  – Application metadata
Types of Database Systems
• PC databases
• Centralized database
• Client/server databases
• Distributed databases
• Database models
PC Databases
• E.g.: Access, FoxPro, dBASE, etc.
Centralized Databases
[Diagram: a single Central Computer]
Client Server Databases
[Diagram: several Clients connected over a Network to a Database Server]
Distributed Databases
[Diagram: Homogeneous Databases: computers at Location A, Location B, and Location C]
Distributed Databases
[Diagram: Heterogeneous or Federated Databases: Clients and a Database Server on a Local Network, with a Comm Server connecting to Remote Computers]
Terms and Concepts
• A “database application” is an application program (or set of related programs) that is used to perform a series of database activities on behalf of database users:
  – Create
    • Add new data to the database
  – Read
    • Read current data from the database
  – Update
    • Update or modify current database data
  – Delete
    • Remove current data from the database
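The four activities map directly onto SQL statements. A minimal sketch, assuming Python's `sqlite3` module as the DBMS and a hypothetical `book` table with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# Create: add new data to the database
conn.executemany("INSERT INTO book (id, title) VALUES (?, ?)",
                 [(1, "The history"), (2, "Another title")])

# Read: read current data from the database
titles = [title for (title,) in conn.execute("SELECT title FROM book ORDER BY id")]

# Update: modify current database data
conn.execute("UPDATE book SET title = ? WHERE id = ?", ("Another title, revised", 2))

# Delete: remove current data from the database
conn.execute("DELETE FROM book WHERE id = ?", (1,))
conn.commit()

remaining = conn.execute("SELECT id, title FROM book").fetchall()
print(titles)     # ['The history', 'Another title']
print(remaining)  # [(2, 'Another title, revised')]
```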
Terms and Concepts
• Enterprise
  – Organization
• Entity
  – Person, Place, Thing, Event, Concept...
• Attributes
  – Data elements (facts) about some entity
  – Also sometimes called fields or items or domains
• Data values
  – Instances of a particular attribute for a particular entity
Terms and Concepts
• Key
  – An attribute or set of attributes used to identify or locate records in a file
• Primary Key
  – An attribute or set of attributes that uniquely identifies each record in a file
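A minimal sketch of the difference between a key and a primary key, assuming Python's `sqlite3` module and a hypothetical `employee` table: `last` is a key used to locate records and may repeat, while `ssn` is the primary key, so the DBMS rejects a second record with the same value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ssn is the primary key: it must uniquely identify each record.
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, last TEXT)")
# last is an ordinary (non-unique) key, indexed so records can be located by it.
conn.execute("CREATE INDEX idx_employee_last ON employee (last)")

conn.execute("INSERT INTO employee VALUES ('111', 'Smith')")
conn.execute("INSERT INTO employee VALUES ('222', 'Smith')")  # repeated last name is fine

try:
    conn.execute("INSERT INTO employee VALUES ('111', 'Jones')")  # repeated primary key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

smiths = conn.execute(
    "SELECT ssn FROM employee WHERE last = 'Smith' ORDER BY ssn").fetchall()
print(duplicate_rejected, smiths)  # True [('111',), ('222',)]
```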
Terms and Concepts
• Models
  – (1) Levels or views of the Database
    • Conceptual, logical, physical
  – (2) DBMS types
    • Relational, Hierarchic, Network, Object-Oriented, Object-Relational
Models (1)
[Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, related to a Conceptual Model, a Logical Model, and an Internal Model]
More later on this…
Data Models(2): History
• Hierarchical Model (1960’s and 1970’s)
  – Similar to data structures in programming languages
[Diagram: a tree with Books (id, title) at the root and Publisher, Subjects, and Authors (first, last) as its children]
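To make the hierarchical model concrete, here is a minimal Python sketch using nested dictionaries. The titles, publisher, subjects, and last names are copied from the relational-model slide (titles truncated as they appear there); the first names are hypothetical. It shows the trade-off of a tree: children are easy to reach from their root, but a question that starts elsewhere means scanning every tree, and shared data such as the publisher is repeated in each one.

```python
# Hypothetical hierarchical records: each Book is the root of its own tree, and
# its Publisher, Subjects and Authors are child records stored inside that book.
books = [
    {"id": 1, "title": "Introductio", "publisher": {"name": "Addison"},
     "subjects": ["history"], "authors": [{"first": "Ann", "last": "Smith"}]},
    {"id": 4, "title": "Another title", "publisher": {"name": "Addison"},
     "subjects": ["history", "stuff"], "authors": [{"first": "Bo", "last": "Duncan"}]},
]

# Top-down access from the root: a book's own children are one step away.
def author_names(book):
    return [f"{a['first']} {a['last']}" for a in book["authors"]]

# A question that does not start at a root (which books does Addison publish?)
# requires scanning every tree; note the publisher record is duplicated per book.
def titles_from(publisher_name):
    return [b["title"] for b in books if b["publisher"]["name"] == publisher_name]

print(author_names(books[1]))  # ['Bo Duncan']
print(titles_from("Addison"))  # ['Introductio', 'Another title']
```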
Data Models(2): History
• Network Model (1970’s)
  – Provides for single entries of data and navigational “links” through chains of data.
[Diagram: Subjects, Books, Authors, and Publishers records connected by links]
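For contrast, a minimal Python sketch of the network model's idea: each record is stored once and relationships are explicit links that a program navigates. The classes and the `add_book` helper are hypothetical, and object references stand in for the link chains of a real network DBMS; the sample values come from the relational-model slide.

```python
from dataclasses import dataclass, field
from typing import List

# eq=False: records compare by identity, which also avoids recursing through
# the cyclic links when comparing.
@dataclass(eq=False)
class Publisher:
    name: str
    books: List["Book"] = field(default_factory=list)  # link chain: publisher -> books

@dataclass(eq=False)
class Author:
    last: str
    books: List["Book"] = field(default_factory=list)  # link chain: author -> books

@dataclass(eq=False)
class Book:
    title: str
    publisher: Publisher   # link: book -> its single publisher
    authors: List[Author]  # links: book -> its authors

def add_book(title, publisher, authors):
    """Store a book once and wire up the links in both directions."""
    book = Book(title, publisher, authors)
    publisher.books.append(book)
    for author in authors:
        author.books.append(book)
    return book

addison = Publisher("Addison")
smith, duncan = Author("Smith"), Author("Duncan")
add_book("Introductio", addison, [smith])
add_book("Another title", addison, [duncan])

# Navigational access: start at one record and follow links to related records.
addison_authors = [a.last for b in addison.books for a in b.authors]
print(addison_authors)  # ['Smith', 'Duncan']
```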
Data Models(2): History
• Relational Model (1980’s)
  – Provides a conceptually simple model for data as relations (typically considered “tables”) with all data visible

Book ID | Title | pubid | Author id
1 | Introductio | 2 | 1
2 | The history | 4 | 2
3 | New stuff ab | 3 | 3
4 | Another title | 2 | 4
5 | And yet more | 1 | 5

pubid | pubname
1 | Harper
2 | Addison
3 | Oxford
4 | Que

Authorid | Author name
1 | Smith
2 | Wynar
3 | Jones
4 | Duncan
5 | Applegate

Subid | Subject
1 | cataloging
2 | history
3 | stuff

Book ID | Subid
1 | 2
2 | 1
3 | 3
4 | 2
4 | 3
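The same tables can be loaded into a relational DBMS and related purely by matching key values in a join, with no stored links. A minimal sketch assuming Python's `sqlite3` module; the table and column names are our own spelling of the slide's headings, and the rows are copied from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher    (pubid INTEGER PRIMARY KEY, pubname TEXT);
CREATE TABLE author       (authorid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subject      (subid INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE book         (bookid INTEGER PRIMARY KEY, title TEXT,
                           pubid INTEGER REFERENCES publisher,
                           authorid INTEGER REFERENCES author);
CREATE TABLE book_subject (bookid INTEGER REFERENCES book,
                           subid INTEGER REFERENCES subject,
                           PRIMARY KEY (bookid, subid));
""")
conn.executemany("INSERT INTO publisher VALUES (?, ?)",
                 [(1, "Harper"), (2, "Addison"), (3, "Oxford"), (4, "Que")])
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Smith"), (2, "Wynar"), (3, "Jones"), (4, "Duncan"), (5, "Applegate")])
conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [(1, "cataloging"), (2, "history"), (3, "stuff")])
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)",
                 [(1, "Introductio", 2, 1), (2, "The history", 4, 2),
                  (3, "New stuff ab", 3, 3), (4, "Another title", 2, 4),
                  (5, "And yet more", 1, 5)])
conn.executemany("INSERT INTO book_subject VALUES (?, ?)",
                 [(1, 2), (2, 1), (3, 3), (4, 2), (4, 3)])

# Relationships are expressed by matching key values in joins.
addison_books = conn.execute("""
    SELECT b.title, a.name
    FROM book b
    JOIN publisher p ON p.pubid = b.pubid
    JOIN author a    ON a.authorid = b.authorid
    WHERE p.pubname = 'Addison'
    ORDER BY b.bookid""").fetchall()

history_titles = [t for (t,) in conn.execute("""
    SELECT b.title
    FROM book b
    JOIN book_subject bs ON bs.bookid = b.bookid
    JOIN subject s       ON s.subid = bs.subid
    WHERE s.subject = 'history'
    ORDER BY b.bookid""")]

print(addison_books)   # [('Introductio', 'Smith'), ('Another title', 'Duncan')]
print(history_titles)  # ['Introductio', 'Another title']
```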
Data Models(2): History
• Object Oriented Data Model (1990’s)
  – Encapsulates data and operations as “Objects”
[Diagram: a Books (id, title) object with Publisher, Subjects, and Authors (first, last)]
Data Models(2): History
• Object-Relational Model (1990’s)
  – Combines the well-known properties of the Relational Model with such OO features as:
    • User-defined datatypes
    • User-defined functions
    • Inheritance and sub-classing
• All of the major enterprise DBMS systems are now Object-Relational or incorporate Object-Relational features
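A small illustration of a user-defined function, assuming Python's `sqlite3` module. SQLite is not an object-relational DBMS; its `create_function` hook is used here only to show a function defined outside SQL being called inside a query. The `filing_title` function, which ignores a leading English article when sorting titles, is hypothetical.

```python
import sqlite3

def filing_title(title):
    """Hypothetical UDF: lower-case the title and drop a leading article."""
    key = title.lower()
    for article in ("the ", "a ", "an "):
        if key.startswith(article):
            return key[len(article):]
    return key

conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name filing_title, taking 1 argument.
conn.create_function("filing_title", 1, filing_title)
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO book VALUES (?, ?)",
                 [(1, "The history"), (2, "Another title"), (3, "New stuff ab")])

# The user-defined function is used inside SQL like a built-in one.
rows = [t for (t,) in conn.execute("SELECT title FROM book ORDER BY filing_title(title)")]
print(rows)  # ['Another title', 'The history', 'New stuff ab']
```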
Lecture Overview
• Review
  – MediaStreams
• Databases and Database Design
• Database Life Cycle
• ER Diagrams
• Discussion
• Next Time/Readings
Database System Life Cycle
[Diagram: the life cycle as six numbered phases: 1 Design; 2 Physical Creation; 3 Conversion; 4 Integration; 5 Operations; 6 Growth, Change, & Maintenance]
Design
• Determination of the needs of the organization
• Development of the Conceptual Model of the database
  – Typically using Entity-Relationship diagramming techniques
• Construction of a Data Dictionary
• Development of the Logical Model
Physical Creation
• Development of the Physical Model of the Database
  – Data formats and types
  – Determination of indexes, etc.
• Load a prototype database and test
• Determine and implement security, privacy and access controls
• Determine and implement integrity constraints
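A minimal sketch of the physical-creation step, assuming Python's `sqlite3` module: concrete data types, an index, and integrity constraints declared in the schema, then tested by loading a few prototype rows. The table, the `salary` column, and the nine-character SSN rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit each statement
# Physical model: data types, an index, and integrity constraints.
conn.executescript("""
CREATE TABLE employee (
    ssn    TEXT PRIMARY KEY CHECK (length(ssn) = 9),
    last   TEXT NOT NULL,
    salary REAL CHECK (salary >= 0)
);
CREATE INDEX idx_employee_last ON employee (last);
""")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(("123456789", "Smith", 50000.0)),  # valid
    try_insert(("12345", "Jones", 40000.0)),      # SSN has the wrong length
    try_insert(("987654321", None, 40000.0)),     # missing last name
    try_insert(("555555555", "Wynar", -1.0)),     # negative salary
]
print(results)  # [True, False, False, False]
```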
Conversion
• Convert existing data sets and applications to use the new database
  – May need programs, conversion utilities to convert old data to new formats
Integration
• Overlaps with Phase 3
• Integration of converted applications and new applications into the new database
Operations
• All applications run full-scale
• Privacy, security, access control must be in place
• Recovery and Backup procedures must be established and used
Growth, Change, and Maintenance
• Change is a way of life
  – Applications, data requirements, reports, etc. will all change as new needs and requirements are found
  – The database and applications will need to be modified to meet these changes
Another View of the Life Cycle
[Diagram: the six phases in a different arrangement: 1 Design; 2 Physical Creation; 3 Conversion; 4 Integration; 5 Operations; 6 Growth, Change]
Database Design Process
[Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, related to a Conceptual Model, a Logical Model, and an Internal Model]
Entity
• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information
  – Persons (e.g.: customers in a business, employees, authors)
  – Things (e.g.: purchase orders, meetings, parts, companies)
[Diagram: example entity Employee]
Attributes
• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (this is the Metadata for the entities)
[Diagram: entity Employee with attributes Name (made up of First, Middle, Last), SSN, Age, Birthdate, and Projects]
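The Employee example can be written as a Python dataclass. This sketch treats Name as a composite attribute, Projects as a multivalued attribute, and Age as derived from Birthdate, which is how such attributes are commonly drawn in ER diagrams; whether the slide marks them that way is an assumption. Field names and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Name:  # composite attribute: made up of smaller attributes
    first: str
    middle: str
    last: str

@dataclass
class Employee:  # entity type
    ssn: str          # identifying attribute
    name: Name
    birthdate: date
    projects: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, on: date) -> int:
        """Derived attribute: computed from birthdate rather than stored."""
        years = on.year - self.birthdate.year
        if (on.month, on.day) < (self.birthdate.month, self.birthdate.day):
            years -= 1  # birthday not yet reached in that year
        return years

e = Employee("123456789", Name("Ann", "B", "Smith"), date(1970, 10, 5), ["P1", "P2"])
print(e.age(date(2004, 10, 5)))  # 34
print(e.age(date(2004, 10, 4)))  # 33
```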
Relationships
• Relationships are the associations between entities
• They can involve one or more entities and belong to particular relationship types
Relationships
[Diagram: Student Attends Class; Supplier, Part, and Project linked by the relationship Supplies project parts]
Types of Relationships
• Concerned only with cardinality of relationship
[Diagram, Chen ER notation: Truck Assigned Employee (one-to-one); Project Assigned Employee (one-to-many); Project Assigned Employee (many-to-many)]
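One common way to carry these cardinalities into a relational schema, sketched with Python's `sqlite3` module (tables, columns, and rows are hypothetical): a one-to-many relationship becomes a foreign key on the "many" side, and a one-to-one relationship becomes a foreign key declared `UNIQUE`. Many-to-many relationships need a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit each statement
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE project (proj_no INTEGER PRIMARY KEY);
-- One-to-many: each employee row names at most one project,
-- while one project can appear in many employee rows.
CREATE TABLE employee (
    ssn     TEXT PRIMARY KEY,
    proj_no INTEGER REFERENCES project (proj_no)
);
-- One-to-one: UNIQUE stops one employee from being assigned two trucks,
-- and each truck row names only one employee.
CREATE TABLE truck (
    truck_no INTEGER PRIMARY KEY,
    ssn      TEXT UNIQUE REFERENCES employee (ssn)
);
""")
conn.execute("INSERT INTO project VALUES (1)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [("111", 1), ("222", 1)])
conn.execute("INSERT INTO truck VALUES (10, '111')")

def accepted(sql, params):
    """Return True if the DBMS accepts the statement, False on a constraint error."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

second_truck_same_driver = accepted("INSERT INTO truck VALUES (?, ?)", (11, "111"))
truck_for_other_driver = accepted("INSERT INTO truck VALUES (?, ?)", (12, "222"))
on_project_1 = conn.execute("SELECT COUNT(*) FROM employee WHERE proj_no = 1").fetchone()[0]
print(second_truck_same_driver, truck_for_other_driver, on_project_1)  # False True 2
```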
Other Notations
[Diagram, “Crow’s Foot” notation: Truck Assigned Employee; Project Assigned Employee; Project Assigned Employee]
Other Notations
[Diagram, IDEF1X notation: Truck Assigned Employee; Project Assigned Employee; Project Assigned Employee]
More Complex Relationships
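[Diagram: ternary relationship Evaluation among Employee, Project, and Manager, with cardinalities 1/n/n, 1/1/1, n/n/1]
[Diagram: Employee Assigned Project, annotated 4(2-10) and 1; attributes shown: SSN, Project, Date]
[Diagram: recursive relationship on Employee, read as Manages (1) and Is Managed By (n)]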
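The recursive Manages relationship can be stored as a foreign key from the employee table back to itself and read in either direction with a self-join. A minimal sketch assuming Python's `sqlite3` module, with hypothetical columns and rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE employee (
    ssn         TEXT PRIMARY KEY,
    last        TEXT NOT NULL,
    manager_ssn TEXT REFERENCES employee (ssn)  -- recursive 1:n Manages relationship
)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("100", "Applegate", None),   # top manager
    ("200", "Duncan", "100"),
    ("300", "Smith", "100"),
    ("400", "Jones", "200"),
])

# "Manages" direction: whom does Applegate manage?
manages = [last for (last,) in conn.execute("""
    SELECT e.last FROM employee e JOIN employee m ON e.manager_ssn = m.ssn
    WHERE m.last = 'Applegate' ORDER BY e.last""")]

# "Is Managed By" direction: who manages Jones?
managed_by = conn.execute("""
    SELECT m.last FROM employee e JOIN employee m ON e.manager_ssn = m.ssn
    WHERE e.last = 'Jones'""").fetchone()[0]

print(manages, managed_by)  # ['Duncan', 'Smith'] Duncan
```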
Weak Entities
• Owe their existence entirely to another entity
[Diagram: weak entity Order-line related to Order by Contains; attributes shown: Invoice #, Part#, Rep#, Quantity, Invoice#]
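A minimal sketch of a weak entity in a relational schema, assuming Python's `sqlite3` module. The assumption here is that Part# distinguishes lines within one order, so an order line's primary key combines the owner's Invoice# with Part#; `ON DELETE CASCADE` makes the lines disappear with their order. Table and column names and the rows are hypothetical (`orders` avoids the SQL keyword `ORDER`).

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE orders (
    invoice_no INTEGER PRIMARY KEY,
    rep_no     INTEGER
);
-- Weak entity: its key borrows the owner's invoice number,
-- and a line cannot exist without its order.
CREATE TABLE order_line (
    invoice_no INTEGER NOT NULL REFERENCES orders (invoice_no) ON DELETE CASCADE,
    part_no    INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (invoice_no, part_no)
);
""")
conn.execute("INSERT INTO orders VALUES (1001, 7)")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)", [(1001, 55, 2), (1001, 56, 1)])

def count_lines():
    return conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]

try:
    conn.execute("INSERT INTO order_line VALUES (9999, 55, 1)")  # no such order
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

lines_before = count_lines()
conn.execute("DELETE FROM orders WHERE invoice_no = 1001")  # deleting the owner...
lines_after = count_lines()                                  # ...removes its lines too
print(orphan_accepted, lines_before, lines_after)  # False 2 0
```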
Supertype and Subtype Entities
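[Diagram: supertype Employee with subtypes Clerk, Sales-rep, and Other (an Employee Is one of these); the relationships Sold (with Invoice) and Manages are also shown]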
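One common relational mapping for supertypes and subtypes, sketched with Python's `sqlite3` module: the supertype table holds shared attributes plus a discriminator, and each subtype table shares the supertype's key. The subtype attributes (`typing_wpm`, `commission_rate`) and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: attributes shared by every employee, plus a discriminator.
CREATE TABLE employee (
    ssn  TEXT PRIMARY KEY,
    last TEXT NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('clerk', 'sales_rep', 'other'))
);
-- Subtypes: reuse the supertype's key and hold only their own attributes.
CREATE TABLE clerk (
    ssn        TEXT PRIMARY KEY REFERENCES employee (ssn),
    typing_wpm INTEGER
);
CREATE TABLE sales_rep (
    ssn             TEXT PRIMARY KEY REFERENCES employee (ssn),
    commission_rate REAL
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("111", "Smith", "clerk"), ("222", "Jones", "sales_rep"),
                  ("333", "Wynar", "other")])
conn.execute("INSERT INTO clerk VALUES ('111', 60)")
conn.execute("INSERT INTO sales_rep VALUES ('222', 0.05)")

# Reassemble a full sales-rep record: supertype attributes + subtype attributes.
reps = conn.execute("""
    SELECT e.last, s.commission_rate
    FROM employee e JOIN sales_rep s ON s.ssn = e.ssn""").fetchall()

# A subtype row cannot exist without its supertype row.
try:
    conn.execute("INSERT INTO sales_rep VALUES ('999', 0.1)")
    orphan_ok = True
except sqlite3.IntegrityError:
    orphan_ok = False

print(reps, orphan_ok)  # [('Jones', 0.05)] False
```

This sketch does not force `kind` to agree with the subtype table a row appears in; a trigger or an application-level check would be needed for that.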
Many to Many Relationships
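[Diagram: Employee Is Assigned Project, resolved into the entity Project Assignment: Employee (SSN) and Project (Proj#) are each linked by Assigned to Project Assignment (SSN, Proj#, Hours)]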
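The associative entity translates into a table whose primary key combines the two foreign keys and which carries the relationship's own attribute, Hours. A minimal sketch assuming Python's `sqlite3` module, with hypothetical project names and rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, last TEXT NOT NULL);
CREATE TABLE project  (proj_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Associative entity: one row per (employee, project) pairing, carrying Hours.
CREATE TABLE project_assignment (
    ssn     TEXT    NOT NULL REFERENCES employee (ssn),
    proj_no INTEGER NOT NULL REFERENCES project (proj_no),
    hours   REAL    NOT NULL,
    PRIMARY KEY (ssn, proj_no)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [("111", "Smith"), ("222", "Jones")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(1, "Catalog"), (2, "Website")])
conn.executemany("INSERT INTO project_assignment VALUES (?, ?, ?)",
                 [("111", 1, 10.0), ("111", 2, 5.0), ("222", 1, 20.0)])

# Many-to-many in both directions: Smith works on two projects,
# and the Catalog project has two employees.
hours_per_employee = conn.execute("""
    SELECT e.last, SUM(pa.hours)
    FROM employee e JOIN project_assignment pa ON pa.ssn = e.ssn
    GROUP BY e.ssn ORDER BY e.last""").fetchall()
staff_on_catalog = [last for (last,) in conn.execute("""
    SELECT e.last
    FROM project p
    JOIN project_assignment pa ON pa.proj_no = p.proj_no
    JOIN employee e            ON e.ssn = pa.ssn
    WHERE p.name = 'Catalog' ORDER BY e.last""")]

print(hours_per_employee)  # [('Jones', 20.0), ('Smith', 15.0)]
print(staff_on_catalog)    # ['Jones', 'Smith']
```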
Discussion
• Why use DBMS for web-based system development?
• Why Not use IR systems?
• Can you use both?
• Other Questions?
Lecture Overview
• Review
  – MediaStreams
• Databases and Database Design
• Database Life Cycle
• ER Diagrams
• Database Design
• Discussion
• Next Time/Readings
Next Time
• Database Design – Normalization and SQL
• Readings
  – Hoffer/McFadden “Logical Database Design and the Relational Model”