2002.10.14 - SLIDE 1IS 202 – FALL 2002
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2002
http://www.sims.berkeley.edu/academics/courses/is202/f02/
SIMS 202: Information Organization and Retrieval
Lecture 13: Intro to Database Design
2002.10.14 - SLIDE 2IS 202 – FALL 2002
Lecture Overview
• Photo Project Feedback and Assignment 6 Discussion
• Review
  – Metadata And Markup
  – XML DTD Construction
  – XML For Protocols And Metadata Languages
• Databases and Database Design
• Database Life Cycle
• ER Diagrams
• Database Design
2002.10.14 - SLIDE 4IS 202 – FALL 2002
Photo Metadata Matters
"Unlike people's recollections, photographs don't change. They don't lie." — Bill Simon
2002.10.14 - SLIDE 5IS 202 – FALL 2002
Photo Project Feedback
[Bar chart: IS202 Photo Project Student Feedback. Axis scale: 0 to 30. Responses are grouped under three headings: Groups, Assignments, Learned. Bar values are not recoverable.]
• Need shared folders for facet groups
• Frustrated with consolidation process
• Need smaller facet groups
• Liked reorg into facet groups
• Offer help about group process
• Show examples of other classification schemes
• Need more discussion of assignments in class
• Need more clarity about classification practices and principles
• Assignments were too time-pressed
• Assignments need to be clearer
• Need better overview of whole project at the beginning
• Learned a lot
• Group experience was useful
• Came to understand difficulty of classifying
2002.10.14 - SLIDE 6IS 202 – FALL 2002
Where We Are Headed
• 450+ photos annotated in our consolidated metadata classification
• Searchable from SIMS web site in the Flamenco Browser
• Hopefully, some project teams will implement their applications as well
  – If not in 202, then in future SIMS projects
2002.10.14 - SLIDE 7IS 202 – FALL 2002
Consolidated Photo Browser
http://fusion.sims.berkeley.edu/photo_project/photodatabase.cfm
2002.10.14 - SLIDE 8IS 202 – FALL 2002
Flamenco Image Search
2002.10.14 - SLIDE 9IS 202 – FALL 2002
Flamenco Image Search
2002.10.14 - SLIDE 10IS 202 – FALL 2002
Assignment 6 Discussion
• Procedure for requesting additions to the consolidated classification (Monday through Wednesday only)
• Procedure for facet groups to recommend additions to the consolidated classification (Thursday through Friday)
• Procedure for the facet oversight group to decide additions to the consolidated classification (Friday through Monday)
2002.10.14 - SLIDE 11IS 202 – FALL 2002
Photo Project Name Choices
• SIMS Snapshot
• Digital Shoebox
• Photo Pigeonhole
• Pigeonhole
• ImageKey
• Picture Yourself
• Pictures on the Wall
• Distant Camera
• Memory to Spare
• Memories to Spare
2002.10.14 - SLIDE 13IS 202 – FALL 2002
SGML/XML Structure
• An SGML document consists of three parts:
  – The SGML Declaration
  – The Document Type Definition (DTD)
  – The Document Instance
• An XML document REQUIRES only the document instance, but for effective processing a DTD is very important
• XML Schema provides an alternative to DTDs for XML applications
2002.10.14 - SLIDE 14IS 202 – FALL 2002
DTD Components
• The major components of a DTD are:
  – Entity Declarations
  – Element Declarations
  – Attribute Declarations
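As a small illustration of the three declaration types, the sketch below embeds a DTD in a document's internal subset and parses it with Python's standard library. The element names (photo, caption), the attribute id, and the entity sims are made up for this example. Note that xml.etree.ElementTree is not a validating parser: it expands the entity but does not enforce the element or attribute declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the DTD sits in the internal subset of the DOCTYPE.
DOC = """<!DOCTYPE photo [
  <!ENTITY sims "School of Information Management and Systems">
  <!ELEMENT photo (caption)>
  <!ELEMENT caption (#PCDATA)>
  <!ATTLIST photo id ID #REQUIRED>
]>
<photo id="p1"><caption>Taken at the &sims;</caption></photo>
"""

# The parser reads the declarations and expands the &sims; entity reference.
root = ET.fromstring(DOC)
caption_text = root.find("caption").text
print(root.tag, root.attrib, caption_text)
```

A validating parser would additionally reject a photo element that lacked the id attribute or the caption child.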
2002.10.14 - SLIDE 16IS 202 – FALL 2002
What is a Database?
2002.10.14 - SLIDE 17IS 202 – FALL 2002
Files and Databases
• File: A collection of records or documents dealing with one organization, person, area or subject (Rowley)
  – Manual (paper) files
  – Computer files
• Database: A collection of similar records with relationships between the records (Rowley)
  – Bibliographic, statistical, business data, images, etc.
2002.10.14 - SLIDE 18IS 202 – FALL 2002
Database
• A Database is a collection of stored operational data used by the application systems of some particular enterprise (C.J. Date)
  – Paper “Databases”
    • Still contain a large portion of the world’s knowledge
  – File-Based Data Processing Systems
    • Early batch processing of (primarily) business data
  – Database Management Systems (DBMS)
2002.10.14 - SLIDE 19IS 202 – FALL 2002
Why DBMS?
• History
  – In the 50’s and 60’s, all applications were custom built for particular needs
  – File based
  – Many similar/duplicative applications dealing with collections of business data
  – Early DBMS were extensions of programming languages
  – 1970 - E.F. Codd and the Relational Model
  – 1979 - Ashton-Tate and first Microcomputer DBMS
2002.10.14 - SLIDE 20IS 202 – FALL 2002
File Based Systems
[Diagram: applications and the files each one uses. Labels: Application, File, Naughty, Nice, Just what asked for, Coal, Estimation, Delivery List, Toys, Addresses, Toys]
2002.10.14 - SLIDE 21IS 202 – FALL 2002
From File Systems to DBMS
• Problems with file processing systems
  – Inconsistent data
  – Inflexibility
  – Limited data sharing
  – Poor enforcement of standards
  – Excessive program maintenance
2002.10.14 - SLIDE 22IS 202 – FALL 2002
DBMS Benefits
• Minimal data redundancy
• Consistency of data
• Integration of data
• Sharing of data
• Ease of application development
• Uniform security, privacy, and integrity controls
• Data accessibility and responsiveness
• Data independence
• Reduced program maintenance
2002.10.14 - SLIDE 23IS 202 – FALL 2002
Terms and Concepts
• Data independence
  – Physical representation and location of data and the use of that data are separated
    • The application doesn’t need to know how or where the database has stored the data, but just how to ask for it
    • Moving a database from one DBMS to another should not have a material effect on application programs
    • Recoding, adding fields, etc. in the database should not affect applications
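A minimal sketch of data independence, assuming SQLite through Python's sqlite3 module as a stand-in DBMS. The photo table, its columns, and the photo_titles function are hypothetical. The application asks for data by name, so adding a field and an index underneath it changes nothing the application sees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO photo (title) VALUES ('South Hall')")

def photo_titles(connection):
    """Application code: asks for data by name, not by storage location."""
    rows = connection.execute("SELECT title FROM photo ORDER BY id")
    return [title for (title,) in rows]

before = photo_titles(conn)

# The database changes underneath the application:
# a new field is added and the physical access path gains an index.
conn.execute("ALTER TABLE photo ADD COLUMN photographer TEXT")
conn.execute("CREATE INDEX photo_title_idx ON photo (title)")

after = photo_titles(conn)
print(before, after)
```

The same function would break if it relied on column positions (for example SELECT * followed by indexing), which is one reason applications are usually written against named fields.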
2002.10.14 - SLIDE 24IS 202 – FALL 2002
Database Environment
[Diagram: components of the database environment: CASE Tools, Repository, DBMS, Database, Application Programs, User Interface]
2002.10.14 - SLIDE 25IS 202 – FALL 2002
Database Components
[Diagram: database components]
• DBMS
  – Design tools: Table Creation, Form Creation, Query Creation, Report Creation, Procedural language compiler (4GL)
  – Run time: Form processor, Query processor, Report Writer, Language Run time
• User Interface
• Applications (Application Programs)
• Database contains: User’s Data, Metadata, Indexes, Application Metadata
2002.10.14 - SLIDE 26IS 202 – FALL 2002
Types of Database Systems
• PC databases
• Centralized database
• Client/server databases
• Distributed databases
• Database models
2002.10.14 - SLIDE 27IS 202 – FALL 2002
PC Databases
E.g.:
• Access
• FoxPro
• dBASE
• Etc.
2002.10.14 - SLIDE 28IS 202 – FALL 2002
Centralized Databases
Central Computer
2002.10.14 - SLIDE 29IS 202 – FALL 2002
Client Server Databases
[Diagram: several Clients connected over a Network to a Database Server]
2002.10.14 - SLIDE 30IS 202 – FALL 2002
Distributed Databases
[Diagram: Homogeneous Databases: computers at Location A, Location B, and Location C]
2002.10.14 - SLIDE 31IS 202 – FALL 2002
Distributed Databases
[Diagram: Heterogeneous or Federated Databases: a Local Network with a Database Server, Clients, and a Comm Server, plus Remote Comp. nodes]
2002.10.14 - SLIDE 32IS 202 – FALL 2002
Terms and Concepts
• Database application
  – An application program (or set of related programs) that is used to perform a series of database activities on behalf of database users:
    • Create
    • Read
    • Update
    • Delete
2002.10.14 - SLIDE 33IS 202 – FALL 2002
Terms and Concepts
• Database activities:
  – Create
    • Add new data to the database
  – Read
    • Read current data from the database
  – Update
    • Update or modify current database data
  – Delete
    • Remove current data from the database
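The four activities map directly onto SQL statements. The sketch below assumes SQLite via Python's sqlite3 module; the book table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")

# Create: add new data to the database
conn.execute("INSERT INTO book (id, title) VALUES (?, ?)", (1, "New stuff"))

# Read: read current data from the database
created = conn.execute("SELECT title FROM book WHERE id = ?", (1,)).fetchone()

# Update: modify current database data
conn.execute("UPDATE book SET title = ? WHERE id = ?", ("Newer stuff", 1))
updated = conn.execute("SELECT title FROM book WHERE id = ?", (1,)).fetchone()

# Delete: remove current data from the database
conn.execute("DELETE FROM book WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]

print(created, updated, remaining)
```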
2002.10.14 - SLIDE 34IS 202 – FALL 2002
Terms and Concepts
• Enterprise
  – Organization
• Entity
  – Person, Place, Thing, Event, Concept...
• Attributes
  – Data elements (facts) about some entity
  – Also sometimes called fields or items or domains
• Data values
  – Instances of a particular attribute for a particular entity
2002.10.14 - SLIDE 35IS 202 – FALL 2002
Terms and Concepts
• Records
  – The set of values for all attributes of a particular entity
  – AKA “tuples” or “rows” in relational DBMS
• File
  – Collection of records
  – AKA “Relation” or “Table” in relational DBMS
2002.10.14 - SLIDE 36IS 202 – FALL 2002
Terms and Concepts
• Key
  – An attribute or set of attributes used to identify or locate records in a file
• Primary Key
  – An attribute or set of attributes that uniquely identifies each record in a file
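The difference between a key and a primary key can be sketched as follows, assuming SQLite via Python's sqlite3 module. The employee table and the placeholder SSN values are hypothetical. The last name is a key used to locate records and may repeat; the SSN is the primary key, so a second record with the same SSN is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        ssn  TEXT PRIMARY KEY,   -- primary key: uniquely identifies each record
        last TEXT NOT NULL       -- ordinary key: locates records, may repeat
    )
""")
conn.execute("CREATE INDEX employee_last_idx ON employee (last)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("000-00-0001", "Smith"), ("000-00-0002", "Smith")],
)

# Locating records by a non-unique key can return several rows.
smiths = conn.execute(
    "SELECT ssn FROM employee WHERE last = ? ORDER BY ssn", ("Smith",)
).fetchall()

# Reusing a primary key value violates uniqueness.
try:
    conn.execute("INSERT INTO employee VALUES (?, ?)", ("000-00-0001", "Jones"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(smiths, duplicate_rejected)
```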
2002.10.14 - SLIDE 37IS 202 – FALL 2002
Terms and Concepts
• Models
  – (1) Levels or views of the Database
    • Conceptual, logical, physical
  – (2) DBMS types
    • Relational, Hierarchic, Network, Object-Oriented, Object-Relational
2002.10.14 - SLIDE 38IS 202 – FALL 2002
Models (1)
[Diagram: the conceptual requirements of Applications 1 through 4 feed a single Conceptual Model, which leads to the Logical Model and the Internal Model; each application also has its own External Model]
2002.10.14 - SLIDE 39IS 202 – FALL 2002
Data Models(2): History
• Hierarchical Model (1960’s and 1970’s)
  – Similar to data structures in programming languages

[Diagram: Books (id, title) with Publisher, Subjects, and Authors (first, last)]
2002.10.14 - SLIDE 40IS 202 – FALL 2002
Data Models(2): History
• Network Model (1970’s)
  – Provides for single entries of data and navigational “links” through chains of data.

[Diagram: linked record types: Subjects, Books, Authors, Publishers]
2002.10.14 - SLIDE 41IS 202 – FALL 2002
Data Models(2): History
• Relational Model (1980’s)
  – Provides a conceptually simple model for data as relations (typically considered “tables”) with all data visible

Book ID   Title           pubid   Author id
1         Introductio     2       1
2         The history     4       2
3         New stuff ab    3       3
4         Another title   2       4
5         And yet more    1       5

pubid   pubname
1       Harper
2       Addison
3       Oxford
4       Que

Authorid   Author name
1          Smith
2          Wynar
3          Jones
4          Duncan
5          Applegate

Subid   Subject
1       cataloging
2       history
3       stuff

Book ID   Subid
1         2
2         1
3         3
4         2
4         3
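The example relations above can be loaded into a relational DBMS and recombined with joins. The sketch below assumes SQLite via Python's sqlite3 module; the table and column names are chosen for this illustration, and the titles are kept truncated exactly as they appear on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (pubid INTEGER PRIMARY KEY, pubname TEXT);
    CREATE TABLE author (authorid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (subid INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE book (
        bookid INTEGER PRIMARY KEY,
        title TEXT,
        pubid INTEGER REFERENCES publisher (pubid),
        authorid INTEGER REFERENCES author (authorid));
    CREATE TABLE book_subject (
        bookid INTEGER REFERENCES book (bookid),
        subid INTEGER REFERENCES subject (subid),
        PRIMARY KEY (bookid, subid));
""")
conn.executemany("INSERT INTO publisher VALUES (?, ?)",
                 [(1, "Harper"), (2, "Addison"), (3, "Oxford"), (4, "Que")])
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Smith"), (2, "Wynar"), (3, "Jones"),
                  (4, "Duncan"), (5, "Applegate")])
conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [(1, "cataloging"), (2, "history"), (3, "stuff")])
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)",
                 [(1, "Introductio", 2, 1), (2, "The history", 4, 2),
                  (3, "New stuff ab", 3, 3), (4, "Another title", 2, 4),
                  (5, "And yet more", 1, 5)])
conn.executemany("INSERT INTO book_subject VALUES (?, ?)",
                 [(1, 2), (2, 1), (3, 3), (4, 2), (4, 3)])

# Reassemble everything known about book 4 by joining on the shared ids.
rows = conn.execute("""
    SELECT b.title, a.name, p.pubname, s.subject
    FROM book AS b
    JOIN author AS a ON a.authorid = b.authorid
    JOIN publisher AS p ON p.pubid = b.pubid
    JOIN book_subject AS bs ON bs.bookid = b.bookid
    JOIN subject AS s ON s.subid = bs.subid
    WHERE b.bookid = 4
    ORDER BY s.subject
""").fetchall()
for row in rows:
    print(row)
```

Book 4 has two subjects, so the join returns two rows that differ only in the subject column.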
2002.10.14 - SLIDE 42IS 202 – FALL 2002
Data Models(2): History
• Object Oriented Data Model (1990’s)
  – Encapsulates data and operations as “Objects”

[Diagram: Books (id, title) with Publisher, Subjects, and Authors (first, last)]
2002.10.14 - SLIDE 43IS 202 – FALL 2002
Data Models(2): History
• Object-Relational Model (1990’s)
  – Combines the well-known properties of the Relational Model with such OO features as:
    • User-defined datatypes
    • User-defined functions
    • Inheritance and sub-classing
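To give a flavor of one of these features, the sketch below registers a user-defined function and calls it from SQL. SQLite is not an object-relational DBMS, so this is only an approximation through Python's sqlite3 binding; the initials function and the author table are hypothetical.

```python
import sqlite3

def initials(first, last):
    """Application-supplied logic that will be callable from SQL."""
    return f"{first[0]}.{last[0]}."

conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name "initials" with 2 arguments.
conn.create_function("initials", 2, initials)

conn.execute("CREATE TABLE author (first TEXT, last TEXT)")
conn.execute("INSERT INTO author VALUES ('Jane', 'Smith')")

result = conn.execute("SELECT initials(first, last) FROM author").fetchone()[0]
print(result)
```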
2002.10.14 - SLIDE 45IS 202 – FALL 2002
Database System Life Cycle
[Diagram: the six phases of the database system life cycle]
1. Design
2. Physical Creation
3. Conversion
4. Integration
5. Operations
6. Growth, Change, & Maintenance
2002.10.14 - SLIDE 46IS 202 – FALL 2002
Design
• Determination of the needs of the organization
• Development of the Conceptual Model of the database
  – Typically using Entity-Relationship diagramming techniques
• Construction of a Data Dictionary
• Development of the Logical Model
2002.10.14 - SLIDE 47IS 202 – FALL 2002
Physical Creation
• Development of the Physical Model of the Database
  – Data formats and types
  – Determination of indexes, etc.
• Load a prototype database and test
• Determine and implement security, privacy and access controls
• Determine and implement integrity constraints
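A minimal sketch of these physical-creation steps, assuming SQLite via Python's sqlite3 module: column types, an index, and integrity constraints are declared, then a tiny prototype is loaded and tested. The employee and truck tables, the age rule, and the placeholder values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        ssn  TEXT PRIMARY KEY,
        last TEXT NOT NULL,
        age  INTEGER CHECK (age >= 0)
    );
    CREATE TABLE truck (
        truck_no INTEGER PRIMARY KEY,
        ssn      TEXT UNIQUE REFERENCES employee (ssn)
    );
    CREATE INDEX employee_last_idx ON employee (last);
""")

def rejected(sql, params):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False

# Load a prototype record, then test the constraints against bad data.
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", ("000-00-0001", "Smith", 40))
negative_age = rejected("INSERT INTO employee VALUES (?, ?, ?)",
                        ("000-00-0002", "Jones", -1))
unknown_driver = rejected("INSERT INTO truck VALUES (?, ?)", (1, "999-99-9999"))
valid_truck = not rejected("INSERT INTO truck VALUES (?, ?)", (1, "000-00-0001"))
print(negative_age, unknown_driver, valid_truck)
```

Security, privacy, and access controls are not shown here; SQLite has no user accounts, so those steps depend on the DBMS actually chosen.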
2002.10.14 - SLIDE 48IS 202 – FALL 2002
Conversion
• Convert existing data sets and applications to use the new database
  – May need programs, conversion utilities to convert old data to new formats
2002.10.14 - SLIDE 49IS 202 – FALL 2002
Integration
• Overlaps with Phase 3
• Integration of converted applications and new applications into the new database
2002.10.14 - SLIDE 50IS 202 – FALL 2002
Operations
• All applications run full-scale
• Privacy, security, access control must be in place
• Recovery and Backup procedures must be established and used
2002.10.14 - SLIDE 51IS 202 – FALL 2002
Growth, Change, and Maintenance
• Change is a way of life
  – Applications, data requirements, reports, etc. will all change as new needs and requirements are found
  – The database and applications will need to be modified to meet the needs of changes
2002.10.14 - SLIDE 52IS 202 – FALL 2002
Another View of the Life Cycle
[Diagram: the same six phases in an alternate layout: 1 Design, 2 Physical Creation, 3 Conversion, 4 Integration, 5 Operations, 6 Growth, Change]
2002.10.14 - SLIDE 54IS 202 – FALL 2002
Database Design Process
[Diagram: the conceptual requirements of Applications 1 through 4 feed a single Conceptual Model, which leads to the Logical Model and the Internal Model; each application also has its own External Model]
2002.10.14 - SLIDE 55IS 202 – FALL 2002
Entity
• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information
  – Persons (e.g.: customers in a business, employees, authors)
  – Things (e.g.: purchase orders, meetings, parts, companies)

[Diagram: an entity box labeled Employee]
2002.10.14 - SLIDE 56IS 202 – FALL 2002
Attributes
• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (this is the Metadata for the entities)

[Diagram: Employee entity with attributes Name (First, Middle, Last), SSN, Age, Birthdate, Projects]
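One way to read the Employee diagram is sketched below as Python dataclasses. The interpretation is an assumption, since the diagram's notation did not survive extraction: Name is treated as a composite attribute, Projects as multivalued, and Age as derived from Birthdate. All sample values are made up.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:
    """Composite attribute: one attribute made of several parts."""
    first: str
    middle: str
    last: str

@dataclass
class Employee:
    ssn: str                      # identifying attribute
    name: Name                    # composite attribute
    birthdate: date               # stored attribute
    projects: list[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from birthdate rather than stored."""
        years = today.year - self.birthdate.year
        if (today.month, today.day) < (self.birthdate.month, self.birthdate.day):
            years -= 1  # birthday has not occurred yet this year
        return years

emp = Employee("000-00-0001", Name("Ada", "M", "Smith"), date(1970, 10, 15),
               ["Photo browser"])
age_day_before = emp.age(date(2002, 10, 14))
age_on_birthday = emp.age(date(2002, 10, 15))
print(age_day_before, age_on_birthday)
```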
2002.10.14 - SLIDE 57IS 202 – FALL 2002
Relationships
• Relationships are the associations between entities
• They can involve one or more entities and belong to particular relationship types
2002.10.14 - SLIDE 58IS 202 – FALL 2002
Relationships
[Diagram: Student Attends Class; Supplier, Part, and Project joined by the relationship Supplies project parts]
2002.10.14 - SLIDE 59IS 202 – FALL 2002
Types of Relationships
• Concerned only with cardinality of relationship
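[Diagram, Chen ER notation: Employee Assigned Truck (one-to-one, labeled 1 and 1); Employee Assigned Project (one-to-many, labeled 1 and n); Employee Assigned Project (many-to-many, labeled m and n)]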
2002.10.14 - SLIDE 60IS 202 – FALL 2002
Other Notations
[Diagram, “Crow’s Foot” notation: Employee Assigned Truck; Employee Assigned Project; Employee Assigned Project]
2002.10.14 - SLIDE 61IS 202 – FALL 2002
Other Notations
[Diagram, IDEF1X notation: Employee Assigned Truck; Employee Assigned Project; Employee Assigned Project]
2002.10.14 - SLIDE 62IS 202 – FALL 2002
More Complex Relationships
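[Diagrams: (1) an Evaluation relationship among Employee, Project, and Manager, with cardinality labels 1/n/n, 1/1/1, and n/n/1; (2) Employee Assigned Project, with labels 4(2-10) and 1 and attributes SSN, Project, Date; (3) a Manages relationship from Employee back to Employee, with roles Manages (1) and Is Managed By (n)]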
2002.10.14 - SLIDE 63IS 202 – FALL 2002
Weak Entities
• Owe existence entirely to another entity

[Diagram: Order Contains Order-line. Attribute labels: Invoice #, Rep#, Part#, Quantity, Invoice#]
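A weak entity can be sketched in a relational DBMS as a table whose key includes its owner's key and whose rows disappear with the owner. The example assumes SQLite via Python's sqlite3 module. It also assumes that Invoice # and Rep# belong to Order and that Invoice#, Part#, and Quantity belong to Order-line, which the extracted diagram does not confirm; the table names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
    CREATE TABLE orders (
        invoice_no INTEGER PRIMARY KEY,
        rep_no     INTEGER
    );
    -- Weak entity: identified only in combination with its owning order.
    CREATE TABLE order_line (
        invoice_no INTEGER NOT NULL
                   REFERENCES orders (invoice_no) ON DELETE CASCADE,
        part_no    INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (invoice_no, part_no)
    );
""")
conn.execute("INSERT INTO orders VALUES (1001, 7)")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                 [(1001, 1, 2), (1001, 2, 5)])
lines_before = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]

# Removing the owner removes the order lines that depended on it.
conn.execute("DELETE FROM orders WHERE invoice_no = 1001")
lines_after = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
print(lines_before, lines_after)
```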
2002.10.14 - SLIDE 64IS 202 – FALL 2002
Supertype and Subtype Entities
[Diagram: Employee is the supertype; Is one of: Sales-rep, Clerk, Other. Other labels: Sold, Invoice, Manages]
2002.10.14 - SLIDE 65IS 202 – FALL 2002
Many to Many Relationships
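[Diagram: the many-to-many relationship Employee Is Assigned Project is replaced by an entity, Project Assignment (SSN, Proj#, Hours), linked by Assigned relationships to Employee (SSN) and Project (Proj#)]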
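The resolution of a many-to-many relationship into an intermediate entity can be sketched as three tables, here assuming SQLite via Python's sqlite3 module. The attribute names follow the diagram labels (SSN, Proj#, Hours); the table names and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY);
    CREATE TABLE project (proj_no INTEGER PRIMARY KEY);
    -- One row per (employee, project) pair; Hours describes the pairing itself.
    CREATE TABLE project_assignment (
        ssn     TEXT REFERENCES employee (ssn),
        proj_no INTEGER REFERENCES project (proj_no),
        hours   REAL,
        PRIMARY KEY (ssn, proj_no)
    );
""")
conn.executemany("INSERT INTO employee VALUES (?)",
                 [("000-00-0001",), ("000-00-0002",)])
conn.executemany("INSERT INTO project VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO project_assignment VALUES (?, ?, ?)",
                 [("000-00-0001", 1, 10.0),
                  ("000-00-0001", 2, 5.0),
                  ("000-00-0002", 1, 8.0)])

# One employee on many projects...
projects_for_first = [p for (p,) in conn.execute(
    "SELECT proj_no FROM project_assignment WHERE ssn = ? ORDER BY proj_no",
    ("000-00-0001",))]
# ...and one project with many employees.
staff_on_project_1 = [s for (s,) in conn.execute(
    "SELECT ssn FROM project_assignment WHERE proj_no = ? ORDER BY ssn", (1,))]
print(projects_for_first, staff_on_project_1)
```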
2002.10.14 - SLIDE 67IS 202 – FALL 2002
Database Design Process
[Diagram: the conceptual requirements of Applications 1 through 4 feed a single Conceptual Model, which leads to the Logical Model and the Internal Model; each application also has its own External Model]
2002.10.14 - SLIDE 68IS 202 – FALL 2002
Requirements Analysis
• Conceptual Requirements
  – Systems Analysis Process
    • Examine all of the information sources used in existing applications
    • Identify the characteristics of each data element
      – Numeric
      – Text
      – Date/time
      – Etc.
    • Examine the tasks carried out using the information
    • Examine results or reports created using the information
2002.10.14 - SLIDE 69IS 202 – FALL 2002
Conceptual Design
• Conceptual Model
  – Merge the collective needs of all applications
  – Determine what Entities are being used
    • Some object about which information is to be maintained
  – What are the Attributes of those entities?
    • Properties or characteristics of the entity
    • What attributes uniquely identify the entity?
  – What are the Relationships between entities?
    • How do the entities interact with each other?
2002.10.14 - SLIDE 70IS 202 – FALL 2002
Developing a Conceptual Model
• Overall view of the database that integrates all the needed information discovered during the requirements analysis
• Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details
• Can also be represented using other modeling tools (such as UML)
2002.10.14 - SLIDE 71IS 202 – FALL 2002
Logical Design
• Logical Model
  – How is each entity and relationship represented in the Data Model of the DBMS
    • Hierarchic?
    • Network?
    • Relational?
    • Object-Oriented?
2002.10.14 - SLIDE 72IS 202 – FALL 2002
Physical Design
• Internal Model
  – Choices of index file structure
  – Choices of data storage formats
  – Choices of disk layout
2002.10.14 - SLIDE 73IS 202 – FALL 2002
Database Application Design
• External Model
  – User views of the integrated database
  – Making the old (or updated) applications work with the new database design