Data Modeling and Database Design
Database Systems: Architecture and Components
Data
• Data: distinct pieces of information that you store for future reference
• Data can exist in a variety of forms:
1. as numbers or text on pieces of paper,
2. as bits and bytes stored in electronic memory,
3. or as facts stored in a person's mind.
4. etc
What is the difference between data and information?
Data vs. Information
• Data: raw facts (employee names, hours worked, etc.) that represent real-world things. Data gain value when relationships are defined between them: rules and relationships organize data into valuable information.
• Information: a collection of facts organized so that they have additional value beyond the facts themselves. Information is value-added data.
• Turning data into information is a process: a set of logically related tasks performed to achieve a defined outcome.
• Knowledge is awareness and understanding of a set of information and of how it supports a specific task.
Data vs. Information
Valuable information: the result of data processing
Managing Data
Difficulties in Managing Data
Amount of data increases exponentially.
Data are scattered and collected by many individuals using various methods and devices.
Data come from many sources – any problems here?
Data security, quality and integrity are critical – why?
Difficulties in Managing Data (continued)
An ever-increasing amount of data needs to be considered in making organizational decisions.
The Data Deluge
Chapter 1 – Database Systems: Architecture and Components 8
Terminology
• Data
• Information
• Metadata
Data Management
1. Creation of data
2. Retrieval of data
3. Update or modification of data
4. Deletion of data
For that, data must be accessed and, for the ease of access, data must be organized.
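The four operations above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customers table, its columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database; the "customers" table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# 1. Creation of data
conn.execute("INSERT INTO customers (name, phone) VALUES (?, ?)", ("Ada", "555-0100"))
conn.execute("INSERT INTO customers (name, phone) VALUES (?, ?)", ("Grace", "555-0101"))

# 2. Retrieval of data
after_insert = conn.execute("SELECT name, phone FROM customers ORDER BY id").fetchall()

# 3. Update or modification of data
conn.execute("UPDATE customers SET phone = ? WHERE name = ?", ("555-0199", "Ada"))

# 4. Deletion of data
conn.execute("DELETE FROM customers WHERE name = ?", ("Grace",))

remaining = conn.execute("SELECT name, phone FROM customers ORDER BY id").fetchall()
print(after_insert)
print(remaining)
```

Each operation names the rows it needs by value (for example, by name), which is only practical because the data are organized into a table with known columns.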
What Is a Database?
• Databases in general are sets of data (information) arranged for easy access. A database doesn’t have to be on a computer (is a Rolodex a database?).
• Databases are good for tracking and reporting on most things in business, e.g. invoices, inventory, customers.
• Examples: use a database to store and retrieve customers' telephone numbers, customer info, order history, etc.
Database: An Organized Collection of Data
Examples: electronic spreadsheet, filing cabinet, database in a computer
History of Data Management
Timeline (1950 to 2000): file systems, hierarchical DBMS, network DBMS, relational DBMS, object-oriented DBMS
The Hierarchy of Data: data is usually organized in a hierarchy
Limitations of File-Processing Systems
• Lack of Data Integrity
Data integrity (data values are correct, consistent, complete, and current) is often violated in isolated environments.
• Lack of Standards
Organizations find it hard to enforce standards for naming data items as well as for accessing, updating, and protecting data.
• Lack of Flexibility/Maintainability
File-processing systems are not amenable to structural changes in data and are therefore dependent upon a programmer who can either write or modify program code.
Limitations of File-Processing Systems (continued)
The limitations of file-processing systems are due to:
• Lack of Data Integration
Data are separated and isolated in a file-processing environment.
• Lack of Program-Data Independence
The structure of each file is embedded in the application programs.
The Traditional Database Approach: flat files to keep data of a specific kind
So, What Is Desirable?
• Integrated data
– Not data in isolation to be integrated by the application program/programmer
• Data Independence
– Application program(s) immune to changes in storage structure and access strategy
– Independent user views of data
Relational Database vs Flat Database Files
• Flat-file database: puts all information in “one large table”. Acceptable for a small database. Consequences:
– Leads to redundant data.
– Potential for data corruption is high.
• Relational database: divides data into two or more tables and then relates the tables. For large databases, this makes using the data faster, easier, and more flexible.
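A minimal sketch of the contrast, using Python's sqlite3 module. All table names, column names, and rows here are hypothetical, chosen only to show how redundant copies in a flat table can drift apart while a related table keeps a single copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Flat design (hypothetical): one large table, so the customer's phone
# number is repeated on every order row.
conn.execute(
    "CREATE TABLE orders_flat "
    "(order_id INTEGER, customer_name TEXT, customer_phone TEXT, item TEXT)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, "Ada", "555-0100", "keyboard"), (2, "Ada", "555-0100", "monitor")],
)
# Updating only one of the redundant copies leaves the copies in disagreement.
conn.execute("UPDATE orders_flat SET customer_phone = '555-0199' WHERE order_id = 1")
flat_phones = conn.execute(
    "SELECT DISTINCT customer_phone FROM orders_flat "
    "WHERE customer_name = 'Ada' ORDER BY customer_phone"
).fetchall()

# Relational design (hypothetical): customer data stored once, orders refer to it.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), item TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada', '555-0100')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, "keyboard"), (2, 1, "monitor")])
# A single update is seen by every order through the join.
conn.execute("UPDATE customers SET phone = '555-0199' WHERE customer_id = 1")
relational_phones = conn.execute(
    "SELECT DISTINCT c.phone FROM orders o JOIN customers c ON c.customer_id = o.customer_id"
).fetchall()

print(flat_phones)        # two conflicting phone numbers for the same customer
print(relational_phones)  # one phone number
```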
Database Approach: a pool of the same data shared by multiple applications
The database approach involves a combination of hardware and software
The Database Approach
• Database management system (DBMS) provides all users with access to all the data.
• DBMSs minimize the following problems:
– Data redundancy: The same data are stored in many places.
– Data isolation: Applications cannot access data associated with other applications.
– Data inconsistency: Various copies of the data do not agree.
Database Approach (continued)
• DBMSs maximize the following properties:
– Data security: Keeping the organization’s data safe from theft, modification, and/or destruction.
– Data integrity: Data must meet constraints (e.g., student grade point averages cannot be negative).
– Data independence: Applications and data are not linked to each other, so different applications are able to access the same data.
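The grade point average example can be sketched as a declared constraint that the DBMS itself enforces. This is an illustration using Python's sqlite3 module; the students table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK clause states the integrity rule once,
# inside the database, instead of in every application program.
conn.execute("CREATE TABLE students (name TEXT NOT NULL, gpa REAL CHECK (gpa >= 0))")

conn.execute("INSERT INTO students VALUES ('Ada', 3.7)")  # satisfies the constraint

try:
    conn.execute("INSERT INTO students VALUES ('Bob', -1.0)")  # violates it
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(rejected, count)
```

Because the rule lives in the table definition, every application that writes to the table is held to it.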
Database Management Systems
Advantages of the Database Approach
More on Advantages of the Database Approach
Disadvantages of the Database Approach
Data Modeling and Database Models
• Database design must reflect the enterprise’s business processes
• When building the database, consider:
– Content: What data should be collected?
– Access: What data should be given to what users?
– Logical structure: How will the data be organized to make sense to a particular user?
– Physical organization: Where will the data actually be located?
Data Models: the relational data model is the basis for any relational database, but it is not the only model
• A data model defines how records are related, which affects how users can access the data.
• Existing data models:
– Hierarchical models
– Network models
– Relational models: the most popular and most widely used nowadays
The Hierarchical (Tree) Model is used for flat-file database design
• Early databases were based on a hierarchical system, much like the Windows file system.
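As a rough sketch of the tree idea: every record has exactly one parent, and a record is reached by walking down from the root, much like a folder path. The Node class, the find_path helper, and the sample records below are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One record in the hierarchy; each child belongs to a single parent."""
    name: str
    children: list = field(default_factory=list)  # list of Node


def find_path(node, target, path=()):
    """Depth-first search from the root; returns the path of names to target."""
    path = path + (node.name,)
    if node.name == target:
        return path
    for child in node.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None


# Hypothetical hierarchy, analogous to folders and subfolders.
root = Node("Company", [
    Node("Sales", [Node("Invoices")]),
    Node("Warehouse", [Node("Inventory")]),
])

path_to_inventory = find_path(root, "Inventory")
print(path_to_inventory)
```

Access always starts at the root, so a record that logically belongs under two parents has no natural place in this structure.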
Hierarchical Databases
More on Database Types: Network
• Database Models:
2. Network:
• A designer needs to set a predetermined structure: where to store what
• Implements the owner/member model
• One has to be very familiar with the database
Network Models
More on Database Types : Relational
• Database Models:
3. Relational
• Addresses limitations of other models
• Each table represents an entity of its own, related to other entities (other tables)
• Independent of the application that uses the data
Entity-Relationship (ER) Diagram: graphical method to show organization and relationships between data
History of Data Management
In the 1970s, the Standards Planning and Requirements Committee (SPARC) of the American National Standards Institute (ANSI) proposed what came to be known as the ANSI/SPARC three-schema architecture: conceptual, internal and external schema.
Figure 1.2 The ANSI/SPARC three-schema architecture
Figure labels: external schemas (individual user views); conceptual schema (global view); internal schema (storage view); stored database
Three Perspectives of Metadata in a Database
Conceptual Schema
• Core of the architecture
• Represents the global view of the structure of the entire database for a community of users
• Captures data specification (metadata)
• Describes all data items and relationships between data together with integrity constraints
• Separates data from the program (or views from the physical storage structure)
• Technology independent
Internal Schema
• Describes the physical structure of the stored data (e.g., how the data is actually laid out on storage devices)
• Describes the mechanism used to implement access strategies (e.g., indexes, hashed addresses, etc.)
• Technology dependent
• Concerned with the efficiency of data storage and access mechanisms
External Schema
• Represents different user views, each describing portions of the database
• Technology independent
• Views are generated exclusively by logical references
Physical and Logical Data Independence
• Physical Data Independence
Definition: External views unaffected by changes to the internal structure
How?: Introduction of conceptual schema between the external views and the internal (physical) schema
Physical and Logical Data Independence (continued)
• Logical Data Independence
Definition: External views unaffected by design changes (growth or restructuring) in conceptual schema
How?: External views generated exclusively through logical reference to elements in the conceptual schema
Consequence: External views unaffected by changes to other external views
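Both kinds of independence can be approximated in a small experiment with Python's sqlite3 module. This is a sketch under stated assumptions: the employees table stands in for the conceptual schema, the staff_names view for an external view, and an index for the internal schema. All names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level (hypothetical base table).
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO employees (name, hours) VALUES (?, ?)",
    [("Ada", 40.0), ("Grace", 35.5)],
)

# External level: a user view defined only by logical reference to the table.
conn.execute("CREATE VIEW staff_names AS SELECT name FROM employees")


def external_view():
    """What a user of the external view sees."""
    return conn.execute("SELECT name FROM staff_names ORDER BY name").fetchall()


before = external_view()

# Internal change: add an index (an access strategy). The view is unaffected,
# which illustrates physical data independence.
conn.execute("CREATE INDEX idx_employees_name ON employees (name)")
after_physical = external_view()

# Conceptual change: the schema grows by one column. The view is still
# unaffected, which illustrates logical data independence.
conn.execute("ALTER TABLE employees ADD COLUMN department TEXT")
after_logical = external_view()

print(before == after_physical == after_logical)
```

The view names only the column it needs, so it does not notice the added index or the new column.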
What is a Database System?
• A self-describing collection of integrated records
Self-describing
The structure of the database (metadata) is recorded within the database system – not in the application programs.
Integrated
The responsibility for 'integrating' data items as needed is assumed by the DBMS instead of the programmer.
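The self-describing property can be observed directly in SQLite, which keeps its own metadata in a catalog table named sqlite_master. The sketch below uses Python's sqlite3 module; the invoices table is hypothetical, and other DBMSs expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

# The database describes itself: table names are stored in the catalog,
# not in the application program.
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(invoices)")]

print(tables)
print(columns)
```

An application that has never seen this database can discover its structure by asking the database system, which is the point of recording metadata inside it.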
Characteristics of a Database System
Database
A single, integrated set of files
Database Management System (DBMS)
A collection of general-purpose software that facilitates the process of defining, constructing, and manipulating a database for various applications
Figure 1.4 An early view of a database system*
Figure labels: Application Programs 1 to 5; User-friendly Database Interrogation; Data Items A to I (Minimal/Controlled Redundancy); Management; Users
*Adapted from Richard L. Nolan, "Computer Data Bases: The Future is Now," Harvard Business Review, (September-October, 1973)
An Early View of a Database System
What is a Database Management System (DBMS)?
A DBMS is a collection of general-purpose software that facilitates the processes of defining, constructing, and manipulating a database.
• The major components of a DBMS include one or more query languages; tools for generating reports; facilities for providing security, integrity, backup and recovery; a data manipulation language for accessing the database; and a data definition language used to define the structure of data.
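Three of these components can be seen side by side in a short sketch with Python's sqlite3 module: a data definition statement, data manipulation statements, and a query, plus a rollback as a small stand-in for the recovery facility. The inventory table is hypothetical, and SQLite does not cover every component listed (it has no user-level security statements, for example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language (DDL): defines the structure of the data.
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")
conn.commit()

# Data manipulation language (DML): changes the stored data.
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.commit()

# Recovery: an uncommitted change can be undone by rolling back the transaction.
conn.execute("UPDATE inventory SET quantity = 0 WHERE item = 'widget'")
conn.rollback()

# Query language: reads the data without changing it.
quantity = conn.execute(
    "SELECT quantity FROM inventory WHERE item = 'widget'"
).fetchone()[0]
print(quantity)
```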
Figure 1.5 Components of a database system
Figure labels: Database Management System [DBMS] (software component); Query Language [SQL]; Data Manipulation Language [DML/SQL]; Data Definition Language [DDL/SQL]; Security & Recovery [DCL/SQL]; Report Generator; Access Routines; Database {contains data}; Data Dictionary {DBMS metadata}; Data Repository {data models, metadata}; Computer-aided Software Engineering Tools [CASE Tools]
Components of a Database System
Types of Database Systems
• Number of users
– Single-user: desktop database system
– Multi-user: workgroup database system, enterprise database system
• Scope
– Desktop database system
– Workgroup database system
– Enterprise database system
Some Popular DBMS:
• Excel **
• Access
• SQL Server
• Oracle
• MySQL
More on Popular RDBMS
• MS Access (small business, low cost, low security, small number of concurrent users)
• MS SQL Server (middle to large companies, good security, good integration with Windows platforms, large number of concurrent users, relatively low cost)
• Oracle (large businesses with high data storage and retrieval requirements, excellent security, scalability, performance, high cost)
• DB2 (IBM) (large businesses, usually deployed on mainframes and large-scale workstations/clusters, requires professional management, high cost, very reliable)
• Free, open-source RDBMS (MySQL, Cloudscape, and others): limited user interface tools, require highly qualified personnel, used by small to large businesses. Free? No costs?
Popular Database Management Systems
System Development Life Cycle of a Database
Phases: Strategy and Analysis, Design, Build and Document, Transition, Production
A systematic approach to database development transforms business requirements into an operational database.
Strategy and analysis: Analyze business requirements. Build models of the system. Translate the business narrative into a graphical representation of needs and rules. Confirm and refine the model with the analysts and experts.
Design: Design a system based on the model developed in the strategy and analysis phase.
Build and document: Build the prototype. Write and execute the commands to create tables and objects. Develop user documentation and manuals.
Quick Quiz
• What is the difference between data, metadata and information?
• What is the difference between sequential access and direct access? Give an example of each.
• What is data integrity, and what is the significance of a lack of data integrity?
• What is the difference between a database and a database management system?
• What is the role of data models in database design? Which data model is the most flexible?
Quick Quiz
• What is the term used to describe a collection of records? ANSWER: File.
• What term is used to describe the degree of accuracy of data? ANSWER: Data integrity.
• Which data model uses a parent-child structure? ANSWER: Hierarchical.
• Which data model is the most flexible? ANSWER: Relational.
• What is the most common data modeling technique? ANSWER: Entity-Relationship (ER) diagrams.
• True or False: Oracle is the leading provider of database systems. ANSWER: True.