ITEC 1010 Information and Organizations
Chapter 5
Organizing Data and Information
ITEC 1010 Information and Organizations
Data
Data A necessity for almost any enterprise to carry out its
business. Consists of raw facts, and when organized may be transformed into information
Database A collection of data organized to meet users’ needs
Database management system (DBMS) A group of programs that manipulate the database and
provide an interface between the database and the user of the database or other application programs
ITEC 1010 Information and Organizations
DBMS ‘Discussion’ (1)
A collection of programs that enables you to store, modify, and extract information from a database. There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes. The following are examples of database applications:
computerized library systems automated teller machines flight reservation systems computerized parts inventory systems
From a technical standpoint, DBMSs can differ widely. The terms
ITEC 1010 Information and Organizations
DBMS ‘Discussion’ (2)
relational, network, flat, and hierarchical all refer to the way a DBMS organizes information internally. The internal organization can affect how quickly and flexibly you can extract information.
Requests for information from a database are made in the form of a query, which is a stylized question. For example, the query
SELECT ALL WHERE NAME = "SMITH" AND AGE > 35
requests all records in which the NAME field is SMITH and the AGE
ITEC 1010 Information and Organizations
DBMS ‘Discussion’ (3)
field is greater than 35. The set of rules for constructing queries is known as a query language. Different DBMSs support different query languages, although there is a semi-standardized query language called SQL (structured query language). Sophisticated languages for managing database systems are called fourth-generation languages, or 4GLs for short.
The information from a database can be presented in a variety of formats. Most DBMSs include a report writer program that enables you to output data in the form of a report. Many DBMSs also include
ITEC 1010 Information and Organizations
DBMS ‘Discussion’ (4)
a graphics component that enables you to output information in the
form of graphs and charts.
ITEC 1010 Information and Organizations
Hierarchy of Data
Schematic
ITEC 1010 Information and Organizations
Hierarchy of data Example
DatabaseDatabase
FilesFiles
RecordsRecords
FieldsFields
Characters(bytes)
Characters(bytes)
Personel file
Department file
Payroll file
(Project database)
005-10-6321 Johns Francine 10-7-65549-77-1001 Buckley Bill 2-17-79098-40-1370 Fiske Steven 1-5-85
(Personnel file)
098-40-1370 Fiske Steven 1-5-85 598 (Record containing SSN, last name, first name, date of hire)
Fiske (Last name field)
1000100 (Letter ‘F’ in ASCII)
ITEC 1010 Information and Organizations
Terminology
Database• A collection of integrated and related files
File• A collection of related records
Record• A collection of related fields
Field• A group of characters
Character• Basic building block of information, represented by a byte
ITEC 1010 Information and Organizations
Data Entities, Attributes, and Keys
Entity• A generalized class of people, places, or things (objects) for
which data are collected, stored, and maintained• E.g., Customer, Employee
Attribute• A characteristic of an entity; something the entity is identified by• E.g., Customer name, Employee name
Keys• A field or set of fields in a record that is used to identify the
record• E.g, A field or set of fields that uniquely identifies the record
ITEC 1010 Information and Organizations
Keys and Attributes
Key field Attributes (fields)Entities(records)
Employee # Last name First name Hire date Dept. #
005-10-6321 Johns Francine 10-7-65 257
549-77-1001 Buckley Bill 2-17-79 650
098-40-1370 Fiske Steven 1-5-85 598
ITEC 1010 Information and Organizations
The Traditional Approach
The traditional approach… Separate files are created and stored for each
application program
Schematic
ITEC 1010 Information and Organizations
PayrollPayroll
programsReports
Data FilesApplicationprograms Users
InvoicingInvoicingprograms
Reports
Inventorycontrol
Inventorycontrol
programs
Reports
Managementinquiries
Managementinquiriesprograms
Reports
ITEC 1010 Information and Organizations
Drawbacks
Data redundancy Duplication of data in separate files
Lack of data integrity The degree to which the data in any one file is accurate
Program-data dependence A situation in which program and data organized for
one application are incompatible with programs and data organized differently for another application
ITEC 1010 Information and Organizations
Database Approach
The database approach… A pool of related data is shared by multiple
application programs Rather than having separate data files, each
application uses a collection of data that is either joined or related in the database
Schematic
ITEC 1010 Information and Organizations
Payroll data
Inventorydata
InvoicingData
Otherdata
Databasemanagement
system
Payrollprogram
Inventoryprogram
Invoicingprogram
Otherprograms
Reports
Reports
Reports
Reports
Database Interface Applications programs Users
ITEC 1010 Information and Organizations
Advantages
Improved strategic use of corporate data Reduced data redundancy Improved data integrity Easier modification and updating Data and program independence Better access to data and information Standardization of data access A framework for program development Better overall protection of the data Shared data and information resources
ITEC 1010 Information and Organizations
Disadvantages
Relatively high cost of purchasing and operating a DBMS in a mainframe operating environment
Increased cost of specialized staff Increased vulnerability
ITEC 1010 Information and Organizations
Data Modeling and Database Models (1)
Planned data redundancy A way of organizing data in which the logical
database design is altered so that certain data entities are combined
Summary totals are carried in the data records rather than calculated from elemental data
Some data attributes are repeated in more than one data entity to improve database performance
ITEC 1010 Information and Organizations
Data Modeling and Database Models (2)
Data model A map or diagram of entities and their relationships
Enterprise data modeling Data modeling done at the level of the entire
organization
Entity-relationship (ER) diagrams A data model that uses basic graphical symbols to
show the organization of and relationships between data
ITEC 1010 Information and Organizations
Example:Entity Relationship (ER) Diagram for a Customer
Ordering Database
Schematic
ITEC 1010 Information and Organizations
OrderCustomer Product
1:N one-to-manyrelationship
Last name Colour
Identificationnumber
NameFirstname
Identificationnumber
Attributes
Entities
1 N
ITEC 1010 Information and Organizations
Hierarchical Database Model
Hierarchical database model A data model in which data are organized in a
top-down, or inverted tree structure
Schematic
ITEC 1010 Information and Organizations
Department C
Employee1
Employee2
Employee3
Employee4
Employee5
Employee6
Department B
Project 1
Department A
ITEC 1010 Information and Organizations
Network Data Model
Network data model• An expansion of the hierarchical database model
with an owner-member relationship in which a member may have many owners
Project 1 Project 2
Department A Department B Department C
ITEC 1010 Information and Organizations
Relational Data Model
Relational data model All data elements are placed in two-
dimensional tables, called relations, that are the logical equivalent of files
Schematic
ITEC 1010 Information and Organizations
Project Number Description Dept. Number
155 Payroll 257
498 Widgets 632
226 Sales manager 598
Dept. Number Dept. Name Manager SSN
257 Accounting 421-55-99993
632Manufacturin
g765-00-3192
598 Marketing 098-40-1370
SSN Last Name First Name Hire Date Dept. Number
005-10-6321 Johns Francine 10-7-65 257
549-77-1001 Buckley Bill 2-17-79 650
098-40-1370 Fiske Steven 1-5-85 598
Data Table 1: Project Table Data Table 2: Department Table
Data Table 3: Manager Table
ITEC 1010 Information and Organizations
Relational Database Terminology
Selecting Data manipulation that eliminates rows according to
certain criteria Projecting
Data manipulation that eliminates columns in a table Joining
Data manipulation that combines two or more tables Linked
Relating tables in a relational database together
ITEC 1010 Information and Organizations
Linking Data Tables to Answer an Inquiry
Schematic
ITEC 1010 Information and Organizations
Project Number Description Dept. Number
155 Payroll 257
498 Widgets 632
226 Sales manager 598
Dept. Number Dept. Name Manager SSN
257 Accounting 421-55-99993
632 Manufacturing 765-00-3192
598 Marketing 098-40-1370
SSN Last Name First Name Hire Date Dept. Number
005-10-6321 Johns Francine 10-7-65 257
549-77-1001 Buckley Bill 2-17-79 650
098-40-1370 Fiske Steven 1-5-85 598
ITEC 1010 Information and Organizations
Building and Modifying a Relational Database
Using Microsoft Access
Screen snap
ITEC 1010 Information and Organizations
ITEC 1010 Information and Organizations
Schemas and Subschemas
Schema A description of the entire database
Subschema A file that contains a description of a subset of
the database and identifies which users can perform modifications on the data items in that subset
Schematic
ITEC 1010 Information and Organizations
DBMS
Schema
SubschemaB
SubschemaA
SubschemaC
User1
User2
User3
User4
User5
ITEC 1010 Information and Organizations
Schema ‘Discussion’
Pronounced skee-ma, the structure of a database system, described
in a formal language supported by the database management
system (DBMS). In a relational database, the schema defines the
tables, the fields in each table, and the relationships between fields
and tables.
Schemas are generally stored in a data dictionary. Although a
schema is defined in text database language, the term is often used
to refer to a graphical depiction of the database structure.
ITEC 1010 Information and Organizations
Data Definition Language
Data Definition Language (DDL) A collection of instructions and commands
used to define and describe data and data relationships in a specific database
Schematic
ITEC 1010 Information and Organizations
SCHEMA DESCRIPTIONSCHEMA NAME IS XXXXAUTHOR XXXXDATE XXXXFILE DESCRIPTION
FILE NAME IS XXXX ASSIGN XXXXFILE NAME IS XXXXASSIGN XXXX
AREA DESCRIPTIONAREA NAME IS XXXX
RECORD DESCRIPTIONRECORD NAME ISXXXXRECORD ID IS XXXXLOCATION MODE ISXXXXWITHIN XXX AREA FROM XXXX THRU XXXX
SET DESCRIPTIONSET NAME IS XXXXORDER IS XXXXMODE IS XXXXMEMBER IS XXXX...
ITEC 1010 Information and Organizations
Data Dictionary
Data Dictionary A detailed description of all data used in the
database
Schematic
ITEC 1010 Information and Organizations
NORTHWESTERN MANUFACTURING
PREPARED BY: D. BORDWELLDATE: 04 AUGUSTAPPROVED BY: J. EDWARDS DATE: 13 OCTOBERVERSION: 3.1PAGE: 1 OF 1
DATA ELEMENT NAME: PARTNODESCRIPTION: INVENTORY PART NUMEROTHER NAMES: PTNOVALUE RANGE: 100 TO 5000DATA TYPE: NUMERICPOSITIONS: 4 POSITIONS OR COLUMNS
ITEC 1010 Information and Organizations
Data Dictionary Features
Provide a standard definition of terms and data elements
Assist programmers in designing and writing programs
Simplify database modification Reduce data redundancy Increase data reliability Faster program development Easier modification of data and information
ITEC 1010 Information and Organizations
Logical and PhysicalAccess Paths
Logical access path (LAP) Application requires information from the
DBMS
Physical access path (PAP) DBMS accesses a storage device to retrieve
data
Schematic
ITEC 1010 Information and Organizations
Dataon
storagedevices
DBMS
Managementinquiries
Othersoftware
Applicationprograms
Physical access path (PAP)
Logical access path (LAP)
ITEC 1010 Information and Organizations
Manipulating Data
Concurrency control A method of dealing with a situation in which two or
more people need to access the same record in a database at the same time
Data manipulation language (DML) The commands that are used to manipulate the data in
a database
Structured query language (SQL) A standardized data manipulation language
ITEC 1010 Information and Organizations
Structured Query Language (SQL)
“Invented” at IBM’s Almaden Research Centre (San Jose, CA) in the 1970s
E.g.,
Select all (“*”) columns from the EMPLOYEE table in which the JOB_CLASSIFICATION field is equal to “C2”
SELECT * FROM EMPLOYEE WHEREJOB_CLASSIFICATION = “C2”
ITEC 1010 Information and Organizations
SQL ‘Discussion’ (1)
Abbreviation of structured query language, and pronounced either see-kwell or as separate letters. SQL is a standardized query language for requesting information from a database. The original version called SEQUEL (structured English query language) was designed by an IBM research center in 1974 and 1975. SQL was first introduced as a commercial database system in 1979 by Oracle Corporation.
Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes.
ITEC 1010 Information and Organizations
SQL ‘Discussion’ (2)
Increasingly, however, SQL is being supported by PC database systems because it supports distributed databases (databases that are spread out over several computer systems). This enables several users on a local-area network to access the same database simultaneously.
Although there are different dialects of SQL, it is nevertheless the closest thing to a standard query language that currently exists. In 1986, ANSI approved a rudimentary version of SQL as the official standard, but most versions of SQL since then have included many
ITEC 1010 Information and Organizations
SQL ‘Discussion’ (3)
extensions to the ANSI standard. In 1991, ANSI updated the
standard. The new standard is known as SAG SQL.
ITEC 1010 Information and Organizations
Database Output
Screen snap
ITEC 1010 Information and Organizations
ITEC 1010 Information and Organizations
Popular Database Management Systems for End Users
Microsoft Access 98 Lotus Approach 98 Inprise (formerly Borland) dBASE DBMS Selection Criteria
• Database size• Number of concurrent users• Performance• Integration• Features• The vendor• Cost
ITEC 1010 Information and Organizations
Distributed Databases
Distributed database… A database in which the actual data may be
spread across several smaller databases connected via telecommunications devices
‘Pretty’ picture
ITEC 1010 Information and Organizations
ITEC 1010 Information and Organizations
Data Warehouse
Data warehouse A relational database management system designed
specifically to support management decision making Current evolution of Decision Support Systems
(DSSs)
Data mart A subset of a data warehouse for small and medium-
size businesses or departments within larger companies Schematic
ITEC 1010 Information and Organizations
Relationaldatabases
Hierarchicaldatabases
Networkdatabases
Flat files
Spreadsheets
Dataextraction
process
Query andanalysis
tools
Datawharehouse
Datacleanupprocess
End user access
ITEC 1010 Information and Organizations
Designing a Customer Data Warehouse
Sharply define your goals and objectives before you build the warehouse
Choose the software that best fits your goals Determine who/what should be in the database Develop a plan Measure results
ITEC 1010 Information and Organizations
Data Mining Applications
Data mining The automated discovery of patterns and relationships
in a data warehouse
Data mining applications• Market segmentation
• Customer queries
• Fraud detection
• Direct marketing
• Market basket analysis
• Trend analysis
ITEC 1010 Information and Organizations
On-Line AnalyticalProcessing (OLAP)
On-line analytic processing (OLAP)• Access to multidimensional databases providing
managerially useful display techniques• Now used to store and deliver data warehouse
information
Data warehouse and OLAP• Provides top-down, query-driven analysis
Data mining• Provides bottom-up, discovery-driven analysis
ITEC 1010 Information and Organizations
Open DatabaseConnectivity (ODBC)
Open database connectivity (ODBC)• A set of standards that ensures software written to
comply with these standards can be used with any ODBC-compliant database
Schematic
ITEC 1010 Information and Organizations
dBASE
ACCESSdatabase
Lotus 1-2-3spreadsheet
Paradoxdatabase
ODBC Import
ODBC Link
ODBC Export
ITEC 1010 Information and Organizations
Object-Relational Database Management Systems (ORDBMS)
Object-relational database management system (ORDBMS)• A DBMS capable of manipulating audio, video, and graphical data.
Hypertext• Users can search and manipulate alphanumeric data in an
unstructured way
Hypermedia• Allows businesses to search and manipulate multimedia forms of
data
Spatial data technology• Use of an object-relational database to store and access data
according to the location it describes and to permit spatial queries and analysis
ITEC 1010 Information and Organizations
End of Chapter 5
Chapter 6