
Organizing Data and Information - University of Houston ... of Data.pdf · data is organized in tables in a database system. ... human resources, ... Organizing Data and Information


Organization of Data and Information

Organization of Data
• Hierarchy of Data
• Data Entities, Attributes, and Keys

The Traditional Approach vs. Database Approach
• The Traditional Flat File Approach
• The Database Approach

Database Design
Database Models
Data Manipulation in a Relational Database
Data Modeling
Database Trends


Organization of Data

Data is organized so that users can easily access and manipulate it. Traditionally, data was stored in files. More recently, data has been organized in tables in a database system. This chapter describes various aspects of data organization.

Hierarchy of Data

Data is generally organized in a hierarchy that begins with the smallest piece of data used by computers (a bit) and progresses through the hierarchy to a database. The figure below shows the hierarchy of data.

Bit: The smallest piece of data, a 0 or a 1.
Byte: Eight bits make a byte. A character (a, A, p, P, 1, 9, $, #) is represented by a byte.
Field: One or more characters make a field.
Record: One or more fields make a record.
File: One or more records make a file or table.
Database: One or more files or tables make a database.
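
The hierarchy above can be traced in a few lines of Python. This is only an illustrative sketch: the employee values and the variable names are hypothetical, and plain Python lists and dictionaries stand in for real files and databases.

```python
# Bit -> byte: eight bits make one byte, and one byte represents one character.
bits = "01000001"                 # eight bits
byte_value = int(bits, 2)         # the byte read as a number
character = chr(byte_value)       # the character this byte encodes in ASCII

# Characters -> field: one or more characters make a field.
last_name = "".join(["S", "m", "i", "t", "h"])

# Fields -> record: one or more fields make a record (hypothetical employee).
record = {"employee_number": "1001", "last_name": last_name, "first_name": "Ann"}

# Records -> file (table): one or more records make a file.
employee_file = [
    record,
    {"employee_number": "1002", "last_name": "Jones", "first_name": "Raj"},
]

# Files -> database: one or more files or tables make a database.
database = {"employee": employee_file, "department": [{"dept_number": "598"}]}
```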


Data Entities, Attributes, and Keys

Entity
An entity is a generalized class of people, places, or things (objects) for which data is collected, stored, and maintained. Examples are student, course, department, employee, customer, inventory, and order.

Attribute (Field)
An attribute is a characteristic of an entity. For example, Employee #, Last Name, First Name, Hire Date, and Dept. Number are attributes of the entity Employee. Attributes are also called fields. A particular value of an attribute or field is called a data item.

Key
A key is a field or a set of fields in a record that is used to identify a record. One or more records may have the same key.

Primary Key
A primary key is a field or a set of fields in a record that uniquely identifies a record in a file or table. No other record can have the same data values for those fields. It is used to retrieve, insert, update, and delete values for a record.

Entity: Order

Order Number   Order Date   Item Number   Quantity   Amount
1001           02/15/2000   1543          4          15.40
1002           03/20/2000   1630          2          50.50
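
As a minimal sketch of how a primary key behaves, the Order entity above can be loaded into an in-memory SQLite database using Python's standard sqlite3 module. The table name `orders` (ORDER is a reserved word in SQL), the column names, and the third, duplicate row are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_number INTEGER PRIMARY KEY,  -- uniquely identifies each record
        order_date   TEXT,
        item_number  INTEGER,
        quantity     INTEGER,
        amount       REAL
    )
""")

# The two records shown in the Order table.
rows = [
    (1001, "02/15/2000", 1543, 4, 15.40),
    (1002, "03/20/2000", 1630, 2, 50.50),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)

# A hypothetical record that reuses order number 1001 is rejected,
# because no two records may share the same primary key value.
try:
    conn.execute("INSERT INTO orders VALUES (1001, '04/01/2000', 1700, 1, 9.99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The primary key is also what is used to retrieve a single record.
order_1002 = conn.execute(
    "SELECT item_number, quantity FROM orders WHERE order_number = ?", (1002,)
).fetchone()
```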


The Traditional Approach vs. Database Approach

The Traditional Flat File Approach

In the traditional approach, data is stored in files, and application programs are written to manipulate data in one or more files. Thus accounting, finance, manufacturing, human resources, and marketing each developed their own systems and data files within the same organization. This approach to data management, in which separate data files are created and stored for each application program, is called the traditional approach.

The traditional approach therefore requires duplication of data to meet the requirements of the various functional areas of a business. For example, a customer's name and address may be duplicated in two or more files. When a user updates a customer's information in one data file, the change is not reflected in the other files.

Program-Data Dependence: Each program has its own data file, and data for one application is not compatible with data for another application.
Data Redundancy: Duplication of data in separate files is known as data redundancy. Data redundancy conflicts with data integrity.
Data Integrity: The degree to which the data in any file is accurate. It follows from the control or elimination of data redundancy. Keeping the customer record in one file can improve data integrity.
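
The conflict between redundancy and integrity can be shown with a small sketch. The two "files" here are plain Python dictionaries, and the customer ID, name, and addresses are hypothetical.

```python
# Two application programs each keep their own copy of the customer data.
billing_file = {"C100": {"name": "Ann Lee", "address": "12 Oak St"}}
shipping_file = {"C100": {"name": "Ann Lee", "address": "12 Oak St"}}

# A user updates the address through the billing application only.
billing_file["C100"]["address"] = "98 Elm Ave"

# The shipping file still holds the old address, so the copies now disagree.
files_agree = billing_file["C100"]["address"] == shipping_file["C100"]["address"]
stale_address = shipping_file["C100"]["address"]
```

Keeping a single customer record that both applications read would remove the second copy and, with it, the chance for the copies to drift apart.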


Figure: Traditional Approach to Data Management


The Database Approach

In this approach, all data is stored as a collection called a database, which is shared by all applications. A database system includes not only the data but also software called the database management system (DBMS).

Database management system (DBMS): A collection of programs that serves as an interface between the database and its users. When an application program calls for a data item, the DBMS finds it in the database and presents it to the application program. A DBMS is simply the software that permits an organization to centralize data, manage it efficiently, and give application programs access to the stored data.

The database approach offers significant advantages over the traditional file approach. These are illustrated below.


Advantages of Database Approach


Database Design

A database should be designed to store all data relevant to the business and to provide quick access and easy modification. Building a database requires two types of design: a logical design and a physical design.

Logical Design: The logical design of a database is an abstract model of how the data should be structured and arranged to meet an organization’s information needs. It involves identifying relationships among the different data items (entities) and grouping them in an orderly fashion.
Physical Design: The physical design of a database shows how the logical database is actually arranged on a direct-access storage device.

Figures: Entity-Relationship Diagram (Logical Design); Physical Design


Database Models

A conventional DBMS uses one of three principal logical database models for keeping track of entities, attributes, and relationships:
• Hierarchical
• Network
• Relational

Hierarchical Model

In this model, data is organized in a top-down, or inverted tree, structure. To the user, each record looks like an organization chart with one top-level segment called the root. An upper segment is connected logically to a lower segment in a parent-child relationship. Data is accessed logically by going through the appropriate “generation” of parents to get to the desired data element, and there is only one access path. The figure below shows a hierarchical structure that might be used for a human resources database. In a hierarchical database, the data is physically linked by a series of pointers that form chains of related data segments. The most common hierarchical DBMS is IBM’s IMS (Information Management System).
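
A rough sketch of the parent-child structure follows. It is not how IMS is implemented; the `Segment` class, the `access_path` helper, and the human resources segment names are all hypothetical, chosen only to show that each segment has exactly one parent and therefore exactly one access path from the root.

```python
class Segment:
    """One segment in a hierarchical structure, linked to a single parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # pointer to the one parent (None for the root)
        self.children = []        # pointers to child segments

    def add_child(self, name):
        child = Segment(name, parent=self)
        self.children.append(child)
        return child


def access_path(segment):
    """Return the single path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment.name)
        segment = segment.parent
    return list(reversed(path))


root = Segment("Employee")
compensation = root.add_child("Compensation")
benefits = root.add_child("Benefits")
salary_history = compensation.add_child("Salary History")

path_to_salary = access_path(salary_history)
```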


Network Model

The network data model is a variation of the hierarchical data model. Whereas hierarchical structures depict one-to-many relationships, network structures depict many-to-many relationships. In other words, a parent can have multiple children, and a child can have more than one parent. A typical student-course network model is illustrated below. Because one student can take more than one course, and one course can be taken by more than one student, storing the data requires duplicated pointers to define the relationships. As the number of pointers increases, database operations become more complicated.
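
The student-course relationship can be sketched with pointers held in both directions. The student names, course names, and the `enroll` helper are hypothetical; Python sets stand in for the pointer chains a network DBMS would maintain.

```python
# Each student points to its courses, and each course points back to its students.
students = {"Ann": set(), "Bob": set()}
courses = {"Databases": set(), "Networks": set()}


def enroll(student, course):
    """Record one enrollment as a pair of pointers, one in each direction."""
    students[student].add(course)
    courses[course].add(student)


enroll("Ann", "Databases")
enroll("Ann", "Networks")
enroll("Bob", "Databases")

# Every enrollment costs two pointers, so the pointer count grows quickly.
pointer_count = sum(len(c) for c in students.values()) + sum(
    len(s) for s in courses.values()
)
```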


Relational Model

In the relational model, all data elements are placed in simple two-dimensional tables called relations, which are the logical equivalent of files. Each relation represents an entity (from the entity-relationship diagram), and the columns of the relation represent the attributes of that entity.

Primary Key: In each relation or table, a field (or set of fields) that uniquely identifies each row in the table.
Foreign Key: A relationship between two entities is created by duplicating the primary key of one relation in another relation, where it is called a foreign key.
Tuple: In the relational model, each row of a relation is called a record or tuple.
Domain: The allowable values of an attribute of a relation are called the domain of the attribute. For example, the domain of the attribute sex of an employee is male and female.
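
The terms above map onto SQL constraints, as the following sketch with Python's sqlite3 module suggests. The table and column names, the department name, and the sample employees are assumptions for illustration; the domain is approximated with a CHECK constraint, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("""
    CREATE TABLE department (
        dept_number INTEGER PRIMARY KEY,
        dept_name   TEXT
    )
""")
conn.execute("""
    CREATE TABLE employee (
        employee_number INTEGER PRIMARY KEY,
        last_name       TEXT,
        sex             TEXT CHECK (sex IN ('male', 'female')),          -- domain
        dept_number     INTEGER REFERENCES department (dept_number)     -- foreign key
    )
""")

conn.execute("INSERT INTO department VALUES (598, 'Accounting')")
conn.execute("INSERT INTO employee VALUES (1, 'Lee', 'female', 598)")  # one tuple


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Department 999 does not exist, so the foreign key is violated.
bad_foreign_key = rejected("INSERT INTO employee VALUES (2, 'Diaz', 'male', 999)")
# 'unknown' is outside the declared domain of the sex attribute.
bad_domain = rejected("INSERT INTO employee VALUES (3, 'Chen', 'unknown', 598)")
```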


Data Manipulation in a Relational Database

Once data is placed in a relational database, users can make inquiries and analyze the data. There are three basic data manipulation processes: selection, projection, and join.

Selection
Selection involves selecting certain rows of data in a relation according to certain criteria.
Example: SELECT * FROM Project WHERE Project_Number = 226

Projection
Projection involves selecting data values from certain columns of a relation.
Example: SELECT Dept_Number, Dept_Name FROM Department WHERE Dept_Number = 598

Join
Joining involves combining two or more relations.
Example: SELECT Project_Number, Project_Name, Project.Dept_Number, Dept_Name FROM Project, Department WHERE Project.Dept_Number = Department.Dept_Number
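
The three operations can be tried against an in-memory SQLite database, as in the sketch below. The schema and sample rows are assumptions: the source does not list the columns of either table, so Dept_Name is assumed to live in the Department relation (the projection is therefore run against Department), and Dept_Number is qualified in the join because it appears in both relations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (
        Dept_Number INTEGER PRIMARY KEY,
        Dept_Name   TEXT
    );
    CREATE TABLE Project (
        Project_Number INTEGER PRIMARY KEY,
        Project_Name   TEXT,
        Dept_Number    INTEGER REFERENCES Department (Dept_Number)
    );
    -- Hypothetical sample rows
    INSERT INTO Department VALUES (598, 'Accounting'), (257, 'Marketing');
    INSERT INTO Project VALUES (226, 'Sales Manual', 257), (155, 'Payroll', 598);
""")

# Selection: pick the rows that satisfy a criterion.
selection = conn.execute(
    "SELECT * FROM Project WHERE Project_Number = 226"
).fetchall()

# Projection: pick certain columns of a relation.
projection = conn.execute(
    "SELECT Dept_Number, Dept_Name FROM Department WHERE Dept_Number = 598"
).fetchall()

# Join: combine two relations on their common attribute.
join = conn.execute("""
    SELECT Project_Number, Project_Name, Project.Dept_Number, Dept_Name
    FROM Project, Department
    WHERE Project.Dept_Number = Department.Dept_Number
    ORDER BY Project_Number
""").fetchall()
```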


Figure: Joining Two or More Relations


Data Modeling Tools

To create a database, one must go through design exercises. The design process identifies relationships among data elements and the most efficient ways of grouping them. It also identifies redundant data elements. Database designers document the logical data model with an entity-relationship diagram.

Entity-Relationship (ER) Diagram
In this diagram, rectangles represent entities and diamonds represent the relationships between entities. The attributes of an entity are represented by ovals. The diagram defines one-to-many or many-to-many relationships between the entities. ER diagrams are used to develop the relations, and the relationships among them, in a relational database.

Figure: Entity-relationship diagram (labels: Relationship, Entities, Attributes)


Database Trends

Continuing developments in information technology and its business applications have resulted in the evolution of several major types of databases:
• Data Warehouse
• Distributed Database
• Object-Oriented Database

Data Warehouse

A data warehouse stores data from current and previous years that has been extracted from the various operational databases (TPS) of an organization. It is a central source of data that has been screened, edited, standardized, and integrated into a single database. It is designed specifically to support management decision making, such as business analysis, market research, and decision support; it is not intended to meet the needs of transaction processing systems. It is common for a data warehouse to contain 5 to 10 years of current and historical data, and its size can reach hundreds of gigabytes.

Data Marts: When a data warehouse is organized for one department or function, it is called a data mart. For example, the current and historical financial information of a company may be kept in a data mart.
Data Mining: In data mining, the data in a data warehouse is processed to identify key factors and trends in historical patterns of business activity. The results can help managers make decisions about strategic changes in business operations to gain competitive advantage in the marketplace.
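
A toy version of the extract, standardize, and integrate steps is sketched below. Everything in it is hypothetical: the two source systems, their field names, and the amounts. The yearly totals at the end are only the kind of simple summary a data mining step might start from, not a data mining algorithm.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems with different conventions.
sales_system = [
    {"date": "1999-11-02", "dept": "SALES", "amount": 100.0},
    {"date": "2000-02-15", "dept": "SALES", "amount": 250.0},
]
finance_system = [
    {"year": 2000, "department": "Finance", "amount_cents": 5050},
]


def standardize_sales(row):
    return {
        "year": int(row["date"][:4]),
        "department": row["dept"].lower(),
        "amount": row["amount"],
    }


def standardize_finance(row):
    return {
        "year": row["year"],
        "department": row["department"].lower(),
        "amount": row["amount_cents"] / 100,
    }


# Integrate the standardized rows into a single warehouse.
warehouse = [standardize_sales(r) for r in sales_system] + [
    standardize_finance(r) for r in finance_system
]

# A data mart is the slice of the warehouse kept for one department.
finance_mart = [r for r in warehouse if r["department"] == "finance"]

# A simple historical summary: total amount per year.
totals_by_year = defaultdict(float)
for row in warehouse:
    totals_by_year[row["year"]] += row["amount"]
```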


Figure: Data Warehousing Process


Distributed Database

A distributed database spreads the actual data across several smaller databases in multiple geographical locations that are connected via a network. This is normally done for an enterprise that has nationwide or worldwide business locations. Instead of keeping all data in one location or replicating all data at every office, data is distributed according to need. Distributed databases give corporations more flexibility in how databases are organized and used. Local offices can create, manage, and use their own databases, and people at other offices can access and share the local databases.

For example, a nationwide bank may have a southern regional office in Dallas and branches in major cities in the South. In addition to keeping the account information of its own branch, the Dallas office might replicate updated account information from other major cities. The managers of the regional office then do not have to obtain information from the Houston office over the slower network; the information is readily available locally, and the replication is transparent to them.

Because data is distributed and replicated in multiple locations, distributed databases create additional challenges in maintaining data security, accuracy, timeliness, and conformance to standards.

Replicated Database: A replicated database holds a duplicate set of frequently used data. At the beginning of the day, the company sends a copy of important data to each distributed processing location. At the end of the day, the sites send the changed data back to be stored in the main database.
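
The daily replication cycle described above might be sketched as follows. The account numbers, balances, and the `update` helper are hypothetical, and the sketch deliberately ignores the case where two sites change the same account on the same day, which a real system would have to resolve.

```python
import copy

# Main database: account number -> balance (hypothetical values).
main_database = {"A-1": 500, "A-2": 1200}

# Beginning of the day: each site receives its own copy of the data.
sites = {
    "Dallas": copy.deepcopy(main_database),
    "Houston": copy.deepcopy(main_database),
}
changes = {"Dallas": {}, "Houston": {}}


def update(site, account, balance):
    """Apply a change locally and remember it for the end-of-day transfer."""
    sites[site][account] = balance
    changes[site][account] = balance


update("Dallas", "A-1", 450)
update("Houston", "A-2", 1300)

# During the day the copies can disagree: Dallas has not seen Houston's change.
dallas_view_of_a2 = sites["Dallas"]["A-2"]

# End of the day: each site sends its changed data back to the main database.
for site_changes in changes.values():
    main_database.update(site_changes)
```

The stale value held in Dallas during the day is a small instance of the accuracy and timeliness challenges mentioned above.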


Figure: Distributed Database (using computers and network devices, users can access data located at the corporate headquarters, the research and development center, the warehouse, and the retail stores)


Object-Oriented Database

Conventional database management systems were designed for homogeneous data that can be easily structured into predefined data fields and records. Current applications require databases that can store and retrieve not only numbers and characters but also drawings, images, photographs, voice, and full-motion video. Conventional DBMSs are not well suited to graphics-based and multimedia applications, because such data cannot easily be stored in rows or tables. In an object-oriented database, data is stored as objects, which contain both the data and the processing instructions (methods) needed to complete the database transaction.

Hypermedia Database: A hypermedia database stores chunks of information in the form of nodes connected by links established by the user. The nodes can contain text, graphics, sound, full-motion video, or executable computer programs. The search path is not predefined for the user; instead, the user may branch off to related information through hyperlinks.


A hypermedia database is suitable for web-based applications, which require multiple data types.
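
Both ideas, objects that carry their own methods and nodes connected by links, can be illustrated with a short sketch. The `Photograph` and `Node` classes, the in-memory `object_store`, and the sample content are hypothetical and do not represent any particular object-oriented or hypermedia DBMS.

```python
class Photograph:
    """An object that holds its data together with a method that works on it."""

    def __init__(self, title, width, height):
        self.title = title
        self.width = width
        self.height = height

    def scaled_size(self, factor):
        return (int(self.width * factor), int(self.height * factor))


class Node:
    """A hypermedia node: a chunk of content plus links to other nodes."""

    def __init__(self, content):
        self.content = content
        self.links = []

    def link_to(self, other):
        self.links.append(other)


# Objects are stored and retrieved whole, methods included.
object_store = {"photo-1": Photograph("Campus", 640, 480)}
thumbnail_size = object_store["photo-1"].scaled_size(0.25)

# Nodes may hold different data types; the user follows links between them.
home = Node("Course overview (text)")
gallery = Node(object_store["photo-1"])
home.link_to(gallery)

reachable_titles = [
    node.content.title for node in home.links if isinstance(node.content, Photograph)
]
```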
