Chapter 6: Elements of Database Systems

ACCOUNTING INFORMATION SYSTEMS: A DATABASE APPROACH by: Uday S. Murthy, Ph.D., ACA and S. Michael Groomer, Ph.D., CPA, CISA

Elements of Database Systems

Learning Objectives

After studying this chapter you should be able to:

• distinguish between the file-oriented approach and the database approach

• discuss fundamental relational database concepts such as composite and foreign keys

• specify the types of relationships that can be represented in database systems

• provide a detailed description of the relational database model

• discuss database integrity, emphasizing entity and referential integrity in particular

• explain and provide examples of validation rules in relational database systems

• discuss how views and permissions can be used to restrict access to sensitive data in relational database systems

• explain the data dictionary concept

• describe the types of database languages

• construct SQL queries to extract information from relational database systems

• discuss database backup and recovery methods

• explain concepts such as concurrency control

• explain in general terms concepts such as the object-oriented approach to developing database systems

The elements of computer-based information systems were discussed in Chapter 2. In this chapter we focus on the elements of database systems. As discussed in Chapter 1, the next generation of accounting systems will have an enterprise-wide orientation and will most likely be built on a database platform. We will first differentiate between the older file-oriented approach and the more recent database approach. Various data structure elements specific to database systems will then be discussed. The major types of database models will then be reviewed. The remainder of the chapter will focus exclusively on the relational database model, which has grown to become widely accepted as the platform of choice for robust enterprise applications.

File-oriented and database approaches contrasted

The early applications of computer technology were in automating transaction processing systems. These early applications were developed using COBOL. Each transaction processing system was created and treated independently with its own set of files and programs. There was virtually no integration across application areas. Most business data processing systems developed in the 1970s and the early 1980s employed this approach. This approach to TPS is referred to as the file-oriented approach. A more modern approach is to develop an integrated set of application systems with all data stored in a shared repository, i.e., the enterprise database. This approach is referred to as the database approach.

It is important to note that a significant number of file-oriented systems exist in many businesses today. Most of these file-oriented systems are written in COBOL and continue to receive maintenance. These systems will be around for some years to come due primarily to the significant investment businesses have made in these systems. The term "legacy systems" is used to describe these older COBOL systems. Systems administrators in many organizations have to grapple with the problems associated with interfacing these legacy systems with newer systems that operate in a database or 4GL environment. Furthermore, many of these legacy systems had to be overhauled to deal with the Y2K problem, since their "date" data structures typically allocated only two digits for the year.

The contrast between the file-oriented and database approaches is most stark in the context of custom-developed accounting information systems. However, as discussed in Chapter 3, many companies use "off-the-shelf" accounting software packages such as QuickBooks Pro, Peachtree, and Great Plains. Lower-end (less expensive) accounting packages tend to conform more to the file-oriented approach while the higher-end (more expensive) packages tend to conform more to the database approach. Thus, the discussion below of the file-oriented and database approaches is also relevant in the context of accounting software packages. Both the file-oriented and the database approaches will now be described, and the relative advantages and drawbacks of each will be discussed.

The file-oriented approach

As indicated above, the file-oriented approach involves creating a set of files, as needed, for each transaction processing application such as sales or purchases. A set of COBOL programs and data files are created to satisfy the information needs of each application. As shown in the figure on the next page, each application's files and programs are created and maintained independent of other applications.

Duplication of files across applications is one consequence of this independence. For example, in the above figure, File A is used in application programs 1 and 3. However, two instances of File A must be created: programs 1 and 3 cannot simply share File A since there is no way of allowing concurrent access to the same file in COBOL.

Another important characteristic of the file-oriented approach is that each COBOL program is required to define the data structures that will be used in the program. For example, a CUSTOMER file might have the following fields: CUSTOMER-NO, NAME, STREET-ADDRESS, CITY, STATE, and ZIP. The format of this file, including the field names and data types (e.g., text, numeric, date/time), must be explicitly defined. Such definitions constitute the "data structure" for a particular file. If you are familiar with COBOL, you might recall that every COBOL program has four divisions: the identification division, the environment division, the data division, and the procedure division. The data division is where the data structures are defined; they are then manipulated in the procedure division. Thus, in the file-oriented approach, every program defines all the data structures it uses.

Drawbacks of the file-oriented approach

The most significant drawback of the file-oriented approach is data redundancy or duplication, which is caused by the lack of sharing of data across applications. For example, a marketing application would have to create its own customer file, although a customer file already exists in the sales application. The resulting data redundancy or duplication has many undesirable consequences. Apart from merely consuming more storage space, data redundancy can easily lead to data inconsistencies. Adding new data, changing existing data, or deleting data has to be repeated for each instance of the duplicate files. For example, consider the marketing and sales application referred to above. If a customer's address change is recorded in the sales application but not the marketing application, the result is inconsistent data across the two applications with no indication of which address is the correct address. Thus, data redundancy can cause data inconsistencies, which is more problematic than simply the extra storage space consumed by duplicate data.

A second drawback of the file-oriented approach is the proliferation of files resulting from each application creating its own files as needed, which leads to data maintenance becoming significantly more problematic. With several versions of the same file in different applications, ensuring consistency of data across all applications becomes more difficult as the number of files multiplies.

A third drawback of the file-oriented approach is the length of time normally required for application development. Files needed for a new application must be created from scratch since sharing of existing files is not possible.

A fourth and rather significant disadvantage of the file-oriented approach is the lack of independence between the data structures and the application programs that access those data structures. As indicated earlier, the "data" division in the COBOL program defines the data structures of all files used in the program. These data structures have to be redefined in every COBOL program that accesses the same file. Any change in a data structure of a file has to be painstakingly effected in each of the several COBOL programs that access the file. The drawbacks of the file-oriented approach are summarized in the table below.

Drawbacks of the File-Oriented Approach

Drawback Explanation

Data redundancy

Duplication of data (files) across applications poses a data maintenance problem and can potentially cause problems of data inconsistencies. Excessive duplication also results in high data storage costs.

Proliferation of files

The task of maintaining files can become very complex as the number of applications multiplies, since each application creates its own set of files.

Lengthy application development

Inability to share data in existing files increases the time required to create new applications, since all files needed for the new application must be independently created.

Lack of data independence

Data structures and the procedures that modify the data are both defined within the same program (in the "Data" and "Procedure" divisions). Data structures and/or procedures cannot be independently modified.
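The lack of data independence can be made concrete with a small sketch. The chapter's setting is COBOL; the Python below is only an illustration of the same idea, and the record layout, field widths, and the two "programs" are hypothetical. Each program carries its own copy of the file layout, so a change to the file that reaches only one of them leaves the other silently misreading the data.

```python
# Illustration only: two hypothetical "programs" each hard-code the layout
# of the same fixed-width CUSTOMER record, much as every COBOL program
# repeats the file definition in its own data division.

# Layout assumed by the sales program: (field name, width in characters).
SALES_LAYOUT = [("CUSTOMER-NO", 5), ("NAME", 10), ("ZIP", 5)]

# The marketing program repeats the same definition independently.
MARKETING_LAYOUT = [("CUSTOMER-NO", 5), ("NAME", 10), ("ZIP", 5)]


def parse_record(line, layout):
    """Slice a fixed-width line into fields according to a layout."""
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = line[pos:pos + width].strip()
        pos += width
    return fields


def format_record(fields, layout):
    """Write fields out as one fixed-width line."""
    return "".join(fields[name].ljust(width)[:width] for name, width in layout)


# The file is restructured: ZIP grows from five to nine characters.
NEW_LAYOUT = [("CUSTOMER-NO", 5), ("NAME", 10), ("ZIP", 9)]
new_line = format_record(
    {"CUSTOMER-NO": "10001", "NAME": "ABC Corp", "ZIP": "336205500"}, NEW_LAYOUT
)

# Only the sales program is updated; marketing keeps its stale definition.
SALES_LAYOUT = NEW_LAYOUT

sales_view = parse_record(new_line, SALES_LAYOUT)
marketing_view = parse_record(new_line, MARKETING_LAYOUT)

print(sales_view["ZIP"])      # the full nine-character ZIP
print(marketing_view["ZIP"])  # only the first five characters
```

Nothing fails outright in the marketing program; it simply reads a truncated value, which is why such changes had to be carried through every program by hand.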

The Database Approach

In contrast to the file-oriented approach, the database approach centers around creating an organization-wide repository of data that all applications and all users can share. Rather than having multiple instances of the same file, each set of data is uniquely stored, as shown in the figure that follows. Note that in the database approach the term "data set" is used instead of "file." Each application interfaces with the data it needs by accessing the appropriate data sets from the organization's repository or database. The "sharing" of data in the common repository, stored on disk, is transparent to the users. Concurrent access to the same data set is handled by the database management system (DBMS). Note in the following figure that the DBMS interfaces with the operating system. The actual retrieval of the data sets stored on disks is handled by the operating system.

A database is an integrated repository of an organization's data containing a series of interrelated data sets. The data sets are designed to store data about entities such as customers, employees, and vendors, and also events such as sales, which are really relationships between entities. Specifically, the "sales" event represents a relationship between the "customers" and "finished goods inventory" data sets (i.e., finished goods inventory is sold to customers). The repository is integrated in that there is no duplication of data sets - every entity and event data set is stored just once. The data sets are interrelated in that common attributes exist between data sets to signify relationships between entities and events. Coordination among the data sets involves ensuring that updates to one data set do not result in data inconsistencies in related data sets.

The tasks of creating, updating, and managing data sets are handled by the database management system (DBMS). The definition of data sets in terms of their structure is done using DBMS facilities. Updates to these data sets are also performed using features in the DBMS. In effect, the DBMS serves as the interface between the application programs and users requesting access to data sets and the operating system which actually retrieves the data from their physical locations on a magnetic disk. When there are multiple concurrent requests to access the same data set, it is the DBMS that prioritizes and coordinates the requests. Details regarding this process of handling multiple simultaneous requests to the same data set, a process called concurrency control, will be discussed a little later. The DBMS also handles a variety of other functions, most notably backup and security.

Advantages of the Database Approach

In contrast to the file-oriented approach, the database approach has many advantages. First, data redundancy is essentially eliminated since every data set is stored only once in the repository. Multiple applications requiring access to the same data set simply share that data set, perhaps even simultaneously, as indicated above. No longer are duplicate versions of the same data set maintained for different applications (as was necessary in the file-oriented approach). This sharing of data is a key feature of the database approach. If customer names and addresses are stored in only one data set, a change in a particular customer's address would need to be made just once.

A second advantage of the database approach, related to the first advantage, is that data inconsistencies are much less likely to occur. Since customer ABC's record is stored just once in the database, there is no possibility of having different versions of ABC's name and address in different files.

A third and very significant advantage of the database approach is data independence. Recall that in the file-oriented approach data structures must be defined in each application program that accesses those data structures. In a database, data structures are defined using the DBMS independently of application programs that access the data sets. It is not the application programs but the DBMS that is used to define data structures. An application program that requires access to the customer data set does not have to define the structure of the customer data set. This independent definition permits changes to be made in the structures of data sets without having to modify each application program that accesses the affected data sets. Thus, if the "zip code" field in the customer data set needs to be changed from a five digit field to a nine digit field, this change is performed once using the DBMS. None of the application programs that access the customer data set need to be modified.
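A minimal sketch of data independence follows, using Python's built-in `sqlite3` module as a stand-in for an enterprise DBMS. The table name, column names, sample values, and the `customer_name` function are assumptions made for illustration. The structure is defined once through the DBMS, the application asks for data by name, and a later structural change is made without touching the application code.

```python
import sqlite3

# An in-memory SQLite database stands in for the organization's repository.
conn = sqlite3.connect(":memory:")

# The data structure is defined once, through the DBMS.
conn.execute(
    "CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, name TEXT, zip TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'ABC Corp', '33620')")


def customer_name(conn, customer_no):
    """Hypothetical application code: it names the column it needs
    but never defines the layout of the customers table."""
    row = conn.execute(
        "SELECT name FROM customers WHERE customer_no = ?", (customer_no,)
    ).fetchone()
    return row[0] if row else None


before = customer_name(conn, 1)

# Structural changes are made once, using the DBMS: a new column is added
# and a nine-digit ZIP is stored. (SQLite TEXT columns have no fixed width,
# so widening the field needs no further step here; other DBMS products
# would use their own ALTER TABLE syntax.)
conn.execute("ALTER TABLE customers ADD COLUMN phone TEXT")
conn.execute("UPDATE customers SET zip = '33620-5500' WHERE customer_no = 1")

after = customer_name(conn, 1)  # the application function is unchanged
print(before, after)
```

The same function works before and after the change because it depends only on the column it names, not on the physical layout of the record.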

A fourth advantage of the database approach is that sharing of data and the data independence concept permit rapid application development. New applications using data that already exists in the database can be very quickly developed. The time-consuming steps of defining the data structures and setting up files are eliminated.

The fifth and final advantage of the database approach is that the important functions of backup, control, and security are centralized. Virtually all DBMS come with backup facilities to periodically back up the entire database. Corrupted data sets can be easily restored from the backup. Facilities also exist within the DBMS software to specify access restrictions for either the entire repository, specific data sets, or specific data items within each data set. Controls over what kind of data can be entered into each data set can also be specified at the level of the data set. These controls, called integrity constraints or validation rules, will be discussed later in the chapter. In contrast to the database approach, the file-oriented approach required backup, control, and security to be performed and specified on an application-by-application basis.

The advantages of the database approach are summarized in the following table.

Advantages of the Database Approach

Advantage Explanation

Data redundancy virtually eliminated

Each data set is stored just once in the repository, thereby reducing data storage costs.

No data inconsistencies

Since each data item is stored only once, there cannot be multiple versions of that data item.

Data independence

Data structures are defined separately from the application programs. Changes can be made to data structures without having to modify application programs.

Rapid application development

Ability to share data shortens the time required to create new applications.

Centralized backup, control, and security

Backup, control, and security tasks are all handled centrally by the database management system.

Drawbacks of the database approach

In comparison to the advantages just cited, the database approach has a few drawbacks. First, within the organization, performing tasks beyond the most basic database activities can be extremely complex. Although DBMS software is becoming increasingly user-friendly, most complex tasks such as administering the database require considerable expertise.

A second drawback of the database approach is that DBMS include a number of fairly complex features for controlling the integrity of data entered into tables. Mastering the intricacies of table integrity features can be quite a daunting task. However, given their role as control and security consultants within the organization, auditors (internal and external) must familiarize themselves with the DBMS features that allow control and security to be specified. From an auditor's perspective, the complexity of DBMS might pose significant problems in addressing audit concerns of control, security, and integrity of data.

Third, data stored within the database can most easily be accessed using DBMS-specific features and utilities, such as the DBMS query language and built-in reports. Thus, for the purposes of the annual financial statement audit, the auditor must become adept at creating and running database queries to ensure that there are no material errors in the financial statements.

A fourth and final drawback of the database approach pertains to the centralization of control and security. Although centralization was listed as an advantage, it can also be a weakness since intruders need only penetrate the DBMS shield to have access to all of the organization's data. Access and control restrictions specified on data sets are done using DBMS features; those same features can be used to "turn off" the access and control restrictions unless adequate safeguards exist. Thus, for the DBMS control and security features to be reliable, there should be strong control procedures over access to the database (control procedures are discussed in detail in Chapter 10).

The disadvantages of the database approach are summarized in the following table. The advantages of the database approach outweigh the few drawbacks. With the falling cost of hardware and software, most organizations should be able to justify the investment associated with the database approach.

Disadvantages of the Database Approach

Disadvantage Explanation

Complexity of DBMS administration

The administration of large-scale database systems requires significant resources and expertise.

Data integrity using complex DBMS features

Configuring the database to ensure data integrity requires considerable expertise and intricate knowledge of DBMS features. Accountants and auditors must familiarize themselves with control and security concerns for DBMS and how these can be implemented in database environments.

Data accessible only through DBMS

Accountants and auditors must be competent in using the DBMS to access data for the purpose of generating useful information and fulfilling audit objectives.

Centralized backup, control, and security

Backup, control, and security are typically centralized, potentially making the organization vulnerable to a hacker who can break through the central security shield.

Fundamental database concepts

Having contrasted the file-oriented and database approaches, let us examine some fundamental concepts in the database approach. The data set concept discussed earlier is a crucial building block of database models and database systems. A data set is created for every entity and every event of interest that needs to be represented in the database. A primary key is a unique identifier of a record in a file. Similarly, in a database, every data set must have a unique identifier of records within the data set. This unique identifier is referred to as the primary key of the data set. When multiple fields are required to uniquely identify a record in a data set, the result is referred to as a composite key or a concatenated key. Every field in a data set other than the primary key is referred to as a non-key attribute. Some of the non-key attributes may be used to sort the file (or data set) to facilitate answering user queries.

As discussed earlier, a unique aspect of the database approach is the interrelationships between data sets. These interrelationships are defined in different ways depending on the type of database model (to be discussed in the next section). In terms of our discussion of keys, it is relevant to discuss the concept of a foreign key. A foreign key is a field in a data set that is the primary key in a related data set, referred to as the "master" data set. There are two variants of the "foreign key" concept -- a foreign key can either be part of a composite primary key or simply a non-key attribute in a data set. That is, when a data set has a composite primary key (more than one field making up the primary key), then each individual element of that composite key will usually be the primary key in another related data set. Alternatively, a foreign key can simply be a non-key attribute in a data set which happens to be a primary key in a related data set. It is important to identify foreign keys in a database because it is these keys that enable linking of data sets that are related to one another. The concept of foreign keys will be explained later in this chapter using an example. The various key types are summarized in the following table.

Keys in Database Environments

Key Explanation

Primary key

Unique identifier of records in a data set.

Composite (concatenated) key

Two or more fields taken together serve as the primary key in the data set.

Non-key attribute

Any field that is not a primary key attribute.

Foreign key

Two variants: (1) an element of a composite key in a data set which is the primary key in a related data set, (2) a non-key attribute in a data set which is the primary key in a related data set.
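The key types in the table can be tried out in a few lines. The sketch below uses Python's built-in `sqlite3` module; the table names, column names, sample rows, and the `attempt` helper are all hypothetical. It declares a primary key, a composite key, and both foreign key variants, then shows the DBMS rejecting rows that break them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE customers (
    customer_no INTEGER PRIMARY KEY,                 -- primary key
    name        TEXT NOT NULL                        -- non-key attribute
);
CREATE TABLE sales_orders (
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL
                REFERENCES customers(customer_no)    -- foreign key, variant (2)
);
CREATE TABLE items_ordered (
    order_no    INTEGER
                REFERENCES sales_orders(order_no),   -- foreign key, variant (1)
    item_no     INTEGER,   -- would reference an items table, omitted for brevity
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_no, item_no)                  -- composite key
);
INSERT INTO customers VALUES (1, 'ABC Corp');
INSERT INTO sales_orders VALUES (100, 1);
INSERT INTO items_ordered VALUES (100, 7, 3);
""")


def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "duplicate primary key": attempt("INSERT INTO customers VALUES (1, 'Copy')"),
    "unknown customer": attempt("INSERT INTO sales_orders VALUES (101, 999)"),
    "same order, new item": attempt("INSERT INTO items_ordered VALUES (100, 8, 1)"),
    "same order, same item": attempt("INSERT INTO items_ordered VALUES (100, 7, 5)"),
}
for case, outcome in results.items():
    print(f"{case}: {outcome}")
```

Only the third insert is accepted: the pair (order 100, item 8) is new, whereas the other rows either repeat an existing key value or point to a customer that does not exist in the "master" table.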

Relationships between entities and events, both of which are represented in the database by means of data sets, can be of three types, referred to as the relationship cardinality. One-to-one (1:1), one-to-many (1:M), and many-to-many (M:M) relationships are the three relationship types; the shorthand for depicting each relationship type is shown in parentheses. Consider the relationship between the "department" and "manager" entities. A 1:1 relationship between departments and managers implies that each department can have one and only one manager and each manager can manage one and only one department. Now consider the relationship between the "salespersons" and "customers" entities. A 1:M relationship between salespersons and customers means that each salesperson can have many customers but every customer is assigned to exactly one salesperson. Note that a 1:M relationship can be interpreted as an M:1 relationship when read from the opposite direction. Thus, the relationship from customers to salespersons is an M:1 relationship (many customers have one salesperson). An M:M relationship between salespersons and customers indicates that each salesperson can have many customers and each customer can work with many salespersons.

As another example of the various relationship cardinalities, consider the relationship between students and tutors. A 1:1 relationship indicates that each student is assigned exactly one tutor and each tutor is assigned exactly one student. A 1:M relationship from tutors to students indicates that each tutor has many students but every student is assigned exactly one tutor. Thus, this relationship would be read as an M:1 relationship from students to tutors. An M:M relationship between tutors and students indicates that each student can have many tutors and each tutor can have many students. Note that the "M" side of a relationship, that is the "many" side, can be interpreted as "one or more." The various relationship types are summarized in the table below.

Relationship Types

Relationship Explanation

1:1 One-to-one (e.g., one professor in one office)

1:M One-to-many (e.g., one advisor has many students)

M:M Many-to-many (e.g., a class has many students, and a student can be in many classes)
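As a rough illustration of how the three cardinalities differ, the sketch below classifies a list of observed pairings. The function name and the sample data are hypothetical. Cardinality is properly a design rule about what is permitted, so a check on sample data like this can only show which pattern the data happens to exhibit.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify observed (left, right) pairs as '1:1', '1:M', 'M:1', or 'M:M',
    read from the left entity to the right entity."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does any left value pair with more than one right value, and vice versa?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "M:M"
    if left_has_many:
        return "1:M"
    if right_has_many:
        return "M:1"
    return "1:1"


# Each tutor has several students; each student has exactly one tutor.
tutor_student = [("T1", "S1"), ("T1", "S2"), ("T2", "S3")]
# The same facts read from the opposite direction.
student_tutor = [(student, tutor) for tutor, student in tutor_student]
# One manager per department and one department per manager.
dept_manager = [("Sales", "Ann"), ("Audit", "Bob")]
# A class has many students, and a student can be in many classes.
class_student = [("C1", "S1"), ("C1", "S2"), ("C2", "S1")]

print(classify_cardinality(tutor_student))   # 1:M
print(classify_cardinality(student_tutor))   # M:1
print(classify_cardinality(dept_manager))    # 1:1
print(classify_cardinality(class_student))   # M:M
```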

Overview of database models

Given the understanding of fundamental database concepts discussed above, we now turn to a description of the dominant database model today, i.e., the relational model. Two older database models are the hierarchical and the network model -- these models were popular in the 1970s but have since been displaced by the relational model. The hierarchical and network models are rarely found in practice and are therefore not discussed in this chapter. An emerging model is the object-oriented database model which we shall briefly explore when we consider emerging concepts in the database arena.

Relational Model Overview

The relational model, first proposed by E.F. Codd, uses the concept of a "relation" to store data. A relation is simply a two-dimensional table with rows and columns. The rows, also referred to as "tuples," are the records in the data set and the columns are the fields. The terms "table" and "relation" are synonymous and are used interchangeably. Henceforth, we shall use the term "table" rather than "relation" and "row" rather than "tuple." A "customer" data set would be implemented as a two-dimensional table with the columns containing the various attributes of customers (e.g., name, address, phone number, etc.) and the rows containing the records (i.e., customers). The concept of a table is thus intuitively appealing and easy to understand.

In the relational model, then, a series of tables are constructed for storing data relevant to the situation. A relational model for a sales order and collection system is shown in the figure below. Note that one table is used to represent each entity and each event.

The resulting tables are SALES-REGIONS, CUSTOMERS, SALES-ORDERS, ITEMS-ORDERED, COLLECTIONS, ORDERS-COLLECTIONS, and ITEMS. The arrows in the figure are drawn to point out the links between tables (i.e., the common fields between tables). The single- and double-headed arrows signify "1" and "M" relationships as before. We will indicate the primary key in a table by underlining it. Obviously, there will be two (or more) fields underlined in the case of a composite primary key.

Recall that we introduced the concept of a "foreign key" earlier in the chapter. Let us revisit that concept in the context of the relational model. To repeat, a foreign key is either a non-key attribute in a table that is a primary key in a related table or an element of a composite key in a table that is a primary key in a related table. The "related table" is in effect the "master" table for that key field. In the above set of tables, the CUSTOMER-NO and REGION-NO fields in the SALES-ORDERS table represent one variant of the foreign key concept: they are non-key attributes in the SALES-ORDERS table, but each of them is a primary key in a related table. CUSTOMER-NO is the primary key in the CUSTOMERS table, and REGION-NO is the primary key in the SALES-REGIONS table. "CUSTOMERS" is considered to be the "master" table for the CUSTOMER-NO field, and "SALES-REGIONS" is considered to be the "master" table for the REGION-NO field.

Now let us turn to the second variant of foreign keys: those that are elements of composite primary keys. Note that the ORDERS-COLLECTIONS table has a composite key of RECEIPT-NO and ORDER-NO (i.e., RECEIPT-NO and ORDER-NO taken together uniquely determine the rows in the ORDERS-COLLECTIONS table). Per the definition of a foreign key, RECEIPT-NO and ORDER-NO are each considered to be foreign keys in the ORDERS-COLLECTIONS table since each of them is a primary key in a related "master" table (RECEIPT-NO is the primary key in COLLECTIONS and ORDER-NO is the primary key in SALES-ORDERS). Similarly, ORDER-NO and ITEM-NO are foreign keys in the ITEMS-ORDERED table (the "master" table for items is the ITEMS table). The relevance of the distinction between these two variants of foreign keys will become apparent a little later in the chapter when we discuss the concepts of entity integrity and referential integrity. The convention we will use to indicate foreign keys in the relational model is an asterisk (*) at the end of the foreign key field.

In the relational model, M:M relationships are represented using composite key tables. Thus, the M:M relationship between SALES-ORDERS and COLLECTIONS is implemented by creating a new ORDERS-COLLECTIONS table, which has a composite key comprising the primary keys of the SALES-ORDERS and COLLECTIONS tables, i.e., the two tables involved in the M:M relationship. That is, the ORDERS-COLLECTIONS table has a composite primary key of RECEIPT-NO, ORDER-NO, with RECEIPT-NO being the primary key of the COLLECTIONS table and ORDER-NO being the primary key of the SALES-ORDERS table. Similarly, the M:M relationship between SALES-ORDERS and ITEMS is implemented by means of the ITEMS-ORDERED table, which also has a composite key, formed by taking the primary key of SALES-ORDERS and the primary key of ITEMS (i.e., ORDER-NO, ITEM-NO). The ITEMS-ORDERED and ORDERS-COLLECTIONS composite key tables above both have non-key attributes. However, it is possible for a composite key table to have no non-key attributes at all. In that case, the composite key table is referred to as an all key relation (i.e., a table where all fields comprise the primary key).
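The composite key table for the M:M relationship between sales orders and collections can be sketched with Python's built-in `sqlite3` module. Hyphens are not valid in unquoted SQL names, so the table and column names below use underscores. The `amount_applied` column and all sample values are assumptions made for illustration, not the fields shown in the chapter's figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE sales_orders (order_no   INTEGER PRIMARY KEY);
CREATE TABLE collections  (receipt_no INTEGER PRIMARY KEY);

-- Composite key table implementing the M:M relationship.
CREATE TABLE orders_collections (
    receipt_no     INTEGER REFERENCES collections(receipt_no),
    order_no       INTEGER REFERENCES sales_orders(order_no),
    amount_applied INTEGER NOT NULL,       -- hypothetical non-key attribute
    PRIMARY KEY (receipt_no, order_no)
);

INSERT INTO sales_orders VALUES (100), (101);
INSERT INTO collections  VALUES (500), (501);
-- Receipt 500 pays toward two orders; order 100 is paid by two receipts.
INSERT INTO orders_collections VALUES (500, 100, 40), (500, 101, 60), (501, 100, 25);
""")

# One receipt can relate to many orders ...
orders_for_receipt = [
    row[0]
    for row in conn.execute(
        "SELECT order_no FROM orders_collections "
        "WHERE receipt_no = 500 ORDER BY order_no"
    )
]

# ... and one order can relate to many receipts.
receipts_for_order = [
    row[0]
    for row in conn.execute(
        "SELECT receipt_no FROM orders_collections "
        "WHERE order_no = 100 ORDER BY receipt_no"
    )
]

print(orders_for_receipt)   # [100, 101]
print(receipts_for_order)   # [500, 501]
```

Dropping the `amount_applied` column would leave only the two key fields, which is the "all key relation" case described above.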

It is important to note that relationships between tables are represented implicitly using foreign keys. By contrast, the older hierarchical and network models represented relationships explicitly using physical pointers. The process of designing a relational model, in terms of the number and structure of tables and the keys linking tables, will be discussed in the next chapter. Suffice to say for now that the process of designing a relational database model is non-trivial.

The simplicity and ease of use of the relational model, and hence its superiority over the hierarchical and network models, is evident in many ways. First, the use of foreign keys results in the representation of all relationships implicitly and not explicitly (as was the case in the hierarchical and network models). Thus, any two tables can be related as long as they have a common field. Consequently, query processing in the relational model is far simpler since users do not have to be cognizant of the physical pointers between data sets. In terms of the types of relationships that can be represented in the relational model, 1:1, 1:M and M:M relationships can all be represented. The way in which each of the relationship types is represented will be discussed in the next chapter.

Relational database management systems (RDBMS) are available for enterprise oriented operating systems such as Unix, Linux, and Windows Server 2012 as well as personal computer oriented operating systems such as Windows and Mac OS X. Oracle 12c, IBM Informix 12.1, Microsoft SQL Server, IBM DB2, and Sybase are some of the popular enterprise RDBMS that run on “industrial-strength” servers. For individual users running Windows, some of the popular RDBMS packages are Microsoft Access and dBase.

The Relational Model Explored

Although there have been recent advances in database technology focused on the object-oriented model, which we will briefly explore towards the end of this chapter, the vast majority of database oriented business information systems are built using the relational model platform. Some years ago, RDBMS were hailed for their simplicity and ease of use but assailed for their poor performance relative to DBMS built on the then prevailing models – the hierarchical and network models (which are now defunct). However, in recent years, the performance of RDBMS has improved considerably, as a result of which RDBMS have gained widespread acceptance in the marketplace. Even personal computer based RDBMS such as Microsoft Access are proving to be powerful enough for creating complex and robust database applications to meet the information needs of a variety of businesses.

We will first examine the rules to which tables must conform. RDBMS vary in terms of their compliance with these rules. In addition to these basic rules, we will also focus on various RDBMS features that have implications for control and security of the database. As accountants, you are likely to be called upon to work with and give advice on the control and security aspects of database systems. Integrity constraints, validation rules, permissions, views, and the data dictionary are some of the features of RDBMS that are relevant from a control and security standpoint.

Rules for tables

Tables in a relational database must conform to a number of rules. Each table in the database must have a unique name; no two tables can have the same name. Duplicate columns and rows are not permitted within a table; no two columns can have the same name in a table, and no two rows can have the same value in every column. The sequence of rows and columns is immaterial. However, the convention is to list the primary key field(s) at the left and the non-key fields on the right. Rows in tables are usually ordered in ascending or descending order of the primary key, although this ordering is not essential. Tables can easily be sorted on fields other than the primary key.

Every table must have a designated primary key -- a unique identifier of every row in the table. The primary key could either be one field only, or more than one field taken together. As discussed earlier, when multiple fields make up the primary key, the result is referred to as a composite key or a concatenated key. Note that a table could have more than one unique identifier, but exactly one must be chosen as the primary key. For example, a student table could store both the social security number (SSN) and a university assigned student ID (SID) number. The student table would thus have two unique identifiers (SSN and SID), but only one can be designated as the primary key.

Relationships between tables are represented using common fields between them. As discussed earlier, these common fields are foreign keys. Recall that a foreign key is either an individual element of a composite primary key or a non-key attribute in one table that is the primary key in another table. As discussed above, the "CUSTOMER-NO" field in the sales orders table in the relational model is also the primary key in the customers table and is thus a foreign key in the sales orders table. The rules for tables are summarized below.

• Table names must be unique in the database.

• Every table must have a primary key.

• Duplicate rows and duplicate columns are not allowed.

• The order of rows and columns is immaterial.
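To make these rules concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database. The students table, its column names, and the sample values are assumptions made for illustration; SID is chosen as the primary key, and SSN is kept as a second unique identifier. Note that SQLite, like many RDBMS, does not by itself insist that every table declare a primary key, so that rule remains a design discipline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical STUDENTS table: SID is the designated primary key;
# SSN is a second unique identifier that was not chosen as the key.
conn.execute("""
    CREATE TABLE students (
        sid  TEXT NOT NULL PRIMARY KEY,
        ssn  TEXT NOT NULL UNIQUE,
        name TEXT
    )
""")


def rejected(sql):
    """Return True if the database refuses to run the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.Error:
        return True


# Table names must be unique in the database.
duplicate_table = rejected("CREATE TABLE students (sid TEXT PRIMARY KEY)")

# No two columns in a table can have the same name.
duplicate_column = rejected("CREATE TABLE t (a TEXT PRIMARY KEY, a TEXT)")

# The order of columns is immaterial: the INSERT names the columns in a
# different order from the table definition and stores the same row.
conn.execute(
    "INSERT INTO students (name, ssn, sid) VALUES ('Ann', '111-22-3333', 'S001')"
)
row = conn.execute("SELECT sid, ssn, name FROM students").fetchone()
print(duplicate_table, duplicate_column, row)
```

Declaring SSN as UNIQUE rather than as a second primary key reflects the point made above: a table may have several unique identifiers, but only one is designated as the primary key.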

Entity and referential integrity

Tables must conform to a number of integrity constraints. Two key constraints are entity integrity and referential integrity. Entity integrity means that the primary key field (or fields in case of a composite key) in a table cannot be null and must be unique. This integrity constraint applies to every table in the database. What entity integrity simply means is that the primary key field must have a value -- it cannot be left blank. Furthermore, the value of every primary key in a table must be unique -- no two rows in the table can have the same primary key value. Most RDBMS are equipped with features that automatically enforce entity integrity. Thus, the RDBMS will signal an error if the user attempts to insert a new row without specifying a value for the primary key field, or if the value specified is a duplicate value.

Referential integrity means that foreign keys must either be null or match an existing value in the "master" table for the foreign key. It is important to note that foreign keys can only be null when they are non-key attributes. It is perfectly legal for a non-key attribute to be null. However, a foreign key can never be null when it is an element of a composite key, because a null value in an element of a composite key would violate entity integrity.
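Entity integrity can be illustrated with a short sketch, again using Python's sqlite3 module. The table follows the ITEMS-ORDERED composite key table discussed earlier; the qty column and the sample values are assumptions. NOT NULL is written out for each key field because SQLite, unlike most RDBMS, does not imply it for every primary key declaration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key: ORDER-NO and ITEM-NO together identify a row.
conn.execute("""
    CREATE TABLE items_ordered (
        order_no INTEGER NOT NULL,
        item_no  INTEGER NOT NULL,
        qty      INTEGER,
        PRIMARY KEY (order_no, item_no)
    )
""")
conn.execute("INSERT INTO items_ordered VALUES (1, 10, 5)")
# Same order, different item, and a null non-key attribute: all legal.
conn.execute("INSERT INTO items_ordered VALUES (1, 20, NULL)")


def violates_integrity(sql):
    """Return True if the RDBMS signals an integrity error for the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A duplicate value of the composite key is rejected.
duplicate_key = violates_integrity("INSERT INTO items_ordered VALUES (1, 10, 7)")

# A null in any element of the composite key is rejected.
null_key_part = violates_integrity("INSERT INTO items_ordered VALUES (NULL, 10, 7)")

row_count = conn.execute("SELECT COUNT(*) FROM items_ordered").fetchone()[0]
print(duplicate_key, null_key_part, row_count)
```

Only the two valid rows remain in the table; both offending inserts are refused before any data is stored.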

Referential integrity is best explained in the context of an example. Consider the relational model in the figure presented earlier in the chapter. The CUSTOMER-NO field in the SALES-ORDERS table is a foreign key (because it is a non-key attribute in the SALES-ORDERS table and a primary key in the CUSTOMERS table). What referential integrity requires is that every value of CUSTOMER-NO in the SALES-ORDERS table must exist in the CUSTOMERS table. In other words, there cannot be a customer number in the SALES-ORDERS table that does not exist in the CUSTOMERS table. Simply put, you cannot have a sales order on a non-existing customer. The CUSTOMERS table is considered the "master" table for the CUSTOMER-NO foreign key. That is, every new customer number must first appear in the CUSTOMERS table before it can appear anywhere else in the database (i.e., in other tables). Note however that referential integrity allows a foreign key field to be left blank when it is a non-key attribute. For example, a null value in the CUSTOMER-NO foreign key field in the SALES-ORDERS table might signify a cash sale (in which case it may not be necessary to keep track of the customer number). Referential integrity also applies to the REGION-NO field in the SALES-ORDERS table (every REGION-NO in SALES-ORDERS must exist in the SALES-REGIONS table). Thus, referential integrity dictates that all foreign key fields must have a corresponding value in the "master" table for the foreign key field.
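The CUSTOMER-NO example can be sketched as follows with Python's sqlite3 module. The column names are lowercase adaptations of the fields discussed above, and the sample customer and order numbers are assumptions. SQLite enforces foreign keys only after the PRAGMA shown; other RDBMS typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# CUSTOMERS is the "master" table for the CUSTOMER-NO foreign key.
conn.execute(
    "CREATE TABLE customers (customer_no INTEGER NOT NULL PRIMARY KEY, name TEXT)"
)
conn.execute("""
    CREATE TABLE sales_orders (
        order_no    INTEGER NOT NULL PRIMARY KEY,
        customer_no INTEGER REFERENCES customers (customer_no)
    )
""")

conn.execute("INSERT INTO customers VALUES (101, 'Acme')")
conn.execute("INSERT INTO sales_orders VALUES (1, 101)")   # matches a customer
conn.execute("INSERT INTO sales_orders VALUES (2, NULL)")  # e.g., a cash sale

# A sales order for a customer number that is not in CUSTOMERS is refused.
try:
    conn.execute("INSERT INTO sales_orders VALUES (3, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

orders = conn.execute(
    "SELECT order_no, customer_no FROM sales_orders ORDER BY order_no"
).fetchall()
print(orphan_rejected, orders)
```

The null foreign key is accepted because CUSTOMER-NO is a non-key attribute of the sales orders table, while the unmatched value 999 is rejected.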

Entity integrity and referential integrity are essential to ensure an error-free database. Entity integrity prevents tables from having duplicate or missing primary keys, which would prevent rows from being located and queries from being answered. The main purpose of referential integrity is to ensure the validity of foreign keys. As discussed earlier, foreign keys are how links between tables are implemented. If referential integrity is not enforced, then relationships between tables may be corrupted because of invalid foreign key values.

Data validation rules

In addition to entity and referential integrity, a number of data validation rules can be prescribed for each table in the database. The purpose of these validation rules is to prevent erroneous data from being entered into the table. Note the emphasis on the word "prevent" -- these rules are aimed at prevention rather than detection of errors. Thus, validation rules in database systems present an opportunity for accountants and auditors to propose a wide range of controls that can be built into the systems to prevent errors from creeping into the database. Let us first discuss examples of data validation rules and then how they work to prevent errors.

Validation rules can be established for individual fields within a table to restrict the data that can be entered into the field. Rules that refer to more than one field in a table can also be defined, either directly at the field level or, in some database systems, at the overall table level. Data validation rules at the field level can be specified to ensure that the value entered in the field is in a range of acceptable values. A minimum value, a maximum value, or both can be specified. For example, an "hours worked" field can have a minimum value (1), a maximum value (40), or both a minimum and a maximum value (>=1 AND <= 40). If the user attempts to enter a value of 50, the system will reject the input and display an error message.

Another type of validation rule allows valid values for a field to be specified such that data input is accepted only if it matches one of the acceptable values. A "shipping code" field may be designed to accept only one of the following values: 'S' for in-state, 'O' for out of state but within the U.S., and 'I' for international. If the user attempts to enter a value other than S, O, or I, an error message will be displayed and the erroneous data will not be accepted.

Rules can also be specified to ensure that the correct number of digits has been entered into a field. For example, a "zip code" field can be designed to accept exactly five digits, or exactly nine digits in the format 99999-9999.

Rules can also be specified to ensure valid relationships between fields. Take for example a "sales order" table with the fields "order date" and "ship date." A rule can be specified to ensure that the ship date is always on or after the order date. While some database systems allow such rules to be specified in either of the two fields, other systems refer to such multi-field validation rules as "record validation rules" (or "table level validation rules"). Validation rules in RDBMS are summarized in the following table.

Validation Rules

Range test: Greater than a minimum and/or less than a maximum value?

Validity test: One of the acceptable values for this field?

Length test: Correct number of digits entered?

Valid combinations test: Correct mathematical or logical relationship between fields in a table?
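In SQL-based systems, the four tests summarized above are commonly expressed as CHECK constraints. The following is a minimal sketch using Python's sqlite3 module; the validation_demo table, its columns, and the sample values are assumptions that simply gather the chapter's examples into one place. The five-digit test uses SQLite's GLOB operator, which other RDBMS may not offer, and dates are stored as ISO text so that they compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table combining the chapter's validation examples.
conn.execute("""
    CREATE TABLE validation_demo (
        entry_no      INTEGER NOT NULL PRIMARY KEY,
        hours_worked  INTEGER CHECK (hours_worked BETWEEN 1 AND 40),             -- range test
        shipping_code TEXT    CHECK (shipping_code IN ('S', 'O', 'I')),          -- validity test
        zip_code      TEXT    CHECK (zip_code GLOB '[0-9][0-9][0-9][0-9][0-9]'), -- length test
        order_date    TEXT,
        ship_date     TEXT,
        CHECK (ship_date >= order_date)                                          -- valid combinations test
    )
""")


def accepted(values):
    """Try to insert one row; return True if every validation rule passes."""
    try:
        conn.execute("INSERT INTO validation_demo VALUES (?, ?, ?, ?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid row":       accepted((1, 40, "S", "47405", "2024-03-01", "2024-03-04")),
    "50 hours":        accepted((2, 50, "S", "47405", "2024-03-01", "2024-03-04")),
    "code X":          accepted((3, 40, "X", "47405", "2024-03-01", "2024-03-04")),
    "4-digit zip":     accepted((4, 40, "S", "4740", "2024-03-01", "2024-03-04")),
    "ships too early": accepted((5, 40, "S", "47405", "2024-03-01", "2024-02-28")),
}
print(results)
```

Only the first row is stored. Each of the other four is refused at the moment of entry, which is the preventive character of validation rules emphasized above.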

Restricting access to tables

In addition to such validation rules that are defined for each table, access restrictions can be defined at the database level for controlling access to sensitive data such as tables containing employee salary data. Most RDBMS provide the ability to designate authorized and unauthorized users for tables. Certain tables can be made accessible to all users except those listed, or only to the listed users. Unauthorized users receive error messages when they attempt to access a table which they have not been authorized to access.

Another method of restricting access to sensitive data is to create a view. A view is a virtual table. A view appears to the user as a table but is actually a version of an existing table with some of the table's data hidden from view (hence the term "view"). For example, a view can be constructed of the "employee" table that hides the "salary" and "home phone number" fields. The rest of the data in the employee table such as the name, office phone number, email address, etc. would appear in the view. All users can be given permission to access this view, but only a few individuals would be granted access to the actual employee table. Apart from hiding certain columns, rows that contain sensitive data can also be hidden. For example, information pertaining to high level executives can be omitted from the view by excluding all rows in the "employee" table where the "rank" field is higher than 8 (assuming that all top executives have values greater than 8 in the "rank" field). Hiding of rows and columns can be combined in the same view.

To summarize, access to sensitive data can be restricted by (1) specifying which users can access the sensitive table by setting "permissions" and (2) creating views in which sensitive fields are hidden and permitting access only to the view.
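The employee view described above can be sketched as follows with Python's sqlite3 module. The table layout, the view name employee_public, and the sample rows are assumptions. SQLite has no user accounts, so the permission step is not shown; in a server RDBMS it would typically be handled by granting users access to the view but not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical EMPLOYEE table containing sensitive fields.
conn.execute("""
    CREATE TABLE employee (
        emp_no       INTEGER NOT NULL PRIMARY KEY,
        name         TEXT,
        office_phone TEXT,
        home_phone   TEXT,
        salary       REAL,
        emp_rank     INTEGER
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Lee", "555-0101", "555-0191", 52000.0, 3),
        (2, "Kim", "555-0102", "555-0192", 310000.0, 9),  # top executive
    ],
)

# The view hides two columns (salary, home phone) and all rows with rank above 8.
conn.execute("""
    CREATE VIEW employee_public AS
    SELECT emp_no, name, office_phone
    FROM employee
    WHERE emp_rank <= 8
""")

cursor = conn.execute("SELECT * FROM employee_public")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user querying employee_public sees what looks like an ordinary table, yet neither the salary column nor the executive's row is reachable through it.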

Data Dictionary

In an RDBMS, the data dictionary contains a variety of information about the contents of the database. In a sense, the data dictionary contains data about data, or "metadata." Note that the data dictionary is not a single table in the database but a collection of hidden “system” tables. The information stored in these hidden system tables comprising the data dictionary is useful both from the standpoint of application development and database maintenance and also from a control and security standpoint. Some examples of the kind of data contained within the data dictionary are the names of all tables, the columns (attributes) contained in each table along with their data formats, and the privileges held by each user authorized to access the database. By querying the data dictionary, users can locate all tables in which a particular attribute exists. Note that access to the data dictionary should be restricted to select systems designers and auditors who would find the need to refer to the data dictionary for a variety of reasons. For example, an auditor interested in determining which users are authorized to read the "customers" table could find that information by examining the data dictionary. Or if the auditor would like to know how many tables contain the "employee salary" field, the answer can be obtained from the data dictionary.
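The auditor's question about which tables contain a salary field can be sketched with Python's sqlite3 module. SQLite keeps its catalog in a system table named sqlite_master and reports column details through PRAGMA table_info; other RDBMS expose their data dictionaries under different names. The three tables below and the helper tables_containing are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("CREATE TABLE payroll (check_no INTEGER PRIMARY KEY, emp_no INTEGER, salary REAL)")
conn.execute("CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, name TEXT)")

# The catalog table lists every table defined in the database.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]


def tables_containing(column):
    """Return the names of all tables that have a column with the given name."""
    found = []
    for table in table_names:
        # Each row of table_info describes one column; index 1 is the column name.
        columns = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
        if column in columns:
            found.append(table)
    return found


salary_tables = tables_containing("salary")
print(table_names, salary_tables)
```

Nothing in the user tables themselves is read here; the answer comes entirely from the metadata the RDBMS maintains about them.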

Languages for RDBMS

There are three categories of RDBMS languages: one for defining the relational database schema, one that is used to access database tables from within conventional application programs, and one that can be used by end users to perform ad hoc queries of the database. The data definition language (DDL) is used to program the database schema. The design and setup of the database is typically performed by the database administrator (DBA), who is the individual having overall control over the database. Using the DDL, the DBA can create tables, define authorized users of each table and specify validation rules for individual tables.

The language that is used within application programs written in conventional languages such as COBOL, or more recently C and Visual Basic, is called the data manipulation language (DML). These DML statements are embedded within third generation languages such as COBOL or C, or even fourth generation languages. The purpose of the DML statements is to allow the programs to access tables in a database whenever required. The need for a separate DML embedded within conventional programs is fast diminishing as most RDBMS today have powerful programming components built into them. For example, Microsoft Access provides the ability to develop powerful applications using Visual Basic for Applications (VBA), which is available within Access.

The third category of DBMS languages is the data query language (DQL). The DQL is used by end users to perform ad hoc queries on the database. The DQL should ideally be easy to use so that end users without extensive programming experience can execute simple statements to satisfy relatively straightforward information needs. There are two broad categories of DQL - GUI oriented querying referred to as Query By Example (QBE), and command line oriented querying. The most popular command line DQL is Structured Query Language (SQL - pronounced "sequel"). SQL will be discussed in greater detail in the next section.

In GUI oriented querying, QBE, the user is presented with a shell of the table or tables containing the data that would answer the user's query. In the appropriate field, the user simply provides an example of the data he/she is looking for (hence the term "query by example"). In effect, the user specifies select conditions for fields and can also indicate which fields the result should be sorted on. For example, if the user wants all sales orders where the amount exceeded $1,000, he/she would first invoke the QBE module to view a "skeleton" of the orders table. Then, the user would tab over to the "amount" field, type in >1000, and execute the query. The skeleton table would then show the rows that met the condition specified by the user (or a message if no rows were found to satisfy the condition). As is possible in the "view" concept discussed earlier, the user can hide certain fields from the answer. The following figure shows QBE in action, to show orders exceeding $1,000, using the default query design view in Microsoft Access.

In addition to QBE and SQL, two other RDBMS tools are noteworthy. Most RDBMS include a report writer which can be used to create custom reports formatted to the user's specifications. The user simply indicates which table or view to use as the input and can determine the precise nature of the report. The fields to total, at what points to provide subtotals, the header and footer for each page, and the end of report summary information are some examples of the report characteristics that the user can control. The second useful RDBMS tool is the forms editor which can be used to create custom data input forms. Rather than using the table itself to enter data, users (especially novice users) can be provided with easy to use forms that simplify the process of entering and retrieving data. Forms can be designed to supply default values for fields and for specifying custom formats to facilitate data entry. Other than simply for data input, forms also represent the user-interface component of powerful custom applications that can be developed using the RDBMS' programming tools. Program code modules can be associated with buttons on forms such that a whole series of actions are automatically executed when the user clicks on a button after entering data into form fields. This use of forms will be discussed in the next chapter. Shown below is an example of a Microsoft Access form, used to add, update, and delete information about customers.

Structured Query Language -- SQL

As indicated above, a very popular database language for data definition, manipulation, and query is Structured Query Language (SQL). In the context of the three types of database languages discussed earlier (i.e., DDL, DML, and DQL), SQL includes statements for all three purposes. The set of "create" commands within SQL are used to define tables, i.e., as a DDL. SQL statements can be embedded into conventional programming languages (i.e., used as a DML). Finally, and in its most common use, SQL can be used as a DQL by end users seeking answers to ad hoc queries.

There are four primary operations that can be performed on tables using SQL -- the SELECT operation (to select rows from one or more tables), the INSERT operation (to insert rows into a table), the UPDATE operation (to modify one or more rows in a table), and the DELETE operation (to delete one or more rows from a table). The most commonly used SQL operation to answer ad hoc queries is the SELECT operation. The SELECT operation can simultaneously (1) create a horizontal subset of a table, i.e., selecting all rows from a table that meet a certain condition, (2) link tables using the common field between them, and (3) create a vertical subset of a table or tables, i.e., displaying only certain fields in tables.
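The four primary operations can be seen together in a brief sketch. The SQL statements are embedded in a Python host program through the standard sqlite3 module, which also illustrates SQL being used from within a conventional programming language. The items table and its sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        itemno      INTEGER NOT NULL PRIMARY KEY,
        description TEXT,
        qty_on_hand INTEGER
    )
""")

# INSERT: add rows to a table.
conn.execute("INSERT INTO items VALUES (1250, 'Widget', 40)")
conn.execute("INSERT INTO items VALUES (1300, 'Gadget', 0)")

# UPDATE: modify one or more rows that meet a condition.
conn.execute("UPDATE items SET qty_on_hand = qty_on_hand - 15 WHERE itemno = 1250")

# DELETE: remove one or more rows that meet a condition.
conn.execute("DELETE FROM items WHERE qty_on_hand = 0")

# SELECT: retrieve rows, here listing three fields of every remaining row.
remaining = conn.execute(
    "SELECT itemno, description, qty_on_hand FROM items"
).fetchall()
print(remaining)
```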

The general syntax of the SELECT operation is as follows:

SELECT <table_name1>.<field_name1>, <table_name1>.<field_name2>, <table_name2>.<field_name2>
FROM <table_name1>, <table_name2>...
WHERE <table_name1>.<common_field1> = <table_name2>.<common_field1> ....
AND <condition>
[INTO <result table>]

The "INTO" syntax in square brackets is optional -- it results in the creation of a new table to store the query results. The <condition> portion of the SQL statement (i.e., the WHERE clause) is specified using a field from a table listed in the FROM part of the statement (e.g., CUSTOMERS.BALANCE > 5000). Note that the WHERE clause is used to specify both the joins necessary to obtain the query result and the criterion or criteria to be applied. The order of joins and criteria specifications is not material. Tables needed for joins and for criteria specifications should be listed in the FROM portion of the SQL statement. Note that the above syntax is not universal across all RDBMS -- slight variations from one RDBMS to the next will exist. Let us now look at some examples of SQL in action.

Consider the following four tables in a sales database information system (the primary key in each table is underlined, and foreign keys are identified with an asterisk at the end of the field):

CUSTOMERS (CUSTOMERNO, NAME, ADDRESS, PHONE, BALANCE, CREDIT_LIMIT)

SALES (INVOICENO, DATE, CUSTOMERNO*, SALESPERSON, TOTAL)

ITEMS (ITEMNO, DESCRIPTION, QTY_ON_HAND, COST_PRICE)

ITEMS_SOLD (INVOICENO*, ITEMNO*, QTY_SOLD, SELLING_PRICE)

Let us assume that the sales manager has the following questions: (1) which customers, if any, have exceeded their credit limit? (2) what are the names and phone numbers of customers who have been sold merchandise by John Doe? (3) what are the names and current balances of customers who have been sold item number 1250? The SQL queries to answer each of these questions follow:

Query no. 1:
SELECT *
FROM CUSTOMERS
WHERE CREDIT_LIMIT < BALANCE;
(Note: the * means all fields)

Query no. 2:
SELECT CUSTOMERS.NAME, CUSTOMERS.PHONE
FROM CUSTOMERS, SALES
WHERE CUSTOMERS.CUSTOMERNO = SALES.CUSTOMERNO
AND SALES.SALESPERSON = 'John Doe';

Query no. 3:
SELECT CUSTOMERS.NAME, CUSTOMERS.BALANCE
FROM CUSTOMERS, SALES, ITEMS_SOLD
WHERE CUSTOMERS.CUSTOMERNO = SALES.CUSTOMERNO
AND SALES.INVOICENO = ITEMS_SOLD.INVOICENO
AND ITEMS_SOLD.ITEMNO = 1250;

The statements shown above use the syntax <table-name.field-name> to jointly refer to both a field and the table in which the field appears. Joins are performed by indicating which fields in the two tables should equal one another (i.e., which fields are common between the two tables). In query number 3 above, the WHERE clause specifies (1) the two joins needed, between CUSTOMERS and SALES using CUSTOMERNO and between SALES and ITEMS_SOLD using INVOICENO, and (2) the criterion involving the ITEMNO field in the ITEMS_SOLD table. The CUSTOMERS, SALES, and ITEMS_SOLD tables must all be specified in the FROM portion of the query.
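To see the three queries produce results, the following sketch builds the four tables in an in-memory SQLite database using Python's sqlite3 module and runs each query as written above. The column types and every sample row (the customers Ames and Boyd, the invoices, and the items) are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerno INTEGER NOT NULL PRIMARY KEY, name TEXT,
                            address TEXT, phone TEXT, balance REAL, credit_limit REAL);
    CREATE TABLE sales (invoiceno INTEGER NOT NULL PRIMARY KEY, date TEXT,
                        customerno INTEGER REFERENCES customers (customerno),
                        salesperson TEXT, total REAL);
    CREATE TABLE items (itemno INTEGER NOT NULL PRIMARY KEY, description TEXT,
                        qty_on_hand INTEGER, cost_price REAL);
    CREATE TABLE items_sold (invoiceno INTEGER NOT NULL REFERENCES sales (invoiceno),
                             itemno INTEGER NOT NULL REFERENCES items (itemno),
                             qty_sold INTEGER, selling_price REAL,
                             PRIMARY KEY (invoiceno, itemno));

    -- Hypothetical sample data
    INSERT INTO customers VALUES (1, 'Ames', '1 Elm St', '555-0001', 6000, 5000);
    INSERT INTO customers VALUES (2, 'Boyd', '2 Oak St', '555-0002', 1000, 5000);
    INSERT INTO items VALUES (1250, 'Widget', 40, 9.50);
    INSERT INTO items VALUES (1300, 'Gadget', 10, 4.00);
    INSERT INTO sales VALUES (500, '2024-03-01', 1, 'John Doe', 40.00);
    INSERT INTO sales VALUES (501, '2024-03-02', 2, 'Jane Roe', 40.00);
    INSERT INTO items_sold VALUES (500, 1300, 5, 8.00);
    INSERT INTO items_sold VALUES (501, 1250, 4, 10.00);
""")

# Query no. 1: customers who have exceeded their credit limit.
over_limit = conn.execute(
    "SELECT * FROM customers WHERE credit_limit < balance"
).fetchall()

# Query no. 2: one join plus a criterion on the salesperson.
sold_by_doe = conn.execute("""
    SELECT customers.name, customers.phone
    FROM customers, sales
    WHERE customers.customerno = sales.customerno
      AND sales.salesperson = 'John Doe'
""").fetchall()

# Query no. 3: two joins plus a criterion on the item number.
bought_1250 = conn.execute("""
    SELECT customers.name, customers.balance
    FROM customers, sales, items_sold
    WHERE customers.customerno = sales.customerno
      AND sales.invoiceno = items_sold.invoiceno
      AND items_sold.itemno = 1250
""").fetchall()

print(over_limit, sold_by_doe, bought_1250)
```

With this sample data, Ames is the only customer over the credit limit and the only one served by John Doe, while Boyd is the only customer who was sold item 1250.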

While the exact form of SQL syntax can vary from one RDBMS to the next, the general format should be similar to that shown above. The various DBMS languages/tools are summarized in the following table.

DBMS Languages

Language/tool: Explanation

DDL - Data Definition Language: Used to create tables, set permissions on tables, define validation rules in tables, and perform other functions such as backup.

DML - Data Manipulation Language: Embedded into application programs written in a third or fourth generation language. The DML statements allow the program to interface with the database.

DQL - Data Query Language: General term for user-oriented interfaces to the database to enable end users to obtain answers to ad hoc questions.

SQL - Structured Query Language: A widely accepted standard relational database query language. Command line interface using four main operators -- SELECT, INSERT, UPDATE, and DELETE.

QBE - Query By Example: Graphical interface for querying. User is presented with a shell of a table to be queried in which the user can enter an example of what he/she is looking for as a means of querying the table.

Report Writer: Allows custom reports to be generated from tables in a very user-friendly intuitive manner.

Forms Editor: Permits the creation of user-friendly interfaces to tables. Forms can be made to appear like the documents and paper forms that are familiar to the user.

DBMS backup and control features

In order to protect the organization from accidental or intentional corruption of the database, periodic backups should be performed. The most basic form of backup is the static backup. This backup procedure first involves closing all programs and shutting down the database. Next, the entire database is backed up, either to a separate disk or to tape. During the backup procedure all users are "locked out" of the database (i.e., prevented from accessing the database). The reason that this backup method is referred to as a "static" method is because table structures and values are saved at a particular point in time. If the database crashes, it can only be recovered to the state at which it was last backed up. In other words, transactions that occurred since the last backup are lost. Most personal computer RDBMS offer only static database backup.

A superior backup method is called dynamic backup. Often available only on mainframe RDBMS, this method involves periodic static backup combined with logging of each individual transaction to a backup magnetic disk in addition to the primary magnetic disk. In effect, every transaction is recorded twice -- once on the primary database disk and once on a backup disk. In the event of a hardware or software failure which results in corruption of the database, the static backup is retrieved and the new transactions from the backup disk are "applied" to the backed up version of the database. This process results in recovery of the database to the status at the point of failure. In effect, the database is reconstructed as if the crash had never occurred.

One popular dynamic backup method is the redundant array of inexpensive (or independent) disks (RAID). As the name suggests, RAID uses an array of magnetic disks for recording transactions. The RAID controller determines which disks each transaction will be written on -- as indicated earlier, each transaction will be written on more than one disk. If a disk fails, the RAID controller can still retrieve all the transactions because every single data item on the failed disk would have been written on some other disk in the array. Especially with the cost of magnetic disks declining rapidly, RAID systems have become more affordable than ever before. The web site of the Advanced Computer & Network Corporation provides an excellent description of the different levels of RAID, from level 0, to level 53, to level 0+1.
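The difference between static and dynamic backup can be sketched in a few lines using Python's sqlite3 module. Here the static backup is a point-in-time copy made with the module's Connection.backup method, and the "backup disk" is modeled by a plain Python list that records each transaction a second time. The accounts table, the helper run_transaction, and the sample amounts are assumptions; a production RDBMS implements its own logging and recovery mechanisms.

```python
import sqlite3

primary = sqlite3.connect(":memory:")
primary.execute(
    "CREATE TABLE accounts (acct_no INTEGER NOT NULL PRIMARY KEY, balance INTEGER)"
)
primary.execute("INSERT INTO accounts VALUES (1, 100)")
primary.commit()

# Static backup: copy the entire database as it stands at this point in time.
static_backup = sqlite3.connect(":memory:")
primary.backup(static_backup)

# Dynamic backup: from now on every transaction is recorded twice,
# once in the primary database and once in a separate log.
transaction_log = []


def run_transaction(sql, params):
    primary.execute(sql, params)
    primary.commit()
    transaction_log.append((sql, params))


run_transaction("UPDATE accounts SET balance = balance + ? WHERE acct_no = ?", (50, 1))
run_transaction("INSERT INTO accounts VALUES (?, ?)", (2, 75))

query = "SELECT acct_no, balance FROM accounts ORDER BY acct_no"
state_at_failure = primary.execute(query).fetchall()
primary.close()  # simulated failure: the primary database is lost

# Recovery step 1: retrieve the static backup.
recovered = sqlite3.connect(":memory:")
static_backup.backup(recovered)
static_only = recovered.execute(query).fetchall()

# Recovery step 2: apply the logged transactions to the backed up version.
for sql, params in transaction_log:
    recovered.execute(sql, params)
recovered.commit()
recovered_state = recovered.execute(query).fetchall()

print(static_only, recovered_state)
```

Restoring the static backup alone loses both transactions that followed it, whereas replaying the log brings the database back to its state at the point of failure.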

When multiple users can access a database via a network, a critical concern is control over concurrent or simultaneous updates. If two users are allowed to update a table at the same time, the database may be left with inconsistent values after the two users perform their respective updates. For example, assume that 100 units of a finished goods inventory item are in stock. Next assume that two sales clerks each process a sales transaction for 80 units at exactly the same time (both would be permitted to do so, since the system would display available inventory of 100 to both users). At the conclusion of both transactions, the inventory item would have a balance of -60! To prevent such an occurrence, all RDBMS have some form of concurrency control. The most rudimentary form of concurrency control is called a "lock out." When one user accesses a table with the intention of updating it, all other users are simply "locked out" from the table. The other users receive a message indicating that the table is currently unavailable. However, this is an extreme form of concurrency control. Users who only intend to read data in the table should be permitted to do so. A less stringent form of concurrency control is the "write lock" in which users who only intend to read data from the table are granted access but users who intend to update values in the table are denied access. Remember that in a multi-user DBMS environment, it is essential to allow simultaneous access to tables unless corruption of the data might result (as in the case of simultaneous updates).
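The inventory example can be sketched in Python. The first part replays the uncontrolled sequence step by step, and the second part guards the balance with a lock so that only one clerk at a time can read and update it. The InventoryItem class and its sell method are hypothetical; they model the idea of a lock in a few lines and are not how any particular RDBMS implements concurrency control.

```python
import threading

# Without concurrency control: both clerks read the balance before either writes.
balance = 100
clerk_a_sees = balance
clerk_b_sees = balance
if clerk_a_sees >= 80:
    balance -= 80
if clerk_b_sees >= 80:  # stale reading: clerk B still believes 100 units are in stock
    balance -= 80
uncontrolled_balance = balance


class InventoryItem:
    """Hypothetical stock record protected by a single lock."""

    def __init__(self, on_hand):
        self.on_hand = on_hand
        self._lock = threading.Lock()

    def sell(self, qty):
        # Only one clerk at a time may check and update the balance;
        # the other clerk waits until the lock is released.
        with self._lock:
            if qty > self.on_hand:
                return False  # insufficient stock: the sale is refused
            self.on_hand -= qty
            return True


item = InventoryItem(100)
outcomes = []
clerks = [
    threading.Thread(target=lambda: outcomes.append(item.sell(80)))
    for _ in range(2)
]
for clerk in clerks:
    clerk.start()
for clerk in clerks:
    clerk.join()

print(uncontrolled_balance, sorted(outcomes), item.on_hand)
```

Without control the balance ends at -60, as in the example above. With the lock, whichever clerk arrives second sees the updated balance of 20 and the second sale of 80 units is refused.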

Emerging database systems concepts

We conclude this chapter with a brief discussion of an emerging concept relating to database systems. Object-oriented (OO) approaches to modeling and implementing database systems are becoming increasingly popular. This approach employs object-oriented modeling (OOM) techniques to model the domain of interest and then implements the resulting model using an object-oriented database management system (OODBMS). The object-oriented approach focuses on the objects of interest in the domain. Customers, vendors, employees, sales orders, and receipts are all viewed as objects that have certain attributes. OOM involves identifying the objects of interest, their attributes, and relationships between objects.

A critical feature unique to the OO approach is that an "object" package includes both the attributes of the object and the methods or procedures that pertain to that object. The methods might dictate how the object's attributes are modified in response to different events, or how the object causes changes in the attributes of other objects. Thus, a key difference between the database models described earlier and the OO approach is that OO models combine data (attributes) and procedures (methods) in one package, i.e., the "object." This feature of OO models is referred to as encapsulation -- attributes and methods are represented together in one capsule. Another powerful feature of OO models is inheritance. OO models depict the real world as a hierarchy of object classes, with lower level classes inheriting attributes and methods from higher level classes. Thus, lower level object classes do not need to redefine attributes and methods that are common to the higher level object classes in the class hierarchy.
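Encapsulation and inheritance can be illustrated with a small Python sketch. The Party and Customer classes, their attributes, and the record_sale method are assumptions chosen to echo the customer and credit limit examples used in this chapter; they illustrate the two concepts only and do not represent an OODBMS.

```python
class Party:
    """Higher level object class: attributes and methods common to customers and vendors."""

    def __init__(self, name, balance=0.0):
        self.name = name
        self.balance = balance

    def describe(self):
        return f"{self.name} (balance {self.balance:.2f})"


class Customer(Party):
    """Lower level class: inherits name, balance, and describe() from Party."""

    def __init__(self, name, credit_limit, balance=0.0):
        super().__init__(name, balance)
        self.credit_limit = credit_limit

    def record_sale(self, amount):
        """Encapsulation: the rule for changing the balance travels with the data."""
        if self.balance + amount > self.credit_limit:
            return False  # sale would exceed the credit limit
        self.balance += amount
        return True


customer = Customer("Acme", credit_limit=5000.0)
first_sale = customer.record_sale(4000.0)
second_sale = customer.record_sale(2000.0)  # would bring the balance to 6000
summary = customer.describe()               # method inherited from Party
print(first_sale, second_sale, summary)
```

Customer never defines describe, yet it can use it, and the credit check lives inside the object rather than in a separate program that manipulates a customers table.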

An OO model contains all details needed for implementation and object-oriented DBMS are powerful enough to represent all the information contained in the model. However, most organizations that have made heavy investments in RDBMS see little need to migrate to OO environments. While OO modeling methods are available, there is no consensus regarding the "best" method to use. Finally, although OODBMS are beginning to become commercially available, they have not gained much acceptance in the marketplace probably due to their relatively high cost and poor performance in comparison to RDBMS. Gemstone, Jade, ObjectDB, and Objectivity are some examples of OODBMS.

Summary

The chapter began by contrasting the older file-oriented approach with the database approach. Drawbacks of the file-oriented approach and advantages and limitations of the database approach were discussed. Key database concepts such as primary, concatenated, and foreign keys were described. The various types of relationships such as 1:1, 1:M, and M:M relationships were then explained. The relational model was then explored in detail. Rules for relations, entity and referential integrity, and validation rules for relational database systems were explained. The process of restricting access to data in a relational database was then discussed. The data dictionary concept was then explained. The three major database languages -- the data definition language, the data manipulation language, and the data query language -- were described in terms of their functions. SQL, a popular relational database query language, was discussed in some detail along with examples. Finally, backup and control procedures for relational database systems were discussed. These include static backup, dynamic backup, RAID, and concurrency control. The chapter concluded by discussing the emerging concept of object-oriented modeling and implementation of database systems.

Key Terms

Composite key
Concatenated key
Concurrency control
Data definition language
Data dictionary
Data independence
Data manipulation language
Data query language
Database approach
Dynamic backup
Encapsulation
Entity integrity
File-oriented approach
Foreign key
Forms editor
Inheritance
Object-oriented
Redundant array of inexpensive disks
Referential integrity
Relationship cardinality
Report writer
Static backup
Structured query language (SQL)

Key Web Sites

PC based DBMS

• Microsoft Access – Microsoft's popular RDBMS for the personal computer

• dBase – one of the earliest (and still around) RDBMS for the personal computer

• Base – the database software that is part of the Apache OpenOffice suite

Server based DBMS

• IBM DB2 – IBM’s relational DBMS for the enterprise

• Microsoft SQL Server – the latest version is SQL Server 2012

• IBM Informix 12.1 – Informix Dynamic Server – an enterprise strength relational database

• Oracle 12c – the latest version of Oracle's database, the industry leader in RDBMS technology.

• Sybase – a cross-platform RDBMS

• MySQL – an open source (i.e., “free”) multi-platform RDBMS

Other sites

• A home page dedicated to information about the SQL standard

• An interactive online SQL tutorial

• The W3Schools.com site for SQL – a good place to learn SQL

• Network World article on RAID

• Explanation of the different levels of RAID

• A site with links to various object-oriented database systems

Discussion Questions

1. Briefly describe the file-oriented approach to data processing.

2. Provide an overview level description of the database approach to data processing.

3. Distinguish between the file-oriented and database approaches in terms of their relative advantages and disadvantages.

4. What do you understand by the term "legacy systems"?

5. Explain the concept of data independence.

6. Giving examples, explain the concept of foreign keys.

7. Indicate the key features of the object-oriented model.

8. How are many-to-many relationships represented in the relational model? Explain in the context of the following scenario: an employee can be working on many projects, and a project can have many employees working on it.

9. What are the rules to which tables must conform in the relational model?

10. Giving examples, explain the concepts of entity and referential integrity.

11. What are data validation rules? Why are validation rules in database environments superior to application controls in a file-oriented environment?

12. Explain the methods by which access to sensitive data in a relational database can be restricted.

13. Explain the concept of the "data dictionary." Why do auditors find the data dictionary useful?

14. What are the three broad categories of database languages? Briefly indicate the function of each language type.

15. Describe the four major SQL operators.

16. Distinguish between static and dynamic database backup. Explain the function of RAID.

17. Giving examples, explain the concept of concurrency control in database environments.

Problems and Exercises

1. Aggies-R-Us would like your assistance in developing a logical database model for their purchasing application. Based on discussions with key managers at Aggies-R-Us, you determine the following information: (1) a vendor can supply many parts and a particular part can be supplied by many vendors, and (2) each part can be stored in many warehouses and each warehouse can store many parts.

Required: Draw a relational model to reflect the relationships between vendors, parts, and warehouses. List the tables in your model and draw appropriate links between the tables (use single-headed and double-headed arrows to indicate the relationship cardinality). You may make reasonable assumptions regarding the fields to be represented in each table; be sure to indicate the primary key in each table.

2. Answer the questions that follow with reference to the following tables in a relational database. The assumptions pertaining to the tables are (1) an employee can work on many projects, (2) a project can have many employees working on it, and (3) the "hours-worked" field is used to keep track of the number of hours worked by each employee on each project.

EMPLOYEES (EMPLOYEE-NO, NAME, PHONE-NO, OFFICE)

CUSTOMERS (CUSTOMER-NO, NAME, ADDRESS, BALANCE)

PROJECTS (PROJECT-NO, DATE, CUSTOMER-NO, BILLING-AMOUNT)

HOURS (EMPLOYEE-NO, PROJECT-NO, HOURS-WORKED)

Required: a) Identify the primary key in each table. b) Identify foreign keys, if any.

3. Answer the questions that follow with reference to the following tables in a relational database. The assumptions pertaining to the tables are (1) an instructor can be teaching many courses, (2) a student can be taking many classes, (3) there can be only one instructor teaching a particular course, and (4) each class can have many students enrolled in it.

INSTRUCTORS (INSTRUCTOR-NO, NAME, PHONE-NO, OFFICE)

STUDENTS (STUDENT-NO, NAME, ADDRESS, PHONE, YEAR-JOINED, GRADUATION-YEAR, COLLEGE)

COURSES (COURSE-NO, DESCRIPTION, CREDIT-HOURS, INSTRUCTOR-NO)

ENROLLMENTS (COURSE-NO, STUDENT-NO, STATUS)

Required:

a) Identify the primary key in each table.

b) Identify foreign keys, if any.

4. Consider a scenario in which work orders require many parts and the same part could be used on different work orders. Construct tables to show how this relationship would be implemented in the relational model. You may make any reasonable assumptions regarding fields to be represented for work orders and parts.

5. Answer the questions that follow with reference to the following tables in a relational database. The assumptions pertaining to these three tables are (1) a customer can have many invoices, (2) an invoice can have many items, and (3) the STR field indicates the state sales tax rate for each customer's state.

CUSTOMERS

CUSTOMER#  NAME       ADDRESS1         ADDRESS2   STATE  STR   BALANCE
456        ABC Corp.  111 Any St.      Houston    TX     6.25  34560.65
457        DEF Corp.  22 Anywhere Dr.  New York   NY     6.50   2145.90
458        GHI Corp.  5 Someplace Ct.  Miami      FL     6.45  45670.75
459        JKL Corp.  56 Some Dr.      Bryan      TX     6.25  21009.50
460        MNO Corp.  7 Noplace Cir.   San Diego  CA     5.50   4561.00

INVOICES

INVOICE#  DATE     CUSTOMER#  AMOUNT
1001      11-1-95  456        450.75
1002      11-2-95  457        560.25
          11-2-95  460        300.10
1003      11-2-95  459        890.25
1004      11-3-95  450        425.50

INVOICE-ITEMS

INVOICE#  ITEM#  DESC    PRICE  QTY
1001      121    Widget  2.25   45
1001      540    Bolt    0.40   25
1002      211    Gear    3.70   10
1003      121    Widget  2.25   15
1003      121    Widget  2.25   10
1006      348    Nut     0.25   5

Required: a) List all violations of entity integrity in the above tables. b) List all violations of referential integrity in the above tables.

6. Consider the following tables in a relational database. Provide the appropriate "SELECT" SQL statement necessary to answer the queries that follow. Primary keys are underlined and foreign key fields have an asterisk at the end of the field.

CUSTOMERS (CUSTNO, CNAME, CADDRESS, BALANCE)

SALESPERSONS (SPNO, SNAME, DATE_EMPLOYED, SALARY)

SALES (INVOICENO, DATE, CUSTNO*, SPNO*)

Required: a) List the salesperson name and salary for all sales to customers whose balance is greater than $20,000. b) List the names and addresses of all customers who have been sold merchandise by salespersons employed before 1/1/96.

7. Consider the following tables in a relational database which are in third normal form. Provide the appropriate "SELECT" SQL statement necessary to answer the queries that follow. Primary keys are underlined and foreign key fields have an asterisk at the end of the field.

CUSTOMERS (CUSTOMERNO, NAME, ADDRESS, REGION, BALANCE)

INVOICES (INVOICENO, DATE, CUSTOMERNO*, SALESPERSON, AMOUNT)

ITEMS-SOLD (INVOICENO*, ITEMNO*, QUANTITY_SOLD, SELLING_PRICE)

INVENTORY (ITEMNO, DESCRIPTION, QUANTITY_ON_HAND)

Required: a) List the invoice number, item number, item description and selling price on all invoices by salesperson "John Doe." b) List the customer names, invoice numbers, and invoice dates for all invoices where the quantity sold exceeded 100.

8. Visit Web sites describing personal computer relational database products such as Microsoft Access, dBase, and OpenOffice Base. Develop criteria that you feel are important in evaluating the products, and rate each product in terms of the criteria you develop. Summarize your findings regarding the product you feel best meets the criteria you develop.

Last Updated: August 19, 2013

Copyright © 1996-2013 CyberText Publishing, Inc. All Rights Reserved