
mdbms notes u-I.doc


DESCRIPTION

bsc dbms u-i


UNIT-I

Database Systems

Data

Data are raw facts and figures. The word “raw” indicates that the facts have not yet been processed to reveal their meaning.

Information

Information is the result of processing raw data to reveal its meaning. To reveal meaning, information requires context. E.g., an average temperature reading of 105 degrees does not mean much unless you also know its context:
o Are the degrees in Fahrenheit or Celsius?
o Is this a machine temperature, a body temperature, or an outside air temperature?

Information can be used as the foundation for decision making. Raw data must be properly formatted for storage, processing and presentation.

Some key points about data and information:
1. Data constitute the building blocks of information.
2. Information is produced by processing data.
3. Information is used to reveal the meaning of the data.
4. Accurate, relevant and timely information is the key to good decision making.
5. Good decision making is the key to organizational survival in a global environment.

Introducing the database and the DBMS

Database:
An organised collection of logically related data, stored in one place without unnecessary redundancy, is called a database.

A database is a shared, integrated computer structure that stores a collection of:
1. End-user data, that is, raw facts of interest to the end user.
2. Metadata, or data about data, through which the end-user data are integrated and managed.

Metadata:
The metadata provide a description of the data characteristics (e.g., data types, sizes) and the set of relationships (constraints) that link the data found within the database. Examples include the name of each data element (column name), the type of values (numeric, text or date) stored in each data element, and whether or not the data element can be left empty.
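As a rough illustration of metadata, the sketch below uses SQLite through Python's standard sqlite3 module as a stand-in DBMS. The table customer and its column names are hypothetical, chosen only for this example. It asks the DBMS to describe a table: the column names, their types, and whether each column may be left empty.

```python
import sqlite3

# In-memory SQLite database used as a small stand-in for a DBMS.
conn = sqlite3.connect(":memory:")

# Hypothetical table; the names are assumptions made for this sketch.
conn.execute(
    """
    CREATE TABLE customer (
        cus_code     INTEGER NOT NULL PRIMARY KEY,
        cus_name     TEXT NOT NULL,
        cus_ren_date TEXT
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
metadata = conn.execute("PRAGMA table_info(customer)").fetchall()

for cid, name, col_type, notnull, default, pk in metadata:
    may_be_empty = "no" if notnull else "yes"
    print(f"{name}: type={col_type}, may be left empty={may_be_empty}")
```

No end-user data were inserted here; everything printed comes from the description of the table that the DBMS keeps for itself.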

DBMS:
A DBMS (database management system) is a collection of programs that manages the database structure and controls access to the data stored in the database.

Historical roots: files and file systems

Basic file terminology:

Data: Raw facts.
Field: A character or group of characters that has a specific meaning. A field is used to define and store data. It corresponds to a column in a table.
Record: A logically connected set of one or more fields that describes a person, place or thing. It corresponds to a row in a table.
File: A collection of related records. It corresponds to a table.

Manual system or Manual File processing system:
A manager of almost any small organization was able to keep track of necessary data by using a manual file system. Such a file system was traditionally composed of a collection of file folders, each properly tagged and kept in a filing cabinet.

Eg1: A file folder in a doctor’s office might contain patient data, one file folder for each patient. All of the data in that file folder would describe only that particular patient’s medical history.

Eg2: A personnel manager might organize personnel data by category of employment, such as clerical, technical, sales and administrative. Therefore a file folder labeled “technical” would contain data about employees whose duties were classified as technical.

As long as a data collection was relatively small and an organization’s managers had few reporting requirements, the manual system served its role well as a data repository. However, as organizations grew and as reporting requirements became more complex, keeping track of data in a manual file system became more difficult. In fact, finding and using data in a growing collection of file folders turned into a time-consuming and difficult task.

Eg: Consider a few questions to which a retail business owner might want answers.

1. What products sold well during the past week, month, quarter or year?

2. What is the current daily, weekly, monthly, quarterly or yearly sales volume?

3. Did the various cost categories increase, decrease or remain stable during the past week, month, quarter or year?

Lists of questions such as these tend to be long and to grow as an organization grows. Unfortunately, generating reports from a manual file system can be a slow and difficult task.

Computer File Processing System:
The conversion from a manual file system to a matching computer file system could be technically complex. Consequently, a new kind of professional, known as a data processing (DP) specialist, had to be hired or “grown” from the current staff.

The DP specialist created the necessary computer file structures, often wrote the software that managed the data within those structures, and designed the application programs that produced reports based on the file data.

Initially, the computer files within the file system were similar to the manual files. A simple example of a customer data file for a small insurance company is shown in the figure below:

The customer file shown in the above figure contains some records. Each record is composed of 9 fields. The records are stored in a named file. Because the file in the above figure contains customer data for the insurance company, its file name is CUSTOMER.

Using the CUSTOMER file’s contents, the DP specialist wrote programs that produced very useful reports for the insurance company’s sales department, such as:

1. Monthly summaries that showed the types and amounts of insurance sold by each agent. These can be used to analyze each agent’s productivity.

2. Monthly checks to determine which customers must be contacted for renewal.

3. Reports that analyzed the ratios of insurance types sold by each agent.

As time went on, the insurance company needed additional programs to produce new reports. The sales department at the insurance company then created a file named SALES, which helped track daily sales efforts. Additional files were created as needed to produce even more useful reports. Then the DP specialist was asked to create the AGENT file, as shown in the figure below. The data in the AGENT file were used to write checks, keep track of taxes paid, summarize insurance coverage, and so on.


As the number of files increased, a small file system like the one shown in the figure below evolved.

Each file in the system used its own application program to store, retrieve and modify data. Each file was owned by a particular department.

As the insurance company’s file system grew, the demand for the DP specialist’s programming skills grew even faster, and the DP specialist was authorized to hire additional programmers.

Problems with file system data management

The following are some of the problems with file system data management:

1. Structural Dependency
2. Data Dependency
3. Field Definitions and Naming Conventions
4. Data Redundancy
5. Data Inconsistency
6. Data Anomalies
7. Lack of Data Sharing
8. Lengthy Development Times


Structural Dependency:
A file system exhibits structural dependency, which means that access to a file depends on its structure.

Eg: Adding a customer date-of-birth field to the CUSTOMER file. Given this change, none of the previous programs will work with the new CUSTOMER file structure. Therefore, all of the file system programs must be modified to conform to the new file structure.
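A minimal sketch of structural dependency follows. It assumes, purely for illustration, that CUSTOMER records are stored as comma-separated lines and that a report program reads the fields by position. The record contents and the function names are made up for this example.

```python
# Hypothetical CUSTOMER records: name, phone, agent name.
OLD_FILE = ["Alfred Ramas,555-0101,Leah Hahn"]

# The same record after a date-of-birth field is added to the file structure.
NEW_FILE = ["Alfred Ramas,1975-03-12,555-0101,Leah Hahn"]


def read_customer(line):
    """A file-system style program: the file structure is hard-coded here."""
    name, phone, agent = line.split(",")  # expects exactly three fields
    return {"name": name, "phone": phone, "agent": agent}


def program_works(lines):
    """Return True if every record can still be read by read_customer."""
    try:
        for line in lines:
            read_customer(line)
        return True
    except ValueError:
        # Raised when the number of fields no longer matches the program.
        return False


print(program_works(OLD_FILE))
print(program_works(NEW_FILE))
```

The program reads the old file but fails on the new one, even though the data it needs (name, phone, agent) are still present. Every program that embeds the old structure would need the same change.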

Data Dependency:
Even a change in the characteristics of the data, such as changing a field from integer to decimal, requires changes in all the programs that access the file. Data dependence makes the file system extremely cumbersome from the point of view of a programmer and database manager.

Field Definitions and Naming Conventions:
Selecting proper field names is also important. Make sure that the field names are reasonably descriptive.

Eg: Suppose a field named “Ren” represents the customer’s insurance renewal date. This name is not reasonably descriptive. The same field can be named Cus_Ren_Date, which clearly indicates the customer’s insurance renewal date and is easily understood by users.

With proper naming conventions, the file structure becomes “self documenting”.

Data Redundancy:
Data redundancy occurs when the same data are stored unnecessarily in different places (the same data repeated in more than one place).

Eg: In the above file structures, the agent names and phone numbers occur in both the CUSTOMER and AGENT files. But you need only one correct copy of the agent names and phone numbers. Having them occur in more than one place produces data redundancy.

Data Inconsistency:
Data inconsistency exists when different and conflicting versions of the same data appear in different places.

Eg: Suppose you change an agent’s phone number or address in the AGENT file but forget to make the corresponding changes in the CUSTOMER file. The files then contain different data for the same agent, which leads to data inconsistency problems.
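The sketch below shows how redundancy can turn into inconsistency. It models the AGENT and CUSTOMER files as plain Python structures; the field names and the phone numbers are invented for this example and are not taken from the figures.

```python
# Hypothetical AGENT file: one record per agent.
agent_file = {"A1": {"name": "Leah Hahn", "phone": "555-0100"}}

# Hypothetical CUSTOMER file: the agent's phone is stored again in every record.
customer_file = [
    {"customer": "C1", "agent_id": "A1", "agent_phone": "555-0100"},
    {"customer": "C2", "agent_id": "A1", "agent_phone": "555-0100"},
]


def inconsistent_customers(agents, customers):
    """List customers whose stored agent phone disagrees with the AGENT file."""
    return [
        record["customer"]
        for record in customers
        if record["agent_phone"] != agents[record["agent_id"]]["phone"]
    ]


before = inconsistent_customers(agent_file, customer_file)

# The phone number is updated in the AGENT file only.
agent_file["A1"]["phone"] = "555-0199"

after = inconsistent_customers(agent_file, customer_file)
print(before, after)
```

Before the update the two files agree; afterwards every customer record served by that agent carries a stale phone number. Storing the phone number once, in the AGENT file alone, would remove the opportunity for this conflict.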

Data Anomalies:
The dictionary defines an anomaly as an abnormality. In a file system, a data anomaly is a problem or error that develops when a change to redundant data is not made correctly in every place where the data are stored.
Data anomalies are divided into three types:
1. Insertion Anomaly
2. Deletion Anomaly
3. Update Anomaly

Lack of Data Sharing:
In a computer file processing system there is no integration of data, so the data cannot easily be shared among a number of users.

Lengthy Development Times:
All application programs must be developed from the beginning because there is no reusability of data.

Role and advantages of DBMS

The role of DBMS:

1. The DBMS serves as the interface between the user and the database.

2. The database structure itself is stored as a collection of files and the only way to access the data in those files is through the DBMS.

3. The following figure shows how a DBMS acts as an interface.

4. The DBMS receives all application requests and translates them into the complex operations required to fulfill those requests.

5. The DBMS hides much of the database’s internal complexity from the application programs and users.

6. The application program might be written by a programmer using a programming language such as VB.NET, C++ or Java, with the requests to the DBMS typically expressed in SQL.

7. Having a DBMS between the end users and the database offers some important advantages:

a. The DBMS enables the data in the database to be shared among multiple applications or users.

b. The DBMS integrates the many different users’ views of the data into a single data repository.


The advantages of DBMS:

A DBMS or database approach offers the following advantages:
1. Program Data Independence
2. Minimal Data Redundancy
3. Improved Data Consistency
4. Improved Data Sharing
5. Improved Productivity of Application Development
6. Enforcement of Standards
7. Improved Data Quality
8. Improved Data Accessibility and Responsiveness
9. Reduced Program Maintenance
10. Improved Decision Making

Program Data Independence:
The separation of data descriptions (metadata) from the application programs that use the data is called data independence. With the database approach, data descriptions are stored in a central location called the “repository”. This property of database systems allows an organization’s data to change and evolve without changing the application programs that process the data.

Minimal Data Redundancy:
The design goal with the database approach is that previously separate data files are integrated into a single, logical structure, so each primary fact is recorded only once, in one place in the database.

The database approach does not eliminate redundancy entirely, but it allows the designer to carefully control the type and amount of redundancy.

Improved Data Consistency:
By eliminating or controlling data redundancy we greatly reduce the opportunities for inconsistency.

Improved Data Sharing:
A database is designed as a shared corporate resource. Authorized internal and external users are granted permission to use the database, and each user (or group of users) is provided one or more user views to facilitate this use. A user view is a logical description of some portion of the database that is required by a user to perform some task.

Improved Productivity of Application Development:
A major advantage of the database approach is that it greatly reduces the cost and time for developing new business applications.

Enforcement of Standards:
When the database approach is implemented with full management support, the database administration function should be granted authority and responsibility for establishing and enforcing data standards. These standards (constraints) cover procedures for accessing, updating and protecting data.

Improved Data Quality:
The database approach provides a number of tools and procedures to improve data quality:
1. Database designers can specify integrity constraints that are enforced by the DBMS.
2. Operational data can be cleaned up before they are placed in a data warehouse.

Improved Data Accessibility and Responsiveness:
With a relational database, end users without any programming experience can often retrieve and display data from a database.

Reduced Program Maintenance:
Stored data must be changed frequently for a variety of reasons: new data items are added, data formats are changed, and so on. Because the data descriptions are kept separate from the application programs, such changes can often be made with little or no change to the programs.

Improved Decision Making:
Better managed data and improved data access make it possible to generate better quality information on which better decisions are based.
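Program data independence, the first advantage listed above, can be sketched with a small experiment. The example assumes SQLite (through Python's sqlite3 module) as the DBMS; the table customer, its columns and the function phone_list are hypothetical names used only here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cus_name TEXT, cus_phone TEXT)")
conn.execute("INSERT INTO customer VALUES ('A. Ramas', '555-0101')")


def phone_list(connection):
    """An 'application program' that names only the columns it needs."""
    return connection.execute(
        "SELECT cus_name, cus_phone FROM customer ORDER BY cus_name"
    ).fetchall()


before = phone_list(conn)

# The database structure changes: a date-of-birth column is added.
conn.execute("ALTER TABLE customer ADD COLUMN cus_dob TEXT")

# The unchanged program is run again against the new structure.
after = phone_list(conn)
print(before == after)
```

Unlike the file-system case, the program keeps working after the structure changes, because the DBMS holds the data description and the program refers to data by name rather than by physical position.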

Types of Databases

A DBMS can support different types of databases. Some of the ways of classifying them are as follows:

1. According to the number of users
2. According to database location
3. According to data usage
4. According to the structure of the data
5. XML databases

According to the number of users:
According to the number of users, databases are divided into two types: i) Single User Databases and ii) Multi User Databases.

Single User Databases:
1. A single user database supports only one user at a given time. In other words, if user ‘A’ is using the database, users ‘B’ and ‘C’ must wait until user ‘A’ is done.
2. A single user database that runs on a personal computer is also called a Desktop Database.

Multi User Databases:
A multi user database supports multiple users at the same time. Multi user databases are again divided into two types: i) Work Group Database and ii) Enterprise Database.

Work Group Database:
When a multi user database supports a small number of users (usually fewer than 50) or a specific department within an organization, it is called a Work Group Database.

Enterprise Database:
When the database is used by the entire organization and supports many users (more than 50, usually hundreds) across many departments, it is called an Enterprise Database.


According to Database Location:
Location might also be used to classify databases. Based on location, databases are divided into two types: i) Centralised and ii) Distributed.

Centralised Database:
A database that supports data located at a single site is called a Centralised Database.

Distributed Database:
A database that supports data distributed across several different sites is called a Distributed Database.

According to Data Usage:
The most popular way of classifying databases today, however, is based on how they will be used and on the time sensitivity of the information gathered from them.

Eg: Transactions such as product sales, payments and supply purchases reflect critical day-to-day operations. Such transactions must be recorded accurately and immediately.

According to data usage, databases are divided into two types: i) Operational Database and ii) Data Warehouse.

Operational Database:
A database that is designed to support a company’s day-to-day operations is classified as an operational database. It is also called a transactional or production database.

Data Warehouse:
A data warehouse stores data derived from different sources in an organization, mainly to generate the information needed for decision making.

According to the structure of the data:
Databases can also be classified to reflect the degree to which the data are structured. They are divided into:
i) Unstructured Database
ii) Structured Database
iii) Semi Structured Database

Unstructured Database:
It contains data in their original (raw) state, i.e., in the format in which they were collected.

Structured Database:
Structured data are the result of taking unstructured data and formatting them to facilitate storage, use and the generation of information (processed raw data).

Semi Structured Database:
Semi structured data are data that have already been processed to some extent.

XML Databases:
Unstructured and semi structured data storage and management needs are being addressed through a new generation of databases known as XML databases. An XML database supports the storage and management of semi structured XML data.

XML (Extensible Markup Language):
It is a special language used to represent and manipulate data elements in a textual format.
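To make the idea of semi structured XML data more concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The document content (customers, codes, names) is invented for this example. Note that the two customer elements do not carry the same set of fields, which is typical of semi structured data.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML text: data elements represented in a textual format.
XML_TEXT = """
<customers>
  <customer code="10010">
    <name>Alfred Ramas</name>
    <phone>555-0101</phone>
  </customer>
  <customer code="10011">
    <name>Leona Dunne</name>
    <email>leona@example.com</email>
  </customer>
</customers>
"""

root = ET.fromstring(XML_TEXT)

# Turn each <customer> element into a dictionary of whatever fields it has.
customers = []
for element in root.findall("customer"):
    record = {"code": element.get("code")}
    for child in element:
        record[child.tag] = child.text
    customers.append(record)

for record in customers:
    print(record)
```

Each record describes itself through its tags, so a missing phone or an extra email does not break the structure the way a missing column would in a fixed-format file.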


Various components of the Database Environment:
The term database system refers to an organization of components that define and regulate the collection, storage, management and use of data within a database environment. From a general management point of view, the database system is composed of the major components shown in the figure.

The database system environment has the following components:
1. Hardware
2. Software
3. People
4. Procedures
5. Data

Hardware:
Hardware refers to all of the system’s physical devices, for example computers (microcomputers, workstations, servers and mainframe computers), storage devices, printers, network devices (hubs, routers, switches, fiber optics) and other devices.

Software:
A collection of programs, instructions or commands that performs a task is called software. To make the database system function fully, three types of software are needed:

1. Operating System (OS) Software
2. DBMS Software
3. Application Programs and Utility Software

Operating System Software:
It manages all hardware components and makes it possible for all other software to run on the computers. Eg: Microsoft Windows, Linux, Unix etc.

DBMS Software:
It manages the database within the database system. Eg: Microsoft SQL Server, Oracle Corporation’s Oracle, IBM’s DB2 etc.

Application Programs and Utility Software:
These are used to access and manipulate data in the database. They are also used for generating reports, tabulations and other information.

People:
This component includes all users of the database system. On the basis of primary job function, five types of users can be identified in a database system:

1. System Administrators
2. Database Administrators
3. Database Designers
4. System Analysts and Programmers
5. End Users

System Administrators:
They supervise the database system’s general operations.

Database Administrators:
They are also known as DBAs. They manage the DBMS and ensure that the database is functioning properly.

Database Designers:
They design the database structure of the organization.

System Analysts and Programmers:
They design and implement the application programs. They design and create the data entry screens (forms), reports and procedures through which end users access and manipulate the data in a database.

End Users:
These are the people who use the application programs to run the organization’s daily operations.

Procedures:
These are the instructions and rules that govern the design and use of the database system.

Data:
The word data indicates the collection of facts stored in the database. Data are the raw material from which information is generated.


Functions of DBMS

A DBMS performs several functions that help guarantee the integrity and consistency of the data in the database. They are:

1. Data Dictionary Management
2. Data Storage Management
3. Data Transformation and Presentation
4. Security Management
5. Multi User Access Control
6. Backup and Recovery Management
7. Data Integrity Management
8. Database Access Languages and Application Programming Interfaces
9. Database Communication Interfaces

Data Dictionary Management:
The DBMS stores definitions of the data elements and their relationships in a data dictionary. Any changes made in a database structure are automatically recorded in the data dictionary.

Data Storage Management:
The DBMS creates and manages the complex structures required for data storage. A modern DBMS (MDBMS) provides storage not only for the data, but also for related data entry forms, report definitions, data validation rules, procedural code, structures to handle video and picture formats, and so on.

Data Transformation and Presentation:
The DBMS transforms entered data into the required data structures. The DBMS formats the physically retrieved data to make them conform to the user’s logical expectations.

Security Management:
The DBMS creates a security system that enforces user security. Security rules determine which users can access the database, which data items each user can access, and which data operations (read, add, delete or modify) the user can perform. This is especially important in multi user database systems. All database users may be authenticated to the DBMS through a user name and password.

Multi User Access Control:
The DBMS allows multiple users to access the database at the same time, while ensuring that their concurrent operations do not compromise the integrity of the data.

Backup and Recovery Management:
The DBMS provides backup and data recovery to ensure data safety and integrity.

Backup means maintaining a duplicate copy of the original database, that is, copying the database to another system or to an external secondary storage device.

Recovery management deals with the recovery of the database after a failure such as a bad sector on the disk, a virus attack or a power failure.

Data Integrity Management:
The DBMS promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency. The data relationships stored in the data dictionary are used to enforce data integrity.
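As an illustration of data integrity management, the sketch below lets the DBMS itself reject rows that break the declared rules. It assumes SQLite via Python's sqlite3 module; the tables agent and customer, their columns, and the helper try_insert are hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE customer (
        cus_code   INTEGER PRIMARY KEY,
        cus_name   TEXT NOT NULL,
        agent_code INTEGER NOT NULL REFERENCES agent (agent_code)
    )
    """
)
conn.execute("INSERT INTO agent VALUES (501, 'Alex Alby')")
conn.execute("INSERT INTO customer VALUES (10010, 'Alfred Ramas', 501)")


def try_insert(sql):
    """Return True if the DBMS accepts the row, False if a rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Refers to an agent that does not exist: rejected by the relationship rule.
missing_agent = try_insert("INSERT INTO customer VALUES (10011, 'Leona Dunne', 999)")

# Leaves a required data element empty: rejected by the NOT NULL rule.
missing_name = try_insert("INSERT INTO customer VALUES (10012, NULL, 501)")

print(missing_agent, missing_name)
```

The rules are declared once, in the table definitions, and apply to every program that uses the database, so no application has to repeat the checks.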


Database Access Languages and Application Programming Interfaces:

1. The DBMS provides data access through a query language.
2. A query language is non-procedural, that is, users specify what must be done without having to specify how it is to be done.
3. SQL is used to access the data in a database through the DBMS.
4. The DBMS also provides application programming interfaces to procedural languages such as COBOL, C, C++, Java, VB.NET etc.

Database Communication Interfaces:
A current-generation DBMS can accept end-user requests via multiple, different network environments. For example, the DBMS might provide access to the database via the internet through the use of web browsers.
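The difference between a non-procedural query language and a procedural language, noted under Database Access Languages above, can be sketched as follows. The example assumes SQLite via Python's sqlite3 module, and the sales table and its contents are invented. Both halves compute the total sales per agent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (agent TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Alby", 100.0), ("Hahn", 250.0), ("Alby", 50.0)],
)

# Non-procedural: state WHAT is wanted; the DBMS decides how to compute it.
declarative = dict(
    conn.execute("SELECT agent, SUM(amount) FROM sales GROUP BY agent").fetchall()
)

# Procedural: spell out HOW, step by step, in the host language.
procedural = {}
for agent, amount in conn.execute("SELECT agent, amount FROM sales"):
    procedural[agent] = procedural.get(agent, 0.0) + amount

print(declarative == procedural)
```

The second half also shows the application programming interface at work: a procedural language (here Python) sends SQL to the DBMS and processes the rows that come back.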

Disadvantages of Database Approach

Although the database system yields considerable advantages over previous data management approaches, database systems do carry significant disadvantages. Some of them are:

1. Increased Costs
2. Management Complexity
3. Maintaining Currency
4. Vendor Dependency
5. Frequent Upgrade or Replacement Cycles

Increased Costs:
Database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software and personnel required to operate and manage a database system can be substantial.

Management Complexity:
The changes introduced by the adoption of a database system must be properly managed to ensure that they help advance the company’s objectives.

Maintaining Currency:
To maximize the efficiency of the database system, one must keep the system current. Because database technology advances rapidly, one must perform frequent updates and apply the latest patches and security measures to all components. Personnel training costs tend to be significant.

Vendor Dependency:
Given the heavy investment in technology and personnel training, companies might be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers, and those customers might be limited in their choice of database system components.

Frequent Upgrade or Replacement Cycles:
DBMS vendors upgrade their products by adding new functionality. Such new features often come bundled in new upgrade versions of the software. Some of these versions require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs money to train database users and administrators to properly use and manage the new features.


Why is database design important?

1. Database design refers to the activities that focus on the design of the database structure that will be used to store and manage end-user data.

2. Proper database design requires the designer to identify the database’s expected use.

3. Databases can be designed to be used in centralized, distributed, single or multi user environments. Each approach requires different database designs.

4. A well designed database facilitates data management and generates accurate and valuable information.

5. A poorly designed database leads to bad decision making and bad decision making can lead to the failure of an organization.


Chapter – 2

Data Models

Data Modeling and Data Models
Database design focuses on how the database structure will be used to store and manage end-user data.

A data model is a relatively simple representation, usually graphical, of more complex real-world data structures. In general terms, a model is an abstraction of a more complex real-world object or event.

Within the database environment, a data model represents data structures and their characteristics, relations, constraints, transformations and other constructs with the purpose of supporting a specific problem domain.

Data Models
A model is a representation of reality: ‘real world’ objects and events, and their associations. It is an abstraction that concentrates on the essential, inherent aspects of an organization.

A data model can be defined as an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization.

The importance of Data Models
1. Data models can facilitate interaction among the designer, application programmers and the end users.

2. A well-developed data model can even foster improved understanding of the organization for which the database design is developed.

3. Data constitute the most basic information units employed by a system. Applications are created to manage data and to help transform data into information.

4. Data are viewed in different ways by different people. For example, contrast the (data) view of a company manager with that of a company clerk. Although the manager and the clerk both work for the same company, the manager is more likely to have an enterprise-wide view of company data than the clerk.

5. Application programmers have yet another view of data, being more concerned with data location, formatting and specific reporting requirements. Basically, application programmers translate company policies and procedures from a variety of sources into appropriate interfaces (forms, tables), reports and queries.

6. When a good database blueprint is available, the database approach is much easier to implement.

Data Model Basic Building Blocks
The basic building blocks of all data models are:

1. Entities
2. Attributes
3. Relationships
4. Constraints

1. Entity:
An entity is anything (a person, a place, a device or an event) about which data are to be collected and stored. Entities may be physical objects, such as customers or products, but entities may also be abstractions, such as flight routes or musical concerts.

2. Attribute:
An attribute is a characteristic of an entity. For example, a STUDENT entity would be described by attributes such as Rno, Sname, Address, etc.

Attributes are the equivalent of fields in file systems.

3. Relationship:
A relationship describes an association among entities. For example, a relationship exists between students and courses that can be described as follows: a student can join only one course, and a course can have many students.

Types of Relationships:
1. One to One (1..1 / 1:1) relationships
2. One to Many (1..* / 1:M) relationships
3. Many to Many (*..* / M:N) relationships

One to Many Relationship [1:M]
A painter paints many different paintings, but each one of them is painted by only one painter. Thus, the painter (the “one”) is related to the paintings (the “many”). Therefore database designers label the relationship “PAINTER paints PAINTING” as 1:M.

One to One Relationship [1:1]
Each patient is assigned a single bed, and each bed is assigned to only one patient. Therefore, the relationship “PATIENT is assigned BED” is labeled 1:1.

Many to Many Relationship [M:N]
An employee may learn many job skills, and each job skill may be learned by many employees. Database designers label the relationship “EMPLOYEE learns SKILL” as M:N.

4. Constraints:
A constraint is a rule or a restriction placed on the data. Constraints are important because they help to ensure data integrity. Constraints are normally expressed in the form of rules. For example:

1. Employee name should be in capital letters.
2. Employee number should start with a letter.
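The four building blocks above can be sketched in a small schema. This is only one possible rendering, assuming SQLite via Python's sqlite3 module: entities become tables, attributes become columns, the 1:M relationship “PAINTER paints PAINTING” becomes a foreign key, and the two example constraints become CHECK rules. All table and column names, the sample rows, and the helper accepted are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES
conn.executescript(
    """
    CREATE TABLE painter (                 -- entity
        painter_id   INTEGER PRIMARY KEY,  -- attribute
        painter_name TEXT NOT NULL         -- attribute
    );
    CREATE TABLE painting (
        painting_id INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        -- 1:M relationship: each painting points to exactly one painter
        painter_id  INTEGER NOT NULL REFERENCES painter (painter_id)
    );
    CREATE TABLE employee (
        -- constraint: employee number starts with a letter
        emp_no   TEXT PRIMARY KEY CHECK (emp_no GLOB '[A-Za-z]*'),
        -- constraint: employee name is in capital letters
        emp_name TEXT NOT NULL CHECK (emp_name = UPPER(emp_name))
    );
    """
)

conn.execute("INSERT INTO painter VALUES (1, 'Painter One')")
conn.execute("INSERT INTO painting VALUES (10, 'First Painting', 1)")
conn.execute("INSERT INTO painting VALUES (11, 'Second Painting', 1)")
paintings_by_painter = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1"
).fetchone()[0]


def accepted(emp_no, emp_name):
    """Return True if the employee row satisfies both constraints."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_no, emp_name))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted("E101", "ANITA RAO"),  # satisfies both rules
    accepted("102", "RAVI KUMAR"),  # number does not start with a letter
    accepted("E103", "Ravi Kumar"),  # name is not in capital letters
]
print(paintings_by_painter, results)
```

An M:N relationship such as “EMPLOYEE learns SKILL” would typically need a third table holding pairs of employee and skill identifiers; that step is left out to keep the sketch short.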

Business Rules
A Business Rule is a brief, precise description of a policy, procedure or principle within a specific organization.

Business Rules can apply to any organization, large or small: a business, a government unit or a research laboratory that stores and uses data to generate information.

Properly written business rules are used to define entities, attributes, relationships and constraints. For example:

1. An Agent can serve many customers, and each customer can be served by only one Agent, i.e., AGENT serves CUSTOMER.
2. A customer may generate many invoices. An invoice is generated by only one customer.
3. A training session cannot be scheduled for fewer than ten employees or for more than 30 employees.

Discovering Business Rules:

The main sources of business rules are company managers, policy makers, department managers and written documentation such as a company’s procedures, standards or operation manuals. A faster and more direct source of business rules is direct interviews with end users.

Advantages of discovering business rules:
1. They help standardize the company’s view of data.
2. They are a communication tool between users and designers.
3. They allow the designer to understand the nature, role and scope of the data.
4. They allow the designer to understand business processes.
5. They allow the designer to develop appropriate relationship participation rules and constraints and to create an accurate data model.

Translating Business Rules into Data Model Components:
Business rules set the stage for the proper identification of entities, attributes, relationships and constraints. As a general rule, a noun in a business rule will translate into an entity in the data model, and a verb (active or passive) associating nouns will translate into a relationship among the entities.

Eg: The business rule “A customer may generate many invoices” contains two nouns (customer, invoices) and a verb (generate) that associates the nouns.

Evolution of Data Models

The quest for better data management has led to several different models that attempt to resolve the file system’s critical shortcomings.

Generation | Time | Model | Examples | Comments
First | 1960s-1970s | File System | VMS/VSAM | Used mainly on IBM mainframe systems. Managed records, not relationships.
Second | 1970s | Hierarchical and Network Data Model | IMS, ADABAS, IDS-II | Early database systems. Navigational access.
Third | Mid-1970s to present | Relational Data Model | DB2, Oracle, MS SQL Server, MySQL | Conceptual simplicity. Entity Relationship modeling and support for relational data modeling.
Fourth | Mid-1980s to present | Object Oriented, Extended Relational | Versant, FastObjects.Net, Objectivity/DB, DB2 UDB, Oracle 10g | Support complex data. Extended relational products support objects and data warehousing.
Next Generation | Present to Future | XML | dbXML, Tamino, DB2 UDB, Oracle 10g, MS SQL Server | Organization and management of unstructured data. Relational and object models add support for XML documents.

Hierarchical Data Models:
The Hierarchical model was developed in the 1960s to manage large amounts of data for complex manufacturing projects such as the Apollo Rocket that landed on the moon in 1969.

1. Its basic structure is represented by an upside down tree.
2. The Hierarchical structure contains levels or segments.
3. A segment is the equivalent of a file system’s record type.
4. Within the hierarchy, the top layer (the root, level 0) is perceived as the parent of the segment directly beneath it.
5. In the above figure, the root segment is the parent of the level 1 segments, which in turn are the parents of the level 2 segments, etc.
6. The segments below other segments are the children of the segment above.
7. In short, the hierarchical model shows a set of One to Many relationships between a parent and its children segments.
8. Each parent can have many children, but each child has only one parent.
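The parent/child structure described in the points above can be sketched as a small tree in Python. This is only an illustration under stated assumptions: the class name Segment and the segment names (FINAL_ASSEMBLY, COMPONENT_A, and so on) are hypothetical and are not taken from the figure.

```python
class Segment:
    """One node of the upside-down tree; it has at most one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a single parent, or None for the root
        self.children = []        # a parent may have many children (1:M)
        if parent is not None:
            parent.children.append(self)

    def level(self):
        # The root is level 0; each step down the tree adds one level.
        return 0 if self.parent is None else self.parent.level() + 1

    def path_from_root(self):
        # A segment is reached by walking the path that starts at the root.
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))


# Hypothetical segments for a manufacturing example.
root = Segment("FINAL_ASSEMBLY")
comp_a = Segment("COMPONENT_A", root)
comp_b = Segment("COMPONENT_B", root)
part = Segment("PART_A1", comp_a)

print(part.level(), part.path_from_root())
```

Because each Segment stores exactly one parent, a child with two parents (a multiparent or M:N relationship) cannot be represented directly in this structure.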

Advantages:
1. It promotes data sharing
2. Parent Child relationship promotes conceptual simplicity
3. Parent Child relationship promotes data integrity
4. It is efficient with 1:M relationships

Disadvantages:
1. It is complex to implement
2. Complex application development and management
3. There are no Many to Many relationships


Network Data Models:
The network data model was created to represent complex data relationships more efficiently than the hierarchical model.

The network model was also intended to improve database performance and to impose a database standard.

To establish database standards, the Conference on Data Systems Languages (CODASYL) created the Database Task Group (DBTG) in the late 1960s.

The DBTG was charged to define standard specifications for an environment that would facilitate database creation and data manipulation. The final DBTG report contained specifications for three crucial database components:

o The Schema, which is the conceptual organization of the entire database as viewed by the database administrator.

o The Subschema, which defines the portion of the database “seen” by the application programs that actually produce the desired information from the data contained within the database.

o A data management language (DML), which defines the environment in which data can be managed.

To produce the desired standardization for each of the three components, the DBTG specified three distinct DML components:

o A schema data definition language (DDL), which enables the database administrator to define the schema components.

o A subschema DDL, which allows the application programs to define the database components that will be used by the application.

o A data manipulation language to work with the data in the database.

The network database is a collection of records in 1:M relationships. However, unlike the hierarchical model, the network model allows a record to have more than one parent, which makes it possible to represent Many to Many relationships.

In network database terminology, a relationship is called a set.

Each set is composed of at least two record types: an owner record [parent] and a member record [child].

A set represents a 1:M relationship between the owner and the member. An example of such a relationship is shown in the figure.


The above figure illustrates a network data model for a typical sales organization. In this model CUSTOMER, SALESREP, INVOICE, INV_LINE, PAYMENT and PRODUCT represent record types. INVOICE is owned by both SALESREP and CUSTOMER. Similarly, INV_LINE has two owners, PRODUCT and INVOICE. The network model can also include one-owner relationships such as CUSTOMER makes PAYMENT.

Advantages:
1. It handles more relationship types such as M:N and multi parent
2. Data access is more flexible than the hierarchical model
3. Data owner and member relationship promotes data integrity
4. It has database standards

Disadvantages:
1. System complexity limits efficiency
2. The lack of ad hoc query capability
3. Limited data independence
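The owner/member sets of the sales organization example can be sketched in Python as follows. This is a rough illustration, not DBTG syntax: the class NetworkDB, the set names (SALES, COMMISSION, INVLINE, PRODLINE, PAYS) and the record occurrences are all assumptions made here; only the record type names come from the notes.

```python
from collections import defaultdict


class NetworkDB:
    def __init__(self):
        # set name -> (owner record type, member record type)
        self.sets = {}
        # (set name, owner occurrence) -> list of member occurrences
        self.members = defaultdict(list)

    def define_set(self, set_name, owner_type, member_type):
        # Each set is a 1:M relationship between one owner type and one member type.
        self.sets[set_name] = (owner_type, member_type)

    def connect(self, set_name, owner, member):
        self.members[(set_name, owner)].append(member)

    def owners_of(self, member_type):
        # Record types that own the given member type through some set.
        return sorted(o for o, m in self.sets.values() if m == member_type)


db = NetworkDB()
db.define_set("SALES", "CUSTOMER", "INVOICE")
db.define_set("COMMISSION", "SALESREP", "INVOICE")
db.define_set("INVLINE", "INVOICE", "INV_LINE")
db.define_set("PRODLINE", "PRODUCT", "INV_LINE")
db.define_set("PAYS", "CUSTOMER", "PAYMENT")

# Hypothetical record occurrences.
db.connect("SALES", "cust-1", "inv-1001")
db.connect("SALES", "cust-1", "inv-1002")
db.connect("COMMISSION", "rep-7", "inv-1001")

print(db.owners_of("INVOICE"))
```

A member record type that appears in two sets (INVOICE, INV_LINE) has two owners, which is what the hierarchical model cannot express.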

Relational Data Model:
The relational model was introduced by E. F. Codd in 1970.

The relational model’s foundation is a mathematical concept known as a relation.

A relation can be thought of as a table, which consists of rows and columns.

Each row in a relation (table) is called a “tuple” (record, instance or occurrence).

Each column represents an attribute (a field or feature of an entity).

A relation (table) represents an entity (object).

The relational data model is implemented through an RDBMS.

Some relational database software products are Oracle, DB2, SQL Server, MySQL and MS-Access.

The relational data model is easier to understand and implement than the hierarchical and network models.

Tables are related to each other through the sharing of a common attribute. For example, the CUSTOMER table in the figure might contain a sales agent’s number that is also contained in the AGENT table.


The common column (Agent_Code) between the CUSTOMER and AGENT tables enables you to match each customer to his or her sales agent even though the customer data are stored in one table and the sales agent data are stored in another table.

The relationship type (1:1, 1:M, N:M) is often shown in a relational schema as shown in the figure.
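Matching customers to agents through the shared Agent_Code column can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: apart from Agent_Code, the column names (Agent_Name, Cust_Code, Cust_Name) and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AGENT (
    Agent_Code INTEGER PRIMARY KEY,
    Agent_Name TEXT NOT NULL
);
CREATE TABLE CUSTOMER (
    Cust_Code  INTEGER PRIMARY KEY,
    Cust_Name  TEXT NOT NULL,
    Agent_Code INTEGER REFERENCES AGENT (Agent_Code)  -- the common attribute
);
INSERT INTO AGENT VALUES (501, 'Alby'), (502, 'Hahn');
INSERT INTO CUSTOMER VALUES (10010, 'Ramas', 502),
                            (10011, 'Dunne', 501),
                            (10012, 'Smith', 502);
""")

# The query states WHAT is wanted (each customer with the agent's name);
# the RDBMS decides HOW to find the matching rows.
rows = conn.execute("""
    SELECT c.Cust_Name, a.Agent_Name
    FROM CUSTOMER AS c
    JOIN AGENT AS a ON a.Agent_Code = c.Agent_Code
    ORDER BY c.Cust_Code
""").fetchall()
print(rows)
```

Agent 502 appears in two CUSTOMER rows while each customer has only one agent, so the sample data form a 1:M relationship between AGENT and CUSTOMER.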

Relational Diagram:
It is a representation of the relational database’s entities, the attributes within those entities and the relationships between those entities.
1. The above example uses a 1:M relationship, and the diagram was produced using the MS-Access software.
2. A major reason for the relational data model’s success is its powerful and flexible query language.
3. For most relational database software, the query language is SQL, which allows the user to specify what must be done without specifying how it must be done.
4. The RDBMS uses SQL to translate user queries into instructions for retrieving the requested data.


Advantages:
1. Structural independence, i.e., changes in a table structure do not affect data access or application programs
2. Tabular view of data improves conceptual simplicity and makes database design, implementation, maintenance and use easier
3. Ad hoc query capability is based on SQL

Disadvantages:
1. The RDBMS requires substantial hardware and system software overhead
2. Untrained people can misuse the data of a database, which leads to some anomalies.

E-R Data Model:

Peter Chen first introduced the E-R Data Model in 1976. It is a graphical representation of entities and their relationships in a database structure. E-R models are normally represented in an entity relationship diagram (ERD), which uses graphical representations to model database components.

Components of E-R Model:
The E-R model is based on the following components:
1. Entity
2. Attribute
3. Relationships

Entity:
An entity is an object, such as a person, a place, a device or an event, about which data are to be collected and stored.
1. Entity names should be specified in capital letters.
2. An entity name should be a noun in singular form.
3. An entity is represented in the ERD by a rectangle, also known as an Entity Box.
4. The entity name is written in the centre of the rectangle.

Eg: CUSTOMER, STUDENT, EMPLOYEE etc.

Usually, when applying the ERD to the relational model, an entity is mapped to a relational table. Each row in the relational table is known as an entity instance or entity occurrence in the E-R Model.

Attribute:
Attributes describe the characteristics or features of an entity. For example, a STUDENT entity would be described by attributes such as Stu_No, Stu_Name, Stu_Addr, etc.

Relationships:
A relationship describes an association among entities. Most relationships describe associations between two entities. Relationships are divided into three types:

1. One to Many [1:M]
2. Many to Many [M:N]
3. One to One [1:1]

The name of the relationship is usually an active or passive verb. For example:

1. A PAINTER paints many PAINTINGs
2. An EMPLOYEE learns many SKILLs
3. An EMPLOYEE manages a STORE

The figures below show the different types of relationships using two E-R notations:
1. The original Chen Notation
2. Crow’s Foot Notation

Chen Notation:
1. The Chen notation is based on Peter Chen’s landmark paper.
2. In this notation, the connectivities are written next to each entity box.
3. Relationships are represented by a diamond connected to the related entities through a relationship line.
4. The relationship name is written inside the diamond.

Crow’s Foot Notation:
1. The name Crow’s Foot is derived from the ‘3-pronged’ symbol used to represent the many side of the relationship.
2. The connectivities are represented by symbols. For example, the “1” is represented by a short line segment (|) and the “Many” is represented by the ‘3-pronged’ crow’s foot symbol.
3. The relationship name is written above the relationship line.

Advantages:
1. Visual model yields conceptual simplicity
2. Visual representation makes it an effective communication tool
3. It is integrated with the relational model

Disadvantages:
1. There is limited constraint representation
2. There is limited relationship representation
3. Loss of information content occurs when attributes are removed from entities to avoid crowded display.


Object Oriented Data Model:
In the Object Oriented Data Model (OODM), both data and their relationships are combined in a single structure known as an object. The OODM is the basis for the Object Oriented Data Base Management System (OODBMS).

An object includes information about its relationships with other objects. An object also contains all operations that can be performed on it, such as changing its data values, finding a specific data value, printing data values, etc.

In short, an object includes data, various types of relationships and operational procedures.

Components of OODM:
The OODM is based on the following five components:
1. Object
2. Attributes
3. Class
4. Class Hierarchy
5. Inheritance

Object:
An object is an abstraction of a real world entity. That is, an object may be considered equivalent to an E-R model’s entity. Eg: STUDENT, EMPLOYEE etc.

Attribute:
Attributes describe the properties of an object. Eg: A CUSTOMER object includes the attributes Cust_ID, Cust_Name, Cust_Phone etc.

Class:
1. A class is a collection of similar objects with shared structure (attributes) and behaviour (methods).
2. A class contains a set of procedures known as methods.
3. A class’s methods represent real world actions such as:
   a. Finding a selected Person’s name
   b. Changing a Person’s name
   c. Printing a Person’s address

Class Hierarchy:
Classes are organized in a class hierarchy. The class hierarchy resembles an upside down tree in which each class has only one parent. For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class.

Inheritance:
Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it.

Eg: Two classes CUSTOMER and EMPLOYEE can be created as subclasses from the super class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON.

OODMs are represented using Unified Modeling Language (UML) class diagrams. UML class diagrams are used to represent data and their relationships within the larger UML object oriented system.
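The PERSON, CUSTOMER and EMPLOYEE class hierarchy described above can be sketched in Python. The attribute and method names used here (name, address, cust_id, emp_no, change_name, address_label) are assumptions chosen for illustration; the notes only name the classes and the kinds of methods.

```python
class Person:
    """Super class: attributes and methods shared by all persons."""

    def __init__(self, name, address):
        self.name = name
        self.address = address

    def change_name(self, new_name):
        # Method: changing a Person's name.
        self.name = new_name

    def address_label(self):
        # Method: producing a Person's address for printing.
        return f"{self.name}, {self.address}"


class Customer(Person):
    """Subclass: inherits everything from Person and adds cust_id."""

    def __init__(self, name, address, cust_id):
        super().__init__(name, address)
        self.cust_id = cust_id


class Employee(Person):
    """Subclass: inherits everything from Person and adds emp_no."""

    def __init__(self, name, address, emp_no):
        super().__init__(name, address)
        self.emp_no = emp_no


# Hypothetical object; change_name and address_label come from Person.
c = Customer("Anita", "12 Lake Road", cust_id=1001)
c.change_name("Anita Rao")
print(c.address_label())
```

Neither Customer nor Employee defines change_name or address_label itself; both obtain them through inheritance from the single parent class.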


Eg: To illustrate the main concepts of the OODM, let us use a simple invoicing problem. In this case, invoices are generated by CUSTOMERs, each invoice references one or more lines, and each line represents one item purchased by a CUSTOMER.

The figure below illustrates the object representation for this simple invoicing problem, as well as the equivalent UML diagram and E-R model.

1. The object representation of the INVOICE includes all related objects within the same object box. The connectivities (1 and M) indicate the relationship of the related objects to the INVOICE. For example, the “1” next to the CUSTOMER object indicates that each INVOICE is related to only one CUSTOMER. The “M” next to the LINE object indicates that each INVOICE contains many LINEs.

2. The UML class diagram uses three separate object classes (CUSTOMER, INVOICE and LINE) and two relationships to represent this simple invoicing problem. The relationship connectivities are represented by the 1..1 (one and only one), 0..* (zero or many) and 1..* (one or many) symbols.

3. The E-R model also uses three separate entities and two relationships to represent this simple invoicing problem.

Advantages:
1. Visual representation
2. Inheritance promotes data integrity

Disadvantages:
1. High system overhead slows transactions
2. It is a complex navigational system

Data Models: A summary
In the evolution of data models, there are some common characteristics that data models must have in order to be widely accepted:

A data model must show some degree of conceptual simplicity without compromising the semantic completeness of the database.

A data model must represent the real world as closely as possible.

Representation of the real world transformations (behaviour) must be in compliance with the consistency and integrity characteristics of any data model.

Each new data model addresses the shortcomings of previous models. The network model replaced the hierarchical model because the former made it much easier to represent complex (many to many) relationships. In turn, the relational model offers several advantages over the hierarchical and network models through its simpler data representation, superior data independence, and easy to use query language.


Advantages and Disadvantages of various Database Models

Hierarchical
Data Independence: Yes
Structural Independence: No
Advantages:
1. It promotes data sharing.
2. Parent/Child relationship promotes conceptual simplicity.
3. Database security is provided and enforced by DBMS.
4. Parent/Child relationship promotes data integrity.
5. It is efficient with 1:M relationships.
Disadvantages:
1. Complex implementation requires knowledge of physical data storage characteristics.
2. Complex application development, management and use.
3. Changes in structure require changes in all application programs.
4. There are implementation limitations (no multiparent or M:N relationships).
5. No data definition and manipulation language in the DBMS.
6. There is a lack of standards.

Network
Data Independence: Yes
Structural Independence: No
Advantages:
1. Conceptual simplicity is at least equal to that of the hierarchical model.
2. It handles more relationship types, such as M:N and multiparent.
3. Data access is more flexible than in hierarchical and file system models.
4. Data owner/member relationship promotes data integrity.
5. It includes data definition and manipulation languages in DBMS.
6. There is conformance to standards.
Disadvantages:
1. System complexity limits efficiency; it is still a navigational system.
2. Navigational system yields complex implementation, application development, and management.
3. Structural changes require changes in all application programs.

Relational
Data Independence: Yes
Structural Independence: Yes
Advantages:
1. Structural independence is promoted by the use of independent tables. Changes in a table’s structure do not affect data access or application programs.
2. Tabular view substantially improves conceptual design, implementation, management, and use.
3. Ad hoc query capability is based on SQL.
4. Powerful RDBMS isolates the end user from physical level details and improves implementation and management simplicity.
Disadvantages:
1. The RDBMS requires substantial hardware and system software overhead.
2. Conceptual simplicity gives relatively untrained people the tools to use a good system poorly and, if unchecked, it may produce the same data anomalies found in file systems.
3. It may promote “islands of information” problems as individuals and departments can easily develop their own applications.

Entity Relationship
Data Independence: Yes
Structural Independence: Yes
Advantages:
1. Visual modeling yields exceptional conceptual simplicity.
2. Visual representation makes it an effective communication tool.
3. It is integrated with the dominant relational model.
Disadvantages:
1. There is limited constraint representation.
2. There is limited relationship representation.
3. There is no data manipulation language.
4. Loss of information content occurs when attributes are removed from entities to avoid crowded displays.

Object oriented
Data Independence: Yes
Structural Independence: Yes
Advantages:
1. Semantic content is added.
2. Visual representation includes semantic content.
3. Inheritance promotes data integrity.
Disadvantages:
1. Slow development of standards caused vendors to supply their own enhancements, thus eliminating a widely accepted standard.
2. It is a complex navigational system.
3. There is a steep learning curve.
4. High system overhead slows transactions.


Basic terminology used by the various data models

Real World | Example | File Processing | Hierarchical Model | Network Model | Relational Model | ER Model | OO Model
A group of vendors | Vendor file cabinet | File | Segment type | Record type | Table | Entity set | Class
A single vendor | Global Supplies | Record | Segment occurrence | Current record | Row (tuple) | Entity occurrence | Object instance
The contact name | Venkateswara | Field | Segment field | Record field | Table attribute | Entity attribute | Object attribute
The vendor identifier | 111 | Index | Sequence field | Record key | Key | Entity identifier | Object identifier


Degrees of Data Abstraction

For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. Since many database-systems users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system:

Physical Level : The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.

Logical Level : The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction.

Figure: The three levels of data abstraction

View Level : The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.


Data Abstraction
1. The major purpose of a database system is to provide users with an abstract view of the system.
   o The system hides certain details of how data are stored, created and maintained.
   o Complexity should be hidden from database users.

2. There are several levels of abstraction:
   1. Physical Level:
      o How the data are stored.
      o E.g. index, B-tree, hashing.
      o Lowest level of abstraction.
      o Complex low-level structures described in detail.
   2. Conceptual Level:
      o Next highest level of abstraction.
      o Describes what data are stored.
      o Describes the relationships among data.
      o Database administrator level.
   3. View Level:
      o Highest level.
      o Describes part of the database for a particular group of users.
      o Can be many different views of a database.
      o E.g. tellers in a bank get a view of customer accounts, but not of payroll data.

Figure illustrates the three levels.
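The bank teller example can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the table, view, index and column names (ACCOUNT, PAYROLL, TELLER_ACCOUNT, idx_account_branch and so on) and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual (logical) level: what data are stored.
CREATE TABLE ACCOUNT (
    Acct_No   INTEGER PRIMARY KEY,
    Cust_Name TEXT NOT NULL,
    Balance   REAL NOT NULL,
    Branch    TEXT NOT NULL
);
CREATE TABLE PAYROLL (
    Emp_No INTEGER PRIMARY KEY,
    Salary REAL NOT NULL
);

-- Physical level: an index changes how rows are located,
-- not what a query returns.
CREATE INDEX idx_account_branch ON ACCOUNT (Branch);

-- View level: tellers see customer accounts, but no payroll data.
CREATE VIEW TELLER_ACCOUNT AS
    SELECT Acct_No, Cust_Name, Balance FROM ACCOUNT;

INSERT INTO ACCOUNT VALUES (1, 'Lakshmi', 2500.0, 'Guntur'),
                           (2, 'Prasad', 120.5, 'Vijayawada');
INSERT INTO PAYROLL VALUES (900, 42000.0);
""")

cur = conn.execute("SELECT * FROM TELLER_ACCOUNT ORDER BY Acct_No")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
print(view_columns)
print(view_rows)
```

Dropping or adding the index would not change what the view returns, which is the practical meaning of hiding the physical level from users.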


A data model comprises three components:
· A structural part, consisting of a set of rules according to which databases can be constructed.
· A manipulative part, defining the types of operation that are allowed on the data (this includes the operations that are used for updating or retrieving data from the database and for changing the structure of the database).
· Possibly a set of integrity rules, which ensures that the data is accurate.

The purpose of a data model is to represent data and to make the data understandable. There have been many data models proposed in the literature. They fall into three broad categories:
· Object Based Data Models
· Physical Data Models
· Record Based Data Models

The object based and record based data models are used to describe data at the conceptual and external levels; the physical data model is used to describe data at the internal level.

Object Based Data Models
Object based data models use concepts such as entities, attributes, and relationships. An entity is a distinct object (a person, place, concept, event) in the organization that is to be represented in the database. An attribute is a property that describes some aspect of the object that we wish to record, and a relationship is an association between entities. Some of the more common types of object based data model are:
· Entity-Relationship
· Object Oriented

The Entity-Relationship model has emerged as one of the main techniques for modeling database design and forms the basis for the database design methodology. The object oriented data model extends the definition of an entity to include not only the attributes that describe the state of the object but also the actions that are associated with the object, that is, its behavior.

Physical Data Models
Physical data models describe how data is stored in the computer, representing information such as record structures, record ordering, and access paths. There are not as many physical data models as logical data models, the most common one being the Unifying Model.

Record Based Logical Models
Record based logical models are used in describing data at the logical and view levels. In contrast to object based data models, they are used to specify the overall logical structure of the database and to provide a higher-level description of the implementation. The three most widely accepted record based data models are:
· Hierarchical Model
· Network Model
· Relational Model


Chapter-3

The Relational Database Model

The database structures required by both the hierarchical and network database models often become complicated enough to diminish efficient database design. The relational data model changed all of that by allowing the designer to focus on the logical representation of the data and its relationships, rather than on the physical storage details.

The relational model enables you to view data logically rather than physically. The practical significance of taking the logical view is that it serves as a reminder of the simple file concept of data storage. Although the use of a table, quite unlike that of a file, has the advantages of structural and data independence, a table does resemble a file from a conceptual point of view. Logical simplicity tends to yield simple and effective database design methodologies.

Tables (relation) and their characteristics

The logical view of the database is facilitated by the creation of data relationships based on a logical construct known as a relation. A table is perceived as a two dimensional structure composed of rows and columns. A table is also called a relation. A table contains a group of related entity occurrences, that is, an entity set. For example, a STUDENT table contains a collection of entity occurrences, each representing a student. The terms entity set and table are often used interchangeably.

Characteristics of a Relational Table:

1. A table is perceived as a two-dimensional structure composed of rows and columns.

2. Each table row (tuple) represents a single entity occurrence within the entity set.

3. Each table column represents an attribute and each column has a distinct name.

4. Each row and column intersection represents a single data value.

5. All values in a column must conform to the same data format.

6. Each column has a specific range of values known as the attribute domain.

7. The order of rows and columns is immaterial to the DBMS.

8. Each table must have an attribute or a combination of attributes that uniquely identifies each row, i.e., each table must have a primary key.

Keys
In the relational model, keys are important because they are used to ensure that each row in a table is uniquely identifiable. They are also used to establish relationships among tables and to ensure the integrity of the data.

A key consists of one or more attributes that determine other attributes. For example, the student number identifies all of the student’s attributes, such as name, date of birth, group of study, etc.


The key’s role is based on the concept of determination. In the context of a database table, the statement “A determines B” indicates that if you know the value of attribute A, you can look up (determine) the value of attribute B.

For example, knowing the Stu_Num in the STUDENT table means that you are able to look up (determine) that student’s name, average, phone number and so on. The shorthand notation for “A determines B” is A → B. If A determines B, C and D, then A → B, C, D.

The principle of determination is very important because it is used in the definition of a central relational database concept known as functional dependence.

The attribute B is functionally dependent on the attribute A if each value in column A determines one and only one value in column B.

The functional dependence definition can be generalized to cover the case in which the determining attribute values occur more than once in a table. Functional dependence can then be defined in this way:

Attribute A determines attribute B (i.e., B is functionally dependent on A) if all of the rows in the table that agree in value for attribute A also agree in value of attribute B.

For example, consider student classification based on hours completed:

Hours Completed | Classification
<30 | Fr
30-59 | So
60-89 | Jr
>=90 | Sr

Therefore, one can write: Stu_hours → Stu_class


But the specific number of hours is not dependent on the classification. The classification(Stu_class) does not determine one and only one value for completed hours (Stu_hours).
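The definition of functional dependence (rows that agree on A must also agree on B) can be checked mechanically on sample data. The sketch below is an illustration only: the function name determines and the sample rows are assumptions, and a check on sample rows can disprove a dependency but cannot prove that it holds for every possible table content.

```python
def determines(rows, a, b):
    """True if every pair of rows that agrees on attribute a also agrees on b."""
    seen = {}
    for row in rows:
        # setdefault stores the first b value met for each a value and
        # returns the stored value for every later row with the same a.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


# Hypothetical STUDENT rows, classified by the hours table above.
students = [
    {"Stu_Num": 1, "Stu_hours": 12, "Stu_class": "Fr"},
    {"Stu_Num": 2, "Stu_hours": 25, "Stu_class": "Fr"},
    {"Stu_Num": 3, "Stu_hours": 45, "Stu_class": "So"},
    {"Stu_Num": 4, "Stu_hours": 45, "Stu_class": "So"},
    {"Stu_Num": 5, "Stu_hours": 95, "Stu_class": "Sr"},
]

print(determines(students, "Stu_hours", "Stu_class"))
print(determines(students, "Stu_class", "Stu_hours"))
```

In the sample, the two freshmen have different hours, so the classification does not determine one and only one value for the hours completed.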

A key may be composed of more than one attribute; such a multi-attribute key is known as a composite key. Any attribute that is part of a key is known as a key attribute. For instance, the combination of last name, first name, initial and home phone is very likely to produce unique matches for the remaining attributes.

For example: Stu_lname, Stu_fname, Stu_init, Stu_phone → Stu_hrs, Stu_class

With the possible existence of a composite key, the notion of functional dependence can be further refined by specifying full functional dependence:

If the attribute (B) is functionally dependent on a composite key (A) but not on any subset of that composite key, the attribute (B) is fully functionally dependent on (A).

Within the broad key classification, several specialized keys can be defined. For example, a super key is any key that uniquely identifies each row. In short, the super key functionally determines all of a row’s attributes. In the STUDENT table, the superkey could be any of the following:

Stu_Num
Stu_Num, Stu_Lname
Stu_Num, Stu_Lname, Stu_Init

A relational database can also be represented by a relational schema. A relational schema is a textual representation of the database tables where each table is listed by its name followed by the list of its attributes in parentheses. The primary key attribute(s) is (are) underlined.

STUDENT(Stu_Num, Stu_Lname, Stu_Init, Stu_Gpa)

Relational Database Keys

Key Type | Definition
Super key | An attribute (or combination of attributes) that uniquely identifies each row in a table
Candidate key | A minimal (irreducible) super key. A superkey that does not contain a subset of attributes that is itself a superkey
Primary key | A candidate key selected to uniquely identify all other attribute values in any given row. Cannot contain null entries
Secondary key | An attribute (or combination of attributes) used strictly for data retrieval purposes
Foreign key | An attribute (or combination of attributes) in one table whose values must either match the primary key in another table or be null
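The difference between a super key and a candidate key can be illustrated on sample rows. This is a minimal sketch: the function names is_superkey and candidate_keys and the sample data are assumptions, and the result only describes the given rows, whereas a real key is a rule that must hold for all data the table may ever contain.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows, attributes):
    """Superkeys that contain no smaller superkey (minimal superkeys)."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip any combination that contains a smaller key already found.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_superkey(rows, combo):
                found.append(combo)
    return found


# Hypothetical STUDENT rows.
students = [
    {"Stu_Num": 1, "Stu_Lname": "Rao", "Stu_Init": "K", "Stu_Gpa": 3.2},
    {"Stu_Num": 2, "Stu_Lname": "Rao", "Stu_Init": "K", "Stu_Gpa": 3.2},
    {"Stu_Num": 3, "Stu_Lname": "Devi", "Stu_Init": "S", "Stu_Gpa": 3.9},
]

print(is_superkey(students, ["Stu_Num", "Stu_Lname"]))
print(candidate_keys(students, ["Stu_Num", "Stu_Lname", "Stu_Init", "Stu_Gpa"]))
```

Here (Stu_Num, Stu_Lname) is a super key but not a candidate key, because Stu_Num alone already identifies each row.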

Integrity rules
The relational data model includes several types of constraints, or business rules, whose purpose is to facilitate maintaining the accuracy and integrity of data in the database. The major types of integrity constraints are domain constraints, entity integrity, referential integrity and action assertions.

Domain constraints: A domain is the set of all values that may be assigned to an attribute. A domain definition usually consists of the following components: domain name, meaning, data type, size and allowable values or allowable range. Ex: NOT NULL, CHECK

Entity integrity: The entity integrity rule is designed to assure that every relation has a primary key, and that the data values for that primary key are all valid. It states that “No primary key attribute (or component of a primary key attribute) may be null”.

Referential integrity constraint: A referential integrity constraint is a rule that maintains consistency among the rows of two relations. The rule states that if there is a foreign key in one relation, either each foreign key value must match a primary key value in another relation or the foreign key value must be null.
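The three kinds of constraints can be tried out in a small database. The sketch below uses Python's built-in sqlite3 module only because it is convenient; the table names, column names and data are assumptions made for illustration, and other DBMSs use slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dept (
    dept_code TEXT NOT NULL PRIMARY KEY              -- entity integrity
);
CREATE TABLE student (
    stu_num   INTEGER NOT NULL PRIMARY KEY,
    stu_gpa   REAL CHECK (stu_gpa BETWEEN 0 AND 4),  -- domain constraint
    dept_code TEXT REFERENCES dept (dept_code)       -- referential integrity
);
INSERT INTO dept VALUES ('CS');
INSERT INTO student VALUES (1, 3.5, 'CS');   -- foreign key matches a primary key
INSERT INTO student VALUES (2, 3.0, NULL);   -- a null foreign key is allowed
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement as an integrity violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

For example, rejected("INSERT INTO student VALUES (4, 3.0, 'EE')") is True because no department 'EE' exists, while a row with a null dept_code is accepted.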

Relational set operators: The data in relational tables are of limited value unless the data can be manipulated to generate useful information. Relational algebra defines the theoretical way of manipulating table contents using the eight relational operators: SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT and DIVIDE.

UNION combines all rows from two tables, excluding duplicate rows. The tables must have the same attribute characteristics (the columns and domains must be identical) to be used in the UNION.

INTERSECT yields only the rows that appear in both tables.

DIFFERENCE yields all rows in one table that are not found in the other table, i.e., it subtracts one table from the other.

PRODUCT yields all possible pairs of rows from two tables. It is also known as the Cartesian product.

SELECT, also known as RESTRICT, yields values for all rows found in a table that satisfy a given condition. SELECT yields a horizontal subset of a table.

PROJECT yields all values for selected attributes. In other words, PROJECT yields a vertical subset of a table.
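Because a relation is a set of rows, most of these operators map directly onto Python sets of tuples. The sketch below is illustrative only: the two tables, their contents and the helper names select and project are assumptions, and columns are addressed by position rather than by name to keep it short.

```python
from itertools import product

# Two union-compatible tables with columns (code, name); the rows are hypothetical.
t1 = {(1, "Ann"), (2, "Raj"), (3, "Sita")}
t2 = {(2, "Raj"), (4, "Tom")}

union = t1 | t2                    # all rows, duplicates removed
intersect = t1 & t2                # rows that appear in both tables
difference = t1 - t2               # rows of t1 that are not in t2
cartesian = set(product(t1, t2))   # every pairing of a t1 row with a t2 row


def select(table, predicate):
    """Horizontal subset: keep only the rows that satisfy the condition."""
    return {row for row in table if predicate(row)}


def project(table, *positions):
    """Vertical subset: keep only the chosen columns (duplicates collapse)."""
    return {tuple(row[i] for i in positions) for row in table}
```

For instance, select(t1, lambda row: row[0] > 1) keeps the rows for codes 2 and 3, and project(t1, 1) returns only the name column.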

JOIN allows information to be combined from two or more tables. It allows the use of independent tables linked by common attributes. There are several types of joins.

A natural join links tables by selecting only the rows with common values in their common attributes.

An equijoin links tables on the basis of an equality condition that compares specified columns of each table. The outcome of the equijoin does not eliminate duplicate columns, and the condition or criterion used to join the tables must be explicitly defined. The equijoin takes its name from the equality comparison operator (=) used in the condition. If any other comparison operator is used, the join is called a theta join.

In an outer join, the matched pairs are retained and any unmatched values in the other table are left null.

More specifically, if an outer join is produced for two tables, two scenarios are possible: a left outer join and a right outer join. The outer joins are labeled left and right according to the order in which the tables are listed in the SQL command.
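The join types described above can be compared side by side in SQL. This is a minimal sketch using Python's sqlite3 module; the CUSTOMER and AGENT tables and their rows are hypothetical. Only a left outer join is shown, since a right outer join gives the mirror-image result when the two tables are listed in the opposite order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cus_code INTEGER, cus_name TEXT, agent_code INTEGER);
CREATE TABLE agent    (agent_code INTEGER, agent_phone TEXT);
INSERT INTO customer VALUES (1, 'Ann', 10), (2, 'Raj', 20), (3, 'Sita', 99);
INSERT INTO agent    VALUES (10, '555-0100'), (20, '555-0200'), (30, '555-0300');
""")

# Natural join: matches on the common column and shows it only once.
natural = conn.execute(
    "SELECT * FROM customer NATURAL JOIN agent ORDER BY cus_code").fetchall()

# Equijoin: explicit equality condition; the common column appears twice.
equi = conn.execute(
    "SELECT * FROM customer JOIN agent "
    "ON customer.agent_code = agent.agent_code ORDER BY cus_code").fetchall()

# Left outer join: unmatched customers are kept, with nulls for the agent columns.
left_outer = conn.execute(
    "SELECT cus_name, agent_phone FROM customer LEFT OUTER JOIN agent "
    "ON customer.agent_code = agent.agent_code ORDER BY cus_code").fetchall()
```

Customer 'Sita' has no matching agent, so she is missing from the natural join and the equijoin but appears in the left outer join with a null phone.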

The DIVIDE operation uses one single-column table as the divisor and one two-column table as the dividend. The tables must have a common column. The output of the DIVIDE operation is a single column holding those values of column “a” that appear in the dividend table paired with every value of the common column found in the divisor table.
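DIVIDE is the least intuitive of the eight operators, so a small worked sketch may help. The function name divide and the sample values are assumptions; the dividend is modeled as a set of (a, b) pairs and the divisor as a set of b values.

```python
def divide(dividend, divisor):
    """Return the 'a' values that are paired with every value in the divisor.

    dividend: set of (a, b) pairs, where b is the common column.
    divisor:  set of b values.
    """
    candidates = {a for a, _ in dividend}
    return {a for a in candidates
            if all((a, b) in dividend for b in divisor)}


# Hypothetical data: "B" is paired with 5 only, so it does not qualify.
dividend = {("A", 5), ("A", 9), ("B", 5), ("C", 5), ("C", 9), ("C", 7)}
divisor = {5, 9}
result = divide(dividend, divisor)
```

Here result contains "A" and "C", because only those two values occur with both 5 and 9.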

The Data Dictionary and the System Catalog

The data dictionary provides a detailed description of all tables found within the user/designer-created database. Thus the data dictionary contains at least all of the attribute names and characteristics for each table in the system. The data dictionary contains metadata (data about data). The data dictionary is sometimes described as “the database designer’s database”.

The system catalog can be described as a detailed system dictionary that describes all objects within the database, including data about table names, the table’s creator and creating date, the number of columns in each table, the data type corresponding to each column, index filenames, index creators, authorized users, and access privileges. Because the system catalog contains all required data dictionary information, the terms system catalog and data dictionary are often used interchangeably.

The system catalog is actually a system-created database whose tables store the user/designer-created database characteristics and contents. The system catalog tables can be queried just like any user/designer-created table.
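As one concrete case of querying a catalog, SQLite keeps its catalog in a table named sqlite_master and exposes column details through PRAGMA table_info. The sketch below assumes SQLite only as an example; other DBMSs use different catalog table names, and the STUDENT table and index here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_lname TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_student_lname ON student (stu_lname)")

# sqlite_master is SQLite's catalog table; it is queried like any other table.
objects = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name").fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(student)").fetchall()
column_types = {name: col_type for _, name, col_type, _, _, _ in columns}
```

The objects list then describes both the table and its index, and column_types maps each column name to its declared data type.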

The system catalog automatically produces database documentation. As new tables are added to the database, that documentation also allows the RDBMS to check for and eliminate homonyms and synonyms.

In general terms, homonyms are similar-sounding words with different meanings, such as boar and bore. In a database context, the word homonym indicates the use of the same attribute name to label different attributes.

In a database context, a synonym is the opposite of a homonym and indicates the use of different names to describe the same attribute.

Relationships within the Relational Database

The relationships in a relational database are divided into three types. They are:

1. One to Many (1:M)
2. Many to Many (M:N)
3. One to One (1:1)

One to Many (1:M) relationship

The 1:M relationship is the relational database norm. To see how such a relationship is modeled and implemented, consider the PAINTER paints PAINTING example.

As you examine the PAINTER and PAINTING table contents in the above figure, note the following features:

Each painting is painted by one painter, but each painter could have painted many paintings.

There is only one row in the PAINTER table for any given row in the PAINTING table, but there may be many rows in the PAINTING table for any given row in the PAINTER table.
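A 1:M relationship of this kind is implemented by placing the primary key of the "one" side in the "many" side as a foreign key. The following sketch uses Python's sqlite3 module; the column names other than painter_num, and all the rows, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE painter (
    painter_num   INTEGER PRIMARY KEY,
    painter_lname TEXT
);
CREATE TABLE painting (
    painting_num   INTEGER PRIMARY KEY,
    painting_title TEXT,
    -- Each painting row points to exactly one painter row.
    painter_num    INTEGER NOT NULL REFERENCES painter (painter_num)
);
INSERT INTO painter  VALUES (123, 'Painter A'), (126, 'Painter B');
INSERT INTO painting VALUES (1, 'Sunrise', 123),
                            (2, 'Harbor',  123),
                            (3, 'Orchard', 126);
""")

# Many PAINTING rows may share one painter_num value.
paintings_per_painter = dict(conn.execute(
    "SELECT painter_num, COUNT(*) FROM painting GROUP BY painter_num"))
```

In this sample, painter 123 has two paintings and painter 126 has one, while every painting has exactly one painter.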

The 1:M relationship is found in any database environment.

Many to Many (M:N) Relationship:

A many to many relationship is not supported directly in the relational environment. However, an M:N relationship can be implemented by creating a new entity that is in 1:M relationships with the original entities.

Eg: To explore the M:N relationship, consider a college environment in which each STUDENT can take many CLASSes and each CLASS can contain many STUDENTs.
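One way to sketch the "new entity" approach is a bridge table whose rows each link one student to one class. The table name ENROLL, the column names and the data below are assumptions for illustration, written with Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_lname TEXT);
CREATE TABLE class   (class_code TEXT PRIMARY KEY);
-- ENROLL is the new (bridge) entity: STUDENT 1:M ENROLL and CLASS 1:M ENROLL.
CREATE TABLE enroll (
    stu_num    INTEGER NOT NULL REFERENCES student (stu_num),
    class_code TEXT    NOT NULL REFERENCES class (class_code),
    PRIMARY KEY (stu_num, class_code)
);
INSERT INTO student VALUES (1, 'Rao'), (2, 'Devi');
INSERT INTO class   VALUES ('DB101'), ('OS201');
INSERT INTO enroll  VALUES (1, 'DB101'), (1, 'OS201'), (2, 'DB101');
""")

classes_of_student_1 = [row[0] for row in conn.execute(
    "SELECT class_code FROM enroll WHERE stu_num = 1 ORDER BY class_code")]
students_in_db101 = [row[0] for row in conn.execute(
    "SELECT stu_num FROM enroll WHERE class_code = 'DB101' ORDER BY stu_num")]
```

Student 1 appears in two ENROLL rows and class DB101 appears in two ENROLL rows, which together express the M:N relationship using only 1:M links.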

The E-R model for M:N relationship between STUDENT and CLASS

The features of the above ERM are:

Each CLASS can have many STUDENTs and each STUDENT can take many CLASSes.

There can be many rows in the CLASS table for any given row in the STUDENT table and there can be many rows in the STUDENT table for any given row in the CLASS table. These are shown in the below tables.

One to One Relationship (1:1)

In this relationship, one entity can be related to only one other entity and vice versa. Eg: a professor chairs a department.

A professor can chair only one department and one department can have only one department chair. The entities PROFESSOR and DEPARTMENT thus exhibit a 1:1 relationship.

The basic 1:1 relationship is modeled in figure and its implementation is shown.

The implemented 1:1 relationship between PROFESSOR and DEPARTMENT.

As you examine the above tables, note the following:

The professor is identified through the emp_num.

The 1:1 “professor chairs department” relationship is implemented by having the emp_num foreign key in the department table.

The professor table contains the dept code as the foreign key.
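A possible way to enforce the "chairs" relationship as 1:1 is to make the emp_num foreign key in the department table unique. The UNIQUE constraint is an assumption added here for illustration, as are the column names other than emp_num and the sample rows; the sketch uses Python's sqlite3 module and leaves out the dept code foreign key of the professor table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE professor (emp_num INTEGER PRIMARY KEY, prof_lname TEXT);
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY,
    -- UNIQUE: a professor can chair at most one department.
    emp_num   INTEGER UNIQUE REFERENCES professor (emp_num)
);
INSERT INTO professor  VALUES (1, 'Rao'), (2, 'Devi');
INSERT INTO department VALUES ('CS', 1);
""")


def can_insert(sql):
    """Return True if the DBMS accepts the statement."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this design, a second department chaired by professor 1 is refused, while a department chaired by professor 2 is accepted.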

Indexes

Suppose you want to locate a particular book in a library. Then you use the library’s catalog, which is indexed by title, topic and author. The index (in either a manual or a computer system) points you to the book’s location, thereby making retrieval of the book a simple matter.

Suppose you want to find a topic in a textbook. Then it is much simpler to go to the book’s index, look up the required phrase (for example, ER Model) and read the page references that point you to the appropriate pages.

Index

An index is an orderly arrangement used to logically access rows in a table. It is used to locate a needed item quickly.

Indexes in the relational database environment work like the indexes in the above examples. From a conceptual point of view, an index is composed of an index key and a set of pointers.

Suppose you want to look up all the paintings created by a given painter in the painting database:

Table Name : Painter

Without an index, you must read each row in the PAINTING table and see if the painter_num matches the requested painter. However, if you index the painter table and use the index key painter_num, you merely need to look up the appropriate painter_num in the index and find the matching pointers. The index will resemble the representation shown in the below figure.

As you examine the above figure and compare it to the PAINTING table, note that the first painter_num index key value (123) is found in records 1, 2 and 4 of the PAINTING table. The second index key value (126) is found in records 3 and 5 of the PAINTING table.
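Conceptually, an index key with its set of pointers behaves like a dictionary that maps each key value to the record numbers where it occurs. The sketch below is not how a DBMS stores an index internally; the painting titles, painter numbers and function names are assumptions chosen only to illustrate the lookup.

```python
from collections import defaultdict

# Hypothetical PAINTING rows as (painting_title, painter_num).
# The record number of a row is its position in the list, counting from 1.
painting = [
    ("Title A", 123),
    ("Title B", 123),
    ("Title C", 126),
    ("Title D", 123),
    ("Title E", 126),
]


def build_index(rows, key_position):
    """Map each index key value to the list of record numbers (pointers)."""
    index = defaultdict(list)
    for record_number, row in enumerate(rows, start=1):
        index[row[key_position]].append(record_number)
    return dict(index)


def lookup(rows, index, key):
    """Fetch matching rows through the index instead of scanning every row."""
    return [rows[number - 1] for number in index.get(key, [])]


painter_index = build_index(painting, key_position=1)
```

Looking up painter 126 then follows two pointers directly, instead of comparing painter_num in all five rows.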

Indexes play an important role in DBMSs for the implementation of primary keys. When you define a table’s primary key, the DBMS automatically creates a unique index on the primary key columns you declared.

For example, when you declare cus_code to be the primary key of the customer table, the DBMS automatically creates a unique index on that attribute.

Unique Index:

It is an index in which the index key can have only one pointer value (row) associated with it.

A table can have many indexes, but each index is associated with only one table.

The index key can have multiple attributes (composite index).

Purposes of Indexes:

The following are some of the advantages of indexes:

An index is used to locate a needed item quickly.

An index can be used to retrieve data more efficiently.

Indexes can also be used by a DBMS to retrieve data ordered by a specific attribute or attributes.
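The behaviour of a unique index and a composite index can be observed with a few SQL statements. This is a sketch using Python's sqlite3 module; the index names, the columns other than cus_code and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cus_code INTEGER, cus_lname TEXT, cus_phone TEXT)")
# Unique index: each index key value may point to only one row.
conn.execute("CREATE UNIQUE INDEX idx_cus_code ON customer (cus_code)")
# Composite index: the index key is made up of two attributes.
conn.execute("CREATE INDEX idx_cus_name_phone ON customer (cus_lname, cus_phone)")
conn.execute("INSERT INTO customer VALUES (10010, 'Rao', '555-0100')")

try:
    conn.execute("INSERT INTO customer VALUES (10010, 'Devi', '555-0200')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

# PRAGMA index_list returns one row per index; the name is the second field.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(customer)"))
```

The second insert is refused because cus_code 10010 already has a pointer in the unique index, and index_names shows that one table carries both indexes.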

Codd Rules

In 1985, Dr. E.F. Codd published a list of 12 rules to define a relational database system. The reason Dr. Codd published the list was his concern that many vendors were marketing products as “relational” even though those products did not meet minimum relational standards. Dr. Codd’s list, shown in the table below, serves as a frame of reference for what a truly relational database should be. Bear in mind that even the dominant database vendors do not fully support all 12 rules.

Rule. Rule Name: Description

1. Information: All information in a relational database must be logically represented as column values in rows within tables.

2. Guaranteed Access: Every value in a table is guaranteed to be accessible through a combination of table name, primary key value and column name.

3. Systematic Treatment of NULLs: Nulls must be represented and treated in a systematic way, independent of data type.

4. Dynamic On-Line Catalog Based on the Relational Model: The metadata must be stored and managed as ordinary data, i.e., in tables, within the database. Such data must be available to authorized users using the standard database relational language.

5. Comprehensive Data Sublanguage: The relational database may support many languages. However, it must support one well defined, declarative language with support for data definition, view definition, data manipulation (interactive and by program), integrity constraints, authorization and transaction management.

6. View Updating: Any view that is theoretically updatable must be updatable through the system.

7. High Level Insert, Update and Delete: The database must support set level inserts, updates and deletes.

8. Physical Data Independence: Application programs and ad hoc facilities are logically unaffected when physical access methods or storage structures are changed.

9. Logical Data Independence: Application programs and ad hoc facilities are logically unaffected when changes are made to the table structures that preserve the original table values (changing the order of columns or inserting columns).

10. Integrity Independence: All relational integrity constraints must be definable in the relational language and stored in the system catalog, not at the application level.

11. Distribution Independence: The end users and application programs are unaware of and unaffected by the data location (distributed vs local databases).

12. Non Subversion: If the system supports low level access to the data, there must not be a way to bypass the integrity rules of the database.

Rule Zero: All preceding rules are based on the notion that in order for a database to be considered relational, it must use its relational facilities exclusively to manage the database.

UNIT-II

Entity Relationship Modeling [ERM]

Recall from the discussion of the relational database model that the ERM forms the basis of an ERD. The ERD represents the conceptual database as viewed by the end user. ERDs show the database’s main components, such as entities, attributes and relationships. An ERD can be drawn in the Chen notation, the Crow’s Foot notation or the UML notation.

The Chen notation favours conceptual modeling.

The Crow’s Foot notation favours a more implementation-oriented approach.

The UML notation can be used for both conceptual and implementation modeling.

Entities

An entity is something, such as a person, a place, a thing or an event, about which data are to be collected and stored.

An entity actually refers to the “entity set” and not to a single entity occurrence.

In other words, the word entity in the ERM corresponds to a table, not to a row, in the relational environment.

The ERM refers to a table row as an entity instance or entity occurrence.

In both the Chen and Crow’s Foot notations, an entity is represented by a rectangle containing the entity’s name.

The entity name should be a noun in singular form and is usually written in all capital letters in the center of the rectangle box.

Attributes

An attribute is a characteristic or feature of an entity.

Eg: the STUDENT entity includes attributes such as Stu_Fname, Stu_Initial, Stu_Lname, Stu_Dob, etc.

In the original Chen notation, attributes are represented by ovals and are connected to the entity rectangle with a line. Each oval contains the name of the attribute that it represents.

In the Crow’s Foot notation, the attributes are written in the attribute box below the entity rectangle.

Chen notation diagram and crow’s foot notation diagram

Required and Optional Attributes

A required attribute is an attribute that must have a value; in other words, it can’t be left empty or null. Required attributes are shown in boldface in the Crow’s Foot notation. In the above example, Stu_Fname and Stu_Initial are required attributes, so values must be supplied for them.

An optional attribute is an attribute that doesn’t require a value; therefore it can be left empty or null. In the above example, Stu_Lname and Stu_Dob can be left empty.

Domains

Attributes have a domain. A domain is the set of possible values for a given attribute. The domain for the gender attribute consists of only two possibilities: M or F.

Attributes may share a domain.

Identifiers (Primary Keys)

The ERM uses identifiers, that is, one or more attributes that uniquely identify each entity instance. Identifiers are underlined in the ERD.

Composite Identifiers

Ideally, an entity identifier is composed of only a single attribute. However, it is possible to use a composite identifier, that is, a primary key composed of more than one attribute.

Composite and Simple Attributes

An attribute that can be further subdivided to yield additional attributes is known as a composite attribute. For example, the attribute “address” can be subdivided into street, city, state and zip code. An attribute that cannot be subdivided is known as a simple attribute. Eg: age, gender and marital status would be classified as simple attributes.

Single Valued Attributes

An attribute that can have only a single value is known as a single valued attribute. Eg: a person can have only one social security number (SSN) and a manufactured part can have only one serial number.

Multi Valued Attributes

Attributes that can have many values are known as multi valued attributes. Eg: a person may have several college degrees. A household may have several different phones, each with its own number.

In the Chen ERM, multi valued attributes are shown by a double line connecting the attribute to the entity. The Crow’s Foot notation doesn’t identify multi valued attributes.

Multi valued attributes can be represented as follows:

In the above example, car_vid is the primary key and car_color is a multi valued attribute of the CAR entity.

Implementing Multi Valued Attributes

Although the conceptual model can handle M:N relationships and multi valued attributes, you should not implement them in the RDBMS.

In the relational table, each column/row intersection represents a single data value. So if multi valued attributes exist, the designer must decide on one of two possible courses of action:

1. Within the original entity, create several new attributes, one for each of the original multi valued attribute’s components. For example, the CAR entity’s attribute CAR_COLOR can be split to create the new attributes CAR_TOPCOLOR, CAR_BODYCOLOR and CAR_TRIMCOLOR. Although this solution seems to work, its adoption can lead to major structural problems in the table. Imagine how this solution, splitting a multi valued attribute into new attributes, would cause problems when it is applied to an employee entity containing employee degrees and certifications. If some employees have 10 degrees and certifications while most have fewer or none, the number of degree/certification attributes would be 10 and most of those attribute values would be null for most employees.

2. Create a new entity composed of the original multi valued attribute’s components. The new entity (CAR_COLOR) is then related to the original (CAR) entity in a 1:M relationship.
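The second course of action can be sketched as two tables in a 1:M relationship. The column names col_section and col_color, and the sample rows, are assumptions for illustration; the sketch uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE car (car_vid TEXT PRIMARY KEY);
-- One row per colored section, instead of one column per section in CAR.
CREATE TABLE car_color (
    car_vid     TEXT NOT NULL REFERENCES car (car_vid),
    col_section TEXT NOT NULL,      -- e.g. TOP, BODY or TRIM
    col_color   TEXT NOT NULL,
    PRIMARY KEY (car_vid, col_section)
);
INSERT INTO car VALUES ('VID1'), ('VID2');
INSERT INTO car_color VALUES ('VID1', 'TOP',  'White'),
                             ('VID1', 'BODY', 'Blue'),
                             ('VID1', 'TRIM', 'Gold'),
                             ('VID2', 'BODY', 'Red');
""")

colors_of_vid1 = dict(conn.execute(
    "SELECT col_section, col_color FROM car_color WHERE car_vid = 'VID1'"))
sections_per_car = dict(conn.execute(
    "SELECT car_vid, COUNT(*) FROM car_color GROUP BY car_vid"))
```

A car with one color simply has one CAR_COLOR row, so no column is left null, which is the structural problem the first course of action runs into.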

Derived Attributes

A derived attribute is an attribute whose value is calculated (derived) from other attributes. The derived attribute need not be physically stored within the database; instead, it can be derived by using an algorithm. For example, an employee’s age, EMP_AGE, may be found by computing the integer value of the difference between the current date and the EMP_DOB.
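The EMP_AGE calculation can be sketched as a small function. The function name emp_age and its optional today parameter are assumptions added for illustration; the parameter only makes the result repeatable when testing.

```python
from datetime import date


def emp_age(emp_dob, today=None):
    """Return the age in whole years, derived from the date of birth."""
    today = today or date.today()
    # Subtract one year if this year's birthday has not yet occurred.
    had_birthday = (today.month, today.day) >= (emp_dob.month, emp_dob.day)
    return today.year - emp_dob.year - (0 if had_birthday else 1)
```

For example, an employee born on 15 June 1990 is 33 on 14 June 2024 and 34 on 15 June 2024. Because the value changes with the current date, computing it on demand avoids storing an age that goes out of date.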

Derived attributes are sometimes referred to as computed attributes. The decision to store derived attributes in database tables depends on the processing requirements and the constraints placed on a particular application. The advantages and disadvantages of storing (or not storing) derived attributes in the database are shown in the table below.

Relationships

A relationship is an association between entities. The entities that participate in a relationship are also known as participants, and each relationship is identified by a name that describes the relationship.

Relationships between entities always operate in both directions. That is, to define the relationship between the entities named CUSTOMER and INVOICE, you would specify that:

A CUSTOMER may generate many INVOICEs. Each INVOICE is generated by one CUSTOMER.

Connectivity and Cardinality

The term connectivity is used to describe the relationship classification, that is, one to one, one to many and many to many relationships.

Cardinality expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity.

In the ERD, cardinality is indicated by placing the appropriate numbers beside the entities, using the format (x,y). The first value represents the minimum number of associated entities, while the second value represents the maximum number of associated entities.

Knowing the minimum and maximum number of entity occurrences is very useful at the application software level. For example, a college might want to ensure that a class is not taught unless it has at least 10 students enrolled.
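At the application level, a cardinality written as (x,y) can be checked with a simple comparison. In the sketch below the function name is hypothetical, the minimum of 10 comes from the class example above, and the absence of a maximum for class enrollment is an assumption.

```python
def satisfies_cardinality(occurrences, cardinality):
    """Check an occurrence count against a (minimum, maximum) cardinality.

    A maximum of None stands for "many", i.e., no upper limit.
    """
    minimum, maximum = cardinality
    if occurrences < minimum:
        return False
    return maximum is None or occurrences <= maximum


# A class needs at least 10 enrolled students; no upper limit is assumed here.
CLASS_ENROLLMENT = (10, None)
```

A class with 9 students fails the check and one with 10 passes, so the application could refuse to schedule the first.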

As you examine the Crow’s Foot diagram, the cardinality (1,4) written next to the CLASS entity in the “PROFESSOR teaches CLASS” relationship indicates that the PROFESSOR table’s primary key value occurs at least once and no more than four times as foreign key values in the CLASS table. Similarly, the cardinality (1,1) written next to the PROFESSOR entity indicates that each class is taught by one and only one professor. That is, each CLASS entity occurrence is associated with one and only one entity occurrence in PROFESSOR.

Existence Dependence

An entity is said to be existence dependent if it can exist in the database only when it is associated with another related entity occurrence.

Relationship Strength

The concept of relationship strength is based on how the primary key of a related entity is defined. To implement a relationship, the primary key of one entity appears as a foreign key in the related entity. There are times when the foreign key also is a primary key component in the related entity.

Weak (Non-identifying) Relationships

A weak relationship, also known as a non-identifying relationship, exists if the PK of the related entity does not contain a PK component of the parent entity. By default, relationships are established by having the PK of the parent entity appear as an FK on the related entity. For example, suppose that the COURSE and CLASS entities are defined.

In this case, a weak relationship exists between COURSE and CLASS because the CLASS_CODE is the CLASS entity’s PK, while the CRS_CODE is only an FK. In this example, the CLASS PK did not inherit the PK component from the COURSE entity.

The Crow’s Foot notation depicts the weak (non-identifying) relationship with a dashed line between the entities.

Strong (Identifying) Relationships

A strong relationship, also known as an identifying relationship, exists when the PK of the related entity contains a PK component of the parent entity.

For example, the definitions of the COURSE and CLASS entities indicate that a strong relationship exists between COURSE and CLASS, because the CLASS entity’s composite PK is composed of CRS_CODE+CLASS_SECTION.
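The difference between the two relationship strengths shows up in how the CLASS primary key is declared. The sketch below defines both variants side by side with Python's sqlite3 module; the table names class_weak and class_strong and the helper primary_key are hypothetical, chosen only so that the two designs can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (crs_code TEXT PRIMARY KEY);

-- Weak (non-identifying): CLASS has its own PK; crs_code is only an FK.
CREATE TABLE class_weak (
    class_code    TEXT PRIMARY KEY,
    crs_code      TEXT NOT NULL REFERENCES course (crs_code),
    class_section TEXT
);

-- Strong (identifying): the PK of CLASS contains the PK of COURSE.
CREATE TABLE class_strong (
    crs_code      TEXT NOT NULL REFERENCES course (crs_code),
    class_section TEXT NOT NULL,
    PRIMARY KEY (crs_code, class_section)
);
""")


def primary_key(table):
    """Return the primary key column names of a table, in key order."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # The last field ('pk') is 0 for non-key columns, else the position in the key.
    return [name for _, name, _, _, _, pk in sorted(rows, key=lambda r: r[5]) if pk]
```

Here primary_key("class_weak") contains only class_code, while primary_key("class_strong") starts with crs_code, the PK component inherited from COURSE.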

The Crow’s Foot notation depicts the strong (identifying) relationship with a solid line between the entities.

Weak Entities

A weak entity is one that meets two conditions:

1. The entity is existence-dependent; i.e., it cannot exist without the entity with which it has a relationship.

2. The entity has a primary key that is partially or totally derived from the parent entity in the relationship.

For example, a company insurance policy insures an employee and his/her dependents. For the purpose of describing an insurance policy, an EMPLOYEE might or might not have a DEPENDENT, but the DEPENDENT must be associated with an EMPLOYEE. Moreover, the DEPENDENT cannot exist without the EMPLOYEE; i.e., a person cannot get insurance coverage as a dependent unless he or she is a dependent of an employee. DEPENDENT is the weak entity in the relationship “EMPLOYEE has DEPENDENT”.

Relationship Participation

Participation in an entity relationship is either optional or mandatory.

Optional participation means that one entity occurrence does not require a corresponding entity occurrence in a particular relationship. In other words, an entity occurrence (row) in one entity does not necessarily require the existence of a corresponding entity occurrence in another entity.

In Crow’s Foot notation, an optional relationship between entities is shown by drawing a small circle (O) on the side of the optional entity. The existence of an optional entity indicates that the minimum cardinality is 0 for the optional entity.

Mandatory participation means that one entity occurrence requires a corresponding entity occurrence in a particular relationship. The existence of a mandatory relationship indicates that the minimum cardinality is 1 for the mandatory entity.

Consider the relationship “PROFESSOR teaches CLASS”. It is quite possible for a PROFESSOR not to teach a CLASS. Therefore, CLASS is optional to PROFESSOR. On the other hand, a CLASS must be taught by a PROFESSOR. Therefore, PROFESSOR is mandatory to CLASS.

The cardinality next to CLASS is (0,3) in the figure, indicating that a professor may teach no classes at all or as many as three classes.

Table shows the various cardinalities that are supported by the Crow’s Foot notation.

Relationship Degree

A relationship degree indicates the number of entities or participants associated with a relationship. A unary relationship exists when an association is maintained within a single entity. A binary relationship exists when two entities are associated. A ternary relationship exists when three entities are associated.

Unary Relationships

In the case of the unary relationship shown in figure, an employee within the EMPLOYEE entity is the manager for one or more employees within that entity. In this case, the existence of the “manages” relationship means that EMPLOYEE requires another EMPLOYEE to be the manager, i.e., EMPLOYEE has a relationship with itself. Such a relationship is known as a recursive relationship.
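A recursive relationship is commonly implemented with a foreign key that refers back to the same table. The following sketch uses Python's sqlite3 module; the column name emp_manager and the sample employees are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    emp_num     INTEGER PRIMARY KEY,
    emp_lname   TEXT,
    -- The manager is another row of the same table (null for the top manager).
    emp_manager INTEGER REFERENCES employee (emp_num)
);
INSERT INTO employee VALUES (1, 'Rao',  NULL),
                            (2, 'Devi', 1),
                            (3, 'Khan', 1),
                            (4, 'Paul', 2);
""")

# Joining the table to itself pairs each manager with the employees managed.
managed_by = conn.execute("""
    SELECT m.emp_lname, e.emp_lname
    FROM employee AS e JOIN employee AS m ON e.emp_manager = m.emp_num
    ORDER BY m.emp_num, e.emp_num
""").fetchall()
```

The result pairs 'Rao' with 'Devi' and 'Khan', and 'Devi' with 'Paul', all drawn from the single EMPLOYEE table.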

Binary Relationships

A binary relationship exists when two entities are associated in a relationship. Binary relationships are the most common. In fact, to simplify the conceptual design, whenever possible most higher order (ternary and higher) relationships are decomposed into appropriate equivalent binary relationships.

Ternary and Higher Degree Relationships

A ternary relationship implies an association among three different entities. For example note the relationships in figure, which are represented by the following business rules:

A DOCTOR writes one or more PRESCRIPTIONs. A PATIENT may receive one or more PRESCRIPTIONs. A DRUG may appear in one or more PRESCRIPTIONs.

Associative (composite) Entities

In general, relationships do not contain attributes. The associative entity (composite or bridge entity) is composed of the primary keys of each of the entities to be connected.

Normally relationships do not contain attributes. But some times

Types of Entities

Entities are divided into three types. They are:

Strong Entity
Weak Entity
Associative Entity

Strong Entity

An entity that exists independently of other entity types is known as a strong entity. It is also known as “existence independent”.

Eg: regular entities like STUDENT, EMPLOYEE, COURSE, CUSTOMER, PRODUCT, INVOICE, etc.

Instances of a strong entity type always have a unique characteristic called an identifier.

Weak Entity:

An entity type whose existence depends on some other entity type is known as a weak entity type. It is also called “existence dependent”.

A weak entity type has no business meaning in the ERD without the entity on which it depends.

The entity type on which the weak entity type depends is called the identifying owner.

A weak entity type does not have its own identifier.

Eg: the below diagram shows the example of a weak entity and its identifying relationship.

EMPLOYEE is a strong entity type with identifier Emp_no(we denote the Identifier attribute by underlining it), DEPENDENT is a weak entity which is indicated by double lined rectangle box.

The relationship between a weak entity type and its owner is called an “identifying relationship”. In the above example, “has” is the identifying relationship, which is indicated by the double lined diamond symbol.

The attribute ‘dependent_name’ serves as a partial identifier. We use a double underline to indicate a partial identifier. Here, dependent_name is also a composite attribute, which can be broken into three attributes: first name, middle initial and last name.

Associative Entity:

An associative entity is an entity type that associates the instances of one or more entity types and contains attributes that are peculiar to the relationship among those entity instances.

The associative entity CERTIFICATE is represented with the diamond relationship symbol enclosed within the entity box, as shown in the above figure. Thus an associative entity acts both as an entity type, by including some attributes, and as a relationship between two or more entity types.
