Database Management system Dept of Computer Science & Engg, VJCET
MODULE I
1.1 Basic Concepts

A DBMS is a collection of interrelated data and a set of programs to access this data in a convenient and efficient way. It controls the organization, storage, retrieval, security and integrity of data in a database. In other words, it enables users to create and maintain a database. It accepts requests from the application and instructs the operating system to transfer the appropriate data. It facilitates the processes of defining, constructing, manipulating and sharing a database among various users and applications.
- Defining a database means specifying the different types of data elements to be stored in the database, i.e. data types, structures and constraints. For a bank database, this means specifying fields such as Name (a string of alphabetic characters) and Acct Number (an integer within a range), along with the characteristics of each field.
- Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS.
- Manipulating a database is the processing of the database. It includes updating the database and retrieving data from it.
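These three activities can be sketched with Python's built-in sqlite3 module; the table and field names below are invented to mirror the bank example, not taken from any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Defining: declare the data types, structure and constraints
conn.execute("""
    CREATE TABLE account (
        acct_number INTEGER PRIMARY KEY CHECK (acct_number > 0),
        name        TEXT NOT NULL
    )
""")

# Constructing: store the data itself under DBMS control
conn.execute("INSERT INTO account VALUES (101, 'Jones')")

# Manipulating: update and retrieve the stored data
conn.execute("UPDATE account SET name = 'Smith' WHERE acct_number = 101")
row = conn.execute("SELECT name FROM account WHERE acct_number = 101").fetchone()
print(row[0])  # -> Smith
```

Note that the DBMS, not the application, rejects data that violates the declared constraints (e.g. a non-positive account number).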
1.2 Purpose of database systems

File system versus database approach

One way to store information in a computer system is to store it in a traditional file system. In this method each piece of data is stored in a separate file, and there is an application program for each application.
Data redundancy and inconsistency
In traditional file systems data may be duplicated. For example, consider a bank offering two kinds of accounts: a savings bank (SB) account and a checking account. In this case, the address of a customer is stored in two files: one with the SB account record and the other with the checking record. This duplication results in a need for more storage space, and it also leads to inconsistency: if the address of a customer changes, the change may be reflected in only one account. This is inconsistency of data.
Difficulty in accessing information
Suppose the bank needs a list of customers with an account balance higher than Rs. 10,000, but we do not have an application at hand to satisfy this request. To access this information we have two choices: either list the SB account customers and extract the needed list manually, or develop a new program to satisfy the new request. Both options are cumbersome.
Data Isolation
Data are scattered in different files, and the files may be in various formats, so it is difficult to extract the appropriate data.
Integrity problems
Constraints on the data are enforced through appropriate code in the application programs. If we need to add a new constraint, we have to change the code, so it is very difficult to add or change constraints. The problem is compounded when a constraint involves data from several different files.
Atomicity problems
Suppose a failure occurs during execution of a program, so that execution stops in the middle and leaves the data inconsistent. The execution of a program should always end in a consistent state, but in a traditional file system a failure usually results in an inconsistent state.
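A DBMS avoids this through atomic transactions: either all updates in a unit of work are applied, or none are. A minimal sketch with sqlite3 (the account names and amounts are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 0)")
conn.commit()

# Transfer 300 from A to B; a failure mid-way must not leave
# the money debited from A but not credited to B.
try:
    conn.execute("UPDATE account SET balance = balance - 300 WHERE name = 'A'")
    raise RuntimeError("simulated crash between the two updates")
    conn.execute("UPDATE account SET balance = balance + 300 WHERE name = 'B'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # undo the half-finished transfer

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # -> {'A': 500, 'B': 0}: back in a consistent state
```

The rollback restores the state before the transaction began, which a flat file updated in place cannot do.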
1.3 Features (characteristics) of DBMS
The basic difference between traditional file processing and the database approach is that in traditional file processing, each user defines and implements the files needed for a specific application as part of programming the application. In the database approach, a single repository of data is maintained that is defined once and then accessed by various users.
For example, consider student records. In traditional file processing, the office keeps a record for each student to track his or her fees and payments, while the department keeps another record for each student to track marks and progress. Even though both the office and the department are interested in data about students, each maintains separate files, because each requires some data that is not available from the other.
Now what are the features of database approach?
Database system is
1. Self describing:
i.e. the database system contains not only the database itself but also a complete definition or description of the database structure. This structure is stored in a catalog with the type, storage format and constraints of each data item, as mentioned earlier. The information stored in the catalog is called meta-data.
2. Data security
The DBMS can prevent unauthorized users from viewing or updating the
database. Using passwords, users are allowed access to the entire database
or a subset of it known as a "subschema." For example, in a student
database, some users may be able to view payment details while others
may view only the mark lists of students.
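In SQL terms, such a subschema can be approximated with a view that exposes only some columns; the student table below is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER, name TEXT, marks INTEGER, fees_paid INTEGER)")
conn.execute("INSERT INTO student VALUES (1, 'Anu', 82, 15000)")

# A "subschema" for users who may see marks but not payment details
conn.execute("CREATE VIEW mark_list AS SELECT roll_no, name, marks FROM student")

cur = conn.execute("SELECT * FROM mark_list")
cols = [d[0] for d in cur.description]
print(cols)  # -> ['roll_no', 'name', 'marks']  (fees_paid is hidden)
```

A full DBMS would pair such views with per-user privileges (e.g. GRANT/REVOKE); SQLite has no user accounts, so the view here only illustrates the shape of a subschema.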
3. Data Integrity
The DBMS can ensure that no more than one user can update the same
record at the same time. It can keep duplicate records out of the database;
for example, no two customers with the same customer number can be
entered.
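The duplicate-record guarantee is typically declared to the DBMS rather than programmed by hand; a sketch with an invented customer table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring customer_no as primary key makes the DBMS enforce uniqueness
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (7, 'Jones')")

# A second customer with the same customer number is rejected by the DBMS
try:
    conn.execute("INSERT INTO customer VALUES (7, 'Smith')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # -> False
```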
4. Interactive Query
Most DBMSs provide query languages and report writers that let users
interactively interrogate the database and analyze its data. This important
feature gives users access to all management information as needed; e.g. we can easily get all the details of each student at any time.
5. Interactive Data Entry and Updating
Many DBMSs provide a way to interactively enter and edit data, allowing
you to manage your own files and databases. However, interactive
operation does not leave an audit trail and does not provide the controls
necessary in a large organization. These controls must be programmed into
the data entry and update programs of the application.
6. Data Independence
With a DBMS, the details of the data structure are not stated in each application program. The program asks the DBMS for data by field name; for example, a coded equivalent of "give me customer name and balance due" would be sent to the DBMS. Without a DBMS, the programmer must reserve space for the full structure of the record in the program, and any change in the data structure requires changing all application programs.
1.4 DBMS Components

Data:
Data stored in a database include numerical data which may be
integers (whole numbers only) or floating point numbers (decimal),
and non-numerical data such as characters (alphabetic and numeric
characters), date or logical (true or false). More advanced systems may
include more complicated data entities such as pictures and images as
data types.
Standard operations:
Standard operations are provided by most DBMS. These operations
provide the user basic capabilities for data manipulation. Examples of these
standard operations are sorting, deleting and selecting records.
Data definition language (DDL):
DDL is the language used to describe the contents of the database. It is used to describe, for example, attribute names (field names), data types, location in the database, etc.
Data manipulation and query language:
Normally a query language is supported by a DBMS to form
commands for input, edit, analysis, output, reformatting, etc. Some degree
of standardization has been achieved with SQL (Structured Query
Language).
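The split between the DDL and the data manipulation/query language can be seen in SQL itself. In the sketch below (the book table and its columns are invented), CREATE TABLE is DDL, while INSERT and SELECT belong to the manipulation and query language:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe the contents of the database (names, types, constraints)
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")

# Data manipulation: input data
conn.executemany("INSERT INTO book VALUES (?, ?, ?)",
                 [("0-13-0", "DB Concepts", 450.0),
                  ("0-07-1", "OS Basics", 380.0)])

# Query language: request information back from the database
cheap = conn.execute("SELECT title FROM book WHERE price < 400").fetchall()
print(cheap)  # -> [('OS Basics',)]
```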
Programming tools:
Besides commands and queries, the database should be accessible
directly from application programs through function calls (subroutine
calls) in conventional programming languages.
File structures:
Every DBMS has its own internal structures used to organize the
data although some common data models are used by most DBMS.
Abstraction
Each application program has some data relevant to a particular task, and an application program may need to use a portion of data that is also used by other programs. In the early days of computerization, each application programmer designed the file structure, the metadata of the file, and the access method for each record. That is, each application program used its own data and its own details concerning the structure of the data, as well as its own way to access and interpret each data item. Because the application programs were implemented independently, any change in the storage medium required changes to these structures and access methods. And because the files were structured for one application, it was difficult to use the data in these files for new applications requiring data from several files belonging to different existing applications.
For example, consider two application programs that require data on an entity set EMPLOYEE. The first application program involves the public relations department sending each employee a newsletter and related material. This application program is interested in the record type EMPLOYEE, containing values for the attributes EMPL_Name and EMPL_Address.
1.5 Architecture of DBMS

The generalized architecture of DBMS is called the ANSI/SPARC model. The architecture is divided into three levels: the external level, the conceptual level and the internal level.
The view at each of these levels is described by a schema. Schema
describes the records and its relationships in the view.
a. External view or User view
It is the highest level of data abstraction. This includes only those portions
of database of concern to a user or Application program. Each user has a
different external view, and it is described by means of a schema called the external schema. The schema contains the definitions of the logical records and relationships in the external view. It also contains the method of deriving the objects in the external view from the objects in the conceptual view.
b. Conceptual view
At this level of database abstraction, all the database entities and the relationships among them are included. A single conceptual view represents the entire database and is described by the conceptual schema. It describes the method of deriving the objects in the conceptual view from the objects in the internal view, and also specifies the checks needed to retain data consistency and integrity.
c. Internal view
It is the lowest level of abstraction, closest to the physical storage method. It
describes how the data is stored, what is the structure of data storage and the
method of accessing these data. It is represented by internal schema.
Fig 1.1 The three levels of the architecture: the view (external) level is defined by the user, the logical (conceptual) level is defined by the DBA, and the physical (internal) level is defined by the DBA for optimization.
1.6 Data independence

Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. The three-schema architecture can be used to achieve this data independence. Data independence comes in two types:
1. Logical data independence
It is the capacity to change the conceptual schema without having to
change the external schema. Sometimes, we may need to change the
conceptual schema to expand the database, to change the constraints, or to
reduce the database. In a DBMS that supports logical data independence, only the view definitions and mappings need to be changed; application programmers do not notice any change in the schema constructs of the DBMS.
2. Physical data independence
Physical data independence is the capacity to change the internal
schema without having to change the conceptual schema and external
schema. The internal schema may change to improve the performance of
retrieval or update. The conceptual schema need not change as long as the data remains the same. For example, we need not change the query that retrieves a student's progress report even if the DBMS adopts a new method of storing student records.
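Physical data independence can be illustrated by adding an index: the physical access path changes, but the query text and its answer do not. A sketch, with an invented student table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
conn.execute("INSERT INTO student VALUES (42, 'Anu')")

query = "SELECT name FROM student WHERE roll_no = 42"
before = conn.execute(query).fetchall()

# Change the internal schema: add an index to speed up retrieval
conn.execute("CREATE INDEX idx_roll ON student(roll_no)")

after = conn.execute(query).fetchall()
print(before == after)  # -> True: same query, same answer, new storage path
```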
Advantages
1. Controlling Redundancy
In traditional file processing, every user group maintains its own files; each group independently keeps files on its data (e.g., students). Therefore, much of the data is stored twice or more, and this redundancy leads to several problems:
a. duplication of effort, and storage space wasted when the same data is stored repeatedly
b. files that represent the same data may become inconsistent (since the updates are applied independently by each user group)
In the database approach we can instead use controlled redundancy.
2. Restricting Unauthorized Access
A DBMS should provide a security and authorization subsystem.
Some db users will not be authorized to access all information in the db(e.g.,
financial data).
Some users are allowed only to retrieve data, while others are allowed both to retrieve and to update the database.
3. Providing Persistent Storage for Program Objects and Data Structures
The data structures provided by the DBMS must be compatible with the programming language's data structures. For example, object-oriented DBMSs are compatible with programming languages such as C++ and SMALLTALK, and the DBMS software automatically performs conversions between programming data structures and file formats.
4. Permitting Inference and Actions Using Deduction Rules
Deductive database systems provide capabilities for defining deduction rules to infer new information from the stored database facts.
5. Providing Multiple User Interfaces
(e.g., query languages, programming languages interfaces, forms, menu-
driven interfaces, etc.)
6. Representing Complex Relationships Between Data
The complex relationship between data is easily represented.
7. Enforcing Integrity Constraints
Integrity constraints on the data are enforced by the database management system.
1.7 DBMS Disadvantages

A database system generally provides on-line access to the database
for many users. In contrast, a conventional system is often designed to meet
a specific need and therefore generally provides access to only a small
number of users. Because of the larger number of users accessing the data
when a database is used, the enterprise may involve additional risks as
compared to a conventional data processing system in the following areas.
1. Confidentiality, Privacy and Security

When information is centralized and is made available to
users from remote locations, the possibilities of abuse are often more than in
a conventional system. To reduce the chances of unauthorized users
accessing sensitive information, it is necessary to take technical,
administrative and, possibly, legal measures. Most databases store valuable
information that must be protected from deliberate attack and destruction.
2. Data Quality

Since the database is accessible to users remotely,
adequate controls are needed to control users updating data and to control
data quality. With increased number of users accessing data directly, there
are enormous opportunities for users to damage the data. Unless there are
suitable controls, the data quality may be compromised.
3. Data Integrity

Since a large number of users could be using a database concurrently, we have to ensure that data remain correct during
operation. The main threat to data integrity comes from several different
users attempting to update the same data at the same time. The database
therefore needs to be protected against accidental changes by the users.
4. Enterprise Vulnerability

Centralizing all data of an enterprise in one database may mean that the database becomes a critical resource. The survival of the enterprise
may depend on reliable information being available from its database. The
enterprise therefore becomes vulnerable to the destruction of the database or
to unauthorized modification of the database.
5. The Cost of using a DBMS
Conventional data processing systems are typically designed to
run a number of well-defined, preplanned processes. Such systems are often
"tuned" to run efficiently for the processes that they were designed for.
Although the conventional systems are usually fairly inflexible in that new
applications may be difficult to implement and/or expensive to run, they are
usually very efficient for the applications they are designed for.
The database approach on the other hand provides a
flexible alternative where new applications can be developed relatively
inexpensively. The flexible approach is not without its costs and one of these
costs is the additional cost of running applications that
the conventional system was designed for. Using standardized software is
almost always less machine efficient than specialized software.
1.8 Data model

Entities and Attributes
Entities are distinguishable objects of concern and are modeled using their characteristics or attributes. A database usually contains a large number of similar entities. For example, a company database covering a large number of employees may want to store similar information for each employee; each employee can then be termed an entity. An entity can be an object with physical existence, e.g. a car, a person or an employee, but each entity has its own values. The properties that describe an entity are called the attributes of that entity. A collection of entities with the same attributes is termed an entity type.
For eg: Employee (Employee_id, Address, Designation, Salary)
Here Employee is an entity and Employee_id, Address, Designation, Salary
represents the attribute of entity Employee.
There can be several types of attributes, such as simple versus composite, single-valued versus multi-valued, and stored versus derived.
1. Composite versus Simple
Composite attributes are those attributes that can be divided into smaller subparts with independent meanings. In the example above, the attribute Address can be divided into smaller subparts such as City, State and Street_address. The attributes that are not divisible are called simple or atomic attributes. The value of a composite attribute is the concatenation of the values of its constituent simple attributes.
2. Single-valued versus multi-valued
Most attributes have only a single value for a particular entity; such attributes are called single-valued. In some cases an attribute may have more than one value for a particular entity; such attributes are called multi-valued. The attribute age of an entity person has only one value, while the college degrees of that person may number more than one. So the attribute age can be considered single-valued and college degree multi-valued.
3. Stored versus derived
In some cases attribute values are related, so that one can be derived from the other. Consider a person as an entity. The attributes age and DateOfBirth of the person are related: the age of a person can be derived from the current date and his DateOfBirth. The age attribute is therefore called a derived attribute, and DateOfBirth is called a stored attribute, from which the person's age is calculated.
Entity set
An entity set is a set of entities of the same type that share the
same properties, or attributes. It is represented by a set of attributes. An
attribute, as used in the E-R model, can be characterized as one of the following types:
Simple and composite attributes
Single and multi-valued attributes
Null attributes
Derived attributes
A relationship is an association among several entities. And a relationship
set is a set of relationships of the same type.
Keys
Before designing a database we should be able to specify how entities
within a given entity set and relationships within a given relationship set are
distinguished. Conceptually the individual entities and relationships are
distinct; but from a database perspective, the difference must be expressed
by their attributes. The concept of key is used to make such distinctions.
A super key is a set of attributes that, taken collectively, identifies an entity in the entity set uniquely. For example, the social_security_no attribute of the entity set employee is sufficient to distinguish one employee entity from another; thus social_security_no is a superkey for the entity set employee. A superkey with no proper subset that is itself a superkey, i.e. a minimal superkey, is known as a candidate key. For example, it is possible to combine the attributes employ_id and employ_name to form a superkey, but social_security_no alone is sufficient to distinguish two employees; thus social_security_no is a candidate key. The term primary key is used to denote the candidate key that is chosen by the database designer to identify entities in an entity set. A key (super, candidate or primary) is a property of the entity set rather than of the individual entities.
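These definitions can be checked mechanically: a set of attributes is a superkey iff no two tuples agree on all of them, and a candidate key is a minimal superkey. A small sketch over an invented employee relation:

```python
from itertools import combinations

# Toy employee relation: each dict is one tuple (row)
employees = [
    {"employ_id": 1, "employ_name": "Jones", "ssn": "111"},
    {"employ_id": 2, "employ_name": "Jones", "ssn": "222"},
    {"employ_id": 3, "employ_name": "Smith", "ssn": "333"},
]

def is_superkey(key):
    # Superkey: the projection onto the key attributes has no duplicates
    projected = [tuple(row[a] for a in key) for row in employees]
    return len(set(projected)) == len(projected)

def is_candidate_key(key):
    # Candidate key: a superkey none of whose proper subsets is a superkey
    return is_superkey(key) and not any(
        is_superkey(sub)
        for r in range(1, len(key))
        for sub in combinations(key, r))

print(is_superkey(("employ_id", "employ_name")))      # True, but not minimal
print(is_candidate_key(("employ_id", "employ_name"))) # False: employ_id alone suffices
print(is_candidate_key(("ssn",)))                     # True
```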
Entity- Relationship (E-R) Diagram
The overall logical structure of a database can be expressed graphically by
an E-R diagram. The diagram consists of the following major components.
Rectangles: represent entity set.
Ellipses: represent attributes.
Diamonds: represents relationship sets.
Lines: links attribute set to entity set and entity set to relationship set.
Double ellipses: represent multi-valued attributes.
Dashed ellipses: denote derived attributes.
For eg: Consider an E-R diagram, which consists of two entity sets
customer and loan.
Fig 1.2
A data model is a plan for building a database. The model represents
data conceptually, the way the user sees it, rather than how computers store
it. Data models focus on required data elements and associations; most often they are expressed graphically using entity-relationship diagrams. On a more abstract level, the term is also used to describe a database's overall structure. The most commonly used data modeling techniques are
1. Entity- Relationship model
2. Hierarchical model
3. Network model
4. Object-oriented model
1.9 Hierarchical Model

The hierarchical data model organizes data in a tree structure.
There is a hierarchy of parent and child data segments. This structure implies
that a record can have repeating information, generally in the child data
segments. Each record has a set of field values attached to it. The model collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses parent-child relationships.
Hierarchical databases link records like an organization chart.
A record type can be owned by only one owner. In the following example,
orders are owned by only one customer. Hierarchical structures were widely
used with early mainframe systems; however, they are often restrictive in
linking real-world structures.
Fig 1.3 A hierarchical structure: Order records are owned by a single Customer.
Advantages:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically organized domains -
e.g., assemblies in manufacturing, personnel organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE, GET
NEXT, GET NEXT WITHIN PARENT etc.
Disadvantages:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"
1.10 Network Model
In 1971, the Conference on Data Systems Languages (CODASYL) formally
defined the network model. The basic data modeling construct in the
network model is the set construct. A set consists of an owner record type, a
set name, and a member record type. A member record type can have that
role in more than one set, hence the multiparent concept is supported. An
owner record type can also be a member or owner in another set. In network
databases, a record type can have multiple owners. In the example below,
orders are owned by both customers and products, reflecting their natural
relationship in business.
Fig 1.4 A network structure: Order records are owned by both Customer and Product.
Advantages:
• Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and
relationship types.
• Language is navigational; uses constructs like FIND, FIND member, FIND
owner, FIND NEXT within set, GET etc. Programmers can do optimal
navigation through the database.
Disadvantages:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set of
records.
• Little scope for automated "query optimization"
1.11 Object-Oriented Model

Object DBMSs add database functionality to object programming
languages. They bring much more than persistent storage of
programming language objects. Object DBMSs
extend the semantics of the C++, Smalltalk and Java object programming
languages to provide full-featured database programming capability,
while retaining native language compatibility. A major benefit of this
approach is the unification of the application and database development
into a seamless data model and language environment. As a result,
applications require less code, use more natural data modeling, and code
bases are easier to maintain. Object developers can write complete database applications with a modest amount of additional effort.
Fig 1.5
1.12 Relational model (RDBMS - relational database management system)
A relational database is based on the relational model developed by E.F. Codd. It allows the definition of data structures, storage and retrieval operations, and integrity constraints. In such a database the data and the relations between them are organized in tables. A table is a collection of records, and each record in a table contains the same fields.
It permits the database designer to create a consistent,
logical representation of information. Consistency is achieved by including
declared constraints in the database design, which is usually referred to as
the logical schema. The theory includes a process of database normalization
whereby a design with certain desirable properties can be selected from a set
of logically equivalent alternatives. The access plans and other
implementation and operation details are handled by the DBMS engine, and
are not reflected in the logical model. This contrasts with common practice
for SQL DBMSs in which performance tuning often requires changes to the
logical model.
The basic relational building block is the domain or data
type, usually abbreviated nowadays to type. A tuple is an unordered set of
attribute values. An attribute is an ordered pair of attribute name and type
name. An attribute value is a specific valid value for the type of the attribute.
This can be either a scalar value or a more complex type. Relational
databases do not link records together physically, but the design of the
records must provide a common field, such as account number, to allow for
matching. Often, the fields used for matching are indexed in order to speed
up the process.
In the following example, customers, orders and products
are linked by comparing data fields and/or indexes when information from
more than one record type is needed. This method is more flexible for ad
hoc inquiries. Many hierarchical and network DBMSs also provide this
capability.
Fig 1.6 The relational model: Customer, Order and Product records linked by common data fields.
MODULE 2
2.1 Basic Structure of relational model

The relational model for database management is a data model based on predicate logic and set theory. It was invented by Edgar Codd. The fundamental assumption of the relational model is that all data are represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n sets.
1) Relation - The fundamental organizational structure for data in the relational model is the relation. A relation is a two-dimensional table made up of rows and columns. Each relation, also called a table, stores data about entities.
2) Tuples - The rows in a relation are called tuples. They represent specific occurrences (or records) of an entity. Each row consists of a sequence of values, one for each column in the table. In addition, each row (or record) in a table must be unique. A tuple variable is a variable that stands for a tuple.
3) Attributes – The columns in a relation are called attributes. The attributes represent the characteristics of an entity.
4) Domain – For each attribute there is a set of permitted values called domain of that attribute. For all relations ‘r’, the domain of all attributes of ‘r’ should be atomic. A domain is said to be atomic if elements of the domain are considered to be indivisible units.
2.2 Database Schema – Logical design of the database is termed as database schema.
1) Database instance – Database instance is a snapshot of the data in a database at a given instant of time.
2) Relation schema – The concept of relation schema corresponds to the programming notion of type definition. It can be considered as the definition of a domain of values. The database schema is the collection of relation schemas that define a database.
3) Relation instance – The concept of a relation instance corresponds to the programming language notion of a value of a variable. For relation instance, we actually mean the “relation” itself.
2.3 Keys – A key is the relational means of specifying uniqueness. The keys applicable in relational model are primary key, candidate key and super key.
1.) Primary key - A primary key is a value, formed from one or more attributes, that can be used to identify a unique row in a table.
2.) Candidate key - A candidate key of a relation variable is a set of attributes of that relation variable such that (1) at all times it holds in the relation assigned to that variable that there are no two distinct tuples with the same values for these attributes and (2) there is not a proper subset for which (1) holds.
3.) Super key - A superkey is defined in the relational model as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable there are no two distinct tuples that have the same values for the attributes in this set.
4.) Foreign key - A foreign key is a field or group of fields in a database record that points to a key field or group of fields forming a key of another database record in some (usually different) table. A relation schema, r1, derived from an E-R schema may include among its attributes the primary key of another relation schema, r2. This attribute is a foreign key from r1, referencing r2. The relation r1 is called the referencing relation of the foreign key dependency, and r2 is called the referenced relation.
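A sqlite3 sketch of a foreign key, with invented table names: loan (r1) references branch (r2), and the DBMS rejects a tuple whose branch does not exist in the referenced relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.execute("CREATE TABLE branch (bname TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE loan (
        loan_no INTEGER PRIMARY KEY,
        bname   TEXT REFERENCES branch(bname)  -- foreign key: loan references branch
    )
""")
conn.execute("INSERT INTO branch VALUES ('Redwood')")
conn.execute("INSERT INTO loan VALUES (13, 'Redwood')")  # referenced value exists: OK

try:
    conn.execute("INSERT INTO loan VALUES (99, 'Nowhere')")  # dangling reference
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # -> True
```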
2.4 Schema diagram – A database schema, along with primary key and foreign key dependencies, can be depicted pictorially by schema diagrams. Each relation in the database schema is represented as a box, with the attributes listed inside it and the relation name above it. If there are primary key attributes, a horizontal line crosses the box, with the primary key attributes listed above the line. Foreign key dependencies appear as arrows from the foreign key attributes of the referencing relation to the foreign key attributes of the referenced relation.
2.5 Relational algebra – The relational algebra is a procedural query language. (A query language is a language in which a user requests information from the database.) It consists of a set of operations that take one or two relations as input and produce a new relation as the result. The fundamental operations in relational algebra are select, project, union, set difference, Cartesian product and rename. There are several other operations namely, set intersection, natural join, division and assignment.
Fundamental operations
1. Select operation - The select operation selects tuples that satisfy a given predicate. The Greek symbol ‘σ’ is used to denote selection. The predicate appears as a subscript to σ . It is a unary operation.
E.g. Consider the borrow relation and branch relation in the banking example:
Borrow relation

Branch name    Loan#   Customer name   Amount
Downtown       17      Jones           1000
Round Hill     23      Smith           2000
Redwood        13      Hayes           1300

Table 2.1
Branch relation (Table 2.2, shown below)

To select tuples (rows) of the borrow relation where the branch is “Redwood”, we would write

σ bname = “Redwood” (borrow)

The new relation created as the result of this operation consists of one tuple: (Redwood, 13, Hayes, 1300). We allow comparisons using =, ≠, <, ≤, > and ≥ in the selection predicate. We also allow the logical connectives ∨ (or) and ∧ (and). For example:

σ bname = “Downtown” ∧ amount > 800 (borrow)

2. Project operation - The project operation is used to retrieve specific attributes/columns from a relation. It is denoted using the Greek letter pi (∏). It is a unary operation.
For example, to obtain a relation showing customers and branches, but ignoring amount and loan#, we write
∏branchname,customername(borrow)
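The select and project operations just described can be sketched in a few lines, modelling relations as lists of dictionaries; the data mirrors the borrow relation of Table 2.1, and the helper names select/project are our own.

```python
# A small sketch of select (σ) and project (∏) on relations represented
# as lists of dicts; 'borrow' mirrors Table 2.1.
borrow = [
    {"bname": "Downtown",   "loan": 17, "cname": "Jones", "amount": 1000},
    {"bname": "Round Hill", "loan": 23, "cname": "Smith", "amount": 2000},
    {"bname": "Redwood",    "loan": 13, "cname": "Hayes", "amount": 1300},
]

def select(predicate, relation):
    """sigma_predicate(relation): keep tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(attrs, relation):
    """pi_attrs(relation): keep only the named attributes, dropping duplicates."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

redwood = select(lambda t: t["bname"] == "Redwood", borrow)
pairs = project(["bname", "cname"], borrow)
print(redwood)  # one tuple: the Redwood/Hayes row
```

Both operations are unary, as the text notes: each takes a single relation and yields a new relation.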
3) Union operation – The union operation is a binary operation since it involves 2 relations. It is used to retrieve tuples appearing in either or both the relations participating in the union. It is denoted as U. For a union operation R U S to be legal, we require that
o R and S must have the same number of attributes.
o The domains of the corresponding attributes must be the same.
4) Set difference – The set difference operation is a binary operation. Set difference is denoted by the minus sign (−). It finds tuples that are in one relation, but not in another. Thus R − S results in a relation containing tuples that are in R but not in S.
5) Cartesian product – This is a binary operation involving 2 relations. It is used to obtain all possible combinations of tuples from two relations. The Cartesian product of two relations is denoted by a cross (×), written R1 x R2 for relations R1 and R2.
Branch relation

Branch name   Branch city   Assets
Downtown      Brooklyn      9000000
Round Hill    Horseneck     21000000
Redwood       Palo Alto     17000000

Table 2.2
The result of R1 x R2 is a new relation with a tuple for each possible pairing of tuples from R1 and R2. In order to avoid ambiguity, the attribute names have attached to them the name of the relation from which they came. If no ambiguity will result, we drop the relation name. If R1 has n tuples and R2 has m tuples, then R = R1 x R2 will have m x n tuples.
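The three set-style operations above have direct counterparts on Python sets of tuples; a minimal sketch (relation contents are invented for illustration):

```python
# Union, set difference, and Cartesian product over relations modelled
# as sets of tuples (attribute order fixed by convention).
r = {("Jones",), ("Smith",), ("Hayes",)}
s = {("Smith",), ("Curry",)}

union = r | s                            # R U S: tuples in either relation
diff = r - s                             # R - S: tuples in R but not in S
prod = {a + b for a in r for b in s}     # R x S: every pairing of tuples

# With n = 3 tuples in r and m = 2 in s, the product has m x n = 6 tuples.
assert len(prod) == len(r) * len(s)
print(len(union), len(diff), len(prod))  # 4 2 6
```

Note that union and difference require union-compatible relations (same arity, same domains), exactly as stated for R U S above.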
6) Rename – The rename operation solves the problems that occur with naming when performing the cartesian product of a relation with itself.
Suppose we want to find the names of all the customers who live on the same street and in the same city as Smith.
Customer name   Customer street   Customer city
Jones           Main              Harrison
Smith           North             Rye
Hayes           Main              Harrison

Table 2.3 Customer relation
We can get the street and city of Smith by writing

∏ cstreet, ccity (σ cname = “Smith” (customer))

To find other customers with the same information, we need to reference the customer relation again:

σ p (customer × ∏ cstreet, ccity (σ cname = “Smith” (customer)))

where p is a selection predicate requiring street and ccity values to be equal.
So we have to distinguish between the two street values appearing in the Cartesian product, as both come from a single customer relation. For that, we use the rename operator, denoted by the Greek letter rho (ρ).
We write

ρ x (r)

to get the relation r under the name of x.
If we use this to rename one of the two customer relations we are using, the ambiguities will disappear.
Additional operations
1. Set Intersection - Set intersection is denoted by ∩, and returns a relation that contains tuples that are in both of its argument relations. It does not add any expressive power, since r ∩ s = r − (r − s).
Eg: Consider the depositor and borrower relations. If we want to find all customers who have both a loan and an account, we take the intersection of the two relations. It can be written as ∏ customer name (borrower) ∩ ∏ customer name (depositor).
2. Natural join - Natural join is a dyadic operator that is written as R ⋈ S where R and S are relations. The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names.
Consider R and S to be sets of attributes. We denote the attributes appearing in both relations by R ∩ S, and the attributes in either or both relations by R U S. Consider two relations r(R) and s(S). The natural join of r and s, denoted by r ⋈ s, is a relation on scheme R U S. It is a projection onto R U S of a selection on r x s where the predicate requires r.a = s.a for each attribute a in R ∩ S. Formally,
r ⋈ s = Π R U S (σ r.A1=s.A1 Λ r.A2=s.A2 Λ … Λ r.An=s.An (r x s)) where R ∩ S = {A1, A2, …, An}
For an example consider the tables Employee and Dept and their natural join (the result, Employee ⋈ Dept, appears below as Table 2.6):

Table 2.4 Dept

DeptName     Manager
Sales        Harriet
Production   Charles
Finance      George

Table 2.5 Employee

Name      EmpId   DeptName
Harry     3415    Finance
Sally     2241    Sales
George    3401    Finance
Harriet   2202    Sales
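The definition above (pair tuples that agree on every shared attribute name) can be sketched directly; the data mirrors the Employee/Dept example, with relations as lists of dicts and a helper natural_join of our own:

```python
# A sketch of natural join: pair tuples that agree on all shared
# attribute names (here just DeptName), as in the Employee/Dept example.
employee = [
    {"Name": "Harry",   "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally",   "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George",  "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]
dept = [
    {"DeptName": "Sales",      "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
    {"DeptName": "Finance",    "Manager": "George"},
]

def natural_join(r, s):
    common = set(r[0]) & set(s[0])   # shared attribute names, R ∩ S
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

joined = natural_join(employee, dept)
print(len(joined))  # 4: no Employee tuple matches Production
```

This is the selection-over-product formulation made literal: the comprehension builds r x s and the all(...) test plays the role of the predicate r.a = s.a for each a in R ∩ S.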
3. Equi-join / θ-join - If we want to combine tuples from two relations where the combination condition is not simply the equality of shared attributes, then it is convenient to have a more general form of join operator: the θ-join (or theta-join). The θ-join is a dyadic operator that is written as R ⋈ a θ b S or R ⋈ a θ v S, where a and b are attribute names, θ is a comparison operator in the set {<, ≤, =, >, ≥}, v is a value constant, and R and S are relations. The result of this operation consists of all combinations of tuples in R and S that satisfy the condition θ. The result of the θ-join is defined only if the headers of S and R are disjoint, that is, do not contain a common attribute. When θ is equality, the θ-join is called an equi-join.
4. Outer-join - Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by "fill" values for each of the attributes of the other operand. Three outer join operators are defined: left outer join, right outer join, and full outer join.
Table 2.6 Employee ⋈ Dept

Name      EmpId   DeptName   Manager
Harry     3415    Finance    George
Sally     2241    Sales      Harriet
George    3401    Finance    George
Harriet   2202    Sales      Harriet

Left Outer join - The left outer join is written as R =X S where R and S are relations. The result of the left outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to the tuples in R that have no matching tuples in S. For an example consider the tables Employee and Dept and their left outer join:
In the resulting relation, tuples in R that have no matching tuples in S take the null value, ω, for the attributes contributed by S. Since there are no tuples in Dept with a DeptName of Finance or Executive, ω appears in the Manager attribute of every result tuple whose DeptName is Finance or Executive.
The left outer join can be simulated using the natural join, set difference and union as follows (the second term pads each unmatched tuple of R with ω values):

R =X S = (R ⋈ S) ∪ ((R − ∏R (R ⋈ S)) × {(ω, …, ω)})
Table 2.8 Dept

DeptName     Manager
Sales        Harriet
Production   Charles

Table 2.9 Employee

Name      EmpId   DeptName
Harry     3415    Finance
Sally     2241    Sales
George    3401    Finance
Harriet   2202    Sales
Tim       1123    Executive
Table 2.10 Employee =X Dept

Name      EmpId   DeptName    Manager
Harry     3415    Finance     ω
George    3401    Finance     ω
Tim       1123    Executive   ω
Sally     2241    Sales       Harriet
Harriet   2202    Sales       Harriet

Right outer join - The right outer join behaves almost identically to the left outer join, except that it is the unmatched tuples of the right-hand relation that are preserved in the result. The right outer join is written as R X= S where R and S are relations. The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to the tuples in S that have no matching tuples in R. For an example consider the tables Employee and Dept and their right outer join:

Table 2.11 Employee

Name      EmpId   DeptName
Harry     3415    Finance
Sally     2241    Sales
George    3401    Finance
Harriet   2202    Sales
Tim       1123    Executive

Table 2.12 Dept

DeptName     Manager
Sales        Harriet
Production   Charles

In the resulting relation, tuples in R that have no common values in common attribute names with tuples in S take the null value, ω. Since there are no tuples in Employee with a DeptName of Production, ω occurs in the Name and EmpId attributes of the result tuple whose DeptName is Production.

Table 2.13 Employee X= Dept

Name      EmpId   DeptName     Manager
Sally     2241    Sales        Harriet
Harriet   2202    Sales        Harriet
ω         ω       Production   Charles
Full outer join - The outer join or full outer join in effect combines the results of the
left and right outer joins. The full outer join is written as R =X= S where R and S are
relations. The result of the full outer join is the set of all combinations of tuples in R
and S that are equal on their common attribute names, in addition to tuples in S that
have no matching tuples in R and tuples in R that have no matching tuples in S in their
common attribute names.
For an example consider the tables Employee and Dept and their full outer join:

Table 2.14 Employee

Name      EmpId   DeptName
Harry     3415    Finance
Sally     2241    Sales
George    3401    Finance
Harriet   2202    Sales
Tim       1123    Executive

Table 2.15 Dept

DeptName     Manager
Sales        Harriet
Production   Charles

In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take the null value, ω. Tuples in S which have no common values in common attribute names with tuples in R also take the null value, ω.

Table 2.16 Employee =X= Dept

Name      EmpId   DeptName     Manager
Harry     3415    Finance      ω
Sally     2241    Sales        Harriet
George    3401    Finance      ω
Harriet   2202    Sales        Harriet
Tim       1123    Executive    ω
ω         ω       Production   Charles
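SQL engines expose these operators directly, with NULL standing in for the fill value ω. A sketch with SQLite via Python's sqlite3 follows, using the Employee/Dept data above; only the left outer join is shown, since the right outer join is just the left outer join with the operands swapped, and FULL OUTER JOIN support depends on the SQLite version.

```python
import sqlite3

# Left outer join with SQLite; NULL (None in Python) plays the role of ω.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, empid INTEGER, deptname TEXT);
CREATE TABLE dept (deptname TEXT, manager TEXT);
INSERT INTO employee VALUES ('Harry',3415,'Finance'),('Sally',2241,'Sales'),
    ('George',3401,'Finance'),('Harriet',2202,'Sales'),('Tim',1123,'Executive');
INSERT INTO dept VALUES ('Sales','Harriet'),('Production','Charles');
""")
left = conn.execute("""
    SELECT e.name, e.empid, e.deptname, d.manager
    FROM employee e LEFT OUTER JOIN dept d ON e.deptname = d.deptname
""").fetchall()
# Finance and Executive employees get a NULL manager, as in Table 2.10.
unmatched = [row for row in left if row[3] is None]
print(len(left), len(unmatched))  # 5 3
```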
5. Division operation - The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. For an example see the tables Completed, DBProject and their division:

Table 2.18 Completed

Student   Task
Fred      Database1
Fred      Database2
Fred      Compiler1
Eugene    Database1
Eugene    Compiler1
Sara      Database1
Sara      Database2

Table 2.19 DBProject

Task
Database1
Database2

Completed ÷ DBProject

Student
Fred
Sara
Let r(R) and s(S) be relations, with S ⊆ R. The relation r ÷ s is a relation on scheme R − S. A tuple t is in r ÷ s if for every tuple ts in s there is a tuple tr in r satisfying both of the following:

tr[R − S] = t[R − S]
tr[S] = ts[S]

These conditions say that the R − S portion of a tuple t is in r ÷ s if and only if there are tuples in r with that R − S portion and the S portion ts, for every value of the S portion appearing in relation s.
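The Completed ÷ DBProject example can be sketched in a few lines: keep each Student who has completed every Task listed in DBProject. The relations and the helper divide below are our own modelling of Tables 2.18 and 2.19.

```python
# A sketch of Completed ÷ DBProject: keep each Student for whom every
# (Student, Task) combination with the tasks in DBProject is in Completed.
completed = {
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"),
    ("Sara", "Database1"), ("Sara", "Database2"),
}
dbproject = {"Database1", "Database2"}

def divide(r, s):
    # The attribute unique to r is the first component (Student) here.
    students = {stu for stu, _ in r}
    return {stu for stu in students
            if all((stu, task) in r for task in s)}

print(sorted(divide(completed, dbproject)))  # ['Fred', 'Sara']
```

Eugene is excluded because (Eugene, Database2) is missing from Completed, matching the result table above.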
6. Assignment operation - Sometimes it is useful to be able to write a relational algebra expression in parts using a temporary relation variable. The assignment operation, denoted ←, works like assignment in a programming language.
We could rewrite our division definition as

temp1 ← ∏ R−S (r)
temp2 ← ∏ R−S ((temp1 × s) − ∏ R−S, S (r))
result ← temp1 − temp2

No extra relation is added to the database, but the relation variable created can be used in subsequent expressions. Assignment to a permanent relation would constitute a modification to the database.
2.6 Tuple Relational Calculus - The tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational model in order to give a declarative database query language for this data model. The tuple relational calculus is a nonprocedural language. (The relational algebra was procedural.) We must provide a formal description of the information desired. A query in the tuple relational calculus is expressed as { t | P(t) }, i.e. the set of tuples t for which predicate P is true. We also use the notation
o t[a] to indicate the value of tuple t on attribute a.
o t є r to show that tuple t is in relation r.
Example Queries
For example, to find the branch-name, loan number, customer name and amount for loans over $1200:

{ t | t є borrow ∧ t[amount] > 1200 }
This gives us all attributes, but suppose we only want the customer names. (We would use project in the algebra.) We need to write an expression for a relation on scheme (cname):

{ t | ∃ s є borrow (t[cname] = s[cname] ∧ s[amount] > 1200) }

In English, we may read this expression as “the set of all tuples t such that there exists a tuple s in the relation borrow for which the values of t and s for the cname attribute are equal, and the value of s for the amount attribute is greater than 1200.”
The notation ∃ t є r (Q(t)) means “there exists a tuple t in relation r such that predicate Q(t) is true”. Consider another example: Find all customers having a loan from the SFU branch, and the cities in which they live:

{ t | ∃ s є borrow (t[cname] = s[cname] ∧ s[bname] = “SFU”) ∧ ∃ u є customer (u[cname] = t[cname] ∧ t[ccity] = u[ccity]) }

In English, we might read this as “the set of all (cname, ccity) tuples for which cname is a borrower at the SFU branch, and ccity is the city of cname”. Tuple variable s ensures that the customer is a borrower at the SFU branch. Tuple variable u is restricted to pertain to the same customer as t, and also ensures that ccity is the city of the customer.
The logical connectives ∧ (and) and ∨ (or) are allowed, as well as ¬ (negation). We also use the existential quantifier ∃ and the universal quantifier ∀.
Formal Definition
A tuple relational calculus expression is of the form { t | P(t) } where P is a formula. Several tuple variables may appear in a formula.
Tuple variable : A tuple variable is said to be a free variable unless it is quantified by a ∃ or a ∀. If it is quantified by a ∃ or a ∀, it is said to be a bound variable.
Formula : A formula is built of atoms. An atom is one of the following forms:
o s є r, where s is a tuple variable and r is a relation (use of the ∉ operator is not allowed).
o s[x] θ u[y], where s and u are tuple variables, x and y are attributes, and θ is a comparison operator (<, ≤, =, ≠, >, ≥).
o s[x] θ c, where c is a constant in the domain of attribute x.
Formulae are built up from atoms using the following rules:
o An atom is a formula.
o If P is a formula, then so are ¬P and (P).
o If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2 and P1 ⇒ P2.
o If P(s) is a formula containing a free tuple variable s, then
∃ s є r (P(s)) and ∀ s є r (P(s))
are also formulae.
Important equivalences:
o P1 ∧ P2 ≡ ¬(¬P1 ∨ ¬P2)
o ∀ t є r (P(t)) ≡ ¬ ∃ t є r (¬P(t))
o P1 ⇒ P2 ≡ ¬P1 ∨ P2
Safety of Expressions
A tuple relational calculus expression may generate an infinite relation, e.g.

{ t | ¬(t є borrow) }

There are an infinite number of tuples that are not in borrow. Most of these tuples contain values that do not even appear in the database. So we have to restrict the relational calculus.
Safe Tuple Expressions
The domain of a formula P, denoted dom(P), is the set of all values referenced in P. We say an expression { t | P(t) } is safe if all values that appear in the result are values from dom(P). A safe expression yields a finite number of tuples as its result. Otherwise, it is called unsafe. The tuple relational calculus restricted to safe expressions is equivalent in expressive power to the relational algebra.
2.7 Domain Relational Calculus - The domain relational calculus (DRC) is a calculus that was introduced by Edgar F. Codd as a declarative database query language for the relational data model. This language uses the same operators as tuple calculus: the logical operators Λ (and), V (or) and ¬ (not). The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the variables.
Formal Definition
An expression is of the form

{ <x1, x2, …, xn> | P(x1, x2, …, xn) }

where the xi represent domain variables and P is a formula.
An atom in the domain relational calculus is of the following forms :
o <x1, x2, …., xn> є r where r is a relation on n attributes, and xi, 1 ≤ i ≤ n, are domain variables or constants.
o x θ y , where x and y are domain variables, and θ is a comparison operator.
o x θ c , where c is a constant.
Formulae are built up from atoms using the following rules:
o An atom is a formula.
o If P is a formula, then so are ¬P and (P).
o If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2 and P1 ⇒ P2.
o If P(x) is a formula containing a free domain variable x, then
∃ x (P(x)) and ∀ x (P(x))
are also formulae.
Example Queries
Find branch name, loan number, customer name and amount for loans of over $1200.

{ <b, l, c, a> | <b, l, c, a> є borrow ∧ a > 1200 }

Find all customers who have a loan for an amount greater than $1200.

{ <c> | ∃ b, l, a (<b, l, c, a> є borrow ∧ a > 1200) }
Find all customers having a loan from the SFU branch, and the city in which they live.
Find all customers having a loan, an account or both at the SFU branch.
Find all customers who have an account at all branches located in Brooklyn.
Safety of Expressions
We say that an expression
{ < x1, x2,…..,xn > | P (x1, x2,….xn)} is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P).
2. For every “there exists” subformula of the form ∃x (P1(x)), the subformula is true if and only if there is a value x in dom(P1) such that P1(x) is true.
3. For every “for all” subformula of the form ∀x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).
An expression such as { <b, l, a> | ¬(<b, l, a> є loan)} is unsafe because it allows values in the result that are not in the domain of the expression.
All three of the following are equivalent:
o The relational algebra. o The tuple relational calculus restricted to safe expressions. o The domain relational calculus restricted to safe expressions.
2.8 SQL – SQL has become the standard relational database language. It has several parts:
o Data definition language (DDL) - provides commands to define relation schemes, delete relations, create indices and modify schemes.
o Interactive data manipulation language (DML) - a query language based on both relational algebra and tuple relational calculus, plus commands to insert, delete and modify tuples.
o Embedded data manipulation language - for use within programming languages like C, PL/1, Cobol, Pascal, etc.
o View definition - commands for defining views.
o Authorization - specifying access rights to relations and views.
o Integrity - a limited form of integrity checking.
o Transaction control - specifying beginning and end of transactions.
Basic Structure
Basic structure of an SQL expression consists of select, from and where clauses.
A typical SQL query has the form:

select A1, A2, …, An
from r1, r2, …, rm
where P

Each Ai represents an attribute, and each ri a relation. P is a predicate. This query is equivalent to the algebra expression

Π A1, A2, …, An (σ P (r1 x r2 x … x rm))

If the where clause is omitted, the predicate P is true. The list of attributes can be replaced with a * to select all attributes. The result of an SQL query is a relation.
The select clause - corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query. If we want to remove duplicates in a selection, we use the keyword distinct after select. The keyword all is used to specify explicitly that duplicates are not removed.

select *

means select all the attributes. The select clause can also contain arithmetic expressions involving the operators +, -, * and / and operating on constants or attributes of tuples.
Eg: 1. select branch-name from loan
2. select branch-name, loan-number, amount * 100 from loan
The where clause - corresponds to selection predicate in relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause. SQL uses the logical connectives and, or and not - rather than mathematical symbols Λ, V and ¬ in the where clause. The operands of the logical connectives can be expressions involving the comparison operators <, >, ≤, ≥, = and <>. SQL includes a between comparison operator to simplify where clauses that specify that a value be less than or equal to some value or greater than or equal to some other value.
Eg: select loan-number from loan where amount between 90000 and 100000
The from clause - corresponds to Cartesian product of the relational algebra. It lists the relations to be scanned in the evaluation of the expression.
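The select, where and from clauses above can be exercised end-to-end; the sketch below runs them with SQLite through Python's sqlite3, against a hypothetical loan(branch_name, loan_number, amount) table whose contents are invented for illustration.

```python
import sqlite3

# A runnable version of the select-from-where examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loan (branch_name TEXT, loan_number TEXT, amount INTEGER);
INSERT INTO loan VALUES ('Downtown','L-17',1000),('Redwood','L-23',2000),
    ('Perryridge','L-15',95000),('Downtown','L-14',1500);
""")
# distinct removes the duplicate 'Downtown' from the projection.
branches = conn.execute(
    "SELECT DISTINCT branch_name FROM loan").fetchall()
# between abbreviates amount >= 90000 AND amount <= 100000.
big = conn.execute(
    "SELECT loan_number FROM loan WHERE amount BETWEEN 90000 AND 100000"
).fetchall()
print(sorted(branches), big)
```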
The rename operation – SQL provides a mechanism for renaming both relations and attributes. It uses the as clause, taking the form: old-name as new-name.
String operations - The most commonly used operation on strings is pattern matching using the operator like. We describe patterns using two special characters:
Percent (%) – The % character matches any substring. Underscore ( _ ) – The _ character matches any character.
Patterns are case-sensitive. The keyword escape is used to define the escape character. We can use not like to search for strings that do not match a pattern.
Ordering the display of tuples - SQL allows the user to control the order in which tuples are displayed.
o order by makes tuples appear in sorted order (ascending order by default).
o desc specifies descending order.
o asc specifies ascending order.
Set operations - SQL has the set operations union, intersect and except. union eliminates duplicates, being a set operation. If we want to retain duplicates, we may use union all, similarly for intersect and except.
Not all implementations of SQL have these set operations. except in SQL-92 is called minus in SQL-86.
Aggregate functions - In SQL we can compute functions on groups of tuples using the group by clause. Attributes given are used to form groups with the same values. SQL can then compute
o average value -- avg
o minimum value -- min
o maximum value -- max
o total sum of values -- sum
o number in group -- count
These are called aggregate functions. They return a single value. The having clause is used to state conditions that apply to groups rather than to tuples. Predicates in the having clause are applied after the formation of groups. If a where clause and a having clause appear in the same query, the where clause predicate is applied first. Tuples satisfying the where clause are placed into groups by the group by clause. The having clause is applied to each group. Groups satisfying the having clause are used by the select clause to generate the result tuples. If no having clause is present, the tuples satisfying the where clause are treated as a single group.
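The where-then-group-then-having pipeline just described can be observed directly; a sketch with SQLite, using a hypothetical account(branch_name, balance) table invented for illustration:

```python
import sqlite3

# Aggregates with GROUP BY and HAVING: WHERE filters tuples first,
# then groups are formed, then HAVING filters whole groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (branch_name TEXT, balance INTEGER);
INSERT INTO account VALUES ('Downtown',500),('Downtown',700),
    ('Redwood',300),('Redwood',900),('Perryridge',400);
""")
rows = conn.execute("""
    SELECT branch_name, AVG(balance), COUNT(*)
    FROM account
    WHERE balance > 350              -- applied before grouping
    GROUP BY branch_name
    HAVING COUNT(*) >= 2             -- applied to each group
""").fetchall()
print(rows)  # [('Downtown', 600.0, 2)]
```

Redwood and Perryridge each retain only one tuple after the where clause, so the having predicate eliminates their groups and only Downtown survives.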
Null values – The keyword null is used to test for a null value(absence of information about the value of an attribute).
2.9 Views in SQL - A view in SQL is defined using the create view command: create view v as <query expression>, where <query expression> is any legal query expression. The view created is given the name v. To create a view all-customer of all branches and their customers:
create view all-customer as
(select bname, cname from depositor, account where depositor.account# = account.account#) union
(select bname, cname from borrower, loan where borrower.loan# = loan.loan#)
Having defined a view, we can now use it to refer to the virtual relation it creates. View names can appear anywhere a relation name can.
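A runnable sketch of the all-customer idea follows, with SQLite via sqlite3. Note the simplification: the depositor and borrower tables here carry the branch name directly (a schema assumed for illustration), rather than joining through account and loan as in the definition above.

```python
import sqlite3

# A simplified all_customer view: branch/customer pairs from either side.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depositor (bname TEXT, cname TEXT);
CREATE TABLE borrower (bname TEXT, cname TEXT);
INSERT INTO depositor VALUES ('SFU','Smith'),('Downtown','Jones');
INSERT INTO borrower VALUES ('SFU','Hayes'),('SFU','Smith');
CREATE VIEW all_customer AS
    SELECT bname, cname FROM depositor
    UNION
    SELECT bname, cname FROM borrower;
""")
# The view name can be used wherever a relation name can.
sfu = conn.execute(
    "SELECT cname FROM all_customer WHERE bname = 'SFU' ORDER BY cname"
).fetchall()
print(sfu)  # [('Hayes',), ('Smith',)]
```

Smith appears once even though she is both a depositor and a borrower, because union eliminates duplicates.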
2.10 Data manipulations
Insert – It is used to insert a single tuple to a relation. To insert data into a relation, we either specify a tuple, or write a query whose result is the set of tuples to be inserted. Attribute values for inserted tuples must be members of the attribute's domain.
Eg: To insert a tuple for Smith who has $1200 in account A-9372 at the SFU branch.
insert into account values (“SFU”, “A-9372”, 1200)
It is important that we evaluate the select statement fully before carrying out any insertion. If some insertions were carried out even as the select statement were being evaluated, the insertion might insert an infinite number of tuples. Evaluating the select statement completely before performing insertions avoids such problems. It is possible for inserted tuples to be given values on only some attributes of the schema. The remaining attributes are assigned a null value denoted by null. We can prohibit the insertion of null values using the SQL DDL.
Delete – The delete command removes tuples from a relation. Deletion is expressed in much the same way as a query. Instead of displaying, the selected tuples are removed from the database. We can only delete whole tuples. A deletion in SQL is of the form delete from r where P. Tuples in r for which P is true are deleted. If the where clause is omitted, all tuples are deleted. We may only delete tuples from one relation at a time, but we may reference any number of relations in a select-from-where clause embedded in the where clause of a delete. However, if the delete request contains an embedded select that references the relation from which tuples are to be deleted, ambiguities may result.
Update - Updating allows us to change some values in a tuple without necessarily changing all. where clause of update statement may contain any construct legal in a where clause of a select statement (including nesting). A nested select within an update may reference the relation that is being updated. As before, all tuples in the relation are first tested to see whether they should be updated, and the updates are carried out afterwards.
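The three DML statements can be seen together in a short sketch with SQLite; the account(bname, acct_no, balance) schema is assumed for illustration.

```python
import sqlite3

# insert, update and delete against a small account table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (bname TEXT, acct_no TEXT, balance INTEGER)")
# insert: add single tuples to the relation.
conn.execute("INSERT INTO account VALUES ('SFU', 'A-9372', 1200)")
conn.execute("INSERT INTO account VALUES ('Downtown', 'A-101', 500)")

# update: change some values in a tuple without replacing the whole tuple.
conn.execute("UPDATE account SET balance = balance + 60 WHERE bname = 'SFU'")

# delete: remove whole tuples satisfying the predicate.
conn.execute("DELETE FROM account WHERE balance < 600")

rows = conn.execute("SELECT bname, balance FROM account").fetchall()
print(rows)  # [('SFU', 1260)]
```

As the text notes, all tuples are tested against the update predicate first and the updates applied afterwards; the delete then removes the one remaining low-balance tuple.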
Update of a view - The view update exists also in SQL. An example will illustrate: Consider a clerk who needs to see all information in the loan relation except amount. Let the view branch-loan be given to the clerk: create view branch-loan as select bname, loan# from loan
Since SQL allows a view name to appear anywhere a relation name may appear, the clerk can write: insert into branch-loan values (“SFU”, “L-307”). This insertion is represented by an insertion into the actual relation loan, from which the view is constructed. However, we have no value for amount. This insertion results in (“SFU'', “L-307”, null) being inserted into the loan relation.
MODULE 3
3.1 Transaction and system preliminaries.
The concept of transaction has been devised as a convenient and precise way
of describing the various logical units that form a database system. We have
transaction systems, which are systems that operate on very large databases on
which several users (sometimes hundreds) operate concurrently – i.e. they
manipulate the database through transactions. There are several such systems presently in
operation in our country also – if you consider the railway reservation system,
wherein thousands of stations, each with multiple computers, operate on a
huge database, the database containing the reservation details of all trains of our
country for the next several days. There are many other such systems like the airlines
reservation systems, distance banking systems, stock market systems etc. In all these
cases apart from the accuracy and integrity of the data provided by the database (note
that money is involved in almost all the cases – either directly or indirectly), the
systems should provide instant availability and fast response to these hundreds of
concurrent users. In this block, we discuss the concept of transaction, the problems
involved in controlling concurrently operated systems and several other related
concepts. We repeat – a transaction is a logical operation on a database and the users
intend to operate with these logical units trying either to get information from the
database and in some cases modify them. Before we look into the problem of
concurrency, we view the concept of multiuser systems from another point of view –
the view of the database designer.
3.1.1 A typical multiuser system
We remind ourselves that a multiuser computer system is a system that can be
used by a number of persons simultaneously as against a single user system,
which is used by one person at a time. (Note however, that the same system can be
used by different persons at different periods of time). Now extending this
concept to a database, a multiuser database is one which can be accessed and
modified by a number of users simultaneously – whereas a single user database is
one which can be used by only one person at a time. Note that multiuser
databases essentially mean there is a concept of multiprogramming but the
converse is not true. Several users may be operating simultaneously, but not all of
them may be operating on the database simultaneously.
Now, before we see what problems can arise because of concurrency, we see
what operations can be done on the database. Such operations can be single line
commands or can be a set of commands meant to be operated sequentially. Those
operations are invariably limited by the “begin transaction” and “end transaction”
statements and the implication is that all operations in between them are to be done on
a given transaction.
Another concept is the “granularity” of the transaction. Assume each field in a
database is named. The smallest such named item of the database can be called a
field of a record. The unit on which we operate can be one such “grain” or a number
of such grains collectively defining some data unit. However, in this course, unless
specified otherwise, we use “single grain” operations, but without loss of
generality. To facilitate discussions, we presume a database package in which the
following operations are available.
i) read_tr(X): The operation reads the item X and stores it into an assigned
variable. The name of the variable into which it is read can be anything,
but we would give it the same name X, so that confusions are avoided. I.e.
whenever this command is executed the system reads the element required
from the database and stores it into a program variable called X.
ii) write_tr(X): This writes the value of the program variable currently
stored in X into a database item called X.
Once read_tr(X) is encountered, the system will have to perform the
following operations.
1. Find the address of the block on the disk where X is stored.
2. Copy that block into a buffer in the memory.
3. Copy it into a variable (of the program) called X.
A write_tr(X) performs the converse sequence of operations.
1. Find the address of the disk block where the database variable X is stored.
2. Copy the block into a buffer in the memory.
3. Copy the value of X from the program variable to this X.
4. Store this updated block back to the disk.
Normally however, the operation (4) is not performed every time a write –tr is
executed. It would be a wasteful operation to keep writing back to the disk every
time. So the system maintains one/more buffers in the memory which keep getting
updated during the operations and this updated buffer is moved on to the disk at
regular intervals. This would save a lot of computational time, but is at the heart of
some of the problems of concurrency that we will have to encounter.
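The buffered behaviour described above can be sketched in a few lines. This is a minimal illustration, not an actual DBMS implementation; the class name `BufferedStore` and its methods are assumptions of this sketch.

```python
# Sketch of buffered read_tr/write_tr: blocks are fetched from "disk" into an
# in-memory buffer, updates go to the buffer only, and dirty items are written
# back to disk at intervals (here, explicitly via flush()).

class BufferedStore:
    def __init__(self, disk):
        self.disk = dict(disk)   # item -> value, stands in for disk blocks
        self.buffer = {}         # in-memory copies of fetched items
        self.dirty = set()       # items updated but not yet written to disk

    def read_tr(self, x):
        # Steps 1-3 of read: locate the item, copy it into the buffer, return it
        if x not in self.buffer:
            self.buffer[x] = self.disk[x]
        return self.buffer[x]

    def write_tr(self, x, value):
        # Steps 1-3 of write: update only the in-memory copy;
        # step 4 (writing back to disk) is deferred to flush()
        self.buffer[x] = value
        self.dirty.add(x)

    def flush(self):
        # Performed at regular intervals, not on every write_tr
        for x in self.dirty:
            self.disk[x] = self.buffer[x]
        self.dirty.clear()

store = BufferedStore({"X": 10})
v = store.read_tr("X")
store.write_tr("X", v - 2)
assert store.disk["X"] == 10   # disk still holds the old value before flush
store.flush()
assert store.disk["X"] == 8    # updated value reaches the disk only now
```

The gap between the buffer update and the flush is exactly the window in which the concurrency problems discussed next can arise.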
3.1.2 The need for concurrency control
Let us visualize a situation wherein a large number of users (probably spread
over vast geographical areas) are operating on a concurrent system. Several problems
can occur if they are allowed to execute their transaction operations in an
uncontrolled manner.
Consider a simple example of a railway reservation system. Since a number
of people are accessing the database simultaneously, it is obvious that multiple copies
of the transactions are to be provided so that each user can go ahead with his
operations. Let us make the concept a little more specific. Suppose we are
considering the number of reservations in a particular train of a particular date. Two
persons at two different places are trying to reserve for this train. By the very
definition of concurrency, each of them should be able to perform the operations
irrespective of the fact that the other person is also doing the same. In fact they will
not even know that the other person is also booking for the same train. The only way
of ensuring the same is to make available to each of these users their own copies to
operate upon and finally update the master database at the end of their operation.
Now suppose there are 10 seats available. Both the persons, say A and B,
want to get this information and book their seats. Since they are to be accommodated
concurrently, the system provides them two copies of the data. The simple way is to
perform a Read_tr(X) so that the value of X is copied on to the variable X of person
A (let us call it XA) and of person B (XB). So each of them knows that there are 10
seats available.
Suppose A wants to book 8 seats. Since the number of seats he wants is (say
Y) less than the available seats, the program can allot him the seats, change the
number of available seats (X) to X-Y and can even give him the seat numbers that
have been booked for him.
The problem is that a similar operation can be performed by B also. Suppose
he needs 7 seats. So, he gets his seven seats, replaces the value of X to 3 (10 – 7) and
gets his reservation.
The problem is noticed only when these blocks are returned to main database
(the disk in the above case).
Before we can analyse these problems, we look at the problem from a more
technical view.
1 The lost update problem: This problem occurs when two transactions that access
the same database items have their operations interleaved in such a way as to make
the value of some database item incorrect. Suppose the transactions T1 and T2 are
submitted at (approximately) the same time. Because of the concept of interleaving,
each operation is executed for some period of time and then the control is passed on to
the other transaction, and this sequence continues. Because of the delay in the updates,
this creates a problem. This was what happened in the previous example. Let the
transactions be called TA and TB.
TA                              TB
Read_tr(X)
                                Read_tr(X)          Time
X = X - NA
                                X = X - NB
Write_tr(X)
                                Write_tr(X)

Fig 1, Fig 2
Note that the problem occurred because the transaction TB read X before TA
had recorded its update, and since TB did the writing later on, the update of TA was
overwritten and lost.
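The lost update interleaving above can be traced in a few lines. This is an illustrative sketch following the reservation example; the variable names xa and xb stand for the private copies XA and XB.

```python
# Simulation of the lost update problem: TA and TB each copy X into their own
# variable, subtract their bookings, and write back. TB's later write
# overwrites TA's, so TA's update is lost.

db = {"X": 10}          # 10 seats available

xa = db["X"]            # TA: Read_tr(X)
xb = db["X"]            # TB: Read_tr(X), interleaved before TA writes

xa = xa - 8             # TA books NA = 8 seats
xb = xb - 7             # TB books NB = 7 seats

db["X"] = xa            # TA: Write_tr(X) -> X becomes 2
db["X"] = xb            # TB: Write_tr(X) -> X becomes 3, TA's update is lost

assert db["X"] == 3     # 15 seats were booked, yet the database says 3 remain
```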
2 The temporary update (Dirty read) problem
This happens when a transaction TA updates a data item, but later on (for some
reason) the transaction fails. It could be due to a system failure or any other
operational reason. Or the system may have later on noticed that the operation should
not have been done and cancels it. To be fair, it also ensures that the original value is
restored.
But in the meanwhile, another transaction TB has accessed the data and since it
has no indication as to what happened later on, it makes use of this data and goes
ahead. Once the original value is restored by TA, the values generated by TB are
obviously invalid.
TA                              TB
Read_tr(X)                                          Time
X = X - N
Write_tr(X)
                                Read_tr(X)
                                X = X - N
                                Write_tr(X)
Failure
X = X + N
Write_tr(X)

Fig 3
The value generated by TA, belonging to a transaction that did not survive, is
“dirty data”; when it is read by TB, it produces an invalid result. Hence the
problem is called the dirty read problem.
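The dirty read scenario of Fig 3 can be traced the same way. An illustrative sketch, not from the text:

```python
# Simulation of the dirty read problem: TA writes X, TB reads the uncommitted
# ("dirty") value and acts on it, then TA fails and restores X.

db = {"X": 10}
N = 4

db["X"] = db["X"] - N           # TA: X = X - N; Write_tr(X), not yet committed

dirty_value = db["X"]           # TB: Read_tr(X) sees the dirty value 6

db["X"] = db["X"] + N           # TA fails; the system restores the old value

assert db["X"] == 10            # the database is back to the original value
assert dirty_value == 6         # but TB has already used the invalid value
```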
3 The Incorrect Summary Problem: Consider two concurrent transactions, again
called TA and TB. TB is calculating a summary (average, standard deviation or some
such operation) by accessing all elements of a database (note that it is not updating
any of them; it only reads them and uses the resultant data to calculate some
values). In the meanwhile, TA is updating these values. In this case, since the
operations are interleaved, TB, for some of its operations, will be using the
not-yet-updated data, whereas for the other operations it will be using the updated
data. This is called the incorrect summary problem.
TA                              TB
                                Sum = 0
                                Read_tr(A)
                                Sum = Sum + A
Read_tr(X)
X = X - N
Write_tr(X)
                                Read_tr(X)
                                Sum = Sum + X
                                Read_tr(Y)
                                Sum = Sum + Y
Read_tr(Y)
Y = Y - N
Write_tr(Y)

Fig 4

In the above example, TA updates both X and Y. But since it first updates X and
then Y, and the operations are so interleaved that the transaction TB uses both of them
in between the operations, TB ends up using the old value of Y with the new value of
X. In the process, the sum we get refers neither to the old set of values nor to the new
set of values.
4 Unrepeatable read: This can happen when an item is read by a transaction twice,
(in quick succession) but the item has been changed in the meanwhile, though the
transaction has no reason to expect such a change. Consider the case of a reservation
system, where a passenger gets a reservation detail and before he decides on the
aspect of reservation the value is updated at the request of some other passenger at
another place.
3.1.4 The concept of failures and recovery
No database operation can be immune to failures of the system on which it operates
(both the hardware and the software, including the operating system). The system
should ensure that any transaction submitted to it is terminated in one of the following
ways.
a) All the operations listed in the transaction are completed, the
changes are recorded permanently back to the database and the
database is informed that the operations are complete.
b) In case the transaction has failed to achieve it’s desired objective,
the system should ensure that no change, whatsoever, is reflected
onto the database. Any intermediate changes made to the database
are restored to their original values, before calling off the
transaction and intimating the same to the database.
In the second case, we say the system should be able to “Recover” from the
failure. Failures can occur in a variety of ways.
i) A System Crash: A hardware, software or network error can make the
completion of the transaction an impossibility.
ii) A transaction or system error: The transaction submitted may be faulty
– like creating a situation of division by zero or creating negative
numbers which cannot be handled (for example, in a reservation
system, a negative number of seats conveys no meaning). In such cases,
the system simply discontinues the transaction by reporting an error.
iii) Some programs provide for the user to interrupt during execution. If
the user changes his mind during execution, (but before the
transactions are complete) he may opt out of the operation.
iv) Local exceptions: Certain conditions during operation may force the
system to raise what are known as “exceptions”. For example, a bank
account holder may not have sufficient balance for some transaction to
be done or special instructions might have been given in a bank
transaction that prevents further continuation of the process. In all
such cases, the transactions are terminated.
v) Concurrency control enforcement: In certain cases when concurrency
constraints are violated, the enforcement regime simply aborts the
process to restart it later.
The other reasons can be physical problems like theft, fire etc or system
problems like disk failure, viruses etc. In all such cases of failure, a recovery
mechanism is to be in place.
3.2 Transaction States and additional operations
Though the Read_tr and Write_tr operations described above are the most
fundamental operations, they are seldom sufficient. Though most operations on
databases comprise only the read and write operations, the system needs several
additional operations for its purposes. One simple example is the concept of
recovery discussed in the previous section. If the system were to recover from a crash
or any other catastrophe, it should first be able to keep track of the transactions –
when they start, when they terminate or when they abort. Hence the following
operations come into picture.
i) Begin Trans: This marks the beginning of an execution process.
ii) End trans: This marks the end of an execution process.
iii) Commit trans: This indicates that transaction is successful and the
changes brought about by the transaction may be incorporated onto the
database and will not be undone at a later date.
iv) Rollback: Indicates that the transaction is unsuccessful (for whatever
reason) and the changes made to the database, if any, by the transaction
need to be undone.
Most systems also keep track of the present status of all the transactions at the present
instant of time (Note that in a real multiprogramming environment, more than one
transaction may be in various stages of execution). The system should not only be
able to keep a tag on the present status of the transactions, but also should know what
the next possibilities for the transaction to proceed are and, in case of a failure, how to
roll it back. The whole concept takes the form of a state transition diagram. A simple
state transition diagram, in view of what we have seen so far, can appear as follows:
Begin Transaction --> Active (Read/Write operations repeat in this state)
Active -- End Transaction --> Partially committed
Partially committed -- Commit --> Committed -- Terminate --> Terminated
Active / Partially committed -- Failure (Abort) --> Failed -- Terminate --> Terminated

Fig 5
The arrow marks indicate how a state of a transaction can change to a next
state. A transaction is in an active state immediately after the beginning of execution.
Then it will be performing the read and write operations. At this state, the system
protocols begin ensuring that a system failure at this juncture does not make
erroneous recordings on to the database. Once this is done, the system “Commits”
itself to the results and thus enters the “Committed state”. Once in the committed
state, a transaction automatically proceeds to the terminated state.
The transaction may also fail due to a variety of reasons discussed in a
previous section. Once it fails, the system may have to take up error control exercises
like rolling back the effects of the previous write operations of the transaction. Once
this is completed, the transaction enters the terminated state to pass out of the system.
A failed transaction may be restarted later – either by the intervention of the
user or automatically.
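The transitions of Fig 5 can be sketched as a small table-driven state machine. The state and event names below are assumptions of this illustration, chosen to mirror the figure.

```python
# Transition table for the transaction states of Fig 5: a transaction begins
# in the active state; End moves it to partially committed, Commit to
# committed, and Terminate out of the system. A failure from the active or
# partially committed states leads to the failed state, then to terminated.

TRANSITIONS = {
    ("active", "read/write"): "active",
    ("active", "end"): "partially committed",
    ("active", "fail"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "fail"): "failed",
    ("committed", "terminate"): "terminated",
    ("failed", "terminate"): "terminated",
}

def run(events):
    state = "active"   # a transaction is active immediately after Begin
    for e in events:
        state = TRANSITIONS[(state, e)]
    return state

assert run(["read/write", "end", "commit", "terminate"]) == "terminated"
assert run(["read/write", "fail", "terminate"]) == "terminated"
```

Any event pair absent from the table (say, committing a failed transaction) raises a KeyError, which matches the idea that only the arrows in the diagram are legal moves.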
The concept of system log:
To be able to recover from failures of the transaction operations the
system needs to essentially maintain a track record of all transaction operations that
are taking place and that are likely to affect the status of the database. This
information is called a “System log” (Similar to the concept of log books) and may
become useful when the system is trying to recover from failures. The log
information is kept on the disk, such that it is not likely to be affected by the normal
system crashes, power failures etc. (Otherwise, when the system crashes, if the disk
also crashes, then the entire concept fails). The log is also periodically backed up into
removable devices (like tape) and is kept in archives.
The question is, what type of data or information needs to be logged into the
system log?
Let T refer to a unique transaction – id, generated automatically whenever a
new transaction is encountered and this can be used to uniquely identify the
transaction. Then the following entries are made with respect to the transaction T.
i) [Start-Trans, T] : Denotes that T has started execution.
ii) [Write-tr, T, X, old, new]: denotes that the transaction T has changed the
old value of the data X to a new value.
iii) [read_tr, T, X] : denotes that the transaction T has read the value of the X
from the database.
iv) [Commit, T] : denotes that T has been executed successfully and confirms
that effects can be permanently committed to the database.
v) [abort, T] : denotes that T has been aborted.
These entries are not complete. In some cases certain modifications to their purpose
and format are made to suit special needs.
(Note that though we have been talking that the logs are primarily useful for recovery
from errors, they are almost universally used for other purposes like reporting,
auditing etc).
The two commonly used operations are “undo” and “redo”. In undo, if the
transaction fails before the permanent data can be written back into the database, the
log details can be used to sequentially trace back the updates and return the items to
their old values. Similarly, if the transaction fails just before the commit operation is
complete, one need not report a transaction failure. One can use the old and new values
of all write operations in the log and ensure that the same are entered into the database.
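Assuming log records of the form [write_tr, T, X, old, new] as above, undo and redo can be sketched as follows. This is an illustration, not a production recovery manager; the tuple encoding of log records is an assumption.

```python
# Undo walks the log backwards restoring old values; redo walks it forwards
# re-applying new values. Each write record carries (item, old, new).

log = [
    ("start", "T1"),
    ("write", "T1", "X", 10, 6),   # old value 10, new value 6
    ("write", "T1", "Y", 20, 24),
]

def undo(db, log, t):
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] == t:
            _, _, item, old, _new = rec
            db[item] = old             # restore the old value

def redo(db, log, t):
    for rec in log:
        if rec[0] == "write" and rec[1] == t:
            _, _, item, _old, new = rec
            db[item] = new             # re-apply the new value

db = {"X": 6, "Y": 24}
undo(db, log, "T1")
assert db == {"X": 10, "Y": 20}
redo(db, log, "T1")
assert db == {"X": 6, "Y": 24}
```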
Commit Point of a Transaction:
The next question to be tackled is when should one commit to the results of a
transaction? Note that unless a transaction is committed, its operations do not get
reflected in the database. We say a transaction reaches a “Commit point” when all
operations that access the database have been successfully executed and the effects of
all such transactions have been included in the log. Once a transaction T reaches a
commit point, the transaction is said to be committed – i.e. the changes that the
transaction had sought to make in the database are assumed to have been recorded
into the database. The transaction indicates this state by writing a [commit, T] record
into its log. At this point, the log contains a complete sequence of changes brought
about by the transaction to the database and has the capacity to both undo it (in case
of a crash) or redo it (if a doubt arises as to whether the modifications have actually
been recorded onto the database).
Before we close this discussion on logs, one small clarification. The records
of the log are on the disk (secondary memory). When a log record is to be written, a
secondary device access is to be made, which slows down the system operations. So
normally a copy of the most recent log records is kept in the memory and the
updates are made there. At regular intervals, these are copied back to the disk. In
case of a system crash, only those records that have been written onto the disk will
survive. Thus, when a transaction reaches commit stage, all records must be
forcefully written back to the disk and then commit is to be executed. This concept is
called ‘forceful writing’ of the log file.
3.3 Desirable Transaction Properties (ACID properties)
For effective and smooth database operations, transactions should possess
several properties. These properties are Atomicity, Consistency preservation,
Isolation and Durability. Often, by combining their first letters, they are called ACID
properties.
i) Atomicity: A transaction is an atomic unit of processing, i.e. it cannot be
broken down further into a combination of transactions. Looking at it
another way, a given transaction will either get executed fully or is not
performed at all. There cannot be a possibility of a transaction getting
partially executed.
ii) Consistency preservation: A transaction is said to be consistency
preserving if its complete execution takes the database from one
consistent state to another.
We shall elaborate slightly on this. In a steady state, a database is expected to be
consistent, i.e. there are no anomalies in the values of the items. For example, if a
database stores N values and also their sum, the database is said to be consistent if the
addition of these N values actually leads to the value of the sum. This will be the
normal case.
Now consider the situation when a few of these N values are being changed.
Immediately after one/more values are changed, the database becomes inconsistent.
The sum value no longer corresponds to the actual sum. Only after all the updates
are done and the new sum is calculated does the system become consistent again.
A transaction should always ensure that once it starts operating on a database,
its values are made consistent before the transaction ends.
iii) Isolation: Every transaction should appear as if it is being executed in
isolation. Though, in a practical sense, a large number of such transactions
keep executing concurrently no transaction should get affected by the
operation of other transactions. Then only is it possible to operate on the
transaction accurately.
iv) Durability: The changes effected to the database by the transaction should
be permanent – they should not vanish once the transaction is removed.
These changes should also not be lost due to any other failures at later stages.
Now how does one enforce these desirable properties on the transactions? The
atomicity concept is taken care of while designing and implementing the transaction.
If, however, a transaction fails even before it can complete its assigned task, the
recovery software should be able to undo the partial effects inflicted by the
transaction onto the database.
The preservation of consistency is normally considered as the duty of the
database programmer. A “consistent state” of a database is that state which satisfies
the constraints specified by the schema. Other external constraints may also be
included to make the rules more effective. The database programmer writes his
programs in such a way that a transaction enters a database only when it is in a
consistent state and also leaves the state in the same or any other consistent state.
This, of course implies that no other transaction “interferes” with the action of the
transaction in question.
This leads us to the next concept of isolation, i.e. every transaction goes about
doing its job without being bogged down by any other transaction which may also
be working on the same database. One simple mechanism to ensure this is to make
sure that no transaction makes its partial updates available to the other transactions
until the commit state is reached. This also eliminates the temporary update problem.
However, this has been found to be inadequate to take care of several other problems.
Most database transactions today come with several levels of isolation. A transaction
is said to have level zero (0) isolation if it does not overwrite the dirty reads of
higher level transactions (level zero is the lowest level of isolation). A transaction is
said to have level 1 isolation if it does not lose any updates. At level 2, the
transaction neither loses updates nor has any dirty reads. At level 3, the highest level
of isolation, a transaction does not lose updates, has no dirty reads and, in addition,
has repeatable reads.
3.4 The Concept of Schedules
When transactions are executing concurrently in an interleaved fashion, not
only does the action of each transaction become important, but also the order of
execution of operations from each of these transactions. As an example, in some of
the problems that we have discussed earlier in this section, the problem may get itself
converted into some other form (or may even vanish) if the order of operations becomes
different. Hence, for analyzing any problem, it is not just the history of previous
transactions that one should be worrying about, but also the “schedule” of operations.
Schedule (History of transaction):
We formally define a schedule S of n transactions T1, T2 … Tn as an ordering of
the operations of the transactions, subject to the constraint that, for each transaction Ti
that participates in S, the operations of Ti must appear in the same order in which they
appear in Ti. I.e. if two operations Ti1 and Ti2 are listed in Ti such that Ti1 is earlier than
Ti2, then in the schedule also Ti1 should appear before Ti2. However, if Ti2 appears
immediately after Ti1 in Ti, the same may not be true in S, because some other
operation Tj1 (of a transaction Tj) may be interleaved between them. In short, a
schedule lists the sequence of operations on the database in the same order in which it
was effected in the first place.
For the recovery and concurrency control operations, we concentrate mainly on the
Read_tr and Write_tr operations, because these operations actually effect changes to the
database. The other two (equally) important operations are commit and abort, since
they decide when the changes effected have actually become active on the database.
Since listing each of these operations becomes a lengthy process, we adopt a notation
for describing the schedule. The operations Read_tr, Write_tr, commit and abort we
indicate by r, w, c and a, and each of them comes with a subscript to indicate the
transaction number.
For example, SA : r1(x); r2(y); w2(y); r1(y); w1(x); a1
indicates the following operations in that order:
Read_tr(x)     transaction 1
Read_tr(y)     transaction 2
Write_tr(y)    transaction 2
Read_tr(y)     transaction 1
Write_tr(x)    transaction 1
Abort          transaction 1
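This shorthand can be parsed mechanically. The sketch below is illustrative; the tuple representation (op, txn, item) is an assumption of this example, not part of the notation itself.

```python
# Parse the shorthand schedule notation: r/w/c/a, a transaction-number
# subscript, and an optional item in parentheses,
# e.g. "r1(x); r2(y); w2(y); r1(y); w1(x); a1".

import re

def parse_schedule(s):
    ops = []
    for token in s.split(";"):
        m = re.match(r"([rwca])(\d+)(?:\((\w+)\))?$", token.strip(),
                     re.IGNORECASE)
        # op letter lowercased; commit/abort entries carry item = None
        ops.append((m.group(1).lower(), int(m.group(2)), m.group(3)))
    return ops

sa = parse_schedule("r1(x); r2(y); w2(y); r1(y); w1(x); a1")
assert sa[0] == ("r", 1, "x")
assert sa[-1] == ("a", 1, None)
```

The same tuple form is convenient for the conflict and serializability checks that follow.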
Conflicting operations: Two operations in a schedule are said to be in conflict if they
satisfy these conditions:
i) The operations belong to different transactions
ii) They access the same item X
iii) At least one of the operations is a write operation.
For example: r1(x); w2(x)
             w1(x); r2(x)
             w1(y); w2(y)
conflict, because each pair operates on the same item and at least one of the two
operations is a write. But r1(x); w2(y) and r1(x); r2(x) do not conflict, because in the
first case the read and write are on different data items, and in the second case both
are trying to read the same data item, which they can do without any conflict.
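The three conflict conditions translate directly into code. Again an illustrative sketch using the assumed (op, txn, item) tuples:

```python
# Two operations conflict iff they belong to different transactions, access
# the same item, and at least one of them is a write.

def conflicts(op1, op2):
    return (op1[1] != op2[1]                        # different transactions
            and op1[2] == op2[2]                    # same item
            and "w" in (op1[0], op2[0]))            # at least one write

assert conflicts(("r", 1, "x"), ("w", 2, "x"))      # r1(x), w2(x) conflict
assert conflicts(("w", 1, "y"), ("w", 2, "y"))      # w1(y), w2(y) conflict
assert not conflicts(("r", 1, "x"), ("w", 2, "y"))  # different items
assert not conflicts(("r", 1, "x"), ("r", 2, "x"))  # both are reads
```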
A Complete Schedule: A schedule S of n transactions T1, T2…….. Tn is said to be a
“Complete Schedule” if the following conditions are satisfied.
i) The operations listed in S are exactly the same operations as in T1, T2 ……
Tn, including the commit or abort operations. Each transaction is
terminated by either a commit or an abort operation.
ii) The operations in any transaction Ti appear in the schedule in the same
order in which they appear in the transaction.
iii) Whenever there are conflicting operations, one of the two will occur before
the other in the schedule.
A “Partial order” of the schedule is said to occur, if the first two conditions of the
complete schedule are satisfied, but whenever there are non conflicting operations in
the schedule, they can occur without indicating which should appear first.
This can happen because non conflicting operations any way can be executed in any
order without affecting the actual outcome.
However, in a practical situation, it is very difficult to come across complete
schedules. This is because new transactions keep getting included into the schedule.
Hence, often one works with a “committed projection” C(S) of a schedule S. This set
includes only those operations in S that belong to committed transactions, i.e.
transactions Ti whose commit operation ci is in S.
Put in simpler terms, since uncommitted operations do not get reflected in the actual
outcome of the schedule, only those transactions that have completed their commit
operations contribute to the set, and this schedule is good enough in most cases.
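The committed projection C(S) is a one-line filter over the assumed (op, txn, item) tuples; a sketch:

```python
# C(S): keep only the operations of transactions whose commit appears in S.

def committed_projection(schedule):
    committed = {txn for (op, txn, _item) in schedule if op == "c"}
    return [o for o in schedule if o[1] in committed]

s = [("r", 1, "x"), ("r", 2, "x"), ("w", 2, "x"),
     ("c", 2, None), ("w", 1, "x")]        # T1 has not committed in S
assert committed_projection(s) == [("r", 2, "x"), ("w", 2, "x"), ("c", 2, None)]
```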
3.5 Schedules and Recoverability :
Recoverability is the ability to recover from transaction failures. The success
or otherwise of recoverability depends on the schedule of transactions. If fairly
straightforward operations without much interleaving of transactions are involved,
error recovery is a straightforward process. On the other hand, if a lot of interleaving
of different transactions have taken place, then recovering from the failure of any one
of these transactions could be an involved affair. In certain cases, it may not be
possible to recover at all. Thus, it would be desirable to characterize the schedules
based on their recovery capabilities.
To do this, we observe certain features of the recoverability and also of
schedules. To begin with, we note that any recovery process, most often involves a
“roll back” operation, wherein the operations of the failed transaction will have to be
undone. However, we also note that the roll back needs to be done only as long as the
transaction T has not committed. If the transaction T has committed once, it need not
be rolled back. The schedules that satisfy this criterion are called “recoverable
schedules” and those that do not, are called “non-recoverable schedules”. As a rule,
such non-recoverable schedules should not be permitted.
Formally, a schedule S is recoverable if no transaction T which appears in S
commits until all transactions T' that have written an item which is read by T have
committed.
The concept is a simple one. Suppose the transaction T reads an item X from
the database, completes its operations (based on this and other values) and commits
the values. I.e. the output values of T become permanent values of database.
But suppose this value X was written by another transaction T' (before it was read
by T), and T' aborts after T has committed. What happens? The values committed by T
are no more valid, because the basis of these values (namely X) itself has been
changed. Obviously T also needs to be rolled back (if possible), leading to other
rollbacks and so on.
The other aspect to note is that in a recoverable schedule, no committed
transaction needs to be rolled back. But, it is possible that a cascading roll back
scheme may have to be effected, in which an uncommitted transaction has to be rolled
back, because it read from a value contributed by a transaction which later
aborted. But such cascading rollbacks can be very time consuming because at any
instant of time, a large number of uncommitted transactions may be operating. Thus,
it is desirable to have “cascadeless” schedules, which avoid cascading rollbacks.
This can be ensured by ensuring that transactions read only those values which
are written by committed transactions i.e. there is no fear of any aborted or failed
transactions later on. If the schedule has a sequence wherein a transaction T1 has to
read a value X written by an uncommitted transaction T2, then the sequence is altered
so that the reading is postponed till T2 either commits or aborts.
This delays T1, but avoids any possibility of cascading rollbacks.
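The recoverability criterion can be checked mechanically. The sketch below assumes “T reads from T'” means T reads an item whose last writer was T', and uses the (op, txn, item) tuples of the earlier illustrations.

```python
# A schedule is recoverable if no transaction commits before every
# transaction it read from has committed.

def is_recoverable(schedule):
    last_writer = {}          # item -> transaction that last wrote it
    reads_from = {}           # txn -> set of transactions it read from
    committed = set()
    for op, txn, item in schedule:
        if op == "r" and item in last_writer and last_writer[item] != txn:
            reads_from.setdefault(txn, set()).add(last_writer[item])
        elif op == "w":
            last_writer[item] = txn
        elif op == "c":
            if not reads_from.get(txn, set()) <= committed:
                return False  # commits before a transaction it read from
            committed.add(txn)
    return True

# T2 reads X written by T1 and commits only after T1 commits: recoverable
ok = [("w", 1, "x"), ("r", 2, "x"), ("c", 1, None), ("c", 2, None)]
# T2 commits before T1 (which wrote the X it read): not recoverable
bad = [("w", 1, "x"), ("r", 2, "x"), ("c", 2, None), ("c", 1, None)]
assert is_recoverable(ok)
assert not is_recoverable(bad)
```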
The third type of schedule is a “strict schedule”, which, as the name suggests, is highly
restrictive in nature. Here, transactions are allowed neither to read nor write an item X
until the last transaction that wrote X has committed or aborted. Note that strict
schedules largely simplify the recovery process, but in many cases it may not be
possible to devise strict schedules.
It may be noted that recoverable schedules, cascadeless schedules and strict
schedules are each more stringent than their predecessor. Greater stringency facilitates
the recovery process, but sometimes the process may get delayed or may even become
impossible to schedule.
3.6 Serializability
Given two transactions T1 and T2 to be scheduled, they can be scheduled in
a number of ways. The simplest way is to schedule them without bothering
about interleaving, i.e. schedule all operations of the transaction T1 followed by
all operations of T2, or alternatively schedule all operations of T2 followed by all
operations of T1.
T1                              T2
read_tr(X)
X = X + N
write_tr(X)                                         Time
read_tr(Y)
Y = Y + N
write_tr(Y)
                                read_tr(X)
                                X = X + P
                                write_tr(X)

Fig 6 Non-interleaved (Serial Schedule): A
T1                              T2
                                read_tr(X)
                                X = X + P
                                write_tr(X)
read_tr(X)
X = X + N
write_tr(X)
read_tr(Y)
Y = Y + N
write_tr(Y)

Fig 7 Non-interleaved (Serial Schedule): B
These now can be termed as serial schedules, since the entire sequence of operation in
one transaction is completed before the next sequence of transactions is started.
In the interleaved mode, the operations of T1 are mixed with the operations of T2.
This can be done in a number of ways. Two such sequences are given below:

T1                              T2
read_tr(X)
X = X + N
                                read_tr(X)
                                X = X + P
write_tr(X)
read_tr(Y)
                                write_tr(X)
Y = Y + N
write_tr(Y)

Fig 8 Interleaved (Non-serial Schedule): C
T1                              T2
read_tr(X)
X = X + N
write_tr(X)
                                read_tr(X)
                                X = X + P
                                write_tr(X)
read_tr(Y)
Y = Y + N
write_tr(Y)

Fig 9 Interleaved (Non-serial Schedule): D
Formally, a schedule S is serial if, for every transaction T in the schedule, all
operations of T are executed consecutively; otherwise it is called non-serial. In such a
non-interleaved schedule, if the transactions are independent, one can also presume
that the schedule will be correct, since each transaction commits or aborts before the
next transaction begins. As long as the transactions individually are error free, such a
sequence of events is guaranteed to give correct results.
The problem with such a situation is the wastage of resources. If in a serial
schedule, one of the transactions is waiting for an I/O, the other transactions also
cannot use the system resources and hence the entire arrangement is wasteful of
resources. If some transaction T is very long, the other transactions will have to keep
waiting till it is completed. Moreover, in an environment wherein hundreds of
machines operate concurrently, serial scheduling becomes unthinkable. Hence, in
general, the serial scheduling concept is unacceptable in practice.
However, once the operations are interleaved so that the above cited problems
are overcome, unless the interleaving sequence is well thought out, all the problems
that we encountered at the beginning of this block can reappear. Hence, a
methodology is to be adopted to find out which of the interleaved schedules give
correct results and which do not.
A schedule S of n transactions is “serialisable” if it is equivalent to some
serial schedule of the same n transactions. Note that there are n! different serial
schedules possible to be made out of n transactions. If one goes about interleaving
them, the number of possible combinations becomes unmanageably high. To ease our
operations, we form two disjoint groups of non-serial schedules: those non-serial
schedules that are equivalent to one or more serial schedules, which we call
“serialisable schedules”, and those that are not equivalent to any serial schedule and
hence are not serialisable. Once a non-serial schedule is serialisable, it becomes
equivalent to a serial schedule and, by our previous definition of serial schedules, will
become a “correct” schedule. But now, how can one prove the equivalence of a
non-serial schedule to a serial schedule?
The simplest and most obvious method to conclude that two such
schedules are equivalent is to compare their results. If they produce the same results,
they can be considered equivalent, i.e. if two schedules are “result equivalent”,
then they can be considered equivalent. But such an oversimplification is full of
problems. Two sequences may produce the same results for one or even a large
number of initial values, but still may not be equivalent. Consider the following two
sequences:
S1                   S2
read_tr(X)           read_tr(X)
X = X + X            X = X * X
write_tr(X)          write_tr(X)
fig 10
For a value X = 2, both produce the same result. Can we conclude that they are
equivalent? Though this may look like a simplistic example, with some imagination
one can always come up with more sophisticated examples wherein the “bugs” of
treating them as equivalent are less obvious. But the concept still holds: result
equivalence cannot mean schedule equivalence. A more refined method of finding
equivalence is available. It is called “conflict equivalence”. Two schedules are said
to be conflict equivalent if the order of any two conflicting operations is the same in
both schedules. (Note that conflicting operations belong to two different transactions,
access the same data item, and at least one of them is a write_tr(X) operation.) If two
such conflicting operations appear in different orders in different schedules, then it is
obvious that they produce two different databases in the end and hence the schedules
are not equivalent.
1. Testing for conflict serializability of a schedule:
We suggest an algorithm that tests a schedule for conflict serializability.
1. For each transaction Ti participating in the schedule S, create a node
labeled Ti in the precedence graph.
2. For each case where Tj executes a read_tr(X) after Ti executes a write_tr(X),
create an edge from Ti to Tj in the precedence graph.
3. For each case where Tj executes a write_tr(X) after Ti executes a read_tr(X),
create an edge from Ti to Tj in the graph.
4. For each case where Tj executes a write_tr(X) after Ti executes a
write_tr(X), create an edge from Ti to Tj in the graph.
5. The schedule S is serialisable if and only if there are no cycles in the
graph.
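The algorithm above can be sketched in code. The schedule encoding below (a list of (transaction, operation, item) triples) and the function name are illustrative assumptions, not notation from the text.

```python
from collections import defaultdict

def is_conflict_serializable(schedule):
    """schedule: list of (txn, op, item) triples, op in {'r', 'w'}."""
    # Steps 1-4: build the precedence graph from conflicting operations.
    edges = defaultdict(set)
    txns = {t for t, _, _ in schedule}
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and x == y and 'w' in (op_i, op_j):
                edges[ti].add(tj)          # Ti must precede Tj
    # Step 5: serialisable iff the graph has no cycle (DFS colouring).
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in txns}
    def has_cycle(t):
        colour[t] = GREY
        for u in edges[t]:
            if colour[u] == GREY or (colour[u] == WHITE and has_cycle(u)):
                return True
        colour[t] = BLACK
        return False
    return not any(colour[t] == WHITE and has_cycle(t) for t in txns)

# T1 -> T2 (conflict on X) and T2 -> T1 (conflict on Y): a cycle,
# hence this interleaved schedule is not serialisable.
s = [('T1', 'r', 'X'), ('T2', 'w', 'X'), ('T2', 'r', 'Y'), ('T1', 'w', 'Y')]
print(is_conflict_serializable(s))   # False
```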
If we apply these methods to write the precedence graphs for the four cases of
section 1.8, we get the following precedence graphs.
[Fig 11: precedence graphs for Schedules A, B, C and D, each with nodes T1 and T2
and edges labelled by the conflicting item X]
We may conclude that schedule D is equivalent to schedule A.
2. View equivalence and view serializability:
Apart from the conflict equivalence of schedules and conflict serializability, another
restrictive equivalence definition has been used with reasonable success in the context
of serializability. This is called view serializability.
Two schedules S and S1 are said to be “view equivalent” if the following conditions
are satisfied.
i) The same set of transactions participates in S and S1 and S and S1
include the same operations of those transactions.
ii) For any operation ri(X) of Ti in S, if the value of X read by the
operation has been written by an operation wj(X) of Tj (or if it is the
original value of X before the schedule started), the same condition
must hold for the value of X read by operation ri(X) of Ti in S1.
iii) If the operation wk(Y) of Tk is the last operation to write the item Y in
S, then wk(Y) of Tk must also be the last operation to write the item Y
in S1.
The idea behind view equivalence is that as long as each read operation of a
transaction reads the result of the same write operation in both schedules, the write
operations of each transaction must produce the same results. Hence, the read
operations are said to see the same view in both schedules. It can easily be
verified that when S or S1 operate independently on a database with the same initial
state, they produce the same end states. A schedule S is said to be view serializable
if it is view equivalent to a serial schedule.
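The three conditions can be checked mechanically for schedules written as (transaction, operation, item) triples; this encoding and the helper names are assumptions for illustration.

```python
def reads_from(schedule):
    """For each read, record which transaction's write it reads
    ('initial' = the value before the schedule started)."""
    last_writer, result = {}, []
    for t, op, x in schedule:
        if op == 'r':
            result.append((t, x, last_writer.get(x, 'initial')))
        else:
            last_writer[x] = t
    return sorted(result)

def final_writes(schedule):
    """Condition (iii): the last transaction to write each item."""
    return {x: t for t, op, x in schedule if op == 'w'}

def view_equivalent(s1, s2):
    return (sorted(s1) == sorted(s2) and           # (i) same operations
            reads_from(s1) == reads_from(s2) and   # (ii) same reads-from
            final_writes(s1) == final_writes(s2))  # (iii) same final writes

serial = [('T1', 'r', 'X'), ('T1', 'w', 'X'), ('T2', 'r', 'X'), ('T2', 'w', 'X')]
inter  = [('T1', 'r', 'X'), ('T2', 'r', 'X'), ('T1', 'w', 'X'), ('T2', 'w', 'X')]
print(view_equivalent(serial, inter))   # False: T2 reads a different write
```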
It can also be verified that the definitions of conflict serializability and view
serializability are similar if a condition of “constrained write assumption” holds on
all transactions of the schedules. This condition states that any write operation wi(X)
in Ti is preceded by a ri(X) in Ti and that the value written by wi(X) in Ti depends
only on the value of X read by ri(X). This assumes that computation of the new value
of X is a function f(X) based on the old value of X read from the database. However,
the definition of view serializability is less restrictive than that of conflict
serializability under the “unconstrained write assumption”, where the value written by
the operation wi(X) in Ti can be independent of its old value in the database. Such a
write is called a “blind write”.
But the main problem with view serializability is that it is computationally
extremely complex, and there is no efficient algorithm to test it.
3. Uses of serializability:
If one were to prove the serializability of a schedule S, it is equivalent to saying that S
is correct. Hence, it guarantees that the schedule provides correct results. But being
serializable is not the same as being serial. A serial schedule is inefficient for the
reasons explained earlier, which lead to under-utilization of the CPU and I/O devices,
and in some cases, like mass reservation systems, serial scheduling becomes untenable.
On the other hand, a serializable schedule combines the benefits of concurrent
execution (efficient system utilization, ability to cater to a larger number of concurrent
users) with the guarantee of correctness.
But all is not well yet. The scheduling process is done by the operating system
routines after taking into account various factors like system load, time of transaction
submission, and priority of the process with reference to other processes, among a
large number of other factors. Also, since a very large number of interleaving
combinations are possible, it is extremely difficult to determine beforehand the
manner in which the transactions are interleaved. In other words, getting the various
schedules itself is difficult, let alone testing them for serializability.
Hence, instead of generating the schedules, checking them for serializability and then
using them, most DBMS protocols use a more practical method: impose restrictions
on the transactions themselves. These restrictions, when followed by every
participating transaction, automatically ensure serializability in all schedules that are
created by these participating transactions.
Also, since transactions are being submitted at different times, it is difficult to
determine when a schedule begins and when it ends. Hence serializability theory
deals with this problem by considering only the committed projection C(S) of the
schedule. Hence, as an approximation, we can define a schedule S as serializable if
its committed projection C(S) is equivalent to some serial schedule.
3.7 Locking techniques for concurrency control
Many of the important techniques for concurrency control make use of the concept
of the lock. A lock is a variable associated with a data item that describes the status of
the item with respect to the possible operations that can be done on it. Normally
every data item is associated with a unique lock. They are used as a method of
synchronizing the access of database items by the transactions that are operating
concurrently. Such controls, when implemented properly, can overcome many of the
problems of concurrent operations listed earlier. However, the locks themselves may
create a few problems, which we shall see in some detail in subsequent sections.
Types of locks and their uses:
Binary locks: A binary lock can have two states or values (1 or 0); one of them
indicates that the item is locked and the other that it is unlocked. For example, if we
presume that 1 indicates the lock is on and 0 indicates it is open, then if the lock value
of item X is 1, a read_tr(X) cannot access the item as long as the lock’s value
continues to be 1. We can refer to such a state as lock(X).
The concept works like this. The item X can be accessed only when it is free
to be used by the transactions. If, say, its current value is being modified, then X
cannot be (in fact, should not be) accessed till the modification is complete. The
simple mechanism is to lock access to X as long as the process of modification is on,
and unlock it for use by the other transactions only when the modification is
complete.
So we need two operations: lockitem(X), which locks the item, and
unlockitem(X), which opens the lock. Any transaction that wants to make use of the
data item first checks the lock status of X using lockitem(X). If the item X is
already locked (lock status = 1), the transaction will have to wait. Once the status
becomes 0, the transaction accesses the item and locks it (makes its status = 1).
When the transaction has completed using the item, it issues an unlockitem(X)
command, which again sets the status to 0 so that other transactions can access the
item.
Notice that the binary lock essentially produces a “mutually exclusive” type of
situation for the data item, so that only one transaction can access it. These operations
can be easily written as an algorithm as follows:
The Locking algorithm
lockitem(X):
start: if Lock(X) = 0        /* item is unlocked */
          then Lock(X) ← 1   /* lock it */
       else
          { wait (until Lock(X) = 0 and
            the lock manager wakes up the transaction);
            go to start }
The Unlocking algorithm:
unlockitem(X):
Lock(X) ← 0;   /* unlock the item */
{ if any transactions are waiting,
  wake up one of the waiting transactions }
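As a sketch, the two operations map naturally onto a condition variable, which here plays the role of the lock manager; the class and method names are illustrative assumptions.

```python
import threading

class BinaryLock:
    """One lock variable per data item, as in the algorithms above."""
    def __init__(self):
        self._cond = threading.Condition()
        self._locked = False              # False = 0 (unlocked), True = 1

    def lock_item(self):
        with self._cond:
            while self._locked:           # wait until Lock(X) = 0 ...
                self._cond.wait()         # ... and the manager wakes us up
            self._locked = True           # Lock(X) <- 1

    def unlock_item(self):
        with self._cond:
            self._locked = False          # Lock(X) <- 0
            self._cond.notify()           # wake one waiting transaction

lock_x = BinaryLock()
lock_x.lock_item()        # transaction T acquires X
lock_x.unlock_item()      # T releases X; any waiter is woken
```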
The only restriction on the use of binary locks is that they should be
implemented as indivisible units (also called “critical sections” in operating systems
terminology). That means no interleaving operations should be allowed once a lock
or unlock operation is started, until the operation is completed. Otherwise, if a
transaction locks a unit and gets interleaved with many other transactions, the locked
unit may remain unavailable for long periods, with catastrophic results.
To make use of the binary lock schemes, every transaction should follow certain
protocols:
1. A transaction T must issue the operation lockitem(X) before issuing a
read_tr(X) or write_tr(X).
2. A transaction T must issue the operation unlockitem(X) after all read_tr(X)
and write_tr(X) operations are complete on X.
3. A transaction T will not issue a lockitem(X) operation if it already holds
the lock on X (i.e. if it had issued the lockitem(X) in the immediate
previous instance)
4. A transaction T will not issue an unlockitem(X) operation unless it holds
the lock on X.
Between the lock(X) and unlock(X) operations, the value of X is held only
by the transaction T and hence no other transaction can operate on X, thus
many of the problems discussed earlier are prevented.
Shared/Exclusive locks
While the operation of the binary lock scheme appears satisfactory, it suffers
from a serious drawback. Once a transaction holds a lock (has issued a lock
operation), no other transaction can access the data item. But in large concurrent
systems, this can become a disadvantage. It is obvious that more than one transaction
should not go on writing into X, and that while one transaction is writing into it, no
other transaction should be reading it; but no harm is done if several transactions are
allowed to read the item simultaneously. This would save the time of all these
transactions without in any way affecting correctness.
This concept gave rise to the idea of shared/exclusive locks. When only read
operations are being performed, the data item can be shared by several transactions;
it is only when a transaction wants to write into the item that the lock should be
exclusive. Hence the shared/exclusive lock is also sometimes called a multiple-mode
lock. A read lock is a shared lock (which can be held by several transactions),
whereas a write lock is an exclusive lock. So we need to think of three operations: a
read lock, a write lock, and unlock. The algorithms can be as follows:
Read Lock Operation:
readlock(X):
start: if Lock(X) = “unlocked”
          then { Lock(X) ← “read-locked”;
                 no_of_reads(X) ← 1 }
       else if Lock(X) = “read-locked”
          then no_of_reads(X) ← no_of_reads(X) + 1
       else { wait (until Lock(X) = “unlocked” and
              the lock manager wakes up the transaction);
              go to start }
end.
The writelock operation:
writelock(X):
start: if Lock(X) = “unlocked”
          then Lock(X) ← “write-locked”
       else { wait (until Lock(X) = “unlocked” and
              the lock manager wakes up the transaction);
              go to start }
end.
The Unlock Operation:
unlock(X):
if Lock(X) = “write-locked”
   then { Lock(X) ← “unlocked”;
          wake up one of the waiting transactions, if any }
else if Lock(X) = “read-locked”
   then { no_of_reads(X) ← no_of_reads(X) – 1;
          if no_of_reads(X) = 0
             then { Lock(X) ← “unlocked”;
                    wake up one of the waiting transactions, if any } }
The algorithms are fairly straightforward, except that during the unlocking
operation, if a number of read locks are held, all of them are to be released before
the item itself becomes unlocked.
To ensure smooth operation of the shared / exclusive locking system, the
system must enforce the following rules:
1. A transaction T must issue the operation readlock(X) or writelock(X)
before any read_tr(X) operation is performed on X.
2. A transaction T must issue the operation writelock(X) before any
write_tr(X) operation is performed on X.
3. A transaction T must issue the operation unlock(X) after all read_tr(X)
and write_tr(X) operations are completed on X.
4. A transaction T will not issue a readlock(X) operation if it already holds a
readlock or writelock on X.
5. A transaction T will not issue a writelock(X) operation if it already holds a
readlock or writelock on X.
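A minimal sketch of the three operations, again using a condition variable as the lock manager (the names are illustrative assumptions, and writer starvation is not addressed):

```python
import threading

class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._state = "unlocked"        # or "read-locked" / "write-locked"
        self._readers = 0               # no_of_reads(X)

    def read_lock(self):
        with self._cond:
            while self._state == "write-locked":
                self._cond.wait()
            self._state = "read-locked"
            self._readers += 1          # one more sharing reader

    def write_lock(self):
        with self._cond:
            while self._state != "unlocked":   # exclusive: wait for all
                self._cond.wait()
            self._state = "write-locked"

    def unlock(self):
        with self._cond:
            if self._state == "write-locked":
                self._state = "unlocked"
            elif self._state == "read-locked":
                self._readers -= 1
                if self._readers == 0:  # last reader releases the item
                    self._state = "unlocked"
            self._cond.notify_all()     # wake the waiting transactions

item_x = SharedExclusiveLock()
item_x.read_lock()      # two transactions may both read X ...
item_x.read_lock()
item_x.unlock()
item_x.unlock()         # ... and X is unlocked once both finish
```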
Conversion Locks
In some cases, it is desirable to allow lock conversion by relaxing
conditions (4) and (5) of the shared/exclusive lock mechanism; i.e. if a transaction T
already holds one type of lock on an item X, it may be allowed to convert it to the
other type. For example, if it is holding a readlock on X, it may be allowed to
upgrade it to a writelock. All that the transaction does is issue a writelock(X)
operation. If T is the only transaction holding the readlock, it may be immediately
allowed to upgrade itself to a writelock; otherwise it has to wait till the other
readlocks (of other transactions) are released. Similarly, if it is holding a writelock,
T may be allowed to downgrade it to a readlock(X). The algorithms of the previous
sections can be amended to accommodate these conversion locks, and this is left as
an exercise to the students.
Before we close the section, it should be noted that the use of binary locks does
not by itself guarantee serializability. This is because, in certain combinations of
situations, a lock-holding transaction may end up unlocking the unit too early. This
can happen for a variety of reasons, including a situation wherein a transaction feels
it no longer needs a particular data unit and hence unlocks it, but may be indirectly
writing into it at a later time (through some other unit). This would result in
ineffective locking performance, and serializability is lost. To guarantee
serializability, the protocol of two-phase locking is to be implemented, which we will
see in the next section.
Two phase locking:
A transaction is said to follow two-phase locking if its operations can be divided
into two distinct phases. In the first phase, all items that are needed by the transaction
are acquired by locking them; in this phase, no item is unlocked even if its operations
on that item are over. In the second phase, the items are unlocked one after the other.
The first phase can be thought of as a growing phase, wherein the store of locks held
by the transaction keeps growing. The second phase, called the shrinking phase, is
where the number of locks held by the transaction keeps shrinking.
readlock(Y)
readtr(Y) Phase I
writelock(X)
-----------------------------------
unlock(Y)
readtr(X) Phase II
X=X+Y
writetr(X)
unlock(X)
fig12
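A transaction's operation list can be checked mechanically for this discipline: no lock may be acquired once the first unlock has been issued. The string-based encoding below is an illustrative assumption.

```python
def follows_two_phase_locking(ops):
    """ops: a transaction's operations in order, e.g. 'readlock(Y)'."""
    shrinking = False
    for op in ops:
        if op.startswith("unlock"):
            shrinking = True             # phase II has begun
        elif "lock" in op and shrinking:
            return False                 # acquiring after releasing: not 2PL
    return True

t = ["readlock(Y)", "readtr(Y)", "writelock(X)",       # phase I (growing)
     "unlock(Y)", "readtr(X)", "X=X+Y", "writetr(X)",  # phase II (shrinking)
     "unlock(X)"]
print(follows_two_phase_locking(t))     # True
bad = ["readlock(Y)", "unlock(Y)", "writelock(X)", "unlock(X)"]
print(follows_two_phase_locking(bad))   # False
```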
3.8 Query Optimization Techniques:
1. Heuristic-based query optimization – This is based on heuristic rules for ordering
the operations in a query execution strategy. In general, many different relational
algebra expressions, and hence many different query trees, can be equivalent, i.e.
they can correspond to the same query. The query parser will typically generate a
standard initial query tree corresponding to an SQL query, without doing any
optimization. The optimizer must include rules for equivalence among relational
algebra expressions that can be applied to the query. The heuristic query optimization
rules then utilize these equivalences to transform the initial tree into the final,
optimized query tree.
General transformation rules for relational algebra operations:
1. Cascade of σ : A conjunctive selection condition can be broken up into a cascade
of individual σ operations.
2. Commutativity of σ : The σ operation is commutative.
3. Cascade of П : In a cascade of П operations, all but the last one can be ignored.
4. Commutating σ with П : If the selection condition c involves only those attributes
A1, A2,…An in the projection list, the 2 operations can be commuted:
П A1, A2,..An (σc ( R) ) = σc (П A1, A2,..An ( R))
5. Commutativity of ⋈ (and X): The ⋈ operation is commutative, as is the X
operation: R ⋈ S = S ⋈ R
R X S = S X R
6. Commuting σ with ⋈ (or X): If all the attributes in the selection condition c
involve only the attributes of one of the relations being joined, say R, the two
operations can be commuted as follows:
σc (R ⋈ S) = (σc (R)) ⋈ S
Alternatively, if the selection condition c can be written as c1 and c2, where condition
c1 involves only the attributes of R and condition c2 involves only the attributes of S,
the operations commute as follows:
σc (R ⋈ S) = (σc1 (R)) ⋈ (σc2 (S))
The same rules apply if the ⋈ is replaced by a X operation.
7. Commuting П with ⋈ (or X): Suppose that the projection list is L = {A1, A2,
…, An, B1, B2, …, Bm}, where A1, A2, …, An are attributes of R and B1, B2, …,
Bm are attributes of S. If the join condition c involves only attributes in L, the two
operations can be commuted as follows:
П L (R ⋈c S) = (П A1, A2, …, An (R)) ⋈c (П B1, B2, …, Bm (S))
If the join condition c contains additional attributes not in L, these must be added to
the projection list, and a final П operation is needed; i.e. if attributes An+1, …, An+k
of R and Bm+1, …, Bm+p of S are involved in the join condition c but are not in the
projection list L, the operations commute as follows:
П L (R ⋈c S) = П L ((П A1, …, An, An+1, …, An+k (R)) ⋈c (П B1, …, Bm, Bm+1, …, Bm+p (S)))
For X, there is no condition c, so the first form of the transformation rule always
applies; simply replace ⋈c with X.
8. Commutativity of set operations: The set operations ∪ and ∩ are commutative,
but – is not.
9. Associativity of ⋈, X, ∪ and ∩: These four operations are individually
associative; i.e. if Ө stands for any of these four operations, then (R Ө S) Ө T = R Ө (S Ө T).
10. Commuting σ with set operations: The σ operation commutes with ∪, ∩ and –.
If Ө stands for any of these three operations, then σc (R Ө S) = (σc (R)) Ө (σc (S)).
11. Commuting П with ∪: П L (R ∪ S) = (П L (R)) ∪ (П L (S)).
12. Converting a (σ, X) sequence into ⋈: If the condition c of a σ that follows a X
corresponds to a join condition, convert the (σ, X) sequence into a ⋈ as follows:
σc (R X S) = (R ⋈c S)
Outline Of Heuristic Algebraic Optimization Algorithm
Based on the above mentioned rules we can now outline the steps of the algorithm as :
1. Using rule1, break up any SELECT operations with conjunctive conditions
into a cascade of SELECT operations.
2. Using rules 2, 4, 6 and 10 concerning the commutativity of SELECT with
other operations, move each SELECT operations as far down the query tree as
is permitted by the attributes involved in the select condition.
3. Using rules 5 and 9 concerning commutativity and associativity of binary
operations, rearrange the leaf nodes of the tree using the following criteria.
First, position the leaf node relations with the most restrictive SELECT
operations so they are executed first in the query tree representation. The
definition of most restrictive SELECT can mean either the ones that produce a
relation with the fewest tuples or with the smallest absolute size. Another
possibility is to define the most restrictive SELECT as the one with the
smallest selectivity. Second, make sure that the ordering of leaf nodes does not
cause CARTESIAN PRODUCT operations. For example, if the two relations with
the most restrictive SELECT do not have a direct join condition between
them, it may be desirable to change the order of leaf nodes to avoid Cartesian
products.
4. Using rule 12, combine a CARTESIAN PRODUCT operation with a
subsequent SELECT operation in the tree into a JOIN operation, if the
condition represents a join condition.
5. Using rules 3, 4, 7 and 11 concerning the cascading of PROJECT and the
commuting of PROJECT with other operations, break down and move lists of
projection attributes down the tree as far as possible by creating new
PROJECT operations as needed. Only those attributes needed in the query
result and in subsequent operations in the query tree should be kept after each
PROJECT operation.
6. Identify subtrees that represent groups of operations that can be executed by a
single algorithm.
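Step 2 above (pushing SELECT operations down the tree, via rule 6) can be sketched on a toy query-tree encoding. The tuple-based node format and the EMPLOYEE/DEPARTMENT relations are hypothetical illustrations, not taken from the text.

```python
# Node encoding (illustrative):
#   ("rel", name, attrs)      leaf relation and its attribute set
#   ("select", attrs, child)  sigma whose condition uses exactly `attrs`
#   ("join", left, right)     join (its condition is omitted for brevity)

def attrs_of(node):
    if node[0] == "rel":
        return node[2]
    if node[0] == "select":
        return attrs_of(node[2])
    return attrs_of(node[1]) | attrs_of(node[2])

def push_select(node):
    """Rule 6: move a SELECT below a JOIN when its attributes
    belong entirely to one of the joined relations."""
    if node[0] == "select":
        attrs, child = node[1], push_select(node[2])
        if child[0] == "join":
            left, right = child[1], child[2]
            if attrs <= attrs_of(left):
                return ("join", push_select(("select", attrs, left)), right)
            if attrs <= attrs_of(right):
                return ("join", left, push_select(("select", attrs, right)))
        return ("select", attrs, child)
    if node[0] == "join":
        return ("join", push_select(node[1]), push_select(node[2]))
    return node

emp = ("rel", "EMPLOYEE", {"ssn", "dno"})
dept = ("rel", "DEPARTMENT", {"dnumber", "mgrssn"})
tree = ("select", {"dno"}, ("join", emp, dept))
optimized = push_select(tree)
# The selection on dno now sits directly above EMPLOYEE, below the join,
# so fewer tuples reach the (expensive) join.
```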
2. Cost Based optimization – A query optimizer should not solely depend on
heuristic rules; it should also estimate and compare the costs of executing a query
using different execution strategies and should choose the strategy with the lowest
cost estimate. This approach is more suitable for compiled queries where the
optimization is done at compile time and the resulting execution strategy code is
stored and executed directly at run-time.
Cost Components for Query Execution
The cost of executing a query includes the following components:
1. Access cost to secondary storage: This is the cost of searching for, reading and
writing data blocks that reside on secondary storage, mainly on disk. The cost of
searching for records in a file depends on the type of access structures on that file,
such as ordering, hashing and primary or secondary indices. In addition, factors
such as whether the file blocks are allocated contiguously on the same disk
cylinder or scattered on the disk affect the access cost.
2. Storage cost: This is the cost of storing any intermediate files that are generated by
an execution strategy for the query.
3. Computation cost: This is the cost of performing in memory operations on the
data buffers during query execution. Such operations include searching for and
sorting records, merging records for a join and performing computations on field
values.
4. Memory usage cost: This is the cost pertaining to the number of memory buffers
needed during query execution.
5. Communication cost: This is the cost of shipping the query and its result from the
database site to the site or terminal where the query originated.
These components are used in cost functions that estimate query execution
cost. To estimate the costs of various execution strategies, we must keep track of the
information needed by the cost functions. This information may be stored in
the DBMS catalog, where it is accessed by the query optimizer. First, we must know
the size of each file. For a file whose records are all of the same type, the number of
records (tuples), the (average) record size and the number of blocks are needed. The
blocking factor of the file may also be needed.
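For instance, the blocking factor and block count give quick estimates of access cost in block reads. The figures below are illustrative assumptions, not values from the text.

```python
import math

def file_stats(num_records, record_size, block_size):
    """Catalog-style statistics for a file of fixed-length records."""
    bfr = block_size // record_size       # blocking factor: records per block
    b = math.ceil(num_records / bfr)      # number of blocks in the file
    return bfr, b

bfr, b = file_stats(num_records=30000, record_size=100, block_size=4096)
linear_cost = b / 2                       # average blocks read, unordered file
binary_cost = math.ceil(math.log2(b))    # ordered file, binary search on a key
print(bfr, b, linear_cost, binary_cost)  # 40 750 375.0 10
```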
3.10 Assertions
An assertion is a predicate expressing a condition that we wish the database always to
satisfy. Domain constraints and referential-integrity constraints are special forms of
assertions. There are many constraints that we cannot express using only these special
forms. Examples of such constraints include:
1. The sum of all loan amounts for each branch must be less than the sum of all
account balances at the branch.
2. Every loan has at least one customer who maintains an account with a minimum
balance of $1000.00
An assertion in SQL-92 takes the form
Create assertion <assertion-name> check <predicate>
The two constraints mentioned can be written as shown next. Since SQL does not
provide a “for all X, P(X)” construct (where P is a predicate), we are forced to
implement the construct using the equivalent “not exists X such that not P(X) ”
construct , which can be written in SQL.
1. Create assertion sum-constraint check (not exists (select * from branch
where (select sum(amount) from loan where loan.branch-
name=branch.branch-name) >= (select sum(amount) from account where
account.branch-name=branch.branch-name)))
2. Create assertion balance-constraint check (not exists (select * from loan
where not exists (select * from borrower, depositor, account where
loan.loan-number=borrower.loan-number and
borrower.customer-name=depositor.customer-name and
depositor.account-number=account.account-number and
account.balance>=1000)))
When an assertion is created, the system tests it for validity. If the assertion is valid,
then any future modification to the database is allowed only if it does not cause that
assertion to be violated.
3.10 Triggers
A trigger is a statement that is executed automatically by the system as a side
effect of a modification to the database. To design a trigger mechanism, we must meet
two requirements:
1. Specify the conditions under which the trigger is to be executed.
2. Specify the actions to be taken when the trigger executes
3.11 The basic structure of the Oracle system
An Oracle server consists of an Oracle database (the collection of stored data,
including log and control files) and the Oracle instance (the processes, including
Oracle system processes and user processes taken together, created for a specific
instance of the database operation).
Oracle Database Structure
The Oracle database has two primary structures:
1. A physical structure – referring to the actual stored data.
2. A logical structure – corresponding to an abstract representation of stored data,
which roughly corresponds to the conceptual schema of the databases.
The database contains the following types of files:
1. One or more data files; these contain the actual data.
2. Two or more log files called redo log files; these record all changes made to
data and are used in the process of recovering, if certain changes do not get
written to permanent storage.
3. One or more control files; these contain control information such as database
name, file names and locations and a database creation timestamp.
4. Trace files and an alert log; background processes have a trace file associate
with them and the alert log maintains major database events.
The structure of an Oracle database consists of the definition of database in terms of
schema objects and one or more tablespaces. The schema objects contain definitions
of tables, views, sequences, stored procedures, indexes, clusters and database links.
Oracle instance : The set of processes that constitute an instance of the server’s
operation is called an Oracle instance, which consists of a System Global Area and a
set of background processes.
System Global Area (SGA) : This area of memory is used for database
information shared by users. Oracle assigns an SGA area when an instance starts.
The SGA in turn is divided into several types of memory structures:
1. Database buffer cache: This keeps the most recently accessed data blocks from
the database. This helps in reducing the disk I/O activity.
2. Redo log buffer, which is the buffer for the redo log file and is used for
recovery purposes.
3. Shared pool, which contains shared memory constructs.
User processes : Each user process corresponds to the execution of some
application or some tool.
Program Global Area (PGA): This is a memory buffer that contains data and
control information for a server process.
Oracle processes: A process is a thread of control, i.e. a mechanism in an operating
system that can execute a series of steps. A process has its own private memory
area where it runs. Oracle creates server processes to handle requests from connected
user processes. The background processes are created for each instance of Oracle;
they perform I/O asynchronously and provide parallelism for better performance.
Oracle Startup and Shutdown: An Oracle database is not available to users until
the Oracle server has been started up and the database has been opened. Starting a
database and making it available system wide requires the following steps:
1. Starting an instance of the database: The SGA is allocated and background
processes are created in this step.
2. Mounting a database: This associates a previously started Oracle instance with a
database. Until then it is available only to administrators. The database
administrator chooses whether to run the database in exclusive or parallel mode.
When an oracle instance mounts a database in an exclusive mode, only that
instance can access the database. On the other hand, if the instance is started in a
parallel or share mode, other instances that are started in parallel mode can also
mount the database.
3. Opening a database: Opening a database makes it available for normal database
operations by having oracle open the on-line data files and log files.
The reverse of the above operations will shut down an Oracle instance as follows:
1. Close the database.
2. Dismount the database.
3. Shut down the Oracle instance.
3.12 Database structure and its manipulation in Oracle
Schema Objects: In Oracle, a schema refers to a collection of data definition objects.
Schema objects are the individual objects that describe tables, views, etc. Tables are
the basic units of data. Synonyms are direct references to objects. Program units
include functions, stored procedures and packages.
Oracle Data Dictionary: This is a read-only set of tables that keeps the metadata (the
schema description) for a database. The Oracle dictionary has the following components:
Names of users
Security information
Schema objects information
Integrity constraints
Space allocation and utilization of database objects
Statistics on attributes, tables and predicates
Access audit trail information
3.13 Storage organization in Oracle
A database is divided into logical storage units called tablespaces, with the following
characteristics:
Each database is divided into one or more tablespaces.
There is a system tablespace and one or more user tablespaces.
One or more datafiles are created in each tablespace.
The combined storage capacity of a database’s tablespace is the total storage
capacity of the database.
Data Blocks: Data Block represents the smallest unit of I/O. A data block has the
following components:
Header: Contains general block information such as block address and type of
segment.
Table directory: Contains information about tables that have data in the data
block.
Row directory: Contains information about the actual rows.
Row data: Uses the bulk of space in the data block.
Free space: Space allocated for row updates and new rows.
Extents: When a table is created, Oracle allocates it an initial extent. Incremental
extents are automatically allocated when the initial extent becomes full. All extents
allocated in index segments remain allocated as long as the index exists. When an
index associated with a table or cluster is dropped, Oracle reclaims the space.
Segments: A segment is made up of a number of extents and belongs to a tablespace.
Oracle uses the following four types of segments:
Data segments: Each nonclustered table and each cluster has a single data segment to
hold all its data, which is created when the application creates the table or cluster with
the CREATE command.
Index segments: Each index in an Oracle database has a single index segment, which
is created with the CREATE INDEX command.
Temporary segments: These are created by Oracle for use by SQL statements that
need a temporary work area.
Rollback segments: Each database must contain one or more rollback segments,
which are used for “undoing” transactions.
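As an illustrative sketch, the segments owned by the current user, together with the number of extents allocated to each, can be inspected through the standard USER_SEGMENTS dictionary view:

```sql
-- One row per data, index, temporary or rollback segment owned by the user
SELECT segment_name, segment_type, tablespace_name, extents
FROM   user_segments;
```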
3.14 Programming in PL/SQL:
PL/SQL BLOCK STRUCTURE:
PL/SQL is a block-structured language. A PL/SQL block defines a unit of
processing, which can include its own local variables, SQL statements, cursors, and
exception handlers. The blocks can be nested. The simplest block structure is given
below.
DECLARE
   Variable declarations
BEGIN
   Program statements
EXCEPTION
   WHEN exception THEN
      Exception-handling statements
END;
In the above PL/SQL block, the block parts are logical. A block starts with the
DECLARE section, in which memory variables and other Oracle objects can be
declared. The next section contains SQL executable statements for manipulating
table data by using the variables and constants declared in the DECLARE section.
EXCEPTION is the last section of the PL/SQL block; it contains SQL and/or PL/SQL
code to handle errors that may crop up during the execution of the above code block.
The EXCEPTION section is optional.
Each block can contain other blocks, i.e. blocks can be nested. Blocks of code
cannot, however, be nested in the DECLARE section.
PL/SQL CHARACTER SET
PL/SQL uses the standard ASCII set. The basic character set includes the
following.
Words used in a PL/SQL blocks are called lexical units. We can freely insert
blank spaces between lexical units in a PL/SQL blocks. The spaces have no effect
on the PL/SQL block.
The ordinary symbols used in PL/SQL blocks are
( ) + - * / < > = ; % , “ [ ] :
Compound symbols used in PL/SQL blocks are
<> != ~= ^= <= >= := ** || << >>
The basic character set is:
Uppercase alphabets A to Z
Lowercase alphabets a to z
Numbers 0 to 9
Symbols ( ) + - * / < > = ! ; : , . @ ' % " # $ ^ & _ \ { } ? [ ]
VARIABLES
Variables may be used to store the result of a query or a calculation. Variables
must be declared before being used. Variables in a PL/SQL block are named variables.
A variable name must begin with a letter and can be followed by a maximum of
29 other characters (maximum variable name length is 30 characters).
Reserved words cannot be used as variable names unless enclosed within
double quotes. Variables must be separated from each other by at least one space or
by a punctuation mark. The case (upper/lower) is insignificant in variable names,
and a space cannot be used in a variable name.
LITERALS
A literal is a numeric value or a character string used to represent itself. So,
literals can be classified into two types.
Numeric literals
Non- numeric literals (string literals)
Numeric literals:
These can be either integers or floating point numbers. If a floating point
number is being represented, the integer part must be separated from the
fractional part by a period ( . ).
Integers 25 43 437 -57 etc
Floats 6.34 25E-03 0.1 +17.1 etc
Non numeric literals:
These are represented by one or more legal characters and must be enclosed
within single quotes.
Ex: 'Hello world'
'EMPLOYEE NAME'
'*******'
'A'
'*'
We can represent single quote character itself in a non-numeric literal by writing it
twice.
Ex: 'Don''t go without saving the program'
PL/SQL will also have literals, which are called as logical (boolean) literals.
These are predetermined constants. The value it can take are TRUE, FALSE, and
NULL.
COMMENTS
A comment line begins with a double hyphen (--); the rest of the line
is treated as a comment.
Ex: -- This section performs salary updation.
A comment can also begin with a slash followed by an asterisk (/*) and
run until an asterisk followed by a slash (*/); such comments can
extend over more than one line.
Ex-1: /* this is only for user purpose
which calculates the total salary temporarily
and stores the value in temp_sal */
Ex-2: /* This takes rows from /* table EMPLOYEE */
and put on another table */
In the above example there is a comment nested within another comment;
this is not allowed in PL/SQL.
PL/SQL DATA TYPES AND DECLARATIONS:
PL/SQL supports the standard ORACLE SQL data types. The default data
types that can be declared in PL/SQL are:
NUMBER: For storing numeric data.
Syntax: variable_name NUMBER (precision [, scale])
Precision determines the number of significant digits that the NUMBER
can contain; scale determines the number of digits to the right of the
decimal point.
Ex: NUMBER (6,2) stores 4234.60
NUMBER (10) stores 3289473348
CHAR: This data type stores fixed length character data.
Syntax: Variable name CHAR (size)
where size specifies fixed length of the variable name.
Ex: CHAR (10) stores MASTERFILE
VARCHAR2: It stores variable length character string data.
Syntax: Variable name VARCHAR2 (size)
Where size specifies the maximum length of the variable name.
Ex: VARCHAR2 (20) stores TRANSACTIONFILE
DATE: The DATE data type stores a date and time.
Syntax: variable name DATE
Ex: date_of_birth DATE
BOOLEAN: This data type stores only TRUE, FALSE or NULL values.
Syntax: variable name BOOLEAN
Ex: flag BOOLEAN.
%TYPE declares a variable or constant to have the same data type as that of a
previously defined variable or of a column in a table or in a view.
NOT NULL causes creation of a variable or a constant that cannot have a NULL
value. Attempting to assign the value NULL to a variable or a constant that has
been declared NOT NULL causes an error.
NOTE: As soon as a variable or constant has been declared as NOT NULL, it must be
assigned a value. Hence every NOT NULL declaration of a variable or constant needs
to be followed by PL/SQL expression that loads a value into the variable or constant
declared.
DECLARING VARIABLES
We can declare a variable of any data type either native to the ORACLE or native to
PL/SQL. Variables are declared in the DECLARE section of the PL/SQL block.
Declaration involves the name of the variable followed by its data type. All statements
must end with a semicolon (;), which is the statement delimiter in PL/SQL. To assign
a value to a variable the assignment operator (:=) is used.
The general syntax is <Variable name> <type> [ :=<value> ];
Ex: pay NUMBER (6,2);
in_stack BOOLEAN;
name VARCHAR2 (30);
room CHAR (2);
date_of_purchase DATE;
ASSIGNING A VALUE TO A VARIABLE:
A value can be assigned to the variable in any one of the following two ways.
Using the assignment operator :=
Ex: tax := price * tax_rate;
pay := basic + da;
Selecting or fetching table data values in to variables.
Ex: SELECT sal INTO pay
FROM Employee
WHERE emp_name = ‘SMITH’;
DECLARING A CONSTANT:
Declaring a constant is similar to declaring a variable, except that you have to add
the keyword CONSTANT and immediately assign a value to it. Thereafter, no further
assignment to the constant is possible.
Ex: pf_percent CONSTANT NUMBER (3,2) := 8.33;
PICKING UP A VARIABLE’S PARAMETERS FROM A TABLE CELL
The basic building block of a table is a cell (i.e. a table column). While creating a table, the user attaches certain attributes to each column, such as a data type and constraints. These attributes can be passed on to the variables being created in PL/SQL, which simplifies the declaration of variables and constants.
For this purpose, the %TYPE attribute is used in the declaration of a
variable when the variable’s attributes must be picked from a table field (i.e. column).
Ex: current_sal employee.sal%TYPE;
In the above example, current_sal is a variable of the PL/SQL block. It gets the data
type and constraints of the column (field) sal belonging to the table Employee.
Declaring a variable with the %TYPE attribute has two advantages:
You do not need to know the data type of the table column.
If you change the parameters of the table column, the variable's parameters will
change as well.
PL/SQL allows you to use the %TYPE attribute in a nesting variable declaration.
The following example illustrates several variables defined on earlier %TYPE
declarations in a nesting fashion.
Ex: Dept_sales INTEGER;
Area_sales dept_sales %TYPE;
Group_sales area_sales %TYPE;
Regional_sales area_sales %TYPE;
Corporate_sales regional_sales %TYPE;
In case variables for an entire row of a table need to be declared, then instead
of declaring them individually, %ROWTYPE is used.
Ex: emp_row_var employee %ROWTYPE;
Here, the variable emp_row_var will be a composite variable, consisting of the
column names of the table as its members. To refer to a specific member, say sal,
the following statement will be used.
emp_row_var.sal := 5000;
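A minimal block combining %TYPE and %ROWTYPE might look like this (the Employee table with its sal and emp_name columns is assumed from the surrounding examples):

```sql
DECLARE
   emp_row  employee%ROWTYPE;     -- one member per column of Employee
   pay      employee.sal%TYPE;    -- same data type as the sal column
BEGIN
   SELECT * INTO emp_row
   FROM   employee
   WHERE  emp_name = 'SMITH';
   pay := emp_row.sal;            -- members are referenced with dot notation
END;
```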
AN IDENTIFIER IN PL/SQL BLOCK:
The name of any ORACLE object (variable, memory variable, constant, record,
cursor etc) is known as an Identifier. The following laws have to be followed while
working with identifiers.
An identifier cannot be declared twice in the same block
The same identifier can be declared in two different blocks.
In the second law, the two identifiers are unique and any change in one does
not affect the other.
PL/SQL OPERATORS
Operators are the glue that holds expressions together. PL/SQL operators can be
divided into the following categories.
Arithmetic operators
Comparison operators
Logical operators
String operators
PL/SQL operators are either unary (they act on one value/variable) or binary
(they act on two values/variables).
1) ARITHMETIC OPERATORS:
Arithmetic operators are used for mathematical computations. They are
+ Addition
- Subtraction or Negation (Ex: -5)
* Multiplication
/ Division
** Exponentiation operator (example 10**5 = 10^5)
2) COMPARISON OPERATORS:
Comparison operators return a BOOLEAN result, either TRUE or FALSE.
They are
= Equality operator 5=3
!= Inequality operator a!=b
<> Inequality operator 5<>3
~= Inequality operator 'john' ~= 'johny'
< Less than operator a<b
> Greater than operator a>b
<= Less than or equal to a<=b
>= Greater than or equal to a>=b
In addition to these, PL/SQL also provides some other comparison operators like
LIKE, IN, BETWEEN and IS NULL.
LIKE: Pattern-matching operator.
It is used to compare a character string against a pattern. Two wild card
characters are defined for use with LIKE: % (percent sign) and _ (underscore).
The % sign matches any number of characters in a string and _ matches exactly one.
Ex-1: new% matches newyork, newjersey etc. (i.e. any string beginning with 'new').
Ex-2: '___day' matches Sunday, Monday and Friday, but does not match the
other days, 'Tuesday', 'Wednesday', 'Thursday' and 'Saturday'.
IN: Checks to see if a value lies within a specified list of values.
Syntax: the_value [NOT] IN (value1, value2, value3, ...)
Ex: 3 IN (4, 8, 7, 5, 3, 2) Returns TRUE.
'Sun' NOT IN ('sat', 'mon', 'tue', 'wed', 'sun') Returns TRUE (string
comparison is case sensitive).
BETWEEN: Checks to see if a value lies within a specified range of values.
Syntax: the_value [NOT] BETWEEN low_end AND high_end
Ex: 5 BETWEEN -5 AND 10 Returns TRUE
4 NOT BETWEEN 3 AND 4 Returns FALSE (BETWEEN is inclusive of both ends).
IS NULL: Checks to see if a value is NULL.
Syntax: the_value IS [NOT] NULL
Ex: IF balance IS NULL THEN
IF acc_id IS NOT NULL THEN
3) LOGICAL OPERATORS.
PL/SQL implements three logical operators: AND, OR and NOT. The NOT
operator is a unary operator and is typically used to negate the result of a comparison
expression, whereas the AND and OR operators are typically used to link together
multiple comparisons.
A AND B is TRUE only if both A and B are TRUE; otherwise it is FALSE.
A OR B is TRUE if either A or B is TRUE, and FALSE if both A and B are FALSE.
NOT A returns TRUE if A is FALSE and FALSE if A is TRUE.
Ex: (5 = 5) AND (4<20) AND (2>=2) Returns TRUE
(5=5) OR (5!=4) Returns TRUE.
‘mon’ IN ( ‘sun’, ‘sat’) OR (2 = 2) Returns TRUE.
4) STRING OPERATORS:
PL/SQL has two operators specially designed to operate only on character string
data: LIKE and the concatenation operator ( || ). LIKE is a comparison operator
used to compare strings and was discussed in the previous section. The
concatenation operator has the following syntax.
Syntax: string_1 || string_2
string_1 and string_2 are strings and can be string constants, string variables or
string expressions. The concatenation operator returns a resultant string consisting of
all the characters in string_1 followed by all the characters in string_2.
Ex: 'Chandra' || 'shekhar' Returns 'Chandrashekhar'
If A = 'Engineering', B = 'College' and C is declared VARCHAR2(50), then
C := A || ' ' || B; assigns 'Engineering College' to C.
NOTE-1: PL/SQL string comparisons are always case sensitive, i.e. 'aaa' is not
equal to 'AAA'.
NOTE-2: ORACLE has some built-in functions that are designed to convert
from one data type to another.
To_date: Converts a character string into a date.
To_number: Converts a character string into a number.
To_char: Converts either a number or a date into a character string.
Ex: To_date ('1/1/92', 'mm/dd/yy'); Returns 01-JAN-1992.
To_date ('1-1-1998', 'mm-dd-yyyy'); Returns 01-JAN-1998.
To_date ('Jan 1, 2001', 'Mon dd, yyyy'); Returns 01-JAN-2001.
To_date ('1/1/02', 'mm/dd/rr'); Returns 01-JAN-2002.
To_number ('123.99', '999D99'); Returns 123.99
To_number ('$1,232.95', '$9G999D99'); Returns 1232.95
To_char (123.99, '999D99'); Returns '123.99'
CONDITIONAL CONTROL IN PL/SQL :
In PL/SQL, the IF statement allows you to control the execution of a block of
code. In PL/SQL we can use the following IF forms.
IF condition THEN
   Statements
END IF;

IF condition THEN
   Statements
ELSE
   Statements
END IF;

IF condition THEN
   Statements
ELSE
   IF condition THEN
      Statements
   ELSE
      Statements
   END IF;
END IF;
ITERATIVE CONTROL IN PL/SQL :
PL/SQL provides iterative control over the execution of PL/SQL statements in a
block: the ability to repeat or skip sections of a code block. The following are
the four types of iterative statements provided by PL/SQL.
The Loop statement
The WHILE Loop statement
The GOTO statement
FOR Loop
i. LOOP STATEMENT:
A loop repeats a sequence of statements. The format is as follows.
LOOP
Statements
END LOOP;
One or more PL/SQL statements can be written between the keywords
LOOP and END LOOP. Once a LOOP begins to run, it would go on forever;
hence loops are always accompanied by a conditional statement that keeps
control of the number of times the loop is executed. We can also build
user-defined exits from a loop, where required.
Ex: LOOP
   cntr := cntr + 1;
   IF cntr > 100 THEN
      EXIT;
   END IF;
END LOOP;
The EXIT statement brings control out of the loop when the condition is
satisfied.
ii. WHILE LOOP :
The WHILE loop enables you to evaluate a condition before the sequence of
statements is executed. If the condition is TRUE, the sequence of statements
is executed. This is different from the basic LOOP, whose body is executed
at least once. The syntax for the WHILE loop is as follows:
Syntax: WHILE < Condition is TRUE >
LOOP
< Statements >
END LOOP;
Ex: DECLARE
   count NUMBER(2) := 0;
BEGIN
   WHILE count <= 10
   LOOP
      count := count + 1;
      Message ('while loop executes');
   END LOOP;
END;
EXIT and EXIT WHEN statement:
EXIT and EXIT WHEN statements enable you to escape out of the control
of a loop. The format of the EXIT statement is as follows :
Syntax: EXIT;
EXIT WHEN statements has following syntax
Syntax: EXIT WHEN <condition is true >;
The EXIT WHEN statement enables you to specify the condition required to exit the
execution of the loop. In this case no IF statement is required.
Ex-1: IF count >= 10 THEN EXIT; END IF;
Ex-2: EXIT WHEN count >= 10;
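Using EXIT WHEN, the counter loop shown earlier in this section can be written more compactly:

```sql
DECLARE
   cntr NUMBER(3) := 0;
BEGIN
   LOOP
      cntr := cntr + 1;
      EXIT WHEN cntr > 100;   -- replaces the IF ... THEN EXIT; END IF; form
   END LOOP;
END;
```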
iii. THE GOTO STATEMENT :
The GOTO statement allows you to change the flow of control within a
PL/SQL block. The syntax is as follows
Syntax: GOTO <label name> ;
The label is surrounded by double angle brackets (<< >>) and must not be
followed by a semicolon, because a label is not a PL/SQL statement but rather
an identifier of a block of PL/SQL code. You must have at least one statement
after the label, otherwise an error will result. The GOTO destination must be
in the same block, at the same level as or higher than the GOTO statement itself.
Ex: IF result = 'fail' THEN
   GOTO failed_stud;
END IF;
<<failed_stud>>
Message ('student is failed');
The entry point of the destination block is defined within << >> as
shown above, i.e. labels are written within the symbol << >>. Notice that
<<failed_stud>> is a label and it is not ended with semicolon ( ; ).
iv. FOR LOOP :
FOR loop will allow you to execute a block of code repeatedly until
some condition occurs. The syntax of FOR loop is as follows.
Syntax: FOR loop_index IN [REVERSE] low_value .. high_value
LOOP
   Statements to execute
END LOOP;
The loop_index is defined by Oracle as a local variable of type INTEGER.
REVERSE allows you to execute the loop in reverse order. low_value ..
high_value is the range over which the loop executes; these can be constants
or variables. The LOOP line must not be terminated with a semicolon. The
statements listed are executed once for each value in the range.
Ex: FOR v_count IN 1 .. 5 LOOP
Message ('for loop executes');
END LOOP;
In the above example the message 'for loop executes' is displayed five
times.
We can terminate the FOR loop prematurely using an EXIT statement
based on some BOOLEAN condition. Nesting of FOR loops is also
allowed in PL/SQL. The outer loop executes once, then the inner loop
executes as many times as its range indicates; control then returns
to the outer loop until its own range expires.
Ex: FOR out_count IN 1..2 LOOP
FOR in_count IN 1..2 LOOP
Message ('nested for loop');
END LOOP;
END LOOP;
In the above example the message 'nested for loop' is displayed four
times.
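The REVERSE keyword mentioned in the syntax runs the index from high_value down to low_value. A small sketch (Message stands in for an output routine, as in the other examples here):

```sql
BEGIN
   FOR v_count IN REVERSE 1 .. 3 LOOP
      -- v_count takes the values 3, 2, 1 in that order
      Message ('count is ' || v_count);
   END LOOP;
END;
```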
Let us discuss some examples to understand how to write a PL/SQL
block. Here we assume that a table called "EMP" is created and the
data is already inserted into it.
Table name : EMP
Create table EMP
( emp_no NUMBER (3),
name VARCHAR2 (15),
salary NUMBER (6,2),
dept VARCHAR2 (15),
div VARCHAR2 (2) );
EXAMPLE-1:
DECLARE
   num NUMBER (3);
   sal emp.salary%TYPE;
   emp_name emp.name%TYPE;
   count NUMBER (2) := 1;
   starting_emp CONSTANT NUMBER(3) := 134;
BEGIN
   SELECT name, salary INTO emp_name, sal
   FROM EMP
   WHERE emp_no = starting_emp;
   WHILE sal < 4000.00
   LOOP
      count := count + 1;
      SELECT emp_no, name, salary INTO
         num, emp_name, sal FROM EMP
      WHERE emp_no > 2150;
   END LOOP;
   COMMIT;
END;
In the above example there are five declarations. num is of integer type;
sal and emp_name take the data types of the salary and name columns of the
EMP table respectively. count is a variable of integer type with initial
value 1. starting_emp is a constant of integer type with the immediately
assigned value 134.
Between the BEGIN and END keywords there are SQL executable statements
used for manipulating the table data. The SELECT statement extracts the data
stored in the name and salary columns of the EMP table for the employee with
employee number 134, and stores those values in the variables emp_name and
sal respectively.
If sal is less than 4000, the statements within the loop are executed.
Within the loop there are two statements: the first increments the count
value by 1 and the second is a SELECT statement. The COMMIT statement
commits the changes made to the table, and the END statement terminates
the PL/SQL block.
EXAMPLE-2:
This example assumes the existence of table accounts created by using
the following SQL statements.
Create table Accounts
(accnt_id NUMBER(3),
name VARCHAR2(25),
bal NUMBER(6,2) );
PL/SQL block:
DECLARE
   acct_balance NUMBER(6,2);
   acct CONSTANT NUMBER(3) := 312;
   debit_amt CONSTANT NUMBER(5,2) := 500.00;
BEGIN
   SELECT bal INTO acct_balance FROM Accounts
   WHERE accnt_id = acct;
   IF acct_balance >= debit_amt THEN
      UPDATE Accounts
      SET bal = bal - debit_amt WHERE accnt_id = acct;
   ELSE
      Message ('insufficient amount in account');
   END IF;
END;
The above example illustrates the use of the IF .. THEN .. ELSE .. END IF
conditional control statement.
The declaration part declares one variable and two constants. The SELECT
statement extracts the amount in the bal column of the Accounts table
corresponding to account number 312 and stores it in the variable
acct_balance.
The IF statement checks acct_balance for a sufficient amount before
debiting. It updates the Accounts table if there is a sufficient amount
in the balance; otherwise it displays a message intimating insufficient
funds in the account of the specified accnt_id.
EXAMPLE-3:
This example assumes two tables, which are created by following
statements.
Create table Inventory
( prod_no NUMBER (6),
product VARCHAR2 (15),
quantity NUMBER (5) );
Create table Purchase_record
( mesg VARCHAR2 (50),
d_ate DATE );
PL/SQL block :
DECLARE
   num_in_stock NUMBER(5);
BEGIN
   SELECT quantity INTO num_in_stock
   FROM Inventory WHERE product = 'gasket';
   IF num_in_stock > 0 THEN
      UPDATE Inventory SET quantity = quantity - 1
      WHERE product = 'gasket';
      INSERT INTO Purchase_record
      VALUES ('One gasket purchased', sysdate);
   ELSE
      INSERT INTO Purchase_record
      VALUES ('no gasket available', sysdate);
      Message ('there are no more gaskets in stock');
   END IF;
   COMMIT;
END;
The above block of PL/SQL code does the following:
It determines how many gaskets are left in stock.
If the number left in stock is greater than zero, it updates the inventory
to reflect the sale of a gasket, and it stores the fact that a gasket was
purchased on a certain date.
If the stock available is zero, it stores the fact that there are no more
gaskets for sale on the date on which that situation occurred.
ERROR HANDLING IN PL/SQL :
PL/SQL has the capability of dealing with errors that arise while
executing a PL/SQL block of code. It has a number of conditions
preprogrammed into it that are recognized as error conditions; these are
called internally defined exceptions. You can also program PL/SQL to
recognize user-defined exceptions.
There are two different types of error conditions (exceptions):
User-defined exceptions.
Predetermined (internal) PL/SQL exceptions.
1) USER DEFINED EXCEPTIONS:
The user can write a set of code to be executed when an error occurs during
execution of a PL/SQL block. Such code is called a user-defined exception
handler and is placed in the last section of the PL/SQL block, called
EXCEPTION.
The method used to recognize user-defined exceptions is as follows:
Declare a user defined exception in the declaration section of
PL/SQL block.
In the main program block for the conditions that needs special
attention, execute a RAISE statement.
Reference the declared exception with an error handling routine in
EXCEPTION section of PL/SQL block.
The RAISE statement acts like the CALL statement of high-level languages.
It has the general format
RAISE <name of exception>;
When a RAISE statement is executed, it stops the normal processing of the
PL/SQL block of code and control passes to the error handler block at the
end of the PL/SQL program block (the EXCEPTION section).
An exception declaration declares a name for a user-defined error condition
that the PL/SQL code block recognizes. It can only appear in the DECLARE section
of the PL/SQL code, which precedes the keyword BEGIN.
EXAMPLE :
DECLARE
---------------
zero_commission Exception;
---------------
BEGIN
-----------------
IF commission = 0 THEN
RAISE zero_commission;
------------------------
EXCEPTION
WHEN zero_commission THEN
Process the error
END;
Exception handler (error handler block ) is written between the key words
EXCEPTION and END. The exception handling part of a PL/SQL code is
optional. This block of code specifies what action has to be taken when the named
exception condition occurs.
The naming conventions for exception names are exactly the same as those for
variables or constants. All the rules for accessing an exception from PL/SQL
blocks are the same as those for variables and constants. However, it should
be noted that exceptions cannot be passed as arguments to functions or
procedures the way variables and constants can.
2) PREDETERMINED INTERNAL PL/SQL EXCEPTIONS :
The ORACLE server defines several errors with standard names. Although
every ORACLE error has a number, the error must be referred to by name.
PL/SQL has predefined some common ORACLE errors and exceptions; some of
them are given below:
NO_DATA_FOUND Raised when a SELECT statement returns zero rows.
TOO_MANY_ROWS Raised when a SELECT statement returns more than one row.
VALUE_ERROR Raised when there is either a data type mismatch or the
size is smaller than the required size.
INVALID_NUMBER Raised when conversion of a character string to a number fails.
ZERO_DIVIDE Raised on an attempt to divide by zero.
PROGRAM_ERROR Raised if PL/SQL encounters an internal problem.
STORAGE_ERROR Raised if PL/SQL runs out of memory or if memory is corrupted.
DUP_VAL_ON_INDEX Raised on an attempt to insert or update a duplicate value
in a column that has a unique index.
INVALID_CURSOR Raised when an illegal cursor operation is attempted.
CURSOR_ALREADY_OPEN Raised on an attempt to open a cursor that is already open.
NOT_LOGGED_ON Raised when a database call is made without being logged on
to ORACLE.
LOGIN_DENIED Raised when login to ORACLE fails because of an invalid
username or password.
OTHERS Raised when all other exception handlers fail to catch the error.
It is possible to use the WHEN OTHERS clause in the exception-handling part
of the PL/SQL block. It takes care of all exceptions that are not handled
elsewhere in the code.
The syntax for exception handling portion of PL/SQL block is as follows:
EXCEPTION
WHEN exception_1 THEN Statements;
WHEN exception_2 THEN Statements;
- - --- ---- -- ---
END;
In this syntax, exception_1 and exception_2 are the names of exceptions
(predefined or user-defined). The corresponding statements in the PL/SQL
code are executed when the named exception is raised.
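Putting the pieces together, a sketch of a block that traps two predefined exceptions and falls back to WHEN OTHERS (table and column names are taken from the earlier EMP examples; Message again stands in for an output routine):

```sql
DECLARE
   pay emp.salary%TYPE;
BEGIN
   SELECT salary INTO pay FROM EMP WHERE name = 'SMITH';
EXCEPTION
   WHEN no_data_found THEN
      Message ('no such employee');
   WHEN too_many_rows THEN
      Message ('more than one matching row');
   WHEN OTHERS THEN
      Message ('unexpected error');
END;
```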
EXAMPLE-1:
This example writes PL/SQL code for validating the accnt_id field of the
Accounts table so that it must not be left blank; if it is blank, the cursor
should not be allowed to move to the next field.
DECLARE
   no_value EXCEPTION;
BEGIN
   IF :Accounts.accnt_id IS NULL THEN
      RAISE no_value;
   ELSE
      next_field;
   END IF;
EXCEPTION
   WHEN no_value THEN
      Message ('account id cannot be blank, please enter valid data!!!');
      go_field (:system.cursor_field);
END;
In the above example the accnt_id field of the Accounts table is checked for
a NULL value. If it is NULL, the RAISE statement transfers control to the
exception handler no_value. This exception name no_value is declared in the
DECLARE section and defined in the EXCEPTION section of the PL/SQL block
using the WHEN statement; no_value is a user-defined exception.
EXAMPLE-2:
DECLARE
   balance Accounts.bal%TYPE;
   account_num Accounts.accnt_id%TYPE;
BEGIN
   SELECT accnt_id, bal INTO account_num, balance
   FROM Accounts WHERE accnt_id > 0000;
EXCEPTION
   WHEN no_data_found THEN
      Message ('empty table');
END;
The above example uses a predefined internal PL/SQL exception
(NO_DATA_FOUND). Therefore it requires neither a declaration in the DECLARE
section nor a RAISE statement in the BEGIN ... END portion of the block:
even though it is not explicitly raised, the ORACLE server raises this
exception when the SELECT returns no rows.
PL/SQL FUNCTIONS AND PROCEDURES :
PL/SQL allows you to define functions and procedures. These are similar to
functions and procedures defined in other languages, and each is defined as
one PL/SQL block.
FUNCTIONS :
The syntax for defining a function is as follows :
FUNCTION name [ (argument-list) ] RETURN data-type {IS, AS}
Variable-declarations
BEGIN
Program-code
[ EXCEPTION
error-handling-code]
END;
In this syntax,
name The name you want to give the function.
argument-list The list of input and/or output parameters for the function.
data-type The data type of the function's return value.
variable-declarations Where you declare any variables that are local to the function.
program-code Where you write the PL/SQL statements that make up the function.
error-handling-code Where you write any error-handling routine.
Notice that the function block is similar to the PL/SQL block discussed
earlier. The keyword DECLARE has been replaced by the FUNCTION header,
which names the function, describes the parameters and indicates the return
type. A function is called by using name(argument-list).
Example:
FUNCTION check (b_exp IN BOOLEAN,
                true_number IN NUMBER,
                false_number IN NUMBER)
RETURN NUMBER IS
BEGIN
   IF b_exp THEN
      RETURN true_number;
   ELSE
      RETURN false_number;
   END IF;
END;
The above function can be called as follows.
check (2 > 1, 1, 0)
check (5 = 0, 1, 0)
PROCEDURES:
The declaration of a procedure is almost identical to that of a function,
and the syntax is given below.
PROCEDURE name [(argument list)] {IS,AS}
Variable declaration
BEGIN
Program code
[EXCEPTION
Error handling code ]
END;
Here name is the name that you want to give the procedure; all the other
parts are as in a function declaration. A procedure declaration resembles a
function declaration except that there is no RETURN data type and the
keyword PROCEDURE is used instead of FUNCTION.
Ex: PROCEDURE swapn (A IN OUT NUMBER, B IN OUT NUMBER) IS
    temp_num NUMBER;
BEGIN
    temp_num := A;
    A := B;
    B := temp_num;
END;
The above procedure can be called as follows (IN OUT parameters must be
variables, not literals):
swapn (num1, num2);
DATABASE TRIGGERS :
PL/SQL can be used to write database triggers. Triggers are used to define code
that is executed/fired when certain events occur. At the database level,
triggers can be defined for events such as inserting a record into a table, deleting a
record, and updating a record.
100
Database Management system Dept of Computer Science & Engg, VJCET
The trigger definition consists of following basic parts.
The event that fires the trigger
The database table on which event must occur
An optional condition controlling when the trigger is executed
A PL/SQL block containing the code to be executed when the trigger is fired.
A trigger is a database object, like a table or an index. When you define a trigger,
it becomes a part of the database and is always executed when the event for
which it is defined occurs.
Syntax for creating a data base trigger is shown below.
CREATE [ or REPLACE ] TRIGGER trigger-name
{ BEFORE | AFTER } verb-list ON table-name
[ FOR EACH ROW [ WHEN condition ] ]
DECLARE
Declarations
BEGIN
PL/SQL code
END;
In the above syntax
REPLACE            Recreates the trigger if it already exists.
trigger-name       The name of the trigger to be created.
verb-list          The SQL verbs that fire the trigger, i.e. INSERT, UPDATE or DELETE.
table-name         The table on which the trigger is defined.
condition          An optional condition placed on the execution of the trigger.
declarations       Any variable, record or cursor declarations needed by the PL/SQL block.
PL/SQL code        The PL/SQL code that is executed when the trigger fires.
EXAMPLE:
CREATE TRIGGER check_salary
BEFORE INSERT OR UPDATE OF sal, job ON emp
FOR EACH ROW WHEN ( new.job != 'director' )
DECLARE
    minsal NUMBER;
    maxsal NUMBER;
BEGIN
    SELECT min_sal, max_sal INTO minsal, maxsal
    FROM salary_mast WHERE job = :new.job;
    IF ( :new.sal < minsal OR :new.sal > maxsal ) THEN
        Message ( 'salary out of range' );
    END IF;
END;
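The same idea can be sketched outside ORACLE. Below is a minimal SQLite analogue of a BEFORE INSERT salary-check trigger like the one above, runnable from Python. The table layouts (emp, salary_mast) are assumptions for illustration, and SQLite's RAISE function stands in for the Message() call:

```python
import sqlite3

# In-memory database with illustrative emp and salary_mast tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salary_mast (job TEXT PRIMARY KEY, min_sal REAL, max_sal REAL);
CREATE TABLE emp (emp_no INTEGER, job TEXT, sal REAL);
INSERT INTO salary_mast VALUES ('clerk', 10000, 20000);

-- Fires before each INSERT on emp, skipped for directors, and aborts
-- the statement when the salary is outside the allowed range.
CREATE TRIGGER check_salary
BEFORE INSERT ON emp
WHEN NEW.job != 'director'
BEGIN
    SELECT RAISE(ABORT, 'salary out of range')
    WHERE NEW.sal NOT BETWEEN
          (SELECT min_sal FROM salary_mast WHERE job = NEW.job)
      AND (SELECT max_sal FROM salary_mast WHERE job = NEW.job);
END;
""")

conn.execute("INSERT INTO emp VALUES (1, 'clerk', 15000)")      # accepted
try:
    conn.execute("INSERT INTO emp VALUES (2, 'clerk', 50000)")  # rejected
except sqlite3.IntegrityError as e:
    print(e)   # prints: salary out of range
```

The out-of-range row never reaches the table: RAISE(ABORT, ...) rolls back the offending statement, surfacing in Python as an IntegrityError.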
3.15 CURSOR IN PL/SQL:
PL/SQL cursors provide a way for your program to select multiple rows of data
from the database and then to process each row individually. Cursors are PL/SQL
constructs that enable you to process, one row at a time, the results of a multi row
query.
ORACLE uses work areas to execute SQL statements, and PL/SQL allows the user
to name private work areas and access the stored information. The PL/SQL
construct used to identify and access such a work area is called a cursor.
There are 2 types of cursors.
Implicit cursors
Explicit cursors
Implicit cursors are declared by ORACLE for each UPDATE, DELETE and
INSERT SQL command. Explicit cursors are declared and used by the user to
process multiple rows returned by a SELECT statement.
The set of rows returned by a query is called the Active Set. Its size depends on
the number of rows that meet the search criteria of the SQL query. The data that is
stored in the cursor is called the Active Data Set.
An ORACLE cursor is a mechanism used to easily process multiple rows of data.
Cursors contain a pointer that keeps track of the current row being accessed, which
enables your program to process the rows one at a time.
EXAMPLE:
When a user executes the following SELECT statement
SELECT emp_no, emp_name, job, salary
FROM employee
WHERE dept = 'physics';
the resultant data set will be displayed as follows:

emp_no   emp_name        job               salary
1234     A. N. Sharanu   Asst. Professor   22,000.00
1345     N. Bharath      Senior Lecturer   17,000.00
1400     M. Mala         Lab Incharge       9,000.00

Table 3.1
1) EXPLICIT CURSOR MANAGEMENT :
The following are the steps to using explicitly defined cursors within PL/SQL
Declare the cursor
Open the cursor
Fetch data from the cursor
Close the cursor
Declaring the cursor :
Declaring a cursor enables you to define the cursor and assign a name to it. It has
the following syntax.
CURSOR cursor-name
IS SELECT statement
Ex: CURSOR c_name IS
SELECT emp_name FROM Emp WHERE dept = 'physics'
Opening a cursor:
Opening a cursor executes the query and identifies the active set that contains
all the rows, which meet the query search criteria.
Syntax :
OPEN cursor_name
Ex:
OPEN c_name
The OPEN statement retrieves the records from the database and places them in
the cursor (private SQL area).
Fetching data from cursor:
The FETCH statement retrieves the rows from the active set one row at a time. The
FETCH statement is usually used in conjunction with an iterative process
(looping statements): the cursor advances to the next row in
the active set each time the FETCH command is executed. The FETCH command is
the only means to navigate through the active set.
Syntax : FETCH cursor-name INTO record-list
Record-list is the list of variables that will receive the columns (fields ) from the active set.
Ex: LOOP
-----------
------------
FETCH c_name INTO name;
-----------
END LOOP;
Closing a cursor :
The CLOSE statement deactivates the previously opened cursor and
makes the active set undefined. Once a cursor is closed, you cannot perform any
operations on it, but you can reopen it by using the OPEN statement.
Syntax : CLOSE cursor_name
Ex: CLOSE c_name;
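The declare/open/fetch/close life cycle described above can be sketched with Python's DB-API, whose cursors follow the same pattern. The emp table and its rows are invented for illustration:

```python
import sqlite3

# Set up an in-memory table standing in for Emp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("Anu", "physics"), ("Binu", "physics"), ("Chitra", "maths")])

cur = conn.cursor()                       # declare the cursor
cur.execute("SELECT emp_name FROM emp WHERE dept = 'physics' "
            "ORDER BY emp_name")          # open: builds the active set
names = []
while True:
    row = cur.fetchone()                  # fetch one row of the active set
    if row is None:                       # no more rows (like %NOTFOUND)
        break
    names.append(row[0])
cur.close()                               # close the cursor
print(names)                              # ['Anu', 'Binu']
```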
EXAMPLE-1 :
The HRD manager has decided to raise the salary of all the employees in
the physics department by 5% (0.05). Whenever any such raise is given to the
employees, a record of it is maintained in the emp_raise table (the table definitions
are given below). Write a PL/SQL block to update the salary of each employee and
insert a record in the emp_raise table.
Table: employee
emp_code varchar (10)
emp_name varchar (10)
dept varchar (15)
job varchar (15)
salary number (6,2)
Table: emp_raise
emp_code Varchar(10)
raise_date Date
raise_amt Number(6,2)
Solution:
DECLARE
CURSOR c_emp IS
SELECT emp_code, salary FROM employee
WHERE dept = 'physics';
str_emp_code employee.emp_code %TYPE;
num_salary employee.salary %TYPE;
BEGIN
    OPEN c_emp;
    LOOP
        FETCH c_emp INTO str_emp_code, num_salary;
        EXIT WHEN c_emp%NOTFOUND;
        UPDATE employee SET salary = num_salary + (num_salary * 0.05)
        WHERE emp_code = str_emp_code;
        INSERT INTO emp_raise
        VALUES ( str_emp_code, sysdate, num_salary * 0.05 );
    END LOOP;
    COMMIT;
    CLOSE c_emp;
END;
2) EXPLICIT CURSOR ATTRIBUTES:
ORACLE provides certain attributes/cursor variables to control the execution of
the cursor. Whenever any cursor (explicit or implicit) is opened and used,
ORACLE creates a set of four system variables via which ORACLE keeps track of
the "current status" of the cursor. Programmers can access these variables. They
are:
%NOTFOUND: Evaluates to TRUE if the last fetch failed, i.e. no more rows are
left.
Syntax: cursor_name%NOTFOUND
%FOUND: Evaluates to TRUE when the last fetch succeeded.
Syntax: cursor_name%FOUND
%ISOPEN: Evaluates to TRUE if the cursor is open, otherwise evaluates to FALSE.
Syntax: cursor_name%ISOPEN
%ROWCOUNT: Returns the number of rows fetched so far.
Syntax: cursor_name%ROWCOUNT
EXAMPLE :
DECLARE
    v_emp_name varchar2(32);
    v_salary_rate number(6,2);
    v_payroll_total number(9,2);
    v_pay_type char;
    not_opened EXCEPTION;
    CURSOR c_emp IS
        SELECT emp_name, pay_rate, pay_type FROM employee
        WHERE emp_dept = 'physics';
BEGIN
    IF c_emp%ISOPEN THEN
        RAISE not_opened;
    ELSE
        OPEN c_emp;
        LOOP
            FETCH c_emp INTO v_emp_name, v_salary_rate, v_pay_type;
            EXIT WHEN c_emp%NOTFOUND;
            IF v_pay_type = 'S' THEN
                v_payroll_total := v_salary_rate * 1.25;
            ELSE
                v_payroll_total := v_salary_rate * 40;
            END IF;
            INSERT INTO weekly_salary VALUES ( v_payroll_total );
        END LOOP;
        CLOSE c_emp;
    END IF;
EXCEPTION
    WHEN not_opened THEN
        Message ( 'cursor is not opened' );
END;
REFERENCES:
1. Teach Yourself PL/SQL in 21 Days - SAMS Publications.
2. ORACLE-7 - Ivan Bayross.
3. ORACLE Developer 2000 - Ivan Bayross.
4. ORACLE Developer's Guide - David McClanahan.
MODULE 4
4.1 Introduction
Measure of Quality
We can discuss the goodness of a relation schema at two levels.
1. Logical Level
The logical level is the middle level in the three-level architecture of a DBMS. It
describes how users interpret the relation schemas and the meaning of their
attributes. Having good relation schemas at this level enables users to understand
clearly the meaning of the data in the relations and hence to formulate queries
correctly.
2. Implementation Level
This is the lowermost level in the DBMS architecture, which describes how the
tuples in the base relations are stored and updated. This level applies only to the
storage of the database, whereas the logical level applies to both the view
level and the logical level. A database is only as effective as the storage scheme
that underlies it.
4.2 Database Design Techniques
Generally we can design a database using two different approaches.
1. Top-Down Design (Analysis) Methodology
It starts with the major entities of interest, their attributes and their relationships.
We then add other entities, possibly split these entities into a number of
specialized entities, and add the relationships between them.
2. Bottom-Up Design (Synthesis) Methodology
It starts with a set of attributes. These attributes are grouped into entities, and the
relationships between these entities are identified. Then we identify the
higher-level entities, generalize these entities, and locate relationships at this
higher level.
Problems with a bad schema
1. Redundant storage of data
2. Wastage of disk space
3. Longer running time
Informal Design Guidelines for relation schema
The following are informal measures of quality for relation schema design.
1. Semantics of the relation attribute
All attributes belonging to a relation have a certain real-world meaning and a proper interpretation associated with them. The semantics specify how to interpret the attribute values stored in a tuple of the relation. Guideline: Design each relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation; otherwise the semantics become ambiguous and the relation cannot be easily explained.
2. Reducing the redundant values in tuples and update anomalies
One goal of schema design is to minimize the storage space that the base relations occupy. The anomalies that may be present in a database relation can be classified into three categories.
1. Insertion anomalies
2. Deletion anomalies
3. Modification anomalies
Guideline: Design relation schemas so that no insertion, deletion or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.
3. Null values in tuples
Null values in tuples waste storage space at the storage level and may also lead to problems with understanding the meaning of attributes and with specifying join operations at the logical level. Another problem is how aggregate functions should account for null-valued attributes. Guideline: As far as possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only.
4.3 Constraints
Constraints on a database can generally be divided into four main categories.
1. Inherent model-based: Constraints that are inherent to the data
model are called inherent model-based constraints; for example, a relation
cannot have duplicate tuples.
2. Schema-based: Constraints that can be directly expressed in the
schemas of the data model, typically specified in the DDL, are
called schema-based constraints.
3. Application-based: Constraints expressed and enforced by the application
programs are called application-based constraints.
4. Data dependency: These are constraints relating to the
dependencies between the values in relations.
Now we can go through the details of schema-based constraints. These
schema based constraints are expressed in relational model. It includes five basic
constraints.
1. Domain constraint
2. Key constraint
3. Entity integrity constraint
4. Referential integrity constraint
5. Constraint on nulls
4.4 Domain constraint
A domain D represents a set of atomic values. The data type describing the
type of values that can appear in each column is represented by this domain, i.e.
each value in the domain is indivisible as far as the relational model is concerned.
The domain is specified by giving the data type from which its values are
drawn. A domain is given a name, a data type and a format.
4.5 Entity integrity constraint
The entity integrity constraint states that no primary key value can be null. The
primary key value is used to identify individual tuples; a null value in the primary
key implies that we cannot identify some tuples. The key constraint and entity
integrity constraint are specified on individual relations.
4.6 Referential integrity constraint
The referential integrity constraint is specified between two relations and is used
to maintain consistency among the tuples of the two relations. It states that a
tuple in one relation that refers to another relation must refer to an existing tuple
in that relation. To define the referential integrity constraint we first have to
define the concept of a foreign key (FK).
A set of attributes FK in relation schema R1 is a foreign key of R1 that references
relation R2 if it satisfies the following two rules.
1. The attributes in FK have the same domain as the primary key attributes PK
of R2. The attributes FK are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value
of PK for some tuple t2 in the current state r2(R2) or is null.
If t1[FK] = t2[PK], we say that the tuple t1 refers to the tuple t2.
R1 is the referencing relation and R2 is the referenced relation.
Key constraint
A relation is defined as a set of tuples, and by definition all the tuples in a
relation are distinct, i.e. no two tuples can have the same values for all
their attributes. There are subsets of a relation schema R with the property that
no two tuples in any relation state r of R should have the same values for these
attributes; such a subset is called a superkey.
A key K of a relation schema R is a superkey of R with the additional property
that removing any attribute A from K leaves a set of attributes K' that is not a
superkey of R any more. Hence a key satisfies the following two constraints.
1. Two distinct tuples in any state of the relation cannot have identical
values for all the attributes in the key.
2. It is a minimal superkey, i.e. we cannot remove any attribute from it and
still have the uniqueness constraint of the first condition hold.
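These two conditions can be checked mechanically. The following is a minimal Python sketch; the helper names closure and is_key are ours, not from the text, and the example schema is invented:

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_key(K, R, fds):
    if closure(K, fds) != set(R):           # condition 1: must be a superkey
        return False
    return all(closure(sub, fds) != set(R)  # condition 2: must be minimal
               for r in range(1, len(K))
               for sub in combinations(sorted(K), r))

# R(A, B, C) with A -> B and B -> C: {A} is a key, {A, B} is not minimal.
R = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(is_key({"A"}, R, fds))       # True
print(is_key({"A", "B"}, R, fds))  # False: {A} alone is already a superkey
```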
Null constraint
It specifies whether null values are permitted for an attribute in the database.
4.7 Functional Dependency (FD)
A functional dependency is a constraint between two sets of attributes of the
database.
Definition: An FD, denoted X → Y, between two sets of attributes X and Y that
are subsets of R specifies a constraint on the possible tuples that can form a
relation state r of R. The constraint is that, for any two tuples t1 and t2 in r such
that t1[X] = t2[X], we must also have t1[Y] = t2[Y]; i.e. the values of the Y
component of a tuple in r depend on the values of the X component, or the X
component determines the value of the Y component.
Note that:
1. If X is a candidate key of R (so that no relation state r(R) can have more
than one tuple with a given X value), then X → Y holds for any subset of
attributes Y of R.
2. X → Y in R says nothing about whether or not Y → X holds in R.
A functional dependency X → Y is called trivial if Y is a subset of X.
Definition: A functional dependency, denoted by X→Y, between two sets of
attributes X and Y that are subsets of the attributes of relation R, specifies that the
values in a tuple corresponding to the attributes in Y are uniquely determined by
the values corresponding to the attributes in X.
For example, the social security number uniquely determines a name;
SSN→ Name
Functional dependencies are determined by the semantics of the relation; in
general, they cannot be determined by inspection of an instance of the relation.
That is, a functional dependency is a constraint, not a property derived from a
relation.
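While an FD cannot be inferred from a single instance, a given instance can be tested for violations of a stated FD X → Y. A small illustrative sketch (the function name and dict-based rows are our own assumptions):

```python
def satisfies_fd(rows, X, Y):
    """True if no two rows agree on the X attributes but differ on Y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False          # two tuples agree on X but differ on Y
        seen[x_val] = y_val
    return True

rows = [
    {"SSN": 1, "Name": "Anu"},
    {"SSN": 2, "Name": "Binu"},
    {"SSN": 1, "Name": "Anu"},    # repeated SSN with the same Name: fine
]
print(satisfies_fd(rows, ["SSN"], ["Name"]))   # True
rows.append({"SSN": 2, "Name": "Chitra"})      # SSN 2 now maps to two names
print(satisfies_fd(rows, ["SSN"], ["Name"]))   # False
```

A True result only means this instance does not violate SSN → Name; it does not prove the dependency holds for the schema.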
Inference rules
Armstrong's axioms are sound and complete, i.e. they enable the derivation of
every functional dependency implied by a given set.
The basic inference rules are:
1. Reflexivity - if the B's are a subset of the A's then A → B
2. Augmentation - If A → B, then A, C → B, C.
3. Transitivity - If A → B and B → C then A → C.
Additional inference rules
4. Decomposition - If A → B, C then A → B
5. Union - If A → B and A → C then A → B, C
6. Pseudo transitive - If A → B and C, B → D then C, A → D
Equivalence of sets of functional dependencies
Two sets of functional dependencies S and T are equivalent iff every FD in T can
be inferred from S and every FD in S can be inferred from T, i.e. S+ = T+.
The dependency {A_1, ..., A_n} → {B_1, ..., B_m}
is trivial if the B's are a subset of the A's
is nontrivial if at least one of the B's is not among the A's
is completely nontrivial if none of the B's is also one of the A's
Closure (F+)
All the dependencies in F, together with all the dependencies that can be inferred
from F using the above rules, form the closure of F, denoted F+.
Algorithm to compute closure
We want to determine whether F ⊨ X → Y, i.e. whether X → Y ∈ F+. Rather
than computing all of F+, a better method is to compute X+, the closure of X
under F, and then test whether Y ⊆ X+.
Algorithm:
Input: A set of FDs F and a set of attributes X.
Output: The closure X+ of X under F.
X+ := X;
change := true;
while change do
begin
    change := false;
    for each FD W → Z in F do
    begin
        if W ⊆ X+ and Z ⊄ X+ then
        begin
            X+ := X+ ∪ Z;
            change := true;
        end;
    end;
end;
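The algorithm above translates almost line for line into Python. This is an illustrative sketch (the function name attribute_closure and the sample FDs are ours):

```python
def attribute_closure(X, fds):
    """Closure X+ of attribute set X under a list of FDs (W, Z)."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for W, Z in fds:
            if W <= closure and not Z <= closure:  # W subset of X+, Z adds something
                closure |= Z                        # X+ := X+ U Z
                changed = True
    return closure

fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
print(sorted(attribute_closure({"A"}, fds)))      # ['A', 'B', 'C']
# F entails A -> C because C is in {A}+ ; E is not derivable from A alone:
print({"C"} <= attribute_closure({"A"}, fds))     # True
print({"E"} <= attribute_closure({"A"}, fds))     # False
```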
4.8 Normalization
In relational database theory, normalization is the process of restructuring the
logical data model of a database to eliminate redundancy, organize data
efficiently, reduce repeating data and to reduce the potential for anomalies
during data operations. Data normalization also may improve data consistency
and simplify future extension of the logical data model. The formal
classifications used for describing a relational database's level of normalization
are called normal forms (NF).
A non-normalized database can suffer from data anomalies:
A non-normalized database may store data representing a particular referent in
multiple locations. An update to such data in some but not all of those locations
results in an update anomaly, yielding inconsistent data. A normalized database
prevents such an anomaly by storing such data (i.e. data other than primary
keys) in only one location.
A non-normalized database may have inappropriate dependencies, i.e.
relationships between data with no functional dependencies. Adding data to such
a database may require first adding the unrelated dependency. A normalized
database prevents such insertion anomalies by ensuring that database relations
mirror functional dependencies.
Similarly, such dependencies in non-normalized databases can hinder deletion.
That is, deleting data from such databases may require deleting data from the
inappropriate dependency. A normalized database prevents such deletion
anomalies by ensuring that all records are uniquely identifiable and contain no
extraneous information.
4.9 Normal forms
Edgar F. Codd originally defined the first three normal forms.
The first normal form requires that tables be made up of a primary key and a
number of atomic fields, and the second and third deal with the relationship of
non-key fields to the primary key. These have been summarized as requiring
that all non-key fields be dependent on "the key, the whole key and nothing but
the key". In practice, most tables in 3NF are considered fully normalized. However,
research has identified potential update anomalies in 3NF databases. BCNF is a
further refinement of 3NF that attempts to eliminate such anomalies.
The fourth and fifth normal forms (4NF and 5NF) deal specifically with the
representation of many-to-many and one-to-many relationships. Sixth normal
form (6NF) applies only to temporal databases.
4.10 First normal form (1NF)
First normal form (1NF) lays the groundwork for an organized database
design:
Ensure that each table has a primary key: a minimal set of attributes that can
uniquely identify a record. 1NF states that the domain of an attribute must
include only atomic values and that the value of any attribute in a tuple must be
a single value from the domain of that attribute. It does not allow nested
relations. Data that is redundantly duplicated across multiple rows of a table is
moved out to a separate table.
Atomicity: Each attribute must contain a single value, not a set of values.
Eg: Consider a relation Person. A person has the attributes SSN, Name,
Age, Address and College_Degree.

Person

SSN   Name   Address   Age   College_Degree

Table 4.1
Now we can analyze this relation by checking the possible values of each
attribute. SSN and Age have only one value per person, but College_Degree
can have more than one value, and the Address and Name of a person can each
be divided into more than one attribute. Hence this relation is not in 1NF. So let
us change this schema into 1NF by dividing the relation into two relations.
Name→ FName, MInit, LName
Address→ ApartmentNo, City
Person_Residence

SSN   FName   LName   MInit   ApartmentNo   City

Table 4.2

College_Degree

SSN   UG   PG

Table 4.3
4.11 Second normal form (2NF)
First, the table must be in 1NF; in addition, every
non-primary-key attribute (field) must be fully functionally dependent upon the
ENTIRE primary key for its existence. This rule ONLY applies when you have
a multi-part (concatenated) primary key (PK).
It requires that data stored in a table with a composite primary key must not be
dependent on only part of the table's primary key. And the database must meet
all the requirements of the first normal form.
Take each non-key field and ask this question: if I knew part of the PK, could I
tell what the non-key field would be?
Inventory

Description   Supplier   Cost   Supplier_Address

Table 4.4
In this inventory table, Description combined with Supplier is our PK. This is
because we have two of the same product that come from different suppliers.
There are two non-key fields. So, we can ask the questions:
If we know just Description, can we find out Cost? No, because we have more
than one supplier for the same product.
If we know just Supplier, can we find out Cost? No, because we need to know
what the item is as well.
Therefore, Cost is fully, functionally dependent upon the ENTIRE PK
(Description-Supplier) for its existence.
If we know just Description, can we find out Supplier Address? No, because
we have more than one supplier for the same product.
If we know just Supplier, Can we find out Supplier Address? Yes. The
Address does not depend upon the Description of the item.
Therefore, Supplier Address is NOT functionally dependent upon the ENTIRE PK
(Description-Supplier) for its existence.
We must get rid of Supplier Address from this table.
Inventory

Description   Supplier   Cost

Table 4.5

Supplier

Name   Supplier_Address

Table 4.6
At this point, since it is the "Supplier" table, we can rename the "Supplier"
field to "Name". Name is the PK for this new table.
General Definition:
A relation schema R is in second normal form (2NF) if every nonprime
attribute A in R is not partially dependent on any key of R.
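This definition can be tested mechanically on the Inventory example: a partial dependency exists when a proper subset of the composite key determines a nonprime attribute. A rough Python sketch (helper names are illustrative, not from the text):

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, nonprime, fds):
    """Nonprime attributes determined by a proper subset of the key."""
    found = []
    for r in range(1, len(key)):
        for part in combinations(sorted(key), r):
            det = closure(part, fds)
            for a in nonprime:
                if a in det:
                    found.append((set(part), a))
    return found

key = {"Description", "Supplier"}
fds = [({"Description", "Supplier"}, {"Cost"}),
       ({"Supplier"}, {"Supplier_Address"})]
print(partial_dependencies(key, {"Cost", "Supplier_Address"}, fds))
# -> [({'Supplier'}, 'Supplier_Address')]
```

The reported pair is exactly the 2NF violation found informally above: Supplier alone determines Supplier_Address, so that attribute must move to its own table.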
4.12 Third normal form (3NF)
For 3NF, the table must first be in 2NF; in addition, we want to make sure
that the non-key fields are dependent upon ONLY the PK, and not on any other
field in the table. This is very similar to 2NF, except that now you are
comparing the non-key fields to OTHER non-key fields.
For database to be in third normal form
1. The database must meet all the requirements of the second normal form.
2. Any field which is dependent not only on the primary key but also on another
field is moved out to a separate table.
Book

Name   Auth_Name   #Pages   Auth_Affil_No

Table 4.7
Again, just ask the questions:
If I know # of Pages, can I find out Author's Name? No. Can I find out
Author's affiliation No? No.
If I know Author's Name, can I find out # of Pages? No. Can I find out
Author's affiliation No? YES.
Therefore, Author's affiliation No is functionally dependent upon Author's
Name, not the PK for its existence.
Book

Name   Auth_Name   #Pages

Table 4.8

Author

Auth_Name   Auth_Affil_No

Table 4.9
General Definition:
A relation schema R is in 3NF if, whenever a nontrivial functional
dependency X → A holds in R,
either a) X is a superkey of R,
or b) A is a prime attribute of R.
i.e. A relation schema R is in 3NF if every nonprime attribute of R meets both of
the following terms:
1. It is fully functionally dependent on every key of R.
2. It is nontransitively dependent on every key of R.
4.13 Boyce-Codd normal form (BCNF)
A relation is in BCNF if and only if every determinant is a candidate key.
The second and third normal forms assume that all attributes not part of the
candidate keys depend on the candidate keys, but they do not deal with
dependencies within the keys. BCNF deals with such dependencies.
A relation R is said to be in BCNF if whenever X -> A holds in R, and A is not
in X, then X is a candidate key for R.
BCNF covers very specific situations where 3NF misses interdependencies
between non key attributes. It should be noted that most relations that are in
3NF are also in BCNF. Infrequently, a 3NF relation is not in BCNF and this
happens only if
(a) the candidate keys in the relation are composite keys (that is, they are not
single attributes),
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes in the keys are common.
The BCNF differs from 3NF only when there is more than one candidate
key, and the keys are composite and overlapping. Consider, for example, the
relation
enrol (sno, sname, cno, cname, date-enrolled)
Let us assume that the relation has the following candidate keys:
(sno, cno)
(sno, cname)
(sname, cno)
(sname, cname)
(we have assumed sname and cname are unique identifiers). The relation is in
3NF but not in BCNF because there are dependencies
sno -> sname
cno -> cname
where attributes that are part of a candidate key are dependent on part of
another candidate key. Such dependencies indicate that although the relation is
about some entity or association that is identified by the candidate keys
e.g. (sno, cno), there are attributes that are not about the whole thing that the
keys identify. For example, the above relation is about an association
(enrolment) between students and subjects and therefore the relation needs to
include only one identifier to identify students and one identifier to identify
subjects. Providing two identifiers about students (sno, sname) and two keys
about subjects (cno, cname) means that some information about students and
subjects that is not needed is being provided. This provision of information
will result in repetition of information and the anomalies. If we wish to include
further information about students and courses in the database, it should not be
done by putting the information in the present relation but by creating new
relations that represent information about entities student and subject.
These difficulties may be overcome by decomposing the above relation in the
following three relations:
(sno, sname)
(cno, cname)
(sno, cno, date-of-enrolment)
We now have a relation that only has information about students, another only
about subjects and the third only about enrolments. All the anomalies and
repetition of information have been removed.
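The enrol example can be checked mechanically against the 3NF and BCNF conditions. A rough sketch (the helper names, and abbreviating date-enrolled, are our own choices):

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = {"sno", "sname", "cno", "cname", "date-enrolled"}
fds = [({"sno"}, {"sname"}), ({"sname"}, {"sno"}),
       ({"cno"}, {"cname"}), ({"cname"}, {"cno"}),
       ({"sno", "cno"}, {"date-enrolled"})]
prime = {"sno", "sname", "cno", "cname"}   # attributes of the four candidate keys

def ok_bcnf(lhs, rhs):
    return closure(lhs, fds) == R           # the determinant must be a superkey

def ok_3nf(lhs, rhs):
    return ok_bcnf(lhs, rhs) or set(rhs) <= prime

print(ok_3nf({"sno"}, {"sname"}))    # True : sname is a prime attribute
print(ok_bcnf({"sno"}, {"sname"}))   # False: sno alone is not a superkey
```

This confirms the claim above: sno -> sname passes the 3NF test only through the prime-attribute escape clause, and it is precisely such dependencies that BCNF rejects.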
4.14 Multivalued Dependency and Fourth normal form
In a relational model, if all of the information about an entity is to be
represented in one relation, it is necessary to repeat all the information
other than the multivalued attribute value in order to represent all the
information we wish to represent. This results in many tuples about the same
instance of the entity in the relation, and in the relation having a composite key
(the entity id and the multivalued attribute). Of course, the other option
suggested was to represent this multivalued information in a separate relation.
The situation becomes much worse if an entity has more than one multivalued
attribute and these values are represented in one relation by a number of
tuples for each entity instance. Multivalued dependency relates to this
problem of more than one multivalued attribute. Consider the
following relation that represents an entity employee that has one multivalued
attribute proj:
emp (e#, dept, salary, proj)
We have so far considered normalization based on functional dependencies;
dependencies that apply only to single-valued facts. For example, e# -> dept
implies only one dept value for each value of e#. Not all information in a
database is single-valued, for example, proj in an employee relation may be
the list of all projects that the employee is currently working on. Although e#
determines the list of all projects that an employee is working on, e# -> proj is
not a functional dependency.
We can make multivalued dependency clearer with the following example.
programmer (emp_name, qualifications, languages)
This relation includes two multivalued attributes of entity programmer;
qualifications and languages. There are no functional dependencies.
The attributes qualifications and languages are assumed independent of each
other. If we were to consider qualifications and languages separate entities,
we would have two relationships (one between employees and qualifications
and the other between employees and programming languages). Both the
above relationships are many-to-many i.e. one programmer could have several
qualifications and may know several programming languages. Also one
qualification may be obtained by several programmers and one programming
language may be known to many programmers.
Functional dependency A -> B relates one value of A to one value of B while
multivalued dependency A ->> B defines a relationship in which a set of
values of attribute B are determined by a single value of A.
Now, more formally, X ->> Y is said to hold for R(X, Y, Z) if, whenever t1 and
t2 are two tuples in R that have the same values for the attributes X, i.e.
t1[X] = t2[X], then R also contains tuples t3 and t4 (not necessarily distinct)
such that
t1[X] = t2[X] = t3[X] = t4[X]
t3[Y] = t1[Y] and t3[Z] = t2[Z]
t4[Y] = t2[Y] and t4[Z] = t1[Z]
In other words if t1 and t2 are given by
t1 = [X, Y1, Z1], and
t2 = [X, Y2, Z2]
then there must be tuples t3 and t4 such that
t3 = [X, Y1, Z2], and
t4 = [X, Y2, Z1]
We are therefore insisting that every value of Y appears with every value of Z
to keep the relation instances consistent. In other words, the above conditions
insist that X alone determines Y and Z and there is no relationship between Y
and Z since Y and Z appear in every possible pair and hence these pairings
present no information and are of no significance.
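The swapped-tuple condition above can be checked mechanically. Below is a minimal Python sketch (the programmer relation and its values are invented for illustration) that tests whether X ->> Y holds by searching, for every ordered pair of tuples agreeing on X, for the required tuple t3 = [X, Y1, Z2]; the reversed pair covers t4:

```python
from itertools import product

def holds_mvd(rows, x, y, z):
    """Return True if X ->> Y holds in `rows` (a list of dicts).

    For every ordered pair (t1, t2) agreeing on X, the tuple with
    X and Z values from t2 and Y values from t1 must also be present;
    checking the reversed pair (t2, t1) covers the symmetric tuple t4."""
    table = {tuple(sorted(r.items())) for r in rows}
    for t1, t2 in product(rows, rows):
        if all(t1[a] == t2[a] for a in x):
            t3 = dict(t2)          # X and Z values taken from t2 ...
            for a in y:
                t3[a] = t1[a]      # ... Y values taken from t1
            if tuple(sorted(t3.items())) not in table:
                return False
    return True

# programmer(emp_name, qualification, language): for one employee,
# every qualification must be paired with every language.
rows = [{"emp_name": "ann", "qualification": q, "language": l}
        for q, l in product(["BSc", "MSc"], ["C", "SQL"])]
print(holds_mvd(rows, ["emp_name"], ["qualification"], ["language"]))  # True

rows.pop()  # drop one (qualification, language) pairing
print(holds_mvd(rows, ["emp_name"], ["qualification"], ["language"]))  # False
```

With the full cross product of qualifications and languages the dependency holds; removing a single pairing violates it, which is exactly the consistency condition described above.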
Fourth normal form
Fourth normal form (or 4NF) requires that there be no non-trivial multivalued
dependencies of attribute sets on anything other than a superset of a
candidate key. A table is said to be in 4NF if and only if it is in BCNF and
all of its non-trivial multivalued dependencies are in fact functional
dependencies. 4NF thus removes an unwanted data structure: multivalued
dependencies.
Definition: A relation schema R is in 4NF with respect to a set of
dependencies F if, for every non-trivial multivalued dependency X ->> Y in F+,
X is a superkey for R.
Properties Of Relational Decompositions
Decomposition Property: A relation must satisfy the following two properties
during decomposition.
i. Lossless join property: This property of a decomposition ensures
that no spurious rows are generated when the decomposed relations are reunited
through a natural join operation. i.e. the information in an instance r of R
must be preserved in the instances r1, r2, r3, ..., rk where ri = ∏Ri(r).
A decomposition is lossless with respect to a set of functional dependencies F
if, for every relation instance r on R satisfying F,
r = ∏R1(r) * ∏R2(r) * . . . * ∏Rn(r)
ii. Dependency Preserving Property: If a set of functional dependencies F
holds on R, it should be possible to enforce F by enforcing appropriate
dependencies on each Ri.
Decomposition D = (R1, R2, R3, ..., Rk) of schema R preserves a set of
dependencies F if
(∏R1(F) U ∏R2(F) U . . . U ∏Rk(F))+ = F+
where ∏Ri(F) is the projection of F onto Ri.
i.e. any FD that logically follows from F must also logically follow from the
union of the projections of F onto the Ri's. Then D is called dependency preserving.
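The lossless-join property is easiest to see on a concrete instance. The sketch below (plain Python; the relation r(A, B, C) and the FD A -> B are invented for the example) projects an instance onto R1 = {A, B} and R2 = {A, C} and verifies that the natural join returns exactly the original tuples:

```python
def project(rows, attrs):
    """pi_attrs(r): project onto `attrs` and eliminate duplicates."""
    return {tuple(r[a] for a in attrs) for r in rows}

def natural_join(rows1, attrs1, rows2, attrs2):
    """r1 * r2 on the attributes common to both schemas."""
    common = [a for a in attrs1 if a in attrs2]
    out_attrs = attrs1 + [a for a in attrs2 if a not in attrs1]
    out = set()
    for t1 in rows1:
        for t2 in rows2:
            d1 = dict(zip(attrs1, t1))
            d2 = dict(zip(attrs2, t2))
            if all(d1[c] == d2[c] for c in common):
                out.add(tuple({**d1, **d2}[a] for a in out_attrs))
    return out

# r(A, B, C) satisfying A -> B; the decomposition {R1(A, B), R2(A, C)}
# is lossless because R1 ∩ R2 = {A} and A -> B holds.
r = [{"A": 1, "B": "x", "C": 10},
     {"A": 1, "B": "x", "C": 20},
     {"A": 2, "B": "y", "C": 10}]
r1 = project(r, ["A", "B"])
r2 = project(r, ["A", "C"])
joined = natural_join(r1, ["A", "B"], r2, ["A", "C"])
original = {(t["A"], t["B"], t["C"]) for t in r}
print(joined == original)  # True: no tuples lost, no spurious tuples
```

Dropping the FD (e.g. giving the two A = 1 tuples different B values) would make the same join produce spurious tuples, i.e. the decomposition would be lossy.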
4.15 Join Dependency and Fifth Normal Form
Join dependency is the term used for the property of a relation
schema that cannot be decomposed losslessly into two relation schemas, but can
be decomposed losslessly into three or more simpler relation schemas. It means
that a table, after it has been decomposed into three or more smaller tables,
must be capable of being joined again on common keys to form the original
table.
Fifth normal form
Fifth normal form (5NF, also called project-join normal form or PJ/NF) requires
that there are no non-trivial join dependencies that do not follow from the key
constraints. A table is said to be in 5NF if and only if it is in 4NF and every
join dependency in it is implied by the candidate keys.
4.16 Pitfalls in Relational Database Design.
A bad design may have several properties, including:
Repetition of information.
Inability to represent certain information.
Loss of information.
Module 5
5.1 Distributed Database Concepts
A distributed computing system consists of a number of processing elements that
are interconnected by a computer network and that co-operate in performing certain
assigned tasks.
A distributed database (DDB) is a collection of multiple logically interrelated
databases distributed over a computer network. A distributed database management
system (DDBMS) is a software system that manages a distributed database while
making the distribution transparent to the user. At the physical hardware level, the
following main factors distinguish a DDBMS from a centralized system:
There are multiple computers called sites or nodes.
These sites must be connected by some type of communication
network to transmit data and commands among sites.
Parallel versus Distributed technology – There are two main types of
multiprocessor system architecture:
Shared memory (tightly coupled) architecture: Multiple processors share
secondary (disk) storage and also share primary memory.
Shared disk (loosely coupled) architecture: Multiple processors share
secondary (disk) storage but each has its own primary memory.
Database management systems developed using the above types of architectures are
termed parallel database management systems; rather than DDBMS they utilize
parallel processor technology. In another type of architecture called shared nothing
architecture, every processor has its own primary and secondary (disk) memory, no
common memory exists and the processors communicate over a high-speed
interconnection network. Although the shared nothing architecture resembles a
distributed database computing environment, major differences exist in the mode of
operation. In shared nothing architecture, there is symmetry and homogeneity of
nodes; this is not true of the distributed database environment where heterogeneity of
nodes is very common.
Advantages of Distributed Databases
1. Management of distributed data with different levels of transparency: Ideally,
a DBMS should be distribution transparent in the sense of hiding the details of where
each file is physically stored within the system. The following types of transparencies
are possible:
Distribution or network transparency: This refers to the freedom for the user
from the operational details of the network. It may be divided into location
transparency and naming transparency. Location transparency refers to the
fact that the command used to perform a task is independent of the location of
data and the location of the system where the command was issued. Naming
transparency implies that once a name is specified, the named objects can be
accessed unambiguously without additional specification.
Replication transparency: Copies of data may be stored at multiple sites for
better availability, performance and reliability. Replication transparency
makes the user unaware of the existence of these copies.
Fragmentation transparency: Fragmentation makes the user unaware of the
existence of fragments.
2. Increased availability and reliability: Reliability is defined as the probability that
a system is running at a certain time point. Availability is the probability that the
system is continuously available during a time interval. When the data and DBMS
software are distributed over several sites one site may fail while other sites continue
to operate. Only the data and software that exist at the failed site cannot be accessed.
This improves both reliability and availability.
3. Improved performance: A distributed DBMS fragments the database by keeping
the data closer to where it is needed most. Data localization reduces the contention for
CPU and I/O services and simultaneously reduces access delays involved in wide area
networks. When a large database is distributed over multiple sites, smaller databases
exist at each site. As a result, local queries and transactions accessing data at a single
site have better performance because of the small local databases. Moreover,
interquery and intraquery parallelism can be achieved by executing multiple queries at
different sites.
4. Easier expansion: In a distributed environment, expansion of the system in terms
of adding more data, increasing database sizes or adding more processors is much
easier.
Additional Functions of Distributed Databases
1. Keeping track of data: The ability to keep track of the data distribution,
fragmentation and replication by expanding the DBMS catalog.
2. Distributed Query processing: The ability to access remote sites and transmit
queries and data among the various sites via a communication network.
3. Distributed transaction management: The ability to devise execution strategies for
queries and transactions that access data from more than one site and to synchronize
the access to distributed data and maintain integrity of the overall database.
4. Replicated data management: The ability to decide which copy of a replicated data
item to access and to maintain the consistency of copies of a replicated data item.
5. Distributed database recovery: The ability to recover from individual site crashes
and from new types of failures such as the failure of communication links.
6. Security: Distributed transactions must be executed with the proper management of
the security of the data and the authorization/access privileges of users.
7. Distributed directory (catalog) management: A directory contains information
(metadata) about data in the database.
5.2 Data Fragmentation
This is the process of breaking up the database into logical units called fragments,
which may be assigned for storage at the various sites. There are mainly two types of
fragmentation:
Horizontal fragmentation
Vertical fragmentation
a) Horizontal fragmentation – A horizontal fragment of a relation is a subset of the
tuples in that relation. The tuples that belong to the horizontal fragment are specified
by a condition on one or more attributes of the relation. Often, only a single attribute
is involved. Horizontal fragmentation divides a relation “horizontally” by grouping
rows to create subsets of tuples, where each subset has a certain logical meaning.
These fragments can be assigned to different sites in the distributed system. Derived
horizontal fragmentation applies the partitioning of a primary relation to other
secondary relations which are related to the primary via a foreign key. Each horizontal
fragment on a relation R can be specified by a σCi(R) operation in the relational
algebra. A set of horizontal fragments whose conditions C1, C2, ..., Cn include all
the tuples in R (i.e. every tuple in R satisfies (C1 or C2 or ... or Cn)) is called a
complete horizontal fragmentation of R. In many cases, a complete horizontal
fragmentation is also disjoint; i.e. no tuple in R satisfies (Ci and Cj) for any i ≠ j.
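The completeness and disjointness conditions can be demonstrated on a small instance. The sketch below (illustrative Python; the Employee tuples and the choice of DNo as the fragmentation attribute are invented) fragments a relation horizontally and checks both properties:

```python
# Hypothetical Employee tuples, fragmented horizontally on DNo.
employees = [{"SSN": 1, "Name": "ann",  "DNo": 4},
             {"SSN": 2, "Name": "bob",  "DNo": 5},
             {"SSN": 3, "Name": "carl", "DNo": 5}]

# Each fragment is sigma_Ci(Employee) for a selection condition Ci.
conditions = [lambda t: t["DNo"] == 4, lambda t: t["DNo"] == 5]
fragments = [[t for t in employees if c(t)] for c in conditions]

# Completeness: every tuple satisfies (C1 or C2 or ... or Cn).
complete = all(any(c(t) for c in conditions) for t in employees)
# Disjointness: no tuple satisfies both Ci and Cj for i != j.
disjoint = all(sum(c(t) for c in conditions) <= 1 for t in employees)
print(complete, disjoint)  # True True
```

Adding a tuple with, say, DNo = 7 would break completeness; overlapping conditions such as DNo >= 4 and DNo == 5 would break disjointness.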
b) Vertical fragmentation – Vertical fragmentation divides a relation “vertically” by
columns. A vertical fragment of a relation keeps only certain attributes of the relation.
It is necessary to include the primary key or some candidate key attribute in every
vertical fragment so that the full relation can be reconstructed from the fragments. For
e.g.: Consider the schema Employee (Name, Bdate, Address, Sex, SSN, Salary, DNo).
We want to fragment this relation into 2 vertical fragments. The first fragment
includes personal information – Name, Address, Bdate and Sex – and the second
fragment includes work related information – SSN, Salary and DNo. This
fragmentation is not proper because, if the two fragments are stored separately we
cannot put the original employee tuples back together, since there is no common
attribute between the two fragments. Hence we must add SSN attribute to the personal
information fragment also. A vertical fragment on a relation R can be specified by a
ПLi(R) operation in the relational algebra. A set of vertical fragments whose projection
lists L1, L2, ……., Ln include all the attributes in R but share only the primary key
attribute of R is called a complete vertical fragmentation of R. In this case, the
projection lists satisfy the following conditions:
1. L1 U L2 U…..U Ln = ATTRS(R)
2. Li ∩ Lj = PK(R) for any i ≠ j, where ATTRS(R) is the set of attributes of R and
PK(R) is the primary key of R.
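The two conditions on the projection lists translate directly into set operations. A small sketch, reusing the Employee schema from the example above:

```python
ATTRS = {"Name", "Bdate", "Address", "Sex", "SSN", "Salary", "DNo"}
PK = {"SSN"}  # the key repeated in every vertical fragment

# Projection lists L1 and L2; SSN is included in the personal-information
# fragment so the original relation can be rebuilt by a join on SSN.
L1 = {"SSN", "Name", "Bdate", "Address", "Sex"}
L2 = {"SSN", "Salary", "DNo"}

covers_all = (L1 | L2) == ATTRS   # condition 1: L1 U L2 = ATTRS(R)
share_only_key = (L1 & L2) == PK  # condition 2: L1 ∩ L2 = PK(R)
print(covers_all, share_only_key)  # True True
```

Leaving SSN out of L1 would fail the second condition and, as noted above, make the original tuples unrecoverable.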
c) Mixed (Hybrid) fragmentations – Mixed fragmentation is the combination of
vertical fragmentation and horizontal fragmentation. In general a fragment of a
relation can be constructed by a SELECT-PROJECT combination of operations
ПL(σC(R)).
If C = True and L ≠ ATTRS(R), we get a vertical fragment.
If C ≠ True and L = ATTRS(R), we get a horizontal fragment.
If C ≠ True and L ≠ ATTRS(R), we get a mixed fragment.
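The three cases can be sketched with a single select-then-project helper (illustrative Python over an invented Employee instance):

```python
def fragment(rows, condition, attrs):
    """Pi_L(sigma_C(R)): select the qualifying tuples, then project."""
    return [{a: t[a] for a in attrs} for t in rows if condition(t)]

# Invented Employee instance for illustration.
employees = [{"SSN": 1, "Name": "ann", "Salary": 50, "DNo": 5},
             {"SSN": 2, "Name": "bob", "Salary": 60, "DNo": 4}]

attrs_r = ["SSN", "Name", "Salary", "DNo"]
# C = True, L != ATTRS(R): a vertical fragment.
vertical = fragment(employees, lambda t: True, ["SSN", "Name"])
# C != True, L = ATTRS(R): a horizontal fragment.
horizontal = fragment(employees, lambda t: t["DNo"] == 5, attrs_r)
# C != True, L != ATTRS(R): a mixed fragment.
mixed = fragment(employees, lambda t: t["DNo"] == 5, ["SSN", "Name"])
print(mixed)  # [{'SSN': 1, 'Name': 'ann'}]
```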
d) Fragmentation schema – A fragmentation schema of a database is a definition of
a set of fragments that includes all attributes and tuples in the database and satisfies
the condition that the whole database can be reconstructed from the fragments by
applying some sequence of OUTER UNION and UNION operations.
e) Allocation schema – An allocation schema describes the allocation of fragments to
sites of the DDBS; hence it is a mapping that specifies for each fragment the site(s) at
which it is stored.
5.3 Data Replication and Allocation
If a fragment is stored at more than one site, it is said to be replicated.
a) Fully replicated distributed database – If the replication of whole database is
done at every site in the distributed system, the resulting database is called a fully
replicated distributed database. This can improve availability remarkably because the
system can continue to operate as long as at least one site is up. It also improves
performance of retrieval for global queries. The disadvantage of full replication is that
it can slow down update operations drastically.
b) Nonredundant allocation – In this system, each fragment is stored at exactly one
site. In this case, all fragments must be disjoint except for the repetition of primary
keys among vertical (or mixed) fragments.
c) Partial replication – In this system, some fragments of the database may be
replicated whereas others may not. The number of copies of each fragment can range
from one up to the total number of sites in the distributed system.
d) Replication schema – A description of the replication of fragments is called a
replication schema. Each fragment – or each copy of a fragment – must be assigned to
a particular site in the distributed system. This process is called data distribution
(or data allocation).
5.4 Types of Distributed Database Systems
The term distributed database management system can describe various systems that
differ from one another in many respects. The main thing that all such systems have
in common is the fact that data and software are distributed over multiple sites
connected by some form of communication network.
The first factor we consider is the degree of homogeneity of the DDBMS
software. If all servers (or individual local DBMSs) use identical software and all
users (clients) use identical software, the DDBMS is called homogeneous; otherwise,
it is called heterogeneous. Another factor related to the degree of homogeneity is the
degree of local autonomy. If there is no provision for the local site to function as a
stand-alone DBMS, then the system has no local autonomy. On the other hand, if
direct access by local transactions to a server is permitted, the system has some
degree of local autonomy.
At one extreme of the autonomy spectrum, we have a DDBMS that "looks like" a
centralized DBMS to the user. A single conceptual schema exists, and all access to the
system is obtained through a site that is part of the DDBMS, which means that no
local autonomy exists. At the other extreme we encounter a type of DDBMS called a
federated DDBMS (or a multidatabase system). In such a system, each server is an
independent and autonomous centralized DBMS that has its own local users, local
transactions, and DBA, and hence has a very high degree of local autonomy. The term
federated database system (FDBS) is used when there is some global view or schema
of the federation of databases that is shared by the applications. On the other hand, a
multidatabase system does not have a global schema and interactively constructs one
as needed by the application. Both systems are hybrids between distributed and
centralized systems, and the distinction we made between them is not strictly
followed. We will refer to them as FDBSs in a generic sense.
In a heterogeneous FDBS, one server may be a relational DBMS, another a network
DBMS, and a third an object or hierarchical DBMS; in such a case it is necessary to
have a canonical system language and to include language translators to translate
subqueries from the canonical language to the language of each server. We briefly
discuss the issues affecting the design of FDBSs below.
Federated Database Management Systems Issues
The type of heterogeneity present in FDBSs may arise from several sources.
Differences in data models: Databases in an organization come from a
variety of data models, including the relational data model, the object data
model, etc. The modeling capabilities of the models vary. Hence, to deal with
them uniformly via a single global schema or to process them in a single
language is challenging. Even if two databases are both from the RDBMS
environment, the same information may be represented as an attribute name,
as a relation name, or as a value in different databases. This calls for an
intelligent query-processing mechanism that can relate information based on
metadata.
Differences in constraints: Constraint specification and implementation
facilities vary from system to system. There are comparable features
that must be reconciled in the construction of a global schema. For example,
the relationships from ER models are represented as referential integrity
constraints in the relational model. Triggers may have to be used to
implement certain constraints in the relational model. The global schema
must also deal with potential conflicts among constraints.
Differences in query languages: Even with the same data model, the
languages and their versions vary. For example, SQL has multiple versions
like SQL-89, SQL-92 (SQL2), and SQL3, and each system has its own set of
data types, comparison operators, string manipulation features, and so on.
Semantic Heterogeneity
Semantic heterogeneity occurs when there are differences in the meaning,
interpretation, and intended use of the same or related data. Semantic heterogeneity
among component database systems (DBSs) creates the biggest hurdle in designing
global schemas of heterogeneous databases. The design autonomy of component
DBSs refers to their freedom in choosing the following design parameters, which in
turn affect the eventual complexity of the FDBS:
The universe of discourse from which the data is drawn: For example, two
customer accounts databases in the federation may be from the United States and
Japan, with entirely different sets of attributes about customer accounts
required by the respective accounting practices. Currency rate fluctuations would
also present a problem. Hence, relations in these two databases with identical
names (CUSTOMER or ACCOUNT) may have some common and some entirely
distinct information.
Representation and naming: The representation and naming of data
elements and the structure of the data model may be prespecified for each
local database.
The understanding, meaning, and subjective interpretation of data:
This is a chief contributor to semantic heterogeneity.
Transaction and policy constraints: These deal with serializability criteria,
compensating transactions, and other transaction policies.
Derivation of summaries: Aggregation, summarization, and other data-
processing features and operations supported by the system.
5.5 Query Processing in Distributed Databases
Data Transfer Costs of Distributed Query Processing
In a distributed system, several additional factors further complicate
query processing. The first is the cost of transferring data over the network. This
data includes intermediate files that are transferred to other sites for further
processing, as well as the final result files that may have to be transferred to the
site where the query result is needed. Although these costs may not be very high
if the sites are connected via a high-performance local area network, they
become quite significant in other types of networks. Hence, DDBMS query
optimization algorithms consider the goal of reducing the amount of data
transfer as an optimization criterion in choosing a distributed query execution
strategy.
Distributed Query Processing Using Semijoin
The idea behind distributed query processing using the semijoin operation is to
reduce the number of tuples in a relation before transferring it to another site.
Intuitively, the idea is to send the joining column of one relation R to the site
where the other relation S is located; this column is then joined with S. Following
that, the join attributes, along with the attributes required in the result, are
projected out and shipped back to the original site and joined with R. Hence, only
the joining column of R is transferred in one direction, and a subset of S with no
extraneous tuples or attributes is transferred in the other direction. If only a
small fraction of the tuples in S participate in the join, this can be quite an
efficient solution for minimizing data transfer.
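The transfer steps of a semijoin strategy can be sketched in Python (the EMPLOYEE and DEPT relations and the assignment of relations to sites are invented for the example; lists stand in for the relations stored at each site):

```python
# Site 1 stores EMPLOYEE(SSN, Name, DNo); site 2 stores DEPT(DNo, DName).
employee = [{"SSN": 1, "Name": "ann", "DNo": 5},
            {"SSN": 2, "Name": "bob", "DNo": 4},
            {"SSN": 3, "Name": "eve", "DNo": 9}]  # DNo 9 has no match
dept = [{"DNo": 4, "DName": "hq"}, {"DNo": 5, "DName": "lab"}]

# Step 1: ship only the joining column pi_DNo(DEPT) from site 2 to site 1.
dept_keys = {d["DNo"] for d in dept}

# Step 2: semijoin at site 1 -- keep only the employees that will match.
reduced = [e for e in employee if e["DNo"] in dept_keys]

# Step 3: ship the reduced relation back and complete the join at site 2.
result = [{**e, "DName": d["DName"]}
          for e in reduced for d in dept if e["DNo"] == d["DNo"]]
print(len(employee), len(reduced))  # 3 2 -- eve's tuple was never shipped
```

Only the small DNo column travels in one direction and the reduced EMPLOYEE subset in the other; the non-matching tuple never crosses the network, which is the data-transfer saving the text describes.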
Query and Update Decomposition
In a DDBMS with no distribution transparency, the user phrases a query directly in
terms of specific fragments.
The user must also maintain consistency of replicated data items when updating a
DDBMS with no replication transparency.
On the other hand, a DDBMS that supports full distribution, fragmentation, and
replication transparency allows the user to specify a query or update request on
the schema just as though the DBMS were centralized. For updates, the DDBMS
is responsible for maintaining consistency among replicated items by using one of
the distributed concurrency control algorithms. For queries, a query decomposition
module must break up or decompose a query into subqueries that can be
executed at the individual sites. In addition, a strategy for combining the results of
the subqueries to form the query result must be generated. Whenever the DDBMS
determines that an item referenced in the query is replicated, it must choose or
materialize a particular replica during query execution.
To determine which replicas include the data items referenced in a query, the
DDBMS refers to the fragmentation, replication, and distribution information
stored in the DDBMS catalog. For vertical fragmentation, the attribute list for
each fragment is kept in the catalog. For horizontal fragmentation, a condition,
sometimes called a guard, is kept for each fragment. This is basically a selection
condition that specifies which tuples exist in the fragment; it is called a guard
because only tuples that satisfy this condition are permitted to be stored in the
fragment. For mixed fragments, both the attribute list and the guard condition are
kept in the catalog.
5.6 Concurrency Control and Recovery in Distributed Databases
For concurrency control and recovery purposes, numerous problems arise in a
distributed DBMS environment that are not encountered in a centralized DBMS
environment. These include the following:
Dealing with multiple copies of the data items: The concurrency control
method is responsible for maintaining consistency among these copies. The
recovery method is responsible for making a copy consistent with other
copies if the site on which the copy is stored fails and recovers later.
Failure of individual sites: The DDBMS should continue to operate with its
running sites, if possible, when one or more individual sites fail. When a site
recovers, its local database must be brought up to date with the rest of the
sites before it rejoins the system.
Failure of communication links: The system must be able to deal with failure
of one or more of the communication links that connect the sites. An extreme
case of this problem is that network partitioning may occur. This breaks up the
sites into two or more partitions, where the sites within each partition can
communicate only with one another and not with sites in other partitions.
Distributed commit: Problems can arise with committing a transaction that is
accessing databases stored on multiple sites if some sites fail during the
commit process. The two-phase commit protocol (see Chapter 21) is often
used to deal with this problem.
Distributed deadlock: Deadlock may occur among several sites, so techniques
for dealing with deadlocks must be extended to take this into account.
References
1. Fundamentals of Database Systems – Elmasri and Navathe (3rd Edition), Pearson
Education Asia
2. Database System Concepts – Henry F. Korth, Abraham Silberschatz (2nd Edition),
McGraw Hill
3. An Introduction to Database Systems – C. J. Date (7th Edition), Pearson
Education Asia
4. Database Principles, Programming and Performance – Patrick O'Neil, Elizabeth
O'Neil
5. An Introduction to Database Systems – Bipin C. Desai
6. Teach Yourself PL/SQL in 21 Days – SAMS Publications
7. SQL, PL/SQL – Ivan Bayross
8. ORACLE Developer's Guide – David McClanahan