8/14/2019 Database Management Systems UNIT - 1
UNIT I
INTRODUCTION
Database Management Systems
A computer database is a structured collection of records or data that is stored in a computer
system. The structure is achieved by organizing the data according to a database model. The
model in most common use today is the relational model. Other models, such as the
hierarchical model and the network model, use a more explicit representation of relationships
(see below for an explanation of the various database models).
A computer database relies upon software to organize the storage of data. This software is
known as a database management system (DBMS). Database management systems are
categorized according to the database model that they support. The model tends to
determine the query languages that are available to access the database. A great deal of
the internal engineering of a DBMS, however, is independent of the data model, and is
concerned with managing factors such as performance, concurrency, integrity, and
recovery from hardware failures. In these areas there are large differences between
products.
1. A database management system (DBMS), or simply a database system (DBS),
consists of
o A collection of interrelated and persistent data (usually referred to as the
database (DB)).
o A set of application programs used to access, update and manage that data
(which form the data management system (MS)).
2. The goal of a DBMS is to provide an environment that is both convenient and
efficient to use in
o Retrieving information from the database.
o Storing information into the database.
3. Databases are usually designed to manage large bodies of information. This involves
o Definition of structures for information storage (data modeling).
o Provision of mechanisms for the manipulation of information (file and systems structure, query processing).
o Providing for the safety of information in the database (crash recovery and
security).
o Concurrency control if the system is shared by users.
Purpose of Database Systems
1. To see why database management systems are necessary, let us look at a typical "file-processing
system" supported by a conventional operating system.
The application is a savings bank:
o Savings account and customer records are kept in permanent system files.
o Application programs are written to manipulate files to perform the following
tasks:
Debit or credit an account.
Add a new account.
Find an account balance.
Generate monthly statements.
2. Development of the system proceeds as follows:
o New application programs must be written as the need arises.
o New permanent files are created as required.
o But over a long period of time files may be in different formats, and
o Application programs may be in different languages.
3. So we can see there are problems with the straight file-processing approach:
o Data redundancy and inconsistency
Same information may be duplicated in several places.
All copies may not be updated properly.
o Difficulty in accessing data
May have to write a new application program to satisfy an unusual
request.
E.g. find all customers with the same postal code.
Could generate this data manually, but it is a long job.
o Data isolation
Data in different files.
Data in different formats.
Difficult to write new application programs.
o Multiple users
Want concurrency for faster response time.
Need protection for concurrent updates.
E.g. two customers withdraw funds from the same account at the
same time: the account has $500 in it, and they withdraw $100 and $50.
Without protection the result could be $350 (correct), $400 or $450.
o Security problems
Every user of the system should be able to access only the data they are
permitted to see.
E.g. payroll people only handle employee records, and cannot see
customer accounts; tellers only access account data and cannot see
payroll data.
Difficult to enforce this with application programs.
o Integrity problems
Data may be required to satisfy constraints.
E.g. no account balance below $25.00.
Again, difficult to enforce or to change constraints with the file-processing
approach.
These problems and others led to the development of database management
systems.
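The concurrent-withdrawal example above can be replayed in a few lines of Python. This is a minimal sketch, not how a DBMS works internally: each withdrawal is modelled, by assumption, as a read step followed by a write step on a shared balance, and `withdraw_steps`, `run` and the schedule names are illustrative. The serial schedule gives the correct $350; each interleaved schedule loses one of the two updates.

```python
# Replays the "two withdrawals from one account" example without any
# concurrency control. A withdrawal is a read step followed by a write step.


def withdraw_steps(amount):
    """Return the (read, write) steps of one unprotected withdrawal."""
    state = {}

    def read(account):
        state["seen"] = account["balance"]          # remember the balance read

    def write(account):
        account["balance"] = state["seen"] - amount  # may overwrite a newer value

    return read, write


def run(schedule):
    """Execute a schedule of (transaction, step) pairs on a $500 account."""
    account = {"balance": 500}
    txns = {"A": withdraw_steps(100), "B": withdraw_steps(50)}
    for name, step in schedule:
        read, write = txns[name]
        (read if step == "read" else write)(account)
    return account["balance"]


serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
lost_b = [("A", "read"), ("B", "read"), ("B", "write"), ("A", "write")]
lost_a = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run(serial))  # 350: correct
print(run(lost_b))  # 400: B's withdrawal is lost
print(run(lost_a))  # 450: A's withdrawal is lost
```

A DBMS rules out the last two outcomes through concurrency control, for example by locking the account so that the second transaction's read waits until the first transaction's write is complete.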
Data Abstraction
1. The major purpose of a database system is to provide users with an abstract view of
the system.
The system hides certain details of how data is stored, created and maintained.
Complexity should be hidden from database users.
2. There are several levels of abstraction:
1. Physical
Features of the physical data model include:
Specification of all tables and columns.
Foreign keys are used to identify relationships between tables.
Denormalization may occur based on user requirements.
Physical considerations may cause the physical data model to be quite different from
the logical data model.
At this level, the data modeler will specify how the logical data model will be realized in the
database schema.
The steps for physical data model design are as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.
How the data are stored.
E.g. index, B-tree, hashing.
2. Logical
The steps for designing the logical data model are as follows:
1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.
Next highest level of abstraction.
Describes what data are stored.
Describes the relationships among data.
Database administrator level.
3. View: Highest level.
Describes part of the database for a particular group of users.
Can be many different views of a database.
E.g. tellers in a bank get a view of customer accounts, but not of
payroll data.
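The logical-design step of resolving many-to-many relationships and the view level can be shown together in a small SQL sketch, run here through Python's built-in sqlite3 module. The bank schema is hypothetical: because a joint account has several holders and a customer may hold several accounts, the customer/account relationship is resolved with a `depositor` table of foreign-key pairs, and tellers query a view that shows account holders and balances but nothing from `payroll`. SQLite has no GRANT statement, so the sketch shows only what the view exposes; a multi-user DBMS would also withhold the tellers' privileges on the base tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE account  (acct_no INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE payroll  (emp_id  INTEGER PRIMARY KEY, name TEXT, salary REAL);

-- Resolves the many-to-many customer/account relationship.
CREATE TABLE depositor (
    cust_id INTEGER REFERENCES customer(cust_id),
    acct_no INTEGER REFERENCES account(acct_no),
    PRIMARY KEY (cust_id, acct_no)
);

-- View level for tellers: account holders and balances only.
CREATE VIEW teller_accounts AS
    SELECT c.name AS name, a.acct_no AS acct_no, a.balance AS balance
    FROM depositor d
    JOIN customer c ON c.cust_id = d.cust_id
    JOIN account  a ON a.acct_no = d.acct_no;

INSERT INTO customer VALUES (1, 'Mehta'), (2, 'Rao');
INSERT INTO account  VALUES (101, 500.0), (102, 75.0);
INSERT INTO payroll  VALUES (7, 'Iyer', 30000.0);
-- Account 101 is a joint account; Mehta also holds account 102.
INSERT INTO depositor VALUES (1, 101), (2, 101), (1, 102);
""")

cur = conn.execute("SELECT * FROM teller_accounts ORDER BY acct_no, name")
print([col[0] for col in cur.description])  # ['name', 'acct_no', 'balance']
for row in cur.fetchall():
    print(row)
# ('Mehta', 101, 500.0)
# ('Rao', 101, 500.0)
# ('Mehta', 102, 75.0)
```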
Figure 1.1 illustrates the three levels.
Figure 1.1: The three levels of data abstraction
LOGICAL DATA INDEPENDENCE AND PHYSICAL DATA INDEPENDENCE
Logical Data Independence:
Conceptual E-R Model
A conceptual entity-relationship model shows how the business
world sees information. It suppresses non-critical details in order
to emphasize business rules and user objects. It typically
includes only significant entities which have business meaning, along with their relationships. Many-to-many relationships are
acceptable to represent entity associations.
A conceptual model might discover that there is a need to house
information about each person in an organization. While
considerable thought is given to discovering and describing the
relevant properties of each person, the designers accept
implicitly that each person is distinct and unique.
A conceptual model may include a few significant attributes to
augment the definition and visualization of entities. No effort
need be made to inventory the full attribute population of such a
model. A conceptual model may have some identifying concepts
or candidate keys noted, but it explicitly does not include a
complete scheme of identity, since identifiers are logical choices made from a deeper context.
In Summary: The conceptual model is concerned with the real-world view and
understanding of data; the logical model is a generalized formal
structure in the rules of information science; the physical model
specifies how this will be executed in a particular DBMS instance.
Various data modeling methodologies and products provide
these layers of abstraction in different ways. Some address only
the physical implementation; some model only the logical
structure; others may provide elements of all three but not
necessarily in three separate views. In each case it helps the data
modeler to understand the level of abstraction to which a
particular feature or task belongs.
Data Storage Characteristics
For a significant amount of data, we require persistent, inexpensive, reliable and
sharable storage methods with relatively rapid access time.
Persistent: Data persists (lives on) after power is removed.
Inexpensive: Typically measured on a $ per Megabyte basis.
Reliable: Should not have to be replaced due to excessive errors.
Sharable: Should facilitate sharing of data among many users.
Access time: Data should be accessible in a relatively short period of time.
Advantages
The advantages of database management systems can be enumerated as under:
Warehouse of Information
Database management systems are warehouses of information, where large amounts of
data can be stored. Common examples in commercial applications are inventory data,
personnel data, etc. It often happens that a common man uses a database management system
without even realizing that it is being used. The best examples of this would be the
address book of a cell phone, digital diaries, etc. Both these devices store data in their
internal database.
Defining Attributes
The unique data field in a table is assigned a primary key. The primary key helps in the
identification of data. It also checks for duplicates within the same table, thereby reducing
data redundancy. Some tables have a foreign key in addition to the primary
key. The foreign key refers to the primary key
of another table, thus establishing a relationship between the two tables.
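A minimal sketch of the DBMS enforcing both kinds of key, using Python's built-in sqlite3 module and hypothetical `department` and `employee` tables: the primary key rejects a duplicate employee number, and the foreign key rejects an employee assigned to a department that does not exist. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE department (
    dept_no INTEGER PRIMARY KEY,
    dname   TEXT NOT NULL
);
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_no INTEGER NOT NULL REFERENCES department(dept_no)  -- foreign key
);
INSERT INTO department VALUES (1, 'Accounts');
INSERT INTO employee   VALUES (100, 'Kiran', 1);
""")


def try_insert(sql, params):
    """Attempt an insert; return None on success or the DBMS error message."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Duplicate primary key value: rejected (prints an error message).
print(try_insert("INSERT INTO employee VALUES (?, ?, ?)", (100, "Anil", 1)))
# Foreign key pointing at a non-existent department: rejected.
print(try_insert("INSERT INTO employee VALUES (?, ?, ?)", (101, "Anil", 99)))
# Valid row: accepted, prints None.
print(try_insert("INSERT INTO employee VALUES (?, ?, ?)", (101, "Anil", 1)))
```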
Systematic Storage
The data is stored in the form of tables. The tables consist of rows and columns. The primary
and foreign keys help to eliminate data redundancy, enabling systematic storage of data.
Changes to Schema
The table schema can be changed and it is not platform dependent. Therefore, the tables in
the system can be edited to add new columns and rows without hampering the applications
that depend on that particular database.
No Language Dependence
Database management systems are not language dependent. Therefore, they can be used
with various languages and on various platforms.
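Table Joins
The data in two or more tables can be combined into a single result table when it is retrieved.
Because related data can be joined on demand instead of being stored together redundantly, this helps
reduce the size of the database and also makes retrieval of data easy.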
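A minimal join, sketched with Python's built-in sqlite3 module and hypothetical `customer` and `account` tables: each customer's name is stored once, and the join attaches it to every account that customer holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account  (acct_no INTEGER PRIMARY KEY, cust_id INTEGER, balance REAL);
INSERT INTO customer VALUES (1, 'Mehta'), (2, 'Rao');
INSERT INTO account  VALUES (101, 1, 500.0), (102, 1, 75.0), (103, 2, 1200.0);
""")

# The name is stored once in customer, yet appears beside each matching account.
rows = conn.execute("""
    SELECT c.name, a.acct_no, a.balance
    FROM customer c
    JOIN account a ON a.cust_id = c.cust_id
    ORDER BY a.acct_no
""").fetchall()
for row in rows:
    print(row)
# ('Mehta', 101, 500.0)
# ('Mehta', 102, 75.0)
# ('Rao', 103, 1200.0)
```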
Multiple Simultaneous Usage
The database can be used simultaneously by a number of users. Various users can retrieve the
same data simultaneously. The data in the database can also be modified, based on the
privileges assigned to users.
Data Security
Data is the most important asset. Therefore, there is a need for data security. Database management systems help to keep the data secured.
Privileges
Different privileges can be given to different users. For example, some users can edit the
database, but are not allowed to delete the contents of the database.
Abstract View of Data and Easy Retrieval
A DBMS enables easy and convenient retrieval of data. A database user can view only the
abstract form of data; the complexities of the internal structure of the database are hidden
from the user. The data fetched is in a user-friendly format.
Data Consistency
Data consistency ensures a consistent view of data to every user. It includes the accuracy,
validity and integrity of related data. The data in the database must satisfy certain consistency
constraints; for example, the age of a candidate appearing for an exam should be of number
datatype and in the range of 20-25. When the database is updated, these constraints are
checked by the database system.
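A minimal sketch of a constraint that is declared once and then checked by the DBMS on every update, using Python's built-in sqlite3 module. The `candidate` table is hypothetical and the 20-25 range follows the example in the text. Because SQLite column types are flexible, the `typeof(age) = 'integer'` test is what enforces "number datatype" here; a DBMS with strict column types would reject non-numeric input through the declared type alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: age must be an integer in the range 20-25.
conn.execute("""
CREATE TABLE candidate (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    age     INTEGER NOT NULL
            CHECK (typeof(age) = 'integer' AND age BETWEEN 20 AND 25)
)
""")


def add_candidate(roll_no, name, age):
    """Return True if the DBMS accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO candidate VALUES (?, ?, ?)", (roll_no, name, age))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_candidate(1, "Priya", 22))    # True
print(add_candidate(2, "Arun", 31))     # False: out of range
print(add_candidate(3, "Neha", "abc"))  # False: not a number
```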
The commonly used database management system is called a relational database management
system (RDBMS). The most important advantage of database management systems is the
systematic storage of data, by maintaining the relationship between the data members. The
data is stored as tuples in an RDBMS.
The advent of object-oriented programming gave rise to the concept of object-oriented
database management systems. These systems combine properties like inheritance,
encapsulation, polymorphism and abstraction with atomicity, consistency, isolation and
durability, also called the ACID properties of a DBMS.
Database management systems have brought about systematization in data storage, along
with data security.
1. Controlling Data Redundancy: In the conventional file processing system, every user
group maintains its own files for handling its data. This may lead to
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors may be generated due to updating of the same data in different files.
• Time is wasted in entering the same data again and again.
• Computer resources are needlessly used.
• It is very difficult to combine information.
2. Elimination of Inconsistency: In the file processing system, information is duplicated
throughout the system. So changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. So we need to remove this duplication of data
in multiple files to eliminate inconsistency.
For example:
3. Better service to the users: A DBMS is often used to provide better services to the users.
In a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and how
up to date it is are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can obtain new and combined
information easily that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users who don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the System is Improved: Since changes are often necessary to the contents
of the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when
the data in the database changes.
5. Integrity can be improved: Since the data of an organization using the database approach is
centralized and used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
For example: Consider the result system that we have already discussed. Since multiple
files have to be maintained, you may sometimes enter a value for course which does not exist.
Suppose course can have the values (Computer, Accounts, Economics, and Arts) but we enter the
value 'Hindi' for it; this leads to inconsistent data, and so to a lack of integrity.
Even if we centralize the database it may still contain incorrect data. For example:
• The salary of a full-time employee may be entered as Rs. 500 rather than Rs. 5000.
• A student may be shown to have borrowed books but has no enrollment.
• A list of employee numbers for a given department may include a number of non-existent
employees.
These problems can be avoided by defining validation procedures that are applied whenever any
update operation is attempted.
6. Standards can be enforced: Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, format of data,
structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved: In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
Consider an example of banking in which employees at different levels may be given
access to different types of data in the database. A clerk may be given the authority to know
only the names of all the customers who have a loan in the bank, but not the details of each loan
the customer may have. This can be accomplished by giving privileges to each employee.
8. Organization
For example: A DBA must choose the best file structure and access method to give fast
response for highly critical applications as compared to less critical applications.
9. Overall cost of developing and maintaining systems is lower: It is much easier to
respond to unanticipated requests when data is centralized in a database than when it is
stored in a conventional file system. Although the initial cost of setting up a database can
be large, one normally expects the overall cost of setting up a database and developing and
maintaining application programs to be far lower than for similar service using conventional
systems, since the productivity of programmers can be higher using the non-procedural
languages that have been developed with DBMSs than using procedural languages.
10. Data Model must be developed: Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed to meet the needs of particular
applications. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery: Centralizing a database provides schemes for
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database to recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
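The core idea behind returning to a consistent state is the transaction: either all of its updates become permanent or none do. The sketch below uses Python's built-in sqlite3 module and a hypothetical `account` table; a transfer that fails after the debit is rolled back, so both balances are unchanged. It illustrates atomicity only; backups and crash recovery after disk or power failures rely on logs and archived copies that this sketch does not model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acct_no INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500.0), (2, 100.0)])
conn.commit()


def transfer(src, dst, amount, fail_midway=False):
    """Move money between two accounts as a single transaction."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE acct_no = ?",
                         (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute("UPDATE account SET balance = balance + ? WHERE acct_no = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the debit has already been rolled back by the context manager


def balances():
    return dict(conn.execute("SELECT acct_no, balance FROM account"))


transfer(1, 2, 200.0, fail_midway=True)
print(balances())  # {1: 500.0, 2: 100.0}: the partial debit was undone
transfer(1, 2, 200.0)
print(balances())  # {1: 300.0, 2: 300.0}
```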
DISADVANTAGES OF DBMS
1. Cost of Hardware & Software
A processor with high-speed data processing and a large memory is required to run
the DBMS software. It means that you have to upgrade the hardware used for the file-based
system. Similarly, DBMS software is also very costly.
2. Cost of Data Conversion
When a computer file-based system is replaced with a database system, the data stored in
data files must be converted to database files. Converting the data of data files into a database
is a very difficult and costly process. You have to hire database and system designers along with
application programmers. Alternatively, you have to take the services of some software
house. So a lot of money has to be paid for developing software.
3. Cost of Staff Training
Most DBMSs are complex systems, so users must be trained to use the DBMS.
Training is required at all levels, including programming, application development,
and database administration. The organization has to pay a lot for the training
of staff to run the DBMS.
4. Appointing Technical Staff
Trained technical persons such as database administrators, application programmers, data
entry operators, etc. are required to handle the DBMS. You have to pay handsome salaries to
these persons. Therefore, the system cost increases.
5. Database Damage
In most organizations, all data is integrated into a single database. If the database is
damaged due to electric failure, or is corrupted on the storage media, then your
valuable data may be lost forever.
HISTORY OF DATABASE SYSTEMS
Data are raw facts that constitute the building blocks of information. A database is
a collection of information and a means to manipulate data in a useful way; it
must provide proper storage for large amounts of data, easy and fast access, and
facilitate the processing of data. A Database Management System (DBMS) is a set of
software that is used to define, store, manipulate and control the data in a database.
From pre-stage flat-file systems to relational and object-relational systems, database
technology has gone through several generations in its 40 years of history.
THE EVOLUTION OF THE DATABASE
Ancient History: Data are not stored on disk; the programmer defines both the
logical data structure and the physical structure, such as storage structure, access
methods, I/O modes, etc. One data set per program: high data redundancy. There is no
persistence; random access memory (RAM) is expensive and limited; programmer
productivity is low.
1968 File-Based: predecessor of database. Data maintained in a flat file. Processing
characteristics determined by common use of magnetic tape medium.
Data are stored in files with an interface between programs and files. Mapping happens
between logical files and physical files; one file corresponds to one or several
programs.
Various access methods exist, e.g., sequential, indexed, random.
Requires extensive programming in a third-generation language such as COBOL.
1968-1980 Era of non-relational database: A database provides an integrated and
structured collection of stored operational data which can be used or shared by
application systems. The prominent hierarchical database model was IBM's first DBMS,
called IMS. The prominent network database model was the CODASYL DBTG model; IDMS
was the most popular network DBMS.
Hierarchical data model
Mid 1960s: Rockwell partnered with IBM to create the Information Management System
(IMS); IMS DB/DC led the mainframe database market in the 70's and early 80's.
Based on tree structures: each child record has exactly one parent.
o Difficult to manage and lacks standards; e.g., problems adding empty nodes, and it
can't easily handle many-to-many relationships.
Two major projects started, and both were operational in the late 1970s:
o INGRES at the University of California, Berkeley became commercial and was
followed up by POSTGRES, which was incorporated into Informix.
o System R at IBM San Jose.
o Query processor translates statements in a query language into low-level
instructions the database manager understands. (May also attempt to find an
equivalent but more efficient form.)
o DML precompiler converts DML statements embedded in an application
program to normal procedure calls in a host language. The precompiler
interacts with the query processor.
o DDL compiler converts DDL statements to a set of tables containing
metadata stored in a data dictionary.
In addition, several data structures are required for physical system implementation:
o Data files: store the database itself.
o Data dictionary: stores information about the structure of the database. It is
used heavily. Great emphasis should be placed on developing a good design
and efficient implementation of the dictionary.
o Indices: provide fast access to data items holding particular values.
The figure shows these components.
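The DDL compiler and data dictionary described above can be observed directly in SQLite, which keeps its catalog in a built-in table named `sqlite_master` and reports column metadata through `PRAGMA table_info`. The sketch below, run through Python's built-in sqlite3 module with a hypothetical `deposit` table, shows DDL statements populating that catalog; other DBMSs expose their dictionaries under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statements: the DBMS records each definition in its data dictionary.
conn.execute("CREATE TABLE deposit (bname TEXT, account_no TEXT, balance REAL)")
conn.execute("CREATE INDEX deposit_bname_idx ON deposit (bname)")

# The catalog can be queried like an ordinary table.
for obj_type, name in conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"):
    print(obj_type, name)
# index deposit_bname_idx
# table deposit

# Column-level metadata: each row is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(deposit)")]
print(columns)  # ['bname', 'account_no', 'balance']
```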
Figure: Database system structure.
Codd's Rules
6. Comprehensive Data Sublanguage Rule
A relational system may support several languages and various modes of terminal
use. However, there must be at least one language whose statements are expressible,
per some well-defined syntax, as character strings and whose ability to support all of
the following is comprehensive:
a. data definition
b. view definition
c. data manipulation (interactive and by program)
d. integrity constraints
e. authorization
f. transaction boundaries (begin, commit, and rollback).
7. View Updating Rule
All views that are theoretically updatable are also updatable by the system.
8. High-level Insert, Update and Delete
The capability of handling a base relation or a derived relation as a single operand
applies not only to the retrieval of data but also to the insertion, update, and deletion
of data.
9. Physical Data Independence
Application programs and terminal activities remain logically unimpaired whenever
any changes are made in either storage representation or access methods.
10. Logical Data Independence
Application programs and terminal activities remain logically unimpaired when
information preserving changes of any kind that theoretically permit unimpairment
are made to the base tables.
11. Integrity Independence
Integrity constraints specific to a particular relational database must be definable in
the relational data sublanguage and storable in the catalog, not in the application
programs.
12. Distribution Independence
The data manipulation sublanguage of a relational DBMS must enable application
programs and terminal activities to remain logically unimpaired whether and
whenever data are physically centralized or distributed.
13. Nonsubversion Rule
If a relational system has or supports a low-level (single-record-at-a-time) language,
that low-level language cannot be used to subvert or bypass the integrity rules or
constraints expressed in the higher-level (multiple-records-at-a-time) relational
language.
File structures and indexing
File Organization
1. A file is organized logically as a sequence of records.
2. Records are mapped onto disk blocks.
3. Files are provided as a basic construct in operating systems, so we assume the
existence of an underlying file system.
4. Blocks are of a fixed size determined by the operating system.
5. Record sizes vary.
6. In a relational database, tuples of distinct relations may be of different sizes.
7. One approach to mapping a database to files is to store records of one length in a given
file.
8. An alternative is to structure files to accommodate variable-length records. (Fixed-length
records are easier to implement.)
Fixed-Length Records
1. Consider a file of deposit records of the form:
    type deposit = record
        bname : char(22);
        account# : char(10);
        balance : real;
    end
o If we assume that each character occupies one byte, an integer occupies 4 bytes,
and a real 8 bytes, our deposit record is 22 + 10 + 8 = 40 bytes long.
o The simplest approach is to use the first 40 bytes for the first record, the next 40
bytes for the second, and so on.
o However, there are two problems with this approach:
a. It is difficult to delete a record from this structure. The space it occupied must
somehow be reclaimed, or we need to mark deleted records so that they can be ignored.
b. Unless the block size is a multiple of 40, some records will cross block boundaries.
It would then require two block accesses to read or write such a record.
2. When a record is deleted, we could move all successive records up one (Figure 10.7),
which may require moving a lot of records.
o We could instead move the last record into the "hole" created by the deleted
record (Figure 10.8).
o This changes the order the records are in.
o It turns out to be undesirable to move records to occupy freed space, as moving
requires block accesses.
o Also, insertions tend to be more frequent than deletions.
o It is acceptable to leave the space open and wait for a subsequent insertion.
o This leads to a need for additional structure in our file design.
3. So one solution is:
o At the beginning of a file, allocate some bytes as a file header.
o This header for now need only be used to store the address of the first record
whose contents are deleted.
o This first record can then store the address of the second available record, and so
on (Figure 10.9).
o To insert a new record, we use the record pointed to by the header, and change
the header pointer to the next available record.
o If no deleted records exist, we add our new record to the end of the file.
4. Note: Use of pointers requires careful programming. If a record pointed to is moved or
deleted, and that pointer is not corrected, the pointer becomes a dangling pointer.
Records pointed to are called pinned.
5. Fixed-length file insertions and deletions are relatively simple because "one size fits
all". For variable length, this is not the case.
File Operations
Consider four basic file operations:
Operation   Similar SQL Statement
Find        Select
Insert      Insert
Modify      Update
Delete      Delete
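Before turning to the cost of each operation, the fixed-length deposit layout and the free-list scheme described earlier can be sketched with Python's standard `struct` module. This is a minimal sketch under stated assumptions: the file is an in-memory `bytearray`, `bname` and `account#` are stored as 22 and 10 ASCII bytes, `balance` as an 8-byte float (40 bytes in all), and a deleted slot holds the number of the next free slot in its first 4 bytes. The class name, the 4-byte header and the `-1` end-of-list marker are illustrative choices, not part of the notes. The class provides the Insert and Delete operations from the table above, plus a Find by slot number.

```python
import struct

# Assumed layout: bname char(22), account# char(10), balance real (8-byte float).
# "<" selects standard sizes with no alignment padding.
RECORD = struct.Struct("<22s10sd")
HEADER = struct.Struct("<i")  # file header: slot number of the first free record
NIL = -1                      # marks the end of the free list


class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length deposit records."""

    def __init__(self):
        self.data = bytearray(HEADER.pack(NIL))  # empty file: header only

    def _offset(self, slot):
        return HEADER.size + slot * RECORD.size

    def insert(self, bname, account, balance):
        """Reuse the slot at the head of the free list, else append at the end."""
        record = RECORD.pack(bname.encode("ascii"), account.encode("ascii"), balance)
        (free,) = HEADER.unpack_from(self.data, 0)
        if free == NIL:
            slot = (len(self.data) - HEADER.size) // RECORD.size
            self.data += record
        else:
            slot = free
            # A free slot stores the number of the next free slot; move it to the header.
            (next_free,) = HEADER.unpack_from(self.data, self._offset(slot))
            HEADER.pack_into(self.data, 0, next_free)
            start = self._offset(slot)
            self.data[start:start + RECORD.size] = record
        return slot

    def delete(self, slot):
        """Link the slot into the free list; no other record is moved."""
        (free,) = HEADER.unpack_from(self.data, 0)
        HEADER.pack_into(self.data, self._offset(slot), free)
        HEADER.pack_into(self.data, 0, slot)

    def read(self, slot):
        bname, account, balance = RECORD.unpack_from(self.data, self._offset(slot))
        return (bname.rstrip(b"\0").decode("ascii"),
                account.rstrip(b"\0").decode("ascii"), balance)


print(RECORD.size)                            # 40 = 22 + 10 + 8
f = FixedLengthFile()
f.insert("Perryridge", "A-102", 400.0)        # slot 0
b = f.insert("Round Hill", "A-305", 350.0)    # slot 1
f.insert("Mianus", "A-215", 700.0)            # slot 2
f.delete(b)
print(f.insert("Downtown", "A-101", 500.0))   # 1: the freed slot is reused
print(len(f.data))                            # 124 = 4-byte header + 3 * 40
```

Because `delete` only links the slot into the free list, a full scan of the file would also need a per-record deleted flag to skip freed slots, as the notes point out.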
Unordered file: A new record is inserted at the end of the file.
o Insert takes constant time.
o Select, Update and Delete take O(n) time
(n is the number of records).
Ordered file: A new record is inserted in order in the file.
o Insert takes O(log n) plus the time to reorganize records.
o Select, Update, Delete take at least O(log n).
Indexed file: A new record is inserted at the end of the file.
o An index is maintained that points to the location on disk where the record is found.
o Insert takes constant time for the data itself plus O(log n) for the index.
o Select, Update, Delete take an O(log n) lookup on the index followed by constant
time to access the data record.
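The three organizations can be compared with a small in-memory sketch in Python. It is illustrative only: real files are measured in block accesses rather than comparisons, and a sorted Python list stands in for the on-disk index (a B+ tree in practice), so inserting into it still shifts entries. The class names are hypothetical.

```python
import bisect


class UnorderedFile:
    """New records go at the end; lookups scan every record."""

    def __init__(self):
        self.records = []

    def insert(self, key, value):      # constant time: append
        self.records.append((key, value))

    def find(self, key):               # O(n): linear scan
        for k, v in self.records:
            if k == key:
                return v
        return None


class OrderedFile:
    """Records kept sorted by key; lookups use binary search."""

    def __init__(self):
        self.keys, self.values = [], []

    def insert(self, key, value):      # O(log n) to find the slot, then shift later records
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.values.insert(i, value)

    def find(self, key):               # O(log n)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


class IndexedFile:
    """Records appended in arrival order; a sorted index maps key -> position."""

    def __init__(self):
        self.records = []              # the data file
        self.index = []                # sorted (key, position) pairs

    def insert(self, key, value):      # append the record, then add an index entry
        self.records.append((key, value))
        bisect.insort(self.index, (key, len(self.records) - 1))

    def find(self, key):               # O(log n) index search + one direct access
        i = bisect.bisect_left(self.index, (key,))
        if i < len(self.index) and self.index[i][0] == key:
            return self.records[self.index[i][1]][1]
        return None


for f in (UnorderedFile(), OrderedFile(), IndexedFile()):
    for acct, bal in [("A-305", 350), ("A-101", 500), ("A-215", 700)]:
        f.insert(acct, bal)
    print(type(f).__name__, f.find("A-215"))   # each line ends with 700
```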
IND(IN'
J Mec+anism for efficiently locating ro!EsF !it+out +aving to scan entire table
o Based on a search key: rows having a particular value for the search key attributes can be quickly located
o Don't confuse candidate key with search key:
- Candidate key: set of attributes; guarantees uniqueness
- Search key: sequence of attributes; does not guarantee uniqueness, just used for search
Index Structure
o Index structure contains:
- Index entries: can contain the data tuple itself (index and table are integrated in this case), or a search key value and a pointer to a row having that value (table stored separately in this case: an unintegrated index)
- Location mechanism: algorithm + data structure for locating an index entry with a given search key value
- Index entries are stored in accordance with the search key value: entries with the same search key value are stored together (hash, B+ tree); entries may be sorted on search key value (B+ tree)
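As an illustration of an unintegrated index whose entries are grouped by search key value, the sketch below uses a Python `dict` (a hash table) that maps each search key value to a list of pointers, modeled as row positions in a separately stored table. The table contents, the `dept` attribute and the function names are hypothetical; note that the search key `dept` is not unique.

```python
from collections import defaultdict

# Hypothetical table stored separately from the index; a row's "pointer"
# is simply its position in this list.
table = [
    {"emp": "Ann", "dept": "CS"},
    {"emp": "Raj", "dept": "EE"},
    {"emp": "Lee", "dept": "CS"},
    {"emp": "Kim", "dept": "ME"},
]


def build_index(rows, attr):
    """Unintegrated index: search key value -> pointers to matching rows.
    Entries with the same search key value end up stored together."""
    index = defaultdict(list)
    for pointer, row in enumerate(rows):
        index[row[attr]].append(pointer)
    return index


def lookup(index, rows, value):
    """Location mechanism: hash to the entry, then follow each pointer."""
    return [rows[p] for p in index.get(value, [])]


dept_index = build_index(table, "dept")
print([r["emp"] for r in lookup(dept_index, table, "CS")])
```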
Types of Indexing
An index is made up of two components: a key and a pointer.
The key is typically the key value for the relation and is mainly used to identify and look up records.
The pointer is an address on disk where the rest of the data in the record can be found.
Two types of indexes are discussed here: ordered index and hashing.
Ordered Index
Records are stored as they are inserted.
The key attribute is stored in order in the index.
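A minimal sketch of an ordered index, assuming an in-memory model: records are appended to a list in insertion order (their list position serves as the pointer), and the index is a list of `(key, pointer)` pairs kept sorted with Python's `bisect` module. The class name and sample keys are hypothetical.

```python
import bisect


class OrderedIndex:
    """Hypothetical ordered index over records stored in insertion order."""

    def __init__(self):
        self.records = []  # data file: records in the order they were inserted
        self.entries = []  # index: (key, pointer) pairs sorted by key

    def insert(self, key, record):
        self.records.append(record)      # record goes at the end of the file
        pointer = len(self.records) - 1  # its address in the (modeled) file
        bisect.insort(self.entries, (key, pointer))
        return pointer

    def find(self, key):
        # (key,) sorts before every (key, pointer) pair, so this locates
        # the first index entry with this key.
        i = bisect.bisect_left(self.entries, (key,))
        result = []
        while i < len(self.entries) and self.entries[i][0] == key:
            result.append(self.records[self.entries[i][1]])
            i += 1
        return result


idx = OrderedIndex()
for key, balance in [("A-305", 350), ("A-101", 500), ("A-215", 700)]:
    idx.insert(key, (key, balance))
print(idx.find("A-215"))
```

The records themselves stay in arrival order; only the small index entries are kept sorted.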
Storage Structure
o Structure of file containing a table
- Heap file (no index, not integrated)
- Sorted file (no index, not integrated)
- Integrated file containing index and rows (index entries contain rows in this case): ISAM, B+ tree, Hash
[Figure: Index file with separate storage structure]
Clustered Index
o Clustered index: index entries and rows are ordered in the same way
- An integrated storage structure is always clustered (since rows and index entries are the same)
- The particular index structure (e.g., hash, tree) dictates how the rows are organized in the storage structure
o There can be at most one clustered index on a table
- CREATE TABLE generally creates an integrated, clustered (main) index on the primary key
o Good for range searches when a range of search key values is requested
- Use the location mechanism to locate the index entry at the start of the range; this locates the first row
- Subsequent rows are stored in successive locations if the index is clustered (not so if unclustered)
- Minimizes page transfers and maximizes the likelihood of cache hits
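The page-transfer argument can be illustrated with a small simulation. Assumptions (all hypothetical): 100 rows with search key values 0 to 99, 10 rows per page, a buffer that holds only the most recently read page, and an unclustered layout in which the row with key k sits at position (7k mod 100). The scan visits rows in search-key order, as an index scan would.

```python
ROWS = 100
PAGE_SIZE = 10  # rows per page (assumed)


def clustered_position(key):
    # Rows are stored in search-key order.
    return key


def unclustered_position(key):
    # Hypothetical layout unrelated to search-key order (7 and 100 are
    # coprime, so every key gets a distinct position).
    return (key * 7) % ROWS


def range_scan_transfers(lo, hi, position_of):
    """Page transfers for fetching rows with keys lo..hi in key order,
    assuming a one-page buffer (re-reading an earlier page counts again)."""
    transfers, current_page = 0, None
    for key in range(lo, hi + 1):
        page = position_of(key) // PAGE_SIZE
        if page != current_page:
            transfers += 1
            current_page = page
    return transfers


print(range_scan_transfers(20, 49, clustered_position),
      range_scan_transfers(20, 49, unclustered_position))
```

With the clustered layout, the 30 qualifying rows occupy 3 consecutive pages; with the unclustered layout, the scan keeps jumping between pages and re-reads many of them.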
[Figure: Clustered main index]
[Figure: Clustered secondary index]
Unclustered Index
o Unclustered (secondary) index: index entries and rows are not ordered in the same way
o A secondary index might be clustered or unclustered with respect to the storage structure it references
- It is generally unclustered (since the organization of rows in the storage structure depends on the main index)
- There can be many secondary indices on a table
- An index created by CREATE INDEX is generally an unclustered, secondary index
[Figure: Unclustered secondary index]
Sparse vs. Dense Index
o Dense index: has an index entry for each data record
- An unclustered index must be dense
- A clustered index need not be dense
o Sparse index: has an index entry for each page of the data file
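The sketch below contrasts the two on a data file that is already sorted on the search key, which is why a sparse index suffices (as with a clustered index). Assumptions: keys are unique integers, pages are Python lists, the sparse index stores the smallest key on each page, and all names are hypothetical.

```python
import bisect


def make_pages(sorted_keys, page_size):
    """Split a sorted data file into pages (assumption: unique keys)."""
    return [sorted_keys[i:i + page_size] for i in range(0, len(sorted_keys), page_size)]


def build_dense_index(pages):
    # One entry per data record: (key, page number).
    return [(key, p) for p, page in enumerate(pages) for key in page]


def build_sparse_index(pages):
    # One entry per page: the smallest search key value stored on it.
    return [page[0] for page in pages]


def sparse_lookup(sparse, pages, key):
    """Return the page number holding key, or None if key is absent."""
    p = bisect.bisect_right(sparse, key) - 1  # last page whose first key <= key
    if p < 0:
        return None
    return p if key in pages[p] else None


pages = make_pages(list(range(0, 100, 2)), page_size=5)  # 50 keys on 10 pages
dense, sparse = build_dense_index(pages), build_sparse_index(pages)
print(len(dense), len(sparse), sparse_lookup(sparse, pages, 38))
```

The sparse index is five times smaller here, at the cost of scanning one data page after the index lookup.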
Multiple Attribute Search Key
o CREATE INDEX Inx ON Tbl (Att1, Att2)
o Search key is a sequence of attributes; index entries are lexically ordered
o Supports finer granularity equality search:
- Find row with value (A1, A2)
o Supports range search (tree index only):
- Find rows with values between (A1, A2) and (A1', A2')
o Supports partial key searches (tree index only):
- Find rows with values of Att1 between A1 and A1'
- But not: find rows with values of Att2 between A2 and A2'
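Python tuples compare lexicographically, which makes them a convenient stand-in for a composite search key. In the sketch below (hypothetical data; entries are `(Att1, Att2, row_id)` tuples kept sorted, as in a tree index), range and Att1-prefix searches use binary search on the sorted entries, while a search on Att2 alone has to examine every entry.

```python
import bisect

# Hypothetical index entries for CREATE INDEX Inx ON Tbl (Att1, Att2),
# stored as (Att1, Att2, row_id) in lexicographic order.
entries = sorted([(3, "b", 0), (1, "z", 1), (2, "a", 2),
                  (1, "c", 3), (3, "a", 4), (2, "k", 5)])


def range_search(low, high):
    """Row ids with (Att1, Att2) between low and high, inclusive.
    An equality search is range_search(v, v)."""
    i = bisect.bisect_left(entries, low)
    out = []
    while i < len(entries) and entries[i][:2] <= high:
        out.append(entries[i][2])
        i += 1
    return out


def att1_range(a1_low, a1_high):
    """Partial key search on the leading attribute: still a contiguous range."""
    i = bisect.bisect_left(entries, (a1_low,))
    out = []
    while i < len(entries) and entries[i][0] <= a1_high:
        out.append(entries[i][2])
        i += 1
    return out


def att2_range(a2_low, a2_high):
    """Att2 alone is not ordered across the index, so every entry is examined."""
    return [rid for _, a2, rid in entries if a2_low <= a2 <= a2_high]


print(range_search((1, "d"), (2, "k")), att1_range(2, 3), att2_range("a", "b"))
```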
Locating an Index Entry
o Use binary search (index entries are sorted)
o If there are Q pages of index entries, then log2 Q page transfers (which is a big improvement over binary search of the data pages of an F-page data file, since F >> Q)
o Use a multilevel index: a sparse index on the sorted list of index entries
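Binary search over Q pages of sorted index entries needs at most floor(log2 Q) + 1 page reads. A sketch, assuming each page is a Python list of sorted keys and that every read of `pages[mid]` stands for one page transfer; the sizes below (10 entries per page, 1,000 index pages versus a hypothetical 100,000-page data file) are illustrative only.

```python
def find_page(pages, key):
    """Binary search over pages of sorted index entries.
    Returns (page number whose key range covers key, or None; page reads)."""
    lo, hi, reads = 0, len(pages) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]  # assumption: one page transfer per page examined
        reads += 1
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            return mid, reads
    return None, reads


def worst_case_reads(num_pages):
    # floor(log2 Q) + 1 for Q >= 1
    return num_pages.bit_length()


Q = 1000
pages = [list(range(p * 10, p * 10 + 10)) for p in range(Q)]
max_reads = max(find_page(pages, k)[1] for k in range(0, Q * 10, 7))
print(max_reads, worst_case_reads(Q), worst_case_reads(100_000))
```

Searching the 1,000 index pages costs at most 10 reads, against up to 17 for a binary search over 100,000 data pages.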
Two-Level Index
o Separator level is a sparse index over pages of index entries
o Leaf level contains index entries
- Cost of searching the separator level << cost of searching the index level, since the separator level is sparse
- Cost of retrieving the row once the index entry is found is 0 (if integrated) or 1 (if not)
Multilevel Inde-
- Search cost = number of levels in tree
- If Φ is the fanout of a separator page, cost is log_Φ Q + 1
- Example: if Φ = 100 and Q = 10,000, cost = 3 (reduced to 2 if the root is kept in main memory)
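The cost formula can be checked by counting levels directly. A sketch, assuming Q pages of index entries at the leaf level and a fanout of Φ separator entries per page; each level up needs ceil(pages / Φ) pages until a single root page remains, and every level costs one page transfer. The function name is hypothetical.

```python
def multilevel_search_cost(leaf_pages, fanout, root_in_memory=False):
    """Page transfers to reach an index entry: one per level of the tree.
    Assumption: fanout = number of separator entries per page (Φ)."""
    levels, pages = 1, leaf_pages  # start with the leaf level
    while pages > 1:
        pages = -(-pages // fanout)  # ceiling division: pages in the level above
        levels += 1
    return levels - 1 if root_in_memory else levels


print(multilevel_search_cost(10_000, 100),
      multilevel_search_cost(10_000, 100, root_in_memory=True))
```

For Φ = 100 and Q = 10,000 this gives 3 transfers, or 2 with the root kept in main memory.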
Index Sequential Access Method (ISAM)
o Generally an integrated storage structure
- Clustered; index entries contain rows
o Separator entry = (ki, pi); ki is a search key value; pi is a pointer to a lower-level page
o ki separates the set of search key values in the two subtrees pointed at by pi-1 and pi
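Following separator entries from the root down to a leaf can be sketched as below. Assumptions: a separator page holds keys k1 < ... < kn and pointers p0 ... pn; a search key equal to ki is routed to pi (so pi-1 covers values below ki and pi covers values from ki up); leaf pages are lists of rows whose first field is the search key (integrated storage). The class and function names and the sample rows are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class SeparatorPage:
    keys: List[int]  # k1 < k2 < ... < kn
    children: list   # p0, p1, ..., pn (one more pointer than keys)


def child_index(page, key):
    """i such that k_i <= key < k_(i+1); 0 means key < k1 (follow p0).
    Assumption: a key equal to k_i goes to p_i."""
    return bisect.bisect_right(page.keys, key)


def isam_search(root, key):
    node = root
    while isinstance(node, SeparatorPage):
        node = node.children[child_index(node, key)]
    # Leaf page reached: rows are stored in the index entries themselves.
    return [row for row in node if row[0] == key]


leaves = [[(5, "a"), (8, "b")], [(12, "c"), (15, "d")], [(20, "e"), (27, "f")]]
root = SeparatorPage(keys=[12, 20], children=leaves)
print(isam_search(root, 15))
```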
[Figure: Index Sequential Access Method]
Index Sequential Access Method
o The index is static:
- Once the separator levels have been constructed, they never change
- Number and position of leaf pages in the file stay fixed
o Good for equality and range searches
-