Database Management Systems UNIT - 1


  • 8/14/2019 Database Management Systems UNIT - 1


    UNIT I

    INTRODUCTION

    Database Management Systems

    A computer database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships (see below for an explanation of the various database models).

    A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.

    1. A database management system (DBMS), or simply a database system (DBS), consists of

       o A collection of interrelated and persistent data (usually referred to as the database (DB)).
       o A set of application programs used to access, update and manage that data (which form the data management system (DMS)).

    2. The goal of a DBMS is to provide an environment that is both convenient and efficient to use in

       o Retrieving information from the database.
       o Storing information into the database.

    3. Databases are usually designed to manage large bodies of information. This involves

       o Definition of structures for information storage (data modeling).
       o Provision of mechanisms for the manipulation of information (file and systems structure, query processing).


       o Providing for the safety of information in the database (crash recovery and security).
       o Concurrency control if the system is shared by users.

    Purpose of Database Systems

    1. To see why database management systems are necessary, consider a typical "file-processing system" supported by a conventional operating system.

       The application is a savings bank:

       o Savings account and customer records are kept in permanent system files.
       o Application programs are written to manipulate files to perform the following tasks:
            Debit or credit an account.
            Add a new account.
            Find an account balance.
            Generate monthly statements.

    2. Development of the system proceeds as follows:

       o New application programs must be written as the need arises.
       o New permanent files are created as required.
       o But over a long period of time files may be in different formats, and
       o Application programs may be in different languages.

    3. So we can see there are problems with the straight file-processing approach:

       o Data redundancy and inconsistency
            Same information may be duplicated in several places.
            All copies may not be updated properly.
       o Difficulty in accessing data
            May have to write a new application program to satisfy an unusual request.
            E.g. find all customers with the same postal code.
            Could generate this data manually, but a long job...
       o Data isolation
            Data in different files.


            Data in different formats.
            Difficult to write new application programs.
       o Multiple users
            Want concurrency for faster response time.
            Need protection for concurrent updates.
            E.g. two customers withdrawing funds from the same account at the same time: the account has $500 in it, and they withdraw $100 and $50. The result could be $350, $400 or $450 if no protection.
       o Security problems
            Every user of the system should be able to access only the data they are permitted to see.
            E.g. payroll people only handle employee records, and cannot see customer accounts; tellers only access account data and cannot see payroll data.
            Difficult to enforce this with application programs.
       o Integrity problems
            Data may be required to satisfy constraints.
            E.g. no account balance below $25.00.
            Again, difficult to enforce or to change constraints with the file-processing approach.

    These problems and others led to the development of database management systems.
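The concurrent-withdrawal example above can be replayed in a few lines. The sketch below is only an illustration, not part of the original notes: it assumes each withdrawal is a separate "read balance" step followed by a "write balance" step, and the names `run`, `steps` and `outcomes` are hypothetical. It enumerates every interleaving of the two unprotected withdrawals and collects the final balances.

```python
from itertools import permutations

def run(schedule, balance=500):
    """Replay one interleaving of two unprotected withdrawals."""
    amounts = {"A": 100, "B": 50}
    local = {}                       # each customer's private copy of the balance
    for who, step in schedule:
        if step == "read":
            local[who] = balance     # read the shared balance
        else:
            # write back a value computed from a possibly stale copy
            balance = local[who] - amounts[who]
    return balance

steps = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

outcomes = set()
for order in permutations(steps):
    a_ok = order.index(("A", "read")) < order.index(("A", "write"))
    b_ok = order.index(("B", "read")) < order.index(("B", "write"))
    if a_ok and b_ok:                # each customer must read before writing
        outcomes.add(run(order))

print(sorted(outcomes))
```

Only the schedules in which one withdrawal finishes before the other starts leave the correct balance; the other interleavings lose one of the updates, which is the kind of anomaly concurrency control is meant to prevent.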

    Data Abstraction

    1. The major purpose of a database system is to provide users with an abstract view of the system.
       The system hides certain details of how data is stored and created and maintained.
       Complexity should be hidden from database users.

    2. There are several levels of abstraction:

       1. Physical


    Features of the physical data model include:

       Specification of all tables and columns.
       Foreign keys are used to identify relationships between tables.
       Denormalization may occur based on user requirements.
       Physical considerations may cause the physical data model to be quite different from the logical data model.

    At this level, the data modeler will specify how the logical data model will be realized in the database schema.

    The steps for physical data model design are as follows:

    1. Convert entities into tables.
    2. Convert relationships into foreign keys.
    3. Convert attributes into columns.
    4. Modify the physical data model based on physical constraints / requirements.

       How the data are stored.
       E.g. index, B-tree, hashing.
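The four design steps above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the `customer` and `account` tables, their columns, and the index name are hypothetical and chosen only to echo the savings-bank example; SQLite stands in for whatever DBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (                    -- step 1: entity -> table
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL              -- step 3: attribute -> column
);
CREATE TABLE account (
    account_no  TEXT PRIMARY KEY,
    balance     REAL NOT NULL,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id)   -- step 2: relationship -> foreign key
);
-- step 4: a physical decision that is invisible in the logical model
CREATE INDEX idx_account_customer ON account(customer_id);
""")

# Ask the DBMS what it recorded about the physical schema.
foreign_keys = conn.execute("PRAGMA foreign_key_list(account)").fetchall()
index_names = [row[1] for row in conn.execute("PRAGMA index_list(account)")]
print(foreign_keys[0][2:5], index_names)
```

The index in step 4 changes how the data are stored and searched, but not which tables and columns exist, which is why the physical model can differ from the logical one.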


       2. Conceptual (logical)

    The steps for designing the logical data model are as follows:

    1. Identify all entities.
    2. Specify primary keys for all entities.
    3. Find the relationships between different entities.
    4. Find all attributes for each entity.
    5. Resolve many-to-many relationships.
    6. Normalization.

       Next highest level of abstraction.
       Describes what data are stored.
       Describes the relationships among data.
       Database administrator level.

       3. View

       Highest level.
       Describes part of the database for a particular group of users.
       Can be many different views of a database.
       E.g. tellers in a bank get a view of customer accounts, but not of payroll data.
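A view in SQL is one concrete way to realize the view level described above. The sketch below is an assumption-laden illustration using Python's `sqlite3` module: the tables `employee` and `account`, the view name `teller_view`, and the sample rows are all hypothetical. SQLite itself has no user accounts, so the view here only shows what a teller would see; a multi-user DBMS would pair it with access privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL);
CREATE TABLE account  (account_no TEXT PRIMARY KEY, owner TEXT, balance REAL);
INSERT INTO employee VALUES (1, 'Asha', 5000.0);
INSERT INTO account  VALUES ('A-101', 'Ravi', 500.0);

-- The teller's view exposes account data only; payroll data is not part of it.
CREATE VIEW teller_view AS
    SELECT account_no, owner, balance FROM account;
""")

cursor = conn.execute("SELECT * FROM teller_view")
columns = [d[0] for d in cursor.description]   # column names visible through the view
rows = cursor.fetchall()
print(columns, rows)
```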

    Fig. 1.1 illustrates the three levels.

    Figure 1.1: The three levels of data abstraction

    LOGICAL DATA INDEPENDENCE AND PHYSICAL DATA INDEPENDENCE

    Logical Data Independence:

    Conceptual E-R Model

    A conceptual entity-relationship model shows how the business world sees information. It suppresses non-critical details in order to emphasize business rules and user objects. It typically includes only significant entities which have business meaning, along with their relationships. Many-to-many relationships are acceptable to represent entity associations.

    A conceptual model might discover that there is a need to house information about each person in an organization. While considerable thought is given to discovering and describing the relevant properties of each person, the designers accept implicitly that each person is distinct and unique.

    A conceptual model may include a few significant attributes to augment the definition and visualization of entities. No effort need be made to inventory the full attribute population of such a model. A conceptual model may have some identifying concepts or candidate keys noted but it explicitly does not include a complete scheme of identity, since identifiers are logical choices made from a deeper context.

    In Summary

    The conceptual model is concerned with the real world view and understanding of data; the logical model is a generalized formal structure in the rules of information science; the physical model specifies how this will be executed in a particular DBMS instance.

    Various data modeling methodologies and products provide these layers of abstraction in different ways. Some address only the physical implementation; some model only the logical structure; others may provide elements of all three but not necessarily in three separate views. In each case it helps the data modeler to understand the level of abstraction to which a particular feature or task belongs.

    Data Storage Characteristics

    For a significant amount of data, we require persistent, inexpensive, reliable and sharable storage methods with relatively rapid access time.

    Persistent: Data persists (lives on) after power is removed.
    Inexpensive: Typically measured on a $ per Megabyte basis.
    Reliable: Should not have to be replaced due to excessive errors.
    Sharable: Should facilitate sharing of data among many users.
    Access time: Data should be accessible in a relatively short period of time.

    Advantages

    The advantages of database management systems can be enumerated as follows:

    Warehouse of Information

    Database management systems are warehouses of information, where large amounts of data can be stored. Common examples in commercial applications are inventory data, personnel data, etc. People often use a database management system without even realizing it. The best examples are the address book of a cell phone, digital diaries, etc. Both of these devices store data in their internal database.

    Defining Attributes

    The unique data field in a table is assigned a primary key. The primary key helps in the identification of data. It also checks for duplicates within the same table, thereby reducing data redundancy. Some tables also have a foreign key in addition to the primary key (this text also calls it a secondary key). The foreign key refers to the primary key of another table, thus establishing a relationship between the two tables.
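How a DBMS uses these keys can be seen in a small experiment. The sketch below uses Python's `sqlite3` module; the `department` and `employee` tables, the sample rows, and the helper `rejected` are hypothetical names introduced for illustration. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES department(dept_id))""")
conn.execute("INSERT INTO department VALUES (10, 'Payroll')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")

def rejected(sql):
    """Return True if the DBMS refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Same primary key twice: refused as a duplicate.
duplicate_key = rejected("INSERT INTO employee VALUES (1, 'Ravi', 10)")
# Foreign key pointing at a department that does not exist: refused.
dangling_ref = rejected("INSERT INTO employee VALUES (2, 'Ravi', 99)")
print(duplicate_key, dangling_ref)
```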

    Systematic Storage

    The data is stored in the form of tables. The tables consist of rows and columns. The primary and secondary keys help to eliminate data redundancy, enabling systematic storage of data.

    Changes to Schema

    The table schema can be changed and it is not platform dependent. Therefore, the tables in the system can be edited to add new columns and rows without hampering the applications that depend on that particular database.

    No Language Dependence

    Database management systems are not language dependent. Therefore, they can be used with various languages and on various platforms.

    Table Joins

    The data in two or more tables can be combined into a single result with a join. Because related data need not be repeated in every table, this helps to reduce the size of the database and makes retrieval easier.

    Multiple Simultaneous Usage

    The database can be used simultaneously by a number of users. Various users can retrieve the same data simultaneously. The data in the database can also be modified, based on the privileges assigned to users.

    Data Security

    Data is the most important asset. Therefore, there is a need for data security. Database management systems help to keep the data secured.

    Privileges

    Different privileges can be given to different users. For example, some users can edit the database, but are not allowed to delete the contents of the database.

    Abstract View of Data and Easy Retrieval

    A DBMS enables easy and convenient retrieval of data. A database user sees only an abstract form of the data; the complexities of the internal structure of the database are hidden from the user. The data fetched is in a user-friendly format.

    Data Consistency

    Data consistency ensures a consistent view of data to every user. It includes the accuracy, validity and integrity of related data. The data in the database must satisfy certain consistency constraints; for example, the age of a candidate appearing for an exam should be of number datatype and in the range of 20-25. When the database is updated, these constraints are checked by the database system.
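A consistency constraint of this kind can be declared once and left to the DBMS to check on every update. The following sketch uses Python's `sqlite3` module; the `candidate` table, the helper `try_insert`, and the bounds 20 and 25 are assumptions made for illustration, not values confirmed by the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE candidate (
    name TEXT NOT NULL,
    -- age must be stored as an integer and lie in the assumed range
    age  INTEGER NOT NULL
         CHECK (typeof(age) = 'integer' AND age BETWEEN 20 AND 25))""")

def try_insert(name, age):
    """Return True if the row satisfies the constraint and is stored."""
    try:
        conn.execute("INSERT INTO candidate VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("Asha", 22)         # inside the range
out_of_range = try_insert("Ravi", 40)     # number, but outside the range
not_a_number = try_insert("Mina", "abc")  # wrong datatype
print(accepted, out_of_range, not_a_number)
```

Because the rule lives in the schema, every application that updates the table is subject to it, which is hard to guarantee when each program carries its own validation code.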

    The most commonly used kind of database management system is the relational database management system (RDBMS). The most important advantage of database management systems is the systematic storage of data, achieved by maintaining the relationships between the data members. The data is stored as tuples in an RDBMS.

    The advent of object-oriented programming gave rise to the concept of object-oriented database management systems. These systems combine properties like inheritance, encapsulation, polymorphism and abstraction with atomicity, consistency, isolation and durability, also called the ACID properties of a DBMS.

    Database management systems have brought about systematization in data storage, along with data security.

    1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to

       o Duplication of the same data in different files.
       o Wastage of storage space, since duplicated data is stored.
       o Errors caused by updating the same data in different files.
       o Time wasted in entering data again and again.
       o Computer resources being used needlessly.
       o Great difficulty in combining information.

    2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. If they are not, the data becomes inconsistent. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.

    For example:


    3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its currency are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.

    Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Use of a DBMS should also allow users who don't know programming to interact with the data more easily, unlike a file processing system where the programmer may need to write new programs to meet every new demand.

    4. Flexibility of the System is Improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.

    5. Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.

    In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to entry of incorrect data in some of the files where it exists.

    For example: - Consider the result system that we have already discussed. Since multiple files are to be maintained, you may sometimes enter a value for course which does not exist. Suppose course can have the values (Computer, Accounts, Economics, and Arts) but we enter the value 'Hindi' for it; this leads to inconsistent data, and so to a lack of integrity.

    Even if we centralize the database it may still contain incorrect data. For example:

       o The salary of a full-time employee may be entered as Rs. 500 rather than Rs. 5000.
       o A student may be shown to have borrowed books but has no enrollment.
       o A list of employee numbers for a given department may include a number of non-existent employees.

    These problems can be avoided by defining validation procedures that run whenever any update operation is attempted.

    6. Standards can be enforced - Since all access to the database must go through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.

    7. Security can be improved - In conventional systems, applications are developed in an ad hoc / temporary manner. Often different systems of an organization access different components of the operational data; in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions since the data is now centralized. It is easier to control who has access to what parts of the database. Different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.

    Consider an example from banking, in which employees at different levels may be given access to different types of data in the database. A clerk may be given the authority to know only the names of all the customers who have a loan in the bank but not the details of each loan the customer may have. This can be accomplished by giving the appropriate privileges to each employee.

    8. Organization

    For example: - A DBA must choose the best file structure and access method to give faster response for highly critical applications than for less critical applications.

    9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher using the non-procedural languages that have been developed with DBMSs than using procedural languages.

    10. Data Model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand. The overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.

    11. Provides backup and Recovery - Centralizing a database makes it possible to provide backup and recovery schemes for failures including disk crashes, power failures and software errors. These schemes help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.

    DISADVANTAGES OF DBMS

    1. Cost of Hardware & Software

    A processor with high data-processing speed and a large memory is required to run the DBMS software. This means that you have to upgrade the hardware used for the file-based system. Similarly, DBMS software is also very costly.

    2. Cost of Data Conversion

    When a computer file-based system is replaced with a database system, the data stored in data files must be converted to database files. Converting the data of data files into a database is difficult and costly. You have to hire database and system designers along with application programmers. Alternatively, you have to take the services of some software house. So a lot of money has to be paid for developing software.

    3. Cost of Staff Training

    Most DBMSs are complex systems, so users must be trained to use them. Training is required at all levels, including programming, application development, and database administration. The organization has to pay a large amount for the training of staff to run the DBMS.

    4. Appointing Technical Staff

    Trained technical persons such as database administrators, application programmers, data entry operators, etc. are required to handle the DBMS. You have to pay handsome salaries to these persons. Therefore, the system cost increases.

    5. Database Damage

    In most organizations, all data is integrated into a single database. If the database is damaged due to electric failure or is corrupted on the storage media, then your valuable data may be lost forever.

    HISTORY OF DATABASE SYSTEMS

    Data are raw facts that constitute the building blocks of information. A database is a collection of information and a means to manipulate data in a useful way; it must provide proper storage for large amounts of data, easy and fast access, and facilities for processing the data. A Database Management System (DBMS) is a set of software that is used to define, store, manipulate and control the data in a database. From the pre-stage flat-file system to relational and object-relational systems, database technology has gone through several generations over its 40-year history.

    THE EVOLUTION OF THE DATABASE

    Ancient History: Data are not stored on disk; the programmer defines both the logical data structure and the physical structure, such as storage structure, access methods, I/O modes, etc. One data set per program: high data redundancy. There is no persistence; random access memory (RAM) is expensive and limited, and programmer productivity is low.


    1968 File-Based: predecessor of the database. Data maintained in a flat file. Processing characteristics determined by the common use of magnetic tape as the medium.

    Data are stored in files, with an interface between programs and files. Mapping happens between logical files and physical files; one file corresponds to one or several programs.

    Various access methods exist, e.g., sequential, indexed, random.

    Requires extensive programming in a third-generation language such as COBOL.

    1968-1980 Era of non-relational database: A database provides an integrated and structured collection of stored operational data which can be used or shared by application systems. The prominent hierarchical database model was IBM's first DBMS, called IMS. The prominent network database model was the CODASYL DBTG model; IDMS was the most popular network DBMS.

    Hierarchical data model

    In the mid-1960s Rockwell partnered with IBM to create the Information Management System (IMS). IMS DB/DC led the mainframe database market in the 70's and early 80's.

    Based on binary trees.


    o Difficult to manage and lack of standards, such as problem to add empty nodes and can't easily handle many-many relationships.


    Two major projects started, and both were operational in the late 1970s:

    o INGRES at the University of California, Berkeley became commercial and was followed up by POSTGRES, which was incorporated into Informix.
    o System R at IBM San Jose


    o Query processor: translates statements in a query language into low-level instructions the database manager understands. (May also attempt to find an equivalent but more efficient form.)
    o DML precompiler: converts DML statements embedded in an application program to normal procedure calls in a host language. The precompiler interacts with the query processor.
    o DDL compiler: converts DDL statements to a set of tables containing metadata stored in a data dictionary.

    In addition, several data structures are required for physical system implementation:

    o Data files: store the database itself.
    o Data dictionary: stores information about the structure of the database. It is used heavily. Great emphasis should be placed on developing a good design and efficient implementation of the dictionary.
    o Indices: provide fast access to data items holding particular values.

    Figure shows these components.
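The role of the data dictionary can be observed in any SQL system that exposes its catalog. As a small sketch, SQLite (through Python's `sqlite3` module) keeps its metadata in a table named `sqlite_master`; the `deposit` table and the index name below are hypothetical and only echo the examples in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two DDL statements; the DBMS records what they define in its catalog.
conn.execute("CREATE TABLE deposit (bname TEXT, account_no TEXT, balance REAL)")
conn.execute("CREATE INDEX idx_deposit_bname ON deposit(bname)")

# The catalog is itself a table that can be queried like any other.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name").fetchall()
# Column-level metadata for one table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(deposit)")]
print(catalog, columns)
```

Every query has to consult this metadata before it can touch the data files, which is one reason the notes stress an efficient dictionary implementation.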

    Figure: Database system structure.

    Codd


    6. Comprehensive Data Sublanguage Rule

       A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and whose ability to support all of the following is comprehensive:

       a. data definition
       b. view definition
       c. data manipulation (interactive and by program)
       d. integrity constraints
       e. authorization
       f. transaction boundaries (begin, commit, and rollback).

    7. View Updating Rule

       All views that are theoretically updateable are also updateable by the system.

    8. High-level Insert, Update and Delete

       The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data.

    9. Physical Data Independence

       Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods.

    10. Logical Data Independence

       Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.

    11. Integrity Independence

       Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.

    12. Distribution Independence

       The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed.

    13. Nonsubversion Rule

       If a relational system has or supports a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-time) relational language.

    File structures and indexing

    File Organization

    1. A file is organized logically as a sequence of records.
    2. Records are mapped onto disk blocks.
    3. Files are provided as a basic construct in operating systems, so we assume the existence of an underlying file system.
    4. Blocks are of a fixed size determined by the operating system.
    5. Record sizes vary.
    6. In a relational database, tuples of distinct relations may be of different sizes.
    7. One approach to mapping the database to files is to store records of one length in a given file.
    8. An alternative is to structure files to accommodate variable-length records. (Fixed-length is easier to implement.)

    Fixed-Length Records

    1. Consider a file of deposit records of the form:

          type deposit = record
              bname: char(22);
              account#: char(10);
              balance: real;
          end

       o If we assume that each character occupies one byte, an integer occupies 4 bytes, and a real 8 bytes, our deposit record is 40 bytes long.
       o The simplest approach is to use the first 40 bytes for the first record, the next 40 bytes for the second, and so on.
       o However, there are two problems with this approach.
         1. It is difficult to delete a record from this structure. Space occupied must somehow be deleted, or we need to mark deleted records so that they can be ignored.
         2. Unless block size is a multiple of 40, some records will cross block boundaries. It would then require two block accesses to read or write such a record.

    2. When a record is deleted, we could move all successive records up one (Figure 10.7), which may require moving a lot of records.

    o We could instead move the last record into the "hole" created by the deleted record (Figure 10.8).

    o This changes the order the records are in.

    o It turns out to be undesirable to move records to occupy freed space, as moving requires block accesses.

    o Also, insertions tend to be more frequent than deletions.

    o It is acceptable to leave the space open and wait for a subsequent insertion.

    o This leads to a need for additional structure in our file design.

    3. So one solution is:

    o At the beginning of a file, allocate some bytes as a file header.

    o For now, this header need only store the address of the first record whose contents are deleted.

    o This first record can then store the address of the second available record, and so on (Figure 10.9).

    o To insert a new record, we use the record pointed to by the header, and change the header pointer to the next available record.

    o If no deleted records exist, we add our new record to the end of the file.

    4. Note: Use of pointers requires careful programming. If a record pointed to is moved or deleted, and that pointer is not corrected, the pointer becomes a dangling pointer. Records pointed to are called pinned.

    5. Fixed-length file insertions and deletions are relatively simple because "one size fits all". For variable length, this is not the case.
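    The free-list scheme for fixed-length records can be sketched in a few lines of Python. This is a minimal in-memory model, not a definitive implementation: a list of 40-byte slots stands in for the disk file, slot numbers stand in for disk addresses, and the names FixedLengthFile, NO_SLOT and crosses_block_boundary are hypothetical. The field widths (22 + 10 + 8 bytes) are assumed from the deposit record.

```python
import struct

# Assumed layout of the deposit record: 22-byte bname,
# 10-byte account number, 8-byte real balance = 40 bytes.
RECORD = struct.Struct("<22s10sd")
NO_SLOT = -1  # hypothetical marker meaning "end of the free list"


class FixedLengthFile:
    """In-memory model of a file of 40-byte slots with a free list."""

    def __init__(self):
        self.free_head = NO_SLOT  # file header: first deleted slot
        self.slots = []           # each slot holds RECORD.size bytes

    def insert(self, bname, account, balance):
        data = RECORD.pack(bname.encode(), account.encode(), balance)
        if self.free_head == NO_SLOT:
            self.slots.append(data)  # no hole available: append at the end
            return len(self.slots) - 1
        slot = self.free_head
        # A deleted slot stores the number of the next free slot.
        (self.free_head,) = struct.unpack_from("<q", self.slots[slot])
        self.slots[slot] = data
        return slot

    def delete(self, slot):
        # Overwrite the slot with a link to the previous head of the list.
        link = struct.pack("<q", self.free_head)
        self.slots[slot] = link.ljust(RECORD.size, b"\0")
        self.free_head = slot

    def read(self, slot):
        # Caller must not pass a deleted slot; this sketch keeps no delete mark.
        bname, account, balance = RECORD.unpack(self.slots[slot])
        return (bname.rstrip(b"\0").decode(),
                account.rstrip(b"\0").decode(),
                balance)


def crosses_block_boundary(slot, block_size):
    """True if the record in this slot straddles two disk blocks."""
    start = slot * RECORD.size
    end = start + RECORD.size - 1
    return start // block_size != end // block_size
```

    Deleting slots 1 and then 0 leaves the header pointing at slot 0, which in turn points at slot 1, so the next two insertions reuse those holes before the file grows. With 512-byte blocks, which are not a multiple of 40, slot 12 (bytes 480 to 519) crosses a block boundary.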

    File Operations

    Consider four basic file operations:

    Operation    Similar SQL Statement
    Find         Select
    Insert       Insert
    Modify       Update
    Delete       Delete

    Unordered file: New record is inserted at the end of the file.

    o Insert takes constant time.

    o Select, Update and Delete take time proportional to n (n is the number of records).

    Ordered file: New record is inserted in order, in the file.

    o Insert takes log n time to find the position, plus the time to reorganize records.

    o Select, Update, Delete take at least log n time.

    Indexed file: New record is inserted at the end of the file.

    o An index is maintained that points to the location on disk where the record is found.

    o Insert takes constant time for the data itself plus log n for the index.

    o Select, Update, Delete take a log n lookup on the index followed by constant time to access the data record.
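    The three organizations can be compared with a small Python sketch that counts how many records a lookup examines. It is an illustration under simplifying assumptions: records are (key, payload) tuples held in memory, and the names find_unordered, find_ordered and IndexedFile are hypothetical. The sorted Python list used as the index finds a position in log n steps, but inserting into it shifts elements, so a real system would use a tree-structured index instead.

```python
import bisect


def find_unordered(records, key):
    """Linear scan of an unordered file: examines up to n records."""
    probes = 0
    for pos, (k, _payload) in enumerate(records):
        probes += 1
        if k == key:
            return pos, probes
    return None, probes


def find_ordered(records, key):
    """Binary search of a file sorted on the key: about log n probes."""
    lo, hi, probes = 0, len(records), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(records) and records[lo][0] == key:
        return lo, probes
    return None, probes


class IndexedFile:
    """Unordered data file plus a sorted index of (key, position) pairs."""

    def __init__(self):
        self.heap = []   # records in arrival order
        self.index = []  # (key, position in heap), kept sorted on key

    def insert(self, key, payload):
        self.heap.append((key, payload))  # constant time for the data
        bisect.insort(self.index, (key, len(self.heap) - 1))

    def find(self, key):
        # (key, -1) sorts before every real entry with the same key.
        i = bisect.bisect_left(self.index, (key, -1))
        if i < len(self.index) and self.index[i][0] == key:
            return self.heap[self.index[i][1]]  # one access to the data
        return None
```

    For a file of 1000 records, finding the last key takes 1000 probes with the linear scan and at most 10 with binary search.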

    INDEXING

    • Mechanism for efficiently locating row(s) without having to scan entire table

    • Based on a search key: rows having a particular value for the search key attributes can be quickly located

    • Don't confuse candidate key with search key:

        - Candidate key: set of attributes; guarantees uniqueness

        - Search key: sequence of attributes; does not guarantee uniqueness, just used for search

    Index Structure

    • Index structure contains:

        - Index entries

            • Can contain the data tuple itself (index and table are integrated in this case); or

            • Search key value and a pointer to a row having that value; table stored separately in this case (unintegrated index)

        - Location mechanism

            • Algorithm + data structure for locating an index entry with a given search key value

        - Index entries are stored in accordance with the search key value

            • Entries with the same search key value are stored together (hash, B-tree)

            • Entries may be sorted on search key value (B-tree)

    Types of Indexing

    An index is made up of two components: a key and a pointer.

    The key is typically the key value for the relation and is mainly used to identify and look up records.

    The pointer is an address on disk where the rest of the data in the record can be found.

    Two types of indexes are discussed here: ordered index and hashing.

    Ordered Index

    Records are stored as they are inserted.

    Key attribute is stored in order in the index.

    Storage Structure

    • Structure of file containing a table

        - Heap file (no index, not integrated)

        - Sorted file (no index, not integrated)

        - Integrated file containing index and rows (index entries contain rows in this case)

            • ISAM

            • B+ tree

            • Hash

    [Figure: Index File With Separate Storage Structure]

    Clustered Index

    • Clustered index: index entries and rows are ordered in the same way

        - An integrated storage structure is always clustered (since rows and index entries are the same)

        - The particular index structure (e.g., hash, tree) dictates how the rows are organized in the storage structure

    • There can be at most one clustered index on a table

        - CREATE TABLE generally creates an integrated, clustered (main) index on primary key

    • Good for range searches when a range of search key values is requested

        - Use location mechanism to locate index entry at start of range

            • This locates first row.

        - Subsequent rows are stored in successive locations if index is clustered (not so if unclustered)

        - Minimizes page transfers and maximizes likelihood of cache hits
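    The effect of clustering on a range search can be illustrated by counting the distinct data pages that hold the matching rows. The sketch below is a toy model under stated assumptions: rows are bare integer keys, ROWS_PER_PAGE is an arbitrary page capacity, and the helper names are hypothetical. The linear scan in range_positions only identifies which rows match; it does not model the index lookup itself.

```python
import random

ROWS_PER_PAGE = 10  # assumed page capacity, for illustration only


def pages_touched(row_positions):
    """Number of distinct data pages holding the given row positions."""
    return len({pos // ROWS_PER_PAGE for pos in row_positions})


def range_positions(file_rows, low, high):
    """Positions in the file of rows whose key lies in [low, high]."""
    return [pos for pos, key in enumerate(file_rows) if low <= key <= high]


keys = list(range(1000))

# Clustered: rows are stored in search-key order.
clustered_file = sorted(keys)

# Unclustered: rows are stored in an order unrelated to the search key.
unclustered_file = keys[:]
random.Random(0).shuffle(unclustered_file)

clustered_cost = pages_touched(range_positions(clustered_file, 200, 249))
unclustered_cost = pages_touched(range_positions(unclustered_file, 200, 249))
print("clustered pages:", clustered_cost)
print("unclustered pages:", unclustered_cost)
```

    In the clustered file the 50 matching rows occupy 5 consecutive pages. In the unclustered file they are scattered, so the search may need up to one page transfer per matching row.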

    [Figure: Clustered Main Index]

    [Figure: Clustered Secondary Index]

    Unclustered Index

    • Unclustered (secondary) index: index entries and rows are not ordered in the same way

    • A secondary index might be clustered or unclustered with respect to the storage structure it references

        - It is generally unclustered (since the organization of rows in the storage structure depends on main index)

        - There can be many secondary indices on a table

        - Index created by CREATE INDEX is generally an unclustered, secondary index

    [Figure: Unclustered Secondary Index]

    Sparse vs. Dense Index

    • Dense index: has index entry for each data record

        - Unclustered index must be dense

        - Clustered index need not be dense

    • Sparse index: has index entry for each page of data file
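    The difference between the two kinds of index can be sketched as follows. This is a minimal model, assuming a data file sorted on the search key and split into pages of PAGE_SIZE keys; all names are hypothetical. The sparse lookup works only because the file is sorted: the index picks the one page that could hold the key, and the page itself is then scanned.

```python
import bisect

PAGE_SIZE = 4  # assumed number of rows per page, for illustration


def build_pages(sorted_keys):
    """Split a sorted list of keys into fixed-size data pages."""
    return [sorted_keys[i:i + PAGE_SIZE]
            for i in range(0, len(sorted_keys), PAGE_SIZE)]


def build_sparse_index(pages):
    """One entry per page: the smallest key stored on that page."""
    return [page[0] for page in pages]


def build_dense_index(pages):
    """One entry per record: (key, page number, slot within page)."""
    return [(key, p, s)
            for p, page in enumerate(pages)
            for s, key in enumerate(page)]


def sparse_lookup(sparse_index, pages, key):
    """Find the last page whose first key is <= key, then scan that page."""
    p = bisect.bisect_right(sparse_index, key) - 1
    if p < 0:
        return None  # key is smaller than every key in the file
    page = pages[p]
    return (p, page.index(key)) if key in page else None
```

    For 20 records in 5 pages, the dense index holds 20 entries and the sparse index only 5. If the rows were not stored in key order, the first key of a page would say nothing about the rest of the page, which is why an unclustered index must be dense.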

    Multiple Attribute Search Key

    • CREATE INDEX Inx ON Tbl (Att1, Att2)

    • Search key is a sequence of attributes; index entries are lexically ordered

    • Supports finer granularity equality search:

        - "Find row with value (A1, A2)"

    • Supports range search (tree index only):

        - "Find rows with values between (A1, A2) and (A1', A2')"

    • Supports partial key searches (tree index only):

        - "Find rows with values of Att1 between A1 and A1'"

        - But not "Find rows with values of Att2 between A2 and A2'"
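    Why the leading attribute can be searched alone but the second cannot follows from lexical ordering, which Python tuples share. The sketch below uses a sorted list of (Att1, Att2) tuples as a stand-in for the ordered index entries; the sample values and function names are made up for illustration.

```python
import bisect

# Index entries for a hypothetical index on (Att1, Att2), in lexical order.
entries = sorted([(1, "b"), (1, "a"), (2, "c"), (2, "a"), (3, "b"), (3, "d")])


def equality_search(a1, a2):
    """Full-key equality search: one binary search."""
    i = bisect.bisect_left(entries, (a1, a2))
    return i < len(entries) and entries[i] == (a1, a2)


def range_search(low, high):
    """Entries between two full keys (inclusive) form one contiguous run."""
    start = bisect.bisect_left(entries, low)
    end = bisect.bisect_right(entries, high)
    return entries[start:end]


def prefix_range_search(a1_low, a1_high):
    """Partial-key search on Att1 only: still one contiguous run."""
    # The 1-tuple (a1_low,) sorts before every (a1_low, x) entry.
    start = bisect.bisect_left(entries, (a1_low,))
    end = start
    while end < len(entries) and entries[end][0] <= a1_high:
        end += 1
    return entries[start:end]


def att2_range_search(a2_low, a2_high):
    """Att2 alone: matches are scattered, so every entry is examined."""
    return [e for e in entries if a2_low <= e[1] <= a2_high]
```

    Entries with Att2 = "a" sit at positions 0 and 2 of the sorted list, separated by (1, "b"), so no single starting point and sequential read can retrieve them.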

    Locating an Index Entry

    • Use binary search (index entries sorted)

    • If the index entries occupy Q pages, this takes log2 Q page transfers (a big improvement over binary search of the data pages of an F-page data file, since F >> Q)

    • Use multilevel index: sparse index on sorted list of index entries

    Two-Level Index

        - Separator level is a sparse index over pages of index entries

        - Leaf level contains index entries

        - Cost of searching the separator level << cost of searching index level, since separator level is sparse

        - Cost of retrieving row once index entry is found is 0 (if integrated) or 1 (if not)

    Multilevel Index

        - Search cost = number of levels in tree

        - If Φ is the fanout of a separator page and Q is the number of leaf pages, cost is ⌈log_Φ Q⌉ + 1

        - Example: if Φ = 100 and Q = 10,000, cost = 3 (reduced to 2 if root is kept in main memory)
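    The search cost of a multilevel index can be checked with a short calculation. The sketch below assumes the cost is one page transfer per level: the separator levels needed for the given fanout to reach every leaf page, plus the leaf page itself. The function and parameter names are hypothetical. Integer arithmetic is used instead of a floating-point logarithm so that exact powers of the fanout are not rounded up by mistake.

```python
def separator_levels(fanout, leaf_pages):
    """Smallest h with fanout**h >= leaf_pages, i.e. ceil(log_fanout(leaf_pages))."""
    levels, reach = 0, 1
    while reach < leaf_pages:
        reach *= fanout
        levels += 1
    return levels


def search_cost(fanout, leaf_pages, root_in_memory=False):
    """Page transfers to reach an index entry: separator levels + 1 leaf page."""
    levels = separator_levels(fanout, leaf_pages)
    cost = levels + 1
    if root_in_memory and levels > 0:
        cost -= 1  # the root separator page is already in main memory
    return cost


print(search_cost(100, 10_000))                       # 3
print(search_cost(100, 10_000, root_in_memory=True))  # 2
```

    With a fanout of 100 and 10,000 leaf pages, two separator levels suffice (100 × 100 = 10,000), which gives the cost of 3 quoted in the example.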

    Index Sequential Access Method (ISAM)

    • Generally an integrated storage structure

        - Clustered, index entries contain rows

    • Separator entry = (k_i, p_i); k_i is a search key value; p_i is a pointer to a lower level page

    • k_i separates set of search key values in the two subtrees pointed at by p_(i-1) and p_i.
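    Navigating the separator levels can be sketched as below. This is a hypothetical in-memory model: a SeparatorPage holds keys k_1..k_n and child pointers p_0..p_n, and leaf pages are plain Python lists of keys. The convention that keys equal to k_i belong to the subtree at p_i is an assumption; the text only states that k_i separates the two subtrees.

```python
import bisect


class SeparatorPage:
    """Keys k_1..k_n with child pointers p_0..p_n (hypothetical model)."""

    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1
        self.keys = keys
        self.children = children


def find_leaf(page, key):
    """Follow separator entries down to the leaf page that may hold key."""
    while isinstance(page, SeparatorPage):
        # Keys < k_i go to p_(i-1); keys >= k_i go to p_i (assumed convention).
        page = page.children[bisect.bisect_right(page.keys, key)]
    return page


# A two-level separator structure over four leaf pages.
leaves = [[1, 3], [5, 7], [9, 11], [13, 15]]
left = SeparatorPage([5], leaves[0:2])
right = SeparatorPage([13], leaves[2:4])
root = SeparatorPage([9], [left, right])

print(find_leaf(root, 7))   # [5, 7]
print(find_leaf(root, 9))   # [9, 11]
```

    Each step down the tree reads one separator page, so the number of pages visited equals the number of levels.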

    [Figure: Index Sequential Access Method]

    Index Sequential Access Method

    • The index is static:

        - Once the separator levels have been constructed, they never change

        - Number and position of leaf pages in file stays fixed

    • Good for equality and range searches

    Q