Unit 1Final1

  • 8/8/2019 Unit 1Final1

    Unit 1

    Introduction to Databases

    Introduction

    Requirements of Databases

    Characteristics of the database

    Codd rules to convert a DBMS into RDBMS

    Database system concept

    Data Models

    Schemas and Instances

    Database Architecture

    Data Independence

    Database system environment

    Classification of DBMS system

    Database system utilities

    INTRODUCTION

    Information Age

    The present time is known as the information age because people everywhere deal with data and information related to a business or organization. People have manipulated data and exchanged information since the beginning of civilization, but this has been considered an important discipline only for the last few decades. Today, data manipulation and information processing have become major tasks of any organization, small or big, whether it is an educational institution, a government concern, or a scientific, commercial or other body. Thus we can say that information is an essential requirement of any business or organization.

    Data

    Data is the plural of the Latin word datum. It refers to any raw facts or figures, such as numbers, events, letters or transactions, from which we cannot by themselves reach any conclusion. Data becomes useful after processing. For example, 78 is simply a number (data), but if we say "marks in physics: 78" then it becomes information: it means that somebody got a distinction in physics.

    Information

    Information is processed data. A user can make decisions based on information.

    Data -> Processing -> Information

    Information systems, through their central role in the information economy, bring about the following changes:

    Global exposure of the industry.

    Actively working people.

    Precedence of idea and information over money.

    Growth in the business size.

    Globalization changing technologies.

    Integration among different components based on information flow.

    Need for optimum utilization of resources.

    Deciding loss/benefit of business.

    Future oriented Information.

    External Interfaces.

    The organization as an information system

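    Figure: The organization as an information system. A corporate database sits at the center, linked to marketing (product development, sales), accounting (accounts receivable, accounts payable), management (control, planning) and manufacturing (production, scheduling, material requirement planning, purchasing).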

    An organization can be viewed as a mechanism for processing information, and its traditional management can be viewed in the context of information and process. The manager may be considered a planning and decision center. Established routes of information flow are used to determine the effectiveness of the organization in achieving its objectives. Thus, information is often described as the key to success in business.

    Information Quality

    We expect information to be reliable and accurate. These features can be measured by the degree of completeness, precision and timeliness.

    Completeness

    The user of information should receive all the details necessary to aid decision-making. It is important for all information to be supplied before decisions are made. For example, new stock should not be ordered until full details of current stock levels are known. This is a simple example, since we know what information is required and where to obtain it. Difficulties begin when we are not sure of the completeness of the information received. Business analysts and economic advisors are well aware of these problems when devising strategies and fiscal plans.

    Precision

    Inaccurate information can be more damaging to a business than incomplete information. The degree of accuracy required depends on the recipient's position in the management hierarchy. In general terms, the higher the position, the less accuracy is required. Decisions made at the top management level are based on annual summaries of items such as sales, purchases and capital spending. Middle managers require a greater degree of accuracy, perhaps weekly or monthly totals. Junior management requires the greatest degree of accuracy to aid decision-making. Daily up-to-date information is often necessary, with accuracy to the nearest percentage point or unit.

    Timeliness

    Timeliness is the provision of prepared information as soon as it is required. We also need to consider the case where accurate information is produced but not used immediately, rendering it out-of-date. Some systems demand timely information and cannot operate without it. Airline reservation systems are one example: passengers and airline staff depend on timely information concerning flight times, reservations and hold-ups.

    Data Processing

    This is a traditional term used to describe the processing of function-related data within a business organization. Sales order processing is a typical example of data processing. Note that processing may be carried out manually or using a computer. Some systems employ a combination of both manual and computerized processing techniques. In both cases, the data processing is essentially the same. The differences can be described in terms of:

    Speed

    Computers can process data much more quickly than any human. Hence, a computer system has a potentially higher level of productivity and is therefore cheaper for high-volume data processing. Speed allows more timely information to be generated.

    Accuracy

    Computers have a reputation for accuracy, assuming that correct data has been input and that procedures define the processing steps correctly. Errors in computer systems are thus human errors (software or input) or, less likely, machine errors (hardware failure).

    Volume

    As processing requirements increase, possibly due to business expansion, managers require more information processing. Human systems cannot cope with these demands. Banking is a prime example where the dependency on computers is total.

    Decision-Making

    There are some tasks that computers cannot perform. These activities usually involve a high degree of non-procedural thinking in which the rules of processing are difficult to define. It would be extremely difficult to produce a set of rules even for crossing a busy road safely. Many management posts still rely to a great degree on human decision-making. Top management decisions on policy and future business are still determined by a board of directors and not by a computer.

    Having understood the basic concepts and significance of information and databases, let us now get into the basics:

    Data: As described earlier, data are the raw facts used for information processing. Data must be collected and then input, ready for processing.

    Each item of data must be clearly labeled and formatted, and its size determined. For example, a customer account number may be labeled A/C, in numeric format, with a size of five digits.

    Data may enter a system in one form and then be changed as it is processed or calculated. Customer order data, for example, may be converted to electronic form by keying in the orders from specially prepared data entry forms. The order data may then be used to update both customer and stock files.
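    The idea that each data item has a label, a format and a size can be sketched in a few lines of Python. This is only an illustration: the class name DataItemSpec and the sample values are assumptions made for the example, not part of any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItemSpec:
    """Describes one item of data: its label, format and size (hypothetical helper)."""
    label: str      # e.g. "A/C" for a customer account number
    numeric: bool   # True if only digits are allowed
    size: int       # exact number of characters

    def is_valid(self, value: str) -> bool:
        """Return True if value matches the declared format and size."""
        if len(value) != self.size:
            return False
        if self.numeric and not value.isdigit():
            return False
        return True


# The customer account number from the text: label A/C, numeric, five digits.
account_number = DataItemSpec(label="A/C", numeric=True, size=5)

print(account_number.is_valid("10432"))  # True
print(account_number.is_valid("1043"))   # False: only four digits
print(account_number.is_valid("10A32"))  # False: not numeric
```

    A check of this kind would typically run at data entry, before the order data is used to update the customer and stock files.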

    Input: The transaction is the primary data input that leads to system action, e.g., the input of a customer order to the sales order processing system. The volume and frequency of transactions will often determine the structure of an organization.

    In addition to transaction data, a business system will also need to refer to stored data, known as standing or fixed data. Within a sales order processing system we have standing data in the form of customer names and addresses, stock records and price lists. The transactions contain some standing data, for referencing, but mainly variable data, such as items and quantities ordered.

    Output: Output from a business system is often seen as planning or control information, or as input to another system. This can be understood if we consider a stock control system. Output will be stock level information (slow- and fast-moving items, for example) and stock orders for items whose quantities fall below their reorder level. Stock movement information would be used to plan stock levels and reorder levels, whilst stock order requirements would be used as input to the purchasing system.

    Files: A file is an ordered collection of data records, stored for retrieval or amendment as required. When files are amended from transaction data, this is referred to as updating. In order to aid information flow, files may be shared between subsystems. For example, a stock file may be shared between the sales function and the purchasing function.

    Processes: Data is converted to output or information by processing. Processing examples include sorting, calculating and extracting.
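    The stock control example above combines the three processing steps named in the text: extracting, calculating (a comparison against the reorder level) and sorting. The following Python sketch assumes a small in-memory stock file; the item names and quantities are invented for illustration.

```python
# Hypothetical stock file: item -> (quantity on hand, reorder level).
stock_file = {
    "bolt": (120, 50),
    "nut": (30, 40),
    "washer": (0, 25),
}


def stock_orders(stock):
    """Extract the items whose quantity is below the reorder level, sorted by name."""
    low = [item for item, (quantity, reorder_level) in stock.items()
           if quantity < reorder_level]
    return sorted(low)


orders = stock_orders(stock_file)
print(orders)  # ['nut', 'washer']
```

    The resulting list is the kind of output that would be passed on as input to the purchasing system.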

    DATABASE

    A database is a collection of related data, or operational data extracted from any firm or organization. For example, consider the names, telephone numbers and addresses of the people you know. You may have recorded this data in an indexed address book, or you may have stored it on a diskette, using a personal computer and software such as Microsoft Access, Oracle or SQL Server.

    The common use of the term database is usually more restricted.

    A database has the following implicit properties:

    A database represents some aspect of the real world, sometimes called the miniworld or the Universe of Discourse (U.D.). Changes to the miniworld are reflected in the database.

    A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database.

    A database is designed, built and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.

    In other words, a database has some source from which data is derived, some degree of interaction with events and an audience that is actively interested in the contents of the database. A database can be of any size and of varying complexity. For example, the list of names and addresses referred to earlier may consist of only a few hundred records, each with a simple structure. On the other hand, the card catalog of a large library may contain half a million cards stored under different categories (by primary author's last name, by subject, by book title), with each category organized in alphabetical order.

    Here are several examples of databases.

    1. Manufacturing company

    2. Bank

    3. Hospital

    4. University

    5. Government department

    In general, a database is a collection of files (tables).

    Entity: A person, place, thing or event about which information must be kept.

    Attribute: A piece of information describing a particular entity. Attributes are mainly the characteristics of the individual entity. Individual attributes help to identify and distinguish one entity from another.

    Student (Database Name)

    Entity      Attributes
    Personal    Name, Age, Address, Father's Name
    Academic    Name, Roll No., Course, Dept. Name

    Hierarchy of Database

    Bit         0, 1
    Byte        10101011 (8 bits)
    Field       (Attribute name, such as Name, Age, Address)
    Record      (One row in a table)
    File        (Table, or collection of all records)
    Database    (Collection of files or tables)

    e.g.

    Student (Database Name)

    The Student database holds two tables, Personal and Academic. The column headings are the field names (attribute names).

    Personal (Table Name)

    Name      Father Name    Age
    John      Albert         24
    Ramesh    Suresh         18

    Why Database?

    A small shop's records can be handled manually, but if you have a large database and multiple users, you have to maintain a computerized database. The advantages of a database system over traditional, paper-based methods of record-keeping will perhaps be more readily apparent in these examples. Here are some of them.

    Compactness: No need for possibly voluminous paper files.

    Speed: The machine can retrieve and change data faster than a human can.

    Accuracy: Accurate, up-to-date information is available on demand at any time.

    Benefits of the Database Approach

    The database approach has the following benefits:

    Redundancy and duplication can be reduced. In the database approach, the views of different user groups are integrated during database design. For consistency, we should have a database design that stores each logical data item, such as a student's name or birth date, in only one place in the database. This does not permit inconsistency, and it saves time. However, in some cases, controlled redundancy may be useful for improving the performance of queries.

    Inconsistency can be avoided (to some extent). Suppose the fact that employee E4 works in department D5 is represented by two distinct entries in the stored database. Suppose also that the DBMS is not aware of this duplication (i.e. the redundancy is not controlled). Then there will necessarily be occasions on which the two entries do not agree, i.e., when one of the two has been updated and the other has not. At such times the database is said to be inconsistent.

    Academic (Table Name), in which each row is a record:

    Name      Roll No.    Course    Dept. Name
    John      12          MSC       Computer
    Ramesh    15          BCA       Computer

    The data can be shared. The same database can be used by a variety of users, for their different objectives, simultaneously.

    Security restrictions can be applied. It is likely that some users will not be authorized to access all information in the database. For example, financial data is often considered confidential, and hence only authorized persons are allowed to access such data. In addition, some users may be permitted only to retrieve data, whereas others are allowed both to retrieve and to update.

    Integrity can be maintained. The problem of integrity is the problem of ensuring that the data in the database is accurate. For example, if the data type of a field is number, then a text string cannot be inserted into it.
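    The inconsistency described in the list above (employee E4 and department D5) can be reproduced with two plain Python dictionaries standing in for two files. The file names personnel_file and payroll_file, and the new department D3, are assumptions made for the illustration.

```python
# Uncontrolled redundancy: the fact "E4 works in D5" is stored twice.
personnel_file = {"E4": "D5"}
payroll_file = {"E4": "D5"}

# A transfer is recorded in one file only.
personnel_file["E4"] = "D3"
redundant_copies_agree = personnel_file["E4"] == payroll_file["E4"]
print(redundant_copies_agree)  # False: the two entries now disagree

# Database approach: the fact is stored once and every user reads that copy.
employee_department = {"E4": "D5"}
employee_department["E4"] = "D3"
personnel_view = employee_department["E4"]
payroll_view = employee_department["E4"]
print(personnel_view == payroll_view)  # True
```

    With a single stored copy there is nothing left to fall out of step, which is why removing uncontrolled redundancy also removes this source of inconsistency.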

    DATABASE SYSTEM

    A DBMS is a sophisticated piece of software that supports the creation, manipulation and administration of a database system. A database system comprises a database of operational data together with the processing functionality required to access and manage that data. Typically, this means a computerized record-keeping system whose overall purpose is to maintain information and to make that information available on demand.

    The DBMS as an Interface between physical Database and user Requests

    The DBMS responds to a query by invoking the appropriate sub-programs, each of which performs its special function to interpret the query, or to locate the desired data in the database and present it in the desired order. Thus the DBMS shields database users from the tedious programming they would otherwise have to do to organize data for storage, or to gain access to it once it has been stored.

    Figure: User requests reach the database management system through the query language, record operation and other interfaces; the DBMS in turn works through the operating system to reach the database.

    As already mentioned, a database consists of a group of related files of different record types, and the DBMS allows users to access data anywhere in the database without knowledge of how the data are actually organized on the storage device.

    The DBMS (database approach) tries to overcome all of the shortcomings of the pre-database approach, which are as follows:

    Data Validation Problems: If many programs manipulate a particular type of information, then validation of its correctness must be carried out by each of those programs to guard against entry of any illegal values. Consequently, program code may need to be duplicated and, if the validation conditions change, each program must (at least) be recompiled.

    Data Sharing Problems: Perhaps more seriously, if a file is used by several programs and there is a need to change its structure in some way, perhaps to add a new type of information object that is required by a new program, then each program will need to be recompiled, unless one maintains duplicate information in different structures, in which case there is a synchronization problem.

    Manipulation Problems: When writing a program using a conventional programming language and operating system facilities, a programmer uses record-level commands (i.e. reads and writes) on each file to perform the required functions; this is laborious and hence an unproductive use of the programmer's time.

    Data Redundancy: The same piece of information may be stored in two or more files. For example, the particulars of an individual who is both a customer and an employee may be stored in two or more files.

    Program/Data Dependency: In the traditional approach, if a data field is to be added to a master file, all programs that access the master file have to be changed to allow for the new field added to the master record.

    Lack of Flexibility: In view of the strong coupling between the program and the data, most information retrieval possibilities are limited to well-anticipated and predetermined requests for data. The system would normally be capable of producing only the scheduled records and queries that it has been programmed to create. In the fast-moving and competitive business environment of today, apart from such regularly scheduled records there is a need to respond to unanticipated queries and to carry out some kinds of investigative analysis that cannot be envisaged in advance.

    So let us now try to appreciate how a DBMS solves some of these issues.

    Data Validation: In principle, validation rules for data objects can be held in the schema and enforced on entry by the DBMS. This reduces the amount of application code that is needed. Changes to these rules need to be made exactly once because they are not duplicated.

    Data Sharing: Changes to the structures of data objects are registered by modifications to the schema. Existing application programs need not be aware of any differences, because a correspondence between their view of the data and the view that is now supported can also be held in the schema and interpreted by the DBMS. This concept is often referred to as data independence: applications are independent of the actual representation of their data.

    One of the main reasons for using a DBMS is to have central control of both the data and the processes that access those data. The person who has such central control over the system is called the database administrator (DBA). The functions of the DBA include the following:

    Schema Definition: The DBA creates the original database schema by writing a set of definitions that is translated by the DDL (data definition language) compiler into a set of tables that is stored permanently in the data dictionary.

    Storage Structure and Access-Method Definition: The DBA creates appropriate storage structures and access methods by writing a set of definitions, which is translated by the DDL compiler.

    Schema and Physical-Organization Modification: Programmers accomplish the relatively rare modifications either to the database schema or to the description of the physical storage organization by writing a set of definitions that is used by either the DDL compiler or the data-storage and data-definition-language compiler to generate modifications to the appropriate internal system tables (for example, the data dictionary).

    Granting of Authorization for Data Access: The granting of different types of authorization allows the DBA to regulate which parts of the database various users can access.

    Integrity-Constraint Specification: The data values stored in the database must satisfy certain consistency constraints. For example, the number of hours an employee may work in one week may not exceed a pre-specified limit (say 80 hours).
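    The 80-hour limit above can be written once, in the schema, so that the DBMS enforces it for every program. The sketch below uses SQLite through Python's standard sqlite3 module as a stand-in DBMS; the table name timesheet, its columns and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE timesheet (
        employee_id TEXT NOT NULL,
        week        TEXT NOT NULL,
        hours       INTEGER NOT NULL CHECK (hours BETWEEN 0 AND 80),
        PRIMARY KEY (employee_id, week)
    )
    """
)

# A legal value is accepted.
conn.execute("INSERT INTO timesheet VALUES ('E4', '2019-W32', 40)")

# A value over the limit is rejected by the DBMS, not by application code.
try:
    conn.execute("INSERT INTO timesheet VALUES ('E4', '2019-W33', 95)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM timesheet").fetchone()[0]
print(rejected, row_count)  # True 1
```

    If the limit changes, only the constraint in the schema has to change; no application program needs to be edited.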

    CODD RULES

    Rule 1 : The information Rule.

    "All information in a relational data base is represented explicitly at the logical level and in exactly one way - by values in tables."

    Everything within the database exists in tables and is accessed via table access routines.

    Rule 2 : Guaranteed access Rule.

    "Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name."

    To access any data item you specify the table, the row (by its primary key value) and the column in which it exists; there is no reading of characters 10 to 20 of a 255-byte string.

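    As an illustration of Rule 2, the following sketch retrieves a single value using nothing but a table name, a primary key value and a column name. It uses SQLite through Python's sqlite3 module as a stand-in; the table layout is an assumption, and the sample values are taken from the bank example in Figure 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700)],
)

# Table name (account) + primary key value ('A-215') + column name (balance)
# together identify exactly one atomic value.
balance = conn.execute(
    "SELECT balance FROM account WHERE account_number = ?", ("A-215",)
).fetchone()[0]
print(balance)  # 700
```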
    Rule 3 : Systematic treatment of null values.

    "Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type."

    If data does not exist or does not apply, a value of NULL is stored; the RDBMS understands this as meaning missing or non-applicable data.
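    The difference between NULL, the empty string and zero can be seen in a small experiment. The sketch uses SQLite through Python's sqlite3 module as a stand-in RDBMS; the person table, its columns and the helper names_where are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT, dependents INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("John", None, None),  # both values unknown: stored as NULL
        ("Ramesh", "", 0),     # empty string and zero are ordinary values
    ],
)


def names_where(condition):
    """Return the names of rows matching a fixed, trusted SQL condition."""
    query = "SELECT name FROM person WHERE " + condition + " ORDER BY name"
    return [row[0] for row in conn.execute(query)]


null_phones = names_where("phone IS NULL")
empty_phones = names_where("phone = ''")
zero_dependents = names_where("dependents = 0")
print(null_phones, empty_phones, zero_dependents)
# ['John'] ['Ramesh'] ['Ramesh']
```

    John's row is found only by the IS NULL test; comparing a NULL with '' or 0 never succeeds, whatever the column's data type.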

    Rule 4 : Dynamic on-line catalog based on the relational model.

    "The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data."

    The data dictionary is held within the RDBMS, so there is no need for off-line volumes to tell you the structure of the database.

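    Rule 4 can be tried out in SQLite, which keeps its catalog in a table named sqlite_master that is queried with an ordinary SELECT. The sketch below uses Python's sqlite3 module; the two sample tables are assumptions made for the example, and other systems expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_city TEXT)")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

# The catalog is itself a table, queried with the same language as user data.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(table_names)  # ['account', 'customer']
```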
    Rule 5 : Comprehensive data sub-language Rule.

    "A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items:

    Data Definition

    View Definition

    Data Manipulation (Interactive and by program)

    Integrity Constraints

    Authorization."

    Every RDBMS should provide a language that allows the user to query and to manipulate the contents of the database.

    Rule 6 : View updating Rule.

    "All views that are theoretically updatable are also updatable by the system."

    If a view is, in theory, updatable, then the RDBMS must allow data to be modified through that view as well as through the base tables.

    Rule 7 : High-level insert, update and delete.

    "The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data."

    In other words, insert, update and delete must work on whole sets of rows in a single operation, on views as well as on base tables, and not only one record at a time.

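    A set-level update, as required by Rule 7, changes every qualifying row with one statement. The sketch below uses SQLite through Python's sqlite3 module; the account table, the threshold of 600 and the increment of 50 are assumptions chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-102", 400)],
)

# One statement updates every qualifying row; there is no record-at-a-time loop.
cursor = conn.execute(
    "UPDATE account SET balance = balance + 50 WHERE balance < 600"
)
rows_changed = cursor.rowcount

balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(rows_changed)       # 2
print(balances["A-215"])  # 700: this row did not qualify
```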
    Rule 8 : Physical data independence.

    "Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods."

    The user need not be aware of where, or on which media, the data files are stored.

    Rule 9 : Logical data independence.

    "Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables."

    User programs and the user should not be affected by changes to the structure of the tables (such as the addition of extra columns).
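    The effect described in Rule 9 can be observed with a query that names the columns it needs: adding a column to the base table leaves its result unchanged. The sketch uses SQLite through Python's sqlite3 module; the customer table and the function customer_cities are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_city TEXT)")
conn.execute("INSERT INTO customer VALUES ('Johnson', 'Palo Alto')")


def customer_cities(connection):
    """Application query that names exactly the columns it needs."""
    return connection.execute(
        "SELECT customer_name, customer_city FROM customer"
    ).fetchall()


before = customer_cities(conn)

# Information-preserving change to the base table: add an extra column.
conn.execute("ALTER TABLE customer ADD COLUMN customer_street TEXT")

after = customer_cities(conn)
print(before == after)  # True: the application is unaffected
```

    A program written with SELECT * would see the extra column, which is one reason application queries usually list their columns explicitly.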

    Rule 10 : Integrity independence.

    "Integrity constraints specific to a particular relational data base must be definable in the relational data sub-language and storable in the catalog, not in the application programs."

    If a column accepts only certain values, then it is the RDBMS that enforces these constraints and not the user program. This means that an invalid value can never be entered into the column, whilst if the constraints were enforced via programs there would always be a chance that a buggy program might allow incorrect values into the system.

    Rule 11 : Distribution independence.

    "A relational DBMS has distribution independence."

    The RDBMS may be spread across more than one system and across several networks; however, to the end user the tables should appear no different from those that are local.

    Rule 12 : Non-subversion Rule.

    "If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time)."

    Rule 0 : Foundation Rule.

    Interestingly, Codd also defined a Rule 0 for relational database systems.

    "For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities, no matter what additional capabilities the system may support." (Codd, 1990)

    That means that, no matter what additional features a database system might support, in order to be truly called relational it must comply with the 12 rules.

    DATA MODELS

    Underlying the structure of a database is the data model: a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. The various data models that have been proposed fall into three different groups: object-based logical models, record-based logical models, and physical models.

    Object-Based Logical Models

    Object-based logical models are used in describing data at the logical and view levels. They are characterized by the fact that they provide fairly flexible structuring capabilities and allow data constraints to be specified explicitly. There are many different models, and more are likely to come. Several of the more widely known ones are

    The entity-relationship model

    The object-oriented model

    The Entity-Relationship Model

    The entity-relationship (E-R) data model is based on a perception of a real world that consists of a collection of basic objects, called entities, and of relationships among these objects. An entity is a thing or object in the real world that is distinguishable from other objects. For example, each person is an entity, and bank accounts can be considered to be entities. Entities are described in a database by a set of attributes. For example, the attributes account-number and balance describe one particular account in a bank. A relationship is an association among several entities. For example, a Depositor relationship associates a customer with each account that she has. The set of all entities of the same type and the set of all relationships of the same type are termed an entity set and a relationship set, respectively.

    In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a database must conform. One important constraint is mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set.

    Figure 1 A sample E-R diagram.

    The overall logical structure of a database can be expressed graphically by an E-R diagram, which is built up from the following components:

    Rectangles, which represent entity sets

    Ellipses, which represent attributes

    Diamonds, which represent relationships among entity sets

    Lines, which link attributes to entity sets and entity sets to relationships

    Each component is labeled with the entity or relationship that it represents.

    The Object-Oriented Model

    Like the E-R model, the object-oriented model is based on a collection of objects. An object contains values stored in instance variables within the object. An object also contains bodies of code that operate on the object. These bodies of code are called methods.

    Objects that contain the same types of values and the same methods are grouped together into classes. A class may be viewed as a type definition for objects. This combination of data and methods comprising a type definition is similar to a programming-language abstract data type.

    The only way in which one object can access the data of another object is by invoking a method of that other object. This action is called sending a message to the object. Thus, the call interface of the methods of an object defines that object's externally visible part. The internal part of the object (the instance variables and method code) is not visible externally. The result is two levels of data abstraction.

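    Figure 1 (the sample E-R diagram) shows the entity set Customer, with attributes Social-Security, Customer-name, Customer-street and Customer-city, and the entity set Account, with attributes Account-number and Balance, linked by the relationship set Depositor.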

    To illustrate the concept, let us consider an object representing a bank account. Such an object contains instance variables account-number and balance. It contains a method pay-interest, which adds interest to the balance. Assume that the bank had been paying 6 percent interest on all accounts, but now is changing its policy to pay 5 percent if the balance is less than $1000 or 6 percent if the balance is $1000 or greater. Under most data models, making this adjustment would involve changing code in one or more application programs. Under the object-oriented model, the only change is made within the pay-interest method. The external interface to the objects remains unchanged.
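    The bank account object described above might look as follows in Python. This is a minimal sketch: the class layout is an assumption, the method is spelled pay_interest because Python names cannot contain hyphens, and the account A-900 is invented for the example.

```python
class Account:
    """Bank account object with two instance variables and one method."""

    def __init__(self, account_number, balance):
        self.account_number = account_number
        self.balance = balance

    def pay_interest(self):
        """Add interest: 5 percent below $1000, 6 percent at $1000 or more."""
        rate_percent = 5 if self.balance < 1000 else 6
        self.balance += self.balance * rate_percent / 100


small = Account("A-101", 500)
large = Account("A-900", 2000)  # hypothetical account for the higher rate

# Callers only send the pay_interest message; the rate policy stays inside.
small.pay_interest()
large.pay_interest()
print(small.balance, large.balance)  # 525.0 2120.0
```

    Under the old flat 6 percent policy the body of pay_interest would have been a single line; changing the policy touches only that method, and the calls above stay exactly the same.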

    Figure 2: Class hierarchy

    Unlike entities in the E-R model, each object has its own unique identity, independent of the values that it contains. Thus, two objects containing the same values are nevertheless distinct. The distinction among individual objects is maintained at the physical level through the assignment of distinct object identifiers.
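    Python objects behave in a comparable way, which makes the distinction between equal values and identical objects easy to observe. In the sketch below, Python's built-in id() plays a role loosely analogous to an object identifier; that analogy, and the Account class itself, are assumptions made for the illustration.

```python
class Account:
    def __init__(self, account_number, balance):
        self.account_number = account_number
        self.balance = balance

    def __eq__(self, other):
        """Value comparison: same account number and same balance."""
        return (self.account_number, self.balance) == (
            other.account_number, other.balance)


first = Account("A-101", 500)
second = Account("A-101", 500)

same_values = first == second           # the contents match
same_object = first is second           # but they are two distinct objects
distinct_ids = id(first) != id(second)  # each live object has its own id
print(same_values, same_object, distinct_ids)  # True False True
```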

    Record-Based Logical Models

    Record-based logical models are used in describing data at the logical and view levels. In contrast to object-based data models, they are used both to specify the overall logical structure of the database and to provide a higher-level description of the implementation.

    Record-based models are so named because the database is structured in fixed-format records of several types. Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length. The use of fixed-length records simplifies the physical-level implementation of the database. This simplicity is in contrast to many of the object-based models, whose richer structure often leads to variable-length records at the physical level.

    The three most widely accepted record-based data models are the relational, network, and hierarchical models. The relational model has gained favor over the other two in recent years. The network and hierarchical models are still used in a large number of older databases.

    Here, we present a brief overview of each model.

    Relational Model

    The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Figure 3 presents a sample relational database comprising two tables: one shows bank customers, and the other shows the accounts that belong to those customers.

    Customer-name   Social-security   Customer-street   Customer-city   Account-number
    Johnson         192-83-7465       Alma              Palo Alto       A-101
    Smith           019-28-3746       North             Rye             A-215
    Hayes           677-89-9011       Main              Harrison        A-102
    Turner          182-73-6091       Putnam            Stamford        A-305
    Johnson         192-83-7465       Alma              Palo Alto       A-201
    Jones           321-12-3123       Main              Harrison        A-217
    Lindsay         336-66-9999       Park              Pittsfield      A-222
    Smith           019-28-3746       North             Rye             A-201

    Account-number   Balance
    A-101            500
    A-215            700
    A-102            400
    A-305            350
    A-201            900
    A-217            750
    A-222            700

    Figure 3. A sample relational database

    The figure shows, for example, that customer Johnson, with social-security number 192-83-7465, lives on Alma in Palo Alto and has two accounts: A-101, with a balance of $500, and A-201, with a balance of $900. Note that customers Johnson and Smith share account number A-201 (they may share a business venture).
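The two tables of Figure 3 can be loaded into an in-memory SQLite database to check these statements. This is a sketch only: SQLite, the SQL column names (underscores instead of hyphens), and the choice of column types are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_name TEXT, social_security TEXT,
    customer_street TEXT, customer_city TEXT, account_number TEXT
);
CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    ("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-101"),
    ("Smith", "019-28-3746", "North", "Rye", "A-215"),
    ("Hayes", "677-89-9011", "Main", "Harrison", "A-102"),
    ("Turner", "182-73-6091", "Putnam", "Stamford", "A-305"),
    ("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-201"),
    ("Jones", "321-12-3123", "Main", "Harrison", "A-217"),
    ("Lindsay", "336-66-9999", "Park", "Pittsfield", "A-222"),
    ("Smith", "019-28-3746", "North", "Rye", "A-201"),
])
conn.executemany("INSERT INTO account VALUES (?, ?)", [
    ("A-101", 500), ("A-215", 700), ("A-102", 400), ("A-305", 350),
    ("A-201", 900), ("A-217", 750), ("A-222", 700),
])

# The two tables are related only by the values in account_number.
johnson_accounts = conn.execute("""
    SELECT c.account_number, a.balance
    FROM customer AS c
    JOIN account AS a ON c.account_number = a.account_number
    WHERE c.customer_name = 'Johnson'
    ORDER BY c.account_number
""").fetchall()

holders_of_a201 = [row[0] for row in conn.execute(
    "SELECT customer_name FROM customer "
    "WHERE account_number = 'A-201' ORDER BY customer_name")]
```

The join matches rows purely on the shared account number; no pointer from a customer row to an account row is stored anywhere.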

    Network Model

    Data in the network model are represented by collections of records (in the Pascal sense), and relationships among data are represented by links, which can be viewed as pointers. The records in the database are organized as collections of arbitrary graphs. Figure 4 presents a sample network database using the same information as in Figure 3.

    Figure 4. Network Model
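A rough way to picture records connected by links is the following Python sketch, which uses part of the data from Figure 3. The Record class and its link method are hypothetical names introduced for illustration; a real network DBMS stores and navigates links quite differently.

```python
class Record:
    def __init__(self, **fields):
        self.fields = fields
        self.links = []          # pointers to related records

    def link(self, other):
        # In this sketch a link can be followed in both directions.
        self.links.append(other)
        other.links.append(self)


johnson = Record(name="Johnson", street="Alma", city="Palo Alto")
smith = Record(name="Smith", street="North", city="Rye")
a101 = Record(number="A-101", balance=500)
a201 = Record(number="A-201", balance=900)
a215 = Record(number="A-215", balance=700)

johnson.link(a101)
johnson.link(a201)
smith.link(a215)
smith.link(a201)             # the shared account is one record with two links

johnson_balances = [acct.fields["balance"] for acct in johnson.links]
a201_owners = sorted(c.fields["name"] for c in a201.links)
```

Because a record may be linked from any number of other records, the structure is a general graph rather than a tree.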

    Hierarchical Model

    The hierarchical model is similar to the network model in the sense that data and relationships among data are represented by records and links, respectively. It differs from the network model in that the records are organized as collections of trees rather than arbitrary graphs. Figure 5 presents a sample hierarchical database with the same information as in Figure 4.

    Figure 5. A sample hierarchical database.
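The restriction to trees has a practical consequence that a short sketch makes visible: each record has exactly one parent, so a record that belongs to two parents has to appear twice. The Node class and the dummy root below are assumptions for illustration, and duplicating the shared account is one common way of handling it, not necessarily the layout of Figure 5.

```python
class Node:
    def __init__(self, **fields):
        self.fields = fields
        self.children = []       # a node is reachable from one parent only

    def add_child(self, child):
        self.children.append(child)
        return child


root = Node(name="root")         # dummy root so the data forms a single tree
johnson = root.add_child(Node(name="Johnson"))
smith = root.add_child(Node(name="Smith"))
johnson.add_child(Node(number="A-101", balance=500))
johnson.add_child(Node(number="A-201", balance=900))
smith.add_child(Node(number="A-215", balance=700))
smith.add_child(Node(number="A-201", balance=900))   # second copy of A-201


def count_records(node, number):
    """Count how many nodes in the tree carry the given account number."""
    own = 1 if node.fields.get("number") == number else 0
    return own + sum(count_records(c, number) for c in node.children)


copies_of_a201 = count_records(root, "A-201")
```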

    Difference Among the Models

    The relational model differs from the network and hierarchical models in that it does not use pointers or links. Instead, the relational model relates records by the values that they contain. This freedom from the use of pointers allows a formal mathematical foundation to be defined.

    Physical Data Models

    Physical data models are used to describe data at the lowest level. In contrast to logical data models, there are few physical data models in use. Two of the widely known ones are the unifying model and the frame-memory model.

    Physical data models capture aspects of database-system implementation that are not covered in this book.

    DATABASE LANGUAGES

    A database system provides two different types of languages: one to specify the database schema, and the other to express database queries and updates.

    Data-Definition Language

    A database schema is specified by a set of definitions expressed in a special language called a data-definition language (DDL). The result of compilation of DDL statements is a set of tables that is stored in a special file called the data dictionary, or data directory.

    A data dictionary is a file that contains metadata, that is, data about data. This file is consulted before actual data are read or modified in the database system.

    The storage structure and access methods used by the database system are specified by a set of definitions in a special type of DDL called a data storage and definition language. The result of compilation of these definitions is a set of instructions that specify the implementation details of the database schemas, details that are usually hidden from the users.
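As one concrete illustration, SQLite keeps its catalog in a table named sqlite_master, and the statement PRAGMA table_info reports the columns of a table. The sketch below issues a single DDL statement and then reads the metadata back; the account table is the example from this unit, and the choice of SQLite is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement: it defines structure and stores no account data.
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")

# The catalog records what was defined (data about data).
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'account'").fetchall()

# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2])
           for row in conn.execute("PRAGMA table_info(account)")]
```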

    Data-Manipulation Language

    The levels of abstraction apply not only to the definition or structuring of data, but also to the manipulation of data. By data manipulation, we mean:

    The retrieval of information stored in the database

    The insertion of new information into the database

    The deletion of information from the database

    The modification of information stored in the database

    At the physical level, we must define algorithms that allow efficient access to data. At higher levels of abstraction, we emphasize ease of use. The goal is to provide efficient human interaction with the system.

    A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. There are basically two types:

    Procedural DMLs require a user to specify what data are needed and how to get those data.

    Nonprocedural DMLs require a user to specify what data are needed without specifying how to get those data.

    A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language. Although technically incorrect, it is common practice to use the terms query language and data-manipulation language synonymously.
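The difference between the procedural and nonprocedural styles can be illustrated with the same request written both ways: find the accounts with a balance of at least $500. The data comes from Figure 3; using a Python loop to stand in for a procedural DML and SQL (through SQLite) for a nonprocedural one is an assumption made for the sake of the example.

```python
import sqlite3

accounts = [("A-101", 500), ("A-215", 700), ("A-102", 400), ("A-305", 350)]

# Procedural style: the user states what is wanted AND how to get it
# (scan every record, test it, collect the matches, sort them).
procedural = []
for number, balance in accounts:
    if balance >= 500:
        procedural.append(number)
procedural.sort()

# Nonprocedural style: the user states only what is wanted;
# the system decides how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", accounts)
declarative = [row[0] for row in conn.execute(
    "SELECT account_number FROM account "
    "WHERE balance >= 500 ORDER BY account_number")]
```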

    INSTANCES AND SCHEMAS

    Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Schemas are changed infrequently, if at all.

    Analogies to the concepts of data types, variables, and values in programming languages are useful here. Consider the customer record type definition given under Data Abstraction: in declaring the type customer, we have not declared any variables. To declare such a variable in a Pascal-like language, we write

    var customer2 : customer;

    Variable customer2 now corresponds to an area of storage containing a customer type record.

    A database schema corresponds to the programming-language type definition. A variable of a given type has a particular value at a given instant. Thus, the value of a variable in programming languages corresponds to an instance of a database schema. In other words, the description of a database is called the database schema, which is specified during database design and is not expected to change frequently. A displayed schema is called a schema diagram.

    E.g., a schema diagram for student and course records:

    Student
    Name | Roll No | Class | Major

    Course
    Course No | Department

    A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints. Other aspects are not specified in the schema diagram. For example, the above diagram shows neither the data type of each data item nor the relationships among the various files.
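The analogy between a type definition and a schema, and between a variable's value and an instance, can be made concrete with a short Python sketch. The Customer dataclass mirrors the customer record type used in this unit; the two snapshots and their names are assumptions introduced for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Customer:                  # plays the role of the schema
    customer_name: str
    social_security: str
    customer_street: str
    customer_city: str


schema_before = [f.name for f in fields(Customer)]

# Two instances of the database: its contents at two different moments.
instance_monday = [Customer("Johnson", "192-83-7465", "Alma", "Palo Alto")]
instance_tuesday = instance_monday + [
    Customer("Smith", "019-28-3746", "North", "Rye")]

schema_after = [f.name for f in fields(Customer)]
```

The contents changed between the two snapshots while the description of a customer record stayed the same.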

    DATA ABSTRACTION

    For the system to be usable, it must retrieve data efficiently. This concern has led to the design of complex data structures for the representation of data in the database. Since many database-system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users' interactions with the system:

    Physical level. The lowest level of abstraction describes how the data are actually stored. At the physical level, complex low-level data structures are described in detail.

    Logical level. The next higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The entire database is thus described in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information is to be kept in the database, use the logical level of abstraction.

    View level. The highest level of abstraction describes only part of the entire database. Despite the use of simpler structures at the logical level, some complexity remains, because of the large size of the database. Many users of the database system will not be concerned with all this information. Instead, such users need to access only a part of the database. So that their interaction with the system is simplified, the view level of abstraction is defined. The system may provide many views for the same database.

    The interrelationship among these three levels of abstraction is illustrated in the figure given below.

    The Three Levels of Data Abstraction

    An analogy to the concept of data types in programming languages may clarify the distinction among levels of abstraction. Most high-level programming languages support the notion of a record type. For example, in a Pascal-like language, we may declare a record as follows:

    type customer = record
        customer-name : string;
        social-security : string;
        customer-street : string;
        customer-city : string;
    end

    This code defines a new record called customer with four fields. Each field has a name and a type associated with it. A banking enterprise may have several such record types, including

    Account, with fields account-number and balance

    Employee, with fields employee-name and salary

    At the physical level, a customer, account, or employee record can be described as a block of consecutive storage locations (for example, words or bytes). The language compiler hides this level of detail from programmers. Similarly, the database system hides many of the lowest-level storage details from database programmers. Database administrators may be aware of certain details of the physical organization of the data.

    At the logical level, each such record is described by a type definition, as illustrated in the previous code segment, and the interrelationship among these record types is defined. Programmers using a programming language work at this level of abstraction. Similarly, database administrators usually work at this level of abstraction.

    Finally, at the view level, computer users see a set of application programs that hide details of the data types. Similarly, at the view level, several views of the database are defined, and database users see these views. In addition to hiding details of the logical level of the database, the views also provide a security mechanism to prevent users from accessing certain parts of the database. For example, tellers in a bank see only that part of the database that has information on customer accounts; they cannot access information concerning salaries of employees.
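The teller example can be sketched with an SQL view. The view name teller_view and the employee row are hypothetical, and SQLite is used only because it is readily available. Note that SQLite has no per-user permissions, so this sketch shows only how a view limits what is exposed; a full DBMS would combine the view with an authorization rule that denies tellers access to the underlying tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_number TEXT, balance INTEGER);
CREATE TABLE employee (employee_name TEXT, salary INTEGER);
INSERT INTO account VALUES ('A-101', 500), ('A-201', 900);
INSERT INTO employee VALUES ('Lee', 40000);

-- The view exposes account information only; salaries are not part of it.
CREATE VIEW teller_view AS
    SELECT account_number, balance FROM account;
""")

cursor = conn.execute("SELECT * FROM teller_view ORDER BY account_number")
teller_columns = [d[0] for d in cursor.description]
teller_rows = cursor.fetchall()
```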

    OVERALL SYSTEM STRUCTURE

    A database system is partitioned into modules that deal with each of the responsibilities of the overall system. Some of the functions of the database system may be provided by the computer's operating system. In most cases, the computer's operating system provides only the most basic services, and the database system must build on that base. Thus, the design of a database system must include consideration of the interface between the database system and the operating system.

    The functional components of a database system can be broadly divided into query processor components and storage manager components. The query processor components include:

    DML compiler, which translates DML statements in a query language into low-level instructions that the query evaluation engine understands. In addition, the DML compiler attempts to transform a user's request into an equivalent but more efficient form, thus finding a good strategy for executing the query.

    Embedded DML precompiler, which converts DML statements embedded in an application program to normal procedure calls in the host language. The precompiler must interact with the DML compiler to generate the appropriate code.

    DDL interpreter, which interprets DDL statements and records them in a set of tables containing metadata.

    Query evaluation engine, which executes low-level instructions generated by the DML compiler.

    The storage manager components provide the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The storage manager components include:

    Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.

    Transaction manager, which ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.

    File manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.

    Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding what data to cache in memory.

    In addition, several data structures are required as part of the physical system implementation:

    Data files, which store the database itself.

    Data dictionary, which stores metadata about the structure of the database. The data dictionary is used heavily. Therefore, great emphasis should be placed on developing a good design and efficient implementation of the dictionary.

    Indices, which provide fast access to data items that hold particular values.

    Statistical data, which store statistical information about the data in the database. This information is used by the query processor to select efficient ways to execute a query.
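The role of an index can be observed in SQLite, whose EXPLAIN QUERY PLAN statement reports how a query will be evaluated. The sketch below is illustrative only: the index name idx_account_balance is hypothetical, and the exact wording of the plan differs between SQLite versions, so the code checks only whether the index name appears in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-215", 700), ("A-222", 700)])

# An index gives fast access to rows that hold a particular balance.
conn.execute("CREATE INDEX idx_account_balance ON account (balance)")

query = "SELECT account_number FROM account WHERE balance = 700"

# The last column of each plan row is a human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
uses_index = "idx_account_balance" in plan_text

matches = sorted(row[0] for row in conn.execute(query))
```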

    Figure 6 shows these components and the connections among them.

    [Figure 6: diagram not reproduced. Labels recovered from the figure: users (naive users such as tellers and agents, application programmers, sophisticated users, database administrator); their inputs (application interfaces, application programs, query, database scheme); the database management system, consisting of the query processor (application programs object code, embedded DML precompiler, DML compiler, DDL interpreter, query evaluation engine) and the storage manager (transaction manager, buffer manager, file manager); disk storage (data files, indices, statistical data, data dictionary).]

    DATA INDEPENDENCE

    The ability to modify a schema definition in one level without affecting a schema definition in the next higher level is called data independence. There are two levels of data independence:

    1. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance.

    2. Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten. Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to a banking system).

    Logical data independence is more difficult to achieve than is physical data independence, since application programs are heavily dependent on the logical structure of the data that they access.

    The concept of data independence is similar in many respects to the concept of abstract data types in modern programming languages. Both hide implementation details from the users, to allow users to concentrate on the general structure, rather than on low-level implementation details.
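Both kinds of independence can be sketched with SQLite. In the example below, the function balances_over stands for an application program; it is run unchanged after a physical change (adding an index) and after a logical change (the base table is renamed and gains a column, with a view restoring the old shape). All names other than account and balance are hypothetical, and using a view is just one possible way of preserving the old logical structure.

```python
import sqlite3


def balances_over(conn, threshold):
    """Application code: it relies only on the names account and balance."""
    return sorted(row[0] for row in conn.execute(
        "SELECT account_number FROM account WHERE balance > ?", (threshold,)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-102", 400), ("A-201", 900)])
before = balances_over(conn, 450)

# Physical change: a new access path; the logical schema is untouched.
conn.execute("CREATE INDEX idx_balance ON account (balance)")
after_physical = balances_over(conn, 450)

# Logical change: the stored table gets a new name and an extra column,
# and a view named account presents the structure the program expects.
conn.execute("ALTER TABLE account RENAME TO account_base")
conn.execute(
    "ALTER TABLE account_base ADD COLUMN account_type TEXT DEFAULT 'savings'")
conn.execute(
    "CREATE VIEW account AS SELECT account_number, balance FROM account_base")
after_logical = balances_over(conn, 450)
```

The logical change needed extra work (the view) to keep the program running, which hints at why logical data independence is the harder of the two to achieve.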

    DATABASE USERS

    The primary goal of a database system is to provide an environment for retrieving information from and storing new information into the database. There are four different types of database system users, differentiated by the way that they expect to interact with the system.

    Application programmers are computer professionals who interact with the system through DML (Data Manipulation Language) calls, which are embedded in a program written in a host language (for example, Cobol, PL/I, Pascal, C). These programs are commonly referred to as application programs. For example, a banking system includes programs that generate payroll checks, that debit accounts, that credit accounts, or that transfer funds between accounts.

    Sophisticated users: Such users interact with the system without writing programs. Instead, they form their requests in a database query language. Each such query is submitted to a query processor whose function is to break down DML statements into instructions that the storage manager understands. Analysts who submit queries to explore data in the database fall in this category.

    Specialized users: Such users write specialized database applications that do not fit into the traditional data-processing framework, for example, computer-aided design systems, knowledge-base and expert systems, and systems that store data with complex data types (for example, graphics data and audio data).

    Naive users: These are unsophisticated users who interact with the system by invoking one of the permanent application programs that have been written. For example, a bank teller who needs to transfer $50 from account A to account B invokes a program called transfer. This program asks the teller for the amount of money to be transferred, the account from which the money is to be transferred, and the account to which the money is to be transferred.
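A program such as transfer might look like the following sketch, written against SQLite. The table layout, the starting balances, and the rule that a balance may not become negative are assumptions added for illustration. The connection is used as a context manager so that both updates are committed together or, if either fails, neither takes effect.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move amount from source to target; both updates happen or neither."""
    try:
        with conn:   # commits on success, rolls back if an exception occurs
            conn.execute(
                "UPDATE account SET balance = balance + ? "
                "WHERE account_number = ?", (amount, target))
            conn.execute(
                "UPDATE account SET balance = balance - ? "
                "WHERE account_number = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit made
        # earlier in the same transaction has been rolled back.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

accepted = transfer(conn, "A", "B", 50)     # within A's balance
rejected = transfer(conn, "A", "B", 500)    # would overdraw A
balances = dict(conn.execute("SELECT account_number, balance FROM account"))
```

The teller supplies only the three inputs named in the text; the SQL statements and the transaction handling stay hidden inside the program.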

    THE DATABASE SYSTEM ENVIRONMENT

    A DBMS is a complex software system. In this section we discuss the types of software components that constitute a DBMS and the types of computer system software with which the DBMS interacts.

    DBMS Component Modules

    Figure 7 illustrates, in a simplified form, the typical DBMS components. The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the operating system (OS), which schedules disk input/output. A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on disk, whether it is part of the database or the catalog. The dotted lines and circles marked A, B, C, D, and E in Figure 7 illustrate accesses that are under the control of this stored data manager. The stored data manager may use basic OS services for carrying out low-level data transfer between the disk and computer main storage, but it controls other aspects of data transfer, such as handling buffers in main memory. Once the data is in main memory buffers, it can be processed by other DBMS modules, as well as by application programs.

    [Figure 7: diagram not reproduced. Labels recovered from the figure: users (DBA staff, casual users, application programmers, parametric users); inputs (DDL statements, privileged commands, interactive query, application programs, compiled (canned) transactions); modules (DDL compiler, query compiler, precompiler, DML compiler, host language compiler, run-time database processor, stored data manager, concurrency control/backup/recovery subsystems); storage (system catalog/data dictionary, stored database).]

    Figure 7. Typical component modules of a DBMS. Dotted lines show accesses that are under the control of the stored data manager.

    The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes information such as the names of files, data items, storage details of each file, mapping information among schemas, and constraints, in addition to many other types of information that are needed by the DBMS modules. DBMS software modules then look up the catalog information as needed.

    The run-time database processor handles database accesses at run time; it receives retrieval or update operations and carries them out on the database. Access to disk goes through the stored data manager. The query compiler handles high-level queries that are entered interactively. It parses, analyzes, and compiles or interprets a query by creating database access code, and then generates calls to the run-time processor for executing the code.

    The pre-compiler extracts DML commands from an application program written in a host programming language. These commands are sent to the DML compiler for compilation into object code for database access. The rest of the program is sent to the host language compiler. The object codes for the DML commands and the rest of the program are linked, forming a canned transaction whose executable code includes calls to the run-time database processor.

    Figure 7 is not meant to describe a specific DBMS; rather, it illustrates typical DBMS modules. The DBMS interacts with the operating system when disk accesses (to the database or to the catalog) are needed. If the computer system is shared by many users, the OS will schedule DBMS disk access requests and DBMS processing along with other processes. The DBMS also interfaces with compilers for general-purpose host programming languages. User-friendly interfaces to the DBMS can be provided to help any of the user types shown in Figure 7 to specify their requests.

    CLASSIFICATION OF DATABASE MANAGEMENT SYSTEM

    Several criteria are normally used to classify DBMSs. The first is the data model on which the DBMS is based. The two types of data models used in many current commercial DBMSs are the relational data model and the object data model. Many legacy applications still run on database systems based on the hierarchical and network data models. The relational DBMSs are evolving continuously and, in particular, have been incorporating many of the concepts that were developed in object databases. This has led to a new class of DBMSs that are being called object-relational DBMSs. We can hence categorize DBMSs based on the data model: relational, object, object-relational, hierarchical, network, and other.

    A DBMS is centralized if the data is stored at a single computer site. A centralized DBMS can support multiple users, but the DBMS and the database themselves reside totally at a single computer site. A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites, connected by a computer network. Homogeneous DDBMSs use the same DBMS software at multiple sites. A recent trend is to develop software to access several autonomous preexisting databases stored under heterogeneous DBMSs. This leads to a federated DBMS (or multidatabase system), where the participating DBMSs are loosely coupled and have a degree of local autonomy. Many DDBMSs use a client-server architecture.

    Another criterion is the cost of the DBMS. The majority of DBMS packages cost between $10,000 and $100,000. Single-user low-end systems that work with microcomputers cost between $100 and $3000. At the other end, a few elaborate packages cost more than $100,000.

    We can also classify a DBMS on the basis of the types of access path options for storing files. One well-known family of DBMSs is based on inverted file structures. Finally, a DBMS can be general-purpose or special-purpose. When performance is a primary consideration, a special-purpose DBMS can be designed and built for a specific application; such a system cannot be used for other applications without major changes. Many airline reservations and telephone directory systems developed in the past are special-purpose DBMSs. These fall into the category of on-line transaction processing (OLTP) systems, which must support a large number of concurrent transactions without imposing excessive delays.

    DATABASE SYSTEM UTILITIES

    In addition to possessing the software modules just described, most DBMSs have database utilities that help the DBA in managing the database system. Common utilities have the following types of functions:

    1. Loading: A loading utility is used to load existing data files (such as text files or sequential files) into the database. Usually, the current (source) format of the data file and the desired (target) database file structure are specified to the utility, which then automatically reformats the data and stores it in the database. With the proliferation of DBMSs, transferring data from one DBMS to another is becoming common in many organizations. Some vendors are offering products that generate the appropriate loading programs, given the existing source and target database storage descriptions (internal schemas). Such tools are also called conversion tools.

    2. Backup: A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape. The backup copy can be used to restore the database in case of catastrophic failure. Incremental backups are also often used, where only changes since the previous backup are recorded. Incremental backup is more complex, but it saves space.

    3. File reorganization: This utility can be used to reorganize a database file into a different file organization to improve performance.

    4. Performance monitoring: Such a utility monitors database usage and provides statistics to the DBA. The DBA uses the statistics in making decisions such as whether or not to reorganize files to improve performance.
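The loading and backup functions in the list above can be imitated on a small scale with Python's standard library. The sketch below loads a text file in comma-separated format into a table and then takes a full backup copy. The file contents are made up for the example, and in-memory databases stand in for disk and tape. The backup method of sqlite3 connections is available from Python 3.7 onward; it copies the whole database and is not an incremental backup.

```python
import csv
import io
import sqlite3

# Hypothetical source data file in text (comma-separated) format.
source_text = "account_number,balance\nA-101,500\nA-215,700\n"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

# Loading: read the source format and convert it to the target structure.
reader = csv.DictReader(io.StringIO(source_text))
rows = [(r["account_number"], int(r["balance"])) for r in reader]
db.executemany("INSERT INTO account VALUES (?, ?)", rows)
db.commit()

# Backup: copy the entire database into a second database.
backup_db = sqlite3.connect(":memory:")
db.backup(backup_db)

restored = backup_db.execute(
    "SELECT account_number, balance FROM account "
    "ORDER BY account_number").fetchall()
```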

    Other utilities may be available for sorting files, handling data compression, monitoring access by users, and performing other functions.