Data Base Systems


  • 8/12/2019 Data Base Systems

    Data base systems

    INTRODUCTION

    The primary memory of a computer is limited, so programs and data are removed from primary memory once their use is over. These programs and data are organised into files for permanent storage on a secondary storage device so that they can be reused. The files are structured in a particular way depending on the type of access required and the medium on which they are stored. If the data require quick access, they are stored on disk; if they require only serial processing, they are usually stored on tape.

    A file is made up of a number of records. Each record is a group of fields, and each field is made up of some bits of data. Each file is given a name to identify it. The name generally consists of two parts: a single-word name and a three-letter extension that indicates the type of file, for instance .COB and .PRG for program files and .DBF and .DAT for data files. For example, in stock.dat, stock is the first part of the file name and .dat is the extension.

    A file holds records of logically similar data. Each record consists of a set of fields, and each field holds data of a defined nature: a date field holds only dates, a name field holds only names, and so on. Computer files are organised on physical storage devices such as magnetic tape, disk and CD-ROM.

    Data and Information

    Data are the result of measurements of various attributes of entities such as a product, student, inventory item or employee. The measurements may be recorded in alphabetical, numerical, image, voice or other forms. Thus the raw, unanalysed numbers and facts about entities constitute data. Information, on the other hand, results from data when they are organised or structured in some meaningful way. The processed data have to be placed in a context for them to acquire meaning and relevance. Relevance in turn adds to the value of information in decisions and actions. Data processing requires some infusion of intelligence (meaning, purpose and usefulness) into data to generate information. This intelligence may take the form of principles, knowledge, experience and intuition applied to convert data into information.

    Definition of Information

    The term 'information' is a very common word and it conveys some meaning to the recipient. It is very difficult to define comprehensively. Yet Davis and Olson [1] give a fairly good definition. They define information as "data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in current or prospective actions or decisions".

    This implies that information is:
    - Processed data
    - It has a form

    - It is meaningful to the recipient
    - It has a value, and
    - It is useful in current or prospective decisions or actions.

    Differences between data and information

    Though the words 'data' and 'information' are often used interchangeably, there is a clear distinction between the two. Some of the major differences are as follows:
    - Data are facts, but information, though based on data, is not fact.
    - Though information arises from data, not all data become information. There is a lot of selective filtering of data before they are processed into information.
    - Data are the result of routine recording of events and activities taking place. Generation of information is user-driven and is not always automatic.
    - Data are independent of users, whereas information is user-dependent. Most information reports are designed to meet the anticipated information needs of a user or a group of users. That is, information for one user is very likely to be data for other users.

    Field, Record and File

    A file is a collection of related records. A record is made up of a number of fields that hold data items. Each field is made up of a number of storage spaces, each of which can hold a byte of information. A collection of logically related files forms a database. It usually contains quite a few files holding data, which can be accessed by many users.

    In a student file, roll no, name, sex and address are the field names. Each field reserves some space for storage of the respective data. For example, Roll No has 7 bytes of storage space, Name has 30 bytes, and so on. The Roll No field holds the data items 9501101, 9501105 and 9501112 as roll numbers of students. ARUN GOKUL, RAJESH KUMAR etc. are data items in the name field. Each line of fields relates to one entity: a student. Attributes of the student entity such as roll no, sex and address become the field names.

    Data fields hold the basic elements of data. All attributes of an entity taken together form a record. When such related records are put together, the collection is called a file. Record design can be logical or physical. Logical design represents the logical relationship among the data items in the fields. Physical record design means the way data items are physically stored on some medium such as disk or tape.
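
    The idea of a physical record design can be sketched with Python's struct module, which lays fields out as fixed-width byte blocks. The 7-byte Roll No and 30-byte Name widths come from the text; the 1-byte Sex and 40-byte Address widths, and the sample sex and address values, are assumptions made only for illustration.

```python
import struct

# Roll No (7 bytes) and Name (30 bytes) follow the text; the 1-byte Sex
# and 40-byte Address widths are assumptions made for this sketch.
RECORD = struct.Struct("7s30s1s40s")

def pack_student(roll_no, name, sex, address):
    """Encode one student record as a fixed-length block of bytes."""
    return RECORD.pack(roll_no.encode("ascii"), name.encode("ascii"),
                       sex.encode("ascii"), address.encode("ascii"))

def unpack_student(block):
    """Decode a block back into its four fields, dropping the padding."""
    return tuple(f.rstrip(b"\x00").decode("ascii") for f in RECORD.unpack(block))

# Sex and address values below are hypothetical sample data.
block = pack_student("9501101", "ARUN GOKUL", "M", "CHENNAI")
print(RECORD.size)            # total bytes reserved per record
print(unpack_student(block))  # the logical view of the same record
```

    Because every record occupies the same number of bytes, the position of the n-th record in a file can be computed directly, which is what the access methods described later rely on.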

    File Organisation

    File organisation means the way the records are written in a file. It depends on:
    (i) File activity,
    (ii) Volatility of information, and
    (iii) Storage device

    File activity means the proportion of records processed in one run. If only a few records are accessed in a single run, activity is low, and the file can be stored on a disk device for efficient processing. On the other hand, if a good number of records are accessed in a given run, file activity is high and such files can be stored on tape so that processing is more efficient and less costly.

    File volatility means the proportion of records changed. If records are changed very frequently, volatility is high. For high-volatility files such as seat reservation files in a transport firm, the disk medium is more efficient and offers direct access. If only magnetic tapes are available, files are organised sequentially. Magnetic disks, on the other hand, offer more flexibility as they support both sequential access and direct access.

    Other considerations in file organisation are:

    (i) Response time; direct access for quick response
    (ii) Cost of storage medium
    (iii) Volume of storage, and
    (iv) Security of data

    Methods of File Organisation

    1) Serial file organisation
    2) Sequential organisation
    3) Indexed sequential organisation
    4) Direct file organisation

    1. Serial file organisation

    The records in a serial file are stored in no particular order and are generally appended at the end of the file as the data originate. The logical order of records with respect to a key field bears no relation to the order in which the records are physically stored in the file. It is also referred to as a non-keyed sequential file.

    2. Sequential file organisation

    This file can be created on a magnetic tape or disk. Each record is written to the tape or disk one by one, logically ordered on one or more key fields. For example, ordering can be in ascending order of roll no in the case of a student file.

    The records are stored in sorted order. If new records are added or existing records are deleted, a disk file has to be re-sorted. If the file is stored on magnetic tape, a new file has to be created that applies to the existing file the changes made since its creation or last update. This is done to maintain the proper sequence of the records in the file. The advantages of a sequential file are simple organisation and ease of accessing records sequentially.

    To minimise the cost of update, the new records are batched in a transaction file, and the master file (that is, the original file, which is relatively permanent) is updated in a single run, leading to the creation of a new master file. This file update is called grandfather-father-son update, as there will be three generations of files at any time.

    3. Indexed-sequential file organisation

    An index is a combination of key and storage address of records. This file organisation creates an index file in addition to the data file. The index file holds pairs of key and storage address for the records in the data file. It helps in locating records in the data file at random, as the physical storage location of a record is obtained from the index file. This file organisation supports both sequential access and random access of records in the file.
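
    A minimal sketch of the index idea follows, assuming fixed-length records of 37 bytes (7 for the roll number, 30 for the name) and an in-memory buffer standing in for the disk file. The function names are hypothetical, and a real indexed-sequential file would also keep the index itself on disk.

```python
import io

RECORD_LEN = 37  # 7-byte roll number + 30-byte name (widths assumed)

def write_records(data_file, students):
    """Write records in key order and return the index: key -> byte offset."""
    index = {}
    for roll_no, name in sorted(students):
        index[roll_no] = data_file.tell()
        data_file.write(f"{roll_no:<7}{name:<30}".encode("ascii"))
    return index

def read_random(data_file, index, roll_no):
    """Random access: look the address up in the index, then seek to it."""
    data_file.seek(index[roll_no])
    return data_file.read(RECORD_LEN).decode("ascii")[7:].rstrip()

def read_sequential(data_file):
    """Sequential access: read every record from the start of the file."""
    data_file.seek(0)
    while (block := data_file.read(RECORD_LEN)):
        text = block.decode("ascii")
        yield text[:7], text[7:].rstrip()

data_file = io.BytesIO()  # stands in for a disk file
index = write_records(data_file, [("9501105", "RAJESH KUMAR"),
                                  ("9501101", "ARUN GOKUL")])
print(read_random(data_file, index, "9501105"))
print(list(read_sequential(data_file)))
```

    The same data file serves both access modes: the index gives the address for a single lookup, while the sorted physical order supports a straight read from start to end.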

    4. Direct file organisation

    These files are created on disks or CD-ROMs. In direct file organisation a hashing technique is used to generate the storage address of records in the file. There are quite a number of ways of converting a key (such as roll no for a student file, or product code for an inventory file) to a numeric value. The keys may be numeric, alphabetic or alphanumeric. In the case of alphabetic and alphanumeric keys, a numeric key value has to be generated. Direct mapping is done by performing some arithmetic manipulation of the key value, called hashing. The hashing function, h(k), generates a value for each key, which is used as the address of the storage location.

    A direct file supports direct access to records and minimises their access time. The records need not be sorted before storage, as they must be in an indexed-sequential file.
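
    One common choice for h(k) is the division-remainder method, sketched below. The text does not name a particular hash function or say how two keys that hash to the same address are handled, so the 11-slot table, the character-code conversion for non-numeric keys and the linear probing used here are all assumptions for illustration.

```python
BUCKETS = 11                    # number of storage locations (assumed)
storage = [None] * BUCKETS      # stands in for the direct-access file

def h(key):
    """Division-remainder hash: turn a key into a storage address."""
    if not key.isdigit():
        # Alphabetic or alphanumeric key: derive a numeric value first.
        key = "".join(str(ord(ch)) for ch in key)
    return int(key) % BUCKETS

def store(key, record):
    """Place a record at its hashed address, probing forward on collision."""
    start = h(key)
    for step in range(BUCKETS):
        address = (start + step) % BUCKETS
        if storage[address] is None or storage[address][0] == key:
            storage[address] = (key, record)
            return address
    raise RuntimeError("file is full")

def fetch(key):
    """Recompute the address from the key and read the record directly."""
    start = h(key)
    for step in range(BUCKETS):
        slot = storage[(start + step) % BUCKETS]
        if slot is None:
            return None
        if slot[0] == key:
            return slot[1]
    return None

# The third name is a hypothetical placeholder record.
addresses = [store("9501101", "ARUN GOKUL"),
             store("9501105", "RAJESH KUMAR"),
             store("9501112", "THIRD STUDENT")]
print(addresses)
print(fetch("9501112"))
```

    No index and no sorting are needed: the key alone determines where to look, which is why access time does not depend on the record's position in the file.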

    Modes of File Access

    A computer file can be accessed in three modes: sequential, random and dynamic.

    1. Sequential Access

    To access a record sequentially, the file has to be read from the beginning, that is record 1, record 2, and so on until the required record is reached. The access time of a single record depends on where in the file the record is stored. If it is the first record in the file, it takes much less time to access than a record at the end of the file.

    2. Random Access

    This method takes the same time to access a record

    in the file wherever the record is physically located. The storage location of the record is obtained by converting its key value into a numeric location address using a hash function. The record is then located directly.

    3. Dynamic Access

    This mode combines the sequential and random modes of access. At times it may be necessary to start sequential access from a given record. For example, suppose a file holds 2000 records and records numbered 1220 to 1250 are to be processed. In this case it is better to locate record number 1220 randomly and access the remaining records sequentially.
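
    The 2000-record example can be sketched as follows, assuming fixed-length records of 8 bytes that simply hold their own record number; the in-memory buffer and the function name read_range are stand-ins chosen for this illustration.

```python
import io

RECORD_LEN = 8  # fixed record length in bytes (assumed for this sketch)

# Build a stand-in file of 2000 records numbered 1..2000.
data_file = io.BytesIO(
    b"".join(f"{n:08d}".encode("ascii") for n in range(1, 2001)))

def read_range(f, first, last):
    """Jump to record `first`, then read on sequentially up to `last`."""
    f.seek((first - 1) * RECORD_LEN)             # random step
    return [int(f.read(RECORD_LEN).decode("ascii"))
            for _ in range(first, last + 1)]     # sequential step

records = read_range(data_file, 1220, 1250)
print(records[0], records[-1], len(records))
```

    Only one seek is needed; the 1219 records before the starting point are never read.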

    File Updating

    Updating a file means making the file current by incorporating changes to the records held in it or adding new records to it. If data are very large or are likely to change only occasionally, they are held in a master file. Master files are relatively permanent and are used for referring to the data therein when required. Data arising out of day-to-day transactions change very often and are therefore held in a temporary file called a transaction file.

    The master files have to be made current by incorporating changes in data into them. This process is called file updating. There are three ways in which these changes are effected: addition of a record to the master file, deletion of a record from it, and modification of a record held in it.

    Methods of Updating Sequential Files

    Sequential files can be updated in two ways: direct updating and grandfather-father-son updating.

    Direct updating

    In direct updating, the data are processed online and files are updated directly; that is, no backup files are maintained. Direct update keeps all files current and enables real-time response. It saves disk space, as no transaction files are opened for temporary storage of data. But it is very difficult to recreate a file if it is corrupted or deleted accidentally. Deletion of records is also not possible. For direct updating, the data must be stored in random access files. Examples of random access storage devices are magnetic disks, magnetic drums and CD-ROMs.

    Grandfather-Father-Son update

    In this method two files are used as input files, and they result in the creation of a new, updated master file. The two input files are the master file requiring updating and the transaction file containing the transaction data of the period. Both files must be sorted in the same order on the same key before updating starts.

    Updating Process

    Both the master file and the transaction file are read.
    (1) The keys are then compared.

    (2) If the master file key is less than the transaction file key, no change is required. The record is copied to the new master file.
    (3) If the master file key is equal to the transaction file key, the record is to be either deleted or modified.
    (4) If the master file key is greater than the transaction file key, the transaction file record is new and is therefore copied to the new master file.
    (5) Three generations of files are always maintained, hence the name grandfather-father-son update.
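
    The key comparison in these steps can be sketched as a merge of two sorted lists. The record layout (key, data), the action names "add", "modify" and "delete", and the sample records other than the two names given earlier are assumptions made for illustration; the sketch also assumes at most one transaction per key.

```python
def update_master(master, transactions):
    """Merge a sorted master file with a sorted transaction file.

    master: list of (key, data); transactions: list of (key, action, data),
    where action is "add", "modify" or "delete" (a layout assumed here).
    Returns the new master file; both inputs are left untouched.
    """
    new_master = []
    m, t = 0, 0
    while m < len(master) and t < len(transactions):
        m_key, m_data = master[m]
        t_key, action, t_data = transactions[t]
        if m_key < t_key:
            new_master.append((m_key, m_data))    # no change: copy across
            m += 1
        elif m_key == t_key:
            if action != "delete":
                new_master.append((m_key, t_data))  # modified record
            m += 1                                # deleted record is not copied
            t += 1
        else:
            new_master.append((t_key, t_data))    # new record from transactions
            t += 1
    new_master.extend(master[m:])                 # remaining unchanged records
    new_master.extend((k, d) for k, _, d in transactions[t:])
    return new_master

father = [("9501101", "ARUN GOKUL"), ("9501105", "RAJESH KUMAR"),
          ("9501112", "OLD NAME")]
changes = [("9501103", "add", "NEW STUDENT"), ("9501105", "delete", None),
           ("9501112", "modify", "UPDATED NAME")]
son = update_master(father, changes)
print(son)
```

    Since the old master and the transaction file are only read, never overwritten, they remain available as backups, which is how the earlier generations come to be retained.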

    Indexed File Updating

    An indexed file has random access capability, so indexed files allow direct updating. Whenever any change in data takes place, the particular record is randomly accessed and updated. The disadvantage of direct updating is that no backup files are maintained, and it may be difficult to undo changes once made.

    Indexed (or indexed-sequential) file organisation keeps, in addition to the data files, an index or table that lists the address of each record on disk (namely, track and sector number) according to the contents of the key field. The key chosen must be able to identify a record uniquely. Any record in the file can be read at any time. Updating is easier with indexed files, as only those records requiring modification need be read and modified. An indexed file is highly suitable where quick response is required; for example, airline or railway reservation requires direct updating.

    Database System

    A database is a set of logically connected data files that have common access methods between them. It stores transaction data. It does not contain any input or output data. The input data may cause a change to operational data but are not part of the database. Similarly, the output data are the reports or query responses from the system. The input data and output data are transient and are not stored in the database.

    The database system gives centralised control over the database resources. The advantages of centralised control over the data are [1]:
    - Redundancy can be reduced,
    - Inconsistency can be avoided,
    - The data can be shared,
    - Standards can be enforced,
    - Security restrictions can be applied, and
    - Integrity can be maintained.

    The concept of information resource management (IRM) calls for treating information as an organisational resource. In the traditional file management system, applications owned their own data, which were not shared with other applications. Each application defined its data, created

    its file structure and stored the data so that they could be conveniently accessed by its application program. Thus applications like payroll and inventory management owned their own data. Several applications stored the same data item in many files. This caused a lot of duplication in data storage and consequent data inconsistency, as the related files were not updated simultaneously. Often application programs had to be modified to use the data files of other applications.

    A database is a centrally controlled, integrated collection of logically organised data. The central control ensures data sharing among applications and enforces database security procedures. The data items in the database are logically related, and this helps in integration of the database.

    Advantages of Database Systems

    The database system approach has the following advantages:

    Data independence

    The data are logically designed into databases and are independent of applications. Since the data are program-independent, any application can use them without any modification to the code.

    Data shareability

    A database permits simultaneous multiple access. Thus multiple users can share the same data.

    Data integrity

    Access to the database is controlled by the database management system. The system authorises personnel for entering, editing and deleting data. It also authorises people to access data for various data processing activities. Since the database stores each data item in only one place and updates it with fresh transaction data automatically, there is little chance of inconsistency in the database.

    Data availability

    The database is centrally controlled and access to data is permitted through an authorisation scheme. The data resources are therefore available to the users in the organisation, subject to the authorisation procedure.

    Data evolvability

    The database is flexible and can store a huge quantity of data. It can evolve as the number of applications and queries increases to meet their data requirements.

    Components of Database System

    The common database components are:

    Database files

    The database files store the transaction data.

    DBMS

    It is a set of programs that manages the database. It performs a number of tasks like controlling access to the database, making security checks etc.

    Host level language interface system

    This system interacts with application programs and interprets their data requests that are issued in high-level

    language.

    Natural language interface

    The DBMS needs to process queries and data requests issued to it in natural (English-like) language. The natural language interface interprets such queries and requests. It also facilitates managerial interaction with the database for decision support applications.

    Application programs

    The application programs request data from the database. Data independence permits the applications to use the data for a variety of purposes.

    Data Dictionary

    The data dictionary contains the schema of the database. It defines each data item in the database and lists its structure, source, the person authorised to modify it, etc.

    Report generator

    The system generates output for users in the form of query responses or reports. It might also produce documents like invoices and process ad hoc queries and special report requests.

    Users of Database Systems

    There are three broad classes of users of organisational database systems. They are:
    1. Application programmers, who write application programs that manipulate the data in the database.
    2. End-users, who access the database by invoking application programs or through a structured query language, and
    3. The Database Administrator, who is responsible for planning, designing, creating and maintaining the database.

    Database Management System (DBMS)

    A DBMS is a set of system programs that manages the entire database. It controls access to files. It updates files and retrieves data from them on request by applications for processing. The DBMS maintains the database by adding, deleting and modifying records. It permits multiple users to access the same files simultaneously. It acts as an interface between the application programs and the data in the database. If the user wants some data from the database, the DBMS processes the request, locates the data in the database and displays them for the user. In a traditional file management system, the user needs to specify both the data and its storage location. A DBMS requires the database to be stored on direct access storage devices.

    A DBMS is general-purpose system software. It works in conjunction with the operating system to create, process, store, retrieve, control and manage data. Its tasks include defining, constructing and manipulating databases for applications. Defining a database involves specifying data types, data structures, storage constraints etc. Constructing a database means storing the data on a storage medium under the control of the DBMS. Database manipulation includes merging databases,

    generating reports, processing queries etc.

    The three main components of a DBMS are the data definition language, the data manipulation language and the data dictionary.

    Data Definition Language

    The contents of a database are created using the data definition language. It defines relationships between different data elements and serves as an interface for application programs that use the data.

    Data Manipulation Language

    Data are processed and updated using a language called the data manipulation language. It allows a user to query the database and receive summary or customised reports. The data manipulation language is usually integrated with other programming languages, many of which are 3GLs or 4GLs.

    Each database package has its own query language with its own rules and instruction formats, so there is no single universal query language. A query language is used to access the data for report generation, query processing and other data processing activities.

    Structured Query Language (SQL) is a non-procedural language that deals with data, data integrity, data manipulation, data access, data retrieval, data query and data security. Most DBMS packages use some version of SQL, whose primary purpose is to allow users to query a database and generate ad hoc reports that provide customised information.
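
    As a small illustration of data definition, data manipulation and an ad hoc query in SQL, the sketch below uses the SQLite engine that ships with Python. The stock table, its columns and its rows are invented for the example and do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data definition: describe the structure of the data.
conn.execute("""
    CREATE TABLE stock (
        product_code TEXT PRIMARY KEY,
        name         TEXT,
        quantity     INTEGER
    )
""")

# Data manipulation: add and change the data.
conn.executemany("INSERT INTO stock VALUES (?, ?, ?)",
                 [("P001", "BOLT", 120), ("P002", "NUT", 40),
                  ("P003", "WASHER", 15)])
conn.execute("UPDATE stock SET quantity = quantity - 10 WHERE product_code = ?",
             ("P001",))

# Ad hoc query: state what is wanted, not how to find it.
low_stock = conn.execute(
    "SELECT product_code, quantity FROM stock "
    "WHERE quantity < 50 ORDER BY product_code").fetchall()
print(low_stock)
```

    The query names the rows wanted (quantity below 50) without saying how to scan the file, which is what "non-procedural" means here.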

    Data Dictionary

    The data dictionary is an electronic document that contains the data definition and data use for every data type in the database. It describes the data and their characteristics such as location, size and type. It identifies their origin, use, ownership, and methods of access and security. The DBMS uses the data dictionary to store all details of data such as data definition, data storage, data use and access privileges.

    Database Administrator (DBA)

    Organisations that implement database systems constitute a function called database administration to supervise the organisational database resources. The database administrator supervises this function. The job of the database administrator is to plan, design, create, modify and maintain the database of the organisation, with special emphasis on security and data integrity. He is not much concerned with the details of the application programs that access the database. He maintains the schema and data dictionary. Any change in the form of a data item, its creation etc. can be made only by the database administrator.

    His specific responsibilities include:
    - Guiding the initial design of the database, and later developing and extending it to meet growing organisational requirements.
    - Establishing the database and monitoring the use of

      it.
    - Deciding on the content of the database. He has to see that the relevant data are collected and stored in the database.
    - Establishing and monitoring database control and security policies and procedures.
    - Servicing database users by educating and training them in the use of the database.

    Disadvantages of Database

    The following are some of the disadvantages of database:

    Higher data processing costs

    The database system causes higher data processing costs.

    This is due to the strict and elaborate procedures for data access, updating and processing.

    Increased hardware and software costs

    It requires more direct access memory capacity, greater communication capability (including communication software), and additional processing power. This increases the hardware and software costs.

    Data insecurity and integrity

    Most of the security and integrity problems are related to the fact that many users have access rights to the database. Elaborate security systems are implemented to protect the database and to prevent unauthorised access.

    Insufficient database expertise

    Database technology is complex. Most organisations do not have enough personnel with the necessary expertise to implement and manage database systems.

    Database Architecture

    The purpose of a database is to facilitate storage of huge volumes of data and their quick retrieval. There are three basic ways of organising data in a database: hierarchical, network and relational structures.

    Hierarchical Structure

    The relationships between records form a hierarchy. The records or aggregates of data are logically conceived to be stored at different levels of the hierarchy. The structure looks like a tree turned upside down. The relation between entities is structured in such a way that each is linked to only one data item at the higher level. In a hierarchical database, the relationship between records is one of parent and child. One record can be linked to only one record at the higher level. Data stored in a lower-level node (child record) can be accessed only through the higher-level node (parent record).
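
    The one-parent rule can be sketched with a simple mapping from each child to its single parent. The company, department and employee names below are invented for illustration.

```python
# Each child names exactly one parent (all names here are invented).
parent_of = {
    "SALES": "COMPANY",
    "PRODUCTION": "COMPANY",
    "ARUN": "SALES",
    "RAJESH": "PRODUCTION",
}

def access_path(node):
    """Return the path from the root down to `node`, via its parents."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return list(reversed(path))

print(access_path("ARUN"))
```

    Because a dictionary key can map to only one value, a child cannot have two parents in this sketch, which mirrors the restriction that the network structure described next removes.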

    Network Structure

    This structure can represent more complex logical relationships. It permits multiple relations between data items: one entity can be linked to any number of other types of entities. That is, it allows many-to-many relationships among records. Any data element can be related to any number of

    other data elements.

    Relational Structure

    Relational structure is the most recent of these three structures. All data elements stored in the database are conceived to be stored in tables. Different tables are linked using a common type of data item that appears in each. The table is called a relation; the columns of the table are called attributes, each drawing its values from a domain, and the rows are called tuples. A tuple contains values of data items, called data elements, of an entity.

    Data Mining and Data Warehousing

    Large organisations have huge quantities of data in their databases, and these are still growing. Until recently, business computing technologies concentrated on data capture, storage and retrieval. But the need to interpret and find patterns in this huge volume of data is growing, and computing technologies are now making it possible. Data mining is the focus of the new class of technologies being developed to help businesses find meaning in data lying idle. Data mining helps in drawing inferences from the data and in understanding customers, products and markets better.

    Data mining employs a host of techniques; some are very old, like statistical techniques including linear programming, and others are recently developed and are known as data analysis, machine learning, online analytical processing etc. These techniques help in discovering new patterns in data.

    Huge databases have created the need for data warehousing. Data warehousing means organising large amounts of data and making them available company-wide to users. Data warehousing is an integral part of data mining. The quality and quantity of data available for data mining is a function of data warehousing. Data mining helps in identifying the preferences of customer groups and deciding on promotional material to influence their buying habits. The information can be used in product development, product customisation and target marketing. Data mining represents a new trend in the use of information technology. The focus has shifted from data storage and retrieval to data analysis for making inferences.

    Relational Database Management System (RDBMS)

    A DBMS that is based on the relational model is called an RDBMS. The relational model is the most successful of the three models. Designed by E.F. Codd, the relational model is based on the mathematical theory of sets and relations.

    The relational model represents data in the form of a table. A table is a two-dimensional array containing rows and columns. Each row contains data related to an entity such as a student. Each column contains the data related to a single attribute of the entity

    such as student name.

    One of the reasons behind the success of the relational model is its simplicity. The data are easy to understand and easy to manipulate.

    Another important advantage of the relational model, compared with the other two models, is that it does not bind data to fixed relationships between data items. Instead it allows you to have dynamic relationships between entities using the values of the columns.

    Almost all database systems sold in the market nowadays have either a complete or a partial implementation of the relational model.

    Figure 1 shows how data is represented in the relational model and the terms used to refer to the various components of a table. The following are the terms used in the relational model.

    Tuple / Row

    A single row in the table is called a tuple. Each row represents the data of a single entity.

    Attribute / Column

    A column stores an attribute of the entity. For example, if details of students are stored, then student name is an attribute, course is another attribute, and so on.

    Column Name

    Each column in the table is given a name. This name is used to refer to values in the column.

    Table Name

    Each table is given a name. This is used to refer to the

    table. The name describes the content of the table.

    The following are two other terms, primary key and foreign key, that are very important in the relational model.

    Primary Key

    A table contains data related to entities. The STUDENTS table, for example, contains data related to students. For each student there is one row in the table. Each student's data in the table must be uniquely identified. To identify each entity uniquely, we use a column in the table. The column that is used to uniquely identify entities (students) in the table is called the primary key.

    In the case of the STUDENTS table (see figure 1) we can use ROLLNO as the primary key, as it is not duplicated.

    So a primary key can be defined as a set of columns used to uniquely identify rows of a table.

    Some other examples of primary keys are the account number in a bank, the product code of a product and the employee number of an employee.

    Composite Primary Key

    In some tables a single column cannot be used to uniquely

    identify entities (rows). In that case we have to use two or more columns to uniquely identify rows of the table. When a primary key contains two or more columns it is called a composite primary key.

    In figure 2, we have the PAYMENTS table, which contains the details of payments made by the students. Each row in the table contains the roll number of the student, the payment date and the amount paid. No single column can uniquely identify rows. So we have to combine ROLLNO and DP to uniquely identify rows in the table. As the primary key consists of two columns, it is called a composite primary key.
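
    A composite primary key on ROLLNO and DP might be declared as in the sketch below, which uses the SQLite engine bundled with Python. The AMOUNT column name, the reading of DP as the date of payment, and all row values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PAYMENTS (
        ROLLNO TEXT NOT NULL,
        DP     TEXT NOT NULL,      -- date of payment (assumed meaning)
        AMOUNT INTEGER,            -- column name assumed
        PRIMARY KEY (ROLLNO, DP)   -- composite primary key
    )
""")
# Same student on two dates, and two students on the same date: all allowed.
conn.execute("INSERT INTO PAYMENTS VALUES ('9501101', '2019-01-10', 1500)")
conn.execute("INSERT INTO PAYMENTS VALUES ('9501101', '2019-02-10', 1500)")
conn.execute("INSERT INTO PAYMENTS VALUES ('9501105', '2019-01-10', 2000)")

try:
    # Same student AND same date: the key pair already exists.
    conn.execute("INSERT INTO PAYMENTS VALUES ('9501101', '2019-01-10', 500)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

    Neither column is unique on its own in this data, but the pair is, so the database accepts repeated roll numbers and repeated dates while refusing a repeated combination.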

    Figure 2: Composite Primary Key

    Foreign Key

    In the relational model, we often store data in different tables and put them together to get complete information. For example, in the PAYMENTS table we have only the ROLLNO of the student. To get the remaining information about the student we have to use the STUDENTS table. The roll number in the PAYMENTS table can be used to obtain the remaining information about the student.

    The relationship between the entities student and payment is one-to-many: one student may make payments many times. As we already have the ROLLNO column in the PAYMENTS table, it is possible to join it with the STUDENTS table and get information about the parent entity (student).

    The roll number column of the PAYMENTS table is called a foreign key, as it is used to join the PAYMENTS table with the STUDENTS table. So the foreign key is the key on the many side of the relationship.


Figure 3: Foreign Key

The ROLLNO column of the PAYMENTS table must derive its values from the ROLLNO column of the STUDENTS table.
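As a rough illustration of the foreign key, the sketch below joins PAYMENTS with STUDENTS through ROLLNO and then tries to store a payment whose roll number has no matching student. It uses Python's sqlite3 module; the column types and sample rows are assumptions, and the PRAGMA line is specific to SQLite, which checks foreign keys only when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE STUDENTS (ROLLNO INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute(
    "CREATE TABLE PAYMENTS ("
    " ROLLNO INTEGER REFERENCES STUDENTS(ROLLNO),"  # foreign key to the parent table
    " DP TEXT,"
    " AMOUNT REAL,"
    " PRIMARY KEY (ROLLNO, DP))"
)

# One student (parent) with two payments (children): one-to-many.
conn.execute("INSERT INTO STUDENTS VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO PAYMENTS VALUES (?, ?, ?)",
    [(1, "2019-01-10", 500.0), (1, "2019-02-10", 750.0)],
)

# Join the child table with the parent table through the foreign key.
rows = conn.execute(
    "SELECT s.NAME, p.DP, p.AMOUNT"
    " FROM PAYMENTS p JOIN STUDENTS s ON s.ROLLNO = p.ROLLNO"
    " ORDER BY p.DP"
).fetchall()

# A payment for a roll number that is not in STUDENTS should be rejected.
try:
    conn.execute("INSERT INTO PAYMENTS VALUES (99, '2019-03-10', 100.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

The join returns the student's name next to each payment, which is the "remaining information" that the PAYMENTS table alone does not hold.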

When a child table contains a row that doesn't refer to a corresponding parent key, the row is called an orphan record. We must not have orphan records, as they are the result of a lack of data integrity.

Integrity Rules

Data integrity must be maintained at all costs. If data loses integrity it becomes garbage. So every effort is to be made to ensure that data integrity is maintained. The following are the main integrity rules to be followed.

Domain integrity

Data is said to have domain integrity when the value of a column is drawn from the column's domain. A domain is the collection of potential values. For example, the column date of joining must be a valid date. All valid dates form one domain. If the value of date of joining is an invalid date, then it is said to violate domain integrity.
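The date of joining example can be sketched in a few lines of Python. This is only an application-side check under stated assumptions: the function name is_valid_joining_date and the sample values are invented, and a real DBMS would normally enforce the domain through the column's data type or a constraint.

```python
from datetime import date


def is_valid_joining_date(text):
    """Return True if text parses as a real calendar date (ISO format, e.g. YYYY-MM-DD)."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


# The second and third values are outside the domain of valid dates:
# February has no 30th day and there is no 13th month.
candidates = ["2019-06-15", "2019-02-30", "2019-13-01"]
accepted = [d for d in candidates if is_valid_joining_date(d)]
rejected = [d for d in candidates if not is_valid_joining_date(d)]
print(accepted)
print(rejected)
```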

Entity integrity

This specifies that all values in a primary key must be not null and unique. Each entity that is stored in the table must be uniquely identified. Every table must contain a primary key, and the primary key must be not null and unique.

Referential integrity

This specifies that a foreign key must either be null or have a value that is derived from the corresponding parent key. For example, if we have a table called BATCHES, then the ROLLNO column of that table references the ROLLNO column of the STUDENTS table. All the values of the ROLLNO column of the BATCHES table must be derived from the ROLLNO column of the STUDENTS table, because no student who is not part of the STUDENTS table can join a batch.

Relational Algebra

A set of operators used to perform operations on tables is called relational algebra. Operators in relational algebra take one or more tables as parameters and produce one table as the result.

The following are the operators in relational algebra:

Union
Intersect
Difference or minus
Project
Select
Join

Union

This takes two tables and returns all rows that belong to either the first or the second table (or both). See figure 4.

Figure 4: Union, Intersect and Minus


Intersect

This takes two tables and returns all rows that belong to both the first and the second table. See figure 4.

Difference or Minus

This takes two tables and returns all rows that exist in the first table and not in the second table. See figure 4.

Project

Takes a single table and returns a vertical subset of the table, that is, only the chosen columns. See figure 5.

Select

Takes a single table and returns a horizontal subset of the table. That means it returns only those rows that satisfy the condition. See figure 5.

Figure 5: Project, Select and Join

Join

Rows of two tables are combined based on the values of the given column(s). The tables being joined must have a common column. See figure 5.
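The six operators can be sketched in plain Python to show what each one does to a table. This is not how an RDBMS implements them. It assumes a table can be modelled as a set of rows, where each row is a tuple of (column, value) pairs; the helper table and all sample data are invented for illustration.

```python
def table(*rows):
    """Build a table: a set of rows, each row a sorted tuple of (column, value) pairs."""
    return {tuple(sorted(r.items())) for r in rows}


def union(a, b):
    """Rows that are in the first table, the second table, or both."""
    return a | b


def intersect(a, b):
    """Rows that are in both tables."""
    return a & b


def minus(a, b):
    """Rows that are in the first table and not in the second."""
    return a - b


def project(t, columns):
    """Vertical subset: keep only the named columns."""
    return {tuple((c, v) for c, v in row if c in columns) for row in t}


def select(t, predicate):
    """Horizontal subset: keep only the rows that satisfy the condition."""
    return {row for row in t if predicate(dict(row))}


def join(a, b, column):
    """Combine rows of a and b that have the same value in the common column."""
    return {
        tuple(sorted({**dict(ra), **dict(rb)}.items()))
        for ra in a
        for rb in b
        if dict(ra)[column] == dict(rb)[column]
    }


batch_a = table({"ROLLNO": 1}, {"ROLLNO": 2})
batch_b = table({"ROLLNO": 2}, {"ROLLNO": 3})
students = table({"ROLLNO": 1, "NAME": "Asha"}, {"ROLLNO": 2, "NAME": "Ravi"})
payments = table({"ROLLNO": 1, "AMOUNT": 500}, {"ROLLNO": 1, "AMOUNT": 750})

print(sorted(union(batch_a, batch_b)))
print(sorted(join(students, payments, "ROLLNO")))
```

Every function takes one or two tables and returns a table, so the results can be fed into further operators, for example project(join(students, payments, "ROLLNO"), {"NAME", "AMOUNT"}).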

Structured Query Language (SQL)

Almost all relational database management systems use SQL (Structured Query Language) for data manipulation and retrieval. SQL is the standard language for relational database systems. SQL is a non-procedural language: you concentrate on what you want, not on how to get it. To put it another way, you need not be concerned with procedural details.


SQL commands are divided into four categories, depending upon what they do:

DDL (Data Definition Language)
DML (Data Manipulation Language)
DCL (Data Control Language)
Query (Retrieving data)

DDL commands are used to define the data. For example, CREATE TABLE.
DML commands, such as INSERT and DELETE, are used to manipulate data.
DCL commands are used to control access to data. For example, GRANT.
Query is used to retrieve data using SELECT.

DML and Query are also collectively called DML, and DDL and DCL are collectively called DDL.
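The categories can be seen side by side in a short sketch using Python's sqlite3 module. The table and its rows are invented for illustration. SQLite has no user accounts, so the DCL statement is shown only as a comment, with a hypothetical user name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute("CREATE TABLE STUDENTS (ROLLNO INTEGER PRIMARY KEY, NAME TEXT)")

# DML: insert, change and delete rows.
conn.execute("INSERT INTO STUDENTS VALUES (1, 'Asha')")
conn.execute("INSERT INTO STUDENTS VALUES (2, 'Ravi')")
conn.execute("UPDATE STUDENTS SET NAME = 'Ravi K' WHERE ROLLNO = 2")
conn.execute("DELETE FROM STUDENTS WHERE ROLLNO = 1")

# Query: retrieve data.
remaining = conn.execute("SELECT ROLLNO, NAME FROM STUDENTS").fetchall()
print(remaining)

# DCL: not available in SQLite. On a server DBMS the statement has the
# general shape (some_user is a hypothetical account name):
#   GRANT SELECT ON STUDENTS TO some_user;
```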

Data Processing Methods

Stored data is processed in three different ways. Processing data means retrieving data and deriving information from it. Depending upon where and how it is done, there are three methods:

Centralized data processing
Decentralized data processing
Distributed data processing

Centralized data processing

In this method all the data is stored in one place and processed there. A mainframe is the best example of this kind of processing. All the data is stored and processed on the mainframe. All programs, invoked from clients (dumb terminals), are executed on the mainframe, and the data is also stored on the mainframe.

Figure 6: Centralized data processing

As you can see in figure 6, all terminals are attached to the mainframe. Terminals do not have any processing ability. They take input from users and send output to users.

Decentralized data processing

In this method data is processed at various places. A typical example is each department having its own system for its own data processing needs. See figure 7 for an example of decentralized data processing. Each department stores data related to itself and runs all the programs that process its data. The biggest drawback of this type of data processing is that data has to be duplicated. Because common data has to be stored on each machine, this is called redundancy. Redundancy can cause data inconsistency, which means that the data stored by two departments may not agree with each other.

Data in this mode is duplicated, as there is no means to store common data in one place and access it from all machines.

Figure 7: Decentralized Data Processing

Distributed Data Processing (Client/Server)

In this data processing method, the processing is distributed between client and server. The server takes care of managing data. The client interacts with the user. For example, consider a process where we need to draw a graph showing the number of students in a given month for each subject. The following steps take place:


Figure 8: Distributed data processing

1. First, the client interacts with the user, takes input (the month name) from the user and passes it to the server.
2. The server then queries the database to get the data related to the month that was sent to it, and sends the data back to the client.
3. The client then uses the data retrieved from the database to draw a graph.
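The three steps can be sketched as two Python functions, one standing for the server and one for the client. This is a simplified, single-process illustration: the ENROLMENTS table, its rows and both function names are assumptions, the "graph" is a text bar chart, and a real client would reach the server over a network connection.

```python
import sqlite3

# Server side: owns the data and answers queries.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ENROLMENTS (SUBJECT TEXT, MONTH TEXT, ROLLNO INTEGER)")
db.executemany(
    "INSERT INTO ENROLMENTS VALUES (?, ?, ?)",
    [
        ("Oracle", "January", 1),
        ("Oracle", "January", 2),
        ("Java", "January", 3),
        ("Java", "February", 4),
    ],
)


def server_students_per_subject(month):
    """Step 2: query the database for the given month and return the rows."""
    return db.execute(
        "SELECT SUBJECT, COUNT(*) FROM ENROLMENTS"
        " WHERE MONTH = ? GROUP BY SUBJECT ORDER BY SUBJECT",
        (month,),
    ).fetchall()


# Client side: talks to the user and draws the graph.
def client_draw_graph(month):
    """Steps 1 and 3: pass the month to the server, then render the result."""
    data = server_students_per_subject(month)
    # One line per subject: the name, then one '#' per student, then the count.
    return [f"{subject:<8}{'#' * count} {count}" for subject, count in data]


chart = client_draw_graph("January")
print("\n".join(chart))
```

The client never touches the tables directly. It only sends the month and receives rows, and SQL is used inside the server function.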

If you look at the above process, the client and the server participate equally in it. That is the reason this type of data processing is called distributed: the process is evenly distributed between client and server. The client is a program written in one of the front-end tools such as Visual Basic or Delphi. The server is a database management system such as Oracle, SQL Server etc. The language used to send commands from client to server is SQL (see figure 8).

This is also called two-tier client/server architecture.

In this architecture we have only two tiers (layers): one is the server and the other is the client.

The following is an example of 3-tier client/server, where the client interacts with the user on one side and with the application server on the other side. The application server, which processes and validates data, takes the request from the client and sends it on in the language understood by the database server. Application servers are generally object oriented. They expose a set of objects whose methods are invoked by the client to perform the required operation.

The application server takes some burden from the database server and some burden from the client.

Figure 9: 3-tier client/server architecture
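The role of the application server can be sketched as a Python class that sits between the client code and the database. All three tiers run in one process here, which is only for illustration. The class PaymentService, its methods and the validation rule are hypothetical and do not come from any particular product.

```python
import sqlite3

# Data tier: the database server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE PAYMENTS (ROLLNO INTEGER, DP TEXT, AMOUNT REAL)")


class PaymentService:
    """Application tier: validates requests and translates them into SQL."""

    def __init__(self, connection):
        self._db = connection

    def record_payment(self, rollno, dp, amount):
        # Validation happens here, so the client and the database need not do it.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._db.execute(
            "INSERT INTO PAYMENTS VALUES (?, ?, ?)", (rollno, dp, amount)
        )

    def total_paid(self, rollno):
        row = self._db.execute(
            "SELECT COALESCE(SUM(AMOUNT), 0) FROM PAYMENTS WHERE ROLLNO = ?",
            (rollno,),
        ).fetchone()
        return row[0]


# Client tier: invokes methods on the exposed object and never writes SQL.
service = PaymentService(db)
service.record_payment(1, "2019-01-10", 500.0)
service.record_payment(1, "2019-02-10", 750.0)
total = service.total_paid(1)
print(total)
```

Only the application tier knows the table layout, so the client stays unchanged if the SQL or the database product changes.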

In 3-tier client/server architecture, the database server and the application server may reside on different machines or on the same machine. Since the advent of web applications we are also seeing more than three tiers, which is called n-tier architecture. For example, the following is the sequence in a typical web application.

1. The client, a web browser, sends a request to the web server.
2. The web server executes the requested page, which may be an ASP or JSP page.
3. The ASP or JSP page accesses the application server.
4. The application server then accesses the database server.

Summary

A DBMS is used to store and manipulate data. A DBMS based on the relational model is an RDBMS. A primary key is used for unique identification of rows and a foreign key is used to join tables. Relational algebra is a collection of operators used to operate on tables. We will see how to use these operators in practice in a later chapter.

SQL is a language commonly used in an RDBMS to store and retrieve data. In my opinion, SQL is one of the most important languages if you are dealing with an RDBMS, because all data access is done using SQL.

    SQL can execute queries against a database


    SQL can retrieve data from a database

    SQL can insert records in a database

    SQL can update records in a database

    SQL can delete records from a database

    SQL can create new databases

    SQL can create new tables in a database

    SQL can create stored procedures in a database

    SQL can create views in a database

    SQL can set permissions on tables, procedures, and views