Database Management Systems-Done

  • 8/4/2019 Database Management Systems-Done

    Assignment Set - 1 (MI0034)

    Q.1 Differentiate between Traditional File System & Modern Database System? Describe the properties of Database & the Advantages of Database?

    Differences between Traditional File System and Modern Database Management System

    1. Traditional File System: This is the system that was followed before the advent of DBMS, i.e., it is the older way.
       Modern DBMS: This is the modern way, which has replaced the older concept of the file system.

    2. Traditional File System: In traditional file processing, the data definition is part of the application program and works with only a specific application.
       Modern DBMS: The data definition is part of the DBMS. It is independent of the application and can be used with any application.

    3. Traditional File System: File systems are design driven; they require a design/coding change when a new kind of data occurs. E.g.: a traditional employee master file has Emp_name, Emp_id, Emp_addr, Emp_design, Emp_dept and Emp_sal. If we want to insert one more column, Emp_Mob (mobile number), it requires a complete restructuring of the file or a redesign of the application code, even though all the data except that in one column is the same.
       Modern DBMS: One extra column (attribute) can be added without any difficulty. Minor coding changes in the application program may be required.

    4. Traditional File System: The file system keeps redundant [duplicate] information in many locations, which might result in the loss of data consistency. E.g.: employee names might exist in separate files such as the Payroll Master File and the Employee Benefit Master File. If an employee changes his or her last name, the name might be changed in the Payroll Master File but not in the Employee Benefit Master File.
       Modern DBMS: Redundancy is eliminated to the maximum extent if the database is properly defined.

    5. Traditional File System: Data is scattered in various files, and each of these files may be in a different format, making it difficult to write new application programs to retrieve the appropriate data.
       Modern DBMS: This problem is solved, because the data is held in one place under a common definition.

    6. Traditional File System: Security features have to be coded in the application program itself.
       Modern DBMS: Coding for security requirements is not required, as most of them are taken care of by the DBMS.
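    The point about adding one extra column can be tried with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) stands in for the DBMS; the table name employee, the department value 'IT' and the shortened address are illustrative assumptions, while the column names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Employee master data as described in the comparison above.
conn.execute(
    "CREATE TABLE employee ("
    "Emp_id INTEGER PRIMARY KEY, Emp_name TEXT, Emp_addr TEXT, "
    "Emp_design TEXT, Emp_dept TEXT, Emp_sal REAL)"
)
conn.execute(
    "INSERT INTO employee VALUES "
    "(100, 'Prasad', 'Bangalore', 'Project Leader', 'IT', 40000)"
)

# Adding the new attribute is a single data-definition statement;
# existing rows are kept and simply have no value (NULL) for it yet.
conn.execute("ALTER TABLE employee ADD COLUMN Emp_Mob TEXT")

row = conn.execute(
    "SELECT Emp_name, Emp_sal, Emp_Mob FROM employee WHERE Emp_id = 100"
).fetchone()
columns = [info[1] for info in conn.execute("PRAGMA table_info(employee)")]

print(row)
print(columns)
```

    Queries that do not mention Emp_Mob keep working unchanged, which is the "minor coding changes" case from the comparison.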

    The following are the important properties of Database:

    1. A database is a logical collection of data having some implicit meaning. If the data are not related, the collection is not a proper database.

    E.g. A student studying in class II got 5th rank.

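    2. A database consists of both the data and the description of the database structure and constraints.

    E.g.

    +------------+--------------+--------------------------------+
    | Field Name | Type         | Description                    |
    +------------+--------------+--------------------------------+
    | Stud_name  | Character    | It is the student's name       |
    | Class      | Alphanumeric | It is the class of the student |
    +------------+--------------+--------------------------------+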

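    3. A database can be of any size and of varying complexity. In the employee database example, the names and addresses of the employees may consist of very few records, each with a simple structure. E.g.

    +----------+--------+------------------------------------------------------------------+-------------------+---------+
    | Emp_name | Emp_id | Emp_addr                                                         | Emp_desig         | Emp_Sal |
    +----------+--------+------------------------------------------------------------------+-------------------+---------+
    | Prasad   | 100    | Shubhodaya, Near Katariguppe Big Bazaar, BSK II stage, Bangalore | Project Leader    | 40000   |
    | Usha     | 101    | #165, 4th main Chamrajpet, Bangalore                             | Software engineer | 10000   |
    | Nupur    | 102    | #12, Manipal Towers, Bangalore                                   | Lecturer          | 30000   |
    | Peter    | 103    | Syndicate house, Manipal                                         | IT executive      | 15000   |
    +----------+--------+------------------------------------------------------------------+-------------------+---------+

    Like this, there may be any number (n) of records.

    +-----------+---------+---------------+
    | Stud_name | Class   | Rank obtained |
    +-----------+---------+---------------+
    | Rakhi     | Class V | 2nd           |
    +-----------+---------+---------------+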

    4. The DBMS is a general-purpose software system that facilitates the process of defining, constructing and manipulating databases for various applications.

    5. A database provides insulation between programs and data, and supports data abstraction. Data abstraction is a feature that provides the integration of the data source of interest and helps to leverage the physical data, whatever its structure is.

    6. The data in the database is used by a variety of users for a variety of purposes. E.g. in a hospital database management system, the way one class of user views and uses the patient database differs from the way the doctor uses it. The data are not stored separately for the different users; in fact they are stored in a single database. This property is called multiple views of the database.

    7. A multi-user DBMS must allow the data to be shared by multiple users simultaneously. For this purpose the DBMS includes concurrency control software, which ensures that updates made to the database by several users at the same time are applied correctly. This property is called multi-user transaction processing.
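    The multiple-views property (item 6) can be sketched with SQL views. This is only an illustration: SQLite via Python's sqlite3 module is assumed as the DBMS, and the patient table, its columns, the two view names and the sample row are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One stored table holds all patient data (hypothetical columns).
conn.execute(
    "CREATE TABLE patient ("
    "patient_id INTEGER PRIMARY KEY, name TEXT, diagnosis TEXT, bill_amount REAL)"
)
conn.execute("INSERT INTO patient VALUES (1, 'Asha', 'Fracture', 1200)")

# Two views over the same stored data, one per kind of user.
conn.execute(
    "CREATE VIEW doctor_view AS SELECT patient_id, name, diagnosis FROM patient"
)
conn.execute(
    "CREATE VIEW billing_view AS SELECT patient_id, name, bill_amount FROM patient"
)

doctor_rows = conn.execute("SELECT * FROM doctor_view").fetchall()
billing_rows = conn.execute("SELECT * FROM billing_view").fetchall()

print(doctor_rows)   # clinical columns only
print(billing_rows)  # billing columns only
```

    The data is stored once; each view exposes only the portion that its users need.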

    Advantages of Database

    1. Redundancy is reduced

    2. Data located on a server can be shared by clients

    3. Integrity (accuracy) can be maintained

    4. Security features protect the Data from unauthorized access

    5. Modern DBMSs support Internet-based applications.

    6. In a DBMS, the application program and the structure of the data are independent.

    7. Consistency of data is maintained.

    8. A DBMS supports multiple views. A DBMS has many users, each of whom might use it for a different purpose and may need to view and manipulate only a portion of the database, depending on requirements.

    Q.2 What is the disadvantage of sequential file organization? How do you overcome it? What are the advantages & disadvantages of Dynamic Hashing?

    The disadvantage of sequential file organization is that we must use linear search or binary search to locate the desired record, which results in more I/O operations and a number of unnecessary comparisons. To overcome this, the hashing technique (direct file organization) converts the key value into an address by performing some arithmetic manipulation on the key value, which provides very fast access to records.

    Let us consider a hash function h that maps the key value k to the value h(k). The value h(k) is used as an address.

    The basic terms associated with the hashing techniques are:

    1) Hash table: It is simply an array that holds the addresses of records.

    2) Hash function: It is the transformation of a key into the corresponding location or address in the hash table (it can be defined as a function that takes a key as input and transforms it into a hash table index).

    3) Hash key: Let R be a record; its key hashes into a value called the hash key.

    Internal Hashing

    For internal files, the hash table is an array of records, with array indexes in the range 0 to M-1. Let us consider a hash function H(K) such that H(K) = K mod M, which produces a remainder between 0 and M-1 depending on the value of the key K. This value is then used as the record address. The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses. A collision occurs when two non-identical keys are hashed into the same location.

    For example, let us assume that there are two non-identical keys k1 = 342 and k2 = 352, and that we have some mechanism to convert key values to addresses. Then a simple hashing function is:

    h(k) = k mod 10

    Here h(k) produces a bucket address.

    To insert a record with key value k, we must first hash its key. E.g.: h(k1) = k1 % 10 gives 2 as the hash value, so the record with key value 342 is placed at location 2. Another record with 352 as its key value produces the same hash address, i.e. h(k1) = h(k2). When we try to place this record at the location where the record with key k1 is already stored, a collision occurs. The process of finding another position is called collision resolution. There are numerous methods for collision resolution.

    1) Open addressing: With open addressing we resolve the hash clash by inserting the record in the next available free (empty) location in the table.

    2) Chaining: Various overflow locations are kept; a pointer field is added to each record location, and the pointer is set to the address of the overflow location.
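    The collision between k1 = 342 and k2 = 352 and the two resolution methods can be sketched as follows. This is a minimal illustration, not a full implementation: linear probing is assumed as the way of finding "the next available free location", and Python lists stand in for the pointer-linked overflow locations used by chaining.

```python
M = 10  # table size, matching h(k) = k mod 10


def h(k):
    """Hash function from the example: key mod table size."""
    return k % M


def insert_open_addressing(table, key):
    """Linear probing: try slot h(key), then the following slots, wrapping around."""
    for step in range(M):
        slot = (h(key) + step) % M
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def insert_chaining(table, key):
    """Chaining: every slot keeps a list of all keys that hash to it."""
    table[h(key)].append(key)
    return h(key)


open_table = [None] * M
slot_k1 = insert_open_addressing(open_table, 342)  # lands at its home slot 2
slot_k2 = insert_open_addressing(open_table, 352)  # collides at 2, moves to slot 3

chained_table = [[] for _ in range(M)]
insert_chaining(chained_table, 342)
insert_chaining(chained_table, 352)

print(slot_k1, slot_k2)
print(chained_table[2])
```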

    External Hashing for Disk Files

    Handling Overflow for Buckets By Chaining

    Hashing for disk files is called external hashing. Disk storage is divided into buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks.

    The hashing function maps a key into a relative bucket number. A table maintained in the file header converts the bucket number into the corresponding disk block address.

    The collision problem is less severe with buckets, because many records will fit in the same bucket. When a bucket is filled to capacity and we try to insert a new record into the same bucket, a collision is caused. However, we can maintain a pointer in each bucket to address overflow records.

    The hashing scheme described is called static hashing, because a fixed number of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose M is the number of buckets and m is the maximum number of records that can fit in one bucket; then at most m*M records will fit in the allocated space. If the number of records is much smaller than m*M, space is wasted; if it grows beyond m*M, collisions will occur and retrieval will be slowed down.

    Dynamic Hashing Technique

    A major drawback of static hashing is that the address space is fixed. Hence it is difficult to expand or shrink the file dynamically.

    In dynamic hashing, the access structure is built on the binary representation of the hash value. Here the number of buckets is not fixed [as in regular hashing] but grows or diminishes as needed. The file can start with a single bucket. Once that bucket is full and a new record is inserted, the bucket overflows and is split into two buckets. The records are distributed among the two buckets based on the value of the first [leftmost] bit of their hash values. Records whose hash values start with a 0 bit are stored in one bucket, and those whose hash values start with a 1 bit are stored in the other bucket. At this point, a binary tree structure called a directory is built. The directory has two types of nodes.

    1. Internal nodes: These guide the search; each has a left pointer corresponding to a 0 bit, and a right pointer corresponding to a 1 bit.

    2. Leaf nodes: Each holds a pointer to a bucket, i.e. a bucket address.

    If a bucket overflows, it is split again on the next bit. For example, if a new record is inserted into the bucket for records whose hash values start with 10 and causes an overflow, then all records whose hash values start with 100 are placed in the first split bucket, and the second bucket contains those whose hash values start with 101. The levels of the binary tree can be expanded dynamically.
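    The binary-tree directory described above can be sketched in a few lines. The sketch makes several assumptions for the sake of a small demo: a bucket capacity of 2, 4-bit hash values written as strings, and made-up record values "a" to "d". The class and function names are illustrative, not taken from any library.

```python
BUCKET_CAPACITY = 2   # assumed bucket size, chosen only for the demo
HASH_BITS = 4         # assumed hash length, kept short for readability


class Leaf:
    """Leaf node: stands in for a pointer to one bucket."""
    def __init__(self):
        self.records = []          # (hash_bits, value) pairs


class Internal:
    """Internal node: left pointer for a 0 bit, right pointer for a 1 bit."""
    def __init__(self, zero, one):
        self.zero = zero
        self.one = one


def insert(node, bits, value, depth=0):
    """Insert a record and return the (possibly replaced) subtree root."""
    if isinstance(node, Internal):
        if bits[depth] == "0":
            node.zero = insert(node.zero, bits, value, depth + 1)
        else:
            node.one = insert(node.one, bits, value, depth + 1)
        return node
    node.records.append((bits, value))
    if len(node.records) <= BUCKET_CAPACITY or depth == HASH_BITS:
        return node
    # Overflow: replace the leaf by an internal node and redistribute
    # the records on the next unused bit of their hash values.
    split = Internal(Leaf(), Leaf())
    for rec_bits, rec_value in node.records:
        insert(split, rec_bits, rec_value, depth)
    return split


def find_bucket(node, bits):
    """Follow the bits from the root; return (prefix used, leaf reached)."""
    depth = 0
    while isinstance(node, Internal):
        node = node.zero if bits[depth] == "0" else node.one
        depth += 1
    return bits[:depth], node


root = Leaf()  # the file starts with a single bucket
for bits, value in [("0101", "a"), ("1000", "b"), ("1011", "c"), ("1001", "d")]:
    root = insert(root, bits, value)

prefix_100, bucket_100 = find_bucket(root, "1001")
prefix_101, bucket_101 = find_bucket(root, "1011")
print(prefix_100, bucket_100.records)
print(prefix_101, bucket_101.records)
```

    Inserting the fourth record overflows the bucket for prefix 1, and because all of its records share the prefix 10, the split continues until the 100 and 101 buckets of the example appear.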

    Extendible Hashing: In extendible hashing the stored file has a directory (also called an index table or hash table) associated with it. The index table consists of a header containing a value d, called the global depth of the table, and a list of 2^d pointers [pointers to data blocks]. Here d is the number of leftmost bits currently being used to address the index table. The leftmost d bits of a key, when interpreted as a number, give the bucket address in which the desired records are stored.

    Each bucket also has a header giving the local depth d1 of that bucket, which specifies the number of bits on which the bucket contents are based. Suppose d = 3 and the first pointer in the table [the 000 pointer] points to a bucket whose local depth d1 is 2. A local depth of 2 means that in this case the bucket contains all the records whose search keys start with 000 and 001 [because the first two bits are 00].

    To insert a record with search value k, if there is room in the bucket, we insert the record in the bucket. If the bucket is full, we must split the bucket and redistribute the current records plus the new one.

    For example, let us illustrate insertion using the account file below. We assume that a bucket can hold only two records.

    Bangalore 100

    Mysore 200

    Mysore 300

    Mangalore 400

    Hassan 500

    Hassan 600

    Hassan 700

    Hash function for branch name

    Branch-name H(branch-name)

    Bangalore 0010 1101

    Mysore 1010 0011

    Mangalore 1100 0001

    Hassan 1111 0001

    Let us insert the record (Bangalore, 100). The hash table (address table) contains a pointer to the single bucket, and the record is inserted. The second record is also placed in the same bucket (bucket size is 2).

    When we attempt to insert the next record (Mysore, 300), the bucket is full. We need to increase the number of bits that we use from the hash value, i.e., d = 1, which gives 2^1 = 2 buckets. This increases the number of entries in the hash address table. Now the hash table contains two entries, i.e., it points to two buckets. The first bucket contains the records whose search key has a hash value that begins with 0, and the second bucket contains the records whose search key has a hash value beginning with 1. Now the local depth of each bucket is 1.

    Next we insert (Mangalore, 400). Since the first bit of h(Mangalore) is 1, the new record should be placed into the 2nd bucket, but we find that the bucket is full. We increase the number of bits that we use from the hash value to 2 (d = 2). This increases the number of entries in the hash table to 4 (2^2 = 4). The records of the full bucket will be distributed among two buckets. Since the bucket that has prefix 0 was not split, hash prefixes 00 and 01 both point to this bucket.

    Next, the record (Hassan, 500) is inserted into the same bucket as Mangalore. The next insertion, (Hassan, 600), results in a bucket overflow, which causes an increase in the number of bits (the global depth d increases by 1, i.e. d = 3) and thus increases the number of hash table entries (the hash table now has 2^3 = 8 entries). The records will be distributed among two buckets; the first contains all records whose hash values start with 110, and the second all those whose hash values start with 111.
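    The insertion walk-through above can be replayed with a short simulation. It is a sketch under stated assumptions: the hash values are copied from the branch-name table, the bucket size is 2 as in the text, and the class and method names are hypothetical. Overflow buckets are not implemented, so only the first six records are inserted.

```python
BUCKET_SIZE = 2

# Hash values copied from the branch-name table above (spaces removed).
HASHES = {
    "Bangalore": "00101101",
    "Mysore": "10100011",
    "Mangalore": "11000001",
    "Hassan": "11110001",
}


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.records = []


class ExtendibleHashFile:
    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(0)]   # always 2**global_depth pointers

    def _index(self, key):
        """Interpret the leftmost global_depth bits of the hash as a number."""
        d = self.global_depth
        return int(HASHES[key][:d], 2) if d else 0

    def insert(self, key, value):
        bucket = self.directory[self._index(key)]
        if len(bucket.records) < BUCKET_SIZE:
            bucket.records.append((key, value))
            return
        if bucket.local_depth == len(HASHES[key]):
            raise OverflowError("identical hash values need an overflow bucket")
        if bucket.local_depth == self.global_depth:
            # Double the directory: every old pointer now appears twice.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.global_depth += 1
        self._split(bucket)
        self.insert(key, value)        # retry now that there is room

    def _split(self, bucket):
        depth = bucket.local_depth + 1
        zero, one = Bucket(depth), Bucket(depth)
        for key, value in bucket.records:
            target = one if HASHES[key][depth - 1] == "1" else zero
            target.records.append((key, value))
        for i, b in enumerate(self.directory):
            if b is bucket:
                bit = format(i, "0{}b".format(self.global_depth))[depth - 1]
                self.directory[i] = one if bit == "1" else zero


account_file = ExtendibleHashFile()
for record in [("Bangalore", 100), ("Mysore", 200), ("Mysore", 300),
               ("Mangalore", 400), ("Hassan", 500), ("Hassan", 600)]:
    account_file.insert(*record)

print(account_file.global_depth, len(account_file.directory))
print(account_file.directory[6].records)   # prefix 110
print(account_file.directory[7].records)   # prefix 111
```

    Inserting (Hassan, 700) as well would hit the OverflowError branch, because all three Hassan records share the same hash value and no amount of splitting can separate them.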

    Advantages of dynamic hashing:

    1. The main advantage is that splitting causes only minor reorganization, since only the records in one bucket are redistributed to the two new buckets.

    2. The space overhead of the directory table is negligible.

    3. The main advantage of extendible hashing is that performance does not degrade as the file grows. The main space saving of hashing is that no buckets need to be reserved for future growth; rather, buckets can be allocated dynamically.

    Disadvantages of dynamic hashing:

    1. The index table may grow rapidly and become too large to fit in main memory. When part of the index table is stored on secondary storage, it requires extra accesses.

    2. The directory must be searched before accessing the bucket, resulting in two block accesses instead of one as in static hashing.

    3. A disadvantage of extendible hashing is that it involves an additional level of indirection.

    Q.3 What is relationship type? Explain the difference among a relationship instance, relationship type & a relationship set?

    Relationship type is a meaningful association among entity types.

    Relationship means: an association of entities where the association

    includes one entity from each participating entity type.

    Each uniquely identifiable occurrence of a relationship type is referred to

    as a relationship.

    A relationship indicates the particular entities that are related with each other in some form or by some means.

    In the real world, items have relationships to one another. E.g.: A book is published by a particular publisher. The association or relationship that exists between the entities relates data items to each other in a meaningful way. A relationship is an association between entities.

    A collection of relationships of the same type is called a relationship set.

    A relationship type R among n entity types E1, E2, ..., En is a set of associations among entities from these types. Mathematically, R is a set of relationship instances ri.

    E.g.: Consider a relationship type WORKS_FOR between the two entity types employee and department, which associates each employee with the department the employee works for. Each relationship instance ri in WORKS_FOR associates one employee entity and one department entity, and connects the employee and department entities that participate in ri.

    Employees e1, e3 and e6 work for department d1, e2 and e4 work for d2, and e5 and e7 work for d3. The relationship type R is the set of all these relationship instances.
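    The WORKS_FOR example translates directly into a set of pairs. In this small sketch each tuple is one relationship instance ri and the whole set plays the part of the relationship set; the helper function employees_of is an illustrative assumption.

```python
# Each (employee, department) pair is one relationship instance r_i.
works_for = {
    ("e1", "d1"), ("e3", "d1"), ("e6", "d1"),
    ("e2", "d2"), ("e4", "d2"),
    ("e5", "d3"), ("e7", "d3"),
}


def employees_of(department):
    """Employees that take part in a WORKS_FOR instance with this department."""
    return sorted(e for e, d in works_for if d == department)


print(len(works_for))        # number of relationship instances
print(employees_of("d1"))
```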

    Whenever you want to form a relationship between two objects, you must use a relationship type. A relationship type defines the roles that an object can play in a relationship. For example, if you place a "contains" relationship type between two objects, that creates a hierarchical relationship between the two objects. One object would have to play a child role, and the other object would have to play a parent role.

    The Information Catalog Center contains a set of predefined relationship types that are ready for you to use in your organization. These relationship types are already associated with the predefined object types in the Information Catalog Center.

    Each relationship type is based on a category, which determines the roles that object types can play in it. You can create your own relationship types, but you must select a predefined category, which will determine the roles that are used within each new relationship type.

    Degree of relationship type: The number of entity sets that participate in a relationship set. A unary relationship exists when an association is maintained within a single entity.

    A binary relationship exists when two entities are associated.

    A ternary relationship exists when there are three entities associated.

    Role Names and Recursive Relationship

    Each entity type that participates in a relationship type plays a particular role in the relationship. The role name signifies the role that a participating entity from the entity type plays in each relationship instance. E.g.: In the WORKS_FOR relationship type, the employee plays the role of employee or worker and the department plays the role of department or employer. However, in some cases the same entity type participates more than once in a relationship type in different roles. Such relationship types are called recursive.

    E.g.: The employee entity type participates twice in SUPERVISION, once in the role of supervisor and once in the role of supervisee.

    Constraints on Relationship Types

    Relationship types usually have certain constraints that limit the possible combinations of entities that may participate in the relationship instances.

    E.g.: A company may have a rule that each employee must work for exactly one department. The two main types of constraints are cardinality ratio and participation constraints.

    The cardinality ratio specifies the number of entities to which another entity can be associated through a relationship set.

    Mapping cardinalities should be one of the following.

    One-to-One: An entity in A is associated with at most one entity in B and vice versa.

    E.g.: An employee can manage only one department, and a department has only one manager.

    One-to-Many: An entity in A is associated with any number of entities in B. An entity in B, however, can be associated with at most one entity in A.

    E.g.: Each department can be related to numerous employees, but an employee can be related to only one department.

    Many-to-One: An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number of entities in A.

    E.g.: Many depositors deposit into a single account.

    Many-to-Many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.

    E.g.: An employee can work on several projects and several employees can work on a project.
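    One common way to realize these cardinalities in a relational schema is sketched below. It is an illustration only, with SQLite (via Python's sqlite3 module) assumed as the DBMS; the table names manages, works_on and project and all sample rows are hypothetical. A plain foreign key gives one-to-many, UNIQUE columns give one-to-one, and a linking table gives many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);

-- One-to-many: each employee row refers to at most one department,
-- while many employee rows may refer to the same department.
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES department(dept_id));

-- One-to-one: UNIQUE on both sides, so an employee manages at most one
-- department and a department has at most one manager.
CREATE TABLE manages (
    emp_id  INTEGER UNIQUE REFERENCES employee(emp_id),
    dept_id INTEGER UNIQUE REFERENCES department(dept_id));

-- Many-to-many: a linking table with one row per (employee, project) pair.
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE works_on (
    emp_id     INTEGER REFERENCES employee(emp_id),
    project_id INTEGER REFERENCES project(project_id),
    PRIMARY KEY (emp_id, project_id));
""")

conn.executemany("INSERT INTO department VALUES (?, ?)", [(1, "d1"), (2, "d2")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [(1, "e1", 1), (2, "e2", 1)])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "p1"), (11, "p2")])

conn.execute("INSERT INTO manages VALUES (1, 1)")
try:
    conn.execute("INSERT INTO manages VALUES (1, 2)")  # same manager, second department
    second_department_allowed = True
except sqlite3.IntegrityError:
    second_department_allowed = False

conn.executemany("INSERT INTO works_on VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])
projects_of_e1 = conn.execute(
    "SELECT COUNT(*) FROM works_on WHERE emp_id = 1").fetchone()[0]
staff_on_p1 = conn.execute(
    "SELECT COUNT(*) FROM works_on WHERE project_id = 10").fetchone()[0]

print(second_department_allowed, projects_of_e1, staff_on_p1)
```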

    Participation Roles: There are two ways in which an entity set can participate in a relationship, i.e. two types of participation.

    1. Total: The participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at least one relationship in R. E.g.: Every employee must work for a department, so the participation of employee in WORKS_FOR is called total.

    Total participation is sometimes called existence dependency.

    2. Partial: If only some entities in E participate in relationships in R, the participation of entity set E in relationship set R is said to be partial.

    E.g.: We do not expect every employee to manage a department, so the participation of employee in the MANAGES relationship type is partial.
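    For a many-to-one relationship such as WORKS_FOR, total participation of employee can be approximated with a NOT NULL foreign key, while partial participation in MANAGES needs no constraint at all. The sketch below assumes SQLite via Python's sqlite3 module; the table layout and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);

-- Total participation in WORKS_FOR: every employee row must name a department.
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id));

-- Partial participation in MANAGES: only some employees appear here.
CREATE TABLE manages (
    emp_id  INTEGER UNIQUE REFERENCES employee(emp_id),
    dept_id INTEGER UNIQUE REFERENCES department(dept_id));
""")

conn.execute("INSERT INTO department VALUES (1, 'd1')")
conn.execute("INSERT INTO employee VALUES (1, 'e1', 1)")
conn.execute("INSERT INTO employee VALUES (2, 'e2', 1)")
conn.execute("INSERT INTO manages VALUES (1, 1)")

try:
    conn.execute("INSERT INTO employee VALUES (3, 'e3', NULL)")  # no department
    employee_without_department = True
except sqlite3.IntegrityError:
    employee_without_department = False

non_managers = conn.execute(
    "SELECT COUNT(*) FROM employee "
    "WHERE emp_id NOT IN (SELECT emp_id FROM manages)"
).fetchone()[0]

print(employee_without_department, non_managers)
```

    The NOT NULL column only covers the employee side; requiring every department to have at least one employee would need a different mechanism.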

    Weak Entity: Some entity types may not have any key attribute of their own; they are called weak entity types. An entity type that has a primary key is termed a strong entity type. A weak entity type always has total participation [existence dependence] with respect to a strong entity type.

    A weak entity type is dependent on the existence of another entity. Weak entities are also referred to as child, dependent or subordinate entities, and strong entities as parent, owner or dominant entities. E.g.: In the following relationship, PARENT is a weak entity as it needs the entity EMPLOYEE for its existence. The entities EMPLOYEE, COMPANY etc. are strong entities. Weak entities are represented by a double-lined rectangle.

    Q. 4 What is SQL? Discuss.

    SQL stands for Structured Query Language, the language used for programming the database. The history of SQL began in an IBM laboratory in San Jose, California, where SQL was developed in the late 1970s. It is a non-procedural language, meaning that SQL describes what data to retrieve, delete or insert, rather than how to perform the operation. It is the standard command set used to communicate with an RDBMS.

    A SQL query is not necessarily a question to the database. It can be a command to do one of the following:

    - Create or delete a table.
    - Insert, modify or delete rows.
    - Search several rows for specified information and return the result in order.
    - Modify security information.

    SQL statements can be grouped into the following categories:

    1. DDL (Data Definition Language)
    2. DML (Data Manipulation Language)
    3. DCL (Data Control Language)
    4. TCL (Transaction Control Language)

    DDL: (Data Definition Language)
    The DDL statements provide commands for defining relation schemas, i.e. for creating tables, indexes, sequences etc., and commands for dropping, altering and renaming objects.

    DML: (Data Manipulation Language)
    The DML statements are used to alter the database tables in some way. The UPDATE, INSERT and DELETE statements respectively alter existing rows in a database table, insert new records into a database table, and remove one or more records from a database table.

    DCL: (Data Control Language)
    The Data Control Language statements are used to grant permissions to a user, revoke permissions from a user, and lock certain permissions for a user.

    SQL DBA> Revoke Import from Akash;
    SQL DBA> Grant all on emp to public;
    SQL DBA> Grant select, update on EMP to L.Suresh;
    SQL DBA> Grant ALL on EMP to Akash with grant option;

    Revoke: Revoke withdraws privileges on one or more tables or views.

    SQL DBA> Revoke update, delete on EMP from L.Suresh;
    SQL DBA> Revoke all on emp from Akash;

    TCL: (Transaction Control Language)
    It is used to control transactions.

    E.g.:
    Commit: Make the changes made so far permanent.
    Rollback: Discard/cancel the changes up to the previous commit point.
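    The effect of Commit and Rollback can be seen in a short sketch. It assumes SQLite through Python's sqlite3 module, where the connection's commit() and rollback() methods issue the corresponding SQL; the emp table and its single row are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, sal REAL)")
conn.execute("INSERT INTO emp VALUES (100, 40000)")
conn.commit()    # COMMIT: the inserted row is now permanent

conn.execute("UPDATE emp SET sal = 0 WHERE emp_id = 100")
sal_before_rollback = conn.execute(
    "SELECT sal FROM emp WHERE emp_id = 100").fetchone()[0]

conn.rollback()  # ROLLBACK: discard changes back to the previous commit point
sal_after_rollback = conn.execute(
    "SELECT sal FROM emp WHERE emp_id = 100").fetchone()[0]

print(sal_before_rollback, sal_after_rollback)
```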

    SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. SQL statements are used to perform tasks such as updating data in a database, or retrieving data from a database. Some common relational database management systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most of them also have their own additional proprietary extensions that are usually only used on their system. However, the standard SQL commands such as "Select", "Insert", "Update", "Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs to do with a database. This tutorial will provide you with instruction on the basics of each of these commands as well as allow you to put them to practice using the SQL Interpreter.

    A relational database system contains one or more objects called tables. The data or information for the database is stored in these tables. Tables are uniquely identified by their names and are comprised of columns and rows. Columns contain the column name, data type, and any other attributes for the column. Rows contain the records or data for the columns. Here is a sample table called "weather".

    city, state, high, and low are the columns. The rows contain the data for this table:

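    Weather

    +-------------+------------+------+-----+
    | city        | state      | high | low |
    +-------------+------------+------+-----+
    | Phoenix     | Arizona    | 105  | 90  |
    | Tucson      | Arizona    | 101  | 92  |
    | Flagstaff   | Arizona    | 88   | 69  |
    | San Diego   | California | 77   | 60  |
    | Albuquerque | New Mexico | 80   | 72  |
    +-------------+------------+------+-----+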

    The select statement is used to query the database and retrieve selected data that match the criteria that you specify. Here is the format of a simple select statement:

    select "column1"
      [,"column2",etc]
      from "tablename"
      [where "condition"];
      [] = optional

    The column names that follow the select keyword determine which columns will be returned in the results. You can select as many column names as you'd like, or you can use a "*" to select all columns.

    The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results.

    The where clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where.

    Conditional selections used in the where clause:

    = Equal

    > Greater than

    < Less than

    >= Greater than or equal
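    The select format and the comparison operators can be tried against the weather table shown earlier. The sketch assumes SQLite via Python's sqlite3 module; the column types and the ORDER BY clauses (added only to make the output order predictable) are assumptions, while the rows are the ones from the sample table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, state TEXT, high INTEGER, low INTEGER)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?, ?)",
    [
        ("Phoenix", "Arizona", 105, 90),
        ("Tucson", "Arizona", 101, 92),
        ("Flagstaff", "Arizona", 88, 69),
        ("San Diego", "California", 77, 60),
        ("Albuquerque", "New Mexico", 80, 72),
    ],
)

# One column, filtered with ">".
hot_cities = [
    row[0]
    for row in conn.execute("SELECT city FROM weather WHERE high > 100 ORDER BY city")
]

# Two columns, filtered with "=" and ">=".
warm_nights = conn.execute(
    "SELECT city, low FROM weather "
    "WHERE state = 'Arizona' AND low >= 90 ORDER BY city"
).fetchall()

# "*" selects all columns.
all_columns = conn.execute("SELECT * FROM weather WHERE low < 65").fetchall()

print(hot_cities)
print(warm_nights)
print(all_columns)
```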

    To illustrate, consider a simple SQL command, SELECT. SELECT retrieves a set of data from the database according to some criteria, using the syntax:

    SELECT list_of_column_names from list_of_relation_names where conditional_expression_that_identifies_specific_rows

    The list_of_relation_names may be one or more comma-separated table names or an expression operating on whole tables.

    The conditional_expression will contain assertions about the values of individual columns within individual rows in a table, and only those rows meeting the assertions will be selected. Conditional expressions within SQL are very similar to conditional expressions found in most programming languages.

    For example, to retrieve from a table called Customers all columns (designated by the asterisk) with a value of Smith for the column Last_Name, a client program would prepare and send this SQL statement to the server back end:

    SELECT * FROM Customers WHERE Last_Name='Smith';

    The server back end may then reply with data such as this:

    +---------+-----------+------------+
    | Cust_No | Last_Name | First_Name |
    +---------+-----------+------------+
    |    1001 | Smith     | John       |
    |    2039 | Smith     | David      |
    |    2098 | Smith     | Matthew    |
    +---------+-----------+------------+
    3 rows in set (0.05 sec)

    Following is an SQL command that displays only two columns, column_name_1 and column_name_3, from the table myTable:

    SELECT column_name_1, column_name_3 from myTable

    Below is a SELECT statement displaying all the columns of the table myTable2 for each row whose column_name_3 value includes the string "brain":

    SELECT * from myTable2 where column_name_3 like '%brain%'

    SQL, short for Structured Query Language, is pronounced "Ess Queue El" and is a simple non-procedural language that lets you store and retrieve data in a relational database. This is a quick introduction to SQL.

    Two Classes of SQL

    SQL falls into two classes:

    1. Data Manipulation Language (DML) - SQL for retrieving and storing data.
    2. Data Definition Language (DDL) - SQL for creating, altering and dropping tables.

    Most of the time, the SQL you use is for manipulating the data, but occasionally you'll need to create new tables, alter existing ones or add an index. One of the best things about SQL is that you can do all of these operations with just simple SQL commands.

    Flavors of SQL

    The standards in use today are ANSI-89 and ANSI-92, though there have been three more released (1999, 2003 and 2006). It's the flavor supported by the database server you're using that matters. All modern ones support ANSI-92. Each database publisher has their own slightly different version of SQL. If you use proprietary SQL features, your SQL becomes non-portable and needs rewriting if you move to a different database server.

    Third Party Tools

    Often, the tools you use, such as IDEs for designing and running SQL, provide table design, creation and management. Here are a few I've used.

    EMS MySQL Manager
    SQLYog
    SQL Server Enterprise Manager
    DbArtisan
    MySQLAdmin

    Most are standalone applications but the last one is an open source web application.

    Comments in SQL

    Use two dashes to make the rest of the line a comment:

    -- Don't mangle the furdwinder!

    Most SQLs support the C-style comments as well.

    /* Like this */

    Data Storage

    Data is stored in tables made up of individual rows of data. Each row has the same number and type of columns, defined when you created the table. Database data is held in each row, much like fields or members in a struct or class. A typical payroll record might have these columns.

    EmployeeID int
    EmployeeName varchar(30)
    EmployeeGradeID int -- a number indicating some level
    EmployeeDOB datetime
    TotalGrossPayYTD float -- (YTD means Year To Date)
    TotalTaxDeductedYTD float
    TotalGrossPayM float -- (M for Month)
    TotalTaxDeductedM float
    AnnualSalary float
    TaxBand varchar(8) -- Special string
    DateLastPaid datetime

    There would probably be other administrative columns such as date of last payroll run etc.

    You use data definition SQL to create this with the Create Table command. Another table might have the employee's details in the firm, such as the department they work in, total days of vacation allowed (and taken) etc. These aren't relevant to the payroll so wouldn't be in that table, but it would probably have the EmployeeID and EmployeeName columns. These are needed for indexing.

    Indexes

    Tables can have millions of rows but usually only a subset of those rows is needed to work on. This is where the concept of an index comes from. The database designer tells the database server to add an index to a particular column. With 100,000 rows in the Payroll table, fetching the row for employee id 78965 would require a lot of reads to find that particular row without indexes. With an index, it reads the index table, finds where that row is held and then fetches it, which is much faster!

    The Four Main SQL statements

    These are

    1. Select - Fetches data from one or more tables
    2. Insert - Inserts a row of data into a table
    3. Update - Changes a value in one or more rows
    4. Delete - Deletes one or more rows of data from a table

    This is the SQL to fetch an employee record from the payroll table

    select * from Payroll where EmployeeId = 78965

    The * means fetch all columns. You could also fetch just a couple of columns with this query for all employees in taxband 'XYZ'.

    select EmployeeID, AnnualSalary from Payroll where TaxBand = 'XYZ'

    If you leave the where clause off then you get all rows. The select statement lets you fetch any combination of columns and rows from one (or more) tables.
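The select statements and the index idea can be tried together. The sketch below uses Python's sqlite3 module with a cut-down Payroll table; the sample rows and the index name `idx_payroll_employee` are made up for illustration, and the exact wording of the query plan depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Payroll (EmployeeID int, AnnualSalary float, TaxBand varchar(8))"
)
# Hypothetical sample rows.
conn.executemany(
    "insert into Payroll values (?, ?, ?)",
    [(78965, 52000.0, "XYZ"), (4567, 80000.0, "ABG"), (1001, 31000.0, "XYZ")],
)

# Tell the database to index the column used for lookups (index name is hypothetical).
conn.execute("create index idx_payroll_employee on Payroll (EmployeeID)")

# One row, all columns.
one_employee = conn.execute(
    "select * from Payroll where EmployeeID = 78965"
).fetchall()

# Two columns, only the rows in tax band 'XYZ'.
xyz_band = conn.execute(
    "select EmployeeID, AnnualSalary from Payroll "
    "where TaxBand = 'XYZ' order by EmployeeID"
).fetchall()

# No where clause: every row comes back.
all_rows = conn.execute("select * from Payroll").fetchall()

# Ask SQLite how it intends to find the row; the plan should mention the index.
plan = conn.execute(
    "explain query plan select * from Payroll where EmployeeID = 78965"
).fetchall()
print(plan)
```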


    Inserting a new row

    Insert adds a row. You specify which columns you wish to fill and then provide values for them.

    insert into Payroll (EmployeeID, EmployeeName, EmployeeGradeID, EmployeeDOB,
    TotalGrossPayYTD, TotalTaxDeductedYTD, TotalGrossPayM, TotalTaxDeductedM, AnnualSalary, TaxBand,
    DateLastPaid) values (4567, 'David Bolton', 3, '1958-09-18', 0.0, 0.0, 0.0, 0.0, 80000, 'ABG', NULL)

    There has to be one value for each column listed. If a column is left out of the list, it must have a default value or accept the value Null (explained shortly).

    Updating Rows

    Updating lets you modify one or more columns in one or more rows.

    update Payroll
    set DateLastPaid = GetDate(), -- A built in function that returns today's date
    TotalGrossPayYTD = TotalGrossPayYTD + (AnnualSalary/12),
    TotalGrossPayM = AnnualSalary/12
    where TaxBand = 'XYZ'

    This uses the where clause to pick out the rows with TaxBand = 'XYZ', then sets the DateLastPaid, TotalGrossPayYTD and TotalGrossPayM columns for each of those rows.

    In practice, payroll is a lot more complicated than this!
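Here is a small runnable sketch of insert followed by update, using Python's sqlite3 module. It is an illustration under assumptions: the table is cut down to a few columns, the second employee is invented sample data, and SQLite's `date('now')` stands in for `GetDate()`, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table Payroll (
        EmployeeID       int,
        EmployeeName     varchar(30),
        AnnualSalary     float,
        TaxBand          varchar(8),
        TotalGrossPayYTD float default 0.0,
        TotalGrossPayM   float default 0.0,
        DateLastPaid     datetime
    )
""")

# Only four columns are listed; the rest fall back to their default or to NULL.
insert_sql = (
    "insert into Payroll (EmployeeID, EmployeeName, AnnualSalary, TaxBand) "
    "values (?, ?, ?, ?)"
)
conn.execute(insert_sql, (4567, "David Bolton", 80000, "ABG"))
conn.execute(insert_sql, (78965, "Sample Employee", 24000, "XYZ"))  # hypothetical row

before = conn.execute(
    "select TotalGrossPayYTD, DateLastPaid from Payroll where EmployeeID = 4567"
).fetchone()

# Pay everyone in tax band 'XYZ' one month's salary.
cursor = conn.execute("""
    update Payroll
    set DateLastPaid = date('now'),
        TotalGrossPayYTD = TotalGrossPayYTD + (AnnualSalary / 12),
        TotalGrossPayM = AnnualSalary / 12
    where TaxBand = 'XYZ'
""")
rows_changed = cursor.rowcount

after = conn.execute(
    "select TotalGrossPayYTD, TotalGrossPayM from Payroll where EmployeeID = 78965"
).fetchone()
print(before, rows_changed, after)
```

Only the row in tax band 'XYZ' is touched; the row for employee 4567 keeps its default values.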

    Deleting Rows

    Delete uses the where clause to specify which rows you wish to remove. If you leave it off, every row in the table is deleted!

    delete from Payroll where EmployeeID = 4567 -- I got fired!
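The difference between a delete with and without a where clause is easy to see on a throwaway table. This sketch uses Python's sqlite3 module; the two extra employees are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Payroll (EmployeeID int, EmployeeName varchar(30))")
conn.executemany(
    "insert into Payroll values (?, ?)",
    [(4567, "David Bolton"), (78965, "Employee A"), (1001, "Employee B")],
)

# With a where clause only the matching row goes.
conn.execute("delete from Payroll where EmployeeID = 4567")
remaining_after_where = [
    row[0] for row in conn.execute("select EmployeeID from Payroll order by EmployeeID")
]

# Without a where clause the table is emptied.
conn.execute("delete from Payroll")
remaining_after_all = conn.execute("select count(*) from Payroll").fetchone()[0]
print(remaining_after_where, remaining_after_all)
```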

    What is NULL?

    It is a special value that means no data exists. If a database table is a bit like a spreadsheet then a null value is an empty cell. If you do much with SQL you'll come across Null values.
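One practical consequence is that Null cannot be matched with `=`; standard SQL provides `is null` for that. A minimal sketch with Python's sqlite3 module, where the table layout and the date are made-up sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Payroll (EmployeeID int, DateLastPaid datetime)")
conn.executemany(
    "insert into Payroll values (?, ?)",
    [(4567, None), (78965, "2011-07-31")],  # Python None is stored as SQL NULL
)

# Comparing with "= NULL" is never true, so no rows match.
equals_null = conn.execute(
    "select EmployeeID from Payroll where DateLastPaid = NULL"
).fetchall()

# "is null" is the test that finds the empty cells.
is_null = conn.execute(
    "select EmployeeID from Payroll where DateLastPaid is null"
).fetchall()
print(equals_null, is_null)
```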

    Joins

    The power of SQL really comes into its own when you use join. This lets you retrieve data from two or more tables that have related columns. For example say we had a grade table with two columns,


    1. GradeID int
    2. GradeDescription varchar(20)

    that has this data.

    0 - Graduate
    1 - Programmer
    2 - Analyst
    3 - Architect
    4 - System Designer

    Then we could do a select like this

    select EmployeeName, GradeDescription
    from Payroll, Grade
    where EmployeeGradeID = GradeID
    and EmployeeID = 4567
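The join above can be run end to end on the grade data listed earlier. This sketch uses Python's sqlite3 module and a Payroll table cut down to three columns; it also shows the same query written with an explicit `join ... on`, which is an alternative spelling rather than something the text above uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.execute("create table Grade (GradeID int, GradeDescription varchar(20))")
conn.executemany(
    "insert into Grade values (?, ?)",
    [(0, "Graduate"), (1, "Programmer"), (2, "Analyst"),
     (3, "Architect"), (4, "System Designer")],
)

conn.execute(
    "create table Payroll (EmployeeID int, EmployeeName varchar(30), EmployeeGradeID int)"
)
conn.execute("insert into Payroll values (4567, 'David Bolton', 3)")

# Comma-style join, as written in the text.
implicit_join = conn.execute("""
    select EmployeeName, GradeDescription
    from Payroll, Grade
    where EmployeeGradeID = GradeID
      and EmployeeID = 4567
""").fetchall()

# The same query with an explicit join clause.
explicit_join = conn.execute("""
    select EmployeeName, GradeDescription
    from Payroll
    join Grade on Payroll.EmployeeGradeID = Grade.GradeID
    where Payroll.EmployeeID = 4567
""").fetchall()
print(implicit_join)
```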

    Q. 5 What is Normalization? Discuss various types of Normal Forms?

    Normalization is the process of building database structures to store data. It matters because any application ultimately depends on its data structures: if the data structures are poorly designed, the application starts from a poor foundation, and it takes a lot more work to create a useful and efficient application. Normalization is the formal process for deciding which attributes should be grouped together in a relation. It serves as a tool for validating and improving the logical design, so that the logical design avoids unnecessary duplication of data, i.e. it eliminates redundancy and promotes integrity. In the normalization process we analyze and decompose complex relations into smaller, simpler and well-structured relations.

    Database normalization is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies.

    For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure ensures that information which would otherwise be repeated is represented by a single instance only.

    Types of Normal Forms

    Normal forms Based on Primary Keys


    First Normal Form (1NF)

    A relation schema R is in first normal form if every attribute of R takes only single atomic values. We can also define it as the intersection of each row and column containing one and only one value. To transform an un-normalized table (a table that contains one or more repeating groups) to first normal form, we identify and remove the repeating groups within the table.

    E.g.

    Dept.

    D.Name    D.No    D.location
    R&D       5       {England, London, Delhi}
    HRD       4       Bangalore

    In the table above, each department can have a number of locations. The relation is not in first normal form because D.location is not an atomic attribute: the domain of D.location contains multiple values.

    There is a technique to achieve first normal form. Remove the attribute D.location that violates first normal form and place it, along with the key D.No, into a separate relation Dept_location.
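The step just described can be sketched in a few lines of Python. The function name `to_first_normal_form` is hypothetical, and pairing each location with D.No (so the two relations can be joined again) is an assumption of this sketch.

```python
# The un-normalized Dept table from the example: D.location holds a list of values.
dept_unnormalized = [
    {"D.Name": "R&D", "D.No": 5, "D.location": ["England", "London", "Delhi"]},
    {"D.Name": "HRD", "D.No": 4, "D.location": ["Bangalore"]},
]


def to_first_normal_form(rows):
    """Split the repeating group D.location out into its own relation."""
    dept = []           # (D.No, D.Name): one row per department
    dept_location = []  # (D.No, D.location): one row per department/location pair
    for row in rows:
        dept.append((row["D.No"], row["D.Name"]))
        for location in row["D.location"]:
            dept_location.append((row["D.No"], location))
    return dept, dept_location


dept, dept_location = to_first_normal_form(dept_unnormalized)
print(dept)
print(dept_location)
```

Every cell in both result relations now holds exactly one value.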

    Functional dependency: The concept of functional dependency was introduced by Prof. Codd in 1970 during the emergence of definitions for the three normal forms. A functional dependency is a constraint between two sets of attributes in a relation from a database.

    Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y in R (X->Y) if and only if each value of X is associated with exactly one value of Y. X is called the determinant set and Y is the dependent attribute.
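The definition can be checked mechanically against a set of rows. The sketch below is a plain Python illustration; the function name `holds` and the sample rows are hypothetical. Note that data can only refute a dependency; a dependency that happens to hold in one sample is not thereby proven as a design rule.

```python
def holds(rows, x, y):
    """Return True if every value of attributes x maps to exactly one value of y."""
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in x)
        y_value = tuple(row[a] for a in y)
        # setdefault stores the first y seen for this x and returns it afterwards.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical sample rows: two employees share department 5.
emp_dept = [
    {"SSN": "111", "Dnum": 5, "Dmgr": "901"},
    {"SSN": "222", "Dnum": 5, "Dmgr": "901"},
    {"SSN": "333", "Dnum": 4, "Dmgr": "902"},
]

print(holds(emp_dept, ["SSN"], ["Dnum"]))  # each SSN has one department
print(holds(emp_dept, ["Dnum"], ["SSN"]))  # department 5 has two SSNs
```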

    Second Normal Form (2 NF)

    Second normal form is based on the concept of full functional dependency. A relation is in second normal form if every non-prime attribute A in R is fully functionally dependent on the primary key of R.

    [Figure: 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations. (b) Normalizing EMP_DEPT into 3NF relations.]

    A partial functional dependency is a functional dependency in which one or more non-key attributes are functionally dependent on part of the primary key. It creates a redundancy in that relation, which results in anomalies when the table is updated.
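A partial dependency can be spotted by comparing each determinant with the primary key. The following Python sketch is illustrative only: the function name is hypothetical, and the attribute names (SSN, Pnumber, Hours, Ename, Pname) are assumed for the example because the figure for Emp_Project is not reproduced here.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs whose determinant is a proper part of the primary key
    and whose right-hand side contains at least one non-key attribute."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        determinant = frozenset(lhs)
        non_key = sorted(set(rhs) - key)
        # "<" on frozensets means "proper subset of".
        if determinant and determinant < key and non_key:
            found.append((sorted(determinant), non_key))
    return found


# Assumed dependencies for an Emp_Project relation keyed on (SSN, Pnumber).
emp_project_key = ["SSN", "Pnumber"]
emp_project_fds = [
    (["SSN", "Pnumber"], ["Hours"]),  # full dependency on the whole key
    (["SSN"], ["Ename"]),             # depends on part of the key
    (["Pnumber"], ["Pname"]),         # depends on part of the key
]

violations = partial_dependencies(emp_project_key, emp_project_fds)
print(violations)
```

Each dependency reported this way would be moved into its own relation to reach second normal form.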

    Third Normal Form (3NF)

    This is based on the concept of transitive dependency. We should design the relational schema in such a way that there are no transitive dependencies, because they lead to update anomalies. A functional dependency [FD] x->y in a relation schema R is a transitive dependency if there is a set of attributes z such that x->z and z->y both hold. The dependency SSN->Dmgr is transitive through Dnum in the Emp_dept relation because SSN->Dnum and Dnum->Dmgr, and Dnum is neither a key nor a subset [part] of the key.


    According to Codd's definition, a relational schema R is in 3NF if it satisfies 2NF and no non-prime attribute is transitively dependent on the primary key. The Emp_dept relation is not in 3NF; we can normalize it by decomposing it into E1 and E2.
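The decomposition can be sketched in Python. The attribute split used here, E1(SSN, Dnum) and E2(Dnum, Dmgr), and the sample rows are assumptions for illustration; the text above does not spell out which attributes go where.

```python
# Hypothetical Emp_dept rows: the manager of department 5 is stored twice.
emp_dept = [
    {"SSN": "111", "Dnum": 5, "Dmgr": "901"},
    {"SSN": "222", "Dnum": 5, "Dmgr": "901"},
    {"SSN": "333", "Dnum": 4, "Dmgr": "902"},
]

# E1 keeps the employee-to-department fact, E2 the department-to-manager fact.
e1 = sorted({(r["SSN"], r["Dnum"]) for r in emp_dept})
e2 = sorted({(r["Dnum"], r["Dmgr"]) for r in emp_dept})

# Joining E1 and E2 on Dnum gives back the original rows, so nothing was lost.
rejoined = sorted(
    (ssn, dnum, dmgr)
    for (ssn, dnum) in e1
    for (e2_dnum, dmgr) in e2
    if e2_dnum == dnum
)
original = sorted((r["SSN"], r["Dnum"], r["Dmgr"]) for r in emp_dept)

print(e2)
print(rejoined == original)
```

In E2 each department's manager now appears once, so changing a manager is a single-row update.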

    Boyce Codd Normal Form (BCNF)

    Database relations are designed so that they have neither partial dependencies nor transitive dependencies, because these types of dependencies result in update anomalies. A functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes in relation R, B is functionally dependent on A (A -> B) if each value of A is associated with exactly one value of B.

    The left-hand side and the right-hand side of a functional dependency are sometimes called the determinant and the dependent respectively.

    A relation is in BCNF if and only if every determinant is a candidate key. The difference between the third normal form and BCNF is that, for a functional dependency A -> B, the third normal form allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas in BCNF A must be a candidate key. Therefore BCNF is a stronger form of the third normal form.
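One common way to test the BCNF condition is to compute the closure of each determinant and see whether it covers the whole relation. The Python sketch below does this; it checks whether each determinant is a superkey (contains a candidate key), which is the usual textbook formulation and slightly more permissive than the wording above. The function names and the Emp_dept dependencies used as input are illustrative assumptions.

```python
def closure(attributes, fds):
    """All attributes functionally determined by the given attributes."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial FDs whose determinant does not determine the whole relation."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= set(relation)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# Assumed dependencies: SSN -> Dnum and Dnum -> Dmgr.
emp_dept = ("SSN", "Dnum", "Dmgr")
emp_dept_fds = [(("SSN",), ("Dnum",)), (("Dnum",), ("Dmgr",))]

print(closure(("SSN",), emp_dept_fds))
print(bcnf_violations(emp_dept, emp_dept_fds))
print(bcnf_violations(("Dnum", "Dmgr"), [(("Dnum",), ("Dmgr",))]))
```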

    The PRODUCT schema is in BCNF, since prd# is a candidate key. Similarly, the CUSTOMER schema is also in BCNF.

    The schema ORDER, however, is not in BCNF, because ord# is not a superkey for ORDER, i.e. we could have a pair of tuples with the same ord#.


    For e.g.

    (1234,145,13,789)

    (1234,123,53,455)

    Here ord# is not a candidate key. However, the FD ord#->amt is not trivial; therefore ORDER does not satisfy the definition of BCNF. It suffers from the problem of repetition of information. This redundancy can be eliminated by decomposing it into ORDER1 and ORDER2.

    ORDER1(ord#,cust#)

    ORDER2(prd#,qty,amt)

    Fourth Normal Form (4NF)

    Multi-valued dependencies arise from first normal form, which prohibits attributes having a set of values. If we have two or more independent multi-valued attributes in the same relation, we get into a situation where we have to repeat every value of one of the attributes with every value of the other attributes to keep the relation state consistent, and to maintain independence among the attributes involved. This constraint is specified by a multi-valued dependency.
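The repetition described above is easy to see with a small example. In this Python sketch the employee, project and dependent names are invented; the point is only that two independent multi-valued facts force a row for every combination, and that splitting them into two relations removes the repetition.

```python
from itertools import product

# Hypothetical employee with two independent multi-valued facts.
employee = "Smith"
projects = ["P1", "P2", "P3"]
dependents = ["Anna", "John"]

# One relation holding both facts needs every project paired with every dependent.
emp = [(employee, p, d) for p, d in product(projects, dependents)]

# Splitting on the multi-valued dependencies stores each fact once.
emp_projects = sorted({(e, p) for e, p, _ in emp})
emp_dependents = sorted({(e, d) for e, _, d in emp})

print(len(emp), len(emp_projects), len(emp_dependents))
```

Adding a fourth project to the combined relation would take two new rows (one per dependent); in the split design it takes one.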

    Normalization using join dependencies

    Join dependency: The 5NF is also called "Project Join Normal form". It is important to note that normalization into 5NF is considered very rarely in practice.

    Definition: a relation R is in 5NF if, for all join dependencies (R1, R2, ..., Rn), at least one of the following holds:

    (R1, R2, ..., Rn) is a trivial join dependency.

    Every Ri is a superkey (that is, it contains a candidate key) for R.

    Q. 6 What do you mean by Shared Lock & Exclusive lock? Describe briefly two phase locking protocol?

    A lock is a restriction on access to data in a multi-user environment. It prevents multiple users from changing the same data simultaneously. If locking is not used, data within the database may become logically incorrect and may produce unexpected results.

    Shared Locks


    A shared lock is used for read-only operations, i.e., operations that do not change or update the data.

    E.g., a SELECT statement.

    Shared locks allow concurrent transactions to read (SELECT) the same data. No other transaction can modify the data while shared locks exist. Shared locks are released as soon as the data has been read.

    Exclusive Locks

    Exclusive locks are used for data modification operations, such as UPDATE, DELETE and INSERT. They ensure that multiple updates cannot be made to the same resource simultaneously. No other transaction can read or modify data that is locked by an exclusive lock.

    Exclusive locks are held until the transaction commits or rolls back, since they are used for write operations.

    There are three locking operations: read_lock(X), write_lock(X), and unlock(X). A lock associated with an item X, LOCK(X), now has three possible states: "read-locked", "write-locked", or "unlocked". A read-locked item is also called share-locked, because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item.

    Each record in the lock table has four fields. The value (state) of LOCK is either read-locked or write-locked.
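The three operations and three states can be modelled with a small in-memory lock table. This Python sketch is a simplification under stated assumptions: the class and its record layout are hypothetical, a request that cannot be granted returns False instead of making the transaction wait, and upgrading a read lock to a write lock is not handled.

```python
class LockTable:
    """Toy lock table: item -> {"state": ..., "holders": set of transactions}."""

    def __init__(self):
        self._locks = {}

    def state(self, item):
        entry = self._locks.get(item)
        return entry["state"] if entry else "unlocked"

    def read_lock(self, item, txn):
        entry = self._locks.get(item)
        if entry is None:
            self._locks[item] = {"state": "read-locked", "holders": {txn}}
            return True
        if entry["state"] == "read-locked":
            entry["holders"].add(txn)  # shared: several readers may hold it
            return True
        return False  # write-locked: the reader would have to wait

    def write_lock(self, item, txn):
        if item in self._locks:
            return False  # any existing lock blocks an exclusive request
        self._locks[item] = {"state": "write-locked", "holders": {txn}}
        return True

    def unlock(self, item, txn):
        entry = self._locks.get(item)
        if entry is None or txn not in entry["holders"]:
            return
        entry["holders"].discard(txn)
        if not entry["holders"]:
            del self._locks[item]  # last holder gone: item is unlocked again


locks = LockTable()
print(locks.read_lock("X", "T1"), locks.read_lock("X", "T2"))  # both readers granted
print(locks.write_lock("X", "T3"), locks.state("X"))           # writer refused
```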

    The Two Phase Locking Protocol

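    The two-phase locking protocol is a discipline for acquiring and releasing locks on shared resources so that concurrent transactions produce a serializable schedule. It does not by itself prevent deadlocks, which are discussed below. The protocol consists of two phases.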

    1. Growing phase: In this phase the transaction may acquire locks, but may not release any locks. Therefore this phase is also called the resource acquisition activity.

    2. Shrinking phase: In this phase the transaction may release locks, but may not acquire any new locks. Two activities, modifying the data and releasing the locks, are grouped together to form this second phase.

    In the beginning, the transaction is in the growing phase. Whenever a lock is needed, the transaction acquires it. As soon as it releases its first lock, the transaction enters the shrinking phase and may not issue any new lock requests.
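The two phases can be enforced with a simple state flag on the transaction. The Python class below is a hypothetical sketch of that rule only; it tracks which items a single transaction holds and does not model conflicts with other transactions.

```python
class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.phase = "growing"
        self.held = set()

    def lock(self, item):
        if self.phase != "growing":
            raise RuntimeError(
                f"{self.name} is in the shrinking phase and cannot lock {item}"
            )
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        # The first release ends the growing phase for good.
        self.phase = "shrinking"


t1 = TwoPhaseTransaction("T1")
t1.lock("X")
t1.lock("Y")      # still growing: more locks are allowed
t1.unlock("X")    # first release: T1 is now shrinking

try:
    t1.lock("Z")  # violates the protocol
    rejected = False
except RuntimeError:
    rejected = True

print(t1.phase, rejected)
```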


    Strict two phase locking

    The basic two-phase locking protocol does not avoid cascading rollbacks. To avoid them, a slight modification is made to the protocol, called strict two-phase locking. Under strict two-phase locking, all the locks acquired by the transaction are held until the transaction commits.

    Deadlock & starvation: In a deadlock state there exists a set of transactions in which every transaction in the set is waiting for another transaction in the set.

    Suppose there exists a set of waiting transactions {T1, T2, T3, ..., Tn} such that T1 is waiting for a data item held by T2, T2 for one held by T3, and so on, and Tn is waiting for one held by T1. In this state none of the transactions can make progress.
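A circular wait of this kind can be detected by following the "is waiting for" links until they either run out or come back to the start. The Python sketch below assumes, for simplicity, that each transaction waits for at most one other transaction; the function name and the dictionary representation are hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a waiting cycle, or None.

    waits_for maps a transaction to the single transaction it is waiting for.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain until it ends or revisits a transaction.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current == start:
            return seen  # the chain came back to where it began
    return None


# T1 waits for T2, T2 for T3, and T3 for T1: nobody can proceed.
deadlocked = {"T1": "T2", "T2": "T3", "T3": "T1"}
# T3 is not waiting for anyone, so it can finish and release its locks.
not_deadlocked = {"T1": "T2", "T2": "T3"}

print(find_deadlock(deadlocked))
print(find_deadlock(not_deadlocked))
```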