RDBMS Basic Concepts

  • 8/13/2019 RDBMS Basic Concepts

    Database

    To find out what a database is, we have to start from data, which is the basic building block of any DBMS.

    Data: Facts, figures, statistics etc. having no particular meaning on their own (e.g. 1, ABC, 19).

    Record: A collection of related data items. In the above example the three data items had no meaning. But if we organize them in the following way, they collectively represent meaningful information.

    Roll  Name  Age
    1     ABC   19

    Table or Relation: A collection of related records.

    Roll  Name  Age
    1     ABC   19
    2     DEF   22
    3     XYZ   28

    The columns of this relation are called Fields or Attributes (some texts loosely call them Domains, although strictly a domain is the set of values an attribute may take). The rows are called Tuples or Records.

    Database: A collection of related relations. Consider the following collection of tables:

    T1

    Roll  Name  Age
    1     ABC   19
    2     DEF   22
    3     XYZ   28

    T2

    Roll  Address
    1     KOL
    2     DEL
    3     MUM

    T3

    Roll  Year
    1     I
    2     II
    3     I

    T4

    Year  Hostel
    I     H1
    II    H2

    We now have a collection of 4 tables. They can be called a related collection because each table shares a common attribute with at least one other table (Roll links T1, T2 and T3; Year links T3 and T4). Because of these common attributes we may combine the data of two or more tables to find out the complete details of a student. Questions like "Which hostel does the youngest student live in?" can be answered now, although the Age and Hostel attributes are in different tables.

    In a database, data is organized strictly in row and column format. The rows are called Tuples or Records. The data items within one row may belong to different data types. The columns, on the other hand, are often called Attributes (or, loosely, Domains). All the data items within a single attribute are of the same data type.
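    As a minimal sketch of how the common attributes let us answer the hostel question, the tables T1, T3 and T4 can be loaded into an in-memory SQLite database and joined on Roll and Year. The column types and the choice of SQLite are assumptions made only for illustration; the source does not name a particular DBMS.

```python
import sqlite3

# In-memory database holding the sample tables T1, T3 and T4 from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
CREATE TABLE T3 (Roll INTEGER PRIMARY KEY, Year TEXT);
CREATE TABLE T4 (Year TEXT PRIMARY KEY, Hostel TEXT);
INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
INSERT INTO T3 VALUES (1, 'I'), (2, 'II'), (3, 'I');
INSERT INTO T4 VALUES ('I', 'H1'), ('II', 'H2');
""")

# Follow the common attributes: T1.Roll -> T3.Roll, then T3.Year -> T4.Year.
youngest_hostel = conn.execute("""
    SELECT T1.Name, T4.Hostel
    FROM T1
    JOIN T3 ON T3.Roll = T1.Roll
    JOIN T4 ON T4.Year = T3.Year
    ORDER BY T1.Age ASC
    LIMIT 1
""").fetchone()

print(youngest_hostel)  # ('ABC', 'H1')
```

    Age lives in T1 and Hostel in T4, yet one query reaches both because Roll and Year act as the linking attributes.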

    What is a Management System?

    A management system is a set of rules and procedures which help us to create, organize and manipulate the database. It also helps us to add, modify and delete data items in the database. The management system can be either manual or computerized.

    The management system is important because without some kind of rules and regulations it is not possible to maintain the database. We have to select the particular attributes which should be included in a particular table; the common attributes that create a relationship between two tables; which tables have to be handled if a record has to be inserted or deleted; and so on. These issues must be resolved by having some kind of rules to follow in order to maintain the integrity of the database.

    Three Views of Data

    We know that the same thing, if viewed from different angles, produces different sights. Likewise, the database that we have created already can reveal different aspects if seen from different levels of abstraction. The term Abstraction is very important here. Generally it means the amount of detail you want to hide. Any entity can be seen from different perspectives and levels of complexity, each revealing a different amount of detail. Let us illustrate with a simple example.

    A computer reveals the minimum of its internal details when seen from outside. We do not know what parts it is built with. This is the highest level of abstraction, meaning very few details are visible. If we open the computer case and look inside at the hard disc, motherboard, CD drive, CPU and RAM, we are at the middle level of abstraction. If we move on to open the hard disc and examine its tracks, sectors and read-write heads, we are at the lowest level of abstraction, where no details are hidden.

    In the same manner, the database can also be viewed from different levels of abstraction to reveal different levels of detail. Working bottom-up, we find that there are three levels of abstraction, or views, in the database. We discuss them here.

    The word schema means arrangement: how we want to arrange the things that we have to store. The diagram above shows the three different schemas used in DBMS, seen from different levels of abstraction.

    The lowest level, called the Internal or Physical Schema, deals with the description of how raw data items (like 1, ABC, KOL, H2 etc.) are stored in the physical storage (hard disc, CD, tape drive etc.). It also describes the data type of these data items, the size of the items in the storage media, the location (physical address) of the items in the storage device and so on. This schema is useful for database application developers and database administrators.

    The middle level is known as the Conceptual or Logical Schema, and deals with the structure of the entire database. Please note that at this level we are no longer interested in the raw data items; we are interested in the structure of the database. This means we want to know about the attributes of each table, the common attributes in different tables that help them to be combined, what kind of data can be entered into these attributes, and so on. The Conceptual or Logical Schema is very useful for database administrators, whose responsibility is to maintain the entire database.

    The highest level of abstraction is the External or View Schema. This is targeted at the end users. An end user does not need to know everything about the structure of the entire database, only the amount of detail he/she needs to work with. We may not want the end user to be confused by an overwhelming amount of detail by allowing him/her to look at the entire database, and we may also disallow this for the purpose of security, where sensitive information must remain hidden from unauthorized persons. The database administrator may want to create custom-made tables, keeping in mind the specific needs of each user. These tables are also known as virtual tables, because they have no separate physical existence. They are created dynamically for the users at runtime.

    Say, for example, that in the sample database we created earlier we have a special officer whose responsibility is to keep in touch with the parents of any under-age student living in the hostels. That officer does not need to know any detail except the Roll, Name, Address and Age. The database administrator may create a virtual table with only these four attributes, only for the use of this officer.
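    A virtual table of this kind is usually realised as a view. The following is a minimal sketch using SQLite, under the assumption that the officer's four attributes come from T1 and T2; the view name OfficerView is hypothetical and does not appear in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
CREATE TABLE T2 (Roll INTEGER PRIMARY KEY, Address TEXT);
INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
INSERT INTO T2 VALUES (1, 'KOL'), (2, 'DEL'), (3, 'MUM');

-- Hypothetical view: stores no data of its own, only the defining query.
CREATE VIEW OfficerView AS
    SELECT T1.Roll AS Roll, T1.Name AS Name,
           T2.Address AS Address, T1.Age AS Age
    FROM T1 JOIN T2 ON T2.Roll = T1.Roll;
""")

# The officer queries the view exactly like an ordinary table.
cursor = conn.execute("SELECT * FROM OfficerView ORDER BY Roll")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
print(view_rows)
```

    The rows are computed from T1 and T2 each time the view is queried, which is what "no separate physical existence" means in practice.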

    Data Independence

    This brings us to our next topic: data independence. It is the property of the database which tries to ensure that if we make a change at any level of schema of the database, the schema immediately above it requires minimal or no change.

    What does this mean? We know that in a building, each floor stands on the floor below it. If we change the design of any one floor, e.g. extending the width of a room by demolishing the western wall of that room, it is likely that the design of the floors above will have to be changed also. As a result, one change needed on one particular floor would mean changing the design of each floor until we reach the top floor, with an increase in time, cost and labour. Would not life be easier if the change could be contained in one floor only? Data independence is the answer to this. It removes the additional work needed to carry a single change into all the levels above.

    Data independence can be classified into the following two types:

    1. Physical Data Independence: This means that for any change made in the physical schema, the need to change the logical schema is minimal. This is practically easier to achieve. Let us explain with an example.

    Say you have bought an audio CD of a recently released film and one of your friends has bought an audio cassette of the same film. If we consider the physical schema, they are entirely different. The first is a digital recording on an optical medium, where random access is possible. The second is a magnetic recording on a magnetic medium, with strictly sequential access. However, how this change is reflected in the logical schema is very interesting. For music tracks, the logical schema for both the CD and the cassette is the title card imprinted on their back. We have information like Track No., Name of the Song, Name of the Artist and Duration of the Track, which are identical for both the CD and the cassette. We can clearly say that we have achieved physical data independence here.
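    In database terms, a typical physical-level change is adding an index. The sketch below, which assumes SQLite and a hypothetical index name idx_t1_age, runs the same logical query before and after the index is created and compares the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
""")

# A query written purely against the logical schema (table and column names).
query = "SELECT Roll, Name FROM T1 WHERE Age > 20 ORDER BY Roll"
before = conn.execute(query).fetchall()

# Physical-level change: add an index on Age (hypothetical name).
conn.execute("CREATE INDEX idx_t1_age ON T1 (Age)")
after = conn.execute(query).fetchall()

print(before == after)  # True
```

    The query text and its result are unchanged; only the way the engine may locate the rows has changed.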

    2. Logical Data Independence: This means that for any change made in the logical schema, the need to change the external schema is minimal. As we shall see, this is a little more difficult to achieve. Let us explain with an example.

    Suppose the CD you have bought contains 6 songs, and some of your friends are interested in copying some of those songs (the ones they like in the film) into their favorite collections. One friend wants songs 1, 2, 4, 5, 6, another wants 1, 3, 4, 5 and another wants 1, 2, 3, 6. Each of these collections can be compared to a view schema for that friend. Now, by some mistake, a scratch has appeared on the CD and you cannot extract song 3. Obviously, you will have to ask the friends who have song 3 in their proposed collection to alter their view by deleting song 3 from it.

    Database Administrator

    The Database Administrator, better known as DBA, is the person (or a group of persons) responsible for the well-being of the database management system. S/he has the following functions and responsibilities regarding database management:

    1. Definition of the schema, the architecture of the three levels of data abstraction, and data independence.

    2. Modification of the defined schema as and when required.

    3. Definition of the storage structure and the access method of the stored data, i.e. sequential, indexed or direct.

    4. Creating new user-ids, passwords etc., and also defining the access permissions that each user can or cannot enjoy. The DBA is responsible for creating user roles, which are collections of the permissions (like read, write etc.) granted to or withheld from a class of users. S/he can also grant additional permissions to and/or revoke existing permissions from a user if need be.

    5. Defining the integrity constraints for the database to ensure that the data entered conform to some rules, thereby increasing the reliability of data.

    6. Creating a security mechanism to prevent unauthorized access and accidental or intentional handling of data that can cause a security threat.

    7. Creating a backup and recovery policy. This is essential because in case of a failure the database must be able to revive itself to its complete functionality with no loss of data, as if the failure had never occurred. It is essential to keep regular backups of the data so that if the system fails, all data up to the point of failure will be available from stable storage. Only the data gathered during the failure would have to be fed to the database to recover it to a healthy status.

    Advantages and Disadvantages of Database Management System

    We must evaluate whether there is any gain in using a DBMS over a situation where we do not use it. Let us summarize the advantages.

    1. Reduction of Redundancy: This is perhaps the most significant advantage of using DBMS. Redundancy is the problem of storing the same data item in more than one place. Redundancy creates several problems, like requiring extra storage space, entering the same data more than once during insertion, and deleting data from more than one place during deletion. Anomalies may occur in the database if insertion, deletion etc. are not done properly.

    2. Sharing of Data: In paper-based record keeping, data cannot easily be shared among many users. But in a computerized DBMS, many users can share the same database if they are connected via a network.

    3. Data Integrity: We can maintain data integrity by specifying integrity constraints, which are rules and restrictions about what kind of data may be entered or manipulated within the database. This increases the reliability of the database, as data that violates the declared rules cannot enter the database at any point of time.

    4. Data Security: We can restrict certain people from accessing the database, or allow them to see a certain portion of the database while blocking sensitive information. This is not easily possible in paper-based record keeping.
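    The integrity constraints mentioned in point 3 can be sketched as follows. The specific rules (a non-empty Name, an Age between 5 and 120) are assumptions chosen for illustration and are not taken from the source; SQLite is again used only as a convenient example engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE T1 (
        Roll INTEGER PRIMARY KEY,                    -- no duplicate roll numbers
        Name TEXT NOT NULL,                          -- a name must be supplied
        Age  INTEGER CHECK (Age BETWEEN 5 AND 120)   -- assumed plausible range
    )
""")
conn.execute("INSERT INTO T1 VALUES (1, 'ABC', 19)")

# Each candidate row breaks exactly one declared rule.
candidates = [
    (2, 'DEF', -4),   # violates the CHECK on Age
    (1, 'XYZ', 28),   # duplicates the primary key Roll = 1
    (3, None, 22),    # violates NOT NULL on Name
]
rejected = []
for row in candidates:
    try:
        conn.execute("INSERT INTO T1 VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

stored = conn.execute("SELECT * FROM T1").fetchall()
print(stored)  # [(1, 'ABC', 19)]
```

    Note that constraints only reject data that breaks a declared rule; a wrong but plausible value, such as an Age of 91 typed instead of 19, would still be accepted.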

    However, there could be a few disadvantages of using DBMS. They are as follows:

    1. As a DBMS needs computers, we have to invest a good amount in acquiring the hardware, software, installation facilities and training of users.

    2. We have to keep regular backups because a failure can occur at any time. Taking a backup can be a lengthy process, during which the computer system may be unavailable for other jobs.

    3. While the data security system is a boon of using DBMS, it must be very robust. If someone can bypass the security system then the database would become open to any kind of mishandling.

    Un-Normalized Form (UNF)

    If a table contains non-atomic values in its rows, it is said to be in UNF. An atomic value is something that cannot be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:

    Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
    E01     AA        Jan    1000   B01      SBI
                      Feb    1200
                      Mar     850
    E02     BB        Jan    2200   B02      UTI
                      Feb    2500
    E03     CC        Jan    1700   B01      SBI
                      Feb    1800
                      Mar    1850
                      Apr    1725

    In the sample table above, there are multiple occurrences of rows under each key Emp-Id. Although considered to be the primary key, Emp-Id cannot uniquely identify any single row. Further, each primary key points to a variable length record (3 rows for E01, 2 for E02 and 4 for E03).

    First Normal Form (1NF)

    A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a unique combination of values. The above table in UNF can be processed to create the following table in 1NF.

    Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
    E01     AA        Jan    1000   B01      SBI
    E01     AA        Feb    1200   B01      SBI
    E01     AA        Mar     850   B01      SBI
    E02     BB        Jan    2200   B02      UTI
    E02     BB        Feb    2500   B02      UTI
    E03     CC        Jan    1700   B01      SBI
    E03     CC        Feb    1800   B01      SBI
    E03     CC        Mar    1850   B01      SBI
    E03     CC        Apr    1725   B01      SBI

    As you can see now, each row contains a unique combination of values (in particular, Emp-Id together with Month identifies exactly one row). Unlike in UNF, this relation contains only atomic values, i.e. the values in each row cannot be further decomposed, so the relation is now in 1NF.
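    The UNF to 1NF step amounts to repeating the employee details once for every month of sales. A minimal sketch in Python follows; modelling the repeating group as a list under a "Sales" key, and the function name to_1nf, are assumptions made for illustration.

```python
# UNF: each employee record carries a repeating group of (month, sales) pairs.
unf = [
    {"Emp-Id": "E01", "Emp-Name": "AA", "Bank-Id": "B01", "Bank-Name": "SBI",
     "Sales": [("Jan", 1000), ("Feb", 1200), ("Mar", 850)]},
    {"Emp-Id": "E02", "Emp-Name": "BB", "Bank-Id": "B02", "Bank-Name": "UTI",
     "Sales": [("Jan", 2200), ("Feb", 2500)]},
    {"Emp-Id": "E03", "Emp-Name": "CC", "Bank-Id": "B01", "Bank-Name": "SBI",
     "Sales": [("Jan", 1700), ("Feb", 1800), ("Mar", 1850), ("Apr", 1725)]},
]


def to_1nf(records):
    """Flatten the repeating group so that every row holds only atomic values."""
    rows = []
    for rec in records:
        for month, sales in rec["Sales"]:
            rows.append((rec["Emp-Id"], rec["Emp-Name"], month, sales,
                         rec["Bank-Id"], rec["Bank-Name"]))
    return rows


rows_1nf = to_1nf(unf)

# (Emp-Id, Month) is unique across the flattened rows.
keys = {(row[0], row[2]) for row in rows_1nf}
print(len(rows_1nf), len(keys))  # 9 9
```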

    Second Normal Form (2NF)

    A relation is said to be in 2NF if it is already in 1NF and every non-key attribute fully depends on the primary key of the relation. Speaking inversely, if a table has some attribute which is not dependent on the whole primary key of that table, then it is not in 2NF.

    Let us explain. Emp-Id is the key attribute of the above relation (strictly, since Emp-Id repeats, a single row is identified by Emp-Id together with Month). Emp-Name, Month, Sales and Bank-Id all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion is moved into another related relation, the dependency that bypasses the key is removed from the table. Note that Emp-Name and Bank-Id depend on Emp-Id alone and not on Month, so a stricter decomposition would also move them into a separate employee relation.

    Emp-Id  Emp-Name  Month  Sales  Bank-Id
    E01     AA        Jan    1000   B01
    E01     AA        Feb    1200   B01
    E01     AA        Mar     850   B01
    E02     BB        Jan    2200   B02
    E02     BB        Feb    2500   B02
    E03     CC        Jan    1700   B01
    E03     CC        Feb    1800   B01
    E03     CC        Mar    1850   B01
    E03     CC        Apr    1725   B01

    Bank-Id  Bank-Name
    B01      SBI
    B02      UTI

    After moving this portion into another relation we store a smaller amount of data in two relations without any loss of information. There is also a significant reduction in redundancy.
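    The claim that the split loses no information can be checked mechanically: project the 1NF rows onto the two new relations, join them back on Bank-Id, and compare with the original. This is a minimal sketch; the variable names are hypothetical.

```python
# 1NF rows: (Emp-Id, Emp-Name, Month, Sales, Bank-Id, Bank-Name)
rows_1nf = [
    ("E01", "AA", "Jan", 1000, "B01", "SBI"),
    ("E01", "AA", "Feb", 1200, "B01", "SBI"),
    ("E01", "AA", "Mar", 850, "B01", "SBI"),
    ("E02", "BB", "Jan", 2200, "B02", "UTI"),
    ("E02", "BB", "Feb", 2500, "B02", "UTI"),
    ("E03", "CC", "Jan", 1700, "B01", "SBI"),
    ("E03", "CC", "Feb", 1800, "B01", "SBI"),
    ("E03", "CC", "Mar", 1850, "B01", "SBI"),
    ("E03", "CC", "Apr", 1725, "B01", "SBI"),
]

# Projection 1: everything except Bank-Name.
employee_sales = [row[:5] for row in rows_1nf]

# Projection 2: each (Bank-Id, Bank-Name) pair stored once.
banks = sorted({(row[4], row[5]) for row in rows_1nf})

# Join back on Bank-Id to confirm that nothing was lost.
bank_name_of = dict(banks)
rejoined = [row + (bank_name_of[row[4]],) for row in employee_sales]

print(banks)                 # [('B01', 'SBI'), ('B02', 'UTI')]
print(rejoined == rows_1nf)  # True
```

    Bank-Name was stored nine times before the split and is stored twice afterwards, which is the reduction in redundancy described above.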

    Third Normal Form (3NF)

    A relation is said to be in 3NF if it is already in 2NF and there exists no transitive dependency in that relation. Speaking inversely, if a table contains a transitive dependency, then it is not in 3NF, and the table must be split to bring it into 3NF.

    What is a transitive dependency? Within a relation, if we see

    A → B [B depends on A]

    and

    B → C [C depends on B]

    then we may derive

    A → C [C depends on A]

    Such derived dependencies hold well in most situations. For example, if we have

    Roll → Marks

    and

    Marks → Grade

    then we may safely derive Roll → Grade. This third dependency was not originally specified, but we have derived it.

    The derived dependency is called a transitive dependency when C depends on A only through B, that is, when C is really a fact about B and not about A. For example, suppose we have been given

    Roll → City

    and

    City → STDCode

    If we try to derive Roll → STDCode, it becomes a transitive dependency, because obviously the STDCode of a city is determined by the city and not by the roll number issued by a school or college. In such a case the relation should be broken into two, each containing one of these two dependencies:

    Roll → City

    and

    City → STDCode
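    Deriving dependencies in this way is usually done with the attribute closure algorithm: start from a set of attributes and keep adding whatever the known dependencies allow until nothing changes. The sketch below is a standard textbook formulation rather than anything given in the source; the function name closure and the tuple encoding of dependencies are assumptions.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes`.

    `dependencies` is a list of (lhs, rhs) pairs, each a tuple of names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know the whole left side, we learn the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [(("Roll",), ("City",)), (("City",), ("STDCode",))]

roll_closure = closure({"Roll"}, fds)
city_closure = closure({"City"}, fds)

print(sorted(roll_closure))  # ['City', 'Roll', 'STDCode']
print(sorted(city_closure))  # ['City', 'STDCode']
```

    STDCode appears in the closure of Roll only because City is reached first, which is exactly the transitive path Roll → City → STDCode.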

    Boyce-Codd Normal Form (BCNF)

    A relation is said to be in BCNF if it is already in 3NF and the left hand side of every dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. There could be some situations where a 3NF relation is not in BCNF, namely when the following conditions are found true:

    1. The candidate keys are composite.

    2. There is more than one candidate key in the relation.

    3. The candidate keys have some attributes in common.

    Professor Code  Department   Head of Dept.  Percent Time
    P1              Physics      Ghosh          50
    P1              Mathematics  Krishnan       50
    P2              Chemistry    Rao            25
    P2              Physics      Ghosh          75
    P3              Mathematics  Krishnan       100

    Consider, as an example, the above relation. It is assumed that:

    1. A professor can work in more than one department.

    2. The percentage of the time he spends in each department is given.

    3. Each department has only one Head of Department.

    The relation diagram for the above relation is given as the following:

    [Figure: dependency diagram for the above relation]

    The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that Rao is the Head of Department of Chemistry.

    The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. from the given relation. The normalized relations are shown in the following.

    Professor Code  Department   Percent Time
    P1              Physics      50
    P1              Mathematics  50
    P2              Chemistry    25
    P2              Physics      75
    P3              Mathematics  100

    Department   Head of Dept.
    Physics      Ghosh
    Mathematics  Krishnan
    Chemistry    Rao

    See the dependency diagrams for these new relations.
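    The deletion problem described for Professor P2 can be reproduced in a few lines. This is a minimal sketch with hypothetical variable names: it removes P2 from the single relation and from the decomposed pair, then checks whether the head of Chemistry can still be found.

```python
# Single relation: (Professor Code, Department, Head of Dept., Percent Time)
original = [
    ("P1", "Physics", "Ghosh", 50),
    ("P1", "Mathematics", "Krishnan", 50),
    ("P2", "Chemistry", "Rao", 25),
    ("P2", "Physics", "Ghosh", 75),
    ("P3", "Mathematics", "Krishnan", 100),
]

# Decomposition: professor/department/time, and department/head.
prof_dept = [(prof, dept, pct) for (prof, dept, head, pct) in original]
dept_head = {dept: head for (prof, dept, head, pct) in original}

# Professor P2 resigns: delete P2's rows wherever they occur.
original_after = [row for row in original if row[0] != "P2"]
prof_dept_after = [row for row in prof_dept if row[0] != "P2"]

# In the single relation the Chemistry row is gone, and with it the head.
heads_left_in_original = {dept: head for (prof, dept, head, pct) in original_after}

print("Chemistry" in heads_left_in_original)  # False
print(dept_head["Chemistry"])                 # Rao
```

    In the decomposed design the department/head relation is untouched by the resignation, so the fact that Rao heads Chemistry survives.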

    Fourth Normal Form (4NF)

    When attributes in a relation have multi-valued dependency, further normalization to 4NF and 5NF is required. Let us first find out what a multi-valued dependency is.

    A multi-valued dependency is a kind of dependency in which each and every attribute within a relation depends upon the others, yet none of them is a unique primary key. We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. The following are the assumptions:

    1. A vendor is capable of supplying many items.

    2. A project uses many items.

    3. A vendor supplies to many projects.

    4. An item may be supplied by many vendors.

    A multi-valued dependency exists here because all the attributes depend upon the others and yet none of them is a primary key having a unique value.

    Vendor Code  Item Code  Project No.
    V1           I1         P1
    V1           I2         P1
    V1           I1         P3
    V1           I2         P3
    V2           I2         P1
    V2           I3         P1
    V3           I1         P2
    V3           I1         P3

    The given relation has a number of problems. For example:

    1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for item code has to be introduced.

    2. The information about item I1 is stored twice for vendor V3.

    Observe that the relation given is in 3NF and also in BCNF. It still has the problems mentioned above. The problem is reduced by expressing this relation as two relations in the Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent multi-valued dependency, or one independent multi-valued dependency with a functional dependency.

    The table can be expressed as the two 4NF relations given below. The fact that vendors are capable of supplying certain items and the fact that they are assigned to supply for some projects are independently specified in the 4NF relations.

    Vendor-Supply

    Vendor Code  Item Code
    V1           I1
    V1           I2
    V2           I2
    V2           I3
    V3           I1

    Vendor-Project

    Vendor Code  Project No.
    V1           P1
    V1           P3
    V2           P1
    V3           P2
    V3           P3
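    The two 4NF relations are simply projections of the three-column relation, and for this sample data joining them back on Vendor Code reproduces it exactly. The sketch below checks this; the variable names are hypothetical.

```python
# Original relation: (Vendor Code, Item Code, Project No.)
supplies = [
    ("V1", "I1", "P1"), ("V1", "I2", "P1"),
    ("V1", "I1", "P3"), ("V1", "I2", "P3"),
    ("V2", "I2", "P1"), ("V2", "I3", "P1"),
    ("V3", "I1", "P2"), ("V3", "I1", "P3"),
]

# Project onto (Vendor, Item) and (Vendor, Project), dropping duplicates.
vendor_supply = sorted({(v, i) for (v, i, p) in supplies})
vendor_project = sorted({(v, p) for (v, i, p) in supplies})

# Natural join of the two projections on Vendor Code.
rejoined = sorted(
    (v, i, p)
    for (v, i) in vendor_supply
    for (v2, p) in vendor_project
    if v == v2
)

print(vendor_supply)
print(vendor_project)
print(rejoined == sorted(supplies))  # True
```

    Vendor V3's item I1 now appears once in the vendor/item projection instead of twice, and a vendor can be assigned to a project without naming an item.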

    Fifth Normal Form (5NF)

    These relations still have a problem. While defining the 4NF we mentioned that all the attributes depend upon each other. While creating the two tables in the 4NF, although we have preserved the dependencies between Vendor Code and Item Code in the first table and Vendor Code and Project No. in the second table, we have lost the relationship between Item Code and Project No. If there were a primary key then this loss of dependency would not have occurred. In order to revive this relationship we must add a new table like the following. Please note that during the entire process of normalization, this is the only step where a new table is created by joining two attributes, rather than splitting them into separate tables.

    Project No.  Item Code
    P1           I1
    P1           I2
    P2           I1
    P3           I1
    P3           I3

    Let us finally summarize the normalization steps we have discussed so far.

    Input          Transformation                                  Output
    Relation                                                       Relation
    -------------  ----------------------------------------------  --------
    All relations  Eliminate variable length records. Remove       1NF
                   multi-attribute lines in table.
    1NF relation   Remove dependency of non-key attributes on      2NF
                   part of a multi-attribute key.
    2NF            Remove dependency of non-key attributes on      3NF
                   other non-key attributes.
    3NF            Remove dependency of an attribute of a          BCNF
                   multi-attribute key on an attribute of
                   another (overlapping) multi-attribute key.
    BCNF           Remove more than one independent multi-valued   4NF
                   dependency from relation by splitting
                   relation.
    4NF            Add one relation relating attributes with       5NF
                   multi-valued dependency.