Database 2nd Semester


  • 7/28/2019 Database 2nd Semester


    Data: Data consists of a series of facts or statements that may have been collected, stored, processed and/or manipulated but have not been organized or placed into context. When data is organized, it becomes information. Information can be processed and used to draw generalized conclusions or knowledge.

    Examples:

    A file listing all of the orders placed through an online service is an example of data. If we sort the data by ZIP code and summarize the number of orders that come from each city, we have created information. We can create knowledge by taking this information and making statements such as "Most orders for Widget X come from the northeastern United States."
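    A minimal Python sketch of the orders example, assuming the orders are held as simple tuples. The order records, ZIP codes and city names are invented for illustration. Sorting and counting the raw rows (data) produces a per-city summary (information).

```python
from collections import Counter

# Hypothetical raw data: one tuple per order (order_id, product, zip_code, city).
orders = [
    (1, "Widget X", "02108", "Boston"),
    (2, "Widget X", "10001", "New York"),
    (3, "Widget Y", "94103", "San Francisco"),
    (4, "Widget X", "02108", "Boston"),
    (5, "Widget X", "10001", "New York"),
]

# Sort the data by ZIP code, as described in the text.
orders_by_zip = sorted(orders, key=lambda order: order[2])

# Summarize the number of orders per city: this summary is information.
orders_per_city = Counter(city for _, _, _, city in orders_by_zip)

for city, count in sorted(orders_per_city.items()):
    print(f"{city}: {count} order(s)")
```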

    Metadata: Metadata is literally "data about data." This term refers to information about the data itself, such as the origin, size, formatting or other characteristics of a data item. In the database field, metadata is essential to understanding and interpreting the contents of a data warehouse.
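    To make "data about data" concrete, the sketch below uses Python's built-in sqlite3 module (an assumption; any DBMS has an equivalent catalog). The customer table is hypothetical. The inserted row is data; `PRAGMA table_info` returns metadata describing each column: its name, declared type, whether NULL is allowed, and whether it is part of the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, zip_code TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ann', '02108')")  # this row is data

# PRAGMA table_info returns one row of metadata per column:
# (cid, name, declared type, notnull flag, default value, pk flag).
metadata = conn.execute("PRAGMA table_info(customer)").fetchall()
for cid, name, col_type, notnull, default, pk in metadata:
    print(name, col_type, "NOT NULL" if notnull else "NULL allowed", "PK" if pk else "")
```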

    Database: A database is a collection of information organized into interrelated tables of data and specifications of data objects.

    Database - Advantages & Disadvantages

    Advantages

    Reduced data redundancy

    Reduced updating errors and increased consistency

    Greater data integrity and independence from application programs

    Improved data access for users through the use of host and query languages

    Improved data security

    Reduced data entry, storage, and retrieval costs

    Facilitated development of new application programs

    Disadvantages

    Database systems are complex, difficult, and time-consuming to design

    Substantial hardware and software start-up costs

    Damage to the database affects virtually all application programs

    Extensive conversion costs in moving from a file-based system to a database system

    Initial training required for all programmers and users


    Hierarchical Model: The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data is stored in a series of records, each of which has a set of field values attached to it. The model collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses parent-child relationships. These are 1:N mappings between record types, implemented using trees, much as the relational model "borrowed" set theory from mathematics. For example, an organization might store information about an employee, such as name, employee number, department, and salary. The organization might also store information about an employee's children, such as name and date of birth. The employee and children data form a hierarchy, where the employee data represents the parent segment and the children data represents the child segment. If an employee has three children, then there would be three child segments associated with one employee segment. In a hierarchical database the parent-child relationship is one to many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information Management System (IMS) DBMS, through the 1970s.
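    The employee/children hierarchy can be sketched with plain Python objects. This only models the idea of parent and child segments; it is not how IMS itself stores or accesses segments, and the class names and sample values are hypothetical. Each child segment lives inside exactly one parent segment, which is the one-parent restriction described above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChildSegment:
    name: str
    date_of_birth: date


@dataclass
class EmployeeSegment:
    # Parent segment: one record of the "employee" record type.
    employee_number: int
    name: str
    department: str
    salary: int
    # 1:N parent-child link: children are stored under their parent,
    # so every child segment belongs to exactly one employee segment.
    children: list[ChildSegment] = field(default_factory=list)


root = EmployeeSegment(101, "Jane Doe", "Sales", 52000)
root.children.extend([
    ChildSegment("Amy", date(2010, 4, 2)),
    ChildSegment("Ben", date(2012, 9, 15)),
    ChildSegment("Cal", date(2015, 1, 30)),
])

# Navigational access: start at the parent and walk down to its children.
for child in root.children:
    print(root.name, "->", child.name, child.date_of_birth.isoformat())
```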

    Network Model: The popularity of the network data model coincided with the popularity of the hierarchical data model. Some data were more naturally modeled with more than one parent per child, so the network model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multiparent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set one record type is the owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.
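    A rough Python sketch of the set construct, assuming hypothetical STUDENT, COURSE and ENROLLMENT record types and set names TAKES and OFFERS. Real CODASYL systems such as IDMS are driven by their own data manipulation language, not by methods like these. ENROLLMENT acts as a junction record: it is a member of two sets, so each enrollment has two parents, and the many-to-many relationship between students and courses is expressed as two 1:M sets.

```python
class SetType:
    """A CODASYL-style set: an owner record type, a set name and a member record type."""

    def __init__(self, name, owner_type, member_type):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        # Maps each owner occurrence to its member occurrences (a 1:M link).
        self.occurrences = {}

    def connect(self, owner, member):
        self.occurrences.setdefault(owner, []).append(member)

    def members_of(self, owner):
        return list(self.occurrences.get(owner, []))

    def owner_of(self, member):
        # Within one set, a member occurrence has at most one owner.
        for owner, members in self.occurrences.items():
            if member in members:
                return owner
        return None


# ENROLLMENT is a member record type in two sets, so it has two parents.
takes = SetType("TAKES", owner_type="STUDENT", member_type="ENROLLMENT")
offers = SetType("OFFERS", owner_type="COURSE", member_type="ENROLLMENT")

for student, course in [("Ali", "DB101"), ("Ali", "OS201"), ("Sara", "DB101")]:
    record = f"ENR:{student}:{course}"
    takes.connect(student, record)
    offers.connect(course, record)

# Navigate the many-to-many relationship through the junction records.
db101_students = [takes.owner_of(e) for e in offers.members_of("DB101")]
print(db101_students)  # ['Ali', 'Sara']
```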

    Relational Model (RDBMS, relational database management system): A database based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database the data and relations between them are organised in tables. A table is a collection of records and each record in a table contains the same fields.

    Properties of Relational Tables:

    Values Are Atomic

    Each Row is Unique

    Column Values Are of the Same Kind

    The Sequence of Columns is Insignificant

    The Sequence of Rows is Insignificant

    Each Column Has a Unique Name


    Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Often, but not always, the fields will have the same name in both tables. For example, an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs, so to calculate a given customer's bill you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields. Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on relational algebra.
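    The bill calculation above can be sketched with Python's built-in sqlite3 module. The table layout follows the text; the product codes, prices and customer IDs are invented. Note that the link between the two tables appears only in the query (the JOIN ... ON clause), not in the table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL NOT NULL);
    CREATE TABLE orders (customer_id INTEGER NOT NULL, product_code TEXT NOT NULL);

    INSERT INTO products VALUES ('P1', 2.50), ('P2', 10.00), ('P3', 4.25);
    INSERT INTO orders VALUES (7, 'P1'), (7, 'P2'), (7, 'P1'), (8, 'P3');
""")

# The relationship is specified at retrieval time: the join matches
# orders.product_code against products.product_code.
bill = conn.execute("""
    SELECT SUM(p.price)
    FROM orders AS o
    JOIN products AS p ON p.product_code = o.product_code
    WHERE o.customer_id = ?
""", (7,)).fetchone()[0]

print(f"Customer 7 owes {bill:.2f}")
```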

    DBMS: Stands for "Database Management System." In short, a DBMS is a database program. Technically speaking, it is a software system that uses a standard method of cataloging, retrieving, and running queries on data. The DBMS manages incoming data, organizes it, and provides ways for the data to be modified or extracted by users or other programs.

    Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server, FileMaker, Oracle, dBASE, Clipper, and FoxPro. Since there are so many database management systems available, it is important for there to be a way for them to communicate with each other. For this reason, most database software comes with an Open Database Connectivity (ODBC) driver, which gives applications and other databases a standard way to access it. For example, common SQL statements such as SELECT and INSERT are written against the standard ODBC interface, and the driver translates them into the syntax of the particular database.

    DBMS Functions: There are several functions that a DBMS performs to ensure the integrity and consistency of data in the database. The ten functions of a DBMS are: data dictionary management, data storage management, data transformation and presentation, security management, multiuser access control, backup and recovery management, data integrity management, database access languages and application programming interfaces, database communication interfaces, and transaction management.

    1. Data Dictionary Management: The data dictionary is where the DBMS stores definitions of the data elements and their relationships (metadata). The DBMS uses this function to look up the required data component structures and relationships. When programs access data in a database, they go through the DBMS. This function removes structural and data dependency and provides the user with data abstraction, which in turn makes things much easier for the end user. The data dictionary is often hidden from the user and is used by database administrators and programmers.
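    In SQLite (used here through Python's sqlite3 module as an example DBMS), the catalog that plays the role of a data dictionary is the `sqlite_master` table, and relationships can be read back with `PRAGMA foreign_key_list`. The department and employee tables are hypothetical. The DBMS fills in these entries itself whenever a table or index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    CREATE INDEX idx_employee_dept ON employee(dept_id);
""")

# sqlite_master is SQLite's catalog: every table and index definition is
# recorded there, and it can be queried like an ordinary table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
for obj_type, name in catalog:
    print(obj_type, name)

# Relationships are recorded too: (id, seq, table, from, to, on_update, on_delete, match).
fks = conn.execute("PRAGMA foreign_key_list(employee)").fetchall()
print(fks[0][2], fks[0][3], "->", fks[0][4])
```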

    2. Data Storage Management: This function is used for the storage of data and any related data entry forms or screen definitions, report definitions, data validation rules, procedural code, and structures that can handle video and picture formats. Users do not need to know how data is stored or manipulated. Also involved with this function is performance tuning, which relates to a database's efficiency in terms of storage and access speed.

    3. Data Transformation and Presentation: This function exists to transform any data entered into the required data structures. By using the data transformation and presentation function, the DBMS can determine the difference between logical and physical data formats.

    4. Security Management: This is one of the most important functions in the DBMS. Security management sets rules that determine which users are allowed to access the database. Users are given a username and password, or are sometimes identified through biometric authentication (such as a fingerprint or retina scan), although these types of authentication tend to be more costly. This function also sets restrictions on what specific data any user can see or manage.

    5. Multiuser Access Control: Data integrity and data consistency are the basis of this function. Multiuser access control is a very useful tool in a DBMS: it enables multiple users to access the database simultaneously without affecting the integrity of the database.

    6. Backup and Recovery Management: Backup and recovery come to mind whenever there are potential outside threats to a database. For example, if there is a power outage, recovery management determines how the database is restored after the outage and how long that takes. Backup management refers to keeping copies of the data for its safety and integrity, much like backing up all your MP3 files on a disk.
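    A minimal backup-and-restore sketch using the `Connection.backup` method of Python's sqlite3 module (available since Python 3.7). Both databases are in memory so the sketch runs anywhere; a real backup would target a file on separate storage, for example `sqlite3.connect("backup.db")`. The song table is hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO song (title) VALUES ('Track 1'), ('Track 2');
""")

# Backup: copy every page of the live database into a second database.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Simulate damage to the original, then recover the data from the backup.
source.execute("DROP TABLE song")
recovered = backup_copy.execute("SELECT title FROM song ORDER BY id").fetchall()
print(recovered)  # [('Track 1',), ('Track 2',)]
```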

    7. Data Integrity Management: The DBMS enforces integrity rules to minimize data redundancy (data stored unnecessarily in more than one place) and to maximize data consistency, making sure the database returns the same, correct answer each time the same question is asked.
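    The sketch below shows a DBMS enforcing integrity rules declared on a table, using Python's sqlite3 module as an assumed example. The account table and its rules are hypothetical: a CHECK rule forbids negative balances, NOT NULL forbids missing balances, and the primary key stops the same account from being stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_no INTEGER PRIMARY KEY,
        owner      TEXT NOT NULL,
        balance    REAL NOT NULL CHECK (balance >= 0)
    )
""")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if an integrity rule rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
        return False


results = [
    try_insert((1, "Ann", 100.0)),  # valid
    try_insert((2, "Bob", -50.0)),  # breaks the CHECK rule
    try_insert((1, "Ann", 100.0)),  # same key again: the fact would be stored twice
    try_insert((3, "Cy", None)),    # breaks the NOT NULL rule
]
print(results)  # [True, False, False, False]
```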

    ERD: Entity-Relationship Diagrams (ERDs) depict data models. Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC (systems development life cycle). There are three basic elements in ER models:

    Entities are the "things" about which we seek information.

    Attributes are the data we collect about the entities.

    Relationships provide the structure that connects the entities.

    Elements of ER Model:

    ENTITIES

    According to the English Dictionary [19], an entity is "something that exists as a particular and discrete unit". Adapting [20], a definition that can serve as the starting point of the discussion is that an entity is something that has a distinct, separate existence, though not necessarily a material one. In the context of databases, entities are the main discrete data objects about which data is collected and kept. Techniques and methodologies have been developed for identifying the entities of a particular problem domain; we do not cover them here, and note only that, in general, entities are recognizable concrete or abstract concepts.


    Examples of entities are persons, places, things, or events that are relevant to the database.

    RELATIONSHIPS

    A relationship represents an association between two or more entities. Examples of relationships in the medical world would be:

    any drug is produced by one manufacturer.

    a disease presents zero, one or more symptoms.

    a drug can cause several reactions, and a reaction can be caused by one or more drugs.

    ATTRIBUTES

    An attribute is the abstraction used to describe one property of the entity set (the totality of the instances of one entity makes up the entity set). A value is a particular instance of an attribute. The entire collection of possible values an attribute can have is called the domain of the attribute.

    Attributes are classified according to their role: whether or not they identify an instance of an entity. If they do, they are called identifiers; if they describe a non-unique characteristic, they are called descriptors. Identifiers are generally called keys.

    Having introduced the key elements of the ER model (entities, attributes and relationships), the next step would be to introduce special entity types and to classify relationships; that discussion has been intentionally omitted here.

    Entity Relationship Diagrams: Entity Relationship Diagrams (ERDs) illustrate the logical structure of databases.

    Entity Relationship Diagram Notations

    Entity

    An entity is an object or concept about which you want to store information.

    Weak Entity

    A weak entity is an entity that must be defined by a foreign key relationship with another entity, as it cannot be uniquely identified by its own attributes alone.
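    One common way to implement a weak entity, sketched with Python's sqlite3 module: the dependent table (a hypothetical example) borrows the owner's key through a foreign key and makes it part of its own composite primary key. `PRAGMA foreign_keys = ON` is needed because SQLite does not enforce foreign keys by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for REFERENCES / ON DELETE CASCADE
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Weak entity: a dependent's name is not unique on its own, so the key
    -- borrows the owner's key (emp_id) through a foreign key relationship.
    CREATE TABLE dependent (
        emp_id         INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dependent_name TEXT NOT NULL,
        PRIMARY KEY (emp_id, dependent_name)
    );
    INSERT INTO employee VALUES (1, 'Jane'), (2, 'Omar');
    INSERT INTO dependent VALUES (1, 'Sam'), (2, 'Sam');
""")

# Both 'Sam' rows are accepted: each is identified together with its owner's key.
before = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]

# A weak entity cannot outlive its owner: deleting employee 1 removes its dependent.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
after = conn.execute("SELECT emp_id, dependent_name FROM dependent").fetchall()
print(before, after)  # 2 [(2, 'Sam')]
```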


    Key attribute

    A key attribute is the unique, distinguishing characteristic of the entity. For example, an

    employee's social security number might be the employee's key attribute.

    Multivalued attribute

    A multivalued attribute can have more than one value. For example, an employee entity can

    have multiple skill values.

    Derived attribute

    A derived attribute is based on another attribute. For example, an employee's monthly salary

    is based on the employee's annual salary.
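    A sketch of how these two attribute kinds are often mapped to tables, using Python's sqlite3 module with hypothetical names. The multivalued attribute (skills) gets its own table with one row per value, and the derived attribute (monthly salary) is computed from the stored annual salary at query time rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id        INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        annual_salary REAL NOT NULL
    );
    -- Multivalued attribute: one row per (employee, skill) pair.
    CREATE TABLE employee_skill (
        emp_id INTEGER NOT NULL REFERENCES employee(emp_id),
        skill  TEXT NOT NULL,
        PRIMARY KEY (emp_id, skill)
    );
    INSERT INTO employee VALUES (1, 'Jane', 60000);
    INSERT INTO employee_skill VALUES (1, 'SQL'), (1, 'Python'), (1, 'Modeling');
""")

# Derived attribute: computed from annual_salary when queried, so the two
# values can never disagree.
name, monthly = conn.execute(
    "SELECT name, annual_salary / 12.0 FROM employee WHERE emp_id = 1"
).fetchone()
skills = [s for (s,) in conn.execute(
    "SELECT skill FROM employee_skill WHERE emp_id = 1 ORDER BY skill"
)]
print(name, monthly, skills)  # Jane 5000.0 ['Modeling', 'Python', 'SQL']
```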

    Relationships

    Relationships illustrate how two entities share information in the database structure.

    How to draw relationships: first, connect the two entities, then drop the relationship notation on the line.


    Recursive relationship

    In some cases, entities can be self-linked. For example, employees can supervise other employees.
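    A recursive relationship is usually stored as a foreign key that points back at its own table. The sketch uses Python's sqlite3 module and a hypothetical employee table; the query joins the table to itself to pair each employee with their supervisor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id        INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        supervisor_id INTEGER REFERENCES employee(emp_id)  -- self-link
    );
    INSERT INTO employee VALUES
        (1, 'Ava', NULL),   -- top of the hierarchy: no supervisor
        (2, 'Ben', 1),
        (3, 'Cho', 1),
        (4, 'Dev', 2);
""")

# Join the table to itself: e is the employee, s is their supervisor.
pairs = conn.execute("""
    SELECT e.name, s.name
    FROM employee AS e
    JOIN employee AS s ON s.emp_id = e.supervisor_id
    ORDER BY e.emp_id
""").fetchall()
print(pairs)  # [('Ben', 'Ava'), ('Cho', 'Ava'), ('Dev', 'Ben')]
```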

    Cardinality Notations

    Cardinality specifies how many instances of an entity relate to one instance of another entity. Ordinality is also closely linked to cardinality. While cardinality specifies the occurrences of a relationship, ordinality describes the relationship as either mandatory or optional. In other words, cardinality specifies the maximum number of relationships and ordinality specifies the absolute minimum number of relationships. When the minimum number is zero, the relationship is usually called optional, and when the minimum number is one or more, the relationship is usually called mandatory.
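    On the side that holds the foreign key, the minimum (ordinality) often maps to whether the foreign key column allows NULL. A sketch with Python's sqlite3 module and hypothetical tables: an employee must belong to a department (mandatory, `NOT NULL`) but may have no parking space (optional, nullable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE parking_space (space_no INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        -- Mandatory (minimum one): every employee must belong to a department.
        dept_id  INTEGER NOT NULL REFERENCES department(dept_id),
        -- Optional (minimum zero): an employee may have no parking space.
        space_no INTEGER REFERENCES parking_space(space_no)
    );
    INSERT INTO department VALUES (10, 'Sales');
    INSERT INTO parking_space VALUES (5);
""")

conn.execute("INSERT INTO employee VALUES (1, 10, 5)")     # both links present
conn.execute("INSERT INTO employee VALUES (2, 10, NULL)")  # optional link left empty

try:
    conn.execute("INSERT INTO employee VALUES (3, NULL, 5)")  # mandatory link missing
    mandatory_enforced = False
except sqlite3.IntegrityError:
    mandatory_enforced = True

print(mandatory_enforced)  # True
```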

    There are many notation styles that express cardinality and they are all supported by

    SmartDraw.

    Degrees of Relationship (Cardinality)

    The degree of relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) to the number of occurrences in another.

    There are three degrees of relationship, known as:

    1. one-to-one (1:1)

    2. one-to-many (1:M)

    3. many-to-many (M:N)

    One-to-one (1:1)

    This is where one occurrence of an entity relates to only one occurrence in another entity.

    A one-to-one relationship rarely exists in practice, but it can. When it does, you may consider combining the two entities into one.

    For example, an employee is allocated a company car, which can only be driven by that employee.

    Therefore, there is a one-to-one relationship between employee and company car.
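    One way to enforce the employee/company-car rule, sketched with Python's sqlite3 module and hypothetical tables: a foreign key from car to employee gives each car one driver, and adding `UNIQUE` to that column stops an employee from appearing against a second car, turning a 1:M link into 1:1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE company_car (
        registration TEXT PRIMARY KEY,
        -- NOT NULL + foreign key: each car belongs to exactly one employee.
        -- UNIQUE: an employee can appear against at most one car.
        emp_id INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
    );
    INSERT INTO employee VALUES (1, 'Jane'), (2, 'Omar');
    INSERT INTO company_car VALUES ('AB-123', 1);
""")

try:
    conn.execute("INSERT INTO company_car VALUES ('CD-456', 1)")  # second car for Jane
    second_car_allowed = True
except sqlite3.IntegrityError:
    second_car_allowed = False

print(second_car_allowed)  # False
```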

    One-to-Many (1:M)

    This is where one occurrence in an entity relates to many occurrences in another entity.

    For example, taking the employee and department entities shown on the previous page, an employee works in one department but a department has many employees.

    Therefore, there is a one-to-many relationship between department and employee.
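    In a relational schema, the foreign key of a 1:M relationship goes on the "many" side. A sketch with Python's sqlite3 module and hypothetical department and employee data: each employee row names one department, and a department's employees are found by grouping on that key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The foreign key sits on the "many" side: each employee row names exactly
    -- one department, while a department can be named by many employee rows.
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO employee VALUES (1, 'Ann', 10), (2, 'Bo', 10), (3, 'Cy', 20);
""")

headcount = conn.execute("""
    SELECT d.name, COUNT(e.emp_id)
    FROM department AS d
    LEFT JOIN employee AS e ON e.dept_id = d.dept_id
    GROUP BY d.dept_id, d.name
    ORDER BY d.dept_id
""").fetchall()
print(headcount)  # [('Sales', 2), ('IT', 1)]
```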

    Many-to-Many (M:N)

    This is where many occurrences in an entity relate to many occurrences in another entity.

    The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness.

    As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur because an entity has been missed.

    For example, an employee may work on several projects at the same time and a project has a team of many employees.

    Therefore, there is a many-to-many relationship between employee and project.
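    The "missed entity" in the employee/project example is typically an assignment (or link) table. The sketch below, using Python's sqlite3 module and hypothetical data, resolves the M:N relationship into two 1:M relationships through that table and queries it in both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The missed entity: one assignment links one employee to one project,
    -- turning the M:N relationship into two 1:M relationships.
    CREATE TABLE assignment (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO project  VALUES (100, 'Payroll'), (200, 'Website');
    INSERT INTO assignment VALUES (1, 100), (1, 200), (2, 200);
""")

website_team = [name for (name,) in conn.execute("""
    SELECT e.name FROM assignment AS a
    JOIN employee AS e ON e.emp_id = a.emp_id
    WHERE a.proj_id = 200 ORDER BY e.name
""")]
ann_projects = [title for (title,) in conn.execute("""
    SELECT p.title FROM assignment AS a
    JOIN project AS p ON p.proj_id = a.proj_id
    WHERE a.emp_id = 1 ORDER BY p.title
""")]
print(website_team, ann_projects)  # ['Ann', 'Bo'] ['Payroll', 'Website']
```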

    Normalization

    Normalization is the process of eliminating redundant data from database tables. There are five levels of normalization, also termed the five normal forms. Most database designers stop at either level 2 or level 3. This is because, although normalization reduces data redundancy, it also increases complexity, which can decrease performance. This decrease in performance is due to the need to join the normalized tables in queries. Levels 4 and 5 of normalization remain largely an academic field of study and are rarely applied in industry.

    Anomaly in database: A data anomaly arises when the same data is present in the database more than once (duplication). When the information is updated or modified, the copies can end up disagreeing, which is the problem of data inconsistency. To solve this problem, we need to remove the duplicated data.
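    A small demonstration of an update anomaly, using Python's sqlite3 module and an invented orders table in which the customer's city is repeated on every row. Updating only one of the copies leaves the database giving two different answers to the same question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized: the customer's city is duplicated on every order row.
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT);
    INSERT INTO orders VALUES
        (1, 'Ann', 'Boston'), (2, 'Ann', 'Boston'), (3, 'Ann', 'Boston');
""")

# Ann moves, but the update only touches one of the duplicated copies.
conn.execute("UPDATE orders SET city = 'Denver' WHERE order_id = 1")

# The database now gives two answers to "Where does Ann live?"
cities = sorted(c for (c,) in conn.execute(
    "SELECT DISTINCT city FROM orders WHERE customer = 'Ann'"
))
print(cities)  # ['Boston', 'Denver']
```

    Storing the city once, in a separate customer table, removes the duplicated copies and with them the chance of this inconsistency.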

    Functional Dependency: A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A -> B, which is the same as stating "B is functionally dependent upon A."

    Examples: In a table listing employee characteristics including Social Security Number (SSN) and name, it can be said that name is functionally dependent upon SSN (or SSN -> name) because an employee's name can be uniquely determined from their SSN. However, the reverse statement (name -> SSN) is not true because more than one employee can have the same name but different SSNs.
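    A Python sketch that tests whether A -> B holds in a given set of rows: it fails as soon as two rows agree on A but differ on B. The SSNs and names are invented. A sample of rows can only disprove a dependency; whether it holds in general is a design decision about the data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in these rows: rows with equal
    lhs values never have different rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"ssn": "111-11-1111", "name": "Robert Phil"},
    {"ssn": "222-22-2222", "name": "Robert Phil"},
    {"ssn": "333-33-3333", "name": "Janet Jones"},
]

print(fd_holds(employees, ["ssn"], ["name"]))  # True:  SSN -> name
print(fd_holds(employees, ["name"], ["ssn"]))  # False: name does not determine SSN
```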

    First Normal Form (1NF)

    The next step is to transform the table of unnormalized data into first normal form (1NF). The rule is: remove any repeating attributes to a new table. The process is as follows:

    Identify repeating attributes.

    Remove these repeating attributes to a new table together with a copy of the key from the unnormalized (UNF) table.

    Assign a key to the new table (and underline it). The key from the original unnormalized table always becomes part of the key of the new table.

    A compound key is created. The value for this key must be unique for each entity occurrence.
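    The 1NF steps above can be sketched in Python on invented student/course data, where the repeating attribute is held as a list. The repeating attribute moves to a new table together with a copy of the UNF key, and the new table's compound key (student_id, course) is checked for uniqueness.

```python
# Hypothetical unnormalized (UNF) rows: the key is student_id and
# "courses" is a repeating attribute.
unf = [
    {"student_id": 1, "name": "Ali", "courses": ["DB", "OS"]},
    {"student_id": 2, "name": "Sara", "courses": ["DB"]},
]

# Keep the non-repeating attributes in the original table.
student = [{"student_id": r["student_id"], "name": r["name"]} for r in unf]

# Move the repeating attribute to a new table together with a copy of the
# UNF key; the new table's key is the compound (student_id, course).
student_course = [
    {"student_id": r["student_id"], "course": c}
    for r in unf
    for c in r["courses"]
]

# The compound key must be unique for each occurrence.
keys = [(r["student_id"], r["course"]) for r in student_course]
assert len(keys) == len(set(keys))

print(student)
print(student_course)
```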

    Second normal form (2NF). At this level of normalization, every column that is not part of the key must depend on the whole key, not on just part of it. For example, in a table with three columns containing customer ID, product sold, and price of the product when sold, the price would be a function of both the customer ID (the customer may be entitled to a discount) and the specific product.

    Third normal form (3NF). At the second normal form, modification anomalies are still possible because a change to one row in a table may affect data that refers to this information from another table. For example, using the customer table just cited, removing a row describing a customer purchase (because of a return, perhaps) will also remove the fact that the product has a certain price. In the third normal form, this table would be divided into two tables so that product pricing would be tracked separately.

    Normalization in Detail

    What is Normalization? Why should we use it?

    Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data.

    It divides larger tables into smaller tables and links them using relationships.

    The inventor of the relational model, Edgar Codd, proposed the theory of normalization with the introduction of First Normal Form, and he continued to extend the theory with Second and Third Normal Form. Later he joined with Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.

    The theory of normalization is still being developed further. For example, there are discussions even on 6th Normal Form.


    But in most practical applications, normalization achieves its best in 3rd Normal Form. The evolution of normalization theories is illustrated below.

    Let's learn normalization with a practical example.

    Assume a video library maintains a database of movies rented out. Without any normalization, all information is stored in one table as shown below.

    Table 1

    Here you can see that the Movies Rented column has multiple values.

    Now let's move into 1st Normal Form.

    1NF Rules

    Each table cell should contain a single value.

    Each record needs to be unique.

    The above table in 1NF:


    Table 1: In 1NF Form

    Before we proceed, let's understand a few things.

    What is a key?

    A key is a value used to uniquely identify a record in a table. A key could be a single column or a combination of multiple columns.

    Note: Columns in a table that are NOT used to uniquely identify a record are called non-key columns.

    What is a primary key?

    A primary key is a single column whose values are used to uniquely identify a database record.

    It has the following attributes:

    A primary key cannot be NULL.

    A primary key value must be unique.

    The primary key values should not be changed.

    The primary key must be given a value when a new record is inserted.
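    A sketch of the DBMS enforcing the uniqueness and not-NULL properties, using Python's sqlite3 module and a hypothetical member table. `NOT NULL` is written out explicitly because SQLite, for historical reasons, allows NULL in most PRIMARY KEY columns unless the column is declared NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out: SQLite otherwise permits NULL in a non-INTEGER
# PRIMARY KEY column (a long-standing compatibility quirk).
conn.execute("CREATE TABLE member (membership_id TEXT NOT NULL PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO member VALUES ('M1', 'Janet Jones')")


def rejected(params):
    """Return True if the DBMS refuses to insert this (key, name) pair."""
    try:
        conn.execute("INSERT INTO member VALUES (?, ?)", params)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected(("M1", "Robert Phil")))  # True: duplicate key
print(rejected((None, "Robert Phil")))  # True: NULL key
print(rejected(("M2", "Robert Phil")))  # False: accepted
```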

    What is a composite key?

    A composite key is a primary key composed of multiple columns used to identify a record uniquely.

    In our database, we have two people with the same name, Robert Phil, but they live at different places.

    Hence we require both Full Name and Address to uniquely identify a record. This is a composite key.
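    The Robert Phil case expressed as a composite primary key, sketched with Python's sqlite3 module. The addresses are invented. Two rows with the same name but different addresses are accepted; repeating the same (name, address) pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE member (
        full_name TEXT NOT NULL,
        address   TEXT NOT NULL,
        PRIMARY KEY (full_name, address)  -- composite key
    )
""")

# Same name, different places: both rows are allowed.
conn.execute("INSERT INTO member VALUES ('Robert Phil', '3rd Street 34')")
conn.execute("INSERT INTO member VALUES ('Robert Phil', '5th Avenue 12')")

try:
    conn.execute("INSERT INTO member VALUES ('Robert Phil', '5th Avenue 12')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # False
```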

    Let's move into 2NF.

    2NF Rules

    Rule 1: Be in 1NF.

    Rule 2: Single column primary key (a single-column key guarantees that no non-key column depends on only part of the key).

    It is clear that we can't move our simple database into 2nd Normal Form unless we partition the table above.

    Table 1

    Table 2

    We have divided our 1NF table into two tables, viz. Table 1 and Table 2. Table 1 contains member information. Table 2 contains information on movies rented.

    We have introduced a new column called Membership_id, which is the primary key for Table 1. Records can be uniquely identified in Table 1 using the membership id.

    Introducing Foreign Key!

    In Table 2, Membership_ID is the foreign key.

    A foreign key references the primary key of another table. It helps connect your tables.

    A foreign key can have a different name from the primary key it references.

    It ensures rows in one table have corresponding rows in another.

    Unlike primary keys, foreign keys do not have to be unique. Most often they aren't.

    Foreign keys can be null even though primary keys cannot.


    Why do you need a foreign key?

    Suppose someone inserts a record into Table B with a membership id that does not exist in the parent table.

    With a foreign key in place, you will only be able to insert values into your foreign key that exist in the unique key in the parent table.

    This helps maintain referential integrity.


    The above problem can be overcome by declaring membership id in Table 2 as a foreign key referencing membership id in Table 1.

    Now, if somebody tries to insert a value in the membership id field that does not exist in the parent table, an error will be shown!
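    The member/rental split with the foreign key declared, sketched with Python's sqlite3 module. Table and column names are adapted from the example; the movie titles are placeholders. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` on each connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE member (membership_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
    CREATE TABLE rental (
        membership_id INTEGER REFERENCES member(membership_id),  -- foreign key
        movie         TEXT NOT NULL
    );
    INSERT INTO member VALUES (1, 'Janet Jones'), (2, 'Robert Phil');
    INSERT INTO rental VALUES (1, 'Movie A'), (1, 'Movie B'), (2, 'Movie C');
""")

error_message = None
try:
    # There is no member 101 in the parent table.
    conn.execute("INSERT INTO rental VALUES (101, 'Movie D')")
except sqlite3.IntegrityError as err:
    error_message = str(err)

print(error_message)
```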

    What is a transitive functional dependency?

    A transitive functional dependency exists when a non-key column depends on another non-key column, which in turn depends on the primary key. In that case, changing one non-key column might cause another non-key column to change.

    Consider Table 1. Changing the non-key column Full Name may change Salutation.

    Let's move into 3NF.

    3NF Rules

    Rule 1: Be in 2NF.

    Rule 2: Has no transitive functional dependencies.

    To move our 2NF table into 3NF, we again need to divide our table.


    Table 1

    Table 2

    Table 3

    We have again divided our tables and created a new table which stores salutations.

    There are no transitive functional dependencies, and hence our table is in 3NF.

    In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the primary key in Table 3.
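    A sketch of the member and salutation tables after the 3NF split, using Python's sqlite3 module. The column names and salutation values are assumptions for illustration. Each salutation's text is stored once, so a correction is a single-row update and every member that refers to it picks up the change through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Table 3: each salutation is stored once, under its own key.
    CREATE TABLE salutation (
        salutation_id INTEGER PRIMARY KEY,
        salutation    TEXT NOT NULL UNIQUE
    );
    -- Table 1: members point at a salutation instead of repeating its text.
    CREATE TABLE member (
        membership_id INTEGER PRIMARY KEY,
        full_name     TEXT NOT NULL,
        salutation_id INTEGER NOT NULL REFERENCES salutation(salutation_id)
    );
    INSERT INTO salutation VALUES (1, 'Mr.'), (2, 'Ms.'), (3, 'Mrs.'), (4, 'Dr.');
    INSERT INTO member VALUES (1, 'Janet Jones', 2), (2, 'Robert Phil', 1), (3, 'Robert Phil', 1);
""")

# The text of a salutation lives in exactly one row, so a correction is one UPDATE.
conn.execute("UPDATE salutation SET salutation = 'Mr' WHERE salutation_id = 1")

rows = conn.execute("""
    SELECT s.salutation, m.full_name
    FROM member AS m
    JOIN salutation AS s ON s.salutation_id = m.salutation_id
    ORDER BY m.membership_id
""").fetchall()
print(rows)  # [('Ms.', 'Janet Jones'), ('Mr', 'Robert Phil'), ('Mr', 'Robert Phil')]
```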

    Now our little example is at a level that cannot be further decomposed to attain higher forms of normalization. In fact, it is already in higher normal forms. Separate efforts for moving into the next levels of normalization are normally needed only in complex databases. However, we will briefly discuss the next levels of normalization below.

    Boyce-Codd Normal Form (BCNF)


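    Even when a database is in 3rd Normal Form, anomalies can still result if it has more than one candidate key. BCNF removes them by requiring that, for every non-trivial functional dependency A -> B, A is a superkey (a set of columns that uniquely identifies each row).

    BCNF is sometimes also referred to as 3.5 Normal Form.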
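    BCNF can be checked mechanically with attribute closures. The Python sketch below assumes functional dependencies are given as (left side, right side) tuples of column names and reports every non-trivial dependency whose left side is not a superkey. The student/course/instructor relation is a textbook-style illustration, not from the text: its candidate keys are {student, course} and {student, instructor}, so it is in 3NF, but instructor -> course violates BCNF.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial FDs whose left-hand side is not a superkey of `relation`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)               # skip trivial FDs
        and closure(lhs, fds) != set(relation)    # lhs is not a superkey
    ]


# Hypothetical rules: a student takes a course from one instructor,
# and each instructor teaches only one course.
relation = {"student", "course", "instructor"}
fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]
print(bcnf_violations(relation, fds))  # [(('instructor',), ('course',))]
```

    The usual fix is to decompose on the violating dependency, here into (instructor, course) and (student, instructor).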

    4th Normal Form

    If no database table instance contains two or more independent, multivalued facts describing the relevant entity, then it is in 4th Normal Form.
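    A Python sketch of why independent multivalued facts belong in separate tables, using invented skills and languages for one employee. A single combined table must store every skill/language pairing; the two 4NF tables hold each fact once, and joining them on the employee rebuilds the combined table exactly.

```python
from itertools import product

# Hypothetical facts about employee E1: skills and languages are independent.
skills = {"SQL", "Python", "Modeling"}
languages = {"English", "Urdu", "French"}

# A single table must hold every skill/language pairing for E1; a missing row
# would wrongly suggest that some skill depends on some language.
combined = {("E1", s, l) for s, l in product(skills, languages)}

# 4NF: one table per independent multivalued fact.
employee_skill = {("E1", s) for s in skills}
employee_language = {("E1", l) for l in languages}

# Joining the two smaller tables on the employee rebuilds the combined one,
# so nothing is lost by splitting.
rejoined = {
    (e1, s, l)
    for (e1, s) in employee_skill
    for (e2, l) in employee_language
    if e1 == e2
}
print(len(combined), len(employee_skill) + len(employee_language))  # 9 6
```

    Adding one new skill would need three new rows in the combined table (one per language) but only one row in the skill table.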

    5th Normal Form

    A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.

    6th Normal Form

    6th Normal Form is not standardized yet; however, it has been discussed by database experts for some time. Hopefully we will have a clear, standardized definition for 6th Normal Form in the near future.