SALEH database lect2

  • 8/6/2019 SALEH database lect2

    CS200 Database Systems

    CS200 DATABASE SYSTEMS

    LECTURE 1: INTRODUCTION

    Instructor: Ally S. Nyamawe, [email protected], +255 715 016 580

    Cont...

    chances are that our activities will involve someone or some computer program accessing a database. Even purchasing items from a supermarket nowadays in many cases involves an automatic update of the database that keeps the inventory of supermarket items.

    These interactions are examples of what we may call traditional database applications, in which most of the information that is stored and accessed is either textual or numeric.

    Database Definition

    Databases and database technology are having a major impact on the growing use of computers. It is fair to say that databases play a critical role in almost all areas where computers are used, including business, electronic commerce, engineering, medicine, law, education, and library science.

    Generally, a database is a collection of related data.

    Cont...

    By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers, and addresses of the people you know. You may have recorded this data in an indexed address book, or you may have stored it on a hard drive, using a personal computer and software such as Microsoft Access or Excel. This is a collection of related data with an implicit meaning and hence is a database.

    Cont...

    A database can also be defined as a collection of information organized in such a way that it can be accessed easily.

    Examples

    Tracking customer orders

    Maintaining employee records

    Maintaining student information

    Database Properties

    A database has the following implicit properties:

    A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse. Changes to the miniworld are reflected in the database.

    A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database.

    A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.

    History of Databases

    Manual systems

    File Processing Systems (FPS)

    Database Management systems (DBMS)

    Manual Systems

    Structure

    Information can be stored in a dedicated room or in separate offices.

    The room or office will be furnished with shelves.

    Different shelves will hold records for different subjects.

    Records will be stored in hard flat files; each file will carry one record.

    Each file will have a specific number to identify it.

    A person will use the file number to retrieve the specific file (record).

    [Figure: manual system showing the User, the File keeper, and the Files Cabinet]

    File Processing Systems (FPS)

    Information is stored as groups of records in separate files.

    File processing systems consisted of a few data files and many application programs.

    Each file is called a soft flat file.

    A flat file contains processed information for one specific function.

    Programming languages are used to write applications.

    Little flexibility

    High maintenance

    Many limitations

    Limitations of File Processing Systems

    Separate and isolated data.

    Data redundancy.

    Program-data interdependence involving file formats and access techniques.

    Difficulty in representing data from the user's view.

    Data inflexibility.

    Database Management Systems (DBMS)

    A program that allows users to define, create, manipulate, store, maintain, retrieve, and process the data in a database in order to produce meaningful information.

    Focus on information representation

    Data is stored as records in various database files that can be combined to produce meaningful information for users.

    The DBMS controls all functions of capturing, processing, storing, and retrieving data, and generates various forms of data output.

    It manages access by multiple users and multiple programs to a common store of data.

    DBMS Overcomes the Limitations of FPS

    Eliminates separation and isolation of data

    Reduces data redundancy

    Eliminates dependence between programs and data

    Allows for representation of data from the user's view

    Increases data flexibility

    Superior flexibility and security over spreadsheet applications

    Characteristics of a DBMS

    Computerized record-keeping system

    Contains facilities that allow the user to:

    Add and delete files; insert, retrieve, update, and delete data

    Collection of databases; each can be used for separate purposes or combined

    Examples of DBMSs are: SQL Server, MS Access, MySQL, Oracle.

    Functions and Uses of a DBMS

    To store data

    To organize data

    To control access to data

    To protect data

    To provide decision support

    To provide transaction processing

    Components of a DBMS

    [Figure: components of a DBMS, from top to bottom: users/programmers; application programs/queries; software to process queries/programs; software to access stored data; stored database definition (meta-data) and stored database]

    Architecture of a DBMS

    [Figure: architecture of a DBMS. Inputs: user queries, schema modifications, modifications. Components: query processor, transaction manager, storage manager. Storage: stored database definition (meta-data) and stored database.]

    Overview of DBMS Components

    Stored Database and Meta-data: The stored database resides on secondary and tertiary devices. (At any given moment some portion of the database will also be mirrored in cache, but we will ignore this for the moment.)

    Meta-data is data about data. In this case the meta-data is a description of the data components of the database: offsets of fields within records, typing information, schema information, index information, and so forth.

    For a given database, a DBMS may maintain many different indices designed to provide fast access to random data.

    Cont...

    Storage Manager: In a simple database system, the storage manager is nothing more than the file system of the underlying OS. In larger systems, for the purposes of efficiency, the DBMS normally controls storage on the disk directly.

    The storage manager consists of two basic components: (1) the buffer manager, and (2) the file manager.

    Cont...

    File Manager: Keeps track of the location of files on the disks and obtains the block or blocks containing a file on request from the buffer manager. Disks are typically blocked into regions of contiguous space ranging between 2^12 and 2^14 bytes (roughly 4,000 to 16,000 bytes per block).

    Buffer Manager: Handles main memory. It obtains blocks of data from the disk, via the file manager, and chooses a page of main memory in which to store the block. The paging algorithm will determine how long a page will remain in main memory. However, the transaction manager can also force a page in main memory to be returned to disk.
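
    The buffer manager's behaviour can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lecture does not name a paging algorithm, so least-recently-used (LRU) replacement is assumed here, and the names BufferPool, read_block, write_block, and flush are hypothetical. A plain dict stands in for the disk and the file manager.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: keeps disk blocks in a fixed number of pages (LRU assumed)."""

    def __init__(self, capacity, read_block, write_block):
        self.capacity = capacity        # number of main-memory pages available
        self.read_block = read_block    # hypothetical file-manager read call
        self.write_block = write_block  # hypothetical file-manager write call
        self.pages = OrderedDict()      # block_id -> [data, dirty flag]

    def get(self, block_id):
        """Return a block's data, fetching it via the file manager on a miss."""
        if block_id in self.pages:
            self.pages.move_to_end(block_id)   # mark as most recently used
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            self.pages[block_id] = [self.read_block(block_id), False]
        return self.pages[block_id][0]

    def put(self, block_id, data):
        """Change a block in memory only; the page becomes dirty."""
        self.get(block_id)                     # ensure the block is resident
        self.pages[block_id] = [data, True]

    def flush(self, block_id):
        """Force a page back to disk, as the transaction manager may request."""
        page = self.pages.get(block_id)
        if page is not None and page[1]:
            self.write_block(block_id, page[0])
            page[1] = False

    def _evict(self):
        """Drop the least recently used page, writing it back if it is dirty."""
        victim_id, (data, dirty) = self.pages.popitem(last=False)
        if dirty:
            self.write_block(victim_id, data)

# Usage: a dict plays the role of the disk.
disk = {0: b"a", 1: b"b", 2: b"c"}
pool = BufferPool(2, disk.__getitem__, disk.__setitem__)
pool.get(0)
pool.get(1)
pool.put(0, b"A")   # block 0 is dirty and most recently used
pool.get(2)         # pool is full, so block 1 (least recently used) is evicted
```

    Until flush(0) is called or block 0 is evicted, the change to block 0 exists only in main memory, which is why the transaction manager needs a way to force pages to disk.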

    Cont...

    Query Manager: Turns a query or database manipulation, which may be expressed at a very high level (e.g., SQL), into a sequence of requests for stored data such as specific tuples of a relation or parts of an index to a relation.

    Often the hardest part of query processing is query optimization, which involves the formulation of a good query execution strategy.

    Cont...

    Transaction Manager: There are certain guarantees that a DBMS must make when performing operations on a database. These guarantees are often referred to as the ACID properties.

    Atomicity: all of a transaction is executed or none of it is executed.

    Consistency: a transaction cannot leave the data in an inconsistent state.

    Isolation: concurrent transactions must be isolated from each other both in effect and in visibility.

    Durability: changes to the database caused by a transaction must not be lost even if the system fails immediately after the transaction completes.
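
    Atomicity and consistency can be observed directly with Python's built-in sqlite3 module. The sketch below is illustrative only: the account table, the CHECK constraint, and the transfer function are hypothetical names invented for this example, and SQLite stands in for whatever DBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction."""
    try:
        # The connection used as a context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # both updates are applied
failed = transfer(conn, 1, 2, 500)  # second update breaks the CHECK, so the first is undone
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

    After the failed transfer, the credit to account 2 is rolled back together with the rejected debit, so the balances are those left by the first, successful transfer.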

    Database Design

    For the system to be acceptable to the end users, the database design activity is crucial.

    A poorly designed database will generate errors that may lead to bad decisions being made, which may have serious repercussions for the organization. On the other hand, a well-designed database produces, in an efficient way, a system that provides the correct information for the decision-making process to succeed.

    Roles in the Database Environment

    Data and Database Administrators

    The Data Administrator (DA) is responsible for the management of the data resource, including database planning; development and maintenance of standards, policies, and procedures; and conceptual/logical database design.

    The Database Administrator (DBA) is responsible for the physical realization of the database, including physical database design and implementation, security and integrity control, maintenance of the operational system, and ensuring satisfactory performance of the applications for users. The role of the DBA is more technically oriented than that of the DA.

    Database Administrator

    A database administrator (DBA) controls and manages the database.

    Functions of a DBA

    Makes decisions concerning the content of the database

    Plans storage structures and access strategies

    Provides support to users

    Defines security and integrity checks

    Interprets backup and recovery strategies

    Roles in the Database Environment (Cont...)

    Database Designers

    In large database design projects, we can distinguish between two types of designers: logical database designers and physical database designers.

    Logical database designers are concerned with identifying the data (the entities and attributes), the relationships between the data, and the constraints on the data that will be stored in the database.

    The work of physical database designers is highly dependent on the target DBMS, and there may be more than one way of implementing a mechanism. The physical database designer must be fully aware of the functionality of the target DBMS.

    Cont...

    Application Developers

    Once the database has been implemented, the application programs that provide the required functionality for the end users must be implemented. This is the responsibility of the application developers.

    Cont...

    End Users

    End users are the clients for the database and can be broadly categorized into two groups based upon how they utilize the system.

    Naive users are typically unaware of the DBMS. They access the database through specially written application programs which attempt to make the operations as simple as possible. They typically know nothing about the database or the DBMS.

    Sophisticated users are familiar with the structure of the database and the facilities offered by the DBMS. They will typically use a high-level query language like SQL to perform their required operations and may even write their own application programs.

    Advantages and Disadvantages of a DBMS

    Advantages:

    Centralized data reduces management problems

    Data redundancy and consistency are controllable

    Program-data interdependency is diminished

    Flexibility of data is increased

    More information from the same amount of data

    Sharing of data

    Improved data integrity

    Improved security

    Enforcement of standards

    Cont...

    Disadvantages:

    Reduction in speed of data access

    Requires special knowledge

    Possible dependency of application programs on specific DBMS versions

    More DBMS Advantages

    control of data redundancy

    economy of scale

    data consistency

    more information from same data

    amount of data available

    sharing of data

    improved data integrity

    improved data security

    enforcement of standards

    balance of conflicting requirements

    improved data accessibility

    increased productivity

    improved maintenance

    increased concurrency

    improved backup and recovery

    improved responsiveness

    More DBMS Disadvantages

    complexity

    size

    cost of DBMSs

    additional hardware costs

    cost of conversion

    performance (specific cases)

    higher impact of failure

    Three Levels of Abstraction in a Database System

    [Figure: three-level architecture. External level: users 1 to n, each with a view (View 1, View 2, ..., View n). Conceptual level: the conceptual schema. Internal level: the internal schema, above the physical data organization (db). An external-to-conceptual mapping connects the external and conceptual levels; a conceptual-to-internal mapping connects the conceptual and internal levels.]

    The External Level

    The external level is the user's view of the database. This level describes that part of the database which is relevant to each user.

    The external level consists of a number of different external views of the database. Each user has a view of the real world represented in a form that is familiar to that user.

    The external view includes only those entities, attributes, and relationships in the real world that the user is interested in. Other entities, attributes, and relationships may exist, but the user will be unaware of them.

    The External Level Cont...

    It is often the case that different external views will have different representations of the same data.

    Example: one view may represent dates in the form (month, day, year) while another view may represent dates in the form (day, month, year).

    Some views may include derived or calculated data. This is data that is not actually stored in the database as such, but is created when needed.

    Example: one view may need to see a person's age. However, this is probably not a stored value in the database, since it would require daily updates. Rather, it is probably derived from stored data representing the person's date of birth and the current date.
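
    Both examples can be sketched in Python. The function names below (age_on, as_month_day_year, as_day_month_year) and the sample date of birth are hypothetical; the point is only that a single stored value can feed several external representations and a derived value.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from a stored date of birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

def as_month_day_year(d):
    """One external view of a date: (month, day, year)."""
    return (d.month, d.day, d.year)

def as_day_month_year(d):
    """Another external view of the same date: (day, month, year)."""
    return (d.day, d.month, d.year)

dob = date(1990, 8, 6)                      # the only value actually stored
view_a = as_month_day_year(dob)             # (8, 6, 1990)
view_b = as_day_month_year(dob)             # (6, 8, 1990)
age_before = age_on(dob, date(2019, 8, 5))  # day before the birthday
age_after = age_on(dob, date(2019, 8, 6))   # on the birthday
```

    Because the age is computed on request, nothing in the stored data has to change when the person's birthday passes.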

    The Conceptual Level

    The conceptual level is the community view of the database. This level describes what data is stored in the database and the relationships among the data.

    This is the level at which the logical structure of the entire database as seen by the DBA is contained. It represents a complete view of the data requirements of the organization that is independent of any storage considerations.

    The conceptual level supports each external view, in that any data available to a user must be contained in, or derivable from, the conceptual level.

    This level contains no storage-dependent details. For example, an entity may be defined as represented by an integer data type at this level, but the number of bytes it occupies is not specified at this level.

    The Internal Level

    The internal level is the physical representation of the database on the computer. This level describes how the data is stored in the database.

    The internal level describes the physical implementation necessary to achieve optimal runtime performance and storage space utilization.

    It covers the data structures and file organizations used to store the data on the storage devices.

    It interfaces with the OS access methods (file management techniques for storing and retrieving data records) to place the data on the storage devices, build indexes, retrieve the data, and so on.

    The Physical Level

    Below the internal level is the physical level, which may be managed by the OS under the direction of the DBMS.

    The functions of the DBMS and the OS at the physical level are not clear-cut and will vary from system to system.

    Some DBMSs take advantage of many of the OS access methods, while others use only the most basic ones and create their own file organizations.

    The physical level below the DBMS consists of items only the OS knows, such as exactly how the sequencing is implemented and whether the fields of internal records are stored as contiguous bytes on the disk.

    Data Independence

    One of the major objectives of the three-level architecture is to provide data independence, which means that the upper levels are unaffected by changes to lower levels.

    There are two types of data independence: logical and physical.

    Logical data independence refers to the immunity of the external schemas to changes in the conceptual schema.

    Physical data independence refers to the immunity of the conceptual schema to changes in the internal schema.
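
    A small experiment with Python's sqlite3 module gives a feel for both kinds of independence. It is a sketch under assumptions, not a full three-level system: a SQL view plays the role of an external schema, the base table plays the role of the conceptual schema, and an index stands in for an internal-schema change. The table, view, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, programme TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'CS'), (2, 'Juma', 'IT')")

# External view: the only object this user's query refers to.
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")
query = "SELECT id, name FROM student_names ORDER BY id"
before = conn.execute(query).fetchall()

# Conceptual-schema change: a new attribute is added to the base table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical_change = conn.execute(query).fetchall()

# Internal-schema change: a new index changes how rows are reached, not what is stored.
conn.execute("CREATE INDEX idx_student_name ON student (name)")
after_physical_change = conn.execute(query).fetchall()
```

    The same query returns the same rows at all three points, so the program that issues it needs no modification after either change.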

    Data Independence (cont.)

    [Figure: the three-level architecture (external views for users 1 to n, conceptual schema, internal schema, physical data organization). Logical data independence lies between the external and conceptual levels; physical data independence lies between the conceptual and internal levels.]

    Database Languages

    A data sublanguage consists of two parts: a Data Definition Language (DDL) and a Data Manipulation Language (DML). The DDL is used to specify the database schema and the DML is used to both read and update the database.

    These languages are called data sublanguages because they do not include constructs for all computing needs, such as conditional or iterative statements, which are provided by the high-level programming languages.

    Most DBMSs have a facility for embedding the sublanguage in a high-level programming language such as COBOL, Pascal, C, C++, Java, or Visual Basic, which is then called the host language.

    Most data sublanguages also provide a non-embedded or interactive version of the language to be input directly from a terminal.

    Data Definition Language (DDL)

    A Data Definition Language is a language that allows the DBA or user to describe and name the entities, attributes, and relationships required for the application, together with any associated integrity and security constraints.

    The result of the compilation/execution of the DDL statements is a set of tables stored in special files collectively referred to as the system catalog. The system catalog is also commonly referred to as the data dictionary or data directory.
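
    As a concrete illustration, the sketch below runs one DDL statement through Python's sqlite3 module and then reads back what the DBMS recorded about it. SQLite is used only as a convenient example; its catalog table is called sqlite_master, and other DBMSs expose their catalogs differently. The employee table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: name an entity, its attributes, and some integrity constraints.
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary >= 0)
    )
""")

# SQLite keeps one catalog row per schema object in sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info reports the columns recorded for a table;
# the second field of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
```

    Here the catalog query returns a single entry for employee, and the column list matches the attributes named in the CREATE TABLE statement.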

    Data Manipulation Language (DML)

    A Data Manipulation Language is a language that provides a set of operations to support the basic data manipulation operations on the data held in the database.

    DML operations usually include the following:

    insertion of new data into the database

    modification of data stored in the database

    retrieval of data contained in the database

    deletion of data from the database

    The part of the DML that involves data retrieval is called a query language.
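
    The four operations map onto the SQL statements INSERT, UPDATE, SELECT, and DELETE. The following sketch uses Python's sqlite3 module with a hypothetical item table, echoing the supermarket inventory example from the start of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")

# Insertion of new data
conn.executemany(
    "INSERT INTO item (id, name, qty) VALUES (?, ?, ?)",
    [(1, "rice", 40), (2, "sugar", 15), (3, "salt", 0)],
)

# Modification of stored data: five units of rice are sold
conn.execute("UPDATE item SET qty = qty - 5 WHERE name = ?", ("rice",))

# Deletion of data: drop items that are out of stock
conn.execute("DELETE FROM item WHERE qty = 0")

# Retrieval of data (the query-language part of the DML)
rows = conn.execute("SELECT name, qty FROM item ORDER BY id").fetchall()
```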

    DMLs (cont.)

    DMLs are distinguished by their underlying retrieval constructs. We can distinguish two basic types of DMLs: procedural and non-procedural.

    Procedural DMLs are languages in which the user informs the system what data is required and exactly how to retrieve that data.

    Non-procedural DMLs are languages in which the user informs the system only of what data is required and leaves the how to retrieve the data entirely up to the system.

    It is common for procedural DMLs to be embedded in high-level programming languages.

    Procedural DMLs tend to be more focused on individual records, while non-procedural DMLs tend to operate on sets of records.
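
    The contrast can be shown side by side. In this sketch a Python loop stands in for a procedural, record-at-a-time style, and an SQL SELECT run through sqlite3 stands for the non-procedural, set-at-a-time style. The result table and the sample records are hypothetical.

```python
import sqlite3

records = [("Asha", "CS", 72), ("Juma", "IT", 58), ("Neema", "CS", 81)]

# Procedural style: spell out how to find the data, one record at a time.
procedural = []
for name, programme, mark in records:
    if programme == "CS" and mark >= 60:
        procedural.append(name)
procedural.sort()

# Non-procedural style: state only what is wanted and let the DBMS
# decide how to retrieve it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (name TEXT, programme TEXT, mark INTEGER)")
conn.executemany("INSERT INTO result VALUES (?, ?, ?)", records)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM result WHERE programme = 'CS' AND mark >= 60 ORDER BY name"
    )
]
```

    Both approaches produce the same list of names; the difference is who decides on the retrieval strategy, the programmer or the system.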

    Fourth Generation Languages

    There is no consensus as to what constitutes a 4GL. In essence it is a shorthand programming language. What requires several hundred lines of code in a 3GL will require only a few lines of code in a 4GL.

    3GLs are procedural while 4GLs are non-procedural.

    4GLs include spreadsheets and database languages.

    SQL and QBE are examples of 4GLs.

    Types of Databases

    Flat Databases

    Relational Database

    Flat Databases

    A single kind of record with a fixed number of fields.

    A way of organizing all information in a single table.

    Suitable for extremely simple databases.

    Inherent data redundancy.

    Relational Database

    Data is stored in a collection of columns and rows called a table, or a relation.

    Tables may be electronically linked via a key field containing common data.

    Easy to add, delete, and modify the data and the table structures.
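
    Linking tables through a key field can be sketched with Python's sqlite3 module. The customer and orders tables are hypothetical and follow the lecture's customer-order tracking example; cust_id is the key field that carries the common data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the link
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER REFERENCES customer (cust_id),
        item     TEXT
    )
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Juma")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "laptop"), (11, 1, "mouse"), (12, 2, "phone")],
)

# The join follows the key field from each order to its customer.
joined = conn.execute("""
    SELECT c.name, o.item
    FROM customer AS c
    JOIN orders AS o ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()
```

    Each customer's name is stored once, in the customer table, however many orders refer to it. A flat, single-table design would instead repeat the name on every order row, which is the redundancy noted for flat databases.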

    Summary

    In this lecture we defined a database as a collection of related data, where data means recorded facts. A typical database represents some aspect of the real world and is used for specific purposes by one or more groups of users.

    A DBMS is a generalized software package for implementing and maintaining a computerized database. The database and software together form a database system. We identified several characteristics that distinguish the database approach from traditional file-processing applications.

    We discussed the history of databases and their types, as well as their users (DBA, DA, etc.).

    Challenge

    Discuss the capabilities that should be provided by a DBMS.

    Discuss the main characteristics of the database approach and how it differs from traditional file systems.

    Why would you choose a database system instead of simply storing data in operating system files? When would it make sense not to use a database system?

    Questions