RDB11.pptx

RELATIONAL DATABASE MANAGEMENT SYSTEM (17332)

Theory Paper : 100 marks
Ext. Oral : 25 marks
Term Work : 50 marks
Sessional : 10 marks
Total : 185 marks

REFERENCE BOOKS

• Database System Concepts (4th Edition), Author : Silberschatz, Korth, Sudarshan
• Introduction to Database Management Systems, Author : ISRD Group
• SQL, PL/SQL: The Programming Language of Oracle, Author : Ivan Bayross
• Advanced Database Management System, Author : Chakrabarti, Dasgupta

Chapter No 1 : DATABASE SYSTEM CONCEPT (marks 16)

Data
• Known facts that can be recorded and that have an implicit meaning. For example, consider the names, telephone numbers, and addresses of the people you know.

Database
• A database is a collection of related data. Database systems are designed to manage large bodies of information. Management of data involves both defining structures for storage of information and providing mechanisms for the manipulation of information. The database system must ensure the safety of the information stored, despite system crashes or attempts at unauthorized access.

DBMS
• A database management system (DBMS) is a collection of related data and programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications.

File Processing System

Disadvantages of file processing system
– Data redundancy and inconsistency
– Difficulty in accessing data
– Data isolation
– Concurrent-access anomalies
– Integrity problems
– Atomicity problems
– Security problems

Data redundancy and inconsistency
– For example, the address and telephone number of a particular customer may appear in a file that consists of savings-account records and in a file that consists of checking-account records.
– This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency: the copies of the same data may no longer agree after one of them is updated.

Difficulty in accessing data
– Example request: "Find out the names of all customers who live within a particular postal-code area."
– There is no application program on hand to meet every need, so a bank manager faces difficulty in accessing such data.

Data isolation (Separation of Data in various files)
– Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

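Problems with concurrent access

Example: Assume I'm paying for groceries with my MAC card at the same time my pay check is being deposited (and my bank uses a file processing system). The withdrawal program and the deposit program access the checking account file concurrently:

1. Withdrawal program: read balance from checking account file as $51
2. Deposit program: read balance from checking account file as $51
3. Withdrawal program: subtract $50 (for groceries)
4. Withdrawal program: update checking account file (new balance: $1)
5. Deposit program: add $100 (my salary)
6. Deposit program: update checking account file (new balance: $151)

The correct final balance is $51 - $50 + $100 = $101. Because the deposit program worked from the stale balance of $51, its update overwrote the withdrawal, and the file ends up with $151.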
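The interleaving above can be replayed in a few lines of Python. This is only a sketch: the dictionary `account_file` and the helpers `read_balance` and `write_balance` are hypothetical stand-ins for the checking account file and the two programs, not part of any real banking system.

```python
# Hypothetical in-memory stand-in for the checking account file.
account_file = {"checking": 51}

def read_balance():
    return account_file["checking"]

def write_balance(amount):
    account_file["checking"] = amount

# Steps 1-2: both programs read the same starting balance.
withdrawal_copy = read_balance()
deposit_copy = read_balance()

# Steps 3-4: the withdrawal program subtracts $50 and writes the result.
write_balance(withdrawal_copy - 50)
after_withdrawal = read_balance()

# Steps 5-6: the deposit program adds $100 to its stale copy and writes it,
# overwriting the withdrawal.
write_balance(deposit_copy + 100)
final_balance = read_balance()

# What the balance should have been if the two updates had run one after the other.
correct_balance = 51 - 50 + 100

print(after_withdrawal, final_balance, correct_balance)
```

The gap between `final_balance` and `correct_balance` is the lost withdrawal; a DBMS avoids it by controlling how concurrent transactions interleave.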

Atomicity Problem
– In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure. It is difficult to ensure atomicity in a conventional file-processing system.
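By contrast, a DBMS can treat a group of updates as one all-or-nothing unit. The sketch below uses Python's built-in `sqlite3` module; the `account` table, the `transfer` function, and the `fail_midway` flag are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        # The connection used as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "A", "B", 30, fail_midway=True)
after_failure = balances(conn)   # unchanged: the debit was undone

transfer(conn, "A", "B", 30)
after_success = balances(conn)   # both updates applied together

print(after_failure, after_success)
```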

Integrity problems
– Data may need to satisfy certain conditions, called consistency constraints. For example, account balances should never fall below $0.
– It is difficult to enforce, add, or change such consistency constraints in a file processing system, because they are buried in the code of the individual application programs.
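In a DBMS the same rule can be declared once, as part of the table definition. The following is a minimal sketch using Python's `sqlite3` module; the `account` table and its columns are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The consistency constraint is stated in the schema, not in application code.
conn.execute(
    """
    CREATE TABLE account (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
    """
)
conn.execute("INSERT INTO account VALUES ('A', 51)")
conn.commit()

try:
    # This update would leave a negative balance, so the DBMS refuses it.
    conn.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
    conn.rollback()

balance = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'"
).fetchone()[0]
print(rejected, balance)
```

Every program that touches the table is subject to the same check, which is what a file processing system cannot easily guarantee.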

Security Problems
– Not all users should have access permission to all types of data, but enforcing such restrictions in a file processing system (FPS) is difficult.

Applications of database
1. Banking
2. Airlines
3. Universities
4. Credit card transactions
5. Telecommunication
6. Finance
7. Sales
8. Manufacturing

Introduction to RDBMS
– A DBMS that is based on the relational model is called an RDBMS. The relational model was proposed by E. F. Codd.
– The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name.
– A table is a two-dimensional array containing rows and columns. Each row contains data related to an entity, such as a student. Each column contains the data related to a single attribute of the entity, such as the student name.
– Basic concepts of RDBMS are as follows:
Tuple: In the relational model, a row is called a tuple.
Attribute: A column header is called an attribute.
Degree: The degree of a relation is the number of attributes (columns) of the table.
Domain: The set of all permissible values of an attribute is called its domain.
Cardinality: The number of rows (tuples) in the table is called its cardinality.
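To make the terms concrete, the sketch below builds a small table with Python's `sqlite3` module and reads off its attributes, degree, and cardinality. The `student` table and its sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Mumbai")],
)

cursor = conn.execute("SELECT * FROM student")
attributes = [column[0] for column in cursor.description]  # column headers
tuples = cursor.fetchall()                                  # rows

degree = len(attributes)    # number of attributes
cardinality = len(tuples)   # number of tuples

print(attributes, degree, cardinality)
```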

Figure 1 shows how data is represented in the relational model and the terms used to refer to the various components of a table.

Difference between DBMS and RDBMS

Names of various DBMS and RDBMS software

DBMS Software
– dBase
– FoxBASE
– FoxPro

RDBMS Software
– Oracle
– MySQL
– SQL Server

Data Abstraction
– Hiding the complexities of the database design from users (who are not computer professionals) is known as data abstraction.
– Data abstraction gives users an easy way to retrieve data from the database.
– There are three levels of abstraction.

Physical level :
– The physical level describes how the data are actually stored and where they are stored in the database. Hiding these details from the user is known as physical level abstraction.
– The physical level describes complex low-level data structures in detail.

Logical level :
– The logical level describes what data are stored in the database and what relationships exist among those data. Hiding the physical storage details behind this description is known as logical level abstraction.
– The logical level describes the entire database in terms of a small number of relatively simple data structures.

View level
– The view level of abstraction exists to simplify users' interaction with the system. The system may provide many views for the same database, because many users need to access only a part of the database.
– Views can also hide information (e.g., salary) for security purposes.

Fig. Three levels of Abstraction
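As a small illustration of the view level, the sketch below defines a view that leaves out the salary column. It uses Python's `sqlite3` module; the `employee` table and the view name `employee_public` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full table, including the sensitive salary column.
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Accounts', 40000)")

# View level: users of this view see only part of the database.
conn.execute(
    "CREATE VIEW employee_public AS SELECT emp_id, name, dept FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_public")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns, view_rows)
```

In a multi-user DBMS, access to the base table would additionally be restricted so that such users can query only the view.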

Database Languages

• DDL : Data Definition Language
– DDL commands deal with the database structure.
– DDL commands are used to create, modify and drop the database structure.
– Commands: create, alter, rename, drop

• DQL : Data Query Language
– DQL commands are used to access database data.
– Command: select

• DML : Data Manipulation Language
– DML commands deal with instances of the database.
– DML commands are used to insert, update and delete the actual data of the database.
– Commands: insert, update, delete
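The three groups of commands can be tried out with Python's `sqlite3` module, as in the sketch below. The table and column names are made up, and SQLite's syntax differs in places from Oracle's (for example, renaming is written `ALTER TABLE ... RENAME TO` here), so treat this as an illustration rather than Oracle-ready SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create and alter the structure.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN city TEXT")

# DML: insert, update and delete the actual data.
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Pune')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi', 'Mumbai')")
conn.execute("UPDATE student SET city = 'Nashik' WHERE roll_no = 2")
conn.execute("DELETE FROM student WHERE roll_no = 1")

# DQL: read the data back.
rows = conn.execute("SELECT roll_no, name, city FROM student").fetchall()

# DDL again: rename and then drop the structure.
conn.execute("ALTER TABLE student RENAME TO learner")
conn.execute("DROP TABLE learner")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(rows, tables)
```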

Instances and Schemas

Schema
• The overall structure (overall design) of the database is called the database schema.
• The schema of the database does not change frequently; it is more stable than the data in the database.
• The design of the schema decides the fields and their types in the database.
• We can create the schema or structure of the database by using the create command.
• We can change the schema or structure of the database by using the alter command.

Instances
• The information stored in the database at a particular point of time is known as an instance of the database.
• Instances are not stable in nature and can change frequently.
• Instances denote the rows of a table in an RDBMS and represent the actual data of the database.
• We can add a new instance to the database by using the insert command.

Data Independence
• It is the property of the database which tries to ensure that if we make a change at any level of the schema of the database, the schema immediately above it requires minimal or no change.

• What does this mean? We know that in a building, each floor stands on the floor below it. If we change the design of any one floor, e.g. extending the width of a room by demolishing the western wall of that room, it is likely that the design of the floors above will have to be changed also. As a result, one change needed in one particular floor would mean continuing to change the design of each floor until we reach the top floor, with an increase in time, cost and labour. Would not life be easy if the change could be contained in one floor only? Data independence is the answer to this. It removes the additional work of carrying a single change into all the levels above.

Data independence can be classified into the following two types:

1. Physical Data Independence: This means that for any change made in the physical schema, the need to change the logical schema is minimal. This is practically easier to achieve. Let us explain with an example. Say you have bought an audio CD of a recently released film and one of your friends has bought an audio cassette of the same film. If we consider the physical schema, they are entirely different. The first is a digital recording on an optical medium, where random access is possible. The second is a magnetic recording on a magnetic medium, with strictly sequential access. However, how this change is reflected in the logical schema is very interesting. For music tracks, the logical schema for both the CD and the cassette is the title card imprinted on their back. We have information like track number, name of the song, name of the artist and duration of the track, which is identical for both the CD and the cassette. We can clearly say that we have achieved physical data independence here.

2. Logical Data Independence: This means that for any change made in the logical schema, the need to change the external schema is minimal. As we shall see, this is a little difficult to achieve. Let us explain with an example. Suppose the CD you have bought contains 6 songs, and some of your friends are interested in copying some of those songs (which they like in the film) into their favorite collection. One friend wants songs 1, 2, 4, 5, 6, another wants 1, 3, 4, 5 and another wants 1, 2, 3, 6. Each of these collections can be compared to a view schema for that friend. Now, by some mistake, a scratch has appeared on the CD and you cannot extract song 3. Obviously, you will have to ask the friends who have song 3 in their proposed collection to alter their view by deleting song 3 from it. A change in the underlying collection has forced a change in the views, which is why logical data independence is harder to achieve.

Overall Structure of DBMS
There are four components:
1) Disk Storage
2) Storage Manager
3) Query Processor
4) Database Users

Disk Storage• Data files, which store the database itself.

• Data dictionary which stores metadata about the structure of the database, in particular the schema of the database

• Indices which provide fast access to data items that hold particular values.

Storage Manager
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding what data to cache in main memory.

The Query Processor
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands. A query can usually be translated into any of a number of alternative evaluation plans that all give the same result. The DML compiler also performs query optimization, that is, it picks the lowest cost evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML compiler.
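One way to watch an optimizer choose between alternative plans is SQLite's `EXPLAIN QUERY PLAN` statement, used below through Python's `sqlite3` module. This is a sketch under assumptions: the `customer` table, the index name `idx_postal`, and the helper `plan` are invented here, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "cust_id INTEGER PRIMARY KEY, name TEXT, postal_code TEXT)"
)

query = "SELECT name FROM customer WHERE postal_code = '411001'"

def plan(conn, sql):
    """Return the human-readable steps of the plan SQLite chose for sql."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without an index, the only available plan is a full scan of the table.
plan_without_index = plan(conn, query)

# With an index on postal_code, a cheaper alternative plan exists.
conn.execute("CREATE INDEX idx_postal ON customer (postal_code)")
plan_with_index = plan(conn, query)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls; only the evaluation plan picked for it changes.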

Fig. Overall Structure of DBMS

Database Users

Naive users – invoke one of the permanent application programs that have been written previously. E.g. people accessing a database over the web, bank tellers, clerical staff.

Application programmers – interact with the system through DML calls. They are responsible for writing application programs using DML commands.

Sophisticated users – form requests in a database query language. They are responsible for searching the required data in the database (e.g. data mining).

Functions of Database Administrator

• Definition of the schema, the architecture of the three levels of data abstraction, and data independence.
• Modification of the defined schema as and when required.
• Creating new user IDs, passwords etc., and also creating the access permissions that each user can or cannot enjoy. The DBA is responsible for creating user roles.
• Defining the integrity constraints for the database to ensure that the data entered conform to some rules, thereby increasing the reliability of data.
• Creating a security mechanism to prevent unauthorized access and accidental or intentional handling of data that can cause a security threat.
• Creating a backup and recovery policy. This is essential because in case of a failure the database must be able to revive itself to its complete functionality with no loss of data, as if the failure had never occurred.

Two / Three Tier architecture

Two-tier architecture: E.g. client programs using ODBC/JDBC to communicate with a database.
Three-tier architecture: E.g. web-based applications, and applications built using "middleware".

Two-tier architecture
In a two-tier architecture, the application is partitioned into a component that resides at the client machine, which invokes database system functionality at the server machine through query language statements. Application program interface standards like ODBC and JDBC are used for interaction between the client and the server.

Three-tier architecture
In contrast, in a three-tier architecture, the client machine acts merely as a front end and does not contain any direct database calls. Instead, the client end communicates with an application server, usually through a forms interface. The application server in turn communicates with a database system to access data. The business logic of the application, which says what actions to carry out under what conditions, is embedded in the application server, instead of being distributed across multiple clients. Three-tier applications are more appropriate for large applications, and for applications that run on the World Wide Web.
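The difference in where the code lives can be sketched in a single Python process. Everything here is a simplification made for illustration: an in-memory SQLite database stands in for the database server, the `ApplicationServer` class stands in for the middle tier, and the function names are hypothetical. No network, ODBC or JDBC is involved.

```python
import sqlite3

def make_database():
    """Stand-in for the database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('A', 51)")
    conn.commit()
    return conn

def get_balance(conn, name):
    row = conn.execute("SELECT balance FROM account WHERE name = ?", (name,))
    return row.fetchone()[0]

def apply_withdrawal(conn, name, amount):
    conn.execute(
        "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, name)
    )
    conn.commit()

# Two-tier: the client itself holds the business rule and issues the SQL.
def two_tier_client_withdraw(conn, name, amount):
    if get_balance(conn, name) < amount:
        return "rejected"
    apply_withdrawal(conn, name, amount)
    return "ok"

# Three-tier: the business rule and the SQL live in the application server.
class ApplicationServer:
    def __init__(self, conn):
        self._conn = conn

    def withdraw(self, name, amount):
        if get_balance(self._conn, name) < amount:
            return "rejected"
        apply_withdrawal(self._conn, name, amount)
        return "ok"

# The three-tier client contains no database calls at all.
def three_tier_client_withdraw(server, name, amount):
    return server.withdraw(name, amount)

db_two = make_database()
two_tier_results = [
    two_tier_client_withdraw(db_two, "A", 100),
    two_tier_client_withdraw(db_two, "A", 50),
]

server = ApplicationServer(make_database())
three_tier_results = [
    three_tier_client_withdraw(server, "A", 100),
    three_tier_client_withdraw(server, "A", 50),
]

print(two_tier_results, three_tier_results)
```

Both versions behave the same here; the point is that in the three-tier version a change to the withdrawal rule touches only the application server, not every client.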

E.F. Codd’s laws for fully functional RDBMS
1. All data should be presented to the user in table form.
2. All data should be accessible without ambiguity.
3. A column should be allowed to remain empty.
4. The DBMS must provide access to its structure through the same tools that are used to access the data.
5. The DBMS must support a clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control.
6. Data can be presented to the user in different logical combinations called views.
7. Set operations like Union, Intersection and Minus should be supported.
8. The user is isolated from the physical method of storing and retrieving information from the database.
9. How a user views data should not change when the logical structure (table structure) of the database changes.
10. Integrity constraints on user input must be stored in the data dictionary, to make the RDBMS front-end independent.
11. A user should be totally unaware of whether the database is housed on one computer or distributed across several computers.
12. Users should not be allowed to modify the database structure using any GUI-based applications. Database structure modification must be done only by direct SQL commands.

We can classify databases into two categories as follows:

• Centralized Database : The data reside in one single location.
• Distributed Database : The data is distributed over multiple locations.

Distributed Database
A distributed database is a collection of partially independent databases that (ideally) share a common schema, and coordinate processing of transactions that access nonlocal data. The processors communicate with one another through a communication network that handles routing and connection strategies.

Fig. Distributed Database (a DDBMS connecting the Admission Section: student personal data, the Account Section: student fees record, and the Exam Section: student result analysis)

Types of Distributed DBMS

Homogeneous DDBMS
A homogeneous distributed database has identical software and hardware running all database instances, and may appear through a single interface as if it were a single database.

Heterogeneous DDBMS
A heterogeneous distributed database may have different hardware, operating systems, database management systems, and even data models for different databases.

DATA WAREHOUSE

Definition
A data warehouse is an integrated, subject-oriented, time-variant, nonvolatile collection of data in support of management's decision-making process.

Integrated
Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best salesman for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred previously.

Time Variant
In order to discover trends in business, analysts need large amounts of historical data. So data must be loaded into the warehouse periodically and kept over time, rather than holding only the current values.

DATA MINING
Data mining (knowledge discovery in databases): Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.

Data Independence
Data independence can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.

There are two types of data independence:

Logical data independence
Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs.
Example: a change in a constraint at the logical level should not affect the application programs.

Physical data independence
Physical data independence is the capacity to change the internal schema without having to change the conceptual schema. Hence, the external schemas need not be changed either.
Example: if physical files are reorganized, it should not affect the logical organization of the database.
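Both kinds of independence can be observed on a small scale with Python's `sqlite3` module. The sketch below is illustrative only: the `student` tables, the view `student_names`, and the variable `app_query` (standing in for an application program) are assumptions, and adding an index is used as a simple stand-in for a physical reorganization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Mumbai")],
)
# External schema: the application sees only this view.
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")

# The "application program": it never changes below.
app_query = "SELECT name FROM student_names ORDER BY roll_no"
before = conn.execute(app_query).fetchall()

# Physical change: add an index. The tables and the view are untouched.
conn.execute("CREATE INDEX idx_student_city ON student (city)")
after_physical = conn.execute(app_query).fetchall()

# Logical change: split the table in two, then redefine the view over the
# new tables so that the external schema stays the same.
conn.execute("CREATE TABLE student_core (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE student_address (roll_no INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO student_core SELECT roll_no, name FROM student")
conn.execute("INSERT INTO student_address SELECT roll_no, city FROM student")
conn.execute("DROP VIEW student_names")
conn.execute("DROP TABLE student")
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student_core")
after_logical = conn.execute(app_query).fetchall()

print(before, after_physical, after_logical)
```

Note the extra work in the logical case: the view had to be redefined by hand, which reflects why logical data independence is the harder of the two to achieve.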

• Explain any four advantages of DBMS.
• Differentiate between Two tier and Three tier architecture.
• Explain the following terms:
– Storage Manager
– Database Users
• Write down any four advantages of DDBMS.

• DCL : Data Control Language
– DCL commands are used to implement the access control policies of the organization.
– DCL commands are used to set/reset users' permissions on different objects (e.g. tables, views etc.).
– Commands: grant, revoke

• TCL : Transaction Control Language
– TCL commands are used to complete transactions without atomicity problems.
– TCL commands are useful for maintaining the database in a consistent state.
– Commands: commit, rollback, savepoint
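The TCL commands can be tried with Python's `sqlite3` module, as sketched below. The `account` table and the savepoint name `before_bonus` are made up for illustration. The DCL commands are not shown because SQLite, being an embedded single-user engine, does not implement grant and revoke.

```python
import sqlite3

# isolation_level=None stops the sqlite3 module from opening transactions
# on its own, so every TCL statement below is issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")

# commit with a savepoint: keep the insert, undo only the later update.
conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("SAVEPOINT before_bonus")
conn.execute("UPDATE account SET balance = balance + 500 WHERE name = 'A'")
conn.execute("ROLLBACK TO SAVEPOINT before_bonus")
conn.execute("COMMIT")
after_commit = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'"
).fetchone()[0]

# rollback: undo the whole transaction.
conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")
after_rollback = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

print(after_commit, after_rollback)
```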
