DATABASE MANAGEMENT SYSTEM
What is a database management system? Explain the three-level architecture with a
diagram.
A database management system (DBMS) is the software that allows a computer to
perform database functions of storing, retrieving, adding, deleting and modifying data.
Relational database management systems (RDBMS) implement the relational model of
tables and relationships. A database management system (DBMS) is a software package
designed to define, manipulate, retrieve and manage data in a database. A DBMS
generally manipulates the data itself, the data format, field names, record structure and file
structure. It also defines rules to validate and manipulate this data. A DBMS relieves users
of framing programs for data maintenance. Fourth-generation query languages, such as
SQL, are used along with the DBMS package to interact with a database.
Three Level Architecture of DBMS
An early proposal for a standard terminology and general architecture for database systems
was produced in 1971 by the DBTG (Data Base Task Group) appointed by the Conference
on Data Systems and Languages (CODASYL). The DBTG recognized the need for a two-level
approach, with a system view called the schema and user views called subschemas. The
American National Standards Institute (ANSI-SPARC) proposed a standard terminology and
architecture in 1975, recognizing the need for a three-level approach with a system catalog.
The design of a database management system depends highly on its architecture, which can
be centralized, decentralized or hierarchical. DBMS architecture can be seen as single-tier
or multi-tier. An n-tier architecture divides the whole system into n related but independent
modules, which can be independently modified, altered or replaced.
In 1-tier architecture, the DBMS is the only entity: the user sits directly on the DBMS and
uses it. Any changes made here are made directly on the DBMS itself. This arrangement does
not provide handy tools for end users, so single-tier architecture is used mainly by database
designers and programmers.
If the architecture of the DBMS is 2-tier, there must be some application which uses the
DBMS. Programmers use 2-tier architecture where they access the DBMS by means of an
application. Here the application tier is entirely independent of the database in terms of
operation, design and programming.
3-tier architecture
The most widely used architecture is 3-tier architecture, which separates its tiers from
each other on the basis of users. It is described as follows:
[3-tier DBMS architecture]
Database (Data) Tier: At this tier, only the database resides. The database, along with
its query processing languages, sits in layer 3 of the 3-tier architecture. It also
contains all relations and their constraints.
Application (Middle) Tier: At this tier reside the application server and the programs
that access the database. For a user, this application tier presents an abstracted view
of the database. Users are unaware of any existence of the database beyond the
application. For the database tier, the application tier is its user, and the database
tier is not aware of any other user beyond it. This tier works as a mediator between the two.
User (Presentation) Tier: The end user sits on this tier. From the user's perspective,
this tier is everything: he or she does not know about any existence or form of the
database beyond this layer. At this layer, multiple views of the database can be provided
by the application. All views are generated by applications which reside in the
application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
There are three levels or layers of DBMS architecture:
1. External Level
2. Conceptual Level
3. Internal Level
1. External Level: - The external level is described by an external schema, i.e. it consists
of the definitions of the logical records and relationships in the external view. It also
contains the method of deriving the objects in the external view from the objects in the
conceptual view.
2. Conceptual Level: - The conceptual level represents the entire database. The conceptual
schema describes the records and relationships included in the conceptual view. It also
contains the method of deriving the objects in the conceptual view from the objects in the
internal view.
3. Internal Level: - The internal level indicates how the data will be stored and describes
the data structures and access methods to be used by the database. It contains the
definition of stored records and the methods of representing the data fields and access
aids used.
A mapping between the external and conceptual views gives the correspondence among the
records and relationships of the conceptual and external views. The external view is an
abstraction of the conceptual view, which in turn is an abstraction of the internal view. It
describes the contents of the database as perceived by the user or application program of
that view.
Explain all DDL & DML commands with syntax and output.
DML vs. DDL
Data Manipulation Language (also known as DML) is a family of computer languages.
They are used by computer programs, and/or database users, to manipulate data in a
database: that is, to insert, delete and update this data in the database.
Data Definition Language (also known as DDL) is a computer language used to define
data structures as its namesake suggests. It first made its appearance in the CODASYL
database model (a model pertaining to the information technology industry consortium,
known as Conference on Data Systems Languages). DDL was used within the schema of
the database in order to describe the records, fields, and sets that made up the user Data
Model. It was at first a way in which programmers defined SQL. Now, however, it is used
generically to refer to any formal language used to describe data or information structures
(for example, XML schemas).
The most popular form of DML is the Structured Query Language (or SQL). This is a
language used for databases, and is designed specifically for managing data in relational
database management systems (or RDBMS). There are also other forms in which DML is
used, for instance IMS/DL/I, CODASYL databases (IDMS, for example), and a few
others. DML comprises the SQL data change statements, meaning that stored data is
modified but the schema or database objects remain the same. The functional capability of
the DML is organised by the initial word in a statement. This word is most generally a
verb giving the statement a specific action to fulfil. There are four specific verbs that
initiate an action: SELECT (including SELECT ... INTO), INSERT, UPDATE, and DELETE.
The DDL is used mainly to create objects: that is, to make a new database, table, index or
stored query. A CREATE statement in SQL literally creates an object inside any RDBMS. As
such, the types of objects able to be created are completely dependent on which RDBMS
is currently in use. Most RDBMS support the table, index, user, synonym and database
creation. In some cases, a system will allow the CREATE command and other DDL
commands inside a specific transaction. This means that these functions are capable of
being rolled back. The most common CREATE command is the CREATE TABLE
command.
DMLs vary considerably; their functions and capabilities differ between database
vendors. There are, however, only two kinds of DML: procedural and declarative. While
there are multiple standards established for SQL, most vendors provide their own
extensions to the standard without implementing it entirely.
Summary:
1. DML is a grouping of computer languages used by computer programs to manipulate
data in a database; DDL is a computer language used specifically to define data structures.
2. The most popular form of DML is SQL, which comprises various data change statements;
DDL mainly uses the CREATE command.
DDL
Data Definition Language (DDL) statements are used to define the database structure or
schema. Some examples:
o CREATE - to create objects in the database
o ALTER - alters the structure of the database
o DROP - delete objects from the database
o TRUNCATE - remove all records from a table, including all space allocated for the records
o COMMENT - add comments to the data dictionary
o RENAME - rename an object
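As an illustration, the DDL statements above can be exercised against SQLite through Python's sqlite3 module. This is only a sketch with illustrative table and index names; SQLite supports CREATE, ALTER and DROP, but not TRUNCATE or COMMENT, so those are omitted:

```python
import sqlite3

# In-memory database for demonstration
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE - define a new table (a schema object)
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER - change the structure of an existing table
cur.execute("ALTER TABLE employee ADD COLUMN salary REAL")

# CREATE INDEX - another kind of schema object
cur.execute("CREATE INDEX idx_name ON employee (name)")

# SQLite's data dictionary (sqlite_master) now describes these objects
rows = cur.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(rows)  # [('table', 'employee'), ('index', 'idx_name')]

# DROP - delete a schema object
cur.execute("DROP INDEX idx_name")
```

Note that DDL manipulates the schema (the catalog entries), not the stored rows themselves.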
DML
Data Manipulation Language (DML) statements are used for managing data within
schema objects. Some examples:
o SELECT - retrieve data from a database
o INSERT - insert data into a table
o UPDATE - update existing data within a table
o DELETE - delete all records from a table; the space for the records remains
o MERGE - UPSERT operation (insert or update)
o CALL - call a PL/SQL or Java subprogram
o EXPLAIN PLAN - explain the access path to data
o LOCK TABLE - control concurrency
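A minimal sketch of the main DML verbs, again using SQLite via Python's sqlite3 module; the emp table and its contents are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# INSERT - add new rows
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "Ada", "IT"), (2, "Boris", "HR")])

# UPDATE - modify existing rows
cur.execute("UPDATE emp SET dept = 'Finance' WHERE name = 'Boris'")

# SELECT - retrieve rows
rows = cur.execute("SELECT name, dept FROM emp ORDER BY id").fetchall()
print(rows)  # [('Ada', 'IT'), ('Boris', 'Finance')]

# DELETE - remove rows (the table itself remains)
cur.execute("DELETE FROM emp WHERE id = 1")
print(cur.execute("SELECT COUNT(*) FROM emp").fetchone()[0])  # 1
```

Unlike DDL, these statements change only the stored data; the schema objects stay the same.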
DCL
Data Control Language (DCL) statements. Some examples:
o GRANT - gives users access privileges to the database
o REVOKE - withdraws access privileges given with the GRANT command
TCL
Transaction Control (TCL) statements are used to manage the changes made by DML
statements. It allows statements to be grouped together into logical transactions.
o COMMIT - save work done
o SAVEPOINT - identify a point in a transaction to which you can later roll back
o ROLLBACK - restore the database to its state at the last COMMIT
o SET TRANSACTION - change transaction options like isolation level and which rollback segment to use
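The transaction control statements can be sketched with SQLite as well. Passing isolation_level=None makes Python's sqlite3 module hand transaction statements through unchanged; the account table and balances are illustrative:

```python
import sqlite3

# isolation_level=None lets us issue transaction statements ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
cur.execute("INSERT INTO account VALUES (1, 100.0)")

cur.execute("BEGIN")
cur.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
cur.execute("SAVEPOINT after_debit")              # a point we can return to
cur.execute("UPDATE account SET balance = 0 WHERE id = 1")
cur.execute("ROLLBACK TO SAVEPOINT after_debit")  # undo only the second update
cur.execute("COMMIT")                             # make the debit permanent

balance = cur.execute("SELECT balance FROM account").fetchone()[0]
print(balance)  # 70.0
```

The ROLLBACK TO SAVEPOINT undoes the zeroing but leaves the earlier debit inside the same transaction, which the final COMMIT then makes permanent.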
Explain various types of data models with the help of diagrams?
In software engineering, the term data model is used in two related senses. In the sense
covered by this article, it is a description of the objects represented by a computer system
together with their properties and relationships; these are typically "real world" objects
such as products, suppliers, customers, and orders. In the second sense, covered by the
article database model, it means a collection of concepts and rules used in defining data
models: for example the relational model uses relations and tuples, while the network
model uses records, sets, and fields.
Overview of the data modeling context: a data model is based on data, data relationships,
data semantics and data constraints. A data model provides the details of the information to be
stored, and is of primary use when the final product is the generation of computer software
code for an application or the preparation of a functional specification to aid a computer
software make-or-buy decision. The figure is an example of the interaction between
process and data models.
Data models are often used as an aid to communication between the business people
defining the requirements for a computer system and the technical people defining the
design in response to those requirements. They are used to show the data needed and
created by business processes.
According to Hoberman (2009), "A data model is a wayfinding tool for both business and
IT professionals, which uses a set of symbols and text to precisely explain a subset of real
information to improve communication within the organization and thereby lead to a more
flexible and stable application environment."
A data model explicitly determines the structure of data. Data models are specified in a
data modeling notation, which is often graphical in form.
A data model can be sometimes referred to as a data structure, especially in the context of
programming languages. Data models are often complemented by function models,
especially in the context of enterprise models.
Relationships and functions
A given database management system may provide one or more of these models. The
optimal structure depends on the natural organization of the application's data, and on the
application's requirements, which include transaction rate (speed), reliability,
maintainability, scalability, and cost. Most database management systems are built around
one particular data model, although it is possible for products to offer support for more
than one model.
Various physical data models can implement any given logical model. Most database
software will offer the user some level of control in tuning the physical implementation,
since the choices that are made have a significant effect on performance.
A model is not just a way of structuring data: it also defines a set of operations that can be
performed on the data. The relational model, for example, defines operations such as
select (project) and join. Although these operations may not be explicit in a particular
query language, they provide the foundation on which a query language is built.
Flat model
Flat File Model.
The flat (or table) model consists of a single, two-dimensional array of data elements,
where all members of a given column are assumed to be similar values, and all members
of a row are assumed to be related to one another. For instance, there might be columns
for name and password used as part of a system security database, where each row holds
the password associated with an individual user. Columns of the table often have a
type associated with them, defining them as character data, date or time information,
integers, or floating point numbers. This tabular format is a precursor to the relational
model.
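A flat model can be sketched as a plain two-dimensional table, for example a CSV file read with Python's csv module; the column names follow the name/password example above:

```python
import csv
import io

# A flat (table) model: a single 2-D array; each column holds similar
# values, and each row describes one user (illustrative security table)
flat_file = io.StringIO("name,password\nalice,s3cret\nbob,hunter2\n")
rows = list(csv.DictReader(flat_file))

print(rows[0]["name"], rows[0]["password"])  # alice s3cret
```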
Early data models
These models were popular in the 1960s and 1970s, but nowadays can be found primarily in
old legacy systems. They are characterized primarily by being navigational with strong
connections between their logical and physical representations, and deficiencies in data
independence.
Hierarchical model
In a hierarchical model, data is organized into a tree-like structure, implying a single
parent for each record. A sort field keeps sibling records in a particular order. Hierarchical
structures were widely used in the early mainframe database management systems, such as
the Information Management System (IMS) by IBM, and now describe the structure of
XML documents. This structure allows one one-to-many relationship between two types of
data, and is well suited to describing many real-world relationships: recipes, tables of
contents, ordering of paragraphs/verses, and any nested and sorted information.
This hierarchy is used as the physical order of records in storage. Record access is done by
navigating through the data structure using pointers combined with sequential accessing.
Because of this, the hierarchical structure is inefficient for certain database operations
when a full path (as opposed to upward link and sort field) is not also included for each
record. Such limitations have been compensated for in later IMS versions by additional
logical hierarchies imposed on the base physical hierarchy.
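A minimal sketch of hierarchical access in Python, assuming an illustrative table-of-contents tree: each node has a single parent, and records are reached by navigating pointers from the root, as in IMS-style access:

```python
# A hierarchical record: one parent per node; children are reached
# by following pointers downward from the root (names are illustrative)
book = {
    "title": "Cookbook",
    "children": [                       # one-to-many: book -> chapters
        {"title": "Starters",
         "children": [{"title": "Soup", "children": []}]},
        {"title": "Mains", "children": []},
    ],
}

def find(node, title):
    """Depth-first navigation from the root down the tree."""
    if node["title"] == title:
        return node
    for child in node["children"]:
        hit = find(child, title)
        if hit:
            return hit
    return None

print(find(book, "Soup")["title"])  # Soup
```

Note how reaching "Soup" requires traversing its full ancestor path, which is exactly the inefficiency the text describes when no full path is stored per record.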
Network model
The network model expands upon the hierarchical structure, allowing many-to-many
relationships in a tree-like structure that allows multiple parents. It was the most popular
before being replaced by the relational model, and is defined by the CODASYL
specification.
The network model organizes data using two fundamental concepts, called records and
sets. Records contain fields (which may be organized hierarchically, as in the
programming language COBOL). Sets (not to be confused with mathematical sets) define
one-to-many relationships between records: one owner, many members. A record may be
an owner in any number of sets, and a member in any number of sets.
A set consists of circular linked lists where one record type, the set owner or parent,
appears once in each circle, and a second record type, the subordinate or child, may appear
multiple times in each circle. In this way a hierarchy may be established between any two
record types, e.g., type A is the owner of B. At the same time another set may be defined
where B is the owner of A. Thus all the sets comprise a general directed graph (ownership
defines a direction), or network construct. Access to records is either sequential (usually in
each record type) or by navigation in the circular linked lists.
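The records-and-sets idea can be sketched in Python. The record and set names below are illustrative, and plain lists stand in for the circular linked lists an actual CODASYL system would use:

```python
# CODASYL-style records and sets: an owner record links to member
# records; one record may be a member of several different sets
class Record:
    def __init__(self, name):
        self.name = name
        self.sets = {}          # set name -> list of member records

    def connect(self, set_name, member):
        self.sets.setdefault(set_name, []).append(member)

dept = Record("Sales")
proj = Record("Launch")
emp = Record("Ada")

dept.connect("works_in", emp)   # emp is a member of the dept-owned set
proj.connect("assigned", emp)   # ...and also of a project-owned set

# Navigational access: follow a set from its owner to its members
print([m.name for m in dept.sets["works_in"]])  # ['Ada']
```

Because "Ada" participates in two sets with different owners, the structure forms a general directed graph rather than a tree.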
The network model is able to represent redundancy in data more efficiently than in the
hierarchical model, and there can be more than one path from an ancestor node to a
descendant. The operations of the network model are navigational in style: a program
maintains a current position, and navigates from one record to another by following the
relationships in which the record participates. Records can also be located by supplying
key values.
Although it is not an essential feature of the model, network databases generally
implement the set relationships by means of pointers that directly address the location of a
record on disk. This gives excellent retrieval performance, at the expense of operations
such as database loading and reorganization.
Popular DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS.
IDMS gained a considerable customer base; in the 1980s, it adopted the relational model
and SQL in addition to its original tools and languages.
Most object databases (invented in the 1990s) use the navigational concept to provide fast
navigation across networks of objects, generally using object identifiers as "smart"
pointers to related objects. Objectivity/DB, for instance, implements named one-to-one,
one-to-many, many-to-one, and many-to-many relationships that can cross
databases. Many object databases also support SQL, combining the strengths of both
models.
Inverted file model
In an inverted file or inverted index, the contents of the data are used as keys in a lookup
table, and the values in the table are pointers to the location of each instance of a given
content item. This is also the logical structure of contemporary database indexes, which
might use only the contents from particular columns in the lookup table. The inverted
file data model can put indexes in a second set of files next to existing flat database files,
in order to efficiently directly access needed records in these files.
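A minimal inverted-index sketch in Python: the words occurring in each record become the lookup keys, and the values are the record numbers where each word occurs (the records themselves are illustrative):

```python
# An inverted file: content values become keys; the lookup table maps
# each value to the locations (record numbers) where it appears
records = ["red apple", "green apple", "red cherry"]

index = {}
for rec_no, rec in enumerate(records):
    for word in rec.split():
        index.setdefault(word, []).append(rec_no)

print(index["red"])    # [0, 2]
print(index["apple"])  # [0, 1]
```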
Notable for using this data model is the ADABAS DBMS of Software AG, introduced in
1970. ADABAS gained a considerable customer base and is still in use and supported
today. In the 1980s it adopted the relational model and SQL in addition to its original
tools and languages.
Relational model
The relational model was introduced by E.F. Codd in 1970 as a way to make database
management systems more independent of any particular application. It is a mathematical
model defined in terms of predicate logic and set theory, and systems implementing it
have been used by mainframe, midrange and microcomputer systems.
The products that are generally referred to as relational databases in fact implement a
model that is only an approximation to the mathematical model defined by Codd. Three
key terms are used extensively in relational database models: relations, attributes, and
domains. A relation is a table with columns and rows. The named columns of the relation
are called attributes, and the domain is the set of values the attributes are allowed to take.
The basic data structure of the relational model is the table, where information about a
particular entity (say, an employee) is represented in rows (also called tuples) and
columns. Thus, the "relation" in "relational database" refers to the various tables in the
database; a relation is a set of tuples. The columns enumerate the various attributes of the
entity (the employee's name, address or phone number, for example), and a row is an
actual instance of the entity (a specific employee) that is represented by the relation. As a
result, each tuple of the employee table represents various attributes of a single employee.
All relations (and, thus, tables) in a relational database have to adhere to some basic rules
to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there
can't be identical tuples or rows in a table. And third, each tuple will contain a single value
for each of its attributes.
A relational database contains multiple tables, each similar to the one in the "flat" database
model. One of the strengths of the relational model is that, in principle, any value
occurring in two different records (belonging to the same table or to different tables),
implies a relationship among those two records. Yet, in order to enforce explicit integrity
constraints, relationships between records in tables can also be defined explicitly, by
identifying or non-identifying parent-child relationships characterized by assigning
cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set
of attributes that can act as a "key", which can be used to uniquely identify each tuple in
the table.
A key that can be used to uniquely identify a row in a table is called a primary key. Keys
are commonly used to join or combine data from two or more tables. For example, an
Employee table may contain a column named Location which contains a value that
matches the key of a Location table. Keys are also critical in the creation of indexes,
which facilitate fast retrieval of data from large tables. Any column can be a key, or
multiple columns can be grouped together into a compound key. It is not necessary to
define all the keys in advance; a column can be used as a key even if it was not originally
intended to be one.
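The Employee/Location key example can be sketched in SQLite via Python's sqlite3 module; the table contents are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE location (loc_id INTEGER PRIMARY KEY, city TEXT)")
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
            "loc_id INTEGER REFERENCES location(loc_id))")
cur.executemany("INSERT INTO location VALUES (?, ?)",
                [(1, "Oslo"), (2, "Pune")])
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(10, "Ada", 1), (11, "Raj", 2)])

# Join the two tables on the shared key column
rows = cur.execute(
    "SELECT e.name, l.city FROM employee e "
    "JOIN location l ON e.loc_id = l.loc_id ORDER BY e.emp_id").fetchall()
print(rows)  # [('Ada', 'Oslo'), ('Raj', 'Pune')]
```

Here loc_id in employee is a surrogate key referencing the primary key of location, which is exactly the join mechanism the paragraph describes.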
A key that has an external, real-world meaning (such as a person's name, a book's ISBN,
or a car's serial number) is sometimes called a "natural" key. If no natural key is suitable
(think of the many people named Brown), an arbitrary or surrogate key can be assigned
(such as by giving employees ID numbers). In practice, most databases have both
generated and natural keys, because generated keys can be used internally to create links
between rows that cannot break, while natural keys can be used, less reliably, for searches
and for integration with other databases. (For example, records in two independently
developed databases could be matched up by social security number, except when the
social security numbers are incorrect, missing, or have changed.)
The most common query language used with the relational model is the Structured Query
Language (SQL).
Dimensional model
The dimensional model is a specialized adaptation of the relational model used to
represent data in data warehouses in a way that data can be easily summarized using
online analytical processing, or OLAP queries. In the dimensional model, a database
schema consists of a single large table of facts that are described using dimensions and
measures. A dimension provides the context of a fact (such as who participated, when and
where it happened, and its type) and is used in queries to group related facts together.
Dimensions tend to be discrete and are often hierarchical; for example, the location might
include the building, state, and country. A measure is a quantity describing the fact, such
as revenue. It is important that measures can be meaningfully aggregated; for example,
the revenue from different locations can be added together.
In an OLAP query, dimensions are chosen and the facts are grouped and aggregated
together to create a summary.
The dimensional model is often implemented on top of the relational model using a star
schema, consisting of one highly normalized table containing the facts, and surrounding
denormalized tables containing each dimension. An alternative physical implementation,
called a snowflake schema, normalizes multi-level hierarchies within a dimension into
multiple tables.
A data warehouse can contain multiple dimensional schemas that share dimension tables,
allowing them to be used together. Coming up with a standard set of dimensions is an
important part of dimensional modeling.
Its high performance has made the dimensional model the most popular database structure
for OLAP.
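A toy star schema in SQLite via Python's sqlite3 module: one fact table holding a revenue measure, one location dimension, and an OLAP-style grouping query (all names and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# One fact table surrounded by a dimension table (star schema)
cur.execute("CREATE TABLE dim_location (loc_id INTEGER PRIMARY KEY, "
            "state TEXT, country TEXT)")
cur.execute("CREATE TABLE fact_sales (loc_id INTEGER, revenue REAL)")
cur.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                [(1, "CA", "US"), (2, "NY", "US")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 100.0), (1, 50.0), (2, 25.0)])

# OLAP-style query: group facts by a dimension, aggregate the measure
rows = cur.execute(
    "SELECT d.state, SUM(f.revenue) FROM fact_sales f "
    "JOIN dim_location d ON f.loc_id = d.loc_id "
    "GROUP BY d.state ORDER BY d.state").fetchall()
print(rows)  # [('CA', 150.0), ('NY', 25.0)]
```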
Post-relational database models
Products offering a more general data model than the relational model are sometimes
classified as post-relational. Alternate terms include "hybrid database", "Object-enhanced
RDBMS" and others. The data model in such products incorporates relations but is not
constrained by E.F. Codd's Information Principle, which requires that "all information
in the database must be cast explicitly in terms of values in relations and in no other
way".
Some of these extensions to the relational model integrate concepts from technologies that
pre-date the relational model. For example, they allow representation of a directed graph
with trees on the nodes. The German company sones implements this concept in its
GraphDB.
Some post-relational products extend relational systems with non-relational features.
Others arrived in much the same place by adding relational features to pre-relational
systems. Paradoxically, this allows products that are historically pre-relational, such as
PICK and MUMPS, to make a plausible claim to be post-relational.
The resource space model (RSM) is a non-relational data model based on multi-
dimensional classification.
Graph model
Graph databases allow even more general structure than a network database; any node
may be connected to any other node.
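A graph model can be sketched with a plain adjacency list. The node names are illustrative, and the traversal shows that any node may link to any other node, cycles included:

```python
# A graph model: any node may be connected to any other node
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Carol"],
    "Carol": ["Alice"],   # cycles are allowed, unlike a hierarchy
}

def reachable(graph, start):
    """All nodes reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

print(sorted(reachable(graph, "Bob")))  # ['Alice', 'Bob', 'Carol']
```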
Multivalue model
Multivalue databases hold "lumpy" data: they can store data exactly the same way as
relational databases, but they also permit a level of depth which the relational model can
only approximate using sub-tables. This is nearly identical to the way XML expresses
data, where a given field/attribute can have multiple right answers at the same time.
Multivalue can be thought of as a compressed form of XML.
An example is an invoice, which in either multivalue or relational data could be seen as
(A) Invoice Header Table - one entry per invoice, and (B) Invoice Detail Table - one entry
per line item. In the multivalue model, we have the option of storing the data as one table,
with an embedded table to represent the detail: (A) Invoice Table - one entry per invoice,
no other tables needed.
The advantage is that the atomicity of the Invoice (conceptual) and the Invoice (data
representation) are one-to-one. This also results in fewer reads, less referential integrity
issues, and a dramatic decrease in the hardware needed to support a given transaction
volume.
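The one-table invoice can be sketched as a nested Python structure, with the detail lines embedded directly in the header record (field names and values are illustrative):

```python
# Multivalue-style storage: the invoice header embeds its detail lines
# in one record, instead of using a separate Invoice Detail table
invoice = {
    "invoice_no": 1001,
    "customer": "Acme",
    "lines": [                      # embedded table of line items
        {"item": "Widget", "qty": 2, "price": 5.0},
        {"item": "Gadget", "qty": 1, "price": 12.5},
    ],
}

# One read retrieves the whole conceptual invoice
total = sum(line["qty"] * line["price"] for line in invoice["lines"])
print(total)  # 22.5
```

Since header and lines live in one record, the conceptual invoice and its stored representation are one-to-one, as the text notes.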
Object-oriented database models
In the 1990s, the object-oriented programming paradigm was applied to database
technology, creating a new database model known as object databases. This aims to avoid
the object-relational impedance mismatch - the overhead of converting information
between its representation in the database (for example as rows in tables) and its
representation in the application program (typically as objects). Even further, the type
system used in a particular application can be defined directly in the database, allowing the
database to enforce the same data integrity invariants. Object databases also introduce the
key ideas of object programming, such as encapsulation and polymorphism, into the world
of databases.
A variety of ways have been tried for storing objects in a database. Some products
have approached the problem from the application programming end, by making the
objects manipulated by the program persistent. This typically requires the addition of some
kind of query language, since conventional programming languages do not have the ability
to find objects based on their information content. Others have attacked the problem from
the database end, by defining an object-oriented data model for the database, and defining
a database programming language that allows full programming capabilities as well as
traditional query facilities.
Object databases suffered because of a lack of standardization: although standards were
defined by ODMG, they were never implemented well enough to ensure interoperability
between products. Nevertheless, object databases have been used successfully in many
applications: usually specialized applications such as engineering databases or molecular
biology databases rather than mainstream commercial data processing. However, object
database ideas were picked up by the relational vendors and influenced extensions made to
these products and indeed to the SQL language.
An alternative to translating between objects and relational databases is to use an object-
relational mapping (ORM) library.
Explain all types of DBMS. List the systems and commands along with output.
A DBMS always provides data independence. Any change in storage mechanism and
formats are performed without modifying the entire application. There are four main types
of database organization:
Relational Database: Data is organized as logically independent tables.
Relationships among tables are shown through shared data. The data in one table
may reference similar data in other tables, which maintains the integrity of the
links among them. This feature is referred to as referential integrity - an important
concept in a relational database system. Operations such as "select" and "join" can
be performed on these tables. This is the most widely used system of database
organization.
Flat Database: Data is organized in a single kind of record with a fixed number of
fields. This database type encounters more errors due to the repetitive nature of
data.
Object Oriented Database: Data is organized with similarity to object oriented
programming concepts. An object consists of data and methods, while classes
group objects having similar data and methods.
Hierarchical Database: Data is organized with hierarchical relationships. It
becomes a complex network if the one-to-many relationship is violated.
Data management models
The data management systems (also called data base management systems) introduced
several new ways of organizing data. That is, they introduced several new ways of linking
record fragments (or segments) together to form larger records for processing. Although
many different methods were tried, only three major methods became popular: the
hierarchic method, the network method, and the newest, the relational method.
Each of these methods reflected the manner in which the vendor constructed and
physically managed data within the file. The systems designer and the programmer had to
understand these methods so that they could retrieve and process the data in the files.
These models depicted the way the record fragments were tied to each other, and thus the
manner in which the chain of pointers had to be followed to retrieve the fragments in the
correct order.
Each vendor introduced a structural model to depict how the data was organized and tied
together. These models also depicted what options were chosen to be implemented by the
development team, data record dependencies, data record occurrence frequencies, and the
sequence in which data records had to be accessed - also called the navigation sequence.
The hierarchic model
The hierarchic model (figure) is used to describe those record structures in which the
various physical records which make up the logical record are tied together in a sequence
which looks like an inverted tree. At the top of the structure is a single record. Beneath
that are one or more records, each of which can occur one or more times. Each of these can
in turn have multiple records beneath them. In diagrammatic form the top-to-bottom set of
records looks like an inverted tree or a pyramid of records. To access the set of records
associated with an identifier, one starts at the top record and follows the pointers from
record to record.
The various records in the lower part of the structure are accessed by first accessing the
records above them and then following the chain of pointers to the records at the next
lower level. The records at any given level are referred to as the parent records and the
records at the next lower level that are connected to it, or dependent on it are referred to as
its children or the child records. There can be any number of records at any level, and each
record can have any number of children. Each occurrence of the structure normally
represents the collection of data about a single subject. This parent-child repetition can be
repeated through several levels.
The data model for this type of structural representation usually depicts each segment or
record fragment only once and uses lines to show the connection between a parent record
and its children. This depiction of record types and lines connecting them looks like an
inverted tree or an organizational hierarchy chart.
Each file is said to consist of a number of repetitions of this tree structure. Although the
data model depicts all possible record types within a structure, in any given occurrence,
record types may or may not be present. Each occurrence of the structure represents a
specific subject occurrence and is identified by a unique identifier in the single, topmost
record type (the root record).
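The shape of one such occurrence can be sketched in code. The following is an illustrative sketch, not taken from the text: record types, field names and data values here are hypothetical. Each record holds pointers (references) to its children, and navigation always starts at the root and follows the chain of pointers downward.

```python
# A hierarchic "occurrence": a tree of records with the unique identifier
# carried by the single, topmost (root) record.
class Record:
    def __init__(self, record_type, data):
        self.record_type = record_type
        self.data = data
        self.children = []  # pointers to records at the next lower level

    def add_child(self, child):
        self.children.append(child)
        return child

# One occurrence: data about a single subject (a hypothetical customer)
root = Record("CUSTOMER", {"id": "C001", "name": "A. Sharma"})
order = root.add_child(Record("ORDER", {"order_no": 17}))
order.add_child(Record("ORDER_LINE", {"item": "widget", "qty": 3}))

# To reach any lower record, start at the root and follow the pointers
def walk(record, level=0):
    yield level, record.record_type
    for child in record.children:
        yield from walk(child, level + 1)

for level, rtype in walk(root):
    print("  " * level + rtype)
```

Note how a record two levels down (ORDER_LINE) can only be reached through its parent; there is no direct entry point below the root.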
Designers employing this type of data management system would have to develop a
unique record hierarchy for each data storage subject. A given application may have
several different hierarchies, each representing data about a different subject, associated
with it and a company may have several dozen different hierarchies of record types as
components of its data model. A characteristic of this type of model is that each hierarchy
is normally treated as separate and distinct from the other hierarchies, and various
hierarchies can be mixed and matched to suit the data needs of the particular application.
The network model
The network data model (figure) has no implicit hierarchic relationship between the
various records, and in many cases no implicit structure at all, with the records seemingly
placed at random. The network model does not make a clear distinction between subjects,
mingling all record types in an overall schematic. The network model may have many
different records containing unique identifiers, each of which acts as an entry point into
the record structure. Record types are grouped into sets of two, one or both of which can in
turn be part of another set of two record types. Within a given set, one record type is said
to be the owner record and one is said to be the member record. Access to a set is always
accomplished by first locating the specific owner record and then following the chain of
pointers to the member records of the set. The network can be traversed or navigated by
moving from set to set. Various different data structures can be constructed by selecting
sets of records and excluding others.
Each record type is depicted only once in this type of data model and the relationship
between record types is indicated by a line between them. The line joining the two records
contains the name of the set. Within a set a record can have only one owner, but multiple
owner-member sets can be constructed using the same two record types.
The network model has no explicit hierarchy and no explicit entry point. Whereas the
hierarchic model has several different hierarchic structures, the network model employs a
single master network or model, which when completed looks like a web of records. As
new data is required, records are added to the network and joined to existing sets.
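The owner-member set mechanism can be sketched as follows. This is an illustrative sketch with hypothetical record and set names; it shows how access to a set is always owner-first, followed by the chain of pointers to the members.

```python
# Network-model "sets": an owner record chained to its member records.
# A record type can be the member of one set and the owner of another.
class NetRecord:
    def __init__(self, rtype, data):
        self.rtype, self.data = rtype, data
        self.sets = {}  # set name -> list of member records

    def connect(self, set_name, member):
        self.sets.setdefault(set_name, []).append(member)
        return member

supplier = NetRecord("SUPPLIER", {"id": "S1"})
part = NetRecord("PART", {"id": "P1"})

# Build the set "SUPPLIES" with SUPPLIER as owner and PART as member
supplier.connect("SUPPLIES", part)

# Access: locate the owner, then follow the chain of member pointers
members = [m.rtype for m in supplier.sets["SUPPLIES"]]
print(members)  # ['PART']
```

Because PART could itself own another set (say, PART to WAREHOUSE), the network is traversed by moving from set to set rather than top-down from a single root.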
The relational model
The relational model (figure), unlike the network or the hierarchic models, did not rely on
pointers to connect records, and chose to view individual records in sets regardless of the
subject occurrence they were associated with. This is in contrast to the other models,
which sought to depict the relationships between record types. In the relational model
records are portrayed as residing in tables, with no physical pointers between these tables.
Each table is thus portrayed independently of every other table. This made the data model
itself a model of simplicity, but it in turn made the visualization of all the records
associated with a particular subject somewhat difficult.
Data records were connected using logic and by using data that was redundantly stored in
each table. Records on a given subject occurrence could be selected from multiple tables
by matching the contents of these redundantly stored data fields.
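This logic-driven matching is exactly what a relational join does. The following sketch uses SQLite; the table and column names are hypothetical, loosely modelled on the railway example later in this text. The two tables share no pointers: the only connection is the redundantly stored train_no field.

```python
# Two independent tables, connected only by matching field contents.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE train (train_no INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE passenger (pnr INTEGER PRIMARY KEY, "
            "name TEXT, train_no INTEGER)")
con.execute("INSERT INTO train VALUES (12951, 'Rajdhani Express')")
con.execute("INSERT INTO passenger VALUES (1001, 'A. Kumar', 12951)")

# Records on one subject occurrence are assembled by logic, not pointers:
# the join matches the redundantly stored train_no in both tables.
row = con.execute(
    "SELECT p.name, t.name FROM passenger p "
    "JOIN train t ON p.train_no = t.train_no"
).fetchone()
print(row)  # ('A. Kumar', 'Rajdhani Express')
```

Deleting the join condition would leave the tables entirely unrelated, which is the source of both the model's simplicity and the difficulty of visualizing all records about one subject.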
The impact of data management systems
The use of these products to manage data introduced a new set of tasks for the data
analysis personnel. In addition to developing record layouts, they also had the new task of
determining how these records should be structured, or arranged and joined by pointer
structures.
Once those decisions were made they had to be conveyed to the members of the
implementation team. The hierarchic and network models were necessary because without
them the occurrence sequences and the record to record relationships designed into the
files could not be adequately portrayed. Although the relational "model" design choices
also needed to be conveyed to the implementation team, the relational model was always
depicted in much the same format as standard record layouts, and any other access or
navigation related information could be conveyed in narrative form.
Data as a corporate Resource
One additional concept was introduced during the period when these new file management
systems were being developed - the concept that data was a corporate resource. The
implications of this concept were that data belonged to the corporation as a whole, and not to
individual user areas. This implied that data should somehow be shared, or used in
common by all members of the firm.
Data sharing required data planning. Data had to be organized, sized and formatted to
facilitate use by all who needed it. This concept of data sharing was diametrically opposed
to the application orientation, where data records and data files were designed for, and
owned by, the application and the users of that application.
This concept also introduced a new set of participants in the data analysis process and a
new set of users of the data models. These new people were business area personnel who
were now drawn into the data analysis process. The data record models which had sufficed
for the data processing personnel no longer conveyed either the right information or
information with the correct perspective to be meaningful for these new participants.
The primary method of data planning is the development of the data model. Much of the
early data planning was accomplished within the context of the schematics used by the
design team to describe the data management file structures.
These models were used as analysis and requirements tools, and as such were moderately
effective. They were limited in one respect: organizations tended to use the
implementation model, which also contained information about pointer use and navigation,
or, in the case of the network models, owner-member set information, access choice
information and other details which were important to the data processing implementation
team but not terribly relevant to the user.
Normalization
Concurrent with the introduction of the relational data model another concept was
introduced - that of normalization. Although it was introduced in the early
nineteen-seventies, its full impact did not begin to be felt until almost a decade later, and
even today its concepts are not well understood. The various record models gave the
designer a way of presenting to the user not only the record layout but also the connections
between the data records, in a sense allowing the designer to show the user what data
could be accessed with what other data. Determination of record content, however, was
not addressed in any methodical manner. Data elements were collected into records in a
somewhat haphazard manner. That is, there was no rationale or predetermined reason why
one data element was placed in the same record as another. Nor was there any need to do
so since the physical pointers between records prevented data on one subject from being
confused with data about another, even at the occurrence level.
The relational model however lacked these pointers and relied on logic to assemble a
complete set of data from its tables. Because it was logic driven (based upon mathematics)
the notion was proposed that placement of data elements in records could also be guided
by a set of rules. If followed, these rules would eliminate many of the design mistakes
which arose from the meaning of data being inadvertently changed due to totally unrelated
changes. It also set forth rules which if followed would arrange the data within the records
and within the files more logically and more consistently.
Previously, data analysts and file and record designers relied on intuition and experience to
construct record layouts. As the design progressed, data was moved from record to record,
records were split and others combined until the final model was pleasing, relatively
efficient and satisfied the processing needs of the application that needed the data that
these models represented. Normalization offered the hope that the process of record
layout, and thus model development could be more procedurally driven, more rule driven
such that relatively inexperienced users could also participate in the process. It was also
hoped that these rules would assist the experienced designer and eliminate some of
the iterations, and thus make the process more efficient.
The first rule of normalization was that data should depend on (or be collected by) its key. That
is, data should be organized by subject, as opposed to previous methods which collected
data by application or system. This notion was obvious to hierarchic model users, whose
models inherently followed this principle, but was somewhat foreign and novel to network
model developers where the aggregation of data about a data subject was not as
commonplace.
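The rule can be made concrete with a small sketch. This example is hypothetical (the field names are loosely borrowed from the railway tables later in this text): an unnormalized record mixes facts about the passenger, keyed by PNR, with facts about the train, keyed by train number, and organizing the data by key splits it into two subject records.

```python
# Unnormalized: one record mixing two subjects. train_name does not
# depend on the record's key (pnr) - it depends on train_no.
unnormalized = {
    "pnr": 1001, "passenger": "A. Kumar",
    "train_no": 12951, "train_name": "Rajdhani Express",
}

# Organized by key: each record carries only data that depends on its key.
passenger = {
    "pnr": unnormalized["pnr"],
    "name": unnormalized["passenger"],
    "train_no": unnormalized["train_no"],  # a reference, not a copy of train facts
}
train = {
    "train_no": unnormalized["train_no"],
    "name": unnormalized["train_name"],
}
```

With this split, renaming the train changes one record; in the unnormalized form the same fact would have to be corrected in every reservation that mentions it.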
This notion of subject organized data led to the development of non-DBMS oriented data
models.
The Entity-Relationship model
While the record data models served many purposes for the system designers, these
models had little meaning or relevance to the user community. Moreover, much of the
information the users needed to evaluate the effectiveness of the design was missing.
Several alternative data model formats were introduced to fill this void. These models
attempted to model data in a different manner. Rather than look at data from a record
perspective, they began to look at the entities or subjects about which data was being
collected and maintained. They also realized that the relationship between these data
subjects was also an area that needed to be modeled and subjected to user scrutiny. These
relationships were important because in many respects they reflected the business rules
under which the firm operated. This modeling of relationships was particularly important
when relational data management systems were being used because the relationship
between the data tables was not explicitly stated, and the design team required some
method for describing those relationships to the user.
As we shall see later on, the Entity-Relationship model has one other important advantage.
In as much as it is non-DBMS specific, and is in fact not a DBMS model at all, data
models can be developed by the design team without first having to make a choice as to
which DBMS to use. In those firms where multiple data management systems are both in
use and available, this is a critical advantage in the design process.
IRCTC TABLES
Layout of the railway reservation form and the connection of this form with the database
required to store the information.
PASSENGERS DATABASE: the database of passengers contains the following fields
1. Name
2. Age
3. Gender
4. Total Number of Passengers Travelling
Number of Adults
Number of Children
Senior Citizens
5. Date of Travel
6. Class of Travel
TRAIN DATABASE: the database of trains contains the following fields
1. Train Name
2. Train Number
3. Route (From - To)
4. Train Time
5. Number of Compartments
AC First Class
AC 2 Tier
AC 3 Tier
Sleeper
General
6. Number of Employees
DESIGN OF TABLES
The passenger database will contain the following fields
PNR NO (Primary key)
NAME
AGE
GENDER
TOTAL PASSENGER
DATE OF TRAVEL
CLASS
TRAIN NO.
The train database will contain the following fields
Train name
Train no. (Primary key)
Route from-to
Departure time
No of compartments
1AC
2AC
3AC
SLEEPER
GENERAL
SLR
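The two table designs above can be expressed as SQL DDL. The following is a minimal sketch run through SQLite from Python; the column types and the compartment-count columns are assumptions, since the text names only the fields and the primary keys (and uses MS Access rather than SQLite).

```python
# Hypothetical DDL for the passenger and train tables described above.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE train (
    train_no             INTEGER PRIMARY KEY,
    train_name           TEXT,
    route_from           TEXT,
    route_to             TEXT,
    departure_time       TEXT,
    compartments_1ac     INTEGER,
    compartments_2ac     INTEGER,
    compartments_3ac     INTEGER,
    compartments_sleeper INTEGER,
    compartments_general INTEGER,
    compartments_slr     INTEGER
);
CREATE TABLE passenger (
    pnr_no           INTEGER PRIMARY KEY,
    name             TEXT,
    age              INTEGER,
    gender           TEXT,
    total_passengers INTEGER,
    date_of_travel   TEXT,
    class            TEXT,
    train_no         INTEGER REFERENCES train(train_no)
);
""")

# Confirm both tables were created
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['passenger', 'train']
```

The train_no column in passenger is the redundantly stored field that, as described in the relational model section, logically connects a reservation to its train without any physical pointer.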
SNAPSHOTS OF TABLES
TABLE FOR PASSENGERS
This is the original snapshot from MS Access. The primary key here is PNR NO.; this
table also contains the name of the passenger, age, gender, total passengers travelling,
date of travel, class and the train no. in which they are travelling.
TABLE FOR TRAINS
This is the original snapshot from MS Access. The primary key here is train no.; this
table also contains the train name, route, departure time from the originating station,
number of compartments in the whole train and the class-wise segmentation of compartments.
SNAPSHOTS OF FORMS
PASSENGER RESERVATION FORM
This form contains the same data labels as the MS Access database, i.e. name of
passenger, age, gender, total passengers travelling, date of travel, class and train no. in
which they are travelling.
FORM FOR TRAINS
This form contains the same data labels as the MS Access database, i.e. train name,
route, departure time from the originating station, number of compartments in the whole
train and class-wise segmentation of compartments.