DATABASE MANAGEMENT SYSTEM
What is a database management system? Explain the three-level architecture with a
diagram.
A database management system (DBMS) is the software that allows a computer to
perform database functions of storing, retrieving, adding, deleting and modifying data.
Relational database management systems (RDBMS) implement the relational model of
tables and relationships. A database management system (DBMS) is a software package
designed to define, manipulate, retrieve and manage data in a database. A DBMS
generally manipulates the data itself, the data format, field names, record structure and file
structure. It also defines rules to validate and manipulate this data. A DBMS relieves users
of framing programs for data maintenance. Fourth-generation query languages, such as
SQL, are used along with the DBMS package to interact with a database.
Three Level Architecture of DBMS
An early proposal for a standard terminology and general architecture for database systems
was produced in 1971 by the DBTG (Data Base Task Group) appointed by the Conference
on Data Systems and Languages (CODASYL). The DBTG recognized the need for a two-level
approach, with a system view called the schema and user views called subschemas. The
American National Standards Institute (ANSI-SPARC) proposed a standard terminology and
architecture in 1975, recognizing the need for a three-level approach with a system catalog.
The design of a database management system depends highly on its architecture, which can
be centralized, decentralized or hierarchical. DBMS architecture can be seen as single-tier
or multi-tier. An n-tier architecture divides the whole system into n related but independent
modules, which can be independently modified, altered or replaced.
In 1-tier architecture, the DBMS is the only entity: the user sits directly on the DBMS and
uses it. Any changes made here are made directly on the DBMS itself. This arrangement does
not provide handy tools for end users, so single-tier architecture is used mainly by database
designers and programmers.
If the architecture of the DBMS is 2-tier, there must be some application which uses the
DBMS. Programmers use 2-tier architecture where they access the DBMS by means of an
application. Here the application tier is entirely independent of the database in terms of
operation, design and programming.
3-tier architecture
The most widely used architecture is 3-tier architecture, which separates its tiers from
each other on the basis of users. It is described as follows:
[3-tier DBMS architecture]
Database (Data) Tier: At this tier, only the database resides. The database, along with
its query processing languages, sits in layer 3 of the 3-tier architecture. It also
contains all relations and their constraints.
Application (Middle) Tier: At this tier reside the application server and the programs
that access the database. For a user, this application tier presents an abstracted view
of the database. Users are unaware of any existence of the database beyond the
application. For the database tier, the application tier is its user, and the database
tier is not aware of any other user beyond it. This tier works as a mediator between the two.
User (Presentation) Tier: The end user sits on this tier. From the user's perspective,
this tier is everything: he or she does not know about any existence or form of the
database beyond this layer. At this layer, multiple views of the database can be provided
by the application. All views are generated by applications which reside in the
application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
There are three levels or layers of DBMS architecture:
1. External Level
2. Conceptual Level
3. Internal Level
1. External Level: - The external level is described by an external schema, i.e. it consists
of the definitions of the logical records and relationships in the external view. It also
contains the method of deriving the objects in the external view from the objects in the
conceptual view.
2. Conceptual Level: - The conceptual level represents the entire database. The conceptual
schema describes the records and relationships included in the conceptual view. It also
contains the method of deriving the objects in the conceptual view from the objects in the
internal view.
3. Internal Level: - The internal level indicates how the data will be stored and describes
the data structures and access methods to be used by the database. It contains the
definition of stored records and the methods of representing the data fields and access
aids used.
A mapping between the external and conceptual views gives the correspondence among the
records and relationships of the conceptual and external views. The external view is an
abstraction of the conceptual view, which in turn is an abstraction of the internal view. It
describes the contents of the database as perceived by the user or application program of
that view.
Explain all DDL & DML commands with syntax and output.
DML vs. DDL
Data Manipulation Language (also known as DML) is a family of computer languages.
They are used by computer programs, and/or database users, to manipulate data in a
database: that is, to insert, delete and update this data in the database.
Data Definition Language (also known as DDL) is a computer language used to define
data structures as its namesake suggests. It first made its appearance in the CODASYL
database model (a model pertaining to the information technology industry consortium,
known as Conference on Data Systems Languages). DDL was used within the schema of
the database in order to describe the records, fields, and sets that made up the user Data
Model. It was at first a way in which programmers defined SQL. Now, however, it is used
generically to refer to any formal language used to describe data or information structures
(for example, XML schemas).
The most popular form of DML is the Structured Query Language (or SQL). This is a
language used for databases, and is designed specifically for managing data in relational
database management systems (or RDBMS). There are also other forms in which DML is
used, for instance IMS/DL/I, CODASYL databases (IDMS, for example), and a few
others. DML comprises the SQL data change statements, meaning that stored data is
modified but the schema or database objects remain the same. The functional capability of
the DML is organised by the initial word in a statement. This word is most generally a
verb giving the statement a specific action to fulfil. There are four specific verbs that
initiate an action: SELECT (including SELECT ... INTO), INSERT, UPDATE, and DELETE.
The DDL is used mainly to create objects: that is, to make a new database, table, index or
stored query. A CREATE statement in SQL literally creates an object inside any RDBMS. As
such, the types of objects able to be created are completely dependent on which RDBMS
is currently in use. Most RDBMS support the table, index, user, synonym and database
creation. In some cases, a system will allow the CREATE command and other DDL
commands inside a specific transaction. This means that these functions are capable of
being rolled back. The most common CREATE command is the CREATE TABLE
command.
DMLs vary considerably; their functions and capabilities differ between database
vendors. There are, however, only two kinds of DML: procedural and declarative. While
there are multiple standards established for SQL, most vendors provide their own
extensions to the standard without implementing it entirely.
Summary:
1. DML is a grouping of computer languages used by computer programs to manipulate
data in a database; DDL is a computer language used specifically to define data structures.
2. The most popular form of DML is SQL, which comprises various data change statements;
DDL mainly uses the CREATE command.
DDL
Data Definition Language (DDL) statements are used to define the database structure or
schema. Some examples:
o CREATE - to create objects in the database
o ALTER - alters the structure of the database
o DROP - delete objects from the database
o TRUNCATE - remove all records from a table, including all space allocated for the records
o COMMENT - add comments to the data dictionary
o RENAME - rename an object
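As an illustration, the DDL statements above can be exercised against SQLite through Python's sqlite3 module. This is only a sketch with illustrative table and index names; SQLite supports CREATE, ALTER and DROP, but not TRUNCATE or COMMENT, so those are omitted:

```python
import sqlite3

# In-memory database for demonstration
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE - define a new table (a schema object)
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER - change the structure of an existing table
cur.execute("ALTER TABLE employee ADD COLUMN salary REAL")

# CREATE INDEX - another kind of schema object
cur.execute("CREATE INDEX idx_name ON employee (name)")

# SQLite's data dictionary (sqlite_master) now describes these objects
rows = cur.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(rows)  # [('table', 'employee'), ('index', 'idx_name')]

# DROP - delete a schema object
cur.execute("DROP INDEX idx_name")
```

Note that DDL manipulates the schema (the catalog entries), not the stored rows themselves.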
DML
Data Manipulation Language (DML) statements are used for managing data within
schema objects. Some examples:
o SELECT - retrieve data from a database
o INSERT - insert data into a table
o UPDATE - update existing data within a table
o DELETE - delete all records from a table; the space for the records remains
o MERGE - UPSERT operation (insert or update)
o CALL - call a PL/SQL or Java subprogram
o EXPLAIN PLAN - explain the access path to data
o LOCK TABLE - control concurrency
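A minimal sketch of the main DML verbs, again using SQLite via Python's sqlite3 module; the emp table and its contents are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# INSERT - add new rows
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "Ada", "IT"), (2, "Boris", "HR")])

# UPDATE - modify existing rows
cur.execute("UPDATE emp SET dept = 'Finance' WHERE name = 'Boris'")

# SELECT - retrieve rows
rows = cur.execute("SELECT name, dept FROM emp ORDER BY id").fetchall()
print(rows)  # [('Ada', 'IT'), ('Boris', 'Finance')]

# DELETE - remove rows (the table itself remains)
cur.execute("DELETE FROM emp WHERE id = 1")
print(cur.execute("SELECT COUNT(*) FROM emp").fetchone()[0])  # 1
```

Unlike DDL, these statements change only the stored data; the schema objects stay the same.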
DCL
Data Control Language (DCL) statements. Some examples:
o GRANT - gives users access privileges to the database
o REVOKE - withdraws access privileges given with the GRANT command
TCL
Transaction Control (TCL) statements are used to manage the changes made by DML
statements. It allows statements to be grouped together into logical transactions.
o COMMIT - save work done
o SAVEPOINT - identify a point in a transaction to which you can later roll back
o ROLLBACK - restore the database to its state at the last COMMIT
o SET TRANSACTION - change transaction options like isolation level and which rollback segment to use
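The transaction control statements can be sketched with SQLite as well. Passing isolation_level=None makes Python's sqlite3 module hand transaction statements through unchanged; the account table and balances are illustrative:

```python
import sqlite3

# isolation_level=None lets us issue transaction statements ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
cur.execute("INSERT INTO account VALUES (1, 100.0)")

cur.execute("BEGIN")
cur.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
cur.execute("SAVEPOINT after_debit")              # a point we can return to
cur.execute("UPDATE account SET balance = 0 WHERE id = 1")
cur.execute("ROLLBACK TO SAVEPOINT after_debit")  # undo only the second update
cur.execute("COMMIT")                             # make the debit permanent

balance = cur.execute("SELECT balance FROM account").fetchone()[0]
print(balance)  # 70.0
```

The ROLLBACK TO SAVEPOINT undoes the zeroing but leaves the earlier debit inside the same transaction, which the final COMMIT then makes permanent.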
Explain various types of data models with the help of diagrams?
In software engineering, the term data model is used in two related senses. In the sense
covered by this article, it is a description of the objects represented by a computer system
together with their properties and relationships; these are typically "real world" objects
such as products, suppliers, customers, and orders. In the second sense, covered by the
article database model, it means a collection of concepts and rules used in defining data
models: for example the relational model uses relations and tuples, while the network
model uses records, sets, and fields.
Overview of the data modeling context: a data model is based on data, data relationships,
data semantics and data constraints. A data model provides the details of the information to be
stored, and is of primary use when the final product is the generation of computer software
code for an application or the preparation of a functional specification to aid a computer
software make-or-buy decision. The figure is an example of the interaction between
process and data models.
Data models are often used as an aid to communication between the business people
defining the requirements for a computer system and the technical people defining the
design in response to those requirements. They are used to show the data needed and
created by business processes.
According to Hoberman (2009), "A data model is a wayfinding tool for both business and
IT professionals, which uses a set of symbols and text to precisely explain a subset of real
information to improve communication within the organization and thereby lead to a more
flexible and stable application environment."
A data model explicitly determines the structure of data. Data models are specified in a
data modeling notation, which is often graphical in form.
A data model can be sometimes referred to as a data structure, especially in the context of
programming languages. Data models are often complemented by function models,
especially in the context of enterprise models.
Relationships and functions
A given database management system may provide one or more of these models. The
optimal structure depends on the natural organization of the application's data, and on the
application's requirements, which include transaction rate (speed), reliability,
maintainability, scalability, and cost. Most database management systems are built around
one particular data model, although it is possible for products to offer support for more
than one model.
Various physical data models can implement any given logical model. Most database
software will offer the user some level of control in tuning the physical implementation,
since the choices that are made have a significant effect on performance.
A model is not just a way of structuring data: it also defines a set of operations that can be
performed on the data. The relational model, for example, defines operations such as
select (project) and join. Although these operations may not be explicit in a particular
query language, they provide the foundation on which a query language is built.
Flat model
Flat File Model.
The flat (or table) model consists of a single, two-dimensional array of data elements,
where all members of a given column are assumed to be similar values, and all members
of a row are assumed to be related to one another. For instance, there might be columns
for name and password used as part of a system security database, where each row holds
the password associated with an individual user. Columns of the table often have a
type associated with them, defining them as character data, date or time information,
integers, or floating point numbers. This tabular format is a precursor to the relational
model.
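A flat model can be sketched as a plain two-dimensional table, for example a CSV file read with Python's csv module; the column names follow the name/password example above:

```python
import csv
import io

# A flat (table) model: a single 2-D array; each column holds similar
# values, and each row describes one user (illustrative security table)
flat_file = io.StringIO("name,password\nalice,s3cret\nbob,hunter2\n")
rows = list(csv.DictReader(flat_file))

print(rows[0]["name"], rows[0]["password"])  # alice s3cret
```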
Early data models
These models were popular in the 1960s and 1970s, but nowadays can be found primarily in
old legacy systems. They are characterized primarily by being navigational with strong
connections between their logical and physical representations, and deficiencies in data
independence.
Hierarchical model
In a hierarchical model, data is organized into a tree-like structure, implying a single
parent for each record. A sort field keeps sibling records in a particular order. Hierarchical
structures were widely used in the early mainframe database management systems, such as
the Information Management System (IMS) by IBM, and now describe the structure of
XML documents. This structure allows one one-to-many relationship between two types of
data, and is well suited to describing many real-world relationships: recipes, tables of
contents, ordering of paragraphs/verses, and any nested and sorted information.
This hierarchy is used as the physical order of records in storage. Record access is done by
navigating through the data structure using pointers combined with sequential accessing.
Because of this, the hierarchical structure is inefficient for certain database operations
when a full path (as opposed to upward link and sort field) is not also included for each
record. Such limitations have been compensated for in later IMS versions by additional
logical hierarchies imposed on the base physical hierarchy.
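A minimal sketch of hierarchical access in Python, assuming an illustrative table-of-contents tree: each node has a single parent, and records are reached by navigating pointers from the root, as in IMS-style access:

```python
# A hierarchical record: one parent per node; children are reached
# by following pointers downward from the root (names are illustrative)
book = {
    "title": "Cookbook",
    "children": [                       # one-to-many: book -> chapters
        {"title": "Starters",
         "children": [{"title": "Soup", "children": []}]},
        {"title": "Mains", "children": []},
    ],
}

def find(node, title):
    """Depth-first navigation from the root down the tree."""
    if node["title"] == title:
        return node
    for child in node["children"]:
        hit = find(child, title)
        if hit:
            return hit
    return None

print(find(book, "Soup")["title"])  # Soup
```

Note how reaching "Soup" requires traversing its full ancestor path, which is exactly the inefficiency the text describes when no full path is stored per record.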
Network model
The network model expands upon the hierarchical structure, allowing many-to-many
relationships in a tree-like structure that allows multiple parents. It was the most popular
before being replaced by the relational model, and is defined by the CODASYL
specification.
The network model organizes data using two fundamental concepts, called records and
sets. Records contain fields (which may be organized hierarchically, as in the
programming language COBOL). Sets (not to be confused with mathematical sets) define
one-to-many relationships between records: one owner, many members. A record may be
an owner in any number of sets, and a member in any number of sets.
A set consists of circular linked lists where one record type, the set owner or parent,
appears once in each circle, and a second record type, the subordinate or child, may appear
multiple times in each circle. In this way a hierarchy may be established between any two
record types, e.g., type A is the owner of B. At the same time another set may be defined
where B is the owner of A. Thus all the sets comprise a general directed graph (ownership
defines a direction), or network construct. Access to records is either sequential (usually in
each record type) or by navigation in the circular linked lists.
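The records-and-sets idea can be sketched in Python. The record and set names below are illustrative, and plain lists stand in for the circular linked lists an actual CODASYL system would use:

```python
# CODASYL-style records and sets: an owner record links to member
# records; one record may be a member of several different sets
class Record:
    def __init__(self, name):
        self.name = name
        self.sets = {}          # set name -> list of member records

    def connect(self, set_name, member):
        self.sets.setdefault(set_name, []).append(member)

dept = Record("Sales")
proj = Record("Launch")
emp = Record("Ada")

dept.connect("works_in", emp)   # emp is a member of the dept-owned set
proj.connect("assigned", emp)   # ...and also of a project-owned set

# Navigational access: follow a set from its owner to its members
print([m.name for m in dept.sets["works_in"]])  # ['Ada']
```

Because "Ada" participates in two sets with different owners, the structure forms a general directed graph rather than a tree.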
The network model is able to represent redundancy in data more efficiently than in the
hierarchical model, and there can be more than one path from an ancestor node to a
descendant. The operations of the network model are navigational in style: a program
maintains a current position, and navigates from one record to another by following the
relationships in which the record participates. Records can also be located by supplying
key values.
Although it is not an essential feature of the model, network databases generally
implement the set relationships by means of pointers that directly address the location of a
record on disk. This gives excellent retrieval performance, at the expense of operations
such as database loading and reorganization.
Popular DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS.
IDMS gained a considerable customer base; in the 1980s, it adopted the relational model
and SQL in addition to its original tools and languages.
Most object databases (invented in the 1990s) use the navigational concept to provide fast
navigation across networks of objects, generally using object identifiers as "smart"
pointers to related objects. Objectivity/DB, for instance, implements named one-to-one,
one-to-many, many-to-one, and many-to-many relationships that can cross
databases. Many object databases also support SQL, combining the strengths of both
models.
Inverted file model
In an inverted file or inverted index, the contents of the data are used as keys in a lookup
table, and the values in the table are pointers to the location of each instance of a given
content item. This is also the logical structure of contemporary database indexes, which
might use only the contents from particular columns in the lookup table. The inverted
file data model can put indexes in a second set of files next to existing flat database files,
in order to efficiently directly access needed records in these files.
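A minimal inverted-index sketch in Python: the words occurring in each record become the lookup keys, and the values are the record numbers where each word occurs (the records themselves are illustrative):

```python
# An inverted file: content values become keys; the lookup table maps
# each value to the locations (record numbers) where it appears
records = ["red apple", "green apple", "red cherry"]

index = {}
for rec_no, rec in enumerate(records):
    for word in rec.split():
        index.setdefault(word, []).append(rec_no)

print(index["red"])    # [0, 2]
print(index["apple"])  # [0, 1]
```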
Notable for using this data model is the ADABAS DBMS of Software AG, introduced in
1970. ADABAS gained a considerable customer base and is still in use and supported
today. In the 1980s it adopted the relational model and SQL in addition to its original
tools and languages.
Relational model
The relational model was introduced by E.F. Codd in 1970 as a way to make database
management systems more independent of any particular application. It is a mathematical
model defined in terms of predicate logic and set theory, and systems implementing it
have been used by mainframe, midrange and microcomputer systems.
The products that are generally referred to as relational databases in fact implement a
model that is only an approximation to the mathematical model defined by Codd. Three
key terms are used extensively in relational database models: relations, attributes, and
domains. A relation is a table with columns and rows. The named columns of the relation
are called attributes, and the domain is the set of values the attributes are allowed to take.
The basic data structure of the relational model is the table, where information about a
particular entity (say, an employee) is represented in rows (also called tuples) and
columns. Thus, the "relation" in "relational database" refers to the various tables in the
database; a relation is a set of tuples. The columns enumerate the various attributes of the
entity (the employee's name, address or phone number, for example), and a row is an
actual instance of the entity (a specific employee) that is represented by the relation. As a
result, each tuple of the employee table represents various attributes of a single employee.
All relations (and, thus, tables) in a relational database have to adhere to some basic rules
to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there
can't be identical tuples or rows in a table. And third, each tuple will contain a single value
for each of its attributes.
A relational database contains multiple tables, each similar to the one in the "flat" database
model. One of the strengths of the relational model is that, in principle, any value
occurring in two different records (belonging to the same table or to different tables),
implies a relationship among those two records. Yet, in order to enforce explicit integrity
constraints, relationships between records in tables can also be defined explicitly, by
identifying or non-identifying parent-child relationships characterized by assigning
cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set
of attributes that can act as a "key", which can be used to uniquely identify each tuple in
the table.
A key that can be used to uniquely identify a row in a table is called a primary key. Keys
are commonly used to join or combine data from two or more tables. For example, an
Employee table may contain a column named Location which contains a value that
matches the key of a Location table. Keys are also critical in the creation of indexes,
which facilitate fast retrieval of data from large tables. Any column can be a key, or
multiple columns can be grouped together into a compound key. It is not necessary to
define all the keys in advance; a column can be used as a key even if it was not originally
intended to be one.
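The Employee/Location key example can be sketched in SQLite via Python's sqlite3 module; the table contents are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE location (loc_id INTEGER PRIMARY KEY, city TEXT)")
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
            "loc_id INTEGER REFERENCES location(loc_id))")
cur.executemany("INSERT INTO location VALUES (?, ?)",
                [(1, "Oslo"), (2, "Pune")])
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(10, "Ada", 1), (11, "Raj", 2)])

# Join the two tables on the shared key column
rows = cur.execute(
    "SELECT e.name, l.city FROM employee e "
    "JOIN location l ON e.loc_id = l.loc_id ORDER BY e.emp_id").fetchall()
print(rows)  # [('Ada', 'Oslo'), ('Raj', 'Pune')]
```

Here loc_id in employee is a surrogate key referencing the primary key of location, which is exactly the join mechanism the paragraph describes.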
A key that has an external, real-world meaning (such as a person's name, a book's ISBN,
or a car's serial number) is sometimes called a "natural" key. If no natural key is suitable
(think of the many people named Brown), an arbitrary or surrogate key can be assigned
(such as by giving employees ID numbers). In practice, most databases have both
generated and natural keys, because generated keys can be used internally to create links
between rows that cannot break, while natural keys can be used, less reliably, for searches
and for integration with other databases. (For example, records in two independently
developed databases could be matched up by social security number, except when the
social security numbers are incorrect, missing, or have changed.)
The most common query language used with the relational model is the Structured Query
Language (SQL).
Dimensional model
The dimensional model is a specialized adaptation of the relational model used to
represent data in data warehouses in a way that data can be easily summarized using
online analytical processing, or OLAP queries. In the dimensional model, a database
schema consists of a single large table of facts that are described using dimensions and
measures. A dimension provides the context of a fact (such as who participated, when and
where it happened, and its type) and is used in queries to group related facts together.
Dimensions tend to be discrete and are often hierarchical; for example, the location might
include the building, state, and country. A measure is a quantity describing the fact, such
as revenue. It is important that measures can be meaningfully aggregated; for example,
the revenue from different locations can be added together.
In an OLAP query, dimensions are chosen and the facts are grouped and aggregated
together to create a summary.
The dimensional model is often implemented on top of the relational model using a star
schema, consisting of one highly normalized table containing the facts, and surrounding
denormalized tables containing each dimension. An alternative physical implementation,
called a snowflake schema, normalizes multi-level hierarchies within a dimension into
multiple tables.
A data warehouse can contain multiple dimensional schemas that share dimension tables,
allowing them to be used together. Coming up with a standard set of dimensions is an
important part of dimensional modeling.
Its high performance has made the dimensional model the most popular database structure
for OLAP.
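A toy star schema in SQLite via Python's sqlite3 module: one fact table holding a revenue measure, one location dimension, and an OLAP-style grouping query (all names and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# One fact table surrounded by a dimension table (star schema)
cur.execute("CREATE TABLE dim_location (loc_id INTEGER PRIMARY KEY, "
            "state TEXT, country TEXT)")
cur.execute("CREATE TABLE fact_sales (loc_id INTEGER, revenue REAL)")
cur.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                [(1, "CA", "US"), (2, "NY", "US")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 100.0), (1, 50.0), (2, 25.0)])

# OLAP-style query: group facts by a dimension, aggregate the measure
rows = cur.execute(
    "SELECT d.state, SUM(f.revenue) FROM fact_sales f "
    "JOIN dim_location d ON f.loc_id = d.loc_id "
    "GROUP BY d.state ORDER BY d.state").fetchall()
print(rows)  # [('CA', 150.0), ('NY', 25.0)]
```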
Post-relational database models
Products offering a more general data model than the relational model are sometimes
classified as post-relational. Alternate terms include "hybrid database", "Object-enhanced
RDBMS" and others. The data model in such products incorporates relations but is not
constrained by E.F. Codd's Information Principle, which requires that "all information
in the database must be cast explicitly in terms of values in relations and in no other
way".
Some of these extensions to the relational model integrate concepts from technologies that
pre-date the relational model. For example, they allow representation of a directed graph
with trees on the nodes. The German company sones implements this concept in its
GraphDB.
Some post-relational products extend relational systems with non-relational features.
Others arrived in much the same place by adding relational features to pre-relational
systems. Paradoxically, this allows products that are historically pre-relational, such as
PICK and MUMPS, to make a plausible claim to be post-relational.
The resource space model (RSM) is a non-relational data model based on multi-
dimensional classification.
Graph model
Graph databases allow even more general structure than a network database; any node
may be connected to any other node.
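A graph model can be sketched with a plain adjacency list. The node names are illustrative, and the traversal shows that any node may link to any other node, cycles included:

```python
# A graph model: any node may be connected to any other node
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Carol"],
    "Carol": ["Alice"],   # cycles are allowed, unlike a hierarchy
}

def reachable(graph, start):
    """All nodes reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

print(sorted(reachable(graph, "Bob")))  # ['Alice', 'Bob', 'Carol']
```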
Multivalue model
Multivalue databases hold "lumpy" data: they can store data exactly the same way as
relational databases, but they also permit a level of depth which the relational model can
only approximate using sub-tables. This is nearly identical to the way XML expresses
data, where a given field/attribute can have multiple right answers at the same time.
Multivalue can be thought of as a compressed form of XML.
An example is an invoice, which in either multivalue or relational data could be seen as
(A) Invoice Header Table - one entry per invoice, and (B) Invoice Detail Table - one entry
per line item. In the multivalue model, we have the option of storing the data as one table,
with an embedded table to represent the detail: (A) Invoice Table - one entry per invoice,
no other tables needed.
The advantage is that the atomicity of the Invoice (conceptual) and the Invoice (data
representation) are one-to-one. This also results in fewer reads, less referential integrity
issues, and a dramatic decrease in the hardware needed to support a given transaction
volume.
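The one-table invoice can be sketched as a nested Python structure, with the detail lines embedded directly in the header record (field names and values are illustrative):

```python
# Multivalue-style storage: the invoice header embeds its detail lines
# in one record, instead of using a separate Invoice Detail table
invoice = {
    "invoice_no": 1001,
    "customer": "Acme",
    "lines": [                      # embedded table of line items
        {"item": "Widget", "qty": 2, "price": 5.0},
        {"item": "Gadget", "qty": 1, "price": 12.5},
    ],
}

# One read retrieves the whole conceptual invoice
total = sum(line["qty"] * line["price"] for line in invoice["lines"])
print(total)  # 22.5
```

Since header and lines live in one record, the conceptual invoice and its stored representation are one-to-one, as the text notes.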
Object-oriented database models
In the 1990s, the object-oriented programming paradigm was applied to database
technology, creating a new database model known as object databases. This aims to avoid
the object-relational impedance mismatch - the overhead of converting information
between its representation in the database (for example as rows in tables) and its
representation in the application program (typically as objects). Even further, the type
system used in a particular application can be defined directly in the database, allowing the
database to enforce the same data integrity invariants. Object databases also introduce the
key ideas of object programming, such as encapsulation and polymorphism, into the world
of databases.
A variety of ways have been tried for storing objects in a database. Some products
have approached the problem from the application programming end, by making the
objects manipulated by the program persistent. This typically requires the addition of some
kind of query language, since conventional programming languages do not have the ability
to find objects based on their information content. Others have attacked the problem from
the database end, by defining an object-oriented data model for the database, and defining
a database programming language that allows full programming capabilities as well as
traditional query facilities.
Object databases suffered because of a lack of standardization: although standards were
defined by ODMG, they were never implemented well enough to ensure interoperability
between products. Nevertheless, object databases have been used successfully in many
applications: usually specialized applications such as engineering databases or molecular
biology databases rather than mainstream commercial data processing. However, object
database ideas were picked up by the relational vendors and influenced extensions made to
these products and indeed to the SQL language.
An alternative to translating between objects and relational databases is to use an object-
relational mapping (ORM) library.
Explain all types of DBMS. List the systems and commands along with output.
A DBMS always provides data independence. Any change in storage mechanism and
formats are performed without modifying the entire application. There are four main types
of database organization:
Relational Database: Data is organized as logically independent tables.
Relationships among tables are shown through shared data. The data in one table
may reference similar data in other tables, which maintains the integrity of the
links among them. This feature is referred to as referential integrity - an important
concept in a relational database system. Operations such as "select" and "join" can
be performed on these tables. This is the most widely used system of database
organization.
Flat Database: Data is organized in a single kind of record with a fixed number of
fields. This database type encounters more errors due to the repetitive nature of
data.
Object Oriented Database: Data is organized with similarity to object oriented
programming concepts. An object consists of data and methods, while classes
group objects having similar data and methods.
Hierarchical Database: Data is organized with hierarchical relationships. It
becomes a complex network if the one-to-many relationship is violated.
Data management models
The data management systems (also called data base management systems) introduced
several new ways of organizing data. That is, they introduced several new ways of linking
record fragments (or segments) together to form larger records for processing. Although
many different methods were tried, only three major methods became popular: the
hierarchic method, the network method, and the newest, the relational method.
Each of these methods reflected the manner in which the vendor constructed and
physically managed data within the file. The systems designer and the programmer had to
understand these methods so that they could retrieve and process the data in the files.
These models depicted the way the record fragments were tied to each other, and thus the
manner in which the chain of pointers had to be followed to retrieve the fragments in the
correct order.
Each vendor introduced a structural model to depict how the data was organized and tied
together. These models also depicted what options were chosen to be implemented by the
development team, data record dependencies, data record occurrence frequencies, and the
sequence in which data records had to be accessed - also called the navigation sequence.
The hierarchic model
The hierarchic model (figure) is used to describe those record structures in which the
various physical records which make up the logical record are tied together in a sequence
which looks like an inverted tree. At the top of the structure is a single record. Beneath
that are one or more records, each of which can occur one or more times. Each of these can
in turn have multiple records beneath them. In diagrammatic form the top-to-bottom set of
records looks like an inverted tree or a pyramid of records. To access the set of records
associated with an identifier, one starts at the top record and follows the pointers from
record to record.
The various records in the lower part of the structure are accessed by first accessing the
records above them and then following the chain of pointers to the records at the next
lower level. The records at any given level are referred to as the parent records and the
records at the next lower level that are connected to it, or dependent on it are referred to as
its children or the child records. There can be any number of records at any level, and each
record can have any number of children. Each occurrence of the structure normally
represents the collection of data about a single subject. This parent-child repetition can be
repeated through several levels.
The data model for this type of structural representation usually depicts each segment or
record fragment only once and uses lines to show the connection between a parent record
and its children. This depiction of record types and lines connecting them looks like an
inverted tree or an organizational hierarchy chart.
Each file is said to consist of a number of repetitions of this tree structure. Although the
data model depicts all possible record types within a structure, in any given occurrence,
record types may or may not be present. Each occurrence of the structure represents a
specific subject occurrence and is identified by a unique identifier in the single, topmost
record type (the root record).
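The shape of one such occurrence can be sketched in code. The following is an illustrative sketch, not taken from the text: record types, field names and data values here are hypothetical. Each record holds pointers (references) to its children, and navigation always starts at the root and follows the chain of pointers downward.

```python
# A hierarchic "occurrence": a tree of records with the unique identifier
# carried by the single, topmost (root) record.
class Record:
    def __init__(self, record_type, data):
        self.record_type = record_type
        self.data = data
        self.children = []  # pointers to records at the next lower level

    def add_child(self, child):
        self.children.append(child)
        return child

# One occurrence: data about a single subject (a hypothetical customer)
root = Record("CUSTOMER", {"id": "C001", "name": "A. Sharma"})
order = root.add_child(Record("ORDER", {"order_no": 17}))
order.add_child(Record("ORDER_LINE", {"item": "widget", "qty": 3}))

# To reach any lower record, start at the root and follow the pointers
def walk(record, level=0):
    yield level, record.record_type
    for child in record.children:
        yield from walk(child, level + 1)

for level, rtype in walk(root):
    print("  " * level + rtype)
```

Note how a record two levels down (ORDER_LINE) can only be reached through its parent; there is no direct entry point below the root.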
Designers employing this type of data management system would have to develop a
unique record hierarchy for each data storage subject. A given application may have
several different hierarchies, each representing data about a different subject, associated
with it and a company may have several dozen different hierarchies of record types as
components of its data model. A characteristic of this type of model is that each hierarchy
is normally treated as separate and distinct from the other hierarchies, and various
hierarchies can be mixed and matched to suit the data needs of the particular application.
The network model
The network data model (figure) has no implicit hierarchic relationship between the
various records, and in many cases no implicit structure at all, with the records seemingly
placed at random. The network model does not make a clear distinction between subjects,
mingling all record types in an overall schematic. The network model may have many
different records containing unique identifiers, each of which acts as an entry point into
the record structure. Record types are grouped into sets of two, one or both of which can in
turn be part of another set of two record types. Within a given set, one record type is said
to be the owner record and one is said to be the member record. Access to a set is always
accomplished by first locating the specific owner record and then following the chain of
pointers to the member records of the set. The network can be traversed or navigated by
moving from set to set. Various different data structures can be constructed by selecting
sets of records and excluding others.
Each record type is depicted only once in this type of data model and the relationship
between record types is indicated by a line between them. The line joining the two records
contains the name of the set. Within a set a record can have only one owner, but multiple
owner-member sets can be constructed using the same two record types.
The network model has no explicit hierarchy and no explicit entry point. Whereas the
hierarchic model has several different hierarchic structures, the network model employs a
single master network or model, which when completed looks like a web of records. As
new data is required, records are added to the network and joined to existing sets.
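The owner-member set mechanism can be sketched as follows. This is an illustrative sketch with hypothetical record and set names; it shows how access to a set is always owner-first, followed by the chain of pointers to the members.

```python
# Network-model "sets": an owner record chained to its member records.
# A record type can be the member of one set and the owner of another.
class NetRecord:
    def __init__(self, rtype, data):
        self.rtype, self.data = rtype, data
        self.sets = {}  # set name -> list of member records

    def connect(self, set_name, member):
        self.sets.setdefault(set_name, []).append(member)
        return member

supplier = NetRecord("SUPPLIER", {"id": "S1"})
part = NetRecord("PART", {"id": "P1"})

# Build the set "SUPPLIES" with SUPPLIER as owner and PART as member
supplier.connect("SUPPLIES", part)

# Access: locate the owner, then follow the chain of member pointers
members = [m.rtype for m in supplier.sets["SUPPLIES"]]
print(members)  # ['PART']
```

Because PART could itself own another set (say, PART to WAREHOUSE), the network is traversed by moving from set to set rather than top-down from a single root.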
The relational model
The relational model (figure), unlike the network or the hierarchic models, did not rely on
pointers to connect records, and chose to view individual records in sets regardless of the
subject occurrence they were associated with. This is in contrast to the other models,
which sought to depict the relationships between record types. In the relational model
records are portrayed as residing in tables, with no physical pointers between these tables.
Each table is thus portrayed independently of every other table. This made the data model
itself a model of simplicity, but it in turn made the visualization of all the records
associated with a particular subject somewhat difficult.
Data records were connected using logic and by using data that was redundantly stored in
each table. Records on a given subject occurrence could be selected from multiple tables
by matching the contents of these redundantly stored data fields.
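This logic-driven matching is exactly what a relational join does. The following sketch uses SQLite; the table and column names are hypothetical, loosely modelled on the railway example later in this text. The two tables share no pointers: the only connection is the redundantly stored train_no field.

```python
# Two independent tables, connected only by matching field contents.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE train (train_no INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE passenger (pnr INTEGER PRIMARY KEY, "
            "name TEXT, train_no INTEGER)")
con.execute("INSERT INTO train VALUES (12951, 'Rajdhani Express')")
con.execute("INSERT INTO passenger VALUES (1001, 'A. Kumar', 12951)")

# Records on one subject occurrence are assembled by logic, not pointers:
# the join matches the redundantly stored train_no in both tables.
row = con.execute(
    "SELECT p.name, t.name FROM passenger p "
    "JOIN train t ON p.train_no = t.train_no"
).fetchone()
print(row)  # ('A. Kumar', 'Rajdhani Express')
```

Deleting the join condition would leave the tables entirely unrelated, which is the source of both the model's simplicity and the difficulty of visualizing all records about one subject.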
The impact of data management systems
The use of these products to manage data introduced a new set of tasks for the data
analysis personnel. In addition to developing record layouts, they also had the new task of
determining how these records should be structured, or arranged and joined by pointer
structures.
Once those decisions were made they had to be conveyed to the members of the
implementation team. The hierarchic and network models were necessary because without
them the occurrence sequences and the record to record relationships designed into the
files could not be adequately portrayed. Although the relational "model" design choices
also needed to be conveyed to the implementation team, the relational model was always
depicted in much the same format as standard record layouts, and any other access or
navigation related information could be conveyed in narrative form.
Data as a corporate Resource
One additional concept was introduced during the period when these new file management
systems were being developed - the concept that data was a corporate resource. The
implications of this concept were that data belonged to the corporation as a whole, and not to
individual user areas. This implied that data should somehow be shared, or used in
common by all members of the firm.
Data sharing required data planning. Data had to be organized, sized and formatted to
facilitate use by all who needed it. This concept of data sharing was diametrically opposed
to the application orientation, where data records and data files were designed for, and
owned by, the application and the users of that application.
This concept also introduced a new set of participants in the data analysis process and a
new set of users of the data models. These new people were business area personnel who
were now drawn into the data analysis process. The data record models which had sufficed
for the data processing personnel no longer conveyed either the right information or
information with the correct perspective to be meaningful for these new participants.
The primary method of data planning is the development of the data model. Much of the
early data planning was accomplished within the context of the schematics used by the
design team to describe the data management file structures.
These models were used as analysis and requirements tools, and as such were moderately
effective. They were limited in one respect: organizations tended to use the
implementation model, which also contained information about pointer use and navigation,
or, in the case of the network models, owner-member set information, access choice
information and other details which were important to the data processing implementation
team but not terribly relevant to the user.
Normalization
Concurrent with the introduction of the relational data model another concept was
introduced - that of normalization. Although it was introduced in the early
nineteen-seventies, its full impact did not begin to be felt until almost a decade later, and
even today its concepts are not well understood. The various record models gave the
designer a way of presenting to the user not only the record layout but also the connections
between the data records, in a sense allowing the designer to show the user what data
could be accessed with what other data. Determination of record content, however, was
not addressed in any methodical manner. Data elements were collected into records in a
somewhat haphazard manner. That is, there was no rationale or predetermined reason why
one data element was placed in the same record as another. Nor was there any need to do
so since the physical pointers between records prevented data on one subject from being
confused with data about another, even at the occurrence level.
The relational model however lacked these pointers and relied on logic to assemble a
complete set of data from its tables. Because it was logic driven (based upon mathematics)
the notion was proposed that placement of data elements in records could also be guided
by a set of rules. If followed, these rules would eliminate many of the design mistakes
which arose from the meaning of data being inadvertently changed due to totally unrelated
changes. It also set forth rules which if followed would arrange the data within the records
and within the files more logically and more consistently.
Previously, data analysts and file and record designers relied on intuition and experience to
construct record layouts. As the design progressed, data was moved from record to record,
records were split and others combined until the final model was pleasing, relatively
efficient and satisfied the processing needs of the application that needed the data that
these models represented. Normalization offered the hope that the process of record
layout, and thus model development could be more procedurally driven, more rule driven
such that relatively inexperienced users could also participate in the process. It was also
hoped that these rules would assist the experienced designer and eliminate some of
the iterations, and thus make the process more efficient.
The first rule of normalization was that data should depend on (or be collected by) its key. That
is, data should be organized by subject, as opposed to previous methods which collected
data by application or system. This notion was obvious to hierarchic model users, whose
models inherently followed this principle, but was somewhat foreign and novel to network
model developers where the aggregation of data about a data subject was not as
commonplace.
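The rule can be made concrete with a small sketch. This example is hypothetical (the field names are loosely borrowed from the railway tables later in this text): an unnormalized record mixes facts about the passenger, keyed by PNR, with facts about the train, keyed by train number, and organizing the data by key splits it into two subject records.

```python
# Unnormalized: one record mixing two subjects. train_name does not
# depend on the record's key (pnr) - it depends on train_no.
unnormalized = {
    "pnr": 1001, "passenger": "A. Kumar",
    "train_no": 12951, "train_name": "Rajdhani Express",
}

# Organized by key: each record carries only data that depends on its key.
passenger = {
    "pnr": unnormalized["pnr"],
    "name": unnormalized["passenger"],
    "train_no": unnormalized["train_no"],  # a reference, not a copy of train facts
}
train = {
    "train_no": unnormalized["train_no"],
    "name": unnormalized["train_name"],
}
```

With this split, renaming the train changes one record; in the unnormalized form the same fact would have to be corrected in every reservation that mentions it.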
This notion of subject organized data led to the development of non-DBMS oriented data
models.
The Entity-Relationship model
While the record data models served many purposes for the system designers, these
models had little meaning or relevance to the user community. Moreover, much of the
information the users needed to evaluate the effectiveness of the design was missing.
Several alternative data model formats were introduced to fill this void. These models
attempted to model data in a different manner. Rather than look at data from a record
perspective, they began to look at the entities or subjects about which data was being
collected and maintained. They also realized that the relationship between these data
subjects was also an area that needed to be modeled and subjected to user scrutiny. These
relationships were important because in many respects they reflected the business rules
under which the firm operated. This modeling of relationships was particularly important
when relational data management systems were being used because the relationship
between the data tables was not explicitly stated, and the design team required some
method for describing those relationships to the user.
As we shall see later on, the Entity-Relationship model has one other important advantage.
In as much as it is non-DBMS specific, and is in fact not a DBMS model at all, data
models can be developed by the design team without first having to make a choice as to
which DBMS to use. In those firms where multiple data management systems are both in
use and available, this is a critical advantage in the design process.
IRCTC TABLES
Layout of the railway reservation form and the connection of this form with the database
required to store the information.
PASSENGERS DATABASE: the database of passengers contains the following fields
1. Name
2. Age
3. Gender
4. Total Number of Passengers Travelling
Number of Adults
Number of Children
Senior Citizens
5. Date of Travel
6. Class of Travel
TRAIN DATABASE: the database of trains contains the following fields
1. Train Name
2. Train Number
3. Route (From - To)
4. Train Time
5. Number of Compartments
AC First Class
AC 2 Tier
AC 3 Tier
Sleeper
General
6. Number of Employees
DESIGN OF TABLES
The passenger database will contain the following fields
PNR NO (Primary key)
NAME
AGE
GENDER
TOTAL PASSENGER
DATE OF TRAVEL
CLASS
TRAIN NO.
The train database will contain the following fields
Train name
Train no. (Primary key)
Route from-to
Departure time
No of compartments
1AC
2AC
3AC
SLEEPER
GENERAL
SLR
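The two table designs above can be expressed as SQL DDL. The following is a minimal sketch run through SQLite from Python; the column types and the compartment-count columns are assumptions, since the text names only the fields and the primary keys (and uses MS Access rather than SQLite).

```python
# Hypothetical DDL for the passenger and train tables described above.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE train (
    train_no             INTEGER PRIMARY KEY,
    train_name           TEXT,
    route_from           TEXT,
    route_to             TEXT,
    departure_time       TEXT,
    compartments_1ac     INTEGER,
    compartments_2ac     INTEGER,
    compartments_3ac     INTEGER,
    compartments_sleeper INTEGER,
    compartments_general INTEGER,
    compartments_slr     INTEGER
);
CREATE TABLE passenger (
    pnr_no           INTEGER PRIMARY KEY,
    name             TEXT,
    age              INTEGER,
    gender           TEXT,
    total_passengers INTEGER,
    date_of_travel   TEXT,
    class            TEXT,
    train_no         INTEGER REFERENCES train(train_no)
);
""")

# Confirm both tables were created
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['passenger', 'train']
```

The train_no column in passenger is the redundantly stored field that, as described in the relational model section, logically connects a reservation to its train without any physical pointer.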
SNAPSHOTS OF TABLES
TABLE FOR PASSENGERS
This is the original snapshot from MS Access. The primary key here is PNR NO.; this
table also contains the name of the passenger, age, gender, total passengers travelling,
date of travel, class and the train no. in which they are travelling.
TABLE FOR TRAINS
This is the original snapshot from MS Access. The primary key here is train no.; this
table also contains the train name, route, departure time from the originating station,
number of compartments in the whole train and the class-wise segmentation of compartments.
SNAPSHOTS OF FORMS
PASSENGER RESERVATION FORM
This form contains the same data labels as the MS Access database, i.e. name of
passenger, age, gender, total passengers travelling, date of travel, class and train no. in
which they are travelling.
FORM FOR TRAINS
This form contains the same data labels as the MS Access database, i.e. train name,
route, departure time from the originating station, number of compartments in the whole
train and class-wise segmentation of compartments.