Database System Second Year
Bahaa Dhiaa
Control & Systems Eng.
Computer Eng. Branch
Lec. 1
Introduction
What is Database?
A database (DB) is a collection of information organized in such a way that a computer program can quickly
select desired pieces of data. You can think of a database as an electronic filing system.
Traditional databases are organized by fields, records, and files. A field is a single piece of
information, a record is one complete set of fields, and a file is a collection of records. For
example, a telephone book is analogous to a file. It contains a list of records, each of which
consists of three fields: name, address, and telephone number.
To access information from a database, you need a database management system (DBMS).
This is a collection of programs that enables you to enter, organize, and select data in a
database. The major components of a database system are:
1. Data
2. Hardware
3. Software
4. Users
Data:
The term data means groups of information that represent the qualitative or quantitative
attributes of a variable or set of variables. Data (plural of "datum", which is seldom used) are
typically the results of measurements and can be the basis of graphs, images, or observations
of a set of variables. Data are often viewed as the lowest level of abstraction from which
information and knowledge are derived.
Generally, data in the database are both:
Integrated: the database is a unification of several otherwise distinct data files, with any redundancy
among those files either wholly or partly removed.
Shared: the individual pieces of data in the DB can be shared among
several different users, and the same piece of data can be accessed by several users at the same time (concurrent
access).
Hardware:
A database system needs two basic hardware components:
Mass storage media (secondary storage)
Processor with associated memory.
Software:
Between the physical DB itself and the users is software. This software is referred to as the
DataBase Management System (DBMS). The DBMS shields the DB users from hardware
level details.
The DBMS provides users with a view of the DB that is elevated above hardware level.
Users (Database Users):
There are three different types of database system users:
1. Application programmers, who are responsible for writing database application
programs in some programming language such as COBOL, PL/I, C++, Java, or some
higher-level "fourth-generation language".
2. End users, who access the database interactively. A given end user can access the
database via one of the online applications or can use an interface provided as an
integral part of the system.
3. The database administrator (DBA), who is responsible for the database administration function and
the associated (very important) data administration function.
Database Administrator
The database administrator is a person having central control over data and programs
accessing that data. Duties of the database administrator include:
1. Schema definition.
2. Storage structure and access method definition.
3. Schema and physical organization modification.
4. Granting of authorization for data access.
5. Integrity constraint specification.
Why Database?
1. Compactness: There is no need for possibly voluminous paper files.
2. Speed: The machine can retrieve and update data far faster than a human can.
3. Less drudgery: Much of maintaining files by hand is eliminated.
4. Currency: Accurate, up-to-date information is available on demand at any time.
5. Protection: The data can be better protected against unintentional loss and unlawful
access.
Benefits of DB approach:
1. The data can be shared.
2. Redundancy can be reduced.
3. Inconsistency can be avoided.
4. Transaction support can be provided.
5. Integrity can be maintained.
6. Security can be enforced.
7. Conflicting requirements can be balanced.
8. Standards can be enforced.
Data Abstraction:
The major purpose of a database system is to provide users with an abstract view of the
system. The system hides certain details of how data is stored, created, and maintained.
Complexity should be hidden from database users.
There are several levels of abstraction:
1. The internal level (also known as the storage level or physical level) is the one
closest to physical storage. It is the one concerned with the way the data is stored
inside the system.
2. The conceptual level (also known as the logical level) represents the
entire database. The conceptual schema describes the records and relationships included
in the conceptual view. It also contains the method of deriving the objects in the
conceptual view from the objects in the internal view.
3. The external level (also known as the user level or view level) is the one closest to
the users. It is the one concerned with the way the data is seen by individual users.
Mappings
In addition to the three levels of the architecture, there are certain mappings: one
conceptual/internal mapping and several external/conceptual mappings. The
conceptual/internal mapping defines the correspondence between the conceptual view and
the stored database. An external/conceptual mapping defines the correspondence between a
particular external view and the conceptual view.
Instances and schemas in database
Definition of schema: The design of a database is called its schema. There are three types of schema:
physical schema, logical schema and view schema.
The design of a database at the physical level is called the physical schema. It describes how the data is stored
in blocks of storage.
The design of a database at the logical level is called the logical schema. Programmers and database
administrators work at this level, where data is described as records of certain types
stored in data structures.
The design of a database at the view level is called the view schema. It generally describes end-user
interaction with the database system.
Definition of instance: The data stored in a database at a particular moment in time is called
an instance of the database. The database schema defines the variable declarations in the tables that belong
to a particular database, while the values of these variables at a given moment are called the
instance of that database.
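The difference between schema and instance can be sketched with a small experiment. This is only an illustration, assuming Python's built-in sqlite3 module and a hypothetical student table that does not appear in the lecture: the schema is declared once, while the instance changes every time data is inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The schema: declared once, it describes the structure of the data.
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")

# Instance at time t1: the table exists but holds no rows yet.
instance_t1 = conn.execute("SELECT * FROM student").fetchall()

conn.execute("INSERT INTO student VALUES (1, 'Ali')")

# Instance at time t2: same schema, different contents.
instance_t2 = conn.execute("SELECT * FROM student").fetchall()

# SQLite keeps the schema text in its catalog table sqlite_master.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'student'"
).fetchone()[0]

print(instance_t1)
print(instance_t2)
print(schema)
```

The two instances differ, but the stored schema text is the same before and after the insert.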
Database languages:
A database system provides three different types of languages:
1. Data Definition Language (DDL)
It is a language that allows the users to define data and their relationship to other types of
data. It is mainly used to create files, databases, data dictionary and tables within databases.
It is also used to specify the structure of each table, set of associated values with each
attribute, integrity constraints, security and authorization information for each table and
physical storage structure of each table on disk.
The following table gives an overview about usage of DDL statements in SQL.
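As a rough sketch of DDL in use, the snippet below runs three common SQL DDL statements (CREATE TABLE, ALTER TABLE and DROP TABLE). It assumes Python's built-in sqlite3 module; the course table and its columns are hypothetical names chosen for illustration, and other DBMSs may accept slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a table, its attributes and two integrity constraints.
conn.execute(
    "CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL)"
)

# ALTER: change the structure of an existing table.
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")

# PRAGMA table_info is SQLite-specific; field 1 of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]

# DROP: remove the table definition (and any rows) from the database.
conn.execute("DROP TABLE course")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(columns)
print(remaining)
```

None of these statements touches individual rows; they only change what the database is allowed to contain.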
2. Data Manipulation Language (DML)
It is a language that provides a set of operations to support the basic data manipulation
operations on the data held in the databases. It allows users to insert, update, delete and
retrieve data from the database. The part of DML that involves data retrieval is called a
query language.
The following table gives an overview about the usage of DML statements in SQL:
3. Data Control Language (DCL)
DCL statements control access to data and the database using statements such as GRANT
and REVOKE. A privilege can be granted to a user with the GRANT statement,
and a privilege that has been granted can be taken back with the REVOKE
command.
The following table gives an overview about the usage of DCL statements in SQL:
Database Management System (DBMS):
A DBMS is a set of software programs that controls the organization, storage, management,
and retrieval of data in a database. So when a user issues an access request using some
particular data sublanguage, the DBMS accepts that request and analyzes it. The DBMS
inspects the external schema for that user, the corresponding external/conceptual mapping,
the conceptual schema, the conceptual/internal mapping, and the stored database definition.
The DBMS then executes the necessary operations on the stored database. Examples
of applications built on a DBMS include computerized library systems, automated teller machines (ATMs), flight
reservation systems and computerized parts inventory systems.
Database Manager
The database manager is a program module which provides the interface between the
low-level data stored in the database and the application programs and queries submitted
to the system. So the database manager module is responsible for
Interaction with the file manager: Data is stored on disk using the file system,
which is usually provided by a conventional operating system. The database manager
must translate DML statements into low-level file system commands (for storing,
retrieving and updating data in the database).
Integrity enforcement: Checking that updates in the database do not violate
consistency constraints (e.g. no bank account balance below $0)
Security enforcement: Ensuring that users only have access to information they
are permitted to see
Backup and recovery: Detecting failures due to power failure, disk crash,
software errors, etc., and restoring the database to its state before the failure
Concurrency control: Preserving data consistency when there are concurrent
users.
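Integrity enforcement can be sketched with the bank account example mentioned above. This is a minimal illustration, assuming Python's built-in sqlite3 module and a hypothetical account table; it shows one way a DBMS can reject an update that violates a consistency constraint, not how any particular database manager is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint states the rule "no balance below 0" once, in the schema.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance REAL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.commit()

try:
    # This withdrawal would leave the balance at -50, so it must be refused.
    conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The failed statement changed nothing; the old balance is still stored.
balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

print(rejected, balance)
```

Because the rule lives in the database rather than in each application program, every program that updates the table is checked in the same way.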
File Manager and Disk Manager
Responsibility for the structure of the files and managing the file space goes to the
file manager. It is also responsible for locating the block containing the required record,
requesting this block from the disk manager, and transmitting the required record to the data
manager as shown. The file manager can be implemented using an interface to the existing
file subsystem provided by the operating system of the host computer or it can include a file
subsystem written especially for the DBMS. The disk manager, by contrast, is part of the operating
system of the host computer, and it performs all physical input and output operations.
The disk manager transfers the block or page requested by the file manager so that the latter
need not be concerned with the physical characteristics of the underlying storage media.
Overall System Structure
Database systems are partitioned into modules for different functions. Some functions (e.g.
file systems) may be provided by the operating system. The components of the overall
database system structure include:
File manager manages allocation of disk space and data structures used to
represent information on disk.
Database manager: The interface between low-level data and application
programs and queries.
Query processor translates statements in a query language into low-level
instructions the database manager understands. (May also attempt to find an
equivalent but more efficient form.)
DML compiler converts DML statements embedded in an application program
to normal procedure calls in a host language. The compiler interacts with the
query processor.
DDL compiler converts DDL statements to a set of tables containing metadata
stored in a data dictionary.
Lec. 2 Data Models
Data Models
A data model can be defined as an integrated collection of concepts for describing and
manipulating data, relationships between data, and constraints on the data in an organization.
It illustrates how the logical structure of a database is modeled, how data items are connected to each
other and how they are processed and stored inside the system.
The purpose of a data model is to represent data and to make the data understandable.
Many data models have been proposed; they fall into three broad categories:
Object Based Data Models
Record Based Data Models
Physical Data Models
The object based and record based data models are used to describe data at the conceptual
and external levels, the physical data model is used to describe data at the internal level.
Object based data models use concepts such as entities, attributes, and relationships. An
entity is a distinct object (a person, place, concept, or event) in the organization that is to be
represented in the database. An attribute is a property that describes some aspect of the object
that we wish to record, and a relationship is an association between entities. Some of the
more common types of object based data model are:
Entity-Relationship
Object Oriented
Semantic
Functional
Record based data models: like object based models, they describe data at the
conceptual and view levels. These models specify the logical structure of the database in terms of records,
fields and attributes. There are three types of record based data model: hierarchical
data models, network data models and relational data models. The most widely used record
based data model is the relational data model; the other two are not widely used.
A physical data model describes how data are stored in
computer memory, how they are arranged and ordered there, and how they are
retrieved. In other words, the physical data model represents the data at the data layer or
internal layer.
Entity-Relationship Model (E-R Model)
It is a graphical technique used to convert the requirements of the system into a
graphical representation, so that they can be easily understood. A basic component of
the model is the Entity-Relationship diagram, which is used to visually represent data objects.
ER Model is based on Entities (and their attributes) and Relationships among entities.
Entity: An entity in an ER Model is a real-world entity having properties called
attributes. Every attribute is defined by its set of values called domain. For example,
in a school database, a student is considered as an entity. Student has various attributes
like name, age, class, etc.
Relationship: The logical association among entities is called a relationship.
Relationships are mapped with entities in various ways. Mapping cardinalities define
the number of associations between two entities. Mapping cardinalities are of four
types:
one to one
one to many
many to one
many to many
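One common way of realizing these cardinalities in tables can be sketched as follows. The snippet assumes Python's built-in sqlite3 module, and all table and column names (department, employee, student, course, enrolment) are hypothetical: a one-to-many relationship is stored as a reference column on the "many" side, and a many-to-many relationship as a separate table of pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
-- one to many: each employee row refers to exactly one department
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,
                       dept_id INTEGER REFERENCES department(dept_id));

CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT);
-- many to many: one row per (student, course) pair
CREATE TABLE enrolment (student_id INTEGER REFERENCES student(student_id),
                        course_id INTEGER REFERENCES course(course_id),
                        PRIMARY KEY (student_id, course_id));

INSERT INTO department VALUES (1, 'Computer Eng.');
INSERT INTO employee VALUES (10, 'Sara', 1), (11, 'Omar', 1);
INSERT INTO student VALUES (1, 'Ali'), (2, 'Noor');
INSERT INTO course VALUES (100, 'Database'), (101, 'Networks');
INSERT INTO enrolment VALUES (1, 100), (1, 101), (2, 100);
""")

def count(sql):
    """Run a COUNT query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]

staff_in_dept = count("SELECT COUNT(*) FROM employee WHERE dept_id = 1")
courses_of_ali = count("SELECT COUNT(*) FROM enrolment WHERE student_id = 1")
students_in_db = count("SELECT COUNT(*) FROM enrolment WHERE course_id = 100")

print(staff_in_dept, courses_of_ali, students_in_db)
```

One department has several employees, while a student can take several courses and a course can have several students at the same time.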
Relational Model
The most popular data model in DBMS is the relational model. It is a more scientific
model than the others. It is used widely around the world for data storage and processing. This
model is simple and it has all the properties and capabilities required to process data with
storage efficiency. The concepts of this model are:
Tables − In the relational data model, relations are saved in the format of tables. This
format stores the relation among entities. A table has rows and columns, where rows
represent records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is
called a tuple.
Relation instance − A finite set of tuples in the relational database system represents
relation instance.
Relation schema − A relation schema describes the relation name (table name),
attributes, and their names.
Attribute domain − Every attribute has some pre-defined value scope, known as
attribute domain.
In a relational database, all data are stored and accessed via relations. Relations that store
data are called "base relations", and in implementations are called "tables". Other relations
do not store data, but are computed by applying operations to other relations. These relations
are sometimes called "derived relations". In implementations these are called "views" or
"queries".
Hierarchical Data Models
In this data model, the entities are represented in a hierarchical fashion. Here we
identify a parent entity, and its child entity. Again we drill down to identify next level of child
entity and so on. This model can be imagined as folders inside a folder.
It can also be imagined as a root-like (tree) structure. The model has only one main root, which
branches into sub-roots, each of which branches again. This type of structure is
best suited to 1:N relationships. For example: one company has multiple departments
(1:N), one company has multiple suppliers (1:N), one department has multiple employees
(1:N), and each department has multiple projects (1:N).
Network Data Models
This is an enhanced version of the hierarchical data model, designed to address the
drawbacks of the hierarchical model. It supports M:N relationships. This data model
is also drawn as a hierarchy, but it has no single-parent restriction: any
child in the tree can have multiple parents.
Lec.3 SQL
What is SQL?
SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to
communicate with a database. According to ANSI (American National Standards Institute),
it is the standard language for relational database management systems. SQL statements are
used to perform tasks such as update data on a database, or retrieve data from a database.
Some common relational database management systems that use SQL are: Oracle, Sybase,
Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most
of them also have their own additional proprietary extensions that are usually only used on
their system. However, the standard SQL commands such as "Select", "Insert", "Update",
"Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs
to do with a database. This tutorial will provide you with the instruction on the basics of each
of these commands as well as allow you to put them to practice using the SQL Interpreter.
Selecting Data
The select statement is used to query the database and retrieve selected data that match the
criteria that you specify. Here is the format of a simple select statement:
select "column1"
[,"column2",etc]
from "tablename"
[where "condition"];
[] = optional
The column names that follow the select keyword determine which columns will be
returned in the results. You can select as many column names as you'd like, or you can
use a "*" to select all columns.
The table name that follows the keyword from specifies the table that will be queried to
retrieve the desired results.
The where clause (optional) specifies which data values or rows will be returned or
displayed, based on the criteria described after the keyword where.
Conditional selections used in the where clause:
=       Equal
>       Greater than
<       Less than
>=      Greater than or equal
<=      Less than or equal
<>      Not equal to
LIKE    *See note below
The LIKE pattern matching operator can also be used in the conditional selection of the
where clause. LIKE is a very powerful operator that allows you to select only rows that are
"like" what you specify. The percent sign "%" can be used as a wild card to match any
sequence of characters (including none) that might appear before or after the characters specified. For example:
select first, last, city
from empinfo
where first LIKE 'Er%';
This SQL statement will match any first names that start with 'Er'. Strings must be in
single quotes.
Or you can specify,
select first, last
from empinfo
where last LIKE '%s';
This statement will match any last names that end in an 's'.
select * from empinfo
where first = 'Eric';
This will only select rows where the first name equals 'Eric' exactly.
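The three statements above can be tried out as follows. This is a sketch that assumes Python's built-in sqlite3 module and loads only four rows of the empinfo sample table (first, last and city columns); note that SQLite's LIKE ignores case for ASCII letters, which other DBMSs may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empinfo (first TEXT, last TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO empinfo VALUES (?, ?, ?)",
    [
        ("John", "Jones", "Payson"),
        ("Eric", "Edwards", "San Diego"),
        ("Erica", "Williams", "Show Low"),
        ("Leroy", "Brown", "Pinetop"),
    ],
)

# 'Er%' : the value must start with Er, anything (or nothing) may follow.
starts_with_er = conn.execute(
    "SELECT first FROM empinfo WHERE first LIKE 'Er%' ORDER BY first"
).fetchall()

# '%s' : anything may come first, but the value must end in s.
ends_with_s = conn.execute(
    "SELECT last FROM empinfo WHERE last LIKE '%s' ORDER BY last"
).fetchall()

# '=' : exact match only, so 'Erica' is not returned.
exactly_eric = conn.execute(
    "SELECT first FROM empinfo WHERE first = 'Eric'"
).fetchall()

print(starts_with_er)
print(ends_with_s)
print(exactly_eric)
```

The pattern 'Er%' returns both Eric and Erica, while the equality test returns Eric alone.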
Sample Table: empinfo
first      last      id     age  city        state
John       Jones     99980  45   Payson      Arizona
Mary       Jones     99982  25   Payson      Arizona
Eric       Edwards   88232  32   San Diego   California
Mary Ann   Edwards   88233  32   Phoenix     Arizona
Ginger     Howell    98002  42   Cottonwood  Arizona
Sebastian  Smith     92001  23   Gila Bend   Arizona
Gus        Gray      22322  35   Bagdad      Arizona
Mary Ann   May       32326  52   Tucson      Arizona
Erica      Williams  32327  60   Show Low    Arizona
Leroy      Brown     32380  22   Pinetop     Arizona
Elroy      Cleaver   32382  22   Globe       Arizona
Enter the following sample select statements in the SQL interpreter. Before running each one, write down your
expected results.
select first, last, city from empinfo;
select last, city, age from empinfo
where age > 30;
select first, last, city, state from empinfo
where first LIKE 'J%';
select * from empinfo;
select first, last from empinfo
where last LIKE '%s';
select first, last, age from empinfo
where last LIKE '%illia%';
select * from empinfo where first = 'Eric';
Select statement exercises
Enter select statements to:
1. Display the first name and age for everyone that's in the table.
2. Display the first name, last name, and city for everyone that's not from Payson.
3. Display all columns for everyone that is over 40 years old.
4. Display the first and last names for everyone whose last name ends in an "ay".
5. Display all columns for everyone whose first name equals "Mary".
6. Display all columns for everyone whose first name contains "Mary".
Creating Tables
The create table statement is used to create a new table. Here is the format of a
simple create table statement:
create table "tablename"
("column1" "data type",
"column2" "data type",
"column3" "data type");
Format of create table if you were to use optional constraints:
create table "tablename"
("column1" "data type"
[constraint],
"column2" "data type"
[constraint],
"column3" "data type"
[constraint]);
[ ] = optional
Note: You may have as many columns as you'd like, and the constraints are optional.
Example:
create table employee
(first varchar(15),
last varchar(20),
age number(3),
address varchar(30),
city varchar(20),
state varchar(20));
To create a new table, enter the keywords create table followed by the table name,
followed by an open parenthesis, followed by the first column name, followed by the
data type for that column, followed by any optional constraints, and followed by a
closing parenthesis. It is important to make sure you use an open parenthesis before
the first column definition, and a closing parenthesis after the end of the last column
definition. Make sure you separate each column definition with a comma. All SQL statements should end with a ";".
The table and column names must start with a letter and can be followed by letters,
numbers, or underscores - not to exceed a total of 30 characters in length. Do not
use any SQL reserved keywords as names for tables or column names (such as
"select", "create", "insert", etc).
Data types specify what the type of data can be for that particular column. If a
column called "Last_Name" is to be used to hold names, then that particular column should have a "varchar" (variable-length character) data type.
Here are the most common Data types:
char(size)       Fixed-length character string. Size is specified in parentheses. Max 255 bytes.
varchar(size)    Variable-length character string. Max size is specified in parentheses.
number(size)     Number value with a max number of column digits specified in parentheses.
date             Date value
number(size,d)   Number value with a maximum number of digits of "size" total, with a maximum number of "d" digits to the right of the decimal.
What are constraints? When tables are created, it is common for one or more columns
to have constraints associated with them. A constraint is basically a rule associated
with a column that the data entered into that column must follow. For example, a
"unique" constraint specifies that no two records can have the same value in a
particular column. They must all be unique. The other two most popular constraints
are "not null" which specifies that a column can't be left blank, and "primary key". A
"primary key" constraint defines a unique identification of each record (or row) in a
table. All of these and more will be covered in the future Advanced release of this
Tutorial. Constraints can be entered in this SQL interpreter, however, they are not
supported in this Intro to SQL tutorial & interpreter. They will be covered and
supported in the future release of the Advanced SQL tutorial - that is, if "response" is good.
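The three constraints named above can be seen in action in the following sketch. It assumes Python's built-in sqlite3 module rather than the tutorial's interpreter, and the employee table with its id and email columns is a hypothetical example; each insert that breaks a rule is refused by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id    INTEGER PRIMARY KEY,  -- unique identification of each row
        email TEXT UNIQUE,          -- no two rows may share a value
        last  TEXT NOT NULL         -- may not be left blank
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'a@example.com', 'Jones')")

bad_rows = [
    (1, "b@example.com", "Smith"),  # repeats the primary key 1
    (2, "a@example.com", "Smith"),  # repeats a unique email
    (3, "c@example.com", None),     # leaves a not null column empty
]

errors = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))  # keep the message to see which rule failed

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print(errors)
print(row_count)
```

All three inserts fail, so the table still holds only the first, valid row.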
It's now time for you to design and create your own table. You will use this table
throughout the rest of the tutorial. If you decide to change or redesign the table, you
can either drop it and recreate it or you can create a completely different one. The SQL statement drop will be covered later.
Create Table Exercise
You have just started a new company. It is time to hire some employees. You will need
to create a table that will contain the following information about your new employees:
firstname, lastname, title, age, and salary. After you create the table, you should receive
a small form on the screen with the appropriate column names. If you are missing any
columns, you need to double check your SQL statement and recreate the table. Once it's
created successfully, go to the "Insert" lesson.
Your create statement should resemble:
create table myemployees_ts0211
(firstname varchar(30),
lastname varchar(30),
title varchar(30),
age number(2),
salary number(8,2));
Inserting into a Table
The insert statement is used to insert or add a row of data into the table.
To insert records into a table, enter the keywords insert into followed by the table name,
followed by an open parenthesis, followed by a list of column names separated by commas,
followed by a closing parenthesis, followed by the keyword values, followed by the list of
values enclosed in parentheses. The values that you enter will be held in the rows and they
will match up with the column names that you specify. Strings should be enclosed in single
quotes, and numbers should not.
insert into "tablename"
(first_column,...last_column)
values (first_value,...last_value);
In the example below, the column name first will match up with the value 'Luke', and
the column name state will match up with the value 'Georgia'.
Example:
insert into employee
(first, last, age, address, city, state)
values ('Luke', 'Duke', 45, '2130 Boars Nest',
'Hazard Co', 'Georgia');
Note: All strings should be enclosed between single quotes: 'string'
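How values are matched to column names by position can be sketched as follows, assuming Python's built-in sqlite3 module and the employee table from the example above. The second insert, with the hypothetical employee 'Bo', lists only two columns in a different order, so the remaining columns are left empty (NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (first TEXT, last TEXT, age INTEGER,"
    " address TEXT, city TEXT, state TEXT)"
)

# Each value goes to the column in the same position of the column list.
conn.execute("""
    INSERT INTO employee (first, last, age, address, city, state)
    VALUES ('Luke', 'Duke', 45, '2130 Boars Nest', 'Hazard Co', 'Georgia')
""")

# The column list may be shorter and in any order; here 'Duke' goes to
# last and 'Bo' goes to first, and the unlisted columns stay NULL.
conn.execute("INSERT INTO employee (last, first) VALUES ('Duke', 'Bo')")

rows = conn.execute(
    "SELECT first, last, age, state FROM employee ORDER BY first"
).fetchall()

print(rows)
```

Python shows the NULL values of the second row as None.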
Insert statement exercises
It is time to insert data into your new employee table.
Your first three employees are the following:
Jonie Weber, Secretary, 28, 19500.00
Potsy Weber, Programmer, 32, 45300.00
Dirk Smith, Programmer II, 45, 75020.00
Enter these employees into your table first, and then insert at least 5 more of your own list
of employees in the table.
After they're inserted into the table, enter select statements to:
1. Select all columns for everyone in your employee table.
2. Select all columns for everyone with a salary over 30000.
3. Select first and last names for everyone that's under 30 years old.
4. Select first name, last name, and salary for anyone with "Programmer" in their
title.
5. Select all columns for everyone whose last name contains "ebe".
6. Select the first name for everyone whose first name equals "Potsy".
7. Select all columns for everyone over 80 years old.
8. Select all columns for everyone whose last name ends in "ith".
Create at least 5 of your own select statements based on specific information that you'd like
to retrieve.
Updating Records
The update statement is used to update or change records that match specified criteria.
This is accomplished by carefully constructing a where clause.
update "tablename"
set "columnname" =
"newvalue"
[,"nextcolumn" =
"newvalue2"...]
where "columnname"
OPERATOR "value"
[and|or "column"
OPERATOR "value"];
[] = optional
Examples:
update phone_book
set area_code = 623
where prefix = 979;
update phone_book
set last_name = 'Smith', prefix=555, suffix=9292
where last_name = 'Jones';
update employee
set age = age+1
where first_name='Mary' and last_name='Williams';
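The effect of the where clause in an update can be checked with a small experiment. The sketch below assumes Python's built-in sqlite3 module and three made-up rows in an employee table with first_name, last_name and age columns; only the row that satisfies both conditions is changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (first_name TEXT, last_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Mary", "Williams", 30), ("Mary", "Jones", 25), ("Erica", "Williams", 60)],
)

# Both conditions must hold, so only Mary Williams gets one year older.
cursor = conn.execute(
    "UPDATE employee SET age = age + 1 "
    "WHERE first_name = 'Mary' AND last_name = 'Williams'"
)
rows_changed = cursor.rowcount  # how many rows the update touched

ages = conn.execute(
    "SELECT first_name, last_name, age FROM employee ORDER BY age"
).fetchall()

print(rows_changed)
print(ages)
```

Mary Jones and Erica Williams each match only one of the two conditions, so their ages stay the same.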
Update statement exercises
After each update, issue a select statement to verify your changes.
1. Jonie Weber just got married to Bob Williams. She has requested that her last
name be updated to Weber-Williams.
2. Dirk Smith's birthday is today, add 1 to his age.
3. All secretaries are now called "Administrative Assistant". Update all titles
accordingly.
4. Everyone that's making under 30000 is to receive a 3500 a year raise.
5. Everyone that's making over 33500 is to receive a 4500 a year raise.
6. All "Programmer II" titles are now promoted to "Programmer III".
7. All "Programmer" titles are now promoted to "Programmer II".
Create at least 5 of your own update statements and submit them.
Deleting Records
The delete statement is used to delete records or rows from the table.
delete from "tablename"
where "columnname"
OPERATOR "value"
[and|or "column"
OPERATOR "value"];
[ ] = optional
Examples:
delete from employee;
Note: if you leave off the where clause, all records will be deleted!
delete from employee
where lastname = 'May';
delete from employee
where firstname = 'Mike' or firstname = 'Eric';
To delete an entire record/row from a table, enter "delete from" followed by the table
name, followed by the where clause which contains the conditions to delete. If you leave
off the where clause, all records will be deleted.
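The warning about the where clause can be demonstrated as follows. This is a sketch assuming Python's built-in sqlite3 module and four made-up rows in an employee table: the first delete removes only the matching rows, while the second, with no where clause, empties the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Mike", "Smith"), ("Eric", "May"), ("Mary", "May"), ("John", "Jones")],
)

def row_count():
    """Return the number of rows currently in the employee table."""
    return conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# With a where clause: only the two rows whose lastname is 'May' are removed.
conn.execute("DELETE FROM employee WHERE lastname = 'May'")
after_where = row_count()

# Without a where clause: every remaining row is removed.
conn.execute("DELETE FROM employee")
after_all = row_count()

print(after_where, after_all)
```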
Delete statement exercises
(Use the select statement to verify your deletes):
1. Jonie Weber-Williams just quit, remove her record from the table.
2. It's time for budget cuts. Remove all employees who are making over 70000 dollars.
Create at least two of your own delete statements, and then issue a command to delete all
records from the table.
Drop a Table
The drop table command is used to delete a table and all rows in the table.
To delete an entire table including all of its rows, issue the drop table command followed
by the tablename. drop table is different from deleting all of the records in the table.
Deleting all of the records in the table leaves the table including column and constraint
information. Dropping the table removes the table definition as well as all of its rows.
drop table "tablename";
Example:
drop table myemployees_ts0211;
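The difference between deleting all rows and dropping the table can be sketched as follows, assuming Python's built-in sqlite3 module. The helper table_exists is a hypothetical function written for this example; it looks the table up in SQLite's catalog, sqlite_master.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myemployees_ts0211 (firstname TEXT, lastname TEXT)")
conn.execute("INSERT INTO myemployees_ts0211 VALUES ('Jonie', 'Weber')")

def table_exists(name):
    """Return True if a table with this name is defined in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None

# Deleting every row leaves the (now empty) table definition in place.
conn.execute("DELETE FROM myemployees_ts0211")
exists_after_delete = table_exists("myemployees_ts0211")

# Dropping the table removes the definition as well.
conn.execute("DROP TABLE myemployees_ts0211")
exists_after_drop = table_exists("myemployees_ts0211")

print(exists_after_delete, exists_after_drop)
```

After the drop, any further select on the table would fail because the table no longer exists.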
SELECT Statement
The SELECT statement is used to query the database and retrieve selected data that match
the criteria that you specify.
The SELECT statement has five main clauses to choose from, although FROM is the only
required clause. Each of the clauses has a vast selection of options, parameters, etc. The
clauses will be listed below, but each of them will be covered in more detail later in the
tutorial.
Here is the format of the SELECT statement:
SELECT [ALL | DISTINCT] column1[,column2]
FROM table1[,table2]
[WHERE "conditions"]
[GROUP BY "column-list"]
[HAVING "conditions"]
[ORDER BY "column-list" [ASC | DESC] ]
Example:
SELECT name, title, dept
FROM employee
WHERE title LIKE 'Pro%';
The above statement will select all of the rows/values in the name, title, and dept columns
from the employee table whose title starts with 'Pro'. This may return job titles including
Programmer or Pro-wrestler.
ALL and DISTINCT are keywords used to select either ALL (default) or the "distinct" or
unique records in your query results. If you would like to retrieve just the unique records in
specified columns, you can use the "DISTINCT" keyword. DISTINCT will discard the
duplicate records for the columns you specified after the "SELECT" keyword. For
example:
SELECT DISTINCT age
FROM employee_info;
This statement will return all of the unique ages in the employee_info table.
ALL will display "all" of the specified columns including all of the duplicates. The ALL
keyword is the default if nothing is specified.
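The two keywords can be compared side by side. The sketch below assumes Python's built-in sqlite3 module and an employee_info table holding five made-up ages, two of which occur twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_info (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employee_info VALUES (?, ?)",
    [("A", 22), ("B", 22), ("C", 32), ("D", 45), ("E", 32)],
)

# ALL (the default) keeps one result row per table row, duplicates included.
all_ages = [
    row[0]
    for row in conn.execute("SELECT ALL age FROM employee_info ORDER BY age")
]

# DISTINCT keeps each different value only once.
distinct_ages = [
    row[0]
    for row in conn.execute("SELECT DISTINCT age FROM employee_info ORDER BY age")
]

print(all_ages)
print(distinct_ages)
```

Leaving out the keyword gives the same result as writing ALL.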
Exercises
1. From the items_ordered table, select a list of all items purchased for customerid
10449. Display the customerid, item, and price for this customer.
2. Select all columns from the items_ordered table for whoever purchased a Tent.
3. Select the customerid, order_date, and item values from the items_ordered
table for any items in the item column that start with the letter "S".
4. Select the distinct items in the items_ordered table. In other words, display a listing of each of the unique items from the items_ordered table.
Aggregate Functions
MIN       returns the smallest value in a given column
MAX       returns the largest value in a given column
SUM       returns the sum of the numeric values in a given column
AVG       returns the average value of a given column
COUNT     returns the total number of values in a given column
COUNT(*)  returns the number of rows in a table
Aggregate functions are used to compute against a "returned column of numeric data" from
your SELECT statement. They basically summarize the results of a particular column of
selected data. We are covering these here since they are required by the next topic,
"GROUP BY". Although they are required for the "GROUP BY" clause, these functions
can be used without the "GROUP BY" clause. For example:
SELECT AVG(salary)
FROM employee;
This statement will return a single result which contains the average value of everything
returned in the salary column from the employee table.
Another example:
SELECT AVG(salary)
FROM employee
WHERE title = 'Programmer';
This statement will return the average salary for all employees whose title is equal to
'Programmer'.
Example:
SELECT Count(*)
FROM employees;
This particular statement is slightly different from the other aggregate functions since there
isn't a column supplied to the count function. This statement will return the number of rows
in the employees table.
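As a rough sketch, the aggregate queries above can be run against a tiny in-memory SQLite table. The three rows are invented sample data, and a single table named employee is assumed for all of the queries.

```python
import sqlite3

# Hypothetical sample data for the employee table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, title TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Ann", "Programmer", 50000.0),
    ("Bob", "Programmer", 60000.0),
    ("Cid", "Sales", 40000.0),
])

# Each aggregate query returns exactly one row.
avg_all = conn.execute("SELECT AVG(salary) FROM employee").fetchone()[0]
avg_prog = conn.execute(
    "SELECT AVG(salary) FROM employee WHERE title = 'Programmer'").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
lowest, highest = conn.execute(
    "SELECT MIN(salary), MAX(salary) FROM employee").fetchone()

print(avg_all, avg_prog, row_count, lowest, highest)
```

Note that the WHERE clause is applied first, so the second AVG only sees the rows whose title is 'Programmer'.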
Review Exercises
1. Select the maximum price of any item ordered in the items_ordered table. Hint:
Select the maximum price only.
2. Select the average price of all of the items ordered that were purchased in the
month of Dec.
3. What is the total number of rows in the items_ordered table?
4. For all of the tents that were ordered in the items_ordered table, what is the lowest price paid for a tent? Hint: Your query should return the price only.
GROUP BY clause
The GROUP BY clause will gather together all of the rows that share the same value in the
specified column(s) and will allow aggregate functions to be performed on each of the
resulting groups. This can best be explained by an example:
GROUP BY clause syntax:
SELECT column1,
SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list";
Let's say you would like to retrieve a list of the highest paid salaries in each dept:
SELECT max(salary), dept
FROM employee
GROUP BY dept;
This statement will select the maximum salary for the people in each unique department.
Basically, the salary for the person who makes the most in each department will be
displayed. Their salary and their department will be returned.
Multiple Grouping Columns - What if I wanted to display their lastname too?
For example, take a look at the items_ordered table. Let's say you want to group everything
of quantity 1 together, everything of quantity 2 together, everything of quantity 3 together,
etc. If you would like to determine what the largest cost item is for each grouped quantity
(all quantity 1's, all quantity 2's, all quantity 3's, etc.), you would enter:
SELECT quantity, max(price)
FROM items_ordered
GROUP BY quantity;
Enter the statement above, and take a look at the results to see if it returned what you were
expecting. Verify that the maximum price in each Quantity Group is really the maximum
price.
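One way to do that verification is sketched below with Python's sqlite3 module. The items_ordered rows are invented, and only the two columns used by the query are created.

```python
import sqlite3

# Hypothetical sample rows; only quantity and price are modelled here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items_ordered (quantity INTEGER, price REAL)")
conn.executemany("INSERT INTO items_ordered VALUES (?, ?)", [
    (1, 10.0), (1, 25.5), (2, 8.0), (2, 3.0), (3, 99.0),
])

# One output row per distinct quantity, holding the largest price in that group.
max_by_quantity = conn.execute(
    "SELECT quantity, MAX(price) FROM items_ordered "
    "GROUP BY quantity ORDER BY quantity").fetchall()

print(max_by_quantity)  # [(1, 25.5), (2, 8.0), (3, 99.0)]
```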
Review Exercises
1. How many people are in each unique state in the customers table? Select the
state and display the number of people in each. Hint: count is used to count rows
in a column, sum works on numeric data only.
2. From the items_ordered table, select the item, maximum price, and minimum
price for each specific item in the table. Hint: The items will need to be broken
up into separate groups.
3. How many orders did each customer make? Use the items_ordered table. Select
the customerid, number of orders they made, and the sum of their orders.
HAVING clause
The HAVING clause allows you to specify conditions on each group of rows - in other
words, which groups appear in the result will be based on the conditions you specify. The
HAVING clause should follow the GROUP BY clause if you are going to use it.
HAVING clause syntax:
SELECT column1,
SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list"
HAVING "condition";
HAVING can best be described by example. Let's say you have an employee table containing
the employee's name, department, salary, and age. If you would like to select the average
salary of the employees in each department, you could enter:
SELECT dept, avg(salary)
FROM employee
GROUP BY dept;
But, let's say that you want to display the average ONLY for departments whose average
salary is over 20000:
SELECT dept, avg(salary)
FROM employee
GROUP BY dept
HAVING avg(salary) > 20000;
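The following sketch contrasts HAVING with WHERE on invented data, using Python's sqlite3 module. HAVING filters whole groups after the averages are computed, while WHERE filters individual rows before grouping, so the two queries can return different departments.

```python
import sqlite3

# Hypothetical sample data for the employee table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Ann", "Sales", 15000.0), ("Bob", "Sales", 24000.0),
    ("Cid", "IT", 30000.0), ("Dee", "IT", 26000.0),
    ("Eve", "HR", 21000.0),
])

# HAVING: keep only the departments whose average salary exceeds 20000.
having_rows = conn.execute(
    "SELECT dept, AVG(salary) FROM employee "
    "GROUP BY dept HAVING AVG(salary) > 20000 ORDER BY dept").fetchall()

# WHERE: drop low-paid rows first, then average whatever is left per department.
where_rows = conn.execute(
    "SELECT dept, AVG(salary) FROM employee "
    "WHERE salary > 20000 GROUP BY dept ORDER BY dept").fetchall()

print(having_rows)  # [('HR', 21000.0), ('IT', 28000.0)]
print(where_rows)   # [('HR', 21000.0), ('IT', 28000.0), ('Sales', 24000.0)]
```

Sales has an average of 19500 over all of its rows, so HAVING drops it, but the WHERE version still reports it because only Bob's row survives the row-level filter.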
Review Exercises (note: yes, they are similar to the group by exercises, but these
contain the HAVING clause requirements)
1. How many people are in each unique state in the customers table that have
more than one person in the state? Select the state and display the number of
how many people are in each if it's greater than 1.
2. From the items_ordered table, select the item, maximum price, and minimum
price for each specific item in the table. Only display the results if the maximum
price for one of the items is greater than 190.00.
3. How many orders did each customer make? Use the items_ordered table. Select
the customerid, number of orders they made, and the sum of their orders if
they purchased more than 1 item.
ORDER BY clause
ORDER BY is an optional clause which will allow you to display the results of your query
in a sorted order (either ascending order or descending order) based on the columns that
you specify to order by.
ORDER BY clause syntax:
SELECT column1, SUM(column2)
FROM "list-of-tables"
ORDER BY
"column-list" [ASC | DESC];
[ ] = optional
ASC = Ascending Order - default
DESC = Descending Order
For example:
SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'
ORDER BY salary;
This statement will select the employee_id, dept, name, age, and salary from the
employee_info table where the dept equals 'Sales' and will list the results in Ascending
(default) order based on their Salary.
If you would like to order based on multiple columns, you must separate the columns with
commas. For example:
SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'
ORDER BY salary, age DESC;
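A detail worth checking in the multi-column case is that ASC or DESC applies only to the column it follows. The sketch below, using Python's sqlite3 module with invented rows, sorts by salary ascending (the default) and breaks ties by age descending.

```python
import sqlite3

# Hypothetical sample data; only the columns needed for the sort are created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_info (name TEXT, dept TEXT, age INTEGER, salary REAL)")
conn.executemany("INSERT INTO employee_info VALUES (?, ?, ?, ?)", [
    ("Ann", "Sales", 30, 40000.0),
    ("Bob", "Sales", 45, 40000.0),
    ("Cid", "Sales", 28, 35000.0),
    ("Dee", "IT", 50, 90000.0),
])

# salary is ascending (default); DESC applies to age only.
sorted_names = [row[0] for row in conn.execute(
    "SELECT name FROM employee_info WHERE dept = 'Sales' "
    "ORDER BY salary, age DESC")]

print(sorted_names)  # ['Cid', 'Bob', 'Ann']
```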
Review Exercises
1. Select the lastname, firstname, and city for all customers in the customers
table. Display the results in Ascending Order based on the lastname.
2. Same thing as exercise #1, but display the results in Descending order.
3. Select the item and price for all of the items in the items_ordered table that the
price is greater than 10.00. Display the results in Ascending order based on the price.
Combining conditions and Boolean Operators
The AND operator can be used to join two or more conditions in the WHERE clause. Both
sides of the AND condition must be true in order for the condition to be met and for those
rows to be displayed.
SELECT column1,
SUM(column2)
FROM "list-of-tables"
WHERE "condition1" AND
"condition2";
The OR operator can be used to join two or more conditions in the WHERE clause also.
However, either side of the OR operator can be true and the condition will be met - hence,
the rows will be displayed. With the OR operator, either side can be true or both sides can
be true.
For example:
SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE salary >= 50000.00 AND title = 'Programmer';
This statement will select the employeeid, firstname, lastname, title, and salary from the
employee_info table where the salary is greater than or equal to 50000.00 AND the title is
equal to 'Programmer'. Both of these conditions must be true in order for the rows to be
returned in the query. If either is false, then it will not be displayed.
Although they are not required, you can use parentheses around your conditional
expressions to make them easier to read:
SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE (salary >= 50000.00) AND (title = 'Programmer');
Another Example:
SELECT firstname, lastname, title, salary
FROM employee_info
WHERE (title = 'Sales') OR (title = 'Programmer');
This statement will select the firstname, lastname, title, and salary from the employee_info
table where the title is either equal to 'Sales' OR the title is equal to 'Programmer'.
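Parentheses become more than a readability aid once AND and OR are mixed in one WHERE clause, because AND binds more tightly than OR. The sketch below, using Python's sqlite3 module with invented rows, shows the same three conditions giving different results with and without parentheses.

```python
import sqlite3

# Hypothetical sample data for employee_info.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_info (firstname TEXT, title TEXT, salary REAL)")
conn.executemany("INSERT INTO employee_info VALUES (?, ?, ?)", [
    ("Ann", "Programmer", 60000.0),
    ("Bob", "Programmer", 40000.0),
    ("Cid", "Sales", 30000.0),
    ("Dee", "Sales", 70000.0),
    ("Eve", "Manager", 80000.0),
])

def names(where_clause):
    """Return the first names matching the given WHERE clause, sorted by name."""
    sql = "SELECT firstname FROM employee_info WHERE " + where_clause + " ORDER BY firstname"
    return [row[0] for row in conn.execute(sql)]

# Without parentheses this reads as: Sales OR (Programmer AND salary >= 50000).
unparenthesized = names("title = 'Sales' OR title = 'Programmer' AND salary >= 50000")
# With parentheses the salary test applies to both titles.
parenthesized = names("(title = 'Sales' OR title = 'Programmer') AND salary >= 50000")

print(unparenthesized)  # ['Ann', 'Cid', 'Dee']
print(parenthesized)    # ['Ann', 'Dee']
```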
Review Exercises
1. Select the customerid, order_date, and item from the items_ordered table for
all items unless they are 'Snow Shoes' or if they are 'Ear Muffs'. Display the
rows as long as they are not either of these two items.
2. Select the item and price of all items that start with the letters 'S', 'P', or 'F'.
IN and BETWEEN Conditional Operators
SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 IN
(list-of-values);
SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 BETWEEN value1
AND value2;
The IN conditional operator is really a set membership test operator. That is, it is used to
test whether or not a value (stated before the keyword IN) is "in" the list of values provided
after the keyword IN.
For example:
SELECT employeeid, lastname, salary
FROM employee_info
WHERE lastname IN ('Hernandez', 'Jones', 'Roberts', 'Ruiz');
This statement will select the employeeid, lastname, salary from the employee_info table
where the lastname is equal to either: Hernandez, Jones, Roberts, or Ruiz. It will return the
rows if it is ANY of these values.
The IN conditional operator can be rewritten by using compound conditions using the
equals operator and combining it with OR - with exact same output results:
SELECT employeeid, lastname, salary
FROM employee_info
WHERE lastname = 'Hernandez' OR lastname = 'Jones' OR lastname = 'Roberts'
OR lastname = 'Ruiz';
As you can see, the IN operator is much shorter and easier to read when you are testing for
more than two or three values.
You can also use NOT IN to exclude the rows in your list.
The BETWEEN conditional operator is used to test to see whether or not a value (stated
before the keyword BETWEEN) is "between" the two values stated after the keyword
BETWEEN.
For example:
SELECT employeeid, age, lastname, salary
FROM employee_info
WHERE age BETWEEN 30 AND 40;
This statement will select the employeeid, age, lastname, and salary from the employee_info
table where the age is between 30 and 40 (including 30 and 40).
This statement can also be rewritten without the BETWEEN operator:
SELECT employeeid, age, lastname, salary
FROM employee_info
WHERE age >= 30 AND age <= 40;
You can also use NOT BETWEEN to exclude the values between your range.
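The equivalences described above can be checked with a short sketch using Python's sqlite3 module. The rows are invented; the point is only that IN matches the chain of OR tests, and that BETWEEN includes both end values.

```python
import sqlite3

# Hypothetical sample data for employee_info.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_info (lastname TEXT, age INTEGER)")
conn.executemany("INSERT INTO employee_info VALUES (?, ?)", [
    ("Jones", 30), ("Ruiz", 40), ("Smith", 35), ("Hernandez", 29), ("Lee", 41),
])

def lastnames(where_clause):
    """Return the last names matching the given WHERE clause, sorted by age."""
    sql = "SELECT lastname FROM employee_info WHERE " + where_clause + " ORDER BY age"
    return [row[0] for row in conn.execute(sql)]

in_rows = lastnames("lastname IN ('Hernandez', 'Jones', 'Roberts', 'Ruiz')")
or_rows = lastnames("lastname = 'Hernandez' OR lastname = 'Jones' "
                    "OR lastname = 'Roberts' OR lastname = 'Ruiz'")

between_rows = lastnames("age BETWEEN 30 AND 40")
range_rows = lastnames("age >= 30 AND age <= 40")
not_between_rows = lastnames("age NOT BETWEEN 30 AND 40")

print(in_rows)           # ['Hernandez', 'Jones', 'Ruiz']
print(between_rows)      # ['Jones', 'Smith', 'Ruiz']
print(not_between_rows)  # ['Hernandez', 'Lee']
```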
Review Exercises
1. Select the date, item, and price from the items_ordered table for all of the rows
that have a price value ranging from 10.00 to 80.00.
2. Select the firstname, city, and state from the customers table for all of the rows
where the state value is either: Arizona, Washington, Oklahoma, Colorado, or Hawaii.
Mathematical Operators
Standard ANSI SQL-92 supports the following first four basic arithmetic operators:
+ addition
- subtraction
* multiplication
/ division
% Modulo
The modulo operator determines the integer remainder of the division. This operator is not
ANSI SQL supported, however, most databases support it. The following are some more
useful mathematical functions to be aware of since you might need them. These functions
are not standard in the ANSI SQL-92 specs, therefore they may or may not be available on
the specific RDBMS that you are using. However, they were available on several major
database systems that I tested. They WILL work on this tutorial.
ABS(x)                  returns the absolute value of x
SIGN(x)                 returns the sign of input x as -1, 0, or 1 (negative, zero, or positive respectively)
MOD(x,y)                modulo - returns the integer remainder of x divided by y (same as x%y)
FLOOR(x)                returns the largest integer value that is less than or equal to x
CEILING(x) or CEIL(x)   returns the smallest integer value that is greater than or equal to x
POWER(x,y)              returns the value of x raised to the power of y
ROUND(x)                returns the value of x rounded to the nearest whole integer
ROUND(x,d)              returns the value of x rounded to the number of decimal places specified by the value d
SQRT(x)                 returns the square-root value of x
For example:
SELECT round(salary), firstname
FROM employee_info;
This statement will select the salary rounded to the nearest whole value and the firstname
from the employee_info table.
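Because availability differs between systems, it is worth trying these operators on the database actually in use. The sketch below uses Python's sqlite3 module and only ROUND, ABS, %, and /, which are built into SQLite; the other functions in the list may or may not be present depending on how SQLite was compiled. The literal values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# All operands are literals, so no table is needed for this check.
result = conn.execute(
    "SELECT ROUND(1234.5678), ROUND(1234.5678, 2), ABS(-7), "
    "17 % 5, 30.0 / 4, 30 / 4").fetchone()

rounded_whole, rounded_2dp, absolute, remainder, real_div, int_div = result
print(result)
```

In SQLite, dividing one integer by another (30 / 4) yields an integer, so a per-unit price computed as price / quantity only keeps its fraction if at least one operand is a real number.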
Review Exercises
1. Select the item and per unit price for each item in the items_ordered table. Hint: Divide the price by the quantity.
Table Joins, a must
All of the queries up until this point have been useful with the exception of one major
limitation - that is, you've been selecting from only one table at a time with your SELECT
statement. It is time to introduce you to one of the most beneficial features of SQL &
relational database systems - the "Join". To put it simply, the "Join" makes relational
database systems "relational".
Joins allow you to link data from two or more tables together into a single query result--
from one single SELECT statement.
A "Join" can be recognized in a SQL SELECT statement if it has more than one table after
the FROM keyword.
For example:
SELECT "list-of-columns"
FROM table1,table2
WHERE "search-condition(s)"
Joins can be explained easier by demonstrating what would happen if you worked with one
table only, and didn't have the ability to use "joins". This single table database is also
sometimes referred to as a "flat table". Let's say you have a one-table database that is used
to keep track of all of your customers and what they purchase from your store:
id first last address city state zip date item price
Every time a new row is inserted into the table, all columns will be updated, thus
resulting in unnecessary "redundant data". For example, every time Wolfgang Schultz
purchases something, the following rows will be inserted into the table:
id     first     last     address          city  state  zip    date    item         price
10982  Wolfgang  Schultz  300 N. 1st Ave   Yuma  AZ     85002  032299  snowboard    45.00
10982  Wolfgang  Schultz  300 N. 1st Ave   Yuma  AZ     85002  082899  snow shovel  35.00
10982  Wolfgang  Schultz  300 N. 1st Ave   Yuma  AZ     85002  091199  gloves       15.00
10982  Wolfgang  Schultz  300 N. 1st Ave   Yuma  AZ     85002  100999  lantern      35.00
10982  Wolfgang  Schultz  300 N. 1st Ave   Yuma  AZ     85002  022900  tent         85.00
An ideal database would have two tables:
1. One for keeping track of your customers
2. And the other to keep track of what they purchase:
"Customer_info" table:
customer_number firstname lastname address city state zip
"Purchases" table:
customer_number date item price
Now, whenever a purchase is made from a repeating customer, the 2nd table, "Purchases"
only needs to be updated! We've just eliminated useless redundant data, that is, we've just
normalized this database!
Notice how each of the tables has a common "customer_number" column. This column,
which contains the unique customer number will be used to JOIN the two tables. Using the
two new tables, let's say you would like to select the customer's name, and items they've
purchased. Here is an example of a join statement to accomplish this:
SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info, purchases
WHERE customer_info.customer_number = purchases.customer_number;
This particular "Join" is known as an "Inner Join" or "Equijoin". This is the most common
type of "Join" that you will see or use.
Notice that each of the columns is always preceded by the table name and a period. This
isn't always required; however, it IS good practice so that you won't confuse which columns
go with what tables. It is required if the column names are the same between the two
tables. I recommend preceding all of your columns with the table names when using joins.
Note: The syntax described above will work with most Database Systems -including
the one with this tutorial. However, in the event that this doesn't work with yours,
please check your specific database documentation.
Although the above will probably work, here is the ANSI SQL-92 syntax specification for
an Inner Join using the preceding statement above that you might want to try:
SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info INNER JOIN purchases
ON customer_info.customer_number = purchases.customer_number;
Another example:
SELECT employee_info.employeeid, employee_info.lastname,
employee_sales.comission
FROM employee_info, employee_sales
WHERE employee_info.employeeid = employee_sales.employeeid;
This statement will select the employeeid, lastname (from the employee_info table), and the
comission value (from the employee_sales table) for all of the rows where the employeeid
in the employee_info table matches the employeeid in the employee_sales table.
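As a sketch, the two join syntaxes can be compared side by side with Python's sqlite3 module. The customer_info and purchases tables follow the column names used earlier, but only a subset of the columns is created, and the second and third customers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_info (customer_number INTEGER, firstname TEXT, lastname TEXT)")
conn.execute(
    "CREATE TABLE purchases (customer_number INTEGER, date TEXT, item TEXT, price REAL)")
conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", [
    (10982, "Wolfgang", "Schultz"),
    (10983, "Ana", "Lopez"),   # hypothetical customer
    (10984, "Kim", "Park"),    # hypothetical customer with no purchases
])
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", [
    (10982, "032299", "snowboard", 45.0),
    (10982, "082899", "snow shovel", 35.0),
    (10983, "022900", "tent", 85.0),
])

# Join written with a comma-separated FROM list and a WHERE condition.
implicit = conn.execute("""
    SELECT customer_info.firstname, customer_info.lastname, purchases.item
    FROM customer_info, purchases
    WHERE customer_info.customer_number = purchases.customer_number
    ORDER BY purchases.item""").fetchall()

# The same join written with the INNER JOIN ... ON syntax.
explicit = conn.execute("""
    SELECT customer_info.firstname, customer_info.lastname, purchases.item
    FROM customer_info INNER JOIN purchases
    ON customer_info.customer_number = purchases.customer_number
    ORDER BY purchases.item""").fetchall()

print(implicit == explicit)  # True
print(explicit)
```

Both forms return one row per matching purchase. The customer without any purchases does not appear at all, which is the defining behaviour of an inner join.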
Review Exercises
1. Write a query using a join to determine which items were ordered by each of
the customers in the customers table. Select the customerid, firstname,
lastname, order_date, item, and price for everything each customer purchased
in the items_ordered table.
2. Repeat exercise #1, however display the results sorted by state in descending order.
Lec.4
Integrity Constraints
Integrity constraints are used to ensure accuracy and consistency of data in a relational
database. Many types of integrity constraints play a role in referential integrity (RI). Some of
these integrity constraints are Primary Key Constraints, Unique Constraints, Foreign Key
Constraints, Not Null Constraints, Check Constraints, Triggers, and others.
Primary Key Constraints
Primary key is the term used to identify one or more columns in a table that make a
row of data unique. Although the primary key typically consists of one column in a table,
more than one column can comprise the primary key. For example, either the employee’s
Social Security number or an assigned employee identification number is the logical primary
key for an employee table. The objective is for every record to have a unique primary key or
value for the employee’s identification number.
Because there is probably no need to have more than one record for each employee in an
employee table, the employee identification number makes a logical primary key. The
primary key is assigned at table creation.
The following example identifies the EMP_ID column as the PRIMARY KEY for the
EMPLOYEES table:
This method of defining a primary key is accomplished during table creation. The
primary key in this case is an implied constraint. You can also specify a primary key explicitly
as a constraint when setting up a table, as follows:
The primary key constraint in this example is defined after the column comma list in
the CREATE TABLE statement. A primary key that consists of more than one column can
be defined by either of the following methods:
Or:
Unique Constraints
A unique column constraint in a table is similar to a primary key in that the value in
that column for every row of data in the table must have a unique value. Although a primary
key constraint is placed on one column, you can place a unique constraint on another column
even though it is not actually for use as the primary key. Study the following example:
The primary key in this example is EMP_ID, meaning that the employee
identification number is the column that is used to ensure that every record in the table is
unique. The primary key is a column that is normally referenced in queries, particularly to
join tables. The column EMP_PHONE has been designated as a UNIQUE value, meaning
that no two employees can have the same telephone number. There is not a lot of difference
between the two, except that the primary key is used to provide an order to data in a table
and, in the same respect, join related tables.
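The behaviour of the two constraints can be sketched with Python's sqlite3 module. This is not the listing from the lecture; the table layout, the emp_name column, and the sample values are assumptions made for illustration, and the exact CREATE TABLE syntax varies slightly between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: EMP_ID is the primary key, EMP_PHONE must be unique.
conn.execute("""
    CREATE TABLE employees (
        emp_id    TEXT NOT NULL,
        emp_name  TEXT NOT NULL,
        emp_phone TEXT,
        PRIMARY KEY (emp_id),
        UNIQUE (emp_phone)
    )""")
conn.execute("INSERT INTO employees VALUES ('E1', 'Ann', '555-0100')")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

dup_key_ok = try_insert(("E1", "Bob", "555-0101"))    # repeats the primary key
dup_phone_ok = try_insert(("E2", "Bob", "555-0100"))  # repeats the unique phone
fresh_ok = try_insert(("E2", "Bob", "555-0101"))      # both values are new

print(dup_key_ok, dup_phone_ok, fresh_ok)  # False False True
```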
Foreign Key Constraints
A foreign key is a column in a child table that references a primary key in the
parent table. A foreign key constraint is the main mechanism used to enforce referential
integrity between tables in a relational database. A column defined as a foreign key is used
to reference a column defined as a primary key in another table.
Study the creation of the foreign key in the following example:
The EMP_ID column in this example has been designated as the foreign key for the
EMPLOYEE_PAY_TBL table. This foreign key, as you can see, references the EMP_ID
column in the EMPLOYEE_TBL table. This foreign key ensures that for every EMP_ID in
the EMPLOYEE_PAY_TBL, there is a corresponding EMP_ID in the EMPLOYEE_TBL.
This is called a parent/child relationship. The parent table is the EMPLOYEE_TBL table,
and the child table is the EMPLOYEE_PAY_TBL table.
In this figure, the EMP_ID column in the child table references the EMP_ID column
in the parent table. For a value to be inserted for EMP_ID in the child table, a value for
EMP_ID in the parent table must first exist. Likewise, for a value to be removed for EMP_ID
in the parent table, all corresponding values for EMP_ID must first be removed from the
child table. This is how referential integrity works.
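The insert and delete rules just described can be observed with a small sketch in Python's sqlite3 module. The column lists are assumptions (only EMP_ID and the two table names come from the text), and SQLite needs foreign key enforcement switched on with a PRAGMA, which other systems do not require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute(
    "CREATE TABLE employee_tbl (emp_id TEXT PRIMARY KEY, emp_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee_pay_tbl (
        emp_id   TEXT NOT NULL,
        pay_rate REAL,
        FOREIGN KEY (emp_id) REFERENCES employee_tbl (emp_id)
    )""")

def attempt(sql):
    """Run one statement and report whether referential integrity allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    attempt("INSERT INTO employee_pay_tbl VALUES ('E9', 14.0)"),  # no parent row yet
    attempt("INSERT INTO employee_tbl VALUES ('E9', 'Ann')"),     # create the parent
    attempt("INSERT INTO employee_pay_tbl VALUES ('E9', 14.0)"),  # child now allowed
    attempt("DELETE FROM employee_tbl WHERE emp_id = 'E9'"),      # child still exists
    attempt("DELETE FROM employee_pay_tbl WHERE emp_id = 'E9'"),  # remove the child
    attempt("DELETE FROM employee_tbl WHERE emp_id = 'E9'"),      # parent can go now
]

print(results)  # ['rejected', 'ok', 'ok', 'rejected', 'ok', 'ok']
```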
A foreign key can be added to a table using the ALTER TABLE command, as
shown in the following example:
NOT NULL Constraints
Previous examples use the keywords NULL and NOT NULL listed on the same line
as each column and after the data type. NOT NULL is a constraint that you can place on a
table’s column. This constraint disallows the entrance of NULL values into a column; in
other words, data is required in a NOT NULL column for each row of data in the table. NULL
is generally the default for a column if NOT NULL is not specified, allowing NULL values
in a column.
Check Constraints
Check (CHK) constraints can be utilized to check the validity of data entered into
particular table columns. Check constraints are used to provide back-end database edits,
although edits are commonly found in the front-end application as well. General edits restrict
values that can be entered into columns or objects, whether within the database itself or on a
front-end application. The check constraint is a way of providing another protective layer for
the data. The following example illustrates the use of a check constraint:
The check constraint in this table has been placed on the EMP_ZIP column, ensuring
that all employees entered into this table have a ZIP code of ‘46234’. Perhaps that is a little
restricting. Nevertheless, you can see how it works.
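A minimal sketch of such a check, written for SQLite through Python's sqlite3 module, might look as follows. The table name, the constraint name, and the rejected ZIP code are hypothetical; only the EMP_ZIP column and the value '46234' come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a named check constraint on EMP_ZIP.
conn.execute("""
    CREATE TABLE employee_check_tst (
        emp_id  TEXT PRIMARY KEY,
        emp_zip TEXT NOT NULL,
        CONSTRAINT chk_emp_zip CHECK (emp_zip = '46234')
    )""")

def try_insert(emp_id, emp_zip):
    """Return True if the row passes the check constraint, False otherwise."""
    try:
        conn.execute("INSERT INTO employee_check_tst VALUES (?, ?)", (emp_id, emp_zip))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("E1", "46234")
rejected = try_insert("E2", "90210")  # any other ZIP code violates the check

print(accepted, rejected)  # True False
```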
If you wanted to use a check constraint to verify that the ZIP code is within a list of
values, your constraint definition could look like the following:
If there is a minimum pay rate that can be designated for an employee, you could
have a constraint that looks like the following:
Dropping Constraints
Any constraint that you have defined can be dropped using the ALTER TABLE
command with the DROP CONSTRAINT option. For example, to drop the primary key
constraint in the EMPLOYEES table, you can use the following command:
Some implementations might provide shortcuts for dropping certain constraints. For
example, to drop the primary key constraint for a table in MySQL, you can use the following
command:
Triggers
A trigger is a compiled SQL procedure in the database used to perform actions based
on other actions that occur within the database. A trigger is a form of a stored procedure that
is executed when a specified DML action is performed on a table. The trigger can be executed
before or after an INSERT, DELETE, or UPDATE statement. Triggers can also be used to
check data integrity before an INSERT, DELETE, or UPDATE statement. Triggers can roll
back transactions, and they can modify data in one table and read from another table in
another database.
The basic syntax for Oracle is as follows:
The following is an example trigger:
The preceding example shows the creation of a trigger called EMP_PAY_TRIG. This
trigger inserts a row into the EMPLOYEE_PAY_HISTORY table, reflecting the changes
made every time a row of data is updated in the EMPLOYEE_PAY_TBL table.
The DROP TRIGGER Statement
A trigger can be dropped using the DROP TRIGGER statement. The syntax for dropping a
trigger is as follows:
The FOR EACH ROW syntax allows the developer to have the procedure fire for
each row that is affected by the SQL statement or once for the statement as a whole. The
syntax is as follows:
The difference is how many times the trigger is executed. If you create a regular
trigger and execute a statement against the table that affects 100 rows, the trigger is executed
once. If instead you create the trigger with the FOR EACH ROW syntax and execute the
statement again, the trigger is executed 100 times, once for each row that is affected by the
statement.
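Row-level firing can be observed with a sketch in SQLite, driven from Python's sqlite3 module. This uses SQLite's trigger syntax rather than Oracle's, and SQLite supports only FOR EACH ROW triggers, so the once-per-statement case cannot be shown here. The column names and the history table layout are assumptions; only the trigger and table names follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_pay_tbl (emp_id TEXT PRIMARY KEY, pay_rate REAL);
    CREATE TABLE employee_pay_history (emp_id TEXT, old_rate REAL, new_rate REAL);

    -- Fires once for every row changed by an UPDATE on employee_pay_tbl.
    CREATE TRIGGER emp_pay_trig
    AFTER UPDATE ON employee_pay_tbl
    FOR EACH ROW
    BEGIN
        INSERT INTO employee_pay_history
        VALUES (OLD.emp_id, OLD.pay_rate, NEW.pay_rate);
    END;

    INSERT INTO employee_pay_tbl VALUES ('E1', 10.0), ('E2', 12.0), ('E3', 15.0);
""")

# One UPDATE statement that touches three rows fires the trigger three times.
conn.execute("UPDATE employee_pay_tbl SET pay_rate = pay_rate + 1")
history = conn.execute(
    "SELECT emp_id, old_rate, new_rate FROM employee_pay_history "
    "ORDER BY emp_id").fetchall()

# After DROP TRIGGER, further updates leave the history table untouched.
conn.execute("DROP TRIGGER emp_pay_trig")
conn.execute("UPDATE employee_pay_tbl SET pay_rate = pay_rate + 1")
history_count_after_drop = conn.execute(
    "SELECT COUNT(*) FROM employee_pay_history").fetchone()[0]

print(history)
print(history_count_after_drop)  # 3
```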
Example1: Create an SQL trigger that keeps, in a separate table, the ID and the registration
date of every employee who is registered in the company table.
Employee (ID, name, age, address, salary)
Example2: For the following tables:
Employee (name, ID, salary, Dept#, Supervisor)
Department (Dept_Name, Dept#, total_sal, manager)
The total_sal attribute is a derived attribute. Its value is the sum of the salaries of all
employees in a particular department. Create SQL triggers so that the value of total_sal will
always be correct for all events that may happen.
HW: In a library database which contains the following tables, the NoBrwBooks table is
created to indicate the number of borrowed books of each section (category). Write SQL
triggers to do that. Note: status attribute data are either borrowed or available.
Books (ID, Title, Author, Status, Section)
NoBrwBooks (Total_brwBooks, Section)
Lec.5 File Organization
File Organization
A database is mapped into a number of different files that are maintained by the
underlying operating system. These files reside permanently on disks. A file is organized
logically as a sequence of records. These records are mapped onto disk blocks. Blocks are of
a fixed size determined by the operating system, but record sizes vary.
One approach to mapping a database to files is to use several files, and to store records
of one fixed length in a given file. An alternative is to structure files such that they can
accommodate records of multiple lengths. Files of fixed-length records are easier to
implement than files of variable-length records.
Fixed-Length Records
Consider a file of deposit records of the form:
type deposit = record
bname : char(22);
account# : char(10);
balance : real;
end
If we assume that each character occupies one byte, an integer occupies 4 bytes,
and a real 8 bytes, our deposit record is 40 bytes long. The simplest approach is to use the
first 40 bytes for the first record, the next 40 bytes for the second, and so on.
However, there are two problems with this approach.
1. It is difficult to delete a record from this structure. The space occupied by the
deleted record must somehow be filled, or we need to mark deleted records so
that they can be ignored.
2. Unless block size is a multiple of 40, some records will cross block
boundaries. It would then require two block accesses to read or write such a
record.
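The 40-byte layout and the block-boundary problem can be sketched in Python with the struct module. The 512-byte block size and the sample records are assumptions made for illustration; the field widths follow the deposit record above.

```python
import struct

# 22-byte bname, 10-byte account#, 8-byte real balance.
# "=" disables alignment padding, so the size is exactly 22 + 10 + 8 = 40 bytes.
DEPOSIT = struct.Struct("=22s10sd")
BLOCK_SIZE = 512  # hypothetical block size; note that it is not a multiple of 40

def pack_deposit(bname, account, balance):
    """Encode one deposit record as 40 bytes (short strings are zero-padded)."""
    return DEPOSIT.pack(bname.encode("ascii"), account.encode("ascii"), balance)

def read_record(file_bytes, i):
    """Record i starts at byte offset i * 40."""
    bname, account, balance = DEPOSIT.unpack_from(file_bytes, i * DEPOSIT.size)
    return (bname.rstrip(b"\x00").decode("ascii"),
            account.rstrip(b"\x00").decode("ascii"),
            balance)

def crosses_block(i):
    """True if record i starts in one block and ends in the next."""
    first = i * DEPOSIT.size
    last = first + DEPOSIT.size - 1
    return first // BLOCK_SIZE != last // BLOCK_SIZE

# Invented sample records.
records = [("Perryridge", "A-102", 400.0),
           ("Round Hill", "A-305", 350.0),
           ("Mianus", "A-215", 700.0)]
file_bytes = b"".join(pack_deposit(*r) for r in records)

print(DEPOSIT.size)                # 40
print(read_record(file_bytes, 2))  # ('Mianus', 'A-215', 700.0)
print(crosses_block(12))           # True: bytes 480..519 straddle the 512-byte boundary
```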
When a record is deleted, we could move all successive records up one (Figures
below), which may require moving a lot of records. We could instead move the last record
into the ``hole'' created by the deleted record. This changes the order the records are in.
It turns out to be undesirable to move records to occupy freed space, as moving
requires block accesses. Also, insertions tend to be more frequent than deletions. It is
acceptable to leave the space open and wait for a subsequent insertion. This leads to a need
for additional structure in our file design.
So one solution is:
- At the beginning of a file, allocate some bytes as a file header.
- This header for now need only be used to store the address of the first record whose
contents are deleted.
- This first record can then store the address of the second available record, and so on
(see figure below).
- To insert a new record, we use the record pointed to by the header, and change the
header pointer to the next available record.
- If no deleted records exist we add our new record to the end of the file.
- Note: Use of pointers requires careful programming. If a record pointed to is moved
or deleted, and that pointer is not corrected, the pointer becomes a dangling
pointer. Records pointed to are called pinned.
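The free-list scheme in the steps above can be sketched as a toy in-memory structure in Python. The class and method names are hypothetical, list indices stand in for record addresses, and a real file would store the pointers inside the freed record's bytes rather than in a Python list.

```python
class FixedLengthFile:
    """Toy file of fixed-length record slots with a free list in the header."""

    def __init__(self):
        self.slots = []        # a slot holds a record, or the index of the next free slot
        self.free_head = None  # file header: index of the first deleted slot, if any

    def delete(self, i):
        # The freed slot remembers the previous head; the header now points at it.
        self.slots[i] = self.free_head
        self.free_head = i

    def insert(self, record):
        if self.free_head is None:
            # No deleted records exist: add the record at the end of the file.
            self.slots.append(record)
            return len(self.slots) - 1
        # Reuse the slot the header points to and advance the header pointer.
        i = self.free_head
        self.free_head = self.slots[i]
        self.slots[i] = record
        return i


f = FixedLengthFile()
for name in ["a", "b", "c", "d"]:
    f.insert(name)

f.delete(1)
f.delete(3)
reused = [f.insert("e"), f.insert("f"), f.insert("g")]

print(reused)   # [3, 1, 4]
print(f.slots)  # ['a', 'f', 'c', 'e', 'g']
```

The most recently deleted slot is reused first, and the file grows only once the free list is empty.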
Fixed-length file insertions and deletions are relatively simple because ``one size
fits all''. For variable length, this is not the case.
Variable-Length Records
Variable-length records arise in a database in several ways:
o Storage of multiple items in a file.
o Record types allowing variable field size
o Record types allowing repeating fields
We'll look at several techniques, using one example with a variable-length record:
type account-list = record
bname : char(22);
account-info : array [1 .. ∞] of record;
account# : char(10);
balance : real;
end
end
Account-information is an array with an arbitrary number of elements.
1. Byte string representation
A simple method for implementing variable-length records is to attach a special
end-of-record symbol (⊥) to the end of each record. Each record is stored as a string
of successive bytes (as shown below).
Byte string representation has several disadvantages:
o It is not easy to re-use space left by a deleted record.
o In general, there is no space for records to grow longer. (The record must be moved,
and movement is costly if the record is pinned.)
So this method is not usually used and a modified form of byte-string representation
is used. It is called Slot page structure. There is a header at the beginning of each block,
containing:
o The number of record entries in the header
o The end of free space in the block
o An array whose entries contain the location and size of each record.
The slot page structure requires that there be no pointers that point directly to
records. Instead, pointers must point to the entry in the header that contains the actual location
of the record. This level of indirection allows records to be moved to prevent fragmentation
of space inside a block, while supporting indirect pointers to the record.
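The indirection can be illustrated with a simplified sketch in Python. The class name, the 64-byte page size, and the sample records are assumptions, and the sketch keeps the header in a Python list instead of storing it inside the block, so header space is not counted against free space.

```python
class SlottedPage:
    """Simplified block: records are packed from the end, slots record where they are."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_end = size  # end of free space in the block
        self.slots = []       # header array: (offset, length) per record, None if deleted

    def insert(self, payload):
        if len(payload) > self.free_end:
            raise ValueError("not enough free space in this block")
        self.free_end -= len(payload)
        self.data[self.free_end:self.free_end + len(payload)] = payload
        self.slots.append((self.free_end, len(payload)))
        return len(self.slots) - 1  # callers keep the slot number, not the offset

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        # Mark the slot empty, then move the surviving records together so that
        # the free space stays contiguous. Slot numbers do not change.
        self.slots[slot] = None
        live = [(i, self.read(i)) for i, s in enumerate(self.slots) if s is not None]
        self.free_end = len(self.data)
        for i, payload in live:
            self.free_end -= len(payload)
            self.data[self.free_end:self.free_end + len(payload)] = payload
            self.slots[i] = (self.free_end, len(payload))


page = SlottedPage()
s0 = page.insert(b"Perryridge")
s1 = page.insert(b"Round Hill")
s2 = page.insert(b"Mianus")

offset_before = page.slots[s2][0]
page.delete(s1)
offset_after = page.slots[s2][0]

print(offset_before, offset_after)  # 38 48
print(page.read(s2))                # b'Mianus'
```

The record in slot 2 moved inside the block when slot 1 was deleted, yet reading it through its slot number still works, which is why outside pointers refer to header entries instead of byte positions.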
2. Fixed-length representation
Another way to implement variable-length records is to use one or more fixed-
length records to represent one variable-length record.
There are two techniques for implementing files of variable-length records using fixed-
length records:
o Reserved space - uses fixed-length records large enough to accommodate
the largest variable-length record. (Unused space filled with end-of-record
symbol.)
o Pointers - represent by a list of fixed-length records, chained together.
The reserved space method requires the selection of some maximum record
length. (as shown below)
If most records are of near-maximum length this method is useful. Otherwise, space
is wasted. Then the pointer method may be used (see figure below). Its disadvantage is that
space is wasted in successive records in a chain as non-repeating fields are still present.
To overcome this disadvantage, we can split records into two blocks (See Figure below)
o Anchor block - contains first records of a chain
o Overflow block - contains records other than first in the chain.
Now all records in a block have the same length, and there is no wasted space.
Organization of Records in Files
So far, we have studied how records are represented in a file structure. The next
question is how to organize them in a file. Several of the possible ways of organizing records
in files are:
Heap file organization. Any record can be placed anywhere in the file where there
is space for the record. There is no ordering of records.
Sequential file organization. Records are stored in sequential order, according to the
value of a “search key” of each record.
Hashing file organization. A hash function is computed on some attribute of each
record. The result of the hash function specifies in which block of the file the record
should be placed.
Clustering file organization. Related records of different relations are stored in
the same block, so that one I/O operation fetches related records from all the relations.
Sequential File Organization
A sequential file is designed for efficient processing of records in sorted order based
on some search key. A search key is any attribute or set of attributes. Records are chained
together by pointers to permit fast retrieval in search key order. The pointer in each record
points to the next record in search-key order. Furthermore, to minimize the number of block
accesses in sequential file processing, we store records physically in search-key order, or as
close to search-key order as possible. Figure below shows an example, with bname as the
search key.
It is difficult to maintain physical sequential order as records are inserted and
deleted, since it is costly to move many records as a result of a single insertion or deletion.
Deletion can be managed with the pointer chains. For insertion, we first find the record that
comes before the new record in search-key order. If there is a free record (that is, space
left after a deletion) within the same block, we insert the new record there. Otherwise, we
insert the new record in an overflow block. In either case, we adjust the pointers to chain
together the records in search-key order.
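The insertion rule can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the class name SequentialFile and its methods are hypothetical, a slot is a [key, next_slot] pair, and only the overflow case is modelled (a new record always goes to the end of the slot list, which stands in for an overflow block).

```python
class SequentialFile:
    """Toy sequential file: slots hold [key, next_slot] pairs chained in key order."""

    def __init__(self, sorted_keys):
        # Initially the records are stored physically in search-key order.
        self.slots = [[k, i + 1] for i, k in enumerate(sorted_keys)]
        if self.slots:
            self.slots[-1][1] = None
        self.head = 0 if self.slots else None

    def insert(self, key):
        # Find the record that precedes the new one in search-key order.
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] <= key:
            prev, cur = cur, self.slots[cur][1]
        # Place the new record in the overflow area and fix the pointers.
        new_slot = len(self.slots)
        self.slots.append([key, cur])
        if prev is None:
            self.head = new_slot
        else:
            self.slots[prev][1] = new_slot

    def scan(self):
        """Return keys in search-key order by following the pointer chain."""
        out, cur = [], self.head
        while cur is not None:
            out.append(self.slots[cur][0])
            cur = self.slots[cur][1]
        return out
```

After several insertions the physical order of the slots no longer matches the search-key order, although the pointer chain still does.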
Lec. 6 Indexing and Hashing
Basic Concepts
An index in a book is an alphabetically arranged list of terms, given at the end of the
book, with the page numbers on which the terms can be found. Database-system indices play the
same role as book indices. For example, to retrieve a student record given an ID,
the database system would look up an index to find on which disk block the corresponding
record resides, and then fetch the disk block, to get the appropriate student record. An
attribute or set of attributes used to look up records in a file is called a search key.
There are two basic kinds of indices:
Ordered indices. Based on a sorted ordering of the values.
Hash indices. Based on a uniform distribution of values across a range of buckets.
The bucket to which a value is assigned is determined by a hash function.
No one technique is the best. Each technique is best suited to particular database
applications. Each technique must be evaluated on the basis of these factors:
Access types: The types of access that are supported efficiently.
Access time: The time it takes to find a particular data item.
Insertion time: The time it takes to insert a new data item.
Deletion time: The time it takes to delete a data item.
Space overhead: The additional space occupied by an index structure.
Ordered Indices
A file may have several indices, on different search keys. If the file containing the
records is sequentially ordered, a primary index is an index whose search key defines the
sequential order of the file. Primary indices are also called clustering indices. On the other
hand, indices whose search key specifies an order different from the sequential order of the
file are called secondary indices, or nonclustering indices. There are two types of ordered
indices that we can use:
Dense index: An index record appears for every search-key value in the file. In a
dense index, the index record contains the search-key value and a pointer to the first
data record with that search-key value. The rest of the records with the same
search-key value are stored sequentially after the first record.
Sparse index: An index record appears for only some of the search-key values. To
locate a record using a sparse index, we find the index entry with the largest
search-key value that is less than or equal to the search-key value for which we are
looking. We start at the record pointed to by that index entry, and follow the pointers
in the file until we find the desired record.
The two figures below show dense and sparse indices, respectively, for the account file.
Suppose that we are looking up records for the Perryridge branch. Using the dense index of
Figure A, we follow the pointer directly to the first Perryridge record. We process this record,
and follow the pointer in that record to locate the next record in search-key (branch-name)
order. We continue processing records until we encounter a record for a branch other than
Perryridge. If we are using the sparse index (Figure B), we do not find an index entry for
“Perryridge.” Since the last entry (in alphabetic order) before “Perryridge” is “Mianus,” we
follow that pointer. We then read the account file in sequential order until we find the first
Perryridge record, and begin processing at that point.
It is generally faster to locate a record with a dense index than with a sparse index,
because the dense index points directly to the first matching record. However, sparse indices
require less space and impose less maintenance overhead for insertions and deletions.
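The two lookup procedures can be compared with a short Python sketch. The sample account file, the account numbers, the choice of which branches appear in the sparse index, and the function names are all assumptions made for illustration. List positions stand in for record pointers.

```python
from bisect import bisect_right

# Hypothetical account file, stored in branch-name order:
# (branch_name, account_number).
account_file = [
    ("Brighton", "A-217"),
    ("Downtown", "A-101"),
    ("Downtown", "A-110"),
    ("Mianus", "A-215"),
    ("Perryridge", "A-102"),
    ("Perryridge", "A-201"),
    ("Perryridge", "A-218"),
    ("Redwood", "A-222"),
    ("Round Hill", "A-305"),
]

# Dense index: one entry per distinct search-key value, pointing to the
# position of the first record with that value.
dense_index = {}
for pos, (branch, _) in enumerate(account_file):
    dense_index.setdefault(branch, pos)

# Sparse index: entries for only some values (an arbitrary choice here),
# kept sorted by search key.
sparse_index = [("Brighton", 0), ("Mianus", 3), ("Redwood", 7)]


def scan_from(start, branch):
    """Read the file sequentially from `start`, collecting records for `branch`."""
    result = []
    for name, acct in account_file[start:]:
        if name > branch:
            break  # past the last possible match, since the file is sorted
        if name == branch:
            result.append(acct)
    return result


def lookup_dense(branch):
    start = dense_index.get(branch)
    return [] if start is None else scan_from(start, branch)


def lookup_sparse(branch):
    keys = [k for k, _ in sparse_index]
    i = bisect_right(keys, branch) - 1  # largest index key <= branch
    if i < 0:
        return []
    return scan_from(sparse_index[i][1], branch)
```

For "Perryridge", lookup_dense starts directly at the first Perryridge record, while lookup_sparse starts at the "Mianus" entry and reads forward. Both return the same records.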
Multilevel Indices
Even if we use a sparse index, the index itself may become too large for efficient
processing. Suppose we have a file with 100,000 records, with 10 records stored in each block.
If we have one index record per block, the index has 10,000 records. Index records are smaller
than data records, so let us assume that 100 index records fit on a block. Thus, we need 100
blocks just for the index. Such large indices are stored as sequential files on disk. If an index is
sufficiently small to be kept in main memory, the search time to find an entry is low.
However, if the index is so large that it must be kept on disk, a search for an entry requires
several disk block reads.
To deal with this problem, we treat the index just as we would treat any other
sequential file, and construct a sparse index on the primary index, as in the figure below.
To locate a record, we first use binary search on the outer index to find the record for
the largest search-key value less than or equal to the one that we desire. The pointer points to
a block of the inner index. We scan this block until we find the record that has the largest
search-key value less than or equal to the one that we desire. The pointer in this record points
to the block of the file that contains the record we are looking for.
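The two-level lookup, together with the block counts from the example above, can be sketched in Python. The integer keys 0 to 99,999 and all names are assumptions for illustration. For brevity the sketch uses binary search inside the inner-index block as well, where the text scans the block.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 10         # data records per block (from the example)
INDEX_ENTRIES_PER_BLOCK = 100  # index records per block (from the example)
NUM_RECORDS = 100_000

# Data file: hypothetical integer keys in sorted order, split into blocks.
data_blocks = [
    list(range(start, start + RECORDS_PER_BLOCK))
    for start in range(0, NUM_RECORDS, RECORDS_PER_BLOCK)
]

# Inner (sparse) index: one entry per data block,
# (first key in the block, data block number).
inner_entries = [(blk[0], n) for n, blk in enumerate(data_blocks)]
inner_blocks = [
    inner_entries[i:i + INDEX_ENTRIES_PER_BLOCK]
    for i in range(0, len(inner_entries), INDEX_ENTRIES_PER_BLOCK)
]

# Outer index: one entry per inner-index block,
# (first key in that block, inner block number).
outer_index = [(blk[0][0], n) for n, blk in enumerate(inner_blocks)]


def find_le(entries, key):
    """Return the entry with the largest search-key value <= key."""
    i = bisect_right([k for k, _ in entries], key) - 1
    if i < 0:
        raise KeyError(key)
    return entries[i]


def lookup(key):
    """Return (inner block number, data block number) visited for `key`."""
    _, inner_no = find_le(outer_index, key)            # search the outer index
    _, data_no = find_le(inner_blocks[inner_no], key)  # search one inner block
    if key not in data_blocks[data_no]:
        raise KeyError(key)
    return inner_no, data_no
```

With these numbers the data file occupies 10,000 blocks, the inner index 100 blocks, and the outer index has 100 entries, so it fits in a single block. A lookup touches one inner-index block and one data block.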
Hash File Organization
In a hash file organization, we obtain the address of the disk block containing
a desired record directly by computing a function on the search-key value of the record.
The term bucket is used to denote a unit of storage that can store one or more records.
A bucket is typically a disk block, but could be chosen to be smaller or larger than a disk
block. If K denotes the set of all search-key values, and B denotes the set of all bucket
addresses, then a hash function H is a function from K to B, which is used for lookup, insertion,
and deletion.
Records with different search-key values may be mapped to the same bucket; thus, the
entire bucket has to be searched sequentially to locate a record.
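A minimal Python sketch of a hash file organization follows. The number of buckets, the toy hash function (sum of the key's bytes modulo the number of buckets), and the record layout are assumptions for illustration only. A real system would choose a hash function that spreads keys more uniformly.

```python
NUM_BUCKETS = 8  # assumed number of buckets, for illustration only


def h(search_key):
    """Toy hash function from search-key values (K) to bucket addresses (B)."""
    return sum(search_key.encode("utf-8")) % NUM_BUCKETS


# Each bucket is a list of records; a record is (search_key, payload).
buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(record):
    buckets[h(record[0])].append(record)


def lookup(search_key):
    # Different keys may share a bucket, so the whole bucket is scanned
    # and each record's key is compared with the one we want.
    return [r for r in buckets[h(search_key)] if r[0] == search_key]


def delete(search_key):
    b = h(search_key)
    buckets[b] = [r for r in buckets[b] if r[0] != search_key]
```

The same function h is used for lookup, insertion, and deletion, so each operation goes directly to one bucket.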
Index Definition in SQL
We create an index by the create index command, which takes the form:
create index <index-name> on <relation-name> (<attribute-list>)
The attribute-list is the list of attributes of the relation that form the search key for
the index. To define an index named b-index on the branch relation with branch-name as the
search key, we write:
create index b-index on branch (branch-name)
If we wish to declare that the search key is a candidate key, we add the attribute
unique to the index definition. Thus, the command:
create unique index b-index on branch (branch-name)
declares branch-name to be a candidate key for branch. If branch-name is not a candidate key
at the time we enter the create unique index command, the system will display an
error message, and the attempt to create the index will fail. If the index creation attempt
succeeds, any subsequent attempt to insert a tuple that violates the key declaration will fail.
Note that the unique feature is redundant if the database system supports the unique
declaration of the SQL standard.
To drop an index, we use the drop index command, which takes the form:
drop index <index-name>
So if we want to drop the index b-index that we created, the command is:
drop index b-index
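These commands can be tried with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite is used only as a convenient test system, the branch table and its rows are made up, and the identifiers are written with underscores (b_index, branch_name) because a hyphen is not valid in an unquoted SQLite identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text, branch_city text)")
conn.execute("insert into branch values ('Perryridge', 'Horseneck')")

# Declare branch_name to be a candidate key by creating a unique index.
conn.execute("create unique index b_index on branch (branch_name)")

# A second tuple with the same branch_name violates the key declaration.
try:
    conn.execute("insert into branch values ('Perryridge', 'Brooklyn')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# List the indices known to SQLite before and after dropping b_index.
index_names = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'index'")]

conn.execute("drop index b_index")
remaining = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'index'")]
```

Here the duplicate insertion is rejected while the unique index exists, and the index no longer appears in the catalog after drop index.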
Multiple-Key Access
For certain types of queries, it is advantageous to use multiple indices if they exist.
There are two options: use several single-key indices together, or use one index built on
multiple keys.
1- Using Multiple Single-Key Indices
Assume that the account file has two indices: one for branch-name and one for balance.
Consider the following query: “Find all account numbers at the Perryridge branch with
balances equal to $1000.” We write:
select account-number
from account
where branch-name = 'Perryridge' and balance = 1000
There are three strategies possible for processing this query:
a) Use the index on branch-name to find all records related to the Perryridge branch,
then examine each such record to see whether balance = 1000.
b) Use the index on balance to find all records related to accounts with balances of
$1000, then examine each such record to see whether branch-name = “Perryridge.”
c) Use the index on branch-name to find pointers to all records related to the Perryridge
branch. Also, use the index on balance to find pointers to all records related to
accounts with a balance of $1000. Take the intersection of these two sets of pointers.
Those pointers that are in the intersection point to records related to both Perryridge
and accounts with a balance of $1000.
The third strategy is the only one of the three that takes advantage of the existence of
multiple indices.
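Strategy (c) can be sketched in Python, with sets of record ids standing in for sets of pointers. The sample records, the dictionary-based indices, and the function name strategy_c are assumptions for illustration.

```python
# Hypothetical account file: record id -> (account_number, branch_name, balance).
accounts = {
    0: ("A-101", "Downtown", 500),
    1: ("A-102", "Perryridge", 1000),
    2: ("A-201", "Perryridge", 900),
    3: ("A-215", "Mianus", 1000),
    4: ("A-218", "Perryridge", 1000),
}

# Two single-key indices, each mapping a key value to a set of record
# pointers (record ids here).
branch_index, balance_index = {}, {}
for rid, (_, branch, balance) in accounts.items():
    branch_index.setdefault(branch, set()).add(rid)
    balance_index.setdefault(balance, set()).add(rid)


def strategy_c(branch, balance):
    """Intersect the two pointer sets, then fetch only the matching records."""
    pointers = branch_index.get(branch, set()) & balance_index.get(balance, set())
    return sorted(accounts[rid][0] for rid in pointers)
```

Only the records whose pointers survive the intersection are fetched from the file. Strategies (a) and (b) would each fetch every record matching one condition and then test the other condition on each.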
2- Indices on Multiple Keys
The strategy for this case is to create and use an index on a composite search key made up of
multiple attributes, here (branch-name, balance). The index on the combined search key will
fetch only records that satisfy both conditions.
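An index on the combined search key can be sketched in Python as a list sorted on (branch_name, balance) pairs and searched with binary search. The sample records and names are assumptions for illustration. A real system would typically use a disk-based ordered structure rather than an in-memory list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical account records: (account_number, branch_name, balance).
accounts = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 1000),
    ("A-201", "Perryridge", 900),
    ("A-215", "Mianus", 1000),
    ("A-218", "Perryridge", 1000),
]

# Composite index: entries sorted on the combined search key
# (branch_name, balance), each paired with the account number.
composite_index = sorted(
    ((branch, balance), acct) for acct, branch, balance in accounts
)
keys = [k for k, _ in composite_index]


def lookup(branch, balance):
    """Return account numbers whose (branch_name, balance) equals the given pair."""
    lo = bisect_left(keys, (branch, balance))
    hi = bisect_right(keys, (branch, balance))
    return [acct for _, acct in composite_index[lo:hi]]
```

A single search on the combined key finds exactly the matching entries, so no pointer sets need to be intersected.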