

Database System Second Year

Bahaa Dhiaa

Control & Systems Eng.

Computer Eng. Branch

Lec. 1

Introduction

What is a Database?

A database (DB) is a collection of information organized in such a way that a computer program can quickly select desired pieces of data. You can think of a database as an electronic filing system.

Traditional databases are organized by fields, records, and files. A field is a single piece of

information, a record is one complete set of fields, and a file is a collection of records. For

example, a telephone book is analogous to a file. It contains a list of records, each of which

consists of three fields: name, address, and telephone number.

To access information from a database, you need a database management system (DBMS).

This is a collection of programs that enables you to enter, organize, and select data in a

database. The major components of a database system are:

1. Data

2. Hardware

3. Software

4. Users

Data:

The term data means groups of information that represent the qualitative or quantitative

attributes of a variable or set of variables. Data (plural of "datum", which is seldom used) are

typically the results of measurements and can be the basis of graphs, images, or observations

of a set of variables. Data are often viewed as the lowest level of abstraction from which

information and knowledge are derived.

Generally, data in the database are both integrated and shared:

Integrated: the database is a unification of several otherwise distinct data files, with any redundancy among those files either wholly or partly removed.


Shared: individual pieces of data in the DB can be shared among several different users, and the same piece of data can be accessed by several users at the same time (concurrent access).

Hardware:

The hardware consists of two basic components:

Mass storage media (secondary storage).

A processor with associated memory.

Software:

Between the physical DB itself and the users is software. This software is referred to as the

DataBase Management System (DBMS). The DBMS shields the DB users from hardware

level details.

The DBMS provides users with a view of the DB that is elevated above hardware level.

Users (Database Users):

There are three different types of database system users:

1. Application programmers, who are responsible for writing database application programs in some programming language such as COBOL, PL/I, C++, Java, or some higher-level "fourth-generation language".

2. End users, who access the database interactively. A given end user can access the database via one of the online applications or can use an interface provided as an integral part of the system.

3. The database administrator (DBA), who is responsible for database administration and for the associated (very important) data.

Database Administrator

The database administrator is a person having central control over data and programs

accessing that data. Duties of the database administrator include:

1. Schema definition.

2. Storage structure and access method definition.

3. Schema and physical organization modification.

4. Granting of authorization for data access.

5. Integrity constraint specification.


Why Database?

1. Compactness: There is no need for possibly voluminous paper files.

2. Speed: The machine can retrieve and update data far faster than a human can.

3. Less drudgery: Much of maintaining files by hand is eliminated.

4. Currency: Accurate, up-to-date information is available on demand at any time.

5. Protection: The data can be better protected against unintentional loss and unlawful

access.

Benefits of DB approach:

1. The data can be shared.

2. Redundancy can be reduced.

3. Inconsistency can be avoided.

4. Transaction support can be provided.

5. Integrity can be maintained.

6. Security can be enforced.

7. Conflicting requirements can be balanced.

8. Standards can be enforced.

Data Abstraction:

The major purpose of a database system is to provide users with an abstract view of the

system. The system hides certain details of how the data is stored, created, and maintained.

Complexity should be hidden from database users.

There are several levels of abstraction:

1. The internal level (also known as the storage level or physical level). It is the one

closest to physical storage. It is the one concerned with the way the data is stored

inside the system.

2. Conceptual Level (also known as logical level): Conceptual Level represents the

entire database. Conceptual schema describes the records and relationship included

in the Conceptual view. It also contains the method of deriving the objects in the

conceptual view from the objects in the internal view.

3. The external level (also known as the user level or view level) is the one closest to the users. It is the one concerned with the way the data is seen by individual users.

Mappings

In addition to the three levels of the architecture, there are certain mappings: one conceptual/internal mapping and several external/conceptual mappings. The conceptual/internal mapping defines the correspondence between the conceptual view and the stored database. An external/conceptual mapping defines the correspondence between a particular external view and the conceptual view.

Instances and schemas in database

Definition of schema: Design of a database is called the schema. Schema is of three types:

Physical schema, logical schema and view schema.

The design of a database at the physical level is called the physical schema. How the data is stored in blocks of storage is described at this level.

The design of a database at the logical level is called the logical schema. Programmers and database administrators work at this level. At this level, data is described as certain types of data records stored in data structures.

The design of a database at the view level is called the view schema. This generally describes end-user interaction with database systems.

Definition of instance: The data stored in database at a particular moment of time is called

instance of database. Database schema defines the variable declarations in tables that belong

to a particular database, but the value of these variables at a moment of time is called the

instance of that database.
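The difference between schema and instance can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in DBMS; the student table and its rows are hypothetical and not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the declared structure of the table. It stays the same
# while rows come and go.
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")

def instance(connection):
    """Return the rows stored in student at this moment (the current instance)."""
    return connection.execute(
        "SELECT name, age FROM student ORDER BY name").fetchall()

before = instance(conn)  # instance right after creation: no rows yet

conn.execute("INSERT INTO student VALUES ('Ali', 20)")
conn.execute("INSERT INTO student VALUES ('Sara', 21)")
after = instance(conn)   # a different instance under the same schema

# Column names as recorded in the schema (second field of each PRAGMA row).
schema = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
```

Here `before` and `after` are two instances of the same database, while `schema` is identical at both moments.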

Database languages:

A database system provides three different types of languages:

1. Data Definition Language (DDL)

It is a language that allows the users to define data and their relationship to other types of

data. It is mainly used to create files, databases, data dictionary and tables within databases.

It is also used to specify the structure of each table, set of associated values with each

attribute, integrity constraints, security and authorization information for each table and

physical storage structure of each table on disk.

The following table gives an overview about usage of DDL statements in SQL.

2. Data Manipulation Language (DML)

It is a language that provides a set of operations to support the basic data manipulation

operations on the data held in the databases. It allows users to insert, update, delete and

retrieve data from the database. The part of DML that involves data retrieval is called a

query language.

The following table gives an overview about the usage of DML statements in SQL:


3. Data Control Language (DCL)

DCL statements control access to data and the database using statements such as GRANT and REVOKE. A privilege can be granted to a user with the GRANT statement. A privilege that has been granted can also be revoked (taken back) with the REVOKE statement.

The following table gives an overview about the usage of DCL statements in SQL:

Database Management System (DBMS):

A DBMS is a set of software programs that controls the organization, storage, management,

and retrieval of data in a database. So when a user issues an access request using some

particular data sublanguage, the DBMS accepts that request and analyzes it. The DBMS

inspects the external schema for that user, the corresponding external/conceptual mapping,

the conceptual schema, the conceptual/internal mapping, and the stored database definition.

The DBMS executes the necessary operations on the stored database. Examples of systems that rely on a DBMS include computerized library systems, automated teller machines (ATM), flight reservation systems, and computerized parts inventory systems.


Database Manager

The database manager is a program module which provides the interface between the

low-level data stored in the database and the application programs and queries submitted

to the system. The database manager module is responsible for:

Interaction with the file manager: Data is stored on disk using the file system, which is usually provided by a conventional operating system. The database manager must translate DML statements into low-level file system commands (for storing, retrieving and updating data in the database).

Integrity enforcement: Checking that updates in the database do not violate

consistency constraints (e.g. no bank account balance below $0)

Security enforcement: Ensuring that users only have access to information they

are permitted to see

Backup and recovery: Detecting failures due to power failure, disk crash,

software errors, etc., and restoring the database to its state before the failure

Concurrency control: Preserving data consistency when there are concurrent

users.
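Integrity enforcement from the list above can be illustrated with the bank-balance rule. This is a minimal sketch, assuming Python's built-in sqlite3 module as the DBMS; the account table and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The consistency constraint "no balance below 0" is declared once,
# and the DBMS checks it on every update.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance REAL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.commit()

try:
    # Withdrawing 150 would leave -50, which violates the constraint.
    conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The failed update leaves the stored balance unchanged.
balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

The application does not have to test the rule itself; the violating update is refused by the database.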

File Manager and Disk Manager

Responsibility for the structure of the files and managing the file space goes to the

file manager. It is also responsible for locating the block containing the required record,

requesting this block from the disk manager, and transmitting the required record to the data

manager as shown. The file manager can be implemented using an interface to the existing

file subsystem provided by the operating system of the host computer or it can include a file

subsystem written especially for the DBMS. The disk manager, in contrast, is part of the operating system of the host computer, and it performs all physical input and output operations.

The disk manager transfers the block or page requested by the file manager so that the latter

need not be concerned with the physical characteristics of the underlying storage media.

Overall System Structure

Database systems are partitioned into modules for different functions. Some functions (e.g.

file systems) may be provided by the operating system. The components of the overall

database system structure include:


File manager manages allocation of disk space and data structures used to

represent information on disk.

Database manager: The interface between low-level data and application

programs and queries.

Query processor translates statements in a query language into low-level

instructions the database manager understands. (May also attempt to find an

equivalent but more efficient form.)

DML compiler converts DML statements embedded in an application program

to normal procedure calls in a host language. The compiler interacts with the

query processor.

DDL compiler converts DDL statements to a set of tables containing metadata

stored in a data dictionary.




Lec. 2 Data Models

Data Models

Data Model can be defined as an integrated collection of concepts for describing and

manipulating data, relationships between data, and constraints on the data in an organization.

It illustrates how the logical structure of a database is modeled, how data are connected to each other, and how they are processed and stored inside the system.

The purpose of a data model is to represent data and to make the data understandable.

Many data models have been proposed; they fall into three broad categories:

Object Based Data Models

Record Based Data Models

Physical Data Models

The object based and record based data models are used to describe data at the conceptual and external levels; the physical data model is used to describe data at the internal level.

Object based data models use concepts such as entities, attributes, and relationships. An

entity is a distinct object (a person, place, concept, or event) in the organization that is to be

represented in the database. An attribute is a property that describes some aspect of the object

that we wish to record, and a relationship is an association between entities. Some of the

more common types of object based data model are:

Entity-Relationship

Object Oriented

Semantic

Functional

Record based logical models, like object based models, describe data at the conceptual and view levels. These models specify the logical structure of the database in terms of records, fields, and attributes. There are three types of record based data models: hierarchical data models, network data models, and relational data models. The most widely used record based data model is the relational data model; the other two are not widely used.

Physical data models describe how data are stored in computer memory, how they are scattered and ordered in the memory, and how they are retrieved from memory. In other words, a physical data model represents the data at the data layer or internal layer.

Entity-Relationship Model (E-R Model)

It is a graphical technique used to convert the requirements of the system into a graphical representation so that they can be easily understood. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.

ER Model is based on Entities (and their attributes) and Relationships among entities.

Entity: An entity in an ER Model is a real-world entity having properties called

attributes. Every attribute is defined by its set of values called domain. For example,

in a school database, a student is considered as an entity. Student has various attributes

like name, age, class, etc.

Relationship: The logical association among entities is called relationship.

Relationships are mapped with entities in various ways. Mapping cardinalities define the number of associations between two entities. There are four types of mapping cardinality:

one to one

one to many

many to one

many to many
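As an illustration of one of these cardinalities, the sketch below stores a one-to-many relationship (one department, many employees) and counts the associations on each side. It assumes Python's built-in sqlite3 module and a foreign-key column as the representation; the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
# Each employee row points at exactly one department (the "one" side),
# while a department may be referenced by many employee rows.
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " department_id INTEGER REFERENCES department(id))")

conn.execute("INSERT INTO department VALUES (1, 'Sales'), (2, 'Research')")
conn.execute(
    "INSERT INTO employee VALUES (1, 'Ali', 1), (2, 'Sara', 1), (3, 'Omar', 2)")

# Number of employees associated with each department.
counts = conn.execute(
    "SELECT d.name, COUNT(e.id)"
    " FROM department d LEFT JOIN employee e ON e.department_id = d.id"
    " GROUP BY d.id, d.name ORDER BY d.name").fetchall()
```

Read from the employee side, the same relationship is many to one: several employees map to a single department.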


Relational Model

The most popular data model in DBMS is the Relational Model. It is a more scientific model than the others and is used widely around the world for data storage and processing. This model is simple and has all the properties and capabilities required to process data with storage efficiency. The concepts of this model are:

Tables − In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent attributes.

Tuple − A single row of a table, which contains a single record for that relation is

called a tuple.

Relation instance − A finite set of tuples in the relational database system represents

relation instance.

Relation schema − A relation schema describes the relation name (table name),

attributes, and their names.

Attribute domain − Every attribute has some pre-defined value scope, known as

attribute domain.


In a relational database, all data are stored and accessed via relations. Relations that store

data are called "base relations", and in implementations are called "tables". Other relations

do not store data, but are computed by applying operations to other relations. These relations

are sometimes called "derived relations". In implementations these are called "views" or

"queries".

Hierarchical Data Models

In this data model, the entities are represented in a hierarchical fashion. Here we

identify a parent entity, and its child entity. Again we drill down to identify next level of child

entity and so on. This model can be imagined as folders inside a folder.

It can also be imagined as a root-like structure. This model has only one main root. It then branches into sub-roots, each of which branches again. This type of model is best suited to one-to-many (1:N) relationships. For example: one company has multiple departments (1:N), one company has multiple suppliers (1:N), one department has multiple employees (1:N), and each department has multiple projects (1:N).

Network Data Models

This is an enhanced version of the hierarchical data model, designed to address the drawbacks of the hierarchical model. It helps to address M:N relationships. This data model is also represented as a hierarchy, but it does not have the single-parent concept: any child in the tree can have multiple parents.




Lec. 3 SQL

What is SQL?

SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to

communicate with a database. According to ANSI (American National Standards Institute),

it is the standard language for relational database management systems. SQL statements are

used to perform tasks such as update data on a database, or retrieve data from a database.

Some common relational database management systems that use SQL are: Oracle, Sybase,

Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most

of them also have their own additional proprietary extensions that are usually only used on

their system. However, the standard SQL commands such as "Select", "Insert", "Update",

"Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs

to do with a database. This tutorial will provide you with the instruction on the basics of each

of these commands as well as allow you to put them to practice using the SQL Interpreter.

Selecting Data

The select statement is used to query the database and retrieve selected data that match the

criteria that you specify. Here is the format of a simple select statement:

select "column1"

[,"column2",etc]

from "tablename"

[where "condition"];

[] = optional

The column names that follow the select keyword determine which columns will be returned in the results. You can select as many column names as you'd like, or you can use a "*" to select all columns.

The table name that follows the keyword from specifies the table that will be queried to

retrieve the desired results.

The where clause (optional) specifies which data values or rows will be returned or

displayed, based on the criteria described after the keyword where.

Conditional selections used in the where clause:

=     Equal
>     Greater than
<     Less than
>=    Greater than or equal
<=    Less than or equal
<>    Not equal to
LIKE  *See note below

The LIKE pattern matching operator can also be used in the conditional selection of the where clause. LIKE is a very powerful operator that allows you to select only rows that are "like" what you specify. The percent sign "%" can be used as a wildcard to match any sequence of characters (including none) that might appear before or after the characters specified. For example:

select first, last, city

from empinfo

where first LIKE 'Er%';

This SQL statement will match any first names that start with 'Er'. Strings must be in

single quotes.

Or you can specify,

select first, last

from empinfo

where last LIKE '%s';

This statement will match any last names that end in 's'.

select * from empinfo

where first = 'Eric';

This will only select rows where the first name equals 'Eric' exactly.

Sample Table: empinfo

first      last      id     age  city        state
John       Jones     99980  45   Payson      Arizona
Mary       Jones     99982  25   Payson      Arizona
Eric       Edwards   88232  32   San Diego   California
Mary Ann   Edwards   88233  32   Phoenix     Arizona
Ginger     Howell    98002  42   Cottonwood  Arizona
Sebastian  Smith     92001  23   Gila Bend   Arizona
Gus        Gray      22322  35   Bagdad      Arizona
Mary Ann   May       32326  52   Tucson      Arizona
Erica      Williams  32327  60   Show Low    Arizona
Leroy      Brown     32380  22   Pinetop     Arizona
Elroy      Cleaver   32382  22   Globe       Arizona

Run the following sample select statements against the empinfo table, writing down your expected results before you run each one.

select first, last, city from empinfo;

select last, city, age from empinfo

where age > 30;

select first, last, city, state from empinfo

where first LIKE 'J%';

select * from empinfo;

select first, last from empinfo

where last LIKE '%s';

select first, last, age from empinfo

where last LIKE '%illia%';

select * from empinfo where first = 'Eric';
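One way to check your expected results is to load the sample table into a scratch database and run the statements. The sketch below assumes Python's built-in sqlite3 module in place of the tutorial's SQL interpreter; the column types and the `query` helper are assumptions, and the order by clauses are added only to make the output order predictable. Note that in SQLite, LIKE ignores letter case for ASCII characters, which may differ from other systems.

```python
import sqlite3

# Rows copied from the empinfo sample table.
ROWS = [
    ("John", "Jones", 99980, 45, "Payson", "Arizona"),
    ("Mary", "Jones", 99982, 25, "Payson", "Arizona"),
    ("Eric", "Edwards", 88232, 32, "San Diego", "California"),
    ("Mary Ann", "Edwards", 88233, 32, "Phoenix", "Arizona"),
    ("Ginger", "Howell", 98002, 42, "Cottonwood", "Arizona"),
    ("Sebastian", "Smith", 92001, 23, "Gila Bend", "Arizona"),
    ("Gus", "Gray", 22322, 35, "Bagdad", "Arizona"),
    ("Mary Ann", "May", 32326, 52, "Tucson", "Arizona"),
    ("Erica", "Williams", 32327, 60, "Show Low", "Arizona"),
    ("Leroy", "Brown", 32380, 22, "Pinetop", "Arizona"),
    ("Elroy", "Cleaver", 32382, 22, "Globe", "Arizona"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table empinfo"
    " (first text, last text, id integer, age integer, city text, state text)")
conn.executemany("insert into empinfo values (?, ?, ?, ?, ?, ?)", ROWS)

def query(sql):
    """Run one select statement and return all result rows."""
    return conn.execute(sql).fetchall()

starts_with_er = query(
    "select first from empinfo where first LIKE 'Er%' order by age")
ends_in_s = query("select first, last from empinfo where last LIKE '%s'")
over_30 = query("select last, city, age from empinfo where age > 30")
exact_eric = query("select first, last from empinfo where first = 'Eric'")
```

With this data, `starts_with_er` contains Eric and Erica, while `exact_eric` contains only Eric Edwards, which shows the difference between LIKE 'Er%' and = 'Eric'.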

Select statement exercises

Enter select statements to:

1. Display the first name and age for everyone that's in the table.

2. Display the first name, last name, and city for everyone that's not from Payson.

3. Display all columns for everyone that is over 40 years old.

4. Display the first and last names for everyone whose last name ends in "ay".

5. Display all columns for everyone whose first name equals "Mary".

6. Display all columns for everyone whose first name contains "Mary".

Creating Tables

The create table statement is used to create a new table. Here is the format of a simple create table statement:

create table "tablename"

("column1" "data type",

"column2" "data type",

"column3" "data type");

Format of create table if you were to use optional constraints:

create table "tablename"
("column1" "data type" [constraint],
"column2" "data type" [constraint],
"column3" "data type" [constraint]);

[ ] = optional

Note: You may have as many columns as you'd like, and the constraints are optional.

Example:

create table employee

(first varchar(15),

last varchar(20),

age number(3),

address varchar(30),

city varchar(20),

state varchar(20));

To create a new table, enter the keywords create table followed by the table name, followed by an open parenthesis, followed by the first column name, followed by the data type for that column, followed by any optional constraints, and followed by a closing parenthesis. It is important to make sure you use an open parenthesis before the first column definition, and a closing parenthesis after the end of the last column definition. Make sure you separate each column definition with a comma. All SQL statements should end with a ";".

The table and column names must start with a letter and can be followed by letters,

numbers, or underscores - not to exceed a total of 30 characters in length. Do not

use any SQL reserved keywords as names for tables or column names (such as

"select", "create", "insert", etc).

Data types specify what the type of data can be for that particular column. If a column called "Last_Name" is to be used to hold names, then that particular column should have a "varchar" (variable-length character) data type.

Here are the most common data types:

char(size)      Fixed-length character string. Size is specified in parentheses. Max 255 bytes.
varchar(size)   Variable-length character string. Max size is specified in parentheses.
number(size)    Number value with a max number of column digits specified in parentheses.
date            Date value.
number(size,d)  Number value with a maximum number of digits of "size" total, with a maximum number of "d" digits to the right of the decimal.

What are constraints? When tables are created, it is common for one or more columns to have constraints associated with them. A constraint is basically a rule associated with a column that the data entered into that column must follow. For example, a "unique" constraint specifies that no two records can have the same value in a particular column; they must all be unique. The other two most popular constraints are "not null", which specifies that a column can't be left blank, and "primary key". A "primary key" constraint defines a unique identification of each record (or row) in a table. Constraints are not covered further in this introductory tutorial.
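The three constraints named above can be tried out in a scratch database. This is a minimal sketch, assuming Python's built-in sqlite3 module as the DBMS; the member table, its columns, and the `violates` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table member ("
    " id    integer primary key,"
    " email varchar(40) unique,"
    " name  varchar(30) not null)")
conn.execute("insert into member values (1, 'a@example.com', 'Ann')")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Same primary key as an existing row.
duplicate_key = violates(
    "insert into member values (1, 'b@example.com', 'Bob')")
# Same value in the unique column as an existing row.
duplicate_email = violates(
    "insert into member values (2, 'a@example.com', 'Cat')")
# No value for the not null column.
missing_name = violates(
    "insert into member values (3, 'c@example.com', NULL)")
# A row that satisfies all three constraints.
valid_row = violates(
    "insert into member values (4, 'd@example.com', 'Dan')")
```

Only the last insert is accepted; the other three are refused without changing the table.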

It's now time for you to design and create your own table. You will use this table throughout the rest of the tutorial. If you decide to change or redesign the table, you can either drop it and recreate it or you can create a completely different one. The SQL statement drop will be covered later.

Create Table Exercise

You have just started a new company. It is time to hire some employees. You will need to create a table that will contain the following information about your new employees: firstname, lastname, title, age, and salary. After you create the table, you should receive a small form on the screen with the appropriate column names. If you are missing any columns, you need to double check your SQL statement and recreate the table. Once it's created successfully, go to the "Insert" lesson.

Your create statement should resemble:

create table myemployees_ts0211
(firstname varchar(30),
lastname varchar(30),
title varchar(30),
age number(2),
salary number(8,2));

Inserting into a Table

The insert statement is used to insert or add a row of data into the table.

To insert records into a table, enter the keywords insert into followed by the table name, followed by an open parenthesis, followed by a list of column names separated by commas, followed by a closing parenthesis, followed by the keyword values, followed by the list of values enclosed in parentheses. The values that you enter will be held in the rows and they will match up with the column names that you specify. Strings should be enclosed in single quotes, and numbers should not.

insert into "tablename"

(first_column,...last_column)

values (first_value,...last_value);

In the example below, the column name first will match up with the value 'Luke', and

the column name state will match up with the value 'Georgia'.

Example:

insert into employee

(first, last, age, address, city, state)

values ('Luke', 'Duke', 45, '2130 Boars Nest',

'Hazard Co', 'Georgia');


Note: All strings should be enclosed between single quotes: 'string'
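The create table and insert examples can be run end to end as follows. This is a sketch that assumes Python's built-in sqlite3 module in place of the tutorial's SQL interpreter; SQLite accepts the type names used in the tutorial (such as number(3)) but does not enforce the declared sizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The employee table from the create table example.
conn.execute(
    "create table employee"
    " (first varchar(15),"
    "  last varchar(20),"
    "  age number(3),"
    "  address varchar(30),"
    "  city varchar(20),"
    "  state varchar(20))")

# The insert example: each value lines up with the column listed
# in the same position.
conn.execute(
    "insert into employee"
    " (first, last, age, address, city, state)"
    " values ('Luke', 'Duke', 45, '2130 Boars Nest',"
    "         'Hazard Co', 'Georgia')")

row = conn.execute("select first, state, age from employee").fetchone()
```

The selected row pairs 'Luke' with the first column and 'Georgia' with the state column, as described above.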

Insert statement exercises

It is time to insert data into your new employee table.

Your first three employees are the following:

Jonie Weber, Secretary, 28, 19500.00

Potsy Weber, Programmer, 32, 45300.00

Dirk Smith, Programmer II, 45, 75020.00

Enter these employees into your table first, and then insert at least 5 more of your own list

of employees in the table.

After they're inserted into the table, enter select statements to:

1. Select all columns for everyone in your employee table.

2. Select all columns for everyone with a salary over 30000.

3. Select first and last names for everyone that's under 30 years old.

4. Select first name, last name, and salary for anyone with "Programmer" in their

title.

5. Select all columns for everyone whose last name contains "ebe".

6. Select the first name for everyone whose first name equals "Potsy".

7. Select all columns for everyone over 80 years old.

8. Select all columns for everyone whose last name ends in "ith".

Create at least 5 of your own select statements based on specific information that you'd like to retrieve.

Updating Records

The update statement is used to update or change records that match specified criteria.

This is accomplished by carefully constructing a where clause.

update "tablename"

set "columnname" =

"newvalue"

[,"nextcolumn" =

"newvalue2"...]

where "columnname"

OPERATOR "value"

[and|or "column"

OPERATOR "value"];

[] = optional

Examples:

update phone_book

set area_code = 623

where prefix = 979;


update phone_book

set last_name = 'Smith', prefix=555, suffix=9292

where last_name = 'Jones';

update employee

set age = age+1

where first_name='Mary' and last_name='Williams';
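The effect of the where clause on an update can be checked by counting the rows that change. The sketch below assumes Python's built-in sqlite3 module; the three employee rows are hypothetical and chosen so that only one of them matches both conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee (first_name text, last_name text, age integer)")
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [("Mary", "Williams", 30), ("Mary", "Jones", 25), ("Eric", "Williams", 40)])

# Only rows that satisfy both conditions are changed.
cursor = conn.execute(
    "update employee set age = age+1"
    " where first_name='Mary' and last_name='Williams'")
changed = cursor.rowcount  # number of rows the update modified

ages = conn.execute(
    "select first_name, last_name, age from employee order by age").fetchall()
```

Mary Jones and Eric Williams each match only one of the two conditions, so their ages stay the same.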

Update statement exercises

After each update, issue a select statement to verify your changes.

1. Jonie Weber just got married to Bob Williams. She has requested that her last

name be updated to Weber-Williams.

2. Dirk Smith's birthday is today, add 1 to his age.

3. All secretaries are now called "Administrative Assistant". Update all titles

accordingly.

4. Everyone that's making under 30000 is to receive a 3500 a year raise.

5. Everyone that's making over 33500 is to receive a 4500 a year raise.

6. All "Programmer II" titles are now promoted to "Programmer III".

7. All "Programmer" titles are now promoted to "Programmer II".

Create at least 5 of your own update statements and submit them.

Deleting Records

The delete statement is used to delete records or rows from the table.

delete from "tablename"

where "columnname"

OPERATOR "value"

[and|or "column"

OPERATOR "value"];

[ ] = optional

Examples:

delete from employee;

Note: if you leave off the where clause, all records will be deleted!

delete from employee

where lastname = 'May';

delete from employee

where firstname = 'Mike' or firstname = 'Eric';

To delete an entire record/row from a table, enter "delete from" followed by the table

name, followed by the where clause which contains the conditions to delete. If you leave

off the where clause, all records will be deleted.
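The three delete examples can be followed step by step by counting the rows left after each one. This sketch assumes Python's built-in sqlite3 module; the four employee rows and the `count_rows` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (firstname text, lastname text)")
conn.executemany(
    "insert into employee values (?, ?)",
    [("Mike", "Smith"), ("Eric", "Brown"), ("Anna", "May"), ("Sara", "Jones")])

def count_rows():
    """Return the number of rows currently in the employee table."""
    return conn.execute("select count(*) from employee").fetchall()[0][0]

# Removes only the row whose last name is May.
conn.execute("delete from employee where lastname = 'May'")
after_may = count_rows()

# Removes every row that matches either condition.
conn.execute(
    "delete from employee where firstname = 'Mike' or firstname = 'Eric'")
after_names = count_rows()

# No where clause: every remaining row is removed.
conn.execute("delete from employee")
after_all = count_rows()
```

The counts go from 4 to 3, then to 1, then to 0, which is why a delete without a where clause should be issued with care.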


Delete statement exercises

(Use the select statement to verify your deletes):

1. Jonie Weber-Williams just quit, remove her record from the table.

2. It's time for budget cuts. Remove all employees who are making over 70000 dollars.

Create at least two of your own delete statements, and then issue a command to delete all

records from the table.

Drop a Table

The drop table command is used to delete a table and all rows in the table.

To delete an entire table including all of its rows, issue the drop table command followed

by the tablename. drop table is different from deleting all of the records in the table.

Deleting all of the records in the table leaves the table including column and constraint

information. Dropping the table removes the table definition as well as all of its rows.

drop table "tablename"

Example:

drop table myemployees_ts0211;
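The difference between deleting all rows and dropping the table can be observed directly. This sketch assumes Python's built-in sqlite3 module; the `table_exists` helper is hypothetical and relies on sqlite_master, which is SQLite's own catalog table, so other systems expose this information differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_exists(name):
    """True if a table with this name is recorded in SQLite's catalog."""
    rows = conn.execute(
        "select count(*) from sqlite_master"
        " where type = 'table' and name = ?", (name,)).fetchall()
    return rows[0][0] == 1

conn.execute(
    "create table myemployees_ts0211"
    " (firstname varchar(30), salary number(8,2))")
conn.execute("insert into myemployees_ts0211 values ('Jonie', 19500.00)")

# delete removes the rows but keeps the table definition.
conn.execute("delete from myemployees_ts0211")
exists_after_delete = table_exists("myemployees_ts0211")
rows_after_delete = conn.execute(
    "select count(*) from myemployees_ts0211").fetchall()[0][0]

# drop table removes the definition as well.
conn.execute("drop table myemployees_ts0211")
exists_after_drop = table_exists("myemployees_ts0211")
```

After the delete the table is empty but can still be queried; after the drop, any statement that names the table fails.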

SELECT Statement

The SELECT statement is used to query the database and retrieve selected data that match

the criteria that you specify.

The SELECT statement has five main clauses to choose from, although FROM is the only required clause. Each of the clauses has a vast selection of options, parameters, etc. The clauses will be listed below, but each of them will be covered in more detail later in the tutorial.

Here is the format of the SELECT statement:

SELECT [ALL | DISTINCT] column1[,column2]

FROM table1[,table2]

[WHERE "conditions"]

[GROUP BY "column-list"]

[HAVING "conditions"]

[ORDER BY "column-list" [ASC | DESC] ]

Example:

SELECT name, title, dept

FROM employee

WHERE title LIKE 'Pro%';

The above statement will select all of the rows/values in the name, title, and dept columns

from the employee table whose title starts with 'Pro'. This may return job titles including

Programmer or Pro-wrestler.

ALL and DISTINCT are keywords used to select either all records (ALL, the default) or only
the distinct, that is unique, records in your query results. If you would like to retrieve just
the unique records in specified columns, use the DISTINCT keyword. DISTINCT discards
duplicate records for the columns you specify after the SELECT keyword. For example:

SELECT DISTINCT age

FROM employee_info;

This statement will return all of the unique ages in the employee_info table.

ALL will display "all" of the specified columns including all of the duplicates. The ALL

keyword is the default if nothing is specified.
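
The difference between ALL and DISTINCT, and the behaviour of LIKE 'Pro%', can be tried out as follows. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_info (name TEXT, title TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employee_info VALUES (?, ?, ?)",
    [
        ("Ann", "Programmer", 30),
        ("Bob", "Project Lead", 30),
        ("Cy", "Sales", 41),
        ("Di", "Programmer", 41),
    ],
)

# ALL (the default) keeps duplicates; DISTINCT returns each age once.
all_ages = [r[0] for r in conn.execute(
    "SELECT ALL age FROM employee_info ORDER BY age")]
distinct_ages = [r[0] for r in conn.execute(
    "SELECT DISTINCT age FROM employee_info ORDER BY age")]

# LIKE 'Pro%' matches every title that starts with 'Pro'.
pro_names = [r[0] for r in conn.execute(
    "SELECT name FROM employee_info WHERE title LIKE 'Pro%' ORDER BY name")]

print(all_ages, distinct_ages, pro_names)
```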

Exercises

1. From the items_ordered table, select a list of all items purchased for customerid

10449. Display the customerid, item, and price for this customer.

2. Select all columns from the items_ordered table for whoever purchased a Tent.

3. Select the customerid, order_date, and item values from the items_ordered

table for any items in the item column that start with the letter "S".

4. Select the distinct items in the items_ordered table. In other words, display a listing of each of the unique items from the items_ordered table.

Aggregate Functions

MIN returns the smallest value in a given column

MAX returns the largest value in a given column

SUM returns the sum of the numeric values in a given column

AVG returns the average value of a given column

COUNT returns the number of non-NULL values in a given column

COUNT(*) returns the number of rows in a table

Aggregate functions compute a single value from a column of data returned by your SELECT
statement; in other words, they summarize the results of a particular column of selected
data. We are covering them here because they are needed for the next topic, "GROUP BY".
Although they are most often combined with the "GROUP BY" clause, these functions can
also be used without it. For example:

SELECT AVG(salary)

FROM employee;

This statement will return a single result which contains the average value of everything

returned in the salary column from the employee table.

Another example:

SELECT AVG(salary)
FROM employee
WHERE title = 'Programmer';

This statement will return the average salary for all employees whose title is equal to
'Programmer'.

Example:

SELECT Count(*)

FROM employees;

This particular statement is slightly different from the other aggregate functions since there

isn't a column supplied to the count function. This statement will return the number of rows

in the employees table.
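
To see the aggregate functions side by side, here is a minimal sketch assuming SQLite through Python's sqlite3 module. The rows are hypothetical; one salary is deliberately left NULL to show that COUNT(salary) and COUNT(*) can differ and that AVG ignores NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, title TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann", "Programmer", 60000),
        ("Bob", "Programmer", 50000),
        ("Cy", "Sales", 40000),
        ("Di", "Sales", None),  # salary not recorded
    ],
)

# One row comes back, holding one summary value per aggregate function.
summary = conn.execute(
    "SELECT MIN(salary), MAX(salary), SUM(salary), AVG(salary), "
    "COUNT(salary), COUNT(*) FROM employee"
).fetchone()

# A WHERE clause restricts the rows before they are summarized.
programmer_avg = conn.execute(
    "SELECT AVG(salary) FROM employee WHERE title = 'Programmer'"
).fetchone()[0]

print(summary, programmer_avg)
```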

Review Exercises

1. Select the maximum price of any item ordered in the items_ordered table. Hint:
Select the maximum price only.

2. Select the average price of all of the items ordered that were purchased in the
month of Dec.

3. What is the total number of rows in the items_ordered table?

4. For all of the tents that were ordered in the items_ordered table, what is the price of the lowest-priced tent? Hint: Your query should return the price only.

GROUP BY clause

The GROUP BY clause gathers together all of the rows that share the same value in the
specified column(s), so that aggregate functions can be computed once for each group
rather than once for the whole table. This is best explained by the example that follows the syntax.

GROUP BY clause syntax:

SELECT column1,

SUM(column2)

FROM "list-of-tables"

GROUP BY "column-list";

Let's say you would like to retrieve the highest salary paid in each dept:

SELECT max(salary), dept

FROM employee

GROUP BY dept;

This statement will select the maximum salary for the people in each unique department.

Basically, the salary for the person who makes the most in each department will be

displayed. Their salary and their department will be returned.

Multiple Grouping Columns - What if I wanted to display their lastname too?

For example, take a look at the items_ordered table. Let's say you want to group everything

of quantity 1 together, everything of quantity 2 together, everything of quantity 3 together,

etc. If you would like to determine what the largest cost item is for each grouped quantity

(all quantity 1's, all quantity 2's, all quantity 3's, etc.), you would enter:

SELECT quantity, max(price)

FROM items_ordered

GROUP BY quantity;

Enter the statement above, and take a look at the results to see if it returned what you were

expecting. Verify that the maximum price in each Quantity Group is really the maximum

price.
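
The grouping by quantity described above can be verified on a small table. This is a sketch assuming SQLite through Python's sqlite3 module; the items_ordered rows are hypothetical and only loosely follow the tutorial's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items_ordered (customerid INTEGER, item TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO items_ordered VALUES (?, ?, ?, ?)",
    [
        (1, "Raft", 1, 58.0),
        (2, "Skateboard", 1, 33.0),
        (3, "Tent", 2, 88.0),
        (4, "Lantern", 2, 16.0),
        (5, "Compass", 3, 8.0),
    ],
)

# One result row per distinct quantity, each with the largest price in that group.
max_price_per_quantity = conn.execute(
    "SELECT quantity, MAX(price) FROM items_ordered "
    "GROUP BY quantity ORDER BY quantity"
).fetchall()

print(max_price_per_quantity)
```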

Review Exercises

1. How many people are in each unique state in the customers table? Select the

state and display the number of people in each. Hint: count is used to count rows

in a column, sum works on numeric data only.

2. From the items_ordered table, select the item, maximum price, and minimum

price for each specific item in the table. Hint: The items will need to be broken

up into separate groups.

3. How many orders did each customer make? Use the items_ordered table. Select

the customerid, number of orders they made, and the sum of their orders.

HAVING clause

The HAVING clause allows you to specify conditions on the groups produced by GROUP BY - in
other words, which groups are returned depends on the conditions you specify, and those
conditions can use aggregate functions. The HAVING clause should follow the GROUP BY
clause if you are going to use it.

HAVING clause syntax:

SELECT column1,
SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list"
HAVING "condition";

HAVING can best be described by example. Let's say you have an employee table containing
the employee's name, department, salary, and age. If you would like to select the average
salary of the employees in each department, you could enter:

SELECT dept, avg(salary)

FROM employee

GROUP BY dept;

But, let's say that you want to calculate and display the average ONLY for departments
whose average salary is over 20000:

SELECT dept, avg(salary)

FROM employee

GROUP BY dept

HAVING avg(salary) > 20000;
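
HAVING filters whole groups after the averages are computed, whereas WHERE filters individual rows before grouping. The following sketch contrasts the two, assuming SQLite through Python's sqlite3 module and hypothetical employee rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann", "IT", 30000),
        ("Bob", "IT", 26000),
        ("Cy", "Sales", 15000),
        ("Di", "Sales", 21000),
    ],
)

# Average per department, no filtering.
grouped = conn.execute(
    "SELECT dept, AVG(salary) FROM employee GROUP BY dept ORDER BY dept"
).fetchall()

# HAVING: drop the groups whose average is not over 20000.
with_having = conn.execute(
    "SELECT dept, AVG(salary) FROM employee GROUP BY dept "
    "HAVING AVG(salary) > 20000 ORDER BY dept"
).fetchall()

# WHERE: drop individual rows first, then average what is left.
with_where = conn.execute(
    "SELECT dept, AVG(salary) FROM employee WHERE salary > 20000 "
    "GROUP BY dept ORDER BY dept"
).fetchall()

print(grouped, with_having, with_where)
```

In this sample the Sales department disappears under HAVING (its average is 18000) but survives under WHERE, where only the 21000 salary is averaged.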

Review Exercises (note: yes, they are similar to the group by exercises, but these
contain the HAVING clause requirements)

1. How many people are in each unique state in the customers table that have
more than one person in the state? Select the state and display the number of
people in each if it's greater than 1.

2. From the items_ordered table, select the item, maximum price, and minimum

price for each specific item in the table. Only display the results if the maximum

price for one of the items is greater than 190.00.

3. How many orders did each customer make? Use the items_ordered table. Select

the customerid, number of orders they made, and the sum of their orders if

they purchased more than 1 item.

ORDER BY clause

ORDER BY is an optional clause which will allow you to display the results of your query

in a sorted order (either ascending order or descending order) based on the columns that

you specify to order by.

ORDER BY clause syntax:

SELECT column1, SUM(column2)
FROM "list-of-tables"
ORDER BY "column-list" [ASC | DESC];

[ ] = optional

ASC = Ascending Order - default

DESC = Descending Order

For example:

SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'
ORDER BY salary;

This statement will select the employee_id, dept, name, age, and salary from the
employee_info table where the dept equals 'Sales' and will list the results in Ascending
(default) order based on their Salary.

If you would like to order based on multiple columns, you must separate the columns with
commas. For example:

SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'
ORDER BY salary, age DESC;
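
Note that ASC or DESC applies only to the column it follows, so in "ORDER BY salary, age DESC" the salary is still sorted in ascending order and age only breaks ties. The sketch below illustrates this, assuming SQLite through Python's sqlite3 module and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_info "
    "(employee_id INTEGER, dept TEXT, name TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee_info VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Sales", "Ann", 30, 40000),
        (2, "Sales", "Bob", 45, 40000),
        (3, "Sales", "Cy", 28, 35000),
        (4, "IT", "Di", 50, 90000),
    ],
)

# salary ascending (default); among equal salaries, the older employee first.
mixed_order = [r[0] for r in conn.execute(
    "SELECT name FROM employee_info WHERE dept = 'Sales' "
    "ORDER BY salary, age DESC")]

# To sort both columns in descending order, DESC must be written twice.
both_desc = [r[0] for r in conn.execute(
    "SELECT name FROM employee_info WHERE dept = 'Sales' "
    "ORDER BY salary DESC, age DESC")]

print(mixed_order, both_desc)
```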

Review Exercises

1. Select the lastname, firstname, and city for all customers in the customers

table. Display the results in Ascending Order based on the lastname.

2. Same thing as exercise #1, but display the results in Descending order.

3. Select the item and price for all of the items in the items_ordered table whose
price is greater than 10.00. Display the results in Ascending order based on the price.

Combining conditions and Boolean Operators

The AND operator can be used to join two or more conditions in the WHERE clause. Both

sides of the AND condition must be true in order for the condition to be met and for those

rows to be displayed.

SELECT column1,
SUM(column2)
FROM "list-of-tables"
WHERE "condition1" AND
"condition2";

The OR operator can also be used to join two or more conditions in the WHERE clause.
However, the condition is met, and hence the rows are displayed, if either side of the OR
operator is true or if both sides are true.

For example:

SELECT employeeid, firstname, lastname, title, salary

FROM employee_info

WHERE salary >= 50000.00 AND title = 'Programmer';

This statement will select the employeeid, firstname, lastname, title, and salary from the

employee_info table where the salary is greater than or equal to 50000.00 AND the title is

equal to 'Programmer'. Both of these conditions must be true in order for the rows to be

returned in the query. If either is false, then it will not be displayed.

Although they are not required, you can use parentheses around your conditional
expressions to make them easier to read:

SELECT employeeid, firstname, lastname, title, salary

FROM employee_info

WHERE (salary >= 50000.00) AND (title = 'Programmer');

Another Example:

SELECT firstname, lastname, title, salary

FROM employee_info

WHERE (title = 'Sales') OR (title = 'Programmer');

This statement will select the firstname, lastname, title, and salary from the employee_info

table where the title is either equal to 'Sales' OR the title is equal to 'Programmer'.
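
Parentheses become more than a readability aid once AND and OR are mixed in one WHERE clause, because AND is evaluated before OR. The following sketch shows the two readings, assuming SQLite through Python's sqlite3 module and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_info (firstname TEXT, title TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee_info VALUES (?, ?, ?)",
    [
        ("Ann", "Programmer", 60000),
        ("Bob", "Programmer", 40000),
        ("Cy", "Sales", 70000),
        ("Di", "Sales", 30000),
    ],
)

# Without parentheses AND binds first:
# title = 'Sales' OR (title = 'Programmer' AND salary >= 50000)
without_parens = [r[0] for r in conn.execute(
    "SELECT firstname FROM employee_info "
    "WHERE title = 'Sales' OR title = 'Programmer' AND salary >= 50000 "
    "ORDER BY firstname")]

# With parentheses the salary condition applies to both titles.
with_parens = [r[0] for r in conn.execute(
    "SELECT firstname FROM employee_info "
    "WHERE (title = 'Sales' OR title = 'Programmer') AND salary >= 50000 "
    "ORDER BY firstname")]

print(without_parens, with_parens)
```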

Review Exercises

1. Select the customerid, order_date, and item from the items_ordered table for
all items except 'Snow Shoes' and 'Ear Muffs'. Display the
rows as long as they are not either of these two items.

2. Select the item and price of all items that start with the letters 'S', 'P', or 'F'.

IN and BETWEEN Conditional Operators

SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 IN
(list-of-values);

SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 BETWEEN value1
AND value2;

The IN conditional operator is really a set membership test operator. That is, it is used to

test whether or not a value (stated before the keyword IN) is "in" the list of values provided

after the keyword IN.

For example:

SELECT employeeid, lastname, salary

FROM employee_info

WHERE lastname IN ('Hernandez', 'Jones', 'Roberts', 'Ruiz');

This statement will select the employeeid, lastname, salary from the employee_info table

where the lastname is equal to either: Hernandez, Jones, Roberts, or Ruiz. It will return the

rows if it is ANY of these values.

The IN conditional operator can be rewritten as a compound condition using the
equals operator combined with OR, with exactly the same results:

SELECT employeeid, lastname, salary
FROM employee_info
WHERE lastname = 'Hernandez' OR lastname = 'Jones' OR lastname = 'Roberts'
OR lastname = 'Ruiz';

As you can see, the IN operator is much shorter and easier to read when you are testing for

more than two or three values.

You can also use NOT IN to exclude the rows in your list.

The BETWEEN conditional operator is used to test to see whether or not a value (stated

before the keyword BETWEEN) is "between" the two values stated after the keyword

BETWEEN.

For example:

SELECT employeeid, age, lastname, salary

FROM employee_info

WHERE age BETWEEN 30 AND 40;

This statement will select the employeeid, age, lastname, and salary from the employee_info

table where the age is between 30 and 40 (including 30 and 40).

This statement can also be rewritten without the BETWEEN operator:

SELECT employeeid, age, lastname, salary

FROM employee_info

WHERE age >= 30 AND age <= 40;

You can also use NOT BETWEEN to exclude the values within your range.
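
The equivalences stated above (IN versus a chain of ORs, BETWEEN versus a pair of comparisons) can be checked directly. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the rows and the helper function ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_info (employeeid INTEGER, lastname TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employee_info VALUES (?, ?, ?)",
    [(1, "Hernandez", 30), (2, "Jones", 40), (3, "Smith", 35), (4, "Ruiz", 41), (5, "Lee", 29)],
)


def ids(where):
    """Return the employeeids matching a WHERE condition, in ascending order."""
    sql = "SELECT employeeid FROM employee_info WHERE " + where + " ORDER BY employeeid"
    return [r[0] for r in conn.execute(sql)]


in_ids = ids("lastname IN ('Hernandez', 'Jones', 'Roberts', 'Ruiz')")
or_ids = ids("lastname = 'Hernandez' OR lastname = 'Jones' "
             "OR lastname = 'Roberts' OR lastname = 'Ruiz'")

between_ids = ids("age BETWEEN 30 AND 40")       # both end points are included
compare_ids = ids("age >= 30 AND age <= 40")
not_between_ids = ids("age NOT BETWEEN 30 AND 40")

print(in_ids, between_ids, not_between_ids)
```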

Review Exercises

1. Select the date, item, and price from the items_ordered table for all of the rows

that have a price value ranging from 10.00 to 80.00.

2. Select the firstname, city, and state from the customers table for all of the rows

where the state value is either: Arizona, Washington, Oklahoma, Colorado, or Hawaii.

Mathematical Operators

Standard ANSI SQL-92 supports the first four of the following basic arithmetic operators:

+ addition

- subtraction

* multiplication

/ division

% Modulo

The modulo operator returns the integer remainder of a division. This operator is not
part of ANSI SQL; however, most databases support it. The following are some more
useful mathematical functions to be aware of, since you might need them. These functions
are not standard in the ANSI SQL-92 specs, so they may or may not be available on
the specific RDBMS that you are using. However, they were available on several major
database systems that I tested.

ABS(x)                  returns the absolute value of x
SIGN(x)                 returns the sign of input x as -1, 0, or 1 (negative, zero, or positive respectively)
MOD(x,y)                modulo - returns the integer remainder of x divided by y (same as x%y)
FLOOR(x)                returns the largest integer value that is less than or equal to x
CEILING(x) or CEIL(x)   returns the smallest integer value that is greater than or equal to x
POWER(x,y)              returns the value of x raised to the power of y
ROUND(x)                returns the value of x rounded to the nearest whole integer
ROUND(x,d)              returns the value of x rounded to the number of decimal places specified by the value d
SQRT(x)                 returns the square-root value of x

For example:

SELECT round(salary), firstname
FROM employee_info;

This statement will select the salary rounded to the nearest whole value and the firstname

from the employee_info table.

Review Exercises

1. Select the item and per unit price for each item in the items_ordered table. Hint: Divide the price by the quantity.

Table Joins, a must

All of the queries up until this point have been useful with the exception of one major

limitation - that is, you've been selecting from only one table at a time with your SELECT

statement. It is time to introduce you to one of the most beneficial features of SQL &

relational database systems - the "Join". To put it simply, the "Join" makes relational

database systems "relational".

Joins allow you to link data from two or more tables together into a single query result--

from one single SELECT statement.

A "Join" can be recognized in a SQL SELECT statement if it has more than one table after

the FROM keyword.

For example:

SELECT "list-of-columns"

FROM table1,table2

WHERE "search-condition(s)"

Joins can be explained more easily by demonstrating what would happen if you worked with one
table only and didn't have the ability to use "joins". This single-table database is also
sometimes referred to as a "flat table". Let's say you have a one-table database that is used
to keep track of all of your customers and what they purchase from your store:

id first last address city state zip date item price

Every time a new row is inserted into the table, all columns will be updated, thus
resulting in unnecessary "redundant data". For example, every time Wolfgang Schultz
purchases something, the following rows will be inserted into the table:

id     first     last     address         city  state  zip    date    item         price
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  032299  snowboard    45.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  082899  snow shovel  35.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  091199  gloves       15.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  100999  lantern      35.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  022900  tent         85.00

An ideal database would have two tables:

1. One for keeping track of your customers

2. And the other to keep track of what they purchase:

"Customer_info" table:

customer_number firstname lastname address city state zip

"Purchases" table:

customer_number date item price

Now, whenever a purchase is made by a repeat customer, only the 2nd table, "Purchases",
needs to be updated! We've just eliminated useless redundant data, that is, we've just
normalized this database!

Notice how each of the tables has a common "customer_number" column. This column,
which contains the unique customer number, will be used to JOIN the two tables. Using the
two new tables, let's say you would like to select the customer's name and the items they've
purchased. Here is an example of a join statement to accomplish this:

SELECT customer_info.firstname, customer_info.lastname, purchases.item

FROM customer_info, purchases

WHERE customer_info.customer_number = purchases.customer_number;

This particular "Join" is known as an "Inner Join" or "Equijoin". This is the most common

type of "Join" that you will see or use.

Notice that each of the columns is always preceded by the table name and a period. This
isn't always required; however, it IS good practice so that you won't confuse which columns
go with which tables. It is required if the column names are the same between the two
tables. I recommend preceding all of your columns with the table names when using joins.

Note: The syntax described above will work with most database systems, including
the one with this tutorial. However, in the event that this doesn't work with yours,
please check your specific database documentation.

Although the above will probably work, here is the ANSI SQL-92 syntax specification for
an Inner Join, applied to the preceding statement, that you might want to try:

SELECT customer_info.firstname, customer_info.lastname, purchases.item

FROM customer_info INNER JOIN purchases

ON customer_info.customer_number = purchases.customer_number;

Another example:

SELECT employee_info.employeeid, employee_info.lastname,

employee_sales.comission

FROM employee_info, employee_sales

WHERE employee_info.employeeid = employee_sales.employeeid;

This statement will select the employeeid, lastname (from the employee_info table), and the

comission value (from the employee_sales table) for all of the rows where the employeeid

in the employee_info table matches the employeeid in the employee_sales table.
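
The two join syntaxes shown earlier for the customer_info and purchases tables can be compared on a small data set. This is a sketch assuming SQLite through Python's sqlite3 module; the first customer and purchases follow the flat-table example, while the second customer and the compass purchase are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_info (customer_number INTEGER, firstname TEXT, lastname TEXT)"
)
conn.execute(
    "CREATE TABLE purchases (customer_number INTEGER, date TEXT, item TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO customer_info VALUES (?, ?, ?)",
    [(10982, "Wolfgang", "Schultz"), (10983, "Ada", "Lee")],
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (10982, "032299", "snowboard", 45.00),
        (10982, "082899", "snow shovel", 35.00),
        (10983, "091199", "compass", 8.00),
    ],
)

# Join written with a comma-separated FROM list and a WHERE condition.
implicit = conn.execute(
    "SELECT customer_info.firstname, customer_info.lastname, purchases.item "
    "FROM customer_info, purchases "
    "WHERE customer_info.customer_number = purchases.customer_number "
    "ORDER BY purchases.item"
).fetchall()

# The same join written with the ANSI SQL-92 INNER JOIN ... ON syntax.
explicit = conn.execute(
    "SELECT customer_info.firstname, customer_info.lastname, purchases.item "
    "FROM customer_info INNER JOIN purchases "
    "ON customer_info.customer_number = purchases.customer_number "
    "ORDER BY purchases.item"
).fetchall()

print(implicit)
```

The customer's name is stored once in customer_info, yet it appears next to every one of that customer's purchases in the result.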

Review Exercises

1. Write a query using a join to determine which items were ordered by each of

the customers in the customers table. Select the customerid, firstname,

lastname, order_date, item, and price for everything each customer purchased

in the items_ordered table.

2. Repeat exercise #1, however display the results sorted by state in descending order.

Lec.4

Integrity Constraints

Integrity constraints are used to ensure the accuracy and consistency of data in a relational
database. Many types of integrity constraints play a role in referential integrity (RI). Some of
these integrity constraints are primary key constraints, unique constraints, foreign key
constraints, NOT NULL constraints, check constraints, and triggers.

Primary Key Constraints

Primary key is the term used to identify one or more columns in a table that make a

row of data unique. Although the primary key typically consists of one column in a table,

more than one column can comprise the primary key. For example, either the employee’s

Social Security number or an assigned employee identification number is the logical primary

key for an employee table. The objective is for every record to have a unique primary key or

value for the employee’s identification number.

Because there is probably no need to have more than one record for each employee in an

employee table, the employee identification number makes a logical primary key. The

primary key is assigned at table creation.

The following example identifies the EMP_ID column as the PRIMARY KEY for the

EMPLOYEES table:

This method of defining a primary key is accomplished during table creation. The

primary key in this case is an implied constraint. You can also specify a primary key explicitly

as a constraint when setting up a table, as follows:

The primary key constraint in this example is defined after the column comma list in

the CREATE TABLE statement. A primary key that consists of more than one column can

be defined by either of the following methods:

Or:
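
As a minimal sketch of the forms described in this section (not the original listings), the following assumes SQLite through Python's sqlite3 module. EMP_ID and EMPLOYEES come from the text; EMP_NAME, EMPLOYEES_2 and the PRODUCTS table with its columns are hypothetical names. A third way of defining a multi-column key, ALTER TABLE ... ADD CONSTRAINT, is supported by many systems but not by SQLite, so it is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Primary key written on the column itself (the implied constraint).
conn.execute("""
    CREATE TABLE EMPLOYEES (
        EMP_ID   CHAR(9)     NOT NULL PRIMARY KEY,
        EMP_NAME VARCHAR(40) NOT NULL
    )
""")

# Primary key written as an explicit constraint after the column list.
conn.execute("""
    CREATE TABLE EMPLOYEES_2 (
        EMP_ID   CHAR(9)     NOT NULL,
        EMP_NAME VARCHAR(40) NOT NULL,
        PRIMARY KEY (EMP_ID)
    )
""")

# Primary key made up of more than one column.
conn.execute("""
    CREATE TABLE PRODUCTS (
        PROD_ID VARCHAR(10) NOT NULL,
        VEND_ID VARCHAR(10) NOT NULL,
        PRODUCT VARCHAR(30) NOT NULL,
        PRIMARY KEY (PROD_ID, VEND_ID)
    )
""")

conn.execute("INSERT INTO EMPLOYEES VALUES ('000000001', 'Ann')")
try:
    # A second row with the same EMP_ID violates the primary key.
    conn.execute("INSERT INTO EMPLOYEES VALUES ('000000001', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With a composite key only the combination of columns has to be unique.
conn.execute("INSERT INTO PRODUCTS VALUES ('P1', 'V1', 'widget')")
conn.execute("INSERT INTO PRODUCTS VALUES ('P1', 'V2', 'widget')")
product_rows = conn.execute("SELECT COUNT(*) FROM PRODUCTS").fetchone()[0]

print(duplicate_rejected, product_rows)
```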

Unique Constraints

A unique column constraint in a table is similar to a primary key in that the value in

that column for every row of data in the table must have a unique value. Although a primary

key constraint is placed on one column, you can place a unique constraint on another column

even though it is not actually for use as the primary key. Study the following example:

The primary key in this example is EMP_ID, meaning that the employee

identification number is the column that is used to ensure that every record in the table is

unique. The primary key is a column that is normally referenced in queries, particularly to

join tables. The column EMP_PHONE has been designated as a UNIQUE value, meaning

that no two employees can have the same telephone number. There is not a lot of difference

between the two, except that the primary key is used to provide an order to data in a table

and, in the same respect, join related tables.

Foreign Key Constraints

A foreign key is a column in a child table that references a primary key in the

parent table. A foreign key constraint is the main mechanism used to enforce referential

integrity between tables in a relational database. A column defined as a foreign key is used

to reference a column defined as a primary key in another table.

Study the creation of the foreign key in the following example:

The EMP_ID column in this example has been designated as the foreign key for the

EMPLOYEE_PAY_TBL table. This foreign key, as you can see, references the EMP_ID

column in the EMPLOYEE_TBL table. This foreign key ensures that for every EMP_ID in

the EMPLOYEE_PAY_TBL, there is a corresponding EMP_ID in the EMPLOYEE_TBL.

This is called a parent/child relationship. The parent table is the EMPLOYEE_TBL table,

and the child table is the EMPLOYEE_PAY_TBL table

In this figure, the EMP_ID column in the child table references the EMP_ID column

in the parent table. For a value to be inserted for EMP_ID in the child table, a value for

EMP_ID in the parent table must first exist. Likewise, for a value to be removed for EMP_ID

in the parent table, all corresponding values for EMP_ID must first be removed from the

child table. This is how referential integrity works.
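
The insert and delete rules just described can be observed in a small experiment. This is a sketch assuming SQLite through Python's sqlite3 module (SQLite enforces foreign keys only after "PRAGMA foreign_keys = ON"); the table names and EMP_ID follow the text, while EMP_NAME, PAY_RATE, the constraint name EMP_ID_FK and the helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite, off by default

conn.execute("""
    CREATE TABLE EMPLOYEE_TBL (
        EMP_ID   CHAR(9)     NOT NULL PRIMARY KEY,
        EMP_NAME VARCHAR(40) NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE EMPLOYEE_PAY_TBL (
        EMP_ID   CHAR(9) NOT NULL,
        PAY_RATE DECIMAL(4,2),
        CONSTRAINT EMP_ID_FK FOREIGN KEY (EMP_ID) REFERENCES EMPLOYEE_TBL (EMP_ID)
    )
""")


def rejected(sql):
    """Run a statement and report whether an integrity constraint refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A child row cannot be inserted before its parent exists.
orphan_insert_rejected = rejected(
    "INSERT INTO EMPLOYEE_PAY_TBL VALUES ('000000001', 15.50)")

conn.execute("INSERT INTO EMPLOYEE_TBL VALUES ('000000001', 'Ann')")
conn.execute("INSERT INTO EMPLOYEE_PAY_TBL VALUES ('000000001', 15.50)")

# A parent row cannot be removed while child rows still reference it.
parent_delete_rejected = rejected(
    "DELETE FROM EMPLOYEE_TBL WHERE EMP_ID = '000000001'")

# Once the child rows are gone, the parent row can be deleted.
conn.execute("DELETE FROM EMPLOYEE_PAY_TBL WHERE EMP_ID = '000000001'")
parent_delete_ok = not rejected(
    "DELETE FROM EMPLOYEE_TBL WHERE EMP_ID = '000000001'")

print(orphan_insert_rejected, parent_delete_rejected, parent_delete_ok)
```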

A foreign key can be added to a table using the ALTER TABLE command, as

shown in the following example:

NOT NULL Constraints

Previous examples use the keywords NULL and NOT NULL listed on the same line

as each column and after the data type. NOT NULL is a constraint that you can place on a

table’s column. This constraint disallows the entrance of NULL values into a column; in

other words, data is required in a NOT NULL column for each row of data in the table. NULL

is generally the default for a column if NOT NULL is not specified, allowing NULL values

in a column.

Check Constraints

Check (CHK) constraints can be utilized to check the validity of data entered into

particular table columns. Check constraints are used to provide back-end database edits,

although edits are commonly found in the front-end application as well. General edits restrict

values that can be entered into columns or objects, whether within the database itself or on a

front-end application. The check constraint is a way of providing another protective layer for

the data. The following example illustrates the use of a check constraint:

The check constraint in this table has been placed on the EMP_ZIP column, ensuring

that all employees entered into this table have a ZIP code of ‘46234’. Perhaps that is a little

restricting. Nevertheless, you can see how it works.

If you wanted to use a check constraint to verify that the ZIP code is within a list of

values, your constraint definition could look like the following:

If there is a minimum pay rate that can be designated for an employee, you could

have a constraint that looks like the following:
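
As a hedged illustration of the list-of-values and minimum-pay-rate checks described above (not the original listings), the sketch below assumes SQLite through Python's sqlite3 module. EMP_ZIP and the value '46234' come from the text; the table name, the other ZIP codes, the PAY_RATE column, the 12.50 threshold and the constraint names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE_CHECK_TST (
        EMP_ID   CHAR(9)      NOT NULL PRIMARY KEY,
        EMP_ZIP  CHAR(5)      NOT NULL,
        PAY_RATE DECIMAL(4,2) NOT NULL,
        CONSTRAINT CHK_EMP_ZIP CHECK (EMP_ZIP IN ('46234', '46227', '46745')),
        CONSTRAINT CHK_PAY CHECK (PAY_RATE > 12.50)
    )
""")


def rejected(row):
    """Try to insert one row and report whether a constraint refused it."""
    try:
        conn.execute("INSERT INTO EMPLOYEE_CHECK_TST VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


valid_row_rejected = rejected(("000000001", "46234", 15.00))
bad_zip_rejected = rejected(("000000002", "99999", 15.00))   # ZIP not in the list
low_pay_rejected = rejected(("000000003", "46227", 10.00))   # below the minimum
null_zip_rejected = rejected(("000000004", None, 15.00))     # NOT NULL column

print(valid_row_rejected, bad_zip_rejected, low_pay_rejected, null_zip_rejected)
```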

Dropping Constraints

Any constraint that you have defined can be dropped using the ALTER TABLE

command with the DROP CONSTRAINT option. For example, to drop the primary key

constraint in the EMPLOYEES table, you can use the following command:

Some implementations might provide shortcuts for dropping certain constraints. For

example, to drop the primary key constraint for a table in MySQL, you can use the following

command:

Triggers

A trigger is a compiled SQL procedure in the database used to perform actions based

on other actions that occur within the database. A trigger is a form of a stored procedure that

is executed when a specified DML action is performed on a table. The trigger can be executed

before or after an INSERT, DELETE, or UPDATE statement. Triggers can also be used to

check data integrity before an INSERT, DELETE, or UPDATE statement. Triggers can roll

back transactions, and they can modify data in one table and read from another table in

another database.

The basic syntax for Oracle is as follows:

The following is an example trigger:

The preceding example shows the creation of a trigger called EMP_PAY_TRIG. This

trigger inserts a row into the EMPLOYEE_PAY_HISTORY table, reflecting the changes

made every time a row of data is updated in the EMPLOYEE_PAY_TBL table.

The DROP TRIGGER Statement

A trigger can be dropped using the DROP TRIGGER statement. The syntax for dropping a
trigger is as follows:

The FOR EACH ROW syntax allows the developer to have the procedure fire for

each row that is affected by the SQL statement or once for the statement as a whole. The

syntax is as follows:

The difference is how many times the trigger is executed. If you create a regular

trigger and execute a statement against the table that affects 100 rows, the trigger is executed

once. If instead you create the trigger with the FOR EACH ROW syntax and execute the

statement again, the trigger is executed 100 times, once for each row that is affected by the

statement.
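
The row-level behaviour can be demonstrated with a trigger similar in spirit to EMP_PAY_TRIG. This is a sketch in SQLite syntax run through Python's sqlite3 module, not the Oracle syntax referred to above; SQLite supports only FOR EACH ROW triggers, so the statement-level case cannot be shown here. The column names of both tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE_PAY_TBL (EMP_ID CHAR(9) PRIMARY KEY, PAY_RATE REAL)")
conn.execute(
    "CREATE TABLE EMPLOYEE_PAY_HISTORY (EMP_ID CHAR(9), PREV_PAY_RATE REAL, PAY_RATE REAL)"
)

# After every updated row, record the old and the new pay rate.
conn.execute("""
    CREATE TRIGGER EMP_PAY_TRIG
    AFTER UPDATE ON EMPLOYEE_PAY_TBL
    FOR EACH ROW
    BEGIN
        INSERT INTO EMPLOYEE_PAY_HISTORY (EMP_ID, PREV_PAY_RATE, PAY_RATE)
        VALUES (NEW.EMP_ID, OLD.PAY_RATE, NEW.PAY_RATE);
    END
""")

conn.executemany(
    "INSERT INTO EMPLOYEE_PAY_TBL VALUES (?, ?)",
    [("000000001", 10.0), ("000000002", 12.0), ("000000003", 15.0)],
)

# One UPDATE statement touches three rows, so the trigger fires three times.
conn.execute("UPDATE EMPLOYEE_PAY_TBL SET PAY_RATE = PAY_RATE + 1")
history_rows = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE_PAY_HISTORY").fetchone()[0]

# After DROP TRIGGER, further updates no longer add history rows.
conn.execute("DROP TRIGGER EMP_PAY_TRIG")
conn.execute("UPDATE EMPLOYEE_PAY_TBL SET PAY_RATE = PAY_RATE + 1")
history_rows_after_drop = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE_PAY_HISTORY").fetchone()[0]

print(history_rows, history_rows_after_drop)
```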

Example1: Create an SQL trigger that records, in a separate table, the ID and the registration
date of every employee who is registered in the company table.

Employee (ID, name, age, address, salary)

Example2: For the following tables:

Employee (name, ID, salary, Dept#, Supervisor)

Department (Dept_Name, Dept#, total_sal, manager)

The total_sal attribute is a derived attribute. Its value is the sum of the salaries of all
employees in a particular department. Create SQL triggers so that the value of total_sal is
always correct, whatever event occurs.

HW: In a library database which contains the following tables, the NoBrwBooks table is

created to indicate the number of borrowed books of each section (category). Write SQL

triggers to do that. Note: status attribute data are either borrowed or available.

Books (ID, Title, Author, Status, Section)

NoBrwBooks (Total_brwBooks, Section)

Lec.5

File Organization

A database is mapped into a number of different files that are maintained by the

underlying operating system. These files reside permanently on disks. A file is organized

logically as a sequence of records. These records are mapped onto disk blocks. Blocks are of

a fixed size determined by the operating system, but record sizes vary.

One approach to mapping a database to files is to use several files, and to store records
of only one fixed length in any given file. An alternative is to structure files so that they can
accommodate records of multiple lengths. Files of fixed-length records are easier to
implement than files of variable-length records.

Fixed-Length Records

Consider a file of deposit records of the form:

type deposit = record

bname : char(22);

account# : char(10);

balance : real;

end

If we assume that each character occupies one byte, an integer occupies 4 bytes,

and a real 8 bytes, our deposit record is 40 bytes long. The simplest approach is to use the

first 40 bytes for the first record, the next 40 bytes for the second, and so on.
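
The 40-byte figure (22 + 10 + 8) and the position of each record can be reproduced with Python's struct module. This is a sketch under the byte sizes assumed above; the format string, helper functions and sample values are illustrative, and the block sizes used in the checks are hypothetical.

```python
import struct

# bname: 22 bytes, account#: 10 bytes, balance: 8-byte real.
# "<" selects a packed layout with no alignment padding.
DEPOSIT = struct.Struct("<22s10sd")
record_size = DEPOSIT.size


def record_offset(i):
    """Byte offset of record number i (counting from 0) in the file."""
    return i * DEPOSIT.size


def crosses_block_boundary(i, block_size):
    """True if record i starts in one block and ends in the next."""
    start = record_offset(i)
    end = start + DEPOSIT.size - 1
    return start // block_size != end // block_size


# Short strings are padded with zero bytes up to the fixed field width.
packed = DEPOSIT.pack(b"Perryridge", b"102", 400.0)

print(record_size, record_offset(12), crosses_block_boundary(12, 512))
```

With 512-byte blocks, record 12 occupies bytes 480 to 519 and therefore straddles two blocks; with a block size that is a multiple of 40 no record does.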

However, there are two problems with this approach.

1. It is difficult to delete a record from this structure. The space occupied by the
deleted record must somehow be filled, or we need to mark deleted records so that
they can be ignored.

2. Unless block size is a multiple of 40, some records will cross block

boundaries. It would then require two block accesses to read or write such a

record.

When a record is deleted, we could move all successive records up one (Figures

below), which may require moving a lot of records. We could instead move the last record

into the ``hole'' created by the deleted record. This changes the order the records are in.

It turns out to be undesirable to move records to occupy freed space, as moving

requires block accesses. Also, insertions tend to be more frequent than deletions. It is

acceptable to leave the space open and wait for a subsequent insertion. This leads to a need

for additional structure in our file design.

So one solution is:

- At the beginning of a file, allocate some bytes as a file header.

- This header for now need only be used to store the address of the first record whose

contents are deleted.

- This first record can then store the address of the second available record, and so on

(see figure below).

- To insert a new record, we use the record pointed to by the header, and change the

header pointer to the next available record.

- If no deleted records exist we add our new record to the end of the file.

- Note: Use of pointers requires careful programming. If a record pointed to is moved

or deleted, and that pointer is not corrected, the pointer becomes a dangling

pointer. Records pointed to are called pinned.
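
The free-list scheme in the steps above can be sketched as a toy in-memory model. The class and attribute names are hypothetical, slot numbers stand in for record addresses, and a real implementation would keep the header and the pointers inside the file itself.

```python
from collections import namedtuple

# Marker stored in a deleted slot: it points to the next free slot (or None).
Free = namedtuple("Free", ["next_free"])


class FixedLengthFile:
    """Toy model of a file of fixed-length records with a free list."""

    def __init__(self):
        self.free_head = None  # "file header": first deleted slot, if any
        self.slots = []        # each slot holds a record or a Free marker

    def insert(self, record):
        """Reuse the first free slot if one exists, else append at the end."""
        if self.free_head is None:
            self.slots.append(record)
            return len(self.slots) - 1
        slot = self.free_head
        self.free_head = self.slots[slot].next_free
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        """Mark the slot as free and put it at the front of the free list."""
        self.slots[slot] = Free(self.free_head)
        self.free_head = slot


f = FixedLengthFile()
for name in ["A", "B", "C", "D"]:
    f.insert(name)

f.delete(1)
f.delete(3)
head_after_deletes = f.free_head   # the most recently deleted slot

reused_first = f.insert("E")       # takes the slot the header points to
reused_second = f.insert("F")      # takes the next slot in the chain
appended = f.insert("G")           # free list empty, so append at the end

print(head_after_deletes, reused_first, reused_second, appended)
```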

Fixed-length file insertions and deletions are relatively simple because ``one size
fits all''. For variable-length records, this is not the case.

Variable-Length Records

Variable-length records arise in a database in several ways:

o Storage of multiple items in a file.

o Record types allowing variable field size

o Record types allowing repeating fields

We'll look at several techniques, using one example with a variable-length record:

type account-list = record

bname : char(22);

account-info : array [1 .. ∞] of record;

account# : char(10);

balance : real;

end

end

Account-information is an array with an arbitrary number of elements.

1. Byte string representation

A simple method for implementing variable-length records is to attach a special

end-of-record symbol ( ) to the end of each record. Each record is stored as a string

of successive bytes (as shown below).

Byte string representation has several disadvantages:

o It is not easy to re-use space left by a deleted record.

o In general, there is no space for records to grow longer. (The record must then be
moved, and movement is costly if the record is pinned.)

For these reasons the plain byte-string representation is not usually used, and a
modified form of it is used instead. It is called the slot page structure. There is a
header at the beginning of each block, containing:

o The number of record entries in the header

o The end of free space in the block

o An array whose entries contain the location and size of each record.

The slotted-page structure requires that there be no pointers that point directly to

records. Instead, pointers must point to the entry in the header that contains the actual location

of the record. This level of indirection allows records to be moved to prevent fragmentation

of space inside a block, while supporting indirect pointers to the record.
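A minimal sketch of this idea in Python follows. It assumes a simplified layout: the header array is kept as a Python list rather than inside the block's bytes, records are packed from the end of the block towards the front, and all names (SlottedPage, free_end, slots) are hypothetical.

```python
class SlottedPage:
    """Toy slotted page: external pointers hold slot numbers, never byte offsets."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_end = size   # end of free space; records live from here to the end
        self.slots = []        # header array: (offset, length), or None if deleted

    def insert(self, record):
        if len(record) > self.free_end:
            raise ValueError("not enough free space in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1   # the slot number is what pointers store

    def read(self, slot):
        entry = self.slots[slot]
        if entry is None:
            raise KeyError("record was deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        if self.slots[slot] is None:
            raise KeyError("record was deleted")
        self.slots[slot] = None
        self._compact()

    def _compact(self):
        # Copy out the surviving records, then repack them against the end of
        # the block. Only header entries change; slot numbers stay valid.
        live = [(i, self.read(i)) for i, e in enumerate(self.slots) if e is not None]
        self.free_end = len(self.data)
        for i, rec in live:
            self.free_end -= len(rec)
            self.data[self.free_end:self.free_end + len(rec)] = rec
            self.slots[i] = (self.free_end, len(rec))
```

After a deletion the remaining records move inside the block, yet a caller holding a slot number still reaches the right record, which is the point of the extra level of indirection.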

2. Fixed-length representation

Another way to implement variable-length records is to use one or more fixed-

length records to represent one variable-length record.

There are two techniques for implementing files of variable-length records using fixed-

length records:

o Reserved space - uses fixed-length records large enough to accommodate

the largest variable-length record. (Unused space filled with end-of-record

symbol.)

o Pointers - the variable-length record is represented by a list of fixed-length records, chained together by pointers.

The reserved space method requires the selection of some maximum record

length (as shown below).

If most records are of near-maximum length this method is useful. Otherwise, space

is wasted, and the pointer method may be used instead (see figure below). Its disadvantage is that

space is wasted in successive records in a chain, because the non-repeating fields (such as bname)

are still present in every record even though only the first record of the chain uses them.

To overcome this disadvantage, we can use two kinds of blocks in the file (see figure below):

o Anchor block - contains the first record of each chain.

o Overflow block - contains records other than the first record of a chain.

Now all records in a block have the same length, and there is no wasted space.
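The anchor/overflow split can be illustrated with a short Python sketch. It is a simplification under stated assumptions: each block kind is modelled as one Python list, list positions stand in for record pointers, every branch has at least one account, and the function names and sample values are made up for the illustration.

```python
def split_into_blocks(branch_accounts):
    """branch_accounts maps bname -> non-empty list of (account_no, balance)."""
    anchor = []    # records of the form [bname, account_no, balance, next]
    overflow = []  # records of the form [account_no, balance, next]
    for bname, accounts in branch_accounts.items():
        first, rest = accounts[0], accounts[1:]
        anchor.append([bname, first[0], first[1], None])
        prev = anchor[-1]
        for account_no, balance in rest:
            overflow.append([account_no, balance, None])
            prev[-1] = len(overflow) - 1   # link the previous record to this one
            prev = overflow[-1]
    return anchor, overflow


def accounts_of(anchor, overflow, bname):
    """Follow the chain that starts at the anchor record for bname."""
    for rec in anchor:
        if rec[0] == bname:
            result = [(rec[1], rec[2])]
            nxt = rec[3]
            while nxt is not None:
                account_no, balance, nxt = overflow[nxt]
                result.append((account_no, balance))
            return result
    return []


# Made-up sample data for the illustration.
sample = {
    "Perryridge": [("A-102", 400), ("A-201", 900), ("A-218", 700)],
    "Round Hill": [("A-305", 350)],
}
anchor, overflow = split_into_blocks(sample)
```

Only anchor records carry the bname field, so overflow records are shorter and each list holds records of a single length.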

Organization of Records in Files

So far, we have studied how records are represented in a file structure. The next

question is how to organize them in a file. Several possible ways of organizing records

in files are:

Heap file organization. Any record can be placed anywhere in the file where there

is space for the record. There is no ordering of records.

Sequential file organization. Records are stored in sequential order, according to the

value of a “search key” of each record.

Hashing file organization. A hash function is computed on some attribute of each

record. The result of the hash function specifies in which block of the file the record

should be placed.

Clustering file organization. Records of several different relations are stored in the same file, with related records placed on

the same block, so that one I/O operation fetches related records from all the relations.

Sequential File Organization

A sequential file is designed for efficient processing of records in sorted order based

on some search key. A search key is any attribute or set of attributes. Records are chained

together by pointers to permit fast retrieval in search key order. The pointer in each record

points to the next record in search-key order. Furthermore, to minimize the number of block

accesses in sequential file processing, we store records physically in search-key order, or as

close to search-key order as possible. Figure below shows an example, with bname as the

search key.

It is difficult to maintain physical sequential order as records are inserted and

deleted, since it is costly to move many records as a result of a single insertion or deletion.

Deletion can be managed with the pointer chains. For insertion, we first locate the record that

precedes the new record in search-key order. If there is a free record (that is, space

left after a deletion) within the same block, we insert the new record there. Otherwise, we insert

the new record in an overflow block. In either case, we adjust the pointers to chain together the records in

search-key order.




Lec. 6 Indexing and Hashing

Basic Concepts

An index in a book is an alphabetically arranged list of terms, given at the end of the

book, with the page numbers on which the terms can be found. Database-system indices play the

same role as book indices. For example, to retrieve a student record given an ID,

the database system would look up an index to find on which disk block the corresponding

record resides, and then fetch the disk block, to get the appropriate student record. An

attribute or set of attributes used to look up records in a file is called a search key.

There are two basic kinds of indices:

Ordered indices. Based on a sorted ordering of the values.

Hash indices. Based on a uniform distribution of values across a range of buckets.

The bucket to which a value is assigned is determined by a hash function.

No one technique is the best. Each technique is best suited to particular database

applications. Each technique must be evaluated on the basis of these factors:

Access types: The types of access that are supported efficiently.

Access time: The time it takes to find a particular data item.

Insertion time: The time it takes to insert a new data item.

Deletion time: The time it takes to delete a data item.

Space overhead: The additional space occupied by an index structure.

Ordered Indices

A file may have several indices, on different search keys. If the file containing the

records is sequentially ordered, a primary index is an index whose search key defines the

sequential order of the file. Primary indices are also called clustering indices. On the other

hand, indices whose search key specifies an order different from the sequential order of the

file are called secondary indices, or nonclustering indices. There are two types of ordered

indices that we can use:

Dense index: An index record appears for every search-key value in the file. In a

dense index, the index record contains the search-key value and a pointer to the first

data record with that search-key value. The rest of the records with the same search-key

value are stored sequentially after the first record.

Sparse index: An index record appears for only some of the search-key values. To locate a record using a sparse index, we find the

index entry with the largest search-key value that is less than or equal to the search-

key value for which we are looking. We start at the record pointed to by that index

entry, and follow the pointers in the file until we find the desired record.

The two figures below show dense and sparse indices, respectively, for the account file.

Suppose that we are looking up records for the Perryridge branch. Using the dense index of

Figure A, we follow the pointer directly to the first Perryridge record. We process this record,

and follow the pointer in that record to locate the next record in search-key (branch-name)

order. We continue processing records until we encounter a record for a branch other than

Perryridge. If we are using the sparse index (Figure B), we do not find an index entry for

“Perryridge.” Since the last entry (in alphabetic order) before “Perryridge” is “Mianus,” we

follow that pointer. We then read the account file in sequential order until we find the first

Perryridge record, and begin processing at that point.
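The two lookups just described can be sketched in Python. The rows below are made-up sample data, list positions stand in for record pointers, and the sparse index is assumed to hold every second distinct branch name; none of these choices comes from the figures.

```python
from bisect import bisect_right

# Sample account file, sorted on the search key (branch name). Values are made up.
account_file = [
    ("Brighton", "A-217", 750),
    ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600),
    ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700),
    ("Redwood", "A-222", 700),
    ("Round Hill", "A-305", 350),
]


def build_dense(file):
    """One entry per search-key value, pointing at the first record with that value."""
    index = {}
    for pos, rec in enumerate(file):
        index.setdefault(rec[0], pos)
    return index


def build_sparse(file, every=2):
    """Keep only every `every`-th distinct key of the dense index, in sorted order."""
    dense = build_dense(file)   # dicts keep insertion order, which is sorted here
    return [(k, pos) for n, (k, pos) in enumerate(dense.items()) if n % every == 0]


def scan_from(file, start, key):
    """Read sequentially from `start`, collecting all records with the given key."""
    pos = start
    while pos < len(file) and file[pos][0] < key:
        pos += 1
    matches = []
    while pos < len(file) and file[pos][0] == key:
        matches.append(file[pos])
        pos += 1
    return matches


def lookup_dense(file, index, key):
    return scan_from(file, index[key], key) if key in index else []


def lookup_sparse(file, index, key):
    # Find the index entry with the largest search-key value <= key.
    i = bisect_right([k for k, _ in index], key) - 1
    return scan_from(file, index[i][1], key) if i >= 0 else []


dense_index = build_dense(account_file)
sparse_index = build_sparse(account_file)
```

With this data the sparse lookup for "Perryridge" starts at the "Mianus" entry and skips one record before reaching the first match, whereas the dense lookup starts at the match itself.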

It is generally faster to locate a record with a dense index than with a

sparse index, because a dense index points directly to the first matching record. However, sparse indices require

less space and impose less maintenance overhead for insertions and deletions.

Multilevel Indices

Even if we use a sparse index, the index itself may become too large for efficient

processing. Suppose we have a file with 100,000 records, with 10 records stored in each block. If

we have one index record per block, the index has 10,000 records. Index records are smaller

than data records, so let us assume that 100 index records fit on a block. Thus, we need 100

blocks just for the index. Such large indices are stored as sequential files on disk. If an index is

sufficiently small to be kept in main memory, the search time to find an entry is low.

However, if the index is so large that it must be kept on disk, a search for an entry requires

several disk block reads.

To deal with this problem, we treat the index just as we would treat any other

sequential file, and construct a sparse index on the primary index, as in the figure below.

To locate a record, we first use binary search on the outer index to find the record for

the largest search-key value less than or equal to the one that we desire. The pointer points to

a block of the inner index. We scan this block until we find the record that has the largest

search-key value less than or equal to the one that we desire. The pointer in this record points

to the block of the file that contains the record we are looking for.
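The block counts from the example and the two-level lookup can be checked with a short Python sketch. It assumes integer search keys, data block i holding keys 10i to 10i+9, and block numbers standing in for pointers; the function names are hypothetical. For brevity it uses binary search inside the inner block as well, where the text describes a scan.

```python
import math
from bisect import bisect_right

# Sizes taken from the example in the text.
records, records_per_block, index_entries_per_block = 100_000, 10, 100
data_blocks = records // records_per_block                         # one index entry each
inner_blocks = math.ceil(data_blocks / index_entries_per_block)    # blocks of inner index
outer_blocks = math.ceil(inner_blocks / index_entries_per_block)   # blocks of outer index


def build_two_level(first_keys, fanout):
    """first_keys[i] is the smallest search key stored in data block i (sorted)."""
    entries = list(zip(first_keys, range(len(first_keys))))
    inner = [entries[i:i + fanout] for i in range(0, len(entries), fanout)]
    outer = [(block[0][0], n) for n, block in enumerate(inner)]
    return outer, inner


def find_data_block(outer, inner, key):
    # Outer index: largest search-key value <= key, which selects an inner block.
    i = bisect_right([k for k, _ in outer], key) - 1
    if i < 0:
        return None
    block = inner[outer[i][1]]
    # Inner block: largest search-key value <= key, which selects the data block.
    j = bisect_right([k for k, _ in block], key) - 1
    return block[j][1]


first_keys = [10 * i for i in range(data_blocks)]
outer, inner = build_two_level(first_keys, index_entries_per_block)
```

Under these assumptions the outer index fits in a single block, so a lookup reads one outer block, one inner block, and one data block.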

Hash File Organization

In a hash file organization, we obtain the address of the disk block containing

a desired record directly by computing a function on the search-key value of the record.

The term bucket denotes a unit of storage that can store one or more records.

A bucket is typically a disk block, but could be chosen to be smaller or larger than a disk

block. If K denotes the set of all search-key values, and B denotes the set of all bucket

addresses, then a hash function H is a function from K to B. The same function is used for lookup, insertion,

and deletion.

Records with different search-key values may be mapped to the same bucket; thus

the entire bucket has to be searched sequentially to locate a record.
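A small Python sketch of a hash file follows. The hash function used here (sum of character codes modulo the number of buckets) is only an illustrative choice, buckets are modelled as Python lists, and the class name HashFile is hypothetical.

```python
class HashFile:
    """Toy hash file: each bucket is a list of (search_key, record) pairs."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def h(self, key):
        # Illustrative hash function from search keys to bucket addresses.
        return sum(ord(c) for c in key) % len(self.buckets)

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        # Different keys may share a bucket, so the whole bucket is scanned.
        return [rec for k, rec in self.buckets[self.h(key)] if k == key]

    def delete(self, key):
        b = self.h(key)
        before = len(self.buckets[b])
        self.buckets[b] = [(k, rec) for k, rec in self.buckets[b] if k != key]
        return before - len(self.buckets[b])   # number of records removed
```

The keys "ab" and "ba" hash to the same bucket under this function, which shows why lookup and delete compare the stored key rather than trusting the bucket address alone.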

Index Definition in SQL

We create an index by the create index command, which takes the form:

create index <index-name> on <relation-name> (<attribute-list>)

The attribute-list is the list of attributes of the relation that form the search key for

the index. To define an index named b-index on the branch relation with branch-name as the

search key, we write:

create index b-index on branch (branch-name)

If we wish to declare that the search key is a candidate key, we add the attribute

unique to the index definition. Thus, the command:

create unique index b-index on branch (branch-name)

declares branch-name to be a candidate key for branch. If, at the time we enter the create

unique index command, branch-name is not a candidate key, the system will display an

error message, and the attempt to create the index will fail. If the index creation attempt

succeeds, any subsequent attempt to insert a tuple that violates the key declaration will fail.

Note that the unique feature is redundant if the database system supports the unique

declaration of the SQL standard.

To drop an index, we use the drop index command, which takes the form:

drop index <index-name>

So if we want to drop the index b-index that we created, the command is:

drop index b-index
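These commands can be tried with Python's built-in sqlite3 module, as in the sketch below. It uses SQLite rather than any particular system from the lecture, so exact behavior elsewhere may differ. Hyphens are not valid in unquoted SQL identifiers, so the names are written with underscores (b_index, branch_name). The branch_city column and the city values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text, branch_city text)")
conn.execute("insert into branch values ('Perryridge', 'Horseneck')")

# branch_name is currently unique, so creating the unique index succeeds.
conn.execute("create unique index b_index on branch (branch_name)")

# A later insert that violates the key declaration is rejected.
try:
    conn.execute("insert into branch values ('Perryridge', 'Brooklyn')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# After dropping the index, the duplicate is accepted again.
conn.execute("drop index b_index")
conn.execute("insert into branch values ('Perryridge', 'Brooklyn')")
row_count = conn.execute("select count(*) from branch").fetchone()[0]

# branch_name is no longer a candidate key, so index creation now fails.
try:
    conn.execute("create unique index b_index on branch (branch_name)")
    creation_failed = False
except sqlite3.Error:
    creation_failed = True
```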

Multiple-Key Access

For certain types of queries, it is advantageous to use multiple indices if they exist.

There are two options: combine several single-key indices, or use a single index built on

multiple keys.

1- Using Multiple Single-Key Indices

Assume that the account file has two indices: one for branch-name and one for balance.

Consider the following query: “Find all account numbers at the Perryridge branch with

balances equal to $1000.” We write:

select account-number

from account

where branch-name = 'Perryridge' and balance = 1000

There are three strategies possible for processing this query:

a) Use the index on branch-name to find all records related to the Perryridge branch,

then examine each such record to see whether balance = 1000.

b) Use the index on balance to find all records related to accounts with balances of

$1000, then examine each such record to see whether branch-name = “Perryridge.”

c) Use the index on branch-name to find pointers to all records related to the Perryridge

branch. Also, use the index on balance to find pointers to all records related to

accounts with a balance of $1000. Take the intersection of these two sets of pointers.

Those pointers that are in the intersection point to records related to both Perryridge

and accounts with a balance of $1000.

The third strategy is the only one of the three that takes advantage of the existence of

multiple indices.
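Strategies (a) and (c) can be compared with a short Python sketch. The rows are made-up sample data, list positions stand in for record pointers, and each index is modelled as a dictionary from a value to a set of positions.

```python
from collections import defaultdict

# Made-up sample rows: (account_number, branch_name, balance).
accounts = [
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 1000),
    ("A-218", "Perryridge", 1000),
    ("A-101", "Downtown", 1000),
    ("A-215", "Mianus", 700),
]


def build_index(rows, column):
    """Single-key index: value of `column` -> set of record positions."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        index[row[column]].add(rid)
    return index


by_branch = build_index(accounts, 1)
by_balance = build_index(accounts, 2)


def strategy_a(branch, balance):
    # Use one index, then examine each fetched record for the other condition.
    rids = by_branch.get(branch, set())
    return sorted(accounts[r][0] for r in rids if accounts[r][2] == balance)


def strategy_c(branch, balance):
    # Intersect the two pointer sets before fetching any record.
    rids = by_branch.get(branch, set()) & by_balance.get(balance, set())
    return sorted(accounts[r][0] for r in rids)
```

Both functions return the same account numbers; the difference is that strategy (c) fetches only records that satisfy both conditions, at the cost of reading two indices.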

2- Indices on Multiple Keys

The strategy for this case is to create and use an index on a search key of multiple

attributes (branch-name, balance). The index on the combined search-key will fetch only

records that satisfy both conditions.
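A composite search key can be modelled in Python as a sorted list of (branch-name, balance) pairs, as in the sketch below. The rows are made-up sample data, list positions stand in for record pointers, and the sorted list is only a stand-in for a real ordered index structure.

```python
from bisect import bisect_left, bisect_right

# Made-up sample rows: (account_number, branch_name, balance).
accounts = [
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 1000),
    ("A-218", "Perryridge", 1000),
    ("A-101", "Downtown", 1000),
    ("A-215", "Mianus", 700),
]

# Ordered index on the combined search key (branch_name, balance).
composite = sorted(
    ((branch, balance), rid) for rid, (_, branch, balance) in enumerate(accounts)
)
keys = [key for key, _ in composite]


def lookup(branch, balance):
    """Return account numbers whose combined key equals (branch, balance)."""
    lo = bisect_left(keys, (branch, balance))
    hi = bisect_right(keys, (branch, balance))
    return [accounts[rid][0] for _, rid in composite[lo:hi]]
```

A single search of the index yields exactly the matching records, with no second index and no intersection step.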
