My Extendible Hashing Report1

VISVESVARAYA TECHNOLOGICAL UNIVERSITY, Machche, Belgaum

2010 - 2011

A project report On

“Extendible Hashing”

Submitted in partial fulfillment of the requirements for the award of degree

Bachelor of Engineering in

Information Science and Engineering

By Shiva Shankar B.N

1RV08IS048

Under the guidance of

Kavitha S.N, Professor,

Dept. of ISE, RV College of Engineering

Nagraj G Cholli, Professor,

Dept. of ISE, RV College of Engineering

Department of Information Science and Engineering,

R. V. College of Engineering, (An Autonomous Institution under VTU, Accredited by NBA)

Bangalore – 560059


CERTIFICATE

This is to certify that the FS project entitled

“Extendible Hashing”

has been successfully carried out by Shiva Shankar B.N, bearing USN 1RV08IS048, in partial fulfillment of the requirements of the File Structures Lab (07IS63) for the award of the degree of Bachelor of Engineering in Information Science and Engineering, during the academic year 2010-2011. It is certified that all corrections/suggestions indicated for Internal Assessment have been incorporated in the report deposited in the departmental library. The project has been approved as it satisfies the academic requirements in respect of project work prescribed for the File Structures Lab.

Kavitha S.N, Professor, Dept. of ISE, RV College of Engineering

Nagraj G Cholli, Professor, Dept. of ISE, RV College of Engineering

Dr. Ramakanth Kumar P, Prof. & HOD, Dept. of ISE, RV College of Engineering

Principal, RVCE

Name of the Examiners Signature with Date

1.____________________ __________________

2.____________________ __________________


DECLARATION

I, Shiva Shankar B.N, student of sixth semester B.E. in Information Science and Engineering,

R.V. College of Engineering, Bangalore declare that the project entitled “Extendible

Hashing” has been carried out by me and submitted in partial fulfillment of the File Structures

course requirements for the award of degree in Bachelor of Engineering in Information

Science and Engineering of Visvesvaraya Technological University, Belgaum during the

academic year 2010-2011. The matter embodied in this report has not been submitted to any

other university or institution for the award of any other degree or diploma.

Shiva Shankar B.N

USN: 1RV08IS048
B.E. in Information Science and Engineering
R.V. College of Engineering
Bangalore-560 059

ACKNOWLEDGMENT

My project was the result of the encouragement of many people who helped in shaping it and provided feedback, direction and valuable support. It is with hearty gratitude that I acknowledge their contributions to my project.

I would like to thank my internal guide Kavitha S.N, Professor, Department of Information Science & Engineering, RVCE, for the guidance, the suggestions for improvement and the help in implementing the project.

I would like to thank my lab-in-charge Nagraj G Cholli, Professor, Department of Information Science & Engineering, RVCE, for the constant help and support extended to me during the course of the project.

I am also grateful to the Professor and HOD, Dr. Ramakanth Kumar P, Department of Information Science & Engineering, RVCE, for permitting me to take up this project and for his encouragement. I thank our Principal, RVCE, who has always been a great source of inspiration.

I thank RSST for the infrastructure and facilities provided, which helped in the successful completion of the project.

Last but not the least, I would like to thank my family and friends, who provided me with valuable suggestions to improve my project.

Shiva Shankar B.N

1RV08IS048

ABSTRACT

With the evolution of technology, the amount of data generated by transactions of all kinds has grown larger than ever. This data has great potential to guide an organization in its future endeavors. However, preserving such a large amount of data is justified only if the organization can extract the required information from it when needed. In other words, stored transaction data serves its purpose only if it can be accessed with at least an acceptable level of performance.

Given the importance of accessing data efficiently, various techniques have been developed over time, including indexing, B-trees, B+ trees and hashing. B-trees are efficient, but their access cost depends on the depth of the tree, O(log_k N). Hashing overcomes this drawback and offers access of the order of 1, i.e. O(1), which makes it one of the most efficient ways of accessing static data. Most data, however, is dynamic, i.e. it is modified very frequently, and static hashing cannot handle a file that keeps changing in size. This need led to another form of hashing called Extendible Hashing.

The project demonstrates the implementation of Extendible Hashing in accessing a series of student records. The code is written in C++ using Object Oriented Programming, and thus shows the applicability of Object Oriented Programming in implementing complex programs.

The project provides features such as insertion, deletion, search, update and display of student records stored in a student database. The insertion of records uses Extendible Hashing to generate the record insertion address. Further concepts of extendible hashing, such as buckets, have also been implemented. The project also uses C-based graphics to present the data in a better form.

LIST OF FIGURES

Fig No.     Name of Figure
Fig. 3.1    System Block Diagram
Fig. 3.2    Level 0 Diagram
Fig. 3.3    Level 1 Diagram
Fig. 3.4    Level 2 Diagram
Fig. 4.1    Structure diagram
Fig. 4.2    IOBuffer class diagram
Fig. 4.3    Student class diagram
Fig. 4.4    FixedLengthBuffer class diagram
Fig. 4.5    DelimFieldBuffer class diagram
Fig. 4.6    TextIndex class diagram
Fig. 4.7    Insertion Flowchart
Fig. 4.8    Deletion Flowchart
Fig. 7.1    Home page
Fig. 7.2    Choice screen
Fig. 7.3    Data Entry-Record Insertion
Fig. 7.4    Record Modification
Fig. 7.5    Record Display
Fig. 7.6    Directory Details

LIST OF TABLES

Table No.     Name of Table
Table 6.1     Unit test case for insertion operation.
Table 6.2     Unit test case for modification operation.
Table 6.3     Unit test case for display operation.
Table 6.4     Unit test case for display all operation.
Table 6.5     Unit test case for directory display.
Table 6.6     Unit test case for deletion operation.
Table 6.7     Unit test case for space utilization operation.
Table 6.8     Unit test case for test failure.
Table 6.9     Unit test case for correction of failure.
Table 6.10    Integrated test case for doubling directory.
Table 6.11    Integrated test case for collapsing directory.
Table 6.12    System test case for hashing.

TABLE OF CONTENTS

1. Introduction
   1.1 Purpose
   1.2 Scope
   1.3 Motivation
   1.4 Literature Survey

2. Software Requirement Specification
   2.1 Overall Description
   2.2 Specific Requirements
       2.2.1 Functionality
             2.2.1.1 Functionality Requirement 1
             2.2.1.2 Functionality Requirement 2
             2.2.1.3 Functionality Requirement 3
       2.2.2 Performance Requirement
       2.2.3 Design Constraints
       2.2.4 Hardware Requirement
       2.2.5 Software Requirement
       2.2.6 Interface Requirement
             2.2.6.1 User Interfaces
             2.2.6.2 Communication Interfaces

3. High Level Design
   3.1 Design Considerations
       3.1.1 Assumptions and Dependencies
       3.1.2 General Constraints
   3.2 System Block Diagram
       3.2.1 Solution Architect Diagram
   3.3 Data Flow Diagram

4. Detailed Design
   4.1 Structure Diagram
   4.2 Class Diagram
   4.3 Flow Charts

5. Implementation
   5.1 Selection of the platform
   5.2 Selection of the programming language
   5.3 Programming Coding Guidelines

6. Testing
   6.1 Unit Testing
       6.1.1 Unit Test Case 1
       6.1.2 Unit Test Case 2
       6.1.3 Unit Test Case 3
       6.1.4 Unit Test Case 4
       6.1.5 Unit Test Case 5
       6.1.6 Unit Test Case 6
       6.1.7 Unit Test Case 7
       6.1.8 Unit Test Case 8
       6.1.9 Unit Test Case 9
   6.2 Integration Testing
       6.2.1 Integration Test Case 1
       6.2.2 Integration Test Case 2
   6.3 System Testing
       6.3.1 System Test Case 1

7. Results
   7.1 Snapshots
   7.2 Advantages of the Project
   7.3 Limitations of the Project

8. Conclusion
   8.1 Future Enhancement

References

Appendix A List Of Acronyms

Appendix B Coding

Extendible Hashing Introduction

Chapter 1

INTRODUCTION

1.1 Purpose

The purpose of this project is to demonstrate the working of a file structure based on the concept of Extendible Hashing. This helps us understand the complexity involved in, and the benefits derived from, implementing a project based on extendible hashing.

Many concepts have been developed in the search for better ways of accessing records, among them B-trees and B+ trees. A working model of the hashing concept lets us compare it with other working models and get a clear idea of which concept suits which purpose.

Each of the file structure concepts (indexing, B-trees, B+ trees) is very efficient at structuring and storing data in a file. However, B-trees have O(log_k N) access, i.e. their access cost grows with the amount of data. Extendible Hashing offers a better solution to this problem by providing access of the order of 1, i.e. O(1).

1.2 Scope

The project is mainly applicable in academic institutions such as schools and colleges, where student details are stored, accessed and modified. However, the implementation is not confined to this particular field and can serve a wide variety of fields such as accounting, medicine and business.

Since the main purpose of the project is to maintain a database using the Extendible Hashing file structure, its area of influence encompasses all fields that require a database for their functioning.

For schools and colleges in particular, the implementation focuses on adding, searching, modifying, updating or deleting a student record in the database in a very efficient manner. Further, certain University-specific constraints have been added to prevent errors from creeping into the database.

1.3 Motivation

The key motivation behind a project on Extendible Hashing is how much the technique achieves. It is a file structure concept that gives the impression of completely structured data, because of its extremely low access time, i.e. O(1), while avoiding the overhead involved in actually structuring the data. Providing logical adjacency instead of

Dept of ISE, R.V.C.E. 2009-2010 1


physical adjacency helps in overcoming the overhead. This idea in particular, provides the

inspiration towards creating a program based on Extendible Hashing.

1.4 Literature Survey

File structure concepts evolved from the need to access data efficiently. Efficient access depends mainly on the way the data is stored, i.e. on the mapping from an address to the data, so research was directed towards storing data in such a way that the address of the required data can be obtained as efficiently as possible.

Early work with files presumed that files were on tape, since most files were. Access was sequential and the cost of access grew in direct proportion to the size of the file. Simple indexes were used to speed up access. However, as the indexes grew, they too became difficult to manage. For this reason, in the early 1960s, the idea of applying tree structures emerged as a potential solution. By the 1970s, building on the work on B-trees and B+ trees, many commercial vendors had created file systems that were faster and were not sequential.

B-trees provided excellent access performance, but there was a cost: no longer could a file be accessed sequentially with efficiency. Fortunately, this problem was solved almost immediately by adding a linked list structure at the bottom level of the B-tree. The combination of a B-tree and a sequential linked list is called a B+ tree.

Over the next ten years, B-trees and B+ trees became the basis for many commercial file systems, since they provide access times that grow in proportion to log_k N. However, even though B-trees with all their advancements proved to be extremely efficient, the ultimate goal, i.e. to access any required data in any part of the disk in one access, was not achieved. In the 1980s there was optimism about achieving this, and hashing gave some hints of how it could be done. Hashing proved to be extremely efficient. However, it had a drawback: it was suited to static files. After much work, Extendible Hashing was developed, which could retrieve information with one or, at most, two disk accesses no matter how big the file became.


Extendible Hashing Software Requirements Specification

Chapter 2

SOFTWARE REQUIREMENT SPECIFICATION

2.1 Overall Description

The project involves the following specifications:

Product Perspective: The product shall have graphics implemented into it for ease of use.

The product shall be able to deliver the required functionality efficiently i.e faster access and also

lesser storage overhead.

The product shall implement the Object Oriented Approach and thus provide easier ways to

update, debug and correct modules.

Product Functions:

During user input, each entry of the record shall be validated based on certain constraints.

The product shall provide the insert, update, display and delete functionalities.

On every recursive collapse, the information shall be displayed to the user.

The space utilized at a particular time shall also be displayed.

User Characteristics:

The user can enter the details of each student adhering to the constraints provided.

The user can view the collapsing of directories.

General Constraints:

The application must be protected from viruses in the system on which it is installed.

The application specifically requires Turbo C++ or any other C++ compiler with the support for

C graphics.

2.2 Specific Requirements

The requirements below shall enhance the supportability or maintainability of the system being built, including coding standards, naming conventions, class libraries, maintenance access, utilities etc.



2.2.1 Functionality

2.2.1.1 Functionality Requirement 1–Bucket Size

A bucket consists of a series of records which share the same address. For the program to work consistently, the bucket size shall be based on the following criteria:

The buffer size shall be large enough to provide the required performance.

The buffer size shall not exceed the limit that the operating system can manage.

The sector and track capacities of the disk shall be taken into account.

The data access time of the hard disk (seek, rotation and data transfer times) shall be taken into account.

The bucket size shall not be larger than a track.

The bucket size shall be that of a single cluster.

2.2.1.2 Functionality Requirement 2–Doubling the size of the directory

Each time a bucket overflows, the bucket shall be split; the directory shall be doubled whenever it is too small to address the new bucket, and the change shall be displayed to the user.

Addresses shall be assigned to the new buckets created.

Doubling shall extend the address space from 2^n to 2^(n+1) cells.
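The doubling step can be sketched as follows. This is a minimal illustration in standard C++11 rather than the project's Turbo C++ code. The `Directory` struct, the function name `doubleDirectory` and the choice of indexing cells by the low-order bits of the hash value are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical directory: cell i holds the id (or file address) of a bucket.
struct Directory {
    unsigned int depth;       // the directory has 2^depth cells
    std::vector<int> cells;   // bucket id per cell
};

// Double the directory from 2^n to 2^(n+1) cells.
// Cells are assumed to be indexed by the low-order bits of the hash, so the
// new cell i + 2^n starts out pointing to the same bucket as old cell i.
void doubleDirectory(Directory& dir) {
    assert(dir.cells.size() == (std::size_t(1) << dir.depth));
    std::size_t oldSize = dir.cells.size();
    dir.cells.resize(oldSize * 2);
    for (std::size_t i = 0; i < oldSize; ++i) {
        dir.cells[i + oldSize] = dir.cells[i];
    }
    ++dir.depth;
}
```

After doubling, no bucket has moved: every bucket is simply referenced by twice as many cells, and only the overflowing bucket needs to be split afterwards.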

2.2.1.3 Functionality Requirement 3–Directory Collapse

The directory collapse shall be preceded by a check to determine whether downsizing is possible.

The directory shall be collapsed if a directory scan finds no pair of directory cells that point to different buckets.

Space shall be allocated for a new array of bucket addresses that is half the size of the original, and the bucket reference shared by each cell pair shall be copied to a single cell in the new directory.
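The check and the halving described above might look like the sketch below, written in standard C++11. The `Directory` struct and the name `collapseDirectory` are hypothetical, and the pairing of cell i with cell i + half assumes that cells are indexed by the low-order bits of the hash value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical directory layout, indexed by the low-order bits of the hash:
// cells i and i + half form a pair that differs only in the top index bit.
struct Directory {
    unsigned int depth;       // the directory has 2^depth cells
    std::vector<int> cells;   // bucket id per cell
};

// Halve the directory if every cell pair shares one bucket.
// Returns true if the directory was collapsed, false if it was left unchanged.
bool collapseDirectory(Directory& dir) {
    assert(dir.cells.size() == (std::size_t(1) << dir.depth));
    if (dir.depth == 0) return false;        // a single cell cannot shrink
    std::size_t half = dir.cells.size() / 2;
    for (std::size_t i = 0; i < half; ++i) {
        if (dir.cells[i] != dir.cells[i + half]) {
            return false;                    // a pair points to different buckets
        }
    }
    dir.cells.resize(half);                  // keep one reference per pair
    --dir.depth;
    return true;
}
```

Calling the function in a loop until it returns false gives the repeated collapse that the product functions refer to.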

2.2.2 Supportability

The coding shall follow the naming standard.

The coding shall use the Object Oriented approach and use modules.

Certain comments, algorithms shall be provided as and when necessary.



2.2.3 Performance Requirements

The computers used must have Intel Pentium 4 processors to provide optimum performance.

2.2.4 Design Constraints

The design of the product follows the IO buffer hierarchy.

The product has been designed considering the need to store student information in a

database and has provided student specific attributes such as USN, Branch etc for the

same.

2.2.4 Hardware Requirements

Processor: Pentium (3 or above) or AMD Athlon

RAM: 128 MB or more

Hard Disk: 10 MB or more

2.2.5 Software Requirements

Operating System: Microsoft Windows XP 32bit

Compiler: Turbo C++

2.2.6 Interface Requirement

2.2.6.1 User Interfaces

The product shall be completely User Interface based, adopting the C Graphics as its primary

interface.

2.2.6.2 Communication Interfaces

The communications for permanent storage on secondary storage devices (and also the retrieval

from these devices) shall be provided using the I/O functions provided by the C++ library.


Extendible Hashing High Level Design

Chapter 3

HIGH LEVEL DESIGN

3.1 Design Considerations

3.1.1 Assumptions and Dependencies

Each student has a unique identifier called USN.

The USN is a 10-character alphanumeric key of the format NAANNAANNN, where N is a digit between 0 and 9 and A is a letter of the alphabet.

Modification of USN is not allowed.

All attributes of a particular record are dependent on the USN as the key.

Directory splitting takes place depending on the bucket size i.e when bucket overflow

occurs.

Directory collapse takes place depending on whether collapsing is possible.

3.1.2 General Constraints

NULL CONSTRAINT:

No attribute values can be null.

ENTITY INTEGRITY CONSTRAINT:

The USN value cannot be null.

SEMANTIC CONSTRAINTS:

The USN is a 10-character alphanumeric key of the format NAANNAANNN, where N is a digit between 0 and 9 and A is a letter of the alphabet.

The semester values must be between 1 and 8.

The department values supported are ISE/ise, CSE/cse.
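The semantic constraints above can be expressed as small validation functions. The following is a sketch in standard C++11; the names `isUsnOk`, `isSemOk` and `isBranchOk` are modelled on the module names in the structure diagram, but the bodies are assumptions and not the project's code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// USN format NAANNAANNN: 'N' stands for a digit, 'A' for a letter.
bool isUsnOk(const std::string& usn) {
    const std::string pattern = "NAANNAANNN";
    if (usn.size() != pattern.size()) return false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(usn[i]);
        if (pattern[i] == 'N' && !std::isdigit(c)) return false;
        if (pattern[i] == 'A' && !std::isalpha(c)) return false;
    }
    return true;
}

// The semester must be between 1 and 8.
bool isSemOk(int semester) {
    return semester >= 1 && semester <= 8;
}

// Only the departments listed in the constraints are accepted.
bool isBranchOk(const std::string& branch) {
    return branch == "ISE" || branch == "ise" ||
           branch == "CSE" || branch == "cse";
}
```

For example, the USN 1RV08IS048 matches the pattern, while a semester value of 11 is rejected.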



3.2 System Block Diagram

A block diagram is a diagram of a system in which the principal parts or functions are represented by blocks connected by lines that show the relationships between the blocks. Block diagrams are heavily used in the engineering world, in hardware design and software design. A block diagram is typically used for a higher level, less detailed description, aimed more at understanding the overall concepts and less at understanding the details of implementation.

The system block diagram for simple hashing is given below. It takes the key of the record as the

input and produces the address as the output in which the record will be stored.

Figure 3.1 System block diagram of Hashing

The system block diagram for extendible hashing is given below. Extendible Hashing provides a

directory which is made up of cells and each cell points to a bucket. More than one cell can point

to a single bucket. A bucket is simply an index file containing the keys of the records.

Figure 3.2 System block diagram of Extendible hashing
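The two block diagrams can be read together as "hash the key, then use part of the hash value to pick a directory cell". The sketch below, in standard C++11, illustrates this. The polynomial hash function, the names `hashKey` and `cellIndex`, and the use of the low-order bits are all assumptions for the example; the report does not specify which hash function or which bits the project uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical hash function: maps a key (e.g. a USN) to an unsigned integer.
unsigned int hashKey(const std::string& key) {
    unsigned int h = 0;
    for (unsigned char c : key) {
        h = h * 31u + c;   // unsigned arithmetic wraps, so overflow is well defined
    }
    return h;
}

// Directory cell for a key when the directory has 2^depth cells:
// keep only the low-order `depth` bits of the hash value.
std::size_t cellIndex(const std::string& key, unsigned int depth) {
    assert(depth < 32);    // the sketch assumes a hash value of at least 32 bits
    unsigned int mask = (1u << depth) - 1u;
    return hashKey(key) & mask;
}
```

With a depth of 0 every key maps to cell 0, which corresponds to a directory with a single cell and a single bucket.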

[Figure: Level 0 data flow diagram. Process 0.0 Extendible Hashing; external entity Client; labels: Message, Results, Hashing, Data 1, Output 1, Output 2.]

Fig 3.2 Level 0 Data Flow diagram


3.3 Data Flow Diagram

A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an

information system. DFDs can also be used for the visualization of data processing (structured

design). On a DFD, data items flow from an external data source or an internal data store to an

internal data store or an external data sink, via an internal process. A DFD provides no

information about the timing of processes, or about whether processes will operate in sequence or

in parallel. It is therefore quite different from a flowchart, which shows the flow of control

through an algorithm, allowing a reader to determine what operations will be performed, in what

order, and under what circumstances, but not what kinds of data will be input to and output from

the system, nor where the data will come from and go to, nor where the data will be stored.

Level 0 Data Flow Diagram

It describes the overall processing of the system and shows one process for each major processing step or functional requirement. Data flows from the context diagram also appear on the system diagram (level balancing). A single data store can be shown to represent all data in aggregate at this level, and sources, sinks and data stores can be drawn more than once to increase legibility.


A level 1 dataflow diagram depicts the main functional areas of the system under investigation. It

is derived with reference to the context diagram.

The context diagram on this screen depicts the overall business process for a generic system.

Further analysis is then necessary in order to identify the major functional areas.

[Figure: Level 1 data flow diagram. Processes 1.0 Validate Data Fed, 2.0 Manipulate Details, 3.0 Retrieve Details; external entity CLIENT; data store STUDENT DATABASE; Extendible Hashing; labels: Input data, Input, Output 1, Output 2, MESSAGE, REPORT; numbered flows 1 to 5.]

Fig 3.3 Level 1 Data Flow diagram


1: A Key

2: Another Key

3: Error status

4: Record address

5: Student characteristics

Output 1: Performs the operation of hashing as directed to the student database, and displays the

messages in a convenient manner which can be easily interpreted.

Output 2: Displays reports of student details.

[Figure: Level 2 data flow diagram. Processes 2.1 INSERT DETAILS, 2.2 UPDATE DETAILS, 2.2 DELETE DETAILS; each process takes INPUT and produces OUTPUT and a MESSAGE.]

Fig 3.4 Level 2 Data Flow Diagram



A level 2 data flow diagram depicts the input and output forms of the data. It is derived with reference to the context diagram.

The context diagram on this screen depicts the overall business process for a generic system.

Further analysis is then necessary in order to identify the major parts of the input and output.

INPUT: Student details

OUTPUT: Successfully inserted/updated/deleted

[Figure: Structure diagram. Root module MAIN; modules DISPLAY ALL, DELETE, SEARCH, MODIFY, APPEND, DISPLAY; lower-level modules UNPACK, REMOVE, PACK, IsUSNOK, IsNameOK, IsBranchOK, IsSemOK.]

Fig 4.1 Structure Diagram

Extendible Hashing Detailed Design

Chapter 4

DETAILED DESIGN

4.1 Structure Chart

A Structure Chart (SC) is a chart, which shows the breakdown of the configuration system to the

lowest manageable levels. This chart is used in structured programming to arrange the program

modules in a tree structure. Each module is represented by a box, which contains the module's

name. The tree structure visualizes the relationships between the modules.

IOBUFFER
Attributes: Buffer: string, BufferSize: integer, MaxBytes: integer
Operations: Read(istream&): integer, Write(ostream&): integer, Pack(void*, int): integer, Unpack(void*, int): integer

Fig 4.2 IOBuffer Class Diagram

STUDENT
Attributes: USN: string, Lname: string, Fname: string, Address: string, Semester: string, College: string
Operations: Pack(void*, int): integer, Unpack(void*, int): integer, Print(ostream&), Search(char*): integer, Append(char*)

Fig 4.3 Student Class Diagram


4.2 Class Diagrams

Class Diagram is a graphical model used in the object-oriented approach to show all of the

classes of objects in the system. It is a set of classes that are closely related in terms of function

and data, and which form an independent and reusable product.

FIXEDLENGTHBUFFER
Operations: Read(istream&): integer, Write(ostream&): integer, Print(ostream&), sizeofBuffer(): integer

Fig 4.4 FixedLengthBuffer Class Diagram

DELIMFIELDBUFFER
Attributes: Delim: char, DefaultDelim: char
Operations: Pack(void*, int): integer, Unpack(void*, int): integer, Print(ostream&), Init(), Clear()

Fig 4.5 DelimFieldBuffer Class Diagram

TEXTINDEX
Attributes: MaxKeys: integer, NumKeys: integer
Operations: Insert(char*, int): integer, Remove(char*): integer, Search(char*): integer, Print(ostream&)

Fig 4.6 TextIndex Class Diagram
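To illustrate what the Pack and Unpack operations of a delimited field buffer do, here is a minimal sketch in standard C++11. It is not the project's DelimFieldBuffer class: the free functions `packFields` and `unpackFields`, the use of `std::string` instead of raw buffers, and the delimiter `'|'` are assumptions chosen to keep the example short. The sketch also assumes that field values never contain the delimiter.

```cpp
#include <string>
#include <vector>

// Hypothetical delimiter; the class diagram only names a Delim attribute.
const char kDelim = '|';

// Pack: append each field to the buffer, followed by the delimiter.
std::string packFields(const std::vector<std::string>& fields) {
    std::string buffer;
    for (const std::string& field : fields) {
        buffer += field;
        buffer += kDelim;
    }
    return buffer;
}

// Unpack: split the buffer at each delimiter to recover the fields.
std::vector<std::string> unpackFields(const std::string& buffer) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : buffer) {
        if (c == kDelim) {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    return fields;
}
```

Packing the fields 1RV07IS030, Mithun and Mangalore gives the buffer `1RV07IS030|Mithun|Mangalore|`, and unpacking that buffer returns the same three fields in order.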



[Insertion flowchart, part 1: START; INPUT KEY; KEY EXISTS? (Y: PRINT KEY EXISTS, END; N: CALL BUCKET::INSERT); IS BUCKET FULL? (N: ADD KEY TO BUCKET, END; Y: CALL BUCKET::SPLIT, continue at connector A); connector B.]

4.3 Flow Charts

A flow chart is a graphical or symbolic representation of a process. Each step in the process is

represented by a different symbol and contains a short description of the process step. The flow

chart symbols are linked together with arrows showing the process flow direction.

4.3.1 Insertion

The flow chart for insertion is shown below:

[Insertion flowchart, part 2: connector A; IS THE DIRECTORY BIG ENOUGH? (N: CALL DIRECTORY::DOUBLESIZE, DOUBLE THE DIRECTORY SIZE AND ALLOW NEW BUCKET); DIVIDE THE KEYS INTO THE NEW BUCKETS; CALL BUCKET::INSERT; KEY EXISTS? (Y: PRINT KEY EXISTS, END); connector B.]
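The insertion flow can be sketched as a small in-memory table in standard C++11. This is an illustration under stated assumptions, not the project's implementation: the class name `ExtendibleHash`, the constants `kBucketCapacity` and `kMaxDepth`, the polynomial hash function and the use of low-order hash bits are all hypothetical, and buckets are kept in memory instead of in a file.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

const std::size_t kBucketCapacity = 2;  // hypothetical; kept small to force splits
const unsigned int kMaxDepth = 16;      // hypothetical cap on directory growth

struct Bucket {
    unsigned int depth = 0;             // number of hash bits this bucket uses
    std::vector<std::string> keys;
};

class ExtendibleHash {
public:
    ExtendibleHash() : depth_(0), cells_(1, std::make_shared<Bucket>()) {}

    bool contains(const std::string& key) const {
        const Bucket& b = *cells_[index(key)];
        for (const std::string& k : b.keys) {
            if (k == key) return true;
        }
        return false;
    }

    // Returns false if the key already exists (the "key exists" branch)
    // or if the directory has reached kMaxDepth and cannot grow further.
    bool insert(const std::string& key) {
        if (contains(key)) return false;
        for (;;) {
            std::shared_ptr<Bucket> b = cells_[index(key)];
            if (b->keys.size() < kBucketCapacity) {
                b->keys.push_back(key);          // bucket not full: add the key
                return true;
            }
            if (b->depth == depth_) {            // directory is not big enough
                if (depth_ >= kMaxDepth) return false;
                doubleDirectory();
            }
            split(b);                            // then retry the insertion
        }
    }

    unsigned int depth() const { return depth_; }
    std::size_t cellCount() const { return cells_.size(); }

private:
    static unsigned int hash(const std::string& key) {
        unsigned int h = 0;
        for (unsigned char c : key) h = h * 31u + c;
        return h;
    }

    std::size_t index(const std::string& key) const {
        return hash(key) & ((1u << depth_) - 1u);  // low-order depth_ bits
    }

    void doubleDirectory() {
        std::size_t oldSize = cells_.size();
        cells_.resize(oldSize * 2);
        for (std::size_t i = 0; i < oldSize; ++i) cells_[i + oldSize] = cells_[i];
        ++depth_;
    }

    // Split a full bucket and divide its keys between the old and new bucket.
    void split(const std::shared_ptr<Bucket>& old) {
        unsigned int bit = 1u << old->depth;     // the newly distinguishing bit
        std::shared_ptr<Bucket> fresh = std::make_shared<Bucket>();
        ++old->depth;
        fresh->depth = old->depth;
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            if (cells_[i] == old && (i & bit)) cells_[i] = fresh;
        }
        std::vector<std::string> keys;
        keys.swap(old->keys);
        for (const std::string& k : keys) {
            if (hash(k) & bit) fresh->keys.push_back(k);
            else old->keys.push_back(k);
        }
    }

    unsigned int depth_;                          // directory has 2^depth_ cells
    std::vector<std::shared_ptr<Bucket>> cells_;  // several cells may share a bucket
};
```

The loop in `insert` mirrors the connectors in the flowchart: after a split the insertion is attempted again, because all keys of the full bucket may land on the same side of the split.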

[Deletion flowchart, part 1: START; INPUT KEY; CALL DIRECTORY::REMOVE; IS THE KEY FOUND? (N: PRINT: KEY NOT FOUND, END; Y: CALL BUCKET::REMOVE); CALL DELETE METHOD; PASS BUCKET TO DIRECTORY: TRY COMBINE; connector A.]

4.3.2 Deletion

The Flow chart for deletion is as follows:

[Deletion flowchart, part 2: connector A; IS THERE BUDDY BUCKET?; IS SUM OF 2 BUCKET < 1 BUCKET?; PRINT DELETION DONE, END; CALL DIRECTORY::COLLAPSE; CAN DIRECTORY BE COLLAPSED?; COLLAPSE THE DIRECTORY; PRINT DELETION, END.]

Fig 4.8 Key Deletion Flow Diagram
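The two decisions in the second half of the deletion flow, whether a buddy bucket exists and whether the two buckets fit into one, can be sketched as follows in standard C++11. The function names `findBuddy` and `canCombine` are hypothetical, and the buddy computation assumes that directory cells are indexed by the low-order bits of the hash value. The sketch also treats an exact fit as combinable, which is an assumption; the flowchart itself only states the comparison informally.

```cpp
#include <cstddef>

// Buddy of a cell: the cell whose index differs only in the bit that was
// last used to split the bucket. A bucket that uses fewer bits than the
// directory, or no bits at all, has no buddy.
// Returns true and stores the buddy's cell index if a buddy exists.
bool findBuddy(std::size_t cell, unsigned int bucketDepth,
               unsigned int dirDepth, std::size_t& buddy) {
    if (dirDepth == 0 || bucketDepth < dirDepth) return false;
    buddy = cell ^ (std::size_t(1) << (bucketDepth - 1));
    return true;
}

// Two buddies can be combined when their keys fit into a single bucket.
bool canCombine(std::size_t keysInBucket, std::size_t keysInBuddy,
                std::size_t bucketCapacity) {
    return keysInBucket + keysInBuddy <= bucketCapacity;
}
```

In a directory of depth 2, for instance, the bucket at cell 1 (binary 01) has the bucket at cell 3 (binary 11) as its buddy. Once two buddies have been combined, the directory can be checked for collapse.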



Extendible Hashing Implementation

Chapter 5

IMPLEMENTATION

5.1 Selection of the platform

An operating system is software that manages computer resources and provides programmers and users with an interface used to access those resources. An operating system performs basic tasks such as controlling and allocating memory, prioritizing system requests, controlling input and output devices, facilitating computer networking and managing files. It processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.

The system under development works in a very restrictive environment. The security concerns are large and require that the system being developed be robust and safe from attack. Windows XP analyzes the performance impact of visual effects and uses this to determine whether to enable them, so as to prevent the new functionality from consuming excessive additional processing overhead. Users can further customize these settings. Windows XP can fix problems and add features through service packs. Each service pack is a superset of all previous service packs and patches, so only the latest service pack needs to be installed.

5.2 Selection of the programming language-C++

The programming language used for the development work is C++. The reasons for selecting this language include:

Compared to C, C++ is object oriented in its approach and therefore suits the modular programming that I apply in my project.

C++ provides many I/O features which are crucial for my project.

Since the project involves many input operations from the user, C++ is convenient because it provides simple ways to read input, which is comparatively more involved in Java.

Compared to Java, C++ typically runs faster because source code is compiled directly to machine code.

C++ can interact directly with the machine, which is an additional capability that can be utilized.

C++ is a widely used language and hence can, to some extent, guarantee the portability of the application developed.


5.3 Programming Coding Guidelines

5.3.1 Naming Conventions

Every variable has all the letters in lowercase.

Every class begins with an uppercase letter and has all the other letters in lowercase.

Uppercase letters are used to distinguish between words in an identifier.

Every method name has all the letters in lowercase.

5.3.2 Coding Conventions

All the required variables are declared at the beginning of each module.

For every unique functionality, a function is written and thus modularized.

Unconditional jump statements such as goto are avoided as much as possible to keep the program simple and easy to debug.

Unsigned variables have been rarely used i.e only when it is extremely necessary.

Static variables are used to save space as and when possible.

The I/O buffer hierarchy is made use of extensively.


Extendible Hashing Testing

Chapter 6

TESTING

The testing carried out in this project comprises unit testing, integration testing and system testing.

Features to be tested: Insertion, deletion, modification, updating and directory collapse.

Items to be tested: Doubling of directory size and space utilization for buckets.

Purpose of testing: To check the effective implementation of Extendible Hashing.

Pass / Fail Criteria: Changes made either in the program or in the database file must be reflected in the file or program respectively.

Assumptions and Constraints: The values that can be entered have specific formats, with size constraints for each record.

6.1 Unit Testing

Unit testing is a software verification and validation method in which a programmer tests if

individual units of source code are fit for use. A unit is the smallest testable part of an

application. In procedural programming a unit may be an individual function or procedure.

6.1.1 Unit Test Case 1

Table 6.1 Unit Test Case 1

Sl No. of test case : 1
Name of test : Insertion Test
Item / Feature being tested : Insert a new Student record.
Sample Input : USN='1RV07IS030', Name='Mithun', Address='Mangalore', Semester='6', Branch='ISE', College='RVCE'.
Expected output : The record is inserted into the file.
Actual output : The record is successfully inserted.
Remarks : Test succeeded.

6.1.2 Unit Test Case 2

Sl No. of test case : 2
Name of test : Modify test
Item / Feature being tested : Modification of details in a student record.
Sample Input : Name='John' with USN='1RV07IS532' and values to be modified.
Expected output : The modifications must be reflected in the database.
Actual output : The record is successfully updated.
Remarks : Test succeeded.

6.1.3 Unit Test Case 3

Sl No. of test case : 3
Name of test : Display test
Item / Feature being tested : Display student details.
Sample Input : USN='1RV07IS431'.
Expected output : The record with USN=1RV07IS431 must be displayed.
Actual output : The record with USN=1RV07IS431 is displayed.

6.1.4 Unit Test Case 4

Sl No. of test case : 4
Name of test : Display all test
Item / Feature being tested : Display all the student records stored in the file.
Sample Input : Enter the choice to display all records.
Expected output : All the student records in the file must be displayed.
Actual output : All the student records in the file are displayed.


Sl No. of test case : 5
Name of test : Delete test
Item / Feature being tested : Deletion of a student record from the file.
Sample Input : USN='1RV07IS342'
Expected output : The record with USN='1RV07IS342' must be deleted from the file.
Actual output : The record is deleted.
Remarks : Test succeeded.



Sl No. of test case : 6
Name of test : Directory display
Item / Feature being tested : Display the directory.
Sample Input : Enter the choice to display the directory.
Expected output : The directory details must be displayed.
Actual output : The directory details are successfully displayed.




Sl No. of test case : 7
Name of test : Space utilization test
Item / Feature being tested : Display space utilization.
Sample Input : Enter the choice to display space utilization.
Expected output : The space utilization must be displayed.
Actual output : The space utilization is displayed.


6.1.8 Unit Test Case 8



Sl No. of test case : 8
Name of test : Insert test
Item / Feature being tested : Inserting student record
Sample Input : USN='1RV07IS030', Name='Mithun', Address='Bangalore', Semester='11', Branch='ISE', College='RVCE'.
Expected output : The record should be successfully inserted into the file.
Actual output : Record not inserted.
Remarks : Test failed.

The insertion with semester 11 fails because an invalid value was entered for the semester. The insertion of the same record with correct values is shown next.



Sl No. of test case : 9
Name of test : Insert test
Item / Feature being tested : Inserting student record
Sample Input : USN='1RV07IS030', Name='Mithun', Address='Bangalore', Semester='6', Branch='ISE', College='RVCE'.
Expected output : The record is successfully inserted into the file.
Actual output : Record is inserted.
Remarks : Test succeeded.
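
Test cases 8 and 9 differ only in the semester value. The following is a minimal sketch of the kind of input check that would reject the first and accept the second. It is not taken from the project source: the function name isValidSemester is hypothetical, and the rule that a valid semester is a single digit from 1 to 8 is an assumption, since the report does not state the accepted range.

```cpp
#include <cstring>

// Hypothetical helper, not part of the project source.
// Assumption: a valid semester is exactly one digit in the range '1'..'8'.
bool isValidSemester(const char* semester) {
    if (semester == nullptr) return false;          // no input at all
    if (std::strlen(semester) != 1) return false;   // "11" has two characters
    return semester[0] >= '1' && semester[0] <= '8';
}
```

Under these assumptions, isValidSemester("11") returns false and isValidSemester("6") returns true, which matches the outcomes recorded in the two test cases.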

6.2 Integration testing

Integration testing (sometimes called Integration and Testing, abbreviated "I&T") is the activity of software testing in which individual software modules are combined and tested as a group. It occurs after unit testing and before system testing. Integration testing takes as its input modules that have been unit tested, groups them in larger aggregates, applies tests defined in an integration test plan to those aggregates, and delivers as its output the integrated system ready for system testing.

6.2.1 Integration test case 1

Table 6.10 Integration Test Case 1

Sl No. of test case : 1
Name of test : Doubling the directory
Item / Feature being tested : Doubling the directory
Sample Input : Inserting a new record
Expected output : The directory gets doubled and the record is stored.
Actual output : The directory gets doubled and the record is stored.
Remarks : Test succeeded.

6.2.2 Integration test case 2

Table 6.11 Integration Test Case 2

Sl No. of test case : 2
Name of test : Collapsing the directory
Item / Feature being tested : Collapsing the directory
Sample Input : Deleting a record
Expected output : The directory gets collapsed when there are no more records in it.
Actual output : The directory gets collapsed.
Remarks : Test succeeded.
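
The two integration tests exercise directory doubling and directory collapsing. The sketch below illustrates both operations on a simplified in-memory directory. It is not the project's code: the Directory type and both function names are hypothetical, and it assumes a directory in which the newest address bit is the least significant one, so that sibling cells sit next to each other.

```cpp
#include <cstddef>
#include <vector>

// Illustrative directory: cell i holds the index of the bucket it points to.
// All names here are hypothetical and not taken from the project source.
using Directory = std::vector<int>;

// Doubling: every cell is duplicated, so cells 2i and 2i+1 of the new
// directory both point to the bucket that cell i pointed to before.
Directory doubleDirectory(const Directory& dir) {
    Directory bigger;
    bigger.reserve(dir.size() * 2);
    for (int bucket : dir) {
        bigger.push_back(bucket);
        bigger.push_back(bucket);
    }
    return bigger;
}

// Collapsing: possible only when every pair of sibling cells (2i, 2i+1)
// points to the same bucket. On success the directory is halved in place.
bool collapseDirectory(Directory& dir) {
    if (dir.size() < 2 || dir.size() % 2 != 0) return false;
    for (std::size_t i = 0; i < dir.size(); i += 2) {
        if (dir[i] != dir[i + 1]) return false;   // siblings differ: cannot collapse
    }
    Directory smaller;
    smaller.reserve(dir.size() / 2);
    for (std::size_t i = 0; i < dir.size(); i += 2) {
        smaller.push_back(dir[i]);
    }
    dir = smaller;
    return true;
}
```

Doubling a directory {0, 1} gives {0, 0, 1, 1}, and collapsing that result restores {0, 1}. A directory such as {0, 1, 2, 2} cannot be collapsed because its first pair of cells points to two different buckets.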

6.3 System testing

System testing of software or hardware is testing conducted on a complete, integrated system to evaluate the system's compliance with its specified requirements. System testing takes, as its input, all of the "integrated" software components that have successfully passed integration testing and also the software system itself integrated with any applicable hardware system(s).

6.3.1 System test case 1

Table 6.12 System Test Case

Sl No. of test case : 1
Name of test : Hashing
Item / Feature being tested : Hashing.cpp
Sample Input : Inserting student records
Expected output : The records get inserted at the corresponding hash addresses, resolving collisions using buckets.
Actual output : The records get inserted at the correct address.
Remarks : Test succeeded.


Chapter 7

RESULTS

7.1 Snapshots

Figure 7.1 The home page

Figure 7.2 The choice screen

Figure 7.3 Data Entry-Record Insertion

Figure 7.4 Record Modification

Figure 7.5 Record Display

Figure 7.6 Directory details

7.2 Advantages of the project


The project provides the ability to store student records in a database (a file) by using one of the best file structure concepts for data storage and access, namely Extendible Hashing.

The project makes it possible to observe the exact moment at which the directory expands or collapses.

It provides a function to view the current space utilization.

For direct access by key, the program provides better performance than B-tree and B+ tree implementations.

Hashing also provides fast access, usually with very little storage overhead, and is adaptable to most types of primary keys; it makes it possible to find a record with only one disk access.

Extendible Hashing allows the address space to grow and shrink dynamically along with the file, thus avoiding the need for overflow handling.

The trie model extends the use of the hashed value: each level added to the depth of a trie with a radix of 2 uses one more bit of the hashed value.

Space utilization is calculated to be approximately 69%, using the approximation formula given by Flajolet.
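
As an illustration of the space utilization figure, the sketch below computes utilization as the number of stored records divided by the total number of record slots in all buckets. The function names and parameters are hypothetical and are not taken from the project source. The value of about 69% agrees with ln 2 (approximately 0.693), which is the commonly cited average bucket utilization for extendible hashing; the sketch does not reproduce any particular author's formula.

```cpp
#include <cmath>

// Hypothetical helper: fraction of allocated record slots that are in use.
// recordCount    = number of records stored in the file (r)
// bucketCapacity = maximum number of records per bucket (b)
// bucketCount    = number of buckets currently allocated (N)
double spaceUtilization(long recordCount, long bucketCapacity, long bucketCount) {
    if (bucketCapacity <= 0 || bucketCount <= 0) return 0.0;   // nothing allocated
    return static_cast<double>(recordCount)
         / (static_cast<double>(bucketCapacity) * static_cast<double>(bucketCount));
}

// Reference value for comparison: ln 2, roughly 0.693.
double expectedAverageUtilization() {
    return std::log(2.0);
}
```

For example, 69 records stored in 10 buckets of capacity 10 give a utilization of 0.69, which is close to the reference value.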

7.3 Limitations of the project

The program does not support input devices such as a mouse.

The size of the program is extremely large compared to the functionality it provides.

The program has been implemented in C++; a Java implementation would likely have been considerably smaller because of Java's many built-in library functions.

The program does not use newer concepts such as multithreading in its implementation.

The program, being implemented in C++, is machine dependent and must be recompiled each time it is moved to a different platform.

To cover the address space effectively, a larger number of bits of the hashed value must be used.

A complete binary tree has to be formed from the trie.

To accommodate the use of even a single new bit, the address space has to be doubled because of the splitting of the bucket.

The records of two buckets can be combined only if the buckets are buddy buckets.
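
The last few limitations concern how bits of the hashed value form directory addresses and how buddy buckets are identified. The sketch below follows one common textbook convention, in which the low-order bits of the hash are taken in reverse order so that the most recently added bit becomes the least significant bit of the address. The function names are hypothetical, and the project's own routines may differ.

```cpp
#include <cstdint>

// Hypothetical helper: builds a directory address from the lowest `depth`
// bits of the hashed value, taken in reverse order. Each additional level
// of the radix-2 trie consumes one more bit, so each extra bit doubles the
// number of possible addresses.
std::uint32_t makeAddress(std::uint32_t hashValue, unsigned depth) {
    std::uint32_t address = 0;
    for (unsigned i = 0; i < depth && i < 32; ++i) {
        address = (address << 1) | (hashValue & 1u);  // append the next hash bit
        hashValue >>= 1;
    }
    return address;
}

// Hypothetical helper: the buddy of a bucket is the bucket whose address
// differs only in the most recently added bit. A bucket has a buddy only if
// the directory depth is greater than zero and the bucket uses all the bits
// of the directory. Returns -1 when there is no buddy.
std::int64_t findBuddy(std::uint32_t address, unsigned bucketDepth, unsigned directoryDepth) {
    if (directoryDepth == 0) return -1;            // single cell: no buddy exists
    if (bucketDepth < directoryDepth) return -1;   // bucket is shared by several cells
    return static_cast<std::int64_t>(address ^ 1u);
}
```

With a hashed value of 11 (binary 1011), the addresses at depths 1, 2 and 3 are 1, 3 and 6. At depth 3 the bucket at address 6 has the buddy at address 7, and only these two buckets are candidates for combining.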


Chapter 8

CONCLUSION

8.1 Future Enhancement

The project can be enhanced further by making the bucket size dynamic.

The implementation can also be extended to graphically represent the changes that happen after each insertion and deletion.

The user input forms can be made more user friendly by providing exit options wherever possible. Integrating mouse support into the program would also make it much easier for the user to interact with it.

Further, multithreading capabilities can be provided, and the program can be made to run over the network, storing the file (database) on the server.

Even though the program has been modularized, it still contains redundant code. There are many places in the source program where redundant code can be eliminated and where related code can be integrated into a single module.

The display functionality can be modified to show multiple records on a single screen and thereby use the display area effectively.




APPENDIX-A

LIST OF ACRONYMS

1. FS: File Structures

2. USN: University Serial Number

3. SRS: Software Requirement Specification

4. OS: Operating System

5. RAM: Random Access Memory

6. SQL: Structured Query Language

7. ID: Identifier

8. CD-ROM: Compact Disc Read-Only Memory


APPENDIX-B

CODING

B.1 Student Database

//student.h

#include "C:\tc\hash\delim.cpp"

class Student
{
  public:
    char URN[13];
    char Lname[11];
    char Fname[21];
    char Address[50];
    char Semester[2];
    char Branch[6];
    char College[11];

    Student();
    static int InitBuffer (DelimFieldBuffer &);
    void Clear ();
    int Unpack (IOBuffer &);
    int Pack (IOBuffer &) const;
    void Print (ostream &, char *label = 0) const;
    int Search(char *);
    int Append(char *);
};

//main.cpp

#include<iostream.h>
#include "c:\tc\hash\student.h"

#define TRUE 1
#define FALSE 0

Student :: Student ()
{
    Clear();
}

void Student :: Clear()
{
    // Set each field to an empty string
    URN[0] = 0;
    Lname[0] = 0;
    Fname[0] = 0;
    Address[0] = 0;
    Semester[0] = 0;
    Branch[0] = 0;
    College[0] = 0;
}

int Student :: Pack (IOBuffer & Buffer) const
{
    int numBytes;
    Buffer.Clear();
    numBytes = Buffer.Pack(URN);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Lname);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Fname);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Address);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Semester);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Branch);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(College);
    if (numBytes == -1) return FALSE;
    return TRUE;
}

int Student :: Unpack(IOBuffer & Buffer)
// all the fields are unpacked from the buffer
{
    Clear();
    int numBytes;
    numBytes = Buffer.Unpack(URN);
    if (numBytes == -1) return FALSE;
    URN[numBytes] = 0;
    numBytes = Buffer.Unpack(Lname);
    if (numBytes == -1) return FALSE;
    Lname[numBytes] = 0;
    numBytes = Buffer.Unpack(Fname);
    if (numBytes == -1) return FALSE;
    Fname[numBytes] = 0;
    numBytes = Buffer.Unpack(Address);
    if (numBytes == -1) return FALSE;
    Address[numBytes] = 0;
    numBytes = Buffer.Unpack(Semester);
    if (numBytes == -1) return FALSE;
    Semester[numBytes] = 0;
    numBytes = Buffer.Unpack(Branch);
    if (numBytes == -1) return FALSE;
    Branch[numBytes] = 0;
    numBytes = Buffer.Unpack(College);
    if (numBytes == -1) return FALSE;
    College[numBytes] = 0;
    return TRUE;
}

int Student :: InitBuffer (DelimFieldBuffer & Buffer)
{
    return TRUE;
}

void Student :: Print(ostream & stream, char * label) const
{
    gotoxy(3,4);
    if (label == 0) stream << "Student:";
    else stream << label;
    gotoxy(3,5);
    stream << "Reg-no : " << URN ;
    gotoxy(3,6);
    stream << "Last Name : " << Lname;
    gotoxy(3,7);
    stream << "First Name: " << Fname;
    gotoxy(3,8);
    stream << "Address : " << Address;
    gotoxy(3,9);
    stream << "Semester : " << Semester;
    gotoxy(3,10);
    stream << "Branch : " << Branch;
    gotoxy(3,11);
    stream << "College : " << College;
    stream<<flush;
}

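int Student :: Search(char *myfile)
{
    // Scan the file sequentially for a record whose URN matches this object's URN.
    // Returns the record address plus 1 if found, or 0 if the end of file is reached.
    fstream file(myfile,ios::in);
    Student s1;
    while(1)
    {
        DelimFieldBuffer :: SetDefaultDelim('|');
        DelimFieldBuffer Buff;
        int add=Buff.Read(file);
        if (add==-1) return 0;
        s1.Unpack(Buff);
        if( strcmpi(s1.URN,URN)==0) return add+1;
    }
}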

int Student :: Append(char *myfile)
{
    DelimFieldBuffer :: SetDefaultDelim('|');
    DelimFieldBuffer Buff;
    Student :: InitBuffer(Buff);
    Pack(Buff);
    fstream file(myfile,ios::in|ios::out);
    file.seekp(0,ios::end);
    file.seekg(0,ios::end);
    int recaddr=Buff.Write(file);
    file.close();
    return recaddr;
}
