Hashing in DS


SEMINAR ON HASHING

PRESENTED TO: MS. MANISHA

PRESENTED BY: RUCHIKA (6), RITIKA (11), MCA 1

S.S.D WOMEN’S INSTITUTE OF TECHNOLOGY, BATHINDA (AFFILIATED TO PUNJABI UNIVERSITY,PATIALA)

CONTENTS

What is hashing?
Inserting, deleting and searching a record
Hash functions and their methods
Collision
Two ways of collision resolution
1. Open addressing
2. Chaining

Hashing

Hashing is a common approach to the storing/searching problem.

A collection of data is stored, and each data item has a key associated with it.

What is a Hash Table ?

The simplest kind of hash table is an array of records.

This example has 701 records.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] . . . [ 700 ]

An array of records


Each record has a special field, called its key.

In this example, the key is a long integer field called Number.

[ 4 ]  Number 506643548

The number might be a person's identification number, and the rest of the record has information about the person.


When a hash table is in use, some spots contain valid records, and other spots are "empty".

[ 0 ]    (empty)
[ 1 ]    Number 281942902
[ 2 ]    Number 233667136
[ 3 ]    (empty)
[ 4 ]    Number 506643548
[ 5 ]    (empty)
. . .
[ 700 ]  Number 155778322

The general idea of using the key to determine the address of a record is an excellent idea, but it must be modified so that a great deal of space is not wasted.

This modification takes the form of a function H from the set K of keys into the set L of memory addresses.

H : K → L (hash function)

Chop is a technique in which the pieces of key K are combined to form the hash address H(k).

(Figure: key is split into key1 and key2; key1 is split into Key1.1 and Key1.2.)

Methods of Hash Functions

Division method: Choose a number m larger than the number n of keys in K.

The hash function H is defined by

H(k) = k (mod m) or H(k) = k (mod m) + 1

Here k (mod m) denotes the remainder when k is divided by m. The second formula is used when we want the hash addresses to range from 1 to m rather than 0 to m-1.

Example of division method

Suppose each of 68 employees is assigned a unique 4-digit employee number. Suppose L consists of 100 two-digit addresses: 00, 01, 02, ..., 99. We apply the division method to each of the employee numbers 3205, 7148, 2345.

1. Choose a prime number m close to 99, such as m = 97. Then

H(3205) = 4, H(7148) = 67, H(2345) = 17.

2. With the second formula, H(k) = k (mod m) + 1:

H(3205) = 4 + 1 = 5

H(7148) = 67 + 1 = 68

H(2345) = 17 + 1 = 18
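The division example can be checked with a few lines of Python. This is a minimal sketch; the function name division_hash and the one_based flag are illustrative names, not part of the slides.

```python
def division_hash(k, m, one_based=False):
    """Division method: H(k) = k mod m, or k mod m + 1 for addresses 1..m."""
    h = k % m
    return h + 1 if one_based else h


# Employee numbers from the example, with the prime m = 97.
for k in (3205, 7148, 2345):
    print(k, division_hash(k, 97), division_hash(k, 97, one_based=True))
```
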

Mid square method

The key k is squared. Then the hash function H is defined by

H(k) = l

Here l is obtained by deleting digits from both ends of k*k. In the example below, the fourth and fifth digits of k*k, counting from the right, are kept.

Example:

k:      3205          7148          2345
k*k:    10 272 025    51 093 904    5 499 025
H(k):   72            93            99
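A short Python sketch of the mid square method. Which digits to keep is a design choice; this sketch assumes the fourth and fifth digits from the right, because that choice reproduces the three values in the example. The name mid_square_hash is illustrative.

```python
def mid_square_hash(k):
    """Mid square method for the example: keep two middle digits of k*k.

    Assumption: the 4th and 5th digits from the right are kept, which
    matches the values 72, 93 and 99 in the example.
    """
    square = k * k
    return (square // 1000) % 100  # drop 3 low digits, keep the next 2


for k in (3205, 7148, 2345):
    print(k, k * k, mid_square_hash(k))
```
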

Folding method

The key k is partitioned into a number of parts, k1, k2, ..., kr.

The parts are added together, ignoring the last carry.

H(k) = k1 + k2 + ... + kr

Example:

H(3205) = 32 + 05 = 37

H(7148) = 71 + 48 = 119, which gives 19 (the leading digit 1 is ignored).

H(2345) = 23 + 45 = 68
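The folding example can be sketched in Python as follows. The name folding_hash and the parameter part_digits are illustrative; the sketch assumes the key is cut into two-digit parts starting from the right, as in the example.

```python
def folding_hash(k, part_digits=2):
    """Folding method: split k into parts of part_digits digits, add the
    parts, and ignore the final carry."""
    base = 10 ** part_digits
    total = 0
    while k > 0:
        total += k % base  # take the lowest part
        k //= base         # move on to the next part
    return total % base    # dropping the carry keeps part_digits digits


for k in (3205, 7148, 2345):
    print(k, folding_hash(k))
```
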

Inserting a New Record

In order to insert a new record, the key must somehow be converted to an array index.

The index is called the hash value of the key.

New record to insert: Number 580625685

Typical way to create a hash value:

(Number mod 701)

What is (580625685 mod 701)? 3

The hash value is used for the location of the new record.

[ 0 ]    (empty)
[ 1 ]    Number 281942902
[ 2 ]    Number 233667136
[ 3 ]    Number 580625685
[ 4 ]    Number 506643548
[ 5 ]    (empty)
. . .
[ 700 ]  Number 155778322

Collisions

Here is another new record to insert, with a hash value of 2.

Number 701466868: My hash value is [2].

This is called a collision, because there is already another valid record at [2].

When a collision occurs, move forward until you find an empty spot.

The new record goes in the empty spot.

[ 0 ]    (empty)
[ 1 ]    Number 281942902
[ 2 ]    Number 233667136
[ 3 ]    Number 580625685
[ 4 ]    Number 506643548
[ 5 ]    Number 701466868
. . .
[ 700 ]  Number 155778322
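The insertion steps can be sketched in Python. The names EMPTY, hash_value and insert are illustrative. Wrapping around from [700] to [0] is an assumption; the slides only say "move forward".

```python
EMPTY = None
M = 701  # table size used in the slides


def hash_value(key, m=M):
    return key % m


def insert(table, key):
    """Insert key, moving forward from its hash value until an empty spot
    is found. Returns the index where the key was stored."""
    m = len(table)
    i = hash_value(key, m)
    for _ in range(m):
        if table[i] is EMPTY:
            table[i] = key
            return i
        i = (i + 1) % m  # assumption: wrap around at the end of the array
    raise RuntimeError("hash table is full")


table = [EMPTY] * M
for key in (506643548, 233667136, 281942902, 155778322, 580625685):
    insert(table, key)

# 701466868 hashes to [2]; [2], [3] and [4] are taken, so it lands in [5].
print(insert(table, 701466868))
```
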

Searching for a Key

The data that's attached to a key can be found fairly quickly.

Key to find: Number 701466868

Calculate the hash value. Check that location of the array for the key.

My hash value is [2]. The record at [2] answers: Not me.

Keep moving forward until you find the key, or you reach an empty spot.

The records at [3] and [4] answer: Not me. The record at [5] answers: Yes!

When the item is found, the information can be copied to the necessary location.
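A minimal Python sketch of the search, using the table contents from the slides. The name search and the return value -1 for "not found" are illustrative choices.

```python
EMPTY = None


def search(table, key):
    """Return the index of key, or -1 if an empty spot is reached first."""
    m = len(table)
    i = key % m
    for _ in range(m):
        if table[i] is EMPTY:
            return -1      # reached an empty spot: the key is not present
        if table[i] == key:
            return i       # "Yes!"
        i = (i + 1) % m    # "Not me.": keep moving forward
    return -1


# Table contents as in the slides.
table = [EMPTY] * 701
table[1] = 281942902
table[2] = 233667136
table[3] = 580625685
table[4] = 506643548
table[5] = 701466868
table[700] = 155778322

print(search(table, 701466868))  # probes [2], [3], [4], then finds it at [5]
```
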

Deleting a Record

Records may also be deleted from a hash table.

Number 506643548 (at [4]): Please delete me.

But the location must not be left as an ordinary "empty spot", since that could interfere with searches. For example, a search for 701466868 starts at [2] and would stop at an empty [4] before reaching the record at [5].

The location must be marked in some special way so that a search can tell that the spot used to have something in it.

[ 0 ]    (empty)
[ 1 ]    Number 281942902
[ 2 ]    Number 233667136
[ 3 ]    Number 580625685
[ 4 ]    (deleted)
[ 5 ]    Number 701466868
. . .
[ 700 ]  Number 155778322
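One common way to mark a deleted location is a special marker value, often called a tombstone. The sketch below assumes that approach; the names DELETED, search and delete are illustrative and not taken from the slides.

```python
EMPTY = None
DELETED = object()  # special marker: the spot used to hold a record


def search(table, key):
    m = len(table)
    i = key % m
    for _ in range(m):
        slot = table[i]
        if slot is EMPTY:
            return -1                  # a truly empty spot ends the search
        if slot is not DELETED and slot == key:
            return i
        i = (i + 1) % m                # step over records and DELETED marks
    return -1


def delete(table, key):
    """Replace the record with the DELETED mark. Returns True if found."""
    i = search(table, key)
    if i == -1:
        return False
    table[i] = DELETED
    return True


table = [EMPTY] * 701
table[1] = 281942902
table[2] = 233667136
table[3] = 580625685
table[4] = 506643548
table[5] = 701466868
table[700] = 155778322

delete(table, 506643548)
print(search(table, 701466868))  # still found at [5], past the mark at [4]
```
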

COLLISION RESOLUTION

There are two general ways of resolving collisions.

The particular procedure that one chooses depends on many factors. One important factor is the ratio of the number n of keys in K (which is the number of records in F) to the number m of hash addresses in L. This ratio, λ = n/m, is called the load factor.

n = number of filled records.

m = total number of memory locations.

The efficiency of a hash function with a collision resolution procedure is measured by the average number of probes (key comparisons) needed to find the location of the record with a given key k. The efficiency depends mainly on the load factor λ. Specifically, we are interested in the following two quantities:

S(λ) = average number of probes for a successful search.

U(λ) = average number of probes for an unsuccessful search.
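As a rough illustration, the load factor and an empirical average for successful searches can be computed for the six records used in the earlier slides. This sketch assumes linear probing with no deletions and fewer keys than slots, in which case finding a key takes as many probes as inserting it did. The function names are illustrative.

```python
def load_factor(n, m):
    """Load factor: n filled records over m memory locations."""
    return n / m


def average_successful_probes(keys, m):
    """Average probes to find each key after inserting all of them with
    linear probing. Assumes len(keys) < m and no deletions."""
    table = [None] * m
    total_probes = 0
    for key in keys:
        i = key % m
        probes = 1
        while table[i] is not None:
            i = (i + 1) % m
            probes += 1
        table[i] = key
        total_probes += probes
    return total_probes / len(keys)


keys = [506643548, 233667136, 281942902, 155778322, 580625685, 701466868]
print(load_factor(len(keys), 701))
print(average_successful_probes(keys, 701))  # five keys need 1 probe, one needs 4
```
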

Open addressing

If the home slot for the record that is being inserted is already occupied, then simply choose a different location within the table.

But…how do we choose this alternate location? The technique must be reproducible, and on average be cheap.

Linear Probing

Linear probing involves simply walking down the table until an empty slot is found.

Drawbacks of linear probing

The major drawback of linear probing is that, as the table becomes about half full, there is a tendency toward clustering.

The sequential searches needed to find an empty position become longer and longer.

Example: suppose location a is occupied and location b is empty. If a new insertion hashes to location b, then it will go there, but if it hashes to location a, then it will also go into b.

a b c d e

The problem of clustering is essentially one of instability. If a few keys happen randomly to be near each other, then it becomes more and more likely that other keys will join them, and the distribution will become progressively more unbalanced.

Avoid the problem of clustering

Quadratic probing: Suppose a record R with key k has the hash address H(k) = h. Then, instead of searching the locations with addresses h, h+1, h+2, ..., we linearly search the locations with addresses

h, h+1, h+4, h+9, h+16, ..., h+i^2, ...

If the number m of locations in the table T is a prime number, then the above sequence will access half of the locations in T.
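The "half of the locations" claim can be checked for a small prime. The sketch below assumes addresses are taken modulo m; the function name and the values m = 11, h = 3 are illustrative.

```python
def quadratic_probe_sequence(h, m):
    """Addresses h, h+1, h+4, h+9, ..., taken modulo m, for i = 0..m-1."""
    return [(h + i * i) % m for i in range(m)]


m = 11  # a small prime, chosen only for illustration
sequence = quadratic_probe_sequence(3, m)
visited = set(sequence)
print(sequence)
print(len(visited), "of", m, "locations visited")
```

For m = 11 the sequence reaches 6 distinct locations, that is (m + 1) / 2, so roughly half of the table.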

Double Hashing

A second hash function H' is used for resolving a collision. Suppose a record R with key k has the hash addresses H(k) = h and H'(k) = h' ≠ m. Then we linearly search the locations with addresses

h, h+h', h+2h', h+3h', ...

If m is a prime number, then the above sequence will access all the locations in the table T.
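A small Python sketch of the double hashing probe sequence. The second hash function shown here, 1 + k mod (m - 1), is one common choice and is an assumption; the slides do not say which H' is used. It always gives a step between 1 and m - 1.

```python
def first_hash(k, m):
    return k % m


def second_hash(k, m):
    # Assumption: a common choice that never returns 0 or m.
    return 1 + k % (m - 1)


def double_hash_sequence(k, m):
    """Addresses h, h+h', h+2h', ..., taken modulo m."""
    h = first_hash(k, m)
    step = second_hash(k, m)
    return [(h + i * step) % m for i in range(m)]


m = 11  # a small prime, chosen only for illustration
sequence = double_hash_sequence(3205, m)
print(sequence)
print(sorted(sequence) == list(range(m)))  # every location is visited once
```
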

Chaining

Design the table so that each slot is actually a container that can hold multiple records.

Here, the "chains" are linked lists which could hold any number of colliding records. Alternatively, each table slot could be large enough to store several records directly; in that case the slot may overflow, requiring a fallback…
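A minimal sketch of chaining in Python. For brevity each chain is a Python list rather than a hand-built linked list, and the hash function is assumed to be key mod m. The class and method names are illustrative.

```python
class ChainedHashTable:
    def __init__(self, m):
        # One chain (container) per table slot.
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        return self.slots[key % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # replace the record for this key
                return
        chain.append((key, value))       # colliding keys share the chain

    def search(self, key):
        for existing_key, value in self._chain(key):
            if existing_key == key:
                return value
        return None


table = ChainedHashTable(701)
table.insert(233667136, "first record")
table.insert(701466868, "second record")  # same slot [2], same chain
print(table.search(701466868))
```
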

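EXAMPLE

LIST
[ 1 ]    8
[ 2 ]    3
[ 3 ]    0
[ 4 ]    5
[ 5 ]    7
[ 6 ]    0
[ 7 ]    0
[ 8 ]    2
[ 9 ]    0
[ 10 ]   0
[ 11 ]   6

         INFO   LINK
[ 1 ]    A      0
[ 2 ]    B      0
[ 3 ]    C      0
[ 4 ]    D      0
[ 5 ]    E      1
[ 6 ]    X      4
[ 7 ]    Y      0
[ 8 ]    Z      0
[ 9 ]           10
[ 10 ]          11
[ 11 ]          12

Each LIST entry gives the position of the first record in a chain (0 means the chain is empty), and LINK gives the position of the next record in the same chain.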
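The chains in this example can be followed with a short Python sketch. It assumes the arrays are indexed from 1 and that 0 marks the end of a chain, which is how the LIST and LINK values above fit together; index 0 of each Python list is padding.

```python
# Index 0 is unused padding so that positions match the 1-based example.
LIST = [0, 8, 3, 0, 5, 7, 0, 0, 2, 0, 0, 6]
INFO = [None, "A", "B", "C", "D", "E", "X", "Y", "Z"]
LINK = [0, 0, 0, 0, 0, 1, 4, 0, 0]


def chain(address):
    """Return the records stored in the chain for a hash address."""
    items = []
    node = LIST[address]
    while node != 0:           # 0 marks the end of the chain
        items.append(INFO[node])
        node = LINK[node]
    return items


for address in range(1, 12):
    print(address, chain(address))
```

Addresses 4 and 11 each hold two colliding records: E followed by A, and X followed by D.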
