SEMINAR ON HASHING. PRESENTED TO: MS. MANISHA. PRESENTED BY: RUCHIKA (6), RITIKA (11), MCA 1. S.S.D WOMEN’S INSTITUTE OF TECHNOLOGY, BATHINDA

Hashing in DS

Page 1: Hashing in DS

SEMINAR ON HASHING

PRESENTED TO: MS. MANISHA

PRESENTED BY: RUCHIKA (6), RITIKA (11), MCA 1

S.S.D WOMEN’S INSTITUTE OF TECHNOLOGY, BATHINDA (AFFILIATED TO PUNJABI UNIVERSITY,PATIALA)

Page 2: Hashing in DS

CONTENTS

What is hashing?

Inserting, deleting and searching a record

Hash functions and their methods

Collision

Two ways of collision resolution:

1. Open addressing

2. Chaining

Page 3: Hashing in DS

Hashing

Hashing is a common approach to the storing/searching problem.

A collection of data is stored, and each data item has a key associated with it.

Page 4: Hashing in DS

What is a Hash Table?

The simplest kind of hash table is an array of records.

This example has 701 records.

[Figure: an array of records, with slots indexed [0] through [700].]

Page 5: Hashing in DS

What is a Hash Table?

Each record has a special field, called its key.

In this example, the key is a long integer field called Number.

[Figure: the array of records; the record in slot [4] has Number 506643548.]

Page 6: Hashing in DS

What is a Hash Table?

The number might be a person's identification number, and the rest of the record has information about the person.

[Figure: the array of records; the record in slot [4] has Number 506643548.]

Page 7: Hashing in DS

What is a Hash Table?

When a hash table is in use, some spots contain valid records, and other spots are "empty".

[Figure: the array of records; four slots hold Number 506643548, Number 233667136, Number 281942902 and Number 155778322, and the remaining slots are empty.]

Page 8: Hashing in DS

The general idea of using the key to determine the address of a record is an excellent idea, but it must be modified so that a great deal of space is not wasted.

This modification takes the form of a function H from the set K of keys into the set L of memory addresses.

H : K → L (hash function)

Chop is a technique in which the pieces of key K are combined to form the hash address H(k).

[Figure: a key split into pieces key1 and key2, with key1 split further into key1.1 and key1.2.]

Page 9: Hashing in DS

Methods of Hash Functions

Division method: Choose a number m larger than the number n of keys in K.

The hash function H is defined by

H(k) = k (mod m) or H(k) = k (mod m) + 1

Here k (mod m) denotes the remainder when k is divided by m. The second formula is used when we want the hash addresses to range from 1 to m rather than from 0 to m-1.

Page 10: Hashing in DS

Example of division method

Each of 68 employees is assigned a unique 4-digit employee number. Suppose L consists of 100 two-digit addresses: 00, 01, 02, …, 99. We apply the division method to each of the employee numbers 3205, 7148 and 2345.

1. Choose a prime number m close to 99, such as m = 97. Then

H(3205) = 4, H(7148) = 67, H(2345) = 17.

2. If the addresses are to range from 1 to m (the second formula), then

H(3205) = 4 + 1 = 5

H(7148) = 67 + 1 = 68

H(2345) = 17 + 1 = 18
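The division-method example can be checked with a few lines of Python. This is only a sketch: the function name `hash_division` and the `one_based` flag are illustrative choices, not part of the seminar.

```python
def hash_division(k: int, m: int, one_based: bool = False) -> int:
    """Division method: k mod m, or k mod m + 1 when addresses run from 1 to m."""
    h = k % m  # remainder when k is divided by m
    return h + 1 if one_based else h


# Employee numbers from the example, with the prime m = 97.
for k in (3205, 7148, 2345):
    print(k, hash_division(k, 97), hash_division(k, 97, one_based=True))
# prints:
# 3205 4 5
# 7148 67 68
# 2345 17 18
```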

Page 11: Hashing in DS

Mid square method

The key k is squared. Then the hash function H is defined by

H(k) = l

Here l is obtained by deleting digits from both ends of k*k. In the example, the fourth and fifth digits of k*k, counting from the right, are kept.

Example:

k:      3205         7148         2345
k*k:    10 272 025   51 093 904   5 499 025
H(k):   72           93           99
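A short Python sketch of the mid-square example follows. It assumes, as the three results suggest, that the digits kept are the fourth and fifth from the right of k*k; the name `hash_mid_square` is illustrative.

```python
def hash_mid_square(k: int) -> int:
    """Square k, drop the last three digits, then keep the next two."""
    return (k * k // 1000) % 100


for k in (3205, 7148, 2345):
    print(k, k * k, hash_mid_square(k))
# prints:
# 3205 10272025 72
# 7148 51093904 93
# 2345 5499025 99
```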

Page 12: Hashing in DS

Folding method

The key k is partitioned into a number of parts k1, k2, …, kr.

The parts are added together, ignoring the last carry.

H(k) = k1 + k2 + … + kr

Example:

H(3205) = 32 + 05 = 37

H(7148) = 71 + 48 = 19 (the sum is 119; the leading digit 1 is ignored)

H(2345) = 23 + 45 = 68
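The folding example can be reproduced with the sketch below. It assumes two-digit parts, as in the example, and treats "ignoring the last carry" as reducing the sum modulo 100; the name `hash_folding` and the `part_digits` parameter are illustrative.

```python
def hash_folding(k: int, part_digits: int = 2) -> int:
    """Split k into groups of part_digits digits, add them, drop the final carry."""
    base = 10 ** part_digits
    total = 0
    while k > 0:
        total += k % base  # take the rightmost group of digits
        k //= base         # and remove it from the key
    return total % base    # ignoring the last carry


for k in (3205, 7148, 2345):
    print(k, hash_folding(k))
# prints:
# 3205 37
# 7148 19
# 2345 68
```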

Page 13: Hashing in DS

Inserting a New Record

In order to insert a new record, the key must somehow be converted to an array index.

The index is called the hash value of the key.

[Figure: the array of records, with a new record, Number 580625685, waiting to be inserted.]

Page 15: Hashing in DS

Inserting a New Record

Typical way to create a hash value:

[Figure: the array of records, with a new record, Number 580625685, waiting to be inserted.]

(Number mod 701)

What is (580625685 mod 701)? It is 3.
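The arithmetic can be confirmed in Python. The sketch below applies the same rule, Number mod 701, to the new record and to the four records already shown in the array; the names `TABLE_SIZE` and `hash_value` are illustrative.

```python
TABLE_SIZE = 701  # slots [0] through [700]


def hash_value(number: int) -> int:
    """Hash value of a record: its Number field modulo the table size."""
    return number % TABLE_SIZE


for number in (580625685, 506643548, 233667136, 281942902, 155778322):
    print(number, "->", hash_value(number))
# prints:
# 580625685 -> 3
# 506643548 -> 4
# 233667136 -> 2
# 281942902 -> 1
# 155778322 -> 700
```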

Page 16: Hashing in DS

Inserting a New Record

The hash value is used for the location of the new record.

[Figure: the new record, Number 580625685, is directed to slot [3] of the array of records.]

Page 17: Hashing in DS

Inserting a New Record

The hash value is used for the location of the new record.

[Figure: the array of records, with Number 580625685 now stored in slot [3].]

Page 18: Hashing in DS

Collisions

Here is another new record to insert, with a hash value of 2.

[Figure: the array of records; another new record, Number 701466868, says "My hash value is [2]."]

Page 19: Hashing in DS

Collisions

This is called a collision, because there is already another valid record at [2].

[Figure: the array of records, with the new record Number 701466868 blocked at slot [2].]

When a collision occurs, move forward until you find an empty spot.

Page 22: Hashing in DS

Collisions

This is called a collision, because there is already another valid record at [2].

[Figure: the array of records, with Number 701466868 now stored in the first empty spot after [2].]

The new record goes in the empty spot.
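The insertion steps, including the "move forward" rule for collisions, can be sketched in Python as follows. The names `insert`, `TABLE_SIZE` and `EMPTY` are illustrative, and wrapping from [700] back to [0] is an assumption, since the slides only say to move forward.

```python
TABLE_SIZE = 701
EMPTY = None


def insert(table, number):
    """Store number at its hash slot, or at the next empty slot after it."""
    index = number % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] is EMPTY:
            table[index] = number
            return index
        index = (index + 1) % TABLE_SIZE  # assumed: wrap around at the end
    raise RuntimeError("hash table is full")


table = [EMPTY] * TABLE_SIZE
for number in (506643548, 233667136, 281942902, 155778322, 580625685):
    insert(table, number)

# 701466868 hashes to 2, but slots 2, 3 and 4 are already occupied.
slot = insert(table, 701466868)
print(slot)  # 5
```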

Page 23: Hashing in DS

Searching for a Key

The data that's attached to a key can be found fairly quickly.

[Figure: the array of records; a search begins for the key Number 701466868.]

Page 24: Hashing in DS

Searching for a Key

Calculate the hash value.

Check that location of the array for the key.

[Figure: the search key Number 701466868 has hash value [2]; the record at [2] answers "Not me."]

Page 25: Hashing in DS

Searching for a Key

Keep moving forward until you find the key, or you reach an empty spot.

[Figure: the search for Number 701466868 moves forward from [2]; the next record also answers "Not me."]

Page 27: Hashing in DS

Searching for a Key

Keep moving forward until you find the key, or you reach an empty spot.

[Figure: the search reaches the record with Number 701466868, which answers "Yes!"]

Page 28: Hashing in DS

Searching for a Key

When the item is found, the information can be copied to the necessary location.

[Figure: the record with Number 701466868 has been found.]
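The search procedure can be sketched in Python as below. The table is filled by hand at the slots given by Number mod 701, with 701466868 in slot 5 because slots 2 to 4 were taken; the names `search`, `TABLE_SIZE` and `EMPTY` are illustrative, and wrap-around at the end of the array is an assumption.

```python
TABLE_SIZE = 701
EMPTY = None


def search(table, number):
    """Return the slot holding number, or None if an empty spot is reached first."""
    index = number % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] is EMPTY:
            return None                   # reached an empty spot: key is absent
        if table[index] == number:
            return index                  # "Yes!"
        index = (index + 1) % TABLE_SIZE  # "Not me.": keep moving forward
    return None


table = [EMPTY] * TABLE_SIZE
table[1], table[2], table[3] = 281942902, 233667136, 580625685
table[4], table[5], table[700] = 506643548, 701466868, 155778322

print(search(table, 701466868))  # 5
print(search(table, 999))        # None
```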

Page 29: Hashing in DS

Deleting a Record

Records may also be deleted from a hash table.

[Figure: the array of records; the record with Number 506643548 says "Please delete me."]

Page 30: Hashing in DS

Deleting a Record

Records may also be deleted from a hash table. But the location must not be left as an ordinary "empty spot", since that could interfere with searches.

[Figure: the array of records after Number 506643548 has been removed.]

Page 31: Hashing in DS

Deleting a Record

[Figure: the array of records after the deletion.]

Records may also be deleted from a hash table. But the location must not be left as an ordinary "empty spot". The location must be marked in some special way so that a search can tell that the spot used to have something in it.
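One common way to mark a deleted location is a special marker value, often called a tombstone. The Python sketch below is one possible reading of the slide, not the presenters' own code; `DELETED`, `search` and `delete` are illustrative names, and wrap-around at the end of the array is an assumption.

```python
TABLE_SIZE = 701
EMPTY = None
DELETED = object()  # special marker: "this spot used to have something in it"


def search(table, number):
    """Linear-probe search that steps over DELETED markers."""
    index = number % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        entry = table[index]
        if entry is EMPTY:
            return None
        if entry is not DELETED and entry == number:
            return index
        index = (index + 1) % TABLE_SIZE
    return None


def delete(table, number):
    """Replace the record with the DELETED marker; report whether it was found."""
    index = search(table, number)
    if index is None:
        return False
    table[index] = DELETED
    return True


table = [EMPTY] * TABLE_SIZE
table[1], table[2], table[3] = 281942902, 233667136, 580625685
table[4], table[5], table[700] = 506643548, 701466868, 155778322

delete(table, 506643548)         # slot [4] becomes DELETED, not EMPTY
print(search(table, 701466868))  # 5
```

Had slot [4] been reset to `EMPTY` instead, the search for 701466868 would have stopped at [4] and wrongly reported the key as missing.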

Page 32: Hashing in DS

COLLISION RESOLUTION

There are two general ways of resolving collisions.

The particular procedure that one chooses depends on many factors. One important factor is the ratio of the number n of keys in K (which is the number of records in F) to the number m of hash addresses in L. This ratio, λ = n/m, is called the load factor.

n = number of filled records.

m = total number of memory locations.
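As a small worked example of the ratio n/m, the Python sketch below computes the load factor for the 701-slot array used earlier, assuming it holds six records (the four original ones plus the two insertions). The function name `load_factor` is illustrative.

```python
def load_factor(n: int, m: int) -> float:
    """Load factor: number of filled records divided by number of locations."""
    return n / m


print(load_factor(6, 701))    # a nearly empty table, well below 0.01
print(load_factor(350, 700))  # 0.5, a half-full table
```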

Page 33: Hashing in DS

The efficiency of a hash function with a collision resolution procedure is measured by the average number of probes (key comparisons) needed to find the location of the record with a given key k. The efficiency depends mainly on the load factor λ. Specifically, we are interested in the following two quantities:

S(λ) = average number of probes for a successful search.

U(λ) = average number of probes for an unsuccessful search.

Page 34: Hashing in DS

Open addressing

If the home slot for the record that is being inserted is already occupied, then simply choose a different location within the table.

But…how do we choose this alternate location? The technique must be reproducible, and on average be cheap.

Page 35: Hashing in DS

Linear Probing

Linear probing involves simply walking down the table until an empty slot is found.

Page 36: Hashing in DS

Drawbacks of linear probing

The major drawback of linear probing is that, as the table becomes about half full, there is a tendency toward clustering.

The sequential searches needed to find an empty position become longer and longer.

Example: suppose location a is occupied and location b, just after it, is empty. If a new insertion hashes to location b, then it will go there, but if it hashes to location a, then it will also go into b.

[Figure: a row of adjacent table locations a, b, c, d, e.]

Page 37: Hashing in DS

The problem of clustering is essentially one of instability. If a few keys happen randomly to be near each other, then it becomes more and more likely that other keys will join them, and the distribution will become progressively more unbalanced.

Page 38: Hashing in DS

Avoid the problem of clustering

Quadratic probing: Suppose a record R with key k has the hash address H(k) = h. Then, instead of searching the locations with addresses h, h+1, h+2, …, we search the locations with addresses

h, h+1, h+4, h+9, h+16, …, h+i^2, …

(taken mod m). If the number m of locations in the table T is a prime number, then the above sequence will access about half of the locations in T.
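The claim about prime table sizes can be observed on a small case. The Python sketch below lists the distinct addresses reached by the sequence h + i^2, reduced mod m; the function name and the choice h = 3, m = 11 are illustrative.

```python
def quadratic_probes(h: int, m: int):
    """Distinct addresses (h + i*i) mod m, in the order they are first visited."""
    seen = []
    for i in range(m):
        addr = (h + i * i) % m
        if addr not in seen:
            seen.append(addr)
    return seen


probes = quadratic_probes(3, 11)
print(probes)       # [3, 4, 7, 1, 8, 6]
print(len(probes))  # 6 of the 11 locations
```

Because only about half of the table is ever probed, an insertion can fail to find a free slot even when the table is not full.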

Page 39: Hashing in DS

Double Hashing

A second hash function H’ is used for resolving a collision. Suppose a record R with key k has the hash addresses H(k) = h and H’(k) = h’ ≠ m. Then we search the locations with addresses

h, h+h’, h+2h’, h+3h’, …

(taken mod m). If m is a prime number, then the above sequence will access all the locations in the table T.
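A small Python sketch shows the double-hashing probe sequence covering the whole table when m is prime. The values h = 3, h' = 4 and m = 11 are arbitrary illustrative choices, and the step h' is assumed to be between 1 and m-1.

```python
def double_hash_probes(h: int, step: int, m: int):
    """Addresses h, h+step, h+2*step, ... reduced mod m, for m probes."""
    return [(h + i * step) % m for i in range(m)]


probes = double_hash_probes(3, 4, 11)
print(probes)                             # [3, 7, 0, 4, 8, 1, 5, 9, 2, 6, 10]
print(sorted(probes) == list(range(11)))  # True: every location is visited
```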

Page 40: Hashing in DS

Chaining

Design the table so that each slot is actually a container that can hold multiple records.

Here, the "chains" are linked lists, which can hold any number of colliding records. Alternatively, each table slot could be large enough to store several records directly… in that case the slot may overflow, requiring a fallback…
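A minimal sketch of chaining in Python is given below. It uses a Python list per slot as a stand-in for a linked list, and reuses the Number mod 701 hash from the earlier slides; the class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table in which every slot holds a chain of colliding records."""

    def __init__(self, size: int = 701):
        self.slots = [[] for _ in range(size)]  # one (initially empty) chain per slot

    def _chain(self, number: int):
        return self.slots[number % len(self.slots)]

    def insert(self, number: int) -> None:
        chain = self._chain(number)
        if number not in chain:
            chain.append(number)

    def contains(self, number: int) -> bool:
        return number in self._chain(number)

    def delete(self, number: int) -> bool:
        chain = self._chain(number)
        if number in chain:
            chain.remove(number)  # no special marker is needed with chaining
            return True
        return False


table = ChainedHashTable()
table.insert(233667136)
table.insert(701466868)  # same hash value, so it joins the same chain
print(table.slots[2])    # [233667136, 701466868]
```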

Page 41: Hashing in DS

EXAMPLE

LIST (head of the chain for each hash address)

Address:  1  2  3  4  5  6  7  8  9  10  11
LIST:     8  3  0  5  7  0  0  2  0  0   6

Node  INFO  LINK
1     A     0
2     B     0
3     C     0
4     D     0
5     E     1
6     X     4
7     Y     0
8     Z     0
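The LIST, INFO and LINK values in this example can be read as chains. The Python sketch below assumes that addresses and node numbers start at 1 and that 0 means "no node", which is how the LINK values appear to be used (E links to node 1, X links to node 4); the function name `chain` is illustrative.

```python
# Index 0 is unused padding so that addresses and node numbers start at 1.
LIST = [None, 8, 3, 0, 5, 7, 0, 0, 2, 0, 0, 6]
INFO = [None, "A", "B", "C", "D", "E", "X", "Y", "Z"]
LINK = [None, 0, 0, 0, 0, 1, 4, 0, 0]


def chain(address: int):
    """Follow the links from LIST[address] and collect the INFO values."""
    items = []
    node = LIST[address]
    while node != 0:  # 0 marks the end of a chain
        items.append(INFO[node])
        node = LINK[node]
    return items


print(chain(4))   # ['E', 'A']
print(chain(11))  # ['X', 'D']
print(chain(3))   # []
```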