
Chapter 5 Hashing

• General ideas

• Methods of implementing the hash table

• Comparison among these methods

• Applications of hashing

• Compare hash tables with binary search trees


5.1 General Ideas

• A hash table is a fixed-size array (of TableSize cells) containing keys.

• Each key is mapped into some number in the range 0 to TableSize - 1, and placed in the appropriate cell.


• The mapping is called a hash function. Ideally it should be simple to compute and should ensure that any two distinct keys get different cells. Since there are usually far more possible keys than cells, this is impossible in general, so the hash function should instead distribute the keys evenly among the cells.

• A collision occurs when two or more keys are mapped to the same cell.


5.2 Hash Function

Simple Hash Function

• For numeric keys, one simple hash function is Key mod TableSize, where TableSize is a prime number.

• Assume the key values have 9 digits, and there are 2500 keys. To reduce collisions, choose the table size so that the load factor (number of keys divided by TableSize) is about 50%, i.e., a table of about 5000 cells.


Select TableSize to be 4999, a prime number close to 5000.

Key Value    Address      Key Value    Address
123456789    1485         987654118    1688 **
123456790    1486         555555555    1688 **
000000504    0504 *       101129183    4412
200120472    0504 *       200120473    0505
118920912    4700         010600010    2130
200120000    0032         027001191    1592

* and ** indicate collisions


Hash by Folding

• Partition the key into several parts, usually 3 parts of about equal length.

• Partitions are folded over each other and summed.

• The remainder of the sum divided by TableSize is the hash value.


• Example (2-4-3 folding is used instead of 3-3-3 to illustrate folding, and TableSize is set to 10000 for ease of illustration). The digits of the first and last parts are reversed (folded over), the first part is padded on the right to 4 digits, and the three parts are summed.

Key Value    Folding           Address
123456789    2100+3456+0987    6543
987654321    8900+7654+0123    6677
123456790    2100+3456+0097    5653
555555555    5500+5555+0555    1610 *
000000472    0000+0000+0274    0274
101129183    0100+1129+0381    1610 *
200120472    0200+0120+0274    0594
200120473    0200+0120+0374    0694
118920912    1100+8920+0219    0239
010600010    1000+0600+0010    1610 *
200120000    0200+0120+0000    0320
027001191    2000+7001+0191    9192

* indicates a collision


Mid-Square Method

• The key is multiplied by itself (squared).

• The middle few digits of the result are used as the hash value.

• The exact number of digits to be used depends on the size of the table.


• Suppose the key is 12345.

• 12345^2 = 152 399 025

• The middle 3 digits, 399, are extracted.

• If TableSize is 200, then the hash value is 399 mod 200 = 199.

• Avoid the situation where the middle digits are zeros.


Character Keys

• One simple method to convert keys to numbers is to add up the ASCII values of the characters in the string, e.g., the string HongKong becomes 795 (72+111+110+103+75+111+110+103)

typedef unsigned int Index;

/* Fig 5.3: add up the character codes of the key */
Index Hash1(const char *Key, int TableSize)
{
    unsigned int HashVal = 0;

    while (*Key != '\0')
        HashVal += (unsigned char) *Key++;   /* avoid negative values for non-ASCII chars */
    return HashVal % TableSize;
}


5.3 Separate Chaining

• Keep a list of all elements that hash to the same value

• Example: Hash (X) = X mod 10, with new elements inserted at the end of the list, and the data sequence 0, 4, 9, 16, 25, 36, 49, 64, 81


• Type declaration for separate chaining

/* Fig 5.7 */
#ifndef _HashSep_H
#define _HashSep_H

struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;

HashTable InitializeTable(int TableSize);
void DestroyTable(HashTable H);
Position Find(ElementType Key, HashTable H);
void Insert(ElementType Key, HashTable H);
ElementType Retrieve(Position P);
/* Routines such as Delete and MakeEmpty are omitted */

#endif /* _HashSep_H */

struct ListNode
{
    ElementType Element;
    Position Next;
};

typedef Position List;

/* TheLists is an array of lists, allocated during initialization */
struct HashTbl
{
    int TableSize;
    List *TheLists;
};

• Initialization routine for separate chaining

/* Fig 5.8 */
HashTable
InitializeTable(int TableSize)
{
    HashTable H;
    int i;

    if (TableSize < MinTableSize)
    {
        Error("Table size too small");
        return NULL;
    }

    /* Allocate table */
    H = malloc(sizeof(struct HashTbl));
    if (H == NULL)
        FatalError("Out of space!!!");

    H->TableSize = NextPrime(TableSize);

    /* Allocate array of lists */
    H->TheLists = malloc(sizeof(List) * H->TableSize);
    if (H->TheLists == NULL)
        FatalError("Out of space!!!");

    /* Allocate list headers */
    for (i = 0; i < H->TableSize; i++)
    {
        H->TheLists[i] = malloc(sizeof(struct ListNode));
        if (H->TheLists[i] == NULL)
            FatalError("Out of space!!!");
        else
            H->TheLists[i]->Next = NULL;
    }

    return H;
}

• Find routine for separate chaining

/* Fig 5.9 */
Position
Find(ElementType Key, HashTable H)
{
    Position P;
    List L;

    L = H->TheLists[Hash(Key, H->TableSize)];
    P = L->Next;
    while (P != NULL && P->Element != Key)
        /* Probably need strcmp!! */
        P = P->Next;
    return P;
}

• Insert routine for separate chaining

/* Fig 5.10 */
void
Insert(ElementType Key, HashTable H)
{
    Position Pos, NewCell;
    List L;

    Pos = Find(Key, H);
    if (Pos == NULL)   /* Key is not found */
    {
        NewCell = malloc(sizeof(struct ListNode));
        if (NewCell == NULL)
            FatalError("Out of space!!!");
        else
        {
            L = H->TheLists[Hash(Key, H->TableSize)];
            NewCell->Next = L->Next;
            /* Probably need strcpy! */
            NewCell->Element = Key;
            L->Next = NewCell;
        }
    }
}


• The effort required to perform a search is the constant time required to evaluate the hash function plus the time to traverse the list.

• Average list length = $\lambda$ (the load factor, i.e., the number of elements divided by TableSize)

• A successful search requires about $1 + \lambda/2$ links to be traversed.

• An unsuccessful search requires about $1 + \lambda$ links to be traversed.


• A general rule is to make the table size about as large as the expected number of elements (load factor close to 1).

• The chains could be linked lists or trees.

• A disadvantage of separate chaining is that it requires a second data structure for the chains, and time is required to allocate new cells on insertion.


5.4 Open Addressing

• If a collision occurs, alternative cells are tried until an empty cell is found.

• Cells h_0(X), h_1(X), h_2(X), ... are tried in succession, where h_i(X) = (Hash(X) + F(i)) mod TableSize, with F(0) = 0. The function F is the collision resolution strategy.

• Load factor should be below 0.5.

5.4.1 Linear Probing

• Try consecutive locations (with wraparound), i.e., F(i) = i.

• Example: Key sequence 89, 18, 49, 58, 69


• Primary clustering: blocks of occupied cells start forming. Any key that hashes into a cluster will require several attempts to resolve the collision, and then it will add to the cluster.

• Expected number of probes for a successful search is approximately

$$S \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$$


• Expected number of probes for insertion and unsuccessful search is approximately

$$U \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$$


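• For a random collision resolution strategy (each probe is independent of the previous probes), the expected number of probes is U for insertion and unsuccessful search, and S for successful search:

$$U = \frac{1}{1-\lambda}, \qquad S = \frac{1}{\lambda}\ln\frac{1}{1-\lambda}$$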
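Plugging numbers into the probe-count formulas shows how quickly probing degrades as the table fills. The values below are a worked illustration only: the load factors 0.5 and 0.9 are picked here as examples, and the linear probing rows use the standard estimates $S \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$ and $U \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$.

```latex
\begin{align*}
\lambda = 0.5 \text{ (linear)}: \quad & S \approx \tfrac{1}{2}\left(1 + \tfrac{1}{0.5}\right) = 1.5,
  \quad U \approx \tfrac{1}{2}\left(1 + \tfrac{1}{0.5^2}\right) = 2.5 \\
\lambda = 0.9 \text{ (linear)}: \quad & S \approx \tfrac{1}{2}\left(1 + \tfrac{1}{0.1}\right) = 5.5,
  \quad U \approx \tfrac{1}{2}\left(1 + \tfrac{1}{0.1^2}\right) = 50.5 \\
\lambda = 0.5 \text{ (random)}: \quad & S = \tfrac{1}{0.5}\ln\tfrac{1}{0.5} \approx 1.39,
  \quad U = \tfrac{1}{0.5} = 2 \\
\lambda = 0.9 \text{ (random)}: \quad & S = \tfrac{1}{0.9}\ln\tfrac{1}{0.1} \approx 2.56,
  \quad U = \tfrac{1}{0.1} = 10
\end{align*}
```

At a load factor of 0.5 linear probing is only slightly worse than the random strategy, but at 0.9 an insertion costs about 50 probes instead of 10, which is why the load factor is kept low.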


5.4.2 Quadratic Probing

• Eliminates the primary clustering problem.

• The collision function is quadratic, e.g., $F(i) = i^2$.

• No guarantee that all cells are tried.

• No guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime.


Type declaration for open addressing

/* Fig. 5.14 */
typedef int ElementType;

#ifndef _HashQuad_H
#define _HashQuad_H

typedef unsigned int Index;
typedef Index Position;

struct HashTbl;
typedef struct HashTbl *HashTable;

HashTable InitializeTable(int TableSize);
void DestroyTable(HashTable H);
Position Find(ElementType Key, HashTable H);
void Insert(ElementType Key, HashTable H);
ElementType Retrieve(Position P, HashTable H);
HashTable Rehash(HashTable H);
/* Delete & MakeEmpty are omitted */

#endif /* _HashQuad_H */

/* Place in the implementation file */
enum KindOfEntry { Legitimate, Empty, Deleted };

struct HashEntry
{
    ElementType Element;
    enum KindOfEntry Info;
};

typedef struct HashEntry Cell;

/* Cell *TheCells will be allocated later */
struct HashTbl
{
    int TableSize;
    Cell *TheCells;
};

Routine to initialize open addressing hash table

/* Fig. 5.15 */
HashTable
InitializeTable(int TableSize)
{
    HashTable H;
    int i;

    if (TableSize < MinTableSize)
    {
        Error("Table size too small");
        return NULL;
    }

    /* Allocate table */
    H = malloc(sizeof(struct HashTbl));
    if (H == NULL)
        FatalError("Out of space!!!");

    H->TableSize = NextPrime(TableSize);

    /* Allocate array of Cells */
    H->TheCells = malloc(sizeof(Cell) * H->TableSize);
    if (H->TheCells == NULL)
        FatalError("Out of space!!!");

    for (i = 0; i < H->TableSize; i++)
        H->TheCells[i].Info = Empty;

    return H;
}

Routine for hashing with quadratic probing

/* Fig. 5.16 */
Position
Find(ElementType Key, HashTable H)
{
    Position CurrentPos;
    int CollisionNum;

    CollisionNum = 0;
    CurrentPos = Hash(Key, H->TableSize);
    while (H->TheCells[CurrentPos].Info != Empty &&
           H->TheCells[CurrentPos].Element != Key)
        /* Probably need strcmp!! */
    {
        /* F(i) = F(i-1) + 2i - 1, so i*i is reached without multiplying */
        CurrentPos += 2 * ++CollisionNum - 1;
        if (CurrentPos >= H->TableSize)
            CurrentPos -= H->TableSize;
    }
    return CurrentPos;
}

• If the table size is prime, a new element can always be inserted if the table is at least half empty.


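• Standard deletion cannot be performed in an open addressing hash table, because the cell might have caused a collision to go past it; emptying it would make later Find operations stop too early. Cells are therefore marked as Deleted instead of being emptied (lazy deletion).

• Secondary clustering problem: elements that hash to the same position will probe the same alternative cells.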


5.4.3 Double Hashing

• F(i) = i * hash2(X); hash2(X) must never evaluate to zero.

• An example is hash2(X) = R - (X mod R), where R is a prime number smaller than TableSize (R=7 in the following table).


5.5 Rehashing

• Build another hash table that is about twice as big, with a new hash function, and reinsert every element of the old table into it.

• Suppose the elements 13, 15, 24, 6 and 23 are inserted into the original table using the function h(X) = X mod 7, with linear probing:


• Rehashing into a table with 17 cells (the first prime that is at least twice the old table size), h(X) = X mod 17

Rehashing for Open Addressing

/* Fig 5.22 */
HashTable
Rehash(HashTable H)
{
    int i, OldSize;
    Cell *OldCells;
    HashTable OldTable;

    OldTable = H;
    OldCells = H->TheCells;
    OldSize = H->TableSize;

    /* Get a new, empty table */
    H = InitializeTable(2 * OldSize);

    /* Scan through old table, reinsert into new */
    for (i = 0; i < OldSize; i++)
        if (OldCells[i].Info == Legitimate)
            Insert(OldCells[i].Element, H);

    free(OldCells);
    free(OldTable);   /* also release the old table header */

    return H;
}


• Rehashing is an O(N) operation, but it is needed only once per N/2 insertions, so on average it adds only a constant cost to each insertion.

• The occasional rehash can noticeably slow down an interactive operation.


• Strategies for implementing rehashing with quadratic probing

– rehash as soon as the table is half full

– rehash only when an insertion fails

– rehash when the load factor reaches a certain threshold


Chapter 5 Summary

• Hash tables can be used to implement the Insert and Find operations in constant average time.

• For separate chaining, the load factor should be close to 1.

• For open addressing, the load factor should not exceed 0.5.


• Rehashing can be implemented to allow the table to grow.

• Comparison between binary search trees and hash tables

– difficult to find the minimum (or maximum) element in a hash table

– cannot find a range of elements in a hash table


– O(log N) is not necessarily that much more than O(1), since search trees need no multiplications or divisions.

– Sorted input can make (unbalanced) binary search trees perform poorly.


• Some applications of hash tables

– files for which records are not required to be arranged in a particular order

– compilers use symbol tables to keep track of declared variables (no delete operation)

– on-line spelling checkers can store an entire dictionary in a hash table