60
Asst. Prof. Dr. İlker Kocabaş Hash Tables

Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

Embed Size (px)

Citation preview

Page 1: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

Asst. Prof. Dr. İlker Kocabaş

Hash Tables

Page 2: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

2

Overview

Information RetrievalBinary Search TreesHashing.Applications.Example.Hash Functions.

Hash TablesCollisionsLinear ProbingProblems with Linear ProbChaining

Page 3: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

3

R. Kruse, C. Tondo, B. Leung, “Data Structures and Program Design in C”, 1991, Prentice Hall.

E. Horowitz, S. Salini, S. Anderson-Freed, “Fundamentals of Data Structures in C”, 1993, Computer Science Press.

R. Sedgewick, “Algorithms in C”, 1990, Addison-Wesley.

A. Aho, J. Hopcroft, J. Ullman, “Data Structures and Algorithms”, 1983, Addison-Wesley.

T.A. Standish, “Data Structures, Algorithms & Software Principles in C”, 1995, Addison-Wesley.

D. Knuth, “The Art of Computer Programming”, 1975, Addison-Wesley.

Y. Langsam, M. Augenstein, M. Fenenbaum, “Data Structures using C and C++”, 1996, Prentice Hall.

Example: Bibliography

Page 4: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

4

Insert the information into a Binary Search Tree,

using the first author’s surname as the key

Page 5: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

5

Kruse

Horowitz Sedgewick

Aho Knuth

Langsam

Standish

Insert the information into a Binary Search Tree,

using the first author’s surname as the key

Kruse Horowitz Sedgewick Aho Knuth Langsam Standish

Page 6: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

6

Complexity

Inserting Balanced Trees O(log(n)) Unbalanced Trees O(n)

Searching Balanced Trees O(log(n)) Unbalanced Trees O(n)

Page 7: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

7

Hashing

key

hash function

0

1

2

3

TABLESIZE - 1

:

:

hash table

pos

Page 8: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

8

“Kruse”

0

1

2

3

6

4

5

hash table

Example:

5

Kruse

hash function

Page 9: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

9

Hashing

Each item has a unique key.Use a large array called a Hash Table.Use a Hash Function.

Page 10: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

10

Applications

Databases.Spell checkers.Computer chess games.Compilers.

Page 11: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

11

Operations

Initialize all locations in Hash Table are empty.

InsertSearchDelete

Page 12: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

12

Hash Function

Maps keys to positions in the Hash Table.Be easy to calculate.Use all of the key.Spread the keys uniformly.

Page 13: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

13

unsigned hash(char* s){ int i = 0; unsigned value = 0; while (s[i] != ‘\0’) { value = (s[i] + 31*value) % 101; i++; } return value;}

Example: Hash Function #1

Page 14: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

14

A. Aho, J. Hopcroft, J. Ullman, “Data Structures and Algorithms”, 1983, Addison-Wesley.

‘A’ = 65

‘h’ = 104

‘o’ = 111

value = (65 + 31 * 0) % 101 = 65

value = (104 + 31 * 65) % 101 = 99

value = (111 + 31 * 99) % 101 = 49

Example: Hash Function #1

value = (s[i] + 31*value) % 101;

Page 15: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

15

resultingtable is “sparse”

Example: Hash Function #1

value = (s[i] + 31*value) % 101;Hash

Key ValueAho 49Kruse 95Standish 60Horowitz 28Langsam 21Sedgewick 24Knuth 44

Page 16: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

16

value = (s[i] + 1024*value) % 128;

Example: Hash Function #2

likely toresult in

“clustering”

Hash Key ValueAho 111Kruse 101Standish 104Horowitz 122Langsam 109Sedgewick 107Knuth 104

Page 17: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

17

Example: Hash Function #3

“collisions”

value = (s[i] + 3*value) % 7;

Hash Key Value

Aho 0Kruse 5Standish 1Horowitz 5Langsam 5Sedgewick 2Knuth 1

Page 18: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

18

Insert

Apply hash function to get a position.Try to insert key at this position.Deal with collision.

Page 19: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

19

Aho Hash Function

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

0

Example: Insert

Aho

Page 20: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

20

Kruse

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

5

Example: Insert

Aho

Kruse

Hash Function

Page 21: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

21

Standish

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

1

Example: Insert

Aho

Kruse

StandishHash

Function

Page 22: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

22

Search

Apply hash function to get a position.Look in that position.Deal with collision.

Page 23: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

23

Kruse

Kruse

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

5

Example: Search

Aho

StandishHash

Function

found.

Page 24: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

24

Kruse

Sedgwick

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

2

Example: Search

Aho

StandishHash

Function

Not found.

Page 25: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

Hash Tables:Collision Resolution

Page 26: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

26

Hashing

key

hash function

0

1

2

3

TABLESIZE - 1

:

:

hash table

pos

Page 27: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

27

“Kruse”

0

1

2

3

6

4

5

hash table

Example:

5

Kruse

hash function

Page 28: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

28

Hashing

Each item has a unique key.Uses a large array called a Hash Table.Uses a Hash Function.

Hash Function• Maps keys to positions in the Hash Table.• Be easy to calculate.• Use all of the key.• Spread the keys uniformly.

Page 29: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

29

Hash Table Operations

Initialize all locations in Hash Table are empty.

InsertSearchDelete

Page 30: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

30

Example: Hash Function #3

“collisions”

value = (s[i] + 3*value) % 7;

Hash Key Value

Aho 0Kruse 5Standish 1Horowitz 5Langsam 5Sedgewick 2Knuth 1

Page 31: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

31

Collision

When two keys are mapped to the same position.Very likely.

10

20

30

40

50

60

70

0.1169

0.4114

0.7063

0.8912

0.9704

0.9941

0.9992

Number of People ProbabilityBirthdays

Page 32: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

32

Collision Resolution

Two methods are commonly used: Linear Probing. Chaining.

Page 33: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

33

Linear Probing

Linear search in the array from the position where collision occurred.

Page 34: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

34

Insert with Linear Probing

Apply hash function to get a position.Try to insert key at this position.Deal with collision. Must also deal with a full table!

Page 35: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

35

Aho Hash Function

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

0

Example: Insert with Linear Probing

Aho

Page 36: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

36

Kruse

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

5

Example: Insert with Linear Probing

Aho

Kruse

Hash Function

Page 37: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

37

Standish

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

1

Example: Insert with Linear Probing

Aho

Kruse

Standish

Hash Function

Page 38: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

38

Horowitz

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

5

Example: Insert with Linear Probing

Aho

Kruse

Standish

Horowitz

Hash Function

Page 39: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

39

Langsam

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

5

Example: Insert with Linear Probing

Aho

Kruse

Standish

Horowitz

Hash Function

Langsam

Page 40: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

40

module linearProbe(item){ position = hash(key of item) count = 0 loop { if (count == hashTableSize) then { output “Table is full” exit loop } if (hashTable[position] is empty) then { hashTable[position] = item exit loop } position = (position + 1) % hashTableSize count++ }}

Page 41: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

41

Search with Linear Probing

Apply hash function to get a position.Look in that position.Deal with collision.

Page 42: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

42

Kruse

Langsam

0

1

2

3

6

4

5

hash table

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

5

Example: Search with Linear Probing

Aho

Standish

Horowitz

Hash Function

Langsam

found.

Page 43: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

43

Kruse

Knuth

0

1

2

3

6

4

5

hash table

1

Example: Search with Linear Probing

Aho

Standish

Horowitz

Hash Function

Langsam

not found.

Aho, Kruse, Standish, Horowiz, Langsam, Sedgewick, Knuth

Page 44: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

44

module search(target){ count = 0 position = hash(key of target) loop { if (count == hashTableSize) then { output “Target is not in Hash Table” return -1. } else if (hashTable[position] is empty) then { output “Item is not in Hash Table” return -1. } else if (hashTable[position].key == target) then { return position. } position = (position + 1) % hashTableSize count++ }}

Page 45: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

45

Delete with Linear Probing

Use the search function to find the itemIf found check that items after that also don’t hash to the item’s positionIf items after do hash to that position, move them back in the hash table and delete the item.

Very difficult and time/resource consuming!Very difficult and time/resource consuming!

Page 46: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

46

Linear Probing: Problems

Speed.Tendency for clustering to occur as the table becomes half full.Deletion of records is very difficult.If implemented in arrays – table may become full fairly quickly, resizing is time and resource consuming

Page 47: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

47

Chaining

Uses a Linked List at each position in the Hash Table. Linked list at a position contains all the items that ‘hash’

to that position. May keep linked lists sorted or not.

Page 48: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

48

hash table0

1

2

3

:

:

Page 49: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

49

0

1

2

3

6

4

5

Aho, Kruse, Standish, Horowiz, Langsam, Sedgwick, Knuth

Example: Chaining

0, 5, 1, 5, 5, 2, 1

3

1

2

Kruse Horowitz

Knuth

1

Standish

Aho

Sedgewick

Langsam

0

0

0

Page 50: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

50

Hashtable with ChainingAt each position in the array you have a list:

List hashTable[MAXTABLE];

You must initialise each list in the table.0

1

2

1

2

1

:

Page 51: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

51

Insert with Chaining

Apply hash function to get a position in the array.Insert key into the Linked List at this position in the array.

Page 52: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

52

module InsertChaining(item){ posHash = hash(key of item)

insert (hashTable[posHash], item); }

0

1

2

1

2 Knuth

1

Standish

Aho

Sedgewick:

Page 53: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

53

Search with Chaining

Apply hash function to get a position in the array.Search the Linked List at this position in the array.

Page 54: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

54

/* module returns NULL if not found, or the address of the * node if found */

module SearchChaining(item){ posHash = hash(key of item) Node* found;

found = searchList (hashTable[posHash], item);

return found;

}

54

0

1

2

1

2 Knuth

1

Standish

Aho

Sedgewick:

Page 55: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

55

Delete with Chaining

Apply hash function to get a position in the array.Delete the node in the Linked List at this position in the array.

Page 56: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

/* module uses the Linked list delete function to delete an item *inside that list, it does nothing if that item isn’t there. */

module DeleteChaining(item){ posHash = hash(key of item) deleteList (hashTable[posHash], item);}

56

0

1

2

1

2 Knuth

1

Standish

Aho

Sedgewick:

Page 57: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

57

Disadvantages of Chaining

Uses more space.More complex to implement. Contains a linked list at every element in

the array. Requires linear searching. May be time consuming.

Page 58: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

58

Advantages of Chaining

Insertions and Deletions are easy and quick.Allows more records to be stored.Naturally resizable, allows a varying number of records to be stored.

Page 59: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

Dictionaries and Hash Tables 59

Double HashingDouble hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series

(i jd(k)) mod N for j 0, 1, … , N 1The secondary hash function d(k) cannot have zero valuesThe table size N must be a prime to allow probing of all the cells

Common choice of compression map for the secondary hash function: d2(k) q k mod q

where q N q is a prime

The possible values for d2(k) are

1, 2, … , q

Page 60: Asst. Prof. Dr. İlker Kocabaş Hash Tables. 2 Overview Information Retrieval Binary Search Trees Hashing. Applications. Example. Hash Functions. Hash Tables

Dictionaries and Hash Tables 60

Performance of Hashing

In the worst case, searches, insertions and removals on a hash table take O(n) timeThe worst case occurs when all the keys inserted into the dictionary collideThe load factor nN affects the performance of a hash tableAssuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is

1 (1 )

The expected running time of all the dictionary ADT operations in a hash table is O(1) In practice, hashing is very fast provided the load factor is not close to 100%Applications of hash tables:

small databases compilers browser caches