1 Searching and Hashing

1 Searching and Hashing. 2 Concepts This Lecture Searching an array Linear search Binary search Comparing algorithm performance


Searching and Hashing

Concepts This Lecture

• Searching an array
• Linear search
• Binary search
• Comparing algorithm performance

Searching

• Searching = looking for something
• Searching an array is particularly common
• Goal: determine whether a particular value is in the array
• We'll see that more than one algorithm will work

Searching Algorithms

The algorithm a person uses to find a number in a phone book is practical and efficient for humans, but not so good for computers:
• It's not precise
• It's not consistent

Let's imagine another scenario. Suppose that you have a pile of cards containing the names of customers:
• They are not organized in any particular way
• You want to find the card with the name Sarah (your key)

The procedure you will use is likely to be: look at each card's key, one by one, until one matches your target. This is an algorithm, and it is called Linear Search.

Searching as a Function

Specification: let b be the array to be searched, n the size of the array, and x the value being searched for. If x appears in b[0..n-1], return its index, i.e., return k such that b[k]==x. If x is not found, return -1.

The array and the search value are not changed by the function.

Function outline (this version names the array vec, its size vSize and the search value key, and reports its result through the reference parameters found and loc rather than through a return value):

void Lookup(const int vec[], int vSize, int key, Boolean& found, int& loc) {
    ...
}

Linear Search

Algorithm: start at the beginning of the array and examine each element until x is found, or all elements have been examined.

void Lookup(const int vec[], int vSize, int key, Boolean& found, int& loc) {
    loc = 0;
    while (loc < vSize && vec[loc] != key)
        loc++;
    found = (loc < vSize);   // true only if the loop stopped on a match
}
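The specification asks for an index (or -1), while Lookup reports through found and loc. A minimal sketch of the index-returning variant follows; the name linearSearch is an assumption made for illustration and does not come from the slides.

```cpp
#include <cassert>

// Hypothetical helper (name assumed, not from the lecture):
// returns the index of the first element equal to key, or -1 if absent.
int linearSearch(const int vec[], int vSize, int key) {
    assert(vSize >= 0);  // precondition: a non-negative element count
    int loc = 0;
    // && short-circuits, so vec[loc] is read only while loc < vSize holds.
    while (loc < vSize && vec[loc] != key) {
        ++loc;
    }
    return (loc < vSize) ? loc : -1;
}
```

With the array used in the test slides, {3, 12, -5, 6, 142, 21, -17, 45}, a search for 6 gives index 3 and a search for 15 gives -1.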

Linear Search

Test: search(v, 8, 6)

b: 3 12 -5 6 142 21 -17 45

Found It! (6 is at index 3.)

Linear Search

Test: search(v, 8, 15)

b: 3 12 -5 6 142 21 -17 45

Ran off the end! Not found.

Linear Search

Note: the loop condition is written so that vec[loc] is not accessed if loc >= vSize.

while (loc < vSize && vec[loc] != key)

(Why is this true? Why does it matter?)

b: 3 12 -5 6 142 21 -17 45

Write a Recursive Linear Search

NodeType *linearSearch(NodeType *start, int target) {
    if (start == NULL)             // base case: end of list, target not found
        return NULL;
    if (start->key == target)      // base case: found (tested only after the NULL check)
        return start;
    return linearSearch(start->next, target);   // search the rest of the list
}

Linear Search-Linked List

for each item in the list
    if the item's key matches the target
        stop and report "success"
report failure

Linear Search (target = 9)

[Figure, shown in three animation steps: linked list head -> 5 -> 12 -> 9 -> // (end of list)]

Linear Search (target = n)

NodeType *linearSearch(NodeType *start, int target) {
    NodeType *temp = start;
    while (temp != NULL) {
        if (temp->key == target)
            return temp;         // pointer to the matching node
        temp = temp->next;
    }
    return NULL;                 // reached the end of the list: not found
}
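The slides never show the declaration of NodeType. The sketch below assumes a minimal layout (an int key and a next pointer) and uses a hypothetical function name, findNode, to show a version of the linked-list search that compiles on its own.

```cpp
// Assumed node layout: the slides use key and next but do not declare the type.
struct NodeType {
    int key;
    NodeType* next;
};

// Hypothetical name. Returns a pointer to the first node whose key equals
// target, or nullptr once the end of the list is reached.
NodeType* findNode(NodeType* start, int target) {
    for (NodeType* cur = start; cur != nullptr; cur = cur->next) {
        if (cur->key == target) {
            return cur;
        }
    }
    return nullptr;
}
```

Returning a pointer rather than a node by value lets the "not found" case be reported as a null pointer.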

Analyzing Linear Search

• Best case analysis
  The element is found in the first position of the list, which means that we do one comparison: O(1)
• Worst case analysis
  The element is not present in the list. We have to go through the whole list to be sure, which means n comparisons, where n is the size of the list: O(n)
• Average case analysis
  The search key can be found anywhere in the list. If we "run" the algorithm once for each position where the key may appear and average the number of comparisons, we get:
  (1 + 2 + ... + n)/n = (n(n+1)/2)/n = (n+1)/2, which is O(n)
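The average-case figure can be checked by counting comparisons directly. The sketch below is illustrative only: both function names are assumptions, and it presumes a non-empty array of distinct values in which every target is present.

```cpp
// Hypothetical helper: number of key comparisons linear search makes
// before it stops on key, assuming key occurs in vec[0..vSize-1].
int comparisonsToFind(const int vec[], int vSize, int key) {
    int count = 0;
    for (int loc = 0; loc < vSize; ++loc) {
        ++count;                       // one comparison per element visited
        if (vec[loc] == key) break;
    }
    return count;
}

// Averages that count over every position where the key may appear.
// Assumes vSize > 0 and distinct elements.
double averageComparisons(const int vec[], int vSize) {
    long total = 0;
    for (int i = 0; i < vSize; ++i) {
        total += comparisonsToFind(vec, vSize, vec[i]);
    }
    return static_cast<double>(total) / vSize;
}
```

For an array of 8 distinct values the total is 1 + 2 + ... + 8 = 36, so the average is 36/8 = 4.5, which equals (8+1)/2.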

Can we do better?

Time needed for linear search is proportional to the size of the array.

An alternate algorithm, "Binary search," works if the array is sorted:
1. Look for the target in the middle.
2. If you don't find it, you can ignore half of the array, and repeat the process with the other half.

Example: Find the first page of pizza listings in the yellow pages

Binary Search

• In some cases, you get a list which is already ordered. In this case we can use algorithms that take this into consideration
• The idea of binary search is:
  • Split the list into two halves and compare the target with the key in the middle of the list
  • Based on this comparison we can tell which half of the list may contain the target
• Binary search eliminates half of the list at each iteration
• It requires direct access to the list elements

Binary Search Strategy

What we want: find the split between the values that are no larger than x and the values that are larger than x.

[Figure: array b with positions 0, L, R, n. The final picture has a "<= x" region followed by a "> x" region; while searching there is also an unknown "?" region between L and R.]

Step: look at b[(L+R)/2]. Move L or R to the middle depending on the test.

More precisely:
• Values in b[0..L] are <= x
• Values in b[R..n-1] are > x
• Values in b[L+1..R-1] are unknown
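One way to turn the L/R picture into code is sketched below. The function name, the starting values L = -1 and R = n (both known regions empty), and the final equality test are assumptions made for illustration; the slides only state the invariant and the step.

```cpp
// Sketch of the L/R strategy. Invariant: b[0..L] <= x, b[R..n-1] > x,
// b[L+1..R-1] not yet examined. Requires b sorted in ascending order.
int binarySearchLR(const int b[], int n, int x) {
    int L = -1;   // the "<= x" region starts empty
    int R = n;    // the "> x" region starts empty
    while (L + 1 != R) {              // the unknown region is not empty
        int mid = L + (R - L) / 2;    // always strictly between L and R
        if (b[mid] <= x) {
            L = mid;                  // b[0..mid] <= x
        } else {
            R = mid;                  // b[mid..n-1] > x
        }
    }
    // L is now the last position holding a value <= x (or -1 if there is none).
    return (L >= 0 && b[L] == x) ? L : -1;
}
```

Because mid is always strictly between L and R, the unknown region shrinks on every iteration, so the loop terminates.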

Binary Search: Iterative Approach

The function is filled in step by step: first the loop body, then the loop termination test, then the initialization, and finally the returned result.

/* If target appears in list[0..size-1], return a pointer to the
   matching element. If target is not found, return NULL. */
NodeType *binarySearch(NodeType list[], int size, int target) {
    int front(0);                  // initialization: the whole array is still unknown
    int back(size-1);
    int mid;
    while (front <= back) {        // loop termination: stop when the range is empty
        mid = (front+back)/2;
        if (target == list[mid].key)
            return &list[mid];
        else if (target < list[mid].key)
            back = mid-1;
        else
            front = mid+1;
    }
    return NULL;                   // return result: target was not found
}

[Figure: array b with positions 0, L, R, n; a "<= x" region on the left, a "> x" region on the right, and an unknown "?" region between L and R]

Binary Search

b:      -17  -5   3   6  12  21  45  142
index:    0   1   2   3   4   5   6    7

Loop being traced:

while (front <= back) {
    mid = (front+back)/2;
    if (target == list[mid].key)
        return &list[mid];         // found
    else if (target < list[mid].key)
        back = mid-1;
    else
        front = mid+1;
}

Test: bsearch(v,8,3);
  front = 0, back = 7, mid = 3: 3 < 6, so back = 2
  front = 0, back = 2, mid = 1: 3 > -5, so front = 2
  front = 2, back = 2, mid = 2: 3 == 3, found

Test: bsearch(v,8,17);
  front = 0, back = 7, mid = 3: 17 > 6, so front = 4
  front = 4, back = 7, mid = 5: 17 < 21, so back = 4
  front = 4, back = 4, mid = 4: 17 > 12, so front = 5
  front = 5 > back = 4: not found

Test: bsearch(v,8,143);
  front = 0, back = 7, mid = 3: 143 > 6, so front = 4
  front = 4, back = 7, mid = 5: 143 > 21, so front = 6
  front = 6, back = 7, mid = 6: 143 > 45, so front = 7
  front = 7, back = 7, mid = 7: 143 > 142, so front = 8
  front = 8 > back = 7: not found

Test: bsearch(v,8,-143);
  front = 0, back = 7, mid = 3: -143 < 6, so back = 2
  front = 0, back = 2, mid = 1: -143 < -5, so back = 0
  front = 0, back = 0, mid = 0: -143 < -17, so back = -1
  front = 0 > back = -1: not found

Binary Search (target = n)

NodeType *binarySearch(NodeType list[], int size, int target) {
    int front(0);
    int back(size-1);
    int mid;
    while (front <= back) {
        mid = (front+back)/2;
        if (target == list[mid].key)
            return &list[mid];
        else if (target < list[mid].key)
            back = mid-1;
        else
            front = mid+1;
    }
    return NULL;   // Indicates target was not found
}
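For comparison, here is a self-contained sketch of the same front/back loop that returns the index k with list[k].key == target, or -1. The one-field NodeType and the name binarySearchIndex are assumptions for illustration.

```cpp
// Assumed minimal element type: only the key field matters for searching.
struct NodeType {
    int key;
};

// Hypothetical index-returning variant. Requires list[0..size-1] to be
// sorted by key in ascending order.
int binarySearchIndex(const NodeType list[], int size, int target) {
    int front = 0;
    int back = size - 1;
    while (front <= back) {
        // Same value as (front+back)/2, written so the sum cannot overflow.
        int mid = front + (back - front) / 2;
        if (target == list[mid].key) {
            return mid;
        } else if (target < list[mid].key) {
            back = mid - 1;    // discard mid and everything after it
        } else {
            front = mid + 1;   // discard mid and everything before it
        }
    }
    return -1;                 // range became empty: target is not present
}
```

Excluding mid from both sub-ranges is what guarantees that the range shrinks on every iteration.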

Binary Search (target = 7)

list:   4   6   7  12  18  22  23  28  30
index:  0   1   2   3   4   5   6   7   8

1. front = 0, back = 8
2. mid = 4: 7 < 18, so back = 3
3. mid = 1: 7 > 6, so front = 2
4. mid = 2: list[2] == 7, found

Is it worth the trouble?

• Suppose you had 1000 elements
• Ordinary search would require maybe 500 comparisons on average
• Binary search
  • after 1st compare, throw away half, leaving 500 elements to be searched.
  • after 2nd compare, throw away half, leaving 250. Then 125, 63, 32, 16, 8, 4, 2, 1 are left.
  • After at most 10 steps, you're done!
• What if you had 1,000,000 elements??

How Fast Is It?

Another way to look at it: how big an array can you search if you examine a given number of array elements?

# comps    Array size
1          1
2          2
3          4
4          8
5          16
6          32
7          64
8          128
…          …
11         1,024
…          …
21         1,048,576

Analyzing Binary Search

• We only need to concentrate on the main loop
• The loop is different from the one in linear search because its number of executions is not proportional to n (the list size)
• We can easily see that the size of the input is halved in each iteration. This should already give a "hint" of which function describes this algorithm, but let's use a table

List size    Loop iterations
1            1
3            2
7            3
15           4
31           5
63           6
127          7

The table shows that the number of iterations grows proportionally to the logarithm base 2 of the size of the list: O(log n)

Time for Binary Search

• Key observation for binary search: the size n of the array that can be searched with k comparisons is n ~ 2^k
• Number of comparisons k as a function of array size n: k ~ log2(n)
• This is fundamentally faster than linear search (where k ~ n)
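The relationship k ~ log2(n) can be reproduced by counting halvings. The function below is a sketch with an assumed name: it models the worst case of the iterative loop, where probing the middle of n elements leaves at most n/2 (integer division) still to search.

```cpp
// Hypothetical helper: worst-case number of loop iterations that the
// iterative binary search needs for a list of n elements.
int worstCaseIterations(int n) {
    int iterations = 0;
    while (n > 0) {
        ++iterations;   // one probe of the middle element
        n /= 2;         // at most n/2 elements remain after the probe
    }
    return iterations;
}
```

This gives 7 iterations for 127 elements, 10 for 1000 elements and 20 for 1,000,000 elements.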

Write a Recursive Binary Search Function BinarySearch( )

BinarySearch takes a sorted array vec, a key, and two subscripts, fromLoc and toLoc, as arguments. It returns false if key is not found in the elements vec[fromLoc…toLoc]. Otherwise, it returns true.

BinarySearch is O(log2 N).

found = BinarySearch(vec, 25, 0, 14);
(arguments: vec, key = 25, fromLoc = 0, toLoc = 14)

index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
vec:    0  2  4  6  8 10 12 14 16 18 20 22 24 26 28

Subrange searched on each call, with the element examined:
  vec[0..14]   0 2 4 6 8 10 12 14 16 18 20 22 24 26 28   examines vec[7] = 14
  vec[8..14]   16 18 20 22 24 26 28                      examines vec[11] = 22
  vec[12..14]  24 26 28                                  examines vec[13] = 26
  vec[12..12]  24                                        examines vec[12] = 24

25 is not in the array, so found is false.

Recursive Binary Search -- basic idea

• This is an example of a recursive function where the arguments are halved.

Given: a sorted array a of values (integers, strings, ..) from range [s,t]

Task: determine whether a value x is in the array. If yes, return its position, otherwise -1.

Recursive Binary Search -- basic idea

• Consider how you search for a name in a phone book: you don't use algorithm 1, the linear search (otherwise it would take ages to find a name starting with Z).

• Instead, you open the book somewhere, and then continue searching in the half that contains the name. Then you open up somewhere in that half, and continue searching in the portion that contains the name, etc.

Recursive Binary Search -- basic idea

Now let's do this for a sorted array of integers, but let's always check the middle of the remaining range.

Example: search for 7 in the following array

value:  2  5  7 11 17 24 31 38 40 41
index:  0  1  2  3  4  5  6  7  8  9

low = 0, high = 9: mid = (0+9)/2 = 4, 7 < a[4], so look in lower half (2 5 7 11)
low = 0, high = 3: mid = (0+3)/2 = 1, 7 > a[1], so look in upper half (7 11)
low = 2, high = 3: mid = (2+3)/2 = 2, 7 == a[2], found!

Recursive Binary Search -- basic idea

• Example: the array contains 3, 5. Search for 4. (0+1)/2 is 0 (integer division), so if we don't exclude mid, the subarray starts again at index 0 and ends at 1, which leads to an infinite number of recursive calls. In the code that follows, mid is excluded from the subarray to prevent this.

Recursive Binary Search -- basic idea

• Let's think about the design of the recursive function before coding it:
  1. Recursive calls: call the function with the half of the current subrange that contains x. Define the subrange with a start and an end index.
  2. Base case: when should the recursive calls stop? When we find x.

• What if x is not in the array? Stop when the subrange is a single cell that does not contain x. Check: does the (start + end)/2 procedure always end in an array of length 1? A: it depends on how you implement it. You must ensure that the array gets smaller by at least 1 on every call.

Recursive Binary Search

Boolean BinarySearch(int vec[], int key, int fromLoc, int toLoc)
// PRE:  vec[fromLoc..toLoc] sorted in ascending order
// POST: FCTVAL == (key in vec[fromLoc..toLoc])
{
    int mid;
    if (fromLoc > toLoc)                 // base case -- not found
        return false;
    else {
        mid = (fromLoc + toLoc) / 2;
        if (vec[mid] == key)             // base case -- found at mid
            return true;
        else if (key < vec[mid])         // search lower half
            return BinarySearch(vec, key, fromLoc, mid - 1);
        else                             // search upper half
            return BinarySearch(vec, key, mid + 1, toLoc);
    }
}

#include <stdio.h>

/* prototype */
int binSearch(int array[], int first, int last, int N);

int main(void) {
    int index;
    int value;
    int list[] = {1,2,3,5,6};

    printf("Enter a search value:");
    scanf("%i", &value);

    /* the function binSearch returns the index of the array */
    /* where the match is found, otherwise a -1 */
    index = binSearch(list, 0, 4, value);

    if (index == -1)
        printf("Value not found!\n");
    else
        printf("Value matches element number %i in the array!\n", index + 1);

    return 0;
}

/* binSearch itself is defined separately */

/* array is the name of the array (or sub-array) to be searched */
/* first is the left-most index of the array being searched */
/* last is the right-most index of the array being searched */
/* N is the value being searched for */
int binSearch(int array[], int first, int last, int N) {
    int midpt;

    /* empty range, or N outside the range of values: N cannot be present */
    if (first > last || N < array[first] || N > array[last])
        return -1;

    /* didn't meet our error condition */
    midpt = (first + last) / 2;
    if (array[midpt] == N)
        return midpt;
    /* recursive calls */
    else if (array[midpt] > N)
        return binSearch(array, first, midpt - 1, N);
    else
        return binSearch(array, midpt + 1, last, N);
}

Recursive Binary Search

Note the contents of the “stack” when we execute a call to binSearch from main, here searching for 2 in list (some of the details are simplified):

(push) return binSearch( array, 0,1, 2);     (first = 0, last = 4)
(push) return binSearch( array, 1,1, 2);     (first = 0, last = 1)
(pop)  return 1;                             (first = 1, last = 1)
(pop)  return 1;                             (first = 0, last = 1)
(pop)  return 1;                             (first = 0, last = 4)

The same for a search for 4, which is not in list:

(push) return binSearch( array, 2+1,4, 4);   (first = 0, last = 4)
(pop)  return -1;                            (first = 3, last = 4)
(pop)  return -1;                            (first = 0, last = 4)

Iteration vs. Recursion

• It turns out that any iterative algorithm can be reworked to use recursion instead (and vice versa).
• There are programming languages where recursion is the only choice(!)
• Some algorithms are more naturally written with recursion
• But naïve applications of recursion can be inefficient

Binary Search

Several comments on binary search:

• Binary search assumes that the elements are sorted. If they are not sorted, you won't know in which half to continue searching.

• Binary search is not a great idea for linked lists, since you can't just jump to the middle element. You'd have to iterate through the list to get there, so you could just as well check for x while you are doing that.

Summary

• Linear search and binary search are two different algorithms for searching an array
• Binary search is vastly more efficient
• But binary search only works if the array elements are in order


Hashing

Tables: rows & columns of information

• A table has several fields (types of information)
  • A telephone book may have fields name, address, phone number
  • A user account table may have fields user id, password, home folder
• To find an entry in the table, you only need to know the contents of one of the fields (not all of them). This field is the key
  • In a telephone book, the key is usually name
  • In a user account table, the key is usually user id
• Ideally, a key uniquely identifies an entry
  • If the key is name and no two entries in the telephone book have the same name, the key uniquely identifies the entries

The Table ADT: operations

• insert: given a key and an entry, inserts the entry into the table
• find: given a key, finds the entry associated with the key
• remove: given a key, finds the entry associated with the key, and removes it

Also:
• getIterator: returns an iterator, which visits each of the entries one by one (the order may or may not be defined)
• etc.

Table ADTs

• We are familiar with direct access structures and linear access structures. Both have their advantages and disadvantages
• The crucial reason for avoiding direct access structures is that we need to allocate the size of the structure in advance
  • In all likelihood, we tend to overestimate its size and we end up with a very sparse structure
  • We tend to think that the actual number of keys to be stored is equivalent to the universe of possible existing keys
• In some problems the number of keys to be stored is smaller than the number in the universe of keys. In this case a hash table may save us a lot of space.

How should we implement a table?

Our choice of representation for the Table ADT depends on the answers to the following:

• How often are entries inserted and removed?
• How many of the possible key values are likely to be used?
• What is the likely pattern of searching for keys? e.g. Will most of the accesses be to just one or two key values?
• Is the table small enough to fit into memory?
• How long will the table exist?

TableNode: a key and its entry

For searching purposes, it is best to store the key and the entry separately (even though the key’s value may be inside the entry)

TableNode
key        entry
“Smith”    “Smith”, “124 Hawkers Lane”, “9675846”
“Yeo”      “Yeo”, “1 Apple Crescent”, “0044 1970 622455”

Implementation 1: unsorted sequential array

An array in which TableNodes are stored consecutively in any order

• insert: add to back of array; O(1)
• find: search through the keys one at a time, potentially all of the keys; O(n)
• remove: find + replace removed node with last node; O(n)

[Figure: array with positions 0, 1, 2, 3, and so on, each holding a key and an entry]
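The three operations of the unsorted sequential array can be sketched as follows. The class name UnsortedTable, the use of std::string for both key and entry, and the use of std::vector as the underlying array are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed layout of a TableNode: a key stored next to its entry.
struct TableNode {
    std::string key;
    std::string entry;
};

// Hypothetical class name; nodes are kept in insertion order, unsorted.
class UnsortedTable {
public:
    // insert: add to the back of the array (amortized O(1) with std::vector).
    void insert(const std::string& key, const std::string& entry) {
        nodes_.push_back(TableNode{key, entry});
    }

    // find: look at the keys one at a time; O(n). Returns nullptr if absent.
    const TableNode* find(const std::string& key) const {
        for (const TableNode& node : nodes_) {
            if (node.key == key) return &node;
        }
        return nullptr;
    }

    // remove: find the node, overwrite it with the last node, then drop
    // the last slot; O(n) because of the find.
    bool remove(const std::string& key) {
        for (std::size_t i = 0; i < nodes_.size(); ++i) {
            if (nodes_[i].key == key) {
                nodes_[i] = nodes_.back();
                nodes_.pop_back();
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<TableNode> nodes_;
};
```

Replacing the removed node with the last one avoids shifting the remaining nodes, which is only acceptable because the order of an unsorted table does not matter.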

Implementation 2: sorted sequential array

An array in which TableNodes are stored consecutively, sorted by key

• insert: add in sorted order; O(n)
• find: binary search; O(log n)
• remove: find, remove node and shuffle down; O(n)

We can use binary search because the array elements are sorted

[Figure: array with positions 0, 1, 2, 3, and so on, each holding a key and an entry]

Implementation 3: linked list (unsorted or sorted)

TableNodes are again stored in sequence, here linked one to the next

• insert: add to front; O(1), or O(n) for a sorted list
• find: search through potentially all the keys, one at a time; O(n), still O(n) for a sorted list
• remove: find, remove using pointer alterations; O(n)

[Figure: chain of linked nodes, each holding a key and an entry, and so on]

Implementation 5: hashing

• An array in which TableNodes are not stored consecutively - their place of storage is calculated using the key and a hash function
• Hashed key: the result of applying a hash function to a key
• Keys and entries are scattered throughout the array

[Figure: key -> hash function -> array index; an array of key/entry slots with entries at scattered positions such as 4, 10 and 123]

Implementation 5: hashing

An array in which TableNodes are not stored consecutively - their place of storage is calculated using the key and a hash function

• insert: calculate place of storage, insert TableNode; O(1)
• find: calculate place of storage, retrieve entry; O(1)
• remove: calculate place of storage, set it to null; O(1)

All are O(1)! (This holds as long as no two keys are sent to the same place of storage.)

[Figure: an array of key/entry slots with entries at scattered positions such as 4, 10 and 123]

Hash Functions

• Hash tables aim to keep the main property of direct access structures: O(1) time (constant time) to access an element
• With a direct access structure, a key k is normally stored in slot k. In a hash table this element is stored in slot h(k).
• h is a hash function. It maps the universe U of keys into the slots of a hash table (which is smaller than the universe)

h : U --> {0,1,...,m-1} where m is the size of the table

Hashing example: a fruit shop

10 stock details, 10 table positions

• Stock numbers are between 0 and 999
• Use hash function: stock no. / 100 (integer division)
• What if we now insert stock no. 350?
  • Position 3 is occupied: there is a collision
  • Collision resolution strategy: insert in the next free position (linear probing)

position   key   entry
0          85    85, apples
1
2
3          323   323, guava
4          462   462, pears
5          350   350, oranges
6
7
8
9          912   912, papaya

Stock no. 350 hashes to position 3, which holds 323; the next position, 4, holds 462, so 350 is stored in position 5.

Given a stock number, we find stock by using the hash function again, and use the collision resolution strategy if necessary
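The fruit shop example can be sketched in code as below. The class name FruitTable, the Slot layout, the wrap-around from position 9 back to 0, and the convention of returning -1 are all assumptions for illustration; the slide only specifies the hash function and "next free position". Removal is left out, since deleting from a linear-probing table needs extra care that the slides do not cover here.

```cpp
#include <string>

// Hypothetical sketch of the fruit shop table with linear probing.
class FruitTable {
public:
    static constexpr int kSize = 10;   // 10 table positions, as in the example

    // Hash function from the example: integer division by 100.
    // Assumes 0 <= stockNo <= 999 so the result lies in 0..9.
    static int hash(int stockNo) { return stockNo / 100; }

    // Stores the entry in the first free position at or after hash(stockNo).
    // Returns the position used, or -1 if the table is full.
    int insert(int stockNo, const std::string& name) {
        int pos = hash(stockNo);
        for (int probes = 0; probes < kSize; ++probes) {
            if (!slots_[pos].occupied) {
                slots_[pos].occupied = true;
                slots_[pos].stockNo = stockNo;
                slots_[pos].name = name;
                return pos;
            }
            pos = (pos + 1) % kSize;   // linear probing: try the next position
        }
        return -1;
    }

    // Repeats the same probe sequence. Returns the position holding
    // stockNo, or -1 if an empty slot or a full cycle is reached first.
    int find(int stockNo) const {
        int pos = hash(stockNo);
        for (int probes = 0; probes < kSize; ++probes) {
            if (!slots_[pos].occupied) return -1;
            if (slots_[pos].stockNo == stockNo) return pos;
            pos = (pos + 1) % kSize;
        }
        return -1;
    }

private:
    struct Slot {
        bool occupied = false;
        int stockNo = 0;
        std::string name;
    };
    Slot slots_[kSize];
};
```

Inserting 85, 462, 912 and 323 and then 350 places 350 in position 5, because positions 3 and 4 are already taken.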

Page 70: 1 Searching and Hashing. 2 Concepts This Lecture Searching an array Linear search Binary search Comparing algorithm performance

70

Pictorial view of Hash Tables

[Figure: keys k1, k2, k3 and k4 mapped into the slots of a hash table; a second frame adds k5]


72

Three factors affecting the performance of hashing

The hash function
  Ideally, it should distribute keys and entries evenly throughout the table
  It should minimise collisions, where the position given by the hash function is already occupied

The collision resolution strategy
  Separate chaining: chain together several keys/entries in each position
  Open addressing: store the key/entry in a different position

The size of the table
  Too big will waste memory; too small will increase collisions and may eventually force rehashing (copying into a larger table)
  Should be appropriate for the hash function used, and a prime number is best


73

Choosing a hash function: turning a key into a table position

Truncation
  Ignore part of the key and use the rest as the array index (converting non-numeric parts)
  A fast technique, but check for an even distribution throughout the table

Folding
  Partition the key into several parts and then combine them in any convenient way
  Unlike truncation, uses information from the whole key

Modular arithmetic (used by truncation and folding, and on its own)
  To keep the calculated table position within the table, divide the position by the size of the table, and take the remainder as the new position


74

Examples of hash functions (1)

Truncation: If students have a 9-digit identification number, take the last 3 digits as the table position, e.g. 925371622 becomes 622

Folding: Split a 9-digit number into three 3-digit numbers, and add them, e.g. 925371622 becomes 925 + 371 + 622 = 1918

Modular arithmetic: If the table size is 1000, the first example always keeps within the table range, but the second example does not (it should be mod 1000), e.g. 1918 mod 1000 = 918 (in Java: 1918 % 1000)
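The three techniques can be tried on the key 925371622 with a short C++ sketch. The function names `truncateKey`, `foldKey` and `toTablePosition` are made up for this illustration, and the grouping into 3-digit blocks is the one used in the examples above.

```cpp
#include <cstdint>

// Truncation: ignore everything except the last three digits of the key.
int truncateKey(std::int64_t id) {
    return static_cast<int>(id % 1000);
}

// Folding: split the key into 3-digit groups (from the right) and add them.
int foldKey(std::int64_t id) {
    int sum = 0;
    while (id > 0) {
        sum += static_cast<int>(id % 1000);  // take the lowest 3-digit group
        id /= 1000;                          // drop that group
    }
    return sum;
}

// Modular arithmetic: bring any computed value back inside the table.
int toTablePosition(int value, int tableSize) {
    return value % tableSize;
}
```

For 925371622 the groups are 925, 371 and 622, so folding gives 1918, which is outside a table of size 1000 until it is reduced to 918.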


75

Examples of hash functions (2)

Using a telephone number as a key
  The area code is not random, so will not spread the keys/entries evenly through the table (many collisions)
  The last 3 digits are more random

Using a name as a key
  Use full name rather than surname (surname not particularly random)
  Assign numbers to the characters (e.g. a = 1, b = 2; or use Unicode values)
  Strategy 1: Add the resulting numbers. Bad for large table size, because the sums stay small and only the low positions get used.
  Strategy 2: Call the number of possible characters c (e.g. c = 54 for alphabet in upper and lower case, plus space and hyphen). Then multiply each character in the name by increasing powers of c, and add together.
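A rough C++ sketch of the two name-hashing strategies is given below. The character numbering in `charCode` (a..z as 1..26, A..Z as 27..52, space 53, hyphen 54) is an assumption made for this example, as are the function names; the result is reduced modulo the table size at every step so that the powers of c cannot overflow.

```cpp
#include <string>

// Assumed numbering: a..z -> 1..26, A..Z -> 27..52, space -> 53, hyphen -> 54.
int charCode(char ch) {
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 1;
    if (ch >= 'A' && ch <= 'Z') return ch - 'A' + 27;
    if (ch == ' ') return 53;
    if (ch == '-') return 54;
    return 0;  // any other character is ignored in this sketch
}

// Strategy 1: add the character numbers.
int additiveHash(const std::string& name, int tableSize) {
    long long sum = 0;
    for (char ch : name) sum += charCode(ch);
    return static_cast<int>(sum % tableSize);
}

// Strategy 2: code[0]*c^0 + code[1]*c^1 + code[2]*c^2 + ..., with c = 54.
int polynomialHash(const std::string& name, int tableSize) {
    const long long c = 54;
    long long power = 1 % tableSize;  // c^0, reduced mod tableSize
    long long hash = 0;
    for (char ch : name) {
        hash = (hash + charCode(ch) * power) % tableSize;
        power = (power * c) % tableSize;  // next power of c
    }
    return static_cast<int>(hash);
}
```

With a table of size 701, "ab" and "ba" both give 3 under Strategy 1 (any rearrangement of the same letters collides), whereas Strategy 2 gives 109 and 56.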


76

What is a Hash Table ?

The simplest kind of hash table is an array of records. This example has 701 records.

Each record has a special field, called its key. In this example, the key is a long integer field called Number.

The number might be a person's identification number, and the rest of the record has information about the person.

When a hash table is in use, some spots contain valid records, and other spots are "empty".

[Figure: an array of records indexed [0] to [700]. Occupied spots: [1] Number 281942902, [2] Number 233667136, [4] Number 506643548, [700] Number 155778322. All other spots are empty.]


80

Inserting a New Record

In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key.

Typical way to create a hash value: (Number mod 701)

What is (580625685 mod 701)? 3

The hash value is used for the location of the new record.

[Figure: the new record, Number 580625685, is placed at [3], between the records already at [2] and [4]]


85

Collisions

Here is another new record to insert, Number 701466868, with a hash value of 2.

This is called a collision, because there is already another valid record at [2].

When a collision occurs, move forward until you find an empty spot.

The new record goes in the empty spot.

[Figure: Number 701466868 moves forward past the occupied spots [2], [3] and [4] and is placed at [5]]


90

A Quiz

Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number.

[Figure: the table after the insertions, with records at [1], [2], [3], [4], [5] and [700]]


91

Searching for a Key

The data that's attached to a key can be found fairly quickly.

Calculate the hash value. Check that location of the array for the key.

Keep moving forward until you find the key, or you reach an empty spot.

When the item is found, the information can be copied to the necessary location.

[Figure: searching for Number 701466868, whose hash value is [2]. The records at [2], [3] and [4] answer "Not me."; the record at [5] answers "Yes!"]


97

Deleting a Record

Records may also be deleted from a hash table.

But the location must not be left as an ordinary "empty spot" since that could interfere with searches.

The location must be marked in some special way so that a search can tell that the spot used to have something in it.

[Figure: the record Number 506643548 at [4] says "Please delete me." and is removed; the other records stay where they are]
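One common way to mark a deleted location is a three-state slot, sketched below in C++ for the 701-spot example. The names `SlotState`, `Slot`, `insertRecord`, `findRecord` and `removeRecord` are hypothetical, only the key is stored (the rest of the record is left out), and keys are assumed to be unique.

```cpp
#include <vector>

const int CAPACITY = 701;

// A spot is empty, holds a record, or used to hold one (the special mark).
enum class SlotState { Empty, Occupied, Deleted };

struct Slot {
    SlotState state = SlotState::Empty;
    long long number = 0;  // the key; other record fields are omitted here
};

int hashNumber(long long number) { return static_cast<int>(number % CAPACITY); }

// Linear-probing insert. Returns the index used, or -1 if no spot is free.
int insertRecord(std::vector<Slot>& table, long long number) {
    int start = hashNumber(number);
    for (int i = 0; i < CAPACITY; ++i) {
        int pos = (start + i) % CAPACITY;
        if (table[pos].state != SlotState::Occupied) {  // Empty or Deleted may be reused
            table[pos].state = SlotState::Occupied;
            table[pos].number = number;
            return pos;
        }
    }
    return -1;
}

// Search stops at a truly empty spot but walks past deleted ones.
int findRecord(const std::vector<Slot>& table, long long number) {
    int start = hashNumber(number);
    for (int i = 0; i < CAPACITY; ++i) {
        int pos = (start + i) % CAPACITY;
        if (table[pos].state == SlotState::Empty) return -1;
        if (table[pos].state == SlotState::Occupied && table[pos].number == number) return pos;
    }
    return -1;
}

// Deleting only changes the mark, so later searches keep probing through it.
bool removeRecord(std::vector<Slot>& table, long long number) {
    int pos = findRecord(table, number);
    if (pos < 0) return false;
    table[pos].state = SlotState::Deleted;
    return true;
}
```

With the six records from the slides inserted, removing Number 506643548 from [4] still lets a search for Number 701466868 probe through [2], [3] and [4] and find it at [5]. Had [4] been reset to an ordinary empty spot, that search would have stopped early.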


100

Using a hash function

HandyParts company makes no more than 100 different parts. But the parts all have four digit numbers.

This hash function can be used to store and retrieve parts in an array.

Hash(key) = partNum % 100

[Figure: array values[0..99]: [0] Empty, [1] 4501, [2] Empty, [3] 7803, [4] Empty, ..., [97] Empty, [98] 2298, [99] 3699]


101

Placing elements in the array

Use the hash function

Hash(key) = partNum % 100

to place the element with part number 5502 in the array.

Next place part number 6702 in the array.

Hash(key) = partNum % 100

6702 % 100 = 2

But values[2] is already occupied.

COLLISION OCCURS

[Figure: array values[0..99]: [0] Empty, [1] 4501, [2] 5502, [3] 7803, [4] Empty, ..., [97] Empty, [98] 2298, [99] 3699]


103

How to resolve the collision?

One way is by linear probing. This uses the rehash function

(HashValue + 1) % 100

repeatedly until an empty location is found for part number 6702.

Resolving the collision

Still looking for a place for 6702 using the function

(HashValue + 1) % 100

Collision resolved

Part 6702 is placed at the location with index 4.

Where would the part with number 4598 be placed using linear probing?

[Figure: array values[0..99]: [0] Empty, [1] 4501, [2] 5502, [3] 7803, [4] 6702, ..., [97] Empty, [98] 2298, [99] 3699]


107

Choosing the table size to minimise collisions

As the number of elements in the table increases, the likelihood of a collision increases, so make the table as large as practical

If the table size is 100, and all the hashed keys are divisible by 10, there will be many collisions!
  Particularly bad if table size is a power of a small integer such as 2 or 10

More generally, collisions may be more frequent if:
  greatest common divisor (hashed keys, table size) > 1

Therefore, make the table size a prime number (gcd = 1)

Collisions may still happen, so we need a collision resolution strategy
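The effect of a common divisor can be checked with a small C++ experiment. The helper `distinctPositions` is invented for this illustration: it counts how many different table positions are used when keys that are all multiples of some step are reduced modulo the table size.

```cpp
#include <set>

// Count the distinct positions hit by the keys 0, step, 2*step, ...
// (count keys in total) when each is reduced mod tableSize.
int distinctPositions(int step, int count, int tableSize) {
    std::set<int> used;
    for (int i = 0; i < count; ++i) {
        used.insert((i * step) % tableSize);
    }
    return static_cast<int>(used.size());
}
```

One hundred keys that are multiples of 10 land in only 10 positions of a table of size 100, but in 100 different positions of a table of size 101, which is prime.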


108

Collision resolution techniques

We will review a simple technique called chaining. However, there are those who argue against this approach and point to other techniques such as:

Linear probing: Very simple. If position h(key) is occupied, do a linear search in the table until you find an empty slot. The slots are searched in this order: h(key), h(key)+1, h(key)+2, ..., h(key)+c

Quadratic probing: a variant of the above where the term being added to the hash result is squared: h(key)+c^2

Random probing: another variant where the term being added to the hash function is a random number: h(key)+random()

Rehashing: a technique where a sequence of hashing functions is defined (h1, h2, ..., hk). If a collision occurs, the functions are used in this order.


109

Collision resolution: open addressing (1)

Probing: If the table position given by the hashed key is already occupied, increase the position by some amount, until an empty position is found
  Linear probing: increase by 1 each time [mod table size!]
  Quadratic probing: to the original position, add 1, 4, 9, 16, ...

Use the collision resolution strategy when inserting and when finding (ensure that the search key and the found keys match)

May also double hash: result of linear probing × result of another hash function

With open addressing, the table size should be double the expected number of elements


110

Clustering

Clustering is the tendency of elements to become unevenly distributed in the hash table, with many elements clustering around a single hash location.

One problem with linear probing is that it results in clustering.


111

Collision resolution: open addressing (2)

If the table is fairly empty with many collisions, linear probing may cluster (group) keys/entries. This increases the time to insert and to find.

1 2 3 4 5 6 7 8

For a table of size n, if the table is empty, the probability of the next entry going to any particular place is 1/n.

In the diagram, the probability of position 2 getting filled next is 2/n (either a hash to 1 or to 2 fills it).

Once 2 is full, the probability of 4 being filled next is 4/n and then of 7 is 7/n (i.e. the probability of getting long strings steadily increases).


112

Collision resolution: open addressing (3)

An empty key/entry marks the end of a cluster, and so can be used to terminate a find operation

So, if we remove an entry within a cluster, we should not empty it!

To allow probing to continue, the removed entry must be marked as ‘removed but cluster continues’


113

Collision resolution: open addressing (4)

Quadratic probing is a solution to the clustering problem
  Linear probing adds 1, 2, 3, etc. to the original hashed key
  Quadratic probing adds 1^2, 2^2, 3^2 etc. to the original hashed key

However, whereas linear probing guarantees that all empty positions will be examined if necessary, quadratic probing does not
  e.g. Table size 16 and original hashed key 3 gives the sequence: 3, 4, 7, 12, 3, 12, 7, 4, ...

More generally, with quadratic probing, insertion may be impossible if the table is more than half-full!
  Need to rehash (see later)
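The probe sequence quoted for table size 16 can be reproduced with a few lines of C++. The helpers `quadraticProbeSequence` and `reachablePositions` are names chosen for this sketch only.

```cpp
#include <set>
#include <vector>

// Positions examined by quadratic probing:
// start, start + 1^2, start + 2^2, ... (all mod tableSize).
std::vector<int> quadraticProbeSequence(int start, int tableSize, int probes) {
    std::vector<int> seq;
    for (int i = 0; i < probes; ++i) {
        seq.push_back((start + i * i) % tableSize);
    }
    return seq;
}

// Number of distinct positions quadratic probing can ever reach from start.
// i*i mod tableSize repeats after tableSize steps, so that many steps suffice.
int reachablePositions(int start, int tableSize) {
    std::set<int> seen;
    for (int i = 0; i < tableSize; ++i) {
        seen.insert((start + i * i) % tableSize);
    }
    return static_cast<int>(seen.size());
}
```

From hashed key 3 in a table of size 16, only 4 of the 16 positions are ever examined. With a prime size such as 17, 9 positions are reachable, which is just over half the table and matches the half-full warning.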


114

Collision resolution: chaining

Each slot of a hash table will be a pointer to a linked list
  Add the keys and entries anywhere in the list (front easiest)

Advantages over open addressing:
  Simpler insertion and removal
  Array size is not a limitation (but should still minimise collisions: make table size roughly equal to expected number of keys and entries)

Disadvantage
  Memory overhead is large if entries are small

[Figure: table slots 4, 10 and 123 each point to a linked list of key/entry nodes. No need to change position!]


115

Chaining

Chaining is another means (besides linear probing) used to handle collisions that arise from the use of a hash function.

Chaining uses the hash value, not as the actual location of the element, but as the index into an array of pointers. A chain is a linked list of elements that share the same hash location.

FOR EXAMPLE . . .


116

Using hashing and chaining

HandyParts company makes no more than 100 different parts. But the parts all have four digit numbers.

Use this hash function to store and retrieve parts in the chains.

Hash(key) = partNum % 100

[Figure: array pointers[0..99] with chains [1] -> 4501, [3] -> 7803, [98] -> 2298, [99] -> 3699]

Using chaining

Use the hash function

Hash(key) = partNum % 100

to place the element with part number 5502 in a chain.

Next place part number 6702 in a chain.

Hash(key) = partNum % 100

6702 % 100 = 2

[Figure: the chain at [2] now holds 5502 and 6702]

Where would the part with number 4598 be placed using chaining?
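A compact C++ sketch of chaining for the HandyParts example follows. It uses `std::forward_list` for the chains instead of hand-written linked nodes, and it inserts at the front of a chain, so the order within a chain may differ from the figure. The names `ChainTable`, `hashPart`, `insertPart`, `containsPart` and `chainLength` are assumptions of this sketch.

```cpp
#include <algorithm>
#include <forward_list>
#include <iterator>
#include <vector>

const int NUM_CHAINS = 100;

// One linked list (chain) of part numbers per table position.
using ChainTable = std::vector<std::forward_list<int>>;

int hashPart(int partNum) { return partNum % NUM_CHAINS; }

// A collision needs no probing: the new part simply joins the chain.
void insertPart(ChainTable& table, int partNum) {
    table[hashPart(partNum)].push_front(partNum);
}

// Hash to the chain, then search that chain linearly.
bool containsPart(const ChainTable& table, int partNum) {
    const std::forward_list<int>& chain = table[hashPart(partNum)];
    return std::find(chain.begin(), chain.end(), partNum) != chain.end();
}

// Number of parts stored in the chain at the given index.
int chainLength(const ChainTable& table, int index) {
    return static_cast<int>(std::distance(table[index].begin(), table[index].end()));
}
```

After inserting 4501, 7803, 2298, 3699, 5502, 6702 and 4598, the chain at [2] holds two parts and the chain at [98] holds 2298 and 4598.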


120

More Chaining…….


121

Hashing(103): h(103) = 103 mod 10, h(103) = 3

Hashing(69): h(69) = 69 mod 10, h(69) = 9

Hashing(20): h(20) = 20 mod 10, h(20) = 0

Hashing(13): h(13) = 13 mod 10, h(13) = 3

Hashing(110): h(110) = 110 mod 10, h(110) = 0

Hashing(53): h(53) = 53 mod 10, h(53) = 3

Final Hash Table

[0]: 20 -> 110 //
[3]: 103 -> 13 -> 53 //
[9]: 69 //

(// marks the end of a chain; all other slots are empty)


129

Searching in a Hash Table

Like any other structure, searching is a common task with hash tables.

Searching works as below:
  Given a target, hash the target.
  Take the value of the hash of the target and go to the slot. If the target exists, it must be in this slot.
  Search in the list in the current slot using a linear search.


130

Searching for 53

[Figure: the final hash table with a pointer temp. 53 hashes to slot 3, so temp starts at the head of that chain and moves along it node by node until it reaches 53.]


136

hashSearch(n)

NodeType hashSearch(NodeType* table[], int target) {
    int index = hash(target);            // slot for this key
    NodeType *temp = table[index];       // head of the chain in that slot
    return linearSearch(temp, target);   // linear search along the chain
}
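The slides do not show `NodeType`, `hash` or `linearSearch`, so the following is only one possible self-contained version. It assumes a node with an integer key and a next pointer, a table of 10 slots with h(n) = n mod 10, and a search that returns a pointer to the matching node (or `nullptr`), which differs from the by-value return type above. The helper `appendKey` is added purely to build the example table.

```cpp
// Assumed node layout; the lecture's actual NodeType is not shown.
struct NodeType {
    int key;
    NodeType* next;
};

const int TABLE_SLOTS = 10;

// h(n) = n mod 10, as in the worked example.
int hashKey(int key) { return key % TABLE_SLOTS; }

// Walk one chain; return the node holding target, or nullptr if absent.
NodeType* linearSearch(NodeType* head, int target) {
    for (NodeType* temp = head; temp != nullptr; temp = temp->next) {
        if (temp->key == target) return temp;
    }
    return nullptr;
}

// Hash to the slot, then search only that slot's chain.
NodeType* hashSearch(NodeType* table[], int target) {
    int index = hashKey(target);
    return linearSearch(table[index], target);
}

// Append a key at the end of its chain (nodes are never freed in this short sketch).
void appendKey(NodeType* table[], int key) {
    NodeType** link = &table[hashKey(key)];
    while (*link != nullptr) link = &(*link)->next;
    *link = new NodeType{key, nullptr};
}
```

After appending 103, 69, 20, 13, 110 and 53 to an initially null table, `hashSearch(table, 53)` visits 103 and 13 in slot 3 before returning the node for 53.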


137

Rehashing: enlarging the table

To rehash:
  Create a new table of double the size (adjusting until it is again prime)
  Transfer the entries in the old table to the new table, by recomputing their positions (using the hash function)

When should we rehash?
  When the table is completely full
  With quadratic probing, when the table is half-full or insertion fails

Why double the size?
  If n is the number of elements in the table, there must have been n/2 insertions since the previous rehash (if rehashing is done when the table is full)
  So by making the table size 2n, only a constant (amortised) cost is added to each insertion
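The rehashing steps can be sketched in C++ as below. This is an illustration under stated assumptions rather than the lecture's implementation: the table stores non-negative integer keys directly, `EMPTY` is a made-up marker for an unused position, collisions are resolved by linear probing, and the helper names are hypothetical.

```cpp
#include <vector>

const int EMPTY = -1;  // assumed marker for an unused position (keys are non-negative)

bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Smallest prime that is >= n.
int nextPrime(int n) {
    while (!isPrime(n)) ++n;
    return n;
}

// Linear-probing insert; assumes the table has at least one free position.
void insertKey(std::vector<int>& table, int key) {
    int size = static_cast<int>(table.size());
    int pos = key % size;
    while (table[pos] != EMPTY) pos = (pos + 1) % size;
    table[pos] = key;
}

// Create a table of about double the size (adjusted up to a prime) and
// re-insert every key, recomputing its position with the new size.
std::vector<int> rehash(const std::vector<int>& oldTable) {
    int newSize = nextPrime(2 * static_cast<int>(oldTable.size()));
    std::vector<int> newTable(newSize, EMPTY);
    for (int key : oldTable) {
        if (key != EMPTY) insertKey(newTable, key);
    }
    return newTable;
}
```

Starting from a table of size 7, rehashing produces a table of size 17. The keys 3, 10 and 17 all collide at position 3 in the old table, but move to positions 3, 10 and 0 in the new one.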


138

Comparison of collision techniques

[Figure: expected number of probes plotted against load factor (n/size), with curves for linear probing, random probing and chaining]


139

Applications of Hashing

Compilers use hash tables to keep track of declared variables

A hash table can be used for on-line spelling checkers: if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time

Game playing programs use hash tables to store seen positions, thereby saving computation time if the position is encountered again

Hash functions can be used to quickly check for inequality: if two elements hash to different values they must be different

Storing sparse data


140

When are other representations more suitable than hashing?

Hash tables are very good if there is a need for many searches in a reasonably stable table

Hash tables are not so good if there are many insertions and deletions, or if table traversals are needed — in this case, AVL trees are better

If there are more data than available memory then use a B-tree

Also, hashing is very slow for any operations which require the entries to be sorted e.g. Find the minimum key


141

Performance of Hashing

The number of probes depends on the load factor (usually denoted by λ), which represents the ratio of entries present in the table to the number of positions in the array

We also need to consider successful and unsuccessful searches separately

For a chained hash table, the average number of probes for an unsuccessful search is λ and for a successful search is 1 + λ/2


142

Performance of Hashing (2)

For open addressing, the formulae are more complicated but typical values are:

Load factor            0.1    0.5    0.8    0.9    0.99

Successful search
  Linear probes        1.05   1.6    3.4    6.2    21.3
  Quadratic probes     1.04   1.5    2.1    2.7    5.2

Unsuccessful search
  Linear probes        1.13   2.7    15.4   59.8   430
  Quadratic probes     1.13   2.2    5.2    11.9   126

Note that these do not depend on the size of the array or the number of entries present but only on the ratio (the load factor)


143

Summary

Hash tables store a collection of records with keys.

The location of a record depends on the hash value of the record's key.

When a collision occurs, the next available location is used.

Searching for a particular key is generally quick.

When an item is deleted, the location must be marked in a special way, so that the searches know that the spot used to be used.