Data Structures, CSCI 132, Spring 2014, Lecture 34: Analyzing Hash Tables

2

Recall Hash Tables

A hash table

•Hash tables use an index function (the hash function) that maps each key to a location in the table; many possible keys may map to the same location.

•If the table is sparse, then most of the time only 1 key will go to each location.

•If 2 records do get assigned to the same location (a collision), we use a method for reassigning the second record (collision resolution).

3

The C++ Hash Table Specification

const int hash_size = 997; // a prime number of appropriate size

class Hash_table {
public:
   Hash_table( );
   void clear( );
   Error_code insert(const Record &new_entry);
   Error_code retrieve(const Key &target, Record &found) const;
private:
   Record table[hash_size];
};

4

Implementation of insert( )

Error_code Hash_table :: insert(const Record &new_entry)
{
   Error_code result = success;
   int probe_count,   // Counter to be sure that table is not full.
       increment,     // Increment used for quadratic probing.
       probe;         // Position currently probed in the hash table.
   Key null;          // Null key for comparison purposes.
   null.make_blank( );

   probe = hash(new_entry);   // Find location to insert new_entry.
   probe_count = 0;
   increment = 1;

5

insert( ) continued

   while (table[probe] != null                    // Is the location empty?
          && table[probe] != new_entry            // Duplicate key?
          && probe_count < (hash_size + 1)/2) {   // Has overflow occurred?
      probe_count++;
      probe = (probe + increment) % hash_size;
      increment += 2;   // Prepare increment for next iteration.
   }
   if (table[probe] == null) table[probe] = new_entry;   // Insert new entry.
   else if (table[probe] == new_entry) result = duplicate_error;
   else result = overflow;   // The table is full.
   return result;
}
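The loop in insert( ) adds the increments 1, 3, 5, ... to the probe position, so the positions visited are home + 1, home + 4, home + 9, ... modulo the table size. The sketch below lists that sequence for a given starting position. The name probe_sequence and its parameters are hypothetical helpers for illustration, not part of the Hash_table class.

```cpp
#include <vector>

// Hypothetical helper: returns the positions examined by the quadratic
// probing scheme used in insert( ), starting from position `home`.
// Adding the increments 1, 3, 5, ... yields home + i*i (mod table_size).
std::vector<int> probe_sequence(int home, int table_size)
{
   std::vector<int> visited;
   int probe = home;
   int increment = 1;
   int limit = (table_size + 1) / 2;   // Same bound as the loop in insert( ).
   for (int probe_count = 0; probe_count <= limit; probe_count++) {
      visited.push_back(probe);                   // Position examined now.
      probe = (probe + increment) % table_size;   // Move to the next position.
      increment += 2;                             // Prepare the next increment.
   }
   return visited;
}
```

For a table of size 11 and home position 3, this gives 3, 4, 7, 1, 8, 6, 6. Only (11 + 1)/2 = 6 distinct positions are reached before the sequence repeats, which is why insert( ) stops probing at that bound.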

6

Likelihood of collisions

•How many people have to be in a room before the probability that two of them have the same birthday reaches 50%?

$P = 1 - \frac{364}{365} \cdot \frac{363}{365} \cdot \frac{362}{365} \cdots \frac{365-m+1}{365} > 0.5$ when $m \ge 23$

•The calculation for a probability of a collision in a table is similar.

•The table does not have to be very full for the probability of a collision to reach at least 50%.

•Therefore: Collisions happen! We must handle them efficiently.
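The birthday calculation can be checked numerically. The sketch below assumes all birthdays (or table positions) are equally likely; the function names collision_probability and smallest_group are illustrative and do not come from the lecture.

```cpp
// Probability that at least two of m people share a birthday,
// assuming `days` equally likely birthdays.
double collision_probability(int m, int days = 365)
{
   double all_distinct = 1.0;
   for (int i = 1; i < m; i++)
      all_distinct *= static_cast<double>(days - i) / days;
   return 1.0 - all_distinct;
}

// Smallest group size m whose collision probability exceeds `threshold`.
int smallest_group(double threshold, int days = 365)
{
   int m = 1;
   while (collision_probability(m, days) <= threshold) m++;
   return m;
}
```

Calling smallest_group(0.5) returns 23, matching the slide. Passing the table size as `days` (for example hash_size) gives the corresponding estimate for keys hashed into a table, under the same uniformity assumption.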

7

Counting Probes

•We can analyze the running time of hash tables by counting comparisons.

•Comparisons take place when "probing" an entry: Looking at an entry and comparing its key to a target.

•The number of probes done depends on how full the table is.
n = number of entries in the table
t = number of total positions in table (= hash_size)
$\lambda$ = n/t = Load Factor

$\lambda$ = 0 means no entries in table
$\lambda$ = 0.5 means the table is 1/2 full
$\lambda$ <= 1 for contiguous table without chaining (open addressing)
$\lambda$ can be greater than 1 if using chaining

8

Number of comparisons for chaining

Unsuccessful searches:
•If entries are distributed evenly over the table, then the expected number of entries in each chain is: n/t = $\lambda$.

•For an unsuccessful search, we must do one probe for each entry in the list, so the average number of probes (or comparisons) is $\lambda$.

Successful searches:
•Average number of comparisons for sequential search of a list with k items is: (k + 1)/2

•The node we are looking for is in our list; the other n-1 nodes are distributed evenly over the table, so the average number of nodes in our list will be: k = (n-1)/t + 1 ~ n/t + 1 = $\lambda$ + 1.

•Average number of comparisons will be ($\lambda$ + 1 + 1)/2 = $\lambda$/2 + 1

9

Open addressing (without chaining)

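Evenly distributed entries, number of comparisons (approx):

Random probing:
Successful case: $\frac{1}{\lambda} \ln \frac{1}{1-\lambda}$
Unsuccessful case: $\frac{1}{1-\lambda}$

Linear probing:
Successful case: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
Unsuccessful case: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$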
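To get a feel for how quickly these estimates grow as the table fills, the four formulas can be evaluated directly. The sketch below is only a transcription of the approximations above and is valid for 0 < lambda < 1; the function names are illustrative.

```cpp
#include <cmath>

// Random probing, successful search: (1/lambda) ln(1/(1 - lambda)).
double random_successful(double lambda)
{
   return (1.0 / lambda) * std::log(1.0 / (1.0 - lambda));
}

// Random probing, unsuccessful search: 1/(1 - lambda).
double random_unsuccessful(double lambda)
{
   return 1.0 / (1.0 - lambda);
}

// Linear probing, successful search: 0.5 (1 + 1/(1 - lambda)).
double linear_successful(double lambda)
{
   return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

// Linear probing, unsuccessful search: 0.5 (1 + 1/(1 - lambda)^2).
double linear_unsuccessful(double lambda)
{
   double empty_fraction = 1.0 - lambda;
   return 0.5 * (1.0 + 1.0 / (empty_fraction * empty_fraction));
}
```

At a load factor of 0.5 these evaluate to about 1.39, 2, 1.5, and 2.5 comparisons respectively. At 0.9 the unsuccessful linear probing estimate rises to 50.5, which shows why open addressing tables are kept well below full.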

Theoretical and empirical results

11

Hash Tables vs. Other Methods

•Speed of retrieval from a hash table does not depend on the total number of entries, but on the ratio of entries/table-size ($\lambda$).

•A table of size 40 with 20 entries has the same performance as a table of size 4000 with 2000 entries.

Sequential Search: $\Theta(n)$
Binary Search: $\Theta(\lg n)$
Hash Table retrieval: $O(1)$ for small $\lambda$.

•Read section 9.8 on choosing a method for storage and retrieval of data.

12

Radix sort

Radix sort creates a table of queues. Each queue corresponds to a letter of the alphabet.

Sort from least significant letter to most significant letter.

13

Implementation of Radix Sort

const int key_size = 5;
const int max_chars = 28;

template <class Record>
void Sortable_list<Record> :: radix_sort( )
{
   Record data;
   Queue queues[max_chars];
   for (int position = key_size - 1; position >= 0; position--) {
      // Loop from the least to the most significant position.
      while (remove(0, data) == success) {
         int queue_number = alphabetic_order(data.key_letter(position));
         queues[queue_number].append(data);   // Queue operation.
      }
      rethread(queues);   // Reassemble the list.
   }
}
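The method above relies on the course's Sortable_list, Queue, and Record classes. A self-contained sketch of the same idea using the standard library might look like the following. It sorts std::string keys and is not the class implementation; the body of alphabetic_order is an assumption (blank maps to 0, letters to 1 through 26 regardless of case, anything else to 27), chosen to match max_chars = 28.

```cpp
#include <queue>
#include <string>
#include <vector>

// Assumed mapping from a character to a queue number in 0..27.
int alphabetic_order(char c)
{
   if (c == ' ') return 0;
   if ('a' <= c && c <= 'z') return c - 'a' + 1;
   if ('A' <= c && c <= 'Z') return c - 'A' + 1;
   return 27;
}

// Sorts keys by their first key_size characters, least significant first.
void radix_sort(std::vector<std::string> &keys, int key_size)
{
   const int max_chars = 28;
   std::vector<std::queue<std::string>> queues(max_chars);
   for (int position = key_size - 1; position >= 0; position--) {
      // Distribute the keys into queues by the character at this position.
      for (const std::string &key : keys) {
         // Keys shorter than key_size are treated as padded with blanks.
         char c = position < static_cast<int>(key.size()) ? key[position] : ' ';
         queues[alphabetic_order(c)].push(key);
      }
      // Reassemble the list by emptying the queues in order.
      keys.clear();
      for (std::queue<std::string> &q : queues) {
         while (!q.empty()) {
            keys.push_back(q.front());
            q.pop();
         }
      }
   }
}
```

Because each queue preserves the order in which keys arrive, every pass is stable, and that stability is what makes sorting from the least significant position to the most significant position produce a fully sorted list.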
