1
Data Structures
CSCI 132, Spring 2014
Lecture 34
Analyzing Hash Tables
2
Recall Hash Tables
A hash table
•Hash tables use an index function (hash function) that maps a large set of possible keys onto a smaller set of table locations, so many possible keys map to each location.
•If the table is sparse, then most of the time only 1 key will go to each location.
•If 2 records do get assigned to the same location (a collision), we use a method for reassigning the second record (collision resolution).
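To make the idea of an index function concrete, here is a minimal sketch. The function hash_index and its character-sum rule are hypothetical and chosen only for illustration; they are not the hash( ) used by the class in this lecture. The table size 997 is the hash_size from the specification slide.

```cpp
#include <cassert>
#include <string>

const int hash_size = 997;  // Table size taken from the lecture's specification.

// Hypothetical index function (an assumption for illustration only):
// add up the character codes of the key and reduce modulo the table size.
int hash_index(const std::string &key)
{
   unsigned long sum = 0;
   for (unsigned char c : key) sum += c;
   int index = static_cast<int>(sum % hash_size);
   assert(index >= 0 && index < hash_size);  // Always a valid table position.
   return index;
}
```

Because this rule ignores the order of the characters, any two keys that are anagrams of each other (for example "listen" and "silent") land on the same location, which is exactly the kind of collision that collision resolution has to handle.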
3
The C++ Hash Table Specification
const int hash_size = 997; // a prime number of appropriate size
class Hash_table {
public:
   Hash_table( );
   void clear( );
   Error_code insert(const Record &new_entry);
   Error_code retrieve(const Key &target, Record &found) const;
private:
   Record table[hash_size];
};
4
Implementation of insert( )
Error_code Hash_table :: insert(const Record &new_entry)
{
   Error_code result = success;
   int probe_count,   // Counter to be sure that table is not full.
       increment,     // Increment used for quadratic probing.
       probe;         // Position currently probed in the hash table.
   Key null;          // Null key for comparison purposes.
   null.make_blank( );
   probe = hash(new_entry);   // Find location to insert new_entry.
   probe_count = 0;
   increment = 1;
5
insert( ) continued
   while (table[probe] != null                    // Is the location empty?
          && table[probe] != new_entry            // Duplicate key?
          && probe_count < (hash_size + 1)/2) {   // Has overflow occurred?
      probe_count++;
      probe = (probe + increment) % hash_size;
      increment += 2;   // Prepare increment for next iteration.
   }
   if (table[probe] == null) table[probe] = new_entry;   // Insert new entry.
   else if (table[probe] == new_entry) result = duplicate_error;
   else result = overflow;   // The table is full.
   return result;
}
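The loop bound (hash_size + 1)/2 can be motivated by looking at which positions the probe sequence actually reaches. Adding the increments 1, 3, 5, ... visits home, home + 1, home + 4, home + 9, ... (mod hash_size). The sketch below counts the distinct positions visited; the helper name distinct_probe_positions is hypothetical and not part of the lecture's class.

```cpp
#include <cassert>
#include <vector>

// Count the distinct positions visited by the first `probes` steps of the
// sequence home, home+1, home+4, home+9, ... (mod table_size).
// The increments 1, 3, 5, ... mirror the loop in insert( ).
int distinct_probe_positions(int table_size, int home, int probes)
{
   assert(table_size > 0 && home >= 0 && home < table_size);
   std::vector<bool> seen(table_size, false);
   int probe = home, increment = 1, distinct = 0;
   for (int i = 0; i < probes; i++) {
      if (!seen[probe]) {
         seen[probe] = true;
         distinct++;
      }
      probe = (probe + increment) % table_size;
      increment += 2;
   }
   return distinct;
}
```

For the prime table size 997, distinct_probe_positions(997, 0, 997) evaluates to 499, which is (997 + 1)/2: probing past that many steps only revisits positions already tried, so the loop can stop there.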
6
Likelihood of collisions
•How many people have to be in a room before the probability that two of them have the same birthday reaches 50%?
$P = 1 - \frac{364}{365}\cdot\frac{363}{365}\cdot\frac{362}{365}\cdots\frac{365-m+1}{365} > 0.5$ when $m \ge 23$
•The calculation of the probability of a collision in a hash table is similar.
•The table does not have to be very full for the probability of a collision to reach at least 50%.
•Therefore: Collisions happen! We must handle them efficiently.
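The birthday calculation carries over to a table with t locations by replacing 365 with t. The following sketch assumes every key is equally likely to land in any location; the function names collision_probability and keys_for_even_odds are hypothetical.

```cpp
#include <cassert>

// Probability that at least two of m keys share a location when each key is
// equally likely to land in any of t locations (the birthday calculation
// with 365 replaced by t).
double collision_probability(int m, int t)
{
   assert(t > 0 && m >= 0);
   if (m > t) return 1.0;   // More keys than locations: a collision is certain.
   double all_distinct = 1.0;
   for (int i = 0; i < m; i++)
      all_distinct *= static_cast<double>(t - i) / t;
   return 1.0 - all_distinct;
}

// Smallest number of keys for which a collision is more likely than not.
int keys_for_even_odds(int t)
{
   int m = 0;
   while (collision_probability(m, t) <= 0.5) m++;
   return m;
}
```

With t = 365, keys_for_even_odds returns 23, matching the birthday result above. With t = 997 (the hash_size used in this lecture) the answer is a few dozen keys, so the table is still less than one tenth full when a collision becomes more likely than not.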
7
Counting Probes
•We can analyze the running time of hash tables by counting comparisons.
•Comparisons take place when "probing" an entry: Looking at an entry and comparing its key to a target.
•The number of probes done depends on how full the table is.
$n$ = number of entries in the table
$t$ = number of total positions in table (= hash_size)
$\lambda = n/t$ = load factor
$\lambda = 0$ means no entries in table
$\lambda = 0.5$ means the table is 1/2 full
$\lambda \le 1$ for a contiguous table without chaining (open addressing)
$\lambda$ can be greater than 1 if using chaining
8
Number of comparisons for chaining
Unsuccessful searches:
•If entries are distributed evenly over the table, then the expected number of entries in each chain is $n/t = \lambda$.
•For an unsuccessful search, we must do one probe for each entry in the list, so the average number of probes (or comparisons) is $\lambda$.
Successful searches:
•The average number of comparisons for sequential search of a list with $k$ items is $(k + 1)/2$.
•The node we are looking for is in our list; the other $n-1$ nodes are distributed evenly over the table, so the average number of nodes in that list will be $k = (n-1)/t + 1 \approx n/t + 1 = \lambda + 1$.
•The average number of comparisons will be $(\lambda + 1 + 1)/2 = \lambda/2 + 1$.
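The chaining estimates depend only on the load factor n/t (written here as lambda), so they are easy to evaluate directly. The sketch below is a plain transcription of the two results on this slide; the function names are hypothetical.

```cpp
#include <cassert>

// Load factor: number of entries divided by number of table positions.
double load_factor(int n, int t)
{
   assert(t > 0 && n >= 0);
   return static_cast<double>(n) / t;
}

// Expected comparisons for an unsuccessful search with chaining:
// every entry of one chain is examined, and a chain holds lambda entries
// on average.
double chaining_unsuccessful(double lambda)
{
   assert(lambda >= 0.0);
   return lambda;
}

// Expected comparisons for a successful search with chaining:
// (k + 1)/2 with k = lambda + 1, which simplifies to lambda/2 + 1.
double chaining_successful(double lambda)
{
   assert(lambda >= 0.0);
   return lambda / 2.0 + 1.0;
}
```

For example, 20 entries in 40 positions and 2000 entries in 4000 positions both give a load factor of 0.5, and therefore the same estimate of 1.25 comparisons for a successful search.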
9
Open addressing (without chaining)
Evenly distributed entries; number of comparisons (approx.)
Random probing:
Successful case: $(1/\lambda)\ln(1/(1-\lambda))$
Unsuccessful case: $1/(1-\lambda)$
Linear probing:
Successful case: $0.5\,(1 + 1/(1-\lambda))$
Unsuccessful case: $0.5\,(1 + 1/(1-\lambda)^2)$
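To see how quickly these estimates grow as the table fills, the four approximations can be evaluated for a given load factor (n/t, written here as lambda). This is a minimal sketch with hypothetical function names; it only transcribes the formulas on this slide and is valid for 0 < lambda < 1.

```cpp
#include <cassert>
#include <cmath>

// Random probing, successful search: (1/lambda) * ln(1/(1 - lambda)).
double random_probe_successful(double lambda)
{
   assert(lambda > 0.0 && lambda < 1.0);
   return (1.0 / lambda) * std::log(1.0 / (1.0 - lambda));
}

// Random probing, unsuccessful search: 1/(1 - lambda).
double random_probe_unsuccessful(double lambda)
{
   assert(lambda > 0.0 && lambda < 1.0);
   return 1.0 / (1.0 - lambda);
}

// Linear probing, successful search: 0.5 * (1 + 1/(1 - lambda)).
double linear_probe_successful(double lambda)
{
   assert(lambda > 0.0 && lambda < 1.0);
   return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

// Linear probing, unsuccessful search: 0.5 * (1 + 1/(1 - lambda)^2).
double linear_probe_unsuccessful(double lambda)
{
   assert(lambda > 0.0 && lambda < 1.0);
   return 0.5 * (1.0 + 1.0 / ((1.0 - lambda) * (1.0 - lambda)));
}
```

At lambda = 0.5 these give about 1.39 and 2 comparisons for random probing, and 1.5 and 2.5 for linear probing. At lambda = 0.9 an unsuccessful search costs about 10 comparisons with random probing but about 50.5 with linear probing, which shows how badly clustering hurts a nearly full table.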
Theoretical and empirical results
11
Hash Tables vs. Other Methods
•Speed of retrieval from a hash table does not depend on the total number of entries, but on the ratio of entries to table size (the load factor $\lambda$).
•A table of size 40 with 20 entries has the same performance as a table of size 4000 with 2000 entries.
Sequential Search: $\Theta(n)$
Binary Search: $\Theta(\lg n)$
Hash Table retrieval: $O(1)$ for small $\lambda$.
•Read section 9.8 on choosing a method for storage and retrieval of data.
12
Radix sort
Radix sort creates a table of queues. Each queue corresponds to a letter of the alphabet.
Sort from least significant letter to most significant letter.
13
Implementation of Radix Sort
const int key_size = 5;
const int max_chars = 28;

template <class Record>
void Sortable_list<Record> :: radix_sort( )
{
   Record data;
   Queue queues[max_chars];
   for (int position = key_size - 1; position >= 0; position--) {
      // Loop from the least to the most significant position.
      while (remove(0, data) == success) {
         int queue_number = alphabetic_order(data.key_letter(position));
         queues[queue_number].append(data);   // Queue operation.
      }
      rethread(queues);   // Reassemble the list.
   }
}
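The listing above relies on the course's Sortable_list, Queue, and Record classes. For experimenting outside that framework, the same idea can be sketched with the standard library. Everything below is an assumption made for illustration: keys are std::string values of exactly key_size characters, std::queue stands in for Queue, and this version of alphabetic_order (blank maps to 0, letters to 1 through 26 ignoring case, anything else to 27) is a guess that is consistent with max_chars = 28, not necessarily the course's definition.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

const int key_size = 5;
const int max_chars = 28;

// Assumed mapping: blank -> 0, letters -> 1..26 (case ignored), other -> 27.
int alphabetic_order(char c)
{
   if (c == ' ') return 0;
   if (c >= 'a' && c <= 'z') return c - 'a' + 1;
   if (c >= 'A' && c <= 'Z') return c - 'A' + 1;
   return 27;
}

// Sort fixed-length keys by distributing them into queues, one character
// position at a time, from the least to the most significant position.
void radix_sort(std::vector<std::string> &keys)
{
   for (int position = key_size - 1; position >= 0; position--) {
      std::queue<std::string> queues[max_chars];
      for (const std::string &key : keys) {
         assert(static_cast<int>(key.size()) == key_size);
         queues[alphabetic_order(key[position])].push(key);
      }
      // Reassemble the list by emptying the queues in order. Queues are
      // first in, first out, so ties keep the order of earlier passes.
      keys.clear();
      for (std::queue<std::string> &q : queues) {
         while (!q.empty()) {
            keys.push_back(q.front());
            q.pop();
         }
      }
   }
}
```

The first-in, first-out behavior of the queues is what makes the method work: each pass preserves the ordering established by the less significant positions already processed.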