Upload
buck-hunt
View
217
Download
1
Tags:
Embed Size (px)
Citation preview
Hash Tables - Introduction
A structure that offers fast insertion and searching
Insertion and searching is almost O(1)
Hashing- a range of key values is transformed into a range of array index values
Hashing - Introduction
In a dictionary, if the main key was the array index, searching and inserting items would be very fast.
Example: Empdata[1000], employee database, index=employee number- search for employee with emp. number = 500- Answer: Empdata[500]- Running Time: O(1)
Hash Tables
In the previous example, it was easy since employee number is an integer.
What if the main key is a word in the English Alphabet (i.e. last names)
How can the main key be mapped into an array index
Hash Tables
Sum of Digits Method- map the alphabet A-Z to the numbers 1 to 26 (a=1,b=2,c=3,etc.)- add the total of the letters- For example, “cats” (c=3,a=1,t=20,s=19);3+1+20+19=43-”cats” will be stored using index = 43
Hash Tables
Problem- Too may words with the same index - “was”,”tin”,”give”,”tend”,”moan”,”tick” and several other words add to 43
Hashing
Another Method (Multiply by Powers)- an integer in the numeric system is in the power of 10 - 7546=7x1000 + 5x100 + 4x10+6- 7546=7x103 + 5x102 + 4x101+6
Can do the same thing with words - Use 27 as base (26 letters + blank)
Hashing
“cats”=3*273+1*272+20*271+19 = 60,337
unique index for every wordmain drawback : takes too much
space- For up to 10 letter words(279), one 7,000,000,000,000 (7000 gigabytes)
Hashing
While the scheme was able to generate unique keys, it assigns spaces to non-words (aaaaaa,zzzzzzz,aaacccc,etc.)
Be able to compress a huge range of numbers from the numbers-multiplied-by powers system into a smaller(reasonably sized) array
Hashing
Hashing function- The process of converting a number in a large range into a number in a smaller range.
Hashing
“cats”=3*273+1*272+20*271+19 = 60,337
unique index for every wordmain drawback : takes too much
space- For up to 10 letter words(279), one 7,000,000,000,000 (7000 gigabytes)
Hashing
While the scheme was able to generate unique keys, it assigns spaces to non-words (aaaaaa,zzzzzzz,aaacccc,etc.)
Be able to compress a huge range of numbers from the numbers-multiplied-by powers system into a smaller(reasonably sized) array
Hashing
Hashing function- The process of converting a number in a large range into a number in a smaller range.
Size of smaller range - twice the size of the data set (2s)- for 50,000 words, array of 100,000 elements
Hashing
Hash Function- achieved by using the modulo function (returns the remainder)- for example, 33 mod 10 = 3
- LargeNumber mod Smallrange
Hashing
Hugenumber=C0*279+C1*278+C2*277 …. C9*270
arraysize = numberofwords * 2
arrayindex=Hugenumber mod arraysize
Hashing - Collisions
Hashing presents the risk of two elements with the same index (although better than sumofdigits).
Collision - two elements with the same index key after hashing
Collisions
Two approaches to handle collision- Open Addressing- Separate Chaining
Open Addressing - Finding the next available free cell
Separate Chaining - install a linked list at each index
Open Addressing
Three Types- Linear Probing, Quadratic Probing, and Double Hashing
Linear Probing - Finding the next available cell (x+1,x+2,etc.)- leads to clustering
Clustering
Quadratic Probing- Finds next available cell using the squares as the step method (x+1,x+4,x+27,etc)
Double Hash- Hash again using a different hash function to find next free cell- 2nd hash : step size
Separate Chaining
A linked list is installed in the array index such that entries with the same keys are attached to the linked list
1 1
2
3
986 1881 333
Hashing
Read Chapter 5 (Data Structures and Algorithms in C by Weiss)
Chapter 7 (in Goodrich and Tamassia Book)
Hash Functions implementations are presented in these chapters
Data Structures and Algorithms
Conceptual ApproachJava and C Implementations are
presented in both booksFor further studies, focus more on
the mathematical aspects (look at theorems & propositions)- Proving
When to use what
General Purpose Data Structures- arrays,linked lists, trees, and hash tables- used to store and retrieve data using key values
-applications : can be used for storing personnel records, inventories, contact lists,etc.
General Purpose Data Structures
Arrays Best used : - when amount of data is reasonably small - when to amount of data is predictable in advance
General Purpose Data Structures
Linked Lists- when data stored cannot be predicted- when data will be frequently inserted and deleted
Binary Search Trees- used when arrays or linked lists are too slow - O(logN) : insertion,searching, deletion
General Purpose Data Structures
Hash Tables - fastest data storage structure- used in spell checkers and as symbol tables in compilers - may require additional memory for open addressing implementations
Special Purpose Data Structures
Stacks,Queues (Priority Queues)used by a computer program to aid in carrying out
some algorithm For example, in graph algorithms, stack and
queues were usedAbstract Data Types
- implemented by a more fundamental data structure (array,linked list)- conceptual aids
Special Purpose Data Structures
Stacks- used when you want to access the last data inserted (LIFO structure)- implemented using array or linked lists depending on size
Queues - used when you want to access the first data item (FIFO structure)
Graphs
Unique data structureDirectly model real world situations
(maps, flight-airports,etc)structure of the graph reflects the
structure of the problemmain choice is representation : adjacent
list or adjacency matrix
Sorting
For limited data elements (up to 1000-1500 entries), insertion sort may be sufficient
When bogged down, can use merge sort or quick sort (merge sort however requires additional memory)