Upload
tamthientai
View
217
Download
0
Embed Size (px)
Citation preview
7/28/2019 Choosing a Hash Table Function.pdf
1/8
CSE 535 : Lockwood 1
CSE 535 : Lecture 6
Hash Functions Continued:Generation Functions
Washington University
Fall 2003
http://www.arl.wustl.edu/arl/projects/fpx/cse535/
Copyright 2003, John W LockwoodThis lecture includes slides and diagrams from:Data Structures and Algorithms, by John Morris
CSE 535 : Lockwood 2
Choosing a Hash Table Function
Almost any function will do
But some functions aredefinitely better than others!
Key criterion
Minimum number of collisions Keeps chains short
Maintains O(1) average length of chain
7/28/2019 Choosing a Hash Table Function.pdf
2/8
CSE 535 : Lockwood 3
Uniform Hashing
Ideal hash function P(k) = probability that a key, k, occurs
If there are m slots in our hash table,
A uniform hashing function, H(k), would ensure:
or, in plain English, the number of keys that map to each slot is equal
P(k) =k | h(k) = 0
P(k) = ....k | h(k) = 1
P(k) =k | h(k) = m-1
1m
(Read as sum over allk such thath(k) = 0)
CSE 535 : Lockwood 4
Hash Tables - A Uniform Hash Function
If the keys, k, are integers
randomly distributed in [ 0 , r ),
then
is a uniform hash function
Example
Add ASCII codes for characters mod, 255will give values in [ 0, 256 ) or [ 0, 255 ]
Replace + by xor Same range, but without the mod operation
h(k) =mkr
7/28/2019 Choosing a Hash Table Function.pdf
3/8
CSE 535 : Lockwood 5
Hash Tables - Reducing the range to [ 0, m )
Weve mapped the keys to a range of integers0 k < r
Now we must reduce this range to [ 0, m )
where m is a reasonable size for the hash table
Strategies
Division - use a mod function
Multiplication
Universal hashing
CSE 535 : Lockwood 6
Using Division to Reduce hash range to [ 0, m )
Use a mod function h(k) = k mod m
Choice of m?
Powers of 2 aregenerally not good!
h(k) = k mod 2n
selects last n bits of k
Prime numbers close to 2n
seem to be good choices
Eg: For ~4000 entry table, choose m = 4093
0110010111000011010
k mod 28 selects these bits
7/28/2019 Choosing a Hash Table Function.pdf
4/8
CSE 535 : Lockwood 7
Computing Hash Functions in Hardware
XOR Functions are preferred Avoids excessive hardware needed for
multiplication, division, and long critical pathrequired for carry chains in sums
Bit shifting can help
Helps to randomize the high-order bits, sincethe ASCII values used to represent stringschange mostly in the lower-order bits.
Good to simulate function with real data toensure that Hash function provides an evendistribution
CSE 535 : Lockwood 8
Hash Tables - Reducing the range to [ 0, m )
Universal Hash Function
Consider a set of functions, H, which map keysx with r bits to H(x) with m bits
Input Range: [0, r)
Output Range [ 0, m )
H, is a universal hash function,if for each pair of keys, x and y,the number of functions for which
H(x) = H(y)is|H|/m
7/28/2019 Choosing a Hash Table Function.pdf
5/8
CSE 535 : Lockwood 9
Avoiding the Worst-case running times
Using a well-known hash function can be bad A determined adversary can always find a set
of data that will defeat any hash function Hash all keys to same slot causes O(n) search
For the paranoid
Select a hash function randomly at compile-timefrom a set of hash functions to reduce theprobability of poor performance
For the really paranoid
Pick a hash function randomly at run-time
CSE 535 : Lockwood 10
Loadable Hash Generation Matrix
Hash Matrix Elements
q
q3,2
...
3,1
q2,2
1
x
x
2
3
...
a
a
1
2
x
q2,1
Inputs
Outputs
q1,1
...
...
q2,1
7/28/2019 Choosing a Hash Table Function.pdf
6/8
CSE 535 : Lockwood 11
1 0 1
1 1 0
Example of Hash Matrix Generator
q=
1 0
...
1
1
1
...
1
0
0
Inputs
Outputs
...
...
1
Hash Matrix Elements
1
0
1
1
0 0
01
CSE 535 : Lockwood 12
Example of how logic reduction reduces size of acircuit programmed at compile-time
AND gateseliminatedby identityproperty
Logic
reduction is
automatic,
and occurs
during
synthesis
...
...1
1
...
1
0
Inputs
Outputs
...
Hash Matrix Elements
0
7/28/2019 Choosing a Hash Table Function.pdf
7/8
CSE 535 : Lockwood 13
Larger Example of Hash Matrix
Given
80-bit Input Hash Function over 10 bytes Length of our signatures
12-bit Output 4096 addresses
Size of our Block RAMs
Uniform and random distributionof {0,1} values in q
Universal hash function
Estimate number of AND gates XOR gates Logic depth
a2
CSE 535 : Lockwood 14
Example: Collision Frequency of Birthdays
How many people does it takebefore the odds are > 50% thattwo people in a class of size nhave the same birthday?
Model
x = Birthday [Universe of all possible dates]
H(x) = DayNumber (x) There are 365 days in a normal year
Are birthdays on the same day unlikely?
7/28/2019 Choosing a Hash Table Function.pdf
8/8
CSE 535 : Lockwood 15
Distinct Birthdays
Let Q(n) = probability that n people havedistinct birthdays
Q(1) = 1
With two people, the 2nd has only 364 freebirthday
The 3rd has only 363, and so on:
Q(2) = Q(1) *364
365
Q(n) = Q(1) *364
365
364
365
365-n+1
365* * *
363
CSE 535 : Lockwood 16
Coincident Birthdays
Probability of having two identical birthdays
P(n) = 1 - Q(n)
P(23) = 0.507
With 23 entries,table is only23/365 = 6.3%
full!
0.000
0.100
0.200
0.300
0.400
0.500
0.600
0.700
0.800
0.900
1.000
0 20 40 60 80