Choosing a Hash Table Function.pdf

Embed Size (px)

Citation preview

  • 7/28/2019 Choosing a Hash Table Function.pdf

    1/8

    CSE 535 : Lockwood 1

    CSE 535 : Lecture 6

    Hash Functions Continued:Generation Functions

    Washington University

    Fall 2003

    http://www.arl.wustl.edu/arl/projects/fpx/cse535/

    Copyright 2003, John W LockwoodThis lecture includes slides and diagrams from:Data Structures and Algorithms, by John Morris

    [email protected]

    CSE 535 : Lockwood 2

    Choosing a Hash Table Function

    Almost any function will do

    But some functions aredefinitely better than others!

    Key criterion

    Minimum number of collisions Keeps chains short

    Maintains O(1) average length of chain

  • 7/28/2019 Choosing a Hash Table Function.pdf

    2/8

    CSE 535 : Lockwood 3

    Uniform Hashing

    Ideal hash function P(k) = probability that a key, k, occurs

    If there are m slots in our hash table,

    A uniform hashing function, H(k), would ensure:

    or, in plain English, the number of keys that map to each slot is equal

    P(k) =k | h(k) = 0

    P(k) = ....k | h(k) = 1

    P(k) =k | h(k) = m-1

    1m

    (Read as sum over allk such thath(k) = 0)

    CSE 535 : Lockwood 4

    Hash Tables - A Uniform Hash Function

    If the keys, k, are integers

    randomly distributed in [ 0 , r ),

    then

    is a uniform hash function

    Example

    Add ASCII codes for characters mod, 255will give values in [ 0, 256 ) or [ 0, 255 ]

    Replace + by xor Same range, but without the mod operation

    h(k) =mkr

  • 7/28/2019 Choosing a Hash Table Function.pdf

    3/8

    CSE 535 : Lockwood 5

    Hash Tables - Reducing the range to [ 0, m )

    Weve mapped the keys to a range of integers0 k < r

    Now we must reduce this range to [ 0, m )

    where m is a reasonable size for the hash table

    Strategies

    Division - use a mod function

    Multiplication

    Universal hashing

    CSE 535 : Lockwood 6

    Using Division to Reduce hash range to [ 0, m )

    Use a mod function h(k) = k mod m

    Choice of m?

    Powers of 2 aregenerally not good!

    h(k) = k mod 2n

    selects last n bits of k

    Prime numbers close to 2n

    seem to be good choices

    Eg: For ~4000 entry table, choose m = 4093

    0110010111000011010

    k mod 28 selects these bits

  • 7/28/2019 Choosing a Hash Table Function.pdf

    4/8

    CSE 535 : Lockwood 7

    Computing Hash Functions in Hardware

    XOR Functions are preferred Avoids excessive hardware needed for

    multiplication, division, and long critical pathrequired for carry chains in sums

    Bit shifting can help

    Helps to randomize the high-order bits, sincethe ASCII values used to represent stringschange mostly in the lower-order bits.

    Good to simulate function with real data toensure that Hash function provides an evendistribution

    CSE 535 : Lockwood 8

    Hash Tables - Reducing the range to [ 0, m )

    Universal Hash Function

    Consider a set of functions, H, which map keysx with r bits to H(x) with m bits

    Input Range: [0, r)

    Output Range [ 0, m )

    H, is a universal hash function,if for each pair of keys, x and y,the number of functions for which

    H(x) = H(y)is|H|/m

  • 7/28/2019 Choosing a Hash Table Function.pdf

    5/8

    CSE 535 : Lockwood 9

    Avoiding the Worst-case running times

    Using a well-known hash function can be bad A determined adversary can always find a set

    of data that will defeat any hash function Hash all keys to same slot causes O(n) search

    For the paranoid

    Select a hash function randomly at compile-timefrom a set of hash functions to reduce theprobability of poor performance

    For the really paranoid

    Pick a hash function randomly at run-time

    CSE 535 : Lockwood 10

    Loadable Hash Generation Matrix

    Hash Matrix Elements

    q

    q3,2

    ...

    3,1

    q2,2

    1

    x

    x

    2

    3

    ...

    a

    a

    1

    2

    x

    q2,1

    Inputs

    Outputs

    q1,1

    ...

    ...

    q2,1

  • 7/28/2019 Choosing a Hash Table Function.pdf

    6/8

    CSE 535 : Lockwood 11

    1 0 1

    1 1 0

    Example of Hash Matrix Generator

    q=

    1 0

    ...

    1

    1

    1

    ...

    1

    0

    0

    Inputs

    Outputs

    ...

    ...

    1

    Hash Matrix Elements

    1

    0

    1

    1

    0 0

    01

    CSE 535 : Lockwood 12

    Example of how logic reduction reduces size of acircuit programmed at compile-time

    AND gateseliminatedby identityproperty

    Logic

    reduction is

    automatic,

    and occurs

    during

    synthesis

    ...

    ...1

    1

    ...

    1

    0

    Inputs

    Outputs

    ...

    Hash Matrix Elements

    0

  • 7/28/2019 Choosing a Hash Table Function.pdf

    7/8

    CSE 535 : Lockwood 13

    Larger Example of Hash Matrix

    Given

    80-bit Input Hash Function over 10 bytes Length of our signatures

    12-bit Output 4096 addresses

    Size of our Block RAMs

    Uniform and random distributionof {0,1} values in q

    Universal hash function

    Estimate number of AND gates XOR gates Logic depth

    a2

    CSE 535 : Lockwood 14

    Example: Collision Frequency of Birthdays

    How many people does it takebefore the odds are > 50% thattwo people in a class of size nhave the same birthday?

    Model

    x = Birthday [Universe of all possible dates]

    H(x) = DayNumber (x) There are 365 days in a normal year

    Are birthdays on the same day unlikely?

  • 7/28/2019 Choosing a Hash Table Function.pdf

    8/8

    CSE 535 : Lockwood 15

    Distinct Birthdays

    Let Q(n) = probability that n people havedistinct birthdays

    Q(1) = 1

    With two people, the 2nd has only 364 freebirthday

    The 3rd has only 363, and so on:

    Q(2) = Q(1) *364

    365

    Q(n) = Q(1) *364

    365

    364

    365

    365-n+1

    365* * *

    363

    CSE 535 : Lockwood 16

    Coincident Birthdays

    Probability of having two identical birthdays

    P(n) = 1 - Q(n)

    P(23) = 0.507

    With 23 entries,table is only23/365 = 6.3%

    full!

    0.000

    0.100

    0.200

    0.300

    0.400

    0.500

    0.600

    0.700

    0.800

    0.900

    1.000

    0 20 40 60 80