© 2001 by Charles E. Leiserson Introduction to AlgorithmsDay 12 L8.1 Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 8 Prof. Charles E. Leiserson

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.1

Introduction to Algorithms6.046J/18.401J/SMA5503

Lecture 8Prof. Charles E. Leiserson


A weakness of hashingProblem: For any hash function h, a setof keys exists that can cause the averageaccess time of a hash table to skyrocket.

• An adversary can pick all keys from {k ∈ U : h(k) = i} for some slot i.

IDEA: Choose the hash function at random,independently of the keys.• Even if an adversary can see your code, he or she cannot find a bad set of keys, since he or she doesn’t know exactly which hash function will be chosen.


Universal hashinglet H be a finite collection of hash functions,each mapping U to {0, 1, …, m–1}. We sayH is universal if for all x, y ∈ U, where x ≠ y,we have |{h H: ∈ h(x) = h(y)}| = |H|/m.

That is, the chanceof a collisionbetween x and y is1/m if we choose hrandomly from .


Universality is goodTheorem. Let h be a hash function chosen(uniformly) at random from a universal set H

of hash functions. Suppose h is used to hashn arbitrary keys into the m slots of a table T.Then, for a given key x, we have


Proof of theoremProof. Let Cx be the random variable denotingthe total number of collisions of keys in T withx, and let


Proof (continued)

• Take expectationof both sides.


Proof (continued)

• Take expectationof both sides.

• Linearity ofexpectation.


Proof (continued)

• Take expectationof both sides.• Linearity ofexpectation.


Proof (continued)

• Take expectation of both sides.• Linearity of expectation.

• Algebra.


Constructing a set ofuniversal hash functions

digits, each with value in the set {0, 1, …, m–1}.That is, let k = ⟨k0, k1, …, kr ,⟩ where 0 ≤ ki < m.

Randomized strategy:Pick a = ⟨a0, a1, …, ar⟩ where each ai is chosenrandomly from {0, 1, …, m–1}. Dot product,

modulo mDefine How big is


Universality of dot-producthash functionsTheorem. The set H = {ha} is universal.

Proof. Suppose that x = ⟨x0, x1, …, xr⟩ and y =⟨y0, y1, …, yr⟩ be distinct keys. Thus, they differin at least one digit position, wlog position 0.For how many ha H∈ do x and y collide?

We must have ha(x) = ha(y), which implies that


Proof (continued)Equivalently, we have

or

which implies that


Fact from number theoryTheorem. Let m be prime. For anysuch that z ≠ 0, there exists a uniquesuch that

Example: m = 7.


We have

and since x0 ≠ y0 , an inverse (x0 – y0 )–1 must exist,which implies that

Thus, for any choices of a1, a2, …, ar, exactlyone choice of a0 causes x and y to collide.

Back to the proof


Proof (completed)Q. How many ha’s cause x and y to collide?A. There are m choices for each of a1, a2, …, ar ,but once these are chosen, exactly one choicefor a0 causes x and y to collide, namely

Thus, the number of ha’s that cause x and yto collide is


Perfect hashingGiven a set of n keys, construct a static hashtable of size m = O(n) such that SEARCH takesΘ(1) time in the worst case.

IDEA: Two level schemewith universalhashing atboth levels.No collisionsat level 2!


Collisions at level 2Theorem. Let H be a class of universal hash

functions for a table of size m = n2. Then, if weuse a random h H to hash ∈ n keys into the table,the expected number of collisions is at most 1/2.

Proof. By the definition of universality, theprobability that 2 given keys in the table collideunder h is 1/m = 1/n2. Since there are pairsof keys that can possibly collide, the expectednumber of collisions is


No collisions at level 2Corollary. The probability of no collisionsis at least 1/2.Proof. Markov’s inequality says that for anynonnegative random variable X, we have

Thus, just by testing random hash functionsin H , we’ll quickly find one that works.

Applying this inequality with t = 1, we findthat the probability of 1 or more collisions isat most


Analysis of storageFor the level-1 hash table T, choose m = n, andlet ni be random variable for the number of keysthat hash to slot i in T. By using ni

2 slots for thelevel-2 hash table Si, the expected total storagerequired for the two-level scheme is therefore

since the analysis is identical to the analysis fromrecitation of the expected running time of bucketsort. (For a probability bound, apply Markov.)

Documents

© 2001 by Charles E. Leiserson Introduction to AlgorithmsDay 12 L8.1 Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 8 Prof. Charles E. Leiserson