Upload
godwin-flynn
View
219
Download
1
Embed Size (px)
Citation preview
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.1
Introduction to Algorithms6.046J/18.401J/SMA5503
Lecture 8Prof. Charles E. Leiserson
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.2
A weakness of hashingProblem: For any hash function h, a setof keys exists that can cause the averageaccess time of a hash table to skyrocket.
• An adversary can pick all keys from {k ∈ U : h(k) = i} for some slot i.
IDEA: Choose the hash function at random,independently of the keys.• Even if an adversary can see your code, he or she cannot find a bad set of keys, since he or she doesn’t know exactly which hash function will be chosen.
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.3
Universal hashinglet H be a finite collection of hash functions,each mapping U to {0, 1, …, m–1}. We sayH is universal if for all x, y ∈ U, where x ≠ y,we have |{h H: ∈ h(x) = h(y)}| = |H|/m.
That is, the chanceof a collisionbetween x and y is1/m if we choose hrandomly from .
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.4
Universality is goodTheorem. Let h be a hash function chosen(uniformly) at random from a universal set H
of hash functions. Suppose h is used to hashn arbitrary keys into the m slots of a table T.Then, for a given key x, we have
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.5
Proof of theoremProof. Let Cx be the random variable denotingthe total number of collisions of keys in T withx, and let
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.6
Proof (continued)
• Take expectationof both sides.
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.7
Proof (continued)
• Take expectationof both sides.
• Linearity ofexpectation.
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.8
Proof (continued)
• Take expectationof both sides.• Linearity ofexpectation.
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.9
Proof (continued)
• Take expectation of both sides.• Linearity of expectation.
• Algebra.
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.10
Constructing a set ofuniversal hash functions
digits, each with value in the set {0, 1, …, m–1}.That is, let k = ⟨k0, k1, …, kr ,⟩ where 0 ≤ ki < m.
Randomized strategy:Pick a = ⟨a0, a1, …, ar⟩ where each ai is chosenrandomly from {0, 1, …, m–1}. Dot product,
modulo mDefine How big is
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.11
Universality of dot-producthash functionsTheorem. The set H = {ha} is universal.
Proof. Suppose that x = ⟨x0, x1, …, xr⟩ and y =⟨y0, y1, …, yr⟩ be distinct keys. Thus, they differin at least one digit position, wlog position 0.For how many ha H∈ do x and y collide?
We must have ha(x) = ha(y), which implies that
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.12
Proof (continued)Equivalently, we have
or
which implies that
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.13
Fact from number theoryTheorem. Let m be prime. For anysuch that z ≠ 0, there exists a uniquesuch that
Example: m = 7.
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.14
We have
and since x0 ≠ y0 , an inverse (x0 – y0 )–1 must exist,which implies that
Thus, for any choices of a1, a2, …, ar, exactlyone choice of a0 causes x and y to collide.
Back to the proof
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.15
Proof (completed)Q. How many ha’s cause x and y to collide?A. There are m choices for each of a1, a2, …, ar ,but once these are chosen, exactly one choicefor a0 causes x and y to collide, namely
Thus, the number of ha’s that cause x and yto collide is
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.16
Perfect hashingGiven a set of n keys, construct a static hashtable of size m = O(n) such that SEARCH takesΘ(1) time in the worst case.
IDEA: Two level schemewith universalhashing atboth levels.No collisionsat level 2!
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.17
Collisions at level 2Theorem. Let H be a class of universal hash
functions for a table of size m = n2. Then, if weuse a random h H to hash ∈ n keys into the table,the expected number of collisions is at most 1/2.
Proof. By the definition of universality, theprobability that 2 given keys in the table collideunder h is 1/m = 1/n2. Since there are pairsof keys that can possibly collide, the expectednumber of collisions is
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.18
No collisions at level 2Corollary. The probability of no collisionsis at least 1/2.Proof. Markov’s inequality says that for anynonnegative random variable X, we have
Thus, just by testing random hash functionsin H , we’ll quickly find one that works.
Applying this inequality with t = 1, we findthat the probability of 1 or more collisions isat most
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.19
Analysis of storageFor the level-1 hash table T, choose m = n, andlet ni be random variable for the number of keysthat hash to slot i in T. By using ni
2 slots for thelevel-2 hash table Si, the expected total storagerequired for the two-level scheme is therefore
since the analysis is identical to the analysis fromrecitation of the expected running time of bucketsort. (For a probability bound, apply Markov.)