Motivation
• Many applications require a dynamic set that supports only the dictionary operations:
– insert
– search
– delete
• Example:
– A symbol table in a compiler.
Keys
• We will consider all keys to be (possibly large) natural numbers.
Direct Addressing
• Suppose:
– The keys are drawn from a universe U = {0, 1, …, u−1}
– The keys are distinct
• The idea:
– Set up an array T[0..u−1] in which
• T[i] = x if x ∈ T and x.key = i
• T[i] = null otherwise
– Operations take O(1) time!
• search(T, k): return T[k]
• insert(T, x): T[x.key] ← x
• delete(T, x): T[x.key] ← null
• So – what's the problem?
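The direct-addressing operations can be sketched in a few lines. This is a minimal illustration (the class and element representation are our own choices, not from the slides): one slot per possible key, so every operation is a single array access.

```python
# A minimal direct-address table sketch: T has one slot for every
# possible key in U = {0, 1, ..., u-1}; all three operations are O(1).

class DirectAddressTable:
    def __init__(self, u):
        self.T = [None] * u          # T[i] holds the element whose key is i

    def search(self, k):
        return self.T[k]             # None means "not present"

    def insert(self, x):
        self.T[x["key"]] = x

    def delete(self, x):
        self.T[x["key"]] = None

t = DirectAddressTable(16)
t.insert({"key": 5, "value": "foo"})
```

Note how the array itself does all the work: no hashing, no collision handling, but the table must be as large as the whole universe.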
Direct Addressing
• Direct addressing works well when the size of
the universe U is relatively small
• But what if the keys are 32-bit integers?
– The table would need 2^32 entries
Hashing
• Solution:
– Map the keys to a smaller range 0 .. m−1
• This mapping is called hashing
Collisions
• Two distinct keys may hash to the same slot – a collision
• Solutions:
– chaining
– open addressing
Chaining
• Chaining puts elements that hash to the same
slot in a linked list.
Search, Insert and Delete
• search(T, k)
– search for an element with key k in list T[h(k)]
• insert(T, x)
– insert x at the head of list T[h(x.key)]
• delete(T, x)
– delete x from the list T[h(x.key)]
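The three chaining operations above can be sketched as follows. This is an illustrative implementation (the class name and the use of Python's built-in `hash` are our own assumptions): each slot holds a list, and insertion goes at the head of the list, as the slides specify.

```python
# A chained hash table sketch: one list ("chain") per slot,
# insertion at the head of the chain.

class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]

    def h(self, k):
        return hash(k) % self.m

    def search(self, k):
        for key, val in self.T[self.h(k)]:    # scan the chain for key k
            if key == k:
                return val
        return None

    def insert(self, k, v):
        self.T[self.h(k)].insert(0, (k, v))   # insert at the head of the list

    def delete(self, k):
        chain = self.T[self.h(k)]
        self.T[self.h(k)] = [x for x in chain if x[0] != k]

t = ChainedHashTable(8)
t.insert("apple", 1)
t.insert("pear", 2)
```

Inserting at the head makes insert O(1); search and delete cost time proportional to the chain length, which the next slides analyze.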
Analysis of Chaining
• Assume simple uniform hashing:
– each key is equally likely to be hashed to any slot.
• Given n keys and m slots in the table: the load factor α = n/m is the average number of keys per slot.
• We will show that the average cost of an unsuccessful search for a key is Θ(1+α).
• We will show that the average cost of a successful search is Θ(1+α/2) = Θ(1+α).
• Hence, the average cost is Θ(1+α).
• Thus, if n = O(m), then α = n/m = O(m)/m = O(1), and the average cost is Θ(1).
Analysis of Chaining
• Theorem:
– An unsuccessful search takes expected time Θ(1+α).
• Proof:
– Simple uniform hashing ⟹ any key not already in the table is equally likely to hash to any of the m slots.
– To search unsuccessfully for any key k, we need to search to the end of the list T[h(k)].
– This list has expected length E[length of T[h(k)]] = α.
– Therefore, the expected number of elements examined in an unsuccessful search is α.
– Adding in the time to compute the hash function, the total time required is Θ(1+α).
Analysis of Chaining
• Theorem:
– A successful search takes expected time Θ(1+α).
• Proof:
– Assume that the element x being searched for is equally likely to be any of the n elements stored in the table.
– The number of elements examined during a successful search for x is 1 more than the number of elements that appear before x in x's list.
– These are the elements inserted after x was inserted (because we insert at the head of the list).
– So we need to find the average, over the n elements x in the table, of how many elements were inserted into x's list after x was inserted.
– For i = 1, 2, . . . , n, let x_i be the i-th element inserted into the table, and let k_i = x_i.key.
– For all i and j, define the indicator random variable X_ij = I{h(k_i) = h(k_j)}.
– By simple uniform hashing, E[X_ij] = 1/m, so the expected number of elements examined in a successful search is

E[(1/n) Σ_{i=1..n} (1 + Σ_{j=i+1..n} X_ij)] = 1 + (n−1)/(2m) = 1 + α/2 − α/(2n) = Θ(1+α).
Keys
• How can we convert floats or ASCII strings to natural numbers?
• Example:
– Consider "CLRS"
• ASCII values: C = 67, L = 76, R = 82, S = 83.
– There are 128 basic ASCII values.
– So interpret "CLRS" as
• (67 · 128³) + (76 · 128²) + (82 · 128¹) + (83 · 128⁰) = 141,764,947.
Horner's rule
• Horner's rule:
• Code:
y = a_d
for i = d−1 downto 0
  y = a_i + x·y
• If d is large, the value of y becomes too big.
• Solution: evaluate the polynomial modulo m:
y = a_d
for i = d−1 downto 0
  y = (a_i + x·y) mod m
a_0 + a_1·x + a_2·x² + … + a_d·x^d = a_0 + x·(a_1 + x·(a_2 + … + x·(a_{d−1} + x·a_d)…))
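Applied to strings, Horner's rule turns the "CLRS" computation above into a simple left-to-right loop. A sketch (the modulus m = 701 is an arbitrary prime chosen for illustration):

```python
# Hashing a string with Horner's rule, radix 128 as in the "CLRS" example.
# The leftmost character is the most significant digit, so a left-to-right
# scan implements a_0 + x*(a_1 + x*(...)) with x = 128.

def string_to_number(s, radix=128):
    y = 0
    for ch in s:
        y = ord(ch) + radix * y          # full (possibly huge) value
    return y

def horner_hash(s, m, radix=128):
    y = 0
    for ch in s:
        y = (ord(ch) + radix * y) % m    # reduce mod m to keep y small
    return y
```

Taking the modulus at every step keeps every intermediate value below radix·m, avoiding the overflow the slide warns about, while producing the same result mod m.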
Choosing A Hash Function
• Clearly choosing the hash function well is
crucial
– What will a worst-case hash function do?
– What will be the time to search in this case?
• What are desirable features of the hash
function?
– Should distribute keys uniformly into slots
– Should not depend on patterns in the data
Choosing A Hash Function
• Unfortunately, it is typically not possible to check these conditions
– One rarely knows the probability distribution according to which the keys are drawn
– The keys may not be drawn independently.
• Occasionally we do know the distribution
– If the keys are known to be random real numbers k, independently and uniformly distributed in the range 0 ≤ k < 1,
– then the hash function h(k) = ⌊km⌋ satisfies the condition of simple uniform hashing.
The Division Method
h(k) = k mod m
• For example, if the hash table has size m = 12 and the key is k = 100, then h(k) = 4.
• Hashing by division is quite fast.
• What happens if m is a power of 2 (say 2^p)?
– Then h(k) is just the p lowest-order bits of k – a poor choice unless all low-order bit patterns are equally likely.
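A quick numeric illustration of the division method and of the power-of-2 pitfall (the key values below are our own example, not from the slides):

```python
# The division method h(k) = k mod m, and why m = 2^p is risky:
# with m = 2^p, h(k) is just the p low-order bits of k, so keys that
# agree in those bits all collide.

def h_div(k, m):
    return k % m

assert h_div(100, 12) == 4        # the slide's example: m = 12, k = 100

# Keys that differ only above the low 3 bits all collide when m = 8 = 2^3:
keys = [5, 13, 21, 29]            # all equal to 5 mod 8
slots_pow2 = {h_div(k, 8) for k in keys}    # one slot for all four keys
slots_prime = {h_div(k, 7) for k in keys}   # a prime m spreads them out
```

A prime m not too close to a power of 2 is the usual recommendation for the division method.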
The Multiplication Method
• For a constant 0 < A < 1: h(k) = ⌊m·frac(kA)⌋, where frac(x) is the fractional part of x.
• Advantage: the value of m is not critical.
• Usually m = 2^p for some integer p.
• Suppose that the word size is w bits.
• Choose A = s/2^w, where s is an integer in the range 0 < s < 2^w.
• The computation of h(k) can be done using integer operations:
– k·s = kA·2^w, so the rightmost w bits of k·s are equal to frac(kA)·2^w.
– ⌊m·frac(kA)⌋ = the first p bits of frac(kA) after the binary point = the leftmost p bits of the rightmost w bits of k·s.
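The integer-only computation above can be sketched directly. The choices of w, p and s below are illustrative; s is chosen so that s/2^w approximates (√5 − 1)/2 ≈ 0.618, a constant often suggested for A:

```python
# The multiplication method with integer operations:
# w-bit words, A = s / 2^w, table size m = 2^p.

w = 32                      # word size in bits (assumed)
p = 14                      # table size m = 2^p
s = 2654435769              # s/2^32 ~ (sqrt(5)-1)/2, a common choice for A

def h_mul(k):
    r = (k * s) & (2**w - 1)      # rightmost w bits of k*s = frac(kA)*2^w
    return r >> (w - p)           # leftmost p bits of those w bits

hv = h_mul(123456)
```

A multiply, a mask, and a shift: no division is needed, which is why this method is attractive when m = 2^p.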
Universal Hashing
• Pick a hash function randomly when the algorithm begins
– A way to randomize the algorithm so that no single input always produces worst-case behavior.
– Need a good family of hash functions to choose from
Universal Set
• A finite collection H of hash functions is
universal
– if for each pair k, l ∈ U, where k ≠ l, the number of hash functions h ∈ H for which h(k) = h(l) is at most |H|/m, where m is the size of the hash table.
• Alternatively, H is universal
– if, with a hash function h chosen randomly from H, the probability of a collision between two different keys is no more than 1/m.
Universal Hashing
• Theorem:
– Choose h from a universal family of hash functions.
– Hash n keys into a table T of m slots, n ≤ m.
– Then the expected number of collisions involving a particular key x is less than 1.
• Proof:
– For each pair of keys x, y, let I_xy be the indicator random variable of the event that x and y collide.
– By universality, E[I_xy] ≤ 1/m.
– Let C_x be the total number of collisions involving key x. Then

E[C_x] = E[Σ_{y∈T, y≠x} I_xy] = Σ_{y∈T, y≠x} E[I_xy] ≤ (n−1)/m

– Since n ≤ m, we have E[C_x] < 1.
• Corollary
– Using chaining and universal hashing, the expected time for each search operation is O(1).
A Universal Hash Set of Functions
• Let U = {1, …, u}.
• Fix a prime p > u.
• For every a ∈ {1, …, p−1}, b ∈ {0, …, p−1}, define h_ab(k) = ((ak + b) mod p) mod m, where m is the size of the table.
• H = { h_ab } is universal.
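A sketch of this family in code, with a randomly drawn member as universal hashing prescribes. The sizes are toy values for illustration: p = 101 is a prime larger than the universe {1, …, 100}:

```python
# The universal family h_ab(k) = ((a*k + b) mod p) mod m,
# with a and b drawn at random when the algorithm begins.
import random

p = 101          # prime larger than the universe size u = 100
m = 10           # table size

def make_hash():
    a = random.randrange(1, p)    # a in {1, ..., p-1}
    b = random.randrange(0, p)    # b in {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m

h = make_hash()                   # the random choice happens once, up front
values = [h(k) for k in range(1, 101)]
```

Once chosen, h is fixed and deterministic; the randomness lies only in which member of the family was picked, which is what defeats any single adversarial input.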
Open Addressing
• Basic idea:
– If slot is full, try another slot, etc., until an open slot is found (probing)
– To search, follow same sequence of probes as would be used when inserting the element
• If reach element with correct key, return it
• If reach a null pointer, element is not in table
• Good for fixed sets (adding but no deletion)
– Example: spell checking
• Table needn't be much bigger than n
– And, compared to chaining, no pointer storage is needed, so the same memory buys more slots.
Hash Function
h : U × {0, 1, ..., m−1} → {0, 1, ..., m−1}
such that for each k ∈ U,
{h(k,0), h(k,1), ..., h(k, m−1)}
is a permutation of
{0, 1, ..., m−1}
Insert & Search
• Insert: probe h(k,0), h(k,1), … until an empty slot is found; store the element there.
• Search: probe in the same order until the key is found or an empty slot is reached.
Deletion
• Cannot just put null into the slot containing
the key we want to delete.
• Solution:
– Use a special value DELETED when marking a slot
as empty during deletion.
• The disadvantage:
– Search time is no longer dependent on the load
factor α.
Linear Probing
Let
h' : U → {0, 1, ..., m − 1} be an ordinary hash function
Define
h(k, i) = (h'(k) + i) mod m
• Easy to implement
• Suffers from primary clustering
– Long runs of occupied slots build up.
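Putting the open-addressing pieces together – linear probing plus the DELETED marker from the deletion slide – gives a sketch like the following (the class and use of Python's built-in `hash` as h' are our own choices):

```python
# Open addressing with linear probing h(k, i) = (h'(k) + i) mod m,
# using a DELETED sentinel so search can keep probing past removed keys.

DELETED = object()                 # special marker, distinct from any element

class OpenAddressTable:
    def __init__(self, m):
        self.m = m
        self.T = [None] * m

    def probe(self, k, i):
        return (hash(k) + i) % self.m

    def insert(self, k, v):
        for i in range(self.m):
            j = self.probe(k, i)
            if self.T[j] is None or self.T[j] is DELETED or self.T[j][0] == k:
                self.T[j] = (k, v)
                return
        raise RuntimeError("table full")

    def search(self, k):
        for i in range(self.m):
            j = self.probe(k, i)
            if self.T[j] is None:          # truly empty slot: not present
                return None
            if self.T[j] is not DELETED and self.T[j][0] == k:
                return self.T[j][1]
        return None

    def delete(self, k):
        for i in range(self.m):
            j = self.probe(k, i)
            if self.T[j] is None:
                return
            if self.T[j] is not DELETED and self.T[j][0] == k:
                self.T[j] = DELETED        # cannot simply set to None
                return

t = OpenAddressTable(8)
t.insert("a", 1)
t.insert("b", 2)
```

Note why DELETED is needed: if delete wrote None, a later search for a key that had probed past the deleted slot would stop early and wrongly report "not found".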
Double hashing
h(k, i) = (h1(k) + ih2(k)) mod m
where
h1 and h2 are hash functions
• Advantage:
– Two keys with the same h1 value may still probe with different steps.
– Θ(m²) different probe sequences, compared with only m for linear probing.
Example
• h1(k) = k mod 13
• h2(k) = 1 + (k mod 11)
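Using the slide's two functions, the probe sequence of any key can be listed directly. Here we assume the table size is m = 13, matching h1:

```python
# Probe sequences for double hashing with the example functions
# h1(k) = k mod 13 and h2(k) = 1 + (k mod 11), table size m = 13.

def probe_sequence(k, m=13):
    h1 = k % 13
    h2 = 1 + (k % 11)
    return [(h1 + i * h2) % m for i in range(m)]

seq = probe_sequence(14)    # h1 = 1, h2 = 4: probes 1, 5, 9, 0, 4, ...
```

Because m = 13 is prime and 1 ≤ h2(k) ≤ 11 < m, h2(k) is always relatively prime to m, so every key's probe sequence visits all m slots.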
Analysis of Open-Address Hashing
• Theorem:
– Given an open-address hash table with load factor α < 1,
– the expected number of probes in a successful search is at most (1/α)·ln(1/(1−α)),
• assuming uniform hashing
– each key is equally likely to have any of the m! permutations of ⟨0, 1, . . . , m − 1⟩ as its probe sequence,
• and assuming that each key in the table is equally likely to be searched for.
Analysis of Open-Address Hashing
• Theorem:
– Given an open-address hash table with load factor α < 1,
– the expected number of probes in an unsuccessful search is at most 1/(1−α),
• assuming uniform hashing.
• If α is a constant, a search runs in O(1) time.
• Theorem:
– The expected number of probes to insert is at most 1/(1−α).
Bloom Filter
Motivation
• Suppose we must store a set of 10,000,000 URLs (e.g., a blacklist of URLs).
• Storing the URLs themselves takes a great deal of space, and we only need to answer membership queries.
• We are willing to tolerate a small probability of a false positive – answering "yes" for a URL that is not in the set – but never a false negative.

A First Attempt – A Single Bit Array
• We have 10,000,000 URLs.
• Take a bit array T of 80,000,000 bits and a hash function h; initialize all of T to 0.
• For every URL x in the set, set T[h(x)] = 1.
• To query a URL x, examine T[h(x)]:
– If T[h(x)] = 0, then x is certainly not in the set.
– If T[h(x)] = 1, we answer that x is in the set.
• But if T[h(x)] = 1, is x necessarily in the set?
– No: there may be some y in the set with h(x) = h(y) – a false positive.
• The probability of a false positive is about 10,000,000/80,000,000 = 0.125.
The Bloom Filter
• A bit array of m bits and k hash functions h_1, …, h_k.
• Initialization: all bits are 0.
• To insert an element x, compute the k hashes of x and set the corresponding k bits to 1.
• To query an element y, compute its k hashes; answer "yes" iff all k corresponding bits are 1.
• In total, n elements are inserted.

Analysis Assumptions
• For the analysis, we assume the hash functions behave like independent, uniformly random functions.
• We will compute the false-positive probability as a function of k, m and n under this assumption.
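The insert and query operations can be sketched in a few lines. The k hash functions are simulated here by hashing the element together with an index – an illustrative choice, not something the slides prescribe:

```python
# A small Bloom-filter sketch: k bits per inserted element,
# query answers "yes" iff all k bits are set.

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _hashes(self, x):
        # simulate k hash functions by salting with the index i
        return [hash((i, x)) % self.m for i in range(self.k)]

    def insert(self, x):
        for j in self._hashes(x):
            self.bits[j] = 1

    def query(self, x):
        return all(self.bits[j] for j in self._hashes(x))

bf = BloomFilter(m=64, k=3)
for url in ["a.com", "b.com", "c.com"]:
    bf.insert(url)
```

Every inserted element is guaranteed to answer "yes" (no false negatives); only non-members can be answered incorrectly, which is the false-positive event analyzed below.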
Example (m = 12, k = 3)
• Initially the array is all zeros: 0 0 0 0 0 0 0 0 0 0 0 0
• Insert x1: its 3 hash positions are set to 1 → 0 0 0 0 0 1 0 1 0 0 1 0
• Insert x2: its 3 hash positions are set to 1 → 0 0 1 1 0 1 0 1 0 0 1 0
• Query y1: at least one of its 3 positions is 0, so it is correctly reported as not in the set.
• Query y2: all 3 of its positions happen to hold 1, so it is reported as in the set – a false positive.
The False-Positive Probability
• A query for an element not in the set is a false positive iff all k of its bit positions hold 1.
• Inserting n elements sets k bits each, so the probability that a particular bit is still 0 afterwards is
(1 − 1/m)^{kn} ≈ e^{−kn/m}.
• Hence the probability that a particular bit is 1 is about 1 − e^{−kn/m}.
• The false-positive probability is therefore approximately
g(k) = (1 − e^{−kn/m})^k.

Choosing k
• Given m (the array size) and n (the number of elements), which k minimizes the false-positive probability?
• There is a trade-off: a small k checks few bits, while a large k fills the array with 1s.
• Minimizing g(k) (e.g., by differentiating ln g(k)) gives
k = (m/n)·ln 2,
• for which the false-positive probability is (1/2)^k ≈ (0.6185)^{m/n}.
• Example: spending one byte per element (m/n = 8 bits) gives an optimal false-positive probability of about 0.0214.
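The formulas above are easy to check numerically. The sketch below evaluates g(k) = (1 − e^{−kn/m})^k for m/n = 8 and confirms both the optimum k = (m/n)·ln 2 and the resulting rate of about 0.0214:

```python
# Numeric check of the false-positive formula and its optimum.
import math

def false_positive(k, m, n):
    return (1.0 - math.exp(-k * n / m)) ** k

m_over_n = 8                              # 8 bits (one byte) per element
k_opt = m_over_n * math.log(2)            # optimal k, about 5.55
p_opt = false_positive(k_opt, m_over_n, 1)   # about 0.0214
```

At the optimum, e^{−kn/m} = 1/2, so each checked bit is 1 with probability exactly 1/2; integer k values near 5.55 (k = 5 or 6) give rates only slightly worse.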