hashing - cs.unc.edu


Hashing

Dynamic Dictionaries

Operations:
• create
• insert
• find
• remove
• max/min
• write out in sorted order

Only defined for object classes that are Comparable

Hash tables

Operations:
• create
• insert
• find
• remove
(max/min and write out in sorted order are not supported efficiently)

Only defined for object classes that have equals() defined (they need not be Comparable)


Java-specific: [Figure: excerpt from the Java documentation]

Hash tables – implementation

• Have a table (an array) of a fixed tableSize
• A hash function determines where in this table each item should be stored

item → hash(item) [a positive integer] → hash(item) % tableSize (e.g., 2174 % 10 = 4)

THE DESIGN QUESTIONS
1. Choosing tableSize
2. Choosing a hash function
3. What to do when a collision occurs
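In Java, hashCode() returns an int that may be negative, and Java's % operator keeps the sign of the left operand, so hash(item) % tableSize is not always a valid index. A minimal sketch (the helper name TableIndex.indexFor is hypothetical) that uses Math.floorMod, available since Java 8, so the result always lands in 0..tableSize-1:

```java
// Hypothetical helper: maps hash(item) to a cell index in 0..tableSize-1.
final class TableIndex {
    private TableIndex() {}

    static int indexFor(int hash, int tableSize) {
        // Math.floorMod takes the sign of the divisor, so the result is never negative here;
        // plain hash % tableSize would be negative whenever hash is negative.
        return Math.floorMod(hash, tableSize);
    }

    static int indexFor(Object item, int tableSize) {
        return indexFor(item.hashCode(), tableSize);
    }
}
```

For example, indexFor(2174, 10) is 4, as above, and indexFor(-7, 10) is 3, where -7 % 10 would be -7.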

Hash tables – tableSize

• Should depend on the (maximum) number of values to be stored
• Let λ = [number of values stored]/tableSize
   • the load factor of the hash table
• Restrict λ to be at most 1 (or ½)
• Require tableSize to be a prime number
   • to “randomize” away any patterns that may arise in the hash function values
   • the prime should be of the form (4k+3) [for reasons to be detailed later]
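A small sketch for picking tableSize under these rules, assuming the maximum number of values and a target maximum load factor are known in advance. The class and method names are hypothetical, and trial division is used only because table sizes are small:

```java
// Hypothetical helper: choose tableSize as a prime of the form 4k+3.
final class TableSizes {
    private TableSizes() {}

    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    /** Smallest prime p >= n with p % 4 == 3. */
    static int nextPrime4kPlus3(int n) {
        int p = Math.max(n, 3);
        while (!(p % 4 == 3 && isPrime(p))) {
            p++;
        }
        return p;
    }

    /** Table size for at most maxItems values while keeping λ <= maxLoad. */
    static int chooseTableSize(int maxItems, double maxLoad) {
        return nextPrime4kPlus3((int) Math.ceil(maxItems / maxLoad));
    }
}
```

For at most 100 values with λ ≤ ½, chooseTableSize(100, 0.5) returns 211, the first prime of the form 4k+3 at or above 200.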

Hash tables – the hash function

If the objects to be stored have integer keys (e.g., student IDs), hash(k) = k is generally OK, unless the keys have “patterns”

Otherwise, use some “randomized” way to obtain an integer
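The slides leave the “randomized” method open. One common choice for String keys (assumed here, not taken from the slides) is a polynomial hash evaluated with Horner's rule, reducing modulo tableSize at every step so the intermediate value stays small. The base 37 and the class name are assumptions:

```java
// Sketch: polynomial hash of a String key via Horner's rule (hypothetical class name).
final class StringHash {
    private StringHash() {}

    /** Computes (s[0]*B^(n-1) + ... + s[n-1]) mod tableSize. */
    static int hash(String key, int tableSize) {
        final int base = 37;   // assumed multiplier; 31 and 37 are common choices
        long h = 0;
        for (int i = 0; i < key.length(); i++) {
            // Reducing at every step keeps h below tableSize, so the long never overflows.
            h = (h * base + key.charAt(i)) % tableSize;
        }
        return (int) h;
    }
}
```

For instance, hash("ab", 11) evaluates (97·37 + 98) mod 11 = 3687 mod 11 = 2.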


Java-specific

• Every class has a default hashCode() method that returns an integer
• May be (should be) overridden
• Required properties:
   – consistent with the class's equals() method (objects that are equal must return the same value)
   – need not be consistent across different runs of the program
   – different objects may return the same value!


From the Java 1.5.0 documentation:

http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#hashCode%28%29
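The last two required properties can be seen with java.lang.String, whose hashCode() is specified as s[0]*31^(n-1) + … + s[n-1]. A small demo (class name hypothetical):

```java
// Sketch: the hashCode() requirements illustrated with java.lang.String.
final class HashCodeDemo {
    private HashCodeDemo() {}

    /** True if a and b are unequal objects that nevertheless share a hash code. */
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Different objects may return the same value: 65*31 + 97 == 66*31 + 66 == 2112.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode() + " " + collide("Aa", "BB"));
        // prints "2112 2112 true"

        // Two distinct objects that are equal() must return the same hash code.
        String c = new String("weiss");
        String d = new String("weiss");
        System.out.println((c == d) + " " + c.equals(d) + " " + (c.hashCode() == d.hashCode()));
        // prints "false true true"
    }
}
```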

Hash tables – collision resolution

The universe of possible items is usually far greater than tableSize

Collision: when multiple items hash onto the same location (aka cell or bucket)

Collision resolution strategies specify what to do in case of collision:
1. Chaining (closed addressing)
2. Probing (open addressing)
   a. Linear probing
   b. Quadratic probing
   c. Double hashing
   d. Perfect hashing
   e. Cuckoo hashing




Hash tables – collision resolution: chaining

Maintain a linked list at each cell/bucket (the hash table is an array of linked lists)

Insert: at front of list
– if the precondition is “not already in list,” then faster (no need to search the list first)
– in any case, later-inserted items are often accessed more frequently (the LRU principle)

Example: insert 02, 12, 22, …, 92 into an initially empty hash table with tableSize = 10
[Note: bad choice of tableSize – only to make the example easier!!]


Find and Remove: obvious implementations (search the list in the item's bucket)

Worst-case run-time: Θ(N) per operation (all elements in the same list)

Average case: O(1 + λ) per operation

Design rule: for chaining, keep λ ≤ 1; if λ becomes greater than 1, rehash (later)

Here the load factor λ = [number of items stored]/tableSize
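A minimal separate-chaining sketch, assuming items whose equals() and hashCode() are consistent; the class name ChainedHashSet and its methods are hypothetical. insert adds at the front of the bucket's list (after a duplicate check), and find/remove search only that one list:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch: separate chaining, i.e. an array of linked lists (hypothetical class name).
final class ChainedHashSet<T> {
    private final List<LinkedList<T>> buckets = new ArrayList<>();
    private int size;

    ChainedHashSet(int tableSize) {
        for (int i = 0; i < tableSize; i++) buckets.add(new LinkedList<>());
    }

    private LinkedList<T> bucketFor(T x) {
        // floorMod keeps the index non-negative even if hashCode() is negative
        return buckets.get(Math.floorMod(x.hashCode(), buckets.size()));
    }

    /** Inserts x at the front of its list; returns false if x is already present. */
    boolean insert(T x) {
        LinkedList<T> list = bucketFor(x);
        if (list.contains(x)) return false;   // this scan could be skipped if x is known to be new
        list.addFirst(x);
        size++;
        return true;
    }

    boolean find(T x) {
        return bucketFor(x).contains(x);
    }

    boolean remove(T x) {
        boolean removed = bucketFor(x).remove(x);
        if (removed) size--;
        return removed;
    }

    /** Snapshot of bucket i, front of the list first. */
    List<T> bucket(int i) {
        return new ArrayList<>(buckets.get(i));
    }

    /** λ = [number of items stored] / tableSize */
    double loadFactor() {
        return (double) size / buckets.size();
    }
}
```

Inserting 2, 12, …, 92 into a table of size 10 puts all ten values in bucket 2 with 92 at the front, which is exactly the all-in-one-list worst case.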

Hash tables – collision resolution: probing


In case of collision, try alternative locations until an empty cell is found [open addressing]
• Probe sequence: h_0(x), h_1(x), h_2(x), …, with h_i(x) = [hash(x) + f(i)] % tableSize
• The function f(i) is different for the different probing methods
• Avoids the use of dynamic memory

Linear probing: f(i) is a linear function of i – typically, f(i) = i

Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using linear probing
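A sketch of this example, assuming integer keys with hash(x) = x and null marking an empty cell (the helper name LinearProbing.insertAll is hypothetical):

```java
// Sketch: open addressing with linear probing, h_i(x) = (hash(x) + i) % tableSize.
final class LinearProbing {
    private LinearProbing() {}

    /** Inserts keys in order into a new table; null marks an empty cell. */
    static Integer[] insertAll(int tableSize, int... keys) {
        Integer[] table = new Integer[tableSize];
        for (int x : keys) {
            int home = Math.floorMod(x, tableSize);   // hash(x) = x for integer keys
            int i = 0;                                // f(i) = i
            while (table[(home + i) % tableSize] != null) {
                if (++i == tableSize) throw new IllegalStateException("table is full");
            }
            table[(home + i) % tableSize] = x;
        }
        return table;
    }
}
```

Here 49 wraps around to cell 0, 58 probes cells 8, 9 and 0 before landing in cell 1, and 69 probes 9, 0 and 1 before landing in cell 2, giving [49, 58, 69, null, null, null, null, null, 18, 89].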


Hash tables – review

Supports the basic dynamic dictionary ops: insert, find, remove

Does not need the class to be Comparable

Three design decisions: tableSize, hash function, collision resolution
• Table size: a prime of the form (4k+3), keeping load factor constraints in mind
• Hash function: should “randomize” the items; Java's hashCode() method
• Collision resolution: chaining
• Collision resolution: probing (open addressing) – linear probing
   – The clustering problem

Hash tables – clustering

Two causes of clustering:
• multiple keys hash onto the same location (secondary clustering)
• multiple keys hash onto the same cluster (primary clustering)

Secondary clustering is caused by the hash function; primary, by the choice of probe sequence

Number of probes per operation increases with load factor

Hash tables – collision resolution: quadratic probing


f(i) is a quadratic function of i (e.g., f(i) = i²)

Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using quadratic probing
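The same example with f(i) = i², sketched with the incremental update (i+1)² = i² + (2i+1) instead of a multiplication. As in the linear-probing sketch, integer keys with hash(x) = x, null for an empty cell, and the helper name are assumptions:

```java
// Sketch: quadratic probing, h_i(x) = (hash(x) + i*i) % tableSize.
final class QuadraticProbing {
    private QuadraticProbing() {}

    /** Inserts keys in order into a new table; null marks an empty cell. */
    static Integer[] insertAll(int tableSize, int... keys) {
        Integer[] table = new Integer[tableSize];
        for (int x : keys) {
            int cell = Math.floorMod(x, tableSize);   // h_0(x) = hash(x) = x
            int increment = 1;                         // 2i + 1, starting at i = 0
            int probes = 0;
            while (table[cell] != null) {
                // Quadratic probing can miss free cells, so give up after tableSize probes.
                if (++probes >= tableSize) throw new IllegalStateException("no free cell found");
                // i^2 -> (i+1)^2: add 2i+1 to the current cell, then raise the increment by 2
                cell = (cell + increment) % tableSize;
                increment += 2;
            }
            table[cell] = x;
        }
        return table;
    }
}
```

Here 49 lands in cell 0 (9 + 1), 58 in cell 2 (8 + 4) and 69 in cell 3 (9 + 4), giving [49, null, 58, 69, null, null, null, null, 18, 89].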


Hash tables – collision resolution: quadratic probing

Two causes of clustering: multiple keys hash onto the same location (secondary clustering); multiple keys hash onto the same cluster (primary clustering)

Which one does quadratic probing solve?
– Primary clustering

Efficient implementation of i² → (i+1)²: since (i+1)² = i² + (2i+1), compute (i+1) and (2i+1) in parallel, and then add i² and (2i+1)

Choosing tableSize:
– prime: at least half the table gets probed
– prime of the form (4k+3) and probe sequence ± i² (i.e., +1², −1², +2², −2², …): entire table gets probed
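Both claims can be checked numerically. The sketch below (hypothetical class name) counts the distinct cells reached from home cell 0; any other home cell shifts every probe by the same amount, so the counts do not change:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: how many cells do quadratic probe offsets reach in a table of size p?
final class QuadraticCoverage {
    private QuadraticCoverage() {}

    /** Distinct cells hit by offsets +i^2, i = 0 .. p-1. */
    static int cellsPlus(int p) {
        Set<Integer> cells = new HashSet<>();
        for (long i = 0; i < p; i++) {
            cells.add((int) ((i * i) % p));
        }
        return cells.size();
    }

    /** Distinct cells hit by offsets +i^2 and -i^2, i = 0 .. p-1. */
    static int cellsPlusMinus(int p) {
        Set<Integer> cells = new HashSet<>();
        for (long i = 0; i < p; i++) {
            cells.add((int) ((i * i) % p));
            cells.add((int) Math.floorMod(-(i * i), (long) p));
        }
        return cells.size();
    }
}
```

For the prime 7 (= 4·1 + 3), +i² reaches 4 of the 7 cells and ±i² reaches all 7; for the prime 13 (= 4·3 + 1), ±i² still reaches only 7 of the 13 cells.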

Remove: lazy delete must be used

Hash tables – collision resolution: double hashing


To get rid of secondary clustering:
• Use two hash functions: hash1(.) and hash2(.)
• Probe sequence “step” size is hash2(.), i.e., h_i(x) = [hash1(x) + i·hash2(x)] % tableSize
   – [Unlikely that distinct items agree on both hash1(.) and hash2(.)]
• hash2(.) must never evaluate to zero!
   – A common (good) choice: R − (x mod R), for R a prime smaller than tableSize (its values lie in 1..R)

Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using double hashing with hash2(x) = 7 − (x mod 7)
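A sketch of the double-hashing example, with hash1(x) = x % tableSize and hash2(x) = R − (x mod R) for R = 7. As before, integer keys, null for an empty cell, and the helper name are assumptions:

```java
// Sketch: double hashing, h_i(x) = (hash1(x) + i * hash2(x)) % tableSize.
final class DoubleHashing {
    private DoubleHashing() {}

    /** Inserts keys in order into a new table using hash2(x) = r - (x mod r); null = empty. */
    static Integer[] insertAll(int tableSize, int r, int... keys) {
        Integer[] table = new Integer[tableSize];
        for (int x : keys) {
            int cell = Math.floorMod(x, tableSize);   // hash1(x)
            int step = r - Math.floorMod(x, r);       // hash2(x), always in 1..r, never zero
            int probes = 0;
            while (table[cell] != null) {
                if (++probes >= tableSize) throw new IllegalStateException("no free cell found");
                cell = (cell + step) % tableSize;
            }
            table[cell] = x;
        }
        return table;
    }
}
```

Here 49 steps by 7 from cell 9 to cell 6, 58 steps by 5 from cell 8 to cell 3, and 69 steps by 1 from cell 9 to cell 0, giving [69, null, null, 58, null, null, 49, null, 18, 89].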



Hash tables – collision resolution: Cuckoo hashing

Goal: constant-time O(1) find in the worst case
• Example application: network routing tables
• [remove also takes O(1) time]

Insert has worst-case Θ(N) run-time

Keep two hash tables, and use two different hash functions

Hash tables – collision resolution: Cuckoo hashing

Example: TABLE 1 and TABLE 2, each with cells 0–4; items inserted in the order listed
A: hash1(A) = 0, hash2(A) = 2
B: hash1(B) = 0, hash2(B) = 0
C: hash1(C) = 1, hash2(C) = 4
D: hash1(D) = 1, hash2(D) = 0
E: hash1(E) = 3, hash2(E) = 2
F: hash1(F) = 3, hash2(F) = 4

[Figure: contents of TABLE 1 and TABLE 2 as each item is inserted]

Hash tables – collision resolution: Cuckoo hashing

Insert
– Insert into Table 1, using hash1
– If the cell is already occupied:
   – bump the item already there into the other table (using the appropriate hash function)
   – repeat (a bumped item may in turn bump another item)
   – rehash after k repetitions

Each table should be more than half empty
– a stronger condition than load factor ≤ ½
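A minimal sketch of the insert loop above, with a cap on displacements standing in for “rehash after k repetitions”. The cap, the class name and its API are hypothetical, and a full version would rehash (e.g., with new hash functions) instead of throwing. Find and remove look at only the two possible cells:

```java
import java.util.function.ToIntFunction;

// Sketch: cuckoo hashing with two tables (hypothetical class name and API).
final class CuckooTable<T> {
    private final Object[] table1;
    private final Object[] table2;
    private final ToIntFunction<T> hash1;
    private final ToIntFunction<T> hash2;
    private final int maxBumps;   // stands in for "rehash after k repetitions"

    CuckooTable(int size, ToIntFunction<T> hash1, ToIntFunction<T> hash2, int maxBumps) {
        this.table1 = new Object[size];
        this.table2 = new Object[size];
        this.hash1 = hash1;
        this.hash2 = hash2;
        this.maxBumps = maxBumps;
    }

    private int slot1(T x) { return Math.floorMod(hash1.applyAsInt(x), table1.length); }
    private int slot2(T x) { return Math.floorMod(hash2.applyAsInt(x), table2.length); }

    /** Worst-case O(1): x can only be in table1[hash1(x)] or table2[hash2(x)]. */
    boolean find(T x) {
        return x.equals(table1[slot1(x)]) || x.equals(table2[slot2(x)]);
    }

    /** Also O(1): clear whichever of the two possible cells holds x. */
    boolean remove(T x) {
        if (x.equals(table1[slot1(x)])) { table1[slot1(x)] = null; return true; }
        if (x.equals(table2[slot2(x)])) { table2[slot2(x)] = null; return true; }
        return false;
    }

    /** Inserts into table 1; any occupant is bumped into the other table, and so on. */
    @SuppressWarnings("unchecked")
    void insert(T x) {
        if (find(x)) return;
        T current = x;
        for (int k = 0; k < maxBumps; k++) {
            int i = slot1(current);
            T bumped = (T) table1[i];   // current takes its table-1 cell
            table1[i] = current;
            if (bumped == null) return;
            int j = slot2(bumped);
            current = (T) table2[j];    // the bumped item takes its table-2 cell
            table2[j] = bumped;
            if (current == null) return;
        }
        throw new IllegalStateException("rehash needed; item without a cell: " + current);
    }

    Object cell1(int i) { return table1[i]; }
    Object cell2(int i) { return table2[i]; }
}
```

With the hash values from the example and the items inserted in the listed order, F takes cell 3 of table 1 and bumps E into table 2, where E bumps A back into table 1, where A bumps B into table 2. The result is table 1 = {0: A, 1: D, 3: F} and table 2 = {0: B, 2: E, 4: C}.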

Rehashing

When the load factor becomes too large…
• (Approximately) double tableSize
• Scan the old table, inserting each non-deleted item into the new table

Worst-case time? O(N²)

Average case: O(N)

Amortized analysis: average cost per insert, over a sequence of repeated rehashings
[Not great for interactive applications…]
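A sketch of the scan-and-reinsert step for a linear-probing table of Integer keys, assuming a hypothetical layout in which null marks an empty cell and a parallel boolean[] marks lazy-deleted cells. The new size is taken, by assumption, as the first prime of the form 4k+3 at or above twice the old size:

```java
// Sketch: rehashing a linear-probing table (hypothetical layout: null = empty cell,
// deleted[i] = lazy-deleted cell).
final class Rehash {
    private Rehash() {}

    static Integer[] rehash(Integer[] cells, boolean[] deleted) {
        // (Approximately) double tableSize, keeping it a prime of the form 4k+3.
        Integer[] bigger = new Integer[nextPrime4kPlus3(2 * cells.length)];
        for (int i = 0; i < cells.length; i++) {
            if (cells[i] == null || deleted[i]) continue;   // only non-deleted items move
            int cell = Math.floorMod(cells[i], bigger.length);
            while (bigger[cell] != null) {
                cell = (cell + 1) % bigger.length;          // linear probing in the new table
            }
            bigger[cell] = cells[i];
        }
        return bigger;
    }

    static int nextPrime4kPlus3(int n) {
        for (int p = Math.max(n, 3); ; p++) {
            if (p % 4 != 3) continue;                       // such p is odd
            boolean prime = true;
            for (int d = 3; d * d <= p; d += 2) {
                if (p % d == 0) { prime = false; break; }
            }
            if (prime) return p;
        }
    }
}
```

Rehashing the size-10 table from the linear-probing example, with 58 marked as deleted, yields a table of size 23 holding 69, 49, 18 and 89 in cells 0, 3, 18 and 20.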

Hash tables – review

Supports the basic dynamic dictionary ops: insert, find, remove

Three design decisions: tableSize, hash function, collision resolution
• Table size: a prime of the form (4k+3), keeping load factor constraints in mind
• Hash function: Java's hashCode() method; item goes to hash(item) % tableSize

Collision: multiple items at the same location
• Collision resolution: chaining
• Collision resolution: probing (open addressing)
   – Linear probing
   – Quadratic probing
   – Double hashing
   – Cuckoo hashing

Java-specific – hashCode() and equals()

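public class Employee {
    String name;
    int id;
    public Employee(String n, int i) { name = n; id = i; }

    @Override   // must take Object to override Object.equals; equals(Employee) only overloads it
    public boolean equals(Object o) {
        if (!(o instanceof Employee)) return false;
        Employee e = (Employee) o;
        return name.equals(e.name) && id == e.id;   // compare Strings with equals(), not ==
    }
    // hashCode() is NOT overridden, so equal Employees typically get different hash codes
}

……

public static void main(String[] args) {
    Employee e1 = new Employee("weiss", 001);
    Employee e2 = new Employee("weiss", 001);
    System.out.println(e1.hashCode() + ", " + e2.hashCode());  // typically two different values
    System.out.println(e1 == e2);        // false: two distinct objects
    System.out.println(e1.equals(e2));   // true: same name and id
}

Employee e2 = e1;   // variant: e2 refers to the same object, so e1 == e2 and the hash codes match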
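For contrast, a sketch of the usual fix: override hashCode() using the same fields that equals() compares, here via java.util.Objects.hash. The class is named EmployeeKey (hypothetical) so it does not clash with the slide's Employee:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch: an Employee-like class whose hashCode() is consistent with equals().
final class EmployeeKey {
    private final String name;
    private final int id;

    EmployeeKey(String name, int id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EmployeeKey)) return false;
        EmployeeKey e = (EmployeeKey) o;
        return id == e.id && Objects.equals(name, e.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id);   // built from exactly the fields equals() compares
    }

    /** A HashSet now finds an equal but distinct object. */
    static boolean demo() {
        Set<EmployeeKey> set = new HashSet<>();
        set.add(new EmployeeKey("weiss", 1));
        return set.contains(new EmployeeKey("weiss", 1));
    }
}
```

demo() returns true; the same lookup with the slide's Employee would typically fail, because HashSet compares hash codes before calling equals().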

Hash tables – collision resolution: linear probing

f(i) can be any linear function (a*i + b)

If gcd(a, tableSize) = 1, then linear probing will probe the entire table

Primary clustering: blocks of occupied cells start forming even in a relatively empty table

[Figure: any item hashing into a cluster grows the cluster by one; any item hashing into the cell between two clusters merges the two clusters]
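The gcd condition can be checked with a short sketch (hypothetical names) that counts the distinct cells hit by offsets a*i + b for i = 0, …, tableSize − 1 from home cell 0; the count works out to tableSize / gcd(a, tableSize):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: coverage of the linear probe sequence f(i) = a*i + b.
final class LinearCoverage {
    private LinearCoverage() {}

    /** Distinct cells hit by offsets a*i + b, i = 0 .. tableSize-1, from home cell 0. */
    static int cellsProbed(int tableSize, int a, int b) {
        Set<Integer> cells = new HashSet<>();
        for (int i = 0; i < tableSize; i++) {
            cells.add(Math.floorMod(a * i + b, tableSize));
        }
        return cells.size();
    }

    static int gcd(int x, int y) {
        return y == 0 ? x : gcd(y, x % y);
    }
}
```

With tableSize = 10, a = 3 reaches all 10 cells, while a = 4 (gcd 2) reaches only 5.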


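Hash tables – linear probing: remove

Example (table cells 0–9):
• insert A; hash(A) = 4 → A goes in cell 4
• insert B; hash(B) = 5 → B goes in cell 5
• insert C; hash(C) = 4 → cells 4 and 5 are occupied, so C goes in cell 6
• remove B, then find C: if cell 5 were simply emptied, the search for C would stop at the empty cell 5 and wrongly report that C is absent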

Remove must be implemented as lazy delete (mark the cell as deleted instead of emptying it)!!
– Load factor is computed including lazy-deleted items
– In inserts, may “reclaim” lazy-deleted cells
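A minimal sketch of lazy deletion with linear probing, assuming integer keys with hash(x) = x % tableSize; the keys 4, 5 and 14 play the roles of A, B and C in the example (hash values 4, 5 and 4). The class name is hypothetical:

```java
// Sketch: linear probing with lazy deletion (hypothetical class; int keys, hash(x) = x % tableSize).
final class LazyDeleteTable {
    private static final byte EMPTY = 0, ACTIVE = 1, DELETED = 2;
    private final int[] keys;
    private final byte[] state;
    private int used;   // ACTIVE + DELETED cells: the count the load factor is based on

    LazyDeleteTable(int tableSize) {
        keys = new int[tableSize];
        state = new byte[tableSize];   // all cells start EMPTY
    }

    /** Returns the cell holding x, or -1. Probing stops only at an EMPTY cell, never at DELETED. */
    int find(int x) {
        int cell = Math.floorMod(x, keys.length);
        for (int i = 0; i < keys.length && state[cell] != EMPTY; i++) {
            if (state[cell] == ACTIVE && keys[cell] == x) return cell;
            cell = (cell + 1) % keys.length;
        }
        return -1;
    }

    /** Lazy delete: mark the cell DELETED instead of emptying it. */
    boolean remove(int x) {
        int cell = find(x);
        if (cell < 0) return false;
        state[cell] = DELETED;
        return true;
    }

    /** Inserts x (if absent) into the first non-ACTIVE cell on its probe path. */
    int insert(int x) {
        int existing = find(x);
        if (existing >= 0) return existing;
        int cell = Math.floorMod(x, keys.length);
        for (int i = 0; i < keys.length; i++) {
            if (state[cell] != ACTIVE) {
                if (state[cell] == EMPTY) used++;   // a reclaimed DELETED cell was already counted
                keys[cell] = x;
                state[cell] = ACTIVE;
                return cell;
            }
            cell = (cell + 1) % keys.length;
        }
        throw new IllegalStateException("table is full");
    }

    double loadFactor() {
        return (double) used / keys.length;
    }
}
```

After insert(4), insert(5), insert(14) and remove(5), find(14) still returns cell 6 because probing passes over the deleted cell 5. A later insert(24) reclaims cell 5, and the load factor stays 0.3 because the reclaimed cell was already counted.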
