Parallel Hashing Algorithms

Prepared by: Akash Panchal (14MCEC17) and Nisarg Patel (14MCEC20), Nirma University, Ahmedabad

Concurrent hash table

• Indexing key-value objects

• Lookup(key)

• Insert(key, value)

• Delete(key)

• Fundamental building block for modern systems

• System applications (e.g., kernel caches)

• Concurrent user-level applications

Goal: memory-efficient and high-throughput

• Memory efficient (e.g., > 90% space utilized)

• Fast concurrent reads (scale with # of cores/threads)

• Fast concurrent writes (scale with # of cores/threads)

Separate chaining hash table

Chaining items hashed in the same bucket

[Figure: lookup walks a chain of key-value (K, V) nodes hashed to the same bucket.]


- Acquiring a linked list node location: only one thread should insert a new node.

- Find the end of the linked list and lock out other threads.

- Make sure we are still at the end of the linked list.

- Then allocate a new node from the unfilled region, and finally link it in.
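The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the referenced work: the names `Node`, `ChainedBucket` and `items` are hypothetical, a per-node `threading.Lock` stands in for whatever locking primitive the target machine offers, and a plain object allocation replaces the "unfilled region". Duplicate keys are not checked.

```python
import threading


class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None
        self.lock = threading.Lock()  # guards this node's `next` pointer


class ChainedBucket:
    """One bucket of a separate-chaining table (hypothetical helper)."""

    def __init__(self):
        self.head = Node(None, None)  # sentinel, so the chain always has a tail

    def insert(self, key, value):
        node = self.head
        while True:
            # Walk to the apparent end of the list without locking.
            while node.next is not None:
                node = node.next
            # Lock out other threads at the tail.
            with node.lock:
                # Make sure we are still at the end of the list.
                if node.next is None:
                    # Allocate the new node and link it in.
                    node.next = Node(key, value)
                    return
            # Another thread appended first: keep walking from here.

    def items(self):
        out, node = [], self.head.next
        while node is not None:
            out.append((node.key, node.value))
            node = node.next
        return out
```

Only the thread that holds the tail lock and still sees an empty `next` pointer links its node, so no insert is lost when several threads append to the same chain.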

Good: simple

Bad: poor cache locality

Bad: pointers cost space


Open addressing hash table

Probing alternate locations for vacancy

e.g., linear/quadratic probing, double hashing

- Inserting a key:

- 1) The bucket is empty.

    - Must acquire a location.

    - The TwoStepAcquireAttempt algorithm is used.

    - Insert the key.

- 2) The bucket is already filled.

    - Find the next possible bucket using any probing method.

Procedure: TwoStepAcquireAttempt

Parameters: (location, key)

// first check without locking
if location is empty then
    lock(location)
    // then check with the lock
    if location is still empty then
        Reserve location for key
        unlock(location)
        return true
    end if
    // Another thread modified the structure before we could acquire location
    unlock(location)
end if
return false  // Caller needs to continue searching
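A runnable sketch of TwoStepAcquireAttempt combined with linear probing might look like the following. The class name `ProbingTable`, the `EMPTY` sentinel, the per-slot `threading.Lock` and the use of Python's built-in `hash` are all assumptions made for illustration. Deletion is not handled.

```python
import threading

EMPTY = object()  # sentinel marking an unoccupied slot


class ProbingTable:
    def __init__(self, capacity):
        self.slots = [EMPTY] * capacity
        self.locks = [threading.Lock() for _ in range(capacity)]

    def two_step_acquire_attempt(self, location, key):
        # First check without locking.
        if self.slots[location] is EMPTY:
            with self.locks[location]:
                # Then check again while holding the lock.
                if self.slots[location] is EMPTY:
                    self.slots[location] = key  # reserve location for key
                    return True
            # Another thread filled the slot before we could acquire it.
        return False  # caller needs to continue searching

    def insert(self, key):
        capacity = len(self.slots)
        start = hash(key) % capacity
        for step in range(capacity):
            location = (start + step) % capacity  # linear probing
            if self.two_step_acquire_attempt(location, key):
                return True
            if self.slots[location] == key:
                return True  # key is already stored here
        return False  # every slot was probed: the table is full
```

The unlocked first check keeps threads from queueing on slots that are already taken, and the locked second check makes sure that only one thread reserves an empty slot.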

Open addressing hash table

Good: cache friendly

Bad: poor memory efficiency

- Performance degrades dramatically once usage grows beyond roughly 70% of capacity

- e.g., Google dense_hash_map wastes 50% memory by default.

Linear hashing with linear probing

Parallel insert:

Case 1: Two keys are inserted at different locations.

[Figure: slots H1 to H10; H1 = insert(x) and H5 = insert(y) both complete in pass 1.]

Case 2: Two keys are inserted at different locations, but a collision occurs.

[Figure: slots H1 to H10 with keys s, u, t; passes 1 to 3.]

Case 3: Two keys try to insert at the same location and contention occurs.

[Figure: slots H1 to H10; keys x and y; passes 1 and 2.]

Analysis: linear hashing

Serial approach vs. parallel approach:

• Lookup: serial best case O(n), worst case O(n^2); parallel best case O(1), worst case O(n)

• Insert: serial best case O(n), worst case O(n^2); parallel best case O(1), worst case O(n)

• Removal: serial best case O(n), worst case O(n^2); parallel best case O(1), worst case O(n)

Cuckoo hashing

• Use two hash tables and two hash functions

• Each element will have exactly one “nest” (hash location) in each table

• Guarantee that any element will only ever exist in one of its “nests”.

• Lookup/delete are O(1) because we can check 2 locations (“nests”) in O(1) time.

Cuckoo hashing

Insertion :

1. Insert an element by finding one of its “nests” and putting it there

• This may evict another element!

2. Insert the evicted element into its *other* “nest”

• This may evict another element!

[Figure: cuckoo hashing in the sequential case: ‘X’ is inserted by moving ‘Y’ and ‘Z’.]
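The sequential insertion procedure can be sketched as below. This is an illustrative sketch only: the class name `CuckooTable`, the two simple hash functions in `_slot`, and the `max_kicks` bound on the number of evictions are assumptions, not part of the slides. When the bound is reached, `insert` hands back the key that is left without a nest so the caller can rebuild the table.

```python
class CuckooTable:
    def __init__(self, size, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks  # assumed bound on consecutive evictions
        self.tables = [[None] * size, [None] * size]

    def _slot(self, which, key):
        # Two illustrative hash functions; any independent pair would do.
        if which == 0:
            return hash(key) % self.size
        return (hash(key) // self.size) % self.size

    def lookup(self, key):
        # At most two locations ("nests") are ever checked.
        return any(self.tables[w][self._slot(w, key)] == key for w in (0, 1))

    def insert(self, key):
        """Return None on success, else the key left without a nest."""
        if self.lookup(key):
            return None
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, key)
            # Put the key in its nest and pick up whatever was there.
            key, self.tables[which][i] = self.tables[which][i], key
            if key is None:
                return None  # the nest was empty: done
            which = 1 - which  # the evicted key goes to its *other* nest
        return key  # too many evictions: caller should rehash or grow
```

With `size=4`, the keys 0, 16 and 32 all map to slot 0 in both tables, so inserting the third of them can never succeed and the eviction bound is what ends the loop.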

Cuckoo hashing

Asymmetric cuckoo hashing :

• Choose one (the first) table to be larger than the other

– Improves the probability that we get a hit on the first lookup

– Only a minor slowdown on insert

Cuckoo hashing

Same-table cuckoo hashing:

• We didn’t actually need two separate tables.

– It made the analysis much easier.

– But… In practice, we just need two hash functions.

Cuckoo hashing

Each bucket has b slots for items (b-way set-associative)

Each key is mapped to two random buckets

• stored in one of them

[Figure: buckets 0 to 8; key x maps to buckets hash1(x) and hash2(x).]

Predictable and fast lookup

• Lookup: read 2 buckets in parallel

• constant time in the worst case

[Figure: Lookup x reads the two candidate buckets of x among buckets 0 to 8.]

Insert may need “cuckoo move”

• Insert: write to an empty slot in one of the possible locations

• Both are full? Move keys to alternate buckets

[Figure: Insert y into buckets 0 to 8 that already hold other keys.]

Insert may need “cuckoo move”

• Insert: move keys to alternate buckets

• find a “cuckoo path” to an empty slot

• move hole backwards

[Figure: Insert y into buckets 0 to 8; keys x, a and b are moved to their alternate buckets to make room.]

Case 1: Inserting ‘a’ and ‘b’ in parallel; ‘b’ finds a free place with its first hash function, while ‘a’ has to evict a value.

Case 2: Inserting ‘a’ and ‘b’ in parallel; both ‘a’ and ‘b’ have to evict a value.

Algorithmic optimizations

• Lock after discovering a cuckoo path

• minimize critical sections

• But…

• Decreases parallelization

Previous approach: writer locks the table during the whole insert process

lock();

Search for a cuckoo path;

Cuckoo move and insert;

unlock();

All Insert operations of other threads are blocked

Case 3: Inserting ‘a’ and ‘b’ in parallel; both ‘a’ and ‘b’ have to evict a value, but they share the same path, so contention occurs on the path.

Lock after discovering a cuckoo path

while(1) {
    Search for a cuckoo path;   // no locking required
    lock();
    Cuckoo move and insert;
    if(success) {
        unlock();
        break;
    }
    unlock();
}

Multiple Insert threads can look for cuckoo paths concurrently.

This algorithm avoids contention on the path: the thread that arrives first completes its insert, and the other retries after some time.
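A simplified Python sketch of "lock after discovering a cuckoo path" is given below. It rests on several assumptions that are not in the slides: a single table with one slot per bucket, two illustrative hash functions in `_buckets`, one global writer lock, a `max_path` bound, and the hypothetical names `CuckooPathTable` and `_find_path`. Lookups that run concurrently with inserts would need extra protection, which is omitted here.

```python
import threading


class CuckooPathTable:
    def __init__(self, size, max_path=16):
        self.size = size
        self.max_path = max_path      # assumed bound on path length
        self.slots = [None] * size
        self.lock = threading.Lock()  # single writer lock

    def _buckets(self, key):
        h = hash(key)
        return h % self.size, (h // self.size) % self.size

    def lookup(self, key):
        a, b = self._buckets(key)
        return self.slots[a] == key or self.slots[b] == key

    def _find_path(self, start):
        """Search for a cuckoo path; no locking required."""
        path, i = [], start
        while len(path) < self.max_path:
            occupant = self.slots[i]
            path.append((i, occupant))
            if occupant is None:
                return path             # reached an empty slot
            a, b = self._buckets(occupant)
            i = b if i == a else a      # the occupant's alternate bucket
            if any(i == seen for seen, _ in path):
                return None             # the path loops back on itself
        return None

    def insert(self, key):
        while True:
            found = [self._find_path(s) for s in self._buckets(key)]
            paths = [p for p in found if p]
            if not paths:
                return False            # caller should rehash or grow
            path = min(paths, key=len)
            with self.lock:
                if self.lookup(key):
                    return True         # another thread inserted it already
                # The path may be stale by now: validate before moving.
                if all(self.slots[i] == k for i, k in path):
                    # Move the hole backwards, from the empty end to the start.
                    for j in range(len(path) - 1, 0, -1):
                        self.slots[path[j][0]] = path[j - 1][1]
                    self.slots[path[0][0]] = key
                    return True
            # Validation failed: search for a new path and retry.
```

The path search runs outside the critical section, so the lock is held only for the validation and the moves. A thread whose path was invalidated by an earlier writer simply searches again.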


Analysis: parallel cuckoo hashing

Serial approach vs. parallel approach:

• Lookup: serial best case O(n), worst case O(n^2); parallel best case O(1), worst case O(n)

• Insert: serial best case O(n), worst case O(n); parallel best case O(1), worst case O(n)

• Removal: serial best case O(n), worst case O(n^2); parallel best case O(1), worst case O(n)

Cuckoo hashing with 2 hash functions performs well up to 50% load.

Cuckoo hashing with 3 hash functions performs well up to 90% load.

Lock-free Cuckoo Hashing

• The algorithm allows mutating operations to operate concurrently with query ones and requires only single word compare-and-swap primitives.

• When an insertion triggers a sequence of key displacements, instead of locking the whole cuckoo path, this algorithm breaks down the chain of relocations into several single relocations which can be executed independently and concurrently with other operations.


• Here, concurrent cuckoo hashing contains two hash tables (sub-tables), which correspond to two independent hash functions.

• Each key can be stored at one of its two possible positions, one in each sub-table called primary and secondary respectively.


• Problem :

• The original cuckoo approach inserts the new key to the primary sub-table by evicting a collided key and re-inserting it to the other sub-table.

• This approach, however, causes the evicted key to be “nestless”, i.e. absent from both sub-tables, until the re-insertion is completed.

• This might be an issue in concurrent environments: the “nestless” key is unreachable by other concurrent operations which want to operate on it.


• Moving key problem:

• A key stored in table[1] may be relocated to table[0] while a search reads table[0] and then table[1], so the search misses the key in both sub-tables.

• To avoid such misses, the search performs a second query round.

• Another issue with insertion operations is that a key can be inserted into both sub-tables simultaneously.

• Since concurrent insertions can operate independently on the two sub-tables, they can both succeed. This results in two existing instances of one key, possibly mapped to different values.

• To prevent this, one can use a lock during insertion to lock both slots, so that only one instance of a key is added to the table at a time.


• Searching :

• Searching for a key in our lock-free cuckoo hash table includes querying for the existence of the key in two sub-tables.

• A key is available if it is found in one of them. A search operation starts with the first query round, reading from the possible slots in the primary sub-table and then in the secondary sub-table.

• If the key is found in one of them, the value mapped to key is returned.

Search(x):

Case 1: Value found in the primary sub-table, at slot h1 = hash1(x).

Case 2: Value found in the secondary sub-table, at slot h2 = hash2(x).

• Insert :

• Find :

• It examines both sub-tables to discover if two instances of the key exist in two sub-tables. When the same key is found on both sub-tables, the one in the secondary sub-table is deleted.

Find(x):

Case 1: If the key already exists, update it and return.

Case 2: If the same key is found in both sub-tables (at h1 = hash1(x) and h2 = hash2(x)), the instance in the secondary sub-table is deleted.

Case 3: Otherwise, return the current items stored at the slots the key hashes to, for example y at hash1(x) and z at hash2(x).


• Insert:

• Insert a new value using CAS (Compare-And-Swap) primitives.

• CAS is a synchronization primitive available in most modern processors. It compares the content of a memory word to a given value and, only if they are the same, modifies the content of that word to a given new value.
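The semantics of CAS can be illustrated with a small Python sketch. Python exposes no hardware CAS instruction, so the class below (the hypothetical `AtomicCell`) emulates the atomicity with an internal lock. It shows only the compare-then-swap behaviour, not a lock-free implementation.

```python
import threading


class AtomicCell:
    """A single memory word with compare-and-swap semantics.

    The internal lock is an emulation: it stands in for the atomicity
    that the processor instruction would provide.
    """

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            # Swap only if the word still holds the expected content.
            if self._value is expected:
                self._value = new
                return True
            return False


# Claiming an empty slot: only the first CAS on the empty cell succeeds.
slot = AtomicCell(None)
first = slot.compare_and_swap(None, "x")   # slot was empty: succeeds
second = slot.compare_and_swap(None, "y")  # slot now holds "x": fails
```

An insert built on this primitive retries or moves on when the CAS fails, because a failure means another thread changed the slot in the meantime.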

Insert(x):

Case 1: If one of the two slots (h1 = hash1(x) or h2 = hash2(x)) is empty, the new entry is inserted with a CAS.

Case 2: If both slots are occupied, the relocation process is initiated using the cuckoo path.

[Figure: x hashes to H1 = hash1(x); cuckoo path: s, y, k.]

Cuckoo hashing: the cycle problem

Solution: once a threshold is exceeded, use another hash function or extend the size of the hash table.

[Figure: keys S, T, U, V, Y and Z forming a cycle of evictions.]

Analysis: lock-free cuckoo

Serial approach vs. parallel approach:

• Lookup: serial best case O(n), worst case O(d*n); parallel best case O(1), worst case O(d)

• Insert: serial best case O(n), worst case > O(d*n); parallel best case O(1), worst case > O(d)

• Removal: serial best case O(n), worst case O(d*n); parallel best case O(1), worst case O(d)

where d > 2 is the number of hash functions and sub-tables.

Concurrent cuckoo hash table

- high memory efficiency

- fast concurrent writes and reads

References

1. Eric L. Goodman, David J. Haglin, Chad Scherrer, Daniel Chavarría-Miranda, Jace Mogill, John Feo, “Hashing Strategies for the Cray XMT”.

2. Margaret Reid-Miller, “Parallel and Sequential Data Structures and Algorithms”, 15-210 (Spring 2012), lecture of 26 April 2012.

3. Xiaozhou Li, David G. Andersen, Michael Kaminsky, Michael J. Freedman, “Algorithmic Improvements for Fast Concurrent Cuckoo Hashing”, Princeton University, Carnegie Mellon University, Intel Labs.

4. Nhan Nguyen, Philippas Tsigas, “Lock-free Cuckoo Hashing”, Chalmers University of Technology, Gothenburg, Sweden.
