
Extendible Hashing

For Use as a File Structure


External Hashing

What if the hash table is a file in which each bucket is a record in that file?

Observations: A bucket may contain more than one key value. The number of buckets may expand or contract dynamically.
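One way to picture "each bucket is a record in that file" is a fixed-size record layout, so that bucket i sits at byte offset i times the record size. The Python sketch below assumes a hypothetical layout (a key count, two key slots, and the record number of an overflow bucket in a separate overflow area, -1 if none); `RECORD`, `write_bucket`, and `read_bucket` are illustrative names, not part of the slides, and `io.BytesIO` stands in for a real file.

```python
import io
import struct

# Hypothetical record layout (an assumption, not from the slides):
# key count, CAPACITY key slots, overflow record number (-1 if none).
CAPACITY = 2
RECORD = struct.Struct("<i" + "q" * CAPACITY + "i")   # little-endian, no padding


def write_bucket(f, bucket_no, keys, overflow=-1):
    """Write bucket `bucket_no` as record number `bucket_no` of the file."""
    padded = list(keys) + [0] * (CAPACITY - len(keys))  # unused slots hold 0
    f.seek(bucket_no * RECORD.size)
    f.write(RECORD.pack(len(keys), *padded, overflow))


def read_bucket(f, bucket_no):
    """Return (keys, overflow record number) stored in record `bucket_no`."""
    f.seek(bucket_no * RECORD.size)
    count, *slots, overflow = RECORD.unpack(f.read(RECORD.size))
    return slots[:count], overflow


f = io.BytesIO()            # stands in for a file opened in "r+b" mode
write_bucket(f, 0, [24, 15])
write_bucket(f, 1, [10])
print(read_bucket(f, 1))    # ([10], -1)
```

Fixed-size records are what make the bucket number usable directly as a file address; a variable-length layout would need an index instead.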


Extendible Hashing

Handling multiple key values per bucket is not a problem.

Collisions are resolved with overflow buckets rather than by probing the next bucket.

Keep track of the number of times all buckets have been split (the “level”) and the next bucket to split.


The Hash Function

The standard hash function would now be something like:

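$$H(x, L) = x \bmod (n \cdot 2^L)$$

Here n is the initial number of buckets and L is the level, initially zero. b is the next bucket to split. If H(x, L) < b, that bucket has already been split at this level, so calculate H(x, L+1) instead.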
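A minimal Python sketch of the address calculation, assuming integer keys; the function names `h` and `bucket_address` are hypothetical. (A split-pointer scheme like this one, with a level L and a next-bucket pointer b, is widely known as linear hashing; these slides call it extendible hashing.)

```python
def h(x: int, level: int, n: int) -> int:
    """H(x, L) = x mod (n * 2^L), where n is the initial number of buckets."""
    return x % (n * 2 ** level)


def bucket_address(x: int, level: int, b: int, n: int) -> int:
    """Buckets below the split pointer b were already split at this level,
    so keys that hash there are rehashed with H(x, L + 1)."""
    a = h(x, level, n)
    return h(x, level + 1, n) if a < b else a


# Table state from the insert example after two splits: n = 3, L = 0, b = 2
print(bucket_address(61, 0, 2, 3))  # 1  (61 mod 3 = 1 < 2, and 61 mod 6 = 1)
print(bucket_address(10, 0, 2, 3))  # 4  (10 mod 3 = 1 < 2, and 10 mod 6 = 4)
print(bucket_address(41, 0, 2, 3))  # 2  (41 mod 3 = 2, not below b)
```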


The “Split”

Questions: When do I split the next bucket? What does a split entail?

We split when the load factor reaches or exceeds a certain threshold. The load factor is the number of key values divided by the number of slots, counting the slots in overflow buckets.

A split entails creating a new bucket and rehashing all keys in bucket b at level L+1.
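A sketch of the split step with hypothetical helper names. Every key in bucket b satisfies H(x, L) = b, so rehashing at level L+1 sends each key either back to b or to the new bucket b + n · 2^L, which is exactly the bucket appended at the end of the file (buckets 0 through n · 2^L + b - 1 already exist).

```python
def load_factor(num_keys, num_records, capacity):
    """Key values stored divided by slots available; overflow buckets are records too."""
    return num_keys / (num_records * capacity)


def split_bucket(keys, b, level, n):
    """Rehash the keys of bucket b at level L + 1 (hypothetical helper).
    Returns (keys that stay in b, keys that move, index of the new bucket)."""
    new_bucket = b + n * 2 ** level       # appended after buckets 0 .. n*2^L + b - 1
    modulus = n * 2 ** (level + 1)
    stay = [x for x in keys if x % modulus == b]
    move = [x for x in keys if x % modulus == new_bucket]
    return stay, move, new_bucket


# First split in the example: 6 keys in 3 buckets + 1 overflow bucket
print(load_factor(6, 4, 2))                      # 0.75, so split bucket b = 0
print(split_bucket([24, 15, 33, 60], 0, 0, 3))   # ([24, 60], [15, 33], 3)
```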


The Insert Algorithm

Initialize L = 0 and b = 0. To insert a key value x:

1. Calculate bucket = H(x, L); if bucket < b, set bucket = H(x, L+1).
2. If the bucket has an empty slot, fill it with x; otherwise, create an overflow bucket for x.
3. If the new load factor >= the threshold:
   - Add a new bucket at the end.
   - Rehash all key values in bucket b at level L+1.
   - Add one to b.

The Insert Algorithm II

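If $b = n \cdot 2^L$, we have split all the buckets at the current level, so set L = L + 1 and b = 0.

If the load factor is still at or above the threshold after a split, split the next bucket as well.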
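The two insert slides combine into a short Python sketch. Assumptions not taken from the slides: keys are integers, each bucket is an in-memory list standing in for a file record, and overflow buckets are modelled implicitly by charging a bucket that holds k keys max(1, ceil(k / capacity)) records, which matches the slides' counting when empty overflow buckets are removed after a split. The class and method names are hypothetical.

```python
import math


class HashFile:
    """In-memory sketch of the split-pointer insert (hypothetical names).
    A bucket holding k keys occupies max(1, ceil(k / capacity)) records,
    i.e. its primary bucket plus any overflow buckets."""

    def __init__(self, n=3, capacity=2, threshold=0.75):
        self.n, self.capacity, self.threshold = n, capacity, threshold
        self.L, self.b = 0, 0                  # level and next bucket to split
        self.buckets = [[] for _ in range(n)]

    def h(self, x, level):
        return x % (self.n * 2 ** level)       # H(x, L) = x mod (n * 2^L)

    def address(self, x):
        a = self.h(x, self.L)
        return self.h(x, self.L + 1) if a < self.b else a

    def load_factor(self):
        keys = sum(len(k) for k in self.buckets)
        records = sum(max(1, math.ceil(len(k) / self.capacity)) for k in self.buckets)
        return keys / (records * self.capacity)

    def insert(self, x):
        self.buckets[self.address(x)].append(x)  # a full bucket spills into overflow
        while self.load_factor() >= self.threshold:
            self.split()

    def split(self):
        old = self.buckets[self.b]
        self.buckets[self.b] = []
        self.buckets.append([])                  # new bucket n * 2^L + b at the end
        for x in old:                            # rehash bucket b at level L + 1
            self.buckets[self.h(x, self.L + 1)].append(x)
        self.b += 1
        if self.b == self.n * 2 ** self.L:       # every bucket at this level is split
            self.L, self.b = self.L + 1, 0


t = HashFile()
for key in [24, 10, 15, 33, 60, 11, 61, 41]:
    t.insert(key)
print(t.L, t.b, t.buckets)   # 1 0 [[24, 60], [61], [], [15, 33], [10], [11, 41]]
```

The `while` loop reflects the example, where the first split still leaves the load factor at 0.75 and a second split follows immediately.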


Insert Example

Insert 24: bucket = H(24, 0) = 0; bucket >= b, so bucket 0 it is.

Insert: 24, 10, 15, 33, 60, 11, 61, 41

$H(x, L) = x \bmod (3 \cdot 2^L)$; 2 key values per bucket; threshold = 0.75

Buckets: 0 = {}, 1 = {}, 2 = {}

b = 0, L = 0; load factor = 0/6 = 0

Insert Example

Insert 10: bucket = H(10, 0) = 1; bucket >= b, so bucket 1 it is.

Insert: 10, 15, 33, 60, 11, 61, 41

Buckets: 0 = {24}, 1 = {}, 2 = {}

b = 0, L = 0; load factor = 1/6 = 0.17

Insert Example

Insert 15: bucket = H(15, 0) = 0; bucket >= b, so bucket 0 it is.

Insert: 15, 33, 60, 11, 61, 41

Buckets: 0 = {24}, 1 = {10}, 2 = {}

b = 0, L = 0; load factor = 2/6 = 0.33

Insert Example

Insert 33: bucket = H(33, 0) = 0; bucket >= b, so bucket 0 it is.

Insert: 33, 60, 11, 61, 41

Buckets: 0 = {24, 15}, 1 = {10}, 2 = {}

b = 0, L = 0; load factor = 3/6 = 0.5

Insert Example

This requires an overflow bucket. Let's assume overflow buckets can also hold 2 key values. Now, update the load factor:

Insert: 60, 11, 61, 41

Buckets: 0 = {24, 15} + overflow {33}, 1 = {10}, 2 = {}

b = 0, L = 0; load factor = 4/6 = 0.67

Insert Example

Insert 60: bucket = H(60, 0) = 0; bucket >= b, so bucket 0 it is.

Insert: 60, 11, 61, 41

Buckets: 0 = {24, 15} + overflow {33}, 1 = {10}, 2 = {}

b = 0, L = 0; load factor = 4/8 = 0.5

Insert Example

Insert 11: bucket = H(11, 0) = 2; bucket >= b, so bucket 2 it is.

Insert: 11, 61, 41

Buckets: 0 = {24, 15} + overflow {33, 60}, 1 = {10}, 2 = {}

b = 0, L = 0; load factor = 5/8 = 0.63

Insert Example

Load factor >= threshold, so it is time to rehash all keys in bucket b = 0. First, create a new bucket:

Insert: 61, 41

Buckets: 0 = {24, 15} + overflow {33, 60}, 1 = {10}, 2 = {11}

b = 0, L = 0; load factor = 6/8 = 0.75

Insert Example

Rehash 24 at level L+1: H(24, 1) = 24 mod 6 = 0; 24 stays at bucket 0.

Insert: 61, 41

Buckets: 0 = {24, 15} + overflow {33, 60}, 1 = {10}, 2 = {11}, 3 = {}

b = 0, L = 0; load factor = 6/10 = 0.6

Insert Example

Rehash 15 at level L+1: H(15, 1) = 15 mod 6 = 3; 15 moves to bucket 3.

Insert: 61, 41

Buckets: 0 = {24, 15} + overflow {33, 60}, 1 = {10}, 2 = {11}, 3 = {}

b = 0, L = 0; load factor = 6/10 = 0.6

Insert Example

Rehash 33 at level L+1: H(33, 1) = 33 mod 6 = 3; 33 moves to bucket 3.

Insert: 61, 41

Buckets: 0 = {24} + overflow {33, 60}, 1 = {10}, 2 = {11}, 3 = {15}

b = 0, L = 0; load factor = 6/10 = 0.6

Insert Example

Rehash 60 at level L+1: H(60, 1) = 60 mod 6 = 0; 60 stays at bucket 0.

Insert: 61, 41

Buckets: 0 = {24} + overflow {60}, 1 = {10}, 2 = {11}, 3 = {15, 33}

b = 0, L = 0; load factor = 6/10 = 0.6

Insert Example

Add 1 to b; b = 1 is less than 3, so the first split is done.

I now have an empty overflow bucket; remove it and recalculate the load factor:

Insert: 61, 41

Buckets: 0 = {24, 60} + overflow {}, 1 = {10}, 2 = {11}, 3 = {15, 33}

b = 0, L = 0; load factor = 6/10 = 0.6

Insert Example

The load factor is now 6/8 = 0.75, so I need to split again, this time with b = 1.

Insert: 61, 41

Buckets: 0 = {24, 60}, 1 = {10}, 2 = {11}, 3 = {15, 33}

b = 1, L = 0; load factor = 6/8 = 0.75

Insert Example

Add bucket 4 and rehash all key values in bucket 1. 10 mod 6 = 4, so 10 should move:

Insert: 61, 41

Buckets: 0 = {24, 60}, 1 = {10}, 2 = {11}, 3 = {15, 33}, 4 = {}

b = 1, L = 0; load factor = 6/10 = 0.6

Insert Example

Note the update of b to 2; the load factor is OK, so continue with the insert of 61.

Insert: 61, 41

Buckets: 0 = {24, 60}, 1 = {}, 2 = {11}, 3 = {15, 33}, 4 = {10}

b = 2, L = 0; load factor = 6/10 = 0.6

Insert Example

bucket = H(61, 0) = 1. Since bucket < b, recalculate at L+1: bucket = H(61, 1) = 1.

Insert: 61, 41

Buckets: 0 = {24, 60}, 1 = {}, 2 = {11}, 3 = {15, 33}, 4 = {10}

b = 2, L = 0; load factor = 6/10 = 0.6

Insert Example

Finally, insert 41: bucket = H(41, 0) = 2; bucket >= b, so bucket 2 it is.

Insert: 41

Buckets: 0 = {24, 60}, 1 = {61}, 2 = {11}, 3 = {15, 33}, 4 = {10}

b = 2, L = 0; load factor = 7/10 = 0.7

Insert Example

Load factor >= threshold, so split bucket 2:

Insert: done

Buckets: 0 = {24, 60}, 1 = {61}, 2 = {11, 41}, 3 = {15, 33}, 4 = {10}

b = 2, L = 0; load factor = 8/10 = 0.8

Insert Example

Both 11 and 41 are 5 mod 6, so both go to bucket 5. Update b...

Insert: done

Buckets: 0 = {24, 60}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {10}, 5 = {11, 41}

b = 2, L = 0; load factor = 8/12 = 0.67

Insert Example

$b = 3 \cdot 2^L$, so set b = 0 and L = L + 1:

Insert: done

Buckets: 0 = {24, 60}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {10}, 5 = {11, 41}

b = 3, L = 0; load factor = 8/12 = 0.67

Insert Example

Done.

Insert: done

Buckets: 0 = {24, 60}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {10}, 5 = {11, 41}

b = 0, L = 1; load factor = 8/12 = 0.67

Insert Example

Final table (b = 0, L = 1):

Buckets: 0 = {24, 60}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {10}, 5 = {11, 41}
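As a sanity check on the final table, the sketch below recomputes every key's address with L = 1 and b = 0 and confirms that each key sits in the bucket a lookup would probe. The table literal is transcribed from the slide; `address` is a hypothetical helper.

```python
# Final table from the insert example: n = 3 initial buckets, L = 1, b = 0
n, L, b, capacity = 3, 1, 0, 2
table = [[24, 60], [61], [], [15, 33], [10], [11, 41]]


def address(x):
    """H(x, L), or H(x, L + 1) when H(x, L) falls below the split pointer b."""
    a = x % (n * 2 ** L)
    return x % (n * 2 ** (L + 1)) if a < b else a


# Every key must sit in the bucket that a lookup would search.
misplaced = [x for i, keys in enumerate(table) for x in keys if address(x) != i]
print(misplaced)                                                # []
print(f"{sum(map(len, table)) / (len(table) * capacity):.2f}")  # 0.67
```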

Deleting with Extendible Hashing

Delete works as the opposite of insert: when the load factor falls to or below a lower threshold, combine buckets. Note: if b = 0, it is necessary to decrement the level.

Delete Algorithm

1. Hash the key value to delete in the standard way, hashing at level L+1 if necessary.
2. If the key value is not found, report failure and stop; otherwise, remove it.
3. Update the load factor.

Delete Algorithm II

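If the load factor <= the lower threshold:

- Decrement b.
- If b = -1: if L = 0, set b = 0 and stop; otherwise, set L = L - 1 and $b = n \cdot 2^L - 1$.
- Combine the last bucket with bucket b, and remove the last bucket.

Repeat while the load factor remains at or below the lower threshold.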
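A Python sketch of the delete algorithm, starting from a table given directly rather than built by inserts. It assumes an in-memory model (buckets are lists, and a bucket holding k keys is charged max(1, ceil(k / capacity)) records, counting overflow buckets) and a lower threshold of 0.5 as in the example that follows; the class and method names are hypothetical. After b is decremented, the last bucket is always bucket n · 2^L + b, the image of bucket b, which is why the two can be combined.

```python
import math


class HashFile:
    """In-memory sketch of deletion with bucket merging (hypothetical names)."""

    def __init__(self, buckets, L, b, n=3, capacity=2, lower=0.5):
        self.buckets, self.L, self.b = buckets, L, b
        self.n, self.capacity, self.lower = n, capacity, lower

    def address(self, x):
        a = x % (self.n * 2 ** self.L)             # H(x, L)
        return x % (self.n * 2 ** (self.L + 1)) if a < self.b else a

    def load_factor(self):
        keys = sum(len(k) for k in self.buckets)
        records = sum(max(1, math.ceil(len(k) / self.capacity)) for k in self.buckets)
        return keys / (records * self.capacity)

    def delete(self, x):
        bucket = self.buckets[self.address(x)]
        if x not in bucket:
            return False                            # report failure and stop
        bucket.remove(x)
        while self.load_factor() <= self.lower:
            self.b -= 1
            if self.b == -1:
                if self.L == 0:                     # only the original n buckets remain
                    self.b = 0
                    break
                self.L -= 1
                self.b = self.n * 2 ** self.L - 1
            # Combine the last bucket (bucket n * 2^L + b) into bucket b.
            self.buckets[self.b].extend(self.buckets.pop())
        return True


# Final table of the insert example, then the deletions the slides perform
t = HashFile([[24, 60], [61], [], [15, 33], [10], [11, 41]], L=1, b=0)
for key in [60, 10, 33]:
    t.delete(key)
print(t.L, t.b, t.buckets)   # 0 1 [[24], [61], [11, 41], [15]]
```

The keys deleted here are 60, 10, and 33, the key the later slides actually remove.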


Delete Example

Let’s start with the final table from our insert example.

We’ll use 0.5 as our lower threshold.

Delete: 60, 10, 33

$H(x, L) = x \bmod (3 \cdot 2^L)$; 2 key values per bucket; lower threshold = 0.5

Buckets: 0 = {24, 60}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {10}, 5 = {11, 41}

b = 0, L = 1; load factor = 8/12 = 0.67

Delete Example

Delete 60: H(60, 1) = 0, which is >= b. Remove 60 from bucket 0:

Delete: 60, 10, 33

Buckets: 0 = {24, 60}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {10}, 5 = {11, 41}

b = 0, L = 1; load factor = 8/12 = 0.67

Delete Example

Delete 10: H(10, 1) = 4, which is >= b. Remove 10 from bucket 4:

Delete: 10, 33

Buckets: 0 = {24}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {10}, 5 = {11, 41}

b = 0, L = 1; load factor = 7/12 = 0.58

Delete Example

Time to combine buckets. Decrementing b results in b = -1, so set L = 0 and $b = 3 \cdot 2^0 - 1 = 2$.

Delete: 33

Buckets: 0 = {24}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {}, 5 = {11, 41}

b = 0, L = 1; load factor = 6/12 = 0.5

Delete Example

Next, combine the last bucket (5) with bucket 2:

Delete: 33

Buckets: 0 = {24}, 1 = {61}, 2 = {}, 3 = {15, 33}, 4 = {}, 5 = {11, 41}

b = 2, L = 0; load factor = 6/12 = 0.5

Delete Example

Bucket 5 is deleted and the load factor is updated. Load factor > lower threshold, so done.

Delete: 33

Buckets: 0 = {24}, 1 = {61}, 2 = {11, 41}, 3 = {15, 33}, 4 = {}

b = 2, L = 0; load factor = 6/10 = 0.6

Delete Example

Delete 33: H(33, 0) = 0 < b, so rehash at L+1: H(33, 1) = 3; remove 33 from bucket 3:

Delete: 33

Buckets: 0 = {24}, 1 = {61}, 2 = {11, 41}, 3 = {15, 33}, 4 = {}

b = 2, L = 0; load factor = 6/10 = 0.6

Delete Example

Load factor <= lower threshold, so it is time to combine. First, decrement b:

Delete: done

Buckets: 0 = {24}, 1 = {61}, 2 = {11, 41}, 3 = {15}, 4 = {}

b = 2, L = 0; load factor = 5/10 = 0.5

Delete Example

Now, combine the last bucket (4) with bucket b = 1, and remove bucket 4. Update the load factor too:

Delete: done

Buckets: 0 = {24}, 1 = {61}, 2 = {11, 41}, 3 = {15}, 4 = {}

b = 1, L = 0; load factor = 5/10 = 0.5

Delete Example

Load factor > lower threshold, so done.

Delete: done

Buckets: 0 = {24}, 1 = {61}, 2 = {11, 41}, 3 = {15}

b = 1, L = 0; load factor = 5/8 = 0.625