CPSC 335 Dr. Marina Gavrilova Computer Science University of Calgary Canada


OUTLINE

- Extendible hashing
- Expandable and dynamic hashing
- Virtual hashing
- Summary

Hash Functions for Extendible Hashing

Standard hashing works on a fixed file size.

What if we add or delete many keys? What if the file size changes significantly?

Then separate techniques are needed. There are two types:
- Directory schemes
- Directory-less schemes

Extendible Hashing

Keys are stored in buckets.

Each bucket can hold only a fixed number of items.

The index is an extendible table: h(x) hashes a key value x to a bit string, and only a portion of that bit string is used to build the directory.

Example: add key kn with h(kn) = 11011.

[Figure: a directory with entries 00, 01, 10, 11 pointing to buckets b00, b01 and b1 (entries 10 and 11 share b1); after kn is added, the buckets are b00, b01, b10 and b11.]
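The following is a minimal Python sketch of this indexing. It assumes 5-bit pseudokeys written as bit strings, as in the example, and the function names `dir_index` and `find_bucket` are hypothetical. The directory is a list of bucket references. Only the `depth` leftmost bits of the pseudokey select an entry, so two entries (here 10 and 11) can reference the same bucket.

```python
def dir_index(pseudokey: str, depth: int) -> int:
    """Directory entry chosen by the `depth` leftmost bits of the pseudokey."""
    return int(pseudokey[:depth], 2) if depth > 0 else 0


def find_bucket(directory: list, depth: int, pseudokey: str) -> list:
    """Indirect access: pseudokey -> directory entry -> bucket."""
    return directory[dir_index(pseudokey, depth)]


# Hypothetical state with depth 2: entries 10 and 11 reference the same bucket.
b00, b01, b1 = [], [], []
directory = [b00, b01, b1, b1]   # entries 00, 01, 10, 11
depth = 2

print(dir_index("11011", depth))                     # 3, i.e. entry 11
print(find_bucket(directory, depth, "11011") is b1)  # True
```

With depth 2, h(kn) = 11011 selects entry 11, and therefore the bucket it shares with entry 10.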

Hash Functions for Extendible Hashing

Directory schemes:
- Extendible hashing (Fagin et al. 1979)
- Expandable hashing (Knott 1971)
- Dynamic hashing (Larson 1978)

Directory-less schemes:
- Virtual hashing (Litwin 1978)

Extendible Hashing

Size of a bucket = maximum number of pseudokeys it can hold (3 in our example).

Once a bucket is full, it is split into two.

Two situations are possible:
- The directory remains the same size, and only the pointers to the split bucket are adjusted (this happens when more than one directory entry points to the full bucket).
- The directory doubles from $2^k$ to $2^{k+1}$ entries, i.e. the directory size can be 1, 2, 4, 8, 16, etc. (8 is shown in the figure).

When the directory doubles, the number of buckets does not double. Apart from the split bucket, the buckets stay the same, so some directory entries point to the same bucket.

Finally, the bit string is used only to build the index; the bucket stores the actual key.

[Figure: a directory of depth 3 with entries 000, 001, 010, 011, 100, 101, 110, 111.]
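The split rules can be sketched as a small Python class. This is an illustration under stated assumptions, not the course's reference implementation:
- Pseudokeys are `width`-bit integers.
- Each bucket records a hypothetical `local_depth`, the number of leading bits its keys share.
- The sample keys are made up, and h simply reads a 5-bit key string as its own pseudokey.

When a bucket is full and its `local_depth` is below the directory depth, it is split and only the pointers are adjusted. Otherwise the directory doubles first.

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth  # leading bits shared by all keys here (assumed field)
        self.keys: list = []            # the actual keys, not the pseudokeys


class ExtendibleHash:
    def __init__(self, hash_fn, width: int, capacity: int = 3):
        self.hash_fn = hash_fn      # key -> pseudokey, an int of `width` bits
        self.width = width
        self.capacity = capacity    # max keys per bucket (3 in the slide's example)
        self.depth = 0              # directory depth i
        self.directory = [Bucket(0)]

    def _index(self, pseudokey: int, depth: int) -> int:
        # The `depth` leftmost bits of the pseudokey.
        return pseudokey >> (self.width - depth)

    def find(self, key) -> bool:
        bucket = self.directory[self._index(self.hash_fn(key), self.depth)]
        return key in bucket.keys

    def insert(self, key) -> None:
        while True:
            bucket = self.directory[self._index(self.hash_fn(key), self.depth)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            if bucket.local_depth == self.depth:
                if self.depth == self.width:
                    raise OverflowError("pseudokey bits exhausted; cannot split")
                # Situation 2: double the directory, duplicating every entry.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.depth += 1
            # Situation 1 (or right after doubling): split the full bucket.
            self._split(bucket)

    def _split(self, bucket: Bucket) -> None:
        d = bucket.local_depth + 1
        zero, one = Bucket(d), Bucket(d)
        for k in bucket.keys:
            # Bit number d of the pseudokey (counting from 1 at the left) picks the half.
            bit = (self.hash_fn(k) >> (self.width - d)) & 1
            (one if bit else zero).keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket:
                # Re-point each entry that referenced the old bucket by its own bit d.
                self.directory[i] = one if (i >> (self.depth - d)) & 1 else zero


# Hypothetical 5-bit keys; h interprets the key string itself as the pseudokey.
table = ExtendibleHash(lambda s: int(s, 2), width=5, capacity=3)
for k in ["00011", "00110", "00101", "01100", "01011",
          "10011", "11110", "11111"]:
    table.insert(k)
print(table.depth, table.directory[2] is table.directory[3])  # 2 True

table.insert("11011")  # shared full bucket: split, directory size unchanged
print(table.depth, table.directory[2] is table.directory[3])  # 2 False

table.insert("11000")  # full bucket with local depth 2: directory doubles
print(table.depth, len(table.directory))                      # 3 8
```

The last insertion doubles the directory to depth 3, giving eight entries, 000 through 111.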

Extendible Hashing

1. Use as much space as needed.

2. Input the file name and the number of words to insert. Use a bucket size of 128.

3. Use any function h(k) that returns a string of up to 32 bits (an integer type can be used).

4. Bucket: a char array.

5. Main idea: only the FIRST bits of the mask are used for the search.
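Item 3 leaves the hash function open. One possible choice, assumed here rather than prescribed by the slides, is the 32-bit FNV-1a hash. The helper names `fnv1a_32` and `leading_bits` are hypothetical. Python integers stand in for a 32-bit unsigned type, and the FIRST i bits are extracted with a right shift.

```python
def fnv1a_32(word: str) -> int:
    """32-bit FNV-1a hash of the word's UTF-8 bytes (one possible h(k))."""
    h = 0x811C9DC5                          # FNV offset basis
    for byte in word.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by the FNV prime, keep 32 bits
    return h


def leading_bits(h: int, i: int) -> int:
    """The i leftmost (FIRST) bits of a 32-bit value, for 0 <= i <= 32."""
    return h >> (32 - i)


h = fnv1a_32("a")
print(format(h, "032b"))   # the full 32-bit string
print(leading_bits(h, 3))  # 7
```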

Extendible Hashing

Assume that a hashing technique is applied to a dynamically changing file composed of buckets, and each bucket can hold only a fixed number of items.

Extendible hashing accesses the data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file.

The characteristic feature of extendible hashing is the organization of the index, which is an expandable table.

Extendible Hashing

A hash function applied to a certain key indicates a position in the index and not in the file (or table of keys). Values returned by such a hash function are called pseudokeys.

The file requires no reorganization when data are added to or deleted from it, since these changes are indicated in the index.

Only one hash function h can be used, but depending on the size of the index, only a portion of the address h(K) is utilized.

A simple way to achieve this effect is to look at the address as a string of bits from which only the i leftmost bits are used.

The number i is the depth of the directory. In Figure 1(a) (in the next slide), the depth is equal to two.
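Written as a formula, taking the i leftmost bits of the address is an integer division. Here w, the number of bits returned by h, is notation introduced for this sketch, and the numbers use the 5-bit pseudokey 11011 from the earlier example.

```latex
\[
\operatorname{index}_i(K) = \left\lfloor \frac{h(K)}{2^{\,w-i}} \right\rfloor,
\qquad 0 \le \operatorname{index}_i(K) \le 2^{i} - 1
\]
\[
w = 5,\; h(K) = 11011_2 = 27:\quad
i = 2 \Rightarrow \left\lfloor 27 / 2^{3} \right\rfloor = 3 = 11_2,
\qquad
i = 3 \Rightarrow \left\lfloor 27 / 2^{2} \right\rfloor = 6 = 110_2
\]
```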


Extendible Hashing

Figure 1. An example of extendible hashing (Drozdek Textbook)

Expandable & Dynamic Hashing

Expandable hashing: a similar idea to extendible hashing, but a binary tree is used to store the index on the buckets.

Dynamic hashing: multiple binary trees are used. Outcome:
- The search is shortened.
- The key determines which tree to search.
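Below is a generic Python sketch of a binary-tree index over buckets. It is neither Knott's nor Larson's exact algorithm. Its assumptions:
- Pseudokeys are `width`-bit integers.
- Bit number d of the pseudokey (counting from 0 at the left) chooses the child at depth d.
- A full leaf bucket is turned into an internal node with two child buckets.

The `Forest` class illustrates the multiple-tree variant, in which a hypothetical selector `select_fn` picks the tree from the key. All class names and sample keys are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = None   # [child for bit 0, child for bit 1], or None for a leaf
        self.bucket = []       # keys stored at a leaf


class TrieIndex:
    """One binary tree: bit d of the pseudokey (0 = leftmost) picks the child at depth d."""

    def __init__(self, hash_fn, width: int, capacity: int):
        self.hash_fn = hash_fn    # key -> pseudokey, a `width`-bit integer
        self.width = width
        self.capacity = capacity
        self.root = Node()

    def _bit(self, key, depth: int) -> int:
        return (self.hash_fn(key) >> (self.width - 1 - depth)) & 1

    def _leaf(self, key):
        node, depth = self.root, 0
        while node.children is not None:
            node = node.children[self._bit(key, depth)]
            depth += 1
        return node, depth

    def find(self, key) -> bool:
        return key in self._leaf(key)[0].bucket

    def insert(self, key) -> None:
        node, depth = self._leaf(key)
        while len(node.bucket) >= self.capacity:
            if depth == self.width:
                raise OverflowError("no pseudokey bits left to split on")
            # Split the full leaf: it becomes internal and its keys move one level down.
            node.children = [Node(), Node()]
            for k in node.bucket:
                node.children[self._bit(k, depth)].bucket.append(k)
            node.bucket = []
            node = node.children[self._bit(key, depth)]
            depth += 1
        node.bucket.append(key)


class Forest:
    """Several binary trees; a hypothetical selector picks the tree for each key."""

    def __init__(self, m: int, select_fn, hash_fn, width: int, capacity: int):
        self.select_fn = select_fn  # key -> tree number in range(m)
        self.trees = [TrieIndex(hash_fn, width, capacity) for _ in range(m)]

    def insert(self, key) -> None:
        self.trees[self.select_fn(key)].insert(key)

    def find(self, key) -> bool:
        return self.trees[self.select_fn(key)].find(key)


KEYS = ["00011", "00110", "00101", "01100", "01011",
        "10011", "11110", "11111", "11011"]
t = TrieIndex(lambda s: int(s, 2), width=5, capacity=3)
for k in KEYS:
    t.insert(k)
print(all(t.find(k) for k in KEYS))  # True

f = Forest(2, lambda s: s.count("1") % 2, lambda s: int(s, 2), width=5, capacity=3)
for k in KEYS:
    f.insert(k)
print(all(f.find(k) for k in KEYS))  # True
```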

Dynamic Hashing

Larson's method: the index is simplified to a set of binary trees.

The height of each tree is limited.

h(x) is searched in ALL trees. Time: with m trees of at most k keys each, the overall search time is $m \lg k$.

Advantage: shorter search time in the index file.
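Plugging hypothetical numbers into the bound stated above gives a sense of its size: m = 4 trees with at most k = 1024 keys each.

```latex
\[
m \lg k = 4 \cdot \lg 1024 = 4 \cdot 10 = 40
\]
```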

Virtual Hashing

Litwin's virtual hashing: buckets are expanded in a linear fashion.

The buckets are stored contiguously in memory.

No table is needed, and the procedure is simple.
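A directory-less table that grows one bucket at a time can be sketched in Python. The slide gives no algorithm, so this sketch follows the rule of Litwin's later linear hashing (1980) rather than the exact 1978 virtual hashing method:
- Buckets sit in one contiguous list.
- A split pointer moves through them in linear order.
- No index table is kept.

Integer keys, modulo addressing, the initial bucket count `n0`, and the load-factor trigger `max_load` are all assumptions.

```python
class LinearHash:
    """Directory-less table: buckets live in one list and are added one at a time."""

    def __init__(self, n0: int = 2, capacity: int = 3, max_load: float = 0.75):
        self.n0 = n0              # initial number of buckets (assumed)
        self.capacity = capacity  # nominal bucket size used for the load factor
        self.max_load = max_load  # expansion trigger (assumed policy)
        self.level = 0            # completed doubling rounds
        self.split = 0            # next bucket to split, in linear order
        self.count = 0
        self.buckets = [[] for _ in range(n0)]

    def _address(self, key: int) -> int:
        a = key % (self.n0 * 2 ** self.level)              # h_level (assumed modulo hash)
        if a < self.split:                                  # bucket already split this round
            a = key % (self.n0 * 2 ** (self.level + 1))     # h_{level+1}
        return a

    def find(self, key: int) -> bool:
        return key in self.buckets[self._address(key)]

    def insert(self, key: int) -> None:
        # A bucket may temporarily exceed `capacity` (an overflow chain).
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / (len(self.buckets) * self.capacity) > self.max_load:
            self._expand()

    def _expand(self) -> None:
        # Append one bucket at the end; rehash the bucket at `split` with h_{level+1}.
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])
        modulus = self.n0 * 2 ** (self.level + 1)
        for k in old:
            self.buckets[k % modulus].append(k)
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:         # round complete
            self.level += 1
            self.split = 0


table = LinearHash()
for k in range(20):
    table.insert(k)
print(len(table.buckets), table.level, table.split)  # 9 2 1
print(all(table.find(k) for k in range(20)))         # True
```

After 20 insertions the table has grown from 2 to 9 buckets, one at a time, without building a directory.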

Summary

Extendible hashing advantages:
- The initially allocated space can increase indefinitely.
- Locating the bucket where a key belongs requires only a very fast bit comparison.
- It is very flexible in the choice of bucket size, and allows buckets to be stored on disk or accessed in remote memory.

Extendible hashing disadvantages:
- Increased algorithm complexity.
- Extra memory overhead to store the index inside the bucket.