Upload
padmakumar-pavamony
View
162
Download
7
Embed Size (px)
Citation preview
©Silberschatz, Korth and Sudarshan12.1Database System Concepts
Dynamic Hashing
Good for database that grows and shrinks in size Allows the hash function to be modified dynamically Extendable hashing – one form of dynamic hashing
This hashing scheme take advantage of the fact that the result of applying a hashing function is a non-negative integer which can be represented as a binary number- a string of bits.
a type of directory, i.e., an array of 2d bucket addresses—is maintained, where d is called the global depth of the directory.
A local depth d’—stored with each bucket—specifies the number of bits on which the bucket contents are based
Value of d grows and shrinks as the size of the database grows and shrinks.
Thus, actual number of buckets is < 2d
The number of buckets changes dynamically due to coalescing and splitting of buckets.
©Silberschatz, Korth and Sudarshan12.2Database System Concepts
Splitting and Coalescing of Buckets
Splitting of buckets is done when an overflow occurs; the value of d is incremented by one.
For a bucket whose hash value starting with 01, after splitting, first contains records whose hash value start with 010 and the other with 011
Coalescing occurs when records are deleted, i.e. d>d’; The value of d is decremented by one.
Extendible Hashing - Example
Record K h(K) h(K)2
rec1 2639 1 00001rec2 3760 16 10000rec3 4692 20 10100rec4 4871 7 00111rec5 5659 27 11011rec6 1821 29 11101rec7 1074 18 10010rec8 2115 11 01011rec9 1620 20 10100rec10 2428 28 11100rec11 3943 7 00111rec12 4750 14 01110rec13 6975 31 11111rec14 4981 21 10101rec15 9208 24 11000
01
rec 1rec 2
d1=0
record 3 = overflow!!
splitting bucket
d = 1d = 0
d1 = local depthd = global depth
rec 1
d1 = 1
d1 = 1
rec 2rec 3
rec 4
record 5 = overflow!!
splitting bucket
NEXT
Directory Locations
rec 2rec 3
rec 1rec 4
rec 5rec 6
00
10
d = 2
d1 = 2
d1 = 1
11d1 = 2
01
record 7 = overflow!!
splitting bucket
NEXT
rec 2
rec 3
rec 1rec 4
rec 5rec 6
d1 = 3
000
110
d = 3
d1 = 1
111d1 = 2
001010011
100101
d1 = 3
rec 7
record 8 = overflow!!
splitting bucket
NEXT
record 10 = overflow!!
splitting bucket
NEXTrec 1
rec 4
000
110
d = 3
111
001010011
100101
d1 = 3
d1 = 2
d1 = 3
d1 = 3
d1 = 3
d1 = 2rec 8
rec 2
rec 3
rec 5rec 6
rec 7
rec 9
record 13 = overflow!!
splitting bucket
NEXT
rec 5
rec 6rec 10
d1 = 3
d1 = 3
rec 7
rec 9
d1 = 3
d1 = 3
rec 1
rec 4
000
110
d = 3
111
001010011
100101
rec 2
rec 3
d1 = 3
d1 = 3
d1 = 2rec 8
rec 11
rec 12
rec 6 d1 = 4
d1 = 3
d1 = 3
d1 = 3
d1 = 2
0000
1110
d = 4
1111
000100100011
11001101
0100010101100111
10101011
10001001
d1 = 3
d1 = 4
d1 = 3
rec 10
rec 13
rec 1
rec 4
rec 2
rec 3
rec 5
rec 7
rec 8
rec 11
rec 12
rec 14
rec 15
©Silberschatz, Korth and Sudarshan12.10Database System Concepts
Advantages and Disadvantages
Benefits of extendable hashing: Hash performance does not degrade with growth of file
Minimal space overhead
Disadvantages of extendable hashing Extra level of indirection to find desired record
Bucket address table may itself become very big (larger than memory) Need a tree structure to locate desired record in the structure!
Changing size of bucket address table is an expensive operation
Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of more bucket overflows. That is the directory is not needed.
©Silberschatz, Korth and Sudarshan12.11Database System Concepts
Indexing
©Silberschatz, Korth and Sudarshan12.12Database System Concepts
Indexing : Basic Concepts
Indexing mechanisms used to speed up access to desired data. E.g., The catalog of library.
Search Key - attribute to set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form
Index files are typically much smaller than the original file Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.
search-key pointer
©Silberschatz, Korth and Sudarshan12.13Database System Concepts
Index Evaluation Factors
Access types supported efficiently. E.g., records with a specified value in the attribute
or records with an attribute value falling in a specified range of values.
Access time Insertion time Deletion time Space overhead- additional space occupied by an index
structure.
©Silberschatz, Korth and Sudarshan12.14Database System Concepts
Ordered Indices
In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file. Also called clustering index
The search key of a primary index is usually but not necessarily the primary key.
Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.
Index-sequential file: ordered sequential file with a primary index.
Indexing techniques evaluated on basis of:
©Silberschatz, Korth and Sudarshan12.15Database System Concepts
Dense Index Files
Dense index — Index record appears for every search-key value in the file.
©Silberschatz, Korth and Sudarshan12.16Database System Concepts
Sparse Index Files
Sparse Index: contains index records for only some search-key values. Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we: Find index record with largest search-key value < K
Search file sequentially starting at the record to which the index record points
Less space and less maintenance overhead for insertions and deletions.
Generally slower than dense index for locating records. Good tradeoff: sparse index with an index entry for every block
in file, corresponding to least search-key value in the block.
©Silberschatz, Korth and Sudarshan12.17Database System Concepts
Example of Sparse Index Files
©Silberschatz, Korth and Sudarshan12.18Database System Concepts
Multilevel Index
If primary index does not fit in memory, access becomes expensive.
To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a sparse index on it. outer index – a sparse index of primary index
inner index – the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.
Indices at all levels must be updated on insertion or deletion from the file.
©Silberschatz, Korth and Sudarshan12.19Database System Concepts
Multilevel Index (Cont.)
©Silberschatz, Korth and Sudarshan12.20Database System Concepts
Index Update: Insertion
Single-level index insertion: Perform a lookup using the search-key value appearing in the record
to be inserted.
Dense indices – if the search-key value does not appear in the index, insert it.
Sparse indices – if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into the index.
Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms
©Silberschatz, Korth and Sudarshan12.21Database System Concepts
Index Update: Deletion
If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.
Single-level index deletion: Dense indices – deletion of search-key is similar to file record
deletion.
Sparse indices – if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.
©Silberschatz, Korth and Sudarshan12.22Database System Concepts
Secondary Indices
Frequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index satisfy some condition. Example 1: In the account database stored sequentially
by account number, we may want to find all accounts in a particular branch
Example 2: as above, but where we want to find all accounts with a specified balance or range of balances
We can have a secondary index with an index record for each search-key value; index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
©Silberschatz, Korth and Sudarshan12.23Database System Concepts
Secondary Index on balance field of account
©Silberschatz, Korth and Sudarshan12.24Database System Concepts
THANK YOU.
That’s all about Indices……