Advanced Databases Indexing Techniques 2
The Problem
• What can we introduce to make search more efficient?
  – Indices!
• What is an index?
Definitions
• Index: an auxiliary data structure that speeds up record retrieval
• Search key: the field(s) of a table that are indexed
• Storage: index files that contain index records
  – Each entry stores one of:
    • the actual data record,
    • a search key value k and a record ID, <k, rid>, or
    • a search key value k and a list of record IDs, <k, rid list>
• Types: ordered and unordered (hash) indices
[Figure: consecutive data pages i and i+1 holding records such as Paul, Anna and Tim]
Types of Ordered Indices (1/3)
• Assuming ordered data files
• Depending on which field is indexed:
  – Primary index: the search key is the ordering key field
    • one pointer for each page
  – Secondary index: the search key is a non-ordering field
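[Figure: data file of (name, ID) records ordered by ID (Paul 00112233, Anna 00112234, Matt 00112235, Tim 00112236, Carol 00112237, Rob 00112238), with a primary index (00112233, 00112235, 00112236, 00112238) and a secondary index on name (Anna, Carol, Paul, Tim)]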
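A minimal sketch of an equality search through a primary index, in Python. It assumes the figure's records are split into four pages exactly as the primary index entries suggest (00112233, 00112235, 00112236 and 00112238 each start a page); `pages`, `index_keys` and `lookup` are illustrative names, not part of any system.

```python
from bisect import bisect_right

# Hypothetical layout of the figure's data file: 4 pages ordered by ID,
# each a list of (id, name) records.
pages = [
    [("00112233", "Paul"), ("00112234", "Anna")],
    [("00112235", "Matt")],
    [("00112236", "Tim"), ("00112237", "Carol")],
    [("00112238", "Rob")],
]

# Primary index: one entry per page holding the first ID on that page;
# the entry's position doubles as the page number.
index_keys = [page[0][0] for page in pages]


def lookup(search_id):
    """Equality search on the ordering field via the primary index."""
    pos = bisect_right(index_keys, search_id) - 1   # last entry with key <= search_id
    if pos < 0:
        return None                                 # smaller than every key
    for rec_id, name in pages[pos]:                 # read a single data page
        if rec_id == search_id:
            return name
    return None


print(lookup("00112237"))  # Carol
```

The index is searched first (binary search on its keys), and only one data page is read afterwards.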
Types of Ordered Indices (2/3)
• Depending on the density of index records:
  – Dense index: an index record for each distinct search key value, i.e. one for every record when the search key is unique
  – Sparse index: index records for only some search key values
    • the search key value of the first record in each page
    • a pointer to that page
[Figure: the same ID-ordered data file with a sparse index (00112233, 00112235, 00112236, 00112238: one entry per page) and a dense index (00112233 to 00112238: one entry per record)]
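The size difference between the two index types can be sketched as follows, again assuming the four hypothetical pages implied by the figure. The dense index has one entry per record; the sparse index has one entry per page, so a lookup through it must still read the data page to confirm that a key exists.

```python
from bisect import bisect_right

# Hypothetical page layout matching the figure: four ID-ordered pages.
pages = [
    ["00112233", "00112234"],
    ["00112235"],
    ["00112236", "00112237"],
    ["00112238"],
]

# Dense index: one (key, page, slot) entry for every record.
dense = [(key, p, s) for p, page in enumerate(pages) for s, key in enumerate(page)]

# Sparse index: one (first key, page) entry for every page.
sparse = [(page[0], p) for p, page in enumerate(pages)]
sparse_keys = [key for key, _ in sparse]


def find_page_sparse(key):
    """Page that may hold key; the page must still be read to confirm it."""
    pos = bisect_right(sparse_keys, key) - 1
    return sparse[pos][1] if pos >= 0 else None


print(len(dense), len(sparse))       # 6 4
print(find_page_sparse("00112237"))  # 2
```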
Types of Ordered Indices (3/3)
• The ordering field is a nonkey (it may have duplicates)
  – Clustered index
  – Unclustered index
[Figure: data file with duplicate names (Paul 00112233 and 01112233; Tim 00112236, 01112236 and 02112236), a clustered index with one entry per distinct name (Anna, Carol, Matt, Paul, Rob, Tim) and an unclustered index with one entry per record]
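A sketch of a clustered index on a nonkey ordering field, assuming the common textbook variant: one entry per distinct value, pointing to the first page that holds it. The records come from the figure; the sort order and the page size of 2 are assumptions.

```python
# Hypothetical file ordered on the nonkey field name, 2 records per page.
records = sorted([
    ("Paul", "00112233"), ("Anna", "00112234"), ("Matt", "00112235"),
    ("Tim", "00112236"), ("Carol", "00112237"), ("Rob", "00112238"),
    ("Paul", "01112233"), ("Tim", "01112236"), ("Tim", "02112236"),
])
PAGE_SIZE = 2
pages = [records[i:i + PAGE_SIZE] for i in range(0, len(records), PAGE_SIZE)]

# Clustered index: each distinct name -> first page that contains it.
clustered = {}
for p, page in enumerate(pages):
    for name, _ in page:
        clustered.setdefault(name, p)


def find_all(name):
    """IDs of all records with this name: jump to the first page, then scan."""
    ids = []
    start = clustered.get(name)
    if start is None:
        return ids
    for page in pages[start:]:
        for n, rid in page:
            if n > name:          # file is sorted on name: no more matches
                return ids
            if n == name:
                ids.append(rid)
    return ids


print(sorted(clustered))  # ['Anna', 'Carol', 'Matt', 'Paul', 'Rob', 'Tim']
print(find_all("Tim"))    # ['00112236', '01112236', '02112236']
```

Because duplicates are stored contiguously, one index entry per value is enough; the scan simply continues across page boundaries.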
Indices Exercise
• $2^{15}$ records
• 128 bytes/record
• $2^{10}$ bytes/page
• Ordered file, equality search on the ordering field, unspanned organization
  – without an index
  – with a primary index
    • on a field of size 12 bytes
    • assume a pointer is 4 bytes long
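The slide leaves the exercise open. One common way to work it out is sketched here, assuming a binary search over b ordered pages costs $\lceil \log_2 b \rceil$ page reads and the primary index holds one entry per data page (as on the primary-index slide). The variable names are illustrative.

```python
import math

records = 2 ** 15
record_size = 128      # bytes
page_size = 2 ** 10    # bytes
key_size, ptr_size = 12, 4

# Without an index: binary search directly on the ordered data file.
bfr = page_size // record_size                     # records per page (unspanned)
data_pages = math.ceil(records / bfr)
cost_no_index = math.ceil(math.log2(data_pages))

# With a primary index: one <key, pointer> entry per data page.
index_bfr = page_size // (key_size + ptr_size)     # index entries per page
index_pages = math.ceil(data_pages / index_bfr)
cost_primary = math.ceil(math.log2(index_pages)) + 1   # + 1 read of the data page

print(bfr, data_pages, cost_no_index)        # 8 4096 12
print(index_bfr, index_pages, cost_primary)  # 64 64 7
```

Under these assumptions the index cuts the search from 12 to 7 page reads, and its blocking factor of $2^6$ is the fan-out used on the multi-level index slides.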
Multi-level Indices (1/2)
• If access using the first-level index is still expensive
• Build a sparse index on the first-level index
  – Multi-level index
• Fan-out: the index blocking factor (index records per page)
[Figure: ID-ordered data file with a first-level index (00112233 to 00112238) and a sparse second-level index (00112233, 00112235, 00112236) on the pages of the first-level index]
Multi-level Indices (2/2)
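• $2^6$ index records/page (fan-out)
• $2^{15}$ index records
• 1st level
  – $2^{15} / 2^6 = 2^9$ pages
• 2nd level
  – $2^9$ index records
  – $2^9 / 2^6 = 2^3$ pages
• 3rd level
  – $2^3$ index records
  – 1 page
• $t$ is the smallest integer with $2^{15} / (2^6)^t \le 1$
• $t = \lceil \log_{2^6} 2^{15} \rceil = \lceil 15/6 \rceil = 3$
• In general, $t = \lceil \log_{fo} (\text{number of index records}) \rceil$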
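The level count can be checked by repeatedly dividing by the fan-out until a single page remains. A short sketch; `level_pages` is a hypothetical helper.

```python
import math


def level_pages(index_records, fan_out):
    """Pages at each level of a multi-level index, first level first."""
    levels = []
    entries = index_records
    while True:
        pages = max(1, math.ceil(entries / fan_out))
        levels.append(pages)
        if pages == 1:
            return levels
        entries = pages      # the next level holds one entry per page below it


levels = level_pages(2 ** 15, 2 ** 6)
print(levels)                            # [512, 8, 1]
print(len(levels), math.ceil(15 / 6))    # 3 3
```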
Dynamic multi-level indices
• So far we have assumed that indices are physically ordered files
  – which makes insertions and deletions expensive
• Dynamic multi-level indices
  – B trees
  – B+ trees
Tree-structured Indices
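• For each node: $K_1 < K_2 < \dots < K_{q-1}$
• For each value $X$ in the subtree pointed to by $P_i$:
  – $K_{i-1} < X < K_i$ for $1 < i < q$
  – $X < K_i$ for $i = 1$
  – $K_{i-1} < X$ for $i = q$
[Figure: node layout $P_1, K_1, \dots, K_{i-1}, P_i, K_i, \dots, K_{q-1}, P_q$, with the values $X$ of each subtree bounded by the adjacent keys]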
B tree
• Problems: empty nodes, unbalanced trees
  – solution: B trees
B tree: Definition
• Each node: $\langle P_1, \langle K_1, Pr_1 \rangle, P_2, \dots, \langle K_{q-1}, Pr_{q-1} \rangle, P_q \rangle$
  – $P_i$: tree pointer, $K_i$: search value, $Pr_i$: data pointer
• For each node: $K_1 < K_2 < \dots < K_{q-1}$
• For each value $X$ in the subtree pointed to by $P_i$:
  – $K_{i-1} < X < K_i$ for $1 < i < q$
  – $X < K_i$ for $i = 1$
  – $K_{i-1} < X$ for $i = q$
• Each node has at most $q$ tree pointers
  – the B tree has order $q$
• Each node has at least $\lceil q/2 \rceil$ tree pointers
  – except the root
• An internal node with $p$ tree pointers has $p - 1$ search values
• All leaves are at the same level
  – balanced tree
B tree: Example
[Figure: a B tree with root (5, 8) and leaves (1, 3), (6, 7), (9, 12); every key carries a data pointer, and ø marks a null tree pointer]
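A minimal B tree search over the example tree, assuming in-memory nodes and strings such as `"rid5"` standing in for data pointers. Because internal nodes carry data pointers too, a search can stop before it reaches a leaf.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BNode:
    keys: list                                     # K_1 < K_2 < ... < K_{q-1}
    data: list                                     # Pr_i, one per key (stand-ins)
    children: list = field(default_factory=list)  # P_1 .. P_q; empty = null pointers


def btree_search(node, x):
    """Return the data pointer stored with x, or None if x is absent."""
    while node is not None:
        i = bisect_left(node.keys, x)             # first position with K >= x
        if i < len(node.keys) and node.keys[i] == x:
            return node.data[i]                   # found, possibly above the leaves
        if not node.children:
            return None                           # null tree pointer: x is absent
        node = node.children[i]                   # keys[i-1] < x < keys[i]
    return None


# The B tree from the example slide.
root = BNode([5, 8], ["rid5", "rid8"], [
    BNode([1, 3], ["rid1", "rid3"]),
    BNode([6, 7], ["rid6", "rid7"]),
    BNode([9, 12], ["rid9", "rid12"]),
])
print(btree_search(root, 7), btree_search(root, 4))  # rid7 None
```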
B+ tree
• Most implementations of B trees are actually B+ trees
• Data pointers are stored only in the leaves
  – more entries fit in internal nodes than in a regular B tree
  – fewer internal nodes
  – fewer levels
  – faster access
B+ tree: Definition
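• Internal nodes: $\langle P_1, K_1, P_2, \dots, P_{q-1}, K_{q-1}, P_q \rangle$
• Leaf nodes: $\langle \langle K_1, Pr_1 \rangle, \langle K_2, Pr_2 \rangle, \dots, \langle K_{p-1}, Pr_{p-1} \rangle, P_{next} \rangle$
• $Pr_i$ points to a data record or to a block of pointers to such records
• $p$: the leaf order
[Figure: internal node (120, 150, 180) above the chained leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]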
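B+ tree: Search
• At each level, find the smallest $K_i$ larger than the search key
• Follow the associated pointer $P_i$ (or $P_q$ if no key is larger)
[Figure: B+ tree with root (100); its left child (30) is above leaves (3, 5, 11), (30, 35), and its right child (120, 150, 180) is above leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]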
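The search rule maps onto `bisect_right`: its result is the index of the smallest key larger than k, which is also the index of the pointer $P_i$ to follow (the last pointer when no key is larger). A sketch over the tree from the figure, with hypothetical `Leaf` and `Internal` classes:

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # P_next: link to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children  # len(children) == len(keys) + 1


def bplus_search(node, k):
    """Descend to the leaf that may hold k; return (leaf, found)."""
    while isinstance(node, Internal):
        # bisect_right gives the smallest K_i > k; follow the pointer before it
        node = node.children[bisect_right(node.keys, k)]
    return node, k in node.keys


leaves = [Leaf(keys) for keys in ([3, 5, 11], [30, 35], [100, 101, 110],
                                  [120, 130], [150, 156, 179], [180, 200])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b                    # chain the leaves left to right
root = Internal([100], [Internal([30], leaves[:2]),
                        Internal([120, 150, 180], leaves[2:])])

leaf, found = bplus_search(root, 156)
print(leaf.keys, found)           # [150, 156, 179] True
```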
B+ tree: Insert
• Nodes may overflow or underflow
• For now, ignore overflow and underflow
• Inserting a data record with search key value k:
  – find the leaf node
  – if k is found
    • add the record to the file, and create an indirect block if there isn't one
    • add the record pointer to the indirect block
  – if k is not found
    • add the data record to the file
    • insert the record pointer into the leaf node (keeping all search keys in order)
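The insert steps for a single leaf can be sketched as follows, still ignoring overflow. The assumptions: the data file is a Python list and a record ID is a position in it, and an indirect block is represented as a list of record IDs stored in place of the single pointer.

```python
from bisect import bisect_left

data_file = []  # hypothetical data file; a record ID (rid) is a position in it


class LeafNode:
    """One B+ tree leaf; overflow is ignored, as on the slide."""

    def __init__(self):
        self.keys = []  # search keys, kept in order
        self.ptrs = []  # a rid, or a list of rids acting as an indirect block

    def insert(self, k, rid):
        i = bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            # k found: create the indirect block if there isn't one, then add rid
            if not isinstance(self.ptrs[i], list):
                self.ptrs[i] = [self.ptrs[i]]
            self.ptrs[i].append(rid)
        else:
            # k not found: insert the record pointer, keeping keys in order
            self.keys.insert(i, k)
            self.ptrs.insert(i, rid)


def insert_record(leaf, k, record):
    data_file.append(record)            # add the data record to the file
    leaf.insert(k, len(data_file) - 1)  # then index it under k


leaf = LeafNode()
for k, rec in [(120, "a"), (100, "b"), (120, "c")]:
    insert_record(leaf, k, rec)
print(leaf.keys, leaf.ptrs)  # [100, 120] [1, [0, 2]]
```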
B+ tree: Delete
• Ignoring overflow and underflow:
  – find the leaf node with search key value k
  – find the data record pointer and delete the record
  – delete the index record
    • and the indirect block, if any, once it is empty
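A matching sketch of the delete steps on one leaf, under the same assumptions (a list stands in for an indirect block; underflow is ignored). Removing the data record from the file itself is left out.

```python
from bisect import bisect_left


class LeafNode:
    def __init__(self, keys, ptrs):
        self.keys = keys  # sorted search keys
        self.ptrs = ptrs  # a rid, or a list of rids acting as an indirect block

    def delete(self, k, rid):
        """Remove pointer rid stored under key k; underflow is ignored."""
        i = bisect_left(self.keys, k)
        if i == len(self.keys) or self.keys[i] != k:
            raise KeyError(k)
        if isinstance(self.ptrs[i], list):   # indirect block
            self.ptrs[i].remove(rid)
            if self.ptrs[i]:
                return                       # other records still have key k
        elif self.ptrs[i] != rid:
            raise KeyError((k, rid))
        del self.keys[i]                     # delete the index record
        del self.ptrs[i]                     # (and the now empty indirect block)


leaf = LeafNode([100, 120], [1, [0, 2]])
leaf.delete(120, 0)   # the indirect block still holds rid 2, so key 120 stays
leaf.delete(120, 2)   # the block is now empty: the index record goes too
print(leaf.keys, leaf.ptrs)  # [100] [1]
```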
B+ tree: Simple Insert
• Insert 42
[Figure: the search follows k < 100 from the root and then the pointer right of 30, reaching leaf (30, 35); it has room, so 42 is simply added]
B+ tree: Leaf Overflow (1/2)
• Insert 9
[Figure: 9 < 30 leads to the full leaf (3, 5, 11), which overflows; the neighbouring leaf is now (30, 35, 42)]
B+ tree: Leaf Overflow (2/2)
• Keep the first $\lceil n/2 \rceil$ entries in the existing node and move the rest into a new leaf node
  – here $n = 3 + 1 = 4$
[Figure: leaf (3, 5, 9, 11) splits into (3, 5) and (9, 11); 9 is copied up, so the parent becomes (9, 30)]
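The split rule as a small function, assuming a leaf holds at most 3 keys, as in the example. The first key of the new leaf is copied up, so it also remains at the leaf level.

```python
import math
from bisect import insort


def split_leaf(keys):
    """Split an overflowing sorted leaf; return (left, right, separator)."""
    cut = math.ceil(len(keys) / 2)   # first ceil(n/2) entries stay
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]     # right[0] is copied up into the parent


leaf = [3, 5, 11]
insort(leaf, 9)                      # n = 3 + 1 = 4 exceeds the capacity of 3
print(split_leaf(leaf))              # ([3, 5], [9, 11], 9)
```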
B+ tree: Internal Node Overflow (1/3)
• Insert 210, insert 205
[Figure: the left subtree is now (9, 30) over leaves (3, 5), (9, 11), (30, 35, 42); on the right, inserting 210 turns leaf (180, 200) into (180, 200, 210) under the node (120, 150, 180)]
B+ tree: Internal Node Overflow (2/3)
• Leaf split
[Figure: inserting 205 overflows leaf (180, 200, 210); it splits into (180, 200) and (205, 210), and 205 is copied up, leaving the parent with four keys (120, 150, 180, 205)]
B+ tree: Internal Node Overflow (3/3)
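• The overflowing internal node splits; its middle key moves up instead of being copied
[Figure: the node (120, 150, 180, 205) splits into (120) and (180, 205), and 150 moves up, so the root (100) becomes (100, 150)]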
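An internal node split differs from a leaf split: the separator moves up rather than being copied. This sketch moves the key at 0-based position $\lceil n/2 \rceil - 1$ up, an assumption chosen because it reproduces the slide's result for $n = 4$; textbooks vary on this detail. The leaf names `L0` to `L4` are placeholders.

```python
import math


def split_internal(keys, children):
    """Split an internal node with n keys and n + 1 children.

    Returns ((left_keys, left_children), up_key, (right_keys, right_children)).
    The up key leaves this level, unlike the copied-up key of a leaf split.
    """
    mid = math.ceil(len(keys) / 2) - 1
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, keys[mid], right


# L0..L4 stand in for leaves (100 101 110), (120 130), (150 156 179),
# (180 200) and (205 210).
left, up, right = split_internal([120, 150, 180, 205], ["L0", "L1", "L2", "L3", "L4"])
print(left, up, right)  # ([120], ['L0', 'L1']) 150 ([180, 205], ['L2', 'L3', 'L4'])
```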
B+ tree: New Root (1/2)
• Insert 210, insert 205
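[Figure: a B+ tree whose root (120, 150, 180) sits directly above leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200); 210 and 205 go into the last leaf]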
B+ tree: New Root (2/2)
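[Figure: the leaf splits into (180, 200) and (205, 210); the root overflows as (120, 150, 180, 205) and splits into (120) and (180, 205), and 150 moves up into a new root (150), adding one level to the tree]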
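Putting the two splits together, a compact recursive insert reproduces the new-root example. It is a sketch under stated assumptions: every node holds at most 3 keys (`MAX_KEYS`), the split positions are those of the previous examples, and the leaf-chain pointer $P_{next}$ is omitted for brevity.

```python
import math
from bisect import bisect_right, insort

MAX_KEYS = 3  # assumption: leaves and internal nodes hold at most 3 keys


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def _insert(node, k):
    """Insert k under node; return (separator, new right sibling) if node split."""
    if node.children is None:                      # leaf level
        insort(node.keys, k)
        if len(node.keys) <= MAX_KEYS:
            return None
        cut = math.ceil(len(node.keys) / 2)        # first ceil(n/2) keys stay
        right = Node(node.keys[cut:])
        node.keys = node.keys[:cut]
        return right.keys[0], right                # separator is copied up
    i = bisect_right(node.keys, k)                 # child that covers k
    split = _insert(node.children[i], k)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = math.ceil(len(node.keys) / 2) - 1        # this key moves up
    up = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right


def insert(root, k):
    """Insert k and return the root, which is new if the old root split."""
    split = _insert(root, k)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])              # the tree grows by one level


# Starting tree of the example: root (120, 150, 180) over four leaves.
root = Node([120, 150, 180], [Node([100, 101, 110]), Node([120, 130]),
                              Node([150, 156, 179]), Node([180, 200])])
for k in (210, 205):
    root = insert(root, k)
print(root.keys, [c.keys for c in root.children])  # [150] [[120], [180, 205]]
```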
Index Insert Exercise
• Insert 8, 7, 41
[Figure: B+ subtree with internal node (9, 30) over leaves (3, 5), (9, 11), (30, 35, 42)]
B+ tree: Delete
• Simple delete case
• Underflow case:
  – redistribute records
  – coalesce with siblings
  – update parents
B+ tree: Simple Delete (1/2)
• Delete 110
[Figure: root (150); left child (120) over leaves (100, 101, 110), (120, 130); right child (180, 205) over leaves (150, 156, 179), (180, 200), (205, 210, 215)]
B+ tree: Simple Delete (2/2)
• Leaf updated
[Figure: leaf (100, 101, 110) becomes (100, 101); it still has enough entries, so no other node changes]
B+ tree: Delete Redistribution (1/2)
• Delete 180
[Figure: the same tree; removing 180 leaves its leaf holding only (200), which underflows]
B+ tree: Delete Redistribution (2/2)
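• Redistribute entries with the left or right sibling
[Figure: 179 moves over from the left sibling, giving leaves (150, 156) and (179, 200); the parent separator 180 becomes 179, so the parent is (179, 205)]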
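Redistribution from the left sibling as a sketch, assuming leaves are plain sorted lists and the minimum occupancy is $\lceil 3/2 \rceil = 2$ keys (`MIN_KEYS`, an assumption matching leaves of at most 3 keys).

```python
MIN_KEYS = 2  # assumption: ceil(3/2) for leaves of at most 3 keys


def borrow_from_left(left, node, parent_keys, sep):
    """Move the left sibling's last entry into an underflowing leaf.

    parent_keys[sep] separates left and node. Returns False when the sibling
    cannot spare an entry; the leaves must then be coalesced instead.
    """
    if len(left) <= MIN_KEYS:
        return False
    node.insert(0, left.pop())      # largest key of left moves over
    parent_keys[sep] = node[0]      # new separator: first key of node
    return True


parent = [180, 205]
left, node = [150, 156, 179], [200]   # 180 has just been deleted from node
borrow_from_left(left, node, parent, 0)
print(left, node, parent)  # [150, 156] [179, 200] [179, 205]
```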
B+ tree: Delete Coalesce (1/4)
• Delete 101
[Figure: root (150); left child (120) over leaves (100, 101), (120, 130); right child (179, 205) over leaves (150, 156), (179, 200), (205, 210, 215)]
B+ tree: Delete Coalesce (2/4)
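• Leaf updated: (100, 101) becomes (100), which underflows
• No redistribution: the sibling (120, 130) cannot spare an entry
  – coalesce with the sibling
[Figure: leaves (100) and (120, 130) merge into (100, 120, 130)]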
B+ tree: Delete Coalesce (3/4)
• Leaf updated
• No redistribution
  – sibling coalesce
[Figure: after coalescing, the separator 120 is removed from the parent, which is left with no keys and underflows]
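Coalescing two leaves amounts to concatenating them and removing their separator from the parent. A sketch with leaves as sorted lists (updating the leaf-chain pointer is omitted):

```python
def coalesce_with_right(leaves, parent_keys, i):
    """Merge leaf i with its right sibling and drop their separator.

    leaves are the parent's children as sorted lists; parent_keys[i]
    separates leaves[i] and leaves[i + 1].
    """
    leaves[i].extend(leaves[i + 1])  # still sorted: every right key is larger
    del leaves[i + 1]
    del parent_keys[i]


leaves, parent = [[100], [120, 130]], [120]
coalesce_with_right(leaves, parent, 0)
print(leaves, parent)  # [[100, 120, 130]] []
```

The empty parent key list is exactly the internal-node underflow that the next slide repairs.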
B+ tree: Delete Coalesce (4/4)
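• Redistribution at the internal level
[Figure: the empty internal node borrows through the root: 150 moves down into it and 179 moves up, giving root (179), left child (150) over leaves (100, 120, 130), (150, 156), and right child (205) over leaves (179, 200), (205, 210, 215)]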
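At the internal level, redistribution rotates through the parent: the parent's separator moves down into the underflowing node, and the sibling's first key moves up to replace it. A sketch with nodes as plain dictionaries (an illustrative representation), reproducing the example:

```python
def rotate_from_right(parent_keys, sep, node, right):
    """Fix an underflowing internal node by borrowing through the parent.

    node and right are adjacent siblings (dicts with "keys" and "children");
    parent_keys[sep] is the separator between them.
    """
    node["keys"].append(parent_keys[sep])               # separator moves down
    node["children"].append(right["children"].pop(0))   # with the sibling's first child
    parent_keys[sep] = right["keys"].pop(0)             # sibling's first key moves up


root = [150]
node = {"keys": [], "children": [[100, 120, 130]]}       # emptied by the coalesce
right = {"keys": [179, 205], "children": [[150, 156], [179, 200], [205, 210, 215]]}
rotate_from_right(root, 0, node, right)
print(root, node["keys"], right["keys"])  # [179] [150] [205]
```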
Static Hashing (1/2)
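• Store records in buckets with overflow chains
• Allocate a fixed number of buckets $M$
• Problems:
  – small $M$
    • long overflow chains, slow search, delete and insert
[Figure: hash function h mapping keys to a few buckets, each with an overflow chain ending in null]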
Static Hashing (2/2)
• Problems:
  – large $M$
    • wasted space, slow scan
[Figure: hash function h mapping keys to many buckets, most of them empty (null)]
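A sketch of static hashing that counts page reads, assuming integer keys, a hypothetical hash function $h(k) = k \bmod M$ and a fixed page capacity. With a small $M$ the chains grow and searches read more pages.

```python
class StaticHashFile:
    """Static hashing sketch: M buckets, each a chain of fixed-size pages."""

    def __init__(self, m, capacity):
        self.m = m                                  # number of buckets, fixed
        self.capacity = capacity                    # records per page
        self.buckets = [[[]] for _ in range(m)]     # page 0 of each chain is primary

    def h(self, key):
        return key % self.m                         # hypothetical hash function

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        if len(chain[-1]) == self.capacity:
            chain.append([])                        # last page full: add overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return (pages read, found) for an equality search."""
        chain = self.buckets[self.h(key)]
        for reads, page in enumerate(chain, start=1):
            if key in page:
                return reads, True
        return len(chain), False


small = StaticHashFile(m=2, capacity=2)
large = StaticHashFile(m=8, capacity=2)
for k in range(10):
    small.insert(k)
    large.insert(k)
print(small.search(8), large.search(8))  # (3, True) (1, True)
```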
Dynamic Hashing
• Split and coalesce buckets as the database grows and shrinks
• One scheme: extendible hashing
• The hash function generates large values, e.g. 32 bits
  – use only $i$ bits, and change $i$ as the database size changes
• On overflow, double the number of buckets
  – use $i + 1$ bits of the hash function
  – but this is expensive: read all $M$ pages and redistribute the records over $2M$ pages
• Solution: use a directory and double the size of the directory instead
  – only split the bucket that overflowed
Extendible Hashing (1/4)
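[Figure: directory with global depth 2 (entries 00, 01, 10, 11) pointing to buckets A to D, where bucket A holds 16 and 20; inserting 18, with h(18) = 10010, uses its last two bits, 10, to pick the directory entry]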
Extendible Hashing (2/4)
[Figure: the same directory; inserting 4, with h(4) = 00100, selects entry 00, whose bucket A is already full with 16 and 20]
Extendible Hashing (3/4)
[Figure: bucket A splits into A (16) and its split image A1 (20, 4), both with local depth 3; the directory still has global depth 2]
Extendible Hashing (4/4)
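[Figure: because A's local depth now exceeds the global depth, the directory doubles to global depth 3 (entries 000 to 111); entry 000 points to A (16) and entry 100 to A1 (20, 4)]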
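• Global depth (GD): the number of hash bits used to index the directory
• Local depth (LD): the number of hash bits shared by all entries of a bucket
• If the bucket is full:
  – split the bucket
  – increment its LD
• If GD = LD before the split:
  – increment GD
  – double the directory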
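A minimal extendible hashing insert following these rules, assuming $h(k) = k$, the low-order bits as the directory index (as in the example, where 20 = 10100 and 4 = 00100 end up together), and buckets of 2 entries. It is only a sketch: it never frees buckets, and more identical hash values than the bucket capacity would make it split forever, so a real implementation needs overflow pages for that case.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # LD
        self.keys = []


class ExtendibleHash:
    """Extendible hashing sketch with h(k) = k, indexed by the low-order bits."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0            # GD
        self.directory = [Bucket(0)]

    def _slot(self, key):
        return key & ((1 << self.global_depth) - 1)   # last GD bits of h(key)

    def search(self, key):
        return key in self.directory[self._slot(key)].keys  # directory + 1 bucket

    def insert(self, key):
        while True:
            bucket = self.directory[self._slot(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            if bucket.local_depth == self.global_depth:
                # GD = LD: double the directory; entry i + 2^GD copies entry i
                self.directory = self.directory + self.directory
                self.global_depth += 1
            self._split(bucket)          # then retry the insert

    def _split(self, bucket):
        bit = 1 << bucket.local_depth    # the hash bit that now separates the halves
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)   # the split image
        old, bucket.keys = bucket.keys, []
        for k in old:
            (image if k & bit else bucket).keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image


table = ExtendibleHash(capacity=2)
for k in (16, 20, 4):
    table.insert(k)
print(table.global_depth, table.directory[0].keys, table.directory[4].keys)  # 3 [16] [20, 4]
```

Inserting 4 here triggers several doublings in a row because the table starts empty; a bucket whose LD is below GD is split without touching the directory size.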
Extendible Hashing: Delete
• If a deletion makes a bucket empty
  – merge it with its split image
• If every directory entry points to the same bucket as its split image
  – halve the directory
Extendible Hashing: Summary
• Avoids overflow pages
• The directory can get large
• A key search requires just 2 page reads (the directory page and the bucket)
• Space utilization fluctuates
  – 59-90% for uniformly distributed records