1
Chapter 12: Indexing and Hashing
Indexing
  Basic Concepts
  Ordered Indices
  B+-Tree Index Files
Hashing
  Static Hashing
  Dynamic Hashing
2
Basic Concepts
Search Key - set of attributes used to look up records in a file.
[Figure: a value is looked up through an index entry, which pairs a search key with a pointer to the matching record.]
3
Index Evaluation Metrics
• Access types supported efficiently. E.g.,
  - Point query: find “Tom”
  - Range query: find students whose age is between 20 and 40
• Access time
• Update time
• Space overhead
4
Ordered Indices
In an ordered index, index entries are stored sorted on the search key value.
E.g., author catalog in library.
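To make the link between sorted entries and the access types concrete, here is a minimal sketch of point and range queries over an ordered index. The index contents, the record names such as `"r3"`, and the function names are hypothetical and not taken from the slides.

```python
from bisect import bisect_left, bisect_right

# Hypothetical ordered index: (search-key value, record pointer) pairs sorted by key.
index = [(19, "r3"), (22, "r1"), (25, "r4"), (41, "r2"), (57, "r0")]
keys = [k for k, _ in index]

def point_query(key):
    """Return the pointers of all entries whose search key equals `key`."""
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [ptr for _, ptr in index[lo:hi]]

def range_query(low, high):
    """Return the pointers of all entries with low <= key <= high."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [ptr for _, ptr in index[lo:hi]]

print(point_query(25))      # ['r4']
print(range_query(20, 40))  # ['r1', 'r4']
```

Because the entries are sorted, both queries reduce to binary searches for the boundaries of a contiguous slice.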
5
Primary index
Also called clustering index
• The search key of a primary index is usually but not necessarily the primary key.
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); index entries 10, 30, 50, 70, 90, 110, ... on the search key, in the same order as the file.]
6
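Secondary index: non-clustering index.
[Figure: file with blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); index entries 10, 20, 30, 40, 50, 60, 70, ... on the search key, in a different order from the file.]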
7
Dense Index: contains an index record for every search-key value.
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); dense index with entries 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120.]
8
Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on the search key.
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); sparse index with entries 10, 30, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230.]
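The difference between the two index types can be sketched as follows, using the block contents from the figure. The one-entry-per-block layout and the names `dense`, `sparse`, and `sparse_lookup` are assumptions made for illustration.

```python
from bisect import bisect_right

# Sequential file: blocks of two records each, sorted on the search key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number) entry for every search-key value.
dense = [(key, b) for b, block in enumerate(blocks) for key in block]

# Sparse index (assumed layout): one entry per block, holding the block's first key.
sparse = [(block[0], b) for b, block in enumerate(blocks)]

def sparse_lookup(key):
    """Follow the last sparse entry with key <= target, then scan that block."""
    first_keys = [k for k, _ in sparse]
    pos = bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # smaller than every key in the file
    block_no = sparse[pos][1]
    return block_no if key in blocks[block_no] else None

print(len(dense), len(sparse))  # 10 5
print(sparse_lookup(40))        # 1
```

The sparse lookup only works because the file itself is sorted on the search key: the entry for 30 is enough to find 40 in the same block.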
9
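Secondary indexes
• Sparse index does not make sense!
[Figure: file ordered on a different sequence field, with blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); a sparse index with entries 30, 20, 80, 100, 90, ... cannot locate the keys it omits.]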
10
Multilevel Index
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); first-level sparse index with entries 10, 30, 50, 70, 90, 110, ..., 230; sparse 2nd level with entries 10, 90, 170, 250, 330, 410, 490, 570.]
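A two-level lookup over the values in the figure might look like the sketch below. The fanout of four entries per index block and the helper names are assumptions; only the key values come from the slide.

```python
from bisect import bisect_right

def build_level(keys, fanout):
    """Group sorted keys into index blocks of `fanout` entries each."""
    return [keys[i:i + fanout] for i in range(0, len(keys), fanout)]

# First level: first key of each data block (10, 30, ..., 230), four entries per block.
first_level = build_level(list(range(10, 250, 20)), 4)
# Second level: first key of each first-level block.
second_level = [block[0] for block in first_level]

def locate(key):
    """Return (first-level block number, first key of the data block to read)."""
    i = bisect_right(second_level, key) - 1
    if i < 0:
        return None  # smaller than every indexed key
    block = first_level[i]
    j = bisect_right(block, key) - 1
    return i, block[j]

print(second_level)  # [10, 90, 170]
print(locate(140))   # (1, 130)
```

Each level narrows the search to one block of the level below, so a lookup reads one index block per level before reaching the data.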
11
Multilevel Index
Secondary indexes
• Lowest level is dense
• Other levels are sparse
[Figure: file ordered on a different sequence field, with blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); dense lowest level with entries 10, 20, 30, 40, 50, 60, 70, ...; sparse high level with entries 10, 50, 90, ...]
12
Conventional indexes
Advantage:
- Simple
- Index is sequential file, good for scans
Disadvantage:
- Inserts expensive
13
Outline
• Conventional indexes
• B+-Tree (NEXT)
14
NEXT: Another type of index
• Give up on sequentiality of index
• Try to get “balance”
15
B+Tree Example n=4
[Figure: root with key 100; non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]
16
Sample non-leaf
Keys: 57, 81, 95
Pointers, left to right:
  to keys k < 57
  to keys 57 ≤ k < 81
  to keys 81 ≤ k < 95
  to keys k ≥ 95
Key is moved (not copied) from lower level non-leaf node to upper level non-leaf node.
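The pointer ranges of the sample non-leaf node can be expressed as a single binary search. This is a small sketch; the function name `child_index` is hypothetical.

```python
from bisect import bisect_right

def child_index(keys, k):
    """Return the position of the pointer to follow for search key k.

    Pointer i covers keys[i-1] <= k < keys[i], with open ends at both sides.
    """
    return bisect_right(keys, k)

keys = [57, 81, 95]
print(child_index(keys, 10))  # 0: k < 57
print(child_index(keys, 57))  # 1: 57 <= k < 81
print(child_index(keys, 81))  # 2: 81 <= k < 95
print(child_index(keys, 95))  # 3: k >= 95
```

Using `bisect_right` rather than `bisect_left` sends a key equal to a separator to the right-hand subtree, which matches the ranges listed on the slide.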
17
Sample leaf node:
Keys: 57, 81, 95
Pointers:
  from non-leaf node (incoming)
  to record with key 57
  to record with key 81
  to record with key 95
  to next leaf in sequence
Key is copied (not moved) from leaf node to non-leaf node.
18
n=4
[Figure: drawing convention for nodes: a leaf holding keys 30 and 35, and a non-leaf holding key 30.]
19
Size of nodes:
  n pointers
  n-1 keys
20
Don’t want nodes to be too empty
Use at least
  Root: 2 pointers
  Non-leaf: ⌈n/2⌉ pointers
  Leaf: ⌈(n-1)/2⌉ keys
21
n=4         Full node        Min. node
Non-leaf    120, 150, 180    30
Leaf        3, 5, 11         30, 35
A pointer counts even if null.
22
B+tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records except for “sequence pointer”
23
(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs              Min keys
Non-leaf (non-root)   n          n-1        ⌈n/2⌉                 ⌈n/2⌉ - 1
Leaf (non-root)       n          n-1        ⌈(n-1)/2⌉ (to data)   ⌈(n-1)/2⌉
Root                  n          n-1        2                     1
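The limits in rule (3) can be evaluated for a concrete order n with a few lines of code. The sketch assumes the fractions round up (ceiling), which is consistent with the n=4 examples in the slides; the function name `occupancy` is hypothetical.

```python
from math import ceil

def occupancy(n):
    """Return (max ptrs, max keys, min ptrs, min keys) per node type for order n.

    Assumption: n/2 and (n-1)/2 are rounded up.
    """
    return {
        "non-leaf": (n, n - 1, ceil(n / 2), ceil(n / 2) - 1),
        "leaf": (n, n - 1, ceil((n - 1) / 2), ceil((n - 1) / 2)),
        "root": (n, n - 1, 2, 1),
    }

for kind, limits in occupancy(4).items():
    print(kind, limits)
# non-leaf (4, 3, 2, 1)
# leaf (4, 3, 2, 2)
# root (4, 3, 2, 1)
```

For n=4 this gives a minimum of one key in a non-leaf node and two keys in a leaf, matching the minimal nodes (30) and (30, 35) shown earlier.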
24
Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
25
(a) Insert key = 32, n=4
[Figure: parent with keys 30, 100; leaves (3, 5, 11) and (30, 31). Key 32 fits into the leaf (30, 31).]
26
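(b) Insert key = 7, n=4
[Figure: parent with keys 30, 100; leaves (3, 5, 11) and (30, 31). The leaf (3, 5, 11) overflows and splits into (3, 5) and a new leaf starting with 7; key 7 is added to the parent.]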
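Cases (a) and (b) can be sketched on plain key lists. The split point (the left leaf keeps the larger half) and the function name `insert_into_leaf` are assumptions chosen to reproduce the figure, not a definitive implementation.

```python
from bisect import insort

def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf of a B+tree of order n.

    Returns (left, right, separator). If the leaf does not overflow,
    right and separator are None.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= n - 1:          # case (a): space available in leaf
        return keys, None, None
    mid = (len(keys) + 1) // 2      # case (b): leaf overflow, split in two
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]    # separator is COPIED up to the parent

print(insert_into_leaf([30, 31], 32, 4))   # ([30, 31, 32], None, None)
print(insert_into_leaf([3, 5, 11], 7, 4))  # ([3, 5], [7, 11], 7)
```

The separator remains in the right leaf after the split, which is the "copied, not moved" behaviour of leaf keys.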
27
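(c) Insert key = 160, n=4
[Figure: root with key 100; non-leaf (120, 150, 180); leaves (150, 156, 179) and (180, 200). Inserting 160 splits the leaf into (150, 156) and (160, 179); the non-leaf then overflows and splits, leaving 180 in a new node and moving 160 up.]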
28
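(d) New root, insert 45, n=4
[Figure: root (10, 20, 30) over leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). Inserting 45 splits the last leaf, producing a leaf (40, 45) and a new non-leaf key 40; the old root overflows and splits, and a new root with key 30 is created.]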
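For cases (c) and (d), a non-leaf split differs from a leaf split in that the middle key moves up instead of being copied. The sketch below assumes the split happens at the middle key, which reproduces both figures; the function name and the child labels are hypothetical.

```python
def split_non_leaf(keys, children):
    """Split an overflowing non-leaf node (n keys, n+1 children).

    Returns (left_keys, left_children, up_key, right_keys, right_children).
    The middle key is MOVED up: it appears in neither half afterwards.
    """
    mid = len(keys) // 2
    up_key = keys[mid]
    return (keys[:mid], children[:mid + 1],
            up_key,
            keys[mid + 1:], children[mid + 1:])

# Case (d): the root (10, 20, 30) has just received key 40 from a leaf split.
left_k, left_c, up, right_k, right_c = split_non_leaf(
    [10, 20, 30, 40], ["c0", "c1", "c2", "c3", "c4"])
print(left_k, up, right_k)  # [10, 20] 30 [40]
```

When the node being split is the root, `up` becomes the only key of a new root, which is how the tree grows by one level.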
29
Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
30
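(b) Coalesce with sibling, delete 50, n=5
[Figure: parent with keys 10, 40, 100; leaves (10, 20, 30) and (40, 50). After deleting 50, the remaining key 40 moves into the sibling, giving (10, 20, 30, 40).]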
31
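(c) Redistribute keys, delete 50, n=5
[Figure: parent with keys 10, 40, 100; leaves (10, 20, 30, 35) and (40, 50). After deleting 50, key 35 moves from the sibling to join 40, and the parent key 40 becomes 35.]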
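The choice between cases (b) and (c) at a leaf depends only on whether the two leaves fit into one node. The following is a minimal sketch on plain key lists; it assumes a minimum of ⌈(n-1)/2⌉ keys per leaf and always uses the left sibling, and the function name is hypothetical.

```python
from math import ceil

def delete_from_leaf(left_sibling, leaf, key, n):
    """Delete key from leaf and repair underflow using the left sibling.

    Returns (action, left_sibling_keys, leaf_keys).
    """
    min_keys = ceil((n - 1) / 2)
    leaf = [k for k in leaf if k != key]
    left = list(left_sibling)
    if len(leaf) >= min_keys:
        return "simple", left, leaf          # case (a)
    if len(left) + len(leaf) <= n - 1:
        return "coalesce", left + leaf, []   # case (b): merge into one node
    leaf.insert(0, left.pop())               # case (c): borrow the largest key
    return "redistribute", left, leaf

print(delete_from_leaf([10, 20, 30], [40, 50], 50, 5))
# ('coalesce', [10, 20, 30, 40], [])
print(delete_from_leaf([10, 20, 30, 35], [40, 50], 50, 5))
# ('redistribute', [10, 20, 30], [35, 40])
```

After a redistribution the parent's separator must be updated to the first key of the repaired leaf (35 in the example); after a coalesce the separator and the pointer to the emptied leaf are removed from the parent.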
32
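(d) Non-leaf coalesce, delete 37, n=5
[Figure: root with key 25 over non-leaf nodes (10, 20) and (30, 40), with leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 coalesces two leaves and then the two non-leaf nodes, producing a new root.]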
33
B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!
34
Index Definition in SQL
Create an index
  create index <index-name> on <relation-name> (<attribute-list>)
  E.g.: create index gindex on country(gdp);
To drop an index
  drop index <index-name>
  E.g.: drop index gindex;
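The two statements can be tried out with Python's built-in `sqlite3` module. In this sketch the `country` schema and its rows are made up for illustration; only the index name `gindex` and the indexed attribute `gdp` come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data for the country relation.
conn.execute("create table country (name text, gdp real)")
conn.executemany("insert into country values (?, ?)",
                 [("A", 1.0), ("B", 2.5), ("C", 0.7)])

def index_names():
    """List the indexes currently recorded in the SQLite catalog."""
    rows = conn.execute("select name from sqlite_master where type = 'index'")
    return [name for (name,) in rows]

conn.execute("create index gindex on country(gdp)")
created = index_names()

conn.execute("drop index gindex")
dropped = index_names()

print(created, dropped)  # ['gindex'] []
```

Other database systems accept the same basic `create index` and `drop index` forms, though the catalog tables used to inspect indexes differ from system to system.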