Upload
brandon-thomas
View
223
Download
6
Tags:
Embed Size (px)
Citation preview
DBMS 2001 Notes 4.1: B-Trees 1
Principles of Database Management Systems
4.1: B-Trees
Pekka Kilpeläinen(after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and
Jennifer Widom)
DBMS 2001 Notes 4.1: B-Trees 2
• B-Trees– a commonly used index structure– nonsequential, “balanced” (acces paths
to different records of equal length)– adapts well to insertions & deletions– consists of blocks holding at most n keys
and n+1 pointers, and at least half of this– (We consider a variation actually called a
B+ tree)
DBMS 2001 Notes 4.1: B-Trees 3
Root
B+Tree Example n=3
100
120
150
180
30
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
DBMS 2001 Notes 4.1: B-Trees 4
Sample non-leaf
to keys to keys to keys to keys< 120 120 k<150
150k<180 180
120
150
180
DBMS 2001 Notes 4.1: B-Trees 5
Sample leaf node:
From non-leaf node
to next leafin
sequence1
20
130
un
use
d
To r
eco
rd
wit
h k
ey 1
20
To r
eco
rd
wit
h k
ey 1
30
DBMS 2001 Notes 4.1: B-Trees 6
Don’t want nodes to be too empty
• Number of pointers in use:– at internal nodes at least (n+1)/2
(to child nodes)– at leaves at least (n+1)/2
(to data records/blocks) x}n|Zmax{nx
x}n|Zmin{nx
DBMS 2001 Notes 4.1: B-Trees 7
Full nodemin. node
Non-leaf
Leaf
n=3
12
01
50
18
0
30
3 5 11
30
35
DBMS 2001 Notes 4.1: B-Trees 8
B+tree rules
(1) All leaves at the same lowest level (balanced tree)
(2) Pointers in leaves point to records except for “sequence pointer”
DBMS 2001 Notes 4.1: B-Trees 9
(3) Number of pointers/keys for B+tree
Non-leaf(non-root) n+1 n (n+1)/2 (n+1)/2- 1
Leaf(non-root) n+1 n
Root n+1 n 2(*) 1
Max Max Min Min ptrs keys ptrsdata keys
(n+1)/2 (n+1)/2
(*) 1, if only one record in the file
DBMS 2001 Notes 4.1: B-Trees 10
Insert into B+treeFirst lookup the proper leaf;
(a) simple case– leaf not full: just insert (key, pointer-to-record)
(b) leaf overflow(c) non-leaf overflow(d) new root
DBMS 2001 Notes 4.1: B-Trees 11
(a) Insert key = 32 n=33 5 11
30
31
30
100
32
DBMS 2001 Notes 4.1: B-Trees 12
(b) Insert key = 7 n=3
3 5 11
30
31
30
100
3 5
7
7
DBMS 2001 Notes 4.1: B-Trees 13
(c) Insert key = 160 n=3
10
0
120
150
180
150
156
179
180
200
160
18
0
160
179
DBMS 2001 Notes 4.1: B-Trees 14
(d) New root, insert 45 n=3
10
20
30
1 2 3 10
12
20
25
30
32
40
40
45
40
30new root
Height grows at root => balance maintained
DBMS 2001 Notes 4.1: B-Trees 15
Again, first lookup the proper leaf;
(a): Simple case: no underflow; Otherwise ...
(b): Borrow keys from an adjacent sibling (if it doesn't become too empty); Else ...
(c): Coalesce with a sibling node (d): Cases (a), (b) or (c) at non-leaf
Deletion from B+tree
DBMS 2001 Notes 4.1: B-Trees 16
(b) Borrow keys– Delete 50
10
40
100
10
20
30
35
40
50
n=4
35
35
=> min # of keys in a leaf = 5/2 = 2
DBMS 2001 Notes 4.1: B-Trees 17
(c) Coalesce with a sibling– Delete 50
20
40
100
20
30
40
50
n=4
40
DBMS 2001 Notes 4.1: B-Trees 18
40
45
30
37
25
26
20
22
10
141 3
10
20
30
40
(d) Non-leaf coalesce– Delete 37
n=4
40
30
25
25
new root
=> min # of keys in a non-leaf = (n+1)/2 - 1=3-1= 2
DBMS 2001 Notes 4.1: B-Trees 19
B+tree deletions in practice
– Often, coalescing is not implemented– Too hard and not worth it!– later insertions may return the node back to its
required minimum size– Compromise: Try redistributing keys with a
sibling;If not possible, leave it there
– if all accesses to the records go through the B-tree, can place a "tombstone" for the deleted record at the leaf
DBMS 2001 Notes 4.1: B-Trees 20
Why B-trees Are Good?
• B-tree adapts well to insertions and deletions, maintaining balance– DBA does not need to care about
reorganizing– split/merge operations rather rare
(How often would nodes with 200 keys be split?)
Access times dominated by key-lookup (i.e., traversal from root to a leaf)
DBMS 2001 Notes 4.1: B-Trees 21
Efficiency of B-trees• For example, assume 4 KB blocks, 4 byte
keys and 8 byte pointers• How many keys and pointers fit in a node
(= index block)?– Max n s.t. (4*n + 8*(n+1)) B 4096 B ?
n=340; 340 keys and 341 pointers fit in a node
171 … 341 pointers in a non-leaf node
DBMS 2001 Notes 4.1: B-Trees 22
Efficiency of B-trees (cont.)
• Assume an average node has 255 pointers a three-level B-tree has 2552 = 65025 leaves with total of 2553 or about 16.6 million pointers to records if root block kept in main memory,each record can be accessed with 2+1 disk I/Os; If all 256 internal nodes are in main memory, record access requires 1+1 disk I/Os (256 x 4 KB = 1 MB; quite feasible!)
DBMS 2001 Notes 4.1: B-Trees 23
Outline/summary
• B trees•popular index structures with
graceful growth properties•support range queries like
“… WHERE 100 < Key < 200” ( Exercises)
• Next: Hashing schemes