Principles of Database Management Systems

4.1: B-Trees

Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

DBMS 2001 Notes 4.1: B-Trees

• B-Trees
  – a commonly used index structure
  – nonsequential, “balanced” (access paths to different records are of equal length)
  – adapts well to insertions & deletions
  – consists of blocks holding at most n keys and n+1 pointers, and at least half of this
  – (We consider a variation actually called a B+ tree)

B+Tree Example (n=3)

Root: 100
  Non-leaf: 30
    Leaf: 3 5 11
    Leaf: 30 35
  Non-leaf: 120 150 180
    Leaf: 100 101 110
    Leaf: 120 130
    Leaf: 150 156 179
    Leaf: 180 200

Sample non-leaf

Keys: 120 150 180

Pointers (left to right):
  – to keys k < 120
  – to keys 120 ≤ k < 150
  – to keys 150 ≤ k < 180
  – to keys k ≥ 180
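The pointer ranges of the sample non-leaf (keys 120, 150, 180) amount to a binary search over the node's keys. A minimal Python sketch, assuming the keys are kept in a sorted list; the function name `child_index` is hypothetical and not taken from the slides:

```python
from bisect import bisect_right


def child_index(keys, k):
    """Return the position of the pointer to follow for search key k.

    Pointer 0 covers keys below keys[0], pointer i covers
    keys[i-1] <= k < keys[i], and the last pointer covers k >= keys[-1].
    """
    return bisect_right(keys, k)


sample_keys = [120, 150, 180]
for k in (119, 120, 149, 150, 180, 500):
    print(k, "-> pointer", child_index(sample_keys, k))
```

`bisect_right` is used rather than `bisect_left` so that a search key equal to a node key follows the pointer to its right, matching the "120 ≤ k < 150" convention.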

Sample leaf node:

(pointed to from a non-leaf node)

Keys: 120 130 (third key slot unused)

Pointers:
  – to record with key 120
  – to record with key 130
  – (unused)
  – to next leaf in sequence
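Putting the non-leaf and leaf layouts together, a lookup descends from the root to exactly one leaf. The sketch below is an illustration only: the `Node` class, `find_leaf` and `contains` are assumed names, record pointers and the sequence pointer are left out, and the tree is the n=3 example with root key 100.

```python
from bisect import bisect_right


class Node:
    """A B+ tree node; `children` is None for a leaf (assumed layout)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children

    @property
    def is_leaf(self):
        return self.children is None


def find_leaf(node, k):
    """Return the list of nodes visited from the root down to the leaf for k."""
    path = [node]
    while not node.is_leaf:
        node = node.children[bisect_right(node.keys, k)]
        path.append(node)
    return path


def contains(root, k):
    """True if key k is stored in the leaf reached for k."""
    return k in find_leaf(root, k)[-1].keys


# The n=3 example tree: root 100, non-leaves [30] and [120 150 180].
leaves = [Node(ks) for ks in ([3, 5, 11], [30, 35], [100, 101, 110],
                              [120, 130], [150, 156, 179], [180, 200])]
root = Node([100], [Node([30], leaves[:2]),
                    Node([120, 150, 180], leaves[2:])])

print(contains(root, 156), contains(root, 31))
print([n.keys for n in find_leaf(root, 130)])
```

Every lookup visits the same number of nodes, which is what "balanced" means for this structure.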

Don’t want nodes to be too empty

• Number of pointers in use:
  – at internal nodes at least ⌈(n+1)/2⌉ (to child nodes)
  – at leaves at least ⌊(n+1)/2⌋ (to data records/blocks)

⌊x⌋ = max{n ∈ Z | n ≤ x}
⌈x⌉ = min{n ∈ Z | n ≥ x}

Full node vs. min. node (n=3)

            Full node      Min. node
Non-leaf    120 150 180    30
Leaf        3 5 11         30 35

B+tree rules

(1) All leaves at the same lowest level (balanced tree)

(2) Pointers in leaves point to records except for “sequence pointer”

(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          2 (*)           1

(*) 1, if only one record in the file
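The occupancy rules can be checked for concrete values of n with integer arithmetic. This is a small sketch under the rules stated on this slide; the function name `occupancy_limits` and the dictionary layout are illustrative assumptions.

```python
def occupancy_limits(n):
    """Return max/min pointer and key counts per node type for a given n."""
    half_up = (n + 2) // 2    # equals ceil((n+1)/2) for integer n
    half_down = (n + 1) // 2  # equals floor((n+1)/2) for integer n
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_up, "min_keys": half_up - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": half_down, "min_keys": half_down},
        # The root may hold a single pointer if the file has only one record.
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 2, "min_keys": 1},
    }


for n in (3, 4):
    limits = occupancy_limits(n)
    print(n, limits["non-leaf"]["min_keys"], limits["leaf"]["min_keys"])
```

For n=3 a non-leaf needs at least 1 key and a leaf at least 2; for n=4 both need at least 2.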

Insert into B+tree

First look up the proper leaf;

(a) simple case – leaf not full: just insert (key, pointer-to-record)
(b) leaf overflow
(c) non-leaf overflow
(d) new root

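(a) Insert key = 32 (n=3)

Before: key 100 in the node above, non-leaf [30], leaves [3 5 11] and [30 31]
After: 32 goes into the free slot of leaf [30 31], giving [30 31 32]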

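(b) Insert key = 7 (n=3)

Before: key 100 in the node above, non-leaf [30], leaves [3 5 11] and [30 31]
After: leaf [3 5 11] overflows and is split into [3 5] and [7 11]; key 7 is added to the parent, giving non-leaf [7 30]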
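The leaf-overflow case can be sketched on a bare list of keys. This is a simplified illustration, not the slides' implementation: record pointers and the sequence pointer are omitted, and `insert_into_leaf` is a hypothetical helper name.

```python
from bisect import insort


def insert_into_leaf(leaf_keys, key, n):
    """Insert key into a sorted leaf; split it if more than n keys result.

    Returns (left, right, separator). Without a split, right and
    separator are None. After a split, separator is the smallest key of
    the new right leaf and must be inserted into the parent.
    """
    insort(leaf_keys, key)
    if len(leaf_keys) <= n:
        return leaf_keys, None, None
    mid = (len(leaf_keys) + 1) // 2  # left half keeps ceil((n+1)/2) keys
    left, right = leaf_keys[:mid], leaf_keys[mid:]
    return left, right, right[0]


print(insert_into_leaf([30, 31], 32, 3))   # case (a): no split
print(insert_into_leaf([3, 5, 11], 7, 3))  # case (b): split, 7 goes up
```

In a B+ tree the separator is copied up: key 7 stays in the new right leaf and also appears in the parent.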

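(c) Insert key = 160 (n=3)

Before: root [100], non-leaf [120 150 180], leaves [150 156 179] and [180 200]
After: leaf [150 156 179] is split into [150 156] and [160 179]; the non-leaf would hold keys 120 150 160 180 and overflows, so it is split into [120 150] and [180], and key 160 moves up into the root, giving root [100 160]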
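In contrast to a leaf split, a non-leaf split moves the middle key up rather than copying it. A minimal sketch, assuming the node is passed in its temporarily overfull state (n+1 keys, n+2 child pointers); `split_nonleaf` is a hypothetical name and children are represented by their key lists for readability.

```python
def split_nonleaf(keys, children):
    """Split an overfull non-leaf node.

    Returns (left_keys, left_children, up_key, right_keys, right_children).
    The middle key moves up to the parent and is stored in neither half.
    """
    mid = len(keys) // 2
    return (keys[:mid], children[:mid + 1], keys[mid],
            keys[mid + 1:], children[mid + 1:])


# Non-leaf from case (c) after key 160 has been added for the new leaf.
keys = [120, 150, 160, 180]
children = [[100, 101, 110], [120, 130], [150, 156], [160, 179], [180, 200]]
left_keys, left_children, up_key, right_keys, right_children = split_nonleaf(
    keys, children)
print(left_keys, up_key, right_keys)
```

The left node keeps 3 pointers and the right node 2, so both satisfy the minimum of ⌈(n+1)/2⌉ = 2 pointers for n=3.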

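(d) New root, insert 45 (n=3)

Before: root [10 20 30], leaves [1 2 3], [10 12], [20 25] and [30 32 40]
After: leaf [30 32 40] is split into [30 32] and [40 45]; the old root would hold keys 10 20 30 40 and overflows, so it is split into [10 20] and [40], and key 30 moves up into a new root [30]

Height grows at root => balance maintained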
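Cases (a) to (d) fit into one recursive routine in which a split propagates upward and a new root is created only when the old root itself splits. The following is a compact sketch under simplifying assumptions (keys only, no record or sequence pointers, no duplicate keys); `Node`, `_insert` and `insert` are illustrative names.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf


def _insert(node, key, n):
    """Insert key below node; return (separator, new_right_node) on a split."""
    if node.children is None:                      # leaf
        insort(node.keys, key)
        if len(node.keys) <= n:
            return None                            # case (a)
        mid = (len(node.keys) + 1) // 2            # case (b): leaf overflow
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right                # separator is copied up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, n)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= n:
        return None
    mid = len(node.keys) // 2                      # case (c): non-leaf overflow
    up = node.keys[mid]                            # middle key is moved up
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right


def insert(root, key, n):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, n)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])              # case (d): new root


old_root = Node([10, 20, 30], [Node([1, 2, 3]), Node([10, 12]),
                               Node([20, 25]), Node([30, 32, 40])])
new_root = insert(old_root, 45, 3)
print(new_root.keys, [c.keys for c in new_root.children])
```

Because the only place where the tree gets taller is the root, all leaves stay at the same depth.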

Deletion from B+tree

Again, first look up the proper leaf;

(a) Simple case: no underflow; otherwise ...
(b) Borrow keys from an adjacent sibling (if it doesn't become too empty); else ...
(c) Coalesce with a sibling node
(d) Cases (a), (b) or (c) at non-leaf

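(b) Borrow keys – Delete 50 (n=4)

Before: non-leaf [10 40 100], leaves [10 20 30 35] and [40 50]
After: leaf [40] is too empty, so key 35 is borrowed from the left sibling, giving leaves [10 20 30] and [35 40]; the separating key in the parent changes from 40 to 35

=> min # of keys in a leaf = ⌊5/2⌋ = 2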
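The borrow step can be sketched on plain key lists. The helper below is a hypothetical illustration that only handles borrowing from the left sibling; `sep_index` is assumed to be the position in the parent of the key separating the two leaves.

```python
def delete_with_borrow(left, leaf, parent_keys, sep_index, key, n):
    """Delete key from leaf; borrow from the left sibling on underflow.

    Returns "no underflow", "borrowed", or "must coalesce".
    """
    min_keys = (n + 1) // 2            # floor((n+1)/2) keys per leaf
    leaf.remove(key)
    if len(leaf) >= min_keys:
        return "no underflow"
    if len(left) - 1 >= min_keys:      # sibling stays full enough
        leaf.insert(0, left.pop())     # move its largest key over
        parent_keys[sep_index] = leaf[0]
        return "borrowed"
    return "must coalesce"


left, leaf, parent = [10, 20, 30, 35], [40, 50], [10, 40, 100]
print(delete_with_borrow(left, leaf, parent, 1, 50, 4))
print(left, leaf, parent)
```

The parent key must be updated because it has to equal the smallest key reachable through the pointer to its right.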

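(c) Coalesce with a sibling – Delete 50 (n=4)

Before: non-leaf [20 40 100], leaves [20 30] and [40 50]
After: leaf [40] is too empty and the sibling [20 30] cannot lend a key, so the two are coalesced into [20 30 40]; key 40 and the pointer to the emptied leaf are removed from the parent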
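Coalescing can be sketched in the same list-based style. This is an illustration under simplifying assumptions: `coalesce_with_left` is a hypothetical name, the sequence pointer is not modelled, and the two outer children are placeholder strings because the slide does not show those leaves.

```python
def coalesce_with_left(left, leaf, parent_keys, parent_children, sep_index):
    """Merge an underfull leaf into its left sibling.

    Removes the separating key and the pointer to the emptied leaf from
    the parent, which may in turn underflow (case (d)).
    """
    left.extend(leaf)
    del parent_keys[sep_index]
    del parent_children[sep_index + 1]


left, leaf = [20, 30], [40]            # leaf [40 50] after deleting 50
parent_keys = [20, 40, 100]
parent_children = ["leaf < 20", left, leaf, "leaf >= 100"]  # placeholders
coalesce_with_left(left, leaf, parent_keys, parent_children, 1)
print(left, parent_keys, len(parent_children))
```

Here the parent keeps 3 pointers, which still meets the minimum of ⌈(n+1)/2⌉ = 3 for n=4, so the deletion stops at this level.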

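(d) Non-leaf coalesce – Delete 37 (n=4)

Before:

Root: 25
  Non-leaf: 10 20
    Leaf: 1 3
    Leaf: 10 14
    Leaf: 20 22
  Non-leaf: 30 40
    Leaf: 25 26
    Leaf: 30 37
    Leaf: 40 45

After: leaf [30] is too empty and is coalesced with a sibling leaf; non-leaf [30 40] is left with the single key 40 and underflows, so it is coalesced with its sibling [10 20] together with key 25 pulled down from the old root; the merged node becomes the new root

=> min # of keys in a non-leaf = ⌈(n+1)/2⌉ - 1 = 3 - 1 = 2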

B+tree deletions in practice

– Often, coalescing is not implemented
  – Too hard and not worth it!
  – later insertions may return the node back to its required minimum size
– Compromise: Try redistributing keys with a sibling; if not possible, leave it there
– if all accesses to the records go through the B-tree, can place a "tombstone" for the deleted record at the leaf
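One way to picture the tombstone idea is a leaf whose pointer slot is overwritten with a marker instead of the entry being removed. This is only a sketch of the general technique; the `TOMBSTONE` marker, the `Leaf` layout and both function names are assumptions, not something the slides specify.

```python
TOMBSTONE = object()  # hypothetical marker stored in place of a record pointer


class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys          # sorted search keys
        self.pointers = pointers  # pointers[i] belongs to keys[i]


def lazy_delete(leaf, key):
    """Mark the entry for key as deleted without restructuring the tree."""
    if key not in leaf.keys:
        return False
    i = leaf.keys.index(key)
    if leaf.pointers[i] is TOMBSTONE:
        return False
    leaf.pointers[i] = TOMBSTONE
    return True


def lookup(leaf, key):
    """Return the record pointer for key, or None if absent or deleted."""
    if key in leaf.keys:
        ptr = leaf.pointers[leaf.keys.index(key)]
        if ptr is not TOMBSTONE:
            return ptr
    return None


leaf = Leaf([40, 50], ["rec40", "rec50"])
lazy_delete(leaf, 50)
print(lookup(leaf, 40), lookup(leaf, 50), leaf.keys)
```

The leaf keeps its key count, so no underflow handling is triggered; the slot can be reused by a later insertion.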

Why Are B-trees Good?

• B-tree adapts well to insertions and deletions, maintaining balance
  – DBA does not need to care about reorganizing
  – split/merge operations are rather rare (How often would nodes with 200 keys be split?)

=> Access times are dominated by key lookup (i.e., traversal from root to a leaf)

Efficiency of B-trees

• For example, assume 4 KB blocks, 4 byte keys and 8 byte pointers
• How many keys and pointers fit in a node (= index block)?
  – Max n s.t. (4*n + 8*(n+1)) B ≤ 4096 B ?

=> n=340; 340 keys and 341 pointers fit in a node
=> 171 … 341 pointers in a non-leaf node
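The value n=340 follows from solving the inequality for n: 12*n + 8 ≤ 4096, so n ≤ 4088/12 ≈ 340.67. A short sketch that redoes this arithmetic; the function name and parameter names are illustrative.

```python
def max_keys_per_block(block_bytes, key_bytes, ptr_bytes):
    """Largest n with key_bytes*n + ptr_bytes*(n+1) <= block_bytes."""
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


n = max_keys_per_block(4096, 4, 8)
max_ptrs = n + 1
min_ptrs = (n + 2) // 2  # ceil((n+1)/2) pointers in a non-leaf node
print(n, max_ptrs, min_ptrs)
print(4 * n + 8 * (n + 1), 4 * (n + 1) + 8 * (n + 2))  # fits / does not fit
```

With n=340 the node uses 4088 of the 4096 bytes; n=341 would need 4100 bytes.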

Efficiency of B-trees (cont.)

• Assume an average node has 255 pointers
  => a three-level B-tree has 255^2 = 65025 leaves with a total of 255^3 or about 16.6 million pointers to records
  => if the root block is kept in main memory, each record can be accessed with 2+1 disk I/Os
  => if all 256 internal nodes are in main memory, record access requires 1+1 disk I/Os (256 x 4 KB = 1 MB; quite feasible!)
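The figures for the three-level tree can be reproduced with a few lines of arithmetic. The variable names are illustrative, and the 4 KB block size is carried over from the previous slide.

```python
fanout = 255                  # assumed average number of pointers per node
block_bytes = 4 * 1024        # 4 KB index blocks

leaves = fanout ** 2          # level-3 nodes below root and level 2
record_pointers = fanout ** 3
internal_nodes = 1 + fanout   # root plus its 255 children
memory_bytes = internal_nodes * block_bytes

print(leaves, record_pointers, internal_nodes, memory_bytes)
```

The 256 internal nodes are the root plus the 255 second-level nodes; caching them leaves one leaf read and one record read per access.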

Outline/summary

• B-trees
  • popular index structures with graceful growth properties
  • support range queries like “… WHERE 100 < Key < 200” (=> Exercises)
• Next: Hashing schemes
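A range query such as "100 < Key < 200" can be answered by locating the leaf for the lower bound and then following the sequence pointers. The sketch below is one possible approach under simplifying assumptions (keys only, `Node.next` as the sequence pointer, all names illustrative), using the n=3 example tree with root key 100.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf
        self.next = None          # sequence pointer, used by leaves only


def range_query(root, lo, hi):
    """Return all keys k with lo < k < hi in ascending order."""
    node = root
    while node.children is not None:       # descend to the leaf for lo
        node = node.children[bisect_right(node.keys, lo)]
    result = []
    while node is not None:                # walk the leaf chain
        for k in node.keys:
            if k >= hi:
                return result
            if k > lo:
                result.append(k)
        node = node.next
    return result


leaves = [Node(ks) for ks in ([3, 5, 11], [30, 35], [100, 101, 110],
                              [120, 130], [150, 156, 179], [180, 200])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Node([100], [Node([30], leaves[:2]),
                    Node([120, 150, 180], leaves[2:])])

print(range_query(root, 100, 200))
```

Only one root-to-leaf traversal is needed; the rest of the scan touches leaves alone.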