CS4432: Database Systems II
Lecture #10: B+ Tree Indexing
Professor Elke A. Rundensteiner


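Hierarchy of index structures

[Figure: a data file whose sequence field is not stored in order (blocks 50, 30 | 70, 20 | 40, 80 | 10, 100 | 60, 90); a first-level index (dense, if non-sequential) with entries 10, 20, 30, 40 | 50, 60, 70, ...; and a high-level index (always sparse) with entries 10, 50, 90, ...]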

Conventional indexes: pros/cons?

Advantage:
- Simple
- Index is a sequential file, which is good for scans
- Search is efficient for static data

Disadvantage:
- Inserts are expensive, and/or
- Sequentiality and balance are lost
- Search time then becomes unpredictable

Example Sequential Index

[Figure: a sequential file stored contiguously in blocks (10, 20, 30), (40, 50, 60), (70, 80, 90), followed by free space; later keys 39, 31, 35, 36, 32, 38, 34, 33 sit in an overflow area that is not sequential.]

Problems … Problems … Problems …

• Without reorganization we get unpredictable performance
• Reorganizing too much or too often brings too much overhead
• DBA does not know when to reorganize
• DBA does not know how full to load pages of new index

So Let’s Try Another Index . . .

• Give up “sequentiality” of index
• Predictable performance under updates
• Always keep the “tree” balanced
• Automate restructuring under updates

B+Tree Example (n=3)

[Figure: root with key 100; non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]

B+ Trees in Practice

• Typical order: 100. Typical fill-factor: 67%.
  – average fanout = 133
• Typical capacities:
  – Height 4: 133^4 = 312,900,721 records
  – Height 3: 133^3 = 2,352,637 records
• Can often hold top levels in buffer pool:
  – Level 1 = 1 page = 8 Kbytes
  – Level 2 = 133 pages ≈ 1 Mbyte
  – Level 3 = 17,689 pages ≈ 138 Mbytes
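The capacity figures can be checked with a few lines of arithmetic. This is a minimal sketch using the slide's average fanout of 133 and 8 Kbyte pages; the function names are made up for illustration. The exact value of 133^4 is 312,900,721, and 17,689 pages of 8 Kbytes come to roughly 138 Mbytes.

```python
FANOUT = 133            # average fanout from the slide (order 100, 67% fill)
PAGE_BYTES = 8 * 1024   # 8 Kbyte pages, as on the slide


def records_at_height(height, fanout=FANOUT):
    """Number of records reachable through a tree of the given height."""
    return fanout ** height


def pages_at_level(level, fanout=FANOUT):
    """Level 1 is the root (one page); each level below multiplies by the fanout."""
    return fanout ** (level - 1)


if __name__ == "__main__":
    for height in (3, 4):
        print("height", height, "->", records_at_height(height), "records")
    for level in (1, 2, 3):
        pages = pages_at_level(level)
        mbytes = pages * PAGE_BYTES / 2**20
        print("level", level, "->", pages, "pages,", round(mbytes, 1), "Mbytes")
```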

Sample non-leaf

[Figure: a non-leaf node with keys 57, 81, 95 and four pointers: to keys k < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, and to keys k ≥ 95.]
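Choosing which pointer to follow in a non-leaf node amounts to counting how many keys are less than or equal to the search key. A minimal sketch, assuming the node's keys are kept in a sorted Python list (the helper name `child_index` is hypothetical):

```python
from bisect import bisect_right


def child_index(keys, k):
    """Index of the pointer to follow for search key k.

    Pointer 0 covers k < keys[0]; pointer i covers keys[i-1] <= k < keys[i];
    the last pointer covers k >= keys[-1].
    """
    return bisect_right(keys, k)


if __name__ == "__main__":
    keys = [57, 81, 95]          # the sample non-leaf node
    for k in (10, 57, 80, 81, 95, 200):
        print(k, "-> pointer", child_index(keys, k))
```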

Sample leaf node:

[Figure: a leaf node with keys 57, 81, 95, reached from a non-leaf node. Each key has a pointer to the record with that key (57, 81, 95); the last pointer leads to the next leaf in sequence.]
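The pointer to the next leaf in sequence is what makes range scans cheap: once the first relevant leaf is found, the scan walks the leaf chain without going back up the tree. A minimal sketch, assuming a hypothetical `Leaf` class that holds only keys (a real leaf would also hold record pointers) and starting from a leaf that is already known:

```python
class Leaf:
    """A leaf holding sorted keys and a sequence pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start_leaf, lo, hi):
    """Collect all keys k with lo <= k <= hi by following sequence pointers."""
    found = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return found      # keys are sorted, so nothing further matches
            if k >= lo:
                found.append(k)
        leaf = leaf.next
    return found


if __name__ == "__main__":
    first = link([Leaf([3, 5, 11]), Leaf([30, 35]), Leaf([100, 101, 110])])
    print(range_scan(first, 5, 100))
```

In a full tree the scan would start at the leaf found by searching for `lo`, not at the leftmost leaf.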

In textbook’s notation (n=3)

[Figure: the same kinds of node drawn in the textbook's notation. Leaf: keys 30, 35. Non-leaf: key 30.]

Size of nodes (fixed):
  n+1 pointers
  n keys

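Full node vs. min. node (n=3)

[Figure: full non-leaf node (120, 150, 180) next to a minimum non-leaf node (30); full leaf (3, 5, 11) next to a minimum leaf (30, 35). Annotation on a pointer: "counts even if null".]

Minimum occupancy:
Non-leaf: ⌈(n+1)/2⌉ pointers
Leaf: ⌊(n+1)/2⌋ pointers to data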
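The minimum-occupancy rule is easy to evaluate for a given n. The sketch below assumes the usual textbook reading of the two (n+1)/2 expressions: rounded up for non-leaf nodes and rounded down for leaves. That rounding is an assumption here, as are the function names.

```python
def min_nonleaf_pointers(n):
    """ceil((n + 1) / 2), computed with integer arithmetic."""
    return (n + 2) // 2


def min_leaf_data_pointers(n):
    """floor((n + 1) / 2), computed with integer arithmetic."""
    return (n + 1) // 2


if __name__ == "__main__":
    for n in (3, 4, 100):
        print("n =", n,
              "non-leaf min pointers:", min_nonleaf_pointers(n),
              "leaf min data pointers:", min_leaf_data_pointers(n))
```

For n=3 both minimums are 2, which matches the minimum nodes in the figure: a non-leaf with one key (two pointers) and a leaf with two keys.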

B+tree rules (tree of order n)

(1) All leaves are at the same lowest level (balanced tree)

(2) Pointers in leaves point to records, except for the “sequence pointer”

B+Tree Example: Searches

[Figure: the n=3 example tree again. Root with key 100; non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]
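A search walks from the root to a leaf, picking one pointer per non-leaf node. The sketch below rebuilds the example tree with a hypothetical `Node` class (keys only, no record pointers) and looks up a few keys; it is an illustration under those assumptions, not the course's implementation.

```python
from bisect import bisect_right


class Node:
    """A B+ tree node; children is None for a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children


def search(node, k):
    """Return True if key k is stored in a leaf below node."""
    while node.children is not None:
        # Follow the pointer whose key range contains k.
        node = node.children[bisect_right(node.keys, k)]
    return k in node.keys


def example_tree():
    """The n=3 example tree from the slide."""
    leaves = [Node(ks) for ks in ([3, 5, 11], [30, 35], [100, 101, 110],
                                  [120, 130], [150, 156, 179], [180, 200])]
    left = Node([30], leaves[:2])
    right = Node([120, 150, 180], leaves[2:])
    return Node([100], [left, right])


if __name__ == "__main__":
    root = example_tree()
    for k in (30, 31, 101, 179, 200):
        print(k, search(root, k))
```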

Insert into B+tree

(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32 (n=3)

[Figure: part of a B+ tree with separator keys 30 and 100 above leaves (3, 5, 11) and (30, 31); key 32 fits into the free slot of leaf (30, 31).]

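(b) Insert key = 7 (n=3)

[Figure: the same part of the tree, separator keys 30 and 100 above leaves (3, 5, 11) and (30, 31). Leaf (3, 5, 11) is full, so it splits into (3, 5) and (7, 11), and key 7 is copied into the parent.]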

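(c) Insert key = 160 (n=3)

[Figure: key 100 at the upper level; full non-leaf node (120, 150, 180); leaves (150, 156, 179) and (180, 200). The leaf splits into (150, 156) and (160, 179); the non-leaf node overflows and splits, leaving 180 in a new node, and 160 moves to the upper level next to 100.]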

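(d) New root, insert 45 (n=3)

[Figure: root (10, 20, 30) with leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). The last leaf splits into (30, 32) and (40, 45); key 40 overflows the old root, which splits, and a new root holding 30 is created.]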

Recap: Insert Data into B+ Tree

• Find correct leaf L.
• Put data entry onto L.
  – If L has enough space, done!
  – Else, must split L (into L and a new node L2)
    • Redistribute entries evenly, copy up middle key.
    • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
  – To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height.
  – Tree growth: gets wider or one level taller at top.
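The contrast between "copy up" and "push up" can be sketched on plain key lists. This is a simplified illustration: child and record pointers are left out, the function names are hypothetical, and each function takes the sorted keys of a node that has just overflowed.

```python
def split_leaf(keys):
    """Split an overfull leaf; the separator is COPIED up.

    The separator stays in the right leaf, because every key
    must still be present at the leaf level.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


def split_index(keys):
    """Split an overfull index node; the middle key is PUSHED up.

    The middle key moves to the parent and appears in neither half.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]


if __name__ == "__main__":
    # Leaf overflow: inserting 7 into the full leaf (3, 5, 11).
    print(split_leaf(sorted([3, 5, 11] + [7])))
    # Non-leaf overflow: separator 160 arrives in the full node (120, 150, 180).
    print(split_index(sorted([120, 150, 180] + [160])))
```

With these choices of split point, the two calls reproduce the splits in the n=3 examples for inserting 7 and 160.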

Deletion from B+tree

(a) Simple case
(b) Leaf-node: Coalesce with neighbor (sibling)
(c) Leaf-node: Re-distribute keys
(d) Cases (b) or (c) at non-leaf

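(a) Delete key = 11 (n=3)

[Figure: part of a B+ tree with separator keys 30 and 100 above leaves (3, 5, 11) and (30, 31); key 11 is removed from leaf (3, 5, 11), which remains sufficiently full.]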

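(b) Coalesce with sibling: Delete 50 (n=4)

[Figure: non-leaf node (10, 40, 100) above leaves (10, 20, 30) and (40, 50). After 50 is deleted, the remaining key 40 moves into the left sibling, giving (10, 20, 30, 40).]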

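(c) Redistribute keys: Delete 50 (n=4)

[Figure: non-leaf node (10, 40, 100) above leaves (10, 20, 30, 35) and (40, 50). After 50 is deleted, key 35 moves from the left sibling into the underfull leaf, giving (10, 20, 30) and (35, 40), and the separator in the parent changes from 40 to 35.]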

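(d) Coalesce and non-leaf coalesce: Delete 37 (n=4)

[Figure: root (25); non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 leaves a leaf holding only 30, which is coalesced with a sibling; the non-leaf node then underflows and is coalesced with its sibling, pulling key 25 down and producing a new root.]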

Delete Data from B+ Tree

• Start at root, find leaf L where entry belongs.
• Remove the entry.
  – If L is at least half-full, done!
  – If L has only d-1 entries (d being the minimum number of entries per node),
    • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
    • If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
• Merge could propagate to root, decreasing height.
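The leaf-level decision in this recap (done, redistribute, or merge) can be sketched on plain key lists. The sketch assumes the sibling is the left neighbor under the same parent, takes the minimum number of keys per leaf as a parameter `min_keys`, and leaves the parent update to the caller; all names are hypothetical.

```python
def delete_from_leaf(left_sibling, leaf, key, min_keys):
    """Delete key from leaf and repair underflow using the left sibling.

    Returns (action, new_left_sibling, new_leaf). The inputs are not modified.
    After "redistributed" the parent separator must become new_leaf[0];
    after "merged" the parent entry for the emptied leaf must be deleted.
    """
    left = list(left_sibling)
    node = [k for k in leaf if k != key]
    if len(node) >= min_keys:
        return "done", left, node
    if len(left) > min_keys:
        # The sibling can spare an entry: borrow its largest key.
        node.insert(0, left.pop())
        return "redistributed", left, node
    # The sibling is at minimum occupancy: merge everything into it.
    return "merged", left + node, []


if __name__ == "__main__":
    print(delete_from_leaf([3, 5], [11, 30, 31], 11, min_keys=2))
    print(delete_from_leaf([10, 20, 30, 35], [40, 50], 50, min_keys=2))
    print(delete_from_leaf([25, 26], [30, 37], 37, min_keys=2))
```

This follows the recap's order of trying redistribution first. A policy that prefers merging whenever the combined entries fit in one node would behave differently on some inputs.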

Discussion of B-trees (vs. static indexed sequential files)

• Concurrency control is harder in B-trees
• B-tree consumes more space
• B-tree automatically decides:
  – when to reorganize
  – how full to load pages of new index

Comparison: B-tree vs. indexed seq. file

Indexed seq. file:
• Less space, so lookup faster
• Inserts managed by overflow area
• Requires temporary restructuring
• Unpredictable performance

B-tree:
• Consumes more space, so lookup slower
• Each insert/delete potentially restructures
• Built-in restructuring
• Predictable performance

• Speaking of buffering… Is LRU a good policy for B+tree buffers?

Of course not!

Should try to keep root in memory at all times (and perhaps some nodes from second level)

Should keep the “path” when going down to leaves (just in case of restructuring)


Interesting problem:

For B+tree, how large should n be?

n is number of keys / node

Assumptions: n children per node and N records in database

(1) Time to read a B-tree node from disk is (t_seek + t_read*n) msec.
(2) Once in main memory, use binary search to locate key: (a + b*log_2(n)) msec.
(3) Need to search (read) log_n(N) tree nodes.
(4) t_search = (t_seek + t_read*n + a + b*log_2(n)) * log_n(N) msec
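The cost model can be evaluated numerically to see how the best node size moves with the hardware parameters. The sketch below codes formula (4) directly and finds the minimizing n by brute force over a candidate range; the parameter values in the example are made up for illustration, not measurements.

```python
import math


def search_time(n, N, t_seek, t_read, a, b):
    """Formula (4): per-node cost times the number of nodes read."""
    per_node = t_seek + t_read * n + a + b * math.log2(n)
    return per_node * math.log(N, n)


def best_n(N, t_seek, t_read, a, b, candidates=range(2, 2001)):
    """Brute-force minimizer of search_time over the candidate node sizes."""
    return min(candidates,
               key=lambda n: search_time(n, N, t_seek, t_read, a, b))


if __name__ == "__main__":
    # Illustrative values in msec (assumptions, not real device numbers).
    slow_disk = best_n(N=10**6, t_seek=10.0, t_read=0.01, a=0.001, b=0.001)
    fast_disk = best_n(N=10**6, t_seek=1.0, t_read=0.01, a=0.001, b=0.001)
    print("n_opt with slow seek:", slow_disk)
    print("n_opt with fast seek:", fast_disk)
```

In this model the term b*log_2(n) * log_n(N) equals b*log_2(N) for every n, so the binary-search slope b does not affect where the minimum lies; the minimum is driven by the ratio (t_seek + a) / t_read.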

Can get: f(n) = time to find a record

[Figure: plot of f(n) against n, with its minimum at n_opt.]

Find n_opt by setting f’(n) = 0

What happens to n_opt as:
• Disk gets faster?
• CPU gets faster? …

Bulk Loading of B+ Tree

• For large collection of records, create B+ tree.
• Method 1: Repeatedly insert records: slow.
• Method 2: Bulk Loading: more efficient.

Bulk Loading of B+ Tree

• Initialization:
  – Sort all data entries
  – Insert pointer to first (leaf) page in new (root) page.

[Figure: an empty root pointing to the first of the sorted pages of data entries, which are not yet in the B+ tree: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*]

Bulk Loading (Contd.)

• Index entries for leaf pages always entered into right-most index page
• When this fills up, it splits.

Split may go up right-most path to root.

[Figure, first stage: root (10, 20) above index pages (6), (12), (23), (35); the data entry pages to the right are not yet in the B+ tree.]

[Figure, second stage: root (20) above index pages (10) and (35), which sit above index pages (6), (12), (23), (38); the remaining data entry pages are not yet in the B+ tree.]

Data entry pages in both stages: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*
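Bulk loading can be sketched in a few lines if the tree is built level by level from the sorted entries. Note that this is a simplified variant, not the right-most-path insertion described on the slide: it packs full leaf pages, then groups nodes under parents until one root remains. Nodes are plain dicts, the names are hypothetical, the input must be non-empty, and the last node of a level may end up under-filled.

```python
def chunks(items, size):
    """Cut a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def min_key(node):
    """Smallest key stored in the subtree rooted at node."""
    while "children" in node:
        node = node["children"][0]
    return node["keys"][0]


def bulk_load(entries, leaf_capacity, fanout):
    """Build a tree bottom-up from the entries; returns the root node."""
    # Step 1: sort the entries and pack them into leaf pages.
    level = [{"keys": page} for page in chunks(sorted(entries), leaf_capacity)]
    # Step 2: group nodes under parents until a single root remains.
    while len(level) > 1:
        level = [{"keys": [min_key(child) for child in group[1:]],
                  "children": group}
                 for group in chunks(level, fanout)]
    return level[0]


if __name__ == "__main__":
    entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
    root = bulk_load(entries, leaf_capacity=2, fanout=3)
    print(root["keys"])
    print([child["keys"] for child in root["children"]])
```

Because the leaves are created in sorted order, they can be written out sequentially, which is the storage advantage the lecture attributes to bulk loading.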

Summary of Bulk Loading

• Method 1: multiple inserts.
  – Slow.
  – Does not give sequential storage of leaves.
• Method 2: Bulk Loading
  – Has advantages for concurrency control.
  – Fewer I/Os during build.
  – Leaves will be stored sequentially (and linked)
  – Can control “fill factor” on pages.

Summary

B+ tree idea: self-balancing index structure that supports search, insert, and delete in O(log_n N) time, for N records and n keys per node.

B+ tree is versatile: handles equality and range searches

B+ tree and its variants: common index structure in industrial DBMSs
