Database design and Implementation 05.btree

8/8/2019 Database design and Implementation 05.btree

1/151

Database Systems Implementation, Bongki Moon 1

Introduction to IndexingB+-Tree Indexes

Ramakrishnan & Gehrke: Chap. 8.2, 10.3-10.8


Overview

Index classification Primay/Secondary, Clustered/Non-clustered

Sparse/Dense, Single-attribute/Composite

B+-Tree

Structure, Search etc.

Insert and Delete algorithms

Duplicate keys, variable-length keys, bulk loading


2/15

2


Basics

To speed up selections on the search key fields.

Any subset of the fields of a relation can be thesearch key for an index on the relation.

Typically stored as a separate file (or relation).

Typically much smaller than base relations.


Index Classification

Primary vs. secondary: If search key containsprimary key, then called primary index.

Primary index value is unique (e.g., SSN). Secondary index allows duplicate key values (e.g.

Names, GPA).

Unique index: Search key is or contains a candidatekey.


3/15

3



Clustered vs. Non-clustered: If order of data records is the same as, or`close to, order of keys, then called clustered index.

How many clustered and non-clustered indexes can a relation have?

Index entries

Data entries

direct search for

(Index File)

(Data file)

Data Records

data entries

Data entries

Data Records

CLUSTERED UNCLUSTERED



Dense vs. Sparse: Ifthere is 1-to-1 mappingbetween key values (in

index) and data records,then the index is dense. Can a clustered index be

dense or sparse?

Can a non-clusteredindex be dense or sparse?

Ashby, 25, 3000

Smith, 44, 3000

Ashby

Cass

Smith

22

25

30

40

44

44

50

Sparse Indexon

Name Data File

Dense Indexon

Age

33

Bristow, 30, 2007

Basu, 33, 4003

Cass, 50, 5004

Tracy, 44, 5004

Daniels, 22, 6003

Jones, 40, 6003


4/15

4


Benefits and Costs

Cost of retrieving data records through index variesgreatly based ontypes of indexes. (E.g.) For an exact-match query with a unique/non-unique, clustered/non-

clustered, dense/sparse index, how many pages need to be retrieved from theindex and the base relation?

(E.g.) How about a range query?

Some may be more costly than others To build or maintain a clustered index, the records in the base relation must

be sorted and be in sorted order.

For dynamic data, it will be costly to maintain the sorted order.

To reduce the cost of dynamic insertions, Keep some free space on each page of the base relation for future insertions.

Use overflow pages (with links) for more future insertions. (Thus, order of datarecords is `close to, but not identical to, the sort order.)

Ex) Cost of indexed scan: clustered vs. non-cluster. Which is better?


Composite Search Keys

Search on a combination ofcolumns. Equality query:

age=20 and sal =75

Range query: age =20; or age=20 and sal > 10

Keys in index should be in sortedorder by search key to supportrange queries. But, how for multiple columns? Certain queries may benefit from a

particular order. (E.g.) age-then-sal, sal-then-age

Asymmetric vs. Symmetric

sue 13 75

bob

cal

joe 12

10

20

8011

12

name age sal

12,20

12,10

11,80

13,75

20,12

10,12

75,13

80,11

11

12

12

13

10

20

75

80

Data recordssorted by name

Data entries in indexsorted by

Data entriessorted by

Examples of composite keyindexes using lexicographic order.


5/15

5


B+-Tree: Introduction

Most widely used.

Support both range and equality searchesefficiently.

Dynamic index Adjusts gracefully under insertions and deletions.

Keeps tree height-balanced. The cost of exact-match/insertion/deletion is log FN.

F is the fanout, and N is the # leaf nodes.


B+-Tree: Structural Characteristics

Root node has at least two children.

Minimum 50% occupancy (except for root). For a B+- treeof order d,

An internal node hasd/2

m

d children. A leaf node has d/2 m d-1 pointers to data pages.

Index Entries

Data Entries

("Sequence set")

(Direct search)<


6/15

6


B+-Tree: Node Structures

Internal nodes

N : the number of valid entries.

Ki : key values (K1 < K2 < < Kd-1).

Pi : tree pointers to internal or leaf nodes.

Leaf nodes

Ppred, Pnext : pointers to neighboring leaf nodes.

Pi : data pointers to a record or a block.

N P1 K1 P2 K2 Pd-1 Kd-1 Pd...

P2K2P1K1PpredN ... PnextPd-1Kd-1


Example B+-Tree (of order 5)

Search begins at root, and key comparisonsdirect it to a leaf.

(E.g.) Search for 5, 15, or all data entries > 24 ...Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13


7/15

7


Inserting a Key into a B+-Tree

Find a correct leaf node L.

Put a key entry into L. If L has enough space, done!

Else, must split L (into L and a new node L2) Redistribute entries evenly, copy up the middle key.

Insert a new index entry pointing to L2 into the parent of L.

Node splitting can happen recursively To split a non-leaf node, redistribute entries evenly, but

push up the middle key. (Contrast with leaf splits.)

Splits grow tree wider or taller Splitting the root node makes the tree one level taller.


Inserting 8* into Example B+-Tree

2* 3* 5* 7* 8*

5

Note that 5 is copied up and

continues to appear in the leaf.

5 24 30

17

13

Note that 17 is pushed up andappears once in the index.

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13


8/15

8


Example B+-Tree After Inserting 8*

Notice that root was split, leading to increase in height.

In this example, we can avoid split by re-distributingentries; however, this is usually not done in practice.

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*


Deleting a Key from a B+-Tree

Start at the root, find a leaf L where the entry belongs.

Remove the entry. If L is at least half-full, done!

If L has less thand/2

entries (pointers),

Try to re-distribute, borrowing from sibling (adjacent node withsame parent as L).

If re-distribution fails, merge L and a sibling.

If merge occurred, must delete an entry (pointing to L orsibling) from the parent of L.

Merge could propagate to the root, decreasing the height.


9/15

9


Example: Deleting 19* and 20*

Deleting 20* is done with re-distribution (by moving key 24). The key 24 is removed from the parent. The new middle key 27 is copied up.

2* 3*

Root

17

30

14*16* 33*34*38*39*

135

7*5* 8* 22*24*

27

27*29*

2* 3*

Root17

24 30

14*16* 19*20*22* 24* 27*29* 33*34*38*39*

135

7*5* 8*


... And Then Deleting 24*

Two leaf nodes merge : Key 27 is removed from the parent.

2* 3*

Root

17

30

14*16* 33*34*38*39*

135

7*5* 8* 22*24*

27

27*29*

2* 3*

Root

17

14*16*

135

7*5* 8* 22* 27*

30

33* 34*29* 38* 39*


10/15

10


Example B+-Tree After Deleting 24*

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

Root30135 17

2* 3*

Root

17

14*16*

135

7*5* 8* 22* 27*

30

33* 34*29* 38* 39*

Two internal nodes merge : Key 17 ispulled down from the root.


Example: Deleting 24*

Root

135 17 20

22

30

14*16* 17*18* 20* 33*34*38*39*22*27*29*21*7*5* 8*3*2*

A different scenario: Deleting 24* causes two leaf nodes merged. Then, this causes an underflow in the internal node.

Root

135 17 20

22

27

14*16* 17*18* 20* 27*29*22*24*21*7*5* 8*3*2* 34*38*39*

30

33*


11/15

11


Example of Non-leaf Re-distributionRoot

135 17 20

22

30

14*16* 17*18* 20* 33*34*38*39*22*27*29*21*7*5* 8*3*2*

14*16* 33*34*38*39*22*27*29*17*18* 20*21*7*5* 8*2* 3*

Root

135

17

3020 22

Entries are re-distributed by pushing through the entries in the parent node. Pull down 22 from the root, and push up 20 to the root. We can stop here. Its ok to do once more. Pull down 20 from the root, and push up 17 to the root.


Summary of B+-tree Operations

Insert Split a leaf node; copy up the middle key.

Split an internal node;push up the middle key.

Delete Merge leaf nodes; remove the middle key from the parent. Merge internal nodes;pull down the middle key from the parent.

Redistribute leaf nodes; remove the middle key from the parent, andcopy up a new middle key.

Redistribute internal nodes;pull down the middle key from theparent to one child node, andpush up a new middle key from theother child node.


12/15

12


Other Issues of B+-tree

1. Duplicate Key Values

2. Prefix Key Compression

Variable-length Keys such as strings

3. Bulk-Loading

4. Choosing an Optimal Node Size


Duplicate Key Values

Three alternatives for Leaf Page Layout:1. Multiple pairs

2. One-key and multiple-pointers

variable-length records, variable fanout.3. Another level of indirection

additional overhead for dereferencing and disk accesses.

Page Overflows: several leaf pages may contain entrieswith the same key value, if the 1st or 2nd layout option is selected.

The 3rd layout option may be a better choice for this.

How about AM Layer of MiniRel project?


13/15

13


Variable-Length Keys

Longer keys may reduce the fan-out and grow the indextaller. (The taller the tree, the more disk accesses.)

Prefix-key compression is done to increase fan-out.

Key values in index entries only `direct traffic; can oftencompress them. (E.g.) If we have adjacent index entries with search key values

Dannon Yogurt, David Smith and Devarakonda Murthy, we canabbreviate them to Dan, Dav and Dev. What if there is a data entry Davey Jones? (Then, we can only

compress David Smith to Davi)

Insert/delete must be suitably modified.

What if the keys (or strings) are longer than a page?


Bulk Loading of a B+-Tree

For a large collection of records, building a B+-tree byrepeatedly inserting records is very slow. Does not give sequential storage of leaves.

Bulk-Loading:

Start with an empty root node and sorted entries Insert a leaf node at a time into the right-most slot of the root or

parent node.

3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35* 36* 38* 41* 44*

Sorted pages of data entries; not yet in B+ treeRoot


14/15

14


B+-Trees in Practice

Typical order: 200. Typical fill-factor: 67%. Block: 8KB, key: 36 Bytes, Pointer: 4 Bytes.

average fanout = 133

Typical capacities: Height 3: 1333 = 2,352,637 records

Height 4: 1334 = 312,900,700 records

Can often hold top levels in buffer pool: Level 1 = 1 page = 8 Kbytes

Level 2 = 133 pages = 1 Mbytes

Level 3 = 17,689 pages = 133 MBytes


Optimal Size of B-tree Nodes

Problem Small pages make B-tree taller and inefficient.

Large pages increase transfer time.

Idea: Find a Page Size that maximizes the benefit-to-cost

ratio. Benefit = log2F

Because F(fanout) Page size; Height 1/log2F

AccessCost = disk_latency + PageSize/TransferRate (ignoringcache effects).

From Table 6 in Gray&Graefe 5 Minute Rule paper, 8/16/32 Kbytes are near optimal, when one entry is 20 bytes

long.


15/15


Summary

B+-tree is a dynamic structure. Inserts/deletes leave tree height-balanced; log F N cost.

Adjusts to growth gracefully.

High fanout (F) means depth rarely more than 3 or 4.

Typically, 67% occupancy on average.

We will discuss B-tree locking soon.

Most widely used index in database management systemsbecause of its versatility. One of the most optimized

components of a DBMS. Discussions

B-Tree vs. B+-Tree

BST vs. B+-Tree

Documents

Database design and Implementation 05.btree