70
Modern Database Systems Lecture 2 Aristides Gionis Michael Mathioudakis Spring 2017

Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

Modern Database SystemsLecture 2

Aristides GionisMichael Mathioudakis

Spring 2017

Page 2: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

in this lecture...

b+ trees and hash-based indexingexternal sorting

2

Page 3: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

b+ trees

Page 4: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

b+ trees

4

leaf nodescontain data-entries

sequentially linkedeach node stored in one page

data entries can be any one of the three alternative typestype 1: data records; type 2: (k, rid); type 3: (k, rids)

at least 50% capacity - except for root!

non-leaf nodesindex entries

used to direct search

in the examples that follow...alternative 2 is used

all nodes have between d and 2d key entriesd is the order of the tree

Page 5: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

b+ trees

5

non-leaf nodesindex entries

used to direct search

P0 K 1 P 1 K 2 P 2 K m P m

k* < K1 K1 ≤ k* < K2Km ≤ k*

leaf nodescontain data-entries

sequentially linked

closer look at non-leaf nodes

search key values pointers

Page 6: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

b+ trees

6

most widely used index

search and updates at logFN cost (cost = pages I/O)F = fanout (num of pointers per index node); N = num of leaf pages

efficient equality and range queries

non-leaf nodesindex entries

used to direct search

leaf nodescontain data-entries

sequentially linked

Page 7: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree - search

search begins at root, and key comparisons direct it to a leafsearch for 5*; search for all data entries >= 24*

root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

7

Page 8: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

inserting a data entry

1. find correct leaf L2. place data entry onto L

a. if L has enough space, done!b. else must split L into L and L2

• redistribute entries evenly• copy up the middle key to parent of L• insert entry pointing to L2 to parent of L

8

the above happens recursivelywhen index nodes are split, push up middle key

splits grow the treeroot split increases height

Page 9: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

insert 8*

root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

9

middle key is copied up (andcontinues to appear in the leaf)

5≥ 5< 5

5 24 30

17

13

middle key is pushed up

≥ 17< 17split parent node!

L

2* 3* 5* 7* 8*

L L2

Page 10: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

insert 8*

root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

10

2* 3*

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Page 11: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

deleting a data entry

11

inverse of insertion

re-distribute entries &(maybe) merge nodes

vs split nodes &re-distribute entries

when nodesare less than half-full

when nodes overflow

remove data entry add data entryvs

deletion insertion

Page 12: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

b+ trees in practicetypical order d = 100, fill-factor = 67%

average fan-out 133

typical capacities:for height 4: 1334 = 312,900,700 records

for height 3: 1333 = 2,352,637 records

can often hold top levels in main memorylevel 1: 1 page = 8KBytes

level 2: 133 pages = 1MBytelevel 3: 17,689 pages = 133 MBytes

12

Page 13: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

hash-based indexes

Page 14: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

hash-based index

the index supports equality queriesdoes not support range queriesstatic and dynamic variants exist 14

data entries organized in M bucketsbucket = a collection of pages

the data entry for recordwith search key value key

is assigned to bucket given by hashing function

h(key) mod M

e.g., h(key) = α key + β

h(key) mod M

keyh

0

1

2

M-1

...

buckets

Page 15: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

static hashing

number of buckets is fixed

start with one page per bucket

allocated sequentially, never de-allocatedcan use overflow pages

15

h(key) mod M

keyh

0

1

2

M-1

...

buckets

Page 16: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

static hashing

drawbacklong overflow chains can degrade performance

dynamic hashing techniquesadapt index to data sizeextendible and linear hashing

16

Page 17: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

extendible hashingproblem: bucket becomes full

one solutiondouble the number of buckets......and redistribute data entries

however…reading and re-writing all buckets is expensive

better idea:use directory of pointers to buckets

double number of ‘logical’ buckets…but split ‘physically’ only the overflown bucket

directory much smaller than data entry pages - good!no overflow pages - good!

17

Page 18: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example

2

2

2

2

local depth

2

global depth

directory

00011011

data entries

bucket A

bucket B

bucket C

bucket D

4* 12* 32* 16*

1* 5* 21* 13*

10*

15* 7* 19*

directory is array of size M = 4 = 22

to find bucket for r, take last 2 # bits of keyh(r) = key mod 22

e.g., if h(r) = 5 = binary 101it is in bucket pointed to by 01

global depth = 2 = min. bits enough to enumerate buckets

local depth= min bits to identify individual bucket

= 2

18

Page 19: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

insertion

2

2

2

2

local depth

2

global depth

directory

00011011

data entries

bucket A

bucket B

bucket C

bucket D

4* 12* 32* 16*

1* 5* 21* 13*

10*

15* 7* 19*

try to insert entry tocorresponding bucket

if necessary, double the directoryi.e., when for split bucketlocal depth > global depth

19

when directory doubles,increase global depth +1

if bucket is full,increase +1 local depth

and split bucket(allocate new bucket, re-distribute)

Page 20: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

exampleinsert record with h(r) = 20 = binary 10100 è bucket A

split bucket A !allocate new page,

redistribute according to modulo 2M = 8 = 23

3 least significant bits

we’ll have more than 4 buckets now,

so double the directory!

00011011

2

2

2

2

2

local depth

global depth

directory

bucket A

bucket B

bucket C

bucket D

data entries

4* 12* 32* 16*

1* 5* 21* 13*

10*

15* 7* 19*

20

Page 21: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

global depth

2

example

3

2

2

2

2

2

local depth 3

2

2

2

3

directory

001

4* 12* 20* bucket A2split image of A

000

010011

bucket A

bucket B

bucket C

bucket D

32* 16*

1* 5* 21* 13*

10*

15* 7* 19*

100101110111

21

split bucket A andredistribute entries

update local depthdouble the directoryupdate global depth

update pointers

Page 22: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

notes

20 = binary 10100last 2 bits (00) tell us r belongs in A or A2

last 3 bits needed to tell which

global depth of directorynumber of bits enough to determine which bucket any entry belongs to

local depth of a bucketnumber of bits enough to determine if an entry belongs to this bucket

when does bucket split cause directory doubling?before insert, local depth of bucket = global depth

22

Page 23: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example

34* 12* 20* bucket A2

000001010011

3

3

2

2

2

local depth

global depth

directory

bucket A

bucket B

bucket C

bucket D

32* 16*

1* 5* 21* 13*

10*

15* 7* 19*

100101110111

insert h(r) = 17

split bucket B

000001010011

3 3

global depth

directory

bucket B1* 17*

100101110111

3bucket B25* 21* 13*

23

other buckets not shown

Page 24: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

comments on extendible hashingif directory fits in memory,

equality query answered in one disk accessanswered = retrieve rid

directory grows in spurtsif hash values are skewed, it might grow large

delete: reverse algorithmempty bucket can be merged with its ‘split image’

when can the directory be halved?when all directory elements point to same bucket as their ‘split image’

24

Page 25: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

indexes in SQL

Page 26: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

create index

CREATE INDEX indexbON students (age, grade)USING BTREE;

CREATE INDEX indexhON students (age, grade)USING HASH;

DROP INDEX indexhON student;

26

Page 27: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

27

external sorting

Page 28: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

the sorting problemsetting

a relation R, stored over N disk pages3≤B<N pages available in memory (buffer pages)

tasksort records of R and store result on disk

sort by a function of record field values f(r)

whyapplication need records ordered

part of join implementation (in upcoming lecture...)28

Page 29: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

sorting with 3 buffer pages

2 phases

29

1

2

N

inputrelation R

N p

ages

sto

red

on d

isk

outputsorted R

f(r)

N pages stored on disk

buffer (memory used by dbms)

Page 30: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

sorting with 3 buffer pages - first phase

30

1

2

N

inputrelation R

N p

ages

sto

red

on d

isk

pass 0: output N runsrun: sorted sub-file

after first phase: one run is one pagehow: load one page at a time,

sort it in-memory, output to disk

run #1

run #2

run #N

only 1 buffer page needed for first phase

outputN runs

Page 31: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

run #1

...

run #N/2

sorting with 3 buffer pages - second phase

31

inputrelation R

N p

ages

sto

red

on d

isk

run #1

run #2

run #(N-1)

run #N

pass 1,2,...: halve the runshow: scan pairs of runs, each in own page,

merge in-memory into a new run,output to disk

input page 1

input page 2

output page

N pages stored on disk

outputhalf runs

Page 32: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

32

input page 1

input page 2

output page8 4 3 1

7 6 5 2

merge the two sorted input pagesinto the output page

maintaining sorted order

compare the next smallest value from each pagemove smallest to output page

valuesf(r)

Page 33: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

33

input page 1

input page 2

output page8 4 3

7 6 5 21

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

Page 34: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

34

input page 1

input page 2

output page8 4 3

7 6 52 1

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

Page 35: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

35

input page 1

input page 2

output page8 4

7 6 53 2 1

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

Page 36: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

36

input page 1

input page 2

output page8

7 6 54 3 2 1

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

output page is full,what do we do?

write it to disk!

Page 37: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

37

input page 1

input page 2

output page8

7 6 5

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

Page 38: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

38

input page 1

input page 2

output page8

7 65

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

Page 39: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

39

input page 1

input page 2

output page8

76 5

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

Page 40: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

40

input page 1

input page 2

output page87 6 5

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

input page is empty!what do we do?

if the input run has more pages, load

next one

Page 41: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

41

input page 1

input page 2

output page

8 7 6 5

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

Page 42: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

merge?

42

input page 1

input page 2

output page

merge the two input pages into the output pagemaintaining sorted order

compare the next smallest value from each pagemove smallest to output page

Page 43: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

sorting with 3 buffer pages - second phase

43

inputrelation R

N p

ages

sto

red

on d

isk N

pages stored on disk

run #1

run #2

run #(N-1)

run #N

pass 1,2,...: halve the runsafter log2N passes...

we are done!

input page 1

input page 2

output page

outputsorted R

f(r)

Page 44: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

sorting with B buffer pages

44

1

2

N

inputrelation R

N p

ages

sto

red

on d

isk

outputsorted R

f(r)

N pages stored on disk

page #B

page #1

page #2

page #(B-1)

same approach

...

Page 45: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

sorting with B buffer pages - first phase

45

1

2

N

inputrelation R

N p

ages

sto

red

on d

isk

outputsorted R

N pages stored on disk

...

pass 0: output éN/Bù runshow: load R to memory in chunks

of B pages, sort in-memory,output to disk

run #1

run #2

run é#N/Bù

page #B

page #1

page #2

page #(B-1)

Page 46: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

sorting with B buffer pages - second phase

46

inputrelation R

N p

ages

sto

red

on d

isk

outputsorted R

f(r)

N pages stored on disk

output page

pass 1,2,...:merge runs in groups of B-1

...

input page #1

input page #2

input page #(B-1)

run #1

run #2

run #éN/Bù

Page 47: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

sorting with B buffer pages

47

1

2

N

inputrelation R

N p

ages

sto

red

on d

isk

outputsorted R

f(r)

N pages stored on disk

output page

how many passes in total?let N1 = éN/Bù

total number of passes = 1 + élogB-1(N1)ù

...

input page #1

input page #2

input page #(B-1)

phase 1 phase 2

Page 48: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

sorting with B buffer

48

1

2

N

inputrelation R

N p

ages

sto

red

on d

isk

outputsorted R

f(r)

N pages stored on disk

output page

how many pages I/O per pass?

...

input page #1

input page #2

input page #(B-1)

2N: N input, N output

Page 49: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

summary

49

Page 50: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

summary• commonly used indexes• B+ tree

• most commonly used• supports efficient equation and range queries

• hash-based indexes• extendible hashing uses directory, not overflow pages

• external sorting

50

Page 51: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

references● “cowbook”, database management systems, by ramakrishnan and gehrke● “elmasri”, fundamentals of database systems, elmasri and navathe● other database textbooks

51

creditssome slides based on material fromdatabase management systems, by ramakrishnan and gehrke

Page 52: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

backup slides

52

Page 53: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

linear hashing

53

Page 54: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

linear hashing

dynamic hashinguses overflow pages; no directory

splits a bucket in round-robin fashionwhen an overflow occurs

M = 2level: number of buckets at beginning of roundpointer next ∈ [0, M) points at next bucket to split

already next - 1 ‘split-image’ buckets appended to original M

54

Page 55: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

linear hashingto allocate entries, use

H0(key) = h(key) mod M, orH1(key) = h(key) mod 2M

i.e., level or level+1 least significant bits of h(key)

to allocate bucket for keyfirst use H0(key)

if H0(key) is less than nextthen it refers to a split bucket

use H1(key) to determine if it refers to original or its split image

55

Page 56: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

linear hashingin the middle of a round...

M buckets that existed at thebeginning of this round;this is the range of H0

bucket to be split nextbuckets already split in this round

split image bucketscreated through splitting of other buckets in this round

if H0(key) is in this range, then must use H1(key) to decide if entry is in split image bucket

> is a directory necessary? 56

Page 57: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

linear hashing insertsinsert

find bucket by applying H0 / H1 andinsert if there is space

if bucket to insert into is full:add overflow page, insert data entry,split next bucket and increment next

since buckets are split round-robin,long overflow chains don’t develop!

57

Page 58: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example - insert h(r) = 43on split, H1 is used to redistribute entries

H0

this is for illustration only!

M=4

00

01

10

11

000

001

010

011 actual contents of the

linear hashing file

next=0PRIMARYPAGES

44*

36*

32*

25*

9* 5*

14*

18*10*

30*

31*

35*

11*

7*

00

01

10

11

000

001

010

011

next=1

PRIMARYPAGES

44*

36*

32*

25*

9* 5*

14*

18*10*

30*

31*

35*

11*

7*

OVERFLOWPAGES

43*

00100

H1 H0H1

58

Page 59: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

b+ tree - deletion

59

Page 60: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

deleting a data entry

1. start at root, find leaf L of entry 2. remove the entry, if it exists

○ if L is at least half-full, done!○ else

■ try to re-distribute, borrowing from sibling● adjacent node with same parent as L

■ if that fails, merge L into siblingo if merge occured,

must delete L from parent of L

60

merge could propagate to root

Page 61: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 19*2* 3*

root17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

61

2* 3*

17

24 30

14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Page 62: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 20*2* 3*

root17

24 30

14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

62

2* 3*

17

24 30

14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

occupancy below 50%, redistribute!

Page 63: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 20*2* 3*

root17

24 30

14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

63

2* 3*

17

27 30

14* 16* 22* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

occupancy below 50%, redistribute!

24*

middle key is copied up!

Page 64: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 24*2* 3*

root17

27 30

14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

64occupancy below 50%, merge!

2* 3*

17

27 30

14* 16* 22* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Page 65: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 24*2* 3*

root17

27 30

14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

65occupancy below 50%, merge!

2* 3*

17

27 30

14* 16* 22* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

delete from parent!reverse of copying up

Page 66: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 24*2* 3*

root17

27 30

14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

66

2* 3*

17

30

14* 16* 22* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

delete from parent!reverse of copying up

Page 67: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 24*2* 3*

root17

27 30

14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

67

2* 3*

17

30

14* 16* 22* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

delete from parent!reverse of copying up

Page 68: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 24*2* 3*

root17

27 30

14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

68

2* 3*

17

30

14* 16* 22* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

merge children of root!reverse of pushing up

Page 69: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 24*2* 3*

root17

27 30

14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

69

2* 3*

17 30

14* 16* 22* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

merge children of root!reverse of pushing up

Page 70: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve

example b+ tree

delete 24*2* 3*

root17

27 30

14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

70

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

30135 17