CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #8: Storing data and Indexes

Source: courses.cs.vt.edu/~cs4604/Fall18/lectures/lecture-8.pdf (Prakash 2018, VT CS 4604)



Announcements

§ Extra office hours till midterm: check the Piazza post.

STORING DATA

DBMS Layers:

[Figure: the DBMS as a stack of layers. Queries enter at the top and the DB sits at the bottom; a "TODAY" arrow marks the part of the stack covered in this lecture.]

§ Query Optimization and Execution
§ Relational Operators
§ Files and Access Methods
§ Buffer Management
§ Disk Space Management

Leverage OS for disk/file management?

§ Layers of abstraction are good… but:
– Unfortunately, the OS often gets in the way of the DBMS.
§ The DBMS wants/needs to do things "its own way":
– Specialized prefetching
– Control over buffer replacement policy
• LRU is not always best (sometimes worst!!)
– Control over thread/process scheduling
• "Convoy problem": arises when OS scheduling conflicts with DBMS locking
– Control over flushing data to disk
• The WAL protocol requires flushing log entries to disk
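The remark that LRU is "sometimes worst" can be made concrete with a repeated sequential scan over a file slightly larger than the buffer pool. The sketch below is illustrative only: the function name `count_misses`, the pool size, and the MRU policy used for contrast are assumptions, not something the slides prescribe.

```python
from collections import OrderedDict

def count_misses(accesses, capacity, evict_most_recent=False):
    """Simulate a buffer pool of `capacity` frames and count page misses."""
    pool = OrderedDict()  # page id -> None, ordered from least to most recently used
    misses = 0
    for page in accesses:
        if page in pool:
            pool.move_to_end(page)  # hit: the page becomes the most recently used
            continue
        misses += 1
        if len(pool) >= capacity:
            # LRU evicts the oldest entry; the (assumed) MRU variant evicts the newest.
            pool.popitem(last=evict_most_recent)
        pool[page] = None
    return misses

scan = list(range(5)) * 3  # scan a 5-page file three times
lru_misses = count_misses(scan, capacity=4)
mru_misses = count_misses(scan, capacity=4, evict_most_recent=True)
print(lru_misses, mru_misses)
```

With 4 frames and a 5-page file, LRU always evicts exactly the page that will be needed next, so every access misses. This is the kind of access pattern a DBMS knows about in advance and the OS does not.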

Disks and Files

§ DBMS stores information on disks.
– but: disks are (relatively) VERY slow!
§ Major implications for DBMS design:
– READ: disk -> main memory (RAM).
– WRITE: reverse.
– Both are high-cost operations relative to in-memory operations, so they must be planned carefully!

Why Not Store It All in Main Memory?

§ Costs too much.
– disk: ~$1/GB; memory: ~$100/GB
– High-end databases today are in the 10-100 TB range.
– Approx. 60% of the cost of a production system is in the disks.
§ Main memory is volatile.
§ Note: some specialized systems do store the entire database in main memory.

The Storage Hierarchy

– Main memory (RAM) for currently used data.
– Disk for the main database (secondary storage).
– Tapes for archiving older versions of the data (tertiary storage).

[Figure: the hierarchy from smaller, faster (top) to bigger, slower (bottom): Registers, L1 Cache, ..., Main Memory, Magnetic Disk, Magnetic Tape.]

Jim Gray's Storage Latency Analogy: How Far Away is the Data?

Level: relative latency, analogy (travel time)
§ Registers: 1, My Head (1 min)
§ On Chip Cache: 2, This Room
§ On Board Cache: 10, This Building (10 min)
§ Memory: 100, Boston (1.5 hr)
§ Disk: 10^6, Pluto (2 Years)
§ Tape: 10^9, Andromeda (2,000 Years)

Disks

§ Secondary storage device of choice.
§ Main advantage over tapes: random access vs. sequential.
§ Data is stored and retrieved in units called disk blocks or pages.
§ Unlike RAM, time to retrieve a disk page varies depending upon location on disk.
– relative placement of pages on disk is important!

Anatomy of a Disk

• Sector
• Track
• Cylinder
• Platter
• Block size = multiple of sector size (which is fixed)

[Figure: disk anatomy, labeling the platters, spindle, tracks, sector, disk head, arm assembly, and arm movement.]

Accessing a Disk Page

§ Time to access (read/write) a disk block:
– seek time: moving arms to position disk head on track
– rotational delay: waiting for block to rotate under head
– transfer time: actually moving data to/from disk surface
§ Relative times?
– seek time: about 1 to 20 msec
– rotational delay: 0 to 10 msec
– transfer time: < 1 msec per 4 KB page

[Figure: timeline of one access, split into Seek, Rotate, and Transfer.]

Seek time & rotational delay dominate

§ Key to lower I/O cost: reduce seek/rotation delays!
§ Also note: for shared disks, much time is spent waiting in queue for access to the arm/controller.
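The three cost components can be added up to compare random and sequential reads. This is a rough sketch: the default values (10 ms seek, 5 ms rotational delay, 0.5 ms transfer per page) are assumed mid-range picks from the ranges quoted in the slides, and the function names are hypothetical.

```python
def access_time_ms(seek_ms, rotation_ms, transfer_ms):
    """Time to read one page: seek + rotational delay + transfer."""
    return seek_ms + rotation_ms + transfer_ms

def random_read_ms(n_pages, seek_ms=10.0, rotation_ms=5.0, transfer_ms=0.5):
    """Every page pays the full seek and rotational delay."""
    return n_pages * access_time_ms(seek_ms, rotation_ms, transfer_ms)

def sequential_read_ms(n_pages, seek_ms=10.0, rotation_ms=5.0, transfer_ms=0.5):
    """One seek and one rotational delay, then pages are transferred back to back."""
    return seek_ms + rotation_ms + n_pages * transfer_ms

random_cost = random_read_ms(100)
sequential_cost = sequential_read_ms(100)
print(random_cost, sequential_cost)
```

Under these assumed numbers, 100 randomly placed pages cost 1550 ms while 100 adjacent pages cost 65 ms, which is why page placement matters so much.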

Arranging Pages on Disk

§ "Next" block concept:
– blocks on same track, followed by
– blocks on same cylinder, followed by
– blocks on adjacent cylinder
§ Accessing the 'next' block is cheap.
§ A useful optimization: pre-fetching
– See textbook page 323

Rules of thumb…

1. Memory access is much faster than disk I/O (~1000x)
2. "Sequential" I/O is faster than "random" I/O (~10x)

Conclusions - Storing

§ Memory hierarchy
§ Disks: (>1000x slower) - thus
– pack info in blocks
– try to fetch nearby blocks (sequentially)

TREE INDEXES

Declaring Indexes

§ No standard!
§ Typical syntax:
CREATE INDEX StudentsInd ON Students(ID);
CREATE INDEX CoursesInd ON Courses(Number, DeptName);
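The two declarations can be tried out with Python's built-in `sqlite3` module. This is only a sketch: SQLite is one of many systems that accept this syntax, and the table definitions (the `Name` and `Enrollment` columns, the sample row) are assumptions made so the statements have something to run against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Students (ID INTEGER, Name TEXT)")  # assumed schema
conn.execute("CREATE TABLE Courses (Number TEXT, DeptName TEXT, Enrollment INTEGER)")

# The two index declarations from the slide.
conn.execute("CREATE INDEX StudentsInd ON Students(ID)")
conn.execute("CREATE INDEX CoursesInd ON Courses(Number, DeptName)")

# Sample row (made up) and an equality lookup on the indexed attributes.
conn.execute("INSERT INTO Courses VALUES ('4604', 'CS', 60)")
index_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
enrollment = conn.execute(
    "SELECT Enrollment FROM Courses WHERE Number = '4604' AND DeptName = 'CS'"
).fetchone()[0]
print(index_names, enrollment)
```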

Types of Indexes

§ Primary: index on a key
– Used to enforce constraints
§ Secondary: index on a non-key attribute
§ Clustering: the order of the rows in the data pages corresponds to the order of the rows in the index
– Only one clustered index can exist in a given table
– Useful for range predicates
§ Non-clustering: physical order not the same as index order

Using Indexes (1): Equality Searches

§ Given a value v, the index takes us to only those tuples that have v in the attribute(s) of the index.
§ E.g. (use the CoursesInd index):
SELECT Enrollment FROM Courses WHERE Number = '4604' AND DeptName = 'CS'
§ Can use hashes, but see next

Using Indexes (2): Range Searches

§ "Find all students with gpa > 3.0"
§ May be slow, even on a sorted file.
§ Hashes are not a good idea!
§ What to do?
§ Solution: create an 'index' file.

[Figure: a data file of Page 1, Page 2, Page 3, ..., Page N, with an index file holding one key per page: k1, k2, ..., kN.]

§ More details:
– if the index file is small, do binary search there
– Otherwise??
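The small-index case can be sketched as a binary search over one key per data page, followed by a sequential scan. The page contents and the helper names `build_index` and `range_search` below are made up for illustration; the index is assumed to store the smallest key of each page.

```python
from bisect import bisect_right

def build_index(pages):
    """One index entry per data page: the smallest key on that page."""
    return [page[0] for page in pages]

def range_search(pages, index, low):
    """Return all keys > low from a sorted, paged data file."""
    # Last page whose first key is <= low; it may still hold keys > low.
    start = max(bisect_right(index, low) - 1, 0)
    result = []
    for page in pages[start:]:  # sequential scan from that page onward
        result.extend(k for k in page if k > low)
    return result

pages = [[2.1, 2.5], [2.8, 3.0], [3.2, 3.5], [3.7, 3.9]]  # gpa values, sorted
index = build_index(pages)
matches = range_search(pages, index, 3.0)
print(index, matches)
```

When the index file itself no longer fits in a few pages, the same idea is applied again on top of it, which is what leads to tree-structured indexes.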

B-trees

§ the most successful family of index schemes (B-trees, B+-trees, B*-trees)
§ Can be used for primary/secondary, clustering/non-clustering index.
§ balanced "n-way" search trees
§ Original paper: Rudolf Bayer and McCreight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189, 1972.

B-trees

§ E.g., B-tree of order d = 1:

[Figure: root with keys 6 and 9; child for keys < 6 holds (1, 3), child for keys > 6 and < 9 holds (7), child for keys > 9 holds (13).]

B-tree properties:

§ each node, in a B-tree of order d:
– Key order
– at most n = 2d keys
– at least d keys (except root, which may have just 1 key)
– all leaves at the same level
– if number of pointers is k, then node has exactly k-1 keys
– (leaves are empty)

[Figure: a node holding keys v1, v2, …, vn-1 and pointers p1, …, pn.]

Properties

§ "block aware" nodes: each node is a disk page
§ O(log(N)) for everything! (ins/del/search)
§ typically, if d = 50-100, then 2-3 levels
§ utilization >= 50%, guaranteed; on average 69%
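The "2-3 levels" claim follows from the large fanout. The sketch below counts how many levels a B-tree needs when every node is assumed to be completely full (2d keys, 2d + 1 children); `levels_needed` is a hypothetical helper, and real trees are less full, so this is a lower bound.

```python
def levels_needed(n_keys, d):
    """Smallest number of levels whose total key capacity reaches n_keys.

    Assumes every node holds the maximum of 2d keys and has 2d + 1 children.
    """
    keys_per_node = 2 * d
    fanout = keys_per_node + 1
    levels, capacity, nodes_at_level = 0, 0, 1
    while capacity < n_keys:
        capacity += nodes_at_level * keys_per_node  # keys stored on this level
        nodes_at_level *= fanout                    # nodes on the next level down
        levels += 1
    return levels

print(levels_needed(10_000, 50), levels_needed(1_000_000, 100))
```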

Queries

§ Algorithm for an exact-match query? (e.g., ssn = 8?)

[Figure: the B-tree of order d = 1 (root keys 6 and 9; leaves (1, 3), (7), (13)), with the search path animated over several slides.]

§ H steps (= disk accesses)

JAVA animation

§ http://slady.net/java/bt/
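A minimal sketch of the exact-match search on the example tree follows. The `Node` class and the `search` function are illustrative names, not taken from the lecture; the count of visited nodes corresponds to the "H steps" on the slide, assuming one disk access per node.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys stored in this node
        self.children = children or []  # empty list for a leaf

def search(root, key):
    """Return (found, nodes_visited) for an exact-match query."""
    node, visited = root, 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        # Descend into the child between keys[i-1] and keys[i], if there is one.
        node = node.children[i] if node.children else None
    return False, visited

# The example tree: root (6, 9) over leaves (1, 3), (7), (13).
root = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
print(search(root, 8), search(root, 7), search(root, 6))
```

Searching for 8 visits the root and the middle leaf and then stops: the key is not in the tree.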

Queries

§ What about range queries? (e.g., 5 < salary < 8)
§ Proximity/nearest neighbor searches? (e.g., salary ~ 8)

[Figure: the same B-tree (root keys 6 and 9; leaves (1, 3), (7), (13)), animated over several slides.]

Variations

§ How could we do even better than the B-trees above?

B+ trees - Motivation

§ B-tree: print keys in sorted order:

[Figure: the same B-tree (root keys 6 and 9; leaves (1, 3), (7), (13)).]

§ B-tree needs back-tracking. How to avoid it?
§ Stronger reason: for clustering index, data records are scattered.

Solution: B+-trees

§ facilitate sequential ops
§ They string all leaf nodes together
§ AND
§ replicate keys from non-leaf nodes, to make sure every key appears at the leaf level
§ (vital, for clustering index!)

B+ trees

[Figure: B+ tree with root keys 6 and 9 (the index pages) over linked leaves (1, 3), (6, 7), (9, 13) (the data pages); the three pointers cover keys < 6, keys >= 6 and < 9, and keys >= 9.]
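Because the leaves are chained, a range query needs one descent from the root and then only follows sibling pointers, with no back-tracking. The sketch below rebuilds the example tree; the class names `Leaf` and `Index` and the function names are assumptions made for illustration.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # pointer to the right sibling leaf

class Index:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def find_leaf(node, key):
    """Descend from the root to the leaf that would hold key."""
    while isinstance(node, Index):
        # Keys equal to a separator live in the right subtree (>=).
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_scan(root, low, high):
    """Return all keys k with low <= k <= high, in sorted order."""
    leaf = find_leaf(root, low)
    out = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return out
            if k >= low:
                out.append(k)
        leaf = leaf.next  # follow the leaf chain
    return out

# The example tree: root (6, 9) over linked leaves (1, 3), (6, 7), (9, 13).
l1, l2, l3 = Leaf([1, 3]), Leaf([6, 7]), Leaf([9, 13])
l1.next, l2.next = l2, l3
root = Index([6, 9], [l1, l2, l3])
print(range_scan(root, 3, 9))
```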

B+ trees

§ More details: next (and textbook)
§ In short: on split
– at leaf level: COPY middle key upstairs
– at non-leaf level: push middle key upstairs (as in plain B-tree)

Example B+ Tree

§ Search begins at root, and key comparisons direct it to a leaf
§ Search for 5*, 15*, all data entries >= 24* ...

[Figure: root with keys 13, 17, 24, 30 over leaves (2*, 3*, 5*, 7*), (14*, 16*), (19*, 20*, 22*), (24*, 27*, 29*), (33*, 34*, 38*, 39*).]

Based on the search for 15*, we know it is not in the tree!

Inserting a Data Entry into a B+ Tree

§ Find correct leaf L.
§ Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key.
§ Parent node may overflow
– but then: push up middle key. Splits "grow" tree; root split increases height.
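The insertion steps can be sketched as a short recursive routine. This is a simplified illustration under stated assumptions: the `Node` class and the function names are made up, sibling pointers between leaves are omitted, and a split leaf keeps d entries on the left and d + 1 on the right.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # data entries in a leaf, separator keys otherwise
        self.children = children  # None for a leaf

def insert(root, key, d):
    """Insert key into a B+ tree of order d (at most 2d keys per node); return the root."""
    split = _insert(root, key, d)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])  # root split: the tree grows by one level

def _insert(node, key, d):
    """Insert below node; return (separator, new_right_node) if node split, else None."""
    if node.children is None:          # leaf
        insort(node.keys, key)
        if len(node.keys) <= 2 * d:
            return None                # enough space, done
        right = Node(node.keys[d:])    # right leaf keeps d + 1 entries
        node.keys = node.keys[:d]
        return right.keys[0], right    # COPY up: the key also stays in the leaf
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, d)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * d:
        return None
    mid = node.keys[d]                 # PUSH up: the key leaves this node
    new_right = Node(node.keys[d + 1:], node.children[d + 1:])
    node.keys, node.children = node.keys[:d], node.children[:d + 1]
    return mid, new_right

# A tree of order d = 2: root (13, 17, 24) over four leaves.
leaves = [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22, 23]), Node([24, 27, 29])]
root = Node([13, 17, 24], leaves)
root = insert(root, 8, d=2)
after_8 = list(root.keys)
root = insert(root, 21, d=2)
print(after_8, root.keys, [c.keys for c in root.children])
```

Inserting 8 splits a leaf and copies 5 up; inserting 21 then overflows the root, which is split by pushing 17 up into a new root.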

Example B+ Tree - Inserting 30*

[Figure: root with keys 13, 17, 24 over leaves (2*, 3*, 5*, 7*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*); 30* is placed in the last leaf, which has free space.]

Example B+ Tree - Inserting 8*

[Figure, before: root with keys 13, 17, 24 over leaves (2*, 3*, 5*, 7*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*).]

§ No space in the leaf (2*, 3*, 5*, 7*).
§ So split! The leaf becomes (2*, 3*) and (5*, 7*, 8*).
§ And then push the middle key UP (at the leaf level this is a copy: 5* stays in the leaf).

Final state:

[Figure, after: root with keys 5, 13, 17, 24; entries < 5 are in leaf (2*, 3*), entries >= 5 in leaf (5*, 7*, 8*); the other leaves are unchanged.]

Example B+ Tree - Inserting 21*

[Figure, before: root with keys 5, 13, 17, 24 over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*).]

§ The leaf (19*, 20*, 22*, 23*) splits into (19*, 20*) and (21*, 22*, 23*), and 21 is copied up, giving 5, 13, 17, 21, 24.
§ Root is full, so split recursively.

Example B+ Tree: Recursive split

• Notice that root was also split, increasing height.

[Figure, after: root with key 17; left index node (5, 13) over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*); right index node (21, 24) over leaves (19*, 20*), (21*, 22*, 23*), (24*, 27*, 29*).]

Example: Data vs. Index Page Split

§ leaf: 'copy'
§ non-leaf: 'push'
§ why not 'copy' @ non-leaves?

[Figure, Data Page Split: the leaf (2*, 3*, 5*, 7*, 8*) becomes (2*, 3*) and (5*, 7*, 8*), and 5 is copied into the parent.]

[Figure, Index Page Split: the index node (5, 13, 17, 21, 24) becomes (5, 13) and (21, 24), and 17 is pushed into the parent.]

Inserting 21*: The Deferred Split

[Figure, before: root with keys 5, 13, 17, 24 over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*).]

§ Note that the sibling leaf (24*, 27*, 29*) has free space. So…
§ LEND keys to sibling, through PARENT!

[Figure, after: root with keys 5, 13, 17, 23 over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 21*, 22*), (23*, 24*, 27*, 29*).]

§ Shorter, more packed, faster tree

Insertion examples for you to try

Insert the following data entries (in order): 28*, 6*, 25*

[Figure: root with keys 13 and 30 (the subtree to the right of 30 is not shown). Index node (5) over leaves (2*, 3*) and (5*, 7*, 8*, 11*); index node (20) over leaves (14*, 16*) and (21*, 22*, 23*).]

Answer…

After inserting 28*, 6*:

[Figure: index node (5, 7) over leaves (2*, 3*), (5*, 6*), (7*, 8*, 11*); index node (20) over leaves (14*, 16*) and (21*, 22*, 23*, 28*).]

After inserting 25*:

[Figure: index node (5, 7) unchanged; index node (20, 23) over leaves (14*, 16*), (21*, 22*), (23*, 25*, 28*).]

Deleting a Data Entry from a B+ Tree

§ Start at root, find leaf L where entry belongs.
§ Remove the entry.
– If L is at least half-full, done!
– If L underflows
• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
• If re-distribution fails, merge L and sibling.
– update parent
– and possibly merge, recursively
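The leaf-level part of deletion can be sketched on a single parent and its leaves. This is a simplified illustration: the function name `delete_entry`, the list-based representation, and the preference for the right sibling are assumptions, and the recursive repair of the parent is left to the caller.

```python
def delete_entry(parent_keys, leaves, i, key, d):
    """Delete key from leaves[i]; repair a leaf underflow with a sibling.

    parent_keys[j] separates leaves[j] and leaves[j + 1]. Returns True when the
    parent itself now has fewer than d keys (the caller would recurse upward).
    """
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= d or len(leaves) == 1:
        return False                          # at least half full: done
    j = i if i + 1 < len(leaves) else i - 1   # pair (j, j + 1): prefer the right sibling
    left, right = leaves[j], leaves[j + 1]
    if len(left) + len(right) >= 2 * d:
        both = left + right                   # re-distribute entries evenly
        half = len(both) // 2
        left[:], right[:] = both[:half], both[half:]
        parent_keys[j] = right[0]             # copy up the new separator
        return False
    left.extend(right)                        # merge: the opposite of a split
    del leaves[j + 1]
    del parent_keys[j]
    return len(parent_keys) < d

# One index node of an order d = 2 tree with its three leaves.
parent_keys = [24, 30]
leaves = [[19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
r1 = delete_entry(parent_keys, leaves, 0, 19, d=2)  # leaf stays half full
r2 = delete_entry(parent_keys, leaves, 0, 20, d=2)  # re-distribution
after_20 = ([list(l) for l in leaves], list(parent_keys))
r3 = delete_entry(parent_keys, leaves, 0, 24, d=2)  # merge
print(after_20, leaves, parent_keys, r3)
```

After the last deletion the parent is left with a single key, so the function reports that the underflow has moved one level up.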

Deletion from B+ Tree

[Figure: root with key 17; left index node (5, 13) over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*); right index node (24, 30) over leaves (19*, 20*, 22*), (24*, 27*, 29*), (33*, 34*, 38*, 39*).]

Example: Delete 19* & 20*

• Deleting 19* is easy: the leaf becomes (20*, 22*).
• Deleting 20* -> re-distribution (notice: 27 copied up)

[Figure, after: root with key 17; left index node (5, 13) unchanged; right index node (27, 30) over leaves (22*, 24*), (27*, 29*), (33*, 34*, 38*, 39*).]

... And Then Deleting 24*

[Figure, before: root with key 17; left index node (5, 13) over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*); right index node (27, 30) over leaves (22*, 24*), (27*, 29*), (33*, 34*, 38*, 39*).]

• Must merge leaves: OPPOSITE of insert

[Figure, after the merge: right index node (30) over leaves (22*, 27*, 29*) and (33*, 34*, 38*, 39*); the rest is unchanged.]

… but are we done??

... Merge Non-Leaf Nodes, Shrink Tree

[Figure, before: root with key 17; left index node (5, 13); right index node (30), which has underflowed.]

[Figure, after: new root with keys 5, 13, 17, 30 over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 27*, 29*), (33*, 34*, 38*, 39*).]

Example of Non-leaf Re-distribution

§ Tree is shown below during deletion of 24*.
§ Now, we can re-distribute keys

[Figure: root with key 22; left index node (5, 13, 17, 20) over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (17*, 18*), (20*, 21*); right index node (30) over leaves (22*, 27*, 29*), (33*, 34*, 38*, 39*).]

After Re-distribution

§ need only re-distribute '20'; did '17', too
§ why would we want to re-distribute more keys? Ans: reduces likelihood of split (see Book, pg. 356)

[Figure: root with key 17; left index node (5, 13) over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*); right index node (20, 22, 30) over leaves (17*, 18*), (20*, 21*), (22*, 27*, 29*), (33*, 34*, 38*, 39*).]

Main observations for deletion

§ If a key value appears twice (leaf + nonleaf), the above algorithms delete it from the leaf, only
§ why not non-leaf, too?
§ 'lazy deletions' - in fact, some vendors just mark entries as deleted (~ underflow),
– and reorganize/compact later

Recap: main ideas

§ on overflow, split (and 'push', or 'copy')
– or consider deferred split
§ on underflow, borrow keys; or merge
– or let it underflow...

B+ Trees in Practice

§ Typical order: 100. Typical fill-factor: 67%.
– average fanout = 2 * 100 * 0.67 = 134
§ Typical capacities:
– Height 4: 134^4 = 322,417,936 entries
– Height 3: 134^3 = 2,406,104 entries
§ Can often keep top levels in buffer pool:
– Level 1 = 1 page = 8 KB
– Level 2 = 134 pages = 1 MB
– Level 3 = 17,956 pages = 140 MB
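The capacity and buffer-pool figures follow directly from the average fanout. The sketch below recomputes them for a fanout of 134 and an 8 KB page; the constant and function names are chosen here for illustration.

```python
ORDER = 100    # d: each node holds at most 2d entries
FILL = 0.67    # typical fill factor
PAGE_KB = 8    # page size used on the slide

fanout = round(2 * ORDER * FILL)  # average number of children per node

def entries_at_height(h):
    """Number of entries reachable through h levels of average fanout."""
    return fanout ** h

def pages_at_level(level):
    """Number of pages on a level, counting the root as level 1."""
    return fanout ** (level - 1)

print(fanout, entries_at_height(3), entries_at_height(4))
for level in (1, 2, 3):
    pages = pages_at_level(level)
    print(level, pages, pages * PAGE_KB / 1024, "MB")
```

The top three levels together need roughly 141 MB, which is why they can often stay resident in the buffer pool.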

Bulk Loading of a B+ Tree

§ In an empty tree, insert many keys
§ Why not one-at-a-time?
– Too slow!
§ Initialization: sort all data entries
§ scan list; whenever enough for a page, pack
§ <repeat for upper level>

[Figure: sorted pages of data entries, not yet in the B+ tree: (3*, 4*), (6*, 9*), (10*, 11*), (12*, 13*), (20*, 22*), (23*, 31*), (35*, 36*), (38*, 41*), (44*). Index pages holding keys such as 6, 10, 12, 20, 23, 35, 38 are built above them from left to right, while the data entry pages further right are not yet in the B+ tree.]
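A level-by-level version of bulk loading can be sketched as follows. It is a simplification under stated assumptions: the function name `bulk_load` and its parameters are made up, pages are packed completely full, and the result is a list of levels rather than linked nodes, so it will not reproduce the exact node layout of the figure.

```python
def bulk_load(entries, leaf_capacity, fanout):
    """Build B+ tree levels bottom-up; return a list of levels, leaves first.

    Each index node is represented by its list of separator keys. If the number
    of nodes on a level is one more than a multiple of fanout, the last index
    node gets a single child and no keys; a real loader would rebalance that.
    """
    entries = sorted(entries)  # initialization: sort all data entries
    leaves = [entries[i:i + leaf_capacity]
              for i in range(0, len(entries), leaf_capacity)]
    levels = [leaves]
    low_keys = [leaf[0] for leaf in leaves]  # smallest key under each node
    while len(low_keys) > 1:
        groups = [low_keys[i:i + fanout] for i in range(0, len(low_keys), fanout)]
        # Separators: the low key of every child except the first one.
        levels.append([group[1:] for group in groups])
        low_keys = [group[0] for group in groups]
    return levels

data = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
levels = bulk_load(data, leaf_capacity=2, fanout=3)
for level in levels:
    print(level)
```

Every page is written once and the input is read sequentially, in contrast to one-at-a-time insertion, which descends from the root for each entry.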

A Note on 'Order'

§ Order (d) concept replaced by physical space criterion in practice ('at least half-full').
§ Many real systems are even sloppier than this: they allow underflow, and only reclaim space when a page is completely empty.
§ (what are the benefits of such 'sloppiness'?)

Conclusions

§ B+ tree is the prevailing indexing method
§ Excellent, O(log N) worst-case performance for ins/del/search; (~3-4 disk accesses in practice)
§ guaranteed 50% space utilization; avg 69%
§ Can be used for any type of index: primary/secondary, sparse (clustering), or dense (non-clustering)
§ Several fine extensions on the basic algorithm
– deferred split;
– bulk-loading