CS 4604: Introduction to Database Management...

Preview:

Citation preview

CS4604:IntroductiontoDatabaseManagementSystems

B.AdityaPrakashLecture#8:StoringdataandIndexes

Annoucements

§  Extraofficehourstillmidterm– CheckPiazzapost

Prakash2018 VTCS4604 2

STORINGDATA

Prakash2018 VTCS4604 3

Prakash2018 VTCS4604

DBMSLayers:

Query Optimization and Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB

Queries

TODAYà

4

Prakash2018 VTCS4604

LeverageOSfordisk/filemanagement?

§  Layersofabstractionaregood…but:

5

Prakash2018 VTCS4604

LeverageOSfordisk/filemanagement?

§  Layersofabstractionaregood…but:– Unfortunately,OSoftengetsinthewayofDBMS

6

Prakash2018 VTCS4604

LeverageOSfordisk/filemanagement?

§  DBMSwants/needstodothings“itsownway”– Specializedprefetching– Controloverbufferreplacementpolicy

•  LRUnotalwaysbest(sometimesworst!!)– Controloverthread/processscheduling

•  “Convoyproblem”– AriseswhenOSschedulingconflictswithDBMSlocking

– Controloverflushingdatatodisk• WALprotocolrequiresflushinglogentriestodisk

7

Prakash2018 VTCS4604

DisksandFiles

§  DBMSstoresinformationondisks.– but:disksare(relatively)VERYslow!

§ MajorimplicationsforDBMSdesign!

8

Prakash2018 VTCS4604

DisksandFiles

§ MajorimplicationsforDBMSdesign:– READ:disk->mainmemory(RAM).– WRITE:reverse– Botharehigh-costoperations,relativetoin-memoryoperations,somustbeplannedcarefully!

9

Prakash2018 VTCS4604

WhyNotStoreItAllinMainMemory?

10

Prakash2018 VTCS4604

WhyNotStoreItAllinMainMemory?

§  Coststoomuch.– disk:~$1/Gb;memory:~$100/Gb– High-endDatabasestodayinthe10-100TBrange.

– Approx60%ofthecostofaproductionsystemisinthedisks.

§ Mainmemoryisvolatile.§  Note:somespecializedsystemsdostoreentiredatabaseinmainmemory.

11

Prakash2018 VTCS4604

TheStorageHierarchySmaller, Faster

Bigger, Slower

12

Prakash2018 VTCS4604

TheStorageHierarchy

– Main memory (RAM) for currently used data.

– Disk for the main database (secondary storage).

– Tapes for archiving older versions of the data (tertiary storage).

Smaller, Faster

Bigger, Slower

Registers

L1 Cache

Main Memory

Magnetic Disk

Magnetic Tape

...

13

Prakash2018 VTCS4604

JimGray’sStorageLatencyAnalogy:HowFarAwayistheData?

Registers On Chip Cache On Board Cache

Memory

Disk

1 2

10

100

Tape

10 9

10 6

Boston

This Building

This Room My Head

10 min

1.5 hr

2 Years

1 min

Pluto

2,000 Years

The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again.

The image cannot be displayed. Your computer may not have

The image cannot be displayed. Your computer may not have enough

Andromeda

14

Prakash2018 VTCS4604

Disks§  Secondarystoragedeviceofchoice.§ Mainadvantageovertapes:randomaccessvs.sequential.

§  Dataisstoredandretrievedinunitscalleddiskblocksorpages.

§  UnlikeRAM,timetoretrieveadiskpagevariesdependinguponlocationondisk.–  relativeplacementofpagesondiskisimportant!

15

Prakash2018 VTCS4604

AnatomyofaDisk

Platters

Spindle

•  Sector •  Track •  Cylinder •  Platter •  Block size = multiple of sector size (which is fixed)

Disk head

Arm movement

Arm assembly

Tracks

Sector

#16

Prakash2018 VTCS4604

AccessingaDiskPage

§  Timetoaccess(read/write)adiskblock:–  .–  .–  .

17

Prakash2018 VTCS4604

AccessingaDiskPage

§  Timetoaccess(read/write)adiskblock:– seektime:movingarmstopositiondiskheadontrack

–  rotationaldelay:waitingforblocktorotateunderhead

–  transfertime:actuallymovingdatato/fromdisksurface

18

Prakash2018 VTCS4604

AccessingaDiskPage

§  Relativetimes?– seektime:–  rotationaldelay:–  transfertime:

19

Prakash2018 VTCS4604

AccessingaDiskPage

§  Relativetimes?– seektime:about1to20msec–  rotationaldelay:0to10msec–  transfertime:<1msecper4KBpage

Transfer

Seek

Rotate

transfer

20

Prakash2018 VTCS4604

Seektime&rotationaldelaydominate

§  KeytolowerI/Ocost:reduceseek/rotationdelays!

§  Alsonote:Forshareddisks,muchtimespentwaitinginqueueforaccesstoarm/controller

Seek

Rotate

transfer

21

Prakash2018 VTCS4604

ArrangingPagesonDisk

§  “Next” blockconcept:– blocksonsametrack,followedby– blocksonsamecylinder,followedby– blocksonadjacentcylinder

§  Accesing‘next’blockischeap§  Ausefuloptimization:pre-fetching

– Seetextbookpage323

22

Prakash2018 VTCS4604

Rulesofthumb…

1. MemoryaccessmuchfasterthandiskI/O(~1000x)

§  “Sequential”I/Ofasterthan“random”I/O(~10x)

23

Prakash2018 VTCS4604

Conclusions---Storing

§ Memoryhierarchy§  Disks:(>1000xslower)-thus

– packinfoinblocks–  trytofetchnearbyblocks(sequentially)

24

TREEINDEXES

Prakash2018 VTCS4604 25

DeclaringIndexes

§  Nostandard!§  Typicalsyntax:CREATE INDEX StudentsInd ON Students(ID);

CREATE INDEX CoursesInd ON Courses(Number, DeptName);

Prakash2018 VTCS4604 26

TypesofIndexes

§  Primary:indexonakey– Usedtoenforceconstraints

§  Secondary:indexonnon-keyattribute§  Clustering:orderoftherowsinthedatapagescorrespondtotheorderoftherowsintheindex– Onlyoneclusteredindexcanexistinagiventable– Usefulforrangepredicates

§  Non-clustering:physicalordernotthesameasindexorder

Prakash2018 VTCS4604 27

UsingIndexes(1):EqualitySearches

§  Givenavaluev,theindextakesustoonlythosetuplesthathavevintheattribute(s)oftheindex.

§  E.g.(useCourseIndindex)SELECT Enrollment FROM Courses WHERE Number = “4604” and DeptName = “CS”

Prakash2018 VTCS4604 28

UsingIndexes(1):EqualitySearches

§  Givenavaluev,theindextakesustoonlythosetuplesthathavevintheattribute(s)oftheindex.

§  CanuseHashes,butseenext

Prakash2018 VTCS4604 29

UsingIndexes(2):RangeSearches

§  ``Findallstudentswithgpa>3.0’’§  maybeslow,evenonsortedfile§  Hashesnotagoodidea!§ Whattodo?

Prakash2018 VTCS4604

Page 1 Page 2 Page N Page 3 Data File

30

RangeSearches

§  ``Findallstudentswithgpa>3.0’’§  maybeslow,evenonsortedfile§  Solution:Createan`index’file.

Prakash2018 VTCS4604

Page 1 Page 2 Page N Page 3 Data File

k2 kN k1 Index File

31

RangeSearches

§ Moredetails:§  ifindexfileissmall,dobinarysearchthere§  Otherwise??

Prakash2018 VTCS4604

Page 1 Page 2 Page N Page 3 Data File

k2 kN k1 Index File

32

B-trees

§  themostsuccessfulfamilyofindexschemes(B-trees,B+-trees,B*-trees)

§  Canbeusedforprimary/secondary,clustering/non-clusteringindex.

§  balanced“n-way”searchtrees§  OriginalPaper:RudolfBayerandMcCreight,E.M.OrganizationandMaintenanceofLargeOrderedIndexes.ActaInformatica1,173-189,1972.

Prakash2018 VTCS4604 33

B-trees

§  Eg.,B-treeoforderd=1:

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

34

B-treeproperties:

§  eachnode,inaB-treeoforderd:– Keyorder– atmostn=2dkeys– atleastdkeys(exceptroot,whichmayhavejust1key)

– allleavesatthesamelevel–  ifnumberofpointersisk,thennodehasexactlyk-1keys

–  (leavesareempty)

Prakash2018 VTCS4604

v1 v2 … vn-1

p1 pn

35

Properties

§  “blockaware”nodes:eachnodeisadiskpage§  O(log(N))foreverything!(ins/del/search)§  typically,ifd=50-100,then2-3levels§  utilization>=50%,guaranteed;onaverage69%

Prakash2018 VTCS4604 36

Queries

§  Algoforexactmatchquery?(eg.,ssn=8?)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

37

JAVAanimation

§  http://slady.net/java/bt/

Prakash2018 VTCS4604 38

Queries

§  Algoforexactmatchquery?(eg.,ssn=8?)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

39

Queries

§  Algoforexactmatchquery?(eg.,ssn=8?)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

40

Queries

§  Algoforexactmatchquery?(eg.,ssn=8?)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

41

Queries

§  Algoforexactmatchquery?(eg.,ssn=8?)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

Hsteps(=diskaccesses)

42

Queries

§  whataboutrangequeries?(eg.,5<salary<8)§  Proximity/nearestneighborsearches?(eg.,salary~8)

Prakash2018 VTCS4604 43

Queries

§  whataboutrangequeries?(eg.,5<salary<8)§  Proximity/nearestneighborsearches?(eg.,salary~8)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

44

Queries

§  whataboutrangequeries?(eg.,5<salary<8)§  Proximity/nearestneighborsearches?(eg.,salary~8)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

45

Queries

§  whataboutrangequeries?(eg.,5<salary<8)§  Proximity/nearestneighborsearches?(eg.,salary~8)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

46

Queries

§  whataboutrangequeries?(eg.,5<salary<8)§  Proximity/nearestneighborsearches?(eg.,salary~8)

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

47

Variations

§  HowcouldwedoevenbetterthantheB-treesabove?

Prakash2018 VTCS4604 48

B+trees-Motivation

§  B-tree–printkeysinsortedorder:

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

49

B+trees-Motivation

§  B-treeneedsback-tracking–howtoavoidit?

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

50

B+trees-Motivation

§  Strongerreason:forclusteringindex,datarecordsarescattered:

Prakash2018 VTCS4604

1 3

6

7

9

13

<6

>6 <9>9

51

Solution:B+-trees

§  facilitatesequentialops§  Theystringallleafnodestogether§  AND§  replicatekeysfromnon-leafnodes,tomakesureeverykeyappearsattheleaflevel

§  (vital,forclusteringindex!)

Prakash2018 VTCS4604 52

B+trees

Prakash2018 VTCS4604

1 3

6

6

9

9

<6

>=6 <9>=9

7 13

53

B+trees

Prakash2018 VTCS4604

1 3

6

6

9

9

<6

>=6 <9>=9

7 13

IndexPages

DataPages

54

B+trees

§ Moredetails:next(andtextbook)§  Inshort:onsplit

– atleaflevel:COPYmiddlekeyupstairs– atnon-leaflevel:pushmiddlekeyupstairs(asinplainB-tree)

Prakash2018 VTCS4604 55

ExampleB+Tree

§  Searchbeginsatroot,andkeycomparisonsdirectittoaleaf

§  Searchfor5*,15*,alldataentries>=24*...

Prakash2018 VTCS4604

Based on the search for 15*, we know it is not in the tree!

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

56

InsertingaDataEntryintoaB+Tree

§  FindcorrectleafL.§  PutdataentryontoL.

–  IfLhasenoughspace,done!– Else,mustsplitL(intoLandanewnodeL2)

•  Redistributeentriesevenly,copyupmiddlekey.

§  parentnodemayoverflow– butthen:pushupmiddlekey.Splits“grow”tree;rootsplitincreasesheight.

Prakash2018 VTCS4604 57

ExampleB+Tree–Inserting30*

Prakash2018 VTCS4604

Root

17 24

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*

13

23*

58

ExampleB+Tree–Inserting30*

Prakash2018 VTCS4604

Root

17 24

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*

13

23* 30*

59

ExampleB+Tree-Inserting8*

Prakash2018 VTCS4604

Root

17 24

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*

13

23*

60

ExampleB+Tree-Inserting8*

Prakash2018 VTCS4604

Root

17 24

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*

13

23*

NoSpace

61

Prakash2018 VTCS4604

ExampleB+Tree-Inserting8*Root

17 24

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*

13

23*

2* 3* 5* 14* 16* 19* 20* 22* 24* 27* 29* 23* 7* 8*

13 17 24

5*

SoSplit!

62

Prakash2018 VTCS4604

ExampleB+Tree-Inserting8*Root

17 24

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*

13

23*

2* 3* 5* 14* 16* 19* 20* 22* 24* 27* 29* 23* 7* 8*

13 17 24

5*

SoSplit!

AndthenpushmiddleUP

63

Prakash2018 VTCS4604

ExampleB+Tree-Inserting8*Root

17 24

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*

13

23*

2* 3* 14* 16* 19* 20* 22* 24* 27* 29* 23* 7* 8*

5 13 17 24

5*

<5 >=5

FinalState

64

ExampleB+Tree-Inserting21*

Prakash2018 VTCS4604

2* 3*

Root

5

14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*

13 17 24

23*

2* 3* 14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8* 23*

65

ExampleB+Tree-Inserting21*

Prakash2018 VTCS4604

2* 3*

Root

5

14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*

13 17 24

23*

2* 3* 14* 16* 19* 20* 24* 27* 29* 7* 5* 8* 21* 22* 23*

17 21 24 13 5 RootisFull,sosplitrecursively

66

ExampleB+Tree:Recursivesplit

Prakash2018 VTCS4604

•  Notice that root was also split, increasing height.

2* 3*

Root

17

21 24

14* 16* 19* 20* 21* 22* 23* 24* 27* 29*

13 5

7* 5* 8*

67

Prakash2018 VTCS4604

Example:Datavs.IndexPageSplit

§  leaf:‘copy’§  non-leaf:‘push’

§  whynot‘copy’@non-leaves?

2* 3* 5* 7* 8*

5

5 21 24

17

13

… 2* 3* 5* 7*

17 21 24 13

Data Page Split

Index Page Split

8*

5

#68

SameInserting21*:TheDeferredSplit

Prakash2018 VTCS4604

2* 3*

Root

5

14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*

13 17 24

23*

Notethishasfreespace.So…

69

Inserting21*:TheDeferredSplit

Prakash2018 VTCS4604

2* 3*

Root

5

14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*

13 17 24

23*

LENDkeystosibling,throughPARENT!

2* 3*

Root

5

14* 16* 19* 20* 21* 23* 24* 27* 7* 5* 8*

13 17 23

22* 29*

70

Inserting21*:TheDeferredSplit

Prakash2018 VTCS4604

2* 3*

Root

5

14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*

13 17 24

23*

Shorter,morepacked,fastertree

2* 3*

Root

5

14* 16* 19* 20* 21* 23* 24* 27* 7* 5* 8*

13 17 23

22* 29*

71

Insertionexamplesforyoutotry

Prakash2018 VTCS4604

2* 3*

Root

30

14* 16* 21* 22* 23*

13 5

7* 5* 8*

20 … (not shown)

11*

Insert the following data entries (in order): 28*, 6*, 25*

72

Answer…

Prakash2018 VTCS4604

2* 3*

30

7* 8* 14* 16*

7 5

6* 5*

13 …

After inserting 28*, 6*

After inserting 25*

21* 22* 23* 28*

20

11*

73

Answer…

Prakash2018 VTCS4604

2* 3*

13

20 23

7* 8* 14* 16* 21* 22* 23* 25* 28*

7 5

6* 5*

30

11*

After inserting 25*

74

DeletingaDataEntryfromaB+Tree

§  Startatroot,findleafLwhereentrybelongs.§  Removetheentry.

–  IfLisatleasthalf-full,done!–  IfLunderflows

•  Trytore-distribute,borrowingfromsibling(adjacentnodewithsameparentasL).

•  Ifre-distributionfails,mergeLandsibling.–  updateparent–  andpossiblymerge,recursively

Prakash2018 VTCS4604 75

DeletionfromB+Tree

Prakash2018 VTCS4604 76

2* 3*

Root 17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13 5

7* 5* 8*

1

Prakash2018 VTCS4604

Example:Delete19*&20*

Deleting19*iseasy:

2* 3*

Root 17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13 5

7* 5* 8*

2* 3*

Root 17

30

14* 16* 33* 34* 38* 39*

13 5

7* 5* 8* 22* 24*

27

27* 29*

20* 22*

•  Deleting20*->re-distribution(notice:27copiedup)

1 2

3

77

Prakash2018 VTCS4604

2* 3*

Root 17

30

14* 16* 33* 34* 38* 39*

13 5

7* 5* 8* 22* 24*

27

27* 29*

...AndThenDeleting24*

2* 3*

Root 17

14* 16* 33* 34* 38* 39*

13 5

7* 5* 8* 22* 27*

30

29*

•  Mustmergeleaves:OPPOSITEofinsert

3

4

78

Prakash2018 VTCS4604

2* 3*

Root 17

30

14* 16* 33* 34* 38* 39*

13 5

7* 5* 8* 22* 24*

27

27* 29*

...AndThenDeleting24*

2* 3*

Root 17

14* 16* 33* 34* 38* 39*

13 5

7* 5* 8* 22* 27*

30

29*

•  Mustmergeleaves:OPPOSITEofinsert

…butarewedone??

3

4

79

...MergeNon-LeafNodes,ShrinkTree

Prakash2018 VTCS4604

2* 3*

Root 17

14* 16* 33* 34* 38* 39*

13 5

7* 5* 8* 22* 27*

30

29*

4

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39* 5* 8*

Root 30 13 5 17

5

80

ExampleofNon-leafRe-distribution

§  Treeisshownbelowduringdeletionof24*.§  Now,wecanre-distributekeys

Prakash2018 VTCS4604

Root

13 5 17 20

22

30

14* 16* 17* 18* 20* 33* 34* 38* 39* 22* 27* 29* 21* 7* 5* 8* 3* 2*

81

AfterRe-distribution

§  needonlyre-distribute‘20’;did‘17’,too§  whywouldwewanttore-distributemorekeys?Ans:reduceslikelihoodofsplit(seeBook,pg.356)

Prakash2018 VTCS4604

14* 16* 33* 34* 38* 39* 22* 27* 29* 17* 18* 20* 21* 7* 5* 8* 2* 3*

Root

13 5

17

30 20 22

82

Mainobservationsfordeletion

§  Ifakeyvalueappearstwice(leaf+nonleaf),theabovealgorithmsdeleteitfromtheleaf,only

§  whynotnon-leaf,too?

Prakash2018 VTCS4604 83

Mainobservationsfordeletion

§  Ifakeyvalueappearstwice(leaf+nonleaf),theabovealgorithmsdeleteitfromtheleaf,only

§  whynotnon-leaf,too?§  ‘lazydeletions’-infact,somevendorsjustmarkentriesasdeleted(~underflow),– andreorganize/compactlater

Prakash2018 VTCS4604 84

Recap:mainideas

§  onoverflow,split(and‘push’,or‘copy’)– orconsiderdeferredsplit

§  onunderflow,borrowkeys;ormerge– orletitunderflow...

Prakash2018 VTCS4604 85

B+TreesinPractice

§  Typicalorder:100.Typicalfill-factor:67%.– averagefanout=2*100*0.67=134

§  Typicalcapacities:– Height4:1334=312,900,721entries– Height3:1333=2,406,104entries

Prakash2018 VTCS4604 86

B+TreesinPractice

§  Canoftenkeeptoplevelsinbufferpool:– Level1=1page=8KB– Level2=134pages=1MB– Level3=17,956pages=140MB

Prakash2018 VTCS4604 87

BulkLoadingofaB+Tree

§  Inanemptytree,insertmanykeys§ Whynotone-at-a-time?

– Tooslow!

Prakash2018 VTCS4604 88

BulkLoadingofaB+Tree

§  Initialization:Sortalldataentries§  scanlist;wheneverenoughforapage,pack§  <repeatforupperlevel>

Prakash2018 VTCS4604

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*

Sorted pages of data entries; not yet in B+ tree Root

89

Prakash2018 VTCS4604

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*

Root

Data entry pages not yet in B+ tree 35 23 12 6

10 20

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*

6

Root

10

12 23

20

35

38

not yet in B+ tree Data entry pages

BulkLoadingofaB+Tree

#90

ANoteon`Order’

§  Order(d)conceptreplacedbyphysicalspacecriterioninpractice(`atleasthalf-full’).

§ Manyrealsystemsareevensloppierthanthis:theyallowunderflow,andonlyreclaimspacewhenapageiscompletelyempty.

§  (whatarethebenefitsofsuch‘slopiness’?)

Prakash2018 VTCS4604 91

Conclusions

§  B+treeistheprevailingindexingmethod§  Excellent,O(logN)worst-caseperformanceforins/del/search;(~3-4diskaccessesinpractice)

§  guaranteed50%spaceutilization;avg69%

Prakash2018 VTCS4604 92

Conclusions

§  Canbeusedforanytypeofindex:primary/secondary,sparse(clustering),ordense(non-clustering)

§  Severalfine-extensionsonthebasicalgorithm– deferredsplit;– bulk-loading

Prakash2018 VTCS4604 93

Recommended