View
1
Download
0
Category
Preview:
Citation preview
CS4604:IntroductiontoDatabaseManagementSystems
B.AdityaPrakashLecture#8:StoringdataandIndexes
Annoucements
§ Extraofficehourstillmidterm– CheckPiazzapost
Prakash2018 VTCS4604 2
STORINGDATA
Prakash2018 VTCS4604 3
Prakash2018 VTCS4604
DBMSLayers:
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Queries
TODAYà
4
Prakash2018 VTCS4604
LeverageOSfordisk/filemanagement?
§ Layersofabstractionaregood…but:
5
Prakash2018 VTCS4604
LeverageOSfordisk/filemanagement?
§ Layersofabstractionaregood…but:– Unfortunately,OSoftengetsinthewayofDBMS
6
Prakash2018 VTCS4604
LeverageOSfordisk/filemanagement?
§ DBMSwants/needstodothings“itsownway”– Specializedprefetching– Controloverbufferreplacementpolicy
• LRUnotalwaysbest(sometimesworst!!)– Controloverthread/processscheduling
• “Convoyproblem”– AriseswhenOSschedulingconflictswithDBMSlocking
– Controloverflushingdatatodisk• WALprotocolrequiresflushinglogentriestodisk
7
Prakash2018 VTCS4604
DisksandFiles
§ DBMSstoresinformationondisks.– but:disksare(relatively)VERYslow!
§ MajorimplicationsforDBMSdesign!
8
Prakash2018 VTCS4604
DisksandFiles
§ MajorimplicationsforDBMSdesign:– READ:disk->mainmemory(RAM).– WRITE:reverse– Botharehigh-costoperations,relativetoin-memoryoperations,somustbeplannedcarefully!
9
Prakash2018 VTCS4604
WhyNotStoreItAllinMainMemory?
10
Prakash2018 VTCS4604
WhyNotStoreItAllinMainMemory?
§ Coststoomuch.– disk:~$1/Gb;memory:~$100/Gb– High-endDatabasestodayinthe10-100TBrange.
– Approx60%ofthecostofaproductionsystemisinthedisks.
§ Mainmemoryisvolatile.§ Note:somespecializedsystemsdostoreentiredatabaseinmainmemory.
11
Prakash2018 VTCS4604
TheStorageHierarchySmaller, Faster
Bigger, Slower
12
Prakash2018 VTCS4604
TheStorageHierarchy
– Main memory (RAM) for currently used data.
– Disk for the main database (secondary storage).
– Tapes for archiving older versions of the data (tertiary storage).
Smaller, Faster
Bigger, Slower
Registers
L1 Cache
Main Memory
Magnetic Disk
Magnetic Tape
...
13
Prakash2018 VTCS4604
JimGray’sStorageLatencyAnalogy:HowFarAwayistheData?
Registers On Chip Cache On Board Cache
Memory
Disk
1 2
10
100
Tape
10 9
10 6
Boston
This Building
This Room My Head
10 min
1.5 hr
2 Years
1 min
Pluto
2,000 Years
The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again.
The image cannot be displayed. Your computer may not have
The image cannot be displayed. Your computer may not have enough
Andromeda
14
Prakash2018 VTCS4604
Disks§ Secondarystoragedeviceofchoice.§ Mainadvantageovertapes:randomaccessvs.sequential.
§ Dataisstoredandretrievedinunitscalleddiskblocksorpages.
§ UnlikeRAM,timetoretrieveadiskpagevariesdependinguponlocationondisk.– relativeplacementofpagesondiskisimportant!
15
Prakash2018 VTCS4604
AnatomyofaDisk
Platters
Spindle
• Sector • Track • Cylinder • Platter • Block size = multiple of sector size (which is fixed)
Disk head
Arm movement
Arm assembly
Tracks
Sector
#16
Prakash2018 VTCS4604
AccessingaDiskPage
§ Timetoaccess(read/write)adiskblock:– .– .– .
17
Prakash2018 VTCS4604
AccessingaDiskPage
§ Timetoaccess(read/write)adiskblock:– seektime:movingarmstopositiondiskheadontrack
– rotationaldelay:waitingforblocktorotateunderhead
– transfertime:actuallymovingdatato/fromdisksurface
18
Prakash2018 VTCS4604
AccessingaDiskPage
§ Relativetimes?– seektime:– rotationaldelay:– transfertime:
19
Prakash2018 VTCS4604
AccessingaDiskPage
§ Relativetimes?– seektime:about1to20msec– rotationaldelay:0to10msec– transfertime:<1msecper4KBpage
Transfer
Seek
Rotate
transfer
20
Prakash2018 VTCS4604
Seektime&rotationaldelaydominate
§ KeytolowerI/Ocost:reduceseek/rotationdelays!
§ Alsonote:Forshareddisks,muchtimespentwaitinginqueueforaccesstoarm/controller
Seek
Rotate
transfer
21
Prakash2018 VTCS4604
ArrangingPagesonDisk
§ “Next” blockconcept:– blocksonsametrack,followedby– blocksonsamecylinder,followedby– blocksonadjacentcylinder
§ Accesing‘next’blockischeap§ Ausefuloptimization:pre-fetching
– Seetextbookpage323
22
Prakash2018 VTCS4604
Rulesofthumb…
1. MemoryaccessmuchfasterthandiskI/O(~1000x)
§ “Sequential”I/Ofasterthan“random”I/O(~10x)
23
Prakash2018 VTCS4604
Conclusions---Storing
§ Memoryhierarchy§ Disks:(>1000xslower)-thus
– packinfoinblocks– trytofetchnearbyblocks(sequentially)
24
TREEINDEXES
Prakash2018 VTCS4604 25
DeclaringIndexes
§ Nostandard!§ Typicalsyntax:CREATE INDEX StudentsInd ON Students(ID);
CREATE INDEX CoursesInd ON Courses(Number, DeptName);
Prakash2018 VTCS4604 26
TypesofIndexes
§ Primary:indexonakey– Usedtoenforceconstraints
§ Secondary:indexonnon-keyattribute§ Clustering:orderoftherowsinthedatapagescorrespondtotheorderoftherowsintheindex– Onlyoneclusteredindexcanexistinagiventable– Usefulforrangepredicates
§ Non-clustering:physicalordernotthesameasindexorder
Prakash2018 VTCS4604 27
UsingIndexes(1):EqualitySearches
§ Givenavaluev,theindextakesustoonlythosetuplesthathavevintheattribute(s)oftheindex.
§ E.g.(useCourseIndindex)SELECT Enrollment FROM Courses WHERE Number = “4604” and DeptName = “CS”
Prakash2018 VTCS4604 28
UsingIndexes(1):EqualitySearches
§ Givenavaluev,theindextakesustoonlythosetuplesthathavevintheattribute(s)oftheindex.
§ CanuseHashes,butseenext
Prakash2018 VTCS4604 29
UsingIndexes(2):RangeSearches
§ ``Findallstudentswithgpa>3.0’’§ maybeslow,evenonsortedfile§ Hashesnotagoodidea!§ Whattodo?
Prakash2018 VTCS4604
Page 1 Page 2 Page N Page 3 Data File
30
RangeSearches
§ ``Findallstudentswithgpa>3.0’’§ maybeslow,evenonsortedfile§ Solution:Createan`index’file.
Prakash2018 VTCS4604
Page 1 Page 2 Page N Page 3 Data File
k2 kN k1 Index File
31
RangeSearches
§ Moredetails:§ ifindexfileissmall,dobinarysearchthere§ Otherwise??
Prakash2018 VTCS4604
Page 1 Page 2 Page N Page 3 Data File
k2 kN k1 Index File
32
B-trees
§ themostsuccessfulfamilyofindexschemes(B-trees,B+-trees,B*-trees)
§ Canbeusedforprimary/secondary,clustering/non-clusteringindex.
§ balanced“n-way”searchtrees§ OriginalPaper:RudolfBayerandMcCreight,E.M.OrganizationandMaintenanceofLargeOrderedIndexes.ActaInformatica1,173-189,1972.
Prakash2018 VTCS4604 33
B-trees
§ Eg.,B-treeoforderd=1:
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
34
B-treeproperties:
§ eachnode,inaB-treeoforderd:– Keyorder– atmostn=2dkeys– atleastdkeys(exceptroot,whichmayhavejust1key)
– allleavesatthesamelevel– ifnumberofpointersisk,thennodehasexactlyk-1keys
– (leavesareempty)
Prakash2018 VTCS4604
v1 v2 … vn-1
p1 pn
35
Properties
§ “blockaware”nodes:eachnodeisadiskpage§ O(log(N))foreverything!(ins/del/search)§ typically,ifd=50-100,then2-3levels§ utilization>=50%,guaranteed;onaverage69%
Prakash2018 VTCS4604 36
Queries
§ Algoforexactmatchquery?(eg.,ssn=8?)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
37
JAVAanimation
§ http://slady.net/java/bt/
Prakash2018 VTCS4604 38
Queries
§ Algoforexactmatchquery?(eg.,ssn=8?)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
39
Queries
§ Algoforexactmatchquery?(eg.,ssn=8?)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
40
Queries
§ Algoforexactmatchquery?(eg.,ssn=8?)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
41
Queries
§ Algoforexactmatchquery?(eg.,ssn=8?)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
Hsteps(=diskaccesses)
42
Queries
§ whataboutrangequeries?(eg.,5<salary<8)§ Proximity/nearestneighborsearches?(eg.,salary~8)
Prakash2018 VTCS4604 43
Queries
§ whataboutrangequeries?(eg.,5<salary<8)§ Proximity/nearestneighborsearches?(eg.,salary~8)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
44
Queries
§ whataboutrangequeries?(eg.,5<salary<8)§ Proximity/nearestneighborsearches?(eg.,salary~8)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
45
Queries
§ whataboutrangequeries?(eg.,5<salary<8)§ Proximity/nearestneighborsearches?(eg.,salary~8)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
46
Queries
§ whataboutrangequeries?(eg.,5<salary<8)§ Proximity/nearestneighborsearches?(eg.,salary~8)
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
47
Variations
§ HowcouldwedoevenbetterthantheB-treesabove?
Prakash2018 VTCS4604 48
B+trees-Motivation
§ B-tree–printkeysinsortedorder:
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
49
B+trees-Motivation
§ B-treeneedsback-tracking–howtoavoidit?
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
50
B+trees-Motivation
§ Strongerreason:forclusteringindex,datarecordsarescattered:
Prakash2018 VTCS4604
1 3
6
7
9
13
<6
>6 <9>9
51
Solution:B+-trees
§ facilitatesequentialops§ Theystringallleafnodestogether§ AND§ replicatekeysfromnon-leafnodes,tomakesureeverykeyappearsattheleaflevel
§ (vital,forclusteringindex!)
Prakash2018 VTCS4604 52
B+trees
Prakash2018 VTCS4604
1 3
6
6
9
9
<6
>=6 <9>=9
7 13
53
B+trees
Prakash2018 VTCS4604
1 3
6
6
9
9
<6
>=6 <9>=9
7 13
IndexPages
DataPages
54
B+trees
§ Moredetails:next(andtextbook)§ Inshort:onsplit
– atleaflevel:COPYmiddlekeyupstairs– atnon-leaflevel:pushmiddlekeyupstairs(asinplainB-tree)
Prakash2018 VTCS4604 55
ExampleB+Tree
§ Searchbeginsatroot,andkeycomparisonsdirectittoaleaf
§ Searchfor5*,15*,alldataentries>=24*...
Prakash2018 VTCS4604
Based on the search for 15*, we know it is not in the tree!
Root
17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13
56
InsertingaDataEntryintoaB+Tree
§ FindcorrectleafL.§ PutdataentryontoL.
– IfLhasenoughspace,done!– Else,mustsplitL(intoLandanewnodeL2)
• Redistributeentriesevenly,copyupmiddlekey.
§ parentnodemayoverflow– butthen:pushupmiddlekey.Splits“grow”tree;rootsplitincreasesheight.
Prakash2018 VTCS4604 57
ExampleB+Tree–Inserting30*
Prakash2018 VTCS4604
Root
17 24
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*
13
23*
58
ExampleB+Tree–Inserting30*
Prakash2018 VTCS4604
Root
17 24
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*
13
23* 30*
59
ExampleB+Tree-Inserting8*
Prakash2018 VTCS4604
Root
17 24
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*
13
23*
60
ExampleB+Tree-Inserting8*
Prakash2018 VTCS4604
Root
17 24
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*
13
23*
NoSpace
61
Prakash2018 VTCS4604
ExampleB+Tree-Inserting8*Root
17 24
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*
13
23*
2* 3* 5* 14* 16* 19* 20* 22* 24* 27* 29* 23* 7* 8*
13 17 24
5*
SoSplit!
62
Prakash2018 VTCS4604
ExampleB+Tree-Inserting8*Root
17 24
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*
13
23*
2* 3* 5* 14* 16* 19* 20* 22* 24* 27* 29* 23* 7* 8*
13 17 24
5*
SoSplit!
AndthenpushmiddleUP
63
Prakash2018 VTCS4604
ExampleB+Tree-Inserting8*Root
17 24
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29*
13
23*
2* 3* 14* 16* 19* 20* 22* 24* 27* 29* 23* 7* 8*
5 13 17 24
5*
<5 >=5
FinalState
64
ExampleB+Tree-Inserting21*
Prakash2018 VTCS4604
2* 3*
Root
5
14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*
13 17 24
23*
2* 3* 14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8* 23*
65
ExampleB+Tree-Inserting21*
Prakash2018 VTCS4604
2* 3*
Root
5
14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*
13 17 24
23*
2* 3* 14* 16* 19* 20* 24* 27* 29* 7* 5* 8* 21* 22* 23*
17 21 24 13 5 RootisFull,sosplitrecursively
66
ExampleB+Tree:Recursivesplit
Prakash2018 VTCS4604
• Notice that root was also split, increasing height.
2* 3*
Root
17
21 24
14* 16* 19* 20* 21* 22* 23* 24* 27* 29*
13 5
7* 5* 8*
67
Prakash2018 VTCS4604
Example:Datavs.IndexPageSplit
§ leaf:‘copy’§ non-leaf:‘push’
§ whynot‘copy’@non-leaves?
2* 3* 5* 7* 8*
5
5 21 24
17
13
… 2* 3* 5* 7*
17 21 24 13
Data Page Split
Index Page Split
8*
5
#68
SameInserting21*:TheDeferredSplit
Prakash2018 VTCS4604
2* 3*
Root
5
14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*
13 17 24
23*
Notethishasfreespace.So…
69
Inserting21*:TheDeferredSplit
Prakash2018 VTCS4604
2* 3*
Root
5
14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*
13 17 24
23*
LENDkeystosibling,throughPARENT!
2* 3*
Root
5
14* 16* 19* 20* 21* 23* 24* 27* 7* 5* 8*
13 17 23
22* 29*
70
Inserting21*:TheDeferredSplit
Prakash2018 VTCS4604
2* 3*
Root
5
14* 16* 19* 20* 22* 24* 27* 29* 7* 5* 8*
13 17 24
23*
Shorter,morepacked,fastertree
2* 3*
Root
5
14* 16* 19* 20* 21* 23* 24* 27* 7* 5* 8*
13 17 23
22* 29*
71
Insertionexamplesforyoutotry
Prakash2018 VTCS4604
2* 3*
Root
30
14* 16* 21* 22* 23*
13 5
7* 5* 8*
20 … (not shown)
11*
Insert the following data entries (in order): 28*, 6*, 25*
72
Answer…
Prakash2018 VTCS4604
2* 3*
30
7* 8* 14* 16*
7 5
6* 5*
13 …
After inserting 28*, 6*
After inserting 25*
21* 22* 23* 28*
20
11*
73
Answer…
Prakash2018 VTCS4604
2* 3*
13
20 23
7* 8* 14* 16* 21* 22* 23* 25* 28*
7 5
6* 5*
30
…
11*
After inserting 25*
74
DeletingaDataEntryfromaB+Tree
§ Startatroot,findleafLwhereentrybelongs.§ Removetheentry.
– IfLisatleasthalf-full,done!– IfLunderflows
• Trytore-distribute,borrowingfromsibling(adjacentnodewithsameparentasL).
• Ifre-distributionfails,mergeLandsibling.– updateparent– andpossiblymerge,recursively
Prakash2018 VTCS4604 75
DeletionfromB+Tree
Prakash2018 VTCS4604 76
2* 3*
Root 17
24 30
14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13 5
7* 5* 8*
1
Prakash2018 VTCS4604
Example:Delete19*&20*
Deleting19*iseasy:
2* 3*
Root 17
24 30
14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13 5
7* 5* 8*
2* 3*
Root 17
30
14* 16* 33* 34* 38* 39*
13 5
7* 5* 8* 22* 24*
27
27* 29*
20* 22*
• Deleting20*->re-distribution(notice:27copiedup)
1 2
3
77
Prakash2018 VTCS4604
2* 3*
Root 17
30
14* 16* 33* 34* 38* 39*
13 5
7* 5* 8* 22* 24*
27
27* 29*
...AndThenDeleting24*
2* 3*
Root 17
14* 16* 33* 34* 38* 39*
13 5
7* 5* 8* 22* 27*
30
29*
• Mustmergeleaves:OPPOSITEofinsert
3
4
78
Prakash2018 VTCS4604
2* 3*
Root 17
30
14* 16* 33* 34* 38* 39*
13 5
7* 5* 8* 22* 24*
27
27* 29*
...AndThenDeleting24*
2* 3*
Root 17
14* 16* 33* 34* 38* 39*
13 5
7* 5* 8* 22* 27*
30
29*
• Mustmergeleaves:OPPOSITEofinsert
…butarewedone??
3
4
79
...MergeNon-LeafNodes,ShrinkTree
Prakash2018 VTCS4604
2* 3*
Root 17
14* 16* 33* 34* 38* 39*
13 5
7* 5* 8* 22* 27*
30
29*
4
2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39* 5* 8*
Root 30 13 5 17
5
80
ExampleofNon-leafRe-distribution
§ Treeisshownbelowduringdeletionof24*.§ Now,wecanre-distributekeys
Prakash2018 VTCS4604
Root
13 5 17 20
22
30
14* 16* 17* 18* 20* 33* 34* 38* 39* 22* 27* 29* 21* 7* 5* 8* 3* 2*
81
AfterRe-distribution
§ needonlyre-distribute‘20’;did‘17’,too§ whywouldwewanttore-distributemorekeys?Ans:reduceslikelihoodofsplit(seeBook,pg.356)
Prakash2018 VTCS4604
14* 16* 33* 34* 38* 39* 22* 27* 29* 17* 18* 20* 21* 7* 5* 8* 2* 3*
Root
13 5
17
30 20 22
82
Mainobservationsfordeletion
§ Ifakeyvalueappearstwice(leaf+nonleaf),theabovealgorithmsdeleteitfromtheleaf,only
§ whynotnon-leaf,too?
Prakash2018 VTCS4604 83
Mainobservationsfordeletion
§ Ifakeyvalueappearstwice(leaf+nonleaf),theabovealgorithmsdeleteitfromtheleaf,only
§ whynotnon-leaf,too?§ ‘lazydeletions’-infact,somevendorsjustmarkentriesasdeleted(~underflow),– andreorganize/compactlater
Prakash2018 VTCS4604 84
Recap:mainideas
§ onoverflow,split(and‘push’,or‘copy’)– orconsiderdeferredsplit
§ onunderflow,borrowkeys;ormerge– orletitunderflow...
Prakash2018 VTCS4604 85
B+TreesinPractice
§ Typicalorder:100.Typicalfill-factor:67%.– averagefanout=2*100*0.67=134
§ Typicalcapacities:– Height4:1334=312,900,721entries– Height3:1333=2,406,104entries
Prakash2018 VTCS4604 86
B+TreesinPractice
§ Canoftenkeeptoplevelsinbufferpool:– Level1=1page=8KB– Level2=134pages=1MB– Level3=17,956pages=140MB
Prakash2018 VTCS4604 87
BulkLoadingofaB+Tree
§ Inanemptytree,insertmanykeys§ Whynotone-at-a-time?
– Tooslow!
Prakash2018 VTCS4604 88
BulkLoadingofaB+Tree
§ Initialization:Sortalldataentries§ scanlist;wheneverenoughforapage,pack§ <repeatforupperlevel>
Prakash2018 VTCS4604
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Sorted pages of data entries; not yet in B+ tree Root
89
Prakash2018 VTCS4604
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Root
Data entry pages not yet in B+ tree 35 23 12 6
10 20
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
6
Root
10
12 23
20
35
38
not yet in B+ tree Data entry pages
BulkLoadingofaB+Tree
#90
ANoteon`Order’
§ Order(d)conceptreplacedbyphysicalspacecriterioninpractice(`atleasthalf-full’).
§ Manyrealsystemsareevensloppierthanthis:theyallowunderflow,andonlyreclaimspacewhenapageiscompletelyempty.
§ (whatarethebenefitsofsuch‘slopiness’?)
Prakash2018 VTCS4604 91
Conclusions
§ B+treeistheprevailingindexingmethod§ Excellent,O(logN)worst-caseperformanceforins/del/search;(~3-4diskaccessesinpractice)
§ guaranteed50%spaceutilization;avg69%
Prakash2018 VTCS4604 92
Conclusions
§ Canbeusedforanytypeofindex:primary/secondary,sparse(clustering),ordense(non-clustering)
§ Severalfine-extensionsonthebasicalgorithm– deferredsplit;– bulk-loading
Prakash2018 VTCS4604 93
Recommended