Database design and implementation CMPSCI 645
Lecture 08: Storage and Indexing
Where is the data and how to get to it?
DBMS architecture
[Figure: DBMS architecture. Components shown: Query Parser, Query Rewriter, Query Optimizer, Query Executor, Access Methods, Buffer Manager, Disk Space Manager, Lock Manager, Log Manager, and the database (DB) on disk.]
Memory hierarchy
- Main memory: random access, fast, volatile
- Magnetic disk: random access, relatively slow, non-volatile
- Tape: sequential scan, non-volatile, long archiving
Disks and DBMS design
- Databases are stored on disks.
- Reads (disk to RAM) and writes (RAM to disk) are expensive operations.
Why not store everything in memory?
- Volatility
- Cost
Basics of disks
[Figure: disk anatomy, showing platters, spindle, disk head, arm assembly, and arm movement.]
- Platters spin under the head.
- Only one head reads and writes.
- Retrieval time varies: seek time + rotation delay + transfer time.
- Platters have tracks, forming an (imaginary) cylinder.
- Each track has sectors. Blocks (pages) are multiples of sectors.
Accessing a disk page
- Time to access (read/write) a disk block:
  1. seek time (moving arms to position a disk head on a track)
  2. rotational delay (waiting for a block to rotate under the head)
  3. transfer time (actually moving data to/from disk surface)
- Seek time and rotational delay dominate.
- Placement of pages on disk has major impact on DBMS performance.
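The three components of disk access time can be added up in a short calculation. This is a minimal sketch: the drive parameters (9 ms average seek, 7200 RPM, 100 MB/s transfer rate, 8 KB page) are illustrative assumptions, not figures from the lecture, and the function name is hypothetical.

```python
def access_components_ms(seek_ms, rpm, transfer_mb_per_s, page_kb):
    """Return (seek, rotational delay, transfer) times in milliseconds."""
    # Average rotational delay: on average we wait half of one full rotation.
    rotation_ms = 0.5 * 60_000 / rpm
    # Transfer time: page size divided by the sustained transfer rate.
    transfer_ms = page_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms, rotation_ms, transfer_ms


# Assumed drive: 9 ms average seek, 7200 RPM, 100 MB/s, 8 KB pages.
seek, rotation, transfer = access_components_ms(9.0, 7200, 100.0, 8)
random_access_ms = seek + rotation + transfer

print(f"seek={seek:.2f} ms, rotation={rotation:.2f} ms, transfer={transfer:.3f} ms")
print(f"one random page access: {random_access_ms:.2f} ms")
```

Under these assumed numbers the transfer itself is a small fraction of the total, which is why placing pages so that seeks and rotational waits are avoided matters so much.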
Arranging pages on disk
- Sequential page storage:
  - blocks on the same track, followed by
  - blocks on the same cylinder, followed by
  - blocks on an adjacent cylinder
- Pages in a file should be arranged sequentially on disk, to minimize seek and rotational delay.
  - Scan of the file is a sequential scan.
Files of records
- Fields are organized in a record.
- A collection of records is organized in a page.
- A collection of pages makes a file.
Unordered (Heap) Files
- Simplest file structure; contains records in no particular order.
- As file grows and shrinks, disk pages are allocated and de-allocated.
- To support record-level operations, we must:
  - keep track of the pages in a file
  - keep track of free space on pages
  - keep track of the records on a page
Heap File Using a Page Directory
- Page entry can include the number of free bytes on the page.
- The directory is a collection of pages; linked list implementation is just one alternative.
[Figure: a DIRECTORY of pages, starting at a Header Page, whose entries point to Data Page 1, Data Page 2, ..., Data Page N.]
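A page directory that records free bytes per page can be sketched as follows. This is a toy in-memory model under stated assumptions: the class and method names are hypothetical, the directory is a plain Python list rather than a set of disk pages, and the first page with enough room is chosen (a first-fit rule that the slides do not prescribe).

```python
class PageDirectory:
    """Toy heap-file directory: one entry per data page with its free byte count."""

    def __init__(self, page_size):
        self.page_size = page_size
        self.free_bytes = []  # index = page id, value = free bytes on that page

    def allocate_page(self):
        """Add an empty data page to the file and return its page id."""
        self.free_bytes.append(self.page_size)
        return len(self.free_bytes) - 1

    def find_page_for(self, record_size):
        """Return the first page with enough room, allocating a page if none has."""
        for page_id, free in enumerate(self.free_bytes):
            if free >= record_size:
                return page_id
        return self.allocate_page()

    def insert(self, record_size):
        """Reserve space for a record and return the page id it was placed on."""
        if record_size > self.page_size:
            raise ValueError("record does not fit on a page")
        page_id = self.find_page_for(record_size)
        self.free_bytes[page_id] -= record_size
        return page_id


directory = PageDirectory(page_size=100)
first = directory.insert(60)   # no pages yet, so a new page is allocated
second = directory.insert(60)  # page 0 has too little room, so another page is allocated
third = directory.insert(30)   # fits into the space left on page 0
```

Because the directory stores free-byte counts, the search for space never has to read the data pages themselves.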
Page format
- How to store records on a page
- Consider a page as a collection of slots, one for each record.
- A record is identified by rid = <page id, slot #>.
- Record ids (rids) are used in indexes.
Page formats: fixed length records
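Moving records for free space management changes rid; may not be acceptable.
[Figure: two page layouts for fixed-length records. PACKED: Slot 1, Slot 2, ..., Slot N stored contiguously, followed by free space; the page stores N, the number of records. UNPACKED, BITMAP: Slot 1, Slot 2, ..., Slot M, with a bitmap marking which slots hold records; the page stores M, the number of slots.]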
Page formats: variable length records
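Can move records on page without changing rid; so, attractive for fixed-length records too.
[Figure: Page i holding records with Rid = (i, 1), Rid = (i, 2), ..., Rid = (i, N). A SLOT DIRECTORY holds one entry per slot (1 to N), the number of slots, and a pointer to the start of free space.]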
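The slot-directory idea can be sketched in a few lines. This is a simplified model, not the exact page layout from the slides: the class name and methods are hypothetical, and the slot directory is kept as a Python list beside the byte array rather than inside the page bytes. The point it illustrates is that compaction moves record bytes while slot numbers, and therefore rids, stay the same.

```python
class SlottedPage:
    """Toy slotted page for variable-length records."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_start = 0  # pointer to the start of free space
        self.slots = []      # slot directory: (offset, length), or None if deleted

    def insert(self, record):
        """Store the record and return its slot number (rid = page id + slot number)."""
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        # Keep the slot entry so that the other slot numbers remain valid.
        self.slots[slot] = None

    def compact(self):
        """Slide live records together; only offsets in the slot directory change."""
        new_data = bytearray(len(self.data))
        cursor = 0
        for slot, entry in enumerate(self.slots):
            if entry is None:
                continue
            offset, length = entry
            new_data[cursor:cursor + length] = self.data[offset:offset + length]
            self.slots[slot] = (cursor, length)
            cursor += length
        self.data = new_data
        self.free_start = cursor


page = SlottedPage()
s0 = page.insert(b"alice")
s1 = page.insert(b"bob")
s2 = page.insert(b"carol")
page.delete(s1)
page.compact()                    # "carol" moves from offset 8 to offset 5
record_after_move = page.read(s2)  # still reachable through the same slot number
```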
Record formats: fixed length
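The number of fields and their types are stored in the system catalogs. Finding the i-th field does not require a scan of the record.
[Figure: a record with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4, starting at base address B. The address of F3 is B + L1 + L2.]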
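The address arithmetic for fixed-length records can be written out directly. A minimal sketch, assuming hypothetical field lengths of 4, 8, 2, and 6 bytes (in a real system these would come from the catalog); the function names are illustrative.

```python
from itertools import accumulate

# Assumed lengths L1..L4 in bytes, as a system catalog might record them.
FIELD_LENGTHS = [4, 8, 2, 6]
# The offset of field i is the sum of the lengths of all fields before it.
FIELD_OFFSETS = [0] + list(accumulate(FIELD_LENGTHS))[:-1]


def field_address(base, i):
    """Address of the i-th field (1-based) of a record that starts at `base`."""
    return base + FIELD_OFFSETS[i - 1]


def read_field(page, base, i):
    """Slice the i-th field out of the page without scanning the record."""
    start = field_address(base, i)
    return page[start:start + FIELD_LENGTHS[i - 1]]


# A page with 100 bytes of padding followed by one record at base address 100.
page = b"\x00" * 100 + b"AAAA" + b"BBBBBBBB" + b"CC" + b"DDDDDD"
third_field_address = field_address(100, 3)  # B + L1 + L2
third_field = read_field(page, 100, 3)
```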
Record formats: variable length
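Two alternative formats:
1. Fields delimited by special symbols: F1 $ F2 $ F3 $ F4 $ (finding a field requires a scan).
2. Array of field offsets: a header S1, S2, S3, S4, E4, followed by the fields F1 F2 F3 F4.
The second choice offers direct access to the i-th field with small directory overhead.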
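The difference between the two variable-length formats can be sketched as follows. The encoding is an assumption made for illustration: two-byte little-endian offsets, one per field start plus one for the end of the last field, stored ahead of the field bytes. The function names are hypothetical.

```python
import struct


def encode(fields):
    """Build a record: n + 1 two-byte offsets, then the concatenated field bytes."""
    header_size = 2 * (len(fields) + 1)
    offsets, position = [], header_size
    for field in fields:
        offsets.append(position)  # start of this field
        position += len(field)
    offsets.append(position)      # end of the last field
    header = struct.pack(f"<{len(offsets)}H", *offsets)
    return header + b"".join(fields)


def get_field(record, i):
    """Offset-array format: read two adjacent offsets, then slice (0-based i)."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return record[start:end]


def get_field_delimited(record, i, delimiter=b"$"):
    """Delimiter format: the record must be scanned to find the i-th field."""
    return record.split(delimiter)[i]


fields = [b"Ann", b"", b"Amherst", b"42"]
record = encode(fields)
city = get_field(record, 2)
city_by_scan = get_field_delimited(b"$".join(fields), 2)
```

The offset array also represents an empty field naturally (two equal offsets), whereas the delimiter format depends on the delimiter never appearing inside a field value.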
Question
- Consider the following query:

    SELECT S1.temp, S2.pressure
    FROM   TempSensor S1, PressureSensor S2
    WHERE  S1.location = S2.location
      AND  S1.time = S2.time

- How can the DBMS execute this query given
  - 1 GB of memory
  - 100 GB TempSensor and 10 GB PressureSensor
Buffer manager
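- Data must be in RAM for DBMS to operate on it!
- Buffer pool = table of <frame#, pageid> pairs.
- 1 page corresponds to 1 disk block; disk = collection of blocks.
- Choice of frame dictated by replacement policy.
- Each frame carries a pin count and a dirty flag.
[Figure: page requests from higher-level code (files and access methods) arrive at the buffer pool manager. The buffer pool in main memory holds disk pages and free frames; pages move between the buffer pool and disk, managed by the disk space manager, via READ/WRITE and INPUT/OUTPUT.]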
When a page is requested...
- If requested page is not in pool (and buffer is full):
  - Choose a frame for replacement
  - If frame is dirty, write it to disk
  - Read requested page into chosen frame
- Pin the page and return its address.
If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time!
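The request-handling steps can be sketched as a toy buffer pool. This is an illustration under assumptions, not the lecture's implementation: the "disk" is a Python dict, LRU is used as the replacement policy, only unpinned frames are eligible for eviction, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool with pin counts, dirty flags, and LRU replacement."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk             # assumed disk model: page id -> page contents
        self.frames = OrderedDict()  # page id -> frame, least recently used first

    def pin(self, page_id):
        """Bring the page into the pool if needed, pin it, and return its frame."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = {
                "data": self.disk[page_id],   # read requested page into the frame
                "pin_count": 0,
                "dirty": False,
            }
        frame = self.frames[page_id]
        frame["pin_count"] += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release one pin; the caller reports whether it modified the page."""
        frame = self.frames[page_id]
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        """Choose the least recently used unpinned frame; write it back if dirty."""
        for victim_id, frame in self.frames.items():
            if frame["pin_count"] == 0:
                if frame["dirty"]:
                    self.disk[victim_id] = frame["data"]
                del self.frames[victim_id]
                return
        raise RuntimeError("all frames are pinned")


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(num_frames=2, disk=disk)
frame = pool.pin(1)
frame["data"] = "A"         # modify page 1 in memory only
pool.unpin(1, dirty=True)
pool.pin(2)
pool.unpin(2)
pool.pin(3)                 # pool is full: page 1 is evicted and written back
```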
Buffer replacement policy
- Frame is chosen for replacement by a replacement policy:
  - Least-recently-used (LRU), Clock, MRU, etc.
- Policy can have big impact on # of I/Os; depends on the access pattern.
- Sequential flooding: nasty situation caused by LRU + repeated sequential scans.
  - # buffer frames < # pages in file means each page request causes an I/O. MRU much better in this situation (but not in all situations, of course).
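Sequential flooding is easy to reproduce in a small simulation. A minimal sketch, assuming a pool of 4 frames, a file of 5 pages, and 3 repeated scans (all hypothetical numbers); the function name is illustrative and pinning is ignored.

```python
def count_ios(policy, num_frames, num_pages, num_scans):
    """Count page reads for repeated sequential scans under 'LRU' or 'MRU'."""
    pool = []  # page ids, ordered from least to most recently used
    ios = 0
    for _ in range(num_scans):
        for page in range(num_pages):
            if page in pool:
                pool.remove(page)  # hit: will be re-appended as most recent
            else:
                ios += 1           # miss: the page must be read from disk
                if len(pool) == num_frames:
                    # LRU evicts the oldest entry, MRU the newest.
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return ios


lru_ios = count_ios("LRU", num_frames=4, num_pages=5, num_scans=3)
mru_ios = count_ios("MRU", num_frames=4, num_pages=5, num_scans=3)
print(f"LRU: {lru_ios} I/Os, MRU: {mru_ios} I/Os")
```

With one frame fewer than the file has pages, LRU always evicts exactly the page that the scan will need next, so every request misses; MRU keeps most of the file resident across scans.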
DBMS vs. OS file system
OS does disk space & buffer management: why not let it manage these tasks?
- Reason 1: Correctness
  - DBMS needs fine-grained control for transactions
  - Needs to force pages to disk for recovery purposes
- Reason 2: Performance
  - DBMS may be able to anticipate access patterns
  - Hence, may also be able to perform prefetching
  - May select better page replacement policy
Database file types
The data file can be one of:
- Heap file
  - Set of records, partitioned into blocks
  - Unsorted
- Sequential file
  - Sorted according to some attribute(s) called (sort) key
Note: the sort key is different from a "key"!
Index
- A (possibly separate) file that allows fast access to records in the data file given a search key
- The index contains (key, value) pairs:
  - The key = an attribute value
  - The value = either a pointer to the record, or the record itself
Note: the search key is, again, different from a "key"!
High-level overview: Indexes
Data file = index file: clustered (primary) index (rows ordered by age)

id   age  salary  other
006  19   50k     ...
005  20   55k     ...
004  25   50k     ...
007  30   80k     ...
002  35   75k     ...
003  35   70k     ...
001  40   65k     ...

Index file: unclustered (secondary) index (rows ordered by salary)

id   age  salary  other
006  19   50k     ...
004  25   50k     ...
005  20   55k     ...
001  40   65k     ...
003  35   70k     ...
002  35   75k     ...
007  30   80k     ...
Index classification
- Clustered/unclustered
  - Clustered = records close in index are close in data
  - Unclustered = records close in index may be far in data
- Primary/secondary
  - Primary = is over attributes that include the primary key
  - Secondary = otherwise
- Organization: B+ tree or hash table
Clustered/Unclustered
- Clustered
  - Index determines the location of indexed records
  - Typically, clustered index is one where values are data records (but not necessary)
- Unclustered
  - Index cannot reorder data, does not determine data location
  - In these indexes: value = pointer to data record
Clustered index
- File is sorted on the index attribute
- Only one per table
[Figure: Index File entries 10, 20, 30, 40, 50, 60, 70, 80 point, in the same order, to Data File records 10, 20, 30, 40, 50, 60, 70, 80.]
Unclustered index
- Several per table
[Figure: Index File entries 10, 10, 20, 20, 20, 30, 30, 30 point to Data File records stored in the order 20, 30, 30, 20, 10, 20, 10, 30.]
Clustered vs. unclustered index
[Figure: CLUSTERED vs. UNCLUSTERED B+ tree. In each case the B+ tree leads to data entries in the index file, which point to data records in the data file.]
Alternatives for data entry k* in index
- In a data entry k*, we can store:
  - Alternative 1: <k, data record with search key value k>
  - Alternative 2: <k, rid of a record with search key value k>
  - Alternative 3: <k, list of rids of records with search key k>
- Choice of an alternative for data entries is orthogonal to the indexing technique used.
  - Indexing techniques: B+ tree, hashing, ...
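The three alternatives for data entries can be made concrete with a tiny example. This is only a sketch: the records, the rids written as (page id, slot) pairs, and the use of age as the search key are all made up for illustration, and plain sorted lists and a dict stand in for a real index structure.

```python
# Hypothetical data file: rid -> data record (name, age).
records = {
    (1, 0): ("Ann", 35),
    (1, 1): ("Bob", 20),
    (2, 0): ("Cy", 35),
}

# Alternative 1: each data entry holds the data record itself.
alt1 = sorted((age, (name, age)) for name, age in records.values())

# Alternative 2: one <k, rid> data entry per record.
alt2 = sorted((age, rid) for rid, (name, age) in records.items())

# Alternative 3: one <k, list of rids> data entry per distinct key value.
alt3 = {}
for age, rid in alt2:
    alt3.setdefault(age, []).append(rid)
```

Alternative 3 stores each key value once, at the price of variable-length data entries.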
Cost model
We ignore CPU costs, for simplicity:
- B: The number of data pages
- R: Number of records per page
- D: (Average) time to read or write disk page
- Measuring number of page I/Os ignores gains of pre-fetching a sequence of pages; thus, even I/O cost is only approximated.
- Average-case analysis; based on several simplistic assumptions.
Comparing file organizations
- Heap files (random order)
- Sorted files, sorted on <age, sal>
- Clustered B+ tree file, Alternative (1), search key <age, sal>
- Heap file with unclustered B+ tree index on search key <age, sal>
- Heap file with unclustered hash index on search key <age, sal>
Operations to compare
- Scan: Fetch all records from disk
- Equality search
- Range selection
- Insert a record
- Delete a record
Assumptions
- Heap Files:
  - Equality selection on key; exactly one match.
- Sorted Files:
  - Files compacted after deletions.
- Indexes:
  - Alt (2), (3): data entry size = 10% size of record
  - Hash: No overflow buckets.
    - 80% page occupancy => File size = 1.25 data size
  - Tree: 67% occupancy (this is typical).
    - Implies file size = 1.5 data size
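The occupancy assumptions translate into size factors by simple arithmetic, which is also where the constants 0.125 and 0.15 in the cost table come from. A minimal sketch; the variable and function names are hypothetical.

```python
def size_factor(occupancy):
    """File size relative to the data it holds, given average page occupancy."""
    return 1 / occupancy


hash_factor = size_factor(0.80)            # 1.25
tree_factor = round(size_factor(0.67), 1)  # about 1.49, rounded to 1.5

# Alt (2)/(3) data entries are assumed to be 10% of the size of a record,
# so the data entries occupy this many pages per data page:
entry_fraction = 0.10
hash_entry_pages_per_data_page = entry_fraction * hash_factor  # about 0.125
tree_entry_pages_per_data_page = entry_fraction * tree_factor  # about 0.15
```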
Assumptions (contd.)
- Scans:
  - Leaf levels of a tree-index are chained.
  - Index data-entries plus actual file scanned for unclustered indexes.
- Range searches:
  - We use tree indexes to restrict the set of data records fetched, but ignore hash indexes.
Cost of operations
                         Scan             Equality               Range
Heap file                BD               0.5 BD                 BD
Sorted file              BD               D log_2 B              D (log_2 B + # match recs)
Clustered tree index     1.5 BD           D log_F 1.5B           D (log_F 1.5B + # pages with matched recs)
Unclustered tree index   BD (R + 0.15)    D (1 + log_F 0.15B)    D (log_F 0.15B + # pages with matched recs)
Unclustered hash index   BD (R + 0.125)   2D                     BD

Several assumptions underlie these (rough) estimates!
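The formulas in the table can be evaluated for concrete values to get a feel for the differences. This is a sketch under assumptions: F is taken to be the fan-out of the tree index (the slides use it without defining it), the numbers B = 1024 pages, R = 100 records per page, D = 15 ms, and F = 100 are made up, and the function and parameter names are hypothetical.

```python
import math


def costs(B, R, D, F, match_pages=1, match_recs=1):
    """Evaluate the table's rough cost formulas; results are in the unit of D."""
    log2_b = math.log2(B)
    log_f_clustered = math.log(1.5 * B, F)
    log_f_unclustered = math.log(0.15 * B, F)
    return {
        "heap": {
            "scan": B * D,
            "equality": 0.5 * B * D,
            "range": B * D,
        },
        "sorted": {
            "scan": B * D,
            "equality": D * log2_b,
            "range": D * (log2_b + match_recs),
        },
        "clustered tree": {
            "scan": 1.5 * B * D,
            "equality": D * log_f_clustered,
            "range": D * (log_f_clustered + match_pages),
        },
        "unclustered tree": {
            "scan": B * D * (R + 0.15),
            "equality": D * (1 + log_f_unclustered),
            "range": D * (log_f_unclustered + match_pages),
        },
        "unclustered hash": {
            "scan": B * D * (R + 0.125),
            "equality": 2 * D,
            "range": B * D,
        },
    }


# Assumed workload: 1024 data pages, 100 records per page, 15 ms per page I/O.
table = costs(B=1024, R=100, D=15, F=100)
for organization, row in table.items():
    print(organization, {op: round(ms, 1) for op, ms in row.items()})
```

Under these assumed numbers, scanning through an unclustered index costs roughly R times as much as scanning the heap file directly, because each record may require its own page I/O.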