Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Lecture9:DataStorageandIOModels
Lecture9
Announcements
• SubmissionProjectPart1tonight• InstructionsonPiazza!
• PS2dueonFridayat11:59pm• Questions?EasierthanPS1.
• BadgersRule!
Lecture9
Today’sLecture
1. DataStorage
2. DiskandFiles
3. BufferManager- Prelims
3
Lecture9
1. DataStorage
4
Lecture9>Section1
Whatyouwilllearnaboutinthissection
1. Lifecycleofaquery
2. ArchitectureofaDBMS
3. MemoryHierarchy
5
Lecture9>Section1
6
Lifecycleofaquery
Lecture9>Section1
Query
Query Result
Database Server
Select R.text from Report R, Weather Wwhere W.image.rain()
and W.city = R.city and W.date = R.date
and R.text.
matches(“insurance claims”)
QuerySyntax Tree
Parser
Query Plan
Optimizer
Segments
Query Scheduler |…|……|………..|………..|
|…|……|………..|………..||…|……|………..|………..||…|……|………..|………..||…|……|………..|………..||…|……|………..|………..||…|……|………..|………..||…|……|………..|………..||…|……|………..|………..||…|……|………..|………..||…|……|………..|………..|
QueryResult
ExecuteOperators
7
InternalArchitectureofaDBMS
Lecture9>Section1>ArchitectureofaDBMS
query
QueryExecution
dataaccess
StorageManager
I/Oaccess
8
ArchitectureofaStorageManager
Lecture9>Section1>StorageManager
IOAccesses
I/OManager
AccessMethods
HeapFile B+-treeIndex
SortedFile HashIndex
BufferManager ConcurrencyCo
ntrol
Manager
RecoveryM
anager
InSystems,IOcostmattersaton!
9
DataStorage
Lecture9>Section1>DataStorage
• HowdoesaDBMSstoreandaccessdata?• mainmemory(fast,temporary)• disk(slow,permanent)
• Howdowemovedatafromdisktomainmemory?• buffermanager
• Howdoweorganizerelationaldataintofiles?
10
MemoryHierarchy
Lecture9>Section1>MemoryHierarchy
FlashStorage
CPU
MainMemory
MagneticHardDiskDrive(HDD)
Cache
11
Lecture9>Section1>MemoryHierarchy
Whynotmainmemory?
• Relativelyhighcost• Mainmemoryisnotpersistent!
• Typicalstoragehierarchy:• Primarystorage:mainmemory(RAM)forcurrentlyuseddata
• Secondarystorage:disk forthemaindatabase
• Tertiarystorage:tapes forarchivingolderversionsofthedata
2.DiskandFiles
12
Lecture9>Section2
Whatyouwilllearnaboutinthissection
1. Allaboutdisks
2. Accessingadisk
3. Managingdiskspace
13
Lecture9>Section2
Disks
14
Lecture9>Section2>Disks
• Secondarystoragedeviceofchoice.
•Dataisstoredandretrievedinunitscalleddiskblocks
•UnlikeRAM,timetoretrieveadiskpagevariesdependinguponlocationondisk.
• Therefore,relativeplacementofpagesondiskhasmajorimpactonDBMSperformance!
TheMechanicsofaDisk
15
Lecture9>Section2>Disks
Mechanicalcharacteristics:• Rotationspeed(7200RPM)• Numberofplatters(1-30)• Numberoftracks(<=10000)• Numberofbytes/track(105)
Platters
SpindleDisk head
Arm movement
Arm assembly
Tracks
Sector
Cylinder
TheMechanicsofaDisk
16
Lecture9>Section2>Disks
Platters
SpindleDisk head
Arm movement
Arm assembly
Tracks
Sector
Cylinder
• Plattersspin@~7200rpm• Armassemblymovesto
positionaheadonadesiredtrack.Tracksunderheadsmakeacylinder (imaginary!)
• Only1headreads/writesatanytime
• Blocksize:multipleofsectorsize (whichisfixed).
TheMechanicsofaDisk
17
Lecture9>Section2>Disks
Platters
SpindleDisk head
Arm movement
Arm assembly
Tracks
Sector
Cylinder
Unitofreadorwrite:diskblock:k*SectorSize
Onceinmemory:pageTypically:4kor8kor16k
Accesstime =seektime+rotationaldelay+transfertime(1-20ms)(0-10ms)(~1ms per8kblock)
TheMechanicsofaDisk
18
Lecture9>Section2>Disks
“NextBlock”concept(1) Blocksonsametrack(2) Blocksonsamecylinder(3) Blocksonadjacentcylinder
Platters
SpindleDisk head
Arm movement
Arm assembly
Tracks
Sector
Cylinder
Disksread/writeoneblockatatime
GOAL:Minimizeseekandrotationaldelay
Forasequentialscan,pre-fetchingseveralpagesatatimeisabigwin!
Accesstime =seektime+rotationaldelay+transfertime
access time = rotational delay + seek time + transfer time
rotationaldelay:timetowaitforsectortorotateunderthediskhead• typicaldelay:0–10ms• maximumdelay=1fullrotation• averagedelay~halfrotation
19
RPM Average delay
5,400 5.56
7,200 4.17
10,000 3.00
15,000 2.00
Accessingthedisk(I)
Lecture9>Section2>AccessingtheDisk
access time = rotational delay + seek time + transfer time
seektime:timetomovethearmtopositiondiskheadontherighttrack• typicalseektime:~9ms• ~4ms forhigh-enddisks
20
Accessingthedisk(II)
Lecture9>Section2>AccessingtheDisk
access time = rotational delay + seek time + transfer time
datatransfertime:timetomovethedatato/fromthedisksurface• typicalrates:~100MB/s• theaccesstimeisdominatedbytheseektimeandrotationaldelay!
21
Accessingthedisk(III)
Lecture9>Section2>AccessingtheDisk
22
SeagateHDD
Capacity 3TB
RPM 7,200
AverageSeekTime 9ms
Max Transfer Rate 210MB/s
#Platters 3
WhataretheI/Oratesforblocksize4KB and:• randomworkload(~0.3MB/s)• sequentialworkload(~210MB/s)
Example:Specs
Lecture9>Section2>AccessingtheDisk
• Thediskspaceisorganizedintofiles• Filesaremadeupofpages• Pagescontainrecords
• Dataisallocated/deallocated inincrementsofpages• Logicallyclosepagesshouldbenearbyinthedisk
23
ManagingDiskSpace
Lecture9>Section2>ManagingDiskSpace
SSDs(SolidStateDrive)
• SSDsuseflashmemory• Nomovingparts(norotate/seekmotors)
• eliminatesseektimeandrotationaldelay• verylowpowerandlightweight
• Datatransferrates:300-600MB/s• SSDscanreaddata(sequentialorrandom)veryfast!
24
Lecture9>Section2>SSDs
• Smallstorage(0.1-0.5xofHDD)• expensive(20xofHDD)• Writes aremuchmoreexpensivethanreads (10x)• Limitedlifetime
• 1-10Kwritesperpage• theaveragefailurerateis6years
25
SSDs
Lecture9>Section2>SSDs
Canonlyreadandwriteinblocksorpagesof2K,4K,ormorebytes.Lookslikeadisk.
3.BufferManager- Prelims
26
Lecture9>Section3
Whatyouwilllearnaboutinthissection
1. BufferManager
2. ReplacementPolicy
27
Lecture9>Section3
High-level:Diskvs.MainMemory
Disk:
• Slow: Sequentialblock access• Readablocks(notbyte)atatime,sosequentialaccessischeaper
thanrandom• Diskread/writesareexpensive!
• Durable:Wewillassumethatonceondisk,dataissafe!
• Cheap 28
Platters
SpindleDisk head
Arm movement
Arm assembly
Tracks
Sector
Cylinder
RandomAccessMemory(RAM)orMainMemory:
• Fast: Randomaccess,byteaddressable• ~10xfasterforsequentialaccess• ~100,000xfasterforrandomaccess!
• Volatile: Datacanbelostife.g.crashoccurs,powergoesout,etc!
• Expensive: For$100,get16GBofRAMvs.2TBofdisk!
Lecture9>Section3>Storage&memorymodel
TheBuffer
Lecture9>Section3>TheBuffer
Disk
MainMemory
Buffer• Abuffer isaregionofphysicalmemoryusedtostoretemporarydata
• Inthislecture:aregioninmainmemoryusedtostoreintermediatedatabetweendiskandprocesses
• Keyidea:Reading/writingtodiskisslow-needtocachedata!
MainMemory
Buffer
The(Simplified)Buffer• Inthisclass:We’llconsiderabufferlocatedinmainmemory thatoperatesoverpagesandfiles:
Lecture9>Section3>TheBuffer
Disk1,0,31,0,3
• Read(page): Readpagefromdisk->bufferifnotalreadyinbuffer
MainMemory
Buffer
The(Simplified)Buffer• Inthisclass:We’llconsiderabufferlocatedinmainmemory thatoperatesoverpagesandfiles:
Lecture9>Section3>TheBuffer
Disk1,0,3
1,0,3• Read(page): Readpagefromdisk->
bufferifnotalreadyinbuffer
02
Processescanthenreadfrom/writetothepageinthebuffer
MainMemory
Buffer
The(Simplified)Buffer• Inthisclass:We’llconsiderabufferlocatedinmainmemory thatoperatesoverpagesandfiles:
Lecture9>Section3>TheBuffer
Disk1,0,3
1,2,3• Read(page): Readpagefromdisk->
bufferifnotalreadyinbuffer
• Flush(page): Evictpagefrombuffer&writetodisk
MainMemory
Buffer
The(Simplified)Buffer• Inthisclass:We’llconsiderabufferlocatedinmainmemory thatoperatesoverpagesandfiles:
Lecture9>Section3>TheBuffer
Disk1,0,3
1,2,3• Read(page): Readpagefromdisk->
bufferifnotalreadyinbuffer
• Flush(page): Evictpagefrombuffer&writetodisk
• Release(page): Evictpagefrombufferwithout writingtodisk
MainMemory
Buffer
Lecture9>Section3>TheBuffer
Disk
ManagingDisk:TheDBMSBuffer• Databasemaintainsitsownbuffer
• Why?TheOSalreadydoesthis…
• DBknowsmoreaboutaccesspatterns.
• Watchforhowthisshowsup!(cf.SequentialFlooding)
• Recoveryandloggingrequireabilitytoflush todisk.
TheBufferManager• Abuffermanager handlessupportingoperationsforthebuffer:
• Primarily,handles&executesthe“replacementpolicy”• i.e.findsapageinbuffertoflush/releaseifbufferisfullandanewpageneedstobereadin
• DBMSstypicallyimplementtheirownbuffermanagementroutines
Lecture9>Section3>TheBuffer
ASimplifiedFilesystem Model
• Forus,apage isafixed-sizedarray ofmemory• Think:Oneormorediskblocks• Interface:
• writetoanentry(calledaslot)orsetto“None”
• DBMSalsoneedstohandlevariablelengthfields• Pagelayoutisimportantforgoodhardwareutilizationaswell(see346)
• Andafile isavariable-lengthlist ofpages• Interface:create/open/close;next_page();etc.
Lecture9>Section3>TheBuffer
Disk
1,0,3 1,0,3File
Page
• DatamustbeinRAMforDBMStooperateonit• Allthepagesmaynotfitintomainmemory
37
BufferManager
Lecture9>Section3>TheBuffer
Buffermanager:responsibleforbringingpagesfromdisktomainmemoryasneededpages broughtintomainmemoryareinthebufferpoolthebufferpoolispartitionedintoframes:slotsforholdingdiskpages
38
bufferpool
page frame
pagerequest
disk
BufferManager
Lecture9>Section3>TheBuffer
39
Remember:BufferManager
Lecture9>Section3>TheBuffer
• Read(page): Readpagefromdisk->bufferifnotalreadyinbuffer
• Flush(page): Evictpagefrombuffer&writetodisk
• Release(page): Evictpagefrombufferwithout writingtodisk
• Howdowechooseaframeforreplacement?• LRU(LeastRecentlyUsed)• Clock• MRU(MostRecentlyUsed)• FIFO,random,…
• Thereplacementpolicyhasbigimpacton#ofI/O’s(dependsontheaccesspattern)
• Tobecontinued!40
Bufferreplacementpolicy
Lecture9>Section3>TheBuffer