Virtual Memory
• Idea: Give the programmer the illusion of a large address space while having a small physical memory
  • So that the programmer does not worry about managing physical memory
• Programmer can assume he/she has an “infinite” amount of physical memory
• Hardware and software cooperatively and automatically manage the physical memory space to provide the illusion
  • Illusion is maintained for each independent process
Basic Mechanism
• Indirection (in addressing)
• Address generated by each instruction in a program is a “virtual address”
  • i.e., it is not the physical address used to address main memory
• An “address translation” mechanism maps this address to a “physical address”
  • Address translation mechanism can be implemented in hardware and software together

“At the heart [...] is the notion that ‘address’ is a concept distinct from ‘physical location.’” (Peter Denning)
Overview of Paging
[Figure: Process 1 and Process 2 each have a 4 GB virtual address space; their virtual pages map onto physical page frames in a 16 MB physical memory.]
Review: Virtual Memory & Physical Memory
• A page table contains page table entries (PTEs) that map virtual pages to physical pages.
[Figure: a memory-resident page table (DRAM) with entries PTE 0 to PTE 7. Each entry holds a valid bit and a physical page number or disk address; unallocated entries are null. Physical memory (DRAM, PP 0 to PP 3) holds VP 1, VP 2, VP 7, and VP 4; virtual memory (disk) holds VP 1, VP 2, VP 3, VP 4, VP 6, and VP 7.]
Translation
• Assume: Virtual Page 7 is mapped to Physical Page 32
• For an access to Virtual Page 7…

Virtual Address:   VPN (bits 31:12) = 0000000111   Offset (bits 11:0) = 011001
                   translated                      unchanged
Physical Address:  PPN (bits 27:12) = 0000100000   Offset (bits 11:0) = 011001
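The bit manipulation behind this slide can be sketched in C. This is a minimal illustration, assuming 4 KB pages (12 offset bits) as the bit ranges on the slide imply; the helper names `vpn_of`, `offset_of`, `lookup_ppn`, and `translate` are hypothetical, and `lookup_ppn` models only the single mapping VPN 7 → PPN 32.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u                          /* 4 KB pages: 12 offset bits */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1u)

/* The upper bits of the virtual address select the virtual page. */
uint32_t vpn_of(uint32_t va) { return va >> PAGE_SHIFT; }

/* The lower 12 bits are the offset; translation leaves them unchanged. */
uint32_t offset_of(uint32_t va) { return va & OFFSET_MASK; }

/* Hypothetical stand-in for the page table: only VPN 7 -> PPN 32 exists. */
uint32_t lookup_ppn(uint32_t vpn) {
    assert(vpn == 7u);                           /* nothing else is modeled */
    return 32u;
}

/* Replace the VPN with the PPN and keep the offset. */
uint32_t translate(uint32_t va) {
    return (lookup_ppn(vpn_of(va)) << PAGE_SHIFT) | offset_of(va);
}
```

With the offset bits 011001 (0x19) from the slide, the virtual address 0x7019 would translate to 0x20019 under these assumptions.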
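Address Translation With a Page Table
• Virtual address (bits n-1 to 0): virtual page number (VPN, bits n-1 to p) and virtual page offset (VPO, bits p-1 to 0)
• Page table base register (PTBR; CR3 in x86): physical page table address for the current process
• Page table: the VPN selects an entry holding a valid bit and a physical page number (PPN)
  • Valid bit = 0: page not in memory (page fault)
  • Valid bit = 1: the PPN forms the upper bits of the physical address
• Physical address (bits m-1 to 0): physical page number (PPN, bits m-1 to p) and physical page offset (PPO, bits p-1 to 0)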
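Address Translation: Page Hit
1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) MMU sends physical address to cache/memory
5) Cache/memory sends data word to processor
[Figure: on the CPU chip, the CPU sends the VA to the MMU (1); the MMU sends the PTEA to cache/memory (2) and receives the PTE (3); the MMU sends the PA to cache/memory (4), which returns the data to the CPU (5).]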
Address Translation: Page Fault
1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) Valid bit is zero, so MMU triggers page fault exception
5) Handler identifies victim (and, if dirty, pages it out to disk)
6) Handler pages in new page and updates PTE in memory
7) Handler returns to original process, restarting faulting instruction
[Figure: same datapath as the page hit, plus the exception from the MMU to the page fault handler (4), the victim page written to disk (5), and the new page read from disk (6).]
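The fault-handling steps can be mimicked by a small software model. The following is a sketch only: the table sizes, the round-robin victim choice, and the names `handle_page_fault` and `access_page` are assumptions for illustration, and the disk transfers are represented by counters rather than real I/O.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VPAGES 8   /* number of virtual pages (assumed) */
#define NUM_FRAMES 2   /* number of physical frames (assumed, tiny to force eviction) */

typedef struct { bool valid, dirty; int ppn; } pte_t;

pte_t page_table[NUM_VPAGES];              /* zero-initialized: all pages start on disk */
int frame_owner[NUM_FRAMES] = { -1, -1 };  /* VPN held by each frame, -1 = free */
int next_victim = 0;                       /* round-robin victim pointer (assumed policy) */
int page_faults = 0, writebacks = 0;

/* Steps 5-6: pick a victim frame, page it out if dirty, page in vpn. */
void handle_page_fault(int vpn) {
    int frame = next_victim;
    next_victim = (next_victim + 1) % NUM_FRAMES;
    int old = frame_owner[frame];
    if (old >= 0) {
        if (page_table[old].dirty) writebacks++;   /* stands in for a disk write */
        page_table[old].valid = false;
        page_table[old].dirty = false;
    }
    frame_owner[frame] = vpn;                      /* stands in for a disk read */
    page_table[vpn] = (pte_t){ .valid = true, .dirty = false, .ppn = frame };
    page_faults++;
}

/* Steps 1-4 and 7: check the valid bit, fault if needed, then retry. */
int access_page(int vpn, bool is_write) {
    assert(vpn >= 0 && vpn < NUM_VPAGES);
    if (!page_table[vpn].valid) handle_page_fault(vpn);
    if (is_write) page_table[vpn].dirty = true;
    return page_table[vpn].ppn;
}
```

A real handler runs in the operating system and restarts the faulting instruction; here the retry is simply the code that follows the call to `handle_page_fault`.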
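Integrating VM and Cache
[Figure: on the CPU chip, the CPU sends the VA to the MMU. The MMU sends the PTEA to the L1 cache: on a PTEA hit the cache returns the PTE; on a PTEA miss the PTE is fetched from memory. The MMU then sends the PA to the L1 cache: on a PA hit the cache returns the data; on a PA miss the data is fetched from memory.]
VA: virtual address, PA: physical address, PTE: page table entry, PTEA: PTE address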
Two Problems
• Two problems with page tables
• Problem #1: Page table is too large
• Problem #2: Page table is stored in memory
  • Before every memory access, always fetch the PTE from the slow memory? → Large performance penalty
Multi-Level Page Tables
• Suppose:
  • 4 KB ($2^{12}$) page size, 48-bit address space, 8-byte PTE
• Problem:
  • Would need a 512 GB page table!
  • $2^{48} \cdot 2^{-12} \cdot 2^{3} = 2^{39}$ bytes
• Common solution: Multi-level page table
• Example: 2-level page table
  • Level 1 table: each PTE points to a page table (always memory resident)
  • Level 2 table: each PTE points to a page (paged in and out like any other data)
[Figure: a Level 1 table whose entries point to Level 2 tables.]
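The size arithmetic for a single-level table can be checked with a short helper. This is a sketch; the function name `flat_table_bytes` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a flat page table: one PTE per virtual page. */
uint64_t flat_table_bytes(unsigned va_bits, unsigned page_bits, uint64_t pte_bytes) {
    assert(va_bits > page_bits && va_bits - page_bits < 64);
    uint64_t num_pages = 1ULL << (va_bits - page_bits);
    return num_pages * pte_bytes;
}
```

For the 48-bit example this gives 2^39 bytes (512 GB); for a 32-bit address space with 4 KB pages and 4-byte PTEs it gives 4 MB.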
A Two-Level Page Table Hierarchy
(32-bit addresses, 4 KB pages, 4-byte PTEs)
• Level 1 page table
  • PTE 0 and PTE 1: each points to a Level 2 page table (PTE 0 ... PTE 1023); together they map VP 0 ... VP 1023 and VP 1024 ... VP 2047, the 2K allocated VM pages for code and data
  • PTE 2 to PTE 7: null, covering the gap of 6K unallocated VM pages
  • PTE 8: points to a Level 2 page table with 1023 null PTEs (1023 unallocated pages) and PTE 1023, which maps VP 9215, the 1 allocated VM page for the stack
  • Remaining (1K - 9) PTEs: null
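Translating with a k-level Page Table
• Virtual address (bits n-1 to 0): VPN 1, VPN 2, ..., VPN k, followed by the VPO (bits p-1 to 0)
• The page table base register (PTBR) points to the Level 1 page table
• VPN 1 indexes the Level 1 page table, VPN 2 a Level 2 page table, ..., VPN k a Level k page table
• The entry in the Level k page table supplies the PPN
• Physical address (bits m-1 to 0): PPN followed by the PPO (bits p-1 to 0), which is the same as the VPO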
Translation: “Flat” Page Table
pte_t PAGE_TABLE[1<<20]; // 32-bit VA, 28-bit PA, 4KB page
PAGE_TABLE[7]=2;

Virtual Address:   VPN (bits 31:12) = 000000111   Offset (bits 11:0) = XXX
PAGE_TABLE (each entry is bits 15:0):
  PTE 0           NULL
  PTE 1           NULL
  ···
  PTE 7           000000010
  ···
  PTE (1<<20)-1   NULL
Physical Address:  PPN (bits 27:12) = 000000010   Offset (bits 11:0) = XXX
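A compilable version of the flat lookup might look like the following. It is a sketch under stated assumptions: `pte_t` is taken to hold only the PPN, the value 0 is treated as the NULL (unmapped) entry, and `translate_flat` is a hypothetical name. The slide's 16-bit entries are widened to 32 bits here for simplicity, which is why this table occupies 4 MB.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pte_t;        /* assumed: a PTE holds just the PPN; 0 = unmapped */

pte_t PAGE_TABLE[1 << 20];     /* one entry per 4 KB page of a 32-bit VA */

uint32_t translate_flat(uint32_t va) {
    uint32_t vpn = va >> 12;           /* upper 20 bits index the table */
    uint32_t ppn = PAGE_TABLE[vpn];
    assert(ppn != 0);                  /* a real MMU would raise a page fault here */
    return (ppn << 12) | (va & 0xFFFu);
}
```

After `PAGE_TABLE[7] = 2;`, an access to virtual address 0x7ABC would yield physical address 0x2ABC.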
Translation: Two-Level Page Table
pte_t *PAGE_DIRECTORY[1<<10];
PAGE_DIRECTORY[0]=malloc((1<<10)*sizeof(pte_t));
PAGE_DIRECTORY[0][7]=2;

VPN[31:12] = 0000000000_0000000111 (directory index _ table index)
PAGE_DIR (each entry is bits 31:0):
  PDE 0      &PT0
  PDE 1      NULL
  ···
  PDE 1023   NULL
PAGE_TABLE0 (each entry is bits 15:0):
  PTE 0      NULL
  ···
  PTE 7      000000010
  ···
  PTE 1023   NULL
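The two-level walk can be sketched as runnable C. The names `map_page` and `translate_2level` are hypothetical, `pte_t` is assumed to hold only the PPN with 0 meaning unmapped, and `calloc` is used in place of the slide's `malloc` so that unused entries of a new level-2 table start out as NULL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t pte_t;              /* assumed: PTE holds just the PPN; 0 = unmapped */

pte_t *PAGE_DIRECTORY[1 << 10];      /* NULL = no level-2 table allocated */

/* Install vpn -> ppn, allocating the level-2 table on first use. */
void map_page(uint32_t vpn, pte_t ppn) {
    uint32_t dir = vpn >> 10, idx = vpn & 0x3FFu;
    assert(dir < (1u << 10));
    if (PAGE_DIRECTORY[dir] == NULL) {
        PAGE_DIRECTORY[dir] = calloc(1u << 10, sizeof(pte_t));
        assert(PAGE_DIRECTORY[dir] != NULL);
    }
    PAGE_DIRECTORY[dir][idx] = ppn;
}

/* Walk both levels; returns 1 and writes *pa on success, 0 on a fault. */
int translate_2level(uint32_t va, uint32_t *pa) {
    uint32_t dir = va >> 22, idx = (va >> 12) & 0x3FFu;
    pte_t *table = PAGE_DIRECTORY[dir];
    if (table == NULL || table[idx] == 0) return 0;
    *pa = (table[idx] << 12) | (va & 0xFFFu);
    return 1;
}
```

Only the directory (4 KB on a 32-bit machine) and the level-2 tables that are actually touched consume memory, which is the point of the hierarchy.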
Two-Level Page Table (x86)
• CR3: Control Register 3 (or Page Directory Base Register)
  • Stores the physical address of the page directory
  • Q: Why not the virtual address?
Per-Process Virtual Address Space
• Each process has its own virtual address space
  • Process X: text editor
  • Process Y: video player
  • X writing to its virtual address 0 does not affect the data stored in Y’s virtual address 0 (or any other address)
    • This was the entire purpose of virtual memory
• Each process has its own page directory and page tables
  • On a context switch, the CR3’s value must be updated
[Figure: CR3 points to either X’s PAGE_DIR or Y’s PAGE_DIR.]
Two Problems
• Two problems with page tables
• Problem #1: Page table is too large
  • Page table has 1M entries
  • Each entry is 4 B (a 20-bit PPN plus flag bits, rounded up to 4 bytes)
  • Page table = 4 MB (!!)
    • very expensive in the 80s
  • Solution: Hierarchical page table
• Problem #2: Page table is in memory
  • Before every memory access, always fetch the PTE from the slow memory? → Large performance penalty
Speeding up Translation with a TLB
• Page table entries (PTEs) are cached in L1 like any other memory word
  • PTEs may be evicted by other data references
  • PTE hit still requires a small L1 delay
• Solution: Translation Lookaside Buffer (TLB)
  • Small set-associative hardware cache in MMU
  • Maps virtual page numbers to physical page numbers
  • Contains complete page table entries for small number of pages
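Accessing the TLB
• MMU uses the VPN portion of the virtual address to access the TLB:
  • Virtual address: VPN (bits n-1 to p) and VPO (bits p-1 to 0)
  • The VPN splits into the TLB tag (TLBT, bits n-1 to p+t) and the TLB index (TLBI, bits p+t-1 to p)
  • T = $2^t$ sets (Set 0 ... Set T-1); each line holds a valid bit (v), a tag, and a PTE
  • TLBI selects the set
  • TLBT matches tag of line within set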
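TLB Hit
[Figure: on the CPU chip, the CPU sends the VA to the MMU (1); the MMU sends the VPN to the TLB (2), which returns the PTE (3); the MMU sends the PA to cache/memory (4), which returns the data to the CPU (5).]
A TLB hit eliminates a memory access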
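TLB Miss
[Figure: on the CPU chip, the CPU sends the VA to the MMU (1); the MMU sends the VPN to the TLB (2), which misses; the MMU sends the PTEA to cache/memory (3), which returns the PTE (4); the MMU sends the PA to cache/memory (5), which returns the data to the CPU (6).]
A TLB miss incurs an additional memory access (the PTE)
Fortunately, TLB misses are rare. Why?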
Simple Memory System Example
• Addressing
  • 14-bit virtual addresses
  • 12-bit physical address
  • Page size = 64 bytes

Virtual address:   bits 13:6 = VPN (Virtual Page Number), bits 5:0 = VPO (Virtual Page Offset)
Physical address:  bits 11:6 = PPN (Physical Page Number), bits 5:0 = PPO (Physical Page Offset)
Simple Memory System TLB
• 16 entries
• 4-way associative
• Virtual address bits 13:8 = TLBT, bits 7:6 = TLBI, bits 5:0 = VPO (TLBT and TLBI together form the VPN)

Translation Lookaside Buffer (TLB)
Set | Tag PPN Valid | Tag PPN Valid | Tag PPN Valid | Tag PPN Valid
0   | 03  –   0     | 09  0D  1     | 00  –   0     | 07  02  1
1   | 03  2D  1     | 02  –   0     | 04  –   0     | 0A  –   0
2   | 02  –   0     | 08  –   0     | 06  –   0     | 03  –   0
3   | 07  –   0     | 03  0D  1     | 0A  34  1     | 02  –   0

VPN = 0b00001101, PPN = ?
Simple Memory System Page Table
Only showing the first 16 entries (out of 256)

VPN PPN Valid | VPN PPN Valid
00  28  1     | 08  13  1
01  –   0     | 09  17  1
02  33  1     | 0A  09  1
03  02  1     | 0B  –   0
04  –   0     | 0C  –   0
05  16  1     | 0D  2D  1
06  –   0     | 0E  11  1
07  –   0     | 0F  0D  1

VPN = 0b1101, PPN = ?
0x0D → 0x2D
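The lookup for this small system can be traced in code. The sketch below hard-codes the TLB contents from the slide's table (invalid entries get PPN 0 as a placeholder); the type `tlb_entry_t` and the function `tlb_translate` are hypothetical names, and the ordering of the four ways within each set is an assumption that does not affect the result.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t tag, ppn, valid; } tlb_entry_t;

/* TLB contents from the slide: 4 sets x 4 ways. */
const tlb_entry_t TLB[4][4] = {
    { {0x03, 0, 0},    {0x09, 0x0D, 1}, {0x00, 0, 0},    {0x07, 0x02, 1} },
    { {0x03, 0x2D, 1}, {0x02, 0, 0},    {0x04, 0, 0},    {0x0A, 0, 0}    },
    { {0x02, 0, 0},    {0x08, 0, 0},    {0x06, 0, 0},    {0x03, 0, 0}    },
    { {0x07, 0, 0},    {0x03, 0x0D, 1}, {0x0A, 0x34, 1}, {0x02, 0, 0}    },
};

/* Returns 1 and writes the 12-bit physical address on a TLB hit, else 0. */
int tlb_translate(uint16_t va, uint16_t *pa) {
    assert(va < (1u << 14));                 /* 14-bit virtual addresses */
    uint8_t vpn  = (uint8_t)(va >> 6);       /* 64-byte pages: 6 offset bits */
    uint8_t tlbi = vpn & 0x3;                /* 4 sets: low 2 bits of the VPN */
    uint8_t tlbt = vpn >> 2;                 /* remaining 6 bits are the tag */
    for (int way = 0; way < 4; way++) {
        const tlb_entry_t *e = &TLB[tlbi][way];
        if (e->valid && e->tag == tlbt) {
            *pa = (uint16_t)((e->ppn << 6) | (va & 0x3F));
            return 1;
        }
    }
    return 0;                                /* miss: would walk the page table */
}
```

For VPN 0x0D the index is 1 and the tag is 0x03, which hits in set 1 and yields PPN 0x2D, matching the page table entry for VPN 0x0D.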
Context Switches
• Assume that Process X is running
  • Process X’s VPN 5 is mapped to PPN 100
  • The TLB caches this mapping
    • VPN 5 → PPN 100
• Now assume a context switch to Process Y
  • Process Y’s VPN 5 is mapped to PPN 200
  • When Process Y tries to access VPN 5, it searches the TLB
    • Process Y finds an entry whose tag is 5
    • Hurray! It’s a TLB hit!
    • The PPN must be 100!
    • … Are you sure?
Context Switches (cont’d)
• Approach #1. Flush the TLB
  • Whenever there is a context switch, flush the TLB
    • All TLB entries are invalidated
  • Example: 80386
    • Updating the value of CR3 signals a context switch
    • This automatically triggers a TLB flush
• Approach #2. Associate TLB entries with processes
  • All TLB entries have an extra field in the tag...
    • That identifies the process to which it belongs
  • Invalidate only the entries belonging to the old process
  • Example: Modern x86, MIPS
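Both approaches can be illustrated with a toy TLB model. This is a sketch under assumptions: the TLB is tiny and fully associative, the process identifier field is called `asid` here (a hypothetical name), and in this model an entry tagged with another process's identifier simply fails to match, so no invalidation is needed for correctness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4   /* assumed: tiny, fully associative TLB */

typedef struct { bool valid; uint8_t asid; uint32_t vpn, ppn; } tlb_entry_t;

tlb_entry_t tlb[TLB_ENTRIES];

/* Approach #1: invalidate every entry on a context switch. */
void tlb_flush(void) {
    for (int i = 0; i < TLB_ENTRIES; i++) tlb[i].valid = false;
}

void tlb_insert(int slot, uint8_t asid, uint32_t vpn, uint32_t ppn) {
    assert(slot >= 0 && slot < TLB_ENTRIES);
    tlb[slot] = (tlb_entry_t){ true, asid, vpn, ppn };
}

/* Approach #2: a hit requires both the VPN and the process ID to match. */
bool tlb_lookup(uint8_t asid, uint32_t vpn, uint32_t *ppn) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *ppn = tlb[i].ppn;
            return true;
        }
    }
    return false;
}
```

With Process X as identifier 1 and Process Y as identifier 2, X's cached mapping VPN 5 → PPN 100 no longer produces a false hit when Y looks up VPN 5.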
Handling TLB Misses
• The TLB is small; it cannot hold all PTEs
  • Some translations will inevitably miss in the TLB
  • Must access memory to find the appropriate PTE
    • Called walking the page directory/table
    • Large performance penalty
• Who handles TLB misses?
  1. Hardware-Managed TLB
  2. Software-Managed TLB
Handling TLB Misses (cont’d)
• Approach #1. Hardware-Managed (e.g., x86)
  • The hardware does the page walk
  • The hardware fetches the PTE and inserts it into the TLB
    • If the TLB is full, the entry replaces another entry
  • All of this is done transparently
• Approach #2. Software-Managed (e.g., MIPS)
  • The hardware raises an exception
  • The operating system does the page walk
  • The operating system fetches the PTE
  • The operating system inserts/evicts entries in the TLB
Handling TLB Misses (cont’d)
• Hardware-Managed TLB
  • Pro: No exceptions. Instruction just stalls
  • Pro: Independent instructions may continue
  • Pro: Small footprint (no extra instructions/data)
  • Con: Page directory/table organization is etched in stone
• Software-Managed TLB
  • Pro: The OS can design the page directory/table
  • Pro: More advanced TLB replacement policy
  • Con: Flushes pipeline
  • Con: Performance overhead
Address Translation and Caching
• When do we do the address translation?
  • Before or after accessing the L1 cache?
• In other words, is the cache virtually addressed or physically addressed?
  • Virtual versus physical cache
• What are the issues with a virtually addressed cache?
• Synonym problem:
  • Two different virtual addresses can map to the same physical address → same physical address can be present in multiple locations in the cache → can lead to inconsistency in data
Homonyms and Synonyms
• Homonym: Same VA can map to two different PAs
  • Why?
    • VA is in different processes
• Synonym: Different VAs can map to the same PA
  • Why?
    • Different pages can share the same physical frame within or across processes
    • Reasons: shared libraries, shared data, copy-on-write pages within the same process, …
• Do homonyms and synonyms create problems when we have a cache?
  • Is the cache virtually or physically addressed?
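Cache-VM Interaction
[Figure: three organizations.]
• physical cache: CPU → (VA) → TLB → (PA) → cache → lower hier.
• virtual (L1) cache: CPU → (VA) → cache → TLB → (PA) → lower hier.
• virtual-physical cache: CPU → (VA) → cache and TLB side by side → (PA) → lower hier.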
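Virtually-Indexed Physically-Tagged
• If C ≤ (page_size × associativity), the cache index bits come only from page offset (same in VA and PA)
• If both cache and TLB are on chip
  • index both arrays concurrently using VA bits
  • check cache tag (physical) against TLB output at the end
[Figure: the VA splits into VPN and page offset. The VPN goes to the TLB, which outputs the PPN ("TLB hit?"); the page offset supplies the cache index and CO to the physical cache. The cache tag is compared (=) with the PPN to produce "cache hit?".]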
Virtually-Indexed Physically-Tagged
• If C > (page_size × associativity), the cache index bits include VPN ⇒ Synonyms can cause problems
  • The same physical address can exist in two locations
• Solutions?
[Figure: same datapath as the previous slide, but the cache index extends beyond the page offset into the low bits of the VPN (marked a).]
Sanity Check
• Core 2 Duo: 32 KB, 8-way set associative, page size ≥ 4K
• Cache size ≤ (page_size × associativity)?
  • $2^P = 4\text{K}$, so $P = 12$
    • Needs 12 bits for page offset
  • $2^C = 32\text{KB}$, so $C = 15$
    • Needs 15 bits to address a byte in the cache
  • $2^A = 8$-way, so $A = 3$
    • Increasing the associativity of the cache reduces the number of address bits needed to index into the cache
    • Needs 12 bits for cache index and offset, as tags are matched for blocks in the same set
  • $C \le P + A$? $15 \le 12 + 3$? True
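The same check can be written as a one-line predicate. This is a sketch; the function name `vipt_index_is_physical` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the cache index and block-offset bits fit inside the page
   offset, i.e. cache_size <= page_size * associativity. */
bool vipt_index_is_physical(uint64_t cache_bytes, uint64_t page_bytes, unsigned ways) {
    assert(ways > 0 && page_bytes > 0);
    return cache_bytes <= page_bytes * ways;
}
```

For the Core 2 Duo numbers (32 KB, 4 KB pages, 8-way) the condition holds with equality; halving the associativity or doubling the cache size would break it.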
Some Solutions to the Synonym Problem
• Limit cache size to (page size times associativity)
  • get index from page offset
• On a write to a block, search all possible indices that can contain the same physical block, and update/invalidate
  • Used in Alpha 21264, MIPS R10K
• Restrict page placement in OS
  • make sure index(VA) = index(PA)
  • Called page coloring
  • Used in many SPARC processors
Intel Core i7 Memory System
• Processor package
  • Core x4, each with:
    • Registers, instruction fetch, MMU (addr translation)
    • L1 d-cache: 32 KB, 8-way
    • L1 i-cache: 32 KB, 8-way
    • L1 d-TLB: 64 entries, 4-way
    • L1 i-TLB: 128 entries, 4-way
    • L2 unified TLB: 512 entries, 4-way
    • L2 unified cache: 256 KB, 8-way
  • L3 unified cache: 8 MB, 16-way (shared by all cores)
  • DDR3 memory controller: 3x [email protected]/s, 32 GB/s total (shared by all cores), to main memory
  • QuickPath [email protected]/s each, to other cores and to I/O bridge
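End-to-end Core i7 Address Translation
• Virtual address (VA) from the CPU: VPN (36 bits) and VPO (12 bits)
• L1 TLB (16 sets, 4 entries/set): the VPN splits into TLBT (32 bits) and TLBI (4 bits)
  • TLB hit: the TLB supplies the PPN
  • TLB miss: the VPN splits into VPN1, VPN2, VPN3, VPN4 (9 bits each); starting from CR3, each VPN selects a PTE in the page tables, and the last PTE supplies the PPN
• Physical address (PA): PPN (40 bits) and PPO (12 bits)
• L1 d-cache (64 sets, 8 lines/set): the PA splits into CT (40 bits), CI (6 bits), and CO (6 bits)
  • L1 hit: the result (32/64 bits) returns to the CPU
  • L1 miss: the request goes to L2, L3, and main memory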
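Speeding Up L1 Access
• Observation
  • Bits that determine CI identical in virtual and physical address
  • Can index into cache while address translation taking place
  • Generally we hit in TLB, so PPN bits (CT bits) available next
  • “Virtually indexed, physically tagged”
  • Cache carefully sized to make this possible
[Figure: the virtual address (VA) is VPN (36 bits) and VPO (12 bits); the physical address (PA) is CT (40 bits), CI (6 bits), and CO (6 bits). Address translation turns the VPN into the PPN while the VPO passes through with no change as the PPO. CI goes to the L1 cache directly, and CT is used for the tag check.]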
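The field widths on the Core i7 slides translate directly into shifts and masks. The following sketch uses hypothetical type and function names (`va_fields_t`, `pa_fields_t`, `split_va`, `split_pa`, `cache_index_from_va`) and shows why the cache index can be taken from the virtual address before translation finishes.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { unsigned vpn[4]; unsigned vpo; } va_fields_t;
typedef struct { uint64_t ct; unsigned ci, co; } pa_fields_t;

/* 48-bit VA = VPN1 | VPN2 | VPN3 | VPN4 (9 bits each) | VPO (12 bits). */
va_fields_t split_va(uint64_t va) {
    assert(va < (1ULL << 48));
    va_fields_t f;
    f.vpo = (unsigned)(va & 0xFFF);
    for (int level = 0; level < 4; level++)
        f.vpn[level] = (unsigned)((va >> (12 + 9 * (3 - level))) & 0x1FF);
    return f;
}

/* 52-bit PA = CT (40 bits) | CI (6 bits) | CO (6 bits) for the L1 d-cache. */
pa_fields_t split_pa(uint64_t pa) {
    assert(pa < (1ULL << 52));
    pa_fields_t f;
    f.co = (unsigned)(pa & 0x3F);
    f.ci = (unsigned)((pa >> 6) & 0x3F);
    f.ct = pa >> 12;
    return f;
}

/* CI lies within the low 12 bits, which translation never changes,
   so it can be read from the VA while the TLB lookup is in flight. */
unsigned cache_index_from_va(uint64_t va) {
    return (unsigned)((va >> 6) & 0x3F);
}
```

Because CI and CO together occupy exactly the 12 page-offset bits, `cache_index_from_va` applied to a virtual address returns the same index as `split_pa` applied to its translation.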
Core i7 Level 1-3 Page Table Entries

Bits:  63 | 62:52  | 51:12                            | 11:9   | 8 | 7  | 6 | 5 | 4  | 3  | 2   | 1   | 0
       XD | Unused | Page table physical base address | Unused | G | PS |   | A | CD | WT | U/S | R/W | P=1
       Available for OS (page table location on disk)                                               | P=0

Each entry references a 4K child page table. Significant fields:
P: Child page table present in physical memory (1) or not (0).
R/W: Read-only or read-write access permission for all reachable pages.
U/S: User or supervisor (kernel) mode access permission for all reachable pages.
WT: Write-through or write-back cache policy for the child page table.
A: Reference bit (set by MMU on reads and writes, cleared by software).
PS: Page size either 4 KB or 4 MB (defined for Level 1 PTEs only).
Page table physical base address: 40 most significant bits of physical page table address (forces page tables to be 4KB aligned)
XD: Disable or enable instruction fetches from all pages reachable from this PTE.
Core i7 Level 4 Page Table Entries

Bits:  63 | 62:52  | 51:12                      | 11:9   | 8 | 7 | 6 | 5 | 4  | 3  | 2   | 1   | 0
       XD | Unused | Page physical base address | Unused | G |   | D | A | CD | WT | U/S | R/W | P=1
       Available for OS (page location on disk)                                              | P=0

Each entry references a 4K child page. Significant fields:
P: Child page is present in memory (1) or not (0)
R/W: Read-only or read-write access permission for child page
U/S: User or supervisor mode access
WT: Write-through or write-back cache policy for this page
A: Reference bit (set by MMU on reads and writes, cleared by software)
D: Dirty bit (set by MMU on writes, cleared by software)
Page physical base address: 40 most significant bits of physical page address (forces pages to be 4KB aligned)
XD: Disable or enable instruction fetches from this page.
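The bit positions in the entry layout can be expressed as masks. This is a sketch: the macro and function names (`PTE_P`, `make_pte`, `pte_base`, `pte_allows_user_write`, and so on) are hypothetical, and the permission check only looks at a single entry, whereas real hardware combines the permissions of every level on the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P   (1ULL << 0)    /* present */
#define PTE_RW  (1ULL << 1)    /* read-write when set, read-only when clear */
#define PTE_US  (1ULL << 2)    /* user-mode access allowed when set */
#define PTE_WT  (1ULL << 3)    /* write-through */
#define PTE_CD  (1ULL << 4)    /* CD field */
#define PTE_A   (1ULL << 5)    /* reference bit */
#define PTE_D   (1ULL << 6)    /* dirty bit (Level 4 entries) */
#define PTE_G   (1ULL << 8)    /* G field */
#define PTE_XD  (1ULL << 63)   /* execute disable */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL   /* bits 51:12 */

/* Combine a 4 KB aligned physical base address with flag bits. */
uint64_t make_pte(uint64_t phys_base, uint64_t flags) {
    assert((phys_base & ~PTE_ADDR_MASK) == 0);   /* aligned and below 2^52 */
    return phys_base | flags;
}

/* Extract the physical base address (bits 51:12) of the child. */
uint64_t pte_base(uint64_t pte) { return pte & PTE_ADDR_MASK; }

/* Single-entry check: present, writable, and user accessible. */
bool pte_allows_user_write(uint64_t pte) {
    return (pte & PTE_P) && (pte & PTE_RW) && (pte & PTE_US);
}
```

Masking with `PTE_ADDR_MASK` is what enforces the 4 KB alignment mentioned on the slide: the low 12 bits of the entry are reserved for flags, so they can never be part of the address.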
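Core i7 Page Table Translation
• Virtual address: VPN 1, VPN 2, VPN 3, VPN 4 (9 bits each) and VPO (12 bits)
• CR3 holds the physical address of the L1 PT
• L1 PT (page global directory): VPN 1 selects the L1 PTE, whose 40-bit address points to the L2 PT; 512 GB region per entry
• L2 PT (page upper directory): VPN 2 selects the L2 PTE, whose 40-bit address points to the L3 PT; 1 GB region per entry
• L3 PT (page middle directory): VPN 3 selects the L3 PTE, whose 40-bit address points to the L4 PT; 2 MB region per entry
• L4 PT (page table): VPN 4 selects the L4 PTE, which holds the physical address of the page (40-bit PPN); 4 KB region per entry
• The VPO (12 bits) is the offset into the physical and virtual page
• Physical address: PPN (40 bits) and PPO (12 bits)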
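The region sizes per entry follow from the 9 index bits consumed at each level below the entry plus the 12 offset bits. A minimal sketch, with the hypothetical function name `region_bytes`:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of virtual address space covered by one entry at the given level
   (1 = L1 PT at the top, 4 = L4 PT at the bottom), 9 index bits per level. */
uint64_t region_bytes(int level) {
    assert(level >= 1 && level <= 4);
    return 1ULL << (12 + 9 * (4 - level));
}
```

Each step down the hierarchy divides the covered region by 512, the number of entries in one 4 KB table of 8-byte PTEs.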