Virtual Memory
Samira Khan
Apr 27, 2017


Virtual Memory
• Idea: Give the programmer the illusion of a large address space while having a small physical memory
  • So that the programmer does not worry about managing physical memory
• Programmer can assume he/she has “infinite” amount of physical memory
• Hardware and software cooperatively and automatically manage the physical memory space to provide the illusion
  • Illusion is maintained for each independent process

Basic Mechanism
• Indirection (in addressing)
• Address generated by each instruction in a program is a “virtual address”
  • i.e., it is not the physical address used to address main memory
• An “address translation” mechanism maps this address to a “physical address”
  • Address translation mechanism can be implemented in hardware and software together
“At the heart [...] is the notion that ‘address’ is a concept distinct from ‘physical location.’” Peter Denning

Overview of Paging
[Figure: Process 1 and Process 2 each have their own 4GB virtual address space. Virtual pages from both processes map to physical page frames in a single 16MB physical memory.]

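Review: Virtual Memory & Physical Memory
[Figure: A memory-resident page table (DRAM) with entries PTE 0 through PTE 7. Each entry holds a valid bit and either a physical page number or a disk address; some entries are null. Physical memory (DRAM) has page frames PP 0 through PP 3, currently holding VP 1, VP 2, VP 7 and VP 4. Virtual memory (disk) holds VP 1, VP 2, VP 3, VP 4, VP 6 and VP 7.]
• A page table contains page table entries (PTEs) that map virtual pages to physical pages.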

Translation
• Assume: Virtual Page 7 is mapped to Physical Page 32
• For an access to Virtual Page 7…
Virtual Address:  VPN (bits 31..12) = 0000000111, Offset (bits 11..0) = 011001
Translated
Physical Address: PPN (bits 27..12) = 0000100000, Offset (bits 11..0) = 011001
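The translation on this slide can be checked with a few lines of C. This is a minimal sketch that assumes the 12-bit page offset and the single VPN 7 → PPN 32 mapping from the slide; the names `lookup_ppn` and `translate` are illustrative and do not come from the lecture.

```c
#include <stdint.h>

#define OFFSET_BITS 12u                        /* 4KB pages: offset is bits 11..0 */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1u)

/* Hypothetical stand-in for the page table: only the slide's mapping exists. */
static uint32_t lookup_ppn(uint32_t vpn) {
    return (vpn == 7u) ? 32u : 0u;             /* VPN 7 -> PPN 32 */
}

/* Replace the VPN with the PPN and copy the offset unchanged. */
static uint32_t translate(uint32_t va) {
    uint32_t vpn    = va >> OFFSET_BITS;
    uint32_t offset = va & OFFSET_MASK;
    return (lookup_ppn(vpn) << OFFSET_BITS) | offset;
}
```

With the slide's offset 011001 (0x19), `translate(0x7019)` yields 0x20019: only the page number changes, the offset is carried over.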

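Address Translation With a Page Table
[Figure: The virtual address (bits n-1..0) is split into the virtual page number (VPN, bits n-1..p) and the virtual page offset (VPO, bits p-1..0). The page table base register (PTBR; CR3 in x86) holds the physical page table address for the current process. The VPN selects a page table entry holding a valid bit and a physical page number (PPN). Valid bit = 0: page not in memory (page fault). Valid bit = 1: the physical address (bits m-1..0) is the PPN (bits m-1..p) followed by the physical page offset (PPO, bits p-1..0).]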

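Address Translation: Page Hit
1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) MMU sends physical address to cache/memory
5) Cache/memory sends data word to processor
[Figure: The CPU chip contains the CPU and the MMU and is connected to cache/memory. Arrows: 1 VA (CPU to MMU), 2 PTEA (MMU to cache/memory), 3 PTE (cache/memory to MMU), 4 PA (MMU to cache/memory), 5 Data (cache/memory to CPU).]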

Address Translation: Page Fault
1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) Valid bit is zero, so MMU triggers page fault exception
5) Handler identifies victim (and, if dirty, pages it out to disk)
6) Handler pages in new page and updates PTE in memory
7) Handler returns to original process, restarting faulting instruction
[Figure: The CPU chip (CPU and MMU), cache/memory, the page fault handler and the disk. Arrows: 1 VA, 2 PTEA, 3 PTE, 4 Exception (MMU to page fault handler), 5 Victim page (memory to disk), 6 New page (disk to memory), 7 return to the CPU.]
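The hit and fault paths above can be sketched in C. This is an illustrative model only: the 8-entry table, the `page_fault_handler` stand-in and its frame choice (`vpn + 100`) are assumptions made for the example, and victim selection and disk I/O are left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12u
#define NUM_PAGES 8u                  /* tiny table, for illustration only */

typedef struct {
    bool     valid;                   /* true: page is in memory; false: page fault */
    uint32_t ppn;                     /* physical page number when valid */
} pte_t;

static pte_t page_table[NUM_PAGES];
static unsigned page_faults;          /* counts how often the handler ran */

/* Stand-in for the OS handler: "pages in" the page and updates the PTE. */
static void page_fault_handler(uint32_t vpn) {
    page_faults++;
    page_table[vpn].ppn   = vpn + 100u;    /* arbitrary frame choice (assumption) */
    page_table[vpn].valid = true;
}

/* Fetch the PTE; on a fault run the handler, then finish the access. */
static uint32_t access_va(uint32_t va) {
    uint32_t vpn = (va >> PAGE_BITS) % NUM_PAGES;  /* modulo keeps the index in range */
    if (!page_table[vpn].valid)
        page_fault_handler(vpn);
    return (page_table[vpn].ppn << PAGE_BITS) | (va & ((1u << PAGE_BITS) - 1u));
}
```

The first access to a page takes the fault path once; later accesses to the same page find the valid bit set and go straight through.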

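Integrating VM and Cache
[Figure: On the CPU chip, the L1 cache sits between the MMU and memory. The CPU sends the VA to the MMU. The MMU sends the PTEA to the L1 cache: on a PTEA hit the cache returns the PTE, on a PTEA miss the PTE is fetched from memory. The MMU then sends the PA to the L1 cache: on a PA hit the cache returns the data, on a PA miss the data is fetched from memory.]
VA: virtual address, PA: physical address, PTE: page table entry, PTEA = PTE address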

Two Problems
• Two problems with page tables
• Problem #1: Page table is too large
• Problem #2: Page table is stored in memory
  • Before every memory access, always fetch the PTE from the slow memory? → Large performance penalty

Multi-Level Page Tables
• Suppose:
  • 4KB (2^12) page size, 48-bit address space, 8-byte PTE
• Problem:
  • Would need a 512 GB page table!
  • 2^48 * 2^-12 * 2^3 = 2^39 bytes
• Common solution: Multi-level page table
• Example: 2-level page table
  • Level 1 table: each PTE points to a page table (always memory resident)
  • Level 2 table: each PTE points to a page (paged in and out like any other data)
[Figure: A Level 1 table whose entries point to Level 2 tables.]
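The size estimate above is easy to reproduce. The helper below is a small sketch (the function name `flat_table_bytes` is made up for this example) that computes the size of a single-level page table from the address width, page size and PTE size.

```c
#include <stdint.h>

/* Bytes needed for a flat (single-level) page table:
 * one PTE per virtual page, i.e. 2^(va_bits - page_bits) entries. */
static uint64_t flat_table_bytes(unsigned va_bits, unsigned page_bits,
                                 uint64_t pte_bytes) {
    uint64_t entries = (uint64_t)1 << (va_bits - page_bits);
    return entries * pte_bytes;
}
```

For the slide's parameters, `flat_table_bytes(48, 12, 8)` gives 2^39 bytes (512 GB). For a 32-bit address space with 4-byte PTEs, the same formula gives 4 MB.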

A Two-Level Page Table Hierarchy
32-bit addresses, 4KB pages, 4-byte PTEs
[Figure: The Level 1 page table has 1K entries. PTE 0 and PTE 1 each point to a Level 2 page table (PTE 0 ... PTE 1023) covering VP 0 ... VP 1023 and VP 1024 ... VP 2047: 2K allocated VM pages for code and data. PTE 2 through PTE 7 are null and cover the gap of 6K unallocated VM pages. PTE 8 points to a Level 2 page table with 1023 null PTEs (1023 unallocated pages) followed by PTE 1023, which maps VP 9215: 1 allocated VM page for the stack. The remaining (1K - 9) Level 1 PTEs are null.]

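Translating with a k-level Page Table
[Figure: The page table base register (PTBR) points to the Level 1 page table. The virtual address (bits n-1..0) is split into VPN 1, VPN 2, ..., VPN k and the VPO (bits p-1..0). VPN 1 indexes the Level 1 page table, VPN 2 indexes a Level 2 page table, and so on; the entry in a Level k page table supplies the PPN. The physical address (bits m-1..0) is the PPN followed by the PPO (bits p-1..0).]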

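Translation: “Flat” Page Table
pte_t PAGE_TABLE[1<<20]; // 32-bit VA, 28-bit PA, 4KB page
PAGE_TABLE[7]=2;         // map VPN 7 to PPN 2
[Figure: Virtual address: VPN in bits 31..12 (XXX000000111, i.e. VPN 7), offset in bits 11..0. PAGE_TABLE holds entries PTE 0 through PTE (1<<20)-1, each 16 bits wide (bits 15..0); all are NULL except PTE 7 = 000000010. Physical address: PPN in bits 27..12 (XXX000000010, i.e. PPN 2), offset in bits 11..0.]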

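Translation: Two-Level Page Table
pte_t *PAGE_DIRECTORY[1<<10];
PAGE_DIRECTORY[0]=malloc((1<<10)*sizeof(pte_t)); // allocate the first second-level table
PAGE_DIRECTORY[0][7]=2;                          // map VPN 7 to PPN 2
[Figure: PAGE_DIR holds 32-bit entries PDE 0 through PDE 1023. Before the allocation every PDE is NULL; afterwards PDE 0 = &PT0 and all others stay NULL. PAGE_TABLE0 holds 16-bit entries PTE 0 through PTE 1023; PTE 7 = 000000010 and all others are NULL.]
VPN[31:12] = 0000000000_0000000111 (directory index = 0, table index = 7)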
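A runnable version of the two-level lookup might look as follows. This is a sketch under stated assumptions: `pte_t` is taken to be a 16-bit PPN (as the figure's 16-bit entries suggest), a PPN of 0 stands in for the slide's NULL entries, and the helper names `map_page` and `translate` are invented for the example.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t pte_t;               /* assumption: 16-bit PPN, as drawn */

static pte_t *PAGE_DIRECTORY[1 << 10];

/* Install vpn -> ppn, allocating the second-level table on first use. */
static int map_page(uint32_t vpn, pte_t ppn) {
    uint32_t dir = vpn >> 10, idx = vpn & 0x3FFu;
    if (PAGE_DIRECTORY[dir] == NULL) {
        PAGE_DIRECTORY[dir] = calloc(1 << 10, sizeof(pte_t));
        if (PAGE_DIRECTORY[dir] == NULL) return -1;    /* out of memory */
    }
    PAGE_DIRECTORY[dir][idx] = ppn;
    return 0;
}

/* Walk both levels; returns 0 and sets *pa, or -1 if the page is unmapped. */
static int translate(uint32_t va, uint32_t *pa) {
    uint32_t dir = va >> 22;                  /* VA[31:22]: directory index */
    uint32_t idx = (va >> 12) & 0x3FFu;       /* VA[21:12]: table index */
    pte_t *table = PAGE_DIRECTORY[dir];
    if (table == NULL || table[idx] == 0) return -1;   /* 0 plays the role of NULL */
    *pa = ((uint32_t)table[idx] << 12) | (va & 0xFFFu);
    return 0;
}
```

Only directory entries that are actually used get a second-level table, which is where the space saving over the flat table comes from.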

Two-Level Page Table (x86)
• CR3: Control Register 3 (or Page Directory Base Register)
  • Stores the physical address of the page directory
  • Q: Why not the virtual address?

Multi-Level Page Table (x86-64)

Per-Process Virtual Address Space
• Each process has its own virtual address space
  • Process X: text editor
  • Process Y: video player
  • X writing to its virtual address 0 does not affect the data stored in Y’s virtual address 0 (or any other address)
    • This was the entire purpose of virtual memory
• Each process has its own page directory and page tables
  • On a context switch, the CR3’s value must be updated
[Figure: CR3 points to either X’s PAGE_DIR or Y’s PAGE_DIR.]

Two Problems
• Two problems with page tables
• Problem #1: Page table is too large
  • Page table has 1M entries
  • Each entry is 4B (because 4B ≈ 20-bit PPN)
  • Page table = 4MB (!!)
    • very expensive in the 80s
  • Solution: Hierarchical page table
• Problem #2: Page table is in memory
  • Before every memory access, always fetch the PTE from the slow memory? → Large performance penalty

Speeding up Translation with a TLB
• Page table entries (PTEs) are cached in L1 like any other memory word
  • PTEs may be evicted by other data references
  • PTE hit still requires a small L1 delay
• Solution: Translation Lookaside Buffer (TLB)
  • Small set-associative hardware cache in MMU
  • Maps virtual page numbers to physical page numbers
  • Contains complete page table entries for small number of pages

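Accessing the TLB
• MMU uses the VPN portion of the virtual address to access the TLB:
[Figure: The virtual address (bits n-1..0) is split into the VPN (bits n-1..p) and the VPO (bits p-1..0). The VPN is further split into the TLB tag (TLBT, bits n-1..p+t) and the TLB index (TLBI, bits p+t-1..p). The TLB has T = 2^t sets (Set 0, Set 1, ..., Set T-1); each line holds a valid bit (v), a tag and a PTE.]
• TLBI selects the set
• TLBT matches tag of line within set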
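The field split described above comes down to a few shifts and masks. The sketch below assumes `p` page-offset bits and `t` index bits as in the figure; the struct and function names are illustrative.

```c
#include <stdint.h>

typedef struct { uint64_t tag; uint64_t index; uint64_t vpo; } tlb_fields_t;

/* p = page-offset bits, t = TLB index bits (T = 2^t sets). */
static tlb_fields_t split_va(uint64_t va, unsigned p, unsigned t) {
    tlb_fields_t f;
    uint64_t vpn = va >> p;
    f.vpo   = va & (((uint64_t)1 << p) - 1);
    f.index = vpn & (((uint64_t)1 << t) - 1);    /* TLBI: low t bits of the VPN */
    f.tag   = vpn >> t;                          /* TLBT: remaining VPN bits */
    return f;
}
```

For example, with 64-byte pages (p = 6) and 4 sets (t = 2), the address 0x354 has VPN 0x0D, so the index is 1 and the tag is 3.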

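TLB Hit
[Figure: The CPU chip contains the CPU, the MMU and the TLB, and is connected to cache/memory. Arrows: 1 VA (CPU to MMU), 2 VPN (MMU to TLB), 3 PTE (TLB to MMU), 4 PA (MMU to cache/memory), 5 Data (cache/memory to CPU).]
A TLB hit eliminates a memory access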

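TLB Miss
[Figure: The CPU chip contains the CPU, the MMU and the TLB, and is connected to cache/memory. Arrows: 1 VA (CPU to MMU), 2 VPN (MMU to TLB), 3 PTEA (MMU to cache/memory), 4 PTE (returned from cache/memory), 5 PA (MMU to cache/memory), 6 Data (cache/memory to CPU).]
A TLB miss incurs an additional memory access (the PTE)
Fortunately, TLB misses are rare. Why?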

Simple Memory System Example
• Addressing
  • 14-bit virtual addresses
  • 12-bit physical address
  • Page size = 64 bytes
[Figure: Virtual address: bits 13..6 are the VPN (virtual page number), bits 5..0 are the VPO (virtual page offset). Physical address: bits 11..6 are the PPN (physical page number), bits 5..0 are the PPO (physical page offset).]

Simple Memory System TLB
• 16 entries
• 4-way associative
[Figure: Virtual address bits 13..6 are the VPN, split into TLBT (bits 13..8) and TLBI (bits 7..6); bits 5..0 are the VPO. Example VPN bits: 0 0 0 0 1 1 0 1.]

Translation Lookaside Buffer (TLB)
Set  Tag PPN Valid   Tag PPN Valid   Tag PPN Valid   Tag PPN Valid
0    03  –   0       09  0D  1       00  –   0       07  02  1
1    03  2D  1       02  –   0       04  –   0       0A  –   0
2    02  –   0       08  –   0       06  –   0       03  –   0
3    07  –   0       03  0D  1       0A  34  1       02  –   0

VPN = 0b1101, PPN = ?

Simple Memory System Page Table
Only showing the first 16 entries (out of 256)
VPN PPN Valid   VPN PPN Valid
00  28  1       08  13  1
01  –   0       09  17  1
02  33  1       0A  09  1
03  02  1       0B  –   0
04  –   0       0C  –   0
05  16  1       0D  2D  1
06  –   0       0E  11  1
07  –   0       0F  0D  1

VPN = 0b1101, PPN = ?
0x0D → 0x2D
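The lookup for VPN 0x0D can be worked through in code. The sketch below encodes the TLB contents from the example (4 sets, 4 ways) and follows the usual split of the 8-bit VPN into a 2-bit index and a 6-bit tag; the type and function names are made up for illustration, and invalid entries store a placeholder PPN of 0.

```c
#include <stdint.h>

typedef struct { uint8_t tag, ppn, valid; } tlb_entry_t;

/* TLB contents from the example: 4 sets x 4 ways (ppn is unused when valid = 0). */
static const tlb_entry_t TLB[4][4] = {
    {{0x03, 0, 0}, {0x09, 0x0D, 1}, {0x00, 0, 0}, {0x07, 0x02, 1}},
    {{0x03, 0x2D, 1}, {0x02, 0, 0}, {0x04, 0, 0}, {0x0A, 0, 0}},
    {{0x02, 0, 0}, {0x08, 0, 0}, {0x06, 0, 0}, {0x03, 0, 0}},
    {{0x07, 0, 0}, {0x03, 0x0D, 1}, {0x0A, 0x34, 1}, {0x02, 0, 0}},
};

/* Returns the PPN on a TLB hit, or -1 on a miss. */
static int tlb_lookup(uint8_t vpn) {
    unsigned set = vpn & 0x3u;         /* TLBI: low 2 bits of the VPN */
    unsigned tag = vpn >> 2;           /* TLBT: upper 6 bits of the VPN */
    for (int way = 0; way < 4; way++)
        if (TLB[set][way].valid && TLB[set][way].tag == tag)
            return TLB[set][way].ppn;
    return -1;
}

/* 12-bit physical address: PPN (6 bits) followed by the 6-bit offset. */
static int translate(uint16_t va) {
    int ppn = tlb_lookup((uint8_t)(va >> 6));
    return ppn < 0 ? -1 : (ppn << 6) | (va & 0x3F);
}
```

VPN 0x0D has TLBI = 1 and TLBT = 0x03; set 1 holds a valid line with tag 03, so the TLB returns PPN 0x2D, which agrees with the page table entry for VPN 0D.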

Context Switches
• Assume that Process X is running
  • Process X’s VPN 5 is mapped to PPN 100
  • The TLB caches this mapping
    • VPN 5 → PPN 100
• Now assume a context switch to Process Y
  • Process Y’s VPN 5 is mapped to PPN 200
  • When Process Y tries to access VPN 5, it searches the TLB
    • Process Y finds an entry whose tag is 5
    • Hurray! It’s a TLB hit!
    • The PPN must be 100!
    • … Are you sure?

Context Switches (cont’d)
• Approach #1. Flush the TLB
  • Whenever there is a context switch, flush the TLB
    • All TLB entries are invalidated
  • Example: 80386
    • Updating the value of CR3 signals a context switch
    • This automatically triggers a TLB flush
• Approach #2. Associate TLB entries with processes
  • All TLB entries have an extra field in the tag...
    • That identifies the process to which it belongs
  • Invalidate only the entries belonging to the old process
  • Example: Modern x86, MIPS
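Both approaches can be illustrated with a toy TLB model. This is a sketch, not how any particular processor implements it: the 4-entry fully associative layout, the `asid` field name and the helper functions are assumptions chosen to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4

typedef struct {
    bool     valid;
    uint16_t asid;      /* extra tag field: which process owns the entry */
    uint32_t vpn;
    uint32_t ppn;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Caller must pass slot < TLB_ENTRIES. */
static void tlb_insert(unsigned slot, uint16_t asid, uint32_t vpn, uint32_t ppn) {
    tlb[slot] = (tlb_entry_t){ true, asid, vpn, ppn };
}

/* Approach #1: invalidate everything on a context switch. */
static void tlb_flush(void) {
    for (int i = 0; i < TLB_ENTRIES; i++) tlb[i].valid = false;
}

/* Approach #2: a hit requires both the VPN and the process ID to match. */
static bool tlb_lookup(uint16_t asid, uint32_t vpn, uint32_t *ppn) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *ppn = tlb[i].ppn;
            return true;
        }
    return false;
}
```

With the process ID in the tag, Process Y's lookup of VPN 5 misses on Process X's entry instead of wrongly returning PPN 100.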

Handling TLB Misses
• The TLB is small; it cannot hold all PTEs
  • Some translations will inevitably miss in the TLB
  • Must access memory to find the appropriate PTE
    • Called walking the page directory/table
    • Large performance penalty
• Who handles TLB misses?
  1. Hardware-Managed TLB
  2. Software-Managed TLB

Handling TLB Misses (cont’d)
• Approach #1. Hardware-Managed (e.g., x86)
  • The hardware does the page walk
  • The hardware fetches the PTE and inserts it into the TLB
    • If the TLB is full, the entry replaces another entry
  • All of this is done transparently
• Approach #2. Software-Managed (e.g., MIPS)
  • The hardware raises an exception
  • The operating system does the page walk
  • The operating system fetches the PTE
  • The operating system inserts/evicts entries in the TLB

Handling TLB Misses (cont’d)
• Hardware-Managed TLB
  • Pro: No exceptions. Instruction just stalls
  • Pro: Independent instructions may continue
  • Pro: Small footprint (no extra instructions/data)
  • Con: Page directory/table organization is etched in stone
• Software-Managed TLB
  • Pro: The OS can design the page directory/table
  • Pro: More advanced TLB replacement policy
  • Con: Flushes pipeline
  • Con: Performance overhead

Address Translation and Caching
• When do we do the address translation?
  • Before or after accessing the L1 cache?
• In other words, is the cache virtually addressed or physically addressed?
  • Virtual versus physical cache
• What are the issues with a virtually addressed cache?
• Synonym problem:
  • Two different virtual addresses can map to the same physical address → same physical address can be present in multiple locations in the cache → can lead to inconsistency in data

Homonyms and Synonyms
• Homonym: Same VA can map to two different PAs
  • Why?
    • VA is in different processes
• Synonym: Different VAs can map to the same PA
  • Why?
    • Different pages can share the same physical frame within or across processes
    • Reasons: shared libraries, shared data, copy-on-write pages within the same process, …
• Do homonyms and synonyms create problems when we have a cache?
  • Is the cache virtually or physically addressed?

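Cache-VM Interaction
[Figure: Three organizations. Physical cache: CPU → (VA) → TLB → (PA) → cache → lower hierarchy. Virtual (L1) cache: CPU → (VA) → cache → TLB → (PA) → lower hierarchy. Virtual-physical cache: CPU → (VA) → cache and TLB accessed side by side → (PA) → lower hierarchy.]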

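Virtually-Indexed Physically-Tagged
• If C ≤ (page_size × associativity), the cache index bits come only from page offset (same in VA and PA)
• If both cache and TLB are on chip
  • index both arrays concurrently using VA bits
  • check cache tag (physical) against TLB output at the end
[Figure: The virtual address is split into VPN and page offset. The VPN goes to the TLB, which produces the PPN and a TLB hit signal. The page offset supplies the cache index (CIndex) and cache offset (CO) for the physical cache. The tag read from the cache is compared (=) with the PPN to produce the cache hit signal, alongside the data.]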

Virtually-Indexed Physically-Tagged
• If C > (page_size × associativity), the cache index bits include VPN ⇒ Synonyms can cause problems
  • The same physical address can exist in two locations
• Solutions?
[Figure: Same organization as on the previous slide, but the cache index now extends beyond the page offset into the low bits of the VPN.]

Sanity Check
• Core 2 Duo: 32 KB, 8-way set associative, page size ≥ 4K
• Cache size ≤ (page_size × associativity)?
• 2^P = 4K, P = 12
  • Needs 12 bits for page offset
• 2^C = 32KB, C = 15
  • Needs 15 bits to address a byte in the cache
• 2^A = 8-way, A = 3
  • Increasing the associativity of the cache reduces the number of address bits needed to index into the cache
  • Needs 12 bits for cache index and offset, as tags are matched for blocks in the same set
• C ≤ P + A? 15 ≤ 12 + 3? True
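The same check can be written as a small C helper, using the slide's notation C, P and A for the base-2 logarithms of cache size, page size and associativity. The function names are illustrative, and all inputs are assumed to be powers of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* log2 of a power of two (number of address bits). */
static unsigned log2_u64(uint64_t x) {
    unsigned bits = 0;
    while (x > 1) { x >>= 1; bits++; }
    return bits;
}

/* True when index + block-offset bits fit inside the page offset,
 * i.e. C <= P + A in the slide's notation. */
static bool index_within_page_offset(uint64_t cache_bytes, uint64_t page_bytes,
                                     uint64_t ways) {
    unsigned C = log2_u64(cache_bytes);
    unsigned P = log2_u64(page_bytes);
    unsigned A = log2_u64(ways);
    return C <= P + A;
}
```

For the 32 KB, 8-way cache with 4 KB pages the check passes (15 ≤ 12 + 3); doubling the cache to 64 KB at the same associativity would fail it.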

Some Solutions to the Synonym Problem
• Limit cache size to (page size times associativity)
  • get index from page offset
• On a write to a block, search all possible indices that can contain the same physical block, and update/invalidate
  • Used in Alpha 21264, MIPS R10K
• Restrict page placement in OS
  • make sure index(VA) = index(PA)
  • Called page coloring
  • Used in many SPARC processors

Today
• Case study: Core i7/Linux memory system

Intel Core i7 Memory System
[Figure: Processor package with 4 cores. Each core contains registers, instruction fetch, the MMU (address translation), L1 d-cache (32 KB, 8-way), L1 i-cache (32 KB, 8-way), L1 d-TLB (64 entries, 4-way), L1 i-TLB (128 entries, 4-way), L2 unified cache (256 KB, 8-way) and L2 unified TLB (512 entries, 4-way). Shared by all cores: L3 unified cache (8 MB, 16-way), the DDR3 memory controller (32 GB/s total) to main memory, and the QuickPath interconnect links to other cores and to the I/O bridge.]

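End-to-end Core i7 Address Translation
[Figure: The CPU issues a virtual address (VA) made of the VPN (36 bits) and the VPO (12 bits). For the L1 TLB (16 sets, 4 entries/set) the VPN is split into TLBT (32 bits) and TLBI (4 bits). On a TLB hit the PPN comes from the TLB. On a TLB miss the VPN is split into VPN1, VPN2, VPN3 and VPN4 (9 bits each) and the page tables are walked starting from CR3, reading one PTE per level. The physical address (PA) is the PPN (40 bits) and the PPO (12 bits), which the L1 d-cache (64 sets, 8 lines/set) views as CT (40 bits), CI (6 bits) and CO (6 bits). An L1 hit returns the 32/64-bit result to the CPU; an L1 miss goes to L2, L3, and main memory.]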
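The field widths in the figure translate directly into shifts and masks. The sketch below splits a 48-bit virtual address and a 52-bit physical address accordingly; the struct layouts and function names are assumptions made for illustration.

```c
#include <stdint.h>

typedef struct {
    uint16_t vpn[4];    /* VPN1..VPN4, 9 bits each */
    uint16_t vpo;       /* 12-bit page offset */
    uint32_t tlbt;      /* 32-bit TLB tag */
    uint8_t  tlbi;      /* 4-bit TLB set index (16 sets) */
} va_fields_t;

static va_fields_t split_va(uint64_t va) {
    va_fields_t f;
    uint64_t vpn = (va >> 12) & 0xFFFFFFFFFull;         /* 36-bit VPN */
    f.vpo = (uint16_t)(va & 0xFFFu);
    for (int level = 0; level < 4; level++)             /* VPN1 is the top 9 bits */
        f.vpn[level] = (uint16_t)((vpn >> (9 * (3 - level))) & 0x1FFu);
    f.tlbi = (uint8_t)(vpn & 0xFu);
    f.tlbt = (uint32_t)(vpn >> 4);
    return f;
}

typedef struct { uint64_t ct; uint8_t ci, co; } pa_fields_t;

/* L1 d-cache view of a 52-bit PA: CT (40) | CI (6) | CO (6). */
static pa_fields_t split_pa(uint64_t pa) {
    pa_fields_t f;
    f.co = (uint8_t)(pa & 0x3Fu);
    f.ci = (uint8_t)((pa >> 6) & 0x3Fu);
    f.ct = (pa >> 12) & 0xFFFFFFFFFFull;                /* 40 bits */
    return f;
}
```

Note that CI and CO together occupy exactly the low 12 bits, the same bits as the page offset, which is what lets the cache be indexed before translation finishes.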

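Speeding Up L1 Access
• Observation
  • Bits that determine CI identical in virtual and physical address
  • Can index into cache while address translation taking place
  • Generally we hit in TLB, so PPN bits (CT bits) available next
  • “Virtually indexed, physically tagged”
  • Cache carefully sized to make this possible
[Figure: The virtual address (VA) is the VPN (36 bits) and the VPO (12 bits). Address translation turns the VPN into the PPN, while the VPO passes through with no change as the PPO. In the physical address (PA), CT (40 bits) corresponds to the PPN, and CI (6 bits) and CO (6 bits) come from the PPO. CI indexes the L1 cache right away; CT is used for the tag check.]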

Core i7 Level 1-3 Page Table Entries
Entry format when P=1 (bit positions in parentheses): XD (63) | Unused (62-52) | Page table physical base address (51-12) | Unused (11-9) | G (8) | PS (7) | unlabeled (6) | A (5) | CD (4) | WT (3) | U/S (2) | R/W (1) | P=1 (0)
Entry format when P=0: Available for OS (page table location on disk) | P=0 (0)
Each entry references a 4K child page table. Significant fields:
P: Child page table present in physical memory (1) or not (0).
R/W: Read-only or read-write access permission for all reachable pages.
U/S: User or supervisor (kernel) mode access permission for all reachable pages.
WT: Write-through or write-back cache policy for the child page table.
A: Reference bit (set by MMU on reads and writes, cleared by software).
PS: Page size either 4 KB or 4 MB (defined for Level 1 PTEs only).
Page table physical base address: 40 most significant bits of physical page table address (forces page tables to be 4KB aligned)
XD: Disable or enable instruction fetches from all pages reachable from this PTE.

Core i7 Level 4 Page Table Entries
Entry format when P=1 (bit positions in parentheses): XD (63) | Unused (62-52) | Page physical base address (51-12) | Unused (11-9) | G (8) | unlabeled (7) | D (6) | A (5) | CD (4) | WT (3) | U/S (2) | R/W (1) | P=1 (0)
Entry format when P=0: Available for OS (page location on disk) | P=0 (0)
Each entry references a 4K child page. Significant fields:
P: Child page is present in memory (1) or not (0)
R/W: Read-only or read-write access permission for child page
U/S: User or supervisor mode access
WT: Write-through or write-back cache policy for this page
A: Reference bit (set by MMU on reads and writes, cleared by software)
D: Dirty bit (set by MMU on writes, cleared by software)
Page physical base address: 40 most significant bits of physical page address (forces pages to be 4KB aligned)
XD: Disable or enable instruction fetches from this page.
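The entry layout can be captured with a handful of bit masks. The sketch below uses the bit positions shown on the slides; the macro and function names are invented for the example, and the polarity of R/W and U/S (1 = writable, 1 = user-accessible) follows the usual x86 convention rather than anything stated on the slide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as drawn on the slides. */
#define PTE_P    (1ull << 0)     /* present */
#define PTE_RW   (1ull << 1)     /* read-write allowed (assumed polarity) */
#define PTE_US   (1ull << 2)     /* user-mode access allowed (assumed polarity) */
#define PTE_WT   (1ull << 3)     /* write-through */
#define PTE_CD   (1ull << 4)
#define PTE_A    (1ull << 5)     /* referenced */
#define PTE_D    (1ull << 6)     /* dirty (level 4 entries only) */
#define PTE_XD   (1ull << 63)    /* execute disable */
#define PTE_BASE_MASK 0x000FFFFFFFFFF000ull    /* bits 51..12 */

static bool pte_present(uint64_t pte) { return (pte & PTE_P) != 0; }

/* Physical base address of the child page (or page table), 4KB aligned. */
static uint64_t pte_base(uint64_t pte) { return pte & PTE_BASE_MASK; }

/* A user-mode write needs P, R/W and U/S all set. */
static bool user_can_write(uint64_t pte) {
    uint64_t need = PTE_P | PTE_RW | PTE_US;
    return (pte & need) == need;
}
```

Masking with `PTE_BASE_MASK` drops both the low flag bits and the high XD/unused bits, leaving a 4KB-aligned physical address.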

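Core i7 Page Table Translation
[Figure: The virtual address is VPN1, VPN2, VPN3 and VPN4 (9 bits each) followed by the VPO (12 bits). CR3 holds the physical address of the L1 PT (page global directory). VPN1 selects an L1 PTE (512 GB region per entry), whose 40-bit base address locates the L2 PT (page upper directory). VPN2 selects an L2 PTE (1 GB region per entry), locating the L3 PT (page middle directory). VPN3 selects an L3 PTE (2 MB region per entry), locating the L4 PT (page table). VPN4 selects an L4 PTE (4 KB region per entry), which gives the physical address of the page. The physical address is the PPN (40 bits) and the PPO (12 bits); the 12-bit VPO is the offset into both the physical and the virtual page.]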
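The region sizes in the figure follow from the field widths: each level below an entry contributes 9 more address bits on top of the 12-bit page offset. A one-line helper (the name `region_bytes` is illustrative) makes the arithmetic explicit.

```c
#include <stdint.h>

/* Bytes of virtual address space covered by one entry at a given level
 * (1 = top level, 4 = last level), assuming 4KB pages and 9 index bits
 * per level as in the figure. */
static uint64_t region_bytes(unsigned level) {
    return (uint64_t)1 << (12u + 9u * (4u - level));
}
```

Levels 1 through 4 give 2^39, 2^30, 2^21 and 2^12 bytes, i.e. 512 GB, 1 GB, 2 MB and 4 KB per entry.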
