The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework
Nastaran Hajinazar, Pratyush Patel, Minesh Patel, Konstantinos Kanellopoulos, Saugata Ghose, Rachata Ausavarungnirun, Geraldo F. Oliveira, Jonathan Appavoo, Vivek Seshadri, Onur Mutlu

Executive Summary
• Motivation: Modern computing systems continue to diversify with respect to system architecture, memory technologies, and applications' memory needs
• Problem: Continually adapting the conventional virtual memory framework to each possible system configuration is challenging
  - Results in performance loss or requires non-trivial workarounds
• Goal: Design an alternative virtual memory framework that
  (1) Efficiently supports a wide variety of new system configurations
  (2) Provides the key features and eliminates the key inefficiencies of the conventional virtual memory framework
• Virtual Block Interface (VBI): Delegates memory management to dedicated hardware in the memory controller
  - Efficiently adapts to diverse system configurations
  - Reduces overheads and complexities associated with conventional virtual memory
  - Enables many optimizations (e.g., low-overhead page walks in virtual machines, virtual caches)
• Evaluation: Two example use cases
  1. VBI significantly improves performance for both native execution (2.4x) and virtual machine environments (4.3x)
  2. VBI significantly improves heterogeneous memory architecture effectiveness

Outline
• Motivation
• VBI: Virtual Block Interface
  - Key Idea & Guiding Principles
  - Design Overview
  - Optimizations Enabled by VBI
• Methodology
• Results
• Summary

Computing Systems Are Diversifying
[Figure: applications at the top, hardware at the bottom, and virtual memory, managed by the operating system, in between. Annotation: cannot adapt efficiently.]
Continually adapting the conventional virtual memory framework is challenging

Conventional Virtual Memory Framework
[Figure: processes P1, P2, ..., Pn each have a virtual address space (VAS 1, VAS 2, ..., VAS n); page tables managed by the OS map the VASes to physical memory.]
• Each process is mapped to a fixed-size virtual address space
  - e.g., 256 TB in Intel x86-64
• One-to-one mapping managed by the OS
• Per-process page tables map each VAS to physical memory
  - Managed by the OS
  - Read by hardware

Challenges
• Three examples of the challenges in adapting conventional virtual memory frameworks for increasingly diverse systems:
  - Requiring a rigid page table structure
  - High address translation overhead in virtual machines
  - Inefficient heterogeneous memory management

Challenge 1: Rigid Page Table Structures
• Flexibly customized page tables can reduce the address translation overhead
  - Customized to the application's memory behavior
    • e.g., larger granularities for more densely allocated memory regions
• Con:
  - The conventional framework requires a rigid page table structure, because the page tables are accessed by both the OS and hardware
    • e.g., fixed-granularity 4-level page table in Intel x86
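
To make the rigidity concrete, here is a minimal sketch of how a fixed 4-level x86-64 page table carves up a 48-bit virtual address. It assumes 4 KB pages (a 12-bit page offset) and 9 index bits per level; the function and constant names are illustrative and do not come from the slides.

```python
# Sketch: fixed-granularity 4-level address split, assuming 4 KB pages.
PAGE_OFFSET_BITS = 12   # 4 KB page
INDEX_BITS = 9          # 512 entries per table level
LEVELS = 4

# 12 + 4 * 9 = 48 address bits, i.e., a 256 TB virtual address space.
VA_SPACE_BYTES = 1 << (PAGE_OFFSET_BITS + INDEX_BITS * LEVELS)


def split_virtual_address(va):
    """Return ([top-level index, ..., leaf index], page offset)."""
    if not 0 <= va < VA_SPACE_BYTES:
        raise ValueError("address outside the 48-bit space")
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    indices = []
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_OFFSET_BITS + INDEX_BITS * level
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset
```

Every address goes through the same four index fields regardless of how densely the region is allocated, which is the inflexibility this slide points at.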

Challenge 2: Overheads in Virtual Machines
• In virtual machines, processes go through an extra level of indirection
• Con:
  - 2D page table walks
[Figure: above the virtualization layer, the guest OS manages the guest virtual address space and the guest page tables (guest-virtual-to-host-virtual translation); below it, the host OS manages the host virtual address space and the host page tables (host-virtual-to-host-physical translation) on top of physical memory.]
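
The cost of a 2D page table walk can be estimated with a short calculation. This is a sketch under simplifying assumptions: radix page tables on both sides, and no TLB or page walk cache hits, so every table entry is read from memory. The function names are illustrative.

```python
def native_walk_refs(levels):
    """Memory references for one native page walk: one per table level."""
    return levels


def nested_walk_refs(guest_levels, host_levels):
    """Memory references for one 2D (nested) page walk.

    Each guest table pointer, plus the final guest-translated address,
    needs a full host walk: (guest_levels + 1) * host_levels references.
    Reading the guest table entries themselves adds guest_levels more.
    """
    return (guest_levels + 1) * host_levels + guest_levels
```

With 4-level tables in both guest and host, a single translation can take up to 24 memory references instead of 4, which is why virtualized address translation is singled out as a challenge.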

Challenge 3: Managing Heterogeneous Memory
[Figure: the VAS of process P1 is mapped through OS-managed page tables onto fast memory and slow memory.]
• Enhancing performance with heterogeneous memories requires:
  - Data mapping
  - Data migration
• Con:
  - OS has low visibility into runtime memory behavior
    • Timely reaction to changes is challenging

Prior Works
• Optimizations that alleviate the overheads of the conventional virtual memory framework
Shortcomings:
• Based on specific system or workload characteristics
  - Are applicable to only limited problems or applications
• Require specialized and not necessarily compatible changes to both the OS and hardware
  - Implementing all in a system is a daunting prospect
We need a holistic solution that efficiently supports increasingly diverse system configurations

Our Goal
Design an alternative virtual memory framework that
• Efficiently and flexibly supports increasingly diverse system configurations
• Provides the key features of the conventional virtual memory framework while eliminating its key inefficiencies

Virtual Block Interface (VBI)
VBI is an alternative virtual memory framework
Key idea: Delegate physical memory management to dedicated hardware in the memory controller

VBI: Guiding Principles
• Size virtual address spaces appropriately for processes
  - Mitigates translation overheads of unnecessarily large address spaces
• Decouple address translation from access protection
  - Defers address translation until necessary to access memory
  - Enables the flexibility of managing translation and protection using separate structures
• Communicate data semantics to the hardware
  - Enables intelligent resource management
Addresses the rigidness and lack of information in current frameworks, to reduce large overheads

VBI: Overview
[Figure: side-by-side comparison. Conventional virtual memory: processes P1, P2, ..., Pn; per-process virtual address spaces VAS 1, VAS 2, ..., VAS n; page tables managed by the OS; physical memory. VBI: processes P1, P2, ..., Pn; a single VBI address space containing VB 1, VB 2, VB 3, VB 4; the Memory Translation Layer in the memory controller; physical memory.]

Virtual Blocks
• Globally visible VBI address space
  - Consists of a set of virtual blocks (VBs) of different sizes
    • Example size classes: 4 KB, 128 KB, 4 MB, 128 MB, 4 GB, 128 GB, 4 TB, 128 TB
• All VBs are visible to all processes
• Processes map each semantically meaningful unit of information to a separate VB
  - e.g., a data structure, a shared library
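
The example size classes listed above grow by a factor of 32 from one class to the next, which the following sketch reproduces. The `smallest_size_class` helper is a hypothetical selection policy added for illustration; the slides do not say how a size class is chosen for a request.

```python
KB = 1 << 10

# 4 KB, 128 KB, 4 MB, 128 MB, 4 GB, 128 GB, 4 TB, 128 TB
SIZE_CLASSES = [4 * KB * 32**k for k in range(8)]


def smallest_size_class(nbytes):
    """Hypothetical policy: pick the smallest class that fits nbytes.

    Returns (class_id, class_size_in_bytes).
    """
    if nbytes <= 0:
        raise ValueError("size must be positive")
    for class_id, size in enumerate(SIZE_CLASSES):
        if nbytes <= size:
            return class_id, size
    raise ValueError("request exceeds the largest size class")
```

Eight classes fit in a 3-bit identifier, so a size class could be encoded compactly alongside a VB's ID; whether the actual design does so is not stated on these slides.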

Inherently Virtual Caches
• VBI address space provides system-wide unique VBI addresses
• VBI addresses are directly used to access on-chip caches
  - No longer require address translation
• Pros:
  - Enables inherently virtual caches
    • No synonyms and homonyms

Hardware-Managed Memory
• Memory management is delegated to the Memory Translation Layer (MTL) in the memory controller
  - Address translation
  - Physical memory allocation
• Pros: Many benefits, including
  - Physical memory is allocated only when the location needs to be written to memory
  - No need for 2D page walks in virtual machines
  - Enabling flexible translation structures

OS-Managed Access Permissions
• OS controls which processes access which VBs
• Each process has its own permissions (read/write/execute) when attaching to a VB
• OS maintains a list of VBs attached to each process
  - Stored in a per-process table
  - Used during permission checks
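
The per-process table of attached VBs can be pictured as a small map from VB ID to permissions. The sketch below is a software model only, assuming a dictionary-backed table; the class and method names are illustrative and not taken from the slides or the paper.

```python
from enum import Flag, auto


class Perm(Flag):
    R = auto()
    W = auto()
    X = auto()


class ClientVBTable:
    """Toy per-process table: which VBs are attached, with what rights."""

    def __init__(self):
        self._entries = {}  # vb_id -> Perm

    def attach(self, vb_id, perms):
        self._entries[vb_id] = perms

    def detach(self, vb_id):
        self._entries.pop(vb_id, None)

    def check(self, vb_id, needed):
        """True only if the VB is attached and grants every needed right."""
        granted = self._entries.get(vb_id)
        return granted is not None and (needed & granted) == needed
```

For example, after `table.attach(7, Perm.R | Perm.W)`, a read of VB 7 passes the check, while an instruction fetch from VB 7 or any access to an unattached VB does not.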

Process Address Space in VBI
• Any process can attach to any VB
• A process's VBs define its address space
  - Address space size is determined by the actual needs of the process
[Figure: the VBs attached to process P1 are highlighted as the address space of process P1.]
First guiding principle: Appropriately sized virtual address spaces

Decoupled Protection and Translation
• Conventional virtual memory
  - Address mapping managed by the OS
  - Access permissions managed by the OS
• VBI
  - Address mapping managed by the MTL
  - Access permissions managed by the OS
Second guiding principle: Decoupling address translation from access protection

Address Translation Structures in VBI
• Translation structures are not shared with the OS
  - Separate structures for translation and permission information
  - Allows flexible translation structures
  - Per-VB translation structure tuned to the VB's characteristics, e.g., single-level tables for small VBs
• Pros:
  - Lowers overheads and allows for customization

VB Information
• Each VB is associated with
  - System-wide unique ID
  - Size: i.e., which size class
  - Enable bit
  - Reference counter: number of processes attached to the VB
  - Properties bit vector: semantic information about VB contents, e.g., access pattern, latency sensitive vs. bandwidth sensitive
Third guiding principle: Communicating data semantics to the hardware
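
The per-VB fields listed above can be modeled as a small record. This is a sketch with assumed semantics: the property bit positions are hypothetical, and the bookkeeping done by `enable_vb`, `attach`, and `detach` is one plausible reading of the field descriptions, not the reference implementation.

```python
from dataclasses import dataclass

# Hypothetical bit assignments for the properties bit vector.
LATENCY_SENSITIVE = 1 << 0
BANDWIDTH_SENSITIVE = 1 << 1


@dataclass
class VBInfo:
    vb_id: int            # system-wide unique ID
    size_class: int       # which size class
    enabled: bool = False
    ref_count: int = 0    # number of processes attached
    props: int = 0        # properties bit vector


def enable_vb(vb, props):
    if vb.enabled:
        raise RuntimeError("VB already enabled")
    vb.enabled = True
    vb.props = props


def attach(vb):
    if not vb.enabled:
        raise RuntimeError("cannot attach to a disabled VB")
    vb.ref_count += 1


def detach(vb):
    """Return True when no process remains attached."""
    if vb.ref_count == 0:
        raise RuntimeError("VB has no attached processes")
    vb.ref_count -= 1
    return vb.ref_count == 0
```

Hardware that can see `props` has the information it needs to, for instance, favor a latency-sensitive VB when deciding where data should live.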
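
Implementing VBI
• Please refer to our paper
  - Detailed reference implementation and microarchitecture
[Figure: VBI microarchitecture. Application code (index = request_vb(...); x = malloc(index, size); ... y = (*x);) issues a virtual address {index, offset}. In the CPU, the CVT (Client-VB Table) cache is used to form the VBI address {VBUID, offset}, which accesses the L1, L2, and last-level cache (LLC). On a miss, the request reaches the Memory Translation Layer (MTL) in the memory controller, which contains a VIT (VB Info Table) cache, a TLB, and a translation walker, and produces the physical address. Physical memory holds the VITs, CVTs, translation structures, and data. The enable_vb and attach operations are also shown.]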

Optimizations Naturally Enabled by VBI
• Many optimizations not easily attainable before
• Examples:
  - Achieved through guiding principles:
    • Appropriately sized process address space
    • Flexible address translation structures
    • Communicating data semantics to the hardware
  - Covered next:
    • Inherently virtual caches
    • Delayed physical memory allocation
    • Eliminating 2D page walks in virtual machines
  - Covered in our paper:
    • Early memory reservation mechanism

Example Optimizations
• Inherently virtual caches
• Delayed physical memory allocation
• Eliminating 2D page walks in virtual machines

Inherently Virtual Caches
[Figure: three cache organizations.
  - Conventional virtual memory, virtually-indexed physically-tagged (VIPT) cache: the core issues a virtual address; the TLB supplies the physical address and the permission check on a hit, and a TLB miss triggers a page walk.
  - Conventional virtual memory, virtually-indexed virtually-tagged (VIVT) cache: the cache is accessed with the virtual address, and the TLB is consulted on a cache miss; suffers from synonyms and homonyms.
  - VBI cache: the cache is accessed with the system-wide unique VBI address, and the MTL supplies the physical address on a cache miss; no synonyms and homonyms.]
VBI reduces address translation overhead by enabling benefits akin to VIVT caches

Delayed Physical Memory Allocation
[Figure: in conventional virtual memory with a VIPT cache, the core issues a virtual address, the TLB supplies the physical address on a hit (a TLB miss triggers a page walk), and a cache miss fetches the actual cache line from physical memory. In VBI, a cache miss to a region with no memory allocated returns a zeroed cache line from the MTL; the MTL allocates memory when the line is written back.]
In VBI:
• No address translation for accesses to regions with no allocation
• No memory accesses to regions with no allocation yet
• No memory allocation for VBs that never leave the cache during their lifetime
VBI reduces address translation overhead, improves overall performance, and reduces memory consumption
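
The behavior described above can be mimicked with a toy model of the MTL. It assumes 64-byte cache lines and tracks allocation per line purely for simplicity; the slides do not state the real allocation granularity, and `ToyMTL` and its methods are illustrative names.

```python
LINE_BYTES = 64  # assumed cache line size


class ToyMTL:
    """Toy model: memory is allocated only when a line is written back."""

    def __init__(self):
        # (vb_id, line_index) -> bytes; presence means "allocated".
        self._backing = {}

    def read_miss(self, vb_id, offset):
        """Serve a cache miss. Unallocated regions yield a zeroed line
        without allocating anything."""
        key = (vb_id, offset // LINE_BYTES)
        return self._backing.get(key, bytes(LINE_BYTES))

    def writeback(self, vb_id, offset, data):
        """A dirty line leaving the cache triggers allocation."""
        if len(data) != LINE_BYTES:
            raise ValueError("writeback must be one full cache line")
        self._backing[(vb_id, offset // LINE_BYTES)] = bytes(data)

    def allocated_lines(self):
        return len(self._backing)
```

A VB whose lines are read and modified but never evicted never calls `writeback`, so the model never allocates backing storage for it, matching the third bullet above.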

Eliminating 2D Page Walks in Virtual Machines
• Conventional virtual memory, for a process running on a virtual machine (VM):
  - Guest page tables, managed by the guest OS: guest-virtual-to-host-virtual translation
  - Host page tables, managed by the host OS: host-virtual-to-host-physical translation
• VBI:
  - Guest OS and host OS interact once to attach Process 1 to its VBs
  - MTL is the only component in the system that manages address mapping
  - MTL performs address translation and memory allocation
By eliminating 2D page walks, VBI reduces address translation overhead in virtualized environments

Methodology
• Simulator: heavily modified version of Ramulator
  - Models virtual memory components (e.g., TLBs, page tables)
  - Available at https://github.com/CMU-SAFARI/Ramulator-VBI
• Workloads: SPECspeed 2017, SPEC CPU 2006, TailBench, Graph 500
• System parameters:
  - Core: 4-wide issue, OOO, 128-entry ROB
  - L1 Cache: 32 KB, 8-way associative, 4 cycles
  - L2 Cache: 256 KB, 8-way associative, 8 cycles
  - L3 Cache: 8 MB (2 MB per core), 16-way associative, 31 cycles
  - L1 DTLB:
    • 4 KB pages: 64-entry, fully associative
    • 2 MB pages: 32-entry, fully associative
  - L2 DTLB: 4 KB and 2 MB pages: 512-entry, 4-way associative
  - Page Walk Cache: 32-entry, fully associative
  - DRAM: DDR3-1600, 1 channel, 1 rank/channel, 8 banks/rank
  - PCM: PCM-800, 1 channel, 1 rank/channel, 8 banks/rank
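
As a rough aid for reading the baseline TLB parameters, the amount of memory each TLB can cover (its reach) is simply entries times page size. The sketch below only does that arithmetic on the numbers listed above; the variable names are illustrative.

```python
KB = 1 << 10
MB = 1 << 20
GB = 1 << 30


def tlb_reach(entries, page_size):
    """Bytes of memory covered when every entry maps a distinct page."""
    return entries * page_size


L1_REACH_4K = tlb_reach(64, 4 * KB)    # 256 KB
L1_REACH_2M = tlb_reach(32, 2 * MB)    # 64 MB
L2_REACH_4K = tlb_reach(512, 4 * KB)   # 2 MB
L2_REACH_2M = tlb_reach(512, 2 * MB)   # 1 GB
```

With 4 KB pages, even the L2 DTLB covers only 2 MB, less than the 8 MB L3 cache, which gives a sense of why translation overhead matters for workloads with large working sets.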

Use Case 1: Address Translation
• The impact of VBI on reducing the address translation overhead in both native execution and virtual machines
• Evaluated systems:
  - Three baselines:
    • Native: applications run natively on an x86-64 system
    • Virtual: applications run inside a virtual machine (accelerated using 2D page walk cache [Bhargava+, ASPLOS'08])
    • Perfect TLB: an unrealistic version of Native with no translation overhead
  - One VBI configuration:
    • VBI-Full: VBI with all the optimizations that it enables
• See our paper for results on more system configurations
[Figure: bar chart of speedup normalized to Native (y-axis from 0.0 to 2.5) for Virtual, Perfect TLB, and VBI-Full across astar, bzip2, GemsFDTD, mcf, milc, namd, sjeng, bwaves-17, deepsjeng-17, lbm-17, omnetpp-17, img-dnn, moses, Graph500, and AVG. Two off-scale bars are labeled 13.3 and 8.9. Callouts shown across the animation builds: 0.7x, 1.9x, 2.4x, 4.3x, 49%.]
VBI significantly improves performance in both native execution and virtual machines

Use Case 2: Memory Heterogeneity
• The benefits of VBI in harnessing the full potential of heterogeneous memory architectures
  - Hybrid PCM–DRAM memory architecture
• Evaluated systems:
  - Two baselines:
    • Hotness-Unaware PCM–DRAM: unaware of the data hotness
    • IDEAL: always maps frequently accessed data to DRAM
  - One VBI configuration:
    • VBI PCM–DRAM: VBI maps and migrates frequently accessed VBs to the DRAM
• More in our paper:
  - Similar performance improvement for Tiered-Latency DRAM [Lee+, HPCA'13]
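
To illustrate what mapping frequently accessed VBs to DRAM could look like, here is a deliberately simple greedy placement sketch. It is a hypothetical policy written for this note, not the mechanism evaluated in the paper: it assumes per-VB access counts and sizes are known and fills DRAM with the hottest VBs that still fit.

```python
def place_vbs(access_counts, vb_sizes, dram_capacity):
    """Greedy hotness-aware placement (hypothetical policy).

    access_counts: vb_id -> observed accesses
    vb_sizes:      vb_id -> size in bytes
    Returns vb_id -> "DRAM" or "PCM".
    """
    placement = {}
    free = dram_capacity
    # Hottest first; ties broken by VB ID for determinism.
    for vb in sorted(access_counts, key=lambda v: (-access_counts[v], v)):
        if vb_sizes[vb] <= free:
            placement[vb] = "DRAM"
            free -= vb_sizes[vb]
        else:
            placement[vb] = "PCM"
    return placement
```

Re-running such a placement as access counts change, and moving the VBs whose assignment flips, would be the migration half of the problem; the runtime visibility that makes this timely is what the slides attribute to hardware-side management.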
[Figure: bar chart of speedup normalized to Hotness-Unaware PCM–DRAM (y-axis from 0.0 to 2.5) for VBI PCM–DRAM and IDEAL across astar, bzip2, GemsFDTD, hmmer, mcf, milc, soplex, sphinx3, bwaves-17, lbm-17, omnetpp-17, xalancbmk-17, img-dnn, moses, Graph500, and AVG. Callout: 33%.]
VBI enables efficient data mapping and data migration for heterogeneous memory systems

Summary
• Virtual Block Interface (VBI): A new virtual memory framework
  - Addresses the challenges in adapting conventional virtual memory to increasingly diverse system configurations and workloads
• Key Idea: Delegate physical memory management to dedicated hardware in the memory controller
• Benefits: Not easily attainable in conventional virtual memory (e.g., inherently virtual caches, delaying physical memory allocation, and avoiding 2D page walks in virtual machines)
• Evaluation:
  - VBI significantly improves performance in both native execution and virtual machines
  - Increases the effectiveness of managing heterogeneous memory architectures
• Conclusion: VBI is a promising new virtual memory framework
  - Can enable several important optimizations
  - Increases design flexibility for virtual memory
  - A new direction for future work in novel virtual memory frameworks