Computer Architecture
Virtual Memory (VM)
By Dan Tsafrir, 10/6/2011. Presentation based on slides by Lihu Rappoport
http://www.youtube.com/watch?v=3ye2OXj32DM (funny beginning)
DRAM (dynamic random-access memory)
Corsair 1333 MHz DDR3 laptop memory
Price (at amazon.com):
– $43 for 4 GB
– $79 for 8 GB
“The physical memory”
VM – motivation
Provides isolation between processes
– Processes can concurrently run on a single machine
– VM prevents them from accessing the memory of one another
– (But still allows for convenient sharing when required)
Provides illusion of large memory
– VM size can be bigger than physical memory size
– VM decouples the program from the real memory size (which can differ across machines)
Provides illusion of contiguous memory
– Programmers need not worry about where data is placed exactly
Allows for dynamic memory growth
– Can add memory to processes at runtime as needed
Allows for memory overcommitment
– Sum of VM spaces (across all processes) can be >= physical memory
– DRAM is often one of the most costly parts in the system
VM – terminology
Virtual address space
– Space used by the programmer
– “Ideal” = contiguous & as big as you’d like
Physical address
– The real, underlying physical memory address
– Completely abstracted away by OS/HW
VM – basic idea
Divide memory (virtual & physical) into fixed size blocks
– “page” = chunk of contiguous data in virtual space
– “frame” = physical memory exactly enough to hold one page
– |page| = |frame| (= size)
– page size = power of 2 = 2^k bytes
– By default, k=12 almost always => page size is 4KB
While virtual address space is contiguous
– Pages can be mapped into arbitrary frames
Pages can reside
– In memory or on disk (hence, overcommitment)
All programs are written using the VM address space
– HW does on-the-fly translation from virtual to physical addresses
– Use a page table to translate between virtual and physical addresses
VM – simplistic illustration
Memory acts as a cache for the secondary storage (disk)
Immediate advantages
– Illusion of contiguity & of having more physical memory
– Program’s actual location is unimportant
– Dynamic growth, isolation, & sharing are easy to obtain
[Figure: pages of the virtual space are mapped via address translation to frames in DRAM or to the disk]
Translation – use a “page table”
virtual address (64 bit) = virtual page number (52 bit) + page offset (12 bit)
physical address (32 bit) = physical frame number (20 bit) + page offset (12 bit)
(page size is typically 2^12 bytes = 4KB)
How to map the virtual page number to a physical frame number?
The page table base register points to the page table in memory; the virtual page number indexes into it.
Each entry, a “PTE” (page table entry), holds: a valid bit (V), a dirty bit (D), access-control bits (AC), and the physical frame number.
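To make the mapping concrete, here is a minimal C sketch of the lookup, assuming a flat (single-level) page table, 4KB pages, and the illustrative PTE layout above; a real 64-bit machine uses multi-level tables, and all names here are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* page size = 2^12 = 4KB */
#define PAGE_OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

/* Illustrative PTE layout: valid/dirty/access-control bits + frame number */
typedef struct {
    uint32_t valid : 1;
    uint32_t dirty : 1;
    uint32_t ac    : 3;     /* access-control bits (e.g., R/W/X) */
    uint32_t frame : 20;    /* physical frame number             */
} pte_t;

/* Translate a virtual address using a flat page table; returns false on a
 * page fault (valid == 0), in which case the OS must bring the page in. */
bool translate(const pte_t *page_table, uint64_t va, uint32_t *pa)
{
    uint64_t vpn    = va >> PAGE_SHIFT;         /* virtual page number */
    uint32_t offset = va &  PAGE_OFFSET_MASK;   /* page offset         */
    pte_t    pte    = page_table[vpn];          /* one memory access   */

    if (!pte.valid)
        return false;                           /* page fault */

    *pa = (pte.frame << PAGE_SHIFT) | offset;   /* 20-bit frame + offset */
    return true;
}
```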
Page tables
[Figure: for each virtual page number, a valid PTE (valid=1) points to a frame in physical memory; an invalid PTE (valid=0) holds the page’s disk address]
Checks
If ( valid == 1 )
  page is in main memory, at the frame address stored in the table
  data is readily available (e.g., can be copied to the cache)
else /* page fault */
  need to fetch the page from disk
  causes a trap, usually accompanied by a context switch:
  the current process is suspended while the page is fetched from disk
Access control
– R=read-only, R/W=read/write, X=execute
– If ( access type incompatible with specified access rights )
  protection violation fault
  traps to the fault handler
Demand paging
– Pages are fetched from secondary memory only upon the first fault
– Rather than, e.g., upon file open
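The two checks above, sketched in C (reusing pte_t from the translation sketch; the AC bit encoding and the enum names are assumptions for illustration):

```c
/* Illustrative access types and outcomes */
typedef enum { ACC_READ, ACC_WRITE, ACC_EXEC } access_t;
typedef enum { OK, PAGE_FAULT, PROTECTION_FAULT } check_t;

#define AC_R 0x1   /* read allowed    */
#define AC_W 0x2   /* write allowed   */
#define AC_X 0x4   /* execute allowed */

check_t check_access(pte_t pte, access_t type)
{
    if (!pte.valid)
        return PAGE_FAULT;            /* trap to OS: fetch the page from disk */

    /* compare the requested access type against the AC bits */
    if ((type == ACC_READ  && !(pte.ac & AC_R)) ||
        (type == ACC_WRITE && !(pte.ac & AC_W)) ||
        (type == ACC_EXEC  && !(pte.ac & AC_X)))
        return PROTECTION_FAULT;      /* trap to the fault handler */

    return OK;                        /* translation may proceed */
}
```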
Page replacement
Page replacement policy
– Decides which page to evict to disk
LRU (least recently used)
– Typically too wasteful (must be updated upon each memory reference)
FIFO (first in first out)
– Simplest: no need to update upon references, but ignores usage
Second-chance
– Set a per-page “was it referenced?” bit (can be done by HW or SW)
– Swap out the first page with bit = 0, in FIFO order
– When traversed, if bit = 1, set it to 0 and push the associated page to the end of the list (in FIFO terms, the page becomes newest)
Clock (sketched below)
– More efficient variant of second-chance
– Pages are cyclically ordered (no FIFO); search clockwise for the first page with bit=0; set bit=0 for pages that have bit=1
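A minimal C sketch of the clock variant, assuming a hypothetical per-frame referenced bit (the frame_t layout and function name are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-frame bookkeeping for the clock algorithm (illustrative layout) */
typedef struct {
    bool referenced;   /* set by HW/SW on every access to the page */
    /* ... frame number, owning page, etc. ... */
} frame_t;

/* Clock = second-chance over a circular list: starting at *hand, skip
 * frames whose referenced bit is set (clearing it as we go) and evict
 * the first frame found with referenced == 0. */
size_t clock_pick_victim(frame_t *frames, size_t nframes, size_t *hand)
{
    for (;;) {
        frame_t *f = &frames[*hand];
        if (!f->referenced)
            break;                       /* victim found */
        f->referenced = false;           /* give it a second chance */
        *hand = (*hand + 1) % nframes;   /* advance the clock hand   */
    }
    size_t victim = *hand;
    *hand = (*hand + 1) % nframes;       /* next search starts after the victim */
    return victim;
}
```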
Page replacement – cont.
NRU (not recently used)
– More sophisticated LRU approximation
– HW or SW maintains per-page ‘referenced’ & ‘modified’ bits
– Periodically (clock interrupt), SW turns ‘referenced’ off
– Replacement algorithm partitions pages into:
  Class 0: not referenced, not modified
  Class 1: not referenced, modified
  Class 2: referenced, not modified
  Class 3: referenced, modified
– Choose at random a page from the lowest nonempty class for removal
– Underlying principles (order is important):
  Prefer keeping referenced over unreferenced
  Prefer keeping modified over unmodified
– Can a page be modified but not referenced?
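A rough C sketch of the NRU victim selection under these assumptions (the page_bits_t layout and function names are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-page status bits, maintained by HW or SW */
typedef struct {
    bool referenced;
    bool modified;
} page_bits_t;

/* NRU class: 2*referenced + modified (class 0 = not referenced, not
 * modified ... class 3 = referenced and modified). */
static int nru_class(page_bits_t p)
{
    return ((p.referenced ? 1 : 0) << 1) | (p.modified ? 1 : 0);
}

/* Pick a victim: a random page from the lowest nonempty class. */
size_t nru_pick_victim(const page_bits_t *pages, size_t npages)
{
    for (int cls = 0; cls <= 3; cls++) {
        size_t count = 0;
        for (size_t i = 0; i < npages; i++)
            if (nru_class(pages[i]) == cls)
                count++;
        if (count == 0)
            continue;
        size_t pick = (size_t)rand() % count;   /* random page within the class */
        for (size_t i = 0; i < npages; i++)
            if (nru_class(pages[i]) == cls && pick-- == 0)
                return i;
    }
    return 0;   /* not reached when npages > 0 */
}
```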
Page replacement – advanced
ARC (adaptive replacement cache)
– Factors in not only recency (when last accessed), but also frequency (how many times accessed)
– User determines which factor has more weight
– Better (but more wasteful) than LRU
– Developed by IBM: Nimrod Megiddo & Dharmendra Modha
– Details: http://www.usenix.org/events/fast03/tech/full_papers/megiddo/megiddo.pdf
CAR (clock with adaptive replacement)
– Similar to ARC, and comparable in performance
– But, unlike ARC, doesn’t require user-specified parameters
– Likewise developed by IBM: Sorav Bansal & Dharmendra Modha
– Details: http://www.usenix.org/events/fast04/tech/full_papers/bansal/bansal.pdf
Page faults
Page faults: the data is not in memory => retrieve it from disk
– CPU detects the situation (valid=0)
– But it cannot remedy the situation (it doesn’t know the disk; that’s the OS’s job)
– Thus, it must trap to the OS
– OS loads the page from disk
  Possibly writing a victim page to disk (if no room & if dirty)
  Possibly avoiding the disk read thanks to the OS “buffer cache”
– OS updates the page table (valid=1)
– OS resumes the process; now, HW will retry & succeed!
A page fault incurs a significant penalty
– “Major” page fault = must go get the page from disk
– “Minor” page fault = the page already resides in the OS buffer cache
  Possible only for files; not for “anonymous” spaces like the stack
– => pages shouldn’t be too small (as noted, typically 4KB)
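A schematic C sketch of these steps; the helper functions are hypothetical placeholders rather than a real OS API, and pte_t is the illustrative layout from the translation sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical OS helpers -- placeholders, not a real kernel API */
extern size_t pick_victim_frame(void);
extern bool   frame_is_dirty(size_t frame);
extern void   write_page_to_disk(size_t frame);
extern bool   in_buffer_cache(uint64_t vpn);
extern void   copy_from_buffer_cache(uint64_t vpn, size_t frame);
extern void   read_page_from_disk(uint64_t vpn, size_t frame);

/* Schematic page-fault handler */
void handle_page_fault(pte_t *pte, uint64_t faulting_vpn)
{
    size_t frame = pick_victim_frame();          /* make room (may pick a free frame) */

    if (frame_is_dirty(frame))
        write_page_to_disk(frame);               /* write back the dirty victim */

    if (in_buffer_cache(faulting_vpn))
        copy_from_buffer_cache(faulting_vpn, frame);   /* minor fault: no disk read */
    else
        read_page_from_disk(faulting_vpn, frame);      /* major fault: go to disk   */

    pte->frame = (uint32_t)frame;                /* update the page table ...       */
    pte->valid = 1;                              /* ... and mark the page present   */
    /* resume the process: HW retries the faulting access and now succeeds */
}
```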
Page size
Smaller page size (typically 4KB)
– PROS: minimizes internal fragmentation
– CONS: increases the size of the page table
Bigger page size (called “superpages” if > 4K)
– PROS:
  Amortizes disk access cost
  May prefetch useful data
  May discard useless data early
– CONS:
  Increased fragmentation
  Might transfer unnecessary info at the expense of useful info
Lots of work to increase page size beyond 4K
– HW has supported it for years; the OS is the “bottleneck”
– Attractive because:
  Bigger DRAMs, increasing memory/disk performance gap
TLB (translation lookaside buffer)
Page table resides in memory
– Each translation requires a memory access
– Might be required for each load/store!
TLB
– Caches recently used PTEs
– Speeds up translation
– typically 128 to 256 entries
– usually 4 to 8 way associative
– TLB access time is comparable to L1 cache access time
[Flow: virtual address → TLB access; on a TLB hit, the physical address is produced immediately; on a miss, access the page table in memory]
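A rough C sketch of a TLB lookup, assuming an illustrative 4-way, 64-set organization (sizes, names, and the entry layout are assumptions):

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_WAYS 4
#define TLB_SETS 64   /* 4 ways x 64 sets = 256 entries */

typedef struct {
    bool     valid;
    uint64_t tag;     /* upper bits of the virtual page number */
    uint32_t frame;   /* cached physical frame number          */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SETS][TLB_WAYS];

/* Look up a VPN in the TLB; on a hit return the frame number, otherwise
 * the caller falls back to the in-memory page table (an extra access). */
bool tlb_lookup(uint64_t vpn, uint32_t *frame)
{
    uint64_t set = vpn % TLB_SETS;    /* low VPN bits select the set */
    uint64_t tag = vpn / TLB_SETS;    /* remaining bits form the tag */

    for (int way = 0; way < TLB_WAYS; way++) {
        tlb_entry_t *e = &tlb[set][way];
        if (e->valid && e->tag == tag) {
            *frame = e->frame;        /* TLB hit: no page-table access */
            return true;
        }
    }
    return false;                     /* TLB miss: walk the page table */
}
```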
Making Address Translation Fast
The TLB is a cache for recent address translations:
[Figure: the TLB holds (valid, tag, physical page) entries for recently used translations; the full page table, indexed by virtual page number, holds a valid bit and either a physical page in memory or a disk address]
TLB Access
[Figure: 4-way set-associative TLB lookup — the virtual page number is split into tag and set; the set bits select a TLB set, the tag is compared against all ways, and on a hit a way MUX selects the matching PTE]
Unified L2
L2 is unified (no separation for data/instructions) – like the main memory
– In case of a miss in either d-L1, i-L1, d-TLB, or i-TLB => try to get the missed data from L2
– PTEs can and do reside in L2
[Figure: the instruction TLB and L1 instruction cache, and the data TLB and L1 data cache, all fetch missing translations/data from the unified L2 cache, which is backed by memory]
VM & cache
TLB access is serial with cache access => performance is crucial!
Page table entries can be cached in the L2 cache (as data)
[Flow: virtual address → TLB access; on a TLB miss, access the page table in memory (its entries may themselves hit in L2); with the physical address, access the L1 cache; on an L1 miss access L2, and on an L2 miss access memory]
Overlapped TLB & cache access
#Set is not contained within the Page Offset
The #Set is not known until the physical page number is known
Cache can be accessed only after address translation is done
VM view of a physical address: Physical Page Number [29:12], Page offset [11:0]
Cache view of a physical address: tag [29:14], set [13:6], disp [5:0]
Overlapped TLB & cache access (cont)
In the example below, the #Set is contained within the Page Offset
The #Set is known immediately
Cache can be accessed in parallel with address translation
Once translation is done, match upper bits with tags
Limitation: Cache ≤ (page size × associativity)
Virtual Memory view of a physical address: Physical Page Number [29:12], Page offset [11:0]
Cache view of a physical address: tag [29:12], set [11:6], disp [5:0]
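For example, with 4KB pages an 8-way set-associative cache can be at most 8 × 4KB = 32KB while keeping all of its set bits inside the page offset; a larger (or less associative) cache needs set bits above bit 11, which are known only after translation.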
Overlapped TLB & cache access (cont)
[Figure: the set bits of the page offset select the cache set while, in parallel, the virtual page number is looked up in the TLB; the physical page number from the TLB is then compared against the cache tags (way MUX) to determine hit/miss and select the data]
Overlapped TLB & cache access (cont)
Assume the cache is 32K byte, 2-way set-associative, 64 byte/line
– (2^15 / 2 ways) / (2^6 bytes/line) = 2^(15-1-6) = 2^8 = 256 sets
In order to still allow overlap between set access and TLB access
– Take the upper two bits of the set number from bits [1:0] of the VPN
  Physical_addr[13:12] may be different than virtual_addr[13:12]
– The tag is comprised of bits [31:12] of the physical address
  The tag may mis-match bits [13:12] of the physical address
– Cache miss: allocate the missing line according to its virtual set address and physical tag
Cache view of the physical address: tag [29:14], set [13:6], disp [5:0]; set bits [13:12] are taken from VPN[1:0], while set bits [11:6] and disp come from the page offset
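A small C sketch of this set-index trick (the bit positions follow the 32KB / 2-way / 64 byte/line example above; the function name is illustrative):

```c
#include <stdint.h>

/* For a 32KB, 2-way, 64 byte/line cache (256 sets, set bits [13:6]): the
 * lower six set bits come from the page offset, and the top two come from
 * VPN[1:0] of the *virtual* address, so the set can be selected before
 * translation completes. */
static unsigned cache_set_index(uint64_t va)
{
    unsigned offset_bits = (va >> 6)  & 0x3F;  /* bits [11:6]: same in VA and PA   */
    unsigned vpn_low2    = (va >> 12) & 0x3;   /* VPN[1:0], i.e. virtual bits [13:12] */
    return (vpn_low2 << 6) | offset_bits;      /* 8-bit set index, 256 sets        */
}
```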
DMA (direct memory access)
DMA copies a page from/to, e.g., the disk controller (or another I/O device)
– Accesses memory without requiring CPU involvement
– Assume we copy from memory to disk (swap out a page)
– Read each relevant block:
  Snoop-invalidate if it resides in a cache (L1, L2), meaning:
    if it is modified, copy the line from the cache into memory
    invalidate the cache line
– Write the line to the disk controller
– This means that when a page is swapped out of memory
  All data in the caches that belongs to that page is invalidated
  The page on the disk is up-to-date
In the page table
– Assign 0 to the valid bit in the PTEs of swapped-out pages
– The rest of the PTE bits may be used by the OS for keeping the location of the page on disk
– The TLB entry of a swapped-out page is likewise invalidated
Context switch
Each process has its own address space
– Akin to saying “each process has its own page table”
– OS allocates frames for a process => updates the process’s page table
– If only one PTE points to a frame throughout the system
  Only the associated process can access the corresponding frame
– Shared memory
  Two PTEs of two processes point to the same frame
Upon context switching
– Save the current architectural state to memory:
  Architectural registers, including
  The register that holds the page table base address in memory
– Flush the TLB
  As the same virtual addresses are routinely reused
  (Recently, “VPID” was added to the TLBs of some x86s => no need to flush)
– Load the new architectural state from memory
  Architectural registers
  The register that holds the page table base address in memory
Virtually-addressed cache
The cache uses virtual addresses (tags are virtual)
Only requires address translation on a cache miss
– TLB is not in the path to a cache hit! But…
Aliasing: >=2 virtual addresses mapped to the same physical address
– => >=2 cache lines holding data of the same physical address
– => Must update all cache entries with the same physical address
[Figure: the CPU sends the VA directly to the cache; on a hit, data is returned without translation; only on a miss is the VA translated to a PA to access main memory]
Virtually-addressed cache (cont.)
Cache must be flushed at task switch
– Possible solution: include a unique process ID (PID) in the tag (like the VPID we discussed earlier)