VIRTUAL MEMORY
Slides by: Pedro Tomás
Additional reading: “Computer Architecture: A Quantitative Approach”, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011
ADVANCED COMPUTER ARCHITECTURES
ARQUITECTURAS AVANÇADAS DE COMPUTADORES (AAC)
Advanced Computer Architectures, 2014
Outline
Introduction to virtual memory
Virtual memory as a cache
Basic translation scheme
Multilevel translation schemes
Benefits
Virtual memory systems were developed to let programmers manage both the RAM and the Hard Disk Drive (HDD) memory in a simple and uniform way.
Virtual memory allows extending the available memory to the sum of the RAM plus the HDD
Virtual memory is fundamental in modern computing systems for multi-tasking purposes:
It simplifies sharing the physical memory between the multiple running tasks
It allows tasks to be loaded into (and executed from) any memory address range
It does not force programs to be re-compiled if a given address range is already occupied
Virtual memory as a cache
Virtual memory works similarly to cache systems
The HDD is considered as the larger memory system
The RAM memory works as a cache of the pages stored in HDD
In practice, pages are typically either on RAM memory or on HDD, and not on both.
The basic “cache” blocks are much larger in size:
In virtual memory systems with fixed-size blocks, the blocks are named pages
In virtual memory systems with variable-size blocks, the blocks are named segments
Locality principles are used to keep most used blocks in RAM memory
Allows reducing the memory access time
Write-back, write allocate policy to decrease the time to write data
Pre-fetching techniques can be used to decrease memory access time
Page misses (page faults) typically result in a large overhead for data copy
Page misses are managed by the operating system
Since the overhead is large, complex page replacement policies can be used
When a page fault occurs, the faulting process is typically removed from execution while the page is loaded
Basic translation scheme
The programmer accesses data using a virtual address
Each process has a distinct virtual address space
The processor checks if the page holding the virtual address is cached in RAM memory
If the page is not in RAM, a page fault occurs
If the page is in RAM memory, the processor translates the virtual address into the physical address (i.e., the location) in RAM memory
Basic translation scheme
Consider a 32-bit processor and a virtual memory system with pages of 4KB
Each memory access (for instruction fetch or for load/store) uses a 32-bit virtual address
The address can be decomposed into a virtual page number and a page offset
[Figure: 32-bit virtual address (bits 31..0) split into page number (bits 31..12) and page offset (bits 11..0)]
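The decomposition can be sketched in a few lines of Python. This is only an illustration under the slide's parameters (32-bit addresses, 4KB pages); the function name and the sample address are made up for the example.

```python
PAGE_SIZE = 4 * 1024                       # 4KB pages, as in the slide
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # log2(4KB) = 12

def split_virtual_address(vaddr):
    """Return (virtual page number, page offset) for a 32-bit address."""
    vaddr &= 0xFFFFFFFF                    # keep only 32 bits
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

vpn, offset = split_virtual_address(0x12345ABC)   # arbitrary sample address
print(hex(vpn), hex(offset))   # 0x12345 0xabc
```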
Basic translation scheme
The page number is used to address a page table that stores the location
of the given page in physical memory
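[Figure: the page number (bits 31..12 of the virtual address) indexes the PAGE TABLE, which returns the page base (bits 31..12 of the physical address); the page offset (bits 11..0) is copied unchanged, giving the physical address of the data in RAM memory]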
The page table makes the translation between virtual and physical address
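A minimal sketch of this lookup, assuming the page table can be modelled as a Python dictionary from virtual page number to physical page base. The `PageFault` exception, the `translate` function and the mapping are hypothetical names and values chosen for the example.

```python
PAGE_SIZE = 4096
OFFSET_BITS = 12

class PageFault(Exception):
    """Raised when the requested page is not present in RAM."""

def translate(page_table, vaddr):
    vpn, offset = vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)
    page_base = page_table.get(vpn)          # physical page number, or None
    if page_base is None:
        raise PageFault(f"virtual page {vpn:#x} is not in RAM")
    return (page_base << OFFSET_BITS) | offset   # offset is copied unchanged

page_table = {0x12345: 0x00042}              # hypothetical mapping
print(hex(translate(page_table, 0x12345ABC)))    # 0x42abc
```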
Basic translation scheme
The page table must be stored in memory
each process must have its own page table
Each process has a page table base pointer
The page table base pointer is a special register associated with the running thread, which must be saved whenever there is a context switch
[Figure: the page table pointer, combined with the page number of the virtual address, gives the physical address of the corresponding page table entry]
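As a rough illustration, the address of a page table entry can be computed from the page table base pointer and the virtual page number. The 4B entry size is an assumption borrowed from the worked example later in the deck, and the base address is a made-up value.

```python
PTE_SIZE = 4        # bytes per entry (assumed; matches the later worked example)
OFFSET_BITS = 12    # 4KB pages

def pte_address(page_table_base, vaddr):
    """Physical address of the PTE that translates `vaddr`."""
    vpn = (vaddr & 0xFFFFFFFF) >> OFFSET_BITS
    return page_table_base + vpn * PTE_SIZE

# Hypothetical base pointer of the running process.
print(hex(pte_address(0x00100000, 0x00003ABC)))   # 0x10000c
```

On a context switch the operating system would save this base pointer and load the one belonging to the next process.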
[Figure: RAM memory contents: empty space; Process A page table; Process A virtual page x; Process B virtual page y; Process A virtual page z (data)]
Basic translation scheme
Consider a virtual memory system with pages of 4KB, virtual and physical addresses of 32 bits and page table entries of 4B
What is the size of the page table for a process occupying 128MB of virtual space?
Solution:
#Virtual pages = 128MB / 4KB = $2^{27} / 2^{12} = 2^{15}$
Page table size = $2^{15}$ (pages) × 4B (per entry) = $2^{17}$B = 128KB > 4KB!
Page tables can use a large amount of space in RAM memory
This problem can be mitigated by paging the page table
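The arithmetic can be checked with a short script. The function name and default arguments are chosen for this example only.

```python
KB, MB = 1024, 1024 * 1024

def page_table_size(virtual_space, page_size=4 * KB, pte_size=4):
    """Bytes needed by a single-level page table covering `virtual_space` bytes."""
    num_pages = virtual_space // page_size
    return num_pages * pte_size

size = page_table_size(128 * MB)
print(size // KB, "KB")   # 128 KB
```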
Multilevel translation
Consider a virtual memory system with pages of 4KB, virtual and physical addresses of 32 bits and page table entries (PTEs) of 4B
How many levels are required to guarantee that each page (data/page table/directory) is at most 4KB?
Solution:
Page offset (bits 11..0): log2(4KB) = 12 bits
Page table offset (bits 21..12): each page table has at most 4KB; since each page table entry uses 4B, a total of 1K entries can be stored in each page table, so log2(1K) = 10 bits
Directory offset (bits 31..22): the directory also has at most 4KB, i.e., 1K entries, so log2(1K) = 10 bits
Two levels (directory plus page tables) are therefore enough: 10 + 10 + 12 = 32 bits
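The same reasoning can be written as a small helper that derives the number of levels from the address width, page size and PTE size. It is a sketch that assumes every table occupies exactly one page and all sizes are powers of two; the function name is hypothetical.

```python
def translation_levels(vaddr_bits, page_size, pte_size):
    offset_bits = page_size.bit_length() - 1                # log2(page size)
    index_bits = (page_size // pte_size).bit_length() - 1   # log2(entries per table)
    levels = -(-(vaddr_bits - offset_bits) // index_bits)   # ceiling division
    return levels, index_bits, offset_bits

print(translation_levels(32, 4096, 4))   # (2, 10, 12)
print(translation_levels(48, 4096, 8))   # (4, 9, 12)
```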
Multilevel translation
Consider a virtual memory system with pages of 4KB, virtual addresses of 48 bits, physical addresses of 54 bits and page table entries (PTEs) of 8B (one of the possible combinations in the Intel IA-32e architecture)
Explain how the virtual address 0001 34AB FFA0h is translated into a physical address
Solution:
Each page table has at most 4KB; since each page table entry uses 8B, a total of 512 entries can be stored in each page table, so each level is indexed with log2(512) = 9 bits
Page offset (bits 11..0): log2(4KB) = 12 bits
PT (Level 1) offset (bits 20..12): 9 bits
PT (Level 2) offset (bits 29..21): 9 bits
PT (Level 3) offset (bits 38..30): 9 bits
Directory offset (bits 47..39): 9 bits
Multilevel translation
Solution: fields of the virtual address 0001 34AB FFA0h
Directory offset (bits 47..39): 0000 0000 0 = entry 0
PT (Level 3) offset (bits 38..30): 000 0001 00 = entry 4
PT (Level 2) offset (bits 29..21): 11 0100 101 = entry 421
PT (Level 1) offset (bits 20..12): 0 1011 1111 = entry 191
Page offset (bits 11..0): 1111 1010 0000 = offset 4000
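The field values can be reproduced with shifts and masks. This is a sketch for the 48-bit, four-level layout of this exercise; the function name is made up.

```python
def split_48bit(vaddr):
    """Return ([directory, level 3, level 2, level 1 indices], page offset)."""
    offset = vaddr & 0xFFF                                   # bits 11..0
    indices = [(vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indices, offset

indices, offset = split_48bit(0x000134ABFFA0)
print(indices, offset)   # [0, 4, 421, 191] 4000
```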
Multilevel translation
Step 1 – Read directory entry
[Figure: RAM MEMORY holds Virtual Page 000134ABFh (Process A), Virtual Page 34 (Process B), Virtual Page 1 (Process C), Virtual Page 1 (Process B) and, for Process A, the Directory (Level 4), Page Table 0 of Level 3, Page Table 4 of Level 2 and Page Table 421 of Level 1]
The directory page pointer (Process A) gives the base of the directory table (42 bits)
PHYSICAL ADDRESS = base of directory table (42 bits) | entry 0 = 000000000 (9 bits, virtual address bits 47..39) | 000 (3 bits, required to address the 8 bytes of the PTE)
The entry read, LEVEL 4 - PTE 0, holds the base address of Page table 0, level 3 (data) plus control bits
Multilevel translation
Step 2 – Read level 3 entry
PHYSICAL ADDRESS = base address of Page table 0, level 3 (42 bits, from LEVEL 4 - PTE 0) | entry 4 = 000000100 (9 bits, virtual address bits 38..30) | 000 (3 bits)
The entry read, LEVEL 3 - PTE 4, holds the base address of Page table 4, level 2 (data) plus control bits
Multilevel translation
Step 3 – Read level 2 entry
PHYSICAL ADDRESS = base address of Page table 4, level 2 (42 bits, from LEVEL 3 - PTE 4) | entry 421 = 110100101 (9 bits, virtual address bits 29..21) | 000 (3 bits)
The entry read, LEVEL 2 - PTE 421, holds the base address of Page table 421, level 1 (data) plus control bits
Multilevel translation
Step 4 – Read level 1 entry
PHYSICAL ADDRESS = base address of Page table 421, level 1 (42 bits, from LEVEL 2 - PTE 421) | entry 191 = 010111111 (9 bits, virtual address bits 20..12) | 000 (3 bits)
The entry read, LEVEL 1 - PTE 191, holds the base of the physical page that stores virtual page 000134ABFh (data) plus control bits
Multilevel translation
Step 5 – Read the data
PHYSICAL ADDRESS = base of virtual page 000134ABFh (42 bits, from LEVEL 1 - PTE 191) | offset 4000 = 111110100000 (12 bits, virtual address bits 11..0)
The data is read from this address
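The five steps can be sketched as a loop that reads one entry per level and then appends the page offset. This is a simplified model, not the processor's actual mechanism: `memory` is a dictionary from PTE physical address to the 42-bit base stored in that entry, control bits are ignored, and all frame numbers are hypothetical.

```python
def walk(memory, directory_base, vaddr, levels=4):
    """Return the physical address for `vaddr`, reading one PTE per level."""
    base = directory_base
    for level in range(levels, 0, -1):
        shift = 12 + 9 * (level - 1)             # 39, 30, 21, 12
        index = (vaddr >> shift) & 0x1FF         # 9-bit entry number
        pte_addr = (base << 12) | (index << 3)   # 42 + 9 + 3 bits
        base = memory[pte_addr]                  # a KeyError stands in for a fault
    return (base << 12) | (vaddr & 0xFFF)        # 42 + 12 bits

# Hypothetical frame numbers for the tables and the data page.
DIR, L3, L2, L1, DATA = 0x10, 0x20, 0x30, 0x40, 0x50
memory = {
    (DIR << 12) | (0 << 3): L3,      # LEVEL 4 - PTE 0
    (L3 << 12) | (4 << 3): L2,       # LEVEL 3 - PTE 4
    (L2 << 12) | (421 << 3): L1,     # LEVEL 2 - PTE 421
    (L1 << 12) | (191 << 3): DATA,   # LEVEL 1 - PTE 191
}
print(hex(walk(memory, DIR, 0x000134ABFFA0)))   # 0x50fa0
```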
Multilevel translation
Questions
Multi-level translation requires multiple memory accesses
The system becomes slow
Dealt with by adding a Translation Look-aside Buffer (TLB)
Acts as a cache for virtual address translations
How do we access the caches in virtual address systems:
Using the virtual address
Using the physical address
Control information stored in the page table entries (PTEs)?
PTE Control information
Control bits:
P (Present) states whether the page is physically present on RAM memory
Access to a PTE with P=0 generates a page fault trap
The OS checks if the page is valid (generates a segmentation fault if not) and loads the page to memory
A (Accessed) states whether the page was recently accessed
Used for page replacement purposes
D (Dirty) indicates that the page has been written to
D=1 means that page must be written to HDD before replacing the page
R/W (Read/Write) controls if the page is read-only or allows writes
Writing to a read-only page can generate a protection fault
Ex (Execute) states whether instructions can be loaded from this page
U/S (User/supervisor) controls access privileges
If the user attempts to access a supervisor protected page, a protection fault is generated
PCD (Page-level cache disable) used for disabling caching of data
Page Table Entry: Page physical address | A | D | P | R/W | EX | U/S | PCD
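The checks implied by the control bits can be sketched as follows. The bit positions, the polarity of U/S, and the choice of reporting a protection fault for a disallowed instruction fetch are all assumptions made for the example; the slide only names the flags.

```python
from enum import IntFlag

class PTE(IntFlag):
    # Bit positions are hypothetical; the slide names the flags only.
    P = 1 << 0     # present
    RW = 1 << 1    # 1 = writes allowed
    US = 1 << 2    # 1 = user access allowed (assumed polarity)
    PCD = 1 << 3   # page-level cache disable
    A = 1 << 4     # accessed
    D = 1 << 5     # dirty
    EX = 1 << 6    # 1 = instruction fetch allowed

def check_access(flags, write=False, execute=False, user=True):
    if not flags & PTE.P:
        return "page fault"
    if user and not flags & PTE.US:
        return "protection fault"
    if write and not flags & PTE.RW:
        return "protection fault"
    if execute and not flags & PTE.EX:
        return "protection fault"   # assumed fault type for this case
    return "ok"

read_only = PTE.P | PTE.US
print(check_access(read_only))               # ok
print(check_access(read_only, write=True))   # protection fault
print(check_access(PTE(0)))                  # page fault
```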
Data access time
An n-level virtual memory system requires n+1 memory accesses per data access (n to read the page table entries, plus one for the data)
This considerably decreases system performance
Caches can help to mitigate this problem:
Solution A (not common):
Virtually address the caches such that translation only occurs in low memory levels (e.g., when
accessing the RAM), which occurs less often
Requires control bits (e.g., the process id (pid)) to state cache line ownership
Extra control bits are required to deal with address spaces that are shared by multiple processes
Solution B (typical case):
Use a cache for virtual to physical address translation
Translation Look-aside Buffer (TLB)
Caching of the most recently accessed PTEs (page table entries)
Only exploits the temporal locality principle
The TLB is typically a small, fully associative cache
The TLB typically has 32-256 entries
[Figure: TLB structure. Each entry holds a TAG (page virtual address), a valid bit (V), the PID, and the PTE data and control (page physical address; control bits A, D, P, R/W, EX, U/S, PCD)]
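A small software model of such a TLB might look like the sketch below. It is fully associative and tagged with (PID, virtual page number); the LRU replacement policy and the class and method names are assumptions for the example, since the slide does not state how entries are evicted.

```python
from collections import OrderedDict

class TLB:
    def __init__(self, entries=64):
        self.entries = entries
        self.table = OrderedDict()   # (pid, vpn) -> physical page number

    def lookup(self, pid, vpn):
        key = (pid, vpn)
        if key in self.table:
            self.table.move_to_end(key)      # mark as most recently used
            return self.table[key]
        return None                          # TLB miss

    def insert(self, pid, vpn, ppn):
        self.table[(pid, vpn)] = ppn
        self.table.move_to_end((pid, vpn))
        if len(self.table) > self.entries:
            self.table.popitem(last=False)   # evict least recently used (assumed policy)

tlb = TLB(entries=2)
tlb.insert(1, 0x134ABF, 0x50)
print(tlb.lookup(1, 0x134ABF))   # 80
print(tlb.lookup(2, 0x134ABF))   # None
```

Tagging entries with the PID lets translations of different processes coexist without flushing the TLB on every context switch.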
Access to the TLB and L1 Cache
Parallelizing access
Accessing the data requires first translating the address and then going to the cache
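$$T_{DATA} = T_{\text{ADDRESS TRANSLATION}} + T_{ACCESS}$$
$$T_{\text{ADDRESS TRANSLATION}} = T_{TLB} + MR_{TLB} \times N \times T_{ACCESS}$$
$$T_{ACCESS} = T_{L1} + MR_{L1} \times T_{L2} + MR_{L2} \times T_{L3} + MR_{L3} \times T_{RAM}$$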
We can improve performance by parallelizing the cache access and the address translation
[Figure: the CORE sends the virtual address to the virtual-to-physical address translation stage (TLB, with multilevel translation on a TLB miss), followed by the multilevel cache, the main memory and the Hard Disk Drive (HDD)]
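The timing model can be evaluated numerically. The latencies (in cycles) and miss rates below are invented purely for illustration, and the functions follow the formulas exactly as written on the slide, with N taken as the number of translation levels.

```python
def t_access(t_l1, mr_l1, t_l2, mr_l2, t_l3, mr_l3, t_ram):
    """T_ACCESS as written on the slide."""
    return t_l1 + mr_l1 * t_l2 + mr_l2 * t_l3 + mr_l3 * t_ram

def t_data(t_tlb, mr_tlb, n_levels, access):
    """T_DATA = T_ADDRESS_TRANSLATION + T_ACCESS."""
    translation = t_tlb + mr_tlb * n_levels * access
    return translation + access

# Hypothetical latencies and miss rates.
access = t_access(t_l1=4, mr_l1=0.10, t_l2=12, mr_l2=0.02,
                  t_l3=40, mr_l3=0.005, t_ram=200)
print(access)
print(t_data(t_tlb=1, mr_tlb=0.01, n_levels=4, access=access))
```

With these made-up numbers the translation term adds a little over one cycle to each access, which is the overhead that parallel TLB and cache access tries to hide.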
Access to the TLB and L1 Cache
Parallelizing access (N ≤ M)
If N ≤ M (cache way not larger than the page size)
The bits used to index the cache fall within the page offset, so they come directly from the virtual address
The cache can be directly indexed while the virtual address is translated
For 4KB pages (typical case), we restrict each way to 4KB
[Figure: CACHE ACCESS (physical address): TAG | INDEX | OFFSET, where INDEX plus OFFSET take N bits. ADDRESS TRANSLATION (virtual address): PAGE VIRTUAL ADDRESS | PAGE OFFSET, where PAGE OFFSET takes M bits]
$N = \log_2(\text{cache way size})$
$M = \log_2(\text{page size})$
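A quick way to test the N ≤ M condition for a given cache is sketched below. It assumes power-of-two sizes; the function name and the cache configurations are examples only.

```python
def can_index_in_parallel(cache_size, ways, page_size=4096):
    way_size = cache_size // ways
    n = way_size.bit_length() - 1     # N = log2(cache way size)
    m = page_size.bit_length() - 1    # M = log2(page size)
    return n <= m

print(can_index_in_parallel(32 * 1024, 8))   # True  (4KB ways)
print(can_index_in_parallel(32 * 1024, 4))   # False (8KB ways)
```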
Access to the TLB and L1 Cache
Parallelizing access (N ≤ M)
[Figure: three pipeline stages (PIPELINE STAGE 1 to 3) accessing the cache memory (TAG, V and DATA per way) and the TLB (TAG: page virtual address, V, PID; PTE: page physical address, CONTROL) in parallel]
Cache side: use the INDEX bits of the page offset to retrieve one entry per way, check the TAG in all four ways against the page physical address, generate a cache miss if the data is not in the cache, otherwise select only the bytes corresponding to the line offset and return the value to the processor
TLB side: check all TLB entries and verify if any of the entries is a match, retrieve the page physical address, generate a TLB miss if the translation information is not available
Access to the TLB and L1 Cache
Parallelizing access (N > M)
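If N > M (cache way larger than the page size)
Get multiple lines and perform the remaining multiplexing once the physical address is known
[Figure: CACHE ACCESS (physical address): TAG | INDEX | OFFSET, where INDEX plus OFFSET take N bits and the upper INDEX bits overlap the PAGE VIRTUAL ADDRESS. ADDRESS TRANSLATION (virtual address): PAGE VIRTUAL ADDRESS | PAGE OFFSET, where PAGE OFFSET takes M bits]
$N = \log_2(\text{cache way size})$
$M = \log_2(\text{page size})$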
Access to the TLB and L1 Cache
Parallelizing access (N > M)
[Figure: three pipeline stages (PIPELINE STAGE 1 to 3) accessing the cache memory (TAG, V and DATA per way) and the TLB (TAG: page virtual address, V, PID; PTE: page physical address, CONTROL) in parallel]
Cache side: use only the index bits falling within the page offset and select $2^K$ words, where K is the number of index bits overlapping the page virtual address; once the physical address is known, extract the remaining index bits and the TAG for comparison, check the TAG in all four ways, generate a cache miss if the data is not in the cache, otherwise select only the bytes corresponding to the line offset and return the value to the processor
TLB side: check all TLB entries and verify if any of the entries is a match, retrieve the page physical address and also get the value of the remaining index bits
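To illustrate the $2^K$ candidates, the sketch below lists the cache sets that have to be read before the translation finishes. It assumes N ≥ M, power-of-two sizes and a 64-byte line (the line size is not given in the slides); the function name and the sample address are hypothetical.

```python
def candidate_sets(vaddr, way_size, line_size=64, page_size=4096):
    n = way_size.bit_length() - 1           # N = log2(cache way size)
    m = page_size.bit_length() - 1          # M = log2(page size)
    line_bits = line_size.bit_length() - 1
    k = max(n - m, 0)                       # index bits above the page offset
    low_index = (vaddr & (page_size - 1)) >> line_bits   # known before translation
    low_bits = m - line_bits
    return [(high << low_bits) | low_index for high in range(2 ** k)]

print(candidate_sets(0x12345ABC, way_size=16 * 1024))   # [42, 106, 170, 234]
```

Once the physical address is available, its K upper index bits select one of these candidates.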
Access to the TLB and L1 Cache
When a TLB miss occurs…
If there is a TLB miss…
[Figure: the virtual address (entry 0 in bits 47..39, entry 4 in bits 38..30, entry 421 in bits 29..21, entry 191 in bits 20..12, offset 4000 in bits 11..0) is checked against all TLB entries and no entry matches (MISS)]
Access to the TLB and L1 Cache
When a TLB miss occurs…
Try to get the page table level 1 base address directly from the TLB
A unified structure can be used for this, although it is not frequent
[Figure: a TLB (unified or not) is checked for the Page Table Level 1 base address of the same virtual address (entries 0, 4, 421, 191, offset 4000) and the lookup hits (HIT)]
PHYSICAL ADDRESS = Page Table Level 1 base address (42 bits) | entry 191 = 010111111 (9 bits) | 000 (3 bits)
Go to memory, fetch the physical page base address from this entry and return it
Page substitution policies
Least Recently Used (LRU)
Select the least recently used page to be replaced
Requires keeping track of how recently each page was accessed
Add a counter to the PTE
Keep a list in memory that has on top the least recently accessed page, and on the bottom the most recently accessed page
Not feasible in practice
Not recently used (NRU)
Just guarantee that a page recently accessed is not replaced
Use the accessed (A) bit in the PTE
Also consider replacing only pages which are not dirty, to decrease substitution time
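A minimal NRU victim selection can be sketched as below, assuming each resident page is described by its A and D bits. The data layout and function name are invented for the example; a real operating system would also clear the A bits periodically.

```python
def pick_victim(pages):
    """pages: list of dicts with 'vpn', 'A' (accessed) and 'D' (dirty) bits."""
    # Prefer pages that were not accessed; among those, prefer clean pages.
    return min(pages, key=lambda p: (p["A"], p["D"]))["vpn"]

pages = [
    {"vpn": 1, "A": 1, "D": 0},
    {"vpn": 2, "A": 0, "D": 1},
    {"vpn": 3, "A": 0, "D": 0},
]
print(pick_victim(pages))   # 3
```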