VIRTUAL MEMORY

Slides by: Pedro Tomás

Additional reading: "Computer Architecture: A Quantitative Approach", 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011

ADVANCED COMPUTER ARCHITECTURES

ARQUITECTURAS AVANÇADAS DE COMPUTADORES (AAC)

Advanced Computer Architectures, 2014

Outline


Introduction to virtual memory

Virtual memory as a cache

Basic translation scheme

Multilevel translation schemes

Benefits

Virtual memory systems were developed to let programmers manage both the RAM and the Hard Disk Drive (HDD) memory in a simple and uniform way.

Virtual memory extends the available memory to the sum of the RAM plus the HDD

Virtual memory is fundamental in modern computing systems for multitasking purposes:

It simplifies sharing the physical memory among the multiple running tasks

It allows tasks to be loaded into (and executed from) any region of physical memory

It does not force programs to be recompiled if a given address range is already occupied

Virtual memory as a cache

Virtual memory works similarly to a cache system

The HDD plays the role of the larger (and slower) memory level

The RAM works as a cache of the pages stored on the HDD

In practice, a page typically resides either in RAM or on the HDD, and not in both.

The basic "cache" blocks are much larger than cache lines:

In virtual memory systems with fixed-size blocks, the blocks are called pages

In virtual memory systems with variable-size blocks, the blocks are called segments

Locality principles are used to keep the most used blocks in RAM

This reduces the average memory access time

A write-back, write-allocate policy is used to decrease the time to write data

Pre-fetching techniques can be used to decrease memory access time

Page misses (page faults) typically result in a large overhead for copying data

Page misses are managed by the operating system

Since the overhead is large, complex page replacement policies can be used

When a page fault occurs, the faulting process is typically removed from execution while the page is loaded

Basic translation scheme

The programmer accesses data using a virtual address

Each process has a distinct virtual address space

The processor checks whether the page containing the virtual address is present in RAM

If the page is not in RAM, a page fault occurs

If the page is in RAM, the processor translates the virtual address into the physical address (i.e., the location of the data in RAM)

Basic translation scheme

Consider a 32-bit processor and a virtual memory system with 4KB pages

Each memory access (for instruction fetch or for load/store) uses a 32-bit virtual address

The address can be decomposed into a virtual page number and a page offset

[Figure: the 32-bit virtual address (bits 31-0) is split into the page number (bits 31-12) and the page offset (bits 11-0)]
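The decomposition above can be sketched in a few lines of Python. This is only an illustration for the 32-bit, 4KB-page case on the slide; the function name `split_virtual_address` and the sample address are hypothetical.

```python
PAGE_SIZE = 4 * 1024                        # 4KB pages, as on the slide
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # log2(4096) = 12

def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Return (virtual page number, page offset) for a 32-bit address."""
    if not 0 <= vaddr < 2**32:
        raise ValueError("expected a 32-bit virtual address")
    # Upper 20 bits select the page, lower 12 bits select the byte in the page.
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

vpn, offset = split_virtual_address(0x12345678)
print(hex(vpn), hex(offset))   # 0x12345 0x678
```

Because the page size is a power of two, the split needs only a shift and a mask, which is why hardware can do it without any arithmetic.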

Basic translation scheme

The page number is used to index a page table that stores the location of the given page in physical memory

[Figure: the page number (bits 31-12 of the virtual address) indexes the PAGE TABLE, which returns the page base (bits 31-12 of the physical address); the page offset (bits 11-0) is copied unchanged, giving the physical address of the data in RAM]

The page table translates virtual addresses into physical addresses
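A minimal sketch of this single-level lookup, assuming the page table is modelled as a Python dictionary from virtual page number to physical page base. The names `translate` and `PageFault` and the sample mapping are hypothetical, and control bits are ignored.

```python
OFFSET_BITS = 12                       # 4KB pages
OFFSET_MASK = (1 << OFFSET_BITS) - 1

class PageFault(Exception):
    """Raised when the virtual page has no mapping in RAM."""

def translate(page_table: dict[int, int], vaddr: int) -> int:
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & OFFSET_MASK
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn:#x} is not in RAM")
    page_base = page_table[vpn]        # physical page number
    # The offset is copied unchanged into the physical address.
    return (page_base << OFFSET_BITS) | offset

# Hypothetical mapping: virtual page 0x12345 is stored in physical page 0x42.
page_table = {0x12345: 0x42}
print(hex(translate(page_table, 0x12345678)))   # 0x42678
```

In a real system the missing-page case is handled by the operating system rather than by an exception in user code; the exception here only marks where the page fault would be raised.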

Basic translation scheme

The page table must be stored in memory

Each process must have its own page table

Each process has a page table base pointer

The page table base pointer is a special register associated with the running thread, which must be saved whenever there is a context switch

[Figure: as before, the page table maps the page number (bits 31-12) to the page base, and the page offset (bits 11-0) is copied unchanged. In addition, the page table pointer is combined with the page number to form the physical address of the corresponding page table entry]

Basic translation scheme

[Figure: the same translation diagram, now showing the contents of RAM: empty space, the page table of process A, virtual page x of process A, virtual page y of process B and virtual page z of process A (holding the data). The page table pointer locates the page table entry in RAM, and the page base stored in that entry locates the data]

Basic translation scheme

Consider a virtual memory system with 4KB pages, 32-bit virtual and physical addresses, and 4B page table entries

What is the size of the page table for a process occupying 128MB of virtual space?

Solution:

#Virtual pages = 128MB / 4KB = 2^27 / 2^12 = 2^15

Page table size = 2^15 (pages) x 4B (per entry) = 2^17 B = 128KB > 4KB (the page table is much larger than one page)

Page tables can use a large amount of space in RAM

This problem can be mitigated by paging the page table
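The arithmetic of this exercise can be checked with a short Python sketch. The helper name `page_table_bytes` is hypothetical; it assumes a flat table with one entry per virtual page in use.

```python
KB, MB = 2**10, 2**20

def page_table_bytes(virtual_space: int, page_size: int, pte_size: int) -> int:
    """Size of a flat page table with one PTE per virtual page in use."""
    pages = -(-virtual_space // page_size)     # ceiling division
    return pages * pte_size

size = page_table_bytes(128 * MB, 4 * KB, 4)
print(size // KB, "KB")                  # 128 KB
print(size // (4 * KB), "pages of 4KB")  # 32 pages of 4KB
```

The second line of output shows why paging the page table helps: the table itself spans 32 pages, so it can be split into page-sized pieces that are located through a further table.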

Multilevel translation

Consider a virtual memory system with 4KB pages, 32-bit virtual and physical addresses, and 4B page table entries (PTEs)

How many levels are required to guarantee that each page (data/page table/directory) is at most 4KB?

Solution:

Page offset (bits 11-0): log2(4KB) = 12 bits

Page table offset (bits 21-12): each page table has at most 4KB; since each page table entry uses 4B, a total of 1K entries can be stored in each page table, so log2(1K) = 10 bits

Directory offset (bits 31-22): the directory also has at most 4KB, i.e., 1K entries of 4B, so log2(1K) = 10 bits

Two levels (directory and page table) are therefore enough: 10 + 10 + 12 = 32 bits
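The reasoning in this solution generalises to other address widths. The sketch below assumes power-of-two sizes and that every table occupies exactly one page; the function name `translation_levels` is hypothetical.

```python
def translation_levels(va_bits: int, page_size: int, pte_size: int) -> tuple[int, int, int]:
    """Return (offset bits, index bits per level, number of levels).

    Assumes page_size and pte_size are powers of two and that every
    table (directory included) must fit in a single page.
    """
    offset_bits = page_size.bit_length() - 1                # log2(page size)
    index_bits = (page_size // pte_size).bit_length() - 1   # log2(entries per table)
    remaining = va_bits - offset_bits
    levels = -(-remaining // index_bits)                    # ceiling division
    return offset_bits, index_bits, levels

print(translation_levels(32, 4096, 4))   # (12, 10, 2)
print(translation_levels(48, 4096, 8))   # (12, 9, 4)
```

The second call corresponds to a 48-bit virtual address with 8B entries, where each table holds only 512 entries and four levels are needed.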

Multilevel translation

Consider a virtual memory system with 4KB pages, 48-bit virtual addresses, 54-bit physical addresses and 8B page table entries (PTEs) (one of the possible combinations in the Intel IA-32e architecture)

Explain how the virtual address 0001 34AB FFA0h is translated into a physical address

Solution:

Each table has at most 4KB; since each page table entry uses 8B, a total of 512 entries can be stored in each table, so each level is indexed with log2(512) = 9 bits

Page offset (bits 11-0): log2(4KB) = 12 bits

PT (Level 1) offset (bits 20-12): 9 bits

PT (Level 2) offset (bits 29-21): 9 bits

PT (Level 3) offset (bits 38-30): 9 bits

Directory offset (bits 47-39): 9 bits

Multilevel translation

Solution: decomposition of the virtual address 0001 34AB FFA0h

Directory offset (bits 47-39): 0000 0000 0 = Entry 0

PT (Level 3) offset (bits 38-30): 000 0001 00 = Entry 4

PT (Level 2) offset (bits 29-21): 11 0100 101 = Entry 421

PT (Level 1) offset (bits 20-12): 0 1011 1111 = Entry 191

Page offset (bits 11-0): 1111 1010 0000 = Offset 4000
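The field values of this example can be reproduced with shifts and masks. The sketch below uses the bit ranges given in the exercise; the field names in `FIELDS` and the function name `decompose` are hypothetical labels.

```python
FIELDS = [                     # (name, lowest bit, width)
    ("directory", 39, 9),
    ("pt_level3", 30, 9),
    ("pt_level2", 21, 9),
    ("pt_level1", 12, 9),
    ("offset",     0, 12),
]

def decompose(vaddr: int) -> dict[str, int]:
    """Extract each field of a 48-bit virtual address."""
    return {name: (vaddr >> low) & ((1 << width) - 1)
            for name, low, width in FIELDS}

print(decompose(0x0001_34AB_FFA0))
# {'directory': 0, 'pt_level3': 4, 'pt_level2': 421, 'pt_level1': 191, 'offset': 4000}
```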

Multilevel translation

Step 1 – Read directory entry

[Figure: RAM MEMORY holds, for process A, the directory (level 4), page table 0 of level 3, page table 4 of level 2, page table 421 of level 1 and virtual page 000134ABFh, alongside pages of other processes (virtual pages 1 and 34 of process B, virtual page 1 of process C). The directory page pointer of process A gives the base of the directory table]

Physical address of the entry to read = base of directory table (42 bits) | 000000000 (9 bits, entry 0 from bits 47-39 of the virtual address) | 000 (3 bits, required to address the 8 bytes of the PTE)

The entry read (LEVEL 4 - PTE 0) holds the base address of page table 0, level 3 (data) and the control bits

Multilevel translation

Step 2 – Read level 3 entry

Physical address of the entry to read = base address of page table 0, level 3 (42 bits, from LEVEL 4 - PTE 0) | 000000100 (9 bits, entry 4 from bits 38-30 of the virtual address) | 000 (3 bits)

The entry read (LEVEL 3 - PTE 4) holds the base address of page table 4, level 2 (data) and the control bits

Multilevel translation

Step 3 – Read level 2 entry

Physical address of the entry to read = base address of page table 4, level 2 (42 bits, from LEVEL 3 - PTE 4) | 110100101 (9 bits, entry 421 from bits 29-21 of the virtual address) | 000 (3 bits)

The entry read (LEVEL 2 - PTE 421) holds the base address of page table 421, level 1 (data) and the control bits

Multilevel translation

Step 4 – Read level 1 entry

Physical address of the entry to read = base address of page table 421, level 1 (42 bits, from LEVEL 2 - PTE 421) | 010111111 (9 bits, entry 191 from bits 20-12 of the virtual address) | 000 (3 bits)

The entry read (LEVEL 1 - PTE 191) holds the physical base address of virtual page 000134ABFh (data) and the control bits

Multilevel translation

Step 5 – Read the data

Physical address of the data = base of virtual page 000134ABFh (42 bits, from LEVEL 1 - PTE 191) | 111110100000 (12 bits, offset 4000 from bits 11-0 of the virtual address)

The data is read from this address in the page of process A
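The five steps can be simulated with a small Python sketch. Physical memory is modelled as a dictionary from the physical address of a PTE to the base address it stores. The table locations (`DIRECTORY`, `L3_TABLE` and so on) are hypothetical placements chosen for illustration, and control bits are ignored.

```python
INDEX_BITS, OFFSET_BITS, PTE_SIZE = 9, 12, 8

def walk(memory: dict[int, int], directory_base: int, vaddr: int) -> int:
    """Return the physical address for vaddr, reading one PTE per level."""
    base = directory_base
    for level in (4, 3, 2, 1):
        shift = OFFSET_BITS + INDEX_BITS * (level - 1)
        index = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
        # Bases are page aligned, so adding index * 8 is the same as
        # appending the 9 index bits followed by 3 zero bits.
        pte_address = base + index * PTE_SIZE
        base = memory[pte_address]          # one memory access per level
    return base + (vaddr & ((1 << OFFSET_BITS) - 1))

# Hypothetical placement of process A's tables and data page in RAM.
DIRECTORY, L3_TABLE, L2_TABLE, L1_TABLE, DATA_PAGE = 0x1000, 0x2000, 0x3000, 0x4000, 0x9000
memory = {
    DIRECTORY + 0 * PTE_SIZE: L3_TABLE,     # step 1: directory entry 0
    L3_TABLE + 4 * PTE_SIZE: L2_TABLE,      # step 2: level 3 entry 4
    L2_TABLE + 421 * PTE_SIZE: L1_TABLE,    # step 3: level 2 entry 421
    L1_TABLE + 191 * PTE_SIZE: DATA_PAGE,   # step 4: level 1 entry 191
}
print(hex(walk(memory, DIRECTORY, 0x0001_34AB_FFA0)))   # 0x9fa0
```

The loop performs four reads before the data address is even known, which is the cost that motivates caching translations.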

Multilevel translation

Questions

Multi-level translation requires multiple memory accesses

The system becomes slow

This is dealt with by adding a Translation Look-aside Buffer (TLB)

It acts as a cache for virtual address translations

How do we access the caches in virtual address systems?

Using the virtual address

Using the physical address

What control information is stored in the page table entries (PTEs)?

PTE Control information

Control bits:

P (Present) states whether the page is physically present in RAM

Accessing a PTE with P=0 generates a page fault trap

The OS checks whether the page is valid (generating a segmentation fault if not) and loads the page into memory

A (Accessed) states whether the page was recently accessed

Used for page replacement purposes

D (Dirty) indicates that the page has been written to

D=1 means that the page must be written to the HDD before it is replaced

R/W (Read/Write) controls whether the page is read-only or allows writes

Writing to a read-only page can generate a protection fault

EX (Execute) states whether instructions can be loaded from this page

U/S (User/Supervisor) controls access privileges

If the user attempts to access a supervisor-protected page, a protection fault is generated

PCD (Page-level cache disable) is used for disabling caching of data

Page Table Entry: Page physical address | A | D | P | R/W | EX | U/S | PCD
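A possible way to model these control bits is a flag set checked on every access. In the sketch below the bit positions, the polarity of U/S (1 = user access allowed) and the choice to report a protection fault on a forbidden instruction fetch are all assumptions made for illustration; they are not the layout of any particular architecture.

```python
from enum import IntFlag

class Ctl(IntFlag):
    # Bit positions are illustrative only.
    P = 1 << 0      # present
    A = 1 << 1      # accessed
    D = 1 << 2      # dirty
    RW = 1 << 3     # 1 = writes allowed
    EX = 1 << 4     # 1 = instruction fetch allowed
    US = 1 << 5     # 1 = user-mode access allowed (assumed polarity)
    PCD = 1 << 6    # page-level cache disable

def check_access(ctl: Ctl, *, write=False, execute=False, user=True) -> str:
    """Return the outcome of one access to a page with these control bits."""
    if not ctl & Ctl.P:
        return "page fault"
    if user and not ctl & Ctl.US:
        return "protection fault"
    if write and not ctl & Ctl.RW:
        return "protection fault"
    if execute and not ctl & Ctl.EX:
        return "protection fault"
    return "ok"

read_only = Ctl.P | Ctl.US
print(check_access(read_only))               # ok
print(check_access(read_only, write=True))   # protection fault
print(check_access(Ctl.US | Ctl.RW))         # page fault
```

The present bit is tested first because the remaining bits are only meaningful for a page that is actually in RAM.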

Data access time

An n-level virtual memory system requires n+1 memory accesses per data access (n to read the page table entries, plus 1 for the data itself)

This considerably decreases system performance

Caches can help to mitigate this problem:

Solution A (not common):

Virtually address the caches such that translation only occurs at the lower memory levels (e.g., when accessing the RAM), which are accessed less often

Requires control bits (e.g., the process id (pid)) to state cache line ownership

Extra control bits are required to deal with address spaces that are shared by multiple processes

Solution B (typical case):

Use a cache for virtual to physical address translation

Translation Look-aside Buffer (TLB)

Caches the most recently accessed PTEs (page table entries)

Only exploits the temporal locality principle

The TLB is typically a small, fully associative cache

The TLB typically has 32-256 entries

[Figure: TLB structure. Each entry holds a TAG (the page virtual address), control fields (PID and a valid bit V) and the PTE data and control (page physical address, A, D, P, R/W, EX, U/S, PCD)]
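A small fully associative TLB can be sketched with an ordered dictionary. The class below is a toy model, not a description of real hardware: it is keyed by (PID, virtual page number), stores only the page base, and uses LRU replacement, which is an assumption since the slides do not state the TLB replacement policy.

```python
from collections import OrderedDict

class TLB:
    """Toy fully associative TLB keyed by (pid, virtual page number)."""

    def __init__(self, entries: int = 64):
        self.entries = entries
        self.table = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, pid: int, vpn: int):
        key = (pid, vpn)
        if key in self.table:
            self.table.move_to_end(key)        # mark as most recently used
            self.hits += 1
            return self.table[key]
        self.misses += 1
        return None                            # caller must walk the page tables

    def insert(self, pid: int, vpn: int, page_base: int) -> None:
        key = (pid, vpn)
        if key not in self.table and len(self.table) >= self.entries:
            self.table.popitem(last=False)     # evict the least recently used entry
        self.table[key] = page_base
        self.table.move_to_end(key)

tlb = TLB(entries=2)
tlb.insert(1, 0x134ABF, 0x9)
print(tlb.lookup(1, 0x134ABF))   # 9
print(tlb.lookup(2, 0x134ABF))   # None
```

The second lookup misses even though the page number matches, because the PID is part of the tag: each process has its own virtual address space.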

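Access to the TLB and L1 Cache

Parallelizing access

Accessing the data requires first translating the address and then going to the cache

$T_{DATA} = T_{ADDRESS\ TRANSLATION} + T_{ACCESS}$

$T_{ADDRESS\ TRANSLATION} = T_{TLB} + MR_{TLB} \times N \times T_{ACCESS}$

$T_{ACCESS} = T_{L1} + MR_{L1} \times T_{L2} + MR_{L2} \times T_{L3} + MR_{L3} \times T_{RAM}$

Here MR denotes a miss rate and N the number of translation levels read on a TLB miss

We can improve performance by parallelizing the cache access and the address translation

[Figure: the CORE sends the virtual address to the TLB for virtual to physical address translation; a TLB miss triggers the multilevel translation; the translated address is then used to access the multilevel cache, which is backed by main memory and the Hard Disk Drive (HDD)]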
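The access-time model can be evaluated numerically. The sketch below follows the formulas term by term, reading each miss rate as relative to all accesses; the latencies (in cycles) and miss rates are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def t_access(t_l1, mr_l1, t_l2, mr_l2, t_l3, mr_l3, t_ram):
    """Memory hierarchy access time, term by term as in the formula."""
    return t_l1 + mr_l1 * t_l2 + mr_l2 * t_l3 + mr_l3 * t_ram

def t_data(t_tlb, mr_tlb, levels, access):
    """Translation time (TLB plus page walks on a miss) plus data access time."""
    t_translation = t_tlb + mr_tlb * levels * access
    return t_translation + access

# Hypothetical latencies (cycles) and miss rates.
access = t_access(t_l1=4, mr_l1=0.10, t_l2=12, mr_l2=0.02,
                  t_l3=40, mr_l3=0.005, t_ram=200)
print(round(access, 2))                                                 # 7.0
print(round(t_data(t_tlb=1, mr_tlb=0.01, levels=4, access=access), 2))  # 8.28
```

With these numbers the serial TLB lookup adds more than one cycle to every access, which is the overhead that overlapping the TLB and cache lookups aims to hide.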

Access to the TLB and L1 Cache

Parallelizing access (N ≤ M)

$N = \log_2(\text{cache way size})$

$M = \log_2(\text{page size})$

If N ≤ M (cache way not larger than the page size)

The bits used to index the cache come directly from the page offset of the virtual address, which is not changed by translation

The cache can be indexed while the virtual address is being translated

For 4KB pages (typical case), we restrict each way to 4KB

[Figure: the physical address used for cache access is split into TAG, INDEX and OFFSET, with INDEX and OFFSET spanning N bits; the virtual address used for address translation is split into PAGE VIRTUAL ADDRESS and PAGE OFFSET, with the page offset spanning M bits]
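Whether a given cache geometry satisfies N ≤ M can be checked with a short sketch. The function name `parallel_index_check` and the two cache configurations are hypothetical examples; sizes are assumed to be powers of two.

```python
def parallel_index_check(cache_size: int, ways: int, page_size: int) -> dict:
    """Compare N = log2(way size) with M = log2(page size)."""
    way_size = cache_size // ways
    n = way_size.bit_length() - 1      # N
    m = page_size.bit_length() - 1     # M
    return {
        "N": n,
        "M": m,
        "index_from_page_offset": n <= m,
        # Index bits that fall in the page virtual address (0 when N <= M).
        "index_bits_above_offset": max(0, n - m),
    }

print(parallel_index_check(32 * 1024, 8, 4096))
# {'N': 12, 'M': 12, 'index_from_page_offset': True, 'index_bits_above_offset': 0}
print(parallel_index_check(64 * 1024, 4, 4096))
# {'N': 14, 'M': 12, 'index_from_page_offset': False, 'index_bits_above_offset': 2}
```

The first configuration shows one way to grow the cache while keeping each way at 4KB: increase the associativity rather than the way size.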

Access to the TLB and L1 Cache

Parallelizing access (N ≤ M)

[Figure: access to the TLB and to the cache memory over PIPELINE STAGE 1, PIPELINE STAGE 2 and PIPELINE STAGE 3, with the annotations below]

TLB (each entry holds V, a TAG with the page virtual address, the PID and the PTE with the page physical address and control): check all TLB entries and verify whether any of them is a match; retrieve the page physical address; generate a TLB miss if the translation information is not available

Cache memory (each entry holds TAG, V and DATA): use the INDEX bits of the virtual address to retrieve one entry per way; check the TAG in all four ways; generate a cache miss if the data is not in the cache; select only the bytes corresponding to the line offset; get the data and return the value to the processor

Access to the TLB and L1 Cache

Parallelizing access (N > M)

$N = \log_2(\text{cache way size})$

$M = \log_2(\text{page size})$

If N > M (cache way larger than the page size)

Get multiple lines and perform the remaining multiplexing once the physical address is known

[Figure: the INDEX and OFFSET fields (N bits) of the physical address used for cache access extend beyond the PAGE OFFSET (M bits) of the virtual address used for address translation, so part of the index overlaps the PAGE VIRTUAL ADDRESS]

Access to the TLB and L1 Cache

Parallelizing access (N > M)

[Figure: access to the TLB and to the cache memory over PIPELINE STAGE 1, PIPELINE STAGE 2 and PIPELINE STAGE 3, with the annotations below]

TLB (each entry holds V, a TAG with the page virtual address, the PID and the PTE with the page physical address and control): check all TLB entries and verify whether any of them is a match; retrieve the page physical address; extract the TAG for comparison; extract the remaining index bits

Cache memory (each entry holds TAG, V and DATA): use only the index bits falling within the page offset; retrieve one entry per way, selecting 2^K words, where K is the number of index bits overlapping the page virtual address; also get the value of the remaining index bits; check the TAG in all four ways; generate a cache miss if the data is not in the cache; select only the bytes corresponding to the line offset; get the data and return the value to the processor

Access to the TLB and L1 Cache

When a TLB miss occurs…

If there is a TLB miss…

[Figure: the page virtual address (Entry 0 in bits 47-39, Entry 4 in bits 38-30, Entry 421 in bits 29-21, Entry 191 in bits 20-12; Offset 4000 in bits 11-0) is compared against all TLB entries and none of them matches (MISS)]

Access to the TLB and L1 Cache

When a TLB miss occurs…

Try to get the page table level 1 base address directly from the TLB

The TLB can have a unified structure, although this is not frequent

[Figure: a TLB (unified or not) is checked for the page table level 1 base address using the upper fields of the virtual address (Entry 0, Entry 4 and Entry 421), and the lookup hits (HIT). Physical address of the entry to read = page table level 1 base address (42 bits) | 010111111 (9 bits, Entry 191) | 000 (3 bits)]

Go to memory, fetch the physical page base address from that entry, and return the translated address

Page substitution policies

Least Recently Used (LRU)

Select the least recently used page to be replaced

Requires keeping track of when each page was last accessed

Add a counter to the PTE

Keep a list in memory that has the least recently accessed page on top and the most recently accessed page at the bottom

Not feasible in practice, since this information would have to be updated on every memory access

Not Recently Used (NRU)

Just guarantee that a recently accessed page is not replaced

Use the accessed (A) bit in the PTE

Also consider replacing only pages that are not dirty, to decrease substitution time
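The NRU idea can be sketched as a victim selection over the A and D bits. The `Page` record and function names below are hypothetical. The periodic clearing of the A bit is an assumption about how "recently" is maintained; the slides do not describe it.

```python
from dataclasses import dataclass

@dataclass
class Page:
    number: int
    accessed: bool   # A bit
    dirty: bool      # D bit

def pick_victim(pages: list[Page]) -> Page:
    """Prefer pages with A=0; among those, prefer clean pages (D=0)."""
    # False sorts before True, so (A=0, D=0) is the smallest key.
    return min(pages, key=lambda p: (p.accessed, p.dirty))

def clear_accessed(pages: list[Page]) -> None:
    """Assumed periodic reset so that the A bit reflects recent accesses only."""
    for p in pages:
        p.accessed = False

resident = [Page(7, True, False), Page(3, False, True), Page(9, False, False)]
print(pick_victim(resident).number)   # 9
```

Page 9 is chosen because it was not recently accessed and is clean, so it can be dropped without first being written back to the HDD.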

Next lesson

More on memory systems:

Virtual memory