1 Memory hierarchy and paging Electronic Computers M


Memory hierarchy and paging

Electronic Computers M


How do we dream of a memory?

• Infinite capacity and zero access time

BUT

• The faster a memory is, the more expensive and power-consuming it is (and very often the larger its physical size)

• The desired characteristics are unattainable

• Alternative solution: a multiple-level memory hierarchy

• Large-capacity memory: slow access time

• Small-capacity memory: very fast access time

• Each level is therefore characterized by:

Access time

Cost per byte

Total capacity

Transfer speed (bandwidth)

Size of the single transferred item

Memory hierarchy levels


[Figure: the hierarchy levels with their capacity, access time and cost, from the fastest at the top to the largest at the bottom]

• CPU registers: hundreds of bytes, < 1 ns

• Cache (levels I, II, III): KBytes to MBytes, 1-10 ns, $10/MByte

• Central memory: GBytes, 100-300 ns, $1/MByte

• Disk: thousands of GBytes / TBytes, 10 ms, $0.0016/MByte

• Tape: "infinite" capacity, seconds to minutes

Speed increases towards the CPU; capacity increases away from it.

N.B. Some cache levels can be missing in a CPU.

How is the memory hierarchy handled?

• Caches are hardware managed (totally transparent to users)

• Memory and disks: hardware, OS and user (files)

CACHE: a small and very fast memory, discussed later

Characteristics


• Inclusion: all the information held in the upper levels (those nearer the CPU) is also present in the lower levels. Very often used (but not always)

• Coherency: the copies of a data item in different levels must be consistent, and therefore update policies must be implemented

• Write-through: the information blocks are updated immediately

• Write-back: the update is delayed until it becomes mandatory (e.g. when the data are replaced or requested by other processors)

• A replacement policy must therefore be defined

• N.B. Information blocks are called «lines» in caches and «pages» in the central memory.
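The difference between the two update policies can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class `TwoLevelStore`, its fields and the use of dictionaries for the two levels are hypothetical names introduced here, not part of the course material.

```python
class TwoLevelStore:
    """Toy model (hypothetical): an upper, faster level backed by a lower one."""

    def __init__(self, policy):
        assert policy in ("write-through", "write-back")
        self.policy = policy
        self.upper = {}        # block id -> value held in the upper level
        self.lower = {}        # block id -> value held in the lower level
        self.dirty = set()     # blocks modified only in the upper level
        self.lower_writes = 0  # how many updates reached the lower level

    def write(self, block, value):
        self.upper[block] = value
        if self.policy == "write-through":
            self._update_lower(block)   # immediate update
        else:
            self.dirty.add(block)       # update deferred until eviction

    def evict(self, block):
        # With write-back the lower level is updated only now, if needed.
        if block in self.dirty:
            self._update_lower(block)
            self.dirty.discard(block)
        self.upper.pop(block, None)

    def _update_lower(self, block):
        self.lower[block] = self.upper[block]
        self.lower_writes += 1
```

Writing the same block several times costs one lower-level update per write with write-through, and a single update at replacement time with write-back.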

Locality principle


• In any phase of its execution a program uses only a small portion of its data and instructions in memory

• Two locality types:

Temporal locality: when a data item has been accessed, it is very likely that the same item will be accessed again in the near future (e.g. loops)

Spatial locality: when a data item has been accessed, it is very likely that items at nearby addresses will be accessed too (e.g. vectors, matrices, linear code …)
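As a rough illustration of spatial locality, the sketch below counts how many distinct blocks a sequence of addresses touches. The function name `lines_touched`, the 64-byte block size and the two access patterns are assumptions chosen for the example.

```python
def lines_touched(addresses, line_size=64):
    """Number of distinct blocks of line_size bytes covered by the addresses."""
    return len({addr // line_size for addr in addresses})


# 1024 consecutive 4-byte elements (e.g. scanning a vector).
sequential = [i * 4 for i in range(1024)]

# 1024 elements placed 4 KB apart (no two share a block).
strided = [i * 4096 for i in range(1024)]
```

The sequential scan reuses each block for 16 consecutive elements, while the strided pattern needs a new block for every access, so the same number of accesses generates far more transfers between levels.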

Working Set

Memory hierarchy: general issues


• It solves the following problems:

The speed difference between processors and memories

The need for large central memories

• Between cache and central memory the key issue is speed; the transferred elements (indivisible: never a portion of them) are the lines (32-256 bytes or more; the size depends on the number of cache levels)

• Between central memory and disks the key issue is capacity; the transferred elements are the pages (4 KB-128 KB), that is fixed-size blocks of either programs or data (see later for their use)

• A computer can have either, neither or both of them (caching, paging)

• By exploiting the memory hierarchy and the locality principle we achieve two goals:

A (virtual) memory space is made available to the programmer, whose size is equal to the addressable central memory space (which depends on the parallelism of the computer address). The physical central memory is normally smaller than the addressable space, and it is much slower than the cache memory

The maximum access speed is granted to the processor, which in most cases accesses only the cache, which is much faster than the central memory

This implies that faults must be handled, that is the cases when a memory level (either cache or central) DOES NOT contain the requested data and must get them from a lower level: for instance cache lines (from central memory) or central memory pages (from disk). Double faults are obviously possible, but unlikely if the system is well managed.

Terminology


• There is a HIT when the requested data are present in the hierarchy level to which the request was made (e.g. the first-level cache for the processor, or the central memory for the last-level cache)

• There is a MISS when the data ARE NOT present in the hierarchy level to which the request was made and must be retrieved, recursively, from the lower levels

[Figure: the processor accesses a cache that holds block A; the central memory holds blocks A, B, ..., N, R]

There is a HIT when the processor requests data belonging to block A and a MISS when it requests data belonging to block B.

In case of a MISS the time needed to access the lines of block B (the miss penalty) depends on the request time, the time needed to extract block B and the transfer time between levels. This time grows with the distance (number of levels) of the data from the CPU: it ranges from a few to thousands of clock cycles.

The bigger the block, the longer the transfer time, but the lower the miss rate (the probability of a miss in a data block). A reasonable balance must therefore be found in order to keep the product miss rate x miss penalty to a minimum
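The product miss rate x miss penalty is the extra time added, on average, to every access. A minimal sketch follows; the formula is the usual average access time estimate, and the function name and the numeric values are illustrative assumptions, not measurements from the slides.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access: hit time plus the average cost of misses.

    hit_time and miss_penalty must use the same unit (e.g. ns or cycles);
    miss_rate is a probability between 0 and 1.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical comparison: a bigger block lowers the miss rate
# but raises the miss penalty (longer transfer time).
small_block = average_access_time(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0)
big_block = average_access_time(hit_time=1.0, miss_rate=0.03, miss_penalty=200.0)
```

With these assumed numbers the bigger block is the worse choice, because the penalty grows faster than the miss rate drops; other values can reverse the outcome, which is why a balance has to be found.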

Problems


• Where in the cache can the data (lines) of block B of the main memory be placed (block placement)? For instance, could they replace the data of block A?

• How can data be found in the cache (block identification)?

• How do we choose the block (line) to be replaced when the cache is already full? Many policies exist (see later: cache coherency and BTB)

• What happens when we write a line (write strategy), for instance a line of block A? Normally a write-back policy is used (see later: caches)

[Figure: processor, cache holding block A, and memory holding blocks A, B, ..., N, R]

Virtual Memory


• The distinction between the logical (virtual) address space and the physical address space is the basis of memory management

Logical addresses: generated by the CPU, also known as virtual addresses or linear addresses of the data

Physical addresses: the real, physical addresses where the requested data are stored

• Memory Management Unit (MMU): a hw/sw device which maps the logical (virtual) addresses to physical addresses. Used only in medium to high performance processors

• The programmer deals only with the logical addresses and is always totally unaware of the physical addresses where the requested data are located

Paging


• The physical memory is subdivided by the hw into fixed-size blocks (the size is a power of two) called frames

• The logical (virtual) memory is, for the programmer, a sequence of consecutive addresses, which the hw interprets as subdivided into blocks of equal size (pages). Pages and frames have the same size

• The OS manages the frames (free or occupied)

• At any time only n of the pages of a program (its working set) are needed to execute it. The program therefore needs only n frames, not necessarily contiguous (normally they are not), where the working set can be stored

• A mapping system is therefore needed: the page table, which contains the initial physical addresses of all the frames where the program pages are stored

• Memory fragmentation is therefore avoided, except inside the last frame of a program, which may be only partially used

• The CPU virtual address is normally interpreted by the hw as made of two components:

The m MSBits are the page number, that is the index in a table (the page table) which provides the initial physical address of the corresponding frame

The n LSBits are the offset within the page, that is the value which must be added to the initial physical address to reach the datum. Since pages and frames always have the same power-of-two size, they are aligned (the n LSBits of the initial address of each one are zero), so the offset only needs to be joined (concatenated) to the MSBits
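The split into page number and offset, and the joining of the offset to the frame address, can be sketched as follows. The 12-bit offset (4 KB pages), the function names and the dictionary used as page table are assumptions made for the example.

```python
PAGE_OFFSET_BITS = 12  # assumption: 4 KB pages, so n = 12


def split_virtual_address(vaddr, offset_bits=PAGE_OFFSET_BITS):
    """Return (page number, offset) of a virtual address."""
    page_number = vaddr >> offset_bits            # the MSBits
    offset = vaddr & ((1 << offset_bits) - 1)     # the n LSBits
    return page_number, offset


def translate(vaddr, page_table, offset_bits=PAGE_OFFSET_BITS):
    """Map a virtual address to a physical one.

    page_table maps page number -> initial physical address of the frame.
    Frames are aligned, so the low offset_bits of that address are zero and
    a bitwise OR joins the two parts (no real addition is needed).
    """
    page_number, offset = split_virtual_address(vaddr, offset_bits)
    frame_base = page_table[page_number]
    return frame_base | offset
```

For instance, with `page_table = {2: 0x5000}` the virtual address `0x2ABC` belongs to page 2 with offset `0xABC`, and `translate(0x2ABC, page_table)` gives `0x5ABC`.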


Paging

[Figure: mapping through the page table. The logical memory is a sequence of pages (virtual addresses) and the physical memory a sequence of frames (physical addresses). A virtual address is made of a page number (k bits) and an offset (n bits); a physical address is made of a frame number (h bits) and the same offset (n bits). The page table has 2^k elements and maps each page number to the initial address (MSBs) of a frame. k >> h]


Paging

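[Figure: address translation. The processor-generated address is made of the virtual page number and the offset in page. The virtual page number is added to the page table initial address to select a page descriptor, which holds the status bits and the physical page number (the page initial address, always aligned: its LSBits are always zero). The physical page number joined with the offset in page gives the physical address of the datum]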

Page table implementation


• The page tables (one for each task!) are stored in the central memory

• A page table base register must point to the initial address of the page table. The size of each page table is the virtual memory size divided by the page size, multiplied by the number of bytes of each table entry

• The OS must manage another table indicating which physical pages are free and which are occupied

• To avoid a double memory access for each data access, a special cache called the Translation Lookaside Buffer (TLB) must exist, which provides the physical page address without accessing the page table in main memory

Translation Lookaside Buffer (TLB)


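[Figure: translation with a TLB. The virtual page number of the processor-generated address is first looked up in the TLB (within the processor). On a hit the TLB directly supplies the physical page number; on a miss the page table (page table initial address + virtual page number) is accessed in memory to read the descriptor (status, physical page number). The physical page number joined with the offset gives the physical address of the datum]

The TLB stores the translations (virtual to physical) of the last n addresses. It is a cache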
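The behaviour of a TLB can be sketched as a small cache of translations placed in front of the page table. The class `TLB`, its capacity parameter and the least-recently-used replacement are assumptions made for this illustration; real TLBs are hardware structures and their replacement policy is implementation specific.

```python
from collections import OrderedDict


class TLB:
    """Toy TLB (hypothetical): caches the last translations, LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> frame address
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        """Return the frame address of page vpn, using the page table on a miss."""
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        frame_base = page_table[vpn]           # the extra memory access
        self.entries[vpn] = frame_base
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the least recently used
        return frame_base

    def flush(self):
        """Clear every cached translation (e.g. upon a context switch)."""
        self.entries.clear()
```

Only the misses pay for the access to the page table in memory; thanks to locality most lookups are expected to hit.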


Paging (x86)

Status bits of a page descriptor:

• Access protection: each page can be defined as read only, read/write, user, system etc.

• Dirty bit: indicates whether the page content has been modified. A modified page must be written back to bulk memory when it is replaced

• Reference bits: indicate whether the page was accessed (used by the replacement algorithm)

• Present/Missing: a virtual page may or may not be in the physical memory. In the latter case the page descriptor stores the location of the page in the bulk memory

• Valid/Invalid: indicates whether a physical frame corresponds to the page of the virtual memory


Paging

Page size

Big pages (bigger than 32 KB)

• Reduced overall access time (latency time)

• Reduced transfer time (reduced page-miss frequency)

• Smaller page table size

• Bigger internal fragmentation

Small pages (typically 4-8 KB)

• Increased access time (increased seek time)

• Increased transfer time (increased page-miss frequency)

• Bigger page table size

• Thrashing

• Smaller internal fragmentation

Normally the page size lies between 4 KB and 256 KB

Page fault


• A page is loaded «on demand», that is when one of its data items is requested and the page is not already in memory: in this case an OS trap is generated

• The OS checks whether an invalid access took place (the task is aborted) or the page is simply not yet in memory (page fault)

• In the latter case the OS checks whether a free frame is available and stores the requested page there. When no free frame is available an occupied frame is freed: if its page was modified (dirty bit), the page is first written back to the bulk memory. The page table is then updated

• The OS restarts the interrupted instruction of the interrupted task (restartable instruction)
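The page fault steps listed above can be sketched with a small simulation. Everything here is a simplified assumption: the class `DemandPager` is hypothetical, the victim is chosen with a least-recently-used policy, and the transfers to and from bulk memory are only counted, not performed.

```python
from collections import OrderedDict


class DemandPager:
    """Toy demand paging model (hypothetical), LRU replacement."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        # page -> {"frame": int, "dirty": bool}; least recently used first
        self.resident = OrderedDict()
        self.page_faults = 0
        self.write_backs = 0

    def access(self, page, write=False):
        """Access a page and return the frame that holds it."""
        if page not in self.resident:
            self._handle_page_fault(page)
        self.resident.move_to_end(page)        # now the most recently used
        if write:
            self.resident[page]["dirty"] = True
        return self.resident[page]["frame"]

    def _handle_page_fault(self, page):
        self.page_faults += 1
        if self.free_frames:
            frame = self.free_frames.pop(0)    # a free frame is available
        else:
            # No free frame: free an occupied one (the LRU page).
            _victim, info = self.resident.popitem(last=False)
            if info["dirty"]:
                self.write_backs += 1          # write back to bulk memory
            frame = info["frame"]
        # Page table update: the page is now present and clean.
        self.resident[page] = {"frame": frame, "dirty": False}
```

With two frames, writing page 0, reading page 1 and then reading page 2 produces three faults; the third one evicts page 0, which is dirty and therefore costs one write-back.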

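What if the page is not in memory?

[Figure: the processor sends the virtual address to the translation mechanism, which must be «always» available in memory (at least the portion needed). On a hit the page is already in main memory and the datum is reached through page + offset. On a miss (fault) the OS fault handler loads the page from the bulk memory (disk) into main memory]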


Page table organisation

N.B. a different page table exists for each task

[Figure: the OS keeps one page table per process (TP-P1, TP-P2, ...). Each entry of the table of process 1 points either to a physical page in memory or to a disk address in the file system]

Process 1 page table (TP-P1)

Page number    Phys. page or disk address    Memory (M) or Disk (D)
0              27                            M
1              44                            M
2              8714                          D
3              16                            M

E.g.: page 1 is located in physical page 44; page 2 is on disk sector 8714

Page table size


• Consider a virtual address space with 36-bit address parallelism and frames/pages of 16 KBytes (16 KBytes correspond to a 14-bit offset; the page number therefore consists of 22 bits)

• The page table contains 2^22 descriptors (2^2 x 2^10 x 2^10 = 4 x 1024 x 1024 = 4M), each holding a 22-bit address (pages are aligned, which means that the 14 LSBits of their initial addresses are zero and need not be stored in the descriptor). With 10 status bits, 4 bytes per descriptor are needed. The page table of each process therefore occupies 4M x 4 bytes = 16 MBytes!

• The total memory space for the page tables is therefore 16 MBytes x the number of active processes (very often hundreds): an unacceptable memory occupancy
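The arithmetic of this example can be checked with a few lines. The function name `page_table_bytes` and its parameters are introduced here for illustration; the page size is assumed to be a power of two, as stated for frames and pages.

```python
def page_table_bytes(address_bits, page_bytes, descriptor_bytes):
    """Size in bytes of a single-level page table."""
    if page_bytes <= 0 or page_bytes & (page_bytes - 1):
        raise ValueError("page size must be a power of two")
    offset_bits = page_bytes.bit_length() - 1        # 16 KB -> 14 bits
    num_descriptors = 1 << (address_bits - offset_bits)
    return num_descriptors * descriptor_bytes


# The case discussed above: 36-bit addresses, 16 KB pages, 4-byte descriptors.
size = page_table_bytes(address_bits=36, page_bytes=16 * 1024, descriptor_bytes=4)
size_in_mbytes = size // (1024 * 1024)
```

Multiplying the result by the number of active processes gives the total occupancy, which is what makes a single-level table impractical for large address spaces.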

Multiple levels page table


Hierarchical organisation (case of 4 KB pages and 32-bit addresses)

• The 32-bit virtual address is split into three fields: level 1 index (10 bits), level 2 index (10 bits) and offset in page (12 bits). The two indexes together form the virtual page number

• A system register, one for each process, holds the initial physical address of the level 1 table

• Level 1 table: 1024 (2^10) elements of 4 bytes = 4 KB. Each element holds the (aligned) physical address of a level 2 table plus status bits, so it points to up to 1024 level 2 tables

• Level 2 table: 1024 elements of 4 bytes = 4 KB. Each element holds the initial physical address of a user page (data/code) plus status bits

• An address is 32 bits = 4 bytes, but pages are aligned (12 LSBits = 0), so each element stores 20 address bits + 12 status bits = 32 bits

• Each table (level 1 or level 2) is 1024 x 4 bytes = 4 KB, that is exactly the size of a page! Level 2 tables are loaded when necessary

• The first level table is loaded when the task is started and is always present in memory


Hierarchical organisation (4 KB pages)

[Figure: two-level translation of a 32-bit virtual address split into level 1 (10 bits), level 2 (10 bits) and offset (12 bits); level 1 and level 2 together form the virtual page number]

• Level I table: 10-bit index (entries 0..1023, one for each of the 1024 level II tables). Each 4-byte element stores the initial physical address (bits 31-12) + status (bits 11-0) of a level II table

• Level II table: 10-bit index (entries 0..1023). Each 4-byte element stores the initial physical address + status of a physical memory page (contiguous addresses); a level II table therefore corresponds to 1024 x 4 KB (12 bits) = 4 MB

• Physical page: 4 KB (12 bits), contiguous addresses. The offset selects the datum (in this example a 16-bit word; 4 bytes if the parallelism is 32 bits)

• The 4 GB virtual memory is conceptually divided into 1024 (10 bits) blocks of 4 MB (22 bits)


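Hierarchical organisation (4 KB pages): example

Ex. Addr. 00803019H -> 0000000010 0000000011 000000011001

• Level 1: slot 2 (0000000010)

• Level 2: slot 3 (0000000011)

• Offset: 25d (000000011001)

[Figure: the 4 GB virtual memory seen as 1024 blocks of 4 MB (level 1 slots 0..1023), each made of 1024 pages of 4 KB (level 2 slots 0..1023). Slot 2 of the level 1 table gives the physical address of a level 2 table; slot 3 of that table gives the physical address of the physical page; offset 25 selects the datum inside the page (a byte in this example). Each table element is 4 bytes: address in bits 31-12, status in bits 11-0]

(The size of the addressed data depends on the operation code)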
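The decomposition of the example address can be reproduced with shifts and masks. This is a sketch for the 10 + 10 + 12 bit layout only; the function names, the nested dictionaries standing in for the tables and the physical address `0x0004A000` are hypothetical values chosen for the example.

```python
def split_two_level(vaddr):
    """Split a 32-bit virtual address into (level 1, level 2, offset)."""
    level1 = (vaddr >> 22) & 0x3FF   # top 10 bits
    level2 = (vaddr >> 12) & 0x3FF   # next 10 bits
    offset = vaddr & 0xFFF           # low 12 bits
    return level1, level2, offset


def walk(vaddr, level1_table):
    """Translate vaddr through a two-level table.

    level1_table maps a level 1 slot to a level 2 table, which in turn maps
    a level 2 slot to the (aligned) initial physical address of a page.
    Status bits and missing tables are ignored in this sketch.
    """
    level1, level2, offset = split_two_level(vaddr)
    level2_table = level1_table[level1]
    page_base = level2_table[level2]
    return page_base | offset


# The address of the example: level 1 slot 2, level 2 slot 3, offset 25.
fields = split_two_level(0x00803019)

# Hypothetical tables: the page is assumed to start at 0x0004A000.
tables = {2: {3: 0x0004A000}}
physical = walk(0x00803019, tables)
```

Each translation done this way costs two table accesses before the data access itself, which is the overhead a TLB is meant to hide.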


Hierarchical organisation

Each level II table is a page which does not contain data BUT the physical addresses of the pages holding the requested data.

Upon a context switch only the first level table (4 KB) must be present in memory, while the second level tables are loaded only when needed, using a Least Recently Used mechanism similar to that of the data pages

In modern processors, where the address parallelism exceeds 38 bits, hierarchical page systems with 3 levels are implemented

As already pointed out, each data access would require multiple memory accesses: unacceptable. The Translation Lookaside Buffer (TLB) mechanism is therefore used: it stores the latest translations between logical and physical addresses, drastically reducing the access delay (except for the data access itself, which in turn is reduced by the code/data caches, see later).

N.B. Page table changes (for instance the initial address of a data page) are NOT automatically reflected in the TLB, which must be cleared upon a context switch. The OS is responsible for the consistency.
