Lecture 5: Memory

Lecture 5: Memory - IDATDTS10/info/lectures/Lecture5.pdf


Page 1: Lecture 5: Memory

Page 2: Memory

§5.1 Introduction

It is 'impossible' to have memory that is both unlimited (large in capacity) and fast.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 2

Page 3: Memory Technology

Static RAM (SRAM): 0.5ns – 2.5ns, $2000 – $5000 per GB

Dynamic RAM (DRAM): 50ns – 70ns, $20 – $75 per GB

Magnetic disk: 5ms – 20ms, $0.20 – $2 per GB

Ideal memory: access time of SRAM, capacity and cost/GB of disk

Page 4: Memory

It is 'impossible' to have memory that is both unlimited (large in capacity) and fast.

We create an illusion for the programmer. Before that, let us look at the way programs access memory.

Page 5: Principle of Locality

Programs access a small proportion of their address space at any time.

Temporal locality: items accessed recently are likely to be accessed again soon, e.g., instructions in a loop.

Spatial locality: items near those accessed recently are likely to be accessed soon, e.g., sequential instruction access, array data.

Page 6: To Take Advantage of Locality

Employ a memory hierarchy: use multiple levels of memories. 'Larger' distance from the processor means:

  • larger size
  • larger access time

Page 7: Memory Hierarchy

FIGURE 5.1 The basic structure of a memory hierarchy. By implementing the memory system as a hierarchy, the user has the illusion of a memory that is as large as the largest level of the hierarchy, but can be accessed as if it were all built from the fastest memory. Flash memory has replaced disks in many embedded devices, and may lead to a new level in the storage hierarchy for desktop and server computers; see Section 6.4. Copyright © 2009 Elsevier, Inc. All rights reserved.

Page 8: Taking Advantage of Locality

Memory hierarchy: store everything on disk. Copy recently accessed (and nearby) items from disk to a smaller DRAM memory (main memory).

Copy more recently accessed (and nearby) items from DRAM to a smaller SRAM memory (cache memory attached to the CPU).

Page 9: Memory Hierarchy Levels

Block (aka line): the unit of copying. May be multiple words.

If accessed data is present in the upper level:
  Hit: access satisfied by the upper level
  • Hit ratio: hits/accesses

If accessed data is absent:
  Miss: block copied from the lower level
  • Time taken: miss penalty
  • Miss ratio: misses/accesses = 1 – hit ratio
  Then the accessed data is supplied from the upper level.

Page 10

This structure, with the appropriate operating mechanisms, allows the processor to have an access time that is determined primarily by level 1 of the hierarchy and yet have a memory as large as level n. Although the local disk is normally the bottom of the hierarchy, some systems use tape or a file server over a local area network as the next levels of the hierarchy. 

Page 11: Cache Memory

§5.2 The Basics of Caches

Cache memory: the level of the memory hierarchy closest to the CPU.

Given accesses X1, …, Xn–1, Xn:

How do we know if the data is present?

Where do we look?

Page 12: Direct Mapped Cache

Location is determined by the address. Direct mapped: only one choice, (Block address) modulo (#Blocks in cache)

Page 13: Direct Mapped Cache

(Block address) modulo (#Blocks in cache)

Page 14: Direct Mapped Cache

If #Blocks is a power of 2, use the low-order address bits to compute the cache index.
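The mapping on the last few slides can be sketched in code. This is an illustrative sketch (the lecture itself contains no code); when #Blocks is a power of 2, the modulo reduces to masking the low-order bits:

```python
def cache_index(block_address: int, num_blocks: int) -> int:
    """Direct-mapped placement: (block address) modulo (#blocks in cache)."""
    return block_address % num_blocks

def cache_index_pow2(block_address: int, num_blocks: int) -> int:
    """When #blocks is a power of 2, modulo is just the low-order bits."""
    assert num_blocks & (num_blocks - 1) == 0, "num_blocks must be a power of 2"
    return block_address & (num_blocks - 1)

# Both give the same index for block address 22 in an 8-block cache:
assert cache_index(22, 8) == cache_index_pow2(22, 8) == 6  # 0b110
```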

Page 15: Tag Bits

Each cache location can store the contents of more than one memory location. How do we know which particular block is stored in a cache location?

Add a set of tag bits to the cache. The tag needs only the high-order bits of the address.

Page 16: Valid Bits

What if there is no data in a location? Valid bit: 1 = present, 0 = not present. Initially 0, because when the processor starts up the cache does not hold any valid data.

Page 17: Cache Example

8 blocks, 1 word/block, direct mapped. Initial state:

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N

Page 18: Cache Example

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Miss      110

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Page 19: Cache Example

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       Miss      010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Page 20: Cache Example

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Page 21: Cache Example

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Page 22: Cache Example

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       Miss      010

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
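The whole example sequence above can be replayed with a small simulator. A minimal sketch (not from the lecture), keeping a (valid, tag) pair per block:

```python
def simulate_direct_mapped(addresses, num_blocks=8):
    """Simulate a direct-mapped cache with 1 word/block.

    Each cache entry holds a (valid, tag) pair; the function returns
    'Hit' or 'Miss' for each word address in the access sequence."""
    cache = [(False, None)] * num_blocks
    results = []
    for addr in addresses:
        index = addr % num_blocks      # low-order bits select the block
        tag = addr // num_blocks       # high-order bits identify the source
        valid, stored_tag = cache[index]
        if valid and stored_tag == tag:
            results.append("Hit")
        else:
            results.append("Miss")
            cache[index] = (True, tag)  # fetch the block, record its tag
    return results

# The access sequence from slides 18-22 (word addresses):
print(simulate_direct_mapped([22, 26, 22, 26, 16, 3, 16, 18]))
# ['Miss', 'Miss', 'Hit', 'Hit', 'Miss', 'Miss', 'Hit', 'Miss']
```

The final 'Miss' shows address 18 evicting Mem[11010] from block 010, exactly as in the slide 22 table.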

Page 23: Address Subdivision

The memory address is 32 bits.

Page 24: Address Subdivision

The cache holds 1024 blocks and the block size is 1 word. So the cache size is 1024 (2^10) words, or (2^10) x 4 bytes, i.e., 4KB.

Page 25: Address Subdivision

Since the cache holds 1024 blocks, we need 10 bits to index the cache.

Page 26: Address Subdivision

1 word (4 bytes) is fetched into each cache block, so there is no need to index within a block.

Page 27: Address Subdivision

That leaves the upper 20 bits as tag bits.

Page 28: Example: Larger Block Size

64 blocks, 16 bytes/block. To what block number does byte address 1200 map?

Block address = 1200/16 = 75
Block number = 75 modulo 64 = 11

Address layout (bit 31 down to bit 0):

Tag (bits 31–10): 22 bits
Index (bits 9–4): 6 bits
Offset (bits 3–0): 4 bits
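The worked example above can be checked in code. This sketch (illustrative, not part of the lecture) computes the block mapping and then extracts the same fields with bit operations:

```python
def map_address(byte_address: int, block_size: int = 16, num_blocks: int = 64):
    """Map a byte address to (block address, block number)
    for a 64-block cache with 16-byte blocks."""
    block_address = byte_address // block_size   # drop the 4-bit offset
    block_number = block_address % num_blocks    # keep the 6-bit index
    return block_address, block_number

assert map_address(1200) == (75, 11)  # 1200/16 = 75, 75 mod 64 = 11

# The same split via the bit fields: 4-bit offset, 6-bit index, 22-bit tag
addr = 1200
offset = addr & 0xF           # bits 3-0
index = (addr >> 4) & 0x3F    # bits 9-4
tag = addr >> 10              # bits 31-10
assert (tag, index, offset) == (1, 11, 0)
```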

Page 29: Block Size Considerations

Larger blocks should reduce the miss rate, due to spatial locality.

But in a fixed-sized cache, larger blocks mean fewer of them:
• more competition, so the miss rate can increase.

Larger blocks also mean a larger miss penalty, which can override the benefit of the reduced miss rate.

Page 30: Cache Misses

On a cache hit, the CPU proceeds normally. On a cache miss:
  Stall the CPU pipeline
  Fetch the block from the next level of the hierarchy
  Instruction cache miss: restart the instruction fetch
  Data cache miss: complete the data access

Page 31: Write-Through

On each data-write hit, we could just update the block in the cache. But then the cache and memory would be inconsistent.

Write-through: also update memory. But this makes writes take longer.

Page 32: Write Buffer

Solution: a write buffer that holds data waiting to be written to memory.

The CPU continues immediately after writing to the write buffer.
• The write-buffer entry is freed later, when the memory write completes.

But the CPU stalls on a write if the write buffer is already full.

Page 33: Write Buffer

The write buffer can become full, and the processor will stall, if:
  the rate at which memory completes writes is less than the rate at which write instructions are generated, or
  there is a burst of writes.

Page 34: Write-Back

Alternative: on a data-write hit, just update the block in the cache, and keep track of whether each block is dirty.

When a dirty block is replaced, write it back to memory.

Page 35: Measuring Cache Performance

§5.3 Measuring and Improving Cache Performance

Components of CPU time:
  Program execution cycles
  • Includes cache hit time
  Memory stall cycles
  • Mainly from cache misses

Page 36: Measuring Cache Performance

Components of CPU time:
  Program execution cycles
  • Includes cache hit time
  Memory stall cycles
  • Mainly from cache misses

With simplifying assumptions:

Memory stall cycles
  = (Memory accesses / Program) × Miss rate × Miss penalty
  = (Instructions / Program) × (Misses / Instruction) × Miss penalty

Page 37: Cache Performance Example

Given:
  I-cache miss rate = 2%
  D-cache miss rate = 4%
  Miss penalty = 100 cycles
  Base CPI (ideal cache) = 2
  Loads & stores are 36% of instructions

How much faster would a processor with a perfect cache that never misses be?

Page 38: Cache Performance Example

Miss cycles for all instructions, with I as the instruction count:
  I-cache: I × 0.02 × 100 = 2 I
  D-cache: I × 0.36 × 0.04 × 100 = 1.44 I

Miss cycles per instruction: 2 + 1.44
So actual CPI = 2 + 2 + 1.44 = 5.44
The ideal CPU is 5.44/2 = 2.72 times faster.
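The arithmetic above maps directly onto the stall-cycle formula from slide 36. A quick check (illustrative only, using the example's parameters):

```python
# Parameters from the example on slides 37-38
i_miss_rate = 0.02       # I-cache miss rate
d_miss_rate = 0.04       # D-cache miss rate
miss_penalty = 100       # cycles
base_cpi = 2.0           # CPI with an ideal cache
mem_frac = 0.36          # loads & stores per instruction

# Memory stall cycles per instruction = (misses/instruction) x miss penalty
i_stall = i_miss_rate * miss_penalty              # every instruction is fetched
d_stall = mem_frac * d_miss_rate * miss_penalty   # only loads/stores access data

actual_cpi = base_cpi + i_stall + d_stall
speedup = actual_cpi / base_cpi

print(round(actual_cpi, 2))  # 5.44
print(round(speedup, 2))     # 2.72
```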

Page 39: Associative Caches

Fully associative:
  Allow a given block to go in any cache entry
  Requires all entries to be searched at once
  Comparator per entry (expensive)

n-way set associative:
  Each set contains n entries
  The block number determines the set: (Block number) modulo (#Sets in cache)
  Search all entries in a given set at once
  n comparators (less expensive)
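The set-index computation can be sketched as follows (an illustrative sketch; the assertions reproduce the Figure 5.13 example from the next slide, where memory block 12 is placed in an eight-block cache):

```python
def set_index(block_number: int, num_blocks: int, ways: int) -> int:
    """n-way set associative placement: (block number) modulo (#sets),
    where #sets = #blocks / associativity."""
    num_sets = num_blocks // ways
    return block_number % num_sets

# Memory block 12 in an 8-block cache:
assert set_index(12, 8, 1) == 4  # direct mapped: 12 mod 8 = 4
assert set_index(12, 8, 2) == 0  # two-way: four sets, 12 mod 4 = 0
assert set_index(12, 8, 8) == 0  # fully associative: a single set
```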

Page 40: Associative Cache Example

FIGURE 5.13 The location of a memory block whose address is 12 in a cache with eight blocks varies for direct-mapped, set-associative, and fully associative placement. In direct-mapped placement, there is only one cache block where memory block 12 can be found, and that block is given by (12 modulo 8) = 4. In a two-way set-associative cache, there would be four sets, and memory block 12 must be in set (12 modulo 4) = 0; the memory block could be in either element of the set. In a fully associative placement, the memory block for block address 12 can appear in any of the eight cache blocks. Copyright © 2009 Elsevier, Inc. All rights reserved.

Page 41: Spectrum of Associativity

FIGURE 5.14 An eight-block cache configured as direct mapped, two-way set associative, four-way set associative, and fully associative. The total size of the cache in blocks is equal to the number of sets times the associativity. Thus, for a fixed cache size, increasing the associativity decreases the number of sets while increasing the number of elements per set. With eight blocks, an eight-way set-associative cache is the same as a fully associative cache. Copyright © 2009 Elsevier, Inc. All rights reserved.

Page 42: Virtual Memory

§5.4 Virtual Memory

VM is the technique of using main memory as a "cache" for secondary (disk) storage, managed jointly by CPU hardware and the operating system (OS).

Same underlying concept as caching, but with different terminology.

Page 43: VM Terms

A virtual address is the address produced by the program.

A physical address is an address in main memory.

The CPU and OS translate virtual addresses to physical addresses.
  A VM "block" is called a page.
  A VM translation "miss" is called a page fault.

Page 44: Page Fault Penalty

On a page fault, the page must be fetched from disk. This takes millions of clock cycles: main memory latency is around 100,000 times better than disk latency.

Try to minimize the page fault rate:
  Smart replacement algorithms, implemented in software in the OS
  • Reading from disk is slow enough that the software overhead is negligible

Page 45: Motivation 1

Multiple programs share main memory, and they can change dynamically.

To avoid writing into each other's data, we would like a separate address space for each program.

With VM, each program gets a private virtual address space holding its frequently used code and data.

VM translates virtual addresses into physical addresses, providing protection from other programs.

Page 46: Motivation 2

A large program cannot fit into main memory.

VM automatically maps addresses into disk space if main memory is not sufficient.

Page 47: Address Translation

Address translation: the process by which a virtual address is mapped to a physical address.

Page 48: Address Translation

A virtual address has two components: the virtual page number and the page offset.

The page offset does not change during translation, and the number of offset bits determines the size of the page.

The number of pages addressable with a virtual address might be larger than the number of pages addressable with a physical address, which gives the illusion of an unbounded amount of virtual memory.
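The split-and-translate step can be sketched as below. This is a minimal illustration, assuming 4 KB pages (12 offset bits) and using a plain dictionary as a stand-in for the page table; both are assumptions for the example, not details from the lecture:

```python
PAGE_SIZE = 4096      # assumed 4 KB pages
OFFSET_BITS = 12      # log2(PAGE_SIZE): the offset bits set the page size

def translate(virtual_address: int, page_table: dict) -> int:
    """Split a virtual address into (virtual page number, offset),
    look up the physical page number, and reattach the unchanged offset."""
    vpn = virtual_address >> OFFSET_BITS
    offset = virtual_address & (PAGE_SIZE - 1)
    ppn = page_table[vpn]   # a missing entry would be a "page fault"
    return (ppn << OFFSET_BITS) | offset

# Hypothetical page table mapping virtual page 5 to physical page 2:
assert translate(5 * PAGE_SIZE + 0x123, {5: 2}) == 2 * PAGE_SIZE + 0x123
```

Note that the offset bits pass through untouched; only the page-number bits are rewritten.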

Page 49: Translation Using a Page Table

Page 50: Translation Using a Page Table

A page table, which resides in memory, is used for address translation.

Each program has its own page table.

Page 51: Translation Using a Page Table

To indicate the location of the page table in memory, a hardware register points to the start of the page table.

Each program has its own page table.

Page 52: Translation Using a Page Table

Note the use of a valid bit.

Page 53: Page Tables

The page table stores placement information:
  An array of page table entries, indexed by virtual page number
  The page table register in the CPU points to the page table in physical memory

If the page is present in memory, the entry stores the physical page number, plus other status bits.

If the page is not present, a page fault occurs and the OS is given control. The next few slides recap some OS concepts.

Page 54: Role of the OS

A process (a program in execution) has a context defined by the values in its program counter, registers, and page table.

If another process preempts this process, the context must be saved. Rather than save the entire page table, only the page table register is saved.

To restart the process in the 'running' state, the operating system reloads the context.

Page 55: Role of the OS

The OS is responsible for allocating physical memory and updating the page tables.

It ensures that the virtual addresses of different processes do not collide, thus providing protection.

Page faults are handled by the OS.

Page 56: Role of the OS

The OS creates a space on the disk (not main memory) for all pages of a process when the process is created, called the swap space.

The OS also keeps a record of where each virtual address of the process is located on the disk.

Page 57: Mapping Pages to Storage

Page 58: Page Fault

Handled by the OS.

If all pages in main memory are in use (it is full), the OS must choose a page to replace. The replaced page must be written to the swap space on the disk.

To reduce the page fault rate, prefer least-recently used (LRU) replacement: predict that a page that was NOT used recently will NOT be used in the near future.
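The LRU policy described above can be sketched with an ordered map tracking recency. This is an illustrative software sketch (real OS implementations typically use approximations of LRU, not an exact ordered structure):

```python
from collections import OrderedDict

def lru_page_faults(accesses, num_frames):
    """Count page faults for a sequence of virtual page numbers
    under exact LRU replacement with num_frames physical frames."""
    frames = OrderedDict()            # key order tracks recency of use
    faults = 0
    for page in accesses:
        if page in frames:
            frames.move_to_end(page)  # hit: mark as most recently used
        else:
            faults += 1               # page fault
            if len(frames) == num_frames:
                frames.popitem(last=False)  # evict least recently used
            frames[page] = True
    return faults

print(lru_page_faults([1, 2, 3, 1, 4, 2], num_frames=3))  # 5
```

In the printed example, the access to page 4 evicts page 2 (the least recently used page, since page 1 was touched again), so the final access to page 2 faults as well.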

Page 59: Writes

Disk writes take millions of cycles.
  Write-through is impractical, because it would take millions of cycles to write to the disk.
  • Building a write buffer for this is impractical.

VM uses write-back.

Page 60: Fast Translation Using a TLB

Address translation would appear to require extra memory references:
  One to access the page table itself
  Then the actual memory access

But access to page tables has good locality. So use a fast cache of recently used translations, called a Translation Look-aside Buffer (TLB).

Page 61: Memory Protection

VM allows different processes to share the same main memory. We need to protect against errant access, which requires OS assistance.

Page 62: Memory Protection

Hardware support for OS protection:
  Support two modes:
  • Privileged supervisor mode (aka kernel mode), meaning that the OS is running
  • User mode

  Privileged instructions that only the OS can use
  • Allow it to write the supervisor bit and the page table pointer

  Mechanisms (e.g., special instructions) to switch between supervisor mode and user mode (e.g., syscall in MIPS)

These features allow the OS to change page tables while preventing a user process from changing them.

Page 63: Concluding Remarks

§5.12 Concluding Remarks

Fast memories are small; large memories are slow. We really want fast, large memories, and caching gives this illusion.

Principle of locality: programs use a small part of their memory space frequently.

Memory hierarchy: L1 cache, L2 cache, …, DRAM memory, disk