
Chapter 5: Memory

Ngo Lam Trung

[with materials from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2008, MK, and M.J. Irwin's presentation, PSU 2008]


Content

Memory hierarchy

Cache

Virtual memory


Review: Major Components of a Computer

[Figure: Processor (Control and Datapath), Memory, and Devices (Input and Output); the memory side is arranged as Cache, Main Memory, and Secondary Memory (Disk).]


Memory technology

Static RAM (SRAM)

0.5ns – 2.5ns, $2000 – $5000 per GB

Dynamic RAM (DRAM)

50ns – 70ns, $20 – $75 per GB

Magnetic disk

5ms – 20ms, $0.20 – $2 per GB

Fact:

Large memories are slow

Fast memories are small (and expensive)

Ideal memory:

Access time of SRAM

Capacity and cost/GB of disk


A Typical Memory Hierarchy

On-chip components: the Datapath and Control, with a RegFile, Instr Cache, Data Cache, ITLB, and DTLB; off-chip, a Second Level Cache (SRAM), then Main Memory (DRAM) and Secondary Memory (Disk).

Speed (cycles): ½'s, 1's, 10's, 100's, 10,000's
Size (bytes): 100's, 10K's, M's, G's, T's
Cost: highest ... lowest

How to get an ideal memory:

As fast as SRAM?

As cheap as disk?


Characteristics of the Memory Hierarchy

Increasing distance from the processor in access time: Processor, L1$, L2$, Main Memory, Secondary Memory – with the (relative) size of the memory growing at each level.

Transfer units grow with distance: 4-8 bytes (word) between the processor and L1$, 8-32 bytes (block) between L1$ and L2$, 1 to 4 blocks between L2$ and main memory, and 1,024+ bytes (disk sector = page) between main memory and secondary memory.

Inclusive – a higher level is a subset of the lower level.


The Memory Hierarchy: Locality Principle

C program:

int x[1000], temp, i, j;
for (i = 0; i < 999; i++)
    for (j = i + 1; j < 1000; j++)
        if (x[i] < x[j]) {
            temp = x[i];
            x[i] = x[j];
            x[j] = temp;
        }

The data memory space of x is accessed multiple times.

The instruction memory space of the two for loops is used repeatedly.

[Figure: CPU, Cache, Main memory. Blocks of instructions/data are transferred from main memory into the cache; the CPU's instruction fetches and data accesses then hit the cache multiple times.]


The Memory Hierarchy: Locality Principle

Temporal Locality (locality in time):
If a memory location is referenced, it will tend to be referenced again soon.
→ Keep most recently accessed data items closer to the processor.

Spatial Locality (locality in space):
If a memory location is referenced, the locations with nearby addresses will tend to be referenced soon.
→ Move blocks consisting of contiguous words closer to the processor.
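To make the two kinds of locality concrete, here is a small illustrative C sketch (not from the original slides): both functions compute the same sum, but the first walks consecutive addresses (spatial locality) while the second strides across rows; the accumulator in both is reused on every iteration (temporal locality).

#include <stdio.h>

#define N 1024

static int a[N][N];

/* Good spatial locality: consecutive addresses, so one fetched
 * cache block serves several iterations. */
long sum_row_major(void)
{
    long sum = 0;   /* reused every iteration: temporal locality */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Poor spatial locality: each access is N*sizeof(int) bytes from
 * the last, so the nearby words of each fetched block go unused. */
long sum_col_major(void)
{
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void)
{
    printf("%ld %ld\n", sum_row_major(), sum_col_major());
    return 0;
}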


Terminology

Block (or line): the minimum unit of information that is present in a cache

Hit: the requested memory item is found in a level of the memory hierarchy

Miss: the requested memory item is not found in a level of the memory hierarchy


Cache

The level of the memory hierarchy between the processor and main memory.

The CPU fetches instructions and data from the cache; if they are found (cache hit), access is fast.

If not found (cache miss), the block is loaded from main memory into the cache and then accessed in the cache, paying a miss penalty.

The cache is a small window onto main memory.

[Figure: CPU to Cache: instruction fetch and data access; Cache to Main memory: blocks of data.]


Cache Basics

Two questions to answer (in hardware):

Q1: How do we know if a data item is in the cache?
Q2: If it is, how do we find it?

Direct mapped:

Each memory block is mapped to exactly one block in the cache
- lots of lower-level blocks must share blocks in the cache

Address mapping (to answer Q2):

(block address) modulo (# of blocks in the cache)

The tag field: associated with each cache block; it holds the address information (the upper portion of the address) required to identify the block (to answer Q1)

The valid bit: indicates whether the block holds valid data
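As a small C sketch of this address split (illustrative; the 1K-block size is an assumption that anticipates the MIPS example below):

#include <stdint.h>

#define NUM_BLOCKS 1024   /* assumed: 1K one-word blocks */

/* Split a 32-bit byte address into direct-mapped cache fields
 * (one-word blocks). With a power-of-two block count, the modulo
 * and divide are just bit slices. */
uint32_t byte_offset(uint32_t addr) { return addr & 0x3; }               /* bits 1-0 */
uint32_t cache_index(uint32_t addr) { return (addr >> 2) % NUM_BLOCKS; } /* (block address) mod (# blocks) */
uint32_t cache_tag(uint32_t addr)   { return (addr >> 2) / NUM_BLOCKS; } /* upper address bits */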


Caching: A Simple First Example

A direct-mapped cache with four one-word blocks (indexes 00, 01, 10, 11), each entry holding a Valid bit, a Tag, and Data. Main memory holds sixteen one-word blocks (byte addresses 0000xx through 1111xx); with 32-bit words, the two low-order address bits define the byte in the word.

(block address) modulo (# of blocks in the cache)

Q2: How do we find it? Use the next 2 low-order memory address bits – the index – to determine which cache block (i.e., modulo the number of blocks in the cache).

Q1: Is it there? Compare the cache tag to the high-order 2 memory address bits to tell if the memory block is in the cache.


Direct Mapped Cache

Consider the main memory word reference string: 0 1 2 3 4 3 4 15

Start with an empty cache – all blocks initially marked as not valid.

0 miss, 1 miss, 2 miss, 3 miss – cold-start misses; the cache now holds Mem(0), Mem(1), Mem(2), Mem(3)
4 miss – 4 mod 4 = 0, so Mem(4) (tag 01) replaces Mem(0)
3 hit, 4 hit
15 miss – 15 mod 4 = 3, so Mem(15) (tag 11) replaces Mem(3)

8 requests, 6 misses

What if we repeated this reference string 1,000,000 times?
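The trace is easy to reproduce with a minimal simulator (an illustrative sketch, not part of the slides). Running it prints 8 requests, 6 misses; extending the loop to repeat the string shows that each extra pass adds four more misses, since 0/4 and 3/15 keep evicting each other.

#include <stdio.h>
#include <stdbool.h>

#define NUM_BLOCKS 4

int main(void)
{
    int refs[] = {0, 1, 2, 3, 4, 3, 4, 15};
    int n = sizeof refs / sizeof refs[0];
    int tag[NUM_BLOCKS];
    bool valid[NUM_BLOCKS] = {false};
    int misses = 0;

    for (int i = 0; i < n; i++) {
        int index = refs[i] % NUM_BLOCKS;   /* (block address) mod (# blocks) */
        int t     = refs[i] / NUM_BLOCKS;   /* upper address bits */
        if (!valid[index] || tag[index] != t) {
            misses++;                       /* miss: fill the block */
            valid[index] = true;
            tag[index]   = t;
        }
    }
    printf("%d requests, %d misses\n", n, misses);  /* prints: 8 requests, 6 misses */
    return 0;
}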


MIPS Direct Mapped Cache Example

One word blocks, cache size = 1K words (or 4KB)

The 32-bit address (bits 31 ... 0) is divided into a 20-bit Tag (bits 31-12), a 10-bit Index (bits 11-2), and a 2-bit Byte offset (bits 1-0). The Index selects one of the 1024 (Valid, Tag, Data) entries; the stored tag is compared against the 20 address tag bits, and Hit is asserted when they match and the entry is valid, driving out the 32-bit Data word.

What kind of locality are we taking advantage of?


Multiword Block Direct Mapped Cache

Four words/block, cache size = 1K words

The 32-bit address now divides into a 20-bit Tag (bits 31-12), an 8-bit Index (bits 11-4) selecting one of 256 (Valid, Tag, Data) entries, a 2-bit Block offset (bits 3-2) selecting the word within the four-word block, and a 2-bit Byte offset (bits 1-0).

What kind of locality are we taking advantage of?


Taking Advantage of Spatial Locality

Let the cache block hold more than one word. Consider the same reference string, 0 1 2 3 4 3 4 15, on a cache with two two-word blocks.

Start with an empty cache – all blocks initially marked as not valid.

0 miss (loads Mem(1), Mem(0)), 1 hit
2 miss (loads Mem(3), Mem(2)), 3 hit
4 miss (tag 01 replaces Mem(1), Mem(0) with Mem(5), Mem(4)), 3 hit, 4 hit
15 miss (tag 11 replaces Mem(3), Mem(2) with Mem(15), Mem(14))

8 requests, 4 misses – spatial locality within the blocks turns two of the previous misses into hits


Miss Rate vs Block Size vs Cache Size

[Plot: miss rate (%) versus block size (bytes), one curve per cache size: 8 KB, 16 KB, 64 KB, 256 KB.]

Miss rate goes up if the block size becomes a significant fraction of the cache size because the number of blocks that can be held in the same size cache is smaller (increasing capacity misses).


Cache Field Sizes

The number of bits in a cache includes both the storage for data and for the tags.

For a 32-bit byte address:
For a direct-mapped cache with 2^n blocks, n bits are used for the index.
For a block size of 2^m words (2^(m+2) bytes), m bits are used to address the word within the block and 2 bits are used to address the byte within the word.
What is the size of the tag field? 32 − (n + m + 2) bits.

The total number of bits in a direct-mapped cache is then:

2^n × (block size + tag field size + valid field size)

How many total bits are required for a direct-mapped cache with 16KB of data and 4-word blocks, assuming a 32-bit address?
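Working the example (the arithmetic is not on the slide, but follows directly from the formula above): 16KB of data = 4K words = 2^12 words; with 4-word blocks (m = 2) that gives 2^10 blocks, so n = 10. Each block stores 4 × 32 = 128 data bits, an 18-bit tag (32 − 10 − 2 − 2), and 1 valid bit. Total = 2^10 × (128 + 18 + 1) = 2^10 × 147 = 147 Kbits ≈ 18.4 KB, about 1.15 times the 16KB needed for the data alone.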


Handling Cache Hits

Read hits (I$ and D$):

this is what we want!

Write hits (D$ only):

require the cache and memory to be consistent
- always write the data into both the cache block and the next level in the memory hierarchy (write-through)
- writes run at the speed of the next level in the memory hierarchy – so slow! – or can use a write buffer and stall only if the write buffer is full

allow cache and memory to be inconsistent
- write the data only into the cache block (write-back the cache block to the next level in the memory hierarchy when that cache block is "evicted")
- need a dirty bit for each data cache block to tell if it needs to be written back to memory when it is evicted – can use a write buffer to help "buffer" write-backs of dirty blocks
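The two write-hit policies can be sketched in C (a software illustration of hardware behavior; the names and the toy next-level memory are invented for the sketch):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    bool     dirty;   /* needed by the write-back policy only */
    uint32_t tag;
    uint32_t data;    /* one-word blocks for simplicity */
} cache_block_t;

static uint32_t next_level[1 << 16];  /* toy next level of the hierarchy */

static void memory_write(uint32_t addr, uint32_t word)
{
    next_level[(addr >> 2) & 0xFFFF] = word;
}

/* Write-through: update the cache block AND the next level, so they
 * always stay consistent (slow, unless absorbed by a write buffer). */
void write_hit_through(cache_block_t *b, uint32_t addr, uint32_t word)
{
    b->data = word;
    memory_write(addr, word);
}

/* Write-back: update only the cache block and mark it dirty; the
 * next level is updated later, when the block is evicted. */
void write_hit_back(cache_block_t *b, uint32_t word)
{
    b->data = word;
    b->dirty = true;
}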


Sources of Cache Misses

Compulsory (cold start or process migration, first reference):
First access to a block.
There is not much we can do about these.
Solution: increase the block size (but this also increases the miss penalty).

Capacity:
The cache cannot contain all the blocks accessed by the program.
Solution: increase the cache size (may increase access time).

Conflict (collision):
Multiple memory locations are mapped to the same cache location.
Solution 1: increase the cache size.
Solution 2: increase associativity (may increase access time).


Handling Cache Misses (Single Word Blocks)

Read misses (I$ and D$):

stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache, and send the requested word to the processor, then let the pipeline resume

Write misses (D$ only):

1. stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache (which may involve having to evict a dirty block if using a write-back cache), write the word from the processor to the cache, then let the pipeline resume
or
2. Write allocate – just write the word into the cache, updating both the tag and data; no need to check for a cache hit, no need to stall
or
3. No-write allocate – skip the cache write (but must invalidate that cache block since it will now hold stale data) and just write the word to the write buffer (and eventually to the next memory level); no need to stall if the write buffer isn't full


Multiword Block Considerations

Read misses (I$ and D$):

Processed the same as for single word blocks – a miss returns the entire block from memory.

Miss penalty grows as block size grows:
- Early restart – the processor resumes execution as soon as the requested word of the block is returned
- Requested word first – the requested word is transferred from the memory to the cache (and processor) first

Nonblocking cache – allows the processor to continue to access the cache while the cache is handling an earlier miss

Write misses (D$):

If using write allocate, we must first fetch the block from memory and then write the word to the block, or we could end up with a "garbled" block in the cache (e.g., for 4-word blocks: a new tag, one word of data from the new block, and three words of data from the old block).


Memory Systems that Support Caches

The off-chip interconnect and memory architecture can affect overall system performance in dramatic ways.

[Figure: on-chip CPU and Cache, connected over a one-word-wide memory bus (32-bit data and 32-bit address per cycle) to DRAM memory.]

One word wide organization (one word wide bus and one word wide memory). Assume:

1. 1 memory bus clock cycle to send the address
2. 15 memory bus clock cycles to get the 1st word in the block from DRAM (row cycle time), 5 memory bus clock cycles for the 2nd, 3rd, and 4th words (column access time)
3. 1 memory bus clock cycle to return a word of data

Memory-bus-to-cache bandwidth = number of bytes accessed from memory and transferred to cache/CPU per memory bus clock cycle


One Word Wide Bus, One Word Blocks

If the block size is one word, then for a memory access due to a cache miss, the pipeline will have to stall for the number of cycles required to return one data word from memory:

1 memory bus clock cycle to send the address
15 memory bus clock cycles to read DRAM
1 memory bus clock cycle to return the data

17 total clock cycles miss penalty

The number of bytes transferred per clock cycle (bandwidth) for a single miss is 4/17 = 0.235 bytes per memory bus clock cycle.


One Word Wide Bus, Four Word Blocks

What if the block size is four words and each word is in a different DRAM row?

1 cycle to send the 1st address
4 × 15 = 60 cycles to read DRAM (15 cycles per word)
1 cycle to return the last data word

62 total clock cycles miss penalty

The number of bytes transferred per clock cycle (bandwidth) for a single miss is (4 × 4)/62 = 0.258 bytes per clock.


One Word Wide Bus, Four Word Blocks

What if the block size is four words and all words are in the same DRAM row?

1 cycle to send the 1st address
15 + 3 × 5 = 30 cycles to read DRAM (row cycle time for the 1st word, then three column accesses)
1 cycle to return the last data word

32 total clock cycles miss penalty

The number of bytes transferred per clock cycle (bandwidth) for a single miss is (4 × 4)/32 = 0.5 bytes per clock.


Interleaved Memory, One Word Wide Bus

With four interleaved DRAM memory banks (bank 0 through bank 3) behind the one-word-wide bus, the bank reads overlap. For a block size of four words:

1 cycle to send the 1st address
15 cycles to read the DRAM banks (all four banks access in parallel)
4 × 1 = 4 cycles to return the four data words

20 total clock cycles miss penalty

The number of bytes transferred per clock cycle (bandwidth) for a single miss is (4 × 4)/20 = 0.8 bytes per clock.
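The three four-word miss penalties drop out of the same timing assumptions; here is a tiny C sketch (illustrative, using the slides' cycle counts) that computes them and the resulting bandwidths:

#include <stdio.h>

/* Timing assumptions from the slides (memory bus clock cycles). */
enum { SEND_ADDR = 1, ROW_ACCESS = 15, COL_ACCESS = 5, RETURN_WORD = 1 };

/* Four-word block, one-word-wide bus, each word in a different row. */
int penalty_different_rows(void) { return SEND_ADDR + 4 * ROW_ACCESS + RETURN_WORD; }

/* Four-word block, all words in the same DRAM row. */
int penalty_same_row(void) { return SEND_ADDR + ROW_ACCESS + 3 * COL_ACCESS + RETURN_WORD; }

/* Four-word block, four interleaved banks read in parallel. */
int penalty_interleaved(void) { return SEND_ADDR + ROW_ACCESS + 4 * RETURN_WORD; }

int main(void)
{
    printf("different rows: %d cycles, %.3f bytes/cycle\n",
           penalty_different_rows(), 16.0 / penalty_different_rows());
    printf("same row:       %d cycles, %.3f bytes/cycle\n",
           penalty_same_row(), 16.0 / penalty_same_row());
    printf("interleaved:    %d cycles, %.3f bytes/cycle\n",
           penalty_interleaved(), 16.0 / penalty_interleaved());
    return 0;   /* prints 62, 32, and 20 cycles: 0.258, 0.500, 0.800 */
}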


Reducing Cache Miss Rates #1

1. Allow more flexible block placement.

Direct mapped cache: a memory block maps to exactly one cache block.

Fully associative cache: a memory block can be mapped to any cache block.

A compromise is to divide the cache into sets, each of which consists of n "ways" (n-way set associative). A memory block maps to a unique set (specified by the index field) and can be placed in any way of that set (so there are n choices):

(block address) modulo (# sets in the cache)
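As a sketch of the lookup (a software stand-in for what the hardware does with parallel comparators; the 2-set, 2-way geometry is an assumption chosen to match the example that follows):

#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 2
#define NUM_WAYS 2

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t data;
} way_t;

static way_t cache[NUM_SETS][NUM_WAYS];

/* Look up a block address; returns true on a hit.  All tags in the
 * selected set are compared (in hardware, in parallel). */
bool lookup(uint32_t block_addr, uint32_t *word)
{
    uint32_t set = block_addr % NUM_SETS;  /* (block address) mod (# sets) */
    uint32_t tag = block_addr / NUM_SETS;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            *word = cache[set][w].data;
            return true;
        }
    }
    return false;  /* miss: a fill would pick a way, e.g., the least recently used */
}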


Another Reference String Mapping

Consider the main memory word reference string: 0 4 0 4 0 4 0 4

Start with an empty cache – all blocks initially marked as not valid. In the direct-mapped cache, words 0 (tag 00) and 4 (tag 01) both map to block 00, so each reference evicts the other:

0 miss, 4 miss, 0 miss, 4 miss, 0 miss, 4 miss, 0 miss, 4 miss

8 requests, 8 misses

Ping-pong effect due to conflict misses – two memory locations that map into the same cache block.


Set Associative Cache Example

A two-way set associative cache: two sets (0 and 1), each with two ways (Way 0 and Way 1), and each way holding a V bit, Tag, and one-word Data block. Main memory is sixteen one-word blocks (byte addresses 0000xx through 1111xx; the two low-order bits define the byte in the 32-bit word).

Q2: How do we find it? Use the next 1 low-order memory address bit to determine which cache set (i.e., modulo the number of sets in the cache).

Q1: Is it there? Compare all the cache tags in the set to the high-order 3 memory address bits to tell if the memory block is in the cache.


Four-Way Set Associative Cache

2^8 = 256 sets, each with four ways (each way with one block).

The 32-bit address divides into a 22-bit Tag (bits 31-10), an 8-bit Index (bits 9-2) that selects one of the 256 sets, and a 2-bit Byte offset (bits 1-0). Each way (Way 0 through Way 3) has its own 256-entry array of (V, Tag, Data); all four tags in the selected set are compared in parallel, and a 4-to-1 multiplexor (4x1 select) steers the 32-bit Data from the hitting way to the output, along with the Hit signal.


Range of Set Associative Caches

For a fixed-size cache, an increase in the number of blocks per set results in a decrease in the number of sets.

Address fields: Tag (used for the tag compare) | Index (selects the set) | Block offset (selects the word in the block) | Byte offset

Increasing associativity shrinks the index and grows the tag; the limit is a fully associative cache (only one set), where the tag is all the bits except the block and byte offset.

Decreasing associativity ends at direct mapped (only one way), with smaller tags and only a single comparator.


Reducing Cache Miss Rates #2

2. Use multiple levels of caches

Normally a unified L2 cache (holding both instructions and data), and in some cases even a unified L3 cache.


Multilevel Cache Design Considerations

Design considerations for L1 and L2 caches are very different:

The primary cache should focus on minimizing hit time in support of a shorter clock cycle
- Smaller, with smaller block sizes

The secondary cache(s) should focus on reducing miss rate to reduce the penalty of long main memory access times
- Larger, with larger block sizes
- Higher levels of associativity

The miss penalty of the L1 cache is significantly reduced by the presence of an L2 cache – so L1 can be smaller but have a higher miss rate.

For the L2 cache, hit time is less important than miss rate:
The L2$ hit time determines the L1$'s miss penalty.
The L2$ local miss rate is much higher (>>) than the global miss rate.
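To see why these priorities differ, consider the average memory access time, AMAT = L1 hit time + L1 miss rate × L1 miss penalty, where with an L2 the L1 miss penalty becomes L2 hit time + L2 local miss rate × main memory access time. With illustrative numbers (invented for this example, not from the slides): a 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit, 25% L2 local miss rate, and 100-cycle main memory access give an L1 miss penalty of 10 + 0.25 × 100 = 35 cycles, so AMAT = 1 + 0.05 × 35 = 2.75 cycles, versus 1 + 0.05 × 100 = 6 cycles with no L2.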


Summary

Memory hierarchy and the locality principle

Cache design:
Direct mapped
Set associative

Memory access on cache hits and misses


Review: The Memory Hierarchy

Increasing distance from the processor in access time: Processor, L1$, L2$, Main Memory, Secondary Memory – with the (relative) size of the memory growing at each level, and transfers of 4-8 bytes (word), 8-32 bytes (block), 1 to 4 blocks, and 1,024+ bytes (disk sector = page) between successive levels.

Inclusive – what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM.

Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.


Virtual Memory

Use main memory as a "cache" for secondary memory.

Allows efficient and safe sharing of memory among multiple programs.

Provides the ability to easily run programs larger than the size of physical memory.

Simplifies loading a program for execution by providing for code relocation (i.e., the code can be loaded anywhere in main memory).


Virtual Memory

What makes it work? – again, the Principle of Locality:
A program is likely to access a relatively small portion of its address space during any period of time.

Each program is compiled into its own address space – a "virtual" address space.

During run-time, each virtual address must be translated to a physical address (an address in main memory).


Virtual Memory

A program's address space is divided into pages (all of one fixed size) or segments (of variable sizes).

The starting location of each page (either in main memory or in secondary memory) is contained in the program's page table.

[Figure: Program 1's virtual address space and Program 2's virtual address space both mapping onto pages of the same main memory.]
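A minimal C sketch of the translation step (the page size, field widths, and flat page-table layout are assumptions for illustration; real hardware also checks a valid bit, faulting to the OS when the page is not in main memory, and caches translations in a TLB):

#include <stdint.h>

#define PAGE_BITS  12                  /* assumed 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_BITS)
#define NUM_VPAGES 256                 /* toy virtual address space: 1 MB */

typedef struct {
    uint32_t valid : 1;   /* is the page present in main memory? */
    uint32_t ppn   : 20;  /* physical page number */
} pte_t;

static pte_t page_table[NUM_VPAGES];   /* assumed filled in by the OS */

/* Translate a virtual address to a physical address (vaddr assumed to
 * lie within the toy address space).  The page offset passes through
 * unchanged; only the page number is mapped. */
uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;        /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);
    /* if (!page_table[vpn].valid) -> page fault, handled by the OS */
    return ((uint32_t)page_table[vpn].ppn << PAGE_BITS) | offset;
}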


Virtual Memory

Some topics to follow:

Address translation from virtual to physical memory addresses

Handling page faults (misses)

Cache design with virtual memory