Chap.7 Memory system Jen-Chang Liu, Spring 2006

Page 1

Chap.7 Memory system

Jen-Chang Liu, Spring 2006

Page 2

Big Ideas so far

15 weeks to learn big ideas in CS&E:

• Principle of abstraction, used to build systems as layers
• Pliable data: a program determines what it is
• Stored program concept: instructions are just data
• Greater performance by exploiting parallelism (pipeline)
• Principle of locality, exploited via a memory hierarchy (cache)
• Principles/pitfalls of performance measurement

Page 3

Five components of computer

Input, output, memory, datapath, control

Page 4

Outline

• Introduction
• Basics of caches
• Measuring cache performance
• Set associative cache
• Multilevel cache
• Virtual memory

Goals: make memory system fast; make memory system big

Page 5

Introduction

• Programmer's view of memory: an unlimited amount of fast memory
• How do we create this illusion?

[Figure: library scene with a bookshelf, a desk, one book, and many books]

Page 6

Principle of locality

Programs access a relatively small portion of their address space at any instant of time.

• Temporal locality: if an item is referenced, it will tend to be referenced again soon
• Spatial locality: if an item is referenced, items whose addresses are close by will tend to be referenced soon

Page 7

Cost and performance of memory

Technology      Access time        $ per GB in 2004
SRAM            0.5-5 ns           $4000-$10000
DRAM            50-70 ns           $100-$200
Magnetic disk   5-20 million ns    $0.5-$2

• SRAM: static random access memory
• DRAM: dynamic random access memory

How to build a memory system from the above memory technologies?

Page 8

Memory hierarchy

[Figure: memory hierarchy below the CPU. The level closest to the CPU is the fastest, the smallest, and has the highest cost ($/bit); the level farthest from the CPU is the slowest, the biggest, and has the lowest cost ($/bit).]

Level                   Ex.     Data
Closest to the CPU      SRAM    Subset of data
Middle                  DRAM    Subset of data
Farthest from the CPU   disk    All data

Page 9

Operation in memory hierarchy

[Figure: processor and memory levels; data are transferred between adjacent levels]

    if data is found     /* hit */
        transfer to processor;
    else                 /* miss */
        transfer data to upper level;

Access time: hit time on a hit, plus the miss penalty on a miss.

Page 10

Outline

• Introduction
• Basics of caches
• Measuring cache performance
• Set associative cache
• Multilevel cache
• Virtual memory

How to design memory hierarchy?

Page 11

Cache

• Cache: "a safe place for hiding or storing things" (Webster's dictionary)
• Cache memory:
  • The level of the memory hierarchy between the CPU and main memory
  • Any storage managed to take advantage of locality of access

[Figure: memory hierarchy below the CPU, ordered by size, cost ($/bit), and speed]

Page 12

What does a cache do?

[Figure: cache contents before and after a reference to Xn]

a. Before the reference to Xn: the cache holds X1, X2, X3, X4, Xn-2, Xn-1
b. After the reference to Xn: the cache holds X1, X2, X3, X4, Xn-2, Xn-1, Xn

Page 13

Problem to design a cache

• A cache contains part of the data in memory or disk
• Q1: How do we know if a data item is in the cache?
  => How should data fetched from memory be placed in the cache?

Page 14

Direct mapped cache (Fig 7.5)

Location in cache = (block address) modulo (no. of cache blocks in the cache)

[Figure: an 8-block cache (locations 000 to 111) and memory words at addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101. Addresses ending in 001 map to cache location 001; addresses ending in 101 map to cache location 101.]
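The modulo rule above can be sketched in a few lines of Python. This is only an illustration: the function name `cache_index` and the constant `NUM_BLOCKS` are made up here, and the addresses are the eight shown in the figure.

```python
def cache_index(block_address, num_blocks):
    """Direct-mapped placement: (block address) modulo (number of cache blocks)."""
    return block_address % num_blocks

NUM_BLOCKS = 8  # the 8-block cache from the figure (locations 000 to 111)
addresses = [0b00001, 0b00101, 0b01001, 0b01101,
             0b10001, 0b10101, 0b11001, 0b11101]

for addr in addresses:
    # With a power-of-two cache size, the index is just the low-order bits.
    print(f"{addr:05b} -> {cache_index(addr, NUM_BLOCKS):03b}")
```

Because the cache size is a power of two, the modulo simply keeps the low 3 bits of the address, which is why several memory words share one cache location.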

Page 15

Direct mapped cache (cont.)

Many memory words map to one location in the cache.

• Q: Which memory word is in the cache? Use a tag to identify it.
• Q: Is the memory block valid? (Ex. initially, the cache is empty.) Use a valid bit to identify it.

Cache entry layout: cache addr. | valid | tag | data word

Page 16

Fig 7.6

Page 17

Fig 7.6

Page 18

Cache access (Fig 7.7)

[Figure: direct-mapped cache with 1024 entries (index 0 to 1023), each holding a valid bit, a 20-bit tag, and 32 bits of data; the tag comparison produces Hit, and the entry supplies Data]

32-bit address (showing bit positions):

Field         Bits     Width
Tag           31-12    20
Index         11-2     10
Byte offset   1-0      2

• Data: the part of the cache that actually stores data
• Cache block size: 1 word = 4 bytes
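As a small sketch of how the address fields of Fig 7.7 are extracted, the following Python splits a 32-bit byte address into tag, index, and byte offset. The function name `split_address` is an illustrative choice, not something from the slides.

```python
def split_address(addr):
    """Split a 32-bit byte address for the 1024-entry, one-word-block cache."""
    byte_offset = addr & 0x3          # bits 1-0
    index = (addr >> 2) & 0x3FF       # bits 11-2, selects one of 1024 entries
    tag = (addr >> 12) & 0xFFFFF      # bits 31-12, compared with the stored tag
    return tag, index, byte_offset

tag, index, byte_offset = split_address(0x00001004)
print(tag, index, byte_offset)
```

For the sample address 0x00001004, the low 12 bits are 0x004, so the index is 1 and the byte offset is 0, while the remaining upper bits give tag 1.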

Page 19

Ex. Calculate bits in a cache

How many bits are required for a direct-mapped cache with 64KB of data and one-word blocks, assuming a 32-bit address?

1. Each block holds 1 word of data (32 bits); the byte offset takes 2 bits of the address
2. 64KB = 16K words = 2^14 words, so the index takes 14 bits
3. Tag = 32 - 14 - 2 = 16 bits
4. Cache bits: 2^14 x (32 + 16 + 1) = 784 Kbits = 98KB
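The same count can be checked with a short Python sketch. The function name and parameters are illustrative; it assumes one-word blocks and a power-of-two number of blocks, as in the example.

```python
def direct_mapped_cache_bits(data_bytes, address_bits=32, word_bytes=4):
    """Total storage bits (data + tag + valid) for a one-word-block cache."""
    num_blocks = data_bytes // word_bytes             # 64KB / 4 = 2^14 blocks
    index_bits = num_blocks.bit_length() - 1          # log2 for a power of two
    byte_offset_bits = word_bytes.bit_length() - 1    # 2 bits for 4-byte words
    tag_bits = address_bits - index_bits - byte_offset_bits
    bits_per_entry = word_bytes * 8 + tag_bits + 1    # data + tag + valid bit
    return num_blocks * bits_per_entry

total_bits = direct_mapped_cache_bits(64 * 1024)
print(total_bits, "bits =", total_bits // 8 // 1024, "KB")
```

The 34KB beyond the 64KB of data is the overhead of the tags and valid bits.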

Page 20

Ex. Real machine: DECstation 3100

[Figure: direct-mapped cache with 16K (2^14) entries, each holding a valid bit, a 16-bit tag, and 32 bits of data; outputs Hit and Data]

Address (showing bit positions): tag = bits 31-16 (16 bits), index = bits 15-2 (14 bits), byte offset = bits 1-0

64KB data, 98KB cache size

Page 21

Ex. DECStation 3100

• Uses the MIPS R2000 CPU
• Uses a pipeline as in Chap. 6

[Figure: non-pipelined execution (8 ns per instruction) versus pipelined execution (2 ns per stage) of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Stages: instruction fetch, Reg, ALU, data access, Reg. Axes: time versus program execution order (in instructions).]

Instruction memory and data memory: two memory units?

Page 22

Ex. DECStation 3100 caches

Instruction cache and data cache

[Figure: memory hierarchy with a 64KB instruction cache and a 64KB data cache at the level closest to the CPU]

Page 23

Ex. DECStation 3100 Cache access: Read

[Figure: memory hierarchy with a 64KB instruction cache and a 64KB data cache at the level closest to the CPU]

• Instruction cache: addressed by the PC
• Data cache: addressed by the address calculated from the ALU
• Outcomes: cache hit, or cache miss followed by updating the cache

Page 24

Peer Instruction

A. Mem hierarchies were invented before 1950. (UNIVAC I wasn’t delivered ‘til 1951)

B. If you know your computer’s cache size, you can often make your code run faster.

C. Memory hierarchies take advantage of spatial locality by keeping the most recent data items closer to the processor.

    ABC
1:  FFF
2:  FFT
3:  FTF
4:  FTT
5:  TFF
6:  TFT
7:  TTF
8:  TTT

Page 25

CS61C L31 Caches I (26) Garcia 2005 © UCB

Peer Instructions

1. All caches take advantage of spatial locality.

2. All caches take advantage of temporal locality.

3. On a read, the return value will depend on what is in the cache.

    ABC
1:  FFF
2:  FFT
3:  FTF
4:  FTT
5:  TFF
6:  TFT
7:  TTF
8:  TTT

Page 26

Handling cache misses

Cache miss processing:

1. Stall the processor
2. Fetch the data from memory
3. Write the cache entry
   • Put the data
   • Update the tag field
   • Update the valid bit
4. Continue execution
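The miss-processing steps above can be sketched as a toy read path in Python. Everything here is illustrative: the class name `DirectMappedCache`, the use of word addresses, and the dictionary standing in for main memory are assumptions, and a real processor stall is represented only by a comment.

```python
class DirectMappedCache:
    """Toy direct-mapped cache with one-word blocks, indexed by word address."""

    def __init__(self, num_blocks, memory):
        self.num_blocks = num_blocks
        self.memory = memory                  # hypothetical backing store: word address -> value
        self.valid = [False] * num_blocks
        self.tag = [0] * num_blocks
        self.data = [0] * num_blocks
        self.misses = 0

    def read(self, word_addr):
        index = word_addr % self.num_blocks
        tag = word_addr // self.num_blocks
        if not (self.valid[index] and self.tag[index] == tag):
            # Miss: the processor would stall here while memory is accessed.
            self.misses += 1
            self.data[index] = self.memory[word_addr]   # put the data
            self.tag[index] = tag                       # update the tag field
            self.valid[index] = True                    # update the valid bit
        return self.data[index]                         # continue execution


memory = {addr: addr * 10 for addr in range(32)}
cache = DirectMappedCache(8, memory)
print(cache.read(1), cache.read(1), cache.read(9), cache.misses)
```

In the usage lines, the first read of word 1 misses, the second hits, and word 9 misses again because it maps to the same entry with a different tag.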

Page 27

Ex. DECStation 3100 Cache access: Write

[Figure: the processor stores a new value into the cache; data are transferred between levels]

After a store, the data in the cache and in memory are inconsistent!

1. Write-through: update the cache and write the data to memory at the same time
2. Write-back: do not write the new value to memory on the store

Page 28

Problems with write-through

• Writing to main memory slows down performance
• Ex. CPI without cache misses = 1.2 clock cycles; each write to memory causes 10 extra cycles; 13% of the instructions in gcc are stores:
  CPI = 1.2 + 10 x 13% = 2.5 clock cycles
• Solution: write buffer
  • Store the data in the write buffer while it is waiting to be written to memory
  • The processor can continue execution after writing the data into the cache and the write buffer
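The CPI figure in the example follows from a one-line formula. The Python below is a minimal sketch with an illustrative function name; it assumes, as the slide does, that every store pays the full write penalty.

```python
def cpi_with_write_stalls(base_cpi, store_fraction, write_penalty_cycles):
    """Effective CPI when every store stalls for the full memory write."""
    return base_cpi + store_fraction * write_penalty_cycles

cpi = cpi_with_write_stalls(base_cpi=1.2, store_fraction=0.13, write_penalty_cycles=10)
print(round(cpi, 1))
```

With these numbers the stores alone more than double the CPI, which is the motivation for the write buffer.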

Page 29

Problems with write-back

• The new value is written only to the cache
• Problem: the cache and memory are inconsistent
• Complex to implement. Ex. when a cache entry is replaced, the corresponding memory address must be updated

[Figure: processor and memory levels; data are transferred between levels]

Page 30

Use of spatial locality

• The previous cache design takes advantage of temporal locality
• Use spatial locality in cache design
  • Use a cache block that is larger than 1 word in length
  • On a cache miss, fetch multiple adjacent words at once

Page 31

One-word cache (Fig 7.7)

[Figure: direct-mapped cache with 1024 entries (index 0 to 1023), each holding a valid bit, a 20-bit tag, and 32 bits of data; outputs Hit and Data]

Address (showing bit positions): tag = bits 31-12 (20 bits), index = bits 11-2 (10 bits), byte offset = bits 1-0

Page 32

Multiple-word cache

[Figure: direct-mapped cache with 4K entries, each holding a valid bit, a 16-bit tag, and 128 bits of data (four 32-bit words); a mux selects one 32-bit word using the 2-bit block offset; outputs Hit and Data]

Address for 4-word blocks (showing bit positions): tag = bits 31-16 (16 bits), index = bits 15-4 (12 bits), block offset = bits 3-2, byte offset = bits 1-0

Page 33

Advantage of multiple-word block (spatial locality)

Ex. access words with byte addresses 16, 24, 20

[Figure: memory words at byte addresses 16, 20, 24, 28, which together form one 4-word block]

Access   1-word block cache   4-word block cache
16       cache miss           cache miss, load 4-word block
24       cache miss           cache hit
20       cache miss           cache hit
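The hit/miss pattern in this example can be reproduced with a small simulation. This is a sketch under stated assumptions: the function name `simulate` and the default of 1024 cache blocks are illustrative, and the cache starts empty.

```python
def simulate(byte_addresses, block_bytes, num_blocks=1024):
    """Return 'hit' or 'miss' per access for an initially empty direct-mapped cache."""
    tags = [None] * num_blocks            # None marks an invalid entry
    results = []
    for addr in byte_addresses:
        block_addr = addr // block_bytes  # which memory block holds this byte
        index = block_addr % num_blocks
        tag = block_addr // num_blocks
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag             # load the whole block on a miss
    return results

accesses = [16, 24, 20]
print(simulate(accesses, block_bytes=4))    # 1-word blocks
print(simulate(accesses, block_bytes=16))   # 4-word blocks
```

With 16-byte blocks all three addresses fall in memory block 1, so only the first access misses.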

Page 34

Multiple-word cache: write miss

[Figure: the 4-word-block cache (4K entries, 16-bit tag, 128 bits of data per entry) receiving an address and 1-word data. On a miss, the 4-word block is reloaded and the 1-word data is written.]

Address (showing bit positions): tag = bits 31-16 (16 bits), index = bits 15-4 (12 bits), block offset = bits 3-2, byte offset = bits 1-0

Page 35

CS61C L32 Caches II (37) Garcia, 2005 © UCB

1. Read 0x00000014

Address fields: tag 000000000000000000, index 0000000001, offset 0100
Cache (index 0 to 1023; columns Valid, Tag, 0x0-3, 0x4-7, 0x8-b, 0xc-f): all entries invalid

Page 36

So we read block 1 (0000000001)

Page 37

No valid data

Page 38

So load that data into cache, setting tag, valid

Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
1       1       0     a       b       c       d
(all other entries invalid)

Page 39

Read from cache at offset, return word b

Page 40

2. Read 0x0000001C = 0…00 0..001 1100

Address fields: tag 000000000000000000, index 0000000001, offset 1100

Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
1       1       0     a       b       c       d
(all other entries invalid)

Page 41

Index is valid

Page 42

Index valid, tag matches

Page 43

Index valid, tag matches, return word d

Page 44

3. Read 0x00000034 = 0…00 0..011 0100

Address fields: tag 000000000000000000, index 0000000011, offset 0100

Page 45

So read block 3

Page 46

No valid data

Page 47

Load that cache block, return word f

Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
1       1       0     a       b       c       d
3       1       0     e       f       g       h
(all other entries invalid)

Page 48

4. Read 0x00008014 = 0…10 0..001 0100

Address fields: tag 000000000000000010, index 0000000001, offset 0100

Page 49

So read cache block 1, data is valid

Page 50

Cache block 1 tag does not match (0 != 2)

Page 51

Miss, so replace block 1 with new data & tag

Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
1       1       2     i       j       k       l
3       1       0     e       f       g       h
(all other entries invalid)

Page 52

And return word j
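The four reads in this walk-through can be replayed with a short Python sketch. The names `split` and `run` are illustrative; the field widths (4-bit offset, 10-bit index) are taken from the address layout shown on the slides, and the cache is assumed to start with every entry invalid.

```python
OFFSET_BITS = 4    # 16-byte blocks (four words: 0x0-3, 0x4-7, 0x8-b, 0xc-f)
INDEX_BITS = 10    # 1024 cache entries

def split(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def run(reads):
    """Replay reads on an empty cache; report tag, index, word-in-block, outcome."""
    valid = [False] * (1 << INDEX_BITS)
    tags = [0] * (1 << INDEX_BITS)
    outcomes = []
    for addr in reads:
        tag, index, offset = split(addr)
        hit = valid[index] and tags[index] == tag
        if not hit:
            valid[index], tags[index] = True, tag   # load block, set tag and valid
        outcomes.append((tag, index, offset // 4, "hit" if hit else "miss"))
    return outcomes

for row in run([0x00000014, 0x0000001C, 0x00000034, 0x00008014]):
    print(row)
```

Word number 1 within a block corresponds to b, f, and j on the slides, and word number 3 corresponds to d. The last read misses even though entry 1 is valid, because its stored tag is 0 and the address tag is 2.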

Page 53

Advantage of multiple-word block (spatial locality)

Comparison of miss rate:

Program   Block size in words   Instruction miss rate   Data miss rate
gcc       1                     6.1%                    2.1%
gcc       4                     2.0%                    1.7%
spice     1                     1.2%                    1.3%
spice     4                     0.3%                    0.6%

Why is the improvement in instruction miss rate significant? Instruction references have better spatial locality.

Page 54

Miss rate vs. block size

[Figure: miss rate (0% to 40%) versus block size in bytes (4, 16, 64, 256), with one curve per cache size: 1 KB, 8 KB, 16 KB, 64 KB, 256 KB]

Why does the miss rate go up for very large blocks? The number of blocks in the cache becomes smaller.

Page 55

Short conclusion

• Direct mapped cache
  • Map a memory word to a cache block
  • Valid bit, tag field
• Cache read
  • Hit, read miss, miss penalty
• Cache write
  • Write-through
  • Write-back
  • Write miss penalty
• Multi-word cache (use spatial locality)

Page 56

Outline

• Introduction
• Basics of caches
• Measuring cache performance
• Set associative cache
• Multilevel cache
• Virtual memory

Make memory system fast

Page 57

Cache performance

How does the cache affect system performance?

CPU time = (CPU execution clock cycles + Memory-stall clock cycles) x Clock cycle time

(CPU execution clock cycles cover cache hits; memory-stall clock cycles come from cache misses.)

Memory-stall cycles = Read-stall cycles + Write-stall cycles

Read-stall cycles = (Reads / Program) x Read miss rate x Read miss penalty

Assume the read and write miss penalties are the same:

Memory-stall cycles = (Mem. accesses / Program) x Miss rate x Miss penalty

Page 58

Ex. Calculate cache performance

• CPI = 2 without any memory stalls
• For gcc: instruction cache miss rate = 2%, data cache miss rate = 4%
• Miss penalty = 40 cycles

Sol: Set instruction count = I; the percentage of lw/sw instructions is 36%.

Instruction miss cycles = I x 2% x 40 = 0.8 x I
Data miss cycles = I x 36% x 4% x 40 = 0.58 x I
Memory-stall cycles = 0.8I + 0.58I = 1.38I

CPU time with stalls / CPU time with perfect cache = (2I + 1.38I) / 2I = 1.69
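The arithmetic of this example can be checked with a few lines of Python. The function name is illustrative; the computation works per instruction (so I cancels out) and rounds only at the end, which is why the intermediate value is 0.576 rather than the slide's rounded 0.58.

```python
def memory_stall_cpi(instr_miss_rate, data_miss_rate, mem_instr_fraction, miss_penalty):
    """Memory-stall cycles per instruction, from instruction and data misses."""
    instr_stalls = instr_miss_rate * miss_penalty                      # every instruction is fetched
    data_stalls = mem_instr_fraction * data_miss_rate * miss_penalty   # only lw/sw access data
    return instr_stalls + data_stalls

stall = memory_stall_cpi(0.02, 0.04, 0.36, 40)
slowdown = (2 + stall) / 2      # CPI with stalls / CPI with a perfect cache
print(round(stall, 2), round(slowdown, 2))
```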

Page 59

Why is memory the bottleneck for system performance?

In the previous example, if we make the processor faster, changing CPI from 2 to 1, the memory-stall cycles remain the same = 1.38I:

CPU time with stalls / CPU time with perfect cache = (I + 1.38I) / I = 2.38

Percentage of time spent on memory stalls:

CPI = 2: 1.38 / 3.38 = 41%
CPI = 1: 1.38 / 2.38 = 58%

As the CPU gets faster (lower CPI or higher clock rate), memory accounts for a larger percentage of the system's execution time.
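The growing share of memory stalls can be tabulated with a small Python sketch. The function name `stall_share` is illustrative, and the 1.38 stall cycles per instruction are taken from the previous example.

```python
STALL_CPI = 1.38   # memory-stall cycles per instruction from the previous example

def stall_share(base_cpi, stall_cpi=STALL_CPI):
    """Fraction of execution cycles spent waiting on memory."""
    return stall_cpi / (base_cpi + stall_cpi)

for base_cpi in (2, 1):
    total = base_cpi + STALL_CPI
    print(f"base CPI {base_cpi}: total CPI {total:.2f}, "
          f"memory stalls {stall_share(base_cpi):.0%}")
```

Halving the base CPI leaves the stall cycles untouched, so their share of the total rises.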

Page 60

Outline

• Introduction
• Basics of caches
• Measuring cache performance
• Set associative cache (reduce miss rate)
• Multilevel cache
• Virtual memory

Make memory system fast

Page 61

How to improve cache performance?

• Larger cache
• Set associative cache
  • Reduce cache miss rate
  • New placement rule other than direct mapping
• Multi-level cache
  • Reduce cache miss penalty

Memory-stall cycles = (Mem. accesses / Program) x Miss rate x Miss penalty

Page 62

Flexible placement of blocks

• Recall: direct mapped cache. One address -> one block in cache
• ? One address -> more than one block in cache: a memory address can map to more than one block in the cache

[Figure: direct-mapped 8-block cache (locations 000 to 111) with memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101]

Page 63

Fully associative cache

• A memory block can be placed in any block in the cache
• Disadvantage: all entries in the cache must be searched for a match, using parallel comparators

[Figure: where a block is searched for in an 8-block cache. Direct mapped (block # 0 to 7): one block is searched. Set associative (set # 0 to 3): one set is searched. Fully associative: all blocks are searched.]

Page 64

Set-associative cache

• Between direct mapped and fully associative
• A memory block can be placed in any block within one set of the cache
• Set = (block address) modulo (number of sets in cache). Ex. 12 modulo 4 = 0
• Disadvantage: all entries in the set must be searched for a match, using parallel comparators

[Figure: where a block is searched for in an 8-block cache. Direct mapped (block # 0 to 7): one block is searched. Set associative (set # 0 to 3): one set is searched. Fully associative: all blocks are searched.]
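The placement rule for all three organizations can be written as one function of the associativity. This Python sketch uses illustrative names and the 8-block cache with block address 12 from the example.

```python
def set_index(block_address, num_blocks, ways):
    """Set that a block maps to: (block address) modulo (number of sets)."""
    num_sets = num_blocks // ways
    return block_address % num_sets

NUM_BLOCKS = 8
print("direct mapped (1-way):", set_index(12, NUM_BLOCKS, ways=1))
print("2-way set associative:", set_index(12, NUM_BLOCKS, ways=2))
print("fully associative (8-way):", set_index(12, NUM_BLOCKS, ways=8))
```

Direct mapped is the case of one block per set, and fully associative is the case of a single set containing every block.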

Page 65

Example: 4-way set-associative cache

[Figure: 4-way set-associative cache with 256 sets (index 0 to 255); each way holds V, Tag, and Data. Four parallel comparators check the 22-bit tags, and a 4-to-1 multiplexor selects the 32-bit data; outputs Hit and Data.]

Address (showing bit positions): tag = bits 31-10 (22 bits), index = bits 9-2 (8 bits), byte offset = bits 1-0

Page 66

Take all schemes as a case of set-associativity

Ex. 8-block cache (each block holds Tag and Data):

Organization                                Sets          Blocks per set
One-way set associative (direct mapped)     8 (0 to 7)    1
Two-way set associative                     4 (0 to 3)    2
Four-way set associative                    2 (0 to 1)    4
Eight-way set associative (fully assoc.)    1             8

Page 67

Example: set-associative caches (p. 500)

• A cache with 4 blocks
• Load data with block addresses 0, 8, 0, 6, 8

One-way set-associative cache (direct mapped): 5 misses

Page 68

Example: set-associative caches

• 2-way set-associative cache: 4 misses
• 4-way set-associative cache: 3 misses
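The miss counts for the access sequence 0, 8, 0, 6, 8 can be reproduced with a small simulation. This is a sketch, not the slides' own method: the function name is illustrative, and least-recently-used (LRU) replacement within a set is an assumption, since the slides do not name the replacement policy.

```python
def count_misses(block_addresses, num_blocks, ways):
    """Count misses for an initially empty cache, assuming LRU replacement per set."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each list is ordered LRU first, MRU last
    misses = 0
    for block in block_addresses:
        entries = sets[block % num_sets]
        if block in entries:
            entries.remove(block)          # hit: will be re-appended as most recent
        else:
            misses += 1
            if len(entries) == ways:
                entries.pop(0)             # set is full: evict the least recently used
        entries.append(block)
    return misses

sequence = [0, 8, 0, 6, 8]
for ways in (1, 2, 4):
    print(f"{ways}-way: {count_misses(sequence, num_blocks=4, ways=ways)} misses")
```

In the direct-mapped case blocks 0 and 8 keep evicting each other from the same location, which is the conflict that higher associativity removes.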

Page 69

Short conclusion

Higher degree of associativity:

• Lower miss rate
• More hardware cost to search

Page 70

Outline

• Introduction
• Basics of caches
• Measuring cache performance
• Set associative cache
• Multilevel cache (reduce miss penalty)
• Virtual memory

Make memory system fast

Page 71

Multi-level cache

Goal: reduce miss penalty

[Figure: memory hierarchy below the CPU: primary cache (L1), secondary cache (L2), main memory. An access that hits is served by the cache; an L1 cache miss goes to L2, and an L2 cache miss goes to main memory.]

Page 72

Example: Performance of multilevel cache

• CPI = 1 without cache misses, clock rate = 500MHz
• Primary cache: miss rate = 5%
• Secondary cache: miss rate = 2%, access time = 20 ns
• Main memory: access time = 200 ns

Total CPI = Base CPI + memory-stall CPI = 1 + ?

Page 73

Example: Performance of multilevel cache (cont.)

Total CPI = Base CPI + memory-stall CPI = 1 + ?

Access to main memory = 200 ns x 500M clocks/sec = 100 clocks
Access to L2 cache = 20 ns x 500M clocks/sec = 10 clocks

One-level cache:
Total CPI = 1 + 5% x 100 = 6

Two-level cache:
Total CPI = 1 + L1 miss penalty + L2 miss penalty = 1 + 5% x 10 + 2% x 100 = 3.5
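The two CPI figures can be recomputed with a short Python sketch. The function and variable names are illustrative. Following the slide's formula, both miss rates are applied per instruction, so the 2% is treated as the fraction of instructions that go all the way to main memory.

```python
def access_cycles(access_time_ns, clock_mhz):
    """Convert an access time in ns to clock cycles at the given clock rate."""
    return access_time_ns * clock_mhz / 1000

CLOCK_MHZ = 500
BASE_CPI = 1
l2_cycles = access_cycles(20, CLOCK_MHZ)      # secondary cache access
mem_cycles = access_cycles(200, CLOCK_MHZ)    # main memory access

one_level_cpi = BASE_CPI + 0.05 * mem_cycles
two_level_cpi = BASE_CPI + 0.05 * l2_cycles + 0.02 * mem_cycles
print(one_level_cpi, two_level_cpi, round(one_level_cpi / two_level_cpi, 2))
```

The last printed value is the speedup of the two-level design over the one-level design for these numbers.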