CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: Bernhard Boser & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/18/16 Fall 2016 - Lecture #15 1


Outline

• Cache Organization and Principles
• Write Back vs. Write Through
• Cache Performance
• Cache Design Tradeoffs
• And in Conclusion…


Typical Memory Hierarchy

Level                                        Speed (cycles)   Size (bytes)   Cost/bit
On-chip: RegFile                             ½’s              100’s          highest
On-chip: Instr Cache, Data Cache             1’s              10K’s
Second-Level / Third-Level Cache (SRAM)      10’s             M’s
Main Memory (DRAM)                           100’s-1000       G’s
Secondary Memory (Disk or Flash)             1,000,000’s      T’s            lowest

On-Chip Components: Control, Datapath, RegFile, Instr Cache, Data Cache

• Principle of locality + memory hierarchy presents programmer with ≈ as much memory as is available in the cheapest technology at the ≈ speed offered by the fastest technology


Adding Cache to Computer

[Figure: Processor (Control; Datapath with PC, Registers, and Arithmetic & Logic Unit (ALU)) connected through a Cache to Memory (Program, Data; Bytes) over the Processor-Memory Interface (Enable?, Read/Write, Address, Write Data, Read Data). Input and Output connect through the I/O-Memory Interfaces.]

• Processor organized around words and bytes
• Memory (including cache) organized around blocks, which are typically multiple words


Key Cache Concepts

• Principle of Locality
  – Temporal Locality and Spatial Locality
• Hierarchy of Memories (speed/size/cost per bit) to exploit locality
• Cache – copy of data in lower level of memory hierarchy
• Direct Mapped to find block in cache using Tag field and Valid bit for Hit
• Cache Design Organization Choices:
  – Fully Associative, Set-Associative, Direct-Mapped


Cache Organizations

• “Fully Associative”: Block placed anywhere in cache
  – First design last lecture
  – Note: No Index field, but one comparator/block
• “Direct Mapped”: Block goes only one place in cache
  – Note: Only one comparator
  – Number of sets = number of blocks
• “N-way Set Associative”: N places for block in cache
  – Number of sets = Number of Blocks / N
  – N comparators
  – Fully Associative: N = number of blocks
  – Direct Mapped: N = 1


Memory Block vs. Word Addressing

[Figure: the same memory drawn three times, labeled with 5-bit byte addresses (00000 to 11111), word addresses (2 LSBs are 0), and 8-byte block addresses (3 LSBs are 0). Each address splits into a block # (0 to 3) and a byte offset in block (0 to 7).]


Memory Block Number Aliasing

12-bit memory addresses, 16 Byte blocks

Address        Block #   Block # mod 8   Block # mod 2
010100100000   82        2               0
010100110000   83        3               1
010101000000   84        4               0
010101010000   85        5               1
010101100000   86        6               0
010101110000   87        7               1
010110000000   88        0               0
010110010000   89        1               1
010110100000   90        2               0
010110110000   91        3               1
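The block numbers on this slide can be recomputed with a short sketch. It assumes the 16-byte block size stated on the slide; the helper name `block_number` is made up for illustration.

```python
BLOCK_SIZE = 16  # bytes per block, as stated on the slide


def block_number(addr, block_size=BLOCK_SIZE):
    """Drop the byte-offset bits of an address to get its memory block number."""
    return addr // block_size


# The ten 12-bit addresses from the slide, one per consecutive block.
addresses = [0b010100100000 + i * BLOCK_SIZE for i in range(10)]

# (address in binary, block #, block # mod 8, block # mod 2)
rows = [
    (format(a, "012b"), block_number(a), block_number(a) % 8, block_number(a) % 2)
    for a in addresses
]

for addr_bits, block, mod8, mod2 in rows:
    print(addr_bits, block, mod8, mod2)
```

Blocks 82 and 90 both give 2 when taken mod 8, so in a cache with eight sets they compete for the same set; with only two sets, every other block aliases.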


Processor Address Fields used by Cache Controller

• Block Offset: Byte address within block
• Set Index: Selects which set
• Tag: Remaining portion of processor address

Processor Address (32 bits total): | Tag | Set Index | Block offset |

• Size of Index = log2(number of sets)
• Size of Tag = Address size – Size of Index – log2(number of bytes/block)
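As a minimal sketch of these size formulas, the helpers below split an address into tag, set index, and block offset. They assume the number of sets and the block size are powers of two; the function names and the sample address are illustrative only.

```python
def field_widths(num_sets, block_bytes, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for power-of-two parameters."""
    offset_bits = (block_bytes - 1).bit_length()  # log2(bytes per block)
    index_bits = (num_sets - 1).bit_length()      # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(addr, num_sets, block_bytes):
    """Return (tag, set index, block offset) of a byte address."""
    _, index_bits, offset_bits = field_widths(num_sets, block_bytes)
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 1024 sets of one-word (4-byte) blocks: 20-bit tag, 10-bit index, 2-bit offset.
print(field_widths(1024, 4))
print([hex(f) for f in split_address(0x12345678, 1024, 4)])
```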


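Direct-Mapped Cache Revisited

• One word blocks, cache size = 1K words (or 4 KB)

Address fields: | 20-bit Tag (bits 31 to 12) | 10-bit Index (bits 11 to 2) | Byte offset (bits 1 to 0) |

[Figure: cache array with entries 0 to 1023, each holding Valid, Tag (20 bits), and Data (32 bits). The Index selects one entry, and a Comparator checks the stored Tag against the address Tag to produce Hit.]

• Valid bit ensures something useful in cache for this index
• Compare Tag with upper part of Address to see if a Hit
• Read data from cache instead of memory if a Hit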


Four-Way Set-Associative Cache

• 2^8 = 256 sets each with four ways (each with one block)

Address fields: | 22-bit Tag | 8-bit Set Index | Byte offset |

[Figure: four ways (Way 0 to Way 3), each an array of entries 0 to 255 holding V, Tag, and Data. The Set Index selects one entry per way, the Tags are compared, and a 4x1 select drives the 32-bit Data output and Hit.]


Handling Stores with Write-Through

• Store instructions write to memory, changing values
• Need to make sure cache and memory have same values on writes: two policies

1) Write-Through Policy: write cache and write through the cache to memory
  – Every write eventually gets to memory
  – Too slow, so include Write Buffer to allow processor to continue once data in Buffer
  – Buffer updates memory in parallel to processor


Write-Through Cache

• Write both values in cache and in memory
• Write buffer stops CPU from stalling if memory cannot keep up
• Write buffer may have multiple entries to absorb bursts of writes
• What if store misses in cache?

[Figure: Processor connected to Cache by 32-bit Address and 32-bit Data; Cache connected to Memory by 32-bit Address and 32-bit Data through a Write Buffer holding Addr/Data entries.]


Handling Stores with Write-Back

2) Write-Back Policy: write only to cache and then write cache block back to memory when evict block from cache
  – Writes collected in cache, only single write to memory per block
  – Include bit to see if wrote to block or not, and then only write back if bit is set
    • Called “Dirty” bit (writing makes it “dirty”)


Write-Back Cache

• Store/cache hit, write data in cache only and set dirty bit
  – Memory has stale value
• Store/cache miss, read data from memory, then update and set dirty bit
  – “Write-allocate” policy
• Load/cache hit, use value from cache
• On any miss, write back evicted block, only if dirty. Update cache with new block and clear dirty bit

[Figure: Processor, Cache, and Memory connected by 32-bit Address and 32-bit Data; each cache entry carries a Dirty Bit (D).]


Write-Through vs. Write-Back

• Write-Through:
  – Simpler control logic
  – More predictable timing simplifies processor control logic
  – Easier to make reliable, since memory always has copy of data (big idea: Redundancy!)
• Write-Back:
  – More complex control logic
  – More variable timing (0, 1, 2 memory accesses per cache access)
  – Usually reduces write traffic
  – Harder to make reliable, sometimes cache has only copy of data
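The write-traffic difference can be illustrated with a toy model. This is only a sketch under stated assumptions, not the lecture's hardware: a direct-mapped cache of one-word blocks, a stream containing stores only, write-allocate for both policies, and a final flush of dirty blocks so the two counts are comparable. The function name `memory_writes` and the store sequence are hypothetical.

```python
def memory_writes(stores, num_blocks, policy):
    """Count writes that reach memory for a sequence of stored word addresses."""
    tags = [None] * num_blocks    # None means the entry is not valid
    dirty = [False] * num_blocks
    writes = 0
    for word in stores:
        index, tag = word % num_blocks, word // num_blocks
        if tags[index] != tag:                      # store miss: allocate the block
            if policy == "write-back" and dirty[index]:
                writes += 1                         # evicted dirty block is written back
            tags[index] = tag
            dirty[index] = False
        if policy == "write-through":
            writes += 1                             # every store also goes to memory
        else:
            dirty[index] = True                     # memory copy is now stale
    if policy == "write-back":
        writes += sum(dirty)                        # assumed final flush of dirty blocks
    return writes


stores = [0, 0, 0, 1, 1, 4, 0]
print(memory_writes(stores, 4, "write-through"))  # 7
print(memory_writes(stores, 4, "write-back"))     # 4
```

Repeated stores to the same block are collected in the cache under write-back, which is where the saving comes from; a stream with no reuse would show no benefit.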


Administrivia

• Midterm #2 2 weeks away! November 1!
  – In class! 3:40-5 PM
  – Synchronous digital design and Project 3 (processor design) included
  – Pipelines and Caches
  – ONE Double sided Crib sheet
  – Review Session, Sunday, 10/30, 1-3 PM, 10 Evans


iClicker Saga


iClicker and EPA

• No longer taking attendance in lecture – but we hope you will continue to come anyway
• Continue to use Clicker questions in lecture to help you test your understanding
• EPA will be based on a “holistic” assessment of lecture, piazza, guerrilla and tutoring sessions, office hours, discussion, and lab participation
• EPA will be calculated so as to only help your course grade, never hurt it


Cache (Performance) Terms

• Hit rate: fraction of accesses that hit in the cache
• Miss rate: 1 – Hit rate
• Miss penalty: time to replace a block from lower level in memory hierarchy to cache
• Hit time: time to access cache memory (including tag comparison)
• Abbreviation: “$” = cache (a Berkeley innovation!)


Average Memory Access Time (AMAT)

• Average Memory Access Time (AMAT) is the average time to access memory considering both hits and misses in the cache

AMAT = Time for a hit + Miss rate × Miss penalty


Clickers/Peer Instruction

AMAT = Time for a hit + Miss rate × Miss penalty

• Given a 200 psec clock, a miss penalty of 50 clock cycles, a miss rate of 0.02 misses per instruction and a cache hit time of 1 clock cycle, what is AMAT?
  A: ≤ 200 psec
  B: 400 psec
  C: 600 psec
  D: 800 psec


Answer: 1 clock cycle + 0.02 × 50 clock cycles = 2 clock cycles = 2 × 200 psec = 400 psec (B)
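The same arithmetic as a small sketch; the function name `amat_cycles` is made up, and the inputs are the numbers from the clicker question.

```python
def amat_cycles(hit_time, miss_rate, miss_penalty):
    """AMAT = time for a hit + miss rate x miss penalty (all in clock cycles)."""
    return hit_time + miss_rate * miss_penalty


CLOCK_PSEC = 200  # clock period from the question

cycles = amat_cycles(hit_time=1, miss_rate=0.02, miss_penalty=50)
psec = cycles * CLOCK_PSEC
print(cycles, "cycles =", psec, "psec")
```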


Ping Pong Cache Example: Direct-Mapped Cache w/ 4 Single-Word Blocks, Worst-Case Reference String

• Consider the main memory address reference string of word numbers: 0 4 0 4 0 4 0 4

Start with an empty cache - all blocks initially marked as not valid

Access   Result   Cache block 0 after access (tag, data)
0        miss     00, Mem(0)
4        miss     01, Mem(4)
0        miss     00, Mem(0)
4        miss     01, Mem(4)
0        miss     00, Mem(0)
4        miss     01, Mem(4)
0        miss     00, Mem(0)
4        miss     01, Mem(4)

• 8 requests, 8 misses
• Ping-pong effect due to conflict misses - two memory locations that map into the same cache block
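The ping-pong effect can be reproduced with a minimal direct-mapped model. This sketch tracks only tags (no data) and uses word numbers as addresses, matching the slide; the function name `direct_mapped_trace` is hypothetical.

```python
def direct_mapped_trace(refs, num_blocks=4):
    """Return 'hit' or 'miss' for each word reference in a direct-mapped cache."""
    tags = [None] * num_blocks  # None means the block is not valid
    results = []
    for word in refs:
        index, tag = word % num_blocks, word // num_blocks
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag   # replace whatever block was at this index
    return results


trace = direct_mapped_trace([0, 4, 0, 4, 0, 4, 0, 4])
print(trace.count("miss"), "misses out of", len(trace))
```

Words 0 and 4 both map to index 0, so each access evicts the block the next access needs.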


Alternative Block Placement Schemes

• DM placement: mem block 12 in 8 block cache: only one cache block where mem block 12 can be found: (12 modulo 8) = 4
• SA placement: four sets x 2-ways (8 cache blocks), memory block 12 in set (12 mod 4) = 0; either element of the set
• FA placement: mem block 12 can appear in any cache blocks
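All three placement schemes follow from one rule, sketched below: the set is the block number modulo the number of sets, and the block may go in any way of that set. The function name `placement` is an illustrative assumption.

```python
def placement(block, num_blocks, ways):
    """Return (set index, number of candidate locations) for a memory block."""
    num_sets = num_blocks // ways   # sets = blocks / associativity
    return block % num_sets, ways


print(placement(12, 8, 1))  # direct mapped: (4, 1)
print(placement(12, 8, 2))  # 2-way set associative: (0, 2)
print(placement(12, 8, 8))  # fully associative: (0, 8)
```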


Example: 2-Way Set Associative $ (4 words = 2 sets x 2 ways per set)

[Figure: Cache with 2 sets (0, 1) x 2 ways (0, 1), each entry holding V, Tag, and Data; Main Memory holds 16 one-word blocks at addresses 0000xx through 1111xx.]

• One word blocks; two low order bits define the byte in the word (32b words)
• Q: How do we find it? Use next 1 low order memory address bit to determine which cache set (i.e., modulo the number of sets in the cache)
• Q: Is it there? Compare all the cache tags in the set to the high order 3 memory address bits to tell if the memory block is in the cache


Ping Pong Cache Example: 4-Word 2-Way SA $, Same Reference String

• Consider the main memory address reference string 0 4 0 4 0 4 0 4

Start with an empty cache - all blocks initially marked as not valid

Access   Result   Set 0 after access (tag, data per way)
0        miss     000, Mem(0)
4        miss     000, Mem(0); 010, Mem(4)
0        hit      000, Mem(0); 010, Mem(4)
4        hit      000, Mem(0); 010, Mem(4)

• 8 requests, 2 misses
• Solves the ping-pong effect in a direct-mapped cache due to conflict misses since now two memory locations that map into the same cache set can co-exist!
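A minimal set-associative model reproduces the 2-miss result. This sketch assumes LRU replacement within each set and tracks tags only; the function name `set_associative_trace` is hypothetical.

```python
def set_associative_trace(refs, num_sets=2, ways=2):
    """Return 'hit' or 'miss' per word reference, using LRU within each set."""
    # Each set is a list of tags ordered from least to most recently used.
    sets = [[] for _ in range(num_sets)]
    results = []
    for word in refs:
        index, tag = word % num_sets, word // num_sets
        lines = sets[index]
        if tag in lines:
            results.append("hit")
            lines.remove(tag)       # re-appended below as most recently used
        else:
            results.append("miss")
            if len(lines) == ways:
                lines.pop(0)        # evict the least recently used tag
        lines.append(tag)
    return results


sa_trace = set_associative_trace([0, 4, 0, 4, 0, 4, 0, 4])
print(sa_trace.count("miss"), "misses out of", len(sa_trace))
```

Calling the same function with `num_sets=4, ways=1` gives the direct-mapped behavior, where every access in this reference string misses.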


Alternative Organizations of an Eight-Block Cache

Total size of $ in blocks is equal to number of sets × associativity. For fixed $ size and fixed block size, increasing associativity decreases number of sets while increasing number of elements per set. With eight blocks, an 8-way set-associative $ is same as a fully associative $.


Range of Set-Associative Caches

• For a fixed-size cache and fixed block size, each increase by a factor of two in associativity doubles the number of blocks per set (i.e., the number of ways) and halves the number of sets – decreases the size of the index by 1 bit and increases the size of the tag by 1 bit

Address fields: | Tag | Index | Word offset | Byte offset |
  – Tag: used for tag compare
  – Index: selects the set
  – Word offset: selects the word in the block

• Decreasing associativity (fewer ways, more sets), down to direct mapped (only one way): smaller tags, only a single comparator
• Increasing associativity (more ways, fewer sets), up to fully associative (only one set): tag is all the bits except block and byte offset


Total Cache Capacity = Associativity × # of sets × block_size
Bytes = blocks/set × sets × Bytes/block

C = N × S × B

Address fields: | Tag | Index | Byte Offset |

address_size = tag_size + index_size + offset_size
             = tag_size + log2(S) + log2(B)


Double the Associativity: Number of sets? tag_size? index_size? # comparators?

Double the Sets: Associativity? tag_size? index_size? # comparators?


Double the Associativity: Halve the number of sets; tag_size + 1 while index_size – 1; 2x comparators

Double the Sets: Halve the associativity; tag_size – 1 while index_size + 1; ½x comparators
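These answers can be checked numerically with C = N × S × B. The sketch below assumes a fixed 4 KB capacity, 4-byte blocks, 32-bit addresses, and power-of-two parameters; the function name `organization` and the chosen capacity are illustrative, not taken from this slide.

```python
def organization(capacity_bytes, ways, block_bytes, addr_bits=32):
    """Derive sets, field widths, and comparator count from C = N x S x B."""
    sets = capacity_bytes // (ways * block_bytes)   # S = C / (N x B)
    index_bits = (sets - 1).bit_length()            # log2(S)
    offset_bits = (block_bytes - 1).bit_length()    # log2(B)
    tag_bits = addr_bits - index_bits - offset_bits
    return {"sets": sets, "index_bits": index_bits,
            "tag_bits": tag_bits, "comparators": ways}


for ways in (1, 2, 4):
    print(ways, organization(4096, ways, 4))
```

Each doubling of `ways` halves `sets`, moves one bit from the index to the tag, and doubles the comparators.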


Your Turn

• For a cache of 64 blocks, each block four bytes in size:
  1. The capacity of the cache is: 256 bytes.
  2. Given a 2-way Set Associative organization, there are 32 sets, each of 2 blocks, and 2 places a block from memory could be placed.
  3. Given a 4-way Set Associative organization, there are 16 sets each of 4 blocks and 4 places a block from memory could be placed.
  4. Given an 8-way Set Associative organization, there are 8 sets each of 8 blocks and 8 places a block from memory could be placed.


Clicker/Peer Instruction

• For S sets, N ways, B blocks, which statements hold?
  (i) The cache has B tags
  (ii) The cache needs N comparators
  (iii) B = N x S
  (iv) Size of Index = Log2(S)

A: (i) only
B: (i) and (ii) only
C: (i), (ii), (iii) only
D: All four statements are true
E: None are true


Costs of Set-Associative Caches

• N-way set-associative cache costs
  – N comparators (delay and area)
  – MUX delay (set selection) before data is available
  – Data available after set selection (and Hit/Miss decision). DM $: block is available before the Hit/Miss decision
    • In Set-Associative, not possible to just assume a hit and continue and recover later if it was a miss
• When miss occurs, which way’s block selected for replacement?
  – Least Recently Used (LRU): one that has been unused the longest (principle of temporal locality)
    • Must track when each way’s block was used relative to other blocks in the set
    • For 2-way SA $, one bit per set → set to 1 when a block is referenced; reset the other way’s bit (i.e., “last used”)


Cache Replacement Policies

• Random Replacement
  – Hardware randomly selects a cache entry to evict
• Least-Recently Used
  – Hardware keeps track of access history
  – Replace the entry that has not been used for the longest time
  – For 2-way set-associative cache, need one bit for LRU replacement
• Example of a Simple “Pseudo” LRU Implementation
  – Assume 64 Fully Associative entries
  – Hardware replacement pointer points to one cache entry
  – Whenever access is made to the entry the pointer points to:
    • Move the pointer to the next entry
  – Otherwise: do not move the pointer
  – (example of “not-most-recently used” replacement policy)

[Figure: Replacement Pointer pointing into a column of entries: Entry 0, Entry 1, ..., Entry 63.]
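The pointer scheme can be sketched as a small class. Two details are assumptions beyond the slide: a miss replaces the entry the pointer currently points to, and loading the new block counts as an access to that entry, so the pointer then moves on. The class name is made up for illustration.

```python
class NotMostRecentlyUsedCache:
    """Fully associative cache with a single replacement pointer."""

    def __init__(self, num_entries=64):
        self.entries = [None] * num_entries  # None means the entry is empty
        self.pointer = 0                     # entry that would be replaced next

    def access(self, block):
        """Access a block number; return True on a hit, False on a miss."""
        if block in self.entries:
            slot = self.entries.index(block)
            hit = True
        else:
            slot = self.pointer              # assumed: victim is the pointed-to entry
            self.entries[slot] = block
            hit = False
        if slot == self.pointer:
            # The pointed-to entry was just used, so move the pointer to the next entry.
            self.pointer = (self.pointer + 1) % len(self.entries)
        return hit


cache = NotMostRecentlyUsedCache(num_entries=2)
hits = [cache.access(b) for b in [10, 20, 20, 30, 20]]
print(hits, cache.entries, cache.pointer)
```

The pointer never rests on the entry that was just accessed, so the most recently used entry is never the next victim, which is the "not-most-recently used" property named on the slide.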


Benefits of Set-Associative Caches

• Choice of DM $ versus SA $ depends on the cost of a miss versus the cost of implementation
• Largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate)


And in Conclusion…

• Name of the Game: Reduce AMAT
  – Reduce Hit Time
  – Reduce Miss Rate
  – Reduce Miss Penalty
• Balance cache parameters (Capacity, associativity, block size)
