CS 110 Computer Architecture: Caches Part 1
Instructor: Sören Schwertfeger
http://shtech.org/courses/ca/
School of Information Science and Technology (SIST), ShanghaiTech University
Slides based on UC Berkeley's CS61C

Source: robotics.shanghaitech.edu.cn/courses/ca/17s/lecture/2017-CA-L15_Caches_I.pdf



New-School Machine Structures (It’s a bit more complicated!)

• Parallel Requests: assigned to computer, e.g., Search “Katz”
• Parallel Threads: assigned to core, e.g., Lookup, Ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., Add of 4 pairs of words
• Hardware descriptions: all gates @ one time
• Programming Languages

[Figure: software and hardware together harness parallelism and achieve high performance. Levels shown: Warehouse Scale Computer and Smart Phone; Computer (Core ... Core, Memory (Cache), Input/Output); Core (Instruction Unit(s), Functional Unit(s) computing A0+B0, A1+B1, A2+B2, A3+B3); Cache Memory; Logic Gates. Callout: “How do we know?”]

Components of a Computer

[Figure: Processor (Control; Datapath with PC, Registers, Arithmetic & Logic Unit (ALU)) connected to Memory (Program, Data, Bytes) through the Processor-Memory Interface (Enable?, Read/Write, Address, Write Data, Read Data). Input and Output attach through the I/O-Memory Interfaces.]

Problem: Large memories slow? Library Analogy

• Finding a book in a large library takes time
  – Takes time to search a large card catalog (mapping title/author to index number)
  – Round-trip time to walk to the stacks and retrieve the desired book
• Larger libraries make both delays worse
• Electronic memories have the same issue, plus the technologies that we use to store an individual bit get slower as we increase density (SRAM versus DRAM versus Magnetic Disk)

However, what we want is a large yet fast memory!

Processor-DRAM Gap (latency)

[Figure: Performance (log scale, 1 to 1000) versus Time (1980 to 2000). CPU (µProc): 60%/year. DRAM: 7%/year. Processor-Memory Performance Gap: growing 50%/yr.]

• 1980 microprocessor executes ~one instruction in same time as DRAM access
• 2015 microprocessor executes ~1000 instructions in same time as DRAM access

Slow DRAM access could have disastrous impact on CPU performance!
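The 50%/yr gap figure can be checked from the two growth rates on the plot. A minimal sketch, assuming both rates compound annually (the variable names are illustrative, not from the slides):

```python
# Annual improvement factors read off the plot (60%/year and 7%/year).
cpu_growth = 1.60
dram_growth = 1.07

# The gap grows by the ratio of the two factors each year.
gap_growth = cpu_growth / dram_growth

print(f"gap grows by about {(gap_growth - 1) * 100:.1f}% per year")
```

The ratio is just under 1.5, which is where the rounded "growing 50%/yr" label comes from.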

Big Idea: Memory Hierarchy

[Figure: Processor at the top of a pyramid of levels in the memory hierarchy (Level 1, Level 2, Level 3, ..., Level n), ordered from Inner to Outer. Increasing distance from processor, decreasing speed. The width of each level shows the size of memory at that level.]

As we move to outer levels the latency goes up and price per bit goes down. Why?

What to do: Library Analogy

• Want to write a report using library books
• Go to library, look up relevant books, fetch from stacks, and place on desk in library
• If need more, check them out and keep on desk
  – But don’t return earlier books since might need them
• You hope this collection of ~10 books on desk enough to write report, despite 10 being only a tiny fraction of books available

Real Memory Reference Patterns

[Figure: plot of Memory Address (one dot per access) versus Time.]

Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)

Big Idea: Locality

• Temporal Locality (locality in time)
  – Go back to same book on desktop multiple times
  – If a memory location is referenced, then it will tend to be referenced again soon
• Spatial Locality (locality in space)
  – When go to book shelf, pick up multiple books on J.D. Salinger since library stores related books together
  – If a memory location is referenced, the locations with nearby addresses will tend to be referenced soon

Memory Reference Patterns

[Figure: plot of Memory Address (one dot per access) versus Time, with regions annotated as Spatial Locality and Temporal Locality.]

Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)

Principle of Locality

• Principle of Locality: Programs access small portion of address space at any instant of time (spatial locality) and repeatedly access that portion (temporal locality)
• What program structures lead to temporal and spatial locality in instruction accesses?
• In data accesses?
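One possible answer for data accesses is a loop that sums an array. The sketch below lists the data addresses such a loop would touch. It assumes 4-byte words and that the running total lives in memory rather than in a register. The function name and all addresses are made up for illustration.

```python
def array_sum_trace(base, n, word_bytes=4, total_addr=0x1000):
    """Data addresses touched by: for i in range(n): total += a[i]."""
    trace = []
    for i in range(n):
        trace.append(base + i * word_bytes)  # a[i]: neighbouring addresses (spatial locality)
        trace.append(total_addr)             # total: same address every time (temporal locality)
    return trace

trace = array_sum_trace(base=0x2000, n=4)
print([hex(address) for address in trace])
```

The array reads step through adjacent words, while the accumulator is hit on every iteration. The loop body itself is also fetched repeatedly, which gives the instruction stream temporal locality.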

Memory Reference Patterns

[Figure: Address versus Time for three access streams. Instruction fetches: n loop iterations, subroutine call, subroutine return. Stack accesses: argument access. Data accesses: scalar accesses.]

Cache Philosophy

• Programmer-invisible hardware mechanism to give illusion of speed of fastest memory with size of largest memory
  – Works fine even if programmer has no idea what a cache is
  – However, performance-oriented programmers today sometimes “reverse engineer” cache design to design data structures to match cache

Memory Access without Cache

• Load word instruction: lw $t0,0($t1)
• $t1 contains 1022_ten, Memory[1022] = 99

1. Processor issues address 1022_ten to Memory
2. Memory reads word at address 1022_ten (99)
3. Memory sends 99 to Processor
4. Processor loads 99 into register $t0

Adding Cache to Computer

[Figure: Processor (Control; Datapath with PC, Registers, Arithmetic & Logic Unit (ALU)), a Cache, and Memory (Program, Data, Bytes). The Cache sits between Processor and Memory on the Processor-Memory Interface (Enable?, Read/Write, Address, Write Data, Read Data). Input and Output attach through the I/O-Memory Interfaces.]

Memory Access with Cache

• Load word instruction: lw $t0,0($t1)
• $t1 contains 1022_ten, Memory[1022] = 99
• With cache: Processor issues address 1022_ten to Cache

1. Cache checks to see if has copy of data at address 1022_ten
2a. If finds a match (Hit): cache reads 99, sends to processor
2b. No match (Miss): cache sends address 1022 to Memory
    I. Memory reads 99 at address 1022_ten
    II. Memory sends 99 to Cache
    III. Cache replaces word with new 99
    IV. Cache sends 99 to processor
3. Processor loads 99 into register $t0
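The hit and miss paths can be sketched in a few lines. This is an illustration only: it assumes the cache is a plain address-to-word dictionary with unlimited room, and `load_word` is a hypothetical helper, not part of the lecture.

```python
def load_word(address, cache, memory):
    """Return (data, outcome) for a load that goes through the cache first."""
    if address in cache:        # cache has a copy: hit, send the word to the processor
        return cache[address], "hit"
    data = memory[address]      # miss: memory reads the word and sends it to the cache
    cache[address] = data       # cache stores the new word
    return data, "miss"         # cache forwards the word to the processor

memory = {1022: 99}
cache = {}
first = load_word(1022, cache, memory)
second = load_word(1022, cache, memory)
print(first, second)
```

The first load misses and fills the cache. A repeat of the same load then hits without touching memory.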

Cache “Tags”

• Need way to tell if have copy of location in memory so that can decide on hit or miss
• On cache miss, put memory address of block in “tag address” of cache block
  – 1022 placed in tag next to data from memory (99)

Tag    Data
252    12
1022   99
131    7
2041   20

The other entries are from earlier instructions.

Anatomy of a 16 Byte Cache, 4 Byte Block

• Operations:
  1. Cache Hit
  2. Cache Miss
  3. Refill cache from memory
• Cache needs Address Tags to decide if Processor Address is a Cache Hit or Cache Miss
  – Compares all 4 tags

[Figure: Processor, Cache, and Memory linked by 32-bit Address and 32-bit Data paths. The cache holds four Tag/Data entries: 252/12, 1022/99, 131/7, 2041/20.]

Cache Replacement

• Suppose processor now requests location 511, which contains 11?
• Doesn’t match any cache block, so must “evict” one resident block to make room
  – Which block to evict?
• Replace “victim” with new memory block at address 511

Before:              After:
Tag    Data          Tag    Data
252    12            252    12
1022   99            1022   99
131    7             511    11
2041   20            2041   20
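The slide leaves "which block to evict?" open. The sketch below picks one possible answer, least recently used (LRU), purely as an assumption. The class name `TinyCache` and the access order are invented for illustration, and the order is chosen so that 131 ends up as the victim, as in the tables.

```python
from collections import OrderedDict

class TinyCache:
    """Fully associative cache sketch: any block may hold any address."""

    def __init__(self, num_blocks, memory):
        self.num_blocks = num_blocks
        self.memory = memory           # address -> word, stands in for main memory
        self.blocks = OrderedDict()    # tag (address) -> data, least recently used first

    def load(self, address):
        if address in self.blocks:                   # tag match: hit
            self.blocks.move_to_end(address)         # mark as most recently used
            return self.blocks[address]
        if len(self.blocks) == self.num_blocks:      # cache full: evict the LRU victim
            self.blocks.popitem(last=False)
        self.blocks[address] = self.memory[address]  # refill from memory
        return self.blocks[address]

memory = {252: 12, 1022: 99, 131: 7, 2041: 20, 511: 11}
cache = TinyCache(4, memory)
for address in (131, 252, 1022, 2041, 252):  # made-up access history
    cache.load(address)
value = cache.load(511)  # miss on a full cache: 131 is least recently used
print(value, sorted(cache.blocks))
```

Other policies (random, first-in first-out) would only change the line that selects the victim.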

Block Must be Aligned in Memory

• Word blocks are aligned, so binary address of all words in cache always ends in 00_two
• How to take advantage of this to save hardware and energy?
• Don’t need to compare last 2 bits of 32-bit byte address (comparator can be narrower)
  => Don’t need to store last 2 bits of 32-bit byte address in Cache Tag (Tag can be narrower)

Anatomy of a 32B Cache, 8B Block

• Blocks must be aligned in pairs, otherwise could get same word twice in cache
  – Tags only have even-numbered words
  – Last 3 bits of address always 000_two
  – Tags, comparators can be narrower
• Can get hit for either word in block

[Figure: Processor, Cache, and Memory linked by 32-bit Address and 32-bit Data paths. The cache holds four blocks with tags 252, 1022, 130, and 2040, each with two 32-bit data words.]

Hardware Cost of Cache

• Need to compare every tag to the Processor address
• Comparators are expensive
• Optimization: use 2 “sets” of data with a total of only 2 comparators
• 1 Address bit selects which set
• Compare only tags from selected set
• Generalize to more sets

[Figure: Processor, Cache, and Memory linked by 32-bit Address and 32-bit Data paths. The cache is split into Set 0 and Set 1, each holding two Tag/Data entries.]

Processor Address Fields used by Cache Controller

• Block Offset: Byte address within block
• Set Index: Selects which set
• Tag: Remaining portion of processor address

Processor Address (32 bits total):  | Tag | Set Index | Block offset |

• Size of Index = log2(number of sets)
• Size of Tag = Address size – Size of Index – log2(number of bytes/block)
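The field definitions translate directly into shifts and masks. A minimal sketch, assuming the number of sets and the block size are both powers of two; `split_address` and the example address are hypothetical:

```python
def split_address(address, num_sets, block_bytes):
    """Split a byte address into (tag, set index, block offset)."""
    offset_bits = block_bytes.bit_length() - 1   # log2(bytes per block)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    offset = address & (block_bytes - 1)
    index = (address >> offset_bits) & (num_sets - 1)
    tag = address >> (offset_bits + index_bits)  # remaining upper bits
    return tag, index, offset

# Example geometry: 1024 sets, 4-byte blocks, 32-bit address.
tag, index, offset = split_address(0x12345678, num_sets=1024, block_bytes=4)
print(hex(tag), hex(index), offset)
```

With this geometry the tag is the upper 20 bits, the index the next 10 bits, and the offset the lowest 2 bits.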

What is limit to number of sets?

• For a given total number of blocks, we can save more comparators if have more than 2 sets
• Limit: As Many Sets as Cache Blocks => only one block per set – only needs one comparator!
• Called “Direct-Mapped” Design

Address fields:  | Tag | Index | Block offset |

Direct Mapped Cache Ex: Mapping a 6-bit Memory Address

• In example, block size is 4 bytes/1 word
• Memory and cache blocks always the same size, unit of transfer between memory and cache
• # Memory blocks >> # Cache blocks
  – 16 Memory blocks = 16 words = 64 bytes => 6 bits to address all bytes
  – 4 Cache blocks, 4 bytes (1 word) per block
  – 4 Memory blocks map to each cache block
• Memory block to cache block, aka index: middle two bits
• Which memory block is in a given cache block, aka tag: top two bits

Bits 5-4: Tag (Mem Block Within $ Block)
Bits 3-2: Index (Block Within $)
Bits 1-0: Byte Offset (Byte Within Block)
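The 6-bit layout is small enough to enumerate. The sketch below slices an address into its three 2-bit fields and groups the 16 memory blocks by the cache block they map to. The helper name `fields_6bit` is illustrative.

```python
def fields_6bit(address):
    """Split a 6-bit byte address into (tag, index, byte offset), 2 bits each."""
    tag = (address >> 4) & 0b11     # bits 5-4
    index = (address >> 2) & 0b11   # bits 3-2
    offset = address & 0b11         # bits 1-0
    return tag, index, offset

# Group the 16 memory blocks (4 bytes each) by cache index.
blocks_per_index = {}
for block in range(16):
    _, index, _ = fields_6bit(block * 4)
    blocks_per_index.setdefault(index, []).append(block)

print(fields_6bit(0b101101))
print(blocks_per_index)
```

Each cache block ends up shared by four memory blocks, and the tag tells them apart.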

One More Detail: Valid Bit

• When start a new program, cache does not have valid information for this program
• Need an indicator whether this tag entry is valid for this program
• Add a “valid bit” to the cache tag entry
  – 0 => cache miss, even if by chance, address = tag
  – 1 => cache hit, if processor address = tag
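The rule is a two-input check, sketched below with a hypothetical `is_hit` helper. The tag values are arbitrary.

```python
def is_hit(valid_bit, stored_tag, address_tag):
    """Hit only when the entry is valid AND the stored tag equals the address tag."""
    return valid_bit == 1 and stored_tag == address_tag

stale = is_hit(0, 0b10, 0b10)   # tags agree by chance, but the entry is not valid
fresh = is_hit(1, 0b10, 0b10)   # valid entry with a matching tag
other = is_hit(1, 0b01, 0b10)   # valid entry holding a different block
print(stale, fresh, other)
```

Only the second case counts as a hit.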

Caching: A Simple First Example

[Figure: a four-entry Cache (Index 00, 01, 10, 11; each entry has Valid, Tag, Data) beside a Main Memory of 16 one-word blocks at addresses 0000xx, 0001xx, 0010xx, ..., 1111xx.]

• One word blocks. Two low order bits (xx) define the byte in the block (32b words)

Q: Where in the cache is the mem block?
Use next 2 low-order memory address bits – the index – to determine which cache block (i.e., modulo the number of blocks in the cache)

Q: Is the memory block in cache?
Compare the cache tag to the high-order 2 memory address bits to tell if the memory block is in the cache (provided valid bit is set)

Direct-Mapped Cache Example

• One word blocks, cache size = 1K words (or 4KB)

Address bits 31-12: Tag (20 bits)
Address bits 11-2: Index (10 bits)
Address bits 1-0: Byte offset

[Figure: the 10-bit Index selects one of 1024 entries (0, 1, 2, ..., 1021, 1022, 1023), each with Valid, Tag (20 bits), and Data (32 bits). A Comparator checks the stored Tag against the address Tag to produce Hit.]

• Valid bit ensures something useful in cache for this index
• Compare Tag with upper part of Address to see if a Hit
• Read data from cache instead of memory if a Hit

What kind of locality are we taking advantage of?

Multiword-Block Direct-Mapped Cache

• Four words/block, cache size = 1K words

Address bits 31-12: Tag (20 bits)
Address bits 11-4: Index (8 bits)
Address bits 3-2: Word offset (2 bits)
Address bits 1-0: Byte offset

[Figure: the 8-bit Index selects one of 256 entries (0, 1, 2, ..., 253, 254, 255), each with Valid, Tag (20 bits), and four data words. The Word offset picks the 32-bit Data word and the tag comparison produces Hit.]

What kind of locality are we taking advantage of?
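One way to explore the locality question is to replay the same access stream against both geometries. The sketch below is a simplified direct-mapped model that tracks only valid bits and tags. `hit_rate` and the sequential scan are assumptions made for illustration, not a measurement.

```python
def hit_rate(byte_addresses, num_blocks, block_bytes):
    """Fraction of accesses that hit in a direct-mapped cache that starts empty."""
    valid = [False] * num_blocks
    tags = [0] * num_blocks
    hits = 0
    for address in byte_addresses:
        block = address // block_bytes   # memory block number
        index = block % num_blocks       # which cache entry it maps to
        tag = block // num_blocks        # which memory block is resident
        if valid[index] and tags[index] == tag:
            hits += 1
        else:                            # miss: refill the entry
            valid[index] = True
            tags[index] = tag
    return hits / len(byte_addresses)

scan = [4 * i for i in range(1024)]   # read 1K consecutive words once
one_word = hit_rate(scan, num_blocks=1024, block_bytes=4)
four_word = hit_rate(scan, num_blocks=256, block_bytes=16)
print(one_word, four_word)
```

On a single sequential pass the one-word-block cache never hits. With four-word blocks, three of every four accesses hit because the first miss brings in the neighbouring words, which is spatial locality at work.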

Cache Names for Each Organization

• “Fully Associative”: Block can go anywhere
  – First design in lecture
  – Note: No Index field, but 1 comparator/block
• “Direct Mapped”: Block goes one place
  – Note: Only 1 comparator
  – Number of sets = number blocks
• “N-way Set Associative”: N places for a block
  – Number of sets = number of blocks / N
  – N comparators
  – Fully Associative: N = number of blocks
  – Direct Mapped: N = 1

Range of Set-Associative Caches

• For a fixed-size cache, and a given block size, each increase by a factor of 2 in associativity doubles the number of blocks per set (i.e., the number of “ways”) and halves the number of sets
  – decreases the size of the index by 1 bit and increases the size of the tag by 1 bit

Address fields:  | Tag | Index | Block offset |

[Figure: an arrow labeled “More Associativity (more ways)” over the address fields, marking the Tag growing as the Index shrinks.]

What if we can also change the block size?

Question

• For a cache with constant total capacity, if we increase the number of ways by a factor of 2, which statement is false:
  A: The number of sets could be doubled
  B: The tag width could decrease
  C: The block size could stay the same
  D: The block size could be halved
  E: Tag width must increase

Total Cache Capacity = Associativity * # of sets * block_size
Bytes = blocks/set * sets * Bytes/block

C = N * S * B

Address fields:  | Tag | Index | Byte Offset |

address_size = tag_size + index_size + offset_size
             = tag_size + log2(S) + log2(B)

Clicker Question: C remains constant, S and/or B can change such that
C = 2N * (SB)’ => (SB)’ = SB/2

tag_size = address_size – (log2(S) + log2(B))
         = address_size – log2(SB)

New tag_size = address_size – log2((SB)’)
             = address_size – (log2(SB) – 1)
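The same arithmetic can be checked numerically. A minimal sketch, assuming a 32-bit address and power-of-two sizes; `tag_bits` is a hypothetical helper, and the 4 KB capacity is chosen to match the direct-mapped example earlier in the lecture:

```python
def tag_bits(capacity_bytes, ways, block_bytes, address_bits=32):
    """Tag width from C = N * S * B: address_size - log2(S) - log2(B)."""
    sets = capacity_bytes // (ways * block_bytes)   # S = C / (N * B)
    index_bits = sets.bit_length() - 1              # log2(S), S a power of two
    offset_bits = block_bytes.bit_length() - 1      # log2(B), B a power of two
    return address_bits - index_bits - offset_bits

direct_mapped = tag_bits(4096, ways=1, block_bytes=4)
two_way_same_block = tag_bits(4096, ways=2, block_bytes=4)   # sets halved
two_way_half_block = tag_bits(4096, ways=2, block_bytes=2)   # block size halved instead
print(direct_mapped, two_way_same_block, two_way_half_block)
```

Whether the doubling of ways is paid for by halving S or by halving B, the product SB halves and the tag grows by one bit.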

Typical Memory Hierarchy

[Figure: On-Chip Components (Control, Datapath, RegFile, Instr Cache, Data Cache), then Second-Level Cache (SRAM), Third-Level Cache (SRAM), Main Memory (DRAM), and Secondary Memory (Disk or Flash).]

Moving outward from the processor:
Speed (cycles): ½’s, 1’s, 10’s, 100’s, 1,000,000’s
Size (bytes): 100’s, 10K’s, M’s, G’s, T’s
Cost/bit: highest to lowest

• Principle of locality + memory hierarchy presents programmer with ≈ as much memory as is available in the cheapest technology at the ≈ speed offered by the fastest technology

In the news: Intel 3D XPoint

• 375 GB (2nd half 2017: 1.5 TB)
• In 2015 announced as “1000 times faster than SSD”
• 500,000 IOPS (very good value compared to SSD)
• Very low latency (40 times faster than SSD)
• For Desktops: 16 and 32 GB (44 and 80 USD)

• Transparently integrates into the memory subsystem and makes the SSD appear like DRAM to the OS and applications
• Up to 8x memory extension
• Low latency and ultra-high endurance