Carnegie Mellon
Cache Lab Implementation and Blocking
Aditya Shah
Recitation 7: Oct 8th, 2015
Welcome to the World of Pointers!
Outline
¢ Schedule
¢ Memory organization
¢ Caching
  § Different types of locality
  § Cache organization
¢ Cache lab
  § Part (a) Building Cache Simulator
  § Part (b) Efficient Matrix Transpose
  § Blocking
Class Schedule
¢ Cache Lab
  § Due this Thursday, Oct 15th.
  § Start now (if you haven’t already!)
¢ Exam Soon!
  § Start doing practice problems.
  § They have been uploaded onto the Course Website!
Memory Hierarchy
[Figure: pyramid of storage levels. Toward the top: smaller, faster, costlier per byte. Toward the bottom: larger, slower, cheaper per byte.]
L0: Registers. CPU registers hold words retrieved from L1 cache
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from L2 cache
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers
L5: Remote secondary storage (tapes, distributed file systems, Web servers)
Memory Hierarchy
We will discuss this interaction
¢ Registers
¢ SRAM
¢ DRAM
¢ Local Secondary storage
¢ Remote Secondary storage
SRAM vs DRAM tradeoff
¢ SRAM (cache)
  § Faster (L1 cache: on the order of 1 to a few CPU cycles)
  § Smaller (Kilobytes (L1) or Megabytes (L2))
  § More expensive and “energy-hungry”
¢ DRAM (main memory)
  § Relatively slower (hundreds of CPU cycles)
  § Larger (Gigabytes)
  § Cheaper
Locality
¢ Temporal locality
  § Recently referenced items are likely to be referenced again in the near future
  § After accessing address X in memory, save the bytes in cache for future access
¢ Spatial locality
  § Items with nearby addresses tend to be referenced close together in time
  § After accessing address X, save the block of memory around X in cache for future access
Memory Address
¢ 64-bit on shark machines
¢ Block offset: b bits
¢ Set index: s bits
¢ Tag bits: (address size - b - s)
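The address split above can be sketched with shifts and masks. This is a minimal sketch, not code from the lab handout: the helper names block_offset, set_index, and tag_bits are hypothetical, and it assumes s + b < 64.

```c
#include <stdint.h>

/* Hypothetical helpers (names are not from the lab handout).
   Layout of a 64-bit address, high to low: [ tag | set index (s bits) | block offset (b bits) ].
   Assumes s + b < 64. */

/* Low b bits: position of the byte inside its block. */
uint64_t block_offset(uint64_t addr, int b) {
    return addr & ((1ULL << b) - 1);
}

/* Next s bits: which set the block maps to. */
uint64_t set_index(uint64_t addr, int s, int b) {
    return (addr >> b) & ((1ULL << s) - 1);
}

/* Everything above the set index: identifies the block within its set. */
uint64_t tag_bits(uint64_t addr, int s, int b) {
    return addr >> (s + b);
}
```

For example, with s = 5 and b = 5, address 0x12345 splits into tag 72, set index 26, and block offset 5.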
Cache
¢ A cache is a set of S = 2^s cache sets
¢ A cache set is a set of E cache lines
  § E is called associativity
  § If E=1, it is called “direct-mapped”
¢ Each cache line stores a block
  § Each block has B = 2^b bytes
¢ Total Capacity = S*B*E bytes
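Visual Cache Terminology
[Figure: a cache drawn as S = 2^s sets with E lines per set. Each line holds a valid bit (v), a tag, and a cache block of B = 2^b bytes (the data, bytes 0 to B-1). The address of a word splits into t tag bits, s set index bits, and b block offset bits; the data begins at the block offset.]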
General Cache Concepts
[Figure: memory holds blocks 0 to 15; the cache currently holds blocks 8, 9, 14, and 3; blocks 4 and 10 are shown being copied from memory.]
¢ Cache: smaller, faster, more expensive memory that caches a subset of the blocks
¢ Data is copied in block-sized transfer units
¢ Memory: larger, slower, cheaper memory viewed as partitioned into “blocks”
General Cache Concepts: Miss
[Figure: memory holds blocks 0 to 15; the cache holds blocks 8, 9, 14, and 3; block 12 is requested and copied in from memory.]
¢ Data in block b is needed (Request: 12)
¢ Block b is not in cache: Miss!
¢ Block b is fetched from memory
¢ Block b is stored in cache
  § Placement policy: determines where b goes
  § Replacement policy: determines which block gets evicted (victim)
General Caching Concepts: Types of Cache Misses
¢ Cold (compulsory) miss
  § The first access to a block has to be a miss
¢ Conflict miss
  § Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block
  § E.g., if blocks 0 and 8 map to the same cache block, referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time
¢ Capacity miss
  § Occurs when the set of active cache blocks (working set) is larger than the cache
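The conflict-miss example can be checked with a toy model. This is a minimal sketch under an assumed placement rule that the slides do not state: a 4-line direct-mapped cache where block k goes to line k % 4. The name count_misses is hypothetical.

```c
#include <stddef.h>

#define NUM_LINES 4  /* assumption: 4-line direct-mapped cache, block k maps to line k % 4 */

/* Count the misses produced by a sequence of (non-negative) block numbers. */
int count_misses(const int *blocks, size_t n) {
    int line[NUM_LINES] = {0};   /* block number currently held by each line */
    int valid[NUM_LINES] = {0};  /* 0 until the line is first filled */
    int misses = 0;

    for (size_t i = 0; i < n; i++) {
        int slot = blocks[i] % NUM_LINES;
        if (!valid[slot] || line[slot] != blocks[i]) {
            misses++;               /* cold miss or conflict miss */
            valid[slot] = 1;
            line[slot] = blocks[i]; /* the new block replaces the old one */
        }
    }
    return misses;
}
```

Under this rule the sequence 0, 8, 0, 8, 0, 8 misses all six times, because both blocks compete for line 0, while 0, 1, 0, 1, 0, 1 misses only twice (the two cold misses).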
Cache Lab
¢ Part (a) Building a cache simulator
¢ Part (b) Optimizing matrix transpose
Part (a): Cache simulator
¢ A cache simulator is NOT a cache!
  § Memory contents NOT stored
  § Block offsets are NOT used: the low b bits of the address don’t matter (you still need b to locate the set index and tag).
  § Simply count hits, misses, and evictions
¢ Your cache simulator needs to work for different s, b, E, given at run time.
¢ Use the LRU (Least Recently Used) replacement policy
  § Evict the least recently used block from the cache to make room for the next block.
  § Queues? Time Stamps?
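One way to realize the time-stamp idea is sketched below for a single set. It is only one possible design, not the required one: the type line_t, the function access_set, and the return codes are hypothetical, and the caller is assumed to pass a counter `now` that increases on every access.

```c
#include <stdint.h>

/* Hypothetical line layout for a timestamp-based LRU. */
typedef struct {
    int valid;
    uint64_t tag;
    unsigned long last_used;  /* value of the access counter at the most recent use */
} line_t;

/* Access one set of E lines.
   Returns 0 on a hit, 1 on a miss that fills an empty line,
   and 2 on a miss that evicts the least recently used line. */
int access_set(line_t *set, int E, uint64_t tag, unsigned long now) {
    int victim = 0;

    /* Hit: refresh the timestamp. */
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = now;
            return 0;
        }
    }

    /* Miss: prefer an empty line, otherwise the smallest timestamp. */
    for (int i = 0; i < E; i++) {
        if (!set[i].valid) {
            victim = i;
            break;
        }
        if (set[i].last_used < set[victim].last_used)
            victim = i;
    }

    int evicted = set[victim].valid;
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].last_used = now;
    return evicted ? 2 : 1;
}
```

A queue-based design would instead move a line to the back of the queue on every use and evict from the front; the counter version trades that bookkeeping for a linear scan of the set.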
Part (a): Hints
¢ A cache is just a 2D array of cache lines:
  § struct cache_line cache[S][E];
  § S = 2^s, is the number of sets
  § E is associativity
¢ Each cache_line has:
  § Valid bit
  § Tag
  § LRU counter (only if you are not using a queue)
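Because S and E are only known at run time, the 2D array is usually built on the heap. The following is a minimal sketch of that idea; the field types and the helper names make_cache and free_cache are assumptions, not part of the lab handout.

```c
#include <stdlib.h>

/* Field names and types are assumptions; the hints only ask for a valid bit, a tag, and an LRU counter. */
struct cache_line {
    int valid;
    unsigned long long tag;
    unsigned long lru;
};

/* Allocate S sets of E zeroed lines each, so every line starts invalid.
   Returns NULL if any allocation fails. */
struct cache_line **make_cache(int S, int E) {
    struct cache_line **cache = malloc((size_t)S * sizeof *cache);
    if (cache == NULL)
        return NULL;
    for (int i = 0; i < S; i++) {
        cache[i] = calloc((size_t)E, sizeof **cache);
        if (cache[i] == NULL) {
            while (i-- > 0)       /* undo the sets allocated so far */
                free(cache[i]);
            free(cache);
            return NULL;
        }
    }
    return cache;
}

/* Release everything make_cache allocated. */
void free_cache(struct cache_line **cache, int S) {
    if (cache == NULL)
        return;
    for (int i = 0; i < S; i++)
        free(cache[i]);
    free(cache);
}
```

With this layout, cache[set][line] reads just like the cache[S][E] array in the hint, and every malloc or calloc has a matching free.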
Part (a): getopt
¢ getopt() automates parsing elements on the unix command line
  § Typically called in a loop to retrieve arguments
  § Its return value is stored in a local variable
  § When getopt() returns -1, there are no more options
¢ To use getopt, your program must include the header file #include <unistd.h>
¢ If not running on the shark machines then you will need #include <getopt.h>.
  § Better Advice: Run on Shark Machines!
Part (a): getopt
¢ A switch statement is used on the local variable holding the return value from getopt()
  § Each command line input case can be taken care of separately
  § “optarg” is an important variable: it will point to the value of the option argument
¢ Think about how to handle invalid inputs
¢ For more information,
  § look at man 3 getopt
  § http://www.gnu.org/software/libc/manual/html_node/Getopt.html
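One possible way to handle invalid option values is to validate optarg with strtol instead of trusting atoi, which cannot report errors. The helper below is a hypothetical sketch (parse_nonneg is not a library function), and it assumes the options of interest take non-negative decimal integers.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal int from an option argument.
   Returns 0 and stores the value in *out on success; returns -1 on bad input. */
int parse_nonneg(const char *arg, int *out) {
    char *end;
    long v;

    if (arg == NULL || *arg == '\0')
        return -1;                 /* missing or empty argument */

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX)
        return -1;                 /* overflow, trailing junk, or out of range */

    *out = (int)v;
    return 0;
}
```

Inside the getopt switch, a case could then call parse_nonneg(optarg, &x) and print a usage message and exit when it returns -1.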
Part (a): getopt Example
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char** argv){
    int opt, x = 0, y = 0;
    /* looping over arguments */
    while(-1 != (opt = getopt(argc, argv, "x:y:"))){
        /* determine which argument it's processing */
        switch(opt) {
            case 'x':
                x = atoi(optarg);
                break;
            case 'y':
                y = atoi(optarg);
                break;
            default:
                printf("wrong argument\n");
                break;
        }
    }
    printf("x = %d, y = %d\n", x, y); /* use the parsed values */
    return 0;
}
¢ Suppose the program executable was called “foo”. Then we would call “./foo -x 1 -y 3” to pass the value 1 to variable x and 3 to y.
Part (a): fscanf
¢ The fscanf() function is just like scanf() except it can specify a stream to read from (scanf always reads from stdin)
  § Parameters:
    § A stream pointer
    § Format string with information on how to parse the file
    § The rest are pointers to variables to store the parsed data
  § You typically want to use this function in a loop. It returns EOF (-1) when it hits the end of the file before matching anything; otherwise it returns the number of items matched, which is smaller than expected if the data doesn’t match the format string
¢ For more information,
  § man fscanf
  § http://crasseux.com/books/ctutorial/fscanf.html
¢ fscanf will be useful in reading lines from the trace files.
  § L 10,1
  § M 20,1
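Part (a): fscanf example
#include <stdio.h>

int main(void) {
    FILE *pFile = fopen("tracefile.txt", "r"); // open file for reading
    if (pFile == NULL) {                       // fopen fails if the file is missing
        perror("tracefile.txt");
        return 1;
    }
    char identifier;
    unsigned long address; // addresses are 64-bit on the shark machines
    int size;
    // Reading lines like "M 20,1" or "L 19,3"; the leading space in the format skips whitespace
    while (fscanf(pFile, " %c %lx,%d", &identifier, &address, &size) == 3) {
        // Do stuff
    }
    fclose(pFile); // remember to close file when done
    return 0;
}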
Part (a): Malloc/free
¢ Use malloc to allocate memory on the heap
¢ Always free what you malloc, otherwise you may get a memory leak
  § some_pointer_you_malloced = malloc(sizeof(int));
  § free(some_pointer_you_malloced);
¢ Don’t free memory you didn’t allocate
Part (b) Efficient Matrix Transpose
¢ Matrix Transpose (A -> B)
Matrix A:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Matrix B:
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
¢ How do we optimize this operation using the cache?
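As a baseline for that question, a straightforward transpose can be sketched as follows. The function name transpose_naive and the flat row-major layout are assumptions chosen for illustration; the lab's own function signature may differ.

```c
/* Baseline sketch: B = transpose of A for an n x n matrix stored row-major in a flat array.
   A is read along rows (stride 1), but B is written down columns (stride n),
   so consecutive writes to B land in different cache blocks. */
void transpose_naive(int n, const int *A, int *B) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            B[j * n + i] = A[i * n + j];
}
```

The stride-n writes to B are where most of the avoidable misses come from.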
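Part (b): Efficient Matrix Transpose
¢ Suppose Block size is 8 bytes? (With 4-byte ints, one block holds two adjacent elements of a row.)
¢ Access A[0][0] cache miss
¢ Access B[0][0] cache miss
¢ Access A[0][1] cache hit
¢ Access B[1][0] cache miss
¢ Should we handle 3 & 4 next or 5 & 6?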
Part (b): Blocking
¢ Blocking: divide matrix into sub-matrices.
¢ Size of sub-matrix depends on cache block size, cache size, input matrix size.
¢ Try different sub-matrix sizes.
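A blocked transpose along these lines might look like the sketch below. It only illustrates the loop structure: the name transpose_blocked and the bsize parameter are hypothetical, the best bsize has to be found by experiment, and the sketch does nothing special about diagonal elements.

```c
/* Blocked sketch: B = transpose of A for an n x n row-major matrix,
   processed one bsize x bsize sub-matrix at a time (requires bsize >= 1).
   The inner bounds also stop at n, so n need not be a multiple of bsize. */
void transpose_blocked(int n, int bsize, const int *A, int *B) {
    for (int ii = 0; ii < n; ii += bsize)
        for (int jj = 0; jj < n; jj += bsize)
            /* transpose one sub-matrix while its cache blocks are still resident */
            for (int i = ii; i < ii + bsize && i < n; i++)
                for (int j = jj; j < jj + bsize && j < n; j++)
                    B[j * n + i] = A[i * n + j];
}
```

Calling it with several values of bsize and comparing miss counts is one way to act on the "try different sub-matrix sizes" advice.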
Example: Matrix Multiplication
[Figure: c = a * b; row i of a times column j of b gives element (i, j) of c]
c = (double *) calloc(n*n, sizeof(double)); /* calloc takes (count, size); c starts zeroed */
/* Multiply n x n matrices a and b */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}
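Cache Miss Analysis
¢ Assume:
  § Matrix elements are doubles
  § Cache block = 8 doubles
  § Cache size C << n (much smaller than n)
¢ First iteration:
  § n/8 + n = 9n/8 misses (n/8 for the row of a, read with stride 1; n for the column of b, read with stride n)
  § Afterwards in cache: (schematic)
[Figure: c = a * b schematic; the row of a and an 8-wide strip of b (n tall) are in cache]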
Cache Miss Analysis
¢ Assume:
  § Matrix elements are doubles
  § Cache block = 8 doubles
  § Cache size C << n (much smaller than n)
¢ Second iteration:
  § Again: n/8 + n = 9n/8 misses
¢ Total misses:
  § 9n/8 * n^2 = (9/8) * n^3
[Figure: c = a * b schematic; strip of b is n tall and 8 wide]
Blocked Matrix Multiplication
c = (double *) calloc(n*n, sizeof(double)); /* calloc takes (count, size); c starts zeroed */
/* Multiply n x n matrices a and b.
   B is the block size, defined elsewhere; n is assumed to be a multiple of B. */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k, i1, j1, k1;
    for (i = 0; i < n; i+=B)
        for (j = 0; j < n; j+=B)
            for (k = 0; k < n; k+=B)
                /* B x B mini matrix multiplications */
                for (i1 = i; i1 < i+B; i1++)
                    for (j1 = j; j1 < j+B; j1++)
                        for (k1 = k; k1 < k+B; k1++)
                            c[i1*n+j1] += a[i1*n + k1]*b[k1*n + j1];
}
[Figure: c = c + a * b on sub-matrices; block row i1 of a times block column j1 of b; block size B x B]
Cache Miss Analysis
¢ Assume:
  § Cache block = 8 doubles
  § Cache size C << n (much smaller than n)
  § Three blocks fit into cache: 3B^2 < C
¢ First (block) iteration:
  § B^2/8 misses for each block
  § 2n/B * B^2/8 = nB/4 (omitting matrix c)
  § Afterwards in cache (schematic)
[Figure: c = a * b schematic; n/B blocks per row and column; block size B x B]
Cache Miss Analysis
¢ Assume:
  § Cache block = 8 doubles
  § Cache size C << n (much smaller than n)
  § Three blocks fit into cache: 3B^2 < C
¢ Second (block) iteration:
  § Same as first iteration
  § 2n/B * B^2/8 = nB/4
¢ Total misses:
  § nB/4 * (n/B)^2 = n^3/(4B)
[Figure: c = a * b schematic; n/B blocks per row and column; block size B x B]
Part (b): Blocking Summary
¢ No blocking: (9/8) * n^3
¢ Blocking: 1/(4B) * n^3
¢ Suggest largest possible block size B, but limit 3B^2 < C!
¢ Reason for dramatic difference:
  § Matrix multiplication has inherent temporal locality:
    § Input data: 3n^2, computation 2n^3
    § Every array element used O(n) times!
  § But program has to be written properly
¢ For a detailed discussion of blocking:
  § http://csapp.cs.cmu.edu/public/waside.html
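As a rough worked comparison of the two miss totals in this summary, under the slides' own assumptions (8 doubles per cache block, and three B x B blocks fitting in the cache), the ratio can be written out as below. The value B = 8 is only an illustrative choice, not a recommendation from the recitation.

```latex
\[
\frac{\text{misses without blocking}}{\text{misses with blocking}}
  = \frac{\tfrac{9}{8}\,n^{3}}{\tfrac{1}{4B}\,n^{3}}
  = \frac{9B}{2},
\qquad
B = 8 \;\Rightarrow\; \frac{9 \cdot 8}{2} = 36 .
\]
```

The ratio does not depend on n, so under this model the benefit comes entirely from how large B can be while still satisfying 3B^2 < C.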
Part (b): Specs
¢ Cache:
  § You get 1 kilobyte of cache
  § Direct-mapped (E=1)
  § Block size is 32 bytes (b=5)
  § There are 32 sets (s=5)
¢ Test Matrices:
  § 32 by 32
  § 64 by 64
  § 61 by 67
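To see what these specs imply for the test matrices, the set that a matrix element lands in can be computed directly. This is a sketch under two assumptions that the specs do not state: ints are 4 bytes, and the matrix starts at an address that maps to set 0. The name set_of is hypothetical.

```c
/* Set index of element [i][j] of a row-major int matrix with `cols` columns,
   for the cache in the specs: 32-byte blocks (b = 5) and 32 sets (s = 5).
   Assumptions: 4-byte ints, and the matrix base address maps to set 0. */
unsigned set_of(unsigned i, unsigned j, unsigned cols) {
    unsigned byte_off = (i * cols + j) * 4u;  /* offset of the element from the matrix base */
    return (byte_off >> 5) & 31u;             /* drop the 5 offset bits, keep 5 set bits */
}
```

Under these assumptions, each row of the 32 by 32 matrix covers 4 sets, so rows 0 and 8 start in the same set; for the 64 by 64 matrix each row covers 8 sets, so rows 0 and 4 already collide.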
Part (b)
¢ Things you’ll need to know:
  § Warnings are errors
  § Header files
  § Eviction policies in the cache
Warnings are Errors
¢ Strict compilation flags
¢ Reasons:
  § Avoid potential errors that are hard to debug
  § Learn good habits from the beginning
¢ Add “-Werror” to your compilation flags
Missing Header Files
¢ Remember to include the header files for the functions you use
¢ If function declaration is missing
  § Find corresponding header files
  § Use: man <function-name>
¢ Live example
  § man 3 getopt
Eviction policies of Cache
¢ The first row of Matrix A evicts the first row of Matrix B
  § Caches are memory aligned.
  § Matrix A and B are stored in memory at addresses such that both the first elements align to the same place in cache!
  § Diagonal elements evict each other.
¢ Matrices are stored in memory in a row major order.
  § If the entire matrix can’t fit in the cache, then once the cache is full with all the elements it can load, the next elements will evict the existing elements of the cache.
  § Example: 4x4 Matrix of integers and a 32 byte cache. With 4-byte integers each row is 16 bytes, so the cache holds two rows.
    § The third row will evict the first row!
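The 4x4 example can be checked with a little arithmetic. The sketch below assumes 4-byte ints, a direct-mapped cache, and a matrix that starts at an address that is a multiple of the cache size; the helper names cache_pos and row_start are hypothetical.

```c
#include <stddef.h>

/* Position inside a direct-mapped cache that a byte offset lands on.
   Assumption: the matrix starts at an address that is a multiple of cache_bytes. */
size_t cache_pos(size_t byte_off, size_t cache_bytes) {
    return byte_off % cache_bytes;
}

/* Byte offset of the first element of row r in a row-major matrix of 4-byte ints. */
size_t row_start(size_t r, size_t cols) {
    return r * cols * 4;
}
```

For a 4-column matrix and a 32-byte cache, rows 0 and 1 start at positions 0 and 16, and row 2 wraps back to position 0, which is why the third row evicts the first.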
Style
¢ Read the style guideline
  § But I already read it!
  § Good, read it again.
¢ Start forming good habits now!
Questions?