University of Amsterdam Computer Systems – cache characteristics Arnoud Visser 1 Computer Systems Cache characteristics


How do you know?

(ow133) cpuid

This system has a Genuine Intel(R) Pentium(R) 4 processor
Processor Family: F, Extended Family: 0, Model: 2, Stepping: 7

Pentium 4 core C1 (0.13 micron): core-speed 2 Ghz - 3.06 GHz (bus-speed 400/533 MHz)

Instruction TLB: 4K, 2M or 4M pages, fully associative, 128 entries
Data TLB: 4K or 4M pages, fully associative, 64 entries
1st-level data cache: 8K-bytes, 4-way set associative, sectored cache, 64-byte line size
No 2nd-level cache or, if processor contains a valid 2nd-level cache, no 3rd-level cache
Trace cache: 12K-uops, 8-way set associative
2nd-level cache: 512K-bytes, 8-way set associative, sectored cache, 64-byte line size

CPUID instruction

/* GCC inline assembly for 32-bit x86; cache_eax .. cache_edx are global variables */
asm("push %ebx;"              /* CPUID overwrites EBX, so save it first */
    "movl $2,%eax;"           /* EAX = 2 requests the cache and TLB descriptors */
    "cpuid;"
    "movl %eax,cache_eax;"
    "movl %ebx,cache_ebx;"
    "movl %edx,cache_edx;"
    "movl %ecx,cache_ecx;"
    "pop %ebx");              /* restore EBX */


A limited number of answers

http://www.sandpile.org/ia32/cpuid.htm
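The registers returned by CPUID with EAX = 2 are packed with one-byte descriptors, and each descriptor is one of a fixed set of "answers". The sketch below shows how such a register set could be unpacked. The table is only a small excerpt that covers the Pentium 4 listing on the earlier slide, and the names `describe` and `collect_descriptors` are illustrative, not part of any library; check the descriptor values against the CPUID reference before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Excerpt of the descriptor table (assumption: only the entries needed
 * for the Pentium 4 listing; the full table is in the CPUID reference). */
struct descriptor { uint8_t code; const char *meaning; };

static const struct descriptor known[] = {
    { 0x40, "No 2nd-level cache or, if valid 2nd-level cache, no 3rd-level cache" },
    { 0x51, "Instruction TLB: 4K, 2M or 4M pages, 128 entries" },
    { 0x5B, "Data TLB: 4K or 4M pages, 64 entries" },
    { 0x66, "1st-level data cache: 8K-bytes, 4-way, 64-byte line size" },
    { 0x70, "Trace cache: 12K-uops, 8-way" },
    { 0x7B, "2nd-level cache: 512K-bytes, 8-way, 64-byte line size" },
};

/* Look up one descriptor byte; returns NULL for bytes not in the excerpt. */
const char *describe(uint8_t code)
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (known[i].code == code)
            return known[i].meaning;
    return NULL;
}

/* Collect the descriptor bytes from EAX, EBX, ECX, EDX (in that order).
 * out must have room for 15 bytes. Returns the number of bytes stored. */
size_t collect_descriptors(const uint32_t regs[4], uint8_t out[15])
{
    size_t n = 0;
    for (int r = 0; r < 4; r++) {
        if (regs[r] & 0x80000000u)   /* bit 31 set: no valid descriptors */
            continue;
        /* The low byte of EAX is a call count, not a descriptor. */
        for (int byte = (r == 0) ? 1 : 0; byte < 4; byte++) {
            uint8_t code = (uint8_t)(regs[r] >> (8 * byte));
            if (code != 0x00)        /* 0x00 is the null descriptor */
                out[n++] = code;
        }
    }
    return n;
}
```

With the hypothetical register values EAX = 0x665B5101, EBX = ECX = 0 and EDX = 0x007B7040, `collect_descriptors` yields the six bytes 51, 5B, 66, 40, 70, 7B, which `describe` maps to the six lines of the cpuid listing.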

Direct-Mapped Cache

• Simplest kind of cache
• Characterized by exactly one line per set (E = 1 lines per set)

set 0:     valid | tag | cache block
set 1:     valid | tag | cache block
  • • •
set S-1:   valid | tag | cache block

Fully Associative Cache

• Difficult and expensive to build one that is both large and fast

set 0:     valid | tag | cache block
           valid | tag | cache block
             • • •
           valid | tag | cache block

E = C/B lines in the one and only set

E-way Set Associative Caches

• Characterized by more than one line per set

Example with E = 2 lines per set:

set 0:     valid | tag | cache block
           valid | tag | cache block
set 1:     valid | tag | cache block
           valid | tag | cache block
  • • •
set S-1:   valid | tag | cache block
           valid | tag | cache block

Accessing Set Associative Caches

• Set selection
  – Use the set index bits to determine the set of interest.

Address (bits m-1 .. 0):  tag (t bits) | set index (s bits) = 0 0 0 0 1 | block offset (b bits)

set 0:     valid | tag | cache block
           valid | tag | cache block
set 1:     valid | tag | cache block     <- selected set
           valid | tag | cache block
  • • •
set S-1:   valid | tag | cache block
           valid | tag | cache block

s = log2 (S)
S = C / (B x E)
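As a minimal sketch of set selection, the code below computes S from C, B and E and splits an address into tag, set index and block offset. The struct and function names are illustrative, and the code assumes that B and S are powers of two and that b + s is smaller than 32.

```c
#include <stdint.h>

/* Cache geometry in the slide's notation: C = capacity in bytes,
 * B = block (line) size in bytes, E = lines per set. */
struct cache_geometry { uint32_t C, B, E; };

/* log2 of a power of two */
static unsigned log2_u32(uint32_t x)
{
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* S = C / (B x E) */
uint32_t num_sets(struct cache_geometry g)
{
    return g.C / (g.B * g.E);
}

/* The three fields of an address */
struct address_fields { uint32_t tag, set_index, block_offset; };

struct address_fields split_address(struct cache_geometry g, uint32_t addr)
{
    unsigned b = log2_u32(g.B);             /* b = log2(B) */
    unsigned s = log2_u32(num_sets(g));     /* s = log2(S) */
    struct address_fields f;
    f.block_offset = addr & (g.B - 1);                 /* low b bits   */
    f.set_index    = (addr >> b) & (num_sets(g) - 1);  /* next s bits  */
    f.tag          = addr >> (b + s);                  /* remaining t bits */
    return f;
}
```

For the 2nd-level cache of the cpuid listing (C = 512K, B = 64, E = 8) this gives S = 1024 sets, so b = 6 and s = 10. The addresses 0x12345678 and 0xABCD5678 then land in the same set (index 0x159) and differ only in their tags (0x1234 and 0xABCD).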

Accessing Set Associative Caches

• Line matching and word selection
  – Must compare the tag in each valid line in the selected set.

Address (bits m-1 .. 0):  tag (t bits) = 0110 | set index (s bits) = i | block offset (b bits) = 100

selected set (i):
  valid = 1, tag = 1001, cache block (bytes 0 .. 7)
  valid = 1, tag = 0110, cache block (bytes 0 .. 7) holding w0 w1 w2 w3

(1) The valid bit must be set.
(2) The tag bits in one of the cache lines must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and block offset selects starting byte.

b = log2 (B)
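The three conditions can be sketched in a few lines of C. Hardware compares all tags of the selected set in parallel; the loop below is only a software illustration, and `struct cache_line`, `LINES_PER_SET` and `lookup` are names made up for this example (2 lines per set, 8-byte blocks).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINES_PER_SET 2   /* E = 2, as in the example set */
#define BLOCK_SIZE    8   /* B = 8 bytes, so b = 3 offset bits */

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  block[BLOCK_SIZE];
};

/* Search the selected set. On a hit, return a pointer to the byte that the
 * block offset selects; on a miss, return NULL. */
const uint8_t *lookup(const struct cache_line set[LINES_PER_SET],
                      uint32_t tag, uint32_t block_offset)
{
    for (int i = 0; i < LINES_PER_SET; i++) {
        /* (1) valid bit set and (2) tag matches */
        if (set[i].valid && set[i].tag == tag)
            return &set[i].block[block_offset];   /* (3) hit */
    }
    return NULL;                                  /* miss */
}
```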

Observations

• A contiguous region of addresses forms a block
• Multiple address blocks share the same set index
• Blocks that map to the same set are uniquely identified by their tag

Caching in a Memory Hierarchy

Level k: smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1 (here blocks 8, 9, 14, 3).

Level k+1: larger, slower, cheaper storage device at level k+1 is partitioned into blocks (here blocks 0 to 15).

Data is copied between levels in block-sized transfer units.

[Figure: blocks 4 and 10 are copied from level k+1 to level k; annotation: "Same set-index"]

General Caching Concepts

• Program needs object d, which is stored in some block b.
• Cache hit
  – Program finds b in the cache at level k. E.g., block 14.
• Cache miss
  – b is not at level k, so level k cache must fetch it from level k+1. E.g., block 12.
  – If level k cache is full, then some current block must be replaced (evicted). Which one is the “victim”?

[Figure: level k holds blocks 4*, 9, 14, 3; level k+1 holds blocks 0 to 15. Request 14 hits. Request 12 misses: block 12 is fetched from level k+1 and replaces block 4*. Annotation: "Conflict miss"]
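The request sequence can be replayed with a tiny simulation. This is a sketch under one assumption that the slide does not state explicitly: level k has four slots and block b may only be placed in slot b mod 4 (which is consistent with blocks 4, 9, 14 and 3 being resident together). The names are illustrative.

```c
#include <stdbool.h>

#define NUM_SLOTS 4

struct level_k {
    bool valid[NUM_SLOTS];
    int  block[NUM_SLOTS];   /* block number held in each slot */
};

/* Request block b. Returns true on a hit. On a miss the block is fetched
 * from level k+1 and replaces whatever currently occupies its slot. */
bool request(struct level_k *cache, int b)
{
    int slot = b % NUM_SLOTS;             /* assumed placement policy */
    if (cache->valid[slot] && cache->block[slot] == b)
        return true;                      /* cache hit */
    cache->valid[slot] = true;            /* cache miss: evict the victim */
    cache->block[slot] = b;
    return false;
}
```

Starting from blocks 4, 9, 14, 3, a request for 14 hits and a request for 12 misses and evicts block 4. Alternating requests for 4 and 12 then miss every time although three other slots are unaffected: both blocks compete for slot 0.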

Intel Processors: Cache SRAM

Processor                Year   L1 Instruction         L1 Data                L2
Pentium II               1997   4-way, 32B, 128 sets   4-way, 32B, 128 sets   4-way, 32B, 4096 sets
Celeron A                1998   (as above)             (as above)             4-way, 32B, 1024 sets
Pentium III Coppermine   2000   (as above)             (as above)             4-way, 32B, 2048 sets
Pentium 4 Willamette     2000   8-way                  4-way, 64B, 64 sets    8-way, 64B, 512 sets
Pentium 4 Northwood      2002   (as above)             (as above)             8-way, 64B, 1024 sets

http://www11.brinkster.com/bayup/dodownload.asp?f=37

Impact

• Associativity: more lines per set reduce the vulnerability to thrashing. Higher associativity requires more tag bits and control logic, which increases the hit time (1-2 cycles for L1).
• Block size: larger blocks exploit spatial locality (not temporal locality) to increase the hit rate. Larger blocks also increase the miss penalty (5-100 cycles for L1).
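One way to weigh these effects against each other is the usual average-access-time model, which is not stated on the slide and is brought in here as an assumption: every access pays the hit time, and the fraction of accesses that miss also pays the miss penalty. The function name and the numbers in the note below are hypothetical.

```c
/* Average access time in cycles: hit_time is paid by every access,
 * miss_penalty only by the fraction miss_rate of accesses that miss. */
double average_access_time(double hit_time, double miss_rate,
                           double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```

With a 100-cycle miss penalty, a cache with a 1-cycle hit time and a 5% miss rate averages 6 cycles per access, while a more associative cache with a 2-cycle hit time and a 3% miss rate averages 5 cycles. Under these made-up numbers the slower hit is worth it; with a 10-cycle penalty it would not be.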

Design of DRAM cache

– Line size?
  • Large, since disk is better at transferring large blocks
– Associativity?
  • High, to minimize miss rate
– Write through or write back?
  • Write back, since we can’t afford to perform small writes to disk

What would the impact of these choices be on:
– miss rate
  • Extremely low. << 1%
– hit time
  • Must match cache/DRAM performance
– miss latency
  • Very high. ~20ms
– tag storage overhead
  • Low, relative to block size

Locating an Object in a “Cache”

• SRAM Cache
  – Tag stored with cache line
  – Maps from cache block to memory blocks
    • From cached to uncached form
    • Save a few bits by only storing tag
  – No tag for block not in cache
  – Hardware retrieves information
    • can quickly match against multiple tags

“Cache” (looking up object name X: each tag is compared with X):

         Tag   Data
0:       D     243
1:       X     17
  • • •
N-1:     J     105

Locating an Object in “Cache”

• DRAM Cache
  – Each allocated page of virtual memory has entry in page table
  – Mapping from virtual pages to physical pages
    • From uncached form to cached form
  – Page table entry even if page not in memory
    • Specifies disk address

Page Table (looking up object name X):

Object Name   Location
D:            0
J:            On Disk
X:            1

“Cache”:

         Data
0:       243
1:       17
  • • •
N-1:     105

A System with Virtual Memory

Address Translation: Hardware converts virtual addresses to physical addresses via lookup table (page table)

[Figure: the CPU issues virtual addresses; the page table (entries 0 .. N-1) maps them to physical addresses in memory (locations 0 .. P-1) or to locations on disk]

Page Faults (like “Cache Misses”)

What if an object is on disk rather than in memory?
– Page table entry indicates virtual address not in memory
– OS exception handler invoked to move data from disk into memory
  • current process suspends, others can resume
  • OS has full control over placement, etc.

[Figure, two panels "Before fault" and "After fault": CPU, virtual addresses, page table, physical addresses, memory, disk]

VM Address Translation: Hardware vs Software

[Figure: the processor sends virtual address a to the hardware address translation mechanism (part of the on-chip memory management unit, MMU), which sends physical address a' to main memory. On a page fault the fault handler runs and the OS performs the transfer from secondary memory (only if miss).]

VM Address Translation

• Parameters
  – P = 2^p = page size (bytes)
  – N = 2^n = virtual address limit
  – M = 2^m = physical address limit

virtual address:    bits n-1 .. p = virtual page number    | bits p-1 .. 0 = page offset
                    (address translation)
physical address:   bits m-1 .. p = physical page number   | bits p-1 .. 0 = page offset

Page offset bits don’t change as a result of translation

Address Translation via Page Table

virtual address:    bits n-1 .. p = virtual page number (VPN)    | bits p-1 .. 0 = page offset
physical address:   bits m-1 .. p = physical page number (PPN)   | bits p-1 .. 0 = page offset

• The page table base register points to the page table
• VPN acts as table index (PTEA = address of the PTE)
• Each page table entry (PTE) holds: valid, access, physical page number (PPN)
• If valid = 0 then page not in memory
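The lookup can be sketched as follows. The code assumes 4 KB pages (p = 12) and a single flat page table, and it leaves out the access bits; `struct pte`, `PAGE_BITS` and `translate` are illustrative names, not those of a real operating system.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12u   /* p: assumed page size P = 2^12 = 4 KB */

struct pte {
    bool     valid;   /* false: page not in memory */
    uint32_t ppn;     /* physical page number */
};

/* Translate virtual address va. Returns true and stores the physical
 * address in *pa if the page is in memory; returns false where the
 * hardware would raise a page fault. */
bool translate(const struct pte *page_table, uint32_t num_pages,
               uint32_t va, uint32_t *pa)
{
    uint32_t vpn    = va >> PAGE_BITS;               /* table index */
    uint32_t offset = va & ((1u << PAGE_BITS) - 1);  /* copied unchanged */

    if (vpn >= num_pages || !page_table[vpn].valid)
        return false;

    *pa = (page_table[vpn].ppn << PAGE_BITS) | offset;
    return true;
}
```

If virtual page 2 maps to physical page 5, virtual address 0x2ABC translates to physical address 0x5ABC: only the page number changes, the offset 0xABC is carried over.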

Integrating VM and Cache

[Figure: CPU -> (VA) -> Translation -> (PA) -> Cache; on a hit the data goes back to the CPU, on a miss the request goes to Main Memory]

• Most Caches “Physically Addressed”
  – Allows multiple processes to have blocks in cache at same time
  – Cache doesn’t need to be concerned with protection issues (access rights checked as part of address translation)
• Perform Address Translation Before Cache
  – But this involves a memory access itself (of the PTE)
  – Of course, page table entries can also become cached

Speeding up Translation with a dedicated cache

• “Translation Lookaside Buffer” (TLB)
  – Small hardware cache in MMU
  – Maps virtual page numbers to physical page numbers
  – Contains complete PTEs for small number of pages

[Figure: CPU -> (VA) -> TLB lookup; on a TLB hit the PA goes directly to the cache, on a TLB miss the translation is performed first. On a cache hit the data goes back to the CPU, on a cache miss the request goes to Main Memory]
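A minimal sketch of the idea: a small table of recent translations is consulted first, and the page table is read only on a TLB miss. For brevity the TLB below is direct-mapped with four entries and assumes every page is resident; real TLBs are usually set associative or fully associative (as in the cpuid listing), and all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4u

struct tlb_entry { bool valid; uint32_t vpn; uint32_t ppn; };

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned hits, misses;
};

/* page_table[vpn] holds the physical page number of virtual page vpn. */
uint32_t tlb_translate(struct tlb *t, const uint32_t *page_table,
                       uint32_t vpn)
{
    struct tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];

    if (slot->valid && slot->vpn == vpn) {
        t->hits++;                   /* TLB hit: no page table access */
        return slot->ppn;
    }

    t->misses++;                     /* TLB miss: read the PTE ...      */
    slot->valid = true;
    slot->vpn   = vpn;
    slot->ppn   = page_table[vpn];   /* ... and remember the translation */
    return slot->ppn;
}
```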

What do you know?

(ow133) cpuid

This system has a Genuine Intel(R) Pentium(R) 4 processor
Processor Family: F, Extended Family: 0, Model: 2, Stepping: 7

Pentium 4 core C1 (0.13 micron): core-speed 2 Ghz - 3.06 GHz (bus-speed 400/533 MHz)

Instruction TLB: 4K, 2M or 4M pages, fully associative, 128 entries
Data TLB: 4K or 4M pages, fully associative, 64 entries
1st-level data cache: 8K-bytes, 4-way set associative, sectored cache, 64-byte line size
No 2nd-level cache or, if processor contains a valid 2nd-level cache, no 3rd-level cache
Trace cache: 12K-uops, 8-way set associative
2nd-level cache: 512K-bytes, 8-way set associative, sectored cache, 64-byte line size

Matrix Multiplication Example

• Major Cache Effects to Consider
  – Total cache size
    • Exploit temporal locality and keep the working set small (e.g., by using blocking)
  – Block size
    • Exploit spatial locality
• Description:
  – Multiply N x N matrices
  – O(N^3) total operations
  – Accesses
    • N reads per source element
    • N values summed per destination
• Assumption
  – N is so large that the cache is not big enough to hold multiple rows

/* ijk */
for (i=0; i<n; i++) {
  for (j=0; j<n; j++) {
    sum = 0.0;                    /* variable sum held in register */
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

Matrix Multiplication (ijk)

/* ijk */
for (i=0; i<n; i++) {
  for (j=0; j<n; j++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

Inner loop:
  A (i,*): row-wise
  B (*,j): column-wise
  C (i,j): fixed

• Misses per Inner Loop Iteration:
  A      B      C
  0.25   1.0    0.0    (total = 1.25)

Matrix Multiplication (jik)

/* jik */
for (j=0; j<n; j++) {
  for (i=0; i<n; i++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

Inner loop:
  A (i,*): row-wise
  B (*,j): column-wise
  C (i,j): fixed

• Misses per Inner Loop Iteration:
  A      B      C
  0.25   1.0    0.0    (total = 1.25)

Matrix Multiplication (kij)

/* kij */
for (k=0; k<n; k++) {
  for (i=0; i<n; i++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];
  }
}

Inner loop:
  A (i,k): fixed
  B (k,*): row-wise
  C (i,*): row-wise

• Misses per Inner Loop Iteration:
  A      B      C
  0.0    0.25   0.25   (total = 0.5)

Matrix Multiplication (ikj)

/* ikj */
for (i=0; i<n; i++) {
  for (k=0; k<n; k++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];
  }
}

Inner loop:
  A (i,k): fixed
  B (k,*): row-wise
  C (i,*): row-wise

• Misses per Inner Loop Iteration:
  A      B      C
  0.0    0.25   0.25   (total = 0.5)

Matrix Multiplication (jki)

/* jki */
for (j=0; j<n; j++) {
  for (k=0; k<n; k++) {
    r = b[k][j];
    for (i=0; i<n; i++)
      c[i][j] += a[i][k] * r;
  }
}

Inner loop:
  A (*,k): column-wise
  B (k,j): fixed
  C (*,j): column-wise

• Misses per Inner Loop Iteration:
  A      B      C
  1.0    0.0    1.0    (total = 2.0)

Matrix Multiplication (kji)

/* kji */
for (k=0; k<n; k++) {
  for (j=0; j<n; j++) {
    r = b[k][j];
    for (i=0; i<n; i++)
      c[i][j] += a[i][k] * r;
  }
}

Inner loop:
  A (*,k): column-wise
  B (k,j): fixed
  C (*,j): column-wise

• Misses per Inner Loop Iteration:
  A      B      C
  1.0    0.0    1.0    (total = 2.0)

Summary of Matrix Multiplication

ijk (& jik):
• 2 loads, 0 stores
• misses/iter = 1.25

for (i=0; i<n; i++) {
  for (j=0; j<n; j++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

kij (& ikj):
• 2 loads, 1 store
• misses/iter = 0.5

for (k=0; k<n; k++) {
  for (i=0; i<n; i++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];
  }
}

jki (& kji):
• 2 loads, 1 store
• misses/iter = 2.0

for (j=0; j<n; j++) {
  for (k=0; k<n; k++) {
    r = b[k][j];
    for (i=0; i<n; i++)
      c[i][j] += a[i][k] * r;
  }
}
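The misses/iter figures follow from the access pattern of each array in the inner loop. The sketch below reproduces them under two assumptions that the slides do not state at this point: cache lines of 32 bytes (as in the Pentium II and III rows of the processor table) and 8-byte double elements, so that one line holds 4 elements. The names are illustrative.

```c
#include <stddef.h>

enum access_pattern { FIXED, ROW_WISE, COLUMN_WISE };

/* Expected misses per inner-loop iteration for one array, assuming the
 * matrices are far too large for a row or column to stay in the cache. */
double misses_per_iteration(enum access_pattern p,
                            size_t line_size, size_t elem_size)
{
    switch (p) {
    case FIXED:        /* same element every iteration (kept in a register) */
        return 0.0;
    case ROW_WISE:     /* stride 1: one miss per cache line of elements */
        return (double)elem_size / (double)line_size;
    case COLUMN_WISE:  /* stride n: every access touches a different line */
        return 1.0;
    }
    return 0.0;
}
```

With `line_size = 32` and `elem_size = 8`, ijk sums row-wise A, column-wise B and fixed C to 0.25 + 1.0 + 0.0 = 1.25; kij gives 0.0 + 0.25 + 0.25 = 0.5; jki gives 1.0 + 0.0 + 1.0 = 2.0.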

Matrix Multiply Performance

• Miss rates are helpful but not perfect predictors.
• Code scheduling matters, too.

[Figure: cycles/iteration (0 to 60) versus array size n (25 to 400) for the kji, jki, kij, ikj, jik and ijk versions]

Improving Temporal Locality by Blocking

• Example: Blocked matrix multiplication
  – “Block” here does not mean “cache block”.
  – Instead, it means a sub-block within the matrix.
  – Example: N = 8; sub-block size = 4

  | A11 A12 |     | B11 B12 |     | C11 C12 |
  | A21 A22 |  X  | B21 B22 |  =  | C21 C22 |

  C11 = A11 B11 + A12 B21     C12 = A11 B12 + A12 B22
  C21 = A21 B11 + A22 B21     C22 = A21 B12 + A22 B22

Key idea: Sub-blocks (i.e., Axy) can be treated just like scalars.

Blocked Matrix Multiply (bijk)

for (jj=0; jj<n; jj+=bsize) {
  for (i=0; i<n; i++)
    for (j=jj; j < min(jj+bsize,n); j++)
      c[i][j] = 0.0;
  for (kk=0; kk<n; kk+=bsize) {
    for (i=0; i<n; i++) {
      for (j=jj; j < min(jj+bsize,n); j++) {
        sum = 0.0;
        for (k=kk; k < min(kk+bsize,n); k++) {
          sum += a[i][k] * b[k][j];
        }
        c[i][j] += sum;
      }
    }
  }
}

Blocked Matrix Multiply Analysis

• Innermost loop pair multiplies
  – a 1 X bsize sliver of A by
  – a bsize X bsize block of B and accumulates into 1 X bsize sliver of C
  – Loop over i steps through n row slivers of A & C, using same B

[Figure: A: row sliver (row i, columns from kk) accessed bsize times; B: block (rows from kk, columns from jj) reused n times in succession; C: update successive elements of sliver (row i, columns from jj)]
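To check that blocking only reorders the work and does not change the result, the bijk loop nest can be compared with the plain ijk version. The sketch below fixes the matrix dimension at compile time (`N = 8`, as in the sub-block example) and supplies a small `min_int` helper in place of `min`; these names and the function signatures are assumptions made for this example.

```c
#define N 8   /* matrix dimension, as in the N = 8 example */

static int min_int(int x, int y) { return x < y ? x : y; }

/* Unblocked ijk version, used as the reference result. */
void matmul_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Blocked bijk version: works on bsize x bsize blocks of B at a time. */
void matmul_bijk(double a[N][N], double b[N][N], double c[N][N], int bsize)
{
    for (int jj = 0; jj < N; jj += bsize) {
        /* clear the column strip of C that this jj iteration produces */
        for (int i = 0; i < N; i++)
            for (int j = jj; j < min_int(jj + bsize, N); j++)
                c[i][j] = 0.0;
        for (int kk = 0; kk < N; kk += bsize)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + bsize, N); j++) {
                    double sum = 0.0;
                    for (int k = kk; k < min_int(kk + bsize, N); k++)
                        sum += a[i][k] * b[k][j];
                    c[i][j] += sum;   /* accumulate the partial product */
                }
    }
}
```

The `min_int` bounds make the code work for block sizes that do not divide N evenly, for example `bsize = 3` with `N = 8`. With general floating-point data the two versions can differ in the last bits, because the partial sums are added in a different order.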

Matrix Multiply Performance

• Blocking (bijk and bikj) improves performance by a factor of two over unblocked versions
  – relatively insensitive to array size.

[Figure: cycles/iteration (0 to 60) versus array size (n) for kji, jki, kij, ikj, jik, ijk, bijk (bsize = 25) and bikj (bsize = 25)]

Concluding Observations

• Programmer can optimize for cache performance
  – How data structures are organized
  – How data are accessed
    • Nested loop structure
    • Blocking is a general technique
• All systems favor “cache friendly code”
  – Getting absolute optimum performance is very platform specific
    • Cache sizes, line sizes, associativities, etc.
  – Can get most of the advantage with generic code
    • Keep working set reasonably small (temporal locality)
    • Use small strides (spatial locality)
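As a small illustration of the stride advice, the two functions below compute the same sum over a two-dimensional array. C stores arrays row by row, so the first version walks through memory with stride 1 while the second jumps `COLS` elements between consecutive accesses. The array dimensions and function names are chosen only for this sketch; on arrays this small both versions fit in the cache, and the difference shows up only for large arrays.

```c
#define ROWS 4
#define COLS 6

/* Stride 1: the inner loop walks along a row, i.e. through
 * consecutive addresses in C's row-major layout. */
long sum_row_wise(int m[ROWS][COLS])
{
    long sum = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += m[i][j];
    return sum;
}

/* Stride COLS: the inner loop walks down a column, so consecutive
 * accesses are COLS elements apart. */
long sum_column_wise(int m[ROWS][COLS])
{
    long sum = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += m[i][j];
    return sum;
}
```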


Assignment

• Optimize your ‘image-processing’ code further with blocking

• Make your code a function of the L1-size
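One possible starting point for the second item is to derive the block size from the L1 size instead of hard-coding it. The heuristic below is an assumption borrowed from the blocked matrix multiply analysis, not a prescribed solution: it picks the largest `bsize` for which one bsize x bsize block plus two 1 x bsize slivers fit in the cache. An image-processing kernel will have a different working set, so the formula inside the loop would need to be adapted. `choose_bsize` is a hypothetical name.

```c
#include <stddef.h>

/* Largest bsize such that a bsize x bsize block plus two 1 x bsize
 * slivers of elem_size-byte elements fit in l1_bytes. Returns at least 1. */
int choose_bsize(size_t l1_bytes, size_t elem_size)
{
    int bsize = 1;
    for (;;) {
        size_t next = (size_t)bsize + 1;
        /* working set, in bytes, if the block size were increased by one */
        size_t need = (next * next + 2 * next) * elem_size;
        if (need > l1_bytes)
            break;
        bsize++;
    }
    return bsize;
}
```

For the 8K-byte 1st-level data cache of the cpuid listing and 8-byte doubles this gives `bsize = 31`. It ignores conflict misses and everything else that competes for the cache, so a somewhat smaller value may work better in practice.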