17
ECE 1175 Embedded Systems Design Lab 3 – Cache & Memory ECE 1175 Embedded Systems Design 1

Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

ECE 1175Embedded Systems Design

Lab 3 – Cache & Memory

ECE 1175 Embedded Systems Design

1

Page 2: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

ECE 1175 – Lab 3

Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1

Direct GPIO Access Virtual/physical memory basics Raspberry Pi GPIO Lab task 2

ECE 1175 Embedded Systems Design 2

You need to use C/C++ to complete your lab work.

Page 3: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

ARM Cortex-A53

Cache Basics

Cache: Fast but small memory close to the processor

ECE 1175 Embedded Systems Design 3

Caches on Raspberry Pi 3 Processor BCM2837

Data Cache Instruction Cache

L1 Cache (per core)

L2 Cache

Main Memory (used by CPU) Main Memory (used by GPU)

VideoCore IV

L2 Cache

Per slice

Per slice

Instruction CacheUniform Cache

Textual Memory Unit

Page 4: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

0

1

2

3

4

5

6

7

… …

Cache Basics

How does cache work?

ECE 1175 Embedded Systems Design 4

0

1

2

3

4

5

6

7

… …

Memory

CPU

Look for data in

address 0

If not in cache, load the entire block into

the cache.

Load the entire block

Cache

The block size depends on specific cache design.

Page 5: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Cache Basics

Impact on C programming

ECE 1175 Embedded Systems Design 5

In C, multidimensional arrays are stored in row-major order in the memory. The way you access entries affects cache misses.

Address Row-major Column-major

0 a11 a11

1 a12 a21

2 a13 a31

3 a21 a12

4 a22 a22

5 a23 a32

6 a31 a13

… … …

Page 6: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Cache Basics

Impact on C programming

ECE 1175 Embedded Systems Design 6

Traverse a 2D array in row major order

Traverse a 2D array in column major order

for (i = 0; i < N; i++) {for (j = 0; j < N; j++) {

// Access a[j][i]}

}

for (i = 0; i < N; i++) {for (j = 0; j < N; j++) {

// Access a[i][j]}

}

Not sequential access, 100% cache misses!

Sequential access, a few compulsory cache missesstride = 1

stride = N

Let’s assume N is very large

Page 7: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Performance Analysis Tool – perf

perf Provide access to performance counters

• Hardware: CPU cycles, bus cycles, cache misses, etc.• Software: task clock, page faults, alignment faults, etc.• Use perf list to see available events

Offer a rich set of commands• Support multiple events• Repeated measurement• Processor-wide mode• Use perf --help to check info on a specific command

ECE 1175 Embedded Systems Design 7

Page 8: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Performance Analysis Tool – perf

Use perf on Raspberry Pi OS To install: sudo apt-get install linux-perf Bypass version check on Raspbian

ECE 1175 Embedded Systems Design 8

1. Check your installed perf version

2. Open /usr/bin/perf (use vim, nano, etc.)

3. Change exec “perf_$version” “$@” to exec “perf_4.9” “$@”Your installed version

Page 9: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Performance Analysis Tool – perf

An example of analyzing your program via perf

ECE 1175 Embedded Systems Design 9

Add events you want to measure

Measurements return

perf stat -e event1,event2,event3 [...] ./your_program

For more details: https://perf.wiki.kernel.org/index.php/Tutorial

Page 10: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Lab Task 1

Analyze your matrix multiplication program via perf

ECE 1175 Embedded Systems Design 10

𝐂𝐂 = 𝐀𝐀𝐀𝐀 is defined by 𝑐𝑐𝑖𝑖𝑖𝑖 = ∑𝑘𝑘=0𝑁𝑁−1 𝑎𝑎𝑖𝑖𝑘𝑘𝑏𝑏𝑘𝑘𝑖𝑖 .

Very high cache miss rate!

To reduce cache misses, you can try interchanging your loops. Use perf to measure L1 data cache misses.

Not sequentially accessed in memory

Page 11: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Virtual/Physical Memory Basics

In modern operating systems, physical memory is transparent to users.

ECE 1175 Embedded Systems Design 11

0

1

2

3

4

5

6

7

… …

0

1

2

3

4

5

6

7

… …

Physical Memory Virtual Memory

User Space

Mapping

Page 12: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Device file – dev/mem

dev/mem An image of main memory of computer Byte addresses in /dev/mem are interpreted as physical

memory (actual RAM address, registers).

ECE 1175 Embedded Systems Design 12

For more details: https://man7.org/linux/man-pages/man4/mem.4.html

Page 13: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Create mapping – mmap()

mmap() with dev/mem You can create a mapping from virtual to physical memory.

ECE 1175 Embedded Systems Design 13

0

1

2

3

4

5

6

7… …

0

1

2

3

4

5

6

7

… …

Physical Memory Virtual Memory

User Space

For more details: https://man7.org/linux/man-pages/man2/mmap.2.html

Page 14: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Lab Task 2

Direct GPIO manipulation on Raspberry Pi OS Find the physical address of GPIO registers in manual. Use mmap() and dev/mem to create a mapping. Control the GPIO registers from user space.

ECE 1175 Embedded Systems Design 14

0

1

2

3

4

5

… …0

1

2

3

4

5

… …Physical Memory Virtual Memory

User Space

Pi GPIO pinsGPIO Control Registers

set pin high/low

get pin status

Page 15: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Raspberry Pi GPIO

GPIO pinouts of Raspberry Pi 3

ECE 1175 Embedded Systems Design 15https://pinout.xyz/

Please select those pins without other specific purposes for your test.

Page 16: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

Lab Task 2

For check-off You can code your GPIO pins to generate some voltage

patterns (e.g. square wave) and verify your configuration using multimeters or oscilloscopes.

What’s the benefit of direct control from OS? Easier than low-level assembly code on bare metal. OS provides more functionalities for you to develop

interesting applications. Many APIs available but you can customize your own for

specific purposes.ECE 1175 Embedded Systems Design 16

You can refer to the example here https://elinux.org/RPi_GPIO_Code_Samples.

Page 17: Lab 3 – Cache & Memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ECE 1175 – Lab 3 Monitor Cache Misses Cache basics Performance analysis tool – perf Lab task 1 Direct

ECE 1175 Embedded Systems Design 17

Thank you!