AIMS Embedded Systems Programming, MT 2018
Micro Architectures
Daniel Kroening
University of Oxford, Computer Science Department
Version 1.0, 2014
Outline
X86/Y86
ARM
Pipelining
Memory
D. Kroening: AIMS Embedded Systems Programming MT 2018 2
High-Level View of Microarchitectures
[Figure: a CPU (registers eax, ebx, ecx, ..., flag ZF, instruction pointer IP; functional units ALU and Float; L1 and L2 caches) connected through data, address and control lines to the memory modules and to I/O (USB, ...)]
CPUs
- Process a sequential assembler program
- Data is held in registers
- The program controls which data is given to which functional unit (FU), and where the result is stored
- The program controls the transfer of data between registers and memory
- Caches speed up access to frequently used memory cells
Instruction Set Architectures
- These summarise the behavior of a CPU from the point of view of the programmer
- An ISA describes "what the CPU does"
- Ideally it says as little as possible about "how the CPU does it"
We will study two ISAs:
1. CISC: specifically the Y86 (academic variant of Intel's x86)
2. RISC: specifically the ARM 32 architecture

One of the goals of this course is to understand the difference.
Visible Registers
RAM
- Contains data and the program

Data registers

Index   0    1    2    3    4    5    6    7
Name    eax  ecx  edx  ebx  esp  ebp  esi  edi

Instruction Pointer (IP)
- Points to the address of the current instruction

Flag registers (ZF, ...)
- Store flags for branches
Y86 Assembler
- Subset of Intel's x86 assembler
✓ You can run a Y86 program on your x86 machine!
✗ The reverse does not work in general, as too many instructions are missing (you are welcome to mend this)
Y86 Instructions
- add/sub: addition/subtraction of the values in two registers; ZF is set appropriately
- RRmov: copies the value of one register into another
- RMmov: copies the value of a register into RAM
- MRmov: copies a value from RAM into a register
- jnz: jumps to a relative address if ZF = 0
Y86 Loads and Stores
- Loads and stores have a Displacement:

  ea = esi + Displacement

- The displacement is included in the instruction word as an immediate constant
- The register esi is used as offset
Y86 Instruction Formats
Mnemonic  Opcode  2nd byte (bits 7-6 | 5-3 | 2-0)  3rd byte       Semantics
add       01      11 | RS | RD                                    RD ← RD + RS
sub       29      11 | RS | RD                                    RD ← RD − RS
jnz       75      Distance                                        if (¬ZF) IP ← IP + Distance
RRmov     89      11 | RS | RD                                    RD ← RS
RMmov     89      01 | RS | 110                    Displacement   MEM[ea] ← RS
MRmov     8b      01 | RS | 110                    Displacement   RS ← MEM[ea]
hlt       f4
Example 1
add eax, edx
- Intel convention: the target register is always on the left-hand side
- The target register is a source register, too!
- Semantics:

  eax ← eax + edx
Example 2
mov edx, [BYTE one+esi]

Encoding: 8B 56 17

  8B   opcode (MRmov)
  56   = 01 010 110 in binary, where 010 selects edx and 110 selects esi
  17   displacement

Semantics:

  edx ← MEM[esi+17]
How do Branches Work?
if (a==b) {
  T;
}
else {
  F;
}

→

    mov eax, [BYTE a+esi]
    mov ebx, [BYTE b+esi]
    sub eax, ebx
    jnz f
    ;
    ; Code for 'T'
    ;
    mov eax, [BYTE one+esi]
    add eax, eax
    jnz e
f   ;
    ; Code for 'F'
    ;
e   ; ...
Assembler Example
Address  Machine Code    Assembler using Mnemonics
00       29 F6               sub esi, esi
02       29 C0               sub eax, eax
04       29 DB               sub ebx, ebx
06       8B 56 17        l   mov edx, [BYTE one+esi]
09       01 D0               add eax, edx
0B       01 C3               add ebx, eax
0D       89 C1               mov ecx, eax
0F       8B 56 1B            mov edx, [BYTE ten+esi]
12       29 D1               sub ecx, edx
14       75 F0               jnz l
16       F4                  hlt
17       01 00 00 00     one dd 1
1B       0A 00 00 00     ten dd 10
The result is in ebx
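To see how the machine code above is executed, the following is a minimal sketch of an interpreter for the seven Y86 instructions, written in Python. It is not part of the course material: the function name `run`, the step limit, and the treatment of the displacement and the jump distance as signed 8-bit values are assumptions made for illustration.

```python
# Minimal interpreter sketch for the Y86 subset (add, sub, jnz, RRmov, RMmov, MRmov, hlt).
MASK = 0xFFFFFFFF
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Machine code of the example program, addresses 00 to 1E.
PROGRAM = bytes.fromhex(
    "29F6 29C0 29DB 8B5617 01D0 01C3 89C1 8B561B 29D1 75F0 F4"
    "01000000 0A000000"
)

def signed8(byte):
    """Interpret one byte as a signed 8-bit value (assumption of this sketch)."""
    return byte - 256 if byte >= 128 else byte

def run(memory, max_steps=10_000):
    mem = bytearray(memory)
    reg = [0] * 8
    ip, zf = 0, 0
    for _ in range(max_steps):
        op = mem[ip]
        if op == 0xF4:                              # hlt
            return dict(zip(REGS, reg))
        if op == 0x75:                              # jnz: relative to the next instruction
            ip += 2
            if not zf:
                ip += signed8(mem[ip - 1])
            continue
        second = mem[ip + 1]
        mode, rs, rd = second >> 6, (second >> 3) & 7, second & 7
        if mode == 0b11:                            # register-register forms
            if op == 0x01:                          # add: RD <- RD + RS
                reg[rd] = (reg[rd] + reg[rs]) & MASK
                zf = int(reg[rd] == 0)
            elif op == 0x29:                        # sub: RD <- RD - RS
                reg[rd] = (reg[rd] - reg[rs]) & MASK
                zf = int(reg[rd] == 0)
            elif op == 0x89:                        # RRmov: RD <- RS
                reg[rd] = reg[rs]
            else:
                raise ValueError(f"unknown opcode {op:02x}")
            ip += 2
        elif mode == 0b01 and rd == 0b110:          # memory forms: ea = esi + Displacement
            ea = (reg[6] + signed8(mem[ip + 2])) & MASK
            if op == 0x8B:                          # MRmov: RS <- MEM[ea]
                reg[rs] = int.from_bytes(mem[ea:ea + 4], "little")
            elif op == 0x89:                        # RMmov: MEM[ea] <- RS
                mem[ea:ea + 4] = reg[rs].to_bytes(4, "little")
            else:
                raise ValueError(f"unknown opcode {op:02x}")
            ip += 3
        else:
            raise ValueError(f"unsupported second byte {second:02x}")
    raise RuntimeError("step limit reached")

print(run(PROGRAM)["ebx"])   # 55, the sum 1 + 2 + ... + 10
```

The loop adds 1 to eax in every iteration, accumulates eax into ebx, and leaves the loop once eax equals ten, so ebx ends up holding 55.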
The NASM Assembler
- Windows:
    nasm -f win32 my_test.asm
    link /subsystem:console /entry:start my_test.obj
- Linux:
    nasm -f elf my_test.asm
    ld -s -o my_test my_test.o
- MacOS:
    nasm -f macho my_test.asm
    ld -arch i386 -o my_test my_test.o
Inline Assembler with Visual Studio
#include <stdio.h>

int one=1, ten=10, result;

int main() {
  __asm {
    sub esi, esi            ; esi = 0, used as offset
    sub eax, eax            ; eax = 0
    sub ebx, ebx            ; ebx = 0, accumulates the result
  l: mov edx, [one+esi]
    add eax, edx            ; eax += 1
    add ebx, eax            ; ebx += eax
    mov ecx, eax
    mov edx, [ten+esi]
    sub ecx, edx            ; ZF is set once eax == 10
    jnz l
    mov [result+esi], ebx
  }
  printf("Result: %d\n", result);
  return 0;
}
Debugging with GDB (Part 1)
- run
  Start execution
- x/[size] Label
  Dump a region of the memory
- x/[size]i Label
  Disassemble some memory region, e.g. x/5i $pc
- info registers
  Show the value of the registers
- step
  Execute one instruction (without source-line information, use stepi to advance by a single machine instruction)
Debugging with GDB (Part 2)
- break label
  Set a breakpoint at label
- info break
  Show the breakpoints
- delete breakpoints number
  Well, delete a breakpoint
- continue
  Resume the execution after a breakpoint
Debugging with Visual Studio
Debugging with XCode
Extensions: Comparisons
We would love to have Y86 commands for

  if (a<b) { ... }

These obviously depend on the number representation:

with sign:     0 > −7,  i.e. twoc(0000) > twoc(1001)
without sign:  0 < 9,   i.e. bin(0000) < bin(1001)
Reminder: Number Interpretation
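Binary representation:

$$\mathrm{bin}() : \{0,1\}^n \longrightarrow \{0, \dots, 2^n - 1\}$$

$$\mathrm{bin}(x) = \sum_{i=0}^{n-1} x_i \cdot 2^i$$

Two's complement:

$$\mathrm{twoc}() : \{0,1\}^n \longrightarrow \{-2^{n-1}, \dots, 2^{n-1} - 1\}$$

$$\mathrm{twoc}(x) = -2^{n-1} \cdot x_{n-1} + \mathrm{bin}(x_{n-2}, \dots, x_0)$$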
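The two interpretations can be tried out with a short Python sketch. The function names `bin_value` and `twoc_value` and the choice to pass bit vectors as strings, most significant bit first, are assumptions made for this illustration.

```python
def bin_value(bits):
    """bin(): unsigned value of a bit string written MSB first, e.g. "1001"."""
    return sum(int(x) << i for i, x in enumerate(reversed(bits)))

def twoc_value(bits):
    """twoc(): the top bit has weight -2^(n-1), the remaining bits are read with bin()."""
    n = len(bits)
    return -(1 << (n - 1)) * int(bits[0]) + bin_value(bits[1:])

print(bin_value("1001"), twoc_value("1001"))   # 9 -7
```

The same bit pattern 1001 is 9 when read with bin() and −7 when read with twoc(), which is exactly why a comparison instruction has to know which interpretation is meant.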
Comparing Unsigned Integers
Unsigned integers:

$$\mathrm{bin}(a) < \mathrm{bin}(b) \iff \mathrm{bin}(a) - \mathrm{bin}(b) < 0$$

Recall: $-b = (\neg b) + 1$.
We get the "+1" for free by setting the carry-in of the adder.

Let's pretend we compute with one more bit ("zero extension"):

$$
\begin{array}{rcccccl}
  & 0   & a_{n-1}      & \dots & a_1      & a_0      & \\
+ & 1   & \neg b_{n-1} & \dots & \neg b_1 & \neg b_0 & \\
  & c_n & c_{n-1}      & \dots & c_1      & 1        & \text{(carry bits)} \\
= & s_n & s_{n-1}      & \dots & s_1      & s_0      & \text{(sum)}
\end{array}
$$

Thus: $\mathrm{bin}(a) - \mathrm{bin}(b) < 0 \iff s_n \iff \neg c_n$
Comparing Signed Integers
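Two's complement:

$$\mathrm{twoc}(a) < \mathrm{twoc}(b) \iff \mathrm{twoc}(a) - \mathrm{twoc}(b) < 0$$

Again, let's pretend we have an extra bit ("sign extension"):

$$
\begin{array}{rcccccl}
  & a_{n-1}      & a_{n-1}      & \dots & a_1      & a_0      & \\
+ & \neg b_{n-1} & \neg b_{n-1} & \dots & \neg b_1 & \neg b_0 & \\
  & c_n          & c_{n-1}      & \dots & c_1      & 1        & \text{(carry bits)} \\
= & s_n          & s_{n-1}      & \dots & s_1      & s_0      & \text{(sum)}
\end{array}
$$

Thus: $\mathrm{twoc}(a) - \mathrm{twoc}(b) < 0 \iff s_n \iff a_{n-1} \oplus \neg b_{n-1} \oplus c_n \iff s_{n-1} \oplus c_{n-1} \oplus c_n$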
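Both derivations, the zero-extended one for unsigned integers and the sign-extended one for two's complement, can be checked exhaustively for a small word width. The Python sketch below does this for n = 4; the helper names `extended_subtract` and `to_signed` are made up for the illustration.

```python
from itertools import product

def to_signed(x, n):
    """Read the n-bit pattern x as a two's complement number."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x

def extended_subtract(a, b, n, signed):
    """Compute a + (not b) + 1 with one extra bit; return (s_n, c_n, c_{n-1}, s_{n-1})."""
    mask = (1 << n) - 1
    low = (1 << (n - 1)) - 1
    nb = ~b & mask
    total = a + nb + 1
    c_n = (total >> n) & 1                              # carry into the extra bit
    c_n1 = (((a & low) + (nb & low) + 1) >> (n - 1)) & 1  # carry into bit n-1
    s_n1 = (total >> (n - 1)) & 1
    # Extra bit of each operand: 0 and 1 for zero extension,
    # copies of the top bits for sign extension.
    ext_a = (a >> (n - 1)) & 1 if signed else 0
    ext_nb = (nb >> (n - 1)) & 1 if signed else 1
    s_n = ext_a ^ ext_nb ^ c_n
    return s_n, c_n, c_n1, s_n1

n = 4
for a, b in product(range(1 << n), repeat=2):
    s_n, c_n, _, _ = extended_subtract(a, b, n, signed=False)
    assert (a < b) == bool(s_n) == (not c_n)

    s_n, c_n, c_n1, s_n1 = extended_subtract(a, b, n, signed=True)
    assert (to_signed(a, n) < to_signed(b, n)) == bool(s_n) == bool(s_n1 ^ c_n1 ^ c_n)
```

If the loop finishes without an assertion error, all 256 operand pairs agree with the two "Thus:" lines.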
New Flags: CF, SF, OF
We (meaning Intel) introduce three new flags for arithmetic operations:

- CF: the carry flag ($c_n$ in case of addition, $\neg c_n$ in case of subtraction)
- SF: the sign flag ($s_{n-1}$)
- OF: the overflow flag ($c_n \oplus c_{n-1}$)
Examples (Part 1)
     000...000   = 0
  +  000...001   = 1
    0000...000   (carry bits)
  =  000...001   = 1

ZF = 0, CF = 0, SF = 0, OF = 0

     000...001   = 1
  −  000...001   = 1
    1111...111   (carry bits)
  =  000...000   = 0

ZF = 1, CF = 0, SF = 0, OF = 0

     111...111   = −1
  +  000...010   = 2
    1111...110   (carry bits)
  =  000...001   = 1

ZF = 0, CF = 1, SF = 0, OF = 0
Examples (Part 2)
     011...111   = $2^{n-1} - 1$
  +  000...001   = 1
    0111...110   (carry bits)
  =  100...000   = $2^{n-1}$

ZF = 0, CF = 0, SF = 1, OF = 1

     100...000   = $-2^{n-1}$
  −  000...001   = 1
    1000...001   (carry bits)
  =  011...111   = $2^{n-1} - 1$

ZF = 0, CF = 0, SF = 0, OF = 1
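The five examples can be reproduced with a small adder model. The sketch below, in Python, assumes a word width of n = 8 and a hypothetical function `alu`; it derives the flags from the carries $c_n$ and $c_{n-1}$ exactly as defined on the flags slide.

```python
def alu(a, b, n=8, subtract=False):
    """Add or subtract two n-bit values; return (result, flags)."""
    mask = (1 << n) - 1
    low = (1 << (n - 1)) - 1                # mask for bits n-2 .. 0
    a &= mask
    operand = (~b & mask) if subtract else (b & mask)
    carry_in = 1 if subtract else 0         # the "+1" of -b = (not b) + 1
    total = a + operand + carry_in
    result = total & mask
    c_n = (total >> n) & 1                  # carry out of bit n-1
    c_n1 = (((a & low) + (operand & low) + carry_in) >> (n - 1)) & 1  # carry into bit n-1
    flags = {
        "ZF": int(result == 0),
        "CF": (c_n ^ 1) if subtract else c_n,
        "SF": (result >> (n - 1)) & 1,
        "OF": c_n ^ c_n1,
    }
    return result, flags

print(alu(0x7F, 0x01))                 # (128, {'ZF': 0, 'CF': 0, 'SF': 1, 'OF': 1})
print(alu(0x80, 0x01, subtract=True))  # (127, {'ZF': 0, 'CF': 0, 'SF': 0, 'OF': 1})
```

The two calls correspond to the overflow examples: the largest positive number plus one, and the smallest negative number minus one.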
Branching Instructions for Comparisons
Instruction   Flags
jz, je        ZF
jnz, jne      ¬ZF
jnae, jb      CF
jae, jnb      ¬CF
jna, jbe      CF ∨ ZF
ja, jnbe      ¬(CF ∨ ZF)
jnge, jl      SF ⊕ OF
jge, jnl      ¬(SF ⊕ OF)
jng, jle      (SF ⊕ OF) ∨ ZF
jg, jnle      ¬((SF ⊕ OF) ∨ ZF)
jmp near      unconditional

n = not, z = zero, e = equal, g = greater, l = less, a = above, b = below

i.e. jnbe = "jump if not (below or equal)"
Branching Instructions for Comparisons
    sub ax, bx
    Jxxx target
    ...
target:

branch if   with sign   without sign
ax = bx     je          je
ax ≠ bx     jne         jne
ax > bx     jg          ja
ax ≥ bx     jge         jae
ax < bx     jl          jb
ax ≤ bx     jle         jbe
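As a cross-check of the two tables, the Python sketch below evaluates the flag conditions after a subtraction and compares the outcome with the ordinary signed and unsigned comparisons. It is an illustration only: `sub_flags` uses the usual shortcuts (CF as unsigned borrow, OF from the operand and result signs) instead of modelling the adder, and all names are assumptions.

```python
import operator

def to_signed(x, n):
    return x - (1 << n) if (x >> (n - 1)) & 1 else x

def sub_flags(a, b, n=8):
    """Flags after 'sub a, b' on n-bit patterns."""
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    result = (a - b) & mask
    sa, sb, sr = a >> (n - 1), b >> (n - 1), result >> (n - 1)
    return {"ZF": result == 0, "CF": a < b, "SF": bool(sr),
            "OF": sa != sb and sr != sa}

CONDITIONS = {
    "je":  lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jb":  lambda f: f["CF"],
    "jae": lambda f: not f["CF"],
    "jbe": lambda f: f["CF"] or f["ZF"],
    "ja":  lambda f: not (f["CF"] or f["ZF"]),
    "jl":  lambda f: f["SF"] != f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jle": lambda f: f["SF"] != f["OF"] or f["ZF"],
    "jg":  lambda f: not (f["SF"] != f["OF"] or f["ZF"]),
}

SIGNED = {"je": operator.eq, "jne": operator.ne, "jg": operator.gt,
          "jge": operator.ge, "jl": operator.lt, "jle": operator.le}
UNSIGNED = {"je": operator.eq, "jne": operator.ne, "ja": operator.gt,
            "jae": operator.ge, "jb": operator.lt, "jbe": operator.le}

def branch_taken(mnemonic, a, b, n=8):
    return bool(CONDITIONS[mnemonic](sub_flags(a, b, n)))

def agrees_with_table(n=4):
    """Check every pair of n-bit operands against both columns of the table."""
    for a in range(1 << n):
        for b in range(1 << n):
            for m, rel in SIGNED.items():
                if branch_taken(m, a, b, n) != rel(to_signed(a, n), to_signed(b, n)):
                    return False
            for m, rel in UNSIGNED.items():
                if branch_taken(m, a, b, n) != rel(a, b):
                    return False
    return True

print(agrees_with_table())   # True
```

For example, with the 8-bit pattern 0xF9 (−7 signed, 249 unsigned) compared against 0, `jl` is taken but `jb` is not.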
Example Branching Instructions

start   sub esi, esi                  ; array index
        mov edx, [BYTE Intmax+esi]    ; Minimum
        mov ecx, [BYTE Top+esi]       ; top index
        sub ebx, ebx                  ; counter

L       mov eax, ebx
        sub eax, ecx
        jae end                       ; counter ≥ Top?

        mov esi, ebx
        mov edi, [BYTE Array+esi]     ; edi := array[ebx]

        mov eax, edi
        sub eax, edx
        jge skip                      ; array[ebx] ≥ Minimum?

        mov edx, edi                  ; Minimum := array[ebx]

skip    sub esi, esi
        mov eax, [BYTE Four+esi]
        add ebx, eax                  ; counter += 4

        jmp near L

end     hlt
Example Branching Instructions (Part 2)
Four    dd 4
Top     dd 40
Array   dd 1, 2, 3, 4, 5, 6, -7, 8, 9, 10
Intmax  dd 0x7fffffff
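Read as a high-level program, the assembler code and its data compute the minimum of the array. A rough Python equivalent is sketched below; the variable names mirror the registers and labels of the listing, and the mapping is an interpretation rather than part of the original slides.

```python
FOUR = 4
TOP = 40                      # size of the array in bytes (10 entries of 4 bytes)
ARRAY = [1, 2, 3, 4, 5, 6, -7, 8, 9, 10]
INTMAX = 0x7FFFFFFF

def find_minimum(array, top=TOP):
    minimum = INTMAX          # edx
    counter = 0               # ebx, a byte offset into the array
    while counter < top:      # 'jae end' leaves the loop once counter >= Top
        value = array[counter // FOUR]   # edi := array[ebx]
        if value < minimum:   # 'jge skip' skips the update otherwise (signed compare)
            minimum = value
        counter += FOUR
    return minimum

print(find_minimum(ARRAY))    # -7
```

Note that the loop bound uses the unsigned branch jae (the counter is an offset), while the data comparison uses the signed branch jge, because the array contains a negative number.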
History of ARM
- 1980s: Acorn Computers
- 1982: BBC Micro (8 bit)
- 1986: ARM development kit
- 1990: ARM, "Advanced RISC Machines", founded; owners: Acorn Computers, Apple and VLSI Technology
ARM Today
- Now primarily licensed as IP, with a focus on low-end embedded systems and phones (>95 % market share)
- Built by Apple, Nvidia, Qualcomm, Samsung, TI
- 2013: 37 billion ARM processors produced
- Early 64-bit prototypes for application in low-power servers
Visible Data
- RAM, organised in 32-bit words
- Registers
  - R0 to R15
  - R15 is a special case: this is the PC
  - R13 is the stack pointer (SP)
  - R14 is used for the return address for function calls (LR)
  - CPSR for various flags
  - (There is another register file for floating-point numbers)
Basic Instructions
ADD   Rd, Rn, Rm          Rd ← Rn + Rm
SUB   Rd, Rn, Rm          Rd ← Rn − Rm
MUL   Rd, Rm, Rs          Rd ← (Rm · Rs)[31:0]
SMULL RdL, RdH, Rm, Rs    RdH, RdL ← Rm · Rs   (signed)
UMULL RdL, RdH, Rm, Rs    RdH, RdL ← Rm · Rs   (unsigned)
SDIV  Rd, Rm, Rs          Rd ← Rm / Rs         (signed)
UDIV  Rd, Rm, Rs          Rd ← Rm / Rs         (unsigned)
AND   Rd, Rn, Rm          Rd ← Rn & Rm
B     label               PC ← label
BL    label               LR ← PC+4; PC ← label
BX    Rm                  PC ← Rm

Many variants!
Setting Condition Flags
- Most instructions can be given a suffix S.
- In addition to the usual behaviour, the condition flags (in CPSR) are updated.

Bit    31  30  29  28
Flag   N   Z   C   V

N = negative, Z = zero, C = carry, V = overflow
Using Condition Flags
Most instructions can be given condition suffixes:
EQ      equal              NE      not equal
CS/HS   carry set          CC/LO   carry clear
MI      negative           PL      positive (or zero)
VS      overflow           VC      no overflow
HI      higher             LS      lower or same
GE      greater or equal   LT      less than
GT      greater than       LE      less than or equal

These use 4 bits in the instruction word.
ARM Instruction Formats
ARM uses a fixed-size instruction word:
Data processing:

Bits    31-28   27-21    20   19-16   15-12   11-0
Field   Cond    Opcode   S    Rn      Rd      Rm

Branch and branch&link:

Bits    31-28   27-25   24   23-0
Field   Cond    1 0 1   L    offset
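Because the instruction word has a fixed size, the fields can be extracted with shifts and masks. The Python sketch below decodes the branch format shown above; the function name `decode_branch` is hypothetical, and reading the 24-bit offset as a signed number is an assumption that goes beyond what the slide states.

```python
def decode_branch(word):
    """Split a 32-bit branch / branch&link instruction word into its fields."""
    cond = (word >> 28) & 0xF                 # bits 31-28: condition
    if ((word >> 25) & 0b111) != 0b101:       # bits 27-25 must be 1 0 1
        raise ValueError("not a branch / branch&link word")
    link = (word >> 24) & 1                   # bit 24: L, set for BL
    offset = word & 0xFFFFFF                  # bits 23-0
    if offset & 0x800000:                     # sign-extend (assumption of this sketch)
        offset -= 1 << 24
    return {"cond": cond, "L": link, "offset": offset}

print(decode_branch(0xEAFFFFFE))   # {'cond': 14, 'L': 0, 'offset': -2}
```

The condition value 14 (binary 1110) is the "always" condition, so the example word is an unconditional branch without link.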
ARM Instruction Formats
- There is a compressed version called the "Thumb-2 Instruction Set"
- The instructions have 16 bits
- Fewer options; conditions are a separate instruction
- Aimed at better I-cache efficiency
Sequential Processors with Pipeline
- We will start with an implementation that
  - has the form and shape of a pipeline, but
  - processes one instruction at a time
  - processes the instructions in a fixed order of phases
- These aren't built, but only exist for illustrative purposes.
✓ But: the step to a proper pipeline is minimal (will show!)
The 5 Instruction Phases (Stages)
1. Instruction Fetch (IF)
   The instruction is copied from the RAM into a register (IR)
2. Instruction Decode (ID)
   Loads the values of the operands from the register file into registers A and B; also increments the program counter
3. Execute (EX)
   Performs any ALU operation (say add/sub) and the address arithmetic for load/store
4. Memory (M)
   RAM access for load/store
5. Write-Back (WB)
   Stores any result in the register file
An Implementation: High-level View
[Figure: datapath with the five stages IF, ID, EX, M, WB; registers IP, IR, A, B, C, Flg, MAR, MDRr, MDRw; the register file (eax, ..., esi, edi) with its address inputs; the ALU; the nextIP logic; and the system bus]
Sequential Execution
- We first implement a sequential machine: the stages are processed one after the other in the order IF, ID, EX, M, WB
- We execute exactly one instruction at a time
- In contrast to multi-cycle designs: we stick to this even if an instruction doesn't actually use a particular stage
Sequential Execution
Let I1, I2, . . . be the sequence of instructions in program order.
time   0    1    2    3    4    5    6    7    8
IF     I1                       I2
ID          I1                       I2
EX               I1                       I2
MEM                   I1                       I2
WB                         I1
Example: Processing add

program:
    add edx, ebx
    mov [100+esi], edx

[Figure sequence: the sequential datapath (stages IF, ID, EX, M, WB; registers IP, IR, A, B, C, Flg, MAR, MDRr, MDRw; register file eax, ..., esi, edi) shown cycle by cycle while "add edx, ebx" is processed. Values visible in the frames:]

cycle 0 (IF):   0
cycle 1 (ID):   add; 0; 2; 2, 3; 29, 6
cycle 2 (EX):   add; 2; 35, 0; 29, 6
cycle 3 (M):    add; 2; 35
cycle 4 (WB):   add; 2; 35; 2
Example: Processing RMmov

program:
    add edx, ebx
    mov [100+esi], edx

[Figure sequence: the same datapath shown cycle by cycle while "mov [100+esi], edx" is processed. Values visible in the frames:]

cycle 5 (IF):   2
cycle 6 (ID):   RMmov; 2; 50, 35; 6, 2
cycle 7 (EX):   RMmov; 5; 35; 100; 0, 35
cycle 8 (M):    RMmov; 5; 35; 100
cycle 9 (WB):   RMmov; 5
Example: Processing jnz

program:
    jnz l

the distance is 10

[Figure sequence: the same datapath shown cycle by cycle while "jnz l" is processed. Values visible in the frames:]

cycle 0 (IF):   0; 0
cycle 1 (ID):   jnz; 0; 0; 12
cycle 2 (EX):   jnz; 12
cycle 3 (M):    jnz; 12
cycle 4 (WB):   jnz; 12
Pipelining
- Increases the performance using the assembly-line idea

$$\text{performance} = \underbrace{\text{instructions per cycle}}_{\mathit{IPC}} \cdot \underbrace{\text{clock frequency}}_{1/\tau}$$

- Standard technique in virtually all modern circuitry (not just CPUs, but also GPUs, video, networking, wireless, ...)
Pipelining
time   0    1    2    3    4    5
IF     I1   I2   I3   I4   I5   I6
ID          I1   I2   I3   I4   I5
EX               I1   I2   I3   I4
MEM                   I1   I2   I3
WB                         I1   I2
Best case: one instruction per cycle!
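The difference between the sequential and the pipelined timing diagram comes down to when each instruction may start. The Python sketch below generates both diagrams for an ideal pipeline without hazards; the function names `schedule` and `total_cycles` are made up for the illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(num_instructions, pipelined):
    """Map (stage, time) to the instruction occupying that stage."""
    table = {}
    for i in range(num_instructions):
        # Pipelined: a new instruction starts every cycle.
        # Sequential: it starts only after the previous one has left WB.
        start = i if pipelined else i * len(STAGES)
        for s, stage in enumerate(STAGES):
            table[(stage, start + s)] = f"I{i + 1}"
    return table

def total_cycles(num_instructions, pipelined):
    """Cycles needed for num_instructions >= 1 instructions."""
    n = len(STAGES)
    return n + num_instructions - 1 if pipelined else n * num_instructions

print(total_cycles(100, pipelined=False), total_cycles(100, pipelined=True))   # 500 104
```

For long instruction sequences the pipelined machine approaches one completed instruction per cycle, as the "best case" above states.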
Pipelining Performance
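Performance:

$$\mathit{IPC} \cdot \frac{1}{\tau}$$

$$\mathit{IPC} \approx 1$$

$$\tau \approx D_{FF} + \frac{D}{n}$$

where:
  IPC:    instructions per cycle
  τ:      cycle time
  n:      number of stages
  D:      combinational delay without the flip-flops
  D_FF:   delay of the flip-flops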
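A quick numeric experiment shows why the speed-up from pipelining stays below the number of stages. The Python sketch below evaluates the cycle-time model above; the delay values (D = 10 ns, D_FF = 0.5 ns) are invented for the illustration and do not come from the slides.

```python
def cycle_time(d_comb, d_ff, n):
    """tau = D_FF + D / n, all delays in the same unit."""
    return d_ff + d_comb / n

def performance(d_comb, d_ff, n, ipc=1.0):
    """Instructions per time unit: IPC * 1 / tau."""
    return ipc / cycle_time(d_comb, d_ff, n)

D, D_FF = 10.0, 0.5           # hypothetical delays in ns
for n in (1, 5, 10, 20):
    speedup = performance(D, D_FF, n) / performance(D, D_FF, 1)
    print(n, cycle_time(D, D_FF, n), round(speedup, 2))
```

With these numbers, five stages give a speed-up of 4.2 rather than 5, and adding more stages helps less and less, because the flip-flop delay is paid in every stage.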
Implementing the Pipeline: Roadmap
1. Resolving resource conflicts
2. Modifying the control
3. Dealing with data and control hazards
Resource Conflicts
Let’s look at our sequential machine again:
[Figure: the sequential datapath with stages IF, ID, EX, M, WB and registers IP, IR, A, B, C, Flg, MAR, MDRr, MDRw]

Consider the C register of an ALU instruction followed by another ALU instruction!

And what about IR once the 2nd instruction is fetched?
Register Lifetime
[Figure: the sequential datapath with stages IF, ID, EX, M, WB and registers IP, IR, A, B, C, Flg, MAR, MDRr, MDRw]

Register   IF   ID   EX   M    WB
IR         W    R    R    R    R
A, B            W    R
IP         R    W
MAR                  W    R
MDRw                 W    R
C                    W    R    R
Flags           R    W
MDRr                      W    R
eax ...         R              W

(W = written, R = read)

✗ Problem: IR and C need to be remembered for multiple stages!
Register Lifetime
[Figure: the datapath with replicated registers: IR1, IR2, IR3, IR4 hold the instruction in the successive stages, and C3, C4 hold the ALU result in the successive stages; the other registers are IP, A, B, MAR, Flg, MDRw, MDRr]

✓ We resolve by replication!
Resource Conflicts
Q: Which other resources are shared by stages?
A: The system bus (shared by IF and MEM)!

Q: What do we do?
A: Most CPUs have an L1 cache that permits two (read) accesses simultaneously.
   (Really two L1 caches: an I-cache and a D-cache)
Example Pipeline

program (modified):
    add edx, ebx
    mov [100+esi], ecx

[Figure sequence: the pipelined datapath (stages IF, ID, EX, M, WB; registers IP, IR1 to IR4, A, B, MAR, C3, C4, Flg, MDRw, MDRr; register file eax, ..., esi, edi) shown cycle by cycle while both instructions move through the stages. Values visible in the frames:]

cycle 0:   0
cycle 1:   add; 0; 2; 2, 3; 29, 6
cycle 2:   RMmov; 2; 50, 20; 6, 1   |   add; 2; 35, 0; 29, 6
cycle 3:   RMmov; 5; 20; 100; 0, 20   |   add; 35
cycle 4:   RMmov; 20; 100   |   add; 35; 2
cycle 5:   RMmov
Data and Control Dependencies
Example program with data dependency:
    add edx, ebx
    mov [100+esi], edx

Execution in the pipeline:
Like that?

time   3
IF     ...
ID     ...
EX     mov [100+esi], edx
MEM    add edx, ebx
WB
Data and Control Dependencies

Example program with data dependency:

    add edx, ebx
    mov [100+esi], edx

Execution in the pipeline:

time   3
IF     ...
ID     mov [100+esi], edx
EX     BUBBLE
MEM    add edx, ebx
WB

DATA DEPENDENCY!
✗ We would now read the wrong (old) value of edx!
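One way to make the dependency concrete is to compute how long the second instruction has to wait. The Python sketch below is a simplified model under explicit assumptions that are not taken from the slides: there is no forwarding, a register written in WB can be read in ID only in a later cycle, and stalls are modelled by delaying the start of the dependent instruction. The function name `issue_cycles` and the instruction encoding as (destination, sources) pairs are hypothetical.

```python
ID_OFFSET, WB_OFFSET = 1, 4    # stage positions relative to the IF cycle

def issue_cycles(program):
    """program: list of (dest, sources); returns the IF cycle of each instruction."""
    write_back = {}            # register -> cycle in which its latest writer is in WB
    cycles, earliest = [], 0
    for dest, sources in program:
        start = earliest
        for reg in sources:
            if reg in write_back:
                # ID (at start + 1) must come strictly after the writer's WB cycle.
                start = max(start, write_back[reg] - ID_OFFSET + 1)
        cycles.append(start)
        if dest is not None:
            write_back[dest] = start + WB_OFFSET
        earliest = start + 1
    return cycles

dependent = [("edx", ("edx", "ebx")),      # add edx, ebx
             (None, ("esi", "edx"))]       # mov [100+esi], edx
independent = [("edx", ("edx", "ebx")),    # add edx, ebx
               (None, ("esi", "ecx"))]     # mov [100+esi], ecx

print(issue_cycles(dependent), issue_cycles(independent))   # [0, 4] [0, 1]
```

Under these assumptions the dependent mov starts three cycles later than it would in the hazard-free program, i.e. three bubbles enter the pipeline.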
Memory
- ROM: read-only memory
- RAM: random-access memory (but usually means random-access read and write memory)
- SRAM: static RAM; stores its state as long as power is supplied
- DRAM: dynamic RAM; implemented using capacitors; the state is lost without periodic refresh
RAM in PCs
[Figure: photos of memory modules: 30 pin SIMM, 72 pin SIMM, MicroDIMM, 184 pin RAMBus RIMM, 100 pin DIMM, 72 pin SODIMM, 144 pin SDRAM SODIMM, 200 pin DDR SODIMM, 200 pin DDR-2 SODIMM, 168 pin SDRAM DIMM, 184 pin DDR DIMM, 240 pin DDR-2 DIMM]
Addresses
[Figure: a RAM block with the inputs address and WE and the bidirectional pins DATA]

- RAM/ROM chips store many (billions of) bits
- The bits are distinguished using an address
- The address is given in binary
- Plus WE: read/write
- The data pins are used for reading as well as writing
Structure
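[Figure: a 4 x 4 cell matrix; the address is split into two 2-bit halves, each feeding a 2-to-4 decoder that selects a row or a column]

- RAM/ROM chips are a 2D matrix
- The address is split into a row and a column
- The binary encoding is turned into unary using a decoder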
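The row/column selection can be sketched in a few lines of Python. The sketch assumes a 4 x 4 matrix with the upper address bits selecting the row and the lower bits selecting the column; that assignment and the function names are assumptions made for the illustration.

```python
def decode(value, bits):
    """Binary to unary: exactly one of the 2^bits output lines is 1."""
    return [1 if i == value else 0 for i in range(1 << bits)]

def select_cell(address, row_bits=2, col_bits=2):
    """Split the address and return the (row lines, column lines) selecting one cell."""
    if not 0 <= address < 1 << (row_bits + col_bits):
        raise ValueError("address out of range")
    row = address >> col_bits
    col = address & ((1 << col_bits) - 1)
    return decode(row, row_bits), decode(col, col_bits)

print(select_cell(0b1001))   # ([0, 0, 1, 0], [0, 1, 0, 0])
```

Two small decoders with 4 outputs each replace one large decoder with 16 outputs, which is the reason for the 2D arrangement.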
SRAM Cell with Two Inverters
[Figure: two cross-coupled inverters connected to the lines Data and ¬Data through switches controlled by the Address Line]

- Reading and writing
- The address line selects the cell
- The state is held using the inverters (latch)
- Read by comparing Data and ¬Data
SRAM Cell in CMOS
[Figure: CMOS circuit of the SRAM cell, with the connections VDD, GND, Address Line, Data and ¬Data]
DRAM
- DRAM uses capacitors
  - simpler and easier to build than SRAM
  ✓ high density, low cost
  - But: slower!

→ fast but expensive SRAM for caches (more on that later)
→ slow but inexpensive DRAM for the main memory
Reminder: Capacitors
[Figure: charge (in %) of a capacitor over time, rising during the charging phase and decaying during the discharging phase]

Capacitors store an electric charge, but only for a limited time
DRAM Cell
[Figure: DRAM cell consisting of a capacitor connected to GND and, through a switch controlled by the Address Line, to Data]

A bit is stored as the charge of a capacitor and has to be refreshed periodically
Data Buses
Connecting multiple memory chips:

[Figure: a CPU connected to three memory modules]

✗ No! I/O pins are expensive!
Data Buses
- Goal: effective use of the pricey wires
- Idea: share the wires for data and addresses among the RAM modules

[Figure: a CPU and three memory modules attached to shared control, data and address lines]
Interface RAM Chips
- Control signals:
  - CS (Chip Select): activates a particular chip
  - WE (Write Enable)
  - OE (Output Enable)
- Inactive chips have high-impedance outputs (Z)
- Write by setting WE, read by setting OE
- Interface constraint: OE and WE are never both active
Write Cycle
[Timing diagram: signals CS, WE, OE, Address and Data during a write cycle, with the interval in which the data is valid marked]
Read Cycle
[Timing diagram: signals CS, WE, OE, Address and Data during a read cycle, with the interval in which the data is valid marked]
Row and Column Address Strobes
- Idea: save even more wires by sending the address in two (or more) steps
- Typical: row and column are sent separately
- RAS: Row Address Strobe, CAS: Column Address Strobe
RAS/CAS Write Cycle
[Timing diagram: signals RAS, CAS, WE, Address (first Row, then Col) and Data (valid) during a RAS/CAS write cycle]
RAS/CAS Read Cycle
[Timing diagram: signals RAS, CAS, WE, Address (first Row, then Col) and Data (valid) during a RAS/CAS read cycle]
Bus-Bursts
✗ RAM has long latency
- RAM is often accessed sequentially
- Caches are therefore arranged in lines: a sequence of consecutive addresses (e.g. 256 bytes)
- Bus-bursts: efficient transmission of an entire cache line
Bus-Bursts
[Timing diagram: signals CLK, RAS, CAS, OE, Address (Row, then Col) and DATA; after the CAS Latency (CL), the words D0, D1, D2, D3 are transferred in consecutive clock cycles]
Double Data Rate (DDR) RAM
[Timing diagram: signals CLK, RAS, CAS, OE, Address (Row, then Col) and DATA; after the CAS Latency (CL), the words D0, D1, D2, D3 are transferred]
Timings
6GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 3) DR, x8 w/Therm Sen KVR1066D3D8R7SK3/6G Get Price
6GB 1333MHz DDR3 ECC Reg w/Par CL9 DIMM (Kit of 3) SR, x4 w/Therm Sen KVR1333D3S4R9SK3/6G Get Price
6GB 1333MHz DDR3 ECC Reg w/Par CL9 DIMM (Kit of 3) DR, x8 w/Therm Sen KVR1333D3D8R9SK3/6G Get Price
8GB 1066MHz DDR3 Non-ECC CL7 DIMM (Kit of 2) KVR1066D3N7K2/8G Get Price
8GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 2) DR, x4 w/Therm Sen KVR1066D3D4R7SK2/8G Get Price
8GB 1066MHz DDR3 ECC Reg w/Parity CL7 DIMM Quad Rank, x4 w/Therm Sen KVR1066D3Q4R7S/8G Get Price
8GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 2) QR, x8 w/Therm Sen KVR1066D3Q8R7SK2/8G Get Price
8GB 1333MHz DDR3 ECC Reg w/Par CL9 DIMM (Kit of 2) DR, x4 w/Therm Sen KVR1333D3D4R9SK2/8G Get Price
12GB 1066MHz DDR3 Non-ECC CL7 DIMM (Kit of 3) KVR1066D3N7K3/12G Get Price
12GB 1066MHz DDR3 ECC CL7 DIMM (Kit of 3) with Thermal Sensor KVR1066D3E7SK3/12G Get Price
12GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 3) DR, x4 w/Therm Sen KVR1066D3D4R7SK3/12G Get Price
12GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 3) QR, x8 w/Therm Sen KVR1066D3Q8R7SK3/12G Get Price
12GB 1333MHz DDR3 ECC Reg w/Par CL9 DIMM (Kit of 3) DR, x4 w/Therm Sen KVR1333D3D4R9SK3/12G Get Price
16GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 2) QR, x4 w/Therm Sen KVR1066D3Q4R7SK2/16G Get Price
24GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 3) QR, x4 w/Therm Sen KVR1066D3Q4R7SK3/24G Get Price
HyperX DDR 333MHz and 400MHz
Description Part Number Price
512MB 333MHz DDR Non-ECC CL2 (2-2-2-5-1) DIMM KHX2700/512 Get Price
512MB 400MHz DDR Non-ECC CL2 (2-3-2-6-1) DIMM KHX3200A/512 Get Price
1GB 333MHz DDR Non-ECC CL2 (2-2-2-5-1) DIMM (Kit of 2) KHX2700K2/1G Get Price
1GB 400MHz DDR Non-ECC CL2.5 (2.5-3-3-7-1) DIMM KHX3200/1G Get Price
1GB 400MHz DDR Non-ECC CL2 (2-3-2-6-1) DIMM KHX3200A/1G Get Price
1GB 400MHz DDR Non-ECC CL2 (2-3-2-6-1) DIMM (Kit of 2) KHX3200AK2/1G Get Price
2GB 400MHz DDR Non-ECC CL2.5 (2.5-3-3-7-1) DIMM (Kit of 2) KHX3200K2/2G Get Price
2GB 400MHz DDR Non-ECC CL2 (2-3-2-6-1) DIMM (Kit of 2) KHX3200AK2/2G Get Price
HyperX DDR2 800MHz, 900MHz, 1000MHz, 1066MHz and 1150MHz
Description Part Number Price
512MB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM KHX6400D2LL/512 Get Price
512MB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM KHX8500D2/512 Get Price
1GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM KHX6400D2/1G Get Price
1GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM KHX6400D2LL/1G Get Price
1GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM (Kit of 2) KHX6400D2LLK2/1G Get Price
1GB 800MHz DDR2 Non-ECC Low Lat CL4 (4-4-4-12) DIMM (NVIDIA SLI-Ready) KHX6400D2LLK2/1GN Get Price
1GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM KHX8500D2/1G Get Price
1GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) KHX8500D2K2/1G Get Price
1GB 1066MHz DDR2 CL5 (5-5-5-15) DIMM (Kit of 2) (NVIDIA SLI-Ready) KHX8500D2K2/1GN Get Price
1GB 1150MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM KHX9200D2/1G Get Price
1GB 1200MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM KHX9600D2/1G Get Price
2GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM KHX6400D2/2G Get Price
2GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM KHX6400D2LL/2G Get Price
2GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) KHX6400D2K2/2G Get Price
2GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) Tall HS KHX6400D2T1K2/2G Get Price
2GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM (Kit of 2) KHX6400D2LLK2/2G Get Price
2GB 800MHz DDR2 Non-ECC Low-Lat CL4 (4-4-4-12) DIMM (NVIDIA SLI-Ready) KHX6400D2LLK2/2GN Get Price
2GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM KHX8500D2/2G Get Price
2GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) KHX8500D2K2/2G Get Price
2GB 1066MHz DDR2 CL5 (5-5-5-15) DIMM (Kit of 2) (NVIDIA SLI-Ready) KHX8500D2K2/2GN Get Price
2GB 1150MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) KHX9200D2K2/2G Get Price
2GB 1200MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) KHX9600D2K2/2G Get Price
2GB 800MHz DDR2 ECC Low-Latency CL4 (4-4-4-12) FBDIMM (Kit of 2) KHX6400F2LLK2/2G Get Price
4GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) KHX6400D2K2/4G Get Price
4GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM (Kit of 2) KHX6400D2LLK2/4G Get Price
4GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) KHX8500D2K2/4G Get Price
4GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) Tall HS KHX8500D2T1K2/4G Get Price
4GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 4) KHX8500D2K4/4G Get Price
8GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM (Kit of 4) KHX6400D2LLK4/8G Get Price
HyperX DDR3 1375MHz, 1600MHz, 1625MHz, 1800MHz, 1866MHz and 2000MHz
Description Part Number
1GB 1375MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM KHX11000D3LL/1G
1GB 1600MHz DDR3 Non-ECC CL9 (9-9-9-27) DIMM KHX12800D3/1G
1GB 1625MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM KHX13000D3LL/1G
1GB 1625MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM KHX13000AD3LL/1G
1GB 1800MHz DDR3 Non-ECC CL8 (8-8-8-24) DIMM KHX14400D3/1G
1GB 1800MHz DDR3 Non-ECC CL8 (8-8-8-24) DIMM KHX14400AD3/1G
2GB 1375MHz DDR3 Non-ECC CL9 (9-9-9) DIMM (Kit of 2) KHX11000D3K2/2G
2GB 1375MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM KHX11000D3LL/2G
2GB 1375MHz DDR3 Non-ECC CL7 (7-7-7-20) DIMM (Kit of 2) KHX11000D3LLK2/2G
2GB 1375MHz DDR3 Non-ECC CL7 (7-7-7-20) DIMM (Kit of 2) Intel XMP KHX11000D3LLK2/2GX
2GB 1600MHz DDR3 Non-ECC CL9 (9-9-9-27) DIMM KHX12800D3/2G
2GB 1600MHz DDR3 Non-ECC CL9 (9-9-9-27) DIMM (Kit of 2) KHX12800D3K2/2G
2GB 1625MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM KHX13000D3LL/2G
2GB 1625MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM (Kit of 2) KHX13000D3LLK2/2G
2GB 1625MHz DDR3 Low Latency CL8 (8-7-7-20) DIMM (Kit of 2) NVIDIA SLI KHX13000D3LLK2/2GN
Timings
Example: 2-2-2-5
Current standard:
1. CAS Latency
2. RAS-to-CAS Delay
3. RAS Precharge
4. Act-to-Precharge Delay
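Each number in a timing string such as the 4-4-4-12 of the DDR2 modules listed earlier is a count of memory clock cycles. The following is a minimal sketch of how such a string could be read and converted to nanoseconds. It assumes the usual DDR convention that the memory clock runs at half the quoted data rate (so an "800MHz" module is clocked at 400 MHz); the names `Timings`, `parse_timings` and `cycles_to_ns` are illustrative and not part of the course material.

```python
from typing import NamedTuple


class Timings(NamedTuple):
    cas_latency: int        # 1. CAS Latency
    ras_to_cas: int         # 2. RAS-to-CAS Delay
    ras_precharge: int      # 3. RAS Precharge
    act_to_precharge: int   # 4. Act-to-Precharge Delay


def parse_timings(text: str) -> Timings:
    """Parse a string such as '4-4-4-12' into its four cycle counts."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {text!r}")
    return Timings(*parts)


def cycles_to_ns(cycles: int, data_rate_mhz: float) -> float:
    """Convert clock cycles to nanoseconds.

    Assumption: double data rate, i.e. clock frequency = data rate / 2.
    """
    clock_mhz = data_rate_mhz / 2
    return cycles * 1000.0 / clock_mhz


t = parse_timings("4-4-4-12")
print(cycles_to_ns(t.cas_latency, 800))   # CAS latency of a CL4 module at 800 MHz
print(cycles_to_ns(5, 1066))              # CAS latency of a CL5 module at 1066 MHz
```

Under that assumption, CL4 at 800 MHz and CL5 at 1066 MHz come out at about 10 ns and a little under 10 ns respectively: a higher cycle count at a higher clock does not necessarily mean a slower module.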
Caches
I Recall: DRAM slow/cheap, SRAM fast/pricey
I Idea: use SRAM as fast cache for lots of DRAM
I “Hides” the latency of the slow DRAM
I Usually good hit rates >90 %
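How much latency a cache hides can be estimated with the standard average-access-time formula: hit time plus miss rate times miss penalty. A small sketch; the 1 ns hit time and 60 ns miss penalty are made-up illustrative values, not measurements of any particular system.

```python
def average_access_time(hit_time_ns: float,
                        miss_penalty_ns: float,
                        hit_rate: float) -> float:
    """Expected time per access: every access pays the hit time,
    and the fraction that misses additionally pays the miss penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns


# Illustrative numbers only: SRAM hit 1 ns, DRAM miss penalty 60 ns.
for rate in (0.0, 0.90, 0.99):
    print(rate, average_access_time(1.0, 60.0, rate))
```

With these numbers a 90 % hit rate already brings the average access time down from 61 ns to about 7 ns.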
Caches: Overview
[Figure: main memory (addresses 0, 1, 2, ...) next to a cache with cache lines 0 to 3; the index selects the cache line, each line stores a tag, and the offset (0 to 3) selects a word within the line.]
Caches: Hashing
Q: How to map the addresses?
Easiest answer: use least-significant bits
address = | tag | index | offset |
I tag: distinguishes lines with same index
I index: address in cache
I offset: distinguishes words in cache line
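The split into tag, index and offset can be sketched with shifts and masks. This assumes word addresses, a power-of-two number of cache lines and a power-of-two number of words per line; the function name and the example sizes (4 lines of 4 words, i.e. 2 index bits and 2 offset bits) are chosen for illustration.

```python
from typing import Tuple


def split_address(addr: int, index_bits: int, offset_bits: int) -> Tuple[int, int, int]:
    """Split an address into (tag, index, offset).

    The offset is taken from the least-significant bits, the index from
    the bits above it, and the tag is whatever remains.
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Address 27 is 0b1_10_11: tag 1, index 2 (0b10), offset 3 (0b11).
print(split_address(27, index_bits=2, offset_bits=2))
```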
Collisions
[Figure: main memory addresses 0 to 24 mapped onto a cache with cache lines 0 to 3; addresses that share an index but differ in tag compete for the same cache line.]
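Which addresses collide follows directly from the mapping: two addresses compete for the same cache line when they share an index but differ in tag. A minimal sketch, assuming a direct-mapped cache with four lines of one word each (sizes chosen for illustration, not read off the slide); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_index(addresses: Iterable[int], num_lines: int) -> Dict[int, List[int]]:
    """Group addresses by the cache line they map to.

    With num_lines a power of two, addr % num_lines is exactly the
    least-significant index bits of the address.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for addr in addresses:
        groups[addr % num_lines].append(addr)
    return dict(groups)


def collide(a: int, b: int, num_lines: int) -> bool:
    """True if two different addresses map to the same cache line."""
    return a != b and a % num_lines == b % num_lines


for index, addrs in sorted(group_by_index(range(12), 4).items()):
    print(f"cache line {index}: addresses {addrs}")
```

All addresses listed for one cache line are mutual collisions: loading any one of them evicts whichever of the others is currently cached.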
Overview of Design Options for Caches
I size
I line size – number of bytes stored together
I allocation policy – when is a new entry created?
I associativity – length of list in hash table
I replacement policy – which entries to purge
I (sectoring)
I write policy – write through or write back
I split I/D cache or unified I/D cache
We will have more options once hierarchy is added.
Cache Size
I Bigger cache → better hit rate
I Bigger caches are also more expensive and have longer paths
I Partially addressed by hierarchy (more on that later)
Line Size
I Observation: memory accesses are clustered
I I.e., the subsequent accesses are often next to each other
I Cache entries have overhead: address bits plus flag bits
I Also remember the latency of memory!
✓ Reduce overhead by making cache entry bigger
I Typical size: 64 bytes (512 bits)
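The overhead argument can be made concrete by counting bits per cache entry. The sketch below uses assumed parameters that do not come from the slides: 32-bit byte addresses, a direct-mapped cache holding 32 KB of data, and two flag bits (for example valid and dirty) per entry.

```python
def overhead_fraction(line_bytes: int,
                      cache_bytes: int = 32 * 1024,
                      addr_bits: int = 32,
                      flag_bits: int = 2) -> float:
    """Fraction of each cache entry spent on tag and flag bits.

    Assumes a direct-mapped cache and power-of-two sizes.
    """
    num_lines = cache_bytes // line_bytes
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = num_lines.bit_length() - 1     # log2(num_lines)
    tag_bits = addr_bits - index_bits - offset_bits
    overhead_bits = tag_bits + flag_bits
    data_bits = 8 * line_bytes
    return overhead_bits / (overhead_bits + data_bits)


for size in (4, 16, 64):
    print(f"{size:2d}-byte lines: {overhead_fraction(size):.1%} overhead")
```

With these assumptions each entry carries 19 bits of bookkeeping, which is about 37 % of an entry with 4-byte lines but under 4 % with 64-byte lines.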
Associativity
I Also called “ways”
I An n-way cache can store n entries with the same address hash
I Think of the length of the list in a hash table
I This reduces the number of collisions
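The effect of associativity on collisions can be tried out with a tiny simulator. This is a sketch, not a model of any real cache: it tracks only line addresses, and it assumes least-recently-used (LRU) replacement, which the slides do not prescribe. The class and function names are illustrative.

```python
from collections import OrderedDict
from typing import Iterable, List


class SetAssociativeCache:
    """n-way cache: each index (set) holds up to `ways` tags."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set; keys are tags, least recently used first.
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(num_sets)]

    def access(self, line_addr: int) -> bool:
        """Return True on a hit; on a miss, load the line and return False."""
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict the least recently used tag
        entries[tag] = None
        return False


def count_misses(cache: SetAssociativeCache, trace: Iterable[int]) -> int:
    return sum(not cache.access(addr) for addr in trace)


trace = [0, 4] * 4   # two lines that collide, accessed alternately
print(count_misses(SetAssociativeCache(num_sets=4, ways=1), trace))
print(count_misses(SetAssociativeCache(num_sets=2, ways=2), trace))
```

Both configurations hold four lines in total. In the direct-mapped case (1 way) the two addresses keep evicting each other and every access misses; in the 2-way case both fit under the same index and only the first access to each misses.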
Associativity
[Figure: a 2-way cache; each index selects two cache lines, each with its own tag, so two main-memory addresses with the same index can be cached at the same time.]
Cache Hierarchies
I Recall that fast SRAM is expensive, and bigger caches have long paths
I Thus: build a cache for the cache
I L1: closest to CPU
I L2, L3, L4: cache the next level
I Caches get bigger the closer they get to the memory
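The benefit of stacking levels can be estimated by working from the memory back towards the CPU: an access that misses in one level pays the expected cost of the level behind it. A sketch with invented latencies and per-level hit rates (illustrative only, not measurements of any CPU); the function name is hypothetical.

```python
from typing import List, Tuple


def hierarchy_access_time(levels: List[Tuple[float, float]], memory_ns: float) -> float:
    """Expected access time of a cache hierarchy.

    levels: (latency_ns, hit_rate) pairs ordered from L1 outwards, where
    hit_rate is the fraction of accesses reaching that level that hit.
    """
    expected = memory_ns
    for latency_ns, hit_rate in reversed(levels):
        # Every access reaching this level pays its latency; misses
        # additionally pay the expected cost of everything behind it.
        expected = latency_ns + (1.0 - hit_rate) * expected
    return expected


l1_only = hierarchy_access_time([(1.0, 0.9)], memory_ns=100.0)
l1_and_l2 = hierarchy_access_time([(1.0, 0.9), (10.0, 0.8)], memory_ns=100.0)
print(l1_only, l1_and_l2)
```

With these numbers, adding an L2 between L1 and memory cuts the expected access time from about 11 ns to about 4 ns, even though the L2 is ten times slower than the L1.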
Statistics
Model        Year  L1 Cache      L2 Cache     L3 Cache   L4 Cache
80486DX      1989  8 KB joint
Pentium      1993  8 KB+8 KB
Pentium Pro  1995  8 KB+8 KB     0.25 MB
Pentium MMX  1997  16 KB+16 KB
Pentium II   1997  16 KB+16 KB   0.5 MB
Xeon         1998  8 KB+8 KB     0.25–1 MB
Pentium III  1999  16 KB+16 KB   0.5 MB
Pentium 4    2000  16 KB+16 KB   0.25–0.5 MB
Itanium 2    2002  16 KB+16 KB   1.5–9 MB     2 or 4 MB
Pentium M    2003  32 KB+32 KB   0.25 MB
Core 2 Duo   2006  32 KB+32 KB   2 MB
Core i7      2008  32 KB+32 KB   0.25 MB      8 MB
Core i5      2009  32 KB+32 KB   0.25 MB      8 MB
Core i3      2010  32 KB+32 KB   0.25 MB      4 MB
Atom SoC     2012  32 KB+24 KB   0.25 MB
Core M       2014                0.25 MB      3 MB       128 MB
Numbers are per core unless shared.