AIMS Embedded Systems Programming MT 2018 Micro Architectures Daniel Kroening University of Oxford, Computer Science Department Version 1.0, 2014

Source: aims.robots.ox.ac.uk/wp-content/uploads/2018/11/2-micro.pdf


AIMS Embedded Systems Programming MT 2018

Micro Architectures

Daniel Kroening

University of Oxford, Computer Science Department

Version 1.0, 2014


Outline

X86/Y86

ARM

Pipelining

Memory

D. Kroening: AIMS Embedded Systems Programming MT 2018 2


High-Level View of Microarchitectures

[Figure: block diagram. The CPU contains registers (eax, ebx, ecx, ..., ZF, IP), functional units (FUs: ALU, Float) and caches (L1, L2); it is connected via data, address and control lines to the memory modules and to I/O (USB, ...).]

CPUs

• Process a sequential assembler program
• Data held in registers
• Program controls which data is given to which FU (functional unit), and where the result is stored
• Program controls transfer of data between registers and memory
• Caches speed up access to frequently used memory cells

Instruction Set Architectures

• These summarise the behavior of a CPU from the point of view of the programmer
• An ISA describes “what the CPU does”
• Ideally as little as possible about “how the CPU does it”

We will study two ISAs:
1. CISC: specifically the Y86 (academic variant of Intel’s x86)
2. RISC: specifically the ARM 32 architecture

One of the goals of this course is to understand the difference.

Visible Registers

RAM
• Contains data and the program

Data registers

Index  0    1    2    3    4    5    6    7
Name   eax  ecx  edx  ebx  esp  ebp  esi  edi

Instruction Pointer (IP)
• Points to address of current instruction

Flag registers (ZF, ...)
• Store flags for branches

Y86 Assembler

• Subset of Intel’s x86 assembler
  ✓ You can run a Y86 program on your x86 machine!
  ✗ The reverse does not work in general, as too many instructions are missing (you are welcome to mend this)

Y86 Instructions

• add/sub: Addition/subtraction of the values in two registers; ZF is set appropriately
• RRmov: copies value of one register into another
• RMmov: copies value of a register into RAM
• MRmov: copies value from RAM into a register
• jnz: Jumps to relative address if ZF = 0

Y86 Loads and Stores

• Loads and stores have a displacement:

    ea = esi + Displacement

• The displacement is included in the instruction word as an immediate constant
• The register esi is used as offset

Y86 Instruction Formats

Mnemonic  Opcode  Bits 7-6  Bits 5-3  Bits 2-0  Following byte  Semantics
add       01      11        RS        RD                        RD ← RD + RS
sub       29      11        RS        RD                        RD ← RD − RS
jnz       75      Distance (one byte)                           if (¬ZF) IP ← IP + Distance
RRmov     89      11        RS        RD                        RD ← RS
RMmov     89      01        RS        110       Displacement    MEM[ea] ← RS
MRmov     8b      01        RS        110       Displacement    RS ← MEM[ea]
hlt       f4

Example 1

add eax, edx

• Intel convention: the target register is always on the left-hand side
• The target register is a source register, too!
• Semantics:

    eax ← eax + edx

Example 2

mov edx, [BYTE one+esi]

Encoding: 8B 56 17
• 8B: opcode (MRmov)
• 56 = 01 010 110, where 010 selects edx
• 17: displacement (hexadecimal; the address of one)

Semantics:

    edx ← MEM[esi+17]
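The bit layout of the second instruction byte can be reproduced with a few shifts. The following is a minimal sketch, assuming the field layout from the instruction format table; the helper names `y86_mem_modrm` and `y86_reg_modrm` are hypothetical and not part of any Y86 tool.

```c
#include <stdint.h>

/* Register indices as listed on the "Visible Registers" slide. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Hypothetical helper: second byte of MRmov/RMmov,
   laid out as 01 | register (3 bits) | 110. */
uint8_t y86_mem_modrm(unsigned reg)
{
    return (uint8_t)((0x1u << 6) | ((reg & 0x7u) << 3) | 0x6u);
}

/* Hypothetical helper: second byte of add/sub/RRmov,
   laid out as 11 | RS (3 bits) | RD (3 bits). */
uint8_t y86_reg_modrm(unsigned rs, unsigned rd)
{
    return (uint8_t)((0x3u << 6) | ((rs & 0x7u) << 3) | (rd & 0x7u));
}
```

For example, `y86_mem_modrm(EDX)` yields 0x56, the middle byte of `8B 56 17`, and `y86_reg_modrm(EDX, EAX)` yields 0xD0, the second byte of `add eax, edx`.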


How do Branches Work?

C:

    if (a==b) {
      T;
    }
    else {
      F;
    }

Y86:

        mov eax, [BYTE a+esi]
        mov ebx, [BYTE b+esi]
        sub eax, ebx
        jnz f
        ;
        ; Code for 'T'
        ;
        mov eax, [BYTE one+esi]
        add eax, eax            ; eax = 2, so ZF = 0
        jnz e                   ; always taken: skips the code for 'F'
    f:  ;
        ; Code for 'F'
        ;
    e:  ; ...

Assembler Example

Address  Machine Code  Assembler using Mnemonics
00       29 F6             sub esi, esi
02       29 C0             sub eax, eax
04       29 DB             sub ebx, ebx
06       8B 56 17      l:  mov edx, [BYTE one+esi]
09       01 D0             add eax, edx
0B       01 C3             add ebx, eax
0D       89 C1             mov ecx, eax
0F       8B 56 1B          mov edx, [BYTE ten+esi]
12       29 D1             sub ecx, edx
14       75 F0             jnz l
16       F4                hlt
17       01 00 00 00   one dd 1
1B       0A 00 00 00   ten dd 10

The result is in ebx.
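To see what the listing computes, the machine code can be fed to a tiny interpreter. This is only a sketch under stated assumptions: it handles just the seven opcodes from the instruction format table, it takes the jnz distance relative to the address of the following instruction (the x86 convention, which matches `75 F0` jumping from 0x14 back to 0x06), and the function name `y86_run` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Machine code from the listing above (addresses 0x00 to 0x1E). */
static const uint8_t program[] = {
    0x29, 0xF6,             /* 00: sub esi, esi            */
    0x29, 0xC0,             /* 02: sub eax, eax            */
    0x29, 0xDB,             /* 04: sub ebx, ebx            */
    0x8B, 0x56, 0x17,       /* 06: l: mov edx, [one+esi]   */
    0x01, 0xD0,             /* 09: add eax, edx            */
    0x01, 0xC3,             /* 0B: add ebx, eax            */
    0x89, 0xC1,             /* 0D: mov ecx, eax            */
    0x8B, 0x56, 0x1B,       /* 0F: mov edx, [ten+esi]      */
    0x29, 0xD1,             /* 12: sub ecx, edx            */
    0x75, 0xF0,             /* 14: jnz l                   */
    0xF4,                   /* 16: hlt                     */
    0x01, 0x00, 0x00, 0x00, /* 17: one dd 1                */
    0x0A, 0x00, 0x00, 0x00  /* 1B: ten dd 10               */
};

/* Runs the program until hlt and returns ebx (register 3).
   Returns 0xFFFFFFFF on an unsupported encoding.
   Sketch only: no bounds checks on memory accesses. */
uint32_t y86_run(void)
{
    uint8_t mem[sizeof program];
    uint32_t reg[8] = {0};
    uint32_t ip = 0;
    int zf = 0;

    memcpy(mem, program, sizeof program);
    for (;;) {
        uint8_t op = mem[ip];
        if (op == 0xF4)                         /* hlt */
            return reg[3];
        if (op == 0x75) {                       /* jnz */
            int8_t dist = (int8_t)mem[ip + 1];
            ip += 2;                            /* address of next instruction */
            if (!zf)
                ip += (uint32_t)(int32_t)dist;  /* assumed: relative to next */
            continue;
        }
        uint8_t modrm = mem[ip + 1];
        unsigned mod = modrm >> 6;
        unsigned r = (modrm >> 3) & 7u;         /* bits 5-3 */
        unsigned rm = modrm & 7u;               /* bits 2-0 */
        if (mod == 3) {                         /* register to register */
            switch (op) {
            case 0x01: reg[rm] += reg[r]; zf = (reg[rm] == 0); break;
            case 0x29: reg[rm] -= reg[r]; zf = (reg[rm] == 0); break;
            case 0x89: reg[rm] = reg[r]; break;
            default: return 0xFFFFFFFFu;
            }
            ip += 2;
        } else if (mod == 1 && rm == 6) {       /* ea = esi + displacement */
            uint32_t ea = reg[6] + (uint32_t)(int32_t)(int8_t)mem[ip + 2];
            if (op == 0x8B) {                   /* MRmov: little-endian load */
                reg[r] = (uint32_t)mem[ea] | ((uint32_t)mem[ea + 1] << 8) |
                         ((uint32_t)mem[ea + 2] << 16) |
                         ((uint32_t)mem[ea + 3] << 24);
            } else if (op == 0x89) {            /* RMmov: little-endian store */
                for (unsigned i = 0; i < 4; i++)
                    mem[ea + i] = (uint8_t)(reg[r] >> (8 * i));
            } else {
                return 0xFFFFFFFFu;
            }
            ip += 3;
        } else {
            return 0xFFFFFFFFu;
        }
    }
}
```

Each loop iteration adds 1 to eax and adds eax to ebx, stopping once eax reaches 10, so `y86_run()` returns 1 + 2 + ... + 10 = 55.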


The NASM Assembler

• Windows:
    nasm -f win32 my_test.asm
    link /subsystem:console /entry:start my_test.obj
• Linux:
    nasm -f elf my_test.asm
    ld -s -o my_test my_test.o
  (on a 64-bit system, ld typically also needs -m elf_i386 to link the 32-bit object)
• MacOS:
    nasm -f macho my_test.asm
    ld -arch i386 -o my_test my_test.o

Inline Assembler with Visual Studio

#include <stdio.h>

int one=1, ten=10, result;

int main() {
  __asm {
       sub esi, esi
       sub eax, eax
       sub ebx, ebx
    l: mov edx, [one+esi]
       add eax, edx
       add ebx, eax
       mov ecx, eax
       mov edx, [ten+esi]
       sub ecx, edx
       jnz l
       mov [result+esi], ebx
  }
  printf("Result: %d\n", result);
  return 0;
}

Debugging with GDB (Part 1)

• run
  Start execution
• x/[size] Label
  Dump a region of the memory
• x/[size]i Label
  Disassemble some memory region, e.g. x/5i $pc
• info registers
  Show the value of the registers
• stepi
  Execute one machine instruction (step advances by one source line)

Debugging with GDB (Part 2)

• break label
  Set breakpoint at label
• info break
  Show the breakpoints
• delete breakpoints number
  Delete a breakpoint
• continue
  Resume the execution after a breakpoint

Debugging with Visual Studio


Debugging with XCode


Extensions: Comparisons

We would love to have Y86 commands for

    if (a<b) { ... }

These obviously depend on the number representation:

with sign:     0 > −7, i.e. twoc(0000) > twoc(1001)
without sign:  0 < 9, i.e. bin(0000) < bin(1001)

Reminder: Number Interpretation

Binary representation:

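$$\mathrm{bin}() : \{0,1\}^n \longrightarrow \{0, \dots, 2^n - 1\}$$

$$\mathrm{bin}(x) = \sum_{i=0}^{n-1} x_i \cdot 2^i$$

Two’s complement:

$$\mathrm{twoc}() : \{0,1\}^n \longrightarrow \{-2^{n-1}, \dots, 2^{n-1} - 1\}$$

$$\mathrm{twoc}(x) = -2^{n-1} \cdot x_{n-1} + \mathrm{bin}(x_{n-2}, \dots, x_0)$$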
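The two interpretations can be evaluated directly from their definitions. A minimal sketch, assuming bit vectors of width n ≤ 32 stored in the low bits of a `uint32_t`; the function names simply mirror the notation above.

```c
#include <stdint.h>

/* bin(x): the low n bits of x read as an unsigned number (0 <= n <= 32). */
uint32_t bin(uint32_t x, unsigned n)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < n; i++)
        value += ((x >> i) & 1u) << i;          /* x_i * 2^i */
    return value;
}

/* twoc(x) = -2^(n-1) * x_(n-1) + bin(x_(n-2), ..., x_0), for 1 <= n <= 32. */
int64_t twoc(uint32_t x, unsigned n)
{
    int64_t top_bit = (x >> (n - 1)) & 1u;      /* x_(n-1) */
    int64_t weight = (int64_t)1 << (n - 1);     /* 2^(n-1) */
    return -weight * top_bit + (int64_t)bin(x, n - 1);
}
```

With n = 4, the pattern 1001 gives `bin(0x9, 4) == 9` but `twoc(0x9, 4) == -7`, which is exactly the pair of readings used on the comparison slide.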


Comparing Unsigned Integers

Unsigned integers:

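$$\mathrm{bin}(a) < \mathrm{bin}(b) \iff \mathrm{bin}(a) - \mathrm{bin}(b) < 0$$

Recall: $-b = (\neg b) + 1$
We get the “+1” for free by setting the carry-in of the adder.

Let’s pretend we compute with one more bit (“zero extension”):

$$\begin{array}{crrcrrl}
  & 0   & a_{n-1}      & \dots & a_1      & a_0      & \\
+ & 1   & \neg b_{n-1} & \dots & \neg b_1 & \neg b_0 & \\
  & c_n & c_{n-1}      & \dots & c_1      & 1        & \text{(carry bits)} \\
\hline
= & s_n & s_{n-1}      & \dots & s_1      & s_0      & \text{(sum)}
\end{array}$$

Thus: $\mathrm{bin}(a) - \mathrm{bin}(b) < 0 \iff s_n \iff \neg c_n$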
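The carry-based test can be tried out for n = 32 by doing the addition in a wider type. A minimal sketch; `unsigned_less` is a hypothetical name, and the 64-bit sum merely stands in for the adder with its carry-out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true iff a < b as unsigned numbers, decided only from the
   carry-out c_n of the addition a + (not b) + 1. */
bool unsigned_less(uint32_t a, uint32_t b)
{
    uint64_t sum = (uint64_t)a + (uint64_t)(uint32_t)~b + 1u;
    bool c_n = (sum >> 32) & 1u;    /* carry out of bit n-1 */
    return !c_n;                    /* a - b < 0  <=>  not c_n */
}
```

For instance, `unsigned_less(0, 9)` is true because 0 + ¬9 + 1 produces no carry-out, while `unsigned_less(9, 0)` and `unsigned_less(5, 5)` are false.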


Comparing Signed Integers

Two’s complement:

$$\mathrm{twoc}(a) < \mathrm{twoc}(b) \iff \mathrm{twoc}(a) - \mathrm{twoc}(b) < 0$$

Again, let’s pretend we have an extra bit (“sign extension”):

$$\begin{array}{crrcrrl}
  & a_{n-1}      & a_{n-1}      & \dots & a_1      & a_0      & \\
+ & \neg b_{n-1} & \neg b_{n-1} & \dots & \neg b_1 & \neg b_0 & \\
  & c_n          & c_{n-1}      & \dots & c_1      & 1        & \text{(carry bits)} \\
\hline
= & s_n          & s_{n-1}      & \dots & s_1      & s_0      & \text{(sum)}
\end{array}$$

Thus: $\mathrm{twoc}(a) - \mathrm{twoc}(b) < 0 \iff s_n \iff a_{n-1} \oplus \neg b_{n-1} \oplus c_n \iff s_{n-1} \oplus c_{n-1} \oplus c_n$

New Flags: CF, SF, OF

We [1] introduce three new flags for arithmetic operations:

• CF: The carry flag ($c_n$ in case of additions, $\neg c_n$ in case of subtraction)
• SF: The sign flag ($s_{n-1}$)
• OF: The overflow flag ($c_n \oplus c_{n-1}$)

[1] meaning Intel did so

Examples (Part 1)

    000...000  = 0
+   000...001  = 1
   0000...000  (carry bits)
=   000...001  = 1
ZF = 0, CF = 0, SF = 0, OF = 0

    000...001  = 1
−   000...001  = 1
   1111...111  (carry bits)
=   000...000  = 0
ZF = 1, CF = 0, SF = 0, OF = 0

    111...111  = −1
+   000...010  = 2
   1111...110  (carry bits)
=   000...001  = 1
ZF = 0, CF = 1, SF = 0, OF = 0

Examples (Part 2)

    011...111  = 2^(n−1) − 1
+   000...001  = 1
   0111...110  (carry bits)
=   100...000  = 2^(n−1)
ZF = 0, CF = 0, SF = 1, OF = 1

    100...000  = −2^(n−1)
−   000...001  = 1
   1000...001  (carry bits)
=   011...111  = 2^(n−1) − 1
ZF = 0, CF = 0, SF = 0, OF = 1
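The flag values in these examples can be reproduced for n = 32 with a small model of the adder. A minimal sketch, assuming the flag definitions given on the "New Flags" slide; the type `alu_out` and the function `alu_addsub` are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t result;
    bool zf, cf, sf, of;
} alu_out;

/* Computes a + b (sub == false) or a - b (sub == true) on 32 bits.
   Subtraction is done as a + (not b) + 1, i.e. with carry-in 1. */
alu_out alu_addsub(uint32_t a, uint32_t b, bool sub)
{
    uint32_t operand = sub ? ~b : b;
    uint32_t carry_in = sub ? 1u : 0u;

    /* Full sum in 64 bits to observe the carry-out c_n. */
    uint64_t wide = (uint64_t)a + operand + carry_in;
    /* Sum of the low 31 bits to observe c_(n-1), the carry into the top bit. */
    uint32_t low = (a & 0x7FFFFFFFu) + (operand & 0x7FFFFFFFu) + carry_in;

    bool c_n = (wide >> 32) & 1u;
    bool c_n1 = (low >> 31) & 1u;

    alu_out out;
    out.result = (uint32_t)wide;
    out.zf = (out.result == 0);
    out.cf = sub ? !c_n : c_n;          /* c_n for add, not c_n for sub */
    out.sf = (out.result >> 31) & 1u;   /* s_(n-1) */
    out.of = (c_n != c_n1);             /* c_n xor c_(n-1) */
    return out;
}
```

For example, `alu_addsub(0x7FFFFFFF, 1, false)` sets SF and OF but not CF, and `alu_addsub(0x80000000, 1, true)` sets only OF, matching the last two examples.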


Branching Instructions for Comparisons

Instruction  Flags
jz, je       ZF
jnz, jne     ¬ZF
jnae, jb     CF
jae, jnb     ¬CF
jna, jbe     CF ∨ ZF
ja, jnbe     ¬(CF ∨ ZF)
jnge, jl     SF ⊕ OF
jge, jnl     ¬(SF ⊕ OF)
jng, jle     (SF ⊕ OF) ∨ ZF
jg, jnle     ¬((SF ⊕ OF) ∨ ZF)
jmp near     unconditional

n = not, z = zero, e = equal, g = greater, l = less, a = above, b = below

e.g. jnbe = “jump if not (below or equal)”

Branching Instructions for Comparisons

        sub ax, bx
        Jxxx target
        ...
target:

branch if  with sign  without sign
ax = bx    je         je
ax ≠ bx    jne        jne
ax > bx    jg         ja
ax ≥ bx    jge        jae
ax < bx    jl         jb
ax ≤ bx    jle        jbe
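The difference between the signed and the unsigned column shows up as soon as one operand has its top bit set. The sketch below evaluates a few of the branch conditions from the flag table after a subtraction on 32-bit values. It is illustrative only: the names are hypothetical, and OF is computed with the common sign-comparison formulation rather than from the carries, which gives the same value.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool zf, cf, sf, of; } flags_t;

/* Flags as left behind by "sub a, b" on 32-bit registers. */
flags_t flags_after_sub(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.zf = (r == 0);
    f.cf = (a < b);                             /* borrow, i.e. not c_n */
    f.sf = (r >> 31) & 1u;
    /* Signed overflow: a and b differ in sign, and the result's sign
       differs from the sign of a. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}

bool cond_jb(flags_t f)  { return f.cf; }                       /* unsigned <  */
bool cond_jbe(flags_t f) { return f.cf || f.zf; }               /* unsigned <= */
bool cond_jl(flags_t f)  { return f.sf != f.of; }               /* signed <    */
bool cond_jle(flags_t f) { return (f.sf != f.of) || f.zf; }     /* signed <=   */
```

With a = 0 and b = 0xFFFFFFF9 (which is −7 when read as two's complement), `cond_jb` holds but `cond_jl` does not: 0 is below the unsigned value yet greater than −7.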


Example Branching Instructions

start:  sub esi, esi                  ; array index
        mov edx, [BYTE Intmax+esi]    ; Minimum
        mov ecx, [BYTE Top+esi]       ; top index
        sub ebx, ebx                  ; counter

L:      mov eax, ebx
        sub eax, ecx
        jae end                       ; counter ≥ Top?

        mov esi, ebx
        mov edi, [BYTE Array+esi]     ; edi := array[ebx]

        mov eax, edi
        sub eax, edx
        jge skip                      ; array[ebx] ≥ Minimum?

        mov edx, edi                  ; Minimum := array[ebx]

skip:   sub esi, esi
        mov eax, [BYTE Four+esi]
        add ebx, eax                  ; counter += 4

        jmp near L

end:    hlt

Example Branching Instructions (Part 2)

Four    dd 4
Top     dd 40
Array   dd 1, 2, 3, 4, 5, 6, -7, 8, 9, 10
Intmax  dd 0x7fffffff
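For comparison, here is a rough C rendering of what the assembler program computes. It is a sketch rather than a literal translation: `array_min` and `slide_array` are hypothetical names, and the comments indicate which instruction each line is meant to correspond to.

```c
#include <stdint.h>

/* The data from the slide: Array dd 1, 2, 3, 4, 5, 6, -7, 8, 9, 10 */
const int32_t slide_array[10] = {1, 2, 3, 4, 5, 6, -7, 8, 9, 10};

/* Returns the minimum of the array; top_bytes is the array size in bytes
   (Top dd 40), since the counter advances in steps of 4. */
int32_t array_min(const int32_t *array, uint32_t top_bytes)
{
    int32_t minimum = INT32_MAX;                    /* edx := Intmax */
    for (uint32_t offset = 0;                       /* ebx := 0 */
         offset < top_bytes;                        /* jae end (unsigned compare) */
         offset += 4) {                             /* add ebx, eax with eax = 4 */
        int32_t value = array[offset / 4];          /* edi := array[ebx] */
        if (value < minimum)                        /* jge skip (signed compare) */
            minimum = value;                        /* edx := edi */
    }
    return minimum;
}
```

Called as `array_min(slide_array, 40)`, this returns −7. Note that the loop bound uses an unsigned comparison while the element comparison is signed, mirroring the use of jae and jge.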


History ARM

• 1980s: Acorn Computers
• 1982: BBC Micro (8 bit)
• 1986: ARM development kit
• 1990: ARM, “Advanced RISC Machines”, founded; owners: Acorn Computers, Apple and VLSI Technology

ARM Today

• Now primarily licensed as IP, with focus on low-end embedded systems and phones (>95 % market share)
• Built by Apple, Nvidia, Qualcomm, Samsung, TI
• 2013: 37 billion ARM processors produced
• Early 64-bit prototypes for application in low-power servers

Visible Data

• RAM, organised in 32-bit words
• Registers
  • R0 to R15
  • R15 is a special case: this is the PC
  • R13 is the stack pointer (SP)
  • R14 is used for the return address for function calls (LR)
  • CPSR for various flags
  • (There is another register file for floating-point numbers)

Basic Instructions

Instruction                 Semantics
ADD   Rd, Rn, Rm            Rd ← Rn + Rm
SUB   Rd, Rn, Rm            Rd ← Rn − Rm
MUL   Rd, Rm, Rs            Rd ← (Rm · Rs)[31:0]
SMULL RdL, RdH, Rm, Rs      RdH, RdL ← Rm · Rs (signed)
UMULL RdL, RdH, Rm, Rs      RdH, RdL ← Rm · Rs (unsigned)
SDIV  Rd, Rm, Rs            Rd ← Rm / Rs (signed)
UDIV  Rd, Rm, Rs            Rd ← Rm / Rs (unsigned)
AND   Rd, Rn, Rm            Rd ← Rn & Rm
B     label                 PC ← label
BL    label                 LR ← PC+4; PC ← label
BX    Rm                    PC ← Rm

Many variants!

Setting Condition Flags

• Most instructions can be given a suffix S.
• In addition to the usual behaviour, the condition flags (in CPSR) are updated.

Bit   31  30  29  28
Flag  N   Z   C   V

N = negative, Z = zero, C = carry, V = overflow

Using Condition Flags

Most instructions can be given condition suffixes:

EQ     equal             NE     not equal
CS/HS  carry set         CC/LO  carry clear
MI     negative          PL     positive (or zero)
VS     overflow          VC     no overflow
HI     higher            LS     lower or same
GE     greater or equal  LT     less than
GT     greater than      LE     less than or equal

These use 4 bits in the instruction word.

ARM Instruction Formats

ARM uses a fixed-size instruction word:

Data processing:

Bits   31-28  27-21   20  19-16  15-12  11-0
Field  Cond   Opcode  S   Rn     Rd     Rm

Branch and branch&link:

Bits   31-28  27-25  24  23-0
Field  Cond   1 0 1  L   offset
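Because the instruction word has a fixed size, the fields of the data-processing format can be extracted with shifts and masks. A minimal sketch following the field boundaries shown above; the type `arm_dp_fields` and the function `arm_dp_decode` are hypothetical names, and the low 12 bits are returned as a whole since they hold Rm only in the simplest operand form.

```c
#include <stdint.h>

typedef struct {
    unsigned cond;      /* bits 31-28 */
    unsigned opcode;    /* bits 27-21 */
    unsigned s;         /* bit  20    */
    unsigned rn;        /* bits 19-16 */
    unsigned rd;        /* bits 15-12 */
    unsigned operand2;  /* bits 11-0, Rm in the simplest case */
} arm_dp_fields;

/* Splits a 32-bit data-processing instruction word into its fields. */
arm_dp_fields arm_dp_decode(uint32_t word)
{
    arm_dp_fields f;
    f.cond     = (word >> 28) & 0xFu;
    f.opcode   = (word >> 21) & 0x7Fu;
    f.s        = (word >> 20) & 0x1u;
    f.rn       = (word >> 16) & 0xFu;
    f.rd       = (word >> 12) & 0xFu;
    f.operand2 = word & 0xFFFu;
    return f;
}
```

As a usage example, the word 0xE0910002 (to my understanding the encoding of `ADDS R0, R1, R2`) decodes to Cond = 0xE, Opcode = 0x04, S = 1, Rn = 1, Rd = 0 and a low field of 2.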


ARM Instruction Formats

• There is a compressed version called “Thumb-2 Instruction Set”
• The instructions have 16 bits (Thumb-2 also includes some 32-bit encodings)
• Fewer options, conditions are a separate instruction
• Aimed at better I-Cache efficiency

Sequential Processors with Pipeline

• We will start with an implementation that
  • has the form and shape of a pipeline, but
  • processes one instruction at a time
  • processes the instructions in a fixed order of phases
• These aren’t built, but only exist for illustrative purposes.
  ✓ But: The step to a proper pipeline is minimal (will show!)

The 5 Instruction Phases (Stages)

1. Instruction Fetch (IF)
   The instruction is copied from the RAM into a register (IR)
2. Instruction Decode (ID)
   Loads the values of the operands from the register file into registers A and B; also increments the program counter
3. Execute (EX)
   Perform any ALU operation (say add/sub), address arithmetic for load/store
4. Memory (M)
   RAM access for load/store
5. Write-Back (WB)
   Store any result in the register file

An Implementation: High-level View

[Figure: datapath of the implementation, organised in the stages IF, ID, EX, M and WB. Components shown: IP, nextIP, IR, the register file (eax, ..., esi, edi) with its address inputs, the operand registers A and B, the ALU, the registers C, Flg, MAR, MDRw and MDRr, and the system bus.]

Sequential Execution

• We first implement a sequential machine: the stages are processed one after the other in the order IF – ID – EX – M – WB
• We execute exactly one instruction at a time
• In contrast to multi-cycle designs: we stick to this even if an instruction doesn’t actually use a particular stage

Sequential Execution

Let I1, I2, . . . be the sequence of instructions in program order.

time  0   1   2   3   4   5   6   7   8
IF    I1  -   -   -   -   I2  -   -   -
ID    -   I1  -   -   -   -   I2  -   -
EX    -   -   I1  -   -   -   -   I2  -
MEM   -   -   -   I1  -   -   -   -   I2
WB    -   -   -   -   I1  -   -   -   -

Example: Processing add

program:
    add edx, ebx
    mov [100+esi], edx

[Figure: animation of the datapath in frames (1) to (5), one per cycle 0 to 4 (IF, ID, EX, M, WB). Values shown in the frames: IR = add; register addresses 2, 3 (edx, ebx); A, B = 29, 6; IP advancing from 0 to 2; C = 35 with Flg = 0; register address 2 in the write-back frame.]

Example: Processing RMmov

program:
    add edx, ebx
    mov [100+esi], edx

[Figure: animation of the datapath in frames (1) to (5), one per cycle 5 to 9 (IF, ID, EX, M, WB). Values shown in the frames: IR = RMmov; register addresses 6, 2 (esi, edx); IP advancing from 2 to 5; the value 35 to be stored and the displacement 100.]

Example: Processing jnz

program:
    jnz l

the distance is 10

[Figure: animation of the datapath in frames (1) to (5), one per cycle 0 to 4 (IF, ID, EX, M, WB). Values shown in the frames: IR = jnz; IP starting at 0 and becoming 12.]

Pipelining

• Increases the performance using the assembly-line idea

$$\text{performance} = \underbrace{\text{instructions per cycle}}_{\mathrm{IPC}} \cdot \underbrace{\text{clock frequency}}_{1/\tau}$$

• Standard technique in virtually all modern circuitry (not just CPUs, but also GPUs, video, networking, wireless, ...)

Pipelining

time  0   1   2   3   4   5
IF    I1  I2  I3  I4  I5  I6
ID        I1  I2  I3  I4  I5
EX            I1  I2  I3  I4
MEM               I1  I2  I3
WB                    I1  I2

Best case: one instruction per cycle!
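
The schedule above can be reproduced with a few lines of Python. This is a minimal sketch, assuming an ideal five-stage pipeline without stalls in which instruction i enters IF in cycle i; the names pipeline_diagram and completed are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(num_instructions, num_cycles):
    """Return {stage: [label or '' per cycle]} for an ideal pipeline (no stalls)."""
    rows = {}
    for depth, stage in enumerate(STAGES):
        row = []
        for t in range(num_cycles):
            i = t - depth  # assumption: instruction i enters IF in cycle i
            row.append(f"I{i + 1}" if 0 <= i < num_instructions else "")
        rows[stage] = row
    return rows

def completed(num_instructions, num_cycles):
    """Number of instructions that have passed WB within the first num_cycles cycles."""
    return max(0, min(num_instructions, num_cycles - (len(STAGES) - 1)))

diagram = pipeline_diagram(num_instructions=6, num_cycles=6)
print("time " + " ".join(f"{t:>3}" for t in range(6)))
for stage, row in diagram.items():
    print(f"{stage:<4} " + " ".join(f"{cell:>3}" for cell in row))
print("completed after 6 cycles:", completed(6, 6))
```

Once the pipeline is full, one instruction leaves WB in every cycle, which is the best case of one instruction per cycle.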


Pipelining Performance

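Performance:

$$\text{IPC} \cdot \frac{1}{\tau}, \qquad \text{IPC} \approx 1, \qquad \tau \approx D_{FF} + \frac{D}{n}$$

where:
- IPC: instructions per cycle
- τ: cycle time
- n: number of stages
- D: combinational delay without the flip flops
- D_FF: delay of the flip flops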
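
To get a feeling for the formula, the following sketch evaluates the cycle time and the resulting performance for several pipeline depths. The delay values are made up for illustration and the function names are hypothetical; only the formula itself is taken from the slide.

```python
def cycle_time(d_comb, d_ff, n):
    """tau ~ D_FF + D / n (all delays in the same unit, here nanoseconds)."""
    return d_ff + d_comb / n

def performance(d_comb, d_ff, n, ipc=1.0):
    """Performance = IPC * 1 / tau, in instructions per nanosecond."""
    return ipc / cycle_time(d_comb, d_ff, n)

D, D_FF = 10.0, 0.5  # made-up example delays in ns, not measured values

for n in (1, 2, 5, 10, 20):
    tau = cycle_time(D, D_FF, n)
    print(f"n={n:2d}  tau={tau:5.2f} ns  performance={performance(D, D_FF, n):.3f} instr/ns")
```

Under these assumptions the gain flattens out for large n, because the cycle time can never drop below the flip-flop delay D_FF.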


Implementing the Pipeline: Roadmap

1. Resolving resource conflicts

2. Modifying the control

3. Dealing with data and control hazards


Resource Conflicts

Let’s look at our sequential machine again:

[Figure: the sequential machine: stages IF, ID, EX, M and WB; registers IP, IR, A, B, C, Flg, MAR, MDRr and MDRw; ALU; nextIP; register file (eax, ..., esi, edi); system bus.]

Consider the C register of an ALU instruction followed by another ALU instruction!

And what happens to IR once the 2nd instruction is fetched?


Register Lifetime

[Figure: the sequential machine with its registers IP, IR, A, B, C, Flg, MAR, MDRr and MDRw across the stages IF, ID, EX, M and WB.]

Register accesses (W = written, R = read), listed in stage order IF, ID, EX, M, WB:

IR:      W R R R R
A, B:    W R
IP:      R W
MAR:     W R
MDRw:    W R
C:       W R R
Flags:   R W
MDRr:    W R
eax ...: R W

✗ Problem: IR and C need to be remembered for multiple stages!


Register Lifetime

[Figure: the pipelined machine. IR is replicated as IR1, IR2, IR3 and IR4, and C as C3 and C4, alongside IP, A, B, Flg, MAR, MDRr and MDRw, the ALU, nextIP, the register file (eax, ..., esi, edi) and the system bus.]

✓ We resolve by replication!


Resource Conflicts

Q: Which other resources are shared by stages?
A: The system bus (shared by IF and MEM)!

Q: What do we do?
A: Most CPUs have an L1 cache that permits two (read) accesses simultaneously.

(Really two L1 caches: an I-cache and a D-cache)


Example Pipeline

Program (modified):

  add edx, ebx
  mov [100+esi], ecx

[Figure: the pipelined machine (stages IF, ID, EX, M, WB; registers IP, IR1 to IR4, A, B, C3, C4, Flg, MAR, MDRr, MDRw) before the program starts. Value shown: 0.]


Example Pipeline (1)

Cycle 0. Program (modified):

  add edx, ebx
  mov [100+esi], ecx

[Figure: the pipelined machine (stages IF, ID, EX, M, WB; registers IP, IR1 to IR4, A, B, C3, C4, Flg, MAR, MDRr, MDRw) at cycle 0. Value shown: 0.]


Example Pipeline (2)

Cycle 1. Program (modified):

  add edx, ebx
  mov [100+esi], ecx

[Figure: the pipelined machine (stages IF, ID, EX, M, WB; registers IP, IR1 to IR4, A, B, C3, C4, Flg, MAR, MDRr, MDRw) at cycle 1. Values shown: add, 0, 2, (2, 3), (29, 6).]


Example Pipeline (3)

Cycle 2. Program (modified):

  add edx, ebx
  mov [100+esi], ecx

[Figure: the pipelined machine (stages IF, ID, EX, M, WB; registers IP, IR1 to IR4, A, B, C3, C4, Flg, MAR, MDRr, MDRw) at cycle 2. Values shown: RMmov, 2, (50, 20), (6, 1), add, 2, 35, 0, (29, 6).]


Example Pipeline (4)

Cycle 3. Program (modified):

  add edx, ebx
  mov [100+esi], ecx

[Figure: the pipelined machine (stages IF, ID, EX, M, WB; registers IP, IR1 to IR4, A, B, C3, C4, Flg, MAR, MDRr, MDRw) at cycle 3. Values shown: RMmov, 5, 20, 100, (0, 20), add, 35.]


Example Pipeline (5)

Cycle 4. Program (modified):

  add edx, ebx
  mov [100+esi], ecx

[Figure: the pipelined machine (stages IF, ID, EX, M, WB; registers IP, IR1 to IR4, A, B, C3, C4, Flg, MAR, MDRr, MDRw) at cycle 4. Values shown: RMmov, 20, 100, add, 35, 2.]


Example Pipeline (6)

Cycle 5. Program (modified):

  add edx, ebx
  mov [100+esi], ecx

[Figure: the pipelined machine (stages IF, ID, EX, M, WB; registers IP, IR1 to IR4, A, B, C3, C4, Flg, MAR, MDRr, MDRw) at cycle 5. Value shown: RMmov.]


Data and Control Dependencies

Example program with data dependency:

  add edx, ebx
  mov [100+esi], edx

Execution in the pipeline: like this?

time  3
IF    ...
ID    ...
EX    mov [100+esi], edx
MEM   add edx, ebx
WB


Data and Control Dependencies

Example program with data dependency:

  add edx, ebx
  mov [100+esi], edx

Execution in the pipeline:

time  3
IF    ...
ID    mov [100+esi], edx
EX    BUBBLE
MEM   add edx, ebx
WB

DATA DEPENDENCY!
✗ We would now read the wrong (old) value of edx!
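
How many bubbles such a dependency costs can be estimated with a small model. The sketch below is not the mechanism of the machine on the slides; it assumes that operands are read in ID, that results become visible only after WB, that there is no forwarding, and that instructions stay in program order. The class Instr and the function names are made up for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    text: str
    reads: tuple   # registers read in ID
    writes: tuple  # registers written in WB

def id_cycles(program):
    """Cycle in which each instruction passes through ID (IF of the first one is cycle 0)."""
    ready = {}   # register -> first cycle in which ID sees the new value
    cycles = []
    for ins in program:
        cycle = cycles[-1] + 1 if cycles else 1
        # Stall in ID until every source register holds its new value.
        cycle = max([cycle] + [ready.get(r, 0) for r in ins.reads])
        cycles.append(cycle)
        for r in ins.writes:
            ready[r] = cycle + 4  # ID, EX, MEM, WB, then visible (assumption)
    return cycles

def delays(program):
    """Number of cycles each instruction is delayed compared to the ideal schedule."""
    return [c - (i + 1) for i, c in enumerate(id_cycles(program))]

dependent = [
    Instr("add edx, ebx", reads=("edx", "ebx"), writes=("edx",)),
    Instr("mov [100+esi], edx", reads=("esi", "edx"), writes=()),
]
independent = [
    Instr("add edx, ebx", reads=("edx", "ebx"), writes=("edx",)),
    Instr("mov [100+esi], ecx", reads=("esi", "ecx"), writes=()),
]
print(delays(dependent))    # [0, 3]
print(delays(independent))  # [0, 0]
```

In this model the dependent mov waits in ID for three cycles, whereas the version that stores ecx proceeds without any bubble.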


Memory

- ROM: read-only memory

- RAM: random-access memory (but usually means random-access read and write memory)

- SRAM: static RAM; stores state as long as power is supplied

- DRAM: dynamic RAM; implemented using capacitors; the state is lost without periodic refresh


RAM in PCs

[Figure: RAM module types: 30 pin SIMM, 72 pin SIMM, MicroDIMM, 184 pin RAMBus RIMM, 100 pin DIMM, 72 pin SODIMM, 144 pin SDRAM SODIMM, 200 pin DDR SODIMM, 200 pin DDR-2 SODIMM, 168 pin SDRAM DIMM, 184 pin DDR DIMM, 240 pin DDR-2 DIMM.]


Addresses

[Figure: a RAM chip with address, DATA and WE pins.]

- RAM/ROM chips store many (billions of) bits

- Distinguish using an address

- The address is given in binary

- Plus WE: read/write

- The data pins are used for reading as well as writing


Structure

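[Figure: a memory matrix addressed through two decoders (2 address bits in, 4 lines out), one for the rows and one for the columns.]

- RAM/ROM chips are a 2D matrix

- The address is split into a row and column

- The binary encoding is turned into unary using a decoder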
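
The two steps (splitting the address, then decoding each half) can be sketched as follows. Taking the row from the upper address bits is an assumption made for this example, and the function names are hypothetical.

```python
def decode(value, bits):
    """Binary-to-unary (one-hot) decoder: exactly one of 2**bits lines is 1."""
    return [1 if line == value else 0 for line in range(2 ** bits)]

def split_address(address, row_bits, col_bits):
    """Split an address into (row, column); the row uses the upper bits (assumption)."""
    assert 0 <= address < 2 ** (row_bits + col_bits)
    row = address >> col_bits
    col = address & ((1 << col_bits) - 1)
    return row, col

row, col = split_address(0b1101, row_bits=2, col_bits=2)
print(row, col)        # 3 1
print(decode(row, 2))  # [0, 0, 0, 1]
print(decode(col, 2))  # [0, 1, 0, 0]
```

The cell at the intersection of the active row line and the active column line is the one that is read or written.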


SRAM Cell with Two Inverters

[Figure: an SRAM cell built from two inverters, with an address line and the complementary bit lines Data and ¬Data.]

- Reading and writing
- Address line selects the cell
- State is held using the inverters (latch)
- Read by comparing Data and ¬Data


SRAM Cell in CMOS

[Figure: the SRAM cell in CMOS, with VDD, GND, the address line and the bit lines Data and ¬Data.]


DRAM

- DRAM uses capacitors
- simpler and easier to build than SRAM
✓ high density, low cost
- But: slower!

→ fast but expensive SRAM for caches (more on that later)
→ slow but inexpensive DRAM for the main memory


Reminder: Capacitors

[Figure: charge (%) of a capacitor over time, with a charging phase followed by a discharging phase.]

Capacitors store an electric charge, but only for a limited time.


DRAM Cell

[Figure: a DRAM cell, with an address line, a Data line and a connection to GND.]

A bit is stored as the charge of a capacitor and has to be refreshed periodically.


Data Buses

Connecting multiple memory chips:

[Figure: a CPU with its own connection to each of three memory modules.]

✗ No! I/O pins are expensive!


Data Buses

- Goal: effective use of the pricey wires

- Idea: share wires for data and addresses among RAM modules

[Figure: a CPU and three memory modules attached to shared data, address and control lines.]


Interface RAM Chips

- Control signals:
  - CS (Chip Select): activates a particular chip
  - WE (Write Enable)
  - OE (Output Enable)

- Inactive chips have high-impedance outputs (Z)

- Write by setting WE, read by setting OE

- Interface constraint: OE and WE are never both active
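
The interface rules above can be captured in a toy model of several chips on one shared data bus. This is only a sketch: the class RamChip, the helper bus_read and the use of None for the high-impedance state are made up for illustration, and all signals are treated as active-high, which real chips often are not.

```python
class RamChip:
    """Toy model of one RAM chip on a shared bus (illustrative, not a real part)."""
    HIGH_Z = None  # stands for a high-impedance (undriven) data output

    def __init__(self, words):
        self.cells = [0] * words

    def access(self, cs, we, oe, address, data_in=None):
        if not cs:
            return self.HIGH_Z            # inactive chip does not drive the bus
        if we and oe:
            raise ValueError("OE and WE must never both be active")
        if we:
            self.cells[address] = data_in
            return self.HIGH_Z            # data pins are inputs during a write
        if oe:
            return self.cells[address]
        return self.HIGH_Z

chips = [RamChip(16), RamChip(16)]

def bus_read(selected, address):
    """Read via the shared bus: only the selected chip may drive the data lines."""
    outputs = [chip.access(cs=(i == selected), we=False, oe=True, address=address)
               for i, chip in enumerate(chips)]
    driven = [v for v in outputs if v is not RamChip.HIGH_Z]
    assert len(driven) == 1
    return driven[0]

chips[1].access(cs=True, we=True, oe=False, address=3, data_in=42)
print(bus_read(1, 3))  # 42
print(bus_read(0, 3))  # 0
```

Because every unselected chip stays in the high-impedance state, all chips can share the same data pins without driving them against each other.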


Write Cycle

[Figure: timing diagram of a write cycle, showing the signals CS, WE, OE, Address and Data (valid).]


Read Cycle

[Figure: timing diagram of a read cycle, showing the signals CS, WE, OE, Address and Data (valid).]


Row and Column Address Strobes

- Idea: save even more wires by sending the address in two (or more) steps

- Typical: row and column are sent separately

- RAS: Row Address Strobe, CAS: Column Address Strobe
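
A short sketch shows how many address pins the two-step scheme saves. It assumes that the address is split into two halves of (almost) equal width, with the row taken from the upper bits; the strobe pins themselves are not counted, and the function names are hypothetical.

```python
def address_pins(num_cells, multiplexed):
    """Address pins needed to select one of num_cells cells (strobes not counted)."""
    bits = (num_cells - 1).bit_length()
    return (bits + 1) // 2 if multiplexed else bits

def send_address(address, bits):
    """Return the two bus transfers: row with RAS first, then column with CAS."""
    half = (bits + 1) // 2
    row = address >> half               # upper bits form the row (assumption)
    col = address & ((1 << half) - 1)   # lower bits form the column
    return [("RAS", row), ("CAS", col)]

print(address_pins(2 ** 20, multiplexed=False))  # 20
print(address_pins(2 ** 20, multiplexed=True))   # 10
print(send_address(0b11111000000000011111, 20))  # [('RAS', 992), ('CAS', 31)]
```

The price for the saved pins is an additional step on the bus for every access.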


RAS/CAS Write Cycle

[Figure: timing diagram of a RAS/CAS write cycle, showing RAS, CAS, WE, Address (Row, then Col) and Data (valid).]


RAS/CAS Read Cycle

[Figure: timing diagram of a RAS/CAS read cycle, showing RAS, CAS, WE, Address (Row, then Col) and Data (valid).]


Bus-Bursts

✗ RAM has long latency
- RAM is often accessed sequentially

- Caches therefore are arranged in lines: a sequence of consecutive addresses (e.g. 256 bytes)

- Bus-bursts: efficient transmission of an entire cache line


Bus-Bursts

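[Figure: timing diagram of a bus burst, showing CLK, RAS, CAS, OE, Address (Row, then Col) and DATA. After the CAS latency (CL), the data words D0, D1, D2 and D3 follow one after another.]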


Double Data Rate (DDR) RAM

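[Figure: timing diagram for DDR RAM, showing CLK, RAS, CAS, OE, Address (Row, then Col) and DATA. After the CAS latency (CL), the data words D0, D1, D2 and D3 are transferred.]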


Timings

6GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 3) DR, x8 w/Therm Sen | KVR1066D3D8R7SK3/6G
6GB 1333MHz DDR3 ECC Reg w/Par CL9 DIMM (Kit of 3) SR, x4 w/Therm Sen | KVR1333D3S4R9SK3/6G
6GB 1333MHz DDR3 ECC Reg w/Par CL9 DIMM (Kit of 3) DR, x8 w/Therm Sen | KVR1333D3D8R9SK3/6G
8GB 1066MHz DDR3 Non-ECC CL7 DIMM (Kit of 2) | KVR1066D3N7K2/8G
8GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 2) DR, x4 w/Therm Sen | KVR1066D3D4R7SK2/8G
8GB 1066MHz DDR3 ECC Reg w/Parity CL7 DIMM Quad Rank, x4 w/Therm Sen | KVR1066D3Q4R7S/8G
8GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 2) QR, x8 w/Therm Sen | KVR1066D3Q8R7SK2/8G
8GB 1333MHz DDR3 ECC Reg w/Par CL9 DIMM (Kit of 2) DR, x4 w/Therm Sen | KVR1333D3D4R9SK2/8G
12GB 1066MHz DDR3 Non-ECC CL7 DIMM (Kit of 3) | KVR1066D3N7K3/12G
12GB 1066MHz DDR3 ECC CL7 DIMM (Kit of 3) with Thermal Sensor | KVR1066D3E7SK3/12G
12GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 3) DR, x4 w/Therm Sen | KVR1066D3D4R7SK3/12G
12GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 3) QR, x8 w/Therm Sen | KVR1066D3Q8R7SK3/12G
12GB 1333MHz DDR3 ECC Reg w/Par CL9 DIMM (Kit of 3) DR, x4 w/Therm Sen | KVR1333D3D4R9SK3/12G
16GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 2) QR, x4 w/Therm Sen | KVR1066D3Q4R7SK2/16G
24GB 1066MHz DDR3 ECC Reg w/Par CL7 DIMM (Kit of 3) QR, x4 w/Therm Sen | KVR1066D3Q4R7SK3/24G

HyperX DDR 333MHz and 400MHz

Description | Part Number
512MB 333MHz DDR Non-ECC CL2 (2-2-2-5-1) DIMM | KHX2700/512
512MB 400MHz DDR Non-ECC CL2 (2-3-2-6-1) DIMM | KHX3200A/512
1GB 333MHz DDR Non-ECC CL2 (2-2-2-5-1) DIMM (Kit of 2) | KHX2700K2/1G
1GB 400MHz DDR Non-ECC CL2.5 (2.5-3-3-7-1) DIMM | KHX3200/1G
1GB 400MHz DDR Non-ECC CL2 (2-3-2-6-1) DIMM | KHX3200A/1G
1GB 400MHz DDR Non-ECC CL2 (2-3-2-6-1) DIMM (Kit of 2) | KHX3200AK2/1G
2GB 400MHz DDR Non-ECC CL2.5 (2.5-3-3-7-1) DIMM (Kit of 2) | KHX3200K2/2G
2GB 400MHz DDR Non-ECC CL2 (2-3-2-6-1) DIMM (Kit of 2) | KHX3200AK2/2G

HyperX DDR2 800MHz, 900MHz, 1000MHz, 1066MHz and 1150MHz

Description | Part Number
512MB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM | KHX6400D2LL/512
512MB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM | KHX8500D2/512
1GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM | KHX6400D2/1G
1GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM | KHX6400D2LL/1G
1GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM (Kit of 2) | KHX6400D2LLK2/1G
1GB 800MHz DDR2 Non-ECC Low Lat CL4 (4-4-4-12) DIMM (NVIDIA SLI-Ready) | KHX6400D2LLK2/1GN
1GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM | KHX8500D2/1G
1GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) | KHX8500D2K2/1G
1GB 1066MHz DDR2 CL5 (5-5-5-15) DIMM (Kit of 2) (NVIDIA SLI-Ready) | KHX8500D2K2/1GN
1GB 1150MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM | KHX9200D2/1G
1GB 1200MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM | KHX9600D2/1G
2GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM | KHX6400D2/2G
2GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM | KHX6400D2LL/2G
2GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) | KHX6400D2K2/2G
2GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) Tall HS | KHX6400D2T1K2/2G
2GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM (Kit of 2) | KHX6400D2LLK2/2G
2GB 800MHz DDR2 Non-ECC Low-Lat CL4 (4-4-4-12) DIMM (NVIDIA SLI-Ready) | KHX6400D2LLK2/2GN
2GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM | KHX8500D2/2G
2GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) | KHX8500D2K2/2G
2GB 1066MHz DDR2 CL5 (5-5-5-15) DIMM (Kit of 2) (NVIDIA SLI-Ready) | KHX8500D2K2/2GN
2GB 1150MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) | KHX9200D2K2/2G
2GB 1200MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) | KHX9600D2K2/2G
2GB 800MHz DDR2 ECC Low-Latency CL4 (4-4-4-12) FBDIMM (Kit of 2) | KHX6400F2LLK2/2G
4GB 800MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) | KHX6400D2K2/4G
4GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM (Kit of 2) | KHX6400D2LLK2/4G
4GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) | KHX8500D2K2/4G
4GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 2) Tall HS | KHX8500D2T1K2/4G
4GB 1066MHz DDR2 Non-ECC CL5 (5-5-5-15) DIMM (Kit of 4) | KHX8500D2K4/4G
8GB 800MHz DDR2 Non-ECC Low-Latency CL4 (4-4-4-12) DIMM (Kit of 4) | KHX6400D2LLK4/8G

HyperX DDR3 1375MHz, 1600MHz, 1625MHz, 1800MHz, 1866MHz and 2000MHz

Description | Part Number
1GB 1375MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM | KHX11000D3LL/1G
1GB 1600MHz DDR3 Non-ECC CL9 (9-9-9-27) DIMM | KHX12800D3/1G
1GB 1625MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM | KHX13000D3LL/1G
1GB 1625MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM | KHX13000AD3LL/1G
1GB 1800MHz DDR3 Non-ECC CL8 (8-8-8-24) DIMM | KHX14400D3/1G
1GB 1800MHz DDR3 Non-ECC CL8 (8-8-8-24) DIMM | KHX14400AD3/1G
2GB 1375MHz DDR3 Non-ECC CL9 (9-9-9) DIMM (Kit of 2) | KHX11000D3K2/2G
2GB 1375MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM | KHX11000D3LL/2G
2GB 1375MHz DDR3 Non-ECC CL7 (7-7-7-20) DIMM (Kit of 2) | KHX11000D3LLK2/2G
2GB 1375MHz DDR3 Non-ECC CL7 (7-7-7-20) DIMM (Kit of 2) Intel XMP | KHX11000D3LLK2/2GX
2GB 1600MHz DDR3 Non-ECC CL9 (9-9-9-27) DIMM | KHX12800D3/2G
2GB 1600MHz DDR3 Non-ECC CL9 (9-9-9-27) DIMM (Kit of 2) | KHX12800D3K2/2G
2GB 1625MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM | KHX13000D3LL/2G
2GB 1625MHz DDR3 Non-ECC Low-Latency CL7 (7-7-7-20) DIMM (Kit of 2) | KHX13000D3LLK2/2G
2GB 1625MHz DDR3 Low Latency CL8 (8-7-7-20) DIMM (Kit of 2) NVIDIA SLI | KHX13000D3LLK2/2GN


Timings

Example: 2-2-2-5

Current standard:

1. CAS Latency
2. RAS-to-CAS Delay
3. RAS Precharge
4. Act-to-Precharge Delay
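
The four numbers are counted in clock cycles, so converting them to nanoseconds needs the clock frequency. The sketch below assumes that the MHz figure quoted for a DDR module is its data rate, so that the clock runs at half of it; the function names are made up for this illustration.

```python
NAMES = ["CAS latency", "RAS-to-CAS delay", "RAS precharge", "Act-to-precharge delay"]

def parse_timings(spec):
    """Turn a string such as '2-2-2-5' into a dict; extra fields are ignored."""
    values = [float(v) for v in spec.split("-")]
    return dict(zip(NAMES, values))

def cycles_to_ns(cycles, data_rate_mhz):
    """Convert clock cycles to nanoseconds for a DDR module."""
    clock_mhz = data_rate_mhz / 2  # DDR: two transfers per clock cycle (assumption on the quoted figure)
    return cycles * 1000.0 / clock_mhz

timings = parse_timings("2-2-2-5")
print(timings)
print(cycles_to_ns(timings["CAS latency"], 400))  # 10.0
```

Under this assumption, CL2 on a 400MHz DDR module corresponds to a CAS latency of 10 ns.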


Caches

- Recall: DRAM slow/cheap, SRAM fast/pricey

- Idea: use SRAM as fast cache for lots of DRAM

- "Hides" the latency of the slow DRAM

- Usually good hit rates (> 90 %)
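
Why a hit rate above 90 % matters can be seen from the average access time. The sketch below assumes a simple model in which a miss costs the cache lookup plus the DRAM access; the latencies of 1 ns and 100 ns are made-up values, not measurements.

```python
def average_access_time(hit_rate, cache_ns, dram_ns):
    """Average access time if every miss pays the cache lookup plus the DRAM access."""
    return cache_ns + (1.0 - hit_rate) * dram_ns

for hit_rate in (0.5, 0.9, 0.99):
    t = average_access_time(hit_rate, cache_ns=1.0, dram_ns=100.0)
    print(f"hit rate {hit_rate:.2f}: about {t:.1f} ns on average")
```

With these numbers, raising the hit rate from 90 % to 99 % cuts the average access time from about 11 ns to about 2 ns.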


Caches: Overview

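[Figure: a cache next to the main memory. Each cache line is selected by an index and stores a tag together with the data words at offsets 0 to 3; the main memory cells 0 to 9 hold example values such as 123, 456 and 789.]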


Caches: Hashing

Q: How to map the addresses?

Easiest answer: use least-significant bits

address = [ tag | index | offset ]

- tag: distinguishes lines with same index
- index: address in cache
- offset: distinguishes words in cache line
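
Splitting an address into these three fields takes only shifts and masks. The following sketch uses 2 index bits and 2 offset bits purely as an example; the function name split is hypothetical.

```python
def split(address, index_bits, offset_bits):
    """Split an address into (tag, index, offset), with the offset in the lowest bits."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset

print(split(6, index_bits=2, offset_bits=2))   # (0, 1, 2)
print(split(22, index_bits=2, offset_bits=2))  # (1, 1, 2)
```

Addresses 6 and 22 end up with the same index but different tags, so they compete for the same cache line.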


Collisions

[Figure: main memory addresses 0 to 24 mapped onto a small cache with index, tag and cache line columns. Memory blocks with different tags (0 and 1) map to the same index and collide.]


Overview of Design Options for Caches

- size
- line size: number of bytes stored together
- allocation policy: when is a new entry created?
- associativity: length of list in hash table
- replacement policy: which entries to purge
- (sectoring)
- write policy: write through or write back
- split I/D cache or unified I/D cache

We will have more options once hierarchy is added.


Cache Size

- Bigger cache → better hit rate
- Bigger caches are also more expensive and have longer paths

- Partially addressed by hierarchy (more on that later)


Line Size

- Observation: memory accesses are clustered
- I.e., subsequent accesses are often next to each other
- Cache entries have overhead: address bits plus flag bits
- Also remember the latency of memory!

✓ Reduce overhead by making cache entry bigger

- Typical size: 64 bytes (512 bits)


Associativity

- Also called "ways"

- An n-way cache can store n entries with the same address hash

- Think of the length of the list in a hash table

- This reduces the number of collisions
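
The effect of associativity can be tried out with a small simulation. This is a sketch under assumptions: the cache tracks tags only, the replacement policy is least recently used (the slides do not fix one), and the class Cache and the helper run are made up for illustration.

```python
from collections import OrderedDict

class Cache:
    """Set-associative cache model that tracks tags only (no data)."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]  # tag -> True, oldest first
        self.hits = self.misses = 0

    def access(self, address):
        line = address // self.line_size
        index, tag = line % self.num_sets, line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return True
        if len(entries) == self.ways:
            entries.popitem(last=False)  # evict the least recently used entry (assumed policy)
        entries[tag] = True
        self.misses += 1
        return False

def run(num_sets, ways, trace):
    cache = Cache(num_sets=num_sets, ways=ways, line_size=4)
    for address in trace:
        cache.access(address)
    return cache.hits, cache.misses

trace = [0, 16, 0, 16, 0, 16]  # two addresses with the same index
print(run(4, 1, trace))  # (0, 6): direct-mapped, every access collides
print(run(2, 2, trace))  # (4, 2): 2-way, same total number of lines
```

Both configurations hold four lines in total; only the 2-way variant can keep the two colliding lines at the same time.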


Associativity

[Figure: a 2-way cache in front of main memory addresses 0 to 24. Each index has two cache lines with their own tags, so two memory blocks with the same index can be cached at the same time.]


Cache Hierarchies

- Recall that fast SRAM is expensive, and bigger caches have long paths

- Thus: build a cache for the cache

- L1: closest to CPU
- L2, L3, L4: cache the next level

- Caches get bigger the closer they get to the memory


Statistics

Model        Year  L1 Cache      L2 Cache     L3 Cache   L4 Cache
80486DX      1989  8 KB joint    -            -          -
Pentium      1993  8 KB+8 KB     -            -          -
Pentium Pro  1995  8 KB+8 KB     0.25 MB      -          -
Pentium MMX  1997  16 KB+16 KB   -            -          -
Pentium II   1997  16 KB+16 KB   0.5 MB       -          -
Xeon         1998  8 KB+8 KB     0.25–1 MB    -          -
Pentium III  1999  16 KB+16 KB   0.5 MB       -          -
Pentium 4    2000  16 KB+16 KB   0.25–0.5 MB  -          -
Itanium 2    2002  16 KB+16 KB   1.5–9 MB     2 or 4 MB  -
Pentium M    2003  32 KB+32 KB   0.25 MB      -          -
Core 2 Duo   2006  32 KB+32 KB   2 MB         -          -
Core i7      2008  32 KB+32 KB   0.25 MB      8 MB       -
Core i5      2009  32 KB+32 KB   0.25 MB      8 MB       -
Core i3      2010  32 KB+32 KB   0.25 MB      4 MB       -
Atom SoC     2012  32 KB+24 KB   0.25 MB      -          -
Core M       2014  -             0.25 MB      3 MB       128 MB

Numbers are per core unless shared.
