
Lecture 3: Instruction Set Architecture

Interface

[Figure: the instruction set is the interface between software (the compiler) and hardware.]

Design Space of ISA

Five Primary Dimensions
• Number of explicit operands: ( 0, 1, 2, 3 )
• Operand Storage: Where besides memory?
• Effective Address: How is memory location specified?
• Type & Size of Operands: byte, int, float, vector, . . . How is it specified?
• Operations: add, sub, mul, . . . How is it specified?

Other Aspects
• Successor: How is it specified?
• Conditions: How are they determined?
• Encodings: Fixed or variable?
• Parallelism

ISA Metrics

• Orthogonality
– No special registers, few special cases, all operand modes available with any data type or instruction type
• Completeness
– Support for a wide range of operations and target applications
• Regularity
– No overloading for the meanings of instruction fields
• Streamlined
– Resource requirements can be easily determined

Ease of compilation

Basic ISA Classes

Accumulator:
    1 address      add A           acc ← acc + mem[A]
    1+x address    addx A          acc ← acc + mem[A + x]

Stack:
    0 address      add             tos ← tos + next

General Purpose Register:
    2 address      add A B         EA(A) ← EA(A) + EA(B)
    3 address      add A B C       EA(A) ← EA(B) + EA(C)

Load/Store:
    3 address      add Ra Rb Rc    Ra ← Rb + Rc
                   load Ra Rb      Ra ← mem[Rb]
                   store Ra Rb     mem[Rb] ← Ra
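The register-transfer notation above can be read as plain state updates. The following is a minimal sketch, assuming a list-based memory and register file; the function names (`acc_add`, `stack_add`, `ls_add`, and so on) are hypothetical and chosen only for illustration. The general purpose register class is omitted because its EA(...) operands may be either registers or memory locations.

```python
# Toy state-update models for three of the ISA classes (illustrative only).

def acc_add(acc, mem, a):
    """Accumulator, 1 address: acc <- acc + mem[A]."""
    return acc + mem[a]

def acc_addx(acc, mem, a, x):
    """Accumulator, 1+x address: acc <- acc + mem[A + x]."""
    return acc + mem[a + x]

def stack_add(stack):
    """Stack, 0 address: tos <- tos + next (both operands are popped)."""
    tos = stack.pop()
    nxt = stack.pop()
    stack.append(tos + nxt)

def ls_add(regs, ra, rb, rc):
    """Load/store, 3 address: Ra <- Rb + Rc (register operands only)."""
    regs[ra] = regs[rb] + regs[rc]

def ls_load(regs, mem, ra, rb):
    """Load/store: Ra <- mem[Rb]."""
    regs[ra] = mem[regs[rb]]

def ls_store(regs, mem, ra, rb):
    """Load/store: mem[Rb] <- Ra."""
    mem[regs[rb]] = regs[ra]

# Hypothetical example: mem[3] = mem[0] + mem[1] on the load/store machine.
mem = [5, 7, 11, 0]
regs = [0, 1, 3, 0, 0]      # r0, r1, r2 hold the addresses 0, 1, 3
ls_load(regs, mem, 3, 0)    # r3 <- mem[r0]
ls_load(regs, mem, 4, 1)    # r4 <- mem[r1]
ls_add(regs, 3, 3, 4)       # r3 <- r3 + r4
ls_store(regs, mem, 3, 2)   # mem[r2] <- r3
```

The load/store version needs four instructions for one memory-to-memory addition, but each instruction names only registers, which is what keeps its encoding regular.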

Stack Machines

• Instruction set:
    +, -, *, /, . . .
    push A, pop A
• Example: a*b - (a+c*b)
    push a
    push b
    *
    push a
    push c
    push b
    *
    +
    -

[Figure: expression tree for a*b - (a+c*b) and the stack contents after each instruction.]
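The instruction sequence above can be traced with a small interpreter. This is a minimal sketch, assuming instructions are given as strings and variable values come from a dictionary; `run_stack_program` and the sample values for a, b, and c are hypothetical.

```python
import operator

# Binary operators of the toy stack machine.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def run_stack_program(program, variables):
    """Run a list of stack instructions; return (result, stack after each step)."""
    stack = []
    trace = []
    for instr in program:
        parts = instr.split()
        if parts[0] == "push":
            stack.append(variables[parts[1]])
        elif parts[0] in OPS:
            right = stack.pop()   # top of stack
            left = stack.pop()    # next-to-top
            stack.append(OPS[parts[0]](left, right))
        else:
            raise ValueError(f"unknown instruction: {instr}")
        trace.append(list(stack))
    return stack[-1], trace

# a*b - (a+c*b), exactly as listed in the example.
PROGRAM = ["push a", "push b", "*", "push a", "push c", "push b", "*", "+", "-"]

result, trace = run_stack_program(PROGRAM, {"a": 2, "b": 3, "c": 4})
max_depth = max(len(state) for state in trace)
```

With these sample values the stack grows to four entries before the second multiply, and both a and b have to be pushed twice because their earlier copies were consumed.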

Arguments Against Stacks

• Data does not always “surface” when needed
– Constants, repeated operands, common sub-expressions, so TOP and SWAP instructions are required
• Code density is about equal to that of GPR instruction sets
– Registers have short addresses
– Keep things in registers and reuse them
• Slightly simpler to write a poor compiler, but not an optimizing compiler

Performance derived from fast registers, not the way they are used.

A "Typical" RISC

• 32-bit fixed format instruction (3 formats)
• 32 32-bit GPR (R0 contains zero, DP take pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
• Simple branch conditions
• Delayed branch

Example: MIPS

Register-Register
    bits     31-26   25-21   20-16   15-11   10-6    5-0
    field    Op      Rs1     Rs2     Rd      shamt   Opx

Register-Immediate
    bits     31-26   25-21   20-16   15-0
    field    Op      Rs1     Rd      immediate

Branch
    bits     31-26   25-21   20-16     15-0
    field    Op      Rs1     Rs2/Opx   immediate

Jump / Call
    bits     31-26   25-0
    field    Op      target

Overview of MIPS

• simple instructions all 32 bits wide
• very structured
• only three instruction formats

    R    op | rs | rt | rd | shamt | funct
    I    op | rs | rt | 16 bit address
    J    op | 26 bit address
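The three formats can be made concrete by packing fields into a 32-bit word. This is a minimal sketch, assuming the standard MIPS field widths (6/5/5/5/5/6 bits for R, 6/5/5/16 for I, 6/26 for J). The opcode, funct, and register numbers in the usage lines come from the standard MIPS encoding rather than from these slides, and the function names are hypothetical.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R format: op | rs | rt | rd | shamt | funct."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm16):
    """I format: op | rs | rt | 16 bit address (or immediate)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)

def encode_j(op, target26):
    """J format: op | 26 bit address."""
    return (op << 26) | (target26 & 0x3FFFFFF)

# Register numbers in the standard MIPS convention: $s1 = 17, $s2 = 18, $s3 = 19.
S1, S2, S3 = 17, 18, 19

add_word = encode_r(0, S2, S3, S1, 0, 0x20)   # add $s1, $s2, $s3
lw_word = encode_i(0x23, S2, S1, 100)         # lw $s1, 100($s2)
j_word = encode_j(0x02, 2500)                 # j 2500

print(f"{add_word:08x} {lw_word:08x} {j_word:08x}")
```

Every instruction is one 32-bit word, and the op field sits in the same position in all three formats, so a decoder can select the format before looking at any other field.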

Addresses in Branches and Jumps

• Instructions:
    bne $t4,$t5,Label    Next instruction is at Label if $t4 ≠ $t5
    beq $t4,$t5,Label    Next instruction is at Label if $t4 = $t5
    j Label              Next instruction is at Label
• Formats:

    I    op | rs | rt | 16 bit address
    J    op | 26 bit address

To summarize:

MIPS operands

Name:     32 registers
Example:  $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at
Comments: Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants.

Name:     2^30 memory words
Example:  Memory[0], Memory[4], ..., Memory[4294967292]
Comments: Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
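The relation between word numbers and byte addresses can be checked directly. This is a minimal sketch; the helper name `word_byte_address` is hypothetical.

```python
WORD_BYTES = 4          # a MIPS word is 4 bytes
NUM_WORDS = 2 ** 30     # number of memory words given in the operand table

def word_byte_address(index):
    """Byte address of the index-th word (0-based)."""
    if not 0 <= index < NUM_WORDS:
        raise IndexError("word index out of range")
    return index * WORD_BYTES

first_three = [word_byte_address(i) for i in range(3)]
last = word_byte_address(NUM_WORDS - 1)
```

The last word sits at byte address 2^32 - 4, which matches Memory[4294967292] in the table.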

MIPS assembly language

Category            Instruction               Example               Meaning                                  Comments
Arithmetic          add                       add $s1, $s2, $s3     $s1 = $s2 + $s3                          Three operands; data in registers
                    subtract                  sub $s1, $s2, $s3     $s1 = $s2 - $s3                          Three operands; data in registers
                    add immediate             addi $s1, $s2, 100    $s1 = $s2 + 100                          Used to add constants
Data transfer       load word                 lw $s1, 100($s2)      $s1 = Memory[$s2 + 100]                  Word from memory to register
                    store word                sw $s1, 100($s2)      Memory[$s2 + 100] = $s1                  Word from register to memory
                    load byte                 lb $s1, 100($s2)      $s1 = Memory[$s2 + 100]                  Byte from memory to register
                    store byte                sb $s1, 100($s2)      Memory[$s2 + 100] = $s1                  Byte from register to memory
                    load upper immediate      lui $s1, 100          $s1 = 100 * 2^16                         Loads constant in upper 16 bits
Conditional branch  branch on equal           beq $s1, $s2, 25      if ($s1 == $s2) go to PC + 4 + 100       Equal test; PC-relative branch
                    branch on not equal       bne $s1, $s2, 25      if ($s1 != $s2) go to PC + 4 + 100       Not equal test; PC-relative
                    set on less than          slt $s1, $s2, $s3     if ($s2 < $s3) $s1 = 1; else $s1 = 0     Compare less than; for beq, bne
                    set less than immediate   slti $s1, $s2, 100    if ($s2 < 100) $s1 = 1; else $s1 = 0     Compare less than constant
Unconditional jump  jump                      j 2500                go to 10000                              Jump to target address
                    jump register             jr $ra                go to $ra                                For switch, procedure return
                    jump and link             jal 2500              $ra = PC + 4; go to 10000                For procedure call
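The address arithmetic in the table (a branch field of 25 giving PC + 4 + 100, a jump field of 2500 giving 10000, and lui giving 100 * 2^16) follows from the fields counting words rather than bytes. This is a minimal sketch with hypothetical function names. Taking the upper four bits of a jump target from PC + 4 is the standard MIPS rule and is an assumption here, since the table does not state it.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, imm16):
    """beq/bne: PC-relative, offset counted in words from PC + 4."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF

def jump_target(pc, target26):
    """j/jal: 26-bit word address, upper 4 bits taken from PC + 4 (assumed)."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)

def lui(imm16):
    """lui: place the 16-bit constant in the upper half of the register."""
    return (imm16 & 0xFFFF) << 16

pc = 0x1000
taken = branch_target(pc, 25)      # PC + 4 + 100
backward = branch_target(pc, -1)   # offset of -1 word: back to PC itself
dest = jump_target(pc, 2500)       # 2500 words = byte address 10000
upper = lui(100)                   # 100 * 2**16
```

Counting in words lets a 16-bit branch field reach 2^15 instructions in either direction, four times farther than a byte offset of the same width would.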

Alternative Architectures

• Design alternative:
– provide more powerful operations
– goal is to reduce number of instructions executed
– danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as “RISC vs. CISC”
– virtually all new instruction sets since 1982 have been RISC
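The trade-off above can be made concrete with the usual performance equation, execution time = instruction count × CPI × cycle time. This is a minimal sketch with invented numbers chosen only to show how fewer instructions can still lose; neither design corresponds to a real machine.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time in nanoseconds: instructions x cycles/instruction x ns/cycle."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical design A: simple operations, more instructions, low CPI.
time_a = cpu_time_ns(1_000_000, 1.0, 2.0)

# Hypothetical design B: more powerful operations cut the instruction count
# by 20%, but raise the CPI from 1.0 to 1.5 at the same cycle time.
time_b = cpu_time_ns(800_000, 1.5, 2.0)

slowdown = time_b / time_a
```

In this made-up case design B executes fewer instructions yet takes 20% longer, which is the danger named in the third bullet.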

Most Popular ISA of All Time: The Intel 80x86

• 1978: 8086
– extension to 8080 (8 bit accumulator machine)
– 16 bit, additional registers
• 1980: 8087 floating point coprocessor
– adds 60 instructions
– hybrid stack/register scheme
• 1982: 80286
– 24-bit address space
– memory mapping and protection model
• 1985: 80386
– 32-bit address space
– 32-bit GP registers
– paging
• 1989-95
– 80486, Pentium, Pentium Pro
• 1997
– MMX added

80x86

• Complexity:
– Instructions from 1 to 17 bytes long
– one operand must act as both a source and destination
– one operand can come from memory
– complex addressing modes, e.g., “base or scaled index with 8 or 32 bit displacement”
• Saving grace:
– the most frequently used instructions are not too difficult to build
– compilers avoid the portions of the architecture that are slow

Intel 80x86 Integer Registers

Intel 80x86 Floating Point Registers

Usage of Intel 80x86 Floating Point Registers

                                      NASA 7    Spice
Stack (2nd operand ST(1))              0.3%      2.0%
Register (2nd operand ST(i), i>1)     23.3%      8.3%
Memory                                76.3%     89.7%

Above are dynamic instruction percentages (i.e., based on counts of executed instructions)

80x86 Addressing/Protection

80x86 Instruction Format

• 8086 in black; 80386 extensions in color

(Base reg + 2^Scale x Index reg)

80x86 Instruction Encoding: Mod, Reg, R/M Field

First address specifier: Reg=3 bits, R/M=3 bits, Mod=2 bits

reg field (w from opcode):

r    w=0    w=1 (16b)    w=1 (32b)
0    AL     AX           EAX
1    CL     CX           ECX
2    DL     DX           EDX
3    BL     BX           EBX
4    AH     SP           ESP
5    CH     BP           EBP
6    DH     SI           ESI
7    BH     DI           EDI

r/m field (depends on mod and machine mode):

r/m  mod=0             mod=1                    mod=2                     mod=3
     16b      32b      16b         32b          16b          32b
0    BX+SI    EAX      BX+SI+d8    EAX+d8       BX+SI+d16    EAX+d32      same as reg field
1    BX+DI    ECX      BX+DI+d8    ECX+d8       BX+DI+d16    ECX+d32      same as reg field
2    BP+SI    EDX      BP+SI+d8    EDX+d8       BP+SI+d16    EDX+d32      same as reg field
3    BP+DI    EBX      BP+DI+d8    EBX+d8       BP+DI+d16    EBX+d32      same as reg field
4    SI       (sib)    SI+d8       (sib)+d8     SI+d16       (sib)+d32    same as reg field
5    DI       d32      DI+d8       EBP+d8       DI+d16       EBP+d32      same as reg field
6    d16      ESI      BP+d8       ESI+d8       BP+d16       ESI+d32      same as reg field
7    BX       EDI      BX+d8       EDI+d8       BX+d16       EDI+d32      same as reg field
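Splitting the address specifier byte into its three fields is a pair of shifts and masks. This is a minimal sketch, assuming the usual x86 layout with mod in bits 7-6, reg in bits 5-3, and r/m in bits 2-0; the slide gives only the field widths, so the bit positions are an assumption here. `decode_modrm` and `reg_name` are hypothetical names.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def decode_modrm(byte):
    """Split the address specifier byte into (mod, reg, r/m)."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

def reg_name(reg, w, mode32):
    """Register selected by a 3-bit field, given the opcode's w bit and the mode."""
    if w == 0:
        return REG8[reg]
    return REG32[reg] if mode32 else REG16[reg]

def needs_sib(mod, rm, mode32):
    """A sib byte follows when mod is 0, 1, or 2 and r/m = 4 in 32-bit mode."""
    return mode32 and mod != 3 and rm == 4

mod, reg, rm = decode_modrm(0xD8)     # 0b11_011_000
operand = reg_name(reg, 1, True)      # reg = 3 with w = 1 in 32-bit mode
```

The same 3-bit value names a different register depending on w and on the machine mode, which is one reason the decoder needs state beyond the byte itself.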

80x86 Instruction Encoding: Sc/Index/Base field

sib    Index       Base
0      EAX         EAX
1      ECX         ECX
2      EDX         EDX
3      EBX         EBX
4      no index    ESP
5      EBP         if mod=0, d32; if mod≠0, EBP
6      ESI         ESI
7      EDI         EDI

Base + Scaled Index Mode
Used when: mod = 0, 1, 2 in 32-bit mode AND r/m = 4!

2-bit Scale Field
3-bit Index Field
3-bit Base Field
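The base + scaled index computation can be sketched from the three sib fields. This is a minimal sketch, assuming the usual layout with scale in bits 7-6, index in bits 5-3, and base in bits 2-0 (the slide gives only the widths) and a dictionary of register values; `decode_sib` and `sib_address` are hypothetical names.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def decode_sib(byte):
    """Split the sib byte into (scale, index, base)."""
    scale = (byte >> 6) & 0b11
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return scale, index, base

def sib_address(sib, mod, regs, disp=0):
    """Effective address: base + 2**scale * index + displacement (32-bit wrap)."""
    scale, index, base = decode_sib(sib)
    # index field 4 means "no index"
    index_part = 0 if index == 4 else regs[REG32[index]] << scale
    # base field 5 with mod = 0 means "d32 only, no base register"
    base_part = 0 if (base == 5 and mod == 0) else regs[REG32[base]]
    return (base_part + index_part + disp) & 0xFFFFFFFF

regs = {"EAX": 0x1000, "EBX": 3}      # sample register values
addr = sib_address(0x98, 0, regs)     # 0b10_011_000: EAX + 4*EBX
addr_d8 = sib_address(0x98, 1, regs, disp=8)
direct = sib_address(0x25, 0, regs, disp=0x2000)   # no index, no base
```

Two field values are reused as escapes (index 4 and base 5 with mod=0), so ESP can never be an index register and EBP as a base always carries a displacement.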

80x86 Addressing Mode Usage for 32-bit Mode

Addressing Mode                    Gcc    Espr.   NASA7   Spice   Avg.
Register indirect                  10%    10%     6%      2%      7%
Base + 8-bit disp                  46%    43%     32%     4%      31%
Base + 32-bit disp                 2%     0%      24%     10%     9%
Indexed                            1%     0%      1%      0%      1%
Based indexed + 8b disp            0%     0%      4%      0%      1%
Based indexed + 32b disp           0%     0%      0%      0%      0%
Base + Scaled Indexed              12%    31%     9%      0%      13%
Base + Scaled Index + 8b disp      2%     1%      2%      0%      1%
Base + Scaled Index + 32b disp     6%     2%      2%      33%     11%
32-bit Direct                      19%    12%     20%     51%     26%

80x86 Length Distribution

[Figure: bar chart of the percentage of instructions at each length in bytes (1 to 11) for Espresso, Gcc, Spice, and NASA7; horizontal axis from 0% to 30%.]

Instruction Counts: 80x86 v. DLX

SPEC pgm    x86               DLX               DLX÷86
gcc         3,771,327,742     3,892,063,460     1.03
espresso    2,216,423,413     2,801,294,286     1.26
spice       15,257,026,309    16,965,928,788    1.11
nasa7       15,603,040,963    6,118,740,321     0.39
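The ratio column can be recomputed from the two count columns. This is a minimal sketch; the dictionary layout and names are only illustrative, and the counts are copied from the table.

```python
# (x86 instruction count, DLX instruction count) per SPEC program.
COUNTS = {
    "gcc": (3_771_327_742, 3_892_063_460),
    "espresso": (2_216_423_413, 2_801_294_286),
    "spice": (15_257_026_309, 16_965_928_788),
    "nasa7": (15_603_040_963, 6_118_740_321),
}

# DLX count divided by x86 count, rounded to two decimals as in the table.
ratios = {name: round(dlx / x86, 2) for name, (x86, dlx) in COUNTS.items()}

for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:.2f}")
```

For three of the four programs DLX executes between 3% and 26% more instructions than the 80x86; nasa7 is the exception, where DLX executes well under half as many.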
