Click here to load reader
Upload
vutuyen
View
216
Download
4
Embed Size (px)
Citation preview
ECE7660 ISA.1 Fall’05
ECE 7660: Advanced Computer Architecture
Instruction Set Architecture
Different styles of ISA.Basic issues when designing an ISA.
What a good ISA should be?
ECE7660 ISA.2 Fall’05
RECAP: MIPS Addressing Modes/Instruction Formats
op rs rt rd
immed
register
Register (direct)
op rs rt
register
Base+index
+
Memory
immedop rs rtImmediate
immedop rs rt
PC + 4
PC-relative
+
Memory
func
R-format:
I-format:
J-format:
op addr. Memory
smt
6 5 5 5 65
Rd rs func rd
Rt rs op immed
ECE7660 ISA.3 Fall’05
RECAP: Example Machine Code in Memory
while (A[i]==k) i = i+j;
Assume i, j, and k ~ $17, $18, $19 and base of A is in $3
Loop: add $20, $17, $17add $20, $20, $20add $20, $20, $3lw $21,0($20)bne $21, $19, Exitadd $17, $17, $18j Loop
Exit:
Assume the loop is placed starting at loc 80008000: 0 17 17 20 0 32
0 20 20 20 0 32
0 20 3 20 0 32
35 20 21 0
5 21 19 8
0 17 18 17 0 32
2 8000
Offset in branch is relative. Address in jump is absolute.Address in Branch or Jump instruction is word address so that they can go 4 times far as opposed to byte address.
2
2000
ECE7660 ISA.4 Fall’05
Other Representative ISAs
°Intel 8086/88 => 80286 => 80386 => 80486 => P5=> P6• 8086 few transistors to implement 16-bit microprocessor
• tried to be somewhat compatible with 8-bit microprocessor 8080
• successors added features which were missing from 8086 over next 15 years
• product of several different Intel engineers over 10 to 15 years
• Announced 1978
°VAX simple compilers & small code size =>• efficient instruction encoding
• powerful addressing modes
• powerful instructions
• few registers
• product of a single talented architect
• Announced 1977
Fall’05
Taxonomy of ISA• Stack: both operands are implicit on the top of the stack,
a data structure in which items are accessed an a last in, first out fashion.
• Accumulator: : one operand is implicit in the accumulator, a special-purpose register.
• General Purpose Register: all operands are explicit in specified registers or memory locations. Depending on where operands are specified and stored, there are three different ISA groups:
– Register-Memory: one operand in register and one in memory.Examples: IBM 360/370, Intel 80x86 family, Mototola68000;
– Memory-Memory: both operands are in memory. Example: VAX.– Register-Register (load & store): all operands, except for those in
load and store instructions, are in registers. Examples: SPARC (Sun Microsystems), MIPS, Precision Architecture (HP), PowerPC (IBM), Alpha (DEC).
ECE7660 ISA.6 Fall’05
Examples of ISA ClassesAccumulator: (earliest machines)
1 address add A acc ← acc + mem[A]
1+x address addx A acc ← acc + mem[A + x]
Stack: (HP calculator, Java virtual machines)
0 address add tos ← tos + next
General Purpose Register: (e.g. Intel 80x86, Motorola 68xxx)
2 address add A B EA(A) ← EA(A) + EA(B)
3 address add A B C EA(A) ← EA(B) + EA(C)
Load/Store: (e.g. SPARC, MIPS, PowerPC)
3 address add Ra Rb Rc Ra ← Rb + Rc
load Ra Rb Ra ← mem[Rb]
store Ra Rb mem[Rb] ← Ra
Comparison:Bytes per instruction? Number of Instructions? Cycles per instruction?
Fall’05
Example:
ISA Comparison
TOS
ALUALU ALU ALU
AccumulatorStack Reg. Set Reg. Set
Memory Memory Memory Memory
(a) Stack (b) Accumulator (c) Register-Memory (d) Reg-Reg/Load-Store
Push APush BAddPop C
Load AAdd BStore C
Load R1,AAdd R1,BStore R1,C
Load R1,ALoad R2,BAdd R3,R1,R2Store R3,C
C A+B
ALU
Memory
Add C,A,B or Add A,B
(e) Memory-Memory
ECE7660 ISA.8 Fall’05
°Since 1975 all machines use general purpose registers( Java Virtual Machine adopts Stack architecture )
°Advantages of registers
• registers are faster than memory
• registers are easier for a compiler to use
- e.g., (A*B) – (C*D) – (E*F) can do multiplies in any order vs. stack
• registers can hold variables
- memory traffic is reduced, so program is sped up (since registers are faster than memory)
- code density improves (since register named with fewer bits than memory location)
General Purpose Registers Dominate
ECE7660 ISA.9 Fall’05
More About Register
Integrated together in process chip.
In top level in memory hierarchy.
Faster to access and simpler to use.
Special role in MIPS ISA: only registers not symbolic variables can be In instructions.
The number of registers can not be too more, not be too less.
Effective use of registers is a key to program performance.
MBus Module
External Cache
DatapathRegisters
InternalCache
Control
SuperSPARC Processor
ECE7660 ISA.10 Fall’05
Machine Examples: Address & Registers
Intel 8086
VAX 11
MC 68000
MIPS
2 x 8 bit bytesAX, BX, CX, DXSP, BP, SI, DICS, SS, DSIP, Flags
2 x 8 bit bytes16 x 32 bit GPRs
2 x 8 bit bytes8 x 32 bit GPRs7 x 32 bit addr reg1 x 32 bit SP1 x 32 bit PC
2 x 8 bit bytes32 x 32 bit GPRs32 x 32 bit FPRsHI, LO, PC
acc, index, count, quotstack, stringcode,stack,data segment
r15-- program counterr14-- stack pointerr13-- frame pointerr12-- argument ptr
32
32
24
20
ECE7660 ISA.11 Fall’05
Examples of Register Usage
Number of memory addresses per typical ALU instruction
Maximum number of operands per typical ALU instruction
Examples
0 3 SPARC, MIPS, Precision Architecture, Power PC
1 2 Intel 80x86, Motorola 68000
2 2 VAX (also has 3-operand formats)
3 3 VAX (also has 2-operand formats)
ECE7660 ISA.12 Fall’05
Example:
In VAX: ADDL (R9), (R10), (R11)mem[R9] <-- mem[R10] + mem[R11]
In MIPS: lw R1, (R10); load a wordlw R2, (R11)add R3, R1, R2; R3 <-- R1+R2sw R3, (R9); store a word
ECE7660 ISA.13 Fall’05
Pros and Cons of Number of Memory Operands/OperandsRegister-register: 0 memory operands/instr, 3 (register) operands/instr
+ Simple, fixed-length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute
-Higher instruction count than architectures with memory references in instructions. Some instructions are short and bit encoding may be wasteful.
Register-memory (1,2)
+Data can be accessed without loading first. Instruction format tends to be easy to encode and yields good density.
-Operands are not equivalent since a source operand in a binary operation is destroyed. Encoding a register number and a memory Address in each instruction may restrict the number of registers. Clocks per instruction varies by operand location.
Memory-memory (3,3)
+Most compact. Doesn't waste registers for temporaries.-Large variation in instruction size, especially for three-operand
instructions. Also, large variation in work per instruction. Memory accesses create memory bottleneck.
ECE7660 ISA.14 Fall’05
Design of Instruction Set Architecture
• Memory Addressing– Byte Ordering– Memory Addressing Mode
• Type and Size of Operands• Operations in the Instruction Sets
– Operations for Media and Signal Processing
• Instructions for Control Flow• Encoding an Instruction Set
ECE7660 ISA.15 Fall’05
Memory addressing
• BYTE Addressing: – Since 1980, almost all machines, except DSPs,
are byte-addressed, providing access to byte, half-words (2 bytes), words (4 bytes), and double words (8 bytes)
• Two questions for design of ISA– For a 32-bit word, read it as four loads of bytes
from sequential byte addresses or as one load work from a single byte address. How byte address map onto words ?
– Can a word be placed on any byte boundary?
ECE7660 ISA.16 Fall’05
Addressing Objects in Memory
Two byte orderings to access large objects (half words, words, and double words):
Big Endian: address of most significant bit (msb)e.g. IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
Little Endian: address of least significant bit (lsb)e.g. Intel 80x86, DEC Vax
msb lsb
3 2 1 0 little endian word 0:
0 1 2 3 big endian word 0:
Word:
Byte ordering can be a problem when exchanging data between computers with different ordering conventions
ECE7660 ISA.17 Fall’05
BIG Endian vs Little Endian
Example 1: Memory layout of a number #ABCD
In Big Endian: CDAB
In Little Endian: ABCD
Example 2: Memory layout of a number #FF00
$1001$1000
$1001$1000
ECE7660 ISA.18 Fall’05
Byte Swap Problem
GH
EF
CD
AB 0
1
2
3
increasingbyteaddress
Big Endian
AB
CD
EF
GH 0
1
2
3
Little Endian
Each system is self-consistent, but causes problems when they needcommunicate!
Memory layout of a number of ABCDEFGH
Fall’05
Object Alignment in Memory:
• Alignment: require that objects fall on address that is multiple of their size.
• Alignment of bytes: : an access to an object of size s s bytes at byte address AA is aligned if A mod s = 0A mod s = 0. Memory is aligned on a multiple of a word or double-word boundary
• Misalignment causes extra memory accesses and HW costs
Fall’05
Memory Addressing Mode• How ISA specifies the address of an object in mem
– Operands: : they can be found in registers, memory locations, and instructions themselves (instruction stream)
– Effective Address: specifies the actual memory address when a memory location is used for an operand
– PC-Relative Addressing: : addressing modes that depend on the program counter
– Immediates/Literals: : considered as memory addressing modes, even though the value they access is in the instruction steam
– Displacement Mode: must determine the range of displacement judiciously – Immediate/Literal Mode:: must decide the level of support for all operations
or a subset and the range of values
– … …
– Modulo/Circular Mode for DSPs:: handling infinite, continuous stream of data relies on circular buffers
– Bit-Reverse Mode: : used exclusively for FFTs
ECE7660 ISA.21 Fall’05
Addressing Mode Examples
Addressing mode Example Meaning
Register Add R4,R3 R4 ← R4+R3
Immediate Add R4,#3 R4 ← R4+3
Displacement Add R4,100(R1) R4 ← R4+Mem[100+R1]
Register indirect Add R4,(R1) R4 ← R4+Mem[R1]
Indexed Add R3,(R1+R2) R3 ← R3+Mem[R1+R2]
Direct or absolute Add R1,(1001) R1 ← R1+Mem[1001]
Memory indirect Add R1,@(R3) R1 ← R1+Mem[Mem[R3]]
Auto-increment Add R1,(R2)+ R1 ← R1+Mem[R2]; R2 ← R2+d
Auto-decrement Add R1,–(R2) R2 ← R2–d; R1 ← R1+Mem[R2]
Scaled Add R1,100(R2)[R3] R1 ← R1+Mem[100+R2+R3*d]
ECE7660 ISA.22 Fall’05
Addressing Mode:
• Addressing modes have the ability to significantly reduce instruction counts
• They also add to the complexity of building a machine• The addressing modes have different usage frequencies
ECE7660 ISA.23 Fall’05
Displacement Address Size
0%
5%
10%
15%
20%
25%
30%
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15
Int. Avg. FP Avg.
Average of 5 programs from SPECint92 and Average of 5 programs from SPECfp92X-axis is in powers of 2: 8 < addresses < 16
1% of addresses > 16-bits
ECE7660 ISA.24 Fall’05
Immediate Size
• 50% to 60% fit within 8 bits
• 75% to 80% fit within 16 bits
ECE7660 ISA.25 Fall’05
Typical Operations
Data Movement Load (from memory)Store (to memory)memory-to-memory moveregister-to-register moveinput (from I/O device)output (to I/O device)push, pop (to/from stack)
Arithmetic integer (binary + decimal) or FPAdd, Subtract, Multiply, Divide
Logical not, and, or, set, clear
Shift shift left/right, rotate left/right
Control (Jump/Branch) unconditional, conditional
Subroutine Linkage call, return
Interrupt trap, return
Synchronization test & set (atomic r-m-w)
String search, translate
ECE7660 ISA.26 Fall’05
Top 10 80x86 Instructions
° Rank instruction Integer Average Percent total executed1 load 22%
2 conditional branch 20%
3 compare 16%
4 store 12%
5 add 8%
6 and 6%
7 sub 5%
8 move register-register 4%
9 call 1%
10 return 1%
Total 96%
°Simple instructions dominate instruction frequency
ECE7660 ISA.27 Fall’05
Methods of Testing Condition
° Condition CodesProcessor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions. E.g. 80x86, PowerPC, SPARC
ex: add r1, r2, r3
bz label
° Condition RegisterTests arbitrary register with the result of a comparison.
E.g. Alpha, MIPS
Ex: cmp r1, r2, r3; compare r2 with r3, 0 or 1 is stored in r1
bgt r1, label; branch on greater
° Compare and BranchCompare is part of the branch. E.g. VAX, PA-RISC
Ex: bgt r1, r2, label; if r1 > r2, then go to label
ECE7660 ISA.28 Fall’05
Condition Codes
Setting CC as side effect can reduce the # of instructions
X: ...
SUB r0, #1, r0BRP X
X: ...
SUB r0, #1, r0CMP r0, #0BRP X
versus
But also has disadvantages:
--- not all instructions set the condition codeswhich do and which do not often confusing!e.g., shift instruction sets the carry bit
--- dependency between the instruction that sets the CC and the onethat tests it: to overlap their execution, may need to separate themwith an instruction that does not change the CC
ifetch read compute write
ifetch read compute write
New CC computedOld CC read
ECE7660 ISA.29 Fall’05
Branches--- Conditional control transfers
Four basic conditions:N -- negativeZ -- zero
V -- overflowC -- carry
Sixteen combinations of the basic four conditions:AlwaysNeverNot EqualEqualGreaterLess or EqualGreater or EqualLessGreater UnsignedLess or Equal UnsignedCarry ClearCarry SetPositiveNegativeOverflow ClearOverflow Set
UnconditionalNOP~ZZ~[Z + (N V)]Z + (N V)~(N V)N V~(C + Z)C + Z~CC~NN~VV
⊗⊗
⊗⊗
ECE7660 ISA.30 Fall’05
Conditional Branch Distance
Bits of Branch Dispalcement
0%
5%
10%
15%
20%
25%
30%
35%
40%
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15
Int. Avg. FP Avg.
• Distance from branch in instructions 2i => Š ±2i-1
• 25% of integer branches are > 22
ECE7660 ISA.31 Fall’05
Conditional Branch Addressing
• PC-relative since most branches at least 8 bitssuggested (± 128 instructions)
• Compare Equal/Not Equal most important for integerprograms
Frequency of comparison
types in branches
0% 20% 40% 60% 80% 100%
EQ/NE
GT/LE
LT/GE
37%
23%
40%
86%
7%
7%
Int Avg.
FP Avg.
ECE7660 ISA.32 Fall’05
Operation Summary
• Support these simple instructions, since they will dominate the number of instructions executed:
load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8-bits long), jump, call, return;
ECE7660 ISA.33 Fall’05
Data TypesBit: 0, 1
Bit String: sequence of bits of a particular length4 bits is a nibble8 bits is a byte
16 bits is a half-word 32 bits is a word
Character:ASCII 7 bit codeEBCDIC 8 bit code (IBM)UNICODE 16 bit code (Java)
Decimal:digits 0-9 encoded as 0000b thru 1001btwo decimal digits packed per 8 bit byte
Integers:Sign & Magnitude: 0X vs. 1X1's Complement: 0X vs. 1(~X)2's Complement: 0X vs. (1's comp) + 1
Floating Point:Single PrecisionDouble PrecisionExtended Precision
Positive #'s same in allFirst 2 have two zerosLast one usually chosen
M x RE How many +/- #'s?
Where is decimal pt?How are +/- exponents
represented?
exponent
basemantissa
ECE7660 ISA.34 Fall’05
Operand Size Usage
Frequency of reference by size
0% 20% 40% 60% 80%
Byte
Halfword
Word
Doubleword
0%
0%
31%
69%
7%
19%
74%
0%
Int Avg.
FP Avg.
Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers
ECE7660 ISA.35 Fall’05
Instruction Format
• If have many memory operands per instructions and many addressing modes, need an Address Specifierper operand
• If have load-store machine with 1 address per instr.and one or two addressing modes, then just encodeaddressing mode in the opcode
ECE7660 ISA.36 Fall’05
Generic Examples of Instruction Formats
Variable:
Fixed:
Hybrid:
Op, #oprnd Addr Spec1 Addr Spec2 Addr Specn
Op, Addr Field1, Addr Field2, Addr Field3
Op, Addr Spec, Addr Field
Op, Addr Spec1, Addr Spec2, Addr Field
Op, Addr Spec1, Addr Field1, Addr Field2
…
• Variable format supports any number of operands, with each address specifier determining the addressing mode. It is preferred if code size is important• Fixed format always has the same number of operands, with the address modes; preferred if performance is important.• Hybrid has multiple formats, specified by the opcode
ECE7660 ISA.37 Fall’05
Instruction Set Metrics
Design-time metrics:
° Can it be implemented, in how long, at what cost?
° Can it be programmed? Ease of compilation?
Static Metrics:
° How many bytes does the program occupy in memory?
Dynamic Metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" a clock is practical?
Best Metric: Time to execute the program!
NOTE: this depends on instructions set, processor organization, andcompilation techniques.
CPI
Inst. Count Cycle Time
ECE7660 ISA.38 Fall’05
RISC Vs. CISC
°Determined by VLSI technology.
°Software cost goes up constantly. To be convenient for programmers.
°To shorten the semantic gap between HLL and architecture withoutadvanced compilers.
°To reduce the program length because memory was expensive.°VAX 11/780 reached the climax with >300 instructions and >20 addressing modes.
°Things changed: HLL, Advanced Compiler, Memory size, …
°Finding: 25% instructions used in 95% time.
°Size: usually <100 instructions and <5 addressing modes.
°Other properties: fixed instruction format, register based, hardware control…
°Gains: CPI is smaller, Clock cycle shorter, Hardware simpler, Pipeline easier
°Loss: Program becomes longer, but memory becomes larger and larger, cheaper and cheaper. Programmability becomes poor, but people use HLL instead of IS.
°Result: although program is prolonged, the total gain is still a plus.
RISC CISC
CISC RISC
ECE7660 ISA.39 Fall’05
CISC vs RISC: Summary
° RISC was a clear winner in the decade-long RISC/CISC debate, technically speaking
• The debate was closed by a comparison between VAX 8700 and MIPS M2000 by the designers of VAX processors in 1998
° But Intel processor survived the RISC/CISC debate• The chip volume is high enough to pay the extra design costs
• Sufficient resources to build microprocessors that translate from CISC to RISC internally (using hardware)
° Embedded market competes in both cost and power• They tend to use compilers and RISC architectures.
• More than twice as many 32-bit embedded microprocessors were shipped in 2000 than PC microprocessors and over 90% are RISC processors