ENGS 116 Lecture 31 Instruction Set Design Vincent H. Berk September 29 th, 2008 Reading for Today: Chapter 1.5 – 1.11, Mazor article Reading for Wednesday:

ENGS 116 Lecture 3

Instruction Set Design

Vincent H. Berk

September 29th, 2008

Reading for Today: Chapter 1.5 – 1.11, Mazor article

Reading for Wednesday: Appendix B.1 – B.11, Wulf article

Homework for Wednesday: 1.1, 1.3, 1.6, 1.7, 1.13


Instruction Sets

FIGURE: The instruction set is the interface between software and hardware.


Interface Design

A good interface:

• Lasts through many implementations (portability, compatibility).

• Is used in many different ways (generality).

• Provides convenient functionality to higher levels.

• Permits an efficient implementation at lower levels.

FIGURE: One interface over time, serving several uses and realized by successive implementations (imp 1, imp 2, imp 3).


Evolution of Instruction Sets

Single Accumulator (EDSAC 1950)

Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)

Separation of Programming Model from Implementation

• High-level Language Based (B5000 1963)

• Concept of a Family (IBM 360 1964)

General Purpose Register Machines

• Complex Instruction Sets (VAX, Intel 432 1977-80), leading to CISC (Intel x86, Pentium II/III/4, Core 2, AMD Athlon/Opteron)

• Load/Store Architecture (CDC 6600, Cray 1 1963-76), leading to RISC (MIPS, SPARC, 88000, IBM RS6000, … 1987)


Evolution of Instruction Sets

• Major advances in computer architecture are typically associated with landmark instruction set designs

– Ex: Stack vs General Purpose Registers (GPR)

• Design decisions must take into account:

– technology

– machine organization

– programming languages

– compiler technology

– operating systems

• Few will ever design an instruction set, but understanding ISA design decisions is important


Design Space of ISA

Five Primary Dimensions

– Number of explicit operands (0, 1, 2, 3): ISA class

– Operand storage: Where besides memory?

– Effective address: How is memory location specified?

– Type & size of operands: byte, int, float, vectors, 32 bits, 64 bits? How is it specified?

– Operations: add, sub, mul, … How is it specified?

Other Aspects

• Successor: How is it specified?

• Conditions: How are they determined?

• Encodings: Fixed or variable? Wide?

• Parallelism


Basic ISA Classes

Accumulator:

1 address      add A          acc ← acc + mem[A]

1+x address    addx A         acc ← acc + mem[A + x]

Stack:

0 address      add            tos ← tos + next

General Purpose Register:

2 address      add A B        A ← A + B

3 address      add A B C      A ← B + C

Load/Store:

3 address      add Ra Rb Rc   Ra ← Rb + Rc

               load Ra Rb     Ra ← mem[Rb]

               store Ra Rb    mem[Rb] ← Ra
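To make the four classes concrete, the sketch below computes C = A + B in each style. It is only an illustration: the function names, the dictionary used as memory, and the particular instruction sequences in the comments are assumptions chosen for this example, not taken from any real machine.

```python
# Illustrative sketch: C = A + B on four hypothetical machines.
# Memory is modeled as a dict; each function returns its instruction count.

def accumulator(mem):
    """1-address machine: load A; add B; store C."""
    acc = mem["A"]              # load  A : acc <- mem[A]
    acc = acc + mem["B"]        # add   B : acc <- acc + mem[B]
    mem["C"] = acc              # store C : mem[C] <- acc
    return 3

def stack(mem):
    """0-address machine: push A; push B; add; pop C."""
    s = []
    s.append(mem["A"])          # push A
    s.append(mem["B"])          # push B
    s.append(s.pop() + s.pop()) # add : tos <- tos + next
    mem["C"] = s.pop()          # pop C
    return 4

def register_memory(mem):
    """GPR machine with memory operands: load R1,A; add R1,B; store C,R1."""
    r1 = mem["A"]               # load  R1, A
    r1 = r1 + mem["B"]          # add   R1, B : R1 <- R1 + mem[B]
    mem["C"] = r1               # store C, R1
    return 3

def load_store(mem):
    """Load/store machine: load R1,A; load R2,B; add R3,R1,R2; store C,R3."""
    r1 = mem["A"]               # load  R1, A
    r2 = mem["B"]               # load  R2, B
    r3 = r1 + r2                # add   R3, R1, R2 (registers only)
    mem["C"] = r3               # store C, R3
    return 4

for machine in (accumulator, stack, register_memory, load_store):
    mem = {"A": 2, "B": 3, "C": 0}
    count = machine(mem)
    print(f"{machine.__name__:16s} C = {mem['C']}  ({count} instructions)")
```

All four produce the same result; what differs is how many operands each instruction names and whether arithmetic may touch memory directly.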


Primary Advantages and Disadvantages of Each Class of Machine

Stack

A: Simple model of expression evaluation (reverse Polish). Short instructions can yield good code density.

D: A stack cannot be randomly accessed. This limitation makes it difficult to generate efficient code. It’s also difficult to implement efficiently, since the stack becomes a bottleneck.

Accumulator

A: Minimizes internal state of machine. Short instructions.

D: Since the accumulator is the only temporary storage, memory traffic is highest for this approach.


Register

A: Most general model for code generation.

D: All operands must be named, leading to longer instructions.

While most early machines used stack or accumulator-style architectures, modern machines (designed in the last 10-15 years and still in use) use a general-purpose register architecture.

• Registers are faster than memory

• Registers are easier for compilers to use

• Registers can be used more effectively than other forms of internal storage


Machine Types


Machine          General-purpose registers   Architectural style           Year
EDSAC            1                           Accumulator                   1949
IBM 701          1                           Accumulator                   1953
CDC 6600         8                           Load-store                    1963
IBM 360          16                          Register-memory               1964
DEC PDP-8        1                           Accumulator                   1965
DEC PDP-11       8                           Register-memory               1970
Intel 8008       1                           Accumulator                   1972
Motorola 6800    2                           Accumulator                   1974
DEC VAX          16                          Register-memory, mem-mem      1977
Intel 8086       1                           Extended accumulator          1978
Motorola 68000   16                          Register-memory               1980
Intel 80386      8                           Register-memory               1985
MIPS             32                          Load-store                    1985
HP PA-RISC       32                          Load-store                    1986
SPARC            32                          Load-store                    1987
PowerPC          32                          Load-store                    1992
DEC Alpha        32                          Load-store                    1992

FIGURE from SECOND EDITION


Addressing Modes

• Register Ri

• Immediate (literal) v

• Direct (absolute) M[v]

• Base+Displacement M[Ri +v]

• Register indirect M[Ri]

• Base+Index (Indexed) M[Ri + Rj]

• Scaled Index M[Ri + Rj*d +v]

• Autoincrement M[Ri++]

• Autodecrement M[Ri--]

• Memory indirect M[ M[Ri] ]
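The modes listed above can be read as small rules for locating an operand. The following sketch spells them out over a toy register file and a word-addressed memory. The function name `fetch_operand`, the mode strings, and the test values are hypothetical. It also assumes autoincrement updates Ri after the access and autodecrement updates Ri before the access, which is one common convention and may differ from a given machine.

```python
# Illustrative sketch: how each addressing mode selects an operand.
# regs maps register numbers to values; mem is a list indexed by address.

def fetch_operand(mode, regs, mem, ri=0, rj=0, v=0, d=1):
    if mode == "register":           # Ri
        return regs[ri]
    if mode == "immediate":          # v
        return v
    if mode == "direct":             # M[v]
        return mem[v]
    if mode == "displacement":       # M[Ri + v]
        return mem[regs[ri] + v]
    if mode == "register_indirect":  # M[Ri]
        return mem[regs[ri]]
    if mode == "indexed":            # M[Ri + Rj]
        return mem[regs[ri] + regs[rj]]
    if mode == "scaled":             # M[Ri + Rj*d + v]
        return mem[regs[ri] + regs[rj] * d + v]
    if mode == "autoincrement":      # use M[Ri], then Ri <- Ri + d (assumed order)
        value = mem[regs[ri]]
        regs[ri] += d
        return value
    if mode == "autodecrement":      # Ri <- Ri - d, then use M[Ri] (assumed order)
        regs[ri] -= d
        return mem[regs[ri]]
    if mode == "memory_indirect":    # M[M[Ri]]
        return mem[mem[regs[ri]]]
    raise ValueError(f"unknown addressing mode: {mode}")

regs = {1: 4, 2: 3}
mem = list(range(100, 132))  # mem[a] == 100 + a ...
mem[4] = 9                   # ... except address 4, which holds a pointer to 9

print(fetch_operand("displacement", regs, mem, ri=1, v=2))        # reads mem[4 + 2]
print(fetch_operand("scaled", regs, mem, ri=1, rj=2, v=1, d=4))   # reads mem[4 + 3*4 + 1]
print(fetch_operand("memory_indirect", regs, mem, ri=1))          # reads mem[mem[4]]
```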

FIGURE: Operand locations in the register file and in memory.


Addressing Modes


Memory Alignment

Processors often require data types to be aligned on addresses that are a multiple of their size:

• address % sizeof (datatype) == 0

• bytes can be aligned anywhere

• 4 byte integers aligned on addresses divisible by 4
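The alignment rule above is a one-line check. As a small sketch (the helper names `is_aligned` and `align_up` are made up for this example), the second function rounds an address up to the next legal boundary, which is what a compiler does when it inserts padding:

```python
# Illustrative sketch of the rule: address % sizeof(datatype) == 0.

def is_aligned(address, size):
    """True if address is a multiple of the data type's size in bytes."""
    return address % size == 0

def align_up(address, size):
    """Smallest aligned address that is >= address."""
    return (address + size - 1) // size * size

for addr in (4, 6, 7, 8):
    print(addr, is_aligned(addr, 4), align_up(addr, 4))
```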

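Byte Order

• Little Endian – Little End First (Intel)

• Big Endian – Big End First (PowerPC, MIPS, network byte order)

• Bi-Endian – can do both (SPARC v9)

FIGURE: Bytes of the word ABCD in increasing address order: D C B A (little endian), A B C D (big endian).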
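Byte order is easy to observe directly. This sketch uses Python's standard `struct` module to lay out one 32-bit word both ways; the value 0x0A0B0C0D is an arbitrary choice made so that each byte is recognizable.

```python
import struct

word = 0x0A0B0C0D  # most significant byte is 0x0A, least significant is 0x0D

little = struct.pack("<I", word)   # little endian: least significant byte first
big = struct.pack(">I", word)      # big endian: most significant byte first
network = struct.pack("!I", word)  # network byte order, defined as big endian

print(little.hex())   # 0d0c0b0a
print(big.hex())      # 0a0b0c0d
print(network == big) # True

# Reading bytes back with the wrong byte order silently yields a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))   # 0xd0c0b0a
```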


Operations in the Instruction Set

• Arithmetic and logical – integer arithmetic and logical operations: add, and, subtract, or

• Data transfer – loads/stores (move instructions on machines with memory addressing)

• Control – branch, jump, procedure call and return, traps

• System – operating system call, virtual memory management instructions

• Floating point – floating-point operations: add, multiply

• Decimal – decimal add, decimal multiply, decimal-to-character conversions

• String – string move, string compare, string search

• Graphics – pixel and vertex operations


Rank    80x86 instruction         Integer average (% total executed)
1       load                      22%
2       conditional branch        20%
3       compare                   16%
4       store                     12%
5       add                       8%
6       and                       6%
7       sub                       5%
8       move register-register    4%
9       call                      1%
10      return                    1%
        Total                     96%

FIGURE B.13 The top 10 instructions for the 80x86.


Control Flow

PIC – Position Independent Code

Caller vs. Callee saving of state


Instruction Set Encoding

• Affects program size:

– Number of instructions: size of the Opcode

– Number of instructions: types of instructions

– Number of operands

– Number of registers: size of the operand fields

– Variable instruction length vs. Fixed instruction length

• Intel x86 instructions are between 1 and 17 bytes long.
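The link between these counts and instruction size can be estimated with simple arithmetic: a field that must distinguish n items needs at least ceil(log2 n) bits. The sketch below is a rough model only. The function names are hypothetical, and it assumes a fixed-format, three-register instruction with no immediate or other fields.

```python
# Illustrative sketch: minimum field widths for a fixed-format instruction.

def field_bits(n):
    """Minimum number of bits needed to distinguish n different items (n >= 1)."""
    return (n - 1).bit_length()

def three_register_bits(num_opcodes, num_registers):
    """Bits for one opcode field plus three register-operand fields."""
    return field_bits(num_opcodes) + 3 * field_bits(num_registers)

print(field_bits(64))               # opcode field for 64 instructions
print(field_bits(32))               # operand field for 32 registers
print(three_register_bits(64, 32))  # opcode + 3 register fields
print(three_register_bits(64, 16))  # halving the register count saves 3 bits
```

Doubling the number of registers adds one bit to every register field, so the cost grows with the number of operands.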



RISC vs. CISC

RISC = Reduced Instruction Set Computer

• Small instruction sets

• Fixed-length instructions that often execute in a single cycle

• Operations performed only on registers

• Simpler chip that can run at higher clock speed

CISC = Complex Instruction Set Computer

• Large instruction sets

• Complex, variable-length instructions

• Memory-to-memory operations


Design Principles CISC (Patterson, 1985)

• Richer instruction sets would simplify compilers.

• Richer instruction sets would alleviate the software crisis.

• Richer instruction sets would improve architecture quality.

– Since execution speed was proportional to program size, architectural techniques that led to smaller programs also led to faster computers.


Design Principles RISC (Patterson, 1985)

• Functions should be kept simple unless there is a very good reason to do otherwise.

• Simple decoding and pipelined execution are more important than program size.

• Compiler technology should be used to simplify instructions rather than to generate complex instructions.


A “Typical” RISC (Patterson)

• 32-bit fixed format instruction (3 formats)

• 32 64-bit general-purpose registers (R0 contains zero, double-precision numbers take two registers)

• Single address mode for load/store: base + displacement (no indirection)

• Simple branch conditions

• Delayed branch to avoid pipeline penalties

Examples: DLX, SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM/Motorola PowerPC, Motorola M88000


MIPS Instruction Formats (DLX)

R-type    op       rs       rt       rd       shamt    funct
          6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
          31-26    25-21    20-16    15-11    10-6     5-0

I-type    op       rs       rt       immediate/address
          6 bits   5 bits   5 bits   16 bits

J-type    op       target address
          6 bits   26 bits
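Packing fields into these formats is a matter of shifts and masks. The sketch below encodes two instructions using the standard MIPS32 values (add: op 0, funct 0x20; addi: op 8; registers $t0 = 8, $s1 = 17, $s2 = 18). The function names are made up for this example, and DLX opcode values may differ.

```python
# Illustrative sketch: packing and unpacking 32-bit MIPS instruction words.

def encode_r(op, rs, rt, rd, shamt, funct):
    """R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """I-type: op(6) rs(5) rt(5) immediate(16); imm is truncated to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(op, target):
    """J-type: op(6) target(26)."""
    return (op << 26) | (target & 0x03FFFFFF)

def decode_r(word):
    """Split a 32-bit word into its R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }

T0, S1, S2 = 8, 17, 18  # standard MIPS register numbers

add_word = encode_r(op=0, rs=S1, rt=S2, rd=T0, shamt=0, funct=0x20)  # add $t0, $s1, $s2
addi_word = encode_i(op=8, rs=S1, rt=T0, imm=100)                    # addi $t0, $s1, 100

print(f"{add_word:#010x}")   # 0x02324020
print(f"{addi_word:#010x}")  # 0x22280064
print(decode_r(add_word))
```

Because every field sits at a fixed bit position, decoding needs no knowledge of the instruction beyond its format, which is the property the fixed-length design is after.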


Impact of Compiler Technology on Architecture Decisions

The interaction of compilers and high-level languages significantly affects how programs use an instruction set.

1. How are variables allocated and addressed? How many registers are needed to allocate variables appropriately?

2. What is the impact of optimization techniques on instruction mixes?

3. What control structures are used and with what frequency?


Instruction Set Properties That Simplify Compiler Writing

1. Provide regularity.

2. Provide primitives, not solutions.

3. Simplify tradeoffs among alternatives.

4. Provide instructions that bind the quantities known at compile time as constants.


DEC VAX: “The penultimate CISC”

• VAX-11/780 introduced in 1977

• 2 goals:

– 32-bit extension of PDP-11 architecture (make customers comfortable)

– ease task of writing compilers and operating systems

• General-purpose register machine with large orthogonal instruction set

• 16 general-purpose registers (4 reserved)

• Large number of addressing modes, large number of instructions


• Any combination of addressing modes works with nearly every opcode

• Variable-length instructions

– 3-operand instruction may have 0 to 3 operand memory references, each of which may be any of the addressing modes

• Elaborate instructions can take dozens of clock cycles


IBM 360/370

• 360 introduced in 1964 – first to use notion of instruction set architecture (370 introduced in 1970 as successor to 360)

• Goals:

– exploit storage: large main storage, storage hierarchies

– support concurrent I/O

– create a general-purpose machine with new OS facilities and many data types

– maintain strict upward and downward machine-language compatibility

• 32-bit machine with byte addressability and support for variety of data types


• 16 32-bit, general-purpose registers

• 4 double-precision (64-bit) floating-point registers

• 5 instruction formats, each of which is associated with a single addressing mode

• Basic operations

– logic operations on bits, character strings, and fixed words

– decimal or character operations on strings of characters or decimal digits

– fixed-point binary arithmetic

– floating-point arithmetic


IBM 360


Cray


ISA Metrics

• Regularity (Orthogonality)

– No special registers, few special cases, all operand modes available with any data type or instruction type

• Primitives rather than solutions

• Completeness

– Support for a wide range of operations and target applications

• Streamlined

– Resource needs easily determined

• Ease of compilation

• Ease of implementation

• Scalability

• Density (network bandwidth and power consumption)
