The MIPS Instruction Set - egr.msu.edu · The MIPS Instruction Set ! Used as the example throughout the book ! Large share of embedded core market but dwarfed by ARM ! Typical of

The University of Adelaide, School of Computer Science 17 January 2011

Chapter 2 — Instructions: Language of the Computer 2

CSE 420 Chapter 2 — Instructions: Language of the Computer — 3

The MIPS Instruction Set n  Used as the example throughout the book n  Large share of embedded core market

but dwarfed by ARM

n  Typical of many modern ISAs n  See MIPS Reference Data tear-out card, and

Appendixes B and E


Arithmetic Operations n  Add and subtract have three operands

n  Two sources and one destination add a, b, c # a gets b + c

n  All MIPS arithmetic ops have this form n  Design Principle 1:

Simplicity favors regularity. n  Regularity makes implementation simpler n  Regularity enables higher performance

at lower cost

§2.2 Operations of the C

omputer H

ardware




Register Operands n  Arithmetic instructions use register operands n  MIPS has a 32 × 32-bit register file

n  Used for frequently accessed data n  Numbered 0 to 31

n  Assembler names n  $t0, $t1, …, $t9 for temporary values n  $s0, $s1, …, $s7 for saved variables

n  Design Principle 2: Smaller is faster.

§2.3 Operands of the C

omputer H

ardware


Register Operand Example n  C code: f = (g + h) - (i + j);

n  Register assignment? n  f in $s0, g in $s1, h in $s2, i in $s3, j in $s4 n  Temps?

n  Compiled MIPS code: add $t0, $s1, $s2 add $t1, $s3, $s4 sub $s0, $t0, $t1




Load-Store Architecture n  Data is in memory

n  Load values from memory into registers n  Store result from register to memory

n  Details: n  Memory is byte addressed

n  Each address identifies an 8-bit byte n  Words are aligned in memory

n  Word addresses must be a multiple of 4 n  MIPS is Big Endian

n  Most-significant byte at least address of a word n  c.f. Little Endian: least-significant byte at least address


Memory Operand Example 1 n  C code: g = h + A[8];

n  g in $s1, h in $s2, base address of A in $s3 n  Compiled MIPS code:

n  Index 8 requires offset of 32 Why?

lw $t0, 32($s3) # load word add $s1, $s2, $t0

offset base register




Memory Operand Example 2 n  C code: A[12] = h + A[8];

n  h in $s2, base address of A in $s3 n  Compiled MIPS code:

n  Index 8 requires offset of 32 n  Index 12 requires offset of ?

lw $t0, 32($s3) # load word add $t0, $s2, $t0 sw $t0, 48($s3) # store word


Registers vs. Memory n  Registers are faster than memory n  Load-store è

“more” instructions to be executed n  Register optimization is important!




Immediate Operands n  Constant data specified in an instruction addi $s3, $s3, 4

n  There is no “subtract immediate” instruction n  Subtract is “add a negative” addi $s2, $s1, -4

n  Design Principle 3: Make the common case fast. n  Small constants are common n  Immediate operand avoids a load instruction n  Why?


The Constant Zero n  MIPS register 0 ($zero) is the constant 0

n  Cannot be overwritten n  Useful for common operations

n  e.g., move between registers add $t2, $s1, $zero

n  Why a fixed register zero?




Sign Extension n  How do you

represent a number using more bits? e.g. move an 8-bit number into a 16-bit word

n  Replicate the sign bit to the left n  c.f. unsigned values: extend with 0s

n  Examples: 8-bit to 16-bit n  +2: 0000 0010 => 0000 0000 0000 0010 n  –2: 1111 1110 => 1111 1111 1111 1110

n  In MIPS instruction set n  addi: extend immediate value n  lb, lh: extend loaded byte/halfword n  beq, bne: extend the displacement


Representing Instructions n  Instructions are encoded in binary

n  Called “machine code”

n  MIPS instructions n  Encoded as 32-bit instruction words (Regularity!) n  Small number of formats encode

opcode, register numbers, … n  Register numbers

n  $t0 – $t7 are reg’s 8 – 15 n  $s0 – $s7 are reg’s 16 – 23 n  $t8 – $t9 are reg’s 24 – 25 n  (0-8 and 25-31 described later – procedure calls)

§2.5 Representing Instructions in the C

omputer




MIPS R-format Instructions

n  Instruction fields n  op: operation code (opcode) n  rs: first source register number n  rt: second source register number n  rd: destination register number n  shamt: shift amount (00000 for now) n  funct: function code (extends opcode) (Why?)

op rs rt rd shamt funct 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits


R-format Example

add $t0, $s1, $s2

special $s1 $s2 $t0 0 add

0 17 18 8 0 32

000000 10001 10010 01000 00000 100000

000000100011001001000000001000002 = 0232402016

op rs rt rd shamt funct 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits




Hexadecimal n  Base 16

n  Compact representation of bit strings n  4 bits per hex digit

0 0000 4 0100 8 1000 c 1100 1 0001 5 0101 9 1001 d 1101 2 0010 6 0110 a 1010 e 1110 3 0011 7 0111 b 1011 f 1111

n  Example: eca8 6420 n  1110 1100 1010 1000 0110 0100 0010 0000


MIPS I-format Instructions

n  Immediate arithmetic and load/store instructions n  rt: destination or source register number n  Constant: –215 to +215 – 1 n  Address: offset added to base address in rs

n  Design Principle 4: Good design demands good compromises. n  Different formats complicate decoding,

but allow 32-bit instruction uniformity n  Keep formats as similar as possible

op rs rt constant or address 6 bits 5 bits 5 bits 16 bits




Stored Program Computers n  Instructions represented in

binary, just like data n  Instructions and data

stored in memory n  Programs can

operate on programs n  e.g., compilers, linkers, …

n  Binary compatibility allows compiled programs to work on other computers with the same ISA.

The BIG Picture


Logical Operations n  Instructions for bitwise manipulation

Operation C Java MIPS Shift left << << sll

Shift right >> >>> srl

Bitwise AND & & and, andi

Bitwise OR | | or, ori

Bitwise NOT ~ ~ nor

n  Useful for extracting and inserting groups of bits in a word

§2.6 Logical Operations




Branch Addressing n  Branch instructions specify

n  Opcode, two registers, target address n  Most branch targets are near branch

n  Forward or backward

op rs rt constant or address 6 bits 5 bits 5 bits 16 bits

n  PC-relative addressing n  Target address = PC + offset × 4 n  PC already incremented by 4 by this time


Jump Addressing n  Jump (j and jal) targets could be

anywhere in text segment n  Encode full address in instruction

op address 6 bits 26 bits

n  (Pseudo)Direct jump addressing n  Target address = PC31…28 : (address × 4)

Morgan Kaufmann Publishers 26 January 2011

Chapter 4 — The Processor 1

Chapter 4

The Processor

CSE 420 Chapter 4 — The Processor — 2

Introduction n  CPU performance factors

n  Instruction count n  Determined by ISA and compiler

n  CPI and Cycle time n  Determined by CPU hardware

n  We will examine two MIPS implementations n  A simplified version n  A more realistic pipelined version

n  Simple subset, shows most aspects n  Memory reference: lw, sw n  Arithmetic/logical: add, sub, and, or, slt n  Control transfer: beq, j

§4.1 Introduction




Instruction Execution n  PC → instruction memory, fetch instruction n  Register numbers → register file, read registers n  Depending on instruction class

n  Use ALU to calculate n  Arithmetic result n  Memory address for load/store n  Branch target address

n  Access data memory for load/store n  PC ← target address or PC + 4


CPU Overview




Multiplexers n  Can’t just join

wires together n  Use multiplexers


Control




Clocking Methodology n  Combinational logic

transforms data during clock cycles n  Between clock edges n  Input from state elements,

output to state element n  Longest delay determines clock period


Building a Datapath n  Datapath

n  Elements that process data and addresses in the CPU

n  Registers, ALUs, mux’s, memories, …

n  We will build a MIPS datapath incrementally n  Refining the overview design

§4.3 Building a D

atapath




Instruction Fetch

32-bit register

Increment by 4 for next instruction


R-Format Instructions n  Read two register operands n  Perform arithmetic/logical operation n  Write register result




Load/Store Instructions n  Read register operands n  Calculate address using 16-bit offset

n  Use ALU, but sign-extend offset n  Load: Read memory and update register n  Store: Write register value to memory


Branch Instructions n  Read register operands n  Compare operands

n  Use ALU, subtract and check Zero output n  Calculate target address

n  Sign-extend displacement n  Shift left 2 places (word displacement) n  Add to PC + 4

n  Already calculated by instruction fetch




Branch Instructions

Just re-routes

wires

Sign-bit wire replicated


Composing the Elements n  First-cut data path

does an instruction in one clock cycle n  Each data-path element

can only do one function at a time n  Hence, we need

separate instruction and data memories n  Use multiplexers where alternate data

sources are used for different instructions




R-Type/Load/Store Datapath

Full Data Path




ALU Control n  ALU used for

n  Load/Store: F = add n  Branch: F = subtract n  R-type: F depends on funct field

§4.4 A Sim

ple Implem

entation Schem

e

ALU control Function 0000 AND 0001 OR 0010 add 0110 subtract 0111 set-on-less-than 1100 NOR


ALU Control n  Assume 2-bit ALUOp derived from opcode

n  Combinational logic derives ALU control

opcode ALUOp Operation funct ALU function ALU control lw 00 load word XXXXXX add 0010

sw 00 store word XXXXXX add 0010 beq 01 branch equal XXXXXX subtract 0110 R-type 10 add 100000 add 0010

subtract 100010 subtract 0110 AND 100100 AND 0000 OR 100101 OR 0001

set-on-less-than 101010 set-on-less-than 0111




The Main Control Unit n  Control signals derived from instruction

0 rs rt rd shamt funct 31:26 5:0 25:21 20:16 15:11 10:6

35 or 43 rs rt address 31:26 25:21 20:16 15:0

4 rs rt address 31:26 25:21 20:16 15:0

R-type

Load/ Store

Branch

opcode always read

read, except for load

write for R-type

and load

sign-extend and add

Data Path With Control




Pipelining Analogy n  Pipelined laundry: overlapping execution

n  Parallelism improves performance n  Four stages: Wash – Dry – Fold - Hang

§4.5 An O

verview of P

ipelining n  Four loads: n  Speedup

= 8/3.5 = 2.3 n  Non-stop:

n  Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages


MIPS Pipeline n  Five stages, one step per stage

1.  IF: Instruction fetch from memory 2.  ID: Instruction decode & register read 3.  EX: Execute operation or calculate address 4.  MEM: Access memory operand 5.  WB: Write result back to register




Hazards n  Situations that prevent starting the next

instruction in the next cycle n  Structural hazard

n  A required resource is busy n  Data hazard

n  Need to wait for previous instruction to complete its data read/write

n  Control hazard n  Deciding on control action

depends on previous instruction


Structure Hazards n  Conflict for use of a resource n  In MIPS pipeline with a single memory

n  Load/store requires data access n  Instruction fetch would have to stall

for that cycle n  Would cause a pipeline “bubble”

n  Hence, pipelined data paths require separate instruction & data memories n  Or separate instruction & data caches




MIPS Pipelined Datapath §4.6 P

ipelined Datapath and C

ontrol

WB

MEM

Right-to-left flow leads to hazards


Pipeline registers n  Need registers between stages

n  To hold information produced in previous cycle




Pipeline Operation n  Cycle-by-cycle flow of instructions

through the pipelined data path n  “Single-clock-cycle” pipeline diagram

n  Shows pipeline usage in a single cycle n  Highlight resources used

n  c.f. “multi-clock-cycle” diagram n  Graph of operation over time

n  We’ll look at “single-clock-cycle” diagrams for load & store


IF for Load, Store, …




ID for Load, Store, …


EX for Load




MEM for Load


WB for Load

Wrong register number




Corrected Datapath for Load


EX for Store




Pipelined Control n  Control signals derived from instruction

n  As in single-cycle implementation

Chapter 4 — The Processor — 66

Pipelined Control




Datapath with Forwarding


Load-Use Data Hazard

Need to stall for one cycle

Documents

The MIPS Instruction Set - egr.msu.edu · The MIPS Instruction Set ! Used as the example throughout the book ! Large share of embedded core market but dwarfed by ARM ! Typical of