Skup instrukcija mikropocesora MIPS

Skup instrukcija

mikropocesora MIPS

MIPS register convention

Register 1, called $at, is reserved for the assembler, and registers 26-27, called $k0-$k1, are reserved for the operationg system.

Name Register number

Usage Preserved on call?

$zero 0 the constant value 0 n.a.

$v0-$v1

2-3 values for results and expression evalution

$a0-$a3

4-7 arguments yes

$t0-$t7

8-15 temporaries no

$s0-$s7

16-23 saved yes

$t8-$t9

24-15 more tempporaries no

$gp 28 global pointer yes

$sp 29 stack pointer yes

$fp 30 frame pointer yes

$ra 31 return address yes

MIPS operands

Name Example Comments

32 registers

$s0-$s7, $t0-$t9, $gp, $fp, $zero, $sp, $ra, $at, Hi, Lo

Fast locations for data, In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assumbler to handle large constants. Hi and Lo contain the results of multiply and divide.

230 memory words

Memory [0], Memory [4], ..., Memory [4294967292]

Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.

Fig. 13.1 MicroMIPS instruction formats and naming of the various fields.

5 bits 5 bits

31 25 20 15 0

Opcode Source 1 or base

Source 2 or dest’n

op rs rt

R 6 bits 5 bits

5 bits

6 bits

10 5 fn

jta Jump target address, 26 bits

imm Operand / Offset, 16 bits

Destination Unused Opcode ext I

inst Instruction, 32 bits

Seven R-format ALU instructions (add, sub, slt, and, or, xor, nor)Six I-format ALU instructions (lui, addi, slti, andi, ori, xori)Two I-format memory access instructions (lw, sw)Three I-format conditional branch instructions (bltz, beq, bne)Four unconditional jump instructions (j, jr, jal, syscall)

We will refer to this diagram later

A Small Set of Instructions

Instruction UsageLoad upper immediate lui rt,imm

Add add rd,rs,rt

Subtract sub rd,rs,rt

Set less than slt rd,rs,rt

Add immediate addi rt,rs,imm

Set less than immediate slti rd,rs,imm

AND and rd,rs,rt

OR or rd,rs,rt

XOR xor rd,rs,rt

NOR nor rd,rs,rt

AND immediate andi rt,rs,imm

OR immediate ori rt,rs,imm

XOR immediate xori rt,rs,imm

Load word lw rt,imm(rs)

Store word sw rt,imm(rs)

Jump j L

Jump register jr rs

Branch less than 0 bltz rs,L

Branch equal beq rs,rt,L

Branch not equal bne rs,rt,L

Jump and link jal L

System call syscall

Control transfer

Arithmetic

Memory access

100000

1213143543

2014530

323442

36373839

12Table

The MicroMIPS Instruction Set

13.2 The Instruction Execution Unit

Fig. 13.2 Abstract view of the instruction execution unit for MicroMIPS. For naming of instruction fields, see Fig. 13.1.

Data cache

Instr cache

Next addr

Control

Reg file

rs,rt,rd (rs)

Address

5 bits 5 bits

31 25 20 15 0

Opcode Source 1 or base

Source 2 or dest’n

op rs rt

R 6 bits 5 bits

5 bits

6 bits

10 5 fn

jta Jump target address, 26 bits

imm Operand / Offset, 16 bits

Destination Unused Opcode ext I

inst Instruction, 32 bits

bltz,jr

beq,bne

12 A/L, lui, lw,sw

syscall

22 instructions

13.3 A Single-Cycle Data Path

Fig. 13.3 Key elements of the single-cycle MicroMIPS data path.

Data cache

Instr cache

Next addr

Reg file

rs (rs)

Data addr

Data in 0

ALUSrc ALUFunc DataWrite

DataRead

RegInSrc

RegDst RegWrite

32 / 16

Register input

Data out

ALUOvfl

Next PC

Incr PC

Br&Jump

ALU out

Instruction fetch Reg access / decode ALU operation Data access

Register writeback

An ALU for MicroMIPS

Fig. 10.19 A multifunction ALU with 8 control signals (2 for function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation.

AddSub

Shifter

Logic unit

Logic function

Amount

Constant amount

Variable amount

ConstVar

Function class

Shift function

5 LSBs Shifted y

32-input NOR

Ovfl Zero

Shorthand symbol for ALU

Ovfl Zero

Control

0 or 1

AND 00 OR 01

XOR 10 NOR 11

00 Shift 01 Set less 10 Arithmetic 11 Logic

00 No shift 01 Logical left 10 Logical right 11 Arith right

13.4 Branching and Jumping

Fig. 13.4 Next-address logic for MicroMIPS (see top part of Fig. 13.3). Adder

jta imm

SysCallAddr

Branch condition checker

1 0 1 2 3

/ 32 BrTrue / 32

/ 30 / 30

/ 30 4 MSBs

30 MSBs

BrType

IncrPC

NextPC

/ 30 31:2

(PC)31:2 + 1 Default option

(PC)31:2 + 1 + imm When instruction is branch and condition is met

(PC)31:28 | jta When instruction is j or jal

(rs)31:2 When the instruction is jr SysCallAddr Start address of an operating system routine

Update options for PC

Lowest 2 bits of PC always 00

4 MSBs

13.5 Deriving the Control SignalsTable 13.2 Control signals for the single-cycle MicroMIPS implementation.

Control signal 0 1 2 3

RegWrite Don’t write Write

RegDst1, RegDst0 rt rd $31

RegInSrc1, RegInSrc0 Data out ALU out IncrPC

ALUSrc (rt ) imm

AddSub Add Subtract

LogicFn1, LogicFn0 AND OR XOR NOR

FnClass1, FnClass0 lui Set less Arithmetic Logic

DataRead Don’t read Read

DataWrite Don’t write Write

BrType1, BrType0 No branch beq bne bltz

PCSrc1, PCSrc0 IncrPC jta (rs) SysCallAddr

Reg file

Data cache

Next addr

Control Signal

Settings

Table 13.3

Load upper immediate Add Subtract Set less than Add immediate Set less than immediate AND OR XOR NOR AND immediate OR immediate XOR immediate Load word Store word Jump Jump register Branch on less than 0 Branch on equal Branch on not equal Jump and link System call

001111 000000 100000 000000 100010 000000 101010 001000 001010 000000 100100 000000 100101 000000 100110 000000 100111 001100 001101 001110 100011 101011 000010 000000 001000 000001 000100 000101 000011 000000 001100

1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0

00 01 01 01 00 00 01 01 01 01 00 00 00 00

01 01 01 01 01 01 01 01 01 01 01 01 01 00

1 0 0 0 1 1 0 0 0 0 1 1 1 1 1

0 1 1 0 1 0 0

00 01 10 11 00 01 10

00 10 10 01 10 01 11 11 11 11 11 11 11 10 10

0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

11 0110 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 10 00 00 00 01 11

Instruction Reg

’Sub

Instruction Decoding

Fig. 13.5 Instruction decoder for MicroMIPS built of two 6-to-64 decoders.

jrInst

norInst

sltInst

orInst

xorInst

syscallInst

andInst

addInst

subInst

RtypeInst

bltzInst jInst jalInst beqInst bneInst

sltiInst

andiInst oriInst

xoriInst luiInst

lwInst

swInst

addiInst

/ 6 / 6 op fn

Control Signal Generation

Auxiliary signals identifying instruction classes

arithInst = addInst subInst sltInst addiInst sltiInst

logicInst = andInst orInst xorInst norInst andiInst oriInst xoriInst

immInst = luiInst addiInst sltiInst andiInst oriInst xoriInst

Example logic expressions for control signals

RegWrite = luiInst arithInst logicInst lwInst jalInst

ALUSrc = immInst lwInst swInst

AddSub = subInst sltInst sltiInst

DataRead = lwInst

PCSrc0 = jInst jalInst syscallInst

Putting It All Together

Data cache

Instr cache

Next addr

Reg file

rs (rs)

Data addr

Data in 0

DataRead

RegInSrc

RegDst RegWrite

32 / 16

Register input

Data out

ALUOvfl

Next PC

Incr PC

Br&Jump

ALU out

Fig. 13.3

Control

addInst

subInstjInst

sltInst

Fig. 10.19

AddSub

Shifter

Logic unit

Logic function

Amount

Constant amount

Variable amount

ConstVar

Function class

Shift function

5 LSBs Shifted y

32-input NOR

Ovfl Zero

Shorthand symbol for ALU

Ovfl Zero

Control

0 or 1

AND 00 OR 01

XOR 10 NOR 11

00 Shift 01 Set less 10 Arithmetic 11 Logic

00 No shift 01 Logical left 10 Logical right 11 Arith right

jta imm

SysCallAddr

Branch condition checker

1 0 1 2 3

/ 32 BrTrue / 32

/ 30 / 30

/ 30 4 MSBs

30 MSBs

BrType

IncrPC

NextPC

/ 30 31:2

Fig. 13.4

4 MSBs

13.6 Performance of the Single-Cycle Design

Fig. 13.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies.

Instruction access 2 nsRegister read 1 nsALU operation 2 nsData cache access 2 nsRegister write 1 ns Total 8 ns

Single-cycle clock = 125 MHz

ALU-type

Branch

Not used

(and jr)

(except jr & jal)

R-type 44% 6 nsLoad 24% 8 nsStore 12% 7 nsBranch 18% 5 nsJump 2% 3 ns

Weighted mean 6.36 ns

How Good is Our Single-Cycle Design?

Instruction access 2 nsRegister read 1 nsALU operation 2 nsData cache access 2 nsRegister write 1 ns Total 8 ns

Single-cycle clock = 125 MHz

Clock rate of 125 MHz not impressive

How does this compare with current processors on the market?

Not bad, where latency is concerned

A 2.5 GHz processor with 20 or so pipeline stages has a latency of about

0.4 ns/cycle 20 cycles = 8 ns

Throughput, however, is much better for the pipelined processor:

Up to 20 times better with single issue

Perhaps up to 100 times better with multiple issue

A Multicycle Implementation

Fig. Single-cycle versus multicycle instruction execution.

Instr 2 Instr 1 Instr 3 Instr 4

3 cycles 3 cycles 4 cycles 5 cycles

Time saved

Instr 1 Instr 4 Instr 3 Instr 2

Time needed

Time allotted

A Multicycle Data Path

Fig. Abstract view of a multicycle instruction execution unit for MicroMIPS. For naming of instruction fields

Control

Reg file

rs,rt,rd (rs)

Address

Inst Reg

Data Reg

z Reg PC

Multicycle Data Path with Control Signals Shown

Fig. Key elements of the multicycle MicroMIPS data path.

Three major changes relative to the single-cycle data path:

1. Instruction & data caches combined

2. ALU performs double duty for address calculation

3. Registers added for intercycle data

Cache Reg file

Address

Inst Reg

Data Reg

z Reg PC

ALUSrcX

ALUFunc

MemWrite MemRead

RegInSrc

RegDst

RegWrite

ALUOvfl

PCSrc PCWrite

IRWrite

ALU out

0 1 2 3

InstData ALUSrcY

SysCallAddr

ALUZero

JumpAddr

4 MSBs

Corrections are shown in red

Clock Cycle and Control Signals Table Control signal 0 1 2 3

JumpAddr jta SysCallAddr

PCSrc1, PCSrc0 Jump addr x reg z reg ALU out

PCWrite Don’t write Write

InstData PC z reg

MemRead Don’t read Read

MemWrite Don’t write Write

IRWrite Don’t write Write

RegWrite Don’t write Write

RegDst1, RegDst0 rt rd $31

RegInSrc1, RegInSrc0 Data reg z reg PC

ALUSrcX PC x reg

ALUSrcY1, ALUSrcY0 4 y reg imm 4 imm

AddSub Add Subtract

LogicFn1, LogicFn0 AND OR XOR NOR

FnClass1, FnClass0 lui Set less Arithmetic Logic

Register file

Program counter

Execution Cycles

Table Execution cycles for multicycle MicroMIPS

Instruction Operations Signal settingsAny Read out the instruction and

write it into instruction register, increment PC

InstData = 0, MemRead = 1IRWrite = 1, ALUSrcX = 0ALUSrcY = 0, ALUFunc = ‘+’PCSrc = 3, PCWrite = 1

Any Read out rs & rt into x & y registers, compute branch address and save in z register

ALUSrcX = 0, ALUSrcY = 3ALUFunc = ‘+’

ALU type Perform ALU operation and save the result in z register

ALUSrcX = 1, ALUSrcY = 1 or 2ALUFunc: Varies

Load/Store Add base and offset values, save in z register

ALUSrcX = 1, ALUSrcY = 2ALUFunc = ‘+’

Branch If (x reg) = < (y reg), set PC to branch target address

ALUSrcX = 1, ALUSrcY = 1ALUFunc= ‘’, PCSrc = 2PCWrite = ALUZero or ALUZero or ALUOut31

Jump Set PC to the target address jta, SysCallAddr, or (rs)

JumpAddr = 0 or 1,PCSrc = 0 or 1, PCWrite = 1

ALU type Write back z reg into rd RegDst = 1, RegInSrc = 1RegWrite = 1

Load Read memory into data reg InstData = 1, MemRead = 1

Store Copy y reg into memory InstData = 1, MemWrite = 1

Load Copy data register into rt RegDst = 0, RegInSrc = 0RegWrite = 1

Fetch & PC incr

Decode & reg read

ALU oper & PC update

Reg write or mem access

Reg write for lw

The Control State Machine

Fig. The control state machine for multicycle MicroMIPS.

State 0 InstData = 0

MemRead = 1 IRWrite = 1

ALUSrcX = 0 ALUSrcY = 0 ALUFunc = ‘+’

PCSrc = 3 PCWrite = 1

Cycle 1 Cycle 3 Cycle 2 Cycle 1 Cycle 4 Cycle 5

ALU- type

lw/ sw lw

State 1

State 5 ALUSrcX = 1 ALUSrcY = 1 ALUFunc = ‘ ’ JumpAddr = %

PCSrc = @ PCWrite = #

State 8

RegDst = 0 or 1 RegInSrc = 1 RegWrite = 1

State 7

ALUSrcX = 1 ALUSrcY = 1 or 2 ALUFunc = Varies

State 6

InstData = 1 MemWrite = 1

State 4

RegDst = 0 RegInSrc = 0 RegWrite = 1

State 2

State 3

InstData = 1 MemRead = 1

Jump/ Branch

Notes for State 5: % 0 for j or jal, 1 for syscall, don’t-care for other instr’s @ 0 for j, jal, and syscall, 1 for jr, 2 for branches # 1 for j, jr, jal, and syscall, ALUZero () for beq (bne), bit 31 of ALUout for bltz For jal, RegDst = 2, RegInSrc = 1, RegWrite = 1

Note for State 7: ALUFunc is determined based on the op and fn f ields

Speculative calculation of branch address

Branches based on instruction

State and Instruction Decoding

Fig. State and instruction decoders for multicycle MicroMIPS.

jrInst

norInst

sltInst

orInst xorInst

syscallInst

andInst

addInst

subInst

RtypeInst

bltzInst jInst

jalInst beqInst bneInst

sltiInst

andiInst oriInst xoriInst luiInst

lwInst

swInst

andiInst

/ 6 / 6 op fn

ControlSt0 ControlSt1 ControlSt2 ControlSt3 ControlSt4 ControlSt5

ControlSt8

ControlSt6 1

0 1 2 3 4 5

12 13 14 15

8 9 10

ControlSt7

addiInst

Control Signal Generation

Certain control signals depend only on the control state

ALUSrcX = ControlSt2 ControlSt5 ControlSt7RegWrite = ControlSt4 ControlSt8

Auxiliary signals identifying instruction classes

addsubInst = addInst subInst addiInstlogicInst = andInst orInst xorInst norInst andiInst oriInst xoriInst

Logic expressions for ALU control signals

AddSub = ControlSt5 (ControlSt7 subInst)FnClass1 = ControlSt7 addsubInst logicInst

FnClass0 = ControlSt7 (logicInst sltInst sltiInst)

LogicFn1 = ControlSt7 (xorInst xoriInst norInst)

LogicFn0 = ControlSt7 (orInst oriInst norInst)

Performance of the Multicycle Design

Fig. The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies.

ALU-type

Branch

Not used

(and jr)

(except jr & jal)

R-type 44% 4 cyclesLoad 24% 5 cyclesStore 12% 4 cyclesBranch 18% 3 cyclesJump 2% 3 cycles

Contribution to CPIR-type 0.444 = 1.76Load 0.245 = 1.20Store 0.124 = 0.48Branch 0.183 = 0.54Jump 0.023 = 0.06

_____________________________

Average CPI 4.04

How Good is Our Multicycle Design?

Clock rate of 500 MHz better than 125 MHz of single-cycle design, but still unimpressive

How does the performance compare with current processors on the market?

Not bad, where latency is concerned

A 2.5 GHz processor with 20 or so pipeline stages has a latency of about 0.4 20 = 8 ns

Throughput, however, is much better for the pipelined processor:

Up to 20 times better with single issue

Perhaps up to 100 with multiple issue

R-type 44% 4 cyclesLoad 24% 5 cyclesStore 12% 4 cyclesBranch 18% 3 cyclesJump 2% 3 cycles

Contribution to CPIR-type 0.444 = 1.76

Load 0.245 = 1.20Store 0.124 = 0.48Branch 0.183 = 0.54Jump 0.023 = 0.06

_____________________________

Average CPI 4.04

Cycle time = 2 nsClock rate = 500 MHz

Microprogramming

State 0 InstData = 0

MemRead = 1 IRWrite = 1

Cycle 1 Cycle 3 Cycle 2 Cycle 1 Cycle 4 Cycle 5

ALU- type

lw/ sw lw

State 1

State 8

State 7

State 6

State 4

State 2

State 3

Jump/ Branch

Notes for State 5: % 0 for j or jal, 1 for syscall, don’t-care for other instr’s @ 0 for j, jal, and syscall, 1 for jr, 2 for branches # 1 for j, jr, jal, and syscall, ALUZero () for beq (bne), bit 31 of ALUout for bltz For jal, RegDst = 2, RegInSrc = 1, RegWrite = 1

Note for State 7: ALUFunc is determined based on the op and fn f ields

The control state machine resembles a program (microprogram)

Microinstruction

Fig. Possible 22-bit microinstruction format for MicroMIPS.

PC control

Cache control

Register control

ALU inputs

JumpAddr PCSrc

PCWrite

InstData MemRead

MemWrite IRWrite

FnType LogicFn

AddSub ALUSrcY

ALUSrcX RegInSrc

RegDst RegWrite

Sequence control

ALU function

Symbolic Names for Microinstruction Field ValuesTable Microinstruction field values and their symbolic names. The default value for each unspecified field is the all 0s bit pattern.

Field name Possible field values and their symbolic names

PC control0001 1001 x011 x101 x111

PCjump PCsyscall PCjreg PCbranch PCnext

Cache control0101 1010 1100

CacheFetch CacheStore CacheLoad

Register control1000 1001 1011 1101

rt Data rt z rd z $31 PC

ALU inputs*000 011 101 110

PC 4 PC 4imm x y x imm

ALU function*

0xx10 1xx01 1xx10 x0011 x0111

x1011 x1111 xxx00

Seq. control01 10 11

PCdisp1 PCdisp2 PCfetch* The operator symbol stands for any of the ALU functions defined above (except for “lui”).

10000 10001 10101 11010

Control Unit for Microprogramming

Fig. Microprogrammed control unit for MicroMIPS .

Microprogram memory or PLA

op (from instruction register) Control signals to data path

Address 1

MicroPC

Sequence control

Dispatch table 1

Dispatch table 2

Microinstruction register

fetch: ---------------

andi: ----------

Multiway branch

64 entries in each table

Microprogram for MicroMIPS

Fig. The complete MicroMIPS microprogram.

fetch: PCnext, CacheFetch, PC+4 # State 0 (start)PC + 4imm, PCdisp1 # State 1

lui1: lui(imm) # State 7luirt z, PCfetch # State 8lui

add1: x + y # State 7addrd z, PCfetch # State 8add

sub1: x - y # State 7subrd z, PCfetch # State 8sub

slt1: x - y # State 7sltrd z, PCfetch # State 8slt

addi1: x + imm # State 7addirt z, PCfetch # State 8addi

slti1: x - imm # State 7sltirt z, PCfetch # State 8slti

and1: x y # State 7andrd z, PCfetch # State 8and

or1: x y # State 7orrd z, PCfetch # State 8or

xor1: x y # State 7xorrd z, PCfetch # State 8xor

nor1: x y # State 7norrd z, PCfetch # State 8nor

andi1: x imm # State 7andirt z, PCfetch # State 8andi

ori1: x imm # State 7orirt z, PCfetch # State 8ori

xori: x imm # State 7xorirt z, PCfetch # State 8xori

lwsw1: x + imm, mPCdisp2 # State 2lw2: CacheLoad # State 3

rt Data, PCfetch # State 4sw2: CacheStore, PCfetch # State 6j1: PCjump, PCfetch # State 5jjr1: PCjreg, PCfetch # State 5jrbranch1: PCbranch, PCfetch # State 5branchjal1: PCjump, $31PC, PCfetch # State 5jalsyscall1:PCsyscall, PCfetch # State 5syscall

Exception Handling

Exceptions and interrupts alter the normal program flow

Examples of exceptions (things that can go wrong):

ALU operation leads to overflow (incorrect result is obtained) Opcode field holds a pattern not representing a legal operation Cache error-code checker deems an accessed word invalid Sensor signals a hazardous condition (e.g., overheating)

Exception handler is an OS program that takes care of the problem

Derives correct result of overflowing computation, if possible Invalid operation may be a software-implemented instruction

Interrupts are similar, but usually have external causes (e.g., I/O)

Exception Control States

Fig. Exception states 9 and 10 added to the control state machine.

State 0 InstData = 0 MemRead = 1

IRWrite = 1 ALUSrcX = 0 ALUSrcY = 0 ALUFunc = ‘+’

Cycle 1 Cycle 3 Cycle 2 Cycle 4 Cycle 5

ALU- type

lw/ sw lw

State 1

State 8

State 7

State 6

State 4

State 2

State 3

Jump/ Branch

State 10 IntCause = 0

CauseWrite = 1 ALUSrcX = 0 ALUSrcY = 0 ALUFunc = ‘ ’ EPCWrite = 1 JumpAddr = 1

State 9 IntCause = 1

CauseWrite = 1 ALUSrcX = 0 ALUSrcY = 0 ALUFunc = ‘ ’ EPCWrite = 1 JumpAddr = 1

Illegal operation

Overflow

Single-Cycle Data Path

Fig. Key elements of the single-cycle MicroMIPS data path.

Data cache

Instr cache

Next addr

Reg file

rs (rs)

Data addr

Data in 0

DataRead

RegInSrc

RegDst RegWrite

32 / 16

Register input

Data out

ALUOvfl

Next PC

Incr PC

Br&Jump

ALU out

Clock rate = 125 MHzCPI = 1 (125 MIPS)

Multicycle Data Path

Fig. Key elements of the multicycle MicroMIPS data path.

Clock rate = 500 MHz

CPI 4 ( 125 MIPS)

Cache Reg file

Address

Inst Reg

Data Reg

z Reg PC

ALUSrcX

ALUFunc

MemWrite MemRead

RegInSrc

RegDst

RegWrite

ALUOvfl

PCSrc PCWrite

IRWrite

ALU out

0 1 2 3

InstData ALUSrcY

SysCallAddr

ALUZero

JumpAddr

4 MSBs

Getting the Best of Both Worlds

Single-cycle:Clock rate = 125 MHz

CPI = 1

Multicycle:Clock rate = 500 MHz

Pipelined:Clock rate = 500 MHz

Pipelining Concepts

Fig. Pipelining in the student registration process.

Approval Cashier Registrar ID photo Pickup

Start here

1 2 3 4 5

Strategies for improving performance

1 – Use multiple independent data paths accepting several instructions that are read out at once: multiple-instruction-issue or superscalar

2 – Overlap execution of several instructions, starting the next instruction before the previous one has run to completion: (super)pipelined

Pipelined Instruction Execution

Fig. Pipelining in the MicroMIPS instruction execution process.

Cycle 7 Cycle 6 Cycle 5 Cycle 4 Cycle 3 Cycle 2 Cycle 1 Cycle 8

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Cycle 9

Instr cache

Data cache

Time dimension

Task dimension

Alternate Representations of a Pipeline

Fig. Two abstract graphical representations of a 5-stage pipeline executing 7 tasks (instructions).

(a) Task-time diagram (b) Space-time diagram

Instruction

Pipeline stage

1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 11

Start-up region

Drainage region

a a a a a a a

w w w w w w w

d d d d d d d

r r r r r r r

f f f f f f f

f = Fetch r = Reg read a = ALU op d = Data access w = Writeback

Except for start-up and drainage overheads, a pipeline can execute one instruction per clock tick; IPS is dictated by the clock frequency

Pipelining Example in a Photocopier

Example

A photocopier with an x-sheet document feeder copies the first sheet in 4 s and each subsequent sheet in 1 s. The copier’s paper path is a 4-stage pipeline with each stage having a latency of 1s. The first sheet goes through all 4 pipeline stages and emerges after 4 s. Each subsequent sheet emerges 1s after the previous sheet. How does the throughput of this photocopier vary with x, assuming that loading the document feeder and removing the copies takes 15 s.

Solution

Each batch of x sheets is copied in 15 + 4 + (x – 1) = 18 + x seconds. A nonpipelined copier would require 4x seconds to copy x sheets. For x > 6, the pipelined version has a performance edge. When x = 50, the pipelining speedup is (4 50) / (18 + 50) = 2.94.

Pipeline Stalls or Bubbles

Fig. Read-after-write data dependency and its possible resolution through data forwarding .

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

$5 = $6 + $7

$8 = $8 + $6

$9 = $8 + $2

sw $9, 0($3)

Data forwarding

Instr cache

Data cache

First type of data dependency

Inserting Bubbles in a Pipeline

Without data forwarding, three bubbles are needed to resolve a read-after-write data dependency

Reg file

Reg file ALU

Reg file

Reg file ALU

Reg file

Reg file ALU

Reg file

Reg file ALU

Reg file

Reg file ALU

Cycle 9

Instr cache

Data cache

Time dimension

Task dimension

Bubble

Writes into $8

Reads from $8

Reg file

Reg file ALU

Reg file

Reg file ALU

Reg file

Reg file ALU

Reg file

Reg file ALU

Reg file

Reg file ALU

Cycle 9

Instr cache

Data cache

Time dimension

Task dimension

Bubble

Writes into $8

Reads from $8

Two bubbles, if we assume that a register can be updated and read from in one cycle

Second Type of Data Dependency

Fig. Read-after-load data dependency and its possible resolution

through bubble insertion and data forwarding.

Data mem

Instr mem

Reg f ile

Reg file ALU

Data mem

Instr mem

Reg f ile

Reg file ALU

Data mem

Instr mem

Reg f ile

Reg file ALU

sw $6, . . .

lw $8, . . .

Insert bubble?

$9 = $8 + $2

Data mem

Instr mem

Reg f ile

Reg file ALU

Reorder?

Without data forwarding, three (two) bubbles are needed to resolve a read-after-load data dependency

Control Dependency in a Pipeline

Fig. Control dependency due to conditional branch.

Data mem

Instr mem

Reg f ile

Reg file ALU

Data mem

Instr mem

Reg f ile

Reg file ALU

Data mem

Instr mem

Reg f ile

Reg file ALU

$6 = $3 + $5

beq $1, $2, . . .

Insert bubble?

$9 = $8 + $2

Data mem

Instr mem

Reg f ile

Reg file ALU

Reorder? (delayed branch)

Assume branch resolved here

Here would need 1-2 more bubbles

Pipeline Timing and Performance

Fig. Pipelined form of a function unit with latching overhead.

Stage 1

Stage 2

Stage 3

Stage q 1

Stage q

Function unit

Latching of results

Fig. Throughput improvement due to pipelining as a function of the number of pipeline stages for different pipelining overheads.

Throughput Increase in a q-Stage Pipeline

1 2 3 4 5 6 7 8 Number q of pipeline stages

Ideal: /t = 0

/t = 0.1

/t = 0.05

tt / q +

q1 + q / t

Assume that one bubble must be inserted due to read-after-load dependency and after a branch when its delay slot cannot be filled.Let be the fraction of all instructions that are followed by a bubble.

Pipeline Throughput with Dependencies

q(1 + q / t)(1 + )Pipeline speedup =

R-type 44% Load 24% Store 12% Branch 18% Jump 2%

Example

Calculate the effective CPI for MicroMIPS, assuming that a quarter of branch and load instructions are followed by bubbles.

Solution

Fraction of bubbles = 0.25(0.24 + 0.18) = 0.105CPI = 1 + = 1.105 (which is very close to the ideal value of 1)

Pipelined Data Path Design

Fig. Key elements of the pipelined MicroMIPS data path. ALU

Data cache

Instr cache

Next addr

Reg file

rs (rs)

Data addr

ALUSrc ALUFunc DataWrite DataRead

RegInSrc

RegDst RegWrite

ALUOvfl

IncrPC

Br&Jump

1 Incr

NextPC

SeqInst

RetAddr

Stage 1 Stage 2 Stage 3 Stage 4 Stage 5

Address

Pipelined Control

Fig. Pipelined control signals. ALU

Data cache

Instr cache

Next addr

Reg file

rs (rs)

Data addr

ALUSrc ALUFunc

DataWrite DataRead

RegInSrc

RegDst RegWrite

ALUOvfl

IncrPC

Br&Jump

1 Incr

NextPC

SeqInst

RetAddr

Address

Optimal Pipelining

Fig. Higher-throughput pipelined data path for MicroMIPS and the execution of consecutive instructions in it .

Reg file

Data cache

Instr cache

Data cache

Instr cache

Data cache

Instr cache

Reg file

Reg f ile ALU

Reg file

Reg f ile ALU

Reg file

Reg f ile ALU

Instruction fetch

Register readout

ALU operation

Data read/store

Register writeback

MicroMIPS pipeline with more than four-fold improvement

Optimal Number of Pipeline Stages

Fig. Pipelined form of a function unit with latching overhead.

Stage 1

Stage 2

Stage 3

Stage q 1

Stage q

Function unit

Latching of results Assumptions:

Pipeline sliced into q stagesStage overhead is q/2 bubbles per branch (decision made midway)Fraction b of all instructions are taken branches

Derivation of q opt

Average CPI = 1 + b q / 2

Throughput = Clock rate / CPI = [(t / q + )(1 + b q / 2)] –1/2

Differentiate throughput expression with respect to q and equate with 0

q opt = [2(t / ) / b)]

Data Dependencies and Hazards

Fig. Data dependency in a pipeline.

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Cycle 9

$2 = $1 - $3

Instructions that read register $2

Instr cache

Data cache

Fig. When a previous instruction writes back a value computed by the ALU into a register, the data dependency can always be resolved through forwarding.

Resolving Data Dependencies via Forwarding

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Cycle 9

$2 = $1 - $3

Instr cache

Data cache

Fig. When the immediately preceding instruction writes a value read out from the data memory into a register, the data dependency cannot be resolved through forwarding (i.e., we cannot go back in time) and a bubble must be inserted in the pipeline.

Certain Data Dependencies Lead to Bubbles Cycle 7 Cycle 6 Cycle 5 Cycle 4 Cycle 3 Cycle 2 Cycle 1 Cycle 8

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Reg f ile

Reg file ALU

Cycle 9

lw $2,4($12)

Instr cache

Data cache

Data Forwarding

Fig. Forwarding unit for the pipelined MicroMIPS data path.

Data cache

RegInSrc4

RegWrite4 RetAddr3

ALUSrc1

Stage 3 Stage 4 Stage 5 Stage 2

y4 x3 y3

RegWrite3

Reg file

RegInSrc3

ALUSrc2

RetAddr3, RegWrite3, RegWrite4 RegInSrc3, RegInSrc4

Forwarding unit, upper

Forwarding unit, lower

Design of the Data Forwarding Units

Fig. Forwarding unit for the pipelined MicroMIPS data path.

Data cache

RegInSrc4

RegWrite4 RetAddr3

ALUSrc1

Stage 3 Stage 4 Stage 5 Stage 2

y4 x3 y3

RegWrite3

Reg file

RegInSrc3

ALUSrc2

RetAddr3, RegWrite3, RegWrite4 RegInSrc3, RegInSrc4

Forwarding unit, upper

Forwarding unit, lower

RegWrite3 RegWrite4 s2matchesd3 s2matchesd4 RetAddr3 RegInSrc3 RegInSrc4 Choose

0 0 x x x x x x2

0 1 x 0 x x x x2

0 1 x 1 x x 0 x4

0 1 x 1 x x 1 y4

1 0 1 x 0 1 x x3

1 0 1 x 1 1 x y3

1 1 1 1 0 1 x x3

Table Partial truth table for the upper forwarding unit in the pipelined MicroMIPS data path.

Let’s focus on designing the upper data forwarding unit

Incorrect in textbook

Hardware for Inserting Bubbles

Fig. Data hazard detector for the pipelined MicroMIPS data path.

Stage 3 Stage 2

Reg file

Data hazard detector

Control signals from decoder

DataRead2

Instr cache

LoadPC

Stage 1

PC Inst reg

All-0s

Bubble

Controls or all-0s

Pipeline Branch Hazards

Software-based solutions

Compiler inserts a “no-op” after every branch (simple, but wasteful)

Branch is redefined to take effect after the instruction that follows it

Branch delay slot(s) are filled with useful instructions via reordering

Hardware-based solutions

Mechanism similar to data hazard detector to flush the pipeline

Constitutes a rudimentary form of branch prediction:Always predict that the branch is not taken, flush if mistaken

More elaborate branch prediction strategies possible

Branch Prediction

Predicting whether a branch will be taken

Always predict that the branch will not be taken

Use program context to decide (backward branch is likely taken, forward branch is likely not taken)

Allow programmer or compiler to supply clues

Decide based on past history (maintain a small history table); to be discussed later

Apply a combination of factors: modern processors use elaborate techniques due to deep pipelines

A Simple Branch Prediction Algorithm

Fig. Four-state branch prediction scheme.

Not taken

Predict taken

Predict taken again

Predict not taken

Not taken Taken

Taken Not taken

Example

L1: ---- ----L2: ---- ---- br <c2> L2 ---- br <c1> L1

20 iter’s

10 iter’sImpact of different branch prediction schemes

Solution

Always taken: 11 mispredictions, 94.8% accurate1-bit history: 20 mispredictions, 90.5% accurate2-bit history: Same as always taken

Hardware Implementation of Branch Prediction

Fig. Hardware elements for a branch prediction scheme.

The mapping scheme used to go from PC contents to a table entry is the same as that used in direct-mapped caches (Chapter 18)

Compare

Addresses of recent branch instructions

Target addresses

History bit(s) Low-order

bits used as index

Logic From PC

Incremented PC

Next PC

Read-out table entry

Advanced Pipelining

Fig. Dynamic instruction pipeline with in-order issue, possible out-of-order completion, and in-order retirement.

Deep pipeline = superpipeline; also, superpipelined, superpipeliningParallel instruction issue = superscalar, j-way issue (2-4 is typical)

Stage 1

Instr cache

Instruction fetch

Function unit 1

Function unit 2

Function unit 3

Stage 2 Stage 3 Stage 4 Variable # of stages Stage q 2 Stage q 1 Stage q

Ope- rand prep

Instr decode

Retirement & commit stages

Instr issue

Stage 5

Performance Improvement for Deep Pipelines

Hardware-based methods

Lookahead past an instruction that will/may stall in the pipeline(out-of-order execution; requires in-order retirement)

Issue multiple instructions (requires more ports on register file)Eliminate false data dependencies via register renamingPredict branch outcomes more accurately, or speculate

Software-based method

Pipeline-aware compilationLoop unrolling to reduce the number of branches

Loop: Compute with index i Loop: Compute with index i

Increment i by 1 Compute with index i + 1

Go to Loop if not done Increment i by 2

Go to Loop if not done

CPI Variations with Architectural Features

Table Effect of processor architecture, branch prediction methods, and speculative execution on CPI.

Architecture Methods used in practice CPI

Nonpipelined, multicycle Strict in-order instruction issue and exec 5-10

Nonpipelined, overlapped In-order issue, with multiple function units 3-5

Pipelined, static In-order exec, simple branch prediction 2-3

Superpipelined, dynamic Out-of-order exec, adv branch prediction 1-2

Superscalar 2- to 4-way issue, interlock & speculation 0.5-1

Advanced superscalar 4- to 8-way issue, aggressive speculation 0.2-0.5

Development of Intel’s Desktop/Laptop Micros

In the beginning, there was the 8080; led to the 80x86 = IA32 ISA

Half a dozen or so pipeline stages

802868038680486Pentium (80586)

A dozen or so pipeline stages, with out-of-order instruction execution

Pentium ProPentium IIPentium IIICeleron

Two dozens or so pipeline stages

Pentium 4

More advanced technology

Instructions are broken into micro-ops which are executed out-of-order but retired in-order

Dealing with Exceptions

Exceptions present the same problems as branches

How to handle instructions that are ahead in the pipeline?(let them run to completion and retirement of their

results)

What to do with instructions after the exception point?(flush them out so that they do not affect the state)

Precise versus imprecise exceptions

Precise exceptions hide the effects of pipelining and parallelism by forcing the same state as that of strict sequential execution

(desirable, because exception handling is not complicated)

Imprecise exceptions are messy, but lead to faster hardware(interrupt handler can clean up to offer precise

exception)

The Three Hardware Designs for MicroMIPS

Data cache

Instr cache

Next addr

Reg file

rs (rs)

Data addr

Data in 0

DataRead

RegInSrc

RegDst RegWrite

32 / 16

Register input

Data out

ALUOvfl

Next PC

Incr PC

Br&Jump

ALU out

Single-cycle

Cache Reg file

Address

Inst Reg

Data Reg

z Reg PC

ALUSrcX

ALUFunc

MemWrite MemRead

RegInSrc

RegDst

RegWrite

ALUOvfl

PCSrc PCWrite

IRWrite

ALU out

0 1 2 3

InstData ALUSrcY

SysCallAddr

ALUZero

JumpAddr

4 MSBs

Multicycle

Data cache

Instr cache

Next addr

Reg file

rs (rs)

Data addr

ALUSrc ALUFunc

DataWrite DataRead

RegInSrc

RegDst RegWrite

ALUOvfl

IncrPC

Br&Jump

1 Incr

NextPC

SeqInst

RetAddr

Pipelined

125 MHzCPI = 1

500 MHzCPI 4

500 MHzCPI 1.1

Main MIPS assembly language instruction set

Category

Instruction Example Meaning Comments

Arithmetic

add add $sl, $s2, $s3

$s1 = $s2 + $s3

Three operands; overflow detected

subtract sub $s1, $s2, $s3

$s1 = $s2 - $s3 Three operands; overflow detected

add immediate addi $s1, $s2, 100

$s1 = $s2 + 100

+ constant; overflow detected

add unsigned addu $s1, $s2, $s3

$s1 = $s2 + $s3

Three operands; overflow undetected

subtract unsigned subu $sl, $s2, $s3

$s1 = $s2 - $s3 Three operands; overflow undetected

add immediate unsigned

addiu $s1, $s2, 100

$s1 = $s2 + 100

+ constant; overflow undetected

move from coporoc. reg.

mfc0 $s1, $epc $s1 = $epc Used to copy Exception PC plus other special registers

multiply mult $s2, $s3 Hi, Lo = $s2 * $s3

64-bit signed product in Hi, Lo

multiply unsigned multu $s2, $s3 Hi, Lo = $s2 * $s3

64-bit unsigned product in Hi, Lo

divide div $s2, $s3 Lo = $s2/ $s3, Hi = $s2 mod $s3

Lo = quotient, Hi = remainder

divide unsigned divu $s2, $s3 Lo = $s2/ $s3, Hi = $s2 mod $s3

Unsigned quotient and remainder

move from Hi mfhi $s1 $s1 = Hi Used to get copy of Hi

move from Lo mflo $s1 $s1 = Lo Used to get copy of Lo

Category

Logical and and $s1, $s2, $s3

$s1 = $s2 & $s3 Three reg. operands; logical AND

or or $s1, $s2, $s3

$s1 = $s2 ! $s3 Three reg. operands; logical OR

and immediate

andi $s1, $s2, 100

$s1 = $s2 & 100 Logical AND reg. constant

or ori $s1, $s2, 100

$s1 = $s2 ! 100 Logical OR reg. constant

shiftr left logical

sll $s1, $s2, 10 $s1 = $s2 << 10 Shift left by constant

shift right logical

srl $s1, $s2, 10 $s1 = $s2 >> 10 Shift right by constant

Data transfer

load word lw $s1, 100($s2)

$s1 = Memory [$s2+100]

Word from memory to register

store word sw $s1, 100($s2)

Memory [$s2+100]= $s1

Word from register to memory

load byte unsigned

lbu $s1, 100($s2)

$s1 = Memory [$s2+100]

Byte from memory to register

store byte sb $s1, 100($s2)

Memory [$s2+100]= $s1

Byte from register to memory

load upper immediate

lui $s1, 100 $s1 = 100*216 Loads constant in upper 16 bits

Main MIPS assembly language instruction set– cont.

Category

Conditional branch

branch on equal

beq $s1, $s2, 25

if ($s1 ==$s2) go to PC + 4 + 100

Equal test; PC relative branch

branch on not equal

bne $s1, $s2, 25

if ($s1!=$s2) go to PC + 4 + 100

Not equal test; PC relative

set on less than

slt $s1, $s2, $s3

if($s2< $s3) $s1 = 1; else $s1 = 0

Compare less than; two's complement

set less than immediate

slti $s1, $s2, 100

if ($s2< 100) $s1 = 1; else $s1 = 0

Compare < constant; two's complement

set less than unsigned

sltu $s1, $s2, $s3

if($s2< $s3) $s1 = 1; else $s1 = 0

Compare less than; natural numbers

set less than unsigned immediate

sltiu $s1, $s2, 100

if($s2< 100) $s1 = 1; else $s1 = 0

Compare < constant; natural numbers

Unconditi-onal jump

jump j 2500 go to 10000 Jump to target address

jump register jr $ra go to $ra For switch, procedure return

jump and link jal 2500 $ra =PC+4; go to 10000

For procedure call

Main MIPS assembly language instruction set– cont.

MIPS machine Language Example

Name Format

6 bits

5 bits

5 bits 5 bits 5 bits

6 bits

Coments

add R 0 2 3 1 0 32 add $1, $2, $3

sub R 0 2 3 1 0 34 sub $1, $2, $3

addi I 8 2 1 100 addi $1, $2,100

addu R 0 2 3 1 0 33 addu $1, $2, $3

subu R 0 2 3 1 0 35 subu $1, $2, $3

addiu I 9 2 1 100 addiu $1,$2,100

mfc0 R 16 0 1 14 0 0 mfc0 $1, $epc

mult R 0 2 3 0 0 24 mult $2, $3

multu R 0 2 3 0 0 25 multu $2, $3

div R 0 2 3 0 0 26 div $2, $3

divu R 0 2 3 0 0 27 divu $2, $3

mfhi R 0 0 0 1 0 16 mfhi $1

mflo R 0 0 0 1 0 18 mflo $1

and R 0 2 3 1 0 36 and $1, $2, $3

or R 0 2 3 1 0 37 or $1, $2, $3

Example

Name Format

6 bits 5 bits 5 bits 5 bits

5 bits

6 bits Coments

andi I 12 2 1 100 andi $1, $2,100

ori I 13 2 1 100 ori $1, $2,100

sll R 0 0 2 1 10

0 sll $1, $2, 10

srl R 0 0 2 1 10

2 srl $1, $2, 10

lw I 35 2 1 100 lw $1,100($2)

sw I 43 2 1 100 sw $1,100($2)

lui I 15 0 1 100 lui $1,100

beq I 4 1 2 25 beq $1,$2,100

bne I 5 1 2 25 bne $1,$2,100

slt R 0 2 3 1 0 42 slt $1, $2, $3

slti I 10 2 1 100 slti $1,$2,100

sltu R 0 2 3 1 0 43 sltu $1, $2, $3

sltiu I 11 2 1 100 sltiu $1,$2,100

j J 2 2500 j 10000

jr R 0 31 0 0 0 8 jr $31

jal J 3 2500 jal 10000

MIPS machine Language – cont.

MIPS Instruction formats

Name Flelds

Comments

Field size

6 bits

5 bits 5 bits

6 bits

All MIPS instructions 32 bits

R-format

op rs rt rd shamt

funct Arithemetic instruction format

I-Format

op rs rt address/immediate Arithemetic instruction format

J-Format

op target address Jump instruction format

Main MIPS machine language, Formats and examples are shown, with values in each field: op and funct fields form the opcode (each 6 bits), rs field gives a source register (5 bits), rt is also normally a source register (5 bits), rd is the destination register (5 bits), and shamt supplies the shift amount (5 bits). The field values are all in decimal. Floating-point machine language instructions are shown in Figure 4.47 on page 291. Appendix A gives the full MIPS machine language.

MIPS floating-point operands

Name Example Comments

32 floating point registers

$f0, $f1, $f2..., $f31,

MIPS floating -point registers are used in pairs for double precision numbers.

230 memory words

Memory [0], Memory [4], ....,Memory [4294967292]

Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.

MIPS floating-point assembly language

Category

Arthmetic

FP add single add.s $f2, $f4, $f6

$f2 = $f4 + $f6 FP add (single precision)

FP subtract single

sub.s $f2, $f4, $f6

$f2 = $f4 - $f6 FP sub (single precision)

FP multiply single

mul.s $f2, $f4, $f6

$f2 = $f4 * $f6 FP. multiply(single precision)

FP divide single

div.s $f2, $f4, $f6

$f2 = $f4 / $f6 FP divide (single precision)

FP add double add.d $f2, $f4, $f6

$f2 = $f4 + $f6 FP add (double precision)

FP subtract double

sub.d $f2, $f4, $f6

$f2 = $f4 - $f6 FP sub (double precision)

FP multiply double

mul.d $f2, $f4, $f6

$f2 = $f4 * $f6 FP multiply (double precis.)

FP divide double

div.d $f2, $f4, $f6

$f2 = $f4 / $f6 FP divide (double precis.)

Data transfer

load word copr.1

lwc1 $f1, 100($s2)

$f1 = Mem [$s2+100] 32-bit data to FP register

store word copr.1

swc1 $f1,100($s2)

Mem[$s2+100] =$f1 32-bit data to memory

Conditional branch

branch on FP true

bclt 25 if(cond==1) go to PC+4+100

PC relative branch if FP cond.

branch on FP false

bclf 25 if(cond==0) go to PC+4+100

PC relative branch if not cond.

FP compare single (eq.ne.lt.le.gt.ge)

c.lt.s $f2, $f4 if($f2 < $f4) cond=1; else cond = 0

FP compare less than single precision

FP compare double

c.lt.d $f2, $f4 if($f2 <$f4) (eq.ne.lt.le.gt.ge) cond=1; else cond = 0

FP compare less than single precision

MIPS floating-point machine language

Name Format

Example Comments

add.s R 17 16 6 4 2 0 add.s $f2, $f4 $f6

sub.s R 17 16 6 4 2 1 sub.s $f2, $f4 $f6

mul.s R 17 16 6 4 2 2 mul.s $f2, $f4 $f6

div.s R 17 16 6 4 2 3 div.s $f2, $f4 $f6

add.d R 17 17 6 4 2 0 add.d $f2, $f4 $f6

sub.d R 17 17 6 4 2 1 sub.d $f2, $f4 $f6

mul.d R 17 17 6 4 2 2 mul.d $f2, $f4 $f6

div.d R 17 17 6 4 2 3 div.d $f2, $f4 $f6

lwc1 I 49 20 2 100 lwc1 $f2, 100($s4)

swc1 I 57 20 2 100 swc1 $f2, 100($s4)

bc1t I 17 8 1 25 bc1t 25

bc1f I 17 8 0 25 bc1f 25

c.lt.s R 17 16 4 2 0 60 c.lt.s $f2, $f4

c.lt.d R 17 17 4 2 0 60 c.lt.d $f2, $f4

Field size

6 bits

5 bits

5 bits 5 bits 5 bits 6 bits All MIPS instruction 32 bits

Arithmetic and Logical Instructions

Addition (with overflow)

add rd,rs,rt 6 5 5 5 5 6

0 rs rt rd 0 0x20

Absolute value

abs rdest,rsrc pseudoinstruction

Put the absolute value of register rsrc in register rdest.

Addition (without overflow)

addu rd,rs,rt 6 5 5 5 5 6

Put the sum of registers rs and rt into register rd.

0 rs rt rd 0 0x21

Addition immediate (without overflow)

addi rt,rs,imm 6 5 5 16

Arithmetic and Logical Instructions – con.

8 rs rt imm

Addition immediate (without overflow)

addi rt,rs,imm 6 5 5 16

Put the sum of registers rs and sign-extended immediate into register rt.

9 rs rt imm

and rd,rs,rt 6 5 5 5 5 6

Put the logical AND of registers rs and rt into register rd.

And immediate

andi rd,rs,imm 6 5 5 16

Put the logical AND of register rs and the zero-extended immediate into register rt.

0xc rs rt imm

0 rs rt rd 0 0x24

Divide (with overflow)

div rs,rt 6 5 5 10 6

Divide (with overflow)

div rdest,rsrc 1,src2 pseudoinstruction

Divide (without overflow)

divu rs,rt 6 5 5 10 6

0 rs rt 0 0x1a

0 rs rt 0 0x1b

Divide (without overflow)

div rdest,rsrc 1,src2 pseudoinstruction

Put the quotient of register rsrc1 and src2 into register rdest.

Divide register rs by register rt. Leave the quotient in register lo and the remainder in register hi.

Multiply

mult rs,rt 6 5 5 10 6

Multiply (without overflow)

mul rdest,rsrc 1,src2 pseudoinstruction

Unsigned multiply

multu rs,rt 6 5 5 10 6

0 rs rt 0 0x18

0 rs rt 0 0x19

Unsigned multiply (with overflow)

mulou rdest,rsrc1,src2 pseudoinstruction

Put the product of register rsrc1 and src2 into register rdest.

Multiply (with overflow)

mulo rdest,rsrc 1,src2 pseudoinstruction

Multiply register rs and rt. Leave the low-order word of the product in register lo and the high-order word in register hi.

nor rd,rs,rt 6 5 5 5 5 6

Put the logical NOR of registers rs and rt into register rd.

0 rs rt rd 0 0x27

Negate value (with overflow)

neg rdest,rsrc pseudoinstruction

not rdest,rsrc pseudoinstruction

Put the sum of registers rs and rt into register rd.

Negate value (without overflow)

negu rdest,rsrc pseudoinstruction

Put the negative of register rsrc into register rdst.

or rd,rs,rt 6 5 5 5 5 6

Put the logical OR of registers rs and rt into register rd.

OR immediate

ori rt,rs,imm 6 5 5 16

Put the logical OR of register rs and the zero-extended immediate into register rt.

0xd rs rt imm

0 rs rt rd 0 0x25

Remainder

rem rdest,rsrc1,rsrc2 pseudoinstruction

Unsigned remainder

remu rdest,rsrc1,rsrc2 pseudoinstructionPut the remainder od register rsrc1 divided by register rsrc2 into register rdest. Note that if an operand is negative, the remainder is unspecified by the MIPS architecture and depends on the convention of the machine on which SPIM is run.

Shift left logical

sll rd,rt,shamt 6 5 5 5 5 6

0 rs rt rd shamt 0

Shift left logical variable

sllv rd,rt,rs 6 5 5 5 5 6

0 rs rt rd 0 4

Shift right arithmetic

sra rd,rt,shamt 6 5 5 5 5 6

0 rs rt rd shamt 3

Shift right arithmetic variable

srav rd,rt,rs 6 5 5 5 5 6

0 rs rt rd 0 7

Shift right logical

sra rd,rt,shamt 6 5 5 5 5 6

0 rs rt rd shamt 2

Shift right logical variable

srlv rd,rt,rs 6 5 5 5 5 6

Shift register rt left (right) by the distance indicated by immediate shamt or the register rs and put the result in register rd.

0 rs rt rd 0 6

Rotate left

rol rdest,rsrc1,rsrc2 pseudoinstruction

Rotate right

ror rdest,rsrc1,rsrc2 pseudoinstructionRotate register rsrc1 left(right) by the distance indicated by rsrc2 and put the result in register rdest.

Subtract (with overflow)

sub rd,rs,rt 6 5 5 5 5 6

Subtract (without overflow)

subu rd,rs,rt 6 5 5 5 5 6

Put the difference of register rs and rt into register rd.

0 rs rt rd 0 0x22

0 rs rt rd 0 0x23

Exclusive OR

xor rd,rs,rt 6 5 5 5 5 6

Put the logical XOR of registers rs and the zero-extended immediate into register rt.

0 rs rt rd 0 0x26

XOR immediate

xori rt,rs,imm 6 5 5 16

Put the logical XOR of register rs and the zero-extended immediate into register rt.

0xe rs rt imm

Constant-Manipulating Instructions

Load upper immediate

lui rt,imm 6 5 5 16

Load the lower halfword of the immediate imm into the upper halfword of registers rt. The lower bits of the register are set to 0.

0xf rs rt imm

Load immediate

li rdest,imm pseudoinstructionMove the immediate imm into register rdest.

Comparison Instructions Set less than

slt rd,rs,rt 6 5 5 5 5 6

Set less than unsigned

sltu rd,rs,rt 6 5 5 5 5 6

Set register rd to 1 if register rs is less than rt, and to 0 otherwise.

0 rs rt rd 0 0x2a

0 rs rt rd 0 0x2b

Set less than immediate

slti rd,rs,imm 6 5 5 16

Set less than unsigned immediate

sltiu rd,rs,imm 6 5 5 16

Set register rd to 1 if register rs is less than the sign-extended immediate, and to 0 otherwise.

0xb rs rd imm

0xa rs rd imm

Comparison Instructions – cont.

Set greater than equal

sge rdest,rsrc1,rsrc2 pseudoinstruction

Set equal

seq rdest,rsrc1,rsrc2 pseudoinstruction

Set register rdest to 1 if register rsrc1 equals rsrc2, and to 0 otherwise.

Set greater than equal unsigned

sgeu rdest,rsrc1,rsrc2 pseudoinstructionSet register rdest to 1 if register rsrc1 is greater than or equal to rsrc2, and to 0 otherwise.

Set greater than

sgt rdest,rsrc1,rsrc2 pseudoinstruction

Comparison Instructions – cont.

Set greater than unsigned

sgtu rdest,rsrc1,rsrc2 pseudoinstructionSet register rdest to 1 if register rsrc1 is greater than rsrc2, and to 0 otherwise.

Set less than equal

sle rdest,rsrc1,rsrc2 pseudoinstruction

Set less than equal unsigned

sleu rdest,rsrc1,rsrc2 pseudoinstructionSet register rdest to 1 if register rsrc1 is less than or equal to rsrc2, and to 0 otherwise.

Set not equal

sne rdest,rsrc1,rsrc2 pseudoinstructionSet register rdest to 1 if register rsrc1 is not equal to rsrc2, and to 0 otherwise.

Branch Instructions

Branch Instruction

b label pseudoinstructionUnconditionally branch to the instruction at the label.

Branch coprocessor z true

bczt label 6 5 5 16

Branch coprocessor z false

bczf label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if z's condition flag is true (false). z is 0, 1, 2, or 3. The floating-point unit is z = 1.

0x1z 8 1 offset

0x1z 8 0 offset

Branch Instructions – cont.

Branch on equal

beq rs,rt,label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if register rs equals rt.

Branch on greater than equal zero

bgez rs,label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if register rs is greater than or equal to 0.

4 rs rt offset

1 rs 1 offset

Branch on greater than equal zero and link

bgezal rs,label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if register rs is greater than or equal to 0. Save the address of the next instruction in register 31.

1 rs 0x11 offset

Branch on greater than zero

bgtz rs,label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if register rs is greater than 0.

Branch on less than equal zero

blez rs,label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if register rs is less than or equal to 0.

7 rs 0 offset

6 rs 0 offset

Branch on less than equal zero and link

bltzal rs,label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if register rs is less than 0. Save the address of the next instruction in register 31.

1 rs 0x10 offset

Branch on less than zero

bltz rs,label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if register rs is less than 0.

Branch on not equal

bne rs,label 6 5 5 16

Conditionally branch the number of instructions specified by the offset if register rs is not equal to rt.

1 rs 0 offset

5 rs rt offset

Branch on equal zero

beqz rsrc,label pseudoinstruction

Conditionally branch to the instructions at the label if rsrc1 equals 0.

Branch on greater than equal

bge rsrc1,rsrc2,label pseudoinstruction

Branch on greater than equal unsigned

bgeu rsrc1,rsrc2,label pseudoinstruction

Conditionally branch to the instruction at the label if register rsrc1 is greater than or equal to rsrc2.

Branch on greater than

bgt rsrc1,rsrc2,label pseudoinstruction

Branch on greater than unsigned

bgtu rsrc1,rsrc2,label pseudoinstruction

Conditionally branch to the instruction at the label if register rsrc1 is greater than rsrc2.

Branch Instructions – cont. Branch on less than equal

ble rsrc1,rsrc2,label pseudoinstruction

Branch on less than equal unsigned

bleu rsrc1,rsrc2,label pseudoinstructionConditionally branch to the instruction at the label if register rsrc1 is less than or equal to rsrc2.

Branch on less than

blt rsrc1,rsrc2,label pseudoinstruction

Branch on less than unsigned

bltu rsrc1,rsrc2,label pseudoinstructionConditionally branch to the instruction at the label if register rsrc1 is less than rsrc2. Branch on not equal zero

bnez rsrc,label pseudoinstructionConditionally branch to the instruction at the label if register rsrc is not equal to 0.

Jump Instructions

J target 6 26

Unconditionally jump to the instruction at target.

Jump and link

jal target 6 26

Unconditionally jump to the instruction at target. Save the address of the next instruction in register rd.

2 target

Jump and link register

jalr rs,rd 6 5 5 5 5 6

Unconditionally jump to the instruction whose address is in register rs. Save the address of the next instruction in register rd (which defaults to 31).

0 rs 0 rd 0 9

Jump Instructions – cont. Jump register

jr rs 6 5 5 16

Unconditionally jump to the instruction whose address is in register rs.

0x0 rs 0 0x8

Load address

la rdest,address pseudoinstructionLoad computed address-not the contents of the location-into register rdest.

Load byte

lb rt,address 6 5 5 16

0x20 rs rt offset

Load unsigned byte

lbu rt,address 6 5 5 16

Load the byte at address into register rt. The byte is sign-extended by lb, but not by lbu.

0x24 rs rt offset

Load halfword

lh rt,address 6 5 5 16

0x21 rs rt offset

Load unsigned halfword

lhu rt,address 6 5 5 16

Load the 16-bit quantity (halfword) at address into register rt. The halfword is sign-extended by lh, but not by lhu.

0x25 rs rt offset

Jump Instructions – cont.

Load word

lw rt,address 6 5 5 16

Load the 32-bit quantity (word) at address into register rt.

0x23 rs rt offset

Jump Instructions – cont.

Load word coprocessor

lwcz rt,address 6 5 5 16

Load the word at address into register rt of coprocessor z (0-3). The floating-point unit is z=1.

0x3z rs rt offset

Load word left

lwl rt,address 6 5 5 16

0x22 rs rt offset

Load word right

lwr rt,address 6 5 5 16

Load the left (right) bytes from the word at the possibly unaligned address into register rt.

0x23 rs rt offset

Jump Instructions – cont. Load doubleword

ld rdest,address pseudoinstruction Load the 64-bit quantity at address into register rdest and rdest + 1.

Unaligned load halfword

ulh rdest,address pseudoinstruction

Unsigned load halfword unaligned

ulhu rdest,address pseudoinstruction Load the 16-bit quantity (halfword) at the possibly unaligned address into register rdest. The halfword is sign-extended by ulh, but not ulhu.

Unsigned load word

ulw rdest,address pseudoinstruction Load the 32-bit quantity (word) at the possibly unaligned address into register rdest.

Store Instructions Store byte

sb rt,address 6 5 5 16

Store the low byte from register rt at address.

0x28 rs rt 0x8

Store word

sw rt,address 6 5 5 16

Store the word from register rt at address.

0x2b rs rt offset

Store word coprocessor

swcz rt,address 6 5 5 16

Store the word from register rt of coprocessor z at address. The floating point unit is z = 1.

0x3(1-z) rs rt offset

Store halfword

sh rt,address 6 5 5 16

Store the low halfword from register rt at address.

0x29 rs rt 0x8

Store Instructions – cont. Store word left

swl rt,address 6 5 5 16

0x2a rs rt 0x8

Store doublewordsd rsrc,address pseudoinstruction

Store the 64-bit quantity in registers rsrc and rsrc + 1 address.

Store word right

swr rt,address 6 5 5 16

Store the left (right) bytes from register rt at the possibly unaligned address.

0x29 rs rt 0x8

Unaligned store halfwordush rsrc,address pseudoinstruction Store the low halfword from register rsrc at the possibly unaligned address.

Unaligned store wordusw rsrc,address pseudoinstruction Store the word from register rsrc at the possibly unaligned address.

Data Movement Instructions Move

move rdest,rsrc pseudoinstructionMove register rsrc to rdest.

Move from hi

mfhi rd 6 10 5 5 6

0 0 rd 0 0x10

Move from lo

mflo rd 6 10 5 5 6

Move the hi (lo) register to register rd.

0 0 rd 0 0x12

Move to hi

mthi 6 5 15 6

0 rs 0 0x11

Move to lo

mtlo 6 5 15 6

Move register rs to the hi (lo) register.

Data Movement Instructions – cont.

0 rs 0 0x11

Move from coprocessor z

mfcz rt,rd 6 5 5 5 11

Move coprocessor z's register rd to CPU register rt. The floating-point unit is coprocessor z=1.

0x1z 0 rt rd 0

Move double from coprocessor zmfcl,d rdest,rsrc1 pseudoinstructionMove floating-point registers frsrc1 and frsrc1 + 1 to CPU registers rdest and rdest + 1.

Move to coprocessor z

mtcz rt,rd 6 5 5 5 11

Move CPU register rt to coprocessor z's register rd.

0x1z 0 rt rd 0

Floating-Point Instructions Floating-point absolute value double

abs.d fd,fs 6 5 5 5 5 6

0x11 1 0 fs fd 5

Floating-point absolute value single

abs.s fd,fs 6 5 5 5 5 6

Compute the absolute value of the floating-point double (single) in register fs and put it in register fd.

0x11 1 0 fs fd 5

Floating-point addition double

add.d fd,fs,ft 6 5 5 5 5 6

0x11 1 0 fs fd 5

Floating-point addition single

add.s fd,fs,ft 6 5 5 5 5 6

Compute the sum of the floating-point double (single) in register fs and ft and put it in register fd.

0x11 1 0 fs fd 5

Floating-Point Instructions – cont. Compare equal double

c.eq.d fs,ft 6 5 5 5 5 2 4

0x11 1 ft fs fd FC 2

Compare equal single

c.eq.s fs,ft 6 5 5 5 5 2 4

Compare the floating-point double in register fs against the one in ft and set the floating-point condition flag true if they are equal.

Compare less than equal double

c.le.d fs,ft 6 5 5 5 5 2 4

Compare less than equal single

c.le.s fs,ft 6 5 5 5 5 2 4

Compute the floating-point double in register fs against the one in ft and set the floating-point condition flag true if the first is less than or equal to the second. Use the bclt or bclf instructions to test the value of this flag.

Floating-Point Instructions – cont. Compare less than double

c.lt.d fs,ft 6 5 5 5 5 2 4

0x11 1 ft fs 0 FC 2

Compare less than single

c.lt.s fs,ft 6 5 5 5 5 2 4

Compute the floating-point double in register fs against the one in ft and set the condition flag true if the first is less than the second. Use the bclt or bclf instructions to test the value of this flag.

0x11 0 ft fs 0 FC 2

Convert single to double

cvt.d.s fd,fs 6 5 5 5 5 6

0x11 1 0 fs fd 0x21

Convert integer to double

cvt.d.s fd,fs 6 5 5 5 5 6

Convert the single precisioion floating-point number or integer in register fs to a double precision number and put it in register fd.

0x11 1 0 fs fd 0x21

Floating-Point Instructions – cont. Convert double to single

cvt.s.d fd,fs 6 5 5 5 5 6

0x11 1 0 fs fd 0x20

Convert integer to single

cvt.s.w fd,fs 6 5 5 5 5 6

Convert the double precisioion floating-point number or integer in register fs to a single precision number and put it in register fd.

0x11 0 0 fs fd 0x20

Convert double to integer

cvt.w.d fd,fs 6 5 5 5 5 6

0x11 1 0 fs fd 0x24

Convert single to integer

cvt.w.s fd,fs 6 5 5 5 5 6

Convert the double or single precisioion floating-point number in register fs to an integer and put it in register fd.

0x11 0 0 fs fd 0x24

Floating-Point Instructions – cont.

Floating-point divide double

div.d fd,fs,ft 6 5 5 5 5 6

0x11 1 ft fs fd 3

Floating-point divide single

div.s fd,fs,ft 6 5 5 5 5 6

Compute the quotient of the floating-point doubles (singles) in registers fs and ft and put it in register fd.

0x11 0 ft fs fd 3

Load floating-point double

l.d fdest,address pseudoinstruction

Load floating-point single

l.s fdest,address pseudoinstruction

Load the floating-point double (single) at address into register fdest.

Move floating-point double

mov.d fd,fs 6 5 5 5 5 6

0x11 1 0 fs fd 6

Move floating-point single

mov.s fd,fs 6 5 5 5 5 6

Move the floating-point double (single) from register fs to register fd.

0x11 0 0 fs fd 6

Floating-point multiply double

mul.d fd,fs,ft 6 5 5 5 5 6

0x11 1 ft fs fd 2

Floating-point multiply single

mul.s fd,fs,ft 6 5 5 5 5 6

Compute the product of the floating-point dobule (single) in registers fs and ft and put it in register fd.

0x11 0 ft fs fd 2

Negate double

neg.d fd,fs 6 5 5 5 5 6

0x11 1 ft fs fd 7

Negate single

neg.s fd,fs 6 5 5 5 5 6

Negate the floating-point double (single) in register fs and put it in register fd.

0x11 0 0 fs fd 7

Store floating-point double

s.d fdest,address pseudoinstruction

Store floating-point single

s.s fdest,address pseudoinstructionStore the floating-point double (single) in register fdest at address.

Floating-point subtract double

sub.d fd,fs,ft 6 5 5 5 5 6

0x11 1 ft fs fd 1

Floating-point subtract single

sub.s fd,fs,ft 6 5 5 5 5 6

Compute the difference of the floating-point double (single) in registers fs and ft and put it in register fd.

0x11 0 ft fs fd 1

Floating-Point Instructions Return from exception

rfe 6 1 19 6

Restore the Status register.

0x10 1 0 0x20

System call

syscall 6 20 6

Register $v0 contains the number of the system call.

0 0 0xc

break 6 20 6

Cause exception code. Exception 1 is reserved for the debugger.

0 code 0xd

No operation

nop 6 5 5 5 5 6

Do nothing.

0 0 0 0 0 0

Skup instrukcija mikropocesora MIPS

Documents

an introduc on to the MIPS processors, use, and set …cs.unc.edu/Research/stc/FAQs/Projectors/ProjectionDesign/MIPS/MIPS... · The MIPS processors feature unique functionality and

7 - Tok Izvrsavanja Instrukcija i Instrukcijski Paralelizam

Polar Instrukcija

soongsil - PLDWorld.comDigital IF Processing Baseband Modem 1000 MIPS ~ 50 ~ 1000 MIPS 1000 ~ 2000 MIPS 10 ~ 50 MIPS (Speech) 1000 MIPS (Video) - FPGA - Programmable Dedicated H/W

1. Interkatedarski znanstveni skup organizacije i menadžmenta s

MIPS Assembly What is MIPS?

® Aviva SKUP

MIPS Hello World MIPS Assembly 1 - Virginia Tech

JPA, JPC, JPD, NS, PF, MQ · JPA, JPC, JPD, NS, PF, MQ Pumps and boosters Saugumo instrukcija ir kita svarbi informacija GRUNDFOS INSTRUKCIJA JPA, JPC, JPD Installation and operating

PETI MEĐUNARODNI ZNANSTVENI SKUP IKONOGRAFSKIH STUDIJA VLADAR I IKONOGRAFIJA … · 2011-05-15 · peti meĐunarodni znanstveni skup ikonografskih studija vladar i ikonografija moĆi

9. Međunarodni nau čni skup - University of Belgradeicf.fasper.bg.ac.rs/zbornici/2015-DANAS-PROCEEDINGS.pdf · 2015. 11. 23. · 9. Međunarodni naučni skup Specijalna edukacija

MIPS: Microprocessor without Interlocked Pipeline Stages · 2015-09-15 · ECE473 2 MIPS Architecture •MIPS: Microprocessor without Interlocked Pipeline Stages •Why MIPS instead

Cobas h 232 POC system - SKUP

Mezofilní pěchavové trávníky - pladias.cz · Poa baden-sis NOT skup. Potentilla arenaria NOT skup. Stachys recta Struktura a druhové složení. Saxifrago-Sesleri-etum zahrnuje

MIPS64™ Architecture For Programmers Volume II: …delliott/cmpe382/MIPS/MIPS42...MIPS I™, MIPS II™, MIPS III™, MIPS IV™, MIPS V™, MDMX™, MIPSsim™, MIPSsimCA™,

MIPS Technologies Corporate Updates and MIPS Android

MIPS32® Architecture Volume II: The MIPS32® Instruction Set II: MIPS32... · 2021. 2. 24. · MIPS, MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPS-3D, MIPS16, MIPS16e, MIPS32,

MIPS SDE 5.03 Programmers’ Guide · MIPS® SDE 5.03 Programmers’ Guide ® ™ ® MIPS® ® MIPS ® MIPS ® ™ ® ™ ®® ® ® SDE

NC 407 FTS, FTS-V, FTP-V, FTU-V , FTW-V user manual 18.01.22 … · -v, ftp-v, ftu-v ftu-v6, ftw-v6 lietoŠanas instrukcija ) stogo lango eksploatacijos instrukcija ) gebruiksaanwijzing

MIPS vs. ARM Assembly Comparing Registersece2035.ece.gatech.edu/readings/embedded/MIPSvsARM.pdf · MIPS vs. ARM Assembly Comparing Registers MIPS: The MIPS instruction set acknowledges