Introduction to Computer Organization and Architecture
Lecture 11
By Juthawut Chantharamalee
http://dusithost.dusit.ac.th/~juthawut_cha/home.htm


Outline
  Building a CPU
    Basic Components
    MIPS Instructions (Microprocessor without Interlocked Pipeline Stages)
    Basic 5 Steps for CPU
    Single-Cycle Design
    Multi-cycle Design
    Comparison of Single and Multi-cycle Designs

Overview
  Brief look
    Digital logic
  CPU Datapath
    MIPS Example

Digital Logic
[Figure: three building blocks. D-type flip-flop (input D, output Q, edge-triggered Clock). Multiplexer (inputs A and B on ports 0 and 1, select input S, output F). D-type flip-flop with enable (D, Q, EN, edge-triggered Clock).]
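
The two building blocks on this slide can be modeled in a few lines of Python. This is only a behavioral sketch: the names `mux2` and `DFlipFlopEN` are hypothetical, and it assumes that select value 0 picks input A and 1 picks input B, and that the enabled flip-flop behaves like a multiplexer choosing between the old Q and the new D.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: F = A when S == 0, F = B when S == 1 (assumed port order)."""
    return b if s else a


class DFlipFlopEN:
    """Edge-triggered D flip-flop with enable, modeled as a mux in front of a plain flip-flop."""

    def __init__(self, q=0):
        self.q = q  # stored output Q

    def clock_edge(self, d, en):
        # On the clock edge, Q takes D if EN == 1; otherwise Q keeps its old value.
        self.q = mux2(self.q, d, en)
        return self.q


ff = DFlipFlopEN()
ff.clock_edge(d=1, en=0)   # EN low: Q is held
held = ff.q
ff.clock_edge(d=1, en=1)   # EN high: Q loads D
loaded = ff.q
```

Between calls to `clock_edge` the stored value cannot change, which is the property the later datapath relies on for the PC and the register file.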


Digital Logic
Registers
[Figure: registers built from D flip-flops with a shared enable (EN) and edge-triggered Clock: 1 bit (D, Q), 4 bits (D3..D0, Q3..Q0), and N bits (D, Q).]

Digital Logic
Tri-state Driver (Buffer)
[Figure: buffer with input "in", output "out", and control "drive".]

In   Drive   Out
0    0       Z
1    0       Z
0    1       0
1    1       1

What is Z? (High impedance: the driver is disconnected and does not drive the output.)

Digital Logic
Adder/Subtractor or ALU
[Figure: block with inputs A and B, output F, Carry-in, Carry-out, and an Add/sub (or ALUop) control input.]

Overview
  Brief look
    Digital logic
  How to Design a CPU Datapath
    MIPS Example

Designing a CPU: 5 Steps
1. Analyze the instruction set to find the datapath requirements
   MIPS: ADD, SUB, ORI, LW, SW, BR
   Meaning of each instruction given by RTL (register transfers)
   2 types of registers: CPU/ISA registers, temporary registers
2. From the datapath requirements, select the datapath components
   ALU, register file, adder, data memory, etc
3. Assemble the datapath
   Datapath must support planned register transfers
   Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic

Step 1a: Analyze ISA
All MIPS instructions are 32 bits long.
Three instruction formats: R-type, I-type, J-type
  R: registers, I: immediate, J: jumps
These formats were intentionally chosen to simplify design.

Format   Fields (bit positions, width)
R-type   op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits
I-type   op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits
J-type   op [31:26] 6 bits | target address [25:0] 26 bits

Step 1b: Analyze ISA
Meaning of the fields:
  op: operation of the instruction
  rs, rt, rd: the source and destination register specifiers
    Destination is either rd (R-type), or rt (I-type)
  shamt: shift amount
  funct: selects the variant of the operation in the “op” field
  immediate: address offset or immediate value
  target address: target address of the jump instruction
[Figure: R-type, I-type, and J-type field layouts, repeated from Step 1a.]
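
The field boundaries above translate directly into shifts and masks. The following is a minimal sketch, assuming a 32-bit instruction word held in a Python integer; the function name `decode` and the dictionary keys are hypothetical, and the example word uses the standard MIPS encoding of `addu` (op = 0, funct = 0x21), which the slides do not list.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into every field of the three formats."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31..26
        "rs":     (instr >> 21) & 0x1F,   # bits 25..21
        "rt":     (instr >> 16) & 0x1F,   # bits 20..16
        "rd":     (instr >> 11) & 0x1F,   # bits 15..11 (R-type only)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10..6  (R-type only)
        "funct":  instr & 0x3F,           # bits 5..0   (R-type only)
        "imm16":  instr & 0xFFFF,         # bits 15..0  (I-type only)
        "target": instr & 0x3FFFFFF,      # bits 25..0  (J-type only)
    }


# addu $3, $1, $2 assembled by hand: op=0, rs=1, rt=2, rd=3, shamt=0, funct=0x21
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
fields = decode(word)
```

All fields are extracted unconditionally; which ones are meaningful depends on the format selected by `op`, just as the hardware wires every field out of the instruction and lets control decide which to use.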


MIPS ISA: subset for today
ADD and SUB (R-type: op, rs, rt, rd, shamt, funct)
  addU rd, rs, rt
  subU rd, rs, rt
OR Immediate (I-type: op, rs, rt, immediate)
  ori rt, rs, imm16
LOAD and STORE Word (I-type: op, rs, rt, immediate)
  lw rt, rs, imm16
  sw rt, rs, imm16
BRANCH (I-type: op, rs, rt, immediate)
  beq rs, rt, imm16

Step 2: Datapath Requirements
REGISTER FILE
  MIPS ISA requires 32 registers, 32b each
  Called a register file
    Contains 32 entries
    Each entry is 32b
AddU rd,rs,rt or SubU rd,rs,rt
  Read two sources rs, rt
  Operation rs + rt or rs – rt
  Write destination rd ← rs+/-rt
Requirements
  Read two registers (rs, rt)
  Perform ALU operation
  Write a third register (rd)
How to implement?
[Figure: REGFILE block with register-number inputs RdReg1, RdReg2, WrReg (5 bits each), data input WrData, data outputs RdData1 and RdData2, and control RegWrite; ALU block with ALUop control and Result and Zero? outputs.]
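
As a rough behavioral answer to "How to implement?", the register file can be sketched as an array with two combinational read ports and one clocked write port. The class name `RegFile` and its method names are hypothetical, and this sketch deliberately ignores the MIPS convention that register 0 always reads as zero, since the slides do not cover it.

```python
class RegFile:
    """32 x 32-bit register file: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rd_reg1, rd_reg2):
        # Read ports are combinational: the data follows the 5-bit register numbers.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def clock_edge(self, wr_reg, wr_data, reg_write):
        # The write port changes state only on a clock edge with RegWrite asserted.
        if reg_write:
            self.regs[wr_reg] = wr_data & 0xFFFFFFFF


rf = RegFile()
rf.clock_edge(1, 10, reg_write=1)
rf.clock_edge(2, 32, reg_write=1)
a, b = rf.read(1, 2)                    # read rs and rt
rf.clock_edge(3, a + b, reg_write=1)    # addu $3, $1, $2: write rd
rf.clock_edge(4, 99, reg_write=0)       # RegWrite low: register 4 is unchanged
```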


Step 3: Datapath Assembly
ADDU rd, rs, rt
SUBU rd, rs, rt
  Need an ALU
  Hook it up to REGISTER FILE
  REGFILE has 2 read ports (rs, rt), 1 write port (rd)
Parameters come from instruction fields (rs, rt, rd)
Control signals depend upon instruction fields
  E.g.: ALUop = f(Instruction) = f(op, funct)
[Figure: REGFILE (RdReg1, RdReg2, WrReg, WrData, RdData1, RdData2, RegWrite) feeding an ALU (ALUop, Result, Zero?), with rs, rt, rd supplied by the instruction.]

Steps 2 and 3: ORI Instruction
ORI rt, rs, Imm16
  Need new ALUop for ‘OR’ function, hook up to REGFILE
  1 read port (rs), 1 write port (rt), 1 const value (Imm16)
Control signals depend upon instruction fields
  E.g.: ALUsrc = f(Instruction) = f(op, funct)
[Figure: REGFILE and ALU datapath extended for ORI. The write register is rt (rd is crossed out); a ZERO-EXTEND block takes the 16-bit Imm16, and an ALUsrc multiplexer (inputs 0 and 1) selects the second ALU operand.]

Steps 2 and 3: Destination Register
Must select proper destination, rd or rt
  Depends on instruction type
  R-type may write rd
  I-type may write rt
[Figure: previous datapath with a RegDst multiplexer (inputs 1 and 0) selecting rd or rt as WrReg; ZERO-EXTEND, ALUsrc multiplexer, ALU (ALUop, Result, Zero?), and RegWrite as before.]

Steps 2 and 3: Load Word
LW rt, rs, Imm16
  Need Data Memory: data ← Mem[Addr]
    Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in rt: rt ← Mem[rs+Imm16]
[Figure: previous datapath extended with DATAMEM (Addr, RdData), a MemtoReg multiplexer (inputs 0 and 1) selecting the value written to REGFILE, and a SIGN/ZERO-EXTEND block controlled by ExtOp.]

Steps 2 and 3: Store Word
SW rt, rs, Imm16
  Need Data Memory: Mem[Addr] ← data
    Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in Mem: Mem[rs+Imm16] ← rt
[Figure: previous datapath with DATAMEM now also taking WrData and a MemWrite control signal; RegDst, ALUsrc, MemtoReg multiplexers and ExtOp as before.]

Writes: Need to Control Timing
Problem: write to data memory
  Data can come anytime
  Addr must come first
  MemWrite must come after Addr
    Else? Writes to wrong Addr!
Solution: use ideal data memory
  Assume everything works ok
  How to fix this for real?
    One solution: synchronous memory
    Another solution: delay MemWr to come late
Problems?: write to register file
  Does RegWrite signal come after WrReg number?
  When does the write to a register happen?
  Read from same register as being written?

Missing Pieces: Instruction Fetching
Where does the Instruction come from?
  From instruction memory, of course!
  Recall: stored-program concept
  Alternatives? How about hard-coding wires and switches…? This is how ENIAC (Electronic Numerical Integrator and Computer) was programmed!
How to branch?
  BEQ rs, rt, Imm16

Instruction Processing
  Fetch instruction
  Execute instruction
  Fetch next instruction
  Execute next instruction
  Fetch next instruction
  Execute next instruction
  Etc…
How to maintain sequence? Use a counter!
Branches (out of sequence)? Load the counter!

Instruction Processing
Program Counter
  Points to current instruction
  Address to instruction memory
    Instr ← InstrMem[PC]
  Next instruction: counts up by 4
    Remember: memory is byte-addressable, instructions are 4 bytes
    PC ← PC + 4
  Branch instruction: replace PC contents

Step 1: Analyze Instructions
Register Transfer Language…
  R-type: op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
  I-type: op | rs | rt | Imm16 = InstrMem[ PC ]

Instr   Register Transfers
ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ORI     R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LOAD    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
STORE   MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ     if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b’00’ } else PC ← PC + 4
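
The register transfers above can be checked by executing them. The sketch below is a small interpreter for the six instructions, not a model of the hardware: the function `step`, the dictionary-based memory keyed by byte address, and the test program are all assumptions made for illustration.

```python
MASK = 0xFFFFFFFF  # keep values to 32 bits


def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext(imm16):
    """Interpret a 16-bit field as an unsigned number."""
    return imm16 & 0xFFFF


def step(name, rs, rt, rd, imm16, R, MEM, pc):
    """Apply one instruction's register transfers and return the next PC."""
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK
    elif name == "ORI":
        R[rt] = R[rs] | zero_ext(imm16)
    elif name == "LOAD":
        R[rt] = MEM[(R[rs] + sign_ext(imm16)) & MASK]
    elif name == "STORE":
        MEM[(R[rs] + sign_ext(imm16)) & MASK] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # Taken branch: word offset shifted left by 2, added to PC + 4.
            return (pc + 4 + (sign_ext(imm16) << 2)) & MASK
    else:
        raise ValueError(name)
    return (pc + 4) & MASK


R, MEM, pc = [0] * 32, {}, 0
pc = step("ORI", 0, 1, 0, 0x0005, R, MEM, pc)     # R[1] = 5
pc = step("ORI", 0, 2, 0, 0x0005, R, MEM, pc)     # R[2] = 5
pc = step("ADDU", 1, 2, 3, 0, R, MEM, pc)         # R[3] = R[1] + R[2]
pc = step("STORE", 0, 3, 0, 0x0010, R, MEM, pc)   # MEM[16] = R[3]
pc = step("LOAD", 0, 4, 0, 0x0010, R, MEM, pc)    # R[4] = MEM[16]
pc_before_branch = pc
pc = step("BEQ", 1, 2, 0, 0xFFFF, R, MEM, pc)     # taken, offset of -1 word
```

With an offset of -1 word, a taken BEQ lands back on its own address, because the offset is relative to PC + 4.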


Steps 2 and 3: Datapath & Assembly
PC: a register
  Counter, counts by +4
  Provides address to Instruction Memory
[Figure: PC drives the Read address of Instruction Memory, which outputs Instruction[31:0]; an Add block adds 4 to the PC.]

Steps 2 and 3: Datapath & Assembly
PC: a register
  Counter, counts by +4
  Sometimes, must add SignExtend{Imm16||b’00’} for branch instructions
Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
[Figure: PC, Instruction Memory (Read address, Instruction[31:0]), an adder for PC + 4, a second adder (Add result) for the branch target, Shift Left 2, Sign/Zero Extend (16 to 32 bits, controlled by ExtOp), and a PCSrc multiplexer (inputs 0 and 1). Instruction fields shown: [25:21], [20:16], [15:11], [15:0] (Imm16).]

Steps 2 and 3: Add Previous Datapath
[Figure: complete single-cycle datapath. Blocks: PC, Instruction Memory, Register File (Read reg. 1, Read reg. 2, Write reg., Write data, Read data 1, Read data 2), ALU (ALU result, Zero) with ALU Control, Data Memory (Address, Read data, Write data), Sign/Zero Extend (16 to 32 bits), Shift Left 2, two adders (PC + 4 and branch target), and four multiplexers. Instruction fields: [25:21], [20:16], [15:11], [15:0] (Imm16), [5:0] (funct). Control signals: RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp.]

What have we done?
Created a simple CPU datapath
  Control still missing (next slide)
Single-cycle CPU
  Every instruction takes 1 clock cycle
  Clocking?

One Clock Cycle
Clock Locations
  PC, REGFILE have clocks
Operation
  On rising edge, PC will get new value
    Maybe REGFILE will have one value updated as well
  After rising edge
    PC and REGFILE can’t change
    New value out of PC
    Instruction out of INSTRMEM
    Instruction selects registers to read from REGFILE
    Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
    ALU does its work
    DataMem may be read (depending on instruction)
    Result value goes back to REGFILE
    New PC value goes back to PC
    Await next clock edge
Lots to do in only 1 clock cycle!!

Missing Steps?
Control is missing (Steps 4 and 5 we mentioned earlier)
  Generate the green signals
    ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc
  These are all f(Instruction), where f() is a logic expression
  Will look at control strategies in upcoming lecture
Implementation Details
  How to implement REGFILE?
    Read port: tristate buffers? Multiplexer? Memory?
    Two read ports: two of above?
    Write port: how to write only 1 register?
  How to control writes to memory? To register file?
More instructions
  Shift instructions
  Jump instruction
  Etc

1-Cycle CPU Datapath
[Figure: the complete single-cycle datapath from "Steps 2 and 3: Add Previous Datapath", repeated with the same blocks, instruction fields, and control signals.]

1-cycle CPU Datapath + Control
[Figure: single-cycle datapath with a Control block decoding Instruction[31:26] into RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite, plus an ALU control block fed by Instruction[5:0]. Other labels: PCSrc, Instruction[25:21], [20:16], [15:11], [15:0], Sign/Zero Extend, Shift Left 2, Register File, ALU (ALU result, Zero), Data Memory, Instruction Memory, PC, and the two adders.]

1-cycle CPU Control – Lookup Table

Input or Output   Signal Name   R-format   Lw   Sw   Beq
Inputs            Op5           0          1    1    0
                  Op4           0          0    0    0
                  Op3           0          0    1    0
                  Op2           0          0    0    1
                  Op1           0          1    1    0
                  Op0           0          1    1    0
Outputs           RegDst        1          0    X    X
                  ALUSrc        0          1    1    0
                  MemtoReg      0          1    X    X
                  RegWrite      1          1    0    0
                  MemRead       0          1    0    0
                  MemWrite      0          0    1    0
                  Branch        0          0    0    1
                  ALUOp1        1          0    0    0
                  ALUOp0        0          0    0    1

Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
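
Because every control output is a function of the opcode alone, the table can be written down as a dictionary keyed by Op5..Op0. This is a sketch of that idea rather than a hardware description; the names `CONTROL` and `control` are hypothetical, `None` stands for the don't-care entries (X), and ALUOp1/ALUOp0 are packed into one 2-bit value.

```python
X = None  # don't care

# One entry per column of the lookup table, keyed by the 6-bit opcode.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),   # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),   # lw
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),   # sw
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),   # beq
}


def control(instr):
    """Look up the control signals for a 32-bit instruction word."""
    return CONTROL[(instr >> 26) & 0x3F]


lw_signals = control(0b100011 << 26)
beq_signals = control(0b000100 << 26)
```

An opcode that is missing from the table raises `KeyError`, which is a convenient reminder that ORI and the jump instruction still need their own columns.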

1-cycle CPU + Jump Instruction
[Figure: single-cycle datapath with control, extended for jumps. Instruction[31:26] feeds the control unit; Instruction[25:0] is combined with PC + 4 [31..28] to form Jump address [31..0]. Other fields as before: [25:21], [20:16], [15:11], [15:0], [5:0].]

1-cycle CPU Problems?
Every instruction 1 cycle
  Some instructions “do more work”
    Eg, lw must read from DATAMEM
  All instructions must have same clock period…
    Many instructions run slower than necessary
Tricky timing on MemWrite, RegWrite(?) signals
  Write signal must come *after* address is stable
Need extra resources…
  PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM

Performance!
Single-Cycle CPU Performance
  Execute one instruction per clock cycle (CPI=1)
  Clock cycle time? Note dataflow includes:
    INSTRMEM read
    REGFILE access
    Sign extension
    ALU operation
    DATAMEM read
    REGFILE/PC write
Not every instruction uses all resources (eg, DATAMEM read)
Can we change clock period for each instruction?
  No! (Why not?)
  One clock period: the worst case!
  This is why a single-cycle CPU is not good for performance
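
The "worst case" argument can be made concrete with a small calculation. The component delays below are invented purely for illustration (the lecture gives no numbers), and the per-instruction paths are a simplification that leaves out sign extension and the PC update.

```python
# Hypothetical component delays in nanoseconds; NOT taken from the lecture.
DELAY = {"INSTRMEM": 2.0, "REGFILE_READ": 1.0, "ALU": 2.0,
         "DATAMEM": 2.0, "REGFILE_WRITE": 1.0}

# Components each instruction type passes through, in order (simplified).
PATHS = {
    "R-type": ["INSTRMEM", "REGFILE_READ", "ALU", "REGFILE_WRITE"],
    "lw":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM", "REGFILE_WRITE"],
    "sw":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM"],
    "beq":    ["INSTRMEM", "REGFILE_READ", "ALU"],
}

latency = {name: sum(DELAY[c] for c in path) for name, path in PATHS.items()}
clock_period = max(latency.values())      # one fixed period: the worst case
slowest = max(latency, key=latency.get)   # the instruction that sets it
wasted = {name: clock_period - t for name, t in latency.items()}
```

Under these made-up delays the load sets the clock period, and every other instruction idles for the difference recorded in `wasted`.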

1-cycle CPU Datapath + Controller
[Figure: the single-cycle datapath with controller and jump support. Instruction[31:26] feeds the controller; Instruction[25:0] and PC + 4 [31..28] form Jump address [31..0]; remaining fields are [25:21], [20:16], [15:11], [15:0], [5:0].]

1-cycle CPU Summary
Operation
  1 cycle per instruction
  Control signals held fixed during entire cycle (except BRANCH)
  Only 2 registers
    PC, updated every clock cycle
    REGFILE, updated when required
  During clock cycle, data flows from register-outputs to register-inputs
  Fixed clock frequency / period
Performance
  1 instruction per cycle
  Slowest instruction determines clock frequency
Outstanding issue: MemWrite timing
  Assume this signal writes to memory at end of clock cycle

Multi-cycle CPU Goals
Improve performance
  Break each instruction into smaller steps / multiple cycles
    LW instruction: 5 cycles
    SW instruction: 4 cycles
    R-type instruction: 4 cycles
    Branch, Jump: 3 cycles
  Aim for 5x clock frequency
    Complex instructions (eg, LW): 5 cycles, same performance as before
    Simple instructions (eg, ADD): fewer cycles, faster
Save resources (gates/transistors)
  Re-use ALU over multiple cycles
  Put INSTR + DATA in same memory
MemWrite timing solved?
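
A quick calculation shows how much the shorter instructions can buy. The cycle counts come from the slide; the instruction mix is a hypothetical workload chosen only to make the arithmetic concrete, and the comparison assumes the ideal case where the multi-cycle clock really is 5x faster.

```python
# Cycles per instruction type, from the slide.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "branch": 3, "jump": 3}

# Hypothetical instruction mix (fractions sum to 1); NOT from the lecture.
MIX = {"lw": 0.25, "sw": 0.10, "R-type": 0.45, "branch": 0.15, "jump": 0.05}

# Average cycles per instruction for this mix.
avg_cpi = sum(CYCLES[k] * MIX[k] for k in CYCLES)

# One single-cycle period equals 5 short cycles under the ideal 5x clock,
# so the speedup is 5 short cycles divided by the average actually used.
speedup = 5 / avg_cpi
```

For this mix the average is about 4.05 cycles, so the ideal gain is roughly 1.23x; a workload dominated by loads would gain almost nothing.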

Multi-cycle CPU Datapath
Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
Move signal paths (+4, Shift Left 2)
[Figure: multi-cycle datapath. Blocks: PC, a single Memory (Address, Write data, MemData), Instruction Register, Memory Data Register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), registers A, B, and ALUOut, Sign Extend, Shift Left 2, constant 4, one ALU (ALU result, Zero), and multiplexers. Instruction fields: [25:21], [20:16], [15:11], [15:0], [5:0].]

Multi-cycle CPU Datapath
Add registers + control signals (IR, MDR, A, B, ALUOut)
Registers with no control signal load value every clock cycle (eg, PC)
[Figure: the same multi-cycle datapath, showing the Instruction Register, Memory Data Register, A, B, and ALUOut registers.]

Instruction Execution Example
Execute a “Load Word” instruction
  LW rt, 0(rs)
5 Steps
1. Fetch instruction
2. Read registers
3. Compute address
4. Read data
5. Write registers

Load Word Instruction Sequence
1. Fetch Instruction
   InstructionRegister ← Mem[PC]
[Figure: multi-cycle datapath for step 1.]

Load Word Instruction Sequence
2. Read Registers
   A ← Registers[Rs]
[Figure: multi-cycle datapath for step 2.]

Load Word Instruction Sequence
3. Compute Address
   ALUOut ← A + SignExt(Imm16)
[Figure: multi-cycle datapath for step 3.]

Load Word Instruction Sequence
4. Read Data
   MDR ← Memory[ALUOut]
[Figure: multi-cycle datapath for step 4.]

Load Word Instruction Sequence
5. Write Registers
   Registers[Rt] ← MDR
[Figure: multi-cycle datapath for step 5.]

Load Word Instruction Sequence
All 5 Steps Shown
[Figure: multi-cycle datapath for all five steps.]

Multi-cycle Load Word: Recap
1. Fetch Instruction
   InstructionRegister ← Mem[PC]
2. Read Registers
   A ← Registers[Rs]
3. Compute Address
   ALUOut ← A + {SignExt(Imm16)}
4. Read Data
   MDR ← Memory[ALUOut]
5. Write Registers
   Registers[Rt] ← MDR
Missing Steps?

Multi-cycle Load Word: Recap
1. Fetch Instruction
   InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers
   A ← Registers[Rs]
3. Compute Address
   ALUOut ← A + {SignExt(Imm16)}
4. Read Data
   MDR ← Memory[ALUOut]
5. Write Registers
   Registers[Rt] ← MDR
Missing Steps?
  Must increment the PC
  Do it as part of the instruction fetch (in step 1)
  Need PCWrite control signal
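
The five steps can be traced in software, one register transfer per step. This is a sketch under several assumptions: a single dictionary stands in for the shared instruction/data memory, the function name `run_lw` is hypothetical, and the test instruction uses the standard `lw` opcode 0x23.

```python
def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(mem, regs, pc):
    """Execute one LW as five steps; return the new PC and a trace of the steps."""
    trace = []
    ir = mem[pc]                                   # 1. IR <- Mem[PC]
    pc = pc + 4                                    #    PC <- PC + 4
    trace.append("fetch")
    rs, rt, imm16 = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, ir & 0xFFFF
    a = regs[rs]                                   # 2. A <- Registers[Rs]
    trace.append("read registers")
    alu_out = (a + sign_ext(imm16)) & 0xFFFFFFFF   # 3. ALUOut <- A + SignExt(Imm16)
    trace.append("compute address")
    mdr = mem[alu_out]                             # 4. MDR <- Memory[ALUOut]
    trace.append("read data")
    regs[rt] = mdr                                 # 5. Registers[Rt] <- MDR
    trace.append("write registers")
    return pc, trace


instr = (0x23 << 26) | (1 << 21) | (2 << 16) | 8   # lw $2, 8($1)
mem = {0: instr, 108: 1234}    # one memory holds both the instruction and the data
regs = [0] * 32
regs[1] = 100
pc, trace = run_lw(mem, regs, 0)
```

Each local variable (`ir`, `a`, `alu_out`, `mdr`) plays the role of one of the temporary registers added to the multi-cycle datapath.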


Multi-cycle R-Type Instruction
1. Fetch Instruction
   InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers
   A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value
   ALUOut ← A op B
4. Write Registers
   Registers[Rd] ← ALUOut
RTL describes data flow action in each clock cycle
  Control signals determine precise data flow
  Each step implies unique control values

Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction
   InstructionRegister ← Mem[PC]; PC ← PC + 4
   MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers
   A ← Registers[Rs]; B ← Registers[Rt]
   ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute Value
   ALUOut ← A op B
   ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers
   Registers[Rd] ← ALUOut
   RegDst=1, RegWrite, MemtoReg=0
Each step implies unique control values
  Fixed for entire cycle
  “Default value” implied if unspecified
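
One way to picture "default value implied if unspecified" is a table of defaults overridden by each step's explicit settings. In the sketch below the per-step values are copied from the slide, with a bare signal name such as IRWrite read as "asserted" (1); the choice that every enable defaults to 0 and the names `DEFAULTS`, `R_TYPE_STEPS`, and `signals` are assumptions.

```python
# Assumed defaults: every enable is 0 unless a step asserts it.
DEFAULTS = {"MemRead": 0, "MemWrite": 0, "IRWrite": 0, "PCWrite": 0, "RegWrite": 0}

# Explicit control values for the four R-type steps, in order.
R_TYPE_STEPS = [
    {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
     "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
]


def signals(step):
    """Control values for one cycle (step is 1-based): defaults, then the step's overrides."""
    return {**DEFAULTS, **R_TYPE_STEPS[step - 1]}


fetch = signals(1)
write_back = signals(4)
```

Signals that appear in neither table for a given step, such as ALUSrcA during write-back, are simply absent from the result, which corresponds to a don't-care.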


Check Your Work – Is RTL Valid?
1. Datapath check
   Within one cycle…
     Each cycle has valid data flow path (path exists)
     Each register gets only one new value
   Across multiple cycles…
     Register value is defined in a previous (earlier in time) clock cycle before it is used
       Eg, “A ← 3” must occur before “B ← A”
     Make sure register value doesn’t disappear if set >1 cycle earlier
2. Control signal check
   Each cycle, RTL describing the datapath flow implies a value for each control signal
     0 or 1 or default or don’t care
   Each control signal gets only one fixed value the entire cycle
3. Overall check
   Does the sequence of steps work?

Multi-cycle BEQ Instruction
1. Fetch Instruction
   InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target
   A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt(Imm16),b’00’}
3. Compare Registers, Conditional Branch
   if( (A – B) == 0 ) PC ← ALUOut
Green shows PC calculation flow (in parallel with other operations)

Multi-cycle Datapath with Control Signals
[Figure: multi-cycle datapath with jump support (Instr[25:0] and PC[31..28] forming Jump address [31..0]). Control signals: PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst. An ALU Control block is driven by Instruction[5:0]. Other fields: Instr[25:21], Instr[20:16], Instr[15:11], Instr[15:0].]

Multi-cycle Datapath with Controller
[Figure: the multi-cycle datapath with control signals, plus a controller driven by Instr[31:26].]


Multi-cycle CPU Control: Overview
General approach: Finite State Machine (FSM)
  Need details in each branch of control…
  Precise outputs for each state (Mealy depends on inputs, Moore does not)
  Precise “next state” for each state (can depend on inputs)
[Figure: FSM overview; labels: Control Signal Outputs.]

How to Implement FSM?
Manually with logic gates + FFs
  Bubble diagram, next-state table, state assignment
  Karnaugh map for each state bit, each output bit (painful!)
High-level language description (eg, Verilog, VHDL)
  Describe FSM bubble diagram (next-states, output values)
  Automatically synthesized into gates + FFs
Microcode (µ-code) description
  Sequence through many µ-ops for each CPU instruction
    One µ-op (µ-instruction) sends correct control signal for 1 cycle
    µ-op similar to one bubble in FSM
  Acts like a mini-CPU within a CPU
    µPC: microcode program counter
    Microcode storage memory contains µ-ops
  Can look similar to RTL or some new “assembly language”
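
Whatever the implementation style, the core of the controller is a next-state function. The sketch below writes one as a Python table; the state names are hypothetical (the lecture's bubble diagram did not survive extraction), and the table is arranged so that the number of states visited per instruction matches the cycle counts given under "Multi-cycle CPU Goals".

```python
# Next state as a function of the current state and the instruction class.
NEXT_STATE = {
    "FETCH":      lambda op: "DECODE",
    "DECODE":     lambda op: {"lw": "MEM_ADDR", "sw": "MEM_ADDR", "rtype": "EXECUTE",
                              "beq": "BRANCH", "j": "JUMP"}[op],
    "MEM_ADDR":   lambda op: "MEM_READ" if op == "lw" else "MEM_WRITE",
    "MEM_READ":   lambda op: "WRITE_BACK",
    "WRITE_BACK": lambda op: "FETCH",
    "MEM_WRITE":  lambda op: "FETCH",
    "EXECUTE":    lambda op: "RTYPE_DONE",
    "RTYPE_DONE": lambda op: "FETCH",
    "BRANCH":     lambda op: "FETCH",
    "JUMP":       lambda op: "FETCH",
}


def cycles(op):
    """Count the clock cycles one instruction spends in the FSM before the next fetch."""
    state, count = "FETCH", 0
    while True:
        count += 1                      # one cycle spent in the current state
        state = NEXT_STATE[state](op)
        if state == "FETCH":
            return count


counts = {op: cycles(op) for op in ("lw", "sw", "rtype", "beq", "j")}
```

In a Moore-style controller each state would also carry its own fixed set of control outputs; a microcoded controller stores the same information as one µ-op per state.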


FSM Specification: Bubble Diagram
Can build this by examining RTL
It is possible to automatically convert RTL into this form!
[Figure: FSM bubble diagram.]

FSM: Gates + FFs Implementation

[Figure: FSM high-level organization.]


Page 63: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

FSM: Microcode Implementation

[Figure: microcode implementation of the controller. Blocks: microcode storage (memory) with inputs and outputs, microprogram counter, adder (+1), address select logic. Signals: datapath control outputs, sequencing control, inputs from instruction register opcode field.]


Page 64: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Multi-cycle CPU with Control FSM

[Figure: multi-cycle CPU datapath driven by the control FSM. Signal labels: Instr[31:26], Instr[25:21], Instr[20:16], Instr[15:11], Instr[15:0], Instr[5:0], Instr[25:0], PC[31..28], jump address [31..0], FSM control outputs, conditional branch.]

Page 65: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Control FSM: Overview

General approach: Finite State Machine (FSM)

Need details in each branch of control…


Page 66: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Detailed FSM


Page 67: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Detailed FSM

[Figure: detailed FSM with regions labeled Instruction Fetch, Memory Reference, R-Type, Branch, Jump.]
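The five regions of the detailed FSM can be written down as a next-state table. The sketch below is one plausible encoding, assuming the usual multi-cycle MIPS state sequence; the state names (FETCH, DECODE, MEM_ADDR, and so on) are hypothetical labels, not taken from the slide.

```python
# Next-state function per state; 'op' is the instruction class seen at decode.
NEXT_STATE = {
    "FETCH":     lambda op: "DECODE",
    "DECODE":    lambda op: {"LW": "MEM_ADDR", "SW": "MEM_ADDR",
                             "R-type": "EXECUTE", "BEQ": "BRANCH",
                             "J": "JUMP"}[op],
    # Memory reference region
    "MEM_ADDR":  lambda op: "MEM_READ" if op == "LW" else "MEM_WRITE",
    "MEM_READ":  lambda op: "MEM_WB",
    "MEM_WB":    lambda op: "FETCH",
    "MEM_WRITE": lambda op: "FETCH",
    # R-type region
    "EXECUTE":   lambda op: "RTYPE_WB",
    "RTYPE_WB":  lambda op: "FETCH",
    # Branch and jump regions (one state each)
    "BRANCH":    lambda op: "FETCH",
    "JUMP":      lambda op: "FETCH",
}

def states_for(op):
    """List the states visited by one instruction, one state per clock cycle."""
    state, visited = "FETCH", []
    while True:
        visited.append(state)
        state = NEXT_STATE[state](op)
        if state == "FETCH":  # back at fetch: the instruction is done
            return visited
```

Counting the visited states gives 5 cycles for LW, 4 for SW and R-type, and 3 for BEQ and J under these assumptions.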

Page 68: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Detailed FSM: Instruction Fetch


Page 69: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Detailed FSM: Memory Reference

[Figure: memory reference portion of the FSM. Labels: LW, SW.]

Page 70: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Detailed FSM: R-Type Instruction


Page 71: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Detailed FSM: Branch Instruction


Page 72: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Detailed FSM: Jump Instruction


Page 73: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Performance Comparison

Single-cycle CPU vs Multi-cycle CPU


Page 74: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Simple Comparison

CPU                Clock cycles      Instructions
Single-cycle CPU   1 clock cycle     All
Multi-cycle CPU    5 clock cycles    LW
Multi-cycle CPU    4 clock cycles    SW, R-type
Multi-cycle CPU    3 clock cycles    BEQ, J

Page 75: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

What’s really happening?

Ideally:

[Figure: timing of a Load Word instruction on the single-cycle CPU and on the multi-cycle CPU. Steps: Fetch, Decode, CalcAddr, Memory, Write.]


Page 76: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

In practice, steps differ in speeds…

[Figure: timing of a Load Word instruction on the single-cycle CPU and on the multi-cycle CPU when the steps take different amounts of time. Steps: Fetch, Decode, CalcAddr, Memory, Write. Annotations: “Violation!”, “Wasted time!”.]
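A small calculation shows where the violation and the wasted time come from. The step delays below are made-up numbers for illustration only (the slides give no values). The sketch assumes that the single-cycle clock must fit all five LW steps in one period, and that the multi-cycle clock must fit the slowest single step; a shorter multi-cycle period would be a timing violation.

```python
# Hypothetical step delays in nanoseconds (not from the lecture).
STEP_DELAY_NS = {"Fetch": 2, "Decode": 1, "CalcAddr": 2, "Memory": 2, "Write": 1}
LW_STEPS = ["Fetch", "Decode", "CalcAddr", "Memory", "Write"]

# Single-cycle CPU: one period must cover every step of the longest instruction.
single_cycle_period = sum(STEP_DELAY_NS[s] for s in LW_STEPS)

# Multi-cycle CPU: one period must cover the slowest single step.
multi_cycle_period = max(STEP_DELAY_NS.values())

# Time wasted in each multi-cycle step: the part of the period the step leaves idle.
wasted_ns = {s: multi_cycle_period - STEP_DELAY_NS[s] for s in LW_STEPS}

lw_single_ns = single_cycle_period                 # one long cycle
lw_multi_ns = len(LW_STEPS) * multi_cycle_period   # five short cycles
```

With these assumed delays, LW takes 8 ns on the single-cycle CPU and 10 ns on the multi-cycle CPU; the 2 ns difference is exactly the time wasted in the Decode and Write steps.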


Page 77: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Single-cycle vs Multi-cycle: LW instruction faster for single-cycle

[Figure: LW timing, single-cycle CPU vs multi-cycle CPU. Steps: Fetch, Decode, CalcAddr, Memory, Write. Annotations: “Violation fixed!”, “Now wasted time is larger!”.]


Page 78: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Single-cycle vs Multi-cycle: SW instruction ~ same speed

[Figure: SW timing, single-cycle CPU vs multi-cycle CPU. Steps: Fetch, Decode, CalcAddr, Memory. Annotations: “Wasted time!”, “Speed diff”.]


Page 79: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Single-cycle vs Multi-cycle: BEQ, J instruction faster for multi-cycle

[Figure: BEQ/J timing, single-cycle CPU vs multi-cycle CPU. Steps: Fetch, Decode, CalcAddr. Annotations: “Wasted time!”, “Speed diff”.]
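The three per-instruction comparisons can be reproduced with a short sketch. The cycle counts are the ones from the lecture (LW 5, SW and R-type 4, BEQ and J 3); the two clock periods are hypothetical values, chosen as if the slowest step took 2 ns and a full LW took 8 ns.

```python
# Assumed clock periods in nanoseconds (illustrative, not from the lecture).
SINGLE_CYCLE_PERIOD_NS = 8   # must fit the whole LW instruction
MULTI_CYCLE_PERIOD_NS = 2    # must fit the slowest single step

# Multi-cycle clock cycles per instruction, as given in the lecture.
CYCLES = {"LW": 5, "SW": 4, "R-type": 4, "BEQ": 3, "J": 3}

def compare(instr):
    """Return (single-cycle ns, multi-cycle ns, verdict) for one instruction."""
    single = SINGLE_CYCLE_PERIOD_NS               # always exactly one cycle
    multi = CYCLES[instr] * MULTI_CYCLE_PERIOD_NS
    if single < multi:
        verdict = "single-cycle faster"
    elif single > multi:
        verdict = "multi-cycle faster"
    else:
        verdict = "same"
    return single, multi, verdict
```

With these toy numbers SW and R-type come out exactly equal; with real delays they would only be approximately the same, as the slide title says.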


Page 80: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Performance Summary

Which CPU implementation is faster?
  LW: single-cycle is faster
  SW, R-type: about the same
  BEQ, J: multi-cycle is faster

Real programs use a mix of these instructions

Overall performance depends on instruction frequency!
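The dependence on instruction frequency can be sketched as a weighted average of the cycle counts. Both instruction mixes and both clock periods below are hypothetical numbers for illustration; only the cycle counts come from the lecture.

```python
# Multi-cycle clock cycles per instruction, as given in the lecture.
CYCLES = {"LW": 5, "SW": 4, "R-type": 4, "BEQ": 3, "J": 3}

# Hypothetical instruction mixes (fractions of executed instructions).
LOAD_HEAVY_MIX = {"LW": 0.25, "SW": 0.10, "R-type": 0.45, "BEQ": 0.15, "J": 0.05}
BRANCH_HEAVY_MIX = {"LW": 0.10, "SW": 0.10, "R-type": 0.40, "BEQ": 0.30, "J": 0.10}

def average_cpi(mix):
    """Average clock cycles per instruction on the multi-cycle CPU."""
    return sum(fraction * CYCLES[instr] for instr, fraction in mix.items())

def multi_cycle_wins(mix, single_period_ns=8, multi_period_ns=2):
    """True if the average multi-cycle time per instruction beats single-cycle.

    The default periods are assumed values, not measurements.
    """
    return average_cpi(mix) * multi_period_ns < single_period_ns
```

Under these assumptions the load-heavy mix averages about 4.05 cycles (slightly slower than single-cycle), while the branch-heavy mix averages about 3.7 cycles (faster), so the winner changes with the program.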


Page 81: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

Implementation Summary

Single-cycle CPU
  1 instruction per cycle (e.g., 1 MHz → 1 MIPS)
  No “wasted time” on most complex instruction
  Large wasted time on simpler instructions
  Simple controller (just a lookup table or memory)
  Simple instructions

Multi-cycle CPU
  << 1 instruction per cycle (e.g., 1 MHz → 0.2 MIPS)
  Small time wasted on most complex instruction
    Hence, this instruction always slower than single-cycle CPU
  Small time wasted on simple instructions
    Eliminates “large wasted time” by using fewer clock cycles
  Complex controller (FSM)
  Potential to create complex instructions
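The two MIPS figures in this summary follow from the clock frequency and the cycles per instruction (CPI). The worked equation below is a sketch; it assumes the 0.2 MIPS figure corresponds to the worst case of 5 cycles per instruction, which the slide does not state explicitly.

```latex
\[
\text{MIPS} = \frac{f_{\text{clock}}\,[\text{Hz}]}{\text{CPI} \times 10^{6}}
\]
\[
\text{Single-cycle: } \frac{10^{6}}{1 \times 10^{6}} = 1 \text{ MIPS}
\qquad
\text{Multi-cycle (CPI} = 5\text{): } \frac{10^{6}}{5 \times 10^{6}} = 0.2 \text{ MIPS}
\]
```

Both figures use the same 1 MHz clock. Since multi-cycle instructions take 3 to 5 cycles, a real instruction mix at 1 MHz would land between 0.2 and about 0.33 MIPS, and a multi-cycle design would normally also run at a higher clock frequency because each cycle only has to fit one step.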


Page 82: Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee  wut_cha/home.htm

The End Lecture 11