Carnegie Mellon
Design of Digital Circuits 2014
Srdjan Capkun, Frank K. Gürkaynak
Adapted from Digital Design and Computer Architecture, David Money Harris & Sarah L. Harris ©2007 Elsevier
http://www.syssec.ethz.ch/education/Digitaltechnik_14
Multi-cycle MIPS Processor
What Will We Learn?
What are the problems of the single-cycle processor?
Multi-cycle architecture for MIPS
Determining the performance of the multi-cycle processor
Single-cycle MIPS Processor
Single-cycle microarchitecture:
+ simple
- cycle time limited by longest instruction (lw)
- two adders/ALUs and two memories
Single-cycle Performance
[Figure: single-cycle MIPS datapath and control unit, annotated with control signal values. Components: PC, instruction memory, register file, sign extend, ALU, data memory, PCPlus4 and PCBranch adders.]
Delay of Individual Components (Example)
Element                  Parameter   Delay (ps)
Register clock-to-Q      tpcq_PC     30
Register setup           tsetup      20
Multiplexer              tmux        25
ALU                      tALU        200
Memory read              tmem        250
Register file read       tRFread     150
Register file setup      tRFsetup    20
Processor Performance
How fast is my program?
  Every program consists of a series of instructions.
  Each instruction needs to be executed.
So how fast are my instructions?
  Instructions are realized on the hardware.
  They can take one or more clock cycles to complete.
  Cycles per Instruction = CPI
How much time is one clock cycle?
  The critical path determines how much time one cycle requires: the clock period.
  1 / clock period = clock frequency = how many cycles can be done each second.
Processor Performance
Now as a general formula:
  Our program consists of executing N instructions.
  Our processor needs CPI cycles for each instruction.
  The maximum clock speed of the processor is f, and the clock period is therefore T = 1/f.
Our program will execute in
N x CPI x (1/f) = N x CPI x T seconds
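The formula above can be sketched in a few lines of Python. The function name `execution_time` and the sample numbers are illustrative assumptions, not values from the slides.

```python
def execution_time(n_instructions: float, cpi: float, clock_hz: float) -> float:
    """Return N x CPI x (1/f) in seconds."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return n_instructions * cpi / clock_hz


# Illustrative values only (assumed for this sketch):
# one million instructions, 2 cycles per instruction, 1 GHz clock.
t = execution_time(1e6, 2.0, 1e9)
print(f"{t * 1e3:.1f} ms")
```

Doubling f or halving either N or CPI halves the result, which is why the three factors are treated as separate levers.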
How can I Make the Program Run Faster?
N x CPI x (1/f)
Reduce the number of instructions
  Make instructions that ‘do’ more (CISC)
  Use better compilers
Use fewer cycles to perform each instruction
  Simpler instructions (RISC)
  Use multiple units/ALUs/cores in parallel
Increase the clock frequency
  Find a ‘newer’ technology to manufacture
  Redesign time-critical components
  Adopt pipelining
Multi-cycle MIPS Processor
Single-cycle microarchitecture:
+ simple
- cycle time limited by longest instruction (lw)
- two adders/ALUs and two memories
Multi-cycle microarchitecture:
+ higher clock speed
+ simpler instructions run faster
+ reuse expensive hardware on multiple cycles
- sequencing overhead paid many times
Same design steps: datapath & control
What Do We Want To Optimize?
The single-cycle architecture uses two memories
  One memory stores instructions, the other data
  We want to use a single memory (smaller size)
The single-cycle architecture needs three adders
  ALU, PC, branch address calculation
  We want to use the ALU for all operations (smaller size)
In the single-cycle architecture all instructions take one cycle
  The most complex operation slows down everything!
  Divide all instructions into multiple steps
  Simpler instructions can take fewer cycles (average case may be faster)
Consider the lw instruction
For an instruction such as: lw $t0, 0x20($t1)
We need to:
Read the instruction from memory
Then read $t1 from the register file
Add the immediate value (0x20) to calculate the memory address
Read the content of this address
Write this content to register $t0
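The five steps can be mimicked in software. The following is a minimal sketch, not a model of the actual hardware: registers and memory are plain dictionaries, and the helper names (`sign_extend16`, `run_lw`) and the sample register and memory contents are assumptions made for illustration.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(regs: dict, data_mem: dict, rs: str, rt: str, imm: int) -> int:
    """Execute lw rt, imm(rs) and return the computed address.

    Step 1 (instruction fetch) is represented by the already-decoded
    arguments rs, rt and imm.
    """
    base = regs[rs]                                      # step 2: read base register
    address = (base + sign_extend16(imm)) & 0xFFFFFFFF   # step 3: add the immediate
    value = data_mem[address]                            # step 4: read memory
    regs[rt] = value                                     # step 5: write destination
    return address


# Hypothetical machine state for lw $t0, 0x20($t1).
regs = {"$t0": 0, "$t1": 0x1000}
data_mem = {0x1020: 0xCAFEF00D}
addr = run_lw(regs, data_mem, rs="$t1", rt="$t0", imm=0x20)
print(hex(addr), hex(regs["$t0"]))
```

In the multi-cycle design each of these steps occupies its own clock cycle, with the intermediate values held in registers between cycles.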
Multi-cycle Datapath: instruction fetch
First consider executing lw
STEP 1: Fetch instruction
[Figure: datapath for instruction fetch. Elements: PC, shared Instr / Data Memory, instruction register Instr (enable: IRWrite), Register File.]
lw reads from memory location [rs] + imm and writes the value to register [rt].
I-Type format:
op       rs       rt       imm
6 bits   5 bits   5 bits   16 bits
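The I-type field layout can be checked with a small decoder. This is a sketch under stated assumptions: the function name `decode_i_type` is hypothetical, and the example word is the standard MIPS encoding of lw $t0, 0x20($t1) (opcode 0x23, $t1 = register 9, $t0 = register 8), which does not appear in the slides.

```python
def decode_i_type(instr: int) -> dict:
    """Split a 32-bit I-type instruction word into its four fields."""
    return {
        "op": (instr >> 26) & 0x3F,   # bits 31:26, 6 bits
        "rs": (instr >> 21) & 0x1F,   # bits 25:21, 5 bits
        "rt": (instr >> 16) & 0x1F,   # bits 20:16, 5 bits
        "imm": instr & 0xFFFF,        # bits 15:0, 16 bits
    }


# lw $t0, 0x20($t1), encoded by hand for this example.
LW_EXAMPLE = 0x8D280020
fields = decode_i_type(LW_EXAMPLE)
print(fields)
```

The bit ranges used here are the same ones that label the instruction register outputs in the datapath figures (25:21, 20:16, 15:0).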
Multi-cycle Datapath: lw register read
[Figure: datapath for the lw register read. New elements: Instr25:21 (rs) on register file port A1; register A holding RD1.]
Multi-cycle Datapath: lw immediate
[Figure: datapath for the lw immediate. New elements: Instr15:0 into Sign Extend, producing SignImm.]
Multi-cycle Datapath: lw address
[Figure: datapath for the lw address calculation. New elements: ALU with inputs SrcA and SrcB, ALUControl2:0, ALUResult stored in register ALUOut.]
Multi-cycle Datapath: lw memory read
[Figure: datapath for the lw memory read. New elements: IorD multiplexer (inputs 0 and 1) selecting the memory address Adr; register Data holding the value read from memory.]
Multi-cycle Datapath: lw write register
[Figure: datapath for the lw register write. New elements: Instr20:16 (rt) on register file port A3; Data on WD3; control signal RegWrite.]
Multi-cycle Datapath: increment PC
[Figure: datapath for incrementing the PC. New elements: ALUSrcA multiplexer (inputs 0 and 1); ALUSrcB1:0 multiplexer (inputs 00, 01, 10, 11, with the constant 4 on one input); PC enable PCWrite.]
Multi-cycle Datapath: sw
Write data in rt to memory
[Figure: datapath extended for sw. New elements: Instr20:16 (rt) on register file port A2; register B holding RD2 and driving the memory write data WD; control signal MemWrite.]
Multi-cycle Datapath: R-type Instructions
Read from rs and rt
Write ALUResult to register file
Write to rd (instead of rt)
[Figure: datapath extended for R-type instructions. New elements: RegDst multiplexer choosing between Instr20:16 and Instr15:11 for A3; MemtoReg multiplexer choosing the register write data.]
Multi-cycle Datapath: beq
Determine whether values in rs and rt are equal
Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4)
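The branch target formula can be written out directly. This is a minimal sketch; the function names and the sample addresses are assumptions chosen for illustration.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, imm16: int) -> int:
    """BTA = (sign-extended immediate << 2) + (PC + 4), wrapped to 32 bits."""
    return ((sign_extend16(imm16) << 2) + (pc + 4)) & 0xFFFFFFFF


# Hypothetical branch at address 0x00400000 with immediate 3:
# the target lies 3 instructions past PC + 4.
print(hex(branch_target(0x00400000, 0x0003)))
# A negative immediate (-1) branches back to the instruction itself + 0.
print(hex(branch_target(0x00400010, 0xFFFF)))
```

The shift by 2 converts a word offset into a byte offset, since every MIPS instruction is 4 bytes long.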
[Figure: datapath extended for beq. New elements: <<2 shifter on SignImm; PCSrc multiplexer (inputs 0 and 1) selecting PC'; control signal Branch combined with the ALU Zero output and PCWrite to form PCEn.]
Complete Multi-cycle Processor
[Figure: complete multi-cycle datapath with the Control Unit. The Control Unit takes Op (Instr31:26) and Funct (Instr5:0) and drives RegDst, Branch, MemWrite, MemtoReg, ALUSrcA, RegWrite, PCSrc, ALUControl2:0, ALUSrcB1:0, IRWrite, IorD and PCWrite.]
Control Unit
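Inputs: Opcode5:0, Funct5:0
Main Controller (FSM): takes Opcode5:0 and produces the multiplexer selects, the register enables and ALUOp1:0
  Multiplexer selects: MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA
  Register enables: IRWrite, MemWrite, PCWrite, Branch, RegWrite
ALU Decoder: takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0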
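The ALU decoder can be sketched as a lookup. The slides do not list the decoder table, so the encodings below are an assumption: they follow the convention of the textbook the slides are adapted from (ALUOp 00 = add, 01 = subtract, 10 = use the funct field) together with the standard MIPS funct codes.

```python
# ALUControl2:0 encodings (assumed, following the textbook convention).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Standard MIPS funct codes for the R-type instructions handled here.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Map ALUOp1:0 and Funct5:0 to ALUControl2:0."""
    if alu_op == 0b00:
        return ALU_ADD              # address and PC arithmetic
    if alu_op == 0b01:
        return ALU_SUB              # comparison for beq
    if alu_op == 0b10:
        return FUNCT_TO_ALU[funct]  # R-type: decided by funct
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(f"{alu_decoder(0b10, 0b100010):03b}")
```

Splitting the control unit this way keeps the main FSM independent of the individual R-type operations: it only has to emit ALUOp = 10.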
Main Controller FSM: Fetch
[Figure: complete multi-cycle datapath annotated with the control signal values of the fetch state.]
Reset -> S0: Fetch
S0: Fetch
  IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0
  IRWrite, PCWrite
Main Controller FSM: Decode
[Figure: complete multi-cycle datapath annotated with the control signal values of the decode state.]
S0: Fetch -> S1: Decode
S1: Decode
  No control outputs at this stage of the design (all enables are 0)
Main Controller FSM: Address Calculation
[Figure: complete multi-cycle datapath annotated with the control signal values of the address calculation state.]
S1: Decode -> S2: MemAdr when Op = LW or Op = SW
S2: MemAdr
  ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
Main Controller FSM: lw
S2: MemAdr -> S3: MemRead when Op = LW
S3: MemRead
  IorD = 1
S3: MemRead -> S4: Mem Writeback
S4: Mem Writeback
  RegDst = 0, MemtoReg = 1, RegWrite
S4: Mem Writeback -> S0: Fetch
Main Controller FSM: sw
S2: MemAdr -> S5: MemWrite when Op = SW
S5: MemWrite
  IorD = 1, MemWrite
S5: MemWrite -> S0: Fetch
Main Controller FSM: R-Type
S1: Decode -> S6: Execute when Op = R-type
S6: Execute
  ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S6: Execute -> S7: ALU Writeback
S7: ALU Writeback
  RegDst = 1, MemtoReg = 0, RegWrite
S7: ALU Writeback -> S0: Fetch
Main Controller FSM: beq
S1: Decode now also drives the ALU
  ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S1: Decode -> S8: Branch when Op = BEQ
S8: Branch
  ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
S8: Branch -> S0: Fetch
Complete Multi-cycle Controller FSM
Reset -> S0: Fetch
S0: Fetch
  IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
  Next: S1
S1: Decode
  ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
  Next: S2 when Op = LW or Op = SW; S6 when Op = R-type; S8 when Op = BEQ
S2: MemAdr
  ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
  Next: S3 when Op = LW; S5 when Op = SW
S3: MemRead
  IorD = 1
  Next: S4
S4: Mem Writeback
  RegDst = 0, MemtoReg = 1, RegWrite
  Next: S0
S5: MemWrite
  IorD = 1, MemWrite
  Next: S0
S6: Execute
  ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
  Next: S7
S7: ALU Writeback
  RegDst = 1, MemtoReg = 0, RegWrite
  Next: S0
S8: Branch
  ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
  Next: S0
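The state diagram can be expressed as a small table-driven FSM. This is a sketch of states S0 to S8 only, assuming that S4, S5, S7 and S8 return to S0; the Python names (`OUTPUTS`, `next_state`, `states_for`) are illustrative and not part of the slides.

```python
# Control outputs per state; signals not listed are 0 or don't-care.
OUTPUTS = {
    "S0": {"IorD": 0, "ALUSrcA": 0, "ALUSrcB": 0b01, "ALUOp": 0b00,
           "PCSrc": 0, "IRWrite": 1, "PCWrite": 1},
    "S1": {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    "S2": {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    "S3": {"IorD": 1},
    "S4": {"RegDst": 0, "MemtoReg": 1, "RegWrite": 1},
    "S5": {"IorD": 1, "MemWrite": 1},
    "S6": {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    "S7": {"RegDst": 1, "MemtoReg": 0, "RegWrite": 1},
    "S8": {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
           "PCSrc": 1, "Branch": 1},
}

# Where the decode state goes for each supported opcode class.
AFTER_DECODE = {"LW": "S2", "SW": "S2", "R-type": "S6", "BEQ": "S8"}


def next_state(state: str, op: str) -> str:
    """Return the successor of state for an instruction of class op."""
    if state == "S0":
        return "S1"
    if state == "S1":
        return AFTER_DECODE[op]
    if state == "S2":
        return "S3" if op == "LW" else "S5"
    if state == "S3":
        return "S4"
    if state == "S6":
        return "S7"
    return "S0"  # S4, S5, S7 and S8 finish the instruction


def states_for(op: str) -> list:
    """List the states visited while executing one instruction."""
    trace = ["S0"]
    while True:
        nxt = next_state(trace[-1], op)
        if nxt == "S0":
            return trace
        trace.append(nxt)


for op in ("LW", "SW", "R-type", "BEQ"):
    print(op, states_for(op), len(states_for(op)), "cycles")
```

The length of each trace is the number of cycles the instruction takes, which is the quantity that feeds into the CPI calculation.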
Main Controller FSM: addi
S1: Decode -> S9: ADDI Execute when Op = ADDI
S9: ADDI Execute
  ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S9: ADDI Execute -> S10: ADDI Writeback
S10: ADDI Writeback
  RegDst = 0, MemtoReg = 0, RegWrite
S10: ADDI Writeback -> S0: Fetch
Extended Functionality: j
[Figure: multi-cycle datapath extended for j. PCSrc is widened to two bits (PCSrc1:0) and its multiplexer has inputs 00, 01 and 10. Input 10 is PCJump, formed from bits 31:28 of the PC and the jump field Instr25:0 shifted left by 2 (bits 27:0).]
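The construction of PCJump can be sketched as follows. The function name and the example word are assumptions; the example is the standard MIPS encoding of a jump to 0x00400000 (opcode 2). In the multi-cycle design the PC has already been incremented during fetch, so the upper four bits are taken from PC + 4.

```python
def jump_target(pc_plus_4: int, instr: int) -> int:
    """PCJump = {PC[31:28], Instr[25:0] << 2}."""
    addr_field = instr & 0x03FFFFFF           # Instr25:0
    low_28_bits = addr_field << 2             # bits 27:0 of the target
    upper_4_bits = pc_plus_4 & 0xF0000000     # bits 31:28 of the PC
    return upper_4_bits | low_28_bits


# Hypothetical example: j 0x00400000 located at address 0x00400000.
J_EXAMPLE = 0x08100000
print(hex(jump_target(0x00400004, J_EXAMPLE)))
```

Because only 26 address bits fit in the instruction, a jump can reach any word inside the current 256 MB region but cannot change the top four PC bits.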
Control FSM: j
PCSrc is now two bits wide: S0: Fetch uses PCSrc = 00 and S8: Branch uses PCSrc = 01. All other states are unchanged.
S1: Decode -> S11: Jump when Op = J
S11: Jump
  PCSrc = 10, PCWrite
S11: Jump -> S0: Fetch
Multi-cycle Performance
Instructions take different numbers of cycles:
3 cycles: beq, j
4 cycles: R-Type, sw, addi
5 cycles: lw
CPI is a weighted average, e.g. for the SPECINT2000 benchmark:
25% loads
10% stores
11% branches
2% jumps
52% R-type
Average CPI = (0.11 + 0.02) × 3 + (0.52 + 0.10) × 4 + (0.25) × 5 = 4.12
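The weighted average can be reproduced with the instruction mix and cycle counts given above. The dictionary names are illustrative.

```python
# Instruction mix (fractions of executed instructions) from the slide.
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "r_type": 0.52}

# Cycles per instruction class in the multi-cycle processor.
CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "r_type": 4}

# Weighted average: sum of (fraction x cycles) over all classes.
average_cpi = sum(MIX[kind] * CYCLES[kind] for kind in MIX)
print(f"Average CPI = {average_cpi:.2f}")
```

A different benchmark would change only the `MIX` values; a load-heavy program pushes the average toward 5, a branch-heavy one toward 3.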
Multi-cycle Performance
Multi-cycle critical path:
Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup
[Figure: complete multi-cycle datapath and control unit.]
Multi-cycle Performance Example
Element                  Parameter   Delay (ps)
Register clock-to-Q      tpcq_PC     30
Register setup           tsetup      20
Multiplexer              tmux        25
ALU                      tALU        200
Memory read              tmem        250
Register file read       tRFread     150
Register file setup      tRFsetup    20
Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup
   = [30 + 25 + max(200 + 25, 250) + 20] ps
   = [30 + 25 + 250 + 20] ps
   = 325 ps
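The same calculation as a short script, using the delays from the table. The dictionary and function names are illustrative.

```python
# Component delays in picoseconds, taken from the table.
DELAYS_PS = {
    "tpcq_PC": 30,
    "tsetup": 20,
    "tmux": 25,
    "tALU": 200,
    "tmem": 250,
    "tRFread": 150,
    "tRFsetup": 20,
}


def multicycle_tc(d: dict) -> int:
    """Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup."""
    return d["tpcq_PC"] + d["tmux"] + max(d["tALU"] + d["tmux"], d["tmem"]) + d["tsetup"]


print(multicycle_tc(DELAYS_PS), "ps")
```

With these numbers the memory read (250 ps) is slower than the ALU followed by a multiplexer (225 ps), so memory sets the cycle time.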
Multi-cycle Performance Example
For a program with 100 billion instructions executing on a multi-cycle MIPS processor:
CPI = 4.12
Tc = 325 ps
Execution Time = (# instructions) × CPI × Tc
= (100 × 10^9)(4.12)(325 × 10^-12 s) = 133.9 seconds
This is slower than the single-cycle processor (92.5 seconds). Why?
Not all steps are the same length
Sequencing overhead is paid for each step (tpcq + tsetup = 50 ps)
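The comparison can be checked numerically. This sketch uses only figures stated on the slides; the variable names are illustrative.

```python
N_INSTRUCTIONS = 100e9       # 100 billion instructions
CPI = 4.12                   # average cycles per instruction
TC_SECONDS = 325e-12         # multi-cycle clock period
SINGLE_CYCLE_SECONDS = 92.5  # single-cycle execution time quoted on the slide
OVERHEAD_PS = 50             # tpcq + tsetup, paid once per cycle
TC_PS = 325

multi_cycle_seconds = N_INSTRUCTIONS * CPI * TC_SECONDS
slowdown = multi_cycle_seconds / SINGLE_CYCLE_SECONDS
overhead_share = OVERHEAD_PS / TC_PS

print(f"Multi-cycle execution time: {multi_cycle_seconds:.1f} s")
print(f"Relative to single-cycle:   {slowdown:.2f}x")
print(f"Sequencing overhead share:  {overhead_share:.1%} of every cycle")
```

Since an average instruction takes 4.12 cycles, the 50 ps of sequencing overhead is paid about four times per instruction instead of once.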
Review: Single-Cycle MIPS Processor
[Figure: single-cycle MIPS datapath and control unit, including the Jump control signal and the PCJump path (Instr25:0 shifted left by 2, combined with bits 31:28).]
Review: Multicycle MIPS Processor
[Figure: complete multi-cycle MIPS datapath and control unit, including the PCJump path (Instr25:0 shifted left by 2, combined with bits 31:28) and the three-input PCSrc multiplexer.]
What Have We Learned?
A more ‘realistic’ architecture
  Shared data and program memory
  A single ALU for all operations
Multi-cycle: operations take different numbers of cycles
  Simpler operations take fewer steps
  More complex operations take more steps
  Average CPI
Bottom line
  Smaller
  More complex control
  Not necessarily faster (overhead)