EEM 486: Computer Architecture
Designing a Multicycle Processor
The Big Picture
Designing a Multiple Clock Cycle Datapath
[Figure: the five components of a computer: Processor (Control and Datapath), Memory, Input, Output]
Single-Cycle Processor
In our single-cycle processor, each instruction is realized by exactly one control command, or microinstruction.
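[Figure: the instruction OPcode feeds the Control Logic / Store (PLA, ROM); Decode produces a microinstruction that drives the Control Points of the Datapath; Conditions feed back from the Datapath to the control logic]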
Abstract View of Single Cycle-Processor
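[Figure: abstract single-cycle datapath. Blocks: PC / Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Reg. Wrt. Main Control takes op, ALU control takes fun. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr; Equal is fed back from the datapath]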
What’s Wrong with CPI=1 Processor?
Long Cycle Time
All instructions take as much time as the slowest
Real memory is not as nice as our idealized memory
◦ Cannot always get the job done in one (short) cycle
[Figure: units on the path of each instruction class]
Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
Load (Critical Path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
Branch: PC, Inst Memory, Reg File, cmp, mux
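A rough sketch of why the load sets the single-cycle clock: sum the unit delays along each path and take the maximum. The delay values below are hypothetical and chosen only for illustration (the slides give no numbers); only the path structure follows the four instruction classes above.

```python
# Hypothetical unit delays in ns (assumed for illustration, not from the slides).
DELAY = {"PC": 1, "Inst Memory": 10, "Reg File": 5, "mux": 1,
         "ALU": 8, "Data Mem": 10, "cmp": 2, "setup": 1}

# Units each instruction class passes through, in order.
PATHS = {
    "Arithmetic & Logical": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "mux", "setup"],
    "Load": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "mux", "setup"],
    "Store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem"],
    "Branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux"],
}

def path_delay(units):
    """Total delay of one path: the sum of its unit delays."""
    return sum(DELAY[u] for u in units)

delays = {name: path_delay(units) for name, units in PATHS.items()}
critical = max(delays, key=delays.get)   # slowest instruction class
cycle_time = delays[critical]            # single-cycle clock must cover it

for name, d in delays.items():
    print(f"{name}: {d} ns")
print("cycle time set by", critical)
```

With these assumed delays every instruction, including a branch that needs about half the time, pays the full load latency.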
Memory Access Time
Physics: fast memories are small (large memories are slow)
Use a hierarchy of memories
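[Figure: storage array. The address enters the address decoder, which selects a word line; storage cells drive the bit lines into the sense amps]
[Figure: memory hierarchy. Processor and Cache: 1 time-period; across the proc. bus to the L2 Cache: 2-3 time-periods; across the mem. bus to memory: 20 - 50 time-periods]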
Multicycle Approach
Break up the instructions into steps:
◦ Let each step take one “smaller” clock cycle
  - Balance the amount of work to be done
  - Restrict each cycle to use only one major functional unit
    Major functional units: Memory, Register File, and ALU
◦ Let different instructions take different numbers of cycles
Use a functional unit more than once within execution of one instruction (Less hardware)
◦ A single memory unit for both instructions and data
◦ A single ALU, rather than an ALU and two adders
At the end of a cycle
◦ store values for use in later cycles
◦ introduce additional “internal” registers
Partitioning the CPI=1 Datapath
Add registers between smallest steps
[Figure: the single-cycle datapath (PC / Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access with Data Mem, Reg. File) cut into five stages: Instruction fetch, Decode and Operand fetch, Execution, Memory access, Writeback. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr; Equal is fed back from the datapath]
Recall: Step-by-step Processor Design
Step 1: ISA → Logical Register Transfers
Step 2: Components of the Datapath
Step 3: RTL + Components → Datapath
Step 4: Datapath + Logical RTs → Physical RTs
Step 5: Physical RTs → Control
Step 4 : R-type (add, sub, . . .)
inst    Logical Register Transfers
ADDU    R[rd] <– R[rs] + R[rt]; PC <– PC + 4
Step 1. Instruction Fetch IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch A ← R[rs], B ← R[rt]
Step 3. Execution ALUOut ← A op B
Step 4. Write-back R[rd] ← ALUOut
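The four R-type steps above can be sketched as a small Python function, one statement group per clock cycle. This is a minimal illustration, not the slides' own code: the function name `run_r_type` and the representation of an instruction as a dict of already-decoded fields are assumptions made for brevity.

```python
def run_r_type(mem, regs, pc, op):
    """Run one R-type instruction; return (new_pc, cycles_used)."""
    # Cycle 1, instruction fetch: IR <- MEM[PC], PC <- PC + 4
    ir = mem[pc]
    pc = pc + 4
    # Cycle 2, decode and register fetch: A <- R[rs], B <- R[rt]
    a, b = regs[ir["rs"]], regs[ir["rt"]]
    # Cycle 3, execution: ALUOut <- A op B (kept to 32 bits)
    alu_out = op(a, b) & 0xFFFFFFFF
    # Cycle 4, write-back: R[rd] <- ALUOut
    regs[ir["rd"]] = alu_out
    return pc, 4

# Usage: ADDU with rs=1, rt=2, rd=3 stored at address 0 (hypothetical encoding).
mem = {0: {"rs": 1, "rt": 2, "rd": 3}}
regs = [0] * 32
regs[1], regs[2] = 5, 7
new_pc, cycles = run_r_type(mem, regs, 0, lambda a, b: a + b)
print(regs[3], new_pc, cycles)
```

The local variables `ir`, `a`, `b`, and `alu_out` play the role of the internal registers that hold values between cycles.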
Step 4 : R-type (add, sub, . . .)
[Figure: multicycle datapath for R-type instructions. Blocks: PC, Memory (Address, MemData, Write data), Instruction register (Instruction [31-26], [25-21], [20-16], [15-11], [15-0]), Registers (Rs, Rt, Rw, Write data, Read data 1, Read data 2), A, B, ALU, ALUOut. Control signals: MemRead, MemWrite, IRWrite, RegWrite, ALUSrcA, ALUSrcB, ALUctr, nPCWrite]
Step 4 : Logical immediate
inst    Logical Register Transfers
ORI     R[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4
Step 1. Instruction Fetch IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch A ← R[rs]
Step 3. Execution ALUOut ← A OR ZExt(Im16)
Step 4. Write-back R[rt] ← ALUOut
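A minimal sketch of the ORI steps above, this time decoding the register fields from a 32-bit instruction word using the bit ranges shown on the datapath (rs in bits 25-21, rt in bits 20-16, immediate in bits 15-0). The helper names `zext16` and `run_ori` are illustrative assumptions.

```python
def zext16(word):
    """Zero-extend the low 16 bits: the upper 16 bits of the result are 0."""
    return word & 0xFFFF

def run_ori(mem, regs, pc):
    """Run one ORI instruction; return the updated PC."""
    # Cycle 1: IR <- MEM[PC], PC <- PC + 4
    ir = mem[pc]
    pc += 4
    # Cycle 2: A <- R[rs]
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = regs[rs]
    # Cycle 3: ALUOut <- A OR ZExt(Im16)
    alu_out = a | zext16(ir)
    # Cycle 4: R[rt] <- ALUOut
    regs[rt] = alu_out
    return pc

# Usage: ori $2, $1, 0x8001 (opcode 0x0D in bits 31-26).
word = (0x0D << 26) | (1 << 21) | (2 << 16) | 0x8001
mem = {0: word}
regs = [0] * 32
regs[1] = 0x10
new_pc = run_ori(mem, regs, 0)
print(hex(regs[2]), new_pc)
```

Because the immediate is zero-extended, bit 15 of `0x8001` does not spread into the upper half of the result.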
Step 4 : Logical immediate
[Figure: multicycle datapath extended for logical immediates. Added to the R-type datapath: a Zero extend unit (16 to 32 bits) feeding the ALU, and a RegDst multiplexer choosing between Instruction [20-16] and Inst [15-11] as the write register. Control signals: MemRead, MemWrite, IRWrite, RegWrite, RegDst, ALUSrcA, ALUSrcB, ALUctr, nPCWrite]
Step 4 : Load
inst    Logical Register Transfers
LW      R[rt] <– MEM[R[rs] + SExt(Im16)]; PC <– PC + 4
Step 1. Instruction Fetch IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch A ← R[rs]
Step 3. Memory address computation ALUOut ← A + SExt(Im16)
Step 4. Memory access MDR ← Memory[ALUOut]
Step 5. Load completion R[rt] ← MDR
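The five load steps above can be sketched as follows. This is an illustration under stated assumptions: instructions are stored as dicts of decoded fields, data words as plain integers, and both live in the same `mem` dict to mirror the single memory unit; the names `sext16` and `run_lw` are hypothetical.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def run_lw(mem, regs, pc):
    """Run one LW instruction; return (new_pc, cycles_used)."""
    # Cycle 1: IR <- MEM[PC], PC <- PC + 4
    ir = mem[pc]
    pc += 4
    # Cycle 2: A <- R[rs]
    a = regs[ir["rs"]]
    # Cycle 3: ALUOut <- A + SExt(Im16)
    alu_out = (a + sext16(ir["imm16"])) & 0xFFFFFFFF
    # Cycle 4: MDR <- Memory[ALUOut]
    mdr = mem[alu_out]
    # Cycle 5: R[rt] <- MDR
    regs[ir["rt"]] = mdr
    return pc, 5

# Usage: lw $2, -4($1) with $1 = 100, so the word at address 96 is loaded.
mem = {0: {"rs": 1, "rt": 2, "imm16": 0xFFFC}, 96: 1234}
regs = [0] * 32
regs[1] = 100
new_pc, cycles = run_lw(mem, regs, 0)
print(regs[2], new_pc, cycles)
```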
Step 4 : Load
[Figure: multicycle datapath extended for loads. Added: an Extender (16 to 32 bits, controlled by ExtOp) in place of the zero-extend unit, a memory data register (MDR), an IorD multiplexer selecting PC or ALUOut as the memory address, and a MemtoReg multiplexer selecting ALUOut or MDR as the register write data. Control signals: MemRead, MemWrite, IRWrite, RegWrite, RegDst, ALUSrcA, ALUSrcB, ALUctr, nPCWrite, IorD, MemtoReg, ExtOp]
Step 4 : Store
inst    Logical Register Transfers
SW      MEM[R[rs] + SExt(Im16)] <– R[rt]; PC <– PC + 4
Step 1. Instruction Fetch IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch A ← R[rs], B ← R[rt]
Step 3. Memory address computation ALUOut ← A + SExt(Im16)
Step 4. Memory access Memory[ALUOut] ← B
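A short sketch of the four store steps above. As an assumption for illustration, the instruction is a dict of decoded fields and memory is a Python dict keyed by address; `sext16` and `run_sw` are hypothetical names.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def run_sw(mem, regs, pc):
    """Run one SW instruction; return (new_pc, cycles_used)."""
    # Cycle 1: IR <- MEM[PC], PC <- PC + 4
    ir = mem[pc]
    pc += 4
    # Cycle 2: A <- R[rs], B <- R[rt]
    a, b = regs[ir["rs"]], regs[ir["rt"]]
    # Cycle 3: ALUOut <- A + SExt(Im16)
    alu_out = (a + sext16(ir["imm16"])) & 0xFFFFFFFF
    # Cycle 4: Memory[ALUOut] <- B (no register write-back, so only 4 cycles)
    mem[alu_out] = b
    return pc, 4

# Usage: sw $2, 8($1) with $1 = 100 and $2 = 77 writes 77 to address 108.
mem = {0: {"rs": 1, "rt": 2, "imm16": 8}}
regs = [0] * 32
regs[1], regs[2] = 100, 77
new_pc, cycles = run_sw(mem, regs, 0)
print(mem[108], new_pc, cycles)
```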
Step 4 : Store
[Figure: multicycle datapath for stores. Same blocks as for loads (PC, Memory, Instruction register, Registers, A, B, Extender, ALU, ALUOut, MDR); B drives the memory Write data input. Control signals: MemRead, MemWrite, IRWrite, RegWrite, RegDst, ALUSrcA, ALUSrcB, ALUctr, nPCWrite, IorD, MemtoReg, ExtOp]
Step 4 : Branch
inst    Logical Register Transfers
BEQ     if R[rs] == R[rt] then PC <– PC + 4 + (SExt(Im16) || 00)
        else PC <– PC + 4
Step 1. Instruction Fetch IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch A ← R[rs], B ← R[rt], ALUOut ← PC + (SExt(Im16) || 00)
Step 3. Branch completion If A = B, PC ← ALUOut
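The branch steps above can be sketched in the same style. Note that the target is computed in the decode cycle, before the comparison is known, using the PC that was already incremented in cycle 1. The dict-based instruction format and the names `sext16` and `run_beq` are assumptions for illustration.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def run_beq(mem, regs, pc):
    """Run one BEQ instruction; return (new_pc, cycles_used)."""
    # Cycle 1: IR <- MEM[PC], PC <- PC + 4
    ir = mem[pc]
    pc += 4
    # Cycle 2: A <- R[rs], B <- R[rt], ALUOut <- PC + (SExt(Im16) || 00)
    a, b = regs[ir["rs"]], regs[ir["rt"]]
    alu_out = (pc + (sext16(ir["imm16"]) << 2)) & 0xFFFFFFFF
    # Cycle 3: if A = B, PC <- ALUOut
    if a == b:
        pc = alu_out
    return pc, 3

# Usage: beq $1, $2, +3 words at address 0.
mem = {0: {"rs": 1, "rt": 2, "imm16": 3}}
regs = [0] * 32
regs[1], regs[2] = 9, 9
taken_pc, cycles = run_beq(mem, regs, 0)      # equal: PC = 4 + 12
regs[2] = 8
not_taken_pc, _ = run_beq(mem, regs, 0)       # not equal: PC = 4
print(taken_pc, not_taken_pc, cycles)
```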
Step 4 : Branch
[Figure: multicycle datapath extended for branches. Added: a Shift left 2 unit after the Extender, a PCSource multiplexer selecting the ALU result or ALUOut as the next PC, and PC write logic driven by PCWrite, PCWriteCond, and the ALU Zero output. Control signals: MemRead, MemWrite, IRWrite, RegWrite, RegDst, ALUSrcA, ALUSrcB, ALUctr, IorD, MemtoReg, ExtOp, PCSource, PCWrite, PCWriteCond]
Multicycle Processor
[Figure: complete multicycle processor. Datapath blocks: PC, Memory (Address, MemData, Write data), Instruction register (Instruction [31-26], [25-21], [20-16], [15-11], [15-0]), Registers (Rs, Rt, Rw, Write data, Read data 1, Read data 2), A, B, Extender (16 to 32 bits), Shift left 2, ALU (with Zero output), ALUOut, MDR. The Control unit takes Op [5-0] and generates PCWrite, PCWriteCond, PCSource, IorD, MemRead, MemWrite, MemtoReg, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, ExtOp; the ALU Control takes ALUOp and Instruction [5-0]]
Summary of Instruction Steps
Instruction fetch (all instructions)
◦ IR = Memory[PC]
◦ PC = PC + 4
Instruction decode/register fetch (all instructions)
◦ A = Reg[IR[25-21]]
◦ B = Reg[IR[20-16]]
◦ ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Execution, address computation, branch/jump completion
◦ R-type: ALUOut = A op B
◦ Memory reference: ALUOut = A + sign-extend(IR[15-0])
◦ Branch: if (A == B) then PC = ALUOut
◦ Jump: PC = PC[31-28] || (IR[25-0] << 2)
Memory access or R-type completion
◦ R-type: Reg[IR[15-11]] = ALUOut
◦ Load: MDR = Memory[ALUOut]
◦ Store: Memory[ALUOut] = B
Memory read completion
◦ Load: Reg[IR[20-16]] = MDR
Performance Evaluation
What is the average CPI?
◦ State diagram gives CPI for each instruction type
◦ Workload gives frequency of each type
Type          CPIi for type   Frequency   CPIi x freqi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI: 4.1
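The average is the frequency-weighted sum of the per-type CPIs. A minimal check of the arithmetic, using the CPI and frequency values from the table above (the variable names are illustrative):

```python
# (type, CPI for that type, frequency in percent), taken from the table above.
workload = [
    ("Arith/Logic", 4, 40),
    ("Load", 5, 30),
    ("Store", 4, 10),
    ("Branch", 3, 20),
]

# Frequencies must cover the whole workload.
total_percent = sum(pct for _, _, pct in workload)

# Average CPI = sum over types of CPI_i * freq_i.
average_cpi = sum(cpi * pct for _, cpi, pct in workload) / 100

print(average_cpi)  # 4.1
```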
Simple Questions
How many cycles will it take to execute this code?
        lw  $t2, 0($t3)
        lw  $t3, 4($t3)
        beq $t2, $t3, Label    #assume not
        add $t5, $t2, $t3
        sw  $t5, 8($t3)
Label:  ...
◦ 21 cycles
What is going on during the 8th cycle of execution?
◦ Address calculation to put on ALUOut
In what cycle does the actual addition of $t2 and $t3 take place?
◦ 16th cycle
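The three answers can be checked by laying the instructions end to end, since a multicycle processor finishes one instruction before fetching the next. The sketch below uses the per-type cycle counts from the performance table (load 5, store 4, arithmetic 4, branch 3); the function name `schedule` is an assumption for illustration.

```python
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}
program = ["lw", "lw", "beq", "add", "sw"]   # beq falls through to add

def schedule(program):
    """Return (total_cycles, [(first_cycle, last_cycle), ...]), 1-based."""
    spans, t = [], 0
    for inst in program:
        spans.append((t + 1, t + CYCLES[inst]))
        t += CYCLES[inst]
    return t, spans

total, spans = schedule(program)
print(total)   # 21
print(spans)   # [(1, 5), (6, 10), (11, 13), (14, 17), (18, 21)]

# Cycle 8 lies in the second lw (cycles 6-10); it is that instruction's
# 3rd step, the memory address computation into ALUOut.
step_in_second_lw = 8 - spans[1][0] + 1

# The add starts in cycle 14; its execution (3rd) step is cycle 16.
add_execute_cycle = spans[3][0] + 2
```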
Summary
Disadvantages of the Single Cycle Processor
◦ Long cycle time
◦ Cycle time is too long for all instructions except the Load
Multiple Cycle Processor:
◦ Divide the instructions into smaller steps
◦ Execute each step (instead of the entire instruction) in one cycle
Partition datapath into equal size chunks to minimize cycle time
Follow same 5-step method for designing “real” processor