Designing a Single Cycle Datapath
or The Do-It-Yourself CPU Kit
Reading 4.4 – HW due Monday
Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
The Big Picture: Where are We Now?
• The Five Classic Components of a Computer
• Today’s Topic: Datapath Design, then Control Design
[Figure: the five components: Processor (Control, Datapath), Memory, Input, Output]
The Big Picture: The Performance Perspective
• Processor design (datapath and control) will determine:
  – Clock cycle time
  – Clock cycles per instruction
• Starting today:
  – Single cycle processor:
    Advantage: One clock cycle per instruction
    Disadvantage: long cycle time
• ET = Insts * CPI * Cycle Time
  Execute an entire instruction
• We're ready to look at an implementation of the MIPS simplified to contain only:
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq
• Generic Implementation:
  – use the program counter (PC) to supply instruction address
  – get the instruction from memory
  – read registers
  – use the instruction to decide exactly what to do
The Processor: Datapath & Control
Let’s look at some regularity in our instructions
Review: Two Types of Logic Components
[Figure: State Element with inputs A, B and clk; output C = f(A,B,state)]
[Figure: Combinational Logic with inputs A and B; output C = f(A,B)]
Clocking Methodology
• All storage elements are clocked by the same clock edge
[Figure: Clk waveform with Setup and Hold windows around each clock edge and Don’t Care regions in between]
Consequently, our cycle time will be the sum of:
(a) The Clock-to-Q time of the input registers.
(b) The longest delay path through the combinational logic block.
(c) The setup time of the output register.
(d) And finally the clock skew.
In order to avoid hold time violation, you have to make sure this inequality is fulfilled.
---- DRAW CT
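The sum of terms (a) through (d) can be sketched in a few lines of Python. The function name and the delay values are made up for illustration; they are not measurements from any real design.

```python
def min_cycle_time(clk_to_q, longest_comb_delay, setup, skew):
    """Minimum clock period: the sum of the four terms listed above."""
    return clk_to_q + longest_comb_delay + setup + skew

# Hypothetical delays in nanoseconds (assumed values, for illustration only).
cycle_ns = min_cycle_time(clk_to_q=1, longest_comb_delay=15, setup=1, skew=1)
print(cycle_ns)
```

The longest combinational path dominates here, which is why the single-cycle design ends up with a long cycle time.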
Which is correct about the ALU and memory in MIPS?
A. The ALU always performs an operation before accessing data memory
B. The ALU sometimes performs an operation before accessing data memory
C. Data memory is always accessed before performing an ALU operation
D. Data memory is sometimes accessed before performing an ALU operation
E. None of the above.
Isomorphic
Which is correct about the ALU and the register file in MIPS?
A. The ALU always performs an operation before accessing the register file
B. The ALU sometimes performs an operation before accessing the register file
C. The register file is always accessed before performing an ALU operation
D. The register file is sometimes accessed before performing an ALU operation
E. None of the above.
Isomorphic
So what does this tell us?
Draw the register file before ALU before memory
Register Transfer Language (RTL)
• is a mechanism for describing the movement and manipulation of data between storage elements:
R[3] <- R[5] + R[7]
PC <- PC + 4 + R[5]
R[rd] <- R[rs] + R[rt]
R[rt] <- Mem[R[rs] + immed]
We’ll be using this from time to time – it’s just a shorthand for what is going on in hardware. We’ll use it in a second.
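As a rough illustration, each RTL statement maps onto one assignment if the register file is modeled as a Python list and memory as a dict. The register contents, memory contents, and field values below are assumed purely for the example.

```python
# Register file as a list of 32 words and memory as a dict keyed by address.
R = [0] * 32
Mem = {}

R[5], R[7] = 10, 32   # hypothetical register contents
Mem[14] = 99          # hypothetical memory contents

# R[3] <- R[5] + R[7]
R[3] = R[5] + R[7]

# R[rt] <- Mem[R[rs] + immed], with rs=5, rt=8, immed=4 (assumed field values)
rs, rt, immed = 5, 8, 4
R[rt] = Mem[R[rs] + immed]

print(R[3], R[8])
```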
Review: The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. The three instruction formats:
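Format  Fields (bit positions, width)
R-type  op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits
I-type  op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits
J-type  op [31:26] 6 bits | target address [25:0] 26 bits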
Before we start designing our processor – we need to know how the instructions look alike.
MIPS is simple – only 3 formats and they have some common features. Let’s look more closely at the few instructions we are focusing on today.
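Because the fields sit at fixed bit positions, pulling them out of a 32-bit word is just shifting and masking. The sketch below uses a hypothetical helper named `decode` and extracts every field of all three formats at once; which fields are meaningful depends on the opcode.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31-26
        "rs":     (instr >> 21) & 0x1F,   # bits 25-21
        "rt":     (instr >> 16) & 0x1F,   # bits 20-16
        "rd":     (instr >> 11) & 0x1F,   # bits 15-11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct":  instr & 0x3F,           # bits 5-0   (R-type)
        "imm16":  instr & 0xFFFF,         # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,      # bits 25-0  (J-type)
    }

# add $3, $5, $7: op=0, rs=5, rt=7, rd=3, shamt=0, funct=100000
word = (0 << 26) | (5 << 21) | (7 << 16) | (3 << 11) | (0 << 6) | 0b100000
fields = decode(word)
```

Note that rs and rt land in the same place in R-type and I-type, which is the regularity the datapath exploits when it reads registers before knowing the instruction type.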
The MIPS Subset
• R-type
  – add rd, rs, rt
  – sub, and, or, slt
  Format: op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
  RTL:
    PC = PC+4
    R[rd] = R[rs] OP R[rt]
• LOAD and STORE
  – lw rt, rs, imm16
  – sw rt, rs, imm16
  Format: op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
  RTL:
    PC = PC+4
    R[rt] = Mem[R[rs] + SE(imm)] OR
    Mem[R[rs] + SE(imm)] = R[rt]
• BRANCH:
  – beq rs, rt, imm16
  Format: op [31:26] | rs [25:21] | rt [20:16] | displacement [15:0]
  RTL:
    ZERO = (R[rs] – R[rt] == 0)
    if (ZERO) PC = PC + 4 + (SE(Imm)<<2)
    else PC = PC+4
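The SE() operator and the branch RTL above can be sketched as follows. The function names are hypothetical, and the PC is wrapped to 32 bits as an assumption about the hardware register width.

```python
def sign_extend16(imm16):
    """SE(imm): interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc_beq(pc, rs_val, rt_val, imm16):
    """beq: PC + 4 + (SE(imm) << 2) when the operands are equal, else PC + 4."""
    zero = (rs_val - rt_val) == 0
    if zero:
        return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF
```

For example, an immediate of 0xFFFF sign-extends to -1, so a taken branch lands at PC + 4 - 4, which is the branch instruction itself.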
BEFORE GOING ON… quick reminder…
Storage Element: Register
• Register
  – Similar to the D Flip Flop except
    N-bit input and output
    Write Enable input
  – Write Enable:
    0: Data Out will not change
    1: Data Out will become Data In (on the clock edge)
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]
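A minimal behavioral sketch of this register, assuming a hypothetical `clock_edge` method stands in for the rising clock edge:

```python
class Register:
    """N-bit register: Data Out changes only on a clock edge with Write Enable = 1."""

    def __init__(self, n_bits=32):
        self.mask = (1 << n_bits) - 1
        self.data_out = 0

    def clock_edge(self, data_in, write_enable):
        # Write Enable = 0: Data Out will not change.
        # Write Enable = 1: Data Out becomes Data In (truncated to N bits).
        if write_enable:
            self.data_out = data_in & self.mask
        return self.data_out
```

Between clock edges nothing happens in this model, which matches the idea that all storage elements are clocked by the same edge.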
Which of these describes our register file?
A. Two 32-bit outputs, 3 5-bit inputs, clk input, 1-bit control input
B. Two 32-bit outputs, 3 32-bit inputs, clk input, 1-bit control input
C. Two 32-bit outputs, 2 5-bit inputs, 1 32-bit input, clk input, 1-bit control input
D. Two 32-bit outputs, 2 32-bit inputs, 1 32-bit input, clk input, 1-bit control input
E. None of the above
Register File
[Figure: register file of 32 32-bit registers. Inputs: RR1 (5 bits), RR2 (5 bits), WR (5 bits), Write Data (32 bits), RegWrite, Clk. Outputs: Read Data 1 (32 bits), Read Data 2 (32 bits).]
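The register file ports can be modeled in the same style. This is only a sketch: the class and method names are hypothetical, and treating register 0 as hardwired to zero is an assumption taken from the MIPS convention, not something shown in the figure.

```python
class RegisterFile:
    """32 x 32-bit registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rr1, rr2):
        # Reads are combinational: outputs follow the two 5-bit read addresses.
        return self.regs[rr1 & 0x1F], self.regs[rr2 & 0x1F]

    def clock_edge(self, wr, write_data, reg_write):
        # The write happens only on the clock edge, and only if RegWrite = 1.
        # Assumption: register 0 always reads as zero, as in MIPS.
        if reg_write and (wr & 0x1F) != 0:
            self.regs[wr & 0x1F] = write_data & 0xFFFFFFFF
```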
Which of these describes our memory (for now)?
A. One 32-bit output, 1 5-bit input, 1 32-bit input, clk input, 1-bit control input, 1 bit control input
B. One 32-bit output, 2 5-bit inputs, clk input, 1-bit control input, 1 bit control input
C. One 32-bit output, 2 32-bit inputs, clk input, 2 1-bit control inputs
D. One 32-bit output, 1 32-bit input, clk input, 2 1-bit control inputs
E. None of the above
Memory
[Figure: memory. Inputs: Address, Write Data (32 bits), MemWrite, MemRead, Clk. Output: Read Data (32 bits).]
Can we lay out a high-level design to do this?
Draw as much as you can implementing one instruction at a time – get the students involved
You’ll want to do something like this for your lab
Putting it All Together: A Single Cycle Datapath
• We have everything except control signals (later)
Ignoring control, which instruction does this active datapath represent?
A. R-type
B. lw
C. sw
D. beq
E. None of the above
Active Single-Cycle Datapath
Ignoring control, which instruction does this active datapath represent?
A. R-type
B. lw
C. sw
D. beq
E. None of the above
Active Single-Cycle Datapath
Ignoring control, which instruction does this active datapath represent?
A. R-type
B. lw
C. sw
D. beq
E. None of the above
Active Single-Cycle Datapath
Ignoring control, which instruction does this active datapath represent?
A. R-type
B. lw
C. sw
D. beq
E. None of the above
Active Single-Cycle Datapath
Key Points
• CPU is just a collection of state and combinational logic
• We just designed a very rich processor, at least in terms of functionality
• ET = IC * CPI * Cycle Time
  – where does the single-cycle machine fit in?
Control Logic for the Single-Cycle CPU
or Who’s in charge here?
Putting it All Together: A Single Cycle Datapath
• We have everything except control signals
We’re going to connect up all these signals to a central place, and control them from there, based on opcode/funct.
Okay, then, what about those Control Signals?
Point out we’ve just hooked these up.
Peer instruction question asking if decode can happen in parallel with register read.
Selection
Select the true statement for MIPS
A Registers can be read in parallel with control signal generation
B Instruction Read can be done in parallel with control signal generation
C Registers can be written in parallel with control signal generation
D The main ALU can execute in parallel with control signal generation
E None of the above
Okay, then, what about those Control Signals?
Start here
Notice that the control bits come from the opcode and sometimes the function code bits. R-type instructions all use the same control signals except for the ALU operation.
ALU control bits
• Recall: 5-function ALU

ALU control input  Function  Operations
000                And       and
001                Or        or
010                Add       add, lw, sw
110                Subtract  sub, beq
111                Slt       slt
Take your time here, this isn’t obvious. These are the 3 bit input signals which cause the processor to do what you want.
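A behavioral sketch of the 5-function ALU, keyed on the 3-bit control input from the table. It is a functional model only: slt is computed with a direct signed comparison rather than from the adder's sign bit, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(ctrl, a, b):
    """Return (result, zero) for the 3-bit ALU control input `ctrl`."""
    if ctrl == 0b000:      # And
        result = a & b
    elif ctrl == 0b001:    # Or
        result = a | b
    elif ctrl == 0b010:    # Add (add, lw, sw)
        result = a + b
    elif ctrl == 0b110:    # Subtract (sub, beq)
        result = a - b
    elif ctrl == 0b111:    # Slt
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"unused ALU control input: {ctrl:03b}")
    result &= MASK32       # keep the result to 32 bits
    return result, result == 0
```

The zero output is what beq uses: subtracting two equal operands gives a result of 0 and sets zero to true.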
Full ALU
sign bit (adder output from bit 31)
What signals accomplish each operation?

       Binvert  CIn  Oper
and?   0        0    0
or?    0        0    1
add?   0        0    2
sub?   1        1    2
beq?   1        1    2
slt?   1        1    3

Consolidate to 3 wires since Binvert and CIn are always the same.
ALU control bits
• Recall: 5-function ALU
• based on opcode (bits 31-26) and function code (bits 5-0) from instruction
• ALU doesn’t need to know all opcodes--we will summarize opcode with ALUOp (2 bits):
    00 - lw, sw
    01 - beq
    10 - R-format

ALU control input  Function  Operations
000                And       and
001                Or        or
010                Add       add, lw, sw
110                Subtract  sub, beq
111                Slt       slt

[Figure: Main Control takes op (6 bits) and outputs ALUop (2 bits); ALU Control takes ALUop (2 bits) and func (6 bits) and outputs ALUctr (3 bits)]
Generating ALU control

Instruction opcode  ALUOp  Instruction operation  Function code  Desired ALU action  ALU control input
lw                  00     load word              xxxxxx         add                 010
sw                  00     store word             xxxxxx         add                 010
beq                 01     branch eq              xxxxxx         subtract            110
R-type              10     add                    100000         add                 010
R-type              10     subtract               100010         subtract            110
R-type              10     AND                    100100         and                 000
R-type              10     OR                     100101         or                  001
R-type              10     slt                    101010         slt                 111

[Figure: ALU Control Logic]
Essentially a truth table, and we can design logic to do this.
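Before turning the table into gates, it can help to write it as a lookup. The sketch below mirrors the rows of the table; the names `FUNCT_TO_CTRL` and `alu_control` are hypothetical.

```python
# funct field -> ALU control input, used only when ALUOp = 10 (R-format).
FUNCT_TO_CTRL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # slt
}

def alu_control(alu_op, funct):
    """Map (ALUOp, function code) to the 3-bit ALU control input."""
    if alu_op == 0b00:        # lw, sw: add; funct is a don't-care
        return 0b010
    if alu_op == 0b01:        # beq: subtract; funct is a don't-care
        return 0b110
    if alu_op == 0b10:        # R-format: decided by funct
        return FUNCT_TO_CTRL[funct]
    raise ValueError("ALUOp 11 is not used in this subset")
```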
Generating individual ALU signals

ALUop  Function  ALUCtr signals
00     xxxx      010
01     xxxx      110
10     0000      010
10     0010      110
10     0100      000
10     0101      001
10     1010      111

ALUctr2 = Op0 + Op1·F1
ALUctr1 = !Op1 + !F2
ALUctr0 = Op1·!Op0·(F0+F3)

[Figure: Main Control takes op (6 bits) and outputs ALUop (2 bits); ALU Control takes ALUop (2 bits) and func (6 bits) and outputs ALUctr (3 bits)]
A: (Op1)(!Op0)(F0+F3)
B: !Op1+!F2
C: Op0+Op1F1

Select  ALUctr2  ALUctr1  ALUctr0
A       A        B        C
B       A        C        B
C       B        C        A
D       C        B        A
E       None of the above
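One way to check a candidate expression is to evaluate it on every row of the ALUop/Function table, expanding the don't-care entries. The sketch below does this for expressions A, B and C, reading the negated term in A as !Op0 (an assumption, since Op0 is the only other ALUop bit). All names here are hypothetical.

```python
# Table rows: (ALUop, low four function bits or None for don't-care, ALUctr).
ROWS = [
    (0b00, None,   0b010),
    (0b01, None,   0b110),
    (0b10, 0b0000, 0b010),
    (0b10, 0b0010, 0b110),
    (0b10, 0b0100, 0b000),
    (0b10, 0b0101, 0b001),
    (0b10, 0b1010, 0b111),
]

def bit(value, i):
    """Return bit i of value (0 or 1)."""
    return (value >> i) & 1

EXPRESSIONS = {
    "A": lambda op1, op0, f: op1 & (1 - op0) & (bit(f, 0) | bit(f, 3)),
    "B": lambda op1, op0, f: (1 - op1) | (1 - bit(f, 2)),
    "C": lambda op1, op0, f: op0 | (op1 & bit(f, 1)),
}

def matches(expr, ctr_bit):
    """True if expr reproduces bit `ctr_bit` of ALUctr on every table row."""
    for alu_op, funct, ctr in ROWS:
        functs = range(16) if funct is None else [funct]  # expand don't-cares
        for f in functs:
            if expr(bit(alu_op, 1), bit(alu_op, 0), f) != bit(ctr, ctr_bit):
                return False
    return True

# For each expression, list the ALUctr bits it reproduces.
for name, expr in EXPRESSIONS.items():
    print(name, [i for i in (2, 1, 0) if matches(expr, i)])
```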
add instruction control signals?

Select  RegDst  MemToReg  ALUOp
A       0       X         00
B       1       0         00
C       0       X         10
D       1       0         10
E       None of the above
ISOMORPHIC
sw instruction control signals?

Select  ALUSrc  RegDst  ALUOp
A       0       0       00
B       1       X       00
C       0       0       10
D       1       X       10
E       None of the above
ISOMORPHIC
beq Control

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format
lw
sw
beq

Ultimately we can generate the control signals for all insts.
Branches are a bit trickier – let’s do this together.
Control Truth Table

          R-format  lw      sw      beq
Opcode    000000    100011  101011  000100
Outputs:
RegDst    1         0       x       x
ALUSrc    0         1       1       0
MemtoReg  0         1       x       x
RegWrite  1         1       0       0
MemRead   0         1       0       0
MemWrite  0         0       1       0
Branch    0         0       0       1
ALUOp1    1         0       0       0
ALUOp0    0         0       0       1

Here’s a truth table – which means we can design the logic to implement it.
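The same truth table written as a lookup keyed on the opcode, as a sketch of what the main control block computes. The names `SIGNALS`, `CONTROL` and `main_control` are hypothetical; `None` stands in for a don't-care (x) entry.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# One row per opcode, in the column order given by SIGNALS.
CONTROL = {
    0b000000: (1,    0, 0,    1, 0, 0, 0, 1, 0),  # R-format
    0b100011: (0,    1, 1,    1, 1, 0, 0, 0, 0),  # lw
    0b101011: (None, 1, None, 0, 0, 1, 0, 0, 0),  # sw
    0b000100: (None, 0, None, 0, 0, 0, 1, 0, 1),  # beq
}

def main_control(opcode):
    """Return the control signals for one of the four supported opcodes."""
    return dict(zip(SIGNALS, CONTROL[opcode]))
```

For instance, `main_control(0b100011)` gives the lw settings, with ALUSrc, MemtoReg, RegWrite and MemRead all set to 1.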
Control
• Simple combinational logic (truth tables)
[Figure: ALU control block. Inputs: ALUOp (ALUOp1, ALUOp0) and F (5–0) (F3, F2, F1, F0). Outputs: Operation (Operation2, Operation1, Operation0).]
[Figure: main control logic. Inputs: Op5–Op0, decoded as R-format, lw, sw, beq. Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.]
Here’s the truth table
Which wire – if always ZERO – would break add?
[Figure: wires labeled A, B, C, D]
ISOMORPHIC
Which wire – if always ONE – would break lw?
[Figure: wires labeled A, B, C, D]
ISOMORPHIC
Add new instructions
• Potentially requires modifying the datapath
• Potentially requires adding more control wires – which would impact our previous truth table.

Do we need to modify our single-cycle design to do jr?
Select Best Answer
A Yes – we need both new control and datapath.
B Yes – we need just datapath.
C No – but we should for better performance.
D No – just changing control signals is fine.
E Single cycle can’t do jump register.
ISOMORPHIC
Single-Cycle CPU Summary
• Easy, particularly the control
• Which instruction takes the longest? By how much? Why is that a problem?
• ET = IC * CPI * CT
• What else can we do?
Single-Cycle CPU Summary
• Easy, particularly the control
• Which instruction takes the longest? By how much? Why is that a problem?
• ET = IC * CPI * CT
• What else can we do?
• When does a multi-cycle implementation make sense?
  – e.g., 70% of instructions take 75 ns, 30% take 200 ns?
  – suppose 20% overhead for extra latches
• Real machines have much more variable instruction latencies than this.
200 ns vs. (200*.3 + 75*.7)*1.2 = (60 + 52.5)*1.2 = 135 ns
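The comparison above can be reproduced with a short calculation. The function name is hypothetical; the instruction mix, latencies and 20% overhead are the ones from the example.

```python
def average_latency_ns(mix, overhead):
    """Weighted average instruction latency, scaled by (1 + overhead)."""
    return sum(fraction * latency for fraction, latency in mix) * (1 + overhead)

# Single cycle: every instruction pays for the slowest one.
single_cycle_ns = 200

# Each instruction takes only as long as it needs, plus 20% latch overhead.
multi_cycle_ns = average_latency_ns([(0.7, 75), (0.3, 200)], overhead=0.2)

print(single_cycle_ns, round(multi_cycle_ns, 1))
```

The gap shrinks quickly as the overhead grows or as the slow instructions become a larger share of the mix, so the answer depends on the workload.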