Designing a Single Cycle Datapath, or The Do-It-Yourself CPU Kit. Reading 4.4 – HW due Monday. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

SC and SC Control


Page 1: SC and SC Control

Designing a Single Cycle Datapath

or The Do-It-Yourself CPU Kit

Reading 4.4 – HW due Monday

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Page 2: SC and SC Control

The Big Picture: Where are We Now?

• The Five Classic Components of a Computer

• Today’s Topic: Datapath Design, then Control Design

[Figure: the five classic components. The Processor contains Control and Datapath; it connects to Memory, Input, and Output.]

Page 3: SC and SC Control

The Big Picture: The Performance Perspective

• Processor design (datapath and control) will determine:
  – Clock cycle time
  – Clock cycles per instruction

• Starting today:
  – Single cycle processor: execute an entire instruction in one clock cycle
      Advantage: one clock cycle per instruction
      Disadvantage: long cycle time

• ET = Insts * CPI * Cycle Time

Page 4: SC and SC Control

The Processor: Datapath & Control

• We're ready to look at an implementation of the MIPS simplified to contain only:
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq

• Generic Implementation:
  – use the program counter (PC) to supply instruction address
  – get the instruction from memory
  – read registers
  – use the instruction to decide exactly what to do

Let’s look at some regularity in our instructions

Page 5: SC and SC Control

Review: Two Types of Logic Components

State Element: inputs A, B, and clk; output C = f(A, B, state)

Combinational Logic: inputs A and B; output C = f(A, B)

Page 6: SC and SC Control

Clocking Methodology

• All storage elements are clocked by the same clock edge

[Figure: timing diagram. Clk is shown with a Setup window before and a Hold window after each clock edge; signal values outside those windows are Don’t Care.]

Consequently, our cycle time will be the sum of:
(a) The Clock-to-Q time of the input registers.
(b) The longest delay path through the combinational logic block.
(c) The setup time of the output register.
(d) And finally the clock skew.
In order to avoid a hold time violation, you have to make sure this inequality is fulfilled.
---- DRAW CT
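The four-term sum in the notes above can be sketched in a few lines of Python. The function name and the delay values are hypothetical, chosen only to illustrate the addition; they are not measurements from the lecture.

```python
def min_cycle_time(clk_to_q, longest_logic_delay, setup, skew):
    """Add the four terms (a)-(d) from the notes; all arguments share one time unit."""
    return clk_to_q + longest_logic_delay + setup + skew


# Hypothetical delays in nanoseconds, for illustration only.
example_cycle = min_cycle_time(clk_to_q=1, longest_logic_delay=10, setup=1, skew=1)
print(example_cycle)
```

Because the longest combinational path dominates the sum, the slowest instruction sets the clock for every instruction in a single-cycle design.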

Page 7: SC and SC Control

Which is correct about the ALU and memory in MIPS?

A. The ALU always performs an operation before accessing data memory
B. The ALU sometimes performs an operation before accessing data memory
C. Data memory is always accessed before performing an ALU operation
D. Data memory is sometimes accessed before performing an ALU operation
E. None of the above.

Isomorphic

Page 8: SC and SC Control

Which is correct about the ALU and the register file in MIPS?

A. The ALU always performs an operation before accessing the register file
B. The ALU sometimes performs an operation before accessing the register file
C. The register file is always accessed before performing an ALU operation
D. The register file is sometimes accessed before performing an ALU operation
E. None of the above.

Isomorphic

Page 9: SC and SC Control

So what does this tell us?

Draw the register file before ALU before memory

Page 10: SC and SC Control

Register Transfer Language (RTL)

• is a mechanism for describing the movement and manipulation of data between storage elements:

R[3] <- R[5] + R[7]
PC <- PC + 4 + R[5]
R[rd] <- R[rs] + R[rt]
R[rt] <- Mem[R[rs] + immed]

We’ll be using this from time to time. It’s just a shorthand for what is going on in hardware, and we’ll use it in a second.
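One way to read the RTL shorthand is as assignments over two storage arrays. The sketch below is an assumption-laden illustration: `R` as a Python list, `Mem` as a dictionary keyed by address, and the sample values are all hypothetical, not part of the lecture.

```python
R = [0] * 32   # hypothetical model of the register file contents
Mem = {}       # hypothetical sparse memory, keyed by address

# R[3] <- R[5] + R[7]
R[5], R[7] = 10, 32
R[3] = R[5] + R[7]

# R[rt] <- Mem[R[rs] + immed]
rs, rt, immed = 5, 8, 4
Mem[R[rs] + immed] = 99       # preload the word that the load will read
R[rt] = Mem[R[rs] + immed]
```

Each arrow reads the storage elements on its right-hand side and writes the one on its left, which is the same split between reading and writing state that the datapath has to implement.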

Page 11: SC and SC Control

Review: The MIPS Instruction Formats

• All MIPS instructions are 32 bits long. The three instruction formats:

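R-type:  op (bits 31–26, 6 bits) | rs (25–21, 5 bits) | rt (20–16, 5 bits) | rd (15–11, 5 bits) | shamt (10–6, 5 bits) | funct (5–0, 6 bits)

I-type:  op (bits 31–26, 6 bits) | rs (25–21, 5 bits) | rt (20–16, 5 bits) | immediate (15–0, 16 bits)

J-type:  op (bits 31–26, 6 bits) | target address (25–0, 26 bits)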

Before we start designing our processor – we need to know how the instructions look alike.

MIPS is simple – only 3 formats and they have some common features. Let’s look more closely at the few instructions we are focusing on today.
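Because the fields sit at fixed bit positions, extracting them is only shifting and masking. A minimal sketch follows; the function name `decode` and the dictionary layout are assumptions for illustration. It extracts every field of all three formats, and the opcode decides which ones are meaningful.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "rd": (word >> 11) & 0x1F,      # bits 15-11 (R-type only)
        "shamt": (word >> 6) & 0x1F,    # bits 10-6  (R-type only)
        "funct": word & 0x3F,           # bits 5-0   (R-type only)
        "immediate": word & 0xFFFF,     # bits 15-0  (I-type only)
        "target": word & 0x3FFFFFF,     # bits 25-0  (J-type only)
    }


add_fields = decode(0x012A4020)  # encoding of: add $t0, $t1, $t2
lw_fields = decode(0x8D280004)   # encoding of: lw $t0, 4($t1)
```

The op, rs, and rt fields are in the same place in R-type and I-type, which is the regularity the datapath exploits: the register file can be read before the instruction type is known.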

Page 12: SC and SC Control

The MIPS Subset

• R-type
  – add rd, rs, rt
  – sub, and, or, slt
  Format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
  RTL:
    PC = PC + 4
    R[rd] = R[rs] OP R[rt]

• LOAD and STORE
  – lw rt, rs, imm16
  – sw rt, rs, imm16
  Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
  RTL:
    PC = PC + 4
    R[rt] = Mem[R[rs] + SE(imm)]    (lw)
    OR
    Mem[R[rs] + SE(imm)] = R[rt]    (sw)

• BRANCH
  – beq rs, rt, imm16
  Format: op (6 bits) | rs (5 bits) | rt (5 bits) | displacement (16 bits)
  RTL:
    ZERO = (R[rs] – R[rt] == 0)
    if (ZERO) PC = PC + 4 + (SE(imm) << 2)
    else PC = PC + 4
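The beq RTL combines a sign extension, a shift, and a conditional PC update. Here is a minimal Python sketch of those three steps; the function names are hypothetical, and wrapping the PC to 32 bits is an assumption made so the result stays a valid address.

```python
def sign_extend16(imm):
    """SE(imm): interpret the low 16 bits as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc_beq(pc, rs_val, rt_val, imm):
    """Next PC for beq, given the two register values and the 16-bit immediate."""
    zero = (rs_val - rt_val) == 0
    if zero:
        # Taken: the offset counts words, so shift left by 2 to get bytes.
        return (pc + 4 + (sign_extend16(imm) << 2)) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF
```

For example, `next_pc_beq(0x100, 1, 1, 3)` takes the branch and moves three words past PC + 4, while an immediate of 0xFFFF (that is, -1) branches back to the beq itself.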

BEFORE GOING ON… quick reminder…

Page 13: SC and SC Control

Storage Element: Register

• Register
  – Similar to the D Flip Flop except
      N-bit input and output
      Write Enable input
  – Write Enable:
      0: Data Out will not change
      1: Data Out will become Data In (on the clock edge)

[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk.]

Page 14: SC and SC Control

Which of these describes our register file?

A. Two 32-bit outputs, 3 5-bit inputs, clk input, 1-bit control input

B. Two 32-bit outputs, 3 32-bit inputs, clk input, 1-bit control input

C. Two 32-bit outputs, 2 5-bit inputs, 1 32-bit input, clk input, 1-bit control input

D. Two 32-bit outputs, 2 32-bit inputs, 1 32-bit input, clk input, 1-bit control input

E. None of the above

Page 15: SC and SC Control

Register File

[Figure: register file of 32 32-bit registers. Inputs: RR1, RR2, and WR (5 bits each), Write Data (32 bits), RegWrite, and Clk. Outputs: Read Data 1 and Read Data 2 (32 bits each).]
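A behavioral sketch of this register file can help when building the lab version. The class and method names below are hypothetical. Reads are modeled as combinational and writes as happening only on a clock edge when RegWrite is set; the special behavior of MIPS register 0 is deliberately not modeled.

```python
class RegisterFile:
    """Hypothetical model: 32 registers of 32 bits, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rr1, rr2):
        """Combinational read: returns (Read Data 1, Read Data 2)."""
        return self.regs[rr1 & 0x1F], self.regs[rr2 & 0x1F]

    def clock_edge(self, wr, write_data, reg_write):
        """State changes only here, and only when RegWrite is asserted."""
        if reg_write:
            self.regs[wr & 0x1F] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(wr=8, write_data=42, reg_write=1)
rf.clock_edge(wr=9, write_data=7, reg_write=0)   # ignored: RegWrite is 0
```

The 5-bit register numbers select among 32 registers, while the data ports are a full 32 bits wide.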

Page 16: SC and SC Control

Which of these describes our memory (for now)?

A. One 32-bit output, 1 5-bit input, 1 32-bit input, clk input, 1-bit control input, 1 bit control input

B. One 32-bit output, 2 5-bit inputs, clk input, 1-bit control input, 1 bit control input

C. One 32-bit output, 2 32-bit inputs, clk input, 2 1-bit control inputs

D. One 32-bit output, 1 32-bit input, clk input, 2 1-bit control inputs

E. None of the above

Page 17: SC and SC Control

Memory

[Figure: memory. Inputs: Address (32 bits), Write Data (32 bits), MemRead, MemWrite, and Clk. Output: Read Data (32 bits).]

Page 18: SC and SC Control

Can we lay out a high-level design to do this?

Draw as much as you can implementing one instruction at a time – get the students involved

You’ll want to do something like this for your lab

Page 19: SC and SC Control

Putting it All Together: A Single Cycle Datapath

• We have everything except control signals (later)

Page 20: SC and SC Control

Ignoring control, which instruction does this active datapath represent?

A. R-type
B. lw
C. sw
D. beq
E. None of the above

Active Single-Cycle Datapath

Page 21: SC and SC Control

Ignoring control, which instruction does this active datapath represent?

A. R-type
B. lw
C. sw
D. beq
E. None of the above

Active Single-Cycle Datapath

Page 22: SC and SC Control

Ignoring control, which instruction does this active datapath represent?

A. R-type
B. lw
C. sw
D. beq
E. None of the above

Active Single-Cycle Datapath

Page 23: SC and SC Control

Ignoring control, which instruction does this active datapath represent?

A. R-type
B. lw
C. sw
D. beq
E. None of the above

Active Single-Cycle Datapath

Page 24: SC and SC Control

Key Points

• CPU is just a collection of state and combinational logic
• We just designed a very rich processor, at least in terms of functionality
• ET = IC * CPI * Cycle Time
  – where does the single-cycle machine fit in?

Page 25: SC and SC Control

Control Logic for the Single-Cycle CPU

or Who’s in charge here?

Page 26: SC and SC Control

Putting it All Together: A Single Cycle Datapath

• We have everything except control signals

We’re going to connect up all these signals to a central place, and control them from there, based on opcode/funct.

Page 27: SC and SC Control

Okay, then, what about those Control Signals?

Point out we’ve just hooked these up.

Page 28: SC and SC Control

Peer instruction question asking if decode can happen in parallel with register read.

Selection

Select the true statement for MIPS

A Registers can be read in parallel with control signal generation
B Instruction Read can be done in parallel with control signal generation
C Registers can be written in parallel with control signal generation
D The main ALU can execute in parallel with control signal generation
E None of the above

Page 29: SC and SC Control

Okay, then, what about those Control Signals?

Start here

Notice control bits come from opcode and sometimes function code bits. R-type are the same except for the ALU

Page 30: SC and SC Control

ALU control bits

• Recall: 5-function ALU

ALU control input   Function   Operations
000                 And        and
001                 Or         or
010                 Add        add, lw, sw
110                 Subtract   sub, beq
111                 Slt        slt

Take your time here, this isn’t obvious. These are the 3 bit input signals which cause the processor to do what you want.

Page 31: SC and SC Control

Full ALU

sign bit (adder output from bit 31)

What signals accomplish and? or? add? sub? beq? slt?

Function   Binvert   CIn   Operation
And        0         0     0
Or         0         0     1
Add        0         0     2
Sub        1         1     2
Beq        1         1     2
Slt        1         1     3

Consolidate to 3 wires since Binvert and CIn are always the same
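The consolidation to three wires can be sketched behaviorally: the top bit of the control input drives both Binvert and CIn, and the low two bits select the output. The function below is a hypothetical model that assumes 32-bit unsigned operands; like the slide, it takes slt from the adder's sign bit and ignores overflow.

```python
MASK = 0xFFFFFFFF


def alu(ctrl, a, b):
    """Return (result, zero) for a 3-bit ALU control input."""
    binvert = (ctrl >> 2) & 1           # also used as the carry-in (CIn)
    operation = ctrl & 0b11             # selects and / or / adder / slt
    b_in = (~b & MASK) if binvert else b
    total = (a + b_in + binvert) & MASK  # a + b, or a - b when binvert is 1
    if operation == 0:
        result = a & b_in
    elif operation == 1:
        result = a | b_in
    elif operation == 2:
        result = total
    else:
        result = (total >> 31) & 1      # sign bit of a - b
    return result, result == 0
```

For instance, `alu(0b110, 5, 5)` subtracts and raises the zero output, which is the signal beq needs.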

Page 32: SC and SC Control

ALU control bits

• Recall: 5-function ALU

ALU control input   Function   Operations
000                 And        and
001                 Or         or
010                 Add        add, lw, sw
110                 Subtract   sub, beq
111                 Slt        slt

• Based on the opcode (bits 31–26) and function code (bits 5–0) from the instruction
• ALU doesn’t need to know all opcodes; we will summarize the opcode with ALUOp (2 bits):
    00 - lw, sw
    01 - beq
    10 - R-format

[Figure: Main Control takes op (6 bits) and produces ALUop (2 bits). ALU Control takes ALUop and func (6 bits) and produces ALUctr (3 bits).]

Page 33: SC and SC Control

Generating ALU control

Instruction opcode   ALUOp   Instruction operation   Function code   Desired ALU action   ALU control input
lw                   00      load word               xxxxxx          add                  010
sw                   00      store word              xxxxxx          add                  010
beq                  01      branch eq               xxxxxx          subtract             110
R-type               10      add                     100000          add                  010
R-type               10      subtract                100010          subtract             110
R-type               10      AND                     100100          and                  000
R-type               10      OR                      100101          or                   001
R-type               10      slt                     101010          slt                  111

[Figure: ALU Control Logic block.]

Essentially a truth table, and we can design logic to do this.
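Since this is a truth table, a lookup is enough to model it before designing gates. The sketch below is one hypothetical way to write it; the names `FUNCT_TO_CTRL` and `alu_control` are not from the lecture, and only the five function codes in the table are handled.

```python
# Function code -> ALU control input, for R-type instructions (ALUOp = 10).
FUNCT_TO_CTRL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # slt
}


def alu_control(alu_op, funct):
    """Map (ALUOp, function code) to the 3-bit ALU control input."""
    if alu_op == 0b00:        # lw, sw: always add, function code is don't-care
        return 0b010
    if alu_op == 0b01:        # beq: always subtract
        return 0b110
    if alu_op == 0b10:        # R-type: decided by the function code
        return FUNCT_TO_CTRL[funct]
    raise ValueError("ALUOp value not used in this subset")
```

The function code only matters when ALUOp is 10, which is why the main control never has to look at it.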

Page 34: SC and SC Control

Generating individual ALU signals

ALUop   Function   ALUCtr signals
00      xxxx       010
01      xxxx       110
10      0000       010
10      0010       110
10      0100       000
10      0101       001
10      1010       111

[Figure: Main Control takes op (6 bits) and produces ALUop (2 bits). ALU Control takes ALUop and func (6 bits) and produces ALUctr (3 bits).]

ALUctr2 = Op0 + Op1·F1
ALUctr1 = !Op1 + !F2
ALUctr0 = Op1·!Op0·(F0 + F3)

Candidate expressions:
A: (Op1)(!Op0)(F0 + F3)
B: !Op1 + !F2
C: Op0 + Op1·F1

Select   ALUctr2   ALUctr1   ALUctr0
A        A         B         C
B        A         C         B
C        B         C         A
D        C         B         A
E        None of the above
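A quick way to check a set of candidate equations is to evaluate them against every row of the table. The sketch below uses the equations as read from this slide, with the lost overbars restored as negations (ALUctr2 = Op0 + Op1·F1, ALUctr1 = !Op1 + !F2, ALUctr0 = Op1·!Op0·(F0 + F3)). That reading, and all the names in the code, are assumptions.

```python
from itertools import product

# (ALUop, low four function-code bits, expected ALUCtr), copied from the table.
ROWS = [
    ("00", "xxxx", "010"),
    ("01", "xxxx", "110"),
    ("10", "0000", "010"),
    ("10", "0010", "110"),
    ("10", "0100", "000"),
    ("10", "0101", "001"),
    ("10", "1010", "111"),
]


def alu_ctr(op1, op0, f3, f2, f1, f0):
    """Evaluate the three sum-of-products equations on single-bit inputs."""
    ctr2 = op0 | (op1 & f1)
    ctr1 = (1 - op1) | (1 - f2)
    ctr0 = op1 & (1 - op0) & (f0 | f3)
    return (ctr2 << 2) | (ctr1 << 1) | ctr0


def matches_table():
    """True if the equations agree with every row, expanding don't-cares."""
    for alu_op, funct, expected in ROWS:
        op1, op0 = int(alu_op[0]), int(alu_op[1])
        if funct == "xxxx":
            codes = list(product((0, 1), repeat=4))
        else:
            codes = [tuple(int(bit) for bit in funct)]
        for f3, f2, f1, f0 in codes:
            if alu_ctr(op1, op0, f3, f2, f1, f0) != int(expected, 2):
                return False
    return True
```

The don't-care rows are expanded to all sixteen function-code values, so the check also confirms that lw, sw, and beq ignore the function bits.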

Page 35: SC and SC Control

add instruction control signals?

Select   RegDst   MemToReg   ALUOp
A        0        X          00
B        1        0          00
C        0        X          10
D        1        0          10
E        None of the above

ISOMORPHIC

Page 36: SC and SC Control

sw instruction control signals?

Select   ALUSrc   RegDst   ALUOp
A        0        0        00
B        1        X        00
C        0        0        10
D        1        X        10
E        None of the above

ISOMORPHIC

Page 37: SC and SC Control

beq Control

Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
R-format
lw
sw
beq

Ultimately we can generate the control signals for all instructions.

Branches are a bit trickier – let’s do this together.

Page 38: SC and SC Control

Control Truth Table

            R-format   lw       sw       beq
Opcode      000000     100011   101011   000100

Outputs:
RegDst      1          0        x        x
ALUSrc      0          1        1        0
MemtoReg    0          1        x        x
RegWrite    1          1        0        0
MemRead     0          1        0        0
MemWrite    0          0        1        0
Branch      0          0        0        1
ALUOp1      1          0        0        0
ALUOp0      0          0        0        1

Here’s a truth table, which means we can design the logic to implement it.
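Before turning the truth table into gates, it can be modeled as a lookup keyed by opcode. The sketch below is a hypothetical encoding: `None` stands for a don't-care (x), and ALUOp1/ALUOp0 are packed into one two-bit `ALUOp` value.

```python
# Opcode -> control outputs, one entry per column of the truth table.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,      # R-format
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,      # lw
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,  # sw
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,  # beq
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(opcode):
    """Return the control signals for one of the four supported opcodes."""
    return CONTROL[opcode]
```

The don't-cares appear exactly where RegWrite is 0: if no register is written, it does not matter which destination or which write-data source is selected.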

Page 39: SC and SC Control

Control

• Simple combinational logic (truth tables)

[Figure: ALU control block. Inputs: ALUOp (ALUOp1, ALUOp0) and F (5–0) (F3, F2, F1, F0). Outputs: Operation (Operation2, Operation1, Operation0).]

[Figure: main control logic. Inputs: Op5 through Op0, decoded into R-format, lw, sw, and beq. Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.]

Here’s the truth table

Page 40: SC and SC Control

Which wire – if always ZERO – would break add?

[Figure: single-cycle datapath with four wires labeled A, B, C, and D.]

ISOMORPHIC

Page 41: SC and SC Control

Which wire – if always ONE – would break lw?

[Figure: single-cycle datapath with four wires labeled A, B, C, and D.]

ISOMORPHIC

Page 42: SC and SC Control

Add new instructions

• Potentially requires modifying the datapath
• Potentially requires adding more control wires, which would impact our previous truth table.

Page 43: SC and SC Control

Select Best Answer

Do we need to modify our single-cycle design to do jr?

A Yes – we need both new control and datapath.
B Yes – we need just datapath.
C No – but we should for better performance.
D No – just changing control signals is fine.
E Single cycle can’t do jump register.

ISOMORPHIC

Page 44: SC and SC Control

Single-Cycle CPU Summary

• Easy, particularly the control
• Which instruction takes the longest? By how much? Why is that a problem?
• ET = IC * CPI * CT
• What else can we do?

Page 45: SC and SC Control

Single-Cycle CPU Summary

• Easy, particularly the control
• Which instruction takes the longest? By how much? Why is that a problem?
• ET = IC * CPI * CT
• What else can we do?
• When does a multi-cycle implementation make sense?
  – e.g., 70% of instructions take 75 ns, 30% take 200 ns?
  – suppose 20% overhead for extra latches
• Real machines have much more variable instruction latencies than this.

200 ns vs. (200 * 0.3 + 75 * 0.7) * 1.2 = (60 + 52.5) * 1.2 = 135 ns
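The comparison above is a weighted average with an overhead factor, which is easy to recompute for other instruction mixes. A minimal sketch; the function name and argument layout are assumptions, and the numbers are the ones from the slide.

```python
def average_latency(mix, overhead):
    """Weighted average instruction time, inflated by a fractional overhead.

    mix is a list of (fraction of instructions, latency in ns) pairs.
    """
    return sum(fraction * latency for fraction, latency in mix) * (1 + overhead)


# Single cycle: every instruction pays for the slowest one.
single_cycle_ns = 200

# Multi-cycle: each instruction takes only the time it needs, plus 20% for latches.
multi_cycle_ns = average_latency([(0.7, 75), (0.3, 200)], overhead=0.20)

print(single_cycle_ns, round(multi_cycle_ns, 1))
```

With this mix the multi-cycle average comes to about 135 ns against 200 ns for single cycle; as the slow instructions become a larger share of the mix, the average rises until the latch overhead outweighs the gain.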