Computer Organization CS224 Chapter 4 Part a The Processor Spring 2011 With thanks to M.J. Irwin, T. Fountain, D. Patterson, and J. Hennessy for some lecture slide contents


Computer Organization CS224

Chapter 4 Part a The Processor

Spring 2011

With thanks to M.J. Irwin, T. Fountain, D. Patterson, and J. Hennessy for some lecture slide contents


The Big Picture

• The Five Classic Components of a Computer
– Control and Datapath (together, the Processor)
– Memory
– Input
– Output

• Today’s Topic: Design a Single Cycle Processor


The Performance Perspective

• Performance of a machine is determined by:
– Instruction count
– Clock cycle time
– Clock cycles per instruction (CPI)

• Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction

• Today: Single cycle processor
– Advantage: One clock cycle per instruction
– Disadvantage: Long cycle time

[Figure: triangle relating CPI, Inst. Count, and Cycle Time]
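The three factors above multiply to give execution time. The sketch below is only an illustration: the function name, the instruction count, and the cycle times are hypothetical, and the multi-cycle figures are invented purely for comparison.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical workload of one million instructions.
# Single cycle design: CPI is 1, but the cycle must fit the slowest instruction.
single_cycle = cpu_time_ns(1_000_000, cpi=1, cycle_time_ns=8.0)
# Hypothetical alternative: shorter cycle, more cycles per instruction.
multi_cycle = cpu_time_ns(1_000_000, cpi=4, cycle_time_ns=2.5)

print(single_cycle, multi_cycle)
```

A CPI of 1 does not by itself make the design fast; the long cycle time is paid on every instruction.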


Big Picture: Processor Implementation

• Key Ideas
– Concept of datapath and control
– Where the instruction and data bits go
– Modern hardware organization
  • Clocking, combinational, and sequential logic using computer organization as an example
– Handling complexity
  • Abstraction, use commonality, multilevel interpretation

• Approach
– Start with a simple implementation and iteratively improve it


Processor Design Steps

1. Analyze instruction set => datapath requirements
– the meaning of each instruction is given by the register transfers (ISA model => RTL model)
– datapath must include storage elements for ISA registers
  • possibly more
– datapath must support each register transfer

2. Select set of datapath components and establish clocking methodology

3. Assemble datapath meeting the RTL requirements


Processor Design (cont’d)

4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer

5. Assemble the control logic

6. RTL datapath and control design are refined to track physical design and functional validation
– Changes made for timing and errata (a.k.a. “bug”) fixes
– Amount of work varies with capabilities of CAD tools and degree of optimization for cost/performance


Subset of Instructions

• To simplify our study of processor design, we will focus on a subset of the MIPS instructions
– Memory: lw and sw
– Arithmetic: add, sub, and, ori, and slt
– Branch: beq and j

• The example in lecture uses ori rather than the or covered in the text, to demonstrate one more category of instructions

• The method of implementing other instructions should come naturally from these


MIPS Format Review

• R-Format
– add rd, rs, rt
– sub rd, rs, rt

  Field   Bits   Meaning
  OP=0    6      opcode (0 for R-format)
  rs      5      first source register
  rt      5      second source register
  rd      5      result register
  sa      5      shift amount
  funct   6      function code
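As a rough illustration of the R-format layout, the sketch below packs the six fields into a 32-bit word. The function name `encode_r_type` is hypothetical; the field widths and the funct value 32 for add are taken from the slides.

```python
def encode_r_type(rs, rt, rd, funct, sa=0, op=0):
    """Pack R-format fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    fields = ((op, 6), (rs, 5), (rt, 5), (rd, 5), (sa, 5), (funct, 6))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        # Shift earlier fields left and append the next field on the right.
        word = (word << width) | value
    return word

# add $3, $1, $2  ->  rd = 3, rs = 1, rt = 2, funct = 32
word = encode_r_type(rs=1, rt=2, rd=3, funct=32)
print(f"{word:#010x}")
```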


MIPS Format Review (cont)

• I-Format
– lw rt, rs, imm
– sw rt, rs, imm
– beq rs, rt, imm
– ori rt, rs, imm

• Reminders
– Branch uses PC Relative addressing (PC + 4 + 4 × imm)

  Field   Bits   Meaning
  OP      6      opcode
  rs      5      first source register
  rt      5      second source register
  imm     16     immediate


MIPS Format Review (cont)

• J-Format
– j target

• Reminders
– Uses pseudodirect addressing (target × 4) to allow addressing 2^28 bytes directly
– Uses top 4 bits from PC

  Field    Bits   Meaning
  OP       6      opcode
  target   26     jump target address


What Happens?

• It’s hard to see how we should go about organizing the processor
• To start thinking about it, look at what happens on each instruction
– The instruction specified by the PC is fetched from memory
– One or two registers are read (lw vs. add for instance)
– The ALU must be used to add, subtract, etc.
– The results are stored (to memory or a register)


Execution Cycle

• Instruction Fetch: Obtain instruction from program storage
• Instruction Decode: Determine required actions and instruction size
• Operand Fetch: Locate and obtain operand data
• Execute: Compute result value or status
• Result Store: Deposit results in storage for later use
• Next Instruction: Determine successor instruction


Implementation Overview

• Data flows through memory and functional units

[Figure: abstract datapath with PC, Instruction memory (Address, Instruction), Registers (Register #, Data), ALU, and Data memory (Address, Data)]


Some Logic Design…

• Two important definitions
– Combinational – output is dependent only on current inputs
  • Example: ALU
– Sequential – element contains state information
  • Example: Registers


1 bit ALU

• Using a MUX we can combine the AND, OR, and adder operations into a single ALU

[Figure: 1-bit ALU with inputs A, B, Cin, and ALUOp; a 1-bit Full Adder and a Mux produce Result and Cout]


4 bit ALU

[Figure: four 1-bit ALUs chained together; stage i takes Ai, Bi, and CIni and produces Resulti and COuti, all stages share ALUop, and COut3 is the final carry out. Symbol: 4-bit inputs A and B, 3-bit ALUop]
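The 1-bit and 4-bit ALU slides can be modelled in a few lines. This is a minimal sketch: the names `alu_1bit` and `alu_4bit` and the numeric ALUOp codes (0 = AND, 1 = OR, 2 = ADD) are assumptions made for illustration, not the encoding used in the slides, and the carry is assumed to ripple from each stage into the next.

```python
AND, OR, ADD = 0, 1, 2  # hypothetical MUX select codes

def alu_1bit(a, b, cin, aluop):
    """One ALU slice: compute AND, OR, and full-adder sum, then MUX one out."""
    and_out = a & b
    or_out = a | b
    sum_out = a ^ b ^ cin                    # full adder sum
    cout = (a & b) | (cin & (a ^ b))         # full adder carry out
    result = (and_out, or_out, sum_out)[aluop]
    return result, cout

def alu_4bit(a, b, aluop, cin=0):
    """Chain four slices; each carry out feeds the next carry in."""
    result, carry = 0, cin
    for i in range(4):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, aluop)
        result |= bit << i
    return result, carry

print(alu_4bit(0b0101, 0b0011, ADD))
```

Every slice computes all three operations in parallel; the MUX only decides which one reaches the Result output.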


Combinational Elements

• Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry
• MUX: 32-bit inputs A and B plus Select; output 32-bit Y
• ALU: 32-bit inputs A and B plus OP; outputs 32-bit Result and Zero


D Latches

• Modified SR Latch
• Latches value when C is asserted

[Figure: D latch with inputs C and D and outputs Q and its complement]


D Flip Flops

• Uses Master/Slave D Latches

[Figure: two D latches in series, both controlled by CLK; input D, outputs Q and its complement]


Storage Element: Register

• Register
– Similar to D Flip Flop
  • N bit input and output
  • Write Enable input
– Write Enable
  • 0: Data Out will not change
  • 1: Data Out will become Data In
– Data changes only on falling edge!

[Figure: N-bit register with inputs Clk, Data In (N bits), and Write Enable, and output Data Out (N bits)]


Storage Element: Reg File

• Register File consists of 32 registers
– Two 32 bit output busses
  • busA and busB
– One 32 bit input bus
  • busW
– Register 0 hard wired to value 0
– Register selected by
  • RA selects register to put on busA
  • RB selects register to put on busB
  • RW selects register to be written via busW when Write Enable is 1
– Clock input (CLK)
  • CLK input is a factor only for write operation
  • During read, behaves as combinational logic block
    – RA or RB stable => busA or busB valid after “access time”
    – Minor simplification of reality

[Figure: block of 32 32-bit Registers with inputs Clk, busW (32 bits), Write Enable, and 5-bit selects RW, RA, RB; outputs busA and busB (32 bits each)]
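The register file behaviour described above can be sketched as a small class. The class and method names are hypothetical; reads are modelled as combinational (no clock involved) and writes happen only when `clock_edge` is called with Write Enable set.

```python
class RegisterFile:
    """Behavioural model: 32 registers of 32 bits, register 0 wired to 0."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        # Combinational read: busA and busB follow RA and RB directly.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        # State changes only on a clock edge, and only if Write Enable is 1.
        if write_enable and rw != 0:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, write_enable=1)
rf.clock_edge(rw=6, bus_w=99, write_enable=0)   # ignored: Write Enable is 0
rf.clock_edge(rw=0, bus_w=7, write_enable=1)    # ignored: register 0 stays 0
print(rf.read(5, 6))
```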


Storage Element: Memory

• Memory
– One input bus: Data In
– One output bus: Data Out
– Address selection
  • Address selects the word to put on Data Out
  • To write to address, set Write Enable to 1
– Clock input (CLK)
  • CLK input is a factor only for write operation
  • During read, behaves as combinational logic block
    – Valid Address => Data Out valid after “access time”
    – Minor simplification of reality

[Figure: memory block with inputs Clk, Data In (32 bits), Write Enable, and Address, and output Data Out (32 bits)]


Some Logic Design…

• All storage elements have same clock
– Edge-triggered clocking
– “Instantaneous” state change (simplification!)
– Timing always works if the clock is slow enough

Cycle Time = Clk-to-Q + Longest Delay + Setup + Clock Skew

[Figure: clock waveform (Clk) with Setup and Hold windows around each clock edge and Don’t Care regions in between]
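To make the cycle time formula concrete, the sketch below plugs in made-up delays. Every number and name here is hypothetical (the slides give no delay values); the only thing taken from the slides is the formula itself, with "Longest Delay" read as the slowest combinational path among the instructions.

```python
# Hypothetical unit delays in picoseconds.
IMEM, REG_READ, ALU, DMEM = 200, 100, 200, 200

# Combinational path of a few instructions through the datapath.
paths_ps = {
    "add": IMEM + REG_READ + ALU,
    "lw":  IMEM + REG_READ + ALU + DMEM,
    "beq": IMEM + REG_READ + ALU,
}

def cycle_time_ps(clk_to_q, longest_delay, setup, clock_skew):
    """Cycle Time = Clk-to-Q + Longest Delay + Setup + Clock Skew."""
    return clk_to_q + longest_delay + setup + clock_skew

longest = max(paths_ps.values())
cycle = cycle_time_ps(clk_to_q=50, longest_delay=longest, setup=40, clock_skew=10)
print(longest, cycle)
```

With these assumed delays the load path sets the clock period, so every other instruction runs slower than it needs to.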


Instruction Fetch (I.F.) RTL

• Common RTL operations
– Fetch instruction
    Mem[PC];          Fetch instruction from memory
– Update program counter
  • Sequential
    PC <- PC + 4;     Calculate next address


Datapath: I.F. Unit

[Figure: PC (clocked by Clk) supplies the Address of the Instruction Memory, which outputs the 32-bit Instruction Word; an Adder adds 4 to the PC]


Add RTL

• Add instruction
    add rd, rs, rt

    Mem[PC];                  Fetch instruction from memory
    R[rd] <- R[rs] + R[rt];   Add operation
    PC <- PC + 4;             Calculate next address

  Field   Bits   Meaning
  OP=0    6      opcode (0 for R-format)
  rs      5      first source register
  rt      5      second source register
  rd      5      result register
  sa      5      shift amount
  funct   6      function code (=32)


Sub RTL

• Sub instruction
    sub rd, rs, rt

    Mem[PC];                  Fetch instruction from memory
    R[rd] <- R[rs] - R[rt];   Sub operation
    PC <- PC + 4;             Calculate next address

  Field   Bits   Meaning
  OP=0    6      opcode (0 for R-format)
  rs      5      first source register
  rt      5      second source register
  rd      5      result register
  sa      5      shift amount
  funct   6      function code (=34)
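The add and sub register transfers can be traced with a short behavioural sketch. The function name `execute_r_type` is hypothetical, results simply wrap to 32 bits, and overflow handling is left out as a simplifying assumption.

```python
MASK32 = 0xFFFFFFFF
FUNCT_ADD, FUNCT_SUB = 32, 34   # function codes from the slides

def execute_r_type(regs, pc, rs, rt, rd, funct):
    """Apply R[rd] <- R[rs] op R[rt] and return the next PC (PC + 4)."""
    if funct == FUNCT_ADD:
        result = regs[rs] + regs[rt]
    elif funct == FUNCT_SUB:
        result = regs[rs] - regs[rt]
    else:
        raise ValueError(f"unsupported funct {funct}")
    if rd != 0:                       # register 0 is hard wired to 0
        regs[rd] = result & MASK32    # keep the result to 32 bits
    return (pc + 4) & MASK32

regs = [0] * 32
regs[1], regs[2] = 7, 10
pc = execute_r_type(regs, 0x400000, rs=1, rt=2, rd=3, funct=FUNCT_ADD)
pc = execute_r_type(regs, pc, rs=1, rt=2, rd=4, funct=FUNCT_SUB)
print(regs[3], hex(regs[4]), hex(pc))
```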


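Datapath: Reg/Reg Ops

• R[rd] <- R[rs] op R[rt];
– ALU control and RegWr based on decoded instruction
– Ra, Rb, and Rw from rs, rt, rd fields

[Figure: Instruction fields Rs, Rt, and Rd drive the 5-bit selects Ra, Rb, and Rw of the 32 32-bit Registers (Clk, RegWr, busW); busA and busB (32 bits each) feed the ALU, which is steered by ALU control and produces the 32-bit Result]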


OR Immediate RTL

• OR Immediate instruction
    ori rt, rs, imm

    Mem[PC];                           Fetch instruction from memory
    R[rt] <- R[rs] OR ZeroExt(imm);    OR operation with Zero-Extend
    PC <- PC + 4;                      Calculate next address

  Field   Bits   Meaning
  OP      6      opcode
  rs      5      first source register
  rt      5      second register (dest)
  imm     16     immediate


Datapath: Immediate Ops

• Rw set by MUX, and ALU input B set to busB or ZeroExt(imm)
• ALUSrc and RegDst set based on instruction

[Figure: register/register datapath extended with a Mux controlled by RegDst that selects Rd or Rt as Rw, a Zero Ext unit that widens imm16 from 16 to 32 bits, and a MUX controlled by ALUSrc that selects busB or the extended immediate as the ALU B input]


Load RTL

• Load instruction
    lw rt, rs, imm

    Mem[PC];                        Fetch instruction from memory
    Addr <- R[rs] + SignExt(imm);   Compute memory addr
    R[rt] <- Mem[Addr];             Load data into register
    PC <- PC + 4;                   Calculate next address

  Field   Bits   Meaning
  OP      6      opcode
  rs      5      first source register
  rt      5      second register (dest)
  imm     16     immediate
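ori zero-extends its immediate while lw sign-extends it, which matters as soon as bit 15 of the immediate is set. A minimal sketch, with hypothetical helper names, of the two extensions and of the load address computation:

```python
MASK32 = 0xFFFFFFFF

def zero_ext16(imm):
    """ZeroExt: upper 16 bits become 0."""
    return imm & 0xFFFF

def sign_ext16(imm):
    """SignExt: upper 16 bits copy bit 15 of the immediate."""
    imm &= 0xFFFF
    if imm & 0x8000:
        return imm | 0xFFFF0000
    return imm

def lw_address(rs_value, imm):
    """Addr <- R[rs] + SignExt(imm), wrapped to 32 bits."""
    return (rs_value + sign_ext16(imm)) & MASK32

# 0xFFFC is -4 when sign-extended, but 65532 when zero-extended.
print(hex(zero_ext16(0xFFFC)), hex(sign_ext16(0xFFFC)), hex(lw_address(0x1000, 0xFFFC)))
```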


Datapath: Load

• Extender handles sign vs. zero extension of immediate
• MUX selects between ALU result and Memory output

[Figure: immediate datapath with an Extender controlled by ExtOp, a Data Memory (Clk, Adr, Data In, WrEn driven by MemWr), and a MUX controlled by MemtoReg that selects the ALU result or the memory output for busW; other signals shown: RegDst, RegWr, ALUSrc, ALUctr]


Store RTL

• Store instruction
    sw rt, rs, imm

    Mem[PC];                        Fetch instruction from memory
    Addr <- R[rs] + SignExt(imm);   Compute memory addr
    Mem[Addr] <- R[rt];             Store register data into memory
    PC <- PC + 4;                   Calculate next address

  Field   Bits   Meaning
  OP      6      opcode
  rs      5      first source register
  rt      5      second source register
  imm     16     immediate


Datapath: Store

• Register rt is passed on busB into memory
• Memory address calculated just as in lw case

[Figure: load datapath with busB also wired to Data In of the Data Memory; signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg]


Branch RTL

• Branch instruction
    beq rs, rt, imm

    Mem[PC];                              Fetch instruction from memory
    Cond <- R[rs] - R[rt];                Calculate branch condition
    if (Cond eq 0)                        Test if equal
        PC <- PC + 4 + SignExt(imm)*4;    Calculate PC Relative address
    else
        PC <- PC + 4;                     Calculate next address

  Field   Bits   Meaning
  OP      6      opcode
  rs      5      first source
  rt      5      second source
  imm     16     immediate


Datapath: Branch

[Figure: datapath with the ALU Zero output, the Branch signal, and imm16 feeding the Next Address Logic, which updates the PC (Clk) that addresses the Instruction Memory; other signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUctr. More detail to come.]


The Next Address

• PC is byte-addressed in instruction memory
– Sequential
    PC[31:0] = PC[31:0] + 4
– Branch operation
    PC[31:0] = PC[31:0] + 4 + SignExt(imm) × 4

• Instruction Addresses
– PC is byte addressed, but instructions are 4 bytes long
– Therefore 2 LSBs of the 32 bit PC are always 0
– No reason to have hardware keep the 2 LSBs => simplify hardware by using 30 bit PC
  • Sequential
      PC[31:2] = PC[31:2] + 1
  • Branch operation
      PC[31:2] = PC[31:2] + 1 + SignExt(imm)
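The 32-bit and 30-bit forms of the next-address calculation should always agree once the two low zero bits are restored. The sketch below checks that on one example; the function names and the sample addresses are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK30 = 0x3FFFFFFF

def sign_ext16(imm):
    """Interpret a 16-bit immediate as a signed integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def next_pc_32(pc, imm, taken):
    """PC[31:0] = PC + 4 (+ SignExt(imm) x 4 when the branch is taken)."""
    nxt = pc + 4
    if taken:
        nxt += sign_ext16(imm) * 4
    return nxt & MASK32

def next_pc_30(pc30, imm, taken):
    """PC[31:2] = PC[31:2] + 1 (+ SignExt(imm) when the branch is taken)."""
    nxt = pc30 + 1
    if taken:
        nxt += sign_ext16(imm)
    return nxt & MASK30

pc = 0x00400010
imm = 0xFFFD                      # -3 instructions
full = next_pc_32(pc, imm, taken=True)
short = next_pc_30(pc >> 2, imm, taken=True)
print(hex(full), hex(short << 2))
```

Dropping the two low bits also removes the × 4 shifter from the branch path, which is the hardware simplification the slide refers to.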


Datapath: Fast, Expensive Next-I.F. Logic

• PC incremented to next instruction normally
• On a beq instruction, immediate × 4 can then be added to PC + 4

[Figure: 30-bit PC (Clk) supplies Addr[31:2] of the Instruction Memory, with Addr[1:0] tied to “00”; one Adder adds “1” to the PC, a second Adder adds SignExt(imm16) taken from Instruction[15:0], and a MUX driven by Branch and Zero selects the next PC; the memory outputs Instruction[31:0]]


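Datapath: Slow, Smaller Next-I.F. Logic

• Slow because the address add cannot start until the ALU Zero output is known
• But probably not the critical path (LOAD usually is)

[Figure: 30-bit PC (Clk) supplies Addr[31:2] of the Instruction Memory, with Addr[1:0] tied to “00”; a single Adder with Carry In “1” adds the PC and the output of a MUX that selects SignExt(imm16) from Instruction[15:0] or “0”, driven by Branch and Zero; the memory outputs Instruction[31:0]]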


Jump RTL

• Jump instruction
    j target

    Mem[PC];                                  Fetch instruction from memory
    PC[31:2] <- PC[31:28] || target[25:0];    Calculate next address

  Field    Bits   Meaning
  OP       6      opcode
  target   26     jump target address
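The concatenation in the jump RTL can be written out with masks and shifts. A minimal sketch with a hypothetical function name; it follows the slide's RTL literally and takes the top 4 bits from the current PC.

```python
def jump_pc(pc, target):
    """PC[31:2] <- PC[31:28] || target[25:0]; PC[1:0] stay 00."""
    upper = pc & 0xF0000000               # top 4 bits of the current PC
    lower = (target & 0x03FFFFFF) << 2    # 26-bit target x 4
    return upper | lower

# Hypothetical example: jump from 0x10000008 with target field 0x40.
print(hex(jump_pc(0x10000008, 0x0000040)))
```

Because only 28 address bits come from the instruction, a jump cannot leave the 2^28-byte region selected by the top 4 bits of the PC.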


Datapath: I.F. Unit with Jump

• MUX controls whether the next PC is the pseudodirect jump address

[Figure: fast next-address logic (two Adders, SignExt of imm16 from Instruction[15:0], MUX driven by Branch and Zero) extended with a second MUX, controlled by Jump, that selects the 30-bit Target formed from PC[31:28] (4 bits) and Instruction[25:0] (26 bits); Addr[31:2] comes from the PC and Addr[1:0] is tied to “00”]


Putting it All Together

[Figure: complete single cycle datapath. The Instruction Fetch Unit (inputs Clk, Zero, Branch, Jump) outputs Instruction[31:0], which supplies Rs [25:21], Rt [20:16], Rd [15:11], and Imm16 [15:0]. The register file, Extender, ALU, and Data Memory are connected as on the previous slides, with control signals RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg.]


A Real MIPS Datapath


Control Design

• Next: Designing the Control for the Single Cycle Datapath

[Figure: the five components of a computer: Processor (Control and Datapath), Memory, Input, Output]


Adding Control

• Analyze datapath and RTLs for control
– Identify control points for pieces of the datapath
  • Instruction Fetch Unit
  • Integer function units
  • Memory
– Categorize type of control signal
  • Flow of data through multiplexors
  • Writes of state information
– Derive control signal values for each instruction

• Design and implement control with logic/PLA/ROM (for single cycle & pipelined)


Instruction Fetch (first part)

• Always fetch next instruction
    Mem[PC];

[Figure: Instruction Fetch Unit with jump; Branch, Zero, and Jump still hold their values from the previous instruction]


Control for Arithmetic

[Figure: complete single cycle datapath annotated with the control values below]

    RegDst = 1, RegWr = 1, ALUSrc = 0, ExtOp = X, ALUctr = <op>
    MemWr = 0, MemtoReg = 0, Branch = 0, Jump = 0


Instruction Fetch at End

• Increment PC: PC = PC+4; (for all but Branch/Jump)

[Figure: Instruction Fetch Unit with Branch = 0, Zero = X, Jump = 0]


Control for Immediate (ori)

[Figure: complete single cycle datapath annotated with the control values below]

    RegDst = 0, RegWr = 1, ALUSrc = 1, ExtOp = 0, ALUctr = <op>
    MemWr = 0, MemtoReg = 0, Branch = 0, Jump = 0


Control for Load (lw)

[Figure: complete single cycle datapath annotated with the control values below]

    RegDst = 0, RegWr = 1, ALUSrc = 1, ExtOp = 1, ALUctr = Add
    MemWr = 0, MemtoReg = 1, Branch = 0, Jump = 0


Control for Store (sw)

[Figure: complete single cycle datapath annotated with the control values below]

    RegDst = X, RegWr = 0, ALUSrc = 1, ExtOp = 1, ALUctr = Add
    MemWr = 1, MemtoReg = X, Branch = 0, Jump = 0


Control for Branch (beq)

[Figure: complete single cycle datapath annotated with the control values below]

    RegDst = X, RegWr = 0, ALUSrc = 0, ExtOp = X, ALUctr = Sub
    MemWr = 0, MemtoReg = X, Branch = 1, Jump = 0


Instruction Fetch (beq)

• Consider the interesting case where we branch (Zero = 1)

[Figure: Instruction Fetch Unit with Branch = 1, Zero = 1, Jump = 0]


Control for Jump (j)

[Figure: complete single cycle datapath annotated with the control values below]

    RegDst = X, RegWr = 0, ALUSrc = X, ExtOp = X, ALUctr = X
    MemWr = 0, MemtoReg = X, Branch = 0, Jump = 1


Instruction Fetch (j)

[Figure: Instruction Fetch Unit with Branch = 0, Zero = X, Jump = 1]


Control Path

[Figure: complete single cycle datapath with its control inputs: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, Branch, and Jump; Zero is fed back to the Instruction Fetch Unit, and Instruction[31:0] supplies Rs [25:21], Rt [20:16], Rd [15:11], and Imm16 [15:0]]


Summary of Control Signals

(op and func coding from green card)

               add       sub       ori       lw        sw        beq       jump
  func         10 0000   10 0010   Not Important
  op           00 0000   00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
  RegDst       1         1         0         0         x         x         x
  ALUSrc       0         0         1         1         1         0         x
  MemtoReg     0         0         0         1         x         x         x
  RegWrite     1         1         1         1         0         0         0
  MemWrite     0         0         0         0         1         0         0
  Branch       0         0         0         0         0         1         0
  Jump         0         0         0         0         0         0         1
  ExtOp        x         x         0         1         1         x         x
  ALUctr<2:0>  Add       Sub       Or        Add       Add       Sub       xxx
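One way to work with the summary table is to store each column as a string of signal values and look signals up by name. This is only a sketch: the dictionary layout and the names `CONTROL` and `control_for` are hypothetical, while the values are the seven columns of the table.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemWrite", "Branch", "Jump", "ExtOp")

# One column of the summary table per instruction; "x" means don't care.
CONTROL = {
    "add":  ("1001000x", "Add"),
    "sub":  ("1001000x", "Sub"),
    "ori":  ("01010000", "Or"),
    "lw":   ("01110001", "Add"),
    "sw":   ("x1x01001", "Add"),
    "beq":  ("x0x0010x", "Sub"),
    "jump": ("xxx0001x", "xxx"),
}

def control_for(instr):
    """Return a dict mapping each control signal name to '0', '1', or 'x'."""
    bits, alu_op = CONTROL[instr]
    signals = dict(zip(SIGNALS, bits))
    signals["ALUctr"] = alu_op
    return signals

print(control_for("lw"))
```

Note that the signals that change state (RegWrite and MemWrite) are never don't cares: an accidental write would corrupt state, so they must be driven to 0 whenever no write is intended.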


Multilevel Decoding

• 12-input control will be very large (2^12 = 4096)
• To keep decoder size smaller, decode some control lines in each stage
• Since only R-type instructions (with op = 000000) need function field bits, give these to ALU control

[Figure: op (6 bits) feeds Main Control, which produces ALUop (N bits); ALUop and func (6 bits) feed the local ALU Control, which produces ALUctr (3 bits) for the ALU]


Multilevel Decoding: Main Control Table

               R-type     ori       lw        sw        beq        jump
  op           00 0000    00 1101   10 0011   10 1011   00 0100    00 0010
  RegDst       1          0         0         x         x          x
  ALUSrc       0          1         1         1         0          x
  MemtoReg     0          0         1         x         x          x
  RegWrite     1          1         1         0         0          0
  MemWrite     0          0         0         1         0          0
  Branch       0          0         0         0         1          0
  Jump         0          0         0         0         0          1
  ExtOp        x          0         1         1         x          x
  ALUop<N:0>   “R-type”   Or        Add       Add       Subtract   xxx


The Encoding of ALUop

• In this exercise, ALUop has to be 2 bits wide to represent:
– (1) “R-type” instructions
– “I-type” instructions that require the ALU to perform:
  • (2) Or, (3) Add, and (4) Subtract

• To implement the full MIPS ISA, ALUop has to be 3 bits wide to represent:
– (1) “R-type” instructions
– “I-type” instructions that require the ALU to perform:
  • (2) Or, (3) Add, (4) Subtract, and (5) And (e.g. andi)

[Figure: op (6 bits) feeds Main Control, which produces ALUop (N bits); ALUop and func (6 bits) feed the local ALU Control, which produces ALUctr (3 bits)]

                     R-type     ori    lw     sw     beq        jump
  ALUop (Symbolic)   “R-type”   Or     Add    Add    Subtract   xxx
  ALUop<2:0>         1 00       0 10   0 00   0 00   0 01       xxx


The Decoding of the “func” Field

                     R-type     ori    lw     sw     beq        jump
  ALUop (Symbolic)   “R-type”   Or     Add    Add    Subtract   xxx
  ALUop<2:0>         1 00       0 10   0 00   0 00   0 01       xxx

[Figure: op (6 bits) feeds Main Control, which produces ALUop (N bits); ALUop and func (6 bits) feed the local ALU Control, which produces ALUctr (3 bits) for the ALU]

R-type instruction fields: op [31:26], rs [25:21], rt [20:16], rd [15:11], shamt [10:6], funct [5:0]

  funct<5:0>   Instruction Operation
  10 0000      add
  10 0010      subtract
  10 0100      and
  10 0101      or
  10 1010      set-on-less-than

  ALUctr<2:0>  ALU Operation
  000          And
  001          Or
  010          Add
  110          Subtract
  111          Set-on-less-than


Truth Tables

                     R-type     ori    lw     sw     beq
  ALUop (Symbolic)   “R-type”   Or     Add    Add    Subtract
  ALUop<2:0>         1 00       0 10   0 00   0 00   0 01

  ALUop           func                 ALU          ALUctr
  <2> <1> <0>     <3> <2> <1> <0>      Operation    <2> <1> <0>
   0   0   0       x   x   x   x       Add           0   1   0
   0   x   1       x   x   x   x       Subtract      1   1   0
   0   1   x       x   x   x   x       Or            0   0   1
   1   x   x       0   0   0   0       Add           0   1   0
   1   x   x       0   0   1   0       Subtract      1   1   0
   1   x   x       0   1   0   0       And           0   0   0
   1   x   x       0   1   0   1       Or            0   0   1
   1   x   x       1   0   1   0       Set on <      1   1   1

  funct<3:0>   Instruction Op.
  0000         add
  0010         subtract
  0100         and
  0101         or
  1010         set-on-less-than


The Logic Equation for ALUctr<2>

  ALUop           func                 ALUctr<2>
  <2> <1> <0>     <3> <2> <1> <0>
   0   x   1       x   x   x   x       1
   1   x   x       0   0   1   0       1
   1   x   x       1   0   1   0       1

• ALUctr<2> = !ALUop<2> & ALUop<0>
            + ALUop<2> & !func<2> & func<1> & !func<0>

This makes func<3> a don’t care


The Logic Equation for ALUctr<1>

  ALUop           func                 ALUctr<1>
  <2> <1> <0>     <3> <2> <1> <0>
   0   0   0       x   x   x   x       1
   0   x   1       x   x   x   x       1
   1   x   x       0   0   0   0       1
   1   x   x       0   0   1   0       1
   1   x   x       1   0   1   0       1

• ALUctr<1> = !ALUop<2> & !ALUop<1>
            + ALUop<2> & !func<2> & !func<0>


The Logic Equation for ALUctr<0>

ALUop                   func
bit<2> bit<1> bit<0>    bit<3> bit<2> bit<1> bit<0>    ALUctr<0>
0      1      x         x      x      x      x         1
1      x      x         0      1      0      1         1
1      x      x         1      0      1      0         1

• ALUctr<0> = !ALUop<2> & ALUop<1>
            + ALUop<2> & !func<3> & func<2> & !func<1> & func<0>
            + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>


The ALU Control Logic

[Figure: ALU Control (Local) block with inputs func (6 bits) and ALUop (3 bits) and output ALUctr (3 bits)]

• ALUctr<2> = !ALUop<2> & ALUop<0>
            + ALUop<2> & !func<2> & func<1> & !func<0>

• ALUctr<1> = !ALUop<2> & !ALUop<1>
            + ALUop<2> & !func<2> & !func<0>

• ALUctr<0> = !ALUop<2> & ALUop<1>
            + ALUop<2> & !func<3> & func<2> & !func<1> & func<0>
            + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
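The three equations can be checked against the truth table with a short script. This is a sketch in Python, not part of the original slides. The function name alu_control_bits is made up for illustration, func is taken as the low four bits of the funct field, and the first term of ALUctr<1> uses !ALUop<1>, as in the per-bit derivation of ALUctr<1>.

```python
def alu_control_bits(aluop: int, func: int) -> int:
    """Evaluate the sum-of-products equations and return ALUctr<2:0>."""
    # Split the inputs into single bits; list index equals bit number.
    a = [(aluop >> i) & 1 for i in range(3)]
    f = [(func >> i) & 1 for i in range(4)]

    def inv(bit: int) -> int:
        """Logical NOT of a single bit."""
        return bit ^ 1

    ctr2 = (inv(a[2]) & a[0]) | (a[2] & inv(f[2]) & f[1] & inv(f[0]))
    ctr1 = (inv(a[2]) & inv(a[1])) | (a[2] & inv(f[2]) & inv(f[0]))
    ctr0 = (
        (inv(a[2]) & a[1])
        | (a[2] & inv(f[3]) & f[2] & inv(f[1]) & f[0])
        | (a[2] & f[3] & inv(f[2]) & f[1] & inv(f[0]))
    )
    return (ctr2 << 2) | (ctr1 << 1) | ctr0


# Expected ALUctr values, taken from the truth table rows.
EXPECTED = {
    (0b000, 0b0000): 0b010,  # lw / sw: Add
    (0b001, 0b0000): 0b110,  # beq: Subtract
    (0b010, 0b0000): 0b001,  # ori: Or
    (0b100, 0b0000): 0b010,  # R-type add
    (0b100, 0b0010): 0b110,  # R-type subtract
    (0b100, 0b0100): 0b000,  # R-type and
    (0b100, 0b0101): 0b001,  # R-type or
    (0b100, 0b1010): 0b111,  # R-type set-on-less-than
}

for (aluop, func), aluctr in EXPECTED.items():
    assert alu_control_bits(aluop, func) == aluctr
```

Each equation is a two-level AND-OR expression, so the function is a direct software model of the gate network inside the ALU Control block.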


Main Control Truth Table

                    R-type      ori        lw         sw         beq         jump
op                  00 0000     00 1101    10 0011    10 1011    00 0100     00 0010
RegDst              1           0          0          x          x           x
ALUSrc              0           1          1          1          0           x
MemtoReg            0           0          1          x          x           x
RegWrite            1           1          1          0          0           0
MemWrite            0           0          0          1          0           0
Branch              0           0          0          0          1           0
Jump                0           0          0          0          0           1
ExtOp               x           0          1          1          x           x
ALUop (Symbolic)    “R-type”    Or         Add        Add        Subtract    xxx
ALUop <2>           1           0          0          0          0           x
ALUop <1>           0           1          0          0          0           x
ALUop <0>           0           0          0          0          1           x

[Figure: Main Control takes op (6 bits) and produces RegDst, ALUSrc, ... and ALUop (3 bits); ALU Control (Local) takes func (6 bits) and ALUop (3 bits) and produces ALUctr (3 bits)]
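The main control truth table maps each opcode to one control word, which can be written down as a lookup table. Below is a minimal Python sketch under that reading, not part of the original slides; SIGNALS, MAIN_CONTROL, and main_control are illustrative names, and None stands for a don't-care ("x") entry.

```python
# Signal order follows the rows of the main control truth table.
SIGNALS = (
    "RegDst", "ALUSrc", "MemtoReg", "RegWrite",
    "MemWrite", "Branch", "Jump", "ExtOp", "ALUop",
)

# One control word per opcode; None marks a don't-care ("x").
MAIN_CONTROL = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, None, "R-type"),          # R-type
    0b001101: (0, 1, 0, 1, 0, 0, 0, 0, "Or"),                 # ori
    0b100011: (0, 1, 1, 1, 0, 0, 0, 1, "Add"),                # lw
    0b101011: (None, 1, None, 0, 1, 0, 0, 1, "Add"),          # sw
    0b000100: (None, 0, None, 0, 0, 1, 0, None, "Subtract"),  # beq
    0b000010: (None, None, None, 0, 0, 0, 1, None, None),     # jump
}


def main_control(op: int) -> dict:
    """Return the control signals for a 6-bit opcode as a name -> value dict."""
    if op not in MAIN_CONTROL:
        raise ValueError(f"opcode {op:06b} is not in the table")
    return dict(zip(SIGNALS, MAIN_CONTROL[op]))
```

For instance, main_control(0b100011) returns the lw settings, with MemtoReg and RegWrite both 1 and ALUop set to "Add".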


Truth Table for RegWrite

            R-type     ori        lw         sw         beq        jump
op          00 0000    00 1101    10 0011    10 1011    00 0100    00 0010
RegWrite    1          1          1          0          0          0

• RegWrite = R-type + ori + lw
           = !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0>   (R-type)
           + !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0>      (ori)
           + op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0>      (lw)

[Figure: op<5> ... op<0> feed one AND gate per instruction (R-type, ori, lw, sw, beq, jump); RegWrite is formed from the R-type, ori, and lw terms]


PLA Implementation

[Figure: PLA. The AND plane decodes op<5> ... op<0> into one product term per instruction (R-type, ori, lw, sw, beq, jump). The OR plane combines these terms into the outputs RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>.]
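The two planes of the PLA can be modeled in a few lines. This Python sketch is an illustration, not the slides' implementation: AND_PLANE, OR_PLANE, product_term, and pla are made-up names, the OR-plane connections are read off the main control truth table, and don't-care entries are assigned 0, which is only one of several valid choices.

```python
# AND plane: one product term per instruction, a full decode of op<5:0>.
AND_PLANE = {
    "R-type": 0b000000,
    "ori": 0b001101,
    "lw": 0b100011,
    "sw": 0b101011,
    "beq": 0b000100,
    "jump": 0b000010,
}

# OR plane: each output ORs the product terms whose table entry is 1.
# Don't-care ("x") entries are left unconnected here (assumption).
OR_PLANE = {
    "RegWrite": ["R-type", "ori", "lw"],
    "ALUSrc": ["ori", "lw", "sw"],
    "MemtoReg": ["lw"],
    "MemWrite": ["sw"],
    "Branch": ["beq"],
    "Jump": ["jump"],
    "RegDst": ["R-type"],
    "ExtOp": ["lw", "sw"],
    "ALUop<2>": ["R-type"],
    "ALUop<1>": ["ori"],
    "ALUop<0>": ["beq"],
}


def product_term(op: int, code: int) -> int:
    """AND of six literals: op<i> where code has a 1, !op<i> where it has a 0."""
    result = 1
    for i in range(6):
        op_bit = (op >> i) & 1
        code_bit = (code >> i) & 1
        result &= op_bit if code_bit else op_bit ^ 1
    return result


def pla(op: int) -> dict:
    """Evaluate the AND plane, then the OR plane, for a 6-bit opcode."""
    terms = {name: product_term(op, code) for name, code in AND_PLANE.items()}
    return {
        output: int(any(terms[name] for name in names))
        for output, names in OR_PLANE.items()
    }
```

Changing a control signal only means editing one OR_PLANE entry, which mirrors why a PLA is easy to modify late in a design.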


Implementing Control

• Programmable Logic Array (PLA) vs. “Random Logic”
  – Design Changes
    • Validation changes are common
    • PLA is less work to change; area/timing impact is predictable
  – Area
    • Tradeoff depends on complexity of logic (# of gates)
  – Timing and Power
    • Random logic generally better since individual paths can be tuned
• Alternative approach is Read Only Memory (ROM/PROM)
  – Also combinational, but size makes it slow
  – Used for microcoded control with more than one state/cycle per instruction


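Putting It All Together

[Figure: complete single-cycle datapath with control. The Instruction Fetch Unit (inputs Clk, Zero, Jump, Branch) outputs Instruction[31:0], which is split into Rs [21:25], Rt [16:20], Rd [11:15], and Imm16 [0:15]. The datapath contains the 32 32-bit Registers (ports Rw, Ra, Rb; buses busA, busB, busW; write enable RegWr), a RegDst mux choosing between Rt and Rd, the Extender (imm16, 16 to 32 bits, controlled by ExtOp), the ALUSrc mux, the ALU (ALUctr, Zero), the Data Memory (Adr, Data In, WrEn, MemWr), and the MemtoReg mux. Main Control takes op = Instr[31:26] (6 bits) and produces RegDst, ALUSrc, ... and ALUop (3 bits). ALU Control takes func = Instr[5:0] (6 bits) and ALUop and produces ALUctr (3 bits).]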
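The field boundaries in the figure translate directly into shifts and masks. The sketch below, in Python, is an illustration and not part of the original slides; decode_fields is a made-up name, and the two example words are standard MIPS encodings worked out by hand for add $3, $1, $2 and lw $2, 4($1).

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields used by control."""
    return {
        "op": (instr >> 26) & 0x3F,    # Instr[31:26], to Main Control
        "rs": (instr >> 21) & 0x1F,    # Instr[25:21]
        "rt": (instr >> 16) & 0x1F,    # Instr[20:16]
        "rd": (instr >> 11) & 0x1F,    # Instr[15:11]
        "func": instr & 0x3F,          # Instr[5:0], to ALU Control
        "imm16": instr & 0xFFFF,       # Instr[15:0], to the Extender
    }


ADD_3_1_2 = 0x00221820  # add $3, $1, $2 (R-type, func = 10 0000)
LW_2_4_1 = 0x8C220004   # lw $2, 4($1)  (op = 10 0011)

add_fields = decode_fields(ADD_3_1_2)
lw_fields = decode_fields(LW_2_4_1)
```

All fields are extracted in parallel for every instruction; the control signals (RegDst, ALUSrc, and so on) then decide which of them the datapath actually uses.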


Worst Case Timing (Load)

[Timing diagram: each signal changes from Old Value to New Value during one clock cycle. Signals shown: Clk, PC, Rs/Rt/Rd/Op/Func, ALUctr, RegWr, ExtOp, ALUSrc, MemtoReg, busA, busB, Data Mem Address, busW. Delays labeled: Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time, Sum of {Mux Delay + setup + skew}. An arrow marks where the register write occurs.]


Single Cycle Processor

• Advantages
  – Single cycle per instruction makes logic and clock simple
  – All machines would have a CPI of 1
• Disadvantages
  – Inefficient utilization of memory and functional units since different instructions take different lengths of time
    • Each functional unit is used only once per clock cycle
    • e.g. ALU only computes values a small amount of the time
  – Cycle time is the worst case path => long cycle times!
    • Load instruction
      – PC CLK-to-Q +
      – instruction memory access time +
      – register file access time +
      – ALU delay +
      – data memory access time +
      – register file setup time +
      – clock skew
  – All machines would have a CPI of 1, with cycle time set by the longest instruction!
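The load path is a plain sum of delays, and the result sets the clock period for every instruction. The Python sketch below illustrates the arithmetic only: every delay value is a hypothetical number chosen for the example (none comes from the slides), and LOAD_PATH_PS, cycle_time_ps, and execution_time_ps are made-up names.

```python
# Hypothetical delays in picoseconds for the load critical path.
# These numbers are assumptions for illustration, not measured values.
LOAD_PATH_PS = {
    "PC clk-to-Q": 50,
    "instruction memory access": 200,
    "register file access": 100,
    "ALU delay": 200,
    "data memory access": 200,
    "register file setup": 30,
    "clock skew": 20,
}


def cycle_time_ps(path: dict) -> int:
    """Clock period = sum of all delays along the worst-case path."""
    return sum(path.values())


def execution_time_ps(instruction_count: int, cpi: int, period_ps: int) -> int:
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * period_ps


period = cycle_time_ps(LOAD_PATH_PS)             # 800 ps with these numbers
total = execution_time_ps(1_000_000, 1, period)  # CPI is 1 by construction
```

With these assumed delays, even an instruction that skips data memory still occupies the full 800 ps cycle, which is the inefficiency listed under Disadvantages.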


Summary

• Single cycle datapath => CPI=1, CCT => long
• 5 steps to design a processor
  – 1. Analyze instruction set => datapath requirements
  – 2. Select set of datapath components & establish clock methodology
  – 3. Assemble datapath meeting the requirements
  – 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  – 5. Assemble the control logic
• Control is the hard part
• MIPS makes control easier
  – Instructions same size
  – Source registers always in same place
  – Immediates same size, location
  – Operations always on registers/immediates

[Figure: the five classic components of a computer: Processor (Control and Datapath), Memory, Input, Output]