Designing a Pipelined CPU


Read 4.7 for next class

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Review -- Single Cycle CPU

Review -- Multiple Cycle CPU

Review -- Instruction Latencies

•Single-Cycle CPU

•Multiple Cycle CPU

[Figure: timing diagrams over Cycles 1 to 5. Load: Ifetch, Reg/Dec, Exec, Mem, Wr. Add: Ifetch, Reg/Dec, Exec, Wr.]

Instruction Latencies and Throughput

•Single-Cycle CPU

•Multiple Cycle CPU

•Pipelined CPU

[Figure: timing diagrams over Cycles 1 to 8 and Cycles 1 to 5, showing Load passing through Ifetch, Reg/Dec, Exec, Mem, Wr.]

Which of the statements below is true about a pipelined processor?

Selection Statement

A Instruction latency remains essentially unchanged from single-cycle (minus some overheads); Instruction throughput increases

B Instruction latency remains essentially unchanged from multi-cycle (minus some overheads); Instruction throughput increases

C Instruction latency improves by a factor of 5 over single-cycle (minus some overheads); Instruction throughput increases

D Instruction latency improves by a factor of 5 over multi-cycle (minus some overheads); Instruction throughput increases

E None of the above

Instruction Latencies and Throughput

•Single-Cycle CPU

•Multiple Cycle CPU

•Pipelined CPU

[Figure, built up over several slides: timing diagrams over Cycles 1 to 8 and Cycles 1 to 5 in which further Load instructions (Ifetch, Reg/Dec, Exec, Mem, Wr) are added one at a time.]

Pipelining Advantages

• Higher maximum throughput

• Higher utilization of CPU resources

• But, more complicated datapath, more complex control(?)
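The latency and throughput claims above can be checked with a little arithmetic. The sketch below is illustrative only: the stage delays in `STAGE_NS` and the helper `load_timing` are assumptions made for this example, not values from the lecture. It assumes the single-cycle clock must fit all five stages of a load, while the multi-cycle and pipelined clocks must fit the slowest single stage.

```python
# Hypothetical stage delays in nanoseconds (illustrative, not from the slides).
STAGE_NS = {"Ifetch": 2, "Reg/Dec": 1, "Exec": 2, "Mem": 2, "Wr": 1}


def load_timing(design: str) -> tuple[float, float]:
    """Return (latency_ns, ns_between_completed_loads) for a stream of loads."""
    total = sum(STAGE_NS.values())    # one long cycle must fit every stage
    slowest = max(STAGE_NS.values())  # a short cycle must fit the slowest stage
    stages = len(STAGE_NS)
    if design == "single-cycle":
        return total, total
    if design == "multi-cycle":
        return stages * slowest, stages * slowest
    if design == "pipelined":
        # Each load still passes through all five stages, but once the
        # pipeline is full a load completes every cycle.
        return stages * slowest, slowest
    raise ValueError(f"unknown design: {design}")


for design in ("single-cycle", "multi-cycle", "pipelined"):
    latency, interval = load_timing(design)
    print(f"{design:12s} latency={latency} ns, one load completes every {interval} ns")
```

With these assumed delays the pipelined latency equals the multi-cycle latency, while the interval between completed loads shrinks to one clock cycle.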

PI: throughput vs. latency

Pipelining Throughput (Peak Throughput)

1. Longest instruction

2. Average instruction

3. Cycle time

PI Matching

Selection SC MC Pipeline

A 1 2 3

B 1 2 1

C 1 3 2

D 3 1 2

E None of the above

A Pipelined Datapath

IF: Instruction fetch

ID: Instruction decode and register fetch

EX: Execution and effective address calculation

MEM: Memory access

WB: Write back

Idea for a Pipelined Datapath

Execution in a Pipelined Datapath

[Figure: five lw instructions plotted against clock cycles CC1 to CC9, each passing through IM, Reg, ALU, DM, Reg (IF, ID, EX, MEM, WB).]

Execution in a Pipelined Datapath

[Figure: five lw instructions plotted against clock cycles CC1 to CC9, each passing through IM, Reg, ALU, DM, Reg (IF, ID, EX, MEM, WB), with the steady state marked.]

Make sure to point out: steady-state CPI = 1
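The diagram of five lw instructions can be regenerated with a short script. This is a minimal sketch that assumes an ideal pipeline (one instruction fetched per cycle, no stalls); the names `pipeline_chart` and `cpi` are made up for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(instructions: list[str]) -> list[list[str]]:
    """Row i lists the stage instruction i occupies in each cycle ('' if none)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    chart = []
    for i in range(len(instructions)):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in cycle i + 1
        chart.append(row)
    return chart


def cpi(n_instructions: int) -> float:
    """Cycles per instruction, counting the cycles spent filling the pipeline."""
    return (n_instructions + len(STAGES) - 1) / n_instructions


program = ["lw"] * 5
chart = pipeline_chart(program)
print("     " + " ".join(f"CC{c + 1:<3}" for c in range(len(chart[0]))))
for name, row in zip(program, chart):
    print(f"{name:<4} " + " ".join(f"{stage:<5}" for stage in row))
print(cpi(5), cpi(1000))
```

Five instructions need nine cycles, which matches CC1 to CC9 in the figure. In CC5 every stage is busy, and as the instruction count grows the fill cycles matter less, so the CPI approaches 1.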

Should we force every instruction to go through all 5 stages? Can we break it up like we did for multi-cycle, with R-type taking 4 cycles instead of 5?

Selection Yes/No Reason (Choose BEST answer)

A Yes Decreasing R-type to 4 cycles improves instruction throughput

B Yes Decreasing R-type to 4 cycles improves instruction latency

C No Decreasing R-type to 4 cycles causes hazards

D No Decreasing R-type to 4 cycles causes hazards and doesn’t impact throughput

E No Decreasing R-type to 4 cycles causes hazards and doesn’t impact latency

Pipeline Stages

Mixed Instructions in the Pipeline

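[Figure: lw and add plotted against clock cycles CC1 to CC6; one instruction passes through IM, Reg, ALU, DM, Reg and the other through IM, Reg, ALU, Reg, skipping DM.]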

Pipeline Principles

• All instructions that share a pipeline must have the same stages in the same order.

– therefore, add does nothing during Mem stage

– sw does nothing during WB stage

• All intermediate values must be latched each cycle.

• There is no functional block reuse

[Figure: one instruction passing through IM, Reg, ALU, DM, Reg, labeled IF, ID, EX, MEM, WB.]
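To see why every instruction keeps the same stages in the same order, consider what happens if add were allowed to skip MEM. The sketch below is an illustration under stated assumptions: one instruction is fetched per cycle, there are no stalls, and `FOUR_STAGE_R`, `write_back_cycles`, and `conflicts` are hypothetical names introduced here.

```python
FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]
FOUR_STAGE_R = ["IF", "ID", "EX", "WB"]  # hypothetical: R-type skips MEM


def write_back_cycles(program, stages_for):
    """Map each clock cycle to the instructions that are in WB during it.

    Instruction i is fetched in cycle i + 1 (one fetch per cycle, no stalls).
    """
    usage = {}
    for i, instr in enumerate(program):
        stages = stages_for(instr)
        if "WB" in stages:
            cycle = i + 1 + stages.index("WB")
            usage.setdefault(cycle, []).append(instr)
    return usage


def conflicts(usage):
    """Keep only the cycles in which more than one instruction is in WB."""
    return {cycle: instrs for cycle, instrs in usage.items() if len(instrs) > 1}


program = ["lw", "add"]
uniform = write_back_cycles(program, lambda instr: FIVE_STAGE)
mixed = write_back_cycles(
    program, lambda instr: FOUR_STAGE_R if instr == "add" else FIVE_STAGE
)
print(conflicts(uniform))  # no cycle has two writers
print(conflicts(mixed))    # lw and add both reach WB in cycle 5
```

With uniform stages the two instructions write back in consecutive cycles. With the shortened add, both compete for the register file write port in the same cycle.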

Pipelined Datapath

[Figure: the datapath divided into Instruction Fetch, Instruction Decode/Register Fetch, Execute/Address Calculation, Memory Access, and Write Back, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB (annotated "registers!"). Components shown: PC, instruction memory, adders, shift left 2, register file, sign extend, ALU, data memory, and muxes.]

The Pipeline in Execution

[Figure: pipelined datapath. IF: add $10, $1, $2. No instruction is shown in ID, EX, MEM, or WB.]

Draw active datapath through example

The Pipeline in Execution

[Figure: pipelined datapath. IF: lw $12, 1000($4). ID: add $10, $1, $2. No instruction is shown in EX, MEM, or WB.]

The Pipeline in Execution

[Figure: pipelined datapath. IF: sub $15, $4, $1. ID: lw $12, 1000($4). EX: add $10, $1, $2. No instruction is shown in MEM or WB.]

The Pipeline in Execution

[Figure: pipelined datapath. ID: sub $15, $4, $1. EX: lw $12, 1000($4). MEM: add $10, $1, $2. No instruction is shown in IF or WB.]

The Pipeline in Execution

[Figure: pipelined datapath. EX: sub $15, $4, $1. MEM: lw $12, 1000($4). WB: add $10, $1, $2. No instruction is shown in IF or ID.]

The Pipeline in Execution

[Figure: pipelined datapath. EX: sub $15, $4, $1. MEM: lw $12, 1000($4). WB: add $10, $1, $2. No instruction is shown in IF or ID.]

Did someone spot the error? Write Register is wrong.

The Pipeline in Execution

[Figure: pipelined datapath. MEM: sub $15, $4, $1. WB: lw $12, 1000($4). No instruction is shown in IF, ID, or EX.]

The Pipeline, with controls But….

Pipeline Control

Selection SC MC Pipeline

A X Y Y

B Y X X

C X Y X

D Y X Y

E None of the above

Control Logic

X. Combinational Logic

Y. FSM or Microprogram

Pipelined Control

[Figure: the control unit, fed by the instruction, shown with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.]

Pipelined Control Signals

             Execution Stage               Memory Stage              Write Back Stage
             Control Lines                 Control Lines             Control Lines
Instruction  RegDst ALUOp1 ALUOp0 ALUSrc   Branch MemRead MemWrite   RegWrite MemtoReg
R-Format     1      1      0      0        0      0       0          1        0
lw           0      0      0      1        0      1       0          1        1
sw           x      0      0      1        0      0       1          0        x
beq          x      0      1      0        1      0       0          0        x
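The control table can be written as a lookup, which makes the grouping by stage explicit. The signal values below are copied from the table; the names `decode`, `carried_in`, and `CARRIED` are assumptions for this sketch. It also assumes the usual arrangement in which each group of signals travels in the pipeline registers until the stage that uses it.

```python
# Control values copied from the table; "x" marks a don't-care.
EX_SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")

CONTROL = {
    "R-format": "110000010",
    "lw": "000101011",
    "sw": "x0010010x",
    "beq": "x0101000x",
}

# Assumption: which signal groups each pipeline register still has to carry.
CARRIED = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}


def decode(instr):
    """Combinational lookup: instruction class -> control signals by stage."""
    values = dict(zip(EX_SIGNALS + MEM_SIGNALS + WB_SIGNALS, CONTROL[instr]))
    return {
        "EX": {name: values[name] for name in EX_SIGNALS},
        "MEM": {name: values[name] for name in MEM_SIGNALS},
        "WB": {name: values[name] for name in WB_SIGNALS},
    }


def carried_in(register, instr):
    """Signal groups that a given pipeline register holds for this instruction."""
    groups = decode(instr)
    return {group: groups[group] for group in CARRIED[register]}


print(decode("lw")["MEM"])
print(carried_in("MEM/WB", "sw"))
```

Because `decode` is a pure table lookup, it behaves like combinational logic; the only state involved is the copy of each signal group held in the pipeline registers.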

The Pipeline with Control Logic

Is it really this simple?

[Figure: pipeline diagram over clock cycles CC1 to CC8 for the five instructions below, drawn with the IM, Reg, ALU, DM, Reg blocks.]

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

What just happened here which is problematic (BEST ANSWER)?

A. The register file is trying to read and write the same register

B. The ALU and data memory are both active in the same cycle

C. A value is used before it is produced

D. Both A and B

E. Both A and C

Neg edge!

Data Hazards

• When a result is needed in the pipeline before it is available, a “data hazard” occurs.

[Figure: pipeline diagram over clock cycles CC1 to CC8 for the five instructions below, drawn with the IM, Reg, ALU, DM, Reg blocks.]

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

[Figure annotations: R2 Available, R2 Needed]

Use red and black lines!
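The "available" versus "needed" comparison can be worked out mechanically. The sketch below is a simplified model with loudly stated assumptions: no forwarding and no stalls, instruction i reads its registers in ID (cycle i + 2) and writes in WB (cycle i + 5), and the register file writes in the first half of a cycle and reads in the second half, so a read in the same cycle as the write is safe. The function name `data_hazards` and the tuple encoding of instructions are made up for this example.

```python
# (opcode, destination register or None, source registers)
PROGRAM = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]


def data_hazards(program):
    """Return (producer, consumer, register) triples where the read is too early."""
    hazards = []
    for p, (p_op, dest, _) in enumerate(program):
        if dest is None:
            continue
        write_cycle = p + 5  # producer is in WB during this cycle
        for c in range(p + 1, len(program)):
            c_op, c_dest, sources = program[c]
            read_cycle = c + 2  # consumer is in ID during this cycle
            if dest in sources and read_cycle < write_cycle:
                hazards.append((p_op, c_op, dest))
            if c_dest == dest:
                break  # a later write takes over as the producer
    return hazards


for producer, consumer, reg in data_hazards(PROGRAM):
    print(f"{consumer} reads {reg} before {producer} has written it")
```

Under these assumptions sub writes $2 in CC5, and reads it in CC3, and or reads it in CC4, so both are flagged. add reads in CC5 and sw in CC6, so neither is flagged.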

Hazards continued

• What happens when...

add $3, $10, $11

lw $8, 1000($3)

sub $11, $8, $7

Draw dependencies

The Pipeline in Execution

[Figure: pipelined datapath. IF: lw $8, 1000($3). ID: add $3, $10, $11. No instruction is shown in EX, MEM, or WB.]

The Pipeline in Execution

[Figure: pipelined datapath. IF: sub $11, $8, $7. ID: lw $8, 1000($3). EX: add $3, $10, $11. No instruction is shown in MEM or WB.]

The Pipeline in Execution

[Figure: pipelined datapath. IF: add $10, $1, $2. ID: sub $11, $8, $7. EX: lw $8, 1000($3). MEM: add $3, $10, $11. No instruction is shown in WB.]

Pipelining Key Points

• ET = IC * CPI * CT

• We achieve high throughput without reducing instruction latency.

• Pipelining exploits a special kind of parallelism (parallelism between functionality required in different cycles).

• Pipelining uses combinational logic to generate (and registers to propagate) control signals.

• Pipelining creates potential hazards.
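The formula ET = IC * CPI * CT can be applied to the three designs side by side. Every number below is a hypothetical placeholder chosen for illustration (the instruction count, the multi-cycle average CPI, and the cycle times are not from the lecture). The pipelined CPI of 1 is the ideal steady-state value and ignores hazards.

```python
def execution_time(ic: int, cpi: float, ct_ns: float) -> float:
    """ET = IC * CPI * CT, with CT in nanoseconds."""
    return ic * cpi * ct_ns


IC = 1_000_000  # hypothetical instruction count

# Hypothetical CPI and cycle time (ns) for each design.
DESIGNS = {
    "single-cycle": {"cpi": 1.0, "ct_ns": 8.0},
    "multi-cycle": {"cpi": 4.0, "ct_ns": 2.0},
    "pipelined": {"cpi": 1.0, "ct_ns": 2.0},  # ideal steady state, no hazards
}

times = {name: execution_time(IC, d["cpi"], d["ct_ns"]) for name, d in DESIGNS.items()}
for name, et in times.items():
    print(f"{name:12s} ET = {et / 1e6:.1f} ms")
print("speedup over single-cycle:", times["single-cycle"] / times["pipelined"])
```

In this made-up configuration the pipelined design keeps the short cycle time of the multi-cycle design and the CPI of the single-cycle design, which is where its advantage comes from.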

Summary comparison (MIPS designs)

CPI CT

Single Cycle

Multi-cycle

Pipeline
