
Page 1: Designing a Pipelined CPU

Designing a Pipelined CPU

Read 4.7 for next class

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Page 2: Designing a Pipelined CPU

Review -- Single Cycle CPU

Page 3: Designing a Pipelined CPU

Review -- Multiple Cycle CPU

Page 4: Designing a Pipelined CPU

Review -- Instruction Latencies

• Single-Cycle CPU (one long clock cycle)

    Load: Ifetch, Reg/Dec, Exec, Mem, Wr

• Multiple Cycle CPU

    Cycle:   1        2        3        4        5
    Load     Ifetch   Reg/Dec  Exec     Mem      Wr
    Add      Ifetch   Reg/Dec  Exec     Wr
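The latency difference can be made concrete with a small Python sketch. The per-stage delays below are hypothetical values chosen for illustration, not numbers from these slides; the point is that the single-cycle clock must stretch to fit the slowest instruction, while the multi-cycle clock only has to fit the slowest stage.

```python
# Hypothetical per-stage delays in picoseconds (illustrative, not from the slides).
STAGE_DELAY_PS = {"Ifetch": 200, "Reg/Dec": 100, "Exec": 200, "Mem": 200, "Wr": 100}

# Stages used by each instruction; Add has no Mem step, as in the multi-cycle diagram.
STAGES = {
    "Load": ("Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"),
    "Add": ("Ifetch", "Reg/Dec", "Exec", "Wr"),
}


def single_cycle_ct() -> int:
    """Single-cycle clock period: long enough for the slowest instruction."""
    return max(sum(STAGE_DELAY_PS[s] for s in stages) for stages in STAGES.values())


def multi_cycle_ct() -> int:
    """Multi-cycle clock period: long enough for the slowest single stage."""
    return max(STAGE_DELAY_PS.values())


def single_cycle_latency(instr: str) -> int:
    # Every instruction, including the shorter Add, takes one full (long) cycle.
    if instr not in STAGES:
        raise KeyError(instr)
    return single_cycle_ct()


def multi_cycle_latency(instr: str) -> int:
    # One stage per cycle, so latency = (number of stages) x (short cycle).
    return len(STAGES[instr]) * multi_cycle_ct()


if __name__ == "__main__":
    for instr in STAGES:
        print(f"{instr}: single-cycle {single_cycle_latency(instr)} ps, "
              f"multi-cycle {multi_cycle_latency(instr)} ps")
```

With these made-up numbers the load is actually slower on the multi-cycle machine (1000 ps vs. 800 ps), because every stage is padded to the 200 ps clock; the multi-cycle benefit comes from shorter instructions finishing in fewer cycles, not from a faster load.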

Page 5: Designing a Pipelined CPU

Instruction Latencies and Throughput

• Single-Cycle CPU (one long clock cycle)

    Load: Ifetch, Reg/Dec, Exec, Mem, Wr

• Multiple Cycle CPU

    Cycle:   1        2        3        4        5
    Load     Ifetch   Reg/Dec  Exec     Mem      Wr

• Pipelined CPU

    Cycle:   1        2        3        4        5        6        7        8

Page 6: Designing a Pipelined CPU

Instruction Latencies and Throughput

Which of the statements below is true about a pipelined processor?

A. Instruction latency remains essentially unchanged from single-cycle (minus some overheads); instruction throughput increases
B. Instruction latency remains essentially unchanged from multi-cycle (minus some overheads); instruction throughput increases
C. Instruction latency improves by a factor of 5 over single-cycle (minus some overheads); instruction throughput increases
D. Instruction latency improves by a factor of 5 over multi-cycle (minus some overheads); instruction throughput increases
E. None of the above

Page 7: Designing a Pipelined CPU

Instruction Latencies and Throughput

• Single-Cycle CPU (one long clock cycle)

    Load: Ifetch, Reg/Dec, Exec, Mem, Wr

• Multiple Cycle CPU

    Cycle:   1        2        3        4        5
    Load     Ifetch   Reg/Dec  Exec     Mem      Wr

• Pipelined CPU

    Cycle:   1        2        3        4        5        6        7        8
    Load 1   Ifetch   Reg/Dec  Exec     Mem      Wr

Page 8: Designing a Pipelined CPU

Instruction Latencies and Throughput

• Single-Cycle CPU (one long clock cycle)

    Load: Ifetch, Reg/Dec, Exec, Mem, Wr

• Multiple Cycle CPU

    Cycle:   1        2        3        4        5
    Load     Ifetch   Reg/Dec  Exec     Mem      Wr

• Pipelined CPU

    Cycle:   1        2        3        4        5        6        7        8
    Load 1   Ifetch   Reg/Dec  Exec     Mem      Wr
    Load 2            Ifetch   Reg/Dec  Exec     Mem      Wr

Page 9: Designing a Pipelined CPU

Instruction Latencies and Throughput

• Single-Cycle CPU (one long clock cycle)

    Load: Ifetch, Reg/Dec, Exec, Mem, Wr

• Multiple Cycle CPU

    Cycle:   1        2        3        4        5
    Load     Ifetch   Reg/Dec  Exec     Mem      Wr

• Pipelined CPU

    Cycle:   1        2        3        4        5        6        7        8
    Load 1   Ifetch   Reg/Dec  Exec     Mem      Wr
    Load 2            Ifetch   Reg/Dec  Exec     Mem      Wr
    Load 3                     Ifetch   Reg/Dec  Exec     Mem      Wr
    Load 4                              Ifetch   Reg/Dec  Exec     Mem      Wr
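The staggered pattern above follows a simple rule: in an ideal k-stage pipeline, instruction i is in stage s during cycle i + s + 1, so n instructions finish in k + n - 1 cycles (4 loads in 8 cycles here). A minimal Python sketch of that rule, assuming no hazards or stalls; `pipeline_schedule` and `total_cycles` are hypothetical helper names:

```python
STAGES = ("Ifetch", "Reg/Dec", "Exec", "Mem", "Wr")


def pipeline_schedule(n_instr: int, stages=STAGES) -> dict:
    """Ideal pipeline (no hazards, no stalls): instruction i (0-based) is in
    stage s during cycle i + s + 1. Returns {cycle: {instr_index: stage}}."""
    schedule = {}
    for i in range(n_instr):
        for s, name in enumerate(stages):
            schedule.setdefault(i + s + 1, {})[i] = name
    return schedule


def total_cycles(n_instr: int, k: int = len(STAGES)) -> int:
    # The first instruction needs k cycles; each later one finishes one cycle after it.
    return k + n_instr - 1


if __name__ == "__main__":
    n = 4  # four loads, as in the pipelined diagram
    sched = pipeline_schedule(n)
    for cycle in sorted(sched):
        busy = ", ".join(f"Load {i + 1} in {st}" for i, st in sorted(sched[cycle].items()))
        print(f"Cycle {cycle}: {busy}")
    print("Total cycles:", total_cycles(n))
    print("CPI for 1000 loads:", total_cycles(1000) / 1000)
```

The last line shows why the steady-state CPI approaches 1: the k - 1 cycles of pipeline fill are amortized over all instructions, giving 1004 cycles for 1000 loads.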

Page 10: Designing a Pipelined CPU

Pipelining Advantages

• Higher maximum throughput

• Higher utilization of CPU resources

• But: a more complicated datapath and more complex control (?)

PI: throughput vs. latency

Page 11: Designing a Pipelined CPU

Pipelining Throughput

PI Matching: what determines peak throughput for each design?

1. Longest instruction
2. Average instruction
3. Cycle time

Selection  SC  MC  Pipeline
A          1   2   3
B          1   2   1
C          1   3   2
D          3   1   2
E          None of the above
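One way to reason about the matching is to compute peak throughput as 1 / (CPI × CT) for each design. This is a minimal Python sketch; the stage delays and the instruction mix are hypothetical illustrative values, not data from the slides, and the model ignores hazards and pipeline-register overhead.

```python
# Hypothetical stage delays (ps) and instruction mix; illustrative only.
STAGE_DELAY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}
MIX = {
    # class: (fraction of dynamic instructions, stages it needs)
    "lw": (0.25, ("IF", "ID", "EX", "MEM", "WB")),
    "sw": (0.10, ("IF", "ID", "EX", "MEM")),
    "R": (0.50, ("IF", "ID", "EX", "WB")),
    "beq": (0.15, ("IF", "ID", "EX")),
}


def peak_throughput(design: str) -> float:
    """Instructions per nanosecond = 1 / (CPI * CT), ignoring hazards and latch overhead."""
    slowest_stage = max(STAGE_DELAY_PS.values())
    if design == "SC":
        # One cycle per instruction, but the cycle must fit the longest instruction.
        cpi = 1.0
        ct = max(sum(STAGE_DELAY_PS[s] for s in stages) for _, stages in MIX.values())
    elif design == "MC":
        # One stage per cycle; CPI is the mix-weighted average number of stages.
        cpi = sum(frac * len(stages) for frac, stages in MIX.values())
        ct = slowest_stage
    elif design == "Pipeline":
        # Ideally one instruction completes every cycle.
        cpi = 1.0
        ct = slowest_stage
    else:
        raise ValueError(f"unknown design: {design}")
    return 1000.0 / (cpi * ct)  # ct is in ps, so this is instructions per ns


if __name__ == "__main__":
    for d in ("SC", "MC", "Pipeline"):
        print(f"{d:8s} {peak_throughput(d):.2f} instructions/ns")
```

With these numbers the single-cycle clock is set by lw (800 ps), the multi-cycle design averages 4.1 cycles of 200 ps, and the pipeline completes one instruction per 200 ps cycle.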

Page 12: Designing a Pipelined CPU

A Pipelined Datapath

IF: Instruction fetch

ID: Instruction decode and register fetch

EX: Execution and effective address calculation

MEM: Memory access

WB: Write back

Page 13: Designing a Pipelined CPU

Idea for a Pipelined Datapath

Page 14: Designing a Pipelined CPU

Execution in a Pipelined Datapath

          CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
    lw    IM   Reg  ALU  DM   Reg
    lw         IM   Reg  ALU  DM   Reg
    lw              IM   Reg  ALU  DM   Reg
    lw                   IM   Reg  ALU  DM   Reg
    lw                        IM   Reg  ALU  DM   Reg
   Stage: IF   ID   EX   MEM  WB

Page 15: Designing a Pipelined CPU

Execution in a Pipelined Datapath

          CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
    lw    IM   Reg  ALU  DM   Reg
    lw         IM   Reg  ALU  DM   Reg
    lw              IM   Reg  ALU  DM   Reg
    lw                   IM   Reg  ALU  DM   Reg
    lw                        IM   Reg  ALU  DM   Reg
   Stage: IF   ID   EX   MEM  WB

Steady state (marked on the figure): the pipeline is full and one instruction completes every cycle.

Make sure to point out: steady-state CPI = 1

Page 16: Designing a Pipelined CPU

Pipeline Stages

Should we force every instruction to go through all 5 stages? Can we break it up as we did for multi-cycle, with R-type taking 4 cycles instead of 5?

Choose the BEST answer (Yes/No, reason):

A. Yes: decreasing R-type to 4 cycles improves instruction throughput
B. Yes: decreasing R-type to 4 cycles improves instruction latency
C. No: decreasing R-type to 4 cycles causes hazards
D. No: decreasing R-type to 4 cycles causes hazards and doesn’t impact throughput
E. No: decreasing R-type to 4 cycles causes hazards and doesn’t impact latency

Page 17: Designing a Pipelined CPU

Mixed Instructions in the Pipeline

          CC1  CC2  CC3  CC4  CC5  CC6
    lw    IM   Reg  ALU  DM   Reg
    add        IM   Reg  ALU  Reg
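The lw/add diagram on this slide shows what goes wrong when one instruction class skips a stage. A minimal Python sketch, assuming a register file with a single write port and a hypothetical 4-stage R-type that skips MEM, finds the cycles in which two instructions want to write registers at once:

```python
FULL = ("IF", "ID", "EX", "MEM", "WB")
SHORT_R = ("IF", "ID", "EX", "WB")  # hypothetical 4-stage R-type that skips MEM


def wb_cycle(issue_cycle: int, stages: tuple) -> int:
    """Clock cycle in which an instruction issued in `issue_cycle` is in WB."""
    return issue_cycle + stages.index("WB")


def write_port_conflicts(program: list) -> dict:
    """program: (name, stages) pairs, issued one per cycle from CC1.
    Returns {cycle: [names]} for cycles where more than one instruction
    needs the register file's single write port."""
    writers = {}
    for i, (name, stages) in enumerate(program):
        writers.setdefault(wb_cycle(i + 1, stages), []).append(name)
    return {cc: names for cc, names in writers.items() if len(names) > 1}


if __name__ == "__main__":
    print(write_port_conflicts([("lw", FULL), ("add", SHORT_R)]))  # {5: ['lw', 'add']}
    print(write_port_conflicts([("lw", FULL), ("add", FULL)]))     # {}
```

Giving add an idle MEM stage moves its write to CC6, which removes the conflict without changing throughput: one instruction still completes per cycle.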

Page 18: Designing a Pipelined CPU

Pipeline Principles

• All instructions that share a pipeline must have the same stages in the same order.
  – Therefore, add does nothing during the MEM stage.
  – sw does nothing during the WB stage.

• All intermediate values must be latched each cycle.

• There is no functional block reuse.

[Figure: one instruction's path through IM, Reg, ALU, DM, Reg, labeled IF, ID, EX, MEM, WB]

Page 19: Designing a Pipelined CPU

Pipelined Datapath

[Figure: the pipelined MIPS datapath, divided into Instruction Fetch, Instruction Decode/Register Fetch, Execute/Address Calculation, Memory Access, and Write Back, with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between stages (annotated "registers!"). Components: PC, PC + 4 adder, instruction memory, register file, 16-to-32-bit sign extend, shift left 2, branch-target adder, ALU, data memory, and multiplexers.]

Page 20: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch: add $10, $1, $2; no instructions yet in the later stages.]

Draw the active datapath through the example.

Page 21: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch: lw $12, 1000($4); Instruction Decode/Register Fetch: add $10, $1, $2.]

Page 22: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch: sub $15, $4, $1; Instruction Decode/Register Fetch: lw $12, 1000($4); Execute/Address Calculation: add $10, $1, $2.]

Page 23: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch: empty; Instruction Decode/Register Fetch: sub $15, $4, $1; Execute/Address Calculation: lw $12, 1000($4); Memory Access: add $10, $1, $2.]

Page 24: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch and Instruction Decode/Register Fetch: empty; Execute/Address Calculation: sub $15, $4, $1; Memory Access: lw $12, 1000($4); Write Back: add $10, $1, $2.]

Page 25: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch and Instruction Decode/Register Fetch: empty; Execute/Address Calculation: sub $15, $4, $1; Memory Access: lw $12, 1000($4); Write Back: add $10, $1, $2.]

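Did anyone spot the error? The Write register input is wrong: it comes from the instruction in the IF/ID register, not from add, which is now in Write Back. The destination register number must travel down the pipeline with the instruction.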
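The fix follows from the principle that all intermediate values must be latched each cycle: the destination register number has to ride along in the pipeline registers. A minimal Python sketch of that idea; the `Instr` type, `LATCHES` tuple, and `clock_edge` function are hypothetical illustrations, not part of any real simulator. It advances the add/lw/sub example by four clock edges:

```python
from dataclasses import dataclass
from typing import Optional

# Pipeline registers, in order from IF to WB.
LATCHES = ("IF/ID", "ID/EX", "EX/MEM", "MEM/WB")


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[int]  # destination register number (None if nothing is written)


def clock_edge(state: dict, fetched: Optional[Instr]) -> dict:
    """Update every pipeline register at once. Each latch takes what its
    upstream neighbour held before the edge, so an instruction's values,
    including its destination register, move one stage per cycle."""
    new_state = {"IF/ID": fetched}
    for upstream, downstream in zip(LATCHES, LATCHES[1:]):
        new_state[downstream] = state[upstream]
    return new_state


if __name__ == "__main__":
    program = [Instr("add $10, $1, $2", 10), Instr("lw $12, 1000($4)", 12),
               Instr("sub $15, $4, $1", 15)]
    state = {name: None for name in LATCHES}
    for cycle in range(4):  # four clock edges
        state = clock_edge(state, program[cycle] if cycle < len(program) else None)
    # add is now in Write Back; the register to write is the one latched in MEM/WB.
    print(state["MEM/WB"].text, "-> writes $%d" % state["MEM/WB"].dest)
    print("IF/ID holds:", state["IF/ID"])
```

After four edges, MEM/WB holds add with destination $10 while IF/ID is empty, so a write-register input wired to IF/ID would pick the wrong register.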

Page 26: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch, Instruction Decode/Register Fetch, and Execute/Address Calculation: empty; Memory Access: sub $15, $4, $1; Write Back: lw $12, 1000($4).]

Page 27: Designing a Pipelined CPU

The Pipeline, with Controls

But…

Page 28: Designing a Pipelined CPU

Pipeline Control

Control logic:
X. Combinational logic
Y. FSM or microprogram

Selection  SC  MC  Pipeline
A          X   Y   Y
B          Y   X   X
C          X   Y   X
D          Y   X   Y
E          None of the above

Page 29: Designing a Pipelined CPU

Pipelined Control

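[Figure: pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; the control signals derived from the instruction are latched and passed along with it from stage to stage]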

Page 30: Designing a Pipelined CPU

Pipelined Control Signals

            | Execution stage             | Memory stage            | Write-back stage
Instruction | RegDst ALUOp1 ALUOp0 ALUSrc | Branch MemRead MemWrite | RegWrite MemtoReg
R-format    | 1      1      0      0      | 0      0       0        | 1        0
lw          | 0      0      0      1      | 0      1       0        | 1        1
sw          | x      0      0      1      | 0      0       1        | 0        x
beq         | x      0      1      0      | 1      0       0        | 0        x
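The control table can be read as data: all control bits are produced once in ID by a pure lookup (combinational logic, no FSM), then handed down the pipeline, each stage consuming its group. A minimal Python sketch, assuming the usual arrangement in which ID/EX carries the EX, MEM, and WB groups, EX/MEM carries MEM and WB, and MEM/WB carries only WB; `decode` and `carried_after` are hypothetical names:

```python
# Control values from the pipelined control-signal table; None stands for "x" (don't care).
FIELDS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc",
          "Branch", "MemRead", "MemWrite", "RegWrite", "MemtoReg")
TABLE = {
    "R-format": (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw":       (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw":       (None, 0, 0, 1, 0, 0, 1, 0, None),
    "beq":      (None, 0, 1, 0, 1, 0, 0, 0, None),
}
# Which stage consumes each group of control lines.
STAGE_FIELDS = {
    "EX": ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemtoReg"),
}
ORDER = ("EX", "MEM", "WB")


def decode(op: str) -> dict:
    """ID stage: a pure table lookup (combinational), grouped by consuming stage."""
    bits = dict(zip(FIELDS, TABLE[op]))
    return {stage: {f: bits[f] for f in fields} for stage, fields in STAGE_FIELDS.items()}


def carried_after(stage: str, groups: dict) -> dict:
    """Control groups still needed (and so still latched) after `stage` finishes."""
    return {s: groups[s] for s in ORDER[ORDER.index(stage) + 1:]}


if __name__ == "__main__":
    g = decode("lw")
    print("ID/EX carries:", list(g))                        # ['EX', 'MEM', 'WB']
    print("EX/MEM carries:", list(carried_after("EX", g)))  # ['MEM', 'WB']
    print("MEM/WB carries:", list(carried_after("MEM", g))) # ['WB']
```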

Page 31: Designing a Pipelined CPU

The Pipeline with Control Logic

Page 32: Designing a Pipelined CPU

Is it really this simple?

Page 33: Designing a Pipelined CPU

                     CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
    sub $2, $1, $3   IM   Reg  ALU  DM   Reg
    and $12, $2, $5       IM   Reg  ALU  DM   Reg
    or $13, $6, $2             IM   Reg  ALU  DM   Reg
    add $14, $2, $2                 IM   Reg  ALU  DM   Reg
    sw $15, 100($2)                      IM   Reg  ALU  DM   Reg

What just happened here that is problematic? (BEST ANSWER)
A. The register file is trying to read and write the same register
B. The ALU and data memory are both active in the same cycle
C. A value is used before it is produced
D. Both A and B
E. Both A and C

Neg edge!

Page 34: Designing a Pipelined CPU

Data Hazards

• When a result is needed in the pipeline before it is available, a “data hazard” occurs.

[Figure: the same five instructions (sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2)) over CC1-CC8, annotated with where R2 becomes available and where R2 is needed]

Use red and black lines!
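The "available vs. needed" comparison can be checked mechanically. A minimal Python sketch, assuming the timing in the diagram (IF in the issue cycle, register read in ID one cycle later, register write in WB four cycles after IF), no forwarding or stalls, and a register file that writes in the first half of a cycle and reads in the second half (the "Neg edge!" note). The parser handles only the instruction forms on these slides; `regs` and `raw_hazards` are hypothetical helper names:

```python
import re

PROGRAM = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]


def regs(instr: str):
    """(dest, sources) for the MIPS forms used on these slides."""
    op, operands = instr.split(maxsplit=1)
    nums = [int(r) for r in re.findall(r"\$(\d+)", operands)]
    if op == "sw":              # sw rt, off(rs): reads rt and rs, writes nothing
        return None, nums
    return nums[0], nums[1:]    # lw rt, off(rs) and R-type rd, rs, rt


def raw_hazards(program):
    """(producer, consumer) pairs where the consumer reads a register in ID
    before the producer writes it in WB (assumptions as described above)."""
    hazards = []
    for p, producer in enumerate(program):
        dest, _ = regs(producer)
        if dest is None or dest == 0:       # $0 is never really written
            continue
        wb_cycle = p + 5                    # IF in cycle p + 1, WB four cycles later
        for c in range(p + 1, len(program)):
            c_dest, srcs = regs(program[c])
            if dest in srcs and c + 2 < wb_cycle:   # consumer's ID is cycle c + 2
                hazards.append((producer, program[c]))
            if c_dest == dest:              # a newer write supersedes this producer
                break
    return hazards


if __name__ == "__main__":
    for producer, consumer in raw_hazards(PROGRAM):
        print(f"{consumer!r} reads a register before {producer!r} writes it")
```

It flags and and or. The add reads $2 in CC5, the same cycle sub writes it, which the split-cycle register file handles; sw reads $2 a cycle later still.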

Page 35: Designing a Pipelined CPU

Hazards continued

• What happens when the following sequence executes?

    add $3, $10, $11
    lw $8, 1000($3)
    sub $11, $8, $7

Draw dependencies.
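A small Python sketch can list the dependencies to draw. It is a naive pairwise check (it ignores intervening writers) that labels each pair as RAW (true dependence), WAR (anti-dependence), or WAW (output dependence). In this simple in-order 5-stage pipeline, registers are read in ID and written in WB in program order, so only RAW dependences can become data hazards; the other two are listed for completeness. `regs` and `dependencies` are hypothetical helper names.

```python
import re

PROGRAM = ["add $3, $10, $11", "lw $8, 1000($3)", "sub $11, $8, $7"]


def regs(instr: str):
    """(dest, sources) for R-type, lw, and sw in the forms used on these slides."""
    op, operands = instr.split(maxsplit=1)
    nums = [int(r) for r in re.findall(r"\$(\d+)", operands)]
    if op == "sw":
        return None, nums
    return nums[0], nums[1:]


def dependencies(program):
    """Naive pairwise check (ignores intervening writers): returns
    (kind, register, earlier_index, later_index) tuples."""
    deps = []
    for i, earlier in enumerate(program):
        d_i, s_i = regs(earlier)
        for j in range(i + 1, len(program)):
            d_j, s_j = regs(program[j])
            if d_i is not None and d_i in s_j:
                deps.append(("RAW", f"${d_i}", i, j))   # later reads what earlier writes
            if d_j is not None and d_j in s_i:
                deps.append(("WAR", f"${d_j}", i, j))   # later writes what earlier reads
            if d_i is not None and d_i == d_j:
                deps.append(("WAW", f"${d_i}", i, j))   # both write the same register
    return deps


if __name__ == "__main__":
    for kind, reg, i, j in dependencies(PROGRAM):
        print(f"{kind} on {reg}: {PROGRAM[i]!r} -> {PROGRAM[j]!r}")
```

For this sequence the two RAW dependences (add to lw on $3, lw to sub on $8) are the ones that matter; the WAR on $11 is harmless because add reads $11 in ID long before sub writes it in WB.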

Page 36: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch: lw $8, 1000($3); Instruction Decode/Register Fetch: add $3, $10, $11.]

Page 37: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch: sub $11, $8, $7; Instruction Decode/Register Fetch: lw $8, 1000($3); Execute/Address Calculation: add $3, $10, $11.]

Page 38: Designing a Pipelined CPU

The Pipeline in Execution

[Figure: pipelined datapath. Instruction Fetch: add $10, $1, $2; Instruction Decode/Register Fetch: sub $11, $8, $7; Execute/Address Calculation: lw $8, 1000($3); Memory Access: add $3, $10, $11.]

Page 39: Designing a Pipelined CPU

Pipelining Key Points

• ET = IC * CPI * CT (execution time = instruction count * cycles per instruction * cycle time)

• We achieve high throughput without reducing instruction latency.

• Pipelining exploits a special kind of parallelism (parallelism between functionality required in different cycles).

• Pipelining uses combinational logic to generate (and registers to propagate) control signals.

• Pipelining creates potential hazards.
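Plugging the designs into ET = IC * CPI * CT bounds what pipelining can buy. A short derivation, assuming k stages with delays t_s, a pipeline-register overhead t_latch (symbols introduced here, not on the slides), a longest instruction that uses every stage, and hazard-free steady state:

```latex
\begin{aligned}
ET &= IC \times CPI \times CT \\
\text{Speedup}_{\text{pipe/SC}}
  &= \frac{IC \times CPI_{SC} \times CT_{SC}}{IC \times CPI_{pipe} \times CT_{pipe}}
  \approx \frac{CT_{SC}}{CT_{pipe}}
  \qquad (CPI_{SC} = 1,\; CPI_{pipe} \to 1) \\
CT_{SC} &= \sum_{s=1}^{k} t_s \;\le\; k \max_s t_s \;\le\; k\,\bigl(\max_s t_s + t_{latch}\bigr) = k\, CT_{pipe} \\
\Rightarrow\quad \text{Speedup}_{\text{pipe/SC}} &\lesssim k
\end{aligned}
```

The bound is reached only with perfectly balanced stages, no latch overhead, and no hazards; unbalanced stages, t_latch, and stalls all pull the speedup below k.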

Page 40: Designing a Pipelined CPU

Summary comparison (MIPS designs)

Design         CPI    CT
Single Cycle
Multi-cycle
Pipeline