Designing a Pipelined CPU

Read 4.7 for next class

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Review -- Single Cycle CPU

Review -- Multiple Cycle CPU

Review -- Instruction Latencies

•Single-Cycle CPU

•Multiple Cycle CPU

Load: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycle 1 to Cycle 5)

Add: Ifetch, Reg/Dec, Exec, Wr

Instruction Latencies and Throughput

•Single-Cycle CPU

•Pipelined CPU

[Figure: timing diagram, Cycle 1 to Cycle 8]

Which of the statements below is true about a pipelined processor?

Selection Statement

A Instruction latency remains essentially unchanged from single-cycle (minus some overheads); Instruction throughput increases

B Instruction latency remains essentially unchanged from multi-cycle (minus some overheads); Instruction throughput increases

C Instruction latency improves by a factor of 5 over single-cycle (minus some overheads); Instruction throughput increases

D Instruction latency improves by a factor of 5 over multi-cycle (minus some overheads); Instruction throughput increases

E None of the above

Instruction Latencies and Throughput

•Pipelined CPU
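A minimal sketch of the latency/throughput distinction, assuming hypothetical stage delays (the picosecond values below are not from the slides) and ignoring pipeline-register overhead. It compares one long single-cycle clock against a pipelined clock set by the slowest stage.

```python
# Hypothetical stage delays in picoseconds (assumed for illustration only).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle():
    # One long cycle that must fit every stage back to back.
    cycle = sum(STAGE_PS.values())
    return {"cycle_ps": cycle, "latency_ps": cycle, "ps_per_instr": cycle}


def pipelined():
    # The clock is set by the slowest stage; every instruction visits all stages.
    cycle = max(STAGE_PS.values())
    latency = cycle * len(STAGE_PS)
    # In steady state one instruction completes every cycle.
    return {"cycle_ps": cycle, "latency_ps": latency, "ps_per_instr": cycle}


sc = single_cycle()
pl = pipelined()
print("single-cycle:", sc)
print("pipelined:   ", pl)
print("throughput ratio:", sc["ps_per_instr"] / pl["ps_per_instr"])
```

Under these assumed delays the latency of one instruction does not improve (it gets slightly longer), while steady-state throughput improves by the ratio of the two cycle times.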

Pipelining Advantages

• Higher maximum throughput

• Higher utilization of CPU resources

• But, more complicated datapath, more complex control(?)

PI: throughput vs. latency

Pipelining Throughput

Peak Throughput

1. Longest instruction

2. Average instruction

3. Cycle time

PI Matching

Selection SC MC Pipeline

A 1 2 3

B 1 2 1

C 1 3 2

D 3 1 2

E None of the above

A Pipelined Datapath

IF: Instruction fetch

ID: Instruction decode and register fetch

EX: Execution and effective address calculation

MEM: Memory access

WB: Write back

Idea for a Pipelined Datapath

Execution in a Pipelined Datapath

[Figure: pipeline diagram of five instructions, each passing through IM, Reg, ALU, DM, Reg (IF, ID, EX, MEM, WB), across clock cycles CC1 to CC9]

Execution in a Pipelined Datapath

[Figure: the same pipeline diagram of five instructions (IF, ID, EX, MEM, WB) across CC1 to CC9, with the steady state marked]

Make sure to point out: steady-state CPI = 1
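A short sketch of why CPI approaches 1, assuming an ideal five-stage pipeline with no stalls. The function name `pipeline_cycles` is made up for this example.

```python
def pipeline_cycles(n_instructions, n_stages=5):
    """Cycles to run n instructions on an ideal pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles to drain through the pipe;
    # each later instruction finishes one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


for n in (1, 5, 100, 1_000_000):
    cycles = pipeline_cycles(n)
    print(f"{n:>9} instructions: {cycles:>9} cycles, CPI = {cycles / n:.4f}")
```

Five instructions take 9 cycles, which matches the CC1 to CC9 span in the diagram; as the instruction count grows, the fill cost is amortized and CPI tends toward 1.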

Should we force every instruction to go through all 5 stages? Can we break it up like we did for multi-cycle, with R-type taking 4 cycles instead of 5?

Selection Yes/No Reason (Choose BEST answer)

A Yes Decreasing R-type to 4 cycles improves instruction throughput

B Yes Decreasing R-type to 4 cycles improves instruction latency

C No Decreasing R-type to 4 cycles causes hazards

D No Decreasing R-type to 4 cycles causes hazards and doesn’t impact throughput

E No Decreasing R-type to 4 cycles causes hazards and doesn’t impact latency

Pipeline Stages

Mixed Instructions in the Pipeline

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) with mixed instruction types, CC1 to CC6]

Pipeline Principles

• All instructions that share a pipeline must have the same stages in the same order.

– therefore, add does nothing during Mem stage

– sw does nothing during WB stage

• All intermediate values must be latched each cycle.

• There is no functional block reuse
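The principles above can be sketched as a small diagram printer. This is only an illustration: the `IDLE` table encodes the two cases named in the bullets (add idle in MEM, sw idle in WB), and `lw` is added here as a contrasting instruction that uses every stage.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Stages an instruction still occupies but does no useful work in.
IDLE = {"add": {"MEM"}, "sw": {"WB"}, "lw": set()}


def diagram(program):
    """Row i is instruction i; column c is clock cycle c + 1."""
    n_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, op in enumerate(program):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            # Instruction i occupies stage s in cycle i + s (0-based).
            # Idle stages are shown in parentheses.
            row[i + s] = f"({stage})" if stage in IDLE[op] else stage
        rows.append(row)
    return rows


program = ["lw", "add", "sw"]
for op, row in zip(program, diagram(program)):
    print(f"{op:<4}" + "".join(f"{cell:<6}" for cell in row))
```

Because every instruction walks the same stages in the same order, no column of the diagram ever contains the same stage twice, which is the "no functional block reuse" rule.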

[Figure: one instruction passing through IM, Reg, ALU, DM, Reg, labeled IF, ID, EX, MEM, WB]

Pipelined Datapath

Stages: Instruction Fetch, Instruction Decode/Register Fetch, Execute/Address Calculation, Memory Access, Write Back

[Figure: pipelined datapath with pipeline registers between the stages (labeled IF/ID, EX/MEM, MEM/WB and called out as "registers!"). Components: instruction memory, adders, shift left 2, register file, 16-bit sign extend, ALU, data memory]

The Pipeline in Execution

IF: add $10, $1, $2

[Figure: pipelined datapath]

Draw active datapath through example

The Pipeline in Execution

IF: lw $12, 1000($4)

ID: add $10, $1, $2

[Figure: pipelined datapath]

The Pipeline in Execution

IF: sub $15, $4, $1

ID: lw $12, 1000($4)

EX: add $10, $1, $2

[Figure: pipelined datapath]

The Pipeline in Execution

ID: sub $15, $4, $1

EX: lw $12, 1000($4)

MEM: add $10, $1, $2

[Figure: pipelined datapath]

The Pipeline in Execution

EX: sub $15, $4, $1

MEM: lw $12, 1000($4)

WB: add $10, $1, $2

[Figure: pipelined datapath]

The Pipeline in Execution

EX: sub $15, $4, $1

MEM: lw $12, 1000($4)

WB: add $10, $1, $2

[Figure: pipelined datapath]

Did someone spot the error? Write Register is wrong.

The Pipeline in Execution

MEM: sub $15, $4, $1

WB: lw $12, 1000($4)

[Figure: pipelined datapath]

The Pipeline, with controls. But...

Pipeline Control

Selection SC MC Pipeline

A X Y Y

B Y X X

C X Y X

D Y X Y

E None of the above

Control Logic

X. Combinational Logic

Y. FSM or Microprogram

Pipelined Control

[Figure: control block fed by the instruction]

Pipelined Control Signals

             Execution Stage Control Lines    Memory Stage Control Lines    Write Back Stage Control Lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc   Branch  MemRead  MemWrite     RegWrite  MemtoReg
R-Format     1       1       0       0        0       0        0            1         0
lw           0       0       0       1        0       1        0            1         1
sw           x       0       0       1        0       0        1            0         x
beq          x       0       1       0        1       0        0            0         x
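A rough sketch of how these control values could travel with an instruction, assuming the usual arrangement in which all signals are generated once during decode and each stage's group is dropped after it is used. The values are copied from the control listing above; the function name `propagate` and the dictionary layout are made up for this example, and ID/EX is the name assumed for the register between decode and execute.

```python
# Control values from the listing above; "x" marks a don't-care.
CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": "x", "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": "x"},
    },
    "beq": {
        "EX": {"RegDst": "x", "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": "x"},
    },
}


def propagate(instr_class):
    """Show which control groups each pipeline register still carries."""
    signals = CONTROL[instr_class]  # produced by combinational logic in ID
    id_ex = {g: signals[g] for g in ("EX", "MEM", "WB")}
    ex_mem = {g: id_ex[g] for g in ("MEM", "WB")}  # EX group already consumed
    mem_wb = {g: ex_mem[g] for g in ("WB",)}       # MEM group already consumed
    return {"ID/EX": id_ex, "EX/MEM": ex_mem, "MEM/WB": mem_wb}


for reg, groups in propagate("lw").items():
    print(reg, groups)
```

The point of the sketch is that no per-instruction state machine is needed: the signals are a pure function of the instruction, and the pipeline registers carry them forward.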

The Pipeline with Control Logic

Is it really this simple?

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg), CC1 to CC8, for the instruction sequence below]

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

What just happened here which is problematic (BEST ANSWER)?

A. The register file is trying to read and write the same register

B. The ALU and data memory are both active in the same cycle

C. A value is used before it is produced

D. Both A and B

E. Both A and C

Neg edge!

Data Hazards

• When a result is needed in the pipeline before it is available, a “data hazard” occurs.

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg), CC1 to CC8, for the instruction sequence below]

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

[Figure annotations: "R2 Available" and "R2 Needed"]

Use red and black lines!
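A minimal sketch that checks the sequence above for values read before they are written. It assumes no forwarding, that registers are read in ID and written in WB, and that a write and a read in the same cycle are safe (write in the first half of the cycle, read in the second, as the "Neg edge!" note suggests). The tuple layout and `find_hazards` are hypothetical names for this example.

```python
# (text, destination register or None, source registers)
PROGRAM = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]


def find_hazards(program):
    """Return (producer, consumer, register) triples read too early."""
    hazards = []
    for j, (text_j, _, sources) in enumerate(program):
        read_cycle = j + 2  # instruction j is in ID in cycle j + 2 (1-based)
        for i in range(j):
            text_i, dest, _ = program[i]
            write_cycle = i + 5  # instruction i is in WB in cycle i + 5
            # Same-cycle write then read is assumed safe, so only '<' is a hazard.
            if dest is not None and dest in sources and read_cycle < write_cycle:
                hazards.append((text_i, text_j, dest))
    return hazards


for producer, consumer, reg in find_hazards(PROGRAM):
    print(f"{consumer!r} reads {reg} before {producer!r} writes it")
```

Under these assumptions only the `and` and `or` instructions read $2 too early; `add` reads it in the same cycle it is written, and `sw` reads it one cycle later.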

Hazards continued

• What happens when...

add $3, $10, $11

lw $8, 1000($3)

sub $11, $8, $7

Draw dependencies
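One way to list the dependencies to draw, sketched under the assumption that only read-after-write dependences matter here. The tuple layout and the `dependencies` helper are hypothetical names for this example.

```python
# (text, destination register or None, source registers)
PROGRAM = [
    ("add $3, $10, $11", "$3", ["$10", "$11"]),
    ("lw $8, 1000($3)", "$8", ["$3"]),
    ("sub $11, $8, $7", "$11", ["$8", "$7"]),
]


def dependencies(program):
    """Return (producer, consumer, register) edges for read-after-write."""
    last_writer = {}  # register -> text of the most recent instruction writing it
    edges = []
    for text, dest, sources in program:
        for reg in dict.fromkeys(sources):  # drop duplicates, keep order
            if reg in last_writer:
                edges.append((last_writer[reg], text, reg))
        if dest is not None:
            last_writer[dest] = text
    return edges


for producer, consumer, reg in dependencies(PROGRAM):
    print(f"{producer}  ->  {consumer}   via {reg}")
```

This reports two edges: `add` to `lw` through $3, and `lw` to `sub` through $8. The write to $11 by `sub` after `add` has read it is not reported, since in this in-order pipeline the earlier read always happens first.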

The Pipeline in Execution

IF: lw $8, 1000($3)

ID: add $3, $10, $11

[Figure: pipelined datapath]

The Pipeline in Execution

IF: sub $11, $8, $7

ID: lw $8, 1000($3)

EX: add $3, $10, $11

[Figure: pipelined datapath]

The Pipeline in Execution

IF: add $10, $1, $2

ID: sub $11, $8, $7

EX: lw $8, 1000($3)

MEM: add $3, $10, $11

[Figure: pipelined datapath]

Pipelining Key Points

• ET = IC * CPI * CT

• We achieve high throughput without reducing instruction latency.

• Pipelining exploits a special kind of parallelism (parallelism between functionality required in different cycles).

• Pipelining uses combinational logic to generate (and registers to propagate) control signals.

• Pipelining creates potential hazards.
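A worked instance of ET = IC * CPI * CT for the three designs, as a sketch only. Every number is hypothetical: the cycle times, the instruction mix, and the store and branch cycle counts are assumptions (load = 5 and R-type = 4 follow the review slide), and the pipeline is taken at its ideal steady-state CPI with no hazards.

```python
# Hypothetical numbers, chosen only to illustrate ET = IC * CPI * CT.
CT_SINGLE_PS = 800  # assumed: one cycle long enough for the slowest instruction
CT_SHORT_PS = 200   # assumed: one cycle long enough for the slowest stage

# Assumed instruction mix (fractions) and multi-cycle cycle counts per class.
MIX = {"load": 0.25, "store": 0.10, "r_type": 0.50, "branch": 0.15}
MULTI_CYCLES = {"load": 5, "store": 4, "r_type": 4, "branch": 3}


def execution_time_ps(ic, cpi, ct_ps):
    """ET = IC * CPI * CT, with CT in picoseconds."""
    return ic * cpi * ct_ps


# Multi-cycle CPI is the mix-weighted average of cycles per instruction class.
multi_cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)

IC = 1_000_000
designs = {
    "Single Cycle": (1.0, CT_SINGLE_PS),
    "Multi-cycle": (multi_cpi, CT_SHORT_PS),
    "Pipeline": (1.0, CT_SHORT_PS),  # ideal: ignores hazards and pipeline fill
}
for name, (cpi, ct) in designs.items():
    et_us = execution_time_ps(IC, cpi, ct) / 1e6
    print(f"{name:<13} CPI = {cpi:.2f}  CT = {ct} ps  ET = {et_us:.0f} us")
```

With these assumed values the pipeline keeps the short cycle time of the multi-cycle design while approaching the CPI of the single-cycle design.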

Summary comparison (MIPS designs)

              CPI    CT
Single Cycle
Multi-cycle
Pipeline
