Arquitectura de Computadoras - CINVESTAVadiaz/ArqComp/12-Pipelining1.pdf · Tecnologías de...

Preview:

Citation preview

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 1

Laboratorio deTecnologías de Información

Introduction to Pipelining:Introduction to Pipelining: DatapathDatapath

Arquitectura de ComputadorasArquitectura de ComputadorasArturo DArturo Dííaz Paz Péérezrez

Centro de InvestigaciCentro de Investigacióón y de Estudios Avanzados del IPNn y de Estudios Avanzados del IPNLaboratorio de TecnologLaboratorio de Tecnologíías de Informacias de Informacióónn

adiaz@cinvestav.mxadiaz@cinvestav.mx

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 2

Laboratorio deTecnologías de InformaciónPipeliningPipelining

1 2 3 4

A way

of

exploiting

instruction

level

parallelism

1 2 3 4

1 2 3 4

1 2 3 4

timein

strn

s Throughput

Latency

1 2 3 4

1 2 3 4

1 2 3 4

time

inst

rns Throughput

Latency

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 3

Laboratorio deTecnologías de InformaciónObservationsObservations

Pipelining doesn’t help latency of a single task, it helps throughput of the entire workload

Pipeline rate limited by slowest pipeline stage♦

Multiple tasks operating simultaneously

Potential speedup = Number pipe stages♦

Unbalanced lengths of pipe stages reduces speedup

Time to “fill”

pipeline and time to “drain”

it reduces speedup

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 4

Laboratorio deTecnologías de Información5 Steps of DLX Datapath5 Steps of DLX Datapath

PCInst.

Memory

Add

Registers

SignExtend

Add

Mux

Mux

Zero ?

Mux

DataMemory

Mux

NPC

IR

16

A

B

SM D

ALUOutput

LM D

Instruction Fetch Instruction Decode/Register Fetch

ExecuteAddr. Calc.

MemoryAccess

WriteBack

32

4

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 5

Laboratorio deTecnologías de InformaciónPipelinedPipelined DLX DLX DatapathDatapath

Data stationary

control-

local decode

for

each

phase

/ pipeline

stage

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 6

Laboratorio deTecnologías de Información

Visualizing PipeliningVisualizing Pipelining

Single Cycle, Multiple Cycle, vs. PipelineSingle Cycle, Multiple Cycle, vs. Pipeline

Clk

Cycle 1

Multiple Cycle Implementation:

Ifetch Reg Exec Mem Wr

Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10

Load Ifetch Reg Exec Mem Wr

Ifetch Reg Exec MemLoad Store

Pipeline Implementation:

Ifetch Reg Exec Mem WrStore

Clk

Single Cycle Implementation:

Load Store Waste

IfetchR-type

Ifetch Reg Exec Mem WrR-type

Cycle 1 Cycle 2

R-type

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 8

Laboratorio deTecnologías de InformaciónWhy Pipeline?Why Pipeline?

Suppose we execute 100 instructions♦

Single Cycle Machine■

45 ns/cycle x 1 CPI x 100 inst = 4500 ns

Multicycle Machine■

10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns

Ideal pipelined machine■

10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 9

Laboratorio deTecnologías de InformaciónLimits to PipeliningLimits to Pipelining

Its not that easy for computers

Limits to pipelining: Hazards

prevent next instruction from executing during its designated clock cycle■

Structural hazards: HW cannot support this combination of instructions

Data hazards: instruction depends on result of prior instruction still in the pipeline

Control hazards: pipelining of branches & other instructions that change the PC

Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles”

in the pipeline

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 10

Laboratorio deTecnologías de Información

1 1 MemoryMemory isis StructuralStructural HazardHazard

Detection is easy in this case!

(right half highlight means read, left half write)

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 11

Laboratorio deTecnologías de Información

1 Memory is Structural Hazard1 Memory is Structural Hazard

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 12

Laboratorio deTecnologías de InformaciónData Hazard on r1Data Hazard on r1

• Dependencies backwards in time are hazards

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

IF ID/RF EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

Im

AL

UReg Dm Reg

AL

UIm Reg Dm Reg

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 13

Laboratorio deTecnologías de Información

Data Hazard SolutionData Hazard Solution

• “Forward” result from one stage to another

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

IF ID/RF EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

Im

AL

UReg Dm Reg

AL

UIm Reg Dm Reg

•“or” OK if define read/write properly

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 14

Laboratorio deTecnologías de Información3 Generic Data Hazards3 Generic Data Hazards

Instri

followed by Instrj

Read After Write (RAW)■

Instrj

tries to read operand before Instri

writes it♦

Write After Read (WAR)■

Instrj tries to write operand before Instri

reads it■

Can’t happen in DLX 5 stage pipeline because

» all instructions take 5 stages» reads are always in stage 2, and» writes are always in stage 5

Write After Write (WAW)■

Instrj tries to write operand before Instri writes it

» Leaves wrong result (Instri not Instrj)■

Can’t happen in DLX 5 stage pipeline because

» all instructions take 5 stages» writes are always in stage 5

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 15

Laboratorio deTecnologías de Información

Control Hazard: WaitControl Hazard: Wait

Stall: wait until decision is clear♦

Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow

Move decision to end of decode■

save 1 cycle per branch

Instr.

Order

Time (clock cycles)

Add

Beq

Load

AL

UMem Reg Mem Reg

AL

UMem Reg Mem RegA

LUReg Mem RegMemLostpotential

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 16

Laboratorio deTecnologías de Información

Control Hazard: PredictControl Hazard: Predict

Instr.

Order

Time (clock cycles)

Add

Beq

Load

AL

UMem Reg Mem Reg

AL

UMem Reg Mem Reg

Mem

AL

UReg Mem Reg

Predict:

guess one direction then back up if wrong♦

Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right -

50% of time)■

Need to “Squash”

and restart following instruction if wrong■

Produce CPI on branch of (1 *.5 + 2 * .5) = 1.5■

Total CPI might then be: 1.5 * .2 + 1 * .8 = 1.1 (20% branch)♦

More dynamic scheme: history of 1 branch (-

90%)

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 17

Laboratorio deTecnologías de Información

Control Hazard: Delayed BranchControl Hazard: Delayed Branch

Instr.

Order

Time (clock cycles)

Add

Beq

Misc

AL

UMem Reg Mem Reg

AL

UMem Reg Mem Reg

Mem

AL

UReg Mem Reg

Load Mem

AL

UReg Mem Reg

Delayed Branch: Redefine branch behavior (takes place after next instruction)

Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot”

(-

50% of time)

As launch more instruction per clock cycle, less useful

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 18

Laboratorio deTecnologías de Información

Forwarding to Avoid Data HazardForwarding to Avoid Data Hazard

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 19

Laboratorio deTecnologías de Información

Data Hazard even with ForwardingData Hazard even with Forwarding

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 20

Laboratorio deTecnologías de Información

Forwarding and LoadsForwarding and Loads

Reg

• Dependencies backwards in time are hazards

Time (clock cycles)

lw r1,0(r2)

sub r4,r1,r3

IF ID/RF EX MEM WBAL

UIm Reg Dm

AL

UIm Reg Dm Reg

Can’t solve with forwarding: Must delay/stall instruction dependent on loads

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 21

Laboratorio deTecnologías de Información

Forwarding and LoadsForwarding and Loads

Reg

Time (clock cycles)

lw r1,0(r2)

sub r4,r1,r3

IF ID/RF EX MEM WBAL

UIm Reg DmA

LUIm Reg Dm RegStall

Dependencies backwards in time are hazards

Can’t solve with forwarding: Must delay/stall instruction dependent on loads

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 22

Laboratorio deTecnologías de Información

Designing a Pipelined ProcessorDesigning a Pipelined Processor

Go back and examine your datapath

and control diagram♦

associated resources with states

ensure that flows do not conflict, or figure out how to resolve

assert control in appropriate stage

Control and Control and DatapathDatapath: Split state : Split state diagdiag into 5 piecesinto 5 pieces

IR <- Mem[PC]; PC <– PC+4;

A <- R[rs]; B<– R[rt]

S <– A + B;

R[rd] <– S;

S <– A + SX;

M <– Mem[S]

R[rd] <– M;

S <– A or ZX;

R[rt] <– S;

S <– A + SX;

Mem[S] <- B

If CondPC < PC+SX;

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

SReg

File

Equ

al

PC

Nex

t PC

IR

Inst

. Mem

D

M

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 24

Laboratorio deTecnologías de Información

Pipelined ProcessorPipelined Processor

What happens if we start a new instruction every cycle?

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

S

M

Reg

File

Equ

al

PC

Nex

t PC

IR

Inst

. Mem

Valid

IRex

Dcd

Ctrl

IRm

em

Ex

Ctrl

IRw

b

Mem

Ctrl

WB

Ctrl

D

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 25

Laboratorio deTecnologías de InformaciónPipelinedPipelined DatapathDatapath

Data stationary

control-

local decode

for

each

phase

/ pipeline

stage

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 26

Laboratorio deTecnologías de Información

Pipelining the Load InstructionPipelining the Load Instruction

The five independent functional units in the pipeline datapath

are:■

Instruction Memory for the Ifetch

stage

Register File’s Read ports (bus A and busB) for the Reg/Dec

stage■

ALU for the

Exec

stage

Data Memory for the Mem

stage■

Register File’s Write

port (bus W) for the Wr

stage

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Ifetch Reg/Dec Exec Mem Wr1st lw

Ifetch Reg/Dec Exec Mem Wr2nd lw

Ifetch Reg/Dec Exec Mem Wr3rd lw

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 27

Laboratorio deTecnologías de Información

The Four Stages of RThe Four Stages of R--typetype

Ifetch: Instruction Fetch■

Fetch the instruction from the Instruction Memory

Reg/Dec: Registers Fetch and Instruction Decode♦

Exec: ■

ALU operates on the two register operands

Update PC

Wr: Write the ALU output back to the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec WrR-type

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 28

Laboratorio deTecnologías de Información

Pipelining the RPipelining the R--type and Load type and Load InstructionInstruction

We have pipeline conflict or structural hazard:■

Two instructions try to write to the register file at the same time!

Only one write port

ClockCycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 29

Laboratorio deTecnologías de Información

Important ObservationImportant Observation

Each functional unit can only be used once

per instruction

Each functional unit must be used at the same

stage for all instructions:■

Load uses Register File’s Write Port during its 5th

stage

R-type uses Register File’s Write Port during its 4th

stage

Ifetch Reg/Dec Exec Mem WrLoad1 2 3 4 5

Ifetch Reg/Dec Exec WrR-type1 2 3 4

°

2 ways to solve this pipeline hazard.

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 30

Laboratorio deTecnologías de Información

Solution 1: Insert Solution 1: Insert ““BubbleBubble”” into into the Pipelinethe Pipeline

Insert a “bubble”

into the pipeline to prevent 2 writes at the same cycle■

The control logic can be complex.

Lose instruction fetch and issue opportunity.

No instruction is started in Cycle 6!

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-typeIfetch Reg/Dec Exec WrR-type Pipeline

Bubble

Ifetch Reg/Dec Exec Wr

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 31

Laboratorio deTecnologías de Información

Solution 2: Delay RSolution 2: Delay R--typetype’’s Write s Write by One Cycleby One Cycle

Delay R-type’s register write by one cycle:■

Now R-type instructions also use Reg

File’s write port at Stage 5

Mem

stage is a NOOP

stage: nothing is being done.

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec WrR-type Mem

Exec

Exec

Exec

Exec

1 2 3 4 5

Modified Control & Modified Control & DatapathDatapathIR <- Mem[PC]; PC <– PC+4;

A <- R[rs]; B<– R[rt]

S <– A + B;

R[rd] <– M;

S <– A + SX;

M <– Mem[S]

R[rd] <– M;

S <– A or ZX;

R[rt] <– M;

S <– A + SX;

Mem[S] <- B

if Cond PC < PC+SX;

M <– S

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

SReg

File

Equ

al

PC

Nex

t PC

IR

Inst

. Mem

D

M

M <– S

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 33

Laboratorio deTecnologías de Información

The Four Stages of StoreThe Four Stages of Store

Ifetch: Instruction Fetch■

Fetch the instruction from the Instruction Memory

Reg/Dec: Registers Fetch and Instruction Decode♦

Exec: Calculate the memory address

Mem: Write the data into the Data Memory

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec MemStore Wr

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 34

Laboratorio deTecnologías de InformaciónThe Three Stages of The Three Stages of BeqBeq

Ifetch: Instruction Fetch■

Fetch the instruction from the Instruction Memory

Reg/Dec: ■

Registers Fetch and Instruction Decode

Exec: ■

compares the two register operand,

select correct branch target address■

latch into PC

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec MemBeq Wr

Control DiagramControl DiagramIR <- Mem[PC]; PC < PC+4;

A <- R[rs]; B<– R[rt]

S <– A + B;

R[rd] <– S;

S <– A + SX;

M <– Mem[S]

R[rd] <– M;

S <– A or ZX;

R[rt] <– S;

S <– A + SX;

Mem[S] <- B

If Cond PC < PC+SX;

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

Reg

File

Equ

al

PC

Nex

t PC

IR

Inst

. Mem

M <– S M <– S

A

B

S

D

M

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 36

Laboratorio deTecnologías de Información

DatapathDatapath + Data Stationary + Data Stationary ControlControl

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

SReg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

rs rt

oprsrt

fun

im

exmewbrwv

mewbrwv

wbrwv

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 37

Laboratorio deTecnologías de Información

LetLet’’s Try it Outs Try it Out

10 lw r1, r2(35)14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

these addresses are octal

Start: Fetch 10Start: Fetch 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

SReg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

rs rt im

10 lw r1, r2(35)14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

n n n n

10

Fetch 14, Decode 10Fetch 14, Decode 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

SReg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

2 rt im

10

lw

r1, r2(35)

14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

n n n

14

lwr1

, r2(

35)

Fetch 20, Decode 14, Exec 10Fetch 20, Decode 14, Exec 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r2

B

SReg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

2 rt 35

10

lw

r1, r2(35)

14

addI

r2, r2, 3

20 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

n n

20

lwr1

addI

r2, r

2, 3

Fetch 24, Decode 20, Exec 14, Fetch 24, Decode 20, Exec 14, MemMem 1010

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r2

B

r2+3

5

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

4 5 3

10

lw

r1, r2(35)

14

addI

r2, r2, 3

20

sub

r3, r4, r5

24 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

n

24lw

r1

sub

r3, r

4, r5

addI

r2, r

2, 3

Fetch 30, Fetch 30, DcdDcd 24, Ex 20, 24, Ex 20, MemMem 14, WB 1014, WB 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r4

r5

r2+3

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M[r2

+35]

6 7

10

lw

r1, r2(35)

14

addI

r2, r2, 3

20

sub

r3, r4, r5

24

beq

r6, r7, 100

30 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

30

lwr1

beq

r6, r

7 10

0

addI

r2

sub

r3

Fetch 34, Fetch 34, DcdDcd 30, Ex 24, 30, Ex 24, MemMem 20, WB 1420, WB 14

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r6

r7

r2+3

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

r1=M

[r2+3

5]

9 xx

10

lw

r1, r2(35)

14

addI

r2, r2, 3

20

sub

r3, r4, r5

24

beq

r6, r7, 100

30

ori

r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

34

beq

addI

r2

sub

r3r4

-r5

100

orir

8, r9

17

Fetch 100, Fetch 100, DcdDcd 3434, Ex 30, , Ex 30, MemMem 24, WB 2024, WB 20

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r9

x

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

r1=M[r2+35]

11 12

10

lw

r1, r2(35)

14

addI

r2, r2, 3

20

sub

r3, r4, r5

24

beq

r6, r7, 100

30

ori

r8, r9, 17

34

add

r10, r11, r12

100 and r13, r14, 15

100

beq

r2 = r2+3

sub

r3r4

-r5

17or

ir8

xxx

add

r10,

r11,

r12

ooops, we should have only one delayed instruction

Fetch 104, Fetch 104, DcdDcd 100, 100, Ex 34Ex 34, , MemMem 30, WB 2430, WB 24

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r11

r12

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

r1=M[r2+35]

14 15

10

lw

r1, r2(35)

14

addI

r2, r2, 3

20

sub

r3, r4, r5

24

beq

r6, r7, 100

30

ori

r8, r9, 17

34

add

r10, r11, r12

100 and r13, r14, 15

104

beq

r2 = r2+3r3 = r4-r5

xx

orir

8

xxx

add

r10

and

r13,

r14,

r15

n

Squash the extra instruction in the branch shadow!

r9 |

17

Fetch 108, Fetch 108, DcdDcd 104, Ex 100, 104, Ex 100, MemMem 3434, WB 30, WB 30

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r14

r15

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

r1=M[r2+35]

10

lw

r1, r2(35)

14

addI

r2, r2, 3

20

sub

r3, r4, r5

24

beq

r6, r7, 100

30

ori

r8, r9, 17

34

add

r10, r11, r12

100 and r13, r14, 15

110

r2 = r2+3r3 = r4-r5

xx

orir

8

add

r10

and

r13

n

Squash the extra instruction in the branch shadow!r9

| 17

r11+

r12

Fetch 114, Fetch 114, DcdDcd 110, 110, Ex 104Ex 104, , MemMem 100, 100, WB 34WB 34

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

r1=M[r2+35]

10

lw

r1, r2(35)

14

addI

r2, r2, 3

20

sub

r3, r4, r5

24

beq

r6, r7, 100

30

ori

r8, r9, 17

34

add

r10, r11, r12

100 and r13, r14, 15

114

r2 = r2+3r3 = r4-r5r8 = r9 | 17

add

r10

and

r13

n

Squash the extra instruction in the branch shadow!r1

1+r1

2

NO WBNO Ovflow

r14

& R

15

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 48

Laboratorio deTecnologías de Información

Summary: PipeliningSummary: Pipelining

What makes it easy■

all instructions are the same length

just a few instruction formats■

memory operands appear only in loads and stores

What makes it hard?■

structural hazards: suppose we had only one memory

control hazards: need to worry about branch instructions■

data hazards: an instruction depends on a previous instruction

We’ll build a simple pipeline and look at these issues

We’ll talk about modern processors and what really makes it hard:■

exception handling

trying to improve performance with out-of-order execution, etc.

Laboratorio deTecnologías de Información

Arquitectura de Computadoras Pipelining- 49

Laboratorio deTecnologías de Información

SummarySummary

Pipelining is a fundamental concept■

multiple steps using distinct resources

Utilize capabilities of the Datapath

by pipelined instruction processing■

start next instruction while working on the current one

limited by length of longest stage (plus fill/flush)■

detect and resolve hazards

Recommended