74
Lecture 09: RISC-V Pipeline Implementa8on CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan [email protected] www.secs.oakland.edu/~yan 1

Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Embed Size (px)

Citation preview

Page 1: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Lecture09:RISC-VPipelineImplementa8on

CSE564ComputerArchitectureSummer2017

DepartmentofComputerScienceandEngineeringYonghongYan

[email protected]/~yan

1

Page 2: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Acknowledgement

•  SlidesadaptedfromComputerScience152:ComputerArchitectureandEngineering,Spring2016byDr.GeorgeMichelogiannakisfromUCBerkeley

2

Page 3: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Introduc8on

•  CPUperformancefactors–  InstrucNoncount

•  DeterminedbyISAandcompiler–  CPIandCycleNme

•  DeterminedbyCPUhardware

•  ThreegroupsofinstrucNons–  Memoryreference:lw,sw–  ArithmeNc/logical:add,sub,and,or,slt–  Controltransfer:jal,jalr,b*

•  CPI–  Single-cycle,CPI=1–  5stageunpipelined,CPI=5–  5stagepipelined,CPI=1

CPU Time = InstructionsProgram

* CyclesInstruction

*TimeCycle

Page 4: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

AnIdealPipeline

•  Allobjectsgothroughthesamestages•  Nosharingofresourcesbetweenanytwostages•  PropagaNondelaythroughallpipelinestagesisequal•  Theschedulingofanobjectenteringthepipelineisnot

affectedbytheobjectsinotherstages

4

stage 1

stage 2

stage 3

stage 4

Thesecondi+onsgenerallyholdforindustrialassemblylines,butinstruc+onsdependoneachother!

Page 5: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Review:UnpipelinedDatapathforRISC-V

5

0x4

RegWriteEn

AddAdd

clk

WBSelMemWrite

addr

wdata

rdataDataMemory

we

WASel Op2SelImmSelOpCode

clk

clk

addrinst

Inst.Memory

PC rd1

GPRs

rs1rs2

wawd rd2

we

ImmSelect

ALU

ALUControl

PCSelbrrindjabspc+4

Bcomp?BrLogic

Page 6: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Review:HardwiredControlTable

6

Opcode ImmSel Op2Sel FuncSel MemWr RFWen WBSel WASel PCSel

ALU ALUi LW SW BEQtrue

BEQfalse

JAL

JALR

Op2Sel=Reg/Imm WBSel=ALU/Mem/PC PCSel=pc+4/br/rind/jabs

* * *no yes rindPC rd

jabs* * * no

yes PC rdpc+4BrType12 * * no no * *

brBrType12 * * no no * *pc+4BsType12 Imm + yes no * *

pc+4* Reg Func no yes ALU rdIType12 Imm Op pc+4no yes ALU rd

pc+4IType12 Imm + no yes Mem rd

Page 7: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

PipelinedDatapath

7

ClockperiodcanbereducedbydividingtheexecuNonofaninstrucNonintomulNplecycles

tC>max{tIM,tRF,tALU,tDM,tRW}(=tDMprobably)

However,CPIwillincreaseunlessinstruc+onsarepipelined

write-backphase

fetchphase

executephase

decode&Reg-fetchphase

memoryphase

addr

wdata

rdataDataMemory

weALU

ImmSelect

0x4Add

addrrdata

Inst.Memory

rd1

GPRs

rs1rs2

wawdrd2

we

IRPC

Page 8: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

TechnologyAssump8ons

8

Thus,thefollowingNmingassumpNonisreasonable

• Asmallamountofveryfastmemory(caches)backedupbyalarge,slowermemory• FastALU(atleastforintegers)• MulNportedRegisterfiles(slower!)

tIM~=tRF~=tALU~=tDM~=tRW

A5-stagepipelinewillbefocusofourdetaileddesign-somecommercialdesignshaveover30pipeline

stagestodoanintegeradd!

Page 9: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

5-StagePipelinedExecu8on

9

+me t0 t1 t2 t3 t4 t5 t6 t7 ....instrucNon1 IF1 ID1 EX1 MA1 WB1instrucNon2 IF2 ID2 EX2 MA2 WB2instrucNon3 IF3 ID3 EX3 MA3 WB3instrucNon4 IF4 ID4 EX4 MA4 WB4instrucNon5 IF5 ID5 EX5 MA5 WB5

Write-Back(WB)

I-Fetch(IF)

Execute(EX)

Decode,Reg.Fetch(ID)

Memory(MA)

addr

wdata

rdataDataMemory

weALU

ImmSelect

0x4Add

addrrdata

Inst.Memory

rd1

GPRs

rs1rs2wawdrd2

we

IRPC

Page 10: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

5-StagePipelinedExecu8onResourceUsageDiagram

10

+me t0 t1 t2 t3 t4 t5 t6 t7 ....IF I1 I2 I3 I4 I5 ID I1 I2 I3 I4 I5EX I1 I2 I3 I4 I5MA I1 I2 I3 I4 I5WB I1 I2 I3 I4 I5

Resources

Write-Back(WB)

I-Fetch(IF)

Execute(EX)

Decode,Reg.Fetch(ID)

Memory(MA)

addr

wdata

rdataDataMemory

weALU

0x4Add

addrrdata

Inst.Memory

rd1

GPRs

rs1rs2wawdrd2

we

IRPC

ImmSelect

Page 11: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

PipelinedExecu8on:ALUInstruc8ons

11

IRIR IR

PC A

BY

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR

ImmSelect

ALUrd1

GPRs

rs1rs2

wawdrd2

we

wdata

addr

wdata

rdataDataMemory

we

Notquitecorrect!WeneedanInstruc+onReg(IR)foreachstage

Page 12: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

PipelinedRISC-VDatapathwithoutjumps

12

IRIR IR

PC A

BY

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR

ImmSelect

ALUrd1

GPRs

rs1rs2

wawdrd2

we

DataMemorywdata

addr

wdata

rdata

we

ImmSel Op2Sel

WBSelMemWrite

RegWriteEn

F D E M W

ControlPointsNeedtoBeConnected

ALUControl

Page 13: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Instruc8onsinteractwitheachotherinpipeline

•  AninstrucNoninthepipelinemayneedaresourcebeingusedbyanotherinstrucNoninthepipelineàstructuralhazard

•  AninstrucNonmaydependonsomethingproducedbyanearlierinstrucNon–  Dependencemaybeforadatavalue

àdatahazard–  DependencemaybeforthenextinstrucNon’saddress

àcontrolhazard(branches,excep+ons)

13

Page 14: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

ResolvingStructuralHazards

•  StructuralhazardoccurswhentwoinstrucNonsneedsamehardwareresourceatsameNme–  CanresolveinhardwarebystallingnewerinstrucNonNllolder

instrucNonfinishedwithresource•  Astructuralhazardcanalwaysbeavoidedbyaddingmorehardwaretodesign–  E.g.,iftwoinstrucNonsbothneedaporttomemoryatsame

Nme,couldavoidhazardbyaddingsecondporttomemory•  Our5-stagepipelinehasnostructuralhazardsbydesign

–  ThankstoRISC-VISA,whichwasdesignedforpipelining

14

Page 15: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

DataHazards

15

... x1 ← x0 + 10 x4 ← x1 + 17 ...

x1 is stale. Oops!

x1 ← … x4 ← x1 …

IR IR IR

PC A

B

Y

R

MD1 MD2

addr inst

Inst Memory

0x4 Add

IR

Imm Select

ALU rd1

GPRs

rs1 rs2

wa wd rd2

we

wdata

addr

wdata

rdata Data Memory

we

Page 16: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

HowWouldYouResolveThis?

•  ThreeopNons–  Wait(stall)–  Bypass:askthemforwhatyouneedbeforehis/herfinal

deliverable–  Speculateonvaluestoread

16

Page 17: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

ResolvingDataHazards(1)

17

Strategy 1: Wait for the result to be available by freezing earlier pipeline stages è interlocks

Page 18: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

InterlockstoresolveDataHazards

18

IR IR IR

PC A

B

Y

R

MD1 MD2

addr inst

Inst Memory

0x4 Add

IR

Imm Select

ALU rd1

GPRs

rs1 rs2

wa wd rd2

we

wdata

addr

wdata

rdata Data Memory

we

bubble

... x1 ← x0 + 10 x4 ← x1 + 17 ...

Stall Condition

Page 19: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

StalledStagesandPipelineBubbles

19

stalled stages

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

IF I1 I2 I3 I3 I3 I3 I4 I5 ID I1 I2 I2 I2 I2 I3 I4 I5 EX I1 - - - I2 I3 I4 I5 MA I1 - - - I2 I3 I4 I5 WB I1 - - - I2 I3 I4 I5

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

(I1) x1 ← (x0) + 10 IF1 ID1 EX1 MA1 WB1 (I2) x4 ← (x1) + 17 IF2 ID2 ID2 ID2 ID2 EX2 MA2 WB2 (I3) IF3 IF3 IF3 IF3 ID3 EX3 MA3 WB3 (I4) IF4 ID4 EX4 MA4 WB4 (I5) IF5 ID5 EX5 MA5 WB5

Resource Usage

- ⇒ pipeline bubble

Page 20: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

InterlockControlLogic

20

IR IR IR

PC A

B

Y

R

MD1 MD2

addr inst

Inst Memory

0x4 Add

IR

Imm Select

ALU rd1

GPRs

rs1 rs2

wa wd rd2

we

wdata

addr

wdata

rdata Data Memory

we

bubble

Compare the source registers of the instruction in the decode stage with the destination register of the uncommitted instructions.

stall Cstall

ws

rs2 rs1 ?

Page 21: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

InterlockControlLogicignoringjumps&branches

21

Shouldwealwaysstallifanrsfieldmatchessomerd?

IRIR IR

PC A

BY

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR ALUrd1

GPRs

rs1rs2

wawdrd2

we

wdata

addr

wdata

rdataDataMemory

we

bubble

stallCstall

wsW

rs1rs2 ?

weW

re1 re2Cre

wsEweM wsM

Cdest CdestweE

noteveryinstrucNonwritesaregister=>wenoteveryinstrucNonreadsaregister=>re

ImmSelect

we:writeenable,1-biton/offws:writeselect,5-bitregisternumberre:readenable,1-biton/offrs:readselect,5-bitregisternumber

Page 22: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

InRISC-VSodorImplementa8on

22

Page 23: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Source&Des8na8onRegisters

23

ALUI/LW/JALRALU

SW/Bcond

func7 rs2 rs1 func3 rd opcode immediate12 rs1 func3 rd opcode imm rs2 rs1 func3 imm

Jump Offset[19:0]

opcode

rd opcode source(s) des+na+on

ALU rd<=rs1func10rs2 rs1,rs2 rdALUI rd<=rs1opimm rs1 rdLW rd<=M[rs1+imm] rs1 rdSW M[rs1+imm]<=rs2 rs1,rs2 -Bcondrs1,rs2 rs1,rs2 -

true: PC<=PC+imm false: PC<=PC+4

JAL x1<=PC,PC<=PC+imm - rdJALR rd<=PC,PC<=rs1+imm rs1 rd

Page 24: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

DerivingtheStallSignal

24

Cdestws=rd

we=Caseopcode

ALU,ALUi,LW,JALR=>on... =>off

Crere1=Caseopcode

ALU,ALUi, =>on =>off

re2=Caseopcode

=>on ->off

LW,SW,Bcond,JALRJAL

ALU,SW,Bcond...

Cstall stall=((rs1D==wsE)&&weE+ (rs1D==wsM)&&weM+ (rs1D==wsW)&&weW)&&re1D+ ((rs2D==wsE)&&weE+ (rs2D==wsM)&&weM+ (rs2D==wsW)&&weW)&&re2D

Page 25: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

HazardsduetoLoads&Stores

25

...M[x1+7]<=x2x4<=M[x3+5]...

IRIR IR

PC A

BY

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR

ImmSelect

ALUrd1

GPRs

rs1rs2

wawdrd2

we

wdata

addr

wdata

rdataDataMemory

we

bubble

StallCondi+on

Isthereanypossibledatahazardinthisinstruc+onsequence?

Whatifx1+7=x3+5?

Page 26: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Load&StoreHazards

26

However,thehazardisavoidedbecauseourmemorysystemcompleteswritesinonecycle!Load/StorehazardsaresomeNmesresolvedinthepipelineandsomeNmesinthememorysystemitself.Moreonthislaterinthecourse.

...M[x1+7]<=x2x4<=M[x3+5]...

x1+7=x3+5=>datahazard

Page 27: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

ResolvingDataHazards(2)

27

Strategy2:Routedataassoonaspossibleaweritiscalculatedtotheearlierpipelinestageàbypass

Page 28: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Bypassing

28

Eachstallorkillintroducesabubbleinthepipeline =>CPI>1

time t0 t1 t2 t3 t4 t5 t6 t7 . . . . (I1) x1 ← x0 + 10 IF1 ID1 EX1 MA1 WB1 (I2) x4 ← x1 + 17 IF2 ID2 ID2 ID2 ID2 EX2 MA2 WB2 (I3) IF3 IF3 IF3 IF3 ID3 EX3 MA3 (I4) stalled stages IF4 ID4 EX4 (I5) IF5 ID5

time t0 t1 t2 t3 t4 t5 t6 t7 . . . . (I1) x1 ← x0 + 10 IF1 ID1 EX1 MA1 WB1 (I2) x4 ← x1 + 17 IF2 ID2 EX2 MA2 WB2 (I3) IF3 ID3 EX3 MA3 WB3 (I4) IF4 ID4 EX4 MA4 WB4 (I5) IF5 ID5 EX5 MA5 WB5

Anewdatapath,i.e.,abypass,cangetthedatafromtheoutputoftheALUtoitsinput

Page 29: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

HardwareSupportforForwarding

Page 30: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Detec8ngRAWHazards

•  Pass register numbers along pipeline –  ID/EX.RegisterRs = register number for Rs in ID/EX –  ID/EX.RegisterRt = register number for Rt in ID/EX –  ID/EX.RegisterRd = register number for Rd in ID/EX

•  Current instruction being executed in ID/EX register •  Previous instruction is in the EX/MEM register •  Second previous is in the MEM/WB register •  RAW Data hazards when

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

FwdfromEX/MEMpipelinereg

FwdfromMEM/WBpipelinereg

Page 31: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Detec8ngtheNeedtoForward•  But only if forwarding instruction will write to a register!

–  EX/MEM.RegWrite, MEM/WB.RegWrite

•  And only if Rd for that instruction is not R0 –  EX/MEM.RegisterRd ≠ 0 –  MEM/WB.RegisterRd ≠ 0

Page 32: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

ForwardingCondi8ons

•  Detecting RAW hazard with Previous Instruction –  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)

and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 (Forward from EX/MEM pipe stage)

–  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 (Forward from EX/MEM pipe stage)

•  Detecting RAW hazard with Second Previous –  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)

and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10 (Forward from MEM/WB pipe stage)

–  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 (Forward from MEM/WB pipe stage)

Page 33: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

AddingaBypass

33

ASrc

...(I1)x1<=x0+10(I2)x4<=x1+17

x4<=x1... x1<=...

IRIR IR

PC A

BY

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR

ImmSelect

ALUrd1

GPRs

rs1rs2

wawdrd2

we

wdata

addr

wdata

rdataDataMemory

we

bubble

stall

D

E M W

Whendoesthisbypasshelp?x1<=M[x0+10]x4<=x1+17

JAL500x4<=x1+17

yes no no

Page 34: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

TheBypassSignalDerivingitfromtheStallSignal

34

ASrc=(rs1D==wsE)&&weE&&re1D

we=CaseopcodeALU,ALUi,LW,,JALJALR=>on...=>off

NobecauseonlyALUandALUiinstrucNonscanbenefitfromthisbypass

Isthiscorrect?

SplitweEintotwocomponents:we-bypass,we-stall

stall=(((rs1D==wsE)&&weE+(rs1D==wsM)&&weM+(rs1D==wsW)&&weW)&&re1D+((rs2D==wsE)&&weE+(rs2D==wsM)&&weM+(rs2D==wsW)&&weW)&&re2D)

ws=rd

Page 35: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BypassandStallSignals

35

we-bypassE=CaseopcodeEALU,ALUi =>on

... =>off

ASrc=(rs1D==wsE)&&we-bypassE&&re1D

SplitweEintotwocomponents:we-bypass,we-stall

stall=((rs1D==wsE)&&we-stallE+ (rs1D==wsM)&&weM+(rs1D==wsW)&&weW)&&re1D

+((rs2D==wsE)&&weE+(rs2D==wsM)&&weM+(rs2D==wsW)&&weW)&&re2D

we-stallE=CaseopcodeELW,JAL,JALR=>on

JAL =>on... =>off

Page 36: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

FullyBypassedDatapath

36

ASrcIRIR IR

PC A

BY

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR ALU

ImmSelect

rd1

GPRs

rs1rs2

wawdrd2

we

wdata

addr

wdata

rdataDataMemory

we

bubble

stall

D

E M W

PCforJAL,...

BSrc

Istheres+llaneedforthestallsignal?stall=(rs1D==wsE)&&(opcodeE==LWE)&&(wsE!=0)&&re1D

+(rs2D==wsE)&&(opcodeE==LWE)&&(wsE!=0)&&re2D

Page 37: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

ControlHazards

WhatdoweneedtocalculatenextPC?•  ForJumps

–  Opcode,PCandoffset

•  ForJumpRegister–  Opcode,Registervalue,andPC

•  ForCondiNonalBranches–  Opcode,Register(forcondiNon),PCandoffset

•  ForallotherinstrucNons–  OpcodeandPC(andhavetoknowit’snotoneofabove)

37

Page 38: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

PCCalcula8onBubbles

38

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

(I1) x1 ← x0 + 10 IF1 ID1 EX1 MA1 WB1 (I2) x3 ← x2 + 17 IF2 IF2 ID2 EX2 MA2 WB2 (I3) IF3 IF3 ID3 EX3 MA3 WB3 (I4) IF4 IF4 ID4 EX4 MA4 WB4

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

IF I1 - I2 - I3 - I4 ID I1 - I2 - I3 - I4 EX I1 - I2 - I3 - I4 MA I1 - I2 - I3 - I4 WB I1 - I2 - I3 - I4

Resource Usage

- ⇒ pipeline bubble

Page 39: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

SpeculatenextaddressisPC+4

39

A jump instruction kills (not stalls) the following instruction

stall

How?

I2

I1

104

IR IR

PC addr inst

Inst Memory

0x4 Add

bubble

IR

E M Add

Jump?

PCSrc (pc+4 / jabs / rind/ br)

I1 096 ADD I2 100 J 304 I3 104 ADD I4 304 ADD

kill

Page 40: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

PipeliningJumps

40

I2

I1

104

stall

IR IR

PC addr inst

Inst Memory

0x4 Add

bubble

IR

E M Add

Jump?

PCSrc (pc+4 / jabs / rind/ br)

IRSrcD = Case opcodeD JAL ⇒ bubble ... ⇒ IM

To kill a fetched instruction -- Insert a mux before IR

Any interaction between stall and jump?

bubble

IRSrcD

I2 I1

304 bubble

I1 096 ADD I2 100 J 304 I3 104 ADD I4 304 ADD

kill

Page 41: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

JumpPipelineDiagrams

41

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

IF I1 I2 I3 I4 I5 ID I1 I2 - I4 I5 EX I1 I2 - I4 I5 MA I1 I2 - I4 I5 WB I1 I2 - I4 I5

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

(I1) 096: ADD IF1 ID1 EX1 MA1 WB1 (I2) 100: J 304 IF2 ID2 EX2 MA2 WB2 (I3) 104: ADD IF3 - - - - (I4) 304: ADD IF4 ID4 EX4 MA4 WB4

Resource Usage

- ⇒ pipeline bubble

Page 42: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

PipeliningCondi8onalBranches

42

I1 096ADDI2 100BEQx1,x2+200I3 104ADDI4 304ADD

BEQ?

I2

I1

104

stall

IR IR

PC addrinst

InstMemory

0x4Add

bubble

IR

E MAdd

PCSrc(pc+4/jabs/rind/br)

bubble

IRSrcD

BranchcondiNonisnotknownunNltheexecutestage

whatac+onshouldbetakeninthedecodestage?

AYALU

Taken?

Page 43: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

PipeliningCondi8onalBranches

43

I1 096ADDI2 100BEQx1,x2+200I3 104ADDI4 304ADD

stall

IR IR

PC addrinst

InstMemory

0x4Add

bubble

IR

E MAdd

PCSrc(pc+4/jabs/rind/br)

bubble

IRSrcD

AYALU

Taken?

Ifthebranchistaken-killthetwofollowinginstrucNons-theinstrucNonatthedecodestageisnotvalid⇒stallsignalisnotvalid

I2 I1

108I3

Bcond?

?

Page 44: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

PipeliningCondi8onalBranches

44

I1: 096ADDI2: 100BEQx1,x2+200I3: 104ADDI4: 304ADD

stall

IR IR

PC addrinst

InstMemory

0x4Add

bubble

IR

E M

PCSrc(pc+4/jabs/rind/br)

bubble AYALU

Taken?I2 I1

108I3

Bcond?

Jump?

IRSrcD

IRSrcE

Ifthebranchistaken-killthetwofollowinginstrucNons-theinstrucNonatthedecodestageisnotvalid⇒stallsignalisnotvalid

Add

PC

Page 45: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchPipelineDiagrams(resolvedinexecutestage)

45

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

IF I1 I2 I3 I4 I5 ID I1 I2 I3 - I5 EX I1 I2 - - I5 MA I1 I2 - - I5 WB I1 I2 - - I5

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

(I1) 096: ADD IF1 ID1 EX1 MA1 WB1 (I2) 100: BEQ +200 IF2 ID2 EX2 MA2 WB2 (I3) 104: ADD IF3 ID3 - - - (I4) 108: IF4 - - - - (I5) 304: ADD IF5 ID5 EX5 MA5 WB5

Resource Usage

- ⇒ pipeline bubble

Page 46: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

WhatIf…

•  Weusedasimplebranchthatcomparesonlyoneregister(rs1)againstzero

•  Canwedoanybeyer?

46

IR IR IR

PC A

B

Y

R

MD1 MD2

addr inst

Inst Memory

0x4 Add

IR

Imm Select

ALU rd1

GPRs

rs1 rs2

wa wd rd2

we

wdata

addr

wdata

rdata Data Memory

we

Page 47: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Usesimplerbranches(e.g.,onlycompareoneregagainstzero)withcompareindecodestage

47

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

IF I1 I2 I3 I4 I5 ID I1 I2 - I4 I5 EX I1 I2 - I4 I5 MA I1 I2 - I4 I5 WB I1 I2 - I4 I5

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

(I1) 096: ADD IF1 ID1 EX1 MA1 WB1 (I2) 100: BEQZ +200 IF2 ID2 EX2 MA2 WB2 (I3) 104: ADD IF3 - - - - (I4) 300: ADD IF4 ID4 EX4 MA4 WB4

Resource Usage

- ⇒ pipeline bubble

Page 48: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchDelaySlots(exposecontrolhazardtosoeware)

•  ChangetheISAseman8cssothattheinstrucNonthatfollowsajumporbranchisalwaysexecuted–  givescompilertheflexibilitytoputinausefulinstrucNonwherenormally

apipelinebubblewouldhaveresulted.

48

Delayslotinstruc+onexecutedregardlessofbranchoutcome

I1 096 ADD I2 100 BEQZ r1, +200 I3 104 ADD I4 300 ADD

Page 49: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchPipelineDiagrams(branchdelayslot)

49

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

IF I1 I2 I3 I4 ID I1 I2 I3 I4 EX I1 I2 I3 I4 MA I1 I2 I3 I4 WB I1 I2 I3 I4

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .

(I1) 096: ADD IF1 ID1 EX1 MA1 WB1 (I2) 100: BEQZ +200 IF2 ID2 EX2 MA2 WB2 (I3) 104: ADD IF3 ID3 EX3 MA3 WB3 (I4) 300: ADD IF4 ID4 EX4 MA4 WB4

Resource Usage

Page 50: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Post-1990RISCISAsdon’thavedelayslots

•  EncodesmicroarchitecturaldetailintoISA–  C.f.IBM650drumlayout

•  Whataretheproblemswithdelayslots?

•  Performanceissues–  E.g.,I-cachemissorpagefaultondelayslotinstrucNoncauses

machinetowait,evenifdelayslotisaNOP•  Complicatesmoreadvancedmicroarchitectures

–  30-stagepipelinewithfour-instrucNon-per-cycleissue•  Complicatesthecompiler’sjob•  BeyerbranchpredicNonreducedneedfordelayslots

50

Page 51: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

WhyanInstruc8onmaynotbedispatchedeverycycle(CPI>1)

•  Fullbypassingmaybetooexpensivetoimplement–  typicallyallfrequentlyusedpathsareprovided–  someinfrequentlyusedbypasspathsmayincreasecycleNmeandcounteractthebenefitofreducingCPI

•  Loadshavetwo-cyclelatency–  InstrucNonawerloadcannotuseloadresult– MIPS-IISAdefinedloaddelayslots,asowware-visiblepipelinehazard(compilerschedulesindependentinstrucNonorinsertsNOPtoavoidhazard).RemovedinMIPS-II(pipelineinterlocksaddedinhardware)

•  MIPS:“MicroprocessorwithoutInterlockedPipelineStages”•  CondiNonalbranchesmaycausebubbles

–  killfollowinginstrucNon(s)ifnodelayslots

51

Machineswithso^ware-visibledelayslotsmayexecutesignificantnumberofNOPinstruc+onsinsertedbythecompiler.NOPsincreaseinstruc+ons/program!

Page 52: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

RISC-VBranchesandJumps

•  JAL:uncondi8onaljumptoPC+immediate

•  JALR:indirectjumptors1+immediate

•  Branch:if(rs1condsrs2),branchtoPC+immediate

52

Page 53: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

RISC-VBranchesandJumps

53

Instruc<on Takenknown? Targetknown?

JAL

JALRB<cond.>

EachinstrucNonfetchdependsononeortwopiecesofinformaNonfromtheprecedinginstrucNon:

1)IstheprecedinginstrucNonatakenbranch?2)Ifso,whatisthetargetaddress?

•  JAL:uncondi8onaljumptoPC+immediate•  JALR:indirectjumptors1+immediate•  Branch:if(rs1condsrs2),branchtoPC+immediate

AeerInst.Decode

AeerInst.Decode AeerInst.Decode

AeerInst.Decode AeerReg.Fetch

AeerExecute

Page 54: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchPenal8esinModernPipelines

54

A PCGeneraNon/MuxP InstrucNonFetchStage1F InstrucNonFetchStage2B BranchAddressCalc/BeginDecodeI CompleteDecodeJ SteerInstrucNonstoFuncNonalunitsR RegisterFileReadE IntegerExecute

Remainderofexecutepipeline(+another6stages)

UltraSPARC-IIIinstrucNonfetchpipelinestages(in-orderissue,4-waysuperscalar,750MHz,2000)

BranchTargetAddressKnown

BranchDirec+on&JumpRegisterTargetKnown

Page 55: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

ReducingControlFlowPenalty

•  SowwaresoluNons–  Eliminatebranches-loopunrolling

•  Increasestherunlength–  ReduceresoluNonNme-instrucNonscheduling

•  ComputethebranchcondiNonasearlyaspossible(oflimitedvaluebecausebranchesowenincriNcalpaththroughcode)

•  HardwaresoluNons–  Findsomethingelsetodo-delayslots

•  Replacespipelinebubbleswithusefulwork(requiressowwarecooperaNon)

–  Speculate-branchpredicNon•  SpeculaNveexecuNonofinstrucNonsbeyondthebranch

55

Page 56: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchPredic8on

•  Mo+va+on:–  BranchpenalNeslimitperformanceofdeeplypipelined

processors–  Modernbranchpredictorshavehighaccuracy–  (>95%)andcanreducebranchpenalNessignificantly

•  Requiredhardwaresupport:–  Predic+onstructures:

•  Branchhistorytables,branchtargetbuffers,etc.

–  Mispredictrecoverymechanisms:•  Keepresultcomputa+onseparatefromcommit •  KillinstrucNonsfollowingbranchinpipeline•  Restorestatetothatfollowingbranch

56

Page 57: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Sta8cBranchPredic8on

57

Overallprobabilityabranchistakenis~60-70%but:

ISAcanayachpreferreddirecNonsemanNcstobranches,e.g.,MotorolaMC88110

bne0(preferredtaken) beq0(nottaken)

backward90%

forward50%

WhatC++statementdoesthislooklike

WhatC++statementdoesthislooklike

Page 58: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

DynamicBranchPredic8onlearningbasedonpastbehavior

•  TemporalcorrelaNon(Nme)–  IfItellyouthatacertainbranchwastakenlastNme,doesthishelp?

–  ThewayabranchresolvesmaybeagoodpredictorofthewayitwillresolveatthenextexecuNon

•  SpaNalcorrelaNon(space)–  Severalbranchesmayresolveinahighlycorrelatedmanner–  Forinstance,apreferredpathofexecuNon

58

Page 59: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

DynamicBranchPredic8on

•  1-bitpredicNonscheme–  Low-porNonaddressasaddressforaone-bitflagforTakenor

NotTakenhistorically–  Simple

•  2-bitpredicNon–  Misstwicetochange

Page 60: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchPredic8onBits

•  Assume2BPbitsperinstrucNon•  ChangethepredicNonawertwoconsecuNvemistakes!

60

¬takewrong

taken¬taken

taken

taken

taken¬takeright

takeright

takewrong

¬taken

¬taken¬taken

BPstate: (predicttake/¬take)x(lastpredic+onright/wrong)

Page 61: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchHistoryTable

61

4K-entryBHT,2bits/entry,~80-90%correctpredicNons

00FetchPC

Branch? TargetPC

+

I-Cache

Opcode offsetInstruc+on

kBHTIndex

2k-entryBHT,2bits/entry

Taken/¬Taken?

Page 62: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Exploi8ngSpa8alCorrela8onYehandPaC,1992

62

IffirstcondiNonfalse,secondcondiNonalsofalseHistoryregister,H,recordsthedirecNonofthelastNbranchesexecutedbytheprocessor

if (x[i] < 7) then!!y += 1;!

if (x[i] < 5) then!!c -= 4;!

Page 63: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Two-LevelBranchPredictor

63

Pen+umProusestheresultfromthelasttwobranchestoselectoneofthefoursetsofBHTbits(~95%correct)

0 0

kFetchPC

ShiwinTaken/¬Takenresultsofeachbranch

2-bitglobalbranchhistoryshiwregister

Taken/¬Taken?

Page 64: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Specula8ngBothDirec8ons•  AnalternaNvetobranchpredicNonistoexecutebothdirecNonsofabranchspeculaNvely

–  resourcerequirementisproporNonaltothenumberofconcurrentspeculaNveexecuNons

–  onlyhalftheresourcesengageinusefulworkwhenbothdirecNonsofabranchareexecutedspeculaNvely

–  branchpredicNontakeslessresourcesthanspeculaNveexecuNonofbothpaths

•  WithaccuratebranchpredicNon,itismorecosteffecNvetodedicateallresourcestothepredicteddirecNon!–  Whatwouldyouchoosewith80%accuracy?

64

Page 65: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

AreWeMissingSomething?

•  Knowingwhetherabranchistakenornotisgreat,butwhatelsedoweneedtoknowaboutit?

Branchtargetaddress

65

Page 66: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Limita8onsofBHTs

66

OnlypredictsbranchdirecNon.Therefore,cannotredirectfetchstreamunNlawerbranchtargetisdetermined.

UltraSPARC-IIIfetchpipeline

Correctlypredictedtakenbranch

penalty

JumpRegisterpenalty

A PCGeneraNon/MuxP InstrucNonFetchStage1F InstrucNonFetchStage2B BranchAddressCalc/BeginDecodeI CompleteDecodeJ SteerInstrucNonstoFuncNonalunitsR RegisterFileReadE IntegerExecute

Remainderofexecutepipeline(+another6stages)

Page 67: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchTargetBuffer

67

BPbitsarestoredwiththepredictedtargetaddress.IFstage:If(BP=taken)thennPC=targetelsenPC=PC+4Later:checkpredic+on,ifwrongthenkilltheinstruc+onandupdateBTB&BPbelseupdateBPb

IMEM

PC

BranchTargetBuffer(2kentries)

k

BPbpredicted

target BP

target

Page 68: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

AddressCollisions(Mis-Predic8on)

68

WhatwillbefetchedawertheinstrucNonat1028?BTBpredicNon = Correcttarget = =>

Assumea128-entryBTB

BPbtargettake236

1028Add.....

132Jump+104

InstrucNonMemory

2361032

killPC=236andfetchPC=1032

Isthisacommonoccurrence?

Page 69: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BTBisonlyforControlInstruc8ons

•  IsevenbranchpredicNonfastenoughtoavoidbubbles?•  WhendoweindextheBTB?

–  i.e.,whatstateisthebranchin,inordertoavoidbubbles?

•  BTBcontainsusefulinforma8onforbranchandjumpinstruc8onsonly=>Donotupdateitforotherinstruc8ons

•  ForallotherinstrucNonsthenextPCisPC+4!

•  Howtoachievethiseffectwithoutdecodingtheinstruc+on?

69

Page 70: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

BranchTargetBuffer(BTB)

70

• KeepboththebranchPCandtargetPCintheBTB• PC+4isfetchedifmatchfails• OnlytakenbranchesandjumpsheldinBTB• NextPCdeterminedbeforebranchfetchedanddecoded

2k-entry direct-mapped BTB (can also be associative) I-Cache PC

k

Valid

valid

EntryPC

=

match

predicted

target

targetPC

Page 71: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

AreWeMissingSomething?(2)

•  WhendoweupdatetheBTBorBHT?

71

IR IR IR

PC A

B

Y

R

MD1 MD2

addr inst

Inst Memory

0x4 Add

IR

Imm Select

ALU rd1

GPRs

rs1 rs2

wa wd rd2

we

wdata

addr

wdata

rdata Data Memory

we

Page 72: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

CombiningBTBandBHT

•  BTBentriesareconsiderablymoreexpensivethanBHT,butcanredirectfetchesatearlierstageinpipelineandcanaccelerateindirectbranches(JR)

•  BHTcanholdmanymoreentriesandismoreaccurate

72

A PCGeneraNon/MuxP InstrucNonFetchStage1F InstrucNonFetchStage2B BranchAddressCalc/BeginDecodeI CompleteDecodeJ SteerInstrucNonstoFuncNonalunitsR RegisterFileReadE IntegerExecute

BTB

BHTBHTinlaterpipelinestagecorrectswhenBTBmissesapredictedtakenbranch

BTB/BHTonlyupdateda^erbranchresolvesinEstage

Page 73: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

UsesofJumpRegister(JR)

•  Switchstatements(jumptoaddressofmatchingcase)

•  DynamicfuncNoncall(jumptorun-NmefuncNonaddress)

•  SubrouNnereturns(jumptoreturnaddress)

73

HowwelldoesBTBworkforeachofthesecases?

BTBworkswellifsamecaseusedrepeatedly

BTBworkswellifsamefuncNonusuallycalled,(e.g.,inC++programming,whenobjectshavesametypeinvirtualfuncNoncall)

BTBworkswellifusuallyreturntothesameplace⇒O^enonefunc+oncalledfrommanydis+nctcallsites!

Page 74: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15

Subrou8neReturnStack

SmallstructuretoaccelerateJRforsubrouNnereturns,typicallymuchmoreaccuratethanBTBs.

74

&fb() &fc()

Pushcalladdresswhenfunc+oncallexecuted

Popreturnaddresswhensubrou+nereturndecoded

fa() { fb(); } fb() { fc(); } fc() { fd(); }

&fd() kentries(typicallyk=8-16)