CS 152 Computer Architecture and Engineering
CS 252 Graduate Computer Architecture
Lecture 3 - Pipelining
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
Last Time in Lecture 2
§ Microcoding, an effective technique to manage control unit complexity, invented in era when logic (tubes), main memory (magnetic core), and ROM (diodes) used different technologies
§ Difference between ROM and RAM speed motivated additional complex instructions
§ Technology advances leading to fast SRAM made technology assumptions invalid
§ Complex instruction sets impede parallel and pipelined implementations
§ Load/store, register-rich ISAs (pioneered by Cray, popularized by RISC) perform better in new VLSI technology
“Iron Law” of Processor Performance

Time/Program = (Instructions/Program) * (Cycles/Instruction) * (Time/Cycle)

§ Instructions per program depends on source code, compiler technology, and ISA
§ Cycles per instruction (CPI) depends on ISA and µarchitecture
§ Time per cycle depends upon the µarchitecture and base technology

Microarchitecture           CPI    Cycle time
Microcoded                  >1     short
Single-cycle unpipelined    1      long
Pipelined                   1      short
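The Iron Law can be evaluated directly. The sketch below multiplies the three factors for the three microarchitecture styles in the table. The instruction count, CPI values, and cycle times are hypothetical numbers chosen only to match the table's qualitative entries (">1", "short", "long"); they are not measurements from the lecture.

```python
def execution_time_ns(instructions, cpi, cycle_time_ns):
    """Iron Law: time/program = instructions/program * cycles/instruction * time/cycle."""
    return instructions * cpi * cycle_time_ns


# Hypothetical design points (assumed values, for illustration only).
designs = {
    "microcoded": {"cpi": 7.33, "cycle_time_ns": 1.0},               # CPI > 1, short cycle
    "single-cycle unpipelined": {"cpi": 1.0, "cycle_time_ns": 5.0},  # CPI = 1, long cycle
    "pipelined": {"cpi": 1.0, "cycle_time_ns": 1.0},                 # CPI = 1, short cycle
}

for name, d in designs.items():
    t = execution_time_ns(1_000_000, d["cpi"], d["cycle_time_ns"])
    print(f"{name}: {t / 1e6:.2f} ms for 1,000,000 instructions")
```

Holding the instruction count fixed, the pipelined design wins only because it keeps both of the other two factors small at once.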
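Classic 5-Stage RISC Pipeline
[Figure: datapath divided into Fetch, Decode, EXecute, Memory, and Writeback stages, showing the PC, instruction cache, instruction register, register file, immediate, A and B operand registers, ALU, store data, and data cache.]
This version designed for regfiles/memories with synchronous reads and writes.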
CPI Examples
Microcoded machine: Inst 1, Inst 2, Inst 3 take 7, 5, and 10 cycles
3 instructions, 22 cycles, CPI = 7.33
Unpipelined machine: Inst 1, Inst 2, Inst 3 run back to back, one cycle each
3 instructions, 3 cycles, CPI = 1
Pipelined machine: Inst 1, Inst 2, Inst 3 overlap, one finishing per cycle
3 instructions, 3 cycles, CPI = 1
5-stage pipeline CPI ≠ 5!!!
Instructions interact with each other in pipeline
§ An instruction in the pipeline may need a resource being used by another instruction in the pipeline → structural hazard
§ An instruction may depend on something produced by an earlier instruction
– Dependence may be for a data value → data hazard
– Dependence may be for the next instruction’s address → control hazard (branches, exceptions)
§ Handling hazards generally introduces bubbles into pipeline and raises CPI above the ideal of 1
Pipeline CPI Examples
Measure from when first instruction finishes to when last instruction in sequence finishes.
§ Inst 1, Inst 2, Inst 3 with no bubbles: 3 instructions finish in 3 cycles, CPI = 3/3 = 1
§ Inst 1, Inst 2, Inst 3 with one bubble: 3 instructions finish in 4 cycles, CPI = 4/3 = 1.33
§ Inst 1, Inst 2, Inst 3 with two bubbles: 3 instructions finish in 5 cycles, CPI = 5/3 = 1.67
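The three cases above follow one rule: once the pipeline is full, each cycle retires either an instruction or a bubble, so cycles = instructions + bubbles. A minimal sketch of that arithmetic (the function name `pipeline_cpi` is made up for this example):

```python
def pipeline_cpi(num_instructions, num_bubbles):
    """CPI for a sequence in a full pipeline.

    Each cycle completes one instruction or one bubble, so the sequence
    occupies num_instructions + num_bubbles completion slots.
    """
    if num_instructions <= 0:
        raise ValueError("need at least one instruction")
    cycles = num_instructions + num_bubbles
    return cycles / num_instructions


for bubbles in (0, 1, 2):
    print(f"3 instructions, {bubbles} bubble(s): CPI = {pipeline_cpi(3, bubbles):.2f}")
```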
Resolving Structural Hazards
§ Structural hazard occurs when two instructions need same hardware resource at same time
– Can resolve in hardware by stalling newer instruction till older instruction finished with resource
§ A structural hazard can always be avoided by adding more hardware to design
– E.g., if two instructions both need a port to memory at same time, could avoid hazard by adding second port to memory
§ Classic RISC 5-stage integer pipeline has no structural hazards by design
– Many RISC implementations have structural hazards on multi-cycle units such as multipliers, dividers, floating-point units, etc., and can have on register writeback ports
Types of Data Hazards
Consider executing a sequence of register-register instructions of type:
rk ← ri op rj

Data-dependence
r3 ← r1 op r2    Read-after-Write
r5 ← r3 op r4    (RAW) hazard

Anti-dependence
r3 ← r1 op r2    Write-after-Read
r1 ← r4 op r5    (WAR) hazard

Output-dependence
r3 ← r1 op r2    Write-after-Write
r3 ← r6 op r7    (WAW) hazard
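The three dependence types can be detected by comparing register names between an older and a newer instruction of the form rk ← ri op rj. The sketch below is illustrative only; the `Instr` tuple and `data_hazards` function are names invented for this example.

```python
from typing import NamedTuple, Set


class Instr(NamedTuple):
    """Register-register instruction: dest <- src1 op src2."""
    dest: str
    src1: str
    src2: str


def data_hazards(older: Instr, newer: Instr) -> Set[str]:
    """Return the dependence types that `newer` has on `older`."""
    hazards = set()
    if older.dest in (newer.src1, newer.src2):
        hazards.add("RAW")  # newer reads what older writes
    if newer.dest in (older.src1, older.src2):
        hazards.add("WAR")  # newer writes what older reads
    if newer.dest == older.dest:
        hazards.add("WAW")  # both write the same register
    return hazards


first = Instr("r3", "r1", "r2")
print(data_hazards(first, Instr("r5", "r3", "r4")))  # data-dependence example
print(data_hazards(first, Instr("r1", "r4", "r5")))  # anti-dependence example
print(data_hazards(first, Instr("r3", "r6", "r7")))  # output-dependence example
```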
Three Strategies for Data Hazards
§ Interlock
– Wait for hazard to clear by holding dependent instruction in issue stage
§ Bypass
– Resolve hazard earlier by bypassing value as soon as available
§ Speculate
– Guess on value, correct if wrong
Interlocking Versus Bypassing
add x1, x3, x5
sub x2, x1, x4

Instruction interlocked in decode stage:
add x1, x3, x5    F D X M W
bubble                  X M W
bubble                    X M W
bubble                      X M W
sub x2, x1, x4      F D D D D X M W

Bypass around ALU with no bubbles:
add x1, x3, x5    F D X M W
sub x2, x1, x4      F D X M W
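The bubble counts in the interlock and bypass cases can be derived from stage timing. The sketch below assumes an ALU producer occupying F, D, X, M, W in cycles 1 to 5, a register file whose W-stage write becomes readable by D only in the following cycle (the synchronous-write assumption), and, in the bypassed case, forwarding from the end of the producer's X stage into the consumer's X stage. The function name and the `distance` parameter are made up for this example.

```python
def stall_cycles(distance, bypass):
    """Bubbles between an ALU producer and a dependent consumer.

    distance: how many instructions after the producer the consumer is
              issued (1 = the very next instruction).
    bypass:   True if the ALU result is forwarded to the consumer's X stage.
    """
    if distance < 1:
        raise ValueError("consumer must follow the producer")
    producer = {"F": 1, "D": 2, "X": 3, "M": 4, "W": 5}
    if bypass:
        # Forwarded value is usable in the cycle after the producer's X.
        earliest = producer["X"] + 1
        nominal = producer["X"] + distance  # consumer's X with no stalls
    else:
        # Assumed: register written in W is readable by D one cycle later.
        earliest = producer["W"] + 1
        nominal = producer["D"] + distance  # consumer's D with no stalls
    return max(0, earliest - nominal)


print("interlock, back-to-back:", stall_cycles(1, bypass=False))
print("bypass, back-to-back:", stall_cycles(1, bypass=True))
```

With interlocking only, the penalty shrinks as independent instructions are placed between producer and consumer, which is what a scheduling compiler tries to exploit.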
Example Bypass Path
[Figure: the 5-stage datapath (Fetch, Decode, EXecute, Memory, Writeback) with one example bypass path added.]
Fully Bypassed Data Path
[Figure: the 5-stage datapath (Fetch, Decode, EXecute, Memory, Writeback) with the full set of bypass paths, shown with four consecutive instructions flowing through the pipeline one cycle apart.]
F D X M W
  F D X M W
    F D X M W
      F D X M W
[Assumes data written to registers in a W cycle is readable in parallel D cycle (dotted line). Extra write data register and bypass paths required if this is not possible.]
Value Speculation for RAW Data Hazards
§ Rather than wait for value, can guess value!
§ So far, only effective in certain limited cases:
– Branch prediction
– Stack pointer updates
– Memory address disambiguation
Control Hazards
What do we need to calculate next PC?
§ For Unconditional Jumps
– Opcode, PC, and offset
§ For Jump Register
– Opcode, Register value, and offset
§ For Conditional Branches
– Opcode, Register (for condition), PC and offset
§ For all other instructions
– Opcode and PC (and have to know it’s not one of above)
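The next-PC inputs listed for each instruction class can be written as a single selection function. This is a minimal sketch, assuming fixed 4-byte instructions (the datapath figures show a +4 adder) and ignoring any low-bit masking an actual ISA may apply to jump-register targets. The `kind` strings and parameter names are made up for this example.

```python
def next_pc(kind, pc, offset=0, reg_value=0, cond=False):
    """Select the next PC from the information each instruction class needs."""
    if kind == "jump":            # unconditional PC-relative jump: PC and offset
        return pc + offset
    if kind == "jump_register":   # register value and offset
        return reg_value + offset
    if kind == "branch":          # condition, PC, and offset
        return pc + offset if cond else pc + 4
    if kind == "other":           # just the PC, once we know it is none of the above
        return pc + 4
    raise ValueError(f"unknown instruction kind: {kind}")


print(hex(next_pc("jump", 0x100, offset=0x40)))
print(hex(next_pc("branch", 0x100, offset=-16, cond=False)))
```

Every case depends on the opcode, which is why even the "other" case cannot be resolved before the instruction has been fetched and decoded.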
Control flow information in pipeline
[Figure: the 5-stage datapath annotated with where control flow information becomes available: PC known in Fetch; opcode and offset known in Decode; branch condition and jump register value known in EXecute.]
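RISC-V Unconditional PC-Relative Jumps
[Figure: Fetch, Decode, and EXecute stages of the datapath. An adder in Decode combines PC_decode with the immediate; the Jump? signal drives PCJumpSel to choose between that target and PC_fetch + 4, and sets the FKill kill bit.]
[Kill bit turns instruction into a bubble]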
Pipelining for Unconditional PC-Relative Jumps
j target                  F D X M W
bubble                      F D X M W
target: add x1, x2, x3        F D X M W
Branch Delay Slots
§ Early RISCs adopted idea from pipelined microcode engines, and changed ISA semantics so instruction after branch/jump is always executed before control flow change occurs:
0x100 j target
0x104 add x1, x2, x3 // Executed before target
…
0x205 target: xori x1, x1, 7
§ Software has to fill delay slot with useful work, or fill with explicit NOP instruction

j target                  F D X M W
add x1, x2, x3              F D X M W
target: xori x1, x1, 7        F D X M W
Post-1990 RISC ISAs don’t have delay slots
§ Encodes microarchitectural detail into ISA
– c.f. IBM 650 drum layout
§ Performance issues
– Increased I-cache misses from NOPs in unused delay slots
– I-cache miss on delay slot causes machine to wait, even if delay slot is a NOP
§ Complicates more advanced microarchitectures
– Consider 30-stage pipeline with four-instruction-per-cycle issue
§ Better branch prediction reduced need
– Branch prediction in later lecture
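RISC-V Conditional Branches
[Figure: Fetch, Decode, and EXecute stages of the datapath. Alongside the Decode-stage adder on PC_decode, a second adder in EXecute uses PC_execute, and the Cond? test is evaluated there; the Branch? signal drives PCSel among the targets and PC_fetch + 4, and sets the FKill and DKill kill bits.]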
Pipelining for Conditional Branches
beq x1, x2, target        F D X M W
bubble                      F D X M W
bubble                        F D X M W
target: add x1, x2, x3          F D X M W
Pipelining for Jump Register
§ Register value obtained in execute stage
jr x1                     F D X M W
bubble                      F D X M W
bubble                        F D X M W
target: add x5, x6, x7          F D X M W
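The bubble counts in the jump, branch, and jump-register diagrams follow from the stage in which the next PC is resolved: every instruction fetched behind the control transfer before that stage must be killed. The sketch below encodes that rule and folds it into a CPI estimate. It assumes not-taken branches cost nothing because fetch simply continues at PC + 4, and the instruction-mix frequencies are hypothetical values, not data from the lecture.

```python
STAGES = ["F", "D", "X", "M", "W"]

# Stage in which the redirect is known, per the pipeline diagrams.
RESOLVE_STAGE = {
    "jump": "D",           # opcode and offset known in decode
    "taken_branch": "X",   # branch condition known in execute
    "jump_register": "X",  # register value obtained in execute
}


def bubbles(kind):
    """Instructions killed: one per stage between fetch and the resolving stage."""
    return STAGES.index(RESOLVE_STAGE[kind])


def cpi_with_control_hazards(freq):
    """Ideal CPI of 1 plus the average number of bubbles per instruction."""
    return 1.0 + sum(f * bubbles(kind) for kind, f in freq.items())


# Hypothetical dynamic instruction mix (assumed values, for illustration only).
mix = {"jump": 0.02, "taken_branch": 0.10, "jump_register": 0.01}
print(f"CPI with control hazards: {cpi_with_control_hazards(mix):.2f}")
```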
Why instruction may not be dispatched every cycle in classic 5-stage pipeline (CPI > 1)
§ Full bypassing may be too expensive to implement
– typically all frequently used paths are provided
– some infrequently used bypass paths may increase cycle time and counteract the benefit of reducing CPI
§ Loads have two-cycle latency
– Instruction after load cannot use load result
– MIPS-I ISA defined load delay slots, a software-visible pipeline hazard (compiler schedules independent instruction or inserts NOP to avoid hazard). Removed in MIPS-II (pipeline interlocks added in hardware)
• MIPS: “Microprocessor without Interlocked Pipeline Stages”
§ Jumps/Conditional branches may cause bubbles
– kill following instruction(s) if no delay slots
Machines with software-visible delay slots may execute significant number of NOP instructions inserted by the compiler. NOPs reduce CPI, but increase instructions/program!
CS152 Administrivia
§ PS 1 is posted
§ PS 1 is due at start of class on Monday Feb 5
§ Lab 1 out on Friday
§ Lab 1 overview in Section Friday 2-4pm
– DIS 101 3113 Etcheverry
– DIS 102 310 Soda
CS252 Administrivia
§ CS252 discussions grading policy
– We’ll ignore your two lowest scores in grading, which includes absences
– Send in summary even if you can’t attend discussion
§ CS252 Piazza class has been created
– Sign up for this as well as CS152 Piazza
§ Each CS252 paper has dedicated thread
– Post your response as private note to instructors
– Due 6AM Monday before Monday discussion section
Traps and Interrupts
In class, we’ll use following terminology
§ Exception: An unusual internal event caused by program during execution
– E.g., page fault, arithmetic underflow
§ Trap: Forced transfer of control to supervisor caused by exception
– Not all exceptions cause traps (c.f. IEEE 754 floating-point standard)
§ Interrupt: An external event outside of running program, which causes transfer of control to supervisor
§ Traps and interrupts usually handled by same pipeline mechanism
History of Exception Handling
§ (Analytical Engine had overflow exceptions)
§ First system with traps was Univac-I, 1951
– Arithmetic overflow would either
• 1. trigger the execution of a two-instruction fix-up routine at address 0, or
• 2. at the programmer's option, cause the computer to stop
– Later Univac 1103, 1955, modified to add external interrupts
• Used to gather real-time wind tunnel data
§ First system with I/O interrupts was DYSEAC, 1954
– Had two program counters, and I/O signal caused switch between two PCs
– Also, first system with DMA (direct memory access by I/O device)
– And, first mobile computer (two tractor trailers, 12 tons + 8 tons)
DYSEAC, first mobile computer!
• Carried in two tractor trailers, 12 tons + 8 tons
• Built for US Army Signal Corps
[Courtesy Mark Smotherman]
Asynchronous Interrupts
§ An I/O device requests attention by asserting one of the prioritized interrupt request lines
§ When the processor decides to process the interrupt
– It stops the current program at instruction I_i, completing all the instructions up to I_{i-1} (precise interrupt)
– It saves the PC of instruction I_i in a special register (EPC)
– It disables interrupts and transfers control to a designated interrupt handler running in supervisor mode
Interrupts: altering the normal flow of control
[Figure: program instructions I_{i-1}, I_i, I_{i+1} alongside interrupt handler instructions HI_1, HI_2, ..., HI_n.]
An external or internal event that needs to be processed by another (system) program. The event is usually unexpected or rare from program’s point of view.
Interrupt Handler
§ Saves EPC before enabling interrupts to allow nested interrupts ⇒
– need an instruction to move EPC into GPRs
– need a way to mask further interrupts at least until EPC can be saved
§ Needs to read a status register that indicates the cause of the interrupt
§ Uses a special indirect jump instruction ERET (return-from-environment) which
– enables interrupts
– restores the processor to the user mode
– restores hardware status and control state
Synchronous Trap
§ A synchronous trap is caused by an exception on a particular instruction
§ In general, the instruction cannot be completed and needs to be restarted after the exception has been handled
– requires undoing the effect of one or more partially executed instructions
§ In the case of a system call trap, the instruction is considered to have been completed
– a special jump instruction involving a change to a privileged mode
Exception Handling 5-Stage Pipeline
[Figure: 5-stage pipeline (PC, Inst. Mem, Decode, Execute, Data Mem, Writeback) annotated with exception sources: PC address exception, illegal opcode, overflow, data address exceptions, and asynchronous interrupts.]
§ How to handle multiple simultaneous exceptions in different pipeline stages?
§ How and where to handle external asynchronous interrupts?
Exception Handling 5-Stage Pipeline
[Figure: the same pipeline with exception flags and PCs carried forward in pipeline registers (Exc D / PC D, Exc E / PC E, Exc M / PC M) to the commit point, where the Cause and EPC registers are written. Control signals shown: Select Handler PC, Kill F Stage, Kill D Stage, Kill E Stage, Kill Writeback. Asynchronous interrupts are an additional input.]
Exception Handling 5-Stage Pipeline
§ Hold exception flags in pipeline until commit point (M stage)
§ Exceptions in earlier pipe stages override later exceptions for a given instruction
§ Inject external interrupts at commit point (override others)
§ If exception at commit: update Cause and EPC registers, kill all stages, inject handler PC into fetch stage
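The commit-point rules above can be sketched as a small model. This is an illustration under stated assumptions, not the lecture's hardware: the `InFlight` record, the `commit` function, the cause strings, and the dictionary it returns are all names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InFlight:
    """An instruction moving down the pipeline with its exception flag."""
    pc: int
    exc: Optional[str] = None

    def flag(self, cause: str) -> None:
        # An exception from an earlier stage overrides later ones for the
        # same instruction, so only the first recorded cause is kept.
        if self.exc is None:
            self.exc = cause


def commit(instr: InFlight, pending_interrupt: bool, handler_pc: int):
    """Decide at the commit point (M stage) whether to take a trap."""
    # External interrupts are injected at commit and override other causes.
    cause = "interrupt" if pending_interrupt else instr.exc
    if cause is None:
        return None  # instruction commits normally
    return {
        "cause": cause,            # written to the Cause register
        "epc": instr.pc,           # written to the EPC register
        "fetch_pc": handler_pc,    # handler PC injected into fetch stage
        "kill_all_stages": True,   # all in-flight instructions become bubbles
    }


i = InFlight(pc=0x100)
i.flag("illegal_opcode")  # detected in decode
i.flag("overflow")        # a later stage's exception is ignored
print(commit(i, pending_interrupt=False, handler_pc=0x80))
```

Deferring every decision to a single commit point is what keeps traps precise: nothing younger than the faulting instruction has written architectural state yet.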
Speculating on Exceptions
§ Prediction mechanism
– Exceptions are rare, so simply predicting no exceptions is very accurate!
§ Check prediction mechanism
– Exceptions detected at end of instruction execution pipeline, special hardware for various exception types
§ Recovery mechanism
– Only write architectural state at commit point, so can throw away partially executed instructions after exception
– Launch exception handler after flushing pipeline
§ Bypassing allows use of uncommitted instruction results by following instructions
Acknowledgements
§ This course is partly inspired by previous MIT 6.823 and Berkeley CS252 computer architecture courses created by my collaborators and colleagues:
– Arvind (MIT)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)