Images from Patterson-Hennessy Book
Machines that introduced pipelining and instruction-level parallelism.
Clockwise from top: IBM Stretch, IBM 360/91, and CDC 6600
COMP 740: Computer Architecture and Implementation
Montek Singh
Thu, Feb 12, 2009
Topic: Instruction-Level Parallelism I
(Dynamic Scheduling: Scoreboarding)
Outline
- A more complex pipeline, the MIPS R4000
  - Look at the effects of memory with longer latency
  - Also long floating-point instructions
- Dynamic scheduling
  - Scoreboarding
R4000 Pipeline
- From the early 1990s, just before SGI bought MIPS
- Superpipelined
  - Approx. 2 instructions per cycle
  - Caches were pipelined, which is what most of the book's discussion is about
- R4000 – 100 MHz, 1.3M transistors, 2 levels of cache
- R4400 – up to 250 MHz, larger caches
Block Diagram
Pipeline Diagram
- Same logic as before, but now multiple cycles for memory access
- A deeper pipeline leads to more hazards
  - More forwarding
  - Longer branch delays
Diagram annotations: Decode; Address calculation, branching
Forwarding, 2-Cycle Delay
Or a 2-Cycle Stall
- ADD stalled for R1
- SUB uses the forwarded value; OR reads R1 from the register file
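A minimal sketch of the stall arithmetic above, assuming full forwarding so that a consumer must be separated from the load by at least 3 cycles (the 2-cycle load delay plus one). The function name `load_use_stalls` is hypothetical; `uses_load[i]` says whether the (i+1)-th instruction after the load reads the loaded register. Stalls accumulate, which is why SUB needs none once ADD has already waited.

```python
from typing import List


def load_use_stalls(uses_load: List[bool], load_delay: int = 2) -> List[int]:
    """Stall cycles for each instruction following a load.

    uses_load[i] is True if the (i+1)-th instruction after the load reads
    the loaded register. Assumes full forwarding: a consumer needs at least
    load_delay + 1 cycles of separation from the load.
    """
    stalls = []
    accumulated = 0                      # stalls already inserted after the load
    for distance, uses in enumerate(uses_load, start=1):
        separation = distance + accumulated
        stall = max(0, load_delay + 1 - separation) if uses else 0
        stalls.append(stall)
        accumulated += stall
    return stalls


# LD R1 followed by ADD, SUB and OR that all read R1 (the example above)
print(load_use_stalls([True, True, True]))   # [2, 0, 0]
```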
Branch Delay = 3 Cycles
Predicted Not Taken
- If the branch is taken, the pipeline must stall for 2 cycles beyond the delay slot
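As a rough worked example, under predict-not-taken the average cost per branch beyond the delay slot is the fraction of taken branches times the 2-cycle penalty. The helper below is a hypothetical sketch, and the 50% taken rate in the usage line is an assumed figure, not a measurement.

```python
def avg_branch_stall(taken_fraction: float, taken_penalty: int = 2) -> float:
    """Average stall cycles per branch beyond the delay slot, predict-not-taken.

    Untaken branches cost nothing extra; taken branches stall taken_penalty cycles.
    """
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must be between 0 and 1")
    return taken_fraction * taken_penalty


print(avg_branch_stall(0.5))   # 1.0 with an assumed 50% taken rate
```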
8 Stages in FP Pipeline
- Stages are used one or more times, depending on the instruction (next slide)
Some FP Instructions
- Note latencies and initiation intervals
- Individual stages may result in structural hazards
Structural Hazard Example 1
- Units needed at the same time are highlighted
Structural Hazard Example 2
- The shorter ADD instruction clears the pipeline quickly, so it doesn't stall the MUL
Structural Hazard Example 3
- Notice how these long instructions can have long-lasting effects
Performance
- CPI for the base case (1.0) and with stalls
- Left 4 programs are integer
- Cache effects not included
- Load stalls – 2 cycles now
- Branch stalls now more expensive
- FP result stalls are RAW hazards
- Structural hazards are not a big problem
What Do We Have So Far?
- Multiple instructions in flight at one time
- If there is a data hazard, no new instructions issue until the hazard is cleared (stall)
- Could minimize stalls by reordering instructions
  - Static scheduling: a smart compiler could reorder instructions to minimize stalls, using a detailed description of the architecture
  - Dynamic scheduling (next topic): or, add hardware to do this at run time
Out-of-Order Execution
- With dynamic scheduling, we can do out-of-order execution
  - Execute instructions as soon as their dependences are resolved, even if earlier instructions are stalled
  - Implies out-of-order completion
- Today we discuss one method: scoreboarding
- So far, instructions are issued in order
  - Later we'll look at out-of-order issue
Decode Stage
- Split the ID stage into 2 stages
  - 1st = issue stage: decode and check for structural hazards
  - 2nd = read operand stage: wait until operands are available, then read them and proceed
Scoreboarding
- Use a new hardware unit called the scoreboard
  - A hardware data structure
  - Keeps track of dependences and lets instructions execute out of order as soon as their operands become available
- First used in the CDC 6600
  - 16 functional units
MIPS with Scoreboard
- Complex EX stage
- Each functional unit has 2 inputs and 1 output
What Is a Scoreboard?
- A scoreboard is a table maintained by the hardware that:
  - keeps track of instructions being fetched, issued, executed, etc.
  - keeps track of the resources (functional units and operands) they use or need
  - keeps track of which instructions modify which registers
- It uses this information to dynamically schedule instructions
  - Very similar to a pen-and-paper calculation
  - A simple step-by-step procedure, easily implemented in hardware
Dynamic Scheduling with a Scoreboard
- Original development in the CDC 6600
- Simplified example in HP4 for MIPS FP operations
- Uses neither renaming nor forwarding
  - Values always move from registers to functional units, and from functional units back to registers
  - However, results are written back as soon as possible, not in a statically scheduled slot
  - Out-of-order completion can give rise to WAR and WAW hazards
- Remember: the machine "knows" the original program order (needed for hazard detection)
- Machine model
  - 2 FP multipliers (10 cycles), 1 FP adder (2 cycles), 1 FP divider (40 cycles), all non-pipelined
  - 1 integer unit for everything else (incl. memory references)
New Worry: WAR Hazards
- Didn't exist before, because reads occurred early
- Example:
DIV.D F0, F2, F4DIV.D F0, F2, F4
ADD.D F10, F0, F8ADD.D F10, F0, F8
SUB.D F8, F8, F14SUB.D F8, F8, F14
- ADD could easily stall waiting for DIV's F0
- If SUB were allowed to execute (and write F8) first, ADD might use the wrong value for F8
  - SUB has a WAR hazard with ADD through register F8!
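The hazards in this example can be found mechanically by comparing register names between each pair of instructions in program order. A minimal sketch; the `Instr` record and the `hazards` function are hypothetical helpers, not part of any tool.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> List[str]:
    """Potential hazards between two instructions in program order."""
    found = []
    if earlier.dest in later.srcs:
        found.append(f"RAW on {earlier.dest}")   # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.append(f"WAR on {later.dest}")     # later overwrites what earlier reads
    if later.dest == earlier.dest:
        found.append(f"WAW on {later.dest}")     # both write the same register
    return found


prog = [
    Instr("DIV.D", "F0", ("F2", "F4")),
    Instr("ADD.D", "F10", ("F0", "F8")),
    Instr("SUB.D", "F8", ("F8", "F14")),
]
for i, a in enumerate(prog):
    for b in prog[i + 1:]:
        for h in hazards(a, b):
            print(f"{a.op} -> {b.op}: {h}")
```

This prints the RAW hazard from DIV.D to ADD.D on F0 and the WAR hazard from ADD.D to SUB.D on F8.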
Scoreboard Implications
- Out-of-order completion: WAW and WAR hazards?
  - For WAW: stall in Issue until the previous write completes
  - For WAR: stall in Write Result until the previous read completes
- Need to have multiple instructions in the execution phase
  - Multiple execution units or pipelined execution units
- Scoreboard keeps track of dependences and the state of operations
- Scoreboard replaces ID, EX, WB with 4 stages
New Stages
- Fetch is the same; the others have changed
- Let's look at them one by one
Fetch → Issue → Read Operands → EX → WB
Issue
- If the required functional unit is available, and no other unit has a pending write to the same register,
- then the instruction is issued
  - It moves to the "read operands" stage
- The register restriction prevents WAW hazards
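The Issue condition amounts to two table lookups. A minimal sketch, assuming the FU status is reduced to a `busy` map and the register result status to a dictionary from register to producing unit (both hypothetical representations). The state used is the one from the scoreboard walkthrough at cycle 10, when SUBD occupies the Add unit.

```python
def can_issue(fu: str, dest: str, busy: dict, result: dict) -> bool:
    """Issue test: the functional unit is free (no structural hazard) and no
    active instruction is already going to write dest (no WAW hazard)."""
    return not busy[fu] and dest not in result


busy = {"Integer": False, "Mult1": True, "Mult2": False, "Add": True, "Divide": True}
result = {"F0": "Mult1", "F8": "Add", "F10": "Divide"}   # register result status
print(can_issue("Add", "F6", busy, result))    # False: structural hazard on Add
# SUBD writes its result, freeing the Add unit and clearing F8.
busy["Add"] = False
del result["F8"]
print(can_issue("Add", "F6", busy, result))    # True: ADDD can issue next cycle
# A hypothetical second multiply writing F0 would hit a WAW hazard.
print(can_issue("Mult2", "F0", busy, result))  # False
```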
Read Operands
- By now, the functional unit is assigned
- If the operands are available, the scoreboard allows the functional unit to read them from the register file
- This design has no forwarding
  - So one extra cycle of latency
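Read Operands can be sketched the same way: the unit waits on its Rj/Rk flags and clears them once the operands are read. A sketch on a hypothetical dictionary-based FU row; the Mult1 values follow MULT F0,F2,F4 from the worked example, where F2 is still being produced by the Integer unit.

```python
def try_read_operands(fu: dict) -> bool:
    """Read Operands stage for one functional-unit row.

    Proceeds only when both source flags Rj and Rk are set (no pending RAW
    hazard); the operands are then read from the register file, so the
    flags and producer tags are cleared.
    """
    if not (fu["Rj"] and fu["Rk"]):
        return False               # wait: a source is still being produced
    fu["Rj"] = fu["Rk"] = False    # operands consumed
    fu["Qj"] = fu["Qk"] = None
    return True


mult1 = {"Fj": "F2", "Fk": "F4", "Qj": "Integer", "Qk": None, "Rj": False, "Rk": True}
print(try_read_operands(mult1))   # False: F2 not yet written
# When the load writes F2, the scoreboard sets Rj for units waiting on Integer.
if mult1["Qj"] == "Integer":
    mult1["Rj"] = True
print(try_read_operands(mult1))   # True
```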
EX
- Has more functional units
- Notifies the scoreboard when done
Write Result
- Prevents WAR hazards
- In this case:
DIV.D F0, F2, F4DIV.D F0, F2, F4
ADD.D F10, F0, F8ADD.D F10, F0, F8
SUB.D F8, F8, F14SUB.D F8, F8, F14
- Will stall the write-back (WB) of SUB.D until ADD.D reads F8
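The WAR test at Write Result checks every functional unit for a pending read of the register about to be written. A minimal sketch with a hypothetical trimmed `Row` record (only the Fj, Fk, Rj, Rk fields), applied to the DIV.D / ADD.D / SUB.D example above; it ignores which unit SUB.D itself occupies.

```python
from typing import NamedTuple, Optional, Sequence


class Row(NamedTuple):
    """Source fields of one functional-unit status row (trimmed, hypothetical)."""
    fj: Optional[str]
    fk: Optional[str]
    rj: bool   # Fj ready but not yet read
    rk: bool   # Fk ready but not yet read


def can_write(dest: str, rows: Sequence[Row]) -> bool:
    """Write Result test: no unit still needs to read the old value of dest."""
    return all((r.fj != dest or not r.rj) and (r.fk != dest or not r.rk)
               for r in rows)


# ADD.D F10,F0,F8 is waiting for F0 from DIV.D, so it has not read F8 yet.
add_row = Row(fj="F0", fk="F8", rj=False, rk=True)
print(can_write("F8", [add_row]))                      # False: SUB.D must wait
print(can_write("F8", [add_row._replace(rk=False)]))   # True after ADD.D reads its operands
```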
Components of Scoreboard
- Hardware data structure
- Look at the pieces one by one
- Instructions (in order) listed at top left
Instruction Status
- All but the last instruction issued (ADD is waiting in the Issue stage)
- First LD complete
- MUL, SUB waiting for register F2 (LD)
- DIV waiting for F0 (result of MUL)
Status of Each Functional Unit
- Fi is the destination; Fj, Fk are the sources
- Q lists the producers of the inputs
- R column indicates that input registers are ready but not yet read (set to No after the read)
Register Result
- Shows which unit is producing which register
- Needed by the Issue stage
Later in Execution
- LD and SUB (fast ops) have completed
- ADD and MUL in process
- DIV waiting for MUL to write F0
Almost Done
- DIV about ready to write
- Almost everything is complete and the pipeline is almost flushed
Cost of Extra Performance
- Scoreboard hardware
- Extra functional units
- Extra buses
  - Which may result in structural hazards
  - Hardware needs to assign buses
- Performance depends on
  - Amount of parallelism in the code sequence
  - Window size of the scoreboard
  - Size of the basic block (i.e., code without branches); more on this next
Status – Our Pipeline Now
- Can execute instructions out of order
- Have not discussed out-of-order issue
  - Could extend our scoreboarding to do this
- Still, the opportunities within a basic block are limited
  - Basic blocks tend to be short
  - Would like to issue past branches
Next
- We'll first look at techniques to increase issue potential
  - Compiler techniques
- Then look at branch prediction
- Look at Tomasulo's algorithm for dynamic scheduling
- Begin reading Chapter 2 of HP
Self-Study Material
- Summary of the scoreboarding algorithm
- One long scoreboarding example
- Formal logic equations for scoreboarding logic
Four Stages of Scoreboard Control
1. Issue: decode instruction and check for structural hazards (ID1)
   - If the functional unit is free and there is no WAW hazard with another active instruction, the scoreboard issues the instruction to the functional unit and updates its internal data structure.
   - If a structural or WAW hazard exists, instruction issue stalls; unless there is buffering between fetch and issue, no further instructions can issue until these hazards are cleared.
2. Read operands: wait until there are no data hazards, then read operands (ID2)
   - A source operand is available if no earlier-issued active instruction is going to write it.
   - When all source operands are available, the scoreboard tells the functional unit to read the operands from the registers and begin execution.
   - Thus, the scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.
Four Stages of Scoreboard Control (cont.)
3. Execution: operate on operands
   - The functional unit begins execution upon receiving operands.
   - When the result is ready, it notifies the scoreboard.
4. Write Result: finish execution (WB)
   - Once the scoreboard is aware that the functional unit has completed execution, it checks for WAR hazards.
   - If there is no WAR hazard, it writes the result.
   - If there is a WAR hazard, it stalls the completing instruction.
   - Example:
     DIV.D F0,F2,F4
     ADD.D F10,F0,F8
     SUB.D F8,F8,F14
   - The CDC 6600 scoreboard would stall SUB.D in Write Result until ADD.D reads its operands.
Three Parts of the Scoreboard
1. Instruction status: which of the 4 steps the instruction is in
2. Functional unit (FU) status: indicates the state of the FU; nine fields for each functional unit:
   - Busy: indicates whether the unit is busy or not
   - Op: operation to perform in the unit (e.g., + or -)
   - Fi: destination register
   - Fj, Fk: source registers
   - Qj, Qk: functional units producing source registers Fj, Fk
   - Rj, Rk: flags indicating when Fj, Fk are ready
3. Register result status: indicates which functional unit will write each register, if any
   - Blank when no pending instruction will write that register
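The FU status table maps naturally onto a record with the nine fields, and the register result status onto a dictionary. The sketch below (hypothetical `FUStatus` and `issue` names) shows the bookkeeping done when an instruction issues, assuming the Issue hazard checks have already passed. The example reproduces the Mult1 row when MULT F0,F2,F4 issues while LD F2 is still in the Integer unit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table (the nine fields)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing Fj, Fk (None = not pending)
    qk: Optional[str] = None
    rj: bool = False           # Fj, Fk ready (and not yet read)
    rk: bool = False


def issue(unit: str, fu: FUStatus, op: str, dest: str, s1: str, s2: str,
          result: Dict[str, str]) -> None:
    """Issue bookkeeping; assumes the structural and WAW checks already passed."""
    fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, dest, s1, s2
    fu.qj, fu.qk = result.get(s1), result.get(s2)   # producers, if any
    fu.rj, fu.rk = fu.qj is None, fu.qk is None     # ready iff nothing pending
    result[dest] = unit                             # claim the destination


# Register result status just before MULT F0,F2,F4 issues: LD F2 is in flight.
result = {"F2": "Integer"}
mult1 = FUStatus()
issue("Mult1", mult1, "Mult", "F0", "F2", "F4", result)
print(mult1)
print(result)
```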
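Scoreboard Example Cycle 0
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | - | - | - | -
LD F2 | 45+ | R3 | - | - | - | -
MULT F0 | F2 | F4 | - | - | - | -
SUBD F8 | F6 | F2 | - | - | - | -
DIVD F10 | F0 | F6 | - | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | No | - | - | - | - | - | - | - | -
Register result status (clock 0): no pending writes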
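Scoreboard Example Cycle 1
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | - | - | -
LD F2 | 45+ | R3 | - | - | - | -
MULT F0 | F2 | F4 | - | - | - | -
SUBD F8 | F6 | F2 | - | - | - | -
DIVD F10 | F0 | F6 | - | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | Yes | Load | F6 | - | R2 | - | - | - | Yes
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | No | - | - | - | - | - | - | - | -
Register result status (clock 1): F6 = Integer
First LD issues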
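Scoreboard Example Cycle 2
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | - | -
LD F2 | 45+ | R3 | - | - | - | -
MULT F0 | F2 | F4 | - | - | - | -
SUBD F8 | F6 | F2 | - | - | - | -
DIVD F10 | F0 | F6 | - | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | Yes | Load | F6 | - | R2 | - | - | - | No
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | No | - | - | - | - | - | - | - | -
Register result status (clock 2): F6 = Integer
Structural hazard on the Integer unit; the second LD stalls and cannot issue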
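Scoreboard Example Cycle 3
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | -
LD F2 | 45+ | R3 | - | - | - | -
MULT F0 | F2 | F4 | - | - | - | -
SUBD F8 | F6 | F2 | - | - | - | -
DIVD F10 | F0 | F6 | - | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | Yes | Load | F6 | - | R2 | - | - | - | No
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | No | - | - | - | - | - | - | - | -
Register result status (clock 3): F6 = Integer
Second LD is still stalled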
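Scoreboard Example Cycle 4
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | - | - | - | -
MULT F0 | F2 | F4 | - | - | - | -
SUBD F8 | F6 | F2 | - | - | - | -
DIVD F10 | F0 | F6 | - | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | No | - | - | - | - | - | - | - | -
Register result status (clock 4): no pending writes
Second LD still stalled; first LD done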
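Scoreboard Example Cycle 5
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | - | - | -
MULT F0 | F2 | F4 | - | - | - | -
SUBD F8 | F6 | F2 | - | - | - | -
DIVD F10 | F0 | F6 | - | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | Yes | Load | F2 | - | R3 | - | - | - | Yes
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | No | - | - | - | - | - | - | - | -
Register result status (clock 5): F2 = Integer
Second LD issues, as the structural hazard on the Integer unit has cleared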
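Scoreboard Example Cycle 6
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | - | -
MULT F0 | F2 | F4 | 6 | - | - | -
SUBD F8 | F6 | F2 | - | - | - | -
DIVD F10 | F0 | F6 | - | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | Yes | Load | F2 | - | R3 | - | - | - | No
Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | - | No | Yes
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | No | - | - | - | - | - | - | - | -
Register result status (clock 6): F0 = Mult1, F2 = Integer
MULT issues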
Scoreboard Example Cycle 7
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | -
MULT F0 | F2 | F4 | 6 | - | - | -
SUBD F8 | F6 | F2 | 7 | - | - | -
DIVD F10 | F0 | F6 | - | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | Yes | Load | F2 | - | R3 | - | - | - | No
Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | - | No | Yes
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Sub | F8 | F6 | F2 | - | Integer | Yes | No
Divide | No | - | - | - | - | - | - | - | -
Register result status (clock 7): F0 = Mult1, F2 = Integer, F8 = Add
SUBD issues; MULT stalled on LD
Scoreboard Example Cycle 8a
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | -
MULT F0 | F2 | F4 | 6 | - | - | -
SUBD F8 | F6 | F2 | 7 | - | - | -
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | Yes | Load | F2 | - | R3 | - | - | - | No
Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | - | No | Yes
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Sub | F8 | F6 | F2 | - | Integer | Yes | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 8): F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide
DIVD issues; SUBD stalled on LD
Scoreboard Example Cycle 8b
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | - | - | -
SUBD F8 | F6 | F2 | 7 | - | - | -
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | Yes | Yes
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Sub | F8 | F6 | F2 | - | - | Yes | Yes
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 8): F0 = Mult1, F8 = Add, F10 = Divide
LD writes F2; MULT and SUBD enabled
Scoreboard Example Cycle 9
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | - | -
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Sub | F8 | F6 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 9): F0 = Mult1, F8 = Add, F10 = Divide
MULT and SUBD read operands and enter execution
Scoreboard Example Cycle 10
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | - | -
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Sub | F8 | F6 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 10): F0 = Mult1, F8 = Add, F10 = Divide
Structural hazard on the Add unit stalls the final ADDD
Scoreboard Example Cycle 11
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | -
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Sub | F8 | F6 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 11): F0 = Mult1, F8 = Add, F10 = Divide
SUBD completes execution; MULT is still executing
Scoreboard Example Cycle 12
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | - | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 12): F0 = Mult1, F10 = Divide
SUBD writes its result; the Add unit is free, so the structural hazard resolves
Scoreboard Example Cycle 13
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | 13 | - | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | Yes | Yes
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 13): F0 = Mult1, F6 = Add, F10 = Divide
Note the WAR hazard between DIVD and ADDD on F6
Scoreboard Example Cycle 14
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | 13 | 14 | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 14): F0 = Mult1, F6 = Add, F10 = Divide
MULT still executing; DIVD stalled on F0 (RAW hazard)
Scoreboard Example Cycle 15
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | 13 | 14 | - | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 15): F0 = Mult1, F6 = Add, F10 = Divide
MULT still executing
Scoreboard Example Cycle 16
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | 13 | 14 | 16 | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 16): F0 = Mult1, F6 = Add, F10 = Divide
ADDD completes execution, ready to write its result into F6
Scoreboard Example Cycle 17
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | 13 | 14 | 16 | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 17): F0 = Mult1, F6 = Add, F10 = Divide
WAR hazard: ADDD stalls in the Write Result stage
Scoreboard Example Cycle 18
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | - | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | 13 | 14 | 16 | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 18): F0 = Mult1, F6 = Add, F10 = Divide
DIVD stalled (RAW hazard on F0), ADDD stalled (WAR hazard on F6)
Scoreboard Example Cycle 19
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | 19 | -
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | 13 | 14 | 16 | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | Yes | Mult | F0 | F2 | F4 | - | - | No | No
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | No | Yes
Register result status (clock 19): F0 = Mult1, F6 = Add, F10 = Divide
MULT completes execution
Scoreboard Example Cycle 20
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | 19 | 20
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | - | - | -
ADDD F6 | F8 | F2 | 13 | 14 | 16 | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | Mult1 | - | Yes | Yes
Register result status (clock 20): F6 = Add, F10 = Divide
MULT writes its result; DIVD can read its operands in the next cycle
Scoreboard Example Cycle 21
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | 19 | 20
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | 21 | - | -
ADDD F6 | F8 | F2 | 13 | 14 | 16 | -
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | Yes | Add | F6 | F8 | F2 | - | - | No | No
Divide | Yes | Div | F10 | F0 | F6 | - | - | No | No
Register result status (clock 21): F6 = Add, F10 = Divide
DIVD reads operands; the WAR hazard on F6 is resolved
Scoreboard Example Cycle 22
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | 19 | 20
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | 21 | - | -
ADDD F6 | F8 | F2 | 13 | 14 | 16 | 22
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | Yes | Div | F10 | F0 | F6 | - | - | No | No
Register result status (clock 22): F10 = Divide
40-cycle divide!
ADDD writes its result
Scoreboard Example Cycle 61
Instruction status:
Instruction | j | k | Issue | Read Operands | Execution Complete | Write Result
LD F6 | 34+ | R2 | 1 | 2 | 3 | 4
LD F2 | 45+ | R3 | 5 | 6 | 7 | 8
MULT F0 | F2 | F4 | 6 | 9 | 19 | 20
SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12
DIVD F10 | F0 | F6 | 8 | 21 | 61 | -
ADDD F6 | F8 | F2 | 13 | 14 | 16 | 22
Functional unit status:
Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk
Integer | No | - | - | - | - | - | - | - | -
Mult1 | No | - | - | - | - | - | - | - | -
Mult2 | No | - | - | - | - | - | - | - | -
Add | No | - | - | - | - | - | - | - | -
Divide | Yes | Div | F10 | F0 | F6 | - | - | No | No
Register result status (clock 61): F10 = Divide
DIVD completes execution; ready to write its result
Scoreboard Summary
- CDC designers measured a performance improvement of 1.7 for compiled FORTRAN code and 2.5 for assembly
  - No pipeline scheduling in software
  - Slow memory (no cache)
- Limitations of the 6600 scoreboard
  - No forwarding
  - Limited to instructions in a basic block (small issue window)
  - Number of functional units (structural hazards)
  - Waits for WAR hazards
  - Prevents WAW hazards
Scoreboard: Bookkeeping Actions
Instruction status | Wait until | Bookkeeping
Issue | Not Busy[FU] and not Result[D] | Busy[FU]←yes; Op[FU]←op; Fi[FU]←D; Fj[FU]←S1; Fk[FU]←S2; Qj[FU]←Result[S1]; Qk[FU]←Result[S2]; Rj←not Qj; Rk←not Qk; Result[D]←FU
Read Operands | Rj and Rk | Rj←No; Rk←No; Qj←0; Qk←0
Execution Complete | Functional unit done | -
Write Result | ∀f((Fj[f]≠Fi[FU] or Rj[f]=No) & (Fk[f]≠Fi[FU] or Rk[f]=No)) | ∀f(if Qj[f]=FU then Rj[f]←Yes); ∀f(if Qk[f]=FU then Rk[f]←Yes); Result[Fi[FU]]←0; Busy[FU]←No
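To check the bookkeeping end to end, here is a small cycle-by-cycle simulator for the example sequence. It is a sketch under stated assumptions, not the CDC 6600 logic: one instruction issues per cycle, in order; every action in a cycle is decided from the state at the start of that cycle (so a freed unit, a written register, or a completed read becomes visible only in the next cycle); latencies follow the machine model (integer 1, add 2, multiply 10, divide 40); load offsets are ignored; and there are no bus or write-port limits. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

LATENCY = {"int": 1, "add": 2, "mult": 10, "div": 40}   # cycles, per the machine model
UNIT_TYPE = {"Integer": "int", "Mult1": "mult", "Mult2": "mult",
             "Add": "add", "Divide": "div"}
OP_TYPE = {"LD": "int", "MULTD": "mult", "SUBD": "add", "ADDD": "add", "DIVD": "div"}

Instr = Tuple[str, str, Optional[str], Optional[str]]     # op, dest, src1, src2


@dataclass
class FU:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None     # destination register
    fj: Optional[str] = None     # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None     # units producing Fj / Fk (None = nothing pending)
    qk: Optional[str] = None
    rj: bool = False             # Fj / Fk ready but not yet read
    rk: bool = False
    instr: int = -1              # index of the occupying instruction
    done: Optional[int] = None   # cycle in which execution completes


def simulate(program: List[Instr], max_cycles: int = 200) -> List[List[Optional[int]]]:
    fus: Dict[str, FU] = {name: FU() for name in UNIT_TYPE}
    result: Dict[str, str] = {}                  # register result status
    times = [[None] * 4 for _ in program]        # issue, read, complete, write
    pc = 0
    for c in range(1, max_cycles + 1):
        # Decide every action from the state at the start of cycle c.
        writes = [u for u, f in fus.items()
                  if f.busy and f.done is not None and f.done < c
                  and all((g.fj != f.fi or not g.rj) and (g.fk != f.fi or not g.rk)
                          for g in fus.values())]              # no WAR hazard
        reads = [u for u, f in fus.items()
                 if f.busy and f.done is None and times[f.instr][0] < c
                 and f.rj and f.rk]                           # no RAW hazard
        unit = None
        if pc < len(program):
            op, dest = program[pc][0], program[pc][1]
            free = [u for u in fus if UNIT_TYPE[u] == OP_TYPE[op] and not fus[u].busy]
            if free and dest not in result:                   # no structural/WAW hazard
                unit = free[0]
        # Apply them: Issue, Read Operands, then Write Result.
        if unit is not None:
            op, dest, s1, s2 = program[pc]
            qj, qk = result.get(s1), result.get(s2)
            fus[unit] = FU(True, op, dest, s1, s2, qj, qk, qj is None, qk is None, pc)
            result[dest] = unit
            times[pc][0] = c
            pc += 1
        for u in reads:
            f = fus[u]
            f.rj = f.rk = False
            f.qj = f.qk = None
            f.done = c + LATENCY[UNIT_TYPE[u]]
            times[f.instr][1], times[f.instr][2] = c, f.done
        for u in writes:
            f = fus[u]
            for g in fus.values():               # wake up units waiting on u
                if g.qj == u:
                    g.rj = True
                if g.qk == u:
                    g.rk = True
            del result[f.fi]
            times[f.instr][3] = c
            fus[u] = FU()                        # Busy[FU] <- No
        if all(t[3] is not None for t in times):
            break
    return times


PROGRAM: List[Instr] = [
    ("LD", "F6", None, "R2"),        # LD F6, 34+R2 (offset not modelled)
    ("LD", "F2", None, "R3"),        # LD F2, 45+R3
    ("MULTD", "F0", "F2", "F4"),
    ("SUBD", "F8", "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6", "F8", "F2"),
]

for (op, dest, _, _), t in zip(PROGRAM, simulate(PROGRAM)):
    print(f"{op:5} {dest:3}", *t)
```

Under these assumptions the printed issue / read / complete / write cycles match the walkthrough, with DIVD writing its result in cycle 62. Deciding all actions from the start-of-cycle state is what produces the one-cycle gaps seen in the tables, such as MULT writing F0 in cycle 20 and DIVD reading it in cycle 21.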