Images from Patterson-Hennessy Book

Machines that introduced pipelining and instruction-level parallelism.

Clockwise from top: IBM Stretch, IBM 360/91, and CDC 6600

COMP 740: Computer Architecture and Implementation

Montek Singh

Thu, Feb 12, 2009

Topic: Instruction-Level Parallelism I
(Dynamic Scheduling: Scoreboarding)

Outline
A more complex pipeline, the MIPS R4000
  Look at the effects of memory with longer latency
  Also long floating point instructions
Dynamic scheduling
  Scoreboarding

R4000 Pipeline
From early 90s
  Just before SGI bought MIPS
Superpipelined
  Approx. 2 instructions per cycle
Caches were pipelined
  Which is what most of the book’s discussion is about
R4000: 100 MHz, 1.3M transistors, 2 levels of cache
R4400: up to 250 MHz, larger caches

Block Diagram

Pipeline Diagram
Same logic as before, but now multiple cycles for memory access
Deeper pipeline will lead to more hazards
  More forwarding
  Longer branch delays
[Diagram labels: Decode; Address calculation, branching]

Forwarding, 2 cycle delay

Or a 2 cycle stall
ADD stalled for R1
SUB uses forwarded value, OR from reg

Branch Delay = 3 Cycles

Predicted not Taken
If branch taken, need to stall for 2 cycles beyond delay slot

8 Stages in FP pipeline
Stages are used one or more times, depending on instruction (next)

Some FP Instructions
Note latencies and initiation intervals
Individual stages may result in structural hazards

Structural Hazard Example 1
Units needed at same time highlighted

Structural Hazard Example 2
The shorter ADD instruction clears the pipeline fast so doesn’t stall MUL

Structural Hazard Example 3
Notice how these long instructions can have long-lasting effects

Performance
CPI for base case (1.0), and with stalls
Left 4 programs integer
Cache effects not included
Load stalls: 2 cycles now
Branch stalls now more expensive
FP result is a RAW hazard
Structural not a big problem
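One way to read the CPI breakdown on this slide is as a sum of the base CPI and the average stall cycles per instruction in each category. The symbol names below are illustrative labels chosen for this sketch, not notation from the slides, and cache effects are left out as stated above.

```latex
\mathrm{CPI} = 1.0 + s_{\text{load}} + s_{\text{branch}} + s_{\text{FP result}} + s_{\text{FP structural}}
```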

What Do We Have So Far?
Multiple instructions in flight at one time
If data hazard, no new instructions issue until hazard cleared (stall)
Could minimize stalls by reordering instructions
  static scheduling
    a smart compiler could reorder instructions to minimize stalls
    using a detailed description of the architecture
  dynamic scheduling … next topic
    or, add hardware to do this at run time

Out of Order Execution
With dynamic scheduling, we can do out of order execution
  Execute instructions with no dependencies
  Implies out of order completion
Today discuss one method: scoreboarding
So far, instructions issued in order
  Later we’ll look at out of order issue

Decode Stage
Split the ID stage into 2 stages
  1st = issue stage
    decode and check for structural hazards
  2nd = read operand stage
    wait until operands available, read and proceed

Scoreboarding
Use a new hardware unit called the scoreboard
  hardware data structure
  Keeps track of dependencies, and executes out of order …
  … when operands become available
First used on CDC 6600
  16 functional units

MIPS with Scoreboard
Complex EX stage
Each functional unit has
  2 inputs
  1 output

What is a Scoreboard?
A Scoreboard is a table maintained by the hardware:
  keeps track of instructions being fetched, issued, executed etc.
  keeps track of the resources (functional units and operands) they use/need
  keeps track of which instructions modify which registers
  uses this information to dynamically schedule instructions
very similar to a pen and paper calculation
simple step-by-step procedure easily implemented in hardware

Dynamic Scheduling with a Scoreboard
Original development in CDC 6600
Simplified example in HP4 for MIPS FP operations
  Using neither renaming nor forwarding
    Values always move from registers to function units, and from function units back to registers
  However, write-back of results happens as soon as possible, not in a statically scheduled slot
    Out-of-order completion can give rise to WAR and WAW hazards
    Remember: machine “knows” original program order (needed for hazard detection)
Machine model
  2 FP multipliers (10 cycles), 1 FP adder (2 cycles), 1 FP divider (40 cycles), all non-pipelined
  1 integer unit for everything else (incl. memory references)

New Worry: WAR Hazards
Didn’t exist before, because read occurred early
Example
  DIV.D F0, F2, F4
  ADD.D F10, F0, F8
  SUB.D F8, F8, F14
ADD could easily stall for DIV’s F0
If SUB allowed to execute, then ADD might use wrong value for F8
SUB has a WAR hazard with ADD through register F8!
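The dependences in this three-instruction example can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` record and the `hazards` helper are hypothetical names introduced here, and the check only compares register names between an earlier and a later instruction in program order.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, for display only
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Return the hazard types `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")    # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")    # later overwrites what earlier still reads
    if later.dest == earlier.dest:
        found.add("WAW")    # both write the same register
    return found


div = Instr("DIV.D", "F0", ("F2", "F4"))
add = Instr("ADD.D", "F10", ("F0", "F8"))
sub = Instr("SUB.D", "F8", ("F8", "F14"))

print(hazards(div, add))  # {'RAW'}: ADD.D needs DIV.D's F0
print(hazards(add, sub))  # {'WAR'}: SUB.D writes F8, which ADD.D reads
print(hazards(div, sub))  # set(): independent, so SUB.D may run ahead
```

Because SUB.D is independent of the long-running DIV.D, it can finish while ADD.D is still waiting for F0, which is exactly when the WAR hazard on F8 becomes a problem.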

Scoreboard Implications
Out-of-order completion: WAW, WAR hazards?
  for WAW: stall in Issue until previous write completes
  for WAR: stall in Write Result until previous read completes
Need to have multiple instructions in execution phase
  multiple execution units or pipelined execution units
Scoreboard keeps track of dependences, state of operations
Scoreboard replaces ID, EX, WB with 4 stages

New Stages
The fetch is same, others have changed.
Let’s look at them one by one

Fetch -> Issue -> Read Operands -> EX -> WB

Issue
If
  the required functional unit is available, and
  no other unit is pending a write to same register
Then an instruction is issued
  Moves to “read operands” stage
The register restriction prevents WAW hazards
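The two issue conditions can be written as a single predicate. This is a sketch under assumptions: the function name `can_issue` and the dictionary layout (`busy` per functional unit, `register_result` mapping a register to the unit that will write it) are hypothetical choices for illustration, not the slides' notation.

```python
from typing import Dict, Optional


def can_issue(unit: str, dest: str,
              busy: Dict[str, bool],
              register_result: Dict[str, Optional[str]]) -> bool:
    """Issue test: the unit must be free (no structural hazard) and no
    active instruction may already be due to write `dest` (no WAW hazard)."""
    unit_free = not busy[unit]
    no_pending_write = register_result.get(dest) is None
    return unit_free and no_pending_write


# Hypothetical snapshot: a load into F6 currently occupies the Integer unit.
busy = {"Integer": True, "Mult1": False, "Add": False}
register_result = {"F6": "Integer"}

print(can_issue("Integer", "F2", busy, register_result))  # False: unit is busy
print(can_issue("Mult1", "F0", busy, register_result))    # True
print(can_issue("Add", "F6", busy, register_result))      # False: WAW on F6
```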

Read Operands
By now, the functional unit is assigned
If operands are available, allows functional unit to read operands from register file
This design has no forwarding
  So one extra cycle of latency
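A small sketch of the read-operands test follows. It assumes each functional unit keeps one "ready and not yet read" flag per source operand; the `Operands` class and `try_read` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Operands:
    rj: bool  # first source is ready and has not been read yet
    rk: bool  # second source is ready and has not been read yet


def try_read(ops: Operands) -> bool:
    """Read both operands only when both are ready, then clear the flags."""
    if ops.rj and ops.rk:
        ops.rj = ops.rk = False  # operands have now been read
        return True
    return False


mult = Operands(rj=False, rk=True)  # first source still being produced
print(try_read(mult))  # False: must wait, there is no forwarding
mult.rj = True         # the producer has written its result to the register file
print(try_read(mult))  # True
print(mult)            # Operands(rj=False, rk=False)
```

Because there is no forwarding, the consumer can only read in a cycle after the producer's write, which is the extra cycle of latency mentioned above.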

EX
Has more functional units
Notifies scoreboard when done

Write Result
Prevent WAR hazards
In this case
  DIV.D F0, F2, F4
  ADD.D F10, F0, F8
  SUB.D F8, F8, F14
Will stall the WB of the SUB.D until ADD.D reads F8
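The write-result check can be sketched as a scan over the other active functional units. The names here are assumptions made for illustration: `Unit` and `can_write` are hypothetical, and the example uses two adders (`Add1`, `Add2`) so that ADD.D and SUB.D can be active at once, which differs from the single-adder machine model used elsewhere in these slides.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Unit:
    fj: str   # first source register
    fk: str   # second source register
    rj: bool  # Fj is ready and not yet read
    rk: bool  # Fk is ready and not yet read


def can_write(dest: str, writer: str, units: Dict[str, Unit]) -> bool:
    """A result may be written only if no other active unit still has to
    read the old value of `dest` (otherwise there is a WAR hazard)."""
    for name, u in units.items():
        if name == writer:
            continue
        if (u.fj == dest and u.rj) or (u.fk == dest and u.rk):
            return False
    return True


units = {
    # ADD.D F10,F0,F8: F0 not ready yet (DIV.D pending), F8 ready but unread
    "Add1": Unit(fj="F0", fk="F8", rj=False, rk=True),
    # SUB.D F8,F8,F14: has already read its operands and finished executing
    "Add2": Unit(fj="F8", fk="F14", rj=False, rk=False),
}

print(can_write("F8", "Add2", units))  # False: ADD.D has not read F8 yet
units["Add1"].rk = False               # ADD.D reads its operands
print(can_write("F8", "Add2", units))  # True: SUB.D may now write F8
```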

Components of Scoreboard
Hardware data structure
Look at pieces, one by one
Instructions (in order) listed on top left

Instruction Status
All but last issued (ADD is waiting in Issue stage)
First LD complete
MUL, SUB waiting for register F2 (LD)
DIV waiting for F0 (result of MUL)

Status of Each Functional Unit
Fi is destination; j, k sources
Q lists producers of inputs
R column indicates that input registers are ready, but not yet read (set to No after read)

Register Result
Shows which unit is producing which register
Needed by Issue stage

Later in Execution
LD and SUB (fast ops) have completed
ADD and MUL in process
DIV waiting for MUL to write F0

Almost Done
DIV about ready to write
Most everything complete and pipeline almost flushed

Cost of Extra Performance
Scoreboard hardware
Extra functional units
Extra buses
  Which may result in structural hazard
  Hardware needs to assign buses
Performance depends on
  Amount of parallelism in code sequence
  Window size of the scoreboard
  Size of basic block (i.e., code without branches), … next

Status: Our Pipeline Now
Can execute instructions out of order
Have not discussed out of order issue
  Could extend our scoreboarding to do this
Still, the opportunities within a basic block are limited
  Basic blocks tend to be short
  Would like to issue past branches

Next
We’ll first look at techniques to increase issue potential
  Compiler techniques
Then look at branch prediction
Look at Tomasulo’s algorithm for dynamic scheduling
Begin reading Chapter 2 of HP

Self-Study Material
Summary of scoreboarding algorithm
One long scoreboarding example
Formal logic equations for scoreboarding logic

Four Stages of Scoreboard Control
1. Issue: decode instr. & check for structural hazards (ID1)
   If functional unit is free and no WAW hazard with other active instruction …
     … scoreboard issues the instruction to the functional unit and updates its internal data structure.
   If a structural or WAW hazard exists …
     … instruction issue stalls
       unless there is buffering between fetch and issue, no further instructions can issue until these hazards are cleared.
2. Read operands: wait until no data hazards, then read (ID2)
   A source operand is available if no earlier issued active instruction is going to write it.
   When all source operands are available …
     … scoreboard tells the functional unit to proceed to read the operands from registers and begin execution.
   Thus, scoreboard resolves RAW hazards dynamically in this step
     instructions may be sent into execution out of order

Four Stages of Scoreboard Control (cont.)
3. Execution: operate on operands
   The functional unit begins execution upon receiving operands
   When result is ready, it notifies the scoreboard
4. Write Result: finish execution (WB)
   Once scoreboard is aware that functional unit has completed execution, scoreboard checks for WAR hazards.
   If no WAR hazard …
     … it writes results
   If WAR hazard …
     … it stalls the completing instruction
   Example:
     DIV.D  F0,F2,F4
     ADD.D  F10,F0,F8
     SUB.D  F8,F8,F14
   CDC 6600 scoreboard would stall SUB.D until ADD.D reads ops

Three Parts of the Scoreboard
1. Instruction status: Which of 4 steps instruction is in
2. Functional unit (FU) status: Indicates state of FU
   Nine fields for each functional unit
     Busy: Indicates whether the unit is busy or not
     Op: Operation to perform in the unit (e.g., + or -)
     Fi: Destination register
     Fj, Fk: Source registers
     Qj, Qk: Functional units producing source registers Fj, Fk
     Rj, Rk: Flags indicating when Fj, Fk are ready
3. Register result status: Indicates which functional unit will write each register, if any
   blank when no pending instructions will write that register
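The three parts can be modeled as plain data. The sketch below is illustrative only: the class name `FUStatus`, the dictionaries, and the helper `record_issue` are hypothetical, and the helper shows only the bookkeeping done when an instruction issues (the hazard checks that gate issue are not included). The two calls are meant to mirror the second load and the multiply in the worked example, after which the `Mult1` entry should correspond to the cycle 6 table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """The nine per-unit fields listed above."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing Fj, Fk (None if none pending)
    qk: Optional[str] = None
    rj: bool = False           # Fj, Fk ready and not yet read
    rk: bool = False


UNITS = ("Integer", "Mult1", "Mult2", "Add", "Divide")
fu_status: Dict[str, FUStatus] = {name: FUStatus() for name in UNITS}
register_result: Dict[str, Optional[str]] = {f"F{n}": None for n in range(32)}
instruction_status: Dict[str, Dict[str, Optional[int]]] = {}


def record_issue(label: str, unit: str, op: str, fi: str,
                 fj: Optional[str], fk: Optional[str], clock: int) -> None:
    """Update all three tables when an instruction issues."""
    qj = register_result.get(fj) if fj else None
    qk = register_result.get(fk) if fk else None
    fu_status[unit] = FUStatus(
        busy=True, op=op, fi=fi, fj=fj, fk=fk, qj=qj, qk=qk,
        rj=fj is not None and qj is None,   # ready only if nobody will write it
        rk=fk is not None and qk is None,
    )
    register_result[fi] = unit
    instruction_status[label] = {"issue": clock, "read": None,
                                 "exec": None, "write": None}


record_issue("LD F2", "Integer", "Load", "F2", None, "R3", clock=5)
record_issue("MULT F0", "Mult1", "Mult", "F0", "F2", "F4", clock=6)

print(fu_status["Mult1"])
print(register_result["F0"], register_result["F2"])  # Mult1 Integer
```

Keeping `register_result` as a separate table makes both the WAW check at issue and the lookup of `Qj`/`Qk` a single dictionary access.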

Scoreboard Example Cycle 0

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2
LD    F2      45+  R3
MULT  F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    No
Mult2    No
Add      No
Divide   No

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
0      FU

Scoreboard Example Cycle 1

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1
LD    F2      45+  R3
MULT  F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6        R2                          Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
1      FU                      Integer

First LD issues

Scoreboard Example Cycle 2

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2
LD    F2      45+  R3
MULT  F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6        R2                          No
Mult1    No
Mult2    No
Add      No
Divide   No

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
2      FU                      Integer

Structural hazard on Integer unit; second LD stalls in IF stage

Scoreboard Example Cycle 3

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3
LD    F2      45+  R3
MULT  F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6        R2                          No
Mult1    No
Mult2    No
Add      No
Divide   No

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
3      FU                      Integer

Second LD is still stalled

Scoreboard Example Cycle 4

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3
MULT  F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6        R2                          No
Mult1    No
Mult2    No
Add      No
Divide   No

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
4      FU

Second LD still stalled; first LD done

Scoreboard Example Cycle 5

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5
MULT  F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2        R3                          Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
5      FU         Integer

Second LD issues as the structural hazard on Integer unit has cleared

Scoreboard Example Cycle 6

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6
MULT  F0      F2   F4   6
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2        R3                          No
Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
Mult2    No
Add      No
Divide   No

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
6      FU  Mult1  Integer

MULT issues

Scoreboard Example Cycle 7

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7
MULT  F0      F2   F4   6
SUBD  F8      F6   F2   7
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2        R3                          No
Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
Mult2    No
Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
Divide   No

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
7      FU  Mult1  Integer               Add

SUBD issues; MULT stalled on LD

Scoreboard Example Cycle 8a

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7
MULT  F0      F2   F4   6
SUBD  F8      F6   F2   7
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2        R3                          No
Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
Mult2    No
Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
8      FU  Mult1  Integer               Add  Divide

DIVD issues; SUBD stalled on LD

Scoreboard Example Cycle 8b

Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6
SUBD  F8      F6   F2   7
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2

Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
Mult2    No
Add      Yes   Sub   F8   F6   F2                     Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register Result Status
CLOCK      F0     F2       F4  F6       F8   F10     F12  …  F31
8      FU  Mult1                        Add  Divide

LD writes F2; MULT and SUBD enabled

54

Scoreboard Example Cycle 9

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Sub    F8    F6    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
9       FU    Mult1   -         -     -     Add   Divide   -         -

MULT and SUBD read operands and enter execution

55

Scoreboard Example Cycle 10

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Sub    F8    F6    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
10      FU    Mult1   -         -     -     Add   Divide   -         -

Structural hazard on Add unit stalls the final ADDD

56

Scoreboard Example Cycle 11

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9               11
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Sub    F8    F6    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
11      FU    Mult1   -         -     -     Add   Divide   -         -

SUBD completes execution; MULT is still in execution

57

Scoreboard Example Cycle 12

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
12      FU    Mult1   -         -     -     -     Divide   -         -

SUBD writes its result; Add unit is free; structural hazard is resolved

58

Scoreboard Example Cycle 13

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2    13

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         Yes   Yes
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
13      FU    Mult1   -         -     Add   -     Divide   -         -

ADDD issues; note WAR hazard on F6 between DIVD and ADDD

59

Scoreboard Example Cycle 14

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2    13      14

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
14      FU    Mult1   -         -     Add   -     Divide   -         -

MULT still executing; DIVD stalled on F0 (RAW hazard)

60

Scoreboard Example Cycle 15

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2    13      14

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
15      FU    Mult1   -         -     Add   -     Divide   -         -

MULT still executing

61

Scoreboard Example Cycle 16

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2    13      14              16

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
16      FU    Mult1   -         -     Add   -     Divide   -         -

ADDD completes execution, ready to write result into F6

62

Scoreboard Example Cycle 17

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2    13      14              16

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
17      FU    Mult1   -         -     Add   -     Divide   -         -

WAR hazard: ADDD stalls in Write Result stage
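
To see why ADDD has to wait here, the following sketch replays DIVD and ADDD on a tiny register file in both orders. The register values are made up for illustration; only the opcodes and operand names come from the example.

```python
# Hypothetical register values; only the operand names come from the example.
initial = {"F0": 12.0, "F2": 1.0, "F6": 4.0, "F8": 5.0, "F10": 0.0}

def divd(regs):
    """DIVD F10, F0, F6: reads F0 and F6, writes F10."""
    regs["F10"] = regs["F0"] / regs["F6"]

def addd(regs):
    """ADDD F6, F8, F2: reads F8 and F2, writes F6."""
    regs["F6"] = regs["F8"] + regs["F2"]

# Program order: DIVD reads the old F6 before ADDD overwrites it.
in_order = dict(initial)
divd(in_order)
addd(in_order)

# What would happen if ADDD wrote F6 before DIVD had read its operands.
war_violated = dict(initial)
addd(war_violated)
divd(war_violated)

print(in_order["F10"])      # 3.0  (12 / 4, the old F6)
print(war_violated["F10"])  # 2.0  (12 / 6, the overwritten F6)
```

Both orders leave the same value in F6, but DIVD produces a different F10 if ADDD writes first, which is why the scoreboard holds ADDD in Write Result until DIVD has read F6.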

63

Scoreboard Example Cycle 18

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2    13      14              16

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
18      FU    Mult1   -         -     Add   -     Divide   -         -

DIVD stalled (RAW hazard on F0), ADDD stalled (WAR hazard on F6)

64

Scoreboard Example Cycle 19

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9               19
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2    13      14              16

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     Yes    Mult   F0    F2    F4    -         -         No    No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         No    Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
19      FU    Mult1   -         -     Add   -     Divide   -         -

MULT completes execution

65

Scoreboard Example Cycle 20

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9               19                   20
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8
ADDD  F6      F8    F2    13      14              16

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    Mult1     -         Yes   Yes

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
20      FU    -       -         -     Add   -     Divide   -         -

MULT writes result; DIVD can proceed to read operands at next cycle

66

Scoreboard Example Cycle 21

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9               19                   20
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8       21
ADDD  F6      F8    F2    13      14              16

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     No
Mult2     No
Add       Yes    Add    F6    F8    F2    -         -         No    No
Divide    Yes    Div    F10   F0    F6    -         -         No    No

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
21      FU    -       -         -     Add   -     Divide   -         -

DIVD reads operands; WAR hazard on F6 is resolved

67

Scoreboard Example Cycle 22

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9               19                   20
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8       21
ADDD  F6      F8    F2    13      14              16                   22

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     No
Mult2     No
Add       No
Divide    Yes    Div    F10   F0    F6    -         -         No    No

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
22      FU    -       -         -     -     -     Divide   -         -

40-cycle divide!

ADDD completes writing of result

68

Scoreboard Example Cycle 61

Instruction status
Instruction   j     k     Issue   Read operands   Execution complete   Write result
LD    F6      34+   R2    1       2               3                    4
LD    F2      45+   R3    5       6               7                    8
MULT  F0      F2    F4    6       9               19                   20
SUBD  F8      F6    F2    7       9               11                   12
DIVD  F10     F0    F6    8       21              61
ADDD  F6      F8    F2    13      14              16                   22

Functional unit status
Name      Busy   Op     Fi    Fj    Fk    Qj        Qk        Rj    Rk
Integer   No
Mult1     No
Mult2     No
Add       No
Divide    Yes    Div    F10   F0    F6    -         -         No    No

Register result status
Clock         F0      F2        F4    F6    F8    F10      F12   …   F31
61      FU    -       -         -     -     -     Divide   -         -

DIVD completes execution; ready to write result
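
The whole walkthrough can be cross-checked with a small timestamp-based simulation. This is a sketch under stated assumptions, not the CDC 6600 logic: the latencies are inferred from the instruction-status table (Execution Complete minus Read Operands), MULT is always sent to Mult1, an instruction advances at most one stage per cycle, and every condition looks only at events from earlier cycles. The names `simulate`, `before` and `in_flight` are illustrative.

```python
# Latencies inferred from the table in this example (assumption).
LATENCY = {"LD": 1, "MULT": 10, "SUBD": 2, "ADDD": 2, "DIVD": 40}
UNIT = {"LD": "Integer", "MULT": "Mult1", "SUBD": "Add",
        "ADDD": "Add", "DIVD": "Divide"}

# (opcode, destination, source registers)
PROGRAM = [
    ("LD", "F6", ["R2"]),
    ("LD", "F2", ["R3"]),
    ("MULT", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]

def simulate(program):
    """Return one {issue, read, exec, write} cycle record per instruction."""
    times = [dict(issue=None, read=None, exec=None, write=None) for _ in program]

    def before(i, stage, t):
        # True if instruction i passed `stage` in a cycle earlier than t.
        return times[i][stage] is not None and times[i][stage] < t

    def in_flight(i, t):
        return before(i, "issue", t) and not before(i, "write", t)

    t = 0
    while any(rec["write"] is None for rec in times):
        t += 1
        for i, (op, dest, srcs) in enumerate(program):
            rec = times[i]
            others = [j for j in range(len(program)) if j != i]
            if rec["issue"] is None:
                in_order = i == 0 or before(i - 1, "issue", t)
                unit_busy = any(in_flight(j, t) and UNIT[program[j][0]] == UNIT[op]
                                for j in others)   # structural hazard
                waw = any(in_flight(j, t) and program[j][1] == dest for j in others)
                if in_order and not unit_busy and not waw:
                    rec["issue"] = t
            elif rec["read"] is None:
                # RAW: every earlier producer of a source must have written.
                raw = any(program[j][1] in srcs and not before(j, "write", t)
                          for j in range(i))
                if before(i, "issue", t) and not raw:
                    rec["read"] = t
                    rec["exec"] = t + LATENCY[op]
            elif rec["write"] is None:
                # WAR: earlier instructions must already have read the old dest.
                war = any(dest in program[j][2] and not before(j, "read", t)
                          for j in range(i))
                if before(i, "exec", t) and not war:
                    rec["write"] = t
    return times

for (op, dest, srcs), rec in zip(PROGRAM, simulate(PROGRAM)):
    print(f"{op:5}{dest:4}", rec["issue"], rec["read"], rec["exec"], rec["write"])
```

Under these assumptions the issue, read, execute and write cycles agree with every entry shown in the instruction-status table above, including the ADDD write at cycle 22 that waits for DIVD to read F6.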

69

Scoreboard Summary

CDC designers measured a performance improvement of 1.7 for compiled FORTRAN code, 2.5 for assembly
    No pipeline scheduling in software
    Slow memory (no cache)

Limitations of 6600 scoreboard
    No forwarding
    Limited to instructions in basic block (small issue window)
    Number of functional units (structural hazards)
    Wait for WAR hazards
    Prevent WAW hazards

70

Scoreboard: Bookkeeping Actions

Instruction status: Issue
    Wait until:  not Busy[FU] and not Result[D]
    Bookkeeping: Busy[FU]←yes; Op[FU]←op; Fi[FU]←D; Fj[FU]←S1; Fk[FU]←S2; Qj[FU]←Result[S1]; Qk[FU]←Result[S2]; Rj←not Qj; Rk←not Qk; Result[D]←FU

Instruction status: Read Operands
    Wait until:  Rj and Rk
    Bookkeeping: Rj←No; Rk←No; Qj←0; Qk←0

Instruction status: Execution Complete
    Wait until:  Functional unit done
    Bookkeeping: (none)

Instruction status: Write Result
    Wait until:  ∀f((Fj[f]≠Fi[FU] or Rj[f]=No) & (Fk[f]≠Fi[FU] or Rk[f]=No))
    Bookkeeping: ∀f(if Qj[f]=FU then Rj[f]←Yes); ∀f(if Qk[f]=FU then Rk[f]←Yes); Result[Fi[FU]]←0; Busy[FU]←No
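
The bookkeeping rules can be written down almost line for line. The sketch below keeps the functional-unit status and register-result status in plain dicts and replays the issue of the second LD, MULT, SUBD and DIVD, followed by the LD writing F2. It is only an illustration: the function names are invented here, `None` stands for the table's 0 (no producing unit), and an absent operand (the load's j field) is treated as ready.

```python
def new_unit():
    return dict(busy=False, op=None, Fi=None, Fj=None, Fk=None,
                Qj=None, Qk=None, Rj=False, Rk=False)

units = {n: new_unit() for n in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result = {}  # register name -> unit that will write it (None or absent = nobody)

def can_issue(fu, dest):
    # Wait until: not Busy[FU] and not Result[D]
    return not units[fu]["busy"] and result.get(dest) is None

def issue(fu, op, dest, s1, s2):
    u = units[fu]
    u.update(busy=True, op=op, Fi=dest, Fj=s1, Fk=s2,
             Qj=result.get(s1), Qk=result.get(s2))
    u["Rj"] = u["Qj"] is None   # Rj <- not Qj
    u["Rk"] = u["Qk"] is None   # Rk <- not Qk
    result[dest] = fu

def can_read(fu):
    # Wait until: Rj and Rk
    return units[fu]["Rj"] and units[fu]["Rk"]

def read_operands(fu):
    units[fu].update(Rj=False, Rk=False, Qj=None, Qk=None)

def can_write(fu):
    # Wait until no unit still has to read the old value of Fi[FU] (WAR check).
    dest = units[fu]["Fi"]
    return all((f["Fj"] != dest or not f["Rj"]) and
               (f["Fk"] != dest or not f["Rk"])
               for f in units.values())

def write_result(fu):
    for f in units.values():
        if f["Qj"] == fu:
            f["Rj"] = True
        if f["Qk"] == fu:
            f["Rk"] = True
    result[units[fu]["Fi"]] = None
    units[fu]["busy"] = False

# Replay of cycles 5 to 8 of the example (execution latency is not modelled).
issue("Integer", "Load", "F2", None, "R3")   # cycle 5: LD F2
read_operands("Integer")                     # cycle 6
issue("Mult1", "Mult", "F0", "F2", "F4")     # cycle 6: waits on Integer for F2
issue("Add", "Sub", "F8", "F6", "F2")        # cycle 7: waits on Integer for F2
issue("Divide", "Div", "F10", "F0", "F6")    # cycle 8: waits on Mult1 for F0
print(units["Mult1"]["Qj"], units["Mult1"]["Rj"], units["Mult1"]["Rk"])
if can_write("Integer"):
    write_result("Integer")                  # cycle 8: LD writes F2
print(can_read("Mult1"), can_read("Add"), can_read("Divide"))
print(can_issue("Add", "F6"))                # ADDD cannot issue: Add unit is busy
```

After the load writes, Mult1 and Add have both operands ready while Divide still waits on Mult1, which matches the Rj/Rk columns of the cycle 8b table; the final check shows the structural hazard that keeps ADDD from issuing while SUBD holds the Add unit.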