CS/EE 5810 CS/EE 6810 F00: 1 Tomasulo Dynamic Scheduling

Tomasulo Dynamic Scheduling

Dynamic Issue

• In IBM 360/91 about 3 years after CDC 6600 (1966)

• Goal: High Performance without special compilers

• Things to remember about the 60’s:

– No caches, no RISC, very few registers, no precise exceptions

• Differences between IBM 360 & CDC 6600 ISA

– IBM has only 2 register specifiers/instr vs. 3 in CDC 6600

– IBM has 4 FP registers vs. 8 in CDC 6600

• Why Study? Led to Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, …

Dynamic Issue

Goal: take advantage of multiple function units and deal with long memory latencies

• Advantages:

– Speed

• Problems: multiple execution latencies

– Result is out of order completion

– Forwarding and hazard control become more difficult

– Precise exceptions would later amplify the problem (non-issue in the ’60s)

• Answer: HW to issue instructions when hazards clear

Dynamic Issue

• Hazards = data, structural, control

– Data: RAW (true data dependence), WAR (anti-dependence), WAW (output dependence)

– Structural: Are the required resources available?

– Control: Is this instruction supposed to execute or not?

• Implementation – 2 early approaches

– Control flow – CDC 6600 (scoreboard) (1964)

– Data flow – Tomasulo, IBM 360/91 (1967)

» Simple idea – when opcode and operands are ready, and the appropriate set of resources is ready, launch the “execution packet”

» Interesting wrinkle – does not use named registers for intermediate storage

» Implicit introduction of Register Renaming
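
The three data hazard classes above can be checked mechanically from register names. The following is a minimal sketch, not part of the original slides: the `Instr` tuple and the `hazards` helper are hypothetical names, and the instructions are taken from the example program used later in this deck.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Data hazards the later instruction `second` has on the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # second reads what first writes (true dependence)
    if second.dest in first.srcs:
        found.add("WAR")   # second overwrites a source of first (anti-dependence)
    if second.dest == first.dest:
        found.add("WAW")   # both write the same register (output dependence)
    return found


multd = Instr("MULTD", "F0", ("F2", "F4"))
divd = Instr("DIVD", "F10", ("F0", "F6"))
addd = Instr("ADDD", "F6", ("F8", "F2"))
ld_f6 = Instr("LD", "F6", ("R2",))

print(hazards(multd, divd))  # {'RAW'}: DIVD needs F0 from MULTD
print(hazards(divd, addd))   # {'WAR'}: ADDD overwrites F6, which DIVD reads
print(hazards(ld_f6, addd))  # {'WAW'}: both write F6
```

Only the RAW case is a real flow of data; the WAR and WAW cases come from reusing a register name, which is why renaming can remove them.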

Tomasulo vs. Scoreboard

• Control & buffers distributed with Function Units (FU) vs. centralized in scoreboard;

– FU buffers called “reservation stations”; have pending operands

• Registers in instructions replaced by values or pointers to reservation stations (RS); called register renaming;

– avoids WAR, WAW hazards

– More reservation stations than registers, so can do optimizations compilers can’t

• Results to FU from RS, not through registers, over Common Data Bus that broadcasts results to all FUs

• Loads and Stores treated as FUs with RSs as well

• Integer instructions can go past branches, allowing FP ops beyond basic block in FP queue
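
To see why replacing register names by values or tags removes WAR and WAW hazards, here is a small sketch. It is an illustration under assumptions, not the 360/91 hardware: `RegisterFile`, `read`, `claim`, and `broadcast` are hypothetical names, and the numeric values are made up.

```python
class RegisterFile:
    def __init__(self, values):
        self.values = dict(values)  # architectural register contents
        self.pending = {}           # register -> tag of the latest pending writer

    def read(self, reg):
        """At issue: return a tag if a write is pending, else the value itself."""
        if reg in self.pending:
            return ("tag", self.pending[reg])
        return ("value", self.values[reg])

    def claim(self, reg, tag):
        """At issue: the station named `tag` becomes the writer of `reg`."""
        self.pending[reg] = tag

    def broadcast(self, tag, value):
        """At write result: only the register still waiting on `tag` is updated."""
        for reg, waiting_on in list(self.pending.items()):
            if waiting_on == tag:
                self.values[reg] = value
                del self.pending[reg]


# WAR: DIVD F10,F0,F6 issues, then ADDD F6,F8,F2 issues and finishes first.
regs = RegisterFile({"F0": 0.0, "F6": 6.0})
regs.claim("F0", "Mult1")
divd_operands = (regs.read("F0"), regs.read("F6"))
regs.claim("F6", "Add2")
regs.broadcast("Add2", 99.0)
print(divd_operands)  # (('tag', 'Mult1'), ('value', 6.0)): old F6 was captured

# WAW: two pending writers of F6; the older one finishes last and is ignored.
regs2 = RegisterFile({"F6": 6.0})
regs2.claim("F6", "Load1")
regs2.claim("F6", "Add2")
regs2.broadcast("Add2", 2.0)
regs2.broadcast("Load1", 1.0)
print(regs2.values["F6"])  # 2.0
```

The consumer copies the old value at issue, so a later writer cannot disturb it, and a register only accepts the result of its most recent claimant.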

Tomasulo Organization

[Figure: block diagram showing the FP Op Queue, FP Registers, Load Buffer, Store Buffer, FP Add Reservation Stations, FP Mul Reservation Stations, and the Common Data Bus]

Reservation Station Duties

• Snarf sources off CDB when they appear

– CDB results are tagged with where they came from

• When all operands are present, enable the associated FU to execute

• Since values aren’t really written to registers (until later): no WAR or WAW hazards are possible

• Structural hazards checked at two points

– At dispatch – a free reservation station of the right type must be available

– When execution packet is ready – multiple reservation stations may compete for a shared FU

» Program order used as basis for arbitration if required
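
The two structural hazard checks can be sketched as two small functions. This is a minimal illustration: the dictionary layout, the `kind` and `seq` fields, and the function names are assumptions made for the example, with `seq` standing for the position of the instruction in program order.

```python
def allocate(stations, kind):
    """Dispatch-time check: return a free station of the right type, or None (stall)."""
    for station in stations:
        if station["kind"] == kind and not station["busy"]:
            return station
    return None


def arbitrate(ready_stations):
    """Ready stations competing for one FU: the oldest instruction wins."""
    return min(ready_stations, key=lambda s: s["seq"], default=None)


stations = [
    {"name": "Add1", "kind": "add", "busy": True, "seq": 4},
    {"name": "Add2", "kind": "add", "busy": True, "seq": 6},
    {"name": "Add3", "kind": "add", "busy": False, "seq": None},
    {"name": "Mult1", "kind": "mult", "busy": True, "seq": 3},
    {"name": "Mult2", "kind": "mult", "busy": True, "seq": 5},
]

print(allocate(stations, "add")["name"])                 # Add3
print(allocate(stations, "mult"))                        # None: issue must stall
print(arbitrate([stations[1], stations[0]])["name"])     # Add1 (older than Add2)
```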

Virtual Registers

• Tag field associated with data

• Tag field is a virtual register ID

• Corresponds to reservation station and load buffer names

• Motivation due to the 360’s register weakness

– Had only 4 FP regs

– The 9 renamed regs (reservation station slots) were a significant bonus

• Intel’s x86 architecture is also register-poor

– With renamed registers they can get around this

Three Stages of Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue

If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).

2. Execution: operate on operands (EX)

When both operands ready then execute; if not ready, watch Common Data Bus for result

3. Write result: finish execution (WB)

Write on Common Data Bus to all awaiting units; mark reservation station available

• Normal data bus: data + destination (“go to” bus)

• Common data bus: data + source (“come from” bus)

– 64 bits of data + 4 bits of Functional Unit source address

– Write if matches expected Functional Unit (produces result)

– Does the broadcast
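
A “come from” bus can be modeled as a broadcast of (source tag, data) that every waiting unit compares against the tags it expects. The sketch below is illustrative only: the `Waiting` class and `write_result` function are hypothetical names, and operand values are kept as strings in the notation of the worked example that follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Waiting:
    name: str
    vj: Optional[str] = None
    vk: Optional[str] = None
    qj: Optional[str] = None  # tag of the producer of operand j; None once vj is valid
    qk: Optional[str] = None

    def snoop(self, source, data):
        """Capture the data if it comes from a producer this station waits on."""
        if self.qj == source:
            self.vj, self.qj = data, None
        if self.qk == source:
            self.vk, self.qk = data, None

    def can_execute(self):
        return self.qj is None and self.qk is None


def write_result(source, data, listeners):
    """Stage 3: a single broadcast reaches every listener on the bus."""
    for listener in listeners:
        listener.snoop(source, data)


add1 = Waiting("Add1", vj="M[R2+34]", qk="Load2")
mult1 = Waiting("Mult1", vk="Regs[F4]", qj="Load2")
write_result("Load2", "M[R3+45]", [add1, mult1])
print(add1.vk, mult1.vj)                         # M[R3+45] M[R3+45]
print(add1.can_execute(), mult1.can_execute())   # True True
```

One broadcast from Load2 satisfies both waiting stations at once, which a destination-addressed bus could not do in a single transfer.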

Reservation Station Components

Op—Operation to perform in the unit (e.g., + or –)

Vj, Vk—Value of Source operands

– Store buffers have a V field: the result to be stored

Qj, Qk—Reservation stations producing source registers (value to be written)

– Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready

– Store buffers only have Qi for RS producing result

Busy—Indicates reservation station or FU is busy

Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.
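
The fields above map naturally onto a small record plus an issue routine that consults the register result status. This is a sketch under assumptions: the `RS` class and `issue` function are hypothetical, values are shown symbolically as `Regs[...]` strings, and 0 is used for "no pending producer" to follow the slide's Qj,Qk=0 convention.

```python
from dataclasses import dataclass


@dataclass
class RS:
    name: str
    busy: bool = False
    op: str = ""
    vj: object = None
    vk: object = None
    qj: object = 0  # 0 means no producer pending, so the V field is valid
    qk: object = 0


def issue(rs, op, dest, src_j, src_k, result_status):
    """Fill a free station; return False on a structural hazard (station busy)."""
    if rs.busy:
        return False
    rs.busy, rs.op = True, op
    for src, v_field, q_field in ((src_j, "vj", "qj"), (src_k, "vk", "qk")):
        producer = result_status.get(src, 0)
        if producer:
            setattr(rs, v_field, None)        # value not available yet
            setattr(rs, q_field, producer)    # remember who will produce it
        else:
            setattr(rs, v_field, f"Regs[{src}]")
            setattr(rs, q_field, 0)
    result_status[dest] = rs.name             # this station now writes dest
    return True


# State before MULTD F0,F2,F4 issues: both loads are still pending.
result_status = {"F2": "Load2", "F6": "Load1"}
mult1 = RS("Mult1")
issue(mult1, "Mul", "F0", "F2", "F4", result_status)
print(mult1)
print(result_status)  # {'F2': 'Load2', 'F6': 'Load1', 'F0': 'Mult1'}
```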

Tomasulo Example Cycle 0

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2
LD    F2     45+  R3
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
0      FU

Tomasulo Example Cycle 1

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes
LD    F2     45+  R3
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers
Name   Busy  Address
Load1  yes   Regs[R2]+34
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
1      FU                       Load1

Tomasulo Example Cycle 2

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    1
LD    F2     45+  R3   yes
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers
Name   Busy  Address
Load1  yes   Regs[R2]+34
Load2  yes   Regs[R3]+45
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
2      FU         Load2         Load1

Tomasulo Example Cycle 3

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes
LD    F2     45+  R3   yes    1
MULTD F0     F2   F4   yes
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers
Name   Busy  Address
Load1  yes   Regs[R2]+34
Load2  yes   Regs[R3]+45
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   Mul            Regs[F4]  Load2
0     Mult2  No

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
3      FU  Mult1  Load2         Load1

Tomasulo Example Cycle 4

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes
MULTD F0     F2   F4   yes
SUBD  F8     F6   F2   yes
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers
Name   Busy  Address
Load1  No
Load2  yes   Regs[R3]+45
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   Yes   Sub  M[R2+34]                           Load2
0     Add2   No
0     Add3   No
0     Mult1  Yes   Mul            Regs[F4]  Load2
0     Mult2  No

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
4      FU  Mult1  Load2                Add1

Sort of like Figure 4.9 in your text

Tomasulo Example Cycle 5

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    1
SUBD  F8     F6   F2   yes    1
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
1     Add1   Yes   Sub  M[R2+34]  M[R3+45]
0     Add2   No
0     Add3   No
1     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
5      FU  Mult1                       Add1   Mult2

Tomasulo Example Cycle 6

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    2
SUBD  F8     F6   F2   yes    2
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
2     Add1   Yes   Sub  M[R2+34]  M[R3+45]
0     Add2   Yes   Add            Regs[F2]  Add1
0     Add3   No
2     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
6      FU  Mult1                Add2   Add1   Mult2

Tomasulo Example Cycle 7

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    3
SUBD  F8     F6   F2   yes    yes
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
3     Add1   Yes   Sub  M[R2+34]  M[R3+45]
0     Add2   Yes   Add            Regs[F2]  Add1
0     Add3   No
3     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
7      FU  Mult1                Add2   Add1   Mult2

Tomasulo Example Cycle 8

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    4
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes    1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
1     Add2   Yes   Add  Add1      Regs[F2]
0     Add3   No
4     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

(In the Vj column, Add1 stands for the value that station Add1 produced.)

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
8      FU  Mult1                Add2          Mult2

• Note: ADDD can execute (and complete) before DIVD executes, because DIVD’s reservation station already holds the old value of F6; this avoids the WAR hazard

Tomasulo Example Cycle 9

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    5
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes    2

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
2     Add2   Yes   Add  Add1      Regs[F2]
0     Add3   No
5     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
9      FU  Mult1                Add2          Mult2

Tomasulo Example Cycle 10

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    6
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes    yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
3     Add2   Yes   Add  Add1      Regs[F2]
0     Add3   No
6     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
10     FU  Mult1                Add2          Mult2

Tomasulo Example Cycle 11

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    7
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes    yes                 yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
3     Add2   No
0     Add3   No
7     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
11     FU  Mult1                              Mult2

Tomasulo Example Cycle 12

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    8
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes    yes                 yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
8     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
12     FU  Mult1                              Mult2

Tomasulo Example Cycle 13

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    9
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes    yes                 yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
9     Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
13     FU  Mult1                              Mult2

Tomasulo Example Cycle 14

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    yes
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes
ADDD  F6     F8   F2   yes    yes                 yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
10    Mult1  Yes   Mul  M[R3+45]  Regs[F4]
0     Mult2  Yes   Div            Regs[F6]  Mult1

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
14     FU  Mult1                              Mult2

This is Figure 4.10 in the text

Tomasulo Example Cycle 15

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    yes                 yes
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes    1
ADDD  F6     F8   F2   yes    yes                 yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3
0     Mult1
1     Mult2  Yes   Div  Mult1     Regs[F6]

(In the Vj column, Mult1 stands for the value that station Mult1 produced.)

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
15     FU                                     Mult2

Tomasulo Example Cycle 16

Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD    F6     34+  R2   yes    yes                 yes
LD    F2     45+  R3   yes    yes                 yes
MULTD F0     F2   F4   yes    yes                 yes
SUBD  F8     F6   F2   yes    yes                 yes
DIVD  F10    F0   F6   yes    2
ADDD  F6     F8   F2   yes    yes                 yes

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations
Time  Name   Busy  Op   Vj (S1)   Vk (S2)   Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3
0     Mult1
2     Mult2  Yes   Div  Mult1     Regs[F6]

Register result status
Clock      F0     F2     F4     F6     F8     F10    F12    ...  F30
16     FU                                     Mult2

• Now do 38 more DIVD cycles and then write back F10 to finish
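
The remaining cycles can be worked out with simple arithmetic. The sketch below assumes the counting convention used in the walkthrough (MULTD shows execution count 1 in cycle 5, completes in cycle 14, and writes in cycle 15) and a 40-cycle divide, which is inferred from "2 cycles done + 38 more" rather than stated on the slides. The function name is hypothetical.

```python
def finish_cycles(first_exec_cycle, latency):
    """Return (last execution cycle, write-result cycle) for one instruction.

    Assumes the write result happens in the cycle right after the last
    execution cycle, as MULTD does in the walkthrough.
    """
    last_exec = first_exec_cycle + latency - 1
    return last_exec, last_exec + 1


print(finish_cycles(5, 10))   # MULTD: (14, 15), matching the tables
print(finish_cycles(15, 40))  # DIVD:  (54, 55) under the assumptions above
```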

Review: Tomasulo

• Prevents Register as bottleneck

– Where’s the new bottleneck?

• Avoids WAR, WAW hazards of Scoreboard

• If we assume branch prediction (next subject…)

– Allows loop unrolling in HW

– Not limited to basic blocks

• Lasting Contributions

– Dynamic scheduling

– Register renaming

– Load/store disambiguation

» Out of order is OK if addresses don’t match

• 360/91 descendants are PowerPC 604, 620; MIPS R10000; HP-PA 8000; Intel Pentium Pro
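
The load/store disambiguation rule ("out of order is OK if addresses don’t match") can be sketched as a single address comparison. This is a simplified illustration, not the 360/91 logic: `can_bypass` is a hypothetical name, and the sketch assumes that the addresses of all older stores are already known and that accesses either match exactly or not at all.

```python
def can_bypass(load_addr, older_store_addrs):
    """A load may go ahead of older pending stores only if no address matches."""
    return all(load_addr != store_addr for store_addr in older_store_addrs)


print(can_bypass(0x100, [0x200, 0x300]))  # True: no conflict, load may go early
print(can_bypass(0x100, [0x200, 0x100]))  # False: must wait for the store to 0x100
```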