85
1 Tomasulo’s Algorithm JianJia Chen (Slides are based on the lecture note from Prof. ChiaLin Yang)

Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

1

Tomasulo’s Algorithm

Jian-­Jia Chen(Slides are based on the lecture note from

Prof. Chia-­Lin Yang)

Page 2: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

2

Dynamic Scheduling Mechanisms:Tomasulo Algorithm

n For IBM 360/91 about 3 years after CDC 6600n Goal: High Performance without special compilersn Small number of floating point registers (4 in 360) prevented interesting compiler scheduling of operationsn This led Tomasulo to try to figure out how to get more effective registers —renaming in hardware!

n Why Study 1966 Computer? n The descendants of this have flourished!

n Alpha 21264, Pentium 4, AMD Opteron, Power 5, …

Page 3: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

3

Tomasulo Organization

FP adders

Add1Add2Add3

FP multipliers

Mult1Mult2

FP Registers

Reservation Stations

Common Data Bus (CDB)

Instruction Queue

Load BuffersStore Buffers

Load/Store Op

Address Unit

Memory Unit

Page 4: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

4

Reservation Station Components

Op operation to perform in the unit (e.g., + or -­ )Qj, Qk reservation stations producing source registers Vj, Vk value of source operandsBusy indicates reservation station and FU is busy

Register result status indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.

Page 5: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

5

Three Stages of Tomasulo Algorithm

1. Issue get instruction from FP Op QueueIf reservation station free, issue instr & send operands to the reservation station. This steps also performs the process of renaming registers.

2. Execution operate on operands (EX)When both operands ready then execute;;if not ready, watch CDB for result

3. Write result finish execution (WB)Write on Common Data Bus to all awaiting units;; mark reservation station available.

Page 6: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

6

Tomasulos Algorithmus: Beispiel

Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 Load1 NoLD F2 45+ R3 Load2 NoMULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 No

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F300 FU

Clock cycle counter

FU countdown

Instruction stream

3 Load/Buffers

3 FP Adder R.S.2 FP Mult R.S.

Page 7: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

7

Tomasulos Algorithmus: Zyklus 1Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 Load2 NoMULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 No

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F301 FU Load1

Page 8: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

8

Tomasulos Algorithmus: Zyklus 2Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 No

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F302 FU Load2 Load1

Hinweis: mehrere Ladebefehle können im Lade Puffer gelassen werden.

Page 9: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

9

Tomasulos Beispiel: Zyklus 3Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 NoMult1 Yes MULTD R(F4) Load2Mult2 No

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F303 FU Mult1 Load2 Load1

• Hinweis : Das Register F2 wird umbenannt zu Load2 von der Reservation Station.

• Load1 wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Load1?

Register renaming

Page 10: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

10

Tomasulos Algorithmus: Zyklus 4Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 Load2 Yes 45+R3MULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6ADDD F6 F8 F2

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 Yes SUBD M(A1) Load2Add2 NoAdd3 NoMult1 Yes MULTD R(F4) Load2Mult2 No

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F304 FU Mult1 Load2 M(A1) Add1

• Load2 wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Load2?

Page 11: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

11

Tomasulos Algorithmus: Zyklus 5Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

2 Add1 Yes SUBD M(A1) M(A2)Add2 NoAdd3 No

10 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F305 FU Mult1 M(A2) M(A1) Add1 Mult2

• Der Timer fängt für Add1 und Mult1 an.

Page 12: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

12

Tomasulos Algorithmus: Zyklus 6Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2 6

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

1 Add1 Yes SUBD M(A1) M(A2)Add2 Yes ADDD M(A2) Add1Add3 No

9 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F306 FU Mult1 M(A2) Add2 Add1 Mult2

Page 13: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

13

Tomasulos Algorithmus: Zyklus 7Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7DIVD F10 F0 F6 5ADDD F6 F8 F2 6

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

0 Add1 Yes SUBD M(A1) M(A2)Add2 Yes ADDD M(A2) Add1Add3 No

8 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F307 FU Mult1 M(A2) Add2 Add1 Mult2

• SUBD wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Add1?

Page 14: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

14

Tomasulos Algorithmus: Zyklus 8Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 No2 Add2 Yes ADDD (M-M) M(A2)

Add3 No7 Mult1 Yes MULTD M(A2) R(F4)

Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F308 FU Mult1 M(A2) Add2 (M-M) Mult2

Out-of-order Ausführung

Page 15: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

15

Tomasulos Algorithmus: Zyklus 9Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 No1 Add2 Yes ADDD (M-M) M(A2)

Add3 No6 Mult1 Yes MULTD M(A2) R(F4)

Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F309 FU Mult1 M(A2) Add2 (M-M) Mult2

Page 16: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

16

Tomasulos Algorithmus: Zyklus 10Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 No0 Add2 Yes ADDD (M-M) M(A2)

Add3 No5 Mult1 Yes MULTD M(A2) R(F4)

Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3010 FU Mult1 M(A2) Add2 (M-M) Mult2

• ADDD wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Add2?

Page 17: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

17

Tomasulos Algorithmus: Zyklus 11Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 No

4 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3011 FU Mult1 M(A2) (M-M+M)(M-M) Mult2

• Alle schnellen Befehle sind fertig in diesem Zyklus ausgeführt!

Page 18: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

18

Tomasulos Algorithmus: Zyklus 12Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 No

3 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3012 FU Mult1 M(A2) (M-M+M)(M-M) Mult2

Page 19: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

19

Tomasulos Algorithmus: Zyklus 13Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 No

2 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3013 FU Mult1 M(A2) (M-M+M)(M-M) Mult2

Page 20: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

20

Tomasulos Algorithmus: Zyklus 14Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 No

1 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3014 FU Mult1 M(A2) (M-M+M)(M-M) Mult2

Page 21: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

21

Tomasulos Algorithmus: Zyklus 15Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 No

0 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3015 FU Mult1 M(A2) (M-M+M)(M-M) Mult2

• MULTD wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Mult1?

Page 22: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

22

Tomasulos Algorithmus: Zyklus 16Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 NoMult1 No

40 Mult2 Yes DIVD M*F4 M(A1)

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3016 FU M*F4 M(A2) (M-M+M)(M-M) Mult2

Page 23: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

23

Überspring ein paar Zyklen!

Page 24: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

24

Tomasulos Algorithmus: Zyklus 55Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 NoMult1 No

1 Mult2 Yes DIVD M*F4 M(A1)

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3055 FU M*F4 M(A2) (M-M+M)(M-M) Mult2

Page 25: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

25

Tomasulos Algorithmus: Zyklus 56Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 NoMult1 No

0 Mult2 Yes DIVD M*F4 M(A1)

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3056 FU M*F4 M(A2) (M-M+M)(M-M) Mult2

• DIVD wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Mult2?

Page 26: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

26

Tomasulos Algorithmus: Zyklus 57Instruction status: Exec Write

Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56 57ADDD F6 F8 F2 6 10 11

Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk

Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 Yes DIVD M*F4 M(A1)

Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3056 FU M*F4 M(A2) (M-M+M)(M-M) Result

• Noch einmal: In-order issue, out-of-order execution and out-of-order completion. (In-order Ausgabe, out-of-order Ausführung, und out-of-order Fertigstellung)

Page 27: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

27

Tomasulos Algorithmus: Ausführung für Schleifen

Loop: LD F0 0 R1

MULTD F4 F0 F2SD F4 0 R1

SUBI R1 R1 #8BNEZ R1 Loop

n vereinfachte Annahme: MULTD dauert 4 Taktzyklenn Annahme: Der erste LD Befehl dauert 8 Taktzyklen(L1 Cache Miss). Die folgenden LD Befehle dauern nur 1 Taktzyklus (L1 Cache Hit)

n Sprungvorhersage: immer ausgeführtn Wir führen hier nur 2 Iterationen aus

Page 28: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

28

Tomasulos Algorithmus: SchleifeInstruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 Load1 No1 MULTD F4 F0 F2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

0 80 Fu

Added Store Buffers

Der Wert des Registers R1, benutzt für die Berechnung der Adresse um die Iterationssteuerung zu kontrollieren

Sprungbefehl

Iter-ationCount

Page 29: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

29

Tomasulos Algorithmus (Schleife) Zyklus 1Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 80

Load2 NoLoad3 NoStore1 NoStore2 NoStore3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

1 80 Fu Load1

Page 30: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

30

Tomasulos Algorithmus (Schleife) Zyklus 2Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No

Load3 NoStore1 NoStore2 NoStore3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

2 80 Fu Load1 Mult1

Page 31: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

31

Tomasulos Algorithmus (Schleife) Zyklus 3Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No

Store1 Yes 80 Mult1Store2 NoStore3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

3 80 Fu Load1 Mult1

n Das Register F0 wird automatisch umbenannt

Page 32: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

32

Tomasulos Algorithmus (Schleife) Zyklus 4Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No

Store1 Yes 80 Mult1Store2 NoStore3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

4 80 Fu Load1 Mult1

n SUBI Befehl wird ausgeführt (nicht in FP Einheiten)

Page 33: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

33

Tomasulos Algorithmus (Schleife) Zyklus 5Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No

Store1 Yes 80 Mult1Store2 NoStore3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

5 72 Fu Load1 Mult1

n BNEZ Befehl wird ausgeführt (nicht in FP Einheiten)

Page 34: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

34

Tomasulos Algorithmus (Schleife) Zyklus 6Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult1

Store2 NoStore3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

6 72 Fu Load2 Mult1

n Beachten: Das Wort von Adresse 80 wird nicht F0 zugeordnet

Page 35: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

35

Tomasulos Algorithmus (Schleife) Zyklus 7Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 No

Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

7 72 Fu Load2 Mult2

n Registeransatz ist hier unabhängig von Berechnungenn Die erste und zweite Iterationen können sich überlappen

Page 36: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

36

Tomasulos Algorithmus (Schleife) Zyklus 8Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

8 72 Fu Load2 Mult2

Page 37: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

37

Tomasulos Algorithmus (Schleife) Zyklus 9Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

9 72 Fu Load2 Mult2

§ Load1 wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Load1?§ Beachten: SUBI wird ausgeführt

Page 38: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

38

Tomasulos Algorithmus (Schleife) Zyklus 10Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 10 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

4 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

10 64 Fu Load2 Mult2

§ Load2 wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Load2?§ Beachten: BNEQ wird ausgeführt

Page 39: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

39

Tomasulos Algorithmus (Schleife) Zyklus 11Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

3 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #84 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

11 64 Fu Load3 Mult2

n Der dritte LD (Laden) Befehl

Page 40: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

40

Tomasulos Algorithmus (Schleife) Zyklus 12Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

2 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #83 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

12 64 Fu Load3 Mult2

n Warum wird der dritte MULTD nicht ausgegeben?

Page 41: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

41

Tomasulos Algorithmus (Schleife) Zyklus 13Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

1 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #82 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

13 64 Fu Load3 Mult2

n Warum wird der dritte SD (Speichern) nicht ausgegeben?

Page 42: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

42

Tomasulos Algorithmus (Schleife) Zyklus 14Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

0 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #81 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

14 64 Fu Load3 Mult2

§ Mult1 wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Mult1?

Page 43: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

43

Tomasulos Algorithmus (Schleife) Zyklus 15Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8

0 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

15 64 Fu Load3 Mult2

§ Mult2 wird im nächsten Zyklus fertig sein. Welcher Befehl braucht Mult2?

Page 44: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

44

Tomasulos Algorithmus (Schleife) Zyklus 16Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

4 Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

16 64 Fu Load3 Mult1

Page 45: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

45

Tomasulos Algorithmus (Schleife) Zyklus 17Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 Yes 64 Mult1

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

17 64 Fu Load3 Mult1

Page 46: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

46

Tomasulos Algorithmus (Schleife) Zyklus 18Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 Yes 64 Mult1

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

18 64 Fu Load3 Mult1

Page 47: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

47

Tomasulos Algorithmus (Schleife) Zyklus 19Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 19 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 No2 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 19 Store3 Yes 64 Mult1

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

19 56 Fu Load3 Mult1

Page 48: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

48

Tomasulos Algorithmus (Schleife) Zyklus 20Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 Yes 561 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 19 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 No2 MULTD F4 F0 F2 7 15 16 Store2 No2 SD F4 0 R1 8 19 20 Store3 Yes 64 Mult1

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

20 56 Fu Load1 Mult1

• Noch einmal: In-order issue, out-of-order execution and out-of-order completion. (In-order Ausgabe, out-of-order Ausführung, und out-of-order Fertigstellung)

Page 49: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

49

n Three key ideasn Dynamic branch prediction

n Choose which instructions to executen Speculative execution

n Allow execution of instructions before the control dependences are resolved n HW undo if prediction is wrong

n Dynamic instruction schedulingn Scheduling different combinations of basic blocksn Exploiting more ILP requires that we overcome the limitation of control dependence:

Hardware Based Speculation (cont’d)

Page 50: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

50

Hardware Speculation

Execution completeExecution

2 3 4 51

•Instruction 1 & 2 are allowed to change the machine state•Instruction 4 & 5 should not change the machine state

branch

Page 51: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

51

Hardware Speculationn Tomasulo without speculation

n Issuen Executionn Write Result

n write results to the CDB & update the register file or memory

n Tomasulo with speculationn Issuen Executionn Write Result

n write results to the CDB & store results in a HW buffer (Reorder Buffer)n Commit

n Update register file or memory

Page 52: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

52

Four Steps of Speculative Tomasulo Algorithm

1. Issue—get instruction from FP Op QueueIf reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination.

2. Execution—operate on operands (EX)When both operands ready then execute;; if not ready, watch CDB for result;; when both in reservation station, execute

3. Write result—finish execution (WB)Write on Common Data Bus to all awaiting FUs & reorder buffer;; mark reservation station available.

4. Commit—update register with reorder resultWhen instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer.

Page 53: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

53

Hardware Speculation

n Need HW buffer for results of uncommitted instructions: reorder buffer

::::::MULTD F0, F2, F4DIVD F10,F0,F6::::

Busy Op Vj Vk Qj Qk Dest

ReorderBuffer

FP Regs

FPOp

Queue

FP Adder FP Adder

Res Stations Res Stations

Mult1 Yes mult [F2] [F4] #1Mult2 Yes div [F6] #1

Reservation Station

Reorder BufferBusy Instruction State Destination Value

#1 yes multd f0,f2,f4 Write Result F0 x#2 yes divd f10,f0,f6 Execute F10

Page 54: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

54

n It is a simple circular array with a head and a tail pointer:n New instructions is allocated a position at the tail in program order.

n Each entry provides a location for storing the instruction’s result.

n When the instruction at the head complete and becomes non-­speculative the values are committed and the instruction is removed from the buffer.

Reorder Buffer

Tail Head

Page 55: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

55

Commit Stage

1 2 4 53

2 4 53

4 53

6

Instruction 1 commit

4 5

Instruction 2 commit

Wait for instruction 3 to complete

Instruction 3 commit

5Instruction 4 commit

6 7

6 7 8

6 7 8 9

Page 56: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

56

Speculative Executionn How to handle mispredicted branch?

4 53 6 7 Inst 3 is a mispredicted branch

flush the reorder buffer

11 Fetch from the right path

Page 57: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

57

Tomasulo With Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1F0 LD F0,10(R2) N

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

Page 58: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

58

2 ADDD R(F4),ROB1

Tomasulo With Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

F10F0

ADDD F10,F4,F0LD F0,10(R2)

NN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

Page 59: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

59

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB1

Tomasulo With Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

F2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

Page 60: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

60

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB16 ADDD ROB5, R(F6)

Tomasulo With Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

F0 ADDD F0,F4,F6 NF4 LD F4,0(R3) N-- BNE F2,<…> NF2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

5 0+R3

Page 61: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

61

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB16 ADDD ROB5, R(F6)

Tomasulo With Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

--F0

ROB5 ST 0(R3),F4ADDD F0,F4,F6

NN

F4 LD F4,0(R3) N-- BNE F2,<…> NF2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

Dest

Reorder Buffer

Registers

1 10+R25 0+R3

Page 62: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

62

3 DIVD ROB2,R(F6)

Tomasulo With Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

--F0

M[10] ST 0(R3),F4ADDD F0,F4,F6

YN

F4 M[10] LD F4,0(R3) Y-- BNE F2,<…> NF2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

2 ADDD R(F4),ROB16 ADDD M[10],R(F6)

Page 63: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

63

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB1

Tomasulo With Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

--F0

M[10]<val2>

ST 0(R3),F4ADDD F0,F4,F6

YY

F4 M[10] LD F4,0(R3) Y-- BNE F2,<…> NF2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

Page 64: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

64

--F0

M[10]<val2>

ST 0(R3),F4ADDD F0,F4,F6

YEx

F4 M[10] LD F4,0(R3) Y-- BNE F2,<…> N

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB1

Tomasulo With Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

F2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

What about memoryhazards???

Page 65: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

65

Avoiding Memory Hazardsn WAW and WAR hazards through memory are eliminated with speculation because actual updating of memory occurs in order, when a store is at head of the ROB, and hence, no earlier loads or stores can still be pending

n RAW hazards through memory are maintained by two restrictions: 1. not allowing a load to initiate the second step of its execution if any active ROB entry occupied by a store has a Destination field that matches the value of the address field of the load, and

2. maintaining the program order for the computation of an effective address of a load with respect to all earlier stores.

n these restrictions ensure that any load that accesses a memory location written to by an earlier store cannot perform the memory access until the store has written the data

Page 66: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

Backup -­ German

66

Page 67: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

67

hardwarebasierte spekulative Ausführung

n Speculation: allow an instruction to execute that is dependent on a predicted branch without any consequences (including exceptions) if branch prediction is wrong (“HW undo”)

n Speculative Execution vs. Dynamic branch prediction + dynamic instruction scheduling n Speculative execution

n Speculate the outcome of a branch and fetch, issue and executeinstructions

n Dynamic instruction scheduling + branch predictionn Speculate the outcome of a branch, and fetch, issue instructions

Page 68: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

68

n Drei Ideenn dynamische Sprungvorhersagen (dynamic branch prediction)

n wählen welche Befehle auszuführen ausn spekulative Ausführung (speculative execution)

n erlauben Ausführungen der Befehle bevor die Kontrolflussabhängigkeit gelöst wird

n löschen im HW wenn die Vorhersage unrichtig istn dynamische Befehlsausführung (dynamic instruction scheduling)

n führen verschiedene Kombinationen von Befehlen aus n bearbeiten die Limitierung von Kontrolflussabhängigkeit, um bessere ILP zu bekommen

hardwarebasierte spekulative Ausführung

Page 69: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

69

hardwarebasierte spekulative Ausführung

Ausführung fertigAusführung

2 3 4 51

• Befehle 1 & 2 werden den Maschinenzustand zu verändern erlaubt• Befehle 4 & 5 sollen den Maschinenzustand nicht verändern

Zweig

Page 70: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

70

hardwarebasierte spekulative Ausführungn Tomasulo ohne Spekulation

n Issuen Executionn Write Result

n write results to the CDB & update the register file or memory

n Tomasulo mit Spekulationn Issuen Executionn Write Result

n write results to the CDB & store results in a HW buffer (Reorder Buffer)n Commit

n Update register file or memory

Page 71: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

71

Four Steps of Speculative Tomasulo Algorithm

1. Issue—get instruction from FP Op QueueIf reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination.

2. Execution—operate on operands (EX)When both operands ready then execute;; if not ready, watch CDB for result;; when both in reservation station, execute

3. Write result—finish execution (WB)Write on Common Data Bus to all awaiting FUs & reorder buffer;; mark reservation station available.

4. Commit—update register with reorder resultWhen instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer.

Page 72: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

72

Hardware Speculation

n Need HW buffer for results of uncommitted instructions: reorder buffer

::::::MULTD F0, F2, F4DIVD F10,F0,F6::::

Busy Op Vj Vk Qj Qk Dest

ReorderBuffer

FP Regs

FPOp

Queue

FP Adder FP Adder

Res Stations Res Stations

Mult1 Yes mult [F2] [F4] #1Mult2 Yes div [F6] #1

Reservation Station

Reorder BufferBusy Instruction State Destination Value

#1 yes multd f0,f2,f4 Write Result F0 x#2 yes divd f10,f0,f6 Execute F10

Page 73: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

73

n It is a simple circular array with a head and a tail pointer:n New instructions is allocated a position at the tail in program order.

n Each entry provides a location for storing the instruction’s result.

n When the instruction at the head complete and becomes non-­speculative the values are committed and the instruction is removed from the buffer.

Reorder Buffer

Tail Head

Page 74: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

74

Commit Stage

1 2 4 53

2 4 53

4 53

6

Instruction 1 commit

4 5

Instruction 2 commit

Wait for instruction 3 to complete

Instruction 3 commit

5Instruction 4 commit

6 7

6 7 8

6 7 8 9

Page 75: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

75

Speculative Executionn How to handle mispredicted branch?

4 53 6 7 Inst 3 is a mispredicted branch

flush the reorder buffer

11 Fetch from the right path

Page 76: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

76

Tomasulo mit Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1F0 LD F0,10(R2) N

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

Page 77: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

77

2 ADDD R(F4),ROB1

Tomasulo mit Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

F10F0

ADDD F10,F4,F0LD F0,10(R2)

NN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

Page 78: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

78

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB1

Tomasulo mit Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

F2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

Page 79: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

79

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB16 ADDD ROB5, R(F6)

Tomasulo mit Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

F0 ADDD F0,F4,F6 NF4 LD F4,0(R3) N-- BNE F2,<…> NF2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

5 0+R3

Page 80: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

80

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB16 ADDD ROB5, R(F6)

Tomasulo mit Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

--F0

ROB5 ST 0(R3),F4ADDD F0,F4,F6

NN

F4 LD F4,0(R3) N-- BNE F2,<…> NF2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

Dest

Reorder Buffer

Registers

1 10+R25 0+R3

Page 81: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

81

3 DIVD ROB2,R(F6)

Tomasulo mit Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

--F0

M[10] ST 0(R3),F4ADDD F0,F4,F6

YN

F4 M[10] LD F4,0(R3) Y-- BNE F2,<…> NF2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

2 ADDD R(F4),ROB16 ADDD M[10],R(F6)

Page 82: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

82

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB1

Tomasulo mit Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

--F0

M[10]<val2>

ST 0(R3),F4ADDD F0,F4,F6

YY

F4 M[10] LD F4,0(R3) Y-- BNE F2,<…> NF2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

Page 83: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

83

--F0

M[10]<val2>

ST 0(R3),F4ADDD F0,F4,F6

YEx

F4 M[10] LD F4,0(R3) Y-- BNE F2,<…> N

3 DIVD ROB2,R(F6)2 ADDD R(F4),ROB1

Tomasulo mit Reorder buffer:

ToMemory

FP adders FP multipliers

Reservation Stations

FP OpQueue

ROB7ROB6

ROB5

ROB4

ROB3

ROB2

ROB1

F2F10F0

DIVD F2,F10,F6ADDD F10,F4,F0LD F0,10(R2)

NNN

Done?

Dest Dest

Oldest

Newest

from Memory

1 10+R2Dest

Reorder Buffer

Registers

What about memoryhazards???

Page 84: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

84

Avoiding Memory Hazardsn WAW and WAR hazards through memory are eliminated with speculation because actual updating of memory occurs in order, when a store is at head of the ROB, and hence, no earlier loads or stores can still be pending

n RAW hazards through memory are maintained by two restrictions: 1. not allowing a load to initiate the second step of its execution if any active ROB entry occupied by a store has a Destination field that matches the value of the address field of the load, and

2. maintaining the program order for the computation of an effective address of a load with respect to all earlier stores.

n these restrictions ensure that any load that accesses a memory location written to by an earlier store cannot perform the memory access until the store has written the data

Page 85: Tomasulo’s*Algorithm* · 2020. 6. 16. · 1 Tomasulo’s*Algorithm* Jian3Jia*Chen (Slides*are*based*on*the*lecture*note*from* Prof.*Chia3Lin*Yang)*

85

hardwarebasierte spekulative Ausführung

n Speculation: allow an instruction to execute that is dependent on a predicted branch without any consequences (including exceptions) if branch prediction is wrong (“HW undo”)

n Speculative Execution vs. Dynamic branch prediction + dynamic instruction scheduling n Speculative execution

n Speculate the outcome of a branch and fetch, issue and execute instructions

n Dynamic instruction scheduling + branch predictionn Speculate the outcome of a branch, and fetch, issue instructions