© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
EECS 470
Lecture 8
Speculation & Precise Interrupts II
Fall 2007
Prof. Thomas Wenisch
http://www.eecs.umich.edu/courses/eecs470
[Figure: pipeline diagram with stages Fetch, Decode, Dispatch, Issue, Execute, Finish, Complete, Retire; buffers Instruction/Decode Buffer, Dispatch Buffer, Reservation Stations, Reorder/Completion Buffer, Store Buffer; regions labeled In Order, Out of Order, In Order]
Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of these slides. Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
Announcements
HW #3 is posted, due 10/10
Programming assignment #3 (due 10/8)
Project handout is posted
• Form groups of 3-5 ASAP
• Bigger group == higher expectations for grading
Readings
For Today:
• Smith & Pleszkun “Implementing Precise Interrupts”
• H & P Chapter 2.4-2.6, 2.8
Have you read yet?
• D. Sima “Design Space of Register Renaming Techniques”
Some of the homework questions cover the papers!
The Problem with Precise State
[Figure: pipeline diagram showing I$, BP, insn buffer, regfile, and D$, with stages D and S]
Problem: writeback combines two separate functions
• Forwards values to younger insns: OK for this to be out-of-order
• Write values to registers: would like this to be in-order
Similar problem (decode) for OoO execution: solution?
• Split decode (D) → in-order dispatch (D) + out-of-order issue (S)
• Separate using insn buffer: scoreboard or reservation station
Re-Order Buffer (ROB)
[Figure: pipeline diagram showing I$, BP, reorder buffer (ROB), regfile, and D$, with stages W1 and W2]
Insn buffer → re-order buffer (ROB)
• Buffers completed results en route to register file
• May be combined with RS or separate
  • Combined in picture: register-update unit RUU (Sohi’s method)
  • Separate (more common today): P6-style
Split writeback (W) into two stages
• Why is there no latch between W1 and W2?
Complete and Retire
[Figure: pipeline diagram showing I$, BP, reorder buffer (ROB), regfile, and D$, with stages C and R]
Complete (C): first part of writeback (W)
• Completed insns write results into ROB
+ Out-of-order: wait doesn’t back-propagate to younger insns
Retire (R): aka commit, graduate
• ROB writes results to register file
• In order: stall back-propagates to younger insns
Load/Store Queue (LSQ)
ROB makes register writes in-order, but what about stores?
As usual, i.e., to D$ in X stage?
• Not even close, imprecise memory worse than imprecise registers
Load/store queue (LSQ)
• Completed stores write to LSQ
• When store retires, head of LSQ written to D$
• When loads execute, access LSQ and D$ in parallel
  • Forward from LSQ if older store with matching address
• More modern design: loads and stores in separate queues
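The load path described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design: the `StoreEntry` fields, the `seq` program-order number, and the dictionary standing in for D$ are all assumptions, and the sketch assumes every older store's address is already known when the load executes.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class StoreEntry:
    seq: int      # program-order position (hypothetical field)
    addr: int     # store address
    value: float  # store data

def load_value(load_seq: int, addr: int,
               lsq: List[StoreEntry], dcache: Dict[int, float]) -> float:
    """Return the value a load sees: forward from the LSQ if possible, else D$."""
    youngest_older = None
    for st in lsq:
        # Only stores older than the load, to the same address, may forward.
        if st.seq < load_seq and st.addr == addr:
            if youngest_older is None or st.seq > youngest_older.seq:
                youngest_older = st
    if youngest_older is not None:
        return youngest_older.value
    return dcache[addr]

# Three un-retired stores to the same address, plus a tiny D$ model.
lsq = [StoreEntry(1, 0x100, 1.5), StoreEntry(3, 0x100, 2.5), StoreEntry(7, 0x100, 9.0)]
dcache = {0x100: 0.0, 0x200: 4.0}
```

A load at position 5 to `0x100` forwards from the store at position 3 (the youngest older match), while a load with no older matching store falls through to D$.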
ROB + LSQ
[Figure: pipeline diagram showing I$, BP, ROB, regfile, LSQ, and D$, with stages C and R; paths labeled load/store, addr, store data, and load data]
Modulo gross simplifications, this picture is almost realistic!
P6
P6: Start with Tomasulo’s algorithm… add ROB
• Separate ROB and RS
Simple-P6
• Our old RS organization: 1 ALU, 1 load, 1 store, 2 3-cycle FP
P6 Data Structures
Reservation Stations are same as before
ROB
• head, tail: pointers maintain sequential order
• R: insn output register, V: insn output value
Tags are different
• Tomasulo: RS# → P6: ROB#
Map Table is different
• T+: tag + “ready-in-ROB” bit
• T==0 → Value is ready in regfile
• T!=0 → Value is not ready
• T!=0+ → Value is ready in the ROB
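The three Map Table cases can be made concrete with a small Python sketch. The class and function names (`MapEntry`, `RobEntry`, `read_operand`) are hypothetical, and the convention of returning tag 0 together with a value for a ready operand is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class MapEntry:
    tag: int = 0                # T: 0 means "value is in the regfile", else a ROB#
    ready_in_rob: bool = False  # the "+" bit

@dataclass
class RobEntry:
    reg: str                       # R: insn output register
    value: Optional[float] = None  # V: insn output value, once complete

def read_operand(reg: str, map_table: Dict[str, MapEntry],
                 regfile: Dict[str, float],
                 rob: Dict[int, RobEntry]) -> Tuple[int, Optional[float]]:
    """Return (tag, value) for one source register of a dispatching insn."""
    m = map_table[reg]
    if m.tag == 0:              # T==0: value is ready in the regfile
        return 0, regfile[reg]
    if m.ready_in_rob:          # T!=0+: value is ready in the ROB
        return 0, rob[m.tag].value
    return m.tag, None          # T!=0: not ready, wait for this ROB# on the CDB

regfile = {"f0": 2.0, "f1": 0.0, "f2": 0.0}
rob = {1: RobEntry("f1", 3.0), 2: RobEntry("f2")}
map_table = {"f0": MapEntry(), "f1": MapEntry(1, True), "f2": MapEntry(2)}
```

Here `f0` is read from the regfile, `f1` is read from ROB#1, and `f2` yields only the tag ROB#2 because its producer has not completed.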
P6 Data Structures
[Figure: P6 datapath: Map Table (T+), Regfile (value), ROB (R, value; Head/Retire and Tail/Dispatch pointers), RS (T, T1, T2, V1, V2, with tag comparators), FU, and CDB (CDB.T, CDB.V)]
• Insn fields and status bits
• Tags
• Values
P6 Data Structures
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | | | | |
 | 2 | mulf f0,f1,f2 | | | | |
 | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | | | | |
 | 5 | ldf X(r1),f1 | | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 |
f2 |
r1 |
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | no | | | | | |
4 | FP1 | no | | | | | |
5 | FP2 | no | | | | | |
P6 Pipeline
New pipeline structure: F, D, S, X, C, R
• D (dispatch)
  • Structural hazard (ROB/LSQ/RS) ? Stall
  • Allocate ROB/LSQ/RS
  • Set RS tag to ROB#
  • Set Map Table entry to ROB# and clear “ready-in-ROB” bit
  • Read ready registers into RS (from either ROB or Regfile)
• X (execute)
  • Free RS entry
  • Used to be at W, can be earlier because RS# are not tags
P6 Pipeline
• C (complete)
  • Structural hazard (CDB)? wait
  • Write value into ROB entry indicated by RS tag
  • Mark ROB entry as complete
  • If not overwritten, mark Map Table entry “ready-in-ROB” bit (+)
• R (retire)
  • Insn at ROB head not complete ? stall
  • Handle any exceptions
  • Write ROB head value to register file
  • If store, write LSQ head to D$
  • Free ROB/LSQ entries
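The C and R steps above can be sketched as two small Python functions. This is a simplified model under stated assumptions: it covers register-writing insns only (no stores, LSQ, CDB arbitration, or exceptions), and the dictionary layout, `new_entry`, and the `order` deque standing in for head/tail pointers are all hypothetical.

```python
from collections import deque

def complete(rob, map_table, tag, value):
    """C stage: write the result into the ROB entry named by the RS tag."""
    entry = rob[tag]
    entry["V"] = value
    entry["done"] = True
    m = map_table[entry["R"]]
    if m["T"] == tag:          # Map Table entry not overwritten by a younger insn
        m["ready"] = True      # set the "ready-in-ROB" bit (+)

def retire(rob, order, map_table, regfile):
    """R stage: retire the ROB head if complete. Returns False on a stall."""
    if not order or not rob[order[0]]["done"]:
        return False
    tag = order.popleft()
    entry = rob.pop(tag)       # free the ROB entry
    regfile[entry["R"]] = entry["V"]
    m = map_table[entry["R"]]
    if m["T"] == tag:          # still valid: the value now lives in the regfile
        m["T"], m["ready"] = 0, False
    return True

def new_entry(reg):
    return {"R": reg, "V": None, "done": False}

# ROB#2 and ROB#6 both write f2; the Map Table already points f2 at ROB#6.
rob = {2: new_entry("f2"), 4: new_entry("r1"), 6: new_entry("f2")}
order = deque([2, 4, 6])       # oldest first, i.e. head on the left
map_table = {"f2": {"T": 6, "ready": False}, "r1": {"T": 4, "ready": False}}
regfile = {"f2": 0.0, "r1": 0}

complete(rob, map_table, 4, 8)                         # younger insn completes first
stalled = not retire(rob, order, map_table, regfile)   # head ROB#2 is not complete
complete(rob, map_table, 2, 6.0)                       # f2 maps to ROB#6: no "+"
retired = [retire(rob, order, map_table, regfile) for _ in range(3)]
```

The run shows out-of-order completion with in-order retirement: ROB#4 completes first but cannot retire until ROB#2 does, and ROB#2 leaves the Map Table entry for `f2` untouched because a younger insn has since renamed it.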
P6 Dispatch (D): Part I
[Figure: P6 datapath: Map Table (T+), Regfile (value), ROB (R, value; Head/Retire and Tail/Dispatch pointers), RS (T, T1, T2, V1, V2, with tag comparators), FU, and CDB (CDB.T, CDB.V)]
• RS/ROB full ? stall
• Allocate RS/ROB entries, assign ROB# to RS output tag
• Set output register Map Table entry to ROB#, clear “ready-in-ROB”
P6 Dispatch (D): Part II
[Figure: P6 datapath: Map Table (T+), Regfile (value), ROB (R, value; Head/Retire and Tail/Dispatch pointers), RS (T, T1, T2, V1, V2, with tag comparators), FU, and CDB (CDB.T, CDB.V)]
• Read tags for register inputs from Map Table
  • Tag==0 → copy value from Regfile (not shown)
  • Tag!=0 → copy Map Table tag to RS
  • Tag!=0+ → copy value from ROB
P6 Complete (C)
[Figure: P6 datapath: Map Table (T+), Regfile (value), ROB (R, value; Head/Retire and Tail/Dispatch pointers), RS (T, T1, T2, V1, V2, with tag comparators), FU, and CDB (CDB.T, CDB.V)]
• Structural hazard (CDB) ? Stall : broadcast <value,tag> on CDB
• Write result into ROB, if still valid set MapTable “ready-in-ROB” bit
• Match tags, write CDB.V into RS slots of dependent insns
P6 Retire (R)
[Figure: P6 datapath: Map Table (T+), Regfile (value), ROB (R, value; Head/Retire and Tail/Dispatch pointers), RS (T, T1, T2, V1, V2, with tag comparators), FU, and CDB (CDB.T, CDB.V)]
• ROB head not complete ? stall : free ROB entry
• Write ROB head result to Regfile
• If still valid, clear Map Table entry
P6: Cycle 1
ROB
ht | # | Insn | R | V | S | X | C
ht | 1 | ldf X(r1),f1 | f1 | | | |
 | 2 | mulf f0,f1,f2 | | | | |
 | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | | | | |
 | 5 | ldf X(r1),f1 | | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#1
f2 |
r1 |
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | yes | ldf | ROB#1 | | | | [r1]
3 | ST | no | | | | | |
4 | FP1 | no | | | | | |
5 | FP2 | no | | | | | |
Annotations: set ROB# tag; allocate
P6: Cycle 2
ROB
ht | # | Insn | R | V | S | X | C
h | 1 | ldf X(r1),f1 | f1 | | c2 | |
t | 2 | mulf f0,f1,f2 | f2 | | | |
 | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | | | | |
 | 5 | ldf X(r1),f1 | | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#1
f2 | ROB#2
r1 |
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | yes | ldf | ROB#1 | | | | [r1]
3 | ST | no | | | | | |
4 | FP1 | yes | mulf | ROB#2 | | ROB#1 | [f0] |
5 | FP2 | no | | | | | |
Annotations: set ROB# tag; allocate
P6: Cycle 3
ROB
ht | # | Insn | R | V | S | X | C
h | 1 | ldf X(r1),f1 | f1 | | c2 | c3 |
 | 2 | mulf f0,f1,f2 | f2 | | | |
t | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | | | | |
 | 5 | ldf X(r1),f1 | | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#1
f2 | ROB#2
r1 |
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#3 | ROB#2 | | | [r1]
4 | FP1 | yes | mulf | ROB#2 | | ROB#1 | [f0] |
5 | FP2 | no | | | | | |
Annotations: allocate; free
P6: Cycle 4
ROB
ht | # | Insn | R | V | S | X | C
h | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
 | 2 | mulf f0,f1,f2 | f2 | | c4 | |
 | 3 | stf f2,Z(r1) | | | | |
t | 4 | addi r1,4,r1 | r1 | | | |
 | 5 | ldf X(r1),f1 | | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#1+
f2 | ROB#2
r1 | ROB#4
CDB
T | V
ROB#1 | [f1]
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | yes | add | ROB#4 | | | [r1] |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#3 | ROB#2 | | | [r1]
4 | FP1 | yes | mulf | ROB#2 | | ROB#1 | [f0] | CDB.V
5 | FP2 | no | | | | | |
ldf finished
1. set “ready-in-ROB” bit
2. write result to ROB
3. CDB broadcast
Annotations: allocate; ROB#1 ready, grab CDB.V
P6: Cycle 5
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
h | 2 | mulf f0,f1,f2 | f2 | | c4 | c5 |
 | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | r1 | | c5 | |
t | 5 | ldf X(r1),f1 | f1 | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#5
f2 | ROB#2
r1 | ROB#4
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | yes | add | ROB#4 | | | [r1] |
2 | LD | yes | ldf | ROB#5 | | ROB#4 | |
3 | ST | yes | stf | ROB#3 | ROB#2 | | | [r1]
4 | FP1 | no | | | | | |
5 | FP2 | no | | | | | |
ldf retires
1. write ROB result to regfile
Annotations: allocate; free
P6: Cycle 6
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
h | 2 | mulf f0,f1,f2 | f2 | | c4 | c5+ |
 | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | r1 | | c5 | c6 |
 | 5 | ldf X(r1),f1 | f1 | | | |
t | 6 | mulf f0,f1,f2 | f2 | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#5
f2 | ROB#6
r1 | ROB#4
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | yes | ldf | ROB#5 | | ROB#4 | |
3 | ST | yes | stf | ROB#3 | ROB#2 | | | [r1]
4 | FP1 | yes | mulf | ROB#6 | | ROB#5 | [f0] |
5 | FP2 | no | | | | | |
Annotations: free; allocate
P6: Cycle 7
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
h | 2 | mulf f0,f1,f2 | f2 | | c4 | c5+ |
 | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | r1 | [r1] | c5 | c6 | c7
 | 5 | ldf X(r1),f1 | f1 | | c7 | |
t | 6 | mulf f0,f1,f2 | f2 | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#5
f2 | ROB#6
r1 | ROB#4+
CDB
T | V
ROB#4 | [r1]
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | yes | ldf | ROB#5 | | ROB#4 | | CDB.V
3 | ST | yes | stf | ROB#3 | ROB#2 | | | [r1]
4 | FP1 | yes | mulf | ROB#6 | | ROB#5 | [f0] |
5 | FP2 | no | | | | | |
stall D (no free ST RS)
Annotations: ROB#4 ready, grab CDB.V
P6: Cycle 8
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
h | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8
 | 3 | stf f2,Z(r1) | | | c8 | |
 | 4 | addi r1,4,r1 | r1 | [r1] | c5 | c6 | c7
 | 5 | ldf X(r1),f1 | f1 | | c7 | c8 |
t | 6 | mulf f0,f1,f2 | f2 | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#5
f2 | ROB#6
r1 | ROB#4+
CDB
T | V
ROB#2 | [f2]
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#3 | ROB#2 | | [f2] | [r1]
4 | FP1 | yes | mulf | ROB#6 | | ROB#5 | [f0] |
5 | FP2 | no | | | | | |
stall R for addi (in-order)
ROB#2 invalid in MapTable: don’t set “ready-in-ROB”
Annotations: ROB#2 ready, grab CDB.V
P6: Cycle 9
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
 | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8
h | 3 | stf f2,Z(r1) | | | c8 | c9 |
 | 4 | addi r1,4,r1 | r1 | [r1] | c5 | c6 | c7
 | 5 | ldf X(r1),f1 | f1 | [f1] | c7 | c8 | c9
 | 6 | mulf f0,f1,f2 | f2 | | c9 | |
t | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#5+
f2 | ROB#6
r1 | ROB#4+
CDB
T | V
ROB#5 | [f1]
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#7 | ROB#6 | | | ROB#4.V
4 | FP1 | yes | mulf | ROB#6 | | ROB#5 | [f0] | CDB.V
5 | FP2 | no | | | | | |
retire mulf
all pipe stages active at once!
Annotations: free, re-allocate; ROB#5 ready, grab CDB.V
P6: Cycle 10
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
 | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8
h | 3 | stf f2,Z(r1) | | | c8 | c9 | c10
 | 4 | addi r1,4,r1 | r1 | [r1] | c5 | c6 | c7
 | 5 | ldf X(r1),f1 | f1 | [f1] | c7 | c8 | c9
 | 6 | mulf f0,f1,f2 | f2 | | c9 | c10 |
t | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#5+
f2 | ROB#6
r1 | ROB#4+
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#7 | ROB#6 | | | ROB#4.V
4 | FP1 | no | | | | | |
5 | FP2 | no | | | | | |
Annotations: free
P6: Cycle 11
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
 | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5 | c8
 | 3 | stf f2,Z(r1) | | | c8 | c9 | c10
h | 4 | addi r1,4,r1 | r1 | [r1] | c5 | c6 | c7
 | 5 | ldf X(r1),f1 | f1 | [f1] | c7 | c8 | c9
 | 6 | mulf f0,f1,f2 | f2 | | c9 | c10 |
t | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#5+
f2 | ROB#6
r1 | ROB#4+
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#7 | ROB#6 | | | ROB#4.V
4 | FP1 | no | | | | | |
5 | FP2 | no | | | | | |
retire stf
Precise State in P6
Point of ROB is maintaining precise state
• How does that work?
  • Easy as 1,2,3
  1. Wait until last good insn retires, first bad insn at ROB head
  2. Clear contents of ROB, RS, and Map Table
  3. Start over
• Works because zero (0) means the right thing…
  • 0 in ROB/RS → entry is empty
  • Tag == 0 in Map Table → register is in regfile
• …and because regfile and D$ writes take place at R
• Example: page fault in first stf
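Step 2 of the recovery recipe can be sketched as follows. This is a minimal illustration, assuming hypothetical Python containers for the ROB, RS, and Map Table; the sample contents are loosely modeled on the running example and the register values are made up.

```python
def recover(rob, rs, map_table):
    """Flush speculative state once the faulting insn reaches the ROB head."""
    rob.clear()                       # 0 entries in ROB: everything is empty
    for entry in rs:
        entry.update(busy=False, op=None, T=0, T1=0, T2=0, V1=None, V2=None)
    for reg in map_table:
        map_table[reg] = 0            # Tag == 0: register value is in the regfile

# Illustrative state at the time the first stf reaches the ROB head.
rob = {3: "stf f2,Z(r1)", 4: "addi r1,4,r1", 5: "ldf X(r1),f1"}
rs = [{"busy": True, "op": "stf", "T": 7, "T1": 6, "T2": 0, "V1": None, "V2": 8}]
map_table = {"f0": 0, "f1": 5, "f2": 6, "r1": 4}
regfile = {"f0": 2.0, "f1": 3.0, "f2": 6.0, "r1": 0}
before = dict(regfile)

recover(rob, rs, map_table)
```

Note that `recover` never touches `regfile`: because register and D$ writes happen only at R, the architectural state already reflects exactly the insns older than the faulting one.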
P6: Cycle 9 (with precise state)
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
 | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8
h | 3 | stf f2,Z(r1) | | | c8 | c9 |
 | 4 | addi r1,4,r1 | r1 | [r1] | c5 | c6 | c7
 | 5 | ldf X(r1),f1 | f1 | [f1] | c7 | c8 | c9
 | 6 | mulf f0,f1,f2 | f2 | | c9 | |
t | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 | ROB#5+
f2 | ROB#6
r1 | ROB#4+
CDB
T | V
ROB#5 | [f1]
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#7 | ROB#6 | | | ROB#4.V
4 | FP1 | yes | mulf | ROB#6 | | ROB#5 | [f0] | CDB.V
5 | FP2 | no | | | | | |
Annotations: PAGE FAULT
P6: Cycle 10 (with precise state)
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
 | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8
 | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | | | | |
 | 5 | ldf X(r1),f1 | | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 |
f2 |
r1 |
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | no | | | | | |
4 | FP1 | no | | | | | |
5 | FP2 | no | | | | | |
faulting insn at ROB head? CLEAR EVERYTHING
P6: Cycle 11 (with precise state)
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
 | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8
ht | 3 | stf f2,Z(r1) | | | | |
 | 4 | addi r1,4,r1 | | | | |
 | 5 | ldf X(r1),f1 | | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 |
f2 |
r1 |
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | no | | | | | |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#3 | | | [f2] | [r1]
4 | FP1 | no | | | | | |
5 | FP2 | no | | | | | |
START OVER (after OS fixes page fault)
P6: Cycle 12 (with precise state)
ROB
ht | # | Insn | R | V | S | X | C
 | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4
 | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8
h | 3 | stf f2,Z(r1) | | | c12 | |
t | 4 | addi r1,4,r1 | r1 | | | |
 | 5 | ldf X(r1),f1 | | | | |
 | 6 | mulf f0,f1,f2 | | | | |
 | 7 | stf f2,Z(r1) | | | | |
Map Table
Reg | T+
f0 |
f1 |
f2 |
r1 | ROB#4
CDB
T | V
 |
Reservation Stations
# | FU | busy | op | T | T1 | T2 | V1 | V2
1 | ALU | yes | addi | ROB#4 | | | [r1] |
2 | LD | no | | | | | |
3 | ST | yes | stf | ROB#3 | | | [f2] | [r1]
4 | FP1 | no | | | | | |
5 | FP2 | no | | | | | |
P6 Performance
In other words: what is the cost of precise state?
+ In general: same performance as “plain” Tomasulo
  • ROB is not a performance device
  • Maybe a little better (RS freed earlier → fewer struct hazards)
– Unless ROB is too small
  • In which case ROB struct hazards become a problem
• Rules of thumb for ROB size
  • At least N (width) * number of pipe stages between D and R
  • At least N * t_hit-L2
  • Can add a factor of 2 to both if you want
  • What is the rationale behind these?
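The two rules of thumb amount to simple arithmetic, sketched below. The function name and the sample numbers (a 4-wide machine, 5 stages between D and R, a 12-cycle L2 hit) are hypothetical and chosen only for illustration. One plausible reading of the rationale is that the ROB must hold every insn in flight between D and R, and should be able to keep accepting insns while a load waits on an L2 hit.

```python
def min_rob_entries(width, stages_d_to_r, l2_hit_latency, factor=1):
    """Lower bound on ROB size from the two rules of thumb.

    width           -- N, insns dispatched per cycle
    stages_d_to_r   -- number of pipe stages between D and R
    l2_hit_latency  -- t_hit-L2, in cycles
    factor          -- optional safety factor (the slide suggests up to 2)
    """
    by_pipeline_depth = width * stages_d_to_r
    by_l2_latency = width * l2_hit_latency
    return factor * max(by_pipeline_depth, by_l2_latency)

# Hypothetical machine: N = 4, 5 stages between D and R, 12-cycle L2 hit.
baseline = min_rob_entries(4, 5, 12)
padded = min_rob_entries(4, 5, 12, factor=2)
```

With these made-up numbers the L2 rule dominates (4 * 12 = 48 entries versus 4 * 5 = 20), and doubling gives 96.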
P6 (Tomasulo+ROB) Redux
Popular design for a while
• (Relatively) easy to implement correctly
• Anything goes wrong (mispredicted branch, fault, interrupt)?
  • Just clear everything and start again
• Examples: Intel PentiumPro, IBM/Motorola PowerPC, AMD K6
Actually making a comeback…
• Examples: Intel PentiumM
But went away for a while, why?
The Problem with P6
[Figure: P6 datapath: Map Table (T+), Regfile (value), ROB (R, value; Head/Retire and Tail/Dispatch pointers), RS (T, T1, T2, V1, V2, with tag comparators), FU, and CDB (CDB.T, CDB.V)]
Problem for high performance implementations
– Too much value movement (regfile/ROB→RS→ROB→regfile)
– Multi-input muxes, long buses complicate routing and slow clock