Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Storage Bus Instruction Unit
EECS 470Floating Point 4
8
Floating
Operand
Stack
FLOSControl
Floating Point
Buffers FLB 3
4
5
6
Floating PointRegisters(FLR)
Control
248
Floating
Operand Stack
Floating Point
Buffers (FLB)
23456
Point
BusyBits
Tags
(FLOS)
Lecture 6
Tomasolu’s Algorithm
Registers FLR
0
2
Store 3
C
Decoder
1
2
Decoder
Registers (FLR)02
12
23
Control
•
Tags
(FLOS)
Tomasolu s Algorithm
Fall 2007
Data
1
2
Buffers SDB
Control
StoreData
12
Buffers (SDB)
Control
•
FLB BusFLR BusCDB ••
•
Sink TagTag SourceCtrl.Sink TagTag SourceCtrl.
•
Fall 2007
Prof. Thomas Wenisch
http://www.eecs.umich.edu/courses/eecs4
Adder
Result
Multiply/Divide
Common Data Bus (CDB)
Adder
Sink TagTag SourceCtrl.
Sink TagTag SourceCtrl.Sink TagTag SourceCtrl.
Result
http://www.eecs.umich.edu/courses/eecs470
Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of these slides. these slides. Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen,
Lecture 6 Slide 1EECS 470
Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin. University, University of Michigan, and University of Wisconsin.
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Announcements
HW # 2 due Wednesday 9/26HW # 2 due Wednesday 9/26 Programming assignment #2 due today
• Hand in printed synthesis results by 6pm at CSE 4620• Electronic submission of integer square root by 6pm
Lecture 6 Slide 2EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Readings
For Today:For Today:H & P Chapter 2.4‐2.6D. Sima “Design Space of Register Renaming Techniques”
For Wednesday:Smith & Pleszkun “Implementing Precise Interrupts”Smith & Pleszkun Implementing Precise Interrupts
Lecture 6 Slide 3EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Scoreboard Redux• The good
+ Cheap hardware• InsnStatus + FuStatus + RegStatus ~ 1 FP unit in area
+ Pretty good performance• 1.7X for FORTRAN (scientific array) programs1.7X for FORTRAN (scientific array) programs
• The less good– No bypassing
• Is this a fundamental problem?– Limited scheduling scope
• Structural/WAW hazards delay dispatch/ y p– Slow issue of truly-dependent (RAW) insns
• WAR hazards delay writeback• Fix with hardware register renaming
EECS 470Lecture 6 Slide 4EECS 470
• Fix with hardware register renaming
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Register Renaming• Register renaming (in hardware)
• Change register names to eliminate WAR/WAW hazards• One of most the beautiful things in computer architecture• Key: think of registers (r1,f0…) as names, not storage locations+ Can have more locations than names+ Can have more locations than names+ Can have multiple active versions of same name
H d it k?• How does it work?• Map-table: maps names to most recent locations
• SRAM indexed by namey• On a write: allocate new location, note in map-table• On a read: find location of most recent write via map-table lookup• Small detail: must de-allocate locations at some point
EECS 470Lecture 6 Slide 5EECS 470
• Small detail: must de-allocate locations at some point
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Register Renaming Example• Parameters
• Names: r1,r2,r3• Locations: p1,p2,p3,p4,p5,p6,p7• Original mapping: r1→p1, r2→p2, r3→p3, p4–p7 are “free”
MapTable FreeList Raw insns Renamed insnsMapTable FreeList Raw insns Renamed insnsr1 r2 r3p1 p2 p3 p4,p5,p6,p7 add r2,r3,r1 add p2,p3,p4p4 p2 p3 p5 p6 p7 sub r2 r1 r3 sub p2 p4 p5p4 p2 p3 p5,p6,p7 sub r2,r1,r3 sub p2,p4,p5p4 p2 p5 p6,p7 mul r2,r3,r3 mul p2,p5,p6p4 p2 p6 p7 div r1,4,r1 div p4,4,p7
• Renaming+ Removes WAW and WAR dependences+ Leaves RAW intact!
EECS 470Lecture 6 Slide 6EECS 470
+ Leaves RAW intact!
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Scheduling Algorithm II: Tomasulo• Tomasulo’s algorithm
• Reservation stations (RS): instruction buffer• Common data bus (CDB): broadcasts results to RS• Register renaming: removes WAR/WAW hazards
• First implementation: IBM 360/91 [1967]• Dynamic scheduling for FP units only• Bypassing
• Our example: “Simple Tomasulo”• Our example: Simple Tomasulo• Dynamic scheduling for everything, including load/store• No bypassing (for comparison with Scoreboard)
EECS 470Lecture 6 Slide 7EECS 470
• 5 RS: 1 ALU, 1 load, 1 store, 2 FP (3-cycle, pipelined)
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo Data Structures• Reservation Stations (RS#)
• FU, busy, op, R: destination register name• T: destination register tag (RS# of this RS)• T1,T2: source register tags (RS# of RS that will produce value)• V1,V2: source register valuesV1,V2: source register values
• That’s new
• Map Table• T: tag (RS#) that will write this register
• Common Data Bus (CDB)• Broadcasts <RS# value> of completed insns• Broadcasts <RS#, value> of completed insns
• Tags interpreted as ready-bits++• T==0 → Value is ready somewhere
EECS 470Lecture 6 Slide 8EECS 470
• T!=0 → Value is not ready, wait until CDB broadcasts T
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Simple Tomasulo Data Structures
valueTMap TableRegfile
DB
.V
DB
.T
V1 V2T2T1Top========
CD
CD
Fetchedinsns
R========
FU
==Reservation Stations
T
==
• Insn fields and status bits• Tags• Values
EECS 470Lecture 6 Slide 9EECS 470
Values
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Simple Tomasulo Pipeline• New pipeline structure: F, D, S, X, W
• D (dispatch)• Structural hazard ? stall : allocate RS entry
• S (issue)• RAW hazard ? wait (monitor CDB) : go to executeRAW hazard ? wait (monitor CDB) : go to execute
• W (writeback)• Write register, free RS entry
W d RAW d d t S i l• W and RAW-dependent S in same cycle• W and structural-dependent D in same cycle
EECS 470Lecture 6 Slide 10EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo Dispatch (D)
valueTMap TableRegfile
DB
.V
DB
.T
V1 V2T2T1Top========
CD
CD
Fetchedinsns
R========
FU
==Reservation Stations
T
==
• Stall for structural (RS) hazards• Allocate RS entry• Input register ready ? read value into RS : read tag into RS
EECS 470Lecture 6 Slide 11EECS 470
• Input register ready ? read value into RS : read tag into RS• Set register status (i.e., rename) for ouput register
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo Issue (S)
valueTMap TableRegfile
DB
.V
DB
.T
V1 V2T2T1Top========
CD
CD
Fetchedinsns
R========
FU
==Reservation Stations
T
==
• Wait for RAW hazards• Read register values from RS
EECS 470Lecture 6 Slide 12EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo Execute (X)
valueTMap TableRegfile
DB
.V
DB
.T
V1 V2T2T1Top========
CD
CD
Fetchedinsns
R========
FU
==Reservation Stations
T
==
EECS 470Lecture 6 Slide 13EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo Writeback (W)
valueTMap TableRegfile
DB
.V
DB
.T
V1 V2T2T1Top========
CD
CD
Fetchedinsns
R========
FU
==Reservation Stations
T
==
• Wait for structural (CDB) hazards• Output Reg Status tag still matches? clear, write result to register• CDB broadcast to RS: tag match ? clear tag, copy value
EECS 470Lecture 6 Slide 14EECS 470
g g, py• Free RS entry
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Difference Between Scoreboard…
XS Insn valueTReg StatusRegfile
R1 R2 T2T1Top========
Fetchedinsns
R========
FU StatusFU
==
T
==
EECS 470Lecture 6 Slide 15EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
…And Tomasulo
valueTMap TableRegfile
DB
.V
DB
.T
V1 V2T2T1Top========
CD
CD
Fetchedinsns
R========
FU
==Reservation Stations
T
==
• What in Tomasulo implements register renaming?• Value copies in RS (V1, V2)• Insn stores correct input values in its own RS entry
EECS 470Lecture 6 Slide 16EECS 470
p y+ Future insns can overwrite master copy in regfile, doesn’t matter
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Value/Copy-Based Register Renaming• Tomasulo-style register renaming
• Called “value-based” or “copy-based” • Names: architectural registers• Storage locations: register file and reservation stations
• Values can and do exist in bothValues can and do exist in both• Register file holds master (i.e., most recent) values+ RS copies eliminate WAR hazards
St l ti f d t i t ll b RS# t• Storage locations referred to internally by RS# tags• Register table translates names to tags• Tag == 0 value is in register file• Tag != 0 value is not ready and is being computed by RS#
• CDB broadcasts values with tags attached• So insns know what value they are looking at
EECS 470Lecture 6 Slide 17EECS 470
So insns know what value they are looking at
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Value-Based Renaming Exampleldf X(r1),f1 (allocated RS#2)
• MT[r1] == 0 → RS[2].V2 = RF[r1] • MT[f1] = RS#2• MT[f1] = RS#2
mulf f0,f1,f2 (allocate RS#4)• MT[f0] == 0 → RS[4].V1 = RF[f0]
MT[f1] RS#2 RS[4] T2 RS#2• MT[f1] == RS#2 → RS[4].T2 = RS#2• MT[f2] = RS#4
addf f7,f8,f0C it RF[f0] b f lf t h ?
Map TableReg Tf0• Can write RF[f0] before mulf executes, why?
ldf X(r1),f1• Can write RF[f1] before mulf executes, why?
C [ ] b f f h ?
f0f1 RS#2f2 RS#4r1
• Can write RF[f1] before first ldf, why?
Reservation StationsT FU busy op R T1 T2 V1 V2
r1
EECS 470Lecture 6 Slide 18EECS 470
2 LD yes ldf f1 - - - [r1]4 FP1 yes mulf f2 - RS#2 [f0]
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo Data StructuresInsn StatusInsn D S X Wldf X(r1),f1
Map TableReg Tf0
CDBT V
ldf X(r1),f1mulf f0,f1,f2stf f2,Z(r1) addi r1,4,r1
f0f1f2r1
ldf X(r1),f1mulf f0,f1,f2stf f2,Z(r1)
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU no1 ALU no2 LD no3 ST no4 FP1 no
EECS 470Lecture 6 Slide 19EECS 470
5 FP2 no
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 1Insn StatusInsn D S X Wldf X(r1),f1 c1
Map TableReg Tf0
CDBT V
ldf X(r1),f1 c1mulf f0,f1,f2stf f2,Z(r1) addi r1,4,r1
f0f1 RS#2f2r1
ldf X(r1),f1mulf f0,f1,f2stf f2,Z(r1)
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU no1 ALU no2 LD yes ldf f1 - - - [r1]3 ST no4 FP1 no
allocate
EECS 470Lecture 6 Slide 20EECS 470
5 FP2 no
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 2Insn StatusInsn D S X Wldf X(r1),f1 c1 c2
Map TableReg Tf0
CDBT V
ldf X(r1),f1 c1 c2mulf f0,f1,f2 c2stf f2,Z(r1) addi r1,4,r1
f0f1 RS#2f2 RS#4r1
ldf X(r1),f1mulf f0,f1,f2stf f2,Z(r1)
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU no1 ALU no2 LD yes ldf f1 - - - [r1]3 ST no4 FP1 yes mulf f2 - RS#2 [f0] - allocate
EECS 470Lecture 6 Slide 21EECS 470
y # [ ]5 FP2 no
allocate
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 3Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3
Map TableReg Tf0
CDBT V
ldf X(r1),f1 c1 c2 c3mulf f0,f1,f2 c2stf f2,Z(r1) c3addi r1,4,r1
f0f1 RS#2f2 RS#4r1
ldf X(r1),f1mulf f0,f1,f2stf f2,Z(r1)
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU no1 ALU no2 LD yes ldf f1 - - - [r1]3 ST yes stf - RS#4 - - [r1]4 FP1 yes mulf f2 - RS#2 [f0] -
allocate
EECS 470Lecture 6 Slide 22EECS 470
y # [ ]5 FP2 no
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 4Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3 c4
Map TableReg Tf0
CDBT VRS#2 [f1]ldf X(r1),f1 c1 c2 c3 c4
mulf f0,f1,f2 c2 c4stf f2,Z(r1) c3addi r1,4,r1 c4
f0f1 RS#2f2 RS#4r1 RS#1
RS#2 [f1]
ldf X(r1),f1mulf f0,f1,f2stf f2,Z(r1)
ldf finished (W) clear f1 RegStatus
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU yes addi r1 - - [r1] - allocate
gCDB broadcast
1 ALU yes addi r1 [r1]2 LD no3 ST yes stf - RS#4 - - [r1]4 FP1 yes mulf f2 - RS#2 [f0] CDB.V
allocatefree
RS#2 ready →
EECS 470Lecture 6 Slide 23EECS 470
y # [ ]5 FP2 no
S# y →grab CDB value
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 5Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3 c4
Map TableReg Tf0
CDBT V
ldf X(r1),f1 c1 c2 c3 c4mulf f0,f1,f2 c2 c4 c5stf f2,Z(r1) c3addi r1,4,r1 c4 c5
f0f1 RS#2f2 RS#4r1 RS#1
ldf X(r1),f1 c5mulf f0,f1,f2stf f2,Z(r1)
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU yes addi r1 - - [r1] -1 ALU yes addi r1 [r1]2 LD yes ldf f1 - RS#1 - -3 ST yes stf - RS#4 - - [r1]4 FP1 yes mulf f2 - - [f0] [f1]
allocate
EECS 470Lecture 6 Slide 24EECS 470
y [ ] [ ]5 FP2 no
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 6Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3 c4
Map TableReg Tf0
CDBT V
ldf X(r1),f1 c1 c2 c3 c4mulf f0,f1,f2 c2 c4 c5+stf f2,Z(r1) c3addi r1,4,r1 c4 c5 c6
f0f1 RS#2f2 RS#4RS#5r1 RS#1
ldf X(r1),f1 c5mulf f0,f1,f2 c6stf f2,Z(r1)
no D stall on WAW: scoreboard wouldoverwrite f2 RegStatusanyone who needs old f2 tag has it
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU yes addi r1 - - [r1] -
anyone who needs old f2 tag has it
1 ALU yes addi r1 [r1]2 LD yes ldf f1 - RS#1 - -3 ST yes stf - RS#4 - - [r1]4 FP1 yes mulf f2 - - [f0] [f1]
EECS 470Lecture 6 Slide 25EECS 470
y [ ] [ ]5 FP2 yes mulf f2 - RS#2 [f0] - allocate
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 7Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3 c4
Map TableReg Tf0
CDBT VRS#1 [r1]ldf X(r1),f1 c1 c2 c3 c4
mulf f0,f1,f2 c2 c4 c5+stf f2,Z(r1) c3addi r1,4,r1 c4 c5 c6 c7
f0f1 RS#2f2 RS#5r1 RS#1
RS#1 [r1]
ldf X(r1),f1 c5 c7mulf f0,f1,f2 c6stf f2,Z(r1)
no W wait on WAR: scoreboard wouldanyone who needs old r1 has RS copy
D stall on store RS: structural
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU no
addi finished (W) clear r1 RegStatusCDB broadcast
1 ALU no2 LD yes ldf f1 - RS#1 - CDB.V3 ST yes stf - RS#4 - - [r1]4 FP1 yes mulf f2 - - [f0] [f1]
RS#1 ready →grab CDB value
EECS 470Lecture 6 Slide 26EECS 470
y [ ] [ ]5 FP2 yes mulf f2 - RS#2 [f0] -
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 8Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3 c4
Map TableReg Tf0
CDBT VRS#4 [f2]ldf X(r1),f1 c1 c2 c3 c4
mulf f0,f1,f2 c2 c4 c5+ c8stf f2,Z(r1) c3 c8addi r1,4,r1 c4 c5 c6 c7
f0f1 RS#2f2 RS#5r1
RS#4 [f2]
ldf X(r1),f1 c5 c7 c8mulf f0,f1,f2 c6stf f2,Z(r1)
mulf finished (W) don’t clear f2 RegStatusalready overwritten by 2nd mulf (RS#5)
CDB b d tReservation StationsT FU busy op R T1 T2 V1 V21 ALU no
CDB broadcast
1 ALU no2 LD yes ldf f1 - - - [r1]3 ST yes stf - RS#4 - CDB.V [r1]4 FP1 no
RS#4 ready →grab CDB value
EECS 470Lecture 6 Slide 27EECS 470
5 FP2 yes mulf f2 - RS#2 [f0] -g
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 9Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3 c4
Map TableReg Tf0
CDBT VRS#2 [f1]ldf X(r1),f1 c1 c2 c3 c4
mulf f0,f1,f2 c2 c4 c5+ c8stf f2,Z(r1) c3 c8 c9addi r1,4,r1 c4 c5 c6 c7
f0f1 RS#2f2 RS#5r1
RS#2 [f1]
ldf X(r1),f1 c5 c7 c8 c9mulf f0,f1,f2 c6 c9stf f2,Z(r1)
2nd ldf finished (W) clear f1 RegStatusCDB broadcast
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU no1 ALU no2 LD no3 ST yes stf - - - [f2] [r1]4 FP1 no
EECS 470Lecture 6 Slide 28EECS 470
5 FP2 yes mulf f2 - RS#2 [f0] CDB.V RS#2 ready →grab CDB value
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Tomasulo: Cycle 10Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3 c4
Map TableReg Tf0
CDBT V
ldf X(r1),f1 c1 c2 c3 c4mulf f0,f1,f2 c2 c4 c5+ c8stf f2,Z(r1) c3 c8 c9 c10addi r1,4,r1 c4 c5 c6 c7
f0f1f2 RS#5r1
ldf X(r1),f1 c5 c7 c8 c9mulf f0,f1,f2 c6 c9 c10stf f2,Z(r1) c10
stf finished (W) no output register → no CDB broadcast
Reservation StationsT FU busy op R T1 T2 V1 V21 ALU no1 ALU no2 LD no3 ST yes stf - RS#5 - - [r1]4 FP1 no
free → allocate
EECS 470Lecture 6 Slide 29EECS 470
5 FP2 yes mulf f2 - - [f0] [f1]
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Scoreboard vs. TomasuloScoreboard Tomasulo
Insn D S X W D S X Wldf X(r1),f1 c1 c2 c3 c4 c1 c2 c3 c4ldf X(r1),f1 c1 c2 c3 c4 c1 c2 c3 c4mulf f0,f1,f2 c2 c4 c5+ c8 c2 c4 c5+ c8stf f2,Z(r1) c3 c8 c9 c10 c3 c8 c9 c10addi r1,4,r1 c4 c5 c6 c9 c4 c5 c6 c7ldf X(r1),f1 c5 c9 c10 c11 c5 c7 c8 c9mulf f0,f1,f2 c8 c11 c12+ c15 c6 c9 c10+ c13stf f2,Z(r1) c10 c15 c16 c17 c10 c13 c14 c15
Hazard Scoreboard TomasuloInsn buffer stall in D stall in DFU wait in S wait in SRAW wait in S wait in SWAR wait in W noneWAW stall in D none
EECS 470Lecture 6 Slide 30EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Scoreboard vs. Tomasulo II: Cache MissScoreboard Tomasulo
Insn D S X W D S X Wldf X(r1),f1 c1 c2 c3+ c8 c1 c2 c3+ c8ldf X(r1),f1 c1 c2 c3+ c8 c1 c2 c3+ c8mulf f0,f1,f2 c2 c8 c9+ c12 c2 c8 c9+ c12stf f2,Z(r1) c3 c12 c13 c14 c3 c12 c13 c14addi r1,4,r1 c4 c5 c6 c13 c4 c5 c6 c7ldf X(r1),f1 c8 c13 c14 c15 c5 c7 c8 c9mulf f0,f1,f2 c12 c15 c16+ c19 c6 c9 c10+ c13stf f2,Z(r1) c13 c19 c20 c21 c7 c13 c14 c15
• Assume• 5 cycle cache miss on first ldf• Ignore FUST and RS structural hazards• Ignore FUST and RS structural hazards+ Advantage Tomasulo
• No addi WAR hazard (c7) means iterations run in parallel
EECS 470Lecture 6 Slide 31EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Can We Add Superscalar?• Dynamic scheduling and multiple issue are orthogonal
• E.g., Pentium4: dynamically scheduled 5-way superscalar• Two dimensions
• N: superscalar width (number of parallel operations)• W: window size (number of reservation stations)W: window size (number of reservation stations)
• What do we need for an N-by-W Tomasulo?• RS: N tag/value w-ports (D), N value r-ports (S), 2N tag CAMs (W)• Select logic: W→N priority encoder (S)• MT: 2N r-ports (D) N w-ports (D)• MT: 2N r ports (D), N w ports (D)• RF: 2N r-ports (D), N w-ports (W)• CDB: N (W)
Whi h th i i ?
EECS 470Lecture 6 Slide 32EECS 470
• Which are the expensive pieces?
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Superscalar Select Logic• Superscalar select logic: W→N priority encoder
– Somewhat complicated (N2 logW)• Can simplify using different RS designs
• Split design• Divide RS into N banks: 1 per FU?• Divide RS into N banks: 1 per FU? • Implement N separate W/N→1 encoders+ Simpler: N * logW/N– Less scheduling flexibility
• FIFO design [Palacharla+]• Can issue only head of each RS bank• Can issue only head of each RS bank + Simpler: no select logic at all– Less scheduling flexibility (but surprisingly not that bad)
EECS 470Lecture 6 Slide 33EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Can We Add Bypassing?
valueTMap TableRegfile
DB
.V
DB
.T
V1 V2T2T1Top========
CD
CD
Fetchedinsns
R======== ==
Reservation StationsT
==
• Yes, but it’s more complicated than you might think
FU
EECS 470Lecture 6 Slide 34EECS 470
• In fact: requires a completely new pipeline
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Why Out-of-Order Bypassing Is HardNo Bypassing Bypassing
Insn D S X W D S X Wldf X(r1),f1 c1 c2 c3 c4 c1 c2 c3 c4ldf X(r1),f1 c1 c2 c3 c4 c1 c2 c3 c4mulf f0,f1,f2 c2 c4 c5+ c8 c2 c3 c4+ c7stf f2,Z(r1) c3 c8 c9 c10 c3 c6 c7 c8addi r1,4,r1 c4 c5 c6 c7 c4 c5 c6 c7ldf X(r1),f1 c5 c7 c8 c9 c5 c7 c7 c9mulf f0,f1,f2 c6 c9 c10+ c13 c6 c9 c8+ c13stf f2,Z(r1) c10 c13 c14 c15 c10 c13 c11 c15
• Bypassing: ldf X in c3 → mulf X in c4 → mulf S in c3• But how can mulf S in c3 if ldf W in c4? Must change pipeline
M d h d l• Modern scheduler• Split CDB tag and value, move tag broadcast to S
• ldf tag broadcast now in cycle 2 → mulf S in cycle 3
EECS 470Lecture 6 Slide 35EECS 470
g y y• How do multi-cycle operations work? How do cache misses work?
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Dynamic Scheduling Summary• Dynamic scheduling: out-of-order execution
• Higher pipeline/FU utilization, improved performance• Easier and more effective in hardware than software
+ More storage locations than architectural registers+ Dynamic handling of cache misses+ Dynamic handling of cache misses
• Instruction buffer: multiple F/D latches• Implements large scheduling scope + “passing” functionality• Split decode into in-order dispatch and out-of-order issue
• Stall vs. wait
• Dynamic scheduling algorithmsDynamic scheduling algorithms• Scoreboard: no register renaming, limited out-of-order• Tomasulo: copy-based register renaming, full out-of-order
EECS 470Lecture 6 Slide 36EECS 470
© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Are we done?
• When can Tomasulo go wrong?• Exceptions!!
N t fi t l ti d f i t ti i RS• No way to figure out relative order of instructions in RS
• Next Lecture: How to make exceptions work• Next Lecture: How to make exceptions work
EECS 470Lecture 6 Slide 37EECS 470