Upload
jatin
View
34
Download
2
Embed Size (px)
DESCRIPTION
Pipelining IV. Systems I. Topics Implementing pipeline control Pipelining and performance analysis. Implementing Pipeline Control. Combinational logic generates pipeline control signals Action occurs at start of following cycle. Initial Version of Pipeline Control. bool F_stall = - PowerPoint PPT Presentation
Citation preview
Pipelining IVPipelining IV
TopicsTopics Implementing pipeline control Pipelining and performance analysis
Systems I
2
Implementing Pipeline Control
Combinational logic generates pipeline control signals Action occurs at start of following cycle
E
M
W
F
D
CCCC
rB
srcA
srcB
icode valE valM dstE dstM
Bchicode valE valA dstE dstM
icode ifun valC valA valB dstE dstM srcA srcB
valC valPicode ifun rA
predPC
d_srcB
d_srcA
e_Bch
D_icode
E_icode
M_icode
E_dstM
Pipecontrollogic
D_bubble
D_stall
E_bubble
F_stall
3
Initial Version of Pipeline Controlbool F_stall =
# Conditions for a load/use hazardE_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||# Stalling at fetch while ret passes through pipelineIRET in { D_icode, E_icode, M_icode };
bool D_stall = # Conditions for a load/use hazardE_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
bool D_bubble =# Mispredicted branch(E_icode == IJXX && !e_Bch) ||# Bubble for ret IRET in { D_icode, E_icode, M_icode };
bool E_bubble =# Mispredicted branch(E_icode == IJXX && !e_Bch) ||# Load/use hazardE_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB};
4
Control Combinations
Special cases that can arise on same clock cycle
Combination ACombination A Not-taken branch ret instruction at branch target
Combination BCombination B Instruction that reads from memory to %esp Followed by ret instruction
LoadE
UseD
M
Load/use
JXXE
D
M
Mispredict
JXXE
D
M
Mispredict
EretD
M
ret 1
retE
bubbleD
M
ret 2
bubbleE
bubbleD
retM
ret 3
EretD
M
ret 1
EretD
M
ret 1
retE
bubbleD
M
ret 2
retE
bubbleD
M
ret 2
bubbleE
bubbleD
retM
ret 3
bubbleE
bubbleD
retM
ret 3
Combination B
Combination A
5
Control Combination A
Should handle as mispredicted branch Stalls F pipeline register But PC selection logic will be using M_valM anyhow
JXXE
D
M
Mispredict
JXXE
D
M
Mispredict
E
retD
M
ret 1
E
retD
M
ret 1
E
retD
M
ret 1
Combination A
Condition F D E M W
Processing ret stall bubble normal normal normal
Mispredicted Branch normal bubble bubble normal normal
Combination stall bubble bubble normal normal
6
Stall in F
Your book provides two inconsistent meanings for “stall in F”
Instruction remains in F and injects a bubble into D
Instruction squashed into D, same PC fetched
Figure 4.61
Use the one that keeps 1 instr per pipeline stage
E M WD
F
F D E M WF D E M W
7
JXX + ret works great!
0x000: xorl %eax,%eax
1 2 3 4 5 6 7 8 9
F D E M WF D E M W
0x002: jne target # Not taken F D E M WF D E M W
E M W
10
0x011: t: ret # Target
bubble
0x012: nop # Target + 1
F D
E M W
D
F
bubble
0x007: irmovl $1,%eax # Fall through
0x00d: nop
F D E M WF D E M W
F D E M WF D E M W
8
Control Combination B
Would attempt to bubble and stall pipeline register D Signaled by processor as pipeline error
LoadE
UseD
M
Load/use
E
retD
M
ret 1
E
retD
M
ret 1
E
retD
M
ret 1
Combination B
Condition F D E M W
Processing ret stall bubble normal normal normal
Load/Use Hazard stall stall bubble normal normal
Combination stall bubble + stall
bubble normal normal
9
Handling Control Combination B
Load/use hazard should get priority ret instruction should be held in decode stage for additional
cycle
LoadE
UseD
M
Load/use
E
retD
M
ret 1
E
retD
M
ret 1
E
retD
M
ret 1
Combination B
Condition F D E M W
Processing ret stall bubble normal normal normal
Load/Use Hazard stall stall bubble normal normal
Combination stall stall bubble normal normal
10
Corrected Pipeline Control Logic
Load/use hazard should get priority ret instruction should be held in decode stage for additional
cycle
Condition FF DD EE MM WW
Processing ret stallstall bubblebubble normalnormal normalnormal normalnormal
Load/Use Hazard stallstall stallstall bubblebubble normalnormal normalnormal
Combination stallstall stallstall bubblebubble normalnormal normalnormal
bool D_bubble =# Mispredicted branch(E_icode == IJXX && !e_Bch) ||# Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode } # but not condition for a load/use hazard && !(E_icode in { IMRMOVL, IPOPL }
&& E_dstM in { d_srcA, d_srcB });
11
Load/use hazard with ret
mrmovl F Dret F
mrmovl F D Eret F Daddl F
mrmovl F D E M bubble Eret F D Daddl F F
mrmovl F D E M W bubble E Mret F D D Eaddl F F bubble Daddl F
12
Pipeline Summary
Data Hazards Most handled by forwarding
No performance penalty
Load/use hazard requires one cycle stall
Control Hazards Cancel instructions when detect mispredicted branch
Two clock cycles wasted
Stall fetch stage while ret passes through pipelineThree clock cycles wasted
Control Combinations Must analyze carefully First version had subtle bug
Only arises with unusual instruction combination
13
Performance Analysis with Pipelining
Ideal pipelined machine: CPI = 1Ideal pipelined machine: CPI = 1 One instruction completed per cycle But much faster cycle time than unpipelined machine
However - hazards are working against the idealHowever - hazards are working against the ideal Hazards resolved using forwarding are fine Stalling degrades performance and instruction comletion
rate is interrupted
CPI is measure of “architectural efficiency” of designCPI is measure of “architectural efficiency” of design
Cycle
Seconds
nInstructio
Cycles
Program
nsInstructio
Program
Seconds timeCPU
14
CPI for PIPE
CPI CPI 1.0 1.0 Fetch instruction each clock cycle Effectively process new instruction almost every cycle
Although each individual instruction has latency of 5 cycles
CPI CPI >> 1.0 1.0 Sometimes must stall or cancel branches
Computing CPIComputing CPI C clock cycles I instructions executed to completion B bubbles injected (C = I + B)
CPI = C/I = (I+B)/I = 1.0 + B/I Factor B/I represents average penalty due to bubbles
15
Computing CPI
CPICPI Function of useful instruction and bubbles
Cb/Ci represents the pipeline penalty due to stalls
Can reformulate to account forCan reformulate to account for load penalties (lp) branch misprediction penalties (mp) return penalties (rp)
CPI Ci CbCi
1.0 CbCi
CPI 1.0 lpmp rp
16
Computing CPI - II
So how do we determine the penalties?So how do we determine the penalties? Depends on how often each situation occurs on average How often does a load occur and how often does that load
cause a stall? How often does a branch occur and how often is it
mispredicted How often does a return occur?
We can measure theseWe can measure these simulator hardware performance counters
We can estimate through historical averagesWe can estimate through historical averages Then use to make early design tradeoffs for architecture
17
Computing CPI - III
CPI = 1 + 0.31 = 1.31 == 31% worse than idealCPI = 1 + 0.31 = 1.31 == 31% worse than ideal
This gets worse when:This gets worse when: Account for non-ideal memory access latency Deeper pipelines (where stalls per hazard increase)
CauseCause NameName InstructionInstructionFrequencyFrequency
ConditionConditionFrequencyFrequency
StallsStalls ProductProduct
Load/UseLoad/Use lplp 0.300.30 0.30.3 11 0.090.09
MispredictMispredict mpmp 0.200.20 0.40.4 22 0.160.16
ReturnReturn rprp 0.020.02 1.01.0 33 0.060.06
Total penaltyTotal penalty 0.310.31
18
CPI for PIPE (Cont.)B/I = LP + MP + RP
LP: Penalty due to load/use hazard stalling Fraction of instructions that are loads 0.25 Fraction of load instructions requiring stall 0.20 Number of bubbles injected each time 1
LP = 0.25 * 0.20 * 1 = 0.05
MP: Penalty due to mispredicted branches Fraction of instructions that are cond. jumps 0.20 Fraction of cond. jumps mispredicted 0.40 Number of bubbles injected each time 2
MP = 0.20 * 0.40 * 2 = 0.16
RP: Penalty due to ret instructions Fraction of instructions that are returns 0.02 Number of bubbles injected each time 3
RP = 0.02 * 3 = 0.06
Net effect of penalties 0.05 + 0.16 + 0.06 = 0.27 CPI = 1.27 (Not bad!)
Typical Values
19
Summary
TodayToday Pipeline control logic Effect on CPI and performance
Next TimeNext Time Further mitigation of branch mispredictions State machine design