Pipeline Hazards
• Pipeline hazards
  • Situations that prevent the next instruction from being processed in the next stage of the pipeline.
  • They interrupt the synchronous execution of the pipeline and thus reduce performance.
• Solution: suspend the execution of the instruction (pipeline stall)
• If an instruction is suspended in a certain stage of the pipeline, all subsequent instructions are also stopped.
• The pipeline logic inserts NOP operations into the next pipeline stage.
• The processing of all earlier instructions is continued.
Resource Hazards
• Structural hazards
  • Arise when two instructions that are processed in different stages require the same resource.
  • Not all components can be replicated to guarantee that this never happens.
• Examples
  • Parallel writes to the register file, e.g., if arithmetic operations can write their result directly while loads write in the memory access (MA) phase.
  • Parallel accesses to memory in IF and MA.
  • Successive instructions that need the FP division hardware, which is not implemented as a pipeline.
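The IF/MA memory conflict can be made concrete. The following Python sketch assumes an idealized 5-stage pipeline (IF, ID, EX, MA, WB) without stalls, one fetch per cycle, and a single memory port shared by instruction fetch and data access; the `store` opcode and the helper `memory_port_conflicts` are hypothetical.

```python
# Idealized 5-stage pipeline (IF, ID, EX, MA, WB), one instruction per cycle,
# no stalls: instruction k is fetched in cycle k and is in MA in cycle k + 3.
MA_OFFSET = 3
MEMORY_OPCODES = ("load", "store")  # assumed opcode names for memory accesses


def memory_port_conflicts(program):
    """Return (cycle, memory_instr, fetched_instr) index triples for every cycle in
    which a load/store in MA and an instruction fetch in IF both need the single
    memory port (a structural hazard)."""
    conflicts = []
    for k, instr in enumerate(program):
        if instr.split()[0] in MEMORY_OPCODES:
            cycle = k + MA_OFFSET
            fetched = cycle  # without stalls, instruction number `cycle` is fetched now
            if fetched < len(program):  # only fetches inside this program are counted
                conflicts.append((cycle, k, fetched))
    return conflicts


program = ["load R1,A", "add R2,R1,R2", "sub R4,R5,R6", "and R6,R1,R8", "xor R9,R1,R11"]
print(memory_port_conflicts(program))  # [(3, 0, 3)]
```

In cycle 3 the load accesses data memory while the fourth instruction is being fetched, so one of the two accesses has to wait.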
Data Hazards and Control Hazards
• Data hazards
  • Instructions access the same data as earlier instructions that are not yet finished, e.g., an operand computed by a previous instruction is not yet available.
  • Data hazards result from data dependences between instructions.
• Branch (control) hazards
  • The next instruction cannot be fetched because of a jump in the control flow.
Resolving Pipeline Hazards
• The simple solution is to stop the pipeline
  • Insertion of NOPs or pipeline bubbles
  • This reduces the pipeline throughput.
• Many techniques in hardware and software have been developed to reduce the effect of hazards on the performance.
Pipeline Hazards and Data Dependences
• Data dependences occur between statements in the program.
• Example:
add R1,R2,R3
sub R4,R5,R6
and R6,R1,R8
xor R9,R1,R11
Data Dependence
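• An instruction j is data dependent on instruction i if
  • there is a path from i to j, and
  • $\big(I(i) \cap O(j)\big) \cup \big(O(i) \cap I(j)\big) \cup \big(O(i) \cap O(j)\big) \neq \emptyset$,
  where
    – I(i) = set of data read by instruction i
    – O(i) = set of data written by instruction i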
True dependence
• True or flow dependence: first write, then read
  • Example:
LOOP: load F0,0(R1)
add F4,F0,F2
Anti Dependence
• Anti dependence (first read, then write)
  • Instruction i reads an operand from a register or memory location that is overwritten by a later instruction:
ADD R2,R3,R4
XOR R3,R5,R6
Output Dependence
• Output dependence (both write)
  • Instructions i and j write the same register or memory address:
ADD R2,R3,R4
XOR R2,R5,R6
• Anti and output dependences are called name dependences.
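The three intersections in the definition of data dependence can be checked mechanically. A minimal Python sketch, assuming the slides' operand order (first operand = destination), considering registers only (memory contents are not tracked), and assuming a path from i to j; `read_write_sets` and `dependences` are hypothetical helpers:

```python
import re


def read_write_sets(instr):
    """Return (I, O): registers read and written by an instruction of the form
    'op Rd,Rs,Rt' or 'op Rd,offset(Rb)'; the first operand is the destination,
    as in the slides. Memory locations themselves are not tracked."""
    instr = instr.split(":")[-1].strip()          # drop an optional label like 'LOOP:'
    _, operands = instr.split(None, 1)
    dest, *rest = [x.strip() for x in operands.split(",")]
    reads = set()
    for operand in rest:
        base = re.fullmatch(r"-?\d*\((\w+)\)", operand)  # offset(base) addressing
        if base:
            reads.add(base.group(1))
        elif re.fullmatch(r"[A-Z]+\d+", operand):        # plain register such as R1, F2
            reads.add(operand)
    return reads, {dest}


def dependences(i, j):
    """Kinds of data dependence of instruction j on an earlier instruction i."""
    I_i, O_i = read_write_sets(i)
    I_j, O_j = read_write_sets(j)
    kinds = []
    if O_i & I_j:
        kinds.append("true")    # i writes, j reads
    if I_i & O_j:
        kinds.append("anti")    # i reads, j writes (name dependence)
    if O_i & O_j:
        kinds.append("output")  # both write (name dependence)
    return kinds


print(dependences("LOOP: load F0,0(R1)", "add F4,F0,F2"))  # ['true']
print(dependences("ADD R2,R3,R4", "XOR R3,R5,R6"))         # ['anti']
print(dependences("ADD R2,R3,R4", "XOR R2,R5,R6"))         # ['output']
```

Applied to the four-instruction example, `and R6,R1,R8` is truly dependent on `add R1,R2,R3` and anti dependent on `sub R4,R5,R6`.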
Dependences and Pipeline Hazards
• Data dependences are properties of the program.
• Whether a data dependence leads to a pipeline hazard depends on the pipeline organization and on the timing of the instructions' execution.
• Data dependences
  • may induce hazards, i.e., they indicate that a hazard is possible;
  • determine the order in which instructions must be executed.
    – Independent instructions can be reordered and even executed in parallel.
    – Dependences determine the maximum degree of parallelism.
Data Hazards
• Data hazards can occur if data-dependent instructions are executed only a short time apart in the pipeline.
• Their accesses can then overlap in the pipeline.
• Example: true dependence
load R1,A
load R2,B
add R2,R1,R2
mul R1,R2,R1
[Figure: pipeline diagram (stages IF, ID, EX, MA, WB) of the four instructions over cycles t_i to t_i+4.]
Data Hazards
• Example: True dependence
add R2,R1,R2
mul R1,R2,R1
[Figure: pipeline diagram over cycles t_i to t_i+4. mul reads R2 in its ID stage before add has written the new value of R2 in WB, so mul reads the old (wrong) value.]
Data Hazards Classification
• Read-after-write (RAW)
  • Occurs if instruction j reads a source register before instruction i has written its result to it.
  • Implied by a true dependence.
• Write-after-read (WAR)
  • Occurs if instruction j writes its target register before instruction i has read it as an operand.
  • Implied by an anti dependence.
• Write-after-write (WAW)
  • Occurs if instruction j writes its target register before instruction i has written its result to the same register.
  • Implied by an output dependence.
  • Can happen in pipelines where multiple stages can write, or where an instruction can proceed without waiting for a stalled previous instruction.
In each case, inst_i precedes inst_j in program order: inst_i … inst_j
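Whether a dependence actually causes one of these hazards depends on when the accesses happen. The following Python sketch describes each instruction by its register sets and by the times at which it reads its sources and writes its result, and reports a hazard when two accesses to a shared register happen in the opposite order to program order. The class `Access`, the helpers, and the half-cycle timing of the 5-stage example (write in the first half of WB, read in the second half of ID) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Access:
    reads: set          # source registers
    writes: set         # destination registers
    read_time: float    # when the sources are read
    write_time: float   # when the result is written


def hazards(i, j):
    """Hazards between instruction i and a later instruction j: a hazard exists when
    two accesses to a shared register happen in the opposite order to program order."""
    found = []
    if i.writes & j.reads and j.read_time < i.write_time:
        found.append("RAW")  # j reads before i has written
    if i.reads & j.writes and j.write_time < i.read_time:
        found.append("WAR")  # j writes before i has read
    if i.writes & j.writes and j.write_time < i.write_time:
        found.append("WAW")  # j writes before i has written
    return found


def five_stage(k, reads, writes):
    """Instruction k of an in-order 5-stage pipeline without stalls: sources are read
    in ID (cycle k + 1, second half), the result is written in WB (cycle k + 4, first half)."""
    return Access(set(reads), set(writes), read_time=k + 1.5, write_time=k + 4.0)


add = five_stage(0, {"R1", "R2"}, {"R2"})   # add R2,R1,R2
mul = five_stage(1, {"R2", "R1"}, {"R1"})   # mul R1,R2,R1
print(hazards(add, mul))  # ['RAW']

# Hypothetical long-latency instruction that finishes after a later one: WAW.
slow = Access({"R3"}, {"R1"}, read_time=1.5, write_time=10.0)
fast = Access({"R4"}, {"R1"}, read_time=2.5, write_time=5.0)
print(hazards(slow, fast))  # ['WAW']
```

In the in-order 5-stage timing, a later instruction always writes after an earlier one reads, so WAR hazards cannot occur there.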
Handling Hazards
• Software solutions (static solutions)
  • Implemented by the compiler
  • Insertion of NOPs
    – Detection of potential data hazards
    – Insertion of NOPs after instructions that might lead to hazards
  • Reordering of instructions
    – Instruction scheduling phase of the compiler
    – Reorders instructions so that independent instructions are executed between dependent instructions.
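The NOP-insertion approach can be sketched as a small compiler pass in Python. It assumes no forwarding and a register file written in the first half of a cycle and read in the second half, so a consumer must come at least three slots after its producer (`min_distance=3`, i.e., two NOPs between adjacent dependent instructions); `parse` and `insert_nops` are hypothetical names.

```python
def parse(instr):
    """Split 'op Rd,Rs,Rt' into (destination, set of sources); first operand = destination."""
    _, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[0], set(regs[1:])


def insert_nops(program, min_distance=3):
    """Insert NOPs so that every instruction is at least `min_distance` slots after
    any producer of one of its source registers (no forwarding assumed)."""
    out = []
    for instr in program:
        _, sources = parse(instr)
        needed = 0
        # Only the last min_distance - 1 emitted slots can be too close.
        for back in range(1, min(min_distance, len(out) + 1)):
            prev = out[-back]
            if prev != "nop" and parse(prev)[0] in sources:
                needed = max(needed, min_distance - back)
        out.extend(["nop"] * needed)
        out.append(instr)
    return out


print(insert_nops(["add R2,R1,R2", "mul R1,R2,R1"]))
# ['add R2,R1,R2', 'nop', 'nop', 'mul R1,R2,R1']
```

For the earlier four-instruction example only one NOP is needed, because `sub R4,R5,R6` already separates `add R1,R2,R3` from `and R6,R1,R8`.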
Handling Hazards
• Hardware solutions (dynamic solutions)
  • Detection of conflicts
    – Requires appropriate hardware logic
  • Handling
    – Interlocking, stalling
    – Forwarding
    – Forwarding with interlocking
Handling Hazards in the Hardware
• Pipeline interlocking
  • Detection of hazards
  • Stops instruction j and all subsequent instructions for several cycles.
add R2,R1,R2
mul R1,R2,R1
[Figure: pipeline diagram over cycles t_i to t_i+4; mul is stalled after its ID stage until add has written R2.]
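The detection logic behind interlocking can be approximated in Python by comparing the source registers of the instruction in ID with the destinations of the instructions still in EX and MA. This sketch assumes an in-order pipeline without forwarding and a register file that is written in the first half of a cycle and read in the second half; `must_stall` and `count_stalls` are hypothetical names.

```python
def parse(instr):
    """Split 'op Rd,Rs,Rt' into (destination, set of sources); first operand = destination."""
    _, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[0], set(regs[1:])


def must_stall(id_instr, ex_instr, ma_instr):
    """Interlock check without forwarding: stall the instruction in ID if one of its
    sources will still be written by an instruction in EX or MA. An instruction in
    WB does not block (write in the first half, read in the second half)."""
    _, sources = parse(id_instr)
    pending = {parse(x)[0] for x in (ex_instr, ma_instr) if x is not None}
    return bool(sources & pending)


def count_stalls(program):
    """Cycle-by-cycle model of the ID -> EX -> MA hand-over in an in-order pipeline;
    returns the number of stall cycles (bubbles) inserted by the interlock."""
    stalls, ex, ma, pc = 0, None, None, 0
    while pc < len(program):
        if must_stall(program[pc], ex, ma):
            ex, ma = None, ex          # bubble enters EX, instruction stays in ID
            stalls += 1
        else:
            ex, ma = program[pc], ex   # instruction advances from ID to EX
            pc += 1
    return stalls


print(count_stalls(["add R2,R1,R2", "mul R1,R2,R1"]))  # 2
```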
Handling Hazards in the Hardware
• Forwarding
  • Direct forwarding of ALU results to the ALU input
  • Eliminates stall cycles
  • Requires additional hardware (forwarding logic)
add R2,R1,R2
mul R1,R2,R1
[Figure: pipeline diagram over cycles t_i to t_i+4; the ALU result of add is forwarded directly to the ALU input of mul, so mul proceeds without stalls.]
Forwarding and Interlocking
• Not all hazards can be handled by forwarding
  • Example: true dependence on a load operation
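load R2,A
add R1,R2,R1
[Figure: pipeline diagram over cycles t_i to t_i+4; the loaded value is available only at the end of MA, too late to be forwarded to the start of add's EX stage.]
Solution:
Forwarding + Interlocking
[Figure: the same sequence with a stall cycle after add's ID stage; the loaded value is then forwarded to add's EX stage.]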
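With forwarding, the interlock only has to catch the load-use case. A Python sketch, assuming full forwarding of ALU results and the opcode name `load` from the slides; `load_use_stall` and `count_stalls_with_forwarding` are hypothetical names.

```python
def parse(instr):
    """Split 'op Rd,Rs,...' into (opcode, destination, set of sources)."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], set(regs[1:])


def load_use_stall(id_instr, ex_instr):
    """With forwarding, ALU results reach the next EX stage in time; only a load in
    EX whose destination is a source of the instruction in ID forces a stall,
    because the loaded value exists only at the end of MA."""
    if ex_instr is None:
        return False
    op, dest, _ = parse(ex_instr)
    _, _, sources = parse(id_instr)
    return op == "load" and dest in sources


def count_stalls_with_forwarding(program):
    """Number of interlock stall cycles in an in-order pipeline with full forwarding."""
    stalls, ex, pc = 0, None, 0
    while pc < len(program):
        if load_use_stall(program[pc], ex):
            ex = None              # one bubble; the load moves on to MA
            stalls += 1
        else:
            ex = program[pc]
            pc += 1
    return stalls


print(count_stalls_with_forwarding(["load R2,A", "add R1,R2,R1"]))    # 1
print(count_stalls_with_forwarding(["add R2,R1,R2", "mul R1,R2,R1"]))  # 0
```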
MIPS-Pipeline
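[Figure: MIPS pipeline datapath with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. Components: PC, adder (PC + 4), instruction memory, registers (addressed by IR6..10 and IR11..15), sign extension (16 bit to 32 bit), Zero? test, ALU, data memory and multiplexers. Latched values: IR, NPC, A, B, IMM, COND (branch taken), ALU output, LMD.]
Note: based on the lecture notes (Skript) by Wismüller.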
Branch Hazards
• The branch target and condition are computed in the EX phase, and the target replaces the PC in the MA phase.
• The condition typically depends on the result of the previous instruction's EX phase, which requires forwarding.
• Thus, the correct instruction can be fetched only after three cycles.
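To quantify the cost, a back-of-the-envelope Python sketch: if every branch stalls the pipeline for the three cycles stated above and nothing else stalls, the average number of cycles per instruction grows linearly with the fraction of branches. The ideal value of one cycle per instruction, the 20% branch frequency, and the function name are assumptions, not measured data.

```python
def effective_cpi(branch_fraction, branch_penalty=3, ideal_cpi=1.0):
    """Average cycles per instruction when each branch adds `branch_penalty`
    stall cycles (no prediction, no delay slots) to an otherwise ideal pipeline."""
    return ideal_cpi + branch_fraction * branch_penalty


cpi = effective_cpi(0.20)            # assumed: 20% of instructions are branches
print(round(cpi, 2))                 # 1.6
print(round(1.0 / cpi, 3))           # fraction of ideal throughput: 0.625
```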
Branch Hazards
[Figure: pipeline diagram of a JUMP and its target instruction over time; the fetch of the target is delayed by stall cycles until the PC has been updated.]
Branch Hazards
• The condition and target should already be computed in ID
  • Structural hazard:
    – The ALU cannot be used to compute the target, so an additional ALU is required in ID.
  • Data dependence on a previous arithmetic instruction
    – RAW hazard
  • The critical path in the ID phase is lengthened
    – Decoding, computation of the branch target, and updating the PC are all on the critical path.
Resolving Branch Hazards
• Insertion of independent instructions
  • Instruction scheduling by the compiler
  • Fill the stall cycle with an independent instruction (delay slot)
Before:
add R1,R2,R3
br addr
nop
...
After:
br addr
add R1,R2,R3
...
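The compiler's delay-slot filling can be sketched in Python. This simplified version, an assumption rather than a complete scheduler, only considers the instruction immediately before the branch and moves it into the slot if the branch does not read its destination register; `fill_delay_slot` is a hypothetical name.

```python
def fill_delay_slot(block, branch_reads):
    """block: instructions of a basic block whose last element is the branch.
    branch_reads: registers the branch reads (e.g. for its condition).
    Move the instruction directly before the branch into the delay slot if the
    branch does not depend on it; otherwise fill the slot with a nop."""
    *body, branch = block
    if body:
        candidate = body[-1]
        dest = candidate.split(None, 1)[1].split(",")[0].strip()  # first operand = destination
        if dest not in branch_reads:
            return body[:-1] + [branch, candidate]
    return body + [branch, "nop"]


print(fill_delay_slot(["add R1,R2,R3", "br addr"], set()))
# ['br addr', 'add R1,R2,R3']
```

A conditional branch that reads R1 (hypothetical `beqz R1,addr`) would keep the `add` in place and get a `nop` in its delay slot.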
Branch Prediction
• Prediction of the branch decision when a jump is encountered.
• Speculative execution of the instructions that follow the predicted outcome.
• Once the condition has been computed,
  • either execution continues without delay, since the prediction was correct,
  • or the speculatively started instructions are discarded and the correct ones are fetched.
• Two classes
  • Static branch prediction by the hardware or the compiler
  • Dynamic branch prediction by the hardware
Static Branch Prediction
• Hardware
  • Static prediction in the processor: backward jumps are always predicted to be taken.
• Compiler
  • Specified via a bit in the jump opcode
  • The prediction can be guided by program analysis or profiling (feedback-directed compilation)
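The hardware rule needs only an address comparison. A Python sketch, assuming that forward jumps are predicted not taken and that a loop-closing jump is a backward jump; the addresses and function names are hypothetical.

```python
def predict_taken(branch_addr, target_addr):
    """Static hardware prediction: backward jumps (target below the jump) are
    predicted taken; forward jumps are predicted not taken (assumed)."""
    return target_addr < branch_addr


def accuracy(trace):
    """trace: list of (branch_addr, target_addr, taken) tuples."""
    hits = sum(predict_taken(b, t) == taken for b, t, taken in trace)
    return hits / len(trace)


# Hypothetical loop-closing jump at 0x40 back to 0x20, executed 10 times.
loop = [(0x40, 0x20, True)] * 9 + [(0x40, 0x20, False)]
print(accuracy(loop))  # 0.9
```

Only the final, not-taken execution of the loop-closing jump is mispredicted.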
Dynamic Branch Prediction
• Properties
  • Based on the dynamic behavior of the application
    – The history of a jump is taken into account.
  • Leads to more precise predictions
  • Expensive in terms of hardware
• Branch Prediction Buffer
  • Cache for information about conditional jumps
  • Requires that the target can be computed quickly
Branch Prediction Buffer
• Cache Organization
[Figure: prediction buffer with 1024 entries; each entry holds a 20-bit address tag together with its status/prediction bits, and an entry is selected by (Instruction address >> 2) % 1024.]
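The indexing in the figure can be written out in Python. The sketch assumes 32-bit byte addresses with 4-byte aligned instructions, which leaves 32 - 2 - 10 = 20 tag bits, and stores one prediction bit per entry; all names are hypothetical.

```python
ENTRIES = 1024      # number of buffer entries (from the slide)
INDEX_BITS = 10     # log2(ENTRIES)
ADDRESS_BITS = 32   # assumed instruction address width
TAG_BITS = ADDRESS_BITS - 2 - INDEX_BITS  # = 20, matching the 20-bit tag


def bpb_index(pc):
    return (pc >> 2) % ENTRIES


def bpb_tag(pc):
    return (pc >> (2 + INDEX_BITS)) & ((1 << TAG_BITS) - 1)


class PredictionBuffer:
    """Tagged branch prediction buffer with one prediction bit per entry (assumed)."""

    def __init__(self):
        self.entries = [None] * ENTRIES  # None = invalid entry, else (tag, taken)

    def predict(self, pc):
        entry = self.entries[bpb_index(pc)]
        if entry is None or entry[0] != bpb_tag(pc):
            return None  # miss: no information about this jump
        return entry[1]

    def update(self, pc, taken):
        self.entries[bpb_index(pc)] = (bpb_tag(pc), taken)


bpb = PredictionBuffer()
bpb.update(0x1004, True)
print(bpb.predict(0x1004))  # True
print(bpb.predict(0x2004))  # None: same index 1, different tag
```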
Single Bit Predictor
• Single prediction bit
  • If the bit is set, the branch is predicted to be taken.
  • If the prediction is wrong, the bit is inverted.
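[Figure: two states, Predict Taken and Predict Not Taken; a taken jump (T) leads to Predict Taken, a not-taken jump (NT) leads to Predict Not Taken.]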
Single Bit vs Double Bit Predictors
• A single-bit predictor is suboptimal for nested loops.
• It mispredicts in the first iteration of the inner loop, because the bit was inverted when the inner loop last exited.
DO
DO
S1
S2
JUMP
JUMP
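The nested-loop problem can be reproduced by simulating the inner loop's closing jump. A Python sketch with hypothetical loop sizes and function names; the predictor starts with the bit set (predict taken):

```python
def inner_loop_trace(outer, inner):
    """Outcomes of the inner loop's closing jump: taken inner - 1 times, then not taken."""
    return ([True] * (inner - 1) + [False]) * outer


def mispredictions_1bit(trace, bit=True):
    """Single prediction bit: predict `bit`, invert it after every misprediction."""
    misses = 0
    for taken in trace:
        if bit != taken:
            misses += 1
            bit = not bit
    return misses


# Hypothetical sizes: 3 outer iterations, inner loop with 4 iterations.
print(mispredictions_1bit(inner_loop_trace(3, 4)))  # 5
```

After the first run of the inner loop, every run costs two mispredictions: one at the loop exit and one in the first iteration of the next run.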
Two Bit Predictor
• Two bits allow four states:
  – strongly taken
  – weakly taken
  – weakly not taken
  – strongly not taken
• Two mispredictions are required to switch the prediction.
Two Bit Predictor
[Figure: state diagram with the states (11) predict taken, (10) predict taken (weakly taken), (01) predict not taken (weakly not taken) and (00) predict not taken; transitions are labeled T (taken) and NT (not taken).]
Two Bit Predictor
[Figure: the same two-bit state diagram, applied to the nested loop DO / DO / S1 / S2 / JUMP / JUMP.]
Two-Bit Predictor with Saturation Scheme
• Count the taken jumps
  • If the count is >= 2, predict that the jump is taken.
  • Extensible to n bits
• Experiments showed that extending the counter to more bits has no big impact.
[Figure: state diagram of the saturating two-bit counter with the states (11), (10) predict taken and (01), (00) predict not taken; T moves one state toward (11), NT one state toward (00).]
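The saturation scheme generalizes to n bits directly. A Python sketch, assuming the counter starts in the strongly taken state and predicts taken when the count is at least 2^(n-1) (i.e., >= 2 for two bits); names and the trace are hypothetical.

```python
class SaturatingCounter:
    """n-bit saturating counter: count up on a taken jump, down on a not-taken jump;
    predict taken when the count is in the upper half (>= 2 for two bits)."""

    def __init__(self, bits=2):
        self.max = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        self.count = self.max  # assumed start state: strongly taken

    def predict(self):
        return self.count >= self.threshold

    def update(self, taken):
        if taken:
            self.count = min(self.count + 1, self.max)
        else:
            self.count = max(self.count - 1, 0)


def count_mispredictions(trace, bits=2):
    counter = SaturatingCounter(bits)
    misses = 0
    for taken in trace:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# Inner-loop jump of a nested loop (hypothetical: 3 runs of 4 iterations).
trace = ([True] * 3 + [False]) * 3
print(count_mispredictions(trace, bits=2))  # 3: only the loop exits
```

With `bits=1` the counter behaves like the single-bit predictor and produces 5 mispredictions on the same trace; `bits=3` also gives 3 for this trace.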
Size of Prediction Buffer – SPEC 89
Prediction accuracy of a 2-bit predictor (% mispredictions), 4096 entries vs. an unlimited number of entries:
Benchmark    4096 entries    Unlimited
nasa7        1               0
matrix300    0               0
tomcatv      1               0
doduc        5               5
spice        9               9
fpppp        9               9
gcc          12              11
espresso     5               5
eqntott      18              18
li           10              10
Correlation Predictors
• The prediction is also based on the history of other jumps.
• A simple two-bit predictor is not sufficient to predict the third branch in the following code.
• Taking the preceding jumps into account enables a correct prediction.
if (aa==2) aa=0;
if (bb==2) bb=0;
if (aa!=bb) { … }
(m,n)-Predictors
• (m,n)-predictors:
  • Use the history of the last m jumps to select one of 2^m n-bit predictors.
• Branch History Register (BHR)
  • m-bit shift register
  • Stores the global history of the last m jumps; each bit records whether the jump was taken.
  • After each jump, its outcome is shifted into the BHR.
  • The BHR gives the index into the Pattern History Table (PHT).
(m,n) Predictors
• Example: (2,2) Predictor:
[Figure: (2,2) predictor; the 2-bit Branch History Register (a shift register) and the jump address select a 2-bit predictor in the Pattern History Tables (PHTs).]
Branch Target Buffer
• Branch Target Address Cache, Branch Target Buffer
  • Required if the target address is computed late in the pipeline.
  • Stores the address of the jump instruction and its target address.
  • Can be used in the IF phase.
  • Can be combined with a predictor.
[Figure: each BTB entry holds the address of the jump instruction, the target address and prediction bits; timeline over cycles i, i+1 and i+2.]
Branch Target Buffer (BTB)
• Prediction in IF: send the PC to the instruction memory and to the BTB.
• Entry found in the BTB?
  – No: fetch the next instruction. Is it a taken branch?
    – No: normal instruction execution.
    – Yes: update the BTB, kill the fetched instructions, update the PC.
  – Yes: fetch the instruction at the stored target. Is the branch taken?
    – No: mispredicted branch; kill the fetched instructions, update the PC, delete the entry from the BTB.
    – Yes: branch correctly predicted; continue execution with no stalls.
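The flowchart can be sketched in Python as a BTB that only holds taken jumps, so a hit means "predict taken". Non-branch instructions are treated as not taken; the class, its method names, and the addresses in the example are hypothetical.

```python
class BranchTargetBuffer:
    """Maps the address of a jump instruction to its target address; an entry is
    kept only for jumps that were taken, so a hit means 'predict taken'."""

    def __init__(self):
        self.entries = {}

    def fetch_pc(self, pc, next_pc):
        """IF stage: send the PC to the BTB; on a hit fetch at the stored target."""
        return self.entries.get(pc, next_pc)

    def resolve(self, pc, fetched_pc, taken, target, next_pc):
        """Once the outcome is known, update the BTB and return
        (kill_fetched_instructions, correct_pc)."""
        correct_pc = target if taken else next_pc
        if taken:
            self.entries[pc] = target       # insert (or refresh) the taken jump
        else:
            self.entries.pop(pc, None)      # not taken: delete the entry, if any
        return fetched_pc != correct_pc, correct_pc


btb = BranchTargetBuffer()
jump, nxt, target = 0x40, 0x44, 0x20      # hypothetical addresses
pc = btb.fetch_pc(jump, nxt)              # miss: fetch 0x44
print(btb.resolve(jump, pc, True, target, nxt))  # (True, 32): kill, update BTB and PC
pc = btb.fetch_pc(jump, nxt)              # hit: fetch at 0x20
print(btb.resolve(jump, pc, True, target, nxt))  # (False, 32): no stalls
```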