Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
1ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Exam #2 ResultsExam #2 Results
Pipeline HazardsPipeline Hazards
Jin-Soo Kim ([email protected])Computer Systems Laboratory
Sungkyunkwan Universityhttp://csl.skku.edu
3ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
HazardsHazards What are hazards?
• Situations that prevent starting the next instruction in the next cycle
Structure hazards• A required resource is busy
Data hazards• Need to wait for previous instruction to complete its
data read/write
Control hazards• Deciding on control action depends on previous
instruction
4ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Structure HazardsStructure Hazards Conflict for use of a resource
In MIPS pipeline with a single memory• Load/store requires data access• Instruction fetch would have to stall for that cycle Would cause a pipeline “bubble”
Hence, pipelined datapaths require separate instruction/data memories• Or separate instruction/data caches
5ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Data Hazards (1)Data Hazards (1) An instruction depends on completion of data
access by a previous instruction
add $s0, $t0, $t1sub $t2, $s0, $t3
6ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Data Hazards (2)Data Hazards (2) Read After Write (RAW)
• Inst J tries to read operand before Inst I writes it:
• Caused by a “(true) dependence”. This hazard results from an actual need for communication.
I: add $t1, $t2, $t3J: sub $t4, $t1, $t3
7ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Data Hazards (3)Data Hazards (3) Write After Read (WAR)
• Inst J writes operand before Inst I reads it:
• Called an “anti-dependence” by compiler writers. This results from the reuse of the name “$t1”.
I: sub $t4, $t1, $t3J: add $t1, $t2, $t3K: add $t6, $t1, $t7
8ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Data Hazards (4)Data Hazards (4) Write After Write (WAW)
• Inst J writes operand before Inst I writes it:
• Called an “output dependence” by compiler writers. This also results from the reuse of the name “$t1”.
I: sub $t1, $t4, $t3J: add $t1, $t2, $t3K: add $t6, $t1, $t7
9ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
ForwardingForwarding Forwarding (aka Bypassing)
• Use result when it is computed• Don’t wait for it to be stored in a register• Requires extra connections in the datapath
10ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Forwarding ExampleForwarding Example Consider this sequence:
We can resolve hazards with forwarding• How do we detect when to forward?
sub $2, $1, $3and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15, 100($2)
11ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Dependencies & ForwardingDependencies & Forwarding
12ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Forwarding Conditions (1)Forwarding Conditions (1) Detecting the need to forward
• Pass register numbers along pipeline– e.g., ID/EX.RegisterRs = register number for Rs sitting in
ID/EX pipeline register
• ALU operand register numbers in EX stage are given by ID/EX.RegisterRs & ID/EX.RegisterRt
• Data hazards when1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1b. EX/MEM.RegisterRd = ID/EX.RegisterRt2a. MEM/WB.RegisterRd = ID/EX.RegisterRs2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
Fwd from EX/MEMpipeline reg
Fwd from MEM/WBpipeline reg
13ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Forwarding Conditions (2)Forwarding Conditions (2) Detecting the need to forward (cont’d)
• But only if forwarding instruction will write to a register!– EX/MEM.RegWrite, MEM/WB.RegWrite
• And only if Rd for that instruction is not $zero– EX/MEM.RegisterRd ≠ 0,
MEM/WB.RegisterRd ≠ 0
14ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Forwarding Conditions (3)Forwarding Conditions (3) Forwarding paths
15ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Forwarding Conditions (4)Forwarding Conditions (4) EX hazard
• if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10 • if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))ForwardB = 10
MEM hazard• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))ForwardA = 01
• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
16ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Double Data HazardDouble Data Hazard Consider this sequence:
Both hazards occur• Want to use the most recent
Revise MEM hazard condition• Only forward if EX hazard condition isn’t true
add $1, $1, $2add $1, $1, $3add $1, $1, $4
17ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Rev’d Forwarding ConditionsRev’d Forwarding Conditions MEM hazard
• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
18ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Datapath with ForwardingDatapath with Forwarding
19ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Load-Use Data Hazard (1)Load-Use Data Hazard (1) Load-Use data hazards
• Can’t always avoids stalls by forwarding• If value not computed when needed• Can’t forward backward in time!
20ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Load-Use Data Hazard (2)Load-Use Data Hazard (2)
Need to stall for one cycle
21ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Load-Use Data Hazard (3)Load-Use Data Hazard (3) Load-use hazard detection
• Check when using instruction is decoded in ID stage• ALU operand register numbers in ID stage are given
by IF/ID.RegisterRs & IF/ID.RegisterRt• Load-use hazard when
– ID/EX.MemRead and((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))
• If detected, stall and insert bubble
22ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Load-Use Data Hazard (4)Load-Use Data Hazard (4) How to stall the pipeline?
• Force control values in ID/EX register to 0– EX, MEM, and WB do nop (no-operation)
• Prevent update of PC and IF/ID register– Using instruction is decoded again– Following instruction is fetched again– 1-cycle stall allows MEM to read data for lw
» Can subsequently forward to EX stage
23ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Load-Use Data Hazard (5)Load-Use Data Hazard (5) Stall/Bubble in the pipeline
Stall inserted here
24ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Load-Use Data Hazard (6)Load-Use Data Hazard (6) Stall/Bubble in the pipeline (cont’d)
Or, more accurately...
25ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Datapath + Hazard DetectionDatapath + Hazard Detection
26ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Compiler Approach (1)Compiler Approach (1) Stalls reduce performance
• But are required to get correct results
Compiler can arrange code to avoid hazards and stalls• Requires knowledge of the pipeline structure
27ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Compiler Approach (2)Compiler Approach (2) Code scheduling to avoid stalls
• Reorder code to avoid use of load result in the next instruction
(C code) A = B + E; C = B + F;
lw $t1, 0($t0)lw $t2, 4($t0)add $t3, $t1, $t2sw $t3, 12($t0)lw $t4, 8($t0)add $t5, $t1, $t4sw $t5, 16($t0)
lw $t1, 0($t0)lw $t2, 4($t0)lw $t4, 8($t0) add $t3, $t1, $t2sw $t3, 12($t0)add $t5, $t1, $t4sw $t5, 16($t0)
stall
stall
13 cycles 11 cycles
28ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Control Hazards (1)Control Hazards (1) Branch determines flow of control
• Fetching next instruction depends on branch outcome
• Pipeline can’t always fetch correct instruction– Still working on ID stage of branch
In MIPS pipeline• Need to compare registers and compute target early
in the pipeline• Add hardware to do it in ID stage
29ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Control Hazards (2)Control Hazards (2) Stall on branch
• If branch outcome determined in MEM:
PC
Flush theseinstructions(Set controlvalues to 0)
30ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Control Hazards (3)Control Hazards (3) Reducing branch delay
• Move hardware to determine outcome to ID stage– Target address adder– Register comparator
Example: branch taken
36: sub $10, $4, $840: beq $1, $3, 744: and $12, $2, $548: or $13, $2, $652: add $14, $4, $256: slt $15, $6, $7
...72: lw $4, 50($7)
31ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Control Hazards (4)Control Hazards (4)
32ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Control Hazards (5)Control Hazards (5)
33ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Data Hazards for Branches (1)Data Hazards for Branches (1) Data hazards for branches
• If a comparison register is a destination of 2nd or 3rd preceding ALU instruction Can resolve using forwarding
…
add $4, $5, $6
add $1, $2, $3
beq $1, $4, target
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
34ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Data Hazards for Branches (2)Data Hazards for Branches (2) Data hazards for branches (cont’d)
• If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction Need 1 stall cycle
beq stalled
add $4, $5, $6
lw $1, addr
beq $1, $4, target
IF ID EX MEM WB
IF ID EX MEM WB
IF ID
ID EX MEM WB
35ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Data Hazards for Branches (3)Data Hazards for Branches (3) Data hazards for branches (cont’d)
• If a comparison register is a destination of immediately preceding load instruction Need 2 stall cycles
beq stalled
beq stalled
lw $1, addr
beq $1, $0, target
IF ID EX MEM WB
IF ID
ID
ID EX MEM WB
36ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Branch Prediction (1)Branch Prediction (1) Branch prediction
• Longer pipelines can’t readily determine branch outcome early– Stall penalty becomes unacceptable
• Predict outcome of branch– Only stall if prediction is wrong
• In MIPS pipeline– Can predict branches not taken– Fetch instruction after branch, with no delay
37ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Branch Prediction (2)Branch Prediction (2) Static branch prediction
• Based on typical branch behavior• Example: loop and if-statement branches
– Predict backward branches taken– Predict forward branches not taken
Dynamic branch prediction• Hardware measures actual branch behavior
– e.g., record recent history of each branch
• Assume future behavior will continue the trend– When wrong, stall while re-fetching, and update history
38ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Static Branch PredictionStatic Branch Prediction MIPS with “Predict Not Taken”
Prediction correct
Prediction incorrect
39ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Dynamic Branch Prediction (1)Dynamic Branch Prediction (1) Dynamic branch prediction
• In deeper and superscalar pipelines, branch penalty is more significant
• Use dynamic prediction– Branch prediction buffer (aka branch history table)– Indexed by recent branch instruction addresses– Stores outcome (taken/not taken)– To execute a branch
» Check table, expect the same outcome» Start fetching from fall-through or target» If wrong, flush pipeline and flip prediction
40ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Dynamic Branch Prediction (2)Dynamic Branch Prediction (2) 1-bit predictor: shortcoming
• Inner loop branches mispredicted twice!
– Mispredict as taken on last iteration of inner loop– Then mispredict as not taken on first iteration of inner loop
next time around
outer: ……
inner: ……beq …, …, inner…beq …, …, outer
41ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Dynamic Branch Prediction (3)Dynamic Branch Prediction (3) 2-bit predictor
• Only change prediction on two successive mispredictions
42ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Dynamic Branch Prediction (4)Dynamic Branch Prediction (4) Calculating the branch target
• Even with predictor, still need to calculate the target address– 1-cycle penalty for a taken branch
• Branch target buffer– Cache of target addresses– Indexed by PC when instruction fetched
» If hit and instruction is branch predicted taken, can fetch target immediately
43ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim ([email protected])
Compiler ApproachCompiler Approach Delayed branch
• Branch delay slot filled by a useful instruction• Done by compilers and assemblers