CSCI 620 NOTE8
Instruction Level Parallelism and Tomasulo’s Approach
Instruction Level Parallelism
• Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
• Reduce stalls, reduce CPI
• Reduce CPI, increase IPC
• Instruction-level parallelism (ILP) seeks to reduce stalls
• Importance of ILP is more visible in Loop-level parallelism:
for (i=1; i<1000; i=i+1)
{
x[i] = x[i] + y[i];
}
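The CPI decomposition on this slide can be checked with a little arithmetic. The following is a minimal sketch; the stall contributions are hypothetical numbers chosen for illustration, not measurements, and `pipeline_cpi` is a helper name introduced here.

```python
def pipeline_cpi(ideal_cpi, structural, data_hazard, control):
    """Pipeline CPI = ideal CPI + stall cycles per instruction from each source."""
    return ideal_cpi + structural + data_hazard + control

# Hypothetical stall cycles per instruction (assumed values, for illustration only).
before = pipeline_cpi(1.0, structural=0.25, data_hazard=0.5, control=0.25)
after = pipeline_cpi(1.0, structural=0.25, data_hazard=0.0, control=0.25)

print(before, 1 / before)  # CPI 2.0, IPC 0.5
print(after, 1 / after)    # CPI 1.5; IPC rises because the data hazard stalls are gone
```

Removing any one stall term lowers CPI and therefore raises IPC (= 1 / CPI), which is the goal of the ILP techniques that follow.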
Major Techniques to Increase ILP
Technique | Reduces
Forwarding and bypassing | Potential data hazard stalls
Delayed branches and simple branch scheduling | Control hazard stalls
Basic dynamic scheduling (scoreboarding) | Data hazard stalls from true dependences
Dynamic scheduling with renaming | Data hazard stalls and stalls from antidependences and output dependences
Dynamic branch prediction | Control stalls
Issuing multiple instructions per cycle | Ideal CPI
Speculation | Data hazard and control hazard stalls
Dynamic memory disambiguation | Data hazard stalls with memory
Loop unrolling | Control hazard stalls
Basic compiler pipeline scheduling | Data hazard stalls
Compiler dependence analysis | Ideal CPI, data hazard stalls
Software pipelining, trace scheduling | Ideal CPI, data hazard stalls
Compiler speculation | Ideal CPI, data hazard stalls, control stalls
Instruction Level Parallelism
• ILP by SW (static) or HW (dynamic) techniques
• Hardware-intensive ILP dominates the desktop and server markets
• Compiler-intensive (SW) approaches are more likely to be seen in embedded systems, although IA-64 also uses this approach
Dependences
• Two instructions are parallel if they can execute simultaneously in a pipeline without causing any stalls (assuming no structural hazards) and can be reordered
• Two instructions that are dependent are not parallel and cannot be reordered; they must execute in order, even though their execution may be partially overlapped
• Three types of dependences
– Data dependences (= true data dependences)
– Name dependences
– Control dependences
Dependences
• Dependences are properties of programs
• Whether a dependence results in an actual hazard (and the length of any stall) is a property of the pipeline organization
• A dependence
1) Indicates the potential for a hazard
2) Determines the order in which results must be calculated
3) Sets an upper bound on ILP
• Problems caused by dependences can be addressed by:
1) Avoiding the hazard by rescheduling the code
2) Eliminating the dependence by transforming (altering) the code
• The compiler is concerned with dependences in the program; whether a HW hazard occurs depends on the given pipeline
Review of Data Hazards
Consider instructions i and j, where i occurs before j.
RAW (read after write) — j tries to read a source before i writes it, so j gets the old value
WAW (write after write) — j tries to write an operand before it is written by i (only possible in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled)
WAR (write after read) — j tries to write a destination before it is read by i, so i incorrectly gets the new value (only possible when some instructions can write results early in the pipeline and other instructions can read sources late in the pipeline)
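The three definitions above can be expressed as a check on the destination and source registers of two instructions. This is a minimal sketch: `Instr` and `potential_hazards` are names introduced here, and the function reports only the potential for a hazard, since whether a stall actually occurs depends on the pipeline.

```python
from typing import NamedTuple, Optional, Set, Tuple

class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written (None if the instruction writes none)
    srcs: Tuple[str, ...]    # registers read

def potential_hazards(i: Instr, j: Instr) -> Set[str]:
    """Classify the hazards possible between i and a later instruction j."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")     # j reads what i writes
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")     # both write the same register
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")     # j writes what i reads
    return found

add = Instr("ADD.D F0, F2, F4", "F0", ("F2", "F4"))
print(potential_hazards(add, Instr("SUB.D F6, F0, F8", "F6", ("F0", "F8"))))  # {'RAW'}
print(potential_hazards(add, Instr("SUB.D F2, F6, F8", "F2", ("F6", "F8"))))  # {'WAR'}
print(potential_hazards(add, Instr("SUB.D F0, F6, F8", "F0", ("F6", "F8"))))  # {'WAW'}
```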
(1) Data Dependences
• (True) Data dependences
– Instruction i produces a result used by instruction j (directly), or
– Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i (indirectly).
j ← k ← i implies j ← i
• Easy to determine in cases of registers (fixed names)
• Harder to determine for memory:
– Does 100(R4) = 20(R6)?
– From different loop iterations, does 20(R4) = 20(R4)?
– We will see a hardware technique for this in Chapter 2
i: ADD.D F0, F2, F4
j: SUB.D F6, F0, F8
(2) Name Dependences
• The second type of dependence is a name dependence: two instructions use the same name (the same register or memory location) but do not exchange data
• Antidependence
– Instruction j writes a register or memory location that instruction i reads, and instruction i must be executed first; if not, a WAR hazard results
i: ADD.D F0, F2, F4
j: SUB.D F2, F6, F8
• Output dependence
– Instruction i and instruction j write the same register or memory location; the ordering between the instructions must be preserved; if not, a WAW hazard results
i: ADD.D F0, F2, F4
j: SUB.D F0, F6, F8
• Name dependences are harder to handle for memory accesses
– Does 100(R4) = 20(R6)?
– From different loop iterations, does 20(R4) = 20(R4)?
Register Renaming Eliminates WAR & WAW
Assuming temporary registers S and T:
Original              Renamed
DIV.D F0, F2, F4      DIV.D F0, F2, F4
ADD.D F6, F0, F8      ADD.D S, F0, F8
S.D   F6, 0(R1)       S.D   S, 0(R1)
SUB.D F8, F10, F14    SUB.D T, F10, F14
MUL.D F6, F10, F8     MUL.D F6, F10, T
(True) data dependences? Antidependences (WAR)? Output dependences (WAW)? Which dependences are eliminated by renaming?
(True) data dependences = (1) DIV.D → ADD.D, (2) ADD.D → S.D, (3) SUB.D → MUL.D
Antidependences = ADD.D → SUB.D (on F8); S.D → MUL.D (on F6) is also an antidependence
Output dependences = ADD.D → MUL.D (on F6)
Any subsequent use of F8 must be replaced by T. How about F6? It does not need to be replaced, because MUL.D writes F6 and later instructions should read that new value
WAR and WAW are eliminated by register renaming, which will be implemented in hardware
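The renaming idea can be sketched as a single pass over the code. This is an illustrative simplification, not the hardware mechanism: it gives every destination a fresh name (`T0`, `T1`, ...) rather than introducing only S and T as the slide does, and the helper names `rename` and `name_dependences` are assumptions made for this sketch.

```python
from itertools import count

def rename(program):
    """Give each written register a fresh name; sources read the latest name."""
    fresh = (f"T{n}" for n in count())
    latest = {}                      # architectural register -> current name
    renamed = []
    for op, dest, srcs in program:
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            latest[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

def name_dependences(program):
    """Return (WAR pairs, WAW pairs) as sets of (earlier index, later index)."""
    war, waw = set(), set()
    for a, (_, dest_a, srcs_a) in enumerate(program):
        for b in range(a + 1, len(program)):
            dest_b = program[b][1]
            if dest_b is not None and dest_b in srcs_a:
                war.add((a, b))
            if dest_b is not None and dest_b == dest_a:
                waw.add((a, b))
    return war, waw

# The slide's code as (op, destination, sources); the store has no destination.
original = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]
print(name_dependences(original))          # WAR and WAW pairs before renaming
print(name_dependences(rename(original)))  # (set(), set())
```

True data dependences survive the pass: after renaming, ADD.D still reads the name written by DIV.D, and MUL.D still reads the name written by SUB.D.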
(3) Control Dependence
• The final kind of dependence is control dependence
• Example:
if p1 {S1;};
if p2 {S2;}
S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.
Note that S2 could be data dependent on S1.
Control Dependences
• Two (obvious) constraints on control dependences:
– An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch
– An instruction that is not control dependent on a branch cannot be moved to after the branch so that its execution is controlled by the branch
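Violating the first constraint (S1 moved before the branch on p1, so S1 always executes):
if p1 {S1;}; if p2 {S2;}   becomes   S1; if p1 {}; if p2 {S2;}
Violating the second constraint (S3 moved after the branch on p2, so S3 executes only when p2 holds):
if p1 {S1;}; S3; if p2 {S2;}   becomes   if p1 {S1;}; if p2 {S2; S3;}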
Limitations of Scoreboarding (scoreboard hardware on next slide)
• No forwarding hardware
• Limited to instructions in basic block (small window)
• Small number of functional units (structural hazards), especially integer/load/store units (only one each)
• Cannot issue on a structural or WAW hazard
• Must wait until WAR hazards are resolved
• Imprecise exceptions due to out-of-order execution
Improvement? Tomasulo’s Approach
Scoreboard Hardware: centralized control by the scoreboard
Figure A.50 The basic structure of a MIPS processor with a scoreboard. [Figure labels: Scoreboard; Integer unit; FP add; FP divide; FP mult (two units); Registers; Data buses; Control/status. Legend: data flows, control/status flows.]
The scoreboard was originally proposed for the CDC 6600 (Seymour Cray, 1964)
Functional unit status fields (scoreboard):
Busy – Indicates whether the unit is busy or not
Op – Operation to perform in the unit (e.g., add or subtract)
Fi – Destination register
Fj, Fk – Source-register numbers
Qj, Qk – Functional units producing source registers Fj, Fk
Rj, Rk – Flags indicating when Fj, Fk are available and not yet read
Tomasulo’s Algorithm
For the IBM 360/91, about 3 years after the CDC 6600 (late 1960s)
Goal: High performance without special compilers
Differences between Tomasulo’s algorithm and the scoreboard (similar to scoreboarding, but with register renaming added):
– Control and buffers (called “reservation stations”) are distributed with the functional units rather than centralized in a scoreboard: the scoreboard/instruction buffer becomes reservation stations for each FU
– Registers in instructions are replaced by pointers to reservation station buffers
– HW renaming of registers avoids WAR and WAW hazards
– A common data bus (CDB) broadcasts results to the functional units
– Loads and stores are treated as functional units as well
Very importantly:
– Tomasulo’s algorithm has been adopted in many modern CPUs: Alpha 21264, HP PA-8000, MIPS R10K, Pentium III, Pentium 4, PowerPC 604, etc.
Key Concept: Reservation Stations (RS)
• Distributed (rather than centralized) control scheme
– Bypassing (results go directly to the RSs rather than through the registers) is done over the Common Data Bus (CDB)
– Register Renaming eliminates WAR/WAW hazards
• Scoreboard/Instruction Buffer => Reservation Stations
– Fetch and Buffer operands as soon as available
• Eliminates need to always get values from registers at execute
– Pending instructions designate reservation stations that will provide their inputs
– Successive writes to a register cause only the last one to update the register
MIPS Floating-Point Unit Using Tomasulo’s Algorithm
Details
• Each reservation station holds an instruction that has been issued and is waiting for execution. The instruction either already has all its operands, or it holds the name(s) of the RSs or load buffers that will provide them. These name fields are called “tags”: 4 bits each, enough to denote one of 5 RSs and 6 load buffers. RSs are used for renaming
• Load buffers and store buffers behave almost exactly like RSs
• All results from the FUs and from memory are sent on the Common Data Bus, which is connected to everything except the load buffers
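The 4-bit tag width follows from the number of result sources that must be named. A quick check, assuming one distinct tag per reservation station and load buffer (the extra "no pending producer" code in the second call is an assumption of this sketch, not something the slide states):

```python
def tag_bits(num_codes: int) -> int:
    """Smallest number of bits that gives each of num_codes items a distinct code."""
    return max(1, (num_codes - 1).bit_length())

RESERVATION_STATIONS = 5
LOAD_BUFFERS = 6
sources = RESERVATION_STATIONS + LOAD_BUFFERS   # 11 possible producers

print(tag_bits(sources))        # 4
print(tag_bits(sources + 1))    # 4, even with one extra code reserved (assumption)
print(64 + tag_bits(sources))   # 68 wires for 64 bits of data plus the source tag
```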
Three Stages of Tomasulo’s Algorithm
1. Issue: Get the next instruction from the FP operation queue (FIFO)
If a reservation station is free, issue the instruction and send its operands (the values if available in registers; otherwise the name of the unit that will produce them, which is the renaming step). If no station is free, stall (structural hazard). Avoids WAR and WAW
2. Execution: Operate on operands (EX)
When both operands are ready (already in Vj/Vk or captured from the CDB), execute; if not ready, watch the common data bus for the result. RAW avoided
3. Write result: Finish execution (WB)
Write on the common data bus so that all waiting units can hear; mark the reservation station as available.
Common data bus: 64 bits of data + 4 bits of source (“come from”)
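The three stages can be sketched as a small cycle-by-cycle loop. This is a heavily simplified illustration, not the 360/91 design: it has no loads or stores, a single CDB, three hypothetical stations (`Add1`, `Add2`, `Mult1`), and it borrows the add/multiply/divide latencies quoted elsewhere in these notes (2/10/40 cycles).

```python
from dataclasses import dataclass
from typing import Optional

LATENCY = {"ADD": 2, "SUB": 2, "MUL": 10, "DIV": 40}
FUNC = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b,
        "MUL": lambda a, b: a * b, "DIV": lambda a, b: a / b}

@dataclass
class Station:
    name: str
    kind: str                    # "add" or "mult"
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None   # operand values
    vk: Optional[float] = None
    qj: Optional[str] = None     # tags of the stations that will produce them
    qk: Optional[str] = None
    left: int = 0                # execution cycles remaining

def run(program, regs):
    stations = [Station("Add1", "add"), Station("Add2", "add"), Station("Mult1", "mult")]
    status = {}                  # register result status: register -> producing station
    queue = list(program)
    cycle = 0
    while queue or any(s.busy for s in stations):
        cycle += 1
        # 3. Write result: one finished station broadcasts (tag, value) on the CDB.
        done = next((s for s in stations
                     if s.busy and s.qj is None and s.qk is None and s.left == 0), None)
        if done:
            value = FUNC[done.op](done.vj, done.vk)
            for s in stations:
                if s.busy and s.qj == done.name:
                    s.vj, s.qj = value, None
                if s.busy and s.qk == done.name:
                    s.vk, s.qk = value, None
            for reg in [r for r, tag in status.items() if tag == done.name]:
                regs[reg] = value          # only the latest writer's tag still matches
                del status[reg]
            done.busy = False
        # 2. Execute: stations holding both operands make progress.
        for s in stations:
            if s.busy and s.qj is None and s.qk is None and s.left > 0:
                s.left -= 1
        # 1. Issue: in order, only if a matching station is free.
        if queue:
            op, dest, s1, s2 = queue[0]
            kind = "mult" if op in ("MUL", "DIV") else "add"
            free = next((s for s in stations if s.kind == kind and not s.busy), None)
            if free:
                queue.pop(0)
                free.busy, free.op, free.left = True, op, LATENCY[op]
                free.qj, free.qk = status.get(s1), status.get(s2)
                free.vj = regs[s1] if free.qj is None else None
                free.vk = regs[s2] if free.qk is None else None
                status[dest] = free.name   # renaming: dest now names this station
    return cycle

regs = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F6": 0.0, "F8": 1.0, "F10": 5.0, "F14": 3.0}
program = [("DIV", "F0", "F2", "F4"), ("ADD", "F6", "F0", "F8"),
           ("SUB", "F8", "F10", "F14"), ("MUL", "F6", "F10", "F8")]
cycles = run(program, regs)
print(regs["F0"], regs["F6"], regs["F8"])  # 4.0 10.0 2.0
```

In this run SUB finishes long before the ADD that precedes it, yet ADD still uses the old F8 because it captured the value at issue (no WAR problem), and only MUL's result reaches F6 because the register status tag for F6 was overwritten at issue (no WAW problem).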
Data Buses in Tomasulo’s Algorithm
• Compare with a normal data bus, which carries data + destination (a “go to” bus)
• CDB (Common Data Bus): data + source (a “come from” bus)
– 64 bits of data + 4 bits of functional unit source address (the RS’s number)
– Any receiving unit (store buffer, RSs, FP registers) accepts (writes) the data if the RS’s number matches the number it expects
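A "come from" broadcast can be sketched as follows: the bus carries the producer's tag, and each receiver decides for itself whether to capture the value. The dictionary layout and the `broadcast` helper are assumptions made for illustration; store buffers are left out for brevity.

```python
def broadcast(tag, value, stations, reg_status, regs):
    """Put (source tag, value) on the CDB; every receiver expecting `tag` captures it."""
    for rs in stations:
        if rs["Qj"] == tag:
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == tag:
            rs["Vk"], rs["Qk"] = value, None
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        regs[reg] = value
        del reg_status[reg]

stations = [
    {"name": "Add1", "Qj": "Mult1", "Vj": None, "Qk": None, "Vk": 1.0},
    {"name": "Add2", "Qj": None, "Vj": 5.0, "Qk": "Load2", "Vk": None},
]
reg_status = {"F0": "Mult1", "F6": "Add1"}   # which unit will write each register
regs = {"F0": 0.0, "F6": 0.0}

broadcast("Mult1", 4.0, stations, reg_status, regs)
print(stations[0]["Vj"], regs["F0"], reg_status)  # 4.0 4.0 {'F6': 'Add1'}
```

The sender never names a destination, so one broadcast can satisfy any number of waiting units; Add2 ignores it because it is waiting on a different tag.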
Reservation Station Components
Op – Operation to perform in the unit (e.g., + or –)
Qj, Qk – Names of the reservation stations that will produce the source operands (no values are stored here)
Vj, Vk – Values of the source operands (temporary registers used for renaming)
Busy – Indicates that the reservation station and its FU are busy
Register result status – Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
Loads and stores require 2 steps:
Step 1: Compute the effective address (ea)
Step 2: Place the ea in the load or store buffer
Execution (load or store) can start when the memory unit is not busy
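The two steps for a load can be sketched as below. The register contents, the memory contents, and the buffer entry layout are hypothetical values chosen for illustration.

```python
from collections import deque

def effective_address(offset, base, regs):
    """Step 1: effective address = offset + contents of the base register."""
    return offset + regs[base]

regs = {"R1": 80}          # hypothetical integer register contents
memory = {80: 2.5}         # hypothetical memory contents
load_buffer = deque()

# LD F0 0(R1)
ea = effective_address(0, "R1", regs)              # Step 1
load_buffer.append({"dest": "F0", "address": ea})  # Step 2

memory_unit_busy = False
loaded = None
if not memory_unit_busy and load_buffer:
    entry = load_buffer.popleft()                  # the access starts once memory is free
    loaded = memory[entry["address"]]
print(ea, loaded)  # 80 2.5
```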
Wait until DIVD finishes
Divide takes 40 cycles
Scoreboard vs. Tomasulo
• Why does the same code take longer on the CDC 6600 scoreboard?
– Structural hazards
– Lack of forwarding
• Both use in-order issue and out-of-order execution
• The scoreboard cannot eliminate WAR and WAW hazards (it stalls on them)
• Tomasulo can, with register renaming
• Both stall on a branch instruction; Tomasulo with speculation comes later
Assuming (for the scoreboard): add takes 2 clock cycles, multiply 10, divide 40
Let’s try this site: http://www.ecs.umass.edu/ece/koren/architecture/Tomasulo/AppletTomasulo.html
Tomasulo’s Algorithm: A Loop-Based Example
Loop: LD    F0 0(R1)
      MULTD F4 F0 F2
      SD    F4 0(R1)
      SUBI  R1 R1 #8
      BNEZ  R1 Loop
• Multiply takes 4 clocks
• Assume the first load takes 8 clocks (cache miss) and the second load takes 1 clock (hit); on a cache miss, a block (several words) is brought into the cache
• Reality: integer instructions run ahead
Cache miss occurs, so LD must wait for 8 cycles
Since SUBI is executed by Integer unit, it is not shown here—we only show the FP unit here
Since BNEZ is executed by Integer unit, it is not shown here—we only show the FP unit here
This is “register renaming”
Higher ILP !
Cache is finally ready, so read from memory
Tomasulo Summary
Reservation stations: renaming to a larger set of registers + buffering of source operands
– Prevents registers from becoming a bottleneck
– Distributes RAW hazard detection to the RSs
– Avoids the WAR and WAW hazards of the scoreboard through register renaming
– Allows loop unrolling in HW
– Tag match on the CDB requires many associative compares
– The Common Data Bus is the Achilles’ heel of Tomasulo: multiple writebacks (multiple CDBs) are expensive
Tomasulo Summary
Lasting contributions (most modern processors employ the algorithm):
– Dynamic scheduling
– Register renaming
– Load/store disambiguation
– The load address is compared with the store addresses in the store buffer. If a match is found, the load is not sent to the load buffer until the conflicting store completes. Which hazard does this avoid? RAW
360/91 descendants include the Pentium III and Pentium 4; PowerPC 604; MIPS R10000; HP PA-8000; Alpha 21264
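The disambiguation check described above amounts to an address comparison against every pending store. A minimal sketch, in which the store buffer layout and the name `load_may_proceed` are assumptions made for illustration:

```python
def load_may_proceed(load_address, store_buffer):
    """A load may go ahead only if no pending store targets the same address."""
    return all(store["address"] != load_address for store in store_buffer)

# One pending store to address 80 whose data is still being computed by Mult1.
store_buffer = [{"address": 80, "source_tag": "Mult1"}]

print(load_may_proceed(72, store_buffer))  # True: different address, no conflict
print(load_may_proceed(80, store_buffer))  # False: wait, or the load reads stale data (RAW)
```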