Lecture 13: Notes on RISC Pipelining
Computer Organization and Architecture
RISC (Recap)
Reduced Instruction Set Computer
Key features
• Large number of general-purpose registers, or use of compiler technology to optimize register use
• Limited and simple instruction set
• Emphasis on optimising the instruction pipeline
2 Lecture-13-Notes-ON-RISC-PIPELINING
RISC Characteristics (Recap)
• One instruction per cycle
• Register-to-register operations
• Few, simple addressing modes
• Few, simple instruction formats
• Hardwired design (no microcode)
• Fixed instruction format
• More compile time/effort
RISC Characteristics
One instruction per cycle
• One machine instruction per machine cycle
• A machine cycle is the time it takes to fetch two operands from registers, perform an ALU operation, and store the result in a register
Register-to-register operations
• Most operations should be register-to-register, with only simple LOAD and STORE operations accessing memory
• This design feature simplifies the instruction set and the control unit
RISC Characteristics
Few, simple addressing modes
• Almost all instructions use register addressing
• Several additional modes may be included:
  • Displacement
  • PC-relative
Few, simple instruction formats
• Only one or a few formats are used
• Instruction length is fixed and aligned on word boundaries
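To make the idea of a fixed, word-aligned instruction format concrete, here is a minimal sketch. The 32-bit field layout and the helper names (`encode`, `decode`, `is_word_aligned`) are hypothetical and chosen only for illustration; they do not describe any particular real instruction set.

```python
# Hypothetical 32-bit register-to-register format (an assumption, not a real ISA):
#   bits 31-26: opcode | bits 25-21: rd | bits 20-16: rs1 | bits 15-11: rs2 | bits 10-0: unused
WORD_BYTES = 4


def is_word_aligned(address):
    """An instruction address is valid only on a word boundary."""
    return address % WORD_BYTES == 0


def encode(opcode, rd, rs1, rs2):
    """Pack the four fields into one 32-bit instruction word."""
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


def decode(word):
    """Every field sits at a fixed position, so decoding is just shift and mask."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd": (word >> 21) & 0x1F,
        "rs1": (word >> 16) & 0x1F,
        "rs2": (word >> 11) & 0x1F,
    }
```

Because the fields never move, the decoder needs no knowledge of the opcode before it can extract the register numbers, which is part of what keeps the control unit simple.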
RISC Pipelining (Recap)
• Most instructions are register-to-register
• Two phases of execution:
  • I: Instruction fetch
  • E: Execute (ALU operation with register input and output)
• Load and store instructions need three phases:
  • I: Instruction fetch
  • E: Execute (calculate memory address)
  • D: Memory (register-to-memory or memory-to-register operation)
• If an instruction needs an operand that is altered by the preceding instruction, a delay is required
• This delay can be accomplished by a NOOP
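The NOOP rule above can be sketched as a small pass over an instruction list. This is only an illustration under simplifying assumptions: the tuple layout `(opcode, destination, sources)`, the register names, and the function `insert_noops` are all hypothetical, and a one-instruction delay is assumed to be enough.

```python
NOOP = ("NOOP", None, ())


def insert_noops(program):
    """Insert a NOOP wherever an instruction reads the register written
    by the instruction immediately before it.

    program: list of (opcode, destination_register, source_registers).
    """
    result = []
    for instr in program:
        if result:
            previous_dest = result[-1][1]
            if previous_dest is not None and previous_dest in instr[2]:
                result.append(NOOP)  # delay the dependent instruction by one slot
        result.append(instr)
    return result


program = [
    ("LOAD", "r1", ("r2",)),
    ("ADD", "r3", ("r1", "r4")),  # reads r1, which the LOAD just altered
    ("SUB", "r5", ("r6", "r7")),  # independent of the ADD
]
padded = insert_noops(program)
```

Here `padded` holds four entries: the NOOP lands between the LOAD and the ADD, and nothing is inserted before the independent SUB.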
Sequential Operation vs. Two-Way Pipelining
• Sequential operation is clearly inefficient: each instruction must finish before the next one is fetched
• Two-way pipelined
  • I and E stages of two different instructions can be performed simultaneously
  • Yields up to twice the execution rate of sequential operation
• Problems
  • Memory accesses cause wait states
  • A branch disrupts the sequential flow of instructions
  • (A NOOP instruction can be inserted by the assembler or compiler)
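The "up to twice" figure follows from a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes one cycle and there are no wait states or branches; the function names are hypothetical.

```python
def sequential_cycles(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipe; after that one completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions, n_stages):
    """Ratio of sequential to pipelined cycle counts (idealized)."""
    return sequential_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )
```

For a two-stage pipe, `speedup(1000, 2)` is 2000/1001, just under 2. The speedup approaches the number of stages only for long instruction streams, and wait states and branches push it lower.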
Three-Way Pipelined vs. Four-Way Pipelined
• Three-way pipelined: permitting two memory accesses at one time (e.g. dual-port RAM) allows for fully pipelined operation
• Four-way pipelined: since E is usually longer than the other stages, break E into two parts
  • E1: Register file read
  • E2: ALU operation and register write
• Because of the simplicity of the RISC design, this division is not difficult to do
• Up to four instructions can be under way at one time (potential speedup of 4)
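The effect of allowing two memory accesses at one time can be seen in a small scheduling sketch. It makes several simplifying assumptions that are not in the notes: every stage takes one cycle, the I and D stages each need one memory port, an instruction never stalls once started, and data dependencies are ignored. All names are hypothetical.

```python
MEMORY_STAGES = {"I", "D"}      # stages that access memory
ALU_OP = ["I", "E"]             # register-to-register instruction
LOAD_STORE = ["I", "E", "D"]    # load or store instruction


def _fits(stages, start, stage_busy, memory_use, memory_ports):
    """Check that every stage of the instruction finds free hardware."""
    for offset, stage in enumerate(stages):
        cycle = start + offset
        if (cycle, stage) in stage_busy:
            return False
        if stage in MEMORY_STAGES and memory_use.get(cycle, 0) >= memory_ports:
            return False
    return True


def total_cycles(program, memory_ports):
    """Issue instructions in order, each at the earliest conflict-free cycle."""
    stage_busy = set()   # (cycle, stage) pairs already in use
    memory_use = {}      # cycle -> number of memory accesses in that cycle
    start = 0
    finish = 0
    for stages in program:
        while not _fits(stages, start, stage_busy, memory_use, memory_ports):
            start += 1   # wait state: try the next cycle
        for offset, stage in enumerate(stages):
            cycle = start + offset
            stage_busy.add((cycle, stage))
            if stage in MEMORY_STAGES:
                memory_use[cycle] = memory_use.get(cycle, 0) + 1
        finish = max(finish, start + len(stages))
        start += 1       # the next instruction can start one cycle later at best
    return finish


program = [LOAD_STORE, ALU_OP, ALU_OP, ALU_OP]
single_port = total_cycles(program, memory_ports=1)
dual_port = total_cycles(program, memory_ports=2)
```

With one port, the D stage of the load collides with the fetch of a later instruction and forces a wait state. With two ports, one instruction starts every cycle.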
Optimization of Pipelining
• Data and branch dependencies reduce the overall execution rate
• Delayed branch
  • Does not take effect until after execution of the following instruction
  • This following instruction occupies the delay slot
Delayed Branches
• Traditional pipelining discards the instruction already loaded into the pipe after a branch
• Delayed branching executes the instruction loaded into the pipe after the branch
• A NOOP can be used if no useful instruction can be found to execute after the jump
• As a result, no special circuitry is needed to clear the pipe
• It is left up to the compiler to rearrange instructions or add NOOPs
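The difference between the two behaviours can be traced with a toy interpreter. This is a sketch only: the opcodes are placeholders that do no real work, the function `run` is hypothetical, and a branch sitting in a delay slot is deliberately not modelled.

```python
def run(program, delayed):
    """Return the indices of the instructions executed, in order.

    With delayed=True, the instruction after a JUMP (the delay slot)
    executes before the jump takes effect.
    """
    trace = []
    pc = 0
    pending_target = None  # jump target waiting for the delay slot to finish
    while 0 <= pc < len(program):
        op = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:
            # The delay-slot instruction has just run; the jump now takes effect.
            next_pc = pending_target
            pending_target = None
        elif op[0] == "JUMP":
            if delayed:
                pending_target = op[1]
            else:
                next_pc = op[1]
        pc = next_pc
    return trace


program = [
    ("ADD",),      # 0
    ("JUMP", 4),   # 1
    ("SUB",),      # 2: sits in the delay slot
    ("MUL",),      # 3: skipped either way
    ("STORE",),    # 4: jump target
]
normal_trace = run(program, delayed=False)
delayed_trace = run(program, delayed=True)
```

In the normal case instruction 2 is never executed. In the delayed case it runs before control reaches instruction 4, so the compiler must place either useful work or a NOOP there.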
Delayed Branches
• The interchange of instructions works for unconditional branches, calls, and returns
• It cannot be blindly applied to conditional branches
• If the condition tested by the branch can be altered by the immediately preceding instruction, the compiler must refrain from doing the interchange and instead insert a NOOP
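The compiler's choice between interchanging and inserting a NOOP can be sketched as follows. The opcode names, the tuple layout `(opcode, destination, registers_read)`, and the function `fill_delay_slots` are hypothetical. The sketch also assumes that no instruction being moved is itself a branch target, which a real compiler would have to check.

```python
NOOP = ("NOOP", None, ())
BRANCH_OPS = {"JUMP", "CALL", "RETURN", "BRZ"}  # hypothetical opcode names


def fill_delay_slots(program):
    """Give every branch a delay slot, filled by interchange where that is safe.

    For a branch, the third tuple field lists the registers its condition tests
    (empty for unconditional branches, calls, and returns).
    """
    result = []
    movable = False  # True when result[-1] is an ordinary instruction we may move
    for instr in program:
        opcode, _, tested = instr
        if opcode not in BRANCH_OPS:
            result.append(instr)
            movable = opcode != "NOOP"
            continue
        previous = result[-1] if movable else None
        if previous is not None and previous[1] not in tested:
            # Safe interchange: the branch moves up, the previous
            # instruction drops into the delay slot.
            result[-1] = instr
            result.append(previous)
        else:
            # The previous instruction alters the tested condition
            # (or nothing is available), so pad with a NOOP.
            result.append(instr)
            result.append(NOOP)
        movable = False  # a delay-slot instruction must not be moved again
    return result


unconditional = fill_delay_slots([("ADD", "r1", ("r2", "r3")), ("JUMP", None, ())])
dependent = fill_delay_slots([("SUB", "r1", ("r1", "r2")), ("BRZ", None, ("r1",))])
```

In `unconditional` the ADD ends up after the JUMP. In `dependent` the SUB stays where it is, because it sets the register that BRZ tests, and a NOOP fills the slot.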
• A delayed load can be used on LOAD instructions
• On a LOAD instruction, the register that is to be the target of the load is locked by the processor
• The processor continues execution of the instruction stream until it reaches an instruction requiring that register
• At that point, it idles until the load is complete
• The scheduling of instructions for the pipeline and the dynamic allocation of registers should be considered together to achieve the greatest efficiency
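The register-locking behaviour of a delayed load can be sketched with a cycle counter. The model is an assumption made for illustration: one instruction issues per cycle, a load's target register stays locked for `load_latency` further cycles, and any instruction that reads or writes a locked register idles until it is released. The function name and tuple layout are hypothetical.

```python
def run_with_delayed_load(program, load_latency=2):
    """Return (issue_cycles, idle_cycles) for an in-order instruction stream.

    program: list of (opcode, destination_register, source_registers).
    """
    ready_at = {}  # locked register -> first cycle at which it may be used
    cycle = 0
    idle = 0
    for opcode, dest, sources in program:
        needed = list(sources) + ([dest] if dest else [])
        unlock = max((ready_at.get(reg, 0) for reg in needed), default=0)
        if unlock > cycle:
            idle += unlock - cycle  # processor idles until the load completes
            cycle = unlock
        if opcode == "LOAD":
            ready_at[dest] = cycle + 1 + load_latency  # lock the target register
        cycle += 1
    return cycle, idle


naive = [
    ("LOAD", "r1", ("r2",)),
    ("ADD", "r3", ("r1", "r4")),  # needs r1 straight away
    ("SUB", "r5", ("r6", "r7")),
    ("MUL", "r8", ("r6", "r7")),
]
# Same work, with the independent instructions moved between LOAD and ADD.
scheduled = [naive[0], naive[2], naive[3], naive[1]]
```

Running both orderings shows why scheduling matters: `naive` idles for two cycles waiting on `r1`, while `scheduled` hides the load latency behind the independent instructions and never idles.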
Instruction Pipeline
Two classes of processors have evolved to offer execution of multiple instructions per clock cycle:
• Superscalar architecture
  • Replicates each of the pipeline stages so that two or more instructions at the same stage of the pipeline can be processed simultaneously
• Superpipelined architecture
  • Makes use of more fine-grained pipeline stages
  • With more stages, more instructions can be in the pipeline at the same time, increasing parallelism
Instruction Pipeline
Both approaches have limitations:
• With superscalar architecture
  • Dependencies between instructions in different pipelines can slow down the system
  • Overhead logic is required to coordinate these dependencies
• With superpipelining
  • There is overhead associated with transferring instructions from one stage to the next
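How dependencies limit a superscalar machine can be illustrated with a rough issue-counting sketch. It assumes in-order issue of up to `width` instructions per cycle, and that an instruction cannot issue in the same cycle as an earlier instruction whose result it reads; other hazards are ignored. The function name and the `(destination, sources)` layout are hypothetical.

```python
def issue_cycles(program, width):
    """Count the cycles needed to issue every instruction in order.

    program: list of (destination_register, source_registers).
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    cycles = 0
    i = 0
    while i < len(program):
        written = set()  # registers written by instructions issued this cycle
        issued = 0
        while i < len(program) and issued < width:
            dest, sources = program[i]
            if written & set(sources):
                break    # depends on an instruction issued in this same cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


independent = [("r1", ("r2",)), ("r3", ("r4",)), ("r5", ("r6",)), ("r7", ("r8",))]
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
```

With `width=2`, the four independent instructions issue in two cycles, but the dependent chain still takes four, so the second pipeline sits idle.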