
Lecture 13: Notes on RISC Pipelining

Computer Organization and Architecture

RISC (Recap)

Reduced Instruction Set Computer

Key features
• Large number of general-purpose registers, or use of compiler technology to optimize register use
• Limited and simple instruction set
• Emphasis on optimising the instruction pipeline


RISC Characteristics (Recap)
• One instruction per cycle
• Register-to-register operations
• Few, simple addressing modes
• Few, simple instruction formats
• Hardwired design (no microcode)
• Fixed instruction format
• More compile time/effort


RISC Characteristics

One instruction per cycle
• One machine instruction per machine cycle
• A machine cycle: the time it takes to fetch two operands from registers, perform an ALU operation, and store the result in a register

Register-to-register operations
• Most operations should be register-to-register, with only simple LOAD and STORE operations accessing memory
• This design feature simplifies the instruction set and the control unit


RISC Characteristics

Few, simple addressing modes
• Almost all instructions use register addressing
• Several additional modes
  • Displacement
  • PC-relative

Few, simple instruction formats
• Only one or a few formats are used
• Instruction length is fixed and aligned on word boundaries


RISC Pipelining (Recap)
• Most instructions are register-to-register
• Two phases of execution
  • I: Instruction fetch
  • E: Execute (ALU operation with register input and output)
• For load and store, three phases
  • I: Instruction fetch
  • E: Execute (calculate memory address)
  • D: Memory (register-to-memory or memory-to-register operation)
• If an instruction needs an operand that is altered by the preceding instruction, a delay is required
  • This delay can be accomplished by a NOOP
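
The NOOP rule in the last bullet can be sketched as a small pass over an instruction list. This is only an illustration: the `Instr` record, the register names, and the `insert_noops` helper are hypothetical and not part of the lecture, and the sketch assumes a single NOOP is enough to cover the delay.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    op: str                      # mnemonic, e.g. "ADD", "LOAD", "NOOP"
    dest: Optional[str] = None   # register written, if any
    srcs: tuple = ()             # registers read


NOOP = Instr("NOOP")


def insert_noops(program):
    """Insert a NOOP before any instruction that reads the register
    written by the instruction immediately preceding it."""
    out = []
    for instr in program:
        prev = out[-1] if out else None
        if prev is not None and prev.dest is not None and prev.dest in instr.srcs:
            out.append(NOOP)     # assumed: one NOOP covers the required delay
        out.append(instr)
    return out


program = [
    Instr("LOAD", "r1", ("r0",)),
    Instr("ADD", "r2", ("r1", "r3")),   # needs r1, written by the LOAD
    Instr("SUB", "r4", ("r5", "r6")),   # independent of the ADD
]
scheduled = insert_noops(program)
print([i.op for i in scheduled])        # ['LOAD', 'NOOP', 'ADD', 'SUB']
```

A compiler would normally try to move an independent instruction into the gap before falling back to a NOOP; this sketch shows only the fallback.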


Sequential Operation vs. Two-Way Pipelining

• Sequential operation is obviously inefficient
• Two-way pipelined
  • I and E stages of two different instructions can be performed simultaneously
  • Yields up to twice the execution rate of sequential operation
• Problems
  • Memory accesses cause wait states
  • Branches disrupt the sequential flow
  • (A NOOP instruction can be inserted by the assembler or compiler)

Three-Way Pipelined vs. Four-Way Pipelined

• Three-way: permitting two memory accesses at one time allows for fully pipelined operation (dual-port RAM)
• Four-way: since E is usually longer, break E into two parts
  • E1: Register file read
  • E2: ALU operation and register write
• Because of the simplicity of the RISC design, this split is not difficult to do
• Up to four instructions can be under way at one time (potential speedup of 4)
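
The "up to twice" and "potential speedup of 4" figures can be checked with a rough timing model. The sketch below assumes an idealized pipeline that the slides do not state explicitly: every stage takes one time unit, every instruction passes through all k stages, and there are no wait states, dependencies, or branches. The function names are illustrative.

```python
def sequential_time(n, k):
    """Time units to run n instructions one after another, k stages each."""
    return n * k


def pipelined_time(n, k):
    """Time units on an ideal k-stage pipeline: k units to fill the pipe,
    then one instruction completes per unit."""
    return k + (n - 1) if n > 0 else 0


def speedup(n, k):
    """Ratio of sequential to pipelined time (n must be at least 1)."""
    return sequential_time(n, k) / pipelined_time(n, k)


# Two-way, three-way, and four-way pipelines on a long instruction stream.
for k in (2, 3, 4):
    print(k, round(speedup(1000, k), 2))
```

Under these assumptions the speedup approaches k as n grows but never reaches it, which is why the slides say "up to" and "potential".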

Optimization of Pipelining
• Data and branch dependencies reduce the overall execution rate
• Delayed branch
  • Does not take effect until after execution of the following instruction
  • This following instruction is the delay slot


Delayed Branches
• Traditional pipelining discards the instruction loaded into the pipe after a branch
• Delayed branching executes the instruction loaded into the pipe after the branch
• A NOOP can be used if no instruction can be found to execute after the JUMP
• As a result, no special circuitry is needed to clear the pipe
• It is left up to the compiler to rearrange instructions or add NOOPs


Delayed Branches
• The interchange of instructions works successfully for unconditional branches, calls, and returns
• It cannot be blindly applied to conditional branches
  • If the condition tested by the branch can be altered by the immediately preceding instruction, the compiler must refrain from doing the interchange and instead insert a NOOP
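
The interchange rule above can be sketched as a simple compiler pass. This is a simplified illustration, not the method of any particular compiler: the `Instr` record, the `BEQ`/`BNE` mnemonics, and the `fill_delay_slots` helper are hypothetical, and branch targets are not modeled.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    op: str
    dest: Optional[str] = None   # register written, if any
    srcs: tuple = ()             # registers read (for a branch: registers it tests)


UNCONDITIONAL = {"JUMP", "CALL", "RETURN"}
CONDITIONAL = {"BEQ", "BNE"}     # assumed conditional-branch mnemonics
NOOP = Instr("NOOP")


def is_branch(instr):
    return instr.op in UNCONDITIONAL or instr.op in CONDITIONAL


def fill_delay_slots(program):
    """Place the instruction preceding each branch into its delay slot,
    or insert a NOOP when the interchange is not allowed."""
    out = []
    movable = False              # True if out[-1] may be moved into a delay slot
    for instr in program:
        if not is_branch(instr):
            out.append(instr)
            movable = True
            continue
        prev = out[-1] if movable else None
        alters_condition = (
            prev is not None
            and instr.op in CONDITIONAL
            and prev.dest in instr.srcs
        )
        if prev is None or alters_condition:
            out.extend([instr, NOOP])    # no safe candidate: pad the slot
        else:
            out.pop()
            out.extend([instr, prev])    # interchange branch and predecessor
        movable = False                  # the slot filler must stay where it is
    return out


program = [
    Instr("ADD", "r1", ("r2", "r3")),
    Instr("JUMP"),                       # unconditional: ADD moves into the slot
    Instr("SUB", "r4", ("r4", "r5")),
    Instr("BEQ", None, ("r4",)),         # tests r4, which SUB alters: NOOP
]
print([i.op for i in fill_delay_slots(program)])
# ['JUMP', 'ADD', 'SUB', 'BEQ', 'NOOP']
```

The sketch looks only at the single preceding instruction, as the slide does; a real scheduler could search further back for an independent instruction to fill the slot.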

Delayed Load
• Delayed load can be used on LOAD instructions
  • On a LOAD instruction, the register that is to be the target of the load is locked by the processor
  • The processor continues execution of the instruction stream until it reaches an instruction requiring that register
  • At that point, it idles until the load is complete
• The scheduling of instructions for the pipeline and the dynamic allocation of registers should be considered together to achieve the greatest efficiency
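
The register-locking behaviour of a delayed load, and the effect of instruction scheduling on it, can be illustrated with a toy cycle counter. Everything here is an assumption made for the sketch: the `Instr` record, the `run_with_delayed_load` helper, the one-instruction-per-cycle issue rate, and the `load_latency` of 3 cycles. Only reads of a locked register are modeled as stalls.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    op: str
    dest: Optional[str] = None   # register written, if any
    srcs: tuple = ()             # registers read


def run_with_delayed_load(program, load_latency=3):
    """Return (total_cycles, idle_cycles) for a toy processor that issues one
    instruction per cycle and idles only when a needed register is locked."""
    ready_at = {}    # register -> first cycle at which its loaded value is usable
    cycle = 0
    idle = 0
    for instr in program:
        # Idle until every register this instruction requires is unlocked.
        needed = max((ready_at.get(r, 0) for r in instr.srcs), default=0)
        if needed > cycle:
            idle += needed - cycle
            cycle = needed
        if instr.op == "LOAD":
            ready_at[instr.dest] = cycle + load_latency   # lock the target
        cycle += 1
    return cycle, idle


load = Instr("LOAD", "r1", ("r0",))
add = Instr("ADD", "r2", ("r1", "r3"))   # requires the loaded r1
sub = Instr("SUB", "r4", ("r5", "r6"))   # independent of the load

print(run_with_delayed_load([load, add, sub]))   # (5, 2)
print(run_with_delayed_load([load, sub, add]))   # (4, 1)
```

Moving the independent SUB between the LOAD and the ADD removes one idle cycle, which is the kind of gain the last bullet refers to when it says scheduling and register allocation should be considered together.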




Instruction Pipeline
• Two classes of processors have evolved to offer execution of multiple instructions per clock cycle
• Superscalar architecture
  • Replicates each of the pipeline stages so that two or more instructions at the same stage of the pipeline can be processed simultaneously
• Superpipelined architecture
  • Makes use of more fine-grained pipeline stages
  • With more stages, more instructions can be in the pipeline at the same time, increasing parallelism


Instruction Pipeline
• Both approaches have limitations
• With superscalar architecture
  • Dependencies between instructions in different pipelines can slow down the system
  • Overhead logic is required to coordinate these dependencies
• With superpipelining
  • There is overhead associated with transferring instructions from one stage to the next
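
The superpipelining overhead can be made concrete with a rough cycle-time model. The model is an assumption for illustration, not something given in the lecture: the total logic delay is split evenly across the stages, and each stage adds a fixed transfer overhead. The numbers (10.0 and 0.5 time units) are made up.

```python
def cycle_time(logic_delay, stages, transfer_overhead):
    """Clock period when logic_delay is split evenly over `stages` and every
    stage adds a fixed overhead for passing the instruction to the next one."""
    return logic_delay / stages + transfer_overhead


LOGIC_DELAY = 10.0        # assumed total logic delay, arbitrary time units
TRANSFER_OVERHEAD = 0.5   # assumed per-stage transfer overhead

base = cycle_time(LOGIC_DELAY, 1, TRANSFER_OVERHEAD)
for stages in (1, 5, 10, 20):
    period = cycle_time(LOGIC_DELAY, stages, TRANSFER_OVERHEAD)
    print(stages, period, round(base / period, 2))
```

In this model the clock period can never fall below the per-stage overhead, so each extra stage buys less than the previous one: going from 10 to 20 stages shortens the period only from 1.5 to 1.0.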