CSC 4250 Computer Architectures September 22, 2006 Appendix A. Pipelining


Page 1: CSC 4250 Computer Architectures

CSC 4250 Computer Architectures

September 22, 2006

Appendix A. Pipelining

Page 2: CSC 4250 Computer Architectures

Instruction Issue

What is instruction issue? The process of letting an instruction move from the ID stage to the EX stage.

Page 3: CSC 4250 Computer Architectures

Data Hazards

How does the MIPS integer pipeline avoid data hazards? The pipeline checks for data hazards during ID. If a hazard exists, the pipeline stops instruction issue.

Page 4: CSC 4250 Computer Architectures

Forwarding

• Determine if forwarding is needed during EX
• Set appropriate controls

Page 5: CSC 4250 Computer Architectures

Data Forwarding to ALU inputs in EX

Opcode      Comparison (if equal then forward)
RR ALU      EX/MEM.IR[rd] == ID/EX.IR[rs]
RR ALU      EX/MEM.IR[rd] == ID/EX.IR[rt]
RR ALU      MEM/WB.IR[rd] == ID/EX.IR[rs]
RR ALU      MEM/WB.IR[rd] == ID/EX.IR[rt]
ALU Imm.    EX/MEM.IR[rt] == ID/EX.IR[rs]
ALU Imm.    EX/MEM.IR[rt] == ID/EX.IR[rt]
ALU Imm.    MEM/WB.IR[rt] == ID/EX.IR[rs]
ALU Imm.    MEM/WB.IR[rt] == ID/EX.IR[rt]
Load        MEM/WB.IR[rt] == ID/EX.IR[rs]
Load        MEM/WB.IR[rt] == ID/EX.IR[rt]

Page 6: CSC 4250 Computer Architectures

More on Forwarding

Ten separate comparisons are needed to tell whether a forwarding operation should occur. Remember that the pipeline latch for the destination instruction in EX is ID/EX, while the source values come from the ALUOutput portion of EX/MEM or MEM/WB, or from the LMD portion of MEM/WB.
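The ten comparisons can be sketched in Python roughly as follows. The `Instr` record, the opcode-class strings, and the names `dest_reg` and `forward_sources` are illustrative assumptions, not part of the slides. The sketch also assumes that a match in EX/MEM (the more recent result) takes priority over a match in MEM/WB, that only an RR ALU destination instruction uses rt as a source, and it ignores the special case of register R0.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Opcode classes from the comparison table (string names are illustrative).
RR_ALU, ALU_IMM, LOAD = "rr_alu", "alu_imm", "load"


@dataclass
class Instr:
    kind: str
    rs: int = 0
    rt: int = 0
    rd: int = 0


def dest_reg(instr: Optional[Instr], allow_load: bool) -> Optional[int]:
    """Register that a source instruction writes, following the table."""
    if instr is None:
        return None
    if instr.kind == RR_ALU:
        return instr.rd          # register-register ALU ops write rd
    if instr.kind == ALU_IMM:
        return instr.rt          # ALU immediates write rt
    if instr.kind == LOAD and allow_load:
        return instr.rt          # load data (LMD) exists only in MEM/WB
    return None


def forward_sources(ex_mem: Optional[Instr], mem_wb: Optional[Instr],
                    id_ex: Instr) -> Dict[str, Optional[str]]:
    """For each ALU input of the instruction in EX, name the latch to forward from."""
    ex_mem_dest = dest_reg(ex_mem, allow_load=False)
    mem_wb_dest = dest_reg(mem_wb, allow_load=True)
    # Assumption: only RR ALU instructions read rt as an ALU source.
    fields = ("rs", "rt") if id_ex.kind == RR_ALU else ("rs",)
    choice: Dict[str, Optional[str]] = {"rs": None, "rt": None}
    for name in fields:
        src = getattr(id_ex, name)
        if ex_mem_dest == src:
            choice[name] = "EX/MEM"      # most recent result wins
        elif mem_wb_dest == src:
            choice[name] = "MEM/WB"
    return choice


# DADD R1,R2,R3 followed immediately by DSUB R4,R1,R5
dadd = Instr(RR_ALU, rs=2, rt=3, rd=1)
dsub = Instr(RR_ALU, rs=1, rt=5, rd=4)
print(forward_sources(dadd, None, dsub))
```

In hardware these comparisons run in parallel as comparators driving the ALU input multiplexers; the loop here only mirrors the table row by row.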

Page 7: CSC 4250 Computer Architectures

Load Interlocks

Example:

LD R1,45(R2)

DADD R5,R1,R7

How to detect the need for a load interlock?

Page 8: CSC 4250 Computer Architectures

Logic to Detect Need for Load Interlocks

Line  Opcode field of ID/EX  Opcode field of IF/ID                 Matching operand fields
1     Load                   RR ALU                                ID/EX.IR[rt] == IF/ID.IR[rs]
2     Load                   RR ALU                                ID/EX.IR[rt] == IF/ID.IR[rt]
3     Load                   Load, store, ALU immed., or branch    ID/EX.IR[rt] == IF/ID.IR[rs]

Page 9: CSC 4250 Computer Architectures

Explanation of Previous Slide

Lines 1 and 2 test whether the load destination register is one of the source registers for an R-R operation in ID. Line 3 determines if the load destination register is a source for a load or store effective address, an ALU immediate, or a branch test. Remember that the IF/ID register holds the state of the instruction in ID, which potentially uses the load result, while ID/EX holds the state of the instruction in EX, which is the load instruction.
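The three lines of the interlock table can be written as a small predicate. This is a minimal sketch: the `Instr` record, the opcode-class strings, and the function name `needs_load_interlock` are assumptions made for illustration, and the check covers only the cases listed in the table.

```python
from dataclasses import dataclass

# Opcode classes named in the table (string names are illustrative).
RR_ALU, ALU_IMM, LOAD, STORE, BRANCH = "rr_alu", "alu_imm", "load", "store", "branch"


@dataclass
class Instr:
    kind: str
    rs: int = 0
    rt: int = 0
    rd: int = 0


def needs_load_interlock(id_ex: Instr, if_id: Instr) -> bool:
    """True if the instruction in ID must stall behind a load that is in EX."""
    if id_ex.kind != LOAD:
        return False
    load_dest = id_ex.rt                      # a load writes register rt
    if if_id.kind == RR_ALU:                  # table lines 1 and 2
        return load_dest in (if_id.rs, if_id.rt)
    if if_id.kind in (LOAD, STORE, ALU_IMM, BRANCH):   # table line 3
        return load_dest == if_id.rs
    return False


# LD R1,45(R2) followed by DADD R5,R1,R7
ld = Instr(LOAD, rs=2, rt=1)
dadd = Instr(RR_ALU, rs=1, rt=7, rd=5)
print(needs_load_interlock(ld, dadd))
```

For the example pair the predicate is true, because R1 is both the load destination and a source of the DADD.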

Page 10: CSC 4250 Computer Architectures

Exception (also called interrupt or fault)

Includes:

• I/O device request
• Invoke OS
• Breakpoint (programmer-requested interrupt)
• Integer (FP) arithmetic overflow
• Page fault (not in main memory)
• Misaligned memory accesses
• Memory protection violation
• Use of undefined or unimplemented instruction
• Hardware malfunctions
• Power failure

Page 11: CSC 4250 Computer Architectures

Characteristics of Exceptions

• Synchronous versus asynchronous (same place, same data and mem. location → synchronous)
• User requested versus coerced (hardware event not under control → coerced)
• User maskable versus user nonmaskable (event can be disabled by user → maskable)
• Within versus between instructions (exceptions within instructions are usually synchronous)
• Resume versus terminate (program’s execution continues → resume)

Page 12: CSC 4250 Computer Architectures

Actions for Different Exceptions

Type                           Sync.?  User requ.?  User mask.?  Within instr.?  Resume?
I/O device request             Asyn.   Coer.        Nonm.        Betw.           Resume
Invoke OS                      Sync.   Requ.        Nonm.        Betw.           Resume
Breakpoint                     Sync.   Requ.        Mask.        Betw.           Resume
Integer (FP) arith. overflow   Sync.   Coer.        Mask.        Within          Resume
Page fault                     Sync.   Coer.        Nonm.        Within          Resume
Misaligned mem. accesses       Sync.   Coer.        Mask.        Within          Resume
Mem. protection violation      Sync.   Coer.        Nonm.        Within          Resume
Use undefined instructions     Sync.   Coer.        Nonm.        Within          Termin.
Hardware malfunctions          Asyn.   Coer.        Nonm.        Within          Termin.
Power failure                  Asyn.   Coer.        Nonm.        Within          Termin.

Page 13: CSC 4250 Computer Architectures

How to Save Pipeline State

• Force a trap instruction into the pipeline on the next IF
• Until the trap is taken, turn off all writes for the faulting instruction and all instructions that follow in the pipeline
• After the exception-handling routine in the OS receives control, it saves the PC of the faulting instruction

Page 14: CSC 4250 Computer Architectures

Delayed Branch

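• With delayed branches, it is not possible to re-create the state of the processor with a single PC, because the instructions in the pipeline may not be sequentially related
• Need to save and restore as many PCs as the length of the branch delay plus one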

Page 15: CSC 4250 Computer Architectures

Precise Exceptions

If the pipeline can be stopped so that the instructions just before the faulting instruction are completed and those after it can be restarted from scratch, the pipeline is said to exhibit precise exceptions

Page 16: CSC 4250 Computer Architectures

Possible Exceptions in MIPS Pipeline

Stage  Exceptions
IF     Page fault on instruction fetch; misaligned memory access; memory protection violation
ID     Undefined or illegal opcode
EX     Arithmetic exception
MEM    Page fault on data fetch; misaligned memory access; memory protection violation
WB     None

Page 17: CSC 4250 Computer Architectures

Exceptions in MIPS

• Multiple exceptions may occur in the same clock cycle
• Exceptions may occur out of order

Example:

LD    IF   ID   EX   MEM  WB
DADD       IF   ID   EX   MEM  WB

LD may encounter a data page fault, while DADD gets an instruction page fault

Page 18: CSC 4250 Computer Architectures

Out-of-order Exceptions

• Say we want precise exceptions
• DADD exception occurs first
• Pipeline cannot handle DADD exception yet
• Hardware posts all exceptions caused by a given instruction in a status vector associated with that instruction
• Once an exception is set, register and memory writes are stopped
• When the instruction enters WB, the status vector is checked
• Exceptions are handled in the same order as on an unpipelined processor: the exception in the earliest instruction first
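The status-vector idea can be illustrated with a short Python sketch. The class `InFlight`, its fields, and the function `first_exception_at_wb` are hypothetical names chosen for this example; the model only tracks posted exceptions and the write-enable flag, not a full pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InFlight:
    name: str
    status: List[str] = field(default_factory=list)  # exceptions posted so far
    writes_enabled: bool = True

    def post(self, exception: str) -> None:
        """Record an exception and stop this instruction's register/memory writes."""
        self.status.append(exception)
        self.writes_enabled = False


def first_exception_at_wb(program_order: List[InFlight]) -> Optional[Tuple[str, str]]:
    """Check status vectors in the order instructions reach WB (program order)."""
    for instr in program_order:
        if instr.status:
            return instr.name, instr.status[0]
    return None


ld, dadd = InFlight("LD"), InFlight("DADD")
dadd.post("instruction page fault")   # raised in DADD's IF, earlier in time
ld.post("data page fault")            # raised in LD's MEM, later in time
print(first_exception_at_wb([ld, dadd]))
```

Although the DADD fault is posted first in time, the LD fault is the one handled, because LD reaches WB first.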

Page 19: CSC 4250 Computer Architectures

MIPS Pipeline with Unpipelined FP Units

Page 20: CSC 4250 Computer Architectures

MIPS Pipeline with two pipelined FP units

Page 21: CSC 4250 Computer Architectures

Functional Units

Functional unit Latency Initiation interval

Integer ALU 0 1

Data memory 1 1

FP add 3 1

FP multiply 6 1

FP divide 24 25
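One way to read the table, using the usual definitions (assumed here, since the slide does not state them): latency is the number of intervening cycles between an instruction that produces a result and an instruction that uses it, and the initiation interval is the number of cycles that must elapse between issuing two operations of the same type. The sketch below encodes the table and derives the earliest cycles from it; the function names are illustrative and full forwarding is assumed.

```python
# (latency, initiation interval) per functional unit, copied from the table.
FUNCTIONAL_UNITS = {
    "integer_alu": (0, 1),
    "data_memory": (1, 1),
    "fp_add": (3, 1),
    "fp_multiply": (6, 1),
    "fp_divide": (24, 25),
}


def earliest_dependent_issue(unit: str, issue_cycle: int) -> int:
    """Earliest cycle a consumer of the result can enter EX (assumes forwarding)."""
    latency, _ = FUNCTIONAL_UNITS[unit]
    return issue_cycle + latency + 1


def earliest_same_unit_issue(unit: str, issue_cycle: int) -> int:
    """Earliest cycle another operation of the same type can enter the unit."""
    _, interval = FUNCTIONAL_UNITS[unit]
    return issue_cycle + interval


# An FP multiply entering its first execute stage in cycle 3
print(earliest_dependent_issue("fp_multiply", 3))
print(earliest_same_unit_issue("fp_divide", 3))
```

With these definitions an integer ALU result (latency 0) can be used by the very next instruction, while a load (latency 1) forces one intervening cycle.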

Page 22: CSC 4250 Computer Architectures

Hazards in Longer Latency Pipelines

• Divide unit not pipelined → Structural hazards
• Varying instruction running times → Multiple register writes in a cycle
• Instructions don’t reach WB in order → WAW hazards
• Instructions complete in order different from issue → Problems with exceptions
• Longer latency of operations → RAW hazards more frequent
• WAR hazards?

Page 23: CSC 4250 Computer Architectures

Data Hazards

1. WAW Hazard ─ Write After Write Hazard

2. RAW Hazard ─ Read After Write Hazard

3. WAR Hazard ─ Write After Read Hazard

4. Is there a RAR hazard?
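The three hazard types come from register dependences between an earlier and a later instruction, which can be classified mechanically. The sketch below is illustrative only: the `Op` record and `classify_dependences` are assumed names, and the function reports potential hazards; whether a stall is actually needed depends on the pipeline. Two reads of the same register do not change state, so the function reports nothing for that case.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Op:
    dest: Optional[str]          # register written, or None
    sources: Tuple[str, ...]     # registers read


def classify_dependences(first: Op, second: Op) -> Set[str]:
    """Potential hazards when `second` follows `first` in program order."""
    found: Set[str] = set()
    if first.dest is not None and first.dest in second.sources:
        found.add("RAW")         # second reads what first writes
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")         # both write the same register
    if second.dest is not None and second.dest in first.sources:
        found.add("WAR")         # second writes what first reads
    return found


div = Op("F0", ("F4", "F6"))      # DIV.D F0,F4,F6
add1 = Op("F10", ("F0", "F8"))    # ADD.D F10,F0,F8
add2 = Op("F0", ("F12", "F14"))   # ADD.D F0,F12,F14
print(classify_dependences(div, add1), classify_dependences(div, add2))
```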

Page 24: CSC 4250 Computer Architectures

Stalls from WAW Hazards

1. DIV.D F0,F4,F6

2. ADD.D F10,F0,F8

3. ADD.D F0,F12,F14

Clock cycle number

In.  1   2   3   4   5   6   7   8   9   10  11  …  27   28  29  30  31  32  33
1    IF  ID  D1  D2  D3  D4  D5  D6  D7  D8  D9  …  D25  ME  WB
2        IF
3            IF

Fill in the blanks above

Page 25: CSC 4250 Computer Architectures

Stalls from RAW Hazards

1. L.D F4,0(R2)
2. MUL.D F0,F4,F6
3. ADD.D F2,F0,F8
4. S.D F2,0(R2)

Clock cycle number

In.  1   2   3   4   5   6   7   8   9   10  11  …  27  28  29
1    IF  ID  EX  ME  WB
2        IF
3            IF
4                IF

Fill in the blanks above

Page 26: CSC 4250 Computer Architectures

No WAR Hazards

WAR hazards are not possible, since the register reads always occur in the ID stage (as long as instructions are issued in order).

Example:

ADD R1,R2,R3    IF  ID  EX  ME  WB
ADD R2,R4,R5        IF  ID  EX  ME  WB
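The timing argument can be checked with a few lines of Python. This is a sketch under the assumptions of the example: a five-stage pipeline, in-order issue, no stalls, and an illustrative helper `stage_cycle` that is not part of the slides.

```python
STAGES = ["IF", "ID", "EX", "ME", "WB"]


def stage_cycle(fetch_cycle: int, stage: str) -> int:
    """Clock cycle in which an instruction fetched in `fetch_cycle` is in `stage`."""
    return fetch_cycle + STAGES.index(stage)


# ADD R1,R2,R3 is fetched in cycle 1 and reads R2 in ID.
read_cycle = stage_cycle(1, "ID")
# ADD R2,R4,R5 is fetched in cycle 2 and writes R2 in WB.
write_cycle = stage_cycle(2, "WB")
print(read_cycle, write_cycle)
```

The earlier instruction reads R2 in cycle 2 and the later one writes it in cycle 6, so the write can never overtake the read while issue stays in order.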