Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University


Lec 9: Pipelining

Kavita Bala

CS 3410, Fall 2008

Computer Science

Cornell University


© Kavita Bala, Computer Science, Cornell University

Basic Pipelining

Five stage “RISC” load-store architecture

1. Instruction Fetch (IF)
   • get instruction from memory
2. Instruction Decode (ID)
   • translate opcode into control signals and read regs
3. Execute (EX)
   • perform ALU operation
4. Memory (MEM)
   • access memory if load/store
5. Writeback (WB)
   • update register file

Following slides thanks to Sally McKee


Sample Code (Simple)

• Assume eight-register machine

• Run the following code on a pipelined datapath

    add 3 1 2     ; reg 3 = reg 1 + reg 2
    nand 6 4 5    ; reg 6 = ~(reg 4 & reg 5)
    lw 4 20(2)    ; reg 4 = Mem[reg2+20]
    add 5 2 5     ; reg 5 = reg 2 + reg 5
    sw 7 12(3)    ; Mem[reg3+12] = reg 7


[Figure: pipelined datapath. PC, instruction memory, register file (R0-R7), ALU, and data memory, separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers. The pipeline registers carry op, dest, offset, valA, valB, PC+1, target, ALUresult, and mdata; instruction bit fields 0-2, 15-17, and 21-23 are marked.]


Time Graphs

Time: 1          2          3          4          5          6          7          8          9
add   fetch      decode     execute    memory     writeback
nand             fetch      decode     execute    memory     writeback
lw                          fetch      decode     execute    memory     writeback
add                                    fetch      decode     execute    memory     writeback
sw                                                fetch      decode     execute    memory     writeback
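
The staggered schedule follows a simple rule: with no stalls, instruction i (counting from 0) occupies stage s (counting from 0) during cycle i + s + 1. Here is a minimal Python sketch of that rule, assuming an ideal pipeline with no hazards. The function names are illustrative and do not come from the slides.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def schedule(instrs):
    """Map each cycle number to the (instruction, stage) pairs active in it.

    Assumes an ideal pipeline: one instruction enters per cycle, no stalls.
    """
    table = {}
    for i, name in enumerate(instrs):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s + 1, []).append((name, stage))
    return table

def total_cycles(n_instrs, n_stages=len(STAGES)):
    """Cycles needed to finish n_instrs (>= 1) on an ideal n_stages pipeline."""
    return n_instrs + n_stages - 1

if __name__ == "__main__":
    program = ["add", "nand", "lw", "add", "sw"]
    for cycle, active in sorted(schedule(program).items()):
        print(cycle, active)
    print("total cycles:", total_cycles(len(program)))
```

For the five-instruction sample this gives 5 + 5 - 1 = 9 cycles, which matches the time axis, versus 25 cycles if each instruction had to finish all five stages before the next one started.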


Pipelining Recap

• Powerful technique for masking latencies
  – Logically, instructions execute one at a time
  – Physically, instructions execute in parallel
    Instruction level parallelism

• Decouples the processor model from the implementation
  – Interface vs. implementation

• BUT dependencies between instructions complicate the implementation


What Can Go Wrong?

• Data hazards
  – register reads occur in stage 2
  – register writes occur in stage 5
  – could read the wrong value if it is about to be written

• Control hazards
  – branch instruction may change the PC in stage 4
  – what do we fetch before that?


Handling Data Hazards

• Detect and Stall
  – If hazards exist, stall the processor until they go away
  – Safe, but not great for performance

• Detect and Forward
  – If hazards exist, fix up the pipeline to get the correct value (if possible)
  – Most common solution for high performance


Handling Data Hazards III:

• Detect
  – dest of earlier instrs == source of current

• Forward:
  – New bypass datapaths route computed data to where it is needed
  – New MUX and control to pick the right data

• Beware: Stalling may still be required even in the presence of forwarding


Different Types of Hazards

• Use register value in subsequent instruction

    add 3 1 2
    or 6 3 1
    nand 5 3 4
    sub 6 3 1


Data Hazards: AL ops

    add 3 1 2
    nand 5 3 4

time →
add   fetch      decode     execute    memory     writeback
nand             fetch      decode     execute    memory     writeback

If not careful, you read the wrong value of R3


Handling Data Hazards: Forwarding

• No point forwarding to decode

• Forward to EX stage

• From output of ALU and MEM stages


Data Hazards: AL ops

    add 3 1 2
    nand 5 3 4

time →
add   fetch      decode     execute    memory     writeback
nand             fetch      decode     execute    memory     writeback


Data Hazards: AL ops

    add 3 1 2
    and 8 0 2
    nand 5 3 4

time →
add   fetch      decode     execute    memory     writeback
and              fetch      decode     execute    memory     writeback
nand                        fetch      decode     execute    memory     writeback


Forwarding Illustration

[Figure: pipeline diagram in instruction order (add $3, add, nand $5 $3); each row shows the stages IM, Reg, ALU, DM, Reg.]


Data Hazards: AL ops

    add 3 1 2
    and 8 0 2
    and 8 2 0
    nand 5 3 4

time →
add   fetch      decode     execute    memory     writeback
and              fetch      decode     execute    memory     writeback
and                         fetch      decode     execute    memory     writeback
nand                                   fetch      decode     execute    memory     writeback

Write in first half of cycle, read in second half of cycle
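
Here is a minimal Python sketch of this convention, assuming a register file model in which each cycle applies the pending write before servicing reads. The class name, method name, and register values are illustrative, not taken from the slides.

```python
class RegisterFile:
    """Eight-register file: within one cycle the write port is serviced in
    the first half and the read ports in the second half."""

    def __init__(self, n_regs=8):
        self.regs = [0] * n_regs

    def cycle(self, write=None, reads=()):
        """write: optional (dest, value) pair from the writeback stage.
        reads: register numbers requested by the decode stage.
        Returns the values read, which already reflect this cycle's write."""
        if write is not None:                      # first half of the cycle
            dest, value = write
            self.regs[dest] = value
        return [self.regs[r] for r in reads]       # second half of the cycle

if __name__ == "__main__":
    rf = RegisterFile()
    rf.regs[4] = 7   # hypothetical initial value
    # add 3 1 2 writes reg 3 in the same cycle in which nand 5 3 4 decodes;
    # the value 21 is a made-up ALU result for illustration.
    print(rf.cycle(write=(3, 21), reads=(3, 4)))
```

With this ordering an instruction three places behind the producer needs no forwarding path, because its decode-stage read already sees the value written in the same cycle.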


Data Hazards: lw

    lw 3 12(zero)
    and 8 0 2
    nand 5 3 4

time →
lw    fetch      decode     execute    memory     writeback
and              fetch      decode     execute    memory     writeback
nand                        fetch      decode     execute    memory     writeback


Data Hazards: lw

    lw 3 12(zero)
    and 8 0 2
    nand 5 3 4

time →
lw:   fetch decode execute memory writeback
nand: fetch decode execute memory writeback
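
With forwarding, an ALU result is available at the end of the execute stage, but a loaded value only exists at the end of the memory stage. An instruction that reads the loaded register immediately after the lw therefore has to wait one cycle. Here is a minimal Python sketch of that check, assuming a five-stage pipeline with forwarding into EX. The (op, dest, sources) tuple encoding and the function name are assumptions made for the sketch, as is treating the zero register as register 0.

```python
def load_use_stalls(program):
    """Count the bubbles a five-stage pipeline with forwarding must insert.

    program: list of (op, dest, sources) tuples; dest is None for
    instructions that write no register. Only an lw followed directly
    by a consumer of its destination stalls in this model.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest is not None and prev_dest in cur_srcs:
            stalls += 1   # loaded value is not ready until the end of MEM
    return stalls

if __name__ == "__main__":
    separated = [("lw", 3, (0,)), ("and", 8, (0, 2)), ("nand", 5, (3, 4))]
    back_to_back = [("lw", 3, (0,)), ("nand", 5, (3, 4))]
    print(load_use_stalls(separated), load_use_stalls(back_to_back))
```

An independent instruction between the lw and its consumer hides the latency, which is why a compiler can often schedule around the stall.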


A tricky case

    add 1 1 2
    add 1 1 3
    add 1 1 4

[Figure: the three instructions in order, each passing through the stages IM, Reg, ALU, DM, Reg.]
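
Every instruction in this sequence both reads and writes register 1, so when the third add reaches execute, two newer values of register 1 are in flight at once. One plausible reading of why the case is tricky is that the bypass MUX has to prefer the most recent producer. Here is a minimal Python sketch of that selection, assuming forwarding from the EX/Mem and Mem/WB pipeline registers. The function name, argument layout, and register values are illustrative.

```python
def select_operand(src, regfile_value, ex_mem, mem_wb):
    """Pick the value for source register src at the ALU input.

    ex_mem and mem_wb are (dest, value) pairs for the instructions one and
    two places ahead in the pipeline, or None if they write no register.
    The younger producer (EX/Mem) wins when both match.
    """
    if ex_mem is not None and ex_mem[0] == src:
        return ex_mem[1]
    if mem_wb is not None and mem_wb[0] == src:
        return mem_wb[1]
    return regfile_value

if __name__ == "__main__":
    # Hypothetical values: reg1 = 1, reg2 = 10, reg3 = 20, reg4 = 30.
    r1_after_first = 1 + 10                                        # add 1 1 2
    r1_after_second = select_operand(1, 1, (1, r1_after_first), None) + 20
    r1_after_third = select_operand(
        1, 1, (1, r1_after_second), (1, r1_after_first)) + 30
    print(r1_after_third)
```

If the Mem/WB match were checked first, the third add would use the stale result of the first add instead of the result of the second.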


Another tricky case

    add 0 4 1
    add 3 0 4


Handling Data Hazards: Summary

• Forward:
  – New bypass datapaths route computed data to where it is needed

• Beware: Stalling may still be required even in the presence of forwarding

• For lw
  – MIPS 2000/3000: a delay slot
  – MIPS 4000 onwards: stall
    But really rely on compiler to treat it like delay slot


Detection

• Detection:
  – Compare regA with previous DestRegs
  – Compare regB with previous DestRegs
  – regA and regB going to ALU
  – Previous DestRegs
    Previously in ALU, Memory and Writeback
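
The comparison can be sketched as follows, assuming each in-flight instruction exposes its destination register number, or None when it writes nothing. The stage names used as dictionary keys and the function name are illustrative.

```python
def detect_hazards(reg_a, reg_b, in_flight):
    """Compare the two source registers headed for the ALU with the
    destination registers of earlier instructions still in the pipeline.

    in_flight: dict mapping a stage name to that instruction's dest
    register, or None if it writes no register.
    Returns a list of (source_name, stage) pairs, one for every match.
    """
    matches = []
    for source_name, reg in (("regA", reg_a), ("regB", reg_b)):
        for stage, dest in in_flight.items():
            if dest is not None and dest == reg:
                matches.append((source_name, stage))
    return matches

if __name__ == "__main__":
    # nand 5 3 4 heading for the ALU while add 3 1 2 is one stage ahead.
    print(detect_hazards(3, 4, {"ALU": 3, "Memory": None, "Writeback": None}))
```

Each match tells the control logic which bypass path would supply the operand; an empty list means the register file values can be used as read.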


[Figure: the pipelined datapath again. PC, instruction memory, register file (R0-R7), ALU, and data memory, separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers carrying op, dest, offset, valA, valB, PC+1, target, ALUresult, and mdata.]


Sample Code

Which data hazards do you see?

    add 3 1 2
    nand 5 3 4
    add 7 6 3
    lw 6 24(3)
    sw 6 12(2)
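
One way to check an answer is to list every read-after-write dependence whose producer is at most three instructions back, since those producers are still in flight in a five-stage pipeline. The sketch below does this with an assumed (op, dest, sources) tuple encoding and an illustrative function name. Whether a given dependence needs forwarding, a stall, or nothing at all depends on the datapath: distance 3 is harmless only if the register file writes in the first half of the cycle and reads in the second.

```python
def raw_dependences(program, window=3):
    """List (producer_index, consumer_index, register, distance) tuples.

    program: list of (op, dest, sources); dest is None when no register
    is written. Only the most recent producer within `window` earlier
    instructions is reported for each source register.
    """
    found = []
    for i, (_, _, srcs) in enumerate(program):
        for src in srcs:
            for d in range(1, window + 1):   # search backwards from i
                j = i - d
                if j < 0:
                    break
                if program[j][1] == src:
                    found.append((j, i, src, d))
                    break
    return found

if __name__ == "__main__":
    sample = [
        ("add", 3, (1, 2)),      # add 3 1 2
        ("nand", 5, (3, 4)),     # nand 5 3 4
        ("add", 7, (6, 3)),      # add 7 6 3
        ("lw", 6, (3,)),         # lw 6 24(3)
        ("sw", None, (6, 2)),    # sw 6 12(2)
    ]
    for dep in raw_dependences(sample):
        print(dep)
```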


Hazard

[Figure: datapath state in the first half of cycle 3. nand 5 3 4 is in IF/ID and add is in ID/EX; forwarding (fwd) controls are marked and register 3 is highlighted.]


[Figure: datapath state at the end of cycle 3. add 7 6 3 is in IF/ID, nand is in ID/EX, and add is in EX/Mem; hazard H1 on register 3 is marked.]


New Hazard

[Figure: datapath state in the first half of cycle 4. add 7 6 3 is in IF/ID, nand is in ID/EX, and add is in EX/Mem; hazard H1 on register 3 is marked at the added MUX.]


[Figure: datapath state at the end of cycle 4. lw 6 24(3) is in IF/ID, add is in ID/EX, nand is in EX/Mem, and add is in Mem/WB; hazards H2 and H1 are marked.]


[Figure: datapath state in the first half of cycle 5. lw 6 24(3) is in IF/ID, add is in ID/EX, nand is in EX/Mem, and add is in Mem/WB; hazards H2 and H1 are marked.]


[Figure: datapath state at the end of cycle 5. sw 6 12(2) is in IF/ID, lw is in ID/EX, add is in EX/Mem, and nand is in Mem/WB; hazards H2 and H1 are marked, along with register 6.]


[Figure: datapath state in the first half of cycle 6. sw 6 12(2) is in IF/ID, lw is in ID/EX, add is in EX/Mem, and nand is in Mem/WB. A hazard on register 6 is flagged and the enable (en) signals are marked.]


[Figure: datapath state at the end of cycle 6. sw 6 12(2) is still in IF/ID, a nop is in ID/EX, lw is in EX/Mem, and add is in Mem/WB; hazard H2 is marked.]


[Figure: datapath state in the first half of cycle 7. sw 6 12(2) is in IF/ID, a nop is in ID/EX, lw is in EX/Mem, and add is in Mem/WB; a hazard on register 6 is flagged.]


[Figure: datapath state at the end of cycle 7. sw is in ID/EX, a nop is in EX/Mem, and lw is in Mem/WB; hazard H3 is marked.]


[Figure: datapath state in the first half of cycle 8. sw is in ID/EX, a nop is in EX/Mem, and lw is in Mem/WB; hazard H3 is marked.]


[Figure: datapath state at the end of cycle 8. sw is in EX/Mem and the nop is in Mem/WB; hazard H3 is marked.]


Control Hazards

    beqz 3 goToZero
    nand 5 1 4
    add 6 7 8

time →
beqz  fetch      decode     execute    memory     writeback
nand             fetch      decode     execute    memory     writeback


Control Hazards for 4-stage pipeline

    beqz 3 goToZero
    nand 5 1 4
    add 6 7 8

time →
beqz  F/D        execute    memory     writeback
nand             F/D        execute    memory     writeback


Control Hazards

• Stall
  – Inject NOPs into the pipeline when the next instruction is not known
  – Pros: simple, clean; Cons: slow

• Delay Slots
  – Tell the programmer that the N instructions after a jump will always be executed, no matter what the outcome of the branch
  – Pros: The compiler may be able to fill the slots with useful instructions; Cons: breaks abstraction boundary

• Speculative Execution
  – Insert instructions into the pipeline
  – Replace instructions with NOPs if the branch comes out opposite of what the processor expected
  – Pros: Clean model, fast; Cons: complex
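
To put rough numbers on the trade-off between stalling and speculating, the cost can be estimated from where the branch resolves: if the PC changes in stage k, the k - 1 fetch slots behind the branch are filled before the new PC is known. Here is a minimal Python sketch, assuming a base CPI of 1 and no other stalls. The branch fraction and misprediction rate used in the example are made-up illustrative values, not figures from the slides, and the function names are hypothetical.

```python
def branch_penalty(resolve_stage):
    """Fetch slots after a branch that are filled before the new PC is known."""
    return resolve_stage - 1

def cpi_stall(branch_fraction, resolve_stage):
    """Average cycles per instruction when every branch injects NOPs."""
    return 1.0 + branch_fraction * branch_penalty(resolve_stage)

def cpi_speculate(branch_fraction, resolve_stage, mispredict_rate):
    """Average CPI when only wrong guesses are replaced with NOPs."""
    return 1.0 + branch_fraction * mispredict_rate * branch_penalty(resolve_stage)

if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 25% of guesses wrong,
    # branch resolved in stage 4 as on the earlier slide.
    print(cpi_stall(0.2, 4))
    print(cpi_speculate(0.2, 4, 0.25))
```

Delay slots do not appear in this model because they change what the program means rather than how long a branch takes; their benefit depends on how often the compiler can fill the slots with useful work.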


Control Hazards for 4-stage pipeline

              beqz 3 goToZero
              nand 5 1 4
              add 6 7 8
    goToZero: lui 2 1

time →
beqz  F/D        execute    memory     writeback
nand             F/D        execute    memory     writeback
lui                         F/D        execute    memory     writeback

Delay slot!


Hazard summary

• Data hazards
  – Forward
  – Stall for lw/sw

• Control hazard
  – Delay slot


Pipelining for Other Circuits

• Pipelining can be applied to any combinational logic circuit

• What’s the circuit latency at 2 ns per gate?
• What’s the throughput?
• What is the stage breakdown? Where would you place latches?
• What’s the throughput after pipelining?
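
The circuit these questions refer to is not reproduced in the text, so the sketch below uses a hypothetical circuit described only by the gate depth of each proposed stage. It assumes every gate takes 2 ns, as stated above, and ignores latch overhead. The function names and example depths are illustrative.

```python
GATE_DELAY_NS = 2.0   # per-gate delay given on the slide

def latency_ns(stage_depths):
    """Unpipelined latency: total gate depth on the critical path times the gate delay."""
    return sum(stage_depths) * GATE_DELAY_NS

def throughput_unpipelined(stage_depths):
    """Results per ns when each input waits for the previous one to finish."""
    return 1.0 / latency_ns(stage_depths)

def throughput_pipelined(stage_depths):
    """Results per ns with latches between stages; the slowest stage sets the clock."""
    return 1.0 / (max(stage_depths) * GATE_DELAY_NS)

if __name__ == "__main__":
    depths = [2, 2, 2]   # hypothetical: three stages, two gates deep each
    print(latency_ns(depths))
    print(throughput_unpipelined(depths))
    print(throughput_pipelined(depths))
```

Balanced stages give the best result: splitting the same six gates as [1, 3, 2] leaves the clock limited by the three-gate stage, so latches are best placed where they even out the depth on each side.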