39
1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

1

RISC Pipeline

Han WangCS3410, Spring 2010

Computer ScienceCornell University

See: P&H Chapter 4.6

Page 2: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

2

Homework 2

0 1 2 3 4 5 6 7 8 9

Page 3: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

3

Announcements- Homework 2 due tomorrow midnight- Programming Assignment 1 release tomorrow- Pipelined MIPS processor (topic of today)- Subset of MIPS ISA- Feedback- We want to hear from you!- Content?

Page 4: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

4

Absolute Jump

tgt

+4

||

Data Mem

addr

ext

5 5 5

Reg.File

PC

Prog.Mem ALUinst

control

imm

offset

+

=?

cmp

Could have used ALU for

link add

+4

op mnemonic description0x3 JAL target r31 = PC+8 (+8 due to branch delay slot)

PC = (PC+4)31..28 || (target << 2)

Page 5: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

5

A Processor

alu

PC

imm

memory

memory

din dout

addr

target

offset cmpcontrol

=?

new pc

registerfile

inst

extend

+4 +4

Review: Single cycle processor

Page 6: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

6

Single Cycle ProcessorAdvantages• Single Cycle per instruction make logic and clock simple

Disadvantages• Since instructions take different time to finish, memory

and functional unit are not efficiently utilized.• Cycle time is the longest delay.

– Load instruction

• Best possible CPI is 1

Page 7: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

7

Pipeline Hazards

0h 1h 2h 3h…

Page 8: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

8

Write-BackMemory

InstructionFetch Execute

InstructionDecode

registerfile

control

A Processor

alu

imm

memory

din dout

addr

inst

PC

memory

computejump/branch

targets

new pc

+4

extend

Page 9: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

9

Basic Pipeline

Five stage “RISC” load-store architecture1. Instruction fetch (IF)

– get instruction from memory, increment PC2. Instruction Decode (ID)

– translate opcode into control signals and read registers3. Execute (EX)

– perform ALU operation, compute jump/branch targets4. Memory (MEM)

– access memory if needed5. Writeback (WB)

– update register file

Slides thanks to Sally McKee & Kavita Bala

Page 10: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

10

Pipelined Implementation

Break instructions across multiple clock cycles (five, in this case)

Design a separate stage for the execution performed during each clock cycle

Add pipeline registers to isolate signals between different stages

Page 11: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

11

Write-BackMemory

InstructionFetch Execute

InstructionDecode

extend

registerfile

control

Pipelined Processor

alu

memory

din dout

addrPC

memory

newpc

inst

IF/ID ID/EX EX/MEM MEM/WB

imm

BA

ctrl

ctrl

ctrl

BD D

M

computejump/branch

targets

+4

Page 12: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

12

IF

Stage 1: Instruction Fetch

Fetch a new instruction every cycle• Current PC is index to instruction memory• Increment the PC at end of cycle (assume no branches for now)

Write values of interest to pipeline register (IF/ID)• Instruction bits (for later decoding)• PC+4 (for later computing branch targets)

Page 13: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

13

IF

PC

instructionmemory

newpc

inst

addr mc

00 = read word

1

IF/ID

WE 1

Rest

of p

ipel

ine

+4

PC+4

pcsel

pcregpcrel

pcabs

Page 14: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

14

ID

Stage 2: Instruction Decode

On every cycle:• Read IF/ID pipeline register to get instruction bits• Decode instruction, generate control signals• Read from register file

Write values of interest to pipeline register (ID/EX)• Control information, Rd index, immediates, offsets, …• Contents of Ra, Rb• PC+4 (for computing branch targets later)

Page 15: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

15

ID

ctrl

ID/EX

Rest

of p

ipel

ine

PC+4

inst

IF/ID

PC+4

Stag

e 1:

Inst

ructi

on F

etch

registerfile

WERd

Ra Rb

DB

A

BA

extend imm

decode

result

dest

Page 16: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

16

EX

Stage 3: Execute

On every cycle:• Read ID/EX pipeline register to get values and control bits• Perform ALU operation• Compute targets (PC+4+offset, etc.) in case this is a branch• Decide if jump/branch should be taken

Write values of interest to pipeline register (EX/MEM)• Control information, Rd index, …• Result of ALU operation• Value in case this is a memory store instruction

Page 17: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

17

Stag

e 2:

Inst

ructi

on D

ecod

e

pcrel

pcabs

EX

ctrl

EX/MEM

Rest

of p

ipel

ine

BD

ctrl

ID/EX

PC+4

BA

alu

+

||

branch?

imm

pcsel

pcreg

Page 18: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

18

MEM

Stage 4: Memory

On every cycle:• Read EX/MEM pipeline register to get values and control bits• Perform memory load/store if needed

– address is ALU result

Write values of interest to pipeline register (MEM/WB)• Control information, Rd index, …• Result of memory operation• Pass result of ALU operation

Page 19: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

19

MEM

ctrl

MEM/WB

Rest

of p

ipel

ine

Stag

e 3:

Exe

cute

MD

ctrl

EX/MEM

BD

memory

din dout

addr

mc

Page 20: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

20

WB

Stage 5: Write-back

On every cycle:• Read MEM/WB pipeline register to get values and control bits• Select value and write to register file

Page 21: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

21

WB

Stag

e 4:

Mem

ory

ctrl

MEM/WB

MD

result

dest

Page 22: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

22IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addrinst

PC+4

OP

BA

Rd

BD

MD

PC+4

imm

OP

Rd

OP

Rd

PC

instmem

Rd

Ra Rb

DB

A

Page 23: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

23

Example

add r3, r1, r2; nand r6, r4, r5; lw r4, 20(r2); add r5, r2, r5; sw r7, 12(r3);

Page 24: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

24

0:add1:nand2:lw3:add4:sw

r0r1r2r3r4r5r6r7

0369

12187

4122

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addrinst

PC+4

OP

BA

Rd

BD

MD

PC+4

imm

OP

Rd

OP

Rd

PC

instmem

77

add r3, r1, r2nand r6, r4, r5 add r3, r1, r2lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2)sw r7, 12(r3) add r5, r2, r5 sw r7, 12(r3)

Rd

Ra Rb

DB

A

Page 25: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

25

Time Graphs

1 2 3 4 5 6 7 8 9

add

nand

lw

add

sw

Clock cycle

Latency:Throughput:Concurrency:

CPI =

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

Page 26: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

26

Pipelining Recap

Powerful technique for masking latencies• Logically, instructions execute one at a time• Physically, instructions execute in parallel

– Instruction level parallelism

Abstraction promotes decoupling• Interface (ISA) vs. implementation (Pipeline)

Page 27: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

27

The end

Page 28: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

28

Sample Code (Simple)

Assume eight-register machineRun the following code on a pipelined datapath

add 3 1 2 ; reg 3 = reg 1 + reg 2 nand 6 4 5 ; reg 6 = ~(reg 4 & reg 5) lw 4 20 (2) ; reg 4 = Mem[reg2+20] add 5 2 5 ; reg 5 = reg 2 + reg 5 sw 7 12(3) ; Mem[reg3+12] = reg 7

Slides thanks to Sally McKee

Page 29: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

29

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

op

dest

offset

valB

valA

PC+1PC+1target

ALUresult

op

dest

valB

op

dest

ALUresult

mdata

instruction

0

R2

R3

R4

R5

R1

R6

R0

R7

regA

regB

Bits 21-23

data

dest

IF/ID ID/EX EX/MEM MEM/WB

Page 30: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

30

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

nop

0

0

0

0

000

0

nop

0

0

nop

0

0

0

0

nop

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 21-23

data

dest

InitialState

IF/ID ID/EX EX/MEM MEM/WB

Page 31: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

31

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

nop

0

0

0

0

010

0

nop

0

0

nop

0

0

0

0

add 3 1 2

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 21-23

data

dest

Fetch: add 3 1 2

add 3 1 2

Time: 1 IF/ID ID/EX EX/MEM MEM/WB

Page 32: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

32

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

add

3

3

9

36

120

0

nop

0

0

nop

0

0

0

0nand 6 4 5

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

1

2

Bits 21-23

data

dest

Fetch: nand 6 4 5

nand 6 4 5 add 3 1 2

Time: 2 IF/ID ID/EX EX/MEM MEM/WB

Page 33: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

33

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

nand

6

6

7

18

234

45

add

3

9

nop

0

0

0

0lw 4 20(2)

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

4

5

Bits 21-23

data

dest

Fetch: lw 4 20(2)

lw 4 20(2) nand 6 4 5 add 3 1 2

Time: 3

36

9

3

IF/ID ID/EX EX/MEM MEM/WB

Page 34: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

34

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

lw

4

20

18

9

348

-3

nand

6

7

add

3

45

0

0add 5 2 5

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

2

4

Bits 21-23

data

dest

Fetch: add 5 2 5

add 5 2 5 lw 4 20(2) nand 6 4 5 add 3 1 2

Time: 4

18

7

6

45

3

IF/ID ID/EX EX/MEM MEM/WB

Page 35: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

35

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

add

5

5

7

9

4523

29

lw

4

18

nand

6

-3

0

0sw 7 12(3)

945187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

2

5

Bits 21-23

data

dest

Fetch: sw 7 12(3)

sw 7 12(3) add 5 2 5 lw 4 20 (2) nand 6 4 5 add 3 1 2

Time: 5

9

20

4

-3

6

45

3

IF/ID ID/EX EX/MEM MEM/WB

Page 36: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

36

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

sw

7

12

22

45

5 9

16

add

5

7

lw

4

29

99

09

45187

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

3

7

Bits 21-23

data

dest

No moreinstructions

sw 7 12(3) add 5 2 5 lw 4 20(2) nand 6 4 5

Time: 6

9

7

5

29

4

-3

6

IF/ID ID/EX EX/MEM MEM/WB

Page 37: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

37

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

15

57

sw

7

22

add

5

16

0

09

45997

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 21-23

data

dest

No moreinstructions

nop nop sw 7 12(3) add 5 2 5 lw 4 20(2)

Time: 7

45

7

12

16

5

99

4

IF/ID ID/EX EX/MEM MEM/WB

Page 38: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

38

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

sw

7

57

0

9

459916

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 21-23

data

dest

No moreinstructions

nop nop nop sw 7 12(3) add 5 2 5

Time: 8

2257

22

16

5

Slides thanks to Sally McKee

IF/ID ID/EX EX/MEM MEM/WB

Page 39: 1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

39

PC Instmem

Regi

ster

file

MUXA

LU

MUX

1

Datamem

+

MUX

MUX

Bits 0-2

Bits 15-17

9

459916

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 21-23

data

dest

No moreinstructions

nop nop nop nop sw 7 12(3)

Time: 9 IF/ID ID/EX EX/MEM MEM/WB