
ee457_Lab6_Part4_r3.fm 7/22/07

EE457 Lab #6 / part 4, © Copyright 2006 Gandhi Puvvada

1 [Based on question #4.1 of Summer 95 Final] Pipelined Ripple Carry Adder: Given below is an arrangement where a 4-bit register file is to be used with the pipelined ripple carry adder discussed in your class notes. The register file has 8 registers, two read ports, and one write port. We need to be able to perform only two instructions using this setup: an ADD and a NOP (no operation). In the ADD, you add two source registers and store the result in the destination register. In the NOP, it does not matter whether you add or not, but you should NOT STORE any result. Here we are NOT designing any HDU (Hazard Detection Unit) or FU (Forwarding Unit) to deal with data dependencies. Let us assume that the compiler is responsible for inserting NOPs to take care of any dependencies. The instructions are 10 bits long and the formats are given below. The single-bit opcode is a "1" for ADD and a "0" for NOP.

Instructions keep coming into the IF/ID register on every clock. You are not responsible for instruction fetching.

Complete the datapath and control on the next page. Mark the sizes of all the stage registers. Control bits can be carried along with data in the stage registers. Here we are ignoring the final carry C4 and storing the 4-bit result. Do NOT be misled by Miss Bruin’s design below!

Instruction       Opcode   rs = Source Reg 1   rt = Source Reg 2   rd = Destination Reg
size of fields    1 bit    3 bits              3 bits              3 bits
add rd, rs, rt    1        rs2 rs1 rs0         rt2 rt1 rt0         rd2 rd1 rd0
nop               0        rs2 rs1 rs0         rt2 rt1 rt0         rd2 rd1 rd0
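The instruction format and the "add, but drop C4" behavior can be illustrated with a small behavioral model. This is only a sketch, not the pipelined datapath the question asks for: it assumes the opcode is the most significant bit of the 10-bit word (following the left-to-right field order in the table), and the names `Instr`, `decode`, `full_adder`, `ripple_add4`, and `execute` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: int  # 1 = ADD, 0 = NOP
    rs: int      # source register 1 (3 bits)
    rt: int      # source register 2 (3 bits)
    rd: int      # destination register (3 bits)


def decode(word: int) -> Instr:
    """Split a 10-bit word into opcode, rs, rt, rd.

    Assumption: bit 9 is the opcode, bits 8-6 are rs, 5-3 are rt, 2-0 are rd.
    """
    assert 0 <= word < (1 << 10), "instruction must fit in 10 bits"
    return Instr((word >> 9) & 1, (word >> 6) & 7, (word >> 3) & 7, word & 7)


def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ ci
    co = (a & b) | (ci & (a ^ b))
    return s, co


def ripple_add4(a: int, b: int) -> int:
    """Add two 4-bit values with four chained full adders; C4 is discarded."""
    carry = 0
    result = 0
    for i in range(4):  # bit 0 first, as the carry ripples upward
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result  # the final carry is intentionally ignored


def execute(regs: list[int], word: int) -> None:
    """Apply one instruction to an 8-entry, 4-bit register file model."""
    ins = decode(word)
    if ins.opcode == 1:  # ADD writes the result
        regs[ins.rd] = ripple_add4(regs[ins.rs], regs[ins.rt])
    # NOP: nothing is stored, whatever the adder would compute
```

For example, with `regs[1] = 9` and `regs[2] = 8`, executing the word `0b1001010011` (add $3, $1, $2 under the assumed bit order) leaves `regs[3] == 1`, because the carry out of bit 3 is dropped.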

[Figure: Miss Bruin’s Design. The 10-bit IF/ID register (opcode, rs2-rs0, rt2-rt0, rd2-rd0) feeds a REGISTER FILE with read-address ports R1A2-R1A0 (Read_Address_1) and R2A2-R2A0 (Read_Address_2), write-address port WA2-WA0 (Write_Address), read-data ports R1D3-R1D0 (Read_Data_1) and R2D3-R2D0 (Read_Data_2), write-data port WD3-WD0 (Write_Data), and WRITE and CLK (SYS_CLK) inputs. Four one-bit full adders (A, B, Ci, Co, S) occupy stages EX1 to EX4, separated by the stage registers ID/EX1, EX1/EX2, EX2/EX3, EX3/EX4, and EX4/WB. Stages shown: IF, ID, EX1, EX2, EX3, EX4, WB. Stage-register sizes marked in Miss Bruin’s design, as extracted: 12 bits, 12 bits, 11 bits, 10 bits, 8 bits.]

EE457 Lab 6 Part 4 Revised by Gandhi Puvvada and Wei-jen Hsu Based on ee457_Lab6_Part4.fm of 10/15/04 by Gandhi Puvvada


List what major design errors you corrected in Miss Bruin’s design.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

[Figure: blank copy of the datapath to be completed. It shows the 10-bit IF/ID register (opcode, rs2-rs0, rt2-rt0, rd2-rd0), the REGISTER FILE (R1A2-R1A0, R2A2-R2A0, WA2-WA0, R1D3-R1D0, R2D3-R2D0, WD3-WD0, WRITE, CLK driven by SYS_CLK), four one-bit full adders (A, B, Ci, Co, S) in stages EX1 to EX4, and the stage registers ID/EX1, EX1/EX2, EX2/EX3, EX3/EX4, and EX4/WB with their "Size =" fields left blank. rs = Source Reg 1, rt = Source Reg 2, rd = Destination Reg.]


2 [Based on question 5 of Summer 2003 Midterm and question 8 of Spring 1994 Final] Pipeline Design (Stalling / Flushing / Forwarding):

2.1 Bubbles are produced ________________________________________________________ (in stalling only/in flushing only/in stalling as well as in flushing/in neither stalling nor flushing).

2.2 In the early-branch design of the pipeline CPU (current lab6 based on 3rd ed.), flushing and stalling ___________________ (never occur in the same clock cycle/may sometimes occur in the same clock cycle/always occur in the same clock cycle).

In a late-branch design (based on the first edition), if the branch below is successful, do flushing and stalling both occur together, or would one prevent the other? Explain.

beq $1, $2, TARGET
lw  $4, 40($5)
or  $8, $4, $6

________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

2.3 There are 9 (1+4+2+2) control signals generated by the control unit. Eight of these (8 out of 9) are going from the ID stage to the EX stage. Do you need to convert all the 8 signals to zero when you stall an instruction in the ID stage? Please explain below.

2.4 To ___________ (stall/flush) an instruction in ID stage, you inhibit (prevent) updating of the following register(s). (circle as many of the following as you wish)

PC , IF/ID , ID/EX , EX/MEM , MEM/WB

You never inhibit (prevent) updating of a stage register if you are currently ______________________ (flushing / stalling / can not fill this blank with either of the previous two choices).

2.5 In this question we consider the late-branch design of the first edition with one HDU in ID stage and one FU in EX stage, and an internally forwarding register file.


In the answers below, if there is stalling, state the reason for stalling and which instruction(s) in which stage(s) are being stalled. If there is forwarding, state the reason and also state which instruction from which stage is offering forwarding help to which instruction in which stage.

All three streams use the same 3 instructions in different order.

Stream #1            Stream #2            Stream #3
add $3, $3, $1       lw  $3, 40($5)       lw  $3, 40($5)
or  $6, $5, $4       or  $6, $5, $4       add $3, $3, $1
lw  $3, 40($5)       add $3, $3, $1       or  $6, $5, $4

For stream #1 above, the following occur(s): (circle all correct choices)
(i) hazard detection and stalling by HDU
(ii) forwarding by FU
(iii) internal forwarding in the reg. file
(iv) none of these
Remark: ______________________________________________________________
______________________________________________________________________

For stream #2 above, the following occur(s): (circle all correct choices)
(i) hazard detection and stalling by HDU
(ii) forwarding by FU
(iii) internal forwarding in the reg. file
(iv) none of these
Remark: ______________________________________________________________
______________________________________________________________________

For stream #3 above, the following occur(s): (circle all correct choices)
(i) hazard detection and stalling by HDU
(ii) forwarding by FU
(iii) internal forwarding in the reg. file
(iv) none of these
Remark: ______________________________________________________________
______________________________________________________________________

2.5.1 Now reconsider the above three streams in the context of the early-branch design based on the current lab 6. Explain any differences or striking resemblances to your three answers above.
______________________________________________________________________
______________________________________________________________________


2.6 In this question we consider the early-branch design of our current lab 6 with two HDUs (HDU and HDU_Br) and two FUs (FU and FU_Br). Of course the register file is an internally forwarding register file. Identify the dependencies in the following instruction streams and how they should be resolved:

Stream #1                Stream #2
add $2, $2, $2           add $2, $3, $4
sub $1, $2, $3           sub $5, $6, $7
beq $2, $0, loop1        beq $5, $2, loop1

Stream #3                Stream #4
lw  $4, 40($3)           lw  $4, 40($3)
beq $4, $0, loop1        sub $5, $6, $7
                         beq $4, $0, loop1

For the stream #1 above, the following occur(s): (circle all correct choices)
(i) HDU_Br initiated stalling
(ii) HDU initiated stalling
(iii) forwarding by FU_Br
(iv) forwarding by FU
(v) internal forwarding in the reg. file
(vi) none of these
Remark: ______________________________________________________________
______________________________________________________________________

For the stream #2 above, the following occur(s): (circle all correct choices)
(i) HDU_Br initiated stalling
(ii) HDU initiated stalling
(iii) forwarding by FU_Br
(iv) forwarding by FU
(v) internal forwarding in the reg. file
(vi) none of these
Remark: ______________________________________________________________
______________________________________________________________________

For the stream #3 above, the following occur(s): (circle all correct choices)
(i) HDU_Br initiated stalling
(ii) HDU initiated stalling
(iii) forwarding by FU_Br
(iv) forwarding by FU
(v) internal forwarding in the reg. file
(vi) none of these
Remark: ______________________________________________________________
______________________________________________________________________

For the stream #4 above, the following occur(s): (circle all correct choices)
(i) HDU_Br initiated stalling
(ii) HDU initiated stalling
(iii) forwarding by FU_Br
(iv) forwarding by FU
(v) internal forwarding in the reg. file
(vi) none of these
Remark: ______________________________________________________________
______________________________________________________________________
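As a starting point for "identify the dependencies", the read-after-write pairs in a stream can be listed mechanically. The sketch below is only a helper for spotting which instruction depends on which; it says nothing about how the hardware resolves each case. The function names `parse` and `raw_dependencies` are hypothetical, the parser handles only the add/sub/or/lw/beq forms used in these streams, and it assumes $0 is the hard-wired zero register and so never creates a dependency.

```python
import re


def parse(line: str):
    """Return (mnemonic, dest_reg_or_None, source_regs) for one instruction."""
    mnemonic, rest = line.replace(";", "").split(None, 1)
    regs = [int(r) for r in re.findall(r"\$(\d+)", rest)]
    if mnemonic == "beq":
        return mnemonic, None, regs[:2]      # beq reads two registers, writes none
    if mnemonic == "lw":
        return mnemonic, regs[0], regs[1:2]  # lw writes rt, reads the base register
    return mnemonic, regs[0], regs[1:3]      # R-type: rd, rs, rt


def raw_dependencies(stream):
    """List (consumer_index, producer_index, register) tuples.

    Each source register is paired with the nearest earlier instruction
    that writes it. Register $0 is skipped (assumed hard-wired to zero).
    """
    parsed = [parse(line) for line in stream]
    deps = []
    for j, (_, _, sources) in enumerate(parsed):
        for reg in dict.fromkeys(sources):   # de-duplicate, keep order
            if reg == 0:
                continue
            for i in range(j - 1, -1, -1):   # search backwards for the writer
                if parsed[i][1] == reg:
                    deps.append((j, i, reg))
                    break
    return deps


stream1 = ["add $2 , $2 , $2", "sub $1 , $2 , $3", "beq $2 , $0 , loop1"]
print(raw_dependencies(stream1))
```

For Stream #1 this reports that instruction 1 (sub) and instruction 2 (beq) both read $2 written by instruction 0 (add). Deciding whether each case needs a stall, a forward, or nothing is the actual exercise.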


______________________________________________________________________________________________________________________________________________

Summary: In the lab #6 design for the early-branch, we stall the branch instruction for _________ (0/1/2/3/arbitrary) clock cycles if it is dependent on an R-type instruction __________________ (in EX stage / in MEM stage). We stall the branch instruction for _________ (0/1/2/3/arbitrary) clock cycles if it is dependent on an LW instruction in EX stage (i.e. beq is dependent on lw immediately ahead of it) . We stall the branch instruction for _________ (0/1/2/3/arbitrary) clock cycles if it is dependent on an LW instruction in MEM stage.

The result of an R-type instruction is available at the end of _______ (EX/MEM/WB) stage, and the result of an LW instruction is available at the end of _______ (EX/MEM/WB) stage. However, we choose not to forward these results (to beq) from the same stage where they are generated because _________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

2.7 Whenever a load word (lw) instruction is followed by a dependent instruction (dependent on the word being loaded), the HDU detects the hazard and inserts a bubble. This being the case, to reduce the hardware, the compiler (a simple-minded design of a compiler) can be asked to put a NOP (no operation instruction) between such instructions without losing any additional performance. TRUE / FALSE

In the case of an early-branch design, can we use the same principle in the case of control hazards with conditional branch instructions by asking compiler to put one NOP after every conditional branch instruction to avoid the hardware associated with flushing the instruction in IF stage? Tell us first if this suggestion is feasible (meaning, it will produce correct output for the program)? If it is feasible, do you change (lose or gain) performance by doing so? Compare with the above case of lw.

2.8 In this question we focus on the specific point of tapping of the branch control signal in the ID stage for (a) ANDing with the equality inference and (b) for HDU_Br to produce STALL_BEQ. Reproduced below is the relevant extract of the block diagram. In particular, note that both the AND gate in ID-stage and the HDU_Br take the branch control signal from the output of control unit (Point B in the figure).


Mr. Bruin claims that he discovered a problem in this design. He argues that the branch control signal for the AND gate should be taken after the flush mux (Point C) in the design to avoid erroneous branching. For example consider the following stream:

lw  $4, 40($3)
beq $4, $0, loop1

The BEQ instruction should be stalled for 2 clock cycles to resolve its dependency on the LW. However, if register $4 contains 0 before the execution of LW, the AND gate sees a 1 on both of its inputs and would take the branch based on the wrong value of $4! So Mr. Bruin concludes that a false branch will occur. Comment on Mr. Bruin’s discovery.
______________________________________________________________________
______________________________________________________________________

He further offers a solution: move the tapping of the branch control signal from point B to point C instead. Evaluate the proposed solution by answering the following:

It is _______________________________ (a must / a feasible change but does not make any difference / a feasible change that improves the design / a sin) to move the tapping of branch control signal for the AND gate from point B to point C. Explain:
______________________________________________________________________
______________________________________________________________________

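[Figure: extract of the ID-stage block diagram, between the IF/ID and ID/EX registers. It shows the opcode feeding the control unit, the WB, ME, and EX control-signal groups passing through a 0/1 flush mux on their way to ID/EX, the equality comparator (=), the Branch signal, HDU_Br producing STALL_BEQ, the hazard detection unit producing STALL_LW, the combined STALL signal, the PC, and the tap points A (at the opcode, before the control unit), B (at the control unit output), and C (after the flush mux).]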


It is _______________________________ (a must / a feasible change but does not make any difference / a feasible change that improves the design / a sin) to move the tapping of branch control signal for the HDU_Br from point B to point C. Explain:
______________________________________________________________________
______________________________________________________________________

Another person suggests that instead of waiting for the control unit to generate the branch control signal, the OPCODE field can be re-coded so that we can identify BEQ instruction by inspection of a single bit among the six-bit OPCODE field. With this modification, we can bypass the control unit and get branch control signal from point A in the figure. Is this a good suggestion or bad one? Are there any other things we should take care of? Consider the following control sequence. Notice that in case the first BEQ is taken, the second BEQ should be flushed.

beq $0, $1, loop1
beq $4, $2, loop2

______________________________________________________________________
______________________________________________________________________

2.9 Take a closer look at the muxes used to provide forwarding help in the EX stage, reproduced below on the left hand side:

We observe that the two muxes on the left are arranged in the particular order so that the forwarding help with higher priority (help from MEM stage) is fed into the second mux. Is this ordering significant? If the order of the muxes is reversed (as given in the "Modified design" on the right-hand side), can it be made to work? If so, what aspects/precautions need to be taken into consideration in the design of the FU (forwarding unit)? Answer the following questions:

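[Figure: two arrangements of the pair of cascaded 2-to-1 forwarding muxes for $rs in the EX stage.
Original lab design: the first mux, controlled by FW_RS_WB, selects between the original read data (input 0) and the forwarded help from WB stage (input 1); the second mux, controlled by FW_RS_MEM, selects between the first mux's output (input 0) and the forwarded help from MEM stage (input 1).
Modified design: the first mux, controlled by FW_RS_MEM_new, selects between the original read data (input 0) and the forwarded help from MEM stage (input 1); the second mux, controlled by FW_RS_WB_new, selects between the first mux's output (input 0) and the forwarded help from WB stage (input 1).]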
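To experiment with the two mux orderings, the cascades can be modeled as plain functions. This is a sketch under the assumption that input 1 of each mux carries the forwarded help and input 0 passes the previous value through, as the figure labels suggest; the function names `mux2`, `original_design`, and `modified_design` are hypothetical.

```python
from itertools import product


def mux2(sel: int, in0, in1):
    """2-to-1 mux: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def original_design(read_data, wb_help, mem_help, fw_rs_wb, fw_rs_mem):
    """WB help enters the first mux, MEM help enters the second mux."""
    first = mux2(fw_rs_wb, read_data, wb_help)
    return mux2(fw_rs_mem, first, mem_help)


def modified_design(read_data, wb_help, mem_help, fw_rs_wb_new, fw_rs_mem_new):
    """Reversed order: MEM help enters the first mux, WB help the second."""
    first = mux2(fw_rs_mem_new, read_data, mem_help)
    return mux2(fw_rs_wb_new, first, wb_help)


# Print which value reaches the ALU for every combination of the two selects.
for fw_wb, fw_mem in product((0, 1), repeat=2):
    o = original_design("read", "WB", "MEM", fw_wb, fw_mem)
    m = modified_design("read", "WB", "MEM", fw_wb, fw_mem)
    print(f"WB select={fw_wb} MEM select={fw_mem}: original->{o}, modified->{m}")
```

The two functions agree whenever at most one select is asserted and differ only when both are 1, which is the case to think about when deciding whether the FU may generate the two signals independently.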


In the following instruction sequences, we need the forwarded value for $3 ($rs). What should the 2 control signals be?

add $10, $11, $12
add $3, $3, $3
or  $6, $3, $4

In the original design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)
In the modified design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)

add $3, $3, $3
add $10, $11, $12
or  $6, $3, $4

In the original design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)
In the modified design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)

add $3, $3, $3
add $3, $3, $3
or  $6, $3, $4

In the original design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)
In the modified design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)

From the observations made in above instruction sequences, can we generate the 2 forwarding control signals independent of each other (a) in the original design and (b) in the modified design?

__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

2.10 [Based on Question #6 of Fall 2006 midterm] FU_Br and FU in a 5-stage early-branch design:

[Figure: 5-stage early-branch pipeline (IF, ID, EX, MEM, WB) showing the PC, instruction memory (Instr.), register file (Reg), data memory (Data), control, HDU, HDU_Br, FU, FU_Br, the Zero comparison, and the BRANCH signal. The annotation "Remove??" appears in the figure.]

i    add $1, $2, $3            i    add $1, $2, $3
i+1  xor $11, $12, $13         i+1  xor $11, $12, $13
i+2  beq $4, $1, loop          i+2  sub $4, $1, $5

Your friend says that the MEM hazard cases shown in the above two streams are attended to by the FU_Br in ID stage. Agree / Disagree.


He further argues that one set of forwarding muxes in EX stage attending to the same very hazard redundantly (MEM hazard between a dependent instruction in EX stage and donor instruction in WB stage) can be removed. Agree / Disagree. Explain with a suitable example:

[Blank time-space diagram for the answer: columns IF, ID, EX, M, WB; rows CC1, CC2, CC3.]


3 [Based on Question #4 of Fall 1995 Final] Modified Pipeline Design (7-stage pipeline) :

Pipelined CPU: A variation of the 5-stage pipeline CPU is the following 7-stage pipeline CPU. Here we assume that the memory accesses take two clocks - one for TLB access and the second for cache access. Hence we have IF1 and IF2 in the place of IF stage and similarly MEM1 and MEM2 in the place of MEM stage. Many details are omitted in the simple block diagrams given below. As before, we always try to resolve dependency problems through forwarding to the extent possible and will resort to stalling if forwarding cannot help.

[Figure, Late Branch: 7-stage pipelined version of the late-branch design of the 1st edition. Stages IF1, IF2, ID, EX, MEM1, MEM2, WB, with the PC, Instr. TLB, Instr. cache, Reg, Data TLB, Data cache, control, HDU, FU, Zero detection, and the BRANCH signal.]

[Figure, Early Branch: 7-stage pipelined version of the early-branch design of the 3rd ed. and our lab 6. Same stages and blocks, plus HDU_Br and FU_Br. Annotations: "Remove mux Pair 2? See Q 3.1." and "(Treat it as removed for Q3.2)".]

[The forwarding mux pairs in the figures are labeled pair #1, pair #2, and pair #3.]


3.1 The two pairs of forwarding muxes in ID stage (in the early branch design) provide forwarding help from R-type instructions in MEM1 and MEM2 to beq (and also other instructions) in ID stage. Let us investigate whether we really need 3 pairs of forwarding muxes in the EX stage. These muxes (#1, #2, and #3) provide forwarding help to a dependent instruction in the EX stage from (a) an R-type or lw instruction in WB stage, (b) an R-type instruction in MEM2 stage, and (c) an R-type instruction in MEM1 stage respectively (in that specific order to implement the needed priority). Mr. Trojan argues that the mux pair #2 can be removed but not the mux pair #1. Explain.

__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

3.2 Compare the original 5-stage late-branch and early-branch pipelines with these 7-stage versions by answering questions in the tables on the next 7 pages (sorry, it is a long question).

3.3 Flushing of the two instructions in the IF1 and IF2 stages in the case of the 7-stage pipeline:

Note: This part of the design is common to both branch implementations (late or early).

The flushing arrangement shown on the side is extracted from the earlier diagrams. As you can see, it is hardly complete. Two of your assistants submitted the following designs to you. You are asked to finalize this design. You can adopt any one of them as is, or take any one of them and modify it to your liking.

[Figure: three copies of the front end of the 7-stage pipeline (PC, Instr. TLB, Instr. cache, control, stages IF1, IF2, ID, and the BR1 signal). The first is the incomplete flushing arrangement extracted from the earlier diagrams. The other two, each showing two RESET inputs, are captioned "Assistant #1’s design of flush" and "Assistant #2’s design of flush".]


Dependency of an R-type instruction on a load word instruction; stalling by the HDU to resolve the dependency problem:

Each design item is answered for four designs: In 5-stage late-branch, In 5-stage early-branch, In 7-stage late-branch, In 7-stage early-branch.

Design item:
    i      lw  $1, 60($2)
    i+1    add $4, $1, $6
    Any bubbles? How many? Where are they inserted? Complete the Time-Space diagrams. This example is completed by us.

    In 5-stage late-branch:   Bubbles = __1__ (0/1/2/3)
    In 5-stage early-branch:  Bubbles = __1__ (0/1/2/3)
    In 7-stage late-branch:   Bubbles = __2__ (0/1/2/3)
    In 7-stage early-branch:  Bubbles = __2__ (0/1/2/3)

Design item:
    i      lw  $1, 60($2)
    i+1    sub $10, $11, $12
    i+2    add $4, $1, $6
    Any bubbles? How many? Where are they inserted?

    In 5-stage late-branch:   Bubbles = ______ (0/1/2/3)
    In 5-stage early-branch:  Bubbles = ______ (0/1/2/3)
    In 7-stage late-branch:   Bubbles = ______ (0/1/2/3)
    In 7-stage early-branch:  Bubbles = ______ (0/1/2/3)

Design item:
    How many comparators does the HDU (not HDU_Br) have? Where do the destination register addr. inputs to the comparators come from?

    In 5-stage late-branch:   # of comparators = ______  Destination reg. addr. input(s) come(s) from: ______
    In 5-stage early-branch:  # of comparators = ______  Destination reg. addr. input(s) come(s) from: ______
    In 7-stage late-branch:   # of comparators = ______  Destination reg. addr. input(s) come(s) from: ______
    In 7-stage early-branch:  # of comparators = ______  Destination reg. addr. input(s) come(s) from: ______

Design item:
    Delay slots for lw: To avoid the use of HDU, how many delay slots should we declare for lw?

    In 5-stage late-branch:   # of Delay slots = ______
    In 5-stage early-branch:  # of Delay slots = ______
    In 7-stage late-branch:   # of Delay slots = ______
    In 7-stage early-branch:  # of Delay slots = ______

[Time-space diagram grids for the lw/add and lw/sub/add sequences. Two of the grids are labeled "as 5-stage late-branch" and "as 7-stage late-branch".]
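The bubble counts given for the completed lw/add example (1 in the 5-stage designs, 2 in the 7-stage designs) can be reproduced with a small sketch. It is only an illustration under stated assumptions, not the lab's HDU: it assumes full forwarding, that lw produces its data in MEM (5-stage) or MEM2 (7-stage), that the dependent instruction consumes the value in EX, and that a stalled instruction is held in ID. The function names `bubbles` and `time_space` are hypothetical helpers introduced here.

```python
def bubbles(stages, produce_stage, consume_stage, distance):
    """Stall clocks for a consumer `distance` instructions behind its producer.

    Assumption: a value produced during `produce_stage` can be forwarded to a
    consumer whose `consume_stage` falls in any later clock.
    """
    produce_idx = stages.index(produce_stage)
    consume_idx = stages.index(consume_stage)
    # The producer is in stage k during clock k; an unstalled consumer is in
    # stage k during clock k + distance.
    return max(0, produce_idx + 1 - (consume_idx + distance))


def time_space(stages, instrs, stalls=None, stall_stage="ID"):
    """Return one text row per instruction of a time-space diagram.

    `stalls` maps an instruction index to the clocks it is held in
    `stall_stage` (assumed to be ID); younger instructions wait behind it.
    """
    stalls = stalls or {}
    n = len(stages)
    enter = []  # enter[i][s] = clock in which instruction i enters stage s
    for i in range(len(instrs)):
        row = []
        for s in range(n):
            if s == 0:
                t = enter[i - 1][1] if i > 0 else 0
            else:
                hold = stalls.get(i, 0) if stages[s - 1] == stall_stage else 0
                t = row[s - 1] + 1 + hold
                if i > 0 and s + 1 < n:
                    # The older instruction must have vacated this stage.
                    t = max(t, enter[i - 1][s + 1])
            row.append(t)
        enter.append(row)
    lines = []
    for name, row in zip(instrs, enter):
        cells = {}
        for s, t in enumerate(row):
            end = row[s + 1] if s + 1 < n else t + 1
            for clock in range(t, end):
                cells[clock] = stages[s]
        body = "".join(f"{cells.get(c, ''):<5}" for c in range(row[-1] + 1))
        lines.append(f"{name:<6}{body}".rstrip())
    return lines


FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]
SEVEN_STAGE = ["IF1", "IF2", "ID", "EX", "MEM1", "MEM2", "WB"]

n5 = bubbles(FIVE_STAGE, "MEM", "EX", 1)    # lw followed directly by add
n7 = bubbles(SEVEN_STAGE, "MEM2", "EX", 1)
for line in time_space(FIVE_STAGE, ["lw", "add"], stalls={1: n5}):
    print(line)
```

A stall shows up in the printed diagram as a repeated ID entry for the held instruction. The same helpers can be pointed at the other rows by changing `distance`, but the result is only as good as the assumptions listed above.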


Dependency of an R-type instruction on another R-type instruction; Forwarding:

Each design item is answered for four designs: In 5-stage late-branch, In 5-stage early-branch, In 7-stage late-branch, In 7-stage early-branch.

Design item:
    i      add $5, $7, $9
    i+1    xor $1, $2, $3
    i+2    or  $10, $11, $12
    i+3    sub $3, $5, $1
    Explain forwarding to instruction (i+3).

    In 5-stage late-branch:
        sub receives latest $1 from xor when sub is in ______ stage and xor is in ______ stage under the control of ______ (FU/internal forwarding in register file).
        sub receives latest $5 from ______ due to ______ (FU/internal forwarding in register file).

    In 5-stage early-branch:
        sub receives latest $1 from xor first time when sub is in ______ stage and xor is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file). It receives the same value again second time when sub is in ______ stage and xor is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).
        sub receives latest $5 from ______ due to ______ (FU_Br/FU/internal forwarding in register file).

    In 7-stage late-branch:
        sub receives latest $1 from xor when sub is in ______ stage and xor is in ______ stage under the control of ______ (FU/internal forwarding in register file).
        sub receives latest $5 from add when sub is in ______ stage and add is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 7-stage early-branch:
        sub receives latest $1 from xor first time when sub is in ______ stage and xor is in ______ stage under the control of ______ (FU_Br/FU). In the absence of mux pair #2, it ______ (receives/doesn’t receive) the same value again second time after 1 clock.
        sub receives latest $5 from add first time when sub is in ______ stage and add is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file). It receives the same value again second time when sub is in ______ stage and add is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).
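Before filling in the stages, it can help to pin down which older instruction produces each source of sub and how far ahead it is. The sketch below does only that bookkeeping for the add/xor/or/sub sequence. The tuple layout and the function name `latest_producers` are assumptions made for this illustration, and the sketch does not special-case register $0.

```python
def latest_producers(program, consumer_index):
    """Map each source register of the consumer to (producer, distance).

    `program` is a list of (mnemonic, destination, sources) tuples in
    program order; `distance` counts instructions between the two.
    """
    _, _, sources = program[consumer_index]
    found = {}
    for src in sources:
        # Walk backwards so the nearest (most recent) writer wins.
        for j in range(consumer_index - 1, -1, -1):
            op, dest, _ = program[j]
            if dest == src:
                found[src] = (op, consumer_index - j)
                break
    return found


program = [
    ("add", "$5", ("$7", "$9")),     # i
    ("xor", "$1", ("$2", "$3")),     # i+1
    ("or", "$10", ("$11", "$12")),   # i+2
    ("sub", "$3", ("$5", "$1")),     # i+3
]
print(latest_producers(program, 3))
# {'$5': ('add', 3), '$1': ('xor', 2)}
```

The distances (3 for $5, 2 for $1) are what determine which stage each producer occupies when sub reaches the stage where it needs its operands.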


FU_Br, FU details:

Each design item is answered for four designs: In 5-stage late-branch, In 5-stage early-branch, In 7-stage late-branch, In 7-stage early-branch.

Design item:
    How many comparators does the forwarding unit in ID stage (FU_Br, not FU) have? How big are the forwarding muxes (n-bit wide m-to-1 mux)? How many? Where do the data inputs to the muxes come from?

    In 5-stage late-branch:
        Not applicable

    In 5-stage early-branch:
        # of comparators in FU_Br = ______
        Forwarding mux(es) in the A-leg of equality checker (size and number (which is same for the B-leg)) = ______
        Data inputs for this/these come from ______

    In 7-stage late-branch:
        Not applicable

    In 7-stage early-branch:
        # of comparators in FU_Br = ______
        Forwarding mux(es) in the A-leg of equality checker (size and number (which is same for the B-leg)) = ______
        Data inputs for this/these come from ______

Design item:
    How many comparators does the forwarding unit in EX stage (FU, not FU_Br) have? How big are the forwarding muxes (n-bit wide m-to-1 mux)? How many? Where do the data inputs to the muxes come from?

    In 5-stage late-branch:
        # of comparators in FU = ______
        Forwarding mux(es) in the A-leg of ALU (size and number (which is same for the B-leg)) = ______
        Data inputs for this/these come from ______

    In 5-stage early-branch:
        Same as 5-stage late-branch  TRUE / FALSE

    In 7-stage late-branch:
        # of comparators in FU = ______
        Forwarding mux(es) in the A-leg of ALU (size and number (which is same for the B-leg)) = ______
        Data inputs for this/these come from ______

    In 7-stage early-branch:
        Note: Mux Pair #2 is removed.
        # of comparators in FU = ______
        Forwarding mux(es) in the A-leg of ALU (size and number (which is same for the B-leg)) = ______
        Data inputs for this/these come from ______


Priority in FU and FU_Br:

Each design item is answered for four designs: In 5-stage late-branch, In 5-stage early-branch, In 7-stage late-branch, In 7-stage early-branch.

Design item:
    Priority in FU (FU, not FU_Br): Forwarding to a dependent instruction standing in EX stage. Opt to forward from the nearer than the farther.

    5-stage designs:
        The FU prefers to allow forwarding help from the ______ (MEM/WB) over ______ (MEM/WB).
        Priority is implemented by placing the forwarding muxes receiving forwarding help from ______ (MEM/WB) upstream of the forwarding muxes receiving forwarding help from ______ (MEM/WB).

    7-stage designs:
        The FU prefers to allow forwarding help from the ______ (MEM1/MEM2/WB) over ______ (MEM1/MEM2/WB) as well as ______ (MEM1/MEM2/WB). Further ______.
        Note: Mux Pair #2 is removed.
        Priority is implemented by placing the forwarding muxes receiving forwarding help from ______ (MEM1/MEM2/WB) upstream of the forwarding muxes receiving forwarding help from ______ (MEM1/MEM2/WB).

Design item:
    Priority in FU_Br (FU_Br, not FU): Forwarding to a BEQ instruction standing in ID stage. Opt to forward from the nearer than the farther.

    In 5-stage late-branch:
        Not applicable

    In 5-stage early-branch:
        No priority needs to be implemented in FU_Br.  TRUE / FALSE
        Explain: ______

    In 7-stage late-branch:
        Not applicable

    In 7-stage early-branch:
        Priority is implemented by placing the forwarding muxes receiving forwarding help from ______ (EX/MEM1/MEM2/WB) upstream of the forwarding muxes receiving forwarding help from ______ (EX/MEM1/MEM2/WB).
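The rule stated in this table, "opt to forward from the nearer than the farther", can be sketched as a selection function for one ALU operand. This follows the common textbook formulation for a 5-stage pipeline and is not taken from the lab's FU. The name `forward_select`, the `(reg_write, dest)` pair encoding, the `"REGFILE"` label, and the check that skips register 0 are all assumptions made for illustration.

```python
def forward_select(src, in_mem, in_wb):
    """Choose where one EX-stage operand should come from.

    src     register number read by the instruction in EX
    in_mem  (reg_write, dest) for the instruction currently in MEM (nearer)
    in_wb   (reg_write, dest) for the instruction currently in WB (farther)
    """
    for stage, (reg_write, dest) in (("MEM", in_mem), ("WB", in_wb)):
        # Testing the nearer stage first is what gives it priority.
        if reg_write and dest != 0 and dest == src:
            return stage
    return "REGFILE"


# Both older instructions write $2: the nearer one must win.
print(forward_select(2, (True, 2), (True, 2)))    # MEM
# Only the farther instruction writes $2.
print(forward_select(2, (False, 2), (True, 2)))   # WB
# No producer in flight: use the value read from the register file.
print(forward_select(5, (True, 2), (True, 3)))    # REGFILE
```

In hardware the same ordering is usually obtained by mux placement rather than by a loop. The sketch only shows which source should win when several match.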


Dependency of a BEQ instruction on an R-type instruction; Stalling through HDU_Br, Forwarding through FU_Br/FU:

Each design item is answered for four designs: In 5-stage late-branch, In 5-stage early-branch, In 7-stage late-branch, In 7-stage early-branch.

Design item:
    i      beq $2, $4, Target
    How many instructions following a successful branch are flushed?

    In 5-stage late-branch:   # of instructions that need to be flushed = ______
    In 5-stage early-branch:  # of instructions that need to be flushed = ______
    In 7-stage late-branch:   # of instructions that need to be flushed = ______
    In 7-stage early-branch:  # of instructions that need to be flushed = ______

Design item:
    i      add $1, $2, $3
    i+1    beq $1, $0, loop
    How many clock cycles does the BEQ have to be stalled?

    In 5-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from add when beq is in ______ stage and add is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 5-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from add when beq is in ______ stage and add is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).

    In 7-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from add when beq is in ______ stage and add is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 7-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from add when beq is in ______ stage and add is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).

Design item:
    i      add $1, $2, $3
    i+1    xor $11, $12, $13
    i+2    beq $1, $0, loop
    How many clock cycles does the BEQ have to be stalled?

    In 5-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from add when beq is in ______ stage and add is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 5-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from add when beq is in ______ stage and add is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).

    In 7-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from add when beq is in ______ stage and add is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 7-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from add when beq is in ______ stage and add is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).


Dependency of a BEQ instruction on a lw instruction; Stalling through HDU_Br, Forwarding through FU_Br/FU:

Each design item is answered for four designs: In 5-stage late-branch, In 5-stage early-branch, In 7-stage late-branch, In 7-stage early-branch.

Design item:
    i      lw  $1, $2(40)
    i+1    beq $1, $0, loop
    How many clock cycles does the BEQ have to be stalled?

    In 5-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 5-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).

    In 7-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 7-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).

Design item:
    i      lw  $1, $2(40)
    i+1    add $6, $5, $4
    i+2    beq $1, $0, loop
    How many clock cycles does the BEQ have to be stalled?

    In 5-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 5-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).

    In 7-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 7-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).

Design item:
    i      lw  $1, $2(40)
    i+1    add $6, $5, $4
    i+2    or  $16, $15, $14
    i+3    beq $1, $0, loop
    How many clock cycles does the BEQ have to be stalled?

    In 5-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 5-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).

    In 7-stage late-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU/internal forwarding in register file).

    In 7-stage early-branch:
        # of clock cycles beq needs to be stalled = ______
        beq receives latest $1 from lw when beq is in ______ stage and lw is in ______ stage under the control of ______ (FU_Br/FU/internal forwarding in register file).


Miscellaneous:

Each design item is answered for four designs: In 5-stage late-branch, In 5-stage early-branch, In 7-stage late-branch, In 7-stage early-branch. In every row below, the two cells that carry no blanks are marked "Not applicable".

Design item:
    How many comparators does the HDU_Br have? Destination register addr.(s) come(s) to HDU_Br from .....

    In 5-stage early-branch:
        # of comparators in HDU_Br = ______
        Dest. Reg. addr.(s) come(s) from ______

    In 7-stage early-branch:
        # of comparators in HDU_Br = ______
        Dest. Reg. addr.(s) come(s) from ______

Design item:
    Though it is not desirable to “delay” the BEQ execution, how late in the pipeline can you execute the BEQ instr.?

    5-stage: The latest stage for executing BEQ is ______ (EX/MEM/WB).
    7-stage: The latest stage for executing BEQ is ______ (EX/MEM1/MEM2/WB).

Design item:
    The earliest a BEQ can be executed from is:

    5-stage: The earliest stage for executing BEQ is ______ (IF/ID/EX).
    7-stage: The earliest stage for executing BEQ is ______ (IF1/IF2/ID/EX).


4 [Based on Question #5 of Summer 2004 Midterm] Modified Pipeline Design (4-stage pipeline):

4.1 Pipelined CPU design:

Refer to your lab #6 5-stage pipeline design.

For the sake of this problem let us assume that we have a very fast ALU and a very fast Data Memory. Because they are very fast we could combine the EX-stage and the MEM-stage into one stage called EXMEM. Also to make the problem simpler, in this question we don’t consider forwarding help for BEQ instructions. Hence the FU_Br in ID stage has been removed. A BEQ instruction is stalled until the dependency is resolved. On the next page, a partially modified 4-stage design is presented. The HDU is not needed in this design and is removed. The input connections to the FU and HDU_Br are reduced.

Complete the forwarding paths to carry forwarding data to the forwarding MUXes and also input connections to the FU (forwarding unit) on the next page.

4.2 Compare and contrast the 5-stage pipeline design of lab #6 with the 4-stage pipeline design on the next page.

4.2.1 Unlike in the 5-stage pipeline, we do not need the regular HDU for LW dependency in the 4-stage pipeline because ______________________________________________________________________________

However, we still need HDU_Br to stall the BEQ instructions. Answer the following questions about the stalling that occurs in these instruction sequences:

Stream #1:
    lw  $4, $3(40) ;
    add $10, $4, $6 ;

For stream #1, ________ clock cycles are needed for stalling. Remark: ______________________________________________________________

Stream #2:
    lw  $4, $3(40) ;
    beq $10, $4, loop1 ;

For stream #2, ________ clock cycles are needed for stalling. Remark: ______________________________________________________________


Stream #3:
    add $4, $3, $2 ;
    beq $10, $4, loop1 ;

For stream #3, ________ clock cycles are needed for stalling. Remark: ______________________________________________________________

4.2.2 In the 5-stage pipeline, the PCWrite is under the control of ______________________________ (HDU/HDU_Br/FU/FU_Br/Successful Branch/Successful Jump/Combination of these/none of these/none, no need to control, activated all the time).

In the 4-stage pipeline, the PCWrite is under the control of ______________________________ (HDU/HDU_Br/FU/FU_Br/Successful Branch/Successful Jump/Combination of these/none of these/none, no need to control, activated all the time).

4.2.3 The forwarding unit (FU) in the case of the 5-stage pipeline has _____ (0/1/2/3/4/5/6) ________ (1-bit/2-bit/3-bit/4-bit/5-bit/32-bit) comparators, whereas the FU in the case of the 4-stage pipeline has _____ (0/1/2/3/4/5/6) _________ (1-bit/2-bit/3-bit/4-bit/5-bit/32-bit) comparators.

The FU in the case of the 4-stage pipeline produces ____________________ (one/two) outputs, of size __________ (1-bit / each 1-bit / 2-bit / each 2-bit) to control the forwarding muxes.

4.2.4 The HDU_Br (Hazard Detection Unit assisting beq) in the case of the 5-stage pipeline has _____ (0/1/2/3/4/5/6) ________ (1-bit/2-bit/3-bit/4-bit/5-bit/32-bit) comparators, whereas the same unit in the case of the 4-stage pipeline has _____ (0/1/2/3/4/5/6) _________ (1-bit/2-bit/3-bit/4-bit/5-bit/32-bit) comparators.

4.2.5 _______ (Like / Unlike) in the case of the 5-stage pipeline, we ____________ (need / don’t need) prioritization in the 4-stage pipeline in providing forwarding help to instr #3 in the following sequence of adds:

    instr #1    add $2, $2, $2
    instr #2    add $2, $2, $2
    instr #3    add $2, $2, $2

4.2.6 If the clock frequency is the same for the two pipelines and we ignore the control (branch) hazard, the performance of the 4-stage pipeline is ________________________________________ (better than / equal to / worse than / sometimes better than and sometimes worse than) the 5-stage pipeline performance. Explain. ______________________________________________________________________________

4.2.7 In the 4-stage pipeline, since the ALU and the Memory are both in one stage, they can work simultaneously and this merging of ALU with Memory in a single stage does not call for extending the clock period (even if we use the original ALU and Data memory which are NOT fast). TRUE / FALSE Explain. ______________________________________________________________________________
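For the three back-to-back `add $2, $2, $2` instructions of 4.2.5, a short numeric sketch shows why the choice of forwarding source matters when more than one older instruction writes the same register. It models only the textbook 5-stage situation, where instr #3 can see the results of both instr #1 and instr #2 in flight; it makes no claim about the 4-stage design. The function name `run_adds` and the initial value 1 are assumptions made for illustration.

```python
def run_adds(count, prefer_nearest=True, initial=1):
    """Results written by `count` back-to-back `add $2, $2, $2`.

    Assumption: from the third instruction on, two older results are in
    flight and the forwarding logic must pick one of them.
    """
    results = []  # results[k] = value instruction k writes to $2
    for k in range(count):
        if k == 0:
            operand = initial            # nothing in flight yet
        elif prefer_nearest or k == 1:
            operand = results[k - 1]     # result of the nearer producer
        else:
            operand = results[k - 2]     # stale result of the farther one
        results.append(operand + operand)
    return results


print(run_adds(3))                         # [2, 4, 8]
print(run_adds(3, prefer_nearest=False))   # [2, 4, 4]
```

With nearest-first selection the third add doubles the value produced by the second one. Picking the farther producer silently reuses a stale value.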


[Figure: Early Branch 4-stage pipeline datapath. Stages: IF-Stage, ID-Stage, EXMEM-Stage, WB-Stage; stage registers: IF/ID, ID/EXMEM, EXMEM/WB. Labeled blocks: PC, Instruction memory, Registers, Control, Sign ext., Shift Left 2, equality checker (=), ALU, ALU ctrl, Data memory, Forwarding Unit, HDU_Br. Labeled signals include ALUSrc, ALUOp, RegDst, RegWrite_EX, WriteRegister_EX, MemRead, MemWrite, MemtoReg, RegWrite, Branch, IF.Flush, STALL_BEQ, and STALL. The forwarding paths and the FU input connections are left to be completed.]


[Figure: Late Branch (OLD Lab 6). Pipelined CPU (Late Branch from 1st Ed.) for the EE457 class Lab #6; original drawing provided by Prof. Dubois, 3/26/2000. Stages: IF-Stage, ID-Stage, EX-Stage, MEM-Stage, WB-Stage; stage registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled blocks: PC, Instruction memory, Registers, Control, Sign ext., Shift Left 2, ALU, ALU ctrl, Data memory, Forwarding unit, Hazard detection unit. Labeled signals include ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg, RegWrite, Branch, IF.Flush, ID.Flush, and EX.Flush.]


[Figure: Early Branch (Current Lab6). Designed by: Gandhi Puvvada. Detailed implementation of Early Branch suggested in 3rd Ed., 10/18/06. Drawn by: Wei-jen Hsu. Stages: IF-Stage, ID-Stage, EX-Stage, MEM-Stage, WB-Stage; stage registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled blocks: PC, Instruction memory, Registers, Control, Sign ext., Shift Left 2, equality checker (=), ALU, ALU ctrl, Data memory, Hazard detection unit, Forwarding Unit, FU_Br, HDU_Br. Labeled signals include FW_RS, FW_RT, FW_RS_MEM, FW_RS_WB, FW_RT_MEM, FW_RT_WB, WriteRegister_EX, WriteRegister_MEM, MemRead_EX, MemRead_MEM, RegWrite_EX, STALL_BEQ, STALL_LW, STALL, Branch, and IF.Flush.]