CA226 — Advanced Computer Architecture

Stephen Blott <[email protected]>

Types of Hazard

Structural hazards
resource conflicts; hardware cannot support all instruction combinations simultaneously

Data hazards
when one instruction depends upon the result (which is not yet available) of a previous instruction

Control hazards
when the address of the next instruction cannot be determined immediately (branch and jump instructions; today's topic)

Control Hazards

Control hazards:

• arise from pipelining of branch (and jump) instructions

As described thus far, branching decisions:

• are made during the Mem stage of the pipeline

A naive approach:

• stall until branch decision is known

Terminology

Whenever we encounter a branch:

• it is:

• either taken, or not taken

• the cost may be different in each case

Naive Branching

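Branch not taken:

             1      2      3      4      5      6      7
branch       IF     ID     Ex     Mem**  WB
branch+4            stall  stall  stall  **IF   ID     Ex
branch+8                   stall  stall  stall  IF     ID

Branch taken:

             1      2      3      4      5      6      7
branch       IF     ID     Ex     Mem**  WB
target              stall  stall  stall  **IF   ID     Ex
target+4                   stall  stall  stall  IF     ID

(** marks the stage in which the branch decision becomes known, and the stage it is routed to.)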

Unfortunately …

This will result in:

• the pipeline being stalled for three cycles every time a branch is encountered

• and branch instructions are common
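To get a feel for what a fixed three-cycle stall costs, here is a minimal sketch of the usual cycles-per-instruction estimate. The function name and the branch fraction of 0.2 are assumptions chosen for illustration; the slides do not give a figure for how common branches are.

```python
def effective_cpi(branch_fraction, stall_cycles_per_branch, base_cpi=1.0):
    """Average cycles per instruction once branch stalls are included.

    Assumes an otherwise ideal pipeline (base_cpi cycles per instruction)
    and that every branch costs the same number of stall cycles.
    """
    return base_cpi + branch_fraction * stall_cycles_per_branch


# Hypothetical instruction mix: one instruction in five is a branch.
naive = effective_cpi(branch_fraction=0.2, stall_cycles_per_branch=3)
print(f"naive branching CPI: {naive:.2f}")
```

Under these assumed numbers the pipeline spends well over half as long again on the same program, which is why the rest of this section works on reducing the per-branch stall.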

What might help is:

• a prediction

Predict that a branch will either be:

• taken, or not taken

Easiest thing to do:

• predict branch not taken

• simply allow subsequent instructions to continue to flow into the pipeline

Predict Not Taken

Table 1. And branch is indeed not taken:

             1      2      3      4      5      6      7      8
branch       IF     ID     Ex     Mem**  WB
branch+4            IF     ID     Ex     **Mem  WB
branch+8                   IF     ID     Ex     Mem    WB
branch+12                         IF     ID     Ex     Mem

Perfect!

• But what if the branch is in fact taken?

Predict Not Taken

Table 2. But branch is in fact taken:

             1      2      3      4      5      6      7      8
branch       IF     ID     Ex     Mem**  WB
branch+4            IF     ID     Ex     **Mem  WB
branch+8                   IF     ID     **Ex   Mem    WB
branch+12                         IF     **ID   Ex     Mem
target                                   **IF

Predict Not Taken

Table 3. But branch is in fact taken:

             1      2      3      4      5      6      7      8
branch       IF     ID     Ex     Mem**  WB
branch+4            IF     ID     Ex     **nop  nop
branch+8                   IF     ID     **nop  nop    nop
branch+12                         IF     **nop  nop    nop
target                                   **IF

Observe:

• none of the subsequent instructions has yet changed memory or any registers

• that’s helpful!

• replace them with nop instructions

(Still a stall of three cycles when branch taken.)

Slightly Better

When a branch instruction is detected:

• route the Branch Taken condition:

• from Ex (instead of from Mem)

• to ID (instead of to IF)

MIPS Pipeline

Example

Table 4. Branch not taken:

             1      2      3      4      5      6      7
branch       IF     ID     Ex**   Mem    WB
branch+4            IF     stall  **ID   Ex     Mem    WB
branch+8                          IF     ID     Ex     Mem
branch+12                                IF     ID     Ex

Note

We save two stalls: one because we learn the decision one cycle sooner, and one because we allow the subsequent instruction into IF.

Example — Branch Taken

Table 5. Branch taken:

             1      2      3      4      5      6      7
branch       IF     ID     Ex**   Mem    WB
branch+4            IF     nop    **ID   nop    nop    nop
target                            **IF   ID     Ex     Mem
target+4                                 IF     ID     Ex

Note

An effective stall of two cycles, but one better than before, because we learn if the branch is taken one cycle sooner.

Where do we stand?

If a branch is not taken:

• we have a stall of one cycle

If a branch is taken:

• we have a stall of two cycles

In Practice

Unfortunately:

• branches are common

• and most branches are taken (which is indeed unfortunate)

In Practice

Add additional hardware in ID:

• detect branches

• decode the target address:
  target = IF/ID.nPC + (sign-extend(IF/ID.IR(0..15)) << 2)
  (so we need at least an adder)

• calculate whether the branch is taken; we need to:

• test equality, and for zero
  (and perhaps a couple of other tests)
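The target calculation can be sketched in a few lines. This is only an illustration of the arithmetic the extra adder in ID performs: the function names are made up for this example, and it assumes the branch offset is the low 16 bits of the instruction word, counted in instructions (hence the shift by two).

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(npc, instruction):
    """target = nPC + (sign-extended 16-bit immediate << 2)."""
    immediate = sign_extend16(instruction)  # only the low 16 bits are used
    return npc + immediate * 4              # same as shifting left by two


# Offset -1: branch back to the branch instruction itself (nPC - 4).
print(hex(branch_target(0x1004, 0x0000FFFF)))
# Offset +3: skip three instructions past nPC.
print(hex(branch_target(0x1004, 0x00000003)))
```

The upper 16 bits of the instruction word (opcode and register fields) are ignored here; only the offset field matters for the target address.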

So:

• branching is so common and the cost of stalls so great,

• that it is worth the cost and complexity of additional hardware in the ID pipeline stage

So:

• we determine one stage earlier still whether a branch is taken or not (in ID, now, instead of in Ex)

So, we have:

• no stall if the branch is not taken, and

• a one-cycle stall if the branch is taken

Now…

Table 6. Branch not taken:

             1      2      3      4      5      6      7
branch       IF     ID**   Ex     Mem    WB
branch+4            IF     **ID   Ex     Mem    WB

Now…

Table 7. Branch taken:

             1      2      3      4      5      6      7
branch       IF     ID**   Ex     Mem    WB
branch+4            IF     **nop  nop    nop    nop
target                     **IF   ID     Ex     Mem    WB
target+4                          IF     ID     Ex     Mem

Try these in the simulator …

    bnez  r0,target    ; no stall
    daddi r1,r0,1

    beqz  r0,target    ; branch taken, stall of 1 cycle
    daddi r1,r1,1

Note to self:

• see branch.s

Predict Not Taken

In effect:

• we’re guessing, here, that the branch will not be taken

• so this strategy is known as predict not taken

So:

• no stall if the branch is not taken

• a stall of one cycle if the branch is taken

What might the average number of stall cycles for branch instructions be?

Unfortunately, …

The common case in practice is …

• that the branch is taken!

• so the average number of stalls per branch, in practice, approaches 1
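The average can be worked out as a weighted sum of the two cases. The sketch below tabulates the (not taken, taken) stall counts quoted so far for each scheme; the scheme labels, the function name and the 90% taken fraction are assumptions for illustration, not measurements.

```python
# (stall cycles if not taken, stall cycles if taken), as quoted in the slides above
SCHEMES = {
    "naive, decided in Mem": (3, 3),
    "predict not taken, decided in Mem": (0, 3),
    "decided in Ex": (1, 2),
    "decided in ID": (0, 1),
}


def average_stalls(scheme, taken_fraction):
    """Expected stall cycles per branch for a given fraction of taken branches."""
    not_taken_cost, taken_cost = SCHEMES[scheme]
    return (1 - taken_fraction) * not_taken_cost + taken_fraction * taken_cost


# Hypothetical workload in which 90% of branches are taken.
for name in SCHEMES:
    print(f"{name:36s} {average_stalls(name, 0.9):.2f}")
```

With the decision made in ID the average is simply the taken fraction, so it approaches 1 as taken branches dominate.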

Because …

    for (i=0; i<N; i+=1)
    {
        // do stuff
    }

Whenever we have such a loop:

• the branch is taken more often than not taken

Because …

      daddi r1,r0,0      ; i=0;
      beq   r1,r2,done   ; if (i==N) goto done;
loop:                    ; do stuff
      daddi r1,r1,1      ; i+=1;
      bne   r1,r2,loop   ; if (i!=N) goto loop;
done:

The bne instruction:

• is executed about N times, and the branch is taken every time except the last, so the stalls-per-branch approaches 1
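A quick way to check the claim is to count outcomes of the loop-closing bne directly. The sketch below mirrors the control flow of the assembly loop in Python; the function name is made up, and it assumes (as the comments in the assembly suggest) that r2 holds N and that N is not negative.

```python
def loop_branch_outcomes(n):
    """Return (taken, not_taken) counts for the bne that closes the loop."""
    if n < 0:
        raise ValueError("n must be non-negative")
    taken = not_taken = 0
    i = 0                       # daddi r1,r0,0
    if i == n:                  # beq: skip the loop entirely when N is 0
        return taken, not_taken
    while True:
        i += 1                  # daddi r1,r1,1
        if i != n:              # bne: branch back to the top of the loop
            taken += 1
        else:                   # falls through to done
            not_taken += 1
            return taken, not_taken


taken, not_taken = loop_branch_outcomes(100)
print(taken, not_taken, taken / (taken + not_taken))
```

For N iterations the bne is taken N-1 times and falls through once, so with a one-cycle penalty per taken branch the average is (N-1)/N stall cycles per branch.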

Might we do better?

A predict branch taken strategy:

• would be helpful

• unfortunately, this is not possible on MIPS:

• we only learn the target address after the ID stage

• so a cycle has already been wasted

Hmm:

• Wasted.

• Or is it?

How might we:

• make good use of that "wasted" cycle?

The "Branch Delay Slot"

A branch delay slot is:

• the instruction following any branch (or jump) instruction

Approach:

• the instruction in the delay slot is always executed, whether the branch is taken or not

The "Delay Slot"

Table 8. Branch not taken:

                1      2      3      4      5      6      7
branch          IF     ID**   Ex     Mem    WB
branch+4 (BDS)         IF     **ID   Ex     Mem    WB
branch+8                      IF     ID     Ex     Mem    WB

The instruction after the branch:

• is always executed

• good!

The "Delay Slot"

Table 9. Branch taken:

                1      2      3      4      5      6      7
branch          IF     ID**   Ex     Mem    WB
branch+4 (BDS)         IF     ID     Ex     Mem    WB
target                        **IF   ID     Ex     Mem    WB
target+4                             IF     ID     Ex     Mem

The instruction after the branch:

• is always executed

• "branch+4" is executed anyway

• no stall!

The "Delay Slot"

On such hardware, compilers:

• must insert a suitable instruction into the delay slot

• or, if that is not possible, then a nop (poor solution)

Some Cases — nop

This:

    dadd r1,r2,r3
    bnez r2,somewhere

Becomes:

    dadd r1,r2,r3
    bnez r2,somewhere
    nop               ; poor solution, effectively a stall

Note

Correct, but not great. The nop is in effect a stall.

Some Cases — Independent Instruction

This:

    dadd r1,r2,r3
    bnez r2,somewhere

Becomes:

    bnez r2,somewhere
    dadd r1,r2,r3     ; the branch does not depend on r1

Some Cases — Temporary Registers

This:

        dadd r1,r2,r3
        or   r20,r2,r3    ; r20 is a temporary register within this loop
        bnez r1,target
        ...
target:
        dsub r4,r5,r6

Becomes:

        dadd r1,r2,r3
        bnez r1,target
        or   r20,r2,r3    ; doesn't matter if executed
        ...               ; again, the delay cycle is effectively lost
target:                   ; but only if the branch is taken! (no nop)
        dsub r4,r5,r6

Loop — Far Better

This:

target: dsub  r4,r5,r6    ; assume r4 is a temporary register
        ...               ; do stuff
        daddi r1,r1,-1
        bnez  r1,target   ; branch depends on r1
        nop               ; BDS: we want to use this slot

Loop — Far Better

This:

target: dsub  r4,r5,r6    ; assume r4 is a temporary register
        ...               ; do stuff
        daddi r1,r1,-1
        bnez  r1,target

Becomes:

        dsub  r4,r5,r6    ; moved up
target: ...               ; do stuff
        daddi r1,r1,-1
        bnez  r1,target
        dsub  r4,r5,r6    ; repeated, from above
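To see what the rewrite buys, one can count the instruction slots each version of the loop occupies. This is a rough sketch under simplifying assumptions: one slot per instruction, no other stalls, and a loop body ("do stuff") of body_length instructions. The function and parameter names are invented for this example.

```python
def executed_slots(iterations, body_length, fill_delay_slot):
    """Instruction slots used by the loop, with or without a filled delay slot."""
    per_iteration = body_length + 2   # loop body + daddi + bnez
    if fill_delay_slot:
        # dsub runs once before the loop, then once per iteration in the delay slot
        return 1 + iterations * (per_iteration + 1)
    # dsub at the loop head plus a nop in the delay slot, on every iteration
    return iterations * (per_iteration + 2)


with_nop = executed_slots(iterations=10, body_length=5, fill_delay_slot=False)
filled = executed_slots(iterations=10, body_length=5, fill_delay_slot=True)
print(with_nop, filled, with_nop - filled)
```

The filled version saves one slot per iteration at the price of a single extra dsub. That final dsub, executed in the delay slot as the loop exits, is harmless only because r4 is assumed to be a temporary register.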

Try these in the simulator, again …

    bnez  r0,target    ; no stall
    daddi r1,r0,1

    beqz  r0,target    ; branch taken, no stall with branch delay slot
    daddi r1,r1,1

Note

This time with the branch delay slot enabled.

More Insurmountable Stalls

Example:

    dadd r1,r2,r3
    bnez r1,target    ; stall one cycle

    ld   r1,N(r0)
    bnez r1,target    ; stall two cycles

The branch:

• depends upon an immediately preceding arithmetic instruction (stall one cycle)

• depends upon an immediately preceding load (stall two cycles)

Another Insurmountable Stall

Table 10. If branch taken is resolved in Ex:

                1      2      3      4      5      6      7
dadd r1,r2,r3   IF     ID     Ex**   Mem    WB
bnez r1,target         IF     ID     **Ex   Mem    WB
delay slot                    IF     ID     Ex     Mem    WB

No problem:

• r1 can be forwarded, as before

Another Insurmountable Stall

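Table 11. If branch taken is resolved in ID:

                1      2      3      4      5      6      7
dadd r1,r2,r3   IF     ID     Ex**   Mem    WB
bnez r1,target         IF     **ID   Ex     Mem    WB
delay slot                    IF     ID     Ex     Mem    WB

Oops:

• forwarding can’t help here: the branch needs r1 in ID during cycle 3, but dadd only produces it at the end of Ex, in that same cycle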

Another Insurmountable Stall

Table 12. If branch taken is resolved in ID:

                1      2      3      4      5      6      7
dadd r1,r2,r3   IF     ID     Ex**   Mem    WB
bnez r1,target         IF     stall  **ID   Ex     Mem    WB
delay slot                           IF     ID     Ex     Mem

Such a RAW dependency:

• results in a stall of one cycle

(Try to find another instruction which can be inserted in between.)
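The two stall counts quoted for branches resolved in ID follow from when the producing instruction's result first exists. The sketch below is a hedged model, not a description of any particular simulator: it assumes results are forwarded to the branch comparison in ID as early as possible, that an arithmetic result exists at the end of Ex and a load result at the end of Mem, and it uses invented names throughout.

```python
# Cycle, counted from the producer's IF as cycle 0, at whose end the result exists.
RESULT_READY = {
    "alu": 2,    # IF=0, ID=1, Ex=2
    "load": 3,   # IF=0, ID=1, Ex=2, Mem=3
}


def branch_raw_stalls(producer, distance=1):
    """Stall cycles for a branch resolved in ID that reads the producer's result.

    distance is how many instructions after the producer the branch sits
    (1 means immediately after).
    """
    branch_id_cycle = distance + 1           # the branch's ID cycle with no stall
    earliest_id = RESULT_READY[producer] + 1 # first cycle in which the value is usable
    return max(0, earliest_id - branch_id_cycle)


print(branch_raw_stalls("alu"), branch_raw_stalls("load"))
print(branch_raw_stalls("alu", distance=2), branch_raw_stalls("load", distance=2))
```

For distance 1 this reproduces the one-cycle and two-cycle stalls above. Under the same assumptions, moving one independent instruction in between removes the arithmetic stall and halves the load stall.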

Jumps

Jumps:

• are handled the same way: we learn the target address in ID, and the instruction in the delay slot is always executed

Jumps

Table 13. Jumps are always taken:

             1      2      3      4      5      6      7
jump         IF     ID**   Ex     Mem    WB
delay slot          IF     ID     Ex     Mem    WB
target                     **IF   ID     Ex     Mem    WB
target+4                          IF     ID     Ex     Mem

Example

Note to self:

• take a look at ../winmips64/reverse-with-nops.s

Done