Anshul Kumar, CSE IITD. CSL718 : Superscalar Processors. Speculative Execution, 2nd Feb, 2006

CSL718 : Superscalar Processors

Page 1: CSL718 : Superscalar Processors

Anshul Kumar, CSE IITD

CSL718 : Superscalar Processors

Speculative Execution

2nd Feb, 2006

Page 2: CSL718 : Superscalar Processors

Handling Control Dependence

• Simple pipeline
  – Branch prediction reduces stalls due to control dependence

• Wide issue processor
  – Mere branch prediction is not sufficient
  – Instructions in the predicted path need to be fetched and EXECUTED (speculative execution)

Page 3: CSL718 : Superscalar Processors

What is required for speculation?

• Branch prediction to choose which instructions to execute

• Execution of instructions before control dependences are resolved

• Ability to undo the effects of incorrectly speculated sequence

• Preserving of correct behaviour under exceptions

Page 4: CSL718 : Superscalar Processors

Types of speculation

• Hardware based speculation
  – done with dynamic branch prediction and dynamic scheduling
  – used in Superscalar processors

• Compiler based speculation
  – done with static branch prediction and static scheduling
  – used in VLIW processors

Page 5: CSL718 : Superscalar Processors

Extending Tomasulo’s scheme for speculative execution

• Introduce re-order buffer (ROB)

• Add another stage – “commit”

Normal execution
• Issue
• Execute
• Write result

Speculative execution
• Issue
• Execute
• Write result
• Commit

Page 6: CSL718 : Superscalar Processors

Extending Tomasulo’s scheme for speculative execution – contd.

• Write results into ROB in the “write result” stage

• Write results into register file or memory in the “commit” stage

• Dependent instructions can read operands from ROB

• A speculative instruction commits only if the prediction is determined to be correct

• Instructions may complete execution out-of-order, but they commit in-order
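The last point, out-of-order completion with in-order commit, can be illustrated with a toy model. This is only a sketch: the `ReorderBuffer` class and the instruction names are hypothetical and not part of the slides, and results are reduced to a ready flag.

```python
from collections import deque

class ReorderBuffer:
    """Toy ROB: entries are allocated in program order and retired from the head."""

    def __init__(self):
        self.entries = deque()   # oldest instruction sits at the left
        self.committed = []      # instruction names, in commit order

    def issue(self, name):
        entry = {"name": name, "rdy": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry):
        entry["rdy"] = True      # result is now held in the ROB, not in the RF

    def commit(self):
        # Retire from the head only, and only while the head is ready.
        while self.entries and self.entries[0]["rdy"]:
            self.committed.append(self.entries.popleft()["name"])

rob = ReorderBuffer()
i1, i2, i3 = (rob.issue(n) for n in ("mul", "add", "sub"))
rob.write_result(i3); rob.commit()   # sub finished first, but cannot commit yet
rob.write_result(i1); rob.commit()   # mul commits; add still blocks sub
rob.write_result(i2); rob.commit()   # add and sub commit, in that order
print(rob.committed)                 # ['mul', 'add', 'sub']
```

Even though `sub` completes first, it leaves the buffer last, which is what allows a mispredicted path to be discarded before it changes the register file.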

Page 7: CSL718 : Superscalar Processors

Recall Tomasulo’s scheme ...

Page 8: CSL718 : Superscalar Processors

Issue

• Get next instruction from instruction queue

• Check if there is a matching RS which is empty
  – no: structural hazard, instruction stalls
  – yes: issue the instruction to that RS

• For each operand, check if it is available in RF
  – yes: put the operand in the RS
  – no: keep track of FU that will produce it

Page 9: CSL718 : Superscalar Processors

Execute

• If one or more operands not available, wait and monitor CDB

• When an operand becomes available, it is placed in RS

• When all operands are available, start execution

• Choice may need to be made if multiple instructions become ready at the same time

Page 10: CSL718 : Superscalar Processors

Write result

• When result is available
  – write it on CDB and
  – from there into RF and relevant RSs

• Mark RS as available

Page 11: CSL718 : Superscalar Processors

More formal description ...

Page 12: CSL718 : Superscalar Processors

RS and RF fields

RS: op busy Qj Vj Qk Vk

RF: val Qi

Page 13: CSL718 : Superscalar Processors

Issue

• Get instruction <op, rd, rs, rt> from instruction queue

• Wait until ∃r : RS[r].busy = no

• if (RF[rs].Qi ≠ 0) {RS[r].Qj ← RF[rs].Qi}
  else {RS[r].Vj ← RF[rs].val; RS[r].Qj ← 0}

• similarly for rt

• RS[r].op ← op; RS[r].busy ← yes; RF[rd].Qi ← r
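The issue rules above can be sketched in Python. The dictionary layout, the station numbering from 1 (so that a tag of 0 means "no producer") and the sample register values are assumptions made for illustration; only the field names follow the slides.

```python
def issue(op, rd, rs, rt, RS, RF):
    """Issue <op, rd, rs, rt>. Return the station number used, or None on a stall."""
    # Find a free reservation station; none free means a structural hazard.
    r = next((i for i, e in RS.items() if not e["busy"]), None)
    if r is None:
        return None
    for src, q, v in ((rs, "Qj", "Vj"), (rt, "Qk", "Vk")):
        if RF[src]["Qi"] != 0:            # value is still being computed
            RS[r][q] = RF[src]["Qi"]      # remember the producing station
        else:                             # value is in the register file
            RS[r][v] = RF[src]["val"]
            RS[r][q] = 0
    RS[r]["op"] = op
    RS[r]["busy"] = True
    RF[rd]["Qi"] = r                      # rd will be produced by station r
    return r

# Hypothetical machine: two stations and four registers.
RS = {i: {"op": None, "busy": False, "Qj": 0, "Vj": None, "Qk": 0, "Vk": None}
      for i in (1, 2)}
RF = {name: {"val": val, "Qi": 0}
      for name, val in (("r1", 10), ("r2", 20), ("r3", 0), ("r4", 0))}

first = issue("add", "r3", "r1", "r2", RS, RF)   # both operands read from RF
second = issue("mul", "r4", "r3", "r1", RS, RF)  # r3 pending: Qj records station 1
third = issue("sub", "r2", "r1", "r1", RS, RF)   # no free station: stall
print(first, second, third, RS[2]["Qj"], RS[2]["Vk"])   # 1 2 None 1 10
```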

Page 14: CSL718 : Superscalar Processors

Execute

• Wait until RS[r].Qj = 0 and RS[r].Qk = 0

• Compute result: operation is RS[r].op, operands are RS[r].Vj and RS[r].Vk
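As a rough sketch of the execute condition, the helper below returns a result only when both tags are zero. The `OPS` table, the function name `try_execute` and the sample entries are hypothetical; a real functional unit would also take one or more cycles.

```python
import operator

# Assumed mapping from opcode names to operations, for illustration only.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def try_execute(entry):
    """Return the result if both operands are present, else None (keep waiting)."""
    if entry["Qj"] != 0 or entry["Qk"] != 0:
        return None
    return OPS[entry["op"]](entry["Vj"], entry["Vk"])

waiting = {"op": "mul", "busy": True, "Qj": 1, "Vj": None, "Qk": 0, "Vk": 10}
ready = {"op": "add", "busy": True, "Qj": 0, "Vj": 10, "Qk": 0, "Vk": 20}
print(try_execute(waiting), try_execute(ready))   # None 30
```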

Page 15: CSL718 : Superscalar Processors

Write result

• Wait until execution complete at r and CDB available

• ∀x if (RF[x].Qi = r) {RF[x].val ← result; RF[x].Qi ← 0}

• ∀x if (RS[x].Qj = r) {RS[x].Vj ← result; RS[x].Qj ← 0}

• similarly for Qk / Vk

• RS[r].busy ← no
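The broadcast on the CDB amounts to a tag match against every register and every station. The following is a minimal sketch under the same assumed dictionary layout (stations numbered from 1, tag 0 meaning "value present"); the sample state is invented.

```python
def write_result(r, result, RS, RF):
    """Broadcast `result` from station r on the CDB, then free the station."""
    for reg in RF.values():
        if reg["Qi"] == r:                # register was waiting on station r
            reg["val"] = result
            reg["Qi"] = 0
    for entry in RS.values():
        if entry["Qj"] == r:              # first operand was waiting on r
            entry["Vj"] = result
            entry["Qj"] = 0
        if entry["Qk"] == r:              # second operand was waiting on r
            entry["Vk"] = result
            entry["Qk"] = 0
    RS[r]["busy"] = False

# Hypothetical state: station 1 computes r3; station 2 needs it for both operands.
RS = {
    1: {"op": "add", "busy": True, "Qj": 0, "Vj": 10, "Qk": 0, "Vk": 20},
    2: {"op": "mul", "busy": True, "Qj": 1, "Vj": None, "Qk": 1, "Vk": None},
}
RF = {"r3": {"val": 0, "Qi": 1}, "r4": {"val": 0, "Qi": 2}}

write_result(1, 30, RS, RF)
print(RF["r3"], RS[2]["Vj"], RS[2]["Vk"], RS[1]["busy"])
```

After the call, station 2 has both operands and can start executing, while `r4` still waits on station 2.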

Page 16: CSL718 : Superscalar Processors

Tomasulo’s scheme plus ROB ...

Page 17: CSL718 : Superscalar Processors

Issue

• Get next instruction from instruction queue

• Check if there is a matching RS which is empty and an empty slot in ROB
  – no: structural hazard, instruction stalls
  – yes: issue the instruction to that RS and mark the ROB slot, also put ROB slot number in RS

• For each operand, check if it is available in RF or ROB
  – yes: put the operand in the RS
  – no: keep track of FU that will produce it

Page 18: CSL718 : Superscalar Processors

Execute (no change)

• If one or more operands not available, wait and monitor CDB

• When an operand becomes available, it is placed in RS

• When all operands are available, start execution

• Choice may need to be made if multiple instructions become ready at the same time

Page 19: CSL718 : Superscalar Processors

Write result

• When result is available
  – write it on CDB with ROB tag and
  – from there into ROB and relevant RSs

• Mark RS as available

Page 20: CSL718 : Superscalar Processors

Commit (non-branch instruction)

• Wait until instruction reaches head of ROB

• Update RF

• Remove instruction from ROB

Page 21: CSL718 : Superscalar Processors

Commit (branch instruction)

• Wait until instruction reaches head of ROB

• If branch is mispredicted,
  – flush ROB
  – Restart execution at correct successor of the branch instruction

• else
  – Remove instruction from ROB

Page 22: CSL718 : Superscalar Processors

More formal description ...

Page 23: CSL718 : Superscalar Processors

RS fields

op busy Qi Qj Vj Qk Vk

Page 24: CSL718 : Superscalar Processors

RF fields

val Qi busy

Page 25: CSL718 : Superscalar Processors

ROB fields

inst busy rdy val dst

Page 26: CSL718 : Superscalar Processors

Issue

• Get instruction <op, rd, rs, rt> from instruction queue

• Wait until ∃r : RS[r].busy = no and ROB[b].busy = no, where b = ROB tail

• if (RF[rs].busy) {h ← RF[rs].Qi;
    if (ROB[h].rdy) {RS[r].Vj ← ROB[h].val; RS[r].Qj ← 0}
    else {RS[r].Qj ← h}
  } else {RS[r].Vj ← RF[rs].val; RS[r].Qj ← 0}

• similarly for rt

• RS[r].op ← op; RS[r].busy ← yes; RS[r].Qi ← b

• RF[rd].Qi ← b; RF[rd].busy ← yes; ROB[b].busy ← yes

• ROB[b].inst ← op; ROB[b].dst ← rd; ROB[b].rdy ← no
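A minimal Python sketch of issue with a ROB follows. It assumes dictionaries keyed from 1 for stations and ROB entries, and it takes the ROB tail as an argument that the caller advances; both choices, like the sample values, are illustrative rather than taken from the slides.

```python
def issue(op, rd, rs, rt, RS, RF, ROB, tail):
    """Issue into a free station and ROB[tail]; return (r, b) or None on a stall."""
    r = next((i for i, e in RS.items() if not e["busy"]), None)
    b = tail
    if r is None or ROB[b]["busy"]:
        return None                        # structural hazard
    for src, q, v in ((rs, "Qj", "Vj"), (rt, "Qk", "Vk")):
        if RF[src]["busy"]:                # a newer value is in flight
            h = RF[src]["Qi"]
            if ROB[h]["rdy"]:              # already computed: read it from the ROB
                RS[r][v] = ROB[h]["val"]
                RS[r][q] = 0
            else:                          # wait for ROB entry h on the CDB
                RS[r][q] = h
        else:                              # committed value is in the RF
            RS[r][v] = RF[src]["val"]
            RS[r][q] = 0
    RS[r].update(op=op, busy=True, Qi=b)
    RF[rd].update(Qi=b, busy=True)
    ROB[b].update(busy=True, inst=op, dst=rd, rdy=False)
    return r, b

RS = {i: {"op": None, "busy": False, "Qi": 0, "Qj": 0, "Vj": None, "Qk": 0, "Vk": None}
      for i in (1, 2, 3)}
RF = {n: {"val": v, "Qi": 0, "busy": False}
      for n, v in (("r1", 10), ("r2", 20), ("r3", 0), ("r4", 0), ("r5", 0))}
ROB = {i: {"inst": None, "busy": False, "rdy": False, "val": None, "dst": None}
       for i in (1, 2, 3)}

issue("add", "r3", "r1", "r2", RS, RF, ROB, tail=1)
issue("mul", "r4", "r3", "r1", RS, RF, ROB, tail=2)   # r3 not ready: Qj = ROB tag 1
ROB[1].update(rdy=True, val=30)                       # pretend add wrote its result
issue("sub", "r5", "r3", "r1", RS, RF, ROB, tail=3)   # r3 now read from ROB[1]
print(RS[2]["Qj"], RS[3]["Vj"], RF["r3"])
```

The operand tags here name ROB entries rather than reservation stations, which is the main change from the scheme without a ROB.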

Page 27: CSL718 : Superscalar Processors

Execute (no change)

• Wait until RS[r].Qj = 0 and RS[r].Qk = 0

• Compute result: operation is RS[r].op, operands are RS[r].Vj and RS[r].Vk

Page 28: CSL718 : Superscalar Processors

Write result

• Wait until execution complete at r and CDB available

• b ← RS[r].Qi; RS[r].busy ← no

• ∀x if (RS[x].Qj = b) {RS[x].Vj ← result; RS[x].Qj ← 0}

• similarly for Qk / Vk

• ROB[b].rdy ← yes; ROB[b].val ← result
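The sketch below illustrates write result with a ROB. It assumes, in line with the earlier statement that the register file is written only in the commit stage, that the broadcast updates waiting stations and the ROB entry but not the RF. The data layout and sample state are hypothetical.

```python
def write_result(r, result, RS, ROB):
    """Broadcast `result` tagged with its ROB entry; the RF is left untouched."""
    b = RS[r]["Qi"]                       # ROB entry allocated at issue
    RS[r]["busy"] = False
    for entry in RS.values():
        if entry["Qj"] == b:              # first operand was waiting on entry b
            entry["Vj"] = result
            entry["Qj"] = 0
        if entry["Qk"] == b:              # second operand was waiting on entry b
            entry["Vk"] = result
            entry["Qk"] = 0
    ROB[b]["rdy"] = True
    ROB[b]["val"] = result

# Hypothetical state: station 1 owns ROB entry 1 (dst r3); station 2 waits on it.
RS = {
    1: {"op": "add", "busy": True, "Qi": 1, "Qj": 0, "Vj": 10, "Qk": 0, "Vk": 20},
    2: {"op": "mul", "busy": True, "Qi": 2, "Qj": 1, "Vj": None, "Qk": 0, "Vk": 10},
}
ROB = {
    1: {"inst": "add", "busy": True, "rdy": False, "val": None, "dst": "r3"},
    2: {"inst": "mul", "busy": True, "rdy": False, "val": None, "dst": "r4"},
}
RF = {"r3": {"val": 0, "Qi": 1, "busy": True}}

write_result(1, 30, RS, ROB)
print(RS[2]["Vj"], ROB[1]["rdy"], ROB[1]["val"], RF["r3"]["val"])   # 30 True 30 0
```

The architectural value of `r3` is still 0 after the call; it changes only when ROB entry 1 commits.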

Page 29: CSL718 : Superscalar Processors

Commit (non-branch instruction)

• Wait until instruction reaches head of ROB (entry h) and ROB[h].rdy = yes

• d ← ROB[h].dst

• RF[d].val ← ROB[h].val

• ROB[h].busy ← no

• if (RF[d].Qi = h) {RF[d].busy ← no}
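The final check, `RF[d].Qi = h`, matters when two in-flight instructions write the same register. The sketch below shows that case under an assumed dictionary layout; the function name `commit` and the values are illustrative.

```python
def commit(h, ROB, RF):
    """Try to retire ROB entry h (the head); return True if it committed."""
    entry = ROB[h]
    if not (entry["busy"] and entry["rdy"]):
        return False                      # head not finished: nothing commits
    d = entry["dst"]
    RF[d]["val"] = entry["val"]           # architectural state is updated here
    entry["busy"] = False
    if RF[d]["Qi"] == h:                  # no younger writer of d is in flight
        RF[d]["busy"] = False
    return True

# Hypothetical state: ROB entries 1 and 2 both write r1; entry 2 is the younger.
ROB = {
    1: {"inst": "add", "busy": True, "rdy": True, "val": 5, "dst": "r1"},
    2: {"inst": "mul", "busy": True, "rdy": False, "val": None, "dst": "r1"},
}
RF = {"r1": {"val": 0, "Qi": 2, "busy": True}}

commit(1, ROB, RF)
after_first = dict(RF["r1"])              # val 5, but still busy: entry 2 pending
blocked = commit(2, ROB, RF)              # entry 2 is not ready yet
ROB[2].update(rdy=True, val=7)
commit(2, ROB, RF)
print(after_first, blocked, RF["r1"])
```

After the first commit, `r1` holds 5 but stays marked busy, so a newly issued reader would still pick up the value from ROB entry 2 rather than the stale register.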

Page 30: CSL718 : Superscalar Processors

Commit (branch instruction)

• Wait until instruction reaches head of ROB (entry h) and ROB[h].rdy = yes

• If branch is mispredicted,
  – clear ROB, RF[ ].Qi
  – fetch branch dest

• else
  – ROB[h].busy ← no
  – if (RF[d].Qi = h) {RF[d].busy ← no}
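A rough sketch of branch commit is given below. It makes several assumptions that the slides do not state: the branch has no destination register, the misprediction flag and the correct target are supplied by the caller, and the function returns the address to refetch from (or `None` when no redirect is needed). Clearing the reservation stations is left out.

```python
def commit_branch(h, ROB, RF, mispredicted, correct_pc):
    """Commit the branch at ROB head h. Return the PC to refetch from, or None."""
    if not ROB[h]["rdy"]:
        return None                      # outcome not known yet: keep waiting
    if mispredicted:
        for entry in ROB.values():       # discard every in-flight instruction
            entry.update(busy=False, rdy=False)
        for reg in RF.values():          # no register waits on a squashed result
            reg.update(Qi=0, busy=False)
        return correct_pc
    ROB[h]["busy"] = False               # prediction was right: just retire
    return None

def fresh_state():
    """Hypothetical state: a resolved branch followed by two speculative writes."""
    ROB = {
        1: {"inst": "beq", "busy": True, "rdy": True, "val": None, "dst": None},
        2: {"inst": "add", "busy": True, "rdy": True, "val": 99, "dst": "r1"},
        3: {"inst": "mul", "busy": True, "rdy": False, "val": None, "dst": "r2"},
    }
    RF = {"r1": {"val": 10, "Qi": 2, "busy": True},
          "r2": {"val": 20, "Qi": 3, "busy": True}}
    return ROB, RF

ROB, RF = fresh_state()
redirect = commit_branch(1, ROB, RF, mispredicted=True, correct_pc=0x40)
print(redirect, RF["r1"], any(e["busy"] for e in ROB.values()))

ROB2, RF2 = fresh_state()
commit_branch(1, ROB2, RF2, mispredicted=False, correct_pc=0x40)
print(ROB2[1]["busy"], ROB2[2]["busy"])   # False True
```

In the mispredicted case the speculative value 99 never reaches `r1`, because register file updates happen only at commit.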