COMPSYS 304
Computer Architecture: Speculation & Branching
Morning visitors - Paradise Bay, Bay of Islands

Speculation
• High Tech Gambling?
• Data Prefetch
  • Cache instruction dcbt: data cache block touch
  • Attempts to bring data into the cache so that it will be “close” when needed
  • Allows the SIU to use idle bus bandwidth; if there's no spare bandwidth, this read can be given low priority
  • Speculative because
    • a branch may occur before the data is used
    • we speculate that this data may be needed

dcbt is the PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …
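Why a touch issued early helps can be sketched with a toy timing model. Everything numeric here is an assumption for illustration (a 20-cycle miss, 5 cycles of work per cache block, touching 4 blocks ahead), the function name `total_cycles` is hypothetical, and the model assumes the bus always has spare bandwidth, so no touch is ever deferred.

```python
MEM_LATENCY = 20  # assumed cycles to fill a cache miss from memory
WORK = 5          # assumed cycles of computation per cache block


def total_cycles(n_blocks, distance):
    """Cycles to process n_blocks in order, touching the block `distance`
    ahead on each iteration (distance 0 = no touch instructions)."""
    ready = {}  # block number -> cycle at which it is in the cache
    t = 0
    for i in range(n_blocks):
        ahead = i + distance
        # speculative touch (like dcbt): request a future block early
        if distance and ahead < n_blocks and ahead not in ready:
            ready[ahead] = t + MEM_LATENCY
        # demand access: a block never requested before is a miss
        if i not in ready:
            ready[i] = t + MEM_LATENCY
        t = max(t, ready[i])  # stall until the data arrives
        t += WORK
    return t


print(total_cycles(100, 0), total_cycles(100, 4))  # 2500 580
```

With these assumed numbers the first four blocks still miss (nothing touched them ahead of time), but after that each block arrives just as it is needed, because 4 blocks × 5 cycles of work covers the 20-cycle latency.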

Speculation - General
• Some functional units are almost always idle
  • Make them do some (possibly useful) work rather than sit idle
• If the speculation was incorrect, the results are simply abandoned
  • No loss in efficiency; a chance of a gain
• Researchers are actively looking at software prefetch schemes
  • Fetch data well before it's needed
  • Reduce latency when it's actually needed
• Speculative operations have low priority and use idle resources

Branching
• Expensive
  • 2-3 cycles lost in the pipeline
  • All instructions following the branch are ‘flushed’
  • Bandwidth wasted fetching unused instructions
  • Stall while the branch target is fetched
• We can speculate about the target of a branch
• Terminology
  • Branch Target: address to which the branch jumps
  • Branch Taken: control transfers to a non-sequential address (the target)
  • Branch Not Taken: the next sequential instruction is executed
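A quick way to see what 2-3 lost cycles per branch costs is an effective-CPI estimate. The base CPI of 1, the 20% branch frequency, the 3-cycle penalty and the 10% figure are assumptions chosen for illustration, not measurements from these slides, and `effective_cpi` is a hypothetical helper.

```python
def effective_cpi(base_cpi, branch_fraction, penalty_cycles, penalized_fraction=1.0):
    """Average cycles per instruction when a fraction of branches
    (penalized_fraction) each lose penalty_cycles to a pipeline flush."""
    return base_cpi + branch_fraction * penalized_fraction * penalty_cycles


# Assumed workload: 1 instruction in 5 is a branch, ideal pipeline CPI is 1.
every_branch_flushes = effective_cpi(1.0, 0.2, 3)     # no speculation at all
ten_percent_flush = effective_cpi(1.0, 0.2, 3, 0.1)   # 90% of branches guessed right
print(f"{every_branch_flushes:.2f} {ten_percent_flush:.2f}")  # 1.60 1.06
```

Under these assumptions, paying the penalty on every branch makes the machine 60% slower than its ideal pipeline, which is the motivation for speculating about branch targets.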

Branching - Prediction
• Branches can be
  • unconditional: branch is always taken, e.g. call subroutine, return from subroutine
  • conditional: branch depends on the state of the computation, e.g. has the loop terminated yet?
• Unconditional branches are simple
  • New instructions are fetched as soon as the branch is recognized
  • As early in the pipeline as possible
  • Branch units are often placed with the fetch & decode stages

Branching - Branch Unit
[Figure: PowerPC 603 logical layout]

Branching - Speculation
• We have the following code:
    if ( cond ) s1; else s2;
• Superscalar machine
  • Multiple functional units
  • Start executing both branches (s1 and s2)
  • Keep idle functional units busy!
• One is speculative and will be abandoned
  • The processor will eventually calculate the branch condition and select which result should be retained (written back)
• MIPS R10000: up to 4 branches speculated at once

Branching - Speculation
• MIPS R10000
  • Up to 4 branches speculated at once
  • Instructions are “tagged” with a 4-bit mask
    • Indicates which branch instruction each one belongs to
  • As soon as the condition is determined, mis-predicted instructions are aborted
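The tagging idea can be sketched as a simplified model, assuming one mask bit per unresolved branch and that each instruction is tagged with the bits of every older unresolved branch. `BranchMaskTracker` and its methods are hypothetical names, and this is an illustration of the mechanism, not the R10000's actual circuitry.

```python
MAX_BRANCHES = 4  # R10000 limit on unresolved (speculated) branches


class BranchMaskTracker:
    """Toy model of branch-mask tagging; not the real R10000 hardware."""

    def __init__(self):
        self.free_bits = list(range(MAX_BRANCHES))
        self.active = 0        # OR of mask bits of all unresolved branches
        self.parent = {}       # branch bit -> mask of older unresolved branches
        self.in_flight = []    # (instruction name, mask) pairs

    def predict_branch(self):
        """Allocate a mask bit for a newly predicted branch."""
        if not self.free_bits:
            raise RuntimeError("stall: 4 branches already unresolved")
        bit = self.free_bits.pop(0)
        self.parent[bit] = self.active
        self.active |= 1 << bit
        return bit

    def issue(self, name):
        """Tag an instruction with every branch it is speculative on."""
        self.in_flight.append((name, self.active))

    def _release(self, bit):
        """Free a branch's bit and clear it from every remaining tag."""
        flag = 1 << bit
        self.active &= ~flag
        del self.parent[bit]
        self.free_bits.append(bit)
        self.in_flight = [(n, m & ~flag) for n, m in self.in_flight]
        for other in self.parent:
            self.parent[other] &= ~flag

    def resolve(self, bit, mispredicted):
        """Branch condition known: abort its dependents if mispredicted."""
        flag = 1 << bit
        if mispredicted:
            # abort instructions fetched down the wrong path...
            self.in_flight = [(n, m) for n, m in self.in_flight if not m & flag]
            # ...including younger branches, whose bits are freed too
            younger = [c for c, p in self.parent.items() if p & flag]
            for c in younger:
                self._release(c)
        self._release(bit)


t = BranchMaskTracker()
t.issue("add")                     # not speculative: mask 0b0000
b0 = t.predict_branch()            # gets bit 0
t.issue("s1_a")                    # mask 0b0001
b1 = t.predict_branch()            # gets bit 1
t.issue("s1_b")                    # mask 0b0011
t.resolve(b1, mispredicted=False)  # s1_b's tag drops to 0b0001
t.resolve(b0, mispredicted=True)   # s1_a and s1_b are aborted
print(t.in_flight)                 # [('add', 0)]
```

Clearing a bit on a correct prediction is cheap, and a misprediction aborts exactly the instructions whose tag contains that bit, which is what lets the processor recover as soon as the condition is known.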

Branching - Prediction
• We have a sequence of instructions:

    L1: add        ; some mixture of arithmetic,
        lw         ; load, store, etc., instructions
        sub
        brne L1    ; branch on some condition
    L2: or         ; some more arithmetic, load,
        st         ; store, etc., instructions

• If you were asked to guess which branch should be preferred, which would you choose?
  • the next sequential instruction (L2)
  • the branch target (L1)

Branching - Prediction
• Studies show that backward branches are taken most of the time!
• Because of loops:

    L1: add        ; any mix of arith,
        lw         ; load, store, etc.,
        sub        ; instructions
        brne L1    ; branch back to loop start
    L2: or         ; some more arith,
        st         ; memory, etc. instructions

Branching - Prediction Rule
• A simple prediction rule:
  • Take backward branches
  • Works amazingly well!
  • For a loop with n iterations, this is wrong in only 1 of n cases (the final, loop-exit branch)!
• A system working on this rule alone would
  • detect the backward branch and
  • start fetching from the branch target rather than the next instruction
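The 1/n figure can be checked with a small simulation of the "take backward branches" rule on the loop above. The addresses are hypothetical (4-byte instructions, L1 at 0x1000, so brne L1 sits at 0x100C), and the loop is assumed to test its condition at the bottom, as in the listing.

```python
# Hypothetical addresses for the loop on the previous slides (4-byte instructions).
LOOP_START = 0x1000   # L1: add
BRANCH_PC = 0x100C    # brne L1


def predict_taken(branch_pc, target_pc):
    """Static rule: predict 'taken' exactly when the branch jumps backward."""
    return target_pc < branch_pc


def loop_branch_outcomes(n):
    """Outcomes of the loop-closing brne over n iterations:
    taken back to L1 n-1 times, then not taken once (fall through to L2)."""
    return [True] * (n - 1) + [False]


def mispredict_rate(n):
    prediction = predict_taken(BRANCH_PC, LOOP_START)
    outcomes = loop_branch_outcomes(n)
    wrong = sum(1 for taken in outcomes if taken != prediction)
    return wrong / len(outcomes)


for n in (2, 10, 100):
    print(n, mispredict_rate(n))   # 1/n: 0.5, 0.1, 0.01
```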

Branching - Improving the prediction
• Static prediction systems
  • Compiler can mark branches as likely to be taken or not
  • Instruction fetch unit uses the marking as advice on which instruction to fetch
• Compiler is often able to give the right advice
  • Loops are easily detected
  • Other patterns in conditions can be recognized
    • Checking for EOF when reading a file
    • Error checking
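How a fetch unit might combine a compiler's marking with the backward-taken rule can be sketched as follows. The `likely_taken` hint field, the fallback to the backward rule for unmarked branches, and the fixed 4-byte instruction size are all assumptions of this sketch; real ISAs encode hints in their own ways (in C, GCC and Clang let the programmer supply such advice via `__builtin_expect`).

```python
from dataclasses import dataclass
from typing import Optional

INSTR_BYTES = 4  # assumed fixed instruction size


@dataclass
class Branch:
    pc: int
    target: int
    likely_taken: Optional[bool] = None  # hypothetical compiler hint; None = unmarked


def next_fetch(branch: Branch) -> int:
    """Address the fetch unit fetches after this branch."""
    if branch.likely_taken is not None:
        taken = branch.likely_taken            # follow the compiler's advice
    else:
        taken = branch.target < branch.pc      # assumed fallback: backward-taken rule
    return branch.target if taken else branch.pc + INSTR_BYTES


loop_close = Branch(pc=0x100C, target=0x1000)   # unmarked loop branch
# forward branch around rarely-run error-handling code, marked likely taken
skip_error_code = Branch(pc=0x2000, target=0x2040, likely_taken=True)
print(hex(next_fetch(loop_close)), hex(next_fetch(skip_error_code)))  # 0x1000 0x2040
```

The error-check case shows where the marking pays off: a forward branch would be predicted not taken by the backward rule alone, but the compiler knows the error path is rare.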

Branching - Improving the prediction
• Dynamic prediction systems
  • Program history determines the most likely branch
  • Branch Target Buffers: another cache!

Branching - Branch Target Buffer
• Instruction address bits [11:3] select the BTB entry
• Tag determines a “hit”
• Stats select taken/not taken
• Pentium 4: >91% prediction accuracy, 4K-entry BHT (Branch History Table)
• G4e: 2K entries
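A toy direct-mapped BTB following the slide's indexing (address bits [11:3], i.e. 9 bits, 512 entries) can be sketched like this. Two details are assumptions, not taken from the slide: the tag is every address bit above bit 11, and the “stats” are modelled as a 2-bit saturating counter (0-1 predict not taken, 2-3 predict taken). The class and method names are hypothetical.

```python
INDEX_BITS = 9                 # address bits [11:3] -> 9 bits -> 512 entries
ENTRIES = 1 << INDEX_BITS


class BranchTargetBuffer:
    """Toy direct-mapped BTB: each entry holds [tag, target, 2-bit counter]."""

    def __init__(self):
        self.table = [None] * ENTRIES

    @staticmethod
    def split(pc):
        index = (pc >> 3) & (ENTRIES - 1)   # bits 11..3 select the entry
        tag = pc >> 12                      # assumed: all higher bits form the tag
        return index, tag

    def predict(self, pc):
        """Predicted target on a hit that says 'taken'; None means fetch sequentially."""
        index, tag = self.split(pc)
        entry = self.table[index]
        if entry is None or entry[0] != tag:
            return None                     # BTB miss
        _, target, counter = entry
        return target if counter >= 2 else None

    def update(self, pc, taken, target):
        """Record the real outcome once the branch resolves."""
        index, tag = self.split(pc)
        entry = self.table[index]
        if entry is None or entry[0] != tag:
            # allocate or replace, starting weakly biased toward this outcome
            self.table[index] = [tag, target, 2 if taken else 1]
            return
        entry[1] = target
        entry[2] = min(entry[2] + 1, 3) if taken else max(entry[2] - 1, 0)


btb = BranchTargetBuffer()
outcomes = ([True] * 9 + [False]) * 3      # a 10-iteration loop entered 3 times
wrong = 0
for taken in outcomes:
    predicted_taken = btb.predict(0x100C) is not None
    wrong += int(predicted_taken != taken)
    btb.update(0x100C, taken, 0x1000)
print(wrong)                               # 4
```

Only the very first iteration (a cold miss) and each loop exit are mispredicted; the counter's hysteresis means one not-taken outcome does not flip the prediction for the next entry into the loop.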

Superscalar - summary
• Superscalar machines have multiple functional units (FUs), e.g. 2 × integer ALU, 1 × FPU, 1 × branch, 1 × load/store
• Requires a complex IFU
  • Able to issue multiple instructions/cycle (typically 4)
  • Able to detect hazards (unavailability of operands)
  • Able to re-order instruction issue
• Aim to keep all the FUs busy
• Typically, 6-way superscalars can achieve instruction-level parallelism of 2-3
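Why a wide machine achieves far less than its issue width can be sketched with a toy in-order issue model. It assumes 1-cycle result latency, checks only read-after-write hazards within a cycle's issue group, and does not model the re-ordering a real IFU performs; `issue_cycles` and the example programs are hypothetical.

```python
def issue_cycles(program, width=4):
    """Cycles to issue `program` in order, up to `width` instructions per cycle.

    Each instruction is (dest, sources). A cycle's issue group stops at the first
    instruction that reads a register written earlier in the same group (a RAW
    hazard: its operand is not available yet). Assumes all results are ready one
    cycle later, and does not model out-of-order issue.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()
        issued = 0
        while i < len(program) and issued < width:
            dest, sources = program[i]
            if written & set(sources):
                break                    # hazard: wait for the next cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


independent = [(f"r{k}", ["r0"]) for k in range(1, 9)]    # 8 unrelated operations
chain = [(f"r{k + 1}", [f"r{k}"]) for k in range(1, 9)]   # each needs the previous result
for name, prog in (("independent", independent), ("chain", chain)):
    c = issue_cycles(prog)
    print(name, c, "cycles, ILP =", len(prog) / c)
```

Fully independent code reaches the issue width, while a dependency chain drops to one instruction per cycle; real programs mix both, which is why the achieved parallelism sits well below the number of issue slots.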