CS 230: Computer Organization and Assembly Language. Aviral Shrivastava, Department of Computer Science and Engineering, School of Computing and Informatics, Arizona State University. Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary Jane Irwin, PSU, Ande Carle, UCB


Page 1: CS 230: Computer Organization and Assembly Language

CS 230: Computer Organization and Assembly Language

Aviral Shrivastava
Department of Computer Science and Engineering
School of Computing and Informatics
Arizona State University

Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary Jane Irwin, PSU, Ande Carle, UCB

Page 2: CS 230: Computer Organization and Assembly Language

Announcements

• Alternate Project
  – Submit Nov 24
• Quiz 5
  – Thursday, Nov 19, 2009
  – Pipelining
• Finals
  – Tuesday, Dec 08, 2009
  – Please come on time (you’ll need all the time)
  – Open book, notes, and internet
  – No communication with any other human

Page 3: CS 230: Computer Organization and Assembly Language

Benefits of Pipelining

• Pipeline latches: pass the status and result of the current instruction to the next stage
• Comparison:

[Figure: timing diagram over clock cycles 1 to 10. Row labeled "Single-cycle inst.": lw (Ifetch, Dec/Reg, Exec, Mem, Wr) followed by sw (Ifetch, Dec/Reg, Exec, Mem). Row labeled "pipelined": three overlapping instructions, each going through Ifetch, Dec/Reg, Exec, Mem, Wr.]

Page 4: CS 230: Computer Organization and Assembly Language

Branch Hazards

• So far, we’ve limited discussion of hazards to:
  – Arithmetic/logic operations
  – Data transfers
• Also need to consider hazards involving branches:
  – Example:
    • 40: beq $1, $3, 28
    • 44: and $12, $2, $5
    • 48: or $13, $6, $2
    • 52: add $14, $2, $2
    • 72: lw $4, 50($7)
• How long will it take before the branch decision takes effect?
  – What happens in the meantime?
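The addresses in the example hint at how the branch target is formed. A minimal sketch in Python, assuming the 28 in the beq is a byte offset added to the address of the instruction after the branch (PC + 4). The function name is hypothetical, and real MIPS encodes the offset in words rather than bytes:

```python
def branch_target(branch_pc: int, byte_offset: int) -> int:
    """Target of a taken branch, assuming a byte offset that is
    relative to the instruction after the branch (PC + 4)."""
    return branch_pc + 4 + byte_offset


# beq at address 40 with offset 28 lands on the lw at address 72.
target = branch_target(40, 28)
print(target)  # 72
```

Under this assumption, the three instructions at 44, 48, and 52 lie on the fall-through path and are the ones at risk of being fetched before the branch decision is known.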

Page 5: CS 230: Computer Organization and Assembly Language

Branch signal determined in MEM stage

[Figure: pipelined datapath with PC, Instruction Memory, Registers, Sign extend, Shift left 2, two adders, ALU (with Zero output), ALU control, Data Memory, muxes, Control, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers carrying the WB, M, and EX control groups. Control signals shown: RegWrite, ALUSrc, Branch, MemWrite, MemRead, MemtoReg, RegDst, ALUOp, PCSrc. Instruction fields shown: Inst[15-0], Inst[20-16], Inst[15-11].]

Page 6: CS 230: Computer Organization and Assembly Language

Pipeline impact on branch

• If branch condition true, must skip 44, 48, 52
  – But, these have already started down the pipeline
  – They will complete unless we do something about it
• How do we deal with this?
  – We’ll consider 2 possibilities

[Figure: pipeline diagram over clock cycles CC 1 to CC 9 for 40 beq $1, $3, 28; 44 and $12, $2, $5; 48 or $13, $6, $2; 52 add $14, $2, $2; and 72 lw $4, 50($7), each passing through IM, Reg, DM, Reg. The PC is changed during the Mem cycle of beq.]

Page 7: CS 230: Computer Organization and Assembly Language

Dealing w/branch hazards: always stall

• Branch taken
  – Wait 3 cycles
  – No proper instructions in the pipeline
  – Same delay as without stalls (no time lost)

[Figure: pipeline diagram over clock cycles CC 1 to CC 12. 40 beq $1, $3, 28 is followed by three stall cycles, drawn as rows of bubbles, before 72 lw $4, 50($7) passes through IM, Reg, DM, Reg.]

Page 8: CS 230: Computer Organization and Assembly Language

Dealing w/branch hazards: always stall

• Branch not taken
  – Still must wait 3 cycles
  – Time lost
  – Could have spent cycles fetching and decoding next instructions

[Figure: pipeline diagram over clock cycles CC 1 to CC 12. 40 beq $1, $3, 28 is followed by three stall cycles, drawn as rows of bubbles, before 44 and $12, $2, $5; 48 or $13, $6, $2; and 52 add $14, $2, $2 pass through IM, Reg, DM, Reg.]

Page 9: CS 230: Computer Organization and Assembly Language

Assume branch not taken

• On average, branches are taken ½ the time
  – If branch not taken…
    • Continue normal processing
  – Else, if branch is taken…
    • Need to flush improper instructions from pipeline
• Cuts overall time for branch processing in ½
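The "cuts in ½" claim can be checked with a line of arithmetic. A small sketch, assuming the 3-cycle penalty from the earlier slides and the ½ taken rate stated above; the function name is illustrative only:

```python
def expected_branch_penalty(taken_fraction: float, flush_cycles: int) -> float:
    """Average cycles lost per branch under predict-not-taken:
    only taken branches pay the flush penalty."""
    return taken_fraction * flush_cycles


always_stall = 3  # cycles lost on every branch when always stalling
predict_not_taken = expected_branch_penalty(0.5, 3)
print(always_stall, predict_not_taken)  # 3 1.5
```

With half the branches taken, the average cost per branch drops from 3 cycles to 1.5 cycles.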

Page 10: CS 230: Computer Organization and Assembly Language

Flushing unwanted instructions from pipeline

• Useful to compare w/stalling pipeline:
  – Simple stall: inject bubble into pipe at ID stage only
    • Change control to 0 in the ID stage
    • Let “bubbles” percolate to the right
  – Flushing pipe: must change inst. in IF, ID, and EX
    • IF Stage:
      – Zero instruction field of IF/ID pipeline register
      – Use new control signal IF.Flush
    • ID Stage:
      – Use existing “bubble injection” mux that zeros control for stalls
      – Signal ID.Flush is ORed w/stall signal from hazard detection unit
    • EX Stage:
      – Add new muxes to zero EX pipeline register control lines
      – Both muxes controlled by single EX.Flush signal
• Control determines when to flush:
  – Depends on Opcode and value of branch condition
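The three flush signals can be modeled in software to see what each one zeros. This is only a behavioral sketch in Python, not the hardware: the dictionary keys and the function name are hypothetical, and a single `branch_taken` flag stands in for the opcode and branch-condition logic.

```python
NOP = 0  # an all-zero instruction or control word behaves as a bubble


def apply_flush(pipe: dict, branch_taken: bool, stall: bool = False) -> dict:
    """Return a copy of the pipeline state after the flush signals act."""
    out = dict(pipe)
    if branch_taken:
        out["if_id_instr"] = NOP  # IF.Flush: zero the IF/ID instruction field
        out["ex_control"] = NOP   # EX.Flush: zero the EX control lines
    if branch_taken or stall:
        out["id_control"] = NOP   # ID.Flush ORed with the hazard-unit stall
    return out


# Arbitrary non-zero placeholder values for the three fields.
before = {"if_id_instr": 0x1234, "id_control": 0b1010, "ex_control": 0b0110}
print(apply_flush(before, branch_taken=True))
print(apply_flush(before, branch_taken=False, stall=True))
```

A plain stall touches only the ID control word, while a taken branch clears all three fields, which matches the contrast the slide draws between stalling and flushing.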

Page 11: CS 230: Computer Organization and Assembly Language

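Flushing Pipeline

[Figure: pipelined datapath showing PC, Control, the Hazard Detection Unit, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers carrying the WB, M, and EX control groups. Three muxes with a 0 input zero the control lines. Signals shown: IF.Flush, ID.Flush, EX.Flush, Branch Decision, Flush Pipeline.]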

Page 12: CS 230: Computer Organization and Assembly Language

Assume “branch not taken”…and branch is not taken…

• Execution proceeds normally – no penalty

[Figure: pipeline diagram over clock cycles CC 1 to CC 9. 40 beq $1, $3, 28; 44 and $12, $2, $5; 48 or $13, $6, $2; and 52 add $14, $2, $2 each pass through IM, Reg, DM, Reg with no bubbles.]

Page 13: CS 230: Computer Organization and Assembly Language

Assume “branch not taken”…and branch is taken…

• Bubbles injected into 3 stages during cycle 5

[Figure: pipeline diagram over clock cycles CC 1 to CC 9. 40 beq $1, $3, 28 completes normally; 44 and $12, $2, $5; 48 or $13, $6, $2; and 52 add $14, $2, $2 are turned into bubbles after their early stages; 72 lw $4, 50($7) then passes through IM, Reg, DM, Reg.]

Page 14: CS 230: Computer Organization and Assembly Language

Reservation Table Picture

• Another way of looking at it…

40: beq $1, $3, 72
44: and $12, $2, $5
48: or $13, $6, $2
52: add $14, $2, $2
72: lw $4, 50($7)

Assume Branch Not Taken and Correct (no penalty); "56" is the instruction at address 56

Cycle  1    2    3    4    5    6    7    8    9
IF     Beq  And  Or   Add  56
ID          Beq  And  Or   Add  56
EX               Beq  And  Or   Add  56
Mem                   Beq  And  Or   Add  56
WB                         Beq  And  Or   Add  56

Assume Branch Not Taken and NOT Correct (3 cycle penalty); "Lw" is the branch target at address 72

Cycle  1    2    3    4    5    6    7    8    9
IF     Beq  And  Or   Add  Lw
ID          Beq  And  Or   Add  Lw
EX               Beq  And  Or   Add  Lw
Mem                   Beq  ---  ---  ---  Lw
WB                         Beq  ---  ---  ---  Lw

(FYI: branch frequency ~20%, and the 3 cycle penalty applies 50% of the time)
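A reservation table like the ones above can be generated mechanically. The sketch below is an illustrative Python helper (its name and the `squashed` parameter are assumptions, not part of the slides). It issues one instruction per cycle and, following the slide's drawing, shows flushed instructions as `---` from the Mem stage onward.

```python
STAGES = ["IF", "ID", "EX", "Mem", "WB"]


def reservation_table(instrs, squashed=()):
    """Return {stage: [label per cycle]} for one instruction issued per cycle.
    Instructions named in `squashed` appear as '---' in Mem and WB."""
    n_cycles = len(instrs) + len(STAGES) - 1
    table = {stage: [""] * n_cycles for stage in STAGES}
    for i, name in enumerate(instrs):
        for s, stage in enumerate(STAGES):
            flushed = name in squashed and stage in ("Mem", "WB")
            table[stage][i + s] = "---" if flushed else name
    return table


# Mispredicted case: the target at address 72 (lw) is fetched in cycle 5.
wrong = reservation_table(["Beq", "And", "Or", "Add", "Lw"],
                          squashed={"And", "Or", "Add"})
for stage in STAGES:
    print(f"{stage:4}", " ".join(f"{c:4}" for c in wrong[stage]))
```

The three `---` entries in the Mem row are the 3 cycle penalty: no useful instruction reaches memory between the branch and its target.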

Page 15: CS 230: Computer Organization and Assembly Language

Branch Penalty Impact

• Assume 16% of all instructions are branches
  – 4% unconditional branches: 3 cycle penalty
  – 12% conditional: 50% taken
• For a sequence of N instructions (assume N is large)
  • N cycles to initiate each
  • 3 * 0.04 * N delays due to unconditional branches
  • 0.5 * 3 * 0.12 * N delays due to conditional taken
  • Also, an extra 4 cycles for pipeline to empty
• Total:
  – 1.3*N + 4 total cycles (or 1.3 cycles/instruction) (CPI)
• 30% Performance Hit!!! (Bad thing)
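The total can be reproduced directly from the stated assumptions. A short Python sketch with illustrative function and parameter names; the defaults are the numbers from this slide:

```python
def total_cycles(n, uncond=0.04, cond=0.12, taken=0.5, penalty=3, drain=4):
    """Cycles for n instructions: one issue cycle each, plus branch
    delays, plus the cycles needed for the pipeline to empty."""
    unconditional_delay = penalty * uncond * n
    conditional_delay = taken * penalty * cond * n
    return n + unconditional_delay + conditional_delay + drain


n = 1_000_000
cpi = total_cycles(n) / n
print(f"{cpi:.2f}")  # 1.30
```

For large N the 4 drain cycles vanish in the average, leaving 1 + 0.12 + 0.18 = 1.3 cycles per instruction.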

Page 16: CS 230: Computer Organization and Assembly Language

Branch Penalty Impact

• Some solutions:
  – In ISA: branches always execute next 1 or 2 instructions
    • Instruction so executed said to be in delay slot
    • See SPARC ISA
    • (example – loop counter update)
  – In organization: move comparator to ID stage and decide in the ID stage
    • Reduces branch delay by 2 cycles
    • Increases the cycle time
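The trade-off in the second solution can be made concrete. The sketch below assumes the branch mix from the previous slide (4% unconditional, 12% conditional, 50% taken) and assumes that both kinds of branch drop from a 3 cycle to a 1 cycle penalty. It then asks how much longer the cycle could get before the change stops paying off, using execution time = instructions × CPI × cycle time. The names are illustrative.

```python
def branch_cpi(penalty_cycles, uncond=0.04, cond=0.12, taken=0.5):
    """CPI when unconditional and taken conditional branches each
    cost `penalty_cycles` extra cycles."""
    return 1 + penalty_cycles * (uncond + taken * cond)


cpi_mem = branch_cpi(3)  # branch decided in the MEM stage
cpi_id = branch_cpi(1)   # comparator moved to the ID stage
break_even = cpi_mem / cpi_id  # largest tolerable cycle-time ratio
print(f"{cpi_mem:.2f} {cpi_id:.2f} {break_even:.3f}")  # 1.30 1.10 1.182
```

Under these assumptions, deciding in ID is a net win only if the cycle time grows by less than about 18%.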

Page 17: CS 230: Computer Organization and Assembly Language

Branch Prediction

• Prior solutions are “ugly”
• Better (& more common): guess in IF stage
  – Technique is called “branch predicting”; needs 2 parts:
    • “Predictor” to guess if instruction will branch (and to where)
    • “Recovery Mechanism”: i.e. a way to fix your mistake
  – Prior strategy:
    • Predictor: always guess branch never taken
    • Recovery: flush instructions if branch taken
  – Alternative: accumulate info. in IF stage as to…
    • Whether or not for any particular PC value a branch was taken next
    • To where it is taken
    • How to update with information from later stages

Page 18: CS 230: Computer Organization and Assembly Language

A Branch Predictor

[Figure: block diagram with PC, Instruction Memory, and Branch Prediction Logic. Labels: Normal PC value, Guess Branch, Guess as to where to branch, Branch Update Information.]

Page 19: CS 230: Computer Organization and Assembly Language

Branch History Table

[Figure: block diagram with PC, Instruction Memory, and Branch History Table. Labels: Normal PC value, Branch Prediction, Predicted PC Value.]

• Given a PC, look up an entry in the table. Each table entry has two fields:
  – 1 bit branch prediction
  – New PC value
• BHT updated by Mem stage when each real branch is resolved
• Questions:
  – How to keep BHT from being too big
  – How to generate prediction
• Answer to BHT size question: use only bottom N bits (e.g., N=8) of PC
  – This means that multiple instructions will “share” the same entry, causing potential mistakes
• Branch Prediction Accuracy: how often is our prediction correct
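The table lookup and the aliasing problem are easy to see in a small model. This is a sketch in Python under stated assumptions: the class and method names are made up for illustration, entries default to "not taken", and the index is literally the bottom N bits of the PC as the slide says.

```python
class BranchHistoryTable:
    """1-bit prediction plus predicted target, indexed by low PC bits."""

    def __init__(self, index_bits: int = 8):
        self.mask = (1 << index_bits) - 1
        self.entries = {}  # index -> (predict_taken, target_pc)

    def _index(self, pc: int) -> int:
        return pc & self.mask  # keep only the bottom N bits of the PC

    def predict(self, pc: int) -> int:
        """Return the predicted next PC for the instruction at `pc`."""
        taken, target = self.entries.get(self._index(pc), (False, None))
        return target if taken else pc + 4

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the real outcome once the branch is resolved."""
        self.entries[self._index(pc)] = (taken, target)


bht = BranchHistoryTable(index_bits=8)
bht.update(40, taken=True, target=72)
print(bht.predict(40))   # 72
print(bht.predict(296))  # 72, because 296 and 40 share the same 8 low bits
```

The second lookup is the "potential mistake" from the slide: the instruction at 296 inherits the entry written for the branch at 40.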

Page 20: CS 230: Computer Organization and Assembly Language

Branch Prediction Information

• One bit predictor:
  – Use result from last time we saw this instruction
• Problem:
  – Even if branch is almost always taken, we will be wrong at least twice
    • 1st time we see the instruction
    • 1st time the branch is not taken
    • Also, 1st time branch is taken again after that
    • And if branch alternates b/t taken, not taken…
      – We get 0% accuracy
• Can we do better? Yep.
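Both failure cases can be replayed with a few lines of Python. The sketch assumes the predictor starts out predicting "not taken" and that the alternating pattern begins with a taken branch; the function name is illustrative.

```python
def one_bit_mispredictions(outcomes, initial_prediction=False):
    """Count wrong guesses of a predictor that repeats the last outcome."""
    prediction = initial_prediction
    wrong = 0
    for taken in outcomes:
        if prediction != taken:
            wrong += 1
        prediction = taken  # remember only the most recent outcome
    return wrong


mostly_taken = [True] * 9 + [False] + [True] * 9
alternating = [True, False] * 10

print(one_bit_mispredictions(mostly_taken))  # 3
print(one_bit_mispredictions(alternating))   # 20
```

The mostly-taken branch is mispredicted three times (first sighting, the one not-taken, and the first taken after it), and the alternating branch is mispredicted on all 20 executions.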

Page 21: CS 230: Computer Organization and Assembly Language

Branch Prediction Information

• How to do better?
  – Keep a “counter” in each entry of the number of times taken in the last N times executed
  – Keep information about the “pattern” of previous branches
• Book’s scheme: a “2-bit saturating counter”
  – Increment when branch is taken
  – Decrement when branch is not taken
  – Don’t increment or decrement above or below a max/min count
    • Use sign of count as predictor

Page 22: CS 230: Computer Organization and Assembly Language

Book’s 2 Bit Branch Counter

[Figure: state diagram with four states, two labeled “Predict Taken” and two labeled “Predict Not Taken”, connected by transitions labeled “Actually Taken” and “Actually Not Taken”.]

As soon as (and only when) we have two mispredictions in a row do we change our prediction.
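A saturating counter of the kind described on the previous slide can be sketched in a few lines of Python. Assumptions: the count is stored as 0 to 3 with 2 and 3 meaning "predict taken" (equivalent to using the sign of a count centered on zero), and the counter starts saturated at 3. The class name is illustrative, and the book's state diagram may differ in detail from a plain counter.

```python
class TwoBitCounter:
    """2-bit saturating counter: 0 and 1 predict not taken, 2 and 3 taken."""

    def __init__(self, count: int = 3):
        self.count = count

    def predict(self) -> bool:
        return self.count >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.count = min(3, self.count + 1)  # saturate at the maximum
        else:
            self.count = max(0, self.count - 1)  # saturate at the minimum


def mispredictions(outcomes, counter):
    wrong = 0
    for taken in outcomes:
        if counter.predict() != taken:
            wrong += 1
        counter.update(taken)
    return wrong


mostly_taken = [True] * 9 + [False] + [True] * 9
print(mispredictions(mostly_taken, TwoBitCounter()))  # 1
```

Starting from the saturated state, a single not-taken outcome costs one misprediction but leaves the prediction at "taken"; it takes two not-taken outcomes in a row to flip it.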

Page 23: CS 230: Computer Organization and Assembly Language

Computing Performance

• Program assumptions:
  – 23% loads and in ½ of cases, next instruction uses load value
  – 13% stores
  – 19% conditional branches
  – 2% unconditional branches
  – 43% other
• Machine Assumptions:
  – 5 stage pipe with all forwarding
    • Only penalty is 1 cycle on use of load value immediately after a load
    • Jumps are totally resolved in ID stage for a 1 cycle branch penalty
    • 75% branch prediction accuracy
    • 1 cycle delay on misprediction

Page 24: CS 230: Computer Organization and Assembly Language

The Answer:

• CPI penalty calculation:
  – Loads:
    • 50% of the 23% of loads have 1 cycle penalty: 0.5*0.23 = 0.115
  – Jumps:
    • All of the 2% of jumps have 1 cycle penalty: 0.02*1 = 0.02
  – Conditional Branches:
    • 25% of the 19% are mispredicted for a 1 cycle penalty: 0.25*0.19*1 = 0.0475
• Total Penalty: 0.115 + 0.02 + 0.0475 = 0.1825
• Average CPI: 1 + 0.1825 = 1.1825
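The same calculation as a short Python sketch, using only the numbers from the assumptions slide; the function and parameter names are illustrative:

```python
def average_cpi(load_frac=0.23, load_use_frac=0.5, jump_frac=0.02,
                branch_frac=0.19, accuracy=0.75, penalty=1):
    """Base CPI of 1 plus the per-instruction stall contributions."""
    load_penalty = load_frac * load_use_frac * penalty
    jump_penalty = jump_frac * penalty
    branch_penalty = branch_frac * (1 - accuracy) * penalty
    return 1 + load_penalty + jump_penalty + branch_penalty


print(f"{average_cpi():.4f}")  # 1.1825
```

Changing `accuracy` shows how sensitive the result is to the predictor: a perfect predictor would remove the 0.0475 branch term entirely.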

Page 25: CS 230: Computer Organization and Assembly Language

Yoda says…

Death is a natural part of life. Rejoice for those around you who transform into the Force. Mourn them do not. Miss them do not.