ECE 486/586 Computer Architecture Lecture # 16 Spring 2019 Portland State University

ECE 486/586 Computer Architecture Lecture # 16web.cecs.pdx.edu/~zeshan/ece586_lec16.pdf · 2019-05-28 · Computer Architecture Lecture # 16 Spring 2019 Portland State University

Lecture Topics

• Branch Prediction

Reference:

• Chapter 3: Section 3.3

Why Predict Branches?

• The control-flow decision (where to fetch the next instruction from) is made in the fetch stage

• The branch penalty is non-zero because, by the time the processor computes the branch outcome (in the decode stage), a useless instruction may already have been fetched and must be discarded

• To avoid fetching useless instructions, the processor needs to know the branch outcome in the fetch stage

• This involves the following steps:

– Anticipating that the instruction being fetched is a branch instruction

– Predicting whether the branch instruction will be taken or not taken

– Predicting the branch target address (for a taken branch)

Basic Branch Prediction

• Branch prediction buffer (branch history table)

• Memory indexed by the low-order bits of the branch instruction address

• Stores previous branch outcomes to predict the next outcome

• Memory is not tagged (unlike cache)

• Consequence: entry may reflect a different branch (aliasing)

[Figure: the 10 address bits PC[11:2] index a table of $2^{10}$ = 1K entries]
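As a rough illustration of the indexing above, here is a minimal Python sketch. It assumes 4-byte-aligned instructions (so PC[1:0] is always zero and is skipped) and the 1K-entry table from the figure; the branch addresses are hypothetical. It shows how two branches 4 KB apart land in the same untagged entry.

```python
# Minimal sketch of an untagged branch history table (BHT) lookup.
# Assumptions: 4-byte-aligned instructions and a 1K-entry table indexed by PC[11:2].

INDEX_BITS = 10                      # 2**10 = 1K entries
TABLE_SIZE = 1 << INDEX_BITS


def bht_index(pc: int) -> int:
    """Return the BHT entry for a branch at address pc, using bits PC[11:2]."""
    return (pc >> 2) & (TABLE_SIZE - 1)


# Hypothetical branch addresses that differ only above bit 11.
pc_a = 0x0040_1000
pc_b = 0x0040_2000

# With no tag stored, both branches read and overwrite the same entry (aliasing).
print(hex(pc_a), "->", bht_index(pc_a))
print(hex(pc_b), "->", bht_index(pc_b))
print("aliased:", bht_index(pc_a) == bht_index(pc_b))
```

A tagged table would detect the conflict, at the cost of storing the upper address bits in every entry.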

Static Branch Prediction

• In static branch prediction, the prediction made for a conditional branch remains constant (static) throughout the execution of a program

• Example 1: Always predict not taken

– Simplest form of prediction: always fetch the next instruction in sequential order

– On a misprediction, the incorrectly fetched instruction is discarded and the branch penalty is incurred

– Low prediction accuracy, because many branches in a program are taken

• Typically, branch outcomes are not completely random

• In a loop with many iterations, forward branches (at the beginning of the loop) are mostly not taken and backward branches (at the end of the loop) are mostly taken

• Example 2: Predict not taken for forward branches and taken for backward branches

– Improves prediction accuracy compared to always-not-taken prediction

– Mispredictions still happen during the last loop iteration
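The two static policies can be compared on a single loop-closing branch. This Python sketch assumes a hypothetical backward branch (target address below the branch address) in a loop of 10 iterations; the function names and addresses are illustrative only. Always-not-taken is right only on the final loop exit, while backward-taken/forward-not-taken misses only that exit.

```python
# Sketch: compare two static policies on a loop-closing (backward) branch.
# Assumptions (illustrative, not from the lecture): the branch sits at pc,
# jumps back to target < pc, and the loop body runs `iterations` times.

def predict_always_not_taken(pc: int, target: int) -> bool:
    return False                      # always fall through to the next instruction


def predict_btfn(pc: int, target: int) -> bool:
    """Backward taken, forward not taken."""
    return target < pc


def accuracy(predict, pc, target, outcomes):
    hits = sum(predict(pc, target) == taken for taken in outcomes)
    return hits / len(outcomes)


iterations = 10
# The backward branch is taken on every iteration except the last one.
loop_outcomes = [True] * (iterations - 1) + [False]

pc, target = 0x400, 0x3C0             # hypothetical addresses: a backward branch
print("always-not-taken:", accuracy(predict_always_not_taken, pc, target, loop_outcomes))
print("BTFN:            ", accuracy(predict_btfn, pc, target, loop_outcomes))
```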

Dynamic Branch Prediction

• Outcomes for a branch instruction often change during program execution

– Static prediction may therefore result in a high misprediction rate

• But outcomes for a particular branch often follow a predictable pattern

• Key idea behind dynamic branch prediction:

– Track the past outcomes of a branch instruction to predict its future outcomes

• In its simplest form, a dynamic prediction algorithm uses the outcome of the most recent execution of a branch instruction

– This outcome can be captured in a single bit (e.g., “0” if the branch was taken and “1” if it was not taken)

– The processor assumes that the next time the branch instruction is executed, its outcome will be the same as the last time

1-bit Branch Prediction

• The algorithm is implemented by a 2-state state machine:

– LT -- Branch is likely to be taken

– LNT -- Branch is likely not to be taken

• The prediction for a branch is based on the current state of the state machine

• The state transitions are based on the actual outcome, computed after the branch has been executed

Example

• Consider a branch instruction that is executed 6 times in a program. The actual outcomes of the branch are T, T, NT, T, T, NT, where “T” = Taken and “NT” = Not taken. Assume that the 1-bit branch predictor starts in the LNT state. What predictions will it make for each instance of the branch?

Instance | Current State | Prediction | Actual Outcome | Next State
1        | LNT           | NT         | T              | LT
2        | LT            | T          | T              | LT
3        | LT            | T          | NT             | LNT
4        | LNT           | NT         | T              | LT
5        | LT            | T          | T              | LT
6        | LT            | T          | NT             | LNT

Prediction Accuracy = 2/6

Mispredictions happen on both the first and the last iteration of the loop => one bit of state is not enough to capture the branch outcome pattern accurately
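The worked example can be replayed with a short simulation. This is a minimal Python sketch (the function name `simulate_1bit` is hypothetical, not from the lecture) that encodes the two states as strings, starts in LNT as the example specifies, and reproduces the table row by row.

```python
# Sketch of the 1-bit predictor from the example.
# States: "LT" (predict taken) and "LNT" (predict not taken); after each
# execution the next state simply records the actual outcome.

def simulate_1bit(outcomes, state="LNT"):
    rows = []
    for actual in outcomes:
        prediction = "T" if state == "LT" else "NT"
        next_state = "LT" if actual == "T" else "LNT"
        rows.append((state, prediction, actual, next_state))
        state = next_state
    return rows


outcomes = ["T", "T", "NT", "T", "T", "NT"]
rows = simulate_1bit(outcomes)
for i, row in enumerate(rows, start=1):
    print(i, *row)

correct = sum(pred == actual for _, pred, actual, _ in rows)
print(f"accuracy = {correct}/{len(rows)}")     # accuracy = 2/6
```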

2-bit Branch Prediction

ST: Strongly likely to be taken

LT: Likely to be taken

LNT: Likely not to be taken

SNT: Strongly likely not to be taken

Branch predicted as Taken in the ST and LT states

Branch predicted as Not taken in the LNT and SNT states

Example

• Consider a branch instruction that is executed 6 times in a program. The actual outcomes of the branch are T, T, NT, T, T, NT, where “T” = Taken and “NT” = Not taken. Assume that the 2-bit branch predictor starts in the LT state. What predictions will it make for each instance of the branch?

Instance | Current State | Prediction | Actual Outcome | Next State
1        | LT            | T          | T              | ST
2        | ST            | T          | T              | ST
3        | ST            | T          | NT             | LT
4        | LT            | T          | T              | ST
5        | ST            | T          | T              | ST
6        | ST            | T          | NT             | LT

Prediction Accuracy = 4/6

Mispredictions happen only on the last iteration of the loop => fewer mispredictions than 1-bit prediction
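The 2-bit example can be replayed the same way. This Python sketch assumes the common saturating-counter form of the state machine (SNT=0, LNT=1, LT=2, ST=3); the lecture's state diagram is not reproduced here, but the only transitions the example exercises (LT to ST on T, ST to LT on NT) are the same under that assumption. The function name `simulate_2bit` is hypothetical.

```python
# Sketch of a 2-bit predictor, assuming a saturating counter:
# SNT=0, LNT=1, LT=2, ST=3; predict taken in LT/ST; a taken outcome moves
# one step toward ST, a not-taken outcome one step toward SNT.

STATES = ["SNT", "LNT", "LT", "ST"]


def simulate_2bit(outcomes, state="LT"):
    counter = STATES.index(state)
    rows = []
    for actual in outcomes:
        prediction = "T" if counter >= 2 else "NT"
        if actual == "T":
            nxt = min(counter + 1, 3)
        else:
            nxt = max(counter - 1, 0)
        rows.append((STATES[counter], prediction, actual, STATES[nxt]))
        counter = nxt
    return rows


outcomes = ["T", "T", "NT", "T", "T", "NT"]
rows = simulate_2bit(outcomes)
for i, row in enumerate(rows, start=1):
    print(i, *row)

correct = sum(pred == actual for _, pred, actual, _ in rows)
print(f"accuracy = {correct}/{len(rows)}")     # accuracy = 4/6
```

The single not-taken outcome at the loop exit only weakens ST to LT, so the prediction for the next first iteration stays "taken"; that hysteresis is what removes the first-iteration misprediction of the 1-bit scheme.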

Prediction Accuracy of 4K 2-bit Predictor

Having more Entries Isn’t the Solution

Correlating Branch Predictors

• Simple 2-bit prediction schemes use the history of a single branch to predict the future behavior of that branch. This is called local branch prediction

• The behavior of other branches may affect the outcome of the current branch

• Outcomes of different branches are often correlated

Example:

if (a == 2)
    a = 0;
if (b == 2)
    b = 0;
if (a == b) {
}

If the first two branches are not taken, then the third one is taken. Local branch prediction cannot capture this behavior, because it sees only the third branch's own history

      DADDI R3, R1, -2
      BNEZ  R3, L1       ; taken if a != 2
      DADD  R1, R0, R0   ; a = 0
L1:   DADDI R3, R2, -2
      BNEZ  R3, L2       ; taken if b != 2
      DADD  R2, R0, R0   ; b = 0
L2:   DSUB  R3, R1, R2
      BEQZ  R3, L3       ; taken if a == b
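The claimed correlation can be checked by brute force. This Python sketch follows the branch senses of the MIPS version (b1 and b2 are BNEZ, b3 is BEQZ) and enumerates a small, arbitrarily chosen range of values for a and b; the function name `branch_outcomes` is hypothetical.

```python
# Sketch: evaluate the three branches of the fragment for many (a, b) inputs.
# "Taken" follows the MIPS code: b1 taken if a != 2, b2 taken if b != 2,
# b3 taken if a == b after the two conditional assignments.

def branch_outcomes(a: int, b: int):
    b1 = a != 2
    if not b1:
        a = 0
    b2 = b != 2
    if not b2:
        b = 0
    b3 = a == b
    return b1, b2, b3


inputs = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
results = [branch_outcomes(a, b) for a, b in inputs]

# Every time b1 and b2 are both not taken, b3 is taken ...
b3_when_both_nt = {b3 for b1, b2, b3 in results if not b1 and not b2}
# ... while in the remaining cases b3 goes both ways.
b3_otherwise = {b3 for b1, b2, b3 in results if b1 or b2}
print(b3_when_both_nt)   # {True}
print(b3_otherwise)      # {False, True}
```

Looking at b3 alone, its outcome varies; only the outcomes of b1 and b2 make it predictable, which is the information a global history supplies.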

Correlating Branch Predictor with 2-bit Global History Register

[Figure: the branch address and a 2-bit global branch history together select one of the 2-bit per-branch predictors, which supplies the prediction]

• Correlating (or two-level) predictors use the behavior of other branches (global branch history) to make branch predictions

• The branch history can be extended to m bits, recording the outcomes of the last m branches

• This requires $2^m$ tables, each with $2^k$ entries, where k is the number of branch address bits used

• Global branch history is implemented as an m-bit shift register, where each bit records whether a branch was taken or not taken

Correlating Branch Predictor with m-bit Global History Register

An (m,n) correlating predictor uses the behavior of the last m branches to choose from $2^m$ branch predictors, each of which is an n-bit predictor

Total number of bits $= 2^m \times n \times (\text{number of entries in each prediction table}) = 2^m \times n \times 2^k$, where k is the number of branch address bits used

For a predictor that does not use any global history, m = 0; e.g., a (0,2) predictor is a 2-bit predictor with no global history

[Figure: the low-order branch address bits and the m-bit global branch history together select one of the n-bit per-branch predictors]
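A minimal Python sketch of an (m,n) correlating predictor, under stated assumptions: k low-order address bits above PC[1:0] select the entry, the m-bit global history register (GHR) selects the table, and each entry is an n-bit saturating counter that predicts taken in its upper half and starts one step below that threshold. The class and method names are hypothetical. The test pattern is a single branch alternating T, NT; with only one branch in the stream, the GHR holds that branch's own history, which shows the mechanism but is simpler than the cross-branch correlation in the earlier example.

```python
# Sketch of an (m, n) correlating predictor (hypothetical names; assumptions:
# 4-byte instructions, k index bits, n-bit saturating counters).

class CorrelatingPredictor:
    def __init__(self, m: int, n: int, k: int):
        self.m, self.n, self.k = m, n, k
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)            # predict taken at or above this
        # 2**m tables of 2**k counters, each starting just below the threshold
        self.tables = [[self.threshold - 1] * (1 << k) for _ in range(1 << m)]
        self.ghr = 0                             # m-bit global history register

    def _entry(self, pc: int) -> int:
        return (pc >> 2) & ((1 << self.k) - 1)

    def predict(self, pc: int) -> bool:
        return self.tables[self.ghr][self._entry(pc)] >= self.threshold

    def update(self, pc: int, taken: bool) -> None:
        table = self.tables[self.ghr]
        i = self._entry(pc)
        if taken:
            table[i] = min(table[i] + 1, self.max_count)
        else:
            table[i] = max(table[i] - 1, 0)
        # Shift the outcome into the GHR, keeping only the last m outcomes
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.m) - 1)

    def storage_bits(self) -> int:
        return (1 << self.m) * self.n * (1 << self.k)


def run(pred: CorrelatingPredictor, pc: int, outcomes) -> int:
    """Count correct predictions while training on the outcome stream."""
    hits = 0
    for taken in outcomes:
        hits += pred.predict(pc) == taken
        pred.update(pc, taken)
    return hits


pattern = [True, False] * 10                     # one branch alternating T, NT, ...
pc = 0x0040_0100                                 # hypothetical branch address
print("(0,2):", run(CorrelatingPredictor(0, 2, 10), pc, pattern), "/ 20")
print("(1,2):", run(CorrelatingPredictor(1, 2, 10), pc, pattern), "/ 20")
```

Under these assumptions the (0,2) counter oscillates between LNT and LT and mispredicts every outcome, while with one history bit each table sees a constant outcome and only the first prediction is missed.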

Correlating Predictor Examples

Question: How many bits are in a (0,2) branch predictor with 4K entries? How many entries are in a (2,2) predictor with the same number of bits?

Solution:

Number of bits $= 2^m \times n \times 2^k$ (k = number of branch address bits used)

For the (0,2) predictor:

Number of bits $= 2^0 \times 2 \times 4\text{K} = 8\text{K}$ bits

For the (2,2) predictor:

Number of bits = 8K

$8\text{K} = 2^2 \times 2 \times (\text{number of predictor entries})$

=> Number of predictor entries = 1K (in each of the $2^2 = 4$ tables)
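The same arithmetic as a small Python helper (the name `predictor_bits` is hypothetical), using the bit-count formula above with the number of entries given per table.

```python
def predictor_bits(m: int, n: int, entries_per_table: int) -> int:
    """Total storage of an (m, n) predictor: 2**m tables x entries x n bits."""
    return (2 ** m) * n * entries_per_table


budget = predictor_bits(0, 2, 4 * 1024)       # (0,2) predictor with 4K entries
print(budget)                                 # 8192 bits = 8K

# Entries per table for a (2,2) predictor with the same bit budget
entries = budget // ((2 ** 2) * 2)
print(entries)                                # 1024 = 1K
```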

Comparison of 2-bit Predictors