Impact of delay on branch predictors

8/16/2019 Impact of delay on branch predictors


Paper: The Impact of Delay on the Design of Branch Predictors

Summary:

The increasing complexity of branch predictors, combined with deeply pipelined microarchitectures, introduces delay into branch prediction. The argument rests on studies showing that shrinking feature sizes, larger wire delays, and shorter clock cycles will lead to multi-cycle access times on larger chips. This survey paper focuses on techniques that can accommodate this delay. The authors examine a caching approach, an overriding approach, and a cascading look-ahead approach. The overriding approach uses a quick but relatively inaccurate predictor to guide instruction fetch in a single cycle; its prediction can then be corrected by a slower but more accurate predictor that needs multiple cycles. The cascading look-ahead scheme exploits the time between branches to start reading the prediction tables. Different configurations are evaluated on a simulator that models different processor technologies (250 nm to 35 nm) and determines the optimal parameters for each technology and for each predictor. The authors also present results for different clocking strategies. They demonstrate that the efficiency of a predictor depends on its delay as well as its accuracy.
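The overriding scheme summarized above can be sketched as a toy simulation. This is a minimal illustration, not the paper's simulator: the `Bimodal` and `Gshare` classes, the table sizes, the 3-cycle slow-predictor latency, and the 20-cycle misprediction penalty are all assumptions chosen for the example.

```python
class Bimodal:
    """Quick predictor (assumed single-cycle): 2-bit counters indexed by PC."""

    def __init__(self, bits=6):
        self.mask = (1 << bits) - 1
        self.table = [2] * (1 << bits)  # every counter starts weakly taken

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


class Gshare:
    """Slower predictor (assumed multi-cycle): counters indexed by PC XOR history."""

    def __init__(self, bits=12):
        self.mask = (1 << bits) - 1
        self.table = [2] * (1 << bits)
        self.history = 0  # global history of recent branch outcomes

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def simulate_overriding(trace, slow_latency=3, mispredict_penalty=20):
    """Run (pc, taken) pairs through a quick predictor overridden by a slow one."""
    quick, slow = Bimodal(), Gshare()
    stats = {"overrides": 0, "mispredictions": 0, "penalty_cycles": 0}
    for pc, taken in trace:
        quick_guess = quick.predict(pc)  # steers fetch immediately
        slow_guess = slow.predict(pc)    # assumed ready slow_latency cycles later
        if quick_guess != slow_guess:
            # The slow predictor disagrees: squash what was fetched meanwhile.
            stats["overrides"] += 1
            stats["penalty_cycles"] += slow_latency - 1
        if slow_guess != taken:
            # The final (slow) prediction was wrong: pay the full penalty.
            stats["mispredictions"] += 1
            stats["penalty_cycles"] += mispredict_penalty
        quick.update(pc, taken)
        slow.update(pc, taken)
    return stats


if __name__ == "__main__":
    # Hypothetical trace: one branch that alternates taken / not taken.
    alternating = [(0x40, i % 2 == 0) for i in range(200)]
    print(simulate_overriding(alternating))
```

On the alternating trace, the quick predictor keeps guessing taken, while the history-based predictor learns the pattern and overrides it on the not-taken iterations. Each override costs a small bubble instead of a full misprediction penalty.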

Strengths:

• Provides insight into how complex branch predictors affect prediction delay.

• Highlights the important tradeoff that a highly accurate but complex predictor may still perform worse than a faster, less accurate predictor.

• Provides experimentally determined pattern history table configurations for different processor technologies.

• Proposes that overriding yields better performance than other delay-hiding methods.

• I think another important point it highlights is how branch frequency affects prediction latency.

• Offers particularly good insight into how processor technology and clocking affect IPC. The hybrid predictor achieves high accuracy but the lowest IPC on smaller technologies as access times increase.

• Shows that the overriding scheme can work best across most processor technologies and under aggressive clocking.
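The accuracy-versus-delay tradeoff noted in the strengths can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a deliberately simple cost model in which prediction delay is not hidden at all; the function name `penalty_per_branch`, the accuracies, the latencies, and the 20-cycle misprediction penalty are hypothetical values, not figures from the paper.

```python
def penalty_per_branch(accuracy, latency_cycles, mispredict_penalty=20):
    """Average cycles lost per branch when prediction delay is not hidden.

    Assumed model: each prediction stalls fetch for (latency_cycles - 1)
    cycles, and each misprediction costs mispredict_penalty cycles.
    """
    stall = latency_cycles - 1
    return stall + (1.0 - accuracy) * mispredict_penalty


# Hypothetical predictors: the numbers are illustrative, not measured.
fast_simple = penalty_per_branch(accuracy=0.94, latency_cycles=1)
slow_complex = penalty_per_branch(accuracy=0.97, latency_cycles=4)

print(f"fast, less accurate: {fast_simple:.2f} cycles per branch")
print(f"slow, more accurate: {slow_complex:.2f} cycles per branch")
```

Under these assumed numbers the slower predictor loses despite its higher accuracy, which is the situation that delay-hiding schemes such as overriding are meant to address.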

Weaknesses:

• They assume that the BTB has constant capacity and access time. However, this may not hold given the impact of clock rate and process technology.

• Although they show that overriding works better than caching and cascading overall, they haven't addressed its utility on very large hardware budgets, i.e. in the hundreds of kilobytes. This

could possibly be because they only simulate

Related work:

• Cited /0 times on Google Scholar.

• This paper served as a foundation for the development of pipelined predictors that were introduced in subsequent papers by Daniel Jiménez.

• André Seznec then also proposed the ahead-pipelined architecture for branch prediction.
