8/16/2019 Impact of delay on branch predictors
Paper: The Impact of Delay on the Design of Branch Predictors
Summary:
Increasingly complex branch predictors and deeply pipelined microarchitectures lead to a delay
in branch prediction. The premise of the argument is based on studies showing that shrinking feature
sizes, larger wire delays and smaller clock cycles will lead to multi-cycle access times on larger chips.
This survey paper focuses on techniques that can be used to accommodate this delay. The authors
examine a caching approach, an overriding approach and a cascading look-ahead approach. The
overriding approach uses a quick but relatively inaccurate predictor that guides instruction fetch in a
single cycle; its prediction can then be corrected by a slower but more accurate predictor that needs
multiple cycles. The cascading look-ahead scheme exploits the time between branches to start reading the
prediction tables. Different configurations are evaluated on a simulator that models different
processor technologies (250 nm to 35 nm) and determines the optimal parameters for each technology and
for each predictor. They also present results for different clocking strategies. They demonstrate that
the efficiency of a predictor depends on its delay as well as its accuracy.
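The interplay between accuracy and delay described above can be illustrated with a rough cost model. The sketch below is not the paper's simulator. It assumes a simplified model in which a multi-cycle predictor stalls fetch for its extra cycles, and an overriding scheme pays those extra cycles only when the slow predictor disagrees with the quick one. The function names and all numbers are hypothetical.

```python
def penalty_single(latency, mispredict_rate, mispredict_penalty):
    """Average extra cycles per branch when fetch waits for one predictor.

    Assumption: a predictor with `latency` cycles stalls fetch for
    (latency - 1) cycles on every branch.
    """
    stall = latency - 1
    return stall + mispredict_rate * mispredict_penalty


def penalty_overriding(slow_latency, disagree_rate,
                       slow_mispredict_rate, mispredict_penalty):
    """Average extra cycles per branch for a 1-cycle quick predictor that is
    overridden by a slower predictor.

    Assumption: when the two predictors disagree, the work fetched during
    the slow predictor's extra (slow_latency - 1) cycles is squashed. The
    final prediction is always the slow predictor's, so its misprediction
    rate determines the full misprediction cost.
    """
    override_cost = slow_latency - 1
    return (disagree_rate * override_cost
            + slow_mispredict_rate * mispredict_penalty)


if __name__ == "__main__":
    PENALTY = 20  # hypothetical misprediction penalty in cycles

    quick_alone = penalty_single(1, 0.08, PENALTY)   # fast, less accurate
    slow_alone = penalty_single(3, 0.04, PENALTY)    # slow, more accurate
    overriding = penalty_overriding(3, 0.06, 0.04, PENALTY)

    print(f"quick alone: {quick_alone:.2f} extra cycles per branch")
    print(f"slow alone:  {slow_alone:.2f} extra cycles per branch")
    print(f"overriding:  {overriding:.2f} extra cycles per branch")
```

With these made-up parameters the model gives about 1.6, 2.8 and 0.92 extra cycles per branch respectively: the more accurate predictor loses to the faster one when used alone, while overriding recovers most of the accuracy benefit without paying the stall on every branch.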
Strengths:
• Provides insight into the effects of complex branch predictors on the delay in prediction.
• Highlights the important tradeoff that a very accurate, complex predictor may still perform
worse than a faster, less accurate predictor.
• Provides experimentally determined configurations of pattern history tables for
different processor technologies.
• Proposes that overriding yields better performance than the other delay-hiding methods.
• I think another important point highlighted is how branch frequency affects latency in
prediction.
• There is particularly good insight into how processor technology and clocking affect the IPC.
The hybrid predictor achieves high accuracy but the lowest IPC on smaller technologies as the
access times increase.
• They show that the overriding scheme can work best across most processor technologies and
under aggressive clocking.
Weaknesses:
• They assume that the BTB is kept at constant capacity and access time. However, this may
not be true given the impact of clock rate and shrinking process technology.
• Although they show that overriding works better than caching and cascading overall, they
haven't addressed its utility on very large hardware budgets, i.e. in the hundreds of kilobytes. This
could possibly be because they only simulate
Related work:
• Cited /0 times on Google Scholar.
• This paper served as a foundation for the development of pipelined predictors that were
introduced in subsequent papers by Daniel Jiménez.
• André Seznec also then proposed the ahead-pipelined architecture for branch prediction.