Branches
Daniel Ángel Jiménez
Departments of Computer Science
UT San Antonio & Rutgers
About Me
Born in Fort Hood, Texas in 1969 (~80 miles north on IH-35)
  Dad from Mexico, Mom from Texas
  Lived in Temple, Texas
Moved to San Antonio, Texas in 1973 (~80 miles south on IH-35)
  B.S. at UTSA, 1992
  M.S. at UTSA, 1994
Moved to San Marcos, Texas in 1995 (~30 miles south on IH-35)
  Started Ph.D. program at UT Austin
Moved back to San Antonio in 1996
  Non-tenure-track faculty, UTHSCSA
Moved to Austin in 1999
  Ph.D. UT Austin, 2002
Moved to New Jersey in 2002, New York 2003
  Asst. Professor, Rutgers
Sabbatical in Barcelona, Spain in 2005
Back to San Antonio in 2007
  Associate Professor, UTSA
  Mostly for the breakfast tacos
More about me
Always liked computer programming
  First computer was Tandy Color Computer in 1984
Fortunate sequence of mentors guided me into my career
  Mom – Education is important (didn’t believe her at the time)
  Neal Wagner – theory is exciting
  Hugh Maynard – math is my friend
  Betty Travis – Research Careers for Minority Scholars
  Calvin Lin – perfect fit Ph.D. advisor
  Uli Kremer – welcomed me into being a professor
Like taekwondo, piano, traveling, Spanish music
  Current favorite band – Ojos de Brujo
This Talk
How an instruction is processed – pipelining
Kinds of branches
Branch prediction
  Accuracy
  Technique
Empirical properties of branches
How to handle branches
Conclusion
How an Instruction is Processed
Processing can be divided into five stages:
  Instruction fetch
  Instruction decode
  Execute
  Memory access
  Write back
Instruction-Level Parallelism
[Figure: the five pipeline stages (instruction fetch, instruction decode, execute, memory access, write back)]
To speed up the process, pipelining overlaps execution of multiple instructions, exploiting parallelism between instructions
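As a rough illustration of why overlapping helps, the following minimal Python sketch counts cycles for an idealized five-stage pipeline. It assumes one cycle per stage and no hazards or stalls; the function names are hypothetical and do not come from the talk.

```python
def unpipelined_cycles(n_insts, n_stages=5):
    """Each instruction passes through every stage before the next one starts."""
    return n_insts * n_stages


def pipelined_cycles(n_insts, n_stages=5):
    """Idealized pipeline: fill it once, then retire one instruction per cycle."""
    if n_insts == 0:
        return 0
    return n_stages + (n_insts - 1)


if __name__ == "__main__":
    n = 100
    print(unpipelined_cycles(n), pipelined_cycles(n))
```

Under these assumptions, 100 instructions take 500 cycles without pipelining and 104 cycles with it. Real pipelines fall short of this ideal because of hazards such as branches.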
Control Hazards: Branches
Conditional branches create a problem for pipelining: the next instruction can't be fetched until the branch has executed, several stages later.
[Figure: a branch instruction in the pipeline]
Pipelining with Branches
[Figure: the five pipeline stages with an unresolved branch instruction]
Branches cause bubbles in the pipeline, where some stages are left idle.
Branch Prediction
[Figure: the five pipeline stages with speculative execution]
A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.
Branch predictors must be highly accurate to avoid mispredictions!
Kinds of Branches
Conditional
  Very common, 1/4 to 1/10 of instructions
  Must be predicted, can be hard to predict
  Loop back edges with short fixed trip counts can be predicted perfectly
Unconditional
  Targets still have to be predicted with the branch target buffer (BTB)
Indirect
  E.g. jumping through a table of addresses
  Can be predicted, often just use BTB as predictor
Returns
  Predicted with a return address stack (RAS)
  >99% accuracy possible if you avoid deep recursion
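To see why deep recursion hurts return prediction, here is a minimal Python sketch of a fixed-depth return address stack. The class name, the default depth of 16, and the overflow policy (drop the oldest entry) are assumptions made for illustration; real hardware varies and is often undocumented.

```python
class ReturnAddressStack:
    """Toy return address stack (RAS) with a fixed number of entries."""

    def __init__(self, depth=16):
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # Assumed overflow policy: the oldest entry is lost when the stack is full.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def predict_return(self):
        # An empty stack has no prediction to offer.
        return self.entries.pop() if self.entries else None


def return_accuracy(call_depth, ras_depth):
    """Nest call_depth calls, unwind them, and report the fraction of returns predicted."""
    ras = ReturnAddressStack(ras_depth)
    addresses = [0x1000 + 4 * i for i in range(call_depth)]
    for address in addresses:
        ras.on_call(address)
    correct = 0
    for address in reversed(addresses):
        if ras.predict_return() == address:
            correct += 1
    return correct / call_depth


if __name__ == "__main__":
    print(return_accuracy(8, 16), return_accuracy(64, 16))
```

In this toy model, call chains that fit in the stack are predicted perfectly, while a chain 64 calls deep on a 16-entry stack gets only the innermost 16 returns right.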
Branch Predictor Accuracy is Critical
The cost of a misprediction is proportional to pipeline depth
  Predictor accuracy is more important for deeper pipelines
  Need good branch predictor to feed core with right-path insts
Simulations with SimpleScalar/Alpha
  Deeper pipelines allow higher clock rates by decreasing the delay of each pipeline stage
  Decreasing misprediction rate from 9% to 4% results in 31% speedup for 32-stage pipeline
Today’s pipelines have been scaled back, but only temporarily…
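The link between misprediction rate, pipeline depth, and performance can be sketched with a simple cycles-per-instruction model. This is a back-of-the-envelope approximation, not the SimpleScalar simulation behind the numbers above, and every parameter value below (base CPI, branch fraction, penalty) is a hypothetical choice, so it should not be expected to reproduce the 31% figure.

```python
def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Base CPI plus the average misprediction stall per instruction."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def speedup(old_rate, new_rate, base_cpi=1.0, branch_fraction=0.2, penalty_cycles=30):
    """Speedup from lowering the misprediction rate, all else held equal."""
    old_cpi = cycles_per_instruction(base_cpi, branch_fraction, old_rate, penalty_cycles)
    new_cpi = cycles_per_instruction(base_cpi, branch_fraction, new_rate, penalty_cycles)
    return old_cpi / new_cpi


if __name__ == "__main__":
    # Assumed penalties of 10 and 30 cycles stand in for shallow and deep pipelines.
    for penalty in (10, 30):
        print(penalty, round(speedup(0.09, 0.04, penalty_cycles=penalty), 3))
```

Because the penalty term scales with pipeline depth, the same improvement in accuracy yields a larger speedup on the deeper pipeline in this model.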
Conditional Branch Prediction
Most predictors are based on 2-level adaptive branch prediction [Yeh & Patt ’91]
Branch outcomes are shifted into a history register, 1 for taken, 0 for not taken
History bits and address bits combine to index a pattern history table (PHT) of 2-bit saturating counters
Prediction is high bit of counter
Counter is incremented if branch is taken, decremented if branch is not taken
GAs – a common type of predictor
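The scheme described above can be sketched in a few lines of Python. This is a minimal GAs-style model under stated assumptions: the table sizes, the initial counter value (weakly not taken), the choice of which address bits to use, and all names are hypothetical and are not taken from any real processor.

```python
class GAsPredictor:
    """Toy two-level predictor: global history plus address bits index a PHT."""

    def __init__(self, history_bits=8, address_bits=4):
        self.history_bits = history_bits
        self.address_bits = address_bits
        self.history = 0  # global history register, newest outcome in bit 0
        # 2-bit saturating counters, all starting at 1 (weakly not taken).
        self.pht = [1] * (1 << (history_bits + address_bits))

    def _index(self, pc):
        # Assumed indexing: low address bits (above the 4-byte alignment bits)
        # concatenated with the history bits.
        address = (pc >> 2) & ((1 << self.address_bits) - 1)
        return (address << self.history_bits) | self.history

    def predict(self, pc):
        # The prediction is the high bit of the 2-bit counter.
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the history register: 1 for taken, 0 for not taken.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_mispredictions(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return how many predictions were wrong."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    loop = [True, True, True, False]  # back edge of a loop with trip count 4
    predictor = GAsPredictor()
    print(count_mispredictions(predictor, 0x400, loop * 100))  # warm-up misses
    print(count_mispredictions(predictor, 0x400, loop * 25))   # steady state
```

Once the counters are trained, the short fixed-trip-count loop is predicted without error in this model, because each position in the loop produces a distinct history pattern.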
Characteristics of Branch Behavior
Branches tend to be highly biased
  53% are strongly biased, taken at least 98% or at most 2% of the time
  Remaining branches also exhibit weak biases
  A few branches show no bias
Branch outcomes are highly correlated with past branch history
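A bias measurement like the one above could be computed from a branch trace roughly as follows. This Python sketch assumes the trace is a sequence of (branch address, taken) pairs and counts static branches; whether the 53% figure was measured over static or dynamic branches is not stated here, so treat that choice, and the function name, as assumptions.

```python
from collections import defaultdict


def strongly_biased_fraction(trace, threshold=0.98):
    """Fraction of static branches taken at least `threshold` or at most 1 - threshold of the time."""
    taken_count = defaultdict(int)
    total_count = defaultdict(int)
    for pc, taken in trace:
        total_count[pc] += 1
        if taken:
            taken_count[pc] += 1
    if not total_count:
        return 0.0
    biased = 0
    for pc, total in total_count.items():
        rate = taken_count[pc] / total
        if rate >= threshold or rate <= 1 - threshold:
            biased += 1
    return biased / len(total_count)


if __name__ == "__main__":
    # Synthetic trace: one always-taken branch, one never-taken, one 50/50.
    trace = [(0x10, True)] * 100 + [(0x20, False)] * 100
    trace += [(0x30, i % 2 == 0) for i in range(100)]
    print(strongly_biased_fraction(trace))
```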
Important Facts about Branches
A taken branch is (often) more costly than an untaken branch
  Trace caches can mitigate this
Mispredicted branches are very costly
  Some mispredictions are more costly than others – how to exploit that?
Be aware of your machine’s indirect branch predictor
  What’s the best way to compile dense switch/case stmts?
  What to do about virtual dispatch?
Some ISAs have hint bits
  These can help a lot if set correctly
  But only if microarch uses them
What to do about mispredictions?
Capacity/Conflict
  Too many program paths, collisions in tables
  Solutions: use the hint bits or align branches
  Unfortunately branch predictors are secret so options are limited
Branches not correlated with recent history
  Split loops so trip counts are within history length
Data-dependent branches with unfriendly distributions
  Predicate if possible
Profile
  Performance counters + tools such as VTune or OProfile
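To make "data-dependent branches with unfriendly distributions" concrete, the following Python sketch feeds the outcomes of a hypothetical branch `value < threshold` to a single 2-bit saturating counter, once with random data and once with the same data sorted. The single-counter predictor, the threshold, the seed, and the data size are all assumptions for illustration; sorting is shown only as one way the input distribution can become friendlier, not as a technique recommended in the talk.

```python
import random


def count_mispredictions(outcomes):
    """Run one 2-bit saturating counter over a sequence of taken/not-taken outcomes."""
    counter = 1  # assumed start state: weakly not taken
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


def branch_outcomes(values, threshold=128):
    """Outcome of the hypothetical branch `value < threshold` for each value."""
    return [value < threshold for value in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
values = [rng.randrange(256) for _ in range(10_000)]

random_misses = count_mispredictions(branch_outcomes(values))
sorted_misses = count_mispredictions(branch_outcomes(sorted(values)))

if __name__ == "__main__":
    print(random_misses, sorted_misses)
```

With random data the counter mispredicts a large share of the branches, while with sorted data it mispredicts only around the single transition from taken to not taken. When the data cannot be made friendlier, predication removes the branch altogether.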
Conclusion
Branches can have variable costs due primarily to prediction
Be aware of the implementation of branches
Profiling and ISA support for branches
Different causes and effects of mispredictions
Impact of mispredictions has crept up in recent years
The End
http://www.cs.utsa.edu/~dj
Related Compiler Work
Profile-guided code placement to improve instruction locality
  Program restructuring for virtual memory [Hatfield & Gerald ’71]
  Reducing conflict misses in direct-mapped I$ [McFarling ’88, ’89]
  Procedure placement [Pettis & Hansen ’90], [Gloy & Smith ’99]
Transformations for reducing branch costs
  Branch alignment [Calder & Grunwald ’94], [Young et al. ’97]
  Software trace cache [Ramirez et al. ’99]
Transformations for improving predictor accuracy
  Static correlated branch prediction [Young & Smith ’99]
  Address adjustment [Chen & King ’99]
  Reverse-engineering branch predictors [Milenkovic et al. ’04]
  PHT partitioning [Jiménez ’05]