Upload
marnin
View
31
Download
0
Embed Size (px)
DESCRIPTION
2D-Profiling Detecting Input-Dependent Branches with a Single Input Data Set. Hyesoon Kim M. Aater Suleman Onur Mutlu Yale N. Patt. HPS Research Group The University of Texas at Austin. Motivation. Profile-guided code optimization has become essential for achieving good performance. - PowerPoint PPT Presentation
Citation preview
2D-ProfilingDetecting Input-Dependent Brancheswith a Single Input Data Set
Hyesoon KimM. Aater SulemanOnur MutluYale N. Patt
HPS Research Group The University of Texas at Austin
2
Motivation
Profile-guided code optimization has become essential for achieving good performance. Run-time behavior profile-time behavior: Good! Run-time behavior ≠ profile-time behavior: Bad!
3
Motivation
Profiling with one input set is not enough! Because a program can show different behavior with
different input data sets Example: Performance of predicated execution is
highly dependent on the input data set Because some branches behave differently with
different input sets
4
Input-dependent Branches Definition
A branch is input-dependent if its misprediction rate differs by more than some Δ over different input data sets.
Inp. 1 Inp. 2 Inp.1 – Inp. 2Misprediction rate of Br. X 30% 29% 1%
Misprediction rate of Br. Y 30% 5% 25%
Input-dependent branch
Input-dependent br. hard-to-predict br.
5
An Example Input-dependent Branch Example from Gap (SPEC2K): Type checking branch
TypHandle Sum (A,B)
If ((type(A) == INT) && (type(B) == INT)) { Result = A + B;Return Result;
}Return SUM(A, B);
Train input set: A&B are integers 90% of the time misprediction rate: 10%
Reference input set: A&B are integers 42% of the time misprediction rate: 30% (30%-10%)>Δ
//input-dependent br
6
Predicated Execution
Eliminate hard-to-predict branchesbut fetch blocks B and C all the time
(normal branch code)
C B
D
AT N
p1 = (cond) branch p1, TARGET
mov b, 1 jmp JOIN
TARGET: mov b, 0
A
B
C
BCD
A(predicated code)
A
B
C
if (cond) { b = 0;}else { b = 1;} p1 = (cond)
(!p1) mov b, 1
(p1) mov b, 0
7
Predicated Code Performance vs. Branch Misprediction Rate
Normal branch code performs better
Predicated code performs better
run-time (input B) profile-time (input A)
Converting a branch to predicated code could hurt performance if run-time misprediction rate is lower than profile-time misprediction rate
X2
3
4
5
6
7
8
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Branch misprediction rate (%)
Exec
utio
n tim
e (c
ycle
s)
predicated code
normal branch code
8
0
0.2
0.4
0.6
0.8
1
1.2
gzip vpr gcc mcf crafty parserperlbmk gap vortex bzip2 twolfExec
. tim
e no
rmal
ized
to n
on-p
redi
cate
d co
de
run-time: input-Arun-time: input-Brun-time: input-C
Predicated Code Performance vs. Input Set
9%-4%
-16%1%
Measured on an Itanium-II machine
non-predicated
Predicated execution loses performance because of input-dependent branches
9
If We Know a Branch is Input-Dependent
May not convert it to predicated code. May convert it to a wish branch.
[Kim et al. Micro’05] May not perform other compiler optimizations
or may perform them less aggressively. Hot-path/trace/superblock-based optimizations [Fisher’81, Pettis’90, Hwu’93, Merten’99]
10
Our Goal
Identify input-dependent branches by using a single input set for profiling
11
Talk Outline Motivation 2D-profiling Mechanism Experimental Results Conclusion
12
0
0.2
0.4
0.6
0.8
1
0 500 1000 1500Time (in terms of number of executed instructions x 100M)
Bra
nch
Pred
ictio
n A
ccur
acy
0
0.2
0.4
0.6
0.8
1
0 500 1000 1500Time (in terms of number of executed instructions x 100M)
Bra
nch
Pred
ictio
n A
ccur
acy
Key Insight of 2D-profilingPhase behavior in prediction accuracyis a good indicator of input dependence
input-dependent
input-independent
phase 1
phase 2
phase 3
13
Traditional Profiling
brA
time
brB
time
MEAN pr.Acc(brA) MEAN pr.Acc(brB) behavior of brA behavior of brB
MEAN pr.Acc(brA)
MEAN pr.Acc(brB)
pr. A
ccpr
. Acc
14
2D-profiling
brA
time
brB
timeMEAN pr.Acc(brA) MEAN pr.Acc(brB)
STD pr.Acc(brA) ≠ STD pr.Acc(brB)behavior of brA ≠ behavior of brB
A: input-dependent br, B: input-independent br
MEAN pr.Acc(brA)STD pr.Acc(brA)
MEAN pr.Acc(brB)STD pr.Acc(brB)pr
. Acc
pr. A
cc
15
Calculate MEAN (brA, brB, …), Standard deviation (brA, brB, …),
PAM:Points Above Mean (brA, brB, …)
2D-profiling Mechanism
Slice 1 Slice 2 Slice N …
mean Pr.Acc(brA,s1)mean Pr.Acc(brB,s1)
mean Pr.Acc(brA,s2)mean Pr.Acc(brB,s2)
mean Pr.Acc(brA,sN)mean Pr.Acc(brB,sN)...
...
The profiler collects branch prediction accuracy information for every static branch over time
mean brA
time
PAM:50%
PAM:0%
brA
brBmean brB
... ......
slice size = M instructions
16
Input-dependence Tests STD&PAM-test: Identify branches that have large variations
in accuracy over time (phase behavior) STD-test (STD > threshold): Identify branches that have large
variations in the prediction accuracy over time PAM-test (PAM > threshold): Filter out branches that pass STD-
test due to a few outlier samples
MEAN&PAM-test: Identify branches that have low prediction accuracy and some time-variation in accuracy MEAN-test (MEAN < threshold): Identify branches that have low
prediction accuracy PAM-test (PAM > threshold): Identify branches that have some
variation in the prediction accuracy over time
A branch is classified as input-dependent if it passes either STD&PAM-test or MEAN&PAM-test
17
Talk Outline Motivation 2D-profiling Mechanism Experimental Results Conclusion
18
Experimental Methodology Profiler: PIN-binary instrumentation tool Benchmarks: SPEC 2K INT Input sets
Profiler: Train input set Input-dependent Branches: Reference input set and train/other
extra input sets Input-dependent branch: misprediction rate of the branch
changes more than Δ = 5% when input data set changes Different Δ are examined in our TechReport [reference 11].
Branch predictors Profiler: 4KB Gshare, Machine: 4KB Gshare Profiler: 4KB Gshare, Machine: 16KB Perceptron (in paper)
19
Evaluation Metrics Coverage and Accuracy for input-dependent
branches
dependent-Input ActualPredictedCorrectly
ABACOV
A B
dependent-InputPredictedPredictedCorrectly
BBACC
A
Actual Input-dependent br.
Predicted Input-dependent br. (2D-profiler)Correctly Predicted Input-dependent br.
20
Input-dependent Branches
0%10%
20%30%
40%50%
60%70%
80%90%
100%
bzip2 gzip twolf gap crafty gcc% o
f bra
nche
s th
at a
re in
put-d
epen
dent
2 input sets3 input sets4 input sets5 input sets6 input sets7 input sets8 input sets
21
00.10.20.30.40.5
0.60.70.80.9
1
coverage accuracy coverage accuracy
Frac
tion
2 input sets3 input sets4 input sets5 input sets6 input sets7 input sets8 input sets
2D-profiling Results
input-dependent branches input-independent branches
Phase behavior and input-dependence are strongly correlated!
63%
88%93%
76%
22
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
bzip2 gzip twolf gap crafty gcc AVG
Nor
mal
ized
exe
cutio
n tim
e
EdgeGshare2D+Gshare
The Cost of 2D-profiling?
2D-profiling adds only 1% overhead over modeling the branch predictor in software Using a H/W branch predictor [Conte’96]
54%1%
23
Conclusion 2D-profiling is a new profiling technique to find input-
dependent characteristics by using a single input data set for profiling
2D-profiling uses time-varying information instead of just average data
Phase behavior in prediction accuracy in a profile run input-dependent
2D-profiling accurately identifies input-dependent branches with very little overhead (1% more than modeling the branch predictor in the profiler)
Applications of 2D-profiling are an open research topic Better predicated code/wish branch generation algorithms Detecting other input-dependent program characteristics
Questions?