Workload Design: Selecting Representative Program-Input Pairs
Lieven Eeckhout, Hans Vandierendonck, Koen De Bosschere
Ghent University, Belgium
PACT 2002, September 23, 2002
September 23, 2002 PACT 2002 2
Introduction
• Microprocessor design: simulation of workload = set of programs + inputs
  – constrained in size due to time limitation
  – taken from suites, e.g., SPEC, TPC, MediaBench
• Workload design:
  – which programs?
  – which inputs?
  – representative: large variation in behavior
  – benchmark-input pairs should be “different”
Main idea
• Workload design space is p-D space
  – with p = # relevant program characteristics
  – p is too large for understandable visualization
  – correlation between p characteristics
• Idea: reduce p-D space to q-D space
  – with q small (typically 2 to 4)
  – without losing important information
  – no correlation
  – achieved by multivariate data analysis techniques: PCA and cluster analysis
Goal
• Measuring impact of input data sets on program behavior
  – “far away” or weak clustering: different behavior
  – “close” or strong clustering: similar behavior
• Applications:
  – selecting representative program-input pairs
    • e.g., one program-input pair per cluster
    • e.g., take the program-input pair with the smallest dynamic instruction count
  – gaining insight into the influence of input data sets
  – profile-guided optimization
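The selection rule above (one program-input pair per cluster, preferring the pair with the smallest dynamic instruction count) can be sketched as follows. The pair names, cluster ids and instruction counts are made-up placeholders for illustration, not data from the study, and `select_representatives` is a hypothetical helper name.

```python
from collections import defaultdict

def select_representatives(pairs):
    """Pick, for each cluster, the pair with the smallest dynamic instruction count.

    `pairs` is an iterable of (name, cluster_id, instruction_count) tuples.
    """
    by_cluster = defaultdict(list)
    for name, cluster_id, insn_count in pairs:
        by_cluster[cluster_id].append((insn_count, name))
    # min() on (count, name) tuples picks the cheapest pair; ties break on name
    return {cid: min(members)[1] for cid, members in sorted(by_cluster.items())}

# Hypothetical example data (illustrative values only)
example = [
    ("progA.in1", 0, 5.0e9),
    ("progA.in2", 0, 1.2e9),
    ("progB.in1", 1, 3.0e9),
    ("progC.in1", 2, 8.0e9),
    ("progC.in2", 2, 9.5e9),
]
representatives = select_representatives(example)
print(representatives)
```

Picking the cheapest member of each cluster keeps the workload representative while limiting simulation time, which is the trade-off the slides describe.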
Overview
• Introduction
• Workload characterization
• Data analysis
  – Principal components analysis (PCA)
  – Cluster analysis
• Evaluation
• Discussion
• Conclusion
Workload characterization (1)
• Instruction mix
  – int, logic, shift&byte, load/store, control
• Branch prediction accuracy
  – bimodal (8K*2 bits), gshare (8K*2 bits) and hybrid (meta: 8K*2 bits) branch predictor
• Data and instruction cache miss rates
  – Five caches with varying size and associativity
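As an illustration of how one of these characteristics could be measured, here is a minimal sketch of a bimodal predictor with 8K two-bit saturating counters. The slides give only the table size; the trace format, indexing by branch address modulo table size, and the initial counter value are assumptions of this sketch.

```python
def bimodal_accuracy(trace, entries=8192):
    """Prediction accuracy of a bimodal predictor with 2-bit saturating counters.

    `trace` is a sequence of (branch_pc, taken) tuples. Indexing by pc modulo
    the table size and the initial counter value are assumptions.
    """
    counters = [1] * entries          # 0,1 predict not-taken; 2,3 predict taken
    correct = 0
    for pc, taken in trace:
        idx = pc % entries
        predicted_taken = counters[idx] >= 2
        if predicted_taken == taken:
            correct += 1
        # saturating update toward the actual outcome
        if taken:
            counters[idx] = min(3, counters[idx] + 1)
        else:
            counters[idx] = max(0, counters[idx] - 1)
    return correct / len(trace) if trace else 0.0

# Hypothetical trace: a loop branch taken 9 times, then not taken, repeated 10 times
loop_trace = ([(0x400, True)] * 9 + [(0x400, False)]) * 10
accuracy = bimodal_accuracy(loop_trace)
print(f"accuracy = {accuracy:.2f}")
```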
Workload characterization (2)
• Number of instructions between two taken branches
• Instruction-Level Parallelism
  – IPC of an infinite-resource machine with only read-after-write dependencies
• In total: p = 20 variables
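The ILP metric above can be approximated by scheduling each instruction as early as its read-after-write dependencies allow. The sketch below assumes unit latency for every instruction, tracks register dependencies only, and uses a made-up trace format of (destination register, source registers); none of these details come from the slides.

```python
def ideal_ipc(trace):
    """IPC on an idealized machine limited only by read-after-write dependencies.

    `trace` is a list of (dest_reg, src_regs) tuples in program order. Each
    instruction is assumed to take one cycle (unit latency is an assumption).
    """
    ready = {}            # register -> cycle in which its latest value is produced
    last_cycle = 0
    for dest, srcs in trace:
        # an instruction executes one cycle after its last producer
        cycle = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        if dest is not None:
            ready[dest] = cycle
        last_cycle = max(last_cycle, cycle)
    return len(trace) / last_cycle if trace else 0.0

# Hypothetical six-instruction trace
example_trace = [
    ("r1", []),            # cycle 1
    ("r2", []),            # cycle 1
    ("r3", ["r1", "r2"]),  # cycle 2
    ("r4", ["r3"]),        # cycle 3
    ("r5", []),            # cycle 1
    ("r6", ["r5"]),        # cycle 2
]
ipc = ideal_ipc(example_trace)
print(ipc)
```

Six instructions complete in three cycles here, so the longest dependency chain, not the machine width, bounds the result.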
PCA
• Many program characteristics (variables) are correlated
• PCA computes new variables
  – p principal components PCi
  – linear combination of original characteristics
  – uncorrelated
  – contain same total variance over all benchmarks
  – Var[PC1] > Var[PC2] > Var[PC3] > …
  – most have near-to-zero variance (constant)
  – reduce dimension of workload space to q = 2 to 4
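The properties listed above (linear combinations, uncorrelated, same total variance, decreasing variance) can be checked with a short NumPy sketch. Standardizing each characteristic before the decomposition is an assumption of this sketch, and the data are synthetic; the authors used STATISTICA, so this is not their implementation.

```python
import numpy as np

def pca(X):
    """PCA of an (n_pairs x p) matrix of workload characteristics.

    Columns are standardized first (an assumption of this sketch) so that
    characteristics with different units are comparable.
    Returns (scores, variances, loadings), sorted by decreasing variance.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)              # p x p correlation matrix
    variances, loadings = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(variances)[::-1]
    variances, loadings = variances[order], loadings[:, order]
    scores = Z @ loadings                       # each PC is a linear combination
    return scores, variances, loadings

# Synthetic data: 50 rows, 5 columns, two of them strongly correlated with others
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3))
X = np.column_stack([base,
                     base[:, 0] + 0.05 * rng.normal(size=50),
                     base[:, 1] - 0.05 * rng.normal(size=50)])
scores, variances, loadings = pca(X)
print(np.round(variances / variances.sum(), 3))  # fraction of variance per PC
```

Because two columns nearly duplicate others, the last principal components carry almost no variance, which is what makes dropping them safe.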
PCA: Interpretation
• Interpretation
  – Principal Components (PC) along main axes of ellipse
  – Var(PC1) > Var(PC2) > ...
  – PC2 is less important to explain variation over program-input pairs
• Reduce number of PCs
  – throw out PCs with negligible variance
[Figure: Variable 1 vs. Variable 2, with PC 1 and PC 2 drawn along the main axes of the ellipse]
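One possible way to decide how many PCs to retain is a cumulative-variance threshold, sketched below. The slides only say to throw out PCs with negligible variance; the 0.9 threshold, the function name and the example variances are illustrative assumptions.

```python
def components_to_keep(variances, min_fraction=0.9):
    """Smallest q such that the first q PCs explain at least `min_fraction`
    of the total variance.

    `variances` must be sorted in decreasing order. The 0.9 default is an
    arbitrary illustrative threshold, not a value from the slides.
    """
    total = sum(variances)
    running = 0.0
    for q, v in enumerate(variances, start=1):
        running += v
        if running / total >= min_fraction:
            return q
    return len(variances)

# Hypothetical variances of six principal components
example_variances = [3.1, 1.9, 0.6, 0.2, 0.15, 0.05]
q = components_to_keep(example_variances)
print(q)
```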
Cluster analysis
• Hierarchic clustering
• Based on distance between program-input pairs
• Can be represented by a dendrogram
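A minimal sketch of hierarchical (agglomerative) clustering follows. The slides do not name the distance metric or the linkage rule, so Euclidean distance and complete linkage are assumptions here, and the points are hypothetical. The returned list of merges with their linkage distances is the information a dendrogram displays.

```python
import math
from itertools import combinations

def hierarchical_clustering(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    `points` maps a label to its coordinates. Euclidean distance and complete
    linkage are assumptions of this sketch.
    """
    def linkage(a, b):
        # complete linkage: largest distance between any two members
        return max(math.dist(points[x], points[y]) for x in a for y in b)

    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        # merge the two closest clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical program-input pairs in a 2-D principal-component space
points = {
    "progA.in1": (0.0, 0.0),
    "progA.in2": (0.0, 1.0),
    "progB.in1": (5.0, 0.0),
    "progB.in2": (5.0, 2.0),
}
merges = hierarchical_clustering(points)
for a, b, d in merges:
    print(a, "+", b, f"at linkage distance {d:.2f}")
```

This naive version recomputes all pairwise linkages on every step, which is fine for a few dozen program-input pairs; a statistics package would normally be used in practice.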
Methodology
• Benchmarks
  – SPECint95
    • Inputs from SPEC: train and ref
    • Inputs from the web (ijpeg)
    • Reduced inputs (compress)
  – TPC-D on postgres v6.3
  – Compiled with -O4 on Alpha
  – 79 program-input pairs
• ATOM
  – Instrumentation
  – Measuring characteristics
• STATISTICA
  – Statistical analysis
September 23, 2002 PACT 2002 14
GCC: principal components
[Figure: bar chart of the weight of each workload characteristic in Principal Component 1 and Principal Component 2. x-axis: Workload Characteristic (ILP, BIMODAL, GSHARE, HYBRID, LD_ST, INT_ARIT, INT_LOGI, INT_SHIF, CTRL, BREAK, I_8KB, I_16KB, I_32KB, I_64KB, I_128KB, D_8KB, D_16KB, D_32KB, D_64KB, D_128KB); y-axis: Weight in PC, from -1 to 1]
• 2 PCs: 96.9% of total variance
September 23, 2002 PACT 2002 15
GCC
[Figure: scatter plot of the gcc inputs; x-axis: principal component 1 (-3 to 3); y-axis: principal component 2 (-2 to 5). Labeled points: emit-rtl, insn-emit, protoize, varasm, explow, recog, reload1, expr, cp-decl, insn-recog, print-tree, dbxout, toplev. Axis annotations: “High branch prediction accuracy, High I-cache miss rates”; “High D-cache miss rates”; “Many control & shift insn”; “Many LD/STs and ILP”. Callout: “7 inputs”]
September 23, 2002 PACT 2002 16
Workload space: 4 PCs -> 93.1%
[Figure: dendrogram of the program-input pairs; axis: linkage distance, 0 to 14. Leaf labels: compress, go, gcc.emit-rtl + gcc.insn-recog, gcc, gcc.explow, Q6+Q12+Q13+Q15, vortex, li, Q16, Q5, m88k.ref, Q10, Q8, perl.jumble, Q3+Q7+Q9+Q11+Q14+Q17, m88ksim.train, perl.scrabbl, li.takr+li.browse+li.boyer, Q2+Q4, ijpeg, compress.100,000]
• ijpeg, compress and go are isolated
  – Go: low branch prediction accuracy
  – Compress: high data cache miss rate
  – Ijpeg: high LD/STs rate, low ctrl ops rate
September 23, 2002 PACT 2002 17
Workload space
[Figure: dendrogram of the program-input pairs (axis: linkage distance, 0 to 14), annotated “strong clustering”]
September 23, 2002 PACT 2002 18
Small versus large inputs
• Vortex:
  – Train: 3.2B insn
  – Ref: 92.5B insn
  – Similar behavior: linkage distance ~ 1.4
• Not for m88ksim
  – Linkage distance ~ 4
• Reference input for compress can be reduced without significantly impacting behavior: 2B vs. 60B instructions
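To put these instruction counts in perspective, the short calculation below computes the ratio between the large and small inputs. Reading the ratio as a simulation-time saving assumes that simulation time scales roughly linearly with dynamic instruction count, which is an assumption of this sketch rather than a claim from the slides.

```python
# Dynamic instruction counts quoted on the slide, in billions of instructions
vortex_train, vortex_ref = 3.2, 92.5
compress_reduced, compress_ref = 2.0, 60.0

# Ratio of the large input to the small input
vortex_ratio = vortex_ref / vortex_train
compress_ratio = compress_ref / compress_reduced

print(f"vortex:   ref is {vortex_ratio:.1f}x larger than train")
print(f"compress: ref is {compress_ratio:.1f}x larger than the reduced input")
```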
September 23, 2002 PACT 2002 19
Impact of input on behavior
• For TPC-D queries:
  – Weak clustering
  – Large impact
  – I-cache behavior
• In general: variation between programs is larger than the variation between input sets for the same program
  – However: there are exceptions where input has large impact on behavior, e.g., TPC-D and perl
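One way to quantify the comparison above is to contrast the mean distance between inputs of the same program with the mean distance between different programs in the reduced space. The sketch below uses hypothetical coordinates and Euclidean distance (an assumption); it is not the analysis from the study.

```python
import math
from itertools import combinations
from statistics import mean

def within_and_between(points):
    """Mean pairwise distance within the same program vs. between programs.

    `points` maps (program, input) to coordinates in the reduced space.
    Both groups must be non-empty, otherwise mean() raises StatisticsError.
    """
    within, between = [], []
    for (key_a, pa), (key_b, pb) in combinations(points.items(), 2):
        d = math.dist(pa, pb)
        (within if key_a[0] == key_b[0] else between).append(d)
    return mean(within), mean(between)

# Hypothetical coordinates for two programs with two inputs each
points = {
    ("progA", "in1"): (0.0, 0.0),
    ("progA", "in2"): (0.0, 1.0),
    ("progB", "in1"): (4.0, 0.0),
    ("progB", "in2"): (4.0, 1.0),
}
within, between = within_and_between(points)
print(f"within-program: {within:.2f}, between-program: {between:.2f}")
```

A within-program mean close to or above the between-program mean would flag a program whose inputs have a large impact on behavior.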
September 23, 2002 PACT 2002 20
Overview• Introduction• Workload characterization• Data analysis
– Principal components analysis (PCA)– Cluster analysis
• Evaluation• Discussion• Conclusion
September 23, 2002 PACT 2002 21
Conclusion
• Workload design
  – representative
  – not long running
• Principal Components Analysis (PCA) and cluster analysis help in detecting input data sets resulting in similar or different behavior of a program
• Applications:
  – workload design: representativeness while taking into account simulation time
  – impact of input data sets on program behavior
  – profile-guided optimizations