Optimum System Balance for Systems of Finite Price
John D. McCalpin, Ph.D.
IBM Corporation
Austin, TX
SuperComputing 2004
Pittsburgh, PA
November 10, 2004
Overview
• The HPC Challenge Benchmark was announced last year at SuperComputing 2003
• The HPC Challenge Benchmark consists of
– LINPACK (HPL)
– STREAM
– PTRANS (transposing the array used by HPL)
– RandomAccess (random read/modify/write)
– FFT
– and some low-level MPI latency & BW measurements
• No single figure of merit is defined
Overview (continued)
• Q: What is a “balanced” system?
• My answer: “A balanced system is one for which the primary applications are limited in performance by the most expensive component(s) of the system.”
The Two Questions
• We need to understand both performance and cost in the context of low-level component metrics
• Performance
– What performance model?
– Use a harmonically weighted, time-based model
• Cost
– What cost model?
– Simple linear additive cost model
Performance Model
• Composite Figures of Merit for Performance must be based on “time” rather than “rate”
– i.e., weighted harmonic means of rates
• Why?
– Combining “rates” in any other way fails to have a “Law of Diminishing Returns”
• Time = Work/Rate
• Repeat for each component: Ti = Wi/Ri
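A minimal sketch of the time-based composite described above: component times Ti = Wi/Ri are summed, and the composite rate is total work over total time (a work-weighted harmonic mean of the rates). The function name and the work and rate values are illustrative assumptions, not figures from the talk.

```python
def composite_rate(work, rates):
    """Total work divided by total time: a work-weighted harmonic mean of rates."""
    total_time = sum(w / r for w, r in zip(work, rates))  # T_i = W_i / R_i
    return sum(work) / total_time


# Hypothetical workload with equal work in two components.
work = [1.0, 1.0]

base = composite_rate(work, [1.0, 1.0])       # both components at rate 1
fast10 = composite_rate(work, [10.0, 1.0])    # first component 10x faster
fast100 = composite_rate(work, [100.0, 1.0])  # first component 100x faster

# Speeding up one component gives diminishing returns: the composite
# rate can never exceed 2.0 here, however fast the first component gets.
print(base, fast10, fast100)
```

An arithmetic mean of the same rates would grow without bound as one component gets faster, which is the behavior the slide warns against.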
A Simple Composite Model
• Assume the time to solution is composed of a compute time inversely proportional to peak GFLOPS plus a memory transfer time inversely proportional to sustained memory bandwidth
• Assume “x Bytes/FLOP” to get:

$$
\text{``Balanced'' GFLOPS} \equiv \frac{1\ \text{``Effective'' FP op}}{\dfrac{1\ \text{FP op}}{\text{Peak GFLOPS}} + \dfrac{x\ \text{Bytes}}{\text{Sustained GB/s}}}
$$

• Target SPECfp_rate2000 as the workload
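The composite model above can be sketched in a few lines: the time per effective FP op is the compute time (1 / peak GFLOPS) plus the memory time (x Bytes / sustained GB/s), and the balanced rate is the reciprocal of that sum. The function and parameter names and the sample machine are assumptions made for illustration.

```python
def balanced_gflops(peak_gflops, sustained_gbs, bytes_per_flop):
    """Effective GFLOPS when each FP op also moves `bytes_per_flop` bytes."""
    time_cpu = 1.0 / peak_gflops             # ns per FP op spent computing
    time_mem = bytes_per_flop / sustained_gbs  # ns per FP op spent on memory traffic
    return 1.0 / (time_cpu + time_mem)


# Hypothetical machine: 4 GFLOPS peak, 2 GB/s sustained, 0.5 Bytes/FLOP.
# Compute and memory each take 0.25 ns per op, so the balanced rate is
# half of peak.
example = balanced_gflops(peak_gflops=4.0, sustained_gbs=2.0, bytes_per_flop=0.5)
print(example)
```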
Does Peak GFLOPS predict SPECfp_rate2000?
[Figure: SPECfp_rate2000 vs Peak MFLOPS. x-axis: SPECfp_rate2000/cpu (0 to 30); y-axis: Peak MFLOPS (0 to 8000)]
Does Sustained Memory Bandwidth predict SPECfp_rate2000?
[Figure: SPECfp_rate2000 vs Sustained BW. x-axis: SPECfp_rate2000/cpu (0 to 30); y-axis: GB/s per CPU (0 to 7)]
Optimized Model Results
• Results rounded to nearby round values:
– Bytes/FLOP for large caches: 0.16
– Bytes/FLOP for small caches: 0.80
– Size of asymptotically large cache: ~12 MB
– Coefficient of best fit: ~6.4
– The units of the coefficient are SPECfp_rate2000 / Effective GFLOPS
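As a rough illustration of how these fitted values could be applied, the sketch below multiplies the effective GFLOPS (time per op = 1 / peak GFLOPS + Bytes-per-FLOP / sustained GB/s) by the fit coefficient. The slides do not state here how Bytes/FLOP varies between the small-cache and large-cache limits, so only the two endpoint values are used. The function name and the sample machine are hypothetical.

```python
# Rounded fit values quoted on the slide.
BYTES_PER_FLOP_LARGE_CACHE = 0.16
BYTES_PER_FLOP_SMALL_CACHE = 0.80
FIT_COEFFICIENT = 6.4  # SPECfp_rate2000 per effective GFLOPS


def estimate_specfp_rate_per_cpu(peak_gflops, sustained_gbs, bytes_per_flop):
    """Estimate = coefficient * effective GFLOPS from the composite time model."""
    effective_gflops = 1.0 / (1.0 / peak_gflops + bytes_per_flop / sustained_gbs)
    return FIT_COEFFICIENT * effective_gflops


# Hypothetical CPU (not from the talk): 4 GFLOPS peak, 2 GB/s sustained.
peak, bw = 4.0, 2.0
est_large = estimate_specfp_rate_per_cpu(peak, bw, BYTES_PER_FLOP_LARGE_CACHE)
est_small = estimate_specfp_rate_per_cpu(peak, bw, BYTES_PER_FLOP_SMALL_CACHE)
print(f"large cache: {est_large:.2f}, small cache: {est_small:.2f}")
```

With the same peak rate and bandwidth, the large-cache estimate is higher because less memory traffic is charged per FP op.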
Does this Revised Metric predict SPECfp_rate2000?
[Figure: Optimized SPECfp_rate2000 Estimates. x-axis: SPECfp_rate2000/cpu (0 to 30); y-axis: Estimated Rate/cpu (0 to 30)]
Cost Model
• Assume simple linear additive model
– FLOPS cost some amount
– Sustained BW costs a different amount
– Define:
β = Rmem / Rcpu (system characteristics)
γ = Wmem / Wcpu (application characteristics)
δ = ($/BW) / ($/FLOPS) (technology characteristics)
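A small sketch of the linear additive cost model, assuming the system is described by a compute rate, a sustained memory bandwidth, and a unit price for each. The class, field names, and prices are hypothetical. The ratios `beta` (Rmem / Rcpu) and `delta` (($/BW) / ($/FLOPS)) follow the definitions above.

```python
from dataclasses import dataclass


@dataclass
class System:
    r_cpu: float            # compute rate, e.g. GFLOPS
    r_mem: float            # sustained memory bandwidth, e.g. GB/s
    price_per_flops: float  # hypothetical $ per unit of compute rate
    price_per_bw: float     # hypothetical $ per unit of bandwidth

    def cost(self) -> float:
        """Linear additive cost: compute and bandwidth are priced independently."""
        return self.price_per_flops * self.r_cpu + self.price_per_bw * self.r_mem

    def beta(self) -> float:
        """System balance: Rmem / Rcpu."""
        return self.r_mem / self.r_cpu

    def delta(self) -> float:
        """Relative price of bandwidth: ($/BW) / ($/FLOPS)."""
        return self.price_per_bw / self.price_per_flops


# Illustrative numbers only.
s = System(r_cpu=4.0, r_mem=8.0, price_per_flops=100.0, price_per_bw=300.0)

# The cost factors as (price of compute) * (1 + delta * beta).
factored = s.price_per_flops * s.r_cpu * (1.0 + s.delta() * s.beta())
print(s.cost(), factored)
```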
How to Optimize?
• For a given application (γ = Wmem/Wcpu), what is the optimum system balance β?
• Many people seem to believe that the system should be “balanced” to the application:
– β_optimal = γ, i.e., Rmem/Rcpu = Wmem/Wcpu
• This does not optimize price/performance
The Correct Optimization
• This is actually an easy optimization problem
• Minimize cost/performance
– Same as minimizing cost * time
• Optimum cost/performance occurs at
– β_optimal = sqrt(γ/δ)
• Definitely not intuitive!
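The square-root result can be derived in a few lines, assuming the linear additive cost model and a time to solution that is the sum of compute time and memory time. Here β = Rmem/Rcpu, γ = Wmem/Wcpu, and δ = ($/BW)/($/FLOPS). The per-unit prices c_cpu and c_mem are notation introduced only for this sketch.

```latex
\begin{align*}
\text{cost} &= c_{cpu} R_{cpu} + c_{mem} R_{mem}
             = c_{cpu} R_{cpu}\,(1 + \delta\beta) \\
\text{time} &= \frac{W_{cpu}}{R_{cpu}} + \frac{W_{mem}}{R_{mem}}
             = \frac{W_{cpu}}{R_{cpu}}\left(1 + \frac{\gamma}{\beta}\right) \\
\text{cost} \times \text{time} &= c_{cpu} W_{cpu}\,(1 + \delta\beta)\left(1 + \frac{\gamma}{\beta}\right)
             = c_{cpu} W_{cpu}\left(1 + \gamma\delta + \delta\beta + \frac{\gamma}{\beta}\right) \\
\frac{d}{d\beta}\left(\delta\beta + \frac{\gamma}{\beta}\right) &= \delta - \frac{\gamma}{\beta^{2}} = 0
  \quad\Longrightarrow\quad \beta_{optimal} = \sqrt{\gamma/\delta}
\end{align*}
```

The second derivative, 2γ/β³, is positive for β > 0, so this stationary point is a minimum. Rcpu cancels out of the product, which is why only the ratio β matters.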
Example: High BW, expensive BW
[Figure: relative price/performance (y-axis) vs β (x-axis, 0.1 to 10) for γ = 3, δ = 3]
γ = 3: relatively high BW
δ = 3: relatively expensive BW
Optimum price/performance is at β = 1, not β = 3
Improvement in price/performance is ~30%
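The location of the optimum for gamma = 3, delta = 3 can be checked numerically. The sketch below assumes cost * time is proportional to (1 + delta*beta) * (1 + gamma/beta), which follows from a linear additive cost and an additive compute-plus-memory time with constant factors dropped. The normalization used for the plotted curve is not recoverable here, so only the position of the minimum is compared. Function names are illustrative.

```python
import math


def relative_cost_time(beta, gamma, delta):
    """cost * time up to a constant factor (assumed model, constants dropped)."""
    return (1.0 + delta * beta) * (1.0 + gamma / beta)


def optimal_beta(gamma, delta):
    """Closed-form optimum balance: sqrt(gamma / delta)."""
    return math.sqrt(gamma / delta)


gamma, delta = 3.0, 3.0

# Log-spaced grid of beta values from 0.1 to 10, matching the plot's x-axis range.
betas = [10 ** (-1 + 2 * i / 2000) for i in range(2001)]
best = min(betas, key=lambda b: relative_cost_time(b, gamma, delta))

print(optimal_beta(gamma, delta), best)
```

The grid search and the closed form agree on the optimum, and the value at beta = gamma is strictly worse than the value at the optimum.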
High BW, very expensive memory
[Figure: relative price/performance (y-axis) vs β (x-axis, 0.1 to 10) for γ = 3, δ = 10]
γ = 3: relatively high BW
δ = 10: very expensive BW
Optimum price/performance is at β = sqrt(3/10) ≈ 0.55, not β = 3
Improvement in price/performance is ~50%
Low-BW, expensive BW
[Figure: relative price/performance (y-axis) vs β (x-axis, 0.1 to 10) for γ = 0.1, δ = 3]
γ = 0.1: low BW application
δ = 3: moderately expensive BW
Optimum price/performance is at β = 0.18, not β = 0.1
Improvement in price/performance is ~5%
More BW helps here even though it is expensive, because the application does not need much.
Medium BW, expensive BW
[Figure: relative price/performance (y-axis) vs β (x-axis, 0.1 to 10) for γ = 1, δ = 3]
γ = 1: modest BW
δ = 3: moderately expensive BW
Optimum price/performance is at β = 0.58, not β = 1
Improvement in price/performance is ~10%
Summary
• Balance is important to cost/performance
• You must understand performance
• You must understand cost
• Optimum cost-performance is not intuitive!