18
Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Embed Size (px)

Citation preview

Page 1: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Optimum System Balance for Systems of Finite Price

John D. McCalpin, Ph.D.IBM Corporation

Austin, TX

SuperComputing 2004Pittsburgh, PA

November 10, 2004

Page 2: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Overview

• The HPC Challenge Benchmark was announced last year at SuperComputing’2003

• The HPC Challenge Benchmark consists of– LINPACK (HPL)– STREAM– PTRANS (transposing the array used by HPL)– RandomAccess (random read/modify/write)– FFT– and some low-level MPI latency & BW measurements

• No single figure of merit is defined

Page 3: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Overview (continued)

• Q: What is a “balanced” system?

• My answer:“A balanced system is one for which the primary applications are limited in performance by the most expensive component(s) of the system.”

Page 4: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

The Two Questions

• We need to understand both performance and cost in the context of low-level component metrics

• Performance– What performance model?– Use a harmonically weighted, time-based model

• Cost– What cost model?– Simple linear additive cost model

Page 5: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Performance Model

• Composite Figures of Merit for Performance must be based on “time” rather than “rate”– i.e., weighted harmonic means of rates

• Why?– Combining “rates” in any other way fails to have a

“Law of Diminishing Returns”

• Time = Work/Rate• Repeat for each component: Ti = Wi/Ri

Page 6: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

A Simple Composite Model

• Assume the time to solution is composed of a compute time proportional to peak GFLOPS plus a memory transfer time proportional to sustained memory bandwidth

• Assume “x Bytes/FLOP” to get:

• Target SPECfp_rate2000 as the workload

GB/s SustainedBytes

GFLOPSPeak op FP 1

op" FP Effective" 1 GFLOPS" Balanced"

x

Page 7: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Does Peak GFLOPS predict SPECfp_rate2000?SPECfp_rate2000 vs Peak MFLOPS

0

1000

2000

3000

4000

5000

6000

7000

8000

0.00 5.00 10.00 15.00 20.00 25.00 30.00

SPECfp_rate2000/cpu

Pe

ak

MF

LO

PS

Page 8: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Does Sustained Memory Bandwidth predict SPECfp_rate2000?

SPECfp_rate2000 vs Sustained BW

0.000

1.000

2.000

3.000

4.000

5.000

6.000

7.000

0.00 5.00 10.00 15.00 20.00 25.00 30.00

SPECfp_rate2000/cpu

GB

/s p

er

CP

U

Page 9: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Optimized Model Results

• Results rounded to nearby round values:– Bytes/FLOP for large caches === 0.16– Bytes/FLOP for small caches === 0.80– Size of asymptotically large cache === ~12 MB– Coefficient of best fit === ~6.4– The units of the coefficient are

SPECfp_rate2000 / Effective GFLOPS

Page 10: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Does this Revised Metric predict SPECfp_rate2000?Optimized SPECfp_rate2000 Estimates

0.00

5.00

10.00

15.00

20.00

25.00

30.00

0.00 5.00 10.00 15.00 20.00 25.00 30.00

SPECfp_rate2000/cpu

Es

tim

ate

d R

ate

/cp

u

Page 11: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Cost Model

• Assume simple linear additive model– FLOPS cost some amount

– Sustained BW costs a different amount

– Define:

= Rmem / Rcpu

= Wmem / Wcpu

= ($/BW) / ($/FLOPS)

System characteristics

Application characteristics

Technology characteristics

Page 12: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

How to Optimize?

• For a given application (Wmem/Wcpu), what is the optimum system balance ?

• Many people seem to believe that the system should be “balanced” to the application:– optimal =

i.e.,

– Rmem/Rcpu = Wmem/Wcpu

• This does not optimize price/performance

Page 13: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

The Correct Optimization

• This is actually an easy optimization problem• Minimize cost/performance

– Same as minimizing cost * time

• Optimum cost/performance occurs at – = sqrt(/)

• Definitely not intuitive!

Page 14: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Example: High BW, expensive BWgamma = 3, delta = 3

0.000

0.500

1.000

1.500

2.000

2.500

3.000

0.10 1.00 10.00

beta

rela

tive

pri

ce/p

erfo

rman

ce

= 3 relatively high BW = 3 relatively expensive BW

Optimum price/performance is at =1, not =3Improvement in price/performance is ~30%

Page 15: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

High BW, very expensive memorygamma = 3, delta = 10

0.000

0.500

1.000

1.500

2.000

2.500

3.000

3.500

0.10 1.00 10.00

beta

rela

tive

pri

ce/p

erfo

rman

ce

= 3 relatively high BW = 10 very expensive BW

Optimum price/performance is at =0.58, not =3Improvement in price/performance is ~50%

Page 16: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Low-BW, expensive BWgamma = 0.1, delta = 3

0.000

2.000

4.000

6.000

8.000

10.000

12.000

14.000

0.10 1.00 10.00

beta

rela

tive

pri

ce/p

erfo

rman

ce

= 0.1 low BW application = 3 moderately expensive BW

Optimum price/performance is at =0.18, not =0.1Improvement in price/performance is ~5%

More BW helps here even though it is expensive, because the application does not need much.

Page 17: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Medium BW, expensive BWgamma = 1, delta = 3

0.000

0.500

1.000

1.500

2.000

2.500

3.000

3.500

4.000

4.500

0.10 1.00 10.00

beta

rela

tive

pri

ce/p

erfo

rman

ce

= 1 modest BW = 3 moderately expensive BW

Optimum price/performance is at =0.58, not =1Improvement in price/performance is ~10%

Page 18: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004

Summary

• Balance is important to cost/performance

• You must understand performance

• You must understand cost

• Optimum cost-performance is not intuitive!