Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin,...

Preview:

Citation preview

Optimum System Balance for Systems of Finite Price

John D. McCalpin, Ph.D.IBM Corporation

Austin, TX

SuperComputing 2004Pittsburgh, PA

November 10, 2004

Overview

• The HPC Challenge Benchmark was announced last year at SuperComputing’2003

• The HPC Challenge Benchmark consists of– LINPACK (HPL)– STREAM– PTRANS (transposing the array used by HPL)– RandomAccess (random read/modify/write)– FFT– and some low-level MPI latency & BW measurements

• No single figure of merit is defined

Overview (continued)

• Q: What is a “balanced” system?

• My answer:“A balanced system is one for which the primary applications are limited in performance by the most expensive component(s) of the system.”

The Two Questions

• We need to understand both performance and cost in the context of low-level component metrics

• Performance– What performance model?– Use a harmonically weighted, time-based model

• Cost– What cost model?– Simple linear additive cost model

Performance Model

• Composite Figures of Merit for Performance must be based on “time” rather than “rate”– i.e., weighted harmonic means of rates

• Why?– Combining “rates” in any other way fails to have a

“Law of Diminishing Returns”

• Time = Work/Rate• Repeat for each component: Ti = Wi/Ri

A Simple Composite Model

• Assume the time to solution is composed of a compute time proportional to peak GFLOPS plus a memory transfer time proportional to sustained memory bandwidth

• Assume “x Bytes/FLOP” to get:

• Target SPECfp_rate2000 as the workload

GB/s SustainedBytes

GFLOPSPeak op FP 1

op" FP Effective" 1 GFLOPS" Balanced"

x

Does Peak GFLOPS predict SPECfp_rate2000?SPECfp_rate2000 vs Peak MFLOPS

0

1000

2000

3000

4000

5000

6000

7000

8000

0.00 5.00 10.00 15.00 20.00 25.00 30.00

SPECfp_rate2000/cpu

Pe

ak

MF

LO

PS

Does Sustained Memory Bandwidth predict SPECfp_rate2000?

SPECfp_rate2000 vs Sustained BW

0.000

1.000

2.000

3.000

4.000

5.000

6.000

7.000

0.00 5.00 10.00 15.00 20.00 25.00 30.00

SPECfp_rate2000/cpu

GB

/s p

er

CP

U

Optimized Model Results

• Results rounded to nearby round values:– Bytes/FLOP for large caches === 0.16– Bytes/FLOP for small caches === 0.80– Size of asymptotically large cache === ~12 MB– Coefficient of best fit === ~6.4– The units of the coefficient are

SPECfp_rate2000 / Effective GFLOPS

Does this Revised Metric predict SPECfp_rate2000?Optimized SPECfp_rate2000 Estimates

0.00

5.00

10.00

15.00

20.00

25.00

30.00

0.00 5.00 10.00 15.00 20.00 25.00 30.00

SPECfp_rate2000/cpu

Es

tim

ate

d R

ate

/cp

u

Cost Model

• Assume simple linear additive model– FLOPS cost some amount

– Sustained BW costs a different amount

– Define:

= Rmem / Rcpu

= Wmem / Wcpu

= ($/BW) / ($/FLOPS)

System characteristics

Application characteristics

Technology characteristics

How to Optimize?

• For a given application (Wmem/Wcpu), what is the optimum system balance ?

• Many people seem to believe that the system should be “balanced” to the application:– optimal =

i.e.,

– Rmem/Rcpu = Wmem/Wcpu

• This does not optimize price/performance

The Correct Optimization

• This is actually an easy optimization problem• Minimize cost/performance

– Same as minimizing cost * time

• Optimum cost/performance occurs at – = sqrt(/)

• Definitely not intuitive!

Example: High BW, expensive BWgamma = 3, delta = 3

0.000

0.500

1.000

1.500

2.000

2.500

3.000

0.10 1.00 10.00

beta

rela

tive

pri

ce/p

erfo

rman

ce

= 3 relatively high BW = 3 relatively expensive BW

Optimum price/performance is at =1, not =3Improvement in price/performance is ~30%

High BW, very expensive memorygamma = 3, delta = 10

0.000

0.500

1.000

1.500

2.000

2.500

3.000

3.500

0.10 1.00 10.00

beta

rela

tive

pri

ce/p

erfo

rman

ce

= 3 relatively high BW = 10 very expensive BW

Optimum price/performance is at =0.58, not =3Improvement in price/performance is ~50%

Low-BW, expensive BWgamma = 0.1, delta = 3

0.000

2.000

4.000

6.000

8.000

10.000

12.000

14.000

0.10 1.00 10.00

beta

rela

tive

pri

ce/p

erfo

rman

ce

= 0.1 low BW application = 3 moderately expensive BW

Optimum price/performance is at =0.18, not =0.1Improvement in price/performance is ~5%

More BW helps here even though it is expensive, because the application does not need much.

Medium BW, expensive BWgamma = 1, delta = 3

0.000

0.500

1.000

1.500

2.000

2.500

3.000

3.500

4.000

4.500

0.10 1.00 10.00

beta

rela

tive

pri

ce/p

erfo

rman

ce

= 1 modest BW = 3 moderately expensive BW

Optimum price/performance is at =0.58, not =1Improvement in price/performance is ~10%

Summary

• Balance is important to cost/performance

• You must understand performance

• You must understand cost

• Optimum cost-performance is not intuitive!

Recommended