Extending Amdahl’s Law in the Multicore Era

Erlin Yao, Yungang Bao, Guangming Tan and Mingyu Chen

Institute of Computing Technology, Chinese Academy of Sciences

[email protected], {baoyg, tgm, cmy}@ncic.ac.cn

A Brief Intro of ICT, CAS

ICT has built the fastest HPC system in China, Dawning 5000, which delivers 233.5 TFlops and ranks 10th in the TOP500.

ICT has developed the Loongson CPU.

Outline

• I. Background and Related Works

• II. Model of Multicore Scalability

• III. Symmetrical Multicore Chips

• IV. Asymmetrical Multicore Chips

• V. Dynamic Multicore Chips

• VI. Conclusion and Future Work

We are in the Multi-Core Era

• The mainstream market is already dominated by multicore processors

• Intel: 2-core Core Duo, 4-core i7

• AMD: 2-core Athlon, 4-core Opteron

• IBM: 2-core POWER6, 9-core Cell

• Sun: 8-core T1/T2

• ……

Many-Core is coming

• Several processor vendors have announced or released many-core processors

• Tilera: 64-core

• Intel: 80-core

• GPGPU: hundreds of cores

• ……

Revisiting Amdahl’s Law in the Multi/Many-Core Era

• Assume that a fraction f of a program’s execution time is infinitely parallelizable with no scheduling overhead, while the remaining fraction, 1 − f, is totally sequential. Using p processors to accelerate the parallel fraction, the speedup is

$$\mathrm{Speedup}(f, p) = \frac{1}{(1 - f) + \frac{f}{p}}$$

• Fixed-size speedup: the amount of work to be executed is independent of the number of processors.
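The fixed-size speedup formula can be evaluated directly. The sketch below is minimal; the function name `amdahl_speedup` is our own hypothetical helper and does not come from the paper.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Fixed-size (Amdahl) speedup.

    f: fraction of the original execution time that is perfectly parallelizable.
    p: number of processors used for the parallel fraction.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    # The sequential part keeps its time (1 - f); the parallel part shrinks to f / p.
    return 1.0 / ((1.0 - f) + f / p)


if __name__ == "__main__":
    for p in (1, 4, 16, 256):
        print(f"f = 0.9, p = {p:>3}: speedup = {amdahl_speedup(0.9, p):.2f}")
```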

Implications of Amdahl’s Law

• Despite its simplicity, Amdahl’s law applies broadly and gives important insights such as:

• (i) Attack the common case: when f is small, optimization will have little effect.

• (ii) The aspects you ignore also limit speedup: even if p approaches infinity, speedup is bounded by 1/(1 − f).
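Both implications can be checked numerically. The sketch below is minimal and uses our own hypothetical helpers. It evaluates the speedup on a very large processor count, 10^6 chosen only for illustration. For small f, the speedup barely exceeds 1. For every f, it stays below 1/(1 − f).

```python
def amdahl_speedup(f: float, p: int) -> float:
    # Fixed-size speedup with parallel fraction f on p processors.
    return 1.0 / ((1.0 - f) + f / p)


def amdahl_bound(f: float) -> float:
    # Limit of amdahl_speedup(f, p) as p -> infinity.
    return float("inf") if f >= 1.0 else 1.0 / (1.0 - f)


if __name__ == "__main__":
    for f in (0.1, 0.5, 0.9, 0.99):
        s = amdahl_speedup(f, 10**6)
        print(f"f = {f}: speedup on 10^6 processors = {s:.3f}, bound = {amdahl_bound(f):.3f}")
```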

Mark Hill et al.’s Insights

• Hill and Marty apply Amdahl’s law to multicore hardware by constructing a cost model for the number and performance of cores in one chip.

Obtaining optimal multicore performance requires further research both in extracting more parallelism and in making sequential cores faster.

• Woo and Lee have extended Hill’s work by taking power and energy into account.

Motivation of Our Work

• The revised Amdahl’s Law model provides a better understanding of multicore scalability.

• However, there has been little work on its theoretical analysis.

• This paper presents our theoretical analysis of multicore scalability and attempts to find the optimal results under different conditions.

Model of Multicore Scalability

• We adopt the cost model for multicore hardware proposed by Hill and Marty, which makes two assumptions:

• First, a multicore chip of a given size and technology generation can contain at most n base core equivalents (BCEs).

• Second, an individual core with more resources (r BCEs) can achieve better sequential performance:

– 1 < perf(r) < r for r > 1

• The architecture of multicore chips can be classified into three types:
– Symmetric
– Asymmetric
– Dynamic

Model-Symmetrical

• A symmetric multicore chip requires that all its cores have the same cost.

• Example: given 16 BCEs,
– r = 8: 2 cores × 8 BCEs/core
– r = 4: 4 cores × 4 BCEs/core

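• Given the resource budget of n BCEs, we have n/r cores, each with r BCEs. The performance of each core is perf(r). Sequential code runs on one core at perf(r), and parallel code runs on all n/r cores at an aggregate rate of perf(r)·n/r. Then we get

$$\mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r) = \frac{1}{\frac{1 - f}{\mathrm{perf}(r)} + \frac{f \cdot r}{\mathrm{perf}(r) \cdot n}}$$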
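The symmetric speedup formula translates directly into code. The sketch below is minimal and assumes perf(r) = √r as the default, which is the example function Hill and Marty use. The function name `speedup_symmetric` is hypothetical. The main block sweeps r over the 16-BCE example on this slide.

```python
import math
from typing import Callable

Perf = Callable[[float], float]


def speedup_symmetric(f: float, n: int, r: float, perf: Perf = math.sqrt) -> float:
    """Symmetric speedup: n / r cores, each built from r BCEs (perf is assumed)."""
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    p = perf(r)
    sequential = (1.0 - f) / p   # one core of performance perf(r)
    parallel = f * r / (p * n)   # n / r cores, each of performance perf(r)
    return 1.0 / (sequential + parallel)


if __name__ == "__main__":
    # 16 BCEs split into cores of r = 1, 2, 4, 8, 16 BCEs.
    for r in (1, 2, 4, 8, 16):
        print(r, round(speedup_symmetric(0.9, 16, r), 3))
```

With r = 1 this reduces to Amdahl’s law on n processors. With r = n it reduces to perf(n).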

Model-Asymmetrical

• In an asymmetric multicore chip, several cores are more powerful than the others.

• Example: given 16 BCEs,
– 1 four-BCE core and 12 base cores
– 1 six-BCE core and 10 base cores

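• Given the resource budget of n BCEs, we have 1 + n − r cores: one larger core (with r BCEs) and n − r base cores (with 1 BCE each). Sequential code runs on the large core at perf(r), and parallel code runs on all cores at an aggregate rate of perf(r) + n − r. Then we get

$$\mathrm{Speedup}_{\mathrm{asymmetric}}(f, n, r) = \frac{1}{\frac{1 - f}{\mathrm{perf}(r)} + \frac{f}{\mathrm{perf}(r) + n - r}}$$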
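The asymmetric formula can be sketched the same way. This is a minimal sketch that again assumes perf(r) = √r, and the function name is hypothetical. In the parallel phase, the large core works alongside the n − r base cores.

```python
import math
from typing import Callable

Perf = Callable[[float], float]


def speedup_asymmetric(f: float, n: int, r: float, perf: Perf = math.sqrt) -> float:
    """One large core of r BCEs plus n - r base cores of 1 BCE each."""
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    p = perf(r)
    sequential = (1.0 - f) / p   # sequential code on the large core
    parallel = f / (p + n - r)   # large core and all base cores work together
    return 1.0 / (sequential + parallel)


if __name__ == "__main__":
    # The 16-BCE examples: a 4-BCE or 6-BCE large core plus base cores.
    for r in (1, 4, 6, 16):
        print(r, round(speedup_asymmetric(0.9, 16, r), 3))
```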

Model-Dynamic

• A dynamic multicore chip can dynamically combine up to r cores into one core in order to boost sequential performance.
– In sequential mode, it executes with performance perf(r) when the dynamic techniques use r BCEs.
– In parallel mode, it obtains performance n by using all base cores in parallel.

• Then we get

$$\mathrm{Speedup}_{\mathrm{dynamic}}(f, n, r) = \frac{1}{\frac{1 - f}{\mathrm{perf}(r)} + \frac{f}{n}}$$
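The dynamic model differs from the other two only in the parallel term, where all n BCEs act as base cores. The sketch below is minimal and assumes perf(r) = √r; the function name is hypothetical.

```python
import math
from typing import Callable

Perf = Callable[[float], float]


def speedup_dynamic(f: float, n: int, r: float, perf: Perf = math.sqrt) -> float:
    """Dynamic chip: r BCEs fused for sequential code, all n base cores for parallel code."""
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    sequential = (1.0 - f) / perf(r)   # fused core of performance perf(r)
    parallel = f / n                   # n base cores of performance 1 each
    return 1.0 / (sequential + parallel)


if __name__ == "__main__":
    for r in (1, 4, 16):
        print(r, round(speedup_dynamic(0.9, 16, r), 3))
```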

Symmetrical Multicore Chips

– For fixed n and r, the speedup is an increasing function of f.
– For fixed f and r, the speedup is an increasing function of n.

Increasing both the parallel fraction (f) and the number of BCEs (n) can improve the speedup of a symmetric multicore chip.

• For fixed f and n, assuming $\mathrm{perf}(r) = r^c$ with $0 < c < 1$, we have the following theorem:

Symmetrical Multicore Chips

• For any fixed f and c:
– if f < c, the maximum speedup is achieved at r = n;
– if f > c and n is not big, the maximum speedup is achieved at r = 1;
– if f > c and n is big enough, the maximum speedup is achieved between the extremes, so to obtain optimal multicore performance the BCE resources should be devoted to cores that offer reasonable individual performance.
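The three cases can be reproduced numerically under the assumption $\mathrm{perf}(r) = r^c$. Set the derivative of the speedup’s denominator, $(1-f)r^{-c} + (f/n)r^{1-c}$, to zero. This gives the stationary point $r^* = \frac{c(1-f)n}{(1-c)f}$. This derivation is our own, not quoted from the paper. Note that $r^* \ge n$ exactly when $f \le c$, and $r^* \le 1$ when $f > c$ and $n \le \frac{(1-c)f}{c(1-f)}$. The minimal sketch below clamps $r^*$ into [1, n] and checks the result against a brute-force grid. It requires 0 < f < 1, and the helper names are hypothetical.

```python
def speedup_symmetric(f: float, n: int, r: float, c: float) -> float:
    """Symmetric speedup under the assumed perf(r) = r**c, 0 < c < 1."""
    p = r ** c
    return 1.0 / ((1.0 - f) / p + f * r / (p * n))


def optimal_r_symmetric(f: float, n: int, c: float) -> float:
    """Maximizer over real r in [1, n], for 0 < f < 1.

    The denominator (1-f) r^-c + (f/n) r^(1-c) decreases before
    r* = c(1-f)n / ((1-c)f) and increases after it, so clamping r*
    into [1, n] gives the optimum.
    """
    r_star = c * (1.0 - f) * n / ((1.0 - c) * f)
    return min(max(r_star, 1.0), float(n))


def grid_optimal_r(f: float, n: int, c: float, steps: int = 10_000) -> float:
    """Brute-force check: best r on a uniform grid over [1, n]."""
    grid = [1.0 + (n - 1.0) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: speedup_symmetric(f, n, r, c))


if __name__ == "__main__":
    # (f < c), (f > c, small n), (f > c, large n)
    for f, n, c in [(0.5, 64, 0.7), (0.99, 16, 0.5), (0.9, 1024, 0.5)]:
        print(f, n, c, optimal_r_symmetric(f, n, c), grid_optimal_r(f, n, c))
```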

Symmetrical Multicore Chips

• If n is big enough, will the maximum speedup always be achieved between the extremes for any perf(x) < x?

• Counterexamples:
– (i) $\mathrm{perf}(x) = kx$, for any $0 < k < 1$;
– (ii) $\mathrm{perf}(x) = x^c$, for any $f < c < 1$.
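Both counterexamples can be checked by brute force over integer r. The sketch below is minimal, and the parameter values f = 0.9, n = 4096, k = 0.5 and c = 0.95 are chosen only for illustration. For perf(x) = kx, the parallel term f/(kn) does not depend on r, so larger r always helps. For $x^c$ with c > f, the optimum also sits at r = n. The last case, c < f, is included for contrast and gives an interior optimum.

```python
from typing import Callable

Perf = Callable[[float], float]


def speedup_symmetric(f: float, n: int, r: int, perf: Perf) -> float:
    p = perf(r)
    return 1.0 / ((1.0 - f) / p + f * r / (p * n))


def argmax_r(f: float, n: int, perf: Perf) -> int:
    """Integer r in [1, n] with the largest symmetric speedup (brute force)."""
    return max(range(1, n + 1), key=lambda r: speedup_symmetric(f, n, r, perf))


if __name__ == "__main__":
    f, n = 0.9, 4096
    print("perf(x) = 0.5x   ->", argmax_r(f, n, lambda x: 0.5 * x))    # counterexample (i)
    print("perf(x) = x^0.95 ->", argmax_r(f, n, lambda x: x ** 0.95))  # counterexample (ii), c > f
    print("perf(x) = x^0.5  ->", argmax_r(f, n, lambda x: x ** 0.5))   # c < f: interior optimum
```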

Asymmetrical Multicore Chips

• Similarly, increasing both the parallel fraction (f) and the number of BCEs (n) can improve the speedup of an asymmetric multicore chip.

• For fixed f and n, assuming $\mathrm{perf}(r) = r^c$ with $0 < c < 1$, we have the following theorem:

Asymmetrical Multicore Chips

• If f > c and n is not big, the maximum speedup is achieved at r = 1.

• If f < c and n is not big, the maximum speedup is achieved at r = n.

• For any fixed f and c, if n is big enough, the maximum speedup is achieved at some $r_0$ with $1 < r_0 < n$.

Asymmetrical Multicore Chips

• Note that the optimal $r_0$ in Theorem 2 cannot be solved analytically.

• $r_0$ grows roughly linearly with n, and if n is big enough, the ratio $r_0/n$ approaches 1 arbitrarily closely.
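Because $r_0$ has no closed form, a brute-force search over integer r is the simplest way to watch it move with n. The sketch below is minimal and assumes perf(r) = √r (c = 0.5) and f = 0.9, both chosen for illustration. Its helper names are hypothetical. The printed ratio $r_0/n$ should rise with n, consistent with the statement above.

```python
def speedup_asymmetric(f: float, n: int, r: int, c: float) -> float:
    """Asymmetric speedup under the assumed perf(r) = r**c."""
    p = r ** c
    return 1.0 / ((1.0 - f) / p + f / (p + n - r))


def optimal_r_asymmetric(f: float, n: int, c: float) -> int:
    """No closed form is available, so try every integer r in [1, n]."""
    return max(range(1, n + 1), key=lambda r: speedup_asymmetric(f, n, r, c))


if __name__ == "__main__":
    f, c = 0.9, 0.5
    for n in (64, 256, 1024, 4096, 16384):
        r0 = optimal_r_asymmetric(f, n, c)
        print(f"n = {n:>5}: r0 = {r0:>5}, r0/n = {r0 / n:.3f}")
```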

Asymmetrical Multicore Chips

• If n is big enough, will the maximum speedup always be achieved between the extremes for any perf(x) < x?

• Counterexample: $\mathrm{perf}(x) = kx$, for any $f < k < 1$.

• For saturated functions:
– like $\mathrm{perf}(x) = x^c$ or $\mathrm{perf}(x) = kx^c + mx^{c'} + \cdots$, where $c, c' < 1$.
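The contrast between a linear and a saturated perf function can be checked by brute force. The sketch below is minimal, with illustrative values f = 0.5, n = 4096 and k = 0.8 (so k > f), and its helper names are hypothetical. With perf(x) = 0.8x, the best asymmetric design puts every BCE into the large core. With the saturated $\sqrt{x}$, the optimum lies strictly between 1 and n.

```python
from typing import Callable

Perf = Callable[[float], float]


def speedup_asymmetric(f: float, n: int, r: int, perf: Perf) -> float:
    p = perf(r)
    return 1.0 / ((1.0 - f) / p + f / (p + n - r))


def argmax_r(f: float, n: int, perf: Perf) -> int:
    """Integer r in [1, n] with the largest asymmetric speedup (brute force)."""
    return max(range(1, n + 1), key=lambda r: speedup_asymmetric(f, n, r, perf))


if __name__ == "__main__":
    f, n = 0.5, 4096
    print("perf(x) = 0.8x  ->", argmax_r(f, n, lambda x: 0.8 * x))   # linear, k > f: optimum at r = n
    print("perf(x) = x^0.5 ->", argmax_r(f, n, lambda x: x ** 0.5))  # saturated: interior optimum
```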

Asymmetrical Multicore Chips

• Under the simplistic assumptions of Amdahl’s law, it makes the most sense to devote extra resources to increasing the capability of only one core. In fact, we have the following theorem:

• Although the asymmetric architecture with one large core and many base cores was originally assumed for simplicity, it is indeed the optimal architecture in terms of speedup.
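This optimality can be illustrated by enumerating every way to split n BCEs into cores, which works for small n. The sketch below is minimal and rests on a generalized model of our own, not taken from the paper. In that model, sequential code runs on the fastest core, and parallel code runs on all cores at an aggregate rate of Σ perf(r_i), with perf(r) = √r assumed. The model reduces to the symmetric and asymmetric formulas for their respective designs. Since perf(x) < x for x > 1, replacing any extra large core with base cores only raises the parallel rate, so the best design has at most one core larger than 1 BCE.

```python
import math
from typing import Callable, Iterator, List, Optional


def partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """Yield every split of n BCEs into cores, as non-increasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def speedup_general(f: float, cores: List[int],
                    perf: Callable[[float], float] = math.sqrt) -> float:
    """Assumed model: sequential code on the fastest core,
    parallel code on all cores at aggregate rate sum(perf(r_i))."""
    sequential = (1.0 - f) / perf(max(cores))
    parallel = f / sum(perf(r) for r in cores)
    return 1.0 / (sequential + parallel)


def best_design(f: float, n: int) -> List[int]:
    return max(partitions(n), key=lambda cores: speedup_general(f, cores))


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99):
        print(f, best_design(f, 16))
```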

Dynamic Multicore Chips

• We should increase both f and n to enhance the speedup of dynamic multicore chip.

• For fixed f and n:
– if perf(r) is an increasing function, the speedup is also an increasing function of r, so the maximum speedup is always achieved at r = n.

Dynamic multicore chips can offer potential speedups that are greater than, and never worse than, those of symmetric or asymmetric multicore chips with identical perf(r) functions.

• So researchers should continue to investigate methods that approximate a dynamic multicore chip.
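The ordering of the three designs can be checked numerically. The sketch below is minimal, assumes perf(r) = √r and n = 256, and uses hypothetical function names. For every r, the dynamic parallel rate n is at least perf(r) + n − r, and the asymmetric rate is at least the symmetric rate (n/r)·perf(r), both because perf(r) ≤ r. The best dynamic speedup therefore dominates the other two.

```python
import math


def perf(r: float) -> float:
    # Assumed perf(r) = sqrt(r); any increasing perf with perf(r) <= r behaves alike.
    return math.sqrt(r)


def symmetric(f: float, n: int, r: int) -> float:
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))


def asymmetric(f: float, n: int, r: int) -> float:
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))


def dynamic(f: float, n: int, r: int) -> float:
    return 1.0 / ((1.0 - f) / perf(r) + f / n)


def best(model, f: float, n: int) -> float:
    """Best speedup over integer r in [1, n] for one design."""
    return max(model(f, n, r) for r in range(1, n + 1))


if __name__ == "__main__":
    n = 256
    for f in (0.5, 0.9, 0.99):
        print(f, best(symmetric, f, n), best(asymmetric, f, n), best(dynamic, f, n))
```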

Potentials of Maximum Speedups

• Recall that in Amdahl’s law, even if the number of processors approaches infinity, the speedup is bounded by 1/(1 − f).

• In the multicore model, however, increasing n keeps improving the speedup. Under the assumption $\mathrm{perf}(r) = r^c$, as n approaches infinity the speedup also approaches infinity, even if the performance index c is small.
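The claim can be checked with a short calculation. The sketch below is minimal and assumes perf(r) = r^c and a dynamic chip with r = n; the function names are hypothetical. In that case, the denominator (1 − f)/n^c + f/n is at most 1/n^c. The speedup is therefore at least n^c, which grows without bound even for small c. The code compares it with the Amdahl bound 1/(1 − f), using f = 0.9 and c = 0.2 chosen for illustration.

```python
def dynamic_speedup_full_fusion(f: float, n: int, c: float) -> float:
    """Dynamic chip with r = n under the assumed perf(r) = r**c."""
    return 1.0 / ((1.0 - f) / n ** c + f / n)


def amdahl_bound(f: float) -> float:
    # Limit of the classic fixed-size speedup as p -> infinity.
    return 1.0 / (1.0 - f)


if __name__ == "__main__":
    f, c = 0.9, 0.2  # c deliberately small
    print(f"Amdahl bound for f = {f}: {amdahl_bound(f):.1f}")
    for k in (10, 20, 30, 40):
        n = 2 ** k
        print(f"n = 2^{k}: dynamic speedup = {dynamic_speedup_full_fusion(f, n, c):.1f}")
```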

Implications and Results

• We present a theoretical analysis of multicore scalability and give quantitative conditions that determine how to obtain optimal multicore performance.

• The theorems and corollary provide computer architects with a better understanding of multicore design types, enabling them to make more informed tradeoffs.

• However, our precise quantitative results are suspect because the real world is much more complex: the model considered here ignores many important structures.

• This theoretical analysis attempts to provide insights for future work.

Future Work

• In real applications, the parallel fraction f is not infinitely parallelizable: the degree of parallelism may be bounded by some constant d, or may even be random in some circumstances.

• Introduce practical structures, such as the memory hierarchy and shared caches.

• More cores may allow more parallelism for larger problem sizes, so fixed-time speedup, as in Gustafson’s law, should be considered.

• … …
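The gap between fixed-size and fixed-time speedup can be shown in a few lines. The sketch below is minimal and uses hypothetical function names. It relies on Gustafson’s convention, in which f is the parallel fraction measured on the parallel run and the problem grows so that run time stays fixed. That is a different definition of f from Amdahl’s, so the two numbers are not directly comparable.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Fixed-size speedup: f is the parallel fraction of the sequential run."""
    return 1.0 / ((1.0 - f) + f / p)


def gustafson_speedup(f: float, p: int) -> float:
    """Fixed-time (scaled) speedup: f is the parallel fraction of the
    parallel run, and the problem grows so that run time stays fixed."""
    return (1.0 - f) + f * p


if __name__ == "__main__":
    for p in (16, 256, 4096):
        print(p, round(amdahl_speedup(0.9, p), 2), round(gustafson_speedup(0.9, p), 1))
```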

Acknowledgements

• We would like to thank Professor Mark Hill for his valuable comments and suggestions.

• We also appreciate the help of Dr. Mark Squillante and the MAMA organizers in arranging this video presentation.

Thanks

Questions and comments are welcome.