Heterogeneous CPU/GPU co-processor clusters Michael Fruchtman


Heterogeneous CPU/GPU co-processor clusters

Michael Fruchtman


Current State

• Eight of the top ten most efficient clusters are heterogeneous [1]

• Power law of efficiency


Current State

• At today’s efficiencies:
  – An exascale (10^18) cluster will require 200 megawatts [2]
  – Cluster efficiency must grow by 66% a year to keep up with Moore’s Law
  – The most efficient cluster improved at a normalized average of 61.4% per year

• This gap represents the increase in power requirements to grow from petascale to exascale
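
One way to read the gap between the two growth rates is as a compounding shortfall. The sketch below only compounds the two percentages quoted on the slide; the function name and the idea of expressing the gap as a multiplicative factor over a number of years are illustrative assumptions, not something the slide states.

```python
# Growth rates quoted on the slide (per year).
REQUIRED_GROWTH = 0.66    # efficiency growth needed to keep up with Moore's Law
OBSERVED_GROWTH = 0.614   # normalized average growth of the most efficient cluster


def efficiency_gap(years):
    """Factor by which required efficiency outpaces observed efficiency
    after `years` of compounding (hypothetical helper, for illustration)."""
    return ((1 + REQUIRED_GROWTH) / (1 + OBSERVED_GROWTH)) ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(years, efficiency_gap(years))
```

A factor above 1 means that the power needed for the same performance target grows by that factor relative to a cluster that had kept pace.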


Power Efficient Amdahl’s Law [3]

• Three transitions from P
  – P to P*, P to c*, P+c*

• Speedup per watt
  – f is the fraction of parallel execution
  – N is the total number of cores (P+c*)
  – Wc is the power draw of c as a percentage of P
  – Kc is the power draw of an idle c as a percentage of an active c
  – K is the power draw of P
  – Sc is the performance of c relative to P


Power Efficient Amdahl’s Law [3]

• Given Wc = 0.25, Sc = 0.5, Kc = 0.60
• N varies with the power budget, K = 1
• Top: f = 0.3
• Bottom: f = 0.9
• P+c* is superior with increased parallelization
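
The performance-per-watt comparison can be sketched in code. This is a minimal sketch derived from first principles, not a transcription of the slide's equation: time is normalized so that one P runs the whole program in 1 unit, an active P draws power 1, and energy is summed as time multiplied by power over the sequential and parallel phases. The parameter `k_idle` (fraction of active power that an idle P draws) and its value of 0.3 are assumptions; the slide defines K differently, so treat the numbers as illustrative.

```python
def perf_per_watt_p_star(f, n, k_idle):
    """n full cores (P*): one P active in the sequential phase, n - 1 idle."""
    return 1.0 / (1.0 + (n - 1) * k_idle * (1.0 - f))


def perf_per_watt_c_star(f, n, s_c, w_c, k_c):
    """n small cores (c*): one c active in the sequential phase, n - 1 idle."""
    return s_c / (w_c + (n - 1) * w_c * k_c * (1.0 - f))


def perf_per_watt_p_plus_c(f, n, s_c, w_c, k_c, k_idle):
    """One P plus n - 1 small cores (P+c*), requires n > 1.

    Sequential phase: P active, small cores idle.
    Parallel phase: P idle, small cores active.
    """
    seq_energy = (1.0 - f) * (1.0 + (n - 1) * w_c * k_c)
    par_time = f / ((n - 1) * s_c)
    par_energy = par_time * (k_idle + (n - 1) * w_c)
    return 1.0 / (seq_energy + par_energy)


if __name__ == "__main__":
    # Wc, Sc, Kc are the values given on the slide; n and k_idle are assumed.
    params = dict(s_c=0.5, w_c=0.25, k_c=0.60)
    n, k_idle = 16, 0.3
    for f in (0.3, 0.9):
        print(f,
              perf_per_watt_p_star(f, n, k_idle),
              perf_per_watt_c_star(f, n, **params),
              perf_per_watt_p_plus_c(f, n, k_idle=k_idle, **params))
```

The sketch fixes the core count rather than the power budget, so it does not reproduce the plots; it only shows how the three configurations respond to f.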


GPU Architecture [4]


P-E Amdahl’s Law and GPU

• Wc = 0.00417, from 0.5 watts per core and K = 120
  – Intel i7 980 XE

• Kc = 0.115
  – Turning on a GPU is 71% of power draw [5]

• Sc is harder to measure: is the workload memory bound or computation bound? The GPU memory architecture makes this difficult to measure.

• Sc = 0.172, assuming a computation-bound workload on the GTX 580
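
The Wc figure follows directly from the two numbers quoted with it. The short check below assumes that K = 120 is the CPU's power draw in watts, which the slide implies but does not state; the variable names are illustrative.

```python
GPU_CORE_WATTS = 0.5   # per-core GPU power from the slide
CPU_WATTS = 120.0      # K from the slide, assumed to be in watts

# Wc: power draw of one small core relative to the full CPU.
w_c = GPU_CORE_WATTS / CPU_WATTS

if __name__ == "__main__":
    print(round(w_c, 5))
```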


Threads, Blocks and Performance [5]


Formal Power Modeling [6]

Average Geometric Error of Power Prediction = 9.18%


Temperature Model [6]

RC_Rise = 35 and RC_Decay = 65 are GPU-dependent constants
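
The two constants behave like time constants in a first-order RC thermal model. The sketch below is a generic exponential rise and decay model, offered as an illustration rather than as the exact equations of [6]. The idle and maximum temperatures are hypothetical inputs, and the time unit of the constants is assumed to be seconds.

```python
import math

RC_RISE = 35.0    # from the slide; unit assumed to be seconds
RC_DECAY = 65.0   # from the slide; unit assumed to be seconds


def temp_rise(t, idle_temp, max_temp, rc=RC_RISE):
    """Temperature t seconds after load starts, climbing from idle toward max."""
    return idle_temp + (max_temp - idle_temp) * (1.0 - math.exp(-t / rc))


def temp_decay(t, idle_temp, start_temp, rc=RC_DECAY):
    """Temperature t seconds after load stops, falling from start toward idle."""
    return idle_temp + (start_temp - idle_temp) * math.exp(-t / rc)


if __name__ == "__main__":
    # Hypothetical temperatures in degrees Celsius.
    idle, peak = 40.0, 80.0
    for t in (0, 35, 70, 140):
        print(t, temp_rise(t, idle, peak), temp_decay(t, idle, peak))
```

With a larger decay constant than rise constant, the modeled chip heats up faster than it cools down.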


Conditions for GPU Use

• GTX 580 draws 244 W under load
  – Speedup must be greater than 2, or 3 to leave a safety margin
  – f must be very high, preferably 0.9 or higher

• Improved energy efficiency depends on performance
  – Example: GPUDB SQL queries
  – Without joins: speedup of 20+ [7]
  – With joins: speedup of 2-7 [8]
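
The speedup threshold can be reproduced with a rough break-even calculation. This sketch assumes the CPU baseline is the 120 W figure used earlier in the deck and ignores the power the idle CPU still draws while the GPU works, so the real threshold would be somewhat higher; the function and constant names are illustrative.

```python
GPU_WATTS = 244.0   # GTX 580 load power from the slide
CPU_WATTS = 120.0   # assumed CPU baseline, taken from K = 120 earlier in the deck

# Speedup at which the GPU uses the same energy as the CPU for one job.
BREAK_EVEN_SPEEDUP = GPU_WATTS / CPU_WATTS


def energy_ratio(speedup):
    """GPU energy divided by CPU energy for the same job.

    Values below 1 mean the GPU run consumes less energy.
    """
    return (GPU_WATTS / speedup) / CPU_WATTS


if __name__ == "__main__":
    print(BREAK_EVEN_SPEEDUP)
    for speedup in (2, 3, 7, 20):
        print(speedup, energy_ratio(speedup))
```

Under these assumptions, a speedup of exactly 2 falls just short of break-even, which is consistent with the slide's preference for a margin.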


Reducing GPU Power Usage

• Power gating

• Improved Memory Coalescence
  – Memory Coalescence Models

• Incoherent Branching
  – Incoherent Branching Models

• NVIDIA Optimus reduces idle power to near zero


References

• [1] Feng, Wu-chun and Kirk W. Cameron. "The Green 500 List - November 2010." The Green 500. Virginia Tech and Virginia Polytechnic Institute and State University. November 2010. Web. March 15, 2011.

• [2] T. Agerwala. Challenges on the road to exascale computing. Proceedings of the 22nd annual international conference on Supercomputing (ICS '08). ACM, New York, NY, USA, 2-2. 2008.

• [3] D. Woo and H.-H. Lee. Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era. IEEE Xplore. IEEE Computer Society. December 2008. Web. March 15, 2011.

• [4] R. Smith. "NVIDIA's GeForce GTX 580: Fermi Redefined." AnandTech. November 9, 2010. Web. March 16, 2011. http://www.anandtech.com/show/4008/nvidias-geforce-gtx-580

• [5] R. Suda and D. Ren. Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels towards Power Optimized High Performance Computing. International Conference on Parallel and Distributed Computing, Applications and Technologies. IEEE Computer Society. pp. 432-438. 2009.

• [6] S. Hong and H. Kim. An Integrated GPU Power and Performance Model. ISCA '10 Proceedings of the 37th annual international symposium on Computer architecture. ACM, New York, NY, USA. pp. 280-289. 2010.

• [7] P. Bakkum and K. Skadron. Accelerating SQL Database Operations on a GPU with CUDA. GPGPU '10 Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units. ACM, New York, NY, USA. pp. 94-103.

• [8] B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. Relational Joins on Graphics Processors. SIGMOD '08 Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, New York, NY, USA. pp. 511-524. 2008.