Performance evaluation of Graph500 considering CPU-DRAM power shifting

Yuta Kakibuka (1), Yuichiro Yasui (2), Takatsugu Ono (3), Katsuki Fujisawa (2), Koji Inoue (3)
(1) Graduate School of Information Science and Electrical Engineering, Kyushu University
(2) Institute of Mathematics for Industry, Kyushu University
(3) Faculty of Information Science and Electrical Engineering, Kyushu University

1. Introduction and Motivation

Computing demands under a power constraint

Ex1) Over-provisioned system
• An over-provisioned system has more hardware than its power source can supply at full load.
• The actual power consumption of the components must be controlled by a power capping technique so that the power constraint is not violated.

[Figure: traditional vs. over-provisioned systems. Bars for CPU and DRAM show potential power consumption and actual power consumption against the power constraint, with power capping applied.]

Ex2) Social demands
There are some social demands for power constraints. For example:
• Capacity of a power generating station.
• Power contract of a building.

We need to shift power appropriately for each hardware component.

Emerging application (Graph500)
• Graph analysis is growing in importance with the growth of big data applications.
• A large-scale graph processing application is executed on an over-provisioned system.
• The performance impact of power constraints on graph applications has not been revealed.
• Graph500 uses direction-optimized BFS (a hybrid of the top-down and bottom-up approaches).

[Figure: top-down BFS and bottom-up BFS.]

We need to know the impact of power capping on Graph500 performance!

2. Basic Power Analysis

Experimental environment

CPU          Intel Xeon E5-2620 (6 cores) x2, TDP 95 [W]
Memory       16 GB x8 (128 GB)
Compiler     Intel Compiler Version 14.0.0
Power MGMT   RAPL

Dynamic behavior of power consumption

[Figure: phases Generation, Construction (kernel 1), and BFS (kernel 2). Axes: count of cache misses per second [s^-1] and power consumption [W]. Scale is held constant.]

• Each of these kernels forms a phase.
• DRAM power consumption increases as the number of DRAM accesses increases.
• The power consumption of the BFS kernel forms a single phase, so there is no need to change the amount of power shifting during BFS execution.

Impact of input sizes on CPU/DRAM power

[Figure: power consumption under no constraints for CPU and DRAM versus input size (scale), from small to large. Legend bins: 10-20 [W], 20-30 [W], 30-40 [W], 40-50 [W], 80-90 [W], 90-100 [W].]

The dependence of overall power consumption on problem size is driven more strongly by DRAM power consumption than by CPU power consumption.

3. Impact of CPU/DRAM Power Capping

Performance of Graph500 under power constraint

[Figure: panels for Total: 120 [W] and Total: 100 [W], covering small and large inputs, with markers for the highest performance under the 120 [W] budget and under the 100 [W] budget.]

Relatively small input
• Preferentially shifting power to the CPU brings higher performance.
• DRAM power consumption in this range is below the smallest DRAM power budget, so no additional power needs to be shifted to DRAM.
• The DRAM power constraints are all greater than or equal to 20 [W].
• DRAM power consumption is also below 20 [W] when no power constraint is set.

Relatively large input
• Preferentially shifting power to DRAM brings higher performance.
• DRAM power consumption reaches up to 40 [W].
• Allotting 40 [W] to DRAM brings the highest performance for 25 <= scale <= 27.
• However, allotting 50 [W] to DRAM and 50 [W] to the CPU brings the worst performance.

Good power shifting depends on input size.

4. Summary
• DRAM power consumption increases over a specific range of input sizes.
• Good power budgeting depends on input size.
• Shifting power budget to DRAM becomes more important as the input size increases.
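
The budget-shifting observation in Section 3 can be illustrated with a small sketch. It is not the authors' method: the function names, the 10 [W] step, and the rule "give DRAM the smallest cap that covers its unconstrained draw, and give the rest to the CPU" are assumptions made for illustration. Only the 100 [W] and 120 [W] totals, the 20 [W] minimum DRAM cap, and the 40 [W] and 50 [W] DRAM allotments come from the poster.

```python
def candidate_splits(total_w, dram_min_w=20, dram_max_w=50, step_w=10):
    """Enumerate (cpu_w, dram_w) pairs that exactly use the total budget.

    dram_min_w=20 follows the poster (DRAM caps are >= 20 W);
    dram_max_w and step_w are assumed values for this sketch.
    """
    return [(total_w - d, d) for d in range(dram_min_w, dram_max_w + 1, step_w)]


def pick_split(total_w, dram_demand_w):
    """Hypothetical heuristic: choose the smallest DRAM cap that still covers
    the DRAM draw measured without constraints; the CPU gets the remainder.
    If no cap covers the demand, fall back to the largest DRAM cap."""
    splits = candidate_splits(total_w)
    for cpu_w, dram_w in splits:
        if dram_w >= dram_demand_w:
            return cpu_w, dram_w
    return splits[-1]


# Small input: DRAM draws under 20 W, so the CPU keeps most of the budget.
print(pick_split(100, 15))
# Large input: DRAM draws up to 40 W, so DRAM receives a 40 W cap.
print(pick_split(100, 40))
```

Under these assumptions the heuristic never picks the 50 [W] / 50 [W] split for a 100 [W] budget unless the DRAM demand exceeds 40 [W], which matches the poster's note that this split performed worst. Real caps would be applied through RAPL, which the sketch does not touch.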

