Energy Efficiency and Water-Cool-Technology Innovations
2018 Lenovo - All rights reserved.
Karsten Kutzer | April 10th 2018 | Swiss Conference 2018
Acknowledgments: Luigi Brochard, Vinod Kamath, Martin Hiegl (Lenovo), Julita Corbalan (BSC)
Why care about Power and Cooling?
• Increasing electricity cost
• Performance-power relation
• Application diversity
• Waste heat reuse
• Data center limitations
Leading the Industry in Energy Aware HPC
Performance-Power relation
[Figure: Intel Xeon processor TDP (W) and SPEC CFP2006 rate by release date, June 2006 to April 2017]
[Figure: component power (W, values from 75 to 500) for Sandy Bridge / Ivy Bridge, Haswell, Xeon Phi, AMD, Nervana/Crest, ARM SoC, NVIDIA / AMD GPU and NVIDIA SXM]
• Maintaining Moore’s Law with increased competition is resulting in higher component power.
• Increased memory count, NVMe adoption, and I/O requirements are driving packaging and feature trade-offs (the superset of features doesn’t fit in 1U).
• Shared cooling fan power savings no longer exist for dense 2S node architectures, due to the high airflow requirements of the non-spread-core CPU layout.
For the highest performance, systems will have to reduce density or move to optimized cooling.
Application Diversity
• CPU-bound BQCD case
 – Node runs at full power
 – CPU provides full performance while running at full power
• Memory-bound BQCD case
 – Node still runs at full power
 – CPU provides less performance while still running at full power
[Figure, CPU-bound case (Turbo ON: 157 GFlops): power over time (samples) in W for DC node, CPU pkg 0, RAM pkg 0, CPU pkg 1 and RAM pkg 1]
[Figure, memory-bound case (Turbo ON: 65 GFlops): the same power traces]
SD650 with 2 sockets 8168 and 6 x 16GB DIMMs; room temp = 21°C, inlet water = 45°C, 1.5 lpm/tray
How much energy do we waste on non-CPU bound application?
Waste Energy reuse - ERE
Energy Waste Direct Reuse Indirect Reuse
How much energy do we waste by not using the system heat?
Pictures: Leibniz Supercomputing Centre
Energy Aware HPC
• Best CPU choice, with the max TDP supported
• Best performance, fully utilizing the system
• Best TCO / performance for maximized ROI
• Best use of limited data center capacities
• Best carbon footprint for eco-responsible HPC
The three Pillars
Leading the Industry in Energy Aware HPC
Hardware Software Infrastructure
Hardware
Direct Water Cooling
Water Cooling Technologies
Lenovo Cooling Technologies

Air Cooled (PUE ~2.0 – 1.5, ERE ~2.0 – 1.5)
Choose for the broadest choice of customizable options
• Standard airflow with internal fans, cooled by the room air conditioning
• Broadest choice of configurable options supported
• Relatively inefficient cooling

Air Cooled w/ Rear Door Heat Exchanger (PUE ~1.4 – 1.2, ERE ~1.4 – 1.2)
Choose for increased energy efficiency with broad choice
• Air cooled, but heat removed by the RDHx through chilled water
• Retains high flexibility
• Enables extremely tight rack placement
• Potentially room neutral

Direct Water Cooled (PUE <=1.1, ERE <=1.1)
Choose for max performance and high energy efficiency
• Most heat removed by the onboard water loop, with water temperatures up to 50°C
• Supports the highest-TDP CPUs at the densest footprint
• Higher performance
• Free cooling
Return on Investment for DWC vs RDHx
• New data centers: water cooling has immediate payback.
• Existing air-cooled data centers: the payback period depends strongly on the electricity rate.
[Figure: payback of DWC and RDHx at electricity rates of $0.06/kWh, $0.12/kWh and $0.20/kWh]
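The dependence on the electricity rate can be illustrated with a simple, undiscounted payback calculation: the extra capital cost of water cooling divided by the yearly saving in energy cost. This is a minimal sketch; the function name and all input values are hypothetical and are not read from the chart.

```python
def simple_payback_years(extra_capex_usd: float,
                         annual_energy_saved_kwh: float,
                         rate_usd_per_kwh: float) -> float:
    """Years until an extra investment is recovered by energy savings.

    Undiscounted: ignores interest, maintenance and future rate changes.
    """
    annual_saving_usd = annual_energy_saved_kwh * rate_usd_per_kwh
    if annual_saving_usd <= 0:
        raise ValueError("annual saving must be positive")
    return extra_capex_usd / annual_saving_usd


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the effect of the electricity rate
    extra_capex = 100_000.0   # USD premium for the cooling retrofit
    energy_saved = 500_000.0  # kWh saved per year
    for rate in (0.06, 0.12, 0.20):  # the rates shown on the slide
        years = simple_payback_years(extra_capex, energy_saved, rate)
        print(f"${rate:.2f}/kWh: payback after {years:.1f} years")
```

Under this model, doubling the electricity rate halves the payback period, which is why retrofitting an existing air-cooled site pays off much faster where power is expensive.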
Rear Door Heat Exchanger
Up to 27°C cold water cooling
Up to 100% heat removal efficiency at 30 kW
No moving parts or power required
Installed base of tens of thousands of nodes
[Timeline: long ago, 2009, 2010]
Lenovo Rear Door Heat Exchanger
3,500 times more efficient than cold air
RDHx2 features:
• Air movement: provided by the systems in the rack
• Heat removal: at 18°C water temperature and 27°C inlet air temperature, 100% for 30 kW; 90% for 40 kW
• Water temperature: min 18°C / 64.4°F for ASHRAE Class 1; min 22°C / 71.6°F for ASHRAE Class 2; max 27°C
• Water volume: 9 liters / 2.4 gallons
• Water flow rate: min 22.7 liters / 6 gallons per minute; max 56.8 liters / 15 gallons per minute
• Door dimensions: depth 129 mm / 5 in.; height 1950 mm / 76.8 in.; width 600 mm / 23.6 in.
• Door assembly weight: empty 39 kg / 85 lbs; filled 48 kg / 105 lbs
• Connection: ¾-inch quick connect (Supply: Parker SH6-63W; Return: Parker SH6-62-W; or equivalent)
© Torsten Bloth
Lenovo RDHx2 – Typical Environment
Direct “Hot” Water Cooling
[Timeline: 2012, 2014, 2018]
>24,000 nodes globally
Up to 50°C hot water cooling
Up to 90% heat removal efficiency
World-record energy reuse efficiency
30+ patents on market-leading design
Lenovo ThinkSystem SD650
SD650 features:
• Processors: 2 Intel “Purley” generation processors per node; Socket-F for Intel Omni-Path supported; all Skylake shelves >120 W supported
• Form factor: 1U full-wide tray, double-node / 6U 12-node chassis
• Memory slots / max memory: 12x DDR4 (R/LR) 2667 MHz DIMMs; 4x Intel Apache Pass DIMM ready
• Storage: 2x SATA slim SSD / 1x NVMe, 2x M.2 SATA SSD
• NIC: 1x 1GBASE-T, 1x 1 GbE XCC dedicated
• PCIe: 1x x16 PCIe for EDR InfiniBand / OPA100; 1x x16 ML2 for 10 Gbit Ethernet (in place of storage)
• Power: 1300 W / 1500 W / 2000 W Platinum and 1300 W Titanium
• USB ports: up to 1x front via dongle cable + 1x internal (2.0)
• Cooling: no fans on chassis, PSU fans only; up to 50°C warm water circulated through cooling tubes for component-level cooling
• System management / TPM: XCC, dedicated port or shared; TPM, pluggable TCM
• Dimensions: 915 mm depth, front access with front I/O
© Torsten Bloth
ThinkSystem SD650 – Top-Down View
[Figure: top-down view of an SD650 tray with two nodes sharing a tray and a water loop; labeled parts: water inlet *), water outlet, power board, CPUs, 6 DIMMs per CPU, 2 AEP per CPU, x16 PCIe slot, disk drive, M.2 slot; water temperatures marked 50°C and 60°C]
*) Inlet water temperature 50°C with configuration limitations (45°C without configuration limitations)
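How much heat a tray's water loop carries follows from the energy balance Q = ṁ·c_p·ΔT. The sketch below pairs the 1.5 lpm/tray flow used in the test setups with the 50°C / 60°C temperatures on the diagram purely as an illustration. The constants are approximate properties of liquid water, the 900 W tray load is hypothetical, and the helper names are invented for this example.

```python
WATER_CP_J_PER_KG_K = 4180.0   # approx. specific heat of liquid water
WATER_DENSITY_KG_PER_L = 1.0   # approximation (about 0.99 kg/L at 45-60 °C)


def mass_flow_kg_per_s(flow_l_per_min: float) -> float:
    """Convert a volumetric flow in liters per minute to a mass flow in kg/s."""
    return flow_l_per_min * WATER_DENSITY_KG_PER_L / 60.0


def heat_carried_w(flow_l_per_min: float, t_in_c: float, t_out_c: float) -> float:
    """Heat absorbed by the water: Q = m_dot * c_p * (T_out - T_in)."""
    return mass_flow_kg_per_s(flow_l_per_min) * WATER_CP_J_PER_KG_K * (t_out_c - t_in_c)


def outlet_temp_c(flow_l_per_min: float, t_in_c: float, heat_w: float) -> float:
    """Outlet temperature if all of heat_w is transferred into the water."""
    return t_in_c + heat_w / (mass_flow_kg_per_s(flow_l_per_min) * WATER_CP_J_PER_KG_K)


if __name__ == "__main__":
    print(f"{heat_carried_w(1.5, 50.0, 60.0):.0f} W for a 10 K rise at 1.5 l/min")
    print(f"{outlet_temp_c(1.5, 50.0, 900.0):.1f} °C outlet for a hypothetical 900 W tray")
```

Since the PSUs still use fans, not all tray heat ends up in the water, so the outlet estimate is an upper bound under this model.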
SD650 Improved Node Water Cooling Architecture
• Focus on maximizing efficiency at high (up to 50°C) inlet water temperatures
• Device cooling optimized by minimizing water-to-device temperature differences:
 – dT CPU < ~0.1 K/W
 – dT memory < ~1 K/W
 – dT network < ~1 K/W
• Direct water cooling of processors, memory, voltage regulation devices and I/O devices (network and disk)
• The water circuit traverses all critical components to optimize cooling.
[Figure: cooling detail showing the disk conductive plate, memory and water channels]
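The dT values can be read as thermal resistances from the water to each device. Under a simple lumped model, the steady-state device temperature is roughly the water temperature plus resistance times device power. The sketch assumes that model. The 205 W figure is the TDP of the Xeon Platinum 8168 used in the measurements in this deck, and the 5 W DIMM power is a hypothetical value.

```python
def device_temperature_c(water_temp_c: float, power_w: float,
                         r_th_k_per_w: float) -> float:
    """Lumped steady-state estimate: T_device = T_water + R_th * P.

    Ignores the water's own temperature rise along the loop and any
    resistance not captured in the single R_th value.
    """
    return water_temp_c + r_th_k_per_w * power_w


if __name__ == "__main__":
    inlet_c = 45.0  # inlet water temperature without configuration limitations
    # CPU: 205 W TDP (Xeon Platinum 8168) at the ~0.1 K/W bound
    print(f"CPU  ~{device_temperature_c(inlet_c, 205.0, 0.1):.1f} °C")
    # DIMM: hypothetical 5 W at the ~1 K/W bound
    print(f"DIMM ~{device_temperature_c(inlet_c, 5.0, 1.0):.1f} °C")
```

Because the slide gives upper bounds (< ~0.1 K/W, < ~1 K/W), these figures are worst-case estimates within the model. They also show why the CPU needs a resistance ten times lower than memory: it dissipates far more power.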
HPL Temperature & Frequency on SD650 with 8168
PL2 (short-term RAPL limit) is 1.2 × TDP; PL1 (long-term RAPL limit) is TDP.
[Figure: HPL temperature and frequency over time, across phases of non-AVX, AVX and non-AVX instructions]
SD650 with 2 sockets 8168 and 12 x 16GB DIMMs; room temp = 21°C, inlet water = 40°C, 1.5 lpm/tray
Performance Optimization
• ThinkSystem SD530 – Standard Performance
 – ~2.15 TeraFlop/s sustained HPL with SKL 6148 (20C, 2.4 GHz, 150 W)
• ThinkSystem SD650 – High Performance Mode
 – ~2.34 TeraFlop/s sustained HPL with SKL 6148 (20C, 2.4 GHz, 150 W)

System, Turbo | HPL [GFlop/s] | AC node [W] | DC node [W] | CPU temp [°C]
SD530, Turbo OFF | 2152.7 | 400.1 | 368.0 | 81.8
SD530, Turbo ON | 2147.2 | 400.4 | 368.3 | 82.1
SD650, Turbo OFF | 2342.0 | 472.5 | 434.7 | 36.8
SD650, Turbo ON | 2333.4 | 473.2 | 435.4 | 36.9
SD530 and SD650 with 2 sockets 6148 and 12 x 16GB DIMMs; room temp = 21°C, inlet water = 18°C, 1.5 lpm/tray
SD650 vs. SD530: +9% HPL performance, +18% node power
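The +9% / +18% figures follow directly from the Turbo OFF rows of the SD530/SD650 table. The short calculation below reproduces them and also derives node-level GFlop/s per watt (AC), which the slide does not state. The dictionary layout and helper name are only for illustration.

```python
# Turbo OFF rows from the table (HPL GFlop/s, AC node power in W)
RESULTS = {
    "SD530": {"hpl_gflops": 2152.7, "ac_node_w": 400.1},
    "SD650": {"hpl_gflops": 2342.0, "ac_node_w": 472.5},
}


def relative_change(new: float, old: float) -> float:
    """Fractional change from old to new (0.09 means +9%)."""
    return new / old - 1.0


perf_gain = relative_change(RESULTS["SD650"]["hpl_gflops"], RESULTS["SD530"]["hpl_gflops"])
power_increase = relative_change(RESULTS["SD650"]["ac_node_w"], RESULTS["SD530"]["ac_node_w"])

if __name__ == "__main__":
    print(f"HPL performance: {perf_gain:+.1%}")      # about +8.8%
    print(f"AC node power:   {power_increase:+.1%}")  # about +18.1%
    for name, r in RESULTS.items():
        print(f"{name}: {r['hpl_gflops'] / r['ac_node_w']:.2f} GFlop/s per W (AC)")
```

At the node level, the extra performance costs proportionally more power (about 5.38 vs. 4.96 GFlop/s per W AC). The gain in this mode is therefore throughput per node, achieved at a much lower CPU temperature.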
Software
Becoming Energy Aware
Managing and Reporting
SD650 – DC Power Sampling/Reporting Frequency
• AC power at chassis level (through FPC)
– With xCAT
– With ipmi
• DC power and energy at node level through XCC
– With hw_usage library
– With ipmi
– With RAPL
– With Allinea
– With LSF or LEAR
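Of these options, RAPL can be read directly on Linux through the powercap sysfs interface. There, each domain exposes a cumulative `energy_uj` counter that wraps at `max_energy_range_uj`. The sketch below averages package power over an interval. It assumes the domain path `intel-rapl:0`, which varies between systems, and reading it often requires root. The helper names are hypothetical.

```python
import time
from pathlib import Path

# Package 0 on many Intel systems; the exact domain path is system dependent.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before: int, after: int, max_range_uj: int) -> int:
    """Energy between two counter reads, tolerating one wraparound.

    Assumes at most one wrap between reads (accurate to within about
    one microjoule at the wrap), so the interval must be short compared
    with the time needed to count through max_range_uj.
    """
    if after >= before:
        return after - before
    return (max_range_uj - before) + after


def _read_int(name: str) -> int:
    return int((RAPL_DOMAIN / name).read_text().strip())


def average_package_power_w(interval_s: float = 1.0) -> float:
    """Average package power over interval_s seconds, in watts."""
    max_range = _read_int("max_energy_range_uj")
    e0, t0 = _read_int("energy_uj"), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = _read_int("energy_uj"), time.monotonic()
    return energy_delta_uj(e0, e1, max_range) / 1e6 / (t1 - t0)


if __name__ == "__main__":
    try:
        print(f"Package 0: {average_package_power_w():.1f} W")
    except OSError as err:  # missing domain or insufficient permissions
        print(f"RAPL not readable here: {err}")
```

RAPL covers the CPU package (and, in a separate domain, memory), not the whole node. This is why the node-level DC readings through the XCC complement it.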
[Figure: power measurement paths and sampling rates (from 1 Hz up to 10 kHz) for the bulk 12 V and node 12 V rails: sensors and meter, HSC node power, FPGA, RAPL CPU/memory energy MSRs, NM/ME and XCC/BMC, feeding high-level software; the FPGA-based path is new for Lenovo ThinkSystem SD650]
SD650 – Advanced Accuracy for Power and Energy
• Node DC power readings
 – Better than or equal to ±3% power reading accuracy, down to the node’s minimum active power (~40-50 W DC)
 – Power granularity <=100 mW
 – At least 100 Hz update rate for node power readings
• Node DC energy meter
 – Accumulator for energy in joules (~10 weeks until meter overflow)
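With node power readings updated at 100 Hz or faster, energy over an interval can also be reconstructed in software by integrating the samples, for example with the trapezoidal rule. The ±3% reading accuracy then translates into an error band on the result, if the error is treated as a systematic bias; random noise would partly average out. This is an illustrative sketch with hypothetical function names and a hypothetical power trace.

```python
from typing import Sequence, Tuple


def energy_joules(power_w: Sequence[float], rate_hz: float) -> float:
    """Trapezoidal integration of evenly spaced power samples (W) into energy (J)."""
    dt = 1.0 / rate_hz
    return sum(0.5 * (a + b) * dt for a, b in zip(power_w, power_w[1:]))


def accuracy_band(value: float, rel_error: float = 0.03) -> Tuple[float, float]:
    """Bounds for a value measured with a +/- rel_error relative accuracy."""
    return value * (1.0 - rel_error), value * (1.0 + rel_error)


if __name__ == "__main__":
    # Hypothetical trace: 1 s at 400 W followed by 1 s at 200 W, sampled at 100 Hz
    samples = [400.0] * 101 + [200.0] * 100
    e = energy_joules(samples, rate_hz=100.0)
    low, high = accuracy_band(e)
    print(f"{e:.1f} J (between {low:.1f} and {high:.1f} J at +/-3%)")
```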
[Figure: node power measurement circuit with Rsense, INA226 (used for metering), FPGA (FIFO), XCC, ME (Node Manager), SN1405006 (used for capping), accessed via IPMI raw OEM command]
• High accuracy, fast sampling
• Maintains compatibility with Node Manager
Energy Aware Runtime: Motivation
• Power and energy have become critical constraints for HPC systems
• The performance and power consumption of parallel applications depend on:
 – Architectural parameters
 – Runtime node configuration
 – Application characteristics
 – Input data
• Manually selecting the “best” frequency:
 – is difficult and time-consuming (it costs resources, and therefore power), and the result is not reusable
 – the best frequency may change over time
 – it may differ between nodes
Manual procedure: configure the application for architecture X → execute with N frequencies, calculating time and energy → select the optimal frequency
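The manual procedure amounts to a brute-force search. You run the configured application once per candidate frequency, compute energy as time × average power, and keep the frequency with the lowest energy. A minimal sketch; the measurements and names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Run:
    freq_ghz: float
    time_s: float
    avg_power_w: float

    @property
    def energy_j(self) -> float:
        """Energy to solution = execution time * average power."""
        return self.time_s * self.avg_power_w


def optimal_frequency(runs: Sequence[Run]) -> Run:
    """Return the run with the lowest energy to solution."""
    return min(runs, key=lambda r: r.energy_j)


# Hypothetical measurements of one application on one node type
RUNS = [
    Run(2.7, 100.0, 500.0),  # 50,000 J
    Run(2.4, 104.0, 450.0),  # 46,800 J
    Run(2.1, 112.0, 420.0),  # 47,040 J
    Run(1.8, 125.0, 390.0),  # 48,750 J
]

if __name__ == "__main__":
    best = optimal_frequency(RUNS)
    print(f"{best.freq_ghz} GHz, {best.energy_j:.0f} J")
```

The sketch also shows why the approach does not scale. It costs N full runs per application, and the answer is tied to that architecture, input and node.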
EAR – Automatic and Dynamic CPU Frequency
• Architecture characterization
• Application characterization
 – Outer loop detection (DPD)
 – Application signature computation (CPI, GBS, POWER, TIME)
• Performance and power projection
• User/system policy definition for frequency selection (with thresholds)
 – MINIMIZE_ENERGY_TO_SOLUTION
  - Goal: save energy by reducing frequency (with potential performance degradation)
  - We limit the performance degradation with a MAX_PERFORMANCE_DEGRADATION threshold
 – MINIMIZE_TIME_TO_SOLUTION
  - Goal: reduce time by increasing frequency (with potential energy increase)
  - We use a MIN_PERFORMANCE_EFFICIENCY_GAIN threshold to prevent applications that do not scale with frequency from consuming more energy for nothing
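The two policies can be sketched as selection rules over per-frequency projections of time and power. The code below is a hypothetical interpretation, not EAR's implementation:
• MIN_ENERGY picks the lowest-energy frequency whose projected time stays within MAX_PERFORMANCE_DEGRADATION of the reference frequency.
• MIN_TIME keeps stepping up in frequency only while the relative performance gain per relative frequency gain reaches MIN_PERFORMANCE_EFFICIENCY_GAIN.
All projection values are invented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Projection:
    freq_ghz: float
    time_s: float   # projected execution time
    power_w: float  # projected average power

    @property
    def energy_j(self) -> float:
        return self.time_s * self.power_w


def _at(projs: Sequence[Projection], freq_ghz: float) -> Projection:
    return next(p for p in projs if p.freq_ghz == freq_ghz)


def min_energy_to_solution(projs: Sequence[Projection], ref_freq_ghz: float,
                           max_perf_degradation: float) -> Projection:
    """Lowest-energy frequency whose time is within the allowed slowdown."""
    ref = _at(projs, ref_freq_ghz)
    time_limit = ref.time_s * (1.0 + max_perf_degradation)
    candidates = [p for p in projs if p.time_s <= time_limit]
    return min(candidates, key=lambda p: p.energy_j)


def min_time_to_solution(projs: Sequence[Projection], ref_freq_ghz: float,
                         min_eff_gain: float) -> Projection:
    """Step up in frequency while speedup per frequency increase is large enough."""
    current = _at(projs, ref_freq_ghz)
    higher = sorted((p for p in projs if p.freq_ghz > ref_freq_ghz),
                    key=lambda p: p.freq_ghz)
    for p in higher:
        freq_gain = p.freq_ghz / current.freq_ghz - 1.0
        perf_gain = current.time_s / p.time_s - 1.0
        if perf_gain / freq_gain < min_eff_gain:
            break  # not scaling with frequency: stop before wasting energy
        current = p
    return current


# Hypothetical projections for one application
PROJECTIONS = [
    Projection(2.0, 120.0, 380.0),  # 45,600 J
    Projection(2.2, 111.0, 410.0),  # 45,510 J
    Projection(2.4, 104.0, 440.0),  # 45,760 J
    Projection(2.6, 100.0, 480.0),  # 48,000 J
]

if __name__ == "__main__":
    print(min_energy_to_solution(PROJECTIONS, 2.6, 0.10).freq_ghz)  # 2.4
    print(min_time_to_solution(PROJECTIONS, 2.0, 0.7).freq_ghz)     # 2.4
```

With a 10% degradation limit, the 2.2 GHz point is rejected even though its projected energy is lowest, which illustrates how the threshold bounds the slowdown.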
EAR – Functional Overview
Learning phase (at EAR installation*): kernel execution → coefficients computation → coefficients database
Execution phase (loaded with the application): Dynamic Pattern Detection detects the outer loop → compute power and performance metrics for the outer loop → energy policy (reads the coefficients database) → optimal frequency calculation → CPU frequency
* or every time the cluster configuration is modified (more memory per node, new processors, ...)
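Dynamic Pattern Detection identifies the application's outer loop from the repeating sequence of events it generates, for example MPI calls. The sketch below is a simplified stand-in, not EAR's algorithm. It looks for the smallest period p such that the most recent events consist of at least two identical repetitions of length p. The event names and function name are hypothetical.

```python
from typing import Hashable, Optional, Sequence


def detect_period(events: Sequence[Hashable], max_period: int,
                  min_repeats: int = 2) -> Optional[int]:
    """Smallest period p (<= max_period) such that the last min_repeats * p
    events repeat with period p, or None if no such period exists."""
    n = len(events)
    for p in range(1, max_period + 1):
        window = p * min_repeats
        if window > n:
            break  # not enough history for this or any longer period
        tail = events[n - window:]
        if all(tail[i] == tail[i + p] for i in range(window - p)):
            return p
    return None


if __name__ == "__main__":
    trace = ["init", "bcast",
             "send", "recv", "allreduce",
             "send", "recv", "allreduce",
             "send", "recv", "allreduce"]
    print(detect_period(trace, max_period=5))  # 3
```

Once a period is found, per-iteration metrics such as CPI, GBS, power and time can be computed over each repetition and handed to the energy policy.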
BQCD_CPU with EAR MIN_ENERGY_TO_SOLUTION
[Figure: traces for MPI rank 0 and MPI rank 8 over accumulated time: outer loop size detected, CPU frequency, measured average power (W) and measured iteration time (µs)]
• Big loop detected
• Policy is applied: F: 2.6 GHz → 2.4 GHz
• Power is reduced
• Iteration time is similar
Infrastructure
PUE, ITUE, TUE and ERE
• Power Usage Effectiveness (PUE) indicates how much of the power a data center uses is not used for computing.
 – It is the ratio of total facility power to the power delivered to the computing (IT) equipment.
 – It does not take into account how effectively a server uses the power it receives.
 – The ideal value is 1.0.
• IT Usage Effectiveness (ITUE) measures how much of the power a system uses is not used for computing.
 – It is the ratio of the power of the IT equipment to the power of the computing components.
 – Multiplied by the PUE, it gives the Total-Power Usage Effectiveness (TUE).
 – The ideal value is 1.0.
• Energy Reuse Effectiveness (ERE) accounts for the reuse of the power dissipated by the computer.
 – It is the ratio of total facility power minus reused power to the power delivered to the computing equipment.
 – The ideal ERE is 0.0; with no reuse, ERE = PUE.
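$$\mathrm{PUE} = \frac{\text{Total Facility Power}}{\text{Total IT Power}}$$
$$\mathrm{ITUE} = \frac{\text{Total IT Power}}{\text{Total Compute Power}}$$
$$\mathrm{ERE} = \frac{\text{Total Facility Power} - \text{Total Reuse Power}}{\text{Total IT Power}}$$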
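The PUE, ITUE and ERE definitions translate directly into code, with TUE computed as ITUE × PUE. In this sketch the power values are hypothetical: a 1,000 kW IT load of which 900 kW reaches the computing components, 1,100 kW total facility power, and 300 kW of reused heat.

```python
def pue(total_facility_kw: float, total_it_kw: float) -> float:
    return total_facility_kw / total_it_kw


def itue(total_it_kw: float, total_compute_kw: float) -> float:
    return total_it_kw / total_compute_kw


def tue(total_facility_kw: float, total_it_kw: float, total_compute_kw: float) -> float:
    # TUE = ITUE * PUE, i.e. total facility power / total compute power
    return itue(total_it_kw, total_compute_kw) * pue(total_facility_kw, total_it_kw)


def ere(total_facility_kw: float, total_reuse_kw: float, total_it_kw: float) -> float:
    return (total_facility_kw - total_reuse_kw) / total_it_kw


if __name__ == "__main__":
    facility, it, compute, reuse = 1100.0, 1000.0, 900.0, 300.0  # hypothetical kW
    print(f"PUE  = {pue(facility, it):.2f}")           # 1.10
    print(f"ITUE = {itue(it, compute):.2f}")           # 1.11
    print(f"TUE  = {tue(facility, it, compute):.2f}")  # 1.22
    print(f"ERE  = {ere(facility, reuse, it):.2f}")    # 0.80
```

With no reuse (`total_reuse_kw = 0`), ERE equals PUE. Reusing enough heat pushes ERE below 1.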
Lenovo Cooling Technologies

Air Cooled (PUE ~2.0 – 1.5, ERE ~2.0 – 1.5)
Choose for the broadest choice of customizable options
• Standard airflow with internal fans, cooled by the room air conditioning
• Broadest choice of configurable options supported
• Relatively inefficient cooling

Air Cooled w/ Rear Door Heat Exchanger (PUE ~1.4 – 1.2, ERE ~1.4 – 1.2)
Choose for increased energy efficiency with broad choice
• Air cooled, but heat removed by the RDHx through chilled water
• Retains high flexibility
• Enables extremely tight rack placement
• Potentially room neutral

Direct Water Cooled (PUE <=1.1, ERE <=1.1)
Choose for max performance and high energy efficiency
• Most heat removed by the onboard water loop, with water temperatures up to 50°C
• Supports the highest-TDP CPUs at the densest footprint
• Higher performance
• Free cooling

Direct Water Cooled w/ Adsorption Chilling (PUE <=1.1, ERE <1)
Choose for max performance and max energy efficiency
• Waste heat reused to generate chilled water for cooling non-DWC components
• Retains the highest TDP, footprint and performance
• Potentially all system heat covered through DWC
Value of Direct Water Cooling with Free Cooling
• Reduced noise level in the data center
• Reduced server power consumption
 – Lower processor power consumption (~5%)
 – No fan per node (~4%)
• Reduced cooling power consumption
 – Free cooling all year long at 45°C (~25%)
• Energy Aware Scheduling
 – Only CPU-bound jobs get max frequency (~5%)
• CAPEX savings
 – Fewer conventional chillers for the computing system
Total energy savings: 35-40%
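The slide does not say how the individual percentages combine into the 35-40% total. Two simple readings bracket that range. One adds the savings; the other compounds them as successive reductions, each applied to what remains after the previous one. Treating all four items as fractions of the same total energy is an assumption of this sketch.

```python
from math import prod

# Per-item savings from the slide, as fractions (the ~ values)
SAVINGS = {
    "lower processor power": 0.05,
    "no fan per node": 0.04,
    "free cooling at 45 °C": 0.25,
    "energy aware scheduling": 0.05,
}

# Upper reading: simply add the individual savings
additive = sum(SAVINGS.values())
# Lower reading: each saving applies to the energy left after the others
compounded = 1.0 - prod(1.0 - s for s in SAVINGS.values())

if __name__ == "__main__":
    print(f"additive:   {additive:.1%}")    # 39.0%
    print(f"compounded: {compounded:.1%}")  # 35.0%
```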
Adsorption Chilling
The method of using solid materials for cooling via evaporation.
• An adsorption chiller consists of two identical vacuum containers, each containing two heat exchangers and water:
 – Adsorber (desorber): coated with the adsorbent (e.g. zeolite)
 – Evaporator (condenser): evaporation and condensation of water
• The adsorption process has two phases:
 – In the adsorption phase, the water on the evaporator is taken up by the coated material in the adsorber. This evaporation cools the evaporator and the water flowing through it, while the adsorber fills with water vapor and heats up the water flowing through it. When the adsorber is saturated, the process is reversed.
 – In the desorption phase, hot water is passed through the adsorber, which now acts as a desorber and releases the water vapor to the evaporator. The evaporator now acts as a condenser, condensing the vapor back to water. Again, the process is reversed when the adsorber is emptied.
[Figure: two-module adsorption chiller; Module 1: desorber and condenser; Module 2: adsorber and evaporator]
• Desorption: hot water from compute racks, 52°C / 46°C
• Condensation: cooling water to hybrid cooling tower, 26°C / 32°C
• Adsorption: cooling water to hybrid cooling tower, 26°C / 32°C
• Evaporation: chilled water to storage etc. racks, 23°C / 20°C
Value of Direct Water Cooling with Adsorption Chiller
• Reduced noise level in the data center
• Maximum TDP CPU choice
• Reduced server power consumption
 – Lower processor power consumption (~5%)
 – No fan per node (~4%)
• Reduced cooling power consumption
 – Free cooling all year long at 50°C (~25%)
 – Heat reuse generates 600 kW of cooling capacity (>5%)
• Energy Aware Runtime
 – Frequency optimization during runtime (~5%)
• CAPEX savings
 – Fewer conventional chillers for the computing system
Total energy savings: 40-50%