Symposium Green ICT and Sustainable Development: Making Progress in Higher Education
Sustainable Supercomputers
Walter Lioen <[email protected]>
Group Leader, Supercomputing
Supercomputing and Sustainability
January 31, 2013 Sustainable Supercomputing – Walter Lioen 2
Outline
• SURFsara
• Supercomputing
• Performance
  - TOP500
  - Green500
• Requirements
• Sustainability
  - Investment vs. Total Cost of Ownership
  - Energy efficiency:
    - Application throughput / TCO
    - Warm water cooling
    - On-demand growth
    - Energy-aware scheduling
About SURFsara
• SURFsara offers an integrated ICT research infrastructure and provides services in the areas of computing, data storage, visualization, networking, cloud and e-Science.
• SARA was founded in 1971 as an Amsterdam computing center by the two Amsterdam universities (UvA and VU) and the current CWI.
• Independent as of 1995.
• Founded Vancis in 2008, offering ICT services and ICT products to enterprises, universities, and educational and healthcare institutions.
• As of 1 January 2013, SARA (from then on SURFsara) forms part of the SURF Foundation.
• First supercomputer in The Netherlands in 1984 (Control Data Cyber 205); hosting the national supercomputer(s) ever since.
What is a Supercomputer?
• A supercomputer is a computer at the frontline of current processing capacity, particularly speed of calculation.
• Consequently, the specification of a supercomputer is constantly changing.
• Rule of thumb: a supercomputer is at least 1,000 to 10,000, and up to 100,000, times faster than an average PC.
Why supercomputing?
Large-scale scientific computing: simulation of processes that are otherwise
• impossible in practice
• too expensive
• too dangerous
• too extended
Examples:
• Astronomy
  - How did the universe begin?
  - How do stars form and evolve?
• Weather prediction, climatology
• Nuclear physics
• Aerodynamics (cars, planes, rockets)
• Biology (proteins, DNA, drugs)
• Medical sciences (bone formation, blood flow)
Top500: HPL benchmark
• HPL is a software package that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers.
• For Sequoia (currently no. 2), the problem size is n = 12,681,215.
• Computational kernel: DGEMM (matrix multiply)
  - Extremely efficient on all processors (in cache)
• Limiting factors:
  - Speed of the interconnect
  - Speed to (local accelerator) memory (e.g. for GPUs)
• However, far more important is application speed.
• "In Amsterdam a Ferrari is useless (speed-wise)."
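To make the "dense linear system" bullet concrete, the sketch below estimates the work and memory implied by Sequoia's problem size. It assumes the standard leading-order LU cost of (2/3)n³ flops and 8 bytes per double-precision matrix entry. The sustained rate of 16 PFlop/s is a hypothetical input, not a figure from this talk, and the function names are illustrative.

```python
def hpl_leading_flops(n: int) -> float:
    """Leading-order flop count of dense LU factorization: (2/3) * n**3.

    Lower-order terms (proportional to n**2) are ignored; they are
    negligible for the problem sizes used in TOP500 runs.
    """
    return 2.0 / 3.0 * float(n) ** 3


def hpl_matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Memory needed to store the dense n x n matrix (8 bytes per double)."""
    return n * n * bytes_per_element


def hpl_runtime_seconds(n: int, sustained_flops_per_s: float) -> float:
    """Rough wall-clock time of the factorization at a given sustained rate."""
    return hpl_leading_flops(n) / sustained_flops_per_s


n = 12_681_215  # Sequoia's problem size, as quoted on the slide
rate = 16e15    # hypothetical sustained rate (16 PFlop/s); substitute a measured value

print(f"operations : {hpl_leading_flops(n):.3e} flop")
print(f"matrix     : {hpl_matrix_bytes(n) / 1e15:.2f} PB")
print(f"runtime    : {hpl_runtime_seconds(n, rate) / 3600:.1f} h at {rate / 1e15:.0f} PFlop/s")
```

For n = 12,681,215 this comes to roughly 1.4 × 10²¹ operations and about 1.3 PB for the matrix alone, which is why HPL at this size only fits on machines with petabyte-scale distributed memory.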
Green500: TOP500 systems ranked by MFlop/s per W
November 2012:
• Positions 1–4:
  - commodity processors with coprocessors, or
  - commodity processors with graphics processing units (GPUs)
  - TOP500 #1 (Titan) is Green500 #3
• Positions 5–29: Blue Gene/Q
SURFsara National Supercomputing History
Year | Machine | Rpeak (GFlop/s) | Power (kW) | GFlop/s per kW
1984 | CDC Cyber 205 1-pipe | 0.1 | 250 | 0.0004
1988 | CDC Cyber 205 2-pipe | 0.2 | 250 | 0.0008
1991 | Cray Y-MP/4128 | 1.33 | 200 | 0.0067
1994 | Cray C98/4256 | 4 | 300 | 0.0133
1997 | Cray C916/121024 | 12 | 500 | 0.024
2000 | SGI Origin 3800 | 1,024 | 300 | 3.4
2004 | SGI Origin 3800 + SGI Altix 3700 | 3,200 | 500 | 6.4
2007 | IBM p575 Power5+ | 14,592 | 375 | 40
2008 | IBM p575 Power6 | 62,566 | 540 | 116
2009 | IBM p575 Power6 | 64,973 | 560 | 116
2013 | Bull bullx DLC | 250,000 | 260 | 962
2014 | Bull bullx DLC | >1,000,000 | >520 | 1923
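The last column is peak performance divided by power draw, so the table also shows how fast energy efficiency has improved. The sketch below recomputes that ratio for a few rows copied from the table and derives an average doubling time between 1984 and 2013. Treating the growth as a single exponential is a simplifying assumption for illustration.

```python
import math

# (year, machine, Rpeak in GFlop/s, power in kW), copied from the table above
HISTORY = [
    (1984, "CDC Cyber 205 1-pipe", 0.1, 250),
    (1991, "Cray Y-MP/4128", 1.33, 200),
    (2000, "SGI Origin 3800", 1_024, 300),
    (2009, "IBM p575 Power6", 64_973, 560),
    (2013, "Bull bullx DLC", 250_000, 260),
]


def efficiency(rpeak_gflops: float, power_kw: float) -> float:
    """Peak GFlop/s delivered per kW of power draw."""
    return rpeak_gflops / power_kw


def doubling_time_years(year0: int, eff0: float, year1: int, eff1: float) -> float:
    """Average time for efficiency to double between two data points."""
    return (year1 - year0) / math.log2(eff1 / eff0)


for year, machine, rpeak, kw in HISTORY:
    print(f"{year}  {machine:22s} {efficiency(rpeak, kw):12.4f} GFlop/s per kW")

y0, _, r0, p0 = HISTORY[0]
y1, _, r1, p1 = HISTORY[-1]
e0, e1 = efficiency(r0, p0), efficiency(r1, p1)
print(f"improvement {y0}-{y1}: {e1 / e0:,.0f}x")
print(f"average doubling time: {doubling_time_years(y0, e0, y1, e1):.2f} years")
```

Between the Cyber 205 and the 2013 Bull system, efficiency improves by a factor of about 2.4 million. That is a doubling roughly every 1.4 years, somewhat faster than the 18 to 24 month doubling periods usually associated with Moore's law.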
Top500 – iPad 2 performance
• An A5 processor core of an iPad 2 is as fast as a four-processor Cray 2 supercomputer (1.951 GFlop/s).
• In 1985 an eight-processor Cray 2 was the fastest supercomputer in the world.
• The iPad 2 would still have been listed in the Top500 of 1994.
Moore’s Law (1965)
• The number of transistors on an integrated circuit doubles every 2 years.
• Because transistors also get faster, speed doubles every 18 months.
• Clock speed stopped doubling a couple of years ago.
• Nowadays the number of cores doubles instead.
• Moore noted that if car manufacturers had something like this, cars would get 100,000 miles to the gallon and it would be cheaper to buy a Rolls-Royce than to park it. (Cars would also be only half an inch long.)
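The difference between a 2-year and an 18-month doubling period compounds quickly. The short sketch below evaluates the exponential growth factor 2^(t/T) over a decade for both periods. The 10-year horizon is an arbitrary choice made for illustration.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Exponential growth: 2 ** (elapsed time / doubling period)."""
    return 2.0 ** (years / doubling_period_years)


for label, period in [("transistors (2 years)", 2.0), ("speed (18 months)", 1.5)]:
    print(f"{label:22s} x{growth_factor(10, period):7.1f} after 10 years")
```

Over ten years the transistor count grows 32-fold, while an 18-month doubling gives roughly a 100-fold increase.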
Governance of the procurement
Selection committee:
• dr. ir. Anwar Osseyran (director SARA)
• prof. dr. Wim Liebrand (director SURF)
• prof. dr. Jacob de Vlieg (director NLeSC)
• prof. dr. ir. Henk Dijkstra (chairman NWO-WGS)
Technical advisory committee (SARA):
• Walter Lioen (system architecture, applications & benchmarks)
• Huub Stoffers (system architecture, storage, system management)
• Aad van der Steen (system architecture, applications & benchmarks)
• Mark van de Sanden (system architecture and storage)
• Peter Michielse (general, phasing and vice-chair)
• Axel Berg (general, datacenter and chair)
Extensive requirements analysis
• Interviews with top 25 users of Huygens (mid 2011)
• Workshop grand challenge experiences (April 29, 2011)
• Detailed analysis of Huygens resource usage (mid 2011 – Q1 2012):
  - Which user applications (2008 – 2012)
  - Scaling of applications (current use and scaling potential)
  - Actual memory usage
  - I/O profiles
• HPC market and technology assessment
From requirements analysis to technical requirements for the procurement
[Diagram: user requirements, system statistics and HPC market analysis lead to the technical requirements and the application benchmark suite]
Most important technical requirements (1/2)
Compute & processor architecture
• General-purpose capability system
• Large number of thin compute nodes:
  - at least 16 cores
  - at least 1 GB memory/core, 2 GB highly preferred
• Small number of fat compute nodes:
  - at least 32 cores
  - at least 4 GB memory/core, 8 GB highly preferred
Concept of thin node and fat node islands:
• Non-blocking, low-latency interconnect within thin node islands (at least 4,096 cores) and within the fat node island (at least 1,024 cores)
• Interconnect bandwidth among islands not to be pruned by more than a factor of the order of 4:1
Most important technical requirements (2/2)
Accelerators
• Initially only if the application benchmark shows a real benefit
• Option to add accelerators during the course of the contract
I/O
• I/O bandwidth to scratch: at least 0.15 GB/s per TFlop/s
• Disk space for scratch/project: at least 5 TB per TFlop/s
Energy and cooling efficiency
• Costs for power and cooling are included in the Total Cost of Ownership (TCO) equation; the vendor is to optimize power-related costs vs. investment costs
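The I/O requirements scale linearly with peak performance, so they translate directly into absolute targets. The sketch below assumes the bandwidth figure means GB/s per TFlop/s of peak. It applies both ratios to the Phase 0 (89 TFlop/s) and Phase 1 (~270 TFlop/s) peaks quoted on the Cartesius specification slide. The constant and function names are illustrative.

```python
SCRATCH_BW_GBPS_PER_TFLOPS = 0.15  # assumed unit: GB/s of scratch bandwidth per TFlop/s peak
DISK_TB_PER_TFLOPS = 5.0           # minimum scratch/project disk space per TFlop/s peak


def io_requirements(peak_tflops: float) -> tuple[float, float]:
    """Return (minimum scratch bandwidth in GB/s, minimum disk space in TB)."""
    return (peak_tflops * SCRATCH_BW_GBPS_PER_TFLOPS,
            peak_tflops * DISK_TB_PER_TFLOPS)


for phase, peak in [("Phase 0", 89), ("Phase 1", 270)]:
    bw, disk = io_requirements(peak)
    print(f"{phase}: >= {bw:.1f} GB/s to scratch, >= {disk:,.0f} TB of disk")
```

For Phase 1, for example, the ratios imply at least about 40 GB/s to scratch and 1,350 TB of scratch/project space.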
Application Benchmark Suite
• Application benchmark codes were selected based on use, spread across scientific areas, and (potential) scaling
• These 7 codes represent 50% of the workload on Huygens (2008 – 2012)
• The final application benchmark set was selected in consultation with NWO-WGS
Benchmark code | Scientific area | Scaling (MPI tasks) | Weight
ADF | Quantum chemistry | 384 | 10%
GROMACS | MD | 2048, 1024, 4096 | 20%
POP | Ocean circulation | 1280, 640, 2560 | 15%
SPARKLE | CFD | 1024 | 15%
SPO-DVR | Molecular QD | 512, 256, 1024 | 10%
SUSHI | Cosmology | 2048 | 15%
VASP | ab-initio QM-MD | 128 | 15%
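The weights in the table add up to 100%, so they can be used to collapse per-code results into a single number. The slide does not say how results were combined. The sketch below assumes one simple rule, a weighted arithmetic mean of per-code speedups relative to a reference run. Both that rule and the example speedups are hypothetical.

```python
# Weights from the benchmark table, expressed as fractions of 1.0.
WEIGHTS = {
    "ADF": 0.10, "GROMACS": 0.20, "POP": 0.15, "SPARKLE": 0.15,
    "SPO-DVR": 0.10, "SUSHI": 0.15, "VASP": 0.15,
}


def weighted_score(speedups: dict[str, float]) -> float:
    """Weighted arithmetic mean of per-code speedups (hypothetical scoring rule)."""
    missing = WEIGHTS.keys() - speedups.keys()
    if missing:
        raise ValueError(f"missing results for: {sorted(missing)}")
    return sum(weight * speedups[code] for code, weight in WEIGHTS.items())


# Hypothetical speedups of a candidate system relative to a reference run.
example = {"ADF": 2.0, "GROMACS": 3.0, "POP": 2.5, "SPARKLE": 2.0,
           "SPO-DVR": 1.5, "SUSHI": 3.5, "VASP": 2.0}
print(f"weighted score: {weighted_score(example):.2f}")
```

A weighted geometric mean would be an equally defensible choice, because it keeps one outlier code from dominating the score. Which rule is appropriate is a policy decision the sketch does not settle.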
Energy and cooling efficiency
Costs and sustainability are important: overall application performance per watt.
• Energy efficiency of the supercomputer system:
  - energy use under full load
  - energy use when idle
  - average energy use of the running system
• Efficiency of cooling the supercomputer:
  - air cooling: efficiency factor 1.6
  - water cooling (< 30 °C): efficiency factor 1.4
  - warm water cooling (> 30 °C): efficiency factor 1.2
• Advantage of warm water cooling over air cooling and 'cold' water cooling:
  - when the inlet temperature of the water is 30 °C or higher, we can assume free cooling all year
  - in Amsterdam, the maximum temperature exceeds 30 °C on only 0.9% of days per year
  - all thin compute nodes of the new Bull system are Direct Liquid Cooled with an inlet temperature of 35 °C
• Energy efficiency when using the supercomputing system:
  - the CPU frequency is no longer fixed
  - optimization of the CPU frequency per application becomes possible, and so do energy/application-aware scheduling technologies
  - evolution towards an energy budget instead of a CPU-time budget for users
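The cooling efficiency factors can be turned into yearly energy figures. The sketch below assumes each factor means total energy divided by IT energy, similar to PUE. It uses the 260 kW listed for the 2013 Bull system in the history table as a constant IT load, which is a simplification, since real load varies. The result is an energy comparison only, with no electricity price assumed.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h, ignoring leap years

COOLING_FACTORS = {
    "air": 1.6,
    "water (< 30 °C)": 1.4,
    "warm water (> 30 °C)": 1.2,
}


def annual_energy_kwh(it_power_kw: float, factor: float) -> float:
    """Total yearly energy, assuming factor = total power / IT power."""
    return it_power_kw * factor * HOURS_PER_YEAR


it_kw = 260  # 2013 Bull bullx DLC row of the history table, taken as constant load
baseline = annual_energy_kwh(it_kw, COOLING_FACTORS["air"])
for name, factor in COOLING_FACTORS.items():
    energy = annual_energy_kwh(it_kw, factor)
    print(f"{name:22s} {energy:12,.0f} kWh/yr  saving vs air: {baseline - energy:10,.0f} kWh/yr")

# 0.9% of days above 30 °C corresponds to only a few days per year.
print(f"days above 30 °C per year: {0.009 * 365:.1f}")
```

Under these assumptions, warm water cooling saves about 0.9 GWh per year compared with air cooling at this load.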
Phasing and on-demand growth requirements
Basic principle: stepwise growth of capacity with demand
• Cost-effective use of available funding
• Less favorable for the Top500 ranking
Phasing:
• Phase 0: as soon as possible in 2013:
  - installation of 1.5 × current Huygens capacity (~100 TFlop/s)
• Phase 1: as soon as possible in 2013 (taking advantage of the latest technology):
  - installation of 3–4 × current Huygens capacity (195–260 TFlop/s)
• Phase 2: in 2014 (partly dependent on available technology):
  - on-demand installation of at least 6–10 × current Huygens capacity (at least 390–650 TFlop/s), depending on user demand
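The TFlop/s targets follow from multiplying the Huygens capacity by each phase's factor. The sketch below takes Huygens' 2009 Rpeak from the history table (64,973 GFlop/s) as the baseline, which is an assumption about which figure "current Huygens capacity" refers to. It reproduces the ranges on the slide after rounding.

```python
HUYGENS_TFLOPS = 64.973  # 2009 IBM p575 Power6 Rpeak from the history table

# (low, high) multiples of the current Huygens capacity per phase
PHASES = {"Phase 0": (1.5, 1.5), "Phase 1": (3, 4), "Phase 2": (6, 10)}


def target_range(multiples: tuple[float, float], base: float = HUYGENS_TFLOPS) -> tuple[float, float]:
    """Convert capacity multiples into a TFlop/s range."""
    low, high = multiples
    return low * base, high * base


for phase, multiples in PHASES.items():
    low, high = target_range(multiples)
    print(f"{phase}: {low:.0f} - {high:.0f} TFlop/s")
```

With this baseline, Phase 1 comes out at 195 to 260 TFlop/s and Phase 2 at 390 to 650 TFlop/s, matching the slide.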
Awarding requirements & weight
Awarding requirement | Weight
AR1 Hardware Requirements | 10%
AR2 File system and I/O | 10%
AR3 Software Requirements | 10%
AR4 Operational Requirements (including energy usage) | 15%
AR5 Maintenance, Support, Documentation and Training Requirements | 5%
AR6 Applications Performance (through Applications Benchmark Suite) | 40%
AR7 On-demand growth, phasing, partnership in innovation | 10%
Total | 100%
Specs of the new Cartesius supercomputer
Phase 0 (scheduled production May 2013, total peak performance 89 TFlop/s)
• Fat node island (22 TFlop/s peak):
  - 32 fat nodes, 4 × 8-core Intel Sandy Bridge CPUs per node, 256 GB per node
• Thin node island (67 TFlop/s peak):
  - 202 thin nodes, 2 × 8-core Intel Sandy Bridge CPUs per node, 64 GB per node
Phase 1 (scheduled production July 2013, total peak performance ~270 TFlop/s)
• Replacement of all thin nodes
• Installation of thin node islands with the latest Intel Ivy Bridge CPUs:
  - ~13,000 cores, 64 GB per node
Phase 2 (scheduled production from 2H 2014, total peak performance > 1 PFlop/s)
• On-demand addition of thin node islands with the latest Intel Haswell CPUs
Phase 1–2 (on-demand accelerator option)
• Addition of nodes with NVIDIA GPUs or Intel Xeon Phi
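The Phase 0 node specifications can be checked against the node-level technical requirements from the procurement. The sketch below derives cores per node and memory per core for each island and compares them with the minimum and preferred values. It also estimates the clock rate implied by each island's quoted peak. That estimate assumes 8 double-precision flops per core per cycle (AVX on Sandy Bridge), because the slides do not state clock rates. Class and field names are illustrative.

```python
from dataclasses import dataclass

# Assumption: 8 double-precision flops per core per cycle (AVX on Sandy Bridge:
# one 4-wide add plus one 4-wide multiply). The slides do not state clock rates.
DP_FLOPS_PER_CYCLE = 8

# (min cores/node, min GB/core, preferred GB/core) from the technical requirements
REQUIREMENTS = {"thin": (16, 1.0, 2.0), "fat": (32, 4.0, 8.0)}


@dataclass
class Island:
    kind: str              # "thin" or "fat"
    nodes: int
    sockets_per_node: int
    cores_per_socket: int
    mem_gb_per_node: int
    peak_tflops: float

    @property
    def cores_per_node(self) -> int:
        return self.sockets_per_node * self.cores_per_socket

    @property
    def mem_gb_per_core(self) -> float:
        return self.mem_gb_per_node / self.cores_per_node

    def implied_clock_ghz(self) -> float:
        """Clock rate consistent with the quoted peak, given DP_FLOPS_PER_CYCLE."""
        total_cores = self.nodes * self.cores_per_node
        return self.peak_tflops * 1e3 / (total_cores * DP_FLOPS_PER_CYCLE)

    def check(self) -> dict[str, bool]:
        min_cores, min_mem, preferred_mem = REQUIREMENTS[self.kind]
        return {
            "cores": self.cores_per_node >= min_cores,
            "memory": self.mem_gb_per_core >= min_mem,
            "memory (preferred)": self.mem_gb_per_core >= preferred_mem,
        }


PHASE0 = [
    Island("fat", nodes=32, sockets_per_node=4, cores_per_socket=8,
           mem_gb_per_node=256, peak_tflops=22),
    Island("thin", nodes=202, sockets_per_node=2, cores_per_socket=8,
           mem_gb_per_node=64, peak_tflops=67),
]

for isl in PHASE0:
    print(f"{isl.kind:4s}: {isl.cores_per_node} cores/node, "
          f"{isl.mem_gb_per_core:.0f} GB/core, checks={isl.check()}, "
          f"implied clock ~{isl.implied_clock_ghz():.2f} GHz")
```

Both node types meet even the "highly preferred" memory per core (8 GB for fat nodes, 4 GB for thin nodes). Under the stated flops-per-cycle assumption, the quoted peaks correspond to clock rates of roughly 2.6 to 2.7 GHz.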
Phased installation and on-demand growth
1. Data center preparation and delivery (Dec 2012 – Feb 2013)
2. Installation of phase 0 (Feb – April 2013)
3. (Early) access for users (May 2013)
4. Upgrade to phase 1, phase out of Huygens (May 2013)
5. Production of full phase 1 (July 2013)
6. On-demand growth to > 1 PFlop/s (2014 H2)
PRACE 2IP prototype: Scalable Hybrid Architecture – CSC, Finland
EU collaboration: CSC, SURFsara, CSCS
T-Platforms "T-REX" architecture:
• 192 compute nodes
  - 48 NVIDIA Kepler, 48 Intel MIC
  - ~300 TFlop/s (~3 GFlop/s/W)
SURFsara research topics:
• Programming paradigms
  - application porting to accelerator + MPI
• Energy policies
  - Dynamic Voltage and Frequency Scaling (DVFS): adjust the frequency and voltage of the CPU; the actual workload determines which frequency/voltage is chosen.
  - Dynamic Power Management (DPM): power off a device when it becomes idle; reactivation temporarily uses more energy.
  - Perhaps a hybrid policy, e.g. a mix of DPM and DVFS, is preferable.
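Which of these policies wins depends on how dynamic power, static power and wake-up cost compare. The toy model below is a sketch under stated assumptions: a CPU-bound task with a fixed cycle count, voltage proportional to frequency, and made-up power coefficients. It is not SURFsara's policy or measured prototype data. It compares running just fast enough to meet a deadline (DVFS), running at full speed and then idling or powering off (DPM), and a hybrid that scans frequencies and powers off afterwards.

```python
from dataclasses import dataclass


@dataclass
class CpuModel:
    """Toy power model; every default value is an illustrative assumption."""
    f_max_ghz: float = 2.7
    v_max: float = 1.1           # supply voltage (V) at f_max
    k: float = 20.0              # dynamic power coefficient, W / (V^2 * GHz)
    p_static_w: float = 30.0     # leakage/uncore power whenever the device is on
    p_idle_w: float = 40.0       # power when on but idle
    wake_energy_j: float = 50.0  # extra energy to power the device back on (DPM)

    def voltage(self, f_ghz: float) -> float:
        # Assumption: voltage scales linearly with frequency.
        return self.v_max * f_ghz / self.f_max_ghz

    def busy_power(self, f_ghz: float) -> float:
        # Dynamic power ~ k * V^2 * f, plus static power.
        return self.k * self.voltage(f_ghz) ** 2 * f_ghz + self.p_static_w


def dvfs_energy(cpu: CpuModel, gcycles: float, window_s: float) -> float:
    """DVFS: run just fast enough to finish exactly at the end of the window."""
    f = gcycles / window_s
    if f > cpu.f_max_ghz:
        raise ValueError("deadline cannot be met")
    return cpu.busy_power(f) * window_s


def race_to_idle_energy(cpu: CpuModel, gcycles: float, window_s: float,
                        use_dpm: bool = True) -> float:
    """Run at f_max, then power off (DPM) or stay idle for the rest of the window."""
    busy_s = gcycles / cpu.f_max_ghz
    if busy_s > window_s:
        raise ValueError("deadline cannot be met")
    energy = cpu.busy_power(cpu.f_max_ghz) * busy_s
    rest_s = window_s - busy_s
    if rest_s > 0:
        energy += cpu.wake_energy_j if use_dpm else cpu.p_idle_w * rest_s
    return energy


def hybrid_energy(cpu: CpuModel, gcycles: float, window_s: float, steps: int = 50) -> float:
    """Hybrid: scan frequencies from the DVFS minimum to f_max, power off afterwards."""
    f_min = gcycles / window_s
    if f_min > cpu.f_max_ghz:
        raise ValueError("deadline cannot be met")
    best = float("inf")
    for i in range(steps + 1):
        f = f_min + (cpu.f_max_ghz - f_min) * i / steps
        busy_s = gcycles / f
        energy = cpu.busy_power(f) * busy_s
        if window_s - busy_s > 1e-9:  # idle time left: pay for powering back on
            energy += cpu.wake_energy_j
        best = min(best, energy)
    return best


cpu = CpuModel()
work, window = 27.0, 20.0  # 27 Gcycles of work (10 s at f_max) in a 20 s window
print(f"DVFS        : {dvfs_energy(cpu, work, window):7.1f} J")
print(f"race + DPM  : {race_to_idle_energy(cpu, work, window):7.1f} J")
print(f"race + idle : {race_to_idle_energy(cpu, work, window, use_dpm=False):7.1f} J")
print(f"hybrid      : {hybrid_energy(cpu, work, window):7.1f} J")
```

With these invented parameters DVFS comes out ahead, because dynamic power falls roughly with the cube of the frequency. Raising `p_static_w` shifts the balance toward running at full speed and powering off, which is why a hybrid policy can be preferable.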
Sustainability of / in / by Supercomputing – Summary
• Funding of NL supercomputing
  - SARA → SURFsara
• Requirements
  - general purpose: memory/core, not yet accelerators (for the largest part), ...
  - (sustainability of parallel programming paradigms, think CUDA)
• Performance
  - application throughput: 7 most relevant applications, # jobs / lifetime
  - additional "application enabling effort": 3 new FTE (optimization, parallelization, scaling)
• Phasing
  - state-of-the-art processors (higher performance / lower energy)
• Energy
  - using "slower" processors (lower clock)
  - on-demand growth
• Cooling
  - warm water cooling → free cooling
  - cold corridors
  - (water-cooled doors)
• Price
  - TCO: total budget = investment + energy + cooling + housing + UPS (storage only)
• Price/performance: a hard optimization problem
  - maximization of application throughput / TCO: left as an "exercise" for the vendor
• Last but not least
  - Greening by IT is one of the supercomputing application areas
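The "exercise" left to the vendor can be phrased as maximizing application throughput per unit of TCO. The sketch below follows the summary's budget breakdown (investment + energy + cooling + housing + UPS). It treats the cooling factor as total power divided by IT power, which is an assumption. The two offers, the electricity price and the throughput figures are entirely hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Offer:
    """A hypothetical vendor offer; all fields are illustrative inputs."""
    name: str
    investment: float       # purchase price
    it_power_kw: float      # average IT power draw
    cooling_factor: float   # assumed meaning: total power / IT power
    housing_per_year: float
    ups_per_year: float
    jobs_per_year: float    # application throughput, e.g. derived from benchmarks


def tco(o: Offer, years: float, price_per_kwh: float) -> float:
    """Total budget = investment + energy + cooling + housing + UPS."""
    energy = o.it_power_kw * HOURS_PER_YEAR * years * price_per_kwh
    cooling = energy * (o.cooling_factor - 1.0)
    return o.investment + energy + cooling + (o.housing_per_year + o.ups_per_year) * years


def throughput_per_tco(o: Offer, years: float, price_per_kwh: float) -> float:
    """Jobs completed over the lifetime per unit of total cost."""
    return o.jobs_per_year * years / tco(o, years, price_per_kwh)


offers = [
    Offer("A: faster clock, air cooled", 10e6, 400, 1.6, 100e3, 20e3, 50_000),
    Offer("B: lower clock, warm water", 10e6, 300, 1.2, 100e3, 20e3, 45_000),
]
for o in offers:
    print(f"{o.name:30s} TCO {tco(o, 5, 0.10):>14,.0f}  "
          f"jobs per unit cost {throughput_per_tco(o, 5, 0.10):.5f}")
best = max(offers, key=lambda o: throughput_per_tco(o, 5, 0.10))
print("best:", best.name)
```

With these invented numbers the two offers end up within about 1% of each other. Small changes in energy price, lifetime or throughput can therefore flip the ranking, which illustrates why the summary calls price/performance a hard optimization problem.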
Thank you for listening!