Prof. dr. sc. Zlatan Car
2016 ANSYS Convergence Conference, Ljubljana
Rmax: 233.565 TFlop/s
Rpeak: 287.539 TFlop/s
Peak Power (kW): 108.48
Processor: Xeon E5-2690 v3 12C 2.6 GHz
Sockets per Node: 2
Cores per Socket: 12
Nodes: 288
Primary Interconnect: InfiniBand FDR
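As a sanity check, the Rpeak above follows directly from these figures. Each core retires a fixed number of double-precision FLOPs per cycle; 16 FLOPs/cycle is the standard AVX2+FMA figure for a Haswell-generation Xeon such as the E5-2690 v3 (our assumption, not stated on the slide). A minimal Python sketch:

```python
# Theoretical peak (Rpeak) of the thin-node partition.
nodes = 288
sockets_per_node = 2
cores_per_socket = 12
clock_ghz = 2.6
flops_per_cycle = 16  # assumed: AVX2 with 2 FMA units on Haswell

cores = nodes * sockets_per_node * cores_per_socket  # 6912 cores
rpeak_tflops = cores * clock_ghz * flops_per_cycle / 1e3
print(rpeak_tflops)            # 287.5392 TFlop/s, matching the quoted Rpeak
print(233.565 / rpeak_tflops)  # Rmax/Rpeak ~ 0.812
```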
Schematic Architecture
Supercomputer BURA
• Hybrid computer architecture: a combined multiprocessor and multicomputer system.
• The shared-memory multiprocessor (SMP) part comprises 12 TB of memory, 512 processor cores and 245 TB of local data storage.
• The multicomputer part is a compute cluster of 288 nodes (6,912 processor cores); each node has 64 GB of memory and 320 GB of local storage. In total the cluster holds 18 TB of RAM, and the disk systems provide 850 TB of capacity, as the short check below confirms.
• Four nodes carry two accelerators each.
• The central data storage system has a capacity of 1 PB; the data archive holds 2.5 PB.
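The per-node figures multiply out to the cluster totals quoted above; a quick check in Python (the 850 TB disk total exceeds the sum of the node-local disks because it also counts the other disk systems):

```python
# Cross-check of the cluster totals from the per-node figures.
nodes = 288
cores_per_node = 2 * 12                 # 2 sockets x 12 cores
mem_per_node_gb = 64

print(nodes * cores_per_node)           # 6912 cores
print(nodes * mem_per_node_gb / 1024)   # 18.0 TB of RAM
```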
Supercomputer BURA
Cooling system consists of 2 subsystems:
• WARM WATER subsystem for direct liquid cooling of the compute nodes.
• COLD WATER subsystem for cooling the remaining components, including 4 cooling chambers of the data center (cooling power is 210 kW, with another 210 kW in redundancy).
• Both subsystems can operate in free-cooling mode when the outside temperature is below 27 (24) °C; the corresponding inlet water temperature is 35 (30) °C (see the sketch after this list).
• 550 meters of piping and 10 pumps are installed in the cooling system. For automation, 45 relays for digital outputs and 110 sensors of various types for analog and digital inputs and outputs are integrated; interconnecting the control components alone required 5 km of cable.
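To make the free-cooling rule concrete, here is a minimal, hypothetical sketch of the threshold logic described above; the function name, the dictionary layout and the absence of hysteresis are our assumptions, not details of BURA's actual building-management system:

```python
# Hypothetical free-cooling decision per the thresholds above:
# warm-water loop switches at 27 C outside (35 C inlet water),
# cold-water loop at 24 C outside (30 C inlet water).
FREE_COOLING_LIMIT_C = {"warm_water": 27.0, "cold_water": 24.0}

def free_cooling_allowed(subsystem: str, outside_temp_c: float) -> bool:
    """True if the subsystem may run without active chillers."""
    return outside_temp_c < FREE_COOLING_LIMIT_C[subsystem]

print(free_cooling_allowed("warm_water", 22.0))  # True
print(free_cooling_allowed("cold_water", 25.0))  # False
```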
Bura Supercomputer
bullx DLC B700 Rack Solution
Platform: bullx DLC B700
Sockets: Intel® Xeon EP E5-2690 v3
High-speed interconnect network: InfiniBand Mellanox 4x FDR (56 Gb/s)
Management network: Ethernet 1 Gbit, Cisco 3750
Large-memory nodes: bullx S6130
Service & I/O nodes: bullx R423-E4i
Scratch storage: Lustre file system
Home storage: NFS based*
Software stack/OS: bullx scs4 / RHEL 6
HSM solution: Graudata (GAM) software
▶ Mature Technology
▶ Second DLC Blade Generation
▶ Standard Architecture
[Figure: bullx Direct Liquid Cooling system — Cooperative Power Chassis (CPC) and bullx DLC B700 chassis, front and rear views]
Compute Solution Based on Conventional CPUs
bullx DLC B720 Blade

Conventional nodes:
Racks DLC B700: 4
B700 chassis: 16
B720 compute blades: 144
Compute nodes: 288
Intel® E5-2690 v3 sockets (12c, 2.6 GHz, 30 MB, 135 W): 576

[Figure: bullx DLC B700 rack and chassis]

287.5 TFlops peak
18.5 TB memory
78.7 % HPL efficiency
Compute Solution Based on Accelerator Nodes
Accelerator nodes:
Racks DLC B700: no add-on
B700 chassis: 1
B715 compute blades: 4
Compute nodes: 4
NVIDIA K40: 8
▶ Per node
– 2 Ivy Bridge-EP Xeon E5-2650 v2
– 64 GB memory (8 × 8 GB DDR3 DIMMs, 1866 MT/s)
– 2 NVIDIA K40 GPUs
– 1 HDD 320 GB
– 1 IB 4x FDR C-IB NIC
– 1 Ethernet 1 Gb
– 1 BMC
[Block diagram: bullx B715 accelerator blade — two Ivy Bridge (IVB) sockets linked by QPI at 8 GT/s, four DDR3-1600 channels per socket (51.2 GB/s); ConnectX-3 InfiniBand 4x FDR port (56 Gb/s) on a PCIe-3 x8 link; two accelerators, each on a PCIe-3 x16 link; PCH attached via DMI, serving HDD/SSD, GbE and the BMC]
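The 51.2 GB/s figure in the diagram is simply the four DDR3-1600 channels per socket multiplied out (8 bytes per transfer):

```python
# Per-socket memory bandwidth: 4 channels of DDR3-1600.
channels, transfers_per_s, bytes_per_transfer = 4, 1600e6, 8
print(channels * transfers_per_s * bytes_per_transfer / 1e9)  # 51.2 GB/s
```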
bullx S6130
▶ 2 bullx S6130 octo-modules
▶ Second generation of bullx shared-memory nodes
▶ Based on the BCS2 chip & Intel Haswell-EX sockets (Intel Brickland platform)
▶ Local storage attachment (3x NetApp E5500)
Per bullx S6130:
Bi-socket modules: 8
Sockets: 16
SKU: E7-8867 v3 (16c, 2.5 GHz, 9.6 GT/s, 45 MB, 165 W)
Shared memory: 6 TB (384 × 16 GB RDIMM)
SAS RAID boards: 4
HDD 600 GB SAS 10k: 8
10G ports (RJ45): 2
BMC: 1
Octo-module interconnect: all-to-all topology; XQPI links at 14 GT/s unidirectional
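These per-system figures line up with the SMP totals on the earlier slide (512 cores, 12 TB); a quick check:

```python
# Combined totals for the two S6130 octo-modules.
sockets, cores_per_socket = 16, 16      # E7-8867 v3: 16 cores each
dimms, dimm_gb = 384, 16

print(2 * sockets * cores_per_socket)   # 512 cores
print(2 * dimms * dimm_gb / 1024)       # 12.0 TB of shared memory
```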
Bura Supercomputer
Performance
System | Benchmark | Test case | Metric | Commitment | Final
Thin Nodes | HPL | – | Performance | 234.9 TFlops | 233.56 TFlops
Thin Nodes | HPL | – | Power consumption at Rmax | 123.6 kW | 108.48 kW
Thin Nodes | HPL | – | Computational efficiency | 81.7 % | 98.97 %
Thin Nodes | HPL | – | Power efficiency | 1.90 TFlops/kW | 2.15 TFlops/kW
Thin Nodes | SPEC MPI 2007 | – | SPECmpiL_2007_base | 90 | 90.8 (92.4)
Thin Nodes | OpenFOAM | Motorbike | ClockTime | 3.66 min | 3.53 min
Thin Nodes | LS-DYNA | Car2Car | Elapsed time | 25.41 min | 22.00 min
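The power-efficiency row is just the HPL performance divided by the power drawn at Rmax:

```python
# Power efficiency = HPL result / power at Rmax.
print(234.9 / 123.6)    # ~1.90 TFlops/kW (commitment)
print(233.56 / 108.48)  # ~2.15 TFlops/kW (final)
```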
Bura Supercomputer
Performance
System | Benchmark | Test case | Metric | Commitment | Final
SMP Nodes | iozone 3.347 | – | Read throughput | 12 GB/s | 12.50 GB/s
SMP Nodes | iozone 3.347 | – | Write throughput | 12 GB/s | 13.21 GB/s
SMP Nodes | SPEC OMP 2012 | – | SPECompG_2012_base | 43 | 47.9
SMP Nodes | Gaussian G09 D.01 | g09-big.com | Wall-clock time | 5.83 h | 5.13 h
SMP Nodes | Abaqus | s4e | Elapsed time | 1.08 h | 0.93 h
Scratch | iozone 3.347 | – | Read throughput | 30 GB/s (min) | 37.91 GB/s
Scratch | iozone 3.347 | – | Write throughput | 30 GB/s (min) | 32.85 GB/s
FDR | IMB Ping Pong | – | Throughput | 6.31 GB/s | 6.31 GB/s
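The ping-pong result sits close to the theoretical ceiling of the link: 4x FDR runs four lanes at 14.0625 Gb/s with 64b/66b encoding, giving roughly 6.8 GB/s of payload bandwidth (the lane rate and encoding are standard InfiniBand FDR parameters, not taken from the slides):

```python
# 4x FDR InfiniBand: 4 lanes x 14.0625 Gb/s raw, 64b/66b encoded.
lanes, lane_gbps = 4, 14.0625
payload_gbps = lanes * lane_gbps * 64 / 66  # ~54.5 Gb/s usable
print(payload_gbps / 8)                     # ~6.82 GB/s ceiling
print(6.31 / (payload_gbps / 8))            # measured ~93% of the ceiling
```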
Running on BURA / What do we compute?
• Design of complex computational systems (HPC Cloud)
• Design of low-power computing
• Computational chemistry and biology
• Computer-aided drug design
• Cyber security
• Structural mechanical engineering and design
• Advanced scheduling and planning systems
• Advanced weather forecasting
• Air pollution
• Fluid dynamics
Example of calculation
Prof. dr. sc. Zlatan Car
Prof. dr. sc. Željko Svedružić (Department of Biotechnology, UNIRI)
Center for Advanced Computing and Modelling

Consulting expertise
• Technical expertise:
  • Design of high-performance computing systems (cluster, hybrid cluster, SMP)
  • Design of data storage
  • Design of complex green data centers
• Procurement expertise:
  • Preparation of procurement documentation (technical and capacity requirements)
  • Leading procurement: open calls and competitive negotiated dialogue
Collaboration / Joint projects
Email: [email protected]
Web: www.cnrm.uniri.hr