EURORA the greenest supercomputer Carlo Cavazzoni, CINECA 8/7/2013


The Green500 List - June 2013

#1 Eurora - CINECA
#2 Aurora Tigon - Selex ES Chieti
#3 Beacon - National Institute for Computational Sciences/University of Tennessee
#4 SANAM - King Abdulaziz City for Science and Technology
#5 IBM Thomas J. Watson Research Center
#6 Cetus - DOE/SC/Argonne National Laboratory
#7 CADMOS BG/Q - Ecole Polytechnique Federale de Lausanne
#8 Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw
#9 Vesta - DOE/SC/Argonne National Laboratory
#10 University of Rochester

EURORA (EURopean many integrated cORe Architecture) Prototype Project

Funded by the PRACE 2IP EU project, grant agreement number RI-283493

Goal: evaluate new architectures for next-generation Tier-0 systems

PRACE partners involved: CINECA (Italy), GRNET (Greece), IPB (Serbia), NCSA (Bulgaria)

Vendor: Eurotech

EURORA project objectives

Address today's HPC constraints: Flops/Watt, Flops/m2, Flops/Dollar.

Efficient cooling technology: hot water cooling (free cooling); measure power efficiency; evaluate PUE and TCO.

Improve application performance: at the same rate as in the past (~Moore’s Law); new programming models.

Evaluate hybrid (accelerated) technology: Intel Xeon Phi; NVIDIA Kepler.

Custom interconnection technology: 3D torus network (FPGA); evaluation of accelerator-to-accelerator communications.

EURORA chassis

1 rack, 16 chassis

Chassis: 16 node cards, or 8 node cards + 16 accelerators

Eurora Rack

Physical dimensions: 2133 mm (48U) h, 1095 mm w, 1500 mm d

Weight (full rack, with cooling fully loaded with water): 2000 kg

Typical power/cooling requirements: 120-130 kW @ 48 Vdc
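The rack figures above imply a fairly large DC current on the 48 V supply. The sketch below works through that arithmetic (I = P / V) and also shows the average power per chassis. The even split across the 16 chassis is an assumption made for illustration; the slides do not give a per-chassis breakdown.

```python
# Figures taken from the rack slide: 120-130 kW typical at 48 V DC,
# with 16 chassis per rack.
RACK_POWER_KW = (120.0, 130.0)
SUPPLY_VOLTAGE_V = 48.0
CHASSIS_PER_RACK = 16


def supply_current_a(power_kw: float, voltage_v: float) -> float:
    """DC current in amperes for a given power and voltage (I = P / V)."""
    return power_kw * 1000.0 / voltage_v


def power_per_chassis_kw(power_kw: float, chassis: int) -> float:
    """Average power per chassis, ASSUMING an even split (not stated in the slides)."""
    return power_kw / chassis


for p in RACK_POWER_KW:
    amps = supply_current_a(p, SUPPLY_VOLTAGE_V)
    per_chassis = power_per_chassis_kw(p, CHASSIS_PER_RACK)
    print(f"{p:.0f} kW -> {amps:.0f} A at 48 V, {per_chassis:.2f} kW per chassis")
```

Under these assumptions a fully loaded rack draws roughly 2500 to 2700 A, or about 7.5 to 8.1 kW per chassis.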

Cooling

Hot water, 50-80 °C

Temperature gap 3-5 °C

No rotating fans

Cold plates: direct on-component liquid cooling

Dry chillers

Free cooling

Temperature sensors: downgrade performance if required

System isolation

Quick disconnect

EURORA node

2 Intel Xeon E5

2 Intel MIC or 2 NVIDIA Kepler

16 GByte DDR3 1.6 GHz

SSD disk

[Figure: node card, shown with Xeon Phi and K20 accelerators]

EURORA Network

3D Torus custom network
  FPGA (Altera Stratix V)
  APENET
  Ad-hoc MPI subset/API

InfiniBand FDR
  Mellanox ConnectX3
  MPI + Filesystem
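To illustrate what a 3D torus topology means for a node, the sketch below computes the six nearest neighbours of a node and the minimum hop count between two nodes, with wrap-around on every axis. The 4 x 4 x 4 dimensions are purely an assumption for illustration; the slides do not state the torus dimensions, and the function names are hypothetical rather than part of any APENET API.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# ASSUMPTION: a 4 x 4 x 4 torus, chosen only for illustration.
DIMS: Coord = (4, 4, 4)


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbours of `node` in a 3D torus of size `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # The modulo provides the wrap-around link that closes each ring.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum hop count between two nodes, going the shorter way round each ring."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


print(torus_neighbors((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (3, 3, 3), DIMS))
```

Because of the wrap-around links, node (3, 3, 3) is only three hops from node (0, 0, 0) in this example, whereas it would be nine hops away in a plain 3D mesh of the same size.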

EURORA prototype configuration

64 compute cards

128 Xeon SandyBridge (2.1 GHz, 95 W and 3.1 GHz, 150 W)

16 GByte DDR3 1600 MHz per node

160 GByte SSD per node

1 FPGA (Altera Stratix V) per node

IB QDR interconnect

3D Torus interconnect

128 accelerator cards (NVIDIA K20 and Intel Xeon Phi)
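The per-node figures and the system totals can be cross-checked with a few lines of arithmetic. The sketch below assumes one node per compute card, with two CPUs and two accelerators per node as on the node slide. The aggregate memory and SSD totals are derived here and are not stated in the slides.

```python
# Per-node figures from the node and configuration slides.
COMPUTE_CARDS = 64          # ASSUMPTION: one node per compute card
CPUS_PER_NODE = 2           # 2 Intel Xeon E5
ACCELERATORS_PER_NODE = 2   # 2 Intel MIC or 2 NVIDIA Kepler
RAM_GB_PER_NODE = 16
SSD_GB_PER_NODE = 160

totals = {
    "cpus": COMPUTE_CARDS * CPUS_PER_NODE,
    "accelerators": COMPUTE_CARDS * ACCELERATORS_PER_NODE,
    "ram_gb": COMPUTE_CARDS * RAM_GB_PER_NODE,
    "ssd_gb": COMPUTE_CARDS * SSD_GB_PER_NODE,
}

for name, value in totals.items():
    print(f"{name}: {value}")
```

The derived CPU and accelerator counts (128 each) match the configuration listed on the slide.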

EURORA planned experiments

Definition of system metrics and benchmarks.

Performance measurements (Flops/Watt using Linpack).

Connectivity benchmarks (bandwidth and latency).

Application porting.

Application benchmarks (time, scalability, watt to solution).

Different operational conditions (e.g. change water temperature).

EURORA applications

• Material Science (Quantum-ESPRESSO)

• Life Science (GROMACS)

• Fundamental Physics (QCD)

• Earth Science / Weather Forecast

• High Throughput Virtual Screening (Pharma industry - DOMPE’)

EURORA programming models

• Message Passing (MPI)

• Shared Memory (OpenMP)

• Kernel offload (pragmas / native)

• Hybrid: MPI + OpenMP + extensions/OpenCL

First results

Quantum-ESPRESSO benchmark, with GPU

DATASET: Ta2O5-2x1xz-552, 20 iterations

EURORA (5 nodes, 10 K20 GPU, 10 MPI tasks, 8 OpenMP threads per core): 789.2 secs

PLX (5 nodes, 10 M2070 GPU, 10 MPI tasks, 6 OpenMP threads per core): 2180.4 secs

BGQ (64 nodes, 256 MPI tasks, 8 OpenMP threads per core): 920.4 secs

[Figure: bar chart of run time in seconds for EURORA (5 nodes), PLX (5 nodes) and FERMI (64 nodes)]
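The three GPU benchmark timings can be compared as simple time ratios against EURORA. The sketch below uses only the wall-clock times reported above; the function and variable names are illustrative.

```python
# Wall-clock times in seconds, copied from the benchmark results above
# (Ta2O5-2x1xz-552 dataset, 20 iterations).
TIMES_S = {"EURORA": 789.2, "PLX": 2180.4, "BGQ": 920.4}


def time_ratio(system: str, reference: str = "EURORA") -> float:
    """How many times longer `system` took than `reference`."""
    return TIMES_S[system] / TIMES_S[reference]


for name in ("PLX", "BGQ"):
    print(f"{name} / EURORA: {time_ratio(name):.2f}x")
```

On these numbers the 5-node PLX run takes about 2.76 times as long as the 5-node EURORA run, and the 64-node BGQ run about 1.17 times as long. The node counts differ, so this compares time to solution only, not per-node performance.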

Quantum-ESPRESSO benchmark, without GPU

DATASET: 256 H2O molecules

BGQ (1024 cores): 9.0 seconds/iteration

BGQ (2048 cores): 5.0 seconds/iteration

BGQ (4096 cores): 3.7 seconds/iteration

EURORA (32 cores, 2.1 GHz): 71 seconds/iteration

EURORA (32 cores, 3.1 GHz): 61 seconds/iteration

EURORA (64 cores, 3.1 GHz): 35 seconds/iteration

EURORA (128 cores, 3.1 GHz): 19 seconds/iteration

EURORA (256 cores, 3.1 GHz): 12 seconds/iteration

[Figure: bar chart of seconds/iteration for the runs listed above]
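The CPU-only timings support a rough strong-scaling estimate. The sketch below computes parallel efficiency relative to the smallest run of each system, defined as the measured speedup divided by the ideal speedup from the core-count ratio. The choice of baseline (32 cores at 3.1 GHz for EURORA, 1024 cores for BGQ) is an assumption made here, not something the slides specify.

```python
from typing import Dict, List, Tuple

# (cores, seconds per iteration), copied from the 256 H2O results above.
RUNS: Dict[str, List[Tuple[int, float]]] = {
    "EURORA 3.1GHz": [(32, 61.0), (64, 35.0), (128, 19.0), (256, 12.0)],
    "BGQ": [(1024, 9.0), (2048, 5.0), (4096, 3.7)],
}


def parallel_efficiency(base: Tuple[int, float], run: Tuple[int, float]) -> float:
    """Measured speedup over `base` divided by the ideal (core-count) speedup."""
    base_cores, base_time = base
    cores, time = run
    speedup = base_time / time
    ideal = cores / base_cores
    return speedup / ideal


for system, runs in RUNS.items():
    base = runs[0]  # ASSUMPTION: smallest run of each system is the baseline
    for run in runs[1:]:
        eff = parallel_efficiency(base, run)
        print(f"{system}, {run[0]} cores: efficiency {eff:.2f}")
```

On these figures, EURORA efficiency falls from about 0.87 at 64 cores to about 0.64 at 256 cores, and BGQ from 0.90 at 2048 cores to about 0.61 at 4096 cores.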

ACCESS

• Access will be granted upon request to all partners of the prototype project.

• ISCRA, LISA, PRACE

• Other requests will be evaluated case by case.

• We are working to grant early access to the KNC board already installed.
