
Page 1

Member of the Helmholtz Association

Computational Science at the Jülich Supercomputing Centre

Paul Gibbon

C2S@EXA Meeting, INRIA, Paris, 8 November 2016

Page 2

FZ-Jülich:
• 5800 staff
• 2000 scientists
• 800 PhD students
• Budget: 600 M€

Mission:
• Energy
• Climate
• Transport
• Health
• Key Technologies

Page 3

Jülich Supercomputing Centre

Page 4

Jülich Supercomputing Centre

Supercomputer operation for:
• Centre – FZJ
• Region – RWTH Aachen University
• Germany – Gauss Centre for Supercomputing, John von Neumann Institute for Computing
• Europe – PRACE, EU projects

Application support
• Unique support & research environment at JSC
• Peer review support and coordination

R&D work
• Methods and algorithms, computational science, performance analysis and tools
• Scientific Big Data Analytics with HPC
• Computer architectures, co-design: Exascale Labs together with IBM, Intel, NVIDIA

Education and training

Page 5

JSC Dual Architecture Strategy

General-Purpose Cluster:
IBM Power 4+ JUMP (9 TFlop/s) → IBM Power 6 JUMP (9 TFlop/s) → Intel Nehalem JUROPA (300 TFlop/s) → Intel Haswell JURECA (~2.2 PFlop/s) + Booster (~10 PFlop/s)

Highly Scalable System:
IBM Blue Gene/L JUBL (45 TFlop/s) → IBM Blue Gene/P JUGENE (1 PFlop/s) → IBM Blue Gene/Q JUQUEEN (5.9 PFlop/s) → JUQUEEN successor (~50 PFlop/s)

File Server: Lustre, GPFS

Page 6

Research and Support Environment

Page 7

The Simulation Laboratory as HPC Enabler

Advisory Board

Community Groups

Simulation Laboratory

Support:
• Application analysis
• Re-engineering
• Community codes
• Workshops

Research:
• Scalable algorithms
• XXL simulations
• 3rd party projects
• Hardware co-design

Cross-Sectional Teams, Exascale and Data Laboratories

Page 8

Simulation Labs @ JSC

• Biology
• Nuclear & Particle Physics
• Fluid & Solid Engineering
• Plasma Physics
• Terrestrial Systems
• Ab Initio Methods
• Molecular Systems
• Climate Science
• Neuroscience

Page 9


Applications

• Atomistic methods in materials science

• Computational biology

• Mesh-free plasma modelling

• Hydrology in terrestrial systems

• Linear solvers in Ab Initio computation

• Quantum information processing

• (Parallel in Time methods)

Page 10

HPC in Materials Science - Godehard Sutmann

• High Performance Computing on various length and time scales to extend the range of applications in numerical experiments
• Atomistic methods:
  • bond order potentials
  • embedded atom MD
  • force field based molecular dynamics
  • hybrid MD/Monte Carlo
• Mesoscopic methods:
  • phase field
• Development of HPC methods

[Machines: JUQUEEN (Jülich), Vulcan (ICAMS)]

Page 11

Molecular Dynamics IMD

• Very high efficiency obtained for IMD on JUQUEEN
• Enables simulations on large atomistic scales
• Size effects in dislocation networks
• Sub-micrometer particle deposition on surfaces

Scaling up to 458,752 JUQUEEN cores using 1.835 million threads (!)

Page 12

Adaptive load-balancing

• Cell-based deformation of computational domains – particles follow their container cells
• Adaptive LB by "force"-based motion of vertices (see the sketch below)
• Basic data and communication structures unchanged
• Disadvantage: larger surfaces, hence possible communication overhead
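To make the "force"-based vertex motion concrete, here is a minimal 1D sketch of the relaxation idea, assuming per-domain loads have already been measured; the actual implementation in IMD moves the vertices of 3D cell meshes and is not reproduced here.

import numpy as np

def relax_boundaries(bounds, loads, step=0.1):
    """One relaxation step: each interior boundary between two 1D domains moves
    toward the more heavily loaded neighbour, shrinking that domain a little.
    bounds: P+1 domain edges; loads: measured work per domain (length P)."""
    new = bounds.copy()
    for i in range(1, len(bounds) - 1):
        push = (loads[i] - loads[i - 1]) / (loads[i] + loads[i - 1])  # signed "force" on vertex i
        new[i] = bounds[i] + step * push * (bounds[i + 1] - bounds[i - 1])
    return new

# In practice the loads are re-measured after every step and the relaxation is repeated.
bounds = relax_boundaries(np.linspace(0.0, 1.0, 5), np.array([4.0, 1.0, 1.0, 2.0]))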

Page 13

Results

• Droplet simulation with ~1,000,000 atoms
• 4x4x4 domains dynamically balanced
• No extensive idle time on CPUs
• Integrated into the open-source MD code IMD
• Method applicable to a wide range of particle simulation methods

[Plot: imbalance factor; a common definition is given below]
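The plotted "imbalance factor" is not defined on the slide. A common definition, assumed here rather than taken from the talk, is the ratio of the maximum per-process load to the mean load:

$I_{\mathrm{imb}} = \dfrac{\max_p t_p}{\tfrac{1}{P}\sum_{p=1}^{P} t_p}$

where $t_p$ is the work (e.g. force-computation time) on process $p$; perfect balance corresponds to $I_{\mathrm{imb}} = 1$.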

Page 14

Computational Biology - Olav Zimmermann

Top7: 92 amino acids, designed

• All-atom Replica Exchange Markov Chain Monte Carlo simulation using ProFASi (swap criterion sketched below)
• ~20k CPU-h per folding event on JUROPA (currently ~30 foldings)
• Free energy minimum at 3.5 Å from the experimental structure
• Largest protein folded ab initio
• Experimental folding time: ~1 s

[Structure overlay: gray = x-ray, color = simulation]

Potential use in biotechnology:
• SynBio: Top7 used as enzyme scaffold, high thermostability
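The replica-exchange step above uses the standard swap rule between two temperatures; this is a minimal sketch of that generic Metropolis criterion, not ProFASi's actual interface.

import math, random

def try_swap(beta_i, beta_j, energy_i, energy_j):
    """Accept a configuration swap between replicas at inverse temperatures
    beta_i, beta_j with probability min(1, exp((beta_i - beta_j) * (energy_i - energy_j)))."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return delta >= 0 or random.random() < math.exp(delta)

# Swaps are attempted periodically between replicas at neighbouring temperatures,
# letting low-temperature replicas escape local free-energy minima.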

Page 15

Biotech application

• Partner: FZJ IBG-1 (W. Wiechert), E. v. Lieres, S. Kondrat
• Project: multiscale modeling of 3D reaction-diffusion systems by combining Brownian Dynamics and Finite Element approaches
• Support goal: scaling to large numbers of processors to allow for relevant system sizes
• Applications: exploration of spatial influences on biochemical reactions: crowding, natural and artificial compartments, enzyme immobilization, enzyme tethering

Page 16

Least-Squares Optimized Poles (LSOP) eigensolver - Edoardo Di Napoli

Eigenproblem: $A x = \lambda x$; eigenvalues $\lambda_1 \dots \lambda_N$ computed slice by slice over the eigenspectrum.

Spectral projector: a rational function in the complex plane,
$\sum_i a_i (A - z_i I)^{-1}$,
with poles $z_1, \dots, z_6$ on a contour around the target slice.

Three levels of parallelism (see the sketch below):
• Level 1: parallelism over spectral slices
• Level 2: parallelism over poles
• Level 3: parallel linear solver for $(A - z_i I) V = Y$
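For illustration, a small NumPy sketch of a contour-based (FEAST-style) eigensolver with the same structure: independent block solves per pole followed by Rayleigh-Ritz. LSOP optimizes the poles and weights in a least-squares sense; the simple trapezoidal quadrature weights used here are only an assumption for the sketch.

import numpy as np

def rational_filter_eigs(A, center, radius, n_poles=8, subspace=16, seed=0):
    """Apply the rational filter sum_i a_i (A - z_i I)^{-1} to a block of random
    vectors, then extract eigenpairs of symmetric A inside the contour."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, subspace))        # random block right-hand side
    Q = np.zeros((n, subspace))
    for k in range(n_poles):
        theta = 2.0 * np.pi * (k + 0.5) / n_poles
        z = center + radius * np.exp(1j * theta)  # pole z_i on a circle around the slice
        a = radius * np.exp(1j * theta) / n_poles # quadrature weight a_i (LSOP optimizes these)
        # level 2: the solves for different poles are independent
        # level 3: each block solve (z I - A) V = Y is itself a parallel linear solve
        V = np.linalg.solve(z * np.eye(n) - A, Y)
        Q += (a * V).real
    Q, _ = np.linalg.qr(Q)                        # orthonormal basis of the filtered subspace
    lam, S = np.linalg.eigh(Q.T @ A @ Q)          # Rayleigh-Ritz projection
    keep = np.abs(lam - center) <= radius         # eigenvalues inside the target slice
    return lam[keep], (Q @ S)[:, keep]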

Page 17

Quantum Information Processing - Kristel Michielsen

Adiabatic quantum computing:
• Investigate the genuine quantumness of the D-Wave Two and D-Wave 2X chips with 512 and 1152 superconducting qubits, respectively

Large-scale multi-qubit systems simulator:
• World record: 43 qubits
• Applications: physical realizations of quantum computers, sources and control of decoherence, security tests of QKD (non-ideal components)

Discrete event simulation of QIP:
• Reproduces the statistical distributions of quantum theory by modeling physical phenomena as a chronological sequence of single events
• Applications: interference, entanglement, quantum cryptography, …

Page 18

Massively parallel quantum spin dynamics simulator

Applications: physical realizations of quantum computers, sources and control of decoherence, security tests of quantum key distribution

A system of N qubits is a superposition of $2^N$ quantum states, evolved under $e^{-iHt}$ (toy example below).

Comp. Phys. Comm. (2007), Phys. Rev. B (2012), Phys. Rev. A (2013)

43 qubits: world record
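For orientation, a dense-matrix toy version of such a time evolution; the Hamiltonian chosen here is an arbitrary 2-qubit illustration, and the production simulator instead works with distributed state vectors and applies H term by term rather than forming exp(-iHt).

import numpy as np
from scipy.linalg import expm

# Pauli matrices and a small example Hamiltonian: X on one qubit plus a Z-Z coupling.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
H = np.kron(X, I2) + 0.5 * np.kron(Z, Z)

def evolve(psi, H, t):
    """Propagate an N-qubit state vector of length 2**N under exp(-i H t)."""
    return expm(-1j * t * H) @ psi

psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0                                   # start in |00>
probabilities = np.abs(evolve(psi0, H, 1.0)) ** 2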

Page 19

Quantum annealing

• Method using quantum fluctuations to search for the solution of an optimization problem
• In physics, the search for the ground state of a spin glass is a typical example of a hard optimization problem (a classical annealing sketch follows below)
• Spin glass: magnetic system with frustrated interactions

Bond legend (figure):
• ferromagnetic bond: satisfied if both spins are parallel
• antiferromagnetic bond: satisfied if both spins are antiparallel
• satisfied bond / unsatisfied bond
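To make the optimization problem concrete, here is a classical simulated-annealing sketch of the same Ising spin-glass search, using thermal rather than quantum fluctuations; the sign convention (ferromagnetic J < 0 for the energy written in the code) is an assumption of this sketch.

import numpy as np

def ising_energy(s, h, J):
    """Energy of spin configuration s (entries +/-1) for E = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j.
    With this convention a ferromagnetic bond (J_ij < 0) is satisfied by parallel spins,
    an antiferromagnetic bond (J_ij > 0) by antiparallel ones."""
    return h @ s + 0.5 * s @ J @ s               # J symmetric with zero diagonal

def anneal(h, J, steps=10000, T0=2.0):
    """Classical simulated annealing over single spin flips."""
    n = len(h)
    s = np.random.choice([-1, 1], n)
    for k in range(steps):
        T = T0 * (1 - k / steps) + 1e-3          # slowly lowered temperature
        i = np.random.randint(n)
        dE = -2 * s[i] * (h[i] + J[i] @ s)       # energy change from flipping spin i
        if dE <= 0 or np.random.rand() < np.exp(-dE / T):
            s[i] = -s[i]
    return s, ising_energy(s, h, J)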

Page 20

Quantum annealing

Benchmarking of a quantum annealing computer and (quantum) annealing simulators

Quantum annealing computer manufactured by D-Wave

Page 21

Mesh-free plasma simulation (P Gibbon): N-body problem with long-range potential

Page 22

Parallel algorithm: space-filling curve (key construction sketched below)
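A space-filling-curve decomposition needs an integer key per particle; a Morton (Z-order) key is one common choice. This is only an illustration: PEPC's actual key construction (curve type, bit layout) may differ.

def morton_key_3d(ix, iy, iz, bits=21):
    """Interleave the bits of the integer grid coordinates (ix, iy, iz) into a
    single Morton (Z-order) key. Sorting particles by this key orders them along
    a space-filling curve; cutting the sorted list into equal pieces gives each
    MPI rank a spatially compact set of particles."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

# Example: keys for a few particles, sorted before splitting across ranks.
keys = sorted(morton_key_3d(x, y, z) for x, y, z in [(1, 2, 3), (4, 5, 6), (7, 0, 2)])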

Page 23

PEPC Framework: physics modules

Page 24

Mesh-free plasma-wall particle simulation at scale

Kelvin-Helmholtz instability in magnetised plasma

Scaling of hybrid electrostatic tree-code algorithm across the entire JUQUEEN: 458,752 cores, 64 × 10^9 particles (force evaluation sketched below)

Steinbusch, B.; Gibbon, P.; Sydora, R. D., Physics of Plasmas 23, 052119 (2016)
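The force evaluation underlying a tree code, in its simplest serial Barnes-Hut form: a sketch under assumptions (monopole-only, a fixed opening angle, and a hypothetical Node/Particle structure). PEPC's hybrid parallel traversal and multipole expansion are considerably more elaborate.

import numpy as np
from dataclasses import dataclass, field

@dataclass
class Particle:                    # hypothetical container, not PEPC's data structure
    pos: np.ndarray
    charge: float

@dataclass
class Node:                        # hypothetical octree node
    center_of_charge: np.ndarray
    charge: float
    size: float                    # edge length of the node's box
    children: list = field(default_factory=list)

def force_on(p, node, theta=0.5, eps=1e-9):
    """Barnes-Hut traversal: a far node (size/distance < theta) is treated as a
    single monopole; otherwise its children are visited recursively."""
    d = p.pos - node.center_of_charge
    r = np.linalg.norm(d) + eps
    if not node.children or node.size < theta * r:
        return p.charge * node.charge * d / r**3   # Coulomb-like monopole force
    return sum(force_on(p, c, theta, eps) for c in node.children)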

Page 25

Outlook: Mesh-free Darwin model

• Poisson-like field solver
• Implicit integrator

L. Siddi, G. Lapenta, PG, Physics of Plasmas (in prep., 2017)

Page 26

Terrestrial systems - Klaus Görgen

http://www.esrl.noaa.gov

Complex interactions and feedbacks between various sub-systems of the geo-ecosystem (e.g. pedo-, bio-, hydro- or atmosphere) at a multitude of spatio-temporal scales.

Anthropogenic climate system changes modify land surface and ecosystem processes, with impacts on many sectors (e.g. water management, agriculture, power generation).

Page 27

ParFlow hydrologic model

Simulates surface and subsurface flow: highly resolved simulations afford the identification of patterns across multiple space scales and new approaches in water resources assessment.

[Figure: map of water table depth (m) over the continental USA with two insets zooming into the North and South Platte River basin, headwaters to the Mississippi River. Colors represent depth on a log scale (0.01 to 100 m). Maxwell et al. 2015]

Page 28

Coupled model system TerrSysMP

• Fully integrated multi-physics simulation platform, towards earth system models at regional scale; multi-disciplinary research tool
• Current primary focus: water cycle
• Complex real-world patterns are resolved
• Conceptual "virtual realities" aid in process understanding

BG/Q scaling: 32,768 procs => 64x bigger problem size

• Use of highly flexible external coupler OASIS
• Refactoring of coupling interface towards extreme scaling
• Close cooperation with Parallel Performance team
• Collaboration with FZJ IBG-3, Uni Bonn (MIUB)
• Exploration of new HPC architectures

Page 29

First Successes

Refactoring of OASIS-MCT coupling interface to remove scaling bottleneck

Scaling now to 32k cores: 64x increased problem size! (COSMO)

Page 30

High-Q Club - Dirk Brömmel

Page 31

High-Q scaling results

Page 32

Current membership

Page 33

JSC Dual Architecture Strategy (roadmap repeated from Page 5)

Page 34

JSC's Architecture Strategy – Rationale

Dilemma #1: Grand Challenge applications require extreme performance
• Not achievable with general-purpose architectures (x86 clusters): cost, energy
• Highly scalable architectures not suitable for applications requiring high single-node performance or large memory per core

Solution: dual architecture approach
• JUQUEEN - highly scalable system
• JUROPA / JURECA - general-purpose system
• Common storage

Page 35

JSC's Architecture Strategy – Rationale (continued)

Dilemma #1 and its dual-architecture solution as on the previous page, plus:

Dilemma #2: Parts of complex applications often have different requirements and scalability properties
• Heterogeneous architectures / clusters with accelerators: a static ratio of CPU to accelerator performance potentially wastes resources and energy

Solution: Cluster-Booster concept
• Separation of CPU (Cluster) and accelerator (Booster) allows dynamic resource allocation to optimize resource utilization
• Requires substantial application modifications

Page 36

More Diverse Challenges …

Extreme Scale Computing

Big Data Analytics

Deep Learning

Interactivity

Page 37

… lead to Modular Supercomputing

[Diagram of the module types with their node labels: CN, BN, GN, DN, NAM, NIC/MEM, GA, Disk]
• Module 0: Storage
• Module 1: Cluster
• Module 2: GPU-Acc
• Module 3: Many-core Booster
• Module 4: Memory Booster
• Module 5: Data Analytics
• Module 6: Graphics Booster

Page 38

Cluster-Booster Architecture (EU Project DEEP)

[Diagram]
• Cluster: CN nodes on an InfiniBand fabric - low/medium scalable code parts
• Booster: BN nodes on an EXTOLL fabric, attached via BI nodes - highly scalable code parts

Page 39

DEEP Complete System

DEEP Cluster and DEEP Booster

Page 40

Neuromorphic and Quantum Computers

• BrainScaleS neuromorphic hardware, Heidelberg (Karlheinz Meier)
• SpiNNaker, Manchester (Steve Furber)
• D-Wave quantum annealer, programmed with an Ising problem Hamiltonian
  $H_P = \sum_{i=1}^{N} h_i \sigma_i^z + \sum_{i<j} J_{ij} \sigma_i^z \sigma_j^z$

Page 41