
DUNE on current and next generation HPC Platforms


DESCRIPTION

In this talk we present the Distributed and Unified Numerics Environment (DUNE). It is a software framework for the parallel numerical solution of partial differential equations with grid-based methods. Using generic programming techniques it strives for both high flexibility (efficiency of the programmer) and high performance (efficiency of the program). We present parallel applications realized with DUNE and show their scalability on current HPC platforms such as the Blue Gene/P system in Jülich. Finally, we take a closer look at the hardware attributes that influence the scalability of DUNE and of software solving partial differential equations in general, and we investigate how DUNE will perform on future hardware like Blue Gene/Q. Special emphasis is put on the performance of parallel iterative solvers, both in general and in DUNE.


Page 1: DUNE on current and next generation HPC Platforms

Title

DUNE on current and next generation HPC Platforms

Markus Blatt

Dr. Markus Blatt, HPC-Simulation-Software & Services

Forschungszentrum Jülich, Germany, March 8, 2012


Page 2: DUNE on current and next generation HPC Platforms

Outline

Outline

1 DUNE

2 Parallelization
    Parallel Grid Interface
    Parallel Iterative Solvers
    Parallel Algebraic Multigrid
    Scalability
    A Glimpse at other DUNE projects

3 Trends and Outlook for HPC and DUNE

4 HPC-Simulation-Software & Services


Page 3: DUNE on current and next generation HPC Platforms

DUNE

Why we created DUNE!

Problems with most PDE software
• Most packages support one special set of features:
  • IPARS: block structured, parallel, multiphysics.
  • Alberta: simplicial, unstructured, bisection refinement.
  • UG: unstructured, multi-element, red-green refinement, parallel.
  • QuocMesh: fast, on-the-fly structured grids.
• Other features are either not supported at all or only inefficiently.

The idea of DUNE
• Separation of data structures and algorithms.

• Easy exchange of interface implementations.

• Reuse of legacy software.

• Fine grained interfaces.

• C++ with templates.
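As a rough illustration of "separation of data structures and algorithms" via C++ templates, here is a self-contained sketch. The grid concept and all names are invented for this example and are not the DUNE grid interface; the point is that the algorithm is written once against a minimal compile-time interface and works with any implementation of it.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical minimal "grid" concept: anything that is iterable over
// cells and whose cells provide volume() will do.
template <class Grid>
double totalVolume(const Grid& grid)
{
  double sum = 0.0;
  for (const auto& cell : grid)   // the algorithm is written once ...
    sum += cell.volume();         // ... against the interface only
  return sum;
}

// One possible data structure implementing the concept:
// a trivial uniform 1D "grid" that generates its cells on the fly.
struct Interval {
  double h;
  double volume() const { return h; }
};

struct UniformGrid1D {
  std::size_t n;
  double h;
  struct Iterator {
    std::size_t i;
    double h;
    Interval operator*() const { return Interval{h}; }
    Iterator& operator++() { ++i; return *this; }
    bool operator!=(const Iterator& other) const { return i != other.i; }
  };
  Iterator begin() const { return Iterator{0, h}; }
  Iterator end() const { return Iterator{n, h}; }
};

int main()
{
  UniformGrid1D grid{8, 0.125};
  std::cout << totalVolume(grid) << "\n";  // prints 1 (= 8 * 0.125)
}
```

Because the grid type is a template parameter, exchanging the grid implementation means changing one type and recompiling; the algorithm itself is untouched.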


Page 4: DUNE on current and next generation HPC Platforms

DUNE

Modularity

[Module dependency diagram: core modules dune-common, dune-grid, dune-istl, dune-localfunctions; discretization modules dune-fem and dune-pdelab with the dune-grid-howto and dune-pdelab-howto tutorial modules; external grid managers and libraries ALUGrid, UG, Alberta, NeuronGrid, VTK, Gmsh, SuperLU, Metis.]

• Grid interface: (non-)conforming, hierarchical grid interface.

• Iterative Solver Template Library: Dense and sparse linear algebra.

• PDELab: Discretization module based on residual formulation.


Page 5: DUNE on current and next generation HPC Platforms

DUNE

PDELab: Plug, Code and Play Simulation Software

• Choose:
  • Grid
  • Finite element
  • Maybe implement a local operator
  • Time stepping scheme
  • (Non-)linear solvers

• Recompile application

• Run efficiently.


Page 6: DUNE on current and next generation HPC Platforms

DUNE

Sample Simulations

Transport in porous media, Density-driven flow, Flow around Root networks, Neuron network simulations

• Electromagnetics
• Computational neuroscience: biophysically realistic networks of neurons
• Geostatistical inversion: coping with uncertain parameters
• Linear acoustics
• Multiphase flow and transport in porous media
• Density-driven flow


Page 7: DUNE on current and next generation HPC Platforms

Parallelization

Parallelization

Parallel Grids

• Domain decomposition (overlapping or non-overlapping).

• Load-balancing

• Message passing based on MPI is handled by grid manager.

Parallel linear algebra

• Message passing decoupled from grid.

• Abstraction: Parallel index sets to identify data globally.

• Reuse of efficient sequential linear algebra.

• Minimize communication.


Page 8: DUNE on current and next generation HPC Platforms

Parallelization Parallel Grid Interface

An Example of a Parallel Grid

[Figure: a structured example grid distributed over three processes, cells labelled c = 0, c = 1, c = 2 by owning process. First row: with overlap and ghosts. Second row: with overlap only. Third row: with ghosts only.]

Legend: interior, overlap, ghost, border, front, not stored.
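The partition types in this legend are what parallel DUNE code branches on. A minimal sketch: the entity's partitionType() is part of the dune-grid interface, while the range-based elements() iteration shown here follows current dune-grid releases, which post-date this 2012 talk.

```cpp
#include <cstddef>
#include <dune/grid/common/gridenums.hh>

// Count the cells this process actually owns (interior partition) and
// skip overlap/ghost copies that are only stored for communication.
template <class GridView>
std::size_t countInteriorCells(const GridView& gridView)
{
  std::size_t n = 0;
  for (const auto& cell : elements(gridView))           // all locally stored cells
    if (cell.partitionType() == Dune::InteriorEntity)   // owned by this process
      ++n;
  return n;
}
```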


Page 9: DUNE on current and next generation HPC Platforms

Parallelization Parallel Grid Interface

Parallel Grids in Dune

• YaspGrid
  • structured
  • 2D/3D
  • arbitrary overlap

• UGGrid
  • unstructured
  • 2D/3D
  • multi-element
  • one layer of ghost cells
  • (conforming) red-green refinement
  • (non-free!)

• ALUGrid
  • unstructured
  • 3D
  • tetrahedral or hexahedral elements
  • ghost cells
  • (non-conforming) bisection refinement



Page 12: DUNE on current and next generation HPC Platforms

Parallelization Parallel Iterative Solvers

Index Sets

Index Set

• Distributed overlapping index set I = ⋃_{p=0}^{P−1} I_p

• Process p manages the mapping I_p → [0, n_p).

• Might only store information about the mapping for shared indices.


Page 13: DUNE on current and next generation HPC Platforms

Parallelization Parallel Iterative Solvers

Index Sets


Global Index

• Identifies a position (index) globally.

• Arbitrary and not consecutive (to support adaptivity).

• Persistent.

• On JUGENE this is not an int, to get rid of the 32-bit limit!


Page 14: DUNE on current and next generation HPC Platforms

Parallelization Parallel Iterative Solvers

Index Sets


Local Index
• Addresses a position in the local container.

• Convertible to an integral type.

• Consecutive index starting from 0.

• Non-persistent.

• Provides an attribute to identify ghost region.
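A hypothetical data layout for these index sets (the real dune-istl classes differ in detail; the names here are invented for illustration): each shared index pairs a persistent global id with a consecutive local index and an attribute.

```cpp
#include <cstdint>
#include <vector>

enum class Attribute : std::uint8_t { owner, overlap, copy };

struct IndexEntry {
  std::uint64_t globalId;    // persistent, arbitrary; 64 bit avoids the 32-bit limit on JUGENE
  std::uint32_t localIndex;  // consecutive, starting from 0, not persistent
  Attribute attribute;       // marks owner region vs. overlap/ghost region
};

// The mapping I_p -> [0, n_p) of process p; it may hold entries only for
// indices that are shared with other processes.
using LocalIndexSet = std::vector<IndexEntry>;
```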


Page 15: DUNE on current and next generation HPC Platforms

Parallelization Parallel Iterative Solvers

Remote Information and Communication

Remote Index Information
• For each process q, process p knows all common global indices together with their attribute on q.

Communication
• The target and source partitions of an index are chosen using attribute flags, e.g. from ghost to owner and ghost.

• If remote index information for q is available on p, then p sends all the data in one message.

• All communication takes place asynchronously at the same time.
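A sketch of that exchange pattern in plain MPI. This is not the dune-istl communicator API; packing the data items by attribute is assumed to have happened already, and each receive buffer is assumed to be pre-sized from the remote index information.

```cpp
#include <mpi.h>
#include <map>
#include <vector>

// One message per neighbour rank, all posted asynchronously and
// completed together with a single MPI_Waitall.
void exchange(std::map<int, std::vector<double>>& sendBuffers,
              std::map<int, std::vector<double>>& recvBuffers,
              MPI_Comm comm)
{
  std::vector<MPI_Request> requests;
  requests.reserve(sendBuffers.size() + recvBuffers.size());

  for (auto& [rank, buffer] : recvBuffers) {
    requests.emplace_back();
    MPI_Irecv(buffer.data(), static_cast<int>(buffer.size()), MPI_DOUBLE,
              rank, /*tag=*/0, comm, &requests.back());
  }
  for (auto& [rank, buffer] : sendBuffers) {
    requests.emplace_back();
    MPI_Isend(buffer.data(), static_cast<int>(buffer.size()), MPI_DOUBLE,
              rank, /*tag=*/0, comm, &requests.back());
  }
  MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
              MPI_STATUSES_IGNORE);
}
```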


Page 16: DUNE on current and next generation HPC Platforms

Parallelization Parallel Iterative Solvers

Parallel Matrix Representation

• Let I_i be a nonoverlapping decomposition of our index set I.

• Ĩ_i is the augmented index set such that for every j ∈ I_i and every k with |a_jk| + |a_kj| ≠ 0, also k ∈ Ĩ_i.

• Then the locally stored matrix A_i, with the indices of I_i ordered first and those of Ĩ_i \ I_i last, has the block structure

      A_i = ( A_ii   * )
            (  0     I )

• Therefore Av can be computed locally for the entries associated with I_i if v is known on Ĩ_i.

• A communication step ensures consistent ghost values.

• Matrix can be used for hybrid preconditioners.
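What this buys in code, as a schematic CRS sparse matrix-vector product (not dune-istl code): the rows belonging to I_i are computed purely locally because x is stored for all of Ĩ_i, and one communication step afterwards makes the ghost entries of the result consistent again.

```cpp
#include <cstddef>
#include <vector>

struct CRSMatrix {                    // locally stored rows of A
  std::vector<double> values;
  std::vector<std::size_t> columns;   // local column indices within the augmented set
  std::vector<std::size_t> rowStart;  // size = number of local rows + 1
};

void localMatVec(const CRSMatrix& A, const std::vector<double>& x,
                 std::vector<double>& y)
{
  for (std::size_t row = 0; row + 1 < A.rowStart.size(); ++row) {
    double sum = 0.0;
    for (std::size_t k = A.rowStart[row]; k < A.rowStart[row + 1]; ++k)
      sum += A.values[k] * x[A.columns[k]];
    y[row] = sum;                     // exact for the rows in I_i
  }
  // communicateGhostEntries(y);      // hypothetical: owners overwrite ghost copies
}
```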


Page 17: DUNE on current and next generation HPC Platforms

Parallelization Parallel Algebraic Multigrid

Algebraic Multigrid (AMG)

Stationary Iterative Methods

• Error reduction stagnates with increasing numbers of iterations and unknowns

• Reduces only high frequency errors

Algebraic Multigrid

• approximate smooth residual on a coarser grid and solve there.

• calculate a correction there

• interpolate correction to the fine grid and add it to the current guess

• Use algebraic nature of the problem to define coarse level.

• Coarsening adapts to problem and grid automatically
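Spelled out as formulas, this is the standard two-grid correction; the Galerkin product matches the A_{l−1} = P_l^T A_l P_l on the next slide. One cycle on level l reads:

```latex
\begin{aligned}
  x &\leftarrow S_l(x, b)            && \text{pre-smoothing (a few cheap iterations)} \\
  r &= b - A_l x                     && \text{residual on level } l \\
  e &= A_{l-1}^{-1} \, P_l^{T} r,
      \qquad A_{l-1} = P_l^{T} A_l P_l
                                     && \text{coarse-level correction (applied recursively)} \\
  x &\leftarrow x + P_l \, e         && \text{prolongate and add the correction} \\
  x &\leftarrow S_l(x, b)            && \text{post-smoothing}
\end{aligned}
```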


Page 18: DUNE on current and next generation HPC Platforms

Parallelization Parallel Algebraic Multigrid

Aggregation AMG

Simple, non-smoothed version

• Piecewise constant prolongators Pl .

• Heuristic and greedy aggregation algorithm.

• Coarse-level operator: A_{l−1} = P_l^T A_l P_l

• Proposed by Raw, Vanek et al., Braess

• Preconditioner for Krylov methods.

Observations
• Reasonable coarse grid operator for systems.

• Preserves FV discretization.

• Very memory efficient.

• Fast and scalable V-cycle.



Page 20: DUNE on current and next generation HPC Platforms

Parallelization Parallel Algebraic Multigrid

Illustration Parallel Setup

Decoupled Aggregation


Page 21: DUNE on current and next generation HPC Platforms

Parallelization Parallel Algebraic Multigrid

Illustration Parallel Setup

Communicate Ghost Aggregates


Page 22: DUNE on current and next generation HPC Platforms

Parallelization Parallel Algebraic Multigrid

Parallel Setup Phase

• Every process builds aggregates in its owner region.

• One communication with the nearest neighbors to update aggregate information in the ghost region.

• Coarse level index sets are a subset of the fine level.

• Remote index information can be deduced locally.

• Galerkin product can be calculated locally.


Page 23: DUNE on current and next generation HPC Platforms

Parallelization Parallel Algebraic Multigrid

Data Agglomeration on Coarse Levels

[Figure: number of vertices per processor versus the number of non-idle processors n_l, plotted for the multigrid levels L, L−1, (L−2)', L−2, L−3, L−4, (L−4)', L−5, L−6, (L−6)', L−7; the primed levels mark where the data is agglomerated onto fewer processors to reach the coarsen target.]

• Repartition the data onto fewer processes.

• Use METIS on the graph of the communication pattern (ParMETIS cannot handle the full machine!), as sketched below.
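A sketch of such a repartitioning call. This uses the METIS 5 interface (whether the runs in this talk used METIS 4 or 5 is not stated here), and the graph is assumed to be the gathered communication graph in the usual CSR form.

```cpp
#include <metis.h>
#include <vector>

// Partition the communication graph into `nparts` parts so that the
// coarse-level data can be agglomerated onto fewer, non-idle processes.
std::vector<idx_t> repartition(std::vector<idx_t>& xadj,    // CSR row pointers
                               std::vector<idx_t>& adjncy,  // CSR neighbour lists
                               idx_t nparts)
{
  idx_t nvtxs = static_cast<idx_t>(xadj.size()) - 1;
  idx_t ncon = 1;                  // one balance constraint (vertex count)
  idx_t objval = 0;                // edge cut of the computed partition
  std::vector<idx_t> part(nvtxs);

  METIS_PartGraphKway(&nvtxs, &ncon, xadj.data(), adjncy.data(),
                      /*vwgt=*/nullptr, /*vsize=*/nullptr, /*adjwgt=*/nullptr,
                      &nparts, /*tpwgts=*/nullptr, /*ubvec=*/nullptr,
                      /*options=*/nullptr, &objval, part.data());

  return part;                     // part[v] = process that vertex v moves to
}
```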


Page 24: DUNE on current and next generation HPC Platforms

Parallelization Scalability

Weak Scalability Results (Poisson)

procs    1/H    lev.   TB      TS      It   TIt     TT
1        80     5      19.86   31.91   8    3.989   51.77
8        160    6      27.7    46.4    10   4.64    74.2
64       320    7      74.1    49.3    10   4.93    123
512      640    8      76.91   60.2    12   5.017   137.1
4096     1280   10     81.31   64.45   13   4.958   145.8
32768    2560   11     92.75   65.55   13   5.042   158.3
262144   5120   12     188.5   67.66   13   5.205   256.2
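Reading the columns as T_B = build/setup time, T_S = solve time, It = iterations, T_It = time per iteration and T_T = total time (consistent with T_T = T_B + T_S and T_It ≈ T_S/It in every row), the per-iteration weak-scaling efficiency from 1 to 262144 processes is roughly

```latex
E_{\text{it}} = \frac{T_{It}(1)}{T_{It}(262144)} = \frac{3.989}{5.205} \approx 0.77,
```

while most of the loss in total time on the largest partition comes from the growing build time T_B.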


Page 25: DUNE on current and next generation HPC Platforms

Parallelization Scalability

Clipped Log-Random Problem

• −∇ · (k(x)∇u) = f in Ω

• κ(x): realization of a log-random field with variance σ², mean 0, and correlation length λ.

• k(x): binary medium constructed from κ(x) (one possible construction is sketched below).

• Weak scaling: λ scales with mesh width h.
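One natural way to write the binary medium construction; the actual cutoff and contrast values used in the talk are not given here, so treat this as an assumed form:

```latex
k(x) =
\begin{cases}
  k_{\text{high}}, & \kappa(x) \ge 0,\\[2pt]
  k_{\text{low}},  & \kappa(x) < 0,
\end{cases}
\qquad
\kappa(x) \sim \mathcal{N}(0, \sigma^2) \text{ stationary, with correlation length } \lambda .
```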


Page 26: DUNE on current and next generation HPC Platforms

Parallelization Scalability

Weak Scalability Results (Poisson vs. Clipped)


Page 27: DUNE on current and next generation HPC Platforms

Parallelization Scalability

Weak Scalability Results (Clipped Log-Random Problem)

• σ² = 8, λ = 4h

procs    1/h    lev.   TB      TS      It   TIt     TT
1        80     5      19.93   49.39   12   4.116   69.32
8        160    6      28.1    73.7    15   4.91    102
64       320    7      75.1    105     20   5.26    180
512      640    8      80.11   134     25   5.362   214.1
4096     1280   10     84.71   171.7   33   5.203   256.4
32768    2560   11     93.24   189.5   36   5.264   282.7
262144   5120   12     195.9   386.5   72   5.368   582.5


Page 28: DUNE on current and next generation HPC Platforms

Parallelization Scalability

Parallel Groundwater Simulation

Figure: Cut through the ground beneath an acre

• Highly discontinuous permeability of the ground.
• 3D simulations with high resolution.
• Efficient and robust parallel iterative solvers.

−∇ · (K(x)∇u) = f in Ω    (1)


Page 29: DUNE on current and next generation HPC Platforms

Parallelization Scalability

Weak Scalability Results II

• Richards equation.
• 64 × 64 × 128 unknowns per process.
• 1.25 · 10¹¹ unknowns on the full JUGENE.
• One time step in the simulation.


Page 30: DUNE on current and next generation HPC Platforms

Parallelization Scalability

Efficiency Solver Components


Page 31: DUNE on current and next generation HPC Platforms

Parallelization Scalability

Efficiency IO

• Highly tuned by Olaf Ippisch

• SIONlib from Jülich rocks!

• Still: I/O is not very scalable!


Page 32: DUNE on current and next generation HPC Platforms

Parallelization A Glimpse at other DUNE projects

Robert Klöfkorn

Projects using DUNE-FEM on JUGENE

DFG SPP MetStrom: Adaptive Numerics for Multiscale Phenomena
· D. Kröner, S. Brdar (Freiburg)
· M. Baldauf, D. Schuster (DWD)
· R. Klöfkorn (Stuttgart)
· A. Dedner (Warwick)

Mountain wave test case: work by S. Brdar

BW Stiftung HPC-11: Simulation of 2-stroke engines with detailed combustion
· D. Kröner, D. Lebiedz, M. Nolte, M. Fein (Freiburg)
· R. Klöfkorn (Stuttgart)
· A. Dedner (Warwick)

Work by D. Trescher


Page 33: DUNE on current and next generation HPC Platforms

Parallelization A Glimpse at other DUNE projects

Robert Klöfkorn

Strong scaling on Blue Gene/P

Table: Strong scaling and efficiency on the supercomputer JUGENE (Julich, Germany)

#cores   #cells/core¹   #DOFs/core   time (ms)²   speed-up   efficiency
512      474            296250       46216        —          —
4096     59             36875        6294         7.34       0.91
32768    7              4375         949          48.71      0.76
65536    3              1875         504          91.70      0.72
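The speed-up and efficiency columns are relative to the 512-core run; for the last row, for instance,

```latex
S = \frac{46216\ \text{ms}}{504\ \text{ms}} \approx 91.7,
\qquad
E = \frac{S}{65536/512} = \frac{91.7}{128} \approx 0.72 .
```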

Navier-Stokes equations solved with CDG2 (k = 4, 3D)
Overall number of cells: 243 000 (#DOFs ≈ 1.52 · 10⁸) on a Cartesian grid
≈ 6.1 GB memory consumption on a desktop machine
Explicit Runge-Kutta method of order 3

Programming techniques for performance:
· template meta programming
· automated code generation of DG kernels
· hybrid parallelization (MPI / pthreads, work in progress)
· overlap computation and communication (DCMF_INTERRUPT=1)

¹ average #cells/core    ² average run time per time step


Page 34: DUNE on current and next generation HPC Platforms

Trends and Outlook for HPC and DUNE

(My) Parallel Machines

Helics I (2003)

• 256 nodes

• Dual AMD Athlon, 1.4 GHz

• 5.9 GFLOPS peak/node

• 1 GB main memory/node

• Myrinet 2 Gbit

Helics II (2007)

• 156+4 nodes

• 2× dual-core AMD Opteron 2220, 2.8 GHz

• 18.8 GFLOPS peak/node

• 8 GB RAM/node (21.4 GB/s)

• Myricom 10 Gbit

Blue Gene/P (2009)

• 73728 nodes

• PowerPC 450, quad-core, 850 MHz

• 13.6 GFLOPS peak/node

• 2 GB RAM/node (13.6 GB/s)


Page 35: DUNE on current and next generation HPC Platforms

Trends and Outlook for HPC and DUNE

Observations in Parallel Computing

Software

• Solution of time-dependent (nonlinear) equations with implicit time-stepping schemes.

• Most time consuming: Solution of linear system.

• Peak GFLOPS out of reach!

• Methods are limited by memory bandwidth.
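A back-of-the-envelope estimate shows why. Assuming a CRS sparse matrix-vector product with 8-byte values plus 4-byte column indices (about 2 flops per 12 bytes of matrix traffic) and the Blue Gene/P per-node figures from the previous slide, and ignoring vector traffic and caching:

```latex
\frac{2\ \text{flops}}{12\ \text{bytes}} \times 13.6\ \tfrac{\text{GB}}{\text{s}}
\;\approx\; 2.3\ \text{GFLOPS}
\;\ll\; 13.6\ \text{GFLOPS}_{\text{peak}} .
```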

Hardware

• Costs for compute power drop fast (2002: 12 USD/MFLOP; 2011: 0.01 USD/MFLOP).

• Costs for main memory drop only slightly.

• Main memory not power efficient.


Page 36: DUNE on current and next generation HPC Platforms

Trends and Outlook for HPC and DUNE

Current Hardware/Software Trends

The hardware manufacturers' solution (Green Computing)

• More cores per node

• Less main memory per core.

• SIMD (Blue Gene/Q, GPGPU)

• Increase GFLOPS per GB/s of main memory bandwidth

• Faster network interconnects.

How does DUNE cope

• Memory efficiency!

• Minimize communication!

• Favor cheaper iterative methods, even if they converge more slowly!

• Time to solution / scalability matters most!

• Ability to compute bigger problems faster.

Software always two steps behind.

Page 37: DUNE on current and next generation HPC Platforms

Trends and Outlook for HPC and DUNE

Current Work and Future Plans

Software Point of View

• Hybrid parallelization in ALUGrid (Klöfkorn, University of Stuttgart)

• Borrow ideas from GPGPU computing (coalesced memory access)

• Check out cache-oblivious algorithms

• Check out parallel-in-time algorithms.

Application Point of View

• Inverse modeling
  • Geostatistical inversion: University of Heidelberg (Ippisch, Ngo) and Tübingen (Cirpka, Schwede)
  • Run several parallel forward simulations in parallel.


Page 38: DUNE on current and next generation HPC Platforms

Trends and Outlook for HPC and DUNE

DUNE on Blue Gene/Q?

Advantages of a central installation:

• Saves scientists a lot of time.

• Would help optimize DUNE for the platform.

• Possibility of professional installation and user support.

• Closer cooperation of DUNE and IBM brings benefits to all.


Page 39: DUNE on current and next generation HPC Platforms

HPC-Simulation-Software & Services

What can we do for you?

Efficient simulation software made to measure.

Hans-Bunte-Str. 8-10, 69123 Heidelberg, Germany
http://www.dr-blatt.de
