DFT requirements for leadership-class computers. N. Schunck, Department of Physics and Astronomy, University of Tennessee, Knoxville, TN-37996, USA; Physics Division, Oak Ridge National Laboratory, Oak Ridge, TN-37831, USA. http://unedf.org. The 3rd LACM-EFES-JUSTIPEN Workshop, JIHIR, Oak Ridge National Laboratory, February 23-25, 2009. A. Baran, J. Dobaczewski, J. McDonnell, J. Moré, W. Nazarewicz, N. Nikolov, H. H. Nam, J. Pei, J. Sarich, J. Sheikh, A. Staszczak, M. V. Stoitsov, S. Wild


Page 1: DFT requirements for leadership-class computers

DFT requirements for leadership-class computers

N. Schunck

Department of Physics and Astronomy, University of Tennessee, Knoxville, TN-37996, USA

Physics Division, Oak Ridge National Laboratory, Oak Ridge, TN-37831, USA

http://unedf.org

The 3rd LACM-EFES-JUSTIPEN Workshop JIHIR, Oak Ridge National Laboratory, February 23-25, 2009

A. Baran, J. Dobaczewski, J. McDonnell, J. Moré, W. Nazarewicz, N. Nikolov, H. H. Nam, J. Pei, J. Sarich, J. Sheikh, A. Staszczak, M. V. Stoitsov, S. Wild

Page 2: DFT requirements for leadership-class computers

Nuclear DFT: Why supercomputing?

Why super-computers:

Large-scale problems (LACM): fission, shape coexistence, time-dependent problems

Systematic restoration of broken symmetries and correlations “made easy” (QRPA, GCM?, etc.)

Optimization of extended functionals on larger sets of experimental data

DFT: A global theory

Supercomputers: DFT at full power…

The ground state of an even nucleus can be computed in a matter of minutes on a standard laptop, so why bother with supercomputing?

Principle: average out individual degrees of freedom

Treatment of correlations?

Current lack of quantitative predictions at the ~100 keV level

Extrapolability?

“No limit” theory: from light nuclei to the physics of neutron stars

Rich physics

Fast and reliable

Page 3: DFT requirements for leadership-class computers

Classes of DFT Solvers

            1D                        2D                           3D
r-space     1 min, 1 core (HFBRAD)    5 hours, 70 cores (HFBAX)    -
HO basis    -                         2 min, 1 core (HFBTHO)       5 hours, 1 core (HFODD)

Computational packages used and developed at ORNL, with an estimate of the resources needed for a standard HFB calculation

Coordinate space: direct integration of the HFB equations
– Accurate: provides the "exact" result
– Slow and CPU/memory intensive for 2D-3D geometries

Configuration space: expansion of the solutions on a basis (usually HO)
– Fast and amenable to beyond mean-field extensions
– Truncation effects: source of divergences/renormalization issues
– Wrong asymptotics unless different bases are used (WS, PTG, Gamow, etc.)

Non-linear integro-differential fixed point problem
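The fixed-point structure mentioned above can be illustrated with a minimal sketch. It assumes a toy scalar map (`math.cos`) in place of a real HFB iteration and uses plain linear mixing rather than the Broyden method discussed later in the talk; the function name `fixed_point` and its parameters are hypothetical.

```python
import math


def fixed_point(step, x0, mixing=0.5, tol=1e-10, max_iter=500):
    """Iterate x <- (1 - mixing) * x + mixing * step(x) until converged.

    `step` stands in for one self-consistent iteration
    (densities -> fields -> new densities). Here it is any scalar
    function, which is a toy assumption and not an HFB solver.
    Returns the converged value and the number of iterations used.
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        x_new = (1.0 - mixing) * x + mixing * step(x)
        if abs(x_new - x) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("self-consistent loop did not converge")


# Toy example: solve x = cos(x) as a stand-in for the HFB fixed point.
solution, n_iter = fixed_point(math.cos, x0=1.0)
```

In a real solver the unknown is a set of density matrices or fields, and the cost of each call to `step` (a matrix diagonalization) is what makes reducing the number of iterations worthwhile.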

Page 4: DFT requirements for leadership-class computers

Recent physics achievements

Even-even, odd-even and odd-odd mass tables

Nuclear fission

Systematics of odd-proton states in odd nuclei

Cf. talks by M. Stoitsov, S. Wild and J. Moré

Online resources:

http://massexplorer.org/

http://unedf.org/

Page 5: DFT requirements for leadership-class computers

Petascale and beyond

• Hardware constraints (see R. Lusk and J. Vary's talks):
– Many cores (100,000+) stacked into sockets: currently 4 cores/socket, evolution toward 8 cores/socket and more
– Small memory per core (shared memory per socket)
– Short, crash-prone, expensive runtime

• Consequences on the architecture of DFT solvers:
– Optimize the time of one HFB calculation: reduce the number of iterations, use symmetries smartly by improving/interfacing codes, parallelization, etc.
– Work on a parallel wrapper: load balancing, checkpoints, error control mechanisms, etc.

Page 6: DFT requirements for leadership-class computers

Optimization - Interface HFBTHO/HFODD

• Restarting HFODD from HFB-THO means:
– Tremendous gain in calculation time
– Increased numerical stability
– Taking advantage of existing mass tables

• Procedure:
– Coordinate + phase transformation (both unitary)
– Modify HFODD to restart from HFB matrix elements instead of density fields on Gauss-Hermite mesh

• Interface fully working for spherical HO bases (precision of restart at 10^-4 to 10^-6)

• Memory issue for deformed bases

HFB-THO: Axial

Cylindrical coordinates

Time-reversal symmetry

j-block diagonalization

HFODD: symmetry-unrestricted

Cartesian coordinates

Y-simplex eigenbasis

No time-reversal symmetry

Full diagonalization

Page 7: DFT requirements for leadership-class computers

Optimization – HFODD Profiling

Broyden routine: storage of NBroyden fields on 3D Gauss-Hermite mesh

Temporary array allocation for HFB matrix diagonalization

(Figure legend: neutrons, protons)

Calculations by J. McDonnell

Safe limit memory/core on Jaguar/Franklin

Page 8: DFT requirements for leadership-class computers

Optimization – HFODD Parallelization

• Two levels of parallelism handled by a simple MPI group structure
– Nuclear configuration (Z, N, interaction, {Qλμ}, etc.)
– HFB solver

• Standard PBLAS and ScaLAPACK libraries for distributed linear algebra

• Natural splitting of the HFB matrix (OpenMP): perhaps not scalable enough

• Splitting:
– HFB matrix into N blocks
– Eigenfunctions conserve the same N-block splitting
– Densities must be reconstructed piecewise

• Challenges
– Identify a self-contained set of all matrices required for one iteration
– Handling of conserved symmetries, which give a different block structure
– Identify and replace all BLAS calls by PBLAS equivalents
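The two-level layout described above (one group of processes per nuclear configuration, several processes inside each group for the HFB solver) can be sketched without any MPI library. The helper `group_layout` below is hypothetical and only computes the (color, key) pair that one would typically hand to a communicator split such as `MPI_Comm_split`; it is an illustration under that assumption, not the HFODD implementation.

```python
def group_layout(n_ranks, ranks_per_solver):
    """Map each world rank to (color, key) for a two-level layout.

    color: index of the nuclear configuration handled by the group.
    key:   rank of the process inside the group running the HFB solver.
    Assumes every group has the same size, which is a simplification.
    """
    if ranks_per_solver <= 0 or n_ranks % ranks_per_solver != 0:
        raise ValueError("n_ranks must be a positive multiple of ranks_per_solver")
    return [divmod(rank, ranks_per_solver) for rank in range(n_ranks)]


# Example: 8 processes, 4 per HFB solver, hence 2 configurations in parallel.
layout = group_layout(8, 4)
n_groups = len({color for color, _ in layout})
```

With this layout, world ranks 0 to 3 work on configuration 0 and ranks 4 to 7 on configuration 1; distributed linear algebra calls would then run inside each group.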


Page 9: DFT requirements for leadership-class computers

Optimization - Finite-size spin instabilities

• Response of the nucleus to a perturbation with finite momentum q studied in the RPA theory

• Channels: scalar-isoscalar, scalar-isovector, vector-isoscalar, vector-isovector, etc.

Modern Skyrme functionals are highly unstable with respect to finite-size spin perturbations!

Convergence of the HFB calculation of 100 blocked states in 157-165Ba

Region of instability

T. Lesinski et al., Phys. Rev. C 74, 044315 (2006)

D. Davesne et al., arXiv:0906.1927 (2009)

Warning for the next generation of functionals: stability must be assessed!

Page 10: DFT requirements for leadership-class computers

Work in progress - Fission

• Example of challenges for next generation DFT: microscopic description of nuclear fission
• Degrees of freedom at the HFB level: deformation, temperature
• Potential energy surfaces depend critically on interaction/functional and pairing correlations

• Computational tools
– Augmented Lagrangian Method
– Broyden Method

• Precision tools
– Large bases
– Benchmarks

• Distributed computing tools
– MPI wrapper
– Load balancing
– Efficient, independent, constrained calculations
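As an illustration of the Augmented Lagrangian Method listed among the computational tools, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: the quadratic "energy", the linear "moment" Q = x + y, the textbook form L = E + λ(Q - q0) + (c/2)(Q - q0)^2 with the update λ ← λ + c(Q - q0), and the gradient-descent inner loop. Sign and normalization conventions in actual HFB codes may differ.

```python
def energy_gradient(x, y):
    """Gradient of the toy energy E = (x - 1)^2 + (y - 2)^2."""
    return 2.0 * (x - 1.0), 2.0 * (y - 2.0)


def constraint(x, y, target):
    """Deviation Q - target of the toy moment Q = x + y."""
    return x + y - target


def augmented_lagrangian(target, penalty=10.0, outer=20, inner=200, step=0.04):
    """Minimize the toy energy subject to Q = target.

    The inner loop minimizes the augmented Lagrangian by gradient descent
    at fixed multiplier; the outer loop updates the multiplier from the
    remaining constraint violation.
    """
    x, y, lam = 0.0, 0.0, 0.0
    for _ in range(outer):
        for _ in range(inner):
            ex, ey = energy_gradient(x, y)
            # dQ/dx = dQ/dy = 1, so both components receive the same force.
            force = lam + penalty * constraint(x, y, target)
            x -= step * (ex + force)
            y -= step * (ey + force)
        lam += penalty * constraint(x, y, target)
    return x, y, lam


x_min, y_min, multiplier = augmented_lagrangian(target=1.0)
```

For this toy problem the constrained minimum is at (0, 1) with multiplier 2, which the sketch reaches without needing an ever-growing penalty, the main practical appeal of the method for constrained potential energy surfaces.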

Static HFB pre-requisites

Page 11: DFT requirements for leadership-class computers

DFT Computing Infrastructure

Interfacing codes

Parallelize solver

Load balancing

Page 12: DFT requirements for leadership-class computers

Deliverables Year 2-3

• Have a DFT package combining HFB-THO and HFODD available for large-scale calculations

• Optimize full diagonalization of "large" (4,000 × 4,000) matrices in HFODD

– Take advantage of N-core architecture

– Increase speed for large bases (fission, heavy nuclei)

– Overcome current memory limitations

• Optimize Broyden method (Cf. Jorge’s talk) to improve stability/convergence

• Papers on odd nuclei:

1. Methodology and Theoretical Models

2. Systematics and comparison with experiment
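To give a sense of scale for the 4,000 × 4,000 matrices in the deliverables above, the following sketch computes the storage of one dense matrix. The element sizes (8 bytes for a double-precision real, 16 bytes for a double-precision complex number) are assumptions; the slides do not state which storage HFODD uses, and the helper name is hypothetical.

```python
def dense_matrix_bytes(dimension, bytes_per_element):
    """Storage in bytes of one dense square matrix."""
    return dimension * dimension * bytes_per_element


# Assumed element sizes: 8 bytes (real double) and 16 bytes (complex double).
real_mb = dense_matrix_bytes(4000, 8) / 1e6
complex_mb = dense_matrix_bytes(4000, 16) / 1e6
```

Under these assumptions a single matrix takes 128 MB (real) or 256 MB (complex), before counting eigenvectors and temporary work arrays, which is why memory per core becomes a constraint.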

Current status of the Year 2-3 workplan (in the order of the deliverables):

• Done (for spherical bases): large-scale calculations up to 14,112 cores (2 hours)

• Well on target
– Parallelization of the HFODD core (PBLAS, ScaLAPACK)
– Will solve issues related to speed, memory and precision
– Change of iteration cycle: updating HFB matrix elements instead of fields

• Done: numerical instabilities of large-scale calculations can be traced to physical instabilities built into current functionals (see Mario's talk)

• Delayed by the problem of instabilities
– Paper 1 ready to be published
– Paper 2 in preparation
– Additional Paper 3 on finite-size spin instabilities in preparation

Page 13: DFT requirements for leadership-class computers

Work Plan (Year 4)

• Physics

– Optimization of DME-based functionals: genetic algorithm + Argonne optimizer (cf Mario’s talk)

– Applications of DME functionals: UNEDF-1

• Computing

– Implement DME functionals in HFODD (study of time-odd channels)

– Complete version 1.0 of the parallel HFODD core
  Demonstrate efficiency and scalability of the code
  First applications: N-dimensional potential energy surface, fission pathways

– Improve the parallel interface to HFODD:
  Optimistic: it should be a good application of ADLB ("moderately long to long" work units of 1-2 hours, little communication)
  Realistic: remove the master and have him work like a slave (French revolution spirit)

– Replace sequential I/O by parallel I/O for HFODD records (used as checkpoints)
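The load-balancing question raised for the parallel interface can be illustrated with a small simulation. It compares handing out work units in a fixed round-robin order with letting each idle worker pull the next unit from a shared pool. This is only a sketch of the scheduling idea, not ADLB itself; the function names are hypothetical and the unit durations (1 to 2 hours, as quoted above) are made-up inputs.

```python
import heapq


def static_makespan(durations, n_workers):
    """Total run time when units are dealt out round-robin in advance."""
    totals = [0.0] * n_workers
    for index, duration in enumerate(durations):
        totals[index % n_workers] += duration
    return max(totals)


def dynamic_makespan(durations, n_workers):
    """Total run time when each idle worker takes the next unit from a pool."""
    finish_times = [0.0] * n_workers
    heapq.heapify(finish_times)
    for duration in durations:
        # The worker that becomes free first picks up the next unit.
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + duration)
    return max(finish_times)


# Hypothetical work units in hours, alternating long and short calculations.
units = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
static_hours = static_makespan(units, n_workers=2)
dynamic_hours = dynamic_makespan(units, n_workers=2)
```

In this example the fixed assignment leaves one worker with all the long units (8 hours in total), while the shared pool finishes in 6 hours, the ideal value for 12 hours of work on 2 workers.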

Remainder of the year

• New version of HFODD: HFBTHO interface, shell correction, finite-temperature, Augmented Lagrangian Method, matrix elements mixing, parallel interface, etc.
• 2 papers on odd nuclei and 1 on spin instabilities in preparation

Page 14: DFT requirements for leadership-class computers

December 10, 2008 Slide 14

Nuclear Structure and Nuclear Interactions

Forefront Questions in Nuclear Science and the Role of High Performance Computing January 26-28, 2009 · Washington, D.C.

Microscopic Description of Nuclear Fission

Scientific and computational challenges

• Describe dynamics with novel energy functionals and ab initio methods

1) adiabatic approach
2) non-adiabatic/early stochastic
3) full time-dependent dynamics

• Develop ultra-scale techniques for the description of fission

• Build a spectroscopic precision nuclear energy density functional

• Perform constrained minimization on a multi-dimensional potential energy surface

• Find full spectrum of dense millions-sized matrices

• Predict half-lives, mass and kinetic energy distribution of fission fragments and fission cross-sections

• Analyze the fission process through the visualization of time evolution

• Develop scalable application software for time-dependent many-body dynamics

• Societal impact
– Nuclear Energy programs
– Threat reduction
– NNSA Stockpile Stewardship Program

• Time-dependent many-body dynamics
– Low-energy heavy-ion collisions and nucleon- and photon-induced reactions
– Neutron star quakes
– Vortex dynamics in quantum superfluids

Summary of research direction

Expected Scientific and Computational Outcomes

Potential impact on Nuclear Science

Our Holy Grail…