ENZO AND EXTREME SCALE AMR FOR HYDRODYNAMIC COSMOLOGY
Michael L. Norman, UC San Diego and [email protected]
WHAT IS ENZO?
- A parallel AMR application for astrophysics and cosmology simulations
  - Hybrid physics: fluid + particle + gravity + radiation
  - Block-structured AMR
  - MPI or hybrid parallelism
- Under continuous development since 1994
  - Started by Greg Bryan and Mike Norman @ NCSA
  - Shared memory, then distributed memory, then hierarchical memory parallelism
  - C++/C/Fortran, >185,000 LOC
- Community code in widespread use worldwide
  - Hundreds of users, dozens of developers
  - Version 2.0 @ http://enzo.googlecode.com
TWO PRIMARY APPLICATION DOMAINS
ASTROPHYSICAL FLUID DYNAMICS: supersonic turbulence
HYDRODYNAMIC COSMOLOGY: large-scale structure
ENZO PHYSICS

Physics | Equations | Math type | Algorithm(s) | Communication
Dark matter | Newtonian N-body | Numerical integration | Particle-mesh | Gather-scatter
Gravity | Poisson | Elliptic | FFT, multigrid | Global
Gas dynamics | Euler | Nonlinear hyperbolic | Explicit finite volume | Nearest neighbor
Magnetic fields | Ideal MHD | Nonlinear hyperbolic | Explicit finite volume | Nearest neighbor
Radiation transport | Flux-limited radiation diffusion | Nonlinear parabolic | Implicit finite difference, multigrid solves | Global
Multispecies chemistry | Kinetic equations | Coupled stiff ODEs | Explicit BE, implicit | None
Inertial, tracer, source, and sink particles | Newtonian N-body | Numerical integration | Particle-mesh | Gather-scatter
Physics modules can be used in any combination in 1D, 2D, and 3D, making ENZO a very powerful and versatile code.
ENZO MESHING
- Berger-Colella structured AMR
- Cartesian base grid and subgrids
- Hierarchical timestepping across Levels 0, 1, 2, ...
- AMR = collection of grids (patches); each grid is a C++ object
- Unigrid = collection of Level 0 grid patches
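The patch hierarchy and hierarchical timestepping described above can be sketched in a few lines (a hedged illustration with invented names; Enzo's actual C++ grid class is far richer). With a refinement factor of 2, a Level l patch takes 2^l substeps for every Level 0 step:

```python
# Minimal sketch of a Berger-Colella-style patch hierarchy with
# hierarchical timestepping (hypothetical names, refinement factor 2).
class Grid:
    """One AMR patch; Enzo represents each patch as a C++ object."""
    def __init__(self, level, parent=None):
        self.level = level
        self.parent = parent
        self.children = []
        self.steps_taken = 0

    def refine(self):
        """Add a child patch one level deeper."""
        child = Grid(self.level + 1, parent=self)
        self.children.append(child)
        return child

def advance(grid, dt):
    """Advance a patch by dt, then advance each child with two
    half-steps (hierarchical timestepping, refinement factor 2)."""
    grid.steps_taken += 1
    for child in grid.children:
        advance(child, dt / 2)
        advance(child, dt / 2)

root = Grid(level=0)      # a Level 0 base-grid patch
sub = root.refine()       # Level 1 subgrid
subsub = sub.refine()     # Level 2 subgrid
advance(root, dt=1.0)     # one Level 0 step
# Level l patches take 2**l substeps per root step: 1, 2, 4
```

The recursion makes the time hierarchy explicit: finer levels are advanced twice per parent step, so they stay synchronized with the coarse grid at the end of every Level 0 step.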
EVOLUTION OF ENZO PARALLELISM
- Shared memory (PowerC) parallel (1994-1998)
  - SMP and DSM architectures (SGI Origin 2000, Altix)
  - Parallel DO across grids at a given refinement level, including block-decomposed base grid
  - O(10,000) grids
- Distributed memory (MPI) parallel (1998-2008)
  - MPP and SMP cluster architectures (e.g., IBM PowerN)
  - Level 0 grid partitioned across processors
  - Level >0 grids within a processor executed sequentially
  - Dynamic load balancing by messaging grids to underloaded processors (greedy load balancing)
  - O(100,000) grids

[Figure: projection of refinement levels; 160,000 grid patches at 4 refinement levels]
- 1 MPI task per processor
- Task = a Level 0 grid patch and all associated subgrids, processed sequentially across and within levels
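The greedy load balancing mentioned above can be sketched as follows (a generic illustration, not Enzo's actual implementation): grids are taken largest-first and each is assigned to the currently least-loaded processor, tracked here with a min-heap.

```python
import heapq

def greedy_balance(grid_work, n_procs):
    """Greedy load balancing: assign each grid, largest work first,
    to the least-loaded processor (min-heap of processor loads)."""
    heap = [(0.0, p) for p in range(n_procs)]  # (load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for gid, work in sorted(grid_work.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)       # least-loaded processor
        assignment[gid] = proc
        heapq.heappush(heap, (load + work, proc))
    return assignment

# Four grids of unequal work distributed over two processors:
work = {"g0": 8.0, "g1": 4.0, "g2": 3.0, "g3": 1.0}
assign = greedy_balance(work, n_procs=2)
```

In Enzo the "assignment" step is realized by messaging whole grid objects from overloaded to underloaded processors; the sketch only captures the placement decision.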
EVOLUTION OF ENZO PARALLELISM
- Hierarchical memory (MPI+OpenMP) parallel (2008-)
  - SMP and multicore cluster architectures (Sun Constellation, Cray XT4/5)
  - Level 0 grid partitioned across shared-memory nodes/multicore processors
  - Parallel DO across grids at a given refinement level within a node
  - Dynamic load balancing less critical because of larger MPI task granularity (statistical load balancing)
  - O(1,000,000) grids
- N MPI tasks per SMP, M OpenMP threads per task
- Task = a Level 0 grid patch and all associated subgrids, processed concurrently within levels and sequentially across levels
- Each grid is processed by an OpenMP thread
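The hybrid scheme, concurrent within a level but sequential across levels, can be sketched with a thread pool standing in for the OpenMP parallel DO (illustrative only; function and grid names are invented):

```python
from concurrent.futures import ThreadPoolExecutor

def evolve_levels(grids_by_level, n_threads=4):
    """Process refinement levels in order; within each level, handle
    grids concurrently (a stand-in for OpenMP's parallel DO)."""
    trace = []  # (level, grid) pairs in completion order
    for level in sorted(grids_by_level):
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            # Each grid at this level is dispatched to a worker thread.
            done = list(pool.map(lambda g: (level, g),
                                 grids_by_level[level]))
        # Leaving the `with` block is the level barrier: all grids at
        # this level finish before the next level starts.
        trace.extend(done)
    return trace

hierarchy = {0: ["g00"], 1: ["g10", "g11"], 2: ["g20", "g21", "g22"]}
trace = evolve_levels(hierarchy)
```

The per-level barrier mirrors why dynamic load balancing matters less here: each MPI task owns many grids, so thread-level scheduling smooths out imbalance statistically.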
ENZO ON PETASCALE PLATFORMS
ENZO ON CRAY XT5: 1% OF THE 6400^3 SIMULATION
- Non-AMR 6400^3, 80 Mpc box
- 15,625 (25^3) MPI tasks, 256^3 root grid tiles
- 6 OpenMP threads per task
- 93,750 cores
- 30 TB per checkpoint/restart/data dump
- >15 GB/sec read, >7 GB/sec write
- Benefit of threading: reduced MPI overhead and improved disk I/O
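The decomposition above is internally consistent and easy to verify: 25^3 MPI tasks, each owning a 256^3 root-grid tile, reproduce the 6400^3 mesh, and 15,625 tasks at 6 threads each give the quoted core count.

```python
tasks_per_dim = 25     # 25^3 = 15,625 MPI tasks
tile = 256             # 256^3 cells per root-grid tile
threads = 6            # OpenMP threads per task

n_tasks = tasks_per_dim ** 3        # total MPI tasks
mesh_per_dim = tasks_per_dim * tile # cells per dimension of full mesh
n_cores = n_tasks * threads         # total cores used
```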
ENZO ON PETASCALE PLATFORMS
ENZO ON CRAY XT5: 10^5 SPATIAL DYNAMIC RANGE
- AMR 1024^3, 50 Mpc box, 7 levels of refinement
- 4096 (16^3) MPI tasks, 64^3 root grid tiles
- 1 to 6 OpenMP threads per task: 4096 to 24,576 cores
- Benefit of threading: thread count increases with memory growth, reducing replication of grid hierarchy data
- Using MPI+threads to access more RAM as the AMR calculation grows in size
ENZO ON PETASCALE PLATFORMS
ENZO-RHD ON CRAY XT5: COSMIC REIONIZATION
- Including radiation transport: ~10x more expensive
- LLNL Hypre multigrid solver dominates run time; near-ideal scaling to at least 32K MPI tasks
- Non-AMR 1024^3, 8 and 16 Mpc boxes
- 4096 (16^3) MPI tasks, 64^3 root grid tiles
BLUE WATERS TARGET SIMULATION: RE-IONIZING THE UNIVERSE
- Cosmic reionization is a weak-scaling problem: large volumes at a fixed resolution to span the range of scales
- Non-AMR 4096^3 with ENZO-RHD
- Hybrid MPI and OpenMP; SMT and SIMD tuning
- 128^3 to 256^3 root grid tiles
- 4-8 OpenMP threads per task
- 4-8 TBytes per checkpoint/restart/data dump (HDF5)
- In-core intermediate checkpoints (?)
- 64-bit arithmetic, 64-bit integers and pointers
- Aiming for 64-128 K cores, 20-40 M hours (?)
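The quoted dump size is consistent with the 64-bit arithmetic noted above: one field on a 4096^3 mesh occupies exactly 0.5 TiB, so a 4-8 TByte checkpoint corresponds to roughly 8-16 field-sized arrays (the field count is an inference for illustration, not stated in the slide).

```python
cells = 4096 ** 3                    # = 2**36 cells, non-AMR mesh
bytes_per_field = cells * 8          # 64-bit values: 2**39 bytes
tib = 2 ** 40                        # one tebibyte

field_tib = bytes_per_field / tib            # TiB per field
n_fields_low = 4 * tib // bytes_per_field    # fields in a 4 TiB dump
n_fields_high = 8 * tib // bytes_per_field   # fields in an 8 TiB dump
```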
PETASCALE AND BEYOND
- ENZO's AMR infrastructure limits scalability to O(10^4) cores
- We are developing a new, extremely scalable AMR infrastructure called Cello: http://lca.ucsd.edu/projects/cello
- ENZO-P will be implemented on top of Cello to scale to the petascale and beyond
CURRENT CAPABILITIES: AMR VS TREECODE
CELLO EXTREME AMR FRAMEWORK: DESIGN PRINCIPLES
- Hierarchical parallelism and load balancing to improve localization
- Reduce global synchronization to a minimum
- Flexible mapping between data structures and concurrency
- Object-oriented design
- Build on the best available software for fault-tolerant, dynamically scheduled concurrent objects (Charm++)
CELLO EXTREME AMR FRAMEWORK: APPROACH AND SOLUTIONS
1. Hybrid replicated/distributed octree-based AMR approach, with novel modifications to improve AMR scaling in both size and depth
2. Patch-local adaptive time steps
3. Flexible hybrid parallelization strategies
4. Hierarchical load balancing approach based on actual performance measurements
5. Dynamic task scheduling and communication
6. Flexible reorganization of AMR data in memory to permit independent optimization of computation, communication, and storage
7. Variable AMR grid block sizes while keeping parallel task sizes fixed
8. Addressing numerical precision and range issues that arise in particularly deep AMR hierarchies
9. Detecting and handling hardware or software faults during run-time to improve software resilience and enable software self-management
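The octree organization behind item 1 can be illustrated with a minimal sketch (generic, not Cello's actual data structure): each refined block splits into 2^3 = 8 children, and the active blocks are the leaves of the tree.

```python
class OctreeNode:
    """One AMR block; a refined block holds 8 children (2x2x2)."""
    def __init__(self, level=0):
        self.level = level
        self.children = []

    def refine(self):
        """Split this block into 8 children one level deeper."""
        self.children = [OctreeNode(self.level + 1) for _ in range(8)]
        return self.children

    def leaves(self):
        """Active blocks are the leaves of the octree."""
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

root = OctreeNode()
kids = root.refine()   # 8 Level 1 blocks
kids[0].refine()       # refine one child: 8 Level 2 blocks
```

Scaling this structure in both breadth (many siblings per level) and depth (many levels) without replicating the whole tree on every processor is exactly the challenge the hybrid replicated/distributed design targets.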
IMPROVING THE AMR MESH: PATCH COALESCING
IMPROVING THE AMR MESH: TARGETED REFINEMENT
IMPROVING THE AMR MESH: TARGETED REFINEMENT WITH BACKFILL
CELLO SOFTWARE COMPONENTS
http://lca.ucsd.edu/projects/cello
ROADMAP

ENZO RESOURCES
- Enzo website (code, documentation): http://lca.ucsd.edu/projects/enzo
- 2010 Enzo User Workshop slides: http://lca.ucsd.edu/workshops/enzo2010
- yt website (analysis and vis.): http://yt.enzotools.org
- Jacques website (analysis and vis.): http://jacques.enzotools.org/doc/Jacques/Jacques.html
BACKUP SLIDES
GRID HIERARCHY DATA STRUCTURE
[Figure: mesh patches at Levels 0, 1, and 2, and the corresponding grid hierarchy tree with nodes labeled (level, sibling index); the tree grows in depth (level) and breadth (# siblings)]
Scaling the AMR grid hierarchy in depth and breadth
1024^3, 7-LEVEL AMR STATS

Level | Grids   | Memory (MB) | Work = Mem*(2^level)
0     |     512 |     179,029 |  179,029
1     | 223,275 |     114,629 |  229,258
2     |  51,522 |      21,226 |   84,904
3     |  17,448 |       6,085 |   48,680
4     |   7,216 |       1,975 |   31,600
5     |   3,370 |       1,006 |   32,192
6     |   1,674 |         599 |   38,336
7     |     794 |         311 |   39,808
Total | 305,881 |     324,860 |  683,807
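The Work column follows the weighting Work = Memory * 2^level, reflecting the 2^level substeps each level takes per root-grid timestep; the table's entries can be reproduced directly from the Memory column:

```python
# Memory (MB) per refinement level, from the AMR stats table.
memory_mb = {0: 179029, 1: 114629, 2: 21226, 3: 6085,
             4: 1975, 5: 1006, 6: 599, 7: 311}

def work(level, mem):
    """Work estimate: memory weighted by the 2**level substep count."""
    return mem * 2 ** level

work_by_level = {lv: work(lv, mem) for lv, mem in memory_mb.items()}
total_work = sum(work_by_level.values())
```

Note how hierarchical timestepping shifts the cost profile: Level 1 holds far less memory than Level 0 yet accounts for more work, because its grids are advanced twice as often.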
[Figure: current MPI implementation. A real grid object holds grid metadata plus physics data; a virtual grid object holds grid metadata only.]
SCALING AMR GRID HIERARCHY
- Flat MPI implementation is not scalable because grid hierarchy metadata is replicated in every processor
- For very large grid counts, this metadata (not the physics data!) dominates the memory requirement
- Hybrid parallel implementation helps a lot: hierarchy metadata is replicated only in every SMP node instead of every processor
- We would prefer fewer SMP nodes (8192-4096) with bigger core counts (32-64) (=262,144 cores)
- Communication burden is partially shifted from MPI to intra-node memory accesses
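The metadata-replication argument can be quantified with a toy model (the per-grid metadata size of 512 bytes is an assumed illustrative number, not Enzo's actual figure): replicating hierarchy metadata per MPI process scales with the process count, while per-node replication divides that cost by the cores per node.

```python
def replicated_metadata_gib(n_grids, bytes_per_grid, n_replicas):
    """Total memory consumed by replicated grid-hierarchy metadata."""
    return n_grids * bytes_per_grid * n_replicas / 2 ** 30

n_grids = 300_000        # grid count comparable to the AMR stats table
bytes_per_grid = 512     # assumed metadata bytes per grid (illustrative)

flat = replicated_metadata_gib(n_grids, bytes_per_grid, 4096)      # per rank
hybrid = replicated_metadata_gib(n_grids, bytes_per_grid, 4096 // 16)  # per node
```

With 4096 MPI ranks grouped 16 to a node, the hybrid scheme cuts aggregate metadata memory by exactly the cores-per-node factor, which is why bigger nodes with more cores are preferred.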
CELLO EXTREME AMR FRAMEWORK
- Targeted at fluid, particle, or hybrid (fluid + particle) simulations on millions of cores
- Generic AMR scaling issues:
  - Small AMR patches restrict available parallelism
  - Dynamic load balancing
  - Maintaining data locality for deep hierarchies
  - Re-meshing efficiency and scalability
  - Inherently global multilevel elliptic solves
  - Increased range and precision requirements for deep hierarchies