Fusion Minisymposium, SIAM CSE03
Preconditioning Implicit Methods for Coupled Physics Problems
David Keyes
Center for Computational Science, Old Dominion University
& Institute for Scientific Computing Research, Lawrence Livermore National Laboratory
Plan of presentation
Background/progress report
  Fusion applications in the TOPS SciDAC portfolio: Center for Extended Magnetohydrodynamic Modeling (CEMM); Center for Magnetic Reconnection Studies (CMRS)
  Parallel nonlinearly implicit solvers in TOPS: PETSc’s domain-decomposed JFNK framework; Hypre’s multilevel preconditioners
What’s wrong with this picture?
  Curse of the fast waves
  Curse of the splitting errors
  Curse of the “brute force” algebraic solvers
Current and future needs/plans
  Physics-based preconditioning
  Coupling multi-physics applications
Acknowledgments
For mentoring, application codes, and some slides, our primary SciDAC fusion collaborators to date: Steve Jardin and Jin Chen (CEMM); Amitava Bhattacharjee and Kai Germaschewski (CMRS)
For solver codes, new development, and some slides, our TOPS partners: Barry Smith and the PETSc team (ANL); Rob Falgout and the Hypre team (LLNL)
For putting it all together: Florin Dobrian, TOPS project post-doc (ODU)
For future algorithmic philosophy: Xiao-Chuan Cai (CU Boulder) and Dana Knoll (LANL)
For support and computational resources: the DOE SciDAC program and earlier computational math investments; the NERSC and ORNL computing centers
Two fusion energy applications
Center for Extended MHD Modeling (CEMM, based at PPPL)
  Realistic toroidal geometry, unstructured mesh, hybrid FE/FD discretization
  Fields expanded in scalar potentials and streamfunctions
  Operator-split, linearized, with 11 potential solves in each poloidal cross-plane per step (90% of execution time)
  Parallelized with PETSc (Tang et al., SIAM PP01; Chen et al., SIAM AN02; Jardin et al., SIAM CSE03)
  Wants from TOPS now: a scalable linear implicit solver for much higher resolution (and for AMR)
  Later: fully nonlinearly implicit solvers and coupling to other codes
Center for Magnetic Reconnection Studies (CMRS, based at Iowa)
  Idealized 2D Cartesian geometry, periodic BCs, simple FD discretization
  Mix of primitive variables and streamfunctions
  Sequential nonlinearly coupled explicit evolution, with 2 Poisson inversions on each timestep
  Wants from TOPS now: a scalable Jacobian-free Newton-Krylov nonlinearly implicit solver for higher resolution in 3D (and for AMR)
  Later: physics-based preconditioning for the nonlinearly implicit solver
CEMM profile from SciDAC website
Center for Extended Magnetohydrodynamic Modeling: This research project will develop computer codes that will enable a realistic assessment of the mechanisms leading to disruptive and other stability limits in the present and next generation of fusion devices. With an improvement in the efficiency of codes and with the extension of the leading 3D nonlinear magneto-fluid models of hot, magnetized fusion plasmas, this research will pioneer new plasma simulations of unprecedented realism and resolution. These simulations will provide new insights into low frequency, long-wavelength non-linear dynamics in hot magnetized plasmas, some of the most critical and complex phenomena in plasma and fusion science. The underlying models will be validated through cross-code and experimental comparisons.
PI: Steve Jardin, Princeton Plasma Physics Lab
CMRS profile from SciDAC website
Magnetic Reconnection: Applications to Sawtooth Oscillations, Error Field Induced Islands and the Dynamo Effect: The research goals of this project include producing a unique high performance code and using this code to study magnetic reconnection in astrophysical plasmas, in smaller scale laboratory experiments, and in fusion devices. The modular code that will be developed will be a fully three-dimensional, compressible Hall MHD code with options to run in slab, cylindrical and toroidal geometry and flexible enough to allow change in algorithms as needed. The code will use adaptive grid refinement, will run on massively parallel computers, and will be portable and scalable. The research goals include studies that will provide increased understanding of sawtooth oscillations in tokamaks, magnetotail substorms, error-fields in tokamaks, reverse field pinch dynamos, astrophysical dynamos, and laboratory reconnection experiments.
PI: Amitava Bhattacharjee, University of Iowa
Summary of progress on CEMM
TOPS is running the full-up M3D code on PPPL’s preferred NERSC platform, “Seaborg”
Using up to 75 processors per poloidal cross-plane in weak scaling with fixed memory per processor (parallel concurrency can be 10-100 times this in the toroidal direction); PETSc handles all aspects of the distributed data structures
Have integrated Hypre as a preconditioner to PETSc and created new Hypre coarsening rules to accommodate a peculiarity of M3D’s Dirichlet boundary conditions
1-level additive Schwarz preconditioner has excellent per-iteration implementation efficiency, but nonscalable convergence
Algebraic multigrid preconditioner has perfect convergence scalability, but (presently) high set-up time and poor set-up scalability, and mildly degrading per-iteration implementation efficiency
PLAN: separately tune each of the 11 different Poisson solves per timestep
PLAN: work on improving Hypre solves and amortizing Hypre set-ups
PLAN (with APDEC team): incorporate AMR
Hypre’s AMG in the CEMM app
M3D’s custom discretization and parallel data structures are dictated by geometry and physics: periodic toroidally, unstructured poloidally, fit to flux surfaces, partitioned into 3n² chunks poloidally times an arbitrary toroidal decomposition, with some mesh anisotropy
Mesh mapped and reordered using PETSc’s index sets; systems solved by GMRES, preconditioned with Hypre’s parallel algebraic MG solver of Ruge-Stüben type
figures c/o S. Jardin, CEMM
Hypre’s AMG in the CEMM app
Iteration count results below are averaged over 19 different PETSc SLESSolve calls in the initialization and first timestep loop for this operator-split unsteady code; the abscissa is the number of processors in the scaled problem; problem size ranges from 12K to 303K unknowns (approx. 4K per processor)
[Chart: average iteration counts (0 to 700) versus number of processors (3, 12, 27, 48, 75) in the scaled problem, for ASM-GMRES and AMG-FGMRES]
Hypre’s AMG in the CEMM app, cont.
Scaled speedup timing results below are summed over 19 different PETSc SLESSolve calls in the initialization and first timestep loop for this operator-split unsteady code
Much of the AMG cost is coarse-grid formation (preprocessing); in production, these coarse hierarchies will be saved for reuse (the same linear systems are called in each timestep loop), making AMG less expensive
[Chart: time (0 to 60) versus number of processors (3, 12, 27, 48, 75) in the scaled problem, for ASM-GMRES, AMG-FGMRES, and the AMG inner portion]
Summary of progress on CMRS
CMRS team has provided TOPS with a discretization of its model 2D multicomponent MHD evolution code in PETSc’s FormFunctionLocal format, using DMMG and automatic differentiation for the Jacobian objects
TOPS has implemented a fully nonlinearly implicit GMRES-MG-ILU parallel solver with custom deflation of the nullspace in CMRS’s doubly periodic formulation
CMRS and TOPS reproduce the same dynamics on the same grids with the same time-stepping, up to a finite-time singularity due to collapse of the current sheet (which falls below the presently uniform mesh resolution)
TOPS code, being implicit, can choose timesteps an order of magnitude larger, with potential for a higher ratio in more physically realistic parameter regimes, but is still slower in wall-clock time
PLAN: tune the PETSc solver by profiling, blocking, reuse, etc.
PLAN: go to higher order in time
PLAN: identify the numerical complexity benefits from implicitness (in suppressing fast timescales) and quantify (explicit versus implicit)
PLAN (with APDEC team): incorporate AMR
2D Hall MHD sawtooth instability
Model equations: (Porcelli et al., 1993, 1999)
[Figures: equilibrium; vorticity at an early time; vorticity at a later time, with zoom; figures c/o A. Bhattacharjee, CMRS]
ex29.c in PETSc 2.5.1
PETSc’s DMMG in the CMRS app
Mesh and time refinement studies of the CMRS Hall magnetic reconnection model problem (4 mesh sizes; dt=0.1, nondimensional, near the CFL limit for the fastest wave, on the left; dt=0.8 on the right)
Plotted: a functional measure, inverse to the thickness of the current sheet, versus time for 0 < t < 200 (nondimensional); the singularity occurs around t=215
PETSc’s DMMG in the CMRS app, cont.
Implicit timestep increase studies of the CMRS Hall magnetic reconnection model problem, on the finest (192×192) mesh of the previous slide, plotted in absolute magnitude rather than semi-log
Scope for TOPS
Design and implementation of “solvers”
  Time integrators: $f(\dot{x}, x, t, p) = 0$
  Nonlinear solvers: $F(x, p) = 0$
  Constrained optimizers: $\min_u \phi(x, u)$ s.t. $F(x, u) = 0,\ u \ge 0$
  Linear solvers: $Ax = b$
  Eigensolvers: $Ax = \lambda B x$
Software integration
Performance optimization
[Diagram: dependences among the solver components, namely Optimizer, Nonlinear solver (w/ sens. anal.), Time integrator (w/ sens. anal.), Eigensolver, Linear solver, and Sensitivity Analyzer; arrows indicate dependence]
Jacobian-Free Newton-Krylov Method
In the Jacobian-free Newton-Krylov (JFNK) method for $F(u) = 0$, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products
These are approximated by Fréchet derivatives,
$J(u)\,v \approx \frac{1}{\epsilon}\,[\,F(u + \epsilon v) - F(u)\,]$
so that the actual Jacobian elements are never explicitly needed; $\epsilon$ is chosen with a fine balance between approximation error and floating-point rounding error (or the product is obtained by automatic differentiation)
Build the preconditioner using convenient approximations, without losing Newton convergence in the outer loop
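For concreteness, here is a minimal Python/NumPy sketch of the Fréchet-derivative Jacobian-vector product; the function name and the differencing-parameter heuristic are illustrative assumptions, not the specific choices made inside PETSc.

```python
import numpy as np

def jfnk_matvec(F, u, v, Fu=None):
    """Approximate J(u) v = F'(u) v by a first-order Frechet difference."""
    if Fu is None:
        Fu = F(u)                      # residual at the current Newton iterate
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(u)
    # One common heuristic for the differencing parameter: balance the
    # truncation error of the difference against floating-point rounding error.
    eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u)) / nv
    return (F(u + eps * v) - Fu) / eps

# Usage: wrap jfnk_matvec as a linear operator and hand it to a Krylov method
# (e.g., scipy.sparse.linalg.gmres) to solve J(u) du = -F(u) without ever
# assembling J.
```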
Background of the PETSc Library
Developed at ANL to support research, prototyping, and production parallel solutions of operator equations in message-passing environments
Distributed data structures as fundamental objects: index sets, vectors/gridfunctions, and matrices/arrays
Iterative linear and nonlinear solvers, combinable modularly, recursively, and extensibly
Portable, and callable from C, C++, and Fortran
Uniform high-level API, with multi-layered entry
Aggressively optimized: copies minimized, communication aggregated and overlapped, caches and registers reused, memory chunks preallocated, inspector-executor model for repetitive tasks (e.g., gather/scatter)
See http://www.mcs.anl.gov/petsc
User Code/PETSc Interactions
[Diagram: the user’s Main Routine calls PETSc’s Timestepping Solvers (TS), Nonlinear Solvers (SNES), and Linear Solvers (SLES), which are built on PETSc’s KSP and PC components; user code supplies Application Initialization, Function Evaluation, Jacobian Evaluation, and Post-Processing]
User Code/PETSc Interactions, cont.
[Diagram: as on the previous slide, but with AD-generated code supplying the Jacobian Evaluation (automatic differentiation of the user’s function), and with Hypre’s AMG added within the PC component]
Background of the Hypre Library
Developed at LLNL to support research, prototyping, and production parallel solutions of linear equations in message-passing environments
Object-oriented design similar to PETSc
Concentrates on linear problems only
Richer in preconditioners than PETSc, with a focus on algebraic multigrid, including many research prototypes not yet in the public release
Includes other preconditioners as well, such as sparse approximate inverse (ParaSails) and parallel ILU (Euclid)
See http://www.llnl.gov/CASC/hypre/
Algebraic multigrid background
AMG assumes only information about the underlying matrix structure
AMG automatically forms coarse operators, based on problem data, using heuristics
There are many AMG algorithms, and the area is under active theoretical and experimental development
[Diagram, AMG framework: error components in R^n that are damped by pointwise relaxation versus “algebraically smooth” error; choose coarse grids, transfer operators, etc., to eliminate the algebraically smooth error, and recur, as in geometric MG]
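As a concrete, if simplified, illustration of the framework, here is a minimal two-grid cycle sketch in Python/NumPy. The coarse operator is formed algebraically by the Galerkin product; the AMG-specific heuristics for choosing the coarse points and interpolation weights (e.g., Ruge-Stüben coarsening) are assumed to have already produced the interpolation P and are not shown.

```python
import numpy as np

def two_grid_cycle(A, b, x, P, nu=2, omega=2.0/3.0):
    """One two-grid cycle: relax, coarse-grid correct, relax again.

    A : fine-grid matrix (n x n); P : interpolation (n x nc), produced by the
    coarsening/interpolation heuristics.  Real AMG recurses on Ac instead of
    solving it directly.
    """
    D = np.diag(A)
    for _ in range(nu):                      # pre-smoothing (weighted Jacobi)
        x = x + omega * (b - A @ x) / D
    Ac = P.T @ A @ P                         # Galerkin coarse-grid operator
    rc = P.T @ (b - A @ x)                   # restrict the residual
    x = x + P @ np.linalg.solve(Ac, rc)      # interpolate the coarse correction
    for _ in range(nu):                      # post-smoothing
        x = x + omega * (b - A @ x) / D
    return x
```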
Sample of Hypre’s scaled efficiency
[Chart: PFMG-CG on Red, 40×40×40 points per processor; scaled efficiency (0 to 1) of the Setup and Solve phases versus number of processors/problem size (up to roughly 4000 processors), ranging from 64K DOFs to 200M DOFs]
What’s wrong with this picture?
Curse of the fast waves: MHD phenomena include waves at many different timescales, some of which (e.g., Whistler, Alfvén) are not dynamically relevant to some simulations but suppress the timestep of explicit methods due to instability
Curse of the splitting errors: traditional operator-split approaches to advancing evolutionary PDEs incur first-order-in-time splitting errors that suppress the timestep of semi-implicit methods due to inaccuracy
Curse of the “brute force” algebraic solvers: fully implicit methods to overcome timestep limitations demand powerful solvers, whose cost per iteration at large timesteps may take back any savings they provide by allowing larger timesteps for a given accuracy
What do we need?
To overcome the algebraic complexity of implicit solvers: knowledge of the physics that is already present in traditional operator-split approaches
To begin to couple existing large-scale nonlinear physics codes: an algebraic framework, like Schwarz methods for linear problems, which shows how to use conveniently obtained solutions to subproblems of the new fully coupled problem (e.g., the solver procedures in the original component codes) to arrive at the solution of the full problem, faster and more reliably than by nonlinear Picard iteration (“successive substitutions”) alone
TOPS is addressing these near-downstream issues:
  Physics-based preconditioning (work w/ D. Knoll’s LANL group)
  Nonlinear Schwarz (work w/ X.-C. Cai’s Boulder group, part of TOPS)
  Nonlinear solver interface (work w/ the SciDAC CCA project)
Philosophy of Jacobian-free NK
To evaluate the linear residual, we use the true F'(u), giving a true Newton step and asymptotically quadratic Newton convergence
To precondition the linear residual, we do anything convenient that uses understanding of the dominant physics/mathematics in the system and respects the limitations of the parallel computer architecture and the cost of various operations:
  combinations of operator-split Jacobians (for reasons of physics or reasons of numerics)
  Jacobian of a related discretization (for “fast” solves)
  Jacobian of a lower-order discretization (for more stability, less storage)
  Jacobian with “lagged” values for expensive terms (for less computation per degree of freedom)
  Jacobian stored in lower precision (for less memory traffic per preconditioning step)
  Jacobian blocks decomposed for parallelism
Philosophy of Jacobian-free NK, cont.
These motivations are not new; most large-scale application codes already take “short cuts” on the approximate Jacobian operator to be inverted, showing physical intuition
The problem with many codes is that they do not anywhere have an accurate global Jacobian operator; they use only the weak Jacobian
This leads to a weakly nonlinearly converging “defect correction method”
Defect correction: $B\,\delta u^k = -F(u^k)$
in contrast to preconditioned Newton: $B^{-1} J(u^k)\,\delta u^k = -B^{-1} F(u^k)$
Physics-based preconditioning: example of the 1D shallow water equations
Continuity: $\partial\phi/\partial t + \partial(\phi u)/\partial x = 0$
Momentum: $\partial(\phi u)/\partial t + \partial(\phi u^2)/\partial x + g\phi\,\partial\phi/\partial x = 0$
Gravity wave speed: $\sqrt{g\phi}$
Typically $\sqrt{g\phi} \gg u$, but stability restrictions require timesteps based on the CFL criterion for the fastest wave, for an explicit method
One can solve fully implicitly, or one can filter out the gravity wave by solving semi-implicitly
Physics-based preconditioning, cont.
Continuity (*): $\frac{1}{\Delta t}(\phi^{n+1} - \phi^{n}) + \partial(\phi u)^{n+1}/\partial x = 0$
Momentum (**): $\frac{1}{\Delta t}[(\phi u)^{n+1} - (\phi u)^{n}] + \partial(\phi^{n}(u^{n})^2)/\partial x + g\phi^{n}\,\partial\phi^{n+1}/\partial x = 0$
Solving (**) for $(\phi u)^{n+1}$ and substituting into (*),
$\phi^{n+1} - \Delta t^2\,\frac{\partial}{\partial x}\Big(g\phi^{n}\,\frac{\partial\phi^{n+1}}{\partial x}\Big) = \phi^{n} - \Delta t\,\frac{\partial S^{n}}{\partial x}$
where $S^{n} = (\phi u)^{n} - \Delta t\,\partial(\phi^{n}(u^{n})^2)/\partial x$
Physics-based preconditioning, cont.
After the parabolic equation is spatially discretized and solved for $\phi^{n+1}$, then $(\phi u)^{n+1}$ can be found from
$(\phi u)^{n+1} = S^{n} - \Delta t\,g\phi^{n}\,\partial\phi^{n+1}/\partial x$
One scalar parabolic solve and one scalar explicit update replace an implicit hyperbolic system
Similar tricks are employed in aerodynamics (sound waves), MHD (Alfvén waves), etc.
Temporal truncation error remains in (**), due to the lagged advection, but in “delta” form it makes a great preconditioner
Mousseau et al. for gravity waves (a generalization of this example); Pernice et al. for acoustic waves (SIMPLE scheme); Chacon et al. for Whistler waves in MHD (very intricate; see JCP)
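A minimal Python/NumPy sketch of one such semi-implicit step on a periodic 1D grid; the centered differences, conservative operator assembly, and function names are illustrative assumptions for this sketch, not the discretization of any of the codes cited above.

```python
import numpy as np

def ddx(f, dx):
    """Centered first derivative on a periodic 1D grid."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

def gravity_wave_operator(phi_n, dt, dx, g):
    """A = I - dt^2 d/dx( g*phi_n d/dx ), conservative differences, periodic."""
    c = g * phi_n * dt**2 / dx**2
    w_m = 0.5 * (c + np.roll(c, 1))          # left-face coefficients
    w_p = np.roll(w_m, -1)                   # right-face coefficients
    A = np.diag(1.0 + w_m + w_p)
    A -= np.diag(w_m[1:], -1); A[0, -1] -= w_m[0]
    A -= np.diag(w_p[:-1], 1); A[-1, 0] -= w_p[-1]
    return A

def semi_implicit_step(phi, phiu, dt, dx, g=9.81):
    """Advance (phi, phi*u) one step, treating only the gravity wave implicitly."""
    u = phiu / phi
    S = phiu - dt * ddx(phi * u**2, dx)      # momentum with lagged advection
    A = gravity_wave_operator(phi, dt, dx, g)
    phi_new = np.linalg.solve(A, phi - dt * ddx(S, dx))   # scalar parabolic solve
    phiu_new = S - dt * g * phi * ddx(phi_new, dx)        # explicit momentum update
    return phi_new, phiu_new
```

Used as a solver, this step carries the first-order error from lagging the advection, which is exactly why the same mapping is attractive as a preconditioner on the following slides.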
Physics-based preconditioning
When the shallow-water wave splitting is used as a solver, it leaves a first-order-in-time splitting error
In the Jacobian-free Newton-Krylov framework, this solver, which maps a residual into a correction, can instead be regarded as a preconditioner
The true Jacobian is never formed, yet the time-implicit nonlinear residual at each time step can be made as small as needed for nonlinear consistency in long time integrations
Physics-based preconditioning
In Newton iteration, one seeks to obtain a correction (“delta”) to the solution by inverting the Jacobian matrix on (the negative of) the nonlinear residual:
$\delta u^k = -[J(u^k)]^{-1} F(u^k)$
A typical operator-split code also derives a “delta” to the solution, by some implicitly defined means, through a series of implicit and explicit substeps
This implicitly defined mapping from residual to “delta”, $F(u^k) \mapsto \delta u^k$, is a natural preconditioner
TOPS software must (and will) accommodate this!
1D shallow water preconditioning
Define the continuity residual for each timestep:
$R_{\phi} \equiv \frac{1}{\Delta t}(\phi^{n+1} - \phi^{n}) + \partial(\phi u)^{n+1}/\partial x$
Define the momentum residual for each timestep:
$R_{\phi u} \equiv \frac{1}{\Delta t}[(\phi u)^{n+1} - (\phi u)^{n}] + \partial(\phi^{n}(u^{n})^2)/\partial x + g\phi^{n}\,\partial\phi^{n+1}/\partial x$
Continuity delta-form (*):
$\frac{1}{\Delta t}\,\delta\phi + \partial(\delta(\phi u))/\partial x = -R_{\phi}$
Momentum delta-form (**):
$\frac{1}{\Delta t}\,\delta(\phi u) + g\phi^{n}\,\partial(\delta\phi)/\partial x = -R_{\phi u}$
1D shallow water preconditioning, cont.
Solving (**) for $\delta(\phi u)$ and substituting into (*),
$\delta\phi - \Delta t^2\,\frac{\partial}{\partial x}\Big(g\phi^{n}\,\frac{\partial(\delta\phi)}{\partial x}\Big) = -\Delta t\,R_{\phi} + \Delta t^2\,\frac{\partial R_{\phi u}}{\partial x}$
After this parabolic equation is solved for $\delta\phi$, we have
$\delta(\phi u) = -\Delta t\,[\,g\phi^{n}\,\partial(\delta\phi)/\partial x + R_{\phi u}\,]$
This completes the application of the preconditioner to one Newton-Krylov iteration at one timestep
Of course, the parabolic solve need not be done exactly; one sweep of multigrid can be used
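Schematically, the preconditioner application inside JFNK then looks like the following Python/NumPy sketch; the periodic 1D discretization and function names repeat the illustrative assumptions of the earlier sketch, and a direct parabolic solve stands in for the single multigrid sweep.

```python
import numpy as np

def ddx(f, dx):
    """Centered first derivative on a periodic 1D grid."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

def apply_sw_preconditioner(R_phi, R_phiu, phi_n, dt, dx, g=9.81):
    """Map the (continuity, momentum) residuals to corrections (d_phi, d_phiu).

    This is the residual-to-delta mapping of the semi-implicit splitting,
    used here as M^{-1} inside a Krylov iteration rather than as the solver.
    """
    c = g * phi_n * dt**2 / dx**2
    w_m = 0.5 * (c + np.roll(c, 1))
    w_p = np.roll(w_m, -1)
    A = np.diag(1.0 + w_m + w_p)                 # I - dt^2 d/dx(g*phi_n d/dx)
    A -= np.diag(w_m[1:], -1); A[0, -1] -= w_m[0]
    A -= np.diag(w_p[:-1], 1); A[-1, 0] -= w_p[-1]
    d_phi = np.linalg.solve(A, -dt * R_phi + dt**2 * ddx(R_phiu, dx))
    d_phiu = -dt * (g * phi_n * ddx(d_phi, dx) + R_phiu)
    return d_phi, d_phiu

# Inside JFNK, the Krylov method sees the true (matrix-free) Jacobian, while
# each preconditioning step costs one scalar parabolic solve (in practice,
# one multigrid sweep on it).
```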
Operator-split preconditioning
Subcomponents of a PDE operator often have special structure that can be exploited if they are treated separately
Algebraically, this is just a generalization of Schwarz, by term instead of by subdomain
Suppose $J = I - S - R$ and a preconditioner is to be constructed, where $(I - S)$ and $(I - R)$ are each “easy” to invert
Form a preconditioned vector $v$ from $u$ as follows: $v = (I - R)^{-1}(I - S)^{-1} u$
Equivalent to replacing $J = I - S - R$ with $(I - S)(I - R) = I - S - R + SR$
First-order splitting error, yet often used as a solver!
Operator-split preconditioning, cont.
Suppose S is convection-diffusion and R is reaction, among a collection of fields stored as gridfunctions
On a small regular 2D grid with a five-point stencil:
  R is trivially invertible in block-diagonal form
  S is invertible with one multilevel solve per field
[Figure: sparsity structure of J = S + R]
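To make the application of such a split preconditioner concrete, here is a minimal Python/NumPy sketch of $v = (I - R)^{-1}(I - S)^{-1} u$ for a multicomponent gridfunction; the per-field solver callable and the data layout are assumptions made for this sketch only.

```python
import numpy as np

def apply_split_preconditioner(u, solve_I_minus_S, R_blocks):
    """Apply v = (I - R)^{-1} (I - S)^{-1} u for a multicomponent gridfunction.

    u               : array of shape (npoints, nfields)
    solve_I_minus_S : callable (field_vector, field_index) -> field_vector,
                      applying (I - S)^{-1} one field at a time (e.g., one
                      multigrid V-cycle per decoupled scalar system)
    R_blocks        : array (npoints, nfields, nfields), the pointwise reaction
                      coupling, so (I - R) is block diagonal by grid point
    """
    npts, nf = u.shape
    # (I - S)^{-1}: decoupled scalar convection-diffusion solves, one per field
    w = np.stack([solve_I_minus_S(u[:, j], j) for j in range(nf)], axis=1)
    # (I - R)^{-1}: small dense solves, one per grid point (trivially parallel)
    eye = np.eye(nf)
    v = np.empty_like(w)
    for p in range(npts):
        v[p] = np.linalg.solve(eye - R_blocks[p], w[p])
    return v
```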
Operator-split preconditioning, cont.
Preconditioners assembled from just the “strong” elements of the Jacobian, alternating the source-term and diffusion-term operators, are competitive in convergence rates with block-ILU on the full Jacobian
particularly since the decoupled scalar diffusion systems are amenable to simple multigrid treatment, which is not as trivial for the coupled system
The decoupled preconditioners store many fewer elements, significantly reduce memory bandwidth requirements, and are expected to be much faster per iteration when carefully implemented
See “alternative block factorization” by Bank et al.; incorporated into the SciDAC TSI solver by D’Azevedo
Nonlinear Schwarz preconditioning
Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free
It replaces $F(u) = 0$ with a new nonlinear system $\Phi(u) = 0$ possessing the same root
Define a correction $\delta_i(u)$ to the $i$-th partition (e.g., subdomain) of the solution vector $u$ by solving the following local nonlinear system:
$R_i\,F(u + \delta_i(u)) = 0$
where $\delta_i(u) \in \mathbb{R}^n$ is nonzero only in the components of the $i$-th partition
Then sum the corrections: $\Phi(u) = \sum_i \delta_i(u)$
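A minimal Python/NumPy sketch of evaluating the nonlinear Schwarz residual $\Phi(u)$; the dense local Newton solve stands in for whatever subdomain solver is actually available, and the function names are illustrative assumptions.

```python
import numpy as np

def nonlinear_schwarz_residual(F, J, u, partitions, newton_iters=10, tol=1e-10):
    """Evaluate Phi(u) = sum_i delta_i(u), where each delta_i solves the local
    nonlinear system R_i F(u + delta_i) = 0 and is supported only on
    partition i.  F returns the global residual, J the global Jacobian
    (dense here, for clarity only).
    """
    phi = np.zeros_like(u)
    for idx in partitions:                        # idx: indices of partition i
        delta = np.zeros_like(u)
        for _ in range(newton_iters):             # local (inner) Newton loop
            r_loc = F(u + delta)[idx]             # R_i F(u + delta_i)
            if np.linalg.norm(r_loc) < tol:
                break
            J_loc = J(u + delta)[np.ix_(idx, idx)]    # J_i = R_i J R_i^T
            delta[idx] -= np.linalg.solve(J_loc, r_loc)
        phi += delta
    return phi
```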
Nonlinear Schwarz, cont.
It is simple to prove that if the Jacobian of $F(u)$ is nonsingular in a neighborhood of the desired root, then $\Phi(u) = 0$ and $F(u) = 0$ have the same unique root
To lead to a Jacobian-free Newton-Krylov algorithm, we need to be able to evaluate, for any $u, v \in \mathbb{R}^n$: the residual $\Phi(u) = \sum_i \delta_i(u)$ and the Jacobian-vector product $\Phi'(u)\,v$
Remarkably (Cai-Keyes, 2000), it can be shown that
$\Phi'(u)\,v \approx \sum_i (R_i^T J_i^{-1} R_i)\,J\,v$
where $J = F'(u)$ and $J_i = R_i J R_i^T$
All required actions are available in terms of $F(u)$!
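A companion sketch of the approximate Jacobian-vector product above, in the same illustrative NumPy setting as the residual sketch; J is dense here for clarity only, whereas in practice Jv would come from a Fréchet difference and the local solves would reuse the subdomain solvers.

```python
import numpy as np

def nonlinear_schwarz_jacvec(J, u, v, partitions):
    """Approximate Phi'(u) v ~= sum_i R_i^T J_i^{-1} R_i (J v),
    with J = F'(u) and J_i = R_i J R_i^T (Cai-Keyes, 2000)."""
    Ju = J(u)                                  # global Jacobian at u
    Jv = Ju @ v                                # global Jacobian-vector product
    out = np.zeros_like(v)
    for idx in partitions:
        out[idx] += np.linalg.solve(Ju[np.ix_(idx, idx)], Jv[idx])
    return out
```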
Example of nonlinear Schwarz
[Figure: nonlinear convergence of Newton’s method versus Additive Schwarz Preconditioned Inexact Newton (ASPIN); Newton’s method has difficulty at a critical Re and stagnates beyond the critical Re, while ASPIN converges for all Re]
Coupling using Nonlinear Schwarz
Consider systems $F_1(u_1, u_2) = 0$ and $F_2(u_1, u_2) = 0$
Couple together as $F(u) = 0$
Apply nonlinear Schwarz with nonoverlapping partitions corresponding to the original systems
Then
$\Phi'(u) \approx \sum_i (R_i^T J_i^{-1} R_i)\,J = \begin{pmatrix} I & J_{11}^{-1} J_{12} \\ J_{22}^{-1} J_{21} & I \end{pmatrix}$
where $J_{ij} = \partial F_i / \partial u_j$
(gives insight into coupling direction/scaling)
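A small NumPy illustration of that block structure; the two-field Jacobian below is entirely made up, chosen only to show how the diagonal blocks collapse to identities while the rescaled cross-coupling blocks remain.

```python
import numpy as np

# Toy two-field coupled Jacobian with weak off-diagonal coupling.
m = 4
rng = np.random.default_rng(0)
J11 = 4.0 * np.eye(m) + 0.1 * rng.standard_normal((m, m))
J22 = 2.0 * np.eye(m) + 0.1 * rng.standard_normal((m, m))
J12 = 0.3 * rng.standard_normal((m, m))
J21 = 0.3 * rng.standard_normal((m, m))

# Nonlinear-Schwarz preconditioned Jacobian of the coupled system:
#   sum_i R_i^T J_i^{-1} R_i J = [[ I, J11^{-1} J12 ], [ J22^{-1} J21, I ]]
P = np.block([[np.eye(m),                  np.linalg.solve(J11, J12)],
              [np.linalg.solve(J22, J21),  np.eye(m)]])

# The diagonal blocks are identities; what remains visible is each system's
# cross-coupling scaled by its own Jacobian, i.e. the "insight into coupling
# direction/scaling" noted above.
print(np.linalg.norm(np.linalg.solve(J11, J12)),
      np.linalg.norm(np.linalg.solve(J22, J21)))
```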
Abstract Gantt Chart for TOPS
[Chart: bars over time for Algorithmic Development, Research Implementations, Hardened Codes, Applications Integration, and Dissemination, with examples at various stages of maturity, e.g., ASPIN, TOPSLib, PETSc]
Each colored module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.
Summary
Important codes in fusion MHD are bottlenecked without powerful parallel implicit solvers
These same codes often contain valuable approximate solvers being employed as the only solver, thereby limiting timestep effectiveness
Coupling such codes together will only impose further limits on the timestep; it is likely to be smaller than the smallest timestep permitted by any component code, to accommodate higher levels of coupling
Nonlinearly implicit solvers for each component can leverage existing solvers as physics-based preconditioners, if the software allows, strengthening the components even prior to their integration
Nonlinear versions of Schwarz hold promise for preconditioning coupled codes
Doing physics is more fun than doing driven cavities