Review/Preface to Second Day of DD-15 Tutorial
William D. Gropp
Lab for Advanced Numerical Software, Argonne National Laboratory
David E. Keyes
Department of Applied Physics & Applied Mathematics, Columbia University
DD15 Tutorial, Berlin, 17-18 July 2003

PETSc’s SLES & SNES: objects and algorithms
[Figure: block diagram of a PETSc application, with boxes marked as either PETSc code or user code. Labels: Main Routine; Timestepping Solvers (TS); Nonlinear Solvers (SNES); Linear Solvers (SLES), with PC and KSP; Application Initialization; Function Evaluation; Jacobian Evaluation; Post-Processing.]

PETSc objects
• Vector and matrix objects, Vec and Mat
  - Accessed and used as mathematical objects
  - Code usage independent of runtime choice of data structures
  - Implemented to allow for high performance
    - Split-phase operations
    - Overlap of communication and computation
    - Aggregation, locality, low memory footprint
• Cartesian grid object DA
• Solver objects SLES and SNES
  - Large collection of progressive algorithmic options assembled under abstract interfaces
  - Recursively callable, with separate contexts for sub-objects
  - Command-line tunable
• Auxiliary objects (e.g., Viewer)

Basic and advanced algorithms
• Linear solvers
  - Schwarz: single-level, two-level
  - Schur: reduced, full (interface-interior, primal-dual)
  - Schwarz-Schur hybrids
• Nonlinear solvers
  - Newton-Krylov-Schwarz (NKS)
  - Jacobian-free Newton-Krylov (JFNK)
• Implicit time integrators
  - Time-accurate time stepping (PETSc’s TS)
  - Pseudo-transient continuation (Ψtc)

Scalability properties of DD
• Parallel scaling (time per iteration)
  - Good “surface-to-volume” ratio: communication subdominant to computation, Amdahl’s law defeated
  - Reasonable communication requirements: mostly nearest neighbor, of small size when global
  - Can increase processors without bound with increased system size, at power-law rate that depends upon richness of communication network
• Convergence scaling (iterations needed for solution)
  - Schwarz and Schur can have bounded condition number
  - Newton can have quadratic convergence
  - Nonlinear robustification techniques

So far…
• Good stories mentioned …
  - 1999 Gordon Bell Prize for aerodynamic computation using NKS (PETSc, Argonne)
  - 2002 Gordon Bell Prize for structural dynamics computation using FETI-DP (Salinas, Sandia)
  - Combustion model for Ψtc
• But only simple examples presented in detail …
  - Poisson problem
  - Bratu problem
  - Driven cavity

This morning …
• Brief review
• More advanced algorithms
• Programmatic context of current PETSc development environment
  - Mathematicians and computer scientists, come join the fun ☺
• Broader sense of PDE-type scientific applications being addressed through PETSc
  - Applications scientists, come join the fun ☺

Two distinct definitions of scalability
• “Strong scaling”
  - execution time decreases in inverse proportion to the number of processors
  - fixed size problem overall
• “Weak scaling”
  - execution time remains constant, as problem size and processor number are increased in proportion
  - fixed size problem per processor
  - also known as “Gustafson scaling”
[Figure: two plots. Strong scaling (N constant): log T versus log p, with curves labeled “good” (slope = -1) and “poor”. Weak scaling (N ∝ p): T versus p, with curves labeled “good” (slope = 0) and “poor”.]
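
Three important usages of “efficiency”
• For any iterative method,
  $$\text{time} = (\#\text{ of iters}) \times (\text{time per iter})$$
• “Convergence efficiency”
  $$\eta_{\text{conv}} = \frac{\#\text{ of iters for } P_{\text{small}}}{\#\text{ of iters for } P_{\text{large}}}$$
• “Implementation efficiency”
  $$\eta_{\text{impl}} = \frac{(\text{time per iter for } P_{\text{small}})}{(\text{time per iter for } P_{\text{large}})} \times \frac{P_{\text{small}}}{P_{\text{large}}}$$
• Overall efficiency
  $$\eta = \eta_{\text{conv}} \times \eta_{\text{impl}} = \frac{(\text{time for } P_{\text{small}})}{(\text{time for } P_{\text{large}})} \times \frac{P_{\text{small}}}{P_{\text{large}}}$$
• “Ideal” is unity for all three measures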
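The three efficiency measures can be computed directly from two timed runs. The following is a minimal sketch, assuming each run is summarized by its processor count, iteration count, and total time; the function name and the sample numbers are hypothetical and are not measurements from the tutorial.

```python
def efficiencies(p_small, iters_small, time_small,
                 p_large, iters_large, time_large):
    """Return (eta_conv, eta_impl, eta) for a pair of runs.

    Each run is described by processor count, iteration count, and
    total time; time per iteration is derived as time / iterations.
    """
    eta_conv = iters_small / iters_large
    time_per_iter_small = time_small / iters_small
    time_per_iter_large = time_large / iters_large
    eta_impl = (time_per_iter_small / time_per_iter_large) * (p_small / p_large)
    eta = (time_small / time_large) * (p_small / p_large)
    return eta_conv, eta_impl, eta


# Hypothetical runs: 16 processors (20 iterations, 100 s)
# versus 64 processors (25 iterations, 40 s).
eta_conv, eta_impl, eta = efficiencies(16, 20, 100.0, 64, 25, 40.0)
```

Because total time is the product of iteration count and time per iteration, the overall efficiency returned here equals the product of the other two, which gives a quick consistency check on measured data.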

PDE varieties
• Evolution (time hyperbolic, time parabolic)
  $$\frac{\partial}{\partial t}(\bullet) + \nabla\cdot(f(\bullet)) = \ldots, \qquad \frac{\partial}{\partial t}(\bullet) - \nabla\cdot(\alpha\nabla(\bullet)) = \ldots$$
• Equilibrium (elliptic, spatially hyperbolic or parabolic)
  $$\nabla\cdot(U(\bullet) - \alpha\nabla(\bullet)) = \ldots$$
• Mixed, varying by region
• Mixed, of multiple type (e.g., parabolic with elliptic constraint)

Common features
• Despite the enormous variety of mathematical analysis required, most computational algorithms for PDEs can be described with surprisingly few common bulk-synchronous (SPMD) kernels
  - Domain-decomposed loops over mesh points manipulating purely local data
  - Domain-decomposed loops over mesh points (or edges) manipulating own and near neighbors’ data
  - Global reductions and recurrences
• The large variety of operations, yet simplicity of operation class, invites abstraction and performance optimization: in short, an object-oriented library
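The three kernel classes can be illustrated serially. This is a minimal sketch, assuming a 1D mesh split into contiguous non-empty subdomains stored as plain lists; the function names are hypothetical, and the "communication" is simulated by reading a neighbor's end value rather than by message passing.

```python
def axpy(a, x_parts, y_parts):
    """Purely local kernel: a*x + y on each subdomain, no neighbor data."""
    return [[a * xi + yi for xi, yi in zip(x, y)]
            for x, y in zip(x_parts, y_parts)]


def laplacian(x_parts):
    """Near-neighbor kernel: 1D stencil [-1, 2, -1] with one ghost value
    per side taken from the adjacent subdomain (zero at the mesh ends)."""
    out = []
    for s, x in enumerate(x_parts):
        left_ghost = x_parts[s - 1][-1] if s > 0 else 0.0
        right_ghost = x_parts[s + 1][0] if s < len(x_parts) - 1 else 0.0
        padded = [left_ghost] + list(x) + [right_ghost]
        out.append([2.0 * padded[k] - padded[k - 1] - padded[k + 1]
                    for k in range(1, len(padded) - 1)])
    return out


def dot(x_parts, y_parts):
    """Global reduction: local partial sums combined across subdomains."""
    partial = [sum(xi * yi for xi, yi in zip(x, y))
               for x, y in zip(x_parts, y_parts)]
    return sum(partial)


# Two subdomains owning 3 and 2 mesh points.
x_parts = [[1.0, 2.0, 3.0], [4.0, 5.0]]
y_parts = [[1.0, 1.0, 1.0], [1.0, 1.0]]
```

In a parallel setting only the second and third kernels need communication: the stencil exchanges ghost values with nearest neighbors, and the dot product needs a reduction over all subdomains.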

Questions?
• Now is a great time for them!
• Next, we go on to nonlinear Schwarz and advanced concepts in nonlinear solver software, to finish Lecture #2 from Thursday afternoon
• Then we start Lecture #3 on applications this morning
• This afternoon’s Lecture #4 is on two research topics: physics-based preconditioning and PDE-constrained optimization
• Whatever we don’t finish will be posted online by this evening

Nonlinear Schwarz preconditioning
• Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free
• It replaces $F(u) = 0$ with a new nonlinear system possessing the same root, $\Phi(u) = 0$
• Define a correction $\delta_i(u)$ to the $i^{th}$ partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system:
  $$R_i F(u + \delta_i(u)) = 0$$
  where $\delta_i(u) \in \Re^n$ is nonzero only in the components of the $i^{th}$ partition
• Then sum the corrections: $\Phi(u) = \sum_i \delta_i(u)$, to get an implicit function of $u$
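The construction of Φ(u) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the ASPIN implementation: the residual is a small 1D Bratu problem (chosen because the Bratu problem is one of the tutorial's examples), each partition is a single unknown so the local nonlinear solve is a scalar Newton iteration with a finite-difference derivative, and the outer iteration is a plain fixed point u ← u + Φ(u) rather than the Newton-Krylov outer solver discussed in the slides. All names and parameter values are hypothetical.

```python
import math

N = 7                # interior grid points (toy size, illustrative)
H = 1.0 / (N + 1)    # mesh spacing
LAM = 1.0            # Bratu parameter (illustrative value)


def F(u):
    """Residual of a 1D Bratu problem with zero Dirichlet boundary values."""
    res = []
    for i in range(N):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < N - 1 else 0.0
        res.append(2.0 * u[i] - left - right - H * H * LAM * math.exp(u[i]))
    return res


def delta_i(u, i, tol=1e-13, max_it=20, eps=1e-7):
    """Solve R_i F(u + delta_i) = 0 where partition i is the single unknown i."""
    d = 0.0
    for _ in range(max_it):
        trial = list(u)
        trial[i] += d
        g = F(trial)[i]                 # R_i F(u + delta_i)
        if abs(g) < tol:
            break
        trial[i] += eps
        dg = (F(trial)[i] - g) / eps    # finite-difference local derivative
        d -= g / dg                     # inner (local) Newton update
    return d


def Phi(u):
    """Phi(u) = sum_i delta_i(u); partitions are disjoint, so the sum
    of the zero-padded corrections is just their concatenation."""
    return [delta_i(u, i) for i in range(N)]


def solve_by_fixed_point(sweeps=600):
    """Drive Phi(u) toward zero with the simple iteration u <- u + Phi(u)."""
    u = [0.0] * N
    for _ in range(sweeps):
        u = [ui + di for ui, di in zip(u, Phi(u))]
    return u
```

Evaluating both `F` and `Phi` at the vector returned by `solve_by_fixed_point()` gives residuals near rounding level, which is a numerical instance of the two systems sharing a root. Only evaluations of F are used, in keeping with the Jacobian-free character of the method.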

Nonlinear Schwarz – picture
[Figure, built up over three slides: the vectors u and F(u), with 0/1 restriction matrices R_i and R_j extracting the subdomain pieces R_i u, R_j u, R_i F, and R_j F; the local Jacobian F_i'(u_i); and the summed corrections δ_i u + δ_j u.]

Nonlinear Schwarz, cont.
• It is simple to prove that if the Jacobian of $F(u)$ is nonsingular in a neighborhood of the desired root then $\Phi(u) = 0$ and $F(u) = 0$ have the same unique root
• To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate for any $u, v \in \Re^n$:
  - The residual $\Phi(u) = \sum_i \delta_i(u)$
  - The Jacobian-vector product $\Phi'(u)\,v$
• Remarkably (Cai-Keyes, 2000), it can be shown that
  $$\Phi'(u)\,v \approx \sum_i \left(R_i^T J_i^{-1} R_i\right) J v$$
  where $J = F'(u)$ and $J_i = R_i J R_i^T$
• All required actions are available in terms of $F(u)$!
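A hedged sketch of where an approximation of this form comes from (a first-order argument under the assumption that the local corrections are small, not the proof in the cited work): linearize the local problem about u, and use the fact that δ_i is nonzero only in partition i, so that δ_i = R_i^T R_i δ_i.

```latex
\begin{aligned}
0 = R_i F(u + \delta_i)
  &\approx R_i F(u) + R_i J R_i^T \,(R_i \delta_i)
   = R_i F(u) + J_i \,(R_i \delta_i) \\
\Rightarrow\quad \delta_i(u)
  &\approx -\,R_i^T J_i^{-1} R_i \, F(u),
  \qquad
  \Phi(u) \approx -\sum_i R_i^T J_i^{-1} R_i \, F(u)
\end{aligned}
```

Differentiating with the factors R_i^T J_i^{-1} R_i held fixed gives a Jacobian-vector product of the form Σ_i R_i^T J_i^{-1} R_i J v. With the correction written as u + δ_i the linearization carries an overall minus sign; defining the correction with the opposite sign, R_i F(u − δ_i) = 0, removes it, and the root of Φ is the same either way.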

Nonlinear Schwarz, cont.
Discussion:
• After the linear version of additive Schwarz was invented, it was realized that it is in the same class of methods as additive multigrid, in terms of operator algebra: projection off the error in a set of subspaces (though linear MG is usually done multiplicatively)
• After nonlinear Schwarz was invented, it was realized that it is in the same class of methods as nonlinear multigrid (full approximation scheme MG), in terms of operator algebra (though nonlinear multigrid is usually done multiplicatively)
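
Driven cavity in velocity-vorticity coords
x-velocity: $-\nabla^2 u - \dfrac{\partial \omega}{\partial y} = 0$
y-velocity: $-\nabla^2 v + \dfrac{\partial \omega}{\partial x} = 0$
vorticity: $-\nabla^2 \omega + u\dfrac{\partial \omega}{\partial x} + v\dfrac{\partial \omega}{\partial y} - \mathrm{Gr}\,\dfrac{\partial T}{\partial x} = 0$
internal energy: $-\nabla^2 T + \mathrm{Pr}\left(u\dfrac{\partial T}{\partial x} + v\dfrac{\partial T}{\partial y}\right) = 0$
[Figure: the cavity, with walls labeled hot and cold.]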

Experimental example of nonlinear Schwarz
[Figure: two panels. Newton’s method: difficulty at critical Re, stagnation beyond critical Re. Additive Schwarz Preconditioned Inexact Newton (ASPIN): convergence for all Re.]

Common software infrastructure for nonlinear PDE solvers – coming in PETSc
• The user codes to the problem they are solving (by providing F(u)), not the algorithm used to solve the problem
• Implementations of various algorithms reuse common concepts and code when possible, without losing efficiency
Vision:
[Figure: diagram of solver components, with arrows indicating dependence: Optimizer, Sensitivity analyzer, Time integrator, Nonlinear solver, Eigensolver, Linear solver.]

Encompassing …
• Newton’s method
  - Jacobian-free methods
  - w/ Jacobian-based preconditioned linear solvers (Newton-PCG)
  - w/ multigrid linear solvers (Newton-MG)
• Nonlinear multigrid
  - a.k.a. Full approximation scheme (FAS)
  - a.k.a. MG-Newton

Software engineering ingredients
• Standard solver interfaces (SLES, SNES)
• Solver libraries
• Automatic differentiation (AD) tools
• Code generation tools
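
Algorithm review
F(u) = 0, Jacobian A(u)
Newton:
  $$u \leftarrow u - A^{-1}(u)\,F(u)$$
Newton–SOR (1 inner linear sweep over i):
  $$u_i \leftarrow u_i - A_{ii}^{-1}(u)\Big\{F_i(u) - \sum_{j<i} A_{ij}(u)\,[u_j - \bar{u}_j]\Big\}$$
SOR–Newton (1 inner nonlinear sweep over i):
  $$u_i \leftarrow u_i - A_{ii}^{-1}(\bar{u})\,F_i(\bar{u})$$
(Here $\bar{u}$ denotes the partially updated iterate, in which components $j < i$ have already been swept.)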

Cute observation
SOR-Newton
  $$u_i \leftarrow u_i - A_{ii}^{-1}(\bar{u})\,F_i(\bar{u})$$
… with approximations
  $$A_{ii}(\bar{u}) \approx A_{ii}(u), \qquad F_i(\bar{u}) \approx F_i(u) + \sum_{j<i} A_{ij}(u)\,[\bar{u}_j - u_j]$$
… gives Newton-SOR
  $$u_i \leftarrow u_i - A_{ii}^{-1}(u)\Big\{F_i(u) - \sum_{j<i} A_{ij}(u)\,[u_j - \bar{u}_j]\Big\}$$
(Here $\bar{u}$ denotes the partially updated iterate, in which components $j < i$ have already been swept.)
⇒ (Gauss-Seidel) matrix-free linear relaxation is very similar to nonlinear relaxation
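The relationship between the two sweeps can be checked numerically. The following is a minimal sketch, assuming a toy 1D Bratu residual with a hand-coded tridiagonal Jacobian and a relaxation factor of 1 (Gauss-Seidel); all names and parameter values are hypothetical. For this particular residual the off-diagonal coupling is linear and A_ii depends only on u_i, so both approximations hold exactly and the two sweeps agree to rounding. For a residual with nonlinear coupling they would agree only approximately.

```python
import math

N = 7                # interior grid points (toy size, illustrative)
H = 1.0 / (N + 1)    # mesh spacing
LAM = 1.0            # Bratu parameter (illustrative value)


def F_i(u, i):
    """i-th residual component of a 1D Bratu problem, zero Dirichlet ends."""
    left = u[i - 1] if i > 0 else 0.0
    right = u[i + 1] if i < N - 1 else 0.0
    return 2.0 * u[i] - left - right - H * H * LAM * math.exp(u[i])


def A_ii(u, i):
    """Diagonal Jacobian entry dF_i/du_i."""
    return 2.0 - H * H * LAM * math.exp(u[i])


def A_ij(u, i, j):
    """Off-diagonal Jacobian entry dF_i/du_j for i != j."""
    return -1.0 if abs(i - j) == 1 else 0.0


def sor_newton_sweep(u):
    """One nonlinear sweep: a pointwise Newton step at the updated state."""
    ubar = list(u)
    for i in range(N):
        ubar[i] = ubar[i] - F_i(ubar, i) / A_ii(ubar, i)
    return ubar


def newton_sor_sweep(u):
    """One linear sweep on the Newton system, with F and A frozen at u."""
    ubar = list(u)
    for i in range(N):
        coupling = sum(A_ij(u, i, j) * (u[j] - ubar[j]) for j in range(i))
        ubar[i] = u[i] - (F_i(u, i) - coupling) / A_ii(u, i)
    return ubar


u0 = [0.1 * k for k in range(N)]
difference = max(abs(a - b)
                 for a, b in zip(sor_newton_sweep(u0), newton_sor_sweep(u0)))
```

The nonlinear sweep needs F_i and A_ii one point at a time at the updated state, while the linear sweep needs F and the rows of A frozen at the old state.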

Function and Jacobian evaluation
• FAS requires them pointwise
• Newton requires them globally
• Newton-MG requires both
  - Newton (outside) globally
  - MG (inside) locally

Enter Automatic Differentiation
• Given code for $F(u)$, AD can compute
  - $A(u)$ and
  - $A(u)\,w$ efficiently
• Given code for $F_i(u)$, AD can compute
  - $A_{ii}(u)$ and
  - $\sum_j A_{ij}(u)\,w_j$ efficiently
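How AD obtains A(u)*w from code for F(u) can be sketched with forward-mode dual numbers. This is a hand-rolled illustration, not one of the AD tools referred to in the tutorial; the `Dual` class, the toy Bratu residual, and all names are hypothetical. Seeding the derivative part of u with w and evaluating the unchanged residual code yields the product without ever forming A(u).

```python
import math


class Dual:
    """A value plus a first-order derivative part: val + dot * epsilon."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__


def exp(x):
    """exp that propagates derivatives when given a Dual."""
    if isinstance(x, Dual):
        e = math.exp(x.val)
        return Dual(e, e * x.dot)
    return math.exp(x)


N = 7                # interior grid points (toy size, illustrative)
H = 1.0 / (N + 1)    # mesh spacing
LAM = 1.0            # Bratu parameter (illustrative value)


def F(u):
    """1D Bratu residual; works on floats or Duals without modification."""
    res = []
    for i in range(N):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < N - 1 else 0.0
        res.append(2.0 * u[i] - left - right - H * H * LAM * exp(u[i]))
    return res


def jacobian_vector_product(u, w):
    """A(u) * w from a single evaluation of F on seeded dual numbers."""
    seeded = [Dual(ui, wi) for ui, wi in zip(u, w)]
    return [r.dot for r in F(seeded)]
```

Calling `jacobian_vector_product(u, w)` with w set to a unit vector returns one column of A(u). Applying the same seeding to code for a single component F_i gives the pointwise quantities A_ii(u) and the row product.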

Automated code generation (“in-lining”)
• Newton method requires a user-provided function and an AD Jacobian
• Big performance hit if handled directly with components
  - An outer loop over an inner subroutine that calls a function at each point is inefficient globally
  - An outer subroutine that loops over all inner points is wasteful locally (information thrown away)
• Need to have it both ways, depending upon algorithmic context; user should not see this level of performance-induced coding complexity

Coarse grid correction can also be handled
Coarse grid objects generated together with fine (e.g., PETSc’s DMMG)
• Newton-MG
  $$A_H(Ru)\,c_H = R\,F(u), \qquad u \leftarrow u - R^T c_H$$
• MG-Newton
  $$F_H(Ru + c_H) - F_H(Ru) + R\,F(u) = 0$$
Conclusion
The algorithmic/mathematical building blocks for Newton-MG and MG-Newton are essentially the same.
Thus the software building blocks should be also (and they will be in the next release of PETSc, version 3.0).