Challenges and Opportunities in Using Challenges and Opportunities in Using Automatic Differentiation with Object-Automatic Differentiation with Object-Oriented Toolkits for Scientific ComputingOriented Toolkits for Scientific Computing
Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory)Lois McInnes (ANL)Boyana Norris (ANL)Barry Smith (ANL)
The Computational Differentiation Project at Argonne National Laboratory
AcknowledgmentsAcknowledgments
Jason Abate Satish Balay Steve Benson Peter Brown Omar Ghattas Lisa Grignon William Gropp Alan Hindmarsh David Keyes Jorge Moré Linda Petzold Widodo Samyono
OutlineOutline
Intro to AD Survey of Toolkits
SensPVODE PETSc TAO
Using AD with Toolkits Toolkit Level Parallel Function Level Subdomain Level Element/Vertex Function Level
Experimental Results Conclusions and Expectations
Automatic DifferentiationAutomatic Differentiation
Technique for augmenting code for computing a function with code for computing derivatives
Analytic differentiation of elementary operations/functions, propagation by chain rule
Can be implemented using source transformation or operator overloading
Two main modes Forward: propagates derivatives from independent
to dependent variables Reverse (adjoint): propagates derivatives from
dependent to independent variables
Comparison of MethodsComparison of Methods
Finite Differences Advantages: cheap Jv, easy Disadvantages: inaccurate (not robust)
Hand Differentiation Advantages: accurate; cheap Jv, JTv, Hv, … Disadvantages: hard; difficult to maintain
consistency Automatic Differentiation
Advantages: cheap JTv, Hv; easy? Disadvantages: Jv costs ~2 function evals; hard?
PVODE: Parallel ODE-IVP PVODE: Parallel ODE-IVP solversolver
Algorithm developers:
Hindmarsh, Byrne, Brown and Cohen
ODE Initial-Value Problems
Stiff and non-stiff integrators
Written in C
MPI calls for communication
PVODE for ODE PVODE for ODE simulationssimulations
ODE Initial-Value Problem (standard form):
Implicit time-stepping using BDF methods for y(tn)
Solve nonlinear system for y(tn) via Inexact Newton Solve update to Newton iterate using Krylov methods
. , ,
.)( with ),,,( 00
mNN RpRyRy
ytypytfy
ObjectiveObjective
ODE Solver(PVODE)
F(y,p)
y|t=0
py|t=t1, t2, ...
SensitivitySolver
y|t=0
p y|t=t1, t2, ...
dy/dp|t=t1, t2, ...
automatically
Possible ApproachesPossible Approaches
ad_PVODE
ad_F(y,ad_y,p,ad_p)
y, ad_y|t=0
p,ad_py, ad_y|t=t1, t2, ...
SensPVODEy|t=0
py, dy/dp |t=t1, t2, ...
ad_F(y,ad_y,p,ad_p)
Apply AD to PVODE:
Solve sensitivity eqns:
Sensitivity Differential Sensitivity Differential EquationsEquations
Differentiate y= f(t, y, p) with respect to pi:
A linear ODE-IVP for the sensitivity vector si(t) :
.iii p
f
p
y
y
f
p
y
.0)( with ,)()( 0
tsp
fts
y
fts i
iii
PETScPETSc
Portable, Extensible Toolkit for Scientific computing
Parallel Object-oriented Free Supported (manuals, email) Interfaces with Fortran 77/90, C/C++ Available for virtually all UNIX platforms, as well
as Windows 95/NT Flexible: many options for solver algorithms and
parameters
PETSc codeUser code
ApplicationInitialization
FunctionEvaluation
JacobianEvaluation
Post-Processing
PC KSPPETSc
Linear Solvers (SLES)
Nonlinear Solvers (SNES)
SolveF(u) = 0
Nonlinear PDE SolutionNonlinear PDE Solution
AD-generated code
Main Routine
TAOTAO
Object-oriented techniques Component-based (CCA) interaction Leverage existing parallel computing infrastructure Reuse of external toolkits
The Right Way
The process of nature by which all things change and which is to be followed for a life of harmony
Toolkit for advanced optimization
TAO GoalsTAO Goals
Portability
Performance
Scalable parallelism
An interface independent of architecture
Unconstrained optimization Limited-memory variable-metric method Trust region/line search Newton method Conjugate-gradient method Levenberg-Marquardt method
Bound-constrained optimization Trust region Newton method Gradient projection/conjugate gradient method
Linearly-constrained optimization Interior-point method with iterative solvers
TAO AlgorithmsTAO Algorithms
ApplicationInitialization
Function & GradientEvaluation
HessianEvaluation
Post-Processing
Application Driver
Toolkit for Advanced
Optimization(TAO)PC KSP
Linear SolversMatrices
Vectors
Optimization Tools
TAO codeUser code PETSc code
PETSc and TAOPETSc and TAO
Using AD with ToolkitsUsing AD with Toolkits
Apply AD to toolkit to produce derivative-enhanced toolkit
Use AD to provide Jacobian/Hessian/gradient for use by toolkit. Apply AD at Parallel Function Level Subdomain Function Level Element/Vertex Function Level
Differentiated Version of ToolkitDifferentiated Version of Toolkit
Makes possible sensitivity analysis, black-box optimization of models constructed using toolkit
Can take advantage of high-level structure of algorithms, providing better performance: see Andreas’ and Linda’s talks
Ongoing work with PETSc and PVODE
Levels of Function EvaluationLevels of Function Evaluationint FormFunction(SNES snes,Vec X,Vec F,void *ptr){ Parallel Function Level /* Variable declarations omitted */ mx = user->mx; my = user->my; lambda = user->param; Subdomain Function Level hx = one/(double)(mx-1); hy = one/(double)(my-1); sc = hx*hy*lambda; hxdhy = hx/hy; hydhx = hy/hx;
ierr = DAGlobalToLocalBegin(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr); ierr = DAGlobalToLocalEnd(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr);
ierr = VecGetArray(localX,&x);CHKERRQ(ierr); ierr = VecGetArray(localF,&f);CHKERRQ(ierr);
ierr = DAGetCorners(user->da,&xs,&ys,PETSC_NULL,&xm,&ym,PETSC_NULL);CHKERRQ(ierr); ierr = DAGetGhostCorners(user->da,&gxs,&gys,PETSC_NULL,&gxm,&gym,PETSC_NULL);CHKERRQ(ierr);
for (j=ys; j<ys+ym; j++) { row = (j - gys)*gxm + xs - gxs - 1; for (i=xs; i<xs+xm; i++) { row++; if (i == 0 || j == 0 || i == mx-1 || j == my-1) {f[row] = x[row]; continue;} Vertex/Element Function Level u = x[row]; uxx = (two*u - x[row-1] - x[row+1])*hydhx; uyy = (two*u - x[row-gxm] - x[row+gxm])*hxdhy; f[row] = uxx + uyy - sc*PetscExpScalar(u); } }
ierr = VecRestoreArray(localX,&x);CHKERRQ(ierr); ierr = VecRestoreArray(localF,&f);CHKERRQ(ierr);
ierr = DALocalToGlobal(user->da,localF,INSERT_VALUES,F);CHKERRQ(ierr); ierr = PLogFlops(11*ym*xm);CHKERRQ(ierr); return 0; }
Parallel Function LevelParallel Function Level
Advantages Well-defined interface:
int Function(SNES, Vec, Vec, void *);void function(integer, Real, N_Vector, N_Vector, void *);
No changes to function Disadvantages
Differentiation of toolkit support functions (may result in unnecessary work)
AD of parallel code (MPI, OpenMP) May need global coloring
Subdomain Function Subdomain Function LevelLevel
Advantages No need to differentiate communication
functions Interface may be well defined
Disadvantages May need local coloring May need to extract from parallel function
Using AD with PETScUsing AD with PETSc
Global-to-local scatter of ghost values
Parallel functionassembly
Local Function computation
Parallel Jacobian assembly
Global-to-local scatter of ghost values
Local Jacobiancomputation
Local Function computation
ADIFOR or ADIC
Local Jacobiancomputation
Script file
Coded manually; can be automated
Seed matrix initialization
Using AD with TAOUsing AD with TAO
Global-to-local scatter of ghost values
Parallel functionassembly
Local Function computation
Parallel gradient assembly
Global-to-local scatter of ghost values
Local gradientcomputation
Local Function computation
ADIFOR or ADIC
Local gradientcomputation
Script file
Coded manually; can be automated
Seed matrix initialization
Element/Vertex Function LevelElement/Vertex Function Level
Advantages Reduced memory requirements No need for matrix coloring
Disadvantages May be difficult to decompose function to this level
(boundary conditions, other special cases) Decomposition to this level may impede efficiency
ExampleExample int localfunction2d(Field **x,Field **f,int xs, int xm, int ys, int ym, int mx,int my, void *ptr) {
xints = xs; xinte = xs+xm; yints = ys; yinte = ys+ym;
if (yints == 0) { j = 0; yints = yints + 1; for (i=xs; i<xs+xm; i++) { f[j][i].u = x[j][i].u; f[j][i].v = x[j][i].v; f[j][i].omega = x[j][i].omega + (x[j+1][i].u - x[j][i].u)*dhy; f[j][i].temp = x[j][i].temp-x[j+1][i].temp; } }
if (yinte == my) { j = my - 1; yinte = yinte - 1; for (i=xs; i<xs+xm; i++) { f[j][i].u = x[j][i].u - lid; f[j][i].v = x[j][i].v; f[j][i].omega = x[j][i].omega + (x[j][i].u - x[j-1][i].u)*dhy; f[j][i].temp = x[j][i].temp-x[j-1][i].temp; } }
if (xints == 0) { i = 0; xints = xints + 1; for (j=ys; j<ys+ym; j++) { f[j][i].u = x[j][i].u; f[j][i].v = x[j][i].v; f[j][i].omega = x[j][i].omega - (x[j][i+1].v - x[j][i].v)*dhx; f[j][i].temp = x[j][i].temp; } }
if (xinte == mx) { i = mx - 1; xinte = xinte - 1; for (j=ys; j<ys+ym; j++) { f[j][i].u = x[j][i].u; f[j][i].v = x[j][i].v; f[j][i].omega = x[j][i].omega - (x[j][i].v - x[j][i-1].v)*dhx; f[j][i].temp = x[j][i].temp - (double)(grashof>0); } }
for (j=yints; j<yinte; j++) { for (i=xints; i<xinte; i++) {
vx = x[j][i].u; avx = PetscAbsScalar(vx); vxp = p5*(vx+avx); vxm = p5*(vx-avx); vy = x[j][i].v; avy = PetscAbsScalar(vy); vyp = p5*(vy+avy); vym = p5*(vy-avy);
u = x[j][i].u; uxx = (two*u - x[j][i-1].u - x[j][i+1].u)*hydhx; uyy = (two*u - x[j-1][i].u - x[j+1][i].u)*hxdhy; f[j][i].u = uxx + uyy - p5*(x[j+1][i].omega-x[j-1][i].omega)*hx;
u = x[j][i].v; uxx = (two*u - x[j][i-1].v - x[j][i+1].v)*hydhx; uyy = (two*u - x[j-1][i].v - x[j+1][i].v)*hxdhy; f[j][i].v = uxx + uyy + p5*(x[j][i+1].omega-x[j][i-1].omega)*hy;
u = x[j][i].omega; uxx = (two*u - x[j][i-1].omega - x[j][i+1].omega)*hydhx; uyy = (two*u - x[j-1][i].omega - x[j+1][i].omega)*hxdhy; f[j][i].omega = uxx + uyy + (vxp*(u - x[j][i-1].omega) + vxm*(x[j][i+1].omega - u)) * hy +(vyp*(u - x[j-1][i].omega) + vym*(x[j+1][i].omega - u)) * hx -p5 * grashof * (x[j][i+1].temp - x[j][i-1].temp) * hy;
u = x[j][i].temp; uxx = (two*u - x[j][i-1].temp - x[j][i+1].temp)*hydhx; uyy = (two*u - x[j-1][i].temp - x[j+1][i].temp)*hxdhy; f[j][i].temp = uxx + uyy + prandtl * ((vxp*(u - x[j][i-1].temp) + vxm*(x[j][i+1].temp - u)) * hy + (vyp*(u - x[j-1][i].temp) + vym*(x[j+1][i].temp - u)) * hx); } }
ierr = PetscLogFlops(84*ym*xm);CHKERRQ(ierr); return 0; }
Experimental ResultsExperimental Results
Toolkit Level – Differentiated PETSc Linear Solver Parallel Nonlinear Function Level – SensPVODE Local Subdomain Function Level
PETSc TAO
Element Function Level – PETSc
Differentiated Linear Differentiated Linear Equation SolverEquation Solver
Increased AccuracyIncreased Accuracy
Increased AccuracyIncreased Accuracy
SensPVODE: ProblemSensPVODE: Problem
Diurnl kinetics advection-diffusion equation 100x100 structured grid 16 processors of a Linux cluster with 550 MHz
processors and Myrinet interconnect
SensPVODE: TimeSensPVODE: Time
SensPVODE: Number of TimestepsSensPVODE: Number of Timesteps
SensPVODE: Time/TimestepSensPVODE: Time/Timestep
PETSc ApplicationsPETSc Applications
Toy problems Solid fuel ignition: finite difference discretization; Fortran & C
variants; differentiated using ADIFOR, ADIC Driven cavity: finite difference discretization; C
implementation; differentiated using ADIC Euler code
Based on legacy F77 code from D. Whitfield (MSU) Finite volume discretization Up to 1,121,320 unknowns Mapped C-H grid Fully implicit steady-state Tools: SNES, DA, ADIFOR
C-H Structured GridC-H Structured Grid
Algorithmic PerformanceAlgorithmic Performance
Real PerformanceReal Performance
Hybrid MethodHybrid Method
Hybrid Method (cont.)Hybrid Method (cont.)
TAO: Preliminary ResultsTAO: Preliminary Results
For More InformationFor More Information
PETSc: http://www.mcs.anl.gov/petsc/ TAO: http://www.mcs.anl.gov/tao/ Automatic Differentiation at Argonne
http://www.mcs.anl.gov/autodiff/ ADIFOR: http://www.mcs.anl.gov/adifor/ ADIC: http://www.mcs.anl.gov/adic/
http://www.autodiff.org
10 challenges for PDE 10 challenges for PDE optimization algorithmsoptimization algorithms
1.1. Problem sizeProblem size2.2. Efficiency vs. intrusivenessEfficiency vs. intrusiveness3.3. ““Physics-based” globalizationsPhysics-based” globalizations4.4. Inexact PDE solversInexact PDE solvers5.5. Approximate JacobiansApproximate Jacobians6.6. Implicitly-defined PDE residualsImplicitly-defined PDE residuals7.7. Non-smooth solutionsNon-smooth solutions8.8. Pointwise inequality constraintsPointwise inequality constraints9.9. Scalability of adjoint methods to large numbers Scalability of adjoint methods to large numbers
of inequalities of inequalities 10.10. Time-dependent PDE optimization Time-dependent PDE optimization
7. Non-smoothness7. Non-smoothness
PDE residual may not depend smoothly on state PDE residual may not depend smoothly on state variables (maybe not even continuously)variables (maybe not even continuously) Solution-adaptivitySolution-adaptivity Discontinuity-capturing, front-trackingDiscontinuity-capturing, front-tracking Subgrid-scale modelsSubgrid-scale models Material property evaluationMaterial property evaluation Contact problemsContact problems Elasto(visco)plasticityElasto(visco)plasticity
PDE residual may not depend smoothly or PDE residual may not depend smoothly or continuously on decision variablescontinuously on decision variables Solid modeler- and mesh generator-inducedSolid modeler- and mesh generator-induced