COMPASS All-hands Meeting,
Fermilab, Sept. 17-18 2007
Scalable Solvers in Petascale Electromagnetic Simulation
Lie-Quan (Rich) Lee, Volkan Akcelik, Ernesto Prudencio, Lixin Ge
Stanford Linear Accelerator Center
Xiaoye Li, Esmond Ng
Lawrence Berkeley National Laboratory
Work supported by DOE ASCR, BES & HEP Divisions under contract DE-AC02-76SF00515
Overview
• Shape Determination/Optimization: V. Akcelik, L. Lee (SLAC); T. Tautges, P. Knupp, L. Diachin (ITAPS); O. Ghattas, E. Ng, D. Keyes (TOPS)
• Linear and Nonlinear Eigensolvers: L. Lee (SLAC); X. Li, E. Ng, C. Yang (LBNL/TOPS)
• Scalable Linear Solvers: L. Lee (SLAC); X. Li, E. Ng (TOPS)
Shape Determination and Optimization
Shape Determination and Optimization for SCRF Cavities
• Shape changes due to:
  - Fabrication errors
  - Addition of stiffening rings
  - Tuning for accelerating mode
• Change HOM damping -> beam quality
[Figure labels: ring in the middle; HOM damping changes; tuning]
Least-squares Minimization
• Unknowns are shape deviation parameters
• Gauss-Newton with truncated SVD
• Indefinite linear systems from KKT (deferred)
• Its forward problem is the Maxwell eigenvalue problem
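A minimal sketch of Gauss-Newton with a truncated-SVD step, assuming a toy forward problem (the eigenvalues of a small parameter-dependent symmetric matrix) in place of the Maxwell eigenvalue solve. The matrices `A0`, `A1`, `A2`, the function names, and the truncation threshold `rel_tol` are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the forward problem: A(p) = A0 + p[0]*A1 + p[1]*A2.
A0 = np.diag([1.0, 2.0, 3.0, 4.0])
A1 = np.diag([1.0, 0.5, 0.0, -0.5]); A1[0, 1] = A1[1, 0] = 0.3
A2 = np.diag([0.0, 1.0, -1.0, 0.5]); A2[1, 2] = A2[2, 1] = 0.2

def forward(p):
    """Eigenvalues ("frequencies") for shape parameters p."""
    return np.linalg.eigvalsh(A0 + p[0] * A1 + p[1] * A2)

def jacobian(p):
    """d(lambda_i)/d(p_k) = v_i^T A_k v_i, valid for simple eigenvalues."""
    _, V = np.linalg.eigh(A0 + p[0] * A1 + p[1] * A2)
    return np.column_stack([np.einsum("ji,jk,ki->i", V, Ak, V) for Ak in (A1, A2)])

def truncated_svd_solve(J, r, rel_tol=1e-8):
    """Least-squares step that discards singular values below rel_tol * s_max."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    keep = s > rel_tol * s[0]
    return Vt[keep].T @ ((U[:, keep].T @ r) / s[keep])

def gauss_newton_tsvd(forward, jacobian, target, p0, n_iter=20, tol=1e-12):
    """Minimize ||target - forward(p)||^2 by Gauss-Newton with truncated-SVD steps."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        r = target - forward(p)
        if np.linalg.norm(r) < tol:
            break
        p = p + truncated_svd_solve(jacobian(p), r)
    return p

# Synthetic test in the spirit of the deck: deform, compute targets, then invert.
p_true = np.array([0.1, -0.05])
p_est = gauss_newton_tsvd(forward, jacobian, forward(p_true), np.zeros(2))
print(p_est)
```

Truncating small singular values keeps the step bounded when some shape parameters are poorly determined by the data, at the cost of leaving those parameter combinations unchanged.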
Example 1 for ILC TDR Cavity
• Create a synthetic example: artificially deform a 3D 9-cell ILC cavity.
• Choose a set of parameters defining shape variations, 26 independent inversion parameters in total:
  - Cell radius dr (x9)
  - Cell length dz (x9)
  - Iris radius (x8)
• Assign random values to these variables and deform the cavity.
• Solve the Maxwell eigenvalue problem.
• Use the first 45 nonzero frequencies and the field distributions of the first 9 modes as the target values.
Results for Example 1
• The nonlinear solver converges within a handful of iterations
• Frequencies and fields match remarkably well
• Objective function decreases by 10e6
• The “target” and “inverted” cavity shapes are very close to each other
Determining TDR Shape with Measured Frequencies
• Experimental data for manufactured baseline ILC cavities from DESY
• The first 45 mode frequencies, and the first 9 monopole mode field distributions along the cavity axis
• 82 parameters: cell radius, length, tuning, warping, and iris radius
[Figure panels: cell length error; cell radius error; deformed surface; elliptical shape]
Results
[Figure: difference of frequencies (MHz) and field values. Red: inverted cavity - measured values. Black/blue: ideal shape - measured values]
• An article has been accepted by JCP
Future Work on Shape Determination
• Measurement data contain errors -> better algorithm
• Choices of shape deviation parameters
• Extending the method to use frequencies, fields and external Qs, where the forward problem is a complex nonlinear eigenvalue problem!
• Mesh smoothing (ITAPS)
[Figure: meshes near pickup gap. Red: deformed; black: original]
Linear and Nonlinear Eigensolvers
RF Cavity Eigenvalue Problem
[Figure: closed cavity with boundary labels E and M; Nedelec-type element]
Find frequency and field vector of normal modes: “Maxwell’s Eqns in Frequency Domain”
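For reference, a common way to write this problem is sketched below. The notation is assumed rather than taken from the slide: \Omega is the cavity volume, \Gamma_E and \Gamma_M are the electric and magnetic boundaries, k is the wavenumber, and \mathbf{N}_i are the Nedelec-type basis functions.

```latex
\nabla \times \left( \frac{1}{\mu} \nabla \times \mathbf{E} \right) - k^2 \epsilon \mathbf{E} = 0 \quad \text{in } \Omega, \qquad
\mathbf{n} \times \mathbf{E} = 0 \quad \text{on } \Gamma_E, \qquad
\mathbf{n} \times \left( \frac{1}{\mu} \nabla \times \mathbf{E} \right) = 0 \quad \text{on } \Gamma_M

\mathbf{E} = \sum_i x_i \mathbf{N}_i
\;\;\Longrightarrow\;\;
K x = k^2 M x, \qquad
K_{ij} = \int_\Omega \frac{1}{\mu} (\nabla \times \mathbf{N}_i) \cdot (\nabla \times \mathbf{N}_j) \, d\Omega, \qquad
M_{ij} = \int_\Omega \epsilon \, \mathbf{N}_i \cdot \mathbf{N}_j \, d\Omega
```

In this form the closed-cavity problem is a generalized linear eigenvalue problem with symmetric K and M.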
Cavity with Waveguide Coupling
• Vector wave equation with waveguide boundary conditions can be modeled by a non-linear eigenvalue problem
• One waveguide mode per port only
[Figure: open cavity with three waveguide boundary conditions (BC)]
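One way to attack such a problem is a self-consistent iteration: freeze the frequency-dependent term, solve the resulting linear eigenproblem, and repeat until the eigenvalue stops changing. The sketch below assumes the one-mode form K x + i sqrt(lam - kc2) W x = lam M x with lam = k^2. That form, the names `K`, `M`, `W`, `kc2`, and the small dense matrices are illustrative assumptions, not taken from the slide or from Omega3P.

```python
import numpy as np
from scipy.linalg import eig

def self_consistent_iteration(K, M, W, kc2, lam0, tol=1e-12, max_iter=100):
    """Fixed-point iteration for K x + 1j*sqrt(lam - kc2) W x = lam M x (assumed form)."""
    lam = complex(lam0)
    for _ in range(max_iter):
        # Freeze the nonlinear term at the current estimate: a linear eigenproblem.
        A = K + 1j * np.sqrt(lam - kc2) * W
        vals, vecs = eig(A, M)
        j = np.argmin(np.abs(vals - lam))  # follow the eigenvalue nearest the estimate
        lam_new, x = vals[j], vecs[:, j]
        if abs(lam_new - lam) <= tol * abs(lam_new):
            return lam_new, x
        lam = lam_new
    raise RuntimeError("self-consistent iteration did not converge")

# Hypothetical toy data: three "cavity modes" weakly coupled to one port.
K = np.diag([1.0, 4.0, 9.0])
M = np.eye(3)
W = 0.05 * np.ones((3, 3))
kc2 = 0.5

lam, x = self_consistent_iteration(K, M, W, kc2, lam0=4.0)
residual = np.linalg.norm((K + 1j * np.sqrt(lam - kc2) * W - lam * M) @ x)
print(lam, residual)
```

The converged eigenvalue is complex; in this toy model its imaginary part comes entirely from the waveguide term. The iteration converges quickly here because the coupling matrix is small relative to K, which is an assumption of the example rather than a general guarantee.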
Cavity with Waveguide Coupling for Multiple Waveguide Modes
• Vector wave equation with waveguide boundary conditions can be modeled by a non-linear eigenvalue problem (NEP)
[Figure: open cavity with three waveguide boundary conditions (BC)]
Physics Problems and Solver Options
• Omega3P physics problems: Lossless; Lossy Material; Periodic Structure; External Coupling
• Eigensolvers: ESIL with Restart; ISIL w/ refinement; Implicit/Explicit Restarted Arnoldi; SOAR; Self-Consistent Iteration; Nonlinear Arnoldi/JD
• Linear solvers: iWSMP, MUMPS, SuperLU_Dist; Krylov Subspace Methods; Domain-specific preconditioners
• Different solver options have different performance dynamics
Path to Simulate ILC RF Unit (3-cryomodule)
• Optimized ILC single cavity routinely
• Simulated 4-cavity STF last year
• Simulating 8-cavity ILC Cryomodule this year
• Simulate ILC 3-cryomodule RF Unit
  - ~200M DOFs, further CS/AM advance needed, petascale
Future Work for Eigensolvers
• Parallelize AMLS, understand and improve its performance and scalability
• Nonlinear Jacobi-Davidson
  - Choice of initial space
  - Strategy for updating preconditioner and choice of preconditioners
• New algorithm development for NEP/LEP
  - Avoid shift-invert for interior eigenvalues
  - LEP helps NEP (Self Consistent Iterations)
Scalable Linear Solvers
Linear Solver is Computational Kernel of Many Codes
• Indefinite Matrices
  - Linear systems arising from shift-invert eigensolver in Omega3P
  - Indefinite linear system from KKT conditions
  - S-parameter computation in S3P
• Symmetric Positive Definite (SPD) Matrices
  - From implicit time-stepping in T3P
  - From thermal and mechanical analysis TEM3P
  - From electro/magneto static analysis Gun3P
• Issues in Petascale Electromagnetic simulations:
  - Direct solver: memory usage, scalability of triangular solver
  - Iterative solver: performance, effectiveness (preconditioner)
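Why shift-invert leads to indefinite systems can be seen on a small example: to find eigenvalues near a shift sigma inside the spectrum, each iteration solves with K - sigma*M, which has eigenvalues of both signs. A minimal sketch using SciPy's `eigsh` on a 1D Laplacian stand-in with M taken as the identity; the matrix, its size, and the shift are assumptions for illustration, not Omega3P data.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 100
# Stand-in stiffness matrix: tridiagonal 1D Laplacian, eigenvalues in (0, 4).
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
sigma = 1.5  # shift placed in the interior of the spectrum

# Shift-invert mode: eigsh factorizes (K - sigma*I) and returns eigenvalues nearest sigma.
vals, vecs = eigsh(K, k=4, sigma=sigma, which="LM")

# Known closed-form spectrum of this matrix, used to inspect the shifted operator.
exact = 2.0 - 2.0 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
shifted = exact - sigma  # eigenvalues of K - sigma*I
n_negative = int(np.sum(shifted < 0))
n_positive = int(np.sum(shifted > 0))
print(np.sort(vals), n_negative, n_positive)
```

The shifted matrix has many eigenvalues on each side of zero, so methods that rely on positive definiteness (Cholesky, plain conjugate gradients) do not apply directly to these solves.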
Omega3P Scalability on Jaguar/XT with Iterative Linear Solver
• 1.5M tetrahedral elements, NDOFs = 9.6M, NNZ = 506M
[Figure: LCLS RF Gun]
Scalability Using Sparse Direct Solver MUMPS
• Sparse Direct Solver is effective for highly indefinite matrices
• Scalability affected by performance of Triangular Solver
• Need more scalable Triangular Solvers
[Figure: PSPASES Triangular Solver, N=2M. N=2,019,968, nnz=32,024,600, No. of entries in L = 1 billion]
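The quoted sizes give a feel for why the factor dominates memory. A back-of-the-envelope sketch, assuming "1 billion" is exact, counting only the stored values (no index arrays), and using 8 bytes per real and 16 bytes per complex double-precision entry; whether nnz counts the full matrix or one triangle is not stated, so the fill ratio is indicative only.

```python
n = 2_019_968           # matrix dimension from the slide
nnz_A = 32_024_600      # nonzeros in the matrix, from the slide
nnz_L = 1_000_000_000   # entries in the factor L ("1 billion"), assumed exact

fill_ratio = nnz_L / nnz_A          # growth from the matrix to its factor
entries_per_row = nnz_L / n         # average factor entries per row
gb_real = nnz_L * 8 / 1e9           # values only, real double precision
gb_complex = nnz_L * 16 / 1e9       # values only, complex double precision

print(f"fill ratio: {fill_ratio:.1f}")
print(f"average entries per row of L: {entries_per_row:.0f}")
print(f"factor values: {gb_real:.0f} GB real, {gb_complex:.0f} GB complex")
```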
More “Memory-usage” Scalable Sparse Direct Solvers
• Maximal per-rank memory usage (MU) is 4-5 times the average MU
• Once a problem cannot fit into Nprocs, it most likely will not fit into 2*Nprocs
• More “memory-usage” scalable solvers needed
[Figure: MUMPS per-rank memory usage. N=1.11M, nnz=46.1M, complex matrix]
Memory Saving Techniques
• Single precision for factor matrix, iterative refinement to recover double precision accuracy (F)
• Domain-specific preconditioners
  - Factorize real part of the matrix (R): the real part is a good approximation to the complex matrix
  - Use single precision to factorize real part of the matrix (RF)
  - Hierarchical preconditioners (FE order is the level) (HP)
    - single precision for (1,1)-block (HPF)
    - real part only for (1,1)-block (HPR)
    - single precision & real part for (1,1)-block (HPRF)
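The first technique (F) can be sketched with dense LAPACK routines through SciPy: factor once in single precision, then correct the solution using residuals computed in double precision. This is an illustration under assumptions, not the deck's implementation: the test matrix is a hypothetical well-conditioned complex matrix, and a dense LU stands in for a sparse direct solver.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def solve_mixed_precision(A, b, tol=1e-12, max_refine=10):
    """Solve A x = b with a single-precision LU factor and double-precision refinement."""
    lu32 = lu_factor(A.astype(np.complex64))  # factor stored in half the memory
    x = lu_solve(lu32, b.astype(np.complex64)).astype(np.complex128)
    for _ in range(max_refine):
        r = b - A @ x                          # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction from the cheap single-precision factor.
        x += lu_solve(lu32, r.astype(np.complex64)).astype(np.complex128)
    return x

# Hypothetical well-conditioned complex test system.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) + 50.0 * np.eye(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

x = solve_mixed_precision(A, b)
rel_residual = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(rel_residual)
```

Refinement of this kind converges only when the matrix is not too ill-conditioned relative to single precision; for harder systems the single-precision factor is better used as a preconditioner inside a Krylov method.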
Testing Results for Complex Shifted Linear Systems
Recent Progress of SuperLU (Xiaoye Li)
• Parallel symbolic factorization significantly reduces memory usage
[Figure panels: matrix for DDS; matrix for ILC cavity]
Future Work on Linear Solvers
• Direct versus iterative solvers, hybrid solvers
• Investigate applicability of out-of-core sparse direct solvers from TOPS
• Apply multigrid solvers from TOPS for SPD matrices
• Extend PSPASES to indefinite/complex matrices
• Develop more effective domain-specific preconditioners