Trends in Sparse Linear Research and Software Developments in France

P. Amestoy (1), M. Daydé (1), I. Duff (2), L. Giraud (1), A. Haidar (3), S. Lanteri (4), J.-Y. L'Excellent (5), P. Ramet (6)

1 IRIT – INPT / ENSEEIHT
2 CERFACS & RAL
3 CERFACS
4 INRIA, nachos project-team
5 INRIA / LIP-ENSL
6 INRIA Futurs / LaBRI
Overview of sparse linear algebra techniques

From Iain Duff
CERFACS & RAL
Introduction

Solution of Ax = b, where A is large (the order may be 10^6 or greater) and sparse.
Direct methods

Key idea: factorize the matrix into a product of matrices that are easy to invert (triangular), with possibly some permutations to preserve sparsity and maintain numerical stability.

E.g. Gaussian elimination: PAQ = LU, where
• P and Q are row / column permutations,
• L is lower triangular (sparse),
• U is upper triangular (sparse).

Then forward / backward substitution: Ly = Pb, then U(Q^T x) = y. (A small sketch follows this slide.)
Good points
• Can solve very large problems, even in 3D
• Generally fast ... even on fancy machines
• Robust and well packaged

Bad points
• Need to have A, either element-wise or assembled: can have very high storage requirements
• Can be costly
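To make the factorize-then-substitute pattern above concrete, here is a minimal SciPy sketch (an illustration only, not one of the solvers discussed in these slides, and the tridiagonal test matrix is an assumption): `splu` computes a sparse LU factorization with permutations chosen for sparsity and stability, and the returned object performs the forward/backward substitutions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy sparse system: 1D Poisson-like tridiagonal matrix (assumed example).
n = 10000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Factorization phase: permuted LU decomposition of the sparse matrix.
lu = spla.splu(A)

# Solution phase: forward substitution then backward substitution.
x = lu.solve(b)
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```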
Iterative methods

Generate a sequence of vectors converging towards the solution of the linear system.

E.g. iterative methods based on Krylov subspaces: K_k(A; r_0) = span{r_0, A r_0, ..., A^(k-1) r_0}, where r_0 = b − A x_0.

The idea is then to choose a suitable x_k in x_0 + K_k(A; r_0).

For example, so that it minimizes || b − A x_k ||_2 (GMRES). (A small sketch follows this slide.)

There are many Krylov methods, depending on the criteria used for choosing x_k.
Good points
• May not require forming A explicitly
• Usually very low storage requirements
• Can be efficient on 3D problems

Bad points
• May require a lot of iterations to converge
• May require preconditioning
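As a small illustration of the Krylov idea (a sketch using SciPy's GMRES, not tied to any solver named in these slides; the diagonally dominant tridiagonal test matrix is an assumption), note that the matrix is only needed through matrix-vector products, which is why A need not be formed explicitly.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Assumed toy system: diagonally dominant tridiagonal matrix.
n = 10000
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# GMRES only needs the action y -> A @ y, so A can stay "matrix-free".
A_op = spla.LinearOperator((n, n), matvec=lambda y: A @ y)

x, info = spla.gmres(A_op, b, restart=30, maxiter=1000)
print("info =", info,  # 0 means the requested tolerance was reached
      "relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```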
Hybrid methods

Not just preconditioning
• ILU, sparse approximate inverse, or incomplete Cholesky: core techniques of direct methods are used

We focus on
• using a direct method/code combined with an iterative method.
Hybrid methods (cont'd)

Generic examples of hybrid methods are (see the sketch below for the last one):
• Domain decomposition ... using a direct method on local subdomains and/or a direct preconditioner on the interface
• Block iterative methods ... direct solver on subblocks
• Factorization of a nearby problem used as a preconditioner
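A minimal sketch of the last idea above: factor a nearby (here, incomplete) problem and use it as a preconditioner for a Krylov method. The example uses SciPy's `spilu` and `gmres` on an assumed toy 2D Poisson matrix purely for illustration; it is not one of the hybrid solvers presented later in these slides.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Assumed toy problem: 2D Poisson (5-point stencil) on a 64 x 64 grid.
m = 64
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = (sp.kron(sp.identity(m), T) + sp.kron(T, sp.identity(m))).tocsc()
b = np.ones(A.shape[0])

# "Nearby problem": an incomplete LU factorization kept sparse by dropping.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)  # preconditioner ~ A^{-1}

x, info = spla.gmres(A, b, M=M, restart=30, maxiter=500)
print("info =", info,
      "relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```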
Sparse direct methods

Sparse Direct Solvers

Usually three steps
• Pre-processing: symbolic factorization
• Numerical factorization
• Forward-backward substitution
Example workflow: simulate a phenomenon → solve a sparse linear system (matrix ~100 MB) → factorize the matrix (factors ~10 GB). MUMPS reached 71 GFlops on 512 processors of a Cray T3E.
MUMPS and sparse direct methods
MUMPS team
http://graal.ens-lyon.fr/MUMPS and http://mumps.enseeiht.fr
History
Main contributors since 1996: Patrick Amestoy, Iain Duff, Abdou Guermouche, Jacko Koster, Jean-Yves L'Excellent, Stéphane Pralet

Current development team:
• Patrick Amestoy, ENSEEIHT-IRIT
• Abdou Guermouche, LaBRI-INRIA
• Jean-Yves L'Excellent, INRIA
• Stéphane Pralet, now working for SAMTECH

PhD students
• Emmanuel Agullo, ENS-Lyon
• Tzvetomila Slavova, CERFACS
Users
Around 1000 users, 2 requests per day
Academic or industrial
Types of applications:
• Structural mechanics, CAD
• Fluid dynamics, magnetohydrodynamics, physical chemistry
• Wave propagation and seismic imaging, ocean modelling
• Acoustics and electromagnetic propagation
• Biology
• Finite element analysis, numerical optimization, simulation
• ...
MUMPS: A MUltifrontal Massively Parallel Solver

MUMPS solves large systems of linear equations of the form Ax = b by factorizing A into A = LU or A = LDL^T
Symmetric or unsymmetric matrices (partial pivoting)
Parallel factorization and solution phases (uniprocessor version also available)
Iterative refinement and backward error analysis
Various matrix input formats
• assembled format
• distributed assembled format
• sum of elemental matrices
Partial factorization and Schur complement matrix
Version for complex arithmetic
Several orderings interfaced: AMD, AMF, PORD, METIS, SCOTCH
The multifrontal method (Duff, Reid '83)

Memory is divided into two parts (which can overlap in time):
• the factors
• the active memory

The elimination tree represents task dependencies. (A small sketch of its computation follows.)
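For readers unfamiliar with the elimination tree, the sketch below computes it with the classical algorithm of Liu (a generic Python illustration, not MUMPS code, and the arrow-matrix example is an assumption chosen for its simple tree): for a matrix with symmetric pattern, the parent of node j is the smallest i > j such that L(i, j) is nonzero in the factor, and the tree encodes which eliminations can proceed independently.

```python
import numpy as np
import scipy.sparse as sp

def elimination_tree(A):
    """Liu's algorithm: parent[j] = min{ i > j : L(i, j) != 0 } for A = L L^T.

    A is assumed to have a symmetric nonzero pattern; parent[j] = -1 for roots.
    """
    A = sp.csc_matrix(A)
    n = A.shape[0]
    parent = np.full(n, -1)
    ancestor = np.full(n, -1)  # path-compressed shortcut pointers
    for j in range(n):
        # Entries A(i, j) with i < j, i.e. the upper triangle of column j.
        for i in A.indices[A.indptr[j]:A.indptr[j + 1]]:
            if i >= j:
                continue
            r = int(i)
            while ancestor[r] != -1 and ancestor[r] != j:
                nxt = ancestor[r]
                ancestor[r] = j   # compress the path towards the current column
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = j
                parent[r] = j
    return parent

# Assumed example: an "arrow" matrix, whose tree hangs every node under the last one.
n = 6
A = sp.lil_matrix((n, n))
A.setdiag(2.0)
for k in range(n - 1):
    A[k, n - 1] = 1.0
    A[n - 1, k] = 1.0
print(elimination_tree(A))  # expected: [ 5  5  5  5  5 -1]
```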
Implementation

Distributed multifrontal solver (MPI / F90 based)

Dynamic distributed scheduling to accommodate both numerical fill-in and multi-user environments

Use of BLAS, ScaLAPACK

A fully asynchronous distributed solver (VAMPIR trace, 8 processors)
MUMPS: 3 main steps (plus initialization and termination):
• JOB = -1: initialize solver type (LU, LDL^T) and default parameters
• JOB = 1: analyse the matrix, build an ordering, prepare the factorization
• JOB = 2: (parallel) numerical factorization A = LU
• JOB = 3: (parallel) solution step, forward and backward substitutions (Ly = b, Ux = y)
• JOB = -2: termination, deallocate all MUMPS data structures
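The snippet below is only a toy stand-in for this JOB sequence, written with SciPy: the class name and its methods are hypothetical and do not belong to the MUMPS API (the real driver is called through its Fortran/C/Matlab/Scilab interfaces with the JOB control parameter), but it shows how the analysis, factorization and solve phases are kept separate so that one analysis can serve several factorizations and one factorization several solves.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

class ToyDirectDriver:
    """Hypothetical stand-in mimicking the MUMPS job sequence with SciPy."""

    def __init__(self):                  # like JOB = -1: initialise, set defaults
        self.ordering = "COLAMD"         # stand-in for the ordering option
        self.lu = None

    def analyse(self, A):                # like JOB = 1: analysis / ordering phase
        self.A = sp.csc_matrix(A)        # (SciPy exposes no separate symbolic step)

    def factorize(self):                 # like JOB = 2: numerical factorization
        self.lu = spla.splu(self.A, permc_spec=self.ordering)

    def solve(self, b):                  # like JOB = 3: forward/backward solves
        return self.lu.solve(b)

    def finalize(self):                  # like JOB = -2: release factor storage
        self.lu = None

n = 5000
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

drv = ToyDirectDriver()
drv.analyse(A)
drv.factorize()                          # one analysis can serve several factorizations
x = drv.solve(b)                         # one factorization can serve several solves
print(np.linalg.norm(A @ x - b))
drv.finalize()
```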
Car body: 148,770 unknowns and 5,396,386 nonzeros (MSC.Software)
AVAILABILITY

MUMPS is available free of charge
It is used on a number of platforms (CRAY, SGI, IBM, Linux, …) and is downloaded once a day on average (applications in chemistry, aeronautics, geophysics, ...)
If you are interested in obtaining MUMPS for your own use,
please refer to the MUMPS home page
Some MUMPS users: Boeing, BRGM, CEA, Dassault, EADS, EDF, MIT, NASA, SAMTECH, ...
COMPETITIVE PERFORMANCE
Comparison with SuperLU, extracted from ACM TOMS 2001 and obtained with S. Li.

                    Number of processors
Matrix  Solver      1      4      8     16     32     64    128
Bbmat   MUMPS       -   32.1   10.8   12.3   10.4    9.1    7.8
        SuperLU     -  132.9   72.5   39.8   23.5   15.6   11.1
Ecl32   MUMPS       -   23.9   13.4    9.7    6.6    5.6    5.4
        SuperLU     -   48.5   26.6   15.7    9.6    7.6    5.6
Recent performance results (Ops = number of operations)
Factorization time in seconds of large matrices on the CRAY T3E (1 proc: not enough memory)

Matrix         Ops (x10^9)   Time on 1 proc   Time on 64 procs   Time on 128 procs
AUDIKW_1              5682           3262.8               54.6                35.9
BRGM                 31010                -              283.9                   -
CONESHL_mod           1640           1099.0               19.6                12.1
CONV3D64             23880                -              207.5               146.5
ULTRASOUND80          3915           1542.2               37.1                29.5
Functionalities, Features

Recent features
• Symmetric indefinite matrices: preprocessing and 2-by-2 pivots
• Hybrid scheduling
• 2D cyclic distributed Schur complement
• Sparse, multiple right-hand sides
• Singular matrices with detection of null pivots
• Interfaces to MUMPS: Fortran, C, Matlab (S. Pralet, while at ENSEEIHT-IRIT) and Scilab (A. Fèvre, INRIA)

Future functionalities
• Out-of-core execution
• Parallel analysis phase
• Rank revealing algorithms
• Hybrid direct-iterative solvers (with Luc Giraud)
On-going research on out-of-core solvers
• Use disk storage to solve very large problems
• Parallel out-of-core factorization
• Preprocessing to minimize the volume of I/O
• Scheduling for the out-of-core solution

Figure: ratio of active to total memory peak on different numbers of processors for several large problems (Ph.D. E. Agullo, ENS Lyon, and Ph.D. M. Slavova, CERFACS)
Hybrid Scheduling

Both memory and workload information are used to obtain better behaviour in terms of estimated memory, memory used, and factorization time in the context of parallel factorization algorithms.

Estimated memory is much closer to the memory effectively used.
Estimated and effective memory (millions of reals) for the factorization on 64 processors
(Max: maximum amount of memory; Avg: average memory per processor)

                         MUMPS standard        MUMPS hybrid
Matrix               Estim      Real       Estim      Real
AUDIKW_1      Max    118.7      50.7        73.9      41.9
              Avg     76.2      31.4        49.5      32.1
BRGM          Max    406.6         -       257.6     175.1
              Avg    185.0         -       158.9     123.5
CONESHL_mod   Max     59.6      33.1        33.8      22.5
              Avg     25.2      16.8        21.6      16.2
CONV3D64      Max     93.7      88.4        86.9      81.0
              Avg     68.7      60.5        60.9      60.2
ULTRASOUND80  Max     43.0      38.9        29.3      27.2
              Avg     26.2      22.4        23.5      21.8
Memory minimizing schedules

Multifrontal methods can use a large amount of temporary data. By decoupling task allocation and task processing, we can reduce the amount of temporary data: a new optimal schedule was proposed in this context (Guermouche, L'Excellent, ACM TOMS).

Memory gains:
Figure: active memory ratio (new algorithm vs Liu's ordering)
Remark: gains relative to Liu's algorithm are equal to 27.1, 17.5 and 19.6 for matrices 8, 9 and 10 (Gupta matrices), respectively.
Preprocessing for symmetric matrices (S. Pralet, ENSEEIHT-IRIT)

Preprocessing: new scaling available, symmetric weighted matching, and automatic tuning of the preprocessing strategies

Pivoting strategy (2-by-2 pivots and static pivoting)

Improvements:
• factorization time
• robustness, in particular on KKT systems arising from optimization
• memory estimation
Factorization time on a Linux PC (Pentium 4, 2.80 GHz)

Matrix      n        nnz      Old    New
CONT-300    180095   539396     -    4.2
BOYD2       466316   890091     -    2.6
STOKES128    49666   295938   1.5    0.8
Scotch, PaStiX

PaStiX Team
INRIA Futurs / LaBRI
PaStiX solver
Functionalities
• LL^T, LDL^T, LU factorization (symmetric pattern) with a supernodal implementation
• Static pivoting (max. weight matching) + iterative refinement / CG / GMRES
• 1D/2D block distribution + full BLAS3
• Support for an external ordering library (Scotch ordering provided)
• MPI/Threads implementation (SMP node / cluster / multi-core / NUMA)
• Single/double precision + real/complex operations
• Requires only C + MPI + POSIX threads
• Multiple RHS (direct factorization)
• Incomplete factorization ILU(k) preconditioner
PaStiX solver (cont'd)

Available on INRIA Gforge
• All-in-one source code
• Easy to install on Linux or AIX systems
• Simple API (WSMP-like)
• Thread safe (can be called from multiple threads in multiple MPI communicators)

Current work
• Use of parallel ordering (PT-Scotch) and parallel symbolic factorization
• Dynamic scheduling inside SMP nodes (static mapping)
• Out-of-core implementation
• Generic finite element assembly (domain decomposition associated with the matrix distribution)
Direct solver chain (in PaStiX)

graph → Scotch (ordering & amalgamation) → partition → Fax (block symbolic factorization) → symbolMatrix → Blend (refinement & mapping) → distributed solverMatrix → Sopalin (factorizing & solving) → distributed factorized solverMatrix → distributed solution

Analysis (sequential steps), then parallel factorization and solve.
Direct solver chain (in PaStiX)
Sparse matrix ordering (minimizes fill-in)
• Scotch: a hybrid algorithm
  • incomplete nested dissection
  • the resulting subgraphs being ordered with an approximate minimum degree method under constraints (HAMD)
Direct solver chain (in PaStiX)
The symbolic block factorization: Q(G, P) → Q(G, P)* = Q(G*, P) ⇒ linear in the number of blocks!

Dense block structures → only a few extra pointers to store the matrix
Direct solver chain (in PaStiX)
CPU time prediction
Exact memory resources

Modern architecture management (SMP nodes): hybrid Threads/MPI implementation (all processors in the same SMP node work directly in shared memory)

Less MPI communication and lower parallel memory overhead
Incomplete factorization in PaStiX

Start from the acknowledgement that it is difficult to build a generic and robust preconditioner for
• large-scale 3D problems
• high performance computing

Derive preconditioners from direct solver techniques

What's new: (dense) block formulation

Incomplete block symbolic factorization:
• remove blocks with algebraic criteria
• use an amalgamation algorithm to get dense blocks

Provides incomplete LDL^T, Cholesky, and LU factorizations (with static pivoting for symmetric patterns)
Numerical experiments (TERA1)

Successful approach for a large collection of industrial test cases (PARASOL, Harwell-Boeing, CEA) on an IBM SP3

TERA1 supercomputer of CEA Ile-de-France (ES45 SMP nodes, 4 procs)

COUPOLE40000: 26.5 x 10^6 unknowns, 1.5 x 10^10 NNZL, and 10.8 Tflops
• 356 procs: 34 s
• 512 procs: 27 s
• 768 procs: 20 s (>500 Gflop/s, about 35% of peak performance)
Numerical experiments (TERA10)

Successful approach on a 3D mesh problem with about 30 million unknowns on the TERA10 supercomputer

But memory is the bottleneck!

ODYSSEE code of the French CEA/CESTA
Electromagnetism code (finite element method + integral equation)
Complex double precision, Schur complement
Links

Scotch: http://gforge.inria.fr/projects/scotch
PaStiX: http://gforge.inria.fr/projects/pastix
MUMPS: http://mumps.enseeiht.fr/ and http://graal.ens-lyon.fr/MUMPS
ScAlApplix: http://www.labri.fr/project/scalapplix
ANR CIGC Numasis
ANR CIS Solstice & Aster

Latest publication, to appear in Parallel Computing: "On finding approximate supernodes for an efficient ILU(k) factorization"
For more publications, see: http://www.labri.fr/~ramet/
Industrial applications

OSSAU code of the French CEA/CESTA
• 2D / 3D structural mechanics code

ODYSSEE code of the French CEA/CESTA
• Electromagnetism code (finite element method + integral equation)
• Complex double precision, Schur complement

Fluid mechanics
• LU factorization with static pivoting (SuperLU-like approach)
Other parallel sparse direct codes: shared-memory codes

Code      Technique           Scope     Availability (www.)
MA41      Multifrontal        UNS       cse.clrc.ac.uk/Activity/HSL
MA49      Multifrontal QR     RECT      cse.clrc.ac.uk/Activity/HSL
PanelLLT  Left-looking        SPD       Ng
PARDISO   Left-right looking  UNS       Schenk
PSL       Left-looking        SPD/UNS   SGI product
SPOOLES   Fan-in              SYM/UNS   netlib.org/linalg/spooles
SuperLU   Left-looking        UNS       nersc.gov/xiaoye/SuperLU
WSMP      Multifrontal        SYM/UNS   IBM product
Other parallel sparse direct codes: distributed-memory codes

Code      Technique     Scope     Availability (www.)
CAPSS     Multifrontal  SPD       netlib.org/scalapack
MUMPS     Multifrontal  SYM/UNS   mumps.enseeiht.fr, graal.ens-lyon.fr/MUMPS
PaStiX    Fan-in        SYM/UNS   gforge.inria.fr/pastix
PSPASES   Multifrontal  SPD       cs.umn.edu/mjoshi/pspases
SPOOLES   Fan-in        SYM/UNS   netlib.org/linalg/spooles
SuperLU   Fan-out       UNS       nersc.gov/xiaoye/SuperLU
S+        Fan-out       UNS       cs.ucsb.edu/research/S+
WSMP      Multifrontal  SYM       IBM product
Sparse solver for Ax = b: only a black box?

Preprocessing and postprocessing:
• Symmetric permutations to reduce fill: Ax = b => (PAP^T)(Px) = Pb
• Numerical pivoting, scaling to preserve numerical accuracy
• Maximum transversal (set large entries on the diagonal)
• Preprocessing for parallelism (influence of task mapping on parallelism)
• Iterative refinement, error analysis

A default (often automatic/adaptive) setting of the options is available. However, a better knowledge of the options can help the user further improve memory usage, time to solution, and numerical accuracy.
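To see why such options matter, the short experiment below (a SciPy illustration of the general point, not of any particular solver's option set; the 2D Poisson test matrix is an assumption) compares the fill in the LU factors under the natural ordering and under a fill-reducing ordering.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Assumed toy matrix: 2D Poisson (5-point stencil) on a 40 x 40 grid.
m = 40
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = (sp.kron(sp.identity(m), T) + sp.kron(T, sp.identity(m))).tocsc()
print("nnz(A) =", A.nnz)

# Same factorization routine, two different ordering options.
for order in ("NATURAL", "COLAMD"):
    lu = spla.splu(A, permc_spec=order)
    print(f"{order:8s} nnz(L) + nnz(U) = {lu.L.nnz + lu.U.nnz}")
```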
The GRID-TLSE Project

A web expert site for sparse linear algebra

Overview of GRID-TLSE: http://gridtlse.org

Supported by
• ANR LEGO Project
• ANR SOLSTICE Project
• CNRS / JST Program: REDIMPS Project
• ACI GRID-TLSE Project
Sparse Matrices Expert Site?

Expert site: help users in choosing the right solver and its parameters for a given problem

Chosen approach: expert scenarios which answer common user requests

Main goal: provide a friendly test environment for expert and non-expert users of sparse linear algebra software

Easy access to:
• Software and tools
• A wide range of computer architectures
• Matrix collections
• Expert scenarios

Also: provide a testbed for sparse linear algebra software
Why use the grid?

Sparse linear algebra software makes use of sophisticated algorithms for (pre-/post-) processing the matrix.

Multiple parameters interfere for efficient execution of a sparse direct solver:
• Ordering
• Amount of memory
• Architecture of the computer
• Available libraries

Determining the best combination of parameter values is a multi-parametric problem.

Well suited for execution over a grid.
Components

How do software X and Y compare in terms of memory and CPU on my favourite matrix A?

Figure: bar chart comparing the memory and CPU of MUMPS and SuperLU on matrix GRE.
Software components

Weaver: high-level administrator for the deployment and the exploitation of services on the grid

WebSolve: a web interface to start services on the grid

Middleware: DIET, developed within GRID-ASP (LIP, Loria Resedas, LIFC-SDRP), and soon ITBL (?)
Hybrid Solvers

Parallel hybrid iterative/direct solver for the solution of large sparse linear systems arising from 3D elliptic discretization

L. Giraud (1), A. Haidar (2), S. Watson (3)
1 ENSEEIHT, Parallel Algorithms and Optimization Group, 2 rue Camichel, 31071 Toulouse, France
2 CERFACS, Parallel Algorithm Project, 42 Avenue Coriolis, 31057 Toulouse, France
3 Departments of Computer Science and Mathematics, Virginia Polytechnic Institute, USA
Non-overlapping domain decomposition

Natural approach for PDEs, extended to general sparse matrices

Partition the problem into subdomains / subgraphs (see the sketch below):
• Use a direct solver on the subdomains (MUMPS package)
• Robust, algebraically preconditioned iterative solver on the interface (Algebraic Additive Schwarz preconditioner, possibly with sparsified and mixed-arithmetic variants)
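The sketch below illustrates this framework on an assumed toy problem (two subdomains of a 2D Poisson grid separated by one interface line): the interior unknowns are eliminated with a sparse direct factorization and GMRES is applied to the implicitly defined interface Schur complement. It is only a schematic Python illustration, without the Algebraic Additive Schwarz preconditioner or the parallelism of the actual solver.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Assumed toy problem: 2D Poisson on an m x m grid, cut by one interface column.
m = 31
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = (sp.kron(sp.identity(m), T) + sp.kron(T, sp.identity(m))).tocsr()
b = np.ones(A.shape[0])

idx = np.arange(m * m).reshape(m, m)
mid = m // 2
I1 = idx[:, :mid].ravel()        # interior unknowns of subdomain 1
I2 = idx[:, mid + 1:].ravel()    # interior unknowns of subdomain 2
G = idx[:, mid]                  # interface unknowns
II = np.concatenate([I1, I2])

# Direct (LU) factorization of the decoupled subdomain interiors.
lu_II = spla.splu(sp.csc_matrix(A[II][:, II]))
A_IG, A_GI, A_GG = A[II][:, G], A[G][:, II], A[G][:, G]

def schur_matvec(v):
    # S v = A_GG v - A_GI A_II^{-1} A_IG v; the Schur complement is never formed.
    return A_GG @ v - A_GI @ lu_II.solve(A_IG @ v)

S = spla.LinearOperator((len(G), len(G)), matvec=schur_matvec)

# Condense the right-hand side, iterate on the interface, then back-substitute.
b_I, b_G = b[II], b[G]
x_G, info = spla.gmres(S, b_G - A_GI @ lu_II.solve(b_I), restart=30, maxiter=200)
x_I = lu_II.solve(b_I - A_IG @ x_G)

x = np.empty_like(b)
x[II], x[G] = x_I, x_G
print("info =", info,
      "relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```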
Numerical behaviour of the preconditioners

Convergence history on a 43 million dof problem on 1000 System X processors (Virginia Tech)
Parallel scaled scalability study

Parallel elapsed time with a fixed sub-problem size (43,000 dof) when the number of processors varies from 27 (1.1 x 10^6 dof) up to 1000 (43 x 10^6 dof)
Hybrid iterative/direct strategies for solving large sparse linear systems resulting from the finite element discretization of the time-harmonic Maxwell equations
L. Giraud (1), A. Haidar (2), S. Lanteri (3)
1 ENSEEIHT, Parallel Algorithms and Optimization Group, 2 rue Camichel, 31071 Toulouse, France
2 CERFACS, Parallel Algorithm Project, 42 Avenue Coriolis, 31057 Toulouse, France
3 INRIA, nachos project-team, 2004 Route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France
Context and objectives

Solution of time-harmonic electromagnetic wave propagation problems

Discretization in space
• Discontinuous Galerkin time-harmonic methods
• Unstructured tetrahedral meshes
• Based on Pp nodal (Lagrange) interpolation
• Centered or upwind fluxes for the calculation of jump terms at cell boundaries

Discretization in space results in a large sparse linear system with complex coefficients
• Direct (sparse LU) solvers for 2D problems
• Parallel solvers are mandatory for 3D problems

Related publications
• H. Fol (PhD thesis, 2006)
• V. Dolean, H. Fol, S. Lanteri and R. Perrussel (J. Comp. Appl. Math., to appear, 2007)
Solution algorithm

Parallel hybrid iterative/direct solver
• Domain decomposition framework
• Schwarz algorithm with characteristic interface conditions
• Sparse LU subdomain solver (MUMPS: P.R. Amestoy, I.S. Duff and J.-Y. L'Excellent, Comput. Meth. Appl. Mech. Engng., Vol. 184, 2000)
• Interface (Schur complement type) formulation
• Iterative interface solver (GMRES or BiCGstab)
• Algebraic block preconditioning of the interface system; exploits the structure of the system (free to construct and store)
Scattering of a plane wave by a PEC cube
Plane wave frequency: 900 MHz
Tetrahedral mesh:
• # vertices = 67,590
• # elements = 373,632
Total number of DOF: 2,241,792
Performance results on various numbers of processors (IBM JS21)

Number of iterations:
Precond     8     16     32
None       50     54     63
M1         24     25     26
Scattering of a plane wave by a PEC cube
Scattering of a plane wave by a head
Plane wave frequency: 1800 MHz
Tetrahedral mesh:
• # vertices = 188,101
• # elements = 1,118,952
Total number of DOF: 6,713,712
Performance results on various numbers of processors (Blue Gene/L)

Number of iterations:
Precond    48     64    128    256
None      150    161    198    240
M1         40     42     51     62
Scattering of a plane wave by a head