A Parallel Implementation of the BDDC Method for Linear Elasticity
Jakub Šístek
joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík
Institute of Mathematics of the AS CR, Prague
Czech Technical University, Prague
University of Colorado Denver
MAFELAP 2009, June 9 - 12, 2009
Some motivation for domain decomposition (DD) methods
I very large systems of algebraic equations arising from FEM – difficulties with solution by conventional numerical methods
  I direct methods slow due to the problem size
  I iterative methods slow due to large condition number – suitable preconditioner needed
  I combination of these approaches – synergy in domain decomposition (DD) methods
I natural way to parallelize FEM
I two recent methods for elliptic PDEs – BDDC and FETI-DP
Brief overview of BDDC method
I Balancing Domain Decomposition by Constraints
I 2003 C. Dohrmann (Sandia), theory with J. Mandel (UCD)
I nonoverlapping primal domain decomposition method
I equivalent to FETI-DP [Mandel, Dohrmann, Tezaur 2005]
The abstract problem
Variational setting
u ∈ U : a(u, v) = 〈f , v〉 ∀v ∈ U
I a (·, ·) symmetric positive definite form on U
I 〈·, ·〉 is inner product on U
I U is finite dimensional space
Matrix form
u ∈ U : Au = f
I A symmetric positive definite matrix on U
Linked together
〈Au, v〉 = a (u, v) ∀u, v ∈ U
BDDC set-up
I division into subdomains
I selection of coarse problem nodes (also called corners)
[Figure: domain divided into subdomains; labels: interface, subdomain Ω_i, coarse problem nodes, finite elements, h, H]
Function spaces in BDDC
\[ U \subset \widetilde{W}^c \subset W \]
U: continuous; $\widetilde{W}^c$: continuous at coarse problem nodes; W: no continuity
I enough coarse nodes to fix floating subdomains – rigid body modes captured
I a(·, ·) symmetric positive definite form on $\widetilde{W}^c$
I corresponding matrix $A^c$ symmetric positive definite, almost block diagonal structure, larger dimension than A
I operator of projection $E : \widetilde{W}^c \to U$, Range(E) = U, e.g. averaging across interfaces (arithmetic, weighted)
The second intermediate space in BDDC
Coarse nodes alone do not suffice for optimal preconditioning in 3D ⇒ additional constraints on functions from $\widetilde{W}^c$ necessary
\[ U \subset \widetilde{W} \subset \widetilde{W}^c \]
U: continuous; $\widetilde{W}$: additional constraints; $\widetilde{W}^c$: continuity only at corners
Examples: equal averages on subsets of the interface (edges, faces) across the interface, additional pointwise continuity constraints
The BDDC preconditioner
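Define $M_{BDDC} : r \in U \longmapsto \Delta u \in U$
\[ M_{BDDC} : r \longmapsto \Delta u = E w, \qquad w \in \widetilde{W} : \; a(w, z) = \langle r, E z \rangle \quad \forall z \in \widetilde{W} \]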
I r - residual in an iteration of PCG
I ∆u - correction to solution (preconditioned residual)
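In matrix terms the definition above amounts to applying E, the inverse of the matrix of a(·, ·) on the intermediate space, and the transpose of E. The following Python/NumPy lines are a minimal sketch of that action on a toy problem invented for illustration (two three-node subdomains sharing two unknowns, one of which is treated as a coarse node); it is not the elasticity problem of the talk, and the names `assemble`, `A_tilde`, `R`, `E` and `apply_bddc` are hypothetical.

```python
import numpy as np

def assemble(local_mats, index_sets, n):
    """Sum local matrices into an n-by-n matrix at the given index sets."""
    A = np.zeros((n, n))
    for A_loc, idx in zip(local_mats, index_sets):
        A[np.ix_(idx, idx)] += A_loc
    return A

# Toy local matrix (assumption): triangle graph Laplacian plus a small shift,
# so that every matrix below is symmetric positive definite.
A_loc = np.array([[2.0, -1.0, -1.0],
                  [-1.0, 2.0, -1.0],
                  [-1.0, -1.0, 2.0]]) + 0.1 * np.eye(3)

# Space U: 4 unknowns; subdomain 1 holds (0, 1, 2), subdomain 2 holds (1, 2, 3).
A = assemble([A_loc, A_loc], [[0, 1, 2], [1, 2, 3]], 4)

# Larger space: unknown 1 is a coarse node (kept continuous), unknown 2 is
# duplicated (index 2 for subdomain 1, index 3 for subdomain 2).
A_tilde = assemble([A_loc, A_loc], [[0, 1, 2], [1, 3, 4]], 5)

# R injects U into the larger space; E averages the two copies of unknown 2.
R = np.zeros((5, 4))
R[[0, 1, 2, 3, 4], [0, 1, 2, 2, 3]] = 1.0
E = np.zeros((4, 5))
E[0, 0] = E[1, 1] = E[3, 4] = 1.0
E[2, 2] = E[2, 3] = 0.5

def apply_bddc(r):
    """Return Delta u = E w, where w solves A_tilde w = E^T r."""
    w = np.linalg.solve(A_tilde, E.T @ r)
    return E @ w

# Explicit preconditioner matrix, built column by column for inspection only.
M = np.column_stack([apply_bddc(e) for e in np.eye(4)])
eigs = np.sort(np.linalg.eigvals(M @ A).real)
print("eigenvalues of M A:", eigs)
```

Because E R = I in this toy, the eigenvalues of M A should all be at least 1, so the printed spectrum gives a quick sanity check of the construction.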
The coarse space in BDDC
In implementation, the space $\widetilde{W}$ may be split into independent subdomain spaces and an energy-orthogonal coarse space. On each subdomain – coarse degrees of freedom – basis functions $\Psi_i$ – prescribed values of coarse degrees of freedom, minimal energy elsewhere,
\[ \begin{bmatrix} A_i & C_i^T \\ C_i & 0 \end{bmatrix} \begin{bmatrix} \Psi_i \\ \Lambda_i \end{bmatrix} = \begin{bmatrix} 0 \\ I \end{bmatrix}. \]
I $A_i$ . . . local subdomain stiffness matrix
I $C_i$ . . . matrix of constraints – selects unknowns into coarse degrees of freedom
Matrix of coarse problem $A_C$ assembled from local matrices
\[ A_{Ci} = \Psi_i^T A_i \Psi_i. \]
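The saddle-point system for the coarse basis functions can be tried out on a small dense example. The sketch below assumes a toy subdomain, a 1D bar of four unit-stiffness elements with its two end nodes as coarse degrees of freedom; the function name `coarse_basis` and all data are hypothetical and only illustrate the formula, not the frontal-solver implementation.

```python
import numpy as np

def coarse_basis(A_i, C_i):
    """Solve [[A_i, C_i^T], [C_i, 0]] [Psi; Lam] = [0; I] with a dense solver."""
    n, nc = A_i.shape[0], C_i.shape[0]
    K = np.block([[A_i, C_i.T],
                  [C_i, np.zeros((nc, nc))]])
    rhs = np.vstack([np.zeros((n, nc)), np.eye(nc)])
    sol = np.linalg.solve(K, rhs)
    Psi, Lam = sol[:n], sol[n:]
    A_Ci = Psi.T @ A_i @ Psi        # local coarse matrix
    return Psi, Lam, A_Ci

# Toy floating subdomain (assumption): 1D bar with 5 nodes, singular stiffness.
n = 5
A_i = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A_i[0, 0] = A_i[-1, -1] = 1.0

# Constraint matrix selecting the two end nodes as coarse degrees of freedom.
C_i = np.zeros((2, n))
C_i[0, 0] = 1.0
C_i[1, -1] = 1.0

Psi, Lam, A_Ci = coarse_basis(A_i, C_i)
print(Psi)      # columns are the two coarse basis functions
print(A_Ci)
```

Multiplying the first block row by $\Psi_i^T$ and using $C_i \Psi_i = I$ gives $A_{Ci} = -\Lambda_i$, so in this sketch the coarse matrix could also be read off the multipliers without forming the triple product.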
The coarse space in BDDC
[Figure: a function from the coarse space; a coarse basis function]
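An iteration of BDDC
1. Residual on interface
\[ r^{(k)} = g - S u^{(k)} \]
S – Schur complement w.r.t. interface, g – condensed r.h.s.
2. Distribution of residual
local problems, for i = 1, . . . , N:
\[ r_i = E_i^T r^{(k)} \]
coarse problem:
\[ r_C = \sum_{i=1}^{N} R_{Ci}^T \Psi_i^T E_i^T r^{(k)} \]
3. Correction of solution
\[ \begin{bmatrix} A_i & C_i^T \\ C_i & 0 \end{bmatrix} \begin{bmatrix} \Delta u_i \\ \mu_i \end{bmatrix} = \begin{bmatrix} r_i \\ 0 \end{bmatrix}, \qquad A_C \, \Delta u_C = r_C \]
4. New approximation
\[ \Delta u = \sum_{i=1}^{N} E_i \left( \Psi_i R_{Ci} \, \Delta u_C + \Delta u_i \right), \qquad u^{(k+1)} = u^{(k)} + \Delta u \]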
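Steps 2 to 4 produce the preconditioned residual that PCG needs once per iteration. The following Python/NumPy sketch shows a generic PCG loop with the operator and the preconditioner passed as callbacks; the names `pcg`, `apply_S` and `apply_M` are hypothetical, and in the demo a Jacobi (diagonal) preconditioner and a small tridiagonal matrix stand in for the BDDC action and the Schur complement.

```python
import numpy as np

def pcg(apply_S, g, apply_M, tol=1e-8, maxit=200):
    """Preconditioned conjugate gradients for S u = g, S symmetric positive definite.

    apply_S(v) returns S v, apply_M(r) returns the preconditioned residual.
    Returns the approximate solution and the number of iterations performed.
    """
    u = np.zeros_like(g)
    r = g - apply_S(u)              # step 1: residual
    z = apply_M(r)                  # steps 2-4 would be performed here
    p = z.copy()
    rz = r @ z
    for k in range(maxit):
        if np.linalg.norm(r) <= tol * np.linalg.norm(g):
            return u, k
        Sp = apply_S(p)
        alpha = rz / (p @ Sp)
        u = u + alpha * p
        r = r - alpha * Sp
        z = apply_M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return u, maxit

# Stand-in data (assumption): tridiagonal SPD matrix and Jacobi preconditioner.
n = 10
S = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
g = np.ones(n)
u, iters = pcg(lambda v: S @ v, g, lambda r: r / np.diag(S))
print(iters, np.linalg.norm(S @ u - g))
```

Passing callbacks keeps the loop independent of how S and the preconditioner are stored, which matters when, as in the talk, neither is formed explicitly.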
Frontal solver
I B. M. Irons, 1970
I direct solver for sparse matrices arising in FEM
I number of flops $O(n \cdot n_{fron}^2)$, where $n_{fron} \ll n$ is the front width
I memory demand – $n_{fron}^2$ if out-of-core
I element-by-element approach – element matrices read from file until whole line is assembled, then immediately eliminated
I basic scheme – block 1 - ‘free’ variables, block 2 - ‘constrained’ (also ‘fixed’) variables
\[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} + \begin{bmatrix} 0 \\ Rea_2 \end{bmatrix} \tag{1} \]
I $x_2$, $f_1$, $f_2$ – inputs
I $x_1$, $Rea_2$ (reaction forces) – outputs
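The input/output convention of scheme (1) can be illustrated with a dense stand-in for the frontal solver. The sketch below is not a frontal algorithm (no element-by-element assembly, no out-of-core storage); it only reproduces what goes in and what comes out, and the function name `solve_with_fixed` and the bar example are assumptions made for illustration.

```python
import numpy as np

def solve_with_fixed(A, f, fixed, x_fixed):
    """Solve scheme (1): given x_2 (fixed values) and f, return x and Rea_2."""
    n = A.shape[0]
    fixed = np.asarray(fixed)
    free = np.setdiff1d(np.arange(n), fixed)
    x = np.zeros(n)
    x[fixed] = x_fixed
    # First block row: A_11 x_1 = f_1 - A_12 x_2
    rhs = f[free] - A[np.ix_(free, fixed)] @ x[fixed]
    x[free] = np.linalg.solve(A[np.ix_(free, free)], rhs)
    # Second block row: Rea_2 = A_21 x_1 + A_22 x_2 - f_2
    rea = A[fixed] @ x - f[fixed]
    return x, rea

# Toy example (assumption): bar of 3 unit-stiffness elements, node 0 fixed at 0,
# unit load at the last node.
A = np.array([[1.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 1.0]])
f = np.array([0.0, 0.0, 0.0, 1.0])
x, rea = solve_with_fixed(A, f, fixed=[0], x_fixed=[0.0])
print(x, rea)
```

In this example the reaction at the fixed node balances the applied load, which is the kind of by-product the later slides reuse when building the coarse matrix.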
General constraints vs. frontal solver on subdomain
central idea – split matrix C according to types of constraints
I corners as Dirichlet boundary conditions, i.e. fixed variables
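I averages enforced by Lagrange multipliers – matrix $C_f$
coarse problem construction on subdomain (index i omitted)
\[ \begin{bmatrix} A_{ff} & A_{fc} & C_f^T \\ A_{cf} & A_{cc} & 0 \\ C_f & 0 & 0 \end{bmatrix} \begin{bmatrix} \Psi_f^{c} & \Psi_f^{avg} \\ I & 0 \\ \lambda^{c} & \lambda^{avg} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & I \end{bmatrix}. \]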
Algorithm of preconditioner setup
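1. Forward step of frontal solver with corners marked as fixed variables in matrix A.
2. Find $A_{ff}^{-1} C_f^T$ by backward solve by frontal solver
\[ \begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix} \begin{bmatrix} A_{ff}^{-1} C_f^T \\ 0 \end{bmatrix} = \begin{bmatrix} C_f^T \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \left( C_f A_{ff}^{-1} A_{fc} \right)^T \end{bmatrix}. \]
3. Construct $C_f A_{ff}^{-1} C_f^T$ and factorize it by LAPACK.
4. Backward solve of dual problem by LAPACK for λ from
\[ C_f A_{ff}^{-1} C_f^T \, \lambda = - \begin{bmatrix} C_f A_{ff}^{-1} A_{fc} & I \end{bmatrix}. \]
5. Backward solve for $\Psi_f$ by frontal solver
\[ \begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix} \begin{bmatrix} \Psi_f^{c} & \Psi_f^{avg} \\ I & 0 \end{bmatrix} = \begin{bmatrix} -C_f^T \lambda \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ Rea \end{bmatrix}. \]
6. Compute local
\[ A_C = \Psi^T A \Psi = \Psi^T \begin{bmatrix} -C_f^T \lambda \\ Rea \end{bmatrix}. \]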
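The setup steps above can be followed line by line in a dense sketch. Here `np.linalg.solve` stands in for both the frontal solver and LAPACK, so no factorization is reused as it would be in the actual implementation; the function name `coarse_setup` and the toy subdomain (a 1D bar with one corner and one average constraint) are assumptions made for illustration. Rows belonging to corners are satisfied up to the reactions, following scheme (1).

```python
import numpy as np

def coarse_setup(A, free, corners, Cf):
    """Dense sketch of steps 2-6; returns Psi (all unknowns) and local A_C."""
    Aff = A[np.ix_(free, free)]
    Afc = A[np.ix_(free, corners)]
    Acf = A[np.ix_(corners, free)]
    Acc = A[np.ix_(corners, corners)]
    nc, na = len(corners), Cf.shape[0]
    X = np.linalg.solve(Aff, Cf.T)                  # step 2: A_ff^{-1} C_f^T
    S = Cf @ X                                      # step 3: C_f A_ff^{-1} C_f^T
    rhs = -np.hstack([X.T @ Afc, np.eye(na)])       # X^T A_fc = C_f A_ff^{-1} A_fc
    lam = np.linalg.solve(S, rhs)                   # step 4: multipliers
    Psi_c = np.hstack([np.eye(nc), np.zeros((nc, na))])
    Psi_f = np.linalg.solve(Aff, -Cf.T @ lam - Afc @ Psi_c)   # step 5
    rea = Acf @ Psi_f + Acc @ Psi_c                 # reactions at corners
    Psi = np.zeros((A.shape[0], nc + na))
    Psi[free] = Psi_f
    Psi[corners] = Psi_c
    AC = Psi_f.T @ (-Cf.T @ lam) + Psi_c.T @ rea    # step 6
    return Psi, AC

# Toy subdomain (assumption): 1D bar with 6 nodes, node 0 is the only corner,
# one average constraint over the last three nodes.
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
corners, free = [0], [1, 2, 3, 4, 5]
Cf = np.array([[0.0, 0.0, 1.0, 1.0, 1.0]]) / 3.0
Psi, AC = coarse_setup(A, free, corners, Cf)
print(AC)
```

In this toy the sum of the two basis functions is the constant function, which has zero energy, so the rows of the computed A_C should sum to zero.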
General constraints vs. frontal solver
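subdomain problem solution
\[ \begin{bmatrix} A_{ff} & A_{fc} & C_f^T \\ A_{cf} & A_{cc} & 0 \\ C_f & 0 & 0 \end{bmatrix} \begin{bmatrix} u_f \\ 0 \\ \mu \end{bmatrix} = \begin{bmatrix} r \\ 0 \\ 0 \end{bmatrix}. \]
Now single right-hand side.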
Algorithm of preconditioning action on subdomain
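1. Backward step of frontal solver for $A_{ff}^{-1} r$
\[ \begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix} \begin{bmatrix} A_{ff}^{-1} r \\ 0 \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ Rea \end{bmatrix}. \]
2. Backward step of LAPACK for μ
\[ C_f A_{ff}^{-1} C_f^T \, \mu = C_f A_{ff}^{-1} r. \]
3. Backward step of frontal solver for $u_f$
\[ \begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix} \begin{bmatrix} u_f \\ 0 \end{bmatrix} = \begin{bmatrix} -C_f^T \mu + r \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ Rea \end{bmatrix}. \]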
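The three steps of the preconditioning action reduce to two solves with $A_{ff}$ and one small solve with the dual matrix. The sketch below mirrors them with dense solves in Python/NumPy and compares the result with a direct solve of the saddle-point system restricted to the free unknowns; the function name `local_correction` and the toy data are assumptions made for illustration, and the dual matrix is rebuilt here although the setup phase would already hold its factorization.

```python
import numpy as np

def local_correction(A, free, Cf, r):
    """Dense sketch of the three steps; corner values are fixed to zero."""
    Aff = A[np.ix_(free, free)]
    y = np.linalg.solve(Aff, r)                   # step 1: A_ff^{-1} r
    S = Cf @ np.linalg.solve(Aff, Cf.T)           # dual matrix (from setup)
    mu = np.linalg.solve(S, Cf @ y)               # step 2: multipliers
    uf = np.linalg.solve(Aff, r - Cf.T @ mu)      # step 3: correction
    return uf, mu

# Toy subdomain (assumption): 1D bar with 6 nodes, node 0 is a corner,
# one average constraint over the last three nodes.
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
free = [1, 2, 3, 4, 5]
Cf = np.array([[0.0, 0.0, 1.0, 1.0, 1.0]]) / 3.0
r = np.arange(1.0, 6.0)

uf, mu = local_correction(A, free, Cf, r)

# Reference: direct solve of the saddle-point system on the free unknowns.
Aff = A[np.ix_(free, free)]
K = np.block([[Aff, Cf.T], [Cf, np.zeros((1, 1))]])
ref = np.linalg.solve(K, np.concatenate([r, [0.0]]))
print(np.linalg.norm(ref[:5] - uf), Cf @ uf)
```

The comparison with the reference solve is only a check for the sketch; the point of the three-step form is that it needs nothing beyond the solvers already set up for the subdomain.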
Implementation
I subdomain problems - frontal solver + LAPACK
I coarse problem - MUltifrontal Massively Parallel sparse direct Solver (MUMPS) http://mumps.enseeiht.fr
I mainly Fortran 77 programming language, partly Fortran 90, MPI library
I tested on
  I SGI Altix 4700, CTU, Prague, CR
    72 processors Intel Itanium 2, OS Linux
Hip joint replacement
I 33 186 quadratic elements, 544 734 unknowns
I 16 subdomains, 35 corners, 12 edges, and 35 faces
I 32 subdomains, 57 corners, 12 edges, and 66 faces
I 16 processors of SGI Altix 4700
Decomposition into 32 subdomains
Hip joint replacement
von Mises stresses in improved design
Hip joint replacement
[Figure: Condition number for adding corners. Condition number estimate (axis 0 to 100) vs. number of corners (axis 0 to 4000) for nsub = 16 and nsub = 32; SGI Altix 4700, 16 processors]
Hip joint replacement
[Figure: Wall clock time for adding corners. Wall time in seconds (axis 0 to 400) vs. number of corners (axis 0 to 4000) for nsub = 16 and nsub = 32; SGI Altix 4700, 16 processors]
Hip joint replacement, 16 subdomains
coarse problem         C.    C.+E.   C.+F.   C.+E.+F.
iterations             35     34      26       26
cond. number est.      96     96      65       65
factorization (sec)    91     80      78      106
pcg iter (sec)         53     49      38       37
total (sec)           183    166     153      181
adding averages to 335 corners
coarse degrees of freedom:
I C. - Corners only
I C.+E. - Corners and averages on Edges
I C.+F. - Corners and averages on Faces
I C.+E.+F. - Corners and averages on Edges and Faces
Hip joint replacement, 32 subdomains
coarse problem         C.    C.+E.   C.+F.   C.+E.+F.
iterations             35     32      30       27
cond. number est.     149     70      59       46
factorization (sec)    60     57      59       62
pcg iter (sec)         49     40      37       34
total (sec)           128    115     113      113
adding averages to 557 corners
Conclusion
I distinguish between point constraints and averages for the frontal solver
I many matrices needed in BDDC are a simple side-product of the frontal solver (reactions)
I ‘minimal’ number of corners does not assure minimal solution time
I constraints on edges and/or faces can considerably shorten the solution time
I more sophisticated (adaptive) way for selection of constraints- ongoing research