Scalability analysis of the distributed-memory implementation of the
Aggregated unfitted Finite Element Method (AgFEM)
Alberto F. Martín∗, Santiago Badia, Francesc Verdugo, Eric Neiva
MWNDEA 2020, Melbourne, Australia, 12/02/2020
Embedded Finite elements
CutFEM, Finite Cell Method, AgFEM, X-FEM, ...
[Figure: body-fitted mesh vs. unfitted mesh]
✓ Simplified mesh generation
✗ Dirichlet BC? ✗ Numerical integration? ✗ ill-conditioning? (this talk)
Alberto F. Martín (Monash University) MWNDEA2020 2/35
Parallel distributed-memory simulation pipeline
1. Unfitted (adaptive) Cartesian grids (p4est)
2. Partition using space-filling curves (p4est)
3. Unfitted FE discretization (AgFEM)
4. AMG linear solver (PETSc)
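Step 2 of the pipeline can be illustrated with a minimal sketch. It assumes a uniform 2D Cartesian grid and a Morton (Z-order) curve; the function names `morton2d` and `partition_cells` are hypothetical and this is not the p4est API, only an illustration of partitioning by cutting a space-filling curve into contiguous segments.

```python
def morton2d(ix, iy, bits=16):
    """Interleave the bits of (ix, iy): x in even positions, y in odd ones."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


def partition_cells(nx, ny, nparts):
    """Split an nx-by-ny grid into nparts contiguous segments of the curve."""
    cells = sorted(((ix, iy) for iy in range(ny) for ix in range(nx)),
                   key=lambda c: morton2d(c[0], c[1]))
    ncells = len(cells)
    parts = []
    for r in range(nparts):
        # Balanced contiguous ranges: sizes differ by at most one cell.
        lo = (r * ncells) // nparts
        hi = ((r + 1) * ncells) // nparts
        parts.append(cells[lo:hi])
    return parts


# Hypothetical example: 4 x 4 cells distributed over 4 subdomains.
parts = partition_cells(4, 4, 4)
for rank, owned in enumerate(parts):
    print(rank, owned)
```

Because cells that are close along the curve tend to be close in space, each segment forms a reasonably compact subdomain without calling a graph partitioner.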
Unfitted methods at large scales: pros and cons
✓ Highly scalable mesh generation based on octrees (e.g. p4est)
✓ Highly scalable mesh partition with space-filling curves (ParMETIS not needed)
✓ Highly scalable adaptive mesh refinement + load balancing
✗ Not guaranteed that highly scalable linear solvers keep their optimal properties for cut elements.
PETSc CG + AMG preconditioner on unfitted meshes
Poisson equation (weak scaling test with 5 meshes)
[Figure: CG iterations versus DOFs ($10^3$ to $10^8$), two panels: AMG + AgFEM and AMG + naive unfitted FEM*]
* Nitsche BCs + modified integration in cut cells
Why are linear solvers affected by cut cells?
Condition number estimates (Poisson Eq.)
(a) Body-fitted case
$\kappa_2(A) \sim h^{-2}$
(b) Naive unfitted FEM
$\kappa_2(A) \sim |\eta|^{-(2p+1-2/d)}$
"small cut cell problem"
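To get a feeling for the estimate in case (b), the short sketch below evaluates the exponent $2p+1-2/d$ and the resulting growth factor $|\eta|^{-(2p+1-2/d)}$. It assumes that $\eta$ measures the relative size of the smallest cut portion of a cell, which the slide does not state explicitly; the function names are hypothetical.

```python
from fractions import Fraction


def cut_cell_exponent(p, d):
    """Exponent s in kappa_2(A) ~ |eta|^(-s), with s = 2p + 1 - 2/d."""
    return 2 * p + 1 - Fraction(2, d)


def predicted_growth(eta, p, d):
    """Factor |eta|^(-s) appearing in the condition number estimate."""
    return abs(eta) ** (-float(cut_cell_exponent(p, d)))


for p, d in [(1, 2), (1, 3), (2, 3)]:
    s = cut_cell_exponent(p, d)
    g = predicted_growth(1e-6, p, d)
    print(f"p={p}, d={d}: exponent {s}, factor for eta=1e-6: {g:.1e}")
```

Even for linear elements the factor grows very quickly as the cut becomes small, independently of the mesh size $h$.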
Possible remedies
Fix the linear solver
Tailor your parallel solver to deal with $\kappa_2(A) \sim |\eta|^{-(2p+1-2/d)}$
Example: [S. Badia, F. Verdugo. Robust and scalable domain decomposition solvers for unfitted finite element methods. Journal of Computational and Applied Mathematics (2018)].
Fix the linear system (this talk)
Enhance the unfitted FE method so that $\kappa_2(A) \sim h^{-2}$
Use a standard scalable solver
Examples: CutFEM, AgFEM
Agenda
1. The AgFEM method (serial case)
2. Parallel implementation
3. Performance of parallel AgFEM + AMG solvers
AgFEM method for the Poisson Eq.
$-\Delta u = f$ in $\Omega$
$u = u_D$ on $\partial\Omega$
Weak imposition of Dirichlet BCs
Nitsche’s Method
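Find $u_h \in V_h$ such that
$$a_h(u_h, v_h) = l_h(v_h) \quad \forall v_h \in V_h$$
($v_h$ does not vanish on $\partial\Omega$!)
where
$$a_h(u_h, v_h) := \sum_{K \in \mathcal{T}_h^{\mathrm{act}}} \int_{K \cap \Omega} \nabla u_h \cdot \nabla v_h - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F (\nabla u_h \cdot n)\, v_h - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F u_h\, (\nabla v_h \cdot n) + \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \beta h_F^{-1} \int_F u_h v_h$$
$$l_h(v_h) := \sum_{K \in \mathcal{T}_h^{\mathrm{act}}} \int_{K \cap \Omega} f\, v_h + \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \beta h_F^{-1} \int_F u_D v_h - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F u_D\, (\nabla v_h \cdot n)$$
The key feature of AgFEM is the definition of the discrete space $V_h$.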
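The role of each Nitsche term can be checked on a deliberately simple case. The sketch below assumes a body-fitted 1D mesh of $(0, 1)$ with linear elements, so there are no cut cells and the two boundary "faces" are the end points; it is only meant to show where the terms of $a_h$ and $l_h$ enter the system, not how AgFEM is implemented. The function names and the value `beta=10.0` are illustrative choices.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def nitsche_poisson_1d(n_cells, f, u_left, u_right, beta=10.0):
    """P1 FEM for -u'' = f on (0, 1) with Nitsche boundary conditions."""
    h = 1.0 / n_cells
    n = n_cells + 1
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for e in range(n_cells):
        # Bulk term: integral of grad(u) . grad(v) over the cell.
        for i, j, s in [(e, e, 1), (e, e + 1, -1), (e + 1, e, -1), (e + 1, e + 1, 1)]:
            A[i][j] += s / h
        # Load term, integrated with a midpoint rule.
        fm = f((e + 0.5) * h)
        b[e] += 0.5 * h * fm
        b[e + 1] += 0.5 * h * fm
    # Boundary "faces": (boundary node, its interior neighbour, Dirichlet value).
    for nb, ni, u_d in [(0, 1, u_left), (n - 1, n - 2, u_right)]:
        # With outward normal n, grad(w) . n = (w[nb] - w[ni]) / h at both ends.
        A[nb][nb] += -1.0 / h    # -(grad u . n) v
        A[nb][ni] += 1.0 / h
        A[nb][nb] += -1.0 / h    # -u (grad v . n)
        A[ni][nb] += 1.0 / h
        A[nb][nb] += beta / h    # penalty: beta / h * u v
        b[nb] += beta / h * u_d  # penalty: beta / h * u_D v
        b[nb] += -u_d / h        # -u_D (grad v . n)
        b[ni] += u_d / h
    return solve_dense(A, b)


# Consistency check: for f = 0, u(0) = 0, u(1) = 1 the exact solution u = x
# belongs to the discrete space, so the nodal values should reproduce it.
u = nitsche_poisson_1d(4, lambda x: 0.0, 0.0, 1.0)
print(u)
```

The boundary values are never eliminated from the system: they are unknowns like any other, which is what makes the approach usable when the boundary cuts through cells.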
Starting point: "naive" FE space
$V_h^{\mathrm{std}} := \{ u \in C^0(\Omega_{\mathrm{act}}) : u|_K \in Q_p(K) \ \forall K \in \mathcal{T}_h^{\mathrm{act}} \}$
[Figure panels: $\mathcal{T}_h^{\mathrm{act}}$, $\Omega_{\mathrm{act}}$; $V_h^{\mathrm{std}}$]
Aggregated FE space
Basic idea: improve conditioning by removing problematic DOFs
$V_h^{\mathrm{agg}} := \{ u \in V_h^{\mathrm{std}} : u_\times = \sum_{\bullet \in \mathrm{masters}(\times)} C_{\times\bullet}\, u_\bullet \ \forall \times \in \mathcal{P} \}$
Legend: • well-posed DOFs, × problematic DOFs ($\mathcal{P}$)
Definition of constraints via cell aggregates
1. Generate cell aggregates (1 interior cell + several cut cells)
2. Define DOF to root cell map root(×) via the aggregates
3. Define constraints:
$u_\times = \sum_{\bullet \in \mathrm{dofs}(\mathrm{root}(\times))} \phi_\bullet^{\mathrm{root}(\times)}(x_\times)\, u_\bullet$
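The constraint in step 3 amounts to evaluating the shape functions of the root cell at the location of the problematic DOF, which in general lies outside the root cell. A minimal sketch for bilinear ($Q_1$) elements on an axis-aligned cell follows; the helper names and the example data are hypothetical.

```python
def q1_shape_values(cell, x, y):
    """Values of the four bilinear shape functions of an axis-aligned cell
    at (x, y). The point may lie outside the cell (extrapolation)."""
    (x0, y0), (x1, y1) = cell
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    # Node order: (x0, y0), (x1, y0), (x0, y1), (x1, y1).
    return [(1 - tx) * (1 - ty), tx * (1 - ty), (1 - tx) * ty, tx * ty]


def constrained_value(root_cell, root_values, x_problematic):
    """u_x = sum over the root-cell DOFs of phi(x_x) * u."""
    coeffs = q1_shape_values(root_cell, *x_problematic)
    value = sum(c * u for c, u in zip(coeffs, root_values))
    return value, coeffs


# Root cell [0, 1] x [0, 1]; problematic DOF at the grid node (2, 1).
root_cell = ((0.0, 0.0), (1.0, 1.0))


def u_exact(x, y):
    # A linear function, which the extrapolation should reproduce.
    return 3.0 * x + 2.0 * y + 1.0


root_values = [u_exact(0, 0), u_exact(1, 0), u_exact(0, 1), u_exact(1, 1)]
value, coeffs = constrained_value(root_cell, root_values, (2.0, 1.0))
print(coeffs, value, u_exact(2.0, 1.0))
```

The coefficients sum to one and reproduce linear functions exactly, so the constrained DOF carries no independent information and can be removed from the linear system.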
Results for the unfitted aggregated FEM (Poisson Eq.) [1]
$\kappa(A) \le c_1 h^{-2}$ (Condition number bound)
$\beta \le c_2 h^{-2}$ (Nitsche's penalty coef.)
$\|u - u_h\|_{H^1(\Omega)} \le c_3 h^{p}$ (Optimal convergence order)
$\|u - u_h\|_{L^2(\Omega)} \le c_4 h^{p+1}$ (Optimal convergence order)
and others (inverse/trace inequalities, bound of aggregate size, bound of the extended solution, ...)
[1] Badia, Verdugo, Martín. The aggregated unfitted finite element method for elliptic problems. Comput. Methods Appl. Mech. Eng. (2018).
[Figure: $\log_{10}(\mathrm{condest}(A))$ versus $\ell$ ($0.3 \le \ell \le 0.7$). Curves: p=1 standard, p=2 standard, p=1 aggr., p=2 aggr.]
Convergence test
[Figure: $\log_{10}$(error in energy norm) versus $\log_{10}(h)$. Curves: p=1 standard, p=2 standard, p=1 aggr., p=2 aggr., with reference slopes 1 and 2. Panels: (a) 2D, (b) 3D]
Extension to the Stokes problem [2]
[Figure: velocity magnitude $|u|$, color scale 0.0 to 75.0]
$-\Delta u + \nabla p = f$ in $\Omega$
$\nabla \cdot u = 0$ in $\Omega$
$u = 0$ on $\Gamma_D$
$(\nabla u - pI) \cdot n = g$ on $\Gamma_N$
[2] Badia, Martín, Verdugo. Mixed aggregated finite element methods for the unfitted discretization of the Stokes problem. SIAM J. Sci. Comput., 40(6), 2018.
[Figure: $\log_{10}(\mathrm{condest}(A))$ versus $\ell$ ($0.3 \le \ell \le 0.7$), two panels. Curves: Aggregated, Standard]
[Figure: relative errors versus $\log_{10}(\mathrm{DOF}^{1/d})$, curves Aggregated and Standard. Panels: $\log_{10}(\|u_h - u\|_{H^1} / \|u\|_{H^1})$ with reference slope -2; $\log_{10}(\|u_h - u\|_{L^2} / \|u\|_{L^2})$ with reference slope -3; $\log_{10}(\|p_h - p\|_{L^2} / \|p\|_{L^2})$ with reference slope -2]
Agenda
2. Parallel implementation
Parallel mesh distribution
[Figure: (a) $\mathcal{D} = \{D_1, D_2\}$. (b) View from $D_1$. (c) View from $D_2$]
Main phases to be parallelized:
• Cell Aggregation
• Imposition of constraints
Cell aggregation (serial)
[Figure sequence: successive aggregation steps. Legend: touched, untouched, aggregated]
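The serial aggregation can be sketched as a front that advances from the interior cells towards the boundary, one layer of cut cells per sweep. The code below is a simplified illustration on a small 2D Cartesian grid: it attaches each cut cell to the aggregate of the first already-aggregated face neighbour it finds, which is an assumption made here for brevity and may differ from the selection rule used in the published algorithm. All names are hypothetical.

```python
def aggregate_cells(status):
    """status[j][i] is 'in', 'cut' or 'out' for cell (i, j).
    Returns a dict mapping every active cell to the root (interior) cell
    of its aggregate."""
    ny, nx = len(status), len(status[0])
    # Interior cells are the roots of their own aggregates.
    root = {(i, j): (i, j) for j in range(ny) for i in range(nx)
            if status[j][i] == 'in'}
    # Cut cells that have not been aggregated yet.
    pending = {(i, j) for j in range(ny) for i in range(nx)
               if status[j][i] == 'cut'}
    while pending:
        newly = {}
        for (i, j) in sorted(pending):
            # Face neighbours that were aggregated in a previous sweep.
            for nb in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]:
                if nb in root:
                    newly[(i, j)] = root[nb]
                    break
        if not newly:
            raise ValueError("some cut cells cannot reach an interior cell")
        # Update after the sweep so aggregates grow one layer at a time.
        root.update(newly)
        pending -= set(newly)
    return root


# Hypothetical 4 x 3 background grid (rows are j = 0, 1, 2).
status = [
    ['in',  'in',  'cut', 'out'],
    ['in',  'cut', 'cut', 'out'],
    ['cut', 'cut', 'out', 'out'],
]
roots = aggregate_cells(status)
for cell in sorted(roots):
    print(cell, '->', roots[cell])
```

In this example two sweeps are needed: the cut cells (1, 2) and (2, 1) have no interior face neighbour and are only reached through cut cells aggregated in the first sweep.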
Aggregates in 3D
Cell aggregation (parallel)
[Figure: (a) Step 1. (b) Step 2. (c) Comm. (d) Step 3.]
✓ Standard nearest neighbor communication to determine root cells
Parallel imposition of constraints
[Figure: subdomains $D_{s'}$, $D_{s''}$, $D_s$]
✗ Subdomain-local constraints not even possible in some cases
✓ At the end, only standard neighbor communication required
Agenda
3. Performance of parallel AgFEM + AMG solvers
Weak scaling test setup
• Poisson eq.
• AgFEM vs "naive" unfitted FEM
• Linear solver: PCG from PETSc
• Preconditioner: smoothed aggregation AMG from PETSc (GAMG)
• Up to 16K cores and 1000M background cells
Computed at MareNostrum 4
Number of PCG iterations (weak scaling)
[Figure: CG iterations versus DOFs ($10^3$ to $10^8$). Panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese. Curves: agg (load 1), agg (load 2), agg (load 3), std (load 1), std (load 2), std (load 3)]
Computational time (secs) AgFEM stages (weak scaling)
[Figure: wall clock time [s] versus DOFs ($10^5$ to $10^8$). Panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese]
Legend:
• Cell aggregation (Alg. 2)
• Path reconstruction (Algs. 3 and 4)
• Import data from root cells (Alg. 5)
• Setup constraints (Sect. 3.8)
• Setup of local DOFs ids (Sect. 3.3)
• Setup of global DOFs ids (Sect. 3.7)
• FE integration + assembly (Table 2)
Computational time (secs) AMG solver (weak scaling)
[Figure: wall clock time [s] versus DOFs. Panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese. Curves: linear solver setup, linear solver run]
• Setup degradation (11.94x CPU time for 4,448x larger problem)
• Similar for standard FEM in a box (2.50x for 355.26x)
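One way to compare the two figures quoted above is to convert each pair (time ratio, problem size ratio) into an effective growth exponent $\alpha$ with time $\sim N^{\alpha}$; perfect weak scaling corresponds to $\alpha = 0$. This reading is an interpretation added here, not a metric taken from the slides, and the function name is hypothetical.

```python
import math


def growth_exponent(time_ratio, size_ratio):
    """alpha such that time_ratio = size_ratio ** alpha."""
    return math.log(time_ratio) / math.log(size_ratio)


# Ratios quoted on the slide.
agfem = growth_exponent(11.94, 4448.0)
box = growth_exponent(2.50, 355.26)
print(f"AgFEM setup:           time ~ N^{agfem:.3f}")
print(f"Standard FEM in a box: time ~ N^{box:.3f}")
```

Under this reading the exponents are roughly 0.30 and 0.16, so the setup phase degrades in both cases and the effect is not specific to the unfitted discretization, although the two runs cover different size ranges and are not directly comparable.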
Conclusions
✓ Embedded FEM enables scalable octree-based meshes
✗ ... but can destroy the scalability of linear solvers
✓ AgFEM recovers the optimal scaling of the linear solver
✓ ... while keeping the optimal discretization order.
For more details, see papers:
F. Verdugo, A.F. Martín, S. Badia. Distributed-memory parallelization of the aggregated unfitted finite element method. CMAME, 357, 2019.
S. Badia, A.F. Martín, F. Verdugo. Mixed aggregated finite element methods for the unfitted discretization of the Stokes problem. SIAM J. Sci. Comput., 40(6), 2018.
S. Badia, F. Verdugo, A.F. Martín. The aggregated unfitted finite element method for elliptic problems. CMAME, 336, 2018.
https://github.com/fempar/fempar