30
FreeFem++ workshop 2016, 08 Dec. Direct solver and domain decomposition preconditioner for indefinite finite element matrices Atsushi Suzuki 1 1 Cybermedia Center, Osaka University [email protected] http://www.ljll.math.upmc.fr/suzukia/

Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

FreeFem++ workshop 2016, 08 Dec.

Direct solver anddomain decomposition preconditionerfor indefinite finite element matrices

Atsushi Suzuki1

1Cybermedia Center, Osaka [email protected]

http://www.ljll.math.upmc.fr/∼suzukia/

Page 2: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

outlines

I examples of indefinite finite element stiffness matrixsparse symmetric/unsymmetric, singular

I overview of sparse direct solverI pivoting strategy to solve indefinite and/or singular matrixI parallel efficiency of matrix from 3D N-S eqs. on super-scalar

and vector CPUsI coarse space to accelerate convergence of iterative solverI conclusion

Page 3: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

indefinite and singular matrix in solving PDE

2D stationary cavity driven flow problem in (0, 1)× (0, 1)µ : viscosity coefficient−2µ∇ ·D(u) + u · ∇u +∇p = 0 in Ω,

∇ · u = 0 in Ω,

u = g on ∂Ω.

g = [4x(1− x) 0]T on the top,g = 0 elsewhere.(u, p) : sol. ⇒ (u, p + 1) : sol.

Newton iteration to solve nonlinear system

K

[~un

~pn

]=

[A(~un−1) BT

B 0

] [~un

~pn

]=

[~fn

~0

]KerK =

[~0~1

]find u ∈ H1(Ω);u = g on ∂Ω, p ∈ L2

0(Ω) is not implemented.L2

0(Ω) = p ∈ L2(Ω) ;∫Ω

p = 0I fixing one point for pressureI penalization for pressureI matrix solver detects the kernel

Page 4: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

indefinite matrix from electro-magnetic problem

constraints on the external force: ∇ · f = 0 in Ω ⊂ R3.

∇× (∇× u) = f in Ω,

∇ · u = 0 in Ω,

u× n = 0 on ∂Ω.

H0(curl ; Ω) = u ∈ L2(Ω)3 ; ∇× u ∈ L2(Ω)3 ; u× n = 0

find (u, p) ∈ H0(curl ; Ω)×H10 (Ω)

(∇× u,∇× v) + (v,∇p) = (f, v) ∀v ∈ H0(curl ; Ω)

(u,∇q) = 0 ∀q ∈ H10 (Ω)

has a unique solution.I (∇× ·,∇× ·) : coercive on W ,

W = H0(curl ; Ω) ∩ u ∈ H(div; Ω) ; div u = 0.I stiffness matrix is symmetric but indefinite.I H0(curl ; Ω) = gradH1

0 (Ω)⊕W .

Page 5: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

indefinte and singular from electro-magnetic problem

constraints on the external force: ∇ · f = 0 in Ω.

∇× (∇× u) = f in Ω,

∇ · u = 0 in Ω,

(∇× u)× n = 0 on ∂Ω,

u · n = 0 on ∂Ω.

L20(Ω) = p ∈ L2(Ω) ; (p, 1) = 0

find (u, p) ∈ H(curl ; Ω)× H1(Ω) ∩ L20(Ω)

(∇× u,∇× v) + (v,∇p) = (f, v) ∀v ∈ H(curl ; Ω)

(u,∇q) = 0 ∀q ∈ H1(Ω) ∩ L20(Ω)

finite element approximation : Nédélec element of degree 0 and P1

N0(K) = (P0(K))3 ⊕ [x× (P0(K))3], P1(K)

ker[A BT

B 0

]=

[~0~1

], ∃A−1 on kerB

not easy problem for usual direct solvers

Page 6: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

semi-conductor problem with Drift-Diffusion model : 1/3

hole concentration p : unknownpotential ϕ : given log(ni/nd) in N-region, log(na/ni) in P.

−div(∇p + p∇ϕ) = 0 in Ωp = g on ΓD

∂νp = 0 on ΓN

following Maxwell-Boltzman statistics : p = niexp(ϕp − ϕ

Vth)

I ϕp : quasi-Fermi levelI ni : intrinsic concentration of the semiconductorI Vth = KBT/q : thermal voltageI KB : Boltzmann constantI q : positive electron chargeI T : lattice temperature

Page 7: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

semi-conductor problem with Drift-Diffusion model : 1/3

hole concentration p : unknownpotential ϕ : given log(ni/nd) in N-region, log(na/ni) in P.

−div(∇p + p∇ϕ) = 0 in Ωp = g on ΓD

∂νp = 0 on ΓN

p = ni/nd ∂νp = 0

p = na/ni

∂νp = 0

Page 8: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

semi-conductor problem with Drift-Diffusion model : 2/3

Slotboom variable ξ : p = ξe−ϕ −Jp = ∇p + p∇ϕ = ∇ξe−ϕ

−div(−Jp) = 0 in Ω−Jpe

ϕ = ∇ξ in Ω

function space : H(div) = τ ∈ L2(Ω)2 ; div τ ∈ L2(Ω),Σ = τ ∈ H(div) ; τ · ν = 0 on ΓN

integration by parts leads to

−∫

Ω

eϕJp · τ =∫

Ω

∇ξ · τ = −∫

Ω

ξ∇ · τ +∫

∂Ω

ξτ · ν

F. Brezzi, L. D. Marini, S. Micheletti, P. Pietra, R. Sacco, S. Wang.Discretization of semiconductor device problems (I) F. Brezzi et al.,Handbook of Numerical Anasysis vol XIII, Elsevier 2005hybridization of mixed formulation + mass lumping ⇒ FVM

mixed formulation + higher order approximation ⇒ indefinite matrix

Page 9: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

semi-conductor problem with Drift-Diffusion model : 2/3

mixed-type weak formulation

find (Jp, ξ) ∈ Σ× L2(Ω)∫Ω

eϕJp · τ −∫

Ω

ξ∇ · τ =−∫

ΓD

geϕτ · ν ∀τ ∈ Σ∫Ω

∇ · Jpv = 0 ∀v ∈ L2(Ω)

symmetric indefintereplacing ξ = eϕp again,

find (Jp, p) ∈ Σ× L2(Ω)∫Ω

eϕJp · τ −∫

Ω

eϕp∇ · τ =−∫

ΓD

geϕτ · ν ∀τ ∈ Σ∫Ω

∇ · Jpv = 0 ∀v ∈ L2(Ω)

unsymmetic indefintie cf. exponential fitting with FVM

Ravier-Thomas element for H(div) RT1(K) = (P1(K))2 + ~xP1(K),picewsie linear element for L2(Ω) P1(K).

Page 10: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

abstract framework

V : Hilbert space with inner product (·, ·) and norm || · ||.bilinear form a(·, ·) : V × V → R

I continuous : ∃γ > 0 |a(u, v)| ≤ γ||u|| ||v|| ∀u, v ∈ V .

I ∃α1 > 0 supv∈V,v 6=0

a(u, v)||v||

≥ α1||u|| ∀u ∈ V .

I ∃α2 > 0 supu∈V,u 6=0

a(u, v)||u||

≥ α2||v|| ∀v ∈ V .

find u ∈ V s.t. a(u, v) = F (v) ∀v ∈ V has a unique solution.

∀U ⊂ V subspace

find u ∈ U s.t. a(u, v) = F (v) ∀v ∈ U

in general, inf-sup condition in subspace U is unclear.in discretized problem : Vh ⊂ V ?in linear solver (subspace of Vh) ?

Page 11: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

State of the art : software for sparse direct solverSoftware parallel elimination data pivoting kernel

env. strategy manag. detectionUMFPACK —- multi-frontal static yes no

SuperLU_MT shared super-nodal dynamic yes noPardiso shared super-nodal dynamic yes +

√ε-p. no

SuperLU_DIST distributed super-nodal static no,√

ε-p. noMUMPS distributed multi-frontal dynamic yes yes

Dissection shared multi-frontal static yes yes

T. A. Davis, I. S. Duff. A combined unifrontal/multifrontal method for unsymmetricsparse matrices,ACM Trans. Math. Software, 25 (1999), 1–20.J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, J. W. H. Liu.A supernodal approach to sparse partial pivoting,SIAM J. Matrix Anal. Appl., 20 (1999), 720–755.O. Schenk, K. Gärtner. Solving unsymmetric sparse systems of liner equationswith PARDISO,Future Generation of Computer Systems, 20 (2004), 475–487.X. S. Li, J. W. Demmel. SuperLU_DIST : A scalable distributed-memory sparsedirect solver for unsymmetric linear systems,ACM Trans. Math. Software, 29 (2003), 110–140.P. R. Amestoy, I. S. Duff, J.-Y. L’Execellent. Mutlifrontal parallel distributedsymmetric and unsymmetric solvers,Comput. Methods Appl. Mech. and Engrg, 184 (2000) 501–520.A. Suzuki, F.-X. Roux, A dissection solver with kernel detection for symmetricfinite element matrices on shared memory computers,Int. J. Numer. Meth. in Engng, 100 (2014) 136–164.

Page 12: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

ordering of sparse matrix

sparse matrix needs to be re-orderedI to reduce fill-inI to increase parallelization of factorization → multi-frontI to increase size of block structure → supernode

example:7 stencil of Poisson equation, 113 nodes.

original matrix reverse Cuthill-McKee nested-dissection(3 layers)

Page 13: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

nested-dissection by graph decomposition

8 94

c

d

6

a b5 e f7

2 31

8 9

4

a b

5

c d

6

e f

7

2 3

1

sparse solver

dense solver

dense solver

dense solver

A. George. Numerical experiments using dissection methods to solve n by n gridproblems. SIAM J. Num. Anal. 14 (1977),161–179.software package:METIS : V. Kumar, G. Karypis, A fast and high quality multilevel scheme forpartitioning irregular graphs. SIAM J. Sci. Comput. 20 (1998) 359–392.SCOTCH : F. Pellegrini J. Roman J, P. Amestoy, Hybridizing nested dissectionand halo approximate minimum degree for efficient sparse matrix ordering.Concurrency: Pract. Exper. 12 (2000) 69–84.

I each leaf can be computed in parallel ⇐ multi-frontI load unbalance? because of different size of subdomainsI parallel computation of higher levels?

# cores > # subdomains

Page 14: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

recursive generation of Schur complement»A11 A21

A21 A22

–=

»A11 0

A21A−111 S22

– »I1 A−1

11 A12

0 I2

–S22 = A22 −A21A−1

11 A12 = A22 − (A21U−111 )D−1

11 L−111 A12 : recursively computed

8 9 a b c d e f 4 5 6 7 2 3 1

88

99

aa

bb

cc

dd

ee

ff

84

94

a5

b5

c6

d6

e7

f7

82

92

a2

b2

d3

e3

f3

91

b1

c1

d1

e1

44

55

66

77

42

52

73

61

22

33

11

21

31

48 49

5a 5b

6c 6d

7e 7f

28 29 2a 2b

3d 3e 3f

1b 1c 1d 1e

24 25

16

37

12 1319

42

52

73

63

22

33

11

21

31

24 25

36 37

12 13

22

33

11

21

31

12 13

41

51

61

71

14 15 16 17

Schur complement

by

Schur complement

by

11

Schur complement

by

sparse solver

dense solver

dense solver

44

55

66

77

dense factorization

sparse part : completely in paralleldense part : better use of BLAS 3; dgemm, dtrsm

Page 15: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

pivoting strategy

full pivoting : A = ΠTLLUΠR partial pivoting : A = ΠLU

find maxk<i,j≤n|A(i, j)| find maxk<i≤n|A(i, k)|

k k

symmetric pivoting : A = ΠT LDUΠ 2× 2 pivoting : A = ΠT L D UΠ

find maxk<i≤n|A(i, i)| find maxk<i,j≤ndet∣∣∣∣A(i, i) A(i, j)A(j, i) A(j, j)

∣∣∣∣k k

sym. pivoting is mathematically not always possible → 2× 2 pivoting

Page 16: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

understanding pivoting strategy by solution in subspaces

A = ΠT LDUΠ: symmetric pivotingD: diagonal, L: lower triangle, L(i, i) = 1, U : upper tri., U(i, i) = 1.

I index set i1, i2, · · · , imI Vm = span[~ei1 , ~ei2 , · · · , ~eim ] ⊂ RN

I Pm : RN → Vm orthogonal projection.

find ~u ∈ Vm (A~u− ~f,~v) = 0 ∀v ∈ Vm.

∃Π : A = ΠT LD UΠ⇒ ∃i1, i2, · · · , iN s.t. PmA PT

m : invertible on Vm 1 ≤ ∀m ≤ N .

2× 2 pivoting: Vm−1, Vm, Vm+1, Vm+2, Vm+3, by skipping Vm+1.

J. R. Bunch, L. Kaufman. Some stable methods for calculating inertiaand solving symmetric linear systems,Math. Comput, 31 (1977) 163–179.R. Bank, T.-F. Chan. An analysis of the composite step biconjugategradient method.Numer. Math, 66 (1993) 295–320.

Page 17: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

symmetric pivoting with postponing for block strategyI nested-dissection decomposition may produce singular

sub-matrix for indefinite matrix

τ : given threshold for null pivot|A(i, i)|/|A(i− 1, i− 1)| < τ ⇒ |A(i, i)| is null pivot.

i − 1i

44

55

66

77

22

33

11

31

21

71

61

51

41

73

63

52

42

invertible entries

candidates ofnull pivot

Schur complement matrix from suspicious (postponed) null pivotsand additional nodes ⇒ kernel detection algorithm

Page 18: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

kernel detection (rank deficient problem)[A11 A12

A21 A22

]=

[A11 0A21 S22

] [I1 A−1

11 A12

0 I2

]S22 = 0 ⇒ KerA =

[A−1

11 A12

−I2

]symmetric semi-positive definite, m + k = 4 + 6 = 10by Householder-QR factorization:

4.60e-02 -1.20e-02 2.91e-03 1.16e-02 2.24e-02 -9.33e-05 -3.60e-02 8.22e-03 -7.77e-03 -2.90e-020.0 3.84e-02 4.84e-03 -2.21e-02 1.87e-02 1.30e-03 -9.14e-03 -2.74e-02 1.48e-02 -1.91e-020.0 0.0 2.96e-02 1.68e-03 -2.55e-02 1.11e-04 1.28e-02 -1.04e-04 -1.12e-03 -1.20e-020.0 0.0 0.0 1.28e-02 -1.66e-03 8.48e-04 -4.29e-05 -7.90e-04 -8.56e-03 2.51e-030.0 0.0 0.0 0.0 1.23e-11 -5.49e-13 -8.30e-12 1.67e-13 2.10e-14 -7.08e-120.0 0.0 0.0 0.0 0.0 6.70e-13 -1.02e-13 -5.33e-13 -1.62e-13 1.18e-130.0 0.0 0.0 0.0 0.0 0.0 3.33e-13 -2.48e-14 -6.61e-14 -3.18e-130.0 0.0 0.0 0.0 0.0 0.0 0.0 1.22e-13 4.46e-15 -4.34e-140.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.05e-14 8.16e-150.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -1.09e-14

how to set threshold to distinguish betweennon-zero (1.28e-02) and zero (1.23e-11) values ?

Pardiso no capability of kernel detection.MUMPS user has to choose this value.Dissection + an algorithm by measuring dimension of residual of

matrix with a projection onto the image space.

Page 19: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

kernel detection algorithm based on LDU : 3/4

A : N ×N unsymmetric, dimKerA = k ≥ 1, dimImA ≥ m.two parameters: l, n, which define size of factorization,

N − n ln l

»A11 A12

A21 A22

–=

»A11 0A21 S22

– »I1 A−1

11 A12

0 I2

– gImn = span» gA−1

11 A12

−I2

–⊥.

I projection : P⊥n : RN → gImn

I solution in subspace, eA†N−lb =

» gA−111 b1

0

–, b =

»b1

b2

–lN − ll l

Theoretically ¬A−1N−k+1

perturbed solution with machine epsilon of double precision ε0

gA11

−1b1 = U−1

11 D−111 L−1

11 b1 + ε0em

err(n)l := max

˘max

x=[0 xl]6=0

||P⊥n ( eA†

N−lA x− x)||||x|| , max

x=[xN−l 0]6=0

|| eA†N−lA x− x||||x||

¯n = k + 1 ⇔ err(k+1)

k ≈ 0 ∧ err(k+1)k+1 ≈ 0 ∧ err(k+1)

k+2 ∼ 1

n = k ⇔ err(k)k−1 0 ∧ err(k)

k ≈ 0 ∧ err(k)k+1 ∼ 1

n = k − 1 ⇔ err(k−1)k−2 0 ∧ err(k−1)

k−1 0 ∧ err(k−1)k ∼ 1

Page 20: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

Exchange of 1× 1 and 2× 2 pivot entries

B =

1l2 1l3 0 1

d1

d2 d0

d0 d3

1 u2 u3

1 01

=

d1 d1u2 d1u3

d1l2 d2 + d1l2u2 d0 + d1l2u3

d1l3 d0 + d1l3u2 d3 + d1l3u3

find (i, j) : |bii · bjj − bjibij | ≥ |bkk · bmm − bmkbkm| for 2× 2 block

(i, j, h), (k, m , n) ∈ (1, 2, 3), (2, 3, 1), (3, 1, 2) .

permutation Π(1, 2, 3) = i, j, h,

Π B ΠT =

10 1l′1 l′2 1

d′1 d′0d′0 d′2

d′3

1 0 l′11 l′2

1

.

d2 6= 0 ⇒∣∣∣∣ d1 d1u2

d1l2 d2 + d1l2u2

∣∣∣∣ = d1 · (d2 + d1l2u2)− (d1l2)(d2l2) = d1d2 6= 0 .

d2 = 0 ∧ d3 = 0 ⇒ d0 6= 0,∣∣∣∣ d1l2u2 d0 + d1l2u3

d0 + d1l3u2 d1l3u3

∣∣∣∣ = d1l2u3 · d1l3u2−(d0+d1l2u3)(d0+d1l3u2) 6=0 .

d−4d−3d−2(d−1d0)(d1d2)→d−4d−3(d′−2d

′−1)d

′0(d1d2)→(d′−4d

′′−3)(d

′′′′−2d

′′′′−1)d

′′′′0 d′′1d′2 .

Kernel detection algorithm assumes dimImA ≥ m ≥ 4.

Page 21: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

example of kernel detection algorithmstationary Navier-Stokes equations, Re = 12, 800, N = 43, 998,τ = 10−2, (m = 4) + 1 + 1 : kernel ∼ pressure ambiguity6× 6 matrix by Householder QR factorization

4.221911e-2 5.065337e-3 5.137137e-3 1.493815e-3 3.874611e-2 1.218166e-24.060156e-2 3.228548e-2 5.466190e-3 2.174984e-3 6.749120e-4

1.389616e-2 6.729308e-3 8.980537e-3 1.813681e-31.708745e-3 1.640027e-15 6.871814e-1

1.203270e-15 1.788546e-131.674888e-16

computed residuals with orthogonal projection:k err(k)

k−1 err(k)k err(k)

k+1

2 1.57098143 · 10−3 2.69712366 · 10−16 8.11624415 · 10−1

β1 2.22044604 · 10−16

β4 8.88178419 · 10−16

β6 1.50415172 · 10−3

γ0 1.15583524 · 10−9

residual of kernel vectors:dim. of kernel = 1 dim. of kernel = 22.23779349 · 10−15 1.92578081 · 10−3

1.84833445 · 10−3

Page 22: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

stiffness matrix of electro-magnetic equations by FreeFem++

FreeFem++ script

load "msh3"load "Dissection"defaulttoDissection;mesh3 Th=cube(20,20,20);fespace VQh(Th, [Edge03d, P1]); // Nedelec elementVQh [u1, u2, u3, p], [v1, v2, v3, q];varf aa([u1, u2, u3, p], [v1, v2, v3, q]) =int3d(Th)((dy(u3)-dz(u2)) * (dy(v3) - dz(v2)) +

(dz(u1)-dx(u3)) * (dz(v1) - dx(v3)) +(dx(u2)-dy(u1)) * (dx(v2) - dy(v1)) +dx(p) * v1 + dy(p) * v2 + dz(p) * v3 +dx(q) * u1 + dy(q) * u2 + dz(q) * u3);

matrix A = aa(VQh, VQh, solver=sparsesolver,tolpivot=1.0e-2,strategy=102);

solver elapsed time (sec). algebraic errorUMFPACK 32.348 3.55790

Intel Pardiso 9.698 4.07409× 10−7

Dissection 10.534 5.89406× 10−15

Page 23: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

parallel performance on Xeon [email protected] 14cores ×2

100

1000

1 10

time(

sec.

)

# cores

Dissection

MUMPSIntel Pardiso

O(1/p)

unsymmetric matrix N = 1, 032, 183, nnz = 97, 961, 089, dim ker = 1.from 3D Navier-Stokes eqs., P2/P1, h=1/35, Re=300. 57GB mem.

Page 24: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

parallel performance on Xeon v3

n = 945, 164, nnz = 89, 588, 848.36.8GFlop/s × 28 cores, 64GB shared memory

purple yellow light green light blue dark bluesparse LDU spase Schur DTRSM DGEMM dense LDU# of cores CPU time (sec.) elapsed (sec.) GFlop/s of DGEMM

1 1,268.0 1,268.9 36.352 1,108.3 659.39 36.274 1,178.5 356.22 34.068 1,469.2 201.24 31.2316 1,813.2 129.63 25.2528 2,002.0 94.43 22.90

Page 25: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

parallel performance on NEC SX-ACE

n = 945, 164, nnz = 89, 588, 848.64GFlop/s × 4 cores, 64GB shared memory

purple yellow light green light blue dark bluesparse LDU spase Schur DTRSM DGEMM dense LDU

# of cores CPU time (sec.) elapsed (sec.) GFlop/s of DGEMM1 1,080.4 1,081.9 44.852 1,108.3 590.96 43.764 1,178.5 345.84 41.31

Page 26: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

Additive Schwarz preconditioner for 3D computation : 1/4

Rp : overlapping decomposition, Dp : a partition of unity (discrete)∑Mp=1R

Tp DpRp = IN ,

coarse space by Nicolaides~zp ⊂ RN : basis of coarse space, Z = [~z1, · · · , ~zM ], R0 = ZT .

~zp = RTp DpRp

~1,

2-level ASM preconditioner

Q−1ASM,2 = RT

0 (R0ART0 )−1R0 +

M∑p=1

RTp (RpART

p )−1Rp

hybrid version of 2-level ASM preconditioner

Q0 = RT0 (R0ART

0 )−1R0, P0 = I −Q0A

Q−1ASM,hybrid = Q0 + PT

0

M∑p=1

RTp (RpART

p )−1RpP0

cf. V. Dolean, P Jolivet, F. Nataf, An Introduction to Domain Decomposition Methods –Algorithms, Theory, and Parallel Implementation, SIAM, 2015

Page 27: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

Additive Schwarz preconditioner for 3D computation : 2/4

the stiffness matrix restricted on the coarse spaceR0ART

0 is invertible for indefinite problem ?Stokes eqs : coarse space ⇐ rigid body modes + pressure constant

penalty-type stabilized finite element methodVh ⊂ V : P1 finite elementQh ⊂ Q : P1 finite element +

∫Ω

ph dx = 0.Find (uh, ph) ∈ Vh(g)×Qh s.t.

a(uh, vh) + b(vh, ph) = (f, vh) ∀vh ∈ Vh,

b(uh, qh)− δd(ph, qh) = 0 ∀qh ∈ Qh.

δ > 0 : stability parameter, d(ph, qh) =∑K∈T

h2K

∫K

∇ph · ∇qh dx.

|ph|2h = d(ph, ph) : mesh dependent norm on Qh.I uniform weak inf-sup condition : Franca-Stenberg [1991]

∃β0, β1 > 0 ∀h > 0 supvh∈Vh

b(vh, qh)||vh||1

≥ β0||qh||0 − β1|qh|h ∀ qh ∈ Q0.

Page 28: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

Additive Schwarz preconditioner for 3D computation : 3/4

matrix formulation of the stabilized FEM for the Stokes eqs.V = RNv , Q ⊂ RNp , (~q,~1) = 0 for ~q ∈ Q.find (~u, ~p) ∈ V ×Q s.t.([

A BT

B −δ D

] [~u~p

]−

[~f~0

],

[~v~q

])= 0 ∀(~v, ~q) ∈ V ×Q

subspace : U ×R, U ⊂ V , R ⊂ Q

⇒[A BT

B −δ D

]is invertible on U ×R.

proof([A BT

B −δ D

] [~u~p

],

[~v~q

])= 0 ∀(~v, ~q ) ∈ U ×R ⇒ ~u = ~0, ~p = ~0.

( ~u, ~p ) ∈ U ×R ⇒ ( ~u, −~p ) ∈ U ×R.

([A BT

B −δ D

] [~u~p

],

[~u−~p

])= (A~u, ~u ) + δ( D~p, ~p ) > 0.

cf. S, Profeedings of ALGORITMY 2009

Page 29: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

Additive Schwarz preconditioner for 3D computation : 4/450×50×50 FEM nodes, overlap = 1

p=8 16 32

1e-12

1e-10

1e-08

1e-06

0.0001

0.01

1

0 20 40 60 80 100 120 140

rela

tive

resi

dual

iteration

diagonal scaling8 subdomains

16 subdomains32 subdomains

R0ART0 : 7p× 7p

dimker(R0ART0 ) = 1

is detected byDissection.

Page 30: Direct solver and domain decomposition preconditioner for ...hecht/ftp/ff++days/2016/pdf/AS.pdf · Direct solver and domain decomposition preconditioner for indefinite finite element

conclusion

I indefinite matrix is factorized by postponing strategy forsuspicious null pivots

I combination of 1x1 and 2x2 pivoting can factorize finite elementmatrices without adding perturbation

I new kernel detection algorithm resolves rank deficient problemfrom FEM matrices

I 1M DOF is factorized with 57GB memory on a shared memorycomputer within 2 minutes

I direct solver is efficiently used as sub-domain solverI stabilized term for the Stokes equations ensures solvability of the

coarse problem