A Parallel Implementation of the BDDC Method for Linear Elasticity

Jakub Šístek
joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík

Institute of Mathematics of the AS CR, Prague
Czech Technical University, Prague
University of Colorado Denver

MAFELAP 2009, June 9 - 12, 2009



Some motivation for domain decomposition (DD) methods

- very large systems of algebraic equations arising from FEM are difficult to solve by conventional numerical methods
  - direct methods are slow due to the problem size
  - iterative methods are slow due to the large condition number, so a suitable preconditioner is needed
  - combination of these approaches: synergy in domain decomposition (DD) methods
- natural way to parallelize FEM
- two recent methods for elliptic PDEs: BDDC and FETI-DP


Brief overview of BDDC method

- Balancing Domain Decomposition by Constraints
- introduced in 2003 by C. Dohrmann (Sandia), theory with J. Mandel (UCD)
- nonoverlapping primal domain decomposition method
- equivalent to FETI-DP [Mandel, Dohrmann, Tezaur 2005]


The abstract problem

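Variational setting

$$u \in U : \quad a(u, v) = \langle f, v \rangle \quad \forall v \in U$$

- $a(\cdot,\cdot)$ is a symmetric positive definite form on $U$
- $\langle \cdot,\cdot \rangle$ is an inner product on $U$
- $U$ is a finite dimensional space

Matrix form

$$u \in U : \quad Au = f$$

- $A$ is a symmetric positive definite matrix on $U$

Linked together

$$\langle Au, v \rangle = a(u, v) \quad \forall u, v \in U$$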


BDDC set-up

- division into subdomains
- selection of coarse problem nodes (also called corners)

[Figure: mesh divided into subdomains; labels: interface, subdomain $\Omega_i$, coarse problem nodes, finite elements, $h$, $H$]


Function spaces in BDDC

$$U \subset W^c \subset W$$

($U$: continuous; $W^c$: continuous at coarse problem nodes; $W$: no continuity)

- enough coarse nodes to fix floating subdomains, so that rigid body modes are captured
- $a(\cdot,\cdot)$ is a symmetric positive definite form on $W^c$
- the corresponding matrix $A^c$ is symmetric positive definite, has an almost block diagonal structure, and has larger dimension than $A$
- operator of projection $E : W^c \to U$, $\mathrm{Range}(E) = U$, e.g. averaging across interfaces (arithmetic, weighted)
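As an illustration of the averaging operator $E$, here is a minimal sketch in Python. The data layout (one dictionary per subdomain, keyed by global node number) and the function name `average_across_interface` are assumptions made for this example, not part of the presented implementation, and one scalar weight per subdomain is a simplification of the weighted variant.

```python
def average_across_interface(local_values, weights=None):
    """Average subdomain values at shared nodes (illustrative operator E).

    local_values: list of dicts, one per subdomain, global node id -> value.
    weights: optional list with one positive weight per subdomain
             (hypothetical simplification; None means arithmetic averaging).
    Returns a dict global node id -> single continuous value.
    """
    if weights is None:
        weights = [1.0] * len(local_values)
    num, den = {}, {}
    for w, vals in zip(weights, local_values):
        for node, v in vals.items():
            num[node] = num.get(node, 0.0) + w * v
            den[node] = den.get(node, 0.0) + w
    return {node: num[node] / den[node] for node in num}


# Two subdomains sharing interface node 2, with a jump across the interface.
w1 = {0: 1.0, 1: 2.0, 2: 3.0}
w2 = {2: 5.0, 3: 4.0}
u_arith = average_across_interface([w1, w2])
u_weighted = average_across_interface([w1, w2], weights=[3.0, 1.0])
```

Interior values pass through unchanged and only the interface value is averaged, so applying the function to an already continuous function returns it unchanged, which is the projection property $\mathrm{Range}(E) = U$.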


The second intermediate space in BDDC

Coarse nodes alone do not suffice for optimal preconditioning in 3D ⇒ additional constraints on functions from $W^c$ are necessary

$$U \subset \widetilde{W} \subset W^c$$

($U$: continuous; $\widetilde{W}$: additional constraints; $W^c$: only corners)

Examples: equal averages over subsets of the interface (edges, faces) across the interface, additional pointwise continuity constraints


The BDDC preconditioner

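Define $M_{BDDC} : r \in U \longrightarrow \Delta u \in U$

$$M_{BDDC} : r \longmapsto \Delta u = Ew, \quad w \in \widetilde{W} : \quad a(w, z) = \langle r, Ez \rangle \quad \forall z \in \widetilde{W}$$

where $\widetilde{W}$ is the constrained intermediate space, $U \subset \widetilde{W} \subset W^c$

- $r$: residual in an iteration of PCG
- $\Delta u$: correction to solution (preconditioned residual)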
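To show where the map $r \mapsto \Delta u$ enters the iteration, the following is a minimal preconditioned conjugate gradient sketch in Python. The names `pcg`, `apply_A` and `apply_M` are hypothetical, and a simple diagonal (Jacobi) preconditioner stands in for $M_{BDDC}$ here; it is not the BDDC preconditioner itself.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pcg(apply_A, b, apply_M, tol=1e-10, max_iter=100):
    """Preconditioned CG for an SPD operator; apply_M maps residual -> correction."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_M(r)                   # preconditioned residual (Delta u)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, k
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_M(r)               # the only place the preconditioner is used
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small SPD test system; Jacobi scaling is a stand-in for M_BDDC.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
apply_A = lambda v: [dot(row, v) for row in A]
apply_M = lambda r: [ri / A[i][i] for i, ri in enumerate(r)]
x, iters = pcg(apply_A, b, apply_M)
```

Swapping `apply_M` for a routine that performs the BDDC steps would leave the loop unchanged, which is why the preconditioner can be described purely as the action $r \mapsto \Delta u$.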


The coarse space in BDDC

In the implementation, the space $\widetilde{W}$ may be split into independent subdomain spaces and an energy-orthogonal coarse space. On each subdomain, the coarse degrees of freedom define basis functions $\Psi_i$ with prescribed values of the coarse degrees of freedom and minimal energy elsewhere,

$$\begin{bmatrix} A_i & C_i^T \\ C_i & 0 \end{bmatrix} \begin{bmatrix} \Psi_i \\ \Lambda_i \end{bmatrix} = \begin{bmatrix} 0 \\ I \end{bmatrix}.$$

- $A_i$: local subdomain stiffness matrix
- $C_i$: matrix of constraints, which selects unknowns into coarse degrees of freedom

The matrix of the coarse problem $A_C$ is assembled from the local matrices $A_{Ci} = \Psi_i^T A_i \Psi_i$.
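The saddle-point system for the coarse basis functions can be tried on a tiny example. The sketch below, in Python with exact rational arithmetic, uses a hypothetical floating subdomain (a chain of three nodes joined by unit springs) whose two end nodes are taken as coarse degrees of freedom. The helpers `solve`, `matmul` and `coarse_basis` are illustrative names, and a dense Gauss-Jordan elimination stands in for whatever solver a real code would use.

```python
from fractions import Fraction


def solve(A, B):
    """Solve A X = B exactly by Gauss-Jordan elimination (A: n x n, B: n x m)."""
    n = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def coarse_basis(A, C):
    """Solve [[A, C^T], [C, 0]] [Psi; Lam] = [0; I] for Psi and Lam."""
    n, m = len(A), len(C)
    K = [list(A[i]) + [C[j][i] for j in range(m)] for i in range(n)]
    K += [list(C[j]) + [0] * m for j in range(m)]
    rhs = [[0] * m for _ in range(n)]
    rhs += [[int(i == j) for j in range(m)] for i in range(m)]
    sol = solve(K, rhs)
    return sol[:n], sol[n:]


# Hypothetical subdomain: 3 nodes, unit springs, no Dirichlet condition (A is singular).
A = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
# Constraints select the two end nodes as coarse degrees of freedom.
C = [[1, 0, 0], [0, 0, 1]]

Psi, Lam = coarse_basis(A, C)
Psi_T = [list(col) for col in zip(*Psi)]
A_C = matmul(Psi_T, matmul(A, Psi))   # local coarse matrix Psi^T A Psi
```

Each column of `Psi` equals 1 at its own coarse node, 0 at the other, and takes the energy-minimal value 1/2 at the middle node. The augmented system is solvable even though `A` alone is singular, because the constraints remove the rigid body mode.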

The coarse space in BDDC

[Figures: a function from the coarse space; a coarse basis function]

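An iteration of BDDC

1. Residual on interface

$$r^{(k)} = g - S u^{(k)}$$

$S$: Schur complement w.r.t. interface, $g$: condensed r.h.s.

2. Distribution of residual

local problems, for $i = 1, \dots, N$:

$$r_i = E_i^T r^{(k)}$$

coarse problem:

$$r_C = \sum_{i=1}^{N} R_{Ci}^T \Psi_i^T E_i^T r^{(k)}$$

3. Correction of solution

local problems, for $i = 1, \dots, N$:

$$\begin{bmatrix} A_i & C_i^T \\ C_i & 0 \end{bmatrix} \begin{bmatrix} \Delta u_i \\ \mu_i \end{bmatrix} = \begin{bmatrix} r_i \\ 0 \end{bmatrix}$$

coarse problem:

$$A_C \, \Delta u_C = r_C$$

4. New approximation

$$\Delta u = \sum_{i=1}^{N} E_i \left( \Psi_i R_{Ci} \, \Delta u_C + \Delta u_i \right), \qquad u^{(k+1)} = u^{(k)} + \Delta u$$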
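Substituting steps 2 and 3 into step 4 writes the correction as one operator acting on the residual. The display below is only this substitution; it assumes that $A_C$ and the local saddle-point matrices are invertible, and it introduces no new symbols.

```latex
\begin{align*}
\Delta u_C &= A_C^{-1} \sum_{j=1}^{N} R_{Cj}^T \Psi_j^T E_j^T r^{(k)}, \\
\Delta u_i &= \begin{bmatrix} I & 0 \end{bmatrix}
  \begin{bmatrix} A_i & C_i^T \\ C_i & 0 \end{bmatrix}^{-1}
  \begin{bmatrix} I \\ 0 \end{bmatrix} E_i^T r^{(k)}, \\
\Delta u &= \sum_{i=1}^{N} E_i \left( \Psi_i R_{Ci}\, \Delta u_C + \Delta u_i \right).
\end{align*}
```

The local solves are independent across subdomains, while the coarse solve couples them; this is the source of the parallelism in the method.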


Frontal solver

- B. M. Irons, 1970
- direct solver for sparse matrices arising in FEM
- number of flops $O(n \cdot n_{fron}^2)$, where $n_{fron} \ll n$ is the front width
- memory demand: $n_{fron}^2$ if out-of-core
- element-by-element approach: element matrices are read from file until a whole line is assembled, which is then immediately eliminated
- basic scheme: block 1 holds 'free' variables, block 2 holds 'constrained' (also 'fixed') variables

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} + \begin{bmatrix} 0 \\ \mathrm{Rea}_2 \end{bmatrix}, \qquad (1)$$

- $x_2$, $f_1$, $f_2$: inputs
- $x_1$, $\mathrm{Rea}_2$ (reaction forces): outputs
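The input/output convention of equation (1) can be sketched in Python as follows. A dense exact solve stands in for the frontal elimination, and the function names and the three-node spring chain are assumptions made for this illustration only.

```python
from fractions import Fraction


def solve(A, B):
    """Solve A X = B exactly by Gauss-Jordan elimination (A: n x n, B: n x m)."""
    n = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def solve_with_fixed(A11, A12, A21, A22, x2, f1, f2):
    """Given fixed x2 and loads f1, f2, return free unknowns x1 and reactions Rea2."""
    # First block row of (1): A11 x1 = f1 - A12 x2
    rhs = [[f - sum(a * x for a, x in zip(row, x2))] for f, row in zip(f1, A12)]
    x1 = [row[0] for row in solve(A11, rhs)]
    # Second block row of (1): Rea2 = A21 x1 + A22 x2 - f2
    rea2 = [
        sum(a * x for a, x in zip(r1, x1)) + sum(a * x for a, x in zip(r2, x2)) - f
        for r1, r2, f in zip(A21, A22, f2)
    ]
    return x1, rea2


# Chain of three nodes with unit springs; node 2 is fixed, unit load at node 0.
A11 = [[1, -1], [-1, 2]]
A12 = [[0], [-1]]
A21 = [[0, -1]]
A22 = [[1]]
x1, rea2 = solve_with_fixed(A11, A12, A21, A22, x2=[0], f1=[1, 0], f2=[0])
```

In this example the reaction at the fixed node balances the applied unit load, which is a quick sanity check on the sign convention in (1).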


General constraints vs. frontal solver on subdomain

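central idea: split matrix $C$ according to types of constraints

- corners as Dirichlet boundary conditions, i.e. fixed variables
- averages enforced by Lagrange multipliers: matrix $C_f$

coarse problem construction on subdomain (index $i$ omitted)

$$\begin{bmatrix} A_{ff} & A_{fc} & C_f^T \\ A_{cf} & A_{cc} & 0 \\ C_f & 0 & 0 \end{bmatrix} \begin{bmatrix} \Psi_f^{c} & \Psi_f^{avg} \\ I & 0 \\ \lambda^{c} & \lambda^{avg} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & I \end{bmatrix}.$$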


Algorithm of preconditioner setup

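1. Forward step of frontal solver with corners marked as fixed variables in matrix $A$.

2. Find $A_{ff}^{-1} C_f^T$ by backward solve by frontal solver

$$\begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix} \begin{bmatrix} A_{ff}^{-1} C_f^T \\ 0 \end{bmatrix} = \begin{bmatrix} C_f^T \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \left( C_f A_{ff}^{-1} A_{fc} \right)^T \end{bmatrix}.$$

3. Construct $C_f A_{ff}^{-1} C_f^T$ and factorize it by LAPACK.

4. Backward solve of dual problem by LAPACK for $\lambda$ from

$$C_f A_{ff}^{-1} C_f^T \lambda = -\begin{bmatrix} C_f A_{ff}^{-1} A_{fc} & I \end{bmatrix}.$$

5. Backward solve for $\Psi_f$ by frontal solver

$$\begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix} \begin{bmatrix} \Psi_f^{c} & \Psi_f^{avg} \\ I & 0 \end{bmatrix} = \begin{bmatrix} -C_f^T \lambda \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \mathrm{Rea} \end{bmatrix}.$$

6. Compute local

$$A_C = \Psi^T A \Psi = \Psi^T \begin{bmatrix} -C_f^T \lambda \\ \mathrm{Rea} \end{bmatrix}.$$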
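The right-hand side of step 4 and the shortcut in step 6 both follow from the block rows used in the steps above. The short derivation below assumes that $A_{ff}$ is nonsingular and uses the shorthand $\Psi_f = [\Psi_f^{c} \;\; \Psi_f^{avg}]$, $\Psi_c = [I \;\; 0]$, $\lambda = [\lambda^{c} \;\; \lambda^{avg}]$, with the constraint row $C_f \Psi_f = [0 \;\; I]$.

```latex
\begin{align*}
A_{ff}\Psi_f + A_{fc}\begin{bmatrix} I & 0 \end{bmatrix} + C_f^T \lambda = 0
  \quad &\Rightarrow \quad
  \Psi_f = -A_{ff}^{-1}\left( A_{fc}\begin{bmatrix} I & 0 \end{bmatrix} + C_f^T \lambda \right), \\
C_f \Psi_f = \begin{bmatrix} 0 & I \end{bmatrix}
  \quad &\Rightarrow \quad
  C_f A_{ff}^{-1} C_f^T \lambda = -\begin{bmatrix} C_f A_{ff}^{-1} A_{fc} & I \end{bmatrix}, \\
A \Psi = \begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix}
         \begin{bmatrix} \Psi_f \\ \Psi_c \end{bmatrix}
       = \begin{bmatrix} -C_f^T \lambda \\ \mathrm{Rea} \end{bmatrix}
  \quad &\Rightarrow \quad
  A_C = \Psi^T A \Psi = \Psi^T \begin{bmatrix} -C_f^T \lambda \\ \mathrm{Rea} \end{bmatrix}.
\end{align*}
```

The last line explains why no extra multiplication by $A$ is needed in step 6: the product $A\Psi$ is already available as the right-hand side and the reactions returned by the frontal solver in step 5.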


General constraints vs. frontal solver

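subdomain problem solution

$$\begin{bmatrix} A_{ff} & A_{fc} & C_f^T \\ A_{cf} & A_{cc} & 0 \\ C_f & 0 & 0 \end{bmatrix} \begin{bmatrix} u_f \\ 0 \\ \mu \end{bmatrix} = \begin{bmatrix} r \\ 0 \\ 0 \end{bmatrix}.$$

Now there is only a single right-hand side.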


Algorithm of preconditioning action on subdomain

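1. Backward step of frontal solver for $A_{ff}^{-1} r$

$$\begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix} \begin{bmatrix} A_{ff}^{-1} r \\ 0 \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \mathrm{Rea} \end{bmatrix}.$$

2. Backward step of LAPACK for $\mu$

$$C_f A_{ff}^{-1} C_f^T \mu = C_f A_{ff}^{-1} r.$$

3. Backward step of frontal solver for $u_f$

$$\begin{bmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{bmatrix} \begin{bmatrix} u_f \\ 0 \end{bmatrix} = \begin{bmatrix} -C_f^T \mu + r \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \mathrm{Rea} \end{bmatrix}.$$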
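The three steps can be followed on a small example. The Python sketch below works only with the free block $A_{ff}$ (corner values are zero) and uses exact dense solves in place of the frontal solver and LAPACK; the function names and the example data are assumptions for illustration. Unlike the setup described above, it recomputes $A_{ff}^{-1} C_f^T$ on every call instead of reusing stored factors.

```python
from fractions import Fraction


def solve(A, B):
    """Solve A X = B exactly by Gauss-Jordan elimination (A: n x n, B: n x m)."""
    n = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def apply_local(A_ff, C_f, r):
    """Return (u_f, mu) with A_ff u_f + C_f^T mu = r and C_f u_f = 0."""
    Ct = [list(col) for col in zip(*C_f)]                  # C_f^T
    y = [row[0] for row in solve(A_ff, [[v] for v in r])]  # step 1: A_ff^{-1} r
    X = solve(A_ff, Ct)                                    # A_ff^{-1} C_f^T (stored in a real setup)
    S = [matvec(C_f, col) for col in zip(*X)]              # S^T; S is symmetric
    S = [list(row) for row in zip(*S)]                     # C_f A_ff^{-1} C_f^T
    mu = [row[0] for row in solve(S, [[v] for v in matvec(C_f, y)])]  # step 2
    rhs = [ri - ci for ri, ci in zip(r, matvec(Ct, mu))]   # r - C_f^T mu
    u_f = [row[0] for row in solve(A_ff, [[v] for v in rhs])]         # step 3
    return u_f, mu


# Free block of a spring chain whose end nodes (corners) are fixed.
A_ff = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
# One constraint: the sum (a multiple of the average) of the free values is zero.
C_f = [[1, 1, 1]]
u_f, mu = apply_local(A_ff, C_f, r=[1, 0, 0])
```

The returned `u_f` satisfies the constraint `C_f u_f = 0` exactly, which is the third block row of the subdomain problem.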


Implementation

- subdomain problems: frontal solver + LAPACK
- coarse problem: MUltifrontal Massively Parallel sparse direct Solver (MUMPS) http://mumps.enseeiht.fr
- mainly Fortran 77 programming language, partly Fortran 90, MPI library
- tested on
  - SGI Altix 4700, CTU, Prague, CR: 72 Intel Itanium 2 processors, OS Linux


Hip joint replacement

- 33 186 quadratic elements, 544 734 unknowns
- 16 subdomains, 35 corners, 12 edges, and 35 faces
- 32 subdomains, 57 corners, 12 edges, and 66 faces
- 16 processors of SGI Altix 4700

[Figure: Decomposition into 32 subdomains]


Hip joint replacement

[Figure: von Mises stresses in improved design]


Hip joint replacement

[Figure: Condition number for adding corners. Condition number estimate (axis 0 to 100) versus number of corners (axis 0 to 4000); SGI Altix 4700, 16 processors; curves for nsub = 16 and nsub = 32]


Hip joint replacement

[Figure: Wall clock time for adding corners. Wall time in seconds (axis 0 to 400) versus number of corners (axis 0 to 4000), variable coarse problem; SGI Altix 4700, 16 processors; curves for nsub = 16 and nsub = 32]


Hip joint replacement, 16 subdomains

coarse problem         C.     C.+E.   C.+F.   C.+E.+F.
iterations             35     34      26      26
cond. number est.      96     96      65      65
factorization (sec)    91     80      78      106
pcg iter (sec)         53     49      38      37
total (sec)            183    166     153     181

adding averages to 335 corners

coarse degrees of freedom:

- C.: Corners only
- C.+E.: Corners and averages on Edges
- C.+F.: Corners and averages on Faces
- C.+E.+F.: Corners and averages on Edges and Faces


Hip joint replacement, 32 subdomains

coarse problem         C.     C.+E.   C.+F.   C.+E.+F.
iterations             35     32      30      27
cond. number est.      149    70      59      46
factorization (sec)    60     57      59      62
pcg iter (sec)         49     40      37      34
total (sec)            128    115     113     113

adding averages to 557 corners

coarse degrees of freedom:

- C.: Corners only
- C.+E.: Corners and averages on Edges
- C.+F.: Corners and averages on Faces
- C.+E.+F.: Corners and averages on Edges and Faces


Conclusion

- distinguish between point constraints and averages for the frontal solver
- many matrices needed in BDDC are a simple by-product of the frontal solver (reactions)
- a 'minimal' number of corners does not assure minimal solution time
- constraints on edges and/or faces can considerably shorten the solution time
- a more sophisticated (adaptive) way of selecting constraints is ongoing research