ISSN 1433-9307 Fakultät für Mathematik und Informatik PREPRINT 2019-02 S. Köhler, O. Rheinbach, S. Sandfeld and D. Steinberger FETI-DP Solvers and Deal.II for Problems in Dislocation Mechanics



TU Bergakademie Freiberg

Fakultät für Mathematik und Informatik

Prüferstraße 9

09599 FREIBERG

http://tu-freiberg.de/fakult1

Publisher: Dean of the Fakultät für Mathematik und Informatik

Production: Medienzentrum der TU Bergakademie Freiberg

FETI-DP SOLVERS AND DEAL.II FOR PROBLEMS IN DISLOCATION MECHANICS

STEPHAN KÖHLER∗, OLIVER RHEINBACH∗, STEFAN SANDFELD†, AND DOMINIK STEINBERGER†

Abstract. FETI-DP solvers and the deal.II adaptive finite element library are combined to solve dislocation eigenstrain problems in micromechanics. Computational results using adaptive finite elements with millions of unknowns and up to 3072 cores of the Taurus supercomputer at ZIH in Dresden are presented.

Key words. FETI-DP solvers, deal.II, adaptive finite elements, eigenstrain problems

1. Mechanical Model. To compute the stresses associated with dislocations within a specimen for the characterization of the microstructure [10, 11], we start by considering a linear elastic model described by

\[
\operatorname{div} \sigma = 0, \quad \sigma = \sigma^T, \quad \sigma = C : \varepsilon_{\mathrm{el}}, \quad \text{and} \quad \varepsilon_{\mathrm{el}} = \frac{1}{2}\left(\nabla u + (\nabla u)^T\right)
\]

to solve for the displacements $u$. Here, $\sigma$ is the stress tensor, $\varepsilon_{\mathrm{el}}$ the elastic strain tensor, and $C$ the stiffness tensor. Dislocations are one-dimensional defects present in crystalline materials. They are the boundary of an area over which two crystal halves have been displaced by the so-called Burgers vector $\vec{b}$.
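The kinematic and constitutive relations above can be illustrated numerically. The following is a minimal sketch, assuming an isotropic material so that $\sigma = C : \varepsilon_{\mathrm{el}}$ reduces to $\sigma = \lambda_L \operatorname{tr}(\varepsilon_{\mathrm{el}}) I + 2\mu\,\varepsilon_{\mathrm{el}}$. The isotropy assumption, the Lamé constants, the displacement gradient, and the function names are hypothetical choices for illustration and are not taken from the paper.

```python
def sym_grad(grad_u):
    """Symmetric part 0.5 * (G + G^T) of a 3x3 displacement gradient G."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)]
            for i in range(3)]


def isotropic_stress(eps, lam, mu):
    """Isotropic special case of sigma = C : eps (an assumption of this sketch)."""
    trace = eps[0][0] + eps[1][1] + eps[2][2]
    return [[lam * trace * (1.0 if i == j else 0.0) + 2.0 * mu * eps[i][j]
             for j in range(3)]
            for i in range(3)]


# Hypothetical displacement gradient: simple shear of magnitude 1e-3.
grad_u = [[0.0, 1e-3, 0.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]

eps_el = sym_grad(grad_u)
# Placeholder Lame constants in arbitrary units, not material data from the paper.
sigma = isotropic_stress(eps_el, lam=1.0, mu=0.5)
```

By construction, both `eps_el` and `sigma` are symmetric, in agreement with $\sigma = \sigma^T$.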

Within the linear elastic model, dislocations may be modeled by an eigenstrain approach [5, 6] by expressing the total strain as

\[
\varepsilon_{\mathrm{tot}} = \varepsilon_{\mathrm{el}} + \varepsilon_{\mathrm{eig}},
\]

where $\varepsilon_{\mathrm{eig}}$ is the eigenstrain contribution due to the dislocation microstructure. The area enclosed by a dislocation is described by a vector $\vec{A}$ that is perpendicular to it. The eigenstrain contributions

\[
d\varepsilon_{\mathrm{eig}} = \frac{1}{2}\left(\vec{b} \otimes d\vec{A} + d\vec{A} \otimes \vec{b}\right),
\]

where $\otimes$ denotes the outer product, are regularized using the non-singular formulation proposed by Cai et al. [4], similarly to the work by Jamond et al. [8]. The eigenstrain of a dislocation is a contribution to the right-hand side of a linear elasticity problem.
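The symmetrized outer product and the strain decomposition can be sketched in a few lines. This is only an illustration of the unregularized formula: the non-singular regularization of Cai et al. [4] is not included, and the vectors, the total strain, and the function names are hypothetical values chosen to make the arithmetic easy to follow.

```python
def outer(a, b):
    """Outer product a (x) b of two 3-vectors as a 3x3 nested list."""
    return [[a[i] * b[j] for j in range(3)] for i in range(3)]


def eigenstrain_increment(b, dA):
    """d_eps_eig = 0.5 * (b (x) dA + dA (x) b), without any regularization."""
    b_dA = outer(b, dA)
    dA_b = outer(dA, b)
    return [[0.5 * (b_dA[i][j] + dA_b[i][j]) for j in range(3)]
            for i in range(3)]


def elastic_strain(eps_tot, eps_eig):
    """eps_el = eps_tot - eps_eig, from eps_tot = eps_el + eps_eig."""
    return [[eps_tot[i][j] - eps_eig[i][j] for j in range(3)]
            for i in range(3)]


# Hypothetical, unit-free values: Burgers vector along x, area normal along z.
b = [1.0, 0.0, 0.0]
dA = [0.0, 0.0, 2.0]
d_eps = eigenstrain_increment(b, dA)

# Hypothetical total strain with the same shear component plus a small stretch.
eps_tot = [[0.1, 0.0, 1.0],
           [0.0, 0.0, 0.0],
           [1.0, 0.0, 0.0]]
eps_el = elastic_strain(eps_tot, d_eps)
```

Here the Burgers vector lies in the plane of the enclosed area, so the trace of `d_eps` (equal to $\vec{b} \cdot d\vec{A}$) vanishes and only the shear components `d_eps[0][2]` and `d_eps[2][0]` are nonzero.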

∗Technische Universität Bergakademie Freiberg, Fakultät für Mathematik und Informatik, Institut für Numerische Mathematik und Optimierung, 09599 Freiberg, Germany; stephan.koehler, [email protected]

†Technische Universität Bergakademie Freiberg, Fakultät für Maschinenbau, Verfahrens- und Energietechnik, 09599 Freiberg, Germany; stefan.sandfeld, [email protected]

Fig. 1. 28 dislocations in the cube $[-5 \cdot 10^{-7}, 5 \cdot 10^{-7}]^3$ with 96 MPI ranks after the second adaptive refinement.

2. Implementation of FETI-DP in Deal.II and Parallel Results. The Finite Element Tearing and Interconnecting dual-primal (FETI-DP) method [7] is a nonoverlapping domain decomposition method for the solution of partial differential equations. We decompose our computational domain $\Omega$ into nonoverlapping subdomains $\Omega_1, \dots, \Omega_N$ and define the interface $\Gamma$ between the subdomains as $\Gamma := \left(\bigcup_{i=1}^{N} \partial\Omega_i\right) \setminus \partial\Omega_D$, where $\partial\Omega_D$ denotes the Dirichlet boundary of $\Omega$. The degrees of freedom (dofs) of the interface are partitioned into dual variables ($\Delta$) and primal variables ($\Pi$). All remaining dofs are denoted as inner dofs ($I$). The global problem is obtained by assembling the local subdomain problems $(K^{(i)}, f^{(i)})$ only in the primal dofs. To enforce continuity of the dual dofs of the solution, a constraint $Bu = 0$ and Lagrange multipliers $\lambda$ are used. This leads to the global saddle point problem

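\[
\begin{bmatrix} K & B^T \\ B & O \end{bmatrix}
\begin{bmatrix} u \\ \lambda \end{bmatrix}
=
\begin{bmatrix} f \\ 0 \end{bmatrix},
\tag{2.1}
\]

where

\[
K =
\begin{bmatrix}
K_{BB}^{(1)} & & & K_{\Pi B}^{(1)T} \\
 & \ddots & & \vdots \\
 & & K_{BB}^{(N)} & K_{\Pi B}^{(N)T} \\
K_{\Pi B}^{(1)} & \cdots & K_{\Pi B}^{(N)} & K_{\Pi\Pi}
\end{bmatrix},
\qquad
f =
\begin{bmatrix}
f_{B}^{(1)} \\ \vdots \\ f_{B}^{(N)}
\end{bmatrix}
\tag{2.2}
\]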

and the subscript $B$ denotes the union of $\Delta$ and $I$. The assembly in the primal dofs ensures the invertibility of $K$. The system (2.1) can be solved by eliminating the dofs $u$ and solving the remaining system $F\lambda = d$ with some Krylov space method and a preconditioner; for more details, see, e.g., [12, 9]. We interface our parallel FETI-DP implementation [9], which is based on PETSc [2], with deal.II.
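The elimination step can be written out explicitly. The following short derivation is a sketch that uses only the blocks of (2.1) and the invertibility of $K$ stated above. The resulting expressions for $F$ and $d$ are the standard ones for this kind of block elimination; the text does not spell them out, so they should be read as an illustration and not as the authors' exact notation.

```latex
\begin{aligned}
K u + B^T \lambda = f
  \quad &\Longrightarrow \quad u = K^{-1}\left(f - B^T \lambda\right), \\
B u = 0
  \quad &\Longrightarrow \quad B K^{-1} B^T \lambda = B K^{-1} f, \\
\text{hence} \quad F &= B K^{-1} B^T, \qquad d = B K^{-1} f .
\end{aligned}
```

Once $\lambda$ is known, the displacements follow from $u = K^{-1}(f - B^T\lambda)$. Each application of $F$ within the Krylov iteration requires one solve with $K$, which is where the subdomain and coarse factorizations enter.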

Deal.II [1] is a C++ finite element library making use of adaptive refinement and hanging node constraints. The parallelization in deal.II is based on MPI, and, for parallel linear algebra, PETSc or Trilinos can be interfaced. For the FETI-DP method, all subdomain matrices $K^{(i)}$ and load vectors $f^{(i)}$ must be held unassembled.

Fig. 2. Strong scalability from 96 to 768 subdomains and MPI ranks. The parallel efficiencies for Solve+Ass., using 96 cores as a baseline, are 103%, 84%, and 78%. The initial grid has $50^3$ cells and was adaptively refined twice; Q1 finite elements. The timings are for the assembly and solve phases after the second adaptive refinement.

Table 1. 3 072 MPI ranks; Q1 finite elements; 28 dislocations; initial grid has $40^3$ cells.

#Cores=#Subd.  Ref. Cyc.       Cells        Dofs  Coarse Dofs  Coarse Fact.  It.  Time ass.  Time solve
3 072          0              64 000     206 763       17 646         4.83s  276      0.93s       21.7s
3 072          1             198 001     685 989       23 190         9.44s  438      2.86s       46.5s
3 072          2             618 470   2 160 846       26 055         9.06s  534      9.95s       57.6s
3 072          3           1 941 288   6 742 342       27 228         7.89s  624      39.7s       68.5s
3 072          4           6 124 971  21 009 075       27 894         6.68s  694       152s       84.8s
3 072          5          19 358 702  65 600 544       28 341         5.66s  688       520s        145s

For a parallel distributed triangulation, each grid cell belongs to exactly one MPI rank. Each rank has information about its locally owned cells and one ghost layer of neighboring cells. In deal.II, only a global numbering of the dofs is used. Each dof also belongs to exactly one rank, given by the locally owned dofs. Additionally, each rank has the index sets of locally active dofs (dofs of locally owned cells) and locally relevant dofs (locally active dofs plus the dofs of ghost cells). Deal.II uses the class ConstraintMatrix for the handling of the hanging node constraints and the Dirichlet boundary values. To set up the local sparsity patterns and local constraint matrices, a global instance is built and the entries are copied according to a local numbering of the locally relevant dofs. This local numbering of the locally relevant dofs is also used for the computation of the interface dofs. The interface dofs are defined as all non-hanging node dofs which are shared between a locally owned cell and a ghost cell. For the hanging node dofs which are shared between a locally owned cell and a ghost cell, all dofs by which the hanging node dofs are constrained are selected. The primal dofs can be selected from the interface dofs. Also, faces, edges, and vertices can be computed [9]. Note that the primal constraints have to ensure the invertibility of all matrices $K_{BB}^{(i)}$, $i = 1, \dots, N$. Furthermore, special care is required in the treatment of the Dirichlet boundary conditions and in the case of non-connected subdomains.
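The index sets described above can be made concrete on a toy problem. The sketch below is plain Python and does not use deal.II. It assumes a uniform one-dimensional mesh with Q1 elements (cell `c` carries the dofs `c` and `c + 1`), a contiguous partition of the cells, no hanging nodes, and the convention that a shared dof is owned by the lowest rank touching it. All of these choices, and the function names, are hypothetical simplifications for illustration.

```python
def cell_dofs(c):
    """Global dof indices of cell c on a uniform 1D mesh with Q1 elements."""
    return (c, c + 1)


def index_sets(n_cells, n_ranks, rank):
    """Compute the dof index sets of one rank for a contiguous cell partition."""
    cell_owner = [c * n_ranks // n_cells for c in range(n_cells)]
    owned_cells = [c for c in range(n_cells) if cell_owner[c] == rank]

    # Locally active dofs: all dofs of locally owned cells.
    active = {d for c in owned_cells for d in cell_dofs(c)}

    # One ghost layer: foreign cells sharing at least one dof with an owned cell.
    ghost_cells = [c for c in range(n_cells)
                   if cell_owner[c] != rank and active & set(cell_dofs(c))]
    ghost_dofs = {d for c in ghost_cells for d in cell_dofs(c)}

    # Locally relevant dofs: locally active dofs plus the dofs of ghost cells.
    relevant = active | ghost_dofs

    # Interface dofs: shared between a locally owned cell and a ghost cell.
    interface = active & ghost_dofs

    # Assumed ownership rule: the lowest rank touching a dof owns it.
    dof_owner = {}
    for c in range(n_cells):
        for d in cell_dofs(c):
            dof_owner[d] = min(dof_owner.get(d, cell_owner[c]), cell_owner[c])
    owned_dofs = {d for d in active if dof_owner[d] == rank}

    # Local numbering of the locally relevant dofs (global index -> local index).
    local_index = {g: l for l, g in enumerate(sorted(relevant))}

    return {
        "owned_dofs": sorted(owned_dofs),
        "active": sorted(active),
        "relevant": sorted(relevant),
        "interface": sorted(interface),
        "local_index": local_index,
    }


sets_rank1 = index_sets(n_cells=6, n_ranks=3, rank=1)
```

For six cells on three ranks, rank 1 owns cells 2 and 3, sees cells 1 and 4 as ghosts, and its interface consists of the two dofs it shares with those ghost cells. With hanging nodes, the interface set would additionally have to include the dofs that constrain any shared hanging node dof, as described in the text.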

For our numerical experiments in Figure 2 and Table 1, a problem with 28 dislocations in the cube $[-5 \cdot 10^{-7}, 5 \cdot 10^{-7}]^3$ was considered. The grid refinement was based on the KellyErrorEstimator in deal.II. The partitioning is performed by p4est [3]. We use a relatively large number of nodal coarse dofs to ensure the invertibility of $K$ also for very complex grid decompositions. The solve time is therefore, up to refinement level 4, dominated by the coarse factorization and forward-backward substitution. A better approach would be to use edge or face averages, which would also lead to fewer coarse dofs and better conditioning; see, e.g., [9]. The stopping criterion was a relative residual reduction to $10^{-9}$.
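The stopping criterion can be illustrated with a small Krylov iteration. The following sketch is an unpreconditioned conjugate gradient method in plain Python that stops once the residual norm has been reduced by a factor of $10^{-9}$ relative to the initial residual. The 2x2 symmetric positive definite matrix is a hypothetical stand-in for $F$. The actual solver uses a preconditioner and PETSc, neither of which is reproduced here.

```python
def matvec(A, x):
    """Dense matrix-vector product for nested lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def cg(A, b, rel_tol=1e-9, max_it=1000):
    """Conjugate gradients with stopping test ||r_k|| <= rel_tol * ||r_0||."""
    x = [0.0] * len(b)
    r = list(b)            # residual for the zero initial guess
    p = list(r)
    rr = dot(r, r)
    r0_norm = rr ** 0.5
    if r0_norm == 0.0:
        return x, 0
    for it in range(1, max_it + 1):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if rr_new ** 0.5 <= rel_tol * r0_norm:
            return x, it
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_it


# Hypothetical SPD system standing in for F * lambda = d.
F = [[4.0, 1.0],
     [1.0, 3.0]]
d = [1.0, 2.0]
lam, iterations = cg(F, d, rel_tol=1e-9)
```

A relative criterion of this form makes the tolerance independent of the scaling of the right-hand side, which is convenient when the problem size changes between refinement cycles.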

Acknowledgments. The computations were performed on a Bull Cluster at the Center for Information Services and High Performance Computing (ZIH) at TU Dresden. We would like to thank Guido Kanschat and Daniel Arndt for their support with deal.II. The third and the last author acknowledge funding from the European Research Council Starting Grant "A Multiscale Dislocation Language for Data-Driven Materials Science," ERC Grant Agreement No. 759419 MuDiLingo.

REFERENCES

[1] G. Alzetta, D. Arndt, W. Bangerth, V. Boddu, B. Brands, D. Davydov, R. Gassmoeller, T. Heister, L. Heltai, K. Kormann, M. Kronbichler, M. Maier, J.-P. Pelteret, B. Turcksin, and D. Wells, The deal.II library, version 9.0, Journal of Numerical Mathematics, 26 (2018), pp. 173–183, https://doi.org/10.1515/jnma-2018-0054.

[2] S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith, in Modern Software Tools in Scientific Computing, E. Arge, A. M. Bruaset, and H. P. Langtangen, eds., Birkhäuser Press, 1997, pp. 163–202.

[3] C. Burstedde, L. C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees, SIAM Journal on Scientific Computing, 33 (2011), pp. 1103–1133, https://doi.org/10.1137/100791634.

[4] W. Cai, A. Arsenlis, C. R. Weinberger, and V. V. Bulatov, A non-singular continuum theory of dislocations, Journal of the Mechanics and Physics of Solids, 54 (2006), pp. 561–587, https://doi.org/10.1016/j.jmps.2005.09.005, http://www.sciencedirect.com/science/article/pii/S002250960500195X.

[5] J. D. Eshelby and R. E. Peierls, The determination of the elastic field of an ellipsoidal inclusion, and related problems, Proc. of the Royal Society of London. Series A. Math. and Phys. Sciences, 241 (1957), pp. 376–396, https://doi.org/10.1098/rspa.1957.0133.

[6] J. D. Eshelby and R. E. Peierls, The elastic field outside an ellipsoidal inclusion, Proc. of the Royal Society of London. Series A. Math. and Phys. Sciences, 252 (1959), pp. 561–569, https://doi.org/10.1098/rspa.1959.0173.

[7] C. Farhat, M. Lesoinne, P. LeTallec, K. Pierson, and D. Rixen, FETI-DP: a dual-primal unified FETI method, part I: A faster alternative to the two-level FETI method, Int. J. Num. Meth. Engng., 50 (2001), pp. 1523–1544.

[8] O. Jamond, R. Gatti, A. Roos, and B. Devincre, Consistent formulation for the discrete-continuous model: Improving complex dislocation dynamics simulations, International Journal of Plasticity, 80 (2016), pp. 19–37, https://doi.org/10.1016/j.ijplas.2015.12.011.

[9] A. Klawonn and O. Rheinbach, A parallel implementation of dual-primal FETI methods for three-dimensional linear elasticity using a transformation of basis, SIAM Journal on Scientific Computing, 28 (2006), pp. 1886–1906.

[10] S. Sandfeld, M. Monavari, and M. Zaiser, From systems of discrete dislocations to a continuous field description: stresses and averaging aspects, Modelling and Simulation in Materials Science and Engineering, 21 (2013), p. 085006.

[11] D. Steinberger, R. Gatti, and S. Sandfeld, A universal approach towards computational characterization of dislocation microstructure, JOM, 68 (2016), pp. 2065–2072.

[12] A. Toselli and O. Widlund, Domain decomposition methods - algorithms and theory, vol. 34, Springer Science & Business Media, 2006.
