My paper for Domain Decomposition Conference in Strobl, Austria, 2005

H-matrix based preconditioner for the skin problem

B.N. Khoromskij, A. Litvinenko

bokh, [email protected]

Max Planck Institute for Mathematics in the Sciences, Leipzig. 18/08/2006

Abstract

In this paper we propose and analyze a new H-Cholesky based preconditioner for the so-called skin problem [5]. After a special reordering of indices and omitting the coupling, we obtain a block diagonal matrix which is very suitable for the hierarchical Cholesky (H-Cholesky) factorization. We perform the H-Cholesky factorization of this matrix and use it as a preconditioner for the cg method. We will show that the new preconditioner requires less memory and computational time than the standard H-Cholesky preconditioner, which is itself cheap and fast.

Key words: skin problem, H-matrix approximation, hierarchical Cholesky, jumping coefficients, domain decomposition.

1 Introduction

In the series of papers [7], [9], [10] the authors successfully apply iterative methods (cg, gmres, bicgstab) with H-matrix based preconditioners to different types of second order elliptic differential problems. In this paper we continue the research in this direction. Under certain conditions H-matrices can even be used as a direct solver. There are results (see, e.g., [11] and references therein) where the authors apply additive Schwarz domain decomposition preconditioners. It is known that for problems with jumping coefficients (see (1)) the condition number cond(A) is proportional to $h^{-d} \sup_{x,y\in\Omega} \alpha(x)/\alpha(y)$, where $\alpha(x)$ denotes the jumping coefficient, $d$ the spatial dimension and $h$ the grid step size. This is why a good preconditioner $W$ is needed so that $\mathrm{cond}(W^{-1}A) \simeq 1$.

In this paper we consider a diffusion process (see (1)) through the domain shown in Fig. 1 (left). This figure shows cells and the lipid layer between them. In this problem the Dirichlet boundary condition means the presence of some drugs on the boundary $\gamma$ of the skin fragment. The right-hand side represents external forces. The zero Neumann condition on $\Gamma\setminus\gamma$ shows that there is no penetration through the surface $\Gamma\setminus\gamma$. Typical for the skin problem are the highly jumping coefficients. The penetration coefficient inside the cells is very low, between $10^{-5}$ and $10^{-3}$, but it is large between the cells. The diffusion equation has the form:

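\[
\begin{cases}
\operatorname{div}(\alpha(x)\nabla u) = f, & x \in \Omega,\\
u = 0, & x \in \gamma,\\
\dfrac{\partial u}{\partial n} = g, & x \in \Gamma\setminus\gamma,
\end{cases}
\tag{1}
\]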

where $\Gamma = \partial\Omega$, $\alpha(x) = \varepsilon \ll 1$ in the cells and $\alpha(x) = \beta = 1$ in between. The rest of this paper is structured as follows. In Section 2 we describe the discretisation, which is done by FEM. We recall the main idea of the H-matrix technique in Section 3. Section 4 is devoted to the new preconditioner and estimates of its complexity. Numerical tests and comparisons of different preconditioners are provided in Section 5. Finally, some remarks conclude the paper.

Figure 1: (left) A skin fragment consists of cells and of the lipid layer. The penetration through the cells is very slow and through the lipid layer very fast. (right) The simplified model of a skin fragment contains 8 cells with the lipid layer between them. $\Omega = [-1, 1]^3$, $\alpha(x) = \varepsilon$ inside the cells and $\alpha(x) = \beta = 1$ in the lipid layer.
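The dependence of cond(A) on the coefficient jump stated in the Introduction can be observed already in one dimension. The following is a minimal sketch under simplifying assumptions, not the 3D FEM setting of this paper: a 1D problem with piecewise linear elements on a uniform mesh, where the middle third of the interval plays the role of a "cell" with coefficient `eps`. The function name `stiffness_1d` and the mesh size are chosen for illustration only.

```python
import numpy as np

def stiffness_1d(n, eps):
    """P1 FEM stiffness matrix for -(alpha u')' = f on (0, 1), u(0) = u(1) = 0.

    alpha = eps on the middle third of the interval, alpha = 1 elsewhere.
    n is the number of interior nodes of a uniform mesh.
    """
    h = 1.0 / (n + 1)
    midpoints = (np.arange(n + 1) + 0.5) * h
    alpha = np.where((midpoints > 1 / 3) & (midpoints < 2 / 3), eps, 1.0)
    A = np.zeros((n, n))
    for e in range(n + 1):
        # Element e joins interior nodes e-1 and e; indices -1 and n are boundary nodes.
        k = alpha[e] / h
        i, j = e - 1, e
        if i >= 0:
            A[i, i] += k
        if j < n:
            A[j, j] += k
        if i >= 0 and j < n:
            A[i, j] -= k
            A[j, i] -= k
    return A

# Condition numbers for decreasing coefficient inside the "cell".
conds = {eps: np.linalg.cond(stiffness_1d(60, eps)) for eps in (1.0, 1e-2, 1e-4)}
for eps, c in conds.items():
    print(f"eps = {eps:g}: cond(A) = {c:.3e}")
```

In this toy setting the condition number grows as `eps` decreases, which is the behaviour that motivates the search for a preconditioner.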

2 Discretisation (FEM)

Let us choose a triangulation $\tau_h$ which is compatible with the lipid layer, i.e., $\tau_h := \tau_h^1 \cup \tau_h^2$, where $\tau_h^1$ is a triangulation of the lipid layer and $\tau_h^2$ a triangulation of the cells. Let $b_j$, $j = 1, \ldots, n$, be piecewise linear basis functions and

\[ V_h \subset H^1(\Omega), \qquad V_h := \operatorname{span}\{b_1, \ldots, b_n\}. \tag{2} \]

Then the variational formulation of the initial problem is

\[ \text{find } u_h \in V_h \text{ such that } a(u_h, v) = c(v) \text{ for all } v \in V_h. \tag{3} \]

Assuming (2), we obtain the equivalent problem

\[ Au = c, \quad \text{where } A_{ij} = a(b_j, b_i) \text{ and } c_i := c(b_i), \quad i, j = 1, \ldots, n. \tag{4} \]

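Here

\[
a(b_j, b_i) = \int_\Omega \alpha\,(\nabla b_j, \nabla b_i)\,dx, \qquad
c(b_j) = \int_\Omega f\,b_j\,dx + \int_{\Gamma\setminus\gamma} g\,b_j\,d\Gamma =: c_j. \tag{5}
\]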

The lipid layer between the cells defines a natural decomposition of $\Omega$. The width of this layer is proportional to the grid step size $h$. Note that after the reordering of indices, we can represent the global stiffness matrix in the following form:

\[
A = \begin{pmatrix} A_{11} & \varepsilon A_{12}\\ \varepsilon A_{21} & \varepsilon A_{22} \end{pmatrix}. \tag{6}
\]

Here $A_{11}$, $A_{22}$ are the stiffness matrices which correspond to the lipid layer and to the rest of the domain, respectively; $A_{12}$, $A_{21}$ are coupling matrices. To simplify the model we will consider $\Omega$ as in Fig. 1 (right).
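The reordering of indices that produces the 2×2 block form can be sketched in a few lines of NumPy. This is only an illustration: the boolean label array `in_lipid` (one entry per degree of freedom), the function name `reorder_by_region` and the small dense toy matrix are assumptions and are not taken from the paper.

```python
import numpy as np

def reorder_by_region(A, in_lipid):
    """Permute A so that lipid-layer dofs come first and cell dofs second.

    Returns the permuted matrix, the permutation, and the number of
    lipid-layer dofs, so that A_perm[:n_lipid, :n_lipid] plays the role of A11.
    """
    in_lipid = np.asarray(in_lipid, dtype=bool)
    # A stable sort keeps the original relative order inside each group.
    perm = np.argsort(~in_lipid, kind="stable")
    n_lipid = int(in_lipid.sum())
    return A[np.ix_(perm, perm)], perm, n_lipid

# Toy example: 5 dofs, dofs 1 and 3 lie in the lipid layer (made-up labels).
A = np.arange(25, dtype=float).reshape(5, 5)
A = A + A.T                                   # symmetric toy matrix
A_perm, perm, n_lipid = reorder_by_region(A, [False, True, False, True, False])

lipid_block = A_perm[:n_lipid, :n_lipid]      # corresponds to A11
cell_block = A_perm[n_lipid:, n_lipid:]       # corresponds to eps * A22
coupling = A_perm[:n_lipid, n_lipid:]         # corresponds to eps * A12
```

For a sparse stiffness matrix the same permutation would be applied to a sparse format; the dense indexing here is used only to keep the sketch short.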

3 Hierarchical Matrices

The hierarchical matrices (H-matrices) were introduced in 1998 by Hackbusch [2] and since then H-matrices have been applied in a wide range of applications. They provide a format for the data-sparse representation of fully-populated matrices. Suppose there are two matrices $A \in \mathbb{R}^{n\times k}$ and $B \in \mathbb{R}^{m\times k}$, $k \ll \min(n, m)$, so that $AB^T = R \in \mathbb{R}^{n\times m}$. We then say that $R$ is a rank-$k$ matrix. The main idea of H-matrices is to approximate certain subblocks of a given matrix by rank-$k$ matrices. The admissible partitioning indicates which blocks can be approximated by rank-$k$ matrices. The storage requirement for the matrices $A$ and $B$ is $k(n + m)$ instead of $n \cdot m$ for the matrix $R$. One of the biggest advantages of H-matrices is that the complexity of the H-matrix addition, multiplication and inversion is not bigger than $C k n \log^q n$, $q = 1, 2$ (see [2], [13]). The drawback is that the constant $C$ is large. For example, in the 3D case it can be bigger than 120.

To build an H-matrix one needs an admissible block partitioning (see Fig. 2). To build this partitioning one needs an admissibility condition and a block cluster tree. To build the block cluster tree a cluster tree is necessary. The cluster tree requires grid data. For more details see [2] or [13].

[Diagram boxes: vertices, finite elements; cluster tree; block cluster tree; admissibility condition; admissible partitioning; H-matrix; H-Cholesky factorization]

Figure 2: The schema of building an H-matrix and its H-Cholesky factorisation.

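Definition 3.1 We define the set of H-matrices with the maximal rank $k$ as follows:

\[
\mathcal{H}(T_{I\times J}, k) := \{ M \in \mathbb{R}^{I\times J} \mid \operatorname{rank}(M|_{t\times s}) \le k \text{ for all admissible leaves } t\times s \text{ of } T_{I\times J} \}.
\]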

Algorithm of the H-Cholesky factorization

Our aim is to compute the H-Cholesky factorization of the stiffness matrix which appears after discretisation of the Laplace operator. Suppose that

\[
A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}
  = \begin{bmatrix} L_{11} & 0\\ L_{21} & L_{22} \end{bmatrix}
    \begin{bmatrix} U_{11} & U_{12}\\ 0 & U_{22} \end{bmatrix},
\]

then the algorithm is as follows:

1. compute L11 and U11 as H-Cholesky decomposition of A11.

2. compute U12 from L11U12 = A12 (use a recursive block forward substitution).

3. compute L21 from L21U11 = A21 (use a recursive block backward substitution).

4. compute L22 and U22 as H-Cholesky decomposition of L22U22 = A22 ⊖ L21 ⊙ U12.

All the steps are executed in the class of H-matrices.
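The four steps above can be sketched with dense matrices as follows. This is only an illustration under stated assumptions: exact dense NumPy arithmetic replaces the truncated H-matrix operations ⊖ and ⊙, the upper factor is taken as $U = L^T$ (Cholesky case), and the function name `block_cholesky` and the parameter `leaf_size` are chosen for this sketch.

```python
import numpy as np

def block_cholesky(A, leaf_size=4):
    """Recursive 2x2 block Cholesky factorization A = L @ L.T (dense sketch)."""
    n = A.shape[0]
    if n <= leaf_size:
        return np.linalg.cholesky(A)          # small block: standard Cholesky
    m = n // 2
    A11, A12 = A[:m, :m], A[:m, m:]
    A21, A22 = A[m:, :m], A[m:, m:]

    L11 = block_cholesky(A11, leaf_size)      # step 1: A11 = L11 U11, U11 = L11^T
    U12 = np.linalg.solve(L11, A12)           # step 2: L11 U12 = A12
    L21 = np.linalg.solve(L11, A21.T).T       # step 3: L21 U11 = A21
    S = A22 - L21 @ U12                       # Schur complement A22 - L21 U12
    L22 = block_cholesky(S, leaf_size)        # step 4: L22 U22 = S

    L = np.zeros_like(A)
    L[:m, :m] = L11
    L[m:, :m] = L21
    L[m:, m:] = L22
    return L

# Check on a random symmetric positive definite test matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((16, 16))
A = M @ M.T + 16.0 * np.eye(16)
L = block_cholesky(A)
print(np.allclose(L @ L.T, A))
```

In the H-matrix setting each product and difference would additionally be truncated to rank k, so the resulting factor is only an approximation of the exact Cholesky factor.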

4 New Preconditioner

The H-Cholesky factorization of the stiffness matrix produces an H-matrix as shown in Fig. 3 (left). After reordering the index set $I(\Omega)$ and omitting the coupling between the cells and the lipid layer we obtain an H-matrix as shown in Fig. 3 (right). As a new preconditioner we use the H-Cholesky decomposition of

\[
\begin{pmatrix} A_{11} & 0\\ 0 & \varepsilon A_{22} \end{pmatrix}. \tag{7}
\]

Remark 4.1 Note that $W^{-1}A := (LL^T)^{-1}A = L^{-T}L^{-1}A$ is similar to the matrix $L^{-1}AL^{-T}$, which is symmetric and positive definite. Thus, for solving the initial problem (4) we apply the pcg method with the H-Cholesky preconditioner.
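A minimal sketch of the pcg iteration with a block diagonal Cholesky preconditioner is given below. It rests on assumptions: dense exact Cholesky factors stand in for the H-Cholesky factors, the test matrix is a random symmetric positive definite matrix rather than a stiffness matrix, and the names `pcg` and `block_diagonal_preconditioner` are chosen for illustration.

```python
import numpy as np

def pcg(A, b, apply_Winv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients; returns the solution and the iteration count."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Winv(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = A @ p
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_Winv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

def block_diagonal_preconditioner(A, n1):
    """Apply W^{-1} for W = diag(A[:n1,:n1], A[n1:,n1:]) using two Cholesky factors."""
    L1 = np.linalg.cholesky(A[:n1, :n1])
    L2 = np.linalg.cholesky(A[n1:, n1:])
    def apply_Winv(r):
        z = np.empty_like(r)
        z[:n1] = np.linalg.solve(L1.T, np.linalg.solve(L1, r[:n1]))
        z[n1:] = np.linalg.solve(L2.T, np.linalg.solve(L2, r[n1:]))
        return z
    return apply_Winv

# Random SPD test matrix whose second diagonal block is scaled by eps.
rng = np.random.default_rng(0)
n1, n2, eps = 8, 12, 1e-3
n = n1 + n2
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)
d = np.concatenate([np.ones(n1), np.sqrt(eps) * np.ones(n2)])
A = d[:, None] * S * d[None, :]
b = rng.standard_normal(n)

x, iterations = pcg(A, b, block_diagonal_preconditioner(A, n1))
print(iterations, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

The preconditioner is passed as a function that applies $W^{-1}$ to a vector, so the same `pcg` routine could be used with any other factorization.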

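Below we show that the coupling can be omitted for small $\varepsilon$.

Lemma 4.1 For a symmetric and positive definite matrix $A = \begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}$ and any vector $v = \begin{pmatrix} v_1\\ v_2 \end{pmatrix}$ it holds that

\[ |(A_{12}v_2, v_1)| \le \|A_{11}^{1/2}v_1\| \cdot \|A_{22}^{1/2}v_2\|. \]

Proof: From the Cauchy-Schwarz inequality in the energy inner product $(u, w)_A := u^TAw$ it follows for any vectors $u$, $w$ that

\[ |u^TAw| = |(u, w)_A| \le \|u\|_A \cdot \|w\|_A. \]

Construct two vectors $u = (v_1, 0)^T$ and $w = (0, v_2)^T$; then $u^TAw = (A_{12}v_2, v_1)$, $\|u\|_A = \|A_{11}^{1/2}v_1\|$ and $\|w\|_A = \|A_{22}^{1/2}v_2\|$. It means that

\[ |(A_{12}v_2, v_1)| \le \|u\|_A \cdot \|w\|_A = \|A_{11}^{1/2}v_1\| \cdot \|A_{22}^{1/2}v_2\|. \]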


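Lemma 4.2 For a symmetric and positive definite matrix $A = \begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}$ and any vector $u = \begin{pmatrix} u_1\\ u_2 \end{pmatrix}$ it holds that

\[ 2(A_{12}u_2, u_1) \le (A_{11}u_1, u_1) + (A_{22}u_2, u_2), \]

\[ |(A_{12}u_2, u_1)| \le \frac{1}{2}\left( \|A_{11}^{1/2}u_1\|^2 + \|A_{22}^{1/2}u_2\|^2 \right). \]

Proof: Apply the positive definiteness of $A$ to the vector $\begin{pmatrix} u_1\\ -u_2 \end{pmatrix}$:

\[ 0 \le (A_{11}u_1, u_1) - (A_{12}u_2, u_1) - (A_{21}u_1, u_2) + (A_{22}u_2, u_2). \]

Moving the negative terms to the left, we obtain

\[ (A_{12}u_2, u_1) + (A_{21}u_1, u_2) \le (A_{11}u_1, u_1) + (A_{22}u_2, u_2). \]

Since $A$ is symmetric, $(A_{21}u_1, u_2) = (A_{12}u_2, u_1)$, and hence

\[ 2(A_{12}u_2, u_1) \le (A_{11}u_1, u_1) + (A_{22}u_2, u_2) = (A_{11}^{1/2}u_1, A_{11}^{1/2}u_1) + (A_{22}^{1/2}u_2, A_{22}^{1/2}u_2). \]

Replacing $u_2$ by $-u_2$ gives the same bound for $-2(A_{12}u_2, u_1)$, so that

\[ |(A_{12}u_2, u_1)| \le \frac{1}{2}\left( \|A_{11}^{1/2}u_1\|^2 + \|A_{22}^{1/2}u_2\|^2 \right). \]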

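Lemma 4.3 Let $A$ be the matrix (6), let $u = \begin{pmatrix} u_1\\ u_2 \end{pmatrix}$ be a vector and let $W = \begin{pmatrix} A_{11} & 0\\ 0 & \varepsilon A_{22} \end{pmatrix}$ be a preconditioner. Then

\[ |(Au, u)| \le 2(Wu, u). \tag{8} \]

Proof: Compute both scalar products:

\[
(Wu, u) = \left( \begin{pmatrix} A_{11} & 0\\ 0 & \varepsilon A_{22} \end{pmatrix} \begin{pmatrix} u_1\\ u_2 \end{pmatrix}, \begin{pmatrix} u_1\\ u_2 \end{pmatrix} \right) = (A_{11}u_1, u_1) + \varepsilon(A_{22}u_2, u_2),
\]

\[
(Au, u) = \left( \begin{pmatrix} A_{11} & \varepsilon A_{12}\\ \varepsilon A_{21} & \varepsilon A_{22} \end{pmatrix} \begin{pmatrix} u_1\\ u_2 \end{pmatrix}, \begin{pmatrix} u_1\\ u_2 \end{pmatrix} \right) = (A_{11}u_1, u_1) + 2\varepsilon(A_{12}u_2, u_1) + \varepsilon(A_{22}u_2, u_2) = (Wu, u) + 2\varepsilon(A_{12}u_2, u_1).
\]

Applying Lemma 4.2 to the symmetric and positive definite matrix (6), whose blocks are $A_{11}$, $\varepsilon A_{12}$, $\varepsilon A_{21}$ and $\varepsilon A_{22}$, gives $2\varepsilon(A_{12}u_2, u_1) \le (A_{11}u_1, u_1) + \varepsilon(A_{22}u_2, u_2) = (Wu, u)$. It follows that $(Au, u) \le (Wu, u) + (Wu, u) = 2(Wu, u)$. Since $(Au, u) \ge 0$, this is (8).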
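The upper bound (8) can be checked numerically. The sketch below is an illustration under assumptions: Lemma 4.3 only uses that $A$ is symmetric and positive definite and that $W$ consists of its diagonal blocks, so a random symmetric positive definite matrix with a small second diagonal block is used in place of a stiffness matrix. All sizes and names are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, eps = 6, 10, 1e-3
n = n1 + n2

# Random SPD matrix; the scaling makes the second diagonal block of order eps.
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)
d = np.concatenate([np.ones(n1), np.sqrt(eps) * np.ones(n2)])
A = d[:, None] * S * d[None, :]

# W keeps the diagonal blocks of A and drops the coupling blocks.
W = np.zeros_like(A)
W[:n1, :n1] = A[:n1, :n1]
W[n1:, n1:] = A[n1:, n1:]

# Eigenvalues of W^{-1} A, computed from the symmetric matrix L^{-1} A L^{-T}, W = L L^T.
L = np.linalg.cholesky(W)
X = np.linalg.solve(L, A)                 # L^{-1} A
sym = np.linalg.solve(L, X.T).T           # L^{-1} A L^{-T}
lam = np.linalg.eigvalsh((sym + sym.T) / 2)

# Direct check of (Au, u) <= 2 (Wu, u) for random vectors u.
U = rng.standard_normal((n, 100))
quad_A = np.einsum("ij,ij->j", U, A @ U)
quad_W = np.einsum("ij,ij->j", U, W @ U)
bound_holds = bool(np.all(quad_A <= 2.0 * quad_W))
print(lam.max(), bound_holds)
```

The largest eigenvalue of $W^{-1}A$ stays below 2 in this experiment, in agreement with (8).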

Remark 4.2 Recall that $A$ and $W$ are spectrally equivalent if $c_1 (Wu, u) \le (Au, u) \le c_2 (Wu, u)$ for all $u \in \mathbb{R}^n$, which we write briefly as $c_1 \cdot I \le W^{-1}A \le c_2 \cdot I$.

Lemma 4.4 The matrices $A$ and $W$ are spectrally equivalent with $I \le W^{-1}A \le 2 \cdot I$.

Proof: We write $A \ge B$ if $A - B$ is positive semidefinite. From Lemma 4.3 it follows that $(Au, u) \le 2(Wu, u)$ for all $u \in \mathbb{R}^n$. Moving everything to the left-hand side, we obtain $((A - 2W)u, u) \le 0$. Since this holds for all $u$, we have $A - 2W \le 0$, or $W^{-1}A \le 2 \cdot I$. From the construction of $W$ it is clear that $A - W \ge 0$, i.e., $W^{-1}A \ge I$. Thus, $I \le W^{-1}A \le 2 \cdot I$.

Figure 3: H-Cholesky factorizations of the standard stiffness matrix (left) and the stiffness matrix without coupling between the lipid layer and cells (right). The dark blocks $\in \mathbb{R}^{36\times 36}$ are dense matrices and the light blocks are low-rank matrices. The steps in the grey blocks show the decay of the singular values in the logarithmic scale.

5 Numerical tests

Table 1 gives the theoretical estimates of the sequential and parallel complexities of the H-Cholesky factorization of $W_1$ and $W_2$, where

$W_1$ := H-Cholesky decomposition of $\begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}$, $W_2$ := H-Cholesky decomposition of $\begin{pmatrix} A_{11} & 0\\ 0 & A_{22} \end{pmatrix}$.

Preconditioner   Sequential complexity                                         Parallel complexity
$W_1$            $O(kn\log^2 n)$                                               $O(kn\log^2 n)$
$W_2$            $O(kn_I\log^2 n_I) + O(k(n-n_I)\log^2(n-n_I))$                $\max\{O(kn_I\log^2 n_I),\ O(kn_0\log^2 n_0)\}$

Table 1: Complexities of the preconditioners $W_1$ and $W_2$. $p$ is the number of processors, $n_I$ is the number of degrees of freedom in the lipid layer, $n_0 := \frac{n-n_I}{p-1}$.
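The estimates of Table 1 can be compared numerically once concrete sizes are inserted. The sketch below is a rough illustration only: the hidden constants are dropped, the logarithm base is taken as 2, and the values of `n_lipid`, `k` and `p` are hypothetical and not taken from the experiments of this paper.

```python
import math

def h_cholesky_cost(n, k):
    """Leading term k * n * log^2(n) of the complexity estimate (constants dropped)."""
    return k * n * math.log2(n) ** 2

def sequential_cost_w2(n, n_lipid, k):
    """Two independent diagonal blocks factorized one after the other."""
    return h_cholesky_cost(n_lipid, k) + h_cholesky_cost(n - n_lipid, k)

def parallel_cost_w2(n, n_lipid, k, p):
    """One processor for the lipid-layer block, p - 1 processors share the cell dofs."""
    n0 = (n - n_lipid) / (p - 1)
    return max(h_cholesky_cost(n_lipid, k), h_cholesky_cost(n0, k))

n = 40 ** 3            # total number of dofs
n_lipid = 20000        # hypothetical number of lipid-layer dofs
k, p = 4, 9            # hypothetical rank and number of processors

w1 = h_cholesky_cost(n, k)
w2_seq = sequential_cost_w2(n, n_lipid, k)
w2_par = parallel_cost_w2(n, n_lipid, k, p)
print(f"W1: {w1:.3e}, W2 sequential: {w2_seq:.3e}, W2 parallel: {w2_par:.3e}")
```

Because the constants are ignored, only the ordering of the three numbers is meaningful, not their absolute values.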

Remark 5.1 The sparsity constant $C_{sp}$ is an important H-matrix feature and is present in all H-matrix complexity estimates. This constant depends on the size of the H-matrix. Since the new preconditioner is simpler, the sparsity constant is also smaller. In the numerical experiments for Table 2, $C_{sp}(W_1) = 64$ and $C_{sp}(W_2) = 26$. For a model domain with a larger number of cells the difference between the sparsity constants will be more significant.

Figure 4: Decay of singular values of $A$ for $\varepsilon = 1$, $\varepsilon = 10^{-2}$ and $\varepsilon = 10^{-4}$.

Table 2 shows the resource requirements for the preconditioners $W_1$ and $W_2$. We see that $W_2$ requires less resources than $W_1$: it requires less memory ($S(W_1) > S(W_2)$) and less time ($t(W_1) > t(W_2)$) for the building. Columns 2 and 5 give the time for computing the Cholesky factorisation plus the time for the cg iterations.

k   t(W1), sec    S(W1), MB      iter(W1)   t(W2), sec     S(W2), MB      iter(W2)
1   24 + 10.6     2 · 10^2       69         8.7 + 10       10^2           99
2   70 + 11.3     3.8 · 10^2     46         21.6 + 13.3    1.8 · 10^2     91
4   208 + 12.5    7.5 · 10^2     17         68 + 13.5      3.5 · 10^2     60
6   483.7 + 82    1.1 · 10^3     11         123 + 26       5.1 · 10^2     74

Table 2: Comparison of the preconditioners $W_1$ and $W_2$. $40^3$ dofs, $\|Ax - b\| = 10^{-8}$, $\alpha = 10^{-5}$.

In Table 3 we compare the solutions $u$ and $u_{cg}$ obtained with the preconditioners $W_1$ and $W_2$. The solution obtained with the preconditioner $W_1$ is considered as 'exact'.

k   |u_cg − u| / |u|    |u_cg − u|_∞
1   5.3 · 10^-10        4.5 · 10^-6
2   5.1 · 10^-9         3.5 · 10^-8
4   5.8 · 10^-10        4.6 · 10^-6
6   7.2 · 10^-10        2.5 · 10^-5

Table 3: Comparison of the solutions $u_{cg}$ and $u$. $40^3$ dofs, $\|Ax - b\| = 10^{-8}$, $\alpha = 10^{-5}$.

6 Conclusion

The matrix $W_2$ can be successfully used as a preconditioner. Owing to its simple structure, $W_2$ parallelises well. The parallel computational complexity is $\max\{O(n_I\log^2 n_I),\ O(n_D\log^2 n_D)\}$, where $n_D := \frac{n-n_I}{p-1}$ and $n_I$ is the number of degrees of freedom in the lipid layer. The sequential version of the preconditioner $W_2$ requires less memory. Note that the more cells the domain $\Omega$ contains, the bigger the advantages in storage and computational resources will be (see Table 2). The disadvantage is the relatively large number of pcg iterations, but these iterations require fewer resources than those with the standard H-Cholesky preconditioner $W_1$. Within HLIB (see [1]) it is quite easy to implement the proposed preconditioner.

Acknowledgment: The authors wish to thank Prof. Dr. Hackbusch for his corrections as well as Dr. Borm and Dr. Grasedyck for HLIB.


References

[1] Hierarchical matrix library: www.hlib.org

[2] W. Hackbusch: A sparse matrix arithmetic based on H-matrices. Part 1: Introduction to H-matrices. Computing, 62: 89-108, 1999.

[3] W. Hackbusch: Direct Domain Decomposition using the Hierarchical Matrix Technique, pp. 39-50, Domain Decomposition Methods in Sci. and Engineering. Cocoyoc, Mexico, 2003.

[4] W. Hackbusch, B.N. Khoromskij and R. Kriemann: Direct Schur Complement Method by Hierarchical Matrix Techniques. Computing and Visualisation in Science, 8: 179-188, 2005.

[5] B.N. Khoromskij and G. Wittum: Numerical Solution of Elliptic Differential Equations by Reduction to the Interface. LNCSE 36, Springer, 2004.

[6] M. Bebendorf and W. Hackbusch: Existence of H-Matrix approximants to the inverse FE-matrix of elliptic operators with L∞-coefficients. Numerische Mathematik, 95: 1-28, 2003.

[7] M. Bebendorf: Hierarchical LU decomposition-based preconditioners for BEM. Computing, 74: 225-247, 2005.

[8] S. Le Borne, R. Kriemann, L. Grasedyck: Parallel Black Box Domain Decomposition Based H-LU Preconditioning. Preprint 115, 2005, Max-Planck-Institut MIS, Leipzig.

[9] S. Le Borne, L. Grasedyck: H-matrix preconditioners in convection-dominated problems. SIAM J. Matrix Anal. Appl., Vol. 27, No. 4, pp. 1172-1183.

[10] S. Le Borne: H-matrices for convection-diffusion problems with constant convection. Computing, 70: 261-274, 2003.

[11] I.G. Graham, P. Lechner and R. Scheichl: Domain Decomposition for Multiscale PDEs. Bath Institute for Complex Systems, Preprint 11/06 (2006), available at www.bath.ac.uk/math-sci/BICS

[12] A. Litvinenko: Application of Hierarchical Matrices for Solving Multiscale Problems. PhD Dissertation, Leipzig University, submitted, April 2006.

[13] L. Grasedyck, W. Hackbusch: Construction and Arithmetics of H-Matrices. Computing, 70: 295-334, 2003.

[14] M. Lintner: The eigenvalue problem for the Laplacian in H-matrix arithmetic and application to the heat and wave equation. Computing, 72: 293-323, 2004.
