
    NUMERICAL MATHEMATICS: A Journal of Chinese Universities (English Series)

    Vol. 15, No. 3, pp. 268-277, August 2006

    An Implicitly Restarted Block Arnoldi Method

    in a Vector-Wise Fashion

    Qian Yin and Linzhang Lu

    School of Mathematical Science, Xiamen University, Xiamen 361005, China.

    Received April 19, 2005; Accepted (in revised version) July 31, 2005

    Abstract. In this paper, we develop an implicitly restarted block Arnoldi algorithm in a vector-wise fashion. The vector-wise construction greatly simplifies both the detection of necessary deflation and the deflation itself, which makes it preferable to the block-wise construction. Numerical experiments show that the algorithm is effective.

    Key words: IRAM; Arnoldi-type algorithm; vector-wise block Arnoldi; implicit restart.

    AMS subject classifications: 65F15

    1 Introduction

    Many scientific applications lead to large-scale eigenvalue problems in which typically only a few eigenvalues are of interest. Krylov subspace methods are well suited to such problems.

    In the original Krylov methods, the Krylov subspace, constructed by the Arnoldi factorization, is generated by a single starting vector. Using a set of orthonormal vectors to generate the Krylov subspace instead yields a block Arnoldi method.

    As the iteration proceeds, the storage and computational requirements grow, and the algorithm must be restarted. Earlier approaches used explicit restarting, in which information obtained from the Hessenberg matrix is used to construct an improved starting vector. For the single-vector Arnoldi method, Sorensen devised an approach that instead updates the factorization [7]. This is accomplished by applying an implicitly shifted QR iteration to the Arnoldi factorization, with the shifts chosen among the unwanted Ritz values.

    The implicit restarting technique was generalized to block Arnoldi by Lehoucq and Maschhoff [5]. In [4], which addresses the model reduction problem, Freund introduces an algorithm that implements the block Arnoldi method in a vector-wise fashion, as opposed to the block-wise construction. The vector-wise construction is preferable because it greatly simplifies both the detection of necessary deflation and the deflation itself. Moreover, the initial block vector can be chosen arbitrarily, whereas the block-wise construction requires an orthonormal one.

    Correspondence to: Linzhang Lu, School of Mathematical Science, Xiamen University, Xiamen 361005, China. Email: [email protected]. This work is supported by the National Natural Science Foundation of China (No. 10531080).

    Numer. Math. J. Chinese Univ. (English Ser.) 268 http://www.global-sci.org/nm


    Because deflation may occur, it is necessary to verify that the matrices in the Arnoldi factorization keep the correct structure after truncation from order k + p to order k. In this paper, we first prove that they do, which shows that the implicit restarting technique can be combined with the vector-wise block Arnoldi method. We then develop an implicitly restarted block Arnoldi algorithm in a vector-wise fashion for solving large-scale eigenproblems.

    In Section 2 we introduce notation and the vector-wise block Arnoldi process of [4]. In Section 3 we present our strategy for computing eigenvalues. In Section 4 we report numerical experiments demonstrating that the proposed algorithm is effective.

    2 Arnoldi-type algorithm

    2.1 Block Krylov subspaces

    We first introduce our notion of block Krylov subspaces for multiple starting vectors. Let $A \in \mathbb{R}^{N\times N}$ be a given $N \times N$ matrix and

    $$R = [\,r_1\ r_2\ \cdots\ r_m\,] \in \mathbb{R}^{N\times m} \tag{1}$$

    be a given matrix of $m$ right starting vectors $r_1, r_2, \ldots, r_m$. In contrast to the case $m = 1$, linear independence of the columns in the block Krylov sequence

    $$R,\ AR,\ A^2R,\ \ldots,\ A^{j-1}R,\ \ldots \tag{2}$$

    is, in general, gradually lost. By scanning the columns of the matrices in (2) from left to right and deleting each column that is linearly dependent on earlier columns, we obtain the deflated block Krylov sequence

    $$R_1,\ AR_2,\ A^2R_3,\ \ldots,\ A^{j_{\max}-1}R_{j_{\max}}. \tag{3}$$

    This process of deleting linearly dependent vectors is referred to as exact deflation in the following. In (3), for each $j = 1, 2, \ldots, j_{\max}$, $R_j$ is a submatrix of $R_{j-1}$, with $R_j \neq R_{j-1}$ if, and only if, deflation occurred within the $j$th Krylov block $A^{j-1}R$ in (2). Here, for $j = 1$, we set $R_0 = R$. Denoting by $m_j$ the number of columns of $R_j$, we thus have

    $$m \ge m_1 \ge m_2 \ge \cdots \ge m_{j_{\max}} \ge 1. \tag{4}$$

    By construction, the columns of the matrices (3) are linearly independent. For each $n$, the subspace spanned by the first $n$ of these columns is called the $n$th block Krylov subspace (induced by $A$ and $R$). In the following, we denote the $n$th block Krylov subspace by $\mathcal{K}_n(A,R)$. For later use, we remark that for

    $$n = m_1 + m_2 + \cdots + m_j, \tag{5}$$

    where $1 \le j \le j_{\max}$, the $n$th block Krylov subspace is given by

    $$\mathcal{K}_n(A,R) = \operatorname{colspan}\{R_1,\ AR_2,\ A^2R_3,\ \ldots,\ A^{j-1}R_j\}. \tag{6}$$
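    The following minimal Python/NumPy sketch illustrates exact deflation and the block sizes $m_j$ in (4). It is not taken from the paper. Exact linear dependence is replaced by a relative numerical test with a hypothetical tolerance `tol`, and the function name `deflated_krylov_sequence` is our own. A column deleted from $R_{j-1}$ is not reconsidered in later blocks, which mirrors the fact that $R_j$ is a submatrix of $R_{j-1}$.

```python
import numpy as np

def deflated_krylov_sequence(A, R, jmax, tol=1e-10):
    """Scan R, AR, A^2 R, ... column by column and delete every column that is
    numerically dependent on the columns kept so far (exact deflation).
    Returns the kept columns and the list of block sizes [m_1, m_2, ...]."""
    active = list(range(R.shape[1]))   # indices i of the columns r_i still in R_j
    kept, basis, sizes = [], [], []    # basis: orthonormal vectors spanning `kept`
    block = R.astype(float)            # holds A^{j-1} R
    for _ in range(jmax):
        survivors = []
        for i in active:
            col = block[:, i]
            res = col.copy()
            for _sweep in range(2):    # two Gram-Schmidt passes for accuracy
                for q in basis:
                    res -= (q @ res) * q
            if np.linalg.norm(res) > tol * np.linalg.norm(col):
                survivors.append(i)
                kept.append(col)
                basis.append(res / np.linalg.norm(res))
        active = survivors             # R_j is a submatrix of R_{j-1}
        if not active:                 # nothing left: the sequence has ended
            break
        sizes.append(len(active))
        block = A @ block
    return np.column_stack(kept), sizes

# Example: A = diag(1, ..., 6); r_3 = r_1 + r_2 is dependent from the start.
A = np.diag(np.arange(1.0, 7.0))
e = np.eye(6)
r1 = e[:, 0] + e[:, 1]
r2 = e[:, 2] + e[:, 3] + e[:, 4] + e[:, 5]
R = np.column_stack([r1, r2, r1 + r2])
K, sizes = deflated_krylov_sequence(A, R, jmax=6)
print(sizes)  # [2, 2, 1, 1]
```

    In this example:
    - $r_3 = r_1 + r_2$ is deleted in the first block.
    - $A^2 r_1$ is deleted in the third block.
    - The sequence ends after $j_{\max} = 4$ blocks, with $m_1 + m_2 + m_3 + m_4 = 6 = N$ columns in total.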

    2.2 Basis vectors

    The columns of the deflated block Krylov sequence (3), which is used in the above definition of $\mathcal{K}_n(A,R)$, tend to be almost linearly dependent even for moderate values of $n$. Therefore, they should not be used in numerical computations. Instead, we construct other suitable basis vectors.
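    The following Python/NumPy snippet illustrates this claim numerically for a single starting vector ($m = 1$). It compares two 2-norm condition numbers:
    - that of the Krylov matrix with each column scaled to unit length;
    - that of an orthonormal basis of the same subspace, obtained by QR.

    The matrix $A = \mathrm{diag}(1, \ldots, 100)$, the all-ones starting vector and $n = 10$ are arbitrary illustrative choices.

```python
import numpy as np

N, n = 100, 10                       # illustrative sizes
A = np.diag(np.arange(1.0, N + 1))   # illustrative test matrix
r = np.ones(N)                       # illustrative starting vector (m = 1)

# Krylov columns r, Ar, ..., A^{n-1} r, each scaled to unit Euclidean norm.
K = np.empty((N, n))
v = r.copy()
for j in range(n):
    K[:, j] = v / np.linalg.norm(v)
    v = A @ v

Q, _ = np.linalg.qr(K)               # orthonormal basis of the same subspace

print(f"cond(K) = {np.linalg.cond(K):.1e}, cond(Q) = {np.linalg.cond(Q):.1e}")
```

    Even though every Krylov column has unit length, the Krylov matrix is badly conditioned, while the orthonormal basis has condition number 1 up to rounding.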

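    In the following,

    $$v_1, v_2, \ldots, v_n \in \mathbb{R}^N \tag{7}$$

    denotes a set of basis vectors for $\mathcal{K}_n(A,R)$, i.e.,

    $$\operatorname{span}\{v_1, v_2, \ldots, v_n\} = \mathcal{K}_n(A,R).$$

    The $N \times n$ matrix

    $$V_n := [\,v_1\ v_2\ \cdots\ v_n\,], \tag{8}$$

    whose columns are the basis vectors (7), is called a basis matrix for $\mathcal{K}_n(A,R)$.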

    2.3 Arnoldi-type algorithm

    The classical Arnoldi process generates orthonormal basis vectors for the sequence of Krylov subspaces $\mathcal{K}_n(A,r)$, $n \ge 1$, induced by $A$ and a single starting vector $r$. In this subsection, we state an Arnoldi-type algorithm that extends the classical algorithm to block Krylov subspaces $\mathcal{K}_n(A,R)$, $n \ge 1$.

    Like the classical Arnoldi process, the algorithm constructs the basis vectors (7) to be orthonormal. In terms of the basis matrix (8), this orthonormality can be expressed as follows:

    $$V_n^T V_n = I.$$

    In addition to (7), the algorithm produces the so-called candidate vectors

    $$\hat v_{n+1}, \hat v_{n+2}, \ldots, \hat v_{n+m_c} \tag{9}$$

    for the next $m_c$ basis vectors $v_{n+1}, v_{n+2}, \ldots, v_{n+m_c}$. Here, $m_c = m_c(n)$ is the number $m$ of columns in the starting block $R$ in (1), reduced by the number of exact and inexact deflations that have occurred so far. The candidate vectors (9) satisfy the orthogonality relation

    $$V_n^T [\,\hat v_{n+1}\ \hat v_{n+2}\ \cdots\ \hat v_{n+m_c}\,] = 0.$$

    Due to the vector-wise construction of (7) and (9), detecting necessary deflation and performing the actual deflation become trivial. In fact, essentially the same proof as given in [1] for the case of a Lanczos-type algorithm can be used to show that exact deflation at step $n$ of the Arnoldi-type process occurs if, and only if, $\hat v_n = 0$. Similarly, inexact deflation occurs if, and only if, $\|\hat v_n\| \approx 0$ but $\hat v_n \neq 0$. Therefore, in the algorithm, one checks whether

    $$\|\hat v_n\| \le \mathrm{dtol}, \tag{10}$$

    where $\mathrm{dtol} \ge 0$ is a suitably chosen deflation tolerance. If (10) is satisfied, then $\hat v_n$ is deflated as follows:
    - $\hat v_n$ is deleted;
    - the indices of all remaining candidate vectors are shifted by $-1$;
    - $m_c$ is set to $m_c - 1$.

    If this results in $m_c = 0$, then the block Krylov subspace is exhausted and the algorithm is stopped. Otherwise, the deflation procedure is repeated until a vector $\hat v_n$ with $\|\hat v_n\| > \mathrm{dtol}$ is found. This vector is then turned into $v_n$ by normalizing it to have Euclidean norm 1.
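    The delete-and-normalize step just described can be sketched in Python/NumPy as follows. The helper name `next_basis_vector` is hypothetical. The candidate vectors are assumed to be held in a Python list whose first entry is $\hat v_n$.

```python
import numpy as np

def next_basis_vector(candidates, dtol):
    """Apply the deflation check (10) to the list [vhat_n, vhat_{n+1}, ...].
    Returns (v_n, remaining candidates, m_c), or (None, [], 0) if the block
    Krylov subspace is exhausted. The input list is not modified."""
    cands = list(candidates)
    while cands and np.linalg.norm(cands[0]) <= dtol:
        cands.pop(0)              # delete vhat_n; the rest move down one index
    mc = len(cands)               # m_c after all deflations at this step
    if mc == 0:
        return None, [], 0
    vhat = cands.pop(0)
    return vhat / np.linalg.norm(vhat), cands, mc

# The first candidate is numerically zero and gets deflated.
cands = [np.array([1e-14, 0.0]), np.array([3.0, 4.0]), np.array([0.0, 1.0])]
v, rest, mc = next_basis_vector(cands, dtol=1e-10)
print(v, mc)
```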

    A complete statement of the resulting Arnoldi-type algorithm is as follows.

    Algorithm 1

    function [V, W, T, Rho, m_c, n] = vbArnoldi(A, N, R, m, j_max)

    (0) Set $\hat v_i = r_i$ for $i = 1, 2, \ldots, m$. Set $m_c = m$.
    (1) For $j = 1, 2, \ldots, j_{\max}$:
        (1.1) Compute $\|\hat v_j\|$ and check whether the deflation criterion (10) is fulfilled.
              If yes, $\hat v_j$ is deflated as follows:
              set $m_c = m_c - 1$; if $m_c = 0$, set $j = j - 1$ and $n = j$, and stop;
              set $\hat v_i = \hat v_{i+1}$ for $i = j, j + 1, \ldots, j + m_c - 1$;
              return to Step (1.1).
        (1.2) Set $t_{j,\,j-m_c} = \|\hat v_j\|$ and $v_j = \hat v_j / t_{j,\,j-m_c}$.
        (1.3) Compute $\hat v_{j+m_c} = A v_j$.
        (1.4) For $i = 1, 2, \ldots, j$:
              set $t_{i,j} = v_i^T \hat v_{j+m_c}$ and $\hat v_{j+m_c} = \hat v_{j+m_c} - v_i t_{i,j}$.
        (1.5) For $i = j - m_c + 1, j - m_c + 2, \ldots, j - 1$:
              set $t_{j,i} = v_j^T \hat v_{i+m_c}$ and $\hat v_{i+m_c} = \hat v_{i+m_c} - v_j t_{j,i}$.
    (2) Set $n = j$, $V = [\,v_1\ v_2\ \cdots\ v_n\,]$ and $W = [\,\hat v_{n+1}\ \hat v_{n+2}\ \cdots\ \hat v_{n+m_c}\,]$;
        set $T = [\,t_{i,l}\,]_{i=1,\ldots,n;\ l=1,\ldots,n}$ and $\mathrm{Rho} = [\,t_{i,\,l-m}\,]_{i=1,\ldots,n;\ l=1,\ldots,m}$.

    Remark 2.1. Other block Arnoldi algorithms (all without deflation, though) can be found in [6, Section 6.12].

    After $n$ passes through the main loop, Algorithm 1 has constructed the first $n$ basis vectors (7) and the candidate vectors (9) for the next $m_c$ basis vectors. In terms of the basis matrix (8), $V_n$, the recurrences used to generate all these vectors can be written compactly in matrix form. We collect the recurrence coefficients computed during the first $n$ passes through the main loop of Algorithm 1 in the matrices

    $$\mathrm{Rho} := [\,t_{i,\,l-m}\,]_{i=1,\ldots,n;\ l=1,\ldots,m} \quad\text{and}\quad T_n := [\,t_{i,l}\,]_{i=1,\ldots,n;\ l=1,\ldots,n}, \tag{11}$$

    respectively. Moreover, in (11), all elements $t_{i,\,l-m}$ and $t_{i,l}$ that are not explicitly defined in Algorithm 1 are set to zero. The compact statement of the recurrences used in Algorithm 1 is now as follows. For $n \ge 1$, we have

    $$A V_n = V_n T_n + [\,0\ \cdots\ 0\ \ \hat v_{n+1}\ \hat v_{n+2}\ \cdots\ \hat v_{n+m_c}\,]. \tag{12}$$
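    Below is a sketch of Algorithm 1 in Python/NumPy, written with 0-based array indices. It rests on the following assumptions:
    - dense NumPy arrays;
    - exactly the single modified Gram-Schmidt pass of steps (1.4)-(1.5), without reorthogonalization;
    - our own storage layout `coef`, which holds $t_{i,l}$ for all labels $l = 1-m, \ldots, n$, so that Rho and $T_n$ in (11) are column slices of it.

    The usage example forces one exact deflation by making the third starting vector a combination of the first two. It then checks:
    - orthonormality of $V_n$;
    - $V_n^T W = 0$;
    - relation (12);
    - $R = V_n\,\mathrm{Rho}$, an additional sanity check of our own, which holds here because the deflated vector is numerically zero.

```python
import numpy as np

def vb_arnoldi(A, R, jmax, dtol=1e-10):
    """Sketch of Algorithm 1 (vbArnoldi) with 0-based indices.
    Returns V (N x n), W (N x m_c), T (n x n), Rho (n x m), m_c and n."""
    N, m = R.shape
    cand = [R[:, i].astype(float) for i in range(m)]  # vhat_j, ..., vhat_{j+m_c-1}
    V = np.zeros((N, jmax))
    # Column c of `coef` stores t_{i,l} for the 1-based label l = c - m + 1:
    # columns 0..m-1 (labels 1-m..0) form Rho, the remaining ones form T.
    coef = np.zeros((jmax, jmax + m))
    n = 0
    for j in range(jmax):                             # paper's step j + 1
        while cand and np.linalg.norm(cand[0]) <= dtol:
            cand.pop(0)                               # (1.1) deflate, m_c -= 1
        mc = len(cand)
        if mc == 0:
            break                                     # subspace exhausted
        vhat = cand.pop(0)
        coef[j, j - mc + m] = np.linalg.norm(vhat)    # (1.2) t_{j, j-m_c}
        V[:, j] = vhat / coef[j, j - mc + m]
        w = A @ V[:, j]                               # (1.3)
        for i in range(j + 1):                        # (1.4)
            coef[i, j + m] = V[:, i] @ w
            w = w - coef[i, j + m] * V[:, i]
        for idx, c in enumerate(cand):                # (1.5)
            col = j + 1 + idx - mc + m                # column of this candidate's label
            coef[j, col] = V[:, j] @ c
            cand[idx] = c - coef[j, col] * V[:, j]
        cand.append(w)
        n = j + 1
    W = np.column_stack(cand) if cand else np.zeros((N, 0))
    return V[:, :n], W, coef[:n, m:m + n], coef[:n, :m], len(cand), n

rng = np.random.default_rng(1)
N = 40
A = rng.standard_normal((N, N))
R = rng.standard_normal((N, 3))
R[:, 2] = R[:, 0] + 2.0 * R[:, 1]          # forces one exact deflation
V, W, T, Rho, mc, n = vb_arnoldi(A, R, jmax=12)
E = np.zeros((N, n))
E[:, n - mc:] = W                           # [0 ... 0 vhat_{n+1} ... vhat_{n+m_c}]
print(n, mc, np.linalg.norm(A @ V - V @ T - E))
```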

    In (12), we assumed that only exact deflations are performed. If both exact and inexact deflations are performed, then an additional matrix term, say $V_n^{\mathrm{defl}}$, appears on the right-hand side of (12). The only nonzero columns of $V_n^{\mathrm{defl}}$ are those nonzero vectors that satisfied the deflation check (10). Since at any stage of Algorithm 1 at most $m - m_c = m - m_c(n)$ vectors have been deflated, the additional matrix term is small in the sense that

    $$\|V_n^{\mathrm{defl}}\| \le \mathrm{dtol}\,\sqrt{m - m_c(n)}.$$

    3 Our algorithm for computing the eigenvalues

    3.1 The Arnoldi factorization

    Let $A V_k = V_k T_k + [\,0\ \cdots\ 0\ \ \hat v_{k+1}\ \hat v_{k+2}\ \cdots\ \hat v_{k+m_c}\,]$ be a $k$-step Arnoldi factorization of $A$. We can then easily use $p$ additional steps (see Algorithm 2) to obtain the $(k+p)$-step Arnoldi factorization

    $$A V_{k+p} = V_{k+p} T_{k+p} + [\,0\ \cdots\ 0\ \ \hat v_{k+p+1}\ \hat v_{k+p+2}\ \cdots\ \hat v_{k+p+m_c}\,]. \tag{13}$$

    If we denote

    $$W_{k+p} = [\,\hat v_{k+p+1}\ \hat v_{k+p+2}\ \cdots\ \hat v_{k+p+m_c}\,], \tag{14}$$

    then we can write (13) in the form

    $$A V_{k+p} = [\,V_{k+p}\ \ W_{k+p}\,] \begin{bmatrix} T_{k+p} \\ [\,0\ \ I_{m_c}\,] \end{bmatrix}, \tag{15}$$

    where $I_{m_c}$ is the $m_c \times m_c$ identity matrix.

    Algorithm 2

    function [V, W, T, Rho, m_c, n] = vbArnoldi2(A, V, W, T, Rho, k, p, m_c, m)

    Input: $AV - VT = W[\,0\ \ I_{m_c}\,]$ with $V^TV = I_k$ and $V^TW = 0$.

    Output: $AV - VT = W[\,0\ \ I_{m_c}\,]$ with $V^TV = I_{k+p}$ and $V^TW = 0$.

    (1) For $j = k + 1, k + 2, \ldots, k + p$:
        (1.1) Compute $\|\hat v_j\|$ and check whether the deflation criterion (10) is fulfilled.
              If yes, $\hat v_j$ is deflated as follows:
              set $m_c = m_c - 1$; if $m_c = 0$, set $j = j - 1$ and $n = j$, and stop;
              set $\hat v_i = \hat v_{i+1}$ for $i = j, j + 1, \ldots, j + m_c - 1$;
              return to Step (1.1).
        (1.2) Set $t_{j,\,j-m_c} = \|\hat v_j\|$ and $v_j = \hat v_j / t_{j,\,j-m_c}$.
        (1.3) Compute $\hat v_{j+m_c} = A v_j$.
        (1.4) For $i = 1, 2, \ldots, j$:
              set $t_{i,j} = v_i^T \hat v_{j+m_c}$ and $\hat v_{j+m_c} = \hat v_{j+m_c} - v_i t_{i,j}$.
        (1.5) For $i = j - m_c + 1, j - m_c + 2, \ldots, j - 1$:
              set $t_{j,i} = v_j^T \hat v_{i+m_c}$ and $\hat v_{i+m_c} = \hat v_{i+m_c} - v_j t_{j,i}$.
    (2) Set $n = j$, $V = [\,v_1\ v_2\ \cdots\ v_n\,]$ and $W = [\,\hat v_{n+1}\ \hat v_{n+2}\ \cdots\ \hat v_{n+m_c}\,]$.
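    Algorithm 2 is the main loop of Algorithm 1 resumed at $j = k + 1$. The following Python/NumPy sketch makes this explicit. Its assumptions and choices:
    - `vb_arnoldi2` is a hypothetical name;
    - the coefficients belonging to Rho are simply discarded;
    - to build the $k$-step factorization, the same routine is started from an empty basis, with the starting block $R$ as candidates.

    The usage checks relation (15) and confirms that $k$ steps followed by $p$ steps reproduce $k + p$ steps taken at once.

```python
import numpy as np

def vb_arnoldi2(A, V, W, T, p, dtol=1e-10):
    """Sketch of Algorithm 2: extend A V = V T + W [0 I_mc] by p vector-wise
    block Arnoldi steps. Coefficients with label <= 0 (the Rho part) are dropped."""
    N, k = V.shape
    V = np.hstack([V, np.zeros((N, p))])
    T = np.pad(T, ((0, p), (0, p)))
    cand = [W[:, i] for i in range(W.shape[1])]      # candidate vectors
    for j in range(k, k + p):                        # 0-based index of v_{j+1}
        while cand and np.linalg.norm(cand[0]) <= dtol:
            cand.pop(0)                              # (1.1) deflation
        mc = len(cand)
        if mc == 0:
            raise RuntimeError("block Krylov subspace exhausted")
        vhat = cand.pop(0)
        if j >= mc:
            T[j, j - mc] = np.linalg.norm(vhat)      # (1.2)
        V[:, j] = vhat / np.linalg.norm(vhat)
        w = A @ V[:, j]                              # (1.3)
        for i in range(j + 1):                       # (1.4)
            T[i, j] = V[:, i] @ w
            w = w - T[i, j] * V[:, i]
        for idx, c in enumerate(cand):               # (1.5)
            t = V[:, j] @ c
            if j + 1 + idx >= mc:
                T[j, j + 1 + idx - mc] = t
            cand[idx] = c - t * V[:, j]
        cand.append(w)
    return V, np.column_stack(cand), T

rng = np.random.default_rng(2)
N, m, k, p = 30, 2, 4, 6
A = rng.standard_normal((N, N))
R = rng.standard_normal((N, m))
empty_V, empty_T = np.zeros((N, 0)), np.zeros((0, 0))
Vk, Wk, Tk = vb_arnoldi2(A, empty_V, R, empty_T, k)    # k-step factorization
V, W, T = vb_arnoldi2(A, Vk, Wk, Tk, p)               # p more steps (Algorithm 2)

# Relation (15): A V = [V W] [T; (0 I_mc)]
mc = W.shape[1]
B = np.vstack([T, np.hstack([np.zeros((mc, k + p - mc)), np.eye(mc)])])
print(np.linalg.norm(A @ V - np.hstack([V, W]) @ B))
```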

    3.2 Updating the Arnoldi factorization

    Throughout this discussion, the integer $k$ should be thought of as a fixed, prespecified integer of modest size. Let $p$ be another positive integer, and consider the result of $k + p$ steps of the Arnoldi process applied to $A$, which has the factorization form (15). In practice, we choose $p$ larger than $k$, and throughout the following discussion we assume that $p > k > m_1$.

    Let $\mu$ be a shift, put $V = V_{k+p}$ and $T = T_{k+p}$, and let $T - \mu I = QR$ with $Q$ orthogonal and $R$ upper triangular. An analogue of the explicitly shifted QR algorithm may be applied to (13). It consists of the following four steps:

    $$(A - \mu I)V - V(T - \mu I) = W_{k+p}[\,0\ \ I_{m_c}\,], \tag{16}$$

    $$(A - \mu I)V - VQR = W_{k+p}[\,0\ \ I_{m_c}\,], \tag{17}$$

    $$(A - \mu I)(VQ) - (VQ)(RQ) = W_{k+p}[\,0\ \ I_{m_c}\,]Q, \tag{18}$$

    $$A(VQ) - (VQ)(RQ + \mu I) = W_{k+p}[\,0\ \ I_{m_c}\,]Q. \tag{19}$$

    Let $V_+ = VQ$ and $T_+ = RQ + \mu I = Q^TQRQ + \mu Q^TQ = Q^T(QR + \mu I)Q = Q^T T Q$.

    Proposition 3.1. Suppose that $T \in \mathbb{R}^{(k+p)\times(k+p)}$ is an $m_1$-upper Hessenberg matrix, i.e., $T(j + m_1 + 1 : k + p,\ j) = 0$ for $1 \le j \le k + p - m_1 - 1$. Suppose also that $T - \mu I = QR$ with $Q$ orthogonal and $R$ upper triangular. Then $Q$ has the same zero structure as $T$ below the $m_1$th subdiagonal, i.e., $Q$ is also $m_1$-upper Hessenberg. Moreover, $T_+ = RQ + \mu I = Q^T T Q$ is also an $m_1$-upper Hessenberg matrix.

    Proof. The proof is similar to that of [3, p. 166, Proposition 4.5].
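    The following Python/NumPy snippet is a quick numerical check of two claims on a random $m_1$-upper Hessenberg matrix, using NumPy's Householder QR:
    - the identity $T_+ = RQ + \mu I = Q^TTQ$;
    - the band structure asserted in Proposition 3.1.

    The size, the bandwidth and the real shift $\mu = 0.5$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m1, mu = 12, 2, 0.5                         # illustrative size, bandwidth, shift
T = np.triu(rng.standard_normal((n, n)), -m1)  # zero below the m1-th subdiagonal

Q, Rfac = np.linalg.qr(T - mu * np.eye(n))     # T - mu I = QR
T_plus = Rfac @ Q + mu * np.eye(n)             # T_+ = RQ + mu I

def below_band(M, b):
    """Largest |entry| of M strictly below its b-th subdiagonal."""
    return np.abs(np.tril(M, -b - 1)).max()

print(np.linalg.norm(T_plus - Q.T @ T @ Q))    # T_+ = Q^T T Q up to rounding
print(below_band(Q, m1), below_band(T_plus, m1))
```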

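    Proposition 3.2. Denote $v_i^+ = V_+ e_i$ and $r_{ij} = e_i^T R e_j$. Then

    $$(A - \mu I)[\,v_1\ v_2\ \cdots\ v_{m_1}\,] = [\,v_1^+\ v_2^+\ \cdots\ v_{m_1}^+\,] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1,m_1} \\ 0 & r_{22} & \cdots & r_{2,m_1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{m_1,m_1} \end{bmatrix}.$$

    Proof. Since we have assumed that $p > k > m_1$, we have

    $$W_{k+p}[\,0\ \ I_{m_c}\,]e_i = 0, \quad i = 1, 2, \ldots, m_1.$$

    Applying the matrices in (17) to the vector $e_1$, we get

    $$(A - \mu I)Ve_1 - VQRe_1 = W_{k+p}[\,0\ \ I_{m_c}\,]e_1 = 0,$$

    that is,

    $$(A - \mu I)v_1 = r_{11}v_1^+. \tag{20}$$

    Applying the matrices in (17) to the vector $e_2$, we get

    $$(A - \mu I)Ve_2 - VQRe_2 = W_{k+p}[\,0\ \ I_{m_c}\,]e_2 = 0,$$

    so that

    $$(A - \mu I)v_2 = VQ\left(\begin{bmatrix} r_{12} \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ r_{22} \\ \vdots \\ 0 \end{bmatrix}\right) = r_{12}v_1^+ + r_{22}v_2^+. \tag{21}$$

    It follows from (20) and (21) that

    $$(A - \mu I)[\,v_1\ v_2\,] = [\,v_1^+\ v_2^+\,]\begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix}. \tag{22}$$

    The above argument extends to $v_3, v_4, \ldots, v_{m_1}$, which gives

    $$(A - \mu I)[\,v_1\ v_2\ \cdots\ v_{m_1}\,] = [\,v_1^+\ v_2^+\ \cdots\ v_{m_1}^+\,] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1,m_1} \\ 0 & r_{22} & \cdots & r_{2,m_1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{m_1,m_1} \end{bmatrix}.$$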

    This completes the proof of Proposition 3.2.
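    Proposition 3.2 can also be checked numerically with the following Python/NumPy sketch. It proceeds in three steps:
    - build a factorization of the form (15) with a compact vector-wise block Arnoldi routine (no deflation, so $m_1 = m$);
    - apply one shifted QR step with an arbitrary real shift;
    - compare both sides of the proposition.

    The routine name `vw_arnoldi`, the random test matrix, the sizes and the shift $\mu = 0.3$ are illustrative assumptions.

```python
import numpy as np

def vw_arnoldi(A, R, n):
    """Vector-wise block Arnoldi without deflation (sketch): returns V, T, W
    with A V = V T + [0 ... 0 W]; the Rho coefficients are not stored."""
    N, m = R.shape
    cand = [R[:, i].astype(float) for i in range(m)]
    V, T = np.zeros((N, n)), np.zeros((n, n))
    for j in range(n):
        vhat = cand.pop(0)
        if j >= m:
            T[j, j - m] = np.linalg.norm(vhat)
        V[:, j] = vhat / np.linalg.norm(vhat)
        w = A @ V[:, j]
        for i in range(j + 1):                 # orthogonalize A v_j against v_1..v_j
            T[i, j] = V[:, i] @ w
            w = w - T[i, j] * V[:, i]
        for idx, c in enumerate(cand):         # orthogonalize candidates against v_j
            t = V[:, j] @ c
            if j + 1 + idx >= m:
                T[j, j + 1 + idx - m] = t
            cand[idx] = c - t * V[:, j]
        cand.append(w)
    return V, T, np.column_stack(cand)

rng = np.random.default_rng(4)
N, m, n, mu = 40, 2, 10, 0.3               # n plays the role of k + p
A = rng.standard_normal((N, N))
V, T, W = vw_arnoldi(A, rng.standard_normal((N, m)), n)

Q, Rfac = np.linalg.qr(T - mu * np.eye(n)) # T - mu I = QR
V_plus = V @ Q
lhs = (A - mu * np.eye(N)) @ V[:, :m]      # (A - mu I)[v_1 ... v_m1], m1 = m
rhs = V_plus[:, :m] @ Rfac[:m, :m]         # [v_1^+ ... v_m1^+] R(1:m1, 1:m1)
print(np.linalg.norm(lhs - rhs))
```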

    This idea may be extended to up to $p$ shifts applied in succession. The application of a QR iteration corresponding to a shift produces an $m_1$-upper Hessenberg orthogonal matrix $Q \in \mathbb{R}^{(k+p)\times(k+p)}$ such that

    $$A V_{k+p} Q = [\,V_{k+p}Q\ \ W_{k+p}\,] \begin{bmatrix} Q^T T_{k+p} Q \\ [\,0\ \ I_{m_c}\,]\,Q \end{bmatrix}.$$


    which is the desired result.
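    When $p$ shifts are applied in succession, the accumulated factors satisfy $Q_1\cdots Q_p\,R_p\cdots R_1 = \prod_{j=1}^{p}(T - \mu_j I)$. This is a standard property of the shifted QR iteration, stated here as background. Since each $Q_j$ is $m_1$-upper Hessenberg in exact arithmetic, the accumulated $Q$ has at most $p\,m_1$ nonzero subdiagonals, while $Q^TTQ$ remains $m_1$-upper Hessenberg. The following Python/NumPy sketch checks these properties on a random $m_1$-upper Hessenberg matrix. The sizes and the three real shifts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m1 = 14, 2
shifts = [0.2, -0.4, 0.7]                       # p = 3 illustrative real shifts
T = np.triu(rng.standard_normal((n, n)), -m1)   # m1-upper Hessenberg test matrix

Q, Rtot, Tp = np.eye(n), np.eye(n), T.copy()
for mu in shifts:                               # as in steps (5.3.1)-(5.3.3)
    Qj, Rj = np.linalg.qr(Tp - mu * np.eye(n))
    Tp = Qj.T @ Tp @ Qj
    Q, Rtot = Q @ Qj, Rj @ Rtot                 # Q = Q_1...Q_j, Rtot = R_j...R_1

psi = np.eye(n)
for mu in shifts:
    psi = psi @ (T - mu * np.eye(n))            # psi(T) = prod_j (T - mu_j I)

p = len(shifts)
print(np.linalg.norm(Q @ Rtot - psi) / np.linalg.norm(psi))
print(np.abs(np.tril(Q, -p * m1 - 1)).max())    # below the (p*m1)-th subdiagonal
print(np.abs(np.tril(Tp, -m1 - 1)).max())       # Q^T T Q is still m1-Hessenberg
```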

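    We denote

    $$W_k^+ = [\,\hat v_{k+1}^+\ \hat v_{k+2}^+\ \cdots\ \hat v_{k+m_c}^+\,],$$

    so we have

    $$A V_k^+ = [\,V_k^+\ \ W_k^+\,] \begin{bmatrix} T_k^+ \\ [\,0\ \ I_{m_c}\,] \end{bmatrix}. \tag{28}$$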

    Note that $(V_k^+)^T V_p = 0$ and $(V_k^+)^T W_{k+p} = 0$, so $(V_k^+)^T W_k^+ = 0$. Thus (28) is a legitimate Arnoldi factorization of $A$. Using it as a starting point, we can apply $p$ additional steps (see Algorithm 2) to return to the original form (13).

    3.3 The complete algorithm for computing the eigenvalues

    Based on the above strategy, we can present our complete algorithm.

    Algorithm 3

    (1) Start: Given the number $q$ of desired eigenpairs, choose the number of steps $k$ of the Arnoldi-type process and choose $p$ unwanted eigenvalues as the $p$ shifts. Give the initial block size $m$, a user-prescribed tolerance tol, and an initial $N \times m$ block vector $V_1$.

    (2) [V, W, T, Rho, m_c, n] = vbArnoldi(A, N, V_1, m, k + p).

    (3) Compute the eigenvalues $\lambda_i^{(k+p)}$ $(i = 1, 2, \ldots, k + p)$ of $T_{k+p}$, and select $q$ of them as approximations to the desired eigenvalues $\lambda_i$. Here we denote by $y_i$ the eigenvector associated with $\lambda_i^{(k+p)}$, which satisfies $T_{k+p} y_i = \lambda_i^{(k+p)} y_i$.

    (4) Compute the approximate eigenvectors of $A$ and the corresponding residual norms, which serve as convergence criteria:
        (4.1) $x_i := V_{k+p} y_i$, $i = 1, 2, \ldots, q$;
        (4.2) compute the residual norms $\|A x_i - \lambda_i^{(k+p)} x_i\|$, $i = 1, 2, \ldots, q$.

    (5) If all $q$ residual norms are smaller than tol, stop;
        else
        (5.1) sort $\lambda(T)$ by algebraically largest real part (or largest modulus) and select the $p$ eigenvalues with the smallest real part (or modulus) as shifts $\mu_1, \ldots, \mu_p$, stored in a vector;
        (5.2) $Q := I_{k+p}$;
        (5.3) for $j = 1, 2, \ldots, p$:
              (5.3.1) $T - \mu_j I = Q_j R_j$ (perform the QR factorization);
              (5.3.2) $T := Q_j^T T Q_j$;
              (5.3.3) $Q := Q Q_j$;
        (5.4) $W := (VQ)\begin{bmatrix} 0 \\ I_p \end{bmatrix}\left([\,0\ \ I_p\,]\, T \begin{bmatrix} I_k \\ 0 \end{bmatrix}\right) + W[\,0\ \ I_{m_c}\,]\,Q\begin{bmatrix} I_k \\ 0 \end{bmatrix}$;
        (5.5) $V := (VQ)\begin{bmatrix} I_k \\ 0 \end{bmatrix}$;
        (5.6) compute the number $m_c$ of nonzero columns of $W$;
        (5.7) $W := W\begin{bmatrix} 0 \\ I_{m_c} \end{bmatrix}$;
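    The following is a compact end-to-end sketch of Algorithm 3 in Python/NumPy. After step (5.7), it truncates $T$ to its leading $k \times k$ block and runs Algorithm 2 for $p$ steps, as described in Section 3.2; steps (5.4)-(5.7) produce the truncated factorization (28). Several assumptions are ours, not the paper's:
    - $A$ is symmetric, so all Ritz values and exact shifts are real and `numpy.linalg.eigh` can be applied to the symmetric part of $T$;
    - deflation uses a fixed absolute `dtol`;
    - in step (5.6), the "nonzero" columns of $W$ are those from the first column whose norm exceeds `dtol` onward;
    - the loop stops early if the subspace becomes invariant.

    The names `extend`, `restart` and `irbam` are hypothetical.

```python
import numpy as np

def extend(A, V, T, cand, steps, dtol=1e-10):
    """Vector-wise Arnoldi steps (Algorithms 1 and 2, sketch). On entry
    A V = V T + [0 ... 0 cand]; returns the enlarged V, T and candidate list."""
    N, n0 = V.shape
    V = np.hstack([V, np.zeros((N, steps))])
    T = np.pad(T, ((0, steps), (0, steps)))
    cand = list(cand)
    for j in range(n0, n0 + steps):
        while cand and np.linalg.norm(cand[0]) <= dtol:
            cand.pop(0)                                  # deflation (10)
        if not cand:                                     # invariant subspace found
            return V[:, :j], T[:j, :j], cand
        mc = len(cand)
        vhat = cand.pop(0)
        if j >= mc:
            T[j, j - mc] = np.linalg.norm(vhat)
        V[:, j] = vhat / np.linalg.norm(vhat)
        w = A @ V[:, j]
        for i in range(j + 1):                           # modified Gram-Schmidt
            T[i, j] = V[:, i] @ w
            w = w - T[i, j] * V[:, i]
        for idx, c in enumerate(cand):                   # step (1.5)
            t = V[:, j] @ c
            if j + 1 + idx >= mc:
                T[j, j + 1 + idx - mc] = t
            cand[idx] = c - t * V[:, j]
        cand.append(w)
    return V, T, cand

def restart(V, T, cand, k, shifts, dtol=1e-10):
    """Steps (5.2)-(5.7): p shifted QR steps on T, then truncation to order k."""
    n = T.shape[0]
    W = np.column_stack(cand)
    mc = W.shape[1]
    Q = np.eye(n)
    for mu in shifts:                                    # (5.3.1)-(5.3.3)
        Qj, _ = np.linalg.qr(T - mu * np.eye(n))
        T, Q = Qj.T @ T @ Qj, Q @ Qj
    VQ = V @ Q
    # (5.4): equals A VQ[:, :k] - VQ[:, :k] T[:k, :k]
    Wk = VQ[:, k:] @ T[k:, :k] + W @ Q[n - mc:, :k]
    nonzero = np.flatnonzero(np.linalg.norm(Wk, axis=0) > dtol)
    first = nonzero[0] if nonzero.size else k            # (5.6)-(5.7)
    return VQ[:, :k], T[:k, :k], [Wk[:, i] for i in range(first, k)]

def irbam(A, R, q, k, p, tol=1e-6, maxit=50):
    """Sketch of Algorithm 3 for symmetric A: the q largest eigenvalues."""
    V, T, cand = extend(A, np.zeros((A.shape[0], 0)), np.zeros((0, 0)),
                        list(R.T), k + p)                # step (2)
    for _ in range(maxit):
        theta, Y = np.linalg.eigh((T + T.T) / 2)         # step (3), symmetric A
        order = np.argsort(theta)[::-1]                  # largest first
        theta, Y = theta[order], Y[:, order]
        X = V @ Y[:, :q]                                 # step (4.1)
        res = np.linalg.norm(A @ X - X * theta[:q], axis=0)  # step (4.2)
        if np.all(res < tol) or T.shape[0] < k + p:      # step (5)
            break
        V, T, cand = restart(V, T, cand, k, theta[k:])   # steps (5.1)-(5.7)
        V, T, cand = extend(A, V, T, cand, p)            # back to k + p columns
    return theta[:q], res

rng = np.random.default_rng(0)
A = np.diag(np.concatenate([np.linspace(0.0, 1.0, 98), [3.0, 4.0]]))
R = rng.standard_normal((100, 2))
vals, res = irbam(A, R, q=2, k=6, p=8)
print(vals, res)
```

    The diagonal test matrix has eigenvalues 4 and 3 well separated from the rest of its spectrum in [0, 1], so the sketch is expected to return Ritz values close to 4 and 3. For nonsymmetric $A$, complex shifts would require complex arithmetic or a double-shift variant, which this sketch does not attempt.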


    [2] Bai Z, Demmel J, Dongarra J, Ruhe A, van der Vorst H, editors. Templates for the solution of algebraic eigenvalue problems: A practical guide. SIAM, Philadelphia, 2000.

    [3] Demmel J W. Applied numerical linear algebra. SIAM, Philadelphia, 1997.

    [4] Freund R W. Krylov-subspace methods for reduced-order modeling in circuit simulation. J. Comput. Appl. Math., 2000, 123: 395-421.

    [5] Lehoucq R B, Maschhoff K J. Implementation of an implicitly restarted block Arnoldi. Preprint MCS-P649-0297, Argonne National Laboratory, Argonne, IL, 1997.

    [6] Saad Y. Iterative methods for sparse linear systems. PWS Publishing Company, Boston, 1996.

    [7] Sorensen D C. Implicit application of polynomial filters in a k-Step Arnoldi method. SIAM J. Matrix Anal. Appl., 1992, 13: 357-385.