


On the properties of Krylov subspaces in finite precision CG computations

Tomáš Gergelits∗ <[email protected]>   Zdeněk Strakoš∗,† <[email protected]>

∗Faculty of Mathematics and Physics, Charles University in Prague
†Institute of Computer Science, Academy of Sciences of the Czech Republic

Introduction

We address the question of the difference between Krylov subspaces generated by the CG method in finite precision arithmetic and their exact arithmetic counterparts. Since we study the behaviour of CG in practical computations, we concentrate on situations with a significant delay of convergence. We observe that, apart from the delay, the computed Krylov subspaces do not depart much from their exact arithmetic counterparts.

Krylov subspaces in practical computations

The need to compute a basis of the Krylov subspace

Kl(B, v) = span{v, Bv, . . . , B^{l−1}v}

lies at the heart of CG and other Krylov subspace methods. However, due to rounding errors, the subspace spanned by the practically computed basis may differ from its exact counterpart, and the following questions arise:

I What is the difference between the computed Krylov subspace K̄l(B, v) and its exact arithmetic counterpart Kl(B, v)?

I Can we find perturbations ∆B, δv such that Kl(B + ∆B, v + δv) = K̄l(B, v)?

I How sensitive are the Krylov subspaces Kl(B + ∆B, v + δv) to general small perturbations?

Results in the literature, e.g., [1, 2, 3], rely on the assumption of full dimensionality of the studied (computed or perturbed) Krylov subspaces.
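As a concrete illustration of the dimensionality issue raised above, here is a minimal NumPy sketch (the matrix B and vector v are arbitrary test data, not from the poster's experiments): even for a well-conditioned matrix, the numerical rank of the monomial Krylov basis can drop far below l in floating point.

```python
import numpy as np

def krylov_basis(B, v, l):
    """Monomial basis v, Bv, ..., B^(l-1)v of K_l(B, v), as columns."""
    cols = [v]
    for _ in range(l - 1):
        cols.append(B @ cols[-1])
    return np.column_stack(cols)

# Arbitrary SPD test data (an assumption for illustration only).
rng = np.random.default_rng(0)
B = np.diag(np.linspace(1.0, 100.0, 50))
v = rng.standard_normal(50)

K = krylov_basis(B, v, 20)
# In exact arithmetic K would have rank 20; in floating point the later
# columns align with the dominant eigenvector and the numerical rank drops.
rank = np.linalg.matrix_rank(K)
```

In practice one orthogonalizes on the fly (Arnoldi/Lanczos), but with short recurrences even that does not prevent the loss of linear independence discussed in the following sections.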

The CG method in practical computations

The CG method is the method of choice for solving linear systems

Ax = b,   A ∈ F^{N×N} HPD,   b ∈ F^N,   F is R or C.

I CG is a projection method which minimizes the energy norm of the error

xl ∈ x0 + Kl(A, r0),   rl ⊥ Kl(A, r0);   ‖x − xl‖A = min {‖x − y‖A : y ∈ x0 + Kl(A, r0)}.

I CG is computationally based on short recurrences.
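A minimal sketch of the short-recurrence CG iteration just described (Hestenes–Stiefel form; the ill-conditioned diagonal test matrix is an assumption, not the poster's data), which also records the residuals so their loss of mutual orthogonality can be measured:

```python
import numpy as np

def cg(A, b, iters):
    """Plain CG with short recurrences; returns the iterate and all residuals."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    residuals = [r.copy()]
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        residuals.append(r.copy())
    return x, residuals

# Ill-conditioned SPD test matrix (condition number ~1e8, an assumption).
n = 48
A = np.diag(np.logspace(0, 8, n))
b = np.ones(n) / np.sqrt(n)

x, residuals = cg(A, b, 40)
R = np.column_stack([v / np.linalg.norm(v) for v in residuals])
# In exact arithmetic the residuals are mutually orthogonal and this is ~0;
# in finite precision it typically grows far above machine precision.
orth_loss = np.linalg.norm(np.eye(R.shape[1]) - R.T @ R)
```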

Delay of convergence & rank deficiency

Using short recurrences in practical computation leads to the loss of global orthogonality of the computed residuals, and even to the loss of their linear independence. Consequently, the computed Krylov subspace spanned by these vectors can be rank-deficient, which causes a significant delay of convergence.

[Plots: relative A-norm of the error vs. iteration number for exact arithmetic and finite precision arithmetic, with the delay of convergence and the loss of orthogonality marked (left); rank of the computed Krylov subspace vs. iteration number for the FP and exact CG computations, showing the rank-deficiency (right).]

Idea of shift

Taking into account the phenomenon of delay of convergence, we should compare different iterations when comparing CG computations in finite precision and exact arithmetic. We relate:

k-th iteration of FP CG ⇐⇒ l-th iteration of exact CG,

where k − l corresponds to the delay of convergence, or equivalently to the rank-deficiency of the computed Krylov subspace. We want to study:

‖x − x̄k‖A × ‖x − xl‖A,   x̄k × xl,   K̄k(A, r0) × Kl(A, r0).
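One way to reproduce this comparison numerically is to emulate exact-arithmetic CG by explicitly reorthogonalizing each residual against all previous ones, a standard device in this literature. The sketch below (a diagonal test matrix with a graded spectrum of the kind often used in such experiments; all parameter choices are assumptions) makes the delay visible by comparing the A-norms of the error of the two runs at the same iteration index:

```python
import numpy as np

def graded_spd(n=32, lam1=0.001, lamn=100.0, gamma=0.65):
    """Diagonal SPD matrix with a graded spectrum (assumed test data)."""
    i = np.arange(1, n + 1)
    lam = lam1 + (i - 1) / (n - 1) * (lamn - lam1) * gamma ** (n - i)
    return np.diag(lam)

def cg_energy_errors(A, b, x_star, iters, reorth=False):
    """A-norm of the error at each CG step; reorth=True emulates exact CG
    by orthogonalizing each residual against all previous ones."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    basis = []          # normalized residuals, used only when reorth=True
    errs = []
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if reorth:
            for q in basis:
                r_new = r_new - (q @ r_new) * q
        nrm = np.linalg.norm(r_new)
        if nrm == 0.0:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        basis.append(r_new / nrm)
        r = r_new
        e = x_star - x
        errs.append(np.sqrt(e @ A @ e))
    return errs

A = graded_spd()
n = A.shape[0]
b = np.ones(n) / np.sqrt(n)
x_star = np.linalg.solve(A, b)

fp = cg_energy_errors(A, b, x_star, n)               # finite precision CG
ex = cg_energy_errors(A, b, x_star, n, reorth=True)  # "exact" CG proxy
# At the same iteration index the reorthogonalized run is far ahead;
# the horizontal gap between the two error curves is the delay.
```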

Correspondence between computed and exact approximation vectors

We see that ‖x̄k − xl‖A / ‖x − xl‖A ≪ 1, i.e., the distance between the exact and shifted FP approximations is small in comparison with the actual size of the error.

[Plots: energy norms ‖x − xl‖A, ‖x − x̄k‖A, ‖x̄k − xl‖A and the ratio ‖x̄k − xl‖A/‖x − xl‖A against l(k); schematic of the exact and finite precision trajectories from x0 toward the solution x of Ax = b in F^N, with the delay at the k-th step.]

Figure : The trajectory of approximations x̄k generated by FP CG computations follows closely the trajectory of the exact CG approximations xl, with a delay given by the rank-deficiency of the computed Krylov subspace.

Correspondence between computed and exact Krylov subspaces

The distance between subspaces is measured using principal angles ϑj and principal vectors pj, qj defined as

ϑj = min_{p∈Fj, ‖p‖=1} min_{q∈Gj, ‖q‖=1} arccos(p∗q) ≡ arccos(pj∗qj),

where

Fj ≡ K̄k(A, r0) ∩ {p1, . . . , pj−1}⊥,   Gj ≡ Kl(A, r0) ∩ {q1, . . . , qj−1}⊥.

We compute the principal angles and vectors via the SVD of the matrix U∗l Vl,

U∗l Vl = FΣG∗,

where Ul is an orthonormal basis of the l-dimensional restriction of K̄k(A, r0) and Vl is an orthonormal basis of Kl(A, r0). It holds that:

[p1, . . . , pl ] ≡ P = UlF , pj = Ul fj ,

[q1, . . . , ql ] ≡ Q = VlG , qj = Vlgj ,

diag(cos(ϑ1), . . . , cos(ϑl)) ≡ Σ.
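The computation just described can be sketched in a few lines of NumPy (the example subspaces here are synthetic, with one known angle of 30°, not the poster's Krylov subspaces):

```python
import numpy as np

def principal_angles_deg(X, Y):
    """Principal angles between span(X) and span(Y), in degrees,
    largest first, via orthonormal bases (QR) and the SVD of U*V."""
    U, _ = np.linalg.qr(X)
    V, _ = np.linalg.qr(Y)
    sigma = np.linalg.svd(U.T @ V, compute_uv=False)
    sigma = np.clip(sigma, -1.0, 1.0)   # guard against rounding above 1
    return np.degrees(np.arccos(sigma))[::-1]

# Synthetic check: span{e1, e2} against span{e1, e2 rotated by 30 degrees}.
e1, e2, e3 = np.eye(3)
X = np.column_stack([e1, e2])
Y = np.column_stack([e1, np.cos(np.pi / 6) * e2 + np.sin(np.pi / 6) * e3])
angles = principal_angles_deg(X, Y)   # largest angle ~30, smallest ~0
```

For complex data, U.T would be replaced by U.conj().T, matching the notation U∗l Vl above.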

[Plots: comparison of the principal angles ϑl, ϑl−1, ϑl−2, ϑl−3 (in degrees) of the subspaces K̄k and Kl against l(k), for the two experiments described in the caption.]

Figure : Left: In this numerical experiment, the largest principal angle is ≈ 1°, and thus the Krylov subspaces Kl(A, r0) and K̄k(A, r0) are nearly the same. Right: Things can be more complicated, as illustrated on an experiment with the matrix Bus 494 from the MatrixMarket database. The departure of the subspaces is, however, still only in a few directions.

Study of departure of Krylov subspaces

Influence of clustered eigenvalues

We have observed that the closeness of the generated Krylov subspaces depends on the possible presence of clustered eigenvalues. The tighter the cluster, the more severe the departure of the subspaces.

[Plots: largest principal angles ϑl, ϑl−1, ϑl−2, ϑl−3 (in degrees) of the subspaces K̄k and Kl against l(k), for the two cluster widths.]

Figure : Illustration of the influence of clustered eigenvalues. We plot the largest principal angles for two different settings: the five largest eigenvalues are clustered in an interval of length ∆ = 10−8 (left) and ∆ = 10−12 (right).
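A test setting of this kind can be generated along the following lines (a sketch; only the cluster width ∆ is taken from the experiment above, while the dimension, the cluster size, and the location of the remaining eigenvalues are assumed for illustration):

```python
import numpy as np

def clustered_spd(n, delta, n_cluster=5, lo=0.1, hi=100.0):
    """Diagonal SPD matrix: n - n_cluster well separated eigenvalues below hi,
    plus n_cluster eigenvalues clustered in [hi, hi + delta]."""
    separated = np.linspace(lo, hi, n - n_cluster, endpoint=False)
    cluster = hi + delta * np.linspace(0.0, 1.0, n_cluster)
    return np.diag(np.concatenate([separated, cluster]))

# The two settings compared above: cluster width 1e-8 vs 1e-12.
A_loose = clustered_spd(30, 1e-8)
A_tight = clustered_spd(30, 1e-12)
# Running FP CG on both (same right-hand side) and measuring principal
# angles against a reorthogonalized run exposes the effect: the tighter
# the cluster, the larger the departure of the subspaces.
```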

A similar phenomenon of the loss and recapture of correlation between Arnoldi vectors, computed by two implementations of the Arnoldi algorithm in finite precision arithmetic, was observed in [4].

Concluding remarks

I The trajectories of computed approximations are enclosed in a shrinking “cone”.

I Krylov subspaces are in general sensitive to small perturbations of the matrix A. The observed “stability” (or inertia) of the computed Krylov subspaces represents a phenomenon which needs further investigation.

I There is a principal difference between the analysis of short and long recurrences. Using short recurrences, we cannot guarantee that the computed basis is well conditioned and that the computed subspaces have full dimension.

Acknowledgement

This work is supported by the ERC-CZ project LL1202, by the GAČR grant 201/13-06684S and by the GAUK grant 695612.

Bibliography

[1] Carpraux, J.-F., Godunov, S. K. and Kuznetsov, S. V. (1996). Condition number of the Krylov bases and subspaces. Linear Algebra and its Applications 265, 1–28.

[2] Paige, C. C. and Van Dooren, P. (1999). Sensitivity analysis of the Lanczos reduction. Numerical Linear Algebra with Applications 6, 29–50.

[3] Stewart, G. W. (2002). Backward error bounds for approximate Krylov subspaces. Linear Algebra and its Applications 340, 81–86.

[4] Okša, G. and Rozložník, M. (2011). On numerical behavior of the Arnoldi algorithm in finite precision for matrices with close eigenvalues. In SNA’11: Seminar on Numerical Analysis 2011, Institute of Geonics AS CR, Ostrava, 88–91.

http://more.karlin.mff.cuni.cz