Iterative Methods for Sparse Linear Systems
Yousef Saad
Copyright © 2000 by Yousef Saad.
SECOND EDITION WITH CORRECTIONS. JANUARY 3RD, 2000.
PREFACE xiii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Suggestions for Teaching . . . . . . . . . . . . . . . . . . . . . . . . . xv
1 BACKGROUND IN LINEAR ALGEBRA 1
1.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Square Matrices and Eigenvalues . . . . . . . . . . . . . . . . . . . . . 3
1.3 Types of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Vector Inner Products and Norms . . . . . . . . . . . . . . . . . . . . . 6
1.5 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 Subspaces, Range, and Kernel . . . . . . . . . . . . . . . . . . . . . . . 9
1.7 Orthogonal Vectors and Subspaces . . . . . . . . . . . . . . . . . . . . 10
1.8 Canonical Forms of Matrices . . . . . . . . . . . . . . . . . . . . . . . 15
1.8.1 Reduction to the Diagonal Form . . . . . . . . . . . . . . . . . 15
1.8.2 The Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . 16
1.8.3 The Schur Canonical Form . . . . . . . . . . . . . . . . . . . . 17
1.8.4 Application to Powers of Matrices . . . . . . . . . . . . . . . . 19
1.9 Normal and Hermitian Matrices . . . . . . . . . . . . . . . . . . . . . . 21
1.9.1 Normal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.9.2 Hermitian Matrices . . . . . . . . . . . . . . . . . . . . . . . . 24
1.10 Nonnegative Matrices, M-Matrices . . . . . . . . . . . . . . . . . . . . 26
1.11 Positive-Definite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.12 Projection Operators . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.12.1 Range and Null Space of a Projector . . . . . . . . . . . . . . . 33
1.12.2 Matrix Representations . . . . . . . . . . . . . . . . . . . . . . 35
1.12.3 Orthogonal and Oblique Projectors . . . . . . . . . . . . . . . . 35
1.12.4 Properties of Orthogonal Projectors . . . . . . . . . . . . . . . . 37
1.13 Basic Concepts in Linear Systems . . . . . . . . . . . . . . . . . . . . . 38
1.13.1 Existence of a Solution . . . . . . . . . . . . . . . . . . . . . . 38
1.13.2 Perturbation Analysis . . . . . . . . . . . . . . . . . . . . . . . 39
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2 DISCRETIZATION OF PDES 44
2.1 Partial Differential Equations . . . . . . . . . . . . . . . . . . . . . . . 44
2.1.1 Elliptic Operators . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.1.2 The Convection Diffusion Equation . . . . . . . . . . . . . . . 47
2.2 Finite Difference Methods . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2.1 Basic Approximations . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.2 Difference Schemes for the Laplacean Operator . . . . . . . . . 49
2.2.3 Finite Differences for 1-D Problems . . . . . . . . . . . . . . . 51
2.2.4 Upwind Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.2.5 Finite Differences for 2-D Problems . . . . . . . . . . . . . . . 54
2.3 The Finite Element Method . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4 Mesh Generation and Refinement . . . . . . . . . . . . . . . . . . . . . 61
2.5 Finite Volume Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3 SPARSE MATRICES 68
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2 Graph Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.2.1 Graphs and Adjacency Graphs . . . . . . . . . . . . . . . . . . 70
3.2.2 Graphs of PDE Matrices . . . . . . . . . . . . . . . . . . . . . 72
3.3 Permutations and Reorderings . . . . . . . . . . . . . . . . . . . . . . . 72
3.3.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3.2 Relations with the Adjacency Graph . . . . . . . . . . . . . . . 75
3.3.3 Common Reorderings . . . . . . . . . . . . . . . . . . . . . . . 75
3.3.4 Irreducibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.4 Storage Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.5 Basic Sparse Matrix Operations . . . . . . . . . . . . . . . . . . . . . . 87
3.6 Sparse Direct Solution Methods . . . . . . . . . . . . . . . . . . . . . . 88
3.7 Test Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4 BASIC ITERATIVE METHODS 95
4.1 Jacobi, Gauss-Seidel, and SOR . . . . . . . . . . . . . . . . . . . . . . 95
4.1.1 Block Relaxation Schemes . . . . . . . . . . . . . . . . . . . . 98
4.1.2 Iteration Matrices and Preconditioning . . . . . . . . . . . . . . 102
4.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.2.1 General Convergence Result . . . . . . . . . . . . . . . . . . . 104
4.2.2 Regular Splittings . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.2.3 Diagonally Dominant Matrices . . . . . . . . . . . . . . . . . . 108
4.2.4 Symmetric Positive Definite Matrices . . . . . . . . . . . . . . 112
4.2.5 Property A and Consistent Orderings . . . . . . . . . . . . . . . 112
4.3 Alternating Direction Methods . . . . . . . . . . . . . . . . . . . . . . 116
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5 PROJECTION METHODS 122
5.1 Basic Definitions and Algorithms . . . . . . . . . . . . . . . . . . . . . 122
5.1.1 General Projection Methods . . . . . . . . . . . . . . . . . . . 123
5.1.2 Matrix Representation . . . . . . . . . . . . . . . . . . . . . . . 124
5.2 General Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.2.1 Two Optimality Results . . . . . . . . . . . . . . . . . . . . . . 126
5.2.2 Interpretation in Terms of Projectors . . . . . . . . . . . . . . . 127
5.2.3 General Error Bound . . . . . . . . . . . . . . . . . . . . . . . 129
5.3 One-Dimensional Projection Processes . . . . . . . . . . . . . . . . . . 131
5.3.1 Steepest Descent . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.3.2 Minimal Residual (MR) Iteration . . . . . . . . . . . . . . . . . 134
5.3.3 Residual Norm Steepest Descent . . . . . . . . . . . . . . . . . 136
5.4 Additive and Multiplicative Processes . . . . . . . . . . . . . . . . . . . 136
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6 KRYLOV SUBSPACE METHODS PART I 144
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.2 Krylov Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.3 Arnoldi's Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.3.1 The Basic Algorithm . . . . . . . . . . . . . . . . . . . . . . . 147
6.3.2 Practical Implementations . . . . . . . . . . . . . . . . . . . . . 149
6.4 Arnoldi's Method for Linear Systems (FOM) . . . . . . . . . . . . . . . 152
6.4.1 Variation 1: Restarted FOM . . . . . . . . . . . . . . . . . . . . 154
6.4.2 Variation 2: IOM and DIOM . . . . . . . . . . . . . . . . . . . 155
6.5 GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.5.1 The Basic GMRES Algorithm . . . . . . . . . . . . . . . . . . 158
6.5.2 The Householder Version . . . . . . . . . . . . . . . . . . . . . 159
6.5.3 Practical Implementation Issues . . . . . . . . . . . . . . . . . 161
6.5.4 Breakdown of GMRES . . . . . . . . . . . . . . . . . . . . . . 165
6.5.5 Relations between FOM and GMRES . . . . . . . . . . . . . . 165
6.5.6 Variation 1: Restarting . . . . . . . . . . . . . . . . . . . . . . 168
6.5.7 Variation 2: Truncated GMRES Versions . . . . . . . . . . . . . 169
6.6 The Symmetric Lanczos Algorithm . . . . . . . . . . . . . . . . . . . . 174
6.6.1 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.6.2 Relation with Orthogonal Polynomials . . . . . . . . . . . . . . 175
6.7 The Conjugate Gradient Algorithm . . . . . . . . . . . . . . . . . . . . 176
6.7.1 Derivation and Theory . . . . . . . . . . . . . . . . . . . . . . 176
6.7.2 Alternative Formulations . . . . . . . . . . . . . . . . . . . . . 180
6.7.3 Eigenvalue Estimates from the CG Coefficients . . . . . . . . . 181
6.8 The Conjugate Residual Method . . . . . . . . . . . . . . . . . . . . . 183
6.9 GCR, ORTHOMIN, and ORTHODIR . . . . . . . . . . . . . . . . . . . 183
6.10 The Faber-Manteuffel Theorem . . . . . . . . . . . . . . . . . . . . . . 186
6.11 Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.11.1 Real Chebyshev Polynomials . . . . . . . . . . . . . . . . . . . 188
6.11.2 Complex Chebyshev Polynomials . . . . . . . . . . . . . . . . 189
6.11.3 Convergence of the CG Algorithm . . . . . . . . . . . . . . . . 193
6.11.4 Convergence of GMRES . . . . . . . . . . . . . . . . . . . . . 194
6.12 Block Krylov Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
7 KRYLOV SUBSPACE METHODS PART II 205
7.1 Lanczos Biorthogonalization . . . . . . . . . . . . . . . . . . . . . . . 205
7.1.1 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
7.1.2 Practical Implementations . . . . . . . . . . . . . . . . . . . . . 208
7.2 The Lanczos Algorithm for Linear Systems . . . . . . . . . . . . . . . . 210
7.3 The BCG and QMR Algorithms . . . . . . . . . . . . . . . . . . . . . . 210
7.3.1 The Biconjugate Gradient Algorithm . . . . . . . . . . . . . . . 211
7.3.2 Quasi-Minimal Residual Algorithm . . . . . . . . . . . . . . . 212
7.4 Transpose-Free Variants . . . . . . . . . . . . . . . . . . . . . . . . . . 214
7.4.1 Conjugate Gradient Squared . . . . . . . . . . . . . . . . . . . 215
7.4.2 BICGSTAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.4.3 Transpose-Free QMR (TFQMR) . . . . . . . . . . . . . . . . . 221
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
8 METHODS RELATED TO THE NORMAL EQUATIONS 230
8.1 The Normal Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8.2 Row Projection Methods . . . . . . . . . . . . . . . . . . . . . . . . . 232
8.2.1 Gauss-Seidel on the Normal Equations . . . . . . . . . . . . . . 232
8.2.2 Cimmino's Method . . . . . . . . . . . . . . . . . . . . . . . . 234
8.3 Conjugate Gradient and Normal Equations . . . . . . . . . . . . . . . . 237
8.3.1 CGNR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8.3.2 CGNE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.4 Saddle-Point Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
9 PRECONDITIONED ITERATIONS 245
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
9.2 Preconditioned Conjugate Gradient . . . . . . . . . . . . . . . . . . . . 246
9.2.1 Preserving Symmetry . . . . . . . . . . . . . . . . . . . . . . . 246
9.2.2 Efficient Implementations . . . . . . . . . . . . . . . . . . . . . 249
9.3 Preconditioned GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . 251
9.3.1 Left-Preconditioned GMRES . . . . . . . . . . . . . . . . . . . 251
9.3.2 Right-Preconditioned GMRES . . . . . . . . . . . . . . . . . . 253
9.3.3 Split Preconditioning . . . . . . . . . . . . . . . . . . . . . . . 254
9.3.4 Comparison of Right and Left Preconditioning . . . . . . . . . . 255
9.4 Flexible Variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
9.4.1 Flexible GMRES . . . . . . . . . . . . . . . . . . . . . . . . . 256
9.4.2 DQGMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
9.5 Preconditioned CG for the Normal Equations . . . . . . . . . . . . . . . 260
9.6 The CGW Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
10 PRECONDITIONING TECHNIQUES 265
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
10.2 Jacobi, SOR, and SSOR Preconditioners . . . . . . . . . . . . . . . . . 266
10.3 ILU Factorization Preconditioners . . . . . . . . . . . . . . . . . . . . 269
10.3.1 Incomplete LU Factorizations . . . . . . . . . . . . . . . . . . . 270
10.3.2 Zero Fill-in ILU (ILU(0)) . . . . . . . . . . . . . . . . . . . . . 275
10.3.3 Level of Fill and ILU(p) . . . . . . . . . . . . . . . . . . . . . . 278
10.3.4 Matrices with Regular Structure . . . . . . . . . . . . . . . . . 281
10.3.5 Modified ILU (MILU) . . . . . . . . . . . . . . . . . . . . . . 286
10.4 Threshold Strategies and ILUT . . . . . . . . . . . . . . . . . . . . . . 287
10.4.1 The ILUT Approach . . . . . . . . . . . . . . . . . . . . . . . 288
10.4.2 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
10.4.3 Implementation Details . . . . . . . . . . . . . . . . . . . . . . 292
10.4.4 The ILUTP Approach . . . . . . . . . . . . . . . . . . . . . . . 294
10.4.5 The ILUS Approach . . . . . . . . . . . . . . . . . . . . . . . . 296
10.5 Approximate Inverse Preconditioners . . . . . . . . . . . . . . . . . . . 298
10.5.1 Approximating the Inverse of a Sparse Matrix . . . . . . . . . . 299
10.5.2 Global Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10.5.3 Column-Oriented Algorithms . . . . . . . . . . . . . . . . . . . 301
10.5.4 Theoretical Considerations . . . . . . . . . . . . . . . . . . . . 303
10.5.5 Convergence of Self Preconditioned MR . . . . . . . . . . . . . 305
10.5.6 Factored Approximate Inverses . . . . . . . . . . . . . . . . . . 307
10.5.7 Improving a Preconditioner . . . . . . . . . . . . . . . . . . . . 310
10.6 Block Preconditioners . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
10.6.1 Block-Tridiagonal Matrices . . . . . . . . . . . . . . . . . . . . 311
10.6.2 General Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.7 Preconditioners for the Normal Equations . . . . . . . . . . . . . . . . 313
10.7.1 Jacobi, SOR, and Variants . . . . . . . . . . . . . . . . . . . . . 313
10.7.2 IC(0) for the Normal Equations . . . . . . . . . . . . . . . . . . 314
10.7.3 Incomplete Gram-Schmidt and ILQ . . . . . . . . . . . . . . . 316
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
11 PARALLEL IMPLEMENTATIONS 324
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
11.2 Forms of Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
11.2.1 Multiple Functional Units . . . . . . . . . . . . . . . . . . . . . 325
11.2.2 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.2.3 Vector Processors . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.2.4 Multiprocessing and Distributed Computing . . . . . . . . . . . 326
11.3 Types of Parallel Architectures . . . . . . . . . . . . . . . . . . . . . . 327
11.3.1 Shared Memory Computers . . . . . . . . . . . . . . . . . . . . 327
11.3.2 Distributed Memory Architectures . . . . . . . . . . . . . . . . 329
11.4 Types of Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.4.1 Preconditioned CG . . . . . . . . . . . . . . . . . . . . . . . . 332
11.4.2 GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
11.4.3 Vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . 333
11.4.4 Reverse Communication . . . . . . . . . . . . . . . . . . . . . 334
11.5 Matrix-by-Vector Products . . . . . . . . . . . . . . . . . . . . . . . . 335
11.5.1 The Case of Dense Matrices . . . . . . . . . . . . . . . . . . . 335
11.5.2 The CSR and CSC Formats . . . . . . . . . . . . . . . . . . . . 336
11.5.3 Matvecs in the Diagonal Format . . . . . . . . . . . . . . . . . 339
11.5.4 The Ellpack-Itpack Format . . . . . . . . . . . . . . . . . . . . 340
11.5.5 The Jagged Diagonal Format . . . . . . . . . . . . . . . . . . . 341
11.5.6 The Case of Distributed Sparse Matrices . . . . . . . . . . . . . 342
11.6 Standard Preconditioning Operations . . . . . . . . . . . . . . . . . . . 345
11.6.1 Parallelism in Forward Sweeps . . . . . . . . . . . . . . . . . . 346
11.6.2 Level Scheduling: the Case of 5-Point Matrices . . . . . . . . . 346
11.6.3 Level Scheduling for Irregular Graphs . . . . . . . . . . . . . . 347
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
12 PARALLEL PRECONDITIONERS 353
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
12.2 Block-Jacobi Preconditioners . . . . . . . . . . . . . . . . . . . . . . . 354
12.3 Polynomial Preconditioners . . . . . . . . . . . . . . . . . . . . . . . . 356
12.3.1 Neumann Polynomials . . . . . . . . . . . . . . . . . . . . . . 356
12.3.2 Chebyshev Polynomials . . . . . . . . . . . . . . . . . . . . . . 357
12.3.3 Least-Squares Polynomials . . . . . . . . . . . . . . . . . . . . 360
12.3.4 The Nonsymmetric Case . . . . . . . . . . . . . . . . . . . . . 363
12.4 Multicoloring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
12.4.1 Red-Black Ordering . . . . . . . . . . . . . . . . . . . . . . . . 366
12.4.2 Solution of Red-Black Systems . . . . . . . . . . . . . . . . . . 367
12.4.3 Multicoloring for General Sparse Matrices . . . . . . . . . . . . 368
12.5 Multi-Elimination ILU . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
12.5.1 Multi-Elimination . . . . . . . . . . . . . . . . . . . . . . . . . 370
12.5.2 ILUM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
12.6 Distributed ILU and SSOR . . . . . . . . . . . . . . . . . . . . . . . . 374
12.6.1 Distributed Sparse Matrices . . . . . . . . . . . . . . . . . . . . 374
12.7 Other Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
12.7.1 Approximate Inverses . . . . . . . . . . . . . . . . . . . . . . . 377
12.7.2 Element-by-Element Techniques . . . . . . . . . . . . . . . . . 377
12.7.3 Parallel Row Projection Preconditioners . . . . . . . . . . . . . 379
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
13 DOMAIN DECOMPOSITION METHODS 383
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
13.1.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
13.1.2 Types of Partitionings . . . . . . . . . . . . . . . . . . . . . . . 385
13.1.3 Types of Techniques . . . . . . . . . . . . . . . . . . . . . . . . 386
13.2 Direct Solution and the Schur Complement . . . . . . . . . . . . . . . . 388
13.2.1 Block Gaussian Elimination . . . . . . . . . . . . . . . . . . . 388
13.2.2 Properties of the Schur Complement . . . . . . . . . . . . . . . 389
13.2.3 Schur Complement for Vertex-Based Partitionings . . . . . . . . 390
13.2.4 Schur Complement for Finite-Element Partitionings . . . . . . . 393
13.3 Schwarz Alternating Procedures . . . . . . . . . . . . . . . . . . . . . . 395
13.3.1 Multiplicative Schwarz Procedure . . . . . . . . . . . . . . . . 395
13.3.2 Multiplicative Schwarz Preconditioning . . . . . . . . . . . . . 400
13.3.3 Additive Schwarz Procedure . . . . . . . . . . . . . . . . . . . 402
13.3.4 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
13.4 Schur Complement Approaches . . . . . . . . . . . . . . . . . . . . . . 408
13.4.1 Induced Preconditioners . . . . . . . . . . . . . . . . . . . . . . 408
13.4.2 Probing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
13.4.3 Preconditioning Vertex-Based Schur Complements . . . . . . . 411
13.5 Full Matrix Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
13.6 Graph Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
13.6.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 414
13.6.2 Geometric Approach . . . . . . . . . . . . . . . . . . . . . . . 415
13.6.3 Spectral Techniques . . . . . . . . . . . . . . . . . . . . . . . . 417
13.6.4 Graph Theory Techniques . . . . . . . . . . . . . . . . . . . . . 418
Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
REFERENCES 425
INDEX 439
PREFACE
Iterative methods for solving general, large sparse linear systems have been gaining
popularity in many areas of scientific computing. Until recently, direct solution methods
were often preferred to iterative methods in real applications because of their robustness
and predictable behavior. However, a number of efficient iterative solvers were discovered
and the increased need for solving very large linear systems triggered a noticeable and
rapid shift toward iterative techniques in many applications.
This trend can be traced back to the 1960s and 1970s when two important develop-
ments revolutionized solution methods for large linear systems. First was the realization
that one can take advantage of sparsity to design special direct methods that can be
quite economical. Initiated by electrical engineers, these direct sparse solution methods
led to the development of reliable and efficient general-purpose direct solution software
codes over the next three decades. Second was the emergence of preconditioned conjugate
gradient-like methods for solving linear systems. It was found that the combination of pre-
conditioning and Krylov subspace iterations could provide efficient and simple general-purpose procedures that could compete with direct solvers. Preconditioning involves ex-
ploiting ideas from sparse direct solvers. Gradually, iterative methods started to approach
the quality of direct solvers. In earlier times, iterative methods were often special-purpose
in nature. They were developed with certain applications in mind, and their efficiency relied
on many problem-dependent parameters.
Now, three-dimensional models are commonplace and iterative methods are al-
most mandatory. The memory and the computational requirements for solving three-
dimensional Partial Differential Equations, or two-dimensional ones involving many
degrees of freedom per point, may seriously challenge the most efficient direct solvers
available today. Also, iterative methods are gaining ground because they are easier to implement efficiently on high-performance computers than direct methods.
My intention in writing this volume is to provide up-to-date coverage of iterative meth-
ods for solving large sparse linear systems. I focused the book on practical methods that
work for general sparse matrices rather than for any specific class of problems. It is indeed
becoming important to embrace applications not necessarily governed by Partial Differ-
ential Equations, as these applications are on the rise. Apart from two recent volumes by
Axelsson [15] and Hackbusch [116], few books on iterative methods have appeared since
the excellent ones by Varga [213] and later Young [232]. Since then, researchers and practitioners have achieved remarkable progress in the development and use of effective iterative methods. Unfortunately, fewer elegant results have been discovered since the 1950s
and 1960s. The field has moved in other directions. Methods have gained not only in effi-
ciency but also in robustness and in generality. The traditional techniques which required
rather complicated procedures to determine optimal acceleration parameters have yielded
to the parameter-free conjugate gradient class of methods.
The primary aim of this book is to describe some of the best techniques available today,
from both preconditioners and accelerators. One of the aims of the book is to provide a
good mix of theory and practice. It also addresses some of the current research issues such as parallel implementations and robust preconditioners. The emphasis is on Krylov
subspace methods, currently the most practical and common group of techniques used in
applications. Although there is a tutorial chapter that covers the discretization of Partial
Differential Equations, the book is not biased toward any specific application area. Instead,
the matrices are assumed to be general sparse, possibly irregularly structured.
The book has been structured in four distinct parts. The first part, Chapters 1 to 4,
presents the basic tools. The second part, Chapters 5 to 8, presents projection methods and
Krylov subspace techniques. The third part, Chapters 9 and 10, discusses precondition-
ing. The fourth part, Chapters 11 to 13, discusses parallel implementations and parallel
algorithms.
I am grateful to a number of colleagues who proofread or reviewed different versions of
the manuscript. Among them are Randy Bramley (University of Indiana at Bloomington), Xiao-Chuan Cai (University of Colorado at Boulder), Tony Chan (University of California
at Los Angeles), Jane Cullum (IBM, Yorktown Heights), Alan Edelman (Massachusetts Institute of Technology), Paul Fischer (Brown University), David Keyes (Old Dominion
University), Beresford Parlett (University of California at Berkeley) and Shang-Hua Teng
(University of Minnesota). Their numerous comments, corrections, and encouragements
were a highly appreciated contribution. In particular, they helped improve the presenta-
tion considerably and prompted the addition of a number of topics missing from earlier
versions.
This book evolved from several successive improvements of a set of lecture notes for
the course Iterative Methods for Linear Systems which I taught at the University of Minnesota in the last few years. I apologize to those students who used the earlier error-laden
and incomplete manuscripts. Their input and criticism contributed significantly to improv-
ing the manuscript. I also wish to thank those students at MIT (with Alan Edelman) and
UCLA (with Tony Chan) who used this book in manuscript form and provided helpful
feedback. My colleagues at the University of Minnesota, staff and faculty members, have
helped in different ways. I wish to thank in particular Ahmed Sameh for his encourage-
ments and for fostering a productive environment in the department. Finally, I am grateful
to the National Science Foundation for their continued financial support of my research,
part of which is represented in this work.
Yousef Saad
SUGGESTIONS FOR TEACHING
This book can be used as a text to teach a graduate-level course on iterative methods for
linear systems. Selecting topics to teach depends on whether the course is taught in a
mathematics department or a computer science (or engineering) department, and whether
the course is over a semester or a quarter. Here are a few comments on the relevance of the
topics in each chapter.
For a graduate course in a mathematics department, much of the material in Chapter 1
should be known already. For non-mathematics majors most of the chapter must be covered
or reviewed to acquire a good background for later chapters. The important topics for
the rest of the book are in Sections: 1.8.1, 1.8.3, 1.8.4, 1.9, 1.11. Section 1.12 is best
treated at the beginning of Chapter 5. Chapter 2 is essentially independent from the rest and could be skipped altogether in a quarter course. One lecture on finite differences and
the resulting matrices would be enough for a non-math course. Chapter 3 should make
the student familiar with some implementation issues associated with iterative solution
procedures for general sparse matrices. In a computer science or engineering department,
this can be very relevant. For mathematicians, a mention of the graph theory aspects of
sparse matrices and a few storage schemes may be sufficient. Most students at this level
should be familiar with a few of the elementary relaxation techniques covered in Chapter
4. The convergence theory can be skipped for non-math majors. These methods are now
often used as preconditioners and this may be the only motive for covering them.
Chapter 5 introduces key concepts and presents projection techniques in general terms. Non-mathematicians may wish to skip Section 5.2.3. Otherwise, it is recommended to
start the theory section by going back to Section 1.12 on general definitions on projectors.
Chapters 6 and 7 represent the heart of the matter. It is recommended to describe the first
algorithms carefully and put emphasis on the fact that they generalize the one-dimensional
methods covered in Chapter 5. It is also important to stress the optimality properties of
those methods in Chapter 6 and the fact that these follow immediately from the properties
of projectors seen in Section 1.12. When covering the algorithms in Chapter 7, it is crucial
to point out the main differences between them and those seen in Chapter 6. The variants
such as CGS, BICGSTAB, and TFQMR can be covered in a short time, omitting details of the algebraic derivations or covering only one of the three. The class of methods based on
the normal equation approach, i.e., Chapter 8, can be skipped in a math-oriented course,
especially in the case of a quarter system. For a semester course, selected topics may be
Sections 8.1, 8.2, and 8.4.
Currently, preconditioning is known to be the critical ingredient in the success of it-
erative methods in solving real-life problems. Therefore, at least some parts of Chapter 9
and Chapter 10 should be covered. Section 9.2 and (very briefly) 9.3 are recommended.
From Chapter 10, discuss the basic ideas in Sections 10.1 through 10.3. The rest could be
skipped in a quarter course.
Chapter 11 may be useful to present to computer science majors, but may be skimmed or skipped in a mathematics or an engineering course. Parts of Chapter 12 could be taught
primarily to make the students aware of the importance of alternative preconditioners.
Suggested selections are: 12.2, 12.4, and 12.7.2 (for engineers). Chapter 13 presents an im-
1 BACKGROUND IN LINEAR ALGEBRA

1.1 Matrices
For the sake of generality, all vector spaces considered in this chapter are complex, unless otherwise stated. A complex $n \times m$ matrix $A$ is an $n \times m$ array of complex numbers
$$a_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m.$$
The set of all $n \times m$ matrices is a complex vector space denoted by $\mathbb{C}^{n \times m}$. The main operations with matrices are the following:

Addition: $C = A + B$, where the matrices $A$, $B$, and $C$ are of size $n \times m$ and $c_{ij} = a_{ij} + b_{ij}$.

Multiplication by a scalar: $C = \alpha A$, where $c_{ij} = \alpha\, a_{ij}$.

Multiplication by another matrix: $C = AB$, where $A \in \mathbb{C}^{n \times m}$, $B \in \mathbb{C}^{m \times p}$, $C \in \mathbb{C}^{n \times p}$, and $c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}$.

Sometimes, a notation with column vectors and row vectors is used. The column vector $a_{*j}$ is the vector consisting of the $j$-th column of $A$. Similarly, the notation $a_{i*}$ will denote the $i$-th row of the matrix $A$. For example, $A$ could be written as the row of its columns, $A = (a_{*1}, a_{*2}, \ldots, a_{*m})$, or as the column of its rows.

The transpose of a matrix $A$ in $\mathbb{C}^{n \times m}$ is a matrix $C$ in $\mathbb{C}^{m \times n}$ whose elements are defined by $c_{ij} = a_{ji}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$. It is denoted by $A^T$. It is often more relevant to use the transpose conjugate matrix denoted by $A^H$ and defined by
$$A^H = \bar{A}^T = \overline{A^T},$$
in which the bar denotes the (element-wise) complex conjugation.

Matrices are strongly related to linear mappings between vector spaces of finite dimension. This is because they represent these mappings with respect to two given bases: one for the initial vector space and the other for the image vector space, or range of $A$.
1.2 Square Matrices and Eigenvalues

A matrix is square if it has the same number of columns and rows, i.e., if $n = m$. An important square matrix is the identity matrix
$$I = \{\delta_{ij}\}_{i,j = 1, \ldots, n},$$
where $\delta_{ij}$ is the Kronecker symbol. The identity matrix satisfies the equality $AI = IA = A$ for every matrix $A$ of size $n$. The inverse of a matrix, when it exists, is a matrix $C$ such that
$$CA = AC = I.$$
The inverse of $A$ is denoted by $A^{-1}$.

The determinant of a matrix may be defined in several ways. For simplicity, the following recursive definition is used here. The determinant of a $1 \times 1$ matrix $(a)$ is defined as the scalar $a$. Then the determinant of an $n \times n$ matrix is given by
$$\det(A) = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} \det(A_{1j}),$$
where $A_{1j}$ is an $(n-1) \times (n-1)$ matrix obtained by deleting the first row and the $j$-th column of $A$. A matrix is said to be singular when $\det(A) = 0$ and nonsingular otherwise.
We have the following simple properties:
$\det(AB) = \det(A)\det(B)$,
$\det(A^T) = \det(A)$,
$\det(\alpha A) = \alpha^n \det(A)$,
$\det(\bar{A}) = \overline{\det(A)}$,
$\det(I) = 1$.
From the above definition of determinants it can be shown by induction that the function that maps a given complex value $\lambda$ to the value $p_A(\lambda) = \det(A - \lambda I)$ is a polynomial of degree $n$; see Exercise 8. This is known as the characteristic polynomial of the matrix $A$.

A complex scalar $\lambda$ is called an eigenvalue of the square matrix $A$ if a nonzero vector $u$ of $\mathbb{C}^n$ exists such that $Au = \lambda u$. The vector $u$ is called an eigenvector of $A$ associated with $\lambda$. The set of all the eigenvalues of $A$ is called the spectrum of $A$ and is denoted by $\sigma(A)$.

A scalar $\lambda$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I) = 0$. That is true if and only if (iff thereafter) $\lambda$ is a root of the characteristic polynomial. In particular, there are at most $n$ distinct eigenvalues.

It is clear that a matrix is singular if and only if it admits zero as an eigenvalue. A well known result in linear algebra is stated in the following proposition.

A matrix $A$ is nonsingular if and only if it admits an inverse.
Thus, the determinant of a matrix determines whether or not the matrix admits an inverse.
The maximum modulus of the eigenvalues is called the spectral radius and is denoted by
$$\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|.$$
The trace of a matrix is equal to the sum of all its diagonal elements,
$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$
It can be easily shown that the trace of $A$ is also equal to the sum of the eigenvalues of $A$ counted with their multiplicities as roots of the characteristic polynomial.

If $\lambda$ is an eigenvalue of $A$, then $\bar{\lambda}$ is an eigenvalue of $A^H$. An eigenvector $w$ of $A^H$ associated with the eigenvalue $\bar{\lambda}$ is called a left eigenvector of $A$. When a distinction is necessary, an eigenvector of $A$ is often called a right eigenvector. Therefore, the eigenvalue $\lambda$ as well as the right and left eigenvectors, $u$ and $w$, satisfy the relations
$$A u = \lambda u, \qquad w^H A = \lambda w^H,$$
or, equivalently,
$$u^H A^H = \bar{\lambda}\, u^H, \qquad A^H w = \bar{\lambda}\, w.$$
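As a quick numerical illustration (added here; not part of the original text, and assuming NumPy), the following sketch checks on a random complex matrix that the trace equals the sum of the eigenvalues, computes the spectral radius, and verifies the left and right eigenvector relations; the variable names are arbitrary.

```python
import numpy as np

# Added illustration: trace = sum of eigenvalues, spectral radius, and the
# left/right eigenvector relations, checked on a random complex matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

eigvals, right_vecs = np.linalg.eig(A)
print(np.isclose(np.trace(A), eigvals.sum()))          # trace = sum of eigenvalues
print(np.max(np.abs(eigvals)))                         # spectral radius rho(A)

# A left eigenvector w satisfies w^H A = lambda w^H, i.e. A^H w = conj(lambda) w.
lam, u = eigvals[0], right_vecs[:, 0]
eigvals_H, W = np.linalg.eig(A.conj().T)
w = W[:, np.argmin(np.abs(eigvals_H - np.conj(lam)))]  # eigenvector for conj(lambda)
print(np.allclose(A @ u, lam * u))                     # right eigenvector relation
print(np.allclose(w.conj() @ A, lam * w.conj()))       # left eigenvector relation
```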
1.3 Types of Matrices

The choice of a method for solving linear systems will often depend on the structure of the matrix $A$. One of the most important properties of matrices is symmetry, because of its impact on the eigenstructure of $A$. A number of other classes of matrices also have particular eigenstructures. The most important ones are listed below:
Symmetric matrices: $A^T = A$.
Hermitian matrices: $A^H = A$.
Skew-symmetric matrices: $A^T = -A$.
Skew-Hermitian matrices: $A^H = -A$.
Normal matrices: $A^H A = A A^H$.
Nonnegative matrices: $a_{ij} \ge 0$, $i, j = 1, \ldots, n$ (similar definition for nonpositive, positive, and negative matrices).
Unitary matrices: $Q^H Q = I$.
It is worth noting that a unitary matrix $Q$ is a matrix whose inverse is its transpose conjugate $Q^H$, since
$$Q^H Q = I \quad \rightarrow \quad Q^{-1} = Q^H.$$
A matrix $Q$ such that $Q^H Q$ is diagonal is often called orthogonal.
Some matrices have particular structures that are often convenient for computational
purposes. The following list, though incomplete, gives an idea of these special matrices
which play an important role in numerical analysis and scientific computing applications.
Diagonal matrices: $a_{ij} = 0$ for $j \ne i$. Notation: $A = \mathrm{diag}(a_{11}, a_{22}, \ldots, a_{nn})$.
Upper triangular matrices: $a_{ij} = 0$ for $i > j$.
Lower triangular matrices: $a_{ij} = 0$ for $i < j$.
Upper bidiagonal matrices: $a_{ij} = 0$ for $j \ne i$ and $j \ne i + 1$.
Lower bidiagonal matrices: $a_{ij} = 0$ for $j \ne i$ and $j \ne i - 1$.
Tridiagonal matrices: $a_{ij} = 0$ for any pair $i, j$ such that $|j - i| > 1$. Notation: $A = \mathrm{tridiag}(a_{i,i-1}, a_{ii}, a_{i,i+1})$.
Banded matrices: $a_{ij} \ne 0$ only if $i - m_l \le j \le i + m_u$, where $m_l$ and $m_u$ are two nonnegative integers. The number $m_l + m_u + 1$ is called the bandwidth of $A$.
Upper Hessenberg matrices: $a_{ij} = 0$ for any pair $i, j$ such that $i > j + 1$. Lower Hessenberg matrices can be defined similarly.
Outer product matrices: $A = u v^H$, where both $u$ and $v$ are vectors.
Permutation matrices: the columns of $A$ are a permutation of the columns of the identity matrix.
Block diagonal matrices: generalizes the diagonal matrix by replacing each diagonal entry by a matrix. Notation: $A = \mathrm{diag}(A_{11}, A_{22}, \ldots, A_{nn})$.
Block tridiagonal matrices: generalizes the tridiagonal matrix by replacing each nonzero entry by a square matrix. Notation: $A = \mathrm{tridiag}(A_{i,i-1}, A_{ii}, A_{i,i+1})$.
The above properties emphasize structure, i.e., positions of the nonzero elements with
respect to the zeros. Also, they assume that there are many zero elements or that the matrix
is of low rank. This is in contrast with the classifications listed earlier, such as symmetry
or normality.
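The structural patterns above are simple to generate explicitly. The short sketch below (an illustration added here, not taken from the book, and assuming NumPy) builds a tridiagonal matrix, an upper triangular matrix, an upper Hessenberg matrix, a permutation matrix, and a rank-one outer product.

```python
import numpy as np

# Added illustration: constructing a few of the special structures listed above.
n = 5
T = np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), -1) + np.diag(-np.ones(n - 1), 1)
U = np.triu(np.random.rand(n, n))              # upper triangular
H = np.triu(np.random.rand(n, n), k=-1)        # upper Hessenberg (one subdiagonal kept)
P = np.eye(n)[:, [2, 0, 1, 4, 3]]              # columns of I permuted
A = np.outer(np.arange(1.0, n + 1), np.ones(n))  # outer product, rank one

print(np.count_nonzero(np.tril(T, -2)), np.linalg.matrix_rank(A))   # 0, 1
```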
1.4 Vector Inner Products and Norms

An inner product on a (complex) vector space $\mathbb{X}$ is any mapping $s$ from $\mathbb{X} \times \mathbb{X}$ into $\mathbb{C}$,
$$x \in \mathbb{X},\; y \in \mathbb{X} \;\rightarrow\; s(x, y) \in \mathbb{C},$$
which satisfies the following conditions:
1. $s(x, y)$ is linear with respect to $x$, i.e., $s(\lambda_1 x_1 + \lambda_2 x_2, y) = \lambda_1 s(x_1, y) + \lambda_2 s(x_2, y)$ for all $x_1, x_2, y$ and all scalars $\lambda_1, \lambda_2$.
2. $s(x, y)$ is Hermitian, i.e., $s(y, x) = \overline{s(x, y)}$ for all $x, y$.
3. $s(x, y)$ is positive definite, i.e., $s(x, x) > 0$ for all $x \ne 0$.

Note that (2) implies that $s(x, x)$ is real and therefore, (3) adds the constraint that $s(x, x)$ must also be positive for any nonzero $x$. For any $x$ and $y$,
$$s(x, 0) = s(x, 0 \cdot y) = \overline{s(0 \cdot y, x)} = \overline{0 \cdot s(y, x)} = 0.$$
Similarly, $s(0, y) = 0$ for any $y$. Hence, $s(0, y) = s(x, 0) = 0$ for any $x$ and $y$. In particular the condition (3) can be rewritten as
$$s(x, x) \ge 0 \quad \text{and} \quad s(x, x) = 0 \;\text{ iff }\; x = 0,$$
as can be readily shown. A useful relation satisfied by any inner product is the so-called Cauchy-Schwartz inequality:
$$|s(x, y)|^2 \le s(x, x)\, s(y, y). \qquad (1.2)$$
The proof of this inequality begins by expanding $s(x - \lambda y, x - \lambda y)$ using the properties of $s$,
$$s(x - \lambda y, x - \lambda y) = s(x, x) - \bar{\lambda}\, s(x, y) - \lambda\, s(y, x) + |\lambda|^2 s(y, y).$$
If $y = 0$ then the inequality is trivially satisfied. Assume that $y \ne 0$ and take $\lambda = s(x, y)/s(y, y)$. Then $s(x - \lambda y, x - \lambda y) \ge 0$ shows the inequality
$$0 \le s(x, x) - 2\,\frac{|s(x, y)|^2}{s(y, y)} + \frac{|s(x, y)|^2}{s(y, y)} = s(x, x) - \frac{|s(x, y)|^2}{s(y, y)},$$
which yields the result (1.2).
In the particular case of the vector space $\mathbb{X} = \mathbb{C}^n$, a canonical inner product is the Euclidean inner product. The Euclidean inner product of two vectors $x = (x_i)_{i=1,\ldots,n}$ and $y = (y_i)_{i=1,\ldots,n}$ of $\mathbb{C}^n$ is defined by
$$(x, y) = \sum_{i=1}^{n} x_i \bar{y}_i, \qquad (1.3)$$
which is often rewritten in matrix notation as
$$(x, y) = y^H x.$$
It is easy to verify that this mapping does indeed satisfy the three conditions required for inner products, listed above. A fundamental property of the Euclidean inner product in matrix computations is the simple relation
$$(A x, y) = (x, A^H y), \quad \forall\, x, y \in \mathbb{C}^n. \qquad (1.5)$$
The proof of this is straightforward. The adjoint of $A$ with respect to an arbitrary inner product is a matrix $B$ such that $(Ax, y) = (x, By)$ for all pairs of vectors $x$ and $y$. A matrix is self-adjoint, or Hermitian with respect to this inner product, if it is equal to its adjoint. The following proposition is a consequence of the equality (1.5).

PROPOSITION 1.3. Unitary matrices preserve the Euclidean inner product, i.e.,
$$(Q x, Q y) = (x, y)$$
for any unitary matrix $Q$ and any vectors $x$ and $y$.

Indeed, $(Qx, Qy) = (x, Q^H Q y) = (x, y)$.
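A small numerical check of these identities (an added illustration, not from the book, assuming NumPy): it verifies $(Ax, y) = (x, A^H y)$ and Proposition 1.3 for a unitary $Q$ obtained from a QR factorization.

```python
import numpy as np

# Added illustration: (x, y) = y^H x, the relation (Ax, y) = (x, A^H y),
# and preservation of the inner product by a unitary matrix (Proposition 1.3).
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

ip = lambda u, v: np.vdot(v, u)              # (u, v) = v^H u
print(np.isclose(ip(A @ x, y), ip(x, A.conj().T @ y)))   # (Ax, y) = (x, A^H y)

Q, _ = np.linalg.qr(A)                       # Q is unitary for a square full-rank A
print(np.isclose(ip(Q @ x, Q @ y), ip(x, y)))            # (Qx, Qy) = (x, y)
```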
A vector norm on a vector space $\mathbb{X}$ is a real-valued function $x \rightarrow \|x\|$ on $\mathbb{X}$, which satisfies the following three conditions:
1. $\|x\| \ge 0$ for all $x \in \mathbb{X}$, and $\|x\| = 0$ iff $x = 0$.
2. $\|\alpha x\| = |\alpha|\, \|x\|$ for all $x \in \mathbb{X}$ and $\alpha \in \mathbb{C}$.
3. $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{X}$.

For the particular case when $\mathbb{X} = \mathbb{C}^n$, we can associate with the inner product (1.3) the Euclidean norm of a complex vector defined by
$$\|x\|_2 = (x, x)^{1/2}.$$
It follows from Proposition 1.3 that a unitary matrix preserves the Euclidean norm metric, i.e.,
$$\|Q x\|_2 = \|x\|_2, \quad \forall\, x.$$
The linear transformation associated with a unitary matrix $Q$ is therefore an isometry.

The most commonly used vector norms in numerical linear algebra are special cases of the Hölder norms
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.$$
Note that the limit of $\|x\|_p$ when $p$ tends to infinity exists and is equal to the maximum modulus of the $x_i$'s. This defines a norm denoted by $\|\cdot\|_\infty$. The cases $p = 1$, $p = 2$, and $p = \infty$ lead to the most important norms in practice,
$$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|,$$
$$\|x\|_2 = \left[ |x_1|^2 + |x_2|^2 + \cdots + |x_n|^2 \right]^{1/2},$$
$$\|x\|_\infty = \max_{i = 1, \ldots, n} |x_i|.$$
The Cauchy-Schwartz inequality of (1.2) becomes
$$|(x, y)| \le \|x\|_2\, \|y\|_2.$$
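The following snippet (an added illustration, not from the book, assuming NumPy) evaluates the three norms above on a small vector and checks the Cauchy-Schwartz inequality.

```python
import numpy as np

# Added illustration: the 1-, 2-, and infinity-norms, and Cauchy-Schwartz.
x = np.array([3.0, -4.0, 0.0, 1.0])
y = np.array([1.0, 2.0, -2.0, 0.5])

print(np.linalg.norm(x, 1), np.abs(x).sum())                    # ||x||_1
print(np.linalg.norm(x, 2), np.sqrt((np.abs(x) ** 2).sum()))    # ||x||_2
print(np.linalg.norm(x, np.inf), np.abs(x).max())               # ||x||_inf
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))      # |(x, y)| <= ||x||_2 ||y||_2
```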
1.5 Matrix Norms

For a general matrix $A$ in $\mathbb{C}^{n \times m}$, we define the following special set of norms
$$\|A\|_{pq} = \max_{x \in \mathbb{C}^m,\; x \ne 0} \frac{\|A x\|_p}{\|x\|_q}. \qquad (1.7)$$
The norm $\|\cdot\|_{pq}$ is induced by the two norms $\|\cdot\|_p$ and $\|\cdot\|_q$. These norms satisfy the usual properties of norms, i.e., $\|A\| \ge 0$ with $\|A\| = 0$ iff $A = 0$, $\|\alpha A\| = |\alpha|\,\|A\|$, and $\|A + B\| \le \|A\| + \|B\|$.

The most important cases are again those associated with $p, q = 1, 2, \infty$. The case $q = p$ is of particular interest and the associated norm $\|\cdot\|_{pq}$ is simply denoted by $\|\cdot\|_p$ and called a $p$-norm. A fundamental property of a $p$-norm is that
$$\|A B\|_p \le \|A\|_p\, \|B\|_p,$$
an immediate consequence of the definition (1.7). Matrix norms that satisfy the above property are sometimes called consistent. A result of consistency is that for any square matrix $A$,
$$\|A^k\|_p \le \|A\|_p^k.$$
In particular the matrix $A^k$ converges to zero if any of its $p$-norms is less than 1.

The Frobenius norm of a matrix is defined by
$$\|A\|_F = \left( \sum_{j=1}^{m} \sum_{i=1}^{n} |a_{ij}|^2 \right)^{1/2}.$$
This can be viewed as the 2-norm of the column (or row) vector in $\mathbb{C}^{nm}$ consisting of all the columns (respectively rows) of $A$ listed from $1$ to $m$ (respectively $1$ to $n$). It can be shown that this norm is also consistent, in spite of the fact that it is not induced by a pair of vector norms, i.e., it is not derived from a formula of the form (1.7); see Exercise 5. However, it does not satisfy some of the other properties of the $p$-norms. For example, the Frobenius norm of the identity matrix is not equal to one. To avoid these difficulties, we will only use the term matrix norm for a norm that is induced by two norms as in the definition (1.7). Thus, we will not consider the Frobenius norm to be a proper matrix norm, according to our conventions, even though it is consistent.

The following equalities satisfied by the matrix norms defined above lead to alternative definitions that are often easier to work with:
$$\|A\|_1 = \max_{j = 1, \ldots, m} \sum_{i=1}^{n} |a_{ij}|,$$
$$\|A\|_\infty = \max_{i = 1, \ldots, n} \sum_{j=1}^{m} |a_{ij}|,$$
$$\|A\|_2 = \left[ \rho(A^H A) \right]^{1/2} = \left[ \rho(A A^H) \right]^{1/2}, \qquad (1.11)$$
$$\|A\|_F = \left[ \mathrm{tr}(A^H A) \right]^{1/2} = \left[ \mathrm{tr}(A A^H) \right]^{1/2}.$$
As will be shown later, the eigenvalues of $A^H A$ are nonnegative. Their square roots are called singular values of $A$ and are denoted by $\sigma_i$, $i = 1, \ldots, m$. Thus, the relation (1.11) states that $\|A\|_2$ is equal to $\sigma_1$, the largest singular value of $A$.
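These alternative definitions are easy to verify numerically; the sketch below (added here, not from the book, assuming NumPy) compares them with NumPy's built-in matrix norms.

```python
import numpy as np

# Added illustration: ||A||_1 = max column sum, ||A||_inf = max row sum,
# ||A||_2 = largest singular value, ||A||_F = sqrt(tr(A^H A)).
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

print(np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max()))
print(np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max()))
print(np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0]))
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A.T @ A))))
```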
From the relation (1.11), it is clear that the spectral radius $\rho(A)$ is equal to the 2-norm of a matrix when the matrix is Hermitian. However, it is not a matrix norm in general. For example, the first property of norms is not satisfied, since for
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$
we have $\rho(A) = 0$ while $A \ne 0$. Also, the triangle inequality is not satisfied for the pair $A$, $B = A^T$ where $A$ is defined above. Indeed,
$$\rho(A + B) = 1 \quad \text{while} \quad \rho(A) + \rho(B) = 0.$$
1.6 Subspaces, Range, and Kernel

A subspace of $\mathbb{C}^n$ is a subset of $\mathbb{C}^n$ that is also a complex vector space. The set of all linear combinations of a set of vectors $G = \{a_1, a_2, \ldots, a_q\}$ of $\mathbb{C}^n$ is a vector subspace called the linear span of $G$,
$$\mathrm{span}\{G\} = \mathrm{span}\{a_1, a_2, \ldots, a_q\} = \left\{ z \in \mathbb{C}^n \;\middle|\; z = \sum_{i=1}^{q} \alpha_i a_i,\; \alpha_i \in \mathbb{C} \right\}.$$
If the $a_i$'s are linearly independent, then each vector of $\mathrm{span}\{G\}$ admits a unique expression as a linear combination of the $a_i$'s. The set $G$ is then called a basis of the subspace $\mathrm{span}\{G\}$.

Given two vector subspaces $S_1$ and $S_2$, their sum $S$ is a subspace defined as the set of all vectors that are equal to the sum of a vector of $S_1$ and a vector of $S_2$. The intersection of two subspaces is also a subspace. If the intersection of $S_1$ and $S_2$ is reduced to $\{0\}$, then the sum of $S_1$ and $S_2$ is called their direct sum and is denoted by $S = S_1 \oplus S_2$. When $S$ is equal to $\mathbb{C}^n$, then every vector $x$ of $\mathbb{C}^n$ can be written in a unique way as the sum of an element $x_1$ of $S_1$ and an element $x_2$ of $S_2$. The transformation $P$ that maps $x$ into $x_1$ is a linear transformation that is idempotent, i.e., such that $P^2 = P$. It is called a projector onto $S_1$ along $S_2$.

Two important subspaces that are associated with a matrix $A$ of $\mathbb{C}^{n \times m}$ are its range, defined by
$$\mathrm{Ran}(A) = \{ A x \mid x \in \mathbb{C}^m \},$$
and its kernel or null space
$$\mathrm{Null}(A) = \{ x \in \mathbb{C}^m \mid A x = 0 \}.$$
The range of $A$ is clearly equal to the linear span of its columns. The rank of a matrix is equal to the dimension of the range of $A$, i.e., to the number of linearly independent columns. This column rank is equal to the row rank, the number of linearly independent rows of $A$. A matrix in $\mathbb{C}^{n \times m}$ is of full rank when its rank is equal to the smallest of $m$ and $n$.

A subspace $S$ is said to be invariant under a (square) matrix $A$ whenever $A S \subseteq S$. In particular for any eigenvalue $\lambda$ of $A$ the subspace $\mathrm{Null}(A - \lambda I)$ is invariant under $A$. The subspace $\mathrm{Null}(A - \lambda I)$ is called the eigenspace associated with $\lambda$ and consists of all the eigenvectors of $A$ associated with $\lambda$, in addition to the zero-vector.
1.7 Orthogonal Vectors and Subspaces

A set of vectors $G = \{a_1, a_2, \ldots, a_r\}$ is said to be orthogonal if
$$(a_i, a_j) = 0 \quad \text{when } i \ne j.$$
It is orthonormal if, in addition, every vector of $G$ has a 2-norm equal to unity. A vector that is orthogonal to all the vectors of a subspace $S$ is said to be orthogonal to this subspace. The set of all the vectors that are orthogonal to $S$ is a vector subspace called the orthogonal complement of $S$ and denoted by $S^\perp$. The space $\mathbb{C}^n$ is the direct sum of $S$ and its orthogonal complement. Thus, any vector $x$ can be written in a unique fashion as the sum of a vector in $S$ and a vector in $S^\perp$. The operator which maps $x$ into its component in the subspace $S$ is the orthogonal projector onto $S$.
Every subspace admits an orthonormal basis which is obtained by taking any basis and orthonormalizing it. The orthonormalization can be achieved by an algorithm known as the Gram-Schmidt process which we now describe. Given a set of linearly independent vectors $\{x_1, x_2, \ldots, x_r\}$, first normalize the vector $x_1$, which means divide it by its 2-norm, to obtain the scaled vector $q_1$ of norm unity. Then $x_2$ is orthogonalized against the vector $q_1$ by subtracting from $x_2$ a multiple of $q_1$ to make the resulting vector orthogonal to $q_1$, i.e.,
$$x_2 \leftarrow x_2 - (x_2, q_1)\, q_1.$$
The resulting vector is again normalized to yield the second vector $q_2$. The $j$-th step of the Gram-Schmidt process consists of orthogonalizing the vector $x_j$ against all previous vectors $q_i$.

ALGORITHM 1.1: Gram-Schmidt
1. Compute $r_{11} := \|x_1\|_2$. If $r_{11} = 0$ Stop, else compute $q_1 := x_1 / r_{11}$.
2. For $j = 2, \ldots, r$ Do:
3.   Compute $r_{ij} := (x_j, q_i)$, for $i = 1, 2, \ldots, j - 1$
4.   $\hat{q} := x_j - \sum_{i=1}^{j-1} r_{ij} q_i$
5.   $r_{jj} := \|\hat{q}\|_2$,
6.   If $r_{jj} = 0$ then Stop, else $q_j := \hat{q} / r_{jj}$
7. EndDo
It is easy to prove that the above algorithm will not break down, i.e., all $r$ steps will be completed if and only if the set of vectors $x_1, x_2, \ldots, x_r$ is linearly independent. From lines 4 and 5, it is clear that at every step of the algorithm the following relation holds:
$$x_j = \sum_{i=1}^{j} r_{ij}\, q_i.$$
If $X = [x_1, x_2, \ldots, x_r]$, $Q = [q_1, q_2, \ldots, q_r]$, and if $R$ denotes the $r \times r$ upper triangular matrix whose nonzero elements are the $r_{ij}$ defined in the algorithm, then the above relation can be written as
$$X = Q R.$$
This is called the QR decomposition of the $n \times r$ matrix $X$. From what was said above, the QR decomposition of a matrix exists whenever the column vectors of $X$ form a linearly independent set of vectors.
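A minimal NumPy sketch of Algorithm 1.1 is given below (an added illustration, not the book's code); it returns the factors $Q$ and $R$ of the QR decomposition and raises an error on breakdown.

```python
import numpy as np

# Added sketch of Algorithm 1.1 (classical Gram-Schmidt), producing X = Q R.
def gram_schmidt(X):
    n, r = X.shape
    Q = np.zeros((n, r), dtype=X.dtype)
    R = np.zeros((r, r), dtype=X.dtype)
    for j in range(r):
        q_hat = X[:, j].copy()
        for i in range(j):                        # lines 3-4: orthogonalize x_j
            R[i, j] = np.vdot(Q[:, i], X[:, j])   # r_ij = (x_j, q_i)
            q_hat -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(q_hat)           # line 5
        if R[j, j] == 0.0:                        # line 6: breakdown
            raise ValueError("vectors are linearly dependent")
        Q[:, j] = q_hat / R[j, j]
    return Q, R

X = np.random.rand(6, 4)
Q, R = gram_schmidt(X)
print(np.allclose(X, Q @ R), np.allclose(Q.T @ Q, np.eye(4)))
```

In exact arithmetic the algorithm reproduces $X = QR$ with orthonormal columns of $Q$; in floating-point arithmetic the computed $Q$ can lose orthogonality, which is one motivation for the modified version described next.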
The above algorithm is the standard Gram-Schmidt process. There are alternative for-
mulations of the algorithm which have better numerical properties. The best known of
these is the Modified Gram-Schmidt (MGS) algorithm.
ALGORITHM 1.2: Modified Gram-Schmidt
1. Define $r_{11} := \|x_1\|_2$. If $r_{11} = 0$ Stop, else $q_1 := x_1 / r_{11}$.
2. For $j = 2, \ldots, r$ Do:
3.   Define $\hat{q} := x_j$
4.   For $i = 1, \ldots, j - 1$, Do:
5.     $r_{ij} := (\hat{q}, q_i)$
6.     $\hat{q} := \hat{q} - r_{ij} q_i$
7.   EndDo
8.   Compute $r_{jj} := \|\hat{q}\|_2$,
9.   If $r_{jj} = 0$ then Stop, else $q_j := \hat{q} / r_{jj}$
10. EndDo
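A corresponding sketch of Algorithm 1.2 (an added illustration, not the book's code) follows; the only change from the classical version is that each inner product uses the partially orthogonalized vector $\hat{q}$ instead of the original $x_j$.

```python
import numpy as np

# Added sketch of Algorithm 1.2 (Modified Gram-Schmidt).
def modified_gram_schmidt(X):
    n, r = X.shape
    Q = np.zeros((n, r), dtype=X.dtype)
    R = np.zeros((r, r), dtype=X.dtype)
    for j in range(r):
        q_hat = X[:, j].copy()
        for i in range(j):
            R[i, j] = np.vdot(Q[:, i], q_hat)   # r_ij = (q_hat, q_i)
            q_hat -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(q_hat)
        if R[j, j] == 0.0:
            raise ValueError("vectors are linearly dependent")
        Q[:, j] = q_hat / R[j, j]
    return Q, R
```

In exact arithmetic both algorithms produce the same $Q$ and $R$; the modified version is generally less sensitive to rounding errors.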
Yet another alternative for orthogonalizing a sequence of vectors is the Householder algorithm. This technique uses Householder reflectors, i.e., matrices of the form
$$P = I - 2 w w^T, \qquad (1.15)$$
in which $w$ is a vector of 2-norm unity. Geometrically, the vector $Px$ represents a mirror image of $x$ with respect to the hyperplane $\mathrm{span}\{w\}^\perp$.

To describe the Householder orthogonalization process, the problem can be formulated as that of finding a QR factorization of a given $n \times m$ matrix $X$. For any vector $x$, the vector $w$ for the Householder transformation (1.15) is selected in such a way that
$$P x = \alpha e_1, \qquad (1.16)$$
where $\alpha$ is a scalar. Writing $(I - 2 w w^T) x = \alpha e_1$ yields
$$2 (w^T x)\, w = x - \alpha e_1.$$
This shows that the desired $w$ is a multiple of the vector $x - \alpha e_1$,
$$w = \pm \frac{x - \alpha e_1}{\|x - \alpha e_1\|_2}.$$
For (1.16) to be satisfied, we must impose the condition
$$2 (x - \alpha e_1)^T x = \|x - \alpha e_1\|_2^2,$$
which gives $2 (\|x\|_2^2 - \alpha \xi_1) = \|x\|_2^2 - 2 \alpha \xi_1 + \alpha^2$, where $\xi_1$ is the first component of the vector $x$. Therefore, it is necessary that
$$\alpha = \pm \|x\|_2.$$
In order to avoid that the resulting vector $w$ be small, it is customary to take
$$\alpha = -\mathrm{sign}(\xi_1)\, \|x\|_2,$$
which yields
$$w = \frac{x + \mathrm{sign}(\xi_1) \|x\|_2\, e_1}{\| x + \mathrm{sign}(\xi_1) \|x\|_2\, e_1 \|_2}.$$
Given an $n \times m$ matrix, its first column can be transformed to a multiple of the column $e_1$ by premultiplying it by a Householder matrix $P_1$,
$$X_1 \equiv P_1 X, \qquad X_1 e_1 = \alpha e_1.$$
Assume, inductively, that the matrix $X$ has been transformed in $k - 1$ successive steps into the partially upper triangular form
$$X_k \equiv P_{k-1} \cdots P_1 X,$$
whose first $k - 1$ columns are upper triangular, i.e., all entries below the diagonal in columns $1$ through $k - 1$ are zero. This matrix is upper triangular up to column number $k - 1$. To advance by one step, it must be transformed into one which is upper triangular up to the $k$-th column, leaving the previous columns in the same form. To leave the first $k - 1$ columns unchanged, select a $w$ vector which has zeros in positions $1$ through $k - 1$. So the next Householder reflector matrix is defined as
$$P_k = I - 2 w_k w_k^T, \qquad (1.19)$$
in which the vector $w_k$ is defined as
$$w_k = \frac{z}{\|z\|_2}, \qquad (1.20)$$
where the components of the vector $z$ are given by
$$z_i = \begin{cases} 0 & \text{if } i < k, \\ \beta + x_{kk} & \text{if } i = k, \\ x_{ik} & \text{if } i > k, \end{cases} \qquad (1.21)$$
with
$$\beta = \mathrm{sign}(x_{kk}) \left( \sum_{i=k}^{n} x_{ik}^2 \right)^{1/2}.$$
We note in passing that the premultiplication of a matrix $X$ by a Householder transform requires only a rank-one update since
$$(I - 2 w w^T) X = X - w v^T \quad \text{where} \quad v = 2 X^T w.$$
Therefore, the Householder matrices need not, and should not, be explicitly formed. In addition, the vectors $w$ need not be explicitly scaled.
Assume now that the successive Householder transforms $P_1, P_2, \ldots, P_m$ have been applied to a certain matrix $X$ of dimension $n \times m$ (with $n \ge m$), to reduce it into the upper triangular form
$$P_m P_{m-1} \cdots P_1 X = \begin{pmatrix} R \\ O \end{pmatrix}. \qquad (1.22)$$
Recall that our initial goal was to obtain a QR factorization of $X$. We now wish to recover the $Q$ and $R$ matrices from the $P_k$'s and the above matrix. If we denote by $P$ the product of the $P_i$ on the left-side of (1.22), then (1.22) becomes
$$P X = \begin{pmatrix} R \\ O \end{pmatrix}, \qquad (1.23)$$
in which $R$ is an $m \times m$ upper triangular matrix, and $O$ is an $(n - m) \times m$ zero block. Since $P$ is unitary, its inverse is equal to its transpose and, as a result,
$$X = P^T \begin{pmatrix} R \\ O \end{pmatrix} = P_1 P_2 \cdots P_m \begin{pmatrix} R \\ O \end{pmatrix}.$$
If $E_m$ is the matrix of size $n \times m$ which consists of the first $m$ columns of the identity matrix, then the above equality translates into
$$X = P^T E_m R.$$
The matrix $Q = P^T E_m$ represents the first $m$ columns of $P^T$. Since
$$Q^T Q = E_m^T P P^T E_m = I,$$
$Q$ and $R$ are the matrices sought. In summary,
$$X = Q R,$$
in which $R$ is the triangular matrix obtained from the Householder reduction of $X$ (see (1.22) and (1.23)) and
$$q_j = P_1 P_2 \cdots P_j\, e_j.$$
ALGORITHM 1.3: Householder Orthogonalization
1. Define $X = [x_1, x_2, \ldots, x_m]$
2. For $k = 1, \ldots, m$ Do:
3.   If $k > 1$ compute $r_k := P_{k-1} P_{k-2} \cdots P_1 x_k$
4.   Compute $w_k$ using (1.19), (1.20), (1.21)
5.   Compute $r_k := P_k r_k$ with $P_k = I - 2 w_k w_k^T$
6.   Compute $q_k = P_1 P_2 \cdots P_k e_k$
7. EndDo
Note that line 6 can be omitted since the $q_i$ are not needed in the execution of the next steps. It must be executed only when the matrix $Q$ is needed at the completion of the algorithm. Also, the operation in line 5 consists only of zeroing the components $k + 1, \ldots, n$ and updating the $k$-th component of $r_k$. In practice, a work vector can be used for $r_k$ and its nonzero components after this step can be saved into an upper triangular matrix. Since the components $1$ through $k - 1$ of the vector $w_k$ are zero, the upper triangular matrix $R$ can be saved in those zero locations which would otherwise be unused.
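The sketch below (an added illustration in the spirit of Algorithm 1.3, not the book's code, assuming NumPy) performs the Householder reduction with rank-one updates only, never forming the reflectors explicitly, and then accumulates $Q = P_1 P_2 \cdots P_m E_m$ by applying the reflectors in reverse order.

```python
import numpy as np

# Added sketch: Householder QR via rank-one updates (reflectors kept as vectors).
def householder_qr(X):
    X = np.array(X, dtype=float)
    n, m = X.shape
    W = []                                   # the vectors w_k
    for k in range(m):
        x = X[k:, k]
        alpha = -np.sign(x[0]) * np.linalg.norm(x) if x[0] != 0 else -np.linalg.norm(x)
        z = x.copy()
        z[0] -= alpha                        # z = x - alpha * e_1
        nz = np.linalg.norm(z)
        if nz > 0:
            w = np.zeros(n)
            w[k:] = z / nz                   # zeros in positions 1..k-1
            X -= 2.0 * np.outer(w, w @ X)    # (I - 2 w w^T) X as a rank-one update
            W.append(w)
    R = np.triu(X[:m, :])
    Q = np.eye(n)[:, :m]                     # E_m
    for w in reversed(W):                    # Q = P_1 P_2 ... P_m E_m
        Q -= 2.0 * np.outer(w, w @ Q)
    return Q, R

X = np.random.rand(6, 4)
Q, R = householder_qr(X)
print(np.allclose(Q @ R, X), np.allclose(Q.T @ Q, np.eye(4)))
```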
1.8 Canonical Forms of Matrices

This section discusses the reduction of square matrices into matrices that have simpler
forms, such as diagonal, bidiagonal, or triangular. Reduction means a transformation that
preserves the eigenvalues of a matrix.
Two matrices $A$ and $B$ are said to be similar if there is a nonsingular matrix $X$ such that
$$A = X B X^{-1}.$$
The mapping $B \rightarrow A$ is called a similarity transformation.

It is clear that similarity is an equivalence relation. Similarity transformations preserve the eigenvalues of matrices. An eigenvector $u_B$ of $B$ is transformed into the eigenvector $u_A = X u_B$ of $A$. In effect, a similarity transformation amounts to representing the matrix $B$ in a different basis.
We now introduce some terminology.
An eigenvalue $\lambda$ of $A$ has algebraic multiplicity $\mu$ if it is a root of multiplicity $\mu$ of the characteristic polynomial.

If an eigenvalue is of algebraic multiplicity one, it is said to be simple. A nonsimple eigenvalue is multiple.

The geometric multiplicity $\gamma$ of an eigenvalue $\lambda$ of $A$ is the maximum number of independent eigenvectors associated with it. In other words, the geometric multiplicity $\gamma$ is the dimension of the eigenspace $\mathrm{Null}(A - \lambda I)$.

A matrix is derogatory if the geometric multiplicity of at least one of its eigenvalues is larger than one.

An eigenvalue is semisimple if its algebraic multiplicity is equal to its geometric multiplicity. An eigenvalue that is not semisimple is called defective.
Often, $\lambda_1, \lambda_2, \ldots, \lambda_p$ ($p \le n$) are used to denote the distinct eigenvalues of $A$. It is easy to show that the characteristic polynomials of two similar matrices are identical; see Exercise 9. Therefore, the eigenvalues of two similar matrices are equal and so are their algebraic multiplicities. Moreover, if $v$ is an eigenvector of $B$, then $X v$ is an eigenvector of $A$ and, conversely, if $y$ is an eigenvector of $A$ then $X^{-1} y$ is an eigenvector of $B$. As a result the number of independent eigenvectors associated with a given eigenvalue is the same for two similar matrices, i.e., their geometric multiplicity is also the same.
1.8.1 Reduction to the Diagonal Form

The simplest form in which a matrix can be reduced is undoubtedly the diagonal form.
Unfortunately, this reduction is not always possible. A matrix that can be reduced to the
diagonal form is called diagonalizable. The following theorem characterizes such matrices.
A matrix of dimension $n$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors.

A matrix is diagonalizable if and only if there exists a nonsingular matrix $X$ and a diagonal matrix $D$ such that $A = X D X^{-1}$, or equivalently $A X = X D$, where $D$ is a diagonal matrix. This is equivalent to saying that $n$ linearly independent vectors exist (the $n$ column-vectors of $X$) such that $A x_i = d_i x_i$. Each of these column-vectors is an eigenvector of $A$.

A matrix that is diagonalizable has only semisimple eigenvalues. Conversely, if all the eigenvalues of a matrix $A$ are semisimple, then $A$ has $n$ eigenvectors. It can be easily shown that these eigenvectors are linearly independent; see Exercise 2. As a result, we have the following proposition.

A matrix is diagonalizable if and only if all its eigenvalues are semisimple.

Since every simple eigenvalue is semisimple, an immediate corollary of the above result is: When $A$ has $n$ distinct eigenvalues, then it is diagonalizable.
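A small numerical illustration (added here, not from the book, assuming NumPy): a matrix with distinct eigenvalues is diagonalized by the matrix of its eigenvectors.

```python
import numpy as np

# Added illustration: distinct eigenvalues imply diagonalizability, A = X D X^{-1}.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])          # distinct eigenvalues 2, 3, 5
d, X = np.linalg.eig(A)
D = np.diag(d)
print(np.allclose(A, X @ D @ np.linalg.inv(X)))   # A = X D X^{-1}
print(np.allclose(A @ X, X @ D))                  # equivalently, A X = X D
```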
1.8.2 The Jordan Canonical Form

From the theoretical viewpoint, one of the most important canonical forms of matrices is
the well known Jordan form. A full development of the steps leading to the Jordan form
is beyond the scope of this book. Only the main theorem is stated. Details, including the
proof, can be found in standard books of linear algebra such as [117]. In the following, $m_i$ refers to the algebraic multiplicity of the individual eigenvalue $\lambda_i$ and $l_i$ is the index of the eigenvalue, i.e., the smallest integer for which
$$\mathrm{Null}(A - \lambda_i I)^{l_i + 1} = \mathrm{Null}(A - \lambda_i I)^{l_i}.$$
Any matrix $A$ can be reduced to a block diagonal matrix consisting of $p$ diagonal blocks, each associated with a distinct eigenvalue $\lambda_i$. Each of these diagonal blocks has itself a block diagonal structure consisting of $\gamma_i$ sub-blocks, where $\gamma_i$ is the geometric multiplicity of the eigenvalue $\lambda_i$. Each of the sub-blocks, referred to as a Jordan block, is an upper bidiagonal matrix of size not exceeding $l_i \le m_i$, with the constant $\lambda_i$ on the diagonal and the constant one on the super diagonal.
The $i$-th diagonal block, $i = 1, \ldots, p$, is known as the $i$-th Jordan submatrix (sometimes "Jordan Box"). The Jordan submatrix number $i$ starts in column $j_i \equiv m_1 + m_2 + \cdots + m_{i-1} + 1$. Thus,
$$X^{-1} A X = J = \begin{pmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_p \end{pmatrix},$$
where each $J_i$ is associated with $\lambda_i$ and is of size $m_i$, the algebraic multiplicity of $\lambda_i$. It has itself the following structure,
$$J_i = \begin{pmatrix} J_{i1} & & \\ & \ddots & \\ & & J_{i\gamma_i} \end{pmatrix} \qquad \text{with} \qquad J_{ik} = \begin{pmatrix} \lambda_i & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix}.$$
Each of the blocks $J_{ik}$ corresponds to a different eigenvector associated with the eigenvalue $\lambda_i$. Its size does not exceed $l_i$, the index of $\lambda_i$.
1.8.3 The Schur Canonical Form

Here, it will be shown that any matrix is unitarily similar to an upper triangular matrix. The only result needed to prove the following theorem is that any vector of 2-norm one can be completed by $n - 1$ additional vectors to form an orthonormal basis of $\mathbb{C}^n$.

For any square matrix $A$, there exists a unitary matrix $Q$ such that $Q^H A Q = R$ is upper triangular.

The proof is by induction over the dimension $n$. The result is trivial for $n = 1$. Assume that it is true for $n - 1$ and consider any matrix $A$ of size $n$. The matrix admits at least one eigenvector $u$ that is associated with an eigenvalue $\lambda$. Also assume without loss of generality that $\|u\|_2 = 1$. First, complete the vector $u$ into an orthonormal set, i.e., find an $n \times (n - 1)$ matrix $V$ such that the $n \times n$ matrix $U = [u, V]$ is unitary. Then $A U = [\lambda u, A V]$ and hence,
$$U^H A U = \begin{pmatrix} u^H \\ V^H \end{pmatrix} [\lambda u, A V] = \begin{pmatrix} \lambda & u^H A V \\ 0 & V^H A V \end{pmatrix}. \qquad (1.24)$$
Now use the induction hypothesis for the $(n - 1) \times (n - 1)$ matrix $B = V^H A V$: There exists an $(n - 1) \times (n - 1)$ unitary matrix $Q_1$ such that $Q_1^H B Q_1$ is upper triangular. Define the $n \times n$ matrix
$$\hat{Q}_1 = \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix}$$
and multiply both members of (1.24) by $\hat{Q}_1^H$ from the left and by $\hat{Q}_1$ from the right. The resulting matrix is clearly upper triangular and this shows that the result is true for $A$, with $Q = U \hat{Q}_1$ which is a unitary $n \times n$ matrix.
A simpler proof that uses the Jordan canonical form and the QR decomposition is the subject of Exercise 7. Since the matrix $R$ is triangular and similar to $A$, its diagonal elements are equal to the eigenvalues of $A$ ordered in a certain manner. In fact, it is easy to extend the proof of the theorem to show that this factorization can be obtained with any order for the eigenvalues. Despite its simplicity, the above theorem has far-reaching consequences, some of which will be examined in the next section.

It is important to note that for any $k \le n$, the subspace spanned by the first $k$ columns of $Q$ is invariant under $A$. Indeed, the relation $A Q = Q R$ implies that for $1 \le j \le k$, we have
$$A q_j = \sum_{i=1}^{j} r_{ij}\, q_i.$$
If we let $Q_k = [q_1, q_2, \ldots, q_k]$ and if $R_k$ is the principal leading submatrix of dimension $k$ of $R$, the above relation can be rewritten as
$$A Q_k = Q_k R_k,$$
which is known as the partial Schur decomposition of $A$. The simplest case of this decomposition is when $k = 1$, in which case $q_1$ is an eigenvector. The vectors $q_i$ are usually called Schur vectors. Schur vectors are not unique and depend, in particular, on the order chosen for the eigenvalues.
A slight variation on the Schur canonical form is the quasi-Schur form, also called the real Schur form. Here, diagonal blocks of size $2 \times 2$ are allowed in the upper triangular matrix $R$. The reason for this is to avoid complex arithmetic when the original matrix is real. A $2 \times 2$ block is associated with each complex conjugate pair of eigenvalues of the matrix.
Consider the matrix
The matrix has the pair of complex conjugate eigenvalues
7/27/2019 Saad.Y.-.Iterative.Methods.for.Sparse.Linear.Systems.(2000).pdf
32/459
and the real eigenvalue
. The standard (complex) Schur form is given by the pair
of matrices
and
It is possible to avoid complex arithmetic by using the quasi-Schur form which consists of
the pair of matrices
and
We conclude this section by pointing out that the Schur and the quasi-Schur forms
of a given matrix are in no way unique. In addition to the dependence on the orderingof the eigenvalues, any column of can be multiplied by a complex sign
and a new
corresponding can be found. For the quasi-Schur form, there are infinitely many ways
to select the blocks, corresponding to applying arbitrary rotations to the columns of
associated with these blocks.
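As an illustration (made-up matrix, SciPy's `schur` with `output='real'`; not an example from the book), the sketch below computes a quasi-Schur form in which a complex conjugate eigenvalue pair appears as a $2 \times 2$ diagonal block, so that all arithmetic stays real.

```python
import numpy as np
from scipy.linalg import schur

# Made-up real matrix with one complex conjugate eigenvalue pair and one real eigenvalue.
A = np.array([[0.0, -2.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 3.0]])

T, Z = schur(A, output='real')    # quasi-Schur (real Schur) form: A = Z T Z^T, all real
assert np.allclose(Z @ T @ Z.T, A)
print(T)                          # block upper triangular, with one 2x2 diagonal block
print(np.linalg.eigvals(A))       # +/- i*sqrt(2) and 3
```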
The analysis of many numerical techniques is based on understanding the behavior of the successive powers $A^k$ of a given matrix $A$. In this regard, the following theorem plays a fundamental role in numerical linear algebra, more particularly in the analysis of iterative methods.

The sequence $A^k$, $k = 0, 1, \ldots,$ converges to zero if and only if $\rho(A) < 1$.
To prove the necessary condition, assume that $A^k \to 0$ and consider $u_1$, a unit eigenvector associated with an eigenvalue $\lambda_1$ of maximum modulus. We have $A^k u_1 = \lambda_1^k u_1$, which implies, by taking the 2-norms of both sides,
$$
|\lambda_1|^k = \| A^k u_1 \|_2 \leq \| A^k \|_2 \longrightarrow 0 ,
$$
so that $\rho(A) = |\lambda_1| < 1$. The converse is established through the Jordan canonical form of $A$, whose block powers all tend to zero when every eigenvalue has modulus less than one.
For any matrix norm $\| \cdot \|$, we have
$$
\lim_{k \to \infty} \| A^k \|^{1/k} = \rho(A) .
$$
The proof is a direct application of the Jordan canonical form and is the subject of Exercise 10.
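Both statements are easy to check numerically. The following sketch is illustrative only; the test matrix is random and rescaled so that $\rho(A) = 0.9$, and the Frobenius norm is used for the second check.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
rho_B = max(abs(np.linalg.eigvals(B)))
A = 0.9 * B / rho_B                    # rescale so that rho(A) = 0.9 < 1

Ak = np.eye(6)
for k in range(1, 201):                # successive powers decay to zero
    Ak = Ak @ A
print(np.linalg.norm(Ak))

# ||A^k||^{1/k} approaches rho(A) = 0.9 as k grows.
for k in (10, 50, 200):
    print(k, np.linalg.norm(np.linalg.matrix_power(A, k), 'fro') ** (1.0 / k))
```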
This section examines specific properties of normal matrices and Hermitian matrices, including some optimality properties related to their spectra. The most common normal matrices that arise in practice are Hermitian or skew-Hermitian.
By definition, a matrix is said to be normal if it commutes with its transpose conjugate, i.e., if it satisfies the relation
$$
A^H A = A A^H . \tag{1.25}
$$
An immediate property of normal matrices is stated in the following lemma.

If a normal matrix is triangular, then it is a diagonal matrix.

Assume, for example, that $A$ is upper triangular and normal. Compare the first diagonal element of the left-hand side matrix of (1.25) with the corresponding element of the matrix on the right-hand side. We obtain that
$$
|a_{11}|^2 = \sum_{j=1}^{n} |a_{1j}|^2 ,
$$
which shows that the elements of the first row are zeros except for the diagonal one. The same argument can now be used for the second row, the third row, and so on to the last row, to show that $a_{ij} = 0$ for $i \neq j$.
A consequence of this lemma is the following important result.
A matrix is normal if and only if it is unitarily similar to a diagonal
matrix.
It is straightforward to verify that a matrix which is unitarily similar to a diagonal
matrix is normal. We now prove that any normal matrix is unitarily similar to a diagonal matrix.
A normal matrix whose eigenvalues are real is Hermitian.
As will be seen shortly, the converse is also true, i.e., a Hermitian matrix has real eigenval-
ues.
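As a quick numerical illustration (made-up real skew-symmetric matrix, which is normal but not Hermitian), the sketch below checks normality and the fact that the computed eigenvector matrix is close to unitary and diagonalizes the matrix; in this example the check relies on the eigenvalues being distinct.

```python
import numpy as np

# A real skew-symmetric matrix: normal (A A^H = A^H A) but not Hermitian.
A = np.array([[ 0.0,  2.0, -1.0],
              [-2.0,  0.0,  3.0],
              [ 1.0, -3.0,  0.0]])
assert np.allclose(A @ A.conj().T, A.conj().T @ A)      # normality

# For a normal matrix with distinct eigenvalues, the eigenvectors are orthogonal,
# so the (normalized) eigenvector matrix is close to unitary and Q^H A Q is diagonal.
w, Q = np.linalg.eig(A)                                 # eigenvalues here: 0, +/- i*sqrt(14)
assert np.allclose(Q.conj().T @ Q, np.eye(3), atol=1e-8)
D = Q.conj().T @ A @ Q
assert np.allclose(D, np.diag(np.diag(D)), atol=1e-8)
```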
An eigenvalue $\lambda$ of any matrix satisfies the relation
$$
\lambda = \frac{(Au, u)}{(u, u)} ,
$$
where $u$ is an associated eigenvector. Generally, one might consider the complex scalars
$$
\mu(x) = \frac{(Ax, x)}{(x, x)} , \tag{1.28}
$$
defined for any nonzero vector $x$ in $\mathbb{C}^n$. These ratios are known as Rayleigh quotients and are important both for theoretical and practical purposes. The set of all possible Rayleigh quotients as $x$ runs over $\mathbb{C}^n$ is called the field of values of $A$. This set is clearly bounded since each $|\mu(x)|$ is bounded by the 2-norm of $A$, i.e., $|\mu(x)| \leq \|A\|_2$ for all $x$.

If a matrix is normal, then any vector $x$ in $\mathbb{C}^n$ can be expressed as
$$
x = \sum_{i=1}^{n} \xi_i q_i ,
$$
where the vectors $q_i$ form an orthogonal basis of eigenvectors, and the expression for $\mu(x)$ becomes
$$
\mu(x) = \frac{(Ax, x)}{(x, x)}
       = \frac{\sum_{k=1}^{n} \lambda_k |\xi_k|^2}{\sum_{k=1}^{n} |\xi_k|^2}
       \equiv \sum_{k=1}^{n} \beta_k \lambda_k , \tag{1.29}
$$
where
$$
0 \leq \beta_k = \frac{|\xi_k|^2}{\sum_{j=1}^{n} |\xi_j|^2} \leq 1
\quad \text{and} \quad
\sum_{k=1}^{n} \beta_k = 1 .
$$
From a well known characterization of convex hulls established by Hausdorff (Hausdorff's convex hull theorem), this means that the set of all possible Rayleigh quotients as $x$ runs over all of $\mathbb{C}^n$ is equal to the convex hull of the $\lambda_k$'s. This leads to the following theorem, which is stated without proof.
The field of values of a normal matrix is equal to the convex hull of its
spectrum.
The next question is whether or not this is also true for nonnormal matrices and the
answer is no: The convex hull of the eigenvalues and the field of values of a nonnormal
matrix are different in general. As a generic example, one can take any nonsymmetric real
matrix which has real eigenvalues only. In this case, the convex hull of the spectrum is
a real interval but its field of values will contain imaginary values. See Exercise 12 for
another example. It has been shown (Hausdorff) that the field of values of a matrix is a
convex set. Since the eigenvalues are members of the field of values, their convex hull is
contained in the field of values. This is summarized in the following proposition.
The field of values of an arbitrary matrix is a convex set which
contains the convex hull of its spectrum. It is equal to the convex hull of the spectrum
when the matrix is normal.
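The gap between the two sets for a nonnormal matrix can be observed numerically by sampling Rayleigh quotients at random vectors. The sketch below is illustrative only; the $2 \times 2$ triangular matrix is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Nonsymmetric, nonnormal matrix with real eigenvalues only (it is triangular).
A = np.array([[1.0, 5.0],
              [0.0, 2.0]])
print(np.linalg.eigvals(A))            # real eigenvalues: 1 and 2

# Sample Rayleigh quotients (Ax, x)/(x, x) at random complex vectors.
quotients = []
for _ in range(2000):
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    quotients.append(np.vdot(x, A @ x) / np.vdot(x, x))
quotients = np.array(quotients)

# The convex hull of the spectrum is the real interval [1, 2], yet sampled Rayleigh
# quotients have nonzero imaginary parts: the field of values is strictly larger.
print(np.max(np.abs(quotients.imag)))
```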
A first result on Hermitian matrices is the following.

The eigenvalues of a Hermitian matrix are real, i.e., $\sigma(A) \subset \mathbb{R}$.

Let $\lambda$ be an eigenvalue of $A$ and $u$ an associated eigenvector of 2-norm unity. Then
$$
\lambda = (Au, u) = (u, Au) = \overline{(Au, u)} = \bar{\lambda} ,
$$
which is the stated result.
It is not difficult to see that if, in addition, the matrix is real, then the eigenvectors can be
chosen to be real; see Exercise 21. Since a Hermitian matrix is normal, the following is a
consequence of Theorem 1.7.
Any Hermitian matrix is unitarily similar to a real diagonal matrix.
In particular, a Hermitian matrix admits a set of orthonormal eigenvectors that form a basis of $\mathbb{C}^n$.
In the proof of Theorem 1.8 we used the fact that the inner products $(Au, u)$ are real. Generally, it is clear that any Hermitian matrix is such that $(Ax, x)$ is real for any vector $x \in \mathbb{C}^n$. It turns out that the converse is also true, i.e., it can be shown that if $(Ax, x)$ is real for all vectors $x$ in $\mathbb{C}^n$, then the matrix $A$ is Hermitian; see Exercise 15.
Eigenvalues of Hermitian matrices can be characterized by optimality properties of the Rayleigh quotients (1.28). The best known of these is the min-max principle. We now label all the eigenvalues of $A$ in descending order:
$$
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n .
$$
Here, the eigenvalues are not necessarily distinct and they are repeated, each according to its multiplicity. In the following theorem, known as the Min-Max Theorem, $S$ represents a generic subspace of $\mathbb{C}^n$.

The eigenvalues of a Hermitian matrix $A$ are characterized by the relation
$$
\lambda_k = \min_{S,\ \dim(S) = n-k+1} \ \max_{x \in S,\ x \neq 0} \frac{(Ax, x)}{(x, x)} . \tag{1.30}
$$
Let $\{q_i\}_{i=1,\ldots,n}$ be an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$ associated with $\lambda_1, \ldots, \lambda_n$, respectively. Let $S_k$ be the subspace spanned by the first $k$ of these vectors and denote by $\mu(S)$ the maximum of $(Ax, x)/(x, x)$ over all nonzero vectors of a subspace $S$. Since the dimension of $S_k$ is $k$, a well known theorem of linear algebra shows that its intersection with any subspace $S$ of dimension $n-k+1$ is not reduced to $\{0\}$, i.e., there is a vector $x$ in $S \cap S_k$. For this $x = \sum_{i=1}^{k} \xi_i q_i$, we have
$$
\frac{(Ax, x)}{(x, x)} = \frac{\sum_{i=1}^{k} \lambda_i |\xi_i|^2}{\sum_{i=1}^{k} |\xi_i|^2} \geq \lambda_k ,
$$
so that $\mu(S) \geq \lambda_k$.

Consider, on the other hand, the particular subspace $S_0$ of dimension $n-k+1$ which is spanned by $q_k, \ldots, q_n$. For each vector $x$ in this subspace, we have
$$
\frac{(Ax, x)}{(x, x)} = \frac{\sum_{i=k}^{n} \lambda_i |\xi_i|^2}{\sum_{i=k}^{n} |\xi_i|^2} \leq \lambda_k ,
$$
so that $\mu(S_0) \leq \lambda_k$. In other words, as $S$ runs over all the $(n-k+1)$-dimensional subspaces, $\mu(S)$ is always $\geq \lambda_k$, and there is at least one subspace $S_0$ for which $\mu(S_0) \leq \lambda_k$. This shows the desired result.
The above result is often called the Courant-Fischer min-max principle or theorem. As a particular case, the largest eigenvalue of $A$ satisfies
$$
\lambda_1 = \max_{x \neq 0} \frac{(Ax, x)}{(x, x)} . \tag{1.31}
$$
Actually, there are four different ways of rewriting the above characterization. The second formulation is
$$
\lambda_k = \max_{S,\ \dim(S) = k} \ \min_{x \in S,\ x \neq 0} \frac{(Ax, x)}{(x, x)} , \tag{1.32}
$$
and the two other ones can be obtained from (1.30) and (1.32) by simply relabeling the eigenvalues increasingly instead of decreasingly. Thus, with our labeling of the eigenvalues in descending order, (1.32) tells us that the smallest eigenvalue satisfies
$$
\lambda_n = \min_{x \neq 0} \frac{(Ax, x)}{(x, x)} , \tag{1.33}
$$
with $\lambda_n$ replaced by $\lambda_1$ if the eigenvalues are relabeled increasingly.
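The extreme cases (1.31) and (1.33) can be checked directly. The sketch below is illustrative only (random real symmetric test matrix): it evaluates the Rayleigh quotient at the extreme eigenvectors and confirms that random samples stay inside $[\lambda_n, \lambda_1]$.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                        # real symmetric (hence Hermitian) test matrix
w, Q = np.linalg.eigh(A)                 # ascending eigenvalues, orthonormal eigenvectors

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

# The extremes of the Rayleigh quotient are attained at the extreme eigenvectors.
print(w[-1], rayleigh(Q[:, -1]))         # largest eigenvalue, cf. (1.31)
print(w[0],  rayleigh(Q[:, 0]))          # smallest eigenvalue, cf. (1.33)

# Random samples never leave the interval [lambda_min, lambda_max].
samples = np.array([rayleigh(rng.standard_normal(5)) for _ in range(10000)])
print(w[0] <= samples.min(), samples.max() <= w[-1])
```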
In order for all the eigenvalues of a Hermitian matrix to be positive, it is necessary and sufficient that
$$
(Ax, x) > 0 \quad \text{for all } x \in \mathbb{C}^n , \ x \neq 0 .
$$
Such a matrix is called positive definite. A matrix which satisfies $(Ax, x) \geq 0$ for any $x$ is said to be positive semidefinite. In particular, the matrix $A^H A$ is semipositive definite for any rectangular matrix $A$, since
$$
(A^H A x, x) = (Ax, Ax) \geq 0 \quad \text{for all } x .
$$
Similarly, $A A^H$ is also a Hermitian semipositive definite matrix. The square roots of the eigenvalues of $A^H A$ for a general rectangular matrix $A$ are called the singular values of $A$
and are denoted by $\sigma_i$. In Section 1.5, we have stated without proof that the 2-norm of any matrix $A$ is equal to the largest singular value $\sigma_1$ of $A$. This is now an obvious fact, because
$$
\|A\|_2^2 = \max_{x \neq 0} \frac{\|Ax\|_2^2}{\|x\|_2^2}
          = \max_{x \neq 0} \frac{(A^H A x, x)}{(x, x)}
          = \sigma_1^2 ,
$$
which results from (1.31).
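This identity is easy to verify numerically, as in the sketch below (illustrative only, random rectangular matrix): the largest singular value, the square root of the largest eigenvalue of $A^T A$, and the 2-norm all coincide.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 7))                 # arbitrary rectangular matrix

sigma = np.linalg.svd(A, compute_uv=False)      # singular values, in descending order
lam = np.linalg.eigvalsh(A.T @ A)               # eigenvalues of A^H A, in ascending order
print(sigma[0], np.sqrt(lam[-1]))               # sigma_1 = sqrt(lambda_max(A^H A))
print(np.linalg.norm(A, 2))                     # the 2-norm equals sigma_1
```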
Another characterization of eigenvalues, known as the Courant characterization, is
stated in the next theorem. In contrast with the min-max theorem, this property is recursive
in nature.
The eigenvalue $\lambda_k$ and the corresponding eigenvector $q_k$ of a Hermitian matrix are such that
$$
\lambda_1 = \frac{(A q_1, q_1)}{(q_1, q_1)} = \max_{x \in \mathbb{C}^n,\ x \neq 0} \frac{(Ax, x)}{(x, x)}
$$
and, for $k > 1$,
$$
\lambda_k = \frac{(A q_k, q_k)}{(q_k, q_k)}
          = \max_{x \neq 0,\ q_1^H x = \cdots = q_{k-1}^H x = 0} \frac{(Ax, x)}{(x, x)} .
$$
In other words, the maximum of the Rayleigh quotient over a subspace that is orthogonal to the first $k-1$ eigenvectors is equal to $\lambda_k$ and is achieved for the eigenvector $q_k$ associated with $\lambda_k$. The proof follows easily from the expansion (1.29) of the Rayleigh quotient.
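The recursive character of the Courant characterization can be mimicked numerically by maximizing the Rayleigh quotient over vectors from which the leading eigenvector has been projected out. The sketch below is illustrative only (random symmetric matrix, crude random sampling).

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
w, Q = np.linalg.eigh(A)                  # ascending eigenvalues, orthonormal eigenvectors
lam2, q1, q2 = w[-2], Q[:, -1], Q[:, -2]  # second largest eigenvalue; top two eigenvectors

# lambda_2 is the maximum of the Rayleigh quotient over vectors orthogonal to q1;
# the maximum is attained at q2.
print(lam2, (q2 @ A @ q2) / (q2 @ q2))

best = -np.inf
for _ in range(20000):
    x = rng.standard_normal(6)
    y = x - q1 * (q1 @ x)                 # project out the q1 component: y is orthogonal to q1
    best = max(best, (y @ A @ y) / (y @ y))
print(best)                               # approaches lambda_2 from below
```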
Nonnegative matrices play a crucial role in the theory of matrices. They are important in the study of convergence of iterative methods and arise in many applications including economics, queuing theory, and chemical engineering.
A nonnegative matrix is simply a matrix whose entries are nonnegative. More generally, a partial order relation can be defined on the set of matrices.

Let $A$ and $B$ be two $n \times m$ matrices. Then
$$
A \leq B
$$
if, by definition, $a_{ij} \leq b_{ij}$ for $1 \leq i \leq n$, $1 \leq j \leq m$. If $O$ denotes the $n \times m$ zero matrix, then $A$ is nonnegative if $A \geq O$, and positive if $A > O$. Similar definitions hold in which "positive" is replaced by "negative".

The binary relation "$\leq$" imposes only a partial order on $\mathbb{R}^{n \times m}$, since two arbitrary matrices in $\mathbb{R}^{n \times m}$ are not necessarily comparable by this relation. For the remainder of this section,
we now assume that only square matrices are involved. The next proposition lists a number
of rather trivial properties regarding the partial order relation just defined.
The following properties hold.

1. The relation $\leq$ for matrices is reflexive ($A \leq A$), antisymmetric (if $A \leq B$ and $B \leq A$, then $A = B$), and transitive (if $A \leq B$ and $B \leq C$, then $A \leq C$).
2. If $A$ and $B$ are nonnegative, then so is their product $AB$ and their sum $A + B$.
3. If $A$ is nonnegative, then so is $A^k$.
4. If $A \leq B$, then $A^T \leq B^T$.
5. If $O \leq A \leq B$, then $\|A\|_1 \leq \|B\|_1$ and similarly $\|A\|_\infty \leq \|B\|_\infty$.

The proof of these properties is left as Exercise 23.
A matrix is said to be reducible if there is a permutation matrix $P$ such that $P A P^T$ is block upper triangular. Otherwise, it is irreducible. An important result concerning nonnegative matrices is the following theorem, known as the Perron-Frobenius theorem.

Let $A$ be a real nonnegative irreducible matrix. Then $\lambda \equiv \rho(A)$, the spectral radius of $A$, is a simple eigenvalue of $A$. Moreover, there exists an eigenvector with positive elements associated with this eigenvalue.

A relaxed version of this theorem allows the matrix to be reducible, but the conclusion is somewhat weakened in the sense that the elements of the eigenvectors are only guaranteed to be nonnegative.
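Here is an illustrative sketch of the Perron-Frobenius statement (made-up nonnegative irreducible matrix, plain power iteration; not an example from the book): the iteration converges to an eigenvector with positive entries whose eigenvalue is the spectral radius.

```python
import numpy as np

# Made-up nonnegative irreducible matrix (its directed graph is strongly connected).
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])

x = np.ones(3)
for _ in range(200):                          # power iteration
    x = A @ x
    x /= np.linalg.norm(x)

rho_est = x @ A @ x                           # Rayleigh-quotient estimate at the limit vector
print(rho_est, max(abs(np.linalg.eigvals(A))))  # both approximately equal rho(A)
print(x)                                      # the Perron eigenvector has positive entries
```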
Next, a useful property is established.

Let $A$, $B$, $C$ be nonnegative matrices, with $A \leq B$. Then
$$
AC \leq BC \quad \text{and} \quad CA \leq CB .
$$
Consider the first inequality only, since the proof for the second is identical. The result that is claimed translates into
$$
\sum_{k=1}^{n} a_{ik} c_{kj} \leq \sum_{k=1}^{n} b_{ik} c_{kj} , \quad 1 \leq i, j \leq n ,
$$
which is clearly true by the assumptions.
A consequence of the proposition is the following corollary.
Let and
be two nonnegative matrices, with
. Then
The proof is by induction. The inequality is clearly true for
. Assume that(1.35) is true for . According to the previous proposition, multiplying (1.35) from the left
by results in
A matrix $A$ is said to be an M-matrix if it satisfies the following four properties:

1. $a_{ii} > 0$ for $i = 1, \ldots, n$.
2. $a_{ij} \leq 0$ for $i \neq j$, $i, j = 1, \ldots, n$.
3. $A$ is nonsingular.
4. $A^{-1} \geq 0$.

In reality, the four conditions in the above definition are somewhat redundant, and equivalent conditions that are more rigorous will be given later. Let $A$ be any matrix which satisfies properties (1) and (2) in the above definition and let $D$ be the diagonal of $A$. Since $D > 0$,
$$
A = D - (D - A) = D \left( I - (I - D^{-1} A) \right) .
$$
Now define
$$
B \equiv I - D^{-1} A .
$$
Using the previous theorem, $I - B = D^{-1} A$ is nonsingular and $(I - B)^{-1} \geq 0$ if and only if $\rho(B) < 1$. It is now easy to see that conditions (3) and (4) of Definition 1.4 can be replaced by the condition $\rho(I - D^{-1} A) < 1$.
Let a matrix $A$ be given such that

1. $a_{ii} > 0$ for $i = 1, \ldots, n$;
2. $a_{ij} \leq 0$ for $i \neq j$, $i, j = 1, \ldots, n$.

Then $A$ is an M-matrix if and only if $\rho(B) < 1$, where $B = I - D^{-1} A$.

From the above argument, an immediate application of Theorem 1.15 shows that properties (3) and (4) of the above definition are equivalent to $\rho(B) < 1$, where $B = I - D^{-1} A$ and $D$ is the diagonal of $A$. In addition, $A$ is nonsingular iff $D^{-1} A$ is, and $A^{-1}$ is nonnegative iff $(D^{-1} A)^{-1}$ is.
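This characterization suggests a direct numerical test, sketched below for illustration: check the sign pattern and then whether $\rho(I - D^{-1}A) < 1$. The tridiagonal test matrix is the standard $\mathrm{tridiag}(-1, 2, -1)$ example, chosen here only for the demonstration.

```python
import numpy as np

def is_m_matrix(A, tol=1e-12):
    """Test: a_ii > 0, a_ij <= 0 for i != j, and rho(I - D^{-1} A) < 1."""
    d = np.diag(A)
    if np.any(d <= 0):
        return False
    off = A - np.diag(d)
    if np.any(off > tol):                            # positive off-diagonal entry found
        return False
    B = np.eye(len(A)) - A / d[:, None]              # B = I - D^{-1} A
    return max(abs(np.linalg.eigvals(B))) < 1

# Standard example: tridiag(-1, 2, -1) is an M-matrix.
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
print(is_m_matrix(A))                                # True
print(np.all(np.linalg.inv(A) >= -1e-12))            # and indeed A^{-1} >= 0
```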
The next theorem shows that condition (1) in Definition 1.4 is implied by the other three.

Let a matrix $A$ be given such that

1. $a_{ij} \leq 0$ for $i \neq j$, $i, j = 1, \ldots, n$;
2. $A$ is nonsingular;
3. $A^{-1} \geq 0$.

Then $a_{ii} > 0$ for $i = 1, \ldots, n$, i.e., $A$ is an M-matrix.
It must be emphasized that this definition is only useful when formulated entirely for real variables. Indeed, if $x$ were not restricted to be real, then assuming that $(Ax, x)$ is real for all complex $x$ would imply that $A$ is Hermitian; see Exercise 15. If, in addition to satisfying (1.41), $A$ is symmetric (real), then $A$ is said to be Symmetric Positive Definite (SPD). Similarly, if $A$ is Hermitian, then $A$ is said to be Hermitian Positive Definite (HPD).

Some properties of HPD matrices were seen in Section 1.9, in particular with regard to their eigenvalues. Now the more general case where $A$ is non-Hermitian and positive definite is considered.
We begin with the observation that any square matrix (real or complex) can be decomposed as
$$
A = H + i S , \tag{1.42}
$$
in which
$$
H = \frac{1}{2} \left( A + A^H \right) , \qquad S = \frac{1}{2i} \left( A - A^H \right) .
$$
Note that both $H$ and $S$ are Hermitian, while the matrix $iS$ in the decomposition (1.42) is skew-Hermitian. The matrix $H$ in the decomposition is called the Hermitian part of $A$, while the matrix $iS$ is the skew-Hermitian part of $A$. The above decomposition is the analogue of the decomposition of a complex number $z$ into $z = x + iy$,
$$
x = \Re e(z) = \frac{1}{2} (z + \bar{z}) , \qquad y = \Im m(z) = \frac{1}{2i} (z - \bar{z}) .
$$
When $A$ is real and $x$ is a real vector, then $(Ax, x)$ is real and, as a result, the decomposition (1.42) immediately gives the equality
$$
(Ax, x) = (Hx, x) . \tag{1.45}
$$
This results in the following theorem.
Let $A$ be a real positive definite matrix. Then $A$ is nonsingular. In addition, there exists a scalar $\alpha > 0$ such that
$$
(Ax, x) \geq \alpha \, \|x\|_2^2 \tag{1.46}
$$
for any real vector $x$.

The first statement is an immediate consequence of the definition of positive definiteness. Indeed, if $A$ were singular, then there would be a nonzero vector $x$ such that $Ax = 0$, and as a result $(Ax, x) = 0$ for this vector, which would contradict (1.41). We now prove the second part of the theorem. From (1.45) and the fact that $A$ is positive definite, we conclude that $H$ is HPD. Hence, from (1.33) based on the min-max theorem, we get
$$
\min_{x \neq 0} \frac{(Ax, x)}{(x, x)} = \min_{x \neq 0} \frac{(Hx, x)}{(x, x)} \geq \lambda_{\min}(H) > 0 .
$$
Taking $\alpha \equiv \lambda_{\min}(H)$ yields the desired inequality (1.46).
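The constant $\alpha = \lambda_{\min}(H)$ can be computed explicitly from the Hermitian part. The sketch below is illustrative only (made-up real positive definite but nonsymmetric matrix); it checks the bound (1.46) on random vectors.

```python
import numpy as np

rng = np.random.default_rng(6)
S = rng.standard_normal((5, 5))
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) + (S - S.T)   # made-up positive definite, nonsymmetric

H = (A + A.T) / 2                        # Hermitian (here: symmetric) part of A
alpha = np.linalg.eigvalsh(H)[0]         # alpha = lambda_min(H); here H = diag(1..5), alpha = 1
print(alpha)

# Check (Ax, x) = (Hx, x) >= alpha * ||x||^2 on a few random real vectors.
for _ in range(5):
    x = rng.standard_normal(5)
    print(x @ A @ x >= alpha * (x @ x) - 1e-10)
```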
A simple yet important result which locates the eigenvalues of $A$ in terms of the spectra of $H$ and $S$ can now be stated.
Projection operators or projectors play an important role in numerical linear algebra, particularly in iterative methods for solving various matrix problems. This section introduces
these operators from a purely algebraic point of view and gives a few of their important
properties.
A projector $P$ is any linear mapping from $\mathbb{C}^n$ to itself which is idempotent, i.e., such that
$$
P^2 = P .
$$
A few simple properties follow from this definition. First, if $P$ is a projector, then so is $(I - P)$, and the following relation holds,
$$
\mathrm{Null}(P) = \mathrm{Ran}(I - P) .
$$
In addition, the two subspaces $\mathrm{Null}(P)$ and $\mathrm{Ran}(P)$ intersect only at the element zero. Indeed, if a vector $x$ belongs to $\mathrm{Ran}(P)$, then $Px = x$, by the idempotence property. If it is also in $\mathrm{Null}(P)$, then $Px = 0$. Hence, $x = Px = 0$, which proves the result. Moreover, every element of $\mathbb{C}^n$ can be written as $x = Px + (I - P)x$. Therefore, the space $\mathbb{C}^n$ can be decomposed as the direct sum
$$
\mathbb{C}^n = \mathrm{Null}(P) \oplus \mathrm{Ran}(P) .
$$
Conversely, every pair of subspaces $M$ and $S$ which forms a direct sum of $\mathbb{C}^n$ defines a unique projector such that $\mathrm{Ran}(P) = M$ and $\mathrm{Null}(P) = S$. This associated projector $P$ maps an element $x$ of $\mathbb{C}^n$ into the component $x_1$, where $x_1$ is the $M$-component in the unique decomposition $x = x_1 + x_2$ associated with the direct sum.

In fact, this association is unique; that is, an arbitrary projector $P$ can be entirely determined by two subspaces: (1) the range $M$ of $P$, and (2) its null space $S$, which is also the range of $I - P$. For any $x$, the vector $Px$ satisfies the conditions,
$$
Px \in M , \qquad x - Px \in S .
$$
The linear mapping $P$ is said to project $x$ onto $M$ and along, or parallel to, the subspace $S$.
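Such a projector can be built explicitly from bases of $M$ and $S$. The sketch below is only an illustration (random subspaces; the construction via a change of basis is one possible choice, not necessarily the book's): it checks idempotence and that $Px \in M$ while $x - Px \in S$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 6, 2
V = rng.standard_normal((n, m))          # columns: a basis of M (dimension m)
U = rng.standard_normal((n, n - m))      # columns: a basis of S (dimension n - m), M + S = C^n

# Projector onto M along S: express x in the basis [V U], then keep only the M-component.
B = np.hstack([V, U])
E = np.zeros((n, n))
E[:m, :m] = np.eye(m)
P = B @ E @ np.linalg.inv(B)

assert np.allclose(P @ P, P)             # idempotence: P^2 = P
x = rng.standard_normal(n)
# P x lies in M (it is solvable as V y), and x - P x lies in S (solvable as U z).
y, *_ = np.linalg.lstsq(V, P @ x, rcond=None)
assert np.allclose(V @ y, P @ x)
z, *_ = np.linalg.lstsq(U, x - P @ x, rcond=None)
assert np.allclose(U @ z, x - P @ x)
```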
If $P$ is of rank $m$, then the range of $I - P$ is of dimension $n - m$. Therefore, it is natural to define the null space $S$ through its orthogonal complement $L = S^{\perp}$, which has dimension $m$. The above conditions that define $u = Px$ for any $x$ become
$$
u \in M , \tag{1.51}
$$
$$
x - u \perp L . \tag{1.52}
$$
These equations define a projector $P$ onto $M$ and orthogonal to the subspace $L$. The first statement, (1.51), establishes the $m$ degrees of freedom, while the second, (1.52), gives the $m$ constraints that determine $Px$ from these degrees of freedom.