
Iterative Methods for Sparse Linear Systems

    Yousef Saad


Copyright © 2000 by Yousef Saad.

SECOND EDITION WITH CORRECTIONS. JANUARY 3RD, 2000.


    PREFACE xiii

    Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv

    Suggestions for Teaching . . . . . . . . . . . . . . . . . . . . . . . . . xv

    1 BACKGROUND IN LINEAR ALGEBRA 1

    1.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.2 Square Matrices and Eigenvalues . . . . . . . . . . . . . . . . . . . . . 3

    1.3 Types of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.4 Vector Inner Products and Norms . . . . . . . . . . . . . . . . . . . . . 6

    1.5 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.6 Subspaces, Range, and Kernel . . . . . . . . . . . . . . . . . . . . . . . 9

    1.7 Orthogonal Vectors and Subspaces . . . . . . . . . . . . . . . . . . . . 10

    1.8 Canonical Forms of Matrices . . . . . . . . . . . . . . . . . . . . . . . 15

1.8.1 Reduction to the Diagonal Form . . . . . . . . . . . . . . . . . 15

1.8.2 The Jordan Canonical Form . . . . . . . . . . . . . . . . . . . 16

    1.8.3 The Schur Canonical Form . . . . . . . . . . . . . . . . . . . . 17

    1.8.4 Application to Powers of Matrices . . . . . . . . . . . . . . . . 19

    1.9 Normal and Hermitian Matrices . . . . . . . . . . . . . . . . . . . . . . 21

    1.9.1 Normal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 21

    1.9.2 Hermitian Matrices . . . . . . . . . . . . . . . . . . . . . . . . 24

    1.10 Nonnegative Matrices, M-Matrices . . . . . . . . . . . . . . . . . . . . 26

    1.11 Positive-Definite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 30

1.12 Projection Operators . . . . . . . . . . . . . . . . . . . . . . . 33

1.12.1 Range and Null Space of a Projector . . . . . . . . . . . . . . 33

    1.12.2 Matrix Representations . . . . . . . . . . . . . . . . . . . . . . 35

    1.12.3 Orthogonal and Oblique Projectors . . . . . . . . . . . . . . . . 35

    1.12.4 Properties of Orthogonal Projectors . . . . . . . . . . . . . . . . 37

    1.13 Basic Concepts in Linear Systems . . . . . . . . . . . . . . . . . . . . . 38

    1.13.1 Existence of a Solution . . . . . . . . . . . . . . . . . . . . . . 38

    1.13.2 Perturbation Analysis . . . . . . . . . . . . . . . . . . . . . . . 39

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2 DISCRETIZATION OF PDES 44

2.1 Partial Differential Equations . . . . . . . . . . . . . . . . . . . 44

    2.1.1 Elliptic Operators . . . . . . . . . . . . . . . . . . . . . . . . . 45

    2.1.2 The Convection Diffusion Equation . . . . . . . . . . . . . . . 47


    2.2 Finite Difference Methods . . . . . . . . . . . . . . . . . . . . . . . . . 47

    2.2.1 Basic Approximations . . . . . . . . . . . . . . . . . . . . . . . 48

    2.2.2 Difference Schemes for the Laplacean Operator . . . . . . . . . 49

    2.2.3 Finite Differences for 1-D Problems . . . . . . . . . . . . . . . 51

2.2.4 Upwind Schemes . . . . . . . . . . . . . . . . . . . . . . . . 51

2.2.5 Finite Differences for 2-D Problems . . . . . . . . . . . . . . 54

    2.3 The Finite Element Method . . . . . . . . . . . . . . . . . . . . . . . . 55

    2.4 Mesh Generation and Refinement . . . . . . . . . . . . . . . . . . . . . 61

    2.5 Finite Volume Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

    3 SPARSE MATRICES 68

    3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

    3.2 Graph Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

    3.2.1 Graphs and Adjacency Graphs . . . . . . . . . . . . . . . . . . 70

    3.2.2 Graphs of PDE Matrices . . . . . . . . . . . . . . . . . . . . . 72

    3.3 Permutations and Reorderings . . . . . . . . . . . . . . . . . . . . . . . 72

    3.3.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 72

    3.3.2 Relations with the Adjacency Graph . . . . . . . . . . . . . . . 75

    3.3.3 Common Reorderings . . . . . . . . . . . . . . . . . . . . . . . 75

    3.3.4 Irreducibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    3.4 Storage Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

    3.5 Basic Sparse Matrix Operations . . . . . . . . . . . . . . . . . . . . . . 87

3.6 Sparse Direct Solution Methods . . . . . . . . . . . . . . . . . . 88

3.7 Test Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

    4 BASIC ITERATIVE METHODS 95

    4.1 Jacobi, Gauss-Seidel, and SOR . . . . . . . . . . . . . . . . . . . . . . 95

    4.1.1 Block Relaxation Schemes . . . . . . . . . . . . . . . . . . . . 98

    4.1.2 Iteration Matrices and Preconditioning . . . . . . . . . . . . . . 102

    4.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

4.2.1 General Convergence Result . . . . . . . . . . . . . . . . . . 104

4.2.2 Regular Splittings . . . . . . . . . . . . . . . . . . . . . . . 107

    4.2.3 Diagonally Dominant Matrices . . . . . . . . . . . . . . . . . . 108

    4.2.4 Symmetric Positive Definite Matrices . . . . . . . . . . . . . . 112

    4.2.5 Property A and Consistent Orderings . . . . . . . . . . . . . . . 112

    4.3 Alternating Direction Methods . . . . . . . . . . . . . . . . . . . . . . 116

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

    5 PROJECTION METHODS 122

    5.1 Basic Definitions and Algorithms . . . . . . . . . . . . . . . . . . . . . 122

    5.1.1 General Projection Methods . . . . . . . . . . . . . . . . . . . 123

    5.1.2 Matrix Representation . . . . . . . . . . . . . . . . . . . . . . . 124

    5.2 General Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

    5.2.1 Two Optimality Results . . . . . . . . . . . . . . . . . . . . . . 126


    5.2.2 Interpretation in Terms of Projectors . . . . . . . . . . . . . . . 127

    5.2.3 General Error Bound . . . . . . . . . . . . . . . . . . . . . . . 129

    5.3 One-Dimensional Projection Processes . . . . . . . . . . . . . . . . . . 131

    5.3.1 Steepest Descent . . . . . . . . . . . . . . . . . . . . . . . . . 132

5.3.2 Minimal Residual (MR) Iteration . . . . . . . . . . . . . . . . 134

5.3.3 Residual Norm Steepest Descent . . . . . . . . . . . . . . . . 136

    5.4 Additive and Multiplicative Processes . . . . . . . . . . . . . . . . . . . 136

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

    6 KRYLOV SUBSPACE METHODS PART I 144

    6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

    6.2 Krylov Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

    6.3 Arnoldis Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

    6.3.1 The Basic Algorithm . . . . . . . . . . . . . . . . . . . . . . . 147

    6.3.2 Practical Implementations . . . . . . . . . . . . . . . . . . . . . 149

    6.4 Arnoldis Method for Linear Systems (FOM) . . . . . . . . . . . . . . . 152

    6.4.1 Variation 1: Restarted FOM . . . . . . . . . . . . . . . . . . . . 154

    6.4.2 Variation 2: IOM and DIOM . . . . . . . . . . . . . . . . . . . 155

    6.5 GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

    6.5.1 The Basic GMRES Algorithm . . . . . . . . . . . . . . . . . . 158

    6.5.2 The Householder Version . . . . . . . . . . . . . . . . . . . . . 159

    6.5.3 Practical Implementation Issues . . . . . . . . . . . . . . . . . 161

    6.5.4 Breakdown of GMRES . . . . . . . . . . . . . . . . . . . . . . 165

6.5.5 Relations between FOM and GMRES . . . . . . . . . . . . . . 165

6.5.6 Variation 1: Restarting . . . . . . . . . . . . . . . . . . . . 168

    6.5.7 Variation 2: Truncated GMRES Versions . . . . . . . . . . . . . 169

    6.6 The Symmetric Lanczos Algorithm . . . . . . . . . . . . . . . . . . . . 174

    6.6.1 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

    6.6.2 Relation with Orthogonal Polynomials . . . . . . . . . . . . . . 175

    6.7 The Conjugate Gradient Algorithm . . . . . . . . . . . . . . . . . . . . 176

    6.7.1 Derivation and Theory . . . . . . . . . . . . . . . . . . . . . . 176

    6.7.2 Alternative Formulations . . . . . . . . . . . . . . . . . . . . . 180

6.7.3 Eigenvalue Estimates from the CG Coefficients . . . . . . . . . 181

6.8 The Conjugate Residual Method . . . . . . . . . . . . . . . . . . 183

    6.9 GCR, ORTHOMIN, and ORTHODIR . . . . . . . . . . . . . . . . . . . 183

    6.10 The Faber-Manteuffel Theorem . . . . . . . . . . . . . . . . . . . . . . 186

    6.11 Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

    6.11.1 Real Chebyshev Polynomials . . . . . . . . . . . . . . . . . . . 188

    6.11.2 Complex Chebyshev Polynomials . . . . . . . . . . . . . . . . 189

    6.11.3 Convergence of the CG Algorithm . . . . . . . . . . . . . . . . 193

    6.11.4 Convergence of GMRES . . . . . . . . . . . . . . . . . . . . . 194

    6.12 Block Krylov Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 197

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

    7 KRYLOV SUBSPACE METHODS PART II 205

    7.1 Lanczos Biorthogonalization . . . . . . . . . . . . . . . . . . . . . . . 205


    7.1.1 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

    7.1.2 Practical Implementations . . . . . . . . . . . . . . . . . . . . . 208

    7.2 The Lanczos Algorithm for Linear Systems . . . . . . . . . . . . . . . . 210

    7.3 The BCG and QMR Algorithms . . . . . . . . . . . . . . . . . . . . . . 210

7.3.1 The Biconjugate Gradient Algorithm . . . . . . . . . . . . . . 211

7.3.2 Quasi-Minimal Residual Algorithm . . . . . . . . . . . . . . . 212

    7.4 Transpose-Free Variants . . . . . . . . . . . . . . . . . . . . . . . . . . 214

    7.4.1 Conjugate Gradient Squared . . . . . . . . . . . . . . . . . . . 215

    7.4.2 BICGSTAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

    7.4.3 Transpose-Free QMR (TFQMR) . . . . . . . . . . . . . . . . . 221

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

    8 METHODS RELATED TO THE NORMAL EQUATIONS 230

    8.1 The Normal Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

    8.2 Row Projection Methods . . . . . . . . . . . . . . . . . . . . . . . . . 232

    8.2.1 Gauss-Seidel on the Normal Equations . . . . . . . . . . . . . . 232

    8.2.2 Cimminos Method . . . . . . . . . . . . . . . . . . . . . . . . 234

    8.3 Conjugate Gradient and Normal Equations . . . . . . . . . . . . . . . . 237

    8.3.1 CGNR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

    8.3.2 CGNE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

    8.4 Saddle-Point Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

9 PRECONDITIONED ITERATIONS 245

9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

    9.2 Preconditioned Conjugate Gradient . . . . . . . . . . . . . . . . . . . . 246

    9.2.1 Preserving Symmetry . . . . . . . . . . . . . . . . . . . . . . . 246

    9.2.2 Efficient Implementations . . . . . . . . . . . . . . . . . . . . . 249

    9.3 Preconditioned GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . 251

    9.3.1 Left-Preconditioned GMRES . . . . . . . . . . . . . . . . . . . 251

    9.3.2 Right-Preconditioned GMRES . . . . . . . . . . . . . . . . . . 253

    9.3.3 Split Preconditioning . . . . . . . . . . . . . . . . . . . . . . . 254

9.3.4 Comparison of Right and Left Preconditioning . . . . . . . . . 255

9.4 Flexible Variants . . . . . . . . . . . . . . . . . . . . . . . . . 256

    9.4.1 Flexible GMRES . . . . . . . . . . . . . . . . . . . . . . . . . 256

    9.4.2 DQGMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

    9.5 Preconditioned CG for the Normal Equations . . . . . . . . . . . . . . . 260

    9.6 The CGW Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

    10 PRECONDITIONING TECHNIQUES 265

    10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

    10.2 Jacobi, SOR, and SSOR Preconditioners . . . . . . . . . . . . . . . . . 266

    10.3 ILU Factorization Preconditioners . . . . . . . . . . . . . . . . . . . . 269

    10.3.1 Incomplete LU Factorizations . . . . . . . . . . . . . . . . . . . 270

    10.3.2 Zero Fill-in ILU (ILU(0)) . . . . . . . . . . . . . . . . . . . . . 275


10.3.3 Level of Fill and ILU(p) . . . . . . . . . . . . . . . . . . . . 278

    10.3.4 Matrices with Regular Structure . . . . . . . . . . . . . . . . . 281

    10.3.5 Modified ILU (MILU) . . . . . . . . . . . . . . . . . . . . . . 286

    10.4 Threshold Strategies and ILUT . . . . . . . . . . . . . . . . . . . . . . 287

10.4.1 The ILUT Approach . . . . . . . . . . . . . . . . . . . . . . 288

10.4.2 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

    10.4.3 Implementation Details . . . . . . . . . . . . . . . . . . . . . . 292

    10.4.4 The ILUTP Approach . . . . . . . . . . . . . . . . . . . . . . . 294

    10.4.5 The ILUS Approach . . . . . . . . . . . . . . . . . . . . . . . . 296

    10.5 Approximate Inverse Preconditioners . . . . . . . . . . . . . . . . . . . 298

    10.5.1 Approximating the Inverse of a Sparse Matrix . . . . . . . . . . 299

    10.5.2 Global Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . 299

    10.5.3 Column-Oriented Algorithms . . . . . . . . . . . . . . . . . . . 301

    10.5.4 Theoretical Considerations . . . . . . . . . . . . . . . . . . . . 303

    10.5.5 Convergence of Self Preconditioned MR . . . . . . . . . . . . . 305

    10.5.6 Factored Approximate Inverses . . . . . . . . . . . . . . . . . . 307

    10.5.7 Improving a Preconditioner . . . . . . . . . . . . . . . . . . . . 310

    10.6 Block Preconditioners . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

    10.6.1 Block-Tridiagonal Matrices . . . . . . . . . . . . . . . . . . . . 311

    10.6.2 General Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 312

    10.7 Preconditioners for the Normal Equations . . . . . . . . . . . . . . . . 313

    10.7.1 Jacobi, SOR, and Variants . . . . . . . . . . . . . . . . . . . . . 313

    10.7.2 IC(0) for the Normal Equations . . . . . . . . . . . . . . . . . . 314

10.7.3 Incomplete Gram-Schmidt and ILQ . . . . . . . . . . . . . . . 316

Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 319

    11 PARALLEL IMPLEMENTATIONS 324

    11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

    11.2 Forms of Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

    11.2.1 Multiple Functional Units . . . . . . . . . . . . . . . . . . . . . 325

    11.2.2 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

    11.2.3 Vector Processors . . . . . . . . . . . . . . . . . . . . . . . . . 326

11.2.4 Multiprocessing and Distributed Computing . . . . . . . . . . 326

11.3 Types of Parallel Architectures . . . . . . . . . . . . . . . . . . 327

    11.3.1 Shared Memory Computers . . . . . . . . . . . . . . . . . . . . 327

    11.3.2 Distributed Memory Architectures . . . . . . . . . . . . . . . . 329

    11.4 Types of Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

    11.4.1 Preconditioned CG . . . . . . . . . . . . . . . . . . . . . . . . 332

    11.4.2 GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

    11.4.3 Vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . 333

    11.4.4 Reverse Communication . . . . . . . . . . . . . . . . . . . . . 334

    11.5 Matrix-by-Vector Products . . . . . . . . . . . . . . . . . . . . . . . . 335

    11.5.1 The Case of Dense Matrices . . . . . . . . . . . . . . . . . . . 335

    11.5.2 The CSR and CSC Formats . . . . . . . . . . . . . . . . . . . . 336

    11.5.3 Matvecs in the Diagonal Format . . . . . . . . . . . . . . . . . 339

    11.5.4 The Ellpack-Itpack Format . . . . . . . . . . . . . . . . . . . . 340


    11.5.5 The Jagged Diagonal Format . . . . . . . . . . . . . . . . . . . 341

    11.5.6 The Case of Distributed Sparse Matrices . . . . . . . . . . . . . 342

    11.6 Standard Preconditioning Operations . . . . . . . . . . . . . . . . . . . 345

    11.6.1 Parallelism in Forward Sweeps . . . . . . . . . . . . . . . . . . 346

11.6.2 Level Scheduling: the Case of 5-Point Matrices . . . . . . . . 346

11.6.3 Level Scheduling for Irregular Graphs . . . . . . . . . . . . . 347

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

    12 PARALLEL PRECONDITIONERS 353

    12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

    12.2 Block-Jacobi Preconditioners . . . . . . . . . . . . . . . . . . . . . . . 354

    12.3 Polynomial Preconditioners . . . . . . . . . . . . . . . . . . . . . . . . 356

    12.3.1 Neumann Polynomials . . . . . . . . . . . . . . . . . . . . . . 356

    12.3.2 Chebyshev Polynomials . . . . . . . . . . . . . . . . . . . . . . 357

    12.3.3 Least-Squares Polynomials . . . . . . . . . . . . . . . . . . . . 360

    12.3.4 The Nonsymmetric Case . . . . . . . . . . . . . . . . . . . . . 363

    12.4 Multicoloring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

    12.4.1 Red-Black Ordering . . . . . . . . . . . . . . . . . . . . . . . . 366

    12.4.2 Solution of Red-Black Systems . . . . . . . . . . . . . . . . . . 367

    12.4.3 Multicoloring for General Sparse Matrices . . . . . . . . . . . . 368

    12.5 Multi-Elimination ILU . . . . . . . . . . . . . . . . . . . . . . . . . . . 369

    12.5.1 Multi-Elimination . . . . . . . . . . . . . . . . . . . . . . . . . 370

    12.5.2 ILUM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

12.6 Distributed ILU and SSOR . . . . . . . . . . . . . . . . . . . . 374

12.6.1 Distributed Sparse Matrices . . . . . . . . . . . . . . . . . . 374

    12.7 Other Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376

    12.7.1 Approximate Inverses . . . . . . . . . . . . . . . . . . . . . . . 377

    12.7.2 Element-by-Element Techniques . . . . . . . . . . . . . . . . . 377

    12.7.3 Parallel Row Projection Preconditioners . . . . . . . . . . . . . 379

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

    13 DOMAIN DECOMPOSITION METHODS 383

13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 383

13.1.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 384

    13.1.2 Types of Partitionings . . . . . . . . . . . . . . . . . . . . . . . 385

    13.1.3 Types of Techniques . . . . . . . . . . . . . . . . . . . . . . . . 386

    13.2 Direct Solution and the Schur Complement . . . . . . . . . . . . . . . . 388

    13.2.1 Block Gaussian Elimination . . . . . . . . . . . . . . . . . . . 388

    13.2.2 Properties of the Schur Complement . . . . . . . . . . . . . . . 389

    13.2.3 Schur Complement for Vertex-Based Partitionings . . . . . . . . 390

    13.2.4 Schur Complement for Finite-Element Partitionings . . . . . . . 393

    13.3 Schwarz Alternating Procedures . . . . . . . . . . . . . . . . . . . . . . 395

    13.3.1 Multiplicative Schwarz Procedure . . . . . . . . . . . . . . . . 395

    13.3.2 Multiplicative Schwarz Preconditioning . . . . . . . . . . . . . 400

    13.3.3 Additive Schwarz Procedure . . . . . . . . . . . . . . . . . . . 402

    13.3.4 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404


    13.4 Schur Complement Approaches . . . . . . . . . . . . . . . . . . . . . . 408

    13.4.1 Induced Preconditioners . . . . . . . . . . . . . . . . . . . . . . 408

    13.4.2 Probing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410

    13.4.3 Preconditioning Vertex-Based Schur Complements . . . . . . . 411

13.5 Full Matrix Methods . . . . . . . . . . . . . . . . . . . . . . . 412

13.6 Graph Partitioning . . . . . . . . . . . . . . . . . . . . . . . . 414

    13.6.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 414

    13.6.2 Geometric Approach . . . . . . . . . . . . . . . . . . . . . . . 415

    13.6.3 Spectral Techniques . . . . . . . . . . . . . . . . . . . . . . . . 417

    13.6.4 Graph Theory Techniques . . . . . . . . . . . . . . . . . . . . . 418

    Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422

    REFERENCES 425

    INDEX 439


PREFACE

Iterative methods for solving general, large sparse linear systems have been gaining

    popularity in many areas of scientific computing. Until recently, direct solution methods

    were often preferred to iterative methods in real applications because of their robustness

    and predictable behavior. However, a number of efficient iterative solvers were discovered

    and the increased need for solving very large linear systems triggered a noticeable and

    rapid shift toward iterative techniques in many applications.

    This trend can be traced back to the 1960s and 1970s when two important develop-

    ments revolutionized solution methods for large linear systems. First was the realization

    that one can take advantage of sparsity to design special direct methods that can be

    quite economical. Initiated by electrical engineers, these direct sparse solution methods

    led to the development of reliable and efficient general-purpose direct solution software

    codes over the next three decades. Second was the emergence of preconditioned conjugate

    gradient-like methods for solving linear systems. It was found that the combination of pre-

    conditioning and Krylov subspace iterations could provide efficient and simple general-purpose procedures that could compete with direct solvers. Preconditioning involves ex-

    ploiting ideas from sparse direct solvers. Gradually, iterative methods started to approach

    the quality of direct solvers. In earlier times, iterative methods were often special-purpose

    in nature. They were developed with certain applications in mind, and their efficiency relied

    on many problem-dependent parameters.

    Now, three-dimensional models are commonplace and iterative methods are al-

    most mandatory. The memory and the computational requirements for solving three-

    dimensional Partial Differential Equations, or two-dimensional ones involving many

    degrees of freedom per point, may seriously challenge the most efficient direct solvers

available today. Also, iterative methods are gaining ground because they are easier to implement efficiently on high-performance computers than direct methods.

    My intention in writing this volume is to provide up-to-date coverage of iterative meth-

    ods for solving large sparse linear systems. I focused the book on practical methods that

    work for general sparse matrices rather than for any specific class of problems. It is indeed

    becoming important to embrace applications not necessarily governed by Partial Differ-

    ential Equations, as these applications are on the rise. Apart from two recent volumes by

    Axelsson [15] and Hackbusch [116], few books on iterative methods have appeared since

the excellent ones by Varga [213] and later Young [232]. Since then, researchers and practitioners have achieved remarkable progress in the development and use of effective iterative methods. Unfortunately, fewer elegant results have been discovered since the 1950s

    and 1960s. The field has moved in other directions. Methods have gained not only in effi-

    ciency but also in robustness and in generality. The traditional techniques which required


    rather complicated procedures to determine optimal acceleration parameters have yielded

    to the parameter-free conjugate gradient class of methods.

    The primary aim of this book is to describe some of the best techniques available today,

    from both preconditioners and accelerators. One of the aims of the book is to provide a

    good mix of theory and practice. It also addresses some of the current research issuessuch as parallel implementations and robust preconditioners. The emphasis is on Krylov

    subspace methods, currently the most practical and common group of techniques used in

    applications. Although there is a tutorial chapter that covers the discretization of Partial

    Differential Equations, the book is not biased toward any specific application area. Instead,

    the matrices are assumed to be general sparse, possibly irregularly structured.

    The book has been structured in four distinct parts. The first part, Chapters 1 to 4,

    presents the basic tools. The second part, Chapters 5 to 8, presents projection methods and

    Krylov subspace techniques. The third part, Chapters 9 and 10, discusses precondition-

    ing. The fourth part, Chapters 11 to 13, discusses parallel implementations and parallel

    algorithms.

    I am grateful to a number of colleagues who proofread or reviewed different versions of

the manuscript. Among them are Randy Bramley (University of Indiana at Bloomington), Xiao-Chuan Cai (University of Colorado at Boulder), Tony Chan (University of California

at Los Angeles), Jane Cullum (IBM, Yorktown Heights), Alan Edelman (Massachusetts

    Institute of Technology), Paul Fischer (Brown University), David Keyes (Old Dominion

    University), Beresford Parlett (University of California at Berkeley) and Shang-Hua Teng

    (University of Minnesota). Their numerous comments, corrections, and encouragements

    were a highly appreciated contribution. In particular, they helped improve the presenta-

    tion considerably and prompted the addition of a number of topics missing from earlier

    versions.

    This book evolved from several successive improvements of a set of lecture notes for

the course Iterative Methods for Linear Systems which I taught at the University of Minnesota in the last few years. I apologize to those students who used the earlier error-laden

    and incomplete manuscripts. Their input and criticism contributed significantly to improv-

    ing the manuscript. I also wish to thank those students at MIT (with Alan Edelman) and

    UCLA (with Tony Chan) who used this book in manuscript form and provided helpful

feedback. My colleagues at the University of Minnesota, staff and faculty members, have

    helped in different ways. I wish to thank in particular Ahmed Sameh for his encourage-

    ments and for fostering a productive environment in the department. Finally, I am grateful

    to the National Science Foundation for their continued financial support of my research,

    part of which is represented in this work.

    Yousef Saad


SUGGESTIONS FOR TEACHING

This book can be used as a text to teach a graduate-level course on iterative methods for

    linear systems. Selecting topics to teach depends on whether the course is taught in a

    mathematics department or a computer science (or engineering) department, and whether

    the course is over a semester or a quarter. Here are a few comments on the relevance of the

    topics in each chapter.

    For a graduate course in a mathematics department, much of the material in Chapter 1

    should be known already. For non-mathematics majors most of the chapter must be covered

    or reviewed to acquire a good background for later chapters. The important topics for

    the rest of the book are in Sections: 1.8.1, 1.8.3, 1.8.4, 1.9, 1.11. Section 1.12 is best

treated at the beginning of Chapter 5. Chapter 2 is essentially independent from the rest and could be skipped altogether in a quarter course. One lecture on finite differences and

    the resulting matrices would be enough for a non-math course. Chapter 3 should make

    the student familiar with some implementation issues associated with iterative solution

    procedures for general sparse matrices. In a computer science or engineering department,

    this can be very relevant. For mathematicians, a mention of the graph theory aspects of

    sparse matrices and a few storage schemes may be sufficient. Most students at this level

    should be familiar with a few of the elementary relaxation techniques covered in Chapter

    4. The convergence theory can be skipped for non-math majors. These methods are now

    often used as preconditioners and this may be the only motive for covering them.

Chapter 5 introduces key concepts and presents projection techniques in general terms. Non-mathematicians may wish to skip Section 5.2.3. Otherwise, it is recommended to

    start the theory section by going back to Section 1.12 on general definitions on projectors.

    Chapters 6 and 7 represent the heart of the matter. It is recommended to describe the first

    algorithms carefully and put emphasis on the fact that they generalize the one-dimensional

    methods covered in Chapter 5. It is also important to stress the optimality properties of

    those methods in Chapter 6 and the fact that these follow immediately from the properties

    of projectors seen in Section 1.12. When covering the algorithms in Chapter 7, it is crucial

    to point out the main differences between them and those seen in Chapter 6. The variants

such as CGS, BICGSTAB, and TFQMR can be covered in a short time, omitting details of the algebraic derivations or covering only one of the three. The class of methods based on

    the normal equation approach, i.e., Chapter 8, can be skipped in a math-oriented course,

    especially in the case of a quarter system. For a semester course, selected topics may be

    Sections 8.1, 8.2, and 8.4.

    Currently, preconditioning is known to be the critical ingredient in the success of it-

    erative methods in solving real-life problems. Therefore, at least some parts of Chapter 9

    and Chapter 10 should be covered. Section 9.2 and (very briefly) 9.3 are recommended.

    From Chapter 10, discuss the basic ideas in Sections 10.1 through 10.3. The rest could be

    skipped in a quarter course.

Chapter 11 may be useful to present to computer science majors, but may be skimmed or skipped in a mathematics or an engineering course. Parts of Chapter 12 could be taught

    primarily to make the students aware of the importance of alternative preconditioners.

    Suggested selections are: 12.2, 12.4, and 12.7.2 (for engineers). Chapter 13 presents an im-


1 BACKGROUND IN LINEAR ALGEBRA

1.1 Matrices

For the sake of generality, all vector spaces considered in this chapter are complex, unless otherwise stated. A complex $n \times m$ matrix $A$ is an $n \times m$ array of complex numbers

$$a_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m.$$

The set of all $n \times m$ matrices is a complex vector space denoted by $\mathbb{C}^{n \times m}$. The main operations with matrices are the following:

Addition: $C = A + B$, where $A$, $B$, and $C$ are matrices of size $n \times m$ and $c_{ij} = a_{ij} + b_{ij}$.


Multiplication by a scalar: $C = \alpha A$, where $c_{ij} = \alpha\, a_{ij}$.

Multiplication by another matrix: $C = AB$, where $A \in \mathbb{C}^{n \times m}$, $B \in \mathbb{C}^{m \times p}$, $C \in \mathbb{C}^{n \times p}$, and $c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}$.

Sometimes, a notation with column vectors and row vectors is used. The column vector $a_{*j}$ is the vector consisting of the $j$-th column of $A$,

$$a_{*j} = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{pmatrix}.$$

Similarly, the notation $a_{i*}$ will denote the $i$-th row of the matrix $A$:

$$a_{i*} = (a_{i1}, a_{i2}, \ldots, a_{im}).$$

For example, the following could be written

$$A = (a_{*1}, a_{*2}, \ldots, a_{*m}) \quad \text{or} \quad A = \begin{pmatrix} a_{1*} \\ a_{2*} \\ \vdots \\ a_{n*} \end{pmatrix}.$$

The transpose of a matrix $A$ in $\mathbb{C}^{n \times m}$ is a matrix $C$ in $\mathbb{C}^{m \times n}$ whose elements are defined by $c_{ij} = a_{ji}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$. It is denoted by $A^{T}$. It is often more relevant to use the transpose conjugate matrix denoted by $A^{H}$ and defined by

$$A^{H} = \bar{A}^{T} = \overline{A^{T}},$$

in which the bar denotes the (element-wise) complex conjugation.

Matrices are strongly related to linear mappings between vector spaces of finite dimension. This is because they represent these mappings with respect to two given bases: one for the initial vector space and the other for the image vector space, or range of $A$.


1.2 Square Matrices and Eigenvalues

A matrix is square if it has the same number of columns and rows, i.e., if $m = n$. An important square matrix is the identity matrix

$$I = \{\delta_{ij}\}_{i,j=1,\ldots,n},$$

where $\delta_{ij}$ is the Kronecker symbol. The identity matrix satisfies the equality $AI = IA = A$ for every matrix $A$ of size $n$. The inverse of a matrix, when it exists, is a matrix $C$ such that

$$CA = AC = I.$$

The inverse of $A$ is denoted by $A^{-1}$.

The determinant of a matrix may be defined in several ways. For simplicity, the following recursive definition is used here. The determinant of a $1 \times 1$ matrix $(a)$ is defined as the scalar $a$. Then the determinant of an $n \times n$ matrix is given by

$$\det(A) = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} \det(A_{1j}),$$

where $A_{1j}$ is an $(n-1) \times (n-1)$ matrix obtained by deleting the first row and the $j$-th column of $A$. A matrix is said to be singular when $\det(A) = 0$ and nonsingular otherwise. We have the following simple properties:

$\det(AB) = \det(A)\det(B)$.

$\det(A^{T}) = \det(A)$.

$\det(\alpha A) = \alpha^{n}\det(A)$.

$\det(\bar{A}) = \overline{\det(A)}$.

$\det(I) = 1$.

From the above definition of determinants it can be shown by induction that the function that maps a given complex value $\lambda$ to the value $p_{A}(\lambda) = \det(A - \lambda I)$ is a polynomial of degree $n$; see Exercise 8. This is known as the characteristic polynomial of the matrix $A$.

A complex scalar $\lambda$ is called an eigenvalue of the square matrix $A$ if a nonzero vector $u$ of $\mathbb{C}^{n}$ exists such that $Au = \lambda u$. The vector $u$ is called an eigenvector of $A$ associated with $\lambda$. The set of all the eigenvalues of $A$ is called the spectrum of $A$ and is denoted by $\sigma(A)$.

A scalar $\lambda$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I) \equiv p_{A}(\lambda) = 0$. That is true if and only if (iff thereafter) $\lambda$ is a root of the characteristic polynomial. In particular, there are at most $n$ distinct eigenvalues.

It is clear that a matrix is singular if and only if it admits zero as an eigenvalue. A well known result in linear algebra is stated in the following proposition.

A matrix $A$ is nonsingular if and only if it admits an inverse.

Thus, the determinant of a matrix determines whether or not the matrix admits an inverse.

The maximum modulus of the eigenvalues is called the spectral radius and is denoted by

$$\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|.$$

The trace of a matrix is equal to the sum of all its diagonal elements

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

It can be easily shown that the trace of $A$ is also equal to the sum of the eigenvalues of $A$ counted with their multiplicities as roots of the characteristic polynomial.

If $\lambda$ is an eigenvalue of $A$, then $\bar{\lambda}$ is an eigenvalue of $A^{H}$. An eigenvector $w$ of $A^{H}$ associated with the eigenvalue $\bar{\lambda}$ is called a left eigenvector of $A$. When a distinction is necessary, an eigenvector of $A$ is often called a right eigenvector. Therefore, the eigenvalue $\lambda$ as well as the right and left eigenvectors, $u$ and $w$, satisfy the relations

$$Au = \lambda u, \qquad w^{H}A = \lambda w^{H},$$

or, equivalently,

$$A^{H}w = \bar{\lambda}\,w, \qquad u^{H}A^{H} = \bar{\lambda}\,u^{H}.$$
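As a quick numerical illustration (an addition to the text, not from the book), the following NumPy sketch checks these facts on a small random matrix: the trace equals the sum of the eigenvalues, and the eigenvalues of $A^{H}$ are the conjugates of those of $A$.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

    lam = np.linalg.eigvals(A)                 # spectrum of A
    print("spectral radius:", np.max(np.abs(lam)))
    print("trace == sum of eigenvalues:", np.isclose(np.trace(A), lam.sum()))

    lamH = np.linalg.eigvals(A.conj().T)       # spectrum of A^H
    print("sigma(A^H) == conj(sigma(A)):",
          np.allclose(np.sort_complex(lamH), np.sort_complex(np.conj(lam))))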

1.3 Types of Matrices

The choice of a method for solving linear systems will often depend on the structure of the matrix $A$. One of the most important properties of matrices is symmetry, because of its impact on the eigenstructure of $A$. A number of other classes of matrices also have particular eigenstructures. The most important ones are listed below:

Symmetric matrices: $A^{T} = A$.

Hermitian matrices: $A^{H} = A$.

Skew-symmetric matrices: $A^{T} = -A$.

Skew-Hermitian matrices: $A^{H} = -A$.

Normal matrices: $A^{H}A = AA^{H}$.

Nonnegative matrices: $a_{ij} \geq 0$, $i, j = 1, \ldots, n$ (similar definition for nonpositive, positive, and negative matrices).

Unitary matrices: $Q^{H}Q = I$.
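These definitions translate directly into numerical tests. The short NumPy sketch below is an illustrative addition (the helper names are ad hoc, not from the book):

    import numpy as np

    def is_hermitian(A, tol=1e-12):
        return np.allclose(A, A.conj().T, atol=tol)

    def is_normal(A, tol=1e-12):
        AH = A.conj().T
        return np.allclose(AH @ A, A @ AH, atol=tol)

    def is_unitary(Q, tol=1e-12):
        return np.allclose(Q.conj().T @ Q, np.eye(Q.shape[0]), atol=tol)

    # a Hermitian matrix is in particular normal
    B = np.array([[2.0, 1 - 1j], [1 + 1j, 3.0]])
    print(is_hermitian(B), is_normal(B))       # True True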

It is worth noting that a unitary matrix $Q$ is a matrix whose inverse is its transpose conjugate $Q^{H}$, since

$$Q^{H}Q = I \quad \Rightarrow \quad Q^{-1} = Q^{H}.$$

A matrix $Q$ such that $Q^{H}Q$ is diagonal is often called orthogonal.

Some matrices have particular structures that are often convenient for computational purposes. The following list, though incomplete, gives an idea of these special matrices which play an important role in numerical analysis and scientific computing applications.

Diagonal matrices: $a_{ij} = 0$ for $j \neq i$. Notation: $A = \operatorname{diag}(a_{11}, a_{22}, \ldots, a_{nn})$.

Upper triangular matrices: $a_{ij} = 0$ for $i > j$.

Lower triangular matrices: $a_{ij} = 0$ for $i < j$.

Upper bidiagonal matrices: $a_{ij} = 0$ for any pair $i, j$ such that $j \neq i, i + 1$.

Lower bidiagonal matrices: $a_{ij} = 0$ for any pair $i, j$ such that $j \neq i, i - 1$.

Tridiagonal matrices: $a_{ij} = 0$ for any pair $i, j$ such that $|j - i| > 1$. Notation: $A = \operatorname{tridiag}(a_{i,i-1}, a_{ii}, a_{i,i+1})$.

Banded matrices: $a_{ij} \neq 0$ only if $i - m_{l} \leq j \leq i + m_{u}$, where $m_{l}$ and $m_{u}$ are two nonnegative integers. The number $m_{l} + m_{u} + 1$ is called the bandwidth of $A$.

Upper Hessenberg matrices: $a_{ij} = 0$ for any pair $i, j$ such that $i > j + 1$. Lower Hessenberg matrices can be defined similarly.

Outer product matrices: $A = uv^{H}$, where both $u$ and $v$ are vectors.

Permutation matrices: the columns of $A$ are a permutation of the columns of the identity matrix.

Block diagonal matrices: generalizes the diagonal matrix by replacing each diagonal entry by a matrix. Notation: $A = \operatorname{diag}(A_{11}, A_{22}, \ldots, A_{pp})$.

Block tridiagonal matrices: generalizes the tridiagonal matrix by replacing each nonzero entry by a square matrix. Notation: $A = \operatorname{tridiag}(A_{i,i-1}, A_{ii}, A_{i,i+1})$.

The above properties emphasize structure, i.e., positions of the nonzero elements with respect to the zeros. Also, they assume that there are many zero elements or that the matrix is of low rank. This is in contrast with the classifications listed earlier, such as symmetry or normality.

1.4 Vector Inner Products and Norms

An inner product on a (complex) vector space $\mathbb{X}$ is any mapping $s$ from $\mathbb{X} \times \mathbb{X}$ into $\mathbb{C}$ which satisfies the following conditions:

1. $s(x, y)$ is linear with respect to $x$, i.e.,
$$s(\lambda_{1}x_{1} + \lambda_{2}x_{2},\, y) = \lambda_{1}s(x_{1}, y) + \lambda_{2}s(x_{2}, y), \quad \forall\, x_{1}, x_{2} \in \mathbb{X},\ \forall\, \lambda_{1}, \lambda_{2} \in \mathbb{C}.$$

2. $s(x, y)$ is Hermitian, i.e.,
$$s(y, x) = \overline{s(x, y)}, \quad \forall\, x, y \in \mathbb{X}.$$

3. $s(x, y)$ is positive definite, i.e.,
$$s(x, x) > 0, \quad \forall\, x \neq 0.$$

Note that (2) implies that $s(x, x)$ is real and therefore, (3) adds the constraint that $s(x, x)$ must also be positive for any nonzero $x$. For any $x$ and $y$,

$$s(x, 0) = s(x, 0 \cdot y) = \bar{0}\, s(x, y) = 0.$$

Similarly, $s(0, y) = 0$ for any $y$. Hence, $s(0, y) = s(x, 0) = 0$ for any $x$ and $y$. In particular the condition (3) can be rewritten as

$$s(x, x) \geq 0 \quad \text{and} \quad s(x, x) = 0 \ \text{iff} \ x = 0,$$

as can be readily shown. A useful relation satisfied by any inner product is the so-called Cauchy-Schwartz inequality:

$$|s(x, y)|^{2} \leq s(x, x)\, s(y, y). \tag{1.2}$$

The proof of this inequality begins by expanding $s(x - \lambda y, x - \lambda y)$ using the properties of $s$,

$$s(x - \lambda y, x - \lambda y) = s(x, x) - \bar{\lambda}\, s(x, y) - \lambda\, s(y, x) + |\lambda|^{2} s(y, y).$$

If $y = 0$ then the inequality is trivially satisfied. Assume that $y \neq 0$ and take $\lambda = s(x, y)/s(y, y)$. Then, from the above equality, $s(x - \lambda y, x - \lambda y) \geq 0$ shows that

$$0 \leq s(x - \lambda y, x - \lambda y) = s(x, x) - \frac{|s(x, y)|^{2}}{s(y, y)},$$

which yields the result (1.2).

In the particular case of the vector space $\mathbb{X} = \mathbb{C}^{n}$, a canonical inner product is the Euclidean inner product. The Euclidean inner product of two vectors $x = (x_{i})_{i=1,\ldots,n}$ and $y = (y_{i})_{i=1,\ldots,n}$

of $\mathbb{C}^{n}$ is defined by

$$(x, y) = \sum_{i=1}^{n} x_{i}\bar{y}_{i}, \tag{1.3}$$

which is often rewritten in matrix notation as

$$(x, y) = y^{H}x.$$

It is easy to verify that this mapping does indeed satisfy the three conditions required for inner products, listed above. A fundamental property of the Euclidean inner product in matrix computations is the simple relation

$$(Ax, y) = (x, A^{H}y), \quad \forall\, x, y \in \mathbb{C}^{n}. \tag{1.5}$$

The proof of this is straightforward. The adjoint of $A$ with respect to an arbitrary inner product is a matrix $B$ such that $(Ax, y) = (x, By)$ for all pairs of vectors $x$ and $y$. A matrix is self-adjoint, or Hermitian with respect to this inner product, if it is equal to its adjoint.

The following proposition is a consequence of the equality (1.5).

PROPOSITION 1.3 Unitary matrices preserve the Euclidean inner product, i.e.,

$$(Qx, Qy) = (x, y)$$

for any unitary matrix $Q$ and any vectors $x$ and $y$.

Proof. Indeed, $(Qx, Qy) = (x, Q^{H}Qy) = (x, y)$.
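The following NumPy check (an illustrative addition) builds a unitary $Q$ from the QR factorization of a random complex matrix and verifies both the adjoint relation (1.5) and the invariance of the Euclidean inner product under $Q$:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 6
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(M)                       # Q has orthonormal (unitary) columns
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    inner = lambda u, v: np.vdot(v, u)           # (u, v) = sum_i u_i * conj(v_i)
    print(np.isclose(inner(A @ x, y), inner(x, A.conj().T @ y)))   # (Ax, y) = (x, A^H y)
    print(np.isclose(inner(Q @ x, Q @ y), inner(x, y)))            # unitary invariance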

A vector norm on a vector space $\mathbb{X}$ is a real-valued function $x \rightarrow \|x\|$ on $\mathbb{X}$, which satisfies the following three conditions:

1. $\|x\| \geq 0$, and $\|x\| = 0$ iff $x = 0$.

2. $\|\alpha x\| = |\alpha|\,\|x\|$, $\forall\, x \in \mathbb{X}$, $\forall\, \alpha \in \mathbb{C}$.

3. $\|x + y\| \leq \|x\| + \|y\|$, $\forall\, x, y \in \mathbb{X}$.

For the particular case when $\mathbb{X} = \mathbb{C}^{n}$, we can associate with the inner product (1.3) the Euclidean norm of a complex vector defined by

$$\|x\|_{2} = (x, x)^{1/2}.$$

It follows from Proposition 1.3 that a unitary matrix preserves the Euclidean norm metric, i.e.,

$$\|Qx\|_{2} = \|x\|_{2}, \quad \forall\, x.$$

The linear transformation associated with a unitary matrix $Q$ is therefore an isometry.

The most commonly used vector norms in numerical linear algebra are special cases of the Holder norms

$$\|x\|_{p} = \left( \sum_{i=1}^{n} |x_{i}|^{p} \right)^{1/p}.$$

Note that the limit of $\|x\|_{p}$ when $p$ tends to infinity exists and is equal to the maximum modulus of the $x_{i}$'s. This defines a norm denoted by $\|\cdot\|_{\infty}$. The cases $p = 1$, $p = 2$, and $p = \infty$ lead to the most important norms in practice,

$$\|x\|_{1} = |x_{1}| + |x_{2}| + \cdots + |x_{n}|,$$
$$\|x\|_{2} = \left[ |x_{1}|^{2} + |x_{2}|^{2} + \cdots + |x_{n}|^{2} \right]^{1/2},$$
$$\|x\|_{\infty} = \max_{i=1,\ldots,n} |x_{i}|.$$

The Cauchy-Schwartz inequality of (1.2) becomes

$$|(x, y)| \leq \|x\|_{2}\,\|y\|_{2}.$$
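As a small added illustration, the 1-, 2-, and infinity-norms and the Cauchy-Schwartz inequality can be checked directly with NumPy:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
    y = rng.standard_normal(8) + 1j * rng.standard_normal(8)

    n1   = np.linalg.norm(x, 1)       # |x_1| + ... + |x_n|
    n2   = np.linalg.norm(x, 2)       # Euclidean norm
    ninf = np.linalg.norm(x, np.inf)  # max_i |x_i|
    print(n1 >= n2 >= ninf)           # always true for these three norms

    # Cauchy-Schwartz: |(x, y)| <= ||x||_2 ||y||_2
    print(abs(np.vdot(y, x)) <= n2 * np.linalg.norm(y))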

1.5 Matrix Norms

For a general matrix $A$ in $\mathbb{C}^{n \times m}$, we define the following special set of norms

$$\|A\|_{pq} = \max_{x \in \mathbb{C}^{m},\ x \neq 0} \frac{\|Ax\|_{p}}{\|x\|_{q}}. \tag{1.7}$$

The norm $\|\cdot\|_{pq}$ is induced by the two norms $\|\cdot\|_{p}$ and $\|\cdot\|_{q}$. These norms satisfy the usual properties of norms, i.e.,

$$\|A\| \geq 0, \ \text{and} \ \|A\| = 0 \ \text{iff} \ A = 0,$$
$$\|\alpha A\| = |\alpha|\,\|A\|, \quad \forall\, A \in \mathbb{C}^{n \times m},\ \forall\, \alpha \in \mathbb{C},$$
$$\|A + B\| \leq \|A\| + \|B\|, \quad \forall\, A, B \in \mathbb{C}^{n \times m}.$$

The most important cases are again those associated with $p, q = 1, 2, \infty$. The case $q = p$ is of particular interest and the associated norm $\|\cdot\|_{pq}$ is simply denoted by $\|\cdot\|_{p}$ and called a $p$-norm. A fundamental property of a $p$-norm is that

$$\|AB\|_{p} \leq \|A\|_{p}\,\|B\|_{p},$$

an immediate consequence of the definition (1.7). Matrix norms that satisfy the above property are sometimes called consistent. A result of consistency is that for any square matrix $A$,

$$\|A^{k}\|_{p} \leq \|A\|_{p}^{k}.$$

In particular the matrix $A^{k}$ converges to zero if any of its $p$-norms is less than 1.

The Frobenius norm of a matrix is defined by

$$\|A\|_{F} = \left( \sum_{j=1}^{m} \sum_{i=1}^{n} |a_{ij}|^{2} \right)^{1/2}.$$

This can be viewed as the 2-norm of the column (or row) vector in $\mathbb{C}^{nm}$ consisting of all the columns (respectively rows) of $A$ listed from $1$ to $m$ (respectively $1$ to $n$.) It can be shown

that this norm is also consistent, in spite of the fact that it is not induced by a pair of vector norms, i.e., it is not derived from a formula of the form (1.7); see Exercise 5. However, it does not satisfy some of the other properties of the $p$-norms. For example, the Frobenius norm of the identity matrix is not equal to one. To avoid these difficulties, we will only use the term matrix norm for a norm that is induced by two norms as in the definition (1.7). Thus, we will not consider the Frobenius norm to be a proper matrix norm, according to our conventions, even though it is consistent.

The following equalities satisfied by the matrix norms defined above lead to alternative definitions that are often easier to work with:

$$\|A\|_{1} = \max_{j=1,\ldots,m} \sum_{i=1}^{n} |a_{ij}|,$$
$$\|A\|_{\infty} = \max_{i=1,\ldots,n} \sum_{j=1}^{m} |a_{ij}|,$$
$$\|A\|_{2} = \left[ \rho(A^{H}A) \right]^{1/2} = \left[ \rho(AA^{H}) \right]^{1/2}, \tag{1.11}$$
$$\|A\|_{F} = \left[ \operatorname{tr}(A^{H}A) \right]^{1/2} = \left[ \operatorname{tr}(AA^{H}) \right]^{1/2}.$$

As will be shown later, the eigenvalues of $A^{H}A$ are nonnegative. Their square roots are called singular values of $A$ and are denoted by $\sigma_{i}$, $i = 1, \ldots, m$. Thus, the relation (1.11) states that $\|A\|_{2}$ is equal to $\sigma_{1}$, the largest singular value of $A$.
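These equalities are easy to verify numerically. The sketch below (an added illustration) compares NumPy's built-in matrix norms with the column-sum, row-sum, largest-singular-value, and trace formulas above:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 7))

    print(np.isclose(np.linalg.norm(A, 1),      np.abs(A).sum(axis=0).max()))   # max column sum
    print(np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max()))   # max row sum
    print(np.isclose(np.linalg.norm(A, 2),      np.linalg.svd(A, compute_uv=False)[0]))  # sigma_1
    print(np.isclose(np.linalg.norm(A, 'fro'),  np.sqrt(np.trace(A.T @ A))))    # Frobenius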

From the relation (1.11), it is clear that the spectral radius $\rho(A)$ is equal to the 2-norm of a matrix when the matrix is Hermitian. However, it is not a matrix norm in general. For example, the first property of norms is not satisfied, since for

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

we have $\rho(A) = 0$ while $A \neq 0$. Also, the triangle inequality is not satisfied for the pair $A$, and $B = A^{T}$ where $A$ is defined above. Indeed,

$$\rho(A + B) = 1 \quad \text{while} \quad \rho(A) + \rho(B) = 0.$$

1.6 Subspaces, Range, and Kernel

A subspace of $\mathbb{C}^{n}$ is a subset of $\mathbb{C}^{n}$ that is also a complex vector space. The set of all linear combinations of a set of vectors $G = \{a_{1}, a_{2}, \ldots, a_{q}\}$ of $\mathbb{C}^{n}$ is a vector subspace called the linear span of $G$,

$$\operatorname{span}\{G\} = \operatorname{span}\{a_{1}, a_{2}, \ldots, a_{q}\} = \left\{ z \in \mathbb{C}^{n} \;\middle|\; z = \sum_{i=1}^{q} \alpha_{i} a_{i},\ \alpha_{i} \in \mathbb{C} \right\}.$$

If the $a_{i}$'s are linearly independent, then each vector of $\operatorname{span}\{G\}$ admits a unique expression as a linear combination of the $a_{i}$'s. The set $G$ is then called a basis of the subspace $\operatorname{span}\{G\}$.

Given two vector subspaces $S_{1}$ and $S_{2}$, their sum $S$ is a subspace defined as the set of all vectors that are equal to the sum of a vector of $S_{1}$ and a vector of $S_{2}$. The intersection of two subspaces is also a subspace. If the intersection of $S_{1}$ and $S_{2}$ is reduced to $\{0\}$, then the sum of $S_{1}$ and $S_{2}$ is called their direct sum and is denoted by $S = S_{1} \oplus S_{2}$. When $S$ is equal to $\mathbb{C}^{n}$, then every vector $x$ of $\mathbb{C}^{n}$ can be written in a unique way as the sum of an element $x_{1}$ of $S_{1}$ and an element $x_{2}$ of $S_{2}$. The transformation $P$ that maps $x$ into $x_{1}$ is a linear transformation that is idempotent, i.e., such that $P^{2} = P$. It is called a projector onto $S_{1}$ along $S_{2}$.

Two important subspaces that are associated with a matrix $A$ of $\mathbb{C}^{n \times m}$ are its range, defined by

$$\operatorname{Ran}(A) = \{ Ax \mid x \in \mathbb{C}^{m} \},$$

and its kernel or null space

$$\operatorname{Null}(A) = \{ x \in \mathbb{C}^{m} \mid Ax = 0 \}.$$

The range of $A$ is clearly equal to the linear span of its columns. The rank of a matrix is equal to the dimension of the range of $A$, i.e., to the number of linearly independent columns. This column rank is equal to the row rank, the number of linearly independent rows of $A$. A matrix in $\mathbb{C}^{n \times m}$ is of full rank when its rank is equal to the smallest of $m$ and $n$.

A subspace $S$ is said to be invariant under a (square) matrix $A$ whenever $AS \subseteq S$. In particular, for any eigenvalue $\lambda$ of $A$ the subspace $\operatorname{Null}(A - \lambda I)$ is invariant under $A$. The subspace $\operatorname{Null}(A - \lambda I)$ is called the eigenspace associated with $\lambda$ and consists of all the eigenvectors of $A$ associated with $\lambda$, in addition to the zero-vector.

1.7 Orthogonal Vectors and Subspaces

A set of vectors $G = \{a_{1}, a_{2}, \ldots, a_{r}\}$ is said to be orthogonal if

$$(a_{i}, a_{j}) = 0 \quad \text{when} \ i \neq j.$$

It is orthonormal if, in addition, every vector of $G$ has a 2-norm equal to unity. A vector that is orthogonal to all the vectors of a subspace $S$ is said to be orthogonal to this subspace. The set of all the vectors that are orthogonal to $S$ is a vector subspace called the orthogonal complement of $S$ and denoted by $S^{\perp}$. The space $\mathbb{C}^{n}$ is the direct sum of $S$ and its orthogonal complement. Thus, any vector $x$ can be written in a unique fashion as the sum of a vector in $S$ and a vector in $S^{\perp}$. The operator which maps $x$ into its component in the subspace $S$ is the orthogonal projector onto $S$.
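A minimal NumPy sketch of an orthogonal projector (an added illustration): if the columns of $Q$ form an orthonormal basis of a subspace $S$, then $P = QQ^{H}$ maps any $x$ to its component in $S$, and $P$ is idempotent and Hermitian.

    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 8, 3
    # orthonormal basis of a random k-dimensional subspace S of C^n
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))

    P = Q @ Q.conj().T                         # orthogonal projector onto S
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    print(np.allclose(P @ P, P))               # idempotent: P^2 = P
    print(np.allclose(P, P.conj().T))          # Hermitian
    print(np.allclose(Q.conj().T @ (x - P @ x), 0))   # x - Px is orthogonal to S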

Every subspace admits an orthonormal basis which is obtained by taking any basis and orthonormalizing it. The orthonormalization can be achieved by an algorithm known as the Gram-Schmidt process which we now describe. Given a set of linearly independent vectors $\{x_{1}, x_{2}, \ldots, x_{r}\}$, first normalize the vector $x_{1}$, which means divide it by its 2-norm, to obtain the scaled vector $q_{1}$ of norm unity. Then $x_{2}$ is orthogonalized against the vector $q_{1}$ by subtracting from $x_{2}$ a multiple of $q_{1}$ to make the resulting vector orthogonal to $q_{1}$, i.e.,

$$x_{2} \leftarrow x_{2} - (x_{2}, q_{1})\, q_{1}.$$

The resulting vector is again normalized to yield the second vector $q_{2}$. The $j$-th step of the Gram-Schmidt process consists of orthogonalizing the vector $x_{j}$ against all previous vectors $q_{i}$.

ALGORITHM 1.1 Gram-Schmidt

1. Compute $r_{11} = \|x_{1}\|_{2}$. If $r_{11} = 0$ Stop, else compute $q_{1} = x_{1}/r_{11}$.

2. For $j = 2, \ldots, r$ Do:

3. Compute $r_{ij} = (x_{j}, q_{i})$, for $i = 1, 2, \ldots, j - 1$

4. $\hat{q} = x_{j} - \sum_{i=1}^{j-1} r_{ij} q_{i}$

5. $r_{jj} = \|\hat{q}\|_{2}$,

6. If $r_{jj} = 0$ then Stop, else $q_{j} = \hat{q}/r_{jj}$

7. EndDo

It is easy to prove that the above algorithm will not break down, i.e., all $r$ steps will be completed if and only if the set of vectors $x_{1}, x_{2}, \ldots, x_{r}$ is linearly independent. From lines 4 and 5, it is clear that at every step of the algorithm the following relation holds:

$$x_{j} = \sum_{i=1}^{j} r_{ij} q_{i}.$$

If $X = [x_{1}, x_{2}, \ldots, x_{r}]$, $Q = [q_{1}, q_{2}, \ldots, q_{r}]$, and if $R$ denotes the $r \times r$ upper triangular matrix whose nonzero elements are the $r_{ij}$ defined in the algorithm, then the above relation can be written as

$$X = QR.$$

This is called the QR decomposition of the $n \times r$ matrix $X$. From what was said above, the QR decomposition of a matrix exists whenever the column vectors of $X$ form a linearly independent set of vectors.
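The following NumPy function is a direct transcription of the algorithm above (an illustrative sketch, not the book's code); it returns $Q$ with orthonormal columns and the upper triangular $R$ such that $X = QR$.

    import numpy as np

    def gram_schmidt(X):
        """Classical Gram-Schmidt QR of an n-by-r matrix X with independent columns."""
        n, r = X.shape
        Q = np.zeros((n, r), dtype=X.dtype)
        R = np.zeros((r, r), dtype=X.dtype)
        for j in range(r):
            # lines 3-4: orthogonalize x_j against all previous q_i
            R[:j, j] = Q[:, :j].conj().T @ X[:, j]
            qhat = X[:, j] - Q[:, :j] @ R[:j, j]
            R[j, j] = np.linalg.norm(qhat)     # line 5
            if R[j, j] == 0:
                raise ValueError("columns are linearly dependent")
            Q[:, j] = qhat / R[j, j]           # line 6
        return Q, R

    X = np.random.default_rng(5).standard_normal((6, 4))
    Q, R = gram_schmidt(X)
    print(np.allclose(X, Q @ R), np.allclose(Q.T @ Q, np.eye(4)))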

The above algorithm is the standard Gram-Schmidt process. There are alternative formulations of the algorithm which have better numerical properties. The best known of these is the Modified Gram-Schmidt (MGS) algorithm.

ALGORITHM 1.2 Modified Gram-Schmidt

1. Define $r_{11} = \|x_{1}\|_{2}$. If $r_{11} = 0$ Stop, else $q_{1} = x_{1}/r_{11}$.

2. For $j = 2, \ldots, r$ Do:

3. Define $\hat{q} = x_{j}$

4. For $i = 1, \ldots, j - 1$, Do:

5. $r_{ij} = (\hat{q}, q_{i})$

6. $\hat{q} = \hat{q} - r_{ij} q_{i}$

7. EndDo

8. Compute $r_{jj} = \|\hat{q}\|_{2}$,

9. If $r_{jj} = 0$ then Stop, else $q_{j} = \hat{q}/r_{jj}$

10. EndDo
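A matching sketch of MGS (again an illustrative addition). The only change from the classical version is that each inner product in line 5 is taken against the partially orthogonalized vector rather than against the original $x_{j}$, which gives much better orthogonality of the computed $Q$ in finite precision arithmetic.

    import numpy as np

    def modified_gram_schmidt(X):
        """Modified Gram-Schmidt QR of an n-by-r matrix X with independent columns."""
        n, r = X.shape
        Q = np.zeros((n, r), dtype=X.dtype)
        R = np.zeros((r, r), dtype=X.dtype)
        for j in range(r):
            qhat = X[:, j].copy()
            for i in range(j):                    # lines 4-7: subtract one q_i at a time
                R[i, j] = np.vdot(Q[:, i], qhat)  # (qhat, q_i)
                qhat = qhat - R[i, j] * Q[:, i]
            R[j, j] = np.linalg.norm(qhat)
            if R[j, j] == 0:
                raise ValueError("columns are linearly dependent")
            Q[:, j] = qhat / R[j, j]
        return Q, R

Comparing the size of $Q^{H}Q - I$ produced by the two versions on a nearly dependent set of columns illustrates the difference in numerical behavior.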

Yet another alternative for orthogonalizing a sequence of vectors is the Householder algorithm. This technique uses Householder reflectors, i.e., matrices of the form

$$P = I - 2ww^{T}, \tag{1.15}$$

in which $w$ is a vector of 2-norm unity. Geometrically, the vector $Px$ represents a mirror image of $x$ with respect to the hyperplane $\operatorname{span}\{w\}^{\perp}$.

To describe the Householder orthogonalization process, the problem can be formulated as that of finding a QR factorization of a given $n \times m$ matrix $X$. For any vector $x$, the vector $w$ for the Householder transformation (1.15) is selected in such a way that

$$Px = \alpha e_{1}, \tag{1.16}$$

where $\alpha$ is a scalar. Writing $(I - 2ww^{T})x = \alpha e_{1}$ yields

$$2(w^{T}x)\, w = x - \alpha e_{1}.$$

This shows that the desired $w$ is a multiple of the vector $x - \alpha e_{1}$,

$$w = \pm \frac{x - \alpha e_{1}}{\|x - \alpha e_{1}\|_{2}}.$$

For (1.16) to be satisfied, we must impose the condition

$$2(x - \alpha e_{1})^{T}x = \|x - \alpha e_{1}\|_{2}^{2},$$

which gives $2(\|x\|_{2}^{2} - \alpha\xi_{1}) = \|x\|_{2}^{2} - 2\alpha\xi_{1} + \alpha^{2}$, where $\xi_{1}$ is the first component of the vector $x$. Therefore, it is necessary that

$$\alpha = \pm \|x\|_{2}.$$

In order to avoid that the resulting vector $w$ be small, it is customary to take

$$\alpha = -\operatorname{sign}(\xi_{1})\,\|x\|_{2},$$

which yields

$$w = \frac{x + \operatorname{sign}(\xi_{1})\,\|x\|_{2}\, e_{1}}{\|\,x + \operatorname{sign}(\xi_{1})\,\|x\|_{2}\, e_{1}\,\|_{2}}.$$

Given an $n \times m$ matrix, its first column can be transformed to a multiple of the column $e_{1}$, by premultiplying it by a Householder matrix $P_{1}$,

$$X_{1} \equiv P_{1}X, \qquad X_{1}e_{1} = \alpha e_{1}.$$

Assume, inductively, that the matrix $X$ has been transformed in $k - 1$ successive steps into

the partially upper triangular form

$$X_{k-1} \equiv P_{k-1}\cdots P_{1}X = \begin{pmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1m} \\
       & x_{22} & x_{23} & \cdots & x_{2m} \\
       &        & x_{33} & \cdots & x_{3m} \\
       &        &        & \ddots & \vdots \\
       &        &        &        & x_{km} \\
       &        &        &        & \vdots \\
       &        &        &        & x_{nm}
\end{pmatrix}.$$

This matrix is upper triangular up to column number $k - 1$. To advance by one step, it must be transformed into one which is upper triangular up to the $k$-th column, leaving the previous columns in the same form. To leave the first $k - 1$ columns unchanged, select a $w$ vector which has zeros in positions $1$ through $k - 1$. So the next Householder reflector matrix is defined as

$$P_{k} = I - 2w_{k}w_{k}^{T}, \tag{1.19}$$

in which the vector $w_{k}$ is defined as

$$w_{k} = \frac{z}{\|z\|_{2}}, \tag{1.20}$$

where the components of the vector $z$ are given by

$$z_{i} = \begin{cases} 0 & \text{if } i < k, \\ \beta + x_{ii} & \text{if } i = k, \\ x_{ik} & \text{if } i > k, \end{cases} \tag{1.21}$$

with

$$\beta = \operatorname{sign}(x_{kk}) \left( \sum_{i=k}^{n} x_{ik}^{2} \right)^{1/2}.$$

We note in passing that the premultiplication of a matrix $X$ by a Householder transform requires only a rank-one update since,

$$(I - 2ww^{T})X = X - wv^{T} \quad \text{where} \quad v = 2X^{T}w.$$

Therefore, the Householder matrices need not, and should not, be explicitly formed. In addition, the vectors $w$ need not be explicitly scaled.

Assume now that $m$ Householder transforms have been applied to a certain matrix $X$

of dimension $n \times m$, to reduce it into the upper triangular form,

$$X_{m} \equiv P_{m}P_{m-1}\cdots P_{1}X = \begin{pmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1m} \\
       & x_{22} & x_{23} & \cdots & x_{2m} \\
       &        & \ddots &        & \vdots \\
       &        &        &        & x_{mm} \\
       &        &        &        & 0 \\
       &        &        &        & \vdots \\
       &        &        &        & 0
\end{pmatrix}. \tag{1.22}$$

Recall that our initial goal was to obtain a QR factorization of $X$. We now wish to recover the $Q$ and $R$ matrices from the $P_{k}$'s and the above matrix. If we denote by $P$ the product of the $P_{i}$ on the left-side of (1.22), then (1.22) becomes

$$PX = \begin{pmatrix} R \\ O \end{pmatrix}, \tag{1.23}$$

in which $R$ is an $m \times m$ upper triangular matrix, and $O$ is an $(n - m) \times m$ zero block. Since $P$ is unitary, its inverse is equal to its transpose and, as a result,

$$X = P^{T} \begin{pmatrix} R \\ O \end{pmatrix} = P_{1}P_{2}\cdots P_{m} \begin{pmatrix} R \\ O \end{pmatrix}.$$

If $E_{m}$ is the matrix of size $n \times m$ which consists of the first $m$ columns of the identity matrix, then the above equality translates into

$$X = P^{T}E_{m}R.$$

The matrix $Q = P^{T}E_{m}$ represents the first $m$ columns of $P^{T}$. Since

$$Q^{T}Q = E_{m}^{T}PP^{T}E_{m} = I,$$

$Q$ and $R$ are the matrices sought. In summary,

$$X = QR,$$

in which $R$ is the triangular matrix obtained from the Householder reduction of $X$ (see (1.22) and (1.23)) and

$$Qe_{j} = P_{1}P_{2}\cdots P_{m}e_{j}.$$

ALGORITHM 1.3 Householder Orthogonalization

1. Define $X = [x_{1}, \ldots, x_{m}]$

2. For $k = 1, \ldots, m$ Do:

3. If $k > 1$ compute $r_{k} = P_{k-1}P_{k-2}\cdots P_{1}x_{k}$

4. Compute $w_{k}$ using (1.19), (1.20), (1.21)

5. Compute $r_{k} = P_{k}r_{k}$ with $P_{k} = I - 2w_{k}w_{k}^{T}$

6. Compute $q_{k} = P_{1}P_{2}\cdots P_{k}e_{k}$

7. EndDo

Note that line 6 can be omitted since the $q_{k}$ are not needed in the execution of the next steps. It must be executed only when the matrix $Q$ is needed at the completion of the algorithm. Also, the operation in line 5 consists only of zeroing the components $k + 1, \ldots, n$ and updating the $k$-th component of $r_{k}$. In practice, a work vector can be used for $r_{k}$ and its nonzero components after this step can be saved into an upper triangular matrix. Since the components $1$ through $k - 1$ of the vector $w_{k}$ are zero, the upper triangular matrix $R$ can be saved in those zero locations which would otherwise be unused.
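A compact NumPy sketch in the spirit of the Householder orthogonalization algorithm above (an illustrative addition for the real case): it stores the vectors $w_{k}$ instead of forming the matrices $P_{k}$, and applies each reflector as a rank-one update as recommended above.

    import numpy as np

    def householder_qr(X):
        """Householder reduction of a real n-by-m matrix X (n >= m) into X = Q R."""
        n, m = X.shape
        R = X.astype(float).copy()
        W = []                                    # Householder vectors w_k
        for k in range(m):
            x = R[k:, k]
            s = np.sign(x[0]) if x[0] != 0 else 1.0
            alpha = -s * np.linalg.norm(x)        # alpha = -sign(xi_1) ||x||_2
            z = x.copy()
            z[0] -= alpha                         # z = x - alpha e_1
            nz = np.linalg.norm(z)
            if nz == 0:                           # column already in reduced form
                W.append(np.zeros(n - k))
                continue
            w = z / nz
            R[k:, :] -= 2.0 * np.outer(w, w @ R[k:, :])   # rank-one update: apply P_k
            W.append(w)
        # accumulate Q = P_1 P_2 ... P_m applied to the first m columns of I
        Q = np.eye(n)[:, :m]
        for k in reversed(range(m)):
            w = W[k]
            Q[k:, :] -= 2.0 * np.outer(w, w @ Q[k:, :])
        return Q, np.triu(R[:m, :])

    X = np.random.default_rng(6).standard_normal((7, 4))
    Q, R = householder_qr(X)
    print(np.allclose(X, Q @ R), np.allclose(Q.T @ Q, np.eye(4)))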

1.8 Canonical Forms of Matrices

This section discusses the reduction of square matrices into matrices that have simpler forms, such as diagonal, bidiagonal, or triangular. Reduction means a transformation that preserves the eigenvalues of a matrix.

Two matrices $A$ and $B$ are said to be similar if there is a nonsingular matrix $X$ such that

$$A = XBX^{-1}.$$

The mapping $B \rightarrow A$ is called a similarity transformation.

It is clear that similarity is an equivalence relation. Similarity transformations preserve the eigenvalues of matrices. An eigenvector $u_{B}$ of $B$ is transformed into the eigenvector $u_{A} = Xu_{B}$ of $A$. In effect, a similarity transformation amounts to representing the matrix $B$ in a different basis.

We now introduce some terminology.

An eigenvalue $\lambda$ of $A$ has algebraic multiplicity $\mu$, if it is a root of multiplicity $\mu$ of the characteristic polynomial.

If an eigenvalue is of algebraic multiplicity one, it is said to be simple. A nonsimple eigenvalue is multiple.

The geometric multiplicity $\gamma$ of an eigenvalue $\lambda$ of $A$ is the maximum number of independent eigenvectors associated with it. In other words, the geometric multiplicity $\gamma$ is the dimension of the eigenspace $\operatorname{Null}(A - \lambda I)$.

A matrix is derogatory if the geometric multiplicity of at least one of its eigenvalues is larger than one.

An eigenvalue is semisimple if its algebraic multiplicity is equal to its geometric multiplicity. An eigenvalue that is not semisimple is called defective.

Often, $\lambda_{1}, \lambda_{2}, \ldots, \lambda_{p}$ ($p \leq n$) are used to denote the distinct eigenvalues of $A$. It is easy to show that the characteristic polynomials of two similar matrices are identical; see Exercise 9. Therefore, the eigenvalues of two similar matrices are equal and so are their algebraic multiplicities. Moreover, if $v$ is an eigenvector of $B$, then $Xv$ is an eigenvector

  • 7/27/2019 Saad.Y.-.Iterative.Methods.for.Sparse.Linear.Systems.(2000).pdf

    29/459

    of and, conversely, if is an eigenvector of

    then

    is an eigenvector of . As

    a result the number of independent eigenvectors associated with a given eigenvalue is the

    same for two similar matrices, i.e., their geometric multiplicity is also the same.

    The simplest form in which a matrix can be reduced is undoubtedly the diagonal form.

    Unfortunately, this reduction is not always possible. A matrix that can be reduced to the

    diagonal form is called diagonalizable. The following theorem characterizes such matrices.

A matrix of dimension $n$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors.

A matrix $A$ is diagonalizable if and only if there exists a nonsingular matrix $X$ and a diagonal matrix $D$ such that $A = X D X^{-1}$, or equivalently $A X = X D$, where $D$ is a diagonal matrix. This is equivalent to saying that $n$ linearly independent vectors exist, namely the $n$ column-vectors of $X$, such that $A x_i = d_i x_i$. Each of these column-vectors is an eigenvector of $A$.

A matrix that is diagonalizable has only semisimple eigenvalues. Conversely, if all the eigenvalues of a matrix $A$ are semisimple, then $A$ has $n$ eigenvectors. It can be easily shown that these eigenvectors are linearly independent; see Exercise 2. As a result, we have the following proposition.

    A matrix is diagonalizable if and only if all its eigenvalues are

    semisimple.

Since every simple eigenvalue is semisimple, an immediate corollary of the above result is: When $A$ has $n$ distinct eigenvalues, then it is diagonalizable.
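A quick numerical illustration of this corollary and of the defective case (a sketch; the two $2 \times 2$ matrices below are arbitrary examples, not taken from the text):

```python
import numpy as np

def is_diagonalizable(A, tol=1e-10):
    """Test (numerically) whether the eigenvector matrix returned by eig has full rank."""
    _, X = np.linalg.eig(A)
    return np.linalg.matrix_rank(X, tol) == A.shape[0]

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # distinct eigenvalues 2 and 3: diagonalizable
J = np.array([[2.0, 1.0], [0.0, 2.0]])   # a 2 x 2 Jordan block: defective, not diagonalizable
print(is_diagonalizable(A))              # True
print(is_diagonalizable(J))              # False (up to the tolerance of the rank test)
```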

From the theoretical viewpoint, one of the most important canonical forms of matrices is the well known Jordan form. A full development of the steps leading to the Jordan form is beyond the scope of this book. Only the main theorem is stated. Details, including the proof, can be found in standard books of linear algebra such as [117]. In the following, $m_i$ refers to the algebraic multiplicity of the individual eigenvalue $\lambda_i$ and $l_i$ is the index of the eigenvalue, i.e., the smallest integer for which $\mathrm{Null}(A - \lambda_i I)^{l_i+1} = \mathrm{Null}(A - \lambda_i I)^{l_i}$.

Any matrix $A$ can be reduced to a block diagonal matrix consisting of $p$ diagonal blocks, each associated with a distinct eigenvalue $\lambda_i$. Each of these diagonal blocks has itself a block diagonal structure consisting of $\gamma_i$ sub-blocks, where $\gamma_i$ is the geometric multiplicity of the eigenvalue $\lambda_i$. Each of the sub-blocks, referred to as a Jordan


block, is an upper bidiagonal matrix of size not exceeding $l_i \le m_i$, with the constant $\lambda_i$ on the diagonal and the constant one on the super diagonal.

The $i$-th diagonal block, $i = 1, \ldots, p$, is known as the $i$-th Jordan submatrix (sometimes called a Jordan Box). The Jordan submatrix number $i$ starts in column $j_i \equiv m_1 + m_2 + \cdots + m_{i-1} + 1$. Thus,
$$
X^{-1} A X = J = \begin{pmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_p \end{pmatrix} ,
$$
where each $J_i$ is associated with $\lambda_i$ and is of size $m_i$, the algebraic multiplicity of $\lambda_i$. It has itself the following structure,
$$
J_i = \begin{pmatrix} J_{i1} & & \\ & \ddots & \\ & & J_{i\gamma_i} \end{pmatrix}
\quad \text{with} \quad
J_{ik} = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix} .
$$
Each of the $\gamma_i$ blocks $J_{ik}$ corresponds to a different eigenvector associated with the eigenvalue $\lambda_i$; the size of the largest of these blocks is $l_i$, the index of $\lambda_i$.
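The Jordan structure can be inspected symbolically, for instance with SymPy's `jordan_form`; the $3 \times 3$ matrix below is an arbitrary defective example, not one from the book.

```python
import sympy as sp

# An arbitrary 3 x 3 example: the eigenvalue 2 has algebraic multiplicity 2 but
# geometric multiplicity 1 (one 2 x 2 Jordan block); the eigenvalue 3 is simple.
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])

X, J = A.jordan_form()                    # A = X * J * X**-1 with J block diagonal
print(J)                                  # Jordan blocks: a 2 x 2 block for eigenvalue 2, a 1 x 1 block for 3
print(sp.simplify(X * J * X.inv() - A))   # zero matrix: the similarity holds
```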

Here, it will be shown that any matrix is unitarily similar to an upper triangular matrix. The only result needed to prove the following theorem is that any vector of 2-norm one can be completed by $n-1$ additional vectors to form an orthonormal basis of $\mathbb{C}^n$.

For any square matrix $A$, there exists a unitary matrix $Q$ such that $Q^H A Q$ is upper triangular.

The proof is by induction over the dimension $n$. The result is trivial for $n = 1$. Assume that it is true for $n - 1$ and consider any matrix $A$ of size $n$. The matrix admits at least one eigenvector $u$ that is associated with an eigenvalue $\lambda$. Also assume without loss of generality that $\|u\|_2 = 1$. First, complete the vector $u$ into an orthonormal set, i.e., find an $n \times (n-1)$ matrix $V$ such that the $n \times n$ matrix $U = [u, V]$ is unitary. Then $A U = [\lambda u, A V]$ and hence,
$$
U^H A U = \begin{pmatrix} u^H \\ V^H \end{pmatrix} [\lambda u, A V] = \begin{pmatrix} \lambda & u^H A V \\ 0 & V^H A V \end{pmatrix} . \qquad (1.24)
$$
Now use the induction hypothesis for the $(n-1) \times (n-1)$ matrix $B = V^H A V$: There exists an $(n-1) \times (n-1)$ unitary matrix $Q_1$ such that $Q_1^H B Q_1$ is upper triangular.


Define the $n \times n$ matrix
$$
\hat{Q}_1 = \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix}
$$
and multiply both members of (1.24) by $\hat{Q}_1^H$ from the left and $\hat{Q}_1$ from the right. The resulting matrix is clearly upper triangular and this shows that the result is true for $A$, with $Q = U \hat{Q}_1$ which is a unitary $n \times n$ matrix.

A simpler proof that uses the Jordan canonical form and the QR decomposition is the subject of Exercise 7. Since the matrix $R$ is triangular and similar to $A$, its diagonal elements are equal to the eigenvalues of $A$ ordered in a certain manner. In fact, it is easy to extend the proof of the theorem to show that this factorization can be obtained with any order for the eigenvalues. Despite its simplicity, the above theorem has far-reaching consequences, some of which will be examined in the next section.

It is important to note that for any $k$, the subspace spanned by the first $k$ columns of $Q$ is invariant under $A$. Indeed, the relation $A Q = Q R$ implies that for $1 \le j \le k$, we have
$$A q_j = \sum_{i=1}^{j} r_{ij} q_i .$$
If we let $Q_k = [q_1, \ldots, q_k]$ and if $R_k$ is the principal leading submatrix of dimension $k$ of $R$, the above relation can be rewritten as
$$A Q_k = Q_k R_k ,$$
which is known as the partial Schur decomposition of $A$. The simplest case of this decomposition is when $k = 1$, in which case $q_1$ is an eigenvector. The vectors $q_i$ are usually called Schur vectors. Schur vectors are not unique and depend, in particular, on the order chosen for the eigenvalues.

A slight variation on the Schur canonical form is the quasi-Schur form, also called the real Schur form. Here, diagonal blocks of size $2 \times 2$ are allowed in the upper triangular matrix $R$. The reason for this is to avoid complex arithmetic when the original matrix is real. A $2 \times 2$ block is associated with each complex conjugate pair of eigenvalues of the matrix.

    Consider the matrix

    The matrix has the pair of complex conjugate eigenvalues


    and the real eigenvalue

    . The standard (complex) Schur form is given by the pair

    of matrices

    and

    It is possible to avoid complex arithmetic by using the quasi-Schur form which consists of

    the pair of matrices

    and

We conclude this section by pointing out that the Schur and the quasi-Schur forms of a given matrix are in no way unique. In addition to the dependence on the ordering of the eigenvalues, any column of $Q$ can be multiplied by a complex sign $e^{i\theta}$ and a new corresponding $R$ can be found. For the quasi-Schur form, there are infinitely many ways to select the $2 \times 2$ blocks, corresponding to applying arbitrary rotations to the columns of $Q$ associated with these blocks.
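In floating-point practice the (complex) Schur form and the real quasi-Schur form are computed, e.g., by `scipy.linalg.schur`; the sketch below uses an arbitrary real test matrix with one complex conjugate pair of eigenvalues, not the example matrix of this section.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, -2.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 3.0]])          # eigenvalues: +/- i*sqrt(2) and 3

T, Z = schur(A, output='complex')         # complex Schur form: T upper triangular, A = Z T Z^H
R, U = schur(A, output='real')            # quasi-Schur form: one 2x2 block for the conjugate pair

print(np.allclose(Z @ T @ Z.conj().T, A))
print(np.allclose(U @ R @ U.T, A))
print(np.round(np.diag(T), 3))            # eigenvalues of A appear on the diagonal of T
```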

The analysis of many numerical techniques is based on understanding the behavior of the successive powers $A^k$ of a given matrix $A$. In this regard, the following theorem plays a fundamental role in numerical linear algebra, more particularly in the analysis of iterative methods.

The sequence $A^k$, $k = 0, 1, \ldots$, converges to zero if and only if $\rho(A) < 1$.

To prove the necessary condition, assume that $A^k \to 0$ and consider $u_1$ a unit eigenvector associated with an eigenvalue $\lambda_1$ of maximum modulus. We have
$$A^k u_1 = \lambda_1^k u_1 ,$$
which implies, by taking the 2-norms of both sides,
$$\| A^k u_1 \|_2 = |\lambda_1|^k = \rho(A)^k .$$
Since $A^k u_1 \to 0$, this forces $\rho(A) < 1$.


For any matrix norm $\|\cdot\|$, we have
$$\lim_{k \to \infty} \| A^k \|^{1/k} = \rho(A) .$$

    The proof is a direct application of the Jordan canonical form and is the subject

    of Exercise 10.
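Both statements are easy to observe numerically. The sketch below uses an arbitrary random test matrix, rescaled so that $\rho(A) = 0.9$, and shows the powers $A^k$ decaying while the quantities $\|A^k\|^{1/k}$ approach $\rho(A)$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A *= 0.9 / np.abs(np.linalg.eigvals(A)).max()   # rescale so that rho(A) = 0.9 < 1
rho = np.abs(np.linalg.eigvals(A)).max()

Ak = np.eye(5)
for k in range(1, 61):
    Ak = Ak @ A
    if k % 20 == 0:
        norm_k = np.linalg.norm(Ak, 2)
        # ||A^k|| -> 0 and ||A^k||^(1/k) -> rho(A) as k grows
        print(k, norm_k, norm_k ** (1.0 / k), rho)
```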

This section examines specific properties of normal matrices and Hermitian matrices, including some optimality properties related to their spectra. The most common normal matrices that arise in practice are Hermitian or skew-Hermitian.

By definition, a matrix is said to be normal if it commutes with its transpose conjugate, i.e., if it satisfies the relation
$$A^H A = A A^H . \qquad (1.25)$$

    An immediate property of normal matrices is stated in the following lemma.

    If a normal matrix is triangular, then it is a diagonal matrix.

Assume, for example, that $A$ is upper triangular and normal. Compare the first diagonal element of the left-hand side matrix of (1.25) with the corresponding element of the matrix on the right-hand side. We obtain that
$$|a_{11}|^2 = \sum_{j=1}^{n} |a_{1j}|^2 ,$$
which shows that the elements of the first row are zeros except for the diagonal one. The same argument can now be used for the second row, the third row, and so on to the last row, to show that $a_{ij} = 0$ for $i \ne j$.

    A consequence of this lemma is the following important result.

    A matrix is normal if and only if it is unitarily similar to a diagonal

    matrix.

It is straightforward to verify that a matrix which is unitarily similar to a diagonal matrix is normal. We now prove that any normal matrix is unitarily similar to a diagonal matrix.


    A normal matrix whose eigenvalues are real is Hermitian.

As will be seen shortly, the converse is also true, i.e., a Hermitian matrix has real eigenvalues.

An eigenvalue $\lambda$ of any matrix satisfies the relation
$$\lambda = \frac{(A u, u)}{(u, u)} ,$$
where $u$ is an associated eigenvector. Generally, one might consider the complex scalars
$$\mu(x) = \frac{(A x, x)}{(x, x)} \qquad (1.28)$$
defined for any nonzero vector in $\mathbb{C}^n$. These ratios are known as Rayleigh quotients and are important both for theoretical and practical purposes. The set of all possible Rayleigh quotients as $x$ runs over $\mathbb{C}^n$ is called the field of values of $A$. This set is clearly bounded since each $|\mu(x)|$ is bounded by the 2-norm of $A$, i.e.,
$$|\mu(x)| \le \|A\|_2 \quad \text{for all } x .$$

If a matrix is normal, then any vector $x$ in $\mathbb{C}^n$ can be expressed as
$$x = \sum_{i=1}^{n} \xi_i q_i ,$$
where the vectors $q_i$ form an orthogonal basis of eigenvectors, and the expression for $\mu(x)$ becomes
$$\mu(x) = \frac{(A x, x)}{(x, x)} = \frac{\sum_{i=1}^{n} \lambda_i |\xi_i|^2}{\sum_{i=1}^{n} |\xi_i|^2} \equiv \sum_{i=1}^{n} \beta_i \lambda_i , \qquad (1.29)$$
where
$$0 \le \beta_i = \frac{|\xi_i|^2}{\sum_{j=1}^{n} |\xi_j|^2} \le 1 \quad \text{and} \quad \sum_{i=1}^{n} \beta_i = 1 .$$
From a well known characterization of convex hulls established by Hausdorff (Hausdorff's convex hull theorem), this means that the set of all possible Rayleigh quotients as $x$ runs over all of $\mathbb{C}^n$ is equal to the convex hull of the $\lambda_i$'s. This leads to the following theorem which is stated without proof.

    The field of values of a normal matrix is equal to the convex hull of its

    spectrum.

    The next question is whether or not this is also true for nonnormal matrices and the

    answer is no: The convex hull of the eigenvalues and the field of values of a nonnormal

    matrix are different in general. As a generic example, one can take any nonsymmetric real

    matrix which has real eigenvalues only. In this case, the convex hull of the spectrum is

    a real interval but its field of values will contain imaginary values. See Exercise 12 for

    another example. It has been shown (Hausdorff) that the field of values of a matrix is a

    convex set. Since the eigenvalues are members of the field of values, their convex hull is

    contained in the field of values. This is summarized in the following proposition.


    The field of values of an arbitrary matrix is a convex set which

    contains the convex hull of its spectrum. It is equal to the convex hull of the spectrum

    when the matrix is normal.
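The following sketch samples Rayleigh quotients of an arbitrary nonsymmetric real matrix with real eigenvalues; the sampled values have nonzero imaginary parts and real parts outside the convex hull of the spectrum, illustrating the strict inclusion for nonnormal matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 5.0],
              [0.0, 2.0]])                 # nonsymmetric, with real eigenvalues 1 and 2

# Sample Rayleigh quotients mu(x) = (Ax, x) / (x, x) for random complex vectors x.
X = rng.standard_normal((2, 1000)) + 1j * rng.standard_normal((2, 1000))
mu = np.einsum('ij,ij->j', np.conj(X), A @ X) / np.einsum('ij,ij->j', np.conj(X), X)

print(np.linalg.eigvals(A))                # [1. 2.]: the convex hull of the spectrum is [1, 2]
print(mu.real.min(), mu.real.max())        # real parts typically extend beyond [1, 2]
print(np.abs(mu.imag).max())               # clearly nonzero imaginary parts
```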

    A first result on Hermitian matrices is the following.

The eigenvalues of a Hermitian matrix are real, i.e., $\sigma(A) \subset \mathbb{R}$.

Let $\lambda$ be an eigenvalue of $A$ and $u$ an associated eigenvector of 2-norm unity. Then
$$\lambda = (A u, u) = (u, A u) = \overline{(A u, u)} = \bar{\lambda} ,$$
which is the stated result.

    It is not difficult to see that if, in addition, the matrix is real, then the eigenvectors can be

    chosen to be real; see Exercise 21. Since a Hermitian matrix is normal, the following is a

    consequence of Theorem 1.7.

    Any Hermitian matrix is unitarily similar to a real diagonal matrix.

In particular a Hermitian matrix admits a set of orthonormal eigenvectors that form a basis of $\mathbb{C}^n$.

In the proof of Theorem 1.8 we used the fact that the inner products $(A u, u)$ are real. Generally, it is clear that any Hermitian matrix is such that $(A x, x)$ is real for any vector $x \in \mathbb{C}^n$. It turns out that the converse is also true, i.e., it can be shown that if $(A x, x)$ is real for all vectors $x$ in $\mathbb{C}^n$, then the matrix $A$ is Hermitian; see Exercise 15.

Eigenvalues of Hermitian matrices can be characterized by optimality properties of the Rayleigh quotients (1.28). The best known of these is the min-max principle. We now label all the eigenvalues of $A$ in descending order:
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n .$$
Here, the eigenvalues are not necessarily distinct and they are repeated, each according to its multiplicity. In the following theorem, known as the Min-Max Theorem, $S$ represents a generic subspace of $\mathbb{C}^n$.

The eigenvalues of a Hermitian matrix $A$ are characterized by the relation
$$\lambda_k = \min_{S, \ \dim(S) = n-k+1} \ \max_{x \in S, \, x \ne 0} \frac{(A x, x)}{(x, x)} . \qquad (1.30)$$


Let $\{q_i\}_{i=1,\ldots,n}$ be an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$ associated with $\lambda_1, \ldots, \lambda_n$ respectively. Let $S_k$ be the subspace spanned by the first $k$ of these vectors and denote by $\mu(S)$ the maximum of $(A x, x)/(x, x)$ over all nonzero vectors of a subspace $S$. Since the dimension of $S_k$ is $k$, a well known theorem of linear algebra shows that its intersection with any subspace $S$ of dimension $n-k+1$ is not reduced to $\{0\}$, i.e., there is a vector $x$ in $S \cap S_k$. For this $x = \sum_{i=1}^{k} \xi_i q_i$, we have
$$\frac{(A x, x)}{(x, x)} = \frac{\sum_{i=1}^{k} \lambda_i |\xi_i|^2}{\sum_{i=1}^{k} |\xi_i|^2} \ge \lambda_k ,$$
so that $\mu(S) \ge \lambda_k$.

Consider, on the other hand, the particular subspace $S_0$ of dimension $n-k+1$ which is spanned by $q_k, \ldots, q_n$. For each vector $x$ in this subspace, we have
$$\frac{(A x, x)}{(x, x)} = \frac{\sum_{i=k}^{n} \lambda_i |\xi_i|^2}{\sum_{i=k}^{n} |\xi_i|^2} \le \lambda_k ,$$
so that $\mu(S_0) \le \lambda_k$. In other words, as $S$ runs over all the $(n-k+1)$-dimensional subspaces, $\mu(S)$ is always $\ge \lambda_k$ and there is at least one subspace $S_0$ for which $\mu(S_0) \le \lambda_k$. This shows the desired result.

The above result is often called the Courant-Fischer min-max principle or theorem. As a particular case, the largest eigenvalue of $A$ satisfies
$$\lambda_1 = \max_{x \ne 0} \frac{(A x, x)}{(x, x)} . \qquad (1.31)$$
Actually, there are four different ways of rewriting the above characterization. The second formulation is
$$\lambda_k = \max_{S, \ \dim(S) = k} \ \min_{x \in S, \, x \ne 0} \frac{(A x, x)}{(x, x)} , \qquad (1.32)$$
and the two other ones can be obtained from (1.30) and (1.32) by simply relabeling the eigenvalues increasingly instead of decreasingly. Thus, with our labeling of the eigenvalues in descending order, (1.32) tells us that the smallest eigenvalue satisfies
$$\lambda_n = \min_{x \ne 0} \frac{(A x, x)}{(x, x)} , \qquad (1.33)$$
with $\lambda_n$ replaced by $\lambda_1$ if the eigenvalues are relabeled increasingly.

In order for all the eigenvalues of a Hermitian matrix to be positive, it is necessary and sufficient that
$$(A x, x) > 0 \quad \text{for all } x \in \mathbb{C}^n, \ x \ne 0 .$$
Such a matrix is called positive definite. A matrix which satisfies $(A x, x) \ge 0$ for any $x$ is said to be positive semidefinite. In particular, the matrix $A^H A$ is semipositive definite for any rectangular matrix $A$, since
$$(A^H A x, x) = (A x, A x) \ge 0 \quad \text{for all } x .$$
Similarly, $A A^H$ is also a Hermitian semipositive definite matrix. The square roots of the eigenvalues of $A^H A$ for a general rectangular matrix $A$ are called the singular values of $A$


and are denoted by $\sigma_i$. In Section 1.5, we have stated without proof that the 2-norm of any matrix $A$ is equal to the largest singular value $\sigma_1$ of $A$. This is now an obvious fact, because
$$\|A\|_2^2 = \max_{x \ne 0} \frac{\|A x\|_2^2}{\|x\|_2^2} = \max_{x \ne 0} \frac{(A^H A x, x)}{(x, x)} = \sigma_1^2 ,$$
which results from (1.31).
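A quick NumPy check of this identity on an arbitrary rectangular matrix (a sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 4))

two_norm = np.linalg.norm(A, 2)                    # matrix 2-norm
sigma_1 = np.linalg.svd(A, compute_uv=False)[0]    # largest singular value
lam_max = np.linalg.eigvalsh(A.T @ A).max()        # largest eigenvalue of A^H A

print(np.isclose(two_norm, sigma_1))
print(np.isclose(two_norm**2, lam_max))
```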

Another characterization of eigenvalues, known as the Courant characterization, is stated in the next theorem. In contrast with the min-max theorem, this property is recursive in nature.

The eigenvalue $\lambda_k$ and the corresponding eigenvector $q_k$ of a Hermitian matrix are such that
$$\lambda_1 = \frac{(A q_1, q_1)}{(q_1, q_1)} = \max_{x \in \mathbb{C}^n, \, x \ne 0} \frac{(A x, x)}{(x, x)}$$
and for $k > 1$,
$$\lambda_k = \frac{(A q_k, q_k)}{(q_k, q_k)} = \max_{x \ne 0, \ q_1^H x = \cdots = q_{k-1}^H x = 0} \frac{(A x, x)}{(x, x)} .$$

In other words, the maximum of the Rayleigh quotient over a subspace that is orthogonal to the first $k-1$ eigenvectors is equal to $\lambda_k$ and is achieved for the eigenvector $q_k$ associated with $\lambda_k$. The proof follows easily from the expansion (1.29) of the Rayleigh quotient.

Nonnegative matrices play a crucial role in the theory of matrices. They are important in the study of convergence of iterative methods and arise in many applications including economics, queuing theory, and chemical engineering.

A nonnegative matrix is simply a matrix whose entries are nonnegative. More generally, a partial order relation can be defined on the set of matrices.

Let $A$ and $B$ be two $n \times m$ matrices. We write $A \le B$ if, by definition, $a_{ij} \le b_{ij}$ for $1 \le i \le n$, $1 \le j \le m$. If $O$ denotes the $n \times m$ zero matrix, then $A$ is nonnegative if $A \ge O$, and positive if $A > O$. Similar definitions hold in which positive is replaced by negative.

The binary relation $\le$ imposes only a partial order on the set of matrices, since two arbitrary matrices in $\mathbb{R}^{n \times m}$ are not necessarily comparable by this relation. For the remainder of this section,


    we now assume that only square matrices are involved. The next proposition lists a number

    of rather trivial properties regarding the partial order relation just defined.

The following properties hold.
1. The relation $\le$ for matrices is reflexive ($A \le A$), antisymmetric (if $A \le B$ and $B \le A$, then $A = B$), and transitive (if $A \le B$ and $B \le C$, then $A \le C$).
2. If $A$ and $B$ are nonnegative, then so is their product $A B$ and their sum $A + B$.
3. If $A$ is nonnegative, then so is $A^k$.
4. If $A \le B$, then $A^T \le B^T$.
5. If $O \le A \le B$, then $\|A\|_1 \le \|B\|_1$ and similarly $\|A\|_\infty \le \|B\|_\infty$.

The proof of these properties is left as Exercise 23.

A matrix is said to be reducible if there is a permutation matrix $P$ such that $P A P^T$ is block upper triangular. Otherwise, it is irreducible. An important result concerning nonnegative matrices is the following theorem known as the Perron-Frobenius theorem.

Let $A$ be a real $n \times n$ nonnegative irreducible matrix. Then $\lambda \equiv \rho(A)$, the spectral radius of $A$, is a simple eigenvalue of $A$. Moreover, there exists an eigenvector $u$ with positive elements associated with this eigenvalue.
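The theorem is easy to observe numerically; the sketch below uses an arbitrary nonnegative irreducible matrix (its directed graph is strongly connected).

```python
import numpy as np

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])            # nonnegative and irreducible

lam, V = np.linalg.eig(A)
k = np.argmax(np.abs(lam))                 # eigenvalue of largest modulus
rho = np.abs(lam).max()

print(np.isclose(lam[k].real, rho), np.isclose(lam[k].imag, 0.0))   # rho(A) is itself an eigenvalue
u = V[:, k].real
u *= np.sign(u[0])                         # normalize the sign of the eigenvector
print(np.all(u > 0))                       # its entries are positive
```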

A relaxed version of this theorem allows the matrix to be reducible but the conclusion is somewhat weakened in the sense that the elements of the eigenvectors are only guaranteed to be nonnegative.

Next, a useful property is established.

Let $A$, $B$, $C$ be nonnegative matrices, with $A \le B$. Then
$$A C \le B C \quad \text{and} \quad C A \le C B .$$

Consider the first inequality only, since the proof for the second is identical. The result that is claimed translates into
$$\sum_{k=1}^{n} a_{ik} c_{kj} \le \sum_{k=1}^{n} b_{ik} c_{kj} , \quad 1 \le i, j \le n ,$$
which is clearly true by the assumptions.

    A consequence of the proposition is the following corollary.

Let $A$ and $B$ be two nonnegative matrices, with $A \le B$. Then
$$A^k \le B^k \quad \text{for all } k \ge 0 . \qquad (1.35)$$

The proof is by induction. The inequality is clearly true for $k = 0$. Assume that (1.35) is true for $k$. According to the previous proposition, multiplying (1.35) from the left by $A$ results in
$$A^{k+1} \le A B^k .$$


A matrix $A$ is said to be an $M$-matrix if it satisfies the following four properties:
1. $a_{ii} > 0$ for $i = 1, \ldots, n$.
2. $a_{ij} \le 0$ for $i \ne j$, $i, j = 1, \ldots, n$.
3. $A$ is nonsingular.
4. $A^{-1} \ge 0$.

In reality, the four conditions in the above definition are somewhat redundant and equivalent conditions that are more rigorous will be given later. Let $A$ be any matrix which satisfies properties (1) and (2) in the above definition and let $D$ be the diagonal of $A$. Since $D > 0$,
$$A = D - (D - A) = D \left( I - (I - D^{-1} A) \right) .$$
Now define
$$B \equiv I - D^{-1} A .$$
Using the previous theorem, $I - B$ is nonsingular and $(I - B)^{-1} \ge 0$ if and only if $\rho(B) < 1$. It is now easy to see that conditions (3) and (4) of Definition 1.4 can be replaced by the condition $\rho(I - D^{-1} A) < 1$.

Let a matrix $A$ be given such that
1. $a_{ii} > 0$ for $i = 1, \ldots, n$.
2. $a_{ij} \le 0$ for $i \ne j$, $i, j = 1, \ldots, n$.

Then $A$ is an $M$-matrix if and only if $\rho(B) < 1$, where $B = I - D^{-1} A$.

From the above argument, an immediate application of Theorem 1.15 shows that properties (3) and (4) of the above definition are equivalent to $\rho(B) < 1$, where $B = I - D^{-1} A$ and $D = \mathrm{diag}(A)$. In addition, $A$ is nonsingular iff $I - B$ is and $A^{-1}$ is nonnegative iff $(I - B)^{-1}$ is.
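This equivalent characterization is convenient to check numerically. The sketch below (the helper `is_m_matrix` is ours; the test matrix is the standard tridiagonal discrete Laplacean) verifies the sign pattern and the condition $\rho(I - D^{-1}A) < 1$.

```python
import numpy as np

def is_m_matrix(A):
    """Check: positive diagonal, nonpositive off-diagonal, and rho(I - D^{-1} A) < 1."""
    n = A.shape[0]
    d = np.diag(A)
    off = A - np.diag(d)
    if np.any(d <= 0) or np.any(off > 0):
        return False
    B = np.eye(n) - A / d[:, None]         # B = I - D^{-1} A (row i divided by a_ii)
    return np.abs(np.linalg.eigvals(B)).max() < 1.0

# The tridiagonal discrete Laplacean is a classical M-matrix.
T = 2.0 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
print(is_m_matrix(T))                      # True
print(np.all(np.linalg.inv(T) >= 0))       # and its inverse is nonnegative
```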

The next theorem shows that the condition (1) in Definition 1.4 is implied by the other three.

Let a matrix $A$ be given such that
1. $a_{ij} \le 0$ for $i \ne j$, $i, j = 1, \ldots, n$.
2. $A$ is nonsingular.
3. $A^{-1} \ge 0$.

Then
1. $a_{ii} > 0$ for $i = 1, \ldots, n$, i.e., $A$ is an $M$-matrix.
2. $\rho(B) < 1$, where $B = I - D^{-1} A$.


It must be emphasized that this definition is only useful when formulated entirely for real variables. Indeed, if $x$ were not restricted to be real, then assuming that $(A x, x)$ is real for all complex $x$ would imply that $A$ is Hermitian; see Exercise 15. If, in addition to satisfying (1.41), $A$ is symmetric (real), then $A$ is said to be Symmetric Positive Definite (SPD). Similarly, if $A$ is Hermitian, then $A$ is said to be Hermitian Positive Definite (HPD).

Some properties of HPD matrices were seen in Section 1.9, in particular with regards to their eigenvalues. Now the more general case where $A$ is non-Hermitian and positive definite is considered.

We begin with the observation that any square matrix (real or complex) can be decomposed as
$$A = H + i S , \qquad (1.42)$$
in which
$$H = \frac{1}{2} (A + A^H) , \qquad S = \frac{1}{2i} (A - A^H) .$$
Note that both $H$ and $S$ are Hermitian while the matrix $i S$ in the decomposition (1.42) is skew-Hermitian. The matrix $H$ in the decomposition is called the Hermitian part of $A$, while the matrix $i S$ is the skew-Hermitian part of $A$. The above decomposition is the analogue of the decomposition of a complex number $z$ into $z = x + i y$,
$$x = \mathrm{Re}(z) = \frac{1}{2} (z + \bar{z}) , \qquad y = \mathrm{Im}(z) = \frac{1}{2i} (z - \bar{z}) .$$
When $A$ is real and $x$ is a real vector then $(A x, x)$ is real and, as a result, the decomposition (1.42) immediately gives the equality
$$(A x, x) = (H x, x) . \qquad (1.45)$$
This results in the following theorem.

Let $A$ be a real positive definite matrix. Then $A$ is nonsingular. In addition, there exists a scalar $\alpha > 0$ such that
$$(A x, x) \ge \alpha \|x\|_2^2 \qquad (1.46)$$
for any real vector $x$.

The first statement is an immediate consequence of the definition of positive definiteness. Indeed, if $A$ were singular, then there would be a nonzero vector $x$ such that $A x = 0$ and as a result $(A x, x) = 0$ for this vector, which would contradict (1.41). We now prove the second part of the theorem. From (1.45) and the fact that $A$ is positive definite, we conclude that $H$ is HPD. Hence, from (1.33) based on the min-max theorem, we get
$$\min_{x \ne 0} \frac{(A x, x)}{(x, x)} = \min_{x \ne 0} \frac{(H x, x)}{(x, x)} \ge \lambda_{\min}(H) > 0 .$$
Taking $\alpha \equiv \lambda_{\min}(H)$ yields the desired inequality (1.46).
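The sketch below (an arbitrary nonsymmetric positive definite example) computes the Hermitian part $H$ and uses its smallest eigenvalue as the constant $\alpha$ of the theorem.

```python
import numpy as np

A = np.array([[ 3.0, 1.0],
              [-1.0, 2.0]])                # nonsymmetric; (Ax, x) = 3*x1**2 + 2*x2**2 > 0

H = 0.5 * (A + A.T)                        # Hermitian (here symmetric) part of A
alpha = np.linalg.eigvalsh(H).min()        # alpha = lambda_min(H)
print(alpha)                               # 2.0 > 0, so A is positive definite

rng = np.random.default_rng(3)
for _ in range(5):
    x = rng.standard_normal(2)
    # (Ax, x) = (Hx, x) >= alpha * ||x||_2^2 for every real x
    print(x @ A @ x >= alpha * (x @ x) - 1e-12)
```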

A simple yet important result which locates the eigenvalues of $A$ in terms of the spectra


Projection operators or projectors play an important role in numerical linear algebra, particularly in iterative methods for solving various matrix problems. This section introduces these operators from a purely algebraic point of view and gives a few of their important properties.

A projector $P$ is any linear mapping from $\mathbb{C}^n$ to itself which is idempotent, i.e., such that
$$P^2 = P .$$
A few simple properties follow from this definition. First, if $P$ is a projector, then so is $I - P$, and the following relation holds,
$$\mathrm{Null}(P) = \mathrm{Ran}(I - P) .$$

In addition, the two subspaces $\mathrm{Null}(P)$ and $\mathrm{Ran}(P)$ intersect only at the element zero. Indeed, if a vector $x$ belongs to $\mathrm{Ran}(P)$, then $P x = x$, by the idempotence property. If it is also in $\mathrm{Null}(P)$, then $P x = 0$. Hence, $x = P x = 0$ which proves the result. Moreover, every element of $\mathbb{C}^n$ can be written as $x = P x + (I - P) x$. Therefore, the space $\mathbb{C}^n$ can be decomposed as the direct sum
$$\mathbb{C}^n = \mathrm{Null}(P) \oplus \mathrm{Ran}(P) .$$

Conversely, every pair of subspaces $M$ and $S$ which forms a direct sum of $\mathbb{C}^n$ defines a unique projector such that $\mathrm{Ran}(P) = M$ and $\mathrm{Null}(P) = S$. This associated projector $P$ maps an element $x$ of $\mathbb{C}^n$ into the component $x_1$, where $x_1$ is the $M$-component in the unique decomposition $x = x_1 + x_2$ associated with the direct sum.

In fact, this association is unique, that is, an arbitrary projector $P$ can be entirely determined by two subspaces: (1) the range $M$ of $P$, and (2) its null space $S$, which is also the range of $I - P$. For any $x$, the vector $P x$ satisfies the conditions,
$$P x \in M , \qquad x - P x \in S .$$
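As a small illustration of these two conditions (a sketch with arbitrary bases; the helper `oblique_projector` is ours), the projector onto $M$ along $S$ can be formed from a basis $V$ of $M$ and a basis $Y$ of $S$ by mapping each $x$ to its $M$-component in the decomposition $x = x_1 + x_2$.

```python
import numpy as np

def oblique_projector(V, Y):
    """Projector onto M = Ran(V) along S = Ran(Y), assuming C^n = M (+) S."""
    T = np.hstack([V, Y])                  # columns form a basis of C^n adapted to the direct sum
    return np.hstack([V, np.zeros_like(Y)]) @ np.linalg.inv(T)   # x -> its M-component x_1

rng = np.random.default_rng(4)
V = rng.standard_normal((5, 2))            # basis of M
Y = rng.standard_normal((5, 3))            # basis of S
P = oblique_projector(V, Y)

x = rng.standard_normal(5)
print(np.allclose(P @ P, P))               # idempotent: P is a projector
print(np.allclose(P @ (P @ x), P @ x))     # P x lies in M = Ran(P)
print(np.allclose(P @ (x - P @ x), 0.0))   # x - P x lies in S = Null(P)
```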

The linear mapping $P$ is said to project $x$ onto $M$ and along or parallel to the subspace $S$. If $P$ is of rank $m$, then the range of $I - P$ is of dimension $n - m$. Therefore, it is natural to define $S$ through its orthogonal complement $L = S^{\perp}$, which has dimension $m$. The above conditions that define $u = P x$ for any $x$ become
$$u \in M , \qquad (1.51)$$
$$x - u \perp L . \qquad (1.52)$$
These equations define a projector $P$ onto $M$ and orthogonal to the subspace $L$. The first statement, (1.51), establishes the $m$ degrees of freedom, while the second, (1.52), gives
