General Properties of Sparse Matrices Sparse Matrices and Graphs Reordering Gaussian Elimination for Sparse Matrices
4.4. Gaussian Elimination for Sparse Matrices
4.4.1. Algebraic Pivoting in GE
• Numerical pivoting: for eliminating elements in column k, choose the large(st) entry in column/row/block k and permute this element onto the diagonal position.
• Disadvantage: may lead to large fill-in in the sparsity pattern of A.
• Idea: choose the pivot element according to minimum fill-in! Note that for well-conditioned A = A^T > 0 no numerical pivoting is necessary.
• Heuristic: choose the pivot element according to its degree in the graph → minimum degree reordering
Parallel Numerics, WT 2013/2014 4 Sparse Matrices
page 37 of 49
Special Case A = A^T

• For elimination in the k-th column of A:
  – Define r_m := number of nonzero entries in row m
  – Choose the pivot index i such that r_i = min_m r_m
  – Do the pivot permutation and the elimination
  – Proceed to the next column
• r_m is the number of nonzeros in the m-th row = the number of vertices directly connected with vertex m. Hence, the pivot vertex is the vertex of minimum degree in G(A_k).
• Heuristics: few entries in the m-th row/column → little fill-in, because
  – there are only few elements to eliminate
  – the pivot row used in the elimination is very sparse
→ Multiple minimum degree reordering
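The row-count rule above can be sketched as follows. This is an illustrative NumPy sketch, not from the slides: `min_degree_pivot` is a hypothetical helper working on a dense array, whereas real codes keep the elimination graph and update degrees incrementally.

```python
import numpy as np

def min_degree_pivot(A, eliminated):
    """Among not-yet-eliminated rows of symmetric A, return the index
    whose row has the fewest nonzeros in the remaining submatrix,
    i.e. the vertex of minimum degree in the elimination graph."""
    n = A.shape[0]
    active = [m for m in range(n) if m not in eliminated]
    # r_m = number of nonzeros in row m of the active submatrix
    r = {m: sum(1 for p in active if A[m, p] != 0) for m in active}
    return min(active, key=lambda m: r[m])

# Arrowhead pattern: row 0 is dense (degree n), rows 1..n-1 have degree 2.
n = 5
A = np.eye(n)
A[0, :] = 1.0
A[:, 0] = 1.0

pivot = min_degree_pivot(A, eliminated=set())
# Rows 1..4 (two nonzeros each) beat row 0 (five nonzeros).
```

Choosing any of the degree-2 vertices first avoids the dense fill-in that pivoting on row 0 would create.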
Multiple Minimum Degree Reordering (MMD)

• Often, many vertices have the minimum degree. Which one should be chosen?
• Choose an independent set: vertices with the same minimum degree that are not neighbors of each other.
  – Eliminating any of them has no effect on the degrees of the others, hence we can eliminate them in the Gaussian Elimination process in parallel (compare multifrontal methods).
  – It also reduces the runtime because the outer loop runs fewer than n times.
  – It often leads to fewer nonzeros.
• Approximate Minimum Degree reordering (AMD) uses approximations of the degrees.
Generalization to Nonsymmetric Problems: Markowitz

• Define r_m := nnz in row m; c_p := nnz in column p
• Choose the pivot element with index pair (i, j) such that

  (r_i − 1)(c_j − 1) = min_{m,p} (r_m − 1)(c_p − 1)

• Heuristics:
  – small c_j leads to few elimination steps
  – small r_i leads to a sparse pivot row used in the elimination
• Special case r_i = 1 or c_j = 1: no fill-in.
• Include numerical pivoting by applying algebraic pivoting only to indices whose absolute value is not too small, e.g.,

  |a_{i,j}| ≥ 0.1 · max_{r,s} |a_{r,s}|
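The threshold Markowitz rule can be sketched as below. An illustrative NumPy sketch, assuming a dense array scan; `markowitz_pivot` is a hypothetical helper name, and production solvers use graph structures rather than scanning all entries.

```python
import numpy as np

def markowitz_pivot(A, tau=0.1):
    """Pick pivot (i, j) minimizing (r_i - 1)(c_j - 1) among entries
    passing the numerical threshold |a_ij| >= tau * max |a_rs|."""
    r = np.count_nonzero(A, axis=1)   # r_m = nnz in row m
    c = np.count_nonzero(A, axis=0)   # c_p = nnz in column p
    thresh = tau * np.abs(A).max()
    best, best_cost = None, None
    for i, j in zip(*np.nonzero(A)):
        if abs(A[i, j]) < thresh:
            continue                  # numerical safeguard
        cost = (r[i] - 1) * (c[j] - 1)
        if best_cost is None or cost < best_cost:
            best, best_cost = (i, j), cost
    return best, best_cost

A = np.array([[4., 1., 1.],
              [1., 3., 0.],
              [0., 2., 5.]])
(pi, pj), cost = markowitz_pivot(A)
```

Here row 1 has r_1 = 2 and column 0 has c_0 = 2, so entry (1, 0) attains the minimal Markowitz cost (2 − 1)(2 − 1) = 1.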
Comparison

Example: arrowhead matrix (dense first row and first column, diagonal otherwise):

A =
  [ ∗ ∗ ∗ . . . ∗ ]
  [ ∗ ∗           ]
  [ ∗   ∗         ]
  [ ∗     . . .   ]
  [ ∗           ∗ ]

First elimination step leads to a dense matrix!

  [ ∗ ∗ ∗ . . . ∗ ]
  [ 0 ∗ ∗ . . . ∗ ]
  [ 0 ∗ ∗ . . . ∗ ]
  [ 0 ∗ ∗ . . . ∗ ]
  [ 0 ∗ ∗ . . . ∗ ]
Comparison (cont.)

Cuthill-McKee:

With starting vertex 1: keeps the numbering unchanged → no improvement.

Even the optimal bandwidth is not satisfactory: bandwidth n/2.
Comparison (cont. 2)

Minimum degree is efficient for this example (PAP^T):

Optimal reordering in one step: swap 1 ↔ n. Optimal costs: O(n).

  [ ∗ 0 0 . . . ∗ ]
  [ 0 ∗ 0 . . . ∗ ]
  [ 0 0 ∗ . . . ∗ ]
  [       . . . ∗ ]
  [ ∗ ∗ . . . ∗ ∗ ]
Example: Nonsymmetric Permutation

Costs: O(n^2)

Test: MATLAB (symamd, colamd, amd, colperm, symrcm)

load('west0479.mat');
a = west0479;
s = a'*a;
p = symamd(s);
spy(s(p,p));

See matrix 'west0479.mat' on Matrix Market:
http://math.nist.gov/MatrixMarket/data/Harwell-Boeing/chemwest/west0479.html
4.4.2. Gaussian Elimination in the Graph

• Gaussian elimination can be modelled purely algebraically, without numerical computations, by computing the sequence of related graphs (in terms of dense subgraphs (submatrices) = cliques).
• Modification of GE:
  1. Apply an algebraic prestep for GE, determining the graphs related to the elimination matrices A_k in GE.
  2. Based on these graphs we can decide
     • whether GE will lead to nearly dense matrices (do not use GE in this case!)
     • what additional entries will appear during GE (prepare the storage)
• The algebraic prestep is cheap and can be implemented using cliques.
4.4.3. A Parallel Direct Solver

Frontal methods for band matrices (from PDEs)

• Frontal dense matrix of size (β + 1) × (2β + 1)
• Move the frontal matrix to the CPU and treat it as a dense matrix.
• Then move the frontal matrix one entry down-right and do the next elimination.
• Repeat until done.
• No parallelism so far.
Multifrontal in Parallel

To make efficient use of parallelism, search for "independent" elimination steps!

A_{1,1} as the first pivot element is related to a first frontal matrix that contains all information to eliminate the first column.
Dense submatrix for k = 1:
Multifrontal in Parallel (cont.)

Because A_{1,2} = A_{2,1} = 0 we can already consider, in parallel, the frontal matrix related to k = 2, the second step:
Multifrontal in Parallel (cont. 2)

The computations for k = 1 and k = 2 are independent and can be done in parallel (in the 2 × 2 block):

  k = 1:  A_{i,j} → A_{i,j} − A_{i,1} · A_{1,j} / A_{1,1}

  k = 2:  A_{i,j} → A_{i,j} − A_{i,2} · A_{2,j} / A_{2,2}

The number of frontal matrices that can be used in parallel depends on the sparsity pattern.
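The independence can be checked numerically: when A_{1,2} = A_{2,1} = 0, the two elimination steps touch disjoint data and therefore commute. An illustrative NumPy sketch with 0-based indices (the slide's k = 1, 2 become k = 0, 1 here):

```python
import numpy as np

# Hypothetical 4x4 matrix with A[0, 1] = A[1, 0] = 0, so the
# eliminations of columns 0 and 1 use disjoint pivot rows.
A = np.array([[2., 0., 1., 0.],
              [0., 3., 0., 1.],
              [1., 0., 4., 0.],
              [0., 1., 0., 5.]])

def eliminate(M, k):
    """One GE step: A_ij <- A_ij - A_ik * A_kj / A_kk for i > k."""
    M = M.copy()
    for i in range(k + 1, M.shape[0]):
        factor = M[i, k] / M[k, k]
        M[i, k:] -= factor * M[k, k:]
    return M

B01 = eliminate(eliminate(A, 0), 1)   # column 0 first, then column 1
B10 = eliminate(eliminate(A, 1), 0)   # column 1 first, then column 0
```

Both orders produce the same partially eliminated matrix, which is exactly what allows the two frontal matrices to be processed in parallel.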
Stationary Methods Nonstationary Methods Preconditioning
Parallel Numerics, WT 2013/2014
5 Iterative Methods for Sparse Linear Systems of Equations
Parallel Numerics, WT 2013/2014 5 Iterative Methods for Sparse Linear Systems of Equations
page 1 of 73
Contents

1 Introduction
  1.1 Computer Science Aspects
  1.2 Numerical Problems
  1.3 Graphs
  1.4 Loop Manipulations
2 Elementary Linear Algebra Problems
  2.1 BLAS: Basic Linear Algebra Subroutines
  2.2 Matrix-Vector Operations
  2.3 Matrix-Matrix Product
3 Linear Systems of Equations with Dense Matrices
  3.1 Gaussian Elimination
  3.2 Parallelization
  3.3 QR-Decomposition with Householder Matrices
4 Sparse Matrices
  4.1 General Properties, Storage
  4.2 Sparse Matrices and Graphs
  4.3 Reordering
  4.4 Gaussian Elimination for Sparse Matrices
5 Iterative Methods for Sparse Linear Systems of Equations
  5.1 Stationary Methods
  5.2 Nonstationary Methods
  5.3 Preconditioning
6 Domain Decomposition
  6.1 Overlapping Domain Decomposition
  6.2 Non-overlapping Domain Decomposition
  6.3 Schur Complements
• Disadvantages of direct methods (in parallel):
  – strongly sequential
  – may lead to dense matrices
  – sparsity pattern changes, additional entries necessary
  – indirect addressing
  – storage
  – computational effort
• Iterative solver:
  – choose an initial guess = starting vector x^(0), e.g., x^(0) = 0
  – iteration function x^(k+1) := Φ(x^(k))
• Applied to solving a linear system:
  – The main part of Φ should be a matrix-vector multiplication Ax (matrix-free!?)
  – Easy to parallelize, no change in the pattern of A.

    x^(k) → x̄ = A^{−1}b for k → ∞

  – Main problem: fast convergence!
5.1. Stationary Methods
5.1.1. Richardson Iteration

• Construct from Ax = b an iteration process:

  b = Ax = (A − I + I)x = x − (I − A)x   ((artificial) splitting of A)
  ⇒ x = b + (I − A)x = b + Nx,  N := I − A

• Leads to the fixed-point equation x = Φ(x) with Φ(x) := b + Nx:

  start: x^(0);
  x^(k+1) := Φ(x^(k)) = b + Nx^(k) = b + (I − A)x^(k)
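A minimal sketch of the iteration (illustrative NumPy, not from the slides), assuming a matrix close to I so that ‖I − A‖ < 1:

```python
import numpy as np

# Example matrix close to I: row sums of |I - A| are at most 0.3 < 1,
# so the Richardson iteration converges.
A = np.array([[1.2, 0.1, 0.0],
              [0.1, 1.0, 0.1],
              [0.0, 0.1, 0.9]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)                # x^(0) = 0
for _ in range(200):
    x = b + x - A @ x          # x^(k+1) = b + (I - A) x^(k)
```

Each step costs one matrix-vector product, which is the easily parallelizable kernel the slides emphasize.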
Richardson Iteration (cont.)

start: x^(0);
x^(k+1) := Φ(x^(k)) = b + Nx^(k) = b + (I − A)x^(k)

If x^(k) is convergent, x^(k) → x̃, then

  x̃ = Φ(x̃) = b + Nx̃ = b + (I − A)x̃  ⇒  Ax̃ = b

and therefore it holds

  x^(k) → x̃ = x̄ := A^{−1}b

Residual-based formulation:

  x^(k+1) = Φ(x^(k)) = b + (I − A)x^(k) = b + x^(k) − Ax^(k)
          = x^(k) + (b − Ax^(k)) = x^(k) + r(x^(k))

with the residual r(x) := b − Ax.
Convergence Analysis via Neumann Series

  x^(k) = b + Nx^(k−1) = b + N(b + Nx^(k−2)) = b + Nb + N^2 x^(k−2) =
        = b + Nb + N^2 b + · · · + N^(k−1) b + N^k x^(0) =
        = (Σ_{j=0}^{k−1} N^j) b + N^k x^(0)

Special case x^(0) = 0:

  x^(k) = (Σ_{j=0}^{k−1} N^j) b

⇒ x^(k) ∈ span{b, Nb, N^2 b, . . . , N^(k−1) b} = span{b, Ab, A^2 b, . . . , A^(k−1) b} = K_k(A, b),

which is called the Krylov space of A and b.

For ‖N‖ < 1 it holds:

  Σ_{j=0}^{k−1} N^j → Σ_{j=0}^{∞} N^j = (I − N)^{−1} = (I − (I − A))^{−1} = A^{−1}
Convergence Analysis via Neumann Series (cont.)

  x^(k) → (Σ_{j=0}^{∞} N^j) b = (I − N)^{−1} b = A^{−1} b = x̄

The Richardson iteration is convergent for ‖N‖ < 1, i.e., A ≈ I.
Error analysis for e^(k) := x^(k) − x̄:

  e^(k+1) = x^(k+1) − x̄ = Φ(x^(k)) − Φ(x̄) = (b + Nx^(k)) − (b + Nx̄) = N(x^(k) − x̄) = N e^(k)

  ‖e^(k)‖ ≤ ‖N‖ ‖e^(k−1)‖ ≤ ‖N‖^2 ‖e^(k−2)‖ ≤ · · · ≤ ‖N‖^k ‖e^(0)‖

  ‖N‖ < 1 ⇒ ‖N‖^k → 0 ⇒ ‖e^(k)‖ → 0 for k → ∞

• Convergence, if ρ(N) = ρ(I − A) < 1, where ρ is the spectral radius:
  ρ(N) = |λ_max| = max_i |λ_i|  (λ_i eigenvalue of N)
• The eigenvalues of A all have to lie in the circle of radius 1 around 1.
Splittings of A

• Convergence of Richardson only in very special cases!
  Try to improve the iteration for better convergence!
• Write A in the form A := M − N:

  b = Ax = (M − N)x = Mx − Nx  ⇔  x = M^{−1}b + M^{−1}Nx = Φ(x)

  Φ(x) = M^{−1}b + M^{−1}Nx = M^{−1}b + M^{−1}(M − A)x =
       = M^{−1}(b − Ax) + x = x + M^{−1} r(x)

• N should be such that Ny can be evaluated efficiently.
• M should be such that M^{−1}y can be evaluated efficiently.

  x^(k+1) = x^(k) + M^{−1} r^(k)

• The iteration with splitting M − N is equivalent to Richardson applied to

  M^{−1}Ax = M^{−1}b
Convergence

• The iteration with splitting A = M − N is convergent if

  ρ(M^{−1}N) = ρ(I − M^{−1}A) < 1

• For fast convergence it should hold:
  – M^{−1}A ≈ I
  – M^{−1}A should be better conditioned than A itself
• Such a matrix M is called a preconditioner for A.
  It is used in other iterative methods to accelerate convergence.
• Condition number:

  κ(A) = ‖A^{−1}‖ ‖A‖,  |λ_max / λ_min|,  or  σ_max / σ_min
5.1.2. Jacobi (Diagonal) Splitting

Choose A = M − N = D − (L + U) with
  D = diag(A),
  L the lower triangular part of A, and
  U the upper triangular part.

  x^(k+1) = D^{−1}b + D^{−1}(L + U)x^(k) =
          = D^{−1}b + D^{−1}(D − A)x^(k) = x^(k) + D^{−1} r^(k)

Convergent for A ≈ diag(A), e.g., diagonally dominant matrices:

  ρ(D^{−1}N) = ρ(I − D^{−1}A) < 1
Jacobi (Diagonal) Splitting (cont.)

The iteration process written elementwise:

  x^(k+1) = D^{−1}(b − (A − D)x^(k))  ⇒  x_j^(k+1) = (1/a_jj) (b_j − Σ_{m=1, m≠j}^{n} a_{j,m} x_m^(k))

  a_jj x_j^(k+1) = b_j − Σ_{m=1}^{j−1} a_{j,m} x_m^(k) − Σ_{m=j+1}^{n} a_{j,m} x_m^(k)

• Damping or relaxation for improving convergence
• Idea: view the iterative method as a correction of the last iterate in a search direction.
• Introduce a step length for this correction step:

  x^(k+1) = x^(k) + D^{−1} r^(k)  →  x^(k+1) = x^(k) + ω D^{−1} r^(k)

  with an additional damping parameter ω.
• Damped Jacobi iteration:

  x_damped^(k+1) = (ω + 1 − ω) x^(k) + ω D^{−1} r^(k) = ω x^(k+1) + (1 − ω) x^(k)
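As an illustrative NumPy sketch (0-based indexing; ω = 1 recovers plain Jacobi), applied to a small diagonally dominant matrix:

```python
import numpy as np

# Damped Jacobi: x^(k+1) = x^(k) + omega * D^{-1} (b - A x^(k))
A = np.array([[4., -1., 0.],
              [-1., 4., -1.],
              [0., -1., 4.]])   # diagonally dominant -> Jacobi converges
b = np.array([1., 2., 3.])
d = np.diag(A)                  # D = diag(A), stored as a vector
omega = 1.0                     # omega = 1: undamped Jacobi

x = np.zeros(3)                 # x^(0) = 0
for _ in range(100):
    x = x + omega * (b - A @ x) / d
```

All components of the update use only the old iterate x^(k), which is why every component can be computed in parallel.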
Damped Jacobi Iteration

  x^(k+1) = x^(k) + ωD^{−1}r^(k) = x^(k) + ωD^{−1}(b − Ax^(k)) =
          = . . .
          = ωD^{−1}b + [(1 − ω)I + ωD^{−1}(L + U)] x^(k)

is convergent for

  ρ((1 − ω)I + ωD^{−1}(L + U)) < 1

(note that the iteration matrix tends to I for ω → 0).

Look for the optimal ω with best convergence (additional degree of freedom).
Parallelism in the Jacobi Iteration

• The Jacobi method is easy to parallelize: only Ax and D^{−1}x.
• But convergence is often too slow!
• Improvement: block Jacobi iteration
5.1.3. Gauss-Seidel Iteration

Always use the newest information available!

Jacobi iteration:

  a_jj x_j^(k+1) = b_j − Σ_{m=1}^{j−1} a_{j,m} x_m^(k) − Σ_{m=j+1}^{n} a_{j,m} x_m^(k)
                           (x_m^(k), already computed)

Gauss-Seidel iteration:

  a_jj x_j^(k+1) = b_j − Σ_{m=1}^{j−1} a_{j,m} x_m^(k+1) − Σ_{m=j+1}^{n} a_{j,m} x_m^(k)
                           (x_m^(k+1), already computed)
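A Gauss-Seidel sweep in NumPy (illustrative sketch): x is overwritten in place, so the first sum automatically uses the freshly computed values x_m^(k+1) for m < j.

```python
import numpy as np

A = np.array([[4., -1., 0.],
              [-1., 4., -1.],
              [0., -1., 4.]])   # diagonally dominant -> GS converges
b = np.array([1., 2., 3.])
n = len(b)

x = np.zeros(n)                 # x^(0) = 0
for _ in range(50):
    for j in range(n):
        # x[:j] already holds x^(k+1), x[j+1:] still holds x^(k)
        s = A[j, :j] @ x[:j] + A[j, j+1:] @ x[j+1:]
        x[j] = (b[j] - s) / A[j, j]
```

The inner loop over j is the sequential bottleneck mentioned on the next slide; red-black ordering breaks it into two parallel half-sweeps.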
Gauss-Seidel Iteration (cont.)

• Compare the dependency graphs for general iterative algorithms. Here:

  x = f(x) = D^{−1}(b + (D − A)x) = D^{−1}(b + (L + U)x)

  corresponding to the splitting A = (D − L) − U = M − N:

  x^(k+1) = (D − L)^{−1}b + (D − L)^{−1}U x^(k) =
          = (D − L)^{−1}b + (D − L)^{−1}(D − L − A) x^(k) =
          = x^(k) + (D − L)^{−1} r^(k)

• Convergence depends on the spectral radius: ρ(I − (D − L)^{−1}A) < 1
Parallelism in the Gauss-Seidel Iteration

• The linear system in D − L is easy to solve because D − L is lower triangular, but
• strongly sequential!
• Use red-black ordering or graph colouring as a compromise:

  parallelism ↔ convergence
Successive Over-Relaxation (SOR)

• Damping or relaxation:

  x^(k+1) = x^(k) + ω(D − L)^{−1} r^(k) = ω(D − L)^{−1}b + [(1 − ω)I + ω(D − L)^{−1}U] x^(k)

• Convergence depends on the spectral radius of the iteration matrix

  (1 − ω)I + ω(D − L)^{−1}U

• Parallelization of SOR == parallelization of Gauss-Seidel
Stationary Methods (in General)

• Can always be written in the two normal forms

  x^(k+1) = c + B x^(k)

  with convergence depending on ρ(B), and

  x^(k+1) = x^(k) + F r^(k)

  with preconditioner F and B = I − FA.
• For x^(0) = 0:

  x^(k) ∈ K_k(B, c),

  the Krylov space with respect to the matrix B and the vector c.
• Slow convergence (but good smoothing properties! → multigrid)
MATLAB Example

• clear; n=100; k=10; omega=1; stationary
• tridiag(−.5, 1, −.5):
  – Jacobi: iteration-matrix norm |cos(x)|
  – Gauss-Seidel: norm sin(x) / √(2(1 − cos(x)))
  – both < 1 → convergence, but slow
• To improve convergence → nonstationary methods (or multigrid)
Chair of Informatics V (SCCS)
Efficient Numerical Algorithms, Parallel & HPC

• High-dimensional numerics (sparse grids)
• Fast iterative solvers (multi-level methods, preconditioners)
• Adaptive, octree-based grid generation
• Space-filling curves
• Numerical linear algebra
• Numerical algorithms for image processing
• HW-aware numerical programming

Fields of application in simulation:

• CFD (incl. fluid-structure interaction)
• Plasma physics
• Molecular dynamics
• Quantum chemistry

Further info → www5.in.tum.de
Feel free to come around and ask for thesis topics!
5.2. Nonstationary Methods
5.2.1. Gradient Method

• Consider A = A^T > 0 (A SPD)
• Function Φ(x) = (1/2) x^T A x − b^T x
• n-dimensional paraboloid R^n → R
• Gradient ∇Φ(x) = Ax − b
• The position with ∇Φ(x) = 0 is exactly the minimum of the paraboloid
• Instead of solving Ax = b, consider min_x Φ(x)
• Local descent direction in y: ∇Φ(x) · y is minimal for y = −∇Φ(x)
Gradient Method (cont.)

• Optimization: start with x^(0),

  x^(k+1) := x^(k) + α_k d^(k)

  with search direction d^(k) and step size α_k.
• In view of the previous results, the optimal (local) search direction is

  d^(k) := −∇Φ(x^(k)) = b − Ax^(k)

• To define α_k:

  min_α g(α) := min_α Φ(x^(k) + α(b − Ax^(k)))
             = min_α [ (1/2)(x^(k) + α d^(k))^T A (x^(k) + α d^(k)) − b^T (x^(k) + α d^(k)) ]
             = min_α [ (1/2) α^2 d^(k)T A d^(k) − α d^(k)T d^(k) + (1/2) x^(k)T A x^(k) − x^(k)T b ]

  ⇒ α_k = d^(k)T d^(k) / (d^(k)T A d^(k))
Gradient Method (cont. 2)

  x^(k+1) = x^(k) + [ ‖b − Ax^(k)‖_2^2 / ((b − Ax^(k))^T A (b − Ax^(k))) ] (b − Ax^(k))

• Method of steepest descent.
• Disadvantage: distorted contour lines.
• Slow convergence (zig-zag path).
• The local descent direction is not globally optimal.
Analysis of the Gradient Method

• Definition of the A-norm:

  ‖x‖_A := √(x^T A x)

• Consider the error:

  ‖x − x̄‖_A^2 = ‖x − A^{−1}b‖_A^2 = (x^T − b^T A^{−1}) A (x − A^{−1}b)
              = x^T A x − 2 b^T x + b^T A^{−1} b
              = 2 Φ(x) + b^T A^{−1} b

• Therefore, minimizing Φ is equivalent to minimizing the error in the A-norm! A more detailed analysis reveals:

  ‖x^(k+1) − x̄‖_A^2 ≤ (1 − 1/κ(A)) · ‖x^(k) − x̄‖_A^2

• Therefore, for κ(A) ≫ 1 very slow convergence!
5.2.2. The Conjugate Gradient Method

• Improve the descent direction to be globally optimal.
• x^(k+1) := x^(k) + α_k p^(k) with a search direction that is not the negative gradient, but the projection of the gradient that is A-conjugate to all previous search directions:

  p^(k) ⊥ A p^(j) for all j < k,  i.e.  p^(k) ⊥_A p^(j),  i.e.  p^(k)T A p^(j) = 0 for j < k

• We choose the new search direction as the component of the last residual that is A-conjugate to all previous search directions.
• α_k again by 1-dimensional minimization as before (for the chosen p^(k)).
The Conjugate Gradient Algorithm

  p^(0) = r^(0) = b − Ax^(0)
  for k = 0, 1, 2, . . . do
    α^(k) = −⟨r^(k), r^(k)⟩ / ⟨p^(k), A p^(k)⟩
    x^(k+1) = x^(k) − α^(k) p^(k)
    r^(k+1) = r^(k) + α^(k) A p^(k)
    if ‖r^(k+1)‖_2^2 ≤ ε then break
    β^(k) = ⟨r^(k+1), r^(k+1)⟩ / ⟨r^(k), r^(k)⟩
    p^(k+1) = r^(k+1) + β^(k) p^(k)
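An illustrative NumPy implementation in the common textbook sign convention (its α is the negative of the α^(k) above, so the updates use + instead of −; the two forms are equivalent):

```python
import numpy as np

def cg(A, b, x0=None, tol=1e-12, maxiter=None):
    """Conjugate gradient for SPD A (sketch of the algorithm above)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x                   # r^(0)
    p = r.copy()                    # p^(0) = r^(0)
    rs = r @ r
    for _ in range(maxiter or n):
        Ap = A @ p
        alpha = rs / (p @ Ap)       # 1-dim. minimization along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new <= tol:           # ||r||_2^2 <= eps
            break
        p = r + (rs_new / rs) * p   # beta = <r_new, r_new> / <r, r>
        rs = rs_new
    return x

# SPD test matrix: in exact arithmetic CG finishes in at most n steps.
A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
b = np.array([1., 2., 3.])
x = cg(A, b)
```

Only one matrix-vector product A p per iteration is needed, plus two inner products, which is what makes the method attractive in parallel.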
Properties of Conjugate Gradients

• It holds

  p^(j)T A p^(k) = 0 = r^(j)T r^(k) for j ≠ k

• and

  span(p^(1), . . . , p^(j)) = span(r^(0), . . . , r^(j−1)) =
  = span(r^(0), A r^(0), . . . , A^(j−1) r^(0)) = K_j(A, r^(0))

• Especially for x^(0) = 0 it holds

  K_j(A, r^(0)) = span(b, Ab, . . . , A^(j−1) b)

• x^(k) is the best approximate solution to Ax = b in the subspace K_k(A, r^(0)). For x^(0) = 0: x^(k) ∈ K_k(A, b)
• Error:

  ‖x^(k) − x̄‖_A = min_{x ∈ K_k(A,b)} ‖x − x̄‖_A

• The cheap 1-dimensional minimization gives the optimal k-dimensional solution for free!
Properties of Conjugate Gradients (cont.)

• Consequence: after n steps K_n(A, b) = R^n and therefore x^(n) = A^{−1}b is the solution in exact arithmetic.
• Unfortunately, this is not true in floating-point arithmetic.
• Furthermore, O(n) iteration steps would be too costly:
  costs: #iterations ∗ matrix-vector product
• The matrix-vector product is easy to parallelize.
• But how to get fast convergence and reduce #iterations?
Error Estimation (x^(0) = 0)

  ‖e^(k)‖_A = ‖x^(k) − x̄‖_A = min_{x ∈ K_k(A,b)} ‖x − x̄‖_A =
  = min_{α_0,...,α_{k−1}} ‖ Σ_{j=0}^{k−1} α_j A^j b − x̄ ‖_A =
  = min_{P^(k−1)} ‖ P^(k−1)(A) b − x̄ ‖_A =
  = min_{P^(k−1)} ‖ P^(k−1)(A) A x̄ − x̄ ‖_A =
  = min_{P^(k−1)} ‖ (P^(k−1)(A) A − I)(x̄ − x^(0)) ‖_A =
  = min_{Q^(k), Q^(k)(0)=1} ‖ Q^(k)(A) e^(0) ‖_A

for polynomials Q^(k)(x) of degree k with Q^(k)(0) = 1.
Error Estimation

• The matrix A has an orthonormal basis of eigenvectors u_j, j = 1, . . . , n, with eigenvalues λ_j.
• It holds

  A u_j = λ_j u_j, j = 1, . . . , n, and u_j^T u_k = 0 for j ≠ k and 1 for j = k

• Start error expanded in this ONB: e^(0) = Σ_{j=1}^{n} ρ_j u_j

  ‖e^(k)‖_A = min_{Q^(k)(0)=1} ‖ Q^(k)(A) Σ_{j=1}^{n} ρ_j u_j ‖_A = min_{Q^(k)(0)=1} ‖ Σ_{j=1}^{n} ρ_j Q^(k)(A) u_j ‖_A =
  = min_{Q^(k)(0)=1} ‖ Σ_{j=1}^{n} ρ_j Q^(k)(λ_j) u_j ‖_A ≤
  ≤ min_{Q^(k)(0)=1} { max_{j=1,...,n} |Q^(k)(λ_j)| } ‖ Σ_{j=1}^{n} ρ_j u_j ‖_A =
  = min_{Q^(k)(0)=1} { max_{j=1,...,n} |Q^(k)(λ_j)| } ‖e^(0)‖_A
Error Estimates

By choosing polynomials with Q^(k)(0) = 1, we can derive error estimates for the error after the k-th step.

Choose, e.g.,

  Q^(k)(x) := |1 − 2x/(λ_max + λ_min)|^k

  ‖e^(k)‖_A ≤ max_{j=1,...,n} |Q^(k)(λ_j)| ‖e^(0)‖_A = |1 − 2λ_max/(λ_max + λ_min)|^k ‖e^(0)‖_A =
  = ((λ_max − λ_min)/(λ_max + λ_min))^k ‖e^(0)‖_A = ((κ(A) − 1)/(κ(A) + 1))^k ‖e^(0)‖_A
Better Estimates

• Choose normalized Chebyshev polynomials T_n(x) = cos(n arccos(x)):

  ‖e^(k)‖_A ≤ (1 / T_k((κ(A) + 1)/(κ(A) − 1))) ‖e^(0)‖_A ≤ 2 ((√κ(A) − 1)/(√κ(A) + 1))^k ‖e^(0)‖_A

• For clustered eigenvalues choose a special polynomial, e.g., assume that A has only two eigenvalues λ_1 and λ_2:

  Q^(2)(x) := (λ_1 − x)(λ_2 − x) / (λ_1 λ_2)

  ‖e^(2)‖_A ≤ max_{j=1,2} |Q^(2)(λ_j)| ‖e^(0)‖_A = 0

Convergence of CG after two steps!
Outliers/Cluster

Assume the matrix has an eigenvalue λ_1 > 1 and all other eigenvalues are contained in an ε-neighborhood of 1:

  ∀λ ≠ λ_1: |λ − 1| < ε

  Q^(2)(x) := (λ_1 − x)(1 − x) / λ_1

  ‖e^(2)‖_A ≤ max_{|λ−1|<ε} |(λ_1 − λ)(1 − λ) / λ_1| ‖e^(0)‖_A ≤ ((λ_1 − 1 + ε) ε / λ_1) ‖e^(0)‖_A = O(ε) ‖e^(0)‖_A

Very good approximation of CG after only two steps!

Important: a small number of outliers combined with a cluster.
Conjugate Gradients Summary

• To get fast convergence and reduce the number of iterations:
  → find a preconditioner M such that M^{−1}Ax = M^{−1}b has clustered eigenvalues.
• Conjugate gradients (CG) is in general the method of choice for symmetric positive definite A.
• To improve convergence, include preconditioning (PCG).
• CG has two important properties: it is optimal and cheap.
Parallel Conjugate Gradients Algorithm