Iterative Methods in Linear Algebra (part 1)
Stan Tomov
Innovative Computing Laboratory, Computer Science Department
The University of Tennessee
Wednesday, March 31, 2010
Outline
Part I: Discussion (motivation for iterative methods)
Part II: Stationary Iterative Methods
Part III: Nonstationary Iterative Methods
Part I
Discussion
Discussion
Part 1:
Discussion
Part 2:
Discussion
Original matrix:>> load matrix.output>> S = spconvert(matrix);>> spy(S)
Reordered matrix:>> load A.data>> S = spconvert(A);>> spy(S)
Discussion
Original sub-matrix:
>> load matrix.output
>> S = spconvert(matrix);
>> spy(S(1:8660,1:8660))

Reordered sub-matrix:
>> load A.data
>> S = spconvert(A);
>> spy(S(8602:8700,8602:8700))
Discussion
Results on torc0 (Pentium III, 933 MHz, 16 KB L1 and 256 KB L2 cache):
  Original matrix:   Mflop/s = 42.4568
  Reordered matrix:  Mflop/s = 45.3954
  BCRS:              Mflop/s = 72.17374

Results on woodstock (Intel Xeon 5160 Woodcrest, 3.0 GHz, 32 KB L1, 4 MB L2):
  Original matrix:   Mflop/s = 386.8
  Reordered matrix:  Mflop/s = 387.6
  BCRS:              Mflop/s = 894.2
Homework 9
Questions?
Homework #9
PART I
PETSc: many iterative solvers. We will see how projection is used in iterative methods.
PART II
Hybrid CPU-GPU computations
Hybrid CPU-GPU computations
Iterative Methods
Motivation
So far we have discussed and have seen:
Many engineering and physics simulations lead to sparse matrices, e.g. PDE-based simulations and their discretizations based on FEM/FVM, finite differences, Petrov-Galerkin type conditions, etc.
How to optimize performance of sparse matrix computations
Some software for solving sparse matrix problems (e.g. PETSc)

The question is: can we solve sparse matrix problems faster than with direct sparse methods?
In certain cases yes: using iterative methods.
Sparse direct methods
These are based on Gaussian elimination (LU)
Performed in sparse format
Are there limitations to the approach?
Yes: they produce fill-ins, which lead to
more memory requirements
more flops being performed
Fill-ins can become prohibitively high
Sparse direct methods
Consider LU for the matrix below; a nonzero is represented by a *.
The 1st step of LU factorization will introduce fill-ins, marked by an F.
[figure: nonzero pattern of the matrix, before and after the first elimination step]
Sparse direct methods
Fill-ins can be reduced by reordering
Remember: we talked about this in a slightly different context (for speed and parallelization)
Consider the following reordering [figure]
These were extreme cases, but still, the problem exists
Sparse methods
Sparse direct vs dense methods
Dense methods take O(n^2) storage and O(n^3) flops, but run close to peak performance
Sparse direct methods can take O(#nonz) storage and O(#nonz) flops, but both can grow due to fill-ins, and performance is bad
With #nonz << n^2 and a 'proper' ordering, it pays off to use sparse direct methods
Software (table from Julien Langou)http://www.netlib.org/utk/people/JackDongarra/la-sw.html
Package | Support | Type: Real, Complex | Language: f77, c, c++ | Mode: Seq, Dist | SPD, Gen
DSCPACK | yes | X X X M X
HSL     | yes | X X X X X X
MFACT   | yes | X X X X
MUMPS   | yes | X X X X X M X X
PSPASES | yes | X X X M X
SPARSE  | ?   | X X X X X X
SPOOLES | ?   | X X X X M X
SuperLU | yes | X X X X X M X
TAUCS   | yes | X X X X X X
UMFPACK | yes | X X X X X
Y12M    | ?   | X X X X X
Sparse methods
What about Iterative Methods? Think for example
x_{i+1} = x_i + P(b − A x_i)
Pluses:
Storage is only O(#nonz) (for the matrix and a few working vectors)
Can take only a few iterations to converge (e.g. << n)
For P = A^{-1} it takes 1 iteration (check!)
In general easy to parallelize
Sparse methods
What about Iterative Methods? Think for example
x_{i+1} = x_i + P(b − A x_i)
Minuses:
Performance can be bad (as we saw in Lecture 11 and today's discussion)
Convergence can be slow or can even stagnate, but it can be improved with preconditioning
Think of P as a preconditioner, i.e. an operator/matrix P ≈ A^{-1}
Optimal preconditioners (e.g. multigrid can be optimal) lead to convergence in O(1) iterations
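As a concrete illustration (a minimal sketch of mine, not from the slides), here is the iteration x_{i+1} = x_i + P(b − A x_i) in MATLAB; the matrix A, right-hand side b, and preconditioner P are assumed given, and maxit/tol are illustrative choices:

% Sketch of x_{i+1} = x_i + P*(b - A*x_i); A, b, P assumed given.
maxit = 100;  tol = 1e-8;          % illustrative parameters (my choice)
x = zeros(size(b));                % initial guess x_0 = 0
for i = 1:maxit
    r = b - A*x;                   % current residual
    if norm(r) <= tol*norm(b), break, end
    x = x + P*r;                   % preconditioned correction
end

With P = inv(A) the loop stops after one update, matching the 'check' above; with P = I it reduces to the Richardson iteration of the next slides.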
Part II
Stationary Iterative Methods
Stationary Iterative Methods
Can be expressed in the form
x_{i+1} = B x_i + c
where B and c do not depend on i
older, simpler, easy to implement, but usually not as effective (as nonstationary methods)
examples: Richardson, Jacobi, Gauss-Seidel, SOR, etc. (next)
Richardson Method
Richardson iteration
x_{i+1} = x_i + (b − A x_i) = (I − A) x_i + b    (1)

i.e. B from the definition above is B = I − A.

Denote e_i = x − x_i and rewrite (1):

x − x_{i+1} = x − x_i − (A x − A x_i)
e_{i+1} = e_i − A e_i = (I − A) e_i

||e_{i+1}|| ≤ ||I − A|| ||e_i|| ≤ ||I − A||^2 ||e_{i−1}|| ≤ · · · ≤ ||I − A||^{i+1} ||e_0||

i.e. for convergence (e_i → 0) we need

||I − A|| < 1

for some norm || · ||, i.e. when ρ(I − A) < 1.
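A small numerical check (my example, not from the slides): scale an SPD matrix so that ρ(I − A) < 1 and watch the error contract.

% Richardson iteration on an SPD matrix scaled so that rho(I - A) < 1.
n  = 10;
T  = gallery('tridiag', n);     % 1D Laplacian; eigenvalues lie in (0, 4)
A  = T/4;                       % eigenvalues now in (0, 1), so ||I - A||_2 < 1
x  = (1:n)';  b = A*x;          % manufactured exact solution
xi = zeros(n, 1);               % x_0 = 0
for i = 1:200
    xi = xi + (b - A*xi);       % x_{i+1} = x_i + (b - A*x_i)
end
disp(norm(x - xi))              % error has shrunk roughly like ||I - A||_2^200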
Jacobi Method
x_{i+1} = x_i + D^{-1}(b − A x_i) = (I − D^{-1}A) x_i + D^{-1} b    (2)

where D is the diagonal of A (assumed nonzero; B = I − D^{-1}A from the definition).

Denote e_i = x − x_i and rewrite (2):

x − x_{i+1} = x − x_i − D^{-1}(A x − A x_i)
e_{i+1} = e_i − D^{-1} A e_i = (I − D^{-1}A) e_i

||e_{i+1}|| ≤ ||I − D^{-1}A|| ||e_i|| ≤ ||I − D^{-1}A||^2 ||e_{i−1}|| ≤ · · · ≤ ||I − D^{-1}A||^{i+1} ||e_0||

i.e. for convergence (e_i → 0) we need

||I − D^{-1}A|| < 1

for some norm || · ||, i.e. when ρ(I − D^{-1}A) < 1.
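A minimal MATLAB sketch (mine, not from the slides) of the Jacobi update; A and b are assumed given, with nonzero diagonal:

% Jacobi: x_{i+1} = x_i + D^{-1}(b - A*x_i), with D = diag(A).
d = diag(A);                    % diagonal of A (assumed nonzero)
x = zeros(size(b));             % x_0 = 0
for i = 1:100                   % fixed iteration count, for illustration
    x = x + (b - A*x)./d;       % componentwise division applies D^{-1}
end

Note that the update uses only values from the previous iterate, which is what makes Jacobi easy to parallelize (see the comparison with Gauss-Seidel below).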
Gauss-Seidel Method
x_{i+1} = (D − L)^{-1}(U x_i + b)    (3)

where −L is the strictly lower and −U the strictly upper triangular part of A, and D is the diagonal (A = D − L − U).

Equivalently, componentwise:

x_1^{(i+1)} = (b_1 − a_{12} x_2^{(i)} − a_{13} x_3^{(i)} − a_{14} x_4^{(i)} − . . . − a_{1n} x_n^{(i)}) / a_{11}
x_2^{(i+1)} = (b_2 − a_{21} x_1^{(i+1)} − a_{23} x_3^{(i)} − a_{24} x_4^{(i)} − . . . − a_{2n} x_n^{(i)}) / a_{22}
x_3^{(i+1)} = (b_3 − a_{31} x_1^{(i+1)} − a_{32} x_2^{(i+1)} − a_{34} x_4^{(i)} − . . . − a_{3n} x_n^{(i)}) / a_{33}
...
x_n^{(i+1)} = (b_n − a_{n1} x_1^{(i+1)} − a_{n2} x_2^{(i+1)} − a_{n3} x_3^{(i+1)} − . . . − a_{n,n−1} x_{n−1}^{(i+1)}) / a_{nn}
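Written as code, one forward sweep uses each new component as soon as it is available; a hedged MATLAB sketch (mine, not from the slides), with A and b assumed given:

% Gauss-Seidel sweeps, componentwise as in the system above.
n = length(b);
x = zeros(n, 1);                                  % x^{(0)} = 0
for it = 1:100                                    % iterations
    for k = 1:n                                   % one forward sweep
        x(k) = (b(k) - A(k,1:k-1)*x(1:k-1) ...    % new values x^{(i+1)}
                     - A(k,k+1:n)*x(k+1:n)) ...   % old values x^{(i)}
               / A(k,k);
    end
end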
A comparison (from Julien Langou slides)
Gauss-Seidel method
A =
    10   1   2   0   1
     0  12   1   3   1
     1   2   9   1   0
     0   3   1  10   0
     1   2   0   0  15

b = (23, 44, 36, 49, 80)^T

[plot: ‖b − A x^{(i)}‖_2 / ‖b‖_2 as a function of i, the number of iterations]

Iterates (the last column is the limit):

x^{(0)}   x^{(1)}   x^{(2)}   x^{(3)}   x^{(4)}   . . .
0.0000    2.3000    0.8783    0.9790    0.9978    1.0000
0.0000    3.6667    2.1548    2.0107    2.0005    2.0000
0.0000    2.9296    3.0339    3.0055    3.0006    3.0000
0.0000    3.5070    3.9502    3.9962    3.9998    4.0000
0.0000    4.6911    4.9875    5.0000    5.0001    5.0000
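The iterates above can be reproduced with a few lines of MATLAB (my sketch; the matrix, right-hand side, and printed values are those on the slide):

% Gauss-Seidel on the 5x5 example; prints x^{(1)}, ..., x^{(4)}.
A = [10 1 2 0 1; 0 12 1 3 1; 1 2 9 1 0; 0 3 1 10 0; 1 2 0 0 15];
b = [23; 44; 36; 49; 80];       % the exact solution is (1, 2, 3, 4, 5)
x = zeros(5, 1);                % x^{(0)} = 0
for it = 1:4
    for k = 1:5
        idx  = [1:k-1, k+1:5];                    % all components but k
        x(k) = (b(k) - A(k,idx)*x(idx)) / A(k,k); % Gauss-Seidel update
    end
    disp(x')                    % matches the columns of the table above
end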
A comparison (from Julien Langou slides)

Jacobi Method:

x_1^{(i+1)} = (b_1 − a_{12} x_2^{(i)} − a_{13} x_3^{(i)} − a_{14} x_4^{(i)} − . . . − a_{1n} x_n^{(i)}) / a_{11}
x_2^{(i+1)} = (b_2 − a_{21} x_1^{(i)} − a_{23} x_3^{(i)} − a_{24} x_4^{(i)} − . . . − a_{2n} x_n^{(i)}) / a_{22}
x_3^{(i+1)} = (b_3 − a_{31} x_1^{(i)} − a_{32} x_2^{(i)} − a_{34} x_4^{(i)} − . . . − a_{3n} x_n^{(i)}) / a_{33}
...
x_n^{(i+1)} = (b_n − a_{n1} x_1^{(i)} − a_{n2} x_2^{(i)} − a_{n3} x_3^{(i)} − . . . − a_{n,n−1} x_{n−1}^{(i)}) / a_{nn}

Gauss-Seidel method:

x_1^{(i+1)} = (b_1 − a_{12} x_2^{(i)} − a_{13} x_3^{(i)} − a_{14} x_4^{(i)} − . . . − a_{1n} x_n^{(i)}) / a_{11}
x_2^{(i+1)} = (b_2 − a_{21} x_1^{(i+1)} − a_{23} x_3^{(i)} − a_{24} x_4^{(i)} − . . . − a_{2n} x_n^{(i)}) / a_{22}
x_3^{(i+1)} = (b_3 − a_{31} x_1^{(i+1)} − a_{32} x_2^{(i+1)} − a_{34} x_4^{(i)} − . . . − a_{3n} x_n^{(i)}) / a_{33}
...
x_n^{(i+1)} = (b_n − a_{n1} x_1^{(i+1)} − a_{n2} x_2^{(i+1)} − a_{n3} x_3^{(i+1)} − . . . − a_{n,n−1} x_{n−1}^{(i+1)}) / a_{nn}
Gauss-Seidel is the method with better numerical properties (fewer iterations to converge). Which method is more efficient to implement on a sequential or a parallel computer? Why?
In Gauss-Seidel, the computation of x_{k+1}^{(i+1)} requires x_k^{(i+1)}, so the components must be updated one after another and parallelization (in this form) is impossible. Jacobi, in contrast, uses only values from x^{(i)}, so all n components can be updated in parallel.
Convergence
Convergence can be very slow. Consider a modified Richardson method:

x_{i+1} = x_i + τ(b − A x_i)    (4)

Convergence is linear; similarly to Richardson we get

||e_{i+1}|| ≤ ||I − τA||^{i+1} ||e_0||

but it can be very slow (if ||I − τA|| is close to 1). For example, let

A be symmetric and positive definite (SPD), and
the matrix norm in ||I − τA|| be the one induced by || · ||_2.

Then the best choice for τ, namely τ = 2 / (λ_1 + λ_N), gives

||I − τA|| = (k(A) − 1) / (k(A) + 1),

where k(A) = λ_N / λ_1 is the condition number of A (λ_1 and λ_N denote the smallest and largest eigenvalues of A).
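This identity is easy to verify numerically; a small check (mine, not from the slides) on an SPD test matrix:

% Verify ||I - tau*A||_2 = (k(A) - 1)/(k(A) + 1) for tau = 2/(lambda_1 + lambda_N).
A   = full(gallery('tridiag', 10));  % small SPD test matrix
lam = sort(eig(A));                  % eigenvalues, ascending
tau = 2/(lam(1) + lam(end));         % the optimal damping parameter
k   = lam(end)/lam(1);               % condition number k(A) = lambda_N/lambda_1
disp(norm(eye(10) - tau*A))          % induced 2-norm of the iteration matrix ...
disp((k - 1)/(k + 1))                % ... agrees with (k(A) - 1)/(k(A) + 1)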
Convergence
Note:

The rate of convergence depends on the condition number k(A).

Even for the best τ the rate of convergence

(k(A) − 1) / (k(A) + 1)

is slow (i.e. close to 1) for large k(A).

We will see that the convergence of nonstationary methods also depends on k(A), but is better; e.g. compare with CG's rate

(√k(A) − 1) / (√k(A) + 1)

For k(A) = 10^4, for example, these rates are ≈ 0.9998 vs ≈ 0.9802.
Part III
Nonstationary Iterative Methods
Nonstationary Iterative Methods
The methods involve information that changes at every iteration
e.g. inner-products with residuals or other vectors, etc.
The methods are
newer, harder to understand, but more effective
in general based on the idea of orthogonal vectors and subspace projections
examples: Krylov iterative methods
CG/PCG, GMRES, CGNE, QMR, BiCG, etc.
Krylov Iterative Methods
Krylov Iterative Methods
Krylov subspaces: these are the spaces
K_i(A, r) = span{r, A r, A^2 r, . . . , A^{i−1} r}

Krylov iterative methods find an approximation x_i to x, where

A x = b,

as a minimization on K_i(A, r). We have seen how this can be done, for example, by projection, i.e. by the Petrov-Galerkin conditions:

(A x_i, φ) = (b, φ) for all φ ∈ K_i(A, r)
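To make the projection concrete, here is a deliberately naive MATLAB sketch (mine, not from the slides) that imposes the Galerkin condition on K_i(A, b), taking x_0 = 0 so that r = b; the monomial basis used here is numerically poor, which is exactly why practical methods orthogonalize it (Arnoldi/Lanczos):

% Naive Galerkin projection onto K_i(A, b); A and b assumed given.
i = 5;                              % dimension of the Krylov subspace
V = b;                              % basis so far: [b]
for j = 2:i
    V = [V, A*V(:,end)];            % append A^{j-1}*b (monomial basis)
end
y  = (V'*A*V) \ (V'*b);             % Galerkin: V'*(A*x_i - b) = 0, x_i = V*y
xi = V*y;                           % the approximation from K_i(A, b)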
Krylov Iterative Methods
In general we
expand the Krylov subspace by a matrix-vector product, and
do a minimization/projection in it.
Various methods result from specific choices of expansion and minimization/projection.
Krylov Iterative Methods
An example: The Conjugate Gradient Method (CG)
The method is for SPD matrices
There is a way, at iteration i, to construct a new 'search direction' p_i such that

span{p_0, p_1, . . . , p_i} ≡ K_{i+1}(A, r_0) and (A p_i, p_j) = 0 for i ≠ j.

Note: A is SPD ⇒ (A p_i, p_j) ≡ (p_i, p_j)_A can be used as an inner product, i.e. p_0, . . . , p_i is an (·, ·)_A-orthogonal basis for K_{i+1}(A, r_0)

⇒ we can easily find x_{i+1} ≈ x as

x_{i+1} = x_0 + α_0 p_0 + · · · + α_i p_i   s.t.   (A x_{i+1}, p_j) = (b, p_j) for j = 0, . . . , i

Namely, because of the (·, ·)_A-orthogonality of p_0, . . . , p_i, at iteration i + 1 we have to find only α_i:

(A x_{i+1}, p_i) = (A(x_i + α_i p_i), p_i) = (b, p_i)  ⇒  α_i = (r_i, p_i) / (A p_i, p_i)

Note: x_i above can actually be replaced by any x_0 + v, v ∈ K_i(A, r_0) (Why?)
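Put together, the complete method is only a few lines; a compact sketch of mine (the slide derives only the α_i step; the update for the next search direction is the standard CG formula), for SPD A with x_0 = 0:

% Conjugate Gradient sketch for SPD A; x_0 = 0, r_0 = p_0 = b.
x  = zeros(size(b));  r = b;  p = r;
rr = r'*r;
for i = 1:numel(b)                   % <= n steps in exact arithmetic
    Ap    = A*p;
    alpha = rr/(p'*Ap);              % alpha_i = (r_i,p_i)/(Ap_i,p_i); here (r_i,p_i) = (r_i,r_i)
    x     = x + alpha*p;             % x_{i+1} = x_i + alpha_i*p_i
    r     = r - alpha*Ap;            % residual update
    rrnew = r'*r;
    if sqrt(rrnew) <= 1e-10*norm(b), break, end
    p     = r + (rrnew/rr)*p;        % next (.,.)_A-orthogonal search direction
    rr    = rrnew;
end

MATLAB's built-in pcg implements this method (with optional preconditioning).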
Learning Goals
A brief introduction to iterative methods
Motivation for iterative methods and links to previous lectures, namely
PDE discretization and sparse matrices
Optimized implementations
Stationary iterative methods
Nonstationary iterative methods and links to building blocks that we have already covered:
Projection/Minimization
Orthogonalization
Krylov methods; an example with CG; more examples to come (next lecture)
Recommended