SM1 Iterative Digest


Solution Methods: Iterative Techniques

MIT © 2011


1 Motivation (Slide 1)

Consider a standard second order finite difference discretization of

$$-\nabla^2 u = f,$$

on a regular grid, in 1, 2, and 3 dimensions:

$$A\,u = f$$

1.1 1D Finite Differences (Slide 2)

$n$ points
$n \times n$ matrix, bandwidth $b = O(1)$
Cost of Gaussian elimination: $O(b^2 n) = O(n)$

1.2 2D Finite Differences (Slide 3)

$n \times n$ points
$n^2 \times n^2$ matrix, bandwidth $b = O(n)$
Cost of Gaussian elimination: $O(b^2 n^2) = O(n^4)$



1.3 3D Finite Differences (Slide 4)

$n \times n \times n$ points
$n^3 \times n^3$ matrix, bandwidth $b = O(n^2)$
Cost of Gaussian elimination: $O(b^2 n^3) = O(n^7)$!

This means that if we halve the grid spacing, we will have 8 times ($2^3$) more unknowns, and the cost of solving the problem will increase by a factor of 128 ($2^7$). It is apparent that, at least for practical three dimensional problems, faster methods are needed.

    2 Basic Iterative Methods

    2.1 Jacobi

2.1.1 Intuitive Interpretation (Slide 5)

Instead of solving

$$-u_{xx} = f,$$

we solve

$$u_t = u_{xx} + f, \qquad (N1)$$

starting from an arbitrary $u(x, 0)$.

We expect $u(x, t \to \infty) \to u(x)$.

That is, we expect the solution of the time dependent problem to converge to the solution of the steady state problem.

Note 1 Solution by relaxation

By adding the time dependent term $u_t$, we now have a parabolic equation. If the boundary conditions are kept fixed, we expect the solution to converge to the steady state solution (i.e., $u_t = 0$). Recall that the time dependent heat transfer problem is modeled by this equation. In this case, we expect the temperature to settle to a steady state distribution, provided that the heat source, $f$, and the boundary conditions do not depend on $t$.



Slide 6

To solve

$$u_t = u_{xx} + f$$

we use an inexpensive (explicit) method, thus avoiding the need to solve a linear system of equations. For instance,

$$\frac{u^{r+1}_i - u^r_i}{\Delta t} = \frac{u^r_{i+1} - 2u^r_i + u^r_{i-1}}{h^2} + f_i, \qquad u = \{u_i\}_{i=1}^{n}, \quad f = \{f_i\}_{i=1}^{n}.$$

Here, we use the superindex $r$ to denote iteration (or time level). $u$ will denote the solution vector. An approximation to $u$ at iteration $r$ will be denoted by $u^r$.

$$u^{r+1} = u^r + \Delta t\,(f - A u^r) = (I - \Delta t\, A)\, u^r + \Delta t\, f.$$

Slide 7

Stability dictates that

$$\Delta t \le \frac{h^2}{2}.$$

Thus, we take $\Delta t$ as large as possible, i.e. $\Delta t = h^2/2$:

$$u^{r+1} = \left(I - \frac{h^2}{2}A\right) u^r + \frac{h^2}{2} f$$

$$u^{r+1}_i = \frac{1}{2}\left(u^r_{i+1} + u^r_{i-1} + h^2 f_i\right) \quad \text{for } i = 1, \dots, n.$$

2.1.2 Matrix Form (Slide 8)

Split $A$:

$$A = D - L - U$$

$D$: diagonal; $L$: (strictly) lower triangular; $U$: (strictly) upper triangular.

$A u = f$ becomes $(D - L - U)\, u = f$.

Iterative method:

$$D\, u^{r+1} = (L + U)\, u^r + f$$

Slide 9

$$u^{r+1} = D^{-1}(L + U)\, u^r + D^{-1} f = D^{-1}(D - A)\, u^r + D^{-1} f = (I - D^{-1}A)\, u^r + D^{-1} f$$



$$u^{r+1} = R_J\, u^r + D^{-1} f$$

$R_J = (I - D^{-1}A)$: Jacobi iteration matrix. For our model problem, $D^{-1}_{ii} = h^2/2$.

Note that, in order to implement this method, we would typically not form any matrices, but update the unknowns, one component at a time, according to

$$u^{r+1}_i = \frac{1}{2}\left(u^r_{i+1} + u^r_{i-1} + h^2 f_i\right) \quad \text{for } i = 1, \dots, n.$$

2.1.3 Implementation (Slide 10)

Jacobi iteration:

$$u^{r+1}_i = \frac{1}{2}\left(u^r_{i+1} + u^r_{i-1} + h^2 f_i\right)$$
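A minimal NumPy sketch of this update (the function name, the zero initial guess, and the fixed iteration count are illustrative choices, not from the notes):

```python
import numpy as np

def jacobi(f, h, num_iters):
    """Jacobi iteration for the 1D model problem with u(0) = u(1) = 0.

    f holds the source values at the n interior points; h = 1/(n+1).
    """
    n = f.size
    u = np.zeros(n)                  # initial guess u^0 = 0
    for _ in range(num_iters):
        u_new = np.empty(n)
        for i in range(n):
            left  = u[i - 1] if i > 0 else 0.0      # boundary value u(0) = 0
            right = u[i + 1] if i < n - 1 else 0.0  # boundary value u(1) = 0
            u_new[i] = 0.5 * (left + right + h**2 * f[i])
        u = u_new                    # all components updated simultaneously
    return u

# Example: -u'' = 1 with u^0 = 0, as in Section 2.4.
n = 63
u = jacobi(np.ones(n), 1.0 / (n + 1), 5000)
```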

2.2 Gauss-Seidel (Slide 11)

Assuming we are updating the unknowns according to their index $i$, at the time $u^{r+1}_i$ needs to be calculated, we already know $u^{r+1}_{i-1}$. The idea of Gauss-Seidel iteration is to always use the most recently computed value.

Gauss-Seidel iteration (use the most recent iterate):

$$u^{r+1}_i = \frac{1}{2}\left(u^r_{i+1} + u^{r+1}_{i-1} + h^2 f_i\right)$$
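The only change from the Jacobi sketch above is that the sweep writes into `u` in place, so `u[i-1]` already holds the new value; a sketch under the same assumptions:

```python
import numpy as np

def gauss_seidel(f, h, num_iters):
    """Gauss-Seidel iteration (ascending sweep) for the same model problem."""
    n = f.size
    u = np.zeros(n)
    for _ in range(num_iters):
        for i in range(n):
            left  = u[i - 1] if i > 0 else 0.0      # already the new u^{r+1}_{i-1}
            right = u[i + 1] if i < n - 1 else 0.0  # still the old u^r_{i+1}
            u[i] = 0.5 * (left + right + h**2 * f[i])
    return u
```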



2.2.1 Matrix Form (Slide 12)

Split $A$:

$$A = D - L - U$$

$D$: diagonal; $L$: (strictly) lower triangular; $U$: (strictly) upper triangular.

$A u = f$ becomes $(D - L - U)\, u = f$.

Iterative method:

$$(D - L)\, u^{r+1} = U\, u^r + f$$

The above matrix form assumes that we are updating through our unknowns in ascending order. If we were to update in reverse order, i.e., the last unknown first, the iteration matrix would become $R_{GS} = (D - U)^{-1}L$.

Note 2 Updating order

We see that, unlike in the Jacobi method, the order in which the unknowns are updated in Gauss-Seidel changes the result of the iterative procedure. One could sweep through the unknowns in ascending, descending, or alternating orders; the latter procedure is called symmetric Gauss-Seidel.

Another effective strategy, known as red-black Gauss-Seidel iteration, is to update the even unknowns first (red),

$$u^{r+1}_{2i} = \frac{1}{2}\left(u^r_{2i+1} + u^r_{2i-1} + h^2 f_{2i}\right),$$

and then the odd components (black),

$$u^{r+1}_{2i+1} = \frac{1}{2}\left(u^{r+1}_{2i+2} + u^{r+1}_{2i} + h^2 f_{2i+1}\right).$$

The red-black Gauss-Seidel iteration is popular for parallel computation, since the red (black) points only require the black (red) points, and these can be updated in any order. The method readily extends to multiple dimensions; in 2D, for instance, the red and black points form a checkerboard pattern. A sketch of a 1D red-black sweep follows.
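A sketch of one red-black sweep, vectorized with NumPy slices; padding `u` with the two boundary zeros is an implementation convenience, not part of the notes:

```python
import numpy as np

def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for the 1D model problem.

    u and f are padded with the boundary points, so the unknowns are
    u[1], ..., u[n], and u[0] = u[n+1] = 0 hold the boundary values.
    """
    # Red pass: even-indexed unknowns; all neighbours are odd (old) values.
    u[2:-1:2] = 0.5 * (u[1:-2:2] + u[3::2] + h**2 * f[2:-1:2])
    # Black pass: odd-indexed unknowns; neighbours are the fresh red values.
    u[1:-1:2] = 0.5 * (u[0:-2:2] + u[2::2] + h**2 * f[1:-1:2])
    return u
```

Because each pass reads only points of the other colour, both passes can be executed fully in parallel, which is the point made above.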

    Slide 13



$$u^{r+1} = (D - L)^{-1}\, U\, u^r + (D - L)^{-1}\, f$$

$$u^{r+1} = R_{GS}\, u^r + (D - L)^{-1} f$$

$R_{GS} = (D - L)^{-1}U$: Gauss-Seidel iteration matrix

2.3 Error Equation (Slide 14)

Let $u$ be the solution of $A u = f$. For an approximate solution $u^r$, we define

Iteration error: $e^r = u - u^r$
Residual: $r^r = f - A u^r$

Subtracting $A u^r$ from both sides of $A u = f$,

$$A u - A u^r = f - A u^r$$

ERROR EQUATION: $A\, e^r = r^r$ (N3)

We note that, whereas the iteration error is not an accessible quantity, since it requires $u$, the residual is easily computable and only requires $u^r$.

Note 3 Relationship between error and residual

We have seen that the residual is easily computable, and in some sense it measures the amount by which our approximate solution $u^r$ fails to satisfy the original problem. It is clear that if $r = 0$, we have $e = 0$. However, it is not always the case that when $r$ is small in norm, $e$ is also small in norm. We have that $A u = f$ and $A^{-1} r = e$. Taking norms,

$$\|f\|_2 \le \|A\|_2\, \|u\|_2, \qquad \|e\|_2 \le \|A^{-1}\|_2\, \|r\|_2\,.$$

Combining these two expressions, we have

$$\frac{\|e\|_2}{\|u\|_2} \le \underbrace{\|A\|_2\, \|A^{-1}\|_2}_{\text{cond}(A)}\, \frac{\|r\|_2}{\|f\|_2}\,.$$

From here we see that if our matrix is not well conditioned, i.e., cond$(A)$ is large, then small residuals can correspond to significant errors.
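A small numerical illustration of this bound; the matrix and vectors below are made up for the purpose:

```python
import numpy as np

# A small residual need not mean a small error when cond(A) is large.
A = np.diag([1.0, 1.0e-8])             # cond_2(A) = 1e8
u_exact = np.array([1.0, 1.0])
f = A @ u_exact

u_r = np.array([1.0, 0.5])             # approximate solution
r = f - A @ u_r                        # residual: norm ~ 5e-9 (tiny)
e = u_exact - u_r                      # error:    norm = 0.5 (large)
print(np.linalg.norm(r), np.linalg.norm(e), np.linalg.cond(A))
```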



2.3.1 Jacobi (Slide 15)

$$u^{r+1} = D^{-1}(L + U)\, u^r + D^{-1} f$$
$$u = D^{-1}(L + U)\, u + D^{-1} f$$

Subtracting,

$$e^{r+1} = \underbrace{D^{-1}(L + U)}_{R_J}\, e^r = R_J\, e^r$$

2.3.2 Gauss-Seidel (Slide 16)

Similarly,

$$e^{r+1} = \underbrace{(D - L)^{-1}\, U}_{R_{GS}}\, e^r = R_{GS}\, e^r$$

It is clear that the above equations satisfied by the error are not useful for practical purposes, since $e^0 = u - u^0$ will not be known before the problem is solved. We will see, however, that by studying the equation satisfied by the error we can determine useful properties regarding the convergence of our schemes.

    2.4 Examples

2.4.1 Jacobi (Slide 17)

$$-u_{xx} = 1, \quad u(0) = u(1) = 0; \qquad u^0 = 0$$

2.4.2 Gauss-Seidel (Slide 18)

$$-u_{xx} = 1, \quad u(0) = u(1) = 0; \qquad u^0 = 0$$



2.4.3 Observations (Slide 19)

The number of iterations required to obtain a certain level of convergence is $O(n^2)$.

Gauss-Seidel is about twice as fast as Jacobi.

Why?

3 Convergence Analysis (Slide 20)

$$e^r = R\, e^{r-1} = R\, R\, e^{r-2} = \dots = R^r\, e^0\,.$$

The iterative method will converge if

$$\lim_{r \to \infty} R^r = 0 \iff \rho(R) = \max |\lambda(R)| < 1\,. \qquad (N4)$$

$\rho(R)$ is the spectral radius.

The spectral radius is the radius of the smallest circle centered at the origin that contains all the eigenvalues. The above condition indicates that the necessary and sufficient condition for the scheme to converge, for arbitrary initial conditions, is that the largest eigenvalue, in modulus, lies within the unit circle.

Note 4 Condition on $\rho(R)$

The condition that

$$\lim_{r \to \infty} R^r = 0 \quad \text{if and only if} \quad \rho(R) < 1$$



can be easily shown for the case where $R$ is diagonalizable. In this case $R = X^{-1} \Lambda X$, where $\Lambda$ is the diagonal matrix of eigenvalues. We have that $R^r = X^{-1} \Lambda^r X$, and hence we require that

$$\lim_{r \to \infty} \Lambda^r = 0\,,$$

which is equivalent to $\rho(R) < 1$.
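A quick numerical illustration of this equivalence, using a made-up 2x2 iteration matrix:

```python
import numpy as np

# Made-up iteration matrix with spectral radius below one.
R = np.array([[0.5, 0.4],
              [0.1, 0.3]])
rho = max(abs(np.linalg.eigvals(R)))
print(rho)                       # ~0.62 < 1, so the iteration converges

e = np.array([1.0, -1.0])        # arbitrary initial error e^0
for _ in range(50):
    e = R @ e                    # e^r = R^r e^0
print(np.linalg.norm(e))         # essentially zero after 50 applications
```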

3.1 Theorems (Slide 21)

If the matrix $A$ is strictly diagonally dominant, then Jacobi and Gauss-Seidel iterations converge starting from an arbitrary initial condition. (N5)

$A = \{a_{ij}\}$ is strictly diagonally dominant if

$$|a_{ii}| > \sum_{\substack{j = 1 \\ j \ne i}}^{n} |a_{ij}| \quad \text{for all } i.$$

If the matrix $A$ is symmetric positive definite, then Gauss-Seidel iteration converges for any initial solution. (N6)

Note 5 Convergence of Jacobi iteration for diagonally dominant matrices

For the Jacobi iteration, $R_J = D^{-1}(L + U)$, it can be easily seen that $R_J$ has zero diagonal entries. It can be shown (Gershgorin theorem; see GVL) that for a matrix with zeros on the main diagonal, $\rho(R_J) \le \|R_J\|_\infty$. Hence

$$\rho(R_J) \le \|R_J\|_\infty = \max_{1 \le i \le n}\ \sum_{\substack{j = 1 \\ j \ne i}}^{n} \left|\frac{a_{ij}}{a_{ii}}\right| < 1\,,$$

where the last inequality follows from the diagonal dominance of $A$.

    Note 6 Convergence of Gauss-Seidel for SPD matrices

    (See Theorem 10.1.2 of GVL)



3.2 Jacobi (Slide 22)

Unfortunately, none of the above convergence theorems helps us show that Jacobi iteration will converge for our model problem. However, since for our model problem we can calculate the eigenvalues of $A$ explicitly (see previous lectures), convergence can be shown directly.

$$R_J = D^{-1}(L + U) = D^{-1}(D - A) = I - D^{-1}A = I - \frac{h^2}{2}A$$

If $A\, v^k = \lambda_k\, v^k$, then

$$R_J\, v^k = \left(I - \frac{h^2}{2}A\right) v^k = \underbrace{\left(1 - \frac{h^2}{2}\lambda_k(A)\right)}_{\lambda_k(R_J)}\, v^k$$

$$\lambda_k(R_J) = 1 - \frac{h^2}{2}\,\lambda_k(A)$$

Eigenvectors of $R_J$ $\equiv$ Eigenvectors of $A$.

Slide 23

Recall . . .

$$A = \frac{1}{h^2}\begin{pmatrix} 2 & -1 & & & 0 \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ 0 & & & -1 & 2 \end{pmatrix} \qquad n \times n,\ \text{SPD}$$

$$A\, v^k = \lambda_k\, v^k, \quad k = 1, \dots, n, \qquad h = \frac{1}{n+1}$$

Eigenvectors:

$$v^k_j = \sin(k \pi h j) = \sin\left(\frac{k \pi j}{n + 1}\right)$$

Eigenvalues:

$$\lambda_k(A) = \frac{2}{h^2}\left[1 - \cos(k \pi h)\right]$$
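These closed-form eigenpairs are easy to verify numerically; a sanity-check sketch (the size n = 8 is arbitrary):

```python
import numpy as np

# Verify A v^k = lambda_k v^k for the closed-form eigenpairs.
n = 8
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

j = np.arange(1, n + 1)
for k in range(1, n + 1):
    v_k = np.sin(k * np.pi * h * j)                   # v^k_j = sin(k*pi*h*j)
    lam_k = 2.0 / h**2 * (1.0 - np.cos(k * np.pi * h))
    assert np.allclose(A @ v_k, lam_k * v_k)
```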

Slide 24

$$\lambda_k(R_J) = 1 - \frac{h^2}{2}\lambda_k(A) = 1 - \left[1 - \cos(k \pi h)\right] = \cos\left(\frac{k \pi}{n + 1}\right) < 1, \quad k = 1, \dots, n$$

Jacobi converges for our model problem:

$$R_J\, v^k = \lambda_k(R_J)\, v^k$$

Slide 25



    Slide 26

Since $A$ has a complete set of eigenvectors $v^k$, $k = 1, \dots, n$, write

$$e^0 = \sum_{k=1}^{n} c_k\, v^k, \qquad v^k \text{: } k\text{-th eigenvector of } R_J$$

$$e^1 = R\, e^0 = \sum_{k=1}^{n} c_k\, \lambda_k(R_J)\, v^k$$

$$e^r = R^r e^0 = \sum_{k=1}^{n} c_k\, \left(\lambda_k(R_J)\right)^r\, v^k$$

3.2.1 Example (Slide 27)

We consider the situation in which the initial error only contains two modal components, corresponding to $k = 1$ and $k = 5$.

    Slide 28

The accompanying figure shows the evolution of the error at each iteration; the faster decay of the $k = 5$ mode can be readily observed.
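Since each mode decays independently as $(\lambda_k(R_J))^r$, the experiment is easy to reproduce numerically; a sketch (grid size and iteration counts are arbitrary choices):

```python
import numpy as np

# e^0 = v^1 + v^5, so each mode shrinks as |lambda_k(R_J)|^r.
n = 31
h = 1.0 / (n + 1)
lam = lambda k: np.cos(k * np.pi * h)          # lambda_k(R_J)

for r in (0, 10, 50, 200):
    print(r, abs(lam(1))**r, abs(lam(5))**r)   # the k = 5 mode decays much faster
```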



3.2.2 Convergence rate (Slide 29)

$$\lambda_k(R_J) = \cos(k \pi h), \quad k = 1, \dots, n$$

The largest $|\lambda_k(R_J)|$ over $k = 1, \dots, n$ occurs for $k = 1$.

Worst case: $e^0 = c_1\, v^1 \;\Rightarrow\; e^r = c_1\, \left(\rho(R_J)\right)^r\, v^1$

$$\frac{\|e^r\|}{\|e^0\|} = \left(\cos(\pi h)\right)^r \approx \left(1 - \frac{\pi^2 h^2}{2}\right)^r$$

Slide 30

For $h$ small ($n$ large), $\pi h \to 0$ and $\cos(\pi h)$ can be approximated as $1 - \frac{\pi^2 h^2}{2}$.

To obtain a given level of convergence, i.e. to reduce the initial error below some tolerance $\epsilon$,

$$\frac{\|e^r\|}{\|e^0\|} \approx \left(1 - \frac{\pi^2 h^2}{2}\right)^r < \epsilon \quad \Rightarrow \quad r = \frac{\log \epsilon}{\log\left(1 - \frac{\pi^2 h^2}{2}\right)} \approx \frac{2 \log(1/\epsilon)}{\pi^2 h^2} = \frac{2 \log(1/\epsilon)\,(n + 1)^2}{\pi^2}\,.$$

We have used the fact that for small $x$, $\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots$

$$r = O(n^2)$$

The number of iterations required to obtain a given level of convergence is proportional to $n^2$.
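A sketch that checks this scaling numerically by counting the iterations needed to reach a tolerance (the tolerance 1e-6 is an arbitrary choice):

```python
import numpy as np

# Count Jacobi iterations needed so that rho(R_J)^r < eps.
eps = 1e-6
for n in (16, 32, 64, 128):
    h = 1.0 / (n + 1)
    rho = np.cos(np.pi * h)                     # rho(R_J) for the model problem
    r = int(np.ceil(np.log(eps) / np.log(rho)))
    print(n, r, r / n**2)                       # r / n^2 tends to a constant
```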



3.2.3 Convergence rate in 2D (Slide 31)

This analysis can be extended to two and three dimensions in a straightforward manner.

In two dimensions, and for a uniform, $n \times n$, grid, we have

$$\lambda_{kl}(R_J) = 1 - \frac{h^2}{4}\,\lambda_{kl}(A) = \frac{1}{2}\left[\cos(k \pi h) + \cos(l \pi h)\right]$$

$$\rho(R_J) = \cos(\pi h) = \cos\left(\frac{\pi}{n + 1}\right), \qquad h = \frac{1}{n+1}$$

Therefore, $r = O(n^2)$.

It is important to note that in 1D, 2D, and also in 3D, $r = O(n^2)$. This, combined with the fact that the number of unknowns in 1, 2, and 3D is $n$, $n^2$, and $n^3$, makes iterative methods potentially attractive for 2D and, especially, 3D problems.

3.3 Gauss-Seidel (Slide 32)

The convergence analysis for the Gauss-Seidel method is analogous to that of the Jacobi method. The expressions for the eigenvectors and eigenvalues for the model problem can also be found in closed form, although the procedure is somewhat more involved.

$$R_{GS} = (D - L)^{-1}\, U$$

It can be shown (try it!) that

$$\lambda_k(R_{GS}) = \cos^2(k \pi h) = \left[\lambda_k(R_J)\right]^2 < 1$$

Gauss-Seidel converges for our problem. The eigenvalues are always positive.

But: Eigenvectors of $R_{GS}$ $\ne$ Eigenvectors of $A$.

The eigenvectors $v^k = \{v^k_j\}_{j=1}^{n}$ of $R_{GS}$ are

$$v^k_j = \left[\lambda_k(R_{GS})\right]^{j/2}\, \sin(k \pi h j)\,.$$

These eigenvectors, although not orthogonal, still form a complete basis and can be used to represent the error components.



    Slide 33

3.3.1 Convergence rate (Slide 34)

To obtain a given level of convergence, i.e. to reduce the initial error below some tolerance $\epsilon$,

$$\frac{\|e^r\|}{\|e^0\|} \approx \left(1 - \frac{\pi^2 h^2}{2}\right)^{2r} < \epsilon \quad \Rightarrow \quad r = \frac{\log \epsilon}{2 \log\left(1 - \frac{\pi^2 h^2}{2}\right)} \approx \frac{\log(1/\epsilon)\,(n + 1)^2}{\pi^2}$$

$$r = O(n^2)$$

    Same rate as Jacobi, but requires only half the number of iterations.

4 Comparative cost (Slide 35)

         Matrix size    Iterative                  Gaussian elimination
  1D     n x n          O(n^2 * n)   = O(n^3)      O(n)
  2D     n^2 x n^2      O(n^2 * n^2) = O(n^4)      O(n^4)
  3D     n^3 x n^3      O(n^2 * n^3) = O(n^5)      O(n^7)

In the iterative cost, the first factor ($n^2$) is the number of iterations and the second is the cost per iteration (shown in red and green, respectively, on the original slide).

Only 3D problems give iterative methods some cost advantage over Gaussian elimination.



    5 Over/Under Relaxation

5.0.2 Main Idea (Slide 36)

Typically,

$$u^{r+1} = R\, u^r + \tilde f, \qquad \tilde f = \begin{cases} D^{-1} f & \text{Jacobi} \\ (D - L)^{-1} f & \text{Gauss-Seidel} \end{cases}$$

Can we extrapolate the changes?

$$u^{r+1} = \omega\left(R\, u^r + \tilde f\right) + (1 - \omega)\, u^r = \underbrace{\left[\omega R + (1 - \omega)\, I\right]}_{R_\omega}\, u^r + \omega \tilde f, \qquad \omega > 0$$

5.1 How large can we choose $\omega$? (Slide 37)

$$\lambda_k(R_\omega) = \omega\, \lambda_k(R) + (1 - \omega)$$

Jacobi: $\lambda_k(R_J) = \cos k \pi h$
Gauss-Seidel: $\lambda_k(R_{GS}) = \cos^2 k \pi h$

Requiring $\rho(R_\omega) < 1$ gives:

$0 < \omega_J \le 1$: Jacobi can only be under-relaxed.
$0 < \omega_{GS} < 2$: Gauss-Seidel can be over-relaxed. (N7)
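A small sketch that scans $\rho(R_\omega) = \max_k |\omega \lambda_k(R) + (1 - \omega)|$ for both methods on the model problem (the sampled $\omega$ values are arbitrary):

```python
import numpy as np

# Scan rho(R_omega) for relaxed Jacobi and Gauss-Seidel on the model problem.
n = 31
h = 1.0 / (n + 1)
k = np.arange(1, n + 1)
lam_J  = np.cos(k * np.pi * h)        # Jacobi eigenvalues, some close to -1
lam_GS = np.cos(k * np.pi * h)**2     # Gauss-Seidel eigenvalues, all in [0, 1)

def rho(lam, omega):
    return np.max(np.abs(omega * lam + (1.0 - omega)))

for omega in (0.8, 1.0, 1.2, 1.5):
    print(omega, rho(lam_J, omega), rho(lam_GS, omega))
# Relaxed Jacobi exceeds 1 as soon as omega > 1 (eigenvalues near -1 are
# pushed past the unit circle); relaxed GS stays below 1 for omega up to 2.
```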

Note 7 Successive Over-Relaxation (SOR)

Over-relaxed Gauss-Seidel is about a factor of two faster than standard GS, but still requires $O(n^2)$ iterations. The idea of over-relaxation can be used to improve convergence further. If, instead of extrapolating the changes for all the unknowns at the same time, we extrapolate the change of each unknown as soon as it has been computed, and use the updated value in subsequent unknown updates, we obtain what is called successive over-relaxation (SOR). In component form, SOR is written as

$$u^{r+1}_i = \omega\left[\frac{1}{2}\left(u^r_{i+1} + u^{r+1}_{i-1}\right) + \frac{h^2}{2} f_i\right] + (1 - \omega)\, u^r_i\,,$$

with $1 < \omega < 2$ and typically $\omega \approx 1.5$. In matrix form, SOR can be written as

$$u^{r+1} = (D - \omega L)^{-1}\left[\left((1 - \omega)\, D + \omega U\right) u^r + \omega f\right] = R_{SOR}\, u^r + \omega\, (D - \omega L)^{-1} f\,.$$

It can be shown that the spectral radius of $R_{SOR}$, for our model problem, is $1 - O(h)$, and hence the number of iterations scales like $n$ rather than $n^2$. SOR was very popular 30 years ago. Nowadays, we can do better.
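A component-form SOR sweep following the update above; a sketch only, with omega = 1.5 as the typical value quoted in the note rather than the optimal one:

```python
import numpy as np

def sor(f, h, omega=1.5, num_iters=100):
    """SOR for the 1D model problem -u'' = f with u(0) = u(1) = 0."""
    n = f.size
    u = np.zeros(n)
    for _ in range(num_iters):
        for i in range(n):
            left  = u[i - 1] if i > 0 else 0.0        # freshest value, as in GS
            right = u[i + 1] if i < n - 1 else 0.0
            gs = 0.5 * (left + right + h**2 * f[i])   # Gauss-Seidel value
            u[i] = omega * gs + (1.0 - omega) * u[i]  # extrapolate the change
    return u
```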



    REFERENCES

[GVL] G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins University Press.
