The Conjugate Gradient Method
Tom Lyche
University of Oslo, Norway
The Conjugate Gradient Method – p. 1/23
Plan for the day
The method
Algorithm
Implementation of test problems
Complexity
Derivation of the method
Convergence
The conjugate gradient method
Restricted to positive definite systems: Ax = b, A ∈ R^{n×n} positive definite.
Generate {x_k} by x_{k+1} = x_k + α_k p_k, where
p_k is a vector, the search direction,
α_k is a scalar determining the step length.
In general we find the exact solution in at most n iterations.
For many problems the error becomes small after a few iterations.
Both a direct method and an iterative method.
The rate of convergence depends on the square root of the condition number.
The name of the game
Conjugate means orthogonal; orthogonal gradients. But why gradients?
Consider minimizing the quadratic function Q : R^n → R given by Q(x) := (1/2) x^T A x − x^T b.
The minimum is obtained by setting the gradient equal to zero:
∇Q(x) = Ax − b = 0, which is the linear system Ax = b.
Find the solution by solving r = b − Ax = 0.
The sequence {x_k} is such that {r_k} := {b − A x_k} is orthogonal with respect to the usual inner product in R^n.
The search directions are also orthogonal, but with respect to a different inner product.
The algorithm
Start with some x_0. Set p_0 = r_0 = b − A x_0.
For k = 0, 1, 2, . . .
x_{k+1} = x_k + α_k p_k,  α_k = r_k^T r_k / (p_k^T A p_k)
r_{k+1} = b − A x_{k+1} = r_k − α_k A p_k
p_{k+1} = r_{k+1} + β_k p_k,  β_k = r_{k+1}^T r_{k+1} / (r_k^T r_k)
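These recurrences can be sketched in NumPy (a hedged translation, not the author's MATLAB code given later in the slides; the function name `conjugate_gradient` and the relative-residual stopping test are my choices):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-12, itmax=1000):
    """Conjugate gradient for a symmetric positive definite A.

    Follows the slide's recurrences: alpha_k = r_k^T r_k / (p_k^T A p_k),
    beta_k = r_{k+1}^T r_{k+1} / (r_k^T r_k).  Returns (x, iterations used).
    """
    x = np.zeros_like(b, dtype=float) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - A @ x
    p = r.copy()
    rho = r @ r                      # rho = r_k^T r_k
    rho0 = rho
    for k in range(itmax):
        if np.sqrt(rho / rho0) <= tol:
            return x, k
        t = A @ p
        alpha = rho / (p @ t)        # step length alpha_k
        x = x + alpha * p
        r = r - alpha * t            # r_{k+1} = r_k - alpha_k A p_k
        rho, rhos = r @ r, rho
        p = r + (rho / rhos) * p     # p_{k+1} = r_{k+1} + beta_k p_k
    return x, itmax

# The 2x2 system used as an example on the next slide:
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0])
x, it = conjugate_gradient(A, b)
```

On this small system the method terminates after the two steps worked out on the example slide.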
Example
A = [ 2 −1; −1 2 ],  b = [1, 0]^T, so the system is
[ 2 −1; −1 2 ] [ x_1; x_2 ] = [ 1; 0 ].
Start with x_0 = 0. Then p_0 = r_0 = b = [1, 0]^T.
α_0 = r_0^T r_0 / (p_0^T A p_0) = 1/2,  x_1 = x_0 + α_0 p_0 = [0; 0] + (1/2)[1; 0] = [1/2; 0]
r_1 = r_0 − α_0 A p_0 = [1; 0] − (1/2)[2; −1] = [0; 1/2],  r_1^T r_0 = 0
β_0 = r_1^T r_1 / (r_0^T r_0) = 1/4,  p_1 = r_1 + β_0 p_0 = [0; 1/2] + (1/4)[1; 0] = [1/4; 1/2]
α_1 = r_1^T r_1 / (p_1^T A p_1) = 2/3,
x_2 = x_1 + α_1 p_1 = [1/2; 0] + (2/3)[1/4; 1/2] = [2/3; 1/3]
r_2 = 0, so x_2 is the exact solution.
Exact method and iterative method
Orthogonality of the residuals implies that x_m is equal to the solution x of Ax = b for some m ≤ n.
For if x_k ≠ x for all k = 0, 1, . . . , n − 1, then r_k ≠ 0 for k = 0, 1, . . . , n − 1, so that r_0, . . . , r_{n−1} is an orthogonal basis for R^n. But then r_n ∈ R^n is orthogonal to all vectors in R^n, so r_n = 0 and hence x_n = x.
So the conjugate gradient method finds the exact solution in at most n iterations.
The convergence analysis shows that ‖x − x_k‖_A typically becomes small quite rapidly, and we can stop the iteration with k much smaller than n.
It is this rapid convergence which makes the method interesting, and in practice, an iterative method.
Conjugate Gradient Algorithm
[Conjugate Gradient Iteration] The positive definite linear system Ax = b is solved by the conjugate gradient method. x is a starting vector for the iteration. The iteration is stopped when ||r_k||_2 / ||r_0||_2 ≤ tol or k > itmax. itm is the number of iterations used.

    function [x, itm] = cg(A, b, x, tol, itmax)
    r = b - A*x; p = r; rho = r'*r;
    rho0 = rho;
    for k = 0:itmax
        if sqrt(rho/rho0) <= tol
            itm = k; return
        end
        t = A*p; a = rho/(p'*t);
        x = x + a*p; r = r - a*t;
        rhos = rho; rho = r'*r;
        p = r + (rho/rhos)*p;
    end
    itm = itmax + 1;
A family of test problems
We can test the methods on the Kronecker sum matrix

A = C1 ⊗ I + I ⊗ C2 =
    [ C1                ]   [ cI  bI              ]
    [    C1             ]   [ bI  cI  bI          ]
    [       ...         ] + [     ...  ...  ...   ]
    [           C1      ]   [          bI  cI  bI ]
    [              C1   ]   [              bI  cI ]

where C1 = tridiag_m(a, c, a) and C2 = tridiag_m(b, c, b).
A is positive definite if c > 0 and c ≥ |a| + |b|.
m = 3, n = 9
A =
2c a 0 b 0 0 0 0 0
a 2c a 0 b 0 0 0 0
0 a 2c 0 0 b 0 0 0
b 0 0 2c a 0 b 0 0
0 b 0 a 2c a 0 b 0
0 0 b 0 a 2c 0 0 b
0 0 0 b 0 0 2c a 0
0 0 0 0 b 0 a 2c a
0 0 0 0 0 b 0 a 2c
b = a = −1, c = 2: Poisson matrix
b = a = 1/9, c = 5/18: Averaging matrix
Averaging problem
λ_{jk} = 2c + 2a cos(jπh) + 2b cos(kπh),  j, k = 1, 2, . . . , m, where h = 1/(m + 1).
a = b = 1/9, c = 5/18
λ_max = 5/9 + (4/9) cos(πh),  λ_min = 5/9 − (4/9) cos(πh)
cond_2(A) = λ_max / λ_min = (5 + 4 cos(πh)) / (5 − 4 cos(πh)) ≤ 9.
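The eigenvalue formula and the bound cond_2(A) ≤ 9 can be checked numerically (a sketch; the matrix is assembled as on the test-problem slide, and the kron ordering is my assumption):

```python
import numpy as np

m = 10
h = 1.0 / (m + 1)
a = b = 1.0 / 9.0
c = 5.0 / 18.0

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

I = np.eye(m)
A = np.kron(I, tridiag(a, c, m)) + np.kron(tridiag(b, c, m), I)

# lambda_{jk} = 2c + 2a cos(j pi h) + 2b cos(k pi h), j, k = 1..m
j = np.arange(1, m + 1)
lam = (2 * c + 2 * a * np.cos(j[:, None] * np.pi * h)
             + 2 * b * np.cos(j[None, :] * np.pi * h)).ravel()
cond = lam.max() / lam.min()
```

The sorted formula values should agree with the eigenvalues of the assembled matrix to rounding error.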
2D formulation for test problems
V = vec(x), R = vec(r), P = vec(p)
Ax = b ⟺ DV + VE = h²F,
D = tridiag(a, c, a) ∈ R^{m×m},  E = tridiag(b, c, b) ∈ R^{m×m}
vec(Ap) = DP + PE
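The identity vec(Ap) = DP + PE is easy to verify numerically (a sketch; with numpy's `kron` and column-major `order='F'` vec, A = kron(E, I) + kron(I, D) reproduces the slide's operator, and that ordering convention is my assumption):

```python
import numpy as np

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

m = 6
a, b, c = -1.0, -1.0, 2.0          # Poisson values; any a, b, c work here
D = tridiag(a, c, m)               # D = tridiag(a, c, a)
E = tridiag(b, c, m)               # E = tridiag(b, c, b)
A = np.kron(E, np.eye(m)) + np.kron(np.eye(m), D)

rng = np.random.default_rng(0)
P = rng.standard_normal((m, m))    # an arbitrary "direction" in matrix form
lhs = A @ P.ravel(order="F")       # vec(Ap), column-major vec
rhs = (D @ P + P @ E).ravel(order="F")
```

This is the identity that lets the CG iteration work on m-by-m arrays without ever forming the n-by-n matrix A.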
Testing
[Testing Conjugate Gradient] A = trid(a, c, a, m) ⊗ I_m + I_m ⊗ trid(b, c, b, m) ∈ R^{m²×m²}

    function [V, it] = cgtest(m, a, b, c, tol, itmax)
    h = 1/(m+1); R = h*h*ones(m);
    D = sparse(tridiagonal(a, c, a, m)); E = sparse(tridiagonal(b, c, b, m));
    V = zeros(m, m); P = R; rho = sum(sum(R.*R)); rho0 = rho;
    for k = 1:itmax
        if sqrt(rho/rho0) <= tol
            it = k; return
        end
        T = D*P + P*E; alpha = rho/sum(sum(P.*T)); V = V + alpha*P; R = R - alpha*T;
        rhos = rho; rho = sum(sum(R.*R)); P = R + (rho/rhos)*P;
    end
    it = itmax + 1;
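A Python rendering of this test (a hedged sketch using dense tridiagonal matrices rather than MATLAB sparse ones; for the averaging problem the iteration count should match the table on the next slide, around 22 for m = 50):

```python
import numpy as np

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

def cgtest(m, a, b, c, tol, itmax):
    """2D conjugate gradient: A is never formed, vec(Ap) is D@P + P@E."""
    h = 1.0 / (m + 1)
    R = h * h * np.ones((m, m))      # r_0 = b, since x_0 = 0
    D = tridiag(a, c, m)
    E = tridiag(b, c, m)
    V = np.zeros((m, m))
    P = R.copy()
    rho = np.sum(R * R)
    rho0 = rho
    for k in range(1, itmax + 1):
        if np.sqrt(rho / rho0) <= tol:
            return V, k
        T = D @ P + P @ E
        alpha = rho / np.sum(P * T)
        V = V + alpha * P
        R = R - alpha * T
        rho, rhos = np.sum(R * R), rho
        P = R + (rho / rhos) * P
    return V, itmax + 1

# Averaging problem, n = 2500 unknowns:
V, it = cgtest(50, 1/9, 1/9, 5/18, 1e-8, 200)
```

The returned V satisfies DV + VE = h²F to within the tolerance.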
The Averaging Problem

n   2500   10000   40000   1000000   4000000
K     22      22      21        21        20

Table 1: The number of iterations K for the averaging problem on a √n × √n grid. x_0 = 0, tol = 10⁻⁸.

Both the condition number and the required number of iterations are independent of the size of the problem.
The convergence is quite rapid.
Poisson Problem
λ_{jk} = 2c + 2a cos(jπh) + 2b cos(kπh),  j, k = 1, 2, . . . , m.
a = b = −1, c = 2
λ_max = 4 + 4 cos(πh),  λ_min = 4 − 4 cos(πh)
cond_2(A) = λ_max / λ_min = (1 + cos(πh)) / (1 − cos(πh)) = cond_2(T)².
cond_2(A) = O(n).
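The O(n) growth can be seen directly from the formula (a small check; since cos(πh) ≈ 1 − (πh)²/2, cond_2(A) ≈ 4(m+1)²/π², which roughly quadruples when m doubles):

```python
import numpy as np

conds = {}
for m in (10, 20, 40, 80):
    h = 1.0 / (m + 1)
    conds[m] = (1 + np.cos(np.pi * h)) / (1 - np.cos(np.pi * h))

# Ratios between successive grids approach 4, i.e. cond2(A) = O(m^2) = O(n).
ratios = [conds[20] / conds[10], conds[40] / conds[20], conds[80] / conds[40]]
```

Contrast this with the averaging problem, whose condition number stays below 9 for all m.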
The Poisson problem

n      2500   10000   40000   160000
K       140     294     587     1168
K/√n   1.86    1.87    1.86    1.85

Using CG in the form of Algorithm 8 with tol = 10⁻⁸ and x_0 = 0 we list K, the required number of iterations, and K/√n.
The results show that K is much smaller than n and appears to be proportional to √n.
This is the same speed as for SOR, and we don't have to estimate any acceleration parameter!
√n is essentially the square root of the condition number of A.
Complexity
The work involved in each iteration is
1. one matrix-times-vector product (t = Ap),
2. two inner products (p^T t and r^T r),
3. three vector-plus-scalar-times-vector operations (x = x + ap, r = r − at and p = r + (rho/rhos)p).
The dominating part of the computation is statement 1. Note that for our test problems A has only O(5n) nonzero elements. Therefore, taking advantage of the sparseness of A, we can compute t in O(n) flops. With such an implementation the total number of flops in one iteration is O(n).
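A small count confirms the sparsity claim for the Poisson test matrix (dense assembly here is only for counting; a practical code would store A sparsely or, as in cgtest, never form A at all):

```python
import numpy as np

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

m = 20
n = m * m
I = np.eye(m)
C = tridiag(-1.0, 2.0, m)
A = np.kron(I, C) + np.kron(C, I)       # Poisson matrix, n = 400

nnz = np.count_nonzero(A)               # 5n minus boundary corrections
max_row = np.count_nonzero(A, axis=1).max()
```

Each row has at most 5 nonzeros (the 5-point stencil), so a sparse matrix-vector product costs O(n) flops.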
More Complexity
How many flops do we need to solve the test problems by the conjugate gradient method to within a given tolerance?
Averaging problem: O(n) flops. Optimal for a problem with n unknowns.
Same as SOR and better than the fast method based on FFT.
Discrete Poisson problem: O(n^{3/2}) flops,
same as SOR and the fast method.
Cholesky Algorithm: O(n²) flops both for averaging and Poisson.
Analysis and Derivation of the Method
Theorem 3 (Orthogonal Projection). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, ⟨·,·⟩). To each x ∈ V there is a unique vector p ∈ S such that
⟨x − p, s⟩ = 0, for all s ∈ S. (1)
(Figure: x, its orthogonal projection p = P_S x onto S, and the residual x − p orthogonal to S.)
Best Approximation
Theorem 4 (Best Approximation). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, ⟨·,·⟩). Let x ∈ V and p ∈ S. The following statements are equivalent:
1. ⟨x − p, s⟩ = 0, for all s ∈ S.
2. ‖x − s‖ > ‖x − p‖ for all s ∈ S with s ≠ p.
If (v_1, . . . , v_k) is an orthogonal basis for S, then
p = Σ_{i=1}^{k} ( ⟨x, v_i⟩ / ⟨v_i, v_i⟩ ) v_i. (2)
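Formula (2) is easy to check numerically (a sketch with a random subspace; QR is just a convenient way to get an orthogonal basis):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
# Orthonormal basis v_1, ..., v_k of a random k-dimensional subspace S:
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
x = rng.standard_normal(n)

# p = sum_i <x, v_i> / <v_i, v_i> v_i   (formula (2))
p = sum((x @ V[:, i]) / (V[:, i] @ V[:, i]) * V[:, i] for i in range(k))

# Any other s in S is a worse approximation (statement 2 of Theorem 4):
s = V @ rng.standard_normal(k)
```

The residual x − p is orthogonal to every basis vector, and any other element of S lies farther from x.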
Derivation of CG
Ax = b, A ∈ R^{n×n} positive definite, x, b ∈ R^n
(x, y) := x^T y,  x, y ∈ R^n
⟨x, y⟩ := x^T A y = (x, Ay) = (Ax, y)
‖x‖_A = √(x^T A x)
W_0 = {0}, W_1 = span{b}, W_2 = span{b, Ab}, W_k = span{b, Ab, A²b, . . . , A^{k−1}b}
W_0 ⊂ W_1 ⊂ W_2 ⊂ · · · ⊂ W_k ⊂ · · ·
dim(W_k) ≤ k,  w ∈ W_k ⇒ Aw ∈ W_{k+1}
x_k ∈ W_k,  ⟨x_k − x, w⟩ = 0 for all w ∈ W_k
p_0 = r_0 := b,  p_j = r_j − Σ_{i=0}^{j−1} ( ⟨r_j, p_i⟩ / ⟨p_i, p_i⟩ ) p_i,  j = 1, . . . , k.
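Running the recurrences confirms the two orthogonality claims: the residuals r_k are orthogonal in the usual inner product, and the directions p_k are orthogonal (conjugate) in ⟨x, y⟩ = x^T A y (a sketch on a random positive definite system):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # random symmetric positive definite matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b.copy()
p = r.copy()
rho = r @ r
Rs, Ps = [r.copy()], [p.copy()]    # collect r_0..r_{n-1}, p_0..p_{n-1}
for k in range(n):
    t = A @ p
    alpha = rho / (p @ t)
    x = x + alpha * p
    r = r - alpha * t
    rho, rhos = r @ r, rho
    p = r + (rho / rhos) * p
    if k < n - 1:
        Rs.append(r.copy())
        Ps.append(p.copy())

R = np.column_stack(Rs)            # residuals as columns
P = np.column_stack(Ps)
G = R.T @ R                        # should be diagonal: r_i^T r_j = 0, i != j
H = P.T @ A @ P                    # should be diagonal: p_i^T A p_j = 0, i != j
```

After n steps the iterate is the exact solution, in agreement with the finite-termination property.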
Convergence
Theorem 5. Suppose we apply the conjugate gradient method to a positive definite system Ax = b. Then the A-norms of the errors satisfy

‖x − x_k‖_A / ‖x − x_0‖_A ≤ 2 ( (√κ − 1) / (√κ + 1) )^k,  for k ≥ 0,

where κ = cond_2(A) = λ_max/λ_min is the 2-norm condition number of A.
This theorem explains what we observed in the previous section, namely that the number of iterations is linked to √κ, the square root of the condition number of A. Indeed, the following corollary gives an upper bound for the number of iterations in terms of √κ.
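Theorem 5 can be checked numerically on the Poisson test matrix (a sketch; the exact solution comes from a direct solve so the A-norm errors can be measured at every step):

```python
import numpy as np

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

m = 10
n = m * m
C = tridiag(-1.0, 2.0, m)
A = np.kron(np.eye(m), C) + np.kron(C, np.eye(m))   # Poisson matrix, n = 100
b = np.ones(n)
x_exact = np.linalg.solve(A, b)

lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]
q = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

def a_norm(e):
    return np.sqrt(e @ A @ e)

x = np.zeros(n)
r = b.copy(); p = r.copy(); rho = r @ r
e0 = a_norm(x_exact - x)
bound_holds = True
for k in range(1, 31):
    t = A @ p
    alpha = rho / (p @ t)
    x = x + alpha * p
    r = r - alpha * t
    rho, rhos = r @ r, rho
    p = r + (rho / rhos) * p
    # ||x - x_k||_A <= 2 ((sqrt(kappa)-1)/(sqrt(kappa)+1))^k ||x - x_0||_A
    bound_holds = bound_holds and a_norm(x_exact - x) <= 2 * q**k * e0 + 1e-12
```

In practice the observed errors decay considerably faster than the bound; the theorem only guarantees the geometric rate.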