The Conjugate Gradient Method
Tom Lyche
University of Oslo, Norway
The Conjugate Gradient Method – p. 1/23
Plan for the day
The method
Algorithm
Implementation of test problems
Complexity
Derivation of the method
Convergence
The conjugate gradient method
Restricted to positive definite systems: Ax = b, A ∈ R^{n×n} positive definite.
Generate {x_k} by x_{k+1} = x_k + α_k p_k, where
p_k is a vector, the search direction,
α_k is a scalar determining the step length.
In general we find the exact solution in at most n iterations.
For many problems the error becomes small after a few iterations.
Both a direct method and an iterative method.
The rate of convergence depends on the square root of the condition number.
The name of the game
Conjugate means orthogonal; orthogonal gradients. But why gradients?
Consider minimizing the quadratic function Q : R^n → R given by Q(x) := (1/2) x^T A x − x^T b.
The minimum is obtained by setting the gradient equal to zero:
∇Q(x) = Ax − b = 0, which is the linear system Ax = b.
Find the solution by solving r = b − Ax = 0.
The sequence {x_k} is such that {r_k} := {b − A x_k} is orthogonal with respect to the usual inner product in R^n.
The search directions are also orthogonal, but with respect to a different inner product.
The algorithm
Start with some x_0. Set p_0 = r_0 = b − A x_0.
For k = 0, 1, 2, . . .
x_{k+1} = x_k + α_k p_k,  α_k = r_k^T r_k / (p_k^T A p_k)
r_{k+1} = b − A x_{k+1} = r_k − α_k A p_k
p_{k+1} = r_{k+1} + β_k p_k,  β_k = r_{k+1}^T r_{k+1} / (r_k^T r_k)
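These recurrences can be sketched in NumPy (a hedged translation, not the author's MATLAB code given later in the slides; the function name `conjugate_gradient` and the relative-residual stopping test are my choices):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-12, itmax=1000):
    """Conjugate gradient for a symmetric positive definite A.

    Follows the slide's recurrences: alpha_k = r_k^T r_k / (p_k^T A p_k),
    beta_k = r_{k+1}^T r_{k+1} / (r_k^T r_k).  Returns (x, iterations used).
    """
    x = np.zeros_like(b, dtype=float) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - A @ x
    p = r.copy()
    rho = r @ r                      # rho = r_k^T r_k
    rho0 = rho
    for k in range(itmax):
        if np.sqrt(rho / rho0) <= tol:
            return x, k
        t = A @ p
        alpha = rho / (p @ t)        # step length alpha_k
        x = x + alpha * p
        r = r - alpha * t            # r_{k+1} = r_k - alpha_k A p_k
        rho, rhos = r @ r, rho
        p = r + (rho / rhos) * p     # p_{k+1} = r_{k+1} + beta_k p_k
    return x, itmax

# The 2x2 system used as an example on the next slide:
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0])
x, it = conjugate_gradient(A, b)
```

On this small system the method terminates after the two steps worked out on the example slide.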
Example
A = [ 2 −1; −1 2 ],  b = [1, 0]^T, so the system is
[ 2 −1; −1 2 ] [ x_1; x_2 ] = [ 1; 0 ].
Start with x_0 = 0. Then p_0 = r_0 = b = [1, 0]^T.
α_0 = r_0^T r_0 / (p_0^T A p_0) = 1/2,  x_1 = x_0 + α_0 p_0 = [0; 0] + (1/2)[1; 0] = [1/2; 0]
r_1 = r_0 − α_0 A p_0 = [1; 0] − (1/2)[2; −1] = [0; 1/2],  r_1^T r_0 = 0
β_0 = r_1^T r_1 / (r_0^T r_0) = 1/4,  p_1 = r_1 + β_0 p_0 = [0; 1/2] + (1/4)[1; 0] = [1/4; 1/2]
α_1 = r_1^T r_1 / (p_1^T A p_1) = 2/3,
x_2 = x_1 + α_1 p_1 = [1/2; 0] + (2/3)[1/4; 1/2] = [2/3; 1/3]
r_2 = 0, so x_2 is the exact solution.
Exact method and iterative method
Orthogonality of the residuals implies that x_m is equal to the solution x of Ax = b for some m ≤ n.
For if x_k ≠ x for all k = 0, 1, . . . , n − 1, then r_k ≠ 0 for k = 0, 1, . . . , n − 1, so that r_0, . . . , r_{n−1} is an orthogonal basis for R^n. But then r_n ∈ R^n is orthogonal to all vectors in R^n, so r_n = 0 and hence x_n = x.
So the conjugate gradient method finds the exact solution in at most n iterations.
The convergence analysis shows that ‖x − x_k‖_A typically becomes small quite rapidly, and we can stop the iteration with k much smaller than n.
It is this rapid convergence which makes the method interesting, and in practice, an iterative method.
Conjugate Gradient Algorithm
[Conjugate Gradient Iteration] The positive definite linear system Ax = b is solved by the conjugate gradient method. x is a starting vector for the iteration. The iteration is stopped when ||r_k||_2 / ||r_0||_2 ≤ tol or k > itmax. itm is the number of iterations used.

    function [x, itm] = cg(A, b, x, tol, itmax)
    r = b - A*x; p = r; rho = r'*r;
    rho0 = rho;
    for k = 0:itmax
        if sqrt(rho/rho0) <= tol
            itm = k; return
        end
        t = A*p; a = rho/(p'*t);
        x = x + a*p; r = r - a*t;
        rhos = rho; rho = r'*r;
        p = r + (rho/rhos)*p;
    end
    itm = itmax + 1;
A family of test problems
We can test the methods on the Kronecker sum matrix

A = C1 ⊗ I + I ⊗ C2 =
    [ C1                ]   [ cI  bI              ]
    [    C1             ]   [ bI  cI  bI          ]
    [       ...         ] + [     ...  ...  ...   ]
    [           C1      ]   [          bI  cI  bI ]
    [              C1   ]   [              bI  cI ]

where C1 = tridiag_m(a, c, a) and C2 = tridiag_m(b, c, b).
A is positive definite if c > 0 and c ≥ |a| + |b|.
m = 3, n = 9
A =
2c a 0 b 0 0 0 0 0
a 2c a 0 b 0 0 0 0
0 a 2c 0 0 b 0 0 0
b 0 0 2c a 0 b 0 0
0 b 0 a 2c a 0 b 0
0 0 b 0 a 2c 0 0 b
0 0 0 b 0 0 2c a 0
0 0 0 0 b 0 a 2c a
0 0 0 0 0 b 0 a 2c
b = a = −1, c = 2: Poisson matrix
b = a = 1/9, c = 5/18: Averaging matrix
Averaging problem
λ_{jk} = 2c + 2a cos(jπh) + 2b cos(kπh),  j, k = 1, 2, . . . , m, where h = 1/(m + 1).
a = b = 1/9, c = 5/18
λ_max = 5/9 + (4/9) cos(πh),  λ_min = 5/9 − (4/9) cos(πh)
cond_2(A) = λ_max / λ_min = (5 + 4 cos(πh)) / (5 − 4 cos(πh)) ≤ 9.
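The eigenvalue formula and the bound cond_2(A) ≤ 9 can be checked numerically (a sketch; the matrix is assembled as on the test-problem slide, and the kron ordering is my assumption):

```python
import numpy as np

m = 10
h = 1.0 / (m + 1)
a = b = 1.0 / 9.0
c = 5.0 / 18.0

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

I = np.eye(m)
A = np.kron(I, tridiag(a, c, m)) + np.kron(tridiag(b, c, m), I)

# lambda_{jk} = 2c + 2a cos(j pi h) + 2b cos(k pi h), j, k = 1..m
j = np.arange(1, m + 1)
lam = (2 * c + 2 * a * np.cos(j[:, None] * np.pi * h)
             + 2 * b * np.cos(j[None, :] * np.pi * h)).ravel()
cond = lam.max() / lam.min()
```

The sorted formula values should agree with the eigenvalues of the assembled matrix to rounding error.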
2D formulation for test problems
V = vec(x), R = vec(r), P = vec(p)
Ax = b ⟺ DV + VE = h²F,
D = tridiag(a, c, a) ∈ R^{m×m},  E = tridiag(b, c, b) ∈ R^{m×m}
vec(Ap) = DP + PE
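The identity vec(Ap) = DP + PE is easy to verify numerically (a sketch; with numpy's `kron` and column-major `order='F'` vec, A = kron(E, I) + kron(I, D) reproduces the slide's operator, and that ordering convention is my assumption):

```python
import numpy as np

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

m = 6
a, b, c = -1.0, -1.0, 2.0          # Poisson values; any a, b, c work here
D = tridiag(a, c, m)               # D = tridiag(a, c, a)
E = tridiag(b, c, m)               # E = tridiag(b, c, b)
A = np.kron(E, np.eye(m)) + np.kron(np.eye(m), D)

rng = np.random.default_rng(0)
P = rng.standard_normal((m, m))    # an arbitrary "direction" in matrix form
lhs = A @ P.ravel(order="F")       # vec(Ap), column-major vec
rhs = (D @ P + P @ E).ravel(order="F")
```

This is the identity that lets the CG iteration work on m-by-m arrays without ever forming the n-by-n matrix A.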
Testing
[Testing Conjugate Gradient] A = trid(a, c, a, m) ⊗ I_m + I_m ⊗ trid(b, c, b, m) ∈ R^{m²×m²}

    function [V, it] = cgtest(m, a, b, c, tol, itmax)
    h = 1/(m+1); R = h*h*ones(m);
    D = sparse(tridiagonal(a, c, a, m)); E = sparse(tridiagonal(b, c, b, m));
    V = zeros(m, m); P = R; rho = sum(sum(R.*R)); rho0 = rho;
    for k = 1:itmax
        if sqrt(rho/rho0) <= tol
            it = k; return
        end
        T = D*P + P*E; alpha = rho/sum(sum(P.*T)); V = V + alpha*P; R = R - alpha*T;
        rhos = rho; rho = sum(sum(R.*R)); P = R + (rho/rhos)*P;
    end
    it = itmax + 1;
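A Python rendering of this test (a hedged sketch using dense tridiagonal matrices rather than MATLAB sparse ones; for the averaging problem the iteration count should match the table on the next slide, around 22 for m = 50):

```python
import numpy as np

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

def cgtest(m, a, b, c, tol, itmax):
    """2D conjugate gradient: A is never formed, vec(Ap) is D@P + P@E."""
    h = 1.0 / (m + 1)
    R = h * h * np.ones((m, m))      # r_0 = b, since x_0 = 0
    D = tridiag(a, c, m)
    E = tridiag(b, c, m)
    V = np.zeros((m, m))
    P = R.copy()
    rho = np.sum(R * R)
    rho0 = rho
    for k in range(1, itmax + 1):
        if np.sqrt(rho / rho0) <= tol:
            return V, k
        T = D @ P + P @ E
        alpha = rho / np.sum(P * T)
        V = V + alpha * P
        R = R - alpha * T
        rho, rhos = np.sum(R * R), rho
        P = R + (rho / rhos) * P
    return V, itmax + 1

# Averaging problem, n = 2500 unknowns:
V, it = cgtest(50, 1/9, 1/9, 5/18, 1e-8, 200)
```

The returned V satisfies DV + VE = h²F to within the tolerance.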
The Averaging Problem

n   2500   10000   40000   1000000   4000000
K     22      22      21        21        20

Table 1: The number of iterations K for the averaging problem on a √n × √n grid. x_0 = 0, tol = 10⁻⁸.

Both the condition number and the required number of iterations are independent of the size of the problem.
The convergence is quite rapid.
Poisson Problem
λ_{jk} = 2c + 2a cos(jπh) + 2b cos(kπh),  j, k = 1, 2, . . . , m.
a = b = −1, c = 2
λ_max = 4 + 4 cos(πh),  λ_min = 4 − 4 cos(πh)
cond_2(A) = λ_max / λ_min = (1 + cos(πh)) / (1 − cos(πh)) = cond_2(T)².
cond_2(A) = O(n).
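The O(n) growth can be seen directly from the formula (a small check; since cos(πh) ≈ 1 − (πh)²/2, cond_2(A) ≈ 4(m+1)²/π², which roughly quadruples when m doubles):

```python
import numpy as np

conds = {}
for m in (10, 20, 40, 80):
    h = 1.0 / (m + 1)
    conds[m] = (1 + np.cos(np.pi * h)) / (1 - np.cos(np.pi * h))

# Ratios between successive grids approach 4, i.e. cond2(A) = O(m^2) = O(n).
ratios = [conds[20] / conds[10], conds[40] / conds[20], conds[80] / conds[40]]
```

Contrast this with the averaging problem, whose condition number stays below 9 for all m.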
The Poisson problem

n      2500   10000   40000   160000
K       140     294     587     1168
K/√n   1.86    1.87    1.86    1.85

Using CG in the form of Algorithm 8 with tol = 10⁻⁸ and x_0 = 0 we list K, the required number of iterations, and K/√n.
The results show that K is much smaller than n and appears to be proportional to √n.
This is the same speed as for SOR, and we don't have to estimate any acceleration parameter!
√n is essentially the square root of the condition number of A.
Complexity
The work involved in each iteration is
1. one matrix-times-vector product (t = Ap),
2. two inner products (p^T t and r^T r),
3. three vector-plus-scalar-times-vector operations (x = x + ap, r = r − at and p = r + (rho/rhos)p).
The dominating part of the computation is statement 1. Note that for our test problems A has only O(5n) nonzero elements. Therefore, taking advantage of the sparseness of A, we can compute t in O(n) flops. With such an implementation the total number of flops in one iteration is O(n).
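A small count confirms the sparsity claim for the Poisson test matrix (dense assembly here is only for counting; a practical code would store A sparsely or, as in cgtest, never form A at all):

```python
import numpy as np

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

m = 20
n = m * m
I = np.eye(m)
C = tridiag(-1.0, 2.0, m)
A = np.kron(I, C) + np.kron(C, I)       # Poisson matrix, n = 400

nnz = np.count_nonzero(A)               # 5n minus boundary corrections
max_row = np.count_nonzero(A, axis=1).max()
```

Each row has at most 5 nonzeros (the 5-point stencil), so a sparse matrix-vector product costs O(n) flops.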
More Complexity
How many flops do we need to solve the test problems by the conjugate gradient method to within a given tolerance?
Averaging problem: O(n) flops. Optimal for a problem with n unknowns.
Same as SOR and better than the fast method based on FFT.
Discrete Poisson problem: O(n^{3/2}) flops,
same as SOR and the fast method.
Cholesky Algorithm: O(n²) flops both for averaging and Poisson.
Analysis and Derivation of the Method
Theorem 3 (Orthogonal Projection). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, ⟨·,·⟩). To each x ∈ V there is a unique vector p ∈ S such that
⟨x − p, s⟩ = 0, for all s ∈ S. (1)
(Figure: x, its orthogonal projection p = P_S x onto S, and the residual x − p orthogonal to S.)
Best Approximation
Theorem 4 (Best Approximation). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, ⟨·,·⟩). Let x ∈ V and p ∈ S. The following statements are equivalent:
1. ⟨x − p, s⟩ = 0, for all s ∈ S.
2. ‖x − s‖ > ‖x − p‖ for all s ∈ S with s ≠ p.
If (v_1, . . . , v_k) is an orthogonal basis for S, then
p = Σ_{i=1}^{k} ( ⟨x, v_i⟩ / ⟨v_i, v_i⟩ ) v_i. (2)
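Formula (2) is easy to check numerically (a sketch with a random subspace; QR is just a convenient way to get an orthogonal basis):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
# Orthonormal basis v_1, ..., v_k of a random k-dimensional subspace S:
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
x = rng.standard_normal(n)

# p = sum_i <x, v_i> / <v_i, v_i> v_i   (formula (2))
p = sum((x @ V[:, i]) / (V[:, i] @ V[:, i]) * V[:, i] for i in range(k))

# Any other s in S is a worse approximation (statement 2 of Theorem 4):
s = V @ rng.standard_normal(k)
```

The residual x − p is orthogonal to every basis vector, and any other element of S lies farther from x.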
Derivation of CG
Ax = b, A ∈ R^{n×n} positive definite, x, b ∈ R^n
(x, y) := x^T y,  x, y ∈ R^n
⟨x, y⟩ := x^T A y = (x, Ay) = (Ax, y)
‖x‖_A = √(x^T A x)
W_0 = {0}, W_1 = span{b}, W_2 = span{b, Ab}, W_k = span{b, Ab, A²b, . . . , A^{k−1}b}
W_0 ⊂ W_1 ⊂ W_2 ⊂ · · · ⊂ W_k ⊂ · · ·
dim(W_k) ≤ k,  w ∈ W_k ⇒ Aw ∈ W_{k+1}
x_k ∈ W_k,  ⟨x_k − x, w⟩ = 0 for all w ∈ W_k
p_0 = r_0 := b,  p_j = r_j − Σ_{i=0}^{j−1} ( ⟨r_j, p_i⟩ / ⟨p_i, p_i⟩ ) p_i,  j = 1, . . . , k.
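Running the recurrences confirms the two orthogonality claims: the residuals r_k are orthogonal in the usual inner product, and the directions p_k are orthogonal (conjugate) in ⟨x, y⟩ = x^T A y (a sketch on a random positive definite system):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # random symmetric positive definite matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b.copy()
p = r.copy()
rho = r @ r
Rs, Ps = [r.copy()], [p.copy()]    # collect r_0..r_{n-1}, p_0..p_{n-1}
for k in range(n):
    t = A @ p
    alpha = rho / (p @ t)
    x = x + alpha * p
    r = r - alpha * t
    rho, rhos = r @ r, rho
    p = r + (rho / rhos) * p
    if k < n - 1:
        Rs.append(r.copy())
        Ps.append(p.copy())

R = np.column_stack(Rs)            # residuals as columns
P = np.column_stack(Ps)
G = R.T @ R                        # should be diagonal: r_i^T r_j = 0, i != j
H = P.T @ A @ P                    # should be diagonal: p_i^T A p_j = 0, i != j
```

After n steps the iterate is the exact solution, in agreement with the finite-termination property.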
Convergence
Theorem 5. Suppose we apply the conjugate gradient method to a positive definite system Ax = b. Then the A-norms of the errors satisfy

‖x − x_k‖_A / ‖x − x_0‖_A ≤ 2 ( (√κ − 1) / (√κ + 1) )^k,  for k ≥ 0,

where κ = cond_2(A) = λ_max/λ_min is the 2-norm condition number of A.
This theorem explains what we observed in the previous section, namely that the number of iterations is linked to √κ, the square root of the condition number of A. Indeed, the following corollary gives an upper bound for the number of iterations in terms of √κ.
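Theorem 5 can be checked numerically on the Poisson test matrix (a sketch; the exact solution comes from a direct solve so the A-norm errors can be measured at every step):

```python
import numpy as np

def tridiag(off, d, m):
    return (np.diag(np.full(m, float(d)))
            + np.diag(np.full(m - 1, float(off)), 1)
            + np.diag(np.full(m - 1, float(off)), -1))

m = 10
n = m * m
C = tridiag(-1.0, 2.0, m)
A = np.kron(np.eye(m), C) + np.kron(C, np.eye(m))   # Poisson matrix, n = 100
b = np.ones(n)
x_exact = np.linalg.solve(A, b)

lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]
q = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

def a_norm(e):
    return np.sqrt(e @ A @ e)

x = np.zeros(n)
r = b.copy(); p = r.copy(); rho = r @ r
e0 = a_norm(x_exact - x)
bound_holds = True
for k in range(1, 31):
    t = A @ p
    alpha = rho / (p @ t)
    x = x + alpha * p
    r = r - alpha * t
    rho, rhos = r @ r, rho
    p = r + (rho / rhos) * p
    # ||x - x_k||_A <= 2 ((sqrt(kappa)-1)/(sqrt(kappa)+1))^k ||x - x_0||_A
    bound_holds = bound_holds and a_norm(x_exact - x) <= 2 * q**k * e0 + 1e-12
```

In practice the observed errors decay considerably faster than the bound; the theorem only guarantees the geometric rate.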