
SIAM J. NUMER. ANAL. Vol. 11, No. 1, March 1974

BIDIAGONALIZATION OF MATRICES AND SOLUTION OF LINEAR EQUATIONS*

C. C. PAIGE

Abstract. An algorithm given by Golub and Kahan [2] for reducing a general matrix to bidiagonal form is shown to be very important for large sparse matrices. The singular values of the matrix are those of the bidiagonal form, and these can be easily computed. The bidiagonalization algorithm is shown to be the basis of important methods for solving the linear least squares problem for large sparse matrices. Eigenvalues of certain 2-cyclic matrices can also be efficiently computed using this bidiagonalization.

1. Introduction. The singular value decomposition of an $m \times n$ matrix $A$ is

(1.1) $A = X\Sigma Y^H$,

where $X$ and $Y$ are unitary matrices and $\Sigma$ is a rectangular diagonal $m \times n$ matrix with nonnegative real diagonal entries, the singular values of $A$.

Golub and Kahan [2, (2.4)] suggest a particular method for producing a bidiagonal matrix with the same singular values as $A$. This method is related to the Lanczos method of minimized iterations for tridiagonalizing a symmetric matrix [5], and like that method is ideally suited for large sparse matrices. The equivalent bidiagonalization of a general matrix and tridiagonalization of a symmetric matrix using Householder matrices is more stable but cannot be applied to many large sparse problems because of storage and time considerations.

Despite a certain numerical instability, it is known that a particular variant of the Lanczos tridiagonalization algorithm still gives extremely useful results [7]; that is, the eigenvalues and eigenvectors are given as accurately as could be hoped, but the convergence can be slower than the theoretical rate, and often several computed eigenvalues can represent just one true eigenvalue. The bidiagonalization algorithm of Golub and Kahan will be seen to be numerically equivalent to this variant of the Lanczos method applied to a certain matrix, and so will have the same sort of stability. Thus singular values will be given accurately, but with uncertain multiplicity. The bidiagonalization algorithm and its properties are given in § 2 more fully than in [2]. Section 2 also indicates how eigenvalues of 2-cyclic matrices may be found efficiently using this approach.

The Lanczos tridiagonalization of symmetric matrices is the basis of the conjugate gradients method of solution of equations [6], [4]; and Reid [8] has done much good work to show that the conjugate gradients method not only works, but is one of the best available for several large sparse problems. The reason it works the way it does is directly related to the performance of the Lanczos algorithm.

In a similar manner it will be shown here that the bidiagonalization algorithm is the basis for two methods of solving the linear least squares problem. These methods are remarkably simple and ideally suited to general large sparse problems, requiring very little computation per step, or storage. They derive their

* Received by the editors July 11, 1972.
† School of Computer Science, McGill University, Montreal, Canada. This work was done while the author was Visiting Research Associate at Stanford University under National Science Foundation Grant GJ-29988X.


accuracy and convergence characteristics from those of the bidiagonalization algorithm. Finally, the relevant singular values of the problem can be found, as well as the solution; these are important for understanding the accuracy of the solution. The two solution of equations methods are developed and examined in §§ 3 and 4.

2. The bidiagonalization algorithm. Here $\|\cdot\|$ will denote the 2-norm, that is, $\|u\|^2 = (u, u) = u^Hu$. The algorithm suggested by Golub and Kahan [2] for bidiagonalizing $A$ will now be described in detail.

For a given $m \times n$ matrix $A$ and a given initial vector $u_1$ with $\|u_1\| = 1$, the method produces $m$-dimensional vectors $u_1, u_2, \ldots$, and $n$-dimensional vectors $v_1, v_2, \ldots$, as follows: For $i = 1, 2, \ldots$,

(2.1) $\alpha_iv_i = A^Hu_i - \beta_iv_{i-1}$, $\beta_1v_0 \equiv 0$,

(2.2) $\beta_{i+1}u_{i+1} = Av_i - \alpha_iu_i$,

where the real scalars $\alpha_i$ and $\beta_{i+1}$ are chosen to be nonnegative and such that $\|u_{i+1}\| = \|v_i\| = 1$. Suppose $\alpha_i$, $\beta_{i+1}$ are nonzero for $i = 1, 2, \ldots, k$. Then the above process is fully defined for $k$ steps, and if

(2.3) $U \equiv [u_1, u_2, \ldots, u_k]$, $V \equiv [v_1, v_2, \ldots, v_k]$,

$L \equiv \begin{bmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \ddots & \ddots & \\ & & \beta_k & \alpha_k \end{bmatrix}$,

then (2.1) and (2.2) may be rewritten

(2.4) $A^HU = VL^H$, $AV = UL + \beta_{k+1}u_{k+1}e_k^H$,

where $e_k$ is the last column of the $k \times k$ unit matrix $I_k$, so that

(2.5) $U^HAV = U^HUL + \beta_{k+1}U^Hu_{k+1}e_k^H = LV^HV$.

If $\alpha_{k+1}$ is also nonzero and one more step of (2.1) alone is executed, with $\hat{U} \equiv [U, u_{k+1}]$, $\hat{L} \equiv [L^H, \beta_{k+1}e_k]^H$, then

(2.6) $A^H\hat{U} = V\hat{L}^H + \alpha_{k+1}v_{k+1}e_{k+1}^H$, $AV = \hat{U}\hat{L}$.

Therefore,

(2.7) $V^HA^H\hat{U} = V^HV\hat{L}^H + \alpha_{k+1}V^Hv_{k+1}e_{k+1}^H = \hat{L}^H\hat{U}^H\hat{U}$.

It is now easy to show by induction that

(2.8) $U^HU = V^HV = I_k$.

This is certainly true for $k = 1$, so suppose it is true for some $k \geq 1$. Then from (2.5), $U^Hu_{k+1} = 0$, and from (2.7), $V^Hv_{k+1} = 0$, and so (2.8) is true for $k + 1$.
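For concreteness, the recurrences (2.1) and (2.2) can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original text: the function name `bidiagonalize` and the dense matrix argument are assumptions, and for a genuinely large sparse $A$ only routines forming $Av$ and $A^Hu$ would be supplied.

```python
import numpy as np

def bidiagonalize(A, u1, k):
    """At most k steps of the Golub-Kahan recurrences (2.1)-(2.2).

    Returns U (m x k), V (n x k), and the lower bidiagonal L (k x k)
    with alpha_i on the diagonal and beta_{i+1} on the subdiagonal.
    """
    m, n = A.shape
    U = np.zeros((m, k)); V = np.zeros((n, k)); L = np.zeros((k, k))
    u = u1 / np.linalg.norm(u1)
    v_prev = np.zeros(n)
    beta = 0.0
    for i in range(k):
        U[:, i] = u
        t = A.conj().T @ u - beta * v_prev   # (2.1): alpha_i v_i = A^H u_i - beta_i v_{i-1}
        alpha = np.linalg.norm(t)
        if alpha == 0.0:                     # curtailment with a zero alpha, as in (2.10)
            return U[:, :i], V[:, :i], L[:i, :i]
        v = t / alpha
        V[:, i] = v
        L[i, i] = alpha
        t = A @ v - alpha * u                # (2.2): beta_{i+1} u_{i+1} = A v_i - alpha_i u_i
        beta = np.linalg.norm(t)
        if beta == 0.0:                      # curtailment with beta_{k+1} = 0, as in (2.9)
            return U[:, :i + 1], V[:, :i + 1], L[:i + 1, :i + 1]
        if i + 1 < k:
            L[i + 1, i] = beta
        u = t / beta
        v_prev = v
    return U, V, L
```

Choosing $\alpha_i$ and $\beta_{i+1}$ as the norms that normalize $v_i$ and $u_{i+1}$ is exactly the nonnegativity condition stated after (2.2).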

This orthogonality ensures that the process will curtail for some value $k$ no greater than the minimum of $m$ and $n$ with either $\beta_{k+1}u_{k+1} = 0$ in (2.4) or


$\alpha_{k+1}v_{k+1} = 0$ in (2.6). As a result there are two possible final states of the algorithm, the first with the $k \times k$ matrix $L$:

(2.9) $A^HU = VL^H$, $AV = UL$, $U^HU = V^HV = I_k$,

and the second with the $(k+1) \times k$ matrix $\hat{L}$:

(2.10) $A^H\hat{U} = V\hat{L}^H$, $AV = \hat{U}\hat{L}$, $\hat{U}^H\hat{U} = I_{k+1}$, $V^HV = I_k$.

To understand which termination will occur, consider the singular value decomposition of $\hat{L}$ in (2.10). $\hat{L}$ has linearly independent columns, so

(2.11) $\hat{L} = PMQ^H$, $P^HP = PP^H = I_{k+1}$, $Q^HQ = QQ^H = I_k$,

where $M$ is $(k+1) \times k$ and has elements $m_{11}, \ldots, m_{kk} > 0$; $m_{ij} = 0$ for $i \neq j$. Thus

$A^H\hat{U}P = VQM^H$, $AVQ = \hat{U}PM$.

Therefore,

(2.12) $AA^H\hat{U}P = \hat{U}PMM^H$, $A^HAVQ = VQM^HM$,

and so when (2.10) results, $A$ has at least $k$ nonzero singular values $m_{11}, \ldots, m_{kk}$, and at least one zero singular value, with the last column of $\hat{U}P$ being the singular vector that lies in $\mathcal{N}(A^H)$, the null space of $A^H$.

Now if for $i = 1$ it is true that $u_i \in \mathcal{R}(A)$, the range of $A$, then it can be seen from (2.2) that this is true for all $i$. But $\mathcal{N}(A^H)$ is the orthogonal complement of $\mathcal{R}(A)$, so if $u_1 \in \mathcal{R}(A)$, then all $u_i$ are orthogonal to $\mathcal{N}(A^H)$, and from the previous paragraph this means that (2.10) cannot be the final result, and so (2.9) must result. Next note that if (2.9) is the final result, then since $L$ is nonsingular, $U = AVL^{-1}$, so that $u_i \in \mathcal{R}(A)$ for all $i$; thus if $u_1 \notin \mathcal{R}(A)$ then (2.9) cannot follow and (2.10) must be the final result.

Note that if $A$ has rank $r = m$, then $u_1 \in \mathcal{R}(A)$ necessarily, and (2.9) follows. If $r < m$, then (2.10) will result unless $u_1$ is a linear combination of some of the columns of $X$ in (1.1). Both final states give $k$ nonzero singular values of $A$, so the process must curtail with $k \leq r$.

If the process halts with $k < r$, then it can be continued in an obvious way until $r$ orthogonal vectors $v_1, \ldots, v_r$ are obtained [2]. Then if $u_1 \in \mathcal{R}(A)$, the result (2.9) can be written

(2.13) $A = ULV^H$, $U^HU = V^HV = I_r$.

Otherwise the result follows from (2.10):

(2.14) $A = \hat{U}\hat{L}V^H$, $\hat{U}^H\hat{U} = I_{r+1}$, $V^HV = I_r$.

Such continuation need not be considered here.


We shall now relate the bidiagonalization algorithm to the Lanczos reduction of a certain symmetric matrix to tridiagonal form. Defining

(2.15) $W \equiv [w_1, w_2, \ldots, w_{2k}] \equiv \begin{bmatrix} u_1 & 0 & u_2 & 0 & \cdots & u_k & 0 \\ 0 & v_1 & 0 & v_2 & \cdots & 0 & v_k \end{bmatrix}$, $B \equiv \begin{bmatrix} 0 & A \\ A^H & 0 \end{bmatrix}$, $T \equiv \begin{bmatrix} 0 & \alpha_1 & & & \\ \alpha_1 & 0 & \beta_2 & & \\ & \beta_2 & 0 & \alpha_2 & \\ & & \alpha_2 & 0 & \ddots \\ & & & \ddots & \ddots \end{bmatrix}$,

it can be seen, for example, that (2.4) is mathematically equivalent to

(2.16) $BW = WT + \beta_{k+1}w_{2k+1}e_{2k}^H$, $W^HW = I_{2k}$, $W^Hw_{2k+1} = 0$,

which is the result of applying $2k$ steps of the Lanczos process to $B$ using as initial vector $w_1$, the first column of $W$. Note that computation and vector storage have effectively been halved by taking full advantage of the structure of $B$. A direct comparison of this algorithm with the successful variant of the Lanczos algorithm in [7] applied to the matrix $B$ and initial vector $w_1$ shows that the two methods are also computationally equivalent.

In practice neither algorithm will stop with $\alpha_i$ or $\beta_{i+1}$ exactly zero, but for large enough $k$ many of the eigenvalues of $T$ in (2.15) will approximate eigenvalues of $B$ to almost machine precision. Now if $\lambda$ is an eigenvalue of $B$ such that

$B\begin{bmatrix} y \\ z \end{bmatrix} = \lambda\begin{bmatrix} y \\ z \end{bmatrix}$,

then

$A^HAz = \lambda^2z$,

so that $\sigma = |\lambda|$ is a singular value of $A$, and several singular values of $A$ will be given to almost machine precision by computing the eigenvalues of $T$. However, with the permutation matrix

$P \equiv [e_1, e_{k+1}, e_2, e_{k+2}, \ldots, e_k, e_{2k}]$,

$T = P^H\begin{bmatrix} 0 & L \\ L^H & 0 \end{bmatrix}P$,

so that the singular values of $L$ are the moduli of the eigenvalues of $T$, and thus we need only compute the singular values of $L$. These can be accurately computed using the QR-like algorithm given in [3].
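As a small numerical check of these relations (an illustrative example, not from the paper, reusing the `bidiagonalize` sketch above; `numpy.linalg.svd` stands in for the specialized QR-like algorithm of [3]):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
U, V, L = bidiagonalize(A, rng.standard_normal(8), 5)

# The symmetric matrix [[0, L], [L^H, 0]] is the permuted form of T in
# (2.15), so its eigenvalues are plus and minus the singular values of L.
C = np.block([[np.zeros((5, 5)), L], [L.conj().T, np.zeros((5, 5))]])

sig_L = np.sort(np.linalg.svd(L, compute_uv=False))
sig_A = np.sort(np.linalg.svd(A, compute_uv=False))

print(np.allclose(sig_L, sig_A))   # L has the same singular values as A
print(np.allclose(np.repeat(sig_L, 2), np.sort(np.abs(np.linalg.eigvalsh(C)))))
```

For this small, well-conditioned example both checks hold to roughly machine precision; for larger $k$ the loss of orthogonality discussed below makes the correspondence approximate.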

Finally, if $\sigma$ is a singular value of $A$, then $\gamma \pm \sigma$ are eigenvalues of the 2-cyclic matrix

$\begin{bmatrix} \gamma I & A \\ A^H & \gamma I \end{bmatrix}$,


so eigenvalues of such matrices can be computed efficiently just by using the bidiagonalization algorithm on $A$, and this is particularly important for large sparse $A$, such as arise from elliptic partial differential equations. This result is parallel to that in [9], where Reid shows how the computation and storage can be halved when the conjugate gradients algorithm is applied to such 2-cyclic matrices.

In practice the bidiagonalization algorithm performed just as expected, the largest singular values being given with remarkable accuracy very quickly, while the small ones (which correspond to the middle eigenvalues of $B$ in (2.15)) were slower to converge, especially when there were several very close, very small ones. If the process was carried far enough, the large singular values would often appear several times.

3. Solution of the linear least squares problem. I. Given the $m \times n$ matrix $A$ and an $m$-vector $b$, the problem is

(3.1) minimize $\|Ax - b\|$,

or equivalently, find $x$ and $r$ such that

(3.2) $r + Ax = b$, $A^Hr = 0$.

Any such $x$ is called a least squares solution. The $x$ which also minimizes $\|x\|$ is called the minimum least squares solution. The minimum least squares solution is the unique solution orthogonal to $\mathcal{N}(A)$, and so will have the form $x = A^Hy$. Thus if $y$ is any solution of

(3.3) $A^HAA^Hy = A^Hb$,

then $x = A^Hy$ is the minimum least squares solution.

In order to motivate the methods that are about to be introduced for solving such problems, suppose that $AV = UL$ as in (2.9), and the representation $x = Vy$ holds; then since $0 = r^HAV = r^HUL$ and $L$ is nonsingular, $U^Hr = 0$. Also

(3.4) $b = r + Ax = r + AVy = r + ULy$;

thus y and x may be found from

(3.5) $Ly = U^Hb$, $x = Vy$.

It is now necessary to show that for a certain choice of $u_1$ in (2.1) and (2.2) such a representation of $x$ is possible. We must then consider which of (2.9) and (2.10) results and what can be done in the latter case.

The most convenient choice of initial vector is the one to be considered here:

(3.6) $u_1 = b/\beta_1$, $\beta_1 = \|b\|$;

this is convenient because $U^Hb = \beta_1e_1$ is then known a priori; this will be discussed again in § 5. The equations of the form (3.1) or (3.2) now separate into two possible classes with corresponding slightly different methods of solution. First, if the equation $Ax = b$ is compatible, that is, if $r = 0$ in (3.2), then $b \in \mathcal{R}(A)$ and so $u_1 \in \mathcal{R}(A)$ in (3.6), and from § 2 it follows that (2.9) must result. This compatible case will be treated now in full; the case when $r \neq 0$ will be examined later in this paper.


Here

(3.7) $A^HU = VL^H$, $AV = UL$, $U^HU = V^HV = I$, $Ax = b = \beta_1u_1$,

and let $x = Vy + w$, where $V^Hw = 0$. Then $Ax = ULy + Aw = \beta_1u_1$; but $U^HAw = LV^Hw = 0$, so $Ly = \beta_1e_1$ and thus $Aw = 0$, meaning that the solution of minimum norm is

(3.8) $x = Vy$,

since from (3.7), $Vy = A^HUL^{-H}y \in \mathcal{R}(A^H)$, and so is orthogonal to $\mathcal{N}(A)$. The elements $\eta_i$ of $y$ are found from $Ly = \beta_1e_1$, that is,

(3.9) $\eta_1 = \beta_1/\alpha_1$; $\eta_{i+1} = -\beta_{i+1}\eta_i/\alpha_{i+1}$, $i = 1, 2, \ldots$.

The method is an attractive one because the 2-norm of the error is minimized in the $i$th step over all possible approximate solutions which are combinations of $v_1, \ldots, v_i$. It is an excellent algorithm if $A$ is a large sparse matrix, since a means of computing $A^Hu$ and $Av$, and storage for 3 vectors, is just about all that is needed. What is more, the residual is available at each step with no extra computation, for if $y_i = (\eta_1, \ldots, \eta_i, 0, \ldots, 0)^H$, then the residual $r_i$ after the $i$th step is

(3.10) $r_i = b - AVy_i = \beta_1u_1 - ULy_i = -\beta_{i+1}\eta_iu_{i+1} = \eta_i(\alpha_iu_i - Av_i) = -\beta_i\eta_{i-1}u_i - \eta_iAv_i = r_{i-1} - \eta_iAv_i$,

where use has been made of (2.2) and the expression for $\eta_{i+1}$ in (3.9). From this expression it can be seen that residuals at different steps are orthogonal.
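A minimal NumPy sketch of this compatible-case method follows; the name `craig_solve` and the stopping tolerance are assumptions for illustration. Each step needs one product with $A$, one with $A^H$, and a few vectors, and by (3.10) the residual norm $\beta_{i+1}|\eta_i|$ comes free.

```python
import numpy as np

def craig_solve(A, b, tol=1e-12, max_steps=None):
    """Minimum-norm solution of a compatible system Ax = b via (3.9),
    accumulating x as the sum of eta_i * v_i."""
    m, n = A.shape
    steps = max_steps or min(m, n)
    beta = np.linalg.norm(b)               # beta_1, with u_1 = b/beta_1 as in (3.6)
    u = b / beta
    v_prev = np.zeros(n)
    x = np.zeros(n)
    eta = 0.0                              # unused on the first pass
    for i in range(steps):
        t = A.conj().T @ u - beta * v_prev           # (2.1)
        alpha = np.linalg.norm(t)
        v = t / alpha
        # (3.9): eta_1 = beta_1/alpha_1; eta_{i+1} = -beta_{i+1} eta_i / alpha_{i+1}
        eta = beta / alpha if i == 0 else -beta * eta / alpha
        x += eta * v
        t = A @ v - alpha * u                        # (2.2)
        beta = np.linalg.norm(t)
        if beta * abs(eta) <= tol * np.linalg.norm(b):   # ||r_i|| by (3.10)
            break
        u = t / beta
        v_prev = v
    return x
```

For a consistent system, `craig_solve(A, b)` approaches the minimum least squares solution; the 2-norm of the error is minimized at each step over the span of $v_1, \ldots, v_i$, as noted above.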

This is not a new method, as will be shown in the next paragraph. Its importance here is the very simple derivation from the very natural bidiagonalization algorithm (2.1) and (2.2), so that the connection between the method and the relevant singular values of $A$ is laid bare. As is well known, these singular values are of vital importance in the solution of linear equations.

In fact, the method (3.9) is equivalent to Craig's method in the computational form described by Faddeev and Faddeeva [1, (23), p. 405]. In Craig's algorithm take $x_0 = 0$, $a_i = 1/\alpha_i^2$, $b_i = [\beta_{i+1}\eta_{i+1}]^2/[\alpha_i\eta_i]^2$, $g_i = \alpha_i\eta_iv_i$, and $r_i = \beta_{i+1}\eta_iu_{i+1}$, and the algorithm here will result. Thus Craig's method is seen to apply to any compatible system of linear algebraic equations. It has been pointed out that Craig's method is mathematically equivalent to applying the method of conjugate gradients [4] to

(3.11) $AA^Hy = b$, $x = A^Hy$.

The purpose in using (3.9) rather than conjugate gradients applied to (3.11) is to avoid any suggestion of deterioration in the condition of the problem to be solved.

The possibility that now has to be examined is the one where $r$ is nonzero in (3.2), so that $u_1 = b/\beta_1 \notin \mathcal{R}(A)$. Section 2 then shows that (2.10) must be the final state of the bidiagonalization procedure. Thus solve

(3.12) $r + Ax = b = \beta_1u_1$, $A^Hr = 0$,


using

(3.13) $A^H\hat{U} = V\hat{L}^H$, $AV = \hat{U}\hat{L}$, $\hat{U}^H\hat{U} = I_{k+1}$, $V^HV = I_k$.

Suppose

(3.14) $x = Vy + w$ with $V^Hw = 0$;

then $r + \hat{U}\hat{L}y + Aw = \beta_1u_1$, but $\hat{U}^HAw = \hat{L}V^Hw = 0$, so $\hat{L}y = \beta_1e_1 - \hat{U}^Hr$, giving $r + Aw = \hat{U}\hat{U}^Hr$; thus $w^HA^Hr + w^HA^HAw = \|Aw\|^2 = w^HA^H\hat{U}\hat{U}^Hr = 0$, since $w^HA^H\hat{U} = 0$ from the above.

Thus $Aw = 0$, and a smaller solution is obtained by taking $x = Vy$ in (3.14). But from (2.1), $v_i \in \mathcal{R}(A^H)$ for all $i$, and so this $x$ is orthogonal to $\mathcal{N}(A)$ and is therefore the minimum least squares solution. The equations of interest are now

(3.15) $r + \hat{U}\hat{L}y = \beta_1u_1$, $\hat{L}y = \beta_1e_1 - \hat{U}^Hr$;

but since $r$ is unknown a priori, the $i$th element $\eta_i$ of $y$ apparently cannot be found at the same time as $u_i$ and $v_i$ are produced, unlike (3.9). However, suppose

(3.16) $\gamma \equiv u_1^Hr$, $t \equiv \hat{U}^Hr/\gamma \equiv (1, \tau_1, \tau_2, \ldots, \tau_k)^H$;

then since from (3.13) and (3.12)

(3.17) $\hat{L}^H\hat{U}^Hr = V^HA^Hr = 0$,

it follows on dividing by $\gamma$ that

(3.18) $\hat{L}^Ht = 0$.

This can be solved, giving

(3.19) $\tau_i = -\tau_{i-1}\alpha_i/\beta_{i+1}$, $i = 1, \ldots, k$, where $\tau_0 \equiv 1$.

Thus if $\gamma$ were known a priori, $u_i^Hr$ would be available in the $i$th step and so $\eta_i$ could be found. On the other hand, from (3.15) and (3.17),

(3.20) $\|r\|^2 = \beta_1r^Hu_1 = \beta_1\bar{\gamma}$;

so if $\gamma$ were known a priori, then $\|r\|$ would be known a priori, which seems unlikely.

Suppose now that vectors z and w are found from

(3.21) $Lz = \beta_1e_1$, $Lw = (1, \tau_1, \ldots, \tau_{k-1})^H$;


then taking

(3.22) $y = z - \gamma w$

gives

(3.23) $Ly = \beta_1e_1 - \gamma(1, \tau_1, \ldots, \tau_{k-1})^H$,

so that

(3.24) $x = Vy = Vz - \gamma Vw$,

and $Vz$ and $Vw$ may be formed as the bidiagonalization proceeds. In the last step $\gamma$ may be found from the last element of the middle equation in (3.15), equation (3.22), and (3.16):

$\beta_{k+1}\eta_k = \beta_{k+1}(\zeta_k - \gamma\omega_k) = -\gamma\tau_k$,

(3.25) $\gamma = \beta_{k+1}\zeta_k/(\beta_{k+1}\omega_k - \tau_k)$,

where $\zeta_i$, $\omega_i$ are the $i$th elements of $z$, $w$ respectively. The solution $x$ is then given by (3.24).

If $Vz$ and $\gamma Vw$ were large and nearly equal, then cancellation in (3.22) would mean numerical instability. However, from (3.21), $\|z\| \leq \beta_1/\sigma$ where $\sigma$ is the smallest singular value of $L$, that is, the smallest relevant (nonzero) singular value of $A$. Thus the subtraction in (3.22) can at most introduce an (additional) error in $x$ of $O(\epsilon\beta_1/\sigma)$, where $\epsilon$ is the machine precision. Such an error size is to be expected from the condition of the problem, and so the step (3.24) appears acceptable.

With this approach the algorithm for solving either compatible or incompatible systems of equations can be written as follows:

$\zeta_0 := -1$; $\omega_0 := 0$; $\tau_0 := 1$; $vz := 0$; $vw := 0$;
$\beta_1u_1 := b$;
$\alpha_1v_1 := A^Hu_1$; $i := 0$;

general step:
$i := i + 1$;
$\zeta_i := -\beta_i\zeta_{i-1}/\alpha_i$; $vz := vz + \zeta_iv_i$;
$\omega_i := (\tau_{i-1} - \beta_i\omega_{i-1})/\alpha_i$; $vw := vw + \omega_iv_i$;
$\beta_{i+1}u_{i+1} := Av_i - \alpha_iu_i$;
if $\beta_{i+1} = 0$ then (solution $:= vz$; residual $= 0$; stop);
$\tau_i := -\tau_{i-1}\alpha_i/\beta_{i+1}$;
$\alpha_{i+1}v_{i+1} := A^Hu_{i+1} - \beta_{i+1}v_i$;
if $\alpha_{i+1} = 0$ then ($\gamma := \beta_{i+1}\zeta_i/(\beta_{i+1}\omega_i - \tau_i)$; solution $:= vz - \gamma\,vw$; stop);
go to general step;


As before, $\alpha_i$, $\beta_i$ are chosen so that $\|v_i\| = \|u_i\| = 1$. This requires only 3 vectors of length $n$, i.e., $v$, $vz$, $vw$, and 1 vector of length $m$, i.e., $u$; or if it were known a priori that $\gamma = 0$, then $vw$ would not be needed.

It can be shown by using (3.13), (3.15), and (3.16) that after the $i$th step the residual is

$r_i = -\beta_{i+1}(\zeta_i - \gamma\omega_i)u_{i+1} + \gamma\sum_{j=1}^{i}\tau_{j-1}u_j$;

but this is not available until $\gamma$ is known at the end of the process. However, if $r = 0$ in (3.12) then $\gamma = 0$ from (3.16), and so $-\beta_{i+1}\zeta_iu_{i+1}$ is the residual after the $i$th step; this is just (3.10).

In practice these stopping criteria would be almost useless, and in fact it could happen that no $\alpha_i$ or $\beta_i$ was ever small. The hope is that both $\zeta_i$ and $\omega_i$ become negligible, and that $\gamma$ in (3.25) attains a stable value.
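A NumPy transcription of the listing above may make the bookkeeping clearer. It is a sketch: the absolute tolerance for deciding that $\zeta_i$ and $\gamma\omega_i$ have become negligible is an assumption, and $\gamma$ is re-estimated from (3.25) at every step rather than only when some $\alpha_{i+1}$ vanishes exactly.

```python
import numpy as np

def bidiag_lsq(A, b, tol=1e-12, max_steps=None):
    """min ||Ax - b|| by the extended algorithm of section 3:
    x = vz - gamma * vw, with gamma from (3.25)."""
    m, n = A.shape
    steps = max_steps or 2 * min(m, n)
    zeta, omega, tau = -1.0, 0.0, 1.0      # so zeta_1 = beta_1/alpha_1, omega_1 = 1/alpha_1
    vz = np.zeros(n)
    vw = np.zeros(n)
    beta = np.linalg.norm(b)               # beta_1 u_1 := b
    u = b / beta
    v_prev = np.zeros(n)
    gamma = 0.0
    for i in range(steps):
        t = A.conj().T @ u - beta * v_prev           # alpha_i v_i := A^H u_i - beta_i v_{i-1}
        alpha = np.linalg.norm(t)
        v = t / alpha
        zeta = -beta * zeta / alpha                  # elements of z: L z = beta_1 e_1
        omega = (tau - beta * omega) / alpha         # elements of w in (3.21)
        vz += zeta * v
        vw += omega * v
        t = A @ v - alpha * u                        # beta_{i+1} u_{i+1} := A v_i - alpha_i u_i
        beta = np.linalg.norm(t)
        if beta == 0.0:                              # compatible case: gamma = 0
            return vz
        tau = -tau * alpha / beta                    # (3.19)
        gamma = beta * zeta / (beta * omega - tau)   # running estimate of (3.25)
        if max(abs(zeta), abs(gamma * omega)) <= tol:
            break                                    # zeta_i, gamma*omega_i negligible
        u = t / beta
        v_prev = v
    return vz - gamma * vw
```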

4. Solution of the linear least squares problem. II. The bidiagonalization algorithm can be applied with $A^H$ interchanged with $A$ in (2.1) and (2.2), and then a different algorithm for solution of equations will result. However, since the present bidiagonalization algorithm has been analyzed so fully in § 2, it will be retained, and the alternative solution of equations algorithm will be derived from applying this same bidiagonalization algorithm to the problem

(4.1) $r + A^Hx = b$, $Ar = 0$,

where, as before, $A$ is an $m \times n$ matrix, and $b$ is given. The most convenient choice of initial vector here is

(4.2) $u_1 = Ab/\beta_1$, $\beta_1 = \|Ab\|$.

With this choice $u_1 \in \mathcal{R}(A)$, and from the discussion in § 2 the final result of the bidiagonalization must be

(4.3) $A^HU = VL^H$, $AV = UL$, $U^HU = V^HV = I$.

Suppose now that

$x = Uy + f$, $U^Hf = 0$.

Then $V^HA^Hf = L^HU^Hf = 0$, and

$A^Hx = A^HUy + A^Hf = VL^Hy + A^Hf = b - r$;

therefore,

(4.4) $L^Hy = V^H(b - r)$.

Thus $b - r = VV^H(b - r) + A^Hf$, giving with (4.3),

(4.5) $\beta_1u_1 = Ab = A(b - r) = ULV^H(b - r) + AA^Hf$,

and on multiplying by $U^H$,

(4.6) $\beta_1e_1 = LV^H(b - r)$,

since $U^HAA^Hf = LV^HA^Hf = 0$. Finally, substituting (4.6) in (4.5) gives $AA^Hf = 0$.


Thus $f \in \mathcal{N}(A^H)$, and since $Uy$ is orthogonal to $\mathcal{N}(A^H)$, $x = Uy$ is the minimum least squares solution if $y$ can be found from (4.4). From (4.4) and (4.6) it follows that

(4.7) $L^Hy = z$, $Lz = \beta_1e_1$, $z \equiv V^H(b - r)$.

Clearly $y$ cannot be found until the bidiagonalization is complete, but we are really only interested in finding $Uy$, so

(4.8) $x = Uy = UL^{-H}L^Hy = Wz$, $W \equiv UL^{-H}$,

and $W$ can be computed a column at a time as the algorithm progresses, by solving

(4.9) $LW^H = U^H$.

In this algorithm

$A^HW = A^HUL^{-H} = V$

from (4.3), and so is orthonormal; what is more, if $x_i = Wz_i$, $z_i \equiv (\zeta_1, \ldots, \zeta_i, 0, \ldots, 0)^H$, then $r_i = b - A^Hx_i = b - A^HUL^{-H}z_i = b - Vz_i$, which can easily be updated as the computation progresses. Without computing $r$ the algorithm can be written as follows. Remember that $\alpha_i$ cannot in theory be zero.

$i := 1$;
$\beta_1u_1 := Ab$;
$\alpha_1v_1 := A^Hu_1$;
$w_1 := u_1/\alpha_1$;
$\zeta_1 := \beta_1/\alpha_1$;
$x := \zeta_1w_1$;

general step:
$i := i + 1$;
$\beta_iu_i := Av_{i-1} - \alpha_{i-1}u_{i-1}$;
if $\beta_i = 0$ then stop;
$\alpha_iv_i := A^Hu_i - \beta_iv_{i-1}$;
$w_i := (u_i - \beta_iw_{i-1})/\alpha_i$;
$\zeta_i := -\beta_i\zeta_{i-1}/\alpha_i$;
$x := x + \zeta_iw_i$;
go to general step;

As usual the $\alpha_i$ and $\beta_i$ are chosen so that $\|v_i\| = \|u_i\| = 1$. This algorithm requires storage space for the 3 vectors $u$, $w$, $x$ of dimension $m$, and $v$ of dimension $n$.
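A NumPy transcription of this listing is sketched below; the stopping rule, which curtails when the update $\zeta_iw_i$ is negligible relative to $x$, is an assumption in line with the remarks that follow. Note the routine solves (4.1), so to minimize $\|Mx - b\|$ one would call it with $A = M^H$.

```python
import numpy as np

def bidiag_lsq2(A, b, tol=1e-12, max_steps=None):
    """Minimum least squares solution of min ||A^H x - b|| by the
    algorithm above: x = W z, accumulated as x += zeta_i * w_i,
    where W = U L^{-H} is built a column at a time from (4.9)."""
    m, n = A.shape
    steps = max_steps or 2 * min(m, n)
    t = A @ b                          # beta_1 u_1 := A b, as in (4.2)
    beta = np.linalg.norm(t)
    u = t / beta
    t = A.conj().T @ u                 # alpha_1 v_1 := A^H u_1
    alpha = np.linalg.norm(t)
    v = t / alpha
    w = u / alpha                      # w_1 := u_1 / alpha_1
    zeta = beta / alpha                # zeta_1 := beta_1 / alpha_1
    x = zeta * w
    for i in range(1, steps):
        t = A @ v - alpha * u          # beta_i u_i := A v_{i-1} - alpha_{i-1} u_{i-1}
        beta = np.linalg.norm(t)
        if beta == 0.0:
            break
        u = t / beta
        t = A.conj().T @ u - beta * v  # alpha_i v_i := A^H u_i - beta_i v_{i-1}
        alpha = np.linalg.norm(t)
        v = t / alpha
        w = (u - beta * w) / alpha     # next column of W from L W^H = U^H
        zeta = -beta * zeta / alpha    # next element of z from L z = beta_1 e_1
        x += zeta * w
        if abs(zeta) <= tol * np.linalg.norm(x):
            break                      # zeta_i negligible: curtail
    return x
```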

This is apparently a new algorithm; its advantages are its simplicity and economy and its direct relation to the bidiagonalization (2.1) and (2.2), so that the singular values relevant to the problem can be found easily from $L$. Good approximations to the singular values can also be found after only a fraction of the full number of steps.

If this is used computationally, then it is unlikely that either $\beta_i$ or $\alpha_i$ will ever become negligible, but in practice the $\zeta_i$ eventually do, and so the computation can be curtailed, giving as accurate a solution as could be hoped for. A possible test is to check that $Ar = A(b - A^Hx)$ is negligible. Computational experience shows


that unless the condition of the problem is too bad for the machine precision, the algorithm will converge, but on problems with very close, very small singular values, far more than $n$ steps may be required.

5. Comments and conclusions. The very natural bidiagonalization algorithm of Golub and Kahan [2], described here in (2.1) and (2.2), appears to be of significant theoretical importance. First, it allows singular values and vectors of $A$ to be obtained easily, for instance by finding the singular value decomposition of the bidiagonal matrix $L$ using the QR-like method given in [3]. This was the original motivation for deriving the algorithm. Now since the singular values are of such great importance in solving linear equations, it is not, in retrospect, so surprising that this bidiagonalization turns out to be very basically related to several methods for solving such equations. Two solution of equations algorithms have been developed here using the bidiagonalization algorithm as a basis; these have the important advantage that the relevant singular values can be found if needed.

A very important point that does not appear to be generally realized is that the choice of initial vector for solution of equations in algorithms such as these is computationally all important. Choices other than those in (3.6) and (4.2) are perfectly satisfactory in theory, and just require small alterations in the algorithms involving one additional inner product. However, the ones given seem to be the only ones that can be relied upon to give convergence in practice. Note that when (3.6) is used it is assumed in the algorithm that $U^Hb = \beta_1e_1$, and yet it is known that this is not usually the case, that is, orthogonality is lost. If the true value of $U^Hb$ is used in the algorithm instead of $\beta_1e_1$, it is found in practice that the algorithm will not converge once orthogonality is lost. A similar result holds for (4.2).

This does not mean that another initial approximation to the solution cannot be used. Suppose $r + Ax = b$, $A^Hr = 0$, and $x_0$ is a good approximation to $x$; then if $r_0 = b - Ax_0$, it is clear we now want to solve

$r + Ag = r_0$, $A^Hr = 0$,

so that $x = x_0 + g$. The method for solving this new problem can now start with the necessary initial vector, e.g., $\beta_1u_1 = r_0$ in (3.6). Note here that if $x_0 = y_0 + z_0$ with $z_0 \in \mathcal{N}(A)$, $y_0 \in \mathcal{R}(A^H)$, then $x = y + z_0$ with $y \in \mathcal{R}(A^H)$, so that $x$ is not the minimum least squares solution for nonzero $z_0$.

The computational importance of all the algorithms suggested here lies in their application to very large problems with a sparse matrix $A$. In this case the computation per step and storage is about as small as could be hoped, and theoretically the number of steps will be no greater than the minimum dimension of $A$. In practice, however, the number of steps required for a given accuracy can be much less or much greater than the expected number, depending, among other things, on the computer precision and the actual singular values. Rounding errors usually cause orthogonality of the vectors $u_i$ and of the vectors $v_i$ in (2.1) and (2.2) to be lost, but the singular values and vectors can still be accurately obtained, and the solution of the given equations can be accurately computed. This is an exact parallel to the computational performance of the Lanczos method for finding eigenvalues of symmetric matrices [7], even as far as often obtaining several computed singular values of $A$ corresponding to one actual singular value.


Although the bidiagonalization algorithm has been extensively tested, only the solution of equations algorithm in § 4 has been tested, it being more straightforward than the extension of Craig's algorithm given in § 3. The suggestions on computational technique come from experience with the algorithm in § 4 and other algorithms closely related to those here.

The methods described here are for a general matrix $A$; if $A$ is Hermitian, then these methods will be inefficient. The Lanczos algorithm takes advantage of the symmetry of $A$ for finding eigenvalues (singular values), while the method of conjugate gradients is suited to positive definite systems of equations. A future paper by Paige and Saunders will show how advantage may be taken of symmetry for efficiently solving systems of equations with an indefinite symmetric matrix by using the vectors from the Lanczos process. The method given in that paper will also be applied to certain symmetric matrices arising from least squares problems, and in some cases will lead to computational algorithms very similar to the ones here. Since such particular examples will be included in that paper, and since their performances parallel those of the algorithms here, no computational examples need be included here.

An attempt has been made here to show that the bidiagonalization algorithm introduced by Golub and Kahan is a very basic algorithm, and of great practical importance for large sparse matrices. The same can be said of the original Lanczos algorithm for tridiagonalizing Hermitian matrices; in fact, it could be argued that this latter algorithm is more basic, one reason being that the former can be derived in a simple manner from the tridiagonalization algorithm applied to the matrix $B$ in (2.15).

Acknowledgments. My thanks to Gene Golub for making my stay at Stanford such a pleasant one, and also to him and Mike Saunders for discussions, ideas, and insight on this work.

I would like to pay tribute to George Forsythe, by whose courtesy I visited Stanford, and whose personality has left so many clear and strong marks on that place and the people working there. It is my regret that I arrived too late to meet him.

REFERENCES

[1] D. K. FADDEEV AND V. N. FADDEEVA, Computational Methods of Linear Algebra, W. H. Freeman, London, 1963.

[2] G. GOLUB AND W. KAHAN, Calculating the singular values and pseudo-inverse of a matrix, this Journal, 2 (1965), pp. 205-224.

[3] G. GOLUB AND C. REINSCH, Singular value decomposition and least squares solutions, Numer. Math., 14 (1970), pp. 403-420.

[4] M. R. HESTENES AND E. STIEFEL, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409-436.

[5] C. LANCZOS, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Ibid., 45 (1950), pp. 255-282.

[6] ———, Solution of systems of linear equations by minimized iterations, Ibid., 49 (1952), pp. 33-53.

[7] C. C. PAIGE, Computational variants of the Lanczos method for the eigenproblem, J. Inst. Math. Appl., 10 (1972), pp. 373-381.


[8] J. K. REID, On the method of conjugate gradients for the solution of large sparse systems of linear equations, Proc. Conference on Large Sparse Sets of Linear Equations, Academic Press, New York, 1971.

[9] ———, The use of conjugate gradients for systems of linear equations possessing "property A", this Journal, 9 (1972), pp. 325-332.
