11
Computers & Chemistry Vol. 10, No. 4, pp. 281-291, 1986 0097-8485/86 $3.00+0.00 Printed in Great Britain Pergamon Journals Ltd DVDSON: A SUBROUTINE TO EVALUATE SELECTED SETS OF EIGENVALUES AND EIGENVECTORS OF LARGE SYMMETRIC MATRICES GERARDO CISNEROS Departamento de Aplicaci6n de Microcomputadoras, Instituto de Ciencias, Universidad Aut6noma de Puebla, Apartado Postal 461, 72000 Puebla, Puebla, M,~xico MANUEL BERRONDO lnstituto de Fisica, Universidad Nacional Aut6noma de M~xico, Apartado Postal 139-B, 62190 Cuemavaca, Morelos, M~xico and CARLOS F. BUNGE Instituto de Fisica, Universidad Nacional Aut6noma de Mrxico, Apart.ado Postal 20-364, 01000 Mrxico D.F., Mrxico (Received 31 January 1986) Abstract--Program HYMAT, originally written by Weber, Lacroix and Wanner, has been improved in efficiencyand generalized to evaluate any selected set of eigenvaluesand eigenvectorsof large sparse real symmetric matrices. The time-consumingsteps are expressed as calls to subroutines which may exploit a vector architecture. A novel way to improve speed of convergenceis discussed. 1. INTRODUCTION In quantum chemistry circles, Davidson's method (Davidson, 1975) has been established as the algorithm of choice for the accurate and efficient computation of eigenvalues and eigenvectors of large sparse real symmetric matrices. Subroutine HYMAT (Weber et al., 1980), the first published program embodying Dav- idson's algorithm, later improved in efficiency and memory requirements (Cisneros & Bunge, 1984), could only evaluate one eigenvalue and eigenvector per call. The purpose of this note is to present subroutine DVDSON, a standard FORTRAN 77 structured pro- gram incorporating a useful extension to HYMAT, and considerably more efficient than its predecessor for very large and very sparse matrices. Moreover, all vector operations are written in terms of modules which can easily be adapted for optimal efficiency on vector processing machines such as the Cray or Cyber 205 supercomputers, or VAX-11 computers with an attached FPS-164 array processor, or the HP-1000 series computers for which we present a particular implementation of it. The main steps of Davidson's algorithm are de- scribed in comments throughout the program whose listing is given in the Appendix. 2. USAGE The calling sequence is CALL DVDSON (MVMPY,N,MI,MF,NIV,ISELEC, IORD,NOUT, LUN,LUN 1,LUN2,CRITA, CRITR, CRITE, ORTHO, ADIAG,V,E,W,IERR) The meaning of all arguments is explained in the head- ing of the subroutine listing. Here we only stress some important points. MVMPY is a user supplied subroutine which pro- duces a vector W out of W = A*V where V is a trial vector and A is the matrix defining the eigen-problem under consideration: Avi = e,v~ . (1) In the evaluation of eigenvectors of most large mat- rices, which is what Davidson's method is for, virtually all execution time is spent in MVMPY. In the calling program, an EXTERNAL statement must declare the actual name of the subroutine which is referred to by the dummy argument MVMPY. In general, subroutine MVMPY will receive implicit input from the rest of the program through one or more COMMON blocks and data files. In one of its simplest implementations, given in the Appendix after subroutine DVDSON, MVMPY receives a rowwise lower triangular matrix A from COMMON, while vector V and the resulting vector B = A*V are arguments. Full vectorization in subroutine MVMPY is achieved in terms of N dot products and an equal number of vector-scalar-mul- tiply-and-add (VSMA) operations for vectors of av- erage size equal to N/2. Subroutine DVDSON receives implieit input and delivers implicit output through a sequential access file connected to logical unit LUN. The input file contains an initial set of NIV vectors. Convergence towards a given eigenvector vi will be fast if the initial NIV- dimensional vector space contains a good approxi- marion to vi. Notice, however, that it is forbidden to 281

DVDSON: A subroutine to evaluate selected sets of eigenvalues and eigenvectors of large symmetric matrices

Embed Size (px)

Citation preview

Computers & Chemistry Vol. 10, No. 4, pp. 281-291, 1986 0097-8485/86 $3.00+0.00 Printed in Great Britain Pergamon Journals Ltd

D V D S O N : A S U B R O U T I N E T O E V A L U A T E S E L E C T E D S E T S

O F E I G E N V A L U E S A N D E I G E N V E C T O R S O F L A R G E

S Y M M E T R I C M A T R I C E S

GERARDO CISNEROS Departamento de Aplicaci6n de Microcomputadoras, Instituto de Ciencias, Universidad Aut6noma de

Puebla, Apartado Postal 461, 72000 Puebla, Puebla, M,~xico

MANUEL BERRONDO lnstituto de Fisica, Universidad Nacional Aut6noma de M~xico, Apartado Postal 139-B,

62190 Cuemavaca, Morelos, M~xico

and

CARLOS F. BUNGE Instituto de Fisica, Universidad Nacional Aut6noma de Mrxico, Apart.ado Postal 20-364,

01000 Mrxico D.F., Mrxico

(Received 31 January 1986)

Abstract--Program HYMAT, originally written by Weber, Lacroix and Wanner, has been improved in efficiency and generalized to evaluate any selected set of eigenvalues and eigenvectors of large sparse real symmetric matrices. The time-consuming steps are expressed as calls to subroutines which may exploit a vector architecture. A novel way to improve speed of convergence is discussed.

1. INTRODUCTION

In quantum chemistry circles, Davidson's method (Davidson, 1975) has been established as the algorithm of choice for the accurate and efficient computation of eigenvalues and eigenvectors of large sparse real symmetric matrices. Subroutine HYMAT (Weber et al., 1980), the first published program embodying Dav- idson's algorithm, later improved in efficiency and memory requirements (Cisneros & Bunge, 1984), could only evaluate one eigenvalue and eigenvector per call. The purpose of this note is to present subroutine DVDSON, a standard FORTRAN 77 structured pro- gram incorporating a useful extension to HYMAT, and considerably more efficient than its predecessor for very large and very sparse matrices. Moreover, all vector operations are written in terms of modules which can easily be adapted for optimal efficiency on vector processing machines such as the Cray or Cyber 205 supercomputers, or VAX-11 computers with an attached FPS-164 array processor, or the HP-1000 series computers for which we present a particular implementation of it.

The main steps of Davidson's algorithm are de- scribed in comments throughout the program whose listing is given in the Appendix.

2. USAGE

The calling sequence is

CALL DVDSON (MVMPY,N,MI,MF,NIV,ISELEC, IORD,NOUT, LUN,LUN 1,LUN2,CRITA,

CRITR, CRITE, ORTHO, ADIAG,V,E,W,IERR)

The meaning of all arguments is explained in the head- ing of the subroutine listing. Here we only stress some important points.

MVMPY is a user supplied subroutine which pro- duces a vector W out of W = A*V where V is a trial vector and A is the matrix defining the eigen-problem under consideration:

A v i = e,v~ . (1)

In the evaluation of eigenvectors of most large mat- rices, which is what Davidson's method is for, virtually all execution time is spent in MVMPY. In the calling program, an EXTERNAL statement must declare the actual name of the subroutine which is referred to by the dummy argument MVMPY. In general, subroutine MVMPY will receive implicit input from the rest of the program through one or more COMMON blocks and data files. In one of its simplest implementations, given in the Appendix after subroutine DVDSON, MVMPY receives a rowwise lower triangular matrix A from COMMON, while vector V and the resulting vector B = A*V are arguments. Full vectorization in subroutine MVMPY is achieved in terms of N dot products and an equal number of vector-scalar-mul- tiply-and-add (VSMA) operations for vectors of av- erage size equal to N / 2 .

Subroutine DVDSON receives implieit input and delivers implicit output through a sequential access file connected to logical unit LUN. The input file contains an initial set of NIV vectors. Convergence towards a given eigenvector vi will be fast if the initial NIV- dimensional vector space contains a good approxi- marion to vi. Notice, however, that it is forbidden to

281

282 G. CISNEROS et al.

give as initial vectors a set in terms of which the matrix A has eigenvalues equal to any of its diagonal matrix elements (stored in array ADIAG). This precludes the use of a single vector with one component equal to a constant and all others equal to zero. For example, one may use a vector with the ith component equal to 1 and all others equal to 0 except for one of them (assuming A 0 :¢= 0), which may be assigned a small value SMALL = 1.D-10.

3. SPEED IMPROVEMENTS

Different calls may provide DVDSON with differ- ent names for MVMPY. For example, given a physical problem, one may conceive a sequence of n skeleton matrices SA 1, SA2 . . . . . SAn for which the calculation of ASi*V is drastically simplified relative to the cal- culation of A*V while some eigenvectors of interest very closely resemble corresponding ones of A. Dif- ferent subroutines MVMPY1, MVMPY2 . . . . .

MVMPYn might be employed to evaluate SA l ' V , SA2*V . . . . . SAn*V, respectively. For certain prob- lems (Bunge & Cisneros, 1986), the execution time needed to evaluate a given eigenvector on ~ of SAn may be orders of magnitude smaller than the time required to simply evaluate A ' V , while vn~ used as a trial eigenvector to solve (1) may be such a good approx- imation to v~ that only one or two Davidson iterations may be necessary to achieve reasonable convergence towards a given eigenvalue and eigenvector.

4. ACCURACY

Three convergence criteria represented by argu- ments CRITA, CRITR and CRITE are used, the pro- tess being terminated upon fulfillment of any one of them.

The first two criteria test convergence to the Mth eigenvector. For CRITA, the inequality to be satisfied is ABS(U(J,M)) .LT. CRITA, where U ( J , M ) is the Jth component of the Mth approximate eigenvector of the projection of the original matrix into the Jth dimensional manifold of vectors obtained after the Jth iteration (Davidson, 1975). For CRITR, it is required that RESNOR .LT. CRITR, where RESNOR is the modulus of the residual vector r = Av-ev obtained from the current approximate eigenvector v. The mod- ulus of r allows an assessment of the quality of the eigenvector and it provides a rigorous although pes- simistic bound for the eigenvalue (Wilkinson, 1965).

The last convergence test is satisfied when ABS(VAL-VALOLD) .LT. CRITE, where VAL and VALOLD are the current and previous approxima- tions to the eigenvalue, respectively. Notice that VAL- OLD is entered for the first iteration as E ( M I ) where M I is the index of the first wanted eigenvalue. In this way one may give as E ( M I ) an approximation ob- tained in a previous call to subroutine DVDSON, to promote the quick convergence strategy discussed in Section 3, thus making it possible to achieve conver- gence after a single iteration once a much smaller problem has been solved.

$. EFFICIENCY AND IMPROVEMENTS

For scalar computers, the present code is more efficient than our previous one (Cisneros & Bunge, 1984) on account of having replaced the calls to the EISPACK routines TRED2 and TQL2 by a call to subroutine HQRII1 (Bunge & Bunge, 1986), which is an improved version of subroutine HQRII (Beppu & Ninomiya, 1982).

Additionally, in subroutine DVDSON we have ex- pressed all vector operations in terms of calls to sub- routines which may easily be optimized for vector ar- chitectures. These subroutines appear after subroutine DVDSON and perform the following operations: vec- tor-scalar-multiply-and-add (VSMA1D), vector-sca- lar-multiply (VSMD), dot product of vectors (VDOTD), vector-divide by scalar minus vector (VDSSVD), zero a vector (ZRO1D), and copy a vector (COPYV).

In the listing of the program, each section of vec- torizable code is separated by comments showing sec- tion numbers. Vectorization for the HP-1000 series computers is achieved in a way analogous to that de- scribed in the previous paper (Berrondo, Bunge & Bunge, 1986).

Users hating access to vector processors may re- place the call to subroutine HQRII1 by a call to its vectorized version (Berrondo, Bunge & Bunge, 1986).

For dense matrices the execution time is propor- tional to the number of iterations and largely deter- mined by the matrix-vector multiply subroutine MVMPY. When using very sparse matrices the exe- cution time for subroutine MVMPY may be relatively small, in which case it is of interest to assess timings for Davidson's algorithm. From an inspection of the program given in the Appendix, it may be verified that the Jth iteration involves essentially 4 J - 2 reads, 3 writes, 2 J - 2 inner products and 3J -1 VSMA oper- ations, all of them acting upon vectors of size equal to the matrix order N.

For a given computer, let M(VDOT) be the number of mega- floating- point- operations- per- second- CPU (MFLOPS) obtained in the evaluation (in FOR- TRAN) of a vector inner product, let M(VSMA) be the number of MFLOPS achieved for a VSMA op- eration, and let r be the I /O transfer rate in Mb/s. Then it follows that the accumulated execution time t (in seconds) after the Jth iteration (other than that employed in MVMPY) is given by

t = j 2 { 3 * [ 8 N / r ] + 3 * [ N / M ( V S M A ) ] +

2*[N/M(VDOT)]I I . E - 6 (2)

for double precision (8 bytes) vectors. On a VAX- 11 / 780 computer with an RP81 disk, r = 0.2 Mb/s for large vectors (Cisneros & Bunge, 1986), M(VDOT) = 0.227 (Bunge & Bunge, 1986) and M(VSMA) = 0.173 (Cisneros, Bunge & Roothaan, 1986). Thus, for a ma- trix of order 10000, 10 iterations would require 120 seconds of I/O, 27 seconds for vector operations and a negligible m o u n t of time for diagonalizing ten sub-

DVDSON 283

matrices of orders 1 through 10. Since t in (2) increases linearly with the matrix order N, one can envisage the calculation of eigenvectors and eigenvalues for mat- rices with very large N if the calculation in subroutine MVMPY can be accomplished reasonably fast.

6. CONCLUSIONS

Subroutine DVDSON is efficient when a few ei- genvalues and eigenveetors are needed, even for dense matrices of order as low as 30. For large matrices it should be the most convenient option, as it is based on the algorithm (Davidson, 1975) which permits the most flexible computational strategy for the evaluation of the matrix-vector product Av while being remark- ably accurate for eigenvector calculations (Cisneros & Bunge, 1984).

It should be stressed that Davidson's method al- lows for the direct calculation of any eigenvalue and eigenvector, without need to evaluate lower or higher lying eigenvectors. As such it will be particularly useful in atomic structure calculations of families of excited states (Froese-Fischer, 1977).

The listing of subroutine DVDSON contains (en- capsulated in comments) a vectorized code for the HP- 1000 computers. Users having access to supercom- puters, or computers having attached array processors may achieve vector processing speeds by replacing the vectorizable sections by corresponding routines in their vector libraries.

REFERENCES

Beppu, Y., & Ninomiya, I. (1982), Comput. Chem. 6, 87. Bunge, A.V., & Bunge, C.F. (1986), Comput. Chem. 10,259. Berrondo, M., Bunge, A.V., & Bunge, C.F. (1986), Comput.

Chem. 10, 269. Bunge, C.F., & Cisneros, G. (1986), preprint. Cisneros, G., & Bunge, C.F. (1984), Comput. Chem. 8, 157. Cisneros, G., & Bunge, C.F. (1986), Comput. Chem. 10,000. Cisneros, G., Bunge, C.F., & Roothaan, C.C.J. (1986), pre-

print. Davidson, E.R. (1975), J. Comp. Phys,, 17, 87. Froese-Fischer, C. (1977) The Hartree-Fock Method for

Atoms." .4 Numerical Approach, John Wiley, New York. Weber, J., Lacroix, R., & Wanner, G. (1980), Comput. Chem.

4, 55. Wilkinson, J.H. (1965), The Algebraic Eigenvalue Problem,

Clarendon Press, Oxford.

APPENDIX

C .................................................................

SUBROUTINE DVDS0N (MVMP¥,NNL,MINL,MFNL,NIVNL,ISELEC,IORDNL, * NOUTNL, LUNNL, LUNINL, LUN2NL, CRITAN, CR ITRN, * CR ITEN, 0RTHON, ADIAG, V, E, W, IERR}

C .................................................................

Title: module DVDSON for calculating an arbitrary predetermined set of eigenvalues and eigenvectors of a real symmetric matrix.

Abstract: A general Fortran implementation of Davidson's algorithm EE. R. Davidson, J. Comp. Phys. 17,87(1975)3 for eigenvalues and eigenvectors of large an~ sparse real symmetric matrices is given. This is an extension of a previous program CG. Cisneros and C. F. Bunge, Comput. Chem. 8,157(1984)3 which is in turn an improved version of a subroutlne given by J. Heber, R. Lacroix and G. Hanner, Comput. Chem. 4,55(1980). A detailed account of the various steps is given in commentaries at appropriate places within the program. Vector operations are localized in modules of a vector processing llbrary which may be optimized for a given array processor or vector computer.

Environment: Standard Fortran 77.

Copyright by Gerardo Cisneros, Manuel Berrondo, Carlos F. Bunge, Jacques Weber, Roger Lacroix and Gerhard Harmer, 1986.

Reference: G.Cisneros, M. Berrondo & C.F.Bunge, Comput. Chem. 10, (1986).

User library calls: CALL HQRIII (J,I,J,IORD,H,D,MX,V,.TRUE.,IER), published by A.V.Bunge and C.F.Bunge, Comput. Chem. I0, (1986), or CALL VHQRII (J,1,J,IORD,H,D,MX,V,.TRUE.,IER), published by M.Berrondo, A.V.Bunge & C.F.Bunge, Comput. Chem..10", (1986). CALL MVMPY (J,N,V,N) where MVMPY may be a user supplied sub- routine which produces the vector H = A~V with N being the di- mension of matrix A and V a trial vector. For large matrices A and low order eigenvalues this is the only time consuming step.

Parameter values:

MX=30 is the maximum allowed number of required eigenvectors. CER0,UN0 are trivial constants. BIG=I.D20 is a large floating point number.

Formal arguments:

N (NNL) is the actual dimension of the real symmetric matrix A under consideration. A is accessed indirectly through subroutine MVMP¥. Notice that all variables in the argumentlist have the suffix NL or N indicating non-local status. For better

284 G. CISNEROS et aL

C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C

C

C C C C C C C C C C C

C

C C C

C

C C C C C

efficiency, all non-local variables are converted to local variables immediately upon entering this subroutine.

MI is the index of the lowest elgenvalue to be calculated. If MI=0 the program looks for a predetermined set of elgen- values specified in vector ISELEC.

MP is the index of the highest elgenvalue to be calculated. M must be less than or equal to MX.

NIV is the number of initial vectors V which form a manifold onto which matrix A is first projected thus generating a matrix AV. Note that NIV .GE. MP when MI is greater than zero. The eigenvalues and eigenvectors of AV are estimates for the corresponding ones of A at all stages of the calculation.

ISELEC is a vector of dimension MX specifying the indices of the eigenvalues to be calculated, viz., ISELEC(1)=6, ISELEC(2)=8 for calculating the 6-th and 8-th eigenvalues. ISELEC(I)=0 for I greater than NEIG, the total number of wanted eigen- values. (NEIG is calculated by the program.) The program will accept any order in the values placed in ISELEC, however, users may wish to use a given order to speed up convergence. If ISELEC(1)=0 then it must hold MF .GE. MI and MI .GT. 0.

IORD =-i, elgenvalues are considered in increasing order (usual case in quantum mechanics), IORD .GE. 0 for eigenvalues in decreasing order. Users must always show some common sense to select the initlal set of approximate eigenvectors, noticing that for a given problem different values of IORD require different initial sets.

MOOT is the logical unit number /or output (including diagnostics) within module DVDSON. If NOUT .LE. 0 output is bypassed.

LUN is the logical number of a sequential access external file. On input it contains the initial set of approximate eigen- vectors. On output it holds the flnal eigenvectors.

LUNI is the logical number of a scratch file used by DVDSON. LUN2 is the logical number of another scratch file used by DVDSON. CRITA is a convergence threshold, see below. CRITR Is a convergence threshold. An eigenvalue and corresponding

eigenvector are considered to be converged when either AV(J,M) is less than CRITA (J labels the current iteration), or when the norm of the current residual vector is less than CRITR.

CRITE is a convergence threshold for eigenvalues. Convergence is reached when ABS(VAL-VOLD) is less than CRITE where VAL is the last estimate for a given eigenvalue.

0RTH0 is an orthogonality threshold for current trial vectors V. Suggested value: 0RTH0=I.D-6.

ADIAG is a vector of dimension N containing the diagonal elements of matrix A.

V is a vector of dimension N. On output it contains the last calculated eigenvector.

E is a vector of dimension MX. On input, E(M) may contain an estimate for the M-th eigenvalue. On output, array E contains eigenvalues MI through MF, or in the order specified by array ISELEC.

M is a working vector of dimension N. IERR Completion status indicator.

Implicit input:

Initial set of NIV (preferably approximate) vectors given in sequential access flle LUN. It is forbidden to give as initial vectors a set in terms of which the A matrix has eigenvalues equal to any of its diagonal matrix elements (stored in ADIAG). This precludes the use of a single vector with one component equal to a constant and all others equal to zero. For example, one may use a vector with one component equal to i. and all others equal to 0. except for one of them which may be assigned a value SMALL=I.D-10. Or one may use two vectors each with a (different) component equal to I. and all others equal to zero.

Implicit output:

Sequential access file LUN will contain the calculated eigen- vectors in the same order as the corresponding eigenvalues, one vector per each logical record.

Completion status:

IERR=0 normal successful completion. IERR-33 means that matrix A is a null matrix. IERR=129 indicates a fatal error: the calling program must choose

a recovery path and exit. Suggestions for corrective actions are printed in file NOUT.

IMPLICIT DOUBLE PRECISION (A-H,0-Z)

C~Excluslvely for HP-1000 series computers. C IMPLICIT DOUBLE PRECISION (A-H,0-Z), INTEGER*4 (I-N) C EMA ADIAG,V,H C,End of exclusive section for HP-1000 series computers.

PARAM~T~ (MX=30, MXI=(MX*MX+MX)/2, CER0=0.D0, UN0=I.D0, * BIG=I.24683579D20) DIMENSION ISELEC(MX),AV(MXI),H(MXI),D(MX),U(MX,MX),

* ADIAG(NNL),V(NNL),E(NNL),H(NNL)

DVDSON 285

C C

Validation and initial set up. variables.

N = NNL MI = MINL MF =MFNL NIV - NIVNL IORD = IORDNL NOUT = NOUTNL LUN = LUNNL LUNI = LUNINL LUN2 = LUN2NL

First convert argument variables to local

CRITA = CRITAN CRITR = CRITRN CRITE = CRITEN 0RTH0 = 0RTHON

IF (NIV .GT. MX) THEN P{RITE ( NOUT, * ) ' NIV . GT. MX IN SUBROUTINE DVDSON" IERR=I29 RETURN

END IF IF (MI .EQ. 0) THEN

C If MI=0 wanted order of eigenvalues is in array ISELEC.

M = ISELEC(1) IF (M .EQ. 0) THEN

WRITE (NOUT,*) 'MI AND ISELEC(1) BOTH ZERO IN SUB. DVDSON' IERR=I29 RETURN

ELSE NEIG = 1

END IF DO I0 I=2,MX

IF (ISELEC(I) .NE. 0) THEN NEIG -- NEIG + 1

ELSE GO TO 30

END IF 10 CONTINUE

ELSE

C If MI .NE. 0 eigenvalues of order MI through MF will be computed.

M = MI NEIG = MF - MI + 1 IF (MF .LT. MI) TH~2~

MRITE (NOUT,*) 'MF LESS THAN MI IN SUB. Dg~DSON ' IERR=I29 RETURN

END IF END IF IF (MI .GT. 0) THEN

C Store in ISELEC the order of all eigenvalues to be computed.

DO 20 I=I,NEIG 20 ISELEC(I) = I + MI - 1

END IF 30 LARGE = ISELEC(1)

DO 40 I=2,NEIG 40 IF (ISELEC(I) .GT. LARGE) LARGE - ISELEC(I)

IF (LARGE .GT. NIV) THEN ~RITE (NOUT,*) 'NUMBER OF INITIAL VECTORS SMALLER THAN ORDER'

• ' OF LARGEST EIGENVALUE' IERR=129 RETURN

END IF

C Beginning of the main loop.

REWIND LUN RENIND LUNI REWIND LUN2 VALOLD = E(M) MR = 1 J = 0

50 J = J+ 1 JP = J + 1 JM = J- 1 JW = (J'J-J)/2 IF (J .LE. NIV) READ (LUN) V IF (J .GT. i) THEN

C 0rthogonalization of the J-th approximate elgenvector, C stored in V, to all approximate elgenvectors, stored in M.

SMAX2 = BIG 60 SMAXI = CER0

R~NIND LUNI DO 80 LzI,JM

READ (LUNI) W

286 G. CISNEROS et al.

C * V e c t o r i z a b l e c o d e , S e c t i o n 1 . CALL VDOTD ( N , H , V , S ) SHAXI= HAX(SHAX1,ABS(S) ) S = - S CALL VSHA1D ( N , S , H , V )

C,End of Section i.

C*Vectorized code for HP-1000 series computers, Section i. C CALL DHDOT (S,H,I,V,1,N) C SMAXI= MAX(SMAXI,ABS(S)) C S = -S C CALL DWPIV (S,W,I,V,I,V,I,N) C,End of Section 1 replacement for HP-1000 series computers.

80 CONTINUE IF (SMAXI .GE. 0RTH0) THEN

C If the orthogonalizatlon procedure is non-convergent, go to restart

IF (SMAXI .GT. SMAX2) GO TO 250 SMAX2 = SMAXl GO TO 60

END IF END IF

C Normalization of the J-th approximate elqenvector.

C*Vectorizable code, Section 2. CALL VDOTD (N,V,V,S) S = UN0/SQRT(S) CALL VSMD (N,S,V,V)

C~End of Section 2.

C*Vectorlzed code for HP-1000 series computers, Section 2. C CALL I~DOT (S,V,I,V,I,N) C S = UN0/SQRT(S) C CALL DWSMY (S,V,I,V,I,N) C'End of Section 2 replacement for HP-1000 series computers.

C Hrlte the J-th basis vector in LUNI.

WRITE (LUNI) V

C First calculate H <== A * V in subroutine MVMPY. For matrices A with C many non-zero matrix elements this is the time consuming step.

CALL MVMP¥ (J,N,V,W)

C Save the resulting vector H in LUN2 for later use In the calculation C of the residual vector.

WRITE (LUN2) H

C Computation of the lower triangle of matrix AV, which is the matrix A C expressed in the basis of J (J .GE. NIV) vectors V.

C*Vectorlzable code, Section 3. CALL VDOTD (N,W,V,S)

C,End of Section 3.

C*Vectorlzed code for HP-1000 series computers, Section 3. C CALL DHDOT (S,H,I,V,I,N) C,End of Section 3 replacement for HP-1000 series computers.

AV(JW+J) = S IF (J .GT. i) THEN

RE~IND LUNI DO 130 I=I,JM

READ (LUNI) V

C*Vectorlzable code, Section 4. CALL VDOTD (N,W,V,S)

C,End of Section 4.

C*Vectorized code for HP-1000 series computers, Section 4. C CALL D~DOT (S,W,I,V,I,N) C~End of Section 4 replacement for HP-1000 series computers.

130 AV(JH+I) = S END IF IF (J .LT. NIV) GO TO 50

C Diagonallzatlon of matrix AV, which is the matrix A expressed in the C basis of J (J .GE. NIV) vectors V. Since subroutine HQRIII uses a C rowwlse lower triangle of the corresponding real symmetric matrix and C destroys it, matrix AV is transferred into matrix H.

IF (J .EQ. i) THEN U(I,I) = UN0

VAL = AV(1) E(MR) = VAL

ELSE

D V D S O N 287

C*Vectorizable code, Section 5. CALL C0PYV (JH+J,AV,H) CALL HQRIII (J,I,J,IORD,H,D,MX,U,.TRUE. ,IERR)

C~End of Section 5. (Vectorized version of HORIII is named VHORII. )

C*Vectorlzed code for HP-1000 series computers, Section 5. C CALL DVMOV (AV,I,H,I,JH+J) C CALL VHQRII (J,I,J,IORD,H,D,MX,U,.TRUE. ,IERR) C,End of Section 5 replacement for HP-1000 series computers.

IF (IERR .NE. 0) THEN [4RITE (NOUT,*) 'ERROR IN DIAGONALIZATION' IERR=I29 CLOSE (LUNI, STATUS='DELETE' ) CLOSE (LUN2, STATUS='DELETE') RETURN

END IF VAL = D(M)

END IF

IF (J .EQ. N) THEN

C If J .EQ. N prepare to exit storing the wanted elqenvalues in array E C and the corresponding elgenvectors i n successive records of file LUN.

REMIND LUN IR = 0 DO 170 II=I,NEIG

I = ISELEC( II ) IR = IR + 1

E(IR) = D(I) RI'~I ND LUN1

C*Vectorlzable code, Section 6. CALL ZR01D (N,V)

C,End of Section 6.

C~Vectorlzed code for HP-1000 series computers, Section 6. C CALL D~MOV (CER0,0,V,1,N) C,End of Section 6 replacement for HP-1000 series computers.

DO 160 K-I,J READ (LUNI) M UKI - U(K,I)

C*Vectorlzable code, Section 7. CALL VSMAID (N,UKI,N,V)

C,End of Section 7.

C*Vectorlzed code for HP-1000 series computers, Section 7. C CALL D0~IV (UKI,N,I,V,I,V,I,N) C,End of Section 7 replacement for HP-1000 series computers.

160 coBTINUE RITE (LUN) V

170_ CONTINUE R~IND LUN CLOSE (LUNI, STATUS='DELKTE' ) CLOSE (LUN2, STATUS-' DELETE' ) REHmN

END IF

C Construction of V, the (normalized) M-th approximate eigenvector result- C Ing from the J-th iteration. It is equal to the final elgenvector if C this one turns out to be the last iteration.

C*Vectorlzable code, Section 8. CALL ZR01D (N,V)

CaEnd of Section 8.

CaVectorlzed code for HP-1000 series computers, Section 8. C CALL D~0V (CER0,0,V,I,N) C,End of Section 8 replacement for HP-1000 series computers.

REHIND LUN1 DO 190 I=l,J

READ (LUNI) N UIM = U(I,M)

C*Vectorlzable code, Section 9. CALL VSMAID (N,UIM,H,V)

C,End of Section 9.

C*Vectorlzed code for HP-1000 series computers, Section 9. C CALL DHPIV (UIM,N,1,V,I,V,I,N) C,End of Section 9 replacement for HP-1000 series computers.

190 CONTINUE

C Save approximate eigenvector V Just in case it turns out to be the C converged result.

MROLD = MR

CAC IO /4 -D

288 G. CISNEROS et al.

REWIND LUN DO 200 I=I,MROLD-I

200 READ (LUN) WRITE (LUN) V

ALPHA = ABS(U(J,M)) IF (NOUT .GT. 0) WRITE (NOUT,'C4H J=,I3,LX,gH EIG NO. ,I3,2H -,

* D21.10,5X,6HALE~4A=,DI4.7)') J,M,VAL,~

C Test convergence on ALPHA - ABS(U(J,M)). If the absolute value of the C M-th vector V in the expansion of the current elgenvector is less than C CRITA, move into next elgenvalue and exit if there are no more. C Also, test convergence on the energy. VAL and VALOLD contain the current C and previous approximate elgenvalues, respectively.

IF (ALPHA .LT. CRITA E(MR) = VAL MR= MR+ 1 M = ISELEC(MR) IF

.OR. ABS(VAL-VALOLD) .LT. CRITE) THEN

END END IF

(MR .GT. NEIG) THEN IF (ALPHA .LT. CRITA .AND. NOUT .GT. 0) MRITE (NOUT,'(26H CONVERGENCE ON ALPHA .LT.,GII.3)') CRITA IF (ABS(VAL-VALOLD) .LT. CRITE .AND. NOUT .GT. 0) WRITE (NOUT,'(26H CONVERGENCE ON CRITE .LT.,Gll.3)') CRITE CLOSE (LUNI, STATUS='DELETE') CLOSE (LUN2, STATUS='DELETE') RETURN IF

C N e i t h e r c o n v e r g e n c e t e s t o n VAL n o r o n ALPHA h a s n o t b e e n p a s s e d a t t h e C J-th iteration. Thus calculate the residual vector C H(I) = (A * V)(I) - VAL * V(I) C to check if convergence test on it is fulfilled. Notice that the I-th C component of (A * V) can be evaluated from U(I,M) and the vectors stored C in LUN2.

S = -VAL

C*Vectorlzable code, Section I0. CALL VSMD (N,S,V,W)

C,End of Section I0.

C*Vectorlzed code for HP-1000 series computers, Section I0. C CALL DWSMY (S,V,I,H,1,N) C~End of Section I0 replacement for HP-1000 series computers.

R~7~IND LUN2 DO 220 I=l,J

READ (LUN2) V UIM = U(I,M)

C*Vectorlzable code, Section Ii. CALL VSMAID (N,UIM,V,M)

C,End of Section 11.

C*Vectorlzed code for HP-1000 series computers, Section 11. C ,CALL DWPIV (UIM,V,1,W,I,W,1,N) C,End of Section 11 replacement for HP-1000 series computers.

220 CONTINUE

C Form the norm of the resldual vector.

C*Vectorlzable code, Section 12. CALL VDOTD (N,W,W,RESNOR)

C,End of Section 12.

C*Vectorlzed code for HP-1000 series computers, Section 12. C CALL DHDOT (RESNOR,W,I,W,I,N) C,End of Section 12 replacement for HP-1000 series computers.

RESNOR = SQRT(RESNOR)

IF (NOUT .GT. 0) MRITE (NOUT,'(17H RESIDUAL NORM =,DI4.7)')RESNOR

C If the resldual norm is less than CRITR move Into next elgenvalue and C exit if there are no more.

IF (RESNOR .LT. CRITR) THEN E(MR) = VAL MR= MR+ 1 M = ISELEC(MR) IF (MR .GT. NEIG) THEN

WRITE (NOUT,'(27H CONVERGENCE ON RESNOR .LT. ,GII.3)')CRITR CLOSE (LUNI, STATUS='DELETE') CLOSE (LUN2, STATUS='DELETE') IF (NEIG .EQ. i) THEN

R~RIND LUN READ (LUN) V

END IF RETURN

END IF

DVDSON 289

END IF VALOLD = VAL IF (J .LT. MX) THEN

Convergence has not yet been achieved. Form a trlal (J+l)-th approximate vector, V, which win later be orthonormalized to the previous J vectors.

C*Vectorizable code, Section 13. CALL VDSSVD {N,VAL,M,ADIAG,V)

C,End of Section 13.

C*Vectorlzed code for HP-1000 series computers, Section 13. C CALL DWSSB (VAL,ADIAG,I,V,I,N) C CALL D~DIV (H,I,V,1,V,I,N) C'End of Section 13 replacement for HP-1000 series computers.

GO TO 50 END IF

C Restart procedure. Give as restarting vectors the lowest NIV current C elgenvectors of the MX by MX matrlx AV.

250 RE~IND LUN DO 280 I=I,NIV

RERIND LUNI

C*Vectorizable code, Section 14. CALL ZR01D (N,V)

C,End of Section 14.

C*Vectorized code for HP-1000 series computers, Section 14. C CALL D~0V (CERO,0,V,I,N) C,End of Section 14 replacement for HP-1000 series computers.

DO 270 K=I,J READ (LUNI) W UKI = U(K,I)

C*Vectorizable code, Section 15. CALL VSMAID (N,UKI,W,V)

C,End of Section 15.

C*Vectorlzed code for HP-1000 series computers, Section 15. C CALL DWPIV (UIK,W,I,V,I,V,I,N) C~End of Section 15 replacement for HP-1000 series computers.

270 CONTINUE WRITE (LUN) V

280 CONTINUE R~d I ND LUN REWIND LUN1 DO 290 I=I,NIV

READ (LUN) H 290 WRITE (LUNI) H

REWIND LUN REWIND LUNI IF (NOUT .GT. 0) WRITE (NOUT,'(//15H RESTART DVDSON)') J=0 GO TO 50 END

C ................................. SUBROUTINE MVMPY (JIT,N,V,B)

C .................................

C Purpose: to multiply a real symmetric matrix A (given in rowwlse lower C triangular form) times a vector V yielding a vector B. C JIT, which denotes the current Davldson iteration, is not used C in the present code.

IMPLICIT DOUBLE PRECISION (A-H,0-Z) PARAM~-I~ (NX=256, NXI=(NX*NX+NX)I2, CER0-0.D0) DIMENSION V(N),B(N) COMMON /MATRXA/ A(NXI)

IF (N .GT. NX) THEN PRINT *,' N .GT. NX IN SUBROUTINE MVMP¥' STOP

END IF

I0- 0 DO i0 I-I,N

I0 - I0 + I - 1 IMI - I - 1 CALL VDOTD (IM1,A(I0+I) ,V,StR4) VI - V(I) CALL VSMAID (IMI,VI,A(~0+I) ,B)

10 B(I) - SUM + A(I+I0)*%q

RETURN END

290 G. CISNEROS et aL

C ..................................

SUBROUTINE VSMAID (N,SI,VI,V) C ..................................

IMPLICIT DOUBLE PRECISION (A-H,0-Z) DIMENSION VI(*),V(*)

NL = N SIL= Sl NREST = MOD(NL,6) DO 10 I=I,NREST

i0 V(I ) = SIL~VI(I ) + V(I ) DO 20 I=NREST+I,NL,6

V(I ) = SIL*VI(I ) + V(I ) V(I+I) = SIL*VI(I+I) + V(I+I) V(I+2) = SIL*VI(I+2) + V(I+2) V(I+3) ~ SIL*VI(I+3) + V(I+3) V(I+4) = SIL*VI(I+4) + V(I+4)

20 V(I+5) = SIL*VI(I+5) + V(I+5) RETURN END

C ...............................

SUBROUTINE VSMD (N,S,VI,V) C ...............................

IMPLICIT DOUBLE PRECISION (A-H,0-Z) DIMENSION VI(*),V(*)

NL = N SL = S N'REST = MOD(NL,6) DO i0 I=I,NREST

i0 V(I ) = SL*VI(I ) DO 20 I=NREST+I,NL,6

V(I ) = SL*VI(I ) V(I+I) = SL*VI(I+I) V(I+2) = SL*VI(I+2) V(I+3) = SL*VI(I+3) V(I+4) = SL*VI(I+4)

20 V(I+5) = SL*VI(I+5) RETURN END

C .................................

SUBROUTINE VDOTD (N,VI,V2,S) C ............................ - ....

IMPLICIT DOUBLE PRECISION (A-H,0-Z) PARAM~-¥~.~ (CERO=0.D0) DIMENSION VI(*),V2(*)

NL = N SL = CER0 NREST = MOD(NL,6) DO i0 I=I,NREST

i0 SL = VI(I )*V2(I ) + SL DO 20 I=NREST+I,NL,6

20 SL = VI(I )*V2(I ) + VI(I+I)*V2(I÷I) + VI(I+2)*V2(I+2) * + VI(I+3)*V2(I+3) + VI(I+4)*V2(I+4) + VI(I+5)*V2(I+5) + SL S = SL RETURN END

C ....................................

SUBROUTINE VDSSVD (N,S,V1,V2,V) C ....................................

IMPLICIT DOUBLE PRECISION (A-H,O-Z) DIMENSION VI(*),V2(*),V(*)

NL = N SL = S NREST = MOD(NL,6) DO I0 I=I,NREST

i0 V(I ) = VI(I )/(S-V2(I )) DO 20 I=NREST+I,NL,6

V(I ) = VI(I )/(SL-V2(I ) V(I+I) = VI(I+I)/(SL-V2(I+I) V(I+2) = VI(I+2)/(SL-V2(I+2) V(I+3) = VI(I+3)/(SL-V2(I+3) V(I+4) = VI(I+4)/(SL-V2(I+4)

20 V(I+5) = VI(I+5)/iSL-V2(I+5) RETURN END

C ...........................

SUBROUTINE ZR01D (N,V) C ...........................

IMPLICIT DOUBLE PRECISION (C,V) PARAM~T~ (CER0=0.D0) DIMENSION V(*)

NL = N NREST = MOD(NL,6) DO i0 I=I,NREST

i0 V(I ) = CERO DO 20 I=NREST+I,NL,6

V(I ) = CER0 V(I+I) = CER0

DVDSON 291

V(I+2) = CER0 V(I+3) = CER0 V(I+4) = CER0

20 V(I+5) = CER0 REnmN ~D

C .............................. SUBROUTINE COPYV (N,V1,V~

C .............................. IMPLICIT DOUBLE PRECISION (C,V) DIMENSION VI(*),V(*)

NL = N NREST = MOD(NL,6) DO 10 I=I,NREST

10 V(I ) = VI(I ) DO 20 I-NREST+I,NL,6

V(I ) = VI(I ) V(I+I) = VI(I+I) V(I+2) = VI(I+2) V(I+3) = VI(I+3) V(I+4) - VI(I+4)

20. V(I+5) = Vl(I+5) RETURN ERD