
Ab Initio Geometry Optimization for Large Molecules

FRANK ECKERT, PETER PULAY,* HANS-JOACHIM WERNER
Institut für Theoretische Chemie, Universität Stuttgart, Pfaffenwaldring 55, D-70569 Stuttgart, Germany

Received 19 November 1996; accepted 15 February 1997

ABSTRACT: Various geometry optimization techniques are systematically investigated. The rational function (RF) and direct inversion in the iterative subspace (DIIS) methods are compared and optimized for the purpose of geometry optimization. Various step restriction and line search procedures are tested. The model Hessian recently proposed by Lindh et al. has been used in conjunction with different Hessian update procedures. Optimizations for over 30 molecules have been performed in Z-matrix coordinates, local normal coordinates, and curvilinear natural internal coordinates, using the same approximations for the Hessian in all cases. The most effective and stable procedure for optimization of equilibrium structures was found to be the DIIS minimization in natural internal coordinates using the BFGS update of the model Hessian. Our method shows faster overall convergence than all previously published methods for the same test suite of molecules. © 1997 John Wiley & Sons, Inc. J Comput Chem 18: 1473–1483, 1997

Keywords: geometry optimization; DIIS optimization algorithm; Hessian; natural internal coordinates; rational function

Introduction

One of the most important areas of applied quantum chemistry is the determination of stationary points (equilibrium and transition structures) on potential energy surfaces. The availability of ab initio gradient techniques1-3 has made the development of efficient geometry optimization methods possible (for reviews see Refs. 4-6). For larger systems, ab initio geometry optimization is only practical using analytic gradients and approximate Hessians employing some variant of the quasi-Newton algorithm.7

* Permanent address: Department of Chemistry and Biochemistry, University of Arkansas, Fayetteville, AR 72701, USA
Correspondence to: H.-J. Werner
Contract/grant sponsor: Air Force Office of Research
Contract/grant sponsor: National Science Foundation
Contract/grant sponsor: Alexander-von-Humboldt Foundation

The efficiency of an optimization, measured by the number of energy and gradient evaluations needed to achieve convergence, depends mainly on three factors: (1) the optimization algorithm controlling the step size and direction; (2) the approximation to the Hessian; and (3) the coordinates used to describe the system. Reasonable starting geometries can easily be obtained by graphics-based molecular model builders with subsequent preoptimization by molecular mechanics force-field methods.

Journal of Computational Chemistry, Vol. 18, No. 12, 1473-1483 (1997)
© 1997 John Wiley & Sons, Inc. CCC 0192-8651 / 97 / 121473-11

The most important algorithms for ab initio geometry optimization are the rational function (RF) method,8 the closely related eigenvector following (EF) algorithm,9 and the direct inversion in the iterative subspace (DIIS) method.10 There are many possibilities for approximating the Hessian.4

Widely used are approximations based on empirical force fields, such as valence force fields,11,12 or molecular mechanics force fields.13 Recently, a very simple model Hessian has been proposed by Lindh et al.,14 which depends on the actual geometry and has been shown to be at least as effective as much more elaborate force fields. This model Hessian has been employed in the present work, together with various Hessian update procedures.

During recent years, much of the attention has shifted toward the coordinate system in which the optimization is carried out. The simplest choice is Cartesian coordinates,13,15 but these are the most strongly coupled and work well only if good approximations to the Hessian are available. To reduce the couplings, various internal coordinates have been suggested: curvilinear natural internal coordinates16,17; redundant internal coordinates18,19; and the delocalized internal coordinates of Baker et al.20

In the present article, we compare optimizations in local normal coordinates (defined by the eigenvectors of the approximate Hessian) with curvilinear internal coordinates.16,17 For comparison, some calculations are also performed in Z-matrix coordinates. For purposes of unambiguous comparison, the model Hessian has been transformed into each of the three different coordinate systems studied. Furthermore, the same optimization methods (RF or DIIS) were used for each choice of coordinates.

The following section summarizes the optimization algorithms, coordinates, and Hessian approximations employed in this work. In a later section these methods are systematically tested for a wide variety of molecules. The RF and DIIS methods are found to perform about equally well when combined with the model Hessian of Lindh et al.14 Natural internal coordinates16,17 were found to be more effective than Z-matrix and local normal coordinates.

Methodology

RATIONAL FUNCTION METHOD

For a surface of n independent coordinates, a quadratic Taylor expansion can be used to approximate the energy surface in the neighborhood of a point x_k = (x_1, ..., x_n):

E^(2)(x) = E(x_k) + g_k† s_k + (1/2) s_k† H_k s_k        (1)

The step vector s_k = x − x_k describes a displacement from the reference geometry x_k; g_k and H_k are the gradient vector and Hessian matrix (first and second derivatives of the energy) at x_k; the subscript k refers to the step number. Applying the stationarity condition, ∂E^(2)/∂s_k = 0, to eq. (1) leads to the quasi-Newton (QN) step:

s_k = −H_k^(-1) g_k        (2)

The QN step is only reasonable if the Hessian has the correct eigenvalue structure (i.e., for minimizations all eigenvalues must be positive). If the Hessian is not positive definite, or the QN step is too large, one can modify eq. (1) by adding a term (1/2) λ s_k† s_k. Minimization of the resulting functional leads to a level shift in the denominator:

s_k = −(H_k + λ1)^(-1) g_k        (3)

A sufficiently large value of λ keeps (H_k + λ1) positive definite and restricts the optimization step to a trust region in which the quadratic approximation to the energy surface is reasonable.
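To make eqs. (2) and (3) concrete, the following is a small NumPy sketch (ours, purely illustrative; the model surface and all names are invented for the example):

```python
import numpy as np

def qn_step(H, g, lam=0.0):
    """Quasi-Newton step of eq. (2); with lam > 0 this is the
    level-shifted step of eq. (3): s = -(H + lam*1)^(-1) g."""
    return -np.linalg.solve(H + lam * np.eye(len(g)), g)

# model surface whose Hessian is indefinite (one negative eigenvalue)
H = np.array([[2.0, 0.0],
              [0.0, -0.5]])
g = np.array([0.4, 0.2])

s_qn = qn_step(H, g)              # plain QN step: moves uphill along the soft mode
s_shift = qn_step(H, g, lam=1.0)  # any lam > 0.5 makes (H + lam*1) positive definite
assert np.all(np.linalg.eigvalsh(H + 1.0 * np.eye(2)) > 0)
```

The shifted step points downhill in every eigendirection, at the price of being shorter than the pure Newton step.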

A natural choice of λ is provided by the rational function (RF) method.8,9 In the simplest version of this method, the linear equation, eq. (3), is replaced by an eigenvalue equation:

[ H_k    g_k ] [ s_k ]         [ s_k ]
[ g_k†    0  ] [  1  ]  = λ_k  [  1  ]        (4)

The matrix on the left-hand side of eq. (4) is the augmented Hessian matrix of dimension n + 1. Resolving eq. (4) yields:

λ_k = g_k† s_k        (5)

g_k + (H_k − λ_k 1) s_k = 0        (6)

Thus, eq. (6) is equivalent to eq. (3) with λ given by eq. (5). Note that λ_k is negative in general, as the step direction is roughly opposite to the direction of the gradient.
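The RF step of eqs. (4)-(6) amounts to one symmetric eigenvalue problem per iteration. A minimal NumPy sketch (our own illustration, not code from any program mentioned in the text):

```python
import numpy as np

def rf_step(H, g):
    """Rational function step: lowest eigenpair of the
    (n+1)-dimensional augmented Hessian of eq. (4)."""
    n = len(g)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = H
    A[:n, n] = A[n, :n] = g
    lams, vecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    lam, v = lams[0], vecs[:, 0]     # lowest eigenvalue for minimization
    s = v[:n] / v[n]                 # scale the last element to 1
    return s, lam

H = np.diag([1.0, 4.0])
g = np.array([0.3, -0.8])
s, lam = rf_step(H, g)
```

Note that the returned pair satisfies eqs. (5) and (6) by construction, and the level shift comes out negative, as stated above.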


For minimizations, λ_k is chosen as the lowest eigenvalue of the augmented Hessian matrix. The corresponding eigenvector gives, after scaling its last element to 1, the optimization step. This choice of λ_k ensures that (H_k − λ_k 1) is positive definite, but it does not guarantee convergence. If the Hessian has small eigenvalues, the RF step may become very large and overshoot the minimum. It is then necessary to restrict the step size. There are several ways of doing this.

(i) Scaling of the whole step vector if its norm exceeds a given threshold (|s_k| < smax).

(ii) Restricting the values of each component to a maximum value (smax).

(iii) A more sophisticated dynamical step scaling procedure is obtained by including the constraint |s_k| < smax in the minimization. This leads to a modified augmented Hessian8:

[ H_k/α    g_k ] [ s_k  ]         [ s_k  ]
[ g_k†      0  ] [ 1/α  ]  = λ_k  [ 1/α  ]        (7)

where α determines the Lagrange multiplier λ_k:

λ_k = α g_k† s_k        (8)

g_k + (H_k − α λ_k 1) s_k = 0        (9)

If the length of the step vector exceeds the threshold smax, then α is increased and the step is recalculated from eq. (7) until the condition |s_k| ≤ smax is fulfilled. In contrast to a simple scaling, this method also modifies the direction of the step vector. For large α it approaches the steepest descent method. Thus, convergence is guaranteed by an appropriate choice of smax. This ''step restricted augmented Hessian method''7 has been used successfully in MCSCF optimization techniques.21,22 We find, however (see the ''Limiting the Optimization Step'' subsection), that, despite its theoretical justification, this method does not improve the convergence of geometry optimization in our examples. In fact, the simplest method (i.e., restricting all individual components of the step vector s_k to a maximum value) usually leads to the best overall convergence.
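The three restriction schemes (i)-(iii) can be contrasted in a few lines. The sketch below is our own illustration; the dynamical scheme simply doubles α and re-solves eq. (7) until the step is short enough:

```python
import numpy as np

SMAX = 0.3  # threshold in a0 or radians

def scale_norm(s, smax=SMAX):
    """(i) scale the whole vector if its norm exceeds smax."""
    norm = np.linalg.norm(s)
    return s * (smax / norm) if norm > smax else s

def clip_components(s, smax=SMAX):
    """(ii) cut back individual components -- the scheme found best here."""
    return np.clip(s, -smax, smax)

def dynamic_step(H, g, smax=SMAX):
    """(iii) increase alpha in the modified augmented Hessian, eq. (7),
    until |s| <= smax; for large alpha this approaches steepest descent."""
    n = len(g)
    alpha = 1.0
    for _ in range(100):
        A = np.zeros((n + 1, n + 1))
        A[:n, :n] = H / alpha
        A[:n, n] = A[n, :n] = g
        v = np.linalg.eigh(A)[1][:, 0]       # lowest eigenvector
        s = v[:n] / (v[n] * alpha)           # eigenvector is (s, 1/alpha) up to scale
        if np.linalg.norm(s) <= smax:
            return s
        alpha *= 2.0
    return s

# a step with one overly large component:
s = np.array([1.0, 0.1])
assert np.allclose(clip_components(s), [0.3, 0.1])  # only the big component is cut back
```

Scheme (ii) leaves the acceptable components untouched, which is exactly the property the text credits for its good overall convergence.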

LINE SEARCH

An alternative way of improving the stability of an optimization is to accept only the direction of the step and to determine the step length by locating the minimum between the current geometry x_k and the previous geometry x_{k-1} = x_k − s_{k-1}. For approximately quadratic surfaces, such a line search procedure is valid if g_{k-1}† s_{k-1} < 0 and g_k† s_{k-1} > 0 (i.e., a minimum exists on the line between the two points). Exact line search procedures minimize the energy along s_{k-1}. Due to the numerous energy calculations needed, such procedures are not efficient in ab initio calculations. Partial line search procedures try to interpolate the energy minimum between the two points. The algorithm examined in this work is based on a proposal by Schlegel.23 It seeks the minimum along a quartic polynomial built from the energies and gradients at x_k and x_{k-1}, with the auxiliary condition ∂²E/∂x² ≥ 0 at both points to ensure that the polynomial has only one minimum. The geometry, energy, and gradient interpolated in this manner are used to determine the new optimization step s_k.
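A partial line search needs only the two energies and the two projected gradients that are already available. The sketch below is a deliberate simplification of ours: it fits a cubic (rather than Schlegel's constrained quartic) to E(0), E'(0), E(1), E'(1) along the previous step and returns its interior minimum:

```python
import numpy as np

def cubic_line_search(e0, g0, e1, g1):
    """Interpolated minimum at t in (0, 1) between x_{k-1} (t = 0) and x_k (t = 1).
    e0, e1: energies; g0, g1: gradients projected on the step direction s_{k-1}.
    Valid only if g0 < 0 < g1 (a minimum lies between the two points)."""
    assert g0 < 0 < g1
    # fit E(t) = e0 + g0*t + c*t^2 + d*t^3 to E(1) = e1 and E'(1) = g1
    d = g1 + g0 - 2.0 * (e1 - e0)
    c = (e1 - e0 - g0) - d
    # stationary points: g0 + 2c*t + 3d*t^2 = 0
    if abs(d) > 1e-12:
        roots = np.roots([3.0 * d, 2.0 * c, g0])
        roots = roots[np.isreal(roots)].real
    else:
        roots = np.array([-g0 / (2.0 * c)])
    # keep the root in (0, 1) where the curvature is positive
    ok = [t for t in roots if 0.0 < t < 1.0 and 2.0 * c + 6.0 * d * t > 0.0]
    return ok[0]

# symmetric well: the interpolated minimum lies halfway between the points
t = cubic_line_search(e0=0.0, g0=-1.0, e1=0.0, g1=1.0)
assert np.isclose(t, 0.5)
```

Schlegel's quartic additionally enforces nonnegative curvature at both endpoints, which makes the interpolation more robust far from the minimum.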

GEOMETRY DIIS

The DIIS method10 attempts to find an optimum linear combination of the previous geometries. The basic assumption of the DIIS method is that the error vector, e*, which is a measure of the deviation of the actual geometry from the minimum, is a linear function of the geometry, and thus can be approximated by a linear combination of the error vectors, e_1, e_2, ..., e_k, obtained in the previous iterations 1 ... k:

e* = Σ_{i=1}^{k} c_i e_i        (10)

As discussed in what follows, several choices are possible for the vectors e_i (e.g., the gradient). The best linear combination of geometries is determined by minimizing the norm of the interpolated error vector e* with the auxiliary condition Σ_i c_i = 1. Using the Lagrangian method of constrained optimization yields an inhomogeneous system of k + 1 linear equations for the coefficients c_i:

[ A     1 ] [  c  ]     [ 0 ]
[ 1†    0 ] [ −λ  ]  =  [ 1 ]        (11)

where A_ij = e_i† e_j is the DIIS matrix built from scalar products of the error vectors, 1 is a vector of length k with all elements 1, and λ is the Lagrangian multiplier. The linear equations may be conveniently solved by diagonalization of the DIIS matrix and subsequent renormalization of the coefficient vector c, such that Σ_i c_i = 1. By monitoring its eigenvalues, one can detect approximate linear dependencies in the DIIS matrix A (A is scaled such that the last diagonal element becomes unity). In the case of linear dependency, the largest error vectors (i.e., the most remote from the actual geometry) are successively removed from the DIIS matrix until A becomes reasonably well conditioned. The coefficients c_i obtained from eq. (11) are used to interpolate the geometry and gradient of the system:

x*_k = Σ_{i=1}^{k} c_i x_i        (12)

g*_k = Σ_{i=1}^{k} c_i g_i        (13)

The new geometry is then obtained by taking a quasi-Newton optimization step using the interpolated geometry and gradient:

x_{k+1} = x*_k − H_k^(-1) g*_k        (14)
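One GDIIS iteration thus consists of building the DIIS matrix from the stored error vectors, solving eq. (11), interpolating via eqs. (12)-(13), and relaxing with eq. (14). A compact sketch (entirely illustrative, using gradients as error vectors and a plain QN relaxation step):

```python
import numpy as np

def gdiis_step(xs, gs, Hinv):
    """xs, gs: previous geometries and gradients (most recent last).
    The gradients serve as error vectors e_i = g_i; returns x_{k+1}, eq. (14)."""
    k = len(xs)
    E = np.array(gs)                       # error vectors, shape (k, n)
    B = np.zeros((k + 1, k + 1))
    B[:k, :k] = E @ E.T                    # A_ij = e_i . e_j
    B[:k, k] = B[k, :k] = 1.0
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    c = np.linalg.solve(B, rhs)[:k]        # eq. (11); the c_i sum to 1
    x_int = c @ np.array(xs)               # eq. (12)
    g_int = c @ np.array(gs)               # eq. (13)
    return x_int - Hinv @ g_int            # eq. (14)

# on an exactly quadratic surface E = x.H.x/2 the relaxed geometry is the minimum
H = np.diag([1.0, 3.0])
xs = [np.array([1.0, 1.0]), np.array([0.5, -0.2])]
gs = [H @ x for x in xs]
assert np.allclose(gdiis_step(xs, gs, np.linalg.inv(H)), 0.0)
```

In a real optimizer the relaxation step would additionally be scaled and component-restricted as described in the text.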

Alternatively, geometry relaxation can be achieved by performing an RF step as described previously using x*_k and g*_k. The QN or RF relaxation steps s*_k = x_{k+1} − x*_k were scaled if their norm exceeded 0.3 (measuring bond lengths in a_0 and angles in radians). Additionally, the individual components of the entire GDIIS step (i.e., interpolation plus relaxation), s_k = x_{k+1} − x_k, were restricted to a maximum value (0.3 a_0 or radians).

The convergence properties of the GDIIS optimization depend on the choice of error vectors that determine the DIIS matrix A. In the original work by Császár and Pulay,10 the error vectors were approximated by quasi-Newton steps built from the approximate Hessian matrix H_k and the previous gradients g_i of the optimization, e_i = −H_k^(-1) g_i and A_ij = g_i† (H_k^(-1))† H_k^(-1) g_j, corresponding to an interpolation in the subspace of the optimization steps (geometries). In practice, we found that optimizations using this interpolation scheme showed oscillatory behavior in some of the test cases and thus converged slowly. Alternative approaches are DIIS interpolations that attempt to minimize the energy (i.e., A_ij = g_i† H_k^(-1) g_j) or the gradients (i.e., A_ij = g_i† g_j). One would expect the gradient interpolation to perform better with stiff molecules, because strong bonds that have large gradient components are weighted strongly, whereas the interpolation of the steps or the energy should perform better with flexible molecules. We also tested an approximate energy interpolation scheme using the diagonal elements of the Hessian restricted to the range 0.5 ≤ H_ii ≤ 3.0 aJ/Å. The purpose of this restriction is to prevent the interpolation from weighting torsional motions with small force constants too much, and stiff bonds too little. For most cases we find that the simplest method, using the gradient for the error vector, yields the fastest convergence (cf. next section).

HESSIAN MATRIX

Another highly important factor for the convergence behavior of geometry optimizations is the approximation to the Hessian matrix, which determines the harmonic (second order) coupling between the coordinates. A good approximation to the Hessian matrix will accelerate the convergence of any optimization. Calculation of the exact analytical or numerical Hessian is computationally costly [≈ O(n) times the energy and gradient calculations] and is usually not economical in ab initio optimizations on ground-state molecules.

For the latter, the usual procedure is to calculate an approximate Hessian from a molecular force field and improve it by an updating procedure. A promising alternative to this approach is the ''model Hessian'' recently proposed by Lindh et al.14 The model Hessian is a very simple force-field approximation derived from only 15 parameters, which incorporates quadratic contributions from bond stretchings, angle bendings, and torsional angle bendings. The model Hessian depends on the geometry and is recalculated in every optimization step. As shown in Ref. 14, it outperforms better approximations to the Hessian, which are calculated only once at the start of the optimization. In addition, it is useful to apply updating to the model Hessian.

Updating procedures try to improve an approximate Hessian matrix using the geometries and gradients generated during the optimization history (for a review see Ref. 4). Two methods are widely used for energy minimizations: the BFGS update24-27 and the conjugate gradient update,28 as modified by Schlegel.23 The latter algorithm excludes points from the update if they are too far from the actual point or if the geometries are linearly dependent, avoiding updates with ''chemically insignificant'' geometries.23 It also reorders the optimization's history by decreasing energy.


The BFGS update uses just the last m geometries and gradients. The BFGS method has the tendency to keep the Hessian positive definite, which is very useful in minimizations, because negative Hessian eigenvalues may lead to uphill steps on the PES.
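The BFGS update itself is standard24-27: with the differences Δx = x_k − x_{k-1} and Δg = g_k − g_{k-1}, the new Hessian is H' = H + ΔgΔg†/(Δg†Δx) − HΔxΔx†H/(Δx†HΔx). A sketch of a limited-memory variant in the spirit of the text, keeping only the last m points (the loop structure and the skip condition are our own choices):

```python
import numpy as np

def bfgs_update(H0, xs, gs, m=5):
    """Apply BFGS updates from the last m geometry/gradient pairs to H0."""
    H = H0.copy()
    pairs = list(zip(zip(xs, gs), zip(xs[1:], gs[1:])))[-m:]
    for (x0, g0), (x1, g1) in pairs:
        dx, dg = x1 - x0, g1 - g0
        if dg @ dx <= 0.0:        # skip: update would spoil positive definiteness
            continue
        Hdx = H @ dx
        H += np.outer(dg, dg) / (dg @ dx) - np.outer(Hdx, Hdx) / (dx @ Hdx)
    return H
```

After each update the secant condition H' Δx = Δg holds exactly, and a positive definite H stays positive definite as long as Δg†Δx > 0, which is the property referred to above.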

COORDINATE SYSTEMS

The choice of the coordinate system is an important factor for the convergence of geometry optimizations. For strictly quadratic surfaces (i.e., very close to the minimum), all nonsingular coordinate systems are equivalent, provided that the Hessian is appropriately transformed. Any optimization algorithm will run into difficulties in the presence of large anharmonic (cubic and higher) couplings. Simple Hessian estimates (e.g., a diagonal one) work only for weakly coupled coordinates. The coordinate system should therefore be chosen such that coupling terms between coordinates in the potential function are minimized.

Cartesian coordinates, widely used in molecular mechanics and modeling programs, are the simplest choice, but they are highly coupled and also include redundant translations and rotations. Curvilinear constraints can be imposed only approximately in Cartesian optimizations.29,30 However, if a good approximation to the Hessian is available, Cartesians work well, as shown by Baker13,15 and as expected from the previous discussion.

Internal coordinates are more appealing from the chemist's point of view because they employ quantities such as bond lengths and bond angles. Much work in geometry optimization was done in Z-matrix coordinates, which represent a simple nonredundant internal coordinate system containing just bond lengths, bond angles, and dihedral angles. However, setting up reasonable Z-matrices (i.e., ones with low couplings) can be quite complicated for large molecules, especially for ring systems, and it cannot easily be automated.

If a good approximation to the Hessian is available (e.g., from molecular mechanics), one can create a set of local normal coordinates in the frame of the eigenvectors of that Hessian. The advantage of these coordinates is their simple automatic generation and low harmonic couplings. Local normal coordinates are linear combinations of bond lengths and angles and, therefore, do not provide the same vivid picture to the chemist as individual coordinates like Z-matrices.

Curvilinear internal coordinates have been used in the earliest gradient geometry optimizations, but the effort needed to set them up limited their usage. However, the superior convergence they can potentially provide has recently shifted more attention toward them.13,17,18 The ''natural internal coordinates,''16 introduced long ago, minimize both harmonic and anharmonic couplings. They resemble the coordinates used by vibrational spectroscopists31 and are built from bond stretchings and linear combinations of bond angles and torsional angles, employing local pseudosymmetry around each atomic center. Their construction is quite complicated, but automatic programs for it have recently become available.17,32 In certain cases it is difficult to avoid using more than 3N − 6 natural internal coordinates. Such redundancies can be handled by a generalization of the usual optimization procedures.19 Ultimately, one can use a heavily redundant full set of bond lengths, bond angles, torsional angles, and out-of-plane displacements,18,20 avoiding the construction of the natural internals. This method has not been tested in the present work. It is important to ensure that the definition of the internal coordinates stays constant in the course of the optimization. In the present work, we always use the coordinates determined at the starting geometries.

To compare optimizations in different coordinate systems on an equal footing, one must use equivalent approximations to the Hessian matrix. Our program transforms a given gradient and Hessian from Cartesian coordinates to either Z-matrix coordinates, local normal coordinates, or natural internal coordinates. In most cases, the model Hessian of Lindh et al.14 was used. The Hessian update was performed in internal coordinates (i.e., after the transformation was carried out).

Results and Discussion

The methods outlined above have been incorporated into the MOLPRO33 package of ab initio programs. To compare the methods systematically, we used a test suite proposed by Baker.13 This test suite comprises 30 molecules with 2 to 81 internal degrees of freedom. All optimizations were carried out at the restricted Hartree-Fock (RHF) level employing the STO-3G basis set. Initial Cartesian geometries were taken from Ref. 13. The units for internal coordinates are atomic units (a_0) for stretchings and radians for angles. Units for energy are atomic units (E_h). Unless noted otherwise, the magnitudes of the components in the optimization step were restricted to 0.3 (a_0 or radians) or less in internal coordinates. The optimization was terminated when the maximum gradient component in internal coordinates was less than 0.0003 (E_h/a_0 or E_h/rad) and either the energy change from the previous cycle was less than 10^-6 E_h or all components of the optimization step in internal coordinates were smaller than 0.0003. These convergence criteria were also used by Baker13 and Lindh et al.14
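Expressed in code, this termination test reads as follows (a paraphrase of the criterion with the thresholds given above, not the authors' implementation):

```python
def converged(g_max, de, s_max,
              g_tol=3e-4, e_tol=1e-6, s_tol=3e-4):
    """Baker-style criterion: the maximum gradient component must be below
    g_tol AND either the energy change is below e_tol OR the maximum step
    component is below s_tol.  Units: E_h/a0 (or E_h/rad) for gradients,
    E_h for energies, a0 (or rad) for steps."""
    return g_max < g_tol and (abs(de) < e_tol or s_max < s_tol)

# converged via small gradient and small energy change:
assert converged(g_max=1e-4, de=5e-7, s_max=1e-2)
# not converged: gradient still too large
assert not converged(g_max=1e-3, de=1e-8, s_max=1e-5)
```

The OR between the energy and step conditions is what lets the criterion handle both rigid and floppy molecules, as discussed in the ''Convergence Criteria'' subsection below.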

COORDINATE SYSTEMS

Table I shows the number of iterations needed to optimize the equilibrium geometries of Baker's test set of 30 molecules.13 For comparison, this

TABLE I.
Comparison of Number of Steps Required to Optimize Equilibrium Geometries.

                               Baker^b    Lindh et al.^c       LNC^d           Nat. internal^d
Molecule^a                       EF         RF    DIIS        RF    DIIS^e      RF    DIIS^e
Water                             5          4      4           4      4          4      4
Ammonia                           6          5      7           5      5          6      6
Ethane                            4          4      4           4      4          4      4
Acetylene                         6          5      5           5      5          6      6
Allene                            5          5      4           5      5          4      5
Hydroxysulfane                   11          8     11           7      9          7      7
Benzene                           4          3      3           3      3          3      3
Methylamine                       5          5      5           5      5          5      5
Ethanol                           6          5      5           5      5          5      5
Acetone                           7          5      5           5      5          5      5
Disilylether                     10         11     11          10     12          9      9
1,3,5-Trisilacyclohexane          8          8      8           8      8          6      6
Benzaldehyde                      6          5      5           5      5          5      5
1,3-Difluorobenzene               5          5      5           5      5          5      5
1,3,5-Trifluorobenzene            5          4      4           4      4          4      4
Neopentane                        5          5      4           4      4          4      4
Furan                             8          7      6           6      7          6      6
Naphthalene                       5          6      6           6      6          6      6
1,5-Difluoronaphthalene           6          6      6           6      6          6      6
2-Hydroxybicyclopentane          15         10      8           9      9          9^f    9^f
ACHTAR10                         11          8      8           8      8          9      8
ACANIL01                          7          8      8           8      7          8      8
Benzidine                        10         10     10           8      8          7      8
Pterin                            9          9      9           9      8          9      9
Difuropyrazine                    8          7      7           7      7          7      7
Mesityloxide                      7          6      5           5      5          6      6
Histidine                        30         20     44          23     28         14     14
Dimethylpentane                   9         10      9          10     10         10      9
Caffeine                         10          7      7           7      7          7      7
Menthone                         14         14     14          13     14         10     10
Total                           240        215    237         209    218        196    196

^a Starting geometries given in Ref. 13.
^b Ref. 13 using a molecular mechanics Hessian in natural internal coordinates and the eigenvector-following (EF) optimization algorithm,9 which is very similar to the RF method.
^c Ref. 14 using their model Hessian with BFGS update and local normal coordinates.
^d This work, using the model Hessian of Lindh et al.14 with BFGS update. LNC = optimization in local normal coordinates; Nat. internal = optimization in natural internal coordinates.
^e Geometry DIIS algorithm (interpolation in the subspace of the gradients).
^f Converged to a higher final energy structure than Ref. 13 (E = −265.46237035 E_h).


table also includes the best results of Baker and Lindh et al.14 For each molecule, four calculations are presented: minimization with the rational function or the geometry DIIS methods in either local normal or natural internal coordinates. In all cases, the Hessian matrix was approximated by the model Hessian of Lindh et al.,14 which was updated in internal coordinates using the standard BFGS procedure.

Using natural internal coordinates, one molecule (2-hydroxybicyclopentane) converged to a higher local energy minimum (endo form) than in Ref. 13 (exo form). The optimizations in local normal coordinates were essentially the same as in Lindh's calculations, with minor differences due to different restrictions on the step length. Table I shows that, measured by the total number of iteration steps, natural internal coordinates perform somewhat better than local normal coordinates.

Table II compares optimization in Z-matrix coordinates with local normal and natural internal coordinates for eight medium-sized cyclic molecules, used by Schlegel34 to compare Z-matrix, Cartesian, and mixed coordinate optimizations. The Z-matrix coordinates are stated to have minimal couplings. Nevertheless, Table II shows that Z-matrix coordinates are inferior to the other two coordinate systems. Overall, natural internal coordinates show the best performance. As expected, the performance of all coordinate systems is similar in those cases where the starting geometry is good, as shown by the low number of iterations (≤ 10) needed for convergence. In the more difficult cases, the differences are larger.

APPROXIMATIONS TO THE HESSIAN MATRIX

Table I shows that the use of Lindh's model Hessian, which depends on the geometry and is recomputed in every iteration, is superior to calculating the Hessian matrix once by a molecular mechanics force field and using it throughout the optimization.13 The total number of optimization cycles (196 for the RF method) was considerably smaller than in previous work (240 steps using the eigenvector method13).

Different updating schemes are compared in Table III. It is readily seen that updating improves convergence considerably. This is especially true in local normal coordinates, which are more strongly coupled than the natural internal coordinates and thus depend more critically on the approximation of the Hessian matrix. For all four cases, the standard BFGS update24-27 performed much better than the conjugate gradient (CG) update.23 The problem of the CG update seems to be the reordering of the optimization steps by decreasing energy. If an optimization step leads to an increase in the energy it is not used in the update, so no new information is incorporated into the Hessian, leading in some cases to infinite oscillation between two geometries. Although in most cases the CG update behaves similarly to the BFGS, this oscillatory behavior, occurring for a few molecules, led to its poorer overall performance.

TABLE II.
Comparison of Number of Steps Required to Optimize Equilibrium Geometries Using Various Coordinate Systems.^a

                                Z-matrix^b       Local normal      Nat. internal
Molecule                         RF    DIIS       RF    DIIS        RF    DIIS
2-Fluorofuran                     7      7          7      7          6      6
Norbornane                       13     13          6      5          5      5
Bicyclo[2,2,2]octane             16     21         13     15         12     13
Bicyclo[3,2,1]octane              6      6          6      6          6      6
2-Hydroxybicyclopentane           7      8          9      9          8      8
ACTHCP                           40     40         24     24         29     22
Histamine H+                     40    >50         34    >50         21     23
1,4,5-Trihydroxyanthroquinone     8      8          7      7          7      7
Total                           142   >153        106   >123         94     91

^a Algorithms and parameters as used in Table I. The ''>'' signs indicate that some molecules failed to converge within 50 optimization cycles (the maximum allowed).
^b Starting geometries given in Ref. 34.

TABLE III.
Total Number of Optimization Cycles in RF Minimizations Using Different Updating Schemes to the Hessian Matrix.^a

                    Local normal        Nat. internal
Update               RF    GDIIS         RF    GDIIS
Diag.^b              --      --         219     222
None               >329     250         300     218
Conj. grad.^c      >328    >406        >314     216
BFGS                209     218         196     196

^a The update was restricted to the use of five geometries and gradients. Unless noted otherwise, the Hessian matrix was approximated by the model Hessian of Lindh et al.14 The ''>'' signs indicate that some molecules failed to converge within 50 optimization cycles (the maximum allowed).
^b Hessian matrix approximated by a diagonal force field without update. Only available in natural internal coordinates (see Ref. 16).
^c Conjugate gradient update of Schlegel (see Ref. 23).

It is wise to restrict the number of geometries included in the BFGS update to prevent the use of information from steps too far away from the actual point, which may be misleading. We found the inclusion of the last five geometries in the updating procedure to be a good compromise. Although in some cases the inclusion of more geometries leads to faster convergence, there are cases (e.g., molecules with very flexible modes, and therefore large step sizes) in which the inclusion of too many geometries is dangerous.

The algorithm which sets up the natural internal coordinates also provides diagonal force constants as a guess to the Hessian matrix,17 using a simple valence force field.31 As seen in Table III, this approximation worked quite well in spite of its simplicity and the fact that it is not updated, probably because it is tailored to the natural internal coordinates.

LIMITING THE OPTIMIZATION STEP

Table IV shows the performance of RF optimizations in different coordinate systems using various methods of restricting the step length. These procedures were activated if the step norm exceeded the threshold of 0.3 (a_0 or radians). Lower thresholds always slowed down convergence. With higher thresholds the optimization may show oscillatory behavior (see below). Unexpectedly, the best performance was achieved with the simplest step limiting method, by cutting back those components of the step vector that exceeded the threshold. Step scaling, which affects all components (scaling of the step norm or the dynamical procedure), showed a slightly worse performance, indicating that if some components of the step vector are too large, the rest of them may still be reasonable and should not be scaled. In some cases, convergence was best without any restriction on the step size. Nevertheless, we do not recommend omitting the step restriction, because the Hessian may occasionally have a very small eigenvalue, leading to very large steps and thus far overshooting the minimum.

LINE SEARCH

Combination of the RF method with a line search procedure did not improve convergence of the optimizations; in natural internal coordinates they converged even more slowly (see Table IV). The fact that additional interpolation procedures do not yield improved performance of geometry optimizations indicates that the RF method generally provides a

TABLE IV.
Total Number of Optimization Cycles in RF Minimizations with Model Hessian14 and BFGS Update Using Different Step Scaling and Line Search Methods.

                                        Step scaling
Coordinates         Line search    None   Stepmax   Stepnorm   Dynamic
Local normal        None            209     209       212        224
Local normal        Partial         209     210       211        214
Natural internal    None            200     196       198        202
Natural internal    Partial         206     206       209        210


reasonable step toward the minimum and does not need further improvement (except for simple step scaling).

GEOMETRY DIIS

Table V compares GDIIS optimizations with different definitions of the error vector. Gradients attach more weight to stiff degrees of freedom, whereas estimated errors in the geometry, −H_k^(-1) g_k, weigh floppy coordinates more strongly. Using estimated energy lowerings (i.e., the scalar products of geometry steps and gradients) in the DIIS matrix, eq. (11), is intermediate between these. Using Baker's test suite, which includes both stiff and flexible molecules, the best overall performance was achieved using gradients.

The number of error vectors included in the DIIS procedure is also important. Error vectors which are too far away from the actual geometry are not linear in the geometry and, by contributing misleading information, slow down the optimization. On the other hand, the quality of the DIIS interpolation improves with the number of vectors included. We found the use of the gradients of the last five geometries to be a reasonable compromise which yielded the best convergence for most of the test cases.

We have also compared two methods for the relaxation of the geometry (i.e., the optimization step which is made using the interpolated geometry and gradient). A simple quasi-Newton step, as originally proposed,10 was found to be less effective than a rational function (RF) step (see Table V).

To summarize, the most effective GDIIS interpolation was obtained with a DIIS matrix constructed from, at most, five gradient vectors of the previous geometries. The new geometry was then predicted using the RF method from the interpolated geometry and gradient vectors. Using natural internal coordinates, the RF and DIIS methods performed about equally well, whereas, in local normal coordinates, the RF method appears to be superior (cf. Tables I and II). On the other hand, Table III shows that the GDIIS method is much less sensitive to the Hessian update procedure. One may therefore argue that GDIIS is the more stable optimization technique.
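A minimal sketch of the GDIIS interpolation in the variant favored above (gradients as error vectors, at most five previous geometries) may help clarify the procedure. This is a generic illustration, not the MOLPRO code:

```python
import numpy as np

def gdiis_interpolate(geoms, grads, nmax=5):
    """GDIIS interpolation using gradients as error vectors.
    Keeps at most the nmax most recent (geometry, gradient) pairs,
    builds the DIIS matrix B_ij = g_i . g_j, and solves for the
    coefficients c_i that minimize |sum_i c_i g_i| subject to the
    constraint sum_i c_i = 1 (via a Lagrange multiplier).
    Returns the interpolated geometry and gradient."""
    geoms, grads = geoms[-nmax:], grads[-nmax:]
    m = len(grads)
    B = np.empty((m + 1, m + 1))
    for i in range(m):
        for j in range(m):
            B[i, j] = np.dot(grads[i], grads[j])
    B[:m, m] = 1.0          # constraint column
    B[m, :m] = 1.0          # constraint row
    B[m, m] = 0.0
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    c = np.linalg.solve(B, rhs)[:m]
    x_int = sum(ci * xi for ci, xi in zip(c, geoms))
    g_int = sum(ci * gi for ci, gi in zip(c, grads))
    return x_int, g_int
```

The relaxation step is then applied to the interpolated pair (x_int, g_int): either a quasi-Newton step, x_new = x_int - H^-1 g_int, or, as recommended here, a rational function step.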

CONVERGENCE CRITERIA

All calculations reported so far were carried out using Baker's convergence criterion.13 Baker uses the maximum component of the gradient together with the maximum component of the step or the energy decrease between the previous two geometries (see left scheme in Fig. 1). This criterion can handle rigid systems, in which small changes in the coordinates cause strong changes in the energy (convergence due to small gradient and step), as well as flexible molecules, which show relatively large displacements with small (chemically insignificant) energy changes (convergence due to small gradient and energy change).

We propose another scheme, which is very close to Baker's criterion. It provides virtually the same energy in all optimizations (deviations were below 10^-7 E_h), but often saves a gradient calculation as compared with Baker's scheme. We simply check for convergence after the energy calculation, but before the gradient evaluation (see right scheme in Fig. 1). As seen in Table VI, this reduces the total number of optimization cycles from 196 to 185 for the rational function optimization in natural internal coordinates using a BFGS-updated model Hessian. The somewhat weaker convergence criterion used in Gaussian18,35 reduces the total number of

TABLE V.
Total Number of Optimization Cycles in Geometry DIIS Minimizations with Model Hessian and BFGS Update Using Different Approximations to the DIIS Matrix.

                   Gradients          Scaled energy^a    Energy^b           Steps
                   g_i† g_j           g_i† H_k^-1 g_j    g_i† H_k^-1 g_j    g_i† H_k^-1 H_k^-1 g_j
Coordinates        RF^c      QN^d     RF^c               RF^c               RF^c
Local normal       218       284      218                219                233
Natural internal   196       219      197                197                205

a Only diagonal elements of the actual Hessian, scaled to 0.5 <= H_ii <= 3.0 aJ/Å² or aJ/rad², were used.
b All elements of the actual Hessian were used.
c Relaxation: rational function step.
d Relaxation: quasi-Newton step.


TABLE VI.
Comparison of Total Number of Steps Required to Optimize Equilibrium Geometries Using Various Convergence Criteria.

                     Baker^a   Lindh^b   Schlegel^c   MOLPRO^d
Baker criterion      240       215       —            196
MOLPRO criterion     —         —         —            185
Gaussian criterion   —         199       183          183

a Ref. 13: EF optimization in natural internal coordinates using a molecular mechanics Hessian.
b Ref. 14: RF optimization in local normal coordinates using the model Hessian.
c Refs. 18 and 35: restricted quasi-Newton step in redundant internal coordinates using an empirical force-field Hessian.
d This work: RF optimization in natural internal coordinates using the model Hessian of Lindh et al.14

cycles further to 183, but, in some cases, the error in the energy is much larger (up to 10^-4 E_h), so that the final energies given in Ref. 13 are not always reproduced. As shown in Table VI, using the criterion in Gaussian we need the same total number of optimization cycles (183) as Ref. 18. However, using our criterion increases the number of cycles only insignificantly, to 185, and improves the accuracy of the calculated energies.
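The proposed ordering of the convergence test can be sketched as a generic optimization loop. The threshold values and function names here are placeholders for illustration, not those used in MOLPRO:

```python
import numpy as np

def optimize(x, energy, gradient, step_fn,
             thresh_grad=3e-4, thresh_step=3e-4, thresh_de=1e-6):
    """Optimization loop with the convergence test placed after the
    energy evaluation but before the gradient evaluation, so that an
    already-converged geometry does not cost an extra gradient
    calculation.  step_fn(x, g) returns the geometry step."""
    e_old = g = dx = None
    while True:
        e = energy(x)
        # Test convergence BEFORE computing the new gradient, using
        # the gradient of the previous geometry together with either
        # a small step or a small energy change.
        if g is not None and np.max(np.abs(g)) < thresh_grad and (
                np.max(np.abs(dx)) < thresh_step
                or abs(e - e_old) < thresh_de):
            return x, e
        g = gradient(x)
        dx = step_fn(x, g)
        x = x + dx
        e_old = e
```

Because the test combines the previous gradient with the new energy, a converged geometry costs only one extra energy evaluation rather than an extra gradient, which is where the savings from 196 to 185 cycles come from.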

Conclusions

Calculations for a wide range of molecules demonstrate that the use of curvilinear natural internal coordinates16,17 significantly improves the convergence and stability of geometry optimizations. Local normal coordinates defined by the eigenvectors of the Hessian also perform well if a reasonable approximation of the Hessian matrix is available. We used the model Hessian of Lindh et al.,14 which is recalculated at every optimization step and improved by the BFGS update procedure. These choices of coordinates and Hessian involve no user input and work as "black box" procedures, as required for geometry optimizations of large molecules. Of the optimization methods investigated, the rational function (RF) algorithm8,9 and the geometry DIIS method10 performed best. The RF method was found to be more sensitive to the Hessian update procedure, whereas the DIIS method was somewhat more sensitive to the choice of coordinates.

FIGURE 1. Convergence schemes.

The present results indicate that the optimization algorithms and the coordinate systems do not seem to leave much room for further improvement. Given the simplicity of the model Hessian of Lindh et al.,14 it performs remarkably well, but further speed-ups may be possible using more sophisticated approximations to the Hessian.
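For completeness, the BFGS update applied to the recalculated model Hessian has the standard form of refs. 24-27; a minimal sketch of the textbook formula:

```python
import numpy as np

def bfgs_update(H, dx, dg):
    """Standard BFGS Hessian update:
        H_new = H + dg dg^T / (dg . dx) - (H dx)(H dx)^T / (dx^T H dx),
    where dx is the geometry step and dg the corresponding change in
    the gradient."""
    Hdx = H @ dx
    return (H
            + np.outer(dg, dg) / np.dot(dg, dx)
            - np.outer(Hdx, Hdx) / np.dot(dx, Hdx))
```

The update enforces the secant condition H_new dx = dg while preserving symmetry, and it keeps the Hessian positive definite as long as dg . dx > 0, which makes it well suited for minimizations.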

Acknowledgments

We thank Roland Lindh for stimulating discussions and for providing a program that computes the model Hessian. P. P. gratefully acknowledges support by the Air Force Office of Research, the National Science Foundation, and the Alexander-von-Humboldt Foundation.

References

1. P. Pulay, Adv. Chem. Phys., 69, 241 (1987).
2. Y. Yamaguchi, Y. Osamura, J. D. Goddard, and H. F. Schaefer III, A New Dimension to Quantum Chemistry: Analytic Derivative Methods in Ab Initio Molecular Electronic Structure Theory, Oxford University Press, Oxford, 1994.
3. P. Jørgensen and J. Simons, Geometrical Derivatives of Energy Surfaces and Molecular Properties, Reidel, Dordrecht, 1981.
4. H. B. Schlegel, Adv. Chem. Phys., 67, 249 (1987).
5. J. D. Head, B. Weiner, and M. C. Zerner, Int. J. Quantum Chem., 33, 177 (1988).
6. J. D. Head and M. C. Zerner, Adv. Quantum Chem., 20, 239 (1989).
7. R. Fletcher, Practical Methods of Optimization, Wiley, Chichester, 1987.


8. A. Banerjee, N. Adams, J. Simons, and R. Shepard, J. Phys. Chem., 89, 52 (1985).
9. J. Baker, J. Comput. Chem., 7, 385 (1986).
10. P. Császár and P. Pulay, J. Mol. Struct., 114, 31 (1984).
11. H. B. Schlegel, Theor. Chim. Acta, 66, 333 (1984).
12. T. H. Fischer and J. Almlöf, J. Phys. Chem., 96, 9768 (1992).
13. J. Baker, J. Comput. Chem., 14, 1085 (1993).
14. R. Lindh, A. Bernhardsson, G. Karlström, and P.-Å. Malmqvist, Chem. Phys. Lett., 241, 423 (1995).
15. J. Baker and W. J. Hehre, J. Comput. Chem., 12, 606 (1991).
16. P. Pulay, G. Fogarasi, F. Pang, and J. E. Boggs, J. Am. Chem. Soc., 101, 2550 (1979).
17. G. Fogarasi, X. Zhou, P. W. Taylor, and P. Pulay, J. Am. Chem. Soc., 114, 8191 (1992).
18. C. Peng, P. Y. Ayala, H. B. Schlegel, and M. J. Frisch, J. Comput. Chem., 17, 49 (1996).
19. P. Pulay and G. Fogarasi, J. Chem. Phys., 96, 2856 (1991).
20. J. Baker, A. Kessi, and B. Delley, J. Chem. Phys., 105, 192 (1996).
21. P. Jørgensen, P. Swanstrøm, and D. L. Yeager, J. Chem. Phys., 78, 347 (1983).
22. H.-J. Werner, Adv. Chem. Phys., 69, 1 (1987), and references therein.
23. H. B. Schlegel, J. Comput. Chem., 3, 214 (1982).
24. C. G. Broyden, J. Inst. Math. Appl., 6, 76 (1970).
25. R. Fletcher, Comput. J., 13, 317 (1970).
26. D. Goldfarb, Math. Comp., 24, 23 (1970).
27. D. F. Shanno, Math. Comp., 24, 647 (1970).
28. R. Fletcher and C. M. Reeves, Comput. J., 13, 317 (1970).
29. D. Lu, M. Zhao, and D. G. Truhlar, J. Comput. Chem., 12, 376 (1991).
30. J. Baker, J. Comput. Chem., 13, 241 (1992).
31. E. B. Wilson Jr., J. C. Decius, and P. C. Cross, Molecular Vibrations, McGraw-Hill, New York, 1955.
32. R. Ahlrichs, M. Bär, M. Ehrig, M. Häser, H. Horn, and C. Kölmel, TURBOMOLE, v. 2.1 beta, Biosym Technologies, San Diego, CA, 1992.
33. MOLPRO is a package of ab initio programs written by H.-J. Werner and P. J. Knowles with contributions from J. Almlöf, R. D. Amos, M. J. O. Deegan, F. Eckert, S. T. Elbert, C. Hampel, R. Lindh, W. Meyer, M. E. Mura, K. A. Peterson, R. M. Pitzer, H. Stoll, A. J. Stone, P. R. Taylor, and T. Thorsteinsson, Version 96.4, University of Birmingham, UK, 1996.
34. H. B. Schlegel, Int. J. Quantum Chem. Symp., 26, 243 (1992).
35. M. J. Frisch, G. W. Trucks, H. B. Schlegel, P. M. W. Gill, B. G. Johnson, M. A. Robb, J. R. Cheeseman, T. Keith, G. A. Petersson, J. A. Montgomery, K. Raghavachari, M. A. Al-Laham, V. G. Zakrzewski, J. V. Ortiz, J. B. Foresman, J. Cioslowski, B. B. Stefanov, A. Nanayakkara, M. Challacombe, C. Y. Peng, P. Y. Ayala, W. Chen, M. W. Wong, J. L. Andres, E. S. Replogle, R. Gomperts, R. L. Martin, D. J. Fox, J. S. Binkley, D. J. Defrees, J. Baker, J. P. Stewart, M. Head-Gordon, C. Gonzalez, and J. A. Pople, Gaussian-94, Revision D.1, Gaussian, Inc., Pittsburgh, PA, 1995.
