Free Energy Calculation Methods: A Theoretical and Empirical Comparison of Numerical Errors and a New Method for Qualitative Estimates of Free Energy Changes



RANDALL J. RADMER, PETER A. KOLLMAN
Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94143-0446

Received 12 February 1996; accepted 30 September 1996

ABSTRACT: We present a comparison of four free energy calculation methods: thermodynamic integration (TI); traditional free energy perturbation (FEP); Bennett’s acceptance ratio method (IPS); and a method that is related to an implementation of the WHAM method (CRS). The theoretical bases of the methods are first described. Calculations of the solvation free energies of methane and ethane are then performed to determine the magnitude of the errors for the different methods. We find that the methods give similar errors when many intermediate states (windows) are used, but the IPS and CRS methods give smaller errors than the TI and FEP methods when no intermediate states are used. We also present a new procedure based on the CRS method. It uses coordinates from simulations of a set of solutes to calculate the solvation free energies of additional solutes for which no simulations were performed. Solvation free energies for nine solutes (methanol, dimethylether, methylamine, methylammonium, dimethylamine, fluoromethane, difluoromethane, trifluoromethane, and tetrafluoromethane) are estimated based only on simulations of a set of small hydrophobic solutes (including methane, ethane, and propane). These estimates can be surprisingly accurate and appear to be useful for making rapid estimates of solvation free energies. © 1997 by John Wiley & Sons, Inc. J Comput Chem 18: 902-919, 1997

Correspondence to: P. A. Kollman; e-mail: [email protected]
Contract grant sponsor: National Institutes of Health; contract grant number: GM-39552
Contract grant sponsor: National Science Foundation; contract grant number: CHE-94-13472

© 1997 by John Wiley & Sons, Inc. CCC 0192-8651/97/070902-18

FREE ENERGY CALCULATION

Introduction

Free energies calculated using molecular dynamics (MD) or Monte Carlo (MC) simulations are commonly used to obtain several kinds of quantities. Examples are partition coefficients, relative stabilities of macromolecules, and free energies of binding of small organic molecules to proteins and nucleic acids.1-6 Although frequently accurate, these free energies are only estimates of the correct free energies because two approximations are typically employed. The first is that empirically derived, classical energy functions are used to model interaction energies. The second approximation is that numerical methods are used to calculate ratios of configuration integrals that cannot be solved analytically. The numerical calculations use a set of atomic coordinates generated by MD or MC simulations as a representative sample of all possible configurations of the system. Because only a finite number of coordinate sets are generated, the calculated free energy is only an estimate of the correct free energy. Not surprisingly, more accurate estimates are obtained when longer simulations are used. However, the minimum simulation time needed depends on the system being examined and cannot be determined before the simulation is performed. Pearlman7 has reported that, for some calculations involving simple perturbations, at least 700 ps of sampling was required for converged solvation free energies. It is reasonable to assume that more complex perturbations will require longer simulations.

Because of the amount of computer time needed to calculate free energies, the most efficient method should be used. In this article, we examine the errors resulting from the use of numerical methods in an effort to determine when the use of these methods is appropriate. First, we give a theoretical description of some of the different methods and discuss the expected errors for each of them. Then, we perform an empirical comparison of the four free energy calculation methods by determining the errors in the calculated solvation free energies of a set of solutes. The four methods are:

- thermodynamic integration (TI);
- traditional free energy perturbation (FEP);
- Bennett’s acceptance ratio method,1 which we refer to as the intermediate perturbed state (IPS) method;
- a method we refer to as the composite reference state (CRS) method, which is related to an implementation of the WHAM method presented by Kumar et al.2

It is worth noting that we are comparing methods of calculating free energies from a given simulation. We are not comparing simulation protocols, which may also affect the accuracy of the free energy estimates.

When calculating relative free energies using the methods examined, it is traditional, and generally appropriate, to base the calculations on simulations of all relevant molecules. However, under conditions we will discuss, it is sometimes possible to estimate relative free energies for a set of molecules based on simulations of only a subset of molecules. We present preliminary results showing that, based on simulations of one set of solutes, it is possible to estimate the relative solvation free energies of nine additional solutes with reasonable accuracy. This procedure is less accurate than calculations in which all molecules are simulated. Even so, it appears to give qualitative free energy estimates for a wide variety of molecules without requiring that simulations be performed on each of them.

Theoretical Background

This section describes two methods for calculating relative free energies, each of which is based on an exact expression. As these expressions cannot be solved analytically for most relevant systems, numerical methods are often employed. The resulting errors are the focus of this study and will be examined in greater detail, particularly those resulting from insufficient sampling of the relevant systems.

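JOURNAL OF COMPUTATIONAL CHEMISTRY 903

RADMER AND KOLLMAN

The first method we will examine is thermodynamic integration,8-10 based on the following equation:

$$\Delta G_{10} = \int_0^1 \frac{dG_\lambda}{d\lambda}\, d\lambda = \int_0^1 \left\langle \frac{d\mathcal{H}_\lambda}{d\lambda} \right\rangle_\lambda d\lambda \qquad (1)$$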

where $\Delta G_{10} = G_1 - G_0$ is the difference in Gibbs free energies between states 1 and 0. $G_\lambda$ is the free energy of the state defined by a coupling parameter, $\lambda$, where $\lambda$ can take on values from zero to unity. The total energy (Hamiltonian) is given by $\mathcal{H}_\lambda$, where $\mathcal{H}_0$ and $\mathcal{H}_1$ are the energies at states 0 and 1. The angle brackets, $\langle \ldots \rangle_\lambda$, indicate a constant pressure, constant temperature ensemble average performed on the state with potential energy defined by $\lambda$.

In practice, the kinetic component of the free energy is not included in the numerical calculations described in what follows. It can generally be calculated analytically,9 or it can be made to cancel if thermodynamic cycles are used.11

To keep the notation as simple as possible, we will use G to indicate the free energies calculated. It should be kept in mind, however, that these would be described more accurately as excess free energies.

The free energies are calculated in three steps. First, an infinite time average is assumed to be equal to an ensemble average (by the ergodic hypothesis). Second, a finite number of samples is used to estimate the infinite time average. Third, the integration is done numerically. If the trapezoid rule is used for the integration over $\lambda$, the free energy is estimated by:

$$\Delta G_{10} \approx \sum_{k=1}^{N} \Delta\lambda \, \frac{1}{n_\lambda} \sum_{i=1}^{n_\lambda} \frac{d\mathcal{V}_\lambda(\mathbf{r}_{i,k})}{d\lambda} \qquad (2)$$

where:

- N is the total number of simulations performed (at different states defined by $\lambda$);
- $n_\lambda$ is the number of samples from each state;
- $\mathcal{V}_\lambda$ is the potential energy;
- the coordinates, $\mathbf{r}_{i,k}$, are chosen with a Boltzmann probability $e^{-[\mathcal{V}_\lambda(\mathbf{r}_{i,k}) - G_\lambda]/RT}$ by performing an MD or MC simulation at each $\lambda$. The subscript i indicates the sample number, and k indicates the simulation number.

The second method examined is free energy perturbation,8-10 using the following equation:

$$\Delta G_{10} = -RT \ln \left\langle e^{-\Delta\mathcal{H}_{10}/RT} \right\rangle_0 \qquad (3)$$

where R is the gas constant, T is the absolute temperature, $\Delta\mathcal{H}_{10}$ is the energy difference between states 1 and 0, and $\langle \ldots \rangle_0$ indicates an ensemble average at state 0.

In this article, we are primarily interested in different methods for solving eq. (3) numerically. These will be described in greater detail below.
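Before turning to FEP estimators, the trapezoid-rule estimate of eq. (2) can be made concrete with a short Python sketch. It is a minimal illustration, not the code used in this study. It assumes NumPy, and the function name `ti_trapezoid` and its input layout are hypothetical. The inputs are a list of $\lambda$ values and, for each window, the stored values of $d\mathcal{V}_\lambda/d\lambda$. Eq. (2) is written with a single spacing $\Delta\lambda$. The sketch applies the trapezoid weights explicitly instead, so the two end windows get half weight and unevenly spaced windows are also handled.

```python
import numpy as np


def ti_trapezoid(lambdas, dvdl_samples):
    """Trapezoid-rule thermodynamic integration estimate of Delta G_10 (cf. eq. 2).

    lambdas      : increasing coupling-parameter values, from 0 to 1
    dvdl_samples : one 1-D array per lambda value holding dV_lambda/dlambda
                   evaluated on the configurations saved from that window
    """
    lam = np.asarray(lambdas, dtype=float)
    # Finite-sample estimate of <dV/dlambda> in each window.
    means = np.array([np.mean(s) for s in dvdl_samples])
    # Trapezoid rule: average neighbouring window means and weight by the spacing,
    # which gives the two end windows half the weight of interior windows.
    return float(np.sum(0.5 * (means[1:] + means[:-1]) * np.diff(lam)))
```

The linear coupling introduced later as eq. (16) is a useful special case. There, $d\mathcal{V}_\lambda/d\lambda$ evaluated on a configuration is simply $\mathcal{V}_1 - \mathcal{V}_0$ for that configuration.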

TRADITIONAL FREE ENERGY PERTURBATION WITH ONE WINDOW

When eq. (3) is used to calculate relative free energies, two terms are used:

- state 0 (and, in general, any state on which a simulation is performed) is referred to as the reference state;
- state 1 (and any state that is not simulated) is referred to as the perturbed state.

This relationship will be represented schematically as follows:

$$0 \rightarrow 1$$

In practice, the kinetic component of the free energy is not included in the numerical calculations, for the reasons previously described. The relative free energy is estimated in two steps. First, infinite time averages are assumed to be equal to ensemble averages. Second, finite time averages are used to approximate infinite time averages. This gives:

$$\Delta G_{10} \approx \Delta g_{10} = -RT \ln \left[ \frac{1}{n_0} \sum_{i=1}^{n_0} e^{-\Delta\mathcal{V}_{10}(\mathbf{r}_{i,0})/RT} \right] \qquad (4)$$

where:

- $\Delta g_{10}$ indicates a finite time estimate of $\Delta G_{10}$;
- $n_0$ is the number of samples on state 0;
- the coordinates, $\mathbf{r}_{i,0}$, are chosen with a Boltzmann probability by performing an MD or MC simulation on state 0.
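The finite-sample estimate of eq. (4) is short enough to write out directly. The Python sketch below assumes NumPy, and `fep_one_window` is a hypothetical name. It takes the energy differences $\Delta\mathcal{V}_{10}$ evaluated on configurations sampled from state 0. Before exponentiating, it subtracts the largest exponent (the usual log-sum-exp trick), so large energy gaps cannot overflow.

```python
import numpy as np


def fep_one_window(delta_v, rt):
    """One-window FEP estimate Delta g_10 of eq. (4).

    delta_v : V_1(r_i) - V_0(r_i) for configurations r_i sampled on state 0
    rt      : R*T in the same energy units as delta_v
    """
    x = -np.asarray(delta_v, dtype=float) / rt
    # Subtract the largest exponent before exponentiating (log-sum-exp),
    # so that large energy differences cannot overflow or underflow.
    m = np.max(x)
    return float(-rt * (m + np.log(np.mean(np.exp(x - m)))))
```

By Jensen's inequality, the result never exceeds the arithmetic mean of the sampled $\Delta\mathcal{V}_{10}$ values. The exponential average is dominated by the configurations with the smallest energy differences.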

Errors in the free energy estimates given by eq. (4) can be characterized by examining an expression for the root mean square (RMS) error. It can be shown (see Appendix) that, for a large number of independent samples, an approximate expression for the RMS error in the estimate of $\Delta G_{10}$ is:

$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) \approx RT \left( \frac{1}{n_0} \int \frac{\rho_1(\mathbf{r})^2}{\rho_0(\mathbf{r})}\, d\mathbf{r} - \frac{1}{n_0} \right)^{1/2} \qquad (5)$$

where $\rho(\mathbf{r}) = e^{-[\mathcal{V}(\mathbf{r}) - G]/RT}$ is the Boltzmann probability.
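Eq. (5) can also be evaluated approximately from the same state-0 samples. With $x = e^{-\Delta\mathcal{V}_{10}/RT}$ we have $\rho_1/\rho_0 = x/\langle x \rangle_0$, so $\int \rho_1^2/\rho_0\, d\mathbf{r} = \langle x^2 \rangle_0 / \langle x \rangle_0^2$. The sketch below does this. It assumes NumPy and statistically independent samples, and `fep_rms_error` is a hypothetical name. Because it reuses the same samples, it can underestimate the error when important perturbed-state configurations are never visited.

```python
import numpy as np


def fep_rms_error(delta_v, rt):
    """Sample estimate of the large-n RMS error in eq. (5).

    Uses int rho_1^2 / rho_0 dr = <x^2>_0 / <x>_0^2 with x = exp(-Delta V_10 / RT),
    assuming the state-0 samples are statistically independent.
    """
    d = np.asarray(delta_v, dtype=float)
    x = -d / rt
    w = np.exp(x - np.max(x))          # a common scale factor cancels in the ratio
    ratio = np.mean(w ** 2) / np.mean(w) ** 2
    return float(rt * np.sqrt(max(ratio - 1.0, 0.0) / d.size))
```
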

Two points regarding eq. (5) are worth noting. The first is the well-known fact that the RMS error is proportional to the reciprocal of the square root of the number of independent samples (RMS error $\propto 1/\sqrt{n}$). Thus, increasing sampling decreases the expected error.

The second point concerns how the relative probabilities of states 0 and 1 affect the error. The ratio of $\rho_1(\mathbf{r})^2$ to $\rho_0(\mathbf{r})$ in eq. (5) must remain small to keep the RMS error small. Therefore the probability of each conformation, $\mathbf{r}$, sampled by the reference state, $\rho_0(\mathbf{r})$, should never approach zero more quickly than the probability for the perturbed state, $\rho_1(\mathbf{r})$. Stated more simply (but with less accuracy), it should be possible to sample important conformations of the perturbed state during a simulation of the reference state.

This leads to the common view that two states must be similar for the free energy perturbation method to give good estimates of their relative free energy. In fact, this is only necessary if each state is to be simulated in turn and used to calculate the relative free energy of the other. As we will discuss later, it is often advantageous to simulate only one state, while ensuring that important perturbed state conformations are reasonably likely to be sampled. This requires only that the accessible perturbed state conformations be a subset of the accessible reference state conformations.

VOL. 18, NO. 7 904

TRADITIONAL FREE ENERGY PERTURBATION WITH MULTIPLE WINDOWS

One way to decrease the RMS error is to add states between the two states of interest. The calculation then involves combining results from a series of smaller calculations, each of which is referred to as a window. For simplicity, we will discuss the use of only one intermediate state, state m. A calculation from state 0 to state 1 can be expanded into two calculations, the first from state 0 to state m and the second from state m to state 1:

$$0 \rightarrow m \rightarrow 1$$

The estimate of the total change in free energy [see eq. (4)] is given by:

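$$\Delta g_{10} = \Delta g_{1m} + \Delta g_{m0} = -RT \ln \left[ \frac{1}{n_m} \sum_{i=1}^{n_m} e^{-\Delta\mathcal{V}_{1m}(\mathbf{r}_{i,m})/RT} \right] - RT \ln \left[ \frac{1}{n_0} \sum_{i=1}^{n_0} e^{-\Delta\mathcal{V}_{m0}(\mathbf{r}_{i,0})/RT} \right] \qquad (6)$$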

where:

- $\mathbf{r}_{i,0}$ are coordinates generated by performing a simulation on state 0;
- $\mathbf{r}_{i,m}$ are coordinates generated by performing a simulation on state m;
- $n_0$ and $n_m$ are the numbers of samples on states 0 and m, respectively.
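Eq. (6) is a sum of one-window estimates. The minimal Python sketch below assumes NumPy and uses hypothetical function names. The per-window energy differences are passed in chain order, so for $0 \rightarrow m \rightarrow 1$ the input is $[\Delta\mathcal{V}_{m0}$ on state 0$,\ \Delta\mathcal{V}_{1m}$ on state m$]$.

```python
import numpy as np


def _fep(delta_v, rt):
    """One-window FEP estimate (eq. 4), with the exponent shifted to avoid overflow."""
    x = -np.asarray(delta_v, dtype=float) / rt
    m = np.max(x)
    return -rt * (m + np.log(np.mean(np.exp(x - m))))


def fep_windows(window_deltas, rt):
    """Multiple-window FEP estimate (eq. 6) as the sum of the window estimates.

    window_deltas[k] holds V_{k+1}(r) - V_k(r) for configurations sampled on
    state k of the chain, in order from state 0 to state 1.
    """
    return float(sum(_fep(d, rt) for d in window_deltas))
```
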

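Samples on states 0 and m are independent, so the RMS error can be given as follows [see eq. (5)]:

$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) \approx RT \left( \frac{1}{n_0} \int \frac{\rho_m(\mathbf{r})^2}{\rho_0(\mathbf{r})}\, d\mathbf{r} - \frac{1}{n_0} + \frac{1}{n_m} \int \frac{\rho_1(\mathbf{r})^2}{\rho_m(\mathbf{r})}\, d\mathbf{r} - \frac{1}{n_m} \right)^{1/2} \qquad (7)$$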

This equation is useful because it makes it possible to compare the errors in this method with those of a calculation with no intermediates. Suppose the energy of state m is defined as the average of the energies of state 0 and state 1, and equal total sampling is done. Then subtracting the square of eq. (7) from the square of eq. (5) always gives a positive number. This indicates that this method does give a smaller expected error than methods using no intermediate states.
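The claim can be checked in a model where the integrals in eqs. (5) and (7) have closed forms. This is an illustrative model of our choosing, not one of the systems studied here, and energies are in units of RT. States 0 and 1 are one-dimensional harmonic wells, $\mathcal{V}_0 = x^2/2$ and $\mathcal{V}_1 = (x-d)^2/2$, so $\rho_0$ and $\rho_1$ are unit-variance Gaussians whose centres are $d$ apart. Taking $\mathcal{V}_m$ as the average of $\mathcal{V}_0$ and $\mathcal{V}_1$ gives, up to a constant, $\mathcal{V}_m = (x - d/2)^2/2$. For two unit-variance Gaussians separated by $a$, $\int \rho_b^2/\rho_a\, dx = e^{a^2}$. In the two-window case, the total of $n$ samples is split equally between states 0 and m. The function names are hypothetical.

```python
import math


def squared_rms_one_window(d, n):
    """(RMS / RT)^2 from eq. (5): all n samples on state 0.

    Model: rho_0 and rho_1 are unit-variance Gaussians whose centres are d apart,
    for which the integral of rho_1^2 / rho_0 equals exp(d^2).
    """
    return (math.exp(d ** 2) - 1.0) / n


def squared_rms_two_windows(d, n):
    """(RMS / RT)^2 from eq. (7): n/2 samples on state 0 and n/2 on state m.

    With V_m the average of V_0 and V_1, rho_m is a unit-variance Gaussian
    centred halfway, so each window spans a separation of d/2.
    """
    half = n / 2.0
    a = d / 2.0
    per_window = (math.exp(a ** 2) - 1.0) / half
    return 2.0 * per_window  # the 0 -> m and m -> 1 terms are equal here
```

For example, with $d = 2$ and $n = 1000$ the squared two-window error is roughly one eighth of the one-window value.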

The definition of state m that minimizes eq. (7) could be found in the same manner as in the next two sections. However, it does not address the essential problem with this method: state m acts both as a reference state and as a perturbed state. As a reference state, it should be able to sample all important configurations of state 1. As a perturbed state, it should be completely accessible from sampled configurations of state 0. This problem is eliminated by arranging the calculation so that the intermediate states are only reference states or only perturbed states, as described in what follows.

INTERMEDIATE PERTURBED STATE (IPS) METHOD

Sampling could be done from state 0 to a new intermediate perturbed state (state b) and, independently, from state 1 to the same intermediate state. The total free energy change can then be found by subtracting these two free energy differences. An optimal state b will be defined such that the expected RMS error in the estimated free energy difference between states 0 and 1 is minimized. Although we will present it in a different form, this is formally identical to Bennett’s acceptance ratio method:1

$$0 \rightarrow b \leftarrow 1$$



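The total change in free energy is [see eq. (4)]:

$$\Delta g_{10} = \Delta g_{b0} - \Delta g_{b1} = -RT \ln \frac{\frac{1}{n_0} \sum_{i=1}^{n_0} e^{-\Delta\mathcal{V}_{b0}(\mathbf{r}_{i,0})/RT}}{\frac{1}{n_1} \sum_{i=1}^{n_1} e^{-\Delta\mathcal{V}_{b1}(\mathbf{r}_{i,1})/RT}} \qquad (8)$$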

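and the RMS error (see Appendix) is given by:

$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) \approx RT \left( \int \left[ \frac{\rho_b(\mathbf{r})^2}{n_0\, \rho_0(\mathbf{r})} + \frac{\rho_b(\mathbf{r})^2}{n_1\, \rho_1(\mathbf{r})} \right] d\mathbf{r} - \frac{1}{n_0} - \frac{1}{n_1} \right)^{1/2}$$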

The definition of state b that minimizes this equation (see Appendix) is:

$$\rho_{b,\mathrm{opt}}(\mathbf{r}) = \lambda_b\, \frac{n_0\, \rho_0(\mathbf{r})\; n_1\, \rho_1(\mathbf{r})}{n_0\, \rho_0(\mathbf{r}) + n_1\, \rho_1(\mathbf{r})} \qquad (9)$$

where $\lambda_b$ is the normalization constant. The product divided by the sum in eq. (9) is never larger than the smaller of $n_0 \rho_0(\mathbf{r})$ and $n_1 \rho_1(\mathbf{r})$. Equation (9) therefore indicates that the accessible configurations of the optimal perturbed state are essentially the configurations accessible to both reference states. This ensures that accessible perturbed state conformations can be sampled during simulations of either reference state.

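The relative free energy, $\Delta G_{10}$, is estimated in three steps. First, solve for $\mathcal{V}_b(\mathbf{r})$ in eq. (9) using $\rho_{b,\mathrm{opt}}(\mathbf{r}) = e^{-[\mathcal{V}_b(\mathbf{r}) - G_b]/RT}$. Next, substitute this into eq. (8). Finally, rearrange and cancel terms:

$$\Delta g_{10} = -RT \ln \frac{\frac{1}{n_0} \sum_{i=1}^{n_0} \frac{e^{-\mathcal{V}_1(\mathbf{r}_{i,0})/RT}}{q_b(\mathbf{r}_{i,0})}}{\frac{1}{n_1} \sum_{i=1}^{n_1} \frac{e^{-\mathcal{V}_0(\mathbf{r}_{i,1})/RT}}{q_b(\mathbf{r}_{i,1})}} \qquad (10)$$

where:

$$q_b(\mathbf{r}) = n_0\, e^{-[\mathcal{V}_0(\mathbf{r}) + \frac{1}{2}\Delta G_{10}]/RT} + n_1\, e^{-[\mathcal{V}_1(\mathbf{r}) - \frac{1}{2}\Delta G_{10}]/RT} \qquad (11)$$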

To prevent numerical overflow, it is generally advantageous to rearrange these equations so that the exponentials contain relative, rather than absolute, energies.

To solve eqs. (10) and (11) for $\Delta g_{10}$, the value of $\Delta G_{10}$ is needed. As this is obviously not known, an estimate of $\Delta G_{10}$ must be used. The result is still correct in the large sample limit, although it is not optimal. There are two simple ways of dealing with this:

- set $\Delta G_{10}$ to zero (we call this the IPSO method);
- set $\Delta G_{10}$ equal to $\Delta g_{10}$.

For the second case, the free energy differences can be found by substituting $\Delta g_{10}$ for $\Delta G_{10}$ in eqs. (10) and (11) and solving for $\Delta g_{10}$. Numerical Recipes12 gives efficient methods for finding roots of equations. A convenient method used by Bennett1 works as follows:

1. Guess an initial value of $\Delta g_{10}$.
2. Use this value in eq. (11).
3. Use the resulting $q_b(\mathbf{r}_i)$ to get an improved estimate of $\Delta g_{10}$ from eq. (10).
4. Repeat until $\Delta g_{10}$ is self-consistent.

We call this the IPSsc method. A minor disadvantage of this method is that the potential energies found during the simulation must be stored, and the free energies calculated after the simulations are completed.
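The IPSsc iteration can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used here. It assumes NumPy and uses the hypothetical function name `ips_self_consistent`. It also assumes stored potential energies of both end states on the configurations from each simulation, which is the storage requirement noted above. The exponentials are handled in logarithmic form (`np.logaddexp` and a shifted log-mean-exp), in the spirit of the earlier remark about avoiding overflow. The first pass uses $\Delta G_{10} = 0$ in eq. (11), so it gives the IPSO estimate.

```python
import numpy as np


def _log_mean_exp(a):
    """log(mean(exp(a))), shifted by max(a) so that it cannot overflow."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))


def ips_self_consistent(v0_on0, v1_on0, v0_on1, v1_on1, rt, tol=1e-10, max_iter=1000):
    """IPSsc estimate of Delta G_10 = G_1 - G_0 from eqs. (10) and (11).

    vA_onB : potential energy of state A evaluated on the configurations
             saved from the simulation of state B (1-D arrays)
    rt     : R*T in the same energy units as the potential energies
    """
    # Work in units of RT throughout.
    u00, u10, u01, u11 = (np.asarray(v, dtype=float) / rt
                          for v in (v0_on0, v1_on0, v0_on1, v1_on1))
    ln_n0, ln_n1 = np.log(u00.size), np.log(u01.size)

    def log_qb(u0, u1, dg):
        # log of eq. (11): n0 exp[-(V0 + dG/2)/RT] + n1 exp[-(V1 - dG/2)/RT]
        return np.logaddexp(ln_n0 - u0 - 0.5 * dg, ln_n1 - u1 + 0.5 * dg)

    dg = 0.0  # Delta G_10 / RT; the first pass therefore gives the IPSO estimate
    for _ in range(max_iter):
        log_num = _log_mean_exp(-u10 - log_qb(u00, u10, dg))  # samples from state 0
        log_den = _log_mean_exp(-u01 - log_qb(u01, u11, dg))  # samples from state 1
        new = -(log_num - log_den)                            # eq. (10)
        if abs(new - dg) < tol:
            return float(new * rt)
        dg = new
    raise RuntimeError("IPSsc iteration did not converge")
```

Plain fixed-point iteration matches the procedure described for Bennett's method. Applying a bracketing root finder to the same equations would be an alternative.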

To get accurate results, some overlap must exist between states 0 and 1. If the states are too dissimilar, additional simulations can be performed on appropriately chosen intermediate states. The calculation is then done for each of the smaller perturbations, as described previously. This is illustrated below for n arbitrary intermediate states (m), resulting in n + 1 IPS states (b):

$$0 \rightarrow b_1 \leftarrow m_1 \rightarrow b_2 \leftarrow \cdots \leftarrow m_n \rightarrow b_{n+1} \leftarrow 1$$

INTERMEDIATE REFERENCE STATE METHODS

A new intermediate reference state (state s) can be defined, making states 0 and 1 perturbed states. Although we use different notation, this is equivalent to using umbrella sampling to find the free energy difference (see Torrie and Valleau13):

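$$0 \leftarrow s \rightarrow 1$$

The estimated change in free energy is [see eq. (4)]:

$$\Delta g_{10} = \Delta g_{1s} - \Delta g_{0s} = -RT \ln \frac{\frac{1}{n_s} \sum_{i=1}^{n_s} e^{-\Delta\mathcal{V}_{1s}(\mathbf{r}_{i,s})/RT}}{\frac{1}{n_s} \sum_{i=1}^{n_s} e^{-\Delta\mathcal{V}_{0s}(\mathbf{r}_{i,s})/RT}} \qquad (12)$$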



and the RMS error (see Appendix) is given by:

$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) \approx RT \left( \frac{1}{n_s} \int \frac{[\rho_0(\mathbf{r}) - \rho_1(\mathbf{r})]^2}{\rho_s(\mathbf{r})}\, d\mathbf{r} \right)^{1/2}$$

Unfortunately, the optimal reference state found using this equation is not useful. It minimizes the RMS error in the estimate of $\Delta G_{10}$, but it also allows the RMS errors in the estimates of the intermediate free energies to approach infinity. These are $\Delta g_{1s}$ and $\Delta g_{0s}$, used in eq. (12). So, in practice, $\Delta g_{1s}$ and $\Delta g_{0s}$ cannot be calculated if the simulation is performed at the optimal reference state.

In principle, any state that samples all accessible configurations of states 0 and 1 can be used, but such a state may be difficult to choose in practice. A computationally convenient definition of state s comes from two simulations:

- one set of coordinates is generated from state 0 (by performing a simulation on state 0);
- another set is generated from state 1 (by performing a simulation on state 1).

In the large sample limit, the combined coordinates can be considered to have come from a single simulation on one state. That state's probability, $\rho_{s,\mathrm{ave}}(\mathbf{r})$, equals the weighted average of $\rho_0(\mathbf{r})$ and $\rho_1(\mathbf{r})$:

$$\rho_{s,\mathrm{ave}}(\mathbf{r}) = \frac{n_0\, \rho_0(\mathbf{r}) + n_1\, \rho_1(\mathbf{r})}{n_0 + n_1} \qquad (13)$$

This is used to estimate $\Delta G_{10}$ by solving for $\mathcal{V}_s(\mathbf{r})$ in eq. (13) using $\rho_{s,\mathrm{ave}}(\mathbf{r}) = e^{-[\mathcal{V}_s(\mathbf{r}) - G_s]/RT}$.

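From eq. (12), equations giving $\Delta g_{0s}$ and $\Delta g_{1s}$ can be found:

$$\Delta g_{0s} = -RT \ln \left\{ \frac{1}{n_0 + n_1} \left[ \sum_{i=1}^{n_0} \frac{e^{-\mathcal{V}_0(\mathbf{r}_{i,0})/RT}}{e^{-\mathcal{V}_s(\mathbf{r}_{i,0})/RT}} + \sum_{i=1}^{n_1} \frac{e^{-\mathcal{V}_0(\mathbf{r}_{i,1})/RT}}{e^{-\mathcal{V}_s(\mathbf{r}_{i,1})/RT}} \right] \right\}$$

and:

$$\Delta g_{1s} = -RT \ln \left\{ \frac{1}{n_0 + n_1} \left[ \sum_{i=1}^{n_0} \frac{e^{-\mathcal{V}_1(\mathbf{r}_{i,0})/RT}}{e^{-\mathcal{V}_s(\mathbf{r}_{i,0})/RT}} + \sum_{i=1}^{n_1} \frac{e^{-\mathcal{V}_1(\mathbf{r}_{i,1})/RT}}{e^{-\mathcal{V}_s(\mathbf{r}_{i,1})/RT}} \right] \right\}$$

where:

$$e^{-\mathcal{V}_s(\mathbf{r})/RT} = \frac{n_0\, e^{-[\mathcal{V}_0(\mathbf{r}) - \Delta G_{0s}]/RT} + n_1\, e^{-[\mathcal{V}_1(\mathbf{r}) - \Delta G_{1s}]/RT}}{n_0 + n_1}$$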

Because $\Delta G_{0s}$ and $\Delta G_{1s}$ are not known, their values will be estimated using $\Delta g_{0s}$ and $\Delta g_{1s}$. The relative free energies can then be determined self-consistently, as was done with the IPSsc method [see eqs. (10) and (11)].

This methodology can be extended to more than two states by mixing additional states into $\rho_s(\mathbf{r})$ and performing all perturbations with respect to state s. For a total of N states, this can be represented as:

$$s \rightarrow 1, \quad s \rightarrow 2, \quad \ldots, \quad s \rightarrow N-1, \quad s \rightarrow N$$

where state s is defined such that its Boltzmann probability is equal to the weighted average of the Boltzmann probabilities of all N states:

$$\rho_{s,\mathrm{ave}}(\mathbf{r}) = \frac{1}{\sum_{k=1}^{N} n_k} \sum_{k=1}^{N} n_k\, \rho_k(\mathbf{r})$$

where $n_k$ is the number of samples in the simulation of state k and $\rho_k(\mathbf{r})$ is the Boltzmann probability of state k. The set of N equations is represented by the following equation for the estimate of the free energy difference between state l and state s:

$$\Delta g_{ls} = -RT \ln \left[ \frac{1}{\sum_{k=1}^{N} n_k} \sum_{k=1}^{N} \sum_{i=1}^{n_k} \frac{e^{-\mathcal{V}_l(\mathbf{r}_{i,k})/RT}}{e^{-\mathcal{V}_s(\mathbf{r}_{i,k})/RT}} \right] \qquad (14)$$

where:

$$e^{-\mathcal{V}_s(\mathbf{r})/RT} = \frac{1}{\sum_{k=1}^{N} n_k} \sum_{k=1}^{N} n_k\, e^{-[\mathcal{V}_k(\mathbf{r}) - \Delta g_{ks}]/RT} \qquad (15)$$

and l represents any state. It is worth noting that eqs. (14) and (15) require the energy of each state to be calculated for all coordinates generated, regardless of the state on which the coordinates were generated. We will refer to this as the composite reference state (CRS) method, because all of the perturbations are done with respect to one composite reference state (state s).
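Eqs. (14) and (15) can be iterated to self-consistency in the same way as the IPSsc equations. The Python sketch below assumes NumPy, a matrix of the energies of every state evaluated on every pooled configuration, and hypothetical function names. It works in logarithms to avoid overflow. The results are only defined up to a common additive constant, so each free energy is reported relative to the first state.

```python
import numpy as np


def _logsumexp(a, axis):
    """log(sum(exp(a))) along one axis, shifted by the maximum to avoid overflow."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def crs_free_energies(energies, counts, rt, tol=1e-10, max_iter=10000):
    """Self-consistent CRS free energies from eqs. (14) and (15).

    energies : array of shape (N, n_total); energies[l, j] = V_l(r_j) for every
               pooled configuration r_j, whichever simulation produced it
    counts   : number of configurations n_k contributed by each of the N states
    rt       : R*T in the energy units of `energies`
    Returns G_l - G_0 for every state (the composite reference state cancels).
    """
    u = np.asarray(energies, dtype=float) / rt
    n = np.asarray(counts, dtype=float)
    if n.sum() != u.shape[1]:
        raise ValueError("counts must add up to the number of pooled configurations")
    ln_frac = np.log(n / n.sum())            # log(n_k / sum_k n_k)
    g = np.zeros(u.shape[0])                 # Delta g_ks / RT, up to a common constant
    for _ in range(max_iter):
        # eq. (15): log of exp(-V_s(r_j)/RT) for every pooled configuration
        log_es = _logsumexp(ln_frac[:, None] - u + g[:, None], axis=0)
        # eq. (14): Delta g_ls / RT for every state l
        new = -(_logsumexp(-u - log_es[None, :], axis=1) - np.log(n.sum()))
        new -= new[0]                        # fix the arbitrary common constant
        if np.max(np.abs(new - g)) < tol:
            return new * rt
        g = new
    raise RuntimeError("CRS iteration did not converge")
```

Fixing the constant at each step does not change the differences. Shifting every $\Delta g_{ks}$ by the same amount shifts every updated value by that amount as well.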

The energies in eqs. (14) and (15) can be found by periodically saving coordinates from each simulation and then calculating the energy of every state for each saved coordinate set. This obviously uses considerable disk space, but it has two advantages. It allows any intermediate states to be used, and it allows the coordinates to be used for additional free energy calculations after the simulation is completed. This is the method used for the H7 simulations we will describe.

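The potential energy of intermediate states can often be defined as a linear combination of the potential energies of the two states of interest:

$$\mathcal{V}_\lambda(\mathbf{r}) = (1 - \lambda)\, \mathcal{V}_0(\mathbf{r}) + \lambda\, \mathcal{V}_1(\mathbf{r}) \qquad (16)$$

where $\lambda$ is a coupling parameter that defines the states. In this case, coordinates do not have to be saved. Instead, simulations are done on all the states, and the derivative of the potential energy with respect to $\lambda$, $d\mathcal{V}_\lambda(\mathbf{r})/d\lambda$, is saved. For the linear form of eq. (16), $d\mathcal{V}_\lambda/d\lambda = \mathcal{V}_1 - \mathcal{V}_0$ does not depend on $\lambda$, so $\mathcal{V}_\lambda(\mathbf{r}) - \mathcal{V}_0(\mathbf{r}) = \lambda\, d\mathcal{V}_\lambda(\mathbf{r})/d\lambda$. Given $d\mathcal{V}_\lambda(\mathbf{r})/d\lambda$ and $\lambda$ for each coordinate set of each simulation, the potential energies can therefore be found relative to the energy at $\lambda = 0$. The relative free energies can then be found using eqs. (14) and (15). This corresponds to an implementation of the WHAM method presented by Kumar et al.2 [see eq. (21) in Kumar et al.2]. We use this method for the MM1, MM400, ME1, and ME400 simulations described in what follows.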
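Under the linear coupling of eq. (16), the stored derivative is enough to rebuild every state's energy relative to $\lambda = 0$. Dropping the common $\mathcal{V}_0(\mathbf{r})$ is harmless in eqs. (14) and (15). It multiplies both $e^{-\mathcal{V}_l/RT}$ and $e^{-\mathcal{V}_s/RT}$ by the same configuration-dependent factor, which cancels in their ratio. The minimal sketch below assumes NumPy, and `energies_from_dvdl` is a hypothetical name.

```python
import numpy as np


def energies_from_dvdl(lambda_states, dvdl):
    """Energies relative to lambda = 0 for a linear coupling (eq. 16).

    With V_lambda = (1 - lambda) V_0 + lambda V_1, dV/dlambda = V_1 - V_0 for each
    configuration, so V_lambda(r) - V_0(r) = lambda * dV/dlambda.
    lambda_states : coupling values of all N states in the composite reference
    dvdl          : stored dV/dlambda for every pooled configuration
    Returns an (N, n_total) array usable as the energies in eqs. (14) and (15).
    """
    lam = np.asarray(lambda_states, dtype=float)
    return lam[:, None] * np.asarray(dvdl, dtype=float)[None, :]
```
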

When coordinates are saved, it is also possible to estimate the free energy of a state for which no simulations were done. In this case, state l in eqs. (14) and (15) does not correspond to any of the states 1 through N. For the reasons previously discussed, care should be taken to ensure that the composite reference state can sample all accessible configurations of the perturbed state.
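For a state that was never simulated, eq. (14) is applied once more after the free energies of the simulated states have converged. Eq. (15) is left unchanged, so the new state does not enter the composite reference state. The sketch below assumes NumPy and uses a hypothetical name. It takes the converged values as input, expressed relative to state 0. The estimate is only meaningful if the pooled configurations cover the accessible configurations of the new state.

```python
import numpy as np


def crs_unsimulated(energies, counts, g, v_new, rt):
    """Eq. (14) for a state that was never simulated.

    energies : (N, n_total) energies V_k(r_j) of the N simulated states on all
               pooled configurations, as used in eqs. (14) and (15)
    counts   : number of configurations n_k from each simulated state
    g        : converged free energies of the simulated states, G_k - G_0
    v_new    : energies of the unsimulated state on the same pooled configurations
    rt       : R*T in the same energy units
    Returns G_new - G_0.
    """
    u = np.asarray(energies, dtype=float) / rt
    n = np.asarray(counts, dtype=float)
    gk = np.asarray(g, dtype=float) / rt
    # eq. (15); using G_k - G_0 instead of Delta g_ks shifts log_es by a constant,
    # so the result below is G_new - G_0 rather than G_new - G_s.
    a = np.log(n / n.sum())[:, None] - u + gk[:, None]
    m = a.max(axis=0)
    log_es = m + np.log(np.exp(a - m).sum(axis=0))
    # eq. (14) with state l taken to be the unsimulated state
    b = -np.asarray(v_new, dtype=float) / rt - log_es
    mb = b.max()
    return float(-rt * (mb + np.log(np.exp(b - mb).sum()) - np.log(n.sum())))
```
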

Methodology

A total of five sets of simulations were performed so that relative free energies could be calculated for:

1. seven hydrophobic solutes (H7);
2. methane and methane using 1 window (MM1);
3. methane and methane using 400 windows (MM400);
4. methane and ethane using 1 window (ME1);
5. methane and ethane using 400 windows (ME400).

In each case, the solute was simulated using AMBER 4.0,14 in a box of 207 TIP3P15 water molecules. The simulations were run at a constant temperature of 300 K and a constant pressure of 1 bar, using coupling constants16 of 0.2 ps and a time step of 1.0 fs. The SHAKE17 algorithm was used to constrain all bond lengths to their equilibrium values, and an 8-Å cutoff was used for the nonbonded interactions. Bond, angle, and torsional parameters are from the Weiner et al.18 force field. Only solute-solvent interaction energies were included in the calculation of the free energies.19

The H7 simulations were performed on a set of solutes with the topology shown in Figure 1. Atomic parameters were chosen such that the solute would correspond to one of the following:

- "nothing";
- methane, ethane, or propane;
- a single generic atom (X);
- a diatomic molecule consisting of two generic atoms (X2);
- a triatomic molecule consisting of three generic atoms (X3).

The atomic charges are given in Table I, and the van der Waals parameters are given in Tables II and III. At least 100 ps of equilibration followed by 500 ps of data collection was done for each solute. During the data collection, atomic coordinates were saved every 0.1 ps.

Three methods were used to determine the relative solvation free energies from the H7 simulations:

1. the FEP method, using eq. (4), done in both the forward and reverse directions;
2. the IPS method, using eqs. (10) and (11), solving for $\Delta G_{21}$ self-consistently;
3. the CRS method, using eqs. (14) and (15), where the composite reference state consists of the seven solutes described previously.

All calculations were done using only the saved coordinates. To determine the energy of the perturbed states, the coordinates were modified as necessary so that they would correspond to the correct bond lengths. The resulting free energies are given in Table IV.

FIGURE 1. Topology for H7 solutes. See Table I for charges, and Tables II and III for van der Waals parameters.

TABLE I. Atomic Charges Used for Each Solute (See Fig. 1).

Solute      A1      B1      B2      C1      C2      D1      D2, D3  D4-D6
Nothing     0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
Methane    -0.464   0.116   0.116   0.116   0.116   0.000   0.000   0.000
Ethane     -0.027  -0.027   0.009   0.009   0.009   0.009   0.009   0.000
Propane     0.296  -0.308  -0.308  -0.041  -0.041   0.067   0.067   0.067
X           0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
X2          0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
X3          0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
CH3OH       0.196  -0.667   0.016   0.016   0.016   0.423   0.000   0.000
(CH3)2O    -0.344  -0.029  -0.029   0.067   0.067   0.067   0.067   0.067
CH3NH2      0.307  -1.002  -0.017  -0.017  -0.017   0.000   0.373   0.000
CH3NH3+    -0.038  -0.192   0.114   0.114   0.114   0.296   0.296   0.000
(CH3)2NH   -0.691  -0.011  -0.011   0.373   0.000   0.057   0.057   0.057
CH3F        0.011  -0.251   0.080   0.080   0.080   0.000   0.000   0.000
CH2F2       0.390  -0.246  -0.246   0.051   0.051   0.000   0.000   0.000
CHF3        0.644  -0.230  -0.230  -0.230   0.046   0.000   0.000   0.000
CF4         0.756  -0.189  -0.189  -0.189  -0.189   0.000   0.000   0.000

TABLE II. Atom van der Waals Parameters by Solute (See Fig. 1 and Table III).

Solute      A1    B1    B2    C1    C2    D1    D2, D3  D4-D6
Nothing     DH    DC    DC    DH    DH    DH    DH      DH
Methane     CT    HC    HC    HC    HC    DH    DH      DH
Ethane      CT    CT    HC    HC    HC    HC    HC      DH
Propane     CT    CT    CT    HC    HC    HC    HC      HC
X           X     DC    DC    DH    DH    DH    DH      DH
X2          X     X     DC    DH    DH    DH    DH      DH
X3          X     X     X     DH    DH    DH    DH      DH
CH3OH       CT    OH    HC    HC    HC    HO    DH      DH
(CH3)2O     OS    CT    CT    H1    H1    H1    H1      H1
CH3NH2      CT    NX    H1    H1    H1    DH    HX      DH
CH3NH3+     CT    N3    HP    HP    HP    H     H       DH
(CH3)2NH    NX    CT    CT    HX    DH    H1    H1      H1
CH3F        CT    F     HF    HF    HF    DH    DH      DH
CH2F2       CT    F     F     HF    HF    DH    DH      DH
CHF3        CT    F     F     F     HT    DH    DH      DH
CF4         CT    F     F     F     F     DH    DH      DH

TABLE III. van der Waals Parameters by Atom Type (See Table II).

Atom type   R* (Å)    ε (kcal/mol)
CT          1.9082    0.1094
DC          1.9082    0.0000
DH          1.4872    0.0000
F           1.7500    0.0610
H           0.6000    0.0157
H1          1.3870    0.0157
HC          1.4872    0.0157
HF          1.2900    0.0150
HO          0.0000    0.0000
HP          1.1000    0.0157
HT          1.2100    0.0150
HX          0.6890    0.0157
OH          1.7210    0.2104
OS          1.6837    0.1700
N3          1.8240    0.1700
NX          1.8750    0.1700
X           1.6000    0.1000

The MM1 and MM400 simulation sets were used to estimate the RMS error in the calculated solvation free energy difference between two methane solutes (see Fig. 2). Because of numerical errors, the calculated free energy difference will not, in general, be zero.

The MM1 simulations involved 18 consecutive 200.5-ps simulations of methane(1) and 18 consecutive 200.5-ps simulations of methane(2). Individual methane(1) simulations were paired with individual methane(2) simulations to give 18 sets of 401-ps simulations. No parameters were changed during these simulations, so no equilibration beyond the initial equilibration was done.

The MM400 simulations also involved 18 sets of 401-ps simulations. However, each simulation involved 401 windows, each 1 ps in length. During each simulation, the solute parameters were changed such that the interaction energy was a linear combination of methane(1) and methane(2) [see eq. (16) and Fig. 2]. This resulted in the consecutive simulation of methane(1), 399 intermediate solutes, and methane(2). The initial structure for each window was taken from the final state of the previous window and was followed by 0.5 ps of equilibration. That is, for each of the 1.0-ps windows, data were generated using only the last 0.5 ps, allowing the system time to equilibrate.

The MM1 and MM400 simulations used about the same total CPU time. Half of the MM400 simulation time was devoted to equilibration and half to data collection. No equilibration was done for the MM1 simulations, so all of their time was devoted to data collection. The derivative of the energy with respect to $\lambda$, $d\mathcal{V}_\lambda(\mathbf{r})/d\lambda$, was stored at each step so that relative energies could be calculated [see eq. (16) and associated text].

Four free energy calculation methods were used to calculate the free energy difference from the data generated:

1. TI, using eq. (2).
2. FEP, using eq. (4), done for both the forward (FEPf) and reverse (FEPr) directions. The combined (FEPc) value is the average of the forward and reverse calculations.
3. IPS, using eqs. (10) and (11). The intermediate perturbed state was defined in two ways: by assigning a value of zero to $\Delta G_{21}$ on the right-hand side of eq. (11) (IPSO), or by solving for $\Delta G_{21}$ self-consistently (IPSsc) as described.
4. CRS, using eqs. (14) and (15). The composite reference state consists of methane(1) and methane(2) for the MM1 simulations, and of methane(1), methane(2), and the 399 intermediate solutes for the MM400 simulations.

The results are given as the RMS deviation of the 18 estimates around three values:

1. the mean of the 18 estimates;
2. the best estimate, found by combining the data from all 18 simulations and doing one large calculation;
3. the correct answer (in this case, zero).

The results are shown in Table VI.
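The three RMS deviations reported for the 18 estimates differ only in the value they are measured around. The minimal sketch below assumes NumPy and a population-style mean over the estimates (dividing by 18 rather than 17). The text does not specify which convention was used, and the function name is hypothetical.

```python
import numpy as np


def rms_about(estimates, center):
    """RMS deviation of a set of free energy estimates about a chosen value."""
    e = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((e - center) ** 2)))
```

Given an array `est` of the 18 estimates and a hypothetical pooled value `best`, three calls give the three kinds of deviation: `rms_about(est, est.mean())`, `rms_about(est, best)`, and `rms_about(est, 0.0)`. For any value c, the squared RMS about c equals the squared RMS about the mean plus (mean - c)². The deviation about the correct answer is therefore never smaller than the deviation about the mean.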

The ME1 and ME400 simulation sets were used to estimate the RMS error in the calculated solvation free energy difference between methane and ethane (see Fig. 3). The ME1 simulations were done analogously to MM1, and the ME400 simulations were done analogously to MM400. The results were calculated in the same manner as the results from the methane-to-methane simulations. The correct answer for this calculation is not known, so we will treat the "best" result from the traditional method as correct. This is the FEPc method with the ME400 simulations, giving 0.24 kcal/mol. The results are shown in Table VII.

Relative solvation free energies for nine additional solutes were calculated using only the H7 simulations just described; the results are shown in Table VIII. This was done using the CRS method [eqs. (14) and (15)]. The composite reference state consisted of only the coordinates from the "nothing," methane, ethane, propane, X, X2, and X3 simulations described above. The perturbed states were not included in the composite reference state. Where necessary, the coordinates were modified so that they would correspond to the correct bond lengths of the perturbed states.

The parameters came from the following sources (see Tables I, II, and III for the parameters):

- methanol: Sun and Kollman;20
- dimethylether: developed analogously to those for methanol;
- methylamine and dimethylamine: Morgantini and Kollman;21
- methylammonium: Meng et al.;22
- the fluorocarbons: Gough et al.23

Results and Discussion

In this study, the expected errors for differentfree energy calculation methods are compared fromboth a theoretical and an empirical perspective. Todo the theoretical comparison it was necessary toassume that the estimates were based on a largenumber of independent samples. Assuming thiswas valid, it was shown that the IPS method has a

VOL. 18, NO. 7910


TABLE IV. Relative Free Energies (kcal/mol) Calculated Using the FEP, IPS, and CRS Methods Based on H7 Simulations.(a)

                   Perturbed state
Reference  Method  Nothing  Methane  Ethane   Propane  X        X2       X3
Nothing    FEP              1.99     4.66     146      2.03     2.16     4.25
           IPS     0.00     1.91     1.39     DNC(b)   2.20     2.40     2.98
           CRS              2.71     2.95     3.11     2.19     2.52     2.84
Methane    FEP     2.07              0.14     0.50     0.71     0.90     0.91
           IPS     -1.91    0.00     0.18     0.05     -0.44    -0.18    0.11
           CRS     -2.71             0.24     0.40     -0.52    -0.19    0.13
Ethane     FEP     3.10     1.10              0.27     1.83     0.46     0.54
           IPS     -1.39    -0.18    0.00     0.17     -0.68    -0.48    -0.18
           CRS     -2.95    -0.24             0.16     -0.76    -0.43    -0.11
Propane    FEP     5.54     2.99     1.25              4.46     3.32     2.17
           IPS     DNC(b)   -0.05    -0.17    0.00     -1.32    -0.47    -0.24
           CRS     -3.11    -0.40    -0.16             -0.92    -0.59    -0.26
X          FEP     0.14     0.55     0.49     6.95              0.33     0.77
           IPS     -2.20    0.44     0.68     1.32     0.00     0.35     0.71
           CRS     -2.19    0.52     0.76     0.92              0.33     0.66
X2         FEP     1.12     0.63     0.62     0.80     0.32              0.34
           IPS     -2.40    0.18     0.48     0.47     -0.35    0.00     0.34
           CRS     -2.52    0.19     0.43     0.59     -0.33             0.32
X3         FEP     2.01     0.50     0.32     0.32     0.98     0.63
           IPS     -2.98    -0.11    0.18     0.24     -0.71    -0.34    0.00
           CRS     -2.84    -0.13    0.11     0.26     -0.66    -0.32

(a) The row of each cell indicates the reference state used for the calculation; the column indicates the perturbed state. The first number in each cell is the result using the FEP method, the second is for the IPS (self-consistent) method, and the last is for the CRS method (done by combining simulations of all solutes to form the composite reference state).
(b) Did not converge.

smaller expected RMS error than the FEP method. It was also shown that to keep the errors in the estimates small, accessible conformations of the perturbed state must also be accessible during a simulation of the reference state; this was used to argue that the CRS method should also reduce the magnitude of the expected errors.

Unfortunately, an interpretation of the empirical comparison of the methods is not straightforward, in part because only a small number of systems

FIGURE 2. Topology for MM solutes. "C" indicates a carbon, "H" indicates a hydrogen, and "D" indicates a dummy atom (no charge or van der Waals interaction). See Table III for van der Waals parameters (atom types CT and HC). Charges are: qC = -0.464, qH = 0.116.


RADMER AND KOLLMAN

TABLE V. Previously Calculated Relative Solvation Free Energy for Selected Hydrocarbons.

Perturbation          Previously calculated free energy (kcal/mol)(a)
Nothing → Methane     2.67 ± 0.04
Methane → Ethane      0.15 ± 0.07
Ethane → Propane      0.18 ± 0.09

(a) See Sun and Kollman.20

were examined. Additionally, there are many different ways to perform the calculations consistently with each method, and, again, only a small number of them could be examined. For example, to ensure reasonable estimates we performed long (400-ps) simulations for each of the 18 methane-to-methane and the 18 methane-to-ethane simulations. We assumed that conclusions drawn from these simulations would be appropriate for shorter simulations that are more typical. With these caveats in mind, results from the five sets of simulations performed will be described and compared.

Relative solvation free energies for the H7 solutes are shown in Table IV. The correct relative solvation free energy for the parameters used is not known, so we will compare the solvation free energies of methane, ethane, and propane with results from a simulation reported by Sun and Kollman20 (see Table V). The RMS errors in the free energy estimates were not determined for these calculations; we only show whether or not the estimates are in reasonable agreement with previous calculations.

Results using the FEP method (first number in each cell) depend on the direction of the perturbation. For example, the relative solvation free energy for methane to ethane using the methane simulation is calculated as 0.14 kcal/mol, and the relative solvation free energy for ethane to propane using the ethane simulations is 0.27 kcal/mol. Both results are in good agreement with the previously calculated values given in Table V. Even the "nothing" to methane perturbation is surprisingly accurate if the calculation is done using the X atom as an intermediate ("nothing" to X gives 2.03 kcal/mol and X to methane gives 0.55 kcal/mol, resulting in a combined value of 2.58 kcal/mol). However, the reverse calculations (where the calculation is based on simulations of the larger of the two solutes) give values in error by over 1 kcal/mol for the propane to ethane and ethane to methane calculations, and the error in the methane to "nothing" calculation is even larger.

These results are consistent with our expectations in that accessible conformations of the perturbed state must be accessible during a simulation of the reference state to give good estimates. For example, adding a methyl group to methane can be permissible because the simulation of methane will occasionally generate water structures appropriate for ethane. Conversely, removing a methyl group during a simulation of ethane does not give good estimates because the simulation of ethane will not generate water structures appropriate for methane. A slightly oversimplified description of this situation is that the water structure around the larger solute is a subset of the

TABLE VI. Relative Free Energy Estimates for Methane to Methane.(a)

Relative free energy (kcal/mol)

         Mean ± RMS deviation               Best ± RMS deviation               0.000 ± RMS deviation
Method   MM1                MM400           MM1                MM400           MM1               MM400
TI       -18.443 ± 27.835   -0.096 ± 0.458  -18.443 ± 27.835   -0.096 ± 0.458  0.000 ± 33.390    0.000 ± 0.468
FEPf     1.620 ± 0.508      0.085 ± 0.208   1.274 ± 0.615      0.029 ± 0.215   0.000 ± 1.698     0.000 ± 0.224
FEPr     -1.774 ± 0.268     -0.089 ± 0.217  -1.707 ± 0.277     -0.039 ± 0.222  0.000 ± 1.794     0.000 ± 0.234
FEPc     -0.077 ± 0.323     -0.002 ± 0.194  -0.217 ± 0.352     -0.005 ± 0.194  0.000 ± 0.332     0.000 ± 0.194
IPS0     -0.041 ± 0.168     -0.004 ± 0.229  -0.039 ± 0.168     -0.007 ± 0.230  0.000 ± 0.173     0.000 ± 0.230
IPS-SC   -0.040 ± 0.163     -0.009 ± 0.225  -0.038 ± 0.163     -0.010 ± 0.225  0.000 ± 0.168     0.000 ± 0.225
CRS      -0.040 ± 0.163     -0.002 ± 0.247  -0.038 ± 0.163     -0.002 ± 0.247  0.000 ± 0.168     0.000 ± 0.247

(a) Calculations were done using two sets of 18 methane-to-methane simulations, each 401 ps long (the MM1 and MM400 simulations). Root mean square (RMS) deviations are of the individual simulations around the estimate given. They are not the standard deviation of the estimate.



FIGURE 3. Topology for ME solutes. "C" indicates a carbon, "H" indicates a hydrogen, and "D" indicates a dummy atom (no charge or van der Waals interaction). See Table III for van der Waals parameters (atom types CT and HC). Charges for methane are: qC = -0.464, qH = 0.116; charges for ethane are: qC = -0.027, qH = 0.009.

water structure around the smaller solute, allowing simulations of the smaller solute to adequately sample water structure of both solutes.

The IPS method (second number in each cell) gives results in good agreement with those given in Table V for changes in the solute involving the addition of one methyl group (or less). For larger changes in the solute the IPS method does not give good estimates. Note that the IPS calculations require simulations of both solutes, so the two IPS entries for any pair of solutes are identical apart from sign (unlike the FEP calculation, which is based on simulations of only one of the two solutes). The CRS method (last number in each cell), which uses a single composite reference state formed by combining simulations of all seven solutes to calculate the relative free energies, gives results in good agreement with those given in Table V for all pairs of solutes.

A problem with the comparison of results from the H7 simulations is that it shows only whether or not the results are in reasonable agreement with previous calculations. To get reasonable estimates of the errors for each of the different methods, we have examined some simple perturbations in much greater detail.

Results from the 400-window methane-to-methane (see Fig. 2) simulations are given in Table VI (the MM400 simulations). The correct relative solvation free energy is obviously zero, so the last two columns are the most useful for comparing RMS deviations of the estimates for the different methods. This shows that the FEP, IPS, and CRS methods all give about the same RMS error (about 0.2 kcal/mol), and the TI method gives an RMS error about twice as large. In general, this is consistent with our expectations because the theoretical advantage of the IPS and CRS methods is that

TABLE VII. Relative Free Energy Estimates for Methane to Ethane.(a)

Relative free energy (kcal/mol)

         Mean ± RMS deviation               Best ± RMS deviation               0.240 ± RMS deviation
Method   ME1                ME400           ME1                ME400           ME1               ME400
TI       217.664 ± 14.914   0.520 ± 0.489   217.664 ± 14.914   0.520 ± 0.489   0.240 ± 217.935   0.240 ± 0.564
FEPf     0.181 ± 0.080      0.257 ± 0.210   0.175 ± 0.080      0.214 ± 0.214   0.240 ± 0.100     0.240 ± 0.211
FEPr     -1.199 ± 0.382     0.204 ± 0.210   -0.964 ± 0.449     0.265 ± 0.219   0.240 ± 1.489     0.240 ± 0.213
FEPc     -0.509 ± 0.189     0.230 ± 0.187   -0.394 ± 0.222     0.240 ± 0.187   0.240 ± 0.773     0.240 ± 0.187
IPS0     0.174 ± 0.074      0.252 ± 0.183   0.171 ± 0.074      0.230 ± 0.184   0.240 ± 0.100     0.240 ± 0.183
IPS-SC   0.172 ± 0.074      0.249 ± 0.174   0.169 ± 0.074      0.235 ± 0.174   0.240 ± 0.100     0.240 ± 0.174
CRS      0.172 ± 0.074      0.237 ± 0.184   0.169 ± 0.074      0.222 ± 0.185   0.240 ± 0.100     0.240 ± 0.185

(a) Calculations were done using two sets of 18 methane-to-ethane simulations, each 401 ps long (the ME1 and ME400 simulations). Root mean square (RMS) deviations are of the individual simulations around the estimate given. They are not the standard deviation of the estimate.



TABLE VIII. Relative Solvation Free Energy Calculated Using the CRS Method [Eqs. (14) and (15)], Where the Composite Reference State Consists of only Nothing, Methane, Ethane, Propane, X, X2, and X3 (See Text). These Results Were Calculated Based on the Composite Reference State Formed by Combining the H7 Simulations.

                           Free energy (kcal/mol)
Perturbation               CRS      Rank(a)  Previous calc.  Rank(a)
Ethane → CH3OH-1(b)        -6.11    n/a      -6.89(d)        n/a
Ethane → CH3OH-2(b)        -5.55    n/a      -6.89(d)        n/a
Ethane → CH3OH-3(b)        -5.43    n/a      -6.89(d)        n/a
Ethane → CH3OH(c)          -5.77    2        -6.89(d)        2
Propane → O(CH3)2          -3.02    5        -3.5(e)         5
Nothing → CH3NH2-1(b)      -3.05    n/a      -3.57(f)        n/a
Nothing → CH3NH2-2(b)      -2.12    n/a      -3.57(f)        n/a
Nothing → CH3NH2-3(b)      -1.98    n/a      -3.57(f)        n/a
Nothing → CH3NH2(c)        -2.58    3        -3.57(f)        3
Nothing → CH3NH3+          -58.9    1        -66.9(g)        1
Nothing → (CH3)2NH-1(b)    -0.35    n/a      -1.95(f)        n/a
Nothing → (CH3)2NH-2(b)    -0.03    n/a      -1.95(f)        n/a
Nothing → (CH3)2NH(c)      -0.21    4        -1.95(f)        4
Methane → CH3F             -2.44    6        -2.3(h)         7
Methane → CH2F2            -2.21    7        -2.7(i)         6
Methane → CHF3             -1.89    8        -1.7(i)         8
Methane → CF4              0.42     9        0.44(h)         9

(a) Ranked based on solvation free energy, using solvation free energies for methane, ethane, and propane from Table V.
(b) Only one of the possible hydrogen orientations was used.
(c) All the possible hydrogen orientations were used.
(d) See Sun and Kollman.20
(e) Used different method.32
(f) See Morgantini and Kollman.21
(g) See Meng et al.22 (this does not include the Born cutoff correction of -20.5 kcal/mol).
(h) See Gough et al.23
(i) Average of two indirect paths.23

they require only some overlap between states. In this case, each calculation is divided into many small calculations (using 399 intermediate states), so the overlap between adjacent states is apparently adequate to allow each simulation to sample enough of the perturbed state to get good free energy estimates (even for the FEP method). Thus poor overlap is not the limiting factor in these calculations.

We were moderately surprised that the TI errors were not more similar to the FEP errors. This suggests that the rate of change in free energy with respect to the coupling parameter (λ) is not constant over each window [see eq. (2) and associated text for this calculation]. In fairness to the TI method, it should be noted that no effort was made to choose intermediates such that the free energy change would be more nearly linear with λ.

To evaluate how well the methods would perform for calculations involving poor overlap between the two states of interest, the relative free energy was calculated using no intermediate states (the MM1 simulations). The results are consistent with our expectations in that the IPS and CRS methods give the smallest RMS errors (about 0.2 kcal/mol), followed by the FEP methods (from 0.3 to 1.8 kcal/mol). The TI method gives the worst results, as expected, because an assumption used to find eq. (2) is not appropriate here. The most surprising result from this set of calculations is that RMS errors using the combined FEP method (FEPc) are only about one fifth the size of the RMS errors using the forward or the reverse FEP methods (0.3 kcal/mol vs. 1.7 and 1.8 kcal/mol). This seemed a bit fortuitous considering that each FEPc estimate is calculated by averaging an FEPf and an FEPr estimate. A possible explanation for this is that the symmetry of this calculation (methane to methane) led to cancellation of errors that would not be expected for most useful calculations. If the FEPc results are ignored (we will come back to this point later), these results show that, for calculations involving poor overlap between the states of interest, the IPS and CRS methods are clearly superior to the FEP and TI methods. This is consistent with results from previous comparisons of the FEP and IPS methods done by Saito and Nakamura24 and Ferguson.25

The real question we hoped to answer with these simulations was whether it is better to use many windows (giving better overlap between states) or few windows (allowing more simulation of the states of interest). With this in mind, the MM1 and MM400 simulations were performed such that they used approximately the same amount of CPU time. Our results show that the MM1 simulations give smaller RMS errors (for the IPS and CRS methods), but the difference is small enough that it may not be meaningful. In fact, the MM1 simulations generated twice as much useful data as the MM400 simulations (because half the MM400 simulation time was devoted to equilibrating the system, which was not required for the MM1 simulations), possibly explaining why the ratio of the MM400 and MM1 errors is approximately equal to the square root of 2. It is also possible that a different choice of intermediate states would improve the results from the MM400 simulations. Thus, the results of the MM1 and MM400 simulations do not indicate whether it is better to use 1 or 400 windows. Obviously, an intermediate number of windows may be better for this case, but we chose not to pursue the matter further, in large part because the optimal number of windows is likely to be system dependent.

The major disadvantage of using the methane-to-methane simulations to compare free energy calculation methods is that the symmetry of this change may lead to cancellation of errors that would not take place with most relevant systems. We therefore performed and examined a methane-to-ethane calculation in a manner similar to the methane-to-methane calculation described previously. Unfortunately, the correct relative solvation free energy for these parameters is not known (results shown in Table V are not adequately converged for this comparison). We therefore use the estimate from the traditional method (0.24 kcal/mol, using 400 windows and the FEPc method; see Table VII). Because this may not be the correct result, we are biasing the conclusions we draw regarding the errors in the different methods in favor of this method.

Errors for the different methods applied to the 400-window methane-to-ethane simulations are given in Table VII (the ME400 simulations). As with the MM400 simulations, the FEP, IPS, and CRS methods all give about the same RMS error (about 0.2 kcal/mol), so poor overlap is not the limiting factor in these calculations. The TI method gives RMS errors about three times larger, so the free energy is not nearly linear over the Δλ used here.

When one window is used (ME1 simulations), the IPS and CRS methods give the smallest RMS errors (0.1 kcal/mol). Two additional points are worth noting here. First, the FEPc method does not give good estimates (its RMS error is about 0.8 kcal/mol). This shows that, in general, the FEPc method should not be used for calculations involving large perturbations (despite the small error obtained using the FEPc method and the MM1 simulations). The second important point is that the FEPf method gives errors as small as the IPS and CRS methods, whereas the FEPr method gives errors more than an order of magnitude larger. This is consistent with the expectation that simulations of the smaller solute can be used to give estimates of the relative solvation free energy of the larger solute, but the converse is not true. This is also qualitatively consistent with the FEP results from the H7 simulations discussed previously.

A comparison of RMS errors from the ME1 and ME400 simulations shows that the ME1 simulations give smaller errors, but we have the same concerns discussed above in the comparison of the MM1 and MM400 simulations. It should be noted, however, that if a relative free energy of 0.15 kcal/mol (as reported by Sun and Kollman20; see Table V) is assumed to be correct, then free energy estimates based on the one-window simulation are in excellent agreement and, more importantly, the RMS deviation from 0.15 kcal/mol is smaller than the RMS deviation from 0.24 kcal/mol. The converse is true for the 400-window simulations. In summary, our results suggest that a single window gives better results for this case, but the difference does not seem to be significant.

The observation that simulations of methane can be used to estimate relative solvation free energies of ethane suggests the use of simulations of a reference solute to estimate the relative free energy of modified solutes that have not been simulated. This has been done previously. For example, free energy changes have been estimated using the rate of change in free energy as a function of atomic parameters, calculated using a single simulation.26,27 Also, van Gunsteren and coworkers used "soft" atoms, which allow sampling of conformations corresponding to the presence of a real atom or no atom at the soft-atom position.28

Instead of using a simulation of one solute, we propose combining coordinates from simulations of a set of solutes using the CRS method. The advantage of doing this is that it gives more flexibility for selecting the regions of conformation space that will be sampled. If this is done carefully, a relatively small set of appropriately chosen solutes could be used to sample the important conformations of a larger number of solutes that are not simulated. The combined coordinates can then be used to find the relative free energies of these unsimulated solutes.
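Equations (14) and (15) are not reproduced in this section, so the sketch below uses the standard binless WHAM/MBAR self-consistent equations as an assumed stand-in for a composite reference state. The pooled samples are treated as drawn from a mixture proportional to Σk Nk exp[-(Uk - fk)/RT], the free energies fk of the simulated states are found self-consistently, and the free energy of a state that was never simulated is then obtained by reweighting the same pooled samples. The one-dimensional harmonic "solutes," RT value, and function names are hypothetical, chosen because the exact answer, (RT/2) ln(k/k0), is known.

```python
import math
import random

RT = 0.596  # kcal/mol, about 300 K (assumed for illustration)


def logsumexp(xs):
    """Numerically stable ln(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def composite_reference(samples, potentials, counts, n_iter=100):
    """Self-consistent free energies f_k of the simulated states.

    Returns (f, log_mix): f[k] is relative to state 0, and log_mix[i] is
    ln sum_k N_k exp(-(U_k(x_i) - f_k)/RT) for each pooled sample x_i.
    """
    u = [[p(x) / RT for x in samples] for p in potentials]  # reduced energies
    log_n = [math.log(c) for c in counts]
    n_states = len(potentials)

    def mixture(f):
        return [logsumexp([log_n[k] - u[k][i] + f[k] / RT for k in range(n_states)])
                for i in range(len(samples))]

    f = [0.0] * n_states
    for _ in range(n_iter):
        log_mix = mixture(f)
        f = [-RT * logsumexp([-u[k][i] - log_mix[i] for i in range(len(samples))])
             for k in range(n_states)]
        f = [fk - f[0] for fk in f]  # free energies are defined up to a constant
    return f, mixture(f)


def predict_unsimulated(potential, samples, log_mix):
    """Free energy (relative to state 0) of a state that was never simulated."""
    return -RT * logsumexp([-potential(x) / RT - lm for x, lm in zip(samples, log_mix)])


def harmonic(k):
    """Hypothetical 1-D state with U(x) = k x**2 / 2 (kcal/mol)."""
    return lambda x: 0.5 * k * x * x


if __name__ == "__main__":
    rng = random.Random(3)
    ks, n = [1.0, 4.0], 1000
    # Boltzmann samples of each simulated state: x ~ N(0, RT / k)
    samples = [rng.gauss(0.0, math.sqrt(RT / k)) for k in ks for _ in range(n)]
    f, log_mix = composite_reference(samples, [harmonic(k) for k in ks], [n] * len(ks))
    f_new = predict_unsimulated(harmonic(2.0), samples, log_mix)
    print(f"simulated k=4: {f[1]:.3f} (exact {0.5 * RT * math.log(4.0):.3f})")
    print(f"unsimulated k=2: {f_new:.3f} (exact {0.5 * RT * math.log(2.0):.3f})")
```

The unsimulated state here lies inside the region sampled by the pooled simulations; as the text emphasizes, reweighting of this kind can only be as good as the composite sampling of that state's important conformations.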

To evaluate this method, we found the composite reference state for the saved coordinates from the H7 simulations, as described previously. Equations (14) and (15) were used to calculate the relative solvation free energy of nine solutes that were not simulated. They are methanol, dimethylether, methylamine, dimethylamine, methylammonium, and four fluoromethanes. These solutes were chosen because they all have structural similarities to at least one of the seven simulated solutes and, additionally, they have had their solvation free energies calculated previously so that comparisons can be made. The results are given in Table VIII.

The estimated solvation free energies for the fluorocarbons are in good agreement with previously calculated results. This shows that heavy atoms can be added to the solute in a single step. The free energies for the hydrogen bonding solutes are all less negative than the previously calculated values, indicating that the water structure for the seven hydrophobic simulated solutes did not sample all of the solvent structure for solutes involved in electrostatic interactions with the solvent. For example, the solvation free energies for methanol and methylamine are both underestimated by about 1 kcal/mol. Although quantitatively disappointing, this is impressive qualitatively, because the calculation shows much of the expected hydrogen bonding between the perturbed solute and water (meaning that the simulations did sample perturbed-state water structures), even though it is based on simulations of only hydrophobic solutes.

Even the calculation of methylammonium, which involves adding a net charge, shows much of the expected electrostatic interaction with water.

The least impressive calculation for neutral solutes underestimates the solvation free energy for dimethylamine by almost 2 kcal/mol. We found this surprising because the solvation free energy for dimethylether is underestimated by only 0.5 kcal/mol, even though the two solutes are structurally similar. To understand why this is the case, interactions between dimethylamine and a single water molecule as well as interactions between dimethylether and a single water molecule were examined. This was done by moving a water molecule around each of the two solutes and identifying the low-energy conformations. For each solute, a total of 500 million conformations were generated. Almost 20,000 conformations within 0.6 kcal/mol of the minimum energy conformation for dimethylether were found, whereas only about 2300 conformations within 0.6 kcal/mol of the minimum energy conformation for dimethylamine were found (this order-of-magnitude difference was constant for any energy difference up to at least 3 kcal/mol).

Examination of these conformations shows that, for near-optimal water-dimethylether interactions, the water can occupy any position near one, or both, of the lone pairs on dimethylether's oxygen. However, near-optimal interactions with dimethylamine require that the water approach the lone pair of dimethylamine's nitrogen from above the plane of the three heavy atoms, allowing the water to occupy only one end of the region occupied by waters interacting with dimethylether. This shows that it is easier to find good interactions between dimethylether and a single water than between dimethylamine and a single water. We assume this is also true for bulk water, accounting for the difference in errors. We also assume that the error would be reduced if a better choice of reference solutes is used to define the composite reference state, or longer simulations are performed.

The solvation free energies of solutes found using this method clearly are not as accurate as calculations where all solutes are simulated; however, the ranking of absolute solvation free energy, based on previously calculated values, is almost identical to the ranking based on the CRS solvation free energies (see Table VIII). This is of interest because it suggests that this method could be used to estimate and rank interaction free energies between a receptor and a large number of proposed ligands, without simulating each of them. A scenario that might be useful for doing structure-based ligand design with biological macromolecules starts by simulating the macromolecule bound to a known ligand and calculating the rate of change in the free energy with respect to parameters (see Cieplak et al.27,29 for an example of the use of free energy derivatives). This, in conjunction with examination of the structures using molecular graphics, could be used to suggest ligands that might bind the macromolecules more tightly. These proposed ligands would then be quickly ranked, based on their estimated binding free energies, using the method described here. The most promising ligands could then be examined carefully using more accurate free energy calculation methods, or their binding could be determined experimentally.

Conclusions

In the "Theoretical Background" section, we argue that the IPS and CRS methods should give smaller expected errors than the FEP and TI methods. An empirical comparison of the methods shows that, in general, the IPS and CRS methods give better results if the overlap between the states is small (as for the MM1 and ME1 simulations). However, if the overlap is very good (as for the MM400 and ME400 simulations), then the FEP, IPS, and CRS methods give about the same RMS error. Our results also suggest that using few intermediate states (and the IPS or CRS method) may give better results than using many states, but the difference does not appear to be significant. Considering that the IPS and CRS methods never give significantly larger errors, our results suggest that the IPS or CRS methods should be used in place of the FEP or TI methods, particularly if the overlap between states is not known to be good (it should be kept in mind that the overlap may be inadequate for even the IPS or CRS methods, in which case additional intermediate states would be needed, regardless of the method used).

We also show that estimating the solvation free energies of solutes that are not simulated is possible, although the results are less accurate. For the solutes examined, the ranking of solvation free energy is almost identical to the previously calculated ranking. This may be useful for estimating the relative free energies of a large number of compounds to identify which would be worth examining further. We are currently exploring the possibility of using this procedure to estimate the relative binding free energy for ligand/macromolecule interactions so that we can identify ligand modifications worthy of more accurate free energy calculations. We anticipate that this will be useful for improving the binding of drugs or lead compounds to biomolecules.

Acknowledgments

R. J. R. gratefully acknowledges support from NIH Grant GM-39552 (G. L. Kenyon); P. A. K. thanks the NSF (Grant CHE-94-13472).

Appendix

RMS ERRORS IN FREE ENERGY ESTIMATES AND OPTIMAL INTERMEDIATE STATES

The RMS Error in Free Energy Estimates for Simulations Performed on State 0 and Perturbed to State 1

The RMS error of a free energy estimate is:

$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) = \left\{ E\left[ (\Delta g_{10} - \Delta G_{10})^2 \right] \right\}^{1/2} \qquad (17)$$

where $\Delta G_{10}$ is the free energy, $\Delta g_{10}$ is an estimate of the free energy, and $E(\ldots)$ indicates the expectation value calculated over all possible sets of samples. The error can be written as:

$$\Delta g_{10} - \Delta G_{10} = -RT \ln \left[ \frac{1}{n_0} \sum_{i=1}^{n_0} \frac{\rho_1(r_i)}{\rho_0(r_i)} \right] \qquad (18)$$

where the coordinates, $r_i$, are chosen with a Boltzmann probability, $\rho_0(r_i)$. Substituting eq. (18) into eq. (17) and finding the expectation value gives:


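$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) = RT \left\{ \int_{r_1} \cdots \int_{r_{n_0}} \left[ \ln \left( \frac{1}{n_0} \sum_{i=1}^{n_0} \frac{\rho_1(r_i)}{\rho_0(r_i)} \right) \right]^2 \prod_{k=1}^{n_0} \rho_0(r_k)\, dr_k \right\}^{1/2} \qquad (19)$$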

We cannot solve this exactly, so the logarithm is approximated with the first term of its Taylor series expansion:

$$\ln \left[ \frac{1}{n_0} \sum_{i=1}^{n_0} \frac{\rho_1(r_i)}{\rho_0(r_i)} \right] \approx \frac{1}{n_0} \sum_{i=1}^{n_0} \frac{\rho_1(r_i)}{\rho_0(r_i)} - 1 \qquad (20)$$



For sets of samples that give good estimates of the free energy ($\Delta g_{10} - \Delta G_{10} \approx 0$), this approximation is obviously reasonable [see eq. (18)]. The approximation is not good for sets of samples that overestimate or underestimate the free energy. Problems resulting from this approximation may be most serious as $\frac{1}{n_0} \sum_{i=1}^{n_0} \rho_1(r_i)/\rho_0(r_i)$ approaches zero, because the right-hand side of eq. (20) approaches negative unity instead of negative infinity. This situation is most likely to occur if there are a significant number of possible sets of samples that sample the perturbed state only in inaccessible regions. For example, if most of the conformations sampled during simulations of the reference state are not accessible to the perturbed state, the estimate of the RMS error given in this section is likely to underestimate the correct error.

Accepting these problems, the estimate of the RMS error can be simplified by substituting eq. (20) into eq. (19). Expanding the resulting integrand, integrating the cross terms, and moving the summation out from under the integral gives:

$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) \approx RT \left[ \int \frac{\rho_1(r)^2}{n_0\, \rho_0(r)}\, dr - \frac{1}{n_0} \right]^{1/2} \qquad (21)$$

RMS Error in Free Energy Estimates and the Optimal Perturbed State for Simulations Performed on States 0 and 1, Perturbed to State b

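If sampling on states 0 and 1 is independent, then:

$$\mathrm{RMS}^2(\Delta g_{10} - \Delta G_{10}) = \mathrm{RMS}^2(\Delta g_{b0} - \Delta G_{b0}) + \mathrm{RMS}^2(\Delta g_{b1} - \Delta G_{b1})$$

Applying eq. (21) to each term gives:

$$\mathrm{RMS}^2(\Delta g_{10} - \Delta G_{10}) \approx (RT)^2 \left[ \int \frac{\rho_b(r)^2}{n_0\, \rho_0(r)}\, dr - \frac{1}{n_0} + \int \frac{\rho_b(r)^2}{n_1\, \rho_1(r)}\, dr - \frac{1}{n_1} \right]$$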

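Using Euler's equation (see Boas,30 Pierre,31 or any other description of variational calculus), the $\rho_{b,\mathrm{opt}}$ that minimizes $\mathrm{RMS}^2(\Delta g_{10} - \Delta G_{10})$ can be found:

$$\frac{\partial}{\partial \rho_{b,\mathrm{opt}}(r)} \left[ \frac{\rho_{b,\mathrm{opt}}(r)^2}{n_0\, \rho_0(r)} + \frac{\rho_{b,\mathrm{opt}}(r)^2}{n_1\, \rho_1(r)} - 2\lambda_b\, \rho_{b,\mathrm{opt}}(r) \right] = 0$$

Differentiating and solving for $\rho_{b,\mathrm{opt}}(r)$ gives:

$$\rho_{b,\mathrm{opt}}(r) = \lambda_b \frac{n_0\, \rho_0(r)\, n_1\, \rho_1(r)}{n_0\, \rho_0(r) + n_1\, \rho_1(r)}$$

where $\lambda_b$ is found to normalize $\rho_{b,\mathrm{opt}}$.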

RMS Error in Free Energy Estimates and the Optimal Reference State for Simulations Performed on State s, Perturbed to States 0 and 1

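The error in the free energy estimate is:

$$\Delta g_{10} - \Delta G_{10} = -RT \ln \left[ \frac{1}{n_s} \sum_{i=1}^{n_s} \frac{\rho_1(r_i)}{\rho_s(r_i)} \right] + RT \ln \left[ \frac{1}{n_s} \sum_{i=1}^{n_s} \frac{\rho_0(r_i)}{\rho_s(r_i)} \right]$$

Using the first term of the logarithm's Taylor series expansion to approximate the logarithm [eq. (20)] gives:

$$\Delta g_{10} - \Delta G_{10} \approx -RT \left[ \frac{1}{n_s} \sum_{i=1}^{n_s} \frac{\rho_1(r_i)}{\rho_s(r_i)} - 1 \right] + RT \left[ \frac{1}{n_s} \sum_{i=1}^{n_s} \frac{\rho_0(r_i)}{\rho_s(r_i)} - 1 \right]$$

$$\Delta g_{10} - \Delta G_{10} \approx -RT \frac{1}{n_s} \sum_{i=1}^{n_s} \frac{\rho_1(r_i) - \rho_0(r_i)}{\rho_s(r_i)}$$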

This gives an RMS error of:


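$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) \approx RT \left\{ \int_{r_1} \cdots \int_{r_{n_s}} \left[ \frac{1}{n_s} \sum_{i=1}^{n_s} \frac{\rho_0(r_i) - \rho_1(r_i)}{\rho_s(r_i)} \right]^2 \prod_{k=1}^{n_s} \rho_s(r_k)\, dr_k \right\}^{1/2}$$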



Expanding the integrand and integrating where possible gives:

$$\mathrm{RMS}(\Delta g_{10} - \Delta G_{10}) \approx RT \left\{ \int \frac{[\rho_0(r) - \rho_1(r)]^2}{n_s\, \rho_s(r)}\, dr \right\}^{1/2} \qquad (22)$$

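Using Euler's equation as above, the $\rho_{s,\mathrm{opt}}$ that minimizes this equation can be found:

$$\frac{\partial}{\partial \rho_{s,\mathrm{opt}}(r)} \left\{ \frac{[\rho_0(r) - \rho_1(r)]^2}{\rho_{s,\mathrm{opt}}(r)} + \lambda_s\, \rho_{s,\mathrm{opt}}(r) \right\} = 0$$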

Differentiating and solving for $\rho_{s,\mathrm{opt}}(r)$ gives:

$$\rho_{s,\mathrm{opt}}(r) = \lambda_s \left| \rho_0(r) - \rho_1(r) \right|$$

where $\lambda_s$ is found to normalize $\rho_{s,\mathrm{opt}}$.

Unfortunately, there is a significant difficulty with this optimal state; it allows the RMS error in the two intermediate estimates, $\Delta g_{0s}$ and $\Delta g_{1s}$, to approach infinity, making them impossible to calculate in practice. We are not able to determine if this is a result of approximating the logarithm with part of its Taylor series expansion, or if it would be the case even if the RMS error could have been calculated without making any approximations.

References

1. C. H. Bennett, J. Comput. Phys., 22, 245-268 (1976).
2. S. Kumar, D. Bouzida, R. H. Swendsen, P. A. Kollman, and J. M. Rosenberg, J. Comput. Chem., 13, 1011-1021 (1992).
3. J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, New York, 1987.
4. D. L. Beveridge and F. M. DiCapua, Annu. Rev. Biophys. Biophys. Chem., 18, 431-492 (1989).
5. T. P. Straatsma and J. A. McCammon, Annu. Rev. Phys. Chem., 43, 407-435 (1992).
6. P. A. Kollman, Chem. Rev., 93, 2395-2417 (1993).
7. D. A. Pearlman, J. Phys. Chem., 98, 1487-1493 (1994).
8. R. W. Zwanzig, J. Chem. Phys., 22, 1420-1426 (1954).
9. M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Oxford University Press, New York, 1987.
10. W. F. van Gunsteren, Prot. Eng., 2, 5-13 (1988).
11. B. L. Tembe and J. A. McCammon, Comput. Chem., 8, 281-283 (1984).
12. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in FORTRAN: The Art of Scientific Computing, Cambridge University Press, New York, 1989.
13. G. M. Torrie and J. P. Valleau, J. Comput. Phys., 23, 187-199 (1977).
14. D. A. Pearlman, D. A. Case, J. W. Caldwell, G. L. Seibel, U. C. Singh, P. Weiner, and P. A. Kollman, University of California at San Francisco, 1991.
15. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys., 79, 926-935 (1983).
16. H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R. Haak, J. Chem. Phys., 81, 3684-3690 (1984).
17. W. F. van Gunsteren and H. J. C. Berendsen, Mol. Phys., 34, 1311-1327 (1977).
18. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and D. A. Case, J. Comput. Chem., 7, 230-252 (1986).
19. W. L. Jorgensen and C. Ravimohan, J. Chem. Phys., 83, 3050-3054 (1985).
20. Y. Sun and P. A. Kollman, J. Comput. Chem., 16, 1164-1169 (1995).
21. P.-Y. Morgantini and P. A. Kollman, J. Am. Chem. Soc., 117, 6057-6063 (1995).
22. E. C. Meng, P. Cieplak, J. W. Caldwell, and P. A. Kollman, J. Am. Chem. Soc., 116, 12061-12062 (1994).
23. C. A. Gough, D. A. Pearlman, and P. A. Kollman, J. Chem. Phys., 99, 9103-9110 (1993).
24. M. Saito and H. Nakamura, J. Comput. Chem., 11, 76-81 (1990).
25. D. M. Ferguson, J. Chem. Phys., 99, 10086-10087 (1993).
26. P. E. Smith and W. F. van Gunsteren, J. Chem. Phys., 100, 577-585 (1994).
27. P. Cieplak, D. A. Pearlman, and P. A. Kollman, J. Chem. Phys., 101, 627-633 (1994).
28. T. C. Beutler, A. E. Mark, R. C. van Schaik, P. R. Gerber, and W. F. van Gunsteren, Chem. Phys. Lett., 222, 529-539 (1994).
29. P. Cieplak and P. A. Kollman, J. Comput.-Aided Mol. Des., 7, 291-304 (1993).
30. M. L. Boas, Mathematical Methods in the Physical Sciences, 2nd Ed., Wiley, New York, 1983.
31. D. A. Pierre, Optimization Theory with Applications, Dover, New York, 1986.
32. D. L. Veenstra, personal communication.

