Annual Review of Physical Chemistry, Volume 51, Issue 1, 2000 [doi:10.1146/annurev.physchem.51.1.435] Cheatham III, Thomas E.; Kollman, Peter A. -- MOLECULAR DYNAMICS SIMULATION

P1: FUI August 12, 2000 15:49 Annual Reviews AR109-16

Annu. Rev. Phys. Chem. 2000. 51:435-71. Copyright © 2000 by Annual Reviews. All rights reserved

MOLECULAR DYNAMICS SIMULATION OF NUCLEIC ACIDS

Thomas E. Cheatham III, Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah 84112-5820; e-mail: [email protected]

Peter A. Kollman, Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94143-0446; e-mail: [email protected]

Key Words free energies, A/B transition, sequence specific structure, hydration, ion association

Abstract We review molecular dynamics simulations of nucleic acids, including those completed from 1995 to 2000, with a focus on the applications and results rather than the methods. After the introduction, which discusses recent advances in the simulation of nucleic acids in solution, we describe force fields for nucleic acids and then provide a detailed summary of the published literature. We emphasize simulations of small nucleic acids (6 to 24 mer) in explicit solvent with counterions, using reliable force fields and modern simulation protocols that properly represent the long-range electrostatic interactions. We also provide some limited discussion of simulation in the absence of explicit solvent. Absent from this discussion are results from simulations of protein-nucleic acid complexes and modified DNA analogs. Highlights from the molecular dynamics simulation are the spontaneous observation of A ↔ B transitions in duplex DNA in response to the environment, specific ion binding and hydration, and reliable representation of protein-nucleic acid interactions. We close by examining major issues and the future promise for these methods.

INTRODUCTION

The past 5 years have seen the culmination of a renaissance in the simulation of nucleic acid structures with atomistic models, empirical force fields, and molecular-dynamics (MD) simulation methodologies. After a short introduction and a brief discussion of force fields, this review highlights these recent advances in MD simulation of nucleic acids. The presentation focuses heavily on MD of smaller (6 to 24 mer) DNA duplexes in explicit solution, although some discussion is also provided for other nucleic acid systems ranging from DNA triplexes and quadruplexes to RNA loops and larger structures and finally to protein-nucleic acid complexes. Limited discussion of the use of more efficient methods for simulation of nucleic acids, such as implicit-solvation models, is also provided. This review does not discuss the algorithms, methods, or runtime parameters in any appreciable detail. For more information on these aspects, read the cited literature and consult more detailed discussions of MD simulation methods, such as those of Allen & Tildesley (1) or Leach (2). Also, this review does not detail the large body of work in simulation of protein-nucleic acid complexes.

0066-426X/00/1001-0435$14.00

Annu. Rev. Phys. Chem. 2000.51:435-471. Downloaded from www.annualreviews.org by University of Rochester Library on 05/08/13. For personal use only.

Before ca. 1995, simulations of nucleic acids were plagued by instabilities (for reviews, see 3-5), owing largely to the application (necessitated by limits in computational power) of approximation methods, which lack the rigor required to reasonably represent highly charged systems, such as nucleic acids. Although simulations of protein structures with an explicit representation of solvent were rather robust and reliable by ca. 1995, simulations of nucleic acid structures were characterized by distortion of duplex structures, broken base pairing, and misrepresented sequence-specific fine structures. In simulations of highly charged systems, such as nucleic acids, particularly disastrous behavior is seen with simple truncation of the long-range electrostatic interactions, which leads to localized heating and instability (6). Despite these limitations, the gross failures were often missed because the simulations were too short (



been amplified considerably in the past 5 years, including demonstrations of the environmental dependence of nucleic acid structures, studies of the role of specific water and ion interactions, and the ability to reliably represent nucleic acid fine structures in MD simulations. Now it is routine to run 1- to 10-ns-length simulations of nucleic acids in explicit solution with a proper treatment of the long-range electrostatic interactions. The ease and reliability have increased to the level that some groups are now performing the final stages of NMR refinement of nucleic acid structures with these methods and are including explicit solvent (11-13). These methods are useful to increase the value and reliability of structures refined from NMR data because, without long-range information (such as data from residual dipolar couplings), the resolution of these structures is less than that for high-quality crystal structures.

A number of factors, including the development of new methods and force fields, the ready availability of significant computational resources, and collaboration among a variety of research groups, coalesced in ca. 1995, which led to a large advance in the simulation of nucleic acid structures in solution. When higher-level QM calculations and more significant testing of the force fields were achieved, second-generation force fields appeared that all did a reasonable job of representing nucleic acid structures. Noteworthy are the Cornell et al force field (14) developed by our group at the University of California, San Francisco, and others developed by MacKerell et al (15), Foloppe & MacKerell (16), and MacKerell & Banavali (17) in CHARMM; and Langley at Bristol Myers Squibb (18; discussed in more detail below). Access to computational resources increased significantly, not only with the availability of fast workstations from a variety of vendors, but with access to supercomputer centers in the United States [such as the National Science Foundation (NSF) Supercomputer Centers in Pittsburgh, PA; San Diego, CA; Cornell University (Ithaca, NY); and the University of Illinois (Urbana)] and in the rest of the world. The access to the NSF Supercomputer Centers (and to clusters of workstations) spawned efforts to parallelize the major MD codes, such as AMBER and CHARMM. The early AMBER effort was aided considerably by these supercomputer centers, most notably the Pittsburgh Supercomputing Center (for more information, see 19, 20). In addition to parallelization, the most relevant methodological development for nucleic acid simulation was the development of the fast Ewald methods (21), such as the particle-particle particle-mesh Ewald (22, 23), the particle mesh Ewald [PME (24-26)], and the fast Fourier Poisson Ewald (27) methods. These allow a full Ewald treatment with an efficient algorithm that uses fast Fourier transforms to evaluate the (expensive) reciprocal-space Ewald interactions. The PME methods have become particularly popular for their relative ease of implementation, the free availability of Darden's code, the smoothness of the errors, the efficiency, and the ready adaptation to arbitrarily shaped unit cells and constant-pressure simulation. For a more detailed discussion of methods for treating the long-range electrostatic interactions, see the review by Sagui & Darden (28). Simulations that applied the PME methods and the Cornell et al force field demonstrated, in 1995, that stable nanosecond-length MD trajectories of proteins and nucleic acids could be generated; in contrast, when a simple group-based truncated cutoff of the pair interactions was applied, nucleic acid duplexes rapidly distorted and fell apart (6). This does not imply that the PME methods are necessary to generate stable trajectories; it simply points out the necessity to properly treat the long-range electrostatic interactions and, more specifically, avoid improper treatment of the long-range electrostatic forces.
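The fast Ewald methods mentioned above rest on splitting the Coulomb kernel into a rapidly decaying part, summed directly within a cutoff, and a smooth part, evaluated in reciprocal space. The following Python sketch illustrates only that splitting identity for a single pair distance. The function name, the splitting parameter `beta`, and the sample distances are illustrative assumptions and are not taken from AMBER, CHARMM, or Darden's PME code.

```python
import math

def coulomb_split(r, beta):
    """Split 1/r into a short-range and a smooth long-range part.

    Identity used: 1/r = erfc(beta*r)/r + erf(beta*r)/r.
    The erfc part decays quickly with r and can be summed directly;
    the erf part is smooth and is the piece that Ewald-type methods
    handle in reciprocal space.
    """
    short_range = math.erfc(beta * r) / r
    long_range = math.erf(beta * r) / r
    return short_range, long_range

if __name__ == "__main__":
    beta = 0.35  # assumed splitting parameter in 1/Angstrom (illustrative only)
    for r in (2.0, 5.0, 9.0, 12.0):
        sr, lr = coulomb_split(r, beta)
        print(f"r = {r:5.1f} A  direct = {sr:.3e}  smooth = {lr:.3e}  sum = {sr + lr:.6f}")
```

With a splitting parameter of this size, the direct-space part is already negligible near a typical real-space cutoff, which is why the remaining cost sits in the reciprocal-space sum that the fast Fourier transform accelerates.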

By this time, it was also readily apparent that stable trajectories could be generated for highly charged systems by carefully smoothing the long-range energies or forces to avoid discontinuities and instabilities, which can be done by either switching the potential (i.e. adding a spline to bring the interactions to zero over a finite range, typically 2-4 Å) or by shifting the potential to zero at the cutoff. However, given that the dynamics are largely dictated by the forces, energy switching or shifting does not typically lead to better behavior. For atom-based switching functions (over a short range), atomic fluctuations are completely inhibited. The best behavior with a cutoff is seen when the electrostatic forces are shifted on an atomic basis; this allows stable nanosecond-length simulation of nucleic acids in solution (15, 29, 30). For more information on methods to smooth/shift the energies or forces, see the work by Levitt et al (31) and Steinbach & Brooks (32). Although stable simulation can be observed with highly charged systems and applications of the atom-based force shift method, not all properties are accurately represented. Studies of lipid bilayers that apply these methods have shown that certain water transport properties are misrepresented unless an Ewald treatment is applied (33). Given that the fast Ewald methods are only slightly more computationally demanding than an appropriate force-shifted cutoff in the 12- to 16-Å range, the Ewald methods have started to become the methods of choice. Despite the widespread adoption, these methods do have some limitations. Specifically, Ewald methods apply true periodicity, which may have potentially troubling aspects, particularly for low-dielectric solvents (34, 35). Note that, except where explicitly mentioned otherwise, all of the simulations discussed in detail in this review were performed by using proper methods to treat the long-range electrostatics, which mostly include PME methods and occasionally atom-based force-shifted cutoffs in the 12- to 14-Å range.
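To make the difference between truncating and force-shifting concrete, the sketch below compares the pair force just inside the cutoff for the two treatments. It uses one commonly quoted force-shift form, E(r) = (q_i q_j / r)(1 - r/r_c)^2; whether this matches the exact functions used in the cited studies is an assumption, as are the function names and the approximate unit-conversion constant.

```python
COULOMB = 332.06  # kcal*Angstrom/(mol*e^2); approximate conversion constant (assumed)

def truncated(qi, qj, r, rc):
    """Plain truncation: full Coulomb inside the cutoff, zero outside."""
    if r >= rc:
        return 0.0, 0.0
    qq = COULOMB * qi * qj
    energy = qq / r
    force = qq / r**2  # radial force, -dE/dr
    return energy, force

def force_shifted(qi, qj, r, rc):
    """Atom-based force shift: E = (qq/r) * (1 - r/rc)^2 for r < rc.

    The force equals the Coulomb force minus its value at the cutoff,
    so energy and force both reach zero continuously at r = rc.
    """
    if r >= rc:
        return 0.0, 0.0
    qq = COULOMB * qi * qj
    energy = qq / r * (1.0 - r / rc) ** 2
    force = qq * (1.0 / r**2 - 1.0 / rc**2)  # radial force, -dE/dr
    return energy, force

if __name__ == "__main__":
    rc = 12.0  # cutoff in Angstrom, within the range quoted in the text
    for r in (4.0, 8.0, 11.9, 11.999):
        _, f_trunc = truncated(1.0, -1.0, r, rc)
        _, f_shift = force_shifted(1.0, -1.0, r, rc)
        print(f"r = {r:7.3f}  truncated force = {f_trunc:9.4f}  shifted force = {f_shift:9.4f}")
```

The truncated force jumps from a finite value to zero at the cutoff, which is the discontinuity associated with the heating and instability described above, whereas the shifted force decays smoothly to zero.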

In addition to a significant advance in the representation of nucleic acids in explicit solvent, more approximate methods have also seen significant advances more recently. These include the use of implicit solvent methods, such as the generalized Born methodology [36, 37; for review, see the article by Bashford & Case in this volume (38)]. Recent simulations suggest that these methods can reasonably reproduce the structure and dynamics seen in simulations in explicit solvent, at reduced cost (39, 40; V Tsui & DA Case, J. Am. Chem. Soc.; 40). Although such methods miss important effects from specific ion and water association, the increased efficiency holds promise for the study of much larger nucleic acid structures.


DEVELOPMENT OF NUCLEIC ACID FORCE FIELDS

There are three factors that contribute to the realism of an MD simulation: the force field, the way it is implemented, and the relevance of the sampled states. Whereas the importance of an accurate force field is clear to all, often the importance of its implementation is not fully appreciated. In the early 1980s, when computer limitations inhibited the inclusion of water in molecular mechanics and MD simulations of macromolecules, the force field was often blamed when the structure deviated in simulation from the X-ray crystal structure. Now, as became clear in the early 1990s, the implementation of the force field and the inclusion of solvent and counterions are just as critical to an accurate simulation as the force field parameters themselves. Although it is possible to treat the solvent implicitly (i.e. with an effective dielectric constant that is dependent on distance or a more elaborate model to mimic solvent screening), recently simulations with an explicit representation of solvent and counterions, as well as periodic boundary conditions, have been the predominant method of applying MD simulation to nucleic acids.

The Force Field

A molecular mechanics potential energy function or force field describes the structure and covalent connectivity of the molecules. For macromolecular systems, typically a pairwise potential function is used, as follows:

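$$
U = \sum_{\text{bonds}} k_b (r - r_{eq})^2
  + \sum_{\text{angles}} k_\theta (\theta - \theta_{eq})^2
  + \sum_{\text{dihedrals}} \sum_n \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]
  + \sum_i \sum_{j>i} \left[ \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right) + \frac{q_i q_j}{\epsilon\, r_{ij}} \right]
$$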

As can be seen, molecular mechanics potential-energy functions are broken down into different terms to describe nuclear motion. Whereas QM electronic-structure calculations on molecules are usually solved for the electronic structure with fixed nuclei (the Born-Oppenheimer approximation) and then solved for the wave functions of nuclear motion, by using a description of the effective nuclear potential energy U, molecular-mechanics approaches typically attempt to mimic the first step (derivation of an effective potential for nuclear motion) by empirically (often with the aid of electronic-structure calculations on appropriate molecular fragments) fitting the above equation and then using this potential energy to solve the classical equations of motion for the system.
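As a minimal sketch of how the five terms of the potential energy function above are evaluated, the following Python functions compute each contribution for given internal coordinates. All function names, the unit-conversion constant, and the use of radians are assumptions made for illustration; this is not the implementation found in AMBER or CHARMM, and the parameter values in the usage lines are arbitrary.

```python
import math

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2); approximate conversion constant (assumed)

def bond_energy(k_b, r, r_eq):
    """Harmonic bond stretch: k_b * (r - r_eq)^2."""
    return k_b * (r - r_eq) ** 2

def angle_energy(k_theta, theta, theta_eq):
    """Harmonic angle bend: k_theta * (theta - theta_eq)^2, angles in radians."""
    return k_theta * (theta - theta_eq) ** 2

def torsion_energy(terms, phi):
    """Fourier torsion: sum over (V_n, n, gamma) of V_n/2 * [1 + cos(n*phi - gamma)]."""
    return sum(0.5 * v_n * (1.0 + math.cos(n * phi - gamma)) for v_n, n, gamma in terms)

def nonbonded_energy(a_ij, b_ij, q_i, q_j, r_ij, eps=1.0):
    """Pair term: A/r^12 - B/r^6 plus Coulomb q_i*q_j/(eps*r)."""
    lennard_jones = a_ij / r_ij ** 12 - b_ij / r_ij ** 6
    coulomb = COULOMB * q_i * q_j / (eps * r_ij)
    return lennard_jones + coulomb

def total_energy(bonds, angles, torsions, pairs):
    """Sum all terms; each argument is a list of argument tuples for the functions above."""
    return (sum(bond_energy(*b) for b in bonds)
            + sum(angle_energy(*a) for a in angles)
            + sum(torsion_energy(*t) for t in torsions)
            + sum(nonbonded_energy(*p) for p in pairs))

if __name__ == "__main__":
    # Arbitrary illustrative numbers, not parameters from any published force field.
    u = total_energy(
        bonds=[(300.0, 1.55, 1.53)],
        angles=[(50.0, math.radians(112.0), math.radians(109.5))],
        torsions=[([(0.3, 3, 0.0)], math.radians(60.0))],
        pairs=[(1.0e6, 700.0, 0.1, -0.1, 4.0)],
    )
    print(f"U = {u:.4f} kcal/mol")
```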

The first two terms in the equation involve the potential energy of distortion from equilibrium of bond lengths and bond angles (r_eq and θ_eq). Reproducing known bond lengths and angles (to 0.01 Å and 3°) is usually straightforward unless the fragments are highly strained (i.e. multicyclic), and the data can be obtained from high-resolution crystal structures. On the other hand, the force constants, k_b and k_θ, must come from empirical data on vibrational frequencies on fragments. Although one can reproduce isolated bond stretching of X-H bonds (where X is C, N, or O) with such a model, most of the frequencies in the midrange (500-1500 cm⁻¹) are highly coupled, and a much more elaborate model [such as MM3 (41) or CVFF (42)], involving anharmonicity and cross terms, is required to reproduce such frequencies more than qualitatively. In addition to the standard terms above, the CHARMM force fields (43) include a Urey-Bradley term, which describes 1-3 interactions between atoms bonded to a common atom and leads to an improvement of the calculated frequencies over the harmonic model. Nonetheless, in our opinion, errors in reproducing experimental frequencies in this range are much less significant than errors in the last three terms of the force field equation, and it is more important that the errors are not highly coupled with these more significant terms.

We now consider these last three terms in the force field equation, the torsional, van der Waals, and electrostatic interactions. The torsional energy is a periodic trigonometric function that describes the energy as a function of rotation angle around a given bond (J-K), described by four atoms connected in series (I-J-K-L). Although some earlier models of nucleic acids (e.g. CHARMM19) required the energy of bond rotation around atoms (J-K) to be described in terms of one specific (I-J-K-L) quartet, all other models (and the more recent CHARMM models) describe the rotational energy as a sum of all I-J-K-L quartets. For example, in ethane, each of the nine possible H-C-C-H quartets of atoms has its own torsional energy. Of course, in this case, all of them have the same magnitude of the maximum potential energy (V_n) and phase γ. In addition to terms that describe rotation about bonds, improper torsions are also included to enforce planarity, either as a standard torsion term as above (such as that used in the AMBER force fields) or as a simple harmonic term (as used in the CHARMM force fields). In the majority of force fields, the torsional potential is not parameterized until all the other parts of the force field have been parameterized. This is done by empirically adjusting the torsional potential function to reproduce, as accurately as possible, the experimental (or high-level, ab initio QM-calculated) barrier heights and minimum energies and the energies of fragments of relevance in nucleic acid systems. Note that, in molecular mechanics models, any electrostatic or van der Waals effects from atoms that are bonded together (1-2 interactions) or bonded to a common atom (1-3 interactions) are assumed to be folded into the bond and angle terms and thus are not included in the nonbonded evaluation. However, 1-4 interactions (atoms separated by three bonds) are included, although, in some force fields, they are treated differently from all other nonbonded interactions. There is obviously coupling between the torsional energies and the 1-4 nonbonded interactions, and there is no obvious algorithmic way to deconvolute them. There are two methods in common use for deriving the partial charges on the atoms (q_i), which is a required step for evaluation of the electrostatic term. These methods include fitting the electrostatic potential (ESP) of appropriate fragments [ESP or restrained ESP (RESP) (44)] and empirically adjusting the charges to reproduce a set of ab initio-calculated interaction energies or liquid and solid properties. On the other hand, all force fields in current use evaluate the van der Waals energy empirically to fit nonbonded interactions or the properties of liquids and/or solids.
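The ethane example can be checked with a short enumeration. The sketch below builds the ethane bond list, lists every I-J-K-L quartet about the C-C bond, and sums an identical threefold torsional term over all of them for idealized staggered and eclipsed geometries. The atom labels, function names, and the per-quartet V_n value are illustrative assumptions rather than parameters of any particular force field. The end atoms (I, L) of each quartet are also the 1-4 pairs discussed above.

```python
import math
from itertools import product

# Ethane connectivity: one C-C bond and three hydrogens on each carbon.
BONDS = [("C1", "C2"),
         ("C1", "H1"), ("C1", "H2"), ("C1", "H3"),
         ("C2", "H4"), ("C2", "H5"), ("C2", "H6")]

def neighbors(bonds):
    """Build an adjacency table from a list of bonded atom pairs."""
    table = {}
    for a, b in bonds:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table

def quartets_about(bonds, j, k):
    """All I-J-K-L quartets describing rotation about the J-K bond."""
    nb = neighbors(bonds)
    return [(i, j, k, l)
            for i, l in product(sorted(nb[j] - {k}), sorted(nb[k] - {j}))
            if i != l]

def torsional_energy(dihedrals_deg, v_n, n=3, gamma_deg=0.0):
    """Sum V_n/2 * [1 + cos(n*phi - gamma)] over a list of dihedral angles in degrees."""
    return sum(0.5 * v_n * (1.0 + math.cos(math.radians(n * phi - gamma_deg)))
               for phi in dihedrals_deg)

if __name__ == "__main__":
    quartets = quartets_about(BONDS, "C1", "C2")
    print(len(quartets), "H-C-C-H quartets")
    v3 = 0.3  # assumed per-quartet V_3 in kcal/mol (illustrative only)
    staggered = [60.0, 180.0, 300.0] * 3  # idealized H-C-C-H dihedrals
    eclipsed = [0.0, 120.0, 240.0] * 3
    barrier = torsional_energy(eclipsed, v3) - torsional_energy(staggered, v3)
    print(f"rotational barrier from torsions alone = {barrier:.2f} kcal/mol")
```

Because the nine terms are identical, the torsional contribution to the barrier is simply nine times the single-quartet V_n, which is why a generic per-bond parameter is often divided among the quartets when it is assigned.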

We briefly describe the approach to derive torsional potentials in the force field by Cornell et al (14) and contrast this to those for nucleic acids presented by MacKerell et al (15), Foloppe & MacKerell (16), and Langley (18), which are the only examples in which the force field parameter derivations have been described in the literature in sufficient detail to understand how this process was carried out. Cornell et al (14) derived a force field for proteins, nucleic acids, and organic molecules by using the build-up principle, in which one used van der Waals parameters from liquid simulations, partial charges from electrostatic potential calculations [RESP charges (44)], and a minimal set of torsional potentials. It was assumed that van der Waals parameters (dispersion 1/r^6 and exchange repulsion 1/r^12) are much less dependent on the subtleties of electron distribution. Therefore, one can assume that the van der Waals parameters for all atoms of a given hybridization (i.e. Csp2) are equal, with the exception of hydrogen, because hydrogen has no inner shell of electrons and thus its electron distribution is much more sensitive to chemical bonding. The RESP charges are derived for either molecules or fragments of the polymers, using a 6-31G* basis set, to ensure balance with the effective two-body models [TIP3P, SPC, SPC/E, F3C (45, 46)] used for water liquid. Finally, the torsional potentials are derived, beginning with the simplest molecule, ethane, and continuing through such molecules as tetrahydrofuran and dimethyl phosphate, until reaching nucleosides and nucleotides. As noted above, a minimal approach to torsional potentials is used, whereby the same single torsional potential is used for the rotation around Csp3-Csp3 bonds in ethane as the Csp3-Csp3 bonds in nucleosides. However, when bonding/backbonding (i.e. anomeric) effects are well defined, owing to very different electronegativities of X and Y in X-C-C-Y, additional Fourier terms are needed to correctly represent these.
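Because van der Waals parameters are assigned per atom type, the pair coefficients A_ij and B_ij of the 1/r^12 and 1/r^6 terms have to be built from them. The sketch below assumes one common convention (per-atom radii R* that add, well depths combined by a geometric mean); the convention, the function names, and the numerical values are assumptions for illustration and are not quoted from the force field papers discussed here.

```python
import math

def lj_coefficients(r_star_i, eps_i, r_star_j, eps_j):
    """Return (A_ij, B_ij) for E = A/r^12 - B/r^6 under assumed combining rules."""
    r_min = r_star_i + r_star_j      # pair separation at the energy minimum
    eps = math.sqrt(eps_i * eps_j)   # geometric-mean well depth
    return eps * r_min ** 12, 2.0 * eps * r_min ** 6

def lj_energy(a_ij, b_ij, r):
    """Exchange repulsion (1/r^12) minus dispersion (1/r^6)."""
    return a_ij / r ** 12 - b_ij / r ** 6

if __name__ == "__main__":
    # Illustrative per-atom values (Angstrom, kcal/mol); not published parameters.
    a, b = lj_coefficients(1.9, 0.1, 1.9, 0.1)
    for r in (3.4, 3.8, 4.2, 6.0):
        print(f"r = {r:.1f} A  E = {lj_energy(a, b, r):+.4f} kcal/mol")
```

With this construction the minimum of the pair energy lies at the sum of the two radii and has a depth equal to the combined well depth, so two numbers per atom type fix the whole nonbonded van der Waals table.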

The approaches of MacKerell and colleagues and Langley differ in their choices of partial electrostatic charges q_i, with MacKerell et al deriving theirs empirically and Langley using the Cornell et al values. Langley used the CHARMM23 values directly for the bond, angle, and nonbonded parameters. Where both methods mainly differ from that of Cornell et al is in their use of a considerably larger number of atom types to include more independent torsional parameters. For example, Langley began with CHARMM23 torsional values and the results of torsional energies that were derived from high-level ab initio calculations, but iteratively adjusted these data based on the results of many aqueous simulations of DNA and RNA fragments, to more accurately reproduce crystal structure parameters for DNA helices and to ensure B → A DNA transitions when they were known to happen empirically in response to salt and counterions. MacKerell et al made their empirical adjustments in CHARMM23 on a number of data (not including explicit DNA simulations), which did not include nucleoside torsional energies and which led to this force field making the C2'-endo torsional energy higher than C3'-endo torsional energy and the preference of A- over B-DNA for double helical DNA in aqueous solution. The Cornell et al force field led to a C2'-endo nucleoside that was lower in energy than a C3'-endo nucleoside, without any special parameter adjustments. A key factor is that the Osp3-Csp3-Csp3-Osp3 V2 torsional parameter, derived to fit the O-C-C-O gauche/trans energy difference in 1,2 dihydroxyethane, leads to a stabilization of C2'-endo puckers in deoxy sugars.

How does one extend the nucleic acid force field models to other fragments, for example base or sugar analogs? The advantage of RESP charges and a minimal torsional model is their ease of use, which is illustrated in studies by Shields et al (47) for base analogs, where one merely carries out a single QM calculation at the 6-31G* level and derives the RESP charges with this wave function. Further support for the accuracy of models with RESP charges is the study by Hobza et al (48), who showed that the charges and van der Waals parameters of Cornell et al better reproduced base-base hydrogen bonding and stacking than did other models, such as CHARMM23 and OPLS (49), as well as lower-level QM calculations, despite the fact that the latter were adjusted to fit various properties, including hydrogen bond energies. Thus, the RESP approach to charge derivation appears quite robust and easily extendable.

In a later force field, Foloppe & MacKerell (50) used ab initio calculations on deoxynucleosides and parameterized the torsional potential of the force field to reproduce the relative C2'-endo/C3'-endo energy difference of deoxyribosylcytosine (dC), deoxyribosylthymine (dT), deoxyribosyladenine (dA), and deoxyribosylguanine (dG). It is interesting that dC was lower in energy in C3'-endo when γ = g+ (50). Cheatham et al (51) independently carried out ab initio calculations on these systems and also carried out ab initio calculations on the C2'-endo/C3'-endo energy difference for the nucleoside analog, where the 5'-OH was changed to H. In this case, all nucleosides with 5'-OH → 5'-H favored C2'-endo by comparable energies, supporting the idea that the more favorable Cl-H to O5' interaction in cytosine than thymine in C3'-endo vs C2'-endo causes the C3'-endo preference in dC. It remains to be determined why NMR studies on nucleosides apparently do not reflect this preference, in that they apparently consistently favor C2'-endo by 0.3 kcal/mol for all nucleosides.

Cheatham et al also presented a subtly modified version of the Cornell et al force field for nucleic acids (51), in which only three torsional potentials were altered, C(sp3)-O(sp3)-C(sp3)-N(sp2) (C4'-O1'-C1'-N9 or N1), O(sp3)-C(sp3)-N(sp2)-C(sp2) (O1'-C1'-N9 or N1-C8 or C6), and O(sp3)-C(sp3)-C(sp3)-O(sp3) (O1'-C4'-C3'-O3' and O5'-C5'-C4'-O1'). This modification was inspired by the fact that an anomeric gauche preference for C(sp3)-O(sp3)-C(sp3)-N(sp2) was expected, which was not included in the example from Cornell et al, and by the fact that the ab initio model system for the derivation of O1'-C1'-N9(N1)-C8(C6) was smaller than desirable. The final torsion (O-C-C-O) was slightly changed from that of Cornell et al to ensure a C2'-endo preference for the nucleotides. This set of parameters, parm98, led to better reproduction of helical repeat, C2'-endo sugar pucker phase, and χ angle for average DNA helices, but these parameters did not support A-form structure in 85% ethanol and for d[ACCCGCGGGT]2 in the presence of Co(NH₃)₆³⁺, as found experimentally and by Cornell et al, which suggests that parm98 is not an unequivocal improvement over the approach of Cornell et al, particularly for studying distortions and conformational transitions in B-DNA helices.

In this regard, one of the difficulties in assessing the results of MD simulations in solution is that experimental X-ray structures likely include considerable crystal-packing effects, and NMR-derived structures are not sufficiently accurate to give fine sequence-specific details for such structures. MD simulations in the crystal for nanoseconds appear to be in excellent agreement with experiments, but such a timescale for crystal simulations may be insufficient to reveal flaws in the force field. Thus, other than gross flaws, it is challenging to evaluate force fields for complex nucleic acid systems, and this challenge is compounded by the observation of metastable structures in MD simulation over multinanosecond time frames.

What are the perspectives for future improvements in nucleic acid force fields? All of the current force fields are effective two-body additive, and PA Kollman and P Cieplak (unpublished data) are developing a protein/nucleic acid force field that uses a higher-level ab initio model to derive the RESP charges and includes lone pairs and polarization. Other groups are also developing more advanced models of protein force fields. One should realize that the addition of polarization effects is only the first step in going beyond the simple nonbonded (electrostatic + van der Waals) terms of the standard molecular mechanics force field equation; it is not clear how to explicitly include charge transfer terms in a classical molecular mechanical approach.

STABLE AND DYNAMIC SIMULATION OF NUCLEIC ACIDS IN SOLUTION

By ca. 1995, stable nanosecond-length MD simulations of DNA duplexes and other nucleic acid structures in solution were becoming routine. As mentioned above, this progress related to increases in computer power, more reliable force fields, and the use of either Ewald methods or atom-based force shifting to treat electrostatic interactions. The first major tests of these methods and force fields involved crystal simulation followed shortly after by solution simulations. Key issues addressed were whether the experimentally determined structure could be maintained and whether collective and atomic motions were inhibited.

Crystal Simulations

Darden and coworkers, applying the standard group-based truncated cutoff in AMBER (19), had tremendous difficulty in maintaining the experimental structure in simulations of nucleic acids in solution, which led to considerable effort to devise a better method and, subsequently, publication of the PME method (24), a generalization of the particle-particle PME method of Hockney & Eastwood (22). After development of the methods, Darden et al investigated the stability of a variety of nucleic acid crystal structures, including the B-DNA model of d[CGCGAATTCGCG]2 (52), a Z-DNA hexamer crystal (53), and high-resolution RNA dinucleotide structures (54). The simulation of the B-DNA crystal, which included four complete dodecamers in the periodic cell along with ions and solvent, was run for 2.2 ns, one of the longest nucleic acid simulations of the time. During the MD, the structure remained close to the crystal structure [within 1.2 Å rms deviation (rmsd) for all of the heavy atoms]. Sequence-specific structure was well maintained and properly reproduced by the force field, including the expected sequence-specific narrowing of the central AATT region. Dynamics were apparently not inhibited too severely because atomic positional fluctuations that were calculated from the simulation correlated with the thermal B-factors, although they were 25% lower than expected. Most motion was seen in the phosphates, followed by the sugars, and then the bases, in line with expectations from the thermal-mobility data. Despite the tight packing in the crystal, dynamics were clearly evident during the simulation, including transient crankshaft transitions in α and γ and BI-to-BII transitions in the ε and ζ backbone angles.
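Two quantities recur in these comparisons: the rms deviation from the crystal structure and the atomic positional fluctuations that are compared with thermal B-factors. The sketch below shows how each could be computed from coordinates. It assumes that the structures are already superimposed (no fitting is performed) and uses the standard isotropic relation B = (8π²/3)⟨Δr²⟩; the function names are hypothetical, and that the cited studies used exactly this procedure is an assumption.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((xa - xb) ** 2
                for a, b in zip(coords_a, coords_b)
                for xa, xb in zip(a, b))
    return math.sqrt(total / len(coords_a))

def mean_square_fluctuation(frames):
    """<|r - <r>|^2> for one atom over a list of (x, y, z) trajectory frames."""
    n = len(frames)
    mean = [sum(frame[d] for frame in frames) / n for d in range(3)]
    return sum(sum((frame[d] - mean[d]) ** 2 for d in range(3))
               for frame in frames) / n

def b_factor(msf):
    """Isotropic B-factor (Angstrom^2) from a mean-square fluctuation (Angstrom^2)."""
    return 8.0 * math.pi ** 2 / 3.0 * msf

if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    snapshot = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0)]
    print(f"rmsd = {rmsd(crystal, snapshot):.3f} A")
    frames = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0)]
    msf = mean_square_fluctuation(frames)
    print(f"<dr^2> = {msf:.3f} A^2 -> B = {b_factor(msf):.2f} A^2")
```

Converting simulated fluctuations to B-factors in this way puts them on the same scale as the crystallographic values, so a systematic shortfall such as the one reported above shows up directly as lower computed B-factors.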

Crystal simulations (minimization and short 20-ps dynamics simulations) were also performed by MacKerell and coworkers (15) on various base pairs and monomers during the development of the CHARMM all-22 force field. These simulations were useful to check for force field errors and also gave reasonable estimates of the experimental heats of sublimation. To test the force field on larger structures in solution, short (100-ps) simulations of GpC, a Z-DNA hexamer, and the B-DNA decamer d[CCAACGTTGG]2 were performed with crystal symmetry applied. In these simulations, the structures were well maintained. Similar to the previously discussed crystal simulations, the thermal mobility was lower in the simulation than that seen in the crystallography experiments. Although the authors attribute this to the absence of lattice disorder in the simulation and/or incomplete sampling of the relevant conformational substates, it may also result from inhibited sampling by the crystal packing and boundary conditions.

Although the demonstration of stable trajectories is a necessary first step in the validation of the force fields and simulation protocol, it is not sufficient. Small deviations from the crystal structure could also result if the sampled structure is metastable on the MD simulation timescale (hiding force field deficiencies) and/or if the applied methods inhibit fluctuations (such as may occur with atom-based energy switching over small ranges). Crystal simulations present particular difficulties in conformational sampling owing to the high density of packing, which may inhibit fluctuations. Given that, in general, simulations tend to overemphasize the motions compared with experiments (55), this effect could be an issue in the early crystal simulations. A major difficulty in preparing crystal simulations is building models at the correct density, because the simulations are most often run at a constant volume. If the crystal is slightly overpacked, high pressures may result, and the motion may be inhibited.


More recently, Bevan et al (57) have reinvestigated the simulation of nucleic acid crystals, starting with the high-resolution structure of d[CCAACGTTGG]2. These simulations were run for a significantly longer period of time (25 ns) with a much more careful building and equilibration (including constant-pressure MD) of the initial crystal model. In these simulations, both with a single-unit cell containing two duplexes and a double-unit cell containing four duplexes, the crystal structure was well maintained. Note, however, that in this relatively short simulation (compared with the millisecond or longer time-scale averages in experiments), the various duplex structures are not identical in each simulation (i.e. the dynamics of each are independent). Compared with free dynamics in solution (without the imposition of crystal boundary conditions), the sampling is also somewhat inhibited, which implies that fully equilibrating the crystal simulations takes longer when crystal symmetry and boundary conditions are applied. Despite these differences among the duplexes, low rmsd values are obtained for all of them in the simulation. The rmsd values for 5-ns average structures to the crystal structure are in the range of 1.0-1.48 Å.

Stable Simulations of Nucleic Acids in Solution

Nucleic acid structures are strongly influenced by the solvent and ionic environment. This is most readily apparent with DNA duplexes, which can adopt a variety of canonical (A, B, C, Z) and aggregated (P) forms, depending on the water activity, ionic concentration, and ionic identity (58–60). Crystal packing can also modulate duplex structure, as has been observed in crystal structures of the same sequence that adopt different structures depending on the crystal form (61–63). Sequences that are otherwise B-DNA under solution conditions have been known to crystallize into A-DNA conformations [even while maintaining a disordered, nonrefinable B-DNA molecule floating in the crystal's solvent channel (64)]. Given the potential for artifacts from crystal packing, the limited resolution of NMR-refined structures of nucleic acids at the time, and the fact that DNA under physiological conditions more likely resembles solution conditions than crystal conditions, there was considerable interest in investigating MD of nucleic acids under solution conditions. A goal was to move beyond simply demonstrating stable simulation of canonical B-DNA or A-DNA structures to determine whether the methods could accurately represent sequence-dependent fine structures. An added goal was to determine whether the methods could accurately represent specific water and ion association (as is discussed in more detail below). Critical at the outset was the demonstration that the methods, while generating stable trajectories, did not inhibit motion (as was seen to a limited degree in the crystal simulations). We (6) and others (65–68) performed solution-phase simulations of DNA duplexes, triplexes, and RNA hairpin loops in ca. 1995, applying Ewald methods. In these simulations, the nucleic acid structures were stable over nanosecond-length simulations. The triplex simulations suggested a key role for water in the stabilization of the third strand via coordination to the bases in the first and third strands (66, 67); specific ion binding in the major groove was also observed (both of which are discussed in greater detail below). Given that the local structure of nucleic acids is modulated in a sequence-specific manner by the specific interaction of ions and water, simpler implicit models of ionic and hydration effects may not be sufficient to capture the sequence-dependent fine structure of nucleic acids. The fact that specific water and ion interactions are important largely justifies the considerably greater cost of simulations that include explicit solvent and counterions. However, before experimentalists are willing to accept our results and, ultimately, predictions, it is necessary to push the methods in a variety of directions to show their wide range of applicability and their agreement with experiments.

As a step towards validating the methods and one particular force field, Young et al (69, 70) performed extensive simulation (>5 ns) of the benchmark d[CGCGAATTCGCG]2 DNA duplex with varied Na⁺ ion distributions in solution, using PME methods and the Cornell et al force field (69, 70). These simulations (some of the longest of the time) suggested that this force field did a surprisingly good job of representing the B-DNA structure. As the authors pointed out, this is significant because earlier force fields tended to stabilize around unrealistic structures that were intermediate between A- and B-forms of DNA. Over the entire 5-ns simulation, the all-atom rmsd values for the average structure from the MD were 2.71 Å from canonical B-DNA and 2.88 Å from the crystal structure of this sequence. The observation that the structure is closer to canonical B-DNA than to the crystal structure does not imply that this force field fails to represent the sequence-dependent structure accurately, in part because the simulation did not apply the crystal boundary conditions. This particular crystal structure, influenced by crystal-packing artifacts, does not have the expected palindromic symmetry that is more accurately reproduced in the MD simulation. Given this, a better comparison might be to the NMR data of Lane and coworkers (71), for which good agreement with the few reported helicoidal parameters is observed, except that the MD model is slightly underwound compared with experiments. Similar deviations from the crystal structure are seen in simulation of d[CCAACGTTGG]2 (72), where, in solution [as is expected (73)], there is a significant bend into the major groove at the TpG steps, in contrast to the crystal structure, which has a bend into the minor groove at this step (56).
This sequence was also significantly undertwisted, with average sugar pucker and χ angles that are also too low; these are known deficiencies of the Cornell et al force field that have been improved with more recent versions of the force field (51). Otherwise, backbone angles are in excellent agreement with crystal values. Despite some differences between the crystal and solution structures, certain sequence-dependent structural features are shared, and these are reproduced in the simulation. In both the d[CGCGAATTCGCG]2 and d[CCAACGTTGG]2 sequences, there is a narrowing of the minor groove at the central region of the duplex and, in the former sequence, characteristic junction bending at the central A-tract. Also, there is typically fairly good agreement with some of the helicoidal parameters, such as rise, shift, and slide. A good example of sequence-specific structure reproduction was seen in simulations of the A-RNA duplex r[CCAACGUUGG]2, started from a canonical A-form structure (74). During this simulation, the central CpG step developed an anomalous rise and low helical twist, which correlates with a positive shifting and negative slide of the central guanines to increase stacking. Although at first we were worried that this change was an artifact of the simulation, this interstrand guanine stacking has been observed in all of the tetragonal octamer A-DNA structures (75) at a central CpG step. Because what was thought to be an artifact of crystal packing was observed spontaneously in solution simulations, this interstrand stacking is likely a real, contextual sequence-dependent structural effect.

In addition to demonstrating reasonable sequence-specific structure, Young et al (70) attempted to further validate the force field by comparing the distributions of angle and helicoidal values that were calculated from configurations sampled during the MD with those tabulated from an analysis of A-DNA and B-DNA structures in the Nucleic Acid Databank (76). When compared in this way, the distributions of values from the simulation overlap well with those seen for B-DNA structures, except that the width of the distributions is, in general, larger in the simulation. This difference is likely to be caused by thermal motion over short timescales (which is averaged away in the crystal structures) and serves to emphasize that these structures are dynamic on a subnanosecond timescale. Although the distributions, in general, agree well, there are some differences owing to the sequence-specific structure maintained during the MD simulation. These differences are expected and relate to bending of the duplex and groove widths, both of which agree with experiments. More recently, the data from the database of experimental structures have been used to further refine the force fields. Langley (18), using a hybrid of the Cornell et al charges (14), MacKerell base parameters (15), and other parameters from QUANTA 3.2 (77), has iteratively modified the torsional parameters in simulations of DNA and RNA under a variety of conditions to match the expected experimental values. This force field gives an excellent representation of a variety of nucleic acid structures in solution and under a variety of other conditions (as is discussed below). Currently, work is underway in various laboratories to further validate these newer force fields.
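
A comparison of this kind, between a distribution sampled in MD and one tabulated from experimental structures, can be sketched in a few lines of Python. The twist values below are invented purely for illustration, and the overlap coefficient is only one of several reasonable similarity measures; it is not claimed to be the measure used in the cited analysis.

```python
from statistics import mean, pstdev


def normalized_histogram(values, lo, hi, n_bins):
    """Fraction of in-range values falling in each of n_bins equal bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values inside [lo, hi)")
    return [c / total for c in counts]


def overlap(hist_a, hist_b):
    """Overlap coefficient: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


# Invented helical twist values in degrees, for illustration only.
md_twist = [26.0, 29.0, 31.0, 33.0, 34.0, 36.0, 38.0, 41.0]
db_twist = [32.0, 33.0, 34.0, 35.0, 36.0, 36.0, 37.0, 38.0]

h_md = normalized_histogram(md_twist, 20.0, 50.0, 6)
h_db = normalized_histogram(db_twist, 20.0, 50.0, 6)
print(mean(md_twist), pstdev(md_twist))  # centre and width of the MD sample
print(mean(db_twist), pstdev(db_twist))  # centre and width of the database sample
print(overlap(h_md, h_db))
```

Reporting the width alongside the overlap makes the point in the text explicit: two distributions can overlap well while the simulated one remains broader.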

Dynamics of Nucleic Acids in Solution

A major worry with the MD simulations of nucleic acids in the ca. 1995 time frame was that, suddenly, with a proper treatment of the long-range electrostatic forces, the DNA was stable in the simulations, whereas previously it distorted rapidly. This worry regarding the overstabilization of DNA, as was somewhat realized in the crystal simulations with the lower-than-expected fluctuations, led to considerable effort to demonstrate that the MD simulations displayed not only a stable structure but also the expected dynamics. As touched on above, considerable dynamics are seen in the MD simulations. With the Cornell et al force field, for example, over the course of an equilibrated nanosecond-length simulation, the instantaneous structures fluctuate within 1.5 Å of the average structure. With all of the recent force fields, considerable motion is evident throughout the DNA, with correlated changes in backbone dihedral angles, sugar repuckering, and even, in some cases, base pair opening and closing. The backbone transitions in DNA, including BI-to-BII backbone transitions (correlated transitions in ε and ζ from trans, gauche− to gauche−, trans) occurring on a timescale from 5 to >500 ps, and sugar repuckering are evident throughout the sequence and simulations (70, 72). Not all sequences and structures show the same dynamics. For example, B-DNA structures are considerably more flexible than A-DNA (74) or Z-DNA structures. Also, the force fields reproduce the trend that pyrimidine repuckering is more frequent than purine repuckering, and purines tend to adopt higher sugar pucker phase values (73, 78, 79). More motion is also observed at the termini, with increased sugar repuckering rates and larger atomic positional fluctuations evident. Overall, the fluctuations of base pair roll, tilt, and twist that we have observed in simulations (72, 74) follow the expected trends (with rolling preferred over tilting) except that they are slightly larger in magnitude (roll 7.5–8.7°, tilt 4.6–5.7°, twist 4.9–5.3°) than expected (80–82). The observation of enhanced motion is gratifying because we had worried that the imposition of periodic boundary conditions and application of Ewald methods led to damped motion; this worry was further underscored by the later-retracted paper stating that Ewald simulations damped translational motion (83).
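
The BI/BII bookkeeping described above can be sketched as follows. This is a minimal illustration that assumes the commonly used sign convention on the difference ε − ζ (negative for BI, positive for BII); the function names and the short torsion series are hypothetical and not taken from the cited simulations.

```python
def wrap_degrees(angle):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def backbone_state(epsilon, zeta):
    """Label a phosphate linkage from its epsilon and zeta torsions (degrees).

    Assumed convention: epsilon - zeta < 0 is BI (epsilon trans, zeta gauche-),
    otherwise BII (epsilon gauche-, zeta trans).
    """
    return "BI" if wrap_degrees(epsilon - zeta) < 0.0 else "BII"


def count_transitions(states):
    """Number of frame-to-frame changes of state in a time series."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


# Hypothetical (epsilon, zeta) pairs for one linkage, one pair per saved frame.
series = [(190.0, 270.0), (185.0, 265.0), (270.0, 180.0), (265.0, 175.0), (190.0, 270.0)]
states = [backbone_state(eps, zeta) for eps, zeta in series]
print(states)
print(count_transitions(states))
```

Dividing the trajectory length by the number of transitions gives a crude estimate of the transition timescale, which depends on how often frames are saved.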

Despite the wealth of data on the rates of backbone transition and sugar repuckering from the simulations in the subnanosecond to 1- to 10-ns time range, very little direct comparison to experiments has been performed because few experimental data are available. New data from the dynamic Stokes shift of specially designed fluorescent base-pair analogs show that experiments can give insight into the timescale of motions of internal bases, suggesting, in this case, motions in the 300-ps and 13-ns time ranges (84). The former is in the range that is accessible in current MD simulations, with simulations in the 100-ns time range expected in the next few years. Also, MD simulation appears to be able to represent transiently some of the longer-timescale processes, such as base pair opening (as is discussed below). In addition to overall motion of the nucleic acid, experiments can give detailed information about the timescale of solvent interaction, such as by measuring water hydrogen-bonding interactions to nucleic acid bases by NMR. Only recently have people started to compare these specific water hydration lifetimes with experiments (85). In addition to comparisons with the experimental data, there has also been recent work that aims to extract the essential modes of motion from explicit all-atom simulations to improve mesoscale simulation models. For example, Bruant et al (86) have used nanosecond-length MD simulation to aid this effort and effectively allow decisions as to which degrees of freedom can be frozen without compromising the ability to represent the structure and dynamics.
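
As a rough illustration of how a hydration lifetime might be extracted from a trajectory, the Python sketch below measures uninterrupted residence times from a per-frame bound/unbound record. The record, the 10-ps frame spacing, and the function name are assumptions made for the example; in practice the result depends strongly on the contact criterion used to decide what counts as bound.

```python
def residence_times(bound, dt_ps):
    """Durations (ps) of each uninterrupted run of bound frames.

    bound: per-frame booleans, True when the water satisfies the chosen
    contact criterion (for example a hydrogen-bond distance cutoff).
    dt_ps: time between saved frames in picoseconds.
    """
    times = []
    run = 0
    for is_bound in bound:
        if is_bound:
            run += 1
        elif run:
            times.append(run * dt_ps)
            run = 0
    if run:  # a visit still in progress when the trajectory ends
        times.append(run * dt_ps)
    return times


# Hypothetical record for one water molecule at one base site.
record = [True, True, True, False, True, False, False, True, True]
visits = residence_times(record, dt_ps=10.0)
print(visits)
print(sum(visits) / len(visits))  # mean residence time in ps
```

Visits cut off by the end of the trajectory are counted at their observed length here, which biases the mean downward for long-lived waters; that choice is one of the definitional subtleties that make direct comparison with NMR lifetimes difficult.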


Dynamics of Nucleic Acids in Solution: A ↔ B Transitions

Given the early worry that a complete treatment of the electrostatics (Ewald) and the use of periodic boundary conditions (i.e. pseudo crystal conditions) might inhibit motion, and also to check the reliability of the Cornell et al force field (14), we began a study of A-DNA duplexes of d[CCAACGTTGG]2 to determine whether A-DNA would be (meta)stabilized in MD simulations. To our surprise, spontaneous A-DNA-to-B-DNA transitions were seen in 500 ps (72). This result was gratifying because not only was the B-form stabilized under low-salt solution conditions, as was expected, but these simulations suggested that the barriers to interconversion are not too high and can be surmounted in nanosecond-length simulations. This finding opened the door for detailed study of the molecular processes that stabilize the A and B forms of DNA and also added credibility to the results obtained in the simulation of DNA duplexes. Since this time, all of the duplex DNA sequences studied to date with the Cornell et al force field under physiological (low)-salt conditions [except perhaps the TATA box binding protein's cognate DNA sequence, which retains some A-DNA character (87), or very G-rich sequences, which display A-like helicoidal parameters but a B-DNA backbone (18, 88)] show spontaneous A-DNA-to-B-DNA transitions. Conversion from A-DNA to B-DNA is also seen for DNA hairpins, despite the extra constraints of the loop. Simulations by Miller & Kollman (89) showed a rapid A-DNA-to-B-DNA transition in the stem part of the hairpin sequence when the hairpin 5′-GGACUUCGGUCC-3′ is changed to its fully deoxy form, d[GGACUUCGGUCC], with either deoxyuracil or thymine (in place of U), consistent with NMR data (90). Triplex DNA is also able to undergo A-DNA-to-B-DNA transitions (91); three 1.5-ns simulations of triplex d(TAT) DNA, starting in an A-form, a B-form, and a P-type structure derived from crystallography, all converged to a common B-form structure in solution. Similar results were also seen with a PNA-DNA-PNA triple helix (in which PNA represents polyamide-modified nucleic acid), where three very different starting structures all rapidly converged to a common structure that resembled the crystal structure in many ways (47).

In the simulations, the conversion from an A-form structure to a B-form structure occurs very rapidly, in the range of 100–1000 ps. If we focus on the variables that change most with this conversion (including the sugar pucker, minor groove width, base pair inclination, x-displacement from the helical axis, and the end-to-end length), it does not appear that a single variable drives the A-to-B transition. Data based on analysis of a series of transitions suggest a generalized flow from A to B in which no single mechanism can describe the complex process that involves many individual and concerted conformational changes. It is known from simulation, however, that inhibiting sugar repuckering will inhibit the change from A-DNA to B-DNA, and that B-DNA-to-A-DNA transitions can be forced by inducing C3′-endo sugar puckers (92, 93). In converting from B-DNA to A-DNA, when the puckers are forced into C3′-endo, certain indicators convert to A-DNA values rapidly, including the x-displacement and helical twist, whereas the rise between base pairs and base pair inclination take significantly longer. We suggested that the sugar pucker alone does not drive the transition from B-DNA to A-DNA and that other stabilizing elements are necessary (such as salt interactions and hydration in the major groove). Others have suggested not only that the sugar puckers need to convert to C3′-endo but that the constraints need to be removed to add back in some dynamics to facilitate complete conversion to A-DNA (93). It should be noted, however, that the precise balance between A-DNA and B-DNA with the Cornell et al force field may not be fully representative; moreover, deficiencies in the sugar repuckering (such as C3′-endo sugar pucker states that are very short lived) may muddle this simple picture.
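
Because the sugar pucker is followed so closely in these analyses, a short Python sketch of how a pucker family can be assigned from the five endocyclic ring torsions may be useful. It uses the standard pseudorotation phase formula of Altona and Sundaralingam; the phase windows for C3′-endo (0–36°) and C2′-endo (144–180°) are the conventional ones, and the helper that generates ideal torsions exists only to exercise the code. None of the names come from the simulations discussed here.

```python
import math

SIN_SUM = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Pseudorotation phase angle P in degrees, in [0, 360).

    nu0..nu4 are the ring torsions in degrees (nu0 about O4'-C1',
    nu1 about C1'-C2', nu2 about C2'-C3', nu3 about C3'-C4', nu4 about C4'-O4').
    atan2 handles the case nu2 < 0, where 180 degrees must be added.
    """
    y = (nu4 + nu1) - (nu3 + nu0)
    x = 2.0 * nu2 * SIN_SUM
    return math.degrees(math.atan2(y, x)) % 360.0


def pucker_family(phase):
    """Coarse label using the conventional phase windows."""
    if 0.0 <= phase <= 36.0:
        return "C3'-endo"
    if 144.0 <= phase <= 180.0:
        return "C2'-endo"
    return "other"


def ideal_torsions(phase, amplitude):
    """Ring torsions nu0..nu4 of an ideal pseudorotating ring (test helper)."""
    return [
        amplitude * math.cos(math.radians(phase + 144.0 * (j - 2)))
        for j in range(5)
    ]


for target in (18.0, 162.0):
    p = pseudorotation_phase(*ideal_torsions(target, 38.0))
    print(round(p, 3), pucker_family(p))
```

Applied frame by frame, such a label gives the C3′-endo and C2′-endo populations and repuckering rates referred to in the text.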

All of the results discussed above were obtained with a single force field, the Cornell et al (14) force field developed by our group (or a logical extension of this force field, as for PNA modeling). However, not all force fields are equivalent. Simulations with the CHARMM 23 all-hydrogen parameter set display stable A-DNA structures and spontaneous B-DNA-to-A-DNA transitions in



Environmental Dependence of DNA Structure

In addition to displaying spontaneous A-DNA-to-B-DNA transitions of DNA duplexes under conditions expected to stabilize B-DNA, a further test of the force fields is the ability to see the reverse: B-DNA-to-A-DNA transitions with the same force field and simulation protocol under conditions expected to stabilize A-DNA. The first tests of the force field as discussed above, specifically stabilization of A-RNA (74) and the A-form of phosphoramidate-modified DNA (97), were not particularly surprising. With RNA, it was expected that the addition of the more electronegative O2′-hydroxyl, the subsequent extra torsion angles, and the ability of the O2′-hydroxyl to hydrogen bond to nearby groups would lead to stabilization of the C3′-endo sugar pucker. For phosphoramidate-modified DNA, C3′-endo sugar puckers were stabilized by the absence of a V2 torsion term for the backbone-to-sugar N-C-C-O torsions, corresponding to the O-C-C-O torsion present in unmodified DNA to represent the gauche tendency of such angles (92, 97, 98). The absence of this torsion is justified by the observation that 3′-NH₂ has the same percentage of C3′-endo sugar puckers as 3′-H (in contrast to 3′-OH). In these cases, a subtle modification to the force field, caused by a chemical change, is the driving force behind the stabilization of the A-form geometry. A far more convincing demonstration of the force field and simulation protocol's ability to accurately represent the structure and dynamics would be to demonstrate stabilization of A-DNA simply by changes in the local environment without any force field modifications. Examples include both general mechanisms, such as the reduction of water activity by high-salt conditions or the addition of ethanol, and specific mechanisms, such as conversion from B-DNA to A-DNA upon binding of a specific ligand or protein. Both mechanisms for inducing B-DNA-to-A-DNA transitions are subtle and depend on a number of factors, including ionic strength and identity and DNA sequence. For example, depending on what ion is present, ethanol-induced transitions from B-DNA can be to A-DNA [at 76%, 80%, or 84% ethanol (v/v) in the presence of Na⁺, K⁺, or Cs⁺, respectively (59)] or to C-DNA or aggregated P-DNA [in the presence of Li⁺ or Mg²⁺ (99, 100)]. Transitions to A-form structures from B-DNA are not seen in methanol (101). Similarly, with the specific mechanisms, although binding of polycations such as hexaamminecobalt(III), neomycin, or spermine can induce B → A transitions in certain G-rich sequences, Pt(NH₃)₄²⁺ cannot (102). Given the complexities, demonstration of the stabilization of A-DNA by these simulation methods through subtle changes in the environment can give insight into the molecular interactions stabilizing A-DNA.

With these goals in mind, we set out to determine whether the modern simulation methods could show stabilization of A-DNA and ultimately spontaneous B-DNA-to-A-DNA transitions. The first set of simulations involved immersing a minimally solvated A-DNA model of d[CCAACGTTGG]2 into a periodic box of explicit all-atom ethanol molecules (103). In contrast to simulations in pure water, where a complete A-DNA-to-B-DNA transition was observed in 0.5 ns, the A-DNA model is stable in MD simulation (applying the same protocols and force field) for 5+ ns. This stabilization of the A-DNA model occurs despite repuckering of the sugars away from C3′-endo and significant C2′-endo populations. To determine the factors that may be responsible for the stabilization of A-DNA, we investigated the average water, ion, and ethanol densities around the DNA. Significant water hydration and ion association were observed in the major groove (which serves to stabilize the close approach of the phosphates in the bend across the major groove that characterizes A-form structures). The normally well-hydrated minor groove appeared less hydrated, with ethanol density appearing within each end of the minor groove and also interacting with the backbone. Further support for the idea that ion and water hydration helps stabilize the A-DNA structure comes from simulations of the same sequence in pure ethanol, in which the DNA structure distorts rapidly in nanosecond-length simulations. This stabilization of A-DNA in ethanol was also independently demonstrated by Sprous et al (104) in simulations of d[CGCGAATTCGCG]2, using a united-atom model for ethanol, which is more computationally efficient. In these simulations, stabilization of the A-form (although subject to conformational sampling and equilibration issues) was shown to occur when the A-DNA was immersed into a premixed solvent box of 85% ethanol and water. In both cases, however, spontaneous B-DNA-to-A-DNA transitions could not be observed in 1- to 5-ns simulations in the presence of ethanol, even at elevated temperatures. To induce a transition, Cheatham et al (103) slightly modified the force field by reducing the V2 torsion barrier for the O-C-C-O torsions, which shifts the sugar pucker population slightly towards C3′-endo. When this is done, a rapid transition from B-DNA to A-DNA is seen. However, this subtle modification to the force field lessens the quality of the B-DNA model in solution.

To overcome some of these force field deficiencies, as has been discussed, Langley has developed the BMS force field for nucleic acids (18). This force field was empirically adjusted based on the results of MD simulation of a variety of hexamer nucleic acid sequences under various conditions and, as such, accurately represents the environmental dependence of these sequences. Whereas B-DNA-to-A-DNA transitions are observed in 75% ethanol for d[GGGCCC]2, A-DNA-to-B-DNA transitions are seen for d[AAATTT]2 in 75% ethanol, as expected. The simulations of DNA in mixed ethanol/water solution mirror the results regarding ion and water hydration in the major groove seen previously. Beyond mixed ethanol/water solution, B-DNA-to-A-DNA transitions and stabilization of A-DNA are also observed with this force field in high salt (4 M NaCl); this is in contrast to the behavior seen with the Cornell et al force field (TE Cheatham III, unpublished observations).

To investigate a specific mechanism of A-DNA stabilization, we performed simulations of d[ACCCGCGGGT]2 in aqueous solution and in the presence of 4:1 Co(NH₃)₆³⁺ (105). Previous experimental work and NMR studies had demonstrated that the addition of this ion leads to B-DNA-to-A-DNA transitions (102, 106). This was reproduced in a series of simulations on this sequence that demonstrated not only stabilization of A-DNA but also spontaneous B-DNA-to-A-DNA transitions in simulations with the Cornell et al force field. The stabilization of the A-form structure is clearly linked to association of the ions, not only deep within the major groove in favorable GpG electrostatic pockets but also through more transient association of ions that help bridge the close approach of the phosphates. Overall, the two types of association seen for these ions in the simulation (the tight binding and more weakly associated ions) are consistent with experimental observations (107, 108).

Mixed A-DNA/B-DNA Duplexes: Simulations of d[CCCCCTTTTT]

A further test of the reliability of the force fields in the representation of A-DNA/B-DNA conformational equilibria is the ability to accurately represent the structure of the d[CCCCCTTTTT] duplex, which is B-DNA under low-salt conditions but converts to A-DNA in the G-rich region as the salt concentration is raised (109) or trifluoroethanol is added (110). To investigate this issue, Feig & Pettitt have performed a series of simulations on sequences of this type (111, 112). These fairly long simulations (10 ns) on the d[C5T5]-d[A5G5] duplex were run using both the Cornell et al (14) and CHARMM23 all-hydrogen parameter set (15) force fields at a salt concentration of 0.8 M added Na⁺/Cl⁻, using Ewald methods. Under these conditions, B-DNA is expected to be the dominant structural form, because complete conversion to a B/A junction does not occur until higher salt concentrations. The simulations confirm the previously seen force field biases, specifically that this early CHARMM force field supports an A-form geometry throughout the duplex, whereas the Cornell et al force field supports mostly a B-form geometry. Particularly interesting is that both force fields show fluctuations in the conformation between A- and B-form structures, although this particular CHARMM force field rarely samples B-DNA conformations. The population of A-like helicoidal values in the Cornell et al force field simulations appears higher than seen previously (70, 72, 74); this is likely caused by the higher salt concentration in these simulations. Unfortunately, we have not been able to induce A-form structure in high-salt simulations of poly(G)-poly(C) 10-mer duplexes in 2-ns simulations with the Cornell et al force field to date (TE Cheatham III, unpublished data). In addition to investigating the structural preferences in the simulations, Feig & Pettitt provide a detailed analysis of sampling of the various helicoidal and angle values over the course of the simulation. This analysis suggests that convergence in the MD simulations with these force fields and simulation protocols requires 4 ns (112); this is consistent with correlation times of 3–4 ns for dynamics of a 10-base-pair fragment measured by NMR (113). More recent measurements of the internal dynamics within a 16-mer DNA duplex, based on measurement of the dynamic Stokes shift of an internal coumarin C-riboside probe, give longer correlation times [with components at 300 ps and 13 ns (84)]. This suggests that longer simulations of DNA duplexes in solution are necessary.
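
One simple way to put a number on the correlation time of a sampled quantity (a helicoidal parameter, for example) is through its normalized autocorrelation function. The Python sketch below is illustrative only: the truncation rule (stop at the first non-positive value) is a common heuristic rather than the procedure used in the cited convergence analysis, and the input series is artificial.

```python
def autocorrelation(series, max_lag):
    """Normalized autocorrelation C(k) for k = 0..max_lag, with C(0) = 1."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mu = sum(series) / n
    dev = [x - mu for x in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        raise ValueError("series is constant")
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / ((n - k) * var)
        for k in range(max_lag + 1)
    ]


def integrated_correlation_time(acf, dt):
    """dt * (1/2 + sum of C(k) for k >= 1), truncated at the first C(k) <= 0."""
    total = 0.5
    for c in acf[1:]:
        if c <= 0.0:
            break
        total += c
    return total * dt


# Artificial two-state signal that switches every two frames.
signal = [1.0, 1.0, -1.0, -1.0] * 4
acf = autocorrelation(signal, 3)
print(acf)
print(integrated_correlation_time(acf, dt=1.0))
```

A trajectory should be many correlation times long before averages of the quantity can be considered converged, which is the practical content of the comparison between simulation length and the measured correlation times above.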

Much better behavior has been seen in 5-ns-length simulations with this sequence and Langley's BMS force field (18), which reproduces a stable B-form structure throughout at low-salt concentrations and stable A-form (in the G-C region) and B-form (in the A-T region) structures in simulations at lower water activity (i.e. 75% ethanol or 4 M NaCl) (114). The simulations at low water activity provide further evidence that A-DNA is stabilized by specific and asymmetric screening of the phosphates and grooves owing to sequence-specific water and ion association. At high salt, the poly(G)-poly(C) region of the duplex is in the A-form and shows extensive ion and water association in the major groove, with ions interacting preferentially with the guanine N7 or O6 atoms and the phosphate backbone, with no cation association in the minor groove; in contrast, the B-form structure of the poly(A)-poly(T) region shows the opposite behavior (DR Langley, personal communication).

DNA Hydration and Ion Association

MD simulations of nucleic acids in aqueous solution provide a wealth of details about the structure and dynamics. From the discussion in the previous section, it is also clear that the modern simulation protocols and force fields are now sufficiently robust to represent even subtle environmental dependencies of DNA duplex structure. Given that MD simulations give a time history for all atomic motions in the system, in simulations that include explicit water and counterions, precise details for all specific ion and water interactions with the nucleic acid can be monitored. Average pictures of the hydration can be obtained by calculating radial distribution functions or more specific proximity-based distribution functions (115, 116), which in turn can be integrated to estimate occupancies. Detailed pictures of the hydration or ion association can be obtained by fixing the DNA to a common reference frame and accumulating relevant populations or atomic densities on a grid for visualization of the hydration (70, 74, 117, 118). These results can be compared directly with specific high-resolution crystal structures or to average analyses obtained by looking at many different crystal structures (117, 119). In addition, populations and lifetimes for various DNA hydrogen bond donor or acceptor interactions to water or ions can be tabulated (85, 120) and then compared directly with measurements from NMR. Through this step, a very detailed picture of the specific hydration and ion association with nucleic acids can be obtained. The wealth of data complicates the presentation because simply listing every possible water-base contact quickly leaves the reader glassy-eyed and, moreover, subtleties in the methods are often hidden. These subtleties relate to differences in the force fields and water models used in the simulation (with inaccurate water transport properties or lack of tetrahedrality in their interaction), inaccuracies in the treatment of long-range electrostatics (which can lead to distortion or enhanced dynamics in the nucleic acid), and motions of the nucleic acid (which will change interpretations about the specific localization of water, depending on the static reference frame chosen). Because of these problems, quantitative analysis of the strength of water interaction based on water oxygen probability densities at a specific site, compared with bulk [and treated as equilibrium-binding constants (91)], should be considered a rough estimate. Analyses based on lifetimes of interaction may be better; however, these analyses again are dependent on definitions of interaction, accuracy of the water transport, and other factors. To this end, the analysis of MD simulations for specific hydration and ion association should be considered just a beginning. Qualitatively, a variety of simulations with various simulation protocols give excellent reproduction of what is seen experimentally. Quantitatively, more detailed simulation and analyses are likely to be necessary; however, there have been a few good advances (85, 120, 121). It is also noteworthy that analyzing hydration by MD and Monte Carlo simulation is not new and has a long history, starting with Monte Carlo simulations with fixed nucleic acid (for review see 3, 122); this discussion focuses on more recent results from simulations of nucleic acids in explicit solvent, using modern simulation protocols.
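
The step from a radial distribution function to an occupancy, mentioned at the start of this section, amounts to integrating 4πρ g(r) r² out to a chosen cutoff. The Python sketch below assumes a plain spherical g(r) around a single site rather than the proximity-based functions of refs. 115 and 116; the bulk water density is an approximate textbook value, and the flat g(r) is a structureless reference case, not simulation data.

```python
import math


def coordination_number(r, g, number_density, r_cut):
    """Integrate 4*pi*rho*g(r)*r^2 dr from r[0] to r_cut (trapezoid rule).

    r: increasing radii; g: g(r) sampled at those radii; number_density:
    bulk density of the solvent species in particles per unit volume.
    """
    if len(r) != len(g):
        raise ValueError("r and g must have the same length")
    integral = 0.0
    for i in range(1, len(r)):
        if r[i] > r_cut:
            break
        f_prev = g[i - 1] * r[i - 1] ** 2
        f_cur = g[i] * r[i] ** 2
        integral += 0.5 * (f_prev + f_cur) * (r[i] - r[i - 1])
    return 4.0 * math.pi * number_density * integral


# Structureless reference case, g(r) = 1, on a 0.01 angstrom grid out to 5 angstroms.
radii = [i / 100.0 for i in range(501)]
g_flat = [1.0] * len(radii)
RHO_WATER = 0.0334  # approximate bulk water number density, molecules per cubic angstrom
print(coordination_number(radii, g_flat, RHO_WATER, 3.5))
```

For the flat case the result reduces to the number of bulk waters in a sphere of radius 3.5 Å, which gives a baseline against which the occupancy from a structured g(r) at a hydration site can be judged.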

Given the ability for ready analyses of specific hydration patterns in nucleic acids from MD simulations with explicit solvent, this type of analysis has been performed for most of these published simulations. Looking at water hydration densities, a number of groups have demonstrated that MD simulation can accurately reproduce the spine of hydration in A-tracts, d[CCAACGTTGG]2 and d[CGCGAATTCGCG]2, cones of hydration around phosphates, and extensive hydration in the major groove (69, 70, 74, 88, 92, 97, 118, 123, 124) as seen in experiments. The differences in hydration seen in A-form structures, such as water-bridged phosphates and extensive major-groove hydration, are also readily apparent (18, 74, 103, 105, 111, 112, 114, 123), as is the specific hydration of RNA structures (120, 125, 126). A nice example of the latter is seen in simulations of the RNA microhelixAla and 3:70 variants, in which a strongly bound water molecule in the minor groove is evident, consistent with crystallographic analyses (120). Moreover, in simulations with various wobble pairs (mutants), little to no water density is found at the same location, which correlates directly with measured aminoacylation activities. Lifetimes of bound water have also been estimated from the MD simulations and suggest water lifetimes in the 10- to 600-ps time range, which is, in general, consistent with experimental data and suggests lifetimes of



these simulations, the cesium atoms appear to interact with bases or sugar oxygens directly in the minor groove, sodium via water bridges (with some interaction in major groove and with phosphates), and lithium ions directly to the phosphates and rarely in the minor groove. The addition of divalent ions (Mg2+) also leads to differences in the simulation, such as decreased fluctuation of the backbone and reduced water mobility (29) and enhanced A-tract bending (124). These observations suggest that subtle differences in the force field (specifically the van der Waals parameters) can, even without the inclusion of nonadditive effects, represent some of the differences in ion interaction and association; however, this further points out force field dependencies in such analysis, because there is no inclusion of nonadditive effects and, given the differences seen, careful calibration and benchmarking of the effects are necessary. However, given the precise molecular picture that emerges, MD simulation will be a useful tool to unravel controversies in the literature about the role of specific ion binding on nucleic acid structure (128, 130) and bending.

Energetic Balance of the Force Fields and Free-Energy Simulation

One exciting recent development has been the ability to calculate free energies for very different structures, by using a combination of MD simulations in explicit water and averaging of the molecular mechanical energies and continuum-calculated solvation free energies of representative solute configurations sampled in the MD simulations. This calculation has been applied to the relative energies of A- and B-DNA by both Srinivasan et al (131) and Jayaram et al (132), with both approaches finding a preference for the B form by a reasonable amount. Srinivasan et al also examined a number of other double helical structures, including phosphoramidate duplexes, in which the 3′-O is changed to NH and in which both calculation and experiment find the A rather than the B form more stable, and they examined ribonucleotide helices, in which, both experimentally and in the calculation, the A form is favored. Both Srinivasan et al and Cheatham et al (88) investigated sequence-dependent effects and found, as expected and also as found experimentally, that runs of poly(dG) or poly(dC) favor A and runs of poly(dT) favor B.
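
The averaging step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used in the cited work: the per-snapshot molecular mechanical energies and continuum solvation free energies are taken as already computed, the function names are hypothetical, and the standard error treats snapshots as statistically independent, which tends to underestimate the true uncertainty for correlated MD frames.

```python
import numpy as np

def mean_free_energy(e_mm, g_solv):
    """Average E_MM + G_solv over snapshots; return (mean, standard error)."""
    per_frame = np.asarray(e_mm, dtype=float) + np.asarray(g_solv, dtype=float)
    # Standard error assumes uncorrelated snapshots (an assumption).
    sem = per_frame.std(ddof=1) / np.sqrt(per_frame.size)
    return float(per_frame.mean()), float(sem)

def relative_free_energy(form_a, form_b):
    """G(A) - G(B); each form is an (e_mm, g_solv) pair of per-snapshot arrays."""
    g_a, err_a = mean_free_energy(*form_a)
    g_b, err_b = mean_free_energy(*form_b)
    # Errors combined in quadrature, assuming the two trajectories are independent.
    return g_a - g_b, float(np.hypot(err_a, err_b))
```

Because the two averages are each large numbers with sizable fluctuations, the error estimate returned alongside the difference is as important as the difference itself.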

Srinivasan et al also compared the energy of some RNA hairpin loops (133), and, in two cases, both calculations found that the expected structural form was lower in free energy. A UUCG hairpin becomes a duplex in the crystal and, indeed, Srinivasan found the monomer (observed in solution) to be lower in free energy at low-salt concentrations and the dimer to be lower in free energy at high-salt concentrations. That this energy balance is achieved is somewhat miraculous because the internal electrostatic energy is more favorable for two monomers than for the duplex. The energy balance is evaluated with no cutoff and a unity dielectric constant, which leads to very large electrostatic-energy differences (1800 kcal/mol) between two monomers, with the two sets of 11 negative charges infinitely separated compared with the dimer, in which there are 22 negative charges.

Reyes & Kollman applied this methodology to study the relative free energy of association of U1A protein and two RNAs, an internal loop and a hairpin loop (134). A new approach to computational mutagenesis was developed, and the free energies of conformational change upon binding were calculated to be 10 kcal/mol for both protein and RNA.

In addition to the crude free-energy estimates made directly from energetic analysis of the MD-sampled configurations, explicit free-energy perturbation methods (135) have also been applied to nucleic acid systems, with varying success. These methods have been used both to evaluate the strength of specific ligand interactions and also to estimate the potential of mean force of association or stacking. A recent example of the former is in the estimation of the absolute free energy of netropsin binding to d[CGCGAATTCGCG]2 (136), which is estimated to be −10.3 kcal/mol from experimental data. Despite the use of poor cutoff methods, free-energy perturbation methods in MD simulations suggest a free energy of −11.5 kcal/mol for low-salt d[GCGAATTCGC]2 and −10.2 kcal/mol for d[CGCAAATTGGC]2. The predominant contribution to the free energy is from van der Waals interactions. Although these results appear to be in very good agreement, it should be noted that the error bars on the simulation are significant. These statistical errors may in principle be reduced upon the application of a better treatment of the long-range electrostatic interactions. In addition to reasonable representation of the binding free energy, it is interesting that 14 waters diffuse into the minor groove upon annihilation of the netropsin, which is consistent with the crystal results.
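
For readers unfamiliar with free-energy perturbation, the core estimator is the exponential (Zwanzig) average of the energy difference between two states, sampled in one of them. The sketch below illustrates that formula only; it is not the protocol of the netropsin study, and the function name, the default temperature, and the input array of energy differences are assumptions made for the example.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def fep_free_energy(delta_u, temperature=300.0):
    """Zwanzig estimate dG = -kT ln <exp(-dU/kT)>_0, in kcal/mol.

    delta_u holds U_1 - U_0 (kcal/mol) evaluated on snapshots sampled in state 0.
    """
    kt = KB_KCAL * temperature
    x = -np.asarray(delta_u, dtype=float) / kt
    # Shift by the maximum (log-sum-exp) so the exponentials cannot overflow.
    x_max = x.max()
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return float(-kt * log_mean)
```

In practice a transformation such as ligand annihilation is split into many small windows and the window estimates are summed. The exponential average is dominated by rare low-energy snapshots, which is one reason the error bars noted above can be large.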

Potential of mean force estimates (137) have also been made of a variety of processes ranging from base pair interaction (138, 139) to stacking (140–146) to DNA elongation (147). Simulations of Watson-Crick base pairing suggest a free energy of 2 kcal/mol per hydrogen bond and highlight the interactions of solvent in base pairing through secondary water-bridged minima (138). These calculations suggest that (isolated) adenine-thymine pairs undergo frequent exchanges between standard and reversed Watson-Crick base pairing. Although base pairing is not very sensitive to the surrounding sequence, base stacking is. Simulations suggest that the free energy of converting from an unstacked to a stacked conformation is very sequence dependent with the experimental preference of purine-purine > purine-pyrimidine pyrimidine-purine > pyrimidine-pyrimidine accurately reproduced (141, 142). MD simulation can get a handle on this process, as has been seen in simulations in which isolated unstacked ribodinucleoside monophosphate converts to stacked forms (140) and, more recently, in simulations of poly(A) 10-mer single strands that appear to spontaneously stack into duplex-like structures [i.e. the structures resemble the structure of poly(A) in poly(A)-poly(T) 10-mer duplexes; TE Cheatham III, unpublished data].
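
When a process such as stacking is sampled reversibly in an unbiased trajectory, a potential of mean force along a chosen coordinate follows from the sampled distribution as W(x) = −kT ln P(x). The sketch below illustrates that relation only. It assumes unbiased sampling and a one-dimensional coordinate with no Jacobian correction, whereas the cited studies generally rely on biased sampling and reweighting; the function name and bin edges are hypothetical.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def pmf_from_samples(samples, bin_edges, temperature=300.0):
    """W(x) = -kT ln P(x) per bin, shifted so the lowest bin is zero (kcal/mol)."""
    counts, edges = np.histogram(samples, bins=bin_edges)
    centers = 0.5 * (edges[1:] + edges[:-1])
    # Probability density, so that unequal bin widths are handled consistently.
    density = counts / (counts.sum() * np.diff(edges))
    with np.errstate(divide="ignore"):
        w = -KB_KCAL * temperature * np.log(density)  # empty bins become +inf
    return centers, w - w[np.isfinite(w)].min()
```

A bin visited nine times less often than the most populated one lies about 1.3 kcal/mol higher at 300 K, which shows how quickly unbiased sampling runs out of statistics for barriers of even a few kcal/mol.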

Stretching DNA

Whereas regularized models of stretched DNA are seen in molecular-modeling treatments with implicit solvent representations (148–151) and the DNA can be
stretched to twice its normal length before base pairs break, when the stretching is done with MD simulation, very irregular structures characterized by local distortions and disrupted base pairing are seen (147, 152). Despite the differences, both molecular models and the MD simulation give complementary pictures. Whereas the molecular-mechanical models and internal coordinate treatments map out the set of possible ensemble-averaged and minimal-energy structures, the MD trajectories map out one particular path towards strand separation. By calculating potentials of mean force for strand separation in MD simulation, MacKerell & Lee (147) showed that reasonable quantitative agreement can be seen with experimental data. The simulations suggest that DNA can be stretched from a length shorter than A-DNA to 2.4-fold the contour length of B-DNA before any energetic penalty is paid, at which time strand separation begins to occur (with a force at the barrier of 0.09 nN). This corresponds well with atomic force microscopy experiments, which suggest strand separation at 2.6 contour lengths with a force at the barrier of 0.13 nN.
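
Forces such as those quoted above are related to a potential of mean force through its slope along the extension coordinate. The sketch below shows that relation together with the unit conversion from kcal/(mol Å), the usual units of simulation output, to nanonewtons. It is an illustration only: the function name and the finite-difference treatment are assumptions and do not describe how the cited work extracted its forces.

```python
import numpy as np

# 1 kcal/(mol Angstrom) in nanonewtons: 4184 J/mol / N_A / 1e-10 m, times 1e9.
KCAL_PER_MOL_ANGSTROM_TO_NN = 4184.0 / 6.02214076e23 / 1.0e-10 * 1.0e9

def force_from_pmf(extension, pmf):
    """Mean force F = dW/dx along the extension (Angstrom), returned in nN."""
    slope = np.gradient(np.asarray(pmf, dtype=float),
                        np.asarray(extension, dtype=float))
    return slope * KCAL_PER_MOL_ANGSTROM_TO_NN
```

With this conversion, 1 kcal/(mol Å) is roughly 0.069 nN, so forces of 0.09 and 0.13 nN correspond to slopes of about 1.3 and 1.9 kcal/(mol Å), respectively.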

DNA Bending: Simulations of Phased A-Tracts in DNA

The progress in the representation of DNA structure in simulations has been exciting because the level of detail moves beyond simply reproducing the canonical duplex structure to displaying sequence-dependent fine structure. A good example is with the spontaneous generation of both static and dynamic DNA bending. In the simulations of d[CCAACGTTGG]2 starting from a canonical B-DNA or A-DNA structure, bending at the TpG steps into the major groove was seen (72), consistent with solution experiments (73), in contrast to the bending into the minor groove at this step seen in the crystal (56). In simulations of d[CGCGAATTCGCG]2 on the other hand, the MD simulations exhibit bending in the same general regions as that seen in the myriad of crystal structures, with bending to the minor groove at the G2-C3 steps, followed by a bend to the major groove at C3-G4 (70, 153). In this structure, the bending loci are at or near the junctions of the AATT and CGCG tracts. The ability of MD simulation to spontaneously reproduce these bending phenomena led to a series of beautiful simulations by Young & Beveridge (124) and Sprous et al (154) to investigate A-tract (i.e. adenine repeats of >4 nucleotides) bending and phased A-tracts (i.e. repeats of A-tracts in phase with helical repeat) in DNA in solution, using the modern simulation methods.

The precise structural effects and role of A-tracts in DNA bending at the molecular level are not fully understood (for review, see 124, 155–157). Although gel experiments and anomalous runtimes clearly demonstrate that DNA is bent by 17–21° per A-tract in the direction of the minor groove at the center of the A-tract stretches (158, 159), how it is bent is not clear. A large variety of models and combinations of these have been proposed, including those that suppose straight A-tract regions with bending at junctions, wedge models that propose small bends at ApA steps in the A-tract, and general sequence bend models (for more information, see the cited reviews above). Because study of A-tract
bending is complicated by factors such as ionic strength and identity, cocrystallization agents (160, 161), and the difficulty in obtaining reliable structural information, A-tract bending is an excellent benchmark for simulation methods. Some of the most detailed structural information has come from hydroxyl radical cleavage experiments, which suggest a progressive narrowing of the minor groove in the A-tract, moving from 5′ to 3′ (162, 163). In addition, NMR experiments (concurrent with the simulations performed by Young & Beveridge) suggest the localization of Mn2+ at the 5′-end of A-tract sequences in the minor groove (which is consistent with a wider minor groove; 127).

To investigate A-tract bending, MD simulations were performed on a 25-mer DNA oligonucleotide with two successive A-tracts (a 6-mer and a 5-mer) spaced by 11 base pairs [therefore in phase with a full turn of B-form DNA (124)]. A control 25-mer sequence without A-tracts, composed of three repeats of the BamH1 recognition sequence, was also subjected to MD simulations. Both simulations, with various salt concentrations, used the Cornell et al (14) force field in AMBER (19), with periodic boundary conditions, explicit solvent and counterions, and a PME treatment of the electrostatics (25), and both simulations were run for 5 ns. The MD results reproduce the bending in the A-tract sequence (with a calculated bending angle of 15.4° per A-tract compared with the experimental values of 17–21°), leading to an overall bend of 26–30° compared with 18° in the control sequence. There are a number of notable points in the simulations. The bending is dynamic, with a range of 0° up to 80°, over the course of the MD simulation. Whereas the direction of the bending in the A-tract sequence is generally unchanging over time, in the control sequence the direction is highly variable. The widening of the minor groove at the 5′-end of the A-tract is reproduced in the MD simulation for both A-tracts. There is a profound salt dependence; simulations of the A-tract with only net-neutralizing sodium (Na+) do not show the extra bending, in contrast to those at physiological added-salt (60 mM KCl) or higher-salt (60 mM KCl + 10 mM Mg2+) concentrations. In the A-tract simulations with divalent ions added, these ions are shown to interact with the backbone and grooves in a nonuniform manner, and during the simulation, spontaneous association of these ions into the 5′-ends of the A-tract in the minor groove is reproduced. The simulations also hint at the effect of increased bending on the addition of divalent counterions; further simulations with higher divalent salt concentrations are necessary to verify this effect. In addition to reproducing the experimental data, the simulations also give a molecular picture of the static and dynamic A-tract structures. As mentioned, the bending is dynamic, although it tends toward a given direction. The model of A-tract bending supported is a junction model, with bending at the junctions and a relatively straight A-tract sequence.
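
Why the phasing of A-tracts matters can be illustrated with a purely geometric toy model. This is not the analysis used in the simulations above. It assumes that each bend is small, that its direction rotates with the helix, and that the helical repeat is 10.5 base pairs per turn; that value, the function name, and the planar vector-sum approximation are all assumptions introduced for the example.

```python
import cmath
import math

def net_bend(bends_deg, positions_bp, helical_repeat_bp=10.5):
    """Vector-sum small bends whose directions rotate with the helix.

    bends_deg: bend magnitude at each locus (degrees).
    positions_bp: position of each locus along the sequence (base pairs).
    Returns the magnitude of the net bend in degrees (small-angle approximation).
    """
    total = 0j
    for bend, position in zip(bends_deg, positions_bp):
        # Direction of this bend around the helix axis, in radians.
        phase = 2.0 * math.pi * position / helical_repeat_bp
        total += bend * cmath.exp(1j * phase)
    return abs(total)
```

In this picture two equal bends separated by one helical repeat add fully, whereas two separated by half a repeat cancel. Two 15.4° bends spaced 11 base pairs apart sum to roughly 30°, but the model ignores sequence-dependent twist, junction geometry, and dynamics, so it should be read only as a sketch of the phasing argument.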

Another significant test and benchmark of the methods is in the simulation of d[GA4T4C]n vs d[GT4A4C]n sequences (154). These particular sequences are interesting because the A4T4 repeats show significant gel anomalies in contrast to T4A4 repeats (164). To determine whether there are differences in the overall bending and dynamics, simulations of 30-mers containing two repeats of each sequence
were run by using the modern simulation protocols. Whereas the simulation of the A4T4 repeat suggested an overall bend characterized by three distinct bends at the junctions adjacent to each A-tract and at the central CpG step, a significant bend in the T4A4 repeat was seen only at the central CpG step (leading to less overall curvature in this sequence). Overall, the A4T4 repeat behaves similarly to the previously simulated A-tract sequences with a widened minor groove at the 5′-end, transitioning into a narrow minor groove over the rest of the A-tract. In contrast, the T4A4 repeat does not show this minor groove opening and narrowing, and it displays large roll and low twist at the central TpA step, leading to a zigzag bending pattern that cancels the overall bend. These results provide further support for the model of A-tract regions as essentially straight, and they suggest that models of curvature based on analysis of dinucleotide properties will likely be insufficient to explain DNA bending.

A-tracts have also been simulated by Sherer et al to understand the differences in bending upon substitution of the standard adenine-thymine base pair with an inosine-methylcytosine base pair within A-tracts (165). The idea was that A-tract sequences could be stabilized by bifurcated hydrogen bonds enabled by high propeller twist, which therefore could rigidify the A-tract and emphasize conformational transitions at the junction. However, experiments show very little effect on the overall curvature upon substitutions of inosine-methylcytosine pairs [which should have less propensity to form bifurcated hydrogen bonds (166)] for the adenine-thymine pair. The MD simulations confirm this and agree with the crystallographic data by suggesting that the substitutions do not lead to major structural changes or differences in the bending. In each of three short (800-ps) simulations (a control, single substitution, and triple substitution in the
