

Calculation of pKas in RNA: On the Structural Origins and Functional Roles of Protonated Nucleotides

Christopher L. Tang1, Emil Alexov1, Anna Marie Pyle2 and Barry Honig1*

1Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, USA

2Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University, New Haven, CT 06520, USA

pKa calculations based on the Poisson–Boltzmann equation have been widely used to study proteins and, more recently, DNA. However, much less attention has been paid to the calculation of pKa shifts in RNA. There is accumulating evidence that protonated nucleotides can stabilize RNA structure and participate in enzyme catalysis within ribozymes. Here, we calculate the pKa shifts of nucleotides in RNA structures using numerical solutions to the Poisson–Boltzmann equation. We find that significant shifts are predicted for several nucleotides in two catalytic RNAs, the hairpin ribozyme and the hepatitis delta virus ribozyme, and that the shifts are likely to be related to their functions. We explore how different structural environments shift the pKas of nucleotides from their solution values. RNA structures appear to use two basic strategies to shift pKas: (a) the formation of compact structural motifs with structurally-conserved, electrostatic interactions; and (b) the arrangement of the phosphodiester backbone to focus negative electrostatic potential in specific regions.

© 2006 Published by Elsevier Ltd.

*Corresponding author

Keywords: ribozyme; pseudoknot; pKa calculation; Poisson–Boltzmann equation; RNA structure

Present address: E. Alexov, Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA.

Abbreviations used: BPH, branch point helix; LDZ, lead-dependent ribozyme; BWYV, beet western yellows virus; PEMV, pea enation mosaic virus; HDVR, hepatitis delta virus ribozyme; LPB/NLPB, linear/non-linear Poisson–Boltzmann equation; ESP, electrostatic potential; MCCE, multi-conformation continuum electrostatics; MC, Monte Carlo; RMSD, root-mean-square deviation.

E-mail address of the corresponding author: [email protected]

doi:10.1016/j.jmb.2006.12.001 J. Mol. Biol. (2007) 366, 1475–1496

0022-2836/$ - see front matter © 2006 Published by Elsevier Ltd.

Introduction

There is increasing evidence that ionized nucleotides play important roles in RNA structure and function. Adenosine and cytidine can protonate on their N1 and N3 atoms, respectively, but both are poor bases and have pKas in solution that render them neutral at pH 7 (their pKas in solution are 3.8 for adenosine and 4.3 for cytidine; Figure 1).1,2 Nevertheless, there have been numerous examples where, based on the examination of crystal and solution structures,3–11 protonated nucleotides appear to be present in RNA, suggesting that their pKas have been shifted upwards from their solution values. Nucleotides with elevated pKas have been implicated to play a direct role in ribozyme catalysis.12,13 For example, many lines of biochemical and structural evidence suggest that the hepatitis delta virus ribozyme (HDVR) and the hairpin ribozyme, in particular, utilize protonated nucleotides or nucleotides with elevated pKas to achieve optimal activity.14–24 Protonated nucleotides have been implicated in a wide variety of structures, ranging from frameshifting pseudoknots to the ribosome itself.25–27 Therefore, a central question is whether nucleotides with shifted pKas play as significant a role in RNA structure and function as they often do in proteins, where pKa-shifted residues affect protein stability,28 control conformational changes,29 modulate binding to substrates,30 and participate in catalytic mechanisms.31 Indeed, the availability of protonated nucleotides would add to the diversity of chemical groups that could be used for function in RNA.32,33

Here, we seek to understand the structural determinants of pKa shifts, relative to solution values, of nucleotides in RNA using computational methods. Our work builds on the extensive literature that exists for calculating pKas in proteins,34–42 and, more recently, in DNA.43 Most of these methods rely on numerical solutions to the Poisson–Boltzmann (PB) equation to obtain electrostatic contributions to the pKa shift. The linear PB equation (LPB) has been used in most applications in proteins but, given the high charge density on RNA molecules, the nonlinear PB equation (NLPB) is more appropriate. The NLPB has been applied extensively to highly charged systems such as acidic membranes and nucleic acids. Despite the large charge densities of highly charged molecules and the high mobile ion densities that accumulate in their vicinity, the predictions of the NLPB have been in remarkable agreement with experiment in many cases. Examples include the salt-dependence of binding of proteins and ligands to DNA,44,45 the salt and membrane charge-dependence of the binding of proteins and peptides to membrane surfaces,46 electrostatic potentials around DNA as measured by EPR experiments,47 and the absolute magnitude and salt-dependence of the pKa shift experienced by a ligand that intercalates into DNA.48 The NLPB has also successfully explained the binding isotherms of mixed ion species binding to DNA and RNA,49 the stoichiometry and free energy of magnesium binding to DNA and RNA,50,51 and the magnesium-dependence of RNA folding.52,53

In many of these cases the salt concentration
approaches 1 M, a region where the approximations in traditional PB methods such as Debye-Hückel theory are believed not to be valid. However, even the linearization condition inherent in the LPB (eφ/kT ≪ 1, where φ is the electrostatic potential) actually improves at high salt, since the potentials induced by a macromolecule become weaker as the concentration of salt increases. As discussed by
Sharp & Honig,54 a more serious problem with Debye-Hückel theory is that it chooses one mobile ion as a fixed charge and all other mobile ions become part of the ion atmosphere. This introduces an artificial asymmetry into the problem that does not exist when the macromolecule is assumed to be fixed and the surrounding salt is treated as mobile. Indeed, one would not normally think of a DNA molecule as part of the ion atmosphere of a mobile sodium ion. The lack of symmetry in macromolecular and colloidal solutions makes it possible to define the electrostatic free energy uniquely within the context of the NLPB, and it is the availability of this formalism that enabled many of the applications that followed.54

All PB methods ignore ion–ion correlation and
effects of ion size. However, these work in opposite directions in the sense that ion–ion correlation effects increase the local ion concentration, while ion size effects tend to reduce the local ion concentration by ensuring that two ions do not approach each other too closely. This cancellation may account, in part, for the fact that the NLPB provides such an accurate description of the dependence of electrostatic free energies on salt concentration, as summarized in the previous paragraph. Indeed, the NLPB underestimates ion distributions around cylinders as obtained from Monte Carlo simulations by only 10–15%, and the effect on electrostatic free energies appears to be in this range or smaller.55–58 This is the reason that the NLPB has been applied so effectively to macromolecular systems, despite its well known approximations. Simply stated, the consequences of these approximations, especially the effect on electrostatic free energies involving monovalent ions, do not appear to be severe. Their effect on solutions containing divalent ions is almost certainly more serious, although there have been few experimental tests that have made it possible to determine the magnitude of the problem.

Based on the past successes of applying the
NLPB to charged macromolecules, it seems reasonable to apply it to the calculation of pKas in RNA. However, pKa calculations on systems with many titratable groups require the use of methods that take multiple ionization equilibria into account. As will be discussed below, when a significant number of nucleotides are involved, the existence of a large number of ionization states can introduce computational complexity that, due to the lack of additivity among individual terms, essentially precludes the full use of the NLPB in pKa calculations. In order to deal with this problem, we introduce a method that uses solutions to the LPB for which additivity holds, but then adds a correction term that accounts approximately for missing nonlinear contributions. The approximation is related to one used previously in the treatment of the titration behavior of polylysine.59

Our calculations are based on a Monte Carlo
treatment of multiple ionization states. Specifically, we use a modified version of the MCCE method that has been shown to be effective both for pKa calculations in proteins60–62 and for the placement of hydrogen in crystal structures.63 We extend the MCCE method so that it is applicable for pKa calculations in RNA. To this end, we report a new set of atomic parameters for calculating electrostatic potentials in RNA molecules containing protonated nucleotides. Our approach is validated by testing its ability to reproduce quantitatively pKas taken from the literature. We address the role of shifted pKas in RNA through an analysis of the branch-point helix,64 the lead-dependent ribozyme,65 pseudoknots from the beet western yellows virus66 and the pea enation mosaic plant virus,67 HDVR,68,69 and the hairpin ribozyme.70,71 For cases where experimental data are available, the calculated pKa shifts are in quite good agreement with experimental results. However, based on experience with proteins, it is unlikely that all of the calculated pKas will be in quantitative agreement with experimental values, primarily because conformational changes that accompany changes in ionization state are not taken into account. The consequences of assuming a rigid RNA structure will be discussed further below but the expectation is that calculated shifts will be too large. However, the key result of our analysis is not the magnitude of pKa shifts but the identification of nucleotides that undergo significant shifts and the determination of the structural factors that lead to these shifts.

Figure 1. Adenosine and cytidine in their unprotonated (A, C) and protonated (A+, C+) states. Their solution pKas are shown in parentheses.

Our analysis reveals that nucleotides with
elevated pKas are often located at positions in the structure where they contribute to hydrogen bonds in their protonated states, and in regions of the RNA that have been characterized to be catalytically or functionally important. We find also that several distinct features of RNA are important for the occurrence of pKa shifts to higher values. These include the abundance of negatively charged phosphate groups near pKa-shifted nucleotides and conserved interactions with polar groups from adjacent nucleotides. In addition, as is the case for proteins, the removal of nucleotide groups from the solvent generally favors pKa shifts of bases to lower values. A comparison between C+GCA motifs in divergent structures gives us a novel view of how these motifs may be stabilized. Our analysis provides a detailed picture of how structure influences pKa shifts in RNA molecules.
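The Introduction's remark that potentials induced by a macromolecule weaken as the salt concentration rises can be illustrated with the Debye screening length of a 1:1 electrolyte. The following is a minimal sketch and not part of the method described in this paper; the function name `debye_length_nm` and the choice of water at 25 °C (relative permittivity 78.4) are assumptions made here for illustration.

```python
import math

# Physical constants in SI units
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
KB = 1.380649e-23         # Boltzmann constant, J/K
NA = 6.02214076e23        # Avogadro constant, 1/mol
E = 1.602176634e-19       # elementary charge, C


def debye_length_nm(ionic_strength_molar, eps_r=78.4, temp_k=298.15):
    """Debye length in nm for a 1:1 salt.

    eps_r and temp_k are assumed values for water at 25 C.
    """
    conc = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_sq = 2.0 * NA * E**2 * conc / (EPS0 * eps_r * KB * temp_k)
    return 1e9 / math.sqrt(kappa_sq)


if __name__ == "__main__":
    # Salt concentrations that appear in this paper: 10 mM, 100 mM, 500 mM, 1 M
    for salt in (0.01, 0.1, 0.5, 1.0):
        print(f"{salt:5.2f} M -> {debye_length_nm(salt):.2f} nm")
```

Under these assumptions the screening length falls from roughly 3 nm at 10 mM salt to roughly 0.3 nm at 1.0 M, which is the sense in which the linearization condition becomes easier to satisfy at high salt.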

Results

Calculation of nucleotide pKas in RNA structures

Assessment of the accuracy of calculated pKas by comparison with spectroscopically determined values

In order to validate our approach, we compared calculated pKas to those determined by NMR spectroscopy for two RNAs. The first of these, the branch-point helix (BPH), has a 21 nucleotide stem–loop structure containing an internal asymmetric loop (PDB ID 17ra).64 In the structure, consecutive adenosine residues in the asymmetric loop, A6 and A7, stack within the helix opposite a single uridine, U16. The measured pKa of A7 is shifted to 6.1, while the other adenosine residues in the structure have pKa ≤ 5.5 (Table 1). Calculations were carried out using ionic strengths that mimic experimental conditions (e.g. 10 mM monovalent salt for BPH).64 As can be seen from Table 1, there is a striking agreement between the measured and calculated values. The two nucleotides with the highest and second highest measured pKa shifts (A7 and A13) were identified in the correct order and were calculated to have pKas within 0.7 pKa unit of the experiment. In addition, the calculated pKas of nucleotides involved in Watson–Crick base-pairs are depressed from their solution values, as normally would be expected.

Table 1. Comparison of calculated and spectroscopically determined pKas

Nucleotide   Secondary structure   Spectroscopically-determined pKa   Calculated pKa using NL correction

Branch-point helix (BPH) in 10 mM NaCl64
C3           wc                    -                                  2.5±0.7
A6           -                     <5.0                               2.5±0.9
A7           A+U(a)                6.1                                6.8±0.8
A10          -                     <5.0                               1.7±0.6
A13          -                     5.5                                5.3±0.4
C14          wc                    -                                  3.5±1.0
C15          wc                    -                                  1.4±0.8
A17          wc                    <5.0                               2.7±1.3
C20          wc                    -                                  1.7±0.8
C21          wc                    -                                  2.1±0.5

Lead-dependent ribozyme (LDZ) in 100 mM NaCl65
C2           wc                    -                                  2.1±1.5
A4           wc                    ≤3.1                               <3.0
C5           wc                    -                                  3.0±2.0
C6           A+C(b)                -                                  2.8±2.4
A8           -                     4.3±0.3                            4.9±0.8
C10          wc                    -                                  1.4±1.5
C11          wc                    -                                  3.7±1.5
A12          wc                    ≤3.1                               <3.0
C14          wc                    -                                  4.6±1.0
A16          -                     3.8±0.4                            3.4±1.1
A17          -                     3.8±0.4                            2.4±1.3
A18          -                     3.5±0.6                            3.6±0.9
A25          A+C                   6.5±0.1                            7.3±1.8
C28          wc                    -                                  3.1±0.7
C30          wc                    -                                  5.0±2.0

All pKas were calculated from the LPB using non-linear correction as described in the text. The mean±standard deviation of the calculated pKa values is given for the 12 low-energy NMR structures for BPH (PDB ID 17ra) and the 25 low-energy NMR structures for LDZ (PDB ID 1ldz). Secondary structure interactions are annotated with one of the following types: wc, Watson–Crick; A+U, protonated AU; or A+C, protonated AC. Nucleotides with experimentally measured pKas are highlighted in bold-face. A dash marks an empty cell.

(a) In an A+U pair, A:N1+ forms a hydrogen bond with U:O2 as the acceptor.

(b) In an A+C pair, A:N1+ forms a hydrogen bond with C:O2 as the acceptor.
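The agreement summarized in Table 1 can be checked with a few lines of arithmetic. The sketch below uses only the rows of Table 1 that report a point value for both the measured and the calculated pKa (rows given as upper bounds are left out); the names `TABLE1`, `abs_errors` and `mean_abs_error` are illustrative and not from the paper.

```python
# (nucleotide, measured pKa, calculated pKa), transcribed from Table 1.
# Only rows with a point value in both columns are included.
TABLE1 = [
    ("BPH A7", 6.1, 6.8),
    ("BPH A13", 5.5, 5.3),
    ("LDZ A8", 4.3, 4.9),
    ("LDZ A16", 3.8, 3.4),
    ("LDZ A17", 3.8, 2.4),
    ("LDZ A18", 3.5, 3.6),
    ("LDZ A25", 6.5, 7.3),
]


def abs_errors(rows):
    """Absolute difference between calculated and measured pKa per nucleotide."""
    return {name: abs(calc - meas) for name, meas, calc in rows}


def mean_abs_error(rows):
    """Mean absolute error over all rows, in pKa units."""
    errs = abs_errors(rows)
    return sum(errs.values()) / len(errs)


if __name__ == "__main__":
    for name, err in abs_errors(TABLE1).items():
        print(f"{name:8s} |calc - meas| = {err:.1f}")
    print(f"mean absolute error = {mean_abs_error(TABLE1):.1f}")
```

For these seven rows the mean absolute difference is 0.6 pKa unit, and the largest single difference (1.4 units) is for A17 of LDZ, which is smaller than the standard deviations reported for several of the calculated values.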


The second structure, lead-dependent ribozyme (LDZ), is a 30 nucleotide stem–loop that also contains an internal asymmetric loop (PDB IDs 1ldz and 2ldz).72 The asymmetric loop contains a protonated A+C pair, A25-C6, in which A25 displays a measured pKa of 6.5±0.1 (Table 1). This loop contains a non-canonical AG pair flanked by two extrahelical guanosine nucleotides; these and all other nucleotides are measured to have more typical pKas of less than 4.3 (Table 1).65,73 We calculated pKas for each nucleotide and averaged them across the published set of 25 NMR conformers. The nucleotides with the highest (A25) and second highest (A8) measured pKas were identified in the correct order and the calculated values were accurate to within 0.8 pKa unit.

Our ability to account for salt effects was tested
by comparing the calculated pKa of A25 to experimental measurements under two salt conditions. The experimentally determined pKa of A25 shifts from 6.5 to 5.9 upon changing the concentration of monovalent ion from 100 mM to 500 mM (Table 2).65,73 The calculated shift, from 7.3 to 6.6, is in excellent agreement with experiment, as are the absolute values, which are within 0.8 pKa unit of the experimental measurement. As can be seen in Table 2, agreement with experiment is reduced slightly if the MCCE procedure is used in conjunction with the LPB. In all other cases we have examined, the pKas reported by the LPB method are one to two units larger than those obtained from the NLPB, probably because the
effects of the ion atmosphere in screening interactions with phosphate groups are underestimated by the LPB. Since even the NLPB calculations tend to overestimate pKa shifts, the use of the LPB reduces overall agreement between the calculated results and experiment.
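The salt-dependence comparison for A25 reduces to three subtractions on the Table 2 values. The following sketch simply tabulates them; the dictionary layout and the helper name `salt_shift` are illustrative choices, not part of the paper.

```python
# pKa of A25 in LDZ at 100 mM and 500 mM NaCl, transcribed from Table 2.
TABLE2 = {
    "measured": {100: 6.5, 500: 5.9},
    "NL correction": {100: 7.3, 500: 6.6},
    "LPB alone": {100: 7.9, 500: 6.8},
}


def salt_shift(values, low=100, high=500):
    """Change in pKa on going from the low to the high salt concentration (mM)."""
    return values[high] - values[low]


if __name__ == "__main__":
    for label, values in TABLE2.items():
        print(f"{label:14s} shift = {salt_shift(values):+.1f}")
```

The measured shift is -0.6 pKa unit, the shift with the nonlinear correction is -0.7, and the shift with the LPB alone is -1.1, so the corrected calculation tracks the measured salt-dependence more closely than the LPB alone.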

Identification of pKa-shifted nucleotides in pseudoknots

The pKa calculations were carried out on pseudoknot structures from the beet western yellows virus (BWYV) and the pea enation mosaic virus (PEMV). The BWYV and PEMV pseudoknots share a common secondary structure topology composed of two stems (S1 and S2) and two loops (L1 and L2), where L1 interacts with the major groove of S2, and L2 interacts with the minor groove of S1. In both pseudoknots, tertiary contacts between L1 and S2 form a C+GCA structural motif containing a protonated cytidine (Figure 2). As can be seen in the Figure, C+GCA appears to be a recurring structural motif that has also been observed in HDVR.67 The unfolding of both pseudoknots has been shown to be highly pH-dependent, which has been attributed to the cytidine in the C+GCA motif on the basis of the proposed hydrogen bond between the protonated cytidine N3 nitrogen atom and the guanosine O6 oxygen atom in the structure.74,75

The protonated cytidine in the C+GCA motif is
identified in the calculations as having the most elevated pKa in both structures (Table 3). As shown in Table 3, the pKa is calculated to be 13.7 for C8 in the BWYV pseudoknot and 10.6 for C10 in the PEMV pseudoknot. The pH-dependence of unfolding in the BWYV and PEMV pseudoknots has been measured to exhibit apparent pKas of 6.8–7.3 and 7.1, respectively.74,75 However, the apparent pKas obtained from folding/unfolding transitions do not correspond directly to the pKas of individual groups. In the simplest case, where only a single group controls titration behavior, the apparent pKa corresponds to the midpoint between the pKa of the titratable group in the two states (folded and unfolded).40,76 If only a single group determines the shape of the titration curves for the BWYV and PEMV pseudoknots, then assuming a pKa of 4.3 for cytidine in the unfolded state would predict a pKa of ~10 for the cytidine in the folded state, which is in good agreement with the calculated value for the PEMV pseudoknot. However, when multiple titration sites influence the folding reaction, the titration behavior becomes more complex and the experimental data become more difficult to interpret. Moreover, if residual secondary structure is present in the unfolded state, then assuming a reference pKa of 4.3 would not be correct.

Table 2. Salt-dependence of the pKa of A25 in lead-dependent ribozyme

[NaCl] (mM)   Spectroscopically-determined pKa   Calculated pKa using NL correction   Calculated pKa using LPB alone
100           6.5±0.1                            7.3±1.8                              7.9±1.8
500           5.9±0.1                            6.6±1.8                              6.8±1.8

The pKa of A25 was calculated using 25 low-energy NMR structures from PDB ID 1ldz under different salt conditions and compared to experiment.73 Calculations were performed using the nonlinear (NL) correction term and using the LPB alone.

Figure 2. HDVR and the BWYV and PEMV pseudoknots share a common C+GCA structural motif. Dotted lines indicate the hydrogen bond network. A hydrogen bond between the protonated N3 atom of cytidine and the O6 atom of guanosine is indicated by the red arrow (forming a C+G [rh] interaction, also discussed in the legend to Table 3).
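The single-group estimate above follows directly from the midpoint relation: if the apparent pKa is the average of the folded-state and unfolded-state pKas, then the folded-state value is twice the apparent pKa minus the unfolded-state value. A minimal sketch, assuming exactly one titrating group and the solution pKa of 4.3 for cytidine in the unfolded state (the function name `folded_pka` is illustrative):

```python
def folded_pka(apparent_pka, unfolded_pka=4.3):
    """Folded-state pKa implied by the midpoint relation for a single group.

    Assumes apparent_pka = (folded_pka + unfolded_pka) / 2, which holds only
    in the simplest case of one titrating group described in the text.
    """
    return 2.0 * apparent_pka - unfolded_pka


if __name__ == "__main__":
    # Apparent pKas of unfolding quoted in the text
    print(f"PEMV, apparent 7.1 -> folded {folded_pka(7.1):.1f}")
    print(f"BWYV, apparent 6.8 -> folded {folded_pka(6.8):.1f}")
    print(f"BWYV, apparent 7.3 -> folded {folded_pka(7.3):.1f}")
```

With these inputs the relation gives 9.9 for the PEMV pseudoknot and 9.3 to 10.3 for the BWYV pseudoknot, consistent with the value of about 10 quoted in the text. The caveats stated there still apply: with several coupled titration sites, or with residual structure in the unfolded state, this simple inversion is not valid.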

Moody et al. have provided a cogent discussion of thermodynamic linkage relationships involved in pH-dependent RNA folding and have discussed conditions where large unfolded state pKas might be expected.76 Their analysis highlights the difficulties of assigning experimental pKas to cytidine in the pseudoknots considered here. We can say with some certainty that the relevant values are greater than the apparent pKas so that they are likely to be above 7.5, and perhaps significantly higher. Thus, the calculations are successful in identifying cytidine nucleotides that have undergone significant pKa shifts, although we are unable to determine at this point whether the actual values are calculated accurately. On the other hand, the highest pKa we are aware of that has been measured experimentally for cytidine in a nucleic acid structure is 9.5 (ref. 5). As such, a calculated pKa such as 13.7 for C8 in the BWYV pseudoknot is unprecedented, and thus is almost certainly too high.

Indeed, there is reason to believe that some pKa
values have been overestimated, since we have treated the RNA structure as rigid; that is, we have not allowed the RNA to undergo conformational relaxation in response to a change in protonation. Since the crystal and NMR structures studied here were determined in pH ranges where the cytidine nucleotides of interest are protonated, one would expect some conformational relaxation to occur in the folded state that would stabilize the unprotonated form of the cytidine. This would, in turn, reduce the pKa to below the value obtained by assuming a rigid structure. For this reason, the values reported here for C8 in the BWYV pseudoknot and C10 in the PEMV pseudoknot are likely to be too large. On the other hand, the calculations clearly identify these two cytidine nucleotides as undergoing significant pKa shifts to higher values. Consistent with previous studies, our calculations suggest that these groups determine the pH-dependent unfolding of the two pseudoknots at high pH.

The error resulting from the use of only two
crystal conformations was greater than 4 pKa units for A23 in the BWYV pseudoknot (Table 3), and we concluded we could not determine its pKa with any precision (data not shown). This is likely due to the fact that small changes in local structure around the titrating group between the two conformations can have large effects on electrostatic free energies. The resulting energy differences may lead to large errors, especially if the number of conformations considered is very small. On the other hand, averages over larger numbers of conformations usually lead to less noisy results, as was the case for most of the remaining calculations.

pKa-shifted nucleotides in the HDV ribozyme

We computed the pKas of all titratable nucleotides in HDVR and the hairpin ribozyme, so as to determine the locations of nucleotides likely to be protonated at physiological pH. HDVR catalyzes a site-specific phosphodiester self-cleavage reaction that has been shown to be strongly pH-dependent.14,77 The structure of HDVR has been solved in the precursor and product conformations.68,69 We performed pKa calculations using the product (1cx0 and 1drz) and precursor (1vc5) crystal structures, each obtained at pH ~6 (refs. 68,69). Because of the central interest of C75 for understanding HDVR enzymatic function,14,16,19,68,69,77–79 we performed calculations only for structures with cytidine at position 75; the nine remaining structures were omitted from this study. With the exception of C75, the calculated pKas of the nucleotides were quite similar for the product and precursor structures (data not shown). This was expected since, except for differences between the product and precursor structures near C75, the overall similarity of the selected structures is very strong (<1.6/1.1 Å all-atom/all-phosphate-atom root-mean-square deviation (RMSD)). Following experimental conditions,14,77,80 we performed our calculations at 1.0 M monovalent salt (i.e. NaCl or LiCl). Ion-specific effects between two different species of monovalent salt, however, cannot be taken into account within the context of the PB equation.

Table 3. Comparison of calculated and apparent pKas of unfolding

Nucleotide   Secondary structure   Apparent pKa of unfolding   Calculated pKa using NL correction

BWYV pseudoknot in 100 mM NaCl, 10 mM MgCl2 (refs. 74,75)
C3           wc                    -                           <3.0
C5           wc                    -                           <3.0
C8           C+G[rh](a)            6.8–7.3                     13.7±0.1
A9           -                     -                           2.6±0.1
C10          wc                    -                           <3.0
C11          wc                    -                           <3.0
C14          wc                    -                           <3.0
C15          wc                    -                           <3.0
C17          wc                    -                           <3.0
A20          O2′                   -                           7.3±0.1
A21          -                     -                           4.6±0.6
C22          -                     -                           4.5±0.1
A23          O2′                   -                           n.r.
A24          AG                    -                           <3.0
A25          O2′                   -                           6.1±0.2
C26          wc                    -                           <3.0

PEMV pseudoknot in 100 mM NaCl (refs. 74,75)
C5           wc                    -                           <3.0
C6           wc                    -                           <3.0
C10          C+G[rh]               7.1                         10.6±1.1
A12          -                     -                           4.4±2.0
C13          wc                    -                           3.5±3.4
C15          wc                    -                           5.7±2.7
C16          wc                    -                           <3.0
A19          AU[s](b), wc          -                           7.4±0.7
A21          -                     -                           3.3±1.4
A22          -                     -                           3.8±1.8
A23          -                     -                           7.8±2.3
C24          -                     -                           4.9±2.5
A25          -                     -                           4.6±1.9
A26          -                     -                           <3.0
A27          O2′                   -                           2.1±1.5
C30          wc                    -                           <3.0
A31          -                     -                           4.2±1.0

pKas were calculated for each adenosine and cytidine nucleotide in the BWYV and PEMV pseudoknots. The mean±standard deviation was calculated for a set of four BWYV pseudoknot crystal structures (PDB ID 437d and 1l2x) and 15 low-energy NMR structures of the PEMV pseudoknot (PDB ID 1kpy). The apparent pKa of unfolding values (column 3) are taken from the literature.74,75 Secondary structure interactions are annotated with types defined in Table 1 or from the following: O2′, hydrogen-bonded to a 2′ hydroxyl group; C+G[rh], protonated cytidine interaction with guanosine along its Hoogsteen edge; AG, AG mispair; or AU[s], sheared AU. For the BWYV pseudoknot, the calculated pKa of A23 is marked n.r. (not reported) because there is a large discrepancy between the calculated values in the two crystal structures. See the text for details. A dash marks an empty cell.

(a) See Figure 2 for examples of C+G[rh] pairs.

(b) In a sheared AU pair, the U appears shifted into the minor groove and U:O4 is within hydrogen bonding distance of A:N1, where, if a hydrogen bond is formed, the latter should be protonated.

Figure 3(a) displays the pKas calculated for all
titratable nucleotides in HDVR. Two nucleotides, C41 and C75, were calculated to have pKas greater than 5.8 (Figures 3(a) and 4(a)–(c)). C41 is part of a CAA three-nucleotide loop in HDVR and is involved in the structurally-conserved C+GCA motif, found also in the BWYV and PEMV pseudoknots described above in Figure 2. Its calculated pKa is 10.6, which is in the same range as the values calculated for the protonated cytidine nucleotides of the C+GCA motifs in the two pseudoknots. The identification of C41 as a nucleotide with a shifted pKa is consistent with the results reported by Been and co-workers, who have attributed the apparent rate constant for catalysis of about 7 to C41.17,80,81

C75 is calculated to have the second highest pKa in
the product structure of the HDV ribozyme, with acalculated value of"9.6. Althoughwe are not able toreport pKas for the precursor because atoms near the5! terminus of the RNA are missing (1vc5),69 C75 isexpected to have a higher pKa in the precursor thanin the product, because it contains an extra negativecharge due to the phosphate group located near C75.C75 is located at the active-site of the ribozyme andappears to form a hydrogen bond with the 5!terminus OH in the product structure as in Figure4(b) (PDB IDs 1cx0 and 1drz). In the precursorstructure (PDB ID 1vc5), the N3 atom of C75 iswithin 2.7 Å of the O2P atom of the scissilephosphate group, suggesting strongly that theprotonated state is stabilized by nearby phosphategroups. C75 has been shown to play a direct role incatalysis,14,16 and the mutation of C75 to U or Geffectively eliminates ribozyme activity.82–85 Muta-tion of C75 to adenosine lowers the apparent pKa ofthe reaction by an amount that corresponds to thedifference in the solution pKa values of cytidineand adenosine, suggesting strongly that the appar-ent pKa of the reaction reflects that of the nuc-

leotide at this position.14,18 It has been proposedthat the catalytic activity of HDVR depends on theprotonation of C75.14,19,69 The identification ofC75 as a nucleotide with an elevated pKa supportsthis hypothesis, although the calculated value isgreater than the best estimate in the literature,pKa"6–8.14,80,81
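To give a sense of scale for these pKa values, the fraction of a titratable base that is protonated at a given pH follows from the Henderson–Hasselbalch relation for a single, non-interacting site. The short Python sketch below is an illustration only and is not part of the calculations reported here; the function name `fraction_protonated` and the choice of pH 7 are assumptions made for the example, while the pKa values are those quoted in the text.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of an isolated, non-interacting site that carries a proton.

    Henderson-Hasselbalch for a single site: f = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ph = 7.0  # assumed pH, chosen only for illustration
cases = [
    ("cytidine in solution", 4.3),
    ("C75, lower experimental estimate", 6.0),
    ("C75, upper experimental estimate", 8.0),
    ("C75, calculated (product structure)", 9.6),
]
for label, pka in cases:
    frac = fraction_protonated(pka, ph)
    print(f"{label:38s} pKa {pka:4.1f} -> {frac:.3f} protonated")
```

Under this single-site picture, free cytidine is almost entirely neutral at pH 7, a site with a pKa of 9.6 is almost entirely protonated, and the experimental range of 6 to 8 spans roughly 9% to 91% protonation.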

pKa-shifted nucleotides in the hairpin ribozyme

Like HDVR, the hairpin ribozyme catalyzes a site-specific phosphodiester cleavage reaction. In the crystallized structure of this ribozyme, the substrate appears as a separate strand, but the base-pairing of this strand with the ribozyme strand is integral to the formation of the ribozyme structure. Together, the substrate and ribozyme strands fold into a single four-helix junction.70,71 The active site is located within an extensive interface between the two major helices of the four-helix junction. pKas were calculated for the precursor and product structures (PDB ID 1m5k and 1m5v, each crystallized at pH 5), for which four structures were available. Calculations were not done on the 1m5o transition-state structure, since the presence of the vanadate ion made the partial charges of the transition state difficult to predict. Our calculations identified three nucleotides, A10, A22 and A38, whose pKas are predicted to be greater than 5.8 in the hairpin ribozyme when calculations were done under experimental salt conditions (1.0 M monovalent salt and 10 mM divalent salt).86 The calculated values are 6.6, 7.2 and 5.9, respectively (Figure 3(b)).

As can be seen in Figure 5, A38 is located at the interface of the two major helices near the active site. In its protonated state, A38 appears to form a hydrogen bond with the oxygen atom at the site of the catalytic reaction. Biochemical characterization of the hairpin ribozyme has shown that the replacement of the adenosine with an abasic residue reduces the rate of catalysis by five to six orders of magnitude.20 However, activity can be largely restored by supplying free adenine in solution.20 Furthermore, substituting adenine with nucleobases having a higher pKa, such as isoguanine (pKa=9.0), raises the apparent pKa of the reaction, suggesting that the nucleotide at position 38 is responsible for at least some of the pH-dependence observed in the reaction.20 Substitutions by other nucleotide analogs displayed equivalent pKa shifts. On the basis of this evidence, it has been suggested that A38 in the protonated state stabilizes the transition state of the hairpin ribozyme. The elevated pKa calculated for A38 is consistent with this idea.

A10 is also located in the interface between the two major helices near the active site. The elevated pKa of A10 is consistent with the sensitivity of the ribozyme activity to the solution pKa of nucleotide analogs substituted at A10.87,88 In particular, the decrease in activity when A10 is substituted with 8-aza-adenosine (n8A), whose solution pKa is 2.2, can be rescued by lowering the pH of the reaction, suggesting that the ionization of A10 influences catalytic activity directly.23 New crystal structures have indicated that ordered water molecules are near the active site of the hairpin ribozyme, and one of these is in direct contact with the N1 atom of A10.89 Disruption of the water network by perturbing the protonation state of A10 could explain the pH-dependent nucleotide analog interference pattern of n8A, further supporting the existence of an elevated pKa for A10. Our treatment of buried waters as a dielectric continuum is, of course, problematic, but accounting properly for the interactions of individual water molecules would require simulations that are beyond the scope of this work. Rather, our goals here are to identify nucleobases with shifted pKas and to understand how RNA structure is designed to effect these shifts.

Lastly, A22 also exhibits an elevated pKa in the hairpin ribozyme. However, there is no evidence at this point that A22 plays a specific role in catalysis.

Figure 3. pKas and electrostatic free energies in (a) hepatitis delta virus ribozyme and (b) the hairpin ribozyme. Nucleotides with significantly shifted pKas are labeled. These include C41 (calculated pKa=10.6) and C75 (9.6) in HDVR, and A10 (6.6), A22 (7.2) and A38 (5.9) in the hairpin ribozyme. Values less than 3.0 are not reported. Locations of nucleotides involved in Watson–Crick base-pairs are indicated as w. The red line indicates the solution pKa of cytidine=4.3.

Energetic contributions to pKa shifts in RNA

As has been discussed extensively for amino acids,34–42,60,62 a number of factors can result in the pKa shift of a nucleotide in RNA away from the value observed for the isolated nucleotide in solution. In the context of RNA, these include favorable interactions between negatively charged phosphate groups and the protonated form of the base, desolvation effects and intramolecular interactions with other bases. Structural features that stabilize the protonated state of the nucleotide shift pKas upward, whereas features that destabilize that state shift pKas downward. Favorable interactions of a protonated base with negatively charged phosphate groups (base–phosphate interactions) will always favor a shift to higher pKas. Desolvation effects resulting from the transfer of an ionizable nucleotide from the solvent into a buried location within an RNA molecule will favor lower pKas compared to those observed in solution, due to the loss of stabilizing interactions of the ionized species with the solvent. Lastly, intramolecular interactions between nucleobases (base–base interactions) through hydrogen bonds and other polar interactions can also shift pKas. The size and direction of this effect depend on the detailed structural environment of each ionizable group. Due to the lack of additivity of individual contributions within the NLPB, we cannot report specific contributions for each of these terms to pKa shifts. However, in the following sections we report individual contributions to electrostatic free energies that can be related to structural features of the RNA. This allows us to consider how RNA structure is used to produce shifted pKas.

Figure 4. Structure and organization of the HDV ribozyme. (a) Surface view and secondary structure schematic of HDVR: P1 (red), P1.1 (yellow), P2 (tan), P3 (green), and P4 (purple). The approximate location of the scissile bond at the junction of several secondary structure elements is indicated by an arrow. (b) The C75:N3 and G1:O5′ atoms within the active site are within hydrogen bonding distance (1cx0). (c) C41 and C75 are shown relative to the secondary structure elements in HDVR. Colors of nucleotides correspond to those depicted in (a). The J4/2 loop is shown in pale blue and the C+GCA motif is in magenta.

Role of solvation and hydrogen bonding in stabilizing pKa shifts: the branch-point helix

Although A6 and A7 are situated in very similar structural environments in the branch-point helix, the pKa of A7 is observed to be elevated, but the pKa of A6 is not (Table 1). In an attempt to understand the source of this difference, we have calculated a number of contributions to the electrostatic potential at both sites. As can be seen in Table 4, negatively charged phosphate groups contribute a strong negative electrostatic potential that stabilizes the protonated form of each base by ∼3.6 kcal/mol. However, desolvation opposes the phosphate contribution for A6 and A7. The effect is much smaller for A7, which is more exposed to solvent than A6. Indeed, much of the difference between the pKas of A6 and A7 can be attributed to solvent exposure. The ionized form of A7 is stabilized also by favorable interactions with other bases, primarily U16, which can form a hydrogen bond with the N1 atom of A7 via O2. Thus, the pKa of A7 is shifted to a higher value, due to the effects of the phosphate groups and interactions with other bases. In contrast, A6 has weaker interactions with other bases and the effect of the phosphate backbone is opposed by desolvation effects.
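The free energy terms discussed here can be placed on a pKa scale with the standard relation ΔpKa = −ΔG/(RT ln 10), where ΔG is the change in the free energy of protonation relative to the isolated nucleotide. The sketch below is a rough illustration, assuming a temperature of 298.15 K and naively summing the Table 4 terms for A6 and A7. Because the individual terms are not strictly additive within the NLPB, the totals should not be read as the calculated pKa shifts. The function name `pka_shift` and the dictionary layout are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_shift(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """pKa shift implied by a change in the free energy of protonation.

    A negative delta_g (protonated form stabilized) gives a positive shift.
    """
    return -delta_g_kcal / (R_KCAL * temperature_k * math.log(10.0))


# Table 4 terms in kcal/mol: (desolvation, base-base, base-phosphate).
table4 = {
    "A6": (+2.2, +0.4, -3.6),
    "A7": (+0.5, -1.3, -3.5),
}

for name, terms in table4.items():
    total = sum(terms)  # naive sum; the NLPB terms are not strictly additive
    print(f"{name}: total {total:+.1f} kcal/mol, "
          f"shift of about {pka_shift(total):+.1f} pKa units")
```

At the assumed temperature one pKa unit corresponds to roughly 1.36 kcal/mol, so the larger net stabilization of A7 translates into a visibly larger upward shift than for A6, in line with the qualitative argument in the text.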

Role of phosphate and base interactions in stabilizing pKa shifts in HDVR

To understand the role of phosphate and base groups in shifting pKas, we calculated individual electrostatic free energy terms for C41 and C75 in HDVR, and A10, A22 and A38 in the hairpin ribozyme, and compared these values to those calculated for all other nucleotides (Figure 3). The electrostatic terms vary for each nucleotide, but there are regions in which base–phosphate and/or base–base interactions stabilize the protonated form of the base due to a strongly negative electrostatic potential. In particular, nucleotides with highly positive pKa shifts are found in regions where the negative electrostatic potential due to phosphate groups is particularly large.

Different contributions stabilize the ionized forms of C41 and C75 in HDVR (Figure 3(a)). For C75, base–phosphate interactions dominate base–base interactions, whereas for C41 base–base interactions also contribute favorably. C75 experiences a high negative electrostatic potential induced by the specific arrangement of phosphate groups in HDVR. A unique feature of HDVR is the presence of a nested pseudoknot in the core of the structure. A reverse turn of the phosphodiester backbone at nucleotides C21 and C22 in this part of the structure results in a cup-like geometry of the phosphate groups such that they surround one surface of the catalytic C75 nucleotide (Figure 6). On the opposite surface, phosphate groups adjacent to the scissile bond also contribute to the negative potential. Finally, an S-turn in the backbone of the so-called J4/2 loop brings the A77 phosphate group within 7 Å of the catalytic core, which would not be achieved if the backbone did not contain a turn in this region. This unique convergence of geometries appears to be the source of the high electrostatic potential surrounding the active site nucleotide.

In contrast, C41 is located in a region of the RNA where the phosphate potential is not unusually negative (Figure 3(a)). Instead, the high pKa shift of C41 can be explained by the unusually strong energetic contributions from base–base interactions, predominantly with nucleotides within the C+GCA motif. As discussed in the following section, the favorable interactions that stabilize the protonated cytidine include those formed with the guanosine nucleotide, with which it shares two hydrogen bonds, and additional neighboring nucleobase interactions that are conserved across very different RNA structures.

Figure 5. Structure and organization of the hairpin ribozyme. (a) Secondary structure cartoon of the hairpin ribozyme and (b) the locations of A10, A22 and A38 within the ribozyme. The approximate location of the scissile bond within the interface (gray region) of the two major helices is indicated by the arrow. (c) The conformation of A38, A-1 and G+1 in the hairpin ribozyme active site (1m5o) showing A38:N1 within hydrogen bonding distance of O5′ in the scissile phosphate group.

Conservation of stabilizing interactions in the C+GCA motif

In order to understand the energetic interactions required to stabilize the protonated cytidine in the C+GCA motif, we compared the magnitude of individual electrostatic free energy terms at the N3 atom of the protonated cytidine within the three structures where it is found: HDVR and the BWYV and PEMV pseudoknots. As can be observed in Figure 2, the four nucleotides composing the C+GCA motif can be readily superimposed. In each case, the protonated cytidine (C41 in HDVR, C8 in BWYV, and C10 in PEMV) is involved in an unusual hydrogen bond with the major-groove (Hoogsteen) edge of the guanosine in the motif (G73 in HDVR, G12 in BWYV, and G28 in PEMV). As can be seen in the structure, this hydrogen bond interaction occurs via the keto oxygen of guanosine and stabilizes protonated cytidine. Consistent with its role in the structure, this interaction emerges as a major stabilizing feature, as shown in Table 4.

A second feature present in all three structures appears to stabilize the protonated cytidine. Specifically, a neighboring nucleotide stacked above (and/or below) the plane of the C+G base-pair contributes to the stability of the protonated cytidine. As can be seen in Table 4, G74 in HDVR, C14 in BWYV, and U9 in PEMV play this role. In each case, the individual free energy terms due to neighboring nucleobase interactions are at least 1.4 kcal/mol in magnitude. To understand how three apparently different nucleotides could serve the same role in stabilizing this structure, we compared the structures of the three nucleotides near the C+GCA motif (Figure 7). In Figure 7, it is clear that the G74, C14 and U9 nucleotides all contribute a keto oxygen atom to a position within 4 Å of the proton of the ionized cytidine, stacked above or below the plane of the C+G base-pair. This oxygen atom is not involved in a hydrogen bond with the proton but rather it is arranged so as to optimize local electrostatic interactions in the neighborhood of the cytidine proton.

Table 4. Electrostatic contributions due to changes in protonation state in the branch-point helix and C+GCA motif

Structure/nucleotide   Desolvation    Source of       Base–base interaction   Base–phosphate interaction
                       free energy    contribution    free energy             free energy
Branch-point helix
  A6                   +2.2           All NTs         +0.4                    −3.6
  A7                   +0.5           All NTs         −1.3                    −3.5
C+GCA motifs
 HDVR
  C41                  −0.8           G73             −4.6                    −0.5
                                      G74             −1.4                    −0.3
 BWYV pseudoknot
  C8                   −0.1           G12             −5.4                    −0.7
                                      C14             −3.7                    −0.5
 PEMV pseudoknot
  C10                  −2.3           U9              −1.8                    −0.4
                                      G28             −2.6                    −0.4

Electrostatic contributions from specific sources were calculated as discussed in Methods and are reported in units of kcal/mol. Rows labeled All NTs signify the base–base and base–phosphate contributions due to all nucleotides. See Results for details.

Figure 7. Hydrogen bonds and neighboring nucleobase interactions stabilize the protonated cytidine in the C+GCA motif. Distances are given for neighboring nucleobase interactions between oxygen and cytidine N3. See the text for details.

Figure 6. Phosphate and other oxygen atoms can stabilize protonated nucleotides. (a) Structure of the phosphodiester backbone near C75 (purple and yellow surface, purple and blue trace). A cluster of phosphate ions (yellow) from the P1.1 pseudoknot helix, the substrate strand, and the J4/2 loop stabilizes the pKa shift of C75. The O5′ atom in the scissile phosphate group is labeled. (b) The conformation of G29 is consistent with the formation of an O2′ hydrogen bond with A78.


Most interesting is the case of C14 in the BWYV pseudoknot. This pseudoknot is characterized by a highly unusual backbone conformation in the vicinity of nucleotides 13 and 14 (A.M.P. and L. Wadley, unpublished results).90 For example, U13 is C2′-endo and the backbone is described by η-θ values that fall outside any of the typical regions for RNA (η, 62.3; θ, 42.2, for conformation B). The G12-C14 base-step is characterized by a much greater than usual helical twist of nearly 90° (Figure 7). This overtwisted conformation is accommodated by the RNA backbone through the unpaired and outwardly-flipped U13 base. From these observations and the calculated energy profiles, the structurally-conserved keto oxygen atom described above appears to be an important feature of the C+GCA motif, which, to our knowledge, has not been characterized.

Stabilization of pKa shifts in the hairpin ribozyme

To understand the structural features that stabilize the pKa shift of A10, A22 and A38 in the hairpin ribozyme, free energy contributions that affect protonation were calculated for these nucleotides. Elevated pKas generally coincide with the regions of the RNA where the negative electrostatic potential due to phosphate groups is higher than in the surrounding structure. Specifically, nucleotides 9–10, 20–27, and 38–44 (Figure 3(b)) experience a significant negative electrostatic potential. Within each of these nucleotide ranges, at least one nucleotide is calculated to have a pKa shift > 2 pKa units from its solution value. These nucleotides coincide with the center of the dense interface between the two major helical elements of the hairpin ribozyme that consists of four phosphodiester backbone segments arranged in close proximity (Figure 8). It is likely that the close packing of this interface evolved in such a way as to bring together the four backbone segments, resulting in the creation of a region at the center of the helical interface, and the catalytic core, with a particularly high negative electrostatic potential.

In some cases, nucleotides may interact strongly with phosphate groups but are not calculated to have a large pKa shift (e.g. A77 and A78 in HDVR; Figure 3(a)). In these cases, the protonation of one nucleotide with a higher pKa, such as C75, reduces the pKa shifts of nearby nucleotides that interact with it.
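The coupling between neighboring sites can be illustrated with a toy two-site model. The sketch below is not the MCCE procedure used in this work: it enumerates the four protonation microstates of two sites exactly, with hypothetical intrinsic pKa values (9.0 and 7.0) and a hypothetical repulsion of 3 kT between the two protonated bases, and then finds by bisection the pH at which each site is half protonated. All names and numbers are assumptions chosen for illustration.

```python
import itertools
import math

LN10 = math.log(10.0)


def mean_protonation(ph, intrinsic_pkas, coupling_kt):
    """Boltzmann-averaged protonation of each site over all 2**N microstates.

    Microstate free energy in kT, relative to the fully deprotonated state:
    sum_i x_i * ln(10) * (pH - pKa_i)  +  sum_{i<j} x_i * x_j * W_ij,
    where coupling_kt[i][j] = W_ij applies when both sites are protonated.
    """
    n = len(intrinsic_pkas)
    partition = 0.0
    totals = [0.0] * n
    for state in itertools.product((0, 1), repeat=n):
        g = sum(x * LN10 * (ph - pk) for x, pk in zip(state, intrinsic_pkas))
        g += sum(coupling_kt[i][j]
                 for i in range(n) for j in range(i + 1, n)
                 if state[i] and state[j])
        weight = math.exp(-g)
        partition += weight
        for i, x in enumerate(state):
            totals[i] += x * weight
    return [t / partition for t in totals]


def apparent_pka(site, intrinsic_pkas, coupling_kt, lo=-5.0, hi=20.0):
    """pH at which the mean protonation of `site` equals 0.5 (bisection)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mean_protonation(mid, intrinsic_pkas, coupling_kt)[site] > 0.5:
            lo = mid  # still mostly protonated: the midpoint lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)


pkas = [9.0, 7.0]                      # hypothetical intrinsic pKa values
no_coupling = [[0.0, 0.0], [0.0, 0.0]]
repulsion = [[0.0, 3.0], [3.0, 0.0]]   # hypothetical 3 kT repulsion

for label, w in (("uncoupled", no_coupling), ("coupled", repulsion)):
    values = [apparent_pka(i, pkas, w) for i in range(len(pkas))]
    print(label, [round(v, 2) for v in values])
```

In this toy model the site with the higher intrinsic pKa is nearly unaffected, whereas the apparent pKa of the other site drops by about W/(kT ln 10), or roughly 1.3 units for 3 kT, because its neighbor is already protonated over the relevant pH range.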

Discussion

Calculating pKas in RNA

Conformational relaxation

Here, we report a treatment of the factors that produce pKa shifts in RNA structures. On the basis of comparisons to experimental results where the pKas have been determined directly, our approach appears to be effective in predicting the pKas of nucleotides. Most significantly, it is successful in identifying bases with shifted pKas and, in each case, offers a structural interpretation for the shifts. In systems where pKas have not been measured directly, the identification of pKa shifts in this study is supported by multiple lines of biochemical evidence, such as the pH-dependence of catalysis or unfolding. In many cases, the calculated pKas can be interpreted meaningfully and agree quantitatively with experiment, but in a few other cases, such as the protonated C in C+GCA and the catalytic C in HDVR, the calculated values appear to be more elevated than the best estimates now available from experiment.

As we have discussed, the discrepancies are likely due to the assumption that the RNA structure does not change with change of the ionization state. Using an internal dielectric constant of 4 accounts for some minor conformational relaxation throughout the RNA associated with nucleotide ionization,91 but clearly this does not account for major changes that could occur if, say, a nucleotide was stacked into a helix in one conformation, but flipped out of the helix in another. This, for example, has been shown to occur in the U6 RNA intramolecular stem–loop (ISL).92 In such cases, conformational changes must be treated explicitly. Although assuming a rigid molecule is clearly an oversimplification, it is of considerable interest to explore pKa predictions based on the experimental structure alone without introducing uncertainties arising from a treatment of conformational relaxation in RNA. Indeed, the good agreement between the calculated and experimental results obtained here for the branch-point helix and the lead-dependent ribozyme suggests that base protonation does not induce large conformational changes in these two structures, or at least that such changes are not large enough to have significant effects on pKa.

Figure 8. Phosphodiester backbones of the two major helices stabilize the pKa shift of A38 at the interface of the helices. The scissile phosphate group (arrow) is colored orange (phosphorus) and red (oxygen).

Nonlinear effects

A major complication involved in calculating pKas in a highly charged molecule arises from the nonlinear response to salt concentration. For proteins, where nonlinear effects are not generally thought to be important, the electrostatic interaction between two sites is independent of the ionization state of the other sites. Thus, all interactions are additive and need to be calculated only once. In contrast, when the nonlinear PB equation is used, the interaction of each pair of sites depends on the charge states of other sites, since these, in turn, affect the screening by salt of the pairwise interaction in question. Accounting for this effect exactly is computationally expensive, and to do so would require a separate PB calculation for each of the 2^N ionization states of the macromolecule. An approach to this problem was considered by Vorobjev et al. for polylysine helical peptides.59 They used a screening factor for each pairwise interaction, which increased with the electrostatic potential that characterized each interaction (and hence, the net charge of the molecule). Our method differs by attempting to calculate a nonlinear correction as a single term correlated with the total net charge on the RNA. Overall, this is a simpler approach for treating nonlinearity that nonetheless appears to achieve good accuracy compared to experiment.
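The scaling argument in this paragraph can be made concrete with a few lines of Python. The sketch simply counts, for N titratable sites, the distinct pairwise interaction terms that suffice when interactions are additive, and the ionization states that an exact nonlinear treatment would have to visit. The function names and the example values of N are illustrative assumptions.

```python
def pairwise_terms(n_sites: int) -> int:
    """Distinct site-site interaction terms when interactions are additive."""
    return n_sites * (n_sites - 1) // 2


def ionization_states(n_sites: int) -> int:
    """Protonation microstates of n two-state sites (2**N)."""
    return 2 ** n_sites


for n in (5, 10, 20, 40):
    print(f"N = {n:2d}: {pairwise_terms(n):4d} pairwise terms, "
          f"{ionization_states(n):.3e} ionization states")
```

The pairwise count grows quadratically while the number of ionization states grows exponentially, which is why an exact state-by-state nonlinear calculation quickly becomes impractical and an approximate correction is attractive.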

Divalent ions

The appropriate treatment of divalent ions is of considerable importance, given their significant electrostatic contributions to folding, stability and ligand binding within RNA.93,94 Here, divalent ions have been treated using the same formalism governing the interaction of monovalent ions with RNA. Thus, site-bound ions are not treated explicitly; rather, all divalent ions are assumed to be bound diffusely and treated directly by application of the NLPB.50 The effects of site-bound ions are potentially of particular importance to the calculation of pKas in the active conformation of the HDVR structure, where it has been reported that electron density, interpreted as the presence of a site-bound hydrated metal ion (e.g. Mg2+(H2O)6), is observed in the crystal structure 4.3 Å away from the titration site of the catalytic nucleotide (PDB 1sj3).69 However, we have used only the structures of HDVR in which Mg2+ was not observed in the active site; namely, the product conformations (PDB 1cx0 and 1drz) and one that had been made inactive by the removal of Mg2+ from the solution conditions of the crystal structure (PDB 1vc5). A second factor justifying our treatment of divalent ions is that HDVR is catalytically active even in the presence of only monovalent salt.14,80 Thus, at the very least, the calculated pKa shift of the active site C75 is relevant to understanding the nature of catalysis in the absence of magnesium.

On the structural origins of pKa shifts in RNA

The role of phosphate groups in inducing pKa shifts

How do structural elements in ribozymes elevate the pKas of nucleotides near their active sites? We propose that the elevated pKas of several nucleotides are a consequence of the architecture, or “fold”, of the RNA, in which the local abundance of phosphate groups helps to elevate pKas. In HDVR, for instance, the two major helical axes of the ribozyme form a Y-shaped intersection that converges near the active site (Figure 4(c)). As a consequence, the active site cytidine, C75, is brought together with the phosphate groups adjacent to the scissile bond, as well as those involved with forming the central, P1.1 pseudoknot helix (nucleotides 21–22; Figure 6). Such a compact arrangement of phosphate groups, along with the curvature of the molecular surface, can focus strong electrostatic potentials in the active site region. Using calculations of the surface potential with the nonlinear PB equation, Bevilacqua and colleagues have shown that this is indeed calculated to happen for this ribozyme.77,95 In addition, we now show that the highly negative potential leads to an elevated pKa calculated for the C75 nucleotide. Notably, we also observed that the electrostatic environment surrounding the proximal nucleotides, A77 and A78 (Figure 3(a)), favors elevated pKas, although it remains to be shown whether this has any functional significance for the ribozyme.


A similar convergence of phosphate groups can be observed near the active site of the hairpin ribozyme (Figure 5). The high electrostatic potential surrounding the active site may enable A38 to protonate and form a functionally important hydrogen bond in the transition state of the transesterification reaction. In this case, the local abundance of phosphate groups near the active site is the consequence of the crossing of the two major helical axes through the center of the molecule (Figure 8). Much as in HDVR, the formation of the two-helix interface buries phosphate groups from both the substrate and ribozyme strands. In total, 19 nucleotides are buried by greater than 70% of their surface areas, and these occur mainly in the interface. It seems likely that there will be further instances in other RNAs where unusually high densities of phosphate groups or buried phosphate groups are used to shift the pKas of functionally important nucleotides.

The role of base–base interactions in inducing pKa shifts

Because of the importance of base–base interactions for defining structures in RNA, a number of attempts have been made to catalogue the hydrogen bonding patterns that are possible between nucleobases.2,25,96–99 Through manual and automatic means, these efforts have identified eight distinct patterns of base-pairing that involve at least one protonated nucleotide: C+C (cis and trans), A+C (cis), A+G (Hoogsteen and reverse Hoogsteen), C+G (cis, Hoogsteen and reverse Hoogsteen), where cis and trans refer to the relative orientations along the glycosidic bonds. An underlying assumption for constructing most base-pair compendiums has been to limit them to coplanar base-pairs involved in at least two hydrogen bonds. Although these simplifications have been useful for enumerating the most likely configurations for protonated nucleotides, clearly these heuristics may miss the identification of nucleotides that are protonated if the hydrogen bond acceptor is a phosphate oxygen atom or 2′ OH. Indeed, the pKa of adenosine or cytidine is particularly ambiguous in the absence of energy calculations when the hydrogen bonding partner is a 2′ OH because it is possible for the 2′ oxygen atom to act as either a hydrogen bond donor or acceptor. An example of this can be observed in the catalytic core of HDVR for A78 (Figure 6(b)).

Among the set of base-pairs containing protonated nucleotides, one of the more commonly observed seems to be the A+C base-pair.4,92,100,101 In our current study, the A+C pair has appeared in the lead-dependent ribozyme where we have obtained a calculated pKa for the pair close to 6.5. Does this base-pair have any function other than to stabilize the RNA under acidic conditions? Others have noted that the A+C interaction is isosteric to GU wobble pairs,96,100 in that the C1′ atoms and glycosidic bonds of the A+C pair and those of the GU pair are equally distant and in the same relative orientation. However, unlike the GU wobble pair, the A+C pair forms only when the pH is sufficiently low for adenosine to protonate. Thus, unlike the GU wobble, A+C pairs can act as a pH-sensitive conformational switch, such as the one that appears to occur near the cleavage site in the Varkud satellite ribozyme.101 In this system, deprotonation is coupled to a conformational change in the cleavage site stem–loop. A significant conformational change upon a shift in pH is observed also for the U6 ISL.92 It is possible that pH-dependent base-pairs like A+C may be conserved in an RNA where sensitivity to its pH environment may be important to its function.

Conserved hydrogen bonds appear to be the main source of stability of the C+GCA motif. However, more subtle structural features such as the structural and electrostatic effects of neighboring nucleobases may also be involved. It is known that duplexes having the same composition of base-pairs but in different permutations have different energies of duplex formation.102,103 Indeed, there is experimental evidence that nearest neighbors influence the pKas of adjacent nucleotides directly.104 Differences arise from the different stabilities introduced by the juxtaposition of different interactions between base-pairs due to the different permutations. In the case of the C+GCA motif, the structure appears to preferentially adopt a conformation where adjacent keto oxygen atoms are positioned to stabilize the protonated cytidine. Future work may involve performing nucleotide sequence alignment to discover whether this preference is more widely conserved.

On the basis of our calculations, several nucleotides have been identified in the catalytic cores of ribozymes to have elevated pKas. Notably, the predicted nucleotides coincide almost precisely with nucleotides that have been shown to have catalytic roles. Moreover, many of these nucleotides that we have predicted to have anomalous pKas correspond to groups that have been suggested to be protonated on the basis of the pH-dependence of the catalytic rates of reaction. Each of the calculated results can generally be understood in structural terms. In the case of HDVR, C75 is thought to act as a general acid or base near the 2′ OH nucleophile of the precursor HDVR. The proximity of the C75 base to the 5′ oxygen terminus (Figure 4(b)) has suggested the possibility that, under certain conditions, the protonated form of C75 could be stable and act as a possible general acid or base, even though there is no structure of the native sequence clearly showing this interaction. In the hairpin ribozyme, the protonated form of A38 appears to form a hydrogen bond with a central phosphate oxygen atom within the trapped transition-state mimic (Figure 5(c)). Since the hairpin ribozyme displays no specificity for metal ions and none is observed to bind near the active site, nucleotides alone may be responsible for catalysis.105,106 Indeed, the same may be true for the HDV ribozyme.80 Such findings deepen our appreciation of the nature of RNA catalysis and the versatility of ribonucleotides. Moreover, the theoretical and computational methodology developed in this work offers the possibility of understanding the structural origins of pKa shifts of nucleotides that play functional roles and, in addition, of using structural information to identify these nucleotides when direct experimental measurements are not available.

Methods

pKa calculations based on the Poisson–Boltzmann equation have been widely used to study proteins,34–42,60,62 and, more recently, DNA.43 Here, we review the underlying theory in order to discuss its application in the context of RNA. We used a modified version of the program MCCE,60–62 which uses a distance-dependent pairwise energy softening function107 to help prevent large electrostatic energies from dominating the pKa calculations. This can occur if ionizable groups approach each other too closely due to small errors in the crystal structure or when dielectric screening is not accounted for completely. A unique feature of MCCE is its ability to account for conformational changes between protonated and unprotonated states of the titratable group by sampling over multiple conformations. However, we have used a simplified version where titratable nucleotides are held rigid and can exist only in one of two states, protonated and unprotonated, where the charge of the nucleotide is 0e or −1e, respectively.

Theory of multi-site titration in nucleic acids

The theory of multi-site titration for polymers in solution was developed previously,37,41,59,60 and is described here as applied to nucleic acids. Given a nucleic acid with $N$ nucleotides that might be protonated, we can compute the titration curve of the $i$th nucleotide by finding its average degree of protonation, $\langle x_i \rangle$, as a function of pH (i.e. $x_i = +1$ if protonated, otherwise $x_i = 0$). For the purposes of this study, we consider adenosine and cytidine nucleotides as capable of protonation, specifically on their N1 or N3 imino nitrogen atom, respectively, although the same representation can be used for additional types of nucleotides.

We represent the protonation microstate, $m$, of the nucleic acid by the vector $\mathbf{x}$ with $N$ elements, which describes the titration states of each nucleotide in the molecule for that microstate. A free energy, $\Delta G_m$, is associated with each microstate. There are $M = 2^N$ such microstates. The average degree of protonation of the $i$th nucleotide can be found by taking the Boltzmann-weighted average:

$$
\langle x_i \rangle = \frac{\sum_{m}^{M} x_i(m)\,\exp\left(-\Delta G_m / k_B T\right)}{\sum_{m}^{M} \exp\left(-\Delta G_m / k_B T\right)} \tag{1}
$$

over the set of possible microstates. In a nucleic acid with $N$ titratable nucleotides, the complete Boltzmann-weighted average requires the computation of $2^N$ terms. In practice, this is avoided by using a Monte Carlo (MC) procedure to estimate the frequency of low-energy microstates, which will dominate the partition function. These are used to calculate the titration curves of each nucleotide using the microstate free energy described by equation (2). The pKa of the $i$th nucleotide is obtained by finding the pH at which $\langle x_i \rangle$ is equal to 0.5 using the multi-conformational continuum electrostatics (MCCE) procedure.60 MCCE was designed to account for local conformational changes around an ionizable group but this feature of the program has not been developed for nucleic acids. For this reason, we have kept the RNA structure rigid and the term multiconformational is not appropriate for the current application. Thus, we have kept the title of the program we used but have turned off one of its features. However, the MCCE program offers a well-tested MC approach to using continuum electrostatics in the calculation of pKas and it has thus provided a particularly useful vehicle in the current study.

In MCCE, $\Delta G_m^{\mathrm{LPB}}$ is obtained from solutions of the linear PB equation (LPB) and is written as:

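$$
\Delta G_m^{\mathrm{LPB}} = \sum_{i}^{N} x_i \cdot \left[ 2.3\,k_B T \left( \mathrm{pH} - \mathrm{p}K_a^{\mathrm{ref}}(i) \right) + \Delta G^{\mathrm{self}}(i) + \Delta G^{\mathrm{fixed}}(i) \right] + \frac{1}{2} \sum_{i}^{N} \sum_{j \neq i}^{N} x_i \cdot x_j \cdot \left[ \Delta G^{\mathrm{pair}}(i,j) + \Delta G^{\mathrm{vdW}}(i,j) \right] \tag{2}
$$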

where the reference pKa, $\mathrm{p}K_a^{\mathrm{ref}}(i)$, is the pKa of the $i$th titratable nucleotide in the hypothetical unfolded state of the nucleic acid (Figure 9). As a simplification, this value is taken to be the same as the solution pKa of an isolated nucleotide monophosphate and is quoted from experimental measurement as 3.8 for 5′-AMP at 25 °C in 0.1 M KNO3 and 4.3 for 5′-CMP at 25 °C in 0.1 M KCl.1 The precision of these measurements is expected to be ±0.2–0.4 pKa unit, given possible differences in temperature and salt concentration between the reference state and the experimental conditions of the RNA structures used here.

The additional free energy terms in equation (2) are responsible for pKa shifts relative to the solution value. The self free energy, $\Delta G^{\mathrm{self}}(i)$, is the desolvation cost of protonating nucleotide $i$ in the folded state compared to the unfolded state. $\Delta G^{\mathrm{fixed}}(i)$ gives the change in the free energy of solvent-screened coulombic, or pairwise, interactions between the charges in a protonated or unprotonated nucleotide and fixed charges in the RNA (i.e. due to guanosine and uridine nucleotides). In the MCCE method, $\Delta G^{\mathrm{fixed}}(i)$ includes any change in free energy of van der Waals (vdW) interactions upon protonation. The final energy terms, $\Delta G^{\mathrm{pair}}(i,j)$ and $\Delta G^{\mathrm{vdW}}(i,j)$, give the free energies of pairwise interaction and the van der Waals interaction, respectively, between the $i$th and $j$th titratable nucleotides. $k_B$ is Boltzmann's constant and $T$ is the temperature of the system. At $T$ = 25 °C, $k_B T$ is taken to be 0.6 kcal/mol.

Figure 9. Thermodynamic cycle considered for a pKa calculation of a single nucleotide for simplicity. See Methods for details.

Standard equations for computing the desolvation free energy of a nucleotide and free energy of interaction between the nucleotide and other partial charges (e.g. due to other fixed or titratable nucleotides) from electrostatic potential are given by:108

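$$
G^{\mathrm{self}} = \frac{1}{2} \sum_{n}^{\mathrm{atoms}} q_n\,\phi_n^{\text{rxn-field}} \tag{3a}
$$

and

$$
G^{\mathrm{fixed|pair}} = \sum_{n}^{\mathrm{atoms}} q_n\,\phi_n^{\mathrm{chg}(m)} \tag{3b}
$$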

where the summations run over the atoms of the nucleotide and $q_n$ is the partial charge of the $n$th atom in the nucleotide. $\phi_n^{\text{rxn-field}}$ is the reaction field potential at the position of atom $n$ induced by solvation effects, and $\phi_n^{\mathrm{chg}(m)}$ is the site potential at the coordinates of atom $n$ induced by partial charges in the set of atoms $m$ with all other partial charges set to zero. Hence, the electrostatic free energy terms in equation (2) can be expressed as:

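$$
\Delta G^{\mathrm{self}}(i) = \left[ \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}}\,\phi_n^{\text{rxn-field}} - \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}}\,\phi_n^{\text{rxn-field}} \right]_{\mathrm{RNA}} - \left[ \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}}\,\phi_n^{\text{rxn-field}} - \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}}\,\phi_n^{\text{rxn-field}} \right]_{\mathrm{solution}} \tag{4a}
$$

and

$$
\Delta G^{\mathrm{fixed|pair}}(i,j) = \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}}\,\phi_n^{\{\mathrm{fixed|pair}\}(j)} - \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}}\,\phi_n^{\{\mathrm{fixed|pair}\}(j)} \tag{4b}
$$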

where $q_n^{\mathrm{prot}}$ and $q_n^{\mathrm{unpr}}$ refer to the partial charges for the protonated and unprotonated forms of the nucleotide $i$, and $\phi_n^{\{\mathrm{fixed|pair}\}(j)}$ refers to potentials computed under the appropriate set of atoms in nucleotide $j$.

Since free energies within the LPB are additive, all of the terms in equation (2) need be calculated only once for a particular macromolecule. Thus, the PB equation does not need to be solved during every step of the MC procedure. However, as discussed below, additivity is lost if the NLPB is used and every term in the equation depends on the microstate involved. This would require that the NLPB be solved for every step in an MC procedure, which is not computationally feasible when many nucleotides are involved. In the next section, we describe our use of the LPB and NLPB equations. The section that follows introduces a method that allows us to calculate approximate nonlinear microstate energies that can be used in the context of the MC procedure.
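To make the role of the precomputed, additive terms concrete, the following is a minimal Python sketch of equations (1) and (2); it is not the MCCE implementation. It assumes that the site terms and pairwise terms have already been obtained once from LPB calculations, and all function names, argument names and numerical inputs are hypothetical. `dg_site[i]` stands for the sum of the self and fixed terms of site i, and `dg_pair[i][j]` for the sum of the pair and vdW terms. Brute-force enumeration is feasible only for small N; the Metropolis sampler illustrates the kind of MC estimate used when 2^N is too large, and the bisection assumes that the titration curve crosses 0.5 only once.

```python
import itertools
import math
import random

KBT = 0.6    # kcal/mol at 25 °C, the value quoted in the text
LN10 = 2.3   # the text uses 2.3 for ln(10)


def microstate_energy(x, pH, pk_ref, dg_site, dg_pair):
    """Equation (2) for one protonation vector x (sequence of 0/1).

    dg_site[i] stands for dG_self(i) + dG_fixed(i) and dg_pair[i][j] for
    dG_pair(i, j) + dG_vdW(i, j); both are precomputed once (kcal/mol).
    """
    n = len(x)
    energy = sum(
        x[i] * (LN10 * KBT * (pH - pk_ref[i]) + dg_site[i]) for i in range(n)
    )
    energy += 0.5 * sum(
        x[i] * x[j] * dg_pair[i][j]
        for i in range(n) for j in range(n) if j != i
    )
    return energy


def exact_protonation(pH, pk_ref, dg_site, dg_pair):
    """Equation (1) by brute-force enumeration of all 2**N microstates."""
    n = len(pk_ref)
    states = list(itertools.product((0, 1), repeat=n))
    energies = [microstate_energy(s, pH, pk_ref, dg_site, dg_pair) for s in states]
    e_min = min(energies)  # shift energies so the exponentials cannot overflow
    weights = [math.exp(-(e - e_min) / KBT) for e in energies]
    total = sum(weights)
    return [sum(w * s[i] for w, s in zip(weights, states)) / total for i in range(n)]


def mc_protonation(pH, pk_ref, dg_site, dg_pair, steps=20000, seed=0):
    """Metropolis estimate of <x_i> using single-site protonation flips."""
    rng = random.Random(seed)
    n = len(pk_ref)
    x = [0] * n
    energy = microstate_energy(x, pH, pk_ref, dg_site, dg_pair)
    counts = [0] * n
    for _ in range(steps):
        i = rng.randrange(n)
        x[i] = 1 - x[i]                      # propose flipping site i
        trial = microstate_energy(x, pH, pk_ref, dg_site, dg_pair)
        if trial <= energy or rng.random() < math.exp(-(trial - energy) / KBT):
            energy = trial                   # accept the flip
        else:
            x[i] = 1 - x[i]                  # reject: restore the previous state
        for k in range(n):
            counts[k] += x[k]
    return [c / steps for c in counts]


def pka_of_site(i, pk_ref, dg_site, dg_pair, lo=-10.0, hi=24.0, tol=1e-6):
    """pH at which <x_i> = 0.5, found by bisection (assumes a single crossing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exact_protonation(mid, pk_ref, dg_site, dg_pair)[i] > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a single isolated site the sketch reduces to pKa = pKa(ref) − dg_site/(2.3 k_BT), so a stabilization of −1.38 kcal/mol raises the pKa by about one unit.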

Linear and nonlinear Poisson–Boltzmann equations

Electrostatic site potentials and reaction field potentials are obtained from finite difference solutions to the Poisson–Boltzmann equation:109

$$
\nabla \cdot \varepsilon(\mathbf{r})\,\nabla \phi(\mathbf{r}) + \frac{4\pi e}{k_B T}\,\rho^{f}(\mathbf{r}) + F(\phi) = 0 \tag{5}
$$

where $\phi(\mathbf{r})$ denotes the electrostatic potential, $\rho^{f}(\mathbf{r})$ denotes the distribution of partial atomic charges and $\varepsilon(\mathbf{r})$ is the value of the dielectric constant for any point in space.54 $F(\phi)$ has the general form:

$$
F(\phi) \equiv \frac{4\pi}{k_B T} \sum_{i} c_i^{b}\, z_i \exp\left( -z_i\,\phi(\mathbf{r}) \right) \tag{6}
$$

where the sum is taken over all mobile ion species, and $c_i^{b}$ and $z_i$ are the bulk concentration and electrical charge of each species. Where only monovalent salt appears in the solvent, $F(\phi)$ is rewritten as $-\varepsilon_0 \kappa^2 \sinh(\phi(\mathbf{r}))$, where $\kappa^2$ is $8\pi e^2 I / \varepsilon_0 k_B T$ and $I$ is the ionic strength. When potentials are small, $\sinh(\phi(\mathbf{r}))$ can be approximated simply as $\phi(\mathbf{r})$ and $F(\phi)$ is simplified to:

$$
F(\phi) \equiv -\varepsilon_0 \kappa^2 \phi(\mathbf{r}) \tag{7}
$$

This form for $F(\phi)$ yields the linear PB equation and has the important property that energetic contributions derived from it are linearly additive. Thus, the linear PB equation can be used to break up larger calculations into individual contributions to the electrostatic free energy, which can then be summed to yield total values, as described by equation (2). However, the drawback is that the linear approximation is valid only for molecules where the net charge is small and ions of different valence are all incorporated into a single ionic strength parameter $I$. RNA, however, bears a −1e charge for every unprotonated nucleotide in its structure, and electrostatic potentials can become very high for even moderately sized molecules.

Water is assigned a value of $\varepsilon$ = 80 and a lower dielectric constant is generally used to represent the solute; the solvent-accessible molecular surface represents the boundary between these two dielectric regions. As discussed in previous work, a value of $\varepsilon$ = 1 represents a solute with no electronic polarizability (an implicit assumption in most all-atom simulations).91 The value of 2 has been shown to account well for electronic polarizability in a static structure, whereas larger values such as 4 account in small part for conformational changes in the molecule that accompany changes in ionization state.91 Since our model for RNA keeps the nucleobase and backbone rigid, a value of 4 is consistent with work done in proteins, and was adopted for this work.41,60 The dielectric constant inside the molecular surface of the RNA is assigned this value. The ionic strength is assigned a value of zero at every point in the finite difference lattice that is inside this surface and within an ion-excluded region that extends 2 Å from the surface.
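As a rough illustration of why the linearization breaks down for highly charged molecules, the short Python sketch below compares sinh(φ) with φ for a dimensionless potential φ (in units of k_BT/e, about 25.7 mV at 25 °C). The helper names are hypothetical, and the Debye length uses the common textbook approximation of 3.04/√I Å for a 1:1 salt in water at 25 °C, which is an assumption introduced here rather than a value taken from this work.

```python
import math

KT_OVER_E_MV = 25.7  # thermal voltage k_B*T/e at 25 °C, in millivolts


def linearization_error(phi):
    """Fractional error made by replacing sinh(phi) with phi (phi != 0)."""
    return (math.sinh(phi) - phi) / math.sinh(phi)


def debye_length_angstrom(ionic_strength_molar):
    """Approximate 1/kappa for a 1:1 salt in water at 25 °C (textbook estimate)."""
    return 3.04 / math.sqrt(ionic_strength_molar)


# Print how quickly the linear approximation degrades as the potential grows.
for phi in (0.1, 0.5, 1.0, 3.0):
    millivolts = phi * KT_OVER_E_MV
    percent = 100.0 * linearization_error(phi)
    print(f"phi = {phi:3.1f} kT/e ({millivolts:5.1f} mV): error {percent:5.1f}%")
```

The error is negligible when the potential is a small fraction of k_BT/e but dominates once it reaches a few k_BT/e, which is the regime described above for RNA.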

The nonlinear correction to microstate energy

When the NLPB is used, as is appropriate for highly charged molecules, the additive property of the linear equation is no longer valid. This is because the concentration of salt around the RNA depends on the charge state of each nucleotide; for example, there are clearly more positively charged counterions around the RNA when nucleotides are all negatively charged than when they are neutral. Thus, pairwise interactions between any two nucleotides depend on the ionization state of all other nucleotides. This leads to a major combinatorial problem that cannot be addressed without some type of approximation. In earlier studies, this was addressed by introducing a correction factor for each pair to account for the nonlinearity.59 However, we choose the simpler assumption of introducing a correction factor for each charged state of the RNA, which is less expensive to calculate.

Our approach is to use the LPB equation to obtain pairwise energies that do not depend on the charge state of other nucleotides and to correct these linear energies based on the net charge of the RNA. The difference in electrostatic free energy obtained from the NLPB and LPB is defined here as $\Delta G^{\mathrm{corr}}$, where the superscript corr denotes a correction term. Thus:

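$$
\Delta G^{\mathrm{corr}} = \Delta G^{\mathrm{NLPB}} - \Delta G^{\mathrm{LPB}} \tag{8}
$$

where $\Delta G^{\mathrm{NLPB}}$ and $\Delta G^{\mathrm{LPB}}$ are the electrostatic free energies computed using the nonlinear and linear PB equations, respectively.54 We assume that $\Delta G^{\mathrm{corr}}$ can be approximated with a function that has a quadratic dependence on net charge. Specifically:

$$
\Delta G_m^{\mathrm{corr}} = a Z_m^2 + b Z_m + c \tag{9}
$$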

where $a$, $b$ and $c$ are coefficients that are appropriate for a particular conformation of a given macromolecule. $Z_m$ represents the number of nucleotides protonated in microstate $m$, and can be written as:

$$
Z_m = \sum_{i}^{N} x_i(m) \tag{10}
$$

where the sum is over all nucleotides in a particular microstate. We determine values for $a$, $b$ and $c$ for each molecule by running LPB and NLPB calculations on three different microstates, each producing a particular net charge, and in this way we are able to plot $\Delta G^{\mathrm{corr}}$ as a function of $Z$. Fitting these points to the polynomial of equation (9) yields values of $a$, $b$ and $c$. (A plot of the nonlinear correction energy for the RNAs studied here appears in Supplementary Data Figure 1.)

We now define the approximate free energy of a microstate, $\Delta G_m^{\mathrm{NLPB(apprx)}}$, as:

$$
\Delta G_m^{\mathrm{NLPB(apprx)}} = \Delta G_m^{\mathrm{LPB}} + \Delta G_m^{\mathrm{corr}} \tag{11}
$$

$\Delta G_m^{\mathrm{NLPB(apprx)}}$ is used in our MC procedure instead of $\Delta G_m^{\mathrm{LPB}}$. Note that the free energy defined by $\Delta G_m^{\mathrm{NLPB(apprx)}}$ is additive, but equation (11) accounts for nonlinear effects in an approximate way. The use of $\Delta G_m^{\mathrm{NLPB(apprx)}}$ yields more accurate agreement between computed and experimental pKas than $\Delta G_m^{\mathrm{LPB}}$ alone (Supplementary Data Figure 2). The difference is often on the order of +1–2 pKa units. All pKa calculations reported here were performed using the nonlinear correction, except where noted otherwise. A separate set of values, $a$, $b$ and $c$, is computed for each NMR or crystal structure. The resulting nonlinear corrections are similar within each set of structures and salt conditions (see Supplementary Data Figure 1). Note that $\Delta G_m^{\mathrm{NLPB(apprx)}}$ is used only in the context of the MC procedure.
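The fitting step of equations (9) to (11) can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the three reference microstates have three distinct values of Z, the function names are hypothetical, and the numbers in the usage note are made up rather than taken from the Supplementary Data.

```python
def fit_quadratic(points):
    """Return (a, b, c) of a*Z**2 + b*Z + c through three (Z, dG_corr) points.

    Uses the Lagrange form of the interpolating parabola, so the three Z
    values must all be different.
    """
    (z0, g0), (z1, g1), (z2, g2) = points
    d0 = (z0 - z1) * (z0 - z2)
    d1 = (z1 - z0) * (z1 - z2)
    d2 = (z2 - z0) * (z2 - z1)
    a = g0 / d0 + g1 / d1 + g2 / d2
    b = -(g0 * (z1 + z2) / d0 + g1 * (z0 + z2) / d1 + g2 * (z0 + z1) / d2)
    c = g0 * z1 * z2 / d0 + g1 * z0 * z2 / d1 + g2 * z0 * z1 / d2
    return a, b, c


def corrected_energy(dg_lpb, x, coeffs):
    """Equation (11): add the quadratic correction of equation (9) to dG_LPB.

    x is the protonation vector of the microstate, so Z_m = sum(x) as in
    equation (10).
    """
    a, b, c = coeffs
    z = sum(x)
    return dg_lpb + a * z * z + b * z + c
```

For example, `fit_quadratic([(0, 1.0), (5, -1.0), (10, -2.0)])` recovers the coefficients of the parabola through those three made-up points, and `corrected_energy` can then be evaluated for any microstate at negligible cost inside an MC loop.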

Individual contributions to the electrostatic free energy

The electrostatic free energy contribution due to desolvation is defined by equation (4a). We note here that values for $\Delta G^{\mathrm{self}}(i)$ obtained from the LPB and NLPB are nearly identical (data not shown).

In order to obtain a measure of the electrostatic effects due to phosphate groups and to other bases, we have calculated electrostatic potentials at the N1 atom of each adenosine and the N3 atom of each cytidine. Although we recognize that the potential obtained from the NLPB is not additive, the terms we report are related directly to RNA structural features and this provides insight as to the source of the pKa shifts. Contributions due to phosphate groups, which include the atoms P, O1P, O2P, O5′ and O3′, are obtained by assuming these groups to be charged, while all other atoms in the RNA are kept neutral. Multiplying these potentials by +1e, to reflect a change in ionization state at the site of protonation, yields the contribution to the electrostatic free energies that is reported in Figure 3 and Table 4. In order to calculate the electrostatic potentials due to the bases, we keep the phosphate groups charged and calculate the differential potential when the atoms in the bases are assumed to be charged relative to when they are assumed to be neutral. This can be done for all the bases in RNA or for an individual nucleotide.
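The conversion from a site potential to a free energy contribution, and from there to an indicative pKa shift, can be sketched as follows. This is a hedged illustration rather than the procedure used to produce Figure 3 or Table 4: it assumes the potential is expressed in units of k_BT/e, uses the k_BT = 0.6 kcal/mol and factor of 2.3 from equation (2), and treats the contribution in isolation even though NLPB potentials are not strictly additive. The function names are hypothetical.

```python
KBT = 0.6    # kcal/mol at 25 °C, the value quoted in the text
LN10 = 2.3   # factor used in equation (2)


def energy_from_potential(phi_kt_per_e, delta_charge=1.0):
    """Free energy (kcal/mol) of placing delta_charge (in e) at potential phi."""
    return delta_charge * phi_kt_per_e * KBT


def pka_shift_from_energy(dg_kcal_per_mol):
    """Indicative pKa shift of an isolated site for a given energy contribution.

    A negative (stabilizing) energy for the protonated form raises the pKa,
    following the single-site limit of equation (2).
    """
    return -dg_kcal_per_mol / (LN10 * KBT)


# Hypothetical example: a site potential of -5 kT/e from the phosphate backbone.
dg = energy_from_potential(-5.0)
shift = pka_shift_from_energy(dg)
print(f"dG = {dg:.2f} kcal/mol, indicative pKa shift = {shift:+.2f}")
```

The sign convention shows why focusing negative backbone potential on a protonation site shifts its pKa upwards.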

The determination of partial atomic charges and radii

The solution to the PB equations relies on a detailed atomic description of partial charges within the RNA along with its molecular surface. Since standard molecular mechanics force-fields do not provide partial charges for ionized forms of AMP and CMP, new partial atomic charges were calculated for these nucleotides. Our philosophy was to devise a simple way to generate partial charges that, when combined with appropriate radii, would be consistent with the experimental literature concerning the solvation energies of nucleobase derivatives. To do this, we used a philosophy similar to that used in the development of the AMBER atom-centered charges and a PARSE-like strategy for the selection of appropriate atomic radii.110,111 Atomic radii are used to describe the solvent-accessible molecular surface (and hence, the dielectric boundary) between solute and the solvent for calculations used here. The hydrogen radius was assigned a value of 1.10 Å. These and other atomic radii were chosen, in part, for their ability to reproduce trends of solvation in the four nucleobases. Consistent with PARSE radii for amino acids, Pauling's atomic radii were assigned to all heavy atoms. Thus, the atomic radius of phosphorus was assigned using its literature value of 1.90 Å.112 (See Supplementary Data Table 1).

Partial charges were generated by fitting atom-centered charges to electrostatic potentials (ESP) derived ab initio using the B3LYP/6-31G* level of theory and using the program Gaussian 98 (gaussian.com). Nine calculations were performed: one for each of six ribonucleosides (A, A+, C, C+, G and U), and one for each of three conformations of dimethyl-phosphate (gauche-gauche, gauche-trans and trans-trans). The partial charges on ribose atoms C5′, H5′1, H5′2, C4′, H4′, O4′, C3′, H3′, C2′, O2′, HO2 were made equivalent in all six ribonucleosides by averaging the corresponding partial charges for each atom. Excess charges were redistributed over the atoms C1′, H1′ and N1/9 (nitrogen involved in the glycosidic bond) to ensure the net charge per nucleotide was integral. A single set of partial charges was obtained for the phosphate atoms P, O1P, O2P, O3′, O5′ by averaging the corresponding partial charges in the three conformers. The protons in each pair, H5′1/H5′2 (ribose), H21/H22 (guanosine), H41/H42 (cytidine), H61/H62 (adenosine), were made equivalent by redistributing the partial charge evenly between the two protons. The overall redistribution of charge resulting from this procedure was very small. Partial charges for all remaining nucleobase atoms were not modified. United atoms were created for all RNA hydroxyl groups, O2′/HO2, O3T/H3T (3′ terminus), O5T/H5T (5′ terminus), by summing the partial charge on the oxygen and hydrogen atoms and placing the sum at the coordinates of the oxygen atom. This procedure produced partial charges that were not significantly different from those of AMBER 94 or CHARMM 27. (See Supplementary Data Table 2; atom names and nucleotide structures are given in Supplementary Data Figure 3).

To validate the partial charges and atomic radii set, solvation free energies from gas to water were calculated by summing electrostatic and non-polar contributions to solvation111 (equations (1), (3), (4) therein) and the results were compared to the solvation free energy determined for 9-methyladenine; this quantity was derived originally in the work by Ferguson et al.113 using the experimentally measured heat of vaporization of 9-methyladenosine. Based on the comparison of calculated solvation free energies for 9-methyladenine for various hydrogen radii, the radius of 1.10 Å was chosen. The relative solubilities of the nucleobases have been determined on the basis of their ability to partition between water and chloroform as well as between water and cyclohexane, where in order of hydrophilicity, G>C>U>A.114,115 The calculated solvation free energy for 9-methyladenine is consistent with the experimental value and the remaining calculated solvation free energies are consistent with the hydrophilicity scale established by Wolfenden and co-workers. (Supplementary Data Tables 3 and 4).

Finally, we scale atomic radii in order to use them in PB calculations where the solute dielectric of RNA is set to a value greater than 1. In particular, we scale the atomic radius by 87% when working with $\varepsilon$ = 4, the value used in the pKa calculations. Atomic radius scaling was used in calculations of solvation free energy by Sitkoff et al.111 The rationale is to maintain the same solvation free energy of individual nucleotides as calculated for $\varepsilon$ = 1 when an alternate value for the dielectric is used. The scaling factor maintains the balance between solvation and pairwise energies involved in the calculation of pKa shifts. Tests of these parameters (atomic radii, internal dielectric) for pKa calculations revealed that the values chosen were quite reasonable, as shown in Results. We emphasize that the choice of the scaling factor was obtained independently of any pKa calculation, and was not in any way chosen so as to fit experimental pKas.
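The charge post-processing and radius scaling described in this section can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in the text: the excess charge is spread evenly over the designated atoms, "scaling by 87%" is read as multiplying each radius by 0.87, and all function names and toy charge values are hypothetical.

```python
def average_shared_charges(charge_sets, shared_atoms):
    """Give each shared atom the mean of its charges across all charge sets."""
    means = {
        atom: sum(cs[atom] for cs in charge_sets) / len(charge_sets)
        for atom in shared_atoms
    }
    return [{**cs, **means} for cs in charge_sets]


def restore_integral_charge(charges, sink_atoms):
    """Move the non-integral residue of the net charge onto sink_atoms.

    The residue is split evenly between the sink atoms (an assumption; the
    text only says the excess was redistributed over C1', H1' and N1/9).
    """
    net = sum(charges.values())
    excess = net - round(net)
    fixed = dict(charges)
    for atom in sink_atoms:
        fixed[atom] -= excess / len(sink_atoms)
    return fixed


def scale_radii(radii, factor=0.87):
    """Scale every atomic radius by a common factor (0.87 assumed for eps = 4)."""
    return {atom: radius * factor for atom, radius in radii.items()}


# Toy example with made-up charges: the net charge of 0.03 e is pulled back to 0.
toy = {"C1'": 0.10, "H1'": 0.05, "N9": -0.20, "rest": 0.08}
fixed = restore_integral_charge(toy, ["C1'", "H1'", "N9"])
scaled = scale_radii({"H": 1.10, "P": 1.90})
```

With these assumptions the hydrogen and phosphorus radii of 1.10 Å and 1.90 Å become approximately 0.96 Å and 1.65 Å.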

Preparation of structures before calculation

Coordinates of RNA structures were obtained from the Protein Data Bank (PDB). The following structures were used in this work: 17ra (BPH),64 1ldz and 2ldz (LDZ),72 437d and 1l2x (BWYV-Ψ),66,116 1kpy and 1kpz (PEMV-Ψ),67 1cx0, 1drz and 1vc5 (HDVR),68,69 1m5k and 1m5v (hairpin ribozyme).70,71 Crystallographic water and all metal ions were removed from the structures and are not included in the calculations. NMR structures having multiple conformations were separated and treated individually. The topology and parameter files were modified for the X-PLOR program to handle the protonation of ionized nucleotides. Hydrogen atoms for all nucleotides, ionized or neutral, were added using the X-PLOR program holding heavy-atom positions fixed.117 The modified X-PLOR topology and parameter files are available upon request. Calculations were performed on the proton-added structures without further minimization. In the structures for BWYV-Ψ, the 5′ triphosphate terminus was removed and replaced with a standard O5′ terminus. The product structure of the hairpin ribozyme contains 2′-3′-cyclic phosphate between A12 and G13 of the cleaved substrate strand. To obtain pKas of the ribozyme in the product conformation, partial charges were first determined for the 2′-3′-cyclic phosphate using the ESP protocol described above.
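A minimal Python sketch of the first two preparation steps (removing crystallographic water and metal ions, and separating NMR models) is given below. It is not the procedure used in this work, which relied on X-PLOR; it simply assumes standard fixed-column PDB records, keeps only `ATOM` records, and therefore also drops any modified nucleotides stored as `HETATM`. The function name and water residue names are assumptions.

```python
WATER_NAMES = {"HOH", "WAT"}  # common water residue names (assumed list)


def split_models_and_strip(pdb_text):
    """Return one list of ATOM lines per model, without waters or heteroatoms.

    HETATM records (where metal ions and waters normally appear) are skipped,
    and MODEL/ENDMDL records delimit the conformers of an NMR ensemble.
    A file without MODEL records yields a single model.
    """
    models = []
    current = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record == "ATOM" and line[17:20].strip() not in WATER_NAMES:
            current.append(line)
    if current:  # coordinates that were not wrapped in MODEL/ENDMDL
        models.append(current)
    return models
```

Each returned model can then be written out and processed as an independent structure.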

Source code and additional parameters

All the source code used in this work, including our modified version of MCCE, will be made available via the website†.

In general, all parameters not otherwise discussed here are given in Supplementary Data Table 5.

Acknowledgements

We thank Donald Petrey for assistance with GRASP2, Li Xi for assistance with Gaussian98, and Kevin Keating for calculations of η-θ angles. We are grateful to Lucy Forrest, Mickey Kosloff and Remo Rohs for many helpful comments in the writing of the manuscript.

References

1. Izatt, R. M., Christensen, J. J. & Rytting, J. H. (1971). Sites and thermodynamic quantities associated with proton and metal ion interaction with ribonucleic acid, deoxyribonucleic acid, and their constituent bases, nucleosides, and nucleotides. Chem. Rev. 71, 439–481.

2. Saenger, W. (1984). Principles of Nucleic Acid Structure. Springer-Verlag, New York.

3. Gao, X. L. & Patel, D. J. (1987). NMR studies of A.C mismatches in DNA dodecanucleotides at acidic pH. Wobble A(anti).C(anti) pair formation. J. Biol. Chem. 262, 16973–16984.

4. Cai, Z. & Tinoco, I., Jr (1996). Solution structure of loop A from the hairpin ribozyme from tobacco ringspot virus satellite. Biochemistry, 35, 6026–6036.

5. Asensio, J. L., Lane, A. N., Dhesi, J., Bergqvist, S. & Brown, T. (1998). The contribution of cytosine protonation to the stability of parallel DNA triple helices. J. Mol. Biol. 275, 811–822.

6. Jang, S. B., Hung, L. W., Chi, Y. I., Holbrook, E. L., Carter, R. J. & Holbrook, S. R. (1998). Structure of an RNA internal loop consisting of tandem C-A+ base-pairs. Biochemistry, 37, 11726–11731.

†http://wiki.c2b2.columbia.edu/honiglab_public/index.php/RNA

7. Durant, P. C. & Davis, D. R. (1999). Stabilization of the anticodon stem-loop of tRNALys,3 by an A+-C base-pair and by pseudouridine. J. Mol. Biol. 285, 115–131.

8. Morse, S. E. & Draper, D. E. (1995). Purine-purine mismatches in RNA helices: evidence for protonated G.A pairs and next-nearest neighbor effects. Nucl. Acids Res. 23, 302–306.

9. Ravindranathan, S., Butcher, S. E. & Feigon, J. (2000). Adenine protonation in domain B of the hairpin ribozyme. Biochemistry, 39, 16026–16032.

10. Bink, H. H., Hellendoorn, K., van der Meulen, J. & Pleij, C. W. (2002). Protonation of non-Watson-Crick base-pairs and encapsidation of turnip yellow mosaic virus RNA. Proc. Natl Acad. Sci. USA, 99, 13465–13470.

11. Blanchard, S. C. & Puglisi, J. D. (2001). Solution structure of the A loop of 23S ribosomal RNA. Proc. Natl Acad. Sci. USA, 98, 3720–3725.

12. Bevilacqua, P. C. (2003). Mechanistic considerations for general acid-base catalysis by RNA: revisiting the mechanism of the hairpin ribozyme. Biochemistry, 42, 2259–2265.

13. Bevilacqua, P. C., Brown, T. S., Nakano, S. & Yajima, R. (2004). Catalytic roles for proton transfer and protonation in ribozymes. Biopolymers, 73, 90–109.

14. Nakano, S., Chadalavada, D. M. & Bevilacqua, P. C. (2000). General acid-base catalysis in the mechanism of a hepatitis delta virus ribozyme. Science, 287, 1493–1497.

15. Oyelere, A. K., Kardon, J. R. & Strobel, S. A. (2002). pKa perturbation in genomic Hepatitis Delta Virus ribozyme catalysis evidenced by nucleotide analogue interference mapping. Biochemistry, 41, 3667–3675.

16. Perrotta, A. T., Shih, I. & Been, M. D. (1999). Imidazole rescue of a cytosine mutation in a self-cleaving ribozyme. Science, 286, 123–126.

17. Wadkins, T. S., Shih, I., Perrotta, A. T. & Been, M. D. (2001). A pH-sensitive RNA tertiary interaction affects self-cleavage activity of the HDV ribozymes in the absence of added divalent metal ion. J. Mol. Biol. 305, 1045–1055.

18. Shih, I. H. & Been, M. D. (2001). Involvement of a cytosine side chain in proton transfer in the rate-determining step of ribozyme self-cleavage. Proc. Natl Acad. Sci. USA, 98, 1489–1494.

19. Das, S. R. & Piccirilli, J. A. (2005). General acid catalysis by the hepatitis delta virus ribozyme 1, 45–52.

20. Kuzmin, Y. I., Da Costa, C. P., Cottrell, J. W. & Fedor, M. J. (2005). Role of an active site adenine in hairpin ribozyme catalysis. J. Mol. Biol. 349, 989–1010.

21. Kuzmin, Y. I., Da Costa, C. P. & Fedor, M. J. (2004). Role of an active site guanine in hairpin ribozyme catalysis probed by exogenous nucleobase rescue. J. Mol. Biol. 340, 233–251.

22. Lebruska, L. L., Kuzmine, Y. I. & Fedor, M. J. (2002). Rescue of an abasic hairpin ribozyme by cationic nucleobases: evidence for a novel mechanism of RNA catalysis. Chem. Biol. 9, 465–473.

23. Ryder, S. P., Oyelere, A. K., Padilla, J. L., Klostermeier, D., Millar, D. P. & Strobel, S. A. (2001). Investigation of adenosine base ionization in the hairpin ribozyme by nucleotide analog interference mapping. RNA, 7, 1454–1463.

24. Wilson, T. J., Ouellet, J., Zhao, Z. Y., Harusawa, S., Araki, L., Kurihara, T. & Lilley, D. M. (2006). Nucleobase catalysis in the hairpin ribozyme. RNA, 12, 980–987.

25. Lee, J. C. & Gutell, R. R. (2004). Diversity of base-pair conformations and their occurrence in rRNA structure and RNA structural motifs. J. Mol. Biol. 344, 1225–1249.

26. Xiong, L., Polacek, N., Sander, P., Bottger, E. C. & Mankin, A. (2001). pKa of adenine 2451 in the ribosomal peptidyl transferase center remains elusive. RNA, 7, 1365–1369.

27. Muth, G. W., Chen, L., Kosek, A. B. & Strobel, S. A. (2001). pH-dependent conformational flexibility within the ribosomal peptidyl transferase center. RNA, 7, 1403–1415.

28. Yang, A. S. & Honig, B. (1994). Structural origins of pH and ionic strength effects on protein stability. Acid denaturation of sperm whale apomyoglobin. J. Mol. Biol. 237, 602–614.

29. Bullough, P. A., Hughson, F. M., Skehel, J. J. & Wiley, D. C. (1994). Structure of influenza haemagglutinin at the pH of membrane fusion. Nature, 371, 37–43.

30. Frick, D. N., Rypma, R. S., Lam, A. M. & Frenz, C. M. (2004). Electrostatic analysis of the hepatitis C virus NS3 helicase reveals both active and allosteric site locations. Nucl. Acids Res. 32, 5519–5528.

31. Ondrechen, M. J., Clifton, J. G. & Ringe, D. (2001). THEMATICS: a simple computational predictor of enzyme function from structure. Proc. Natl Acad. Sci. USA, 98, 12473–12478.

32. Doudna, J. A. & Cech, T. R. (2002). The chemical repertoire of natural ribozymes. Nature, 418, 222–228.

33. Fedor, M. J. & Williamson, J. R. (2005). The catalytic diversity of RNAs. Nature Rev. Mol. Cell. Biol. 6, 399–412.

34. Demchuk, E. & Wade, R. C. (1996). Improving the continuum dielectric approach to calculating pKas of ionizable groups in proteins. J. Phys. Chem. 100, 17373–17387.

35. Nielsen, J. E. & Vriend, G. (2001). Optimizing the hydrogen-bond network in Poisson-Boltzmann equation-based pKa calculations. Proteins: Struct. Funct. Genet. 43, 403–412.

36. Mehler, E. L. & Guarnieri, F. (1999). A self-consistent, microenvironment modulated screened coulomb potential approximation to calculate pH-dependent electrostatic effects in proteins. Biophys. J. 77, 3–22.

37. Bashford, D. & Karplus, M. (1990). pKas of ionizable groups in proteins: atomic detail from a continuum electrostatic model. Biochemistry, 29, 10219–10225.

38. Antosiewicz, J., McCammon, J. A. & Gilson, M. K. (1994). Prediction of pH-dependent properties of proteins. J. Mol. Biol. 238, 415–436.

39. Antosiewicz, J., McCammon, J. A. & Gilson, M. K. (1996). The determinants of pKas in proteins. Biochemistry, 35, 7819–7833.

40. Yang, A. S. & Honig, B. (1993). On the pH dependence of protein stability. J. Mol. Biol. 231, 459–474.

41. Yang, A. S., Gunner, M. R., Sampogna, R., Sharp, K. & Honig, B. (1993). On the calculation of pKas in proteins. Proteins: Struct. Funct. Genet. 15, 252–265.

42. Li, H., Robertson, A. D. & Jensen, J. H. (2005). Very fast empirical prediction and rationalization of protein pKa values. Proteins: Struct. Funct. Genet. 61, 704–721.

43. Petrov, A. S., Lamm, G. & Pack, G. R. (2004). The triplex-hairpin transition in cytosine-rich DNA. Biophys. J. 87, 3954–3973.

44. Misra, V. K., Sharp, K. A., Friedman, R. A. & Honig, B. (1994). Salt effects on ligand-DNA binding. Minor groove binding antibiotics. J. Mol. Biol. 238, 245–263.

45. Misra, V. K., Hecht, J. L., Sharp, K. A., Friedman, R. A.& Honig, B. (1994). Salt effects on protein-DNAinteractions. The lambda cI repressor and EcoRIendonuclease. J. Mol. Biol. 238, 264–280.

46. Ben-Tal, N., Honig, B., Peitzsch, R. M., Denisov, G. &McLaughlin, S. (1996). Binding of small basic pep-tides to membranes containing acidic lipids: theoret-ical models and experimental results. Biophys. J. 71,561–575.

47. Hecht, J. L., Honig, B., Shin, Y. K. & Hubbell, W. L.(1995). Electrostatic potentials near-the-surface ofDNA - Comparing Theory and Experiment. J. Phys.Chem. 99, 7782–7786.

48. Misra, V. K. & Honig, B. (1995). On the magnitude ofthe electrostatic contribution to ligand-DNA interac-tions. Proc. Natl Acad. Sci. USA, 92, 4691–4695.

49. Misra, V. K. & Draper, D. E. (1999). The interpretationof Mg2+ binding isotherms for nucleic acids usingPoisson–Boltzmann theory. J. Mol. Biol. 294, 1135–1147.

50. Misra, V. K. & Draper, D. E. (2000). Mg2+ binding totRNA revisited: the nonlinear Poisson–Boltzmannmodel. J. Mol. Biol. 299, 813–825.

51. Misra, V. K. & Draper, D. E. (2001). A thermodynamicframework for Mg2+ binding to RNA. Proc. Natl Acad.Sci. USA, 98, 12456–12461.

52. Misra, V. K. & Draper, D. E. (2002). The linkage bet-ween magnesium binding and RNA folding. J. Mol.Biol. 317, 507–521.

53. Misra, V. K., Shiman, R. & Draper, D. E. (2003). Athermodynamic framework for the magnesium-de-pendent folding of RNA. Biopolymers, 69, 118–136.

54. Sharp, K. A. & Honig, B. (1990). Calculating totalelectrostatic energies with the nonlinear Poisson-Boltzmann equation. J. Phys. Chem. 94, 7684–7692.

55. Murthy, C. S., Bacquet, R. J. & Rossky, P. J. (1985).Ionic distributions near poly-electrolytes–a compar-ison of theoretical approaches. J. Phys. Chem. 89,701–710.

56. Bacquet, R. & Rossky, P. J. (1984). Ionic atmosphere ofrodlike poly-electrolytes–a hypernetted chain study.J. Phys. Chem. 88, 2660–2669.

57. Svensson, B., Jonsson, B. & Woodward, C. E. (1990).Monte-Carlo simulations of an electric double-layer.J. Phys. Chem. 94, 2105–2113.

58. Guldbrand, L., Jonsson, B., Wennerstrom, H. & Linse,P. (1984). Electrical double-layer forces—a Monte-Carlo study. J. Chem. Phys. 80, 2221–2228.

59. Vorobjev, Y. N., Scheraga, H. A., Hitz, B. & Honig, B.(1994). Theoretical modeling of electrostatic effects oftitratable side-chain groups on protein conformationin a polar ionic solution. 1. Potential of mean forcebetween charged lysine residues and titration of poly(L-lysine) in 95-percent methanol solution. J. Phys.Chem. 98, 10940–10948.

60. Alexov, E. G. & Gunner, M. R. (1997). Incorporatingprotein conformational flexibility into the calculationof pH-dependent protein properties. Biophys. J. 72,2075–2093.

61. Alexov, E. G. & Gunner, M. R. (1999). Calculatedprotein and proton motions coupled to electrontransfer: electron transfer fromQA- to QB in bacterialphotosynthetic reaction centers. Biochemistry, 38,8253–8270.

62. Gunner, M. R. & Alexov, E. (2000). A pragmatic approach to structure based calculation of coupled proton and electron transfer in proteins. Biochim. Biophys. Acta, 1458, 63–87.

63. Forrest, L. R. & Honig, B. (2005). An assessment of the accuracy of methods for predicting hydrogen positions in protein structures. Proteins: Struct. Funct. Genet. 61, 296–309.

64. Smith, J. S. & Nikonowicz, E. P. (1998). NMR structure and dynamics of an RNA motif common to the spliceosome branch-point helix and the RNA-binding site for phage GA coat protein. Biochemistry, 37, 13486–13498.

65. Legault, P. & Pardi, A. (1997). Unusual dynamics and pKa shift at the active site of a lead-dependent ribozyme. J. Am. Chem. Soc. 119, 6621–6628.

66. Su, L., Chen, L., Egli, M., Berger, J. M. & Rich, A. (1999). Minor groove RNA triplex in the crystal structure of a ribosomal frameshifting viral pseudoknot. Nature Struct. Biol. 6, 285–292.

67. Nixon, P. L., Rangan, A., Kim, Y. G., Rich, A., Hoffman, D. W., Hennig, M. & Giedroc, D. P. (2002). Solution structure of a luteoviral P1-P2 frameshifting mRNA pseudoknot. J. Mol. Biol. 322, 621–633.

68. Ferre-D'Amare, A. R., Zhou, K. & Doudna, J. A. (1998). Crystal structure of a hepatitis delta virus ribozyme. Nature, 395, 567–574.

69. Ke, A., Zhou, K., Ding, F., Cate, J. H. & Doudna, J. A. (2004). A conformational switch controls hepatitis delta virus ribozyme catalysis. Nature, 429, 201–205.

70. Rupert, P. B. & Ferre-D'Amare, A. R. (2001). Crystal structure of a hairpin ribozyme-inhibitor complex with implications for catalysis. Nature, 410, 780–786.

71. Rupert, P. B., Massey, A. P., Sigurdsson, S. T. & Ferre-D'Amare, A. R. (2002). Transition state stabilization by a catalytic RNA. Science, 298, 1421–1424.

72. Hoogstraten, C. G., Legault, P. & Pardi, A. (1998). NMR solution structure of the lead-dependent ribozyme: evidence for dynamics in RNA catalysis. J. Mol. Biol. 284, 337–350.

73. Legault, P., Hoogstraten, C. G., Metlitzky, E. & Pardi, A. (1998). Order, dynamics and metal-binding in the lead-dependent ribozyme. J. Mol. Biol. 284, 325–335.

74. Nixon, P. L., Cornish, P. V., Suram, S. V. & Giedroc, D. P. (2002). Thermodynamic analysis of conserved loop-stem interactions in P1-P2 frameshifting RNA pseudoknots from plant Luteoviridae. Biochemistry, 41, 10665–10674.

75. Nixon, P. L. & Giedroc, D. P. (2000). Energetics of a strongly pH-dependent RNA tertiary structure in a frameshifting pseudoknot. J. Mol. Biol. 296, 659–671.

76. Moody, E. M., Lecomte, J. T. & Bevilacqua, P. C. (2005). Linkage between proton binding and folding in RNA: a thermodynamic framework and its experimental application for investigating pKa shifting. RNA, 11, 157–172.

77. Nakano, S., Proctor, D. J. & Bevilacqua, P. C. (2001). Mechanistic characterization of the HDV genomic ribozyme: assessing the catalytic and structural contributions of divalent metal ions within a multichannel reaction mechanism. Biochemistry, 40, 12022–12038.

78. Bevilacqua, P. C., Brown, T. S., Chadalavada, D., Lecomte, J., Moody, E. & Nakano, S. I. (2005). Linkage between proton binding and folding in RNA: implications for RNA catalysis. Biochem. Soc. Trans. 33, 466–470.

79. Shih, I. H. & Been, M. D. (2002). Catalytic strategies of the hepatitis delta virus ribozymes. Annu. Rev. Biochem. 71, 887–917.

80. Perrotta, A. T. & Been, M. D. (2006). HDV ribozyme activity in monovalent cations. Biochemistry, 45, 11357–11365.



81. Perrotta, A. T., Wadkins, T. S. & Been, M. D. (2006). Chemical rescue, multiple ionizable groups, and general acid-base catalysis in the HDV genomic ribozyme. RNA, 12, 1282–1291.

82. Kumar, P. K., Suh, Y. A., Miyashiro, H., Nishikawa, F., Kawakami, J., Taira, K. & Nishikawa, S. (1992). Random mutations to evaluate the role of bases at two important single-stranded regions of genomic HDV ribozyme. Nucl. Acids Res. 20, 3919–3924.

83. Belinsky, M. G., Britton, E. & Dinter-Gottlieb, G. (1993). Modification interference analysis of a self-cleaving RNA from hepatitis delta virus. FASEB J. 7, 130–136.

84. Suh, Y. A., Kumar, P. K., Kawakami, J., Nishikawa, F., Taira, K. & Nishikawa, S. (1993). Systematic substitution of individual bases in two important single-stranded regions of the HDV ribozyme for evaluation of the role of specific bases. FEBS Letters, 326, 158–162.

85. Tanner, N. K., Schaff, S., Thill, G., Petit-Koskas, E., Crain-Denoyelle, A. M. & Westhof, E. (1994). A three-dimensional model of hepatitis delta virus ribozyme based on biochemical and mutational analyses. Curr. Biol. 4, 488–498.

86. Nesbitt, S. M., Erlacher, H. A. & Fedor, M. J. (1999). The internal equilibrium of the hairpin ribozyme: temperature, ion and pH effects. J. Mol. Biol. 286, 1009–1024.

87. Grasby, J. A., Mersmann, K., Singh, M. & Gait, M. J. (1995). Purine functional groups in essential residues of the hairpin ribozyme required for catalytic cleavage of RNA. Biochemistry, 34, 4068–4076.

88. Ryder, S. P. & Strobel, S. A. (1999). Nucleotide analog interference mapping of the hairpin ribozyme: implications for secondary and tertiary structure formation. J. Mol. Biol. 291, 295–311.

89. Salter, J., Krucinska, J., Alam, S., Grum-Tokars, V. & Wedekind, J. E. (2006). Water in the active site of an all-RNA hairpin ribozyme and effects of Gua8 base variants on the geometry of phosphoryl transfer. Biochemistry, 45, 686–700.

90. Wadley, L. M. & Pyle, A. M. (2004). The identification of novel RNA structural motifs using COMPADRES: an automated approach to structural discovery. Nucl. Acids Res. 32, 6650–6659.

91. Gilson, M. K. & Honig, B. H. (1986). The dielectric constant of a folded protein. Biopolymers, 25, 2097–2119.

92. Reiter, N. J., Blad, H., Abildgaard, F. & Butcher, S. E. (2004). Dynamics in the U6 RNA intramolecular stem-loop: a base flipping conformational change. Biochemistry, 43, 13739–13747.

93. Misra, V. K. & Draper, D. E. (1998). On the role of magnesium ions in RNA stability. Biopolymers, 48, 113–135.

94. Draper, D. E., Grilley, D. & Soto, A. M. (2005). Ions and RNA folding. Annu. Rev. Biophys. Biomol. Struct. 34, 221–243.

95. Chin, K., Sharp, K. A., Honig, B. & Pyle, A. M. (1999). Calculating the electrostatic properties of RNA provides new insights into molecular interactions and function. Nature Struct. Biol. 6, 1055–1061.

96. Leontis, N. B., Stombaugh, J. & Westhof, E. (2002). The non-Watson-Crick base-pairs and their associated isostericity matrices. Nucl. Acids Res. 30, 3497–3531.

97. Leontis, N. B. & Westhof, E. (2001). Geometric nomenclature and classification of RNA base-pairs. RNA, 7, 499–512.

98. Lemieux, S. & Major, F. (2002). RNA canonical and non-canonical base-pairing types: a recognition method and complete repertoire. Nucl. Acids Res. 30, 4250–4263.

99. Walberer, B. J., Cheng, A. C. & Frankel, A. D. (2003). Structural diversity and isomorphism of hydrogen-bonded base interactions in nucleic acids. J. Mol. Biol. 327, 767–780.

100. Hunter, W. N., Brown, T., Anand, N. N. & Kennard, O. (1986). Structure of an adenine-cytosine base-pair in DNA and its implications for mismatch repair. Nature, 320, 552–555.

101. Flinders, J. & Dieckmann, T. (2001). A pH controlled conformational switch in the cleavage site of the VS ribozyme substrate RNA. J. Mol. Biol. 308, 665–679.

102. Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N., Caruthers, M. H., Neilson, T. & Turner, D. H. (1986). Improved free-energy parameters for predictions of RNA duplex stability. Proc. Natl Acad. Sci. USA, 83, 9373–9377.

103. Yildirim, I. & Turner, D. H. (2005). RNA challenges for computational chemists. Biochemistry, 44, 13225–13234.

104. Moody, E. M., Brown, T. S. & Bevilacqua, P. C. (2004). Simple method for determining nucleobase pKa values by indirect labeling and demonstration of a pKa of neutrality in dsDNA. J. Am. Chem. Soc. 126, 10200–10201.

105. Murray, J. B., Seyhan, A. A., Walter, N. G., Burke, J. M. & Scott, W. G. (1998). The hammerhead, hairpin and VS ribozymes are catalytically proficient in monovalent cations alone. Chem. Biol. 5, 587–595.

106. Fedor, M. J. (2000). Structure and function of the hairpin ribozyme. J. Mol. Biol. 297, 269–291.

107. Alexov, E. (2003). Role of the protein side-chain fluctuations on the strength of pair-wise electrostatic interactions: comparing experimental with computed pKas. Proteins: Struct. Funct. Genet. 50, 94–103.

108. Gilson, M. K. & Honig, B. (1988). Calculation of the total electrostatic energy of a macromolecular system: solvation energies, binding energies, and conformational analysis. Proteins: Struct. Funct. Genet. 4, 7–18.

109. Rocchia, W., Sridharan, S., Nicholls, A., Alexov, E., Chiabrera, A. & Honig, B. (2002). Rapid grid-based construction of the molecular surface and the use of induced surface charge to calculate reaction field energies: applications to the molecular systems and geometric objects. J. Comput. Chem. 23, 128–137.

110. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M., Ferguson, D. M. et al. (1995). A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117, 5179–5197.

111. Sitkoff, D., Sharp, K. A. & Honig, B. (1994). Correlating solvation free energies and surface tensions of hydrocarbon solutes. Biophys. Chem. 51, 397–403; discussion 404-399.

112. Pauling, L. (1960). The Nature of the Chemical Bond, 3rd edit. Cornell University Press.

113. Ferguson, D. M., Radmer, R. J. & Kollman, P. A. (1991). Determination of the relative binding free energies of peptide inhibitors to the HIV-1 protease. J. Med. Chem. 34, 2654–2659.

114. Cullis, P. M. & Wolfenden, R. (1981). Affinities of nucleic acid bases for solvent water. Biochemistry, 20, 3024–3028.

115. Shih, P., Pedersen, L. G., Gibbs, P. R. & Wolfenden, R. (1998). Hydrophobicities of the nucleic acid bases: distribution coefficients from water to cyclohexane. J. Mol. Biol. 280, 421–430.

116. Egli, M., Minasov, G., Su, L. & Rich, A. (2002). Metal ions and flexibility in a viral RNA pseudoknot at atomic resolution. Proc. Natl Acad. Sci. USA, 99, 4302–4307.

117. Brünger, A. T. (1992). X-PLOR Version 3.1. A System for X-ray Crystallography and NMR. Yale University Press, New Haven, CT.

Edited by D. E. Draper

(Received 26 July 2006; received in revised form 29 November 2006; accepted 1 December 2006)

Available online 6 December 2006
