
Phosphorus Binding Sites in Proteins: Structural Preorganization and Coordination


Mathias Gruber, Per Greisen, Jr., Caroline M. Junker, and Claus Hélix-Nielsen*

The Biomimetic Membrane Group, Department of Physics, Technical University of Denmark, DK 2800 Kongens Lyngby Denmark

*S Supporting Information

ABSTRACT: Phosphorus is a ubiquitous element of the cell, which is found throughout numerous key molecules related to cell structure, energy and information storage and transfer, and a diverse array of other cellular functions. In this work, we adopt an approach often used for characterizing metal binding and selectivity of metalloproteins in terms of interactions in a first shell (direct residue interactions with the metal) and a second shell (residue interactions with first shell residues) and use it to characterize binding of phosphorus compounds. Similar analyses of binding have previously been limited to individual structures that bind to phosphate groups; here, we investigate a total of 8307 structures obtained from the RCSB Protein Data Bank (PDB). An analysis of the binding site amino acid propensities reveals very characteristic first shell residue distributions, which are found to be influenced by the characteristics of the phosphorus compound and by the presence of cobound cations. The second shell, which supports the coordinating residues in the first shell, is found to consist mainly of protein backbone groups. Our results show how the second shell residue distribution is dictated mainly by the first shell of the binding site, especially by cobound cations, and that the main function of the second shell is to stabilize the first shell residues.

■ INTRODUCTION

Specific binding of phosphate groups in proteins has been widely studied because it is essential for a large number of functions and pathways within cells.1−3 The phosphate group is prominent in the phospholipid molecules constituting the lipid bilayer of cellular membranes and in the adenylate energy transporter molecules adenosine mono-, di-, and triphosphate (AMP, ADP, and ATP, respectively), and it constitutes ∼9% of the mass of the nucleic acids deoxyribonucleic acid (DNA) and ribonucleic acid (RNA).1 In fact, nearly 20% of all proteins found in the RCSB Protein Data Bank interact with phosphate, be it in its inorganic form or bound in an organic moiety such as a nucleotide.

In structural analyses of phosphate recognition, the focus has so far been on direct interactions between, for example, the phosphate anion and protein amino acids and their relation to associated specific binding motifs (e.g., Rossmann fold, glycine-rich sequences, and P-loops).3 However, the recognition and coordination of phosphorus compounds in proteins may involve additional interactions that reach beyond first shell interactions. The “shell” terminology originates from structural analyses of metalloproteins, in which the metal ion is coordinated not only directly by first shell ligands (the term ligand referring to backbone moieties or specific amino acid side chains) but also indirectly, via protein−protein interactions, by second shell ligands.4,5

The importance of second shell interactions for protein−metal recognition and metalloprotein function is exemplified by their roles in protecting and shielding a binding site core,6 stabilizing the binding site complex,7,8 enhancing binding site affinity,4,9−11 and in other ways fine-tuning the inner environment of the binding site.12,13 We therefore asked whether second shell interactions could also play a role in the binding of phosphorus compounds in proteins, as variations in second shell residues or backbone structures may change the physicochemical properties of the binding site, in analogy with previous observations for metalloproteins.14

Here, we address this question by performing a structural survey of all structures that bind phosphate groups in the RCSB Protein Data Bank, looking at both first and second shell interactions. We define first shell interactions as direct interactions between the phosphate groups and the amino acid residues or the protein backbone. Second shell interactions refer to the interactions between the protein and first shell moieties (see Figure 1). We analyzed the entire data set en bloc as well as in subsets classified by protein function (i.e., enzyme versus nonenzyme) and type of phosphorus compound (e.g., inorganic phosphate, nucleotide, etc.). From the analyses performed, we identified the general tendencies seen in binding sites for phosphorus compounds, the influence of different types of phosphorus compounds on the binding characteristics of the first and second shell, the relevance of different residue interactions in the binding site, the importance of second shell interactions in recognition, and finally the influence of cobound cations on binding characteristics.

Received: August 30, 2013; Revised: December 12, 2013; Published: January 9, 2014

Article

pubs.acs.org/JPCB

© 2014 American Chemical Society 1207 dx.doi.org/10.1021/jp408689x | J. Phys. Chem. B 2014, 118, 1207−1215

■ MATERIALS AND METHODS

The Protein Data Bank was surveyed for structures with a resolution better than 3.0 Å that contain phosphorus ligands. Structures with more than 90% sequence identity were removed from the data set, and the remaining structures were divided by their type of phosphorus compound into six groups as follows: phosphate (2812 structures), pyrophosphate (162 structures), nucleoside monophosphates (NMP, 537 structures), nucleoside diphosphates (NDP, 1809 structures), nucleoside triphosphates (NTP, 1022 structures), and the coenzymes nicotinamide adenine dinucleotide and flavin adenine dinucleotide (FAD/FADH + NAD/NADH, 1965 structures). In the case of oligomeric proteins, the analysis was restricted to one of the homologous subunits, as these are assumed to have an identical mode of binding the phosphorus compounds.3 For PDB files containing an ensemble of structures as determined by NMR, only the first model present in the file was used. Separation of the data set into enzymes and nonenzymes was established by using the enzyme classification (EC) number information in the PDB files.

We defined the first solvation shell as all protein residues containing O, N, or S atoms within a 3.5 Å cutoff distance from the O atoms bonded to the phosphorus atom. This definition was used in order to capture all H-bond donors (HO, HN, HS) and electrostatic interactions in one search.3,8 Besides amino acid residues, water molecules and metals were also included in the first shell because they often have important structural functions.3,8,15,16

The O, N, and S atoms of the first solvation shell were used as first shell centers for defining the second solvation shell. Amino acid residues with an O, N, or S atom within a 3.5 Å cutoff distance of a first shell center were counted as second shell residues. When counting second shell residues, backbone moieties identified through interactions with first shell backbone moieties were included only if the two residues were separated by at least one amino acid in the sequence. This was done to avoid counting noninteracting residues simply for being in proximity to each other. Water was not counted as a second shell residue, as it is expected that water may commonly be found in the second shell near protein surfaces, where it does not play any structural or catalytic role.8
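The two-step distance search described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the atom representation (a tuple of residue label, element, and coordinates), the function names `within_cutoff` and `find_shells`, and the choice of an inclusive `<=` comparison are all hypothetical, and the sketch omits the sequence-separation rule for backbone pairs and the exclusion of second shell water.

```python
import numpy as np

CUTOFF = 3.5                 # Å, the cutoff distance stated in the text
POLAR = {"O", "N", "S"}      # elements considered for both shells

def within_cutoff(centers, atoms, cutoff=CUTOFF):
    """Return the O/N/S atoms lying within `cutoff` of any center.

    `atoms` is a list of (residue_label, element, xyz) tuples; this layout
    is an assumption made for the example.
    """
    centers = np.asarray(centers, dtype=float)
    hits = []
    for res_id, element, xyz in atoms:
        if element not in POLAR:
            continue
        dist = np.linalg.norm(centers - np.asarray(xyz, dtype=float), axis=1)
        if dist.min() <= cutoff:
            hits.append((res_id, element, xyz))
    return hits

def find_shells(phosphate_oxygens, protein_atoms):
    """Return (first_shell_residues, second_shell_residues) as sets of labels."""
    first = within_cutoff(phosphate_oxygens, protein_atoms)
    first_ids = {res_id for res_id, _, _ in first}
    # First shell O/N/S atoms become the centers for the second search.
    first_centers = [xyz for _, _, xyz in first]
    remaining = [a for a in protein_atoms if a[0] not in first_ids]
    second = within_cutoff(first_centers, remaining) if first_centers else []
    return first_ids, {res_id for res_id, _, _ in second}

# Toy coordinates (made up): one phosphate oxygen at the origin.
toy_atoms = [
    ("ARG10", "N", (3.0, 0.0, 0.0)),   # first shell
    ("GLU50", "O", (6.0, 0.0, 0.0)),   # second shell, 3.0 Å from ARG10 N
    ("LEU70", "C", (2.0, 0.0, 0.0)),   # ignored: carbon
    ("SER90", "O", (12.0, 0.0, 0.0)),  # too far from everything
]
print(find_shells([(0.0, 0.0, 0.0)], toy_atoms))
```

A production version would read coordinates from PDB files and use a spatial index instead of the all-pairs loop; the plain loop is kept here for readability.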

To obtain error estimates on the calculated residue distributions, we divided the data sets into blocks of 100 structures and calculated block averages and standard deviations. The choice of block size was based on the dependency of the error estimates on the block size for the different residues (see Supporting Information Figure S1). Error estimates were only calculated for data sets with more than 1000 structures and were not calculated for the smaller subsets that did not contain enough structures for proper statistics.

Magnesium was found to be the most predominant metal cation in the first shell; therefore, the analysis was extended to counting the number of O, N, and S atoms within 3.5 Å of Mg²⁺ ions.
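The block-averaging error estimate described in this section can be illustrated as follows. This is a minimal sketch under assumptions the text does not settle: the function name `block_average` is hypothetical, an incomplete trailing block is simply discarded, and the reported spread is the sample standard deviation of the block means (the text does not say whether a standard deviation or a standard error was plotted).

```python
import numpy as np

def block_average(values, block_size=100):
    """Mean of block means and their sample standard deviation.

    `values` holds one number per structure, for example 1.0 if a given
    residue type occurs in the first shell of that structure and 0.0 otherwise.
    """
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two full blocks for an error estimate")
    # Drop the incomplete trailing block (an assumption of this sketch).
    blocks = values[: n_blocks * block_size].reshape(n_blocks, block_size)
    block_means = blocks.mean(axis=1)
    return block_means.mean(), block_means.std(ddof=1)

# Synthetic example: 1000 structures, residue present in roughly 30% of them.
rng = np.random.default_rng(0)
presence = (rng.random(1000) < 0.3).astype(float)
mean, spread = block_average(presence, block_size=100)
print(f"frequency = {mean:.3f} +/- {spread:.3f}")
```

Repeating the calculation for several block sizes and checking where the spread levels off is the usual way to justify a block size such as the 100 structures used here.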

■ RESULTS

First Shell Residue Interactions. Out of the 85 848 structures present in the PDB on November 5, 2012, 19 604 matched the structural features searched for in this study. Out of these structures, only structures binding to the six groups of phosphorus compounds listed in Table 1 were selected. The complete data set comprised 8307 structures in total, of which 8240 were determined by X-ray crystallography, 27 by NMR, and 40 by electron microscopy.

We start out by analyzing the first shell amino acid occurrence pattern for the entire data set, which is shown in Figure 2. The bar graph shows the percentage of the various amino acids that participate in first shell recognition of phosphate groups in the data set. The high occurrence of Gly residues is in agreement with Gly-rich loops being important binding motifs for phosphate groups, for example, the consensus sequence of the so-called P-loop, which is commonly found in ATP and GTP binding proteins.17,18 The polar residues Ser and Thr are also strongly represented, both through backbone and side chain interactions. The frequency of Tyr is found to be about 70% lower than that of Ser and Thr, despite the polar properties of its hydroxyl group. Steric factors likely lead to a preference for the smaller Ser and Thr over the bulkier Tyr residue.19 In cases where the binding site is close to the surface, the hydrophobicity of the phenyl group in Tyr would also make it unfavorable and thereby explain its lower frequency. The positively charged amino acids Lys and Arg both have a high occurrence, which is expected because of the anionic nature of the phosphate moieties. Finally, the side chains of Asp and Glu also seem to be of importance for binding. Their role can, on one hand, be attributed to ion pairing with cations cobound with the phosphate moiety, but it has also been argued that they act to form a negative environment in the binding site suitable for discriminating between different protonation states of phosphate groups.18,20

Figure 1. Phosphate ion bound in a protein. Phosphate is bound to Arg in the first solvation shell (black dashed line), which in turn interacts with Glu in the second shell (pink dashed line). These lines represent only two interactions out of many other side chain and backbone interactions between the protein and phosphate (PDB ID: 3FWP).

Table 1. How the Investigated Data Set Is Separated into Different Classes in Terms of Which Phosphorus Compounds the Individual Structures Bind

| phosphorus compound | PO₄³⁻ | NMP | NDP | NTP | pyrophosphate | NAD/FAD |
| --- | --- | --- | --- | --- | --- | --- |
| PDB files | 2812 | 537 | 1809 | 1022 | 162 | 1965 |

The bulky nonpolar residues were generally found to have low interaction frequencies, which agrees with earlier observations.21 Finally, we note that water molecules are a predominant component of the first shell for all phosphorus compounds. The high frequency of water molecules reflects the fact that desolvation of phosphate requires a substantial amount of energy (e.g., the Gibbs energy of formation of the phosphate ion in solution is 243 kcal/mol).22

The results presented in Figure 2 are in excellent agreement with what was found earlier by Hirsch et al. (2007) despite the fact that their data set was limited to phosphate groups bound to C atoms (i.e., they excluded structures binding free phosphate ions).3 Thus, the observed pattern seems to be a general feature of proteins that bind phosphorus compounds.

Second Shell Residue Interactions. We now analyze the complete distribution of second shell residues in the entire data set, which is shown in Figure 3. The number of residues in the second shell is generally lower than in the first shell, and the average ratio of first shell to second shell residues for the entire set was found to be 1.64:1. This means that not all first shell residues have a second shell partner in the protein matrix, which is consistent with what has been found for metal binding sites.8 Similarly to the first shell, the distribution of amino acids in the second shell shows a high occurrence of Gly residues. The positive amino acids Arg and Lys are much less frequent in the second shell than in the first shell, and instead, a higher frequency of the negative amino acids Asp and Glu is observed. This suggests that Asp and Glu in the second shell play stabilizing roles, mainly through charge−charge and charge−dipole interactions with first shell residues, which will lower the conformational entropy of the first shell residues and aid the formation of a preorganized binding site.

To obtain more information about the distribution seen in Figure 3, we turn to analyzing the distribution of second shell residues around some of the most predominant first shell residues in the presence and absence of cations (see Figure 4). Out of the 8307 structures in the data set, 1661 structures contain phosphate groups complexed directly with metal cations in the first shell, and an additional 486 structures contain metal cations only in the second shell of the phosphorus compound. This implies that approximately 26% of the structures in the data set contain one or more cations bound in close proximity to the phosphorus compound, which is consistent with what has previously been observed for phosphate-binding structures.3
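The bookkeeping behind the quoted numbers is easy to verify. The short check below uses only counts stated in the text (the six group sizes, the total of 8307 structures, and the 1661 and 486 cation-containing structures); the variable names are ours.

```python
# Group sizes as listed for the six classes of phosphorus compounds.
groups = {
    "phosphate": 2812,
    "NMP": 537,
    "NDP": 1809,
    "NTP": 1022,
    "pyrophosphate": 162,
    "NAD/FAD": 1965,
}
total = sum(groups.values())

# Structures with a metal cation in the first shell, or only in the second shell.
cation_first_shell = 1661
cation_second_shell_only = 486
cation_fraction = (cation_first_shell + cation_second_shell_only) / total

print(total)                      # size of the complete data set
print(f"{cation_fraction:.1%}")   # share of structures with a nearby cation
```

The group sizes add up to the 8307 structures of the full data set, and the cation share comes out just under 26%, in line with the "approximately 26%" quoted above.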

Second Shell Interactions with First Shell Asp/Glu Residues. The most common second shell partner for Asp and Glu is the backbone amide group, which hydrogen bonds with the carboxylate oxygen of the first shell Asp/Glu residues (see Figure 4a−b). The carboxylate oxygens of Asp/Glu are capable of forming salt bridges with Arg/Lys residues in the second shell, which explains the high frequency of these positive residues in Figure 4a−b. The high frequency of second shell Asp/Glu residues interacting with first shell Asp/Glu residues might come as a surprise because one would expect a tendency toward mutual repulsion between the negatively charged side chains. Given a large and diverse data set of proteins from different organisms, the variation in effective pKa values may, however, result in many of the residues being protonated, thus allowing first and second shell Glu and Asp residues to interact through hydrogen bonds or through water-mediated interactions.

Figure 2. Frequency distribution of first shell amino acid residues (one-letter codes) in binding sites for phosphorus compounds for the full data set of 8307 protein structures. Residues with side chains capable of interacting with the phosphate groups are shown twice, with the first histogram bar referring to the side-chain interaction and the second histogram bar referring to the backbone interaction. The residues are grouped and color-coded as follows: gray (apolar), green (polar), blue (basic), red (acidic), and yellow (water). Error bars are based on block averaging of the data set with a block size of 100 structures.

Figure 3. Frequency distribution of second shell amino acid residues (one-letter codes) in binding sites for phosphorus compounds for the full data set of 8307 protein structures. The residues are grouped and color-coded as follows: gray (apolar), green (polar), blue (basic), and red (acidic). Error bars are based on block averaging of the data set with a block size of 100 structures.


Additionally, the high frequency of Asp/Glu residues in Figure 4a−b can partly be explained by the presence of metal cations, such that both first shell and second shell Asp/Glu residues are stabilized by interaction with metal cations (see Supporting Information Figure S3); in such cases, both residues interact with the cation and not with each other. The present method of mining a large database of structures is inherently susceptible to noise from special cases such as this, where first shell and second shell partners are counted only because of their proximity to each other and not because an interaction is taking place.

The high frequency of negatively charged residues is consistent with previous observations that phosphate-binding proteins use negatively charged binding cavities to discriminate between substrates.20 For example, in the case of phosphate-binding protein (PBP), the possibility for a hydrogen bond between the dibasic phosphate ion (HPO₄²⁻) and an Asp residue results in PBP being able to bind the phosphate ion five orders of magnitude more tightly than the sulfate ion (SO₄²⁻), which is repelled from the negative cavity.23
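To put the phosphate versus sulfate selectivity of PBP on an energy scale, a difference in binding strength can be converted to a free energy difference with ΔΔG = RT ln(ratio). The sketch below is a back-of-the-envelope illustration, not a value reported in the paper: it reads "five orders of magnitude" as a ratio of 10⁵ between the two dissociation constants and assumes a temperature of 298.15 K; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_affinity_ratio(ratio, temperature=298.15):
    """Free energy difference (kcal/mol) corresponding to an affinity ratio."""
    return R_KCAL * temperature * math.log(ratio)

# Assumed reading of "five orders of magnitude": K_d(sulfate) / K_d(phosphate) = 1e5.
ddg = ddg_from_affinity_ratio(1e5)
print(f"{ddg:.1f} kcal/mol")  # about 6.8 kcal/mol at 298.15 K
```

Under these assumptions, the discrimination amounts to roughly 7 kcal/mol, the order of magnitude one would associate with one or two strong, charge-assisted hydrogen bonds.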

Second Shell Interactions with First Shell Arg/Lys Residues. Having looked at the second shell around the negative residues Asp and Glu, we now turn to the second shell around the positive amino acids Arg and Lys, which are often found in the first solvation shell of binding sites for phosphorus compounds. In Figure 4c−d, it is seen that Arg and Lys predominantly partner with backbone groups in the second shell. As expected, the mutual repulsion between positive residues in the first and second shell means that virtually no Arg or Lys is observed in the second shell. The positive residues of the first shell have a high tendency to interact with negative Glu and Asp residues in the second shell through salt bridges, and thus, along with the backbone, Glu and Asp are seen to constitute the main second shell support for the Arg and Lys residues.

Figure 4. Frequency distribution of second shell residues found bound to different first shell residues: (a) Asp, (b) Glu, (c) Arg, (d) Lys, (e) water, and (f) backbone. The data set is separated into structures with a metal cation in either the first or second shell of the binding site (full color bars) and structures where no metal complexation takes place (dashed bars). All backbone interactions have been pooled together in “BKB”. Bars are color-coded as follows: gray (apolar), green (polar), blue (basic), and red (acidic).

Second Shell Residue Interactions with First Shell Water Molecules. Given that water was found to be a very common first shell ligand (Figure 2), the second shell residue distribution around first shell water molecules was also investigated, and the result is shown in Figure 4e. The main second shell partner of first shell water molecules is the protein backbone, both in the presence and absence of cations. The high frequency of water molecules in the first solvation shell highlights the importance of water in mediating interactions between phosphate groups and the binding site.24,25 The second most frequent second shell partners for water in the first shell are Asp and Glu. The frequencies of these negatively charged residues are largely dictated by the presence or absence of cations (∼30% vs ∼7%), which indicates that the main reason for the presence of Asp and Glu residues in Figure 4e is stabilization of metal cations and not indirect interactions through water molecules with phosphate groups. The frequencies of neutral and positive second shell amino acids around first shell water were found not to be influenced by the presence of cations in the binding site.

Second Shell Interactions with First Shell Backbone Amides. One of the most frequent first shell residues involved in binding of phosphate groups is the amide group (CONH) of the protein backbone. When examining the second shell residues for protein backbone groups in the first shell (see Figure 4f), we observe that these consist almost exclusively of protein backbone residues (i.e., there is a high occurrence of backbone−backbone interactions in the binding sites). For these interactions, the first shell backbone moieties interact predominantly (∼80% of the interactions) via their N−H group with the carbonyl oxygen of the second shell backbone partners (the first shell backbone moieties then interact via their nitrogen lone pair with the phosphorus compound). A similarly high frequency of backbone−backbone interactions is observed in the literature for metal binding sites.8 The reason for the high frequency of backbone moieties in both the first shell and the second shell is likely related to their universality (i.e., a backbone group can in principle partner with any type of residue, acting either as a hydrogen-bond acceptor or donor), and the results show that it is the folding of the protein backbone that provides the main support for the coordination of first shell residues in the binding site.

First and Second Shell Distributions for Different Phosphorus Compounds. So far, our analysis has focused on the first and second shell interactions of the full data set. The data set contained seven different common phosphorus compounds, and we now turn our attention to the differences between these individual compounds and show their first and second shell binding characteristics in Figure 5.

Figure 5. First shell (a) and second shell (b) analysis of different common phosphate compounds showing residues involved in binding. “BKB” refers to all backbone interactions. The moieties are: phosphate (PO4, light blue), nucleoside monophosphates (NMP, green), nucleoside diphosphates (NDP, light green), nucleoside triphosphates (NTP, yellow), pyrophosphate (PP, orange), and coenzymes (FAD and NAD, red).

Looking at the first shell residue distribution of structures binding the phosphate ion (PO₄³⁻) in Figure 5, it is seen that there is a preference for side-chain interactions over backbone interactions. The interactions with the positive residues Lys and Arg are especially important, but a relatively high frequency is also observed for Asp and Glu residues. The Asp and Glu residues are observed despite the relatively low occurrence of cations, indicating that their presence is actually related to phosphate interactions and not only to cation stabilization.

Looking at the first shell for the monophosphates (NMP), their binding characteristics are highly similar to those of the phosphate ion, showing that the attached nucleoside moiety does not change the binding characteristics of the phosphate group. Addition of more phosphate groups, however, such as in NDP and NTP, greatly changes the binding characteristics: when going from NMP to NDP or NTP, an increased preference for Lys and a diminished preference for His and Arg are observed. It has previously been postulated that Lys is important for the stabilization of β- and γ-phosphates.3 It should furthermore be noted that Lys is highly conserved in the consensus P-loop sequence.26 For NDP and NTP, backbone interactions are more frequent than in NMP. Steric factors along with the increased frequency of cobound cations are presumed to be responsible for the shift in distribution of positive residues interacting with the nucleotides.

Looking at the second shell binding characteristics in Figure 5b, it is seen that the residue distributions of the nucleotides are more or less identical to that of the phosphate ion PO₄³⁻ and that no big differences exist between the nucleotides NMP, NDP, and NTP. This is interesting considering the differences in the first shell for these phosphorus compounds observed in Figure 5a. The second shell distribution of the investigated nucleotide-binding structures does not seem to be important for any selectivity toward the number of phosphate groups in the bound nucleotide.

The first shell binding characteristics for pyrophosphate are found to be very similar to those of NMP and PO4, except that the frequency of backbone residues is smaller and there is an increased preference for the positive residue Lys. It is seen in Figure 5 that pyrophosphate binding by proteins is often assisted by metal cations, which is marked by an exceptionally high frequency of Mg²⁺ ions. Consequently, the pyrophosphate-binding structures are also found to contain larger amounts of Asp and Glu residues in the second shell compared to the nucleotide- and PO₄³⁻-binding structures. Another part of the explanation for the presence of Asp and Glu is that pyrophosphate has a pKa of 6.70 (ref 22), meaning that a substantial fraction of it will be protonated at pH 7 and thus capable of interacting with Asp and Glu residues.

The coenzymes NAD and FAD are both dinucleotides, meaning they contain two nucleosides linked by a pyrophosphate moiety. They are often involved in redox reactions (i.e., NAD+/NADH and FAD+/FADH). First shell binding characteristics for NAD and FAD are, however, found to differ from those of pyrophosphate; practically none of the charged amino acids, except for Arg, are found interacting with the pyrophosphate moiety of NAD/FAD, and it is found that the binding is rarely assisted by any cations in the first and second shell of the pyrophosphate moiety. These characteristics (i.e., high Arg frequency and absence of cations) correspond well with investigations of the protein−coenzyme interactions found in the literature.27,28
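The protonation argument made for pyrophosphate above can be quantified with the Henderson–Hasselbalch relation, which gives the protonated fraction of a single titratable group as 1 / (1 + 10^(pH − pKa)). The sketch below uses the pKa of 6.70 quoted in the text; treating the relevant protonation step as an isolated equilibrium in bulk solution is a simplifying assumption, and the function name is ours.

```python
def protonated_fraction(pka, ph):
    """Fraction of a titratable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# pKa value quoted in the text for pyrophosphate, evaluated at neutral pH.
fraction = protonated_fraction(pka=6.70, ph=7.0)
print(f"{fraction:.0%} protonated at pH 7")
```

At pH 7, about one third of the population carries the proton under this simple model, so a sizable share of bound pyrophosphate could act as a hydrogen-bond donor toward Asp or Glu. Effective pKa values inside a binding site may of course differ from the solution value.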

No noteworthy differences in the frequency of water in the first shell were observed for the different phosphorus compounds. Looking at the overall tendencies of Figure 5, it seems that the distribution of second shell residues for different phosphorus compounds is more influenced by the tendency of the structures to cobind cations than by the structure and type of the phosphorus compound. Cations are usually found in the first solvation shell of the phosphorus compound binding sites and, as such, have a stronger influence than phosphate groups on the second shell residues because of their closer proximity. For the first shell residues, it is clear that each phosphorus compound has its own “fingerprint”, which may be more or less unique compared to the other classes, but such characteristic differences are much less apparent for the second shell residues. It is, however, evident from Figure 5b that there is a clear preference for certain residues in the second shell of binding sites. This indicates a general tendency for how the binding sites are preorganized in the proteins and how first shell residues are backed up by second shell residues. Solely on the basis of the number of first and second shell residues counted in the analysis, it is found that ∼60% of the first shell residues are backed up by second shell residues in the protein. Further investigations of individual first shell residues show how secondary residues in the protein back up these residues (see Table 2).
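The ∼60% figure follows directly from the first shell to second shell ratio of 1.64:1 reported for the full data set, and the per-residue propensities are the same kind of ratio computed for one first shell residue type at a time. The sketch below reproduces the first number and shows the propensity definition with made-up counts; the function name and the example counts are hypothetical.

```python
def backup_propensity(n_second_shell_partners, n_first_shell_residues):
    """Second shell partners per first shell residue of a given type."""
    return n_second_shell_partners / n_first_shell_residues

# Overall value from the reported first:second shell ratio of 1.64:1.
overall = backup_propensity(1.0, 1.64)
print(f"{overall:.2f}")  # roughly 0.61, i.e. about 60%

# Illustrative counts only (not from the paper): 490 second shell partners
# found around 1000 first shell residues of one type.
print(backup_propensity(490, 1000))
```

Because several second shell residues can contact the same first shell residue, a propensity defined this way can exceed 1, as it does for water in Table 2.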

Table 2. Propensities for First Shell Residues To Be Backed Up by Second Shell Interactions^a

| first shell residue | Arg | Lys | His | Asp | Glu | BKB | HOH |
|---|---|---|---|---|---|---|---|
| propensity for second shell backup | 0.49 | 0.72 | 0.22 | 0.40 | 0.42 | 0.24 | 1.05 |

^a The propensities are calculated as the total number of second shell residues in the data set interacting with these first shell residues divided by the number of first shell residues of that type.

■ DISCUSSION
From the abundant structural information present in the PDB, we have analyzed the statistics of how proteins in nature bind to phosphorus compounds. A previous study by Hirsch et al. has addressed interactions within the first solvation shell of phosphate group binding sites.3 In this study, we extended their approach by investigating how second shell structural features influence the binding characteristics. Despite our larger (the data set used by Hirsch et al. contained 3003 structures) and less restrictive data set, the first shell binding characteristics were found to be nearly identical to what has been reported by Hirsch et al.

In our analysis, we included all structures in the PDB that bind to specific phosphorus compounds, and the only inclusion criterion when building our database was the presence of a ligand with a phosphate group in the PDB file. A consequence of this liberal approach is that our data set may contain some degree of noise, for example, from the presence of free phosphates that may have cocrystallized in locations that do not represent binding sites simply because of the crystallization conditions used. Despite the fact that our data set included structures where phosphate from the crystallization buffer has simply cocrystallized with the structure, the distribution of first and second shell ligands for the phosphate part of our data set is found to be similar to that of the other phosphorus compounds. All our results must, therefore, be viewed in the light of the fact that they represent large data sets that have been defined only by the presence of certain phosphorus compounds. More accurate residue distribution tendencies are likely to be revealed in data sets constructed by applying more specific structural or functional criteria. Another essential factor related to the data sets investigated here is the variation in the effective pKa values of the individual amino acid residues and phosphorus compounds. The protonation state of these individual chemical groups will depend on the crystallization conditions used, and the experimental conditions thus affect the residue distributions.

Differences in the distributions of first shell residues in the
binding sites for different phosphorus compounds were observed. These differences represent the overall tendency of some phosphorus compounds to prefer certain residues to others. As such, the first shell residue distributions represent a more or less unique “fingerprint” that identifies the type of compound bound. It has previously been shown that further structural classification of the proteins (e.g., by helix-type and nonhelix-type binding sites) reveals even more pronounced differences in residue propensities of the binding site.21 Approximately 26% of the structures in this study contained cations in either the first or second solvation shell. The presence or absence of these cations was found to greatly influence the distribution of charged residues found in the binding sites. At the same time, the occurrence of cations was found to be strongly dependent on the phosphorus compound in question; for example, for the nucleotides, the frequency of all cations follows the trend NTP > NDP > NMP, reflecting the increase in total negative charge of the nucleotides, and virtually no cations (only ∼2% of the structures) were found cobound with the coenzymes NAD and FAD. The propensity to cobind cations thus seems to be highly dependent on the type of compound being bound, and the presence or absence of these cations in turn influences the first shell residue distribution.

Negative amino acids in the second shell were found to play
an important role in stabilizing the first solvation shell of the binding sites. The first solvation shell, on the other hand, contained more positive amino acids, consistent with the phosphate group substrates being anionic. Looking at both the first and second solvation shell, the overall observed high frequency of negative residues is consistent with previous observations for phosphate and sulfate binding proteins, where it is proposed to increase the discrimination between different negative substrates.29 This selectivity occurs because any substrate that does not fit perfectly into the binding cavity will be rejected by the negative environment.20,30 It should be kept in mind that the observed high frequency of negatively charged amino acids is, in part, also caused by cobound cations, which are stabilized through charge−charge interactions with the negative residues.

Throughout all of the second shell distribution profiles presented in Figures 3−5, backbone residues were found to be highly predominant. The explanation for this is likely the universal ability of these residues to interact with a large spectrum of first shell residues through either their carbonyl oxygens or amide protons.8 It was found that first shell water has no clear preference for negative, positive, or neutral second
shell residues (see Figure 4e). This stands in contrast to metalloenzymes, where water is often found to be ionized or polarized by charged residues for subsequent hydrolytic reaction,12,13 which in statistical surveys is seen as a predominant frequency of negative amino acids in the second shell of various metal cations.8 In the case of enzymes that bind phosphorus compounds, the presence of charged residues that polarize water molecules for subsequent hydrolytic reaction has similarly been reported in certain enzymes (e.g., for pyrophosphatase hNUDT5, ref 15, and phosphorylcholine phosphatase, ref 16). When the data set was separated into enzymes and nonenzymes, no significant difference in the frequency of negative or positive amino acids was seen between the two (see Supporting Information, Figure S7). In the enzyme data set, only 28% of the structures contained cobound cations, whereas in the nonenzyme data set, 43% of the structures had cobound cations. The high frequency of Asp and Glu residues in the enzyme data set must therefore be seen in the light of the fact that these residues are present despite a lower frequency of cations, which would otherwise be expected to lower the frequency of Asp and Glu residues. This example of separating the data set into enzymes and nonenzymes shows that one must be careful when extracting information from database surveys; some structural effects and features may simply be diluted or hidden by a large data set. To test whether certain structures were overrepresented in our data set, we performed our analyses on data sets where homologous structures were removed at 30, 50, 70, and 90% similarity cutoffs (see Supporting Information, Figure S8); none of the observed trends changed significantly in these reduced data sets.

Given that the second shell residue distribution profiles for the binding sites of different phosphorus compounds look similar (see Figure 5) and that they are mainly affected by the presence or absence of cations, the second shell does not seem to contribute to the selectivity of the binding site. There is, however, a characteristic distribution of residues in the second shell that is more or less conserved for different compounds. The selectivity for different phosphorus compounds can, thus, be attributed mainly to first shell interactions within the protein. The second shell layer, in general, serves the function of stabilizing or protecting the inner-core structure, and the second shell residues are of such a nature that favorable interactions can occur with the respective first shell residues.
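The argument above rests on comparing residue distribution profiles between compound classes: first shell profiles differ, whereas second shell profiles look similar. One simple way to put a number on such a comparison is the total variation distance between normalized frequency profiles. This metric is an illustrative choice introduced here, not the method used in the study, and all counts below are invented toy values.

```python
def normalize(counts):
    """Convert raw residue counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile contains no residues")
    return {res: n / total for res, n in counts.items()}


def total_variation(counts_a, counts_b):
    """Total variation distance between two residue frequency profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles that share
    no residue types.
    """
    p, q = normalize(counts_a), normalize(counts_b)
    residues = set(p) | set(q)
    return 0.5 * sum(abs(p.get(r, 0.0) - q.get(r, 0.0)) for r in residues)


# Hypothetical counts for two compound classes (A and B); not real data.
first_a = {"Arg": 60, "Lys": 20, "BKB": 20}
first_b = {"Arg": 20, "Lys": 20, "BKB": 60}
second_a = {"BKB": 70, "Asp": 20, "Glu": 10}
second_b = {"BKB": 65, "Asp": 20, "Glu": 15}

d_first = total_variation(first_a, first_b)
d_second = total_variation(second_a, second_b)
print(round(d_first, 3), round(d_second, 3))
```

In this toy case the first shell distance is larger than the second shell distance, which is the qualitative pattern described in the text; with real counts, the same comparison could be repeated for each pair of compound classes.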

■ CONCLUSIONS
In this study, we present a statistical analysis of binding sites for phosphorus compounds in 8307 protein structures selected by their ability to bind specific classes of phosphorus compounds. Of the structures investigated, a remarkably high 74% were found to bind the phosphorus compounds without the assistance of metal ions. Despite the relatively small number of structures where the phosphorus compounds cobind with metal ions, these structures strongly influence the overall binding characteristics of the entire data set, most profoundly by influencing the frequency and distribution of charged residues in both the first and second shell of the binding site. A very characteristic and conserved residue distribution is observed for the second shell, which hints at its importance for stabilizing the binding site. The distributions of first shell residues revealed that some more or less unique binding characteristics may apply to different phosphorus compounds. These characteristics are, however, influenced not only by the properties of the compound but also greatly by the tendency of the given class of protein to cobind with metal ions. The statistical data presented in this study add to the in-depth understanding of how phosphorus compounds are bound by proteins, which is of considerable interest if innovative ways of using biotechnology for phosphorus recovery are to be exploited.

■ ASSOCIATED CONTENT
Supporting Information
Bar graphs with first shell and second shell residue and interaction distance distributions, illustration of phosphate binding site with cobound cation, bar graphs showing first shell residue distributions in the presence and absence of cations and for enzymes and nonenzymes. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION
Corresponding Author
*C. Helix-Nielsen. Tel: +45 60681081, E-mail: [email protected]
Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
Funding
This work was supported by the Danish Agency for Science via a grant to the innovation consortium “Natural Ingredients and New Energy”.
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
P.J.G. was supported by Carlsbergfondet and Familien Hede Nielsen Fond and Fru Vera Hansens Fond. M.F.G. and C.H.N. wish to acknowledge the support for this work through the Innovation Consortium Natural Ingredients and Green Energy (NIGE), with sustainable purification technologies financially supported by Danish Agency for Science, Technology and Innovation.

■ REFERENCES
(1) Elser, J. J. Phosphorus: a Limiting Nutrient for Humanity? Curr. Opin. Biotechnol. 2012, 23, 833−838.
(2) Blank, L. M. The Cell and P: From Cellular Function to Biotechnological Application. Curr. Opin. Biotechnol. 2012, 23, 846−851.
(3) Hirsch, A. K. H.; Fischer, F. R.; Diederich, F. Phosphate Recognition in Structural Biology. Angew. Chem., Int. Ed. Engl. 2007, 46, 338−52.
(4) Vipond, I. B.; Moon, B. J.; Halford, S. E. An Isoleucine to Leucine Mutation That Switches the Cofactor Requirement of the EcoRV Restriction Endonuclease from Magnesium to Manganese. Biochemistry 1996, 35, 1712−21.
(5) Levy, R.; Sobolev, V.; Edelman, M. First and Second Shell Metal Binding Residues in Human Proteins Are Disproportionately Associated with Disease-Related SNPs. Hum. Mutat. 2011, 32, 1309−18.
(6) Maynard, A. T.; Covell, D. G. Reactivity of Zinc Finger Cores: Analysis of Protein Packing and Electrostatic Screening. J. Am. Chem. Soc. 2001, 123, 1047−58.
(7) Dudev, T.; Lim, C. Factors Governing the Protonation State of Cysteines in Proteins: An Ab initio/CDM Study. J. Am. Chem. Soc. 2002, 124, 6759−66.
(8) Dudev, T.; Lin, Y.; Dudev, M.; Lim, C. First−Second Shell Interactions in Metal Binding Sites in Proteins: a PDB Survey and DFT/CDM Calculations. J. Am. Chem. Soc. 2003, 125, 3168−80.
(9) He, Q.; Mason, A. B.; Woodworth, R. C.; Tam, B. M.; Macgillivray, R. T. A.; Grady, J. K.; Chasteen, N. D. Mutations at Nonliganding Residues Tyr-85 and Glu-83 in the N-Lobe of Human Serum Transferrin. J. Biol. Chem. 1998, 273, 17018−17024.
(10) Mertz, P.; Yu, L.; Sikkink, R.; Rusnak, F. Kinetic and Spectroscopic Analyses of Mutants of a Conserved Histidine in the Metallophosphatases Calcineurin and Lambda Protein Phosphatase. J. Biol. Chem. 1997, 272, 21296−302.
(11) DiTusa, C. A.; McCall, K. A.; Christensen, T.; Mahapatro, M.; Fierke, C. A.; Toone, E. J. Thermodynamics of Metal Ion Binding. 2. Metal Ion Binding by Carbonic Anhydrase Variants. Biochemistry 2001, 40, 5345−5351.
(12) Christianson, D. W.; Cox, J. D. Catalysis by Metal-Activated Hydroxide in Zinc and Manganese Metalloenzymes. Annu. Rev. Biochem. 1999, 68, 33−57.
(13) Lipscomb, W. N.; Strater, N. Recent Advances in Zinc Enzymology. Chem. Rev. 1996, 96, 2375−2434.
(14) Ebert, J.; Altman, R. Robust Recognition of Zinc Binding Sites in Proteins. Protein Sci. 2008, 54−65.
(15) Zha, M.; Guo, Q.; Zhang, Y.; Yu, B.; Ou, Y.; Zhong, C.; Ding, J. Molecular Mechanism of ADP-Ribose Hydrolysis by Human NUDT5 from Structural and Kinetic Studies. J. Mol. Biol. 2008, 379, 568−78.
(16) Infantes, L.; Otero, L. H.; Beassoni, P. R.; Boetsch, C.; Lisa, A. T.; Domenech, C. E.; Albert, A. The Structural Domains of Pseudomonas Aeruginosa Phosphorylcholine Phosphatase Cooperate in Substrate Hydrolysis: 3D Structure and Enzymatic Mechanism. J. Mol. Biol. 2012, 423, 503−14.
(17) Bianchi, A.; Giorgi, C.; Ruzza, P.; Toniolo, C.; Milner-White, E. J. A Synthetic Hexapeptide Designed to Resemble a Proteinaceous P-Loop Nest Is Shown to Bind Inorganic Phosphate. Proteins 2012, 80, 1418−24.
(18) Guimaraes, C. R. W.; Rai, B. K.; Munchhof, M. J.; Liu, S.; Wang, J.; Bhattacharya, S. K.; Buckbinder, L. Understanding the Impact of the P-Loop Conformation on Kinase Selectivity. J. Chem. Inf. Model. 2011, 51, 1199−204.
(19) Darby, N. J.; Creighton, T. E. Protein Structure; IRL Press at Oxford University Press: Oxford, U.K., 1993.
(20) Morales, R.; Berna, A.; Carpentier, P.; Contreras-Martel, C.; Renault, F.; Nicodeme, M.; Chesne-Seck, M.-L.; Bernier, F.; Dupuy, J.; Schaeffer, C.; et al. Serendipitous Discovery and X-Ray Structure of a Human Phosphate Binding Apolipoprotein. Structure 2006, 14, 601−9.
(21) Copley, R. R.; Barton, G. J. A Structural Analysis of Phosphate and Sulphate Binding Sites in Proteins - Estimation of Propensities for Binding and Conservation of Phosphate Binding Sites. J. Mol. Biol. 1994, 242, 321−329.
(22) CRC Handbook of Chemistry and Physics, 88th ed.; Lide, D. R., Ed.; CRC Press: Boca Raton, FL, 2007.
(23) Leucke, H.; Quiocho, F. A. High Specificity of a Phosphate Transport Protein Determined by Hydrogen Bonds. Nature 1990, 347, 402−406.
(24) Levy, Y.; Onuchic, J. N. Water Mediation in Protein Folding and Molecular Recognition. Annu. Rev. Biophys. Biomol. Struct. 2006, 35, 389−415.
(25) Baron, R.; McCammon, J. A. Molecular Recognition and Ligand Association. Annu. Rev. Phys. Chem. 2012, 64, 151−175.
(26) Leipe, D. D.; Koonin, E. V.; Aravind, L. Evolution and Classification of P-Loop Kinases and Related Proteins. J. Mol. Biol. 2003, 333, 781−815.
(27) Carugo, O.; Argos, P. NADP-Dependent Enzymes. I: Conserved Stereochemistry of Cofactor Binding. Proteins 1997, 28, 10−28.
(28) Giangreco, I.; Packer, M. J. Pharmacophore Binding Motifs for Nicotinamide Adenine Dinucleotide Analogues Across Multiple Protein Families: A Detailed Contact-Based Analysis of the Interaction Between Proteins and NAD(P) Cofactors. J. Med. Chem. 2013, 56, 6175−89.


(29) Ledvina, P. S.; Yao, N.; Choudhary, A.; Quiocho, F. A. Negative Electrostatic Surface Potential of Protein Sites Specific for Anionic Ligands. Proc. Natl. Acad. Sci. U. S. A. 1996, 93, 6786−91.
(30) Vyas, N. K.; Vyas, M. N.; Quiocho, F. A. Crystal Structure of M. Tuberculosis ABC Phosphate Transport Receptor. Structure 2003, 11, 765−774.
