View
212
Download
0
Category
Preview:
Citation preview
Frozen, but No Accident – Why the 20 Standard
Amino Acids were Selected
Andrew J. Doig
Department of Chemistry and Manchester Institute of Biotechnology, 131 Princess Street,
University of Manchester, Manchester M7 1DN, UK
Email: andrew.doig@manchester.ac.uk
Running Title: Why the 20 Standard Amino Acids were Selected
Keywords: protein structure; evolution; RNA world; protein folding; protein stability
1
Abstract
The 20 standard amino acids encoded by the Genetic Code were adopted during the RNA World,
around 4 billion years ago. This amino acid set could be regarded as a frozen accident, implying
that other possible structures could equally well have been chosen to use in proteins. Amino
acids were not primarily selected for their ability to support catalysis, since the RNA World
already had highly effective cofactors to perform reactions, such as oxidation, reduction and
transfer of small molecules. Rather, they were selected to enable the formation of soluble
structures with close-packed cores, allowing the presence of ordered binding pockets. Factors to
take into account when assessing why a particular amino acid might be used include: its
component atoms, functional groups, biosynthetic cost, use in a protein core or on the surface,
solubility and stability. Applying these criteria to the 20 standard amino acids, and considering
some other simple alternatives that are not used, we find that there are excellent reasons for the
selection of every amino acid. Rather than being a frozen accident, the set of amino acids
selected appears to be near ideal.
2
Introduction
Why the particular 20 amino acids were selected to be encoded by the Genetic Code remains a
puzzle. Is this standard set the optimal choice or would many other amino acids be just as useful?
While we can expect some arbitrary choices, where a selected amino acids is as good as another
not used, how much luck was involved with this choice, leaving us with a frozen or historical
accident [1] that we are now unable to change? Here, I argue that there are excellent reasons for
using (or not using) each possible amino acid and that the set used is near optimal.
The RNA World
Protein synthesis is likely to have arisen following the RNA World [2], approximately 4 billion
years ago, a time between the origin of life and the Last Universal Common Ancestor (LUCA) of
all life on Earth. Life then used RNA, cofactors and metals to perform catalysis and DNA as the
genetic material. The planetary atmosphere was mildly reducing, with no free oxygen, so
metabolism was entirely anaerobic.
Cofactors, such as NAD, flavins, S-adenosyl methionine, pyridoxal phosphate and many
metals, are superb catalysts for a wide range of chemical reactions, notably redox reactions,
transfer of small groups and electron transfer. Their essentiality for metabolism and the presence
of RNA fragments in their structures, suggests that they were present in the RNA World [3-7].
Since redox chemistry and small group transfer is so well covered by cofactors, there was no
need for proteins to perform these functions. Amino acids were instead selected to promote
folding into close-packed structures, forming binding pockets bound by non-polar and charged
groups, and with oriented hydrogen bond donors and acceptors.
3
In terms of the three simple properties of charge, size and hydrophobicity, the selected
amino acids show a high diversity compared to random alternative sets [8, 9]. While a wide
range of these properties is clearly desirable, additional factors must also play a role.
Weber & Miller considered the same question, of why these 20 amino acids, in 1981, making
many insightful points, though largely considering the ease of synthesis of each amino acid,
particularly in prebiotic conditions, and their susceptibility to unwanted chemical reactions [10].
If protein synthesis arose from the RNA world, however, life was already biochemically
sophisticated and the environment was substantially modified from the conditions prevailing
during abiogenesis. Arguments based on prebiotic conditions are thus not especially helpful in
rationalizing amino acid selection. Cleaves discussed the origin of the biologically coded amino
acids in 2010, with his review concentrating on prebiotic syntheses, stability, chirality,
biosynthetic accessibility and cost [11]. Here, I re-evaluate the amino acids, mostly from the
perspective of how they affect protein structure and stability. I will assume that proteins are only
composed of -amino acids, though alternative biopolymers is a fascinating field in itself.
Criteria for Selecting Amino Acids
Choice of Atoms. Are there alternatives to C, H, N, O and S? Amino acids need to be made
of atoms that are abundant on Earth and have useful properties. Atoms such as Selenium and
Antimony might have interesting chemistry, but their rarity precludes their use. Metals are too
soluble in water to be retained in a side chain. Halogens polarise bonds, due to their high
electronegativity, and can be prone to nucleophilic attack to give their ions. Silicon and
phosphorus are likely to be the only plausible additional atoms that could be used. Phosphorus is
not especially abundant, however, and was already heavily used in the RNA World. Protein
phosphorylation is widespread, though this takes place at selected sites only, perhaps to help
4
avoid unnecessary use of this essential atom. Phosphoserine is also easily hydrolysed. Silicon is
abundant, but has a strong tendency to form four bonds to oxygen, rather than any other atom.
Organosilanes appear to have few functional groups that would have a clear advantage over
those formed by carbon.
Functional Groups. The choice of functional groups is rather limited in small
molecules when using only C, H, O, N or S. Amides, amines, hydroxyls, carboxyls and carbon-
nitrogen bonds, present in the standard set, are stable groups that can form hydrogen bonds and
electrostatic interactions. Esters, anhydrides, nitriles and many other groups are too readily
hydrolysed in water. Ketones and aldehydes are too easily reduced or oxidised, and are
susceptible to nucleophilic attack. Carbon-carbon double and triple bonds are also more reactive
than single bonds.
Biosynthetic Cost. Protein synthesis takes a major share of the energy resources of a cell [12].
Table 1 shows the cost of biosynthesis of each amino acid, measured in terms of number of
glucose and ATP molecules required. These data are often non-intuitive. For example, Leu costs
only 1 ATP, but its isomer Ile costs 11. Why would life ever therefore use Ile instead of Leu, if
they have the same properties? Larger is not necessarily more expensive; Asn and Asp cost more
in ATP than their larger alternatives Gln and Glu, and large Tyr costs only two ATP, compared
to 15 for small Cys. The high cost of sulphur containing amino acids is notable.
Burial and Surface. Proteins have close-packed cores with the same density as organic
solids and side chains fixed into a single conformation [13]. A solid core is essential to stabilize
proteins and to form a rigid structure with well-defined binding sites. Non-polar side chains have
therefore been selected to stabilise close-packed hydrophobic cores. Conversely, proteins are
5
dissolved in water, so other side chains are used on a protein surface to keep them soluble in an
aqueous environment.
Solubility. Amino acids need to be soluble in the highly concentrated aqueous environment
inside the cell. The least soluble amino acid at pH 7 in water is Tyr
(https://www.anaspec.com/html/amino_acids_properties.html), so any less soluble than this may
not be acceptable.
Stability. Even with stable functional groups, some amino acids are prone to unwanted
reactions, such as cyclisation or acyl transfer, that can lead to decomposition or racemisation.
Which Amino Acids Came First?
It is plausible that the first proteins used a subset of the 20 and a simplified Genetic Code,
with the first amino acids acquired from the environment. Several studies agree on a consensus
set of these amino acids, comprising Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, and Val [14-17].
It is likely that folded proteins, with some desirable function, can be produced from a subset of
the 20 [15, 18, 19], though this has not yet been demonstrated for this set of 10. For example,
Longo et al., showed that β-trefoil proteins can be produced using just these 10 plus Asn, Gln
and Arg in high salt [20]. Their set is notable for containing amino acids with a high propensity
for -helices (e.g. Ala and Leu) and -sheets (e.g. -branched Val, Ile and Thr), though no
positively charged side chains. Additional amino acids required the evolution of metabolic
pathways, increasing the set to 20. As additional amino acids are added to the code, the
advantage of adding a further amino acid decreases compared to the risk of adding too many
deleterious mutations simultaneously [21].While these 10 might be readily available for the first
proteins, they were not the only amino acids present. The question still remains as to why these
10 were selected from a much larger pool of amino acids.
Energetics of Protein Folding
6
Folded proteins are stabilised by hydrogen bonding, removal of non-polar groups from water
(hydrophobic effect), van der Waals forces, salt-bridges and disulphide bonds. Folding is
opposed by loss of conformational entropy, where rotation around bonds is restricted, and
introduction of strain. These forces are well balanced, so that the overall free energy changes for
all the steps in protein folding are close to zero.
Protein folding can be broken down into three stages (Table 2). Whether a protein actually
folds by this mechanism (though it may well do [22, 23]) does not matter here, as I am simply
showing alternative environments for a side chain. Firstly, an unfolded protein can form isolated
secondary structure. G for forming a poly(Ala) -helix in water is close to zero [24], so the
penalty for restricting the backbone to a helical conformation is nearly equal and opposite to the
benefits of forming amide-amide hydrogen bonds. Secondly, non-polar surfaces on secondary
structural elements meet, excluding water, and forming a fluid, non-polar, liquid-like core.
Formation of these dry molten globules is driven by the hydrophobic effect. Hydrogen bonds
may also strengthen as the dielectric is lower in a non-polar environment. Finally, the molten
globule freezes, forming a folded protein. Side-chains that were able to rotate in the molten
globule are now locked into a single rotamer, costing conformational entropy [25], offset by
improved van der Waals bonding [22]. A rough cost for freezing a side chain is given by the
number of rotatable bonds (Table 1; Fig. 1). Strain is also introduced, as many side chains are
forced to adopt conformations that differ from their most stable state [26-28]. Finally, all stages
are at risk of aggregation, giving useless or even toxic aggregates.
Isolated secondary structure, molten globules and aggregates are potential traps that proteins
must avoid. Protein folding may stall at the isolated secondary structure stage, if non-polar
surface area to be buried is too small, and at the molten globule stage, if the cost of freezing the
7
side chains is too high. Non-polar side chains should therefore have a high surface area to
maximise bonding, while not be too flexible, which would make the loss of conformational
entropy in the freezing step too high. A protein surface must have polar groups to keep the
protein soluble. If -sheets are overly favoured, non-functional and possibly toxic amyloid may
form.
Proteins are also stabilised by hydrogen bonds using side chains, either to other side chains or
the backbone. While a hydrogen bond might be intrinsically strong, formation of the bond
requires fixing the side chain, costing conformational entropy, and strain if the conformation is
non-optimal.
The 20 Standard Amino Acids
Considering the environment 4 billion years ago, and general reasons for choosing side chains,
we can now consider the 20 individually:
Gly Gly is tiny, cheap and has the most flexible backbone. The complete lack of a side chain
means it can adopt conformations unreachable by other amino acids and its lack of side chain
means it can enter narrow spaces.
Ala Ala is small and cheap. Its CH3 side chain has little surface to form van der Waals bonds,
but it has a profound effect on the backbone, restricting it to either the or conformations. It
is therefore more energetically favourable to put Ala into secondary structure than Gly. The
formation of an isolated -helix in water restricts rotations in side chains in all amino acids,
except Ala and Gly [29]. Hence, Ala has the highest preference to be in isolated -helices [30].
Val, Leu, Ile, Phe These hydrocarbon side chains are clearly present to drive protein folding
by forming hydrophobic cores, yet why exactly these were selected deserves an explanation.
Firstly, why are there so many hydrocarbon side chains, rather than just, say, Leu? Multiple
8
hydrocarbon side chains may be used as they are required to fill a protein core with no clashes
and no holes. A variety of pieces are required to fit all the gaps within a protein core. Each can
adopt a number of rotamers with similar energies [31], thus giving a range of possible shapes for
each side chain. Thus Leu and Ile are both needed to increase the range of possible hydrocarbon
3D shapes, despite Ile being more costly to synthesise (Table 1).
Val, Leu, Ile and Phe are striking in that they all have branched carbons, rather than straight
chains. Using a branch gives one fewer dihedral angle to be fixed compared to a straight chain
[32]. Side chains therefore enter a protein core not just because they are hydrophobic, but also
because they do not lose too much conformational entropy when they fold. Similarly, rings are
rigid, so Phe has a large hydrocarbon surface and only two bonds to be fixed.
Hydrocarbon side chains larger than Phe, Ile or Leu may not be used, as they would be less
soluble as amino acids. A cyclohexane ring is also less soluble than a benzene ring [33], so Phe
is used, rather than cyclohexylalanine.
Ile and Thr differ from the other 18 in that they have centres of chirality in their side chains.
Alloisoleucine has the opposite chirality at its C. The selection of isoleucine over alloisoleucine
seems to be chance.
Met The explanation above for why branched side chains are preferred over straight chains
fails for Met. Met has three bonds to be fixed if it is used in a protein core (Table 1), giving less
stabilisation then one might expect for its size. The first use of Met in protein is likely not to
have been for protein stability, however, but rather for initiation of protein synthesis, using the
AUG codon and formylMet. Occasional use of AUG would then allow Met to be found at other
sites in a protein, such as in forming Sulphur-aromatic interactions [34]. It may also be useful in
forming a hydrophobic core with its unique shape. Met is prone to oxidation at its sulphur,
9
potentially leading to loss of protein function, but this would not have occurred before the Great
Oxidation Event around 2.3 billion years. Met is also the most expensive amino acid to
biosynthesise (Table 1). Life is now stuck with this non-ideal amino acid.
Lys, Arg Lys and Arg are used to give a protein solubility and for catalysis. They are rarely
buried, not only because they have positively charged groups, but also because they are on the
ends of long, flexible, straight chains, with four rotatable bonds. Their functional groups are also
valuable when a positive charge is needed. Arg, in particular, is a superb hydrogen bond donor,
with five NH bonds, polarised due to its delocalised positive charge.
Non-polar residues, such as Leu and Ile, are also often found on the protein surface [35]. One
might expect this to be a problem for the protein, as they will reduce protein solubility. If there
are Lys or Arg side chains nearby, however, non-polar groups can form hydrophobic interactions
with the CH2 groups in the Lys and Arg side chains [36]. Non-polar side chains would thus be
less tolerated on the surface if charged groups had fewer CH2 groups.
Ser, Thr, Asn, Gln, Tyr These amino acids frequently hydrogen bond to other side chains,
amide groups in the backbone, to substrates or to ligands. The intrinsic benefit of hydrogen bond
formation is offset by the conformational entropic cost of fixing the side chain in position and
strain. Functional groups selected for hydrogen bonding are thus on the ends of short chains,
with just one or two CH2 groups, so that the cost of restricting them is low and their biosynthetic
cost is minimised.
The aromatic ring in Tyr lowers the pKa of its OH, making it easier to form an O - group and
act as an acid. Tyr can also form functional radicals, such as in ribonucleotide reductase,
photosystem II and prostaglandin H synthase [37], and transport electrons in pili [38].
10
Allothreonine, with the opposite chirality at its C, decomposes twice as fast [39], so threonine
may have been selected for this reason.
Asp, Glu The carboxyl group is stable, has a negative charge for strong electrostatic bonds,
binds metals, is an excellent hydrogen bond acceptor, increases protein solubility and can
transfer protons. Its two-fold symmetry means that the entropic cost of fixing the carboxyl group
is low.
Since Asp and Glu (and Asn and Gln) have the same functional groups, why do we have two
pairs of these amino acids? At some sites Asp and Glu (or Asn and Gln) can readily substitute for
each other. However, this is not always the case. The first residue preceding the -helix is the N-
cap; Asp and Asn are strongly favoured here since they can accept hydrogen bonds from free
backbone NH groups [40, 41]. Glu and Gln cannot do this, as their extra CH2 group pushes their
functional groups out beyond the helix terminus. Similarly, at the second (N2) position of an -
helix, Gln and Glu are frequently found since they can loop around and hydrogen bond to the
backbone NH groups [42]. Asn and Asp are too short to form these rings. Glu to Asp or Gln to
Asn can sometimes thus be very non-conservative substitutions, so all four are used.
Cys A key function of proteins is to bind metals. Soft metals, such as copper, zinc and
cadmium, bind more tightly to sulphur than to oxygen. Cys may therefore have been selected for
its metal binding properties, despite its high biosynthetic cost (Table 1). In particular, Cys is
commonly used to bind iron-sulphur clusters. These cofactors are found in a wide variety of
metalloproteins, playing crucial roles in metabolism and are very ancient [43]. The SH side chain
is also an effective nucleophile; it has a lower pKa than OH, readily forming S-.
Cys is the only side chain able to perform redox reactions, by forming disulphide bonds to
stabilise folded proteins. Cys was selected in an anaerobic environment, however, where
11
disulphide bond formation would have been rare or non-existent. It is therefore not plausible than
Cys was selected for its ability to form disulphides, and its subsequent use for this purpose,
starting very approximately 1.5 billion years later after the Great Oxidation Event, is a lucky
accident.
His While the range of chemical reactions available to the 20 side chains is dismally poor,
they are good at proton transfer. Rates of proton transfer are maximised when the general acid
has a pKa that is the same as the environment. His, with a pKa of around 6.5, that is easily
tweaked by varying its environment, is an excellent side chain for general acid and base
catalysis, and is thus abundantly found in active sites requiring proton transfer. It is also often
found binding metals.
Pro The unique structure of Pro, with no backbone NH group and its ring restricting its
backbone, means that it is incompatible with many sites in a protein. Nevertheless, it can be
highly stabilising, if its ring locks it into a desired conformation, such as within turns. Indeed, it
has the lowest conformational entropic cost of folding of any amino acid. Pro is the simplest
stable ring structure linking the C and N in an amino acid.
Trp Trp with its double aromatic ring is the least abundant amino acid and the most expensive
to synthesise. It has some distinct properties: Trp, with an absorption maximum at around 280
nm, is well suited to be a UV-B chromophore [44]. The UVR8 protein uses an excitonically
coupled Trp pyramid for this purpose in plants, for example [45], detecting potentially damaging
levels of UV radiation. The Trp indole chromophore may therefore have been selected to help
protect organisms from the intense UV radiation that was present on early Earth in the absence of
an ozone layer.
12
Radical formation is often seen as problematic, as it can lead to oxidative damage, and Trp is
most susceptible to radical formation [46]. This ability to form radicals can be used by proteins
for electron transfer, with chains of Trp used to transfer electrons across a protein [47, 48].
Amino Acids Not Selected
We can now consider why some amino acids were not selected (Fig. 2), even though they are
unlikely to be difficult or expensive to synthesise.
D-amino acids Mirror image D-amino acids can adopt unusual backbone conformation to
drive the formation of specific folded structures, such as turns and helix terminators [49].
Nevertheless, D-amino acids have steric clashes with the backbone when in secondary structure.
Proteins must therefore be either all L or all D so that these helices and sheets can form, and this
choice appears to be arbitrary.
Disubstituted at C All 20 amino acids have a hydrogen at the C, but it is possible to
have a carbon here instead. The simplest of these disubstituted amino acids is -aminoisobutyric
acid, with two methyl groups attached to the C. This additional side chain restricts the
conformation of the backbone to a helix, with the rare 310-helix favoured over the more common
-helix [50]. Disubstituted amino acids may therefore not be used, due to this lack of flexibility.
Trisubstituted at C If the H attached to the C in Val is replaced by another CH3, we
would have a C(CH3)3 side chain, giving t-leucine. This would be very hydrophobic for just one
rotatable bond. However, side chain selection is a balance between its stabilising effect on a
protein and the ability to adopt a range of conformations and a quaternary carbon would be
highly restrictive.
Straight Chain Hydrocarbons It is remarkable that the simplest hydrocarbon side chain
after Ala, CH2CH3 (α-aminobutyric acid or homoalanine), is not used. Similarly, norvaline and
13
norleucine, which have CH2CH2CH3 and CH2CH2CH2CH3, respectively, are not used, despite
each having a similar hydrophobicity to their isomers. However, as explained above, they cost
more conformational entropy to fold than their branched isomers. If Ala is extended to α-
aminobutyric acid, it gains an additional rotamer to be fixed; extending it further to Val gives
extra non-polar surface without adding any more bonds to be fixed. Hence, Val is favoured over
α-aminobutyric acid. Straight chain hydrocarbons favour isolated -helix formation [51], so
would promote the formation of solvent exposed helices or dry molten globules, not folded
proteins. Similarly, lipids use long straight chains to ensure that membranes remain fluid.
Norvaline and norleucine are also prone to misincorporation in proteins, in place of leucine or
methionine, respectively [52].
Short Chain Amines The amine group NH3+ is found on the end of a long chain of four
CH2 groups. Shorter chains, notably CH2NH3+, would have the same functional group and cost
less to biosynthesise. The Lys side chain has been selected to be on the protein surface, driven
there, not just by the presence of the highly polar group, but also by the amine being on the end
of a long straight chain, which maximises the number of bonds to be fixed if the side chain was
buried. The CH2NH3+ side chain is not used since it is conflicted - its lack of carbons means that
it would have few rotamers to fix, thus encouraging its burial, while the amine group would
prefer the surface. Amino acids with amine side chains are also prone to acyl transfer and
lactamisation (Fig. 3), with the likelihood of these unwanted reactions decreasing with chain
length [53, 54]. Arginine analogues with one or two CH2 groups can also cyclise (Fig. 4) [10].
Long Chain Carboxyls Carboxyls are found with just one or two CH2 groups, and not on
the end of long straight chains. Asp and Glu are used to hydrogen bond or transfer protons. If
they were on the ends of long chains, it would cost more conformational entropy for them to fold
14
into a single conformation for hydrogen bonding. Long chain carboxyls could play a similar role
to Lys with solubility, but having two polar atoms in the carboxyl group, compared to one in an
amine, makes the carboxyl more suitable for hydrogen bonding than the amine.
Functional Groups Bonded to C Functional groups are never bonded directly to the
C; thus we see a CH2 between a hydroxyl, carboxyl, thiol, amide, indole, phenyl or imidazole
group and the C. Large groups attached to the C, such as indole, phenyl and imidazole greatly
restrict the allowed conformations of the backbone. Aromatic or carboxyl groups attached to the
C will allow easy racemisation at the C, as proton loss here is facilitated by the aromatic
rings. Amino acids with OH or SH attached to the C are unstable. In general, a CH2R side chain
is most often preferred, since it allows and structure, is not too expensive, is stable and does
not lose too much conformational entropy when folded.
Conclusion
While we will never know for sure what happened 4 billion years ago, many of the ideas
discussed here are testable, principally by introducing unnatural amino acids into proteins or only
using a subset of amino acids. For example, it may be impossible, or at least much more difficult,
to make folded, functional proteins that use only straight chain hydrocarbon side chains, lack
short chain hydrogen bonding groups, or use short chain positive side chains instead of Lys or
Arg.
There are excellent reasons for the choice of every one of the twenty amino acids and the non-
use of other apparently simple alternatives. If all else fails, one can resort to chance or a “frozen
accident”, as an explanation. The only true frozen accident may be using isoleucine rather than
alloisoleucine. I therefore believe that when we study the xenobiochemistry of life on other
15
planets, or if we reran the events at the end of the RNA World, alien organisms will be using
much the same group of amino acids that we find here on Earth.
Acknowledgements
I thank my colleagues Simon Lovell, Alex Jones, Douglas Kell, Pedro Mendes, Andy Munro and
Simon Webb for many helpful discussions.
References
1. Crick, F. H. C. (1968) The Origin of the Genetic Code, J Mol Biol. 38, 367-379.2. Pressman, A., Blanco, C. & Chen, I. A. (2015) The RNA World as a Model System to Study the Origin of Life, Current Biology. 25, R953-R963.3. White, H. B. (1976) Coenzymes as Fossils of an Earlier Metabolic State, J Mol Evol. 7, 101-104.4. Jadhav, V. R. & Yarus, M. (2002) Coenzymes as coribozymes, Biochimie. 84, 877-888.5. Saran, D., Frank, J. & Burke, D. H. (2003) The tyranny of adenosine recognition among RNA aptamers to coenzyme A, BMC Evol Biol. 3.6. Denessiouk, K. A., Rantanen, V. V. & Johnson, M. S. (2001) Adenine recognition: A motif present in ATP-, CoA-, NAD-, NADP-, and FAD-dependent proteins, Proteins-Structure Function and Genetics. 44, 282-291.7. Chen, X., Li, N. & Ellington, A. D. (2007) Ribozyme catalysis of metabolism in the RNA world, Chem Biodivers. 4, 633-655.8. Philip, G. K. & Freeland, S. J. (2011) Did Evolution Select a Nonrandom "Alphabet" of Amino Acids?, Astrobiology. 11, 235-240.9. Ilardo, M., Meringer, M., Freeland, S., Rasulev, B. & Cleaves, H. J. (2015) Extraordinarily Adaptive Properties of the Genetically Encoded Amino Acids, Sci Rep. 5.10. Weber, A. L. & Miller, S. L. (1981) Reasons for the Occurrence of the Twenty Coded Protein Amino Acids, J Mol Evol. 17, 273-284.11. Cleaves, H. J. (2010) The origin of the biologically coded amino acids, J Theor Biol. 263, 490-498.12. Akashi, H. & Gojobori, T. (2002) Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis, Proceedings of the National Academy of Sciences of the United States of America. 99, 3695-3700.13. Richards, F. M. (1977) Areas, Volumes, Packing, and Protein-Structure, Annual Review of Biophysics and Bioengineering. 6, 151-176.14. Brooks, D. J., Fresco, J. R., Lesk, A. M. & Singh, M. (2002) Evolution of amino acid frequencies in proteins over deep time: Inferred order of introduction of amino acids into the genetic code, Mol Biol Evol. 19, 1645-1655.
16
15. Doi, N., Kakukawa, K., Oishi, Y. & Yanagawa, H. (2005) High solubility of random-sequence proteins consisting of five kinds of primitive amino acids, Protein Eng Des Sel. 18, 279-284.16. McDonald, G. D. & Storrie-Lombardi, M. C. (2010) Biochemical Constraints in a Protobiotic Earth Devoid of Basic Amino Acids: The "BAA(-) World", Astrobiology. 10, 989-1000.17. Longo, L. M. & Blaber, M. (2012) Protein design at the interface of the pre-biotic and biotic worlds, Arch Biochem Biophys. 526, 16-21.18. Davidson, A. R., Lumb, K. J. & Sauer, R. T. (1995) Cooperatively Folded Proteins in Random Sequence Libraries, Nature Structural Biology. 2, 856-864.19. Riddle, D. S., Santiago, J. V., BrayHall, S. T., Doshi, N., Grantcharova, V. P., Yi, Q. & Baker, D. (1997) Functional rapidly folding proteins from simplified amino acid sequences, Nature Structural Biology. 4, 805-809.20. Longo, L. M., Lee, J. & Blaber, M. (2013) Simplified protein design biased for prebiotic amino acids yields a foldable, halophilic protein, Proceedings of the National Academy of Sciences of the United States of America. 110, 2135-2139.21. Higgs, P. G. (2009) A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code, Biol Direct. 4.22. Baldwin, R. L., Frieden, C. & Rose, G. D. (2010) Dry molten globule intermediates and the mechanism of protein unfolding, Proteins. 78, 2725-2737.23. Baldwin, R. L. & Rose, G. D. (2013) Molten globules, entropy-driven conformational change and protein folding, Curr Opin Struct Biol. 23, 4-10.24. Scholtz, J. M., Marqusee, S., Baldwin, R. L., York, E. J., Stewart, J. M., Santoro, M. & Bolen, D. W. (1991) Calorimetric determination of the enthalpy change for the -helix to coil transition of an alanine peptide in water, Proc Nat Acad Sci USA. 88, 2854-8.25. Doig, A. J. & Sternberg, M. J. (1995) Side-chain conformational entropy in protein folding, Prot Sci. 4, 2247-51.26. Maiorov, V. & Abagyan, R. (1998) Energy strain in three-dimensional protein structures, Fold Des. 3, 259-269.27. Penel, S. & Doig, A. J. (2001) Rotamer strain energy in protein helices - Quantification of a major force opposing protein folding, J Mol Biol. 305, 961-968.28. Ventura, S., Vega, M. C., Lacroix, E., Angrand, I., Spagnolo, L. & Serrano, L. (2002) Conformational strain in the hydrophobic core and its implications for protein folding and design, Nature Structural Biology. 9, 485-493.29. Creamer, T. P. & Rose, G. D. (1992) Side-chain entropy opposes -helix formation but rationalizes experimentally determined helix-forming propensities, Proc Nat Acad Sci USA. 89, 5937-41.30. Rohl, C. A., Chakrabartty, A. & Baldwin, R. L. (1996) Helix propagation and N-cap propensities of the amino acids measured in alanine-based peptides in 40 volume percent trifluoroethanol, Prot Sci. 5, 2623-37.31. Lovell, S. C., Word, J. M., Richardson, J. S. & Richardson, D. C. (2000) The penultimate rotamer library, Proteins: Struc Funct Genet. 40, 389-408.32. Doig, A. J. (1996) Thermodynamics of amino acid side-chain internal rotations, Biophys Chem. 61, 131-141.33. McAuliffe, C. (1966) Solubility in Water of Paraffin, Cycloparaffin, Olefin, Acetylene, Cycloolefin, and Aromatic Hydrocarbons, J Phys Chem. 70.
17
34. Valley, C. C., Cembran, A., Perlmutter, J. D., Lewis, A. K., Labello, N. P., Gao, J. & Sachs, J. N. (2012) The Methionine-aromatic Motif Plays a Unique Role in Stabilizing Protein Structure, Journal of Biological Chemistry. 287, 34979-34991.35. Chothia, C. (1976) The nature of the accessible and buried surfaces in proteins, J Mol Biol. 105, 1-14.36. Andrew, C. D., Penel, S., Jones, G. R. & Doig, A. J. (2001) Stabilising non-polar/polar side chain interactions in the -helix, Proteins: Struct Funct Genet. 45, 449-455.37. Pedersen, J. Z. & Finazziagro, A. (1993) Protein-Radical Enzymes, Febs Letters. 325, 53-58.38. Vargas, M., Malvankar, N. S., Tremblay, P. L., Leang, C., Smith, J. A., Patel, P., Synoeyenbos-West, O., Nevin, K. P. & Lovley, D. R. (2013) Aromatic Amino Acids Required for Pili Conductivity and Long-Range Extracellular Electron Transport in Geobacter sulfurreducens, mBio. 4.39. Schroeder, R. A. & Bada, J. L. (1977) Kinetics and Mechanism of the Epimerization and Decomposition of Threonine in Fossil Foraminifera, Geochim Cosmochim Acta. 41, 1087-1095.40. Serrano, L. & Fersht, A. R. (1989) Capping and -helix stability, Nature. 342, 296-9.41. Doig, A. J. & Baldwin, R. L. (1995) N- and C-capping preferences for all 20 amino acids in -helical peptides, Prot Sci. 4, 1325-36.42. Penel, S., Hughes, E. & Doig, A. J. (1999) Side-chain structures in the first turn of the -helix, J Mol Biol. 287, 127-143.43. Wachtershauser, G. (1992) Groundworks for an Evolutionary Biochemistry - The Iron Sulfur World, Prog Biophys Mol Biol. 58, 85-201.44. Fritsche, E., Schafer, C., Calles, C., Bernsmann, T., Bernshausen, T., Wurm, M., Hubenthal, U., Cline, J. E., Hajimiragha, H., Schroeder, P., Klotz, L. O., Rannug, A., Furst, P., Hanenberg, H., Abel, J. & Krutmann, J. (2007) Lightening up the UV response by identification of the arylhydrocarbon receptor as a cytoplasmatic target for ultraviolet B radiation, Proceedings of the National Academy of Sciences of the United States of America. 104, 8851-8856.45. Christie, J. M., Arvai, A. S., Baxter, K. J., Heilmann, M., Pratt, A. J., O'Hara, A., Kelly, S. M., Hothorn, M., Smith, B. O., Hitomi, K., Jenkins, G. I. & Getzoff, E. D. (2012) Plant UVR8 Photoreceptor Senses UV-B by Tryptophan-Mediated Disruption of Cross-Dimer Salt Bridges, Science. 335, 1492-1496.46. Hawkins, C. L. & Davies, M. J. (2001) Generation and propagation of radical reactions on proteins, Biochim Biophys Acta-Bioenerg. 1504, 196-219.47. Park, H. W., Kim, S. T., Sancar, A. & Deisenhofer, J. (1995) Crystal-Structure of DNA Photolyase from Escherichia-Coli, Science. 268, 1866-1872.48. Chaves, I., Pokorny, R., Byrdin, M., Hoang, N., Ritz, T., Brettel, K., Essen, L. O., van der Horst, G. T. J., Batschauer, A. & Ahmad, M. (2011) The Cryptochromes: Blue Light Photoreceptors in Plants and Animals in Annual Review of Plant Biology, Vol 62 (Merchant, S. S., Briggs, W. R. & Ort, D., eds) pp. 335-364, Annual Reviews, Palo Alto.49. Mahalakshmi, R. & Balaram, P. (2006) The Use of D-Amino Acids in Peptide Design in D-Amino Acids: A New Frontier in Amino Acid and Protein Research - Practical Methods and Protocols (Konno, R., Brückner, H., D'Aniello, A., Fisher, G. H., Noriko, F. & Homma, H., eds) pp. 415-430, Nova Science Publishers.50. Karle, I. L. & Balaram, P. (1990) Structural characteristics of -helical peptide molecules containing Aib residues, Biochemistry. 29, 6747-6756.51. Padmanabhan, S. & Baldwin, R. L. (1991) Straight-chain non-polar amino acids are good helix-formers in water, J Mol Biol. 219, 135-7.
18
52. Alvarez-Carreno, C., Becerra, A. & Lazcano, A. (2013) Norvaline and Norleucine May Have Been More Abundant Protein Components during Early Stages of Cell Evolution, Orig Life Evol Biosph. 43, 363-375.53. Poduska K, K. G., Silaev AB, Rudinger, J. (1965) Amino acids and peptides. LII. Intramolecular aminolysis of amide bonds in derivatives of α,γ-diaminobutyric acid, α,β-diaminopropionic acid, and ornithine, Collect Czech Chem Comm. 30, 2410-2433.54. Hettinge.Tp & Craig, L. C. (1970) Edeine. 4. Structures of Antibiotic Peptides Edeine-A1 and Edeine-B1, Biochemistry. 9, 1224-&.55. Phillips, R., Kondev, J., Theriot, J. & Garcia, H. G. (2013) Physical Biology of the Cell, 2nd edn, Garland Science.
19
Table 1. Amino Acid Properties
Amino Acid
Molecules per Cell[55]
Biosynthetic Cost in Glucose Equivalents
Anaerobic Biosynthetic Cost in ATP Equivalents [55]
Number of Rotatable Bonds
Ala 2.9 x 10-8 0.5 1 0
Arg 1.7 x 10-8 0.5 13 4
Asn 1.4 x 10-8 0.5 5 2
Asp 1.4 x 10-8 0.5 2 2
Cys 5.2 x 10-7 0.5 15 1
Glu 1.5 x 10-8 0.5 -1 3
Gln 1.5 x 10-8 0.5 0 3
Gly 3.5 x 10-8 0.5 2 0
His 5.4 x 10-7 1 7 2
Ile 1.7 x 10-8 1 11 2
Leu 2.6 x 10-8 1.5 1 2
Lys 2.0 x 10-8 1 9 4
Met 8.8 x 10-7 1 23 3
Phe 1.1 x 10-8 2 2 2
Pro 1.3 x 10-8 0.5 4 0
Ser 1.2 x 10-8 0.5 2 1
Thr 1.5 x 10-8 0.5 8 1
Trp 3.3 x 10-7 2.5 7 2
Tyr 7.9 x 10-7 2 2 2
Val 2.4 x 10-8 1 2 1
20
Table 2. Possible Structures of a Protein
Structure
Unfolded Isolated Secondary Structure
Dry Molten Globule
Folded Aggregated
Secondary Structure
None -Helices and/or -sheets
-Helices and/or -sheets
-Helices and/or -sheets
Low -helix; High -sheet
Properties Side Chains Solvent exposed
Solvent exposed Buried non-polar side chains. Fluid, liquid-like core, with side chains able to rotate
Buried non-polar side chains. Rigid, solid-like core, with side chains locked into a single rotamer
Buried non-polar side chains. Rigid, solid-like core, with side chains locked into a single rotamer
Main Thermodynamic Changes upon Formation
Loss of conformational entropy offset by hydrogen bonding and water released from polar groups
Loss of conformational entropy offset by gain in entropy of water released from non-polar groups
Gain of van der Waals and hydrogen bonds, offset by loss of conformational entropy and introduction of strain
Loss of conformational entropy offset by gain in entropy of water released from protein surface
21
Figure Legends
Fig. 1. The 20 Standard Amino Acids showing Rotatable Bonds
Fig. 2. Examples of Amino Acids not Selected. A) D-Alanine. B) Norvaline. C) t-Leucine. D) -
aminoisobutyric acid E) diaminopropionic acid. F) Homoglutamic acid
Fig. 3. Acyl Transfer and Lactamisation in Amino Acids with Amines
Fig. 4. Cyclisation of Arg Analogues
22
Recommended