Internal Mirror Symmetry of Nucleotide Sequences in Genes Encoding Different Families of Proteins

A. O. Shpakov

Sechenov Institute of Evolution, Physiology, and Biochemistry, Russian Academy of Sciences, pr. Morisa Toreza 44, St. Petersburg, 194223 Russia

Presented by Academician V.L. Sviderskii September 27, 2000

Received September 27, 2000

Doklady Biochemistry and Biophysics, Vol. 377, 2001, pp. 88–91. Translated from Doklady Akademii Nauk, Vol. 377, No. 2, 2001, pp. 273–276. Original Russian Text Copyright © 2001 by Shpakov.

1607-6729/01/0304-0088 $25.00 © 2001 MAIK “Nauka/Interperiodica”

BIOCHEMISTRY, BIOPHYSICS, AND MOLECULAR BIOLOGY

One of the most important characteristics of the structural organization of protein molecules is their symmetry. It is expressed both at the level of the domain organization of protein molecules [1] and at the level of their primary structure [2, 3]. Internal mirror symmetry has been found in the amino acid sequences of proteins that differ in origin, structure, and function [2–10]. This type of symmetry can therefore be considered a universal characteristic of the primary structure of protein molecules, one that arose at the earliest stages of their evolution [3]. The presence of internal mirror symmetry in the amino acid sequence of a protein implies that it should also exist in the nucleotide sequence encoding this protein. This hypothesis is corroborated by data on internal symmetry in the sequences of the second codon bases of nucleotide sequences [4, 5]. The second codon bases are known to determine the type of amino acid residue encoded by a codon; for this reason, they are also called the “roots” of amino acid codons [11].

The goals of this study were (1) to find internal mirror symmetry in the nucleotide sequences of genes encoding different protein families, using the approach we developed earlier, and (2) to compare the distribution of internal symmetry centers in the nucleotide sequences with that in the amino acid sequences. The study was performed on yeast and human genes. We used the genes of the yeasts Schizosaccharomyces pombe (gpt) and Saccharomyces cerevisiae (alg5, alg6, alg8, and swp1), which code for enzymes of the dolichol cycle. These enzymes are involved in the synthesis of the oligosaccharide precursor for N-glycosylation of endoplasmic reticulum proteins in eukaryotic cells. The human genes studied were those encoding components of the adenylate cyclase signaling system: the β2-adrenoreceptor, the α subunits of the stimulatory (αs) and inhibitory (αi2) G proteins, and adenylate cyclase VI.

The discovery of internal symmetry in nucleotide sequences is of fundamental importance for the molecular biology of nucleic acids, because until now symmetry in DNA and RNA molecules has been considered only in the context of their spatial organization (in particular, the features of DNA double-helix folding or the topology of mRNA palindromes) [12]. This study is a first step toward a comprehensive investigation of internal symmetry at the level of gene primary structure. Elucidating the mechanisms by which symmetric structures arose and were transformed at the first stages of biomolecular evolution, and determining the role of internal mirror symmetry in the functional activity of nucleic acids and proteins, may become a promising line of research into the fundamentals of molecular evolution.

To detect internal symmetry in amino acid sequences, we previously developed and tested the point-matrix method, which is based on a comparative analysis of the direct and reverse amino acid sequences [6, 7], and a method for scanning symmetric segments in the primary structure of proteins [8–10]. After modification for nucleotide sequence analysis, the latter method proved preferable for detecting symmetric segments in nucleotide sequences, especially for comparing the distribution of internal symmetry in nucleotide and amino acid sequences. Its algorithm is shown in Fig. 1. At each position $k$ of a sequence of length $L$, we compare the direct and reverse sequences that start either at the same point with coordinate $k$ ($k = l, l + 1, \ldots, L - l + 1$, where $l$ is the window length; case 1) or at two adjacent points with coordinates $k$ and $k + 1$ ($k = l, l + 1, \ldots, L - l$; case 2). The symmetry center is thus a single nucleotide (single-point site) in case 1 and two adjacent nucleotides (double-point site) in case 2. The measure of proximity between the direct and reverse nucleotide sequences is the sum $S$ of values that score the coincidence or noncoincidence of nucleotides compared pairwise. In case 1, the compared pairs are positions $k + n$ and $k - n$ ($n < l$), giving $S_k$; in case 2, they are positions $k + n + 1$ and $k - n$ ($n < l$), giving $S_{k,k+1}$. The sums are first normalized to the window length and then plotted as symmetry profiles, whose minima correspond to the centers of internal mirror symmetry. Using this algorithm, we analyzed both the full-size nucleotide sequences (frame length 24 nucleotides) and their shortened variants, i.e., the sequences of the first, second, and third codon bases (frame length 12 nucleotides). To compare the distribution of internal symmetry in nucleotide and amino acid sequences, we analyzed the amino acid sequences in parallel (frame length 12 amino acid residues).
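Written out explicitly for a sequence $x_1 x_2 \ldots x_L$, and assuming (the text does not specify the scoring values) that a coinciding pair scores 0 and a noncoinciding pair scores 1, the two sums and their normalized form would be:

$$
S_k = \sum_{n=0}^{l-1} d(x_{k+n}, x_{k-n}), \qquad k = l, l+1, \ldots, L-l+1,
$$

$$
S_{k,k+1} = \sum_{n=0}^{l-1} d(x_{k+n+1}, x_{k-n}), \qquad k = l, l+1, \ldots, L-l,
$$

$$
d(a, b) = \begin{cases} 0, & a = b, \\ 1, & a \neq b, \end{cases} \qquad \tilde{S} = \frac{S}{l}.
$$

The $n = 0$ term of $S_k$ compares position $k$ with itself and always vanishes. The ranges of $k$ are exactly those for which every compared position lies between 1 and $L$, and a smaller $S$ means a more nearly mirror-symmetric window, which is why the symmetry centers appear as minima of the profile.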

The internal symmetry scanning method revealed symmetric segments in both the full-size and the shortened nucleotide sequences of all the genes studied. Figure 2 shows an example of the profiles obtained: the symmetry profiles of the gpt gene for single-point internal symmetry centers. Analysis of the data led us to the following conclusions.
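The scan can be sketched in Python as follows. This is a minimal illustration, not the author's program: it assumes 0/1 mismatch scoring and normalization by the window length $l$, and the names `mirror_profile`, `symmetry_centers`, and `codon_bases` are hypothetical. Centers are taken as local minima of the profile below a user-chosen threshold, because the scale of the thresholds quoted in the results ($S < 15$, $S < 7$) cannot be reconstructed from the text.

```python
def mirror_profile(seq, l, double=False):
    """Normalized mirror-symmetry profile {k: S / l} with 1-based centers k.

    Single-point (double=False): compares positions k + n and k - n,
    n = 0..l-1, for k = l..L-l+1 (the n = 0 pair is a base with itself).
    Double-point (double=True): compares positions k + n + 1 and k - n,
    n = 0..l-1, for k = l..L-l (the center is the pair k, k + 1).
    Assumed scoring: 1 per noncoinciding pair, 0 per coinciding pair.
    """
    seq = seq.upper()
    L = len(seq)
    last = L - l if double else L - l + 1
    shift = 1 if double else 0
    profile = {}
    for k in range(l, last + 1):
        i = k - 1  # 0-based index of position k
        s = sum(seq[i + n + shift] != seq[i - n] for n in range(l))
        profile[k] = s / l
    return profile


def symmetry_centers(profile, threshold):
    """Centers k at which the profile has a local minimum below threshold."""
    ks = sorted(profile)
    centers = []
    for j, k in enumerate(ks):
        left = profile[ks[j - 1]] if j > 0 else float("inf")
        right = profile[ks[j + 1]] if j + 1 < len(ks) else float("inf")
        if profile[k] < threshold and profile[k] <= min(left, right):
            centers.append(k)
    return centers


def codon_bases(seq, position):
    """Shortened sequence of the first, second, or third codon bases."""
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2, or 3")
    return seq[position - 1::3]


if __name__ == "__main__":
    fragment = "gggcccaaattTttaaacccggg"  # Fig. 1(a): mirror-symmetric about the T
    print(symmetry_centers(mirror_profile(fragment, 5), 0.3))
```

For the mirror-symmetric fragment of Fig. 1(a) with $l = 5$, the only center found below 0.3 is position 12, the central T. Applying `codon_bases(seq, 2)` before scanning gives the shortened sequence of second codon bases.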

First, in most cases the full-size and shortened nucleotide sequences differ significantly in the degree of symmetry. For instance, at S < 15, the density of internal symmetry centers in the full-size nucleotide sequences (i.e., the number of centers per 100 nucleotides) varied from 10.8 (the αi2 gene) to 22.7 (the β2-adrenoreceptor gene) for centers formed by one nucleotide, and from 6.1 (the αs gene) to 15.7 (the β2-adrenoreceptor gene) for centers formed by two adjacent nucleotides. A different pattern was observed in the shortened nucleotide sequences, i.e., the sequences of the first, second, and third codon bases (S < 7). For these three sequences, the densities of internal symmetry centers were, respectively, 11.1–16.9, 13.6–21.2, and 13.2–45.1 for centers formed by one nucleotide, as opposed to 5.7–12.0, 5.9–12.7, and 3.5–35.2 for centers formed by two nucleotides. Notably, within the groups of genes encoding closely related proteins (the dolichol-cycle enzymes or the α subunits of G proteins), the range of density values was considerably narrower.
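The density used above is a simple ratio. A sketch with hypothetical function names, assuming the center positions have already been obtained from a scan such as the one in Fig. 1; for the shortened sequences, the length is that of the codon-base sequence (about one third of the gene).

```python
def center_density(centers, seq_length):
    """Number of symmetry centers per 100 positions of the scanned sequence."""
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    return 100.0 * len(centers) / seq_length


def density_extremes(densities):
    """Return the (gene, density) pairs with the lowest and highest density."""
    items = list(densities.items())
    lowest = min(items, key=lambda item: item[1])
    highest = max(items, key=lambda item: item[1])
    return lowest, highest


if __name__ == "__main__":
    # Hypothetical numbers: 27 centers found in a 250-nucleotide sequence.
    print(center_density(list(range(27)), 250))
```

With these made-up numbers the density is 10.8 centers per 100 nucleotides; `density_extremes` then picks the two ends of a range such as those quoted above.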

Second, the distribution of internal symmetry centers in all the nucleotide sequences analyzed was highly irregular. For example, in the full-size nucleotide sequences, more than half of the internal symmetry centers were located in regions accounting for no more than 20% of the gene length.
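The text does not say how the 20% figure was obtained. One possible way to quantify such clustering, sketched below under the assumption that the sequence is cut into equal bins and the densest bins are taken greedily until they hold more than half of the centers, is the hypothetical function `share_of_length_holding_half`:

```python
def share_of_length_holding_half(centers, length, bin_size):
    """Greedy estimate of the share of sequence length holding > 50% of centers.

    centers: 1-based positions of symmetry centers. The sequence is cut into
    consecutive bins of bin_size positions (the last bin may be shorter), and
    bins are taken in order of decreasing center count.
    """
    if not centers:
        return 0.0
    n_bins = -(-length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for k in centers:
        counts[(k - 1) // bin_size] += 1
    covered = 0
    used = 0
    for b in sorted(range(n_bins), key=counts.__getitem__, reverse=True):
        covered += counts[b]
        used += min(bin_size, length - b * bin_size)
        if 2 * covered > len(centers):
            break
    return used / length
```

With eight centers, six of them among the first 10 of 100 positions, the function returns 0.1; evenly spread centers give about 0.6 with the same bins. Smaller bins give a finer but noisier estimate.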

To evaluate how the distribution of internal symmetry in the different nucleotide sequences correlates with that in the amino acid sequences, we compared their symmetry profiles. The correlation was estimated quantitatively by the number of coincidences between internal symmetry centers in the nucleotide and amino acid sequences. The data summarized in the table show that the positive correlation was most pronounced between the distribution of internal symmetry centers in the amino acid sequences and that in the sequences of the second codon bases. This is in good agreement with the hypothesis that the second codon bases determine the type of amino acid residue [11].
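The ratio reported in the table (see its note) can be sketched as below. The mapping of nucleotide positions to residues follows the caption of Fig. 2 (one codon per residue in the full-size sequence, one base per residue in each codon-base sequence); counting only exact positional coincidences, and representing a double-point center by its first position $k$, are assumptions. The function name `coincidence_ratio` is hypothetical.

```python
def coincidence_ratio(nt_centers, aa_centers, nt_per_residue):
    """Share of nucleotide-sequence centers that coincide with protein centers.

    nt_centers: 1-based center positions in a nucleotide sequence.
    aa_centers: 1-based center positions in the encoded amino acid sequence.
    nt_per_residue: 3 for the full-size sequence (one codon per residue),
    1 for the first-, second-, or third-codon-base sequences.
    """
    if not nt_centers:
        return 0.0
    protein_centers = set(aa_centers)
    hits = sum(
        1
        for p in nt_centers
        if (p - 1) // nt_per_residue + 1 in protein_centers  # residue index of p
    )
    return hits / len(nt_centers)
```

For example, full-size centers at nucleotides 2, 8, and 11 fall in codons 1, 3, and 4; if the protein has centers at residues 1 and 3, the ratio is 2/3.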

Earlier, we showed that internal symmetry centers in the primary structures of proteins, as a rule, either coincide with functionally important sites or are located close to them [2, 3]. The similarity found between the symmetry profiles of amino acid sequences and those of the corresponding sequences of second codon bases indicates that, in the latter, the density of internal symmetry centers is highest in the segments encoding functionally important regions of the proteins. These regions are usually the most conserved in nucleotide sequences; therefore, the level of symmetry in the sequence of second codon bases corresponds to the degree to which the primary structure is conserved. This correspondence is illustrated by the following data.

In the genes encoding the enzymes of the dolichol cycle (gpt, alg5, alg6, and alg8), the major cluster of internal symmetry centers is located in the nucleotide sequence segments corresponding to the putative catalytic sites of these enzymes and to the most conserved transmembrane domains. The latter determine the topology of the enzyme molecule in the membrane and participate in binding the hydrophobic polyisoprenoid chain of the substrate of the enzymatic reaction. In the β2-adrenoreceptor gene, symmetric structures are prominent in the nucleotide sequence segments corresponding to the regions of the amino acid sequence responsible for interaction with the α subunit of the G protein. In the genes encoding the α subunits of G proteins, the density of internal symmetry centers is maximal in the nucleotide sequence segments that encode (1) the N-terminal regions of the α subunits, which are responsible for interaction with the βγ dimer; (2) the “triggering” (switch) regions I and III, which form the pocket that binds the guanine nucleotide and are involved in interaction with the effector and the βγ dimer; and (3) the C-terminal regions, which provide functional coupling of the α subunits with the receptor and effector molecules. In the gene encoding type VI adenylate cyclase, clusters of internal symmetry centers are pronounced in the nucleotide sequences encoding regions that are either responsible for interaction with the α subunits of G proteins and involved in stabilizing the functionally active C1/C2 dimer complex, or highly conserved among the transmembrane domains of different adenylate cyclase types.

Fig. 1. Algorithm of scanning nucleotide sequences to reveal (a) single-point and (b) double-point internal symmetry centers. [Figure: in (a), a window with $l = 12$ ($n < l$) around a single-point center $k$ covers positions $k - l + 1, \ldots, k - n, \ldots, k, \ldots, k + n, \ldots, k + l - 1$ of a sequence of length $L$ (5' to 3'), illustrated on the mirror-symmetric fragment 5'…gggcccaaattTttaaacccggg…3'; in (b), a window around a double-point center $k, k + 1$ covers positions $k - l + 1, \ldots, k - n, \ldots, k, k + 1, \ldots, k + n + 1, \ldots, k + l$, illustrated on 5'…gggcccaaattTTttaaacccggg…3'.]

Thus, the analysis of internal symmetry in genes encoding protein families revealed internal mirror symmetry in their primary structure, both in the full-size nucleotide sequences and in the sequences of the first, second, and third codon bases. The density of internal symmetry centers varies significantly from gene to gene, and the centers themselves are distributed very irregularly along the nucleotide sequences. A comparative analysis of the distribution of internal symmetry centers in amino acid sequences and in the different nucleotide sequences demonstrated a positive correlation between the symmetric structures in amino acid sequences and those in the sequences of the second codon bases; this is consistent with the decisive role of the latter in coding for amino acid residues. This correlation, together with the previously established close relationship between symmetry in an amino acid sequence and its functional importance [2, 3], makes it possible to use data on internal symmetry in genes as a predictive method for finding the segments of nucleotide sequences that encode functionally important regions of protein molecules.

Fig. 2. Profiles of the distribution of internal symmetry centers formed by one nucleotide (or, for the amino acid sequence, by one amino acid residue) in (1) the full-size nucleotide sequence and the shortened sequences of (2) the first, (3) the second, and (4) the third codon bases of the gpt gene, and (5) in the amino acid sequence of the GPT protein. The numbers on the horizontal axes correspond to the positions of amino acid residues in the enzyme molecule and to the positions of nucleotides in the shortened nucleotide sequences; in the full-size nucleotide sequence, each amino acid residue corresponds to one codon. [Figure: profiles 1–5 plotted in three panels covering positions 0–150, 150–300, and 300–400.]
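The paper does not specify how such a prediction would be made. One hypothetical way, sketched below, is to slide a window along the sequence of second codon bases, where position $i$ corresponds to codon (and residue) $i$, and to report merged stretches in which the number of symmetry centers reaches a chosen minimum. The function name `dense_segments`, the window size, and the minimum count are illustrative assumptions, not values from this study.

```python
def dense_segments(centers, length, window, min_count):
    """Merged 1-based (start, end) stretches whose windows hold >= min_count centers."""
    positions = sorted(centers)
    segments = []
    for start in range(1, length - window + 2):
        end = start + window - 1
        count = sum(1 for p in positions if start <= p <= end)
        if count < min_count:
            continue
        if segments and start <= segments[-1][1] + 1:
            # Overlapping or adjacent to the previous stretch: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

For centers at positions 3, 4, 5, and 20 of a 30-position sequence, with a window of 5 and a minimum of 3 centers, the only flagged stretch is positions 1–7; in a second-codon-base sequence these would be candidate residues 1–7.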

REFERENCES

1. Blundell, T.L. and Srinivasan, N., Proc. Natl. Acad. Sci. USA, 1996, vol. 93, no. 25, pp. 14243–14248.

2. Shpakov, A.O. and Pertseva, M.N., Usp. Biol. Khim., 1999, vol. 39, pp. 141–186.

3. Shpakov, A.O., Zh. Evol. Biokhim. Fiziol., 2000, vol. 36, no. 6.

4. Shpakov, A.O., Zh. Evol. Biokhim. Fiziol., 1995, vol. 31, nos. 5–6, pp. 519–528.

5. Shpakov, A.O., Zh. Evol. Biokhim. Fiziol., 1996, vol. 32, no. 5, pp. 545–555.

6. Shpakov, A.O., Ukr. Biokhim. Zh., 1997, vol. 69, nos. 5–6, pp. 117–127.

7. Shpakov, A.O., Tsitologiya, 1997, vol. 39, no. 7, pp. 590–600.

8. Shpakov, A.O., Zh. Evol. Biokhim. Fiziol., 1998, vol. 34, no. 5, pp. 539–548.

9. Shpakov, A.O., Tsitologiya, 1998, vol. 40, nos. 2–3, pp. 210–221.

10. Shpakov, A.O., Ukr. Biokhim. Zh., 1998, vol. 70, no. 5, pp. 140–145.

11. Chipens, G.I., Ievinya, N.G., and Tsilinskis, E.E., Bioorg. Khim., 1992, vol. 18, no. 11, pp. 1445–1453.

12. Garcia-Bellido, A., Proc. Natl. Acad. Sci. USA, 1996, vol. 93, no. 25, pp. 14229–14232.

Table. Quantitative estimation of the correspondence between internal symmetry centers in the full-size and shortened nucleotide sequences and in the amino acid sequences they encode

                                  Nucleotide sequences
Protein           Center type     Full-size   First codon   Second codon   Third codon
                                              bases         bases          bases
GPT               single-point    0.119       0.157         0.412          0.098
                  double-point    0.132       0.200         0.415          0.156
ALG5              single-point    0.117       0.163         0.377          0.132
                  double-point    0.078       0.088         0.333          0.151
ALG6              single-point    0.212       0.241         0.396          0.123
                  double-point    0.177       0.233         0.565          0.148
ALG8              single-point    0.133       0.132         0.458          0.101
                  double-point    0.122       0.178         0.424          0.154
SWP1              single-point    0.206       0.089         0.356          0.158
                  double-point    0.197       0.179         0.515          0.100
β-Adrenoreceptor  single-point    0.238       0.329         0.547          0.153
                  double-point    0.216       0.440         0.644          0.128
αs-Subunit        single-point    0.120       0.172         0.320          0.107
                  double-point    0.097       0.143         0.375          0.058
αi2-Subunit       single-point    0.026       0.024         0.208          0.055
                  double-point    0.029       0.083         0.286          0.047
Adenylate cyclase single-point    0.133       0.169         0.458          0.116
                  double-point    0.118       0.089         0.420          0.123

Note: The values are the ratios of the number of coincidences between internal symmetry centers found when comparing the amino acid and nucleotide sequences to the total number of internal symmetry centers in the nucleotide sequence.
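The pattern described in the text can be checked directly against the table. The short script below is a convenience illustration using the values transcribed from the table; for each protein and center type it finds which nucleotide sequence gives the highest ratio, and it computes the mean ratio of each column. The names `TABLE`, `best_sequence_per_row`, and `column_means` are hypothetical.

```python
COLUMNS = ("full-size", "first codon bases", "second codon bases", "third codon bases")

# Ratios from the table: (single-point row, double-point row) for each protein.
TABLE = {
    "GPT": ((0.119, 0.157, 0.412, 0.098), (0.132, 0.200, 0.415, 0.156)),
    "ALG5": ((0.117, 0.163, 0.377, 0.132), (0.078, 0.088, 0.333, 0.151)),
    "ALG6": ((0.212, 0.241, 0.396, 0.123), (0.177, 0.233, 0.565, 0.148)),
    "ALG8": ((0.133, 0.132, 0.458, 0.101), (0.122, 0.178, 0.424, 0.154)),
    "SWP1": ((0.206, 0.089, 0.356, 0.158), (0.197, 0.179, 0.515, 0.100)),
    "β-Adrenoreceptor": ((0.238, 0.329, 0.547, 0.153), (0.216, 0.440, 0.644, 0.128)),
    "αs-Subunit": ((0.120, 0.172, 0.320, 0.107), (0.097, 0.143, 0.375, 0.058)),
    "αi2-Subunit": ((0.026, 0.024, 0.208, 0.055), (0.029, 0.083, 0.286, 0.047)),
    "Adenylate cyclase": ((0.133, 0.169, 0.458, 0.116), (0.118, 0.089, 0.420, 0.123)),
}


def best_sequence_per_row(table):
    """Map (protein, center type) to the nucleotide sequence with the highest ratio."""
    best = {}
    for protein, rows in table.items():
        for center_type, row in zip(("single-point", "double-point"), rows):
            best[(protein, center_type)] = COLUMNS[row.index(max(row))]
    return best


def column_means(table):
    """Mean ratio for each nucleotide sequence over all rows of the table."""
    rows = [row for pair in table.values() for row in pair]
    return [sum(column) / len(rows) for column in zip(*rows)]


if __name__ == "__main__":
    for key, column in best_sequence_per_row(TABLE).items():
        print(key, column)
    print(dict(zip(COLUMNS, (round(m, 3) for m in column_means(TABLE)))))
```

Run on these data, the second codon bases give the highest ratio in every one of the 18 rows, and their column has the highest mean.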

