

Int. J. Peptide Protein Res. 9, 1977, 5-10. Published by Munksgaard, Copenhagen, Denmark. No part may be reproduced by any process without written permission from the author(s).

SYMMETRY PATTERNS IN TRYPSINOGEN

SEMIH ERHAN, LARRY D. GRELLER and BARBARA RASCO

Department of Animal Biology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A.

Received 14 April 1976

When the primary structure of bovine trypsinogen is searched for regularities by the method of Greller & Erhan (1974), one finds eight pairs of peptides arranged in a symmetrical pattern along the molecule. These peptides cover 49% of the length of the molecule (112 of the 227 amino acids), and each pair folds in a similar way. This finding agrees with the observation that “Trypsin folds into two halves, each of which contains a pseudo-cylindrical arrangement of hydrogen bonds...” (Stroud et al., 1971). Thus the above-mentioned method is capable not only of detecting regularities along the primary structure but also of predicting the folding of a protein.

Whether any regularities occur within the primary structure of a protein is a question that emerged almost as soon as the amino acid sequences of proteins became available, and it has remained a point of strong controversy ever since. During our studies on amino acid homologies found among proteins that appear to be ancestrally unrelated, we have developed a method whereby a computer performs a sliding match between two proteins, computes the similarity of each vertical amino acid pair, and prints the result (Greller & Erhan, 1974). Using this method, it soon became apparent that many repeating segments occurred along a protein chain which were homologous either to a small peptide or to a “peptide” found within a protein which was used as the “key” (Erhan & Greller, 1974). These repeating segments, which we have named “sub-sequences”, also appear when a single protein is matched against itself. When trypsinogen is matched against itself one observes, typically, many repeating sub-sequences; however, 16 of these sub-sequences have something unique about them: eight of them are found along the amino half and eight of them along the carboxyl half of the trypsinogen molecule, and they form eight pairs of homologous peptides. Furthermore, these peptide pairs fold in a similar way, and we believe this method can be used to predict the folding of a protein.

MATERIALS AND METHODS

The matching was performed according to Greller & Erhan (1974): a sliding match is made between two different proteins, between a protein and a small peptide, or of a protein against itself. At each position the similarity score of each vertical amino acid pair is computed and printed. The similarities are scored on a scale of 0 to 9, where 0 indicates no match and 8 and 9 both represent identical amino acids, 9 being reserved for identical vertical pairs of F, C, W, or Y, according to McLachlan (1971). In the printout 8 is represented by (.) and 9 by (') for easier recognition of the matches we consider significant (Greller & Erhan, 1974), in which about half of the amino acids are identical.
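The sliding match described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: the McLachlan (1971) similarity table is not reproduced in this paper, so the hypothetical `pair_score` below scores identities only (9 for identical F, C, W, or Y, 8 for any other identity) and assigns 0 to every non-identical pair, whereas the real scale gives intermediate scores to similar residues. The function names and the example target string are invented for the illustration.

```python
STRONG_IDENTITIES = set("FCWY")  # identities scored 9 rather than 8


def pair_score(a, b):
    """Toy stand-in for the 0-9 similarity scale (identities only)."""
    if a != b:
        return 0  # assumption: the real table gives 1-7 to similar pairs
    return 9 if a in STRONG_IDENTITIES else 8


def render(score):
    """Printout convention: 8 is shown as '.', 9 as an apostrophe."""
    return {8: ".", 9: "'"}.get(score, str(score))


def sliding_match(key, target):
    """Slide key along target; yield (offset, per-residue scores)."""
    for offset in range(len(target) - len(key) + 1):
        window = target[offset:offset + len(key)]
        yield offset, [pair_score(a, b) for a, b in zip(key, window)]


def best_match(key, target):
    """Return (offset, scores, M) for the highest cumulative score M."""
    offset, scores = max(sliding_match(key, target), key=lambda hit: sum(hit[1]))
    return offset, scores, sum(scores)


if __name__ == "__main__":
    # "AAVSWGSGAA" is a made-up target that embeds the peptide VSWGSG.
    offset, scores, m = best_match("VSLNSG", "AAVSWGSGAA")
    print(offset, " ".join(render(s) for s in scores), m)
```

Matching a protein against itself amounts to taking each window of the sequence in turn as the key and sliding it along the full sequence.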


The alpha carbon backbones of the homologous peptides were constructed utilizing a wire bender developed by Rubin & Richardson (1972), which is available from Charles Supper Co., Natick, Mass.

The bending angles needed for this construction were obtained from a program devised and kindly furnished by Dr. Byron Rubin of the Institute for Cancer Research, Philadelphia, Pa.

Atomic coordinates of trypsinogen were kindly supplied by Drs. R. E. Dickerson and R. M. Stroud of California Institute of Technology.
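For readers who want to see what such bending angles look like numerically, the following is a minimal sketch, assuming a wire model is specified by the bend angle at each alpha carbon and the dihedral (twist) angle about each virtual bond between consecutive alpha carbons. It is not Dr. Rubin's program, which is not described here; the function names and the four-point example trace are hypothetical.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _norm(u):
    return math.sqrt(_dot(u, u))


def bend_angle(a, b, c):
    """Angle in degrees at point b between the virtual bonds b-a and b-c."""
    u, v = _sub(a, b), _sub(c, b)
    cosine = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def dihedral_angle(a, b, c, d):
    """Signed dihedral angle in degrees about the virtual bond b-c."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def wire_angles(ca_trace):
    """Bend angles (one per interior point) and twists (one per interior bond)."""
    bends = [bend_angle(*ca_trace[i:i + 3]) for i in range(len(ca_trace) - 2)]
    twists = [dihedral_angle(*ca_trace[i:i + 4]) for i in range(len(ca_trace) - 3)]
    return bends, twists


if __name__ == "__main__":
    # Made-up coordinates, not taken from the trypsinogen structure.
    trace = [(1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)]
    print(wire_angles(trace))
```

Two peptides whose lists of bend and twist angles agree closely would be expected to give similar wire models.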

RESULTS

When trypsinogen is matched against itself one finds the usual occurrence of many repeating sub-sequences with different levels of significance (Table 1). These sub-sequences are not all of the homologies found during the matching of trypsinogen against itself, which together cover over 90% of the length of the molecule. They were selected for this study because of their interesting distribution along the trypsinogen molecule. Among these some stand out, not because of their exceptional statistical significance, but because they are situated along the trypsinogen molecule in a symmetrical fashion (Fig. 1). Thus one sees eight pairs of peptides; eight peptides stretch along the carboxyl half and their respective pairs stretch along the amino half of the trypsinogen molecule. We have already discussed the reasons why double-matching probabilities of [value illegible in source] and 10^-4 truly represent significant homologies (Greller & Erhan, 1974; Erhan & Greller, 1974a).

Throughout this study all amino acids are numbered contiguously from the N-terminus. Furthermore, these sub-sequences cover nearly half (112, or 49%) of the 227 amino acid length of the molecule.

TABLE 1
Homologous peptides found in trypsinogen, their position, sequence, and double-matching probabilities

Positions (a)      Symbols   Sequences                     Individual scores (b)      M score   P(M' > M)
9-18 / 210-219     1 / 11    GGYTCGANTV / GVYTKVCNYV       . 2 ' . 0 2 1 . 1 .        47        2.0 x 10^-3
22-27 / 197-202    2 / 12    VSLNSG* / VSWGSG              . . 3 3 . .                38        1.4 x 10^-4
31-35 / 189-193    3 / 13    CGGSL / CSGKL                 ' 3 . 3 .                  31        4.5 x 10^-4
49-54 / 182-187    4 / 14    KSGIQV / DSGGPV               3 . . 1 3 .                31        4.5 x 10^-3
57-60 / 175-178    7 / 15    GQDN / GKDS                   . 4 . 5                    25        9.7 x 10^-?
72-76 / 152-156    8 / 16    SASKS / SSCKS                 . 4 2 . .                  30        2.0 x 10^-3
81-92 / 130-142    9 / 6     SYNSNTLNNDIM / TKSSGTSYPDVL   5 1 5 . 3 . 2 2 1 . 5 6    54        2.5 x 10^-3
101-108 / 110-117  10 / 5    SLNSNVAS* / SLPTSCAS          . . 1 5 4 1 . .            43        1.0 x 10^-3

Each row lists one pair of homologous peptides. Symbols are the arbitrary numbers used for the peptides in Fig. 1. P(M' > M) is the double-matching probability. A "?" marks an exponent that is illegible in the source.

a Position number indicates the position along the protein of the sequence being matched, numbered contiguously from the amino terminus of the protein. Contiguous numbering is used to avoid confusion arising from many different groups introducing gaps into their sequences for various reasons.
b The numbers, periods, and apostrophes represent the individual similarity scores between the amino acids of the two peptides. The M score is the cumulative score over the span length of the key, obtained by summing the individual scores. The double-matching probability gives the probability for such a match to occur by chance.
* The SLNS sequence occurs within the two peptides 2 and 10, underscoring the significance of these repeating sequences.
These remarks are also valid for Tables 2 and 3.
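The internal consistency of Table 1 and the 49% coverage figure can be checked with a short script. The sketch below transcribes the sequences, printed score strings, and M scores from Table 1, decodes the printout symbols ('.' as 8, an apostrophe as 9), and confirms that each M score is the sum of its individual scores and that the sixteen peptides together span 112 residues. The data layout and function names are illustrative choices, not part of the original method.

```python
PROTEIN_LENGTH = 227

# (first peptide, second peptide, printed scores, printed M score)
TABLE_1 = [
    ("GGYTCGANTV", "GVYTKVCNYV", ". 2 ' . 0 2 1 . 1 .", 47),
    ("VSLNSG", "VSWGSG", ". . 3 3 . .", 38),
    ("CGGSL", "CSGKL", "' 3 . 3 .", 31),
    ("KSGIQV", "DSGGPV", "3 . . 1 3 .", 31),
    ("GQDN", "GKDS", ". 4 . 5", 25),
    ("SASKS", "SSCKS", ". 4 2 . .", 30),
    ("SYNSNTLNNDIM", "TKSSGTSYPDVL", "5 1 5 . 3 . 2 2 1 . 5 6", 54),
    ("SLNSNVAS", "SLPTSCAS", ". . 1 5 4 1 . .", 43),
]

SYMBOLS = {".": 8, "'": 9}  # printout shorthand for the two identity scores


def decode(printed):
    """Turn a printed score string into a list of integers."""
    return [SYMBOLS[tok] if tok in SYMBOLS else int(tok) for tok in printed.split()]


def m_scores_consistent(table):
    """True if every printed M score equals the sum of its decoded scores."""
    return all(
        len(a) == len(b) == len(decode(printed)) and sum(decode(printed)) == m
        for a, b, printed, m in table
    )


def residues_covered(table):
    """Total number of residues spanned by all peptides in the table."""
    return sum(len(a) + len(b) for a, b, _, _ in table)


if __name__ == "__main__":
    covered = residues_covered(TABLE_1)
    print(m_scores_consistent(TABLE_1), covered, f"{covered / PROTEIN_LENGTH:.0%}")
```

The same check is a convenient way to catch transcription errors when the score strings are re-keyed from the printed table.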

There are two basic assumptions behind amino acid homology studies among proteins:

1) If the amino acid sequences of two peptides contain similar amino acids their folding will be similar.

2) If proteins that perform the same reaction or bind the same substrates have active and/or binding site fragments that are homologous then the 3-dimensional conformation of these fragments is similar.

The first assumption follows directly from the demonstration that folding of a protein is dependent only on its amino acid sequence (Anfinsen et al., 1961). Sequences that are homologous contain similar amino acids, and proteins containing similar sequences in all likelihood will fold similarly.

The second assumption is a logical extension of the first assumption, supported by certain observations. We have found very significant homology between active site fragments of trypsinogen and a human Bence Jones protein (Erhan & Greller, 1974b). A comparison of the 3-dimensional folding of trypsinogen active site fragments and the homologous peptides from the Bence Jones protein, on a molecular model of the latter molecule, has demonstrated great similarity of conformation. Preliminary experiments in my laboratory have demonstrated the presence of weak proteolytic activity in a Bence Jones preparation. Since then, Rossmann & Argos have demonstrated amino acid sequence similarities between the heme-binding pockets of globins and cytochrome b5 (personal communication). Tufty & Kretsinger (1975) also find regions in myosin light chain that are homologous to the calcium-binding regions of troponin and parvalbumin.

FIGURE 1
Symmetry pattern found in trypsinogen. a) Blocks represent the relative positions of the homologous sub-sequences found along the protein chain. Trypsinogen is drawn as a line from the N-terminus to the C-terminus. The numbers found above the blocks represent the arbitrary numbers given to the homologies as listed in Table 1. The numbers found below the figure give the contiguous numbering of each amino acid from the N-terminus; each small vertical line represents the 10th amino acid. b) Shows the trypsinogen molecule folded over itself with eight homologous peptides on each side.
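One possible way to put a number on the symmetry drawn in Fig. 1b, offered here only as an illustrative reading of the figure and not as the authors' analysis, is to note that when a chain of 227 residues is folded over at its centre, residue i faces residue 228 - i. For a perfectly mirrored pair of peptides the two midpoints would therefore sum to 228. The sketch below computes the deviation from that value for the eight pairs of Table 1. The positions are transcribed from the table; the span 72-76 assumes that the printed "12-76" is a misprint, and 130-142 is kept as printed.

```python
PROTEIN_LENGTH = 227

# (span of first peptide, span of second peptide), contiguous numbering
PAIRS = [
    ((9, 18), (210, 219)),
    ((22, 27), (197, 202)),
    ((31, 35), (189, 193)),
    ((49, 54), (182, 187)),
    ((57, 60), (175, 178)),
    ((72, 76), (152, 156)),   # assumed reading of the printed "12-76"
    ((81, 92), (130, 142)),   # second span kept exactly as printed
    ((101, 108), (110, 117)),
]


def midpoint(span):
    """Centre of a peptide given its first and last residue numbers."""
    return (span[0] + span[1]) / 2


def mirror_offsets(pairs, length):
    """Deviation of each pair from exact mirror symmetry about the chain centre."""
    return [midpoint(a) + midpoint(b) - (length + 1) for a, b in pairs]


if __name__ == "__main__":
    for (a, b), offset in zip(PAIRS, mirror_offsets(PAIRS, PROTEIN_LENGTH)):
        print(a, b, offset)
```

On these figures no pair is displaced by more than ten residues from its exact mirror position, which is small compared with the 227-residue length of the chain.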


In order to test this idea, alpha carbon backbones of the eight pairs of homologous peptides were constructed from steel wire and compared. Five of the eight pairs were found to fold similarly.

DISCUSSION

On the basis of homology studies which had demonstrated that a number of homologous peptides occurred repeatedly along a protein molecule, it was suggested that early proteinoids might have been formed by stepwise condensation of primordial peptides and that it was possible to detect these primordial peptides today (Erhan & Greller, 1974a). This idea was supported by theoretical considerations (Simon, 1973). Furthermore, based on the demonstration by Anfinsen et al. (1961) that folding of a protein is dependent only on its amino acid sequence, it was also suggested that since the homologies found represented peptides with similar amino acid sequences, their folding should also be similar. This idea was supported when homologies were found between active site fragments of trypsinogen and a particular Bence Jones protein (Erhan & Greller, 1974b). The homologous segments on the Bence Jones protein were shown to fold in a way similar to the active site of trypsinogen. Preliminary experiments with a Bence Jones protein have demonstrated the existence of weak proteolytic activity.

Therefore it is reasonable to expect conformational similarity between homologous peptides. If these homologous sub-sequences are found within the primary structure of a protein then one can expect to find similar folding along the homologous segments. If, furthermore, the homologous sub-sequences display a symmetry pattern, then it should not be surprising to find that two halves of the molecule have a “roughly” similar folding. Stroud et al. (1971) have found “...the trypsin molecule to fold up into two halves, each of which contains a pseudo-cylindrical arrangement of hydrogen bonds between adjacent antiparallel extended chains similar to that described by Blow for α-chymotrypsin...”. Similarly, referring to a hydrogen-bonding map of chymotrypsin, Birktoft & Blow (1972) write, “...the pattern of zigzag lines is drawn to emphasize the existence of two folded units in the molecule from residues 27-112 and from residues 133-230... A newspaper could be inserted, through the molecular model, almost completely bisecting it into two halves...”. Hartley & Shotton (1971), too, make the observation, “...one can see that as in chymotrypsin the elastase molecule also appears to be divided into two halves composed of residues 27-127... in the upper left hemisphere and residues 128-230... in the lower right hemisphere...”.

We have therefore included these proteins in our studies also, to find out whether similar symmetry patterns could be observed, since both chymotrypsin and elastase are related to trypsinogen. These studies have yielded six symmetrically situated homologies on chymotrypsinogen A, and five on elastase. These results are shown in Tables 2 and 3.

TABLE 2
Homologous peptides found in chymotrypsinogen A

Positions (a)   Sequences        Individual scores (b)   M score   P(M' > M)
16 / 234        IVN / LVN        5 . .                   21        10^-3
53 / 231        VTA / VTA        . . .                   24        10^-?
61 / 221        TTS / STS        5 . .                   21        10^-?
74 / 187        GSSS / GVSS      . 2 . .                 26        10^-?
82 / 175        KLKIA / KIKDA    . 5 . 0 .               29        10^-3
112 / 158       ASF / ASL        . . 5                   21        10^-3

TABLE 3
Homologous peptides found in elastase

Positions (a)   Sequences             Individual scores   M score   P(M' > M)
11 / 207        SWPSQI / SFVSRL       . 6 3 . 5 5         35        10^-3
43 / 201        AAHCV / AVHGV         . 3 . 1 .           29        10^-3
73 / 179        CVQ / GVR             . . 5               21        10^-3
82 / 149        YWNTDDVA / YLPTVDYA   ' 3 1 . 1 . 3 .     41        10^-3
98 / 140        RLAQSV / QLAQTL       5 . . . 5 4         38        10^-4

In Tables 2 and 3 each row lists one pair of homologous peptides, and the position given is that of the first residue of each peptide. P(M' > M) is the double-matching probability. A "?" marks an exponent that is illegible in the source.

Thus the method developed appears capable of suggesting regions along a protein where folding can be expected to be similar, in addition to detecting regularities along the primary structure. When used together with a predictive method such as the one developed by Chou & Fasman (1974a, b) it can be expected to improve the accuracy of predictions.

ACKNOWLEDGMENTS

The authors are indebted to Drs. R. M. Stroud and R. E. Dickerson for making atomic coordinates of trypsin available, to Dr. Byron Rubin for permitting the use of his algorithms to obtain bending angles, and to Dr. J. P. Glusker for letting us use the Byron Bender in her laboratory. Thanks are also due to many colleagues, students, and others too numerous to list individually, for participating in this study.

REFERENCES

Anfinsen, C. B., Haber, E., Sela, M. & White, F. H., Jr. (1961) Proc. Natl. Acad. Sci. U.S. 47, 1309-1315

Birktoft, J. J. & Blow, D. M. (1972) J. Mol. Biol. 68, 187-240

Chou, P. Y. & Fasman, G. D. (1974a) Biochem. 13, 211-222

Chou, P. Y. & Fasman, G. D. (1974b) Biochem. 13, 222-245

Erhan, S. & Greller, L. D. (1974a) Int. J. Pept. Prot. Res. 6, 175-181

Erhan, S. & Greller, L. D. (1974b) Nature (Lond.) 251, 353-355

Greller, L. D. & Erhan, S. (1974) Int. J. Pept. Prot. Res. 6, 165-173

Hartley, B. S. & Shotton, D. M. (1971) in The Enzymes (Boyer, P. D., ed.), 3rd Edn., vol. 3, pp. 323-373, Academic Press, New York

McLachlan, A. D. (1971) J. Mol. Biol. 61, 409-424

Rubin, B. & Richardson, J. S. (1972) Biopolymers 11, 2381-2385

Simon, H. A. (1973) in Hierarchy Theory (Pattee, H. H., ed.), p. 3, George Braziller, New York

Stroud, R. M., Kay, L. M. & Dickerson, R. E. (1971) Cold Spring Harbor Symposium vol. 36, p. 125

Address: Semih Erhan, 2101 Chestnut Street, Philadelphia, Pennsylvania 19103, U.S.A.
