Deletions in processed pseudogenes accumulate faster in rodents than in humans


J Mol Evol (1989) 28:279-285 Journal of Molecular Evolution © Springer-Verlag New York Inc. 1989

Deletions in Processed Pseudogenes Accumulate Faster in Rodents than in Humans

Dan Graur,1 Yuval Shuali,1 and Wen-Hsiung Li2

1 Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel 2 Center for Demographic and Population Genetics, University of Texas Health Science Center at Houston, PO Box 20334, Houston, Texas 77225, USA

Summary. The relative rates of point nucleotide substitution and accumulation of gap events (deletions and insertions) were calculated for 22 human and 30 rodent processed pseudogenes. Deletion events not only outnumbered insertions (the ratio being 7:1 and 3:1 for human and rodent pseudogenes, respectively), but also the total length of deletions was greater than that of insertions. Compared with their functional homologs, human processed pseudogenes were found to be shorter by about 1.2%, and rodent pseudogenes by about 2.3%. DNA loss from processed pseudogenes through deletion is estimated to be at least seven times faster in rodents than in humans. In comparison with the rate of point substitutions, the abridgment of pseudogenes during evolutionary times is a slow process that probably does not retard the rate of growth of the genome due to the proliferation of processed pseudogenes.

Key words: Processed pseudogenes -- Rate of substitution -- Deletions -- Insertions -- Genome size

Introduction

Pseudogenes are DNA sequences that bear structural similarity to functional genes but are prevented from being expressed properly due to mutational defects (Proudfoot 1980; Proudfoot and Maniatis 1980; Vanin 1985). Because, with few exceptions, all pseudogenes lack function, one can assume that all mutations occurring in pseudogenes are free from selective constraints, and thus, may be randomly fixed in populations. Therefore, the pattern and rate of substitution in pseudogenes should reflect the pattern and rate of spontaneous mutations (Gojobori et al. 1982).

Offprint requests to: D. Graur

Pseudogenes arise in evolution by one of three pathways. The first pathway involves the duplication of a functional gene and the subsequent accumulation of inactivating mutations in one of the resulting duplicates (Lacy and Maniatis 1980; Proudfoot and Maniatis 1980; Brisson and Verma 1982; Proudfoot et al. 1982; Hutchinson et al. 1983; Vanin 1983). The second pathway is the duplication of a preexisting pseudogene. Some examples of such pseudogenes exist in the literature (e.g., Cleary et al. 1981). The third mechanism of creating pseudogenes involves the reverse transcription of a processed messenger RNA, and the subsequent incorporation of the resulting complementary DNA (cDNA) back into the genome. These retropseudogenes or processed pseudogenes, so called because they lack introns, possess oligo-A sequences at their 3' ends, are usually unlinked to their functional homologs, and sometimes bear remnants of posttranscriptional modifications (Chen et al. 1982; Hollis et al. 1982; Reilly et al. 1982; Vanin 1985), are ubiquitous throughout the mammalian genome.

Processed pseudogenes have a unique property that makes them ideal for evolutionary studies. They are, with very few exceptions (e.g., McCarrey and Thomas 1987), dead-on-arrival sequences, i.e., they are devoid of function from the moment they are incorporated back into the genome. Thus, their sequences can be used to infer the rates of spontaneous deletions and insertions and also to compare these rates with the rate of point substitution (mutation). In comparison, when dealing with nonprocessed pseudogenes, one must take into account the so-called nonfunctionalization time, i.e., the time it takes from the moment of duplication to the time one copy loses its function (Li et al. 1981). Without taking this factor into account, one is prone to underestimate the rate of evolution in pseudogenes. However, in order to estimate the nonfunctionalization time, one must have independent knowledge of the time of divergence between either the gene and its conspecific pseudogene, or between two orthologous functional genes, i.e., the time of divergence between two species (Li et al. 1981). This type of information is usually not available.

The aim of the present paper is to study the rate of molecular evolution of processed pseudogenes. We are interested in both point substitution and gap events (deletions and insertions). In this study we incorporate the findings of Wu and Li (1985) and Li et al. (1987) to the effect that the rates of nucleotide substitution are higher in rodents than in humans.

Data and Methods

Twenty-two human, 14 murine, and 16 rat processed pseudogene sequences were gathered from the literature and aligned to their conspecific functional homologs by the methods of Needleman and Wunsch (1970) and Wilbur and Lipman (1985). Lack of all the introns that are present in the functional homologs was the only criterion used in determining whether a pseudogene is processed or not, and consequently, whether it should be included in the present study or not. For this reason, pseudogenes derived from intronless functional genes, e.g., the human interferon α pseudogenes, were not included. We also disregarded truncated pseudogenes, regardless of the size of their 5' or 3' truncation, and despite the fact that most truncated pseudogenes are processed. The reason is that in truncated genes it is difficult to know how much of the total length of the truncation was deleted as a result of the initial processing of the mRNA and how much is due to the accumulation of subsequent deletions adjacent to the 5' and 3' termini.
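To illustrate how a global alignment by dynamic programming (the Needleman-Wunsch approach) places a gap where a pseudogene has lost a base, here is a minimal Python sketch. The function name, the scoring values (match = 1, mismatch = -1, gap = -1), and the toy sequences are assumptions made for illustration only; they are not the parameters or data used in this study.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (aligned_a, aligned_b, score).

    Scoring values are illustrative assumptions, not those of the paper.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two bases
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Hypothetical toy sequences: the "pseudogene" lacks one C of the "gene"
gene = "ATGGCCATTG"
pseudogene = "ATGGCATTG"
aligned_gene, aligned_pseudogene, alignment_score = needleman_wunsch(gene, pseudogene)
```

In this toy case the aligned pseudogene carries a single gap character, which would be recorded as one deletion of length 1.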

The number of nucleotide differences between a pseudogene and its conspecific functional homolog was translated into the number of substitutions per site between the two sequences by the method of Jukes and Cantor (1969), and denoted as d. Because our sample of genes contains only conservative ones, the great majority of the nucleotide substitutions should have occurred in the pseudogenes. Therefore, d may be regarded as an approximate indicator of the age of the pseudogene. An alternative to using d would be to compute the number of substitutions at fourfold degenerate sites (Li et al. 1985a) that vary less between genes. This procedure, however, would reduce the number of sites drastically and increase the error of the estimate.
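The Jukes and Cantor correction converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The following Python sketch shows the calculation; the function names and the short example sequences are hypothetical, and the choice to skip alignment columns that contain a gap is an assumption of this sketch rather than a stated detail of the study.

```python
import math


def proportion_different(aligned_a, aligned_b):
    """Proportion of differing sites, ignoring columns that contain a gap ('-')."""
    compared = 0
    different = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are excluded from p
        compared += 1
        if x != y:
            different += 1
    return different / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor estimate of substitutions per site from the proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned fragment: one gap column and one mismatch in five compared sites
p = proportion_different("ACGT-A", "ACCTTA")
d = jukes_cantor_distance(p)
```

For small p the corrected value d is only slightly larger than p, which is the regime of most pseudogenes in Table 1.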

With respect to gap events, it was assumed that all insertions and deletions have occurred in the pseudogene and not in the functional gene, where they could have caused frameshifts in the coding sequence with consequent deleterious effects. In 36 out of 52 cases, we could compare the conspecific functional homolog with a homologous gene from a different species (an ortholog), and thus ascertain with greater confidence in which lineage the gap occurred, and whether it was a deletion or an insertion. (In all the cases marked with an asterisk in Table 1, we have independent corroboration that the gaps indeed occurred in the pseudogene line.) The number and size of gaps were recorded. For each pseudogene, we also recorded the total lengths of deletions and insertions as percentages of the length of the respective functional gene. For reasons that will become apparent later, the data from mouse and rat were pooled together. The genes and the pseudogenes that constituted the data base, as well as the number of point nucleotide substitutions (d) and the percentage of deletions and insertions are listed in Table 1, together with the bibliographical sources.
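The percentage columns of Table 1 can be reproduced by summing the gap lengths and dividing by the length of the coding region. The Python sketch below does this for the human β-actin-2 entry of Table 1 (coding region 1128 bases; insertions of 3 and 3 bases; deletions of 1, 1, 2, 3, and 19 bases). The function name is hypothetical.

```python
def gap_percentage(gap_lengths, coding_length):
    """Total gap length as a percentage of the functional coding-region length."""
    return 100.0 * sum(gap_lengths) / coding_length


# Human beta-actin-2 entry from Table 1
coding_length = 1128
insertion_pct = gap_percentage([3, 3], coding_length)
deletion_pct = gap_percentage([1, 1, 2, 3, 19], coding_length)
```

Rounded to three decimals these give 0.532 and 2.305, the values listed for that pseudogene.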

Results

Our sample of pseudogenes contains 75 deletion events in human pseudogenes and 54 events in rodent pseudogenes. Insertions occur much less frequently, for our sample contains only 11 insertions in humans and 16 insertions in rodents. The mean length of deletions is significantly lower in humans (2.36 ± 0.32 nucleotides per deletion) than in rodents (4.54 ± 0.87). Fifty-seven percent of all deletions in humans involve single-base deletions. In rodents, 50% of all deletions are single-base ones. In both humans and rodents, deletion events significantly outnumbered insertions (Wilcoxon signed-rank matched-pair test, P < 0.01 and P < 0.05, respectively). On the average, there were 3.41 ± 0.71 deletion events and 0.50 ± 0.15 insertion events per human pseudogene, and 1.80 ± 0.44 deletions and 0.53 ± 0.20 insertions per rodent pseudogene, the ratio of deletions to insertions being approximately 7:1 and 3:1 for human and rodent pseudogenes, respectively. In addition, the total length of deletions was greater than that of insertions. In fact, about 1.2% of its total length is deleted, whereas only 0.1% is inserted in an average human processed pseudogene. The corresponding numbers in the rodent are 2.3% and 1.1%. The differences between the total lengths of deletions and insertions are highly significant for the human lineage (paired t = 4.299, P < 0.0001). The difference was not statistically significant in rodents due to the presence of a 125-base insertion in the rat α-tubulin pseudogene. Our pseudogene data support de Jong and Rydén's (1981) hypothesis that the preponderance of deletions among gap events is inherent in the mutational process, rather than being a by-product of selection.
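The per-pseudogene means and the deletion-to-insertion ratios quoted above follow directly from the event counts and sample sizes (22 human and 30 rodent pseudogenes). A short Python check, with hypothetical variable and function names, is:

```python
# Event counts and sample sizes as reported in the text
samples = {
    "human": {"pseudogenes": 22, "deletions": 75, "insertions": 11},
    "rodent": {"pseudogenes": 30, "deletions": 54, "insertions": 16},
}


def summarize(counts):
    """Return mean deletions, mean insertions, and the deletion:insertion ratio."""
    n = counts["pseudogenes"]
    return (
        counts["deletions"] / n,
        counts["insertions"] / n,
        counts["deletions"] / counts["insertions"],
    )


human_del, human_ins, human_ratio = summarize(samples["human"])
rodent_del, rodent_ins, rodent_ratio = summarize(samples["rodent"])
```

The means come out at about 3.41 and 0.50 events per human pseudogene and 1.80 and 0.53 per rodent pseudogene, with ratios of roughly 6.8 and 3.4, which the text rounds to 7:1 and 3:1. The standard errors cannot be recovered from the totals alone.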

Next, we looked for a relationship between the rate of deletions and insertions and the rate of point nucleotide substitution in processed pseudogenes. We find that the number of deletion events is significantly correlated with d in both the human and rodent lineages (r = 0.677 and r = 0.597, respectively). Similarly, the percentage of deleted DNA is positively correlated with d (r = 0.477 and 0.642, for humans and rodents, respectively). Moreover,


Table 1. The number of point substitutions per nucleotide site (d), and the relative lengths of gaps (insertions and deletions) between human, rat, and mouse processed pseudogenes and their functional conspecific homologs

| Pseudogene | Length of coding region | d | Lengths of insertions (total %) | Lengths of deletions (total %) | Source^a |
|---|---|---|---|---|---|
| Human | | | | | |
| Metallothionein II* | 180 | 0.028 | None | None | (1) |
| Apoferritin H-133 | 429 | 0.036 | None | None | (2) |
| β-tubulin-21B* | 1332 | 0.042 | None (0.000) | 1 (0.075) | (3) |
| Glycerate-3-P dehydrogenase-X* | 1002 | 0.042 | None (0.000) | 3 (0.299) | (4) |
| Apoferritin H-123 | 429 | 0.045 | None | None | (2) |
| Phosphoglycerate kinase-X | 1248 | 0.054 | None (0.000) | 1, 1, 5 (0.561) | (5) |
| Argininosuccinate synthetase-1 | 1233 | 0.071 | 3, 6 (0.728) | 1, 1, 3, 9 (1.127) | (6) |
| Dihydrofolate reductase-1* | 558 | 0.076 | 1 (0.179) | 1, 4 (0.896) | (7) |
| Argininosuccinate synthetase-3 | 1233 | 0.077 | None (0.000) | 1, 1, 5 (0.649) | (6) |
| Triosephosphate isomerase 19A* | 747 | 0.079 | 1 (0.134) | 1, 1, 1, 2, 3, 4 (1.163) | (8) |
| Chromosomal protein HMG17-28 | 267 | 0.081 | 1 (0.375) | 12 (4.494) | (9) |
| Triosephosphate isomerase 5A* | 747 | 0.087 | 1 (0.134) | 1, 3, 3, 4, 5 (2.142) | (8) |
| Chromosomal protein HMG17-60 | 267 | 0.092 | None (0.000) | 1 (0.375) | (9) |
| Lactate dehydrogenase A-H463* | 993 | 0.097 | None (0.000) | 3, 3 (0.604) | (10) |
| β-actin-2* | 1128 | 0.101 | 3, 3 (0.532) | 1, 1, 2, 3, 19 (2.305) | (11) |
| β-tubulin-14B* | 1332 | 0.104 | None (0.000) | 1, 1 (0.376) | (3) |
| β-actin-1* | 1128 | 0.109 | 2 (0.178) | 1, 1, 1, 3 (0.535) | (11) |
| β-tubulin-7B* | 1332 | 0.134 | None (0.000) | 1, 1, 1, 1, 1, 1, 2, 2, 2 (3.228) | (3) |
| Metallothionein I* | 180 | 0.141 | None (0.000) | 1 (0.556) | (12) |
| Cu/Zn superoxide dismutase-69.1 | 459 | 0.177 | 1 (0.218) | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3 (3.484) | (13) |
| Cu/Zn superoxide dismutase-71.4 | 459 | 0.191 | 1 (0.218) | 1, 1, 1, 2, 3 (1.743) | (13) |
| β-tubulin-11B* | 1332 | 0.227 | None (0.000) | 1, 1, 2, 2, 3, 4, 9 (1.652) | (3) |
| Mouse | | | | | |
| Ribosomal protein L32-4A | 402 | 0.002 | None | None | (14) |
| Ribosomal protein L30-2* | 342 | 0.009 | None | None | (15) |
| Ribosomal protein S16-P | 123 | 0.029 | 1, 3 (3.252) | None (0.000) | (16) |
| Ribosomal protein L30-4* | 342 | 0.032 | None | None | (15) |
| Cellular tumor antigen P53* | 1173 | 0.035 | 1, 1, 6 (0.682) | 1, 2, 5 (0.682) | (17) |
| Cytochrome c-MC3* | 312 | 0.038 | None (0.000) | 16 (5.128) | (18) |
| Lactate dehydrogenase A-M11* | 993 | 0.059 | None (0.000) | 1, 1, 3, 3 (0.806) | (19) |
| Cytochrome c-MC4* | 312 | 0.075 | None (0.000) | 1, 3 (1.282) | (17) |
| Ribosomal protein L30-3* | 342 | 0.083 | 1 (0.292) | None (0.000) | (15) |
| Lactate dehydrogenase A-M14* | 993 | 0.093 | None (0.000) | 1, 1, 1, 1, 1, 1, 2, 11 (1.854) | (19) |
| Lactate dehydrogenase A-M10* | 993 | 0.095 | None (0.000) | 1, 1, 4 (0.604) | (19) |
| Cytochrome c-MC2* | 312 | 0.110 | None (0.000) | 1, 3, 7 (3.352) | (17) |
| α-globin-3* | 423 | 0.175 | 1, 2, 6, 9 (4.255) | 3, 3, 3, 28 (8.745) | (20) |
| α-globin-30.5* | 993 | 0.216 | None (0.000) | 1, 1, 1, 1, 3, 5, 21 (8.511) | (21) |
| Rat | | | | | |
| Ribosomal protein L35A-A | 327 | 0.000 | None | None | (22) |
| Ribosomal protein L35A-B | 327 | 0.000 | None | None | (22) |
| Metallothionein I-B* | 180 | 0.000 | 1, 3 (0.566) | None (0.000) | (12) |
| Apoferritin L-31* | 345 | 0.017 | None | None | (23) |
| Cytochrome c-RC13* | 312 | 0.017 | None (0.000) | 11 (3.526) | (24) |
| Apoferritin L-45* | 345 | 0.019 | None | None | (23) |
| Cytochrome c-RC5* | 312 | 0.019 | None | None | (24) |
| Cytochrome c-RC9* | 312 | 0.019 | None | None | (24) |
| α-tubulin | 1350 | 0.025 | 1, 1, 125 (9.407) | 1, 1, 1, 1, 5 (0.667) | (25) |
| Cytochrome c-RC6* | 312 | 0.036 | None | None | (24) |
| Cytochrome c-RC8* | 312 | 0.036 | None | None | (24) |
| Metallothionein I-C* | 180 | 0.036 | None (0.000) | 21 (11.677) | (12) |
| Glutathione-S-transferase | 630 | 0.068 | None (0.000) | 1, 1, 1, 1, 1, 1, 3 (1.429) | (26) |
| Cytochrome c-RC10* | 312 | 0.072 | None (0.000) | 2, 9, 9 (6.410) | (24) |
| Metallothionein I-A* | 180 | 0.100 | None (0.000) | 9 (5.000) | (12) |
| Ribosomal protein L35A-G | 327 | 0.147 | 45 (13.761) | 24 (8.257) | (22) |

None = no recorded gap events; * = independent corroboration that gaps occur in the pseudogene (see text)

a Sources for the nucleotide sequences of the pseudogenes: (1) Karin and Richards 1982; (2) Constanzo et al. 1986; (3) Wilde et al. 1982, Gwo-Shu Lee et al. 1983; (4) Benham et al. 1984, Hanauer and Mandel 1984; (5) Michelson et al. 1985; (6) Anagnou et al. 1984, Masters et al. 1983; (7) Freytag et al. 1984; (8) Brown et al. 1985; (9) Srikantha et al. 1987; (10) Tsujibo et al. 1985; (11) Moos and Gallwitz 1983; (12) Varshney and Gedamu 1984; (13) Danciger et al. 1986, Delbar et al. 1987; (14) Dudov and Perry 1984; (15) Wiedemann and Perry 1984; (16) Wagner and Perry 1985; (17) Zakut-Houri et al. 1983, Benchimol et al. 1984; (18) Limbach and Wu 1985; (19) Fukasawa et al. 1986, Fukasawa et al. 1987; (20) Vanin 1985; (21) Vanin et al. 1980; (22) Kuzumaki et al. 1987; (23) Leibold et al. 1984; (24) Scarpulla 1983, Scarpulla and Wu 1983; (25) Lemischka and Sharp 1982; (26) Okada et al. 1987

we can show that a linear relationship exists between the total percentage of deletions and the number of substitutions. In both lineages, the slopes of the linear regression lines of percent deletion on d were found to be significantly different from zero (P < 0.0001 and P < 0.025 in rodents and humans, respectively). Because we found no statistically significant difference between the rates of deletions in mouse and rat, the data from these two species were pooled together. The relationship between the rates of nucleotide substitution and percentage DNA loss is shown in Fig. 1. The slope of regression of the cumulative loss of DNA due to deletions on d is 11.41 for humans and 40.10 for rodents. Thus, the slope for rodents is about 3.5 times steeper than that for the human pseudogenes. The absolute values of the slopes in Fig. 1 are underestimates because the d value includes changes not only in the pseudogene but also in the functional gene, whereas almost all deletions occurred in the pseudogene. The ratio between the two rates, on the other hand, is not affected. Because rodent genes accumulate point mutations at least two times faster than human ones (Wu and Li 1985; Britten 1986; Li et al. 1987), we may deduce that the rate of DNA loss from processed pseudogenes due to deletion events is at least seven times higher in rodents than in humans. One must note, however, that our sample of genes in rodents is different from that in humans. Because of the fact that functional genes are known to have different rates of substitution, one must use the same set of genes in both groups in order to obtain an accurate estimate of the ratio. Thus, the difference in the slopes for humans and rodents may, in reality, be either larger or smaller than that in Fig. 1. To be on the conservative side, let us assume that the difference is smaller. However, because the majority of point substitutions should have occurred in the pseudogenes, the difference in slopes is unlikely to be explained completely by the fact that different genes were used.

Fig. 1. The relationship between the total length of deletions and the number of nucleotide substitutions per site between a processed pseudogene and its conspecific functional homolog. Horizontal axis: number of nucleotide substitutions/site. Upper line = rodents; lower line = humans.
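The step from a 3.5-fold difference in slopes to an at least sevenfold difference in deletion rates per unit time can be made explicit. Each slope gives the percentage of length deleted per substitution per site, so multiplying it by the substitution rate per year gives the percentage of length deleted per year. The Python sketch below uses the two reported slopes and treats the factor of 2 as the lower bound quoted in the text; the variable names are hypothetical.

```python
# Regression slopes reported in the text (percent length deleted per substitution/site)
slope_human = 11.41
slope_rodent = 40.10

# Lower bound on the rodent:human ratio of point-substitution rates, as quoted in the text
min_substitution_rate_ratio = 2.0

# Ratio of deletion accumulation per unit of substitutional distance
slope_ratio = slope_rodent / slope_human

# Ratio of deletion accumulation per unit of time (lower bound)
min_deletion_rate_ratio = slope_ratio * min_substitution_rate_ratio
```

The slope ratio is about 3.5, and the lower bound on the per-year ratio is therefore about 7.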

Despite the fact that insertions clearly occur more frequently in rodents than in humans, and that the differences between rodents and humans were statistically significant whether we used the number of insertion events or the percentage of length inserted into the processed pseudogene as variables, insertions are far too infrequent relative to deletions for a detailed picture to emerge. Many more data than are presently available are needed to estimate the rate and pattern of insertion in processed pseudogenes.

Discussion

There are two kinds of processes that affect the evolution of processed pseudogenes. The first involves the very rapid accumulation of point mutations that, being unconstrained by function, are fixed in the population according to the rules of stochastic substitution of neutral alleles. This accumulation of molecular changes eventually obliterates the similarity between the pseudogene and its functional homolog, which evolves much more slowly. Indeed, identifying old pseudogenes is quite a difficult task (e.g., Hardison et al. 1986). We call this process compositional assimilation, whereby a pseudogene loses similarity to the functional gene and melts into the background of its surrounding DNA. From the pattern of substitution in pseudogenes, it was estimated that the nucleotide composition of pseudogenes will eventually become A+T-rich (Gojobori et al. 1982). Functional genes, on the other hand, will tend to accumulate C and G residues at degenerate sites (Ticher and Graur, in press).

The second process is characterized by the pseudogene becoming progressively shorter compared to the functional gene through the accumulation of deletions. Interestingly, rodents are at least seven times more efficient at getting rid of this particular type of junk DNA than are humans. This process, which we call length abridgment, is much slower than compositional assimilation. Previously we noted that the relative rate of deletions to substitutions is approximately 40% length deletion per nucleotide substitution per site in rodents and about 10% in humans. Assuming that during long-term evolution the relative rate of deletions to substitutions is 25%, and by using Li et al.'s (1985b) estimate for the rate of substitution in pseudogenes (5.0 × 10⁻⁹ substitutions/site/year), we can estimate that a processed pseudogene loses half of its DNA in 400 million years. This means, for instance, that the human genome still contains major chunks of most of the pseudogenes that were found in the ancestor of all mammals, including monotremes, marsupials, and eutherians. Obviously, these ancient pseudogenes had lost any recognizable similarity to the functional genes much earlier. Possible mechanisms involved in the deletion process are summarized in Levinson and Gutman (1987) and in Walsh (1987). Initial results show that most deletions in pseudogenes are in fact reductions in the number of repeated elements within short simple tandem arrays (Landan, Li, and Graur, unpublished).
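The 400-million-year figure can be checked with a few lines of arithmetic. The Python sketch below assumes that length is lost linearly in time, which is the reading under which the quoted inputs reproduce the quoted result; that reading, and the variable names, are assumptions of this sketch. For comparison it also computes the half-life under exponential decay, which the text does not use.

```python
import math

# Inputs quoted in the text
relative_deletion_rate = 0.25   # fraction of length lost per substitution per site (25%)
substitution_rate = 5.0e-9      # substitutions per site per year

# Fraction of the original length lost per year
loss_per_year = relative_deletion_rate * substitution_rate

# Time to lose half the length if loss is linear in time (assumed reading of the text)
half_loss_time_linear = 0.5 / loss_per_year

# Alternative for comparison only: half-life if loss is proportional to remaining length
half_loss_time_exponential = math.log(2) / loss_per_year
```

The linear calculation gives 4.0 × 10⁸ years, matching the estimate in the text; the exponential alternative would give roughly 5.5 × 10⁸ years, so the qualitative conclusion does not depend on this choice.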

P. Leder (cited in Lewin 1981) referred to the generation of processed pseudogenes as the Vesuvian model, whereby a functional locus continuously pumps out defective copies of itself and disperses them all over the genome. As a consequence, the genome size increases during evolutionary times due to the accumulation of junk DNA. Our findings on the relative inefficiency of the deletion process in mammalian genomes, in conjunction with the fast accumulation of nucleotide substitutions, indicate that the mammalian genome is literally littered with processed pseudogenes in various stages of compositional assimilation. Because in many multigene families the number of processed retropseudogenes far exceeds that of the functional homologs (for the most extreme example to date, see Srikantha et al. 1987), we believe that processed pseudogenes are created at a much faster rate than they are obliterated by the process of pseudogene abridgment. Thus, the growth of the genome is not significantly retarded by the occurrence of deletions.

The old processed pseudogenes are too divergent from their functional paralogs to be recognizable as such by either molecular probes or by computer searches for similarity in DNA data banks. On the other hand, the difference in the rates of deletion between rodents and humans may be one of the factors accounting for the fact that the nuclear genome of rodents is approximately 2.94 × 10⁹ bp in size, whereas the human genome is a little larger, about 3.43 × 10⁹ bp (Cavalier-Smith 1985).

Acknowledgments. We thank Aharon Ticher and Robert Schwartz for their advice on handling computerized nucleotide data banks. This work was supported in part by research grants from the Foundation for Basic Research, Tel Aviv University, the Hertz Foundation, and by NIH grant GM30998.

References

Anagnou NP, O'Brien SJ, Shimada T, Nash WG, Chen M J, Nienhuis AW (1984) Chromosomal organization of the hu- man dihydrofolate reductase genes: dispersion, selective am- plification, and a novel form of polymorphism. })roe Natl Acad Sci USA 81:5170-5174

Benchimol S, Jenkins JR, Crawford LV, Leppard K, Lamb P, WilliamsonNM, PimDC, HarlowE (1984) Molecular anal- ysis of the gene for the p53 cellular tumor antigen. Cancer Cells 2:383-391

Benham FJ, Hodgkinson S, Davies KE (1984) A glyceralde- hyde-3-phosphate dehydrogenase pseudogene on the short arm ofthe human X chromosome defines a multigene family. EMBO J 3:2635-2640

Brisson N, Verma DPS (1982) Soybean leghemoglobin gene family: normal, pseudo and truncated genes. Proc Nail Acad Sci USA 79:4055--4059

Britten RJ (1986) Rates of DNA sequence evolution differ be- tween taxonomic groups. Science 231:1393-1398

Brown JR, Dear IO, Krug JR, Masquat LE (1985) Character- ization of the functional gene and several processed pseu- dogenes in the human triosephosphate isomerase gene family. Mol Cell Biol 5:1694-1706

Cavalier-Smith T (1985) Eukaryote gene numbers, non-coding DNA and genome size. In: Cavalier-Smith T (ed) The evo- lution of genome size. Wiley, London, pp 69-103

Chen M J, Shimada T, Moulton AD, Harrison M, Nienhuis AW (1982) Intronless human dihydrofolate reductase genes arc derived from processed RNA molecules. Proc Nail Acad Sci USA 79:7435-7439

Cleary ML, Schon EA, Lingrel JB (1981) Two related pseu- dogenes are the result ofgene duplication in the goat fl-globin locus. Cell 26:181-190

Constanzo F, Colombo M, Staempfli S, Santoro C, Marone M, Frank R, Delins H, Cortese R (1986) Structure ofgene and pseudogenes of human apoferritin H. Nucleic Acids Res 14: 721-736

Danciger E, Dafni N, Bernstein Y, Laver-Rodich Z, Neer A, Groner Y (1986) Human copper-zinc superoxide dismutase gene family molecular structure and characterization of four copper-zinc superoxidase related pseudogenes. Proc Nail Acad Sci USA 83:3619-3623

deJongWW, RydrnL (1981) Causes of more frequent deletions than insertions in mutations and protein evolution. Nature 290:157-159

Delbar JM, Nicole A, Davriol L, Meunier-Rotival M, Galibert F, Sinet PM, Jerome H (1987) Cloning and sequencing of a rat copper-zinc superoxide dismutase complementary DNA. Eur J Biochem 166:181-188

Dudov K.P, Perry RP (1984) The gene family encoding the mouse ribosomal protein L32 contains a uniquely expressed intron-containing gene and an unmutated processed pseu- dogene. Cell 37:457--468

Freytag SO, Bock HGO, Beaudet AL, O'Brien WE (1984) Mo- lecular structure of human argininosuccinate synthetase pseu- dogenes. Evolutionary and mechanistic implications. J Biol Chem 259:3160-3166

Fukasawa KM, Li WH, Yagi K, Luo CC, Li SSL (1986) Molecular evolution of mammalian lactate dehydrogenase-A genes and pseudogenes: association of a mouse processed pseudogene with a B1 repetitive sequence. Mol Biol Evol 3:330-342

Fukasawa KM, Tanimura M, Sakai I, Sharief FS, Chung FZ, Li SSL (1987) Molecular nature of spontaneous mutations in mouse lactate dehydrogenase-A processed pseudogenes. Genetics 115:177-184

Gojobori T, Li WH, Graur D (1982) Patterns of nucleotide substitution in pseudogenes and functional genes. J Mol Evol 18:360-369

Gwo-Shu Lee M, Lewis SA, Wilde CD, Cowan NJ (1983) Evolutionary history of a multigene family: an expressed human β-tubulin gene and three processed pseudogenes. Cell 33:477-487

Hanauer A, Mandel JL (1984) The glyceraldehyde-3-phosphate dehydrogenase gene family: structure of a human cDNA and of an X chromosome linked pseudogene; amazing complexity of the gene family in the mouse. EMBO J 3:2627-2633

Hardison RC, Sawada I, Cheng JF, Shen CKJ, Schmid CW (1986) A previously undetected pseudogene in the human alpha globin gene cluster. Nucleic Acids Res 14:1903-1911

Hollis GF, Hieter PA, McBride OW, Swan D, Leder P (1982) Processed genes: a dispersed human immunoglobulin gene bearing evidence of RNA-type processing. Nature 296:321-325

Hutchinson CA, Brown BA, Davis MG, Hardies SC, Hill A, Padgett RW, Phillips SJ, Timmons BEL, Weaver SG, Edgell MH (1983) β homologous structures in the ~ globin locus of the mouse. In: Goldwasser E (ed) Regulation of hemoglobin biosynthesis. Elsevier, New York, pp 51-68

Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism. Academic Press, New York, pp 21-132

Karin M, Richards RI (1982) Human metallothionein genes: primary structure of the metallothionein-II gene and a related processed pseudogene. Nature 299:797-802

Kuzumaki T, Tanaka T, Ishikawa K, Ogata K (1987) Rat ribosomal protein L35a multigene family: molecular structure and characterization of three L35a-related pseudogenes. Biochim Biophys Acta 909:99-106

Lacy E, Maniatis T (1980) The nucleotide sequence of a rabbit β-globin pseudogene. Cell 21:545-553

Leibold EA, Aziz N, Brown AJP, Munro HN (1984) Conservation in rat liver of light and heavy subunit sequences of mammalian ferritin: presence of unique octopeptide in the light subunit. J Biol Chem 259:4327-4334

Lemischka I, Sharp PA (1982) The sequences of an expressed rat α-tubulin gene and a pseudogene with an inserted repetitive element. Nature 300:330-335

Levinson G, Gutman GA (1987) Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 4:203-221

Lewin R (1981) Evolutionary history written in globin genes. Science 214:426-429

Li WH, Gojobori T, Nei M (1981) Pseudogenes as a paradigm of neutral evolution. Nature 292:237-239

Li WH, Luo CC, Wu CI (1985a) A new method for estimating synonymous and nonsynonymous rates of substitution considering the relative likelihood of nucleotide and codon changes. Mol Biol Evol 2:150-174

Li WH, Luo CC, Wu CI (1985b) Evolution of DNA sequences. In: MacIntyre RJ (ed) Molecular evolutionary genetics. Plenum, New York, pp 1-94

Li WH, Tanimura M, Sharp PM (1987) An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J Mol Evol 25:330-342

Limbach J, Wu R (1985) Characterization of a mouse somatic cytochrome c gene and three cytochrome c pseudogenes. Nucleic Acids Res 13:617-630

Masters JN, Yang JK, Cellini A, Attardi G (1983) A human dihydrofolate reductase pseudogene and its relationship to multiple forms of specific messenger RNA. J Mol Biol 167:23-36

McCarrey JR, Thomas K (1987) Human testis-specific PGK gene lacks introns and possesses characteristics of a processed gene. Nature 326:501-505

Michelson AM, Bruns GAP, Morton CC, Orkin SH (1985) The human phosphoglycerate kinase family. HLA-associated sequences and an X-linked locus containing a processed pseudogene and its functional counterpart. J Biol Chem 260:6982-6992

Moos M, Gallwitz D (1983) Structure of two human β-actin related processed genes one of which is located next to a simple repetitive sequence. EMBO J 2:757-761

Needleman SB, Christian DW (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443-453

Okada A, Sakai M, Muramatsu M (1987) The structure of the rat glutathione S-transferase P gene and related pseudogenes. J Biol Chem 262:3858-3863

Proudfoot N (1980) Pseudogenes. Nature 286:840-841

Proudfoot N, Maniatis T (1980) The structure of a human α-globin pseudogene and its relationship to α-globin gene duplication. Cell 21:537-544

Proudfoot N, Gil A, Maniatis T (1982) The structure of the human zeta-globin gene and a closely linked, nearly identical pseudogene. Cell 31:553-563

Reilly JG, Ogden R, Rossi JJ (1982) Isolation of a mouse pseudo tRNA gene encoding CCA: a possible example of reverse flow of genetic information. Nature 300:287-289

Scrapulla RC (1983) Processed pseudogenes of rat cytochrome c are preferentially derived from one of three alternate mRNAs. Mol Cell Biol 4:2279-2288

Scrapulla RC, Wu R (1983) Nonallelic members of the cytochrome c multigene family of the rat may arise through different messenger RNAs. Cell 32:473-482

Srikantha T, Landsman D, Bustin M (1987) Retropseudogenes for human chromosomal protein HMG-17. J Mol Biol 197:405-413

Ticher A, Graur D (1989) Nucleic acid composition, codon usage, and the rate of synonymous substitution in protein coding genes. J Mol Evol (in press)

Tsujibo H, Tiano HF, Li SSL (1985) Nucleotide sequence of the cDNA and an intronless pseudogene for human lactate dehydrogenase-A isozyme. Eur J Biochem 147:9-15

Vanin EF (1983) Globin pseudogenes. In: Goldwasser E (ed) Regulation of hemoglobin biosynthesis. Elsevier, New York, pp 69-88

Vanin EF (1985) Processed pseudogenes: characteristics and evolution. Annu Rev Genet 19:253-272

Vanin EF, Goldberg GI, Tucker PW, Smithies O (1980) A mouse α-globin related pseudogene lacking intervening sequences. Nature 286:222-226

Varshney U, Gedamu L (1984) Human metallothionein MT-I and MT-II processed genes. Gene 31:135-145

Wagner M, Perry RP (1985) Characterization of the multigene family encoding the mouse S16 ribosomal protein: strategy for distinguishing an expressed gene from its processed pseudogene counterparts by an analysis of total genomic DNA. Mol Cell Biol 5:3560-3576

Walsh JB (1987) Persistence of tandem arrays: implications for satellite and simple-sequence DNAs. Genetics 115:553-567

Wiedemann LM, Perry RP (1984) Characterization of the expressed gene and several processed pseudogenes for the mouse ribosomal protein L30 gene family. Mol Cell Biol 4:2518-2528

Wilbur WJ, Lipman DJ (1985) Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci USA 80:726-730

Wilde CD, Crowther CE, Cripe TP, Gwo-Shu Lee M, Cowan NJ (1982) Evidence that a human β-tubulin pseudogene is derived from its corresponding mRNA. Nature 297:83-84

Wu CI, Li WH (1985) Evidence for higher rates in rodents than in man. Proc Natl Acad Sci USA 82:1741-1745

Zakut-Houri R, Oren M, Bienz B, Lavie V, Hazum S, Givol D (1983) A single gene and a pseudogene for the cellular tumor antigen p53. Nature 306:594-597

Received March 21, 1988/Revised and accepted August 18, 1988