Proc. Natl. Acad. Sci. USA
Vol. 88, pp. 6716-6720, August 1991
Genetics

Role of diversifying selection and gene conversion in evolution of major histocompatibility complex loci

(major histocompatibility complex polymorphisms/multigene family)

TOMOKO OHTA
National Institute of Genetics, Mishima 411, Japan

Communicated by Motoo Kimura, April 12, 1991 (received for review February 2, 1990)

ABSTRACT Genes at the major histocompatibility complex (MHC) in mammals are known to have exceptionally high polymorphism and linkage disequilibrium. In addition, these genes form highly complicated gene families that have evolved through gene conversion and unequal crossing-over. It has been shown recently that amino acid substitution at the antigen recognition site (ARS) is more rapid than synonymous substitution, suggesting some kind of positive natural selection working at the ARS. It is highly desirable to know the interactive effect of gene conversion and natural selection on the evolution and variation of MHC gene families. A population genetic model is constructed that incorporates both selection and gene conversion. Diversifying selection is assumed in which sequence diversity is enhanced not only between alleles at the same locus but also between duplicated genes. Expressed and nonexpressed loci are assumed as in the class I gene family of MHC, with gene conversion occurring among all loci. Extensive simulation studies reveal that very weak selection at individual amino acid sites in combination with gene conversion can explain the unusual pattern of evolution and polymorphisms. Here both gene conversion and natural selection contribute to enhancing polymorphism.

The exceptionally high levels of polymorphism at the class I and class II loci of the major histocompatibility complex (MHC) in human and mouse have been of great interest for many years (for reviews, see refs. 1 and 2).
Based on recent discoveries of the effects of protein structure on antigen recognition and by using reported DNA sequences at these loci, Hughes and Nei (3) have shown that amino acid replacement substitutions occur more frequently than synonymous substitutions at the antigen recognition site (ARS). From this finding, these authors argue that heterozygote advantage (overdominant selection) is operating at the ARS. However, these genes are known to be evolving under various molecular interaction mechanisms such as gene conversion and unequal crossing-over (see refs. 2 and 4-7, for reviews), and overdominant selection at fixed loci would seem to be an insufficient mechanism. It is highly desirable to investigate how natural selection interacts with such molecular mechanisms. In this report, I show that a model that incorporates both selection and gene conversion fits the observed facts better than the model of simple overdominant selection.

Model and Simulation Procedure

In the genomes of human and mouse, there are usually three loci each of the class I and class II gene families (1, 2). All genes are expressed as important cell-surface molecules that participate in regulating immune reaction (ref. 8, see pages 1037-1054). In both class I and class II families, there is variation in the number of genes among different species and also in the level of polymorphism (see ref. 9). Apparently, the numbers of normally expressed genes are rather small, usually three but occasionally two or four, in each of the two class families of mouse and human. The expressed loci are called "classical" for the class I family. Exceptionally high levels of polymorphism exist only at the expressed loci, and nonexpressed loci are much less polymorphic. It has been speculated that the seemingly "dormant" genes may be useful as a donor repertory for gene conversion and help in enhancing polymorphisms (10, 11).
Based on such unusual genetic organization at MHC loci, it has been suggested that these genes evolve with continuing formation, diversification, and degeneration of alleles and loci, presumably because of changing demands upon antigen presentation by a variable antigenic environment (12). This picture may be viewed as "genetic turnover" involving unequal crossing-over, gene conversion, and diversifying selection. It might also be regarded as a type of frequency-dependent selection in the sense of minority advantage within a population of a gene family.

From sequence comparisons, it is thought that the three presently expressed loci of each gene family were duplicated after the mouse-human divergence (9). At the time when the genes duplicated, their divergence would have been low. Thus, in my model, diversity among genes is assumed to be enhanced by selection. Two types of loci are assumed: l_s is the number of selected loci corresponding to the classical class I loci, and l_n is the number of nonselected loci corresponding to the nonclassical ones. All loci are assumed to be identical and free of mutation at the beginning. Each locus consists of 50 sites that correspond to the amino acid sites in the ARS. Mutation according to the infinite allele model (13) is assumed at each site. Fig. 1 shows the model for the case of l_s = 3 and l_n = 6.

A realistic value of the mutation rate per ARS, v, was chosen with respect to the product, Nv, where N is the effective population size. It is now known that the average heterozygosity per nucleotide site of man is around 0.002-0.004 (ref. 14, see page 267). This value can be set approximately equal to 4Nv_0, where v_0 is the selectively neutral mutation rate per site per generation (15). There are 57 amino acids in the ARS (16, 17), and they would correspond to roughly 100 amino acid replacement sites. Therefore, 4Nv of the ARS should be ≈0.2-0.4. I have used Nv ≈ 0.1 (N = 50 and v = 0.002).
I further assume that one generation roughly equals a year in the ancestral species of man and mouse in the subsequent discussion.

As in an ordinary gene family, gene conversion is assumed to occur among the loci, in addition to mutation and random genetic drift. Interlocus but intrachromosomal conversion is carried out by choosing two loci from the (l_s + l_n) loci, and one of the two, randomly chosen, converts the other. A site is randomly chosen from the 49 sites excluding the rightmost one, and either the region to its left including itself or the region to its right excluding itself is converted. The rate of occurrence of the above event is λ per gene per generation. Because only half of a gene is converted on average, the effective conversion rate is λ_e = λ/2. This procedure produces various "recombinant" genes among loci. A small value of the conversion rate (Nλ_e = 0.0-0.4) was chosen that was thought to be realistic based on gene diversity and gene trees.

Abbreviations: MHC, major histocompatibility complex; ARS, antigen recognition site.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Intralocus but interchromosomal conversion is also incorporated. It is performed by choosing a locus from the (l_s + l_n) loci of a diploid individual, and the gene at this locus on one chromosome converts that on the homologous chromosome of the individual. This is again done by choosing a site from the 49 sites of the gene. The interchromosomal conversion has an effect similar to that of ordinary crossing-over. Various "recombinant" genes are again produced by this process, and recombination is between genes at the same locus this time. The rate of intralocus conversion is β, and the effective rate is β_e = β/2. Again a low rate (Nβ_e = 0.1) was chosen that was thought to be realistic from the data.

For selectively neutral mutations, theoretical predictions of genetic variability in the present model are possible (4, 18). When selection is involved, the process is more complicated, and extensive Monte Carlo simulations are required.
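The two conversion steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the original simulation program: the function names, the list-of-lists representation of a chromosome, and the equal probability of either homolog acting as donor in the intralocus step are all choices made here for illustration.

```python
import random

SITES = 50  # sites per gene, as in the model

def convert(donor, recipient, rng):
    """Return a copy of `recipient` with one side of a random breakpoint
    overwritten by the corresponding part of `donor`."""
    # The site is chosen among the 49 sites excluding the rightmost one
    # (0-based indices 0..SITES-2).
    k = rng.randrange(len(donor) - 1)
    out = list(recipient)
    if rng.random() < 0.5:
        out[:k + 1] = donor[:k + 1]   # region to the left, including the site
    else:
        out[k + 1:] = donor[k + 1:]   # region to the right, excluding the site
    return out

def interlocus_conversion(chromosome, rng):
    """Interlocus, intrachromosomal conversion on one chromosome.
    `chromosome` is a list of genes (lists of site states) at l_s + l_n loci."""
    # Two distinct loci in random order: the first converts the second.
    i, j = rng.sample(range(len(chromosome)), 2)
    chromosome[j] = convert(chromosome[i], chromosome[j], rng)

def intralocus_conversion(chrom_a, chrom_b, rng):
    """Intralocus, interchromosomal conversion between two homologs.
    Assumption: either homolog is the donor with probability 1/2."""
    locus = rng.randrange(len(chrom_a))
    donor, recipient = (chrom_a, chrom_b) if rng.random() < 0.5 else (chrom_b, chrom_a)
    recipient[locus] = convert(donor[locus], recipient[locus], rng)
```

In a full simulation each event would be triggered with probability λ (interlocus) or β (intralocus) per gene per generation; because either side of the breakpoint is converted with equal probability, half of a gene is converted on average, which is the origin of the effective rates λ/2 and β/2.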

If the system starts from identical genes, natural selection should operate to increase diversity not only between alleles at the same locus but also between genes at different loci. Here it is convenient to use the identity coefficients (4, 18) of the multigene family model. Fig. 1A shows the two identity coefficients that represent the probabilities of gene identity of the illustrated relationships for the case of l_s = 3 and l_n = 6. Selection is assumed to work to lower F and/or C1. Let d_Fi and d_C1i be the numbers of different sites among the sequences of the illustrated relationships of a diploid individual as in Fig. 1B. Then the fitness of a gamete, w, from this individual with d_C1i and d_Fi is assumed to be given by

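\[
w = \exp(-A_F - A_C), \tag{1}
\]

where

\[
A_F = s_F \sum_{i=1}^{l_s} (d_t - d_{Fi}) \quad \text{when } d_{Fi} < d_t, \qquad A_F = 0 \quad \text{when } d_{Fi} \ge d_t,
\]

\[
A_C = s_C \sum_{i=1}^{l_s - 1} (d_t - d_{C_1 i}) \quad \text{when } d_{C_1 i} < d_t, \qquad A_C = 0 \quad \text{when } d_{C_1 i} \ge d_t.
\]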

In this equation, s_F and s_C are selection coefficients and d_t is the number corresponding to the truncation point. This fitness function is motivated by the consideration that functional diversity among class I or class II molecules depends on amino acid differences in the ARS, and the greater the differences, the more beneficial a gene family is until each difference reaches its truncation point d_t. This model of selection treats only part of the turnover process of the whole gene family mentioned before, i.e., the process of differentiation of duplicated genes. The results are useful for understanding sequence comparison data. The meaning of "diversifying" selection is somewhat different from that of its ordinary usage, because diversity usually means phenotypic diversity. In the present study, environment is an antigenic world.
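The truncated fitness function of Eq. 1 can be illustrated with a short Python sketch. It assumes that each comparison contributes (d_t - d) to the sum only while d is below the truncation point and contributes nothing otherwise; the function name and argument layout are hypothetical, and the defaults s_F = s_C = 0.01 and d_t = 10 are the values quoted elsewhere in the text.

```python
import math

def gamete_fitness(d_F, d_C, s_F=0.01, s_C=0.01, d_t=10):
    """Fitness w = exp(-A_F - A_C) of a gamete from a diploid individual.

    d_F: numbers of different sites in the allelic comparisons (one per selected locus)
    d_C: numbers of different sites in the nonallelic comparisons (l_s - 1 of them)
    Assumption: a comparison stops contributing once it reaches d_t.
    """
    A_F = s_F * sum(d_t - d for d in d_F if d < d_t)
    A_C = s_C * sum(d_t - d for d in d_C if d < d_t)
    return math.exp(-A_F - A_C)
```

With three selected loci and all genes identical, the penalty is largest: A_F = 0.01 × 30 and A_C = 0.01 × 20, so w = exp(-0.5). Once every comparison has reached the truncation point, w = 1 and selection no longer distinguishes further divergence.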

FIG. 1. (A) Diagram of the gene family consisting of three expressed loci and six nonexpressed loci. Allelic (F) and nonallelic (C1) identity coefficients are also shown. (B) Diagram of three expressed loci. A gene contains 50 sites, and d_Fi and d_C1i are the numbers of different sites in comparison.

RESULTS

Our main interest is in how gene diversity is attained under selection and gene conversion. First, results of sequence divergence from the ancestral sequence are presented. Two cases are studied. In case 1, three expressed loci are present and the copy number remains constant (l_s = 3, l_n = 0). In case 2, three expressed and six nonexpressed loci are present (l_s = 3, l_n = 6). In all cases, both interlocus (intrachromosome) and intralocus (interchromosome) conversions are incorporated. In each case, several levels of selection intensity (s_F and s_C of Eq. 1) were considered, but the value of the truncation point was always assumed to be 10 (d_t = 10 in Eq. 1).

The sequence divergence from the original, measured by the distance -log_e(1 - P_d), where P_d is the fraction of different sites, is presented in Fig. 2 as a function of time. The straight line gives the case of neutral mutations. When 2Ns = 0.5, gene conversion appears to be more effective in case 2 than in case 1. However, when 2Ns = 1.0, the pattern of the accelerated divergence is similar in both cases. For 2Ns = 2.0, the acceleration slows down as time goes on in case 2, where gene conversion from the nonselected loci is incorporated. This is thought to be caused by the decreased selection effect produced by conversion from the nonexpressed loci that are not rapidly diverging. I have repeated simulations, and very similar figures were obtained. Thus, the result is repeatable and has an important bearing on the observed decrease of the acceleration of amino acid substitution at the antigen recognition site of the class I genes (3). This problem will be discussed further in a later section.
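The distance used for Fig. 2 is a one-line computation. The sketch below is illustrative only; the function name and the representation of a sequence as a list of site states are assumptions.

```python
import math

def divergence(sequence, ancestor):
    """Distance -log_e(1 - P_d), where P_d is the fraction of sites at which
    `sequence` differs from `ancestor`."""
    p_d = sum(x != y for x, y in zip(sequence, ancestor)) / len(ancestor)
    # math.log raises ValueError when p_d == 1 (every site differs),
    # where the distance is undefined.
    return -math.log(1.0 - p_d)
```

For small P_d the distance is close to P_d itself, and it grows faster than P_d as differences accumulate, which corrects for sites that have changed more than once.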

It can be seen from the figures that the selection is very efficient even when it is mild. The selection intensity of 2Ns ≈ 1 is sometimes called "near neutrality" (15). The reason for such efficient selection is the linkage disequilibrium among segregating sites, a situation similar to the Franklin-Lewontin effect of multilocus selection (19). Once "coadapted" mutant blocks are established, they are maintained in the population by selection (20). The appearance of such coadapted blocks is thought to be the result of interacting forces of selection and random drift (19).

Let us now turn our attention to the more general properties of the present model. Several interesting quantities such as allelic diversity or actual number of alleles were examined in the period from the (1/v)th generation to the end of each simulation experiment, and average values are presented in Table 1. These values do not pertain to the equilibrium situation but rather are the average for the transient phase. The actual MHC gene families turn over in the evolutionary time scale as discussed previously, so the transient phase should be closer to the real situation.

FIG. 2. Gene divergence measured by -log_e(1 - P_d) is given as a function of time, where P_d is the fraction of sites that differ from the original sequence. Time is measured in units of N generations. Solid line, case 1; dashed line, case 2. Parameters are 2Nv = 0.2, 2Nλ = 0.1, and 2Nβ = 0.4.

Examined are nonallelic diversity, allelic diversity, age of nonidentity (polymorphism; both allelic and nonallelic), actual number of alleles, identity excess, allelic diversity at one of the nonexpressed loci (case 2), and distance from the original sequence at the end of each simulation. Allelic and nonallelic diversity are measured by the fraction of different sites among the 50 sites. Age of nonidentity is measured by the age of the younger mutant at the two sites compared whenever the two sites differ, and the value is averaged over all different sites. Both the diversity and the age for nonallelic comparisons are computed for nonallelic genes on the same chromosome, corresponding to C1 in Fig. 1. The actual number of alleles is the one found in the simulated population of 2N = 100.
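The two site-level measures defined above can be sketched as follows. This is a hypothetical bookkeeping scheme, not the original program: it assumes that every site stores, alongside its state, the generation in which that state arose by mutation (0 for the ancestral state), so that the "younger mutant" at a differing site is the state with the later birth generation.

```python
def diversity(gene_a, gene_b):
    """Fraction of sites at which two genes differ."""
    return sum(x != y for x, y in zip(gene_a, gene_b)) / len(gene_a)

def age_of_nonidentity(gene_a, births_a, gene_b, births_b, now):
    """Mean age, in generations, of the younger mutant over all differing sites.

    births_a / births_b hold the generation in which each site's current
    state arose (an assumed representation). Returns None if no site differs.
    """
    ages = [
        now - max(t_a, t_b)
        for x, t_a, y, t_b in zip(gene_a, births_a, gene_b, births_b)
        if x != y
    ]
    return sum(ages) / len(ages) if ages else None
```

Allelic values are obtained by comparing the two genes at the same locus of a diploid individual, and nonallelic values by comparing genes at different loci on the same chromosome.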

Identity excess is a measure of linkage disequilibrium. When there are many sites in a gene, this measure is convenient and hence is used here. Let F_A and F_B be the fractions of identical sites of two randomly chosen chromosomes at the first and the second loci, respectively. Let F_AB be the probability of having identical sites simultaneously at the first and second locus. Then the identity excess is F_AB - F_A F_B. I have suggested the following standardized measure (21):

\[
\frac{F_{AB} - F_A F_B}{H_A H_B}, \tag{2}
\]

where H_A = 1 - F_A and H_B = 1 - F_B. The denominator is the product of the fraction of nonidentical sites at the first locus and that at the second locus. Hedrick (22) suggested standardization of each observed value, instead of using the means of F_A, F_B, and so on. When a multisite gene is treated, the present measure is more convenient. Both measures are dependent on the value of the denominator, and one has to be careful in evaluating the results (22).
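The standardized identity excess of Eq. 2 might be computed from a sample of chromosomes as sketched below. The estimator is an assumption made for illustration: F_A and F_B are taken as means of the per-pair fractions of identical sites over all pairs of chromosomes, and F_AB as the mean of the per-pair product. The function names are hypothetical.

```python
from itertools import combinations

def identity(gene_a, gene_b):
    """Fraction of identical sites between two genes."""
    return sum(x == y for x, y in zip(gene_a, gene_b)) / len(gene_a)

def identity_excess(chromosomes, locus_a, locus_b):
    """Return (F_AB - F_A*F_B, standardized value) for two loci.

    chromosomes: list of chromosomes, each a list of genes (lists of site states).
    The standardized value is None when H_A * H_B == 0.
    """
    f_a, f_b, f_ab = [], [], []
    for c1, c2 in combinations(chromosomes, 2):
        a = identity(c1[locus_a], c2[locus_a])
        b = identity(c1[locus_b], c2[locus_b])
        f_a.append(a)
        f_b.append(b)
        f_ab.append(a * b)  # simultaneous identity at both loci for this pair
    n = len(f_a)
    F_A, F_B, F_AB = sum(f_a) / n, sum(f_b) / n, sum(f_ab) / n
    excess = F_AB - F_A * F_B
    denom = (1 - F_A) * (1 - F_B)  # H_A * H_B
    return excess, (excess / denom if denom > 0 else None)
```

For four chromosomes made up of two fully associated haplotypes in equal numbers, F_A = F_B = F_AB = 1/3, giving an excess of 2/9 and a standardized value of 0.5.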

Several interesting properties of the model can be found in the data of Table 1. (i) Very weak selection at individual sites causes a large increase of allelic and nonallelic diversities, as well as of the actual number of alleles. (ii) Although selection is effective in both cases, allelic and nonallelic diversity and the actual number of alleles are increased by interlocus gene conversion, especially when selection is very weak. (iii) Age of nonidentity becomes higher under selection, and interlocus conversion again increases the age of nonidentity. (iv) Identity excess is large even when selection is very mild, i.e., fairly large linkage disequilibrium is expected under the present model. (v) Allelic diversity at the nonexpressed loci (case 2) is one-third to about one-half of that at the expressed loci. (vi) As pointed out before, in case 2, genetic divergence measured by distance from the original sequence is decreased by gene conversion from nonexpressed loci in the later period of the simulations when 2Ns = 1.0 or more. All of these properties of the model have significant implications for understanding the observed pattern of MHC polymorphisms, which will be discussed later.

Table 1. Properties of the simulated populations in the period from the (1/v)th to the 80Nth generation

Case  2Ns  Nonallelic     Age of        Allelic        Age of         Actual number  Identity excess,  Allelic diversity,*  Divergence at
           diversity*     nonidentity†  diversity*     polymorphism†  of alleles     standard          nonexpressed         80Nth generation
1     0.0  0.152 ± 0.074  1037          0.010 ± 0.006  93             3.21           1.24              -                    0.151
      0.5  0.176 ± 0.074  827           0.038 ± 0.025  443            4.30           1.50                                   0.177
      1.0  0.259 ± 0.052  1253          0.078 ± 0.047  824            4.72           2.26                                   0.2%
      1.5  0.249 ± 0.069  1115          0.086 ± 0.052  698            4.94           3.43                                   0.269
      2.0  0.295 ± 0.060  1277          0.129 ± 0.036  1029           5.22           1.34                                   0.292
2     0.0  0.145 ± 0.064  1042          0.024 ± 0.019  540            3.92           2.30              0.027                0.179
      0.5  0.257 ± 0.081  1142          0.070 ± 0.042  790            4.59           5.26              0.023                0.264
      1.0  0.258 ± 0.069  1039          0.088 ± 0.053  824            4.57           3.00              0.046                0.226
      1.5  0.261 ± 0.041  1193          0.120 ± 0.041  999            5.25           1.20              0.076                0.202
      2.0  0.290 ± 0.035  1288          0.135 ± 0.040  1152           5.95           1.02              0.071                0.214

Other parameters: 2Nv = 0.2, 2Nλ = 0.1, and 2Nβ = 0.4, with N = 50.
*Diversity is measured by the fraction of different sites among the 50 sites, with the standard deviation between generations.
†Age is the average value in terms of number of generations for all nonidentical sites in diploid individuals.

Our next simulation experiments incorporate a different form of selection, in which diversifying selection is only for allelic genes and does not work on nonallelic genes, i.e., s_F > 0 and s_C = 0 in Eq. 1. Also, the length of the simulation is extended to the 160Nth generation. As before, the average values of gene diversity, age, and so on, in the period from the (1/v)th generation to the 160Nth generation were examined. Table 2 gives the results for case 1. In this series of experiments, the intensity of selection is fixed (2Ns = 1.0), and the rate of interlocus conversion is varied from 2Nλ = 0.0 to 2Nλ = 0.4. In the table, C1 + F means that there is selection for diversity of both allelic and nonallelic genes (s_F = s_C = 0.01 in Eq. 1), and F means that selection is only on allelic genes (s_F = 0.01, s_C = 0 in Eq. 1). Let us call the former (C1 + F) selection and the latter F selection.

Table 2. Properties of the simulated populations in the period from the (1/v)th to the 160Nth generation, for case 1 with 2Ns = 1.0

Selection  2Nλ   Nonallelic     Age of        Allelic        Age of         Actual number  Identity excess,  Divergence at
                 diversity*     nonidentity†  diversity*     polymorphism†  of alleles     standard          160Nth generation
C1 + F     0.0   0.419 ± 0.101  2259          0.053 ± 0.024  622            4.10           0.77              0.506
           0.02  0.340 ± 0.094  2087          0.099 ± 0.046  1272           4.35           2.19              0.475
           0.04  0.336 ± 0.069  2096          0.119 ± 0.050  1577           4.79           2.11              0.442
           0.1   0.266 ± 0.059  1802          0.118 ± 0.039  1583           4.79           2.50              0.385
           0.2   0.249 ± 0.069  1529          0.101 ± 0.053  1276           5.39           2.16              0.404
           0.4   0.223 ± 0.049  1895          0.098 ± 0.053  1533           5.68           3.12              0.519
F          0.0   0.328 ± 0.152  1953          0.062 ± 0.031  758            4.14           0.24              0.439
           0.02  0.252 ± 0.092  1632          0.099 ± 0.058  1102           4.33           2.27              0.489
           0.04  0.198 ± 0.072  1635          0.090 ± 0.038  1332           4.49           1.96              0.318
           0.1   0.171 ± 0.058  1793          0.132 ± 0.033  1738           5.17           2.66              0.469
           0.2   0.173 ± 0.053  1471          0.131 ± 0.041  1361           5.73           2.78              0.539
           0.4   0.067 ± 0.027  992           0.103 ± 0.032  1359           6.03           5.67              0.395

C1 + F means that selection works for diversity of both allelic and nonallelic genes (s_F = s_C in Eq. 1), and F means that selection works only on allelic genes (s_F > 0, s_C = 0 in Eq. 1). Other parameters: 2Nv = 0.2 and 2Nβ = 0.4, with N = 50. See Table 1 for footnotes.

The results of Table 2 show that, as the conversion rate increases, the nonallelic diversity decreases, whereas the allelic diversity becomes larger in both selection models. As to the effect of the selection form, the nonallelic diversity is higher and the allelic diversity is lower with (C1 + F) selection than with F selection. This is just as expected. Thus, in the extreme situation of F selection with a high conversion rate, the allelic diversity exceeds the nonallelic diversity. Such a relationship is not in accord with the real data of MHC polymorphisms. The age of polymorphism is affected by the type of selection and by the conversion rate in the same way as the diversity. The actual number of alleles also increases as the conversion rate increases in both selection models. The result has a significant bearing on understanding MHC polymorphisms, which will be discussed later. In both models, the identity excess is high.

The results of case 2 for the two models of selection are given in Table 3. General properties of the data in Table 3 are quite similar to those of the data in Table 2. However, there are significant differences between the two. The differences are caused by gene conversion involving nonexpressed loci in case 2. First, both diversity and age of nonidentity are increased by conversion from nonexpressed loci. Second, in case 2, unlike the previous case 1, allelic diversity does not exceed nonallelic diversity even in the extreme situation of F selection with a high conversion rate. Third, the actual number of alleles tends to be slightly larger, but the identity excess and the distance at the end tend to be smaller in case 2 than in case 1. In case 2, the allelic diversity at one of the nonexpressed loci was also measured and is given in the table. The diversity at the nonexpressed locus is 30-73% of that at the expressed loci. The difference between the expressed and the nonexpressed loci is insufficient compared with real data, and the problem will be discussed later.

Table 3. Properties of the simulated populations in the period from the (1/v)th to the 160Nth generation, for case 2 with 2Ns = 1.0

Selection  2Nλ   Nonallelic     Age of        Allelic        Age of         Actual number  Identity excess,  Allelic diversity,*  Divergence at
                 diversity*     nonidentity†  diversity*     polymorphism†  of alleles     standard          nonexpressed         160Nth generation
C1 + F     0.0   0.462 ± 0.146  2197          0.074 ± 0.032  804            4.34           1.25              0.022                0.521
           0.02  0.339 ± 0.087  2026          0.079 ± 0.041  1366           4.37           0.79              0.033                0.358
           0.04  0.332 ± 0.099  2331          0.105 ± 0.038  1541           4.45           1.70              0.055                0.309
           0.1   0.284 ± 0.063  1941          0.121 ± 0.055  1640           5.10           1.78              0.091                0.352
           0.2   0.274 ± 0.060  1898          0.146 ± 0.044  1684           5.92           1.29              0.095                0.384
           0.4   0.273 ± 0.067  1820          0.159 ± 0.059  1703           7.15           1.65              0.096                0.464
F          0.0   0.336 ± 0.141  1738          0.017 ± 0.012  143            3.36           1.95              0.013                0.450
           0.02  0.286 ± 0.105  1898          0.093 ± 0.043  1360           4.43           1.19              0.038                0.321
           0.04  0.278 ± 0.109  1872          0.117 ± 0.054  1610           4.86           1.87              0.056                0.354
           0.1   0.248 ± 0.082  2056          0.136 ± 0.044  1817           5.28           1.92              0.075                0.294
           0.2   0.213 ± 0.072  1687          0.133 ± 0.049  1604           6.17           1.90              0.093                0.422
           0.4   0.167 ± 0.050  1494          0.133 ± 0.034  1506           7.11           2.36              0.097                0.380

C1 + F, F, and other parameters are as in Table 2. See Table 1 for footnotes.

DISCUSSION

The present simulation studies have clearly shown that the interaction among diversifying selection, gene conversion, and random genetic drift is important for acquiring and maintaining MHC polymorphisms: diversifying selection as well as gene conversion is effective in increasing the allelic diversity, the actual number of alleles, and the age of polymorphism. The effect of conversion is particularly pronounced in case 2, where gene conversion from the nonexpressed loci is incorporated. Note that random drift is also important. In their simulation study of multilocus overdominance, Franklin and Lewontin (19) concluded that the establishment of complementary blocks of genes is caused by "finiteness" of population size. In our study, gene conversion makes the system more complex, and all three processes contribute to the development of polymorphisms.

The real data of MHC polymorphisms suggest that many polymorphisms are trans-species, i.e., more ancient than recent speciation (23). The ages of polymorphisms are often estimated to be (50-100)N generations (24). Our results suggest that an age of this order of magnitude may be explained by assuming weak selection at each site. In addition to the results given in Tables 1-3, I have checked the age of polymorphic alleles at the end of each simulation experiment. These values often exceed 100N generations at the 160Nth generation of the experiment even under weak selection (2Ns = 1.0). Note that the age of polymorphic sites given in Tables 2 and 3 is the average for all segregating sites in the period from the (1/v)th to the 160Nth generation, and the value is much smaller than the age of polymorphic alleles at the end of the experiment.

The actual number of alleles in a sample of 10-80 haplotypes is reported to be 5-20 in local Mus populations (25). Our results suggest that the number in the simulated populations is slightly smaller than such data. This problem may be overcome by making l_n larger or the region of conversion smaller in the simulation to induce more "recombinant" genes. The allele number may also be increased by bringing a subdivided population structure into the model. These problems are left to a future study.

Allelic diversity at the nonexpressed locus may not be small enough as compared with that at the expressed loci in the simulated populations. In other words, the difference between the expressed and the nonexpressed genes may be insufficient to account for the actually observed difference. In future analyses, the preferential conversion from the nonexpressed to the expressed loci should be incorporated.

Linkage disequilibrium is one of the most intensely studied quantities on MHC polymorphisms (26). Strong association among serologically detectable alleles of different loci is often observed, but the combination of strongly associated alleles is usually different between local populations, indicating some effects of random drift. In the present simulations, strong linkage disequilibrium occurs as shown by standardized identity excess in Tables 1-3. Although one needs more detailed numerical comparison with the actual data, the association of alleles between loci appears to be strong enough in the simulated populations. I incorporated interchromosomal conversion, which has an effect similar to the ordinary meiotic crossing-over, but it may not be quite sufficient.
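As a rough illustration of what "association of alleles between loci" means numerically, the following sketch computes the classical two-locus disequilibrium coefficient D and its squared correlation r^2 from a list of haplotypes. This is not the standardized identity excess used in Tables 1-3; it is the textbook pairwise measure, swapped in here only because it is simple to state. The function name, the allele labels, and the sample haplotypes are all hypothetical.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Return (D, r^2) for allele_a at locus 1 and allele_b at locus 2.

    haplotypes: list of (locus1_allele, locus2_allele) tuples.
    D   = P(ab) - P(a) * P(b)
    r^2 = D^2 / (P(a) (1 - P(a)) P(b) (1 - P(b)))
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes
               if a == allele_a and b == allele_b) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either allele is fixed or absent.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical sample: two loci in complete association.
sample = [("A1", "B1")] * 4 + [("A2", "B2")] * 4
d, r2 = linkage_disequilibrium(sample, "A1", "B1")
print(d, r2)
```

When the same two alleles always travel together, as in the hypothetical sample, r^2 reaches its maximum of 1; when alleles at the two loci combine at random, D is zero.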

Finally, I discuss the puzzling observation on the acceleration of amino acid substitution at ARS. Figure 2 of ref. 20 indicates that the acceleration of amino acid substitution disappears as genes become old in the genetic turnover process mentioned earlier. Hughes and Nei (3) suggest that the slowdown of the acceleration can be explained if only certain types of amino acid replacements are favored even at ARS with back and forth substitutions, which they call "saturation." For the class II gene family, this hypothesis may be appropriate, since the mean divergence is high. It is likely that the evolutionary pattern is considerably different between the class I and class II families. It has been suggested that the HLA-DR locus activates the immune reaction, whereas the HLA-DQ locus may suppress the reaction (27). Thus, the two loci may not be interchangeable, and the genetic turnover may be prevented in the class II gene family. For class I genes, the mean divergence is 40% at most, and one has to assume very limited types of amino acid replacements at ARS to explain the slowdown by saturation. I have mentioned that the diversifying selection for allelic and nonallelic genes results in rapid divergence at the beginning of the turnover process, followed by the slower divergence (20). The present simulations have shown that conversion from the nonexpressed loci strengthens this tendency. As mentioned before, this effect may be caused by the slow divergence at the nonexpressed genes from which genetic information transfers to the expressed loci.
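The saturation argument can be made concrete with a small sketch. It assumes, purely for illustration, that each site moves among k equally likely amino acid states with back and forth substitutions (a symmetric k-state model of the Jukes-Cantor type); this model, the function name, and the chosen values of k are assumptions and are not part of the simulations reported here. Under that assumption the observed proportion of differing sites approaches a ceiling of (k - 1)/k however many substitutions accumulate.

```python
import math


def observed_divergence(d, k):
    """Expected proportion of differing sites between two sequences.

    d: expected number of substitutions per site separating them.
    k: number of interchangeable states per site (assumed equally likely).
    Symmetric k-state model: p = ((k-1)/k) * (1 - exp(-k*d/(k-1))).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if d < 0:
        raise ValueError("d must be non-negative")
    plateau = (k - 1) / k
    return plateau * (1.0 - math.exp(-d / plateau))


# Compare a very restricted set of replacements (k = 2) with freer ones.
for k in (2, 4, 20):
    curve = [round(observed_divergence(d, k), 3) for d in (0.1, 0.5, 1.0, 3.0)]
    print(k, curve)
```

The ceiling is 0.5 for k = 2, 0.75 for k = 4, and 0.95 for k = 20. In this simplified picture, a slowdown that sets in at or below roughly 40% divergence requires k close to 2, which is one way to read the remark that very limited types of replacement would have to be assumed for class I genes.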

I thank Professors Motoo Kimura, Takehiko Sasazuki, Philip W. Hedrick, Bruce S. Weir, Hiroshi Hori, and Kenichi Aoki for their many valuable comments on the manuscript. This work is supported by a Grant-in-Aid from the Ministry of Education, Science and Culture of Japan. This is contribution no. 1869 from the National Institute of Genetics, Mishima, Japan.

1. Klein, J. (1986) Natural History of the Major Histocompatibility Complex (Wiley, New York).

2. Bodmer, W. F. & Bodmer, J. G. (1989) in Mathematical Evolutionary Theory, ed. Feldman, M. W. (Princeton Univ. Press, Princeton, NJ), pp. 315-334.

3. Hughes, A. L. & Nei, M. (1988) Nature (London) 335, 167-170.

4. Ohta, T. (1983) Theor. Popul. Biol. 23, 216-240.

5. Ohta, T. (1988) in Oxford Surveys in Evolutionary Biology, V, eds. Harvey, P. H. & Partridge, L. (Oxford Univ. Press, Oxford, U.K.), pp. 41-65.

6. Kappes, D. & Strominger, J. L. (1988) Annu. Rev. Biochem. 57, 991-1028.

7. Lawlor, D. A., Zemmour, J., Ennis, P. D. & Parham, P. (1990) Annu. Rev. Immunol. 8, 23-63.

8. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. & Watson, J. D. (1989) Molecular Biology of the Cell (Garland, New York), 2nd Ed.

9. Klein, J. & Figueroa, F. (1986) CRC Crit. Rev. Immunol. 6, 295-386.

10. Bregegere, F. (1983) Biochimie 65, 229-237.

11. Ohta, T. (1984) Genetics 106, 517-528.

12. Parham, P. (1989) Nature (London) 324, 617-618.

13. Kimura, M. & Crow, J. F. (1964) Genetics 49, 725-738.

14. Nei, M. (1987) Molecular Evolutionary Genetics (Columbia Univ. Press, New York).

15. Kimura, M. (1983) The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, London).

16. Bjorkman, P. J., Saper, M. A., Samraoui, B., Bennett, W. S., Strominger, J. L. & Wiley, D. C. (1987) Nature (London) 329, 506-512.

17. Bjorkman, P. J., Saper, M. A., Samraoui, B., Bennett, W. S., Strominger, J. L. & Wiley, D. C. (1987) Nature (London) 329, 512-518.

18. Nagylaki, T. (1984) Genetics 106, 529-548.

19. Franklin, I. & Lewontin, R. C. (1970) Genetics 65, 707-734.

20. Ohta, T. (1991) in Evolution of Life, eds. Osawa, S. & Honjo, T. (Springer, Berlin), pp. 145-159.

21. Ohta, T. (1980) Genet. Res. 36, 181-197.

22. Hedrick, P. W. (1987) Genetics 117, 331-341.

23. Klein, J. (1987) Hum. Immunol. 19, 155-162.

24. Takahata, N. & Nei, M. (1990) Genetics 124, 967-978.

25. Nadeau, J. H., Wakeland, E. K., Gotze, D. & Klein, J. (1981) Genet. Res. 37, 17-31.

26. Bodmer, J. G. & Bodmer, W. F. (1970) Am. J. Hum. Genet. 22, 396-411.

27. Sasazuki, T. (1989) Prog. Immunol. 7, 853-860.
