6
Molecular Phylogenetics and Evolution 42 (2007) 388–393 www.elsevier.com/locate/ympev 1055-7903/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.ympev.2006.07.015 Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods Robert Friedman 1 , Austin L. Hughes ¤ Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA Received 18 April 2006; revised 6 July 2006; accepted 22 July 2006 Available online 3 August 2006 Abstract Two commonly used methods based on likelihood-ratio tests (LRTs) for detecting positive Darwinian selection at the molecular level were applied to a data set of 604 gene families containing two members in the human genome and two members in the mouse genome. These methods detected positive selection in a very high proportion of families; in over 50% of families, there was signiWcant evidence of positive selection by one or both methods. However, less than a third of families showing evidence for positive selection by at least one of the methods showed evidence of positive selection by both methods. The outcome of these tests was predicted better by sequence length, G + C content at third-codon positions, and the level of synonymous substitution than by the level of nonsynonymous substitution or the ratio of nonsynonymous to synonymous substitution. These results suggested that LRT-based tests for positive selection may be sensitive to certain factors that make it diYcult to reconstruct the true pattern of nucleotide substitution. © 2006 Elsevier Inc. All rights reserved. Keywords: Likelihood-ratio test; Nucleotide content; Positive selection; Synonymous substitution 1. Introduction Kimura (1977) was the Wrst to point out that comparison of the numbers of synonymous and nonsynonymous nucle- otide substitutions can be a powerful tool for studying nat- ural selection. A number of subsequent studies tested the prediction that positive Darwinian selection favoring amino acid changes will cause the number of nonsynony- mous substitutions per nonsynonymous site (d N ) to exceed the number of synonymous substitutions per synonymous site (d S ) (Hill and Hastie, 1987; Hughes and Nei, 1988, 1989). Studies such as those of vertebrate major histocom- patibility complex (MHC) genes used this approach to test a priori hypotheses based on biological knowledge. In the case of the class I MHC, for example, structural studies had identiWed the codons encoding the peptide-binding region (PBR) of the molecule (Bjorkman et al., 1987a,b). On the hypothesis that MHC polymorphism is due to overdomi- nant selection (or a similar form of balancing selection) relating to peptide binding and thus to disease resistance (Doherty and Zinkernagel, 1975), it was predicted that nat- ural selection will promote diversity in the PBR but not elsewhere in the molecule. Hughes and Nei (1988) tested this prediction by estimating d S and d N separately in the PBR codons and in the rest of the gene. More recently, a number of investigators have sought to test the hypothesis that positive Darwinian selection is acting on a group of related genes without a speciWc a priori hypothesis regarding the codons subject to positive selection. In these cases, widespread use has been made of likelihood- ratio tests (LRTs) that compare a model incorporating posi- tive selection (i.e., d N > d S ) at certain sites with a model not incorporating positive selection (Yang et al., 2000). These * Corresponding author. Fax: +1 803 777 4002. E-mail address: [email protected] (A.L. Hughes). 1 Present address: Bioinformatics Center, University of Connecticut, Storrs, CT 06269, USA.

Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods

Embed Size (px)

Citation preview

Page 1: Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods

Molecular Phylogenetics and Evolution 42 (2007) 388–393www.elsevier.com/locate/ympev

Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods

Robert Friedman 1, Austin L. Hughes ¤

Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA

Received 18 April 2006; revised 6 July 2006; accepted 22 July 2006Available online 3 August 2006

Abstract

Two commonly used methods based on likelihood-ratio tests (LRTs) for detecting positive Darwinian selection at the molecular levelwere applied to a data set of 604 gene families containing two members in the human genome and two members in the mouse genome.These methods detected positive selection in a very high proportion of families; in over 50% of families, there was signiWcant evidence ofpositive selection by one or both methods. However, less than a third of families showing evidence for positive selection by at least one ofthe methods showed evidence of positive selection by both methods. The outcome of these tests was predicted better by sequence length,G + C content at third-codon positions, and the level of synonymous substitution than by the level of nonsynonymous substitution or theratio of nonsynonymous to synonymous substitution. These results suggested that LRT-based tests for positive selection may be sensitiveto certain factors that make it diYcult to reconstruct the true pattern of nucleotide substitution.© 2006 Elsevier Inc. All rights reserved.

Keywords: Likelihood-ratio test; Nucleotide content; Positive selection; Synonymous substitution

1. Introduction

Kimura (1977) was the Wrst to point out that comparisonof the numbers of synonymous and nonsynonymous nucle-otide substitutions can be a powerful tool for studying nat-ural selection. A number of subsequent studies tested theprediction that positive Darwinian selection favoringamino acid changes will cause the number of nonsynony-mous substitutions per nonsynonymous site (dN) to exceedthe number of synonymous substitutions per synonymoussite (dS) (Hill and Hastie, 1987; Hughes and Nei, 1988,1989). Studies such as those of vertebrate major histocom-patibility complex (MHC) genes used this approach to testa priori hypotheses based on biological knowledge. In the

* Corresponding author. Fax: +1 803 777 4002.E-mail address: [email protected] (A.L. Hughes).

1 Present address: Bioinformatics Center, University of Connecticut,Storrs, CT 06269, USA.

1055-7903/$ - see front matter © 2006 Elsevier Inc. All rights reserved.doi:10.1016/j.ympev.2006.07.015

case of the class I MHC, for example, structural studies hadidentiWed the codons encoding the peptide-binding region(PBR) of the molecule (Bjorkman et al., 1987a,b). On thehypothesis that MHC polymorphism is due to overdomi-nant selection (or a similar form of balancing selection)relating to peptide binding and thus to disease resistance(Doherty and Zinkernagel, 1975), it was predicted that nat-ural selection will promote diversity in the PBR but notelsewhere in the molecule. Hughes and Nei (1988) testedthis prediction by estimating dS and dN separately in thePBR codons and in the rest of the gene.

More recently, a number of investigators have sought totest the hypothesis that positive Darwinian selection is actingon a group of related genes without a speciWc a priorihypothesis regarding the codons subject to positive selection.In these cases, widespread use has been made of likelihood-ratio tests (LRTs) that compare a model incorporating posi-tive selection (i.e., dN > dS) at certain sites with a model notincorporating positive selection (Yang et al., 2000). These

Page 2: Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods

R. Friedman, A.L. Hughes / Molecular Phylogenetics and Evolution 42 (2007) 388–393 389

methods have been widely used in recent years, but it is stilluncertain whether they have a tendency to support thehypothesis of positive selection when it is untrue. However, arecent study provided two examples of cases where relatedmethods of inferring positively selected codons producedapparent false-positives (Suzuki and Nei, 2004).

A survey of published studies using LRT-based methodsprobably would not provide an accurate assessment of theprobability of detecting positive selection by these methodsbecause investigators tend to test for selection in caseswhere there are biological reasons for expecting it to occur.Therefore, we applied commonly used LRT-based methodsto a large data set of gene families from human and mouse.Because these methods require a phylogenetic tree, we usedfour-member families, which represent the minimum sizefor reconstructing an unrooted tree. We used these analysesto examine the factors associated with achieving a signiW-cant result in these tests and to compare the resultsobtained with diVerent selection models.

2. Methods

2.1. Sequence data

The genomic data for human (Version 16.33) and mouse(Version 16.30) were obtained from Ensembl (http://www.ensembl.org). The version numbers refer to the data-base software (Ensembl Version 16) and the assembly ofthe genomic sequences (NCBI Versions 33 and 30). Proteinfamilies were identiWed by homology, and a single-linkagemethod employed by the BLASTCLUST software avail-able in the Blast software package (Altschul et al., 1997).Sequence homology was established by identifying matchesusing a conservative E value of 10¡6 with a minimum of30% sequence identity across at least 50% across the lengthof two sequences. Preliminary analyses with this and otherdata sets showed that this is a conservative set of criterionfor establishing gene family membership (Hughes andFriedman, 2004, Unpublished data). The single-linkagemethod assembles larger families by linking shared genesamong families, thus ensuring that a given gene will beassigned to only one family. We identiWed 605 families hav-ing two members each in human and mouse; one of thesefamilies was excluded from analyses because the phyloge-netic tree was not resolved (i.e., a unique ML tree was notobtained).

2.2. Statistical analyses

Homologous sequences were aligned at the amino acidlevel using the CLUSTAL W program (Thompson et al.,1994), and this alignment was imposed on the DNAsequences using a customized Perl script. Any codon atwhich the alignment postulated a gap in any sequence wasexcluded from the analysis. The alignments are available asSupplementary Material. The number of synonymousnucleotide substitutions per synonymous site (dS) and the

number of nonsynonymous nucleotide substitutions pernonsynonymous site (dN) were estimated by a maximumlikelihood method (Yang and Nielsen, 2000) using the soft-ware package PAML (Yang, 1997).

Likelihood-ratio tests were used to compare modelsassuming no positive selection with those implying that atsome codons the ratio of nonsynonymous to synonymoussubstitution (�) exceeds 1, as expected under positive selec-tion. Two tests of positive selection were used:

(1) M1/M2 tests, which compare models M1 and M2(Nielsen and Yang, 1998; Yang et al., 2000). Under theM1 model, a single parameter is estimated: p0, the pro-portion of conserved sites with �0D0. At the remainingsites, constituting proportion p1 of the total sites (wherep1D1¡ p0), it is assumed that �1D 1. The M2 modeladds a class of positively selected sites with frequency p2(where p2D 1¡p1¡p0), with ratio �2 estimated from thedata. Thus, while M1 estimates a single parameter (p0),M2 estimates three parameters (p0, p1, and �2).(2) M7/M8 tests, which compare models M7 and M8(Yang et al., 2000). In the M7 model, � follows a betadistribution such that 0 6�6 1, and the two parame-ters (p and q) of the beta distribution are estimated fromthe data. In the M8 model, a proportion p0 of sites have� drawn from the beta distribution. The remaining sites(proportion p1) are positively selected and have the same�1 (>1). Thus, M7 estimates two parameters (p and q),while M8 estimates four parameters (p, q, p0, and �1).

In the likelihood-ratio test (LRT), twice the log-likeli-hood diVerence between the two models is compared withthe �2 statistics with degrees of freedom equal to the diVer-ence in the number of parameters, i.e., 2 degrees of freedomin the case of both M1/M2 tests and M7/M8 tests. In orderto count an individual gene family as showing a signiWcantevidence of positive selection, we required both a signiW-cant�2 statistic (at the 5% level) and the identiWcation ofone or more codons subject to positive selection. Resultsusing a 1% signiWcance level were similar to those obtainedusing a 5% signiWcance level; only the latter are reportedhere. The phylogenetic tree for each family was recon-structed by the ML method in PAML assuming the M1model. We used linear discriminant analyses (Morrisonp.231–232, 1976) in order to predict the outcome of testsbased on variables describing sequence properties of familymembers. Statistical analyses comparing families were con-ducted using the Minitab statistical package, release 13(http://www.minitab.com/).

3. Results

M1/M2 tests gave signiWcant support to the hypothesisof positive selection at the 5% level in 209 of 604 families(34.6%). M7/M8 tests provided signiWcant support for posi-tive selection at the 5% level in 229 families (37.9%) and 329families (54.5%) were signiWcant by one or both types of

Page 3: Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods

390 R. Friedman, A.L. Hughes / Molecular Phylogenetics and Evolution 42 (2007) 388–393

test. A 2£ 2 contingency table showed a signiWcant associa-tion between signiWcance by the two tests (Table 1). How-ever, this association was explained mainly by the fact thatthe largest category in the table was that of families signiW-cant by neither test (275 families or 45.5%) (Table 1). Of the329 families in which one or both tests was signiWcant, only107 (32.5%) showed signiWcance by both tests (Table 1).

Yang et al. (2000) warn that, because the �2 test may notbe strictly applicable in these tests, one should have cautionregarding cases where the test statistic is close to the criticalvalue. In the present data set, most signiWcant �2 valueswere in fact substantially above the critical value, which is5.99 for the 5% level with 2 d.f. In M1/M2 tests, the medianvalue of the �2 statistic for families showing evidence ofpositive selection was 201.30 (range 8.02–1957.98). Of 209families, 190 (90.9%) had �2 values signiWcant at the 0.01level; 181 (86.6%) had �2 values signiWcant at the 0.001 level;and 161 (77.0%) had �2 values signiWcant at the 0.00001level. In M7/M8 tests, the median value of the �2 statisticfor families showing evidence of positive selection was22.07 (range 6.01–220.74). Of 227 families, 206 (90.7%) had�2 values signiWcant at the 0.01 level; 196 (86.3%) had �2 val-ues signiWcant at the 0.001 level; and 175 (77.1%) had �2 val-ues signiWcant at the 0.00001 level. Thus, by both tests, thesets of families with signiWcant results included very fewthat were marginally signiWcant and many that were veryhighly signiWcant.

By M1/M2 tests, families with Tree I topologies (Fig. 1)showed evidence of positive selection in 198 of 576 cases(34.4%), while families with Tree II topologies showed sig-niWcant evidence of positive selection in 11 of 28 cases(39.3%) (Table 2). Thus, there was no detectable diVerencebetween the two topologies with respect to the incidence ofsigniWcant results in the M1/M2 test. On the other hand, byM7/M8 tests, families with Tree I topologies were signiW-cantly more likely to show signiWcant evidence of positive

Table 1Numbers of families showing signiWcant (at 5% level) evidence of positiveselection by M1/M2 tests and by M7/M8 tests

Test of independence: �2 D 25.25; 1 d.f.; P < 0.001.

M7/M8 tests

M1/M2 tests Not signiWcant (%) SigniWcant (%)

Not signiWcant 275 (45.5) 120 (19.9)SigniWcant 102 (16.9) 107 (17.7)

Fig. 1. The two alternative topologies relating the two human genes (H)and the two mouse genes (M) in the families analyzed.

M

M

H

H

Tree I

M

H

H

M

Tree II

selection (222/576 or 38.5%) than were families with Tree IItopologies (5/28 or 17.9%) (Table 2).

We compared median values of the following quantitiesamong families categorized by whether there was evidenceof positive selection by either or both of the two types oftest: (1) the length (number of codons) of the alignedsequences; (2) mean G + C content at third-codon positions(GC3) in the four genes in the family; (3) mean dS for allpair-wise comparisons among the four genes in the family;(4) mean dN for all pair-wise comparisons among genes inthe family; and (5) mean dN/dS for all pair-wise compari-sons among the four genes in the family (Table 3).

There were signiWcant diVerences among the four catego-ries of families with respect to sequence length, GC3, andmean dS (Table 3). Families showing evidence of positiveselection only by M7/M8 tests had the highest mediansequence length, while families showing evidence of positiveselection by neither test showed the lowest median sequencelength (Table 3). On the other hand, families showing evi-dence of positive selection by both types of test showed thehighest median GC3 and the highest median value of mean dS(Table 3). The lowest median GC3 was in families showingevidence of positive selection by M7/M8 only, whereas fami-lies showing evidence of positive selection by either or bothtests all had higher median values of mean dS than did familiesshowing evidence of positive selection by neither test (Table3). In contrast, although mean dN and mean dN/dS are the twovariables with the most obvious theoretical link to positiveselection, median values of these two variables did not diVersigniWcantly among categories of families. Similar patternswere seen when the means of the same variables were com-pared by parametric analysis of variance (data not shown).

Table 2SigniWcance (at the 5 and 1% levels) of tests for positive selection in fami-lies classiWed by the topology of the phylogenetic tree

a See Fig. 1.b Test of independence: �2 D 0.29; 1 d.f.; not signiWcant.c Test of independence: �2 D 4.87; 1 d.f.; P D 0.027.

Tree Ia Tree IIa

M1/M2 testsNot signiWcant 378 17SigniWcant 198 11b

M7/M8 testsNot signiWcant 354 23SigniWcant 222 5c

Table 3Medians of variables describing sequences used in tests of positive selec-tion

n.s., not signiWcant

Length (codons)

Mean GC3 (%)

Mean dS

Mean dN

Mean dN/dS

SigniWcant by:Neither test 281 60.8 1.666 0.355 0.186M1/M2 test only 313 59.8 1.708 0.348 0.194M7/M8 test only 524 56.6 1.786 0.353 0.171Both tests 436 62.1 1.909 0.340 0.162

P (Kruskal–Wallis test) <0.001 0.006 0.013 n.s. n.s.

Page 4: Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods

R. Friedman, A.L. Hughes / Molecular Phylogenetics and Evolution 42 (2007) 388–393 391

We used linear discriminant analyses in order to exam-ine further the relationship between these variables and evi-dence of positive selection. A discriminant function basedon sequence length, GC3, and mean dS successfully pre-dicted whether or not there would be signiWcant evidence ofpositive selection by M1/M2 tests in 55.1% of families (333/604). Adding mean dN and mean dN/dS to the discriminantfunction improved the success rate to 57.5% (347 families).On the other hand, a discriminant function based onsequence length, GC3, and mean dS successfully predictedwhether or not there would be signiWcant evidence of posi-tive selection by M7/M8 tests in 70.0% of families (423/604).Adding mean dN and mean dN/dS to the discriminant func-tion decreased the success rate to 69.0% (417/604 families).Overall, a discriminant function based on sequence length,GC3, and mean dS successfully predicted whether or notthere would be signiWcant evidence of positive selection byeither M1/M2 or M7/M8 tests (or both) in 63.1% of families(381/604). Adding mean dN and mean dN/dS to the discrimi-nant function decreased the success rate to 62.9% (380/604families).

Because these variables were inter-correlated, we usedpartial correlation to tease apart further the relationshipsbetween these variables and the outcome of LRT. We com-puted partial rank correlation coeYcients between the �2

statistic in M1/M2 tests and each of the following variables,simultaneously controlling for all other variables: length,GC3, mean dS, mean dN, and mean dN/dS (Table 4). Like-wise, we computed partial rank correlation coeYcientsbetween the �2 statistic in M7/M8 tests and each of thesame set of variables, simultaneously controlling for allother variables (Table 4). The �2 statistic in M1/M2 testsshowed highly signiWcant positive partial correlations withlength, GC3, and mean dS (Table 4). However, The �2 statis-tic in M1/M2 tests did not show a signiWcant positive par-tial correlation with either mean dN or mean dN/dS (Table4). In contrast, the �2 statistic in M7/M8 tests showed ahighly signiWcant positive partial correlation with length,but was not signiWcantly correlated with any of the othervariables (Table 4).

Overall mean GC3 was 59.8% (§0.4 S.E.), and the medianwas 59.7%. In 490 families (81.1%), GC3 was greater than50.0%. Thus, there was a preponderance of GC-rich genefamilies. We calculated GC-content bias as the absolutevalue of the diVerence between GC3 and 50.0%. GC-contentbias was strongly positively correlated with GC3 (rank cor-

Table 4Rank partial correlations with �2 statistics for Wve variables (simulta-neously controlling in each case for the eVects of other four variables)

n.s., not signiWcant

Length Mean GC3

Mean dS

Mean dN

Mean dN/dS

M1/M2 tests 0.577 0.278 0.132 ¡0.007 0.061P <0.001 <0.001 0.001 n.s. n.s.

M7/M8 tests 0.360 ¡0.018 0.047 0.019 ¡0.004P <0.001 n.s. n.s. n.s. n.s.

relation coeYcientD0.875; P < 0.001). This relationshipimplies that, in this data set, gene families with high GC3were those with the highest GC-content bias.

4. Discussion

A study based on estimating dS and dN over entire genesby Endo et al. (1996) found evidence of positive selection inonly 0.47% of 3595 gene families surveyed. Yang et al. (2000)suggested that LRT-based approaches are more likely thanEndo et al.’s (1996) method to detect positive selection, butdid not provide a quantitative estimate of the increase inpower. The present study applied LRT methods to a set ofmammalian gene families none of which was predicted it apriori to be subject to positive selection. Nonetheless, LRTmethods supported positive selection in an extraordinarilyhigh proportion of families, about two orders of magnitudehigher than found by Endo et al. (1996).

The widely used M1/M2 test yielded statistically signiW-cant evidence of positive selection in 34.6% of families. TheM7/M8 test was slightly less conservative, yielding signiW-cant support in 37.6% of families. Because the approachused by Endo et al. (1996) did not estimate dS and dN sepa-rately for separate functional domains of proteins, it islikely that those authors underestimated the frequency ofcases of positive selection. Nonetheless, it seems possiblethat the LRT methods overestimated the frequency of posi-tive selection in the present data, at least to some extent.

Whether the LRT test is prone to false-positives mightbe addressed by computer simulation, as Zhang (2004) hasrecently done for likelihood models that test for positiveselection at individual sites along a speciWc lineage. How-ever, aside from questions about the test itself, the expectedextent of positive selection in natural populations remainsan open question. Kimura and Ohta (1974) predicted thatover the evolution of life-purifying selection and chanceWxation of neutral variants have occurred “far more fre-quently” than Wxation by positive selection. On the otherhand, Smith and Eyre-Walker (2002) estimated that 45% ofthe amino acid substitutions between Drosophila simulansand D. yakuba have been Wxed by positive selection.

As with many data sets to which LRT methods havebeen applied, natural selection on proteins in the presentdata set might have involved both divergence of orthologsbetween species and divergence of paralogs within species.Since gene duplication occurs frequently over the course ofevolution (Lynch and Conery, 2000) but not all duplicategenes are retained, the duplicate genes found in genomesare not a random sample of all duplicates that haveoccurred (Hughes, 1994). Since duplicate genes that havediverged functionally are more likely to be retained, the fre-quency of positive selection between paralogs may be rela-tively high (Hughes, 1994). On the other hand, if theretention of duplicate genes occurs mainly because of theevolution of expression diVerences (Lynch and Force,2000), natural selection favoring coding sequence diver-gence may be relatively rare. Clearly, if positive selection is

Page 5: Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods

392 R. Friedman, A.L. Hughes / Molecular Phylogenetics and Evolution 42 (2007) 388–393

as widespread in multi-gene families as suggested by thepresent results, evidence for positive selection in itself doesnot provide a great deal of information. Rather, it is impor-tant to test biological hypotheses regarding the regions ofmolecules on which selection has focused (Hughes, 1999).

In addition to providing evidence of a high frequency ofdetection of positive selection, our results revealed a certainlack of comparability between the methods. Over two-thirds (67.5%) of families showing signiWcant evidence ofpositive selection by one of the two tests did not show evi-dence of positive selection by the other test (Table 1). Fur-thermore, in both tests a signiWcant result was associatedwith variables not expected to have any biological rele-vance to the presence of positive selection. One such vari-able was sequence length, which partial correlation analysesshowed to be positively associated with the �2 statistic inboth kinds of tests (Table 4). An eVect of sequence length isnot surprising, since these tests depend on the assumptionthat each codon evolves independent of other codons(Yang et al., 2000). Thus, all else being equal, it is expectedthat as the number of codons increases, the power to detectpositive selection should increase as well. Nonetheless, itseems paradoxical that the median sequence length in fami-lies where positive selection was supported by both testswas substantially lower than the median sequence length infamilies where only M7/M8 supported positive selection(Table 3).

In addition, partial correlation analyses showed a signiW-cant positive correlation between the �2 statistic in M1/M2tests and mean dS among family members (Table 4). Inthese families, mean dS serves primarily as a proxy for theage of duplication of paralogous gene pairs. Therefore, itseems that M1/M2 tests are more likely to detect selectionin cases of ancient duplication. In the case of M7/M8 tests,there was not a signiWcant positive correlation between the�2 statistic and mean dS among family members (Table 4).However, M7/M8 tests were signiWcantly more likely todetect positive selection in families with Tree I, indicatingancient duplication prior to human–mouse divergence,than in families with Tree II, indicating duplication afterthe human–mouse divergence (Fig. 1). Thus, M7/M8 testsmay also somewhat more readily detect selection in the caseof ancient duplication.

Partial correlation analysis showed a signiWcant associa-tion between the �2 statistic in M1/M2 tests and G + C atthird-codon positions (GC3) (Table 4). In this data set,GC3 and mean dS were positively correlated (rank correla-tion coeYcientD0.322; P < 0.001). However, since partialcorrelation controls for such correlations among variables,our results imply that the association between a high �2 sta-tistic in the M1/M2 test and GC3 is independent of the lin-ear relationship between GC3 and mean dS. One factor thatmay explain the association of GC3 with a high �2 statisticin the M1/M2 test is the fact that, as is typical of mamma-lian genomes (Duret, 2002), there is a bias towards G + C atthird positions. As a consequence, genes with high GC3constituted the majority of genes with biased GC content.

False-positives in the detection of positive selection mayoccur when synonymous substitutions at certain sites areunderestimated because of this bias.

Our results thus suggest that the M1/M2 test may besensitive to factors that make it diYcult to reconstructthe true pattern of codon substitution. Such factorsapparently include a large evolutionary distance amongsequences and a skewed G + C content. LRT methodsrely on “nested” models in which a simpler model (notassuming positive selection) is compared with a morecomplex model (including positive selection). However,this testing procedure may not be reliable if neither thesimple model nor the complex model accurately reXectsthe true evolutionary process. The M7/M8 test appears tobe less sensitive to large evolutionary distances andG + C content than the M1/M2 test (Table 4), presum-ably because the use of a beta distribution to model �may more accurately reXect the distribution of � valuesamong sites in biological data. Nonetheless, it might bepreferable to apply the M7/M8 test only to groups ofclosely related genes (e.g., Jansa et al., 2003), given thediYculty of reconstructing the true pattern of substitu-tion among distantly related sequences.

Our results were obtained with data sets containingonly four members; and it may be possible to avoid someof the anomalous results we obtained by using as large anumber of sequences as possible, as previously recom-mended (Anisimova et al., 2001). In addition, recentmodiWcations of the models used in the LRT tests mayoVer improvements (Wong et al., 2004; Yang et al., 2005).However, it is important to note that numerous papershave been published claiming to provide evidence of pos-itive selection on the basis of the models examined here.Our results suggest that all such claims should beregarded with caution.

In the future, it would be worthwhile to explore newforms of the LRT that, even in the absence of positive selec-tion, make provision for a broader variance of � acrosscodons than is assumed in currently used models (Hughesand Friedman, 2005). It is our hope that the present resultswill stimulate further improvements in the tests for positiveselection. We anticipate that further improvement of themodels of sequence evolution, along with the application ofmore statistically robust tests for positive selection, willenhance our understanding of the role of natural selectionin the evolution of protein-coding genes.

Acknowledgment

This research was supported by a Grant GM43940 fromthe National Institutes of Health.

Appendix A. Supplementary data

Supplementary data associated with this article can befound, in the online version, at doi:10.1016/j.ympev.2006.07.015.

Page 6: Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods

R. Friedman, A.L. Hughes / Molecular Phylogenetics and Evolution 42 (2007) 388–393 393

References

Altschul, S.F., Madden, T.L., SchaVer, A.A., Zhang, J., Zhang, Z., Miller,W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new gen-eration of protein database search programs. Nucleic Acids Res. 25,3389–3402.

Anisimova, M., Bielawski, J.P., Yang, Z., 2001. Accuracy and power of thelikelihood-ratio test in detecting adaptive evolution. Mol. Biol. Evol.18, 1585–1592.

Bjorkman, P.J., Saper, M.A., Samraoui, B., Bennet, W.S., Strominger, J.L.,Wiley, D.C., 1987a. Structure of the human class I histocompatibilityantigen, HLA-A2. Nature 329, 506–512.

Bjorkman, P.J., Saper, M.A., Samraoui, B., Bennet, W.S., Strominger, J.L.,Wiley, D.C., 1987b. The foreign antigen binding site and T-cell recogni-tion regions of class I histocompatibility antigens. Nature 329, 512–518.

Doherty, P.C., Zinkernagel, R.M., 1975. Enhanced immunologic surveillancein mice heterozygous at the H-2 gene complex. Nature 256, 50–52.

Duret, L., 2002. Evolution of synonymous codon usage in metazoans.Curr. Opin. Genet. Dev. 12, 640–649.

Endo, T., Ikeo, K., Gojobori, T., 1996. Large-scale search for genes onwhich positive selection may operate. Mol. Biol. Evol. 13, 685–690.

Hill, R.E., Hastie, N.D., 1987. Accelerated evolution in the reactive centerregions of serine protease inhibitors. Nature 326, 96–99.

Hughes, A.L., 1994. The evolution of functionally novel proteins after geneduplication. Proc. R. Soc. Lond. B 256, 119–124.

Hughes, A.L., 1999. Adaptive Evolution of Genes and Genomes. OxfordUniversity Press, New York.

Hughes, A.L., Friedman, R., 2004. DiVerential loss of ancestral gene fami-lies as a source of genomic divergence in animals. Proc. R. Soc. Lond BSuppl. 271, S107–S109.

Hughes, A.L., Friedman, R., 2005. Variation in the pattern of synonymousand nonsynonymous diVerence between two fungal genomes. Mol.Biol. Evol. 22, 1320–1324.

Hughes, A.L., Nei, M., 1988. Pattern of nucleotide substitution at MHCclass I loci reveals overdominant selection. Nature 335, 167–170.

Hughes, A.L., Nei, M., 1989. Nucleotide substitution at major histocom-patibility complex class II loci: evidence for overdominant selection.Proc. Natl. Acad. Sci. USA 86, 958–962.

Jansa, S.A., Lundrigan, B.L., Tucker, P.K., 2003. Tests for positive selec-tion on immune and reproductive genes in closely related species of themurine genus Mus. J. Mol. Evol. 56, 294–307.

Kimura, M., 1977. Preponderance of synonymous changes as evidence forthe neutral theory of molecular evolution. Nature 267, 275–276.

Kimura, M., Ohta, T., 1974. On some principles governing molecular evo-lution. Proc. Natl. Acad. Sci. USA 71, 2848–2852.

Lynch, M., Conery, J.S., 2000. The evolutionary fate and consequences ofduplicate genes. Science 290, 1151–1155.

Lynch, M., Force, A., 2000. The probability of duplicate gene preservationby subfunctionalization. Genetics 154, 459–473.

Morrison, D.F., 1976. Multivariate Statistical Methods, second ed.McGraw-Hill, New York. 231–232.

Nielsen, R., Yang, Z., 1998. Likelihood models for detecting positivelyselected amino acid sites and applications to the HIV-1 envelope gene.Genetics 148, 920–936.

Smith, Eyre-Walker, A., 2002. Adaptive protein evolution in Drosophila.Nature 415, 1022–1024.

Suzuki, Y., Nei, M., 2004. False-positive selection identiWed by ML-basedmethods: examples from the Sig1 gene of the diatom ThalassiosiraweissXogii and the tax gene of a human T-cell lymphotropic virus. Mol.Biol. Evol. 21, 914–921.

Thompson, J.D., Higgins, D.G., Gibson, T., 1994. CLUSTALW: improv-ing the sensitivity of progressive multiple sequence alignment throughsequence weighting, position-speciWc gap penalties and weight matrixchoice. Nucleic Acids Res. 22, 4673–4680.

Wong, W.S.W., Yang, Z., Goldman, N., Nielsen, R., 2004. Accuracy andpower of statistical methods for detecting adaptive evolution in proteincoding sequences and for identifying positively selected sites. Genetics168, 1041–1051.

Yang, Z., 1997. PAML: a program package for phylogenetic analysis bymaximum likelihood. Comput. Appl. Biosci. 13, 555–556.

Yang, Z., Nielsen, R., 2000. Estimating synonymous and nonsynonymoussubstitution rates under realistic evolutionary models. Mol. Biol. Evol.17, 32–43.

Yang, Z., Nielsen, R., Goldman, N., Pedersen, A.M., 2000. Codon-substitu-tion models for heterogeneous selection pressure at amino acid sites.Genetics 155, 431–449.

Yang, Z., Wong, W.S.W., Nielsen, R., 2005. Bayes empirical Bayes infer-ence of amino acid sites under positive selection. Mol. Biol. Evol. 22,1107–1118.

Zhang, J., 2004. Frequent false detection of positive selection by thelikelihood method with branch-site models. Mol. Biol. Evol. 21,1332–1339.